Re: O_DIRECT "packet mode" pipes on Linux

2020-09-23 Thread Vito Caputo
On Thu, Sep 24, 2020 at 12:48:14PM +0700, Robert Elz wrote:
> Date: Wed, 23 Sep 2020 21:47:10 -0700
> From: Vito Caputo
> Message-ID: <20200924044710.xpltp22bpxoxi...@shells.gnugeneration.com>
> 
> 
>   | It's useful if you're doing something like say, aggregating data from
>   | multiple piped sources into a single bytestream.  With the default
>   | pipe behavior, you'd have the output interleaved at random boundaries.
> 
> If that's happening, then either the pipe implementation is badly broken,
> or the applications using it aren't doing what you'd like them to do.
> 
> Writes (<= the pipe buffer size) have always (since ancient unix, probably
> since pipes were first created) been atomic - nothing will randomly split
> the data.
> 
> What the new option is offering (as best I can tell from the discussion
> here, I am not a linux user) is passing those boundaries through the pipe
> to the reader - that hasn't been a pipe feature, but it is exactly what a
> unix domain datagram socket provides (these days pipes are sometimes
> implemented using unix domain connection oriented sockets ... I'm guessing
> that the option simply changes the transport protocol used with an
> implementation that works that way).
> 

Apparently I was incomplete in describing my conjured example.

The aggregator in this case is a process connected to multiple pipes,
not a pipe with multiple writer processes.

What you describe is correct WRT multiple writers to a shared pipe.

In my example, the aggregator can trivially read the separate records
at the write boundaries from each of the connected packetized pipes.
The reads return at the write boundaries.  Without packetized pipes
you'd need to parse the contents to search for record boundaries.

Imagine it's like an inverted `tee` for input instead of output.
Without packetized pipes, this hypothetical program couldn't
interleave the collected inputs at record boundaries without parsing
the contents.  Presumably this is *why* we don't already have an input
version of `tee`.  I'd like to work towards changing that.
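
A minimal sketch of such an aggregator, assuming Linux >= 3.4 and that
every input is the read end of a pipe created with O_DIRECT.  The
function name records() is made up for illustration; it is not an
existing utility.

```python
import os
import select


def records(read_fds):
    """Yield (fd, packet) for each write made to any of the packet-mode pipes.

    Assumes every fd is the read end of an O_DIRECT pipe, so one read(2)
    returns exactly one write(2) of at most PIPE_BUF bytes.
    """
    open_fds = set(read_fds)
    while open_fds:
        ready, _, _ = select.select(sorted(open_fds), [], [])
        for fd in ready:
            packet = os.read(fd, select.PIPE_BUF)
            if packet:
                yield fd, packet
            else:
                # Zero-length packets do not exist, so an empty read
                # means every writer has closed this pipe.
                open_fds.discard(fd)
```

The caller can then write each packet to a single output and know it is
interleaving at write boundaries, without looking inside the records.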


>   | With packetized pipes, if your sources write say, newline-delimited
>   | text records, kept under PIPE_BUF length, the aggregated output would
>   | always interleave between the lines, never in the middle of them.
> 
> That happens with regular pipes.
> 

See above, the aggregator is a process, not a shared pipe.


>   | If we added this to the shell, I suppose the next thing to explore
>   | would be how to get all the existing core shell utilities to detect a
>   | packetized pipe on stdout and switch to a line-buffered mode instead
>   | of block-buffered, assuming they're using stdio.
> 
> I suspect that is really all you need - a mechanism to request line
> buffered output rather than blocksize buffered.   You don't need to
> go fiddling with pipes for that, and abusing the pipe interface as a
> way to pass a "line buffer this output please" request to the application
> seems like the wrong way to achieve that to me.
> 

This is probably true, though if a packetized pipe were introspectable
we could request the behavior via the ||| construction, while
simultaneously enabling record boundaries regardless of how the
contents are delimited.  If consumers knew about packetized pipes,
they could treat the separately returned reads as records
independently of what's inside.

> This isn't a criticism of the datagram packet pipe idea - there are
> applications for that (pipe is easier to use than manually setting up
> a pair of unix domain datagram sockets) but that is for specialised
> applications, where for whatever reason the receiver needs to read just
> one packet at a time (usually because of a desire to have multiple
> reading applications, each taking the next request, and then processing
> it ... if there is just one receiving process all that is needed is
> to stick a record length before each packet sent to a normal pipe, and
> let the receiver process the records from the aggregations it receives).
> 

Thanks for the thoughtful response,
Vito Caputo



Re: O_DIRECT "packet mode" pipes on Linux

2020-09-23 Thread Robert Elz
Date: Wed, 23 Sep 2020 21:47:10 -0700
From: Vito Caputo
Message-ID: <20200924044710.xpltp22bpxoxi...@shells.gnugeneration.com>


  | It's useful if you're doing something like say, aggregating data from
  | multiple piped sources into a single bytestream.  With the default
  | pipe behavior, you'd have the output interleaved at random boundaries.

If that's happening, then either the pipe implementation is badly broken,
or the applications using it aren't doing what you'd like them to do.

Writes (<= the pipe buffer size) have always (since ancient unix, probably
since pipes were first created) been atomic - nothing will randomly split
the data.
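
A small sketch of that guarantee, with several writers sharing one
ordinary pipe.  Threads stand in for separate writer processes here,
which is a simplification; the 64-byte line length and the function
names are arbitrary choices for the illustration.

```python
import os
import threading

LINE_LEN = 64  # well under PIPE_BUF, so each write(2) is atomic


def writer(fd, tag, count):
    """Write `count` identical lines made only of the character `tag`."""
    line = (tag * (LINE_LEN - 1) + "\n").encode()
    for _ in range(count):
        os.write(fd, line)


def shared_pipe_lines(tags=("a", "b", "c"), count=500):
    """Run one writer per tag against a single pipe and return all lines read."""
    r, w = os.pipe()
    threads = [threading.Thread(target=writer, args=(w, t, count)) for t in tags]
    for t in threads:
        t.start()

    def close_when_done():
        for t in threads:
            t.join()
        os.close(w)  # lets the reader see EOF

    closer = threading.Thread(target=close_when_done)
    closer.start()

    data = bytearray()
    while True:
        chunk = os.read(r, 65536)
        if not chunk:
            break
        data += chunk
    closer.join()
    os.close(r)
    return bytes(data).split(b"\n")[:-1]
```

Every returned line should consist of a single repeated character: the
lines from different writers interleave, but only between writes.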

What the new option is offering (as best I can tell from the discussion
here, I am not a linux user) is passing those boundaries through the pipe
to the reader - that hasn't been a pipe feature, but it is exactly what a
unix domain datagram socket provides (these days pipes are sometimes
implemented using unix domain connection oriented sockets ... I'm guessing
that the option simply changes the transport protocol used with an
implementation that works that way).

  | With packetized pipes, if your sources write say, newline-delimited
  | text records, kept under PIPE_BUF length, the aggregated output would
  | always interleave between the lines, never in the middle of them.

That happens with regular pipes.

  | If we added this to the shell, I suppose the next thing to explore
  | would be how to get all the existing core shell utilities to detect a
  | packetized pipe on stdout and switch to a line-buffered mode instead
  | of block-buffered, assuming they're using stdio.

I suspect that is really all you need - a mechanism to request line
buffered output rather than blocksize buffered.   You don't need to
go fiddling with pipes for that, and abusing the pipe interface as a
way to pass a "line buffer this output please" request to the application
seems like the wrong way to achieve that to me.

This isn't a criticism of the datagram packet pipe idea - there are
applications for that (pipe is easier to use than manually setting up
a pair of unix domain datagram sockets) but that is for specialised
applications, where for whatever reason the receiver needs to read just
one packet at a time (usually because of a desire to have multiple
reading applications, each taking the next request, and then processing
it ... if there is just one receiving process all that is needed is
to stick a record length before each packet sent to a normal pipe, and
let the receiver process the records from the aggregations it receives).
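
A sketch of that record-length approach over a normal pipe.  The 4-byte
big-endian header and the function names are assumptions made for the
example, not an established format.

```python
import os
import struct

HEADER = struct.Struct("!I")  # assumed framing: 4-byte big-endian length


def send_record(fd, payload):
    """Write one length-prefixed record; keep it <= PIPE_BUF to stay atomic."""
    os.write(fd, HEADER.pack(len(payload)) + payload)


def read_exact(fd, n):
    """Read exactly n bytes, looping over short reads."""
    chunks = []
    while n:
        chunk = os.read(fd, n)
        if not chunk:
            raise EOFError("pipe closed mid-record")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_records(fd):
    """Yield each record's payload until the writer closes the pipe."""
    while True:
        head = os.read(fd, HEADER.size)
        if not head:
            return  # clean EOF between records
        if len(head) < HEADER.size:
            head += read_exact(fd, HEADER.size - len(head))
        (length,) = HEADER.unpack(head)
        yield read_exact(fd, length)
```

The receiver recovers the records whatever they contain, at the cost of
both ends agreeing on the framing.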

kre





Re: O_DIRECT "packet mode" pipes on Linux

2020-09-23 Thread Vito Caputo
On Wed, Sep 23, 2020 at 11:53:10PM -0400, Lawrence Velázquez wrote:
> > On Sep 23, 2020, at 11:41 PM, Vito Caputo  wrote:
> > 
> > On Wed, Sep 23, 2020 at 09:12:40AM -0400, Chet Ramey wrote:
> >> On 9/22/20 11:23 PM, Vito Caputo wrote:
> >>> Hello list,
> >>> 
> >>> Is there any chance we could get a | modifier for enabling O_DIRECT on the
> >>> created pipe?  "Packet" style pipes have some interesting and potentially
> >>> useful properties, it would be nice if bash made them more accessible.
> >> 
> >> Is there a general need, especially since they're Linux-specific?
> >> 
> > 
> > I'm not sure, but as far as GNU/Linux distros go bash, is kind of the
> > canonical shell, and this functionality is kind of inaccessible
> > without the shell wiring it up.
> 
> What functionality? I (and I'm sure some others) am not familiar
> with packet-style pipes and their benefits. You haven't actually
> described *how* exposing them would be useful, and why that would
> justify introducing new syntax that only matters/works on Linux.
> 

Packetized pipes establish well-defined boundaries between writes
reproduced at the read side.  If the write sizes are kept within
PIPE_BUF bounds, then you can be certain what's read is an atomic
record including nothing from a subsequent or previous write, with no
possibility for partial records.

It's useful if you're doing something like say, aggregating data from
multiple piped sources into a single bytestream.  With the default
pipe behavior, you'd have the output interleaved at random boundaries.
With packetized pipes, if your sources write say, newline-delimited
text records, kept under PIPE_BUF length, the aggregated output would
always interleave between the lines, never in the middle of them.

If we added this to the shell, I suppose the next thing to explore
would be how to get all the existing core shell utilities to detect a
packetized pipe on stdout and switch to a line-buffered mode instead
of block-buffered, assuming they're using stdio.  That should turn
their lines into packets on the pipe, and it all becomes generally
relevant across the existing shell utils landscape.  This heuristic
echoes the terminal output detection for stdout line-buffering
already performed according to setvbuf(3).
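
A rough sketch of what that detection could look like in a utility.  It
assumes F_GETFL reports O_DIRECT on the write end of a packet-mode
pipe, which is how current Linux kernels behave as far as I know; the
read end does not appear to carry the flag.  Both function names are
hypothetical.

```python
import fcntl
import os
import stat
import sys


def is_packet_pipe(fd):
    """True if fd is a pipe whose open file flags include O_DIRECT."""
    if not stat.S_ISFIFO(os.fstat(fd).st_mode):
        return False
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_DIRECT)


def configure_stdout():
    """Switch stdout to line buffering when it is a packet-mode pipe."""
    if is_packet_pipe(sys.stdout.fileno()):
        # Flush at each newline so every line reaches the pipe as one write.
        sys.stdout.reconfigure(line_buffering=True)
        return "line"
    return "default"
```

A program would call configure_stdout() once at startup, much as stdio
decides on line buffering when stdout is a terminal.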

Thanks,
Vito Caputo



Re: O_DIRECT "packet mode" pipes on Linux

2020-09-23 Thread Lawrence Velázquez
> On Sep 23, 2020, at 11:41 PM, Vito Caputo  wrote:
> 
> On Wed, Sep 23, 2020 at 09:12:40AM -0400, Chet Ramey wrote:
>> On 9/22/20 11:23 PM, Vito Caputo wrote:
>>> Hello list,
>>> 
>>> Is there any chance we could get a | modifier for enabling O_DIRECT on the
>>> created pipe?  "Packet" style pipes have some interesting and potentially
>>> useful properties, it would be nice if bash made them more accessible.
>> 
>> Is there a general need, especially since they're Linux-specific?
>> 
> 
> I'm not sure, but as far as GNU/Linux distros go bash, is kind of the
> canonical shell, and this functionality is kind of inaccessible
> without the shell wiring it up.

What functionality? I (and I'm sure some others) am not familiar
with packet-style pipes and their benefits. You haven't actually
described *how* exposing them would be useful, and why that would
justify introducing new syntax that only matters/works on Linux.

vq


Re: O_DIRECT "packet mode" pipes on Linux

2020-09-23 Thread Vito Caputo
On Wed, Sep 23, 2020 at 09:12:40AM -0400, Chet Ramey wrote:
> On 9/22/20 11:23 PM, Vito Caputo wrote:
> > Hello list,
> > 
> > Is there any chance we could get a | modifier for enabling O_DIRECT on the
> > created pipe?  "Packet" style pipes have some interesting and potentially
> > useful properties, it would be nice if bash made them more accessible.
> 
> Is there a general need, especially since they're Linux-specific?
>

I'm not sure, but as far as GNU/Linux distros go, bash is kind of the
canonical shell, and this functionality is kind of inaccessible
without the shell wiring it up.

If I'm not mistaken this pipe flavor exists as the default behavior in
plan9, so it's not entirely unique to linux conceptually.

> What kind of modifier would you suggest?
> 

Maybe triple pipe could be packetized pipe?  It visually expresses
being sliced up somewhat; `foo ||| bar`

> Does anyone want to take a shot at implementing this idea?
> 

It's possible I could find time to take a stab at it, if nobody else
wants to.

Cheers,
Vito Caputo



Re: Feature request: Enable possibility of colored stderr output

2020-09-23 Thread Dale R. Worley
A M  writes:
> It would be really neat to have functionality that stderr could be 
> output in a different color compared to stdout.

It seems like the right way to implement this would be to build a
program that reads from two inputs at once and interleaves what it gets
from the two inputs, and outputs that on its stdout.  In addition, it
would ensure that text from each input was labeled with the right
colorization codes (whatever that might be).  Writing that has some
complexity but is mostly careful programming.  It also involves nailing
down exactly what colorization scheme you want to use.
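
A minimal sketch of such a post-processor.  The colorization scheme is
an assumption (ANSI red for the second input, nothing for the first),
and colorize() is a made-up name.

```python
import os
import select

RED = b"\033[31m"   # assumed scheme: ANSI red for the stderr-side input
RESET = b"\033[0m"


def colorize(out_in, err_in, out_fd):
    """Interleave two inputs line by line, wrapping err_in lines in RED."""
    pending = {out_in: b"", err_in: b""}  # partial line held per input
    while pending:
        ready, _, _ = select.select(list(pending), [], [])
        for fd in ready:
            chunk = os.read(fd, 4096)
            *lines, rest = (pending[fd] + chunk).split(b"\n")
            if chunk:
                pending[fd] = rest
            else:
                if rest:
                    lines.append(rest)  # final line lacked a newline
                del pending[fd]         # EOF on this input
            for line in lines:
                if fd == err_in:
                    line = RED + line + RESET
                os.write(out_fd, line + b"\n")
```

Lines are only emitted whole, so the color codes never land in the
middle of a line from the other input.  The relative order of lines
from the two inputs depends on when the data arrives.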

Then, in bash, you could execute something like:

exec >&$xxx 2>&$yyy

to redirect all future stderr and stdout into the post-processor.

The part I can't figure out is how to start and background the
postprocessor leaving its two input fd's (xxx and yyy in the above
commands) open and accessible to later bash commands.  That isn't
difficult to do with one fd; you can do something like

exec 3> >( postprocessor )

command >&3

I wonder if you could do something like this, using a process
substitution to a dummy "sleep" process to create an additional
connection:

exec 3> >( sleep 60 )
exec 4> >( postprocessor /dev/fd/3 )

exec >&3 2>&4
exec 3>&- 4>&-

I'll leave that question to the experts.

Dale



Re: Confusing documentation of `ENV` in section "Bash variables"

2020-09-23 Thread Reuben Thomas via Bug reports for the GNU Bourne Again SHell
On Wed, 23 Sep 2020 at 14:24, Chet Ramey  wrote:


> "Expanded and executed similarly to BASH_ENV when an interactive shell is
> invoked in POSIX Mode."
>

Yes, that's better than my suggestions, thanks!

-- 
https://rrt.sc3d.org


Re: Confusing documentation of `ENV` in section "Bash variables"

2020-09-23 Thread Chet Ramey
On 9/22/20 6:27 PM, Reuben Thomas via Bug reports for the GNU Bourne Again
SHell wrote:
> The documentation says:
> 
> 'ENV'
>  Similar to 'BASH_ENV'; used when the shell is invoked in POSIX Mode
>  (*note Bash POSIX Mode::).

Hmmm, I can see that. How about, as you suggest, something like

"Expanded and executed similarly to BASH_ENV when an interactive shell is
invoked in POSIX Mode."

The behavior is described pretty completely in the INVOCATION section.


-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: O_DIRECT "packet mode" pipes on Linux

2020-09-23 Thread Chet Ramey
On 9/22/20 11:23 PM, Vito Caputo wrote:
> Hello list,
> 
> Is there any chance we could get a | modifier for enabling O_DIRECT on the
> created pipe?  "Packet" style pipes have some interesting and potentially
> useful properties, it would be nice if bash made them more accessible.

Is there a general need, especially since they're Linux-specific? What kind
of modifier would you suggest?

Does anyone want to take a shot at implementing this idea?


-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



O_DIRECT "packet mode" pipes on Linux

2020-09-23 Thread Vito Caputo
Hello list,

Is there any chance we could get a | modifier for enabling O_DIRECT on the
created pipe?  "Packet" style pipes have some interesting and potentially
useful properties, it would be nice if bash made them more accessible.

pipe(2) O_DIRECT excerpt:

O_DIRECT (since Linux 3.4)
   Create a pipe that performs I/O in "packet" mode.  Each write(2)
   to the pipe is dealt with as a separate packet, and read(2)s
   from the pipe will read one packet at a time.  Note the
   following points:

   *  Writes of greater than PIPE_BUF bytes (see pipe(7)) will be
      split into multiple packets.  The constant PIPE_BUF is
      defined in <limits.h>.

   *  If a read(2) specifies a buffer size that is smaller than the
      next packet, then the requested number of bytes are read, and
      the excess bytes in the packet are discarded.  Specifying a
      buffer size of PIPE_BUF will be sufficient to read the
      largest possible packets (see the previous point).

   *  Zero-length packets are not supported.  (A read(2) that
      specifies a buffer size of zero is a no-op, and returns 0.)

   Older kernels that do not support this flag will indicate this
   via an EINVAL error.
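
A short sketch of those semantics, assuming Linux >= 3.4 and a Python
that exposes os.pipe2 and os.O_DIRECT.  The demo() function is only an
illustration.

```python
import os
import select


def demo():
    """Show one-packet-per-read and truncation on a packet-mode pipe."""
    r, w = os.pipe2(os.O_DIRECT)  # raises OSError (EINVAL) on older kernels
    try:
        os.write(w, b"first record\n")
        os.write(w, b"second record\n")
        # Each read returns a single packet, even though the buffer
        # is large enough to hold both.
        first = os.read(r, select.PIPE_BUF)
        second = os.read(r, select.PIPE_BUF)

        os.write(w, b"0123456789")
        os.write(w, b"next")
        # A read smaller than the packet discards the rest of that packet.
        truncated = os.read(r, 4)
        after = os.read(r, select.PIPE_BUF)
        return first, second, truncated, after
    finally:
        os.close(r)
        os.close(w)
```

With an ordinary pipe the first read would normally return both records
at once, and the short read would leave the remaining bytes in the pipe.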

Regards,
Vito Caputo



Re: Bash-5.1-beta available

2020-09-23 Thread felix
Hi,

On Mon, Sep 21, 2020 at 09:13:13PM -0400, Dale R. Worley wrote:
> Andreas Schwab  writes:
> I assume that if you really want the old effect, you can still do
> 
> exec {dup}<&1
You mean:
exec {dup}<&0

> 
> ... <( ... <$dup ) ...
and:

... <( ... <&$dup ) ...

> 
> exec {dup}<&-
> 
> Dale
Sample:

This worked under previous bash versions (I use 5.0.3(1)-release):

testForkInput () {
    local line
    while read line ;do
        echo "$line"
    done < <(
        sed 's/^/> /'
    )
}

But with 5.1.0(1)-beta, I have to replace this with:

testForkInput () {
    local dup line
    exec {dup}<&0
    while read line ;do
        echo "$line"
    done < <(
        sed 's/^/> /' <&$dup
    )
    exec {dup}>&-
}

testForkInput < <(seq 1 5)
> 1
> 2
> 3
> 4
> 5

On Tue, Sep 22, 2020 at 08:57:57AM +0200, Andreas Schwab wrote:
> That point is that it silently breaks existing scripts.

I agree.

-- 
 Félix Hauri  --  http://www.f-hauri.ch