Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2023-03-01 Thread Pádraig Brady

On 01/03/2023 09:25, Arsen Arsenović wrote:

Morning Padraig,

Pádraig Brady  writes:


I'm squashing in the following to handle illumos and macOS.
Also it decouples the code from the pollfd structure layout,
by using C99 designated initializers.


Thanks.  I read over the patches you attached and they seem reasonable.

Thanks for working on getting this merged, have a great day!


Thanks for the review.

Merged.

cheers,
Pádraig




Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2023-03-01 Thread Arsen Arsenović
Morning Padraig,

Pádraig Brady  writes:

> I'm squashing in the following to handle illumos and macOS.
> Also it decouples the code from the pollfd structure layout,
> by using C99 designated initializers.

Thanks.  I read over the patches you attached and they seem reasonable.

Thanks for working on getting this merged, have a great day!

> cheers,
> Pádraig
>
> diff --git a/src/iopoll.c b/src/iopoll.c
> index 916241f89..ceb1b43ad 100644
> --- a/src/iopoll.c
> +++ b/src/iopoll.c
> @@ -49,7 +49,10 @@
>  extern int
>  iopoll (int fdin, int fdout)
>  {
> -  struct pollfd pfds[2] = {{fdin, POLLIN, 0}, {fdout, 0, 0}};
> +  struct pollfd pfds[2] = {  /* POLLRDBAND needed for illumos, macOS.  */
> +{ .fd = fdin,  .events = POLLIN | POLLRDBAND, .revents = 0 },
> +{ .fd = fdout, .events = POLLRDBAND, .revents = 0 },
> +  };
>
>while (poll (pfds, 2, -1) > 0 || errno == EINTR)
>  {


-- 
Arsen Arsenović


signature.asc
Description: PGP signature


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2023-02-28 Thread Pádraig Brady

On 27/02/2023 18:26, Pádraig Brady wrote:

Attached are the current patches I have for this functionality.
I'll do a bit more testing and hope to push them early tomorrow.


I'm squashing in the following to handle illumos and macOS.
Also it decouples the code from the pollfd structure layout,
by using C99 designated initializers.

cheers,
Pádraig

diff --git a/src/iopoll.c b/src/iopoll.c
index 916241f89..ceb1b43ad 100644
--- a/src/iopoll.c
+++ b/src/iopoll.c
@@ -49,7 +49,10 @@
 extern int
 iopoll (int fdin, int fdout)
 {
-  struct pollfd pfds[2] = {{fdin, POLLIN, 0}, {fdout, 0, 0}};
+  struct pollfd pfds[2] = {  /* POLLRDBAND needed for illumos, macOS.  */
+{ .fd = fdin,  .events = POLLIN | POLLRDBAND, .revents = 0 },
+{ .fd = fdout, .events = POLLRDBAND, .revents = 0 },
+  };

   while (poll (pfds, 2, -1) > 0 || errno == EINTR)
 {




Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2023-02-27 Thread Pádraig Brady

Attached are the current patches I have for this functionality.
I'll do a bit more testing and hope to push them early tomorrow.

thanks,
PádraigFrom 229a87ed0e45eb5f6875147c48719820291e5238 Mon Sep 17 00:00:00 2001
From: Carl Edquist 
Date: Thu, 15 Dec 2022 06:10:33 -0600
Subject: [PATCH 1/4] all: add broken pipe detection while waiting for input

When a program's output becomes a broken pipe, future attempts to write
to that ouput will fail (SIGPIPE/EPIPE).  Once it is known that all
future write attepts will fail (due to broken pipes), in many cases it
becomes pointless to wait for further input for slow devices like ttys.
Ideally, a program could use this information to exit early once it is
known that future writes will fail.

Introduce iopoll() to wait on a pair of fds (input & output) for input
to become ready or output to become a broken pipe.

This is relevant when input is intermittent (a tty, pipe, or socket);
but if input is always ready (a regular file or block device), then
a read() will not block, and write failures for a broken pipe will
happen normally.

Introduce iopoll_input_ok() to check whether an input fd is relevant
for iopoll().

Experimentally, broken pipes are only detectable immediately for pipes,
but not sockets.  Errors for other file types will be detected in the
usual way, on write failure.

Introduce iopoll_output_ok() to check whether an output fd is suitable
for iopoll() -- namely, whether it is a pipe.

iopoll() is best implemented with a native poll(2) where possible, but
fall back to a select(2)-based implementation platforms where there are
portability issues.  See also discussion in tail.c.

In general, adding a call to iopoll() before a read() in filter programs
also allows broken pipes to "propagate" backwards in a shell pipeline.

* src/iopoll.c, src/iopoll.h (iopoll): New function implementing broken
pipe detection on output while waiting for input.
(IOPOLL_BROKEN_OUTPUT, IOPOLL_ERROR): Return codes for iopoll().
(IOPOLL_USES_POLL): Macro for poll() vs select() implementation.
(iopoll_input_ok): New function to check whether an input fd is relevant
for iopoll().
(iopoll_output_ok): New function to check whether an input fd is
suitable for iopoll().
* src/local.mk (noinst_HEADERS): add src/iopoll.h.
---
 src/iopoll.c | 118 +++
 src/iopoll.h |   6 +++
 src/local.mk |   1 +
 3 files changed, 125 insertions(+)
 create mode 100644 src/iopoll.c
 create mode 100644 src/iopoll.h

diff --git a/src/iopoll.c b/src/iopoll.c
new file mode 100644
index 0..916241f89
--- /dev/null
+++ b/src/iopoll.c
@@ -0,0 +1,118 @@
+/* iopoll.c  -- broken pipe detection (while waiting for input)
+   Copyright (C) 1989-2022 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see .
+
+   Written by Carl Edquist in collaboration with Arsen Arsenović.  */
+
+#include 
+
+  /* poll(2) is needed on AIX (where 'select' gives a readable
+ event immediately) and Solaris (where 'select' never gave
+ a readable event).  Also use poll(2) on systems we know work
+ and/or are already using poll (linux).  */
+
+#if defined _AIX || defined __sun || defined __APPLE__ || \
+defined __linux__ || defined __ANDROID__
+# define IOPOLL_USES_POLL 1
+#endif
+
+#if IOPOLL_USES_POLL
+# include 
+#else
+# include 
+#endif
+
+#include "system.h"
+#include "iopoll.h"
+#include "isapipe.h"
+
+
+/* Wait for fdin to become ready for reading or fdout to become a broken pipe.
+   Return 0 if fdin can be read() without blocking, or IOPOLL_BROKEN_OUTPUT if
+   fdout becomes a broken pipe, otherwise IOPOLL_ERROR if there is a poll()
+   or select() error.  */
+
+#if IOPOLL_USES_POLL
+
+extern int
+iopoll (int fdin, int fdout)
+{
+  struct pollfd pfds[2] = {{fdin, POLLIN, 0}, {fdout, 0, 0}};
+
+  while (poll (pfds, 2, -1) > 0 || errno == EINTR)
+{
+  if (errno == EINTR)
+continue;
+  if (pfds[0].revents) /* input available or pipe closed indicating EOF; */
+return 0;  /* should now be able to read() without blocking  */
+  if (pfds[1].revents)/* POLLERR, POLLHUP (or POLLNVAL) */
+return IOPOLL_BROKEN_OUTPUT;  /* output error or broken pipe*/
+}
+  return IOPOLL_ERROR;  /* poll error */
+}
+
+#else  /* fall back to select()-based implementation */
+
+extern int
+iopoll (int 

Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2023-02-27 Thread Pádraig Brady

On 03/01/2023 17:49, Arsen Arsenović wrote:

Hi Padraig,

Pádraig Brady  writes:


Really nice work on this.
Only very small syntax tweaks (attached) at this point.
I'm going to do testing with this and will add an appropriate test case.


I spotted some a slightly less minor error, and notified Carl off-list,
but you beat us to resubmitting a fixed patchset ;)

Namely, select (rfds, ...) would leave the state of rfds undefined.  On
Linux, this didn't cause errors, but I can definitely see it doing so on
other platforms.  I attached a patch that fixes that.


Squashed that in locally, thanks.


I also attached the test case I mentioned.


Async stuff like this is tricky to test.
Your test waited at least 5 seconds, was still a bit racy,
and had some bashisms. I think I'll go with the following
which runs in around 0.1s normally.

diff --git a/tests/misc/tee.sh b/tests/misc/tee.sh
index b96523186..30d64a9d2 100755
--- a/tests/misc/tee.sh
+++ b/tests/misc/tee.sh
@@ -63,6 +63,16 @@ if test -w /dev/full && test -c /dev/full; then
   test $(wc -l < err) = 1 || { cat err; fail=1; }
 fi

+# Test iopoll-powered early exit for closed pipes
+tee_exited() { sleep $1; test -f tee.exited; }
+# Currently this functionality is most useful with
+# intermittent input from a terminal, but here we
+# use an input pipe that doesn't write anything
+# but will exit as soon as tee does, or it times out
+retry_delay_ tee_exited .1 7 | # 12.7s (Must be > following timeout)
+{ timeout 10 tee -p 2>err && touch tee.exited; } | :
+test $(wc -l < err) = 0 || { cat err; fail=1; }
+test -f tee.exited || fail=1



Oh, I also just noticed: In the non-poll case, a warning will be emitted
because an undefined macro value is used in an #if.  Please also add a

   #else
   # define IOPOLL_USES_POLL 0


We rely quite often on -Wundef being disabled,
so it's probably best for cleaner code to avoid
the explicit define in that case.


cheers,
Pádraig



Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2023-02-02 Thread Arsen Arsenović

Pádraig Brady  writes:

> Yes definitely.
> This is the top of my list to merge.

Lovely, thanks!
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2023-02-02 Thread Pádraig Brady

On 02/02/2023 09:40, Arsen Arsenović wrote:

Hi Padraig,

I saw that you are planning on making a coreutils release soon.  Can
these patches be included in it?

Thanks in advance, have a lovely day.


Yes definitely.
This is the top of my list to merge.

thanks,
Pádraig



Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2023-02-02 Thread Arsen Arsenović
Hi Padraig,

I saw that you are planning on making a coreutils release soon.  Can
these patches be included in it?

Thanks in advance, have a lovely day.
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2023-01-03 Thread Arsen Arsenović
Hi Padraig,

Pádraig Brady  writes:

> Really nice work on this.
> Only very small syntax tweaks (attached) at this point.
> I'm going to do testing with this and will add an appropriate test case.

I spotted some a slightly less minor error, and notified Carl off-list,
but you beat us to resubmitting a fixed patchset ;)

Namely, select (rfds, ...) would leave the state of rfds undefined.  On
Linux, this didn't cause errors, but I can definitely see it doing so on
other platforms.  I attached a patch that fixes that.  I also attached
the test case I mentioned.

From 2e26d25475b1541ff6f03685c671c63277b837d5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Arsen=20Arsenovi=C4=87?= 
Date: Tue, 3 Jan 2023 18:05:07 +0100
Subject: [PATCH 1/2] iopoll: Fix select fd_set UB in iopoll

* src/iopoll.c (iopoll): Reinitialize rfds fd_set on each select
iteration.
---
 src/iopoll.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/src/iopoll.c b/src/iopoll.c
index 9424af6fa..e1714728c 100644
--- a/src/iopoll.c
+++ b/src/iopoll.c
@@ -66,19 +66,22 @@ iopoll(int fdin, int fdout)
 #else  /* fall back to select()-based implementation */
 
 extern int
-iopoll(int fdin, int fdout)
+iopoll (int fdin, int fdout)
 {
   int nfds = (fdin > fdout ? fdin : fdout) + 1;
-  fd_set rfds;
-  FD_ZERO ();
-  FD_SET (fdin, );
-  FD_SET (fdout, );
+  int ret = 0;
 
   /* If fdout has an error condition (like a broken pipe) it will be seen
  as ready for reading.  Assumes fdout is not actually readable.  */
-  while (select (nfds, , NULL, NULL, NULL) > 0 || errno == EINTR)
+  while (ret >= 0 || errno == EINTR)
 {
-  if (errno == EINTR)
+  fd_set rfds;
+  FD_ZERO ();
+  FD_SET (fdin, );
+  FD_SET (fdout, );
+  ret = select (nfds, , NULL, NULL, NULL);
+
+  if (ret < 0)
 continue;
   if (FD_ISSET (fdin, ))  /* input available or EOF; should now */
 return 0;  /* be able to read() without blocking */
-- 
2.39.0

From 28abc6c347e74a0e61dc3dfac40f09b186fab65f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Arsen=20Arsenovi=C4=87?= 
Date: Tue, 3 Jan 2023 18:06:45 +0100
Subject: [PATCH 2/2] tests: Add test for tee -p

* tests/misc/tee.sh: Add tee -p test.
---
 tests/misc/tee.sh | 17 +
 1 file changed, 17 insertions(+)

diff --git a/tests/misc/tee.sh b/tests/misc/tee.sh
index 0bd91b6cb..6945865ff 100755
--- a/tests/misc/tee.sh
+++ b/tests/misc/tee.sh
@@ -63,6 +63,23 @@ if test -w /dev/full && test -c /dev/full; then
   test $(wc -l < err) = 1 || { cat err; fail=1; }
 fi
 
+# This is a testcase for the iopoll-powered read-write loop in tee.  In
+# essence, the test checks for sleep exiting as soon as all it's outputs die.
+# With the presence of some bashisms, this test could be more complete, so that
+# it includes tests for outputting to named pipes too, but the handling of
+# outputs in tee is sufficiently elegant to make it hopefully identical.
+#
+# Component breakdown of this pipeline:
+# - sleep emulates misbehaving input.
+# - The timeout is our failure safety-net.
+# - We ignore stderr from tee, and should have no stdout anyway.
+# - If the tee succeeds, we print TEST_PASSED into FD 8 to grep for later.
+# (FD 8 was selected by a D20 roll, or rather, a software emulation of one)
+# - The colon is the immediately closed output process.
+# - We redirect 8 back into stdout to grep it.
+( sleep 5 | (timeout 3 tee -p 2>/dev/null && echo TEST_PASSED >&8) | : ) 8>&1 \
+| grep -x TEST_PASSED >/dev/null || fail=1
+
 
 # Ensure tee honors --output-error modes
 mkfifo_or_skip_ fifo
-- 
2.39.0


I originally wanted to include these squashed into the original two
commits, which is why I held off from posting an amended patchset.

Oh, I also just noticed: In the non-poll case, a warning will be emitted
because an undefined macro value is used in an #if.  Please also add a

  #else
  # define IOPOLL_USES_POLL 0

... branch.

Thanks, and happy holidays!
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2023-01-03 Thread Pádraig Brady

On 15/12/2022 18:38, Carl Edquist wrote:

Aaak!  Sorry, please disregard the patchset in my previous email.  I
messed up a rebase I was doing.  I am attaching a corrected patchset now.


Really nice work on this.
Only very small syntax tweaks (attached) at this point.
I'm going to do testing with this and will add an appropriate test case.

thank you!
Pádraig
diff --git a/NEWS b/NEWS
index bd4f1d082..f33f9acdf 100644
--- a/NEWS
+++ b/NEWS
@@ -96,7 +96,8 @@ GNU coreutils NEWS-*- outline -*-
   diagnosed, rather than being conflated with ENOTEMPTY.
 
   tee -p detects when all remaining outputs have become broken pipes, and
-  exits, rather than waiting for more input before producing a write failure.
+  exits, rather than waiting for more input to induce an exit when written.
+
 
 * Noteworthy changes in release 9.1 (2022-04-15) [stable]
 
diff --git a/src/iopoll.c b/src/iopoll.c
index 9424af6fa..808c39b53 100644
--- a/src/iopoll.c
+++ b/src/iopoll.c
@@ -23,17 +23,17 @@
  a readable event).  Also use poll(2) on systems we know work
  and/or are already using poll (linux).  */
 
-#define IOPOLL_USES_POLL \
-defined _AIX || defined __sun || defined __APPLE__ || \
-defined __linux__ || defined __ANDROID__
+#if defined _AIX || defined __sun || defined __APPLE__ || \
+defined __linux__ || defined __ANDROID__
+# define IOPOLL_USES_POLL 1
+#endif
 
 #if IOPOLL_USES_POLL
-#include 
+# include 
 #else
-#include 
+# include 
 #endif
 
-#include "error.h"
 #include "system.h"
 #include "iopoll.h"
 #include "isapipe.h"
@@ -47,7 +47,7 @@
 #if IOPOLL_USES_POLL
 
 extern int
-iopoll(int fdin, int fdout)
+iopoll (int fdin, int fdout)
 {
   struct pollfd pfds[2] = {{fdin, POLLIN, 0}, {fdout, 0, 0}};
 
@@ -66,7 +66,7 @@ iopoll(int fdin, int fdout)
 #else  /* fall back to select()-based implementation */
 
 extern int
-iopoll(int fdin, int fdout)
+iopoll (int fdin, int fdout)
 {
   int nfds = (fdin > fdout ? fdin : fdout) + 1;
   fd_set rfds;
@@ -96,19 +96,20 @@ iopoll(int fdin, int fdout)
which is the case for a regular file or block device.  */
 
 extern bool
-iopoll_input_ok(int fdin)
+iopoll_input_ok (int fdin)
 {
   struct stat st;
-  bool always_ready = fstat (fdin, ) == 0 && (S_ISREG (st.st_mode) ||
- S_ISBLK (st.st_mode));
-  return !always_ready;
+  bool always_ready = fstat (fdin, ) == 0
+  && (S_ISREG (st.st_mode)
+  || S_ISBLK (st.st_mode));
+  return ! always_ready;
 }
 
 /* Return true if fdout is suitable for iopoll().
Namely, fdout refers to a pipe.  */
 
 extern bool
-iopoll_output_ok(int fdout)
+iopoll_output_ok (int fdout)
 {
   return isapipe (fdout) > 0;
 }
diff --git a/src/iopoll.h b/src/iopoll.h
index f57e3a33d..85935e960 100644
--- a/src/iopoll.h
+++ b/src/iopoll.h
@@ -1,6 +1,6 @@
 #define IOPOLL_BROKEN_OUTPUT -2
 #define IOPOLL_ERROR -3
 
-int iopoll(int fdin, int fdout);
-bool iopoll_input_ok(int fdin);
-bool iopoll_output_ok(int fdout);
+int iopoll (int fdin, int fdout);
+bool iopoll_input_ok (int fdin);
+bool iopoll_output_ok (int fdout);
diff --git a/src/tee.c b/src/tee.c
index 6ba0dd655..c69ea9133 100644
--- a/src/tee.c
+++ b/src/tee.c
@@ -154,9 +154,9 @@ main (int argc, char **argv)
   else
 output_error = output_error_warn_nopipe;
 
-  /* Detect and close a broken pipe output when ignorning EPIPE */
-  if (output_error == output_error_warn_nopipe ||
-  output_error == output_error_exit_nopipe)
+  /* Detect and close a broken pipe output when ignoring EPIPE.  */
+  if (output_error == output_error_warn_nopipe
+  || output_error == output_error_exit_nopipe)
 pipe_check = true;
 
   break;
@@ -195,7 +195,7 @@ main (int argc, char **argv)
or -1 if all are NULL.  */
 
 static int
-get_next_out(FILE **descriptors, int nfiles, int idx)
+get_next_out (FILE **descriptors, int nfiles, int idx)
 {
   for (idx++; idx <= nfiles; idx++)
 if (descriptors[idx])
@@ -207,11 +207,12 @@ get_next_out(FILE **descriptors, int nfiles, int idx)
Return true if this indicates a reportable error.  */
 
 static bool
-fail_output(FILE **descriptors, char **files, int i)
+fail_output (FILE **descriptors, char **files, int i)
 {
   int w_errno = errno;
-  bool fail = errno != EPIPE || (output_error == output_error_exit
-|| output_error == output_error_warn);
+  bool fail = errno != EPIPE
+  || output_error == output_error_exit
+  || output_error == output_error_warn;
   if (descriptors[i] == stdout)
 clearerr (stdout); /* Avoid redundant close_stdout diagnostic.  */
   if (fail)


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-15 Thread Carl Edquist

Hi Pádraig,
O2
At your convenience, please see my attached patch set.


... A couple very small implementation details I guess I'll mention here.

1. In iopoll.c, I use isapipe() from gnulib, as is done in tail.c.  I am 
not wild about the gnulib implementation of isapipe() - I would rather do 
a simple S_ISFIFO check, as the FIFO vs pipe distinction is irrelevant for 
iopoll(), and I'm not sure whether the "pipes are sockets" for Darwin 7.7 
still needs to be supported.  But I was happy to see (at least for me on 
Linux) it looks like the config.h macros end up compiling away the 
complicated parts of this function, so I guess it's not that bad  :)


2. I believe the "if (errno == EINTR) continue;" logic in iopoll() is 
unnecessary, as an interrupted poll() or select() call will result in 
repeating the loop anyway.  I've left it for clarity, but as far as I can 
tell it could be removed just the same.



On Tue, 13 Dec 2022, Arsen Arsenović wrote:

It might also be good to give a quick test on FreeBSD, since it has some 
popularity too.


I don't have easy access to play with this, but yeah I think I've seen the 
BSD man page for poll(2), and I imagine poll would work there.  If so we 
can tweak the IOPOLL_USES_POLL definition in iopoll.c to include an 
appropriate preprocessor check for BSD.




If you prefer, I'll have some time in the latter part of this week too.

Let's not forget to include the testcase posted previously (with -p 
instead of -P, since it was suggested to enable polling for -p):


Hi Arsen,

Would you like to add your test case and comments in a separate commit?

Thanks!
Carl



( sleep 5 | (timeout 3 tee -p 2>/dev/null && echo TEST_PASSED >&8) | : ) 8>&1 | 
grep -qx TEST_PASSED

To annotate it, and let's include this info in a comment:

- sleep emulates misbehaving input.
- The timeout is our failure safety-net.
- We ignore stderr from tee, and should have no stdout anyway.
- If that succeeds, we print TEST_PASSED into FD 8 to grep for later.
 (FD 8 was selected by a D20 roll, or rather, a software emulation)
- The colon is the immediately closed output process.
- We redirect 8 back into stdout to grep it.

If tee fails, for instance because it times out, or it fails to
recognize -P for some reason, the echo simply won't run.  The grep
options are in POSIX (or, at least, in POSIX.1-2017).

Thank you both, have a great night.
--
Arsen Arsenović
From d2a62057cacd25730243c20506edc66a27e3bf0d Mon Sep 17 00:00:00 2001
From: Carl Edquist 
Date: Thu, 15 Dec 2022 06:10:33 -0600
Subject: [PATCH 1/2] iopoll: broken pipe detection while waiting for input

When a program's output becomes a broken pipe, future attempts to write
to that ouput will fail (SIGPIPE/EPIPE).  Once it is known that all
future write attepts will fail (due to broken pipes), in many cases it
becomes pointless to wait for further input for slow devices like ttys.
Ideally, a program could use this information to exit early once it is
known that future writes will fail.

Introduce iopoll() to wait on a pair of fds (input & output) for input
to become ready or output to become a broken pipe.

This is relevant when input is intermittent (a tty, pipe, or socket);
but if input is always ready (a regular file or block device), then
a read() will not block, and write failures for a broken pipe will
happen normally.

Introduce iopoll_input_ok() to check whether an input fd is relevant
for iopoll().

Experimentally, broken pipes are only detectable immediately for pipes,
but not sockets.  Errors for other file types will be detected in the
usual way, on write failure.

Introduce iopoll_output_ok() to check whether an output fd is suitable
for iopoll() -- namely, whether it is a pipe.

iopoll() is best implemented with a native poll(2) where possible, but
fall back to a select(2)-based implementation platforms where there are
portability issues.  See also discussion in tail.c.

In general, adding a call to iopoll() before a read() in filter programs
also allows broken pipes to "propagate" backwards in a shell pipeline.

* src/iopoll.c, src/iopoll.h (iopoll): New function implementing broken
pipe detection on output while waiting for input.
(IOPOLL_BROKEN_OUTPUT, IOPOLL_ERROR): Return codes for iopoll().
(IOPOLL_USES_POLL): Macro for poll() vs select() implementation.
(iopoll_input_ok): New function to check whether an input fd is relevant
for iopoll().
(iopoll_output_ok): New function to check whether an input fd is
suitable for iopoll().
* src/local.mk (noinst_HEADERS): add src/iopoll.h.
---
 src/iopoll.c | 128 +++
 src/iopoll.h |   6 +++
 src/local.mk |   1 +
 3 files changed, 135 insertions(+)
 create mode 100644 src/iopoll.c
 create mode 100644 src/iopoll.h

diff --git a/src/iopoll.c b/src/iopoll.c
new file mode 100644
index 000..32462d8
--- /dev/null
+++ b/src/iopoll.c
@@ -0,0 +1,128 @@
+/* iopoll.c  -- broken pipe detection (while waiting for input)
+   Copyright (C) 

Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-13 Thread Arsen Arsenović

Hi Carl, Padraig,

Thanks for the ACK.  I've sent the signed copyright assignment form;
I'll keep you posted on that.

On Tue, 13 Dec 2022, Pádraig Brady wrote:

>> Re HAVE_INOTIFY, that's really a proxy for a linux kernel, and so would be
>> most appropriately changed to:
>>
>>  defined __linux__ || defined __ANDROID__
>>
>> I'm thinking these hardcoded defines would be best for now at least as it
>> covers the vast majority of systems, and avoids complicated (cross) compile
>> time checks.

It might also be good to give a quick test on FreeBSD, since it has some
popularity too.

>> A modularised iopoll.c would be better, given the potential uses by other
>> tools, though we'd probably release for just tee initially.
>>
>> As for interface to this functionality I'm wondering if we could just have
>> the existing tee {-p,--output-error} imply the use of poll() on output.
>>
>> I.e. from a high level -p just means to deal more appropriately with non file
>> outputs, and as part of that, dealing immediately with closed outputs would
>> be an improvement.

That seems reasonable to me.

>> Note also tail(1) enables this functionality by default. I'm not sure about
>> other utilities, but we can deal with that later if needed.

Carl Edquist via GNU coreutils General Discussion  writes:

>
> Thanks Pádraig for the feedback - that all sounds good.
>
> Will try to follow-up sometime this week...

If you prefer, I'll have some time in the latter part of this week too.

Let's not forget to include the testcase posted previously (with -p
instead of -P, since it was suggested to enable polling for -p):

( sleep 5 | (timeout 3 tee -p 2>/dev/null && echo TEST_PASSED >&8) | : ) 8>&1 | 
grep -qx TEST_PASSED

To annotate it, and let's include this info in a comment:

- sleep emulates misbehaving input.
- The timeout is our failure safety-net.
- We ignore stderr from tee, and should have no stdout anyway.
- If that succeeds, we print TEST_PASSED into FD 8 to grep for later.
  (FD 8 was selected by a D20 roll, or rather, a software emulation)
- The colon is the immediately closed output process.
- We redirect 8 back into stdout to grep it.

If tee fails, for instance because it times out, or it fails to
recognize -P for some reason, the echo simply won't run.  The grep
options are in POSIX (or, at least, in POSIX.1-2017).

Thank you both, have a great night.
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-13 Thread Carl Edquist via GNU coreutils General Discussion

On Tue, 13 Dec 2022, Pádraig Brady wrote:

Thanks to both of you for working through the details on this. This does 
seem like a useful feature, and would be appropriate to add.


A summarized patch set would be appreciated at this stage, thanks.

Re HAVE_INOTIFY, that's really a proxy for a linux kernel, and so would 
be most appropriately changed to:


 defined __linux__ || defined __ANDROID__

I'm thinking these hardcoded defines would be best for now at least as 
it covers the vast majority of systems, and avoids complicated (cross) 
compile time checks.


A modularised iopoll.c would be better, given the potential uses by 
other tools, though we'd probably release for just tee initially.


As for interface to this functionality I'm wondering if we could just 
have the existing tee {-p,--output-error} imply the use of poll() on 
output.


I.e. from a high level -p just means to deal more appropriately with non 
file outputs, and as part of that, dealing immediately with closed 
outputs would be an improvement.


Note also tail(1) enables this functionality by default. I'm not sure 
about other utilities, but we can deal with that later if needed.


Thanks Pádraig for the feedback - that all sounds good.

Will try to follow-up sometime this week...

Carl


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-13 Thread Pádraig Brady

On 12/12/2022 22:29, Carl Edquist wrote:

On Mon, 12 Dec 2022, Arsen Arsenović wrote:


Hi Carl,

Carl Edquist  writes:


[2. text/x-diff; 0001-tee-only-fstat-outputs-if-pipe_check-is-active.patch]...

[3. text/x-diff; 
0002-tee-skip-pipe-checks-if-input-is-always-ready-for-re.patch]...


Thanks for writing these, and the other patches.  I've once again been
stripped of time, but I think we've nailed the concept down for the most
part.  I think we should wait for Pádraig to voice his opinion at this
point.  Details can be ironed out later, and pretty easily too.


Sure...  Sorry for throwing so many patches at the list  :)

I did have in mind also to send a summarized patch series in a single
email to make reviewing everything a bit easier.  Pádraig is the
gatekeeper here, in any case.

The main question on my mind currently is still what's the best way to do
the platform preprocessor logic for poll vs select ... (assuming
HAVE_INOTIFY is only in tail.c for "tail -f" mode)

And, i don't know if this would help "sell" it or not, but locally i have
separated out the iopoll implementation to a separate iopoll.c/.h, and as
an example successfully added a --pipe-check option to cat and tr.  (Not
to pressure anyone to take it, but, sometimes it's easier to tell if you
like an idea or not when you have a patch to look at...)

Personally, i see this broken-pipe detection being useful in tee, cat,
sed, and grep for starters.  (Though i know sed and grep live outside
coreutils.)

In some ways i feel like tee is actually the most immediate use case
though.

cat and sed mostly came into play for me as a way to type commands into a
socket; though as it seems to me now, "broken-pipe" detection for a socket
may not even be possible.


Thanks to both of you for working through the details on this.
This does seem like a useful feature, and would be appropriate to add.

A summarized patch set would be appreciated at this stage, thanks.

Re HAVE_INOTIFY, that's really a proxy for a linux kernel,
and so would be most appropriately changed to:
  defined __linux__ || defined __ANDROID__
I'm thinking these hardcoded defines would be best for now at least
as it covers the vast majority of systems, and avoids complicated
(cross) compile time checks.

A modularised iopoll.c would be better, given the potential
uses by other tools, though we'd probably release for just tee initially.

As for interface to this functionality I'm wondering if we
could just have the existing tee {-p,--output-error} imply the use of poll() on 
output.
I.e. from a high level -p just means to deal more appropriately with non file 
outputs,
and as part of that, dealing immediately with closed outputs would be an 
improvement.
Note also tail(1) enables this functionality by default.
I'm not sure about other utilities, but we can deal with that later if needed.

thanks!
Pádraig



Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-12 Thread Carl Edquist via GNU coreutils General Discussion

On Mon, 12 Dec 2022, Arsen Arsenović wrote:


Hi Carl,

Carl Edquist  writes:


[2. text/x-diff; 0001-tee-only-fstat-outputs-if-pipe_check-is-active.patch]...

[3. text/x-diff; 
0002-tee-skip-pipe-checks-if-input-is-always-ready-for-re.patch]...


Thanks for writing these, and the other patches.  I've once again been 
stripped of time, but I think we've nailed the concept down for the most 
part.  I think we should wait for Pádraig to voice his opinion at this 
point.  Details can be ironed out later, and pretty easily too.


Sure...  Sorry for throwing so many patches at the list  :)

I did have in mind also to send a summarized patch series in a single 
email to make reviewing everything a bit easier.  Pádraig is the 
gatekeeper here, in any case.


The main question on my mind currently is still what's the best way to do 
the platform preprocessor logic for poll vs select ... (assuming 
HAVE_INOTIFY is only in tail.c for "tail -f" mode)


And, i don't know if this would help "sell" it or not, but locally i have 
separated out the iopoll implementation to a separate iopoll.c/.h, and as 
an example successfully added a --pipe-check option to cat and tr.  (Not 
to pressure anyone to take it, but, sometimes it's easier to tell if you 
like an idea or not when you have a patch to look at...)


Personally, i see this broken-pipe detection being useful in tee, cat, 
sed, and grep for starters.  (Though i know sed and grep live outside 
coreutils.)


In some ways i feel like tee is actually the most immediate use case 
though.


cat and sed mostly came into play for me as a way to type commands into a 
socket; though as it seems to me now, "broken-pipe" detection for a socket 
may not even be possible.



All for now.

Have a good one,

Carl


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-12 Thread Carl Edquist via GNU coreutils General Discussion

On Mon, 12 Dec 2022, Arsen Arsenović wrote:


Hi Rob,

Rob Landley  writes:


This sort of thing is why I added -i to toybox's "timeout" command:

-i  Only kill for inactivity (restart timeout when command produces output)

It runs the command's stdout through a pipe and does a poll() with the 
-i seconds value, and signals the program if the poll() expires.


The android guys found it useful, but I was waiting to hear back about 
"cut -DF" before bringing it up here...


That's interesting, might be worth adding to the GNU timeout, however, 
it's not appropriate for what I'm using tee for, since compiler 
processes could appear idle for a long time, if doing LTO for instance.


Thanks Rob for the idea.

Sounds like it could be useful if you know what an appropriate timeout 
ought to be ... though in the general data-driven case i feel like filters 
should only quit for relevant events like end-of-input or write failure. 
(The broken-pipe detection idea being an extension that detects the point 
when future writes will fail.)


Sounds like a neat idea to add to a timeout command (where you know the 
output timeout you want) if it can be made to work with arbitrary 
programs.


Carl


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-12 Thread Arsen Arsenović
Hi Carl,

Carl Edquist  writes:

> [2. text/x-diff; 0001-tee-only-fstat-outputs-if-pipe_check-is-active.patch]...
>
> [3. text/x-diff; 
> 0002-tee-skip-pipe-checks-if-input-is-always-ready-for-re.patch]...

Thanks for writing these, and the other patches.  I've once again been
stripped of time, but I think we've nailed the concept down for the most
part.  I think we should wait for Pádraig to voice his opinion at this
point.  Details can be ironed out later, and pretty easily too.

Thank you again, have a great day.
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-12 Thread Arsen Arsenović
Hi Rob,

Rob Landley  writes:

> This sort of thing is why I added -i to toybox's "timeout" command:
>
> -i  Only kill for inactivity (restart timeout when command produces 
> output)
>
> It runs the command's stdout through a pipe and does a poll() with the -i
> seconds value, and signals the program if the poll() expires.
>
> The android guys found it useful, but I was waiting to hear back about "cut 
> -DF"
> before bringing it up here...

That's interesting, might be worth adding to the GNU timeout, however,
it's not appropriate for what I'm using tee for, since compiler
processes could appear idle for a long time, if doing LTO for instance.

Thanks, have a great day.
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-11 Thread Rob Landley
On 12/9/22 08:26, Carl Edquist via GNU coreutils General Discussion wrote:
>>> Similar to the situation here, i was seeing things annoyingly look like 
>>> they are still 'alive' longer than they ought to be when providing 
>>> input from the terminal.
>>
>> Huh, I never tried that, honestly.
> 
> Here is a simple example:
> 
>  exec 3<> /dev/tcp/example.com/80
> 
>  echo_crlf () { printf "%s\r\n" "$@"; }
> 
>  { echo_crlf "GET / HTTP/1.1"
>echo_crlf "Host: example.com"
>echo_crlf "Connection: close"
>echo_crlf ""
>  } >&3
> 
>  cat <&3
> 
> In this example, i'm sending the input to the socket (on fd 3) directly 
> using the 'printf' builtin and shell redirection, and i request the server 
> close its side of the connection after this request.

This sort of thing is why I added -i to toybox's "timeout" command:

-i  Only kill for inactivity (restart timeout when command produces output)

It runs the command's stdout through a pipe and does a poll() with the -i
seconds value, and signals the program if the poll() expires.

The android guys found it useful, but I was waiting to hear back about "cut -DF"
before bringing it up here...

Rob



Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-10 Thread Carl Edquist

On Fri, 9 Dec 2022, Carl Edquist wrote:


On Fri, 9 Dec 2022, Arsen Arsenović wrote:

... Also, i suspect that the pipe_check option can be disabled if the 
_input_ is a regular file (or block device), since (i think) these 
always show up as ready for reading.  (This check would only need to 
be done once for fd 0 at program start.)


Yes, there's no point poll-driving those, since it'll be always 
readable, up to EOF, and never hesitate to bring more data.  It might 
just end up being a no-op if used in current form (but I haven't 
tried).


Well currently if the input is a regular file, poll() immediately 
returns POLLIN, along with any potential errors for the output.  But yes 
the net effective behavior is the same as if the poll() call had been 
skipped.


In a way i don't think it's so necessary (as it's just an optimization, 
and it's only relevant if the user calls tee with -P/--pipe-check despite 
stdin being a regular file or block device), but anyway my attached patch 
(0002-tee-skip-pipe-checks-if-input-is-always-ready-for-re.patch) adds 
logic to disable pipe_check mode in this case.  (Though in order not to 
change the SIGPIPE semantics, -P still implies -p.)



CarlFrom fb3227b0e252b897f60053a6ffae48eb6ec19e62 Mon Sep 17 00:00:00 2001
From: Carl Edquist 
Date: Sat, 10 Dec 2022 10:00:12 -0600
Subject: [PATCH 1/2] tee: only fstat outputs if pipe_check is active

* src/tee.c (tee_files): avoid populating out_is_pipe array if
pipe_check is not enabled.
---
 src/tee.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/tee.c b/src/tee.c
index 482967a..1f5c171 100644
--- a/src/tee.c
+++ b/src/tee.c
@@ -313,10 +313,12 @@ tee_files (int nfiles, char **files)
  In both arrays, entry 0 corresponds to standard output.  */
 
   descriptors = xnmalloc (nfiles + 1, sizeof *descriptors);
-  out_is_pipe = xnmalloc (nfiles + 1, sizeof *out_is_pipe);
+  if (pipe_check)
+out_is_pipe = xnmalloc (nfiles + 1, sizeof *out_is_pipe);
   files--;
   descriptors[0] = stdout;
-  out_is_pipe[0] = is_pipe (fileno (descriptors[0]));
+  if (pipe_check)
+out_is_pipe[0] = is_pipe (fileno (descriptors[0]));
   files[0] = bad_cast (_("standard output"));
   setvbuf (stdout, NULL, _IONBF, 0);
   n_outputs++;
@@ -325,7 +327,8 @@ tee_files (int nfiles, char **files)
 {
   /* Do not treat "-" specially - as mandated by POSIX.  */
   descriptors[i] = fopen (files[i], mode_string);
-  out_is_pipe[i] = is_pipe (fileno (descriptors[i]));
+  if (pipe_check)
+out_is_pipe[i] = is_pipe (fileno (descriptors[i]));
   if (descriptors[i] == NULL)
 {
   error (output_error == output_error_exit
@@ -397,7 +400,8 @@ tee_files (int nfiles, char **files)
   }
 
   free (descriptors);
-  free (out_is_pipe);
+  if (pipe_check)
+free (out_is_pipe);
 
   return ok;
 }
-- 
2.9.0

From 07853a6624ca805497ad5438dfc3092c3fdf41bc Mon Sep 17 00:00:00 2001
From: Carl Edquist 
Date: Sat, 10 Dec 2022 10:21:48 -0600
Subject: [PATCH 2/2] tee: skip pipe checks if input is always ready for
 reading

Regular files and block devices are always ready for reading, even
at EOF where they just return 0 bytes.  Thus if the input is one of
these, it's never necessary to poll outputs, since we will never be
in a situation stuck waiting for input while all remaining outputs
are broken pipes.

Note that -P/--pipe-check still implies -p in this case, otherwise
a SIGPIPE would kill the process early.

* src/tee.c (always_ready): helper function to test an fd file type.
(main): disable pipe_check if STDIN is always ready for reading.
---
 src/tee.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/src/tee.c b/src/tee.c
index 1f5c171..f09324e 100644
--- a/src/tee.c
+++ b/src/tee.c
@@ -47,6 +47,7 @@
 #define IOPOLL_ERROR -3
 
 static bool tee_files (int nfiles, char **files);
+static bool always_ready (int fd);
 
 /* If true, append to output files rather than truncating them. */
 static bool append;
@@ -184,6 +185,11 @@ main (int argc, char **argv)
   if (output_error != output_error_sigpipe)
 signal (SIGPIPE, SIG_IGN);
 
+  /* No need to poll outputs if input is always ready for reading.  */
+
+  if (pipe_check && always_ready (STDIN_FILENO))
+	  pipe_check = false;
+
   /* Do *not* warn if tee is given no file arguments.
  POSIX requires that it work when given no arguments.  */
 
@@ -276,13 +282,25 @@ fail_output(FILE **descriptors, char **files, int i)
   return fail;
 }
 
-/* Return true if fd refers to a pipe */
+/* Return true if fd refers to a pipe.  */
 
 static bool
 is_pipe(int fd)
 {
 	struct stat st;
-	return fstat (fd, ) == 0 && S_ISFIFO(st.st_mode);
+	return fstat (fd, ) == 0 && S_ISFIFO (st.st_mode);
+}
+
+
+/* Return true if fd is always ready for reading,
+   ie, it is a regular file or block device.  */
+
+static bool
+always_ready(int fd)
+{
+	struct stat st;
+	return fstat (fd, ) == 0 && (S_ISREG 

Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-10 Thread Carl Edquist

On Fri, 9 Dec 2022, Carl Edquist wrote:


 A quick note, this check only needs to be done a total of once per
 output, it shouldn't be done inside iopoll(), which would result in
 an additional redundant fstat() per read().


 Could this be handled by get_next_out?


Sure, either in that function or immediately after it gets called.  But 
also once for stdout before the while (n_outputs) loop.  Alternatively, 
allocate a file type array and populate it in the for loop that does all 
the fopen() calls.


Patch attached for this.  I ended up going with a bool array to keep track 
of whether each output is a pipe or not.  Potentially a very small amount 
of extra work up front, but it seemed to make implementing the rest of it 
simpler.


CarlFrom 488e1398da6a15b70399d52346b05d9770db4732 Mon Sep 17 00:00:00 2001
From: Carl Edquist 
Date: Sat, 10 Dec 2022 09:31:03 -0600
Subject: [PATCH] tee: only do iopoll for pipe outputs

We can skip the broken-pipe detection for file types other than pipes.
Sockets behave similarly to pipes in some ways, but the semantics
appear to be different in important ways, such that poll() doesn't
detect when the remote read end is closed.

* src/tee.c (is_pipe): add helper function detecting if fd is a pipe.
* (tee_files): populate out_is_pipe bool array corresponding to whether
items in descriptors array are pipes, only do pipe_check/iopoll() logic
for a given output if it is a pipe.
---
 src/tee.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/tee.c b/src/tee.c
index b254466..482967a 100644
--- a/src/tee.c
+++ b/src/tee.c
@@ -276,6 +276,15 @@ fail_output(FILE **descriptors, char **files, int i)
   return fail;
 }
 
+/* Return true if fd refers to a pipe */
+
+static bool
+is_pipe(int fd)
+{
+	struct stat st;
+	return fstat (fd, ) == 0 && S_ISFIFO(st.st_mode);
+}
+
 /* Copy the standard input into each of the NFILES files in FILES
and into the standard output.  As a side effect, modify FILES[-1].
Return true if successful.  */
@@ -285,6 +294,7 @@ tee_files (int nfiles, char **files)
 {
   size_t n_outputs = 0;
   FILE **descriptors;
+  bool *out_is_pipe;
   char buffer[BUFSIZ];
   ssize_t bytes_read = 0;
   int i;
@@ -303,8 +313,10 @@ tee_files (int nfiles, char **files)
  In both arrays, entry 0 corresponds to standard output.  */
 
   descriptors = xnmalloc (nfiles + 1, sizeof *descriptors);
+  out_is_pipe = xnmalloc (nfiles + 1, sizeof *out_is_pipe);
   files--;
   descriptors[0] = stdout;
+  out_is_pipe[0] = is_pipe (fileno (descriptors[0]));
   files[0] = bad_cast (_("standard output"));
   setvbuf (stdout, NULL, _IONBF, 0);
   n_outputs++;
@@ -313,6 +325,7 @@ tee_files (int nfiles, char **files)
 {
   /* Do not treat "-" specially - as mandated by POSIX.  */
   descriptors[i] = fopen (files[i], mode_string);
+  out_is_pipe[i] = is_pipe (fileno (descriptors[i]));
   if (descriptors[i] == NULL)
 {
   error (output_error == output_error_exit
@@ -329,7 +342,7 @@ tee_files (int nfiles, char **files)
 
   while (n_outputs)
 {
-  if (pipe_check)
+  if (pipe_check && out_is_pipe[first_out])
 {
   int err = iopoll (STDIN_FILENO, fileno (descriptors[first_out]));
 
@@ -384,6 +397,7 @@ tee_files (int nfiles, char **files)
   }
 
   free (descriptors);
+  free (out_is_pipe);
 
   return ok;
 }
-- 
2.9.0



Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-10 Thread Carl Edquist via GNU coreutils General Discussion

On Fri, 9 Dec 2022, Carl Edquist wrote:


On Fri, 9 Dec 2022, Arsen Arsenović wrote:

Originally i had imagined (or hoped) that this broken-pipe detection 
could also be used for sockets (that was how the issue came up for 
me), but it seems the semantics for sockets are different than for 
pipes.


This might require POLLPRI or POLLRDHUP or such.  Can you try with 
those to the set of events in pollfd?


Oh interesting!  I had assumed these wouldn't help - POLLPRI is for OOB data, 
and, I had assumed POLLRDHUP was only for polling the read side of a socket 
(thus POLL*RD*HUP), to determine if the remote write end was shutdown.  But 
you are right, poll() returns with POLLRDHUP in revents as soon as the remote 
end is closed.  Thanks for the tip!


I assume the reason this works is, even though i have in mind that i am 
monitoring the socket as an output, the socket serves as an input also (it's 
open for RW).  So, if my interpretation is correct, POLLRDHUP is actually 
monitoring the local read side of the socket.  And so, poll() is actually 
detecting the remote write side getting shutdown.


This is not technically the same thing as monitoring the local output side, 
but if the remote end of the socket closes entirely (shutting down the remote 
read and write ends together), then that tells us about the local output side 
as well.


This definitely seems to work for the case i was playing with, though i'm not 
sure if it would behave as intended if the remote side only shut down its 
read or write end (but not both).


Alright, i got a chance to play around with this more locally with 
socketpairs.  Unfortunately, i've confirmed that POLLRDHUP is clearly 
intended for monitoring the local read end of a socket, not the local 
write end.  (Thus it is not what we want for monitoring an output.)


In each case below, i started with a socket open for RW on both ends, and 
poll()ed the local socket fd with POLLRDHUP.  I observed the following:


- shutdown local read -> POLLRDHUP
- shutdown local write -> nothing detected
- shutdown remote read -> nothing detected
- shutdown remote write -> POLLRDHUP

This is exactly the reverse of what we want when monitoring a socket as an 
output.  Shutting down the local read end should be fine (since we are 
using it for output).  Shutting down the local write end should be 
considered like a broken pipe, since nothing further can be sent. 
Shutting down the remote read end should also be considered like a broken 
pipe, as with a normal pipe.  Shutting down the remote write end should be 
fine, since supposedly the local end of the socket is used for output.


Again, the reason POLLRDHUP worked in my case is that the remote end 
closed completely (shutting down remote read and write).  But for cases 
where either end (local or remote) shuts down only read or write, using 
POLLRDHUP checks for exactly the wrong thing.


It seems that there ought to be a "POLLWRHUP", but as far as i can tell 
that does not exist.


...

My conclusion (as far as i can tell) is that we can't do this iopoll jazz 
correctly for sockets.


The up side though is that that makes it ok to explicitly check that the 
output is a pipe, and then only worry about pipe semantics.  Which makes 
the select-based implementation safe, since pipe fds are not open for RW.


Carl


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-09 Thread Carl Edquist via GNU coreutils General Discussion

Hi Pádraig,

Getting back to this portability question:

On Fri, 2 Dec 2022, Pádraig Brady wrote:


 Anyway if it's possible just to use poll(2) (the system one, not the
 gnulib replacement), that might simplify the portability logic.


Yes it would be better to use poll() if possible,
and that's what I attempted with tail(1) recently.
From the gnulib docs on poll():

"
 Portability problems fixed by Gnulib:

  This function is missing on some platforms:
  mingw, MSVC 14, HP NonStop.

  This function doesn't work on special files like @file{/dev/null} and ttys
  like
  @file{/dev/tty} on some platforms:
  AIX 5.3, Mac OS X 10.4.0

Portability problems not fixed by Gnulib:

  Under Windows, when passing a pipe, Gnulib's @code{poll} replacement might
  return 0 even before the timeout has passed.  Programs using it with pipes
  can
  thus busy wait.

  On some platforms, file descriptors other than sockets do not support
  POLLHUP; they will return a "readable" or "writable" status instead:
  AIX 7.2, HP NonStop.
"

So to use poll() everywhere we'd need the gnulib module.
But empirically the replacement didn't work on macos at least
for this use case, and we know the select emulation wouldn't work on AIX.

So portable use of poll() will be awkward.


Don't remember if i said so already or not, but thanks for the summary!

In my last patch i made a first attempt to define a 
HAVE_POLL_PIPE_CLOSE_DETECTION in configure.ac using AC_RUN_IFELSE to test 
poll() functionality.  Arsen pointed out that this does not work for 
cross-compiling.  (That's too bad, but, i think i understand now why it's 
a problem.)


We could still do just an AC_COMPILE_IFELSE to test whether native poll() 
is available, though if i understand your summary right there are a number 
of platforms where native poll() is available but won't work correctly 
here.


So, if we need to separate out the poll() vs select() implementations 
using a list of platforms, do we have a good idea what list of platforms 
are "ok for native poll pipe detection", or alternatively a list of 
platforms that are not ok?


So tail.c uses:

#if defined _AIX || defined __sun || defined __APPLE__ || HAVE_INOTIFY

but as far as i can tell HAVE_INOTIFY relates to the use of inotify in 
tail, which is specific to "tail -f" mode, and perhaps not relevant to the 
logic in tee (or other filters).


The section you quoted "From the gnulib docs on poll()" makes it sound 
like there might be problems with: Windows, mingw, MSVC 14, HP NonStop, 
AIX 5.3 & 7.2, and Mac OS X 10.4.0.  (Already that makes me wonder about 
_AIX and __APPLE__ being in the list of platforms to use poll().)


Anyway, just trying to get a handle on what might be a good way to do the 
preprocessor macro logic to decide between the poll() and select() 
implementations.


I guess in worst case, it sounds like select() could be used everywhere. 
And there is some appeal to having a single implementation.  But select() 
does seem less efficient if poll() is available; for instance with 
select(), two fd_sets need to be zeroed out (128 bytes each) before every 
call to read(), compared to the poll() version that only needs to 
initialize two struct pollfds (8 bytes each).  So i still see some value 
in doing poll() where possible.



Any thoughts on how to put together preprocessor logic that gets at the 
heart of the issue?


Thanks,
Carl


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-09 Thread Carl Edquist via GNU coreutils General Discussion

On Fri, 9 Dec 2022, Arsen Arsenović wrote:

Similar to the situation here, i was seeing things annoyingly look like 
they are still 'alive' longer than they ought to be when providing 
input from the terminal.


Huh, I never tried that, honestly.


Here is a simple example:

exec 3<> /dev/tcp/example.com/80

echo_crlf () { printf "%s\r\n" "$@"; }

{ echo_crlf "GET / HTTP/1.1"
  echo_crlf "Host: example.com"
  echo_crlf "Connection: close"
  echo_crlf ""
} >&3

cat <&3


In this example, i'm sending the input to the socket (on fd 3) directly 
using the 'printf' builtin and shell redirection, and i request the server 
close its side of the connection after this request.


But if i want to type the request by hand, i might do:

sed -u 's/$/\r/' >&3

to read from the terminal, convert line terminators to CRLF, and write to 
the socket.  Only, similar to the situation with tee and pipes, sed here 
will not detect when the remote read end of the socket has closed, so it 
hangs waiting for more input.



Did polling help your use-case?


Mostly no, as i kind of got into later last time.  It seems the semantics 
are different for sockets than pipes; for some reason it seems the socket 
doesn't behave like a broken pipe until after the remote read end shuts 
down and then another write is made to the local end.  Only after these 
does poll() seem to detect the broken-pipe -like state.


Oh interesting.  That wasn't on my radar at all.  I guess this means 
that when cross-compiling, the configure script is run on the 
cross-compiling host, rather than on the target platform; so any test 
programs in configure.ac with AC_RUN_IFELSE don't necessarily check the 
target platform functionality (?)


Or worse, is unable to run at all (and always fails), if the binary is 
for a different kernel or architecture.


Ok, so i guess for the sake of cross-compiling we are limited to 
compile/link checks.


Originally i had imagined (or hoped) that this broken-pipe detection 
could also be used for sockets (that was how the issue came up for me), 
but it seems the semantics for sockets are different than for pipes.


This might require POLLPRI or POLLRDHUP or such.  Can you try with those 
to the set of events in pollfd?


Oh interesting!  I had assumed these wouldn't help - POLLPRI is for OOB 
data, and, I had assumed POLLRDHUP was only for polling the read side of a 
socket (thus POLL*RD*HUP), to determine if the remote write end was 
shutdown.  But you are right, poll() returns with POLLRDHUP in revents as 
soon as the remote end is closed.  Thanks for the tip!


I assume the reason this works is, even though i have in mind that i am 
monitoring the socket as an output, the socket serves as an input also 
(it's open for RW).  So, if my interpretation is correct, POLLRDHUP is 
actually monitoring the local read side of the socket.  And so, poll() is 
actually detecting the remote write side getting shutdown.


This is not technically the same thing as monitoring the local output 
side, but if the remote end of the socket closes entirely (shutting down 
the remote read and write ends together), then that tells us about the 
local output side as well.


This definitely seems to work for the case i was playing with, though i'm 
not sure if it would behave as intended if the remote side only shut down 
its read or write end (but not both).


Also, my bits/poll.h seems to suggest this is a Linux extension:

#ifdef __USE_GNU
/* These are extensions for Linux.  */
# define POLLMSG0x400
# define POLLREMOVE 0x1000
# define POLLRDHUP  0x2000
#endif


Anyway, this is all Good To Know.  I don't know what the semantics with 
poll() for sockets are supposed to be in the general case (beyond Linux), 
so i don't feel i'm in a good position to advocate for it in coreutils. 
But maybe someone who knows better can comment on this topic.


A quick note, this check only needs to be done a total of once per 
output, it shouldn't be done inside iopoll(), which would result in an 
additional redundant fstat() per read().


Could this be handled by get_next_out?


Sure, either in that function or immediately after it gets called.  But 
also once for stdout before the while (n_outputs) loop.  Alternatively, 
allocate a file type array and populate it in the for loop that does all 
the fopen() calls.


Either way, the idea would be to then replace

if (pipe_check)

with something like

if (pipe_check && first_out_is_pipe)

or (with a file type array)

if (pipe_check && S_ISFIFO (output_types[first_out]))


... Also, i suspect that the pipe_check option can be disabled if the 
_input_ is a regular file (or block device), since (i think) these 
always show up as ready for reading.  (This check would only need to be 
done once for fd 0 at program start.)


Yes, there's no point poll-driving those, since it'll be always 
readable, up to EOF, and never hesitate to bring more data.  It might 

Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-09 Thread Arsen Arsenović

Carl Edquist  writes:

> On Thu, 8 Dec 2022, Arsen Arsenović wrote:
>
>> Apologies for my absence, Tuesdays and Wednesdays are long workdays for me.
>
> No need for apologies - I feel like i am the one who should apologize for my
> high volume of email to the list.  People have lives after all!  :)
>
> The timing of this thread caught my attention because I had recently been
> wrestling with a similar issue, trying to use shell utils to talk over a 
> socket
> with the help of bash's /dev/tpc/host/port interface.
>
> Similar to the situation here, i was seeing things annoyingly look like they
> are still 'alive' longer than they ought to be when providing input from the
> terminal.

Huh, I never tried that, honestly.  Did polling help your use-case?

>
>>> Biggest item is making a new configure macro based on whether poll() is
>>> present and and works as intended for pipes.  With 0 timeout, polling the
>>> write-end of a pipe that is open on both ends for errors does not indicate a
>>> broken pipe; but polling the write-end of a pipe with the read-end closed
>>> does indicate a broken pipe.
>>
>> This might be a bit problematic when cross compiling (which is why I imagine
>> systems were hard-coded before).
>
> Oh interesting.  That wasn't on my radar at all.  I guess this means that when
> cross-compiling, the configure script is run on the cross-compiling host,
> rather than on the target platform; so any test programs in configure.ac with
> AC_RUN_IFELSE don't necessarily check the target platform functionality (?)

Or worse, is unable to run at all (and always fails), if the binary is
for a different kernel or architecture.

> That's too bad.  I had hoped to come up with a better way to indicate a 
> working
> poll() for this feature than maintaining a list of platforms.
>
>
 So I guess (on Linux at least) that means a "readable event on STDOUT" is
 equivalent to (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP | POLLERR).

 So, it'll catch the relevant poll() errors (POLLHUP | POLLERR), but the
 inclusion of POLLIN results in the gotcha that it will be a false positive
 if stdout is already open for RW (eg a socket) and there is actually data
 ready.
>>
>> Ah - yes.  tail.c guards against this by checking the type of the file
>> descriptor before selecting it, and makes sure it's among the "one-way"
>> file descriptors:
>>
>>  if (forever && ignore_fifo_and_pipe (F, n_files))
>>{
>>  /* If stdout is a fifo or pipe, then monitor it
>> so that we exit if the reader goes away.  */
>>  struct stat out_stat;
>>  if (fstat (STDOUT_FILENO, _stat) < 0)
>>die (EXIT_FAILURE, errno, _("standard output"));
>>  monitor_output = (S_ISFIFO (out_stat.st_mode)
>>|| (HAVE_FIFO_PIPES != 1 && isapipe (STDOUT_FILENO)));
>>
>> Good catch!  It completely slipped by my mind.
>
> Ah, yeah if we know it's a pipe we shouldn't have to worry about an output
> being open for RW.
>
> Originally i had imagined (or hoped) that this broken-pipe detection could 
> also
> be used for sockets (that was how the issue came up for me), but it seems the
> semantics for sockets are different than for pipes.

This might require POLLPRI or POLLRDHUP or such.  Can you try with those
to the set of events in pollfd?

> Experimentally, it seems that once the remote read end of the socket is
> shutdown, poll() does not detect a broken pipe - it will wait indefinitely.
> But at this point if a write() is done on the local end of the socket, the
> first write() will succeed, and then _after_ this it will behave like a broken
> pipe - poll() returns POLLERR|POLLHUP, and write() results in SIGPIPE/EPIPE.
>
> It's fairly confusing.  But it seems due to the difference in semantics with
> sockets, likely this broken-pipe detection will only really work properly for
> pipes.
>
> So yeah, back to your point, there is a little room for improvement here by
> fstat()ing the output and only doing the iopoll() waiting if the output is a
> pipe.
>
> A quick note, this check only needs to be done a total of once per output, it
> shouldn't be done inside iopoll(), which would result in an additional
> redundant fstat() per read().

Could this be handled by get_next_out?

> ... Also, i suspect that the pipe_check option can be disabled if the _input_
> is a regular file (or block device), since (i think) these always show up as
> ready for reading.  (This check would only need to be done once for fd 0 at
> program start.)

Yes, there's no point poll-driving those, since it'll be always
readable, up to EOF, and never hesitate to bring more data.  It might
just end up being a no-op if used in current form (but I haven't tried).

> But ... one step at a time!  :)
>
>
> Carl

Have a great day.
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-08 Thread Carl Edquist via GNU coreutils General Discussion

On Thu, 8 Dec 2022, Arsen Arsenović wrote:

Apologies for my absence, Tuesdays and Wednesdays are long workdays for 
me.


No need for apologies - I feel like i am the one who should apologize for 
my high volume of email to the list.  People have lives after all!  :)


The timing of this thread caught my attention because I had recently been 
wrestling with a similar issue, trying to use shell utils to talk over a 
socket with the help of bash's /dev/tpc/host/port interface.


Similar to the situation here, i was seeing things annoyingly look like 
they are still 'alive' longer than they ought to be when providing input 
from the terminal.



Biggest item is making a new configure macro based on whether poll() is 
present and and works as intended for pipes.  With 0 timeout, polling 
the write-end of a pipe that is open on both ends for errors does not 
indicate a broken pipe; but polling the write-end of a pipe with the 
read-end closed does indicate a broken pipe.


This might be a bit problematic when cross compiling (which is why I 
imagine systems were hard-coded before).


Oh interesting.  That wasn't on my radar at all.  I guess this means that 
when cross-compiling, the configure script is run on the cross-compiling 
host, rather than on the target platform; so any test programs in 
configure.ac with AC_RUN_IFELSE don't necessarily check the target 
platform functionality (?)


That's too bad.  I had hoped to come up with a better way to indicate a 
working poll() for this feature than maintaining a list of platforms.




So I guess (on Linux at least) that means a "readable event on STDOUT" is
equivalent to (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP | POLLERR).

So, it'll catch the relevant poll() errors (POLLHUP | POLLERR), but 
the inclusion of POLLIN results in the gotcha that it will be a false 
positive if stdout is already open for RW (eg a socket) and there is 
actually data ready.


Ah - yes.  tail.c guards against this by checking the type of the file
descriptor before selecting it, and makes sure it's among the "one-way"
file descriptors:

 if (forever && ignore_fifo_and_pipe (F, n_files))
   {
 /* If stdout is a fifo or pipe, then monitor it
so that we exit if the reader goes away.  */
 struct stat out_stat;
 if (fstat (STDOUT_FILENO, _stat) < 0)
   die (EXIT_FAILURE, errno, _("standard output"));
 monitor_output = (S_ISFIFO (out_stat.st_mode)
   || (HAVE_FIFO_PIPES != 1 && isapipe (STDOUT_FILENO)));

Good catch!  It completely slipped by my mind.


Ah, yeah if we know it's a pipe we shouldn't have to worry about an output 
being open for RW.


Originally i had imagined (or hoped) that this broken-pipe detection could 
also be used for sockets (that was how the issue came up for me), but it 
seems the semantics for sockets are different than for pipes.


Experimentally, it seems that once the remote read end of the socket is 
shutdown, poll() does not detect a broken pipe - it will wait 
indefinitely.  But at this point if a write() is done on the local end of 
the socket, the first write() will succeed, and then _after_ this it will 
behave like a broken pipe - poll() returns POLLERR|POLLHUP, and write() 
results in SIGPIPE/EPIPE.


It's fairly confusing.  But it seems due to the difference in semantics 
with sockets, likely this broken-pipe detection will only really work 
properly for pipes.


So yeah, back to your point, there is a little room for improvement here 
by fstat()ing the output and only doing the iopoll() waiting if the output 
is a pipe.


A quick note, this check only needs to be done a total of once per output, 
it shouldn't be done inside iopoll(), which would result in an additional 
redundant fstat() per read().


... Also, i suspect that the pipe_check option can be disabled if the 
_input_ is a regular file (or block device), since (i think) these always 
show up as ready for reading.  (This check would only need to be done once 
for fd 0 at program start.)


But ... one step at a time!  :)


Carl


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-08 Thread Arsen Arsenović
Hi Carl,

Apologies for my absence, Tuesdays and Wednesdays are long workdays for
me.

Carl Edquist  writes:

> Alright, lest I be guilty of idle nay-saying, I've attached another patch to
> address all of my complaints.
>
> (Apply it after Arsen's last one, which comes after my previous one. Otherwise
> if desired I can send a single summary patch.)
>
> Biggest item is making a new configure macro based on whether poll() is 
> present
> and and works as intended for pipes.  With 0 timeout, polling the write-end of
> a pipe that is open on both ends for errors does not indicate a broken pipe;
> but polling the write-end of a pipe with the read-end closed does indicate a
> broken pipe.

This might be a bit problematic when cross compiling (which is why I
imagine systems were hard-coded before).

> Beyond that, I revised the select()-based implementation of iopoll() to 
> address
> my previous comments.  Sorry I got my grubby hands all over it.
> I do hope you'll like it though! :)

Thanks :)

>>>  (4.)
>>>
   +  /* readable event on STDOUT is equivalent to POLLERR,
   + and implies an error condition on output like broken pipe.  */
>>>
>>>  I know this is what the comment from tail.c says, but is it actually
>>>  documented to be true somewhere?  And on what platforms?  I don't see it
>>>  being documented in my select(2) on Linux, anyway.  (Though it does seem
>>>  to work.)  Wondering if this behavior is "standard".

/me shrugs

I cannot speak for many platforms, so I just opted to follow what tail.c
already did.

>> Ah!
>>
>> Well, it's not documented in my (oldish) select(2), but I do find the
>> following in a newer version of that manpage:
>>
>>
>>>  Correspondence between select() and poll() notifications
>>>
>>>   Within the Linux kernel source, we find the following definitions which
>>>   show the correspondence between the readable, writable, and exceptional
>>>   condition  notifications  of  select() and the event notifications pro-
>>>   vided by poll(2) (and epoll(7)):
>>>
>>>   #define POLLIN_SET (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP |
>>>  POLLERR)
>>>  /* Ready for reading */
>>>   #define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR)
>>>  /* Ready for writing */
>>>   #define POLLEX_SET (POLLPRI)
>>>  /* Exceptional condition */
>>> 
>>
>>
>> So I guess (on Linux at least) that means a "readable event on STDOUT" is
>> equivalent to (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP | POLLERR).
>>
>> So, it'll catch the relevant poll() errors (POLLHUP | POLLERR), but the
>> inclusion of POLLIN results in the gotcha that it will be a false positive if
>> stdout is already open for RW (eg a socket) and there is actually data ready.

Ah - yes.  tail.c guards against this by checking the type of the file
descriptor before selecting it, and makes sure it's among the "one-way"
file descriptors:

  if (forever && ignore_fifo_and_pipe (F, n_files))
{
  /* If stdout is a fifo or pipe, then monitor it
 so that we exit if the reader goes away.  */
  struct stat out_stat;
  if (fstat (STDOUT_FILENO, _stat) < 0)
die (EXIT_FAILURE, errno, _("standard output"));
  monitor_output = (S_ISFIFO (out_stat.st_mode)
|| (HAVE_FIFO_PIPES != 1 && isapipe (STDOUT_FILENO)));

Good catch!  It completely slipped by my mind.

>> Also, the POLLEX_SET definition of (POLLPRI) doesn't seem relevant; so I
>> might suggest removing the 'xfd' arg for the select()-based implementation:
>>
>>>  POLLPRI
>>>
>>>There is some exceptional  condition  on  the  file  descriptor.
>>>Possibilities include:
>>>
>>>*  There is out-of-band data on a TCP socket (see tcp(7)).
>>>
>>>*  A  pseudoterminal  master  in  packet  mode  has seen a state
>>>   change on the slave (see ioctl_tty(2)).
>>>
>>>*  A cgroup.events file has been modified (see cgroups(7)).

Yes, adding POLLPRI and xfds is likely excessive.  I did the former
while quite tired (so, under a misunderstanding), and the latter was a
translation of the former.

In any case, iopoll appears to be the path ahead, regardless of
implementation details.

Thanks again.
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-07 Thread Carl Edquist
Alright, lest I be guilty of idle nay-saying, I've attached another patch 
to address all of my complaints.


(Apply it after Arsen's last one, which comes after my previous one. 
Otherwise if desired I can send a single summary patch.)


Biggest item is making a new configure macro based on whether poll() is 
present and and works as intended for pipes.  With 0 timeout, polling the 
write-end of a pipe that is open on both ends for errors does not indicate 
a broken pipe; but polling the write-end of a pipe with the read-end 
closed does indicate a broken pipe.


Hopefully this test covers all platforms.  The only part that I'm iffy 
about was this bit that Pádraig mentioned:



Portability problems fixed by Gnulib:

 This function doesn't work on special files like @file{/dev/null} and
 ttys like @file{/dev/tty} on some platforms: AIX 5.3, Mac OS X 10.4.0


If indeed there are issues on these platforms using poll() against these 
special files (character devices), I suspect we can just test the relevant 
behavior in the test program for the new configure macro.  (eg, poll() for 
/dev/null should always show POLLIN when open for reading, and should not 
returning any errors when open for writing.)



Beyond that, I revised the select()-based implementation of iopoll() to 
address my previous comments.  Sorry I got my grubby hands all over it.

I do hope you'll like it though! :)


Cheers,
Carl


On Tue, 6 Dec 2022, Carl Edquist wrote:


 (4.)


  +  /* readable event on STDOUT is equivalent to POLLERR,
  + and implies an error condition on output like broken pipe.  */


 I know this is what the comment from tail.c says, but is it actually
 documented to be true somewhere?  And on what platforms?  I don't see it
 being documented in my select(2) on Linux, anyway.  (Though it does seem
 to work.)  Wondering if this behavior is "standard".


Ah!

Well, it's not documented in my (oldish) select(2), but I do find the 
following in a newer version of that manpage:




 Correspondence between select() and poll() notifications

  Within the Linux kernel source, we find the following definitions which
  show the correspondence between the readable, writable, and exceptional
  condition  notifications  of  select() and the event notifications pro-
  vided by poll(2) (and epoll(7)):

  #define POLLIN_SET (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP |
 POLLERR)
 /* Ready for reading */
  #define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR)
 /* Ready for writing */
  #define POLLEX_SET (POLLPRI)
 /* Exceptional condition */




So I guess (on Linux at least) that means a "readable event on STDOUT" is 
equivalent to (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP | POLLERR).


So, it'll catch the relevant poll() errors (POLLHUP | POLLERR), but the 
inclusion of POLLIN results in the gotcha that it will be a false positive if 
stdout is already open for RW (eg a socket) and there is actually data ready.


Also, the POLLEX_SET definition of (POLLPRI) doesn't seem relevant; so I 
might suggest removing the 'xfd' arg for the select()-based implementation:



 POLLPRI

   There is some exceptional  condition  on  the  file  descriptor.
   Possibilities include:

   *  There is out-of-band data on a TCP socket (see tcp(7)).

   *  A  pseudoterminal  master  in  packet  mode  has seen a state
  change on the slave (see ioctl_tty(2)).

   *  A cgroup.events file has been modified (see cgroups(7)).



Of course, those definitions live in the Linux kernel source, and on Linux 
we'd get to use poll() instead of select().


Don't know if the select() <-> poll() correspondence differs at all on other 
platforms .. ?



(But, this does give me a bit more confidence about select() being a (mostly) 
viable alternative.)



Carl

From 26f991a2a3d8c3d4ea6300b5e00e47276d7544be Mon Sep 17 00:00:00 2001
From: Carl Edquist 
Date: Wed, 7 Dec 2022 09:01:35 -0600
Subject: [PATCH] tee: define and use HAVE_POLL_PIPE_CLOSE_DETECTION

It's preferable to use native poll(2) wherever possible for broken pipe
detection, otherwise fall back to using select(2).  Rather than doing
the preprocessor logic using a collection of macros representing
different platforms, simply define a macro at configure time based on
whether or not poll can be used for this purpose.

While we're at it, revise the select()-based implementation of iopoll().
('xfd' shouldn't be used; ret isn't needed; some comments needed
clarification; and a few items were arranged to match the poll()-based
version of iopoll().)

* configure.ac (HAVE_POLL_PIPE_CLOSE_DETECTION): New macro based on
program testing whether poll() can be used to detect broken pipes.
* src/tee.c (headers): include sys/select.h when using select()-based
implementation of iopoll().
(iopoll): use HAVE_POLL_PIPE_CLOSE_DETECTION macro to determine poll()
or select()-based implementation; revise 

Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-06 Thread Carl Edquist

On Tue, 6 Dec 2022, Carl Edquist wrote:


(4.)


 +  /* readable event on STDOUT is equivalent to POLLERR,
 + and implies an error condition on output like broken pipe.  */


I know this is what the comment from tail.c says, but is it actually 
documented to be true somewhere?  And on what platforms?  I don't see it 
being documented in my select(2) on Linux, anyway.  (Though it does seem 
to work.)  Wondering if this behavior is "standard".


Ah!

Well, it's not documented in my (oldish) select(2), but I do find the 
following in a newer version of that manpage:




Correspondence between select() and poll() notifications

 Within the Linux kernel source, we find the following definitions which
 show the correspondence between the readable, writable, and exceptional
 condition  notifications  of  select() and the event notifications pro-
 vided by poll(2) (and epoll(7)):

 #define POLLIN_SET (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP |
 POLLERR)
/* Ready for reading */
 #define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR)
/* Ready for writing */
 #define POLLEX_SET (POLLPRI)
/* Exceptional condition */




So I guess (on Linux at least) that means a "readable event on STDOUT" is 
equivalent to (POLLRDNORM | POLLRDBAND | POLLIN | POLLHUP | POLLERR).


So, it'll catch the relevant poll() errors (POLLHUP | POLLERR), but the 
inclusion of POLLIN results in the gotcha that it will be a false positive 
if stdout is already open for RW (eg a socket) and there is actually data 
ready.


Also, the POLLEX_SET definition of (POLLPRI) doesn't seem relevant; so I 
might suggest removing the 'xfd' arg for the select()-based 
implementation:



POLLPRI

  There is some exceptional  condition  on  the  file  descriptor.
  Possibilities include:

  *  There is out-of-band data on a TCP socket (see tcp(7)).

  *  A  pseudoterminal  master  in  packet  mode  has seen a state
 change on the slave (see ioctl_tty(2)).

  *  A cgroup.events file has been modified (see cgroups(7)).



Of course, those definitions live in the Linux kernel source, and on Linux 
we'd get to use poll() instead of select().


Don't know if the select() <-> poll() correspondence differs at all on 
other platforms .. ?



(But, this does give me a bit more confidence about select() being a 
(mostly) viable alternative.)



Carl



Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-06 Thread Carl Edquist via GNU coreutils General Discussion

Hi Arsen, thanks for your feedback!

On Tue, 6 Dec 2022, Arsen Arsenović wrote:

The stubborn part of me might say, for platforms that do not natively 
support poll(2), we could simply leave out support for this feature. 
If that's not acceptable, we could add a select(2)-based fallback for 
platforms that do not have a native poll(2).


There's no need to omit it.  iopoll() seems sufficiently easy to 
implement via select():


So, you're right...

I think the stubborn part of me was reacting (badly) to what looked to be 
a messy situation in tail.c's mix of poll() and select().  I got further 
disenchanted when i saw gnulib's select()-based emulation of poll()  :(


But hey, if it works...


Anyway a few comments:

(1.)


+#if defined _AIX || defined __sun || defined __APPLE__ || HAVE_INOTIFY


I know this is what tail.c has, but I would like to seriously rethink 
this.  What do these macros have to do with poll()?  (And just as 
importantly, how can you tell by looking at it?)


First of all, I think poll(2) (the native system one, not the gnulib 
emulation) should be used wherever possible.  Which is to say, select() 
should only be used where native poll() is unavailable (or if it's 
"available" but broken in some important way).


I believe the HAVE_INOTIFY bit is specific to tail.c, which relates to its 
inotify mode.  I don't think it should be part of the equation here in 
tee.c.


The remaining macros _AIX, __sun, __APPLE__ -- do these cover all systems 
that have a working native poll() -- eg, Linux and the BSDs?  (I suspect 
not?)  And they do not clearly communicate anything to the reader about 
why they were picked / what they represent in relation to the availability 
of poll().


*Ideally,* there might be some configure macro like HAVE_POLL, except one 
that refers specifically to the system one and not the gnulib emulation. 
Something like "HAVE_NATIVE_poll".  Then it would be clear what the 
preprocessor logic is trying to accomplish.



(2.) if we want to use #if preprocessor logic (and preferably "#if 
HAVE_NATIVE_poll") for this poll/select fallback mechanism, I'd suggest to 
include each entire version of the function (including the function header 
and comments) in separate preprocessor blocks.  (One complete function in 
#if, and the other in #else.) The iopoll() implementation I provided is 
straightforward by itself, but its readability gets trampled badly when 
trying to mix the body of the select()-based implementation in with it. 
Just my 2 cents.



(3.)


+  FD_SET(fdout, );
+  FD_SET(fdin, );


Is it documented somewhere that the 'exceptfds' argument (xfd) to 
select(2) can be used in a useful way here?


The only thing I've seen (from select_tut(2)) is that these can be used to 
detect out-of-band data for reading on a socket, or that control status 
information is available for a pty in packet mode (tty_ioctl(4)).  I 
believe both of these would be false positives if encountered on input, 
and it's not clear what they would mean for output.



(4.)


+  /* readable event on STDOUT is equivalent to POLLERR,
+ and implies an error condition on output like broken pipe.  */


I know this is what the comment from tail.c says, but is it actually 
documented to be true somewhere?  And on what platforms?  I don't see it 
being documented in my select(2) on Linux, anyway.  (Though it does seem 
to work.)  Wondering if this behavior is "standard".



I guess there is also a little gotcha.  In tee, all the outputs *except* 
stdout are explicitly open()ed, but stdout itself is not, and therefore it 
inherits the open mode that fd 1 had when tee was exec()ed.


In particular, if fd 1 / stdout is a socket, open for RW, then a "readable 
event on STDOUT" could be a legitimate indication that data is ready for 
reading.  So that would be a false positive causing stdout to be removed 
even though successful writes are still possible.


(Maybe nobody will care about this, but it's an interesting corner-case.)

...

"But other than that", sure!  I guess I'm all for it  :)

Carl


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-06 Thread Arsen Arsenović

Arsen Arsenović  writes:

> From 582e0a27b7995aac90cc360463ec8bde1a44cfe4 Mon Sep 17 00:00:00 2001
> From: Paul Eggert 

^ Whoops, I forgot to fix this after committing with the wrong hash in
--reuse-commit.  I don't want to confuse anyone, I authored the patch.

Apologies.
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-06 Thread Arsen Arsenović
Hi Carl,

Carl Edquist  writes:

> Originally I had in mind to put the read() call inside the poll() loop. But if
> we keep this feature as an option, it felt it was a bit easier just to add the
> "if (pipe_check) {...}" block before the read().

Yes, I do agree that this is likely cleaner.

> For Pádraig, I think the same function & approach here could be used in other
> filters (cat for example).  The stubborn part of me might say, for platforms
> that do not natively support poll(2), we could simply leave out support for
> this feature.  If that's not acceptable, we could add a select(2)-based
> fallback for platforms that do not have a native poll(2).

There's no need to omit it.  iopoll() seems sufficiently easy to
implement via select():

From 582e0a27b7995aac90cc360463ec8bde1a44cfe4 Mon Sep 17 00:00:00 2001
From: Paul Eggert 
Date: Mon, 5 Dec 2022 18:42:19 -0800
Subject: [PATCH] tee: Support select fallback path
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

2022-12-06  Arsen Arsenović  

	iopoll: Support select fallback path
	* src/tee.c (iopoll): Add logic to enable select usage.
---
 src/tee.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/src/tee.c b/src/tee.c
index c17c5c788..7064ad27d 100644
--- a/src/tee.c
+++ b/src/tee.c
@@ -197,6 +197,7 @@ main (int argc, char **argv)
 static int
 iopoll(int fdin, int fdout)
 {
+#if defined _AIX || defined __sun || defined __APPLE__ || HAVE_INOTIFY
   struct pollfd pfds[2] = {{fdin, POLLIN, 0}, {fdout, 0, 0}};
 
   while (poll (pfds, 2, -1) > 0 || errno == EINTR)
@@ -206,7 +207,41 @@ iopoll(int fdin, int fdout)
   if (pfds[1].revents)/* POLLERR, POLLHUP, or POLLNVAL */
 return IOPOLL_BROKEN_OUTPUT;  /* output error or broken pipe   */
 }
+#else
+  int ret;
+  int bigger_fd = fdin > fdout ? fdin : fdout;
+  fd_set rfd;
+  fd_set xfd;
+  FD_ZERO();
+  FD_ZERO();
+  FD_SET(fdout, );
+  FD_SET(fdout, );
+  FD_SET(fdin, );
+  FD_SET(fdin, );
 
+  /* readable event on STDOUT is equivalent to POLLERR,
+ and implies an error condition on output like broken pipe.  */
+  while ((ret = select (bigger_fd + 1, , NULL, , NULL)) > 0
+	 || errno == EINTR)
+{
+  if (errno == EINTR)
+continue;
+
+  if (ret < 0)
+break;
+
+  if (FD_ISSET(fdout, ) || FD_ISSET(fdout, ))
+{
+  /* Implies broken fdout.  */
+  return IOPOLL_BROKEN_OUTPUT;
+}
+  else if (FD_ISSET(fdin, ) || FD_ISSET(fdin, ))
+{
+  /* Something on input.  Error handled in subsequent read.  */
+  return 0;
+}
+}
+#endif
   return IOPOLL_ERROR;  /* poll error */
 }
 
-- 
2.38.1


Note that I also needed to replace the ``/* falls through */'' comment
with [[fallthrough]]; to build your patch on gcc 12.2.1 20221008.
I'd guess there's some way to pick the correct marking method.

I tested both codepaths in the patch above on Linux.  I suggest adding
the test case I provided before to test on more platforms (and I'll give
some VMs a shot when I get home; currently in a lecture).

The API here seems quite general, I'd be surprised if other utils
couldn't make use of it too, though, maybe it should be given a slightly
more descriptive name (iopoll feels a bit broad, maybe select_inout ()
to signify that it makes a selection between one input or one output
exactly).

> Unique to tee is its multiple outputs.  The new get_next_out() helper simply
> advances to select the next remaining (ie, not-yet-removed) output. As
> described last time, it's sufficient to track a single output at a time, and
> perhaps it even simplifies the implementation.  It also avoids the need for a
> malloc() for the pollfd array before every read().

I think this is okay, I struggle to find a case where it couldn't work.

Note that removing polled files from a pollfd array does not require any
reallocation (just setting the fd to -1, as in the code I initially
posted), so there's no malloc either way ;).

Thanks for working on this, have a great day.
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-05 Thread Carl Edquist

On Fri, 2 Dec 2022, Carl Edquist wrote:

Although tee has multiple outputs, you only need to monitor a single 
output fd at a time.  Because the only case you actually need to catch 
is when the final valid output becomes a broken pipe.  (So I don't 
think it's necessary to poll(2) all the output fds together.)


That is technically true, but I think coupling this to two FDs might 
prove a bit inelegant in implementation (since you'd have to decide 
which entry from an unorganized list with holes do you pick?  any of 
them could spontaneously go away), so I'm not sure the implementation 
would be cleaner that way.


It'd be best to explain with a patch - I'll plan to send one later for a 
concrete example.


Ok it's patch time.  Attached!  Feedback welcome.


A few notes:


Originally I had in mind to put the read() call inside the poll() loop. 
But if we keep this feature as an option, it felt it was a bit easier just 
to add the "if (pipe_check) {...}" block before the read().



The new iopoll() function is the core concept here for waiting on an 
[input, output] pair of fds - waiting for the input to become ready to 
read, or the output to have an error or become a broken pipe.



For Pádraig, I think the same function & approach here could be used in 
other filters (cat for example).  The stubborn part of me might say, for 
platforms that do not natively support poll(2), we could simply leave out 
support for this feature.  If that's not acceptable, we could add a 
select(2)-based fallback for platforms that do not have a native poll(2).



Unique to tee is its multiple outputs.  The new get_next_out() helper 
simply advances to select the next remaining (ie, not-yet-removed) output. 
As described last time, it's sufficient to track a single output at a 
time, and perhaps it even simplifies the implementation.  It also avoids 
the need for a malloc() for the pollfd array before every read().



I moved the somewhat complicated write-failure logic from tee_files() out 
to a new helper function, fail_output(), which now also gets called for 
broken pipes that we want to remove.



Note also that I make -P imply -p.  I think this is necessary, otherwise 
an output pipe becoming broken always produces an error.  But normally, an 
output pipe breaking does not necessarily produce an error, since EOF can 
arrive before any further input, and in that case no write is then 
attempted into the broken pipe.




Happy hacking / feedback welcome.


Thanks,
Carlcommit cabf34aa748afa676f10a7943de0e25a962d23a4
Author: Carl Edquist 
Date:   Mon Dec 5 15:43:24 2022 -0600

tee: add -P/--pipe-check option to remove broken pipe outputs

In case input is intermittent (a tty, pipe, or socket), and all
remaining outputs are pipes (eg, >(cmd) process substitutions), provide
a way to exit early when they have all become broken pipes (and thus
future writes will fail), without having to wait for more input to tee,
to actually encounter the signal or write failure (SIGPIPE/EPIPE).

Broken pipe detection is done by calling poll(2) with an unlimited
timeout, waiting for input to be available, or errors on input or the
first remaining output.  The iopoll() function that implements this
could be of general use for other filter utils, even cat for instance.
This would allow broken pipes to "propagate" backwards in a shell
pipeline.

Note also that -P implies -p.  This is necessary, otherwise an output
pipe becoming broken always produces an error.  But normally, an output
pipe breaking does not produce an error if EOF is hit on input before
any further data, as no write is then attempted into a broken pipe.

* src/tee.c (iopoll): New function implementing broken pipe detection.
(pipe_check, long_options, main): Add -P/--pipe-check option.
(get_next_out): Helper function for finding next valid output.
(tee_files): Add broken pipe detection before calling read().
(tee_files, fail_output): Break out write failure/output removal logic
to helper function.

diff --git a/src/tee.c b/src/tee.c
index 971b768..c17c5c7 100644
--- a/src/tee.c
+++ b/src/tee.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "system.h"
 #include "argmatch.h"
@@ -37,6 +38,9 @@
   proper_name ("Richard M. Stallman"), \
   proper_name ("David MacKenzie")
 
+#define IOPOLL_BROKEN_OUTPUT -2
+#define IOPOLL_ERROR -3
+
 static bool tee_files (int nfiles, char **files);
 
 /* If true, append to output files rather than truncating them. */
@@ -45,6 +49,9 @@ static bool append;
 /* If true, ignore interrupts. */
 static bool ignore_interrupts;
 
+/* If true, detect if next output becomes broken while waiting for input. */
+static bool pipe_check;
+
 enum output_error
   {
 output_error_sigpipe,  /* traditional behavior, sigpipe enabled.  */
@@ -61,6 +68,7 @@ static struct option const long_options[] =
   {"append", no_argument, 

Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-02 Thread Carl Edquist via GNU coreutils General Discussion

On Fri, 2 Dec 2022, Arsen Arsenović wrote:

I'm concerned with adding such a behavior change by default still.  I 
can imagine this "lifetime extension" properly having been relied on in 
the last many decades it has been around for ;)


That's fair! :)  I'd be curious to hear about a use-case for that; but 
anyway yeah if it's an option at least there won't be any surprises.



Although tee has multiple outputs, you only need to monitor a single 
output fd at a time.  Because the only case you actually need to catch 
is when the final valid output becomes a broken pipe.  (So I don't 
think it's necessary to poll(2) all the output fds together.)


That is technically true, but I think coupling this to two FDs might 
prove a bit inelegant in implementation (since you'd have to decide 
which entry from an unorganized list with holes do you pick?  any of 
them could spontaneously go away), so I'm not sure the implementation 
would be cleaner that way.


It'd be best to explain with a patch - I'll plan to send one later for a 
concrete example.


But to try to answer your question:

you'd have to decide which entry from an unorganized list with holes do 
you pick?


So as you've seen there is a "descriptors" array corresponding to all the 
outputs.  What I had in mind is to maintain an int that keeps track of the 
index of the first item in the descriptors array that is still valid. 
(They are actually FILE *'s, which are set to NULL when they become 
invalid.)


So, you'd start with the first one (which happens always to be stdout). 
If the output at the current index (starting with stdout) ever becomes 
invalid (due to broken pipe detection, or due to other non-fatal write 
failure), then increase the index until the next valid output is found 
(skipping NULL entries).  If the last output is closed, it's not really 
important what happens to the index - it can be left as-is or set to -1; 
it won't be used again in any case.



any of them could spontaneously go away


I think it should be OK just to check the current output index in the list 
and advance if that one closes.  If a pipe becomes broken in the middle of 
the list, I think it's fine to let it be.


Why is this OK?

First of all, the current index still refers to a valid output.  That 
means you are _not_ in a situation where all outputs are broken pipes, so 
the right thing to do is to continue waiting for input.


Then, if input arrives, it will get written to all the valid (ie, 
non-NULL) outputs, including the one that has now become a broken pipe 
(without being detected).


But this is OK, because (as we've discussed before) we should already be 
ignoring SIGPIPE (and handling EPIPE), to prevent races that can happen 
where a fatal SIGPIPE comes in after a read() and before the corresponding 
write().  So, this broken pipe gets removed due to the write failure 
(EPIPE), rather than the broken-pipe detection.


But it does not affect the lifetime of the tee process, as any time poll() 
is waiting, the index will point to an output that is still valid and is 
not (yet) a broken pipe.


...


But the proof is in the pudding; so I'll try to draft something up and you 
can see what you think technically & aesthetically...



Cheers!
Carl


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-02 Thread Pádraig Brady

On 02/12/2022 11:47, Carl Edquist wrote:

PS:

On Fri, 2 Dec 2022, Carl Edquist wrote:


On the topic of implementation - I was thinking more about a general
solution for filter utils, and I am thinking the key thing is to provide
a replacement (wrapper) for read(2), that polls two fds together (one
input and one ouput), with no timeout.

It would check for POLLIN on input (in which case do the read()).
Otherwise if there is an error (POLLERR or POLLHUP) on input, treat it
as EOF. Otherwise if there's an error on output, remove this output, or
handle it similar to SIGPIPE/EPIPE.

(Nothing is written to the output fd here, it's just used for polling.)

...

I think this general approach of using poll(2) for a single input along
with a single ouput could be used for both tee and other filters that
only write to stdout.



A question for Pádraig (or anyone else who might know) -

Are there portability concerns with the system poll(2), or just with the
gnulib replacement for poll(), which is based on select(2) ?

The commit message (3a1c328cd) seems to suggest the latter.

In other words, are there any platforms that do not support using the
system poll(2) directly?

tail.c seems to use poll(2) for _AIX || __sun || __APPLE__ || HAVE_INOTIFY
and *otherwise* it uses select(2).

But I am thinking, if it's available, poll() is the more appropriate
interface for this check in the first place.

The 'trick' in tail.c of using select() to test when STDOUT becomes
available for reading, and taking that as an indication of an error
condition, doesn't seem to be documented (in my linux select(2)), and also
seems like it might not work as intended if fd 1 is open for read-write.


Anyway if it's possible just to use poll(2) (the system one, not the
gnulib replacement), that might simplify the portability logic.


Yes it would be better to use poll() if possible,
and that's what I attempted with tail(1) recently.
From the gnulib docs on poll():

"
 Portability problems fixed by Gnulib:

  This function is missing on some platforms:
  mingw, MSVC 14, HP NonStop.

  This function doesn't work on special files like @file{/dev/null} and ttys 
like
  @file{/dev/tty} on some platforms:
  AIX 5.3, Mac OS X 10.4.0

 Portability problems not fixed by Gnulib:

  Under Windows, when passing a pipe, Gnulib's @code{poll} replacement might
  return 0 even before the timeout has passed.  Programs using it with pipes can
  thus busy wait.

  On some platforms, file descriptors other than sockets do not support
  POLLHUP; they will return a "readable" or "writable" status instead:
  AIX 7.2, HP NonStop.
"

So to use poll() everywhere we'd need the gnulib module.
But empirically the replacement didn't work on macos at least
for this use case, and we know the select emulation wouldn't work on AIX.

So portable use of poll() will be awkward.

cheers,
Pádraig



Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-02 Thread Arsen Arsenović

Carl Edquist  writes:

> On the topic of implementation - I was thinking more about a general solution
> for filter utils, and I am thinking the key thing is to provide a replacement
> (wrapper) for read(2), that polls two fds together (one input and one ouput),
> with no timeout.
>
> It would check for POLLIN on input (in which case do the read()). Otherwise if
> there is an error (POLLERR or POLLHUP) on input, treat it as EOF.  Otherwise 
> if
> there's an error on output, remove this output, or handle it similar to
> SIGPIPE/EPIPE.
>
> (Nothing is written to the output fd here, it's just used for polling.)

I'm concerned with adding such a behavior change by default still.  I
can imagine this "lifetime extension" properly having been relied on in
the last many decades it has been around for ;)

> Although tee has multiple outputs, you only need to monitor a single output fd
> at a time.  Because the only case you actually need to catch is when the final
> valid output becomes a broken pipe.  (So I don't think it's necessary to
> poll(2) all the output fds together.)

That is technically true, but I think coupling this to two FDs might
prove a bit inelegant in implementation (since you'd have to decide
which entry from an unorganized list with holes do you pick?  any of
them could spontaneously go away), so I'm not sure the implementation
would be cleaner that way.

Thanks, have a wonderful day.
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-02 Thread Carl Edquist via GNU coreutils General Discussion

PS:

On Fri, 2 Dec 2022, Carl Edquist wrote:

On the topic of implementation - I was thinking more about a general 
solution for filter utils, and I am thinking the key thing is to provide 
a replacement (wrapper) for read(2), that polls two fds together (one 
input and one ouput), with no timeout.


It would check for POLLIN on input (in which case do the read()). 
Otherwise if there is an error (POLLERR or POLLHUP) on input, treat it 
as EOF. Otherwise if there's an error on output, remove this output, or 
handle it similar to SIGPIPE/EPIPE.


(Nothing is written to the output fd here, it's just used for polling.)

...

I think this general approach of using poll(2) for a single input along 
with a single ouput could be used for both tee and other filters that 
only write to stdout.



A question for Pádraig (or anyone else who might know) -

Are there portability concerns with the system poll(2), or just with the 
gnulib replacement for poll(), which is based on select(2) ?


The commit message (3a1c328cd) seems to suggest the latter.

In other words, are there any platforms that do not support using the 
system poll(2) directly?


tail.c seems to use poll(2) for _AIX || __sun || __APPLE__ || HAVE_INOTIFY 
and *otherwise* it uses select(2).


But I am thinking, if it's available, poll() is the more appropriate 
interface for this check in the first place.


The 'trick' in tail.c of using select() to test when STDOUT becomes 
available for reading, and taking that as an indication of an error 
condition, doesn't seem to be documented (in my linux select(2)), and also 
seems like it might not work as intended if fd 1 is open for read-write.



Anyway if it's possible just to use poll(2) (the system one, not the 
gnulib replacement), that might simplify the portability logic.



What do you think?


Thanks,
Carl


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-12-02 Thread Carl Edquist via GNU coreutils General Discussion

On Wed, 30 Nov 2022, Arsen Arsenović wrote:


Carl Edquist  writes:

It sounds like one way or another you want to copy your endless but 
intermittent input to multiple output pipes, but you want to quit as 
soon as all the output pipes become broken.


Precisely.  The most important requirement there is that the tee-based 
substitute imitates the lifetime of it's longest lived output.  Now I'm 
thinking, maybe --pipe-check should also block SIGPIPE, to prevent the 
race between poll, process death and write (which would result in the 
process getting killed, as it'd happen right now, to see what I mean, 
try ``tee >(sleep 100) >(:)'' and press enter after a bit; a race could 
make --pipe-check behave like that).


Right, you need to ignore SIGPIPE (like "tee -p" does) for multiple 
outputs if any of them can exit early...  Sometimes I forget that '-p' is 
not on by default for tee.  I don't think I've encountered a use-case for 
specifically wanting this option to be off.



I'll keep this in mind, for v2, which is currently waiting on me having 
some time to research the portability of this whole thing, and for a 
decision on whether to even include this feature is made.


On the topic of implementation - I was thinking more about a general 
solution for filter utils, and I am thinking the key thing is to provide a 
replacement (wrapper) for read(2), that polls two fds together (one input 
and one ouput), with no timeout.


It would check for POLLIN on input (in which case do the read()). 
Otherwise if there is an error (POLLERR or POLLHUP) on input, treat it as 
EOF.  Otherwise if there's an error on output, remove this output, or 
handle it similar to SIGPIPE/EPIPE.


(Nothing is written to the output fd here, it's just used for polling.)

...

Although tee has multiple outputs, you only need to monitor a single 
output fd at a time.  Because the only case you actually need to catch is 
when the final valid output becomes a broken pipe.  (So I don't think it's 
necessary to poll(2) all the output fds together.)


I think this general approach of using poll(2) for a single input along 
with a single ouput could be used for both tee and other filters that only 
write to stdout.


(But again, "tail -f" is different, because it checks for regular files to 
grow as the source of intermittent input - which I don't think is 
something you can use poll() to wait for.)



I'll try to put together a patch do demo what I have in mind...


But if you don't have control over that, the fundamental problem is 
detecting broken pipes *without writing to them*, and I don't think 
that can be solved with any amount of extra pipes and fd redirection...


I imagine that, technically, this is attainable by editing the process 
substitutions involved to also signal the original process back; 
however, this feels less elegant and generally useful than tee handling 
this, given that tee's use-case is redirecting data to many places.


Ah, yeah, I bet you could rig something up waiting on processes and/or 
using signals.  In the general case there is a slight difference between 
"waiting for a process to terminate" and "waiting for the pipe to become 
broken" (the process attached to the read-end of the pipe could close its 
stdin early, or it could exit after forking a child that keeps the pipe 
open), but yeah in the most common case the two events happen together.



Have a nice day :)

Carl


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-11-30 Thread Arsen Arsenović

Carl Edquist  writes:

> It sounds like one way or another you want to copy your endless but
> intermittent input to multiple output pipes, but you want to quit as soon as
> all the output pipes become broken.

Precisely.  The most important requirement there is that the tee-based
substitute imitates the lifetime of it's longest lived output.  Now I'm
thinking, maybe --pipe-check should also block SIGPIPE, to prevent the
race between poll, process death and write (which would result in the
process getting killed, as it'd happen right now, to see what I mean,
try ``tee >(sleep 100) >(:)'' and press enter after a bit; a race could
make --pipe-check behave like that).

I'll keep this in mind, for v2, which is currently waiting on me having
some time to research the portability of this whole thing, and for a
decision on whether to even include this feature is made.

> To me, tee sounds like exactly the place to do that.  Otherwise, you'd have to
> add the broken-pipe detection (as in your patch) to your own program, along
> with the rest of tee's basic functionality  :)
>
> It would be one thing if you controlled the programs that consume the input
> (you could have them handle 'heartbeats' in the input stream, and once these
> programs terminate, the heartbeats would trip on the broken pipe).  (However
> (in)elegant that becomes to implement...)
>
> But if you don't have control over that, the fundamental problem is detecting
> broken pipes *without writing to them*, and I don't think that can be solved
> with any amount of extra pipes and fd redirection...

I imagine that, technically, this is attainable by editing the process
substitutions involved to also signal the original process back;
however, this feels less elegant and generally useful than tee handling
this, given that tee's use-case is redirecting data to many places.

Thanks, have a nice day.
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-11-29 Thread Carl Edquist via GNU coreutils General Discussion

On Tue, 29 Nov 2022, Arsen Arsenović wrote:

The issue here isn't the compilers hanging, it's tee living longer than 
all the compilers do because it's stdin doesn't EOF (it'd be preferable 
for it to only live as long as the last of the compilers).


I can imagine attempting to implement this with enough pipe and fd 
redirection magic, but I'm not sure how (in)elegant that becomes.


It sounds like one way or another you want to copy your endless but 
intermittent input to multiple output pipes, but you want to quit as soon 
as all the output pipes become broken.


To me, tee sounds like exactly the place to do that.  Otherwise, you'd 
have to add the broken-pipe detection (as in your patch) to your own 
program, along with the rest of tee's basic functionality  :)


It would be one thing if you controlled the programs that consume the 
input (you could have them handle 'heartbeats' in the input stream, and 
once these programs terminate, the heartbeats would trip on the broken 
pipe).  (However (in)elegant that becomes to implement...)


But if you don't have control over that, the fundamental problem is 
detecting broken pipes *without writing to them*, and I don't think that 
can be solved with any amount of extra pipes and fd redirection...


Carl


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-11-29 Thread Pádraig Brady

On 29/11/2022 21:48, Carl Edquist wrote:

Or perhaps when you mention "inducing SIGPIPE", you are referring to how
tail(1) does things currently (when it detects a broken output), by
attempting raise(SIGPIPE) followed by exit(EXIT_FAILURE).


Yes this is what I was alluding to.


It seems this
is just an attempt to make it look to the waiting parent process that tail
died trying to write to a broken pipe (somewhat of a white lie).  Most
likely it could just exit(EXIT_FAILURE) without confusing the caller.


Right yes we probably should not add this into the mix
and just exit() as tee(1) does now for this case.


... Sorry to see the poll thing is complicated by cross-platform behavior
differences  :(


Yes that is a pain :(


My apologies for the long email...  Hopefully some food for thought! :)


All useful and valid points.
Thanks for taking the time to detail them.

cheers,
Pádraig



Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-11-29 Thread Carl Edquist via GNU coreutils General Discussion

On Tue, 29 Nov 2022, Pádraig Brady wrote:


On 29/11/2022 17:32, Carl Edquist wrote:

 ...

 If this kind of detect-broken-output-pipe logic were added to filter
 utils generally, the above example (with 4 cats) would return to the
 shell immediately.


Right. Thanks for discussing the more general pattern.
I.e. that SIGPIPE doesn't cascade back up the pipeline,
only upon attempted write to the pipe.
So it's not really a tee issue, more of a general pattern.

So it wouldn't be wrong to add this to tee (by default),
but I'm not sure how useful it is given this is a general issue for all 
filters.


That makes sense; though I suppose it would continue to become more useful 
for these types of cases as it gets added to more filters  :)


Also I'm a bit wary of inducing SIGPIPE as traditionally it hasn't been 
handled well:


But wait now, are we talking about inducing SIGPIPE?

In the current patch for tee, I think the idea was just to remove an 
output from the list when it's detected to be a broken pipe, allowing tee 
to exit sooner if all of its outputs are detected to be broken.


Similarly for the general filter case (usually with only a single output), 
the idea would be to allow the program to exit right away if it's detected 
that the output is a broken pipe.



https://www.pixelbeat.org/programming/sigpipe_handling.html


Actually, to me, if anything this page seems to serve as motivation to add 
broken-pipe checking to coreutils filters in general.


The article you linked ("Don't fear SIGPIPE!") discusses three topics:


1. First it discusses default SIGPIPE behavior - programs that do not 
specifically handle SIGPIPE do the right thing by default (they get 
killed) when they continue writing to their output pipe after it 
breaks.


We wouldn't be changing this for any filters in coreutils.  Writing to a 
broken pipe should still produce a SIGPIPE to kill the program.  (Even 
with broken-pipe detection, this can still happen if input becomes ready, 
then the output pipe's read end is closed, then the write is attempted.)


tee is actually a special case (if -p is given), because then SIGPIPE does 
not kill the process, but the write will still fail with EPIPE and tee 
will remove that output.


We also wouldn't really be inducing anything new for programs earlier in 
the pipeline.  If they don't handle SIGPIPE, they will just get killed 
with it more promptly - they will end up writing to a broken pipe one 
write(2) call sooner.



2. The "Incorrect handling of SIGPIPE" section discusses programs that 
attempt to handle SIGPIPE but do so poorly.  This doesn't apply to us 
either.  Filters that add broken-pipe detection do not need to add SIGPIPE 
handling.  And programs that handle it poorly, earlier in the pipeline, 
will have their problems regardless.  (Again, just one write(2) call 
sooner.)



3. Finally, the "Cases where SIGPIPE is not useful" section actually 
highlights why we *should* add this broken-pipe checking to filters in 
general.


The "Intermittent sources" subsection discusses exactly what we are 
talking about fixing:


For example 'cat | grep -m1 exit' will only exit, when you type a line 
after you type "exit".


If we added broken-pipe checking to cat, then this example would behave 
like the user would have wanted - typing "exit" would cause grep to exit, 
and cat will detect it's output pipe breaking, and exit immediately.


The other example about 'tail' was fixed already, as this kind of checking 
was added to tail, as we've discussed.  It's a good start!  The more utils 
we add it to, the more will be able to benefit.


The "Multiple outputs" subsection is specific to tee, and if anything 
perhaps suggests that the '-p' option should be on by default.  That is, 
it makes an argument for why it makes sense for tee to avoid letting a 
SIGPIPE kill it, but rather only to exit when all the input is consumed or 
all the outputs have been removed due to write errors.


The "Long lived services" subsection is a generalization of what was just 
said about tee - namely that it makes sense that some programs want to 
continue after a failed write attempt into a broken pipe, and such 
programs need to handle or ignore SIGPIPE.  This is true for such programs 
already, and adding broken-pipe checking to a filter in the same pipeline 
doesn't change that at all.  (Again, it will just cause them to get a 
SIGPIPE/EPIPE *promptly* - one write call sooner - when the final consumer 
of the output completes.)


...

Or perhaps when you mention "inducing SIGPIPE", you are referring to how 
tail(1) does things currently (when it detects a broken output), by 
attempting raise(SIGPIPE) followed by exit(EXIT_FAILURE).  It seems this 
is just an attempt to make it look to the waiting parent process that tail 
died trying to write to a broken pipe (somewhat of a white lie).  Most 
likely it could just exit(EXIT_FAILURE) without confusing the caller.  So 
if you'd like to avoid that, it's 

Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-11-29 Thread Arsen Arsenović

William Bader  writes:

> For the case of testing two compile runs, could you use something like the 
> bash
> command below (replacing 'sleep ...' with 'gcc ...')?

The issue here isn't the compilers hanging, it's tee living longer than
all the compilers do because it's stdin doesn't EOF (it'd be preferable
for it to only live as long as the last of the compilers).

I can imagine attempting to implement this with enough pipe and fd
redirection magic, but I'm not sure how (in)elegant that becomes.
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-11-29 Thread Arsen Arsenović

Pádraig Brady  writes:

> Right. Thanks for discussing the more general pattern.
> I.e. that SIGPIPE doesn't cascade back up the pipeline,
> only upon attempted write to the pipe.
> So it's not really a tee issue, more of a general pattern.
>
> So it wouldn't be wrong to add this to tee (by default),
> but I'm not sure how useful it is given this is a general issue for all 
> filters.
> Also I'm a bit wary of inducing SIGPIPE as traditionally it hasn't been 
> handled well:
> https://www.pixelbeat.org/programming/sigpipe_handling.html

I hesitated making tee poll driven by default because I can imagine
someone relying on this behavior.

I feel that this is especially useful in the case of teeing pipes
because the example I presented isn't very common with other tools,
since tee is unique in copying to many outputs, managing the N arbitrary
subprocesses can prove difficult.

I wouldn't be necessarily opposed to making all the coreutils able to be
poll driven, but that'd definitely require some common cross-platform
event loopy code ;)

Thanks, have a great night.
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-11-29 Thread William Bader
For the case of testing two compile runs, could you use something like the bash 
command below (replacing 'sleep ...' with 'gcc ...')?

(timeout 10 sleep 2 ; echo a) & (timeout 2 sleep 10 ; echo b) & echo c ; wait ; 
echo d

From: coreutils-bounces+williambader=hotmail@gnu.org 
 on behalf of Carl Edquist 
via GNU coreutils General Discussion 
Sent: Tuesday, November 29, 2022 12:32 PM
To: Arsen Arsenović 
Cc: Pádraig Brady ; coreutils@gnu.org 
Subject: Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed 
outputs

Oh and, ...

On Mon, 28 Nov 2022, Pádraig Brady wrote:

> I'm presuming the input generator is depending on the compiler runs
> (written to by tee) to exit cleanly, before exiting / generating more?
> Hence the hangs?
>
> If that was the case then there still might be the potential for hangs
> even if tee detected closed pipes. I.e. if the compiler runs hung rather
> than exited, this not be distinguishable from tee as the pipe outputs
> would remain.

This is a good point too.

So perhaps the generally useful cases are (1) if the intermittent input is
from a tty, but also (2) if the input program (that writes to a pipe) has
similar logic to what we are proposing to add to tee here; ie, that it
detects when its output becomes a broken pipe.

More generally for (2), if the whole command pipeline is written that way,
this will propagate all the way back.  In that sense there might even be
some merit in adding this type of logic to all coreutils programs that
filter stdin to stdout.

For instance, take this oversimplified example with cat:

 cat | cat | cat | cat | true

If this is run on a terminal, you have to hit Enter *4 times* before
control returns to the shell, as it takes that many separate writes to
cause all the pipes to break in sequence (right-to-left) and finally get
the write failure in the left-most cat.

If this kind of detect-broken-output-pipe logic were added to filter utils
generally, the above example (with 4 cats) would return to the shell
immediately.

Carl

On Tue, 29 Nov 2022, Carl Edquist via GNU coreutils General Discussion wrote:

> Hi all,
>
> On Mon, 28 Nov 2022, Arsen Arsenović wrote:
>
>>  Pádraig Brady  writes:
>>
>>>  Trying to understand your use case better,
>>>  ...
>>
>>  The bug we observed is that on occasion, for instance when running with a
>>  tty, or with a script that (for some reason) has a pipe on stdin, the
>>  tee-based "compiler" would hang.  To replicate this, try:
>>
>>  tee > (gcc test.c -o a.out.1) >(gcc test.c -o a.out.2)
>>
>>  in a tty (here, the stdin is meant to be irrelevant).
>
> If I may try to provide a simple example of the problem, consider the
> following command line:
>
>tee > (sleep 3) | sleep 5
>
> Let tee's stdin be a terminal to supply the "intermittent input".
>
> You'll see that after 5 seconds, this will hang indefinitely until you hit
> Enter.
>
> For the first 3 seconds, when hitting the Enter key, tee will successfully
> write the line to each pipe.  Between 3 and 5 seconds, the pipe to "sleep 3"
> will be broken, which tee will notice, and then tee will continue writing the
> lines to the "sleep 5" pipe.
>
> But after 5 seconds, when "sleep 5" has terminated and that pipe becomes
> broken, tee will continue to "hang" waiting for input (in this case the
> intermittent input from the terminal) indefinitely, despite the fact that all
> of tee's outputs are now broken pipes.  tee will only "notice" that the
> "sleep 5" pipe is broken when it receives input after that point, because
> then the write to that pipe fails with EPIPE (and/or a SIGPIPE is delivered).
>
> ...
>
> It seems the ideal thing to happen here is for tee to terminate once it
> determines that all of its outputs are broken pipes.  It comes close to this
> already, but it only learns about this when write attempts fail, and it only
> attempts a write when it has input to tee.
>
> As I suppose was suggested in the patch, perhaps poll(2) could be used to
> wait for POLLIN from fd 0, and POLLHUP for outputs (perhaps limited to pipes
> / sockets).
>
> The patch subject suggests adding --pipe-check as an option, but on first
> blush it seems like this would actually be a good thing to do by default...
>
> Cheers,
> Carl
>


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-11-29 Thread Pádraig Brady

On 29/11/2022 17:32, Carl Edquist wrote:

Oh and, ...

On Mon, 28 Nov 2022, Pádraig Brady wrote:


I'm presuming the input generator is depending on the compiler runs
(written to by tee) to exit cleanly, before exiting / generating more?
Hence the hangs?

If that was the case then there still might be the potential for hangs
even if tee detected closed pipes. I.e. if the compiler runs hung rather
than exited, this not be distinguishable from tee as the pipe outputs
would remain.


This is a good point too.

So perhaps the generally useful cases are (1) if the intermittent input is
from a tty, but also (2) if the input program (that writes to a pipe) has
similar logic to what we are proposing to add to tee here; ie, that it
detects when its output becomes a broken pipe.

More generally for (2), if the whole command pipeline is written that way,
this will propagate all the way back.  In that sense there might even be
some merit in adding this type of logic to all coreutils programs that
filter stdin to stdout.

For instance, take this oversimplified example with cat:

cat | cat | cat | cat | true

If this is run on a terminal, you have to hit Enter *4 times* before
control returns to the shell, as it takes that many separate writes to
cause all the pipes to break in sequence (right-to-left) and finally get
the write failure in the left-most cat.

If this kind of detect-broken-output-pipe logic were added to filter utils
generally, the above example (with 4 cats) would return to the shell
immediately.


Right. Thanks for discussing the more general pattern.
I.e. that SIGPIPE doesn't cascade back up the pipeline,
only upon attempted write to the pipe.
So it's not really a tee issue, more of a general pattern.

So it wouldn't be wrong to add this to tee (by default),
but I'm not sure how useful it is given this is a general issue for all filters.
Also I'm a bit wary of inducing SIGPIPE as traditionally it hasn't been handled 
well:
https://www.pixelbeat.org/programming/sigpipe_handling.html

cheers,
Pádraig



Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-11-29 Thread Carl Edquist via GNU coreutils General Discussion

Hi all,

On Mon, 28 Nov 2022, Arsen Arsenović wrote:


Pádraig Brady  writes:


Trying to understand your use case better,
...


The bug we observed is that on occasion, for instance when running with 
a tty, or with a script that (for some reason) has a pipe on stdin, the 
tee-based "compiler" would hang.  To replicate this, try:


 tee >(gcc test.c -o a.out.1) >(gcc test.c -o a.out.2)

in a tty (here, the stdin is meant to be irrelevant).


If I may try to provide a simple example of the problem, consider the 
following command line:


tee >(sleep 3) | sleep 5

Let tee's stdin be a terminal to supply the "intermittent input".

You'll see that after 5 seconds, this will hang indefinitely until you hit 
Enter.


For the first 3 seconds, when hitting the Enter key, tee will successfully 
write the line to each pipe.  Between 3 and 5 seconds, the pipe to "sleep 
3" will be broken, which tee will notice, and then tee will continue 
writing the lines to the "sleep 5" pipe.


But after 5 seconds, when "sleep 5" has terminated and that pipe becomes 
broken, tee will continue to "hang" waiting for input (in this case the 
intermittent input from the terminal) indefinitely, despite the fact that 
all of tee's outputs are now broken pipes.  tee will only "notice" that 
the "sleep 5" pipe is broken when it receives input after that point, 
because then the write to that pipe fails with EPIPE (and/or a SIGPIPE is 
delivered).


...

It seems the ideal thing to happen here is for tee to terminate once it 
determines that all of its outputs are broken pipes.  It comes close to 
this already, but it only learns about this when write attempts fail, and 
it only attempts a write when it has input to tee.


As I suppose was suggested in the patch, perhaps poll(2) could be used to 
wait for POLLIN from fd 0, and POLLHUP for outputs (perhaps limited to 
pipes / sockets).


The patch subject suggests adding --pipe-check as an option, but on first 
blush it seems like this would actually be a good thing to do by 
default...


Cheers,
Carl


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-11-28 Thread Arsen Arsenović

Pádraig Brady  writes:

> To get a better handle on the generality of this
> I went back about 16 years on the list,
> and even though there have been many tee functionality requests over the 
> years,
> this is the first time this has been requested.

I'm glad not to have wasted time on a redundancy!  I totally overlooked
checking the MLs, so I only searched through the "rejected features"
page; I'll remember to check archives too in the future.

> Trying to understand your use case better,
> I'm presuming the input generator is depending on the compiler runs
> (written to by tee) to exit cleanly, before exiting / generating more?
> Hence the hangs?

> If that was the case then there still might be the potential for hangs
> even if tee detected closed pipes. I.e. if the compiler runs hung rather
> than exited, this not be distinguishable from tee as the pipe outputs would 
> remain.

Yes, but these are fine.  The problem here isn't that the build process
can hang, the problem is that it can hang in a way different to the
compilers it's emulating.

The work tee is involved in consists of running the configure stage of a
package build against two compilers, universally, by essentially tee'ing
to both, storing and comparing the results, and then returning the
result of the "normal" compiler, in order to detect packages that
misconfigure due to compiler changes.

The bug we observed is that on occasion, for instance when running with
a tty, or with a script that (for some reason) has a pipe on stdin, the
tee-based "compiler" would hang.  To replicate this, try:

  tee >(gcc test.c -o a.out.1) >(gcc test.c -o a.out.2)

in a tty (here, the stdin is meant to be irrelevant).

In an applied version of this scenario, this would be cc-old/normal and
cc-new of some sort, through a wrapper function so that results are
stored/retrieved/compared/etc, but that complexity isn't necessary to
see to understand the issue I think (but [1] is the script involved in
the process, see the timeout line and the comment above it)

The behavior we'd want out of tee here is to let it pass down as much
data as possible to both compilers, and to be able to give up when the
last of the compilers die, hence waiting for POLLHUP events.

This is useful in other instances where stdin is not necessarily
relevant to what tee is sending data to, and where you can't deduce
whether it would be (at least, without fairly complex code to replicate
argument parsing).

> If that's the case this has become a bit less generally useful in my mind.
>
> To keep tee data driven, perhaps your input could periodically
> send a "clock" input through to (say a newline), to check everything
> is still running as expected.  I.e. your periodic input generator
> seems like it would be async already, so it would be better to
> add any extra async logic there, and keep tee more simple and data driven.

This is, sadly, not in our code.  We control (distro) build scripts
(ebuilds) and the compiler-diff-wrapper-thing, and the ability to run
(normal program) build systems unmodified, with just a custom CC, in
order to detect when they misbehave due to a compiler update is
paramount here.

> cheers,
> Pádraig.
>
> p.s. while reading the tee docs in this area, they were a bit dispersed,
> so the attached updates the texinfo for the -p option to summarise
> it's operation and how it differs from the default operation.

Thanks!  This seems nice.

Have a great night.

[1] https://gist.github.com/thesamesam/4ddaa95f3f42c2be312b676476cbf505
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-11-28 Thread Pádraig Brady

On 19/11/2022 19:27, Arsen Arsenović wrote:


Pádraig Brady  writes:


Thanks a lot for the patch.

There is similar functionality in tail.c (which should probably be reused
if we do decide to implement this in tee as it's tricky to do portably).
Anyway the tail(1) case makes sense considering:


Looking at ``check_output_alive'', this logic should be fairly easy to
reuse.  Worst case, I'd need to refactor some of the code to handle the
non-pipe-check case more elegantly so that it doesn't become too
unreadable when also adding the logic to pick between poll and select.


   tail -f file.log | grep -q trigger && process_immediately

So a similar use case with tee might be:

   tee -p (grep -q trigger_1) | grep -q trigger_2 && 
process_immediately

However tee wouldn't be waiting for more input in that case.
It would either consume the whole file, or exit when processing it.

So a more appropriate case is:

   intermittent_output |
   tee -p >(grep -q trigger_1) | grep -q trigger_2 && process_immediately


This case is just about what we were using tee for when I wrote this
patch.  We use a compiler wrapper that compares the outputs of few
compilers by running them in parallel (to detect some new behavior), and
we started noticing some builds would mysteriously hang forever.  I
traced it back to ``tee'' waiting on stdin even after the child
processes die on occasion.

Really, that was an oversight on my part.  I didn't think of the (quite
normal) case of stdin being some long-lived low traffic pipe rather than
either a file or /dev/null or a pipe that gets closed properly after
writing.


Where intermittent_output is stdin, the use case is a bit contrived
as any input will cause tee to exit.
The general more general non stdin case above has some merit,
though not as common as the tail example I think.


Right, but no user input could ever happen when the hangs described
above happened, since it'd happen in some automated code, so (AFAICT)
the only option was to teach tee to detect outputs going away more
promptly.


To get a better handle on the generality of this
I went back about 16 years on the list,
and even though there have been many tee functionality requests over the years,
this is the first time this has been requested.

Trying to understand your use case better,
I'm presuming the input generator is depending on the compiler runs
(written to by tee) to exit cleanly, before exiting / generating more?
Hence the hangs?

If that was the case then there still might be the potential for hangs
even if tee detected closed pipes. I.e. if the compiler runs hung rather
than exited, this not be distinguishable from tee as the pipe outputs would 
remain.

If that's the case this has become a bit less generally useful in my mind.

To keep tee data driven, perhaps your input could periodically
send a "clock" input through to (say a newline), to check everything
is still running as expected.  I.e. your periodic input generator
seems like it would be async already, so it would be better to
add any extra async logic there, and keep tee more simple and data driven.

cheers,
Pádraig.

p.s. while reading the tee docs in this area, they were a bit dispersed,
so the attached updates the texinfo for the -p option to summarise
it's operation and how it differs from the default operation.From 950f3ff58ce483670b9b5e12823a57fa8dd353fa Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= 
Date: Mon, 28 Nov 2022 22:39:19 +
Subject: [PATCH] doc: tee: make -p decription more complete

* doc/coreutils.texi (tee invocation): Give a more
cohesive description of the -p option, and how
it differs from the default operation.
---
 doc/coreutils.texi | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index fca7f6961..ed4114e95 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -14079,8 +14079,13 @@ Ignore interrupt signals.
 @itemx --output-error[=@var{mode}]
 @opindex -p
 @opindex --output-error
-Adjust the behavior with errors on the outputs,
-with the long form option supporting selection
+Adjust the behavior with errors on the outputs.
+In summary @option{-p} allows @command{tee} to continue to process data
+to any remaining outputs, if any pipe outputs exit early.
+I.e. the default operation when @option{--output-error} is
+@emph{not} specified is to exit immediately on error writing to a pipe,
+and diagnose errors writing to a non-pipe.
+The long form @option{--output-error} option supports selection
 between the following @var{mode}s:
 
 @table @samp
-- 
2.26.2



Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-11-27 Thread Arsen Arsenović
Hi,

Pádraig Brady  writes:

> I vaguely remember on macos that POLLRDBAND needed to be set on read fds for 
> select,
> though didn't check all combinations since we didn't actually need
> the poll() replacement.  Also select() (emulation of poll) doesn't work on
> Solaris or AIX, so we needed to explicitly disable emulation there.

Ah, hmm.  I don't know how notification APIs work there, maybe some
other one can be picked?  (IIRC Windows also has it's own set of
notification APIs that Gnulib uses when on Windows)

> Perhaps we could adjust poll() emulation to be compat everywhere,
> but I'm not confident.

I see.  I'll try to dig around a bit for notes about these platforms
(IIRC the libevent manual documented a bunch of weird notification API
quirks across platforms) to see how to reliably wait on pipes becoming
either readable, closed, or writable, if possible at all.

> We can help test on esoteric systems,
> especially if appropriate tests are in place.

More of a reason to figure out the test then :).  On that topic, I did
come up with a testcase that should be appropriate for the Coreutils
testsuite, but it takes a while to execute (5s), which is something to
consider.  Here it is:

  ( sleep 5 | (timeout 3 tee -P && echo g >&2) | : ) 2>&1 | grep -q g

The 5s time is no coincidence ;).  Maybe a better tool exists, that I'm
unaware of, that would just idly wait on stdout to become unwritable
(which sounds suspiciously like the issue this patch addresses ;).
Expect might also be able to handle this test, but I'm not sure whether
that's available in the testsuite.

> Note https://www.nongnu.org/pretest/ which may be useful.

I'll play around with this a bit later, too.

Thanks, have a lovely evening.
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-11-27 Thread Pádraig Brady

On 27/11/2022 11:32, Arsen Arsenović wrote:


Arsen Arsenović  writes:


Looking at ``check_output_alive'', this logic should be fairly easy to
reuse.  Worst case, I'd need to refactor some of the code to handle the
non-pipe-check case more elegantly so that it doesn't become too
unreadable when also adding the logic to pick between poll and select.


Got some free time again.  The more I think about how to do this, the
more it feels like the solution involves a portable poll or poll-like
function that gets translated to select() if need be on a given
platform, which sounds an awful lot like Gnulib ;).  The
``check_output_alive'' code checks for Gnulib poll and declares it
incompatible, but I can't quite tell what incompatibility this is.  I
wonder if that can be fixed?


I vaguely remember on macos that POLLRDBAND needed to be set on read fds for 
select,
though didn't check all combinations since we didn't actually need
the poll() replacement.  Also select() (emulation of poll) doesn't work on
Solaris or AIX, so we needed to explicitly disable emulation there.

Perhaps we could adjust poll() emulation to be compat everywhere,
but I'm not confident.


If not, I can make factor out the check_alive logic and have it also
check for file descriptors with input data.  Sadly, I don't have many
non-GNU and non-Linux systems to test for poll behavior on; just some
(modern) Solaris and BSD VMs.


We can help test on esoteric systems,
especially if appropriate tests are in place.
Note https://www.nongnu.org/pretest/ which may be useful.

cheers,
Pádraig



Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-11-27 Thread Arsen Arsenović

Arsen Arsenović  writes:

> Looking at ``check_output_alive'', this logic should be fairly easy to
> reuse.  Worst case, I'd need to refactor some of the code to handle the
> non-pipe-check case more elegantly so that it doesn't become too
> unreadable when also adding the logic to pick between poll and select.

Got some free time again.  The more I think about how to do this, the
more it feels like the solution involves a portable poll or poll-like
function that gets translated to select() if need be on a given
platform, which sounds an awful lot like Gnulib ;).  The
``check_output_alive'' code checks for Gnulib poll and declares it
incompatible, but I can't quite tell what incompatibility this is.  I
wonder if that can be fixed?

If not, I can make factor out the check_alive logic and have it also
check for file descriptors with input data.  Sadly, I don't have many
non-GNU and non-Linux systems to test for poll behavior on; just some
(modern) Solaris and BSD VMs.

Thanks in advance, have a lovely day.
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-11-19 Thread Arsen Arsenović

Pádraig Brady  writes:

> Thanks a lot for the patch.
>
> There is similar functionality in tail.c (which should probably be reused
> if we do decide to implement this in tee as it's tricky to do portably).
> Anyway the tail(1) case makes sense considering:

Looking at ``check_output_alive'', this logic should be fairly easy to
reuse.  Worst case, I'd need to refactor some of the code to handle the
non-pipe-check case more elegantly so that it doesn't become too
unreadable when also adding the logic to pick between poll and select.

>   tail -f file.log | grep -q trigger && process_immediately
>
> So a similar use case with tee might be:
>
>   tee -p (grep -q trigger_1) | grep -q trigger_2 && 
> process_immediately
>
> However tee wouldn't be waiting for more input in that case.
> It would either consume the whole file, or exit when processing it.
>
> So a more appropriate case is:
>
>   intermittent_output |
>   tee -p >(grep -q trigger_1) | grep -q trigger_2 && process_immediately

This case is just about what we were using tee for when I wrote this
patch.  We use a compiler wrapper that compares the outputs of few
compilers by running them in parallel (to detect some new behavior), and
we started noticing some builds would mysteriously hang forever.  I
traced it back to ``tee'' waiting on stdin even after the child
processes die on occasion.

Really, that was an oversight on my part.  I didn't think of the (quite
normal) case of stdin being some long-lived low traffic pipe rather than
either a file or /dev/null or a pipe that gets closed properly after
writing.

> Where intermittent_output is stdin, the use case is a bit contrived
> as any input will cause tee to exit.
> The general more general non stdin case above has some merit,
> though not as common as the tail example I think.

Right, but no user input could ever happen when the hangs described
above happened, since it'd happen in some automated code, so (AFAICT)
the only option was to teach tee to detect outputs going away more
promptly.

> I'll think a bit more about it.
>
> thanks!
> Pádraig

Thank you!

Have a great evening.
-- 
Arsen Arsenović


signature.asc
Description: PGP signature


Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs

2022-11-19 Thread Pádraig Brady

On 19/11/2022 08:23, Arsen Arsenović wrote:

This flag comes in useful in scripts in which tee is used to duplicate
output over process substitutions, so that dying processes and no stdin
won't lead to a hang.

Without this option, a command like ``tee >(compiler1) >(compiler2) |
compiler3'' would hang forever if stdin was a terminal and the compilers
didn't expect any data, but since tee -P would do polling, the compilers
dying could mean that tee terminates.

* src/tee.c (long_options): Add -P, --pipe-check.
(main): Parse out -P, --pipe-check.
(usage): Add -P, --pipe-check.
(tee_files): Enable polling inputs and outputs to drive the tee,
rather than exclusively using read and write blocking to control
data flow.
* doc/coreutils.texi (tee invocation): Document -P, --pipecheck.
---
Hi there,

While working on some compiler hacks, I found it necessary to copy data in
parallel to multiple compilers to compare their results.  Currently, with
``tee'', that is only mostly possible, since tee always expects to read some
data on stdin before it can see whether it can terminate.  This breaks unless
stdin is /dev/null, since stdin would never get EOF or some data to wake tee up
and have it terminate.

The polling implementation provided below will wait on and remove terminating
outputs from the list of files ``tee'' watches when an error on them occurs,
except in the special case of stdin, which will never be handled by the poll
loop, and only serves to break out of the poll in the case of new data.

I couldn't figure out a decent test to write for this, the solutions I came up
all relied on PIPESTATUS, which I wasn't sure is permitted in tests, since they
all share a #!/bin/sh bang, but if that's allowed, I could add a test like the
following in:

   sleep 11 | timeout 10 tee -P | true
   [[ "${PIPESTATUS[1]}" -eq 0 ]] || fail=1

Note that the above does take 10-11 seconds in all cases, but the test needs
some long-enough-lived program that does not write data, and sleep fits the
bill.  The timeout could probably get knocked down a good bit.

Thanks in advance, have a great day.

Tested on x86_64-pc-linux-gnu.

  NEWS   |   3 ++
  doc/coreutils.texi |  11 +
  src/tee.c  | 103 ++---
  3 files changed, 110 insertions(+), 7 deletions(-)

diff --git a/NEWS b/NEWS
index b6b5201..3a6bbfe 100644
--- a/NEWS
+++ b/NEWS
@@ -62,6 +62,9 @@ GNU coreutils NEWS-*- 
outline -*-
wc now accepts the --total={auto,never,always,only} option
to give explicit control over when the total is output.
  
+  tee now accepts the --pipe-check flag, to enable polling input and output

+  file descriptors, rather than only relying on stdin for notifications.
+
  ** Improvements
  
date --debug now diagnoses if multiple --date or --set options are

diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index fca7f69..c372e2e 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -14075,6 +14075,17 @@ them.
  @opindex --ignore-interrupts
  Ignore interrupt signals.
  
+@item -P

+@itemx --pipe-check
+@opindex -P
+@opindex --pipe-check
+
+Polls file descriptors instead of just waiting on standard input, to
+allow dying pipes to be detected instantly, rather than waiting for
+standard input to write some data first.  This is especially useful to
+permit process substitutions to notify @command{tee} of completion, so
+that it stops waiting for input data when all outputs are closed.
+
  @item -p
  @itemx --output-error[=@var{mode}]
  @opindex -p
diff --git a/src/tee.c b/src/tee.c
index 971b768..25ab5b5 100644
--- a/src/tee.c
+++ b/src/tee.c
@@ -20,6 +20,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "system.h"

  #include "argmatch.h"
@@ -37,7 +38,7 @@
proper_name ("Richard M. Stallman"), \
proper_name ("David MacKenzie")
  
-static bool tee_files (int nfiles, char **files);

+static bool tee_files (int nfiles, char **files, bool pipecheck);
  
  /* If true, append to output files rather than truncating them. */

  static bool append;
@@ -61,6 +62,7 @@ static struct option const long_options[] =
{"append", no_argument, NULL, 'a'},
{"ignore-interrupts", no_argument, NULL, 'i'},
{"output-error", optional_argument, NULL, 'p'},
+  {"pipe-check", no_argument, NULL, 'P'},
{GETOPT_HELP_OPTION_DECL},
{GETOPT_VERSION_OPTION_DECL},
{NULL, 0, NULL, 0}
@@ -89,6 +91,7 @@ usage (int status)
  Copy standard input to each FILE, and also to standard output.\n\
  \n\
-a, --append  append to the given FILEs, do not overwrite\n\
+  -P, --pipe-check  polls before reading, to detect closed pipes\n\
-i, --ignore-interrupts   ignore interrupt signals\n\
  "), stdout);
fputs (_("\
@@ -118,6 +121,7 @@ int
  main (int argc, char **argv)
  {
bool ok;
+  bool pipecheck = false;
int optc;
  
initialize_main (, );