On Sat, 27 Feb 2016, Jilles Tjoelker wrote:

On Sat, Feb 27, 2016 at 09:48:05AM -0500, Pedro Giffuni wrote:
In the case of rlogind, note that the above limitation [FD_SETSIZE]
has disappeared by using poll(2).

I will add that FreeBSD has a native poll(2) implementation, it is
not a wrapper around select as some old postings would suggest.

I don't have any plans to do a massive shakeup against select(2), this
was some lower hanging fruit that was easy to change. For new code
kqueue(2) should be preferred.

The FD_SETSIZE can be a more important issue in library code which may
be called from applications that have many descriptors open already.

I don't agree with always using kqueue(2) for new code. Provided poll(2)
has the necessary functionality and the number of file descriptors is
low, using poll(2) tends to result in simpler code and better
performance. Also, poll(2) is more portable and avoids a failure mode
from the kqueues rlimit.

All the conversions to poll() seem to be buggy.  poll() gives better
detection of EOF, but you have to actually check for it or you get
different behaviour from select().  gdb in FreeBSD has been broken
for 10-15 years by misconversion to poll().  E.g.:

    ttyv1:bde@besplex:~> echo "p 2+2" | gdb -q
    (gdb) Hangup detected on fd 0
    error detected on stdin

This is because it fails to check for POLLIN on stdin when it sees
POLLHUP on stdin.  It loses all the buffered input.  I tried more
input.  It loses all input for a buffer size of < 8K, then starts
seeing some.  The details depend on the kernel buffering.  PIPE_BUF
is only 512 but the kernel normally uses larger buffers than that.

kdump output for this:

   987 cat      CALL  write(0x1,0x8060000,0x1fff)
   987 cat      GIO   fd 1 wrote 4096 bytes
       "
        p 2+2
        p 2+2
        ...
        p 2"
   987 cat      RET   write 8191/0x1fff
   987 cat      CALL  read(0x3,0x8060000,0x4000)
   987 cat      GIO   fd 3 read 0 bytes
       ""
   987 cat      RET   read 0
   987 cat      CALL  close(0x3)
   987 cat      RET   close 0
   987 cat      CALL  close(0x1)
   987 cat      RET   close 0
   987 cat      CALL  exit(0)
   986 sh       RET   wait4 987/0x3db
   986 sh       CALL  wait4(0xffffffff,0xbfbfe7c8,0x2,0)
   988 gdb      RET   poll 1
   988 gdb      CALL  write(0x1,0x8300000,0x18)
   988 gdb      GIO   fd 1 wrote 24 bytes
       "Hangup detected on fd 0
       "
   988 gdb      RET   write 24/0x18
   988 gdb      CALL  write(0x1,0x8300000,0x18)
   988 gdb      GIO   fd 1 wrote 24 bytes
       "error detected on stdin
       "
   988 gdb      RET   write 24/0x18
   988 gdb      CALL  sigprocmask(0x1,0,0xbfbfe33c)
   988 gdb      RET   sigprocmask 0
   988 gdb      CALL  exit(0)
   986 sh       RET   wait4 988/0x3dc
   986 sh       CALL  exit(0)

The changes have the opposite problem.  They check for POLLIN but not
POLLHUP.  This depends on the unportable and broken but possibly
permittied behaviour of poll() always setting POLLIN when it sets
POLLHUP.

select() has no way to report POLLHUP, so it converts POLLHUP to
input-ready.  poll() should do the following:
- for hangup with buffered input, as in the above pipe case, set
  POLLHUP to indicate the hangup and POLLIN to indicate the data
- for hangup with no buffered input, set POLLHUP to indicate the
  hangup and clear POLLIN to indicate no data
but the usual misimplementation does:
- always set POLLIN when setting POLLHUP.  This makes poll() behave
  more like select(), so that buggy conversions to poll() work, but
  it means that POLLIN cannot be trusted for determining if more
  than non-null data can be read.  This is always done for regular
  files IIRC.  For regular files, there is never any buffered input
  POLLHUP with POLLIN really means POLLHUP without POLLIN.
Another possible misimplementation that I have not observed is:
- delay setting POLLHUP until all buffered input has been read or
  discardered.  This would work OK for terminals since terminals
  are specified to discard input on hangup (but there are many races
  at low levels with the timing of the hangup and the input -- many
  drivers lose input much like gdb does -- they deliver the hangup
  to upper layers synchronously, but deliver buffered input later).
  However, for pipes the input is not discarded, so POLLHUP should
  not be delayed, so that applications can decide if they want to
  discard the input on hanhyp.

The poll regression tests check which version of the bugs are implemented
for pipes and sockets.  They only pass for some file types because the
bugs are implemented for others.  The buggy cases are system-dependent.

Due to the bugs, POLLIN together with POLLHUP can never be trusted.
Applications should use similar code to after select() to detect EOF:

        switch (events & (POLLHUP | POLLIN)) {
        case POLLHUP:
                /* Non-buggy case. */
                exit(0);
        case POLLIN:
                /* Normal input. */
                nr = read(fd, buf, size);
                if (nr == 0)
                        goto maybe_eof; /* likely EOF, but check again */
                handle_input(nr);
                break;
        case POLLHUP | POLLIN:
                nr = read(fd, buf, size);
                if (nr == 0)
                        /*
                         * There may have been input that was eaten by a
                         * race, but any new input would be associated with
                         * a new connection and we don't want to see it
                         * even if POLLHUP is not sticky for this file type.
                         */
                        exit(0);
        }

or in a sloppier version, just convert POLLHUP to POLLIN like select() does,
and assume that that other threads don't eat the input so that reading 0
means EOF:

        if (events & POLLHUP)
                events |= POLLIN;
        if (events & POLLIN) {
                nr = read(fd, buf, size);
                if (nr == 0)
                        exit(0);        /* only likely EOF, but don't check */
        }
        handle_input(nr);

Bruce
_______________________________________________
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"

Reply via email to