Hello,

On 2020-08-16 03:04, Rob Landley wrote:
On 8/15/20 6:43 AM, Ariadne Conill wrote:
On 2020-08-15 05:43, Rob Landley wrote:
On 8/13/20 10:02 AM, Rob Landley wrote:
On 8/11/20 2:25 PM, enh via Toybox wrote:
The key issues here turned out to be that getty is responsible for
creating the file if it doesn't exist, and that the -H flag doesn't
control whether utmp is updated, but whether or not to override the
hostname within the utmp entry.

While I'm here switch to the more modern utx APIs that all the non-pending
parts of toybox use, and remove the duplication.

Applied and pushed, but this reminds me I should really clean this up and
promote it.

The reason I haven't is I don't really have a test environment for it? (This
waits for a modem to dial in to a serial port, adjusts the baud rate, and calls
login. I got my first broadband connection in 2001 and haven't owned a modem
since.)

Do you have the FOGGIEST idea why:

      xopen_stdio(TT.tty_name, O_RDWR|O_NDELAY|O_CLOEXEC);
      fcntl(0, F_SETFL, fcntl(0, F_GETFL) & ~O_NONBLOCK); // Block read

O_NDELAY == O_NONBLOCK so it opens the tty nonblock and then immediately
switches off the nonblock. What (if anything) does that DO? Why is it here? Is
this something BSD 2.x needed in 1983 because of certain defective ASR-33
teletype variants when plugged into one of the HP minicomputers with the "zero
and add packed" instruction?

The man page is unenlightening, and the kernel source has 373 _files_ with
O_NONBLOCK in them...
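
For readers without the toybox source at hand, the pattern in those two lines can be sketched with plain POSIX calls. This is only an illustration under stated assumptions: open_then_block is a hypothetical name, and it calls open() directly rather than toybox's xopen_stdio().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not toybox code): open path with O_NONBLOCK so the
 * flag applies to open() itself, then clear it again so that later
 * read()/write() calls on the descriptor block as usual.
 * Returns the descriptor, or -1 on error. */
int open_then_block(const char *path)
{
  int fd, flags;

  assert(path != NULL);
  fd = open(path, O_RDWR|O_NONBLOCK|O_CLOEXEC);
  if (fd == -1) return -1;

  flags = fcntl(fd, F_GETFL);
  if (flags == -1 || fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1) {
    close(fd);
    return -1;
  }
  return fd;
}
```

Whatever the flag changes, it can only change it during open(), because the flag is gone again before the first read().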

In traditional SysV (with streams), O_NDELAY does not behave the same as
O_NONBLOCK.

Linux explicitly rejected supporting "streams" 22 years ago:

   https://lkml.org/lkml/1998/6/28/138

On Linux, one of those symbols is #defined to the other:

   glibc: /usr/include/asm-generic/fcntl.h:#define O_NDELAY     O_NONBLOCK
   musl: arch/x86_64/bits/fcntl.h:#define O_NDELAY O_NONBLOCK
   bionic: libc/kernel/uapi/asm-generic/fcntl.h:#define O_NDELAY O_NONBLOCK

And that's been true since 0.0.1:

   ~/linux/linux$ git checkout v0.0.1
   Previous HEAD position was a068026b4a06... Linux 1.0
   HEAD is now at cff5a6fb6676... Linux-0.01 (September 17, 1991)
   ~/linux/linux$ grep -r O_NDELAY include
   include/fcntl.h:#define O_NDELAY     O_NONBLOCK

(Yes, I have a git tree back to 0.0.1, http://landley.net/kdocs/fullhist/)
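
The same equivalence can be checked against whichever libc headers are installed, instead of grepping for it. A minimal sketch, assuming a C11 compiler; ndelay_is_nonblock is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <fcntl.h>

/* Refuses to compile against headers where the two flags differ. */
static_assert(O_NDELAY == O_NONBLOCK, "O_NDELAY is not O_NONBLOCK here");

/* Hypothetical helper: the same comparison as a runtime value,
 * 1 if the flags are identical, 0 otherwise. */
int ndelay_is_nonblock(void)
{
  return O_NDELAY == O_NONBLOCK;
}
```

On a system with distinct values the build fails at the static_assert line; drop that line to get the answer at runtime instead.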

Right. Mainline Linux has never implemented streams. I was just discussing why O_NONBLOCK is a thing to begin with, versus just using O_NDELAY in all cases. There is, however, a third-party implementation of streams for Linux that is used for some telecom software.

I'm trying to figure out if modern Linux needs those two lines of code I posted.

It doesn't.

In SysV, O_NDELAY means that a read from a file descriptor that does not have
any data (or a write which would extend the sendq length above the kernel
watermark) would immediately return 0 and not set an errno.

Did I mention that computer history is a hobby of mine? :)

Although Linus used the printed Solaris manuals in his university library to
implement the second round of Linux system calls in 1991 (because he couldn't
afford a copy of the posix spec and nobody would give him one when he asked
https://www.kclug.org/old_archives/linux-activists/1992/jul/4/0335.shtml so he
used what was available), and that did give Linux 0.0.1 a "System V flavor"
several people at the time commented on (since the switch from SunOS to Solaris
was AT&T lawyers convincing Sun to rebase from BSD to System V for licensing
reasons, as detailed in Robert Young's book "Under the Radar"), that's the
extent of Linux's System V connection.

That is an interesting anecdote, I never knew that Linus used the Solaris manuals.

Linux was a fresh from-scratch implementation written under Minix and using the
Minix filesystem format the summer after Linus took a unix internals course
using Andrew Tanenbaum's Minix textbook and initially posted to comp.os.minix
and recruiting existing minix developers from there who maintained minix patch
stacks that could never go upstream for licensing reasons.

But it's not a minix clone either: Torvalds rejected Minix's microkernel
approach and did a new monolithic kernel, a design the author of Minix strongly
and publicly disapproved of:

   https://www.oreilly.com/openbook/opensources/book/appa.html

It's entirely possible that this getty is blindly copying a procedure that dates
back to systemv, but proving a negative is a slightly higher bar than that. :(

This is basically where I was going with this. Most likely, that is the intention here: to get an O_NONBLOCK that behaves like legacy SysV O_NDELAY on SysV systems.

It is possible (likely, even) that early Linux contributors copied procedures that would have been appropriate on Solaris but in reality were a no-op on Linux due to following BSD semantics for O_NDELAY.

BSD also added O_NDELAY, which sets errno to EWOULDBLOCK and returns -1.  POSIX
standardized the BSD implementation, adding the O_NONBLOCK flag for it.

I'm confused, "30 years ago O_NDELAY did not behave the same as O_NONBLOCK" and
"POSIX standardized the BSD implementation, adding the O_NONBLOCK flag for it"?

So, basically during the UNIX wars, both the BSD camp and AT&T implemented O_NDELAY, but differently. POSIX standardized the BSD behavior and EWOULDBLOCK with it as O_NONBLOCK to solve the symbol collision.
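
The BSD/POSIX behavior described here is easy to observe on a current system. The following is a rough sketch using an empty pipe rather than a tty; empty_nonblocking_read is a hypothetical helper, not something from toybox or the thread.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read from an empty pipe whose read end has
 * O_NONBLOCK set. Stores read()'s return value in *ret and returns the
 * errno it set (0 if read() did not fail), or -1 if the setup failed. */
int empty_nonblocking_read(ssize_t *ret)
{
  int fds[2], flags, err = 0;
  char c;

  assert(ret != NULL);
  if (pipe(fds) == -1) return -1;

  flags = fcntl(fds[0], F_GETFL);
  if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }

  errno = 0;
  *ret = read(fds[0], &c, 1);  /* write end is still open, so this is not EOF */
  if (*ret == -1) err = errno;

  close(fds[0]);
  close(fds[1]);
  return err;
}
```

With POSIX semantics the read fails with -1 and EAGAIN (the same value as EWOULDBLOCK on Linux). Under the SysV O_NDELAY semantics described above, the same read would have returned 0 with no errno, which is indistinguishable from end of file.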

There is also the FIONBIO ioctl, which is still around because of this: in SysV, FIONBIO worked on file descriptors that were not backed by a stream, while O_NDELAY was used with streams. POSIX of course ensured that O_NONBLOCK worked for *all* cases.
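
For comparison, FIONBIO can still be used on Linux. The sketch below assumes Linux behavior, where the ioctl toggles the same O_NONBLOCK file status flag that fcntl(F_SETFL) manipulates; set_nonblocking_ioctl is a hypothetical name.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: switch non-blocking mode on (on != 0) or off
 * (on == 0) with the FIONBIO ioctl instead of fcntl().
 * Returns 0 on success, -1 on error with errno set by ioctl(). */
int set_nonblocking_ioctl(int fd, int on)
{
  assert(fd >= 0);
  return ioctl(fd, FIONBIO, &on);
}
```

After a successful call, fcntl(fd, F_GETFL) should report O_NONBLOCK set or cleared accordingly, so on Linux the two interfaces are interchangeable for this purpose.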

The oldest version of posix that's online seems to be
https://pubs.opengroup.org/onlinepubs/009695399/functions/open.html from 2001
(which still doesn't provide useful information about WHEN you'd need this flag,
under what circumstances does a SERIAL device block for nontrivial amounts of
time, what flow control? In open()? The Data Terminal Ready line?)

The Open Group doesn't host earlier versions, as explained in Q8 of their FAQ:

   http://www.opengroup.org/austin/papers/posix_faq.html

So, in SysV, this would give you a file descriptor that has "blocking" reads but
where the program itself does the blocking.

The code I posted fed the nonblock to open() and then immediately took it off
the resulting file descriptor, so only the behavior change to open itself would
be relevant.

On a system where O_NDELAY and O_NONBLOCK are the same, the behavior change of course results in the file descriptor becoming blocking.

I was wondering A) what the open() behavior change actually was, B) if the
behavior change was purely historic (either not applying to current kernels, or
not applying to any drivers still in modern use), which is why I was trying to
read the kernel source for where this flag is used internally (and finding too
many occurrences in too many drivers to wade through just then).

On Linux, as previously mentioned, you just wind up with an ordinary blocking FD, as you already know.

On SysV though, you would wind up with a descriptor that is O_NDELAY, but *not* O_NONBLOCK, which does have some semantic differences (namely that read() immediately bails with zero if there is no data without EWOULDBLOCK -- try it on an SCO machine sometime :)).

If I were to guess, this was a workaround for some bug in the SysV kernel and how it interacted with the termio subsystem. Streams are a nightmare, and Linux rightly rejected them.

Next post I tracked down at least some commentary on what the open() behavior
change was (open itself on serial devices can indeed block, still not sure under
which circumstances) and this turns that into an error return instead. And
either it's not needed because minicom doesn't do it, or minicom should probably
also do it.

Implementation-wise this flag mostly just gets passed to drivers which act upon
it, I need to audit the Linux tty path (which I read a couple files into
yesterday to see what the SIGHUP was about, it's sent from
drivers/tty/tty_jobctrl.c function disassociate_ctty() by the way) and also
audit the current set of serial drivers, and THEN maybe I'd have proved a
negative. (Modulo https://landley.net/toybox/faq.html#support_horizon which is
currently... linux v3.10.)

But for the moment I'm probably just keeping that code with a TODO comment on it
about maybe not being of modern relevance? If I was just adding it I'd leave it
off and see who complained, but Elliott is sending me bugfixes for the version
in pending which implies somebody is using it?

It's not one of the special 'tread lightly' pending entries AOSP is already
using (all of which I should get cleaned up and promoted after sh.c and 
route.c):

   $ sed -n 's@.*pending/\([^.]*\).*@\1@p' android/toybox/Android.bp | xargs
   dd diff expr getopt tr getfattr lsof modprobe more readelf stty traceroute vi

But it's not quite "rip the unexplained code out and hope for the best" 
either...

I would say it is likely safe to rip out the code. But as you mention, it is probably best to verify that the tty subsystem is behaving sanely first.

Ariadne
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net
