gmake write error and possible solution

2009-01-06 Thread Vadim Zhukov
Hello all.

Posting this to the misc@ list because it does not look like a problem
with the port itself.

Recently I have started running into GMake's "write error" problem (too)
often. It was reported here some time ago, with no result. After some
more digging I found this commit in DragonFlyBSD:

http://www.mail-archive.com/commits%40crater.dragonflybsd.org/msg02534.html

 Log:
 Do not set O_NONBLOCK on a threaded program's descriptors any more. 
 Instead, use the new system calls to directly issue non-blocking I/O. 
 Additionally, force blocking I/O for debug output.

 This partly solves the problem of programs such as bmake or gmake
 fork/exec'd children which happen to be threaded.  The children would
 set O_NONBLOCK on e.g. stdin, stdout, and stderr, resulting in
 unexpected operation if the unrelated parent program tries to issue a
 read or write.

 Solves: gmake 'write error' problem

Can anyone experienced comment on this, please?

-- 
  Best wishes,
Vadim Zhukov



Re: gmake write error and possible solution

2009-01-06 Thread Ted Unangst
On Tue, Jan 6, 2009 at 6:47 PM, Vadim Zhukov persg...@gmail.com wrote:
 Recently I have started running into GMake's "write error" problem (too)
 often. It was reported here some time ago, with no result. After some
 more digging I found this commit in DragonFlyBSD:

 http://www.mail-archive.com/commits%40crater.dragonflybsd.org/msg02534.html

 Log:
 Do not set O_NONBLOCK on a threaded program's descriptors any more.
 Instead, use the new system calls to directly issue non-blocking I/O.
 Additionally, force blocking I/O for debug output.

 This partly solves the problem of programs such as bmake or gmake
 fork/exec'd children which happen to be threaded.  The children would
 set O_NONBLOCK on e.g. stdin, stdout, and stderr, resulting in
 unexpected operation if the unrelated parent program tries to issue a
 read or write.

 Solves: gmake 'write error' problem

 Can anyone experienced comment on this, please?

We don't have whatever these new syscalls are and are unlikely to
adopt them, so I don't think the fix is particularly relevant to
OpenBSD.  But yeah, faking threads in userland causes trouble.  If we
replace the thread library with a better one, then the problem goes
away.  Maybe.

Let me qualify that.  The reason for the maybe is that there can be
many reasons for a program to set stdout to non-blocking.  It may not
always be the result of pthread fiddling.  So gmake is still wrong.
If its behavior depends on whether a fd is set nonblocking in a child,
that's a problem.  Just a problem that occurs less frequently without
threads, it seems.
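
To illustrate the point about a child changing a descriptor's mode, here is a minimal sketch in C. It assumes a plain POSIX system and uses a pipe in place of stdout; the function name nonblock_leaks_from_child is hypothetical and not taken from gmake or any thread library. The child sets O_NONBLOCK on a descriptor it inherited, and the parent, which never touched the flags, then checks what it sees.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo helper.
 * Returns 1 if O_NONBLOCK set by a child on an inherited descriptor is
 * visible in the parent afterwards, 0 if it is not, -1 on error.
 */
int nonblock_leaks_from_child(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: switch the inherited write end to non-blocking, then exit. */
        int cfl = fcntl(fds[1], F_GETFL);
        if (cfl == -1 || fcntl(fds[1], F_SETFL, cfl | O_NONBLOCK) == -1)
            _exit(1);
        _exit(0);
    }

    /* Parent: wait for the child, then inspect its own copy of the fd. */
    int status;
    int result = -1;
    if (waitpid(pid, &status, 0) != -1 &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        int fl = fcntl(fds[1], F_GETFL);
        if (fl != -1)
            result = (fl & O_NONBLOCK) ? 1 : 0;
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

On a POSIX system the function should return 1: the parent's next write() on that descriptor can fail with EAGAIN even though the parent never asked for non-blocking I/O.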



Re: gmake write error and possible solution

2009-01-06 Thread Philip Guenther
On Tue, Jan 6, 2009 at 5:07 PM, Ted Unangst ted.unan...@gmail.com wrote:
...
 Let me qualify that.  The reason for the maybe is that there can be
 many reasons for a program to set stdout to non-blocking.  It may not
 always be the result of pthread fiddling.  So gmake is still wrong.
 If its behavior depends on whether a fd is set nonblocking in a child,
 that's a problem.  Just a problem that occurs less frequently without
 threads, it seems.

Some of us wish that the non-blocking flag were a per-descriptor flag
(like FD_CLOEXEC) instead of a file table flag, shared by every
descriptor that refers to the same open file, like it really is**;
this would never have been an issue then.
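
The difference between the two kinds of flag can be seen with dup(). The following is a small sketch, assuming standard POSIX fcntl() semantics; the struct and function names are made up for the illustration. It sets both FD_CLOEXEC and O_NONBLOCK on one descriptor and then reads the flags back through a duplicate of it.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical result type: which flags showed up on the duplicate. */
struct flag_sharing {
    int cloexec_shared;   /* 1 if FD_CLOEXEC appeared on the duplicate */
    int nonblock_shared;  /* 1 if O_NONBLOCK appeared on the duplicate */
};

/* Returns 0 and fills *out on success, -1 on error. */
int probe_flag_sharing(struct flag_sharing *out)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    int rc = -1;
    int copy = dup(fds[1]);   /* second descriptor, same file table entry */
    if (copy != -1) {
        int fl = fcntl(fds[1], F_GETFL);
        /* Set both flags on the original descriptor only. */
        if (fl != -1 &&
            fcntl(fds[1], F_SETFD, FD_CLOEXEC) != -1 &&
            fcntl(fds[1], F_SETFL, fl | O_NONBLOCK) != -1) {
            /* Read them back through the duplicate. */
            int copy_fd_flags = fcntl(copy, F_GETFD);
            int copy_fl_flags = fcntl(copy, F_GETFL);
            if (copy_fd_flags != -1 && copy_fl_flags != -1) {
                out->cloexec_shared = (copy_fd_flags & FD_CLOEXEC) != 0;
                out->nonblock_shared = (copy_fl_flags & O_NONBLOCK) != 0;
                rc = 0;
            }
        }
        close(copy);
    }

    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

FD_CLOEXEC stays private to the descriptor it was set on, while O_NONBLOCK shows up on the duplicate as well. Descriptors inherited across fork() share the file table entry in the same way, which is how a child's change reaches its parent.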

As for this being a bug in gmake, well, the same bug exists in *lots*
of programs.  I used to hit it all the time with the system 'vi' when
debugging a threaded program that crashed, leaving the session's
std{in,out,err} as non-blocking.  That mostly went away when the
system ksh started resetting the terminal to blocking when the
foreground process exited, but you can still hit it by running 'vi'
from inside a threaded program (with system()), then stopping and
starting the program and vi with ^Z and fg:
  Error: input: Resource temporarily unavailable

Notice that resetting the state at startup isn't enough.  Since the
state could be changed by another process at any moment, you actually
have to replace each should-be-blocking call with "try it, then poll()
and loop if EAGAIN" logic...which probably isn't correct for a
terminal device in non-canonical mode.  Altering almost every program
on the system to do that seems like the Wrong Thing to me.


Philip Guenther

** Yes, yes, there would have had to be some way to specify
non-blocking open().  If we lived in that universe, the details would
have been worked out already.



Re: gmake write error and possible solution

2009-01-06 Thread Ted Unangst
On Tue, Jan 6, 2009 at 8:51 PM, Philip Guenther guent...@gmail.com wrote:
 As for this being a bug in gmake, well, the same bug exists in *lots*
 of programs.  I used to hit it all the time with the system 'vi' when
 debugging a threaded program that crashed, leaving the session's
 std{in,out,err} as non-blocking.  That mostly went away when the
 system ksh started resetting the terminal to blocking when the
 foreground process exited, but you can still hit it by running 'vi'
 from inside a threaded program (with system()), then stopping and
 starting the program and vi with ^Z and fg:
  Error: input: Resource temporarily unavailable

 Notice that resetting the state at startup isn't enough.  Since the
 state could be changed by another process at any moment, you actually
 have to replace each should-be-blocking call with "try it, then poll()
 and loop if EAGAIN" logic...which probably isn't correct for a
 terminal device in non-canonical mode.  Altering almost every program
 on the system to do that seems like the Wrong Thing to me.

My opinion is that for vi this is more a corner case.  I think it's
reasonable for vi to assume it has blocking fds to start, and for the
shell to enforce that. Same for any other app that doesn't anticipate
being toggled with another app on console.  But gmake is actively
execing other jobs.  It *knows* that other processes are running and
that they are likely writing to stdout, so it should handle this case.

Fixing every program that writes out data to use a loop is certainly
overkill, but I don't think fixing every program that uses fork+exec
to reset or deal with non-blocking shared descriptors is too much to
ask.
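
One way the "reset" option could look in a program that forks and execs jobs is sketched below. This is an assumption-laden illustration, not how gmake is actually written: force_blocking and wait_and_restore_stdio are hypothetical names, and the sketch only repairs the descriptors after a child has exited, so a child that is still running can change the mode again at any time.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: clear O_NONBLOCK on fd if it is set.
 * Returns 0 on success, -1 on error.
 */
int force_blocking(int fd)
{
    int fl = fcntl(fd, F_GETFL);
    if (fl == -1)
        return -1;
    if ((fl & O_NONBLOCK) == 0)
        return 0;                       /* already blocking */
    return fcntl(fd, F_SETFL, fl & ~O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Hypothetical helper: reap a child, then undo any O_NONBLOCK it may
 * have left behind on the shared stdin, stdout and stderr.
 * Returns whatever waitpid() returned, with its errno preserved.
 */
pid_t wait_and_restore_stdio(pid_t pid, int *status)
{
    pid_t r = waitpid(pid, status, 0);
    int saved_errno = errno;

    for (int fd = STDIN_FILENO; fd <= STDERR_FILENO; fd++)
        (void)force_blocking(fd);       /* ignore closed descriptors */

    errno = saved_errno;
    return r;
}
```

A caller would use wait_and_restore_stdio() wherever it currently calls waitpid() for a job. With several jobs running in parallel this narrows the window rather than closing it, so the "deal with" option (retrying on EAGAIN) may still be needed for writes made while children are alive.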