Re: CVS problem with ssh
Richard M. Stallman wrote: >It's possible that this fix is a full and correct solution. If the >descriptor has room for at least one byte, `write' won't return >EAGAIN; it will write at least one byte. It may write less than the >whole buffer that was specified. If the stdio code handles that case >correctly, the results should be correct. > Are you telling me that the glibc implementation of streams provides a buffer with infinite space? POSIX 1003.1 says fwrite() returns errors as fputc(). POSIX 1003.1 says about fputc: The fputc() function shall fail if either the STREAM is unbuffered or the STREAM's buffer needs to be flushed, and: [EAGAIN] The O_NONBLOCK flag is set for the file descriptor underlying /stream/ and the thread would be delayed in the write operation. >Roland, does GNU libc >handle that case correctly? I expect it does, since that case can >arise in many situations. If it were not handled correctly, we would >hear lots of screaming--I think. > > I bumped CVS's test for this error up to dumping about 110MB of data to stdout, and it still passes, but not for the reason you think, I think. I'm watching the file fill more slowly than it should if limited only by the disk, and the data is coming from both the network and diff. I'm guessing that the file writes are simply usually faster than the reads of data produced at a high CPU cost and transferred across a network. Regardless of the fact that this will usually hold true, the algorithm still contains a race condition, might not hold true across diverse platforms, and is not a true fix. Regards, Derek ___ Bug-cvs mailing list Bug-cvs@gnu.org http://lists.gnu.org/mailman/listinfo/bug-cvs
Re: CVS problem with ssh
Richard M. Stallman writes: > > I'm glad to hear that the new CVS version has a fix that may prevent > the problem. Maybe this problem is fixed now. But we should verify that. It's not a fix, it's a temporary workaround that avoids the problem in some, but not all, cases. > It's possible that this fix is a full and correct solution. If the > descriptor has room for at least one byte, `write' won't return > EAGAIN; it will write at least one byte. It may write less than the > whole buffer that was specified. If the stdio code handles that case > correctly, the results should be correct. That depends on your definition of "correctly". I suspect that all stdio implementations handle short writes by immediately doing another write for the remaining data, which in this case will immediately fail with EAGAIN and we're back to where we started. > If the current fix does not make it correct, please do make the larger > fix--do your own buffering. That is surely not really a large job. Replacing stdio isn't a large job?!? > As this case shows, other > processes that share the descriptor have the reasonable expectation > that they can take an inherited descriptor and make it non-blocking. That is an entirely *un*reasonable expectation. Like I said before, nonblocking mode does not affect the descriptor, it affects the underlying file and it violates all of the conventions of how files are supposed to behave. Thus, it cannot reasonably be used except under very tightly controlled conditions -- when you know that you have exclusive use, not just of the descriptor, but of the underlying file. Typically, it's used on newly-created sockets, which do satisfy those conditions. Ssh is trying to use it on standard descriptors that it inherits from an unknown source that almost certainly do not satisfy those conditions. That's just plain wrong. Note that rsh has always worked just fine without using non-blocking mode for *anything*, ssh should certainly be able to get away without using it on its standard files. > Rather that deciding what to fix by first placing blame, we can do > better if we look at how we can contribute to making the system work. Agreed, which is why we adopted the workaround. But it's not "placing blame", it's determining the root of the problem. > The right program to fix, in a case like this, is whichever one we > actually can get fixed. No, the right program to fix is the one that's broken: ssh. -Larry Jones Whatever it is, it's driving me crazy! -- Calvin ___ Bug-cvs mailing list Bug-cvs@gnu.org http://lists.gnu.org/mailman/listinfo/bug-cvs
Re: CVS problem with ssh
* client.c (handle_m): Workaround to deal with stdio getting put into non-blocking via redirection of stderr and interaction with ssh on some platforms. On those boxes, stdio can put stdout unexpectedly into non-blocking mode which may lead to fwrite() or fflush() failing with EAGAIN, but cvs not checking for the error. (Patch suggested by Frank Hemer <[EMAIL PROTECTED]>.) I'm glad to hear that the new CVS version has a fix that may prevent the problem. Maybe this problem is fixed now. But we should verify that. It's possible that this fix is a full and correct solution. If the descriptor has room for at least one byte, `write' won't return EAGAIN; it will write at least one byte. It may write less than the whole buffer that was specified. If the stdio code handles that case correctly, the results should be correct. Roland, does GNU libc handle that case correctly? I expect it does, since that case can arise in many situations. If it were not handled correctly, we would hear lots of screaming--I think. If the current fix is correct, great! If the current fix does not make it correct, please do make the larger fix--do your own buffering. That is surely not really a large job. It is worth that much effort to fix such a serious bug. I agree with Larry Jones, that the problem is deeper. It seems to me that any program writing to stderr or stdout via stdio should have the reasonable expectation that the stream will remain blocking, or perhaps remain non-blocking, as the case may be. That would clearly be "the right thing", but it cannot be achieved without a basic redesign of POSIX. As this case shows, other processes that share the descriptor have the reasonable expectation that they can take an inherited descriptor and make it non-blocking. These two expectations, though both reasonable, conflict, and that is what made this problem so hard to fix. People argued for years about which program is wrong, and meanwhile we remained in an impasse. Rather that deciding what to fix by first placing blame, we can do better if we look at how we can contribute to making the system work. The right program to fix, in a case like this, is whichever one we actually can get fixed. I am grateful to Mark Baushke for approaching the situation that way. The same would need to be done in any program piping output through ssh, some of which don't have CVS's infrastructure, and this seems unreasonable. This scenario concerns output from ssh that is read by CVS. It is true that any program that forks a remote-login program and reads that program's output thru a pipe is liable to encounter this issue, just as CVS does. It would be nice if that were fixed in a more global way, if correct use of ssh were simpler. Perhaps someday that will happen--but in the mean time, let's fix this bug for sure. ___ Bug-cvs mailing list Bug-cvs@gnu.org http://lists.gnu.org/mailman/listinfo/bug-cvs