Re: CVS problem with ssh

2005-06-23 Thread Derek Price
Richard M. Stallman wrote:

>It's possible that this fix is a full and correct solution.  If the
>descriptor has room for at least one byte, `write' won't return
>EAGAIN; it will write at least one byte.  It may write less than the
>whole buffer that was specified.  If the stdio code handles that case
>correctly, the results should be correct.
>

Are you telling me that the glibc implementation of streams provides a
buffer with infinite space?  POSIX 1003.1 says fwrite() returns errors
as fputc().  POSIX 1003.1 says about fputc:

The fputc() function shall fail if either the STREAM is unbuffered
or the STREAM's buffer needs to be flushed, and:

[EAGAIN]

The O_NONBLOCK flag is set for the file descriptor underlying
/stream/ and the thread would be delayed in the write operation.



>Roland, does GNU libc
>handle that case correctly?  I expect it does, since that case can
>arise in many situations.  If it were not handled correctly, we would
>hear lots of screaming--I think.
>  
>

I bumped CVS's test for this error up to dumping about 110MB of data to
stdout, and it still passes, but not for the reason you think, I think. 
I'm watching the file fill more slowly than it should if limited only by
the disk, and the data is coming from both the network and diff.  I'm
guessing that the file writes are simply usually faster than the reads
of data produced at a high CPU cost and transferred across a network. 
Regardless of the fact that this will usually hold true, the algorithm
still contains a race condition, might not hold true across diverse
platforms, and is not a true fix.

Regards,

Derek



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-06-23 Thread Larry Jones
Richard M. Stallman writes:
> 
> I'm glad to hear that the new CVS version has a fix that may prevent
> the problem.  Maybe this problem is fixed now.  But we should verify that.

It's not a fix, it's a temporary workaround that avoids the problem in
some, but not all, cases.

> It's possible that this fix is a full and correct solution.  If the
> descriptor has room for at least one byte, `write' won't return
> EAGAIN; it will write at least one byte.  It may write less than the
> whole buffer that was specified.  If the stdio code handles that case
> correctly, the results should be correct.

That depends on your definition of "correctly".  I suspect that all
stdio implementations handle short writes by immediately doing another
write for the remaining data, which in this case will immediately fail
with EAGAIN and we're back to where we started.

> If the current fix does not make it correct, please do make the larger
> fix--do your own buffering.  That is surely not really a large job.

Replacing stdio isn't a large job?!?

> As this case shows, other
> processes that share the descriptor have the reasonable expectation
> that they can take an inherited descriptor and make it non-blocking.

That is an entirely *un*reasonable expectation.  Like I said before,
nonblocking mode does not affect the descriptor, it affects the
underlying file and it violates all of the conventions of how files are
supposed to behave.  Thus, it cannot reasonably be used except under
very tightly controlled conditions -- when you know that you have
exclusive use, not just of the descriptor, but of the underlying file. 
Typically, it's used on newly-created sockets, which do satisfy those
conditions.  Ssh is trying to use it on standard descriptors that it
inherits from an unknown source that almost certainly do not satisfy
those conditions.  That's just plain wrong.  Note that rsh has always
worked just fine without using non-blocking mode for *anything*, ssh
should certainly be able to get away without using it on its standard
files.

> Rather that deciding what to fix by first placing blame, we can do
> better if we look at how we can contribute to making the system work.

Agreed, which is why we adopted the workaround.  But it's not "placing
blame", it's determining the root of the problem.

> The right program to fix, in a case like this, is whichever one we
> actually can get fixed.

No, the right program to fix is the one that's broken: ssh.

-Larry Jones

Whatever it is, it's driving me crazy! -- Calvin


___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-06-23 Thread Richard M. Stallman
* client.c (handle_m): Workaround to deal with stdio getting put
into non-blocking via redirection of stderr and interaction with
ssh on some platforms. On those boxes, stdio can put stdout
unexpectedly into non-blocking mode which may lead to fwrite() or
fflush() failing with EAGAIN, but cvs not checking for the error.
(Patch suggested by Frank Hemer <[EMAIL PROTECTED]>.)

I'm glad to hear that the new CVS version has a fix that may prevent
the problem.  Maybe this problem is fixed now.  But we should verify that.

It's possible that this fix is a full and correct solution.  If the
descriptor has room for at least one byte, `write' won't return
EAGAIN; it will write at least one byte.  It may write less than the
whole buffer that was specified.  If the stdio code handles that case
correctly, the results should be correct.  Roland, does GNU libc
handle that case correctly?  I expect it does, since that case can
arise in many situations.  If it were not handled correctly, we would
hear lots of screaming--I think.

If the current fix is correct, great!

If the current fix does not make it correct, please do make the larger
fix--do your own buffering.  That is surely not really a large job.
It is worth that much effort to fix such a serious bug.

I agree with Larry Jones, that the problem is deeper.  It seems to me
that any program writing to stderr or stdout via stdio should have the
reasonable expectation that the stream will remain blocking, or perhaps
remain non-blocking, as the case may be.

That would clearly be "the right thing", but it cannot be achieved
without a basic redesign of POSIX.  As this case shows, other
processes that share the descriptor have the reasonable expectation
that they can take an inherited descriptor and make it non-blocking.
These two expectations, though both reasonable, conflict, and that is
what made this problem so hard to fix.  People argued for years about
which program is wrong, and meanwhile we remained in an impasse.

Rather that deciding what to fix by first placing blame, we can do
better if we look at how we can contribute to making the system work.
The right program to fix, in a case like this, is whichever one we
actually can get fixed.

I am grateful to Mark Baushke for approaching the situation that way.

  The
same would need to be done in any program piping output through ssh,
some of which don't have CVS's infrastructure, and this seems unreasonable.

This scenario concerns output from ssh that is read by CVS.  It is
true that any program that forks a remote-login program and reads that
program's output thru a pipe is liable to encounter this issue, just
as CVS does.  It would be nice if that were fixed in a more global
way, if correct use of ssh were simpler.  Perhaps someday that will
happen--but in the mean time, let's fix this bug for sure.



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs