Re: CVS problem with ssh

2005-07-15 Thread Larry Jones
Richard M. Stallman writes:
> 
> If you prefer a different fix that _can be done by us_, that's fine
> with me.  Just as long as we fix the problem.

WE CAN'T FIX THE PROBLEM.

Nothing you do to the car can fix the potholes in the road.  You can
spend all the time and money you want upgrading the suspension to try to
smooth out the ride, but you're still going to destroy tires if the road
is bad enough.  If you want to fix it, go figure out how to fix ssh and
submit a patch.  Or convince the ssh maintainers to do it.  Here's
evidence that it affects things other than CVS:

$ cat bigfile | wc -c
 11458815
$ (ssh localhost sleep 10& sleep 3; cat bigfile) | wc -c
cat: stdout: Resource temporarily unavailable
  266240

ssh and cat are sharing the same stdout, so when ssh in the background
sets nonblocking mode, it screws up cat in exactly the same way as it
screws up CVS.

-Larry Jones

The living dead don't NEED to solve word problems. -- Calvin


___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-07-15 Thread Derek Price
Richard M. Stallman wrote:

>If the problem occurs with fflush, most likely it will also occur
>with fwrite, fclose, etc.  fflush is merely the guinea pig here.
>
>We may have to send bug reports for the systems where it does not always work.
>  
>

Agreed.  A failed retry in this case is no worse than before, when the
initial failure was not detected and data was lost.

Regards,

Derek



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-07-15 Thread Richard M. Stallman
That's essentially *ALL* of them.  How many time do I have to say it --
nonblocking mode violates the fundamental assumtions about how files are
supposed to behave

It didn't convince me the first time, and repeating it won't convince
me either.  We need to get this bug fixed, and saying it's ssh's fault
won't get there.

If you prefer a different fix that _can be done by us_, that's fine
with me.  Just as long as we fix the problem.



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-07-15 Thread Larry Jones
Richard M. Stallman writes:
> 
> We may have to send bug reports for the systems where it does not always work.

That's essentially *ALL* of them.  How many time do I have to say it --
nonblocking mode violates the fundamental assumtions about how files are
supposed to behave and virtually *no* code, including stdio libraries,
handles it unless it sets nonblocking mode itself.  Nor should it, in my
opinion.

-Larry Jones

The hardest part for us avant-garde post-modern artists is
deciding whether or not to embrace commercialism.  -- Calvin


___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-07-14 Thread Richard M. Stallman
If the problem occurs with fflush, most likely it will also occur
with fwrite, fclose, etc.  fflush is merely the guinea pig here.

We may have to send bug reports for the systems where it does not always work.
However, even on those systems, this is change likely to be a big improvement.
In practice, that case isn't the one that CVS normally encounters.

And on systems which don't have such a bug, it will work 100%.

You suggested an alternative approach to this problem three years ago,
here:

If they want to implement this in CVS, that would be fine.
However, you have not written code they can actually use.
I don't know if they want to write that code.

CVS developers, if you don't implement the other solution, please
install my code.

Possibly, but appears that the same bug is in Solaris 10 and in
Debian GNU/Linux 3.1.  If the bug is that widespread, we have
to deal with it somehow.

If the bug is present in GNU libc, we should fix it, and we will.
Can you make up a GNU Libc bug report?

We can probably get the fix into the stable version of Debian if
we work at it.



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-07-14 Thread Paul Eggert
"Richard M. Stallman" <[EMAIL PROTECTED]> writes:

> I am not sure what job that program does.  The program is too long for
> me to read it just to try to figure that out.

The program arranges for fflush to have a nontrivial amount of
buffered data to flush.  It then calls fflush, so the buffer is then
flushed into a pipe that is in O_NONBLOCK mode.  If the fflush fails
with errno == EAGAIN, the program retries the fflush.  The program
then checks that the appropriate amount of data was actually written.

> However, if it is impossible to do this job reliably for fflush,
> we can still adapt the solution to that fact, by using
>   #define fflush undefined_fflush
> so that people can modify the relevant apps not to use fflush.

If the problem occurs with fflush, most likely it will also occur
with fwrite, fclose, etc.  fflush is merely the guinea pig here.


You suggested an alternative approach to this problem three years ago,
here:

http://lists.gnu.org/archive/html/bug-cvs/2002-07/msg00162.html

The basic idea is to use a pipe and an intervening subprocess to work
around the problem.  That is, if the application detects that a file
is in O_NOBLOCK mode, it forks a subprocess to talk to that file
directly, and the parent process replaces O_NONBLOCKed file descriptor
with a pipe to the subprocess.  The pipe will not be in O_NONBLOCK
mode, so it will work reliably with the parent process's stdio code.
The child process will run code taken from the blocking-io module that
uses low-level read, write, and select (or poll) calls to do the I/O
reliably.

I have implemented a quick shell-script approximation to this
suggestion, in which the child process is simply "cat -u".  This
script is called "ssh4cvs".  You can find ssh4cvs here:

http://lists.gnu.org/archive/html/bug-gnulib/2004-09/msg3.html

I've been using ssh4cvs with GNU Emacs 21.4 and a recent CVS version
(currently 1.12.9 as patched by Debian) since September 2004.  It
works for me.

More details about this suggestion are here:

http://lists.gnu.org/archive/html/bug-cvs/2002-08/msg00032.html

There were some objections at the time to the overhead of
the extra process, but I agree with what you wrote in July 2002:

  Please don't dismiss that simple idea because it is "inefficient"; the
  inefficiency of this will usually be insignificant compared with the
  slowness of the network, and this needs to be solved somehow.

  



> Isn't that a bug in FreeBSD?

Possibly, but appears that the same bug is in Solaris 10 and in
Debian GNU/Linux 3.1.  If the bug is that widespread, we have
to deal with it somehow.


___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-07-14 Thread Richard M. Stallman
I just checked FreeBSD libc, and it appears that fflush fails once its
underlying write fails with EAGAIN.  So it appears that this approach
won't work under FreeBSD.  That's not a good sign.

Isn't that a bug in FreeBSD?

We can try to work around bugs in various systems, but it is not an
essential requirement that we do so.  If it is hard to work around
them, the best we can do is report them and hope they get fixed by and
by.

Can you write a program to detect whether fflush reliably restarts?  I
just wrote the program below to do that, derived from your code, and
it fails for me both with Debian GNU/Linux 3.1 r0a and with Solaris
10, so that's not a good sign either.

I am not sure what job that program does.  The program is too long for
me to read it just to try to figure that out.

However, if it is impossible to do this job reliably for fflush,
we can still adapt the solution to that fact, by using
  #define fflush undefined_fflush
so that people can modify the relevant apps not to use fflush.



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-07-13 Thread Richard M. Stallman
I am not sure it is useful to install this in Glibc.
Its main use is for application programs that want to be
more or less portable, and for that use, gnulib is exactly
the right place.


___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-07-13 Thread Paul Eggert
Derek Price <[EMAIL PROTECTED]> writes:

>>You have to include stdio.h first, then #undef fwrite, then
>>#define fwrite.  It can be done (Emacs did it, after all), but I don't
>>recommend it.
>>
> Isn't this the standard behavior for a GNULIB module's header file? 
> This is almost exactly what unlocked-io does, after all.

Yes, that's true.  I'm starting to unbend on that recommendation, for
gnulib code anyway.  It's not something that can be done casually, and
it certainly wouldn't have worked in the gnulib context 10 or 20 years
ago (because files could not routinely be reincluded back then), but
I think it's OK nowadays.


>>As far as I can tell POSIX doesn't guarantee that repeated fflushes
>>will eventually write all the data, with no duplicates, in this
>>situation.
>
> Maybe not, but isn't this by far the implication of EAGAIN in most other
> contexts?  Couldn't do what you asked this time, try *AGAIN*?  I can
> understand the problem with fwrite and an object size > 1 since there is
> no way for the function to return a partial item count, but once stdio
> has the data, why should fflush not track what was written properly? 

Because fflush gets an error.  Common practice is to stop writing when
you get an error.

> GNULIB:
>
>1. All the reasons I want this in GLIBC.
>2. Ease of importing GLIBC updates to this code into CVS.

How does the proposed blocking-io module interact with the existing
unlocked-io module in gnulib?  It seems to me that they are
incompatible, since they both attempt to #define fwrite, for example.
This is a fairly serious problem, I'd think.  Can this be fixed
(possibly by modifying unlocked-io as well), so that the two modules
are compatible?


> Aside from bugs, this is what I would expect from most fflush
> implementations.  How about we install the module and see what sort of
> problem reports we get?

That sounds pretty drastic.  How about if we try this idea out first,
before installing it?

I just checked FreeBSD libc, and it appears that fflush fails once its
underlying write fails with EAGAIN.  So it appears that this approach
won't work under FreeBSD.  That's not a good sign.

Can you write a program to detect whether fflush reliably restarts?  I
just wrote the program below to do that, derived from your code, and
it fails for me both with Debian GNU/Linux 3.1 r0a and with Solaris
10, so that's not a good sign either.

But perhaps I'm not understanding what you're trying to do.



/* The idea of this module is to help programs cope with output
   streams that have been set nonblocking.

   To use it, simply make these definitions in your other files:

   #define printf blocking_printf
   #define fprintf blocking_fprintf
   #define vprintf blocking_vprintf
   #define vfprintf blocking_vfprintf
   #undef putchar
   #define putchar blocking_putchar
   #undef putc
   #define putc blocking_putc
   #define fputc blocking_putc
   #define puts blocking_puts
   #define fputs blocking_fputs
   #define fwrite blocking_fwrite
   #define fflush blocking_fflush

   and link with this module.  */

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 


/* EWOULDBLOCK is not defined by POSIX, but some BSD systems will
   return it, rather than EAGAIN, for nonblocking writes.  */
#ifdef EWOULDBLOCK
# define blocking_error(err) ((err) == EWOULDBLOCK || (err) == EAGAIN)
#else
# define blocking_error(err) ((err) == EAGAIN)
#endif

int
blocking_fwrite (const void *buffer, size_t size, size_t count, FILE *stream)
{
  size_t count_left = count;
  size_t count_written = 0;
  int desc = fileno (stream);

  if (size == 0 || count == 0)
return 0;

  /* Handle the case where SIZE * COUNT overflows.
 (The comparison is designed to be conservative.)  */
  if (size > 1 && (double)size * (double)count > SIZE_MAX / 4)
{
  const char *p = (const char *) buffer;
  for (count_written = 0; count_written < count; count_written++)
{
  if (blocking_fwrite (p, sizeof *p, size, stream) < size)
break;
  p += size;
}
  return count_written;
}

  /* When SIZE is > 1, convert to the byte-wise case
 since that is the only way fwrite undertakes not to repeat
 writing part of the data.  */
  if (size > 1)
{
  size_t total = size * count;
  unsigned int value = blocking_fwrite (buffer, sizeof 1, total,
stream);
  /* It is ok, according to POSIX, if VALUE is not a multiple
 of SIZE, because fwrite in that case can write part of an object.  */
  return value / size;
}

  while (1)
{
  int written;
  fd_set set;

  written = fwrite (buffer, 1, count_left, stream);
  count_written += written;
  if (written == count_left)
break;
  if (blocking_error (errno))
break;

  /* Wait for space to be available.  */
  FD_ZERO (&set);
  FD_SET (desc, &set);
  select (desc + 1, NULL, &

Re: CVS problem with ssh

2005-07-13 Thread Derek Price
Hi bug-gnulib,

I'd like to reopen a discussion on this list from last September: 
.

At the end of that discussion I didn't create the blocking-io module due
mostly to Paul Eggert's objections (and a lack of time on my part,
especially for a solution which sounded like it would be rejected). 
Anyhow, RMS wanted the problem this fixes in CVS fixed badly enough that
he wrote most of the stdio wrappers we discussed himself.  The public
portion of that new discussion is archived here:
.

In any case, after a few iterations and testing in CVS on GNU/Linux, RMS
& I came up with the two files I've attached to this email.  I believe
they answer all of the objections raised in the first thread linked
above except Bruno's comment on porting to WOE32 and the following
comments, which I will attempt to address here.  The WOE32 issue and
conversion to a proper GNULIB module can wait until the disposition of
this code is settled.

Paul Eggert wrote:

>You have to include stdio.h first, then #undef fwrite, then
>#define fwrite.  It can be done (Emacs did it, after all), but I don't
>recommend it.
>

Isn't this the standard behavior for a GNULIB module's header file? 
This is almost exactly what unlocked-io does, after all.  In any case, I
think this is now addressed by the functionality in blocking-io.h.

>/ POSIX says fflush sets errno.  I'm worried about implementations that/
>/ don't conform (or perhaps conform to an older standard?)./


In the cases where fflush does not set errno, we are no worse off than
we were before.  The way the new wrappers are written, the function
behaviors should remain unchanged if errno != EAGAIN (or EWOULDBLOCK on
BSD) and when when they do not return short writes.

>As far as I can tell POSIX doesn't guarantee that repeated fflushes
>will eventually write all the data, with no duplicates, in this
>situation.  Here's the spec:
>http://www.opengroup.org/onlinepubs/009695399/functions/fflush.html
>

Maybe not, but isn't this by far the implication of EAGAIN in most other
contexts?  Couldn't do what you asked this time, try *AGAIN*?  I can
understand the problem with fwrite and an object size > 1 since there is
no way for the function to return a partial item count, but once stdio
has the data, why should fflush not track what was written properly? 
Aside from bugs, this is what I would expect from most fflush
implementations.  How about we install the module and see what sort of
problem reports we get?

Bruno Habile wrote:

>Larry Jones wrote:
>>/ they should be testing for writability with select before trying to/
>>/ write to stderr rather than just blindly putting stderr into nonblocking/
>>/ mode/
>
>Agreed: On platforms where select() or poll() is available, there is no
>reason to use the old non-blocking I/O facility.
>

With this fix, CVS and potentially other tools will work with the broken
SSH implementations and other, hypothetical, future tools, which are as
rude in their treatment of shared file descriptors, regardless of
whether the OpenSSH folks fix this issue.


I would like to see this installed in several locations for the
following reasons, all basically various aspects of the advantages of
code-sharing:

GLIBC:

   1. This module provides useful functionality often expected by
  programs: guaranteed blocking behavior from stdio.
   2. Would promote use in other tools, avoiding similar problems in the
  future.
   3. Would be ported to more platforms more quickly, decreasing
  liklihood of broken CVS releases.
   4. More eyes on this code would mean fewer bugs.


GNULIB:

   1. All the reasons I want this in GLIBC.
   2. Ease of importing GLIBC updates to this code into CVS.


CVS:

   1. This fix certainly shouldn't hurt anything, once the portability
  issues are dealt with.
   2. Will hopefully fix interaction with OpenSSH, and potentially other
  tools, on most platforms.


I might get away with installing this just in CVS, but I am afraid that
CVS doesn't have the manpower to deal with the potential porting issues
raised by various parties in the earlier discussions without the GLIBC
and GNULIB support, and RMS said he would back getting this into GLIBC &
GNULIB to get the fix into CVS.  Regardless, as I already stated, I
think this module provides useful functionality often expected by
programs: guaranteed blocking behavior from stdio.

Regards,

Derek
#ifndef BLOCKING_IO_H
# define BLOCKING_IO_H

/* Get va_list.  */
# include 
# include 

# define printf blocking_printf
# define fprintf blocking_fprintf
# define vprintf blocking_vprintf
# define vfprintf blocking_vfprintf
# undef putchar
# define putchar blocking_putchar
# undef putc
# define putc blocking_putc
# define fputc blocking_putc
# define puts blocking_puts
# define fputs blocking_fputs
# define fwrite blocking_fwrite
# define fflush blocking_fflush
# de

Re: CVS problem with ssh

2005-07-07 Thread Derek Price
Richard M. Stallman wrote:

>I wrote the code, checking the specs, and got it to compile correctly.
>I don't have any easy way to test it--Emacs is not well suited to that
>purpose.  Could you test it?
>  
>

You neglected to include a wrapper for at least fflush().

>  subst_fwrite (buffer, 1, out_count, stdout);
>
>  return out_count;
>  
>

You need to free BUFFER after writing it in all your *printf variants.

>int
>subst_fwrite (char *buffer, size_t size, size_t count, FILE *stream)
>  
>

POSIX says fwrite accepts a BUFFER of type `const void *', not `char *'.

>{
>  int count_left = count;
>  int desc = fileno (stream);
>
>  while (1)
>{
>  int written;
>  fd_set set;
>
>  written = fwrite (buffer, size, count_left, stream);
>  
>

According to POSIX, a short return from fwrite does not guarantee that a
portion of the following item was not written for items of SIZE > 1, so
you need to convert BUFFER to COUNT * SIZE items of size 1 and loop
around writes of that to avoid writing data twice.  I'm not sure of the
most appropriate way to deal with overflow in COUNT * SIZE here.

>  if (written == count_left)
>   break;
>  
>

If errno != EAGAIN here, the function should return the short item count
anyhow, without molesting errno.  For instance, POSIX states that fputc
(and thus fwrite) can return a short count with errno = ENOSPC for a
disk full error.  This wrapper should not continue to attempt writes in
such a case.

>  /* Wait for space to be available.  */
>  FD_ZERO (&set);
>  FD_SET (desc, &set);
>  select (desc + 1, NULL, &set, NULL, NULL);
>
>  buffer += written;
>  count_left -= written;
>}
>
>  return count;
>}
>  
>

Regards,

Derek



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-07-06 Thread Richard M. Stallman
I wrote the code, checking the specs, and got it to compile correctly.
I don't have any easy way to test it--Emacs is not well suited to that
purpose.  Could you test it?


/* The idea of this module is to help programs cope with output
   streams that have become nonblocking.

   To use it, simply make these definitions in your other files:

   #define printf subst_printf
   #define fprintf subst_fprintf
   #define vprintf subst_vprintf
   #define vfprintf subst_vfprintf
   #undef putchar
   #define putchar subst_putchar
   #undef putc
   #define putc subst_putc
   #define fputc subst_fputc
   #define puts subst_puts
   #define fputs subst_fputs
   #define fwrite subst_fwrite

   and link with this module.  */

#include 
#include 
#include 
#include 

int
subst_printf (const char *format, ...)
{
  int out_count;
  char *buffer;
  int size = 1000;

  while (1)
{
  va_list argptr;

  buffer = (char *) malloc (size);

  /* Format it into BUFFER, as much as fits.  */

  va_start (argptr, format);
  out_count = vsnprintf (buffer, size, format, argptr);
  va_end (argptr);
  /* OUT_COUNT now is number of bytes needed, not counting final null.  */

  if (out_count + 1 <= size)
break;

  /* If it did not fit, try again with more space.  */

  free (buffer);
  size = out_count + 1;
}

  /* Now we have the desired output in BUFFER.  Output it.  */

  subst_fwrite (buffer, 1, out_count, stdout);

  return out_count;
}

int
subst_fprintf (FILE *stream, const char *format, ...)
{
  int out_count;
  char *buffer;
  int size = 1000;

  while (1)
{
  va_list argptr;

  buffer = (char *) malloc (size);

  /* Format it into BUFFER, as much as fits.  */

  va_start (argptr, format);
  out_count = vsnprintf (buffer, size, format, argptr);
  va_end (argptr);
  /* OUT_COUNT now is number of bytes needed, not counting final null.  */

  if (out_count + 1 <= size)
break;

  /* If it did not fit, try again with more space.  */

  free (buffer);
  size = out_count + 1;
}

  /* Now we have the desired output in BUFFER.  Output it.  */

  subst_fwrite (buffer, 1, out_count, stream);

  return out_count;
}

int
subst_vprintf (const char *format, va_list ap)
{
  int out_count;
  char *buffer;
  int size = 1000;

  while (1)
{
  va_list argptr;

  buffer = (char *) malloc (size);

  /* Format it into BUFFER, as much as fits.  */

  va_copy (argptr, ap);
  out_count = vsnprintf (buffer, size, format, argptr);
  va_end (argptr);
  /* OUT_COUNT now is number of bytes needed, not counting final null.  */

  if (out_count + 1 <= size)
break;

  /* If it did not fit, try again with more space.  */

  free (buffer);
  size = out_count + 1;
}

  /* Now we have the desired output in BUFFER.  Output it.  */

  subst_fwrite (buffer, 1, out_count, stdout);

  return out_count;
}

int
subst_vfprintf (FILE *stream, const char *format, va_list ap)
{
  int out_count;
  char *buffer;
  int size = 1000;

  while (1)
{
  va_list argptr;

  buffer = (char *) malloc (size);

  /* Format it into BUFFER, as much as fits.  */

  va_copy (argptr, ap);
  out_count = vsnprintf (buffer, size, format, argptr);
  va_end (argptr);
  /* OUT_COUNT now is number of bytes needed, not counting final null.  */

  if (out_count + 1 <= size)
break;

  /* If it did not fit, try again with more space.  */

  free (buffer);
  size = out_count + 1;
}

  /* Now we have the desired output in BUFFER.  Output it.  */

  subst_fwrite (buffer, 1, out_count, stream);

  return out_count;
}

int
subst_putchar (char c)
{
  return (subst_fwrite (&c, 1, 1, stdout) < 1 ? EOF
  : (int) (unsigned char) c);
}

/* This serves for fputs also.  */
int
subst_putc (char c, FILE *stream)
{
  return (subst_fwrite (&c, 1, 1, stream) < 1 ? EOF
  : (int) (unsigned char) c);
}

int
subst_puts (char *s)
{
  size_t len = strlen (s);
  return (subst_fwrite (s, 1, len, stdout) < len ? EOF : 0);
}

int
subst_fputs (char *s, FILE *stream)
{
  size_t len = strlen (s);
  return (subst_fwrite (s, 1, len, stream) < len ? EOF : 0);
}

int
subst_fwrite (char *buffer, size_t size, size_t count, FILE *stream)
{
  int count_left = count;
  int desc = fileno (stream);

  while (1)
{
  int written;
  fd_set set;

  written = fwrite (buffer, size, count_left, stream);
  if (written == count_left)
break;
  
  /* Wait for space to be available.  */
  FD_ZERO (&set);
  FD_SET (desc, &set);
  select (desc + 1, NULL, &set, NULL, NULL);

  buffer += written;
  count_left -= written;
}

  return count;
}


___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-07-06 Thread Derek Price
Richard M. Stallman wrote:

>Not exactly.  If a solution like this were to go into GLIBC, then it
>becomes much more likely that it can be shared with GNULIB, which means
>that the infrastructure for it is already in CVS and that supporting it
>won't rely exclusively on the extremely small and overworked CVS team. 
>
>If this is what it takes for you to fix the bug, I will do this.
>  
>

If you can get this done, great!  I think it is the most complete way to
fix this bug since it also handles future (currently unknown and
arguably broken) tools that also set shared file descriptors to a
non-blocking status, as well as making the
stdio-always-blocks-and-no-data-sent-will-be-lost functionality
available to other applications that may need it.

Regards,

Derek



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-07-06 Thread Derek Price
Larry Jones wrote:

>Richard M. Stallman writes:
>  
>
>>If this is what it takes for you to fix the bug, I will do this.
>>
>>
>
>Instead of spending large amounts of effort on half-assed workarounds,
>why don't you make a concerted effort to explain the problem to the
>OpenSSH developers and encourage them to fix the actual problem?
>

Though I agree that it appears, at the least, to be rude to change the
blocking status of a potentially shared file descriptor, I can also see
the argument from the SSH side that they wish to be able to expect the
behavior they request from said file descriptor.

Since there is no actual standard on the matter, a widely available set
of *_block stdio variants seems a reasonable way to satisfy everybody,
to me, and could be used in other tools which encounter similar problems.

>  As far
>as I can tell, no one has actually bothered to do that, which leads me
>to believe that it's not really a serious problem for anyone.
>  
>

This is possible, though, as I previously presented, I am fairly certain
the race condition still exists.  Even if no one is currently reporting
problems, it may crop up again as networks and platforms change, or
perhaps as the new code gets tested on new variants of the same.

Of course, you may be correct that it makes the most sense to wait until
we see more actual problem reports with the new code before spending
time fixing it, but then, as long as someone else submits a complete
patch (i.e. it isn't my time being spent writing it), I don't mind
applying it so much since I am fairly sure the problem is still present,
if currently dormant.

Cheers,

Derek



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-07-06 Thread Richard M. Stallman
Not exactly.  If a solution like this were to go into GLIBC, then it
becomes much more likely that it can be shared with GNULIB, which means
that the infrastructure for it is already in CVS and that supporting it
won't rely exclusively on the extremely small and overworked CVS team. 

If this is what it takes for you to fix the bug, I will do this.


___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-07-06 Thread Larry Jones
Richard M. Stallman writes:
> 
> If this is what it takes for you to fix the bug, I will do this.

Instead of spending large amounts of effort on half-assed workarounds,
why don't you make a concerted effort to explain the problem to the
OpenSSH developers and encourage them to fix the actual problem?  As far
as I can tell, no one has actually bothered to do that, which leads me
to believe that it's not really a serious problem for anyone.

-Larry Jones

I'm not a vegetarian!  I'm a dessertarian. -- Calvin


___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-07-05 Thread Derek Price
Richard M. Stallman wrote:

>Why not have glibc incorporate the moral equivalent of the "unlocked"
>stdio routines (fwrite_unlocked, printf_unlocked, etc.), like
>fwrite_block?
>
>I am not sure what "unlocked" means, 
>

I'm referring to the non-thread-safe stdio variants exported by glibc
and supported as a GNULIB module.

>but any solution like this would
>only work on GNU.  It would be better to use an equally simple
>approach that will work on other systems also: to define simple
>functions inside CVS to replace fwrite, fprintf, etc.
>  
>

Not exactly.  If a solution like this were to go into GLIBC, then it
becomes much more likely that it can be shared with GNULIB, which means
that the infrastructure for it is already in CVS and that supporting it
won't rely exclusively on the extremely small and overworked CVS team. 
Even if it isn't shared with GNULIB, we can probably rely on the GLIBC
folks to handle portability issues and import a file or two from GLIBC
directly into the CVS tree.

>My suggestion is to write a function cvs_fprintf that uses vsnprintf
>and alloca so as to generate the entire message, and then calls fwrite
>in a suitable loop with select to output the message.  That should be
>simple enough.  How about it?
>
>I will write the function for you if you don't want to write it.
>It would give me an opportunity to restudy how varargs works.
>  
>

Again, I am less afraid of writing this than maintaining it, though I
feel I have the time to do neither just now.  The whole of stdio would
need replacement functions in CVS for this to work consistently: fwrite,
fflush, fprintf, printf, etc. - not just fwrite and not just fprintf. 
If anyone were to contribute such a complete patch, I would consider it
more seriously, still subject, of course, to further comment from the
other CVS developers.

To summarize, if such a patch were accepted into GNULIB, I would have no
problem importing the module into CVS.  Only accepted into GLIBC, I
would have to think about it some more and examine how easy it would be
to make the wrappers stand-alone.  Applied only to CVS, I remain
reluctant to accept the probable maintenance cost but may consider a
complete patch, pending further comment from the other CVS developers
and the results of tests on diverse platforms.

Regards,

Derek



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-07-01 Thread Richard M. Stallman
Why not have glibc incorporate the moral equivalent of the "unlocked"
stdio routines (fwrite_unlocked, printf_unlocked, etc.), like
fwrite_block?

I am not sure what "unlocked" means, but any solution like this would
only work on GNU.  It would be better to use an equally simple
approach that will work on other systems also: to define simple
functions inside CVS to replace fwrite, fprintf, etc.

  In fact then they (and CVS) could probably get it down to
some #defines in a header without changing the existing stdio calls.

That approach would still work if the functions are defined
by CVS itself.

   #define fprintf cvs_fprintf

My suggestion is to write a function cvs_fprintf that uses vsnprintf
and alloca so as to generate the entire message, and then calls fwrite
in a suitable loop with select to output the message.  That should be
simple enough.  How about it?

I will write the function for you if you don't want to write it.
It would give me an opportunity to restudy how varargs works.


___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-06-29 Thread Derek Price
Why not have glibc incorporate the moral equivalent of the "unlocked"
stdio routines (fwrite_unlocked, printf_unlocked, etc.), like
fwrite_block?  Similarly for printf_block, fprintf_block, etc.  Most of
the stdio code would not need to be reimplemented since it is already
present in glibc and applications which wanted to rely on the blocking
behavior of stdout and stderr could use these routines instead of the
usual stdio?  In fact then they (and CVS) could probably get it down to
some #defines in a header without changing the existing stdio calls.

Regards,

Derek



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-06-29 Thread Derek Price
Richard M. Stallman wrote:

>Again, we could install loops around fwrite in handle_m, handle_e,
>handle_mbinary, and handle_mt and transfer most of the data coming
>across from the server correctly to stdout/stderr,
>
>Why do you say "most"?  I think we have already established that _all_
>the data will be transferred correctly.  Isn't that so?
>
>  but this still
>wouldn't handle other client output.
>
>Could you describe a case in which there would be a problem?  It would
>have to be a case in which substantial data is forwarded from the
>server, but the client also generates substantial data.
>  
>

The client rarely generates substantial data.  The case I am considering
is when the server dumps a substantial amount of data, followed by any
length printf from the client.  The dump of data from the server could
conceivably fill the stdio buffer without triggering a flush.  Any
printf following that would lose data if the attempted flush returned
EAGAIN.

Regards,

Derek



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-06-28 Thread Richard M. Stallman
>could not do any harm on any system.  If it fixes the problem on GNU
>systems, that's an important improvement, so why not do it?
>  
>

It might not do any additional harm.  I won't claim to understand the
issue completely, but I was told that there might be data loss since
EAGAIN may not mean what it is supposed to on some systems.  I suppose
that we would be no worse off than we are now.

Exactly.  Would you please install that change now?

>It sounds like you can rely on stdio on a GNU system.  

In the loop we are avoiding, perhaps.

Yes, that's what I meant.  If you make this additional change,
it will work reliably with GNU Libc.  And if any other libc
doesn't do what's necessary for reliable operation, that's a pretty
clear bug in the other libc.

Again, we could install loops around fwrite in handle_m, handle_e,
handle_mbinary, and handle_mt and transfer most of the data coming
across from the server correctly to stdout/stderr,

Why do you say "most"?  I think we have already established that _all_
the data will be transferred correctly.  Isn't that so?

   but this still
wouldn't handle other client output.

Could you describe a case in which there would be a problem?  It would
have to be a case in which substantial data is forwarded from the
server, but the client also generates substantial data.

  It really seems
the much simpler solution for ssh to play nice with the standard file
descriptors.

It's not simple, it's impossible.  The developers of Openssh don't
agree with this position, and they won't make this change.  We have
two choices: fix it in CVS, or leave it broken for ever.

The users will be much better off if it is fixed in CVS, so won't you
please do so?  Fixing this problem is well worth the work it would
take to rewrite all the output routines of CVS.  Since the change
required is much less than that, the sooner it's done, the better.



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-06-28 Thread Derek Price
Richard M. Stallman wrote:

>  Currently CVS does a single select on the stdout fileno
>before dumping as much data as it last received from the network to
>stdout via fwrite.  There is *no looping or checking for EAGAIN
>returns*, which I believe you advocated but, I am informed by Larry
>Jones and Paul Eggert, would likely be unportable to non-GNU systems.
>
>What do you mean by "unportable"?  As far as I can see, that change
>could not do any harm on any system.  If it fixes the problem on GNU
>systems, that's an important improvement, so why not do it?
>  
>

It might not do any additional harm.  I won't claim to understand the
issue completely, but I was told that there might be data loss since
EAGAIN may not mean what it is supposed to on some systems.  I suppose
that we would be no worse off than we are now.

Paul Eggert was also claiming that a successful select on STDOUT_FILENO
wouldn't necessarily imply a successful write of even a single byte to
the stdout stream (in a loop context, so I don't believe he was
referring to any potential race condition), though I know of no set of
conditions where that would happen.

>It sounds like you can rely on stdio on a GNU system.  
>

In the loop we are avoiding, perhaps.  In its current context, obviously
not, which is where I wanted to be able to rely on it.

>If other
>systems don't handle this reliably, maybe that is their fault.  You
>could report bugs in fwrite if they fail to write the data and don't
>return EAGAIN.
>  
>

Again, we could install loops around fwrite in handle_m, handle_e,
handle_mbinary, and handle_mt and transfer most of the data coming
across from the server correctly to stdout/stderr, but this still
wouldn't handle other client output.  That would require a complete
stdio wrapper, replacing all our calls to printf, fprintf, fflush, etc. 
This would be serious work and not easily maintainable.  It really seems
the much simpler solution for ssh to play nice with the standard file
descriptors.

Regards,

Derek



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-06-27 Thread Richard M. Stallman
  Currently CVS does a single select on the stdout fileno
before dumping as much data as it last received from the network to
stdout via fwrite.  There is *no looping or checking for EAGAIN
returns*, which I believe you advocated but, I am informed by Larry
Jones and Paul Eggert, would likely be unportable to non-GNU systems.

What do you mean by "unportable"?  As far as I can see, that change
could not do any harm on any system.  If it fixes the problem on GNU
systems, that's an important improvement, so why not do it?

So would you please install that change?

  I would much prefer to be able to
rely on stdio for a situation like this.

It sounds like you can rely on stdio on a GNU system.  If other
systems don't handle this reliably, maybe that is their fault.  You
could report bugs in fwrite if they fail to write the data and don't
return EAGAIN.



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-06-27 Thread Derek Price
Richard M. Stallman wrote:

>The code currently works by making the assumption that network reads
>will always return chunks of data small enough to be written to stdout
>in a single call to fwrite after a single select on fileno (stdout).
>
>What would make some data too long?  The nmemb argument of fwrite is a
>size_t, and unless fwrite has a bug, it should handle whatever amount
>of data you specify.
>
>I don't see why a large amount of data would cause a failure.
>Previous messages claimed that it could, but I think I've refuted that
>claim.  Is there a flaw in the argument I gave?
>  
>

I've simply been describing what I believe is the current race
condition.  Perhaps you misunderstood the status of the current
workaround in CVS (you may be thinking "fix", but I am attempting to
refute that)?  Currently CVS does a single select on the stdout fileno
before dumping as much data as it last received from the network to
stdout via fwrite.  There is *no looping or checking for EAGAIN
returns*, which I believe you advocated but, I am informed by Larry
Jones and Paul Eggert, would likely be unportable to non-GNU systems.

Currently, this workaround appears to prevent the original problem, but
I think it is working because the select usually implies there is more
room for writing to stdout than we have data only because reading from
the network is slower than writing to disk.  The whole point of this
discussion, I thought, was that we could rely on an EAGAIN return from
GNU fwrite and could retry the write with the remaining data when there
was no room?

Since there is no looping going on and any single fwrite of an arbitrary
amount of data > 1 byte should be able to sometimes trigger the EAGAIN
even after a select, I suspect the race condition described above
exists.  On a fast network with a slow disk, the problem should still be
reproducible.  Perhaps this could be verified by piping the CVS stdout &
stderr to netcat across an even slower network or somesuch.  I don't
know and I don't really have the facilities here to test it.


Though it was not what I was arguing in the above thread - I only meant
to point out that, as I understand the problem, the current situation is
not a true fix, I will reiterate that I still agree with Larry and am
loathe to reimplement stdio in CVS.  I would much prefer to be able to
rely on stdio for a situation like this.

I most certainly don't have the time to reimplement it myself at the
moment and even were a complete patch submitted I would still be
reluctant to accept it due to potential maintenance hassles.  I
understand a similar stdio reimplementation happened in Emacs in the
past and the team eventually gave up on it due to unmaintainability.

Cheers,

Derek



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-06-25 Thread Richard M. Stallman
The code currently works by making the assumption that network reads
will always return chunks of data small enough to be written to stdout
in a single call to fwrite after a single select on fileno (stdout).

What would make some data too long?  The nmemb argument of fwrite is a
size_t, and unless fwrite has a bug, it should handle whatever amount
of data you specify.

I don't see why a large amount of data would cause a failure.
Previous messages claimed that it could, but I think I've refuted that
claim.  Is there a flaw in the argument I gave?





___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-06-24 Thread Derek Price
Richard M. Stallman wrote:

>Regardless of the fact that this will usually hold true, the algorithm
>still contains a race condition, might not hold true across diverse
>platforms, and is not a true fix.
>
>I do not see the race condition.
>  
>

The code currently works by making the assumption that network reads
will always return chunks of data small enough to be written to stdout
in a single call to fwrite after a single select on fileno (stdout).  As
long as network reads are always slower that writes to stdout, this
should hold true, but it does not sound like an assumption which can be
counted on to prove consistently valid across all platforms and
circumstances.  At the least there is the case where the stdout output
is being piped across an even slower network.

Regards,

Derek



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-06-24 Thread Richard M. Stallman
> It's possible that this fix is a full and correct solution.  If the
> descriptor has room for at least one byte, `write' won't return
> EAGAIN; it will write at least one byte.  It may write less than the
> whole buffer that was specified.  If the stdio code handles that case
> correctly, the results should be correct.

That depends on your definition of "correctly".  I suspect that all
stdio implementations handle short writes by immediately doing another
write for the remaining data, which in this case will immediately fail
with EAGAIN and we're back to where we started.

If that's what happens, it ought to give good results.  When the
second `write' call fails with EAGAIN, the "temporary workaround" will
call `select' and wait until there is space.  This may not be
optimally elegant, but it is adequate.  It will write all the data,
and it won't busy-wait.  Do we need more?

> If the current fix does not make it correct, please do make the larger
> fix--do your own buffering.  That is surely not really a large job.

Replacing stdio isn't a large job?!?

Replacing stdio, that is to say writing another stdio implementation,
would be a large job, but doing your own buffering is a much smaller
job.  You can still use the complex parts of stdio, via sprintf, if
you need them.

However, it currently looks like no further change is needed.

> The right program to fix, in a case like this, is whichever one we
> actually can get fixed.

No, the right program to fix is the one that's broken: ssh.

My understanding is that the openssh developers don't think that this
is broken, and declined to change it.  However, I am not sure of that.
Do you know?

If you convince them to change this, that would be fine.


___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-06-24 Thread Richard M. Stallman
>It's possible that this fix is a full and correct solution.  If the
>descriptor has room for at least one byte, `write' won't return
>EAGAIN; it will write at least one byte.  It may write less than the
>whole buffer that was specified.  If the stdio code handles that case
>correctly, the results should be correct.
>

Are you telling me that the glibc implementation of streams provides a
buffer with infinite space?

No, that's not what I said.  It seems there is misunderstanding.

  POSIX 1003.1 says fwrite() returns errors
as fputc().

I don't have a copy I can check.  If this is true, it doesn't directly
conflict with what I said about `write'.  However, it could be
relevant to the question of whether the current code really works.
Let's look at what it implies:

The fputc() function shall fail if either the STREAM is unbuffered
or the STREAM's buffer needs to be flushed, and:

[EAGAIN]

The O_NONBLOCK flag is set for the file descriptor underlying
/stream/ and the thread would be delayed in the write operation.

If libc operates by calling `write', which will write some of the data,
and then calls `write' again and it fails with EAGAIN, and that causes
`fputc' to fail, everything should work correctly (see my other message).

That text is also consistent with an implementation of `fputc' which
returns EAGAIN without writing any data.  With such an implementation,
the program would busy-wait, because `select' would return immediately
and it would call `fputc' again, which would return EAGAIN again, etc.
That would be wasteful but the output would still be correct.

I think it is unlikely that any libc implementation would behave this
way.  The natural way to implement it is to call `write', which will
write enough to fill up the buffer; then, after `fputc' fails,
`select' will wait.

Regardless of the fact that this will usually hold true, the algorithm
still contains a race condition, might not hold true across diverse
platforms, and is not a true fix.

I do not see the race condition.


___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-06-23 Thread Derek Price
Richard M. Stallman wrote:

>It's possible that this fix is a full and correct solution.  If the
>descriptor has room for at least one byte, `write' won't return
>EAGAIN; it will write at least one byte.  It may write less than the
>whole buffer that was specified.  If the stdio code handles that case
>correctly, the results should be correct.
>

Are you telling me that the glibc implementation of streams provides a
buffer with infinite space?  POSIX 1003.1 says fwrite() returns errors
as fputc().  POSIX 1003.1 says about fputc:

The fputc() function shall fail if either the STREAM is unbuffered
or the STREAM's buffer needs to be flushed, and:

[EAGAIN]

The O_NONBLOCK flag is set for the file descriptor underlying
/stream/ and the thread would be delayed in the write operation.



>Roland, does GNU libc
>handle that case correctly?  I expect it does, since that case can
>arise in many situations.  If it were not handled correctly, we would
>hear lots of screaming--I think.
>  
>

I bumped CVS's test for this error up to dumping about 110MB of data to
stdout, and it still passes, but not for the reason you think, I think. 
I'm watching the file fill more slowly than it should if limited only by
the disk, and the data is coming from both the network and diff.  I'm
guessing that the file writes are simply usually faster than the reads
of data produced at a high CPU cost and transferred across a network. 
Regardless of the fact that this will usually hold true, the algorithm
still contains a race condition, might not hold true across diverse
platforms, and is not a true fix.

Regards,

Derek



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-06-23 Thread Larry Jones
Richard M. Stallman writes:
> 
> I'm glad to hear that the new CVS version has a fix that may prevent
> the problem.  Maybe this problem is fixed now.  But we should verify that.

It's not a fix, it's a temporary workaround that avoids the problem in
some, but not all, cases.

> It's possible that this fix is a full and correct solution.  If the
> descriptor has room for at least one byte, `write' won't return
> EAGAIN; it will write at least one byte.  It may write less than the
> whole buffer that was specified.  If the stdio code handles that case
> correctly, the results should be correct.

That depends on your definition of "correctly".  I suspect that all
stdio implementations handle short writes by immediately doing another
write for the remaining data, which in this case will immediately fail
with EAGAIN and we're back to where we started.

> If the current fix does not make it correct, please do make the larger
> fix--do your own buffering.  That is surely not really a large job.

Replacing stdio isn't a large job?!?

> As this case shows, other
> processes that share the descriptor have the reasonable expectation
> that they can take an inherited descriptor and make it non-blocking.

That is an entirely *un*reasonable expectation.  Like I said before,
nonblocking mode does not affect the descriptor, it affects the
underlying file and it violates all of the conventions of how files are
supposed to behave.  Thus, it cannot reasonably be used except under
very tightly controlled conditions -- when you know that you have
exclusive use, not just of the descriptor, but of the underlying file. 
Typically, it's used on newly-created sockets, which do satisfy those
conditions.  Ssh is trying to use it on standard descriptors that it
inherits from an unknown source that almost certainly do not satisfy
those conditions.  That's just plain wrong.  Note that rsh has always
worked just fine without using non-blocking mode for *anything*, ssh
should certainly be able to get away without using it on its standard
files.

> Rather that deciding what to fix by first placing blame, we can do
> better if we look at how we can contribute to making the system work.

Agreed, which is why we adopted the workaround.  But it's not "placing
blame", it's determining the root of the problem.

> The right program to fix, in a case like this, is whichever one we
> actually can get fixed.

No, the right program to fix is the one that's broken: ssh.

-Larry Jones

Whatever it is, it's driving me crazy! -- Calvin


___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-06-23 Thread Richard M. Stallman
* client.c (handle_m): Workaround to deal with stdio getting put
into non-blocking via redirection of stderr and interaction with
ssh on some platforms. On those boxes, stdio can put stdout
unexpectedly into non-blocking mode which may lead to fwrite() or
fflush() failing with EAGAIN, but cvs not checking for the error.
(Patch suggested by Frank Hemer <[EMAIL PROTECTED]>.)

I'm glad to hear that the new CVS version has a fix that may prevent
the problem.  Maybe this problem is fixed now.  But we should verify that.

It's possible that this fix is a full and correct solution.  If the
descriptor has room for at least one byte, `write' won't return
EAGAIN; it will write at least one byte.  It may write less than the
whole buffer that was specified.  If the stdio code handles that case
correctly, the results should be correct.  Roland, does GNU libc
handle that case correctly?  I expect it does, since that case can
arise in many situations.  If it were not handled correctly, we would
hear lots of screaming--I think.

If the current fix is correct, great!

If the current fix does not make it correct, please do make the larger
fix--do your own buffering.  That is surely not really a large job.
It is worth that much effort to fix such a serious bug.

I agree with Larry Jones, that the problem is deeper.  It seems to me
that any program writing to stderr or stdout via stdio should have the
reasonable expectation that the stream will remain blocking, or perhaps
remain non-blocking, as the case may be.

That would clearly be "the right thing", but it cannot be achieved
without a basic redesign of POSIX.  As this case shows, other
processes that share the descriptor have the reasonable expectation
that they can take an inherited descriptor and make it non-blocking.
These two expectations, though both reasonable, conflict, and that is
what made this problem so hard to fix.  People argued for years about
which program is wrong, and meanwhile we remained in an impasse.

Rather that deciding what to fix by first placing blame, we can do
better if we look at how we can contribute to making the system work.
The right program to fix, in a case like this, is whichever one we
actually can get fixed.

I am grateful to Mark Baushke for approaching the situation that way.

  The
same would need to be done in any program piping output through ssh,
some of which don't have CVS's infrastructure, and this seems unreasonable.

This scenario concerns output from ssh that is read by CVS.  It is
true that any program that forks a remote-login program and reads that
program's output thru a pipe is liable to encounter this issue, just
as CVS does.  It would be nice if that were fixed in a more global
way, if correct use of ssh were simpler.  Perhaps someday that will
happen--but in the mean time, let's fix this bug for sure.



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-06-22 Thread Derek Price
Richard M. Stallman wrote:

>Couldn't this be fixed easily by changing cvs to handle EAGAIN
>in a simple way when outputting to stdout?  If it just sleeps for a second
>and tries again, that ought to fix the problem.
>  
>

Mark Baushke installed Frank Hemer's partial work-around for this
problem some time ago.  This should have been released with CVS 1.11.18
& 1.12.10:

2004-08-17  Mark D. Baushke  <[EMAIL PROTECTED]>

* sanity.sh (sshstdio): Fix comment typo plus gratuitous
reformatting.

* client.c (handle_m): Workaround to deal with stdio getting put
into non-blocking via redirection of stderr and interaction with
ssh on some platforms. On those boxes, stdio can put stdout
unexpectedly into non-blocking mode which may lead to fwrite() or
fflush() failing with EAGAIN, but cvs not checking for the error.
(Patch suggested by Frank Hemer <[EMAIL PROTECTED]>.)

* client.c (handle_e): Similar fix for stderr.
* sanity.sh (sshstdio): New test for non-blocking stdio via ssh.


This fix selects on the file descriptor associated with the stream
before writing to the stream via stdio, which is probably wrong on
several levels, as Paul Eggert pointed out in this discussion:
. 
This is also definitely only a workaround and I suspect won't work for
very large file sizes since the select only guarantees that the
descriptor is ready for a single byte, not an arbitrary amount of data,
though it appears to handle the original test case.

I agree with Larry Jones, that the problem is deeper.  It seems to me
that any program writing to stderr or stdout via stdio should have the
reasonable expectation that the stream will remain blocking, or perhaps
remain non-blocking, as the case may be.

If we have to fix anything on the CVS side, I think the only complete
fix (read working in all cases) would be to replace all our stdio calls
with our own buffering writes that handled things regardless of EAGAIN. 
Much of the code that would be necessary is already in CVS 1.12.x - it's
used for the network buffers on the server side, but nobody has
submitted the patch to apply it everywhere and, as I said, I agree now
that this change would be wrong so I have not written it myself.  The
same would need to be done in any program piping output through ssh,
some of which don't have CVS's infrastructure, and this seems unreasonable.

Regards,

Derek



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-06-22 Thread Larry Jones
Richard M. Stallman writes:
> 
> I am trying to bring about a fix for the bad interaction between
> CVS and SSH that causes data to be lost?
> 
> See  for
> a good explanation.
> 
> After studying that message, I think I understand the problem.  It
> occurs when the stdout of cvs equals the stderr of ssh.  When ssh sets
> O_NONBLOCK on that descriptor, cvs gets confused.

The key thing to understand is that O_NONBLOCK does not affect the
*descriptor*, it affects the underlying *file*.  Thus, if other
descriptors are bound to the same file, they also become non-blocking.

> This problem has gone unfixed because neither CVS nor SSH is doing
> something clearly wrong.  Nonetheless, the combination of the two
> are doing something wrong, so we need to fix it somewhere.

To my mind, SSH *is* doing something clearly wrong.  See:



> Couldn't this be fixed easily by changing cvs to handle EAGAIN
> in a simple way when outputting to stdout?

No, because CVS uses stdio for output to stdout and thus the response to
errors is out of its control.  And most extant stdio libraries do not
handle EAGAIN.

-Larry Jones

In a minute, you and I are going to settle this out of doors. -- Calvin


___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs


Re: CVS problem with ssh

2005-06-22 Thread Frank Hemer
On Tuesday 21 June 2005 17:13, Richard M. Stallman wrote:
> I am trying to bring about a fix for the bad interaction between
> CVS and SSH that causes data to be lost?
>
> See 
> for a good explanation.
>
> After studying that message, I think I understand the problem.  It
> occurs when the stdout of cvs equals the stderr of ssh.  When ssh sets
> O_NONBLOCK on that descriptor, cvs gets confused.
>
> This problem has gone unfixed because neither CVS nor SSH is doing
> something clearly wrong.  Nonetheless, the combination of the two
> are doing something wrong, so we need to fix it somewhere.
>
> Couldn't this be fixed easily by changing cvs to handle EAGAIN
> in a simple way when outputting to stdout?  If it just sleeps for a second
> and tries again, that ought to fix the problem.
>
> Is there someone who can fix this?  Please respond, if you are working
> on CVS.

Hi Richard,

I have sent a patch for this quite a while ago, see:
http://lists.gnu.org/archive/html/bug-cvs/2004-08/msg00058.html

Its certainly not a real fix - but a workaround.
Another way would be to pipe stderr like stdout is piped - with the 
disadvantage of maybe mixing the order of stdout/stderr msg, where stderr 
would only contain ssh errors.

Regards,
Frank

-- 
- The LinCVS Team -
http://www.lincvs.com



___
Bug-cvs mailing list
Bug-cvs@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-cvs