Ah, I just found the patch that jw sent (the email system had locked it as a potential virus). I will try to compile and test it this week. My own environment uses only SSH push.
jpt

> -----Original Message-----
> From: jw schultz [mailto:[EMAIL PROTECTED]
> Sent: Saturday, July 12, 2003 6:53 AM
> To: [EMAIL PROTECTED]
> Subject: Re: PATCH/RFC: Another stab at the Cygwin hang problem
>
>
> On Wed, Jul 09, 2003 at 06:47:35AM -0400, Tillman, James wrote:
> >
> >
> > > -----Original Message-----
> > > From: jw schultz [mailto:[EMAIL PROTECTED]
> > > Sent: Wednesday, July 09, 2003 5:59 AM
> > > To: [EMAIL PROTECTED]
> > > Subject: Re: PATCH/RFC: Another stab at the Cygwin hang problem
> > >
> > >
> > > > I can't quite place why but my instincts inform me that you
> > > > have latched onto something.  Some sort of one character
> > > > buffering error in the io libraries under cygwin.  Most
> > > > likely in the Windows libs.
> > > >
> > > > Well, we have two reports of this fixing the rsync hang
> > > > problem when signals failed.  I'd like a little more testing
> > > > before mainlining it.
> > >
> > > Nope!  This is a no-go.  It intermittently produces
> > >
> > >     error (10) -- error in socket IO
> > >
> > > on both network and local transfers.
> > >
> >
> > I guess I'd better double check my processes to make sure that I'm getting a
> > satisfactory success rate on my own servers.  If I see any clues, I'll
> > report them here.  Any hope for a fix, or does this look like an inherent
> > problem in the method being used?
>
> It looks like the method is fairly sound.  The problem seems
> to primarily be in dealing with the child termination.
>
>         io_set_error_fd(-1);
> -       kill(pid, SIGUSR2);
> -       wait_process(pid, &status);
> +       write(cleanup_pipe[1], ".", 1);
> +       if (waitpid(pid, &status, 0) != pid) {
> +               rprintf(FERROR,"cleanup in do_recv failed\n");
> +               exit_cleanup(RERR_SOCKETIO);
> +       }
>         return status;
>
> There is a huge window between the write() and the return of
> waitpid() that, depending on scheduling and signal delivery,
> allows the child pid to be reaped by the SIGCHLD handler.
> That results in this waitpid() returning -1 with errno of ECHILD.
> EINTR would also be possible.  The timing dependencies
> account for the intermittency of the error.
>
> I've attached an altered patch.  I've only dealt with this
> one location, which produced errors doing an ssh pull.  I
> haven't addressed the local transfer errors but I suspect
> that they derived from this waitpid error.  Further testing will
> still be needed to ensure that ssh push and rsyncd usage are
> unbroken.  This really needs testing in cygwin, which I don't
> have.  If it takes care of the cygwin hang then we can
> polish it.  There remains the issue of an error status
> when the only failure is termination.
>
> --
> ________________________________________________________________
>       J.W. Schultz            Pegasystems Technologies
>       email address:          [EMAIL PROTECTED]
>
>               Remember Cernan and Schmitt
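
For anyone following the thread, here is a rough C sketch of the waitpid() handling described in the quoted message. It is not jw's attached patch. The names wait_for_child() and demo_pipe_cleanup() and the 0/1/-1 return convention are made up for illustration, and the sketch assumes a POSIX waitpid(). It retries on EINTR and treats ECHILD as "the child was already reaped elsewhere" (for example by a SIGCHLD handler) rather than as a socket I/O failure.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of rsync): wait for one child while
 * tolerating the two outcomes discussed in the thread.
 *
 * Returns  0 if this call reaped the child (*status is valid),
 *          1 if the child was already reaped elsewhere, e.g. by a
 *            SIGCHLD handler (errno == ECHILD, *status untouched),
 *         -1 on any other waitpid() failure (errno is set).
 */
int wait_for_child(pid_t pid, int *status)
{
    pid_t r;

    assert(status != NULL);

    /* A signal arriving during the wait yields EINTR: just retry. */
    do {
        r = waitpid(pid, status, 0);
    } while (r == -1 && errno == EINTR);

    if (r == pid)
        return 0;
    if (r == -1 && errno == ECHILD)
        return 1;
    return -1;
}

/*
 * Hypothetical demo: tell the child to exit by writing one byte to a
 * pipe, as the quoted patch does with cleanup_pipe, then collect it.
 */
int demo_pipe_cleanup(void)
{
    int cleanup_pipe[2];
    int status = 0;
    int r;
    char c;
    ssize_t n;
    pid_t pid;

    if (pipe(cleanup_pipe) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(cleanup_pipe[0]);
        close(cleanup_pipe[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: block until the parent writes or closes the pipe. */
        close(cleanup_pipe[1]);
        if (read(cleanup_pipe[0], &c, 1) < 0)
            _exit(1);
        _exit(0);
    }

    close(cleanup_pipe[0]);
    n = write(cleanup_pipe[1], ".", 1);
    close(cleanup_pipe[1]);     /* EOF also releases the child */

    r = wait_for_child(pid, &status);
    return (n == 1) ? r : -1;
}
```

A caller in the position of the quoted code could treat a return of 1 as a termination whose exit status is simply unavailable, and keep the RERR_SOCKETIO path for -1. Whether that is the right policy for rsync is an open question, and none of this has been tried under cygwin.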