RE: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-14 Thread Tillman, James
Ah, I just found the patch that jw sent (my email system had blocked it as a
potential virus).  I will try to compile and test it this week.  My own
environment uses only SSH push.

jpt

> -----Original Message-----
> From: jw schultz [mailto:[EMAIL PROTECTED]
> Sent: Saturday, July 12, 2003 6:53 AM
> To: [EMAIL PROTECTED]
> Subject: Re: PATCH/RFC: Another stab at the Cygwin hang problem
> 
> 
> On Wed, Jul 09, 2003 at 06:47:35AM -0400, Tillman, James wrote:
> > 
> > 
> > > -----Original Message-----
> > > From: jw schultz [mailto:[EMAIL PROTECTED]
> > > Sent: Wednesday, July 09, 2003 5:59 AM
> > > To: [EMAIL PROTECTED]
> > > Subject: Re: PATCH/RFC: Another stab at the Cygwin hang problem
> > > 
> > > 
> > > > I can't quite place why but my instincts inform me that you
> > > > have latched onto something.  Some sort of one character
> > > > buffering error in the io libraries under cygwin.  Most
> > > > likely in the Windows libs.
> > > > 
> > > > Well, we have two reports of this fixing the rsync hang
> > > > problem when signals failed.  I'd like a little more testing
> > > > before mainlining it.
> > > 
> > > Nope!  This is a no-go.  It intermittently produces
> > > 
> > >   error (10) -- error in socket IO
> > > 
> > > on both network and local transfers.
> > > 
> > 
> > I guess I'd better double check my processes to make sure that I'm getting a
> > satisfactory success rate on my own servers.  If I see any clues, I'll
> > report them here.  Any hope for a fix, or does this look like an inherent
> > problem in the method being used?
> 
> It looks like the method is fairly sound.  The problem seems
> to primarily be in dealing with the child termination.
> 
>   io_set_error_fd(-1);
> - kill(pid, SIGUSR2);
> - wait_process(pid, &status);
> + write(cleanup_pipe[1], ".", 1);
> + if (waitpid(pid, &status, 0) != pid) {
> + rprintf(FERROR,"cleanup in do_recv failed\n");
> + exit_cleanup(RERR_SOCKETIO);
> + }
>   return status;
> 
> There is a huge window between the write() and the return of
> waitpid() that, depending on scheduling and signal delivery,
> allows the child pid to be reaped by the SIGCHLD handler.  That
> results in this waitpid() returning -1 with errno of ECHILD.
> EINTR would also be possible.  The timing dependencies
> account for the intermittency of the error.
> 
> I've attached an altered patch.  I've only dealt with this
> one location which produced errors doing a ssh pull.  I
> haven't addressed the local transfer errors but i suspect
> that derived from this waitpid error.  Further testing will
> still be needed to ensure that ssh push and rsyncd usage are
> unbroken.  This really needs testing in cygwin which i don't
> have.  If it takes care of the cygwin hang then we can
> polish it.  There remains the issue of an error status
> when the only failure is termination.
> 
> -- 
> 
>   J.W. Schultz            Pegasystems Technologies
>   email address:  [EMAIL PROTECTED]
> 
>   Remember Cernan and Schmitt
> 
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


RE: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-14 Thread Tillman, James


> -----Original Message-----
> From: jw schultz [mailto:[EMAIL PROTECTED]
> Sent: Saturday, July 12, 2003 11:25 AM
> To: [EMAIL PROTECTED]
> Subject: Re: PATCH/RFC: Another stab at the Cygwin hang problem
> 
> 
[...]

> > Anyhow, just to let you know.  If you're happy tidying
> > up and refining the patch yourself, please go ahead. If
> > you want me to do anything, or have any comments on
> > what I've done, I'd appreciate an email.  However I
> > will try to follow the rsync list for the next few
> > weeks at least.
> 
> As i said earlier, i intuit you are on to something with
> this patch.  If you care to clean it up that would be good.
> I would rather someone experiencing the hangs do the fix.
> That tends to reduce the cycle times.

I'm willing to help test if someone sends improvements on Anthony's original
patch to the list.  The original has been working great for my own purposes so
far.  I realized when I started using it that I was being a little hasty,
but my own situation required quicker action than is usually recommended.
The risks were worth it, apparently.

What I'm most interested in seeing is a real fix for this hang problem
(Anthony's or someone else's) incorporated into an rsync release sometime in
the near future so that I don't have to retain the patch code and special
instructions for reinstalling my own running system.

jpt


Re: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-12 Thread jw schultz
On Sat, Jul 12, 2003 at 11:42:52PM +0900, Anthony Heading wrote:
> On Sat, Jul 12, 2003 at 03:52:59AM -0700, jw schultz wrote:
> > There is a huge window between the write() and the return of
> > waitpid() that, depending on scheduling and signal delivery,
> > allows the child pid to be reaped by the SIGCHLD handler.  That
> > results in this waitpid() returning -1 with errno of ECHILD.
> > EINTR would also be possible.  The timing dependencies
> > account for the intermittency of the error.
> 
> Hi JW - 
> 
> Afraid I've not really been following the rsync mailing list,
> and it seems you've been addressing your comments about
> my patch to James Tillman?

Not in the least.  I've addressed them to the list.

> As I said originally, it was an illustrative patch - I didn't
> flesh out the error handling since that made the concept
> more difficult to follow.
> 
> Catching up now, I think your observation here is right.
> In fact I'd made a similar change already myself locally.
> 
> Only one difference - I was consciously avoiding calling
> wait_process(), since that function calls msleep() - which
> was implicated in the original hanging problem!  Since
> there is no signal being sent any more, hopefully it's not
> a problem (except for the SIGUSR2 cases?) - however I
> was wanting to ensure that the hangs were _completely_
> eliminated, and thus didn't want to take any chances.
> 
> So my own patch here is checking the errno and gives
> the OK for ECHILD.  I would worry that the whole
> msleep NOHANG io_flush stuff is a very complex loop
> to run simply to collect an exit status, particularly
> when we believe that the root of the hang lies with
> the underlying Cygwin OS.

I don't recall msleep being a hang problem.  I don't see how
it could be.  Myself, i wonder why the WNOHANG and msleep
loop instead of a normal waitpid.  I initially had waitpid
with checking of the pid_stat_table if ECHILD but disliked
having the duplicate code.  Besides, if wait_process has a
hang problem let's fix that instead of orphaning it.
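
For illustration only, the two waiting styles being compared here can be
sketched as below.  This is not rsync's wait_process(); the names sleep_ms(),
wait_polling(), wait_blocking() and spawn_exit() are hypothetical, the 20 ms
interval is taken from the msleep(20) seen in the patch context, and the
io_flush() call mentioned in this thread is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Sleep for roughly `ms` milliseconds (hypothetical stand-in for msleep). */
static void sleep_ms(long ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);   /* may return early on a signal; harmless here */
}

/* Polling style: ask without blocking, sleep briefly, ask again. */
static int wait_polling(pid_t pid, int *status_out)
{
    pid_t r;
    assert(status_out != NULL);
    while ((r = waitpid(pid, status_out, WNOHANG)) == 0)
        sleep_ms(20);
    return r == pid ? 0 : -1;
}

/* Blocking style: one waitpid(), retried only if a signal interrupts it. */
static int wait_blocking(pid_t pid, int *status_out)
{
    pid_t r;
    assert(status_out != NULL);
    do {
        r = waitpid(pid, status_out, 0);
    } while (r == -1 && errno == EINTR);
    return r == pid ? 0 : -1;
}

/* Hypothetical helper for trying both: fork a child that exits with `code`. */
static pid_t spawn_exit(int code)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(code);
    return pid;
}
```

Both functions return 0 once the child's status has been stored and -1
otherwise (for example with errno set to ECHILD if something else reaped the
child first).  The polling form never blocks inside waitpid(), which leaves
room for other work between polls; the blocking form is shorter.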

> But I think as long as the hangs don't reappear, your
> updated patch is obviously more concise.  Otherwise, I'll be
> further tempted to take the axe to the SIGCHLD handling,
> which looks somewhat jammed with voodoo cruft.

Layer on layer.  I don't care for it myself but changes in
this tend to cause problems on less popular platforms.

> Anyhow, just to let you know.  If you're happy tidying
> up and refining the patch yourself, please go ahead. If
> you want me to do anything, or have any comments on
> what I've done, I'd appreciate an email.  However I
> will try to follow the rsync list for the next few
> weeks at least.

As i said earlier, i intuit you are on to something with
this patch.  If you care to clean it up that would be good.
I would rather someone experiencing the hangs do the fix.
That tends to reduce the cycle times.

-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt


Re: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-12 Thread Anthony Heading
On Sat, Jul 12, 2003 at 03:52:59AM -0700, jw schultz wrote:
> There is a huge window between the write() and the return of
> waitpid() that, depending on scheduling and signal delivery,
> allows the child pid to be reaped by the SIGCHLD handler.  That
> results in this waitpid() returning -1 with errno of ECHILD.
> EINTR would also be possible.  The timing dependencies
> account for the intermittency of the error.

Hi JW - 

Afraid I've not really been following the rsync mailing list,
and it seems you've been addressing your comments about
my patch to James Tillman?

As I said originally, it was an illustrative patch - I didn't
flesh out the error handling since that made the concept
more difficult to follow.

Catching up now, I think your observation here is right.
In fact I'd made a similar change already myself locally.

Only one difference - I was consciously avoiding calling
wait_process(), since that function calls msleep() - which
was implicated in the original hanging problem!  Since
there is no signal being sent any more, hopefully it's not
a problem (except for the SIGUSR2 cases?) - however I
was wanting to ensure that the hangs were _completely_
eliminated, and thus didn't want to take any chances.

So my own patch here is checking the errno and gives
the OK for ECHILD.  I would worry that the whole
msleep NOHANG io_flush stuff is a very complex loop
to run simply to collect an exit status, particularly
when we believe that the root of the hang lies with
the underlying Cygwin OS.

But I think as long as the hangs don't reappear, your
updated patch is obviously more concise.  Otherwise, I'll be
further tempted to take the axe to the SIGCHLD handling,
which looks somewhat jammed with voodoo cruft.

Anyhow, just to let you know.  If you're happy tidying
up and refining the patch yourself, please go ahead. If
you want me to do anything, or have any comments on
what I've done, I'd appreciate an email.  However I
will try to follow the rsync list for the next few
weeks at least.

Rgds

Anthony




Re: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-12 Thread Lapo Luchini
jw schultz wrote:

>I've attached an altered patch.  I've only dealt with this
>one location which produced errors doing a ssh pull.
>
OK, I created a test package with your patch included, so that anyone
willing to test but not willing to compile can use it.
Please note that it is built against the experimental Cygwin DLL release
1.5.0, with support for 64-bit files.
If used on a system with an older DLL it can do "bad things"...
and maybe it can do bad things anyway; I take no responsibility, as this is a
*TEST* package.
(not that I am responsible anyway =P)
http://www.lapo.it/tmp/rsync-2.5.6-3.tar.bz2
http://www.lapo.it/tmp/rsync-2.5.6-3-src.tar.bz2
Please note, moreover, that over seven "make check" runs I got the
following failures (each one once).
I'd say this patch gives problems ^_^
- unsafe-links log follows
Testing for symlinks using 'test -h'
+ echo rsync with relative path and just -a
rsync with relative path and just -a
+ /tmp/rsync-2.5.6/rsync.exe -avv from/safe/ to
building file list ...
expand file_list to 4000 bytes, did move
done
created directory to
delta-transmission disabled for local transfer or --whole-file
files/file1
files/file2
links/file1 -> ../files/file1
links/file2 -> ../files/file2
links/unsafefile -> ../../unsafe/unsafefile
total: matches=0  tag_hits=0  false_alarms=0 data=0
wrote 297 bytes  read 52 bytes  232.67 bytes/sec
total size is 342  speedup is 0.98
+ test_symlink to/links/file1
+ is_a_link to/links/file1
+ test -h to/links/file1
+ test_symlink to/links/file2
+ is_a_link to/links/file2
+ test -h to/links/file2
+ test_symlink to/links/unsafefile
+ is_a_link to/links/unsafefile
+ test -h to/links/unsafefile
+ echo rsync with relative path and -a --copy-links
rsync with relative path and -a --copy-links
+ /tmp/rsync-2.5.6/rsync.exe -avv --copy-links from/safe/ to
building file list ...
expand file_list to 4000 bytes, did move
done
delta-transmission disabled for local transfer or --whole-file
files/file1 is uptodate
files/file2 is uptodate
links/file1 is uptodate
links/file2 is uptodate
links/unsafefile
total: matches=0  tag_hits=0  false_alarms=0 data=0
wrote 198 bytes  read 36 bytes  468.00 bytes/sec
total size is 0  speedup is 0.00
+ test_regular to/links/file1
+ [ ! -f to/links/file1 ]
+ test_regular to/links/file2
+ [ ! -f to/links/file2 ]
+ test_regular to/links/unsafefile
+ [ ! -f to/links/unsafefile ]
+ echo rsync with relative path and --copy-unsafe-links
rsync with relative path and --copy-unsafe-links
+ /tmp/rsync-2.5.6/rsync.exe -avv --copy-unsafe-links from/safe/ to
pipe: Address already in use
rsync error: error in IPC code (code 14) at pipe.c(107)
- unsafe-links log ends
FAIL    unsafe-links
- hands log follows
Testing for symlinks using 'test -h'
Test basic operation: Running: "/tmp/rsync-2.5.6/rsync.exe -av 
/tmp/rsync-2.5.6/testtmp.hands/from/ /tmp/rsync-2.5.6/testtmp.hands/to"
building file list ... done
./
dir/
dir/subdir/
dir/subdir/subsubdir/
dir/subdir/subsubdir/etc-ltr-list
dir/subdir/subsubdir2/
dir/subdir/subsubdir2/bin-lt-list
dir/text
empty
emptydir/
filelist
nolf
nolf-symlink -> nolf
text
wrote 829890 bytes  read 132 bytes  1660044.00 bytes/sec
total size is 829321  speedup is 1.00
-
check how the files compare with diff:

-
check how the directory listings compare with diff:
   done.
Test hard links: Running: "/tmp/rsync-2.5.6/rsync.exe -avH 
/tmp/rsync-2.5.6/testtmp.hands/from/ /tmp/rsync-2.5.6/testtmp.hands/to"
building file list ... done
dir/
dir/filelist
filelist => dir/filelist
wrote 21870 bytes  read 36 bytes  43812.00 bytes/sec
total size is 850647  speedup is 38.83
-
check how the files compare with diff:

-
check how the directory listings compare with diff:
   done.
Test one file: Running: "/tmp/rsync-2.5.6/rsync.exe -avH 
/tmp/rsync-2.5.6/testtmp.hands/from/ /tmp/rsync-2.5.6/testtmp.hands/to"
building file list ... done
./
text
wrote 374971 bytes  read 36 bytes  250004.67 bytes/sec
total size is 850647  speedup is 2.27
-
check how the files compare with diff:

-
check how the directory listings compare with diff:
   done.
Test extra data: Running: "/tmp/rsync-2.5.6/rsync.exe -avH 
/tmp/rsync-2.5.6/testtmp.hands/from/ /tmp/rsync-2.5.6/testtmp.hands/to"
building file list ... done
pipe failed in do_recv
rsync error: error in socket IO (code 10) at main.c(412)
rsync: connection unexpectedly closed (8 bytes read so far)
rsync error: error in rsync protocol data stream (code 12) at io.c(165)
- hands log ends
FAIL    hands

- hands log follows
Testing for symlinks using 'test -h'
Test basic operation: Running: "/tmp/rsync-2.5.6/rsync.exe -av 
/tmp/rsync-2.5.6/testtmp.hands/fr

Re: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-12 Thread jw schultz
On Wed, Jul 09, 2003 at 06:47:35AM -0400, Tillman, James wrote:
> 
> 
> > -----Original Message-----
> > From: jw schultz [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, July 09, 2003 5:59 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: PATCH/RFC: Another stab at the Cygwin hang problem
> > 
> > 
> > > I can't quite place why but my instincts inform me that you
> > > have latched onto something.  Some sort of one character
> > > buffering error in the io libraries under cygwin.  Most
> > > likely in the Windows libs.
> > > 
> > > Well, we have two reports of this fixing the rsync hang
> > > problem when signals failed.  I'd like a little more testing
> > > before mainlining it.
> > 
> > Nope!  This is a no-go.  It intermittently produces
> > 
> > error (10) -- error in socket IO
> > 
> > on both network and local transfers.
> > 
> 
> I guess I'd better double check my processes to make sure that I'm getting a
> satisfactory success rate on my own servers.  If I see any clues, I'll
> report them here.  Any hope for a fix, or does this look like an inherent
> problem in the method being used?

It looks like the method is fairly sound.  The problem seems
to primarily be in dealing with the child termination.

io_set_error_fd(-1);
-   kill(pid, SIGUSR2);
-   wait_process(pid, &status);
+   write(cleanup_pipe[1], ".", 1);
+   if (waitpid(pid, &status, 0) != pid) {
+   rprintf(FERROR,"cleanup in do_recv failed\n");
+   exit_cleanup(RERR_SOCKETIO);
+   }
return status;

There is a huge window between the write() and the return of
waitpid() that, depending on scheduling and signal delivery,
allows the child pid to be reaped by the SIGCHLD handler.  That
results in this waitpid() returning -1 with errno of ECHILD.
EINTR would also be possible.  The timing dependencies
account for the intermittency of the error.
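
The race described above can be reproduced outside rsync with a small sketch
like the following.  It is an illustration under stated assumptions, not the
attached patch: on_sigchld(), collect_child() and run_handshake() are
hypothetical names, and the one-slot reaped_pid/reaped_status record merely
stands in for a table of statuses kept by a SIGCHLD handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical one-slot record of what the SIGCHLD handler reaped. */
static volatile sig_atomic_t reaped_pid = 0;
static volatile sig_atomic_t reaped_status = 0;

/* SIGCHLD handler: reaps any finished child and remembers its status. */
static void on_sigchld(int sig)
{
    int saved_errno = errno, st;
    pid_t p;
    (void)sig;
    while ((p = waitpid(-1, &st, WNOHANG)) > 0) {
        reaped_pid = (sig_atomic_t)p;
        reaped_status = st;
    }
    errno = saved_errno;
}

/* Collect the status of `pid` whether we or the handler reaped it. */
static int collect_child(pid_t pid, int *status_out)
{
    pid_t r;
    assert(status_out != NULL);
    do {
        r = waitpid(pid, status_out, 0);
    } while (r == -1 && errno == EINTR);       /* interrupted: just retry */
    if (r == pid)
        return 0;
    if (r == -1 && errno == ECHILD && reaped_pid == (sig_atomic_t)pid) {
        *status_out = reaped_status;           /* the handler got there first */
        return 0;
    }
    return -1;
}

/* Child blocks on a pipe read; parent releases it with one byte, then
 * collects the exit code.  Returns the exit code, or -1 on failure.
 * The SIGCHLD handler is left installed afterwards. */
static int run_handshake(int exit_code)
{
    int fds[2], status = 0;
    struct sigaction sa;
    pid_t pid;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, NULL) < 0 || pipe(fds) < 0)
        return -1;
    reaped_pid = 0;
    if ((pid = fork()) < 0)
        return -1;
    if (pid == 0) {
        char tmp;
        ssize_t n;
        close(fds[1]);
        do {
            n = read(fds[0], &tmp, 1);
        } while (n == -1 && errno == EINTR);
        _exit(n == 1 ? exit_code : 127);
    }
    close(fds[0]);
    if (write(fds[1], ".", 1) != 1) {
        close(fds[1]);
        return -1;
    }
    close(fds[1]);
    if (collect_child(pid, &status) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_handshake(7) should give back 7 whichever side wins the race,
because collect_child() treats ECHILD as "already collected by the handler"
rather than as a failure.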

I've attached an altered patch.  I've only dealt with this
one location which produced errors doing a ssh pull.  I
haven't addressed the local transfer errors but i suspect
that derived from this waitpid error.  Further testing will
still be needed to ensure that ssh push and rsyncd usage are
unbroken.  This really needs testing in cygwin which i don't
have.  If it takes care of the cygwin hang then we can
polish it.  There remains the issue of an error status
when the only failure is termination.

-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt
? main.2.5.5
Index: cleanup.c
===
RCS file: /data/cvs/rsync/cleanup.c,v
retrieving revision 1.18
diff -u -r1.18 cleanup.c
--- cleanup.c   21 Mar 2003 23:43:50 -  1.18
+++ cleanup.c   12 Jul 2003 10:31:04 -
@@ -96,7 +96,6 @@
inside_cleanup++;
 
signal(SIGUSR1, SIG_IGN);
-   signal(SIGUSR2, SIG_IGN);
 
if (verbose > 3)
rprintf(FINFO,"_exit_cleanup(code=%d, file=%s, line=%d): entered\n", 
Index: main.c
===
RCS file: /data/cvs/rsync/main.c,v
retrieving revision 1.169
diff -u -r1.169 main.c
--- main.c  4 Jul 2003 15:11:46 -   1.169
+++ main.c  12 Jul 2003 10:31:04 -
@@ -391,6 +391,7 @@
int status=0;
int recv_pipe[2];
int error_pipe[2];
+   int cleanup_pipe[2];
extern int preserve_hard_links;
extern int delete_after;
extern int recurse;
@@ -417,11 +418,19 @@
exit_cleanup(RERR_SOCKETIO);
}
 
+   if (pipe(cleanup_pipe) < 0) {
+   rprintf(FERROR,"cleanup pipe failed in do_recv\n");
+   exit_cleanup(RERR_SOCKETIO);
+   }
+  
io_flush();
 
if ((pid=do_fork()) == 0) {
+   char tmp;
+
close(recv_pipe[0]);
close(error_pipe[0]);
+   close(cleanup_pipe[1]);
if (f_in != f_out) close(f_out);
 
/* we can't let two processes write to the socket at one time */
@@ -437,15 +446,21 @@
write_int(recv_pipe[1],1);
close(recv_pipe[1]);
io_flush();
-   /* finally we go to sleep until our parent kills us
-  with a USR2 signal. We sleep for a short time as on
-  some OSes a signal won't interrupt a sleep! */
-   while (msleep(20))
-   ;
+   do {
+   status = read(cleanup_pipe[0], &tmp, 1);
+   } while (sta

Re: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-10 Thread jw schultz
On Wed, Jul 09, 2003 at 06:47:35AM -0400, Tillman, James wrote:
> 
> 
> > -----Original Message-----
> > From: jw schultz [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, July 09, 2003 5:59 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: PATCH/RFC: Another stab at the Cygwin hang problem
> > 
> > 
> > > I can't quite place why but my instincts inform me that you
> > > have latched onto something.  Some sort of one character
> > > buffering error in the io libraries under cygwin.  Most
> > > likely in the Windows libs.
> > > 
> > > Well, we have two reports of this fixing the rsync hang
> > > problem when signals failed.  I'd like a little more testing
> > > before mainlining it.
> > 
> > Nope!  This is a no-go.  It intermittently produces
> > 
> > error (10) -- error in socket IO
> > 
> > on both network and local transfers.
> > 
> 
> I guess I'd better double check my processes to make sure that I'm getting a
> satisfactory success rate on my own servers.  If I see any clues, I'll
> report them here.  Any hope for a fix, or does this look like an inherent
> problem in the method being used?

Better diags might help.  Pull over ssh hits this.

+   write(cleanup_pipe[1], ".", 1);
+   if (waitpid(pid, &status, 0) != pid) {
+   rprintf(FERROR,"cleanup in do_recv failed\n");
+   exit_cleanup(RERR_SOCKETIO); 
+   }   

I have two problems here.  Firstly, you are ignoring errno.
The waitpid call fails but you don't identify why.
Secondly, as long as the processes exit (no hangs, zombies
or runaways) and the actual transfer is successful i don't
mind too much if the termination is less than perfect.
Let's not use RERR_SOCKETIO.  Let's use a different warning
status that only applies to the termination.
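
As a rough sketch of both points (reporting errno, and keeping termination
problems apart from socket errors), something along these lines would do.
The enum values and classify_wait() are hypothetical and are not rsync's
RERR_* codes or any existing rsync function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical termination-only results, kept apart from I/O errors. */
enum term_result {
    TERM_OK = 0,           /* waitpid() returned the expected pid */
    TERM_ALREADY_REAPED,   /* ECHILD: something else collected the child */
    TERM_FAILED            /* any other waitpid() failure */
};

/* Turn a waitpid() outcome into a result code plus a diagnostic that
 * says why it failed.  `got` is waitpid()'s return value and `err` is
 * the errno captured immediately after the call. */
static enum term_result classify_wait(pid_t want, pid_t got, int err,
                                      char *msg, size_t len)
{
    assert(msg != NULL && len > 0);
    if (got == want) {
        snprintf(msg, len, "child %ld collected", (long)want);
        return TERM_OK;
    }
    snprintf(msg, len, "waitpid(%ld) returned %ld: errno %d (%s)",
             (long)want, (long)got, err, strerror(err));
    return err == ECHILD ? TERM_ALREADY_REAPED : TERM_FAILED;
}
```

A caller would save errno right after waitpid(), pass it in as err, and log
msg; only TERM_FAILED would need to be treated as more than a warning.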





-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt


Re: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-09 Thread jw schultz
On Wed, Jul 09, 2003 at 06:47:35AM -0400, Tillman, James wrote:
> 
> 
> > -----Original Message-----
> > From: jw schultz [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, July 09, 2003 5:59 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: PATCH/RFC: Another stab at the Cygwin hang problem
> > 
> > 
> > > I can't quite place why but my instincts inform me that you
> > > have latched onto something.  Some sort of one character
> > > buffering error in the io libraries under cygwin.  Most
> > > likely in the Windows libs.
> > > 
> > > Well, we have two reports of this fixing the rsync hang
> > > problem when signals failed.  I'd like a little more testing
> > > before mainlining it.
> > 
> > Nope!  This is a no-go.  It intermittently produces
> > 
> > error (10) -- error in socket IO
> > 
> > on both network and local transfers.
> > 
> 
> I guess I'd better double check my processes to make sure that I'm getting a
> satisfactory success rate on my own servers.  If I see any clues, I'll
> report them here.  Any hope for a fix, or does this look like an inherent
> problem in the method being used?

I haven't dug into it yet.  As the patch author you might
be a bit more familiar with it.

I'm not running cygwin so i never had the hangs.  I only
applied it to test for regression, which is what i found.

-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt


RE: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-09 Thread Tillman, James
My sincerest apologies for the duplicate msgs from me that were sent to the
list this morning.  My email administrator must have done something quite
stupid to have all msgs I've sent in the last week go out again!

jpt

> -----Original Message-----
> From: Tillman, James 
> Sent: Wednesday, July 09, 2003 6:48 AM
> To: [EMAIL PROTECTED]
> Subject: RE: PATCH/RFC: Another stab at the Cygwin hang problem
> 
> 
> 
> 
> > -----Original Message-----
> > From: jw schultz [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, July 09, 2003 5:59 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: PATCH/RFC: Another stab at the Cygwin hang problem
> > 
> > 
> > > I can't quite place why but my instincts inform me that you
> > > have latched onto something.  Some sort of one character
> > > buffering error in the io libraries under cygwin.  Most
> > > likely in the Windows libs.
> > > 
> > > Well, we have two reports of this fixing the rsync hang
> > > problem when signals failed.  I'd like a little more testing
> > > before mainlining it.
> > 
> > Nope!  This is a no-go.  It intermittently produces
> > 
> > error (10) -- error in socket IO
> > 
> > on both network and local transfers.
> > 
> 
> I guess I'd better double check my processes to make sure that I'm getting a
> satisfactory success rate on my own servers.  If I see any clues, I'll
> report them here.  Any hope for a fix, or does this look like an inherent
> problem in the method being used?
> 
> jpt


RE: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-09 Thread Tillman, James


> -----Original Message-----
> From: jw schultz [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, July 09, 2003 5:59 AM
> To: [EMAIL PROTECTED]
> Subject: Re: PATCH/RFC: Another stab at the Cygwin hang problem
> 
> 
> > I can't quite place why but my instincts inform me that you
> > have latched onto something.  Some sort of one character
> > buffering error in the io libraries under cygwin.  Most
> > likely in the Windows libs.
> > 
> > Well, we have two reports of this fixing the rsync hang
> > problem when signals failed.  I'd like a little more testing
> > before mainlining it.
> 
> Nope!  This is a no-go.  It intermittently produces
> 
>   error (10) -- error in socket IO
> 
> on both network and local transfers.
> 

I guess I'd better double check my processes to make sure that I'm getting a
satisfactory success rate on my own servers.  If I see any clues, I'll
report them here.  Any hope for a fix, or does this look like an inherent
problem in the method being used?

jpt


Re: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-09 Thread jw schultz
On Mon, Jun 30, 2003 at 05:49:45PM -0700, jw schultz wrote:
> On Mon, Jun 30, 2003 at 11:12:29PM +0900, Anthony Heading wrote:
> > On Mon, Jun 30, 2003 at 04:54:22AM -0700, jw schultz wrote:
> > > Could you regenerate the patch with diff -u please?
> > 
> > Okay, sure.  This one against current CVS.
> 
> Thanks that helps in examining it.
> 
> I can't quite place why but my instincts inform me that you
> have latched onto something.  Some sort of one character
> buffering error in the io libraries under cygwin.  Most
> likely in the Windows libs.
> 
> Well, we have two reports of this fixing the rsync hang
> problem when signals failed.  I'd like a little more testing
> before mainlining it.

Nope!  This is a no-go.  It intermittently produces

error (10) -- error in socket IO

on both network and local transfers.



-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt


Re: PATCH/RFC: Another stab at the Cygwin hang problem

2003-06-30 Thread jw schultz
On Mon, Jun 30, 2003 at 11:12:29PM +0900, Anthony Heading wrote:
> On Mon, Jun 30, 2003 at 04:54:22AM -0700, jw schultz wrote:
> > Could you regenerate the patch with diff -u please?
> 
> Okay, sure.  This one against current CVS.

Thanks, that helps in examining it.

I can't quite place why but my instincts inform me that you
have latched onto something.  Some sort of one-character
buffering error in the io libraries under cygwin.  Most
likely in the Windows libs.

Well, we have two reports of this fixing the rsync hang
problem when signals failed.  I'd like a little more testing
before mainlining it.



Re: PATCH/RFC: Another stab at the Cygwin hang problem

2003-06-30 Thread Anthony Heading
On Mon, Jun 30, 2003 at 04:54:22AM -0700, jw schultz wrote:
> Could you regenerate the patch with diff -u please?

Okay, sure.  This one against current CVS.

Anthony

--- cleanup.c.Orig  2003-06-30 22:42:16.0 +0900
+++ cleanup.c   2003-06-30 22:42:47.0 +0900
@@ -96,7 +96,6 @@
inside_cleanup++;
 
signal(SIGUSR1, SIG_IGN);
-   signal(SIGUSR2, SIG_IGN);
 
if (verbose > 3)
rprintf(FINFO,"_exit_cleanup(code=%d, file=%s, line=%d): entered\n", 
--- main.c.Orig 2003-04-25 01:26:09.0 +0900
+++ main.c  2003-06-30 22:41:35.0 +0900
@@ -391,6 +391,7 @@
int status=0;
int recv_pipe[2];
int error_pipe[2];
+   int cleanup_pipe[2];
extern int preserve_hard_links;
extern int delete_after;
extern int recurse;
@@ -417,11 +418,19 @@
exit_cleanup(RERR_SOCKETIO);
}
 
+   if (pipe(cleanup_pipe) < 0) {
+   rprintf(FERROR,"cleanup pipe failed in do_recv\n");
+   exit_cleanup(RERR_SOCKETIO);
+   }
+  
io_flush();
 
if ((pid=do_fork()) == 0) {
+   char tmp;
+
close(recv_pipe[0]);
close(error_pipe[0]);
+   close(cleanup_pipe[1]);
if (f_in != f_out) close(f_out);
 
/* we can't let two processes write to the socket at one time */
@@ -437,15 +446,21 @@
write_int(recv_pipe[1],1);
close(recv_pipe[1]);
io_flush();
-   /* finally we go to sleep until our parent kills us
-  with a USR2 signal. We sleep for a short time as on
-  some OSes a signal won't interrupt a sleep! */
-   while (msleep(20))
-   ;
+   do {
+       status = read(cleanup_pipe[0], &tmp, 1);
+   } while (status == -1 && errno == EINTR);
+   if (status != 1) {
+       rprintf(FERROR,"cleanup read returned %d in do_recv\n", status);
+       if (status == -1)
+           rprintf(FERROR,"with errno %d (%s)\n", errno, strerror(errno));
+       _exit(RERR_PARTIAL);
+   }
+   _exit(0);
}
 
close(recv_pipe[1]);
close(error_pipe[1]);
+   close(cleanup_pipe[0]);
if (f_in != f_out) close(f_in);
 
io_start_buffering(f_out);
@@ -463,8 +478,11 @@
io_flush();
 
io_set_error_fd(-1);
-   kill(pid, SIGUSR2);
-   wait_process(pid, &status);
+   write(cleanup_pipe[1], ".", 1);
+   if (waitpid(pid, &status, 0) != pid) {
+   rprintf(FERROR,"cleanup in do_recv failed\n");
+   exit_cleanup(RERR_SOCKETIO);
+   }
return status;
 }
 
@@ -881,12 +899,6 @@
exit_cleanup(RERR_SIGNAL);
 }
 
-static RETSIGTYPE sigusr2_handler(int UNUSED(val)) {
-   extern int log_got_error;
-   if (log_got_error) _exit(RERR_PARTIAL);
-   _exit(0);
-}
-
 static RETSIGTYPE sigchld_handler(int UNUSED(val)) {
 #ifdef WNOHANG
int cnt, status;
@@ -976,7 +988,6 @@
orig_argv = argv;
 
signal(SIGUSR1, sigusr1_handler);
-   signal(SIGUSR2, sigusr2_handler);
signal(SIGCHLD, sigchld_handler);
 #ifdef MAINTAINER_MODE
signal(SIGSEGV, rsync_panic_handler);


Re: PATCH/RFC: Another stab at the Cygwin hang problem

2003-06-30 Thread jw schultz
Apparently this fixed the problem for Tillman, James.

Could you regenerate the patch with diff -u please?

On Fri, Jun 27, 2003 at 04:16:12PM +0900, Anthony Heading wrote:
> Hi,
> 
> In http://sources.redhat.com/ml/cygwin/2002-09/msg01155.html, I noted that
> the often-observed hangs of rsync under Cygwin were assuaged by a call to
> msleep().
> 
> After upgrading my Cygwin environment to rsync 2.5.6, I'm seeing these
> hangs again, not surprisingly given a CVS entry for main.c notes that
> this kludge was not harmless:
> 
> Revision 1.162 / (download) - annotate - [select for diffs] ,
>   Tue Jan 28 05:05:53 2003 UTC (4 months, 4 weeks ago) by dwd
> 
> Remove the Cygwin msleep(100) before the generator kills the receiver,
> because it caused the testsuite/unsafe-links test to hang.
> 
> So it seems sensible to attempt something a bit more elegant.
> 
> And the first question is why kill/signals are being used
> here at all.
> 
> The illustrative patch below I think effects an equivalent synchronization,
> but does so by queuing a byte into a pipe rather than sending a signal.
> 
> Of course, since it's not currently done this way, I may be overlooking
> something obvious. I can't quite see what though, since in the event
> that an error occurs then exit_cleanup is available to send SIGUSR1
> with extreme prejudice; but if the protocol in fact concludes cleanly
> then there really should be no need for an asynchronous notification?
> 
> Comments sought, meanwhile I'll test the patch a bit...
> 
> Regards
> 
> Anthony
> 
> 
> *** main.c.Orig   Fri Jun 27 15:21:22 2003
> --- main.cFri Jun 27 15:30:09 2003
> ***
> *** 390,395 
> --- 390,396 
>   int status=0;
>   int recv_pipe[2];
>   int error_pipe[2];
> + int cleanup_pipe[2];
>   extern int preserve_hard_links;
>   extern int delete_after;
>   extern int recurse;
> ***
> *** 416,426 
> --- 417,435 
>   exit_cleanup(RERR_SOCKETIO);
>   }
> 
> + if (pipe(cleanup_pipe) < 0) {
> + rprintf(FERROR,"cleanup pipe failed in do_recv\n");
> + exit_cleanup(RERR_SOCKETIO);
> + }
> +   
>   io_flush();
>   
>   if ((pid=do_fork()) == 0) {
> + char tmp;
> + 
>   close(recv_pipe[0]);
>   close(error_pipe[0]);
> + close(cleanup_pipe[1]);
>   if (f_in != f_out) close(f_out);
>   
>   /* we can't let two processes write to the socket at one time */
> ***
> *** 436,450 
>   write_int(recv_pipe[1],1);
>   close(recv_pipe[1]);
>   io_flush();
> ! /* finally we go to sleep until our parent kills us
> !with a USR2 signal. We sleep for a short time as on
> !some OSes a signal won't interrupt a sleep! */
> ! while (msleep(20))
> ! ;
>   }
>   
>   close(recv_pipe[1]);
>   close(error_pipe[1]);
>   if (f_in != f_out) close(f_in);
>   
>   io_start_buffering(f_out);
> --- 445,465 
>   write_int(recv_pipe[1],1);
>   close(recv_pipe[1]);
>   io_flush();
> ! do {
> !     status = read(cleanup_pipe[0], &tmp, 1);
> ! } while (status == -1 && errno == EINTR);
> ! if (status != 1) {
> !     rprintf(FERROR,"cleanup read returned %d in do_recv\n", status);
> !     if (status == -1)
> !         rprintf(FERROR,"with errno %d (%s)\n", errno, strerror(errno));
> !     _exit(RERR_PARTIAL);
> ! }
> ! _exit(0);
>   }
>   
>   close(recv_pipe[1]);
>   close(error_pipe[1]);
> + close(cleanup_pipe[0]);
>   if (f_in != f_out) close(f_in);
>   
>   io_start_buffering(f_out);
> ***
> *** 462,469 
>   io_flush();
>   
>   io_set_error_fd(-1);
> ! kill(pid, SIGUSR2);
> ! wait_process(pid, &status);
>   return status;
>   }
>   
> --- 477,487 
>   io_flush();
>   
>   io_set_error_fd(-1);
> ! write(cleanup_pipe[1], ".", 1);
> ! if (waitpid(pid, &status, 0) != pid) {
> ! rprintf(FERROR,"cleanup in do_recv failed\n");
> ! exit_cleanup(RERR_SOCKETIO);
> ! }
>   return status;
>   }
>   
> ***
> *** 867,878 
>   exit_cleanup(RERR_SIGNAL);
>   }
>   
> - static RETSIGTYPE sigusr2_handler(int UNUSED(val)) {
> - extern int log_got_error;
> - if (log_got_error) _exit(RERR_PARTIAL);
> - _exit(0);
> - }
> - 
>   static RETSIGTYPE sigchld_handler(int UNUSED(val)) {
>   #ifdef WNOHANG
>   int cnt, status;
> --- 885,890 
> ***
> *** 964,970 
>   orig_argv = argv;
>   
>   signal(SIGUSR1, sigusr1_handler);
> - signal(SIGUSR2, sigusr2_handler);
>   signal(SIGCHLD, sigchld_handler)