RE: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-14 Thread Tillman, James


> -----Original Message-----
> From: jw schultz [mailto:[EMAIL PROTECTED]
> Sent: Saturday, July 12, 2003 11:25 AM
> To: [EMAIL PROTECTED]
> Subject: Re: PATCH/RFC: Another stab at the Cygwin hang problem
>
>
[...]

> > Anyhow, just to let you know.  If you're happy tidying
> > up and refining the patch yourself, please go ahead. If
> > you want me to do anything, or have any comments on
> > what I've done, I'd appreciate an email.  However I
> > will try to follow the rsync list for the next few
> > weeks at least.
>
> As i said earlier, i intuit you are on to something with
> this patch.  If you care to clean it up that would be good.
> I would rather someone experiencing the hangs do the fix.
> That tends to reduce the cycle times.

I'm willing to help test if someone sends improvements on Anthony's original
patch to the list.  The original has been working great for my own purposes so
far.  I realized when I started using it that I was being a little hasty,
but my own situation required quicker action than is usually recommended.
The risks were worth it, apparently.

What I'm most interested in seeing is a real fix for this hang problem
(Anthony's or someone else's) incorporated into an rsync release sometime in
the near future so that I don't have to retain the patch code and special
instructions for reinstalling my own running system.

jpt
-- 
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


RE: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-14 Thread Tillman, James
Ah, I just found the patch that jw sent (the email system locked it as a
potential virus).  Will try to compile and test this week.  My own environment
uses only SSH push.

jpt



Re: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-12 Thread jw schultz
On Wed, Jul 09, 2003 at 06:47:35AM -0400, Tillman, James wrote:
 
 
> > -----Original Message-----
> > From: jw schultz [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, July 09, 2003 5:59 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: PATCH/RFC: Another stab at the Cygwin hang problem
> >
> >
> > > I can't quite place why but my instincts inform me that you
> > > have latched onto something.  Some sort of one character
> > > buffering error in the io libraries under cygwin.  Most
> > > likely in the Windows libs.
> > >
> > > Well, we have two reports of this fixing the rsync hang
> > > problem when signals failed.  I'd like a little more testing
> > > before mainlining it.
> >
> > Nope!  This is a no-go.  It intermittently produces
> >
> >     error (10) -- error in socket IO
> >
> > on both network and local transfers.
> >
>
> I guess I'd better double check my processes to make sure that I'm getting a
> satisfactory success rate on my own servers.  If I see any clues, I'll
> report them here.  Any hope for a fix, or does this look like an inherent
> problem in the method being used?

It looks like the method is fairly sound.  The problem seems
to primarily be in dealing with the child termination.

 	io_set_error_fd(-1);
-	kill(pid, SIGUSR2);
-	wait_process(pid, &status);
+	write(cleanup_pipe[1], ".", 1);
+	if (waitpid(pid, &status, 0) != pid) {
+		rprintf(FERROR,"cleanup in do_recv failed\n");
+		exit_cleanup(RERR_SOCKETIO);
+	}
 	return status;

There is a huge window between the write() and the return of
waitpid() that, depending on scheduling and signal delivery,
allows the child pid to be reaped by the SIGCHLD handler.  That
results in this waitpid() returning -1 with errno of ECHILD.
EINTR would also be possible.  The timing dependencies
account for the intermittency of the error.
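
The window described above can be tolerated on the parent side by retrying
on EINTR and treating ECHILD as "already reaped elsewhere" rather than as a
failure.  A minimal sketch, assuming a POSIX system; reap_child and the
reap_result names are hypothetical and are not taken from rsync or from the
attached patch:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical result codes for this sketch. */
enum reap_result {
    REAP_OK = 0,             /* we collected the exit status ourselves     */
    REAP_ALREADY_REAPED = 1, /* ECHILD: something else reaped the child    */
    REAP_ERROR = -1          /* any other waitpid() failure                */
};

/* Wait for one child, retrying if a signal interrupts the call.
 * On REAP_ALREADY_REAPED, *status is left untouched. */
enum reap_result reap_child(pid_t pid, int *status)
{
    pid_t r;

    do {
        r = waitpid(pid, status, 0);
    } while (r == -1 && errno == EINTR);

    if (r == pid)
        return REAP_OK;
    if (r == -1 && errno == ECHILD)
        return REAP_ALREADY_REAPED;
    return REAP_ERROR;
}
```

In the ECHILD case the exit status has to come from whoever reaped the
child, for example a table filled in by the SIGCHLD handler.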

I've attached an altered patch.  I've only dealt with this
one location, which produced errors doing an ssh pull.  I
haven't addressed the local transfer errors but i suspect
that they derive from this waitpid error.  Further testing will
still be needed to ensure that ssh push and rsyncd usage are
unbroken.  This really needs testing in cygwin, which i don't
have.  If it takes care of the cygwin hang then we can
polish it.  There remains the issue of an error status
when the only failure is termination.

-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt
? main.2.5.5
Index: cleanup.c
===================================================================
RCS file: /data/cvs/rsync/cleanup.c,v
retrieving revision 1.18
diff -u -r1.18 cleanup.c
--- cleanup.c	21 Mar 2003 23:43:50 -0000	1.18
+++ cleanup.c	12 Jul 2003 10:31:04 -0000
@@ -96,7 +96,6 @@
 	inside_cleanup++;
 
 	signal(SIGUSR1, SIG_IGN);
-	signal(SIGUSR2, SIG_IGN);
 
 	if (verbose > 3)
 		rprintf(FINFO,"_exit_cleanup(code=%d, file=%s, line=%d): entered\n", 
Index: main.c
===================================================================
RCS file: /data/cvs/rsync/main.c,v
retrieving revision 1.169
diff -u -r1.169 main.c
--- main.c	4 Jul 2003 15:11:46 -0000	1.169
+++ main.c	12 Jul 2003 10:31:04 -0000
@@ -391,6 +391,7 @@
 	int status=0;
 	int recv_pipe[2];
 	int error_pipe[2];
+	int cleanup_pipe[2];
 	extern int preserve_hard_links;
 	extern int delete_after;
 	extern int recurse;
@@ -417,11 +418,19 @@
 		exit_cleanup(RERR_SOCKETIO);
 	}
 
+	if (pipe(cleanup_pipe) < 0) {
+		rprintf(FERROR,"cleanup pipe failed in do_recv\n");
+		exit_cleanup(RERR_SOCKETIO);
+	}
+	
 	io_flush();
 
 	if ((pid=do_fork()) == 0) {
+		char tmp;
+
 		close(recv_pipe[0]);
 		close(error_pipe[0]);
+		close(cleanup_pipe[1]);
 		if (f_in != f_out) close(f_out);
 
 		/* we can't let two processes write to the socket at one time */
@@ -437,15 +446,21 @@
 		write_int(recv_pipe[1],1);
 		close(recv_pipe[1]);
 		io_flush();
-		/* finally we go to sleep until our parent kills us
-		   with a USR2 signal. We sleep for a short time as on
-		   some OSes a signal won't interrupt a sleep! */
-		while (msleep(20))
-			;
+		do {
+			status = read(cleanup_pipe[0], &tmp, 1);
+		} while (status == -1 && errno == EINTR);
+		if (status != 1) {
+			rprintf(FERROR,"cleanup read returned %d in do_recv\n", status);
+			if (status == -1)
+				rprintf(FERROR,"with errno %d (%s)\n", errno, strerror(errno));
+			_exit(RERR_PARTIAL

Re: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-12 Thread Lapo Luchini
jw schultz wrote:

> I've attached an altered patch.  I've only dealt with this
> one location which produced errors doing a ssh pull.

OK, I created a test package with your patch included, so that anyone
willing to test but not willing to compile can use it.
Please note that it is built against the experimental cygwin DLL release
1.5.0, with support for 64-bit files.
If used on a system with an older DLL it can do bad things...
and maybe it can do bad things anyway; I take no responsibility, as this is a
*TEST* package.
(not that I am responsible anyway =P)
http://www.lapo.it/tmp/rsync-2.5.6-3.tar.bz2
http://www.lapo.it/tmp/rsync-2.5.6-3-src.tar.bz2
Please note moreover that, over seven runs of make check, I got the
following failures (each one once).
I'd say this patch gives problems ^_^
----- unsafe-links log follows
Testing for symlinks using 'test -h'
+ echo rsync with relative path and just -a
rsync with relative path and just -a
+ /tmp/rsync-2.5.6/rsync.exe -avv from/safe/ to
building file list ...
expand file_list to 4000 bytes, did move
done
created directory to
delta-transmission disabled for local transfer or --whole-file
files/file1
files/file2
links/file1 -> ../files/file1
links/file2 -> ../files/file2
links/unsafefile -> ../../unsafe/unsafefile
total: matches=0  tag_hits=0  false_alarms=0 data=0
wrote 297 bytes  read 52 bytes  232.67 bytes/sec
total size is 342  speedup is 0.98
+ test_symlink to/links/file1
+ is_a_link to/links/file1
+ test -h to/links/file1
+ test_symlink to/links/file2
+ is_a_link to/links/file2
+ test -h to/links/file2
+ test_symlink to/links/unsafefile
+ is_a_link to/links/unsafefile
+ test -h to/links/unsafefile
+ echo rsync with relative path and -a --copy-links
rsync with relative path and -a --copy-links
+ /tmp/rsync-2.5.6/rsync.exe -avv --copy-links from/safe/ to
building file list ...
expand file_list to 4000 bytes, did move
done
delta-transmission disabled for local transfer or --whole-file
files/file1 is uptodate
files/file2 is uptodate
links/file1 is uptodate
links/file2 is uptodate
links/unsafefile
total: matches=0  tag_hits=0  false_alarms=0 data=0
wrote 198 bytes  read 36 bytes  468.00 bytes/sec
total size is 0  speedup is 0.00
+ test_regular to/links/file1
+ [ ! -f to/links/file1 ]
+ test_regular to/links/file2
+ [ ! -f to/links/file2 ]
+ test_regular to/links/unsafefile
+ [ ! -f to/links/unsafefile ]
+ echo rsync with relative path and --copy-unsafe-links
rsync with relative path and --copy-unsafe-links
+ /tmp/rsync-2.5.6/rsync.exe -avv --copy-unsafe-links from/safe/ to
pipe: Address already in use
rsync error: error in IPC code (code 14) at pipe.c(107)
----- unsafe-links log ends
FAIL    unsafe-links
----- hands log follows
Testing for symlinks using 'test -h'
Test basic operation: Running: /tmp/rsync-2.5.6/rsync.exe -av 
/tmp/rsync-2.5.6/testtmp.hands/from/ /tmp/rsync-2.5.6/testtmp.hands/to
building file list ... done
./
dir/
dir/subdir/
dir/subdir/subsubdir/
dir/subdir/subsubdir/etc-ltr-list
dir/subdir/subsubdir2/
dir/subdir/subsubdir2/bin-lt-list
dir/text
empty
emptydir/
filelist
nolf
nolf-symlink -> nolf
text
wrote 829890 bytes  read 132 bytes  1660044.00 bytes/sec
total size is 829321  speedup is 1.00
-
check how the files compare with diff:

-
check how the directory listings compare with diff:
   done.
Test hard links: Running: /tmp/rsync-2.5.6/rsync.exe -avH 
/tmp/rsync-2.5.6/testtmp.hands/from/ /tmp/rsync-2.5.6/testtmp.hands/to
building file list ... done
dir/
dir/filelist
filelist => dir/filelist
wrote 21870 bytes  read 36 bytes  43812.00 bytes/sec
total size is 850647  speedup is 38.83
-
check how the files compare with diff:

-
check how the directory listings compare with diff:
   done.
Test one file: Running: /tmp/rsync-2.5.6/rsync.exe -avH 
/tmp/rsync-2.5.6/testtmp.hands/from/ /tmp/rsync-2.5.6/testtmp.hands/to
building file list ... done
./
text
wrote 374971 bytes  read 36 bytes  250004.67 bytes/sec
total size is 850647  speedup is 2.27
-
check how the files compare with diff:

-
check how the directory listings compare with diff:
   done.
Test extra data: Running: /tmp/rsync-2.5.6/rsync.exe -avH 
/tmp/rsync-2.5.6/testtmp.hands/from/ /tmp/rsync-2.5.6/testtmp.hands/to
building file list ... done
pipe failed in do_recv
rsync error: error in socket IO (code 10) at main.c(412)
rsync: connection unexpectedly closed (8 bytes read so far)
rsync error: error in rsync protocol data stream (code 12) at io.c(165)
----- hands log ends
FAIL    hands

----- hands log follows
Testing for symlinks using 'test -h'
Test basic operation: Running: /tmp/rsync-2.5.6/rsync.exe -av 
/tmp/rsync-2.5.6/testtmp.hands/from/ 

Re: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-12 Thread Anthony Heading
On Sat, Jul 12, 2003 at 03:52:59AM -0700, jw schultz wrote:
> There is a huge window between the write() and the return of
> waitpid() that, depending on scheduling and signal delivery,
> allows the child pid to be reaped by the SIGCHLD handler.  That
> results in this waitpid() returning -1 with errno of ECHILD.
> EINTR would also be possible.  The timing dependencies
> account for the intermittency of the error.

Hi JW - 

Afraid I've not really been following the rsync mailing list,
and it seems you've been addressing your comments about
my patch to James Tillman?

As I said originally, it was an illustrative patch - I didn't
flesh out the error handling since that made the concept
more difficult to follow.

Catching up now, I think your observation here is right.
In fact I'd made a similar change already myself locally.

Only one difference - I was consciously avoiding calling
wait_process(), since that function calls msleep() - which
was implicated in the original hanging problem!  Since
there is no signal being sent any more, hopefully it's not
a problem (except for the SIGUSR2 cases?) - however I
was wanting to ensure that the hangs were _completely_
eliminated, and thus didn't want to take any chances.

So my own patch here is checking the errno and gives
the OK for ECHILD.  I would worry that the whole
msleep NOHANG io_flush stuff is a very complex loop
to run simply to collect an exit status, particularly
when we believe that the root of the hang lies with
the underlying Cygwin OS.

But I think as long as the hangs don't reappear, your
updated patch is obviously more concise.  Otherwise, I'll be
further tempted to take the axe to the SIGCHLD handling,
which looks somewhat jammed with voodoo cruft.

Anyhow, just to let you know.  If you're happy tidying
up and refining the patch yourself, please go ahead. If
you want me to do anything, or have any comments on
what I've done, I'd appreciate an email.  However I
will try to follow the rsync list for the next few
weeks at least.

Rgds

Anthony



Re: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-12 Thread jw schultz
On Sat, Jul 12, 2003 at 11:42:52PM +0900, Anthony Heading wrote:
> On Sat, Jul 12, 2003 at 03:52:59AM -0700, jw schultz wrote:
> > There is a huge window between the write() and the return of
> > waitpid() that, depending on scheduling and signal delivery,
> > allows the child pid to be reaped by the SIGCHLD handler.  That
> > results in this waitpid() returning -1 with errno of ECHILD.
> > EINTR would also be possible.  The timing dependencies
> > account for the intermittency of the error.
>
> Hi JW - 
>
> Afraid I've not really been following the rsync mailing list,
> and it seems you've been addressing your comments about
> my patch to James Tillman?

Not in the least.  I've addressed them to the list.

> As I said originally, it was an illustrative patch - I didn't
> flesh out the error handling since that made the concept
> more difficult to follow.
>
> Catching up now, I think your observation here is right.
> In fact I'd made a similar change already myself locally.
>
> Only one difference - I was consciously avoiding calling
> wait_process(), since that function calls msleep() - which
> was implicated in the original hanging problem!  Since
> there is no signal being sent any more, hopefully it's not
> a problem (except for the SIGUSR2 cases?) - however I
> was wanting to ensure that the hangs were _completely_
> eliminated, and thus didn't want to take any chances.
>
> So my own patch here is checking the errno and gives
> the OK for ECHILD.  I would worry that the whole
> msleep NOHANG io_flush stuff is a very complex loop
> to run simply to collect an exit status, particularly
> when we believe that the root of the hang lies with
> the underlying Cygwin OS.

I don't recall msleep being a hang problem.  I don't see how
it could be.  Myself, i wonder why the WNOHANG and msleep
loop is used instead of a normal waitpid.  I initially had waitpid
with checking of the pid_stat_table on ECHILD but disliked
having the duplicate code.  Besides, if wait_process has a
hang problem let's fix that instead of orphaning it.
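
The combination mentioned here, a plain waitpid() with a fallback to
statuses recorded by the SIGCHLD handler, might look roughly like this.  It
is a sketch under assumptions, not rsync's code: reaped_table,
install_sigchld_recorder and collect_status are hypothetical names, and the
fixed table size is arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

#define MAX_REAPED 16 /* arbitrary size for this sketch */

struct reaped { pid_t pid; int status; };
static volatile struct reaped reaped_table[MAX_REAPED];
static volatile sig_atomic_t reaped_count = 0;

/* SIGCHLD handler: reap every finished child and remember its status. */
static void on_sigchld(int signo)
{
    int saved_errno = errno;
    int status;
    pid_t pid;

    (void)signo;
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        if (reaped_count < MAX_REAPED) {
            reaped_table[reaped_count].pid = pid;
            reaped_table[reaped_count].status = status;
            reaped_count++;
        }
    }
    errno = saved_errno;
}

int install_sigchld_recorder(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Blocking wait; if the handler got there first (ECHILD), use the table.
 * Returns 0 with *status filled in, or -1 if the status is unavailable. */
int collect_status(pid_t pid, int *status)
{
    sigset_t block, old;
    int i, found = -1;
    pid_t r;

    do {
        r = waitpid(pid, status, 0);
    } while (r == -1 && errno == EINTR);
    if (r == pid)
        return 0;
    if (errno != ECHILD)
        return -1;

    /* Keep the handler out while the table is being read. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);
    for (i = 0; i < reaped_count; i++) {
        if (reaped_table[i].pid == pid) {
            *status = reaped_table[i].status;
            found = 0;
            break;
        }
    }
    sigprocmask(SIG_SETMASK, &old, NULL);
    return found;
}
```

Either path yields the same exit status, so the caller does not need to
know whether the handler or the explicit waitpid() won the race.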

> But I think as long as the hangs don't reappear, your
> updated patch is obviously more concise.  Otherwise, I'll be
> further tempted to take the axe to the SIGCHLD handling,
> which looks somewhat jammed with voodoo cruft.

Layer on layer.  I don't care for it myself but changes in
this tend to cause problems on less popular platforms.

> Anyhow, just to let you know.  If you're happy tidying
> up and refining the patch yourself, please go ahead. If
> you want me to do anything, or have any comments on
> what I've done, I'd appreciate an email.  However I
> will try to follow the rsync list for the next few
> weeks at least.

As i said earlier, i intuit you are on to something with
this patch.  If you care to clean it up that would be good.
I would rather someone experiencing the hangs do the fix.
That tends to reduce the cycle times.

-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt


Re: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-10 Thread jw schultz
On Wed, Jul 09, 2003 at 06:47:35AM -0400, Tillman, James wrote:
 
 
> > -----Original Message-----
> > From: jw schultz [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, July 09, 2003 5:59 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: PATCH/RFC: Another stab at the Cygwin hang problem
> >
> >
> > > I can't quite place why but my instincts inform me that you
> > > have latched onto something.  Some sort of one character
> > > buffering error in the io libraries under cygwin.  Most
> > > likely in the Windows libs.
> > >
> > > Well, we have two reports of this fixing the rsync hang
> > > problem when signals failed.  I'd like a little more testing
> > > before mainlining it.
> >
> > Nope!  This is a no-go.  It intermittently produces
> >
> >     error (10) -- error in socket IO
> >
> > on both network and local transfers.
> >
>
> I guess I'd better double check my processes to make sure that I'm getting a
> satisfactory success rate on my own servers.  If I see any clues, I'll
> report them here.  Any hope for a fix, or does this look like an inherent
> problem in the method being used?

Better diags might help.  Pull over ssh hits this.

+	write(cleanup_pipe[1], ".", 1);
+	if (waitpid(pid, &status, 0) != pid) {
+		rprintf(FERROR,"cleanup in do_recv failed\n");
+		exit_cleanup(RERR_SOCKETIO);
+	}

I have two problems here.  Firstly, you are ignoring errno.
The waitpid call fails but you don't identify why.
Secondly, as long as the processes exit (no hangs, zombies
or runaways) and the actual transfer is successful, i don't
mind too much if the termination is less than perfect.
Let's not use RERR_SOCKETIO.  Let's use a different warning
status that only applies to the termination.
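
Both points can be illustrated in a few lines: include errno in the
diagnostic, and keep a termination-only problem from masquerading as a
socket error.  This is a sketch only; TERMINATION_WARNING and both function
names are hypothetical, and the numeric value is a placeholder rather than
an rsync exit code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical status meaning "transfer fine, child cleanup untidy". */
#define TERMINATION_WARNING 100

/* Build a diagnostic that says why waitpid() failed. */
int format_wait_failure(char *buf, size_t len, int err)
{
    return snprintf(buf, len,
                    "cleanup in do_recv failed: waitpid: %s (errno %d)",
                    strerror(err), err);
}

/* A real transfer error always wins; a failed wait on its own only
 * produces the termination warning. */
int termination_exit_code(int transfer_code, int wait_errno)
{
    if (transfer_code != 0)
        return transfer_code;
    return wait_errno == 0 ? 0 : TERMINATION_WARNING;
}
```

With this split, a caller can log the formatted message and still exit
with the transfer's own status whenever the transfer itself failed.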





-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt


Re: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-09 Thread jw schultz
On Mon, Jun 30, 2003 at 05:49:45PM -0700, jw schultz wrote:
> On Mon, Jun 30, 2003 at 11:12:29PM +0900, Anthony Heading wrote:
> > On Mon, Jun 30, 2003 at 04:54:22AM -0700, jw schultz wrote:
> > > Could you regenerate the patch with diff -u please?
> >
> > Okay, sure.  This one against current CVS.
>
> Thanks, that helps in examining it.
>
> I can't quite place why but my instincts inform me that you
> have latched onto something.  Some sort of one character
> buffering error in the io libraries under cygwin.  Most
> likely in the Windows libs.
>
> Well, we have two reports of this fixing the rsync hang
> problem when signals failed.  I'd like a little more testing
> before mainlining it.

Nope!  This is a no-go.  It intermittently produces

error (10) -- error in socket IO

on both network and local transfers.



-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt


RE: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-09 Thread Tillman, James


> -----Original Message-----
> From: jw schultz [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, July 09, 2003 5:59 AM
> To: [EMAIL PROTECTED]
> Subject: Re: PATCH/RFC: Another stab at the Cygwin hang problem
>
>
> > I can't quite place why but my instincts inform me that you
> > have latched onto something.  Some sort of one character
> > buffering error in the io libraries under cygwin.  Most
> > likely in the Windows libs.
> >
> > Well, we have two reports of this fixing the rsync hang
> > problem when signals failed.  I'd like a little more testing
> > before mainlining it.
>
> Nope!  This is a no-go.  It intermittently produces
>
>     error (10) -- error in socket IO
>
> on both network and local transfers.
>

I guess I'd better double check my processes to make sure that I'm getting a
satisfactory success rate on my own servers.  If I see any clues, I'll
report them here.  Any hope for a fix, or does this look like an inherent
problem in the method being used?

jpt


RE: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-09 Thread Tillman, James
My sincerest apologies for the duplicate msgs from me that were sent to the
list this morning.  My email administrator must have done something quite
stupid to have all msgs I've sent in the last week go out again!

jpt



Re: PATCH/RFC: Another stab at the Cygwin hang problem

2003-07-09 Thread jw schultz
On Wed, Jul 09, 2003 at 06:47:35AM -0400, Tillman, James wrote:
 
 
> > -----Original Message-----
> > From: jw schultz [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, July 09, 2003 5:59 AM
> > To: [EMAIL PROTECTED]
> > Subject: Re: PATCH/RFC: Another stab at the Cygwin hang problem
> >
> >
> > > I can't quite place why but my instincts inform me that you
> > > have latched onto something.  Some sort of one character
> > > buffering error in the io libraries under cygwin.  Most
> > > likely in the Windows libs.
> > >
> > > Well, we have two reports of this fixing the rsync hang
> > > problem when signals failed.  I'd like a little more testing
> > > before mainlining it.
> >
> > Nope!  This is a no-go.  It intermittently produces
> >
> >     error (10) -- error in socket IO
> >
> > on both network and local transfers.
> >
>
> I guess I'd better double check my processes to make sure that I'm getting a
> satisfactory success rate on my own servers.  If I see any clues, I'll
> report them here.  Any hope for a fix, or does this look like an inherent
> problem in the method being used?

I haven't dug into it yet.  As the patch author you might
be a bit more familiar with it.

I'm not running cygwin so i never had the hangs.  I only
applied it to test for regression, which is what i found.

-- 

J.W. Schultz            Pegasystems Technologies
email address:  [EMAIL PROTECTED]

Remember Cernan and Schmitt


Re: PATCH/RFC: Another stab at the Cygwin hang problem

2003-06-30 Thread Anthony Heading
On Mon, Jun 30, 2003 at 04:54:22AM -0700, jw schultz wrote:
> Could you regenerate the patch with diff -u please?

Okay, sure.  This one against current CVS.

Anthony

--- cleanup.c.Orig	2003-06-30 22:42:16.000000000 +0900
+++ cleanup.c	2003-06-30 22:42:47.000000000 +0900
@@ -96,7 +96,6 @@
 	inside_cleanup++;
 
 	signal(SIGUSR1, SIG_IGN);
-	signal(SIGUSR2, SIG_IGN);
 
 	if (verbose > 3)
 		rprintf(FINFO,"_exit_cleanup(code=%d, file=%s, line=%d): entered\n", 
--- main.c.Orig	2003-04-25 01:26:09.000000000 +0900
+++ main.c	2003-06-30 22:41:35.000000000 +0900
@@ -391,6 +391,7 @@
 	int status=0;
 	int recv_pipe[2];
 	int error_pipe[2];
+	int cleanup_pipe[2];
 	extern int preserve_hard_links;
 	extern int delete_after;
 	extern int recurse;
@@ -417,11 +418,19 @@
 		exit_cleanup(RERR_SOCKETIO);
 	}
 
+	if (pipe(cleanup_pipe) < 0) {
+		rprintf(FERROR,"cleanup pipe failed in do_recv\n");
+		exit_cleanup(RERR_SOCKETIO);
+	}
+	
 	io_flush();
 
 	if ((pid=do_fork()) == 0) {
+		char tmp;
+
 		close(recv_pipe[0]);
 		close(error_pipe[0]);
+		close(cleanup_pipe[1]);
 		if (f_in != f_out) close(f_out);
 
 		/* we can't let two processes write to the socket at one time */
@@ -437,15 +446,21 @@
 		write_int(recv_pipe[1],1);
 		close(recv_pipe[1]);
 		io_flush();
-		/* finally we go to sleep until our parent kills us
-		   with a USR2 signal. We sleep for a short time as on
-		   some OSes a signal won't interrupt a sleep! */
-		while (msleep(20))
-			;
+		do {
+			status = read(cleanup_pipe[0], &tmp, 1);
+		} while (status == -1 && errno == EINTR);
+		if (status != 1) {
+			rprintf(FERROR,"cleanup read returned %d in do_recv\n", status);
+			if (status == -1)
+				rprintf(FERROR,"with errno %d (%s)\n", errno, strerror(errno));
+			_exit(RERR_PARTIAL);
+		}
+		_exit(0);
 	}
 
 	close(recv_pipe[1]);
 	close(error_pipe[1]);
+	close(cleanup_pipe[0]);
 	if (f_in != f_out) close(f_in);
 
 	io_start_buffering(f_out);
@@ -463,8 +478,11 @@
 	io_flush();
 
 	io_set_error_fd(-1);
-	kill(pid, SIGUSR2);
-	wait_process(pid, &status);
+	write(cleanup_pipe[1], ".", 1);
+	if (waitpid(pid, &status, 0) != pid) {
+		rprintf(FERROR,"cleanup in do_recv failed\n");
+		exit_cleanup(RERR_SOCKETIO);
+	}
 	return status;
 }
 
@@ -881,12 +899,6 @@
 	exit_cleanup(RERR_SIGNAL);
 }
 
-static RETSIGTYPE sigusr2_handler(int UNUSED(val)) {
-	extern int log_got_error;
-	if (log_got_error) _exit(RERR_PARTIAL);
-	_exit(0);
-}
-
 static RETSIGTYPE sigchld_handler(int UNUSED(val)) {
 #ifdef WNOHANG
 	int cnt, status;
@@ -976,7 +988,6 @@
 	orig_argv = argv;
 
 	signal(SIGUSR1, sigusr1_handler);
-	signal(SIGUSR2, sigusr2_handler);
 	signal(SIGCHLD, sigchld_handler);
 #ifdef MAINTAINER_MODE
 	signal(SIGSEGV, rsync_panic_handler);
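
Stripped of the rsync specifics, the idea in this patch is that the child
parks in a blocking read() on a pipe and the parent releases it by writing
a single byte, so no signal has to be delivered.  A minimal standalone
sketch, assuming a POSIX system; the function names and PARTIAL_EXIT are
hypothetical stand-ins for illustration and do not appear in rsync:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define PARTIAL_EXIT 23 /* illustrative exit code for "no release byte" */

/* Child side: block until one byte arrives, retrying on EINTR.
 * Returns 0 if the byte arrived, -1 on EOF or a read error. */
static int wait_for_release(int read_fd)
{
    char tmp;
    ssize_t n;

    do {
        n = read(read_fd, &tmp, 1);
    } while (n == -1 && errno == EINTR);
    return n == 1 ? 0 : -1;
}

/* Fork a child that parks on the pipe.  Returns the child's pid and
 * stores the parent's write end in *write_fd, or returns -1 on error. */
pid_t spawn_parked_child(int *write_fd)
{
    int fds[2];
    pid_t pid;

    if (pipe(fds) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[1]); /* child only reads */
        _exit(wait_for_release(fds[0]) == 0 ? 0 : PARTIAL_EXIT);
    }
    close(fds[0]); /* parent only writes */
    *write_fd = fds[1];
    return pid;
}

/* Parent side: release the child, then collect its exit code.
 * Returns the exit code, or -1 if the child could not be reaped here. */
int release_and_reap(pid_t pid, int write_fd)
{
    int status;
    ssize_t n;
    pid_t r;

    do {
        n = write(write_fd, ".", 1);
    } while (n == -1 && errno == EINTR);
    close(write_fd); /* if the write failed, EOF still wakes the child */

    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);
    if (r != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because the child leaves its read() on either a byte or EOF, it cannot be
left sleeping if the parent dies or closes the pipe early.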


Re: PATCH/RFC: Another stab at the Cygwin hang problem

2003-06-30 Thread jw schultz
On Mon, Jun 30, 2003 at 11:12:29PM +0900, Anthony Heading wrote:
> On Mon, Jun 30, 2003 at 04:54:22AM -0700, jw schultz wrote:
> > Could you regenerate the patch with diff -u please?
>
> Okay, sure.  This one against current CVS.

Thanks, that helps in examining it.

I can't quite place why but my instincts inform me that you
have latched onto something.  Some sort of one character
buffering error in the io libraries under cygwin.  Most
likely in the Windows libs.

Well, we have two reports of this fixing the rsync hang
problem when signals failed.  I'd like a little more testing
before mainlining it.
