There was a Solaris patch a while back
that broke rsh on 2.6 and caused
this same issue on many of our systems.

Bug ID 4242754 caused by jumbo kernel patch
105181-13...

Check and see if you have the updated 105181 
patch. Might be a good place to start.
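
If it helps, here is a rough sketch for pulling the installed revision of a patch out of `showrev -p`. It assumes the usual `Patch: 105181-NN Obsoletes: ...` line format, and the function name `patch_rev` is just made up for illustration. It only reports the revision; which revision actually carries the fix is something to confirm against the bug report.

```shell
# Print the highest installed revision of a patch ID, reading
# `showrev -p` style output on stdin. Prints nothing if the
# patch is not listed.
# Assumption: lines look like "Patch: 105181-13 Obsoletes: ..."
patch_rev() {
    sed -n "s/^Patch: $1-\([0-9][0-9]*\).*/\1/p" | sort -n | sed -n '$p'
}

# On the Solaris host (not run here):
#   showrev -p | patch_rev 105181
```

If this prints 13, you are on the revision named in the bug.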

-b

-----Original Message-----
From: Hal Haygood [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, January 09, 2001 4:06 AM
To: [EMAIL PROTECTED]
Subject: Rsync 2.4.6, Solaris 2.6 hang, w/info


I'm experiencing a hang with rsync 2.4.6 on Solaris.  Initiating and target
hosts are both Solaris 2.6.  It looks like there might be some network
latency issues, but the parent rsh process has been blocking on the same
write() for several hours now, so I don't think that's quite it.  It also
looks like something's quite hung up, because the 15-minute timeout isn't
timing out.

This is for an rsync push of a large directory tree.  The command is:

/usr/local/bin/rsync \
        -avzHlW \
        --rsync-path=/usr/local/bin/rsync \
        --timeout=900 \
        --delete \
        --exclude (some excludes here) \
        /local/directory/name/* \
        remotehost:/remote/directory/name

The TCP queue on the sending host looks like this:

   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q  State
-------------------- -------------------- ----- ------ ----- ------ -------
thishost.1018        remotehost.shell     8760      0     0      0 ESTABLISHED
thishost.1017        remotehost.1022      8760      0  8760      0 ESTABLISHED

The TCP queue on the receiving host looks like this:

   Local Address        Remote Address    Swind Send-Q Rwind Recv-Q  State
-------------------- -------------------- ----- ------ ----- ------ -------
remotehost.shell     thishost.1018           1      0  8760      0 ESTABLISHED
remotehost.1022      thishost.1017        8760      0  8760      0 ESTABLISHED
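
The rows that stand out are the ones on the .shell connection, where a window has dropped to 0 (Rwind on the sending host) or 1 (Swind on the receiving host). A quick filter along these lines picks such rows out of netstat output. This is only a sketch: the function name `small_windows` and the 100-byte threshold are arbitrary choices for illustration, and it assumes each connection sits on one line in the column order shown above.

```shell
# Flag TCP connections whose send or receive window has collapsed.
# Reads netstat-style rows on stdin, one connection per line:
#   Local  Remote  Swind  Send-Q  Rwind  Recv-Q  State
# Header and separator lines are skipped because their third and
# fifth fields are not plain numbers.
small_windows() {
    awk '
        BEGIN { min = 100 }   # arbitrary threshold in bytes (assumption)
        NF == 7 && $3 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
            if ($3 + 0 < min || $5 + 0 < min)
                print $1, "->", $2, "Swind=" $3, "Rwind=" $5
        }
    '
}
```

Feed it the TCP section of the netstat output on each host; on Solaris, nawk may be needed in place of awk.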

The "rsync -avzHlW" process on the sending host is looping on something like
this:
poll(0xEFFFD580, 0, 20)                         = 0
poll(0xEFFFD580, 0, 1)                          = 0
waitid(P_PID, 3019, 0xEFFFF588, WEXITED|WTRAPPED|WNOHANG) = 0
poll(0xEFFFD580, 0, 20)                         = 0
waitid(P_PID, 3019, 0xEFFFF588, WEXITED|WTRAPPED|WNOHANG) = 0
poll(0xEFFFD580, 0, 20)                         = 0
poll(0xEFFFD580, 0, 1)                          = 0
waitid(P_PID, 3019, 0xEFFFF588, WEXITED|WTRAPPED|WNOHANG) = 0
poll(0xEFFFD580, 0, 20)                         = 0
waitid(P_PID, 3019, 0xEFFFF588, WEXITED|WTRAPPED|WNOHANG) = 0
poll(0xEFFFD580, 0, 20)                         = 0
poll(0xEFFFD580, 0, 9)                          = 0
waitid(P_PID, 3019, 0xEFFFF588, WEXITED|WTRAPPED|WNOHANG) = 0
poll(0xEFFFD580, 0, 20)                         = 0
poll(0xEFFFD580, 0, 1)                          = 0

The parent rsh process on the sending host is stuck in:
write(1, " p a r t o f a f i l e n a m e".., 285) (sleeping...)

The child rsh process on the sending host is stuck in:
read(0, 0xEFFFF410, 1024)       (sleeping...)

The "rsync --server" process on the receiving host is stuck in:
poll(0xEFFFC110, 1, 60000)      (sleeping...)

The "csh -c /usr/local/bin/rsync" process on the receiving host is stuck in:
sigsuspend(0xEFFFF938)          (sleeping...)

The "in.rshd" process on the receiving host is stuck in:
poll(0xEFFFD7F8, 2, -1)         (sleeping...)

So, any ideas?  Like I said, it looks like write() is blocking for no 
particular reason, and that's causing us to sit and spin.

Thoughts?

Hal
