GitHub user shivaram commented on the pull request:
https://github.com/apache/spark/pull/1471#issuecomment-50208195
@mateiz I looked at this more closely today, and it turns out these
retries don't help with "Connection timed out" exceptions. If the connection
attempt times out, the socket gets closed, and from [1] we get a
`ClosedChannelException`. So even if we sleep and call `finishConnect` again,
we get back the same exception.
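A minimal Java sketch of the behavior described above, reproduced without needing a real network timeout: once a `SocketChannel` has been closed (which is what the JDK does to the channel when a pending connect fails), any further `finishConnect` call throws `ClosedChannelException`, so sleeping and retrying on the same channel cannot succeed. The class and method names are hypothetical, and closing the channel by hand is an assumption standing in for the close the JDK performs after a timed-out connect.

```java
import java.io.IOException;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.SocketChannel;

public class FinishConnectAfterClose {
    /** Returns true if finishConnect on an already-closed channel throws ClosedChannelException. */
    static boolean retryOnClosedChannelFails() throws IOException {
        SocketChannel ch = SocketChannel.open();
        ch.configureBlocking(false); // non-blocking, as when finishConnect is used
        // Stand-in for what happens after a timed-out connect:
        // the channel is closed before the exception reaches the caller.
        ch.close();
        try {
            ch.finishConnect(); // "sleep and retry" on the same channel
            return false;
        } catch (ClosedChannelException e) {
            return true; // every later attempt on this channel fails the same way
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("ClosedChannelException on retry: " + retryOnClosedChannelFails());
    }
}
```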
The right thing to do, if the socket was closed due to a timeout, is to open
a new socket and try to connect again. The ConnectionManager doesn't support
that right now, and adding it seems like a much bigger change. Do you think
that would be useful?
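A rough sketch of the "open a new socket and try again" approach, assuming a plain blocking `connect` for brevity; it is not how the ConnectionManager is structured, and `connectWithRetry`, `maxAttempts`, and `backoffMs` are hypothetical names. The key point is that each attempt creates a fresh `SocketChannel` instead of calling `finishConnect` on the one that was closed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class ReconnectWithFreshSocket {
    /**
     * Tries to connect up to maxAttempts times. Each attempt opens a NEW
     * SocketChannel, because a channel whose connect failed has been closed
     * and cannot be reused.
     */
    static SocketChannel connectWithRetry(InetSocketAddress addr, int maxAttempts, long backoffMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            SocketChannel ch = SocketChannel.open(); // fresh socket per attempt
            try {
                ch.connect(addr); // blocking connect for simplicity
                return ch;
            } catch (IOException e) {
                last = e;
                ch.close(); // release the failed channel (no-op if already closed)
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs * attempt); // linear backoff between attempts
                }
            }
        }
        throw last; // all attempts failed; surface the most recent error
    }

    public static void main(String[] args) throws Exception {
        // Local listener on an ephemeral loopback port, so the example needs no remote host.
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            InetSocketAddress addr = (InetSocketAddress) server.getLocalAddress();
            try (SocketChannel ch = connectWithRetry(addr, 3, 100)) {
                System.out.println("connected: " + ch.isConnected());
            }
        }
    }
}
```

In a non-blocking setup the same idea applies: when `finishConnect` fails, discard that channel and register a newly opened one with the selector rather than retrying the old one.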
FWIW, I was able to work around my problems by increasing the number of SYN
retries on Linux (I ran `echo 8 > /proc/sys/net/ipv4/tcp_syn_retries`).
[1]
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/sun/nio/ch/SocketChannelImpl.java?av=h#573