We have an environment where we need to harmonize socket connection timeouts across all Hadoop daemons and some downstream components as well. While reviewing the socket connection timeouts set via NetUtils and UrlConnection (HttpURLConnection), I compiled the following list of configurations:
- ipc.client.connect.timeout
- dfs.client.socket-timeout
- dfs.datanode.socket.write.timeout
- dfs.client.fsck.connect.timeout
- dfs.client.fsck.read.timeout
- dfs.federation.router.connect.timeout
- dfs.qjournal.http.open.timeout.ms
- dfs.qjournal.http.read.timeout.ms
- dfs.checksum.ec.socket-timeout
- hadoop.security.kms.client.timeout
- mapreduce.reduce.shuffle.connect.timeout
- mapreduce.reduce.shuffle.read.timeout

Moreover, although "dfs.datanode.socket.reuse.keepalive" is not named as a socket timeout, its value is applied as SocketOptions#SO_TIMEOUT when opsProcessed != 0, so a blocking read on the InputStream waits only for that interval before throwing SocketTimeoutException. Similarly, "ipc.ping.interval" and "ipc.client.rpc-timeout.ms" are also used to set SocketOptions#SO_TIMEOUT on the socket.

It's possible that I have missed some socket timeout configs in the above list. If anyone could provide feedback on this list or point out any missing configs, it would be greatly appreciated.
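To illustrate the SO_TIMEOUT semantics mentioned above, here is a minimal, self-contained sketch (not Hadoop code; the 200 ms value is an arbitrary stand-in for whichever config value gets applied): once setSoTimeout is set on a socket, a blocking read waits at most that long before throwing SocketTimeoutException.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SoTimeoutDemo {
    public static void main(String[] args) throws IOException {
        // A local server that accepts the connection but never writes,
        // so the client's blocking read can only end via the timeout.
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("localhost", server.getLocalPort());
             Socket accepted = server.accept()) {
            // Analogous to how Hadoop applies values such as
            // ipc.ping.interval as SocketOptions#SO_TIMEOUT:
            // it bounds each blocking read, not the connection itself.
            client.setSoTimeout(200); // hypothetical 200 ms read timeout
            try {
                client.getInputStream().read(); // no data will ever arrive
                System.out.println("read returned");
            } catch (SocketTimeoutException e) {
                System.out.println("SocketTimeoutException after SO_TIMEOUT");
            }
        }
    }
}
```

Note that SO_TIMEOUT is per-read: each read call gets the full interval again, which is why a keepalive-style value can double as a read timeout on a reused connection.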