Hi,
I'm using hadoop-2.2.0 and rely on Hadoop's WritableRpcEngine to build my
distributed application. The application has a 'heartbeat' interface that is
called periodically to check availability. To detect potential failures, I
enabled "rpc_timeout" when creating the proxy, as below:
    int rpcTimeout = 1000; // 1 second as RPC timeout
    RPC.waitForProxy(
        MyApplicationInterface.class, MyApplicationInterface.versionID,
        socAddr, conf, rpcTimeout, timeout);
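
For illustration, a heartbeat check with a deadline can be sketched in plain
Java as below. This is only a sketch under assumptions: Heartbeat, ping() and
isAlive() are hypothetical names standing in for the RPC proxy call, and the
deadline is enforced with a Future rather than by Hadoop's own rpcTimeout
handling, so it is not the actual application code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HeartbeatSketch {

    /** Hypothetical stand-in for the heartbeat method exposed by the RPC proxy. */
    public interface Heartbeat {
        void ping() throws Exception;
    }

    /**
     * Returns true if ping() completes within timeoutMs,
     * false if it times out, fails, or the caller is interrupted.
     */
    public static boolean isAlive(final Heartbeat target, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Callable<Void> task = () -> {
                target.ping();
                return null;
            };
            Future<Void> call = executor.submit(task);
            call.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // no answer before the deadline
        } catch (ExecutionException e) {
            return false; // ping() itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Interrupt a ping() that is still running and release the thread.
            executor.shutdownNow();
        }
    }
}
```

A caller would invoke isAlive(target, 1000) from a periodic task and treat a
false result as a suspected failure.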
Everything went fine initially, and the heartbeat did detect failures. But
after a period of time (2 days or so), I saw a lot of TCP connections in
CLOSE_WAIT state on the server side, and the client was not able to connect
to it again.
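
For reference, CLOSE_WAIT is the state a TCP socket enters when the peer has
closed its end of the connection but the local process has not yet called
close() on its own socket. The plain-Java sketch below (no Hadoop involved;
the class and method names are made up for illustration) walks through that
sequence on a loopback connection. It does not claim to show what the RPC
server does internally.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class CloseWaitSketch {

    /**
     * Opens a loopback connection, closes the client end, and returns what
     * the accepting end reads next. A result of -1 means end of stream,
     * i.e. the peer's FIN has arrived.
     */
    public static int readAfterPeerClose() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback)) {
            Socket client = new Socket(loopback, listener.getLocalPort());
            Socket accepted = listener.accept();
            try {
                accepted.setSoTimeout(5000); // do not block forever in read()
                client.close();              // the peer closes first
                // read() waits for the peer's FIN, then reports end of stream.
                int result = accepted.getInputStream().read();
                // From here until accepted.close() runs, the accepting side
                // of the connection sits in CLOSE_WAIT.
                return result;
            } finally {
                // Closing the local socket is what moves it out of CLOSE_WAIT;
                // if this call never happens, the connection stays in that state.
                accepted.close();
            }
        }
    }
}
```

If sockets on the accepting side are never closed after the peer goes away,
they accumulate in CLOSE_WAIT and keep their file descriptors in use.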
Any clue about this?
Thanks
--
--Anfernee