OK. Here is the summary of what I know:
A region server, after some amount of scanning, can begin to get ClosedChannelException when it tries to respond to the client. Unfortunately, this only affects the response to the client. The region server apparently continues to check in with ZooKeeper and say "I'm alive and OK". Consequently, the region server is never shut down, and the client keeps trying to access regions on the effectively dead server. Each of those requests eventually times out on the client side, since all the client sees is "I sent a request and never received any response on the socket". The client has no way to inform the master of the problem.

If I manually shut down the region server where the problem exists, its regions are automatically redistributed to other region servers. The client then receives the new locations of those regions, on different region servers, and can begin functioning again. However, the problem soon reappears on a different region server.

-geoff
