Hi YuMing, :) Yes, several iterations of jstack on the problem regionserver could help identify the problem.
Rural, you probably hit HBASE-11277 (and probably YuMin as well): reader thread 14 loops again and again in the stack below (high CPU usage), while listener thread 12 is blocked by it and cannot accept new connections.

Thread 12 (RpcServer.listener,port=60020):
  State: BLOCKED
  Blocked count: 123264191
  Waited count: 0
  Blocked on org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader@77f87716
  Blocked by 14 (RpcServer.reader=1,port=60020)
  Stack:
    org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.registerChannel(RpcServer.java:598)
    org.apache.hadoop.hbase.ipc.RpcServer$Listener.doAccept(RpcServer.java:755)
    org.apache.hadoop.hbase.ipc.RpcServer$Listener.run(RpcServer.java:673)
Thread 24 (RpcServer.responder):

Thread 14 (RpcServer.reader=1,port=60020):
  State: RUNNABLE
  Blocked count: 12510492
  Waited count: 12826560
  Stack:
    sun.nio.ch.FileDispatcher.read0(Native Method)
    sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:251)
    sun.nio.ch.IOUtil.read(IOUtil.java:224)
    sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:254)
    org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2438)
    org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2404)
    org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1498)
    org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:780)
    org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:568)
    org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:543)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:701)
Thread 13 (RpcServer.reader=0,port=60020):

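
If you collect several dumps, something like the rough sketch below might help spot the same "listener blocked by a reader" pattern in each of them. It assumes the dumps are saved as text in the layout pasted above ("Thread N (name):", "State: ...", "Blocked by N (name)"); raw jstack output is laid out differently, so the patterns would need adjusting. The function names find_blocked and persistent_blocks are just made up for illustration.

```python
import re
from collections import Counter

# Header line such as "Thread 12 (RpcServer.listener,port=60020):"
THREAD_RE = re.compile(r"^Thread (\d+) \((.*)\):\s*$")
# Line such as "Blocked by 14 (RpcServer.reader=1,port=60020)"
BLOCKED_BY_RE = re.compile(r"^Blocked by (\d+) \((.*)\)\s*$")


def find_blocked(dump_text):
    """Return (blocked_thread_name, blocking_thread_name) pairs in one dump."""
    pairs = []
    current = None  # name of the thread whose section we are currently in
    state = None    # last "State:" value seen for that thread
    for raw in dump_text.splitlines():
        line = raw.strip()
        header = THREAD_RE.match(line)
        if header:
            current, state = header.group(2), None
            continue
        if line.startswith("State:"):
            state = line.split(":", 1)[1].strip()
            continue
        blocker = BLOCKED_BY_RE.match(line)
        if blocker and current is not None and state == "BLOCKED":
            pairs.append((current, blocker.group(2)))
    return pairs


def persistent_blocks(dumps):
    """Return the pairs that show up in every dump of the given list."""
    counts = Counter()
    for dump in dumps:
        counts.update(set(find_blocked(dump)))
    return [pair for pair, seen in counts.items() if seen == len(dumps)]
```

A pair that survives persistent_blocks over dumps taken some seconds apart would suggest the listener is stuck behind the same reader the whole time, rather than being blocked only briefly.
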
2014-07-10 14:13:49,614 WARN [RpcServer.reader=7,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:251)
    at sun.nio.ch.IOUtil.read(IOUtil.java:224)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:254)
    at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2404)
    at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1425)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:780)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:568)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:543)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:701)

On Mon, Jul 14, 2014 at 9:24 AM, Rural Hunter <[email protected]> wrote:

> Yes. But you may want to check if there are many connections in SYN_RECV
> state when the problem happens.
>
> On 2014/7/14 4:18, vito wrote:
>
>> Hi Rural,
>>
>> Do you mean the following action you have taken? Thanks a lot.
>>
>> "Anyway, I just changed these kernel settings:
>> net.core.somaxconn=1024 (original 128)
>> net.ipv4.tcp_synack_retries=2 (original 5) "
>>
>> --
>> View this message in context: http://apache-hbase.679495.n3.nabble.com/hbase-region-servers-refuse-connection-tp4061278p4061293.html
>> Sent from the HBase User mailing list archive at Nabble.com.
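
On the SYN_RECV check mentioned in the quoted reply: one possible way to count such connections on the regionserver port is to read /proc/net/tcp on the Linux host, where the "st" column holds the socket state in hex and 03 stands for SYN_RECV. This is only a sketch under those assumptions; count_syn_recv is a made-up helper name, and it covers IPv4 only (if the server listens on an IPv6 socket, /proc/net/tcp6 would have to be read the same way).

```python
TCP_SYN_RECV = 0x03  # state code used in the "st" column of /proc/net/tcp


def count_syn_recv(proc_net_tcp_text, port):
    """Count sockets in SYN_RECV state whose local port equals `port`."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated rows
        # fields[1] is the local address as HEXIP:HEXPORT, fields[3] the state
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        state = int(fields[3], 16)
        if local_port == port and state == TCP_SYN_RECV:
            count += 1
    return count
```

Feeding it the contents of /proc/net/tcp with port 60020 while the regionserver is refusing connections, and again when it is healthy, should show whether SYN_RECV entries pile up during the problem.
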
