Did you set hbase.ipc.server.callqueue.handler.factor? it looks there are 3 queues, handlers on queue 1 are all gone as Stack mentioned. jstack and pastebin regions server log would help.
On Sat, Aug 23, 2014 at 7:02 AM, Stack <[email protected]> wrote: > On Fri, Aug 22, 2014 at 3:24 PM, Johannes Schaback < > [email protected]> wrote: > > > ... > > I grep'ed "defaultRpcServer.handler=" on the log from that particular RS. > > The > > RS started at 15:35. After that, the handlers > > > > 6, 24, 0, 15, 28, 26, 7, 19, 21, 3, 5 and 23 > > > > make an appearance in error messages of HDFS related exceptions: > > > > 2014-08-22 13:54:57,470 WARN > > [defaultRpcServer.handler=6,queue=0,port=60020] hdfs.BlockReaderFactory: > > I/O error constructing remote block reader. > > 2014-08-22 13:54:57,472 WARN > > [defaultRpcServer.handler=6,queue=0,port=60020] hdfs.DFSClient: > Connection > > failure: Failed to connect to /192.168.3.233:50010 for file > > /hbase/data/default/image/dae1b8e3bfdb608571d09916bf0f > > a156/cf/866a773857654b0d83275dc4e4558be6 for block > > > > > BP-1157637685-192.168.3.192-1382642140917:blk_1294949058_255264636:java.io.IOException: > > Got error for OP_READ_BLOCK, self=/192.168.3.179:37835, remote=/ > > 192.168.3.233:50 010, for file > > > > > /hbase/data/default/image/dae1b8e3bfdb608571d09916bf0fa156/cf/866a773857654b0d83275dc4e4558be6, > > for pool BP-1157637685-192.168.3.192-1382642140917 block > > 1294949058_255264636 > > 2014-08-22 13:56:58,525 WARN > > [defaultRpcServer.handler=24,queue=0,port=60020] > hdfs.BlockReaderFactory: > > I/O error constructing remote block reader. > > 2014-08-22 13:56:58,799 WARN > > [defaultRpcServer.handler=24,queue=0,port=60020] hdfs.DFSClient: > > Connection failure: Failed to connect to /192.168.3.142:50010 for file > > /hbase/data/default/image/cf493cbd9921ae6ca5e5281cc07 > > 18ca2/cf/7f9618dcdeae40ddbff21165d08e0a83 for block > > > > > BP-1157637685-192.168.3.192-1382642140917:blk_1294072754_254292863:java.io.IOException: > > Got error for OP_READ_BLOCK, self=/192.168.3.179:59268, remote=/ > > 192.168.3.142:5 0010, for file > > > > > /hbase/data/default/image/cf493cbd9921ae6ca5e5281cc0718ca2/cf/7f9618dcdeae40ddbff21165d08e0a83, > > for pool BP-1157637685-192.168.3.192-1382642140917 block > > 1294072754_254292863 > > 2014-08-22 14:08:01,632 WARN > > [defaultRpcServer.handler=0,queue=0,port=60020] hdfs.BlockReaderFactory: > > I/O error constructing remote block reader. > > 2014-08-22 14:08:01,633 WARN > > [defaultRpcServer.handler=0,queue=0,port=60020] hdfs.DFSClient: > Connection > > failure: Failed to connect to /192.168.3.53:50010 for file > > /hbase/data/default/image/af02bc7fb404f4c054dcd64b44b0e > > 2a9/cf/a8881b6e2d5d41b3b56fd34fd4ca8ffd for block > > > > > BP-1157637685-192.168.3.192-1382642140917:blk_1294876123_255182439:java.io.IOException: > > Got error for OP_READ_BLOCK, self=/192.168.3.179:60737, remote=/ > > 192.168.3.53:5001 0, for file > > > > > /hbase/data/default/image/af02bc7fb404f4c054dcd64b44b0e2a9/cf/a8881b6e2d5d41b3b56fd34fd4ca8ffd, > > for pool BP-1157637685-192.168.3.192-1382642140917 block > > 1294876123_255182439 > > 2014-08-22 14:11:34,192 WARN > > [defaultRpcServer.handler=15,queue=0,port=60020] ipc.RpcServer: > > (responseTooSlow): > > > > > {"processingtimems":15432,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiReque > > st)","client":"192.168.3.54:52838 > > > > > ","starttimems":1408709478713,"queuetimems":0,"class":"HRegionServer","responsesize":99638,"method":"Multi"} > > 2014-08-22 14:22:21,847 WARN > > [defaultRpcServer.handler=28,queue=1,port=60020] > hdfs.BlockReaderFactory: > > I/O error constructing remote block reader. > > 2014-08-22 14:22:21,848 WARN > > [defaultRpcServer.handler=28,queue=1,port=60020] hdfs.DFSClient: > > Connection failure: Failed to connect to /192.168.3.233:50010 for file > > /hbase/data/default/image/1d33a251862055cd999078bbd10 > > aa44c/cf/a7470351725f4cc192b6210bac9b7c44 for block > > > > > BP-1157637685-192.168.3.192-1382642140917:blk_1294949316_255264902:java.io.IOException: > > Got error for OP_READ_BLOCK, self=/192.168.3.179:44787, remote=/ > > 192.168.3.233:5 0010, for file > > > > > /hbase/data/default/image/1d33a251862055cd999078bbd10aa44c/cf/a7470351725f4cc192b6210bac9b7c44, > > for pool BP-1157637685-192.168.3.192-1382642140917 block > > 1294949316_255264902 > > 2014-08-22 14:23:22,628 WARN > > [defaultRpcServer.handler=26,queue=2,port=60020] > hdfs.BlockReaderFactory: > > I/O error constructing remote block reader. > > 2014-08-22 14:23:22,628 WARN > > [defaultRpcServer.handler=26,queue=2,port=60020] hdfs.DFSClient: > > Connection failure: Failed to connect to /192.168.3.142:50010 for file > > /hbase/data/default/image/4a9c830a4fe006f0a6af7418164 > > dd86d/cf/47d6f2c3f7054d40aed66cae9787c464 for block > > > > > BP-1157637685-192.168.3.192-1382642140917:blk_1293459594_253612452:java.io.IOException: > > Got error for OP_READ_BLOCK, self=/192.168.3.179:37660, remote=/ > > 192.168.3.142:5 0010, for file > > > > > /hbase/data/default/image/4a9c830a4fe006f0a6af7418164dd86d/cf/47d6f2c3f7054d40aed66cae9787c464, > > for pool BP-1157637685-192.168.3.192-1382642140917 block > > 1293459594_253612452 > > 2014-08-22 14:25:35,003 WARN > > [defaultRpcServer.handler=7,queue=1,port=60020] hdfs.BlockReaderFactory: > > I/O error constructing remote block reader. > > 2014-08-22 14:25:35,004 WARN > > [defaultRpcServer.handler=7,queue=1,port=60020] hdfs.DFSClient: > Connection > > failure: Failed to connect to /192.168.3.51:50010 for file > > /hbase/data/default/run_automaton_cache/da03f8123004be3 > > 2659e1c8a51afbbf8/cf/1daf234e740f4b00889bb60e574dc79b for block > > > > > BP-1157637685-192.168.3.192-1382642140917:blk_1294614349_254896345:java.io.IOException: > > Got error for OP_READ_BLOCK, self=/192.168.3.179:53627, remote=/192 > > .168.3.51:50010, for file > > > > > /hbase/data/default/run_automaton_cache/da03f8123004be32659e1c8a51afbbf8/cf/1daf234e740f4b00889bb60e574dc79b, > > for pool BP-1157637685-192.168.3.192-1382642140917 block > > 1294614349_254896345 > > 2014-08-22 14:25:46,831 WARN > > [defaultRpcServer.handler=19,queue=1,port=60020] > hdfs.BlockReaderFactory: > > I/O error constructing remote block reader. > > 2014-08-22 14:25:46,832 WARN > > [defaultRpcServer.handler=19,queue=1,port=60020] hdfs.DFSClient: > > Connection failure: Failed to connect to /192.168.3.30:50010 for file > > /hbase/data/default/image/1c3bab43e260ddb46a06cf04e293 > > e386/cf/740f8240ac9a4abc9e5fcf6ec7df18bc for block > > > > > BP-1157637685-192.168.3.192-1382642140917:blk_1294151564_254380545:java.io.IOException: > > Got error for OP_READ_BLOCK, self=/192.168.3.179:55660, remote=/ > > 192.168.3.30:500 10, for file > > > > > /hbase/data/default/image/1c3bab43e260ddb46a06cf04e293e386/cf/740f8240ac9a4abc9e5fcf6ec7df18bc, > > for pool BP-1157637685-192.168.3.192-1382642140917 block > > 1294151564_254380545 > > 2014-08-22 14:28:22,395 WARN > > [defaultRpcServer.handler=26,queue=2,port=60020] > hdfs.BlockReaderFactory: > > I/O error constructing remote block reader. > > 2014-08-22 14:28:22,397 WARN > > [defaultRpcServer.handler=26,queue=2,port=60020] hdfs.DFSClient: > > Connection failure: Failed to connect to /192.168.3.143:50010 for file > > /hbase/data/default/image/b4b48e20a606c431e393b674f92 > > 79daf/cf/76ff3628306a48f187a285b2a21d9ac9 for block > > > > > BP-1157637685-192.168.3.192-1382642140917:blk_1294308080_254554506:java.io.IOException: > > Got error for OP_READ_BLOCK, self=/192.168.3.179:51759, remote=/ > > 192.168.3.143:5 0010, for file > > > > > /hbase/data/default/image/b4b48e20a606c431e393b674f9279daf/cf/76ff3628306a48f187a285b2a21d9ac9, > > for pool BP-1157637685-192.168.3.192-1382642140917 block > > 1294308080_254554506 > > 2014-08-22 14:30:29,395 WARN > > [defaultRpcServer.handler=21,queue=0,port=60020] > hdfs.BlockReaderFactory: > > I/O error constructing remote block reader. > > 2014-08-22 14:30:29,396 WARN > > [defaultRpcServer.handler=21,queue=0,port=60020] hdfs.DFSClient: > > Connection failure: Failed to connect to /192.168.3.30:50010 for file > > /hbase/data/default/image/77ceb80d73c065f1a4ba2ad3b7cf > > 04a2/cf/151c244177ac42b5996af0c9052660cc for block > > > > > BP-1157637685-192.168.3.192-1382642140917:blk_1294365296_254618150:java.io.IOException: > > Got error for OP_READ_BLOCK, self=/192.168.3.179:57238, remote=/ > > 192.168.3.30:500 10, for file > > > > > /hbase/data/default/image/77ceb80d73c065f1a4ba2ad3b7cf04a2/cf/151c244177ac42b5996af0c9052660cc, > > for pool BP-1157637685-192.168.3.192-1382642140917 block > > 1294365296_254618150 > > 2014-08-22 14:31:34,016 WARN > > [defaultRpcServer.handler=6,queue=0,port=60020] hdfs.BlockReaderFactory: > > I/O error constructing remote block reader. > > 2014-08-22 14:31:34,018 WARN > > [defaultRpcServer.handler=6,queue=0,port=60020] hdfs.DFSClient: > Connection > > failure: Failed to connect to /192.168.3.212:50010 for file > > /hbase/data/default/image/45578853a5c807919578043c7715 > > 1efa/cf/40c00d49a7494929a319467d82b383da for block > > > > > BP-1157637685-192.168.3.192-1382642140917:blk_1294930505_255244337:java.io.IOException: > > Got error for OP_READ_BLOCK, self=/192.168.3.179:33973, remote=/ > > 192.168.3.212:50 010, for file > > > > > /hbase/data/default/image/45578853a5c807919578043c77151efa/cf/40c00d49a7494929a319467d82b383da, > > for pool BP-1157637685-192.168.3.192-1382642140917 block > > 1294930505_255244337 > > 2014-08-22 14:33:55,994 WARN > > [defaultRpcServer.handler=3,queue=0,port=60020] hdfs.BlockReaderFactory: > > I/O error constructing remote block reader. > > 2014-08-22 14:33:55,995 WARN > > [defaultRpcServer.handler=3,queue=0,port=60020] hdfs.DFSClient: > Connection > > failure: Failed to connect to /192.168.3.52:50010 for file > > /hbase/data/default/image/1f07e0160fbba29a2c26104dd5966 > > 39f/cf/7e36136497fc4a5a9889797a1dfb5d3b for block > > > > > BP-1157637685-192.168.3.192-1382642140917:blk_1294915607_255226575:java.io.IOException: > > Got error for OP_READ_BLOCK, self=/192.168.3.179:41017, remote=/ > > 192.168.3.52:5001 0, for file > > > > > /hbase/data/default/image/1f07e0160fbba29a2c26104dd596639f/cf/7e36136497fc4a5a9889797a1dfb5d3b, > > for pool BP-1157637685-192.168.3.192-1382642140917 block > > 1294915607_255226575 > > 2014-08-22 14:44:14,301 WARN > > [defaultRpcServer.handler=3,queue=0,port=60020] hdfs.BlockReaderFactory: > > I/O error constructing remote block reader. > > 2014-08-22 14:44:14,302 WARN > > [defaultRpcServer.handler=3,queue=0,port=60020] hdfs.DFSClient: > Connection > > failure: Failed to connect to /192.168.3.154:50010 for file > > /hbase/data/default/image/bb082a5f098d1d2a95365545349c > > 77a4/cf/1af06c6fd36045bab888b4ec18e45f0f for block > > > > > BP-1157637685-192.168.3.192-1382642140917:blk_1294911152_255221397:java.io.IOException: > > Got error for OP_READ_BLOCK, self=/192.168.3.179:44711, remote=/ > > 192.168.3.154:50 010, for file > > > > > /hbase/data/default/image/bb082a5f098d1d2a95365545349c77a4/cf/1af06c6fd36045bab888b4ec18e45f0f, > > for pool BP-1157637685-192.168.3.192-1382642140917 block > > 1294911152_255221397 > > 77a4/cf/1af06c6fd36045bab888b4ec18e45f0f for block > > > > > BP-1157637685-192.168.3.192-1382642140917:blk_1294911152_255221397:java.io.IOException: > > Got error for OP_READ_BLOCK, self=/192.168.3.179:44711, remote=/ > > 192.168.3.154:50 010, for file > > > > > /hbase/data/default/image/bb082a5f098d1d2a95365545349c77a4/cf/1af06c6fd36045bab888b4ec18e45f0f, > > for pool BP-1157637685-192.168.3.192-1382642140917 block > > 1294911152_255221397 > > 2014-08-22 14:49:14,780 WARN > > [defaultRpcServer.handler=5,queue=2,port=60020] ipc.RpcServer: > > (responseTooSlow): > > > > > {"processingtimems":15494,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiReques > > t)","client":"192.168.3.11:53884 > > > > > ","starttimems":1408711739283,"queuetimems":0,"class":"HRegionServer","responsesize":483492,"method":"Multi"} > > 2014-08-22 14:50:37,900 WARN > > [defaultRpcServer.handler=7,queue=1,port=60020] hdfs.BlockReaderFactory: > > I/O error constructing remote block reader. > > 2014-08-22 14:50:37,901 WARN > > [defaultRpcServer.handler=7,queue=1,port=60020] hdfs.DFSClient: > Connection > > failure: Failed to connect to /192.168.3.64:50010 for file > > /hbase/data/default/image/a50d67b81c44c864265f5030c7c39 > > 959/cf/840ff51e591946c487413273d5341a24 for block > > > > > BP-1157637685-192.168.3.192-1382642140917:blk_1294309355_254555905:java.io.IOException: > > Got error for OP_READ_BLOCK, self=/192.168.3.179:45043, remote=/ > > 192.168.3.64:5001 0, for file > > > > > /hbase/data/default/image/a50d67b81c44c864265f5030c7c39959/cf/840ff51e591946c487413273d5341a24, > > for pool BP-1157637685-192.168.3.192-1382642140917 block > > 1294309355_254555905 > > 2014-08-22 15:09:35,289 WARN > > [defaultRpcServer.handler=23,queue=2,port=60020] > hdfs.BlockReaderFactory: > > I/O error constructing remote block reader. > > 2014-08-22 15:09:35,289 WARN > > [defaultRpcServer.handler=23,queue=2,port=60020] hdfs.DFSClient: > > Connection failure: Failed to connect to /192.168.3.63:50010 for file > > /hbase/data/default/image/1db3051c8d41943892c0230cb75b > > c1f2/cf/75ae96ea8ca2477e8f83291d7e1fe7cb for block > > > > > BP-1157637685-192.168.3.192-1382642140917:blk_1293205473_253317737:java.io.IOException: > > Got error for OP_READ_BLOCK, self=/192.168.3.179:56326, remote=/ > > 192.168.3.63:500 10, for file > > > > > /hbase/data/default/image/1db3051c8d41943892c0230cb75bc1f2/cf/75ae96ea8ca2477e8f83291d7e1fe7cb, > > for pool BP-1157637685-192.168.3.192-1382642140917 block > > 1293205473_253317737 > > 2014-08-22 15:18:38,891 WARN > > [defaultRpcServer.handler=5,queue=2,port=60020] ipc.RpcServer: > > (responseTooSlow): > > > > > {"processingtimems":26092,"call":"Multi(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MultiReques > > t)","client":"192.168.3.91:40189 > > > > > ","starttimems":1408713492722,"queuetimems":0,"class":"HRegionServer","responsesize":483492,"method":"Multi"} > > > > What's interesting is that not all crashed handlers are later missing > after > > the freeze. > > > > Does that substantiate your conjecture? > > > > > I saw those in your log. They are WARN from DFSClient which should be ok. > Any other exceptions going on in your logs that could explain our handler > loss? Or, can you see any thing particular about say the last mention of > defaultRpcServer.handler=6 say? > > St.Ack >
