Following up on my last post, I ran "flush" and "major_compact" from the
shell, and it seems to have jolted HBase into resuming writes. The
blocked Put method returned, and writes have now resumed normally. Any
ideas why? Here are a few other relevant details:
hbase(main):015:0> zk_dump
HBase tree in ZooKeeper is rooted at /hbase
Cluster up? true
In safe mode? false
Master address: 10.241.6.82:60000
Region server holding ROOT: 10.241.6.83:60020
Region servers:
- 10.241.6.83:60020
- 10.241.6.81:60020
- 10.241.6.82:60020
Quorum Server Statistics:
- dt5:2181
Zookeeper version: 3.2.2-888565, built on 12/08/2009 21:51 GMT
Clients:
/10.241.6.81:52081[1](queued=0,recved=35496,sent=0)
/10.241.6.82:38365[1](queued=0,recved=32798,sent=0)
/10.241.6.82:60720[1](queued=0,recved=0,sent=0)
/10.241.6.82:40457[1](queued=0,recved=114,sent=0)
Latency min/avg/max: 0/15/669
Received: 73534
Sent: 0
Outstanding: 0
Zxid: 0x500033498
Mode: leader
Node count: 13
- dt4:2181
Zookeeper version: 3.2.2-888565, built on 12/08/2009 21:51 GMT
Clients:
/10.241.0.18:39273[1](queued=0,recved=34,sent=0)
/10.241.6.82:43315[1](queued=0,recved=0,sent=0)
/10.241.6.81:41762[1](queued=0,recved=169,sent=0)
/10.241.6.83:47803[1](queued=0,recved=35438,sent=0)
Latency min/avg/max: 0/2/2249
Received: 1432019
Sent: 0
Outstanding: 0
Zxid: 0x500033498
Mode: follower
Node count: 13
- dt3:2181
Zookeeper version: 3.2.2-888565, built on 12/08/2009 21:51 GMT
Clients:
/10.241.6.82:59048[1](queued=0,recved=0,sent=0)
/10.241.6.82:50822[1](queued=0,recved=36260,sent=0)
/10.241.6.81:45696[1](queued=0,recved=30691,sent=0)
/10.241.6.83:50027[1](queued=0,recved=36261,sent=0)
/10.241.6.82:50823[1](queued=0,recved=36270,sent=0)
Latency min/avg/max: 0/3/40
Received: 140600
Sent: 0
Outstanding: 0
Zxid: 0x500033498
Mode: follower
Node count: 13
hbase(main):016:0> status
3 servers, 0 dead, 227.3333 average load
hbase(main):017:0> flush "SEARCH_KEYS"
0 row(s) in 0.7600 seconds
hbase(main):018:0> status
3 servers, 0 dead, 228.3333 average load
________________________________
From: Geoff Hendrey
Sent: Tuesday, May 18, 2010 11:56 PM
To: [email protected]
Subject: Put slows down and eventually blocks
I am experiencing a problem in which Put operations go from working
just fine to blocking forever. I am doing the Puts from a reducer. I
have tried the following, but nothing prevents the Puts from
eventually blocking completely in all the reducers, until the task
tracker kills the task after the 10 minute timeout.
1) try using just one reducer (didn't help)
2) try Put.setWriteToWAL both true and false (didn't help)
3) try autoflush true and false. When true, experiment with different
flush buffer sizes (didn't help)
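To make item 3 concrete, here is a toy model of how I understand the client-side write buffer to behave. It is plain Java, not HBase code: the class WriteBufferSketch and its rpcCount() counter are made up for illustration, and only the names autoFlush, writeBufferSize and flushCommits are borrowed from the HTable client. The assumption is that with autoflush on every Put goes out immediately, while with autoflush off Puts accumulate until the buffered bytes exceed the write buffer size.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a client-side write buffer. NOT HBase code; all names are
// hypothetical and only mirror the HTable autoflush / write buffer settings.
class WriteBufferSketch {
    private final boolean autoFlush;     // true: send every put immediately
    private final long writeBufferSize;  // byte threshold, used when autoFlush is false
    private final List<byte[]> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private int rpcCount = 0;            // simulated round trips to the server

    WriteBufferSketch(boolean autoFlush, long writeBufferSize) {
        this.autoFlush = autoFlush;
        this.writeBufferSize = writeBufferSize;
    }

    // Buffer the value, then flush if autoflush is on or the buffer is over its limit.
    void put(byte[] value) {
        buffer.add(value);
        bufferedBytes += value.length;
        if (autoFlush || bufferedBytes > writeBufferSize) {
            flushCommits();
        }
    }

    // Send whatever is buffered as one simulated round trip.
    void flushCommits() {
        if (buffer.isEmpty()) {
            return;
        }
        rpcCount++;
        buffer.clear();
        bufferedBytes = 0;
    }

    int rpcCount() {
        return rpcCount;
    }

    public static void main(String[] args) {
        WriteBufferSketch auto = new WriteBufferSketch(true, 1024);
        WriteBufferSketch buffered = new WriteBufferSketch(false, 1024);
        for (int i = 0; i < 100; i++) {
            auto.put(new byte[100]);
            buffered.put(new byte[100]);
        }
        buffered.flushCommits(); // push out the tail, as a reducer would on close
        System.out.println("autoFlush=true  round trips: " + auto.rpcCount());
        System.out.println("autoFlush=false round trips: " + buffered.rpcCount());
    }
}
```

If this model is right, the buffer size should only change behavior when autoflush is off, since with autoflush on the buffer never holds more than one Put.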
I've been watching the HDFS namenode and datanode logs, and also the
HBase master and region server logs. I am running a 3-node HDFS cluster
(0.20.2) sharing the same 3 nodes with HBase 0.20.3. I see no problems in any
logs, except that the datanode logs eventually stop showing WRITE
operations (corresponding to the Put operations eventually coming to a
halt). The HBase shell remains snappy and I can do list and status
operations and scans without any issue from the shell.
Anyone ever seen anything like this?
-geoff
http://www.decarta.com