Another update: even after the flush and major_compact, the Puts still stopped. When I checked my job this morning it had progressed farther, but it was ultimately still killed by the 10-minute task timeouts.
-geoff
________________________________
From: Geoff Hendrey
Sent: Wednesday, May 19, 2010 12:26 AM
To: '[email protected]'
Subject: RE: Put slows down and eventually blocks

Following up on my last post, I ran "flush" and "major_compact" from the shell, and that seems to have jolted HBase into resuming writes. The blocked Put method returned, and writes have now resumed normally. Any ideas why? Here are a few other relevant details:

hbase(main):015:0> zk_dump
HBase tree in ZooKeeper is rooted at /hbase
  Cluster up? true
  In safe mode? false
  Master address: 10.241.6.82:60000
  Region server holding ROOT: 10.241.6.83:60020
  Region servers:
   - 10.241.6.83:60020
   - 10.241.6.81:60020
   - 10.241.6.82:60020
  Quorum Server Statistics:
   - dt5:2181
       Zookeeper version: 3.2.2-888565, built on 12/08/2009 21:51 GMT
       Clients:
        /10.241.6.81:52081[1](queued=0,recved=35496,sent=0)
        /10.241.6.82:38365[1](queued=0,recved=32798,sent=0)
        /10.241.6.82:60720[1](queued=0,recved=0,sent=0)
        /10.241.6.82:40457[1](queued=0,recved=114,sent=0)
       Latency min/avg/max: 0/15/669
       Received: 73534
       Sent: 0
       Outstanding: 0
       Zxid: 0x500033498
       Mode: leader
       Node count: 13
   - dt4:2181
       Zookeeper version: 3.2.2-888565, built on 12/08/2009 21:51 GMT
       Clients:
        /10.241.0.18:39273[1](queued=0,recved=34,sent=0)
        /10.241.6.82:43315[1](queued=0,recved=0,sent=0)
        /10.241.6.81:41762[1](queued=0,recved=169,sent=0)
        /10.241.6.83:47803[1](queued=0,recved=35438,sent=0)
       Latency min/avg/max: 0/2/2249
       Received: 1432019
       Sent: 0
       Outstanding: 0
       Zxid: 0x500033498
       Mode: follower
       Node count: 13
   - dt3:2181
       Zookeeper version: 3.2.2-888565, built on 12/08/2009 21:51 GMT
       Clients:
        /10.241.6.82:59048[1](queued=0,recved=0,sent=0)
        /10.241.6.82:50822[1](queued=0,recved=36260,sent=0)
        /10.241.6.81:45696[1](queued=0,recved=30691,sent=0)
        /10.241.6.83:50027[1](queued=0,recved=36261,sent=0)
        /10.241.6.82:50823[1](queued=0,recved=36270,sent=0)
       Latency min/avg/max: 0/3/40
       Received: 140600
       Sent: 0
       Outstanding: 0
       Zxid: 0x500033498
       Mode: follower
       Node count: 13
hbase(main):016:0> status
3 servers, 0 dead, 227.3333 average load
hbase(main):017:0> flush "SEARCH_KEYS"
0 row(s) in 0.7600 seconds
hbase(main):018:0> status
3 servers, 0 dead, 228.3333 average load

________________________________
From: Geoff Hendrey
Sent: Tuesday, May 18, 2010 11:56 PM
To: [email protected]
Subject: Put slows down and eventually blocks

I am experiencing a problem in which Put operations go from working just fine to blocking forever. I am doing Puts from a reducer. I have tried the following, but none of them prevents the Puts from eventually blocking completely in all the reducers, until the TaskTracker kills the task after the 10-minute timeout:

1) Use just one reducer (didn't help).
2) Call Put.setWriteToWAL with both true and false (didn't help).
3) Try autoflush true and false. When true, experiment with different flush buffer sizes (didn't help).

I've been watching the HDFS namenode and datanode logs, as well as the HBase master and region server logs. I am running a 3-node HDFS cluster (Hadoop 0.20.2) on the same 3 nodes as HBase 0.20.3. I see no problems in any of the logs, except that the datanode logs eventually stop showing WRITE operations (matching the point where the Puts come to a halt). The HBase shell stays responsive, and I can run list, status, and scans from it without any issue.

Has anyone seen anything like this?
-geoff
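Item 3 in the original message toggles client-side buffering. The sketch below is a small, self-contained Java model that shows only what those knobs change on the client. It is a hypothetical simplification: the class `WriteBufferModel` and its methods are made up for this illustration and are not HBase's `HTable` code. The 2 MB threshold and 10,000-byte Put size are assumptions chosen for the example. The model only counts how many round trips a stream of Puts would cause, so it cannot reproduce the server-side blocking described in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of a client-side write buffer.
 * NOT HBase's HTable implementation: each "put" is represented only
 * by its size in bytes, and a "flush" is counted as one round trip.
 */
public class WriteBufferModel {
    private final boolean autoFlush;
    private final long writeBufferSize;        // assumed flush threshold in bytes
    private final List<Long> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private int flushCount = 0;                // simulated round trips to the server

    public WriteBufferModel(boolean autoFlush, long writeBufferSize) {
        this.autoFlush = autoFlush;
        this.writeBufferSize = writeBufferSize;
    }

    public void put(long sizeBytes) {
        pending.add(sizeBytes);
        pendingBytes += sizeBytes;
        // With autoflush on, every put is sent immediately; with it off,
        // puts accumulate until the buffered size exceeds the threshold.
        if (autoFlush || pendingBytes > writeBufferSize) {
            flush();
        }
    }

    /** Sends everything buffered (one simulated round trip), if anything is pending. */
    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flushCount++;
        pending.clear();
        pendingBytes = 0;
    }

    public int getFlushCount() { return flushCount; }

    public int getPendingCount() { return pending.size(); }

    public static void main(String[] args) {
        WriteBufferModel auto = new WriteBufferModel(true, 2 * 1024 * 1024);
        WriteBufferModel buffered = new WriteBufferModel(false, 2 * 1024 * 1024);
        for (int i = 0; i < 1000; i++) {
            auto.put(10_000);
            buffered.put(10_000);
        }
        // Explicit final flush, analogous to flushing before the task ends.
        buffered.flush();
        System.out.println("autoflush round trips: " + auto.getFlushCount());
        System.out.println("buffered round trips: " + buffered.getFlushCount());
    }
}
```

Running `main` prints 1000 round trips for the autoflush client and 5 for the buffered one (four threshold-triggered flushes every 210 Puts, plus the final explicit flush of the remaining 160). In this model the buffer size only matters when autoflush is off, and neither setting changes how fast the server accepts the data it receives.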
