RE: Put slows down and eventually blocks

Geoff Hendrey Wed, 19 May 2010 00:26:36 -0700

Following up on my last post, I ran "flush" and "major_compact" from the
shell, and it seems to have jolted HBase into resuming writes. The
blocked Put method returned, and writes have now resumed normally. Any
ideas why? Here are a few other relevant details:


hbase(main):015:0> zk_dump
 
HBase tree in ZooKeeper is rooted at /hbase
  Cluster up? true
  In safe mode? false
  Master address: 10.241.6.82:60000
  Region server holding ROOT: 10.241.6.83:60020
  Region servers:
    - 10.241.6.83:60020
    - 10.241.6.81:60020
    - 10.241.6.82:60020
  Quorum Server Statistics:
    - dt5:2181
        Zookeeper version: 3.2.2-888565, built on 12/08/2009 21:51 GMT
        Clients:
         /10.241.6.81:52081[1](queued=0,recved=35496,sent=0)
         /10.241.6.82:38365[1](queued=0,recved=32798,sent=0)
         /10.241.6.82:60720[1](queued=0,recved=0,sent=0)
         /10.241.6.82:40457[1](queued=0,recved=114,sent=0)
 
        Latency min/avg/max: 0/15/669
        Received: 73534
        Sent: 0
        Outstanding: 0
        Zxid: 0x500033498
        Mode: leader
        Node count: 13
    - dt4:2181
        Zookeeper version: 3.2.2-888565, built on 12/08/2009 21:51 GMT
        Clients:
         /10.241.0.18:39273[1](queued=0,recved=34,sent=0)
         /10.241.6.82:43315[1](queued=0,recved=0,sent=0)
         /10.241.6.81:41762[1](queued=0,recved=169,sent=0)
         /10.241.6.83:47803[1](queued=0,recved=35438,sent=0)
 
        Latency min/avg/max: 0/2/2249
        Received: 1432019
        Sent: 0
        Outstanding: 0
        Zxid: 0x500033498
        Mode: follower
        Node count: 13
    - dt3:2181
        Zookeeper version: 3.2.2-888565, built on 12/08/2009 21:51 GMT
        Clients:
         /10.241.6.82:59048[1](queued=0,recved=0,sent=0)
         /10.241.6.82:50822[1](queued=0,recved=36260,sent=0)
         /10.241.6.81:45696[1](queued=0,recved=30691,sent=0)
         /10.241.6.83:50027[1](queued=0,recved=36261,sent=0)
         /10.241.6.82:50823[1](queued=0,recved=36270,sent=0)
 
        Latency min/avg/max: 0/3/40
        Received: 140600
        Sent: 0
        Outstanding: 0
        Zxid: 0x500033498
        Mode: follower
        Node count: 13
hbase(main):016:0> status
3 servers, 0 dead, 227.3333 average load
hbase(main):017:0> flush "SEARCH_KEYS"
0 row(s) in 0.7600 seconds
hbase(main):018:0> status
3 servers, 0 dead, 228.3333 average load


________________________________

From: Geoff Hendrey 
Sent: Tuesday, May 18, 2010 11:56 PM
To: [email protected]
Subject: Put slows down and eventually blocks


I am experiencing a problem in which Put operations transition from
working just fine, to blocking forever. I am doing Put from a reducer. I
have tried the following, but none of them prevents the Puts from
eventually blocking completely in all the reducers, until the task tracker
kills the task due to the 10-minute timeout.
 
1) try using just one reducer (didn't help)
2) try Put.setWriteToWAL both true and false (didn't help)
3) try autoflush true and false. When true, experiment with different
flush buffer sizes (didn't help)
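
For readers following along, here is a toy model of how autoflush and
the write buffer size interact on the client side. It is only a sketch
and not HBase's HTable: the class WriteBufferSketch and its methods are
hypothetical names, and it assumes the client sends its buffer whenever
autoflush is on or the buffered bytes exceed the configured size.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of client-side write buffering. NOT the HBase HTable class. */
class WriteBufferSketch {
    private final boolean autoFlush;       // true: every put is sent immediately
    private final long writeBufferSize;    // byte threshold used when autoFlush is false
    private final List<byte[]> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    private int rpcCount = 0;              // simulated round trips to the region server

    WriteBufferSketch(boolean autoFlush, long writeBufferSize) {
        this.autoFlush = autoFlush;
        this.writeBufferSize = writeBufferSize;
    }

    /** Buffer one put; flush if autoflush is on or the buffer has grown past the threshold. */
    void put(byte[] payload) {
        buffer.add(payload);
        bufferedBytes += payload.length;
        if (autoFlush || bufferedBytes > writeBufferSize) {
            flushCommits();
        }
    }

    /** Send everything buffered so far as one simulated round trip. */
    void flushCommits() {
        if (buffer.isEmpty()) {
            return;
        }
        rpcCount++;
        buffer.clear();
        bufferedBytes = 0;
    }

    int getRpcCount() { return rpcCount; }

    int pending() { return buffer.size(); }

    public static void main(String[] args) {
        // Assumed sizes for illustration only: 100-byte puts, 250-byte buffer.
        WriteBufferSketch buffered = new WriteBufferSketch(false, 250);
        for (int i = 0; i < 10; i++) {
            buffered.put(new byte[100]);
        }
        System.out.println("round trips: " + buffered.getRpcCount()
                + ", still pending: " + buffered.pending());
        buffered.flushCommits();  // an explicit flush sends the remainder
        System.out.println("round trips after final flush: " + buffered.getRpcCount());
    }
}
```

In this model, autoflush on costs one round trip per put, while
autoflush off leaves the last partial buffer unsent until flushCommits()
is called explicitly.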
 
I've been watching the HDFS namenode and datanode logs, and also the
HBase master and region servers. I am running a 3-node HDFS cluster
(Hadoop 0.20.2) sharing the same 3 nodes with HBase 0.20.3. I see no problems in any
logs, except that the datanode logs eventually stop showing WRITE
operations (corresponding to the Put operations eventually coming to a
halt). The HBase shell remains snappy and I can do list and status
operations and scans without any issue from the shell.
 
Anyone ever seen anything like this?
 
-geoff
 

