Answers prefixed by "Geoff:":

> What do the MR logs say? Do they point to an explicit row or region for
> the failed task?

Geoff: The MR syslog shows nothing abnormal that I can see. I am running just a single reducer, with autoflushing Puts. I have pasted the reducer logs at the end of this email. For context on what I've pasted: I instantiate the HTable in the setup method and close it in the cleanup method.

> Can you trace the life of that region by grepping for it in the master logs?

Geoff: I was writing the row key to stdout so I could collect debugging output, so I know the row key. I grepped all of the Hadoop and HBase logs on the three servers that make up the little HBase cluster. Nothing greps out for the particular row key that the Put hangs on.

> At the time that the failing task runs, grep the regionserver logs for that
> time. What is going on?

Geoff: Ahhh, glad you asked. I see the Put connection crapping out (of course, because the task is killed), but ten minutes earlier a global memstore flush started, and from then until the Put is killed it's wall-to-wall compaction and memstore flushing. It seems clear that this is what caused the server to block the Put. The compaction and flushing went on for 10 minutes, at which point the task tracker killed the job; the flushing and compactions continued for a total of about 12 minutes.

This also explains why the tasks are killed so reliably at the same point each time I run this job: each time the reducer has written a certain amount of data, this flushing kicks in.

So, should I try decreasing the size of the global memstore? It seems like I have to trade performance against these long pauses; the bigger the memstore, the longer the pauses. Does that make sense? Any recommendations for a global memstore size on a machine with 32 GB of RAM, a 64-bit JVM, and 8 GB of RAM per HBase process?

Thanks -- we're getting somewhere.
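[Editor's sketch, not from the thread: the arithmetic behind that last question, assuming the commonly documented 0.20-era defaults of 0.40 for the global memstore upper limit and 0.35 for the lower limit (verify against the hbase-default.xml that ships with your build). With an 8 GB region-server heap, updates block once memstores reach ~3.2 GB in aggregate and stay blocked until flushing drains them below ~2.8 GB, which is consistent with the long pause described above.]

```java
// Back-of-envelope global memstore sizing for one region server.
// ASSUMPTIONS: upperLimit=0.40 and lowerLimit=0.35 are the 0.20-era
// defaults as commonly documented; check your hbase-default.xml.
public class MemstoreSizing {
    static long limitBytes(long heapBytes, double fractionOfHeap) {
        return (long) (heapBytes * fractionOfHeap);
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * 1024 * 1024;            // 8 GB heap per HBase process
        long blockAt = limitBytes(heap, 0.40);           // puts block above this
        long resumeAt = limitBytes(heap, 0.35);          // flushes drain down to this
        System.out.println("puts block at   ~" + blockAt / (1 << 20) + " MB");
        System.out.println("puts resume at  ~" + resumeAt / (1 << 20) + " MB");
    }
}
```

The intuition in the question holds: a smaller global memstore means less data to drain when the limit is hit (shorter pauses), at the cost of more frequent flushing and therefore more compaction work.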
(MR syslog follows)

-geoff

2010-05-19 01:00:52,720 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=SHUFFLE, sessionId=
2010-05-19 01:00:52,894 INFO org.apache.hadoop.mapred.ReduceTask: ShuffleRamManager: MemoryLimit=361909440, MaxSingleShuffleLimit=90477360
2010-05-19 01:00:52,901 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3 Thread started: Thread for merging on-disk files
2010-05-19 01:00:52,902 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3 Thread waiting: Thread for merging on-disk files
2010-05-19 01:00:52,903 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3 Thread started: Thread for merging in memory files
2010-05-19 01:00:52,904 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3 Need another 4 map output(s) where 0 is already in progress
2010-05-19 01:00:52,904 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2010-05-19 01:00:52,904 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3 Thread started: Thread for polling Map Completion Events
2010-05-19 01:00:52,911 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output of KILLED map-task: 'attempt_201005141621_0203_m_000001_1'
2010-05-19 01:00:52,911 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output of KILLED map-task: 'attempt_201005141621_0203_m_000002_1'
2010-05-19 01:00:52,911 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output of KILLED map-task: 'attempt_201005141621_0203_m_000003_1'
2010-05-19 01:00:52,911 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3: Got 4 new map-outputs
2010-05-19 01:00:57,906 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3 Scheduled 4 outputs (0 slow hosts and0 dup hosts)
2010-05-19 01:00:57,960 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201005141621_0203_m_000000_0, compressed len: 216313347, decompressed len: 216313343
2010-05-19 01:00:57,960 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201005141621_0203_m_000001_0, compressed len: 207291891, decompressed len: 207291887
2010-05-19 01:00:57,960 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 216313343 bytes (216313347 raw bytes) into Local-FS from attempt_201005141621_0203_m_000000_0
2010-05-19 01:00:57,960 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 207291887 bytes (207291891 raw bytes) into Local-FS from attempt_201005141621_0203_m_000001_0
2010-05-19 01:00:57,961 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201005141621_0203_m_000003_0, compressed len: 222666366, decompressed len: 222666362
2010-05-19 01:00:57,961 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 222666362 bytes (222666366 raw bytes) into Local-FS from attempt_201005141621_0203_m_000003_0
2010-05-19 01:00:57,996 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201005141621_0203_m_000002_0, compressed len: 196007332, decompressed len: 196007328
2010-05-19 01:00:57,997 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 196007328 bytes (196007332 raw bytes) into Local-FS from attempt_201005141621_0203_m_000002_0
2010-05-19 01:01:05,103 INFO org.apache.hadoop.mapred.ReduceTask: Read 207291891 bytes from map-output for attempt_201005141621_0203_m_000001_0
2010-05-19 01:01:05,106 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3 Thread waiting: Thread for merging on-disk files
2010-05-19 01:01:05,155 INFO org.apache.hadoop.mapred.ReduceTask: Read 222666366 bytes from map-output for attempt_201005141621_0203_m_000003_0
2010-05-19 01:01:05,161 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3 Thread waiting: Thread for merging on-disk files
2010-05-19 01:01:05,307 INFO org.apache.hadoop.mapred.ReduceTask: Read 216313347 bytes from map-output for attempt_201005141621_0203_m_000000_0
2010-05-19 01:01:05,308 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3 Thread waiting: Thread for merging on-disk files
2010-05-19 01:01:05,379 INFO org.apache.hadoop.mapred.ReduceTask: Read 196007332 bytes from map-output for attempt_201005141621_0203_m_000002_0
2010-05-19 01:01:05,380 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201005141621_0203_r_000000_3 Thread waiting: Thread for merging on-disk files
2010-05-19 01:01:05,949 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
2010-05-19 01:01:05,950 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
2010-05-19 01:01:05,951 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
2010-05-19 01:01:05,951 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 4 files left.
2010-05-19 01:01:05,951 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
2010-05-19 01:01:05,956 INFO org.apache.hadoop.mapred.ReduceTask: Merging 4 files, 842278936 bytes from disk
2010-05-19 01:01:05,957 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
2010-05-19 01:01:05,960 INFO org.apache.hadoop.mapred.Merger: Merging 4 sorted segments
2010-05-19 01:01:05,972 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 842278920 bytes
2010-05-19 01:01:06,116 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.2.2-888565, built on 12/08/2009 21:51 GMT
2010-05-19 01:01:06,116 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=ddbs-host7-dm0
2010-05-19 01:01:06,116 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.6.0_17
2010-05-19 01:01:06,116 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
2010-05-19 01:01:06,116 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_17/jre 2010-05-19 01:01:06,116 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.class.path=/home/dmc2/hadoop/bin/../conf:/usr/java/default/lib/tools.jar:/home/dmc2/hadoop/bin/..:/home/dmc2/hadoop/bin/../hadoop-0.20.2-core.jar:/home/dmc2/hadoop/bin/../lib/commons-cli-1.2.jar:/home/dmc2/hadoop/bin/../lib/commons-codec-1.3.jar:/home/dmc2/hadoop/bin/../lib/commons-el-1.0.jar:/home/dmc2/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/home/dmc2/hadoop/bin/../lib/commons-logging-1.0.4.jar:/home/dmc2/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/home/dmc2/hadoop/bin/../lib/commons-net-1.4.1.jar:/home/dmc2/hadoop/bin/../lib/core-3.1.1.jar:/home/dmc2/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/home/dmc2/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/home/dmc2/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/home/dmc2/hadoop/bin/../lib/jets3t-0.6.1.jar:/home/dmc2/hadoop/bin/../lib/jetty-6.1.14.jar:/home/dmc2/hadoop/bin/../lib/jetty-util-6.1.14.jar:/home/dmc2/hadoop/bin/../lib/junit-3.8.1.jar:/home/dmc2/hadoop/bin/../lib/kfs-0.2.2.jar:/home/dmc2/hadoop/bin/../lib/log4j-1.2.15.jar:/home/dmc2/hadoop/bin/../lib/mockito-all-1.8.0.jar:/home/dmc2/hadoop/bin/../lib/oro-2.0.8.jar:/home/dmc2/hadoop/bin/../lib/servlet-api-2.5-6.1.14.jar:/home/dmc2/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/home/dmc2/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/dmc2/hadoop/bin/../lib/xmlenc-0.52.jar:/home/dmc2/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/dmc2/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar::/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/xsltc.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jh.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/opencsv-1.8.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/xalan.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/li
b/jsr311-api-1.1.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/hadoop-0.20.1-core.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/Echo.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/xsdlib.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/saaj-api.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/ant-launcher.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/junit.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/Echo_FileTransfer.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jta-spec1_0_1.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/hadoop-0.20.1-tools.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/commons-io-1.4.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/servletapi-2.3.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jaxb-impl.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/lucene-queries-2.4.0.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/json.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/lucene-spellchecker-2.4.0.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/relaxngDatatype.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/xmlsec.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/saaj-impl.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/commons-logging-1.1.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/xws-security.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/commons-codec-1.3.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/commons-cli-1.2.jar:/hiv
e3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/workflow.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/lucene-core-2.4.0.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/hadoop-0.20.1-streaming.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jaxb-libs.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/hadoop-0.20.1-index.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/mail.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jaxb-xjc.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jcommon-0.9.1.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/log4j-1.2.16.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/ant.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/xmldsig.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/decarta-util.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/marsh.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/hadoopxmlstreamer.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/xercesImpl.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jax-qname.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/sax.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jaxp-api.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/hadoop-0.20.1-ant.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/commons-collections-3.2.1.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jxl.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/lucene-snowball-2.4.0.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621
_0203/jars/lib/commons-dbcp-1.1.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/dom4j-full.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/lucene-analyzers-2.4.0.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/commons-pool-1.1.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/EchoServer.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/commons-httpclient-3.1.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jaxb-api.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/gnujaxp.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/echopoint-0.9.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/zookeeper-3.2.2.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/signpost-core-1.1-SNAPSHOT.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/mysql-connector-java-5.0.8-bin.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/shades-0.0.5.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/commons-fileupload-1.2.1.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/dds-head.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/namespace.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jersey-bundle-1.1.5.1.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/xml4j.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/cobertura.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/hbase-0.20.3.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/jfreechart-0.9.16.jar:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars/lib/hsqldb-2.0.0-rc8.jar:/hive3/mapred/local/taskTracker/job
cache/job_201005141621_0203/jars/classes:/hive3/mapred/local/taskTracker/jobcache/job_201005141621_0203/jars:/hive4/mapred/local/taskTracker/jobcache/job_201005141621_0203/attempt_201005141621_0203_r_000000_3/work
2010-05-19 01:01:06,116 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/home/dmc2/hadoop/bin/../lib/native/Linux-i386-32:/hive4/mapred/local/taskTracker/jobcache/job_201005141621_0203/attempt_201005141621_0203_r_000000_3/work
2010-05-19 01:01:06,117 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/hive4/mapred/local/taskTracker/jobcache/job_201005141621_0203/attempt_201005141621_0203_r_000000_3/work/tmp
2010-05-19 01:01:06,117 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2010-05-19 01:01:06,117 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
2010-05-19 01:01:06,117 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=i386
2010-05-19 01:01:06,117 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=2.6.18-128.el5PAE
2010-05-19 01:01:06,117 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.name=dmc2
2010-05-19 01:01:06,117 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/dmc2
2010-05-19 01:01:06,117 INFO org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/disk4/hive4/mapred/local/taskTracker/jobcache/job_201005141621_0203/attempt_201005141621_0203_r_000000_3/work
2010-05-19 01:01:06,118 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=dt4.dt.sv4.decarta.com:2181 sessionTimeout=60000 watcher=org.apache.hadoop.hbase.client.hconnectionmanager$clientzkwatc...@11f23e5
2010-05-19 01:01:06,120 INFO org.apache.zookeeper.ClientCnxn: zookeeper.disableAutoWatchReset is false
2010-05-19 01:01:06,136 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server dt4.dt.sv4.decarta.com/10.241.6.82:2181
2010-05-19 01:01:06,141 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.241.0.69:34497 remote=dt4.dt.sv4.decarta.com/10.241.6.82:2181]
2010-05-19 01:01:06,155 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2010-05-19 01:11:24,084 INFO org.apache.zookeeper.ZooKeeper: Closing session: 0x1288430447a0449
2010-05-19 01:11:24,084 INFO org.apache.zookeeper.ClientCnxn: Closing ClientCnxn for session: 0x1288430447a0449

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stack
Sent: Wednesday, May 19, 2010 3:29 PM
To: [email protected]
Subject: Re: Put slows down and eventually blocks

What do the MR logs say? Do they point to an explicit row or region for the failed task? Can you trace the life of that region by grepping for it in the master logs?

At the time that the failing task runs, grep the regionserver logs for that time. What is going on?

Can you get from the bad region now, after MR has gone away?

Are your tasktrackers running beside your datanodes and regionservers? Swapping? I/O stress? If so, try running with fewer concurrent tasks at a time. See if that helps?

St.Ack

On Wed, May 19, 2010 at 9:23 AM, Geoff Hendrey <[email protected]> wrote:
> Another update... actually, even after the flush and major_compact, the
> Puts still stopped. I checked my job this morning and it had
> progressed farther, but ultimately it was still killed on the 10-minute
> timeouts.
>
> -geoff
>
> ________________________________
>
> From: Geoff Hendrey
> Sent: Wednesday, May 19, 2010 12:26 AM
> To: '[email protected]'
> Subject: RE: Put slows down and eventually blocks
>
> Following up on my last post, I ran "flush" and "major_compact" from
> the shell, and it seems to have jolted HBase into resuming writes. The
> blocked Put method returned, and writes have now resumed normally. Any
> ideas why? Here are a few other relevant details:
>
> hbase(main):015:0> zk_dump
>
> HBase tree in ZooKeeper is rooted at /hbase
> Cluster up? true
> In safe mode? false
> Master address: 10.241.6.82:60000
> Region server holding ROOT: 10.241.6.83:60020
> Region servers:
>  - 10.241.6.83:60020
>  - 10.241.6.81:60020
>  - 10.241.6.82:60020
> Quorum Server Statistics:
>  - dt5:2181
>    Zookeeper version: 3.2.2-888565, built on 12/08/2009 21:51 GMT
>    Clients:
>     /10.241.6.81:52081[1](queued=0,recved=35496,sent=0)
>     /10.241.6.82:38365[1](queued=0,recved=32798,sent=0)
>     /10.241.6.82:60720[1](queued=0,recved=0,sent=0)
>     /10.241.6.82:40457[1](queued=0,recved=114,sent=0)
>    Latency min/avg/max: 0/15/669
>    Received: 73534
>    Sent: 0
>    Outstanding: 0
>    Zxid: 0x500033498
>    Mode: leader
>    Node count: 13
>  - dt4:2181
>    Zookeeper version: 3.2.2-888565, built on 12/08/2009 21:51 GMT
>    Clients:
>     /10.241.0.18:39273[1](queued=0,recved=34,sent=0)
>     /10.241.6.82:43315[1](queued=0,recved=0,sent=0)
>     /10.241.6.81:41762[1](queued=0,recved=169,sent=0)
>     /10.241.6.83:47803[1](queued=0,recved=35438,sent=0)
>    Latency min/avg/max: 0/2/2249
>    Received: 1432019
>    Sent: 0
>    Outstanding: 0
>    Zxid: 0x500033498
>    Mode: follower
>    Node count: 13
>  - dt3:2181
>    Zookeeper version: 3.2.2-888565, built on 12/08/2009 21:51 GMT
>    Clients:
>     /10.241.6.82:59048[1](queued=0,recved=0,sent=0)
>     /10.241.6.82:50822[1](queued=0,recved=36260,sent=0)
>     /10.241.6.81:45696[1](queued=0,recved=30691,sent=0)
>     /10.241.6.83:50027[1](queued=0,recved=36261,sent=0)
>     /10.241.6.82:50823[1](queued=0,recved=36270,sent=0)
>    Latency min/avg/max: 0/3/40
>    Received: 140600
>    Sent: 0
>    Outstanding: 0
>    Zxid: 0x500033498
>    Mode: follower
>    Node count: 13
>
> hbase(main):016:0> status
> 3 servers, 0 dead, 227.3333 average load
> hbase(main):017:0> flush "SEARCH_KEYS"
> 0 row(s) in 0.7600 seconds
> hbase(main):018:0> status
> 3 servers, 0 dead, 228.3333 average load
>
> ________________________________
>
> From: Geoff Hendrey
> Sent: Tuesday, May 18, 2010 11:56 PM
> To: [email protected]
> Subject: Put slows down and eventually blocks
>
> I am experiencing a problem in which Put operations transition from
> working just fine to blocking forever. I am doing Puts from a reducer.
> I have tried the following, but none of them prevents the Puts from
> eventually blocking entirely in all the reducers, until the task
> tracker kills the task due to the 10-minute timeout:
>
> 1) using just one reducer (didn't help)
> 2) Put.setWriteToWAL, both true and false (didn't help)
> 3) autoflush true and false; when true, experimenting with different
>    flush buffer sizes (didn't help)
>
> I've been watching the HDFS namenode and datanode logs, and also the
> HBase master and region servers. I am running a 3-node HDFS cluster
> (0.20.2) sharing the same 3 nodes with HBase 0.20.3. I see no problems
> in any logs, except that the datanode logs eventually stop showing
> WRITE operations (corresponding to the Put operations coming to a
> halt). The HBase shell remains snappy and I can do list and status
> operations and scans without any issue from the shell.
>
> Anyone ever seen anything like this?
>
> -geoff
>
> <blocked::http://www.decarta.com>
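[Editor's note: for anyone tuning around this, the knobs discussed in the thread live in hbase-site.xml. The sketch below is illustrative only: the property names follow the hbase-default.xml documentation of the 0.20/0.90 era, and the values shown are the commonly documented defaults, not recommendations. Verify both names and defaults against the hbase-default.xml that ships with your build.]

```xml
<!-- Illustrative sketch; verify names/defaults against your hbase-default.xml. -->
<property>
  <!-- Fraction of heap at which all updates block until flushes catch up. -->
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.4</value>
</property>
<property>
  <!-- Flushing drains aggregate memstore usage to this fraction before unblocking. -->
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.35</value>
</property>
<property>
  <!-- Per-region memstore size (bytes) that triggers a flush; 64 MB here. -->
  <name>hbase.hregion.memstore.flush.size</name>
  <value>67108864</value>
</property>
<property>
  <!-- Updates to a single region block at multiplier * flush.size. -->
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>2</value>
</property>
```

Lowering the global limits or the flush size trades throughput for shorter blocking pauses, which is exactly the trade-off raised earlier in the thread.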
