I was about to migrate to CDH3b2 but thought I would wait for a few replies before doing so. I'll likely have the migration done over the weekend.

Java:
java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)

OS:
CentOS release 5.5 (Final)
2.6.18-194.3.1.el5 #1 SMP Thu May 13 13:08:30 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

Thanks
Luke

On 7/16/10 12:45 PM, "Stack" <[email protected]> wrote:

Yeah, go to cdh3b2 if you can. If you can repro there, there's a few
fellas (other than us hbasers) who'd be real interested in your
problem.
St.Ack

On Fri, Jul 16, 2010 at 10:40 AM, Stack <[email protected]> wrote:
> Each time you threaddump, it's stuck in same way?
>
> I've not seen this dfsclient hangup before, not that I remember. Let
> me ask some hdfs-heads. Will be back to you.
>
> Any chance of your upping to CDH3b2, for your hadoop at least? HDFS
> has a few dfsclient/ipc fixes -- though looking at them none seem to
> explicitly address your issue.
>
> What's that jvm that you are running? Can you do a java -version?
> What's your OS?
>
> Thanks,
> St.Ack
>
> On Fri, Jul 16, 2010 at 10:10 AM, Luke Forehand
> <[email protected]> wrote:
>> Line 58 and line 79 are the threads that I found suspicious.
>>
>> http://pastebin.com/W1E2nCZq
>>
>> The other stack traces from the other two region servers look identical to
>> this one. BTW - I have made the config changes per Ryan Rawson's suggestion
>> (thanks!) and I've processed ~7 GB of the 15 GB without hangup thus far so
>> I'm crossing my fingers.
>>
>> -Luke
>>
>> On 7/16/10 11:48 AM, "Stack" <[email protected]> wrote:
>>
>> Would you mind pastebinning the stacktrace? It doesn't look like
>> https://issues.apache.org/jira/browse/HDFS-88 (HBASE-667) going by the
>> below, an issue that HADOOP-5859 purportedly fixes -- I see you
>> commented on it -- but our Todd thinks otherwise (He has a 'real' fix
>> up in another issue that I currently can't put my finger on).
>> St.Ack
>>
>> On Fri, Jul 16, 2010 at 7:19 AM, Luke Forehand
>> <[email protected]> wrote:
>>>
>>> I grepped yesterday's logs on all servers for "Blocking updates" and there
>>> was no trace. I believe I had encountered the blocking updates problem
>>> earlier in the project but throttled down the import speed which seemed to
>>> fix that.
>>>
>>> I just double checked and all three region servers were idle. Something
>>> interesting that I noticed however, was that each regionserver had a
>>> particular ResponseProcessor thread running for a specific block, and that
>>> thread was stuck in a running state during the entirety of the hang. Also
>>> a DataStreamer thread for the block associated with the ResponseProcessor
>>> was in a wait state. This makes me think that each server was stuck
>>> operating on a specific block.
>>>
>>> "ResponseProcessor for block blk_1926230463847049982_2694658" - Thread t...@61160
>>>    java.lang.Thread.State: RUNNABLE
>>>         at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>>         at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
>>>         at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>>>         at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>>         - locked sun.nio.ch.uti...@196fbfd0
>>>         - locked java.util.collections$unmodifiable...@7799fdbb
>>>         - locked sun.nio.ch.epollselectori...@1ee13d55
>>>         at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
>>>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
>>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>         at java.io.DataInputStream.readFully(DataInputStream.java:178)
>>>         at java.io.DataInputStream.readLong(DataInputStream.java:399)
>>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2399)
>>>
>>>    Locked ownable synchronizers:
>>>         - None
>>>
>>> "DataStreamer for file /hbase/.logs/dn01.colo.networkedinsights.com,60020,1279222293084/hlog.dat.1279228611023 block blk_1926230463847049982_2694658" - Thread t...@61158
>>>    java.lang.Thread.State: TIMED_WAITING on java.util.linkedl...@475b455c
>>>         at java.lang.Object.wait(Native Method)
>>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2247)
>>>
>>>    Locked ownable synchronizers:
>>>         - None
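
On the question of whether every thread dump shows the hang in the same place: one rough way to check is to take two dumps a little apart and list the threads whose state and top stack frame did not change. The sketch below is only an illustration, not anything from HBase or Hadoop. The class name StuckThreads and its methods are made up for this example, and it assumes each dump uses the header style quoted in this thread ("name" - Thread t@id, then a java.lang.Thread.State line, then "at ..." frames), with each thread name on a single line.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of HBase/Hadoop): compares two thread dumps.
final class StuckThreads {
    // Assumed header format, as quoted in this thread: "name" - Thread t@123
    private static final Pattern HEADER = Pattern.compile("^\"(.*)\" - Thread .*$");
    private static final Pattern STATE =
            Pattern.compile("^\\s*java\\.lang\\.Thread\\.State: (\\S+).*$");
    private static final Pattern FRAME = Pattern.compile("^\\s*at (\\S.*)$");

    // Maps each thread name to "STATE @ top frame" for one dump.
    static Map<String, String> summarize(String dump) {
        Map<String, String> out = new LinkedHashMap<String, String>();
        String name = null;
        String state = null;
        for (String line : dump.split("\\r?\\n")) {
            Matcher h = HEADER.matcher(line);
            if (h.matches()) {
                name = h.group(1);
                state = null;
                continue;
            }
            if (name == null) {
                continue; // not inside a thread entry yet
            }
            Matcher s = STATE.matcher(line);
            if (s.matches()) {
                state = s.group(1);
                continue;
            }
            Matcher f = FRAME.matcher(line);
            if (f.matches() && state != null) {
                out.put(name, state + " @ " + f.group(1));
                name = null; // only the top frame is recorded
            }
        }
        return out;
    }

    // Names of threads whose state and top frame are identical in both dumps.
    static List<String> unchanged(String firstDump, String secondDump) {
        Map<String, String> a = summarize(firstDump);
        Map<String, String> b = summarize(secondDump);
        List<String> same = new ArrayList<String>();
        for (Map.Entry<String, String> e : a.entrySet()) {
            if (e.getValue().equals(b.get(e.getKey()))) {
                same.add(e.getKey());
            }
        }
        return same;
    }
}
```

A thread showing up in the result is only a hint: idle pool threads also sit in the same frame between dumps, so the list still needs to be read by hand for names such as the ResponseProcessor and DataStreamer threads mentioned above.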
