Each time you thread dump, is it stuck the same way? I've not seen this DFSClient hang-up before, not that I remember. Let me ask some hdfs-heads. Will be back to you.
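(One quick way to answer that first question: take two jstack dumps of the regionserver a minute or so apart and diff where each thread is parked. Below is a rough standalone sketch of such a diff -- not anything from the Hadoop or HBase tree, and the parsing heuristics just assume ordinary jstack-style output: a quoted thread-name line, a java.lang.Thread.State line, then "at" frames.)

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class ThreadDumpDiff {
    public static void main(String[] args) throws IOException {
        Map<String, String> first = parse(args[0]);
        Map<String, String> second = parse(args[1]);
        for (Map.Entry<String, String> e : first.entrySet()) {
            // A thread in the same state on the same top frame in both
            // dumps is a candidate for "stuck the same way".
            if (e.getValue().equals(second.get(e.getKey()))) {
                System.out.println("same place in both dumps: " + e.getKey()
                        + " -> " + e.getValue());
            }
        }
    }

    // Maps thread name -> "STATE @ top frame", keying off the quoted
    // thread-name line, the java.lang.Thread.State line, and the first
    // "at " frame under it.
    static Map<String, String> parse(String dumpFile) throws IOException {
        Map<String, String> threads = new LinkedHashMap<String, String>();
        BufferedReader in = new BufferedReader(new FileReader(dumpFile));
        try {
            String name = null;
            String state = null;
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.startsWith("\"") && line.indexOf('"', 1) > 0) {
                    name = line.substring(1, line.indexOf('"', 1));
                    state = null;
                } else if (name != null && line.startsWith("java.lang.Thread.State:")) {
                    state = line.substring("java.lang.Thread.State:".length()).trim();
                } else if (name != null && state != null && line.startsWith("at ")) {
                    threads.put(name, state + " @ " + line.substring(3));
                    name = null;
                    state = null;
                }
            }
        } finally {
            in.close();
        }
        return threads;
    }
}

Run it as: java ThreadDumpDiff dump1.txt dump2.txt -- the threads it reports are the ones worth staring at across dumps.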
Any chance of your upping to CDH3b2, for your Hadoop at least? HDFS has a
few dfsclient/ipc fixes -- though looking at them, none seem to explicitly
address your issue. What's the JVM that you are running? Can you do a java
-version? What's your OS?
Thanks,
St.Ack

On Fri, Jul 16, 2010 at 10:10 AM, Luke Forehand <[email protected]> wrote:
> Line 58 and line 79 are the threads that I found suspicious.
>
> http://pastebin.com/W1E2nCZq
>
> The other stack traces from the other two region servers look identical to
> this one. BTW - I have made the config changes per Ryan Rawson's suggestion
> (thanks!) and I've processed ~7 GB of the 15 GB without a hangup thus far,
> so I'm crossing my fingers.
>
> -Luke
>
> On 7/16/10 11:48 AM, "Stack" <[email protected]> wrote:
>
> Would you mind pastebinning the stack trace? Going by the below, it doesn't
> look like https://issues.apache.org/jira/browse/HDFS-88 (HBASE-667), an
> issue that HADOOP-5859 purportedly fixes -- I see you commented on it --
> but our Todd thinks otherwise (he has a 'real' fix up in another issue that
> I currently can't put my finger on).
> St.Ack
>
> On Fri, Jul 16, 2010 at 7:19 AM, Luke Forehand
> <[email protected]> wrote:
>>
>> I grepped yesterday's logs on all servers for "Blocking updates" and there
>> was no trace. I believe I had encountered the blocking-updates problem
>> earlier in the project but throttled down the import speed, which seemed
>> to fix that.
>>
>> I just double-checked and all three region servers were idle. Something
>> interesting that I noticed, however, was that each regionserver had a
>> particular ResponseProcessor thread running for a specific block, and that
>> thread was stuck in a running state for the entirety of the hang. Also, a
>> DataStreamer thread for the block associated with the ResponseProcessor
>> was in a wait state. This makes me think that each server was stuck
>> operating on a specific block.
>>
>> "ResponseProcessor for block blk_1926230463847049982_2694658" - Thread t...@61160
>>    java.lang.Thread.State: RUNNABLE
>>         at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>         at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
>>         at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>>         at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>         - locked sun.nio.ch.uti...@196fbfd0
>>         - locked java.util.collections$unmodifiable...@7799fdbb
>>         - locked sun.nio.ch.epollselectori...@1ee13d55
>>         at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
>>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>         at java.io.DataInputStream.readFully(DataInputStream.java:178)
>>         at java.io.DataInputStream.readLong(DataInputStream.java:399)
>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2399)
>>
>>    Locked ownable synchronizers:
>>         - None
>>
>> "DataStreamer for file /hbase/.logs/dn01.colo.networkedinsights.com,60020,1279222293084/hlog.dat.1279228611023 block blk_1926230463847049982_2694658" - Thread t...@61158
>>    java.lang.Thread.State: TIMED_WAITING on java.util.linkedl...@475b455c
>>         at java.lang.Object.wait(Native Method)
>>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2247)
>>
>>    Locked ownable synchronizers:
>>         - None
>>
>
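(For anyone reading the traces cold: that pairing -- a ResponseProcessor RUNNABLE inside a blocking socket read while its DataStreamer sits in Object.wait() on the packet queue -- is what the DFSClient write path looks like when the datanode pipeline never sends an ack back. Here is a rough standalone sketch, plain JDK sockets and threads rather than DFSClient code and with made-up class names, that reproduces the same two thread states.)

import java.io.DataInputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.LinkedList;

public class PipelineHangSketch {
    // Packets queued for the pipeline; the "streamer" waits on this when idle.
    private final LinkedList<byte[]> dataQueue = new LinkedList<byte[]>();

    public static void main(String[] args) throws Exception {
        final PipelineHangSketch sketch = new PipelineHangSketch();

        // Loopback socket standing in for the datanode connection.  The
        // accepted side never writes an ack, so the ack reader blocks forever.
        ServerSocket datanode = new ServerSocket(0);
        Socket toDatanode = new Socket("127.0.0.1", datanode.getLocalPort());
        datanode.accept();
        final DataInputStream ackIn = new DataInputStream(toDatanode.getInputStream());

        Thread streamer = new Thread(new Runnable() {
            public void run() { sketch.runStreamer(); }
        }, "DataStreamer (sketch)");
        Thread responder = new Thread(new Runnable() {
            public void run() { sketch.runResponseProcessor(ackIn); }
        }, "ResponseProcessor (sketch)");
        streamer.setDaemon(true);
        responder.setDaemon(true);
        streamer.start();
        responder.start();

        Thread.sleep(2000);
        // Same picture as the pasted dump: the streamer parked in
        // Object.wait() on its queue, the responder RUNNABLE inside a
        // blocking socket read, and neither making progress.
        System.out.println("streamer : " + streamer.getState());
        System.out.println("responder: " + responder.getState());
    }

    // Stand-in for DataStreamer.run(): nothing to send, so wait on the queue.
    private void runStreamer() {
        synchronized (dataQueue) {
            try {
                while (dataQueue.isEmpty()) {
                    dataQueue.wait(1000);   // shows up as TIMED_WAITING on a LinkedList
                }
            } catch (InterruptedException ie) {
                // fall through and exit
            }
        }
    }

    // Stand-in for ResponseProcessor.run(): block until the datanode acks.
    private void runResponseProcessor(DataInputStream ackIn) {
        try {
            long seqno = ackIn.readLong();  // never returns -- no ack ever arrives
            System.out.println("acked " + seqno);
        } catch (IOException ioe) {
            // socket closed
        }
    }
}

Running it prints TIMED_WAITING for the streamer and RUNNABLE for the responder, which is exactly the shape of the dump above -- consistent with the client waiting on an ack for that one block rather than anything wrong on the HBase side.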
