Alright, so I was able to get the logs from Eran. The HDFS errors are a
red herring; what follows in the region server log is what really
matters:

2011-04-10 10:14:27,278 INFO org.apache.zookeeper.ClientCnxn: Client
session timed out, have not heard from server in 144490ms for
sessionid 0x12ee42283320050, closing socket connection and attempting
reconnect

That's a ~2m24s GC pause. The HDFS errors come from the fact that the
master split the logs _while_ the region server was sleeping.
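As an aside, pauses like this are easiest to confirm from the region
server's own GC log. A minimal sketch of enabling it in hbase-env.sh,
assuming a Sun/Oracle HotSpot JVM (the log path here is illustrative,
not a standard location):

```shell
# hbase-env.sh -- turn on GC logging for the HBase JVMs
# (flags are standard HotSpot options; the log path is an example)
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps -Xloggc:/var/log/hbase/gc-regionserver.log"
```

A full-GC entry in that log with a pause near 144 seconds would line up
with the session timeout in the message above.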

J-D

On Mon, Apr 11, 2011 at 11:47 AM, Jean-Daniel Cryans
<[email protected]> wrote:
> So my understanding is that this log file was opened at 7:29, and then
> something happened at 10:12:55 that triggered recovery on that block,
> renaming it to blk_1213779416283711358_54249.
>
> It seems that the recovery was started by the DFSClient at 10:12:55,
> but the RS log starts at 10:14. Would it be possible to see what came
> before that? It would also be nice to have a view of those blocks on
> all the datanodes.
>
> It would be nice to do this debugging on IRC, as it can require a lot
> of back and forth.
>
> J-D
>
> On Mon, Apr 11, 2011 at 11:22 AM, Eran Kutner <[email protected]> wrote:
>> There wasn't an attachment, I pasted all the lines from all the NN logs that
>> contain that particular block number inline.
>>
>> As for CPU/IO: first, there is nothing else running on those servers;
>> second, CPU utilization on the slaves at peak load was around 40% and
>> disk IO utilization was less than 20%. That's the strange thing about
>> it (I have another thread going about the performance): there is no
>> bottleneck I could identify, and yet the performance was relatively
>> low compared to the numbers I see quoted for HBase elsewhere.
>>
>> The first line of the NN log says:
>> BLOCK* NameSystem.allocateBlock:
>> /hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279.blk_1213779416283711358_54194
>> So it looks like the file name is:
>> /hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279
>>
>> Is there a better way to associate a file with a block?
>>
>> -eran
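For the file-to-block question above: `hadoop fsck` can list a file's
blocks (and their datanode locations) directly. A sketch, using the WAL
path from the NN log; exact output format varies by Hadoop version:

```shell
# List every block of the WAL file, plus which datanodes hold each replica.
# -files/-blocks/-locations are standard fsck flags; the path is the one
# quoted from the NN log in the message above.
hadoop fsck '/hbase/.logs/hadoop1-s01.farm-ny.gigya.com,60020,1302185988579/hadoop1-s01.farm-ny.gigya.com%3A60020.1302434963279' \
  -files -blocks -locations
```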
