Re: distributed log splitting aborted

Cyril Scetbon Fri, 06 Jul 2012 11:41:16 -0700

As you can see in the master log, region servers are in charge of splitting the 
log files (which I suppose are not found), and each split is retried several 
times on different region servers (I didn't check whether it is always retried). 
For example, you can follow a failing split for a file that is not found in the 
Hadoop filesystem:


http://pastebin.com/RbcLdbcs
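
For anyone following along, here is a minimal sketch of how the log file name quoted further down in this thread can be taken apart, and how the read check J-D suggested can be assembled. It assumes the name has the layout `<URL-encoded server name>.<timestamp>`, where the server name is `host,port,startcode`; that layout is inferred from the path in this thread. The helper names `decode_wal_name` and `read_check_command` are made up for illustration and are not part of HBase or Hadoop.

```python
from urllib.parse import unquote


def decode_wal_name(file_name):
    """Split '<url-encoded server name>.<timestamp>' into its parts.

    Assumption: the server name is 'host,port,startcode' with the commas
    URL-encoded as %2C, as in the path quoted in this thread.
    """
    # The timestamp is everything after the last dot.
    encoded_server, _, timestamp = file_name.rpartition(".")
    host, port, start_code = unquote(encoded_server).split(",")
    return {
        "host": host,
        "port": int(port),
        "start_code": int(start_code),
        "file_timestamp": int(timestamp),
    }


def read_check_command(path):
    """Build the 'hadoop dfs -cat' command suggested in the thread.

    The command is only built here, not executed; run it on a machine
    that has the Hadoop client installed.
    """
    return ["hadoop", "dfs", "-cat", path]


if __name__ == "__main__":
    name = "hb-d12%2C60020%2C1341429679981.1341430649711"
    print(decode_wal_name(name))
    print(" ".join(read_check_command(
        "/hbase/.logs/hb-d12,60020,1341429679981-splitting/" + name)))
```

Decoding the name makes it easier to compare the file a region server complains about with the `-splitting` directory it should live in, since both carry the same server name.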

Regards

Cyril SCETBON

On Jul 6, 2012, at 8:17 PM, Cyril Scetbon wrote:

> Here are the log files you asked for :
> 
> http://pastebin.com/xRBuQdNS  <---- hbase-master.log
> 
> http://pastebin.com/u6WYQT6R <---- hdfs-namenode.log
> 
> If you find a fix for this damn issue, I'll be delighted!
> 
> Thanks
> 
> Cyril SCETBON
> 
> On Jul 5, 2012, at 11:44 PM, Jean-Daniel Cryans wrote:
> 
>> Interesting... Can you read the file? Try a "hadoop dfs -cat" on it
>> and see if it goes to the end of it.
>> 
>> It could also be useful to see a bigger portion of the master log, for
>> all I know maybe it handles it somehow and there's a problem
>> elsewhere.
>> 
>> Finally, which Hadoop version are you using?
>> 
>> Thx,
>> 
>> J-D
>> 
>> On Thu, Jul 5, 2012 at 1:58 PM, Cyril Scetbon <[email protected]> wrote:
>>> yes :
>>> 
>>> /hbase/.logs/hb-d12,60020,1341429679981-splitting/hb-d12%2C60020%2C1341429679981.134143064971
>>> 
>>> I did a fsck and here is the report :
>>> 
>>> Status: HEALTHY
>>> Total size:    618827621255 B (Total open files size: 868 B)
>>> Total dirs:    4801
>>> Total files:   2825 (Files currently being written: 42)
>>> Total blocks (validated):      11479 (avg. block size 53909541 B) (Total 
>>> open file blocks (not validated): 41)
>>> Minimally replicated blocks:   11479 (100.0 %)
>>> Over-replicated blocks:        1 (0.008711561 %)
>>> Under-replicated blocks:       0 (0.0 %)
>>> Mis-replicated blocks:         0 (0.0 %)
>>> Default replication factor:    4
>>> Average block replication:     4.0000873
>>> Corrupt blocks:                0
>>> Missing replicas:              0 (0.0 %)
>>> Number of data-nodes:          12
>>> Number of racks:               1
>>> FSCK ended at Thu Jul 05 20:56:35 UTC 2012 in 795 milliseconds
>>> 
>>> 
>>> The filesystem under path '/hbase' is HEALTHY
>>> 
>>> Cyril SCETBON
>>> 
>>> On Jul 5, 2012, at 7:59 PM, Jean-Daniel Cryans wrote:
>>> 
>>>> Does this file really exist in HDFS?
>>>> 
>>>> hdfs://hb-zk1:54310/hbase/.logs/hb-d12,60020,1341429679981-splitting/hb-d12%2C60020%2C1341429679981.1341430649711
>>>> 
>>>> If so, did you run fsck in HDFS?
>>>> 
>>>> It would be weird if HDFS doesn't report anything bad but somehow the
>>>> clients (like HBase) can't read it.
>>>> 
>>>> J-D
>>>> 
>>>> On Thu, Jul 5, 2012 at 12:45 AM, Cyril Scetbon <[email protected]> 
>>>> wrote:
>>>>> Hi,
>>>>> 
>>>>> I can no longer start my cluster correctly and get messages like 
>>>>> http://pastebin.com/T56wrJxE (taken on one region server)
>>>>> 
>>>>> I suppose HBase is not designed to be stopped entirely, only to tolerate 
>>>>> some nodes going down? HDFS is not complaining; it's only HBase that can't 
>>>>> start correctly :(
>>>>> 
>>>>> I suppose some data has not been flushed, but that is not really important 
>>>>> to me. Is there a way to fix these errors even if I lose data?
>>>>> 
>>>>> thanks
>>>>> 
>>>>> Cyril SCETBON
>>>>> 
>>> 
>
