I've finally succeeded in starting my cluster by disabling hbase.master.distributed.log.splitting.
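
For anyone hitting the same problem, this is roughly what the change looks like. It is only a sketch: it assumes the property goes in the master's hbase-site.xml inside the usual <configuration> element, and that the master is restarted afterwards.

```xml
<!-- hbase-site.xml on the master (assumed location) -->
<property>
  <name>hbase.master.distributed.log.splitting</name>
  <!-- false: the master splits the logs itself instead of handing the work to region servers -->
  <value>false</value>
</property>
```
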
It took less than 10 minutes to start, compared to a whole night without any success with distributed log splitting enabled. Don't you think, like me, that it's just buggy?

Thanks

Cyril SCETBON

On Jul 6, 2012, at 8:40 PM, Cyril Scetbon wrote:

> As you can see in the master log, region servers are in charge of splitting
> log files (not found, I suppose) and the split is retried several times (I
> didn't check whether it's always redone) on different region servers. For
> example, you can follow a failing split concerning a file not found in the
> Hadoop filesystem:
>
> http://pastebin.com/RbcLdbcs
>
> Regards
>
> Cyril SCETBON
>
> On Jul 6, 2012, at 8:17 PM, Cyril Scetbon wrote:
>
>> Here are the log files you asked for:
>>
>> http://pastebin.com/xRBuQdNS <---- hbase-master.log
>>
>> http://pastebin.com/u6WYQT6R <---- hdfs-namenode.log
>>
>> If you find a fix for this damn issue I'll be delighted!
>>
>> Thanks
>>
>> Cyril SCETBON
>>
>> On Jul 5, 2012, at 11:44 PM, Jean-Daniel Cryans wrote:
>>
>>> Interesting... Can you read the file? Try a "hadoop dfs -cat" on it
>>> and see if it goes to the end of it.
>>>
>>> It could also be useful to see a bigger portion of the master log, for
>>> all I know maybe it handles it somehow and there's a problem
>>> elsewhere.
>>>
>>> Finally, which Hadoop version are you using?
>>>
>>> Thx,
>>>
>>> J-D
>>>
>>> On Thu, Jul 5, 2012 at 1:58 PM, Cyril Scetbon <[email protected]> wrote:
>>>> Yes:
>>>>
>>>> /hbase/.logs/hb-d12,60020,1341429679981-splitting/hb-d12%2C60020%2C1341429679981.134143064971
>>>>
>>>> I ran fsck and here is the report:
>>>>
>>>> Status: HEALTHY
>>>>  Total size: 618827621255 B (Total open files size: 868 B)
>>>>  Total dirs: 4801
>>>>  Total files: 2825 (Files currently being written: 42)
>>>>  Total blocks (validated): 11479 (avg. block size 53909541 B) (Total open file blocks (not validated): 41)
>>>>  Minimally replicated blocks: 11479 (100.0 %)
>>>>  Over-replicated blocks: 1 (0.008711561 %)
>>>>  Under-replicated blocks: 0 (0.0 %)
>>>>  Mis-replicated blocks: 0 (0.0 %)
>>>>  Default replication factor: 4
>>>>  Average block replication: 4.0000873
>>>>  Corrupt blocks: 0
>>>>  Missing replicas: 0 (0.0 %)
>>>>  Number of data-nodes: 12
>>>>  Number of racks: 1
>>>> FSCK ended at Thu Jul 05 20:56:35 UTC 2012 in 795 milliseconds
>>>>
>>>> The filesystem under path '/hbase' is HEALTHY
>>>>
>>>> Cyril SCETBON
>>>>
>>>> On Jul 5, 2012, at 7:59 PM, Jean-Daniel Cryans wrote:
>>>>
>>>>> Does this file really exist in HDFS?
>>>>>
>>>>> hdfs://hb-zk1:54310/hbase/.logs/hb-d12,60020,1341429679981-splitting/hb-d12%2C60020%2C1341429679981.1341430649711
>>>>>
>>>>> If so, did you run fsck in HDFS?
>>>>>
>>>>> It would be weird if HDFS doesn't report anything bad but somehow the
>>>>> clients (like HBase) can't read it.
>>>>>
>>>>> J-D
>>>>>
>>>>> On Thu, Jul 5, 2012 at 12:45 AM, Cyril Scetbon <[email protected]>
>>>>> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I can no longer start my cluster correctly and get messages like
>>>>>> http://pastebin.com/T56wrJxE (taken on one region server)
>>>>>>
>>>>>> I suppose HBase is not designed to be stopped entirely, only to cope
>>>>>> with some nodes going down? HDFS is not complaining, it's only HBase
>>>>>> that can't start correctly :(
>>>>>>
>>>>>> I suppose some data has not been flushed, and it's not really important
>>>>>> for me. Is there a way to fix these errors even if I lose data?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Cyril SCETBON
