As you can see in the master log, the region servers are in charge of splitting the log files (the ones that are not found, I suppose), and each split is retried several times on different region servers (I didn't check whether it is always retried). For example, you can follow a failing split concerning a file that is not found in the Hadoop filesystem:
http://pastebin.com/RbcLdbcs

Regards

Cyril SCETBON

On Jul 6, 2012, at 8:17 PM, Cyril Scetbon wrote:

> Here are the log files you asked for:
>
> http://pastebin.com/xRBuQdNS <---- hbase-master.log
>
> http://pastebin.com/u6WYQT6R <---- hdfs-namenode.log
>
> If you find the fix for this damn issue, I'll be delighted!
>
> Thanks
>
> Cyril SCETBON
>
> On Jul 5, 2012, at 11:44 PM, Jean-Daniel Cryans wrote:
>
>> Interesting... Can you read the file? Try a "hadoop dfs -cat" on it
>> and see if it goes to the end of it.
>>
>> It could also be useful to see a bigger portion of the master log; for
>> all I know, it handles this somehow and the problem is
>> elsewhere.
>>
>> Finally, which Hadoop version are you using?
>>
>> Thx,
>>
>> J-D
>>
>> On Thu, Jul 5, 2012 at 1:58 PM, Cyril Scetbon wrote:
>>> Yes:
>>>
>>> /hbase/.logs/hb-d12,60020,1341429679981-splitting/hb-d12%2C60020%2C1341429679981.134143064971
>>>
>>> I ran fsck and here is the report:
>>>
>>> Status: HEALTHY
>>> Total size: 618827621255 B (Total open files size: 868 B)
>>> Total dirs: 4801
>>> Total files: 2825 (Files currently being written: 42)
>>> Total blocks (validated): 11479 (avg. block size 53909541 B) (Total open file blocks (not validated): 41)
>>> Minimally replicated blocks: 11479 (100.0 %)
>>> Over-replicated blocks: 1 (0.008711561 %)
>>> Under-replicated blocks: 0 (0.0 %)
>>> Mis-replicated blocks: 0 (0.0 %)
>>> Default replication factor: 4
>>> Average block replication: 4.0000873
>>> Corrupt blocks: 0
>>> Missing replicas: 0 (0.0 %)
>>> Number of data-nodes: 12
>>> Number of racks: 1
>>> FSCK ended at Thu Jul 05 20:56:35 UTC 2012 in 795 milliseconds
>>>
>>> The filesystem under path '/hbase' is HEALTHY
>>>
>>> Cyril SCETBON
>>>
>>> On Jul 5, 2012, at 7:59 PM, Jean-Daniel Cryans wrote:
>>>
>>>> Does this file really exist in HDFS?
>>>>
>>>> hdfs://hb-zk1:54310/hbase/.logs/hb-d12,60020,1341429679981-splitting/hb-d12%2C60020%2C1341429679981.1341430649711
>>>>
>>>> If so, did you run fsck in HDFS?
>>>>
>>>> It would be weird if HDFS didn't report anything bad but somehow the
>>>> clients (like HBase) couldn't read it.
>>>>
>>>> J-D
>>>>
>>>> On Thu, Jul 5, 2012 at 12:45 AM, Cyril Scetbon wrote:
>>>>> Hi,
>>>>>
>>>>> I can no longer start my cluster correctly and I get messages like
>>>>> http://pastebin.com/T56wrJxE (taken on one region server).
>>>>>
>>>>> I suppose HBase is not designed to be stopped entirely, only to survive
>>>>> some nodes going down? HDFS is not complaining; it's only HBase that can't
>>>>> start correctly :(
>>>>>
>>>>> I suppose some data has not been flushed, and that's not really important
>>>>> to me. Is there a way to fix these errors, even if I lose data?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Cyril SCETBON
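The log paths quoted in this thread can be decoded to see which region server and which moment they refer to. The sketch below assumes the write-ahead-log naming these paths appear to follow: a directory under `/hbase/.logs` named after the region server (`host,port,startcode`, with a `-splitting` suffix while the master splits it) and a file named after the URL-encoded server name plus a creation timestamp in epoch milliseconds. `parse_wal_path` is a hypothetical helper written for illustration, not part of HBase or Hadoop.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import unquote

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_wal_path(path: str) -> dict:
    """Decode a WAL path assumed to look like
    .../.logs/<server name>[-splitting]/<URL-encoded server name>.<timestamp ms>
    where the server name is "host,port,startcode" (an assumption)."""
    parent_dir, file_name = path.rsplit("/", 2)[-2:]
    # Everything before the last "." is the URL-encoded server name ("," -> "%2C").
    encoded_server, _, created_ms = file_name.rpartition(".")
    host, port, start_code = unquote(encoded_server).split(",")
    return {
        "host": host,
        "port": int(port),
        # Assumption: the start code is the region server start time in epoch ms.
        # Integer timedelta arithmetic keeps the milliseconds exact.
        "server_started": EPOCH + timedelta(milliseconds=int(start_code)),
        "log_created": EPOCH + timedelta(milliseconds=int(created_ms)),
        # Assumption: the "-splitting" suffix marks a log directory being split.
        "being_split": parent_dir.endswith("-splitting"),
    }


wal = parse_wal_path(
    "hdfs://hb-zk1:54310/hbase/.logs/hb-d12,60020,1341429679981-splitting/"
    "hb-d12%2C60020%2C1341429679981.1341430649711"
)
for key, value in wal.items():
    print(f"{key}: {value}")
```

Under these assumptions, the path J-D asked about belongs to region server hb-d12 (port 60020) and was created about 16 minutes after that server started, which can help line it up with the entries in the master and NameNode logs.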
