A network issue? That's weird, because reads/writes are working well and not raising errors (I'll double-check it).
Regards

Cyril SCETBON

On Jul 9, 2012, at 10:55 PM, Jean-Daniel Cryans wrote:

> We've been running with distributed splitting here for >6 months and
> never had this issue. Also, the exceptions you are seeing come from
> HDFS and not HBase, and the fact that it worked from the master but not
> from the region servers seems to point to a network configuration issue,
> because the actual splitting code is really the same.
>
> J-D
>
> On Sun, Jul 8, 2012 at 2:25 PM, Cyril Scetbon <[email protected]> wrote:
>> I've finally succeeded in starting my cluster by disabling
>> hbase.master.distributed.log.splitting.
>>
>> It took less than 10 minutes to start, compared to a whole night without
>> any success with distributed log splitting enabled. Don't you agree that
>> it's just buggy?
>>
>> Thanks
>>
>> Cyril SCETBON
>>
>> On Jul 6, 2012, at 8:40 PM, Cyril Scetbon wrote:
>>
>>> As you can see in the master log, region servers are in charge of
>>> splitting the log files (not found, I suppose), and the split is retried
>>> several times (I didn't check whether it's always redone) on different
>>> region servers. You can, for example, follow a failing split concerning
>>> a file not found in the Hadoop filesystem:
>>>
>>> http://pastebin.com/RbcLdbcs
>>>
>>> Regards
>>>
>>> Cyril SCETBON
>>>
>>> On Jul 6, 2012, at 8:17 PM, Cyril Scetbon wrote:
>>>
>>>> Here are the log files you asked for:
>>>>
>>>> http://pastebin.com/xRBuQdNS <---- hbase-master.log
>>>>
>>>> http://pastebin.com/u6WYQT6R <---- hdfs-namenode.log
>>>>
>>>> If you find the fix for this damn issue, I'll be delighted!
>>>>
>>>> Thanks
>>>>
>>>> Cyril SCETBON
>>>>
>>>> On Jul 5, 2012, at 11:44 PM, Jean-Daniel Cryans wrote:
>>>>
>>>>> Interesting... Can you read the file? Try a "hadoop dfs -cat" on it
>>>>> and see if it goes to the end of it.
>>>>>
>>>>> It could also be useful to see a bigger portion of the master log; for
>>>>> all I know, maybe it handles it somehow and there's a problem
>>>>> elsewhere.
>>>>>
>>>>> Finally, which Hadoop version are you using?
>>>>>
>>>>> Thx,
>>>>>
>>>>> J-D
>>>>>
>>>>> On Thu, Jul 5, 2012 at 1:58 PM, Cyril Scetbon <[email protected]>
>>>>> wrote:
>>>>>> Yes:
>>>>>>
>>>>>> /hbase/.logs/hb-d12,60020,1341429679981-splitting/hb-d12%2C60020%2C1341429679981.134143064971
>>>>>>
>>>>>> I ran an fsck, and here is the report:
>>>>>>
>>>>>> Status: HEALTHY
>>>>>>  Total size:    618827621255 B (Total open files size: 868 B)
>>>>>>  Total dirs:    4801
>>>>>>  Total files:   2825 (Files currently being written: 42)
>>>>>>  Total blocks (validated):      11479 (avg. block size 53909541 B) (Total open file blocks (not validated): 41)
>>>>>>  Minimally replicated blocks:   11479 (100.0 %)
>>>>>>  Over-replicated blocks:        1 (0.008711561 %)
>>>>>>  Under-replicated blocks:       0 (0.0 %)
>>>>>>  Mis-replicated blocks:         0 (0.0 %)
>>>>>>  Default replication factor:    4
>>>>>>  Average block replication:     4.0000873
>>>>>>  Corrupt blocks:                0
>>>>>>  Missing replicas:              0 (0.0 %)
>>>>>>  Number of data-nodes:          12
>>>>>>  Number of racks:               1
>>>>>> FSCK ended at Thu Jul 05 20:56:35 UTC 2012 in 795 milliseconds
>>>>>>
>>>>>> The filesystem under path '/hbase' is HEALTHY
>>>>>>
>>>>>> Cyril SCETBON
>>>>>>
>>>>>> On Jul 5, 2012, at 7:59 PM, Jean-Daniel Cryans wrote:
>>>>>>
>>>>>>> Does this file really exist in HDFS?
>>>>>>>
>>>>>>> hdfs://hb-zk1:54310/hbase/.logs/hb-d12,60020,1341429679981-splitting/hb-d12%2C60020%2C1341429679981.1341430649711
>>>>>>>
>>>>>>> If so, did you run fsck in HDFS?
>>>>>>>
>>>>>>> It would be weird if HDFS doesn't report anything bad but somehow the
>>>>>>> clients (like HBase) can't read it.
>>>>>>>
>>>>>>> J-D
>>>>>>>
>>>>>>> On Thu, Jul 5, 2012 at 12:45 AM, Cyril Scetbon <[email protected]>
>>>>>>> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I can no longer start my cluster correctly and get messages like
>>>>>>>> http://pastebin.com/T56wrJxE (taken on one region server).
>>>>>>>>
>>>>>>>> I suppose HBase is not designed to be fully stopped, only to have
>>>>>>>> some nodes go down? HDFS is not complaining; it's only HBase that
>>>>>>>> can't start correctly :(
>>>>>>>>
>>>>>>>> I suppose some data has not been flushed, and that's not really
>>>>>>>> important to me. Is there a way to fix these errors even if I
>>>>>>>> lose data?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Cyril SCETBON
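[Editor's note] The workaround Cyril describes, disabling distributed log splitting so the master performs all splits itself, corresponds to an hbase-site.xml entry along these lines (a sketch for the HBase 0.92-era configuration named in the thread; the property defaults to true):

```xml
<!-- hbase-site.xml: fall back to master-side (non-distributed) WAL splitting. -->
<property>
  <name>hbase.master.distributed.log.splitting</name>
  <value>false</value>
</property>
```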
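[Editor's note] J-D's "read it to the end" check can be scripted. A sketch against the cluster in the thread (the path is the one quoted above; the flags are standard Hadoop FsShell/fsck options of that era, and this obviously requires access to the live HDFS):

```shell
# Full sequential read of the suspect WAL; pipe through wc -c so the output
# is just a byte count. A short read or an exception here, despite a clean
# fsck, would confirm a client-side (network/DN access) problem.
hadoop dfs -cat \
  '/hbase/.logs/hb-d12,60020,1341429679981-splitting/hb-d12%2C60020%2C1341429679981.1341430649711' \
  | wc -c

# Block-level detail for the same file, including which datanodes hold
# each replica, to see whether the failing region servers can reach them.
hadoop fsck \
  '/hbase/.logs/hb-d12,60020,1341429679981-splitting/hb-d12%2C60020%2C1341429679981.1341430649711' \
  -files -blocks -locations
```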
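[Editor's note] On the final question, starting the cluster even at the cost of unflushed edits: a common last-resort approach at the time was to sideline the `-splitting` directories so the master stops trying to replay them. This is a sketch only (the sideline path is an arbitrary choice, and the server directory name is the one quoted above); any edits still in those logs are permanently lost:

```shell
# LAST RESORT: move pending-split WAL directories out of HBase's view.
# Unflushed edits in these logs are irrecoverably discarded.
hadoop dfs -mkdir /hbase-sidelined-logs
hadoop dfs -mv '/hbase/.logs/hb-d12,60020,1341429679981-splitting' /hbase-sidelined-logs/
# Repeat for every *-splitting directory, then restart the master.
```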
