dfs.datanode.max.xcievers is set to 4096 and the soft limit of nofile is set to 32768 (it is the default in the package).
However, when I log in as hdfs it's set to 1024, and I can't find where else it might be set...

Cyril SCETBON
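A minimal sketch for checking which limit actually applies, assuming a Linux host with the CDH4 packages in their default locations (the /etc/hadoop/conf path and the pgrep pattern below are assumptions, adjust to your layout):

shell> su - hdfs -s /bin/bash -c 'ulimit -Sn; ulimit -Hn'   # limit a login shell for the hdfs user gets (PAM / limits.conf); -s in case the account has no login shell
shell> pid=$(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode)
shell> grep 'open files' /proc/$pid/limits                  # limit the running datanode JVM actually got
shell> grep -r nofile /etc/security/limits.conf /etc/security/limits.d/ 2>/dev/null   # where a 1024 value could be coming from
shell> grep -A1 xcievers /etc/hadoop/conf/hdfs-site.xml     # dfs.datanode.max.xcievers as the datanode reads it

The value in /proc/<pid>/limits is the one the xceiver threads actually run under; if it shows 1024 there, the session that started the datanode did not pick up the 32768 from the package.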
On Jul 6, 2012, at 12:19 PM, N Keywal wrote:

> Hi Cyril,
>
> BTW, have you checked dfs.datanode.max.xcievers and ulimit -n? When
> underconfigured they can cause this type of error, even if it seems
> it's not the case here...
>
> Cheers,
>
> N.
>
> On Fri, Jul 6, 2012 at 11:31 AM, Cyril Scetbon <[email protected]> wrote:
>> The file is now missing, but I have tried with another one and you can
>> see the error:
>>
>> shell> hdfs dfs -ls "/hbase/.logs/hb-d11,60020,1341097456894-splitting/hb-d11%2C60020%2C1341097456894.1341421613446"
>> Found 1 items
>> -rw-r--r--   4 hbase supergroup   0 2012-07-04 17:06 /hbase/.logs/hb-d11,60020,1341097456894-splitting/hb-d11%2C60020%2C1341097456894.1341421613446
>> shell> hdfs dfs -cat "/hbase/.logs/hb-d11,60020,1341097456894-splitting/hb-d11%2C60020%2C1341097456894.1341421613446"
>> 12/07/06 09:27:51 WARN hdfs.DFSClient: Last block locations not available. Datanodes might not have reported blocks completely. Will retry for 3 times
>> 12/07/06 09:27:55 WARN hdfs.DFSClient: Last block locations not available. Datanodes might not have reported blocks completely. Will retry for 2 times
>> 12/07/06 09:27:59 WARN hdfs.DFSClient: Last block locations not available. Datanodes might not have reported blocks completely. Will retry for 1 times
>> cat: Could not obtain the last block locations.
>>
>> I'm using Hadoop 2.0 from the Cloudera package (CDH4) with HBase 0.92.1.
>>
>> Regards
>> Cyril SCETBON
>>
>> On Jul 5, 2012, at 11:44 PM, Jean-Daniel Cryans wrote:
>>
>>> Interesting... Can you read the file? Try a "hadoop dfs -cat" on it
>>> and see if it goes to the end of it.
>>>
>>> It could also be useful to see a bigger portion of the master log; for
>>> all I know maybe it handles it somehow and there's a problem
>>> elsewhere.
>>>
>>> Finally, which Hadoop version are you using?
>>>
>>> Thx,
>>>
>>> J-D
>>>
>>> On Thu, Jul 5, 2012 at 1:58 PM, Cyril Scetbon <[email protected]> wrote:
>>>> yes:
>>>>
>>>> /hbase/.logs/hb-d12,60020,1341429679981-splitting/hb-d12%2C60020%2C1341429679981.134143064971
>>>>
>>>> I did a fsck and here is the report:
>>>>
>>>> Status: HEALTHY
>>>> Total size: 618827621255 B (Total open files size: 868 B)
>>>> Total dirs: 4801
>>>> Total files: 2825 (Files currently being written: 42)
>>>> Total blocks (validated): 11479 (avg. block size 53909541 B) (Total open file blocks (not validated): 41)
>>>> Minimally replicated blocks: 11479 (100.0 %)
>>>> Over-replicated blocks: 1 (0.008711561 %)
>>>> Under-replicated blocks: 0 (0.0 %)
>>>> Mis-replicated blocks: 0 (0.0 %)
>>>> Default replication factor: 4
>>>> Average block replication: 4.0000873
>>>> Corrupt blocks: 0
>>>> Missing replicas: 0 (0.0 %)
>>>> Number of data-nodes: 12
>>>> Number of racks: 1
>>>> FSCK ended at Thu Jul 05 20:56:35 UTC 2012 in 795 milliseconds
>>>>
>>>> The filesystem under path '/hbase' is HEALTHY
>>>>
>>>> Cyril SCETBON
>>>>
>>>> On Jul 5, 2012, at 7:59 PM, Jean-Daniel Cryans wrote:
>>>>
>>>>> Does this file really exist in HDFS?
>>>>>
>>>>> hdfs://hb-zk1:54310/hbase/.logs/hb-d12,60020,1341429679981-splitting/hb-d12%2C60020%2C1341429679981.1341430649711
>>>>>
>>>>> If so, did you run fsck in HDFS?
>>>>>
>>>>> It would be weird if HDFS doesn't report anything bad but somehow the
>>>>> clients (like HBase) can't read it.
>>>>>
>>>>> J-D
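The "Could not obtain the last block locations" failure and the 42 "Files currently being written" in the fsck report above typically point to the same thing: WAL files that are still open for write, whose last block is still under construction. A sketch for listing exactly which files those are, using standard fsck options (fsck normally tags such files OPENFORWRITE; the paths are taken from the thread):

shell> hdfs fsck /hbase/.logs -openforwrite -files | grep -i openforwrite
shell> hdfs fsck "/hbase/.logs/hb-d11,60020,1341097456894-splitting/hb-d11%2C60020%2C1341097456894.1341421613446" -files -blocks -locations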
>>>>> On Thu, Jul 5, 2012 at 12:45 AM, Cyril Scetbon <[email protected]> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I can no longer start my cluster correctly and get messages like
>>>>>> http://pastebin.com/T56wrJxE (taken on one region server).
>>>>>>
>>>>>> I suppose HBase is not designed to be stopped entirely, only to have
>>>>>> some nodes going down??? HDFS is not complaining; it's only HBase that
>>>>>> can't start correctly :(
>>>>>>
>>>>>> I suppose some data has not been flushed, and it's not really important
>>>>>> for me. Is there a way to fix these errors even if it means losing data?
>>>>>>
>>>>>> thanks
>>>>>>
>>>>>> Cyril SCETBON
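Since the question above explicitly accepts losing data, here is a sketch of one last-resort workaround sometimes used for stuck -splitting WALs; it is not confirmed anywhere in this thread, and it permanently discards whatever unflushed edits those logs contain. The target directory name is arbitrary; the -splitting path is the one from the thread:

shell> hdfs dfs -mkdir /tmp/sidelined-wals
shell> hdfs dfs -mv "/hbase/.logs/hb-d12,60020,1341429679981-splitting" /tmp/sidelined-wals/
# after restarting HBase, check table consistency:
shell> hbase hbck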
