The namenode architecture is a source of fragility in HDFS. While a high-availability deployment (two namenodes plus a failover mechanism) makes a service interruption unlikely, you can still lose the filesystem metadata entirely if the wrong two machines fail.
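For reference, a typical HA setup pairs two namenodes with a quorum of JournalNodes for the shared edit log. A minimal hdfs-site.xml sketch, where the nameservice "mycluster" and the nn1/nn2/jn* hostnames are made-up placeholders, not a definitive config:

    <!-- hypothetical nameservice and hostnames; adjust for your cluster -->
    <property><name>dfs.nameservices</name><value>mycluster</value></property>
    <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
    <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>nn1.example.com:8020</value></property>
    <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>nn2.example.com:8020</value></property>
    <!-- edits are replicated to a JournalNode quorum, not kept on the namenodes alone -->
    <property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value></property>
    <property><name>dfs.client.failover.proxy.provider.mycluster</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
    <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>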
Secondly, because HDFS identifies datanodes by their hostname/IP, DNS changes can cause havoc with HDFS (see my war story on this here: https://medium.com/handy-tech/renaming-hdfs-datanodes-considered-terribly-harmful-2bc2f37aabab ). Also, the namenode/datanode architecture probably does contribute to the small files problem being a problem. That said, there are a lot of practical solutions to the small files problem (one sketch is in the P.S. below). If you're just setting up a data infrastructure, I would say consider alternatives before you pick HDFS. If you run in AWS, S3 is a good alternative. If you run in some other cloud, it's probably worth considering whatever their equivalent storage system is.

On Sat, Jun 4, 2016 at 7:43 AM, Ascot Moss <[email protected]> wrote:

> Hi,
>
> I read some (old?) articles from the Internet about MapR-FS vs HDFS.
>
> https://www.mapr.com/products/m5-features/no-namenode-architecture
>
> It states that HDFS Federation has
>
> a) "Multiple Single Points of Failure": is it really true?
> Why does MapR compare against HDFS rather than HDFS2? That makes for an
> unfair (or even misleading) comparison: HDFS was from Hadoop 1.x, the
> old generation, while HDFS2 has been available since 2013-10-15 and has
> no Single Point of Failure.
>
> b) "Limit to 50-200 million files": is it really true?
> I have seen so many real-world Hadoop clusters with over 10PB of data,
> some even with 150PB. If "Limit to 50-200 million files" were true in
> HDFS2, why are there so many production Hadoop clusters in the real
> world? How do they manage the issue of "Limit to 50-200 million files"?
> For instance, Facebook's "Like" implementation runs on HBase at web
> scale; I can imagine HBase generates a huge number of files in
> Facebook's Hadoop cluster, so the number of files there should be much,
> much bigger than 50-200 million.
>
> From my point of view, in contrast, it is MapR-FS that should have a
> true limit of up to 1T files, while HDFS2 can handle a truly unlimited
> number of files; please do correct me if I am wrong.
>
> c) "Performance Bottleneck": again, is it really true?
> MapR-FS does not have a namenode, in order to gain filesystem
> performance. But without a namenode, MapR-FS would lose data locality,
> which is one of the beauties of Hadoop. If data locality is no longer
> available, any big data application running on MapR-FS might gain some
> filesystem performance but would totally lose the real performance gain
> from the data locality provided by Hadoop's namenode (gain small, lose
> big).
>
> d) "Commercial NAS required"
> Is there any wiki/blog/discussion about commercial NAS on Hadoop
> Federation?
>
> regards
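P.S. On small-files mitigations: one practical option is to pack small files into a Hadoop Archive (HAR), which cuts the number of objects the namenode has to track. A rough sketch, with made-up paths:

    # pack /user/me/small-logs into a single HAR under /user/me/archives
    # (paths are hypothetical; -p sets the parent the sources are relative to)
    hadoop archive -archiveName logs.har -p /user/me small-logs /user/me/archives
    # the files stay readable through the har:// scheme
    hdfs dfs -ls har:///user/me/archives/logs.har/small-logs

SequenceFiles or HBase are other common options, depending on your access patterns.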
