Will the common pool of datanodes and namenode federation be a more effective alternative in HDFS2 than multiple clusters?
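
For a rough sense of what federation changes on the namespace side, here is a back-of-envelope sketch. It assumes the commonly quoted rule of thumb of roughly 150 bytes of namenode heap per file, directory, or block. That figure, the function names, and the even split of the namespace across namenodes are assumptions for illustration, not measurements. The object counts are the Yahoo! and Facebook figures from the Hortonworks text quoted below.

```python
# Back-of-envelope namenode heap estimate.
# ASSUMPTION: ~150 bytes of heap per namespace object (file, directory, or
# block) is a widely quoted rule of thumb, not a measured value.
BYTES_PER_OBJECT = 150


def namenode_heap_gib(namespace_objects, bytes_per_object=BYTES_PER_OBJECT):
    """Approximate heap (GiB) needed to keep the namespace in memory."""
    return namespace_objects * bytes_per_object / 2**30


def per_namenode_objects(total_objects, namenodes):
    """Objects per namenode if federation split the namespace evenly.

    ASSUMPTION: an even split; real namespaces are partitioned by
    directory tree and are rarely balanced. Rounds up.
    """
    if namenodes < 1:
        raise ValueError("need at least one namenode")
    return -(-total_objects // namenodes)  # ceiling division


if __name__ == "__main__":
    # Object counts taken from the quoted Hortonworks text.
    deployments = [("Yahoo!", 180_000_000), ("Facebook", 300_000_000)]
    for label, objects in deployments:
        single = namenode_heap_gib(objects)
        split = namenode_heap_gib(per_namenode_objects(objects, 4))
        print(f"{label}: {objects:,} objects, "
              f"~{single:.1f} GiB on one namenode, "
              f"~{split:.1f} GiB each across 4 federated namenodes")
```

This only estimates metadata memory. It says nothing about RPC load, block report handling, or GC pauses, which may matter as much as raw heap when comparing federation against separate clusters.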

On Sun, Jun 5, 2016 at 12:19 PM, daemeon reiydelle <[email protected]> wrote:

> There are indeed many tuning points here. If the name nodes and journal
> nodes can be larger, perhaps even bonding multiple 10gbyte nics, one can
> easily scale. I did have one client where the file counts forced multiple
> clusters. But we were able to differentiate by airframe types ... eg fixed
> wing in one, rotary subsonic in another, etc.
>
> sent from my mobile
> Daemeon C.M. Reiydelle
> USA 415.501.0198
> London +44.0.20.8144.9872
> On Jun 4, 2016 2:23 PM, "Gavin Yue" <[email protected]> wrote:
>
>> Here is what I found on Horton website.
>>
>>
>> *Namespace scalability*
>>
>> While HDFS cluster storage scales horizontally with the addition of
>> datanodes, the namespace does not. Currently the namespace can only be
>> vertically scaled on a single namenode. The namenode stores the entire
>> file system metadata in memory. This limits the number of blocks, files,
>> and directories supported on the file system to what can be accommodated in
>> the memory of a single namenode. A typical large deployment at Yahoo!
>> includes an HDFS cluster with 2700-4200 datanodes with 180 million files
>> and blocks, and address ~25 PB of storage. At Facebook, HDFS has around
>> 2600 nodes, 300 million files and blocks, addressing up to 60PB of storage.
>> While these are very large systems and good enough for majority of Hadoop
>> users, a few deployments that might want to grow even larger could find the
>> namespace scalability limiting.
>>
>>
>>
>> On Jun 4, 2016, at 04:43, Ascot Moss <[email protected]> wrote:
>>
>> Hi,
>>
>> I read some (old?) articles from Internet about Mapr-FS vs HDFS.
>>
>> https://www.mapr.com/products/m5-features/no-namenode-architecture
>>
>> It states that HDFS Federation has
>>
>> a) "Multiple Single Points of Failure", is it really true?
>> Why MapR uses HDFS but not HDFS2 in its comparison as this would lead to
>> an unfair comparison (or even misleading comparison)? (HDFS was from
>> Hadoop 1.x, the old generation) HDFS2 is available since 2013-10-15, there
>> is no any Single Points of Failure in HDFS2.
>>
>> b) "Limit to 50-200 million files", is it really true?
>> I have seen so many real world Hadoop Clusters with over 10PB data, some
>> even with 150PB data. If "Limit to 50 -200 millions files" were true in
>> HDFS2, why are there so many production Hadoop clusters in real world? how
>> can they mange well the issue of "Limit to 50-200 million files"? For
>> instances, the Facebook's "Like" implementation runs on HBase at Web
>> Scale, I can image HBase generates huge number of files in Facbook's Hadoop
>> cluster, the number of files in Facebook's Hadoop cluster should be much
>> much bigger than 50-200 million.
>>
>> From my point of view, in contrast, MaprFS should have true limitation up
>> to 1T files while HDFS2 can handle true unlimited files, please do correct
>> me if I am wrong.
>>
>> c) "Performance Bottleneck", again, is it really true?
>> MaprFS does not have namenode in order to gain file system performance.
>> If without Namenode, MaprFS would lose Data Locality which is one of the
>> beauties of Hadoop If Data Locality is no longer available, any big data
>> application running on MaprFS might gain some file system performance but
>> it would totally lose the true gain of performance from Data Locality
>> provided by Hadoop's namenode (gain small lose big)
>>
>> d) "Commercial NAS required"
>> Is there any wiki/blog/discussion about Commercial NAS on Hadoop
>> Federation?
>>
>> regards
>>
