My previous post said "128 000 000 million". That was incorrect (it reads as a million million).
What I meant is 128 million: 1 GB of RAM is roughly 1 million files.

On 5 Jun 2016 at 16:58, "Ascot Moss" <[email protected]> wrote:

> HDFS2 "Limit to 50-200 million files", is it really true like what MapR
> says?
>
> On Sun, Jun 5, 2016 at 7:55 PM, Hayati Gonultas <[email protected]>
> wrote:
>
>> I forgot to mention the file system limit.
>>
>> Yes, HDFS has a limit, because for performance considerations the HDFS
>> filesystem is read from disk into RAM and the rest of the work is done in RAM.
>> So RAM should be big enough to fit the filesystem image. But HDFS has
>> configuration options like har files (Hadoop Archive) to defeat these
>> limitations.
>>
>> On Sun, Jun 5, 2016 at 11:14 AM, Ascot Moss <[email protected]> wrote:
>>
>>> Will the common pool of datanodes and namenode federation be a more
>>> effective alternative in HDFS2 than multiple clusters?
>>>
>>> On Sun, Jun 5, 2016 at 12:19 PM, daemeon reiydelle <[email protected]>
>>> wrote:
>>>
>>>> There are indeed many tuning points here. If the name nodes and journal
>>>> nodes can be larger, perhaps even bonding multiple 10gbyte nics, one can
>>>> easily scale. I did have one client where the file counts forced multiple
>>>> clusters. But we were able to differentiate by airframe types ... eg fixed
>>>> wing in one, rotary subsonic in another, etc.
>>>>
>>>> sent from my mobile
>>>> Daemeon C.M. Reiydelle
>>>> USA 415.501.0198
>>>> London +44.0.20.8144.9872
>>>> On Jun 4, 2016 2:23 PM, "Gavin Yue" <[email protected]> wrote:
>>>>
>>>>> Here is what I found on Horton website.
>>>>>
>>>>>
>>>>> *Namespace scalability*
>>>>>
>>>>> While HDFS cluster storage scales horizontally with the addition of
>>>>> datanodes, the namespace does not. Currently the namespace can only be
>>>>> vertically scaled on a single namenode. The namenode stores the entire
>>>>> file system metadata in memory.
>>>>> This limits the number of blocks, files,
>>>>> and directories supported on the file system to what can be accommodated
>>>>> in the memory of a single namenode. A typical large deployment at Yahoo!
>>>>> includes an HDFS cluster with 2700-4200 datanodes with 180 million
>>>>> files and blocks, addressing ~25 PB of storage. At Facebook, HDFS has
>>>>> around 2600 nodes, 300 million files and blocks, addressing up to 60 PB of
>>>>> storage. While these are very large systems and good enough for the majority
>>>>> of Hadoop users, a few deployments that might want to grow even larger could
>>>>> find the namespace scalability limiting.
>>>>>
>>>>>
>>>>>
>>>>> On Jun 4, 2016, at 04:43, Ascot Moss <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I read some (old?) articles from the Internet about Mapr-FS vs HDFS.
>>>>>
>>>>> https://www.mapr.com/products/m5-features/no-namenode-architecture
>>>>>
>>>>> It states that HDFS Federation has
>>>>>
>>>>> a) "Multiple Single Points of Failure", is it really true?
>>>>> Why does MapR use HDFS but not HDFS2 in its comparison, as this would lead
>>>>> to an unfair comparison (or even a misleading comparison)? (HDFS was from
>>>>> Hadoop 1.x, the old generation.) HDFS2 has been available since 2013-10-15;
>>>>> there is no Single Point of Failure in HDFS2.
>>>>>
>>>>> b) "Limit to 50-200 million files", is it really true?
>>>>> I have seen so many real world Hadoop clusters with over 10 PB of data,
>>>>> some even with 150 PB of data. If "Limit to 50-200 million files" were true
>>>>> in HDFS2, why are there so many production Hadoop clusters in the real world?
>>>>> How can they manage the issue of "Limit to 50-200 million files"? For
>>>>> instance, Facebook's "Like" implementation runs on HBase at Web
>>>>> Scale. I can imagine HBase generates a huge number of files in Facebook's
>>>>> Hadoop cluster; the number of files in Facebook's Hadoop cluster should be
>>>>> much, much bigger than 50-200 million.
>>>>>
>>>>> From my point of view, in contrast, MaprFS should have a true limitation
>>>>> of up to 1T files while HDFS2 can handle truly unlimited files; please do
>>>>> correct me if I am wrong.
>>>>>
>>>>> c) "Performance Bottleneck", again, is it really true?
>>>>> MaprFS does not have a namenode in order to gain file system
>>>>> performance. Without a Namenode, MaprFS would lose Data Locality, which is
>>>>> one of the beauties of Hadoop. If Data Locality is no longer available,
>>>>> any big data application running on MaprFS might gain some file system
>>>>> performance, but it would totally lose the true gain of performance from
>>>>> Data Locality provided by Hadoop's namenode (gain small, lose big).
>>>>>
>>>>> d) "Commercial NAS required"
>>>>> Is there any wiki/blog/discussion about Commercial NAS on Hadoop
>>>>> Federation?
>>>>>
>>>>> regards
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>
>>
>> --
>> Hayati Gonultas
>>
>
>
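
For reference, the rule of thumb at the top of this message (roughly 1 GB of NameNode RAM per 1 million files) can be turned into a quick back-of-the-envelope estimate. This is only a sketch under that assumption: the ratio is a rough heuristic, not an HDFS constant, the function names are made up for illustration, and the 128 GB figure is inferred from the "128 million" number rather than stated in the thread.

```python
# Assumption taken from the post: ~1 GB of NameNode RAM per ~1 million files.
# This is a rough heuristic, not a value defined by HDFS.
FILES_PER_GB = 1_000_000


def estimated_file_capacity(namenode_ram_gb, files_per_gb=FILES_PER_GB):
    """Rough number of files a NameNode with this much RAM could track."""
    if namenode_ram_gb < 0:
        raise ValueError("RAM size must be non-negative")
    return int(namenode_ram_gb * files_per_gb)


def required_ram_gb(file_count, files_per_gb=FILES_PER_GB):
    """Rough NameNode RAM (in GB) needed to track this many files."""
    if file_count < 0:
        raise ValueError("file count must be non-negative")
    return file_count / files_per_gb


# 128 GB is an inferred example size matching the "128 million" figure.
print(estimated_file_capacity(128))   # 128000000
# Upper end of the "50-200 million files" claim discussed in the thread.
print(required_ram_gb(200_000_000))   # 200.0
```

Under this heuristic, the "50-200 million files" range corresponds to a NameNode with roughly 50-200 GB of RAM, which is why the limit is usually described in terms of NameNode memory rather than as a fixed file count.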
