Hi Aaron, the MapR site says of HDFS [now HDFS2]: "Limit to 50-200 million files". Is it really true?
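
For what it is worth, here is the back-of-envelope arithmetic I would use to sanity-check that figure. It is only a rough sketch: the ~150 bytes of namenode heap per namespace object (file, directory or block) is a commonly quoted rule of thumb that I am assuming, not a number from the MapR page, and `namenode_heap_gib` and `blocks_per_file` are names made up for illustration.

```python
# Rough namenode heap estimate for a given number of files.
# ASSUMPTION: ~150 bytes of heap per namespace object (file or block).
# This is a widely quoted rule of thumb, not a measured value.
BYTES_PER_OBJECT = 150


def namenode_heap_gib(num_files, blocks_per_file=1.0):
    """Estimate heap (GiB) needed to hold the metadata for num_files files.

    Each file counts as one object plus blocks_per_file block objects.
    Directories are ignored to keep the sketch simple.
    """
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT / 2**30


for files in (50_000_000, 200_000_000, 1_000_000_000):
    print(f"{files:>13,} files -> ~{namenode_heap_gib(files):.0f} GiB heap")
```

If that assumption is roughly right, the figure would reflect how much heap a single namenode can hold rather than a hard-coded limit, but I would like to hear whether this matches what people see in practice.
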
On Tue, Jun 7, 2016 at 12:09 AM, Aaron Eng <[email protected]> wrote:

> As I said, MapRFS has topologies. You assign a volume (which is mounted
> at a directory path) to a topology and in turn all the data for the volume
> (e.g. under the directory) is stored on the storage hardware assigned to
> the topology.
>
> These topological labels provide the same benefits as dfs.storage.policy
> as well as enabling additional types of use cases.
>
> On Mon, Jun 6, 2016 at 9:02 AM, Ascot Moss <[email protected]> wrote:
>
>> In HDFS2, I can find "dfs.storage.policy"; for instance, HDFS2 allows
>> you to *apply the COLD storage policy to a directory*.
>> Where are these features in Mapr-FS?
>>
>> On Mon, Jun 6, 2016 at 11:43 PM, Aaron Eng <[email protected]> wrote:
>>
>>> > Since MapR is proprietary, I find that it has many compatibility
>>> > issues in Apache open source projects
>>>
>>> This is faulty logic. And rather than saying it has "many compatibility
>>> issues", perhaps you can describe one.
>>>
>>> Both MapRFS and HDFS are accessible through the same API. The backend
>>> implementations are what differ.
>>>
>>> > Hadoop has a built-in storage policy named COLD, where is it in
>>> > Mapr-FS?
>>>
>>> Long before HDFS had storage policies, MapRFS had topologies. You can
>>> restrict particular types of storage to a topology and then assign a volume
>>> (subset of data stored in MapRFS) to the topology, and hence the data in
>>> that subset would be served by whatever hardware was mapped into the
>>> topology.
>>>
>>> > not to mention that Mapr-FS loses Data-Locality.
>>>
>>> This statement is false.
>>>
>>> On Mon, Jun 6, 2016 at 8:32 AM, Ascot Moss <[email protected]> wrote:
>>>
>>>> Since MapR is proprietary, I find that it has many compatibility
>>>> issues in Apache open source projects, or even worse, loses Hadoop's
>>>> features. For instance, Hadoop has a built-in storage policy named COLD,
>>>> where is it in Mapr-FS?
>>>> not to mention that Mapr-FS loses Data-Locality.
>>>>
>>>> On Mon, Jun 6, 2016 at 11:26 PM, Ascot Moss <[email protected]>
>>>> wrote:
>>>>
>>>>> I don't think HDFS2 needs a SAN; using the QuorumJournal approach is
>>>>> much better than using the shared edits directory (SAN) approach.
>>>>>
>>>>> On Monday, June 6, 2016, Peyman Mohajerian <[email protected]> wrote:
>>>>>
>>>>>> It is very common practice to back up the metadata in some SAN store.
>>>>>> So the idea of complete loss of all the metadata is preventable. You
>>>>>> could lose a day's worth of data if e.g. you back up the metadata once
>>>>>> a day, but you could do it more frequently. I'm not saying S3 or Azure
>>>>>> Blob are bad ideas.
>>>>>>
>>>>>> On Sun, Jun 5, 2016 at 8:19 AM, Marcin Tustin <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> The namenode architecture is a source of fragility in HDFS. While a
>>>>>>> high availability deployment (with two namenodes, and a failover
>>>>>>> mechanism) means you're unlikely to see service interruption, it is
>>>>>>> still possible to have a complete loss of filesystem metadata with
>>>>>>> the loss of two machines.
>>>>>>>
>>>>>>> Secondly, because HDFS identifies datanodes by their hostname/IP,
>>>>>>> DNS changes can cause havoc with HDFS (see my war story on this here:
>>>>>>> https://medium.com/handy-tech/renaming-hdfs-datanodes-considered-terribly-harmful-2bc2f37aabab
>>>>>>> ).
>>>>>>>
>>>>>>> Also, the namenode/datanode architecture probably does contribute to
>>>>>>> the small files problem being a problem. That said, there are a lot of
>>>>>>> practical solutions for the small files problem.
>>>>>>>
>>>>>>> If you're just setting up a data infrastructure, I would say
>>>>>>> consider alternatives before you pick HDFS. If you run in AWS, S3 is a
>>>>>>> good alternative. If you run in some other cloud, it's probably worth
>>>>>>> considering whatever their equivalent storage system is.
>>>>>>>
>>>>>>> On Sat, Jun 4, 2016 at 7:43 AM, Ascot Moss <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I read some (old?) articles on the Internet about Mapr-FS vs HDFS.
>>>>>>>>
>>>>>>>> https://www.mapr.com/products/m5-features/no-namenode-architecture
>>>>>>>>
>>>>>>>> It states that HDFS Federation has
>>>>>>>>
>>>>>>>> a) "Multiple Single Points of Failure", is it really true?
>>>>>>>> Why does MapR use HDFS rather than HDFS2 in its comparison, as this
>>>>>>>> would lead to an unfair (or even misleading) comparison? (HDFS was
>>>>>>>> from Hadoop 1.x, the old generation.) HDFS2 has been available since
>>>>>>>> 2013-10-15, and there is no single point of failure in HDFS2.
>>>>>>>>
>>>>>>>> b) "Limit to 50-200 million files", is it really true?
>>>>>>>> I have seen so many real-world Hadoop clusters with over 10PB of data,
>>>>>>>> some even with 150PB of data. If "Limit to 50-200 million files" were
>>>>>>>> true in HDFS2, why are there so many production Hadoop clusters in the
>>>>>>>> real world? How can they manage the issue of "Limit to 50-200 million
>>>>>>>> files"? For instance, Facebook's "Like" implementation runs on HBase
>>>>>>>> at Web Scale. I can imagine HBase generates a huge number of files in
>>>>>>>> Facebook's Hadoop cluster; the number of files in Facebook's Hadoop
>>>>>>>> cluster should be much much bigger than 50-200 million.
>>>>>>>>
>>>>>>>> From my point of view, in contrast, MaprFS should have a true
>>>>>>>> limit of up to 1T files while HDFS2 can handle a truly unlimited
>>>>>>>> number of files; please do correct me if I am wrong.
>>>>>>>>
>>>>>>>> c) "Performance Bottleneck", again, is it really true?
>>>>>>>> MaprFS does not have a namenode, in order to gain file system
>>>>>>>> performance.
>>>>>>>> Without a Namenode, MaprFS would lose Data Locality, which is
>>>>>>>> one of the beauties of Hadoop. If Data Locality is no longer
>>>>>>>> available, any big data application running on MaprFS might gain some
>>>>>>>> file system performance, but it would totally lose the true gain of
>>>>>>>> performance from Data Locality provided by Hadoop's namenode (gain
>>>>>>>> small, lose big).
>>>>>>>>
>>>>>>>> d) "Commercial NAS required"
>>>>>>>> Is there any wiki/blog/discussion about Commercial NAS on Hadoop
>>>>>>>> Federation?
>>>>>>>>
>>>>>>>> regards
