In HDFS2, I can find "dfs.storage.policy"; for instance, HDFS2 allows one to *apply the COLD storage policy to a directory*. Where are these features in MapR-FS?
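To make the question concrete, this is the kind of thing I mean on the HDFS2 side. It is only a minimal sketch, assuming Hadoop 2.6+ and the DistributedFileSystem API; the path "/data/archive" is just an example:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class ColdPolicyExample {
        public static void main(String[] args) throws Exception {
            // Connect to the default HDFS cluster defined in
            // core-site.xml / hdfs-site.xml.
            Configuration conf = new Configuration();
            DistributedFileSystem dfs =
                (DistributedFileSystem) FileSystem.get(conf);

            // Assign the built-in COLD policy to a directory; blocks written
            // under it are then placed on ARCHIVE storage.
            dfs.setStoragePolicy(new Path("/data/archive"), "COLD");
        }
    }

The same can be done from the shell with "hdfs storagepolicies -setStoragePolicy".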
On Mon, Jun 6, 2016 at 11:43 PM, Aaron Eng <[email protected]> wrote:

> >Since MapR is proprietary, I find that it has many compatibility issues
> >in Apache open source projects
>
> This is faulty logic. And rather than saying it has "many compatibility
> issues", perhaps you can describe one.
>
> Both MapRFS and HDFS are accessible through the same API. The backend
> implementations are what differs.
>
> >Hadoop has a built-in storage policy named COLD, where is it in Mapr-FS?
>
> Long before HDFS had storage policies, MapRFS had topologies. You can
> restrict particular types of storage to a topology and then assign a
> volume (a subset of the data stored in MapRFS) to that topology; the data
> in that subset is then served by whatever hardware was mapped into the
> topology.
>
> >not to mention that Mapr-FS loses Data-Locality.
>
> This statement is false.
>
> On Mon, Jun 6, 2016 at 8:32 AM, Ascot Moss <[email protected]> wrote:
>
>> Since MapR is proprietary, I find that it has many compatibility issues
>> in Apache open source projects, or, even worse, loses Hadoop's features.
>> For instance, Hadoop has a built-in storage policy named COLD; where is
>> it in MapR-FS? Not to mention that MapR-FS loses Data Locality.
>>
>> On Mon, Jun 6, 2016 at 11:26 PM, Ascot Moss <[email protected]> wrote:
>>
>>> I don't think HDFS2 needs a SAN; using the QuorumJournal approach is
>>> much better than using the shared-edits-directory (SAN) approach
>>> [see the configuration sketch after the quoted thread].
>>>
>>> On Monday, June 6, 2016, Peyman Mohajerian <[email protected]> wrote:
>>>
>>>> It is very common practice to back up the metadata in some SAN store,
>>>> so a complete loss of all the metadata is preventable. You could lose
>>>> a day's worth of data if, for example, you back up the metadata once a
>>>> day, but you could do it more frequently. I'm not saying S3 or Azure
>>>> Blob are bad ideas.
>>>>
>>>> On Sun, Jun 5, 2016 at 8:19 AM, Marcin Tustin <[email protected]>
>>>> wrote:
>>>>
>>>>> The namenode architecture is a source of fragility in HDFS. While a
>>>>> high-availability deployment (with two namenodes and a failover
>>>>> mechanism) means you're unlikely to see a service interruption, it is
>>>>> still possible to lose all filesystem metadata with the loss of two
>>>>> machines.
>>>>>
>>>>> Secondly, because HDFS identifies datanodes by their hostname/IP, DNS
>>>>> changes can cause havoc with HDFS (see my war story on this here:
>>>>> https://medium.com/handy-tech/renaming-hdfs-datanodes-considered-terribly-harmful-2bc2f37aabab
>>>>> ).
>>>>>
>>>>> Also, the namenode/datanode architecture probably does contribute to
>>>>> the small-files problem. That said, there are a lot of practical
>>>>> solutions for the small-files problem.
>>>>>
>>>>> If you're just setting up a data infrastructure, I would say consider
>>>>> alternatives before you pick HDFS. If you run in AWS, S3 is a good
>>>>> alternative. If you run in some other cloud, it's probably worth
>>>>> considering whatever their equivalent storage system is.
>>>>>
>>>>> On Sat, Jun 4, 2016 at 7:43 AM, Ascot Moss <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I read some (old?) articles from the Internet about MapR-FS vs HDFS:
>>>>>>
>>>>>> https://www.mapr.com/products/m5-features/no-namenode-architecture
>>>>>>
>>>>>> It states that HDFS Federation has:
>>>>>>
>>>>>> a) "Multiple Single Points of Failure" - is this really true?
>>>>>> Why does MapR compare against HDFS rather than HDFS2? That makes the
>>>>>> comparison unfair, or even misleading. (HDFS came from Hadoop 1.x,
>>>>>> the old generation.) HDFS2 has been available since 2013-10-15, and
>>>>>> there is no Single Point of Failure in HDFS2.
>>>>>>
>>>>>> b) "Limit to 50-200 million files" - is this really true?
>>>>>> I have seen many real-world Hadoop clusters with over 10PB of data,
>>>>>> some even with 150PB. If the "Limit to 50-200 million files" were
>>>>>> true in HDFS2, why are there so many production Hadoop clusters in
>>>>>> the real world, and how do they manage that limit? For instance,
>>>>>> Facebook's "Like" implementation runs on HBase at web scale; I can
>>>>>> imagine HBase generates a huge number of files in Facebook's Hadoop
>>>>>> cluster, so the number of files there should be much, much bigger
>>>>>> than 50-200 million.
>>>>>>
>>>>>> From my point of view, it is the other way around: MapR-FS is the one
>>>>>> with a limit (up to 1T files), while HDFS2 can handle a truly
>>>>>> unlimited number of files. Please correct me if I am wrong.
>>>>>>
>>>>>> c) "Performance Bottleneck" - again, is this really true?
>>>>>> MapR-FS drops the namenode in order to gain file system performance.
>>>>>> But without a namenode, MapR-FS would lose Data Locality, which is
>>>>>> one of the beauties of Hadoop. If Data Locality is no longer
>>>>>> available, a big data application running on MapR-FS might gain some
>>>>>> file system performance, but it would lose the far larger performance
>>>>>> gain that Data Locality, provided through Hadoop's namenode, delivers
>>>>>> (gain small, lose big).
>>>>>>
>>>>>> d) "Commercial NAS required"
>>>>>> Is there any wiki/blog/discussion about commercial NAS on Hadoop
>>>>>> Federation?
>>>>>>
>>>>>> regards
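
P.S. On the QuorumJournal point quoted above, this is roughly the HA setup
I had in mind. It is a minimal sketch only: the nameservice ID, hostnames,
and ports are placeholders, and in practice these properties live in
hdfs-site.xml / core-site.xml rather than being set in code:

    import org.apache.hadoop.conf.Configuration;

    public class QjmHaConfigSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();

            // One logical nameservice served by two NameNodes.
            conf.set("dfs.nameservices", "mycluster");
            conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
            conf.set("dfs.namenode.rpc-address.mycluster.nn1",
                     "nn1.example.com:8020");
            conf.set("dfs.namenode.rpc-address.mycluster.nn2",
                     "nn2.example.com:8020");

            // QuorumJournalManager: edits are replicated to a quorum of
            // JournalNodes instead of a shared NFS/SAN edits directory.
            conf.set("dfs.namenode.shared.edits.dir",
                     "qjournal://jn1.example.com:8485;jn2.example.com:8485;"
                     + "jn3.example.com:8485/mycluster");

            // Automatic failover through ZooKeeper (ZKFC).
            conf.setBoolean("dfs.ha.automatic-failover.enabled", true);
            conf.set("ha.zookeeper.quorum",
                     "zk1.example.com:2181,zk2.example.com:2181,"
                     + "zk3.example.com:2181");

            System.out.println(conf.get("dfs.namenode.shared.edits.dir"));
        }
    }

No shared SAN storage is involved: each JournalNode keeps its own local copy
of the edit log, and a write is acknowledged once a majority of them have it.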
