Re: is HDFS RAID "data locality" efficient?

Ajit Ratnaparkhi Wed, 08 Aug 2012 11:32:25 -0700

Agreed with Steve.
That is the most important use of HDFS RAID: you consume less disk space
with the same reliability and availability guarantees, at the cost of
processing performance. Most of the data in HDFS is cold data. Without HDFS
RAID you end up maintaining 3 replicas of data that is hardly ever going to
be processed again, but you can't remove or move this data to a separate
archive, because if it does need processing, that should happen as soon as possible.
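To put rough numbers on "less disk space", the sketch below compares raw bytes stored per logical byte under plain 3x replication and under a parity-based RAID layout. The stripe lengths, parity counts and reduced replication factors are illustrative assumptions, not settings confirmed in this thread, and the helper names are hypothetical. It ignores partial final stripes and says nothing about durability or read performance.

```python
def replication_overhead(replication: int) -> float:
    """Raw bytes stored per logical byte under plain n-way replication."""
    return float(replication)


def raid_overhead(stripe_length: int, parity_blocks: int,
                  data_replication: int, parity_replication: int) -> float:
    """Raw bytes stored per logical byte for a parity-striped layout.

    Each stripe of `stripe_length` data blocks gets `parity_blocks` parity
    blocks; data and parity blocks keep their own (reduced) replication.
    Assumes full stripes (no partial final stripe).
    """
    raw = stripe_length * data_replication + parity_blocks * parity_replication
    return raw / stripe_length


if __name__ == "__main__":
    # Illustrative parameters only (assumed, not taken from this thread).
    print("3x replication:", replication_overhead(3))
    print("XOR, stripe 10, repl 2/2:", raid_overhead(10, 1, 2, 2))
    print("RS(10,4), repl 1/1:", raid_overhead(10, 4, 1, 1))
```

Running it prints 3.0, 2.2 and 1.4 for the three layouts, which is the kind of saving on cold data being discussed here.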


-Ajit

On Wed, Aug 8, 2012 at 11:01 PM, Steve Loughran <[email protected]> wrote:

>
>
> On 8 August 2012 09:46, Sourygna Luangsay <[email protected]> wrote:
>
>> Hi folks!
>>
>> One scenario I can think of for taking advantage of HDFS RAID
>> without suffering this penalty is:
>>
>> - Using normal HDFS with the default replication=3 for my “fresh data”
>> - Using HDFS RAID for my historical data (which is barely used by M/R)
>>
>
> Exactly: less space used on cold data, with the penalty that access
> performance can be worse. As the majority of data on a Hadoop cluster is
> usually "cold", it's a space- and power-efficient story for the archive data.
>
> --
> Steve Loughran
> Hortonworks Inc
>
>
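Sourygna's scenario amounts to a simple age-based tiering rule. A minimal sketch, assuming a hypothetical 30-day cut-off based on last modification time and a hypothetical reduced replication of 2 for raided files; `StoragePolicy` and `choose_policy` are illustrative names, not part of HDFS or HDFS RAID, and the code only makes the decision without touching a cluster.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StoragePolicy:
    replication: int  # replication factor to apply to the file's blocks
    raided: bool      # whether the file should be handed to RAID


# Hypothetical cut-off between "fresh" and "historical" data.
COLD_AFTER = timedelta(days=30)


def choose_policy(last_modified: datetime, now: datetime,
                  cold_after: timedelta = COLD_AFTER) -> StoragePolicy:
    """Keep fresh data at the default 3x replication; mark older data for RAID.

    The reduced replication of 2 for raided files is an assumption.
    """
    if now - last_modified < cold_after:
        return StoragePolicy(replication=3, raided=False)
    return StoragePolicy(replication=2, raided=True)


if __name__ == "__main__":
    now = datetime(2012, 8, 8)
    print(choose_policy(datetime(2012, 8, 1), now))  # fresh file
    print(choose_policy(datetime(2012, 1, 1), now))  # historical file
```

A time threshold is only one possible proxy for "barely used by M/R"; access statistics would be another, at the cost of having to collect them.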
