Ok... So under Apache Hadoop, how do you specify where a directory will be created on HDFS?
As an example, if I want to create a /coldData directory in HDFS as a place to store my older data sets, how does that get assigned specifically to a RAIDed HDFS (or even to specific machines)? I know I can do this in MapR's distribution, but I am not aware of this feature being made available in the Apache-based releases. Is this part of the latest feature set?

Thx
-Mike

On Aug 8, 2012, at 12:31 PM, Steve Loughran <[email protected]> wrote:

> On 8 August 2012 09:46, Sourygna Luangsay <[email protected]> wrote:
> Hi folks!
>
> One of the scenarios I can think of in order to take advantage of HDFS RAID
> without suffering this penalty is:
>
> - Using normal HDFS with default replication=3 for my "fresh data"
> - Using HDFS RAID for my historical data (that is barely used by M/R)
>
> Exactly: less space used on cold data, with the penalty that access
> performance can be worse. As the majority of data on a Hadoop cluster is
> usually "cold", it's a space- and power-efficient story for the archive data.
>
> --
> Steve Loughran
> Hortonworks Inc
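[Editor's note: at the time of this thread (2012), Apache Hadoop had no per-directory placement control. Later Apache releases (2.6+) added heterogeneous storage and storage policies (HDFS-2832), which address the /coldData use case. A minimal sketch, assuming a cluster where DataNode volumes are tagged with storage types; the paths in the config snippet are hypothetical examples:]

```shell
# DataNode volumes are tagged with a storage type in hdfs-site.xml,
# e.g. (example paths, adjust to your layout):
#   <property>
#     <name>dfs.datanode.data.dir</name>
#     <value>[DISK]/data/1/dfs,[ARCHIVE]/archive/1/dfs</value>
#   </property>

# Create the cold-data directory
hdfs dfs -mkdir /coldData

# Pin it to ARCHIVE-tagged volumes via the COLD storage policy
hdfs storagepolicies -setStoragePolicy -path /coldData -policy COLD

# Verify the policy took effect
hdfs storagepolicies -getStoragePolicy -path /coldData

# Migrate any existing blocks under /coldData to match the policy
hdfs mover -p /coldData
```

Note that the policy governs which storage *types* blocks land on, not specific machines; pinning data to named hosts is still not a feature of the Apache releases.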
