On Wed, Aug 25, 2010 at 8:05 AM, Matthew LeMieux <[email protected]> wrote:
> For those who are curious, using rack awareness to speed up the process of 
> adding and removing nodes did not work in my experiment.
>
> Once the extra rack was no longer needed, HDFS was using up time and 
> bandwidth to duplicate the data on the primary rack over to the transient 
> rack rather than replicating the data on the transient rack over to the primary 
> rack.  With the alternative method of putting all machines in one rack, the 
> data on the permanent machines does not need to be copied over to the 
> transient machines, making the process of replication much faster.
>

Would adding a custom script for rack awareness help here?
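For reference, such a script is just an executable that Hadoop invokes (wired in via the topology.script.file.name property) with one or more datanode addresses as arguments, printing one rack path per address. A minimal sketch, with made-up subnets standing in for the permanent and transient machines:

```shell
# Hypothetical topology script; the subnets below are invented
# examples, not anything from this thread.
map_rack() {
  for node in "$@"; do
    case "$node" in
      10.0.1.*) echo "/rack-permanent" ;;  # permanent machines
      10.0.2.*) echo "/rack-transient" ;;  # transient machines
      *)        echo "/default-rack"   ;;  # anything unrecognized
    esac
  done
}

map_rack "$@"
```

Whether a custom mapping like this avoids the cross-rack re-replication Matthew saw is the open question, of course; the script only controls placement, not the decommissioning order.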


> But, once I've copied the files over to the M/R cluster, is there a way to 
> read in the files, i.e., is there an HFileInputFormat equivalent?
>

No.  We don't have such a beastie at the moment.  It'd be a little
tricky to write: it would, in essence, need to keep the sort order and
merge the content of all the files in the same way it's done inside
the HRegion class that floats in a running HRegionServer (maybe what's
needed is an HRegionInputFormat?).
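To illustrate why: each HFile is sorted internally, but an input format would have to re-merge many such sorted streams back into one globally ordered stream, the way HRegion's scanners do. A toy sketch of that merge step (plain Java with sorted string lists standing in for HFiles — not HBase API):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy model of the merge an HRegionInputFormat would need:
// several individually sorted "files" combined into one sorted
// stream via a priority queue keyed on each file's current head.
public class SortedFileMerge {

    public static List<String> merge(List<List<String>> sortedFiles) {
        // Each queue entry is {fileIndex, offsetWithinFile},
        // ordered by the key it currently points at.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
            Comparator.comparing((int[] e) -> sortedFiles.get(e[0]).get(e[1])));

        for (int i = 0; i < sortedFiles.size(); i++) {
            if (!sortedFiles.get(i).isEmpty()) {
                heads.add(new int[] {i, 0});
            }
        }

        List<String> out = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] head = heads.poll();           // smallest current key
            List<String> file = sortedFiles.get(head[0]);
            out.add(file.get(head[1]));
            if (head[1] + 1 < file.size()) {     // advance within that file
                heads.add(new int[] {head[0], head[1] + 1});
            }
        }
        return out;
    }
}
```

The real thing is harder than this sketch: it would also have to pick the newest version among duplicate keys and honor deletes, which is exactly the logic buried in HRegion.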

You'll need to bring up an hbase instance on the M/R cluster if you
want to run M/R against the hbase content.  Or, just M/R across racks?

St.Ack
