When you write to HDFS, you write N replicas. By default, the first replica is written to the local datanode. When reading, the DFSClient will try to read from the closest replica first.
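You can check where a file's replicas actually landed with something like the sketch below, using the stock Hadoop FileSystem API (the HFile path is made up; substitute a real store file from your cluster):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocality {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Hypothetical HFile path; point this at a real store file under /hbase.
    Path p = new Path("/hbase/usertable/1234567890/family/somehfile");
    FileStatus st = fs.getFileStatus(p);
    // Ask the namenode which datanodes hold each block of the file.
    BlockLocation[] blocks = fs.getFileBlockLocations(st, 0, st.getLen());
    for (BlockLocation b : blocks) {
      System.out.println("offset=" + b.getOffset()
          + " hosts=" + java.util.Arrays.toString(b.getHosts()));
    }
  }
}

If the regionserver's hostname shows up in the hosts list for every block, reads from that region will be local.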
Compactions read from multiple files and write out a single merged file. The newly written file's blocks will all be on the local datanode, barring an anomaly.

St.Ack

On Tue, Oct 12, 2010 at 11:58 AM, Jack Levin <[email protected]> wrote:
> Ryan, can you elaborate how compactions create data locality?
>
> -Jack
>
>
> On Oct 11, 2010, at 10:12 PM, Ryan Rawson <[email protected]> wrote:
>
>> We don't attempt to optimize region placement with hdfs locations yet. One
>> reason is that on a long-lived cluster, compactions create the locality you
>> are looking for. Furthermore, in the old master such an optimization was
>> really hard to do. The new master should make it easier to write such
>> one-off hacks.
>> On Oct 11, 2010 9:43 PM, "Tao Xie" <[email protected]> wrote:
>>> hi, all
>>> I set hdfs replica=1 when running hbase, and the DN and RS co-exist on
>>> each slave node, so the data in the regions managed by an RS will be
>>> stored on its local datanode, right?
>>> But when I restart hbase and the hbase client does gets on the RS, the
>>> datanode reads data from remote datanodes. Does that mean that when the
>>> RS restarts, the regions are re-arranged? If so, is hbase clever enough
>>> to re-adjust the regions? I'm not clear about the underlying mechanism,
>>> so can anyone give me some explanations? Thanks.
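If you don't want to wait for compactions to happen on their own after a restart, you can force one by hand; here is a rough sketch against the 0.90-era client API ("usertable" is just an example table name):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ForceMajorCompact {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // Ask the regionservers to major-compact their regions of this table.
    // The merged store files get their first replica on the local datanode,
    // which restores read locality after regions have moved around.
    admin.majorCompact("usertable");
  }
}

The shell equivalent is: major_compact 'usertable'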
