Ah, thanks for that link. I missed it while browsing the docs. The link from there to this blog post
http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html really answers my questions! :-)

On Wed, Aug 29, 2012 at 2:38 AM, N Keywal <[email protected]> wrote:
> Inline. Just a set of "you're right" :-).
> It's documented here:
> http://hbase.apache.org/book.html#regions.arch.locality
>
> On Wed, Aug 29, 2012 at 8:06 AM, Robert Dyer <[email protected]> wrote:
>>
>> Ok, but does that imply that only 1 of your compute nodes is promised
>> to have all of the data for any given row? The blocks will replicate,
>> but they don't necessarily all replicate to the same nodes, right?
>
> Right.
>
>>
>> So if I have, say, 2 column families (cf1, cf2) and there are 2 physical
>> files on HDFS for those (per region), then those files are created
>> on one datanode (dn1), which will have all blocks local to that node.
>
> Yes. Nit: datanodes don't "see" files, only blocks. But the logic remains
> the same.
>
>>
>> Once it replicates those blocks 2 more times by default, isn't it
>> possible the blocks for cf1 will go to dn2, dn3 while the blocks for
>> cf2 go to dn4, dn5?
>
> Yes, it's possible (and even likely).
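To make the exchange above concrete, here is a toy Python simulation. It is not HDFS code. It assumes a simplified placement rule: the first replica goes on the writer's datanode (the region server's local dn1), and the other two go on random distinct nodes, with rack awareness ignored. The node names and the helpers `place_replicas` and `nodes_with_all_blocks` are hypothetical.

```python
import random


def place_replicas(writer, datanodes, replication=3, rng=random):
    """Simplified placement (assumption): first replica on the writer's node,
    remaining replicas on distinct other nodes chosen at random.
    Rack awareness is deliberately ignored."""
    others = [dn for dn in datanodes if dn != writer]
    return [writer] + rng.sample(others, replication - 1)


def nodes_with_all_blocks(placements):
    """Return the set of datanodes that hold a replica of every block."""
    return set.intersection(*(set(p) for p in placements))


datanodes = [f"dn{i}" for i in range(1, 11)]
rng = random.Random(42)  # fixed seed so runs are repeatable

# One block per column-family store file, both written by the region server on dn1.
blocks = {
    "cf1": place_replicas("dn1", datanodes, rng=rng),
    "cf2": place_replicas("dn1", datanodes, rng=rng),
}
for cf, nodes in blocks.items():
    print(cf, nodes)

# Always contains dn1; any other node appears only by chance.
print("nodes holding every block:", nodes_with_all_blocks(blocks.values()))
```

Under this model only the writer's node is guaranteed to hold a local copy of every column family's blocks, which matches the "yes, and even likely" answer about cf1 and cf2 replicas diverging.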
