Data locality in HBase

Ben Kim Thu, 14 Jun 2012 21:57:43 -0700

Hi,

I've been posting questions to the mailing list quite often lately, and
here goes another one, this time about data locality.
I read the excellent blog post about data locality that Lars George wrote
at http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html


I understand data locality in HBase as placing a region on the region server
where most of its data blocks reside.
That way fast data access is guaranteed when running an MR job, because each
map/reduce task runs, one per region, on the tasktracker co-located with that
region.

But what if the data blocks of a region are evenly spread over multiple
region servers?
Does an MR task have to access the data blocks remotely from other
region servers?
How good is HBase at placing data blocks where a region resides?
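
To make the two situations above concrete, here is a minimal sketch of how
locality could be measured, assuming locality is simply the fraction of a
region's blocks that have a replica on a given host. The function name
`locality_index`, the host names rs1..rs4, and the block layouts are all
hypothetical and are not part of the HBase or HDFS API.

```python
def locality_index(block_hosts, host):
    """Return the fraction of blocks that have a replica on `host`.

    `block_hosts` is a list with one entry per block; each entry is the
    set of hosts holding a replica of that block (hypothetical input).
    """
    if not block_hosts:
        return 0.0
    local = sum(1 for replicas in block_hosts if host in replicas)
    return local / len(block_hosts)


# Hypothetical region with 4 blocks and 3 replicas per block.
# Case 1: every block has a replica on rs1 (blocks concentrated on rs1).
concentrated = [
    {"rs1", "rs2", "rs3"},
    {"rs1", "rs3", "rs4"},
    {"rs1", "rs2", "rs4"},
    {"rs1", "rs2", "rs3"},
]

# Case 2: replicas spread evenly, so each host holds 3 of the 4 blocks.
spread = [
    {"rs1", "rs2", "rs3"},
    {"rs2", "rs3", "rs4"},
    {"rs3", "rs4", "rs1"},
    {"rs4", "rs1", "rs2"},
]

print(locality_index(concentrated, "rs1"))  # 1.0
print(locality_index(spread, "rs1"))        # 0.75
```

In the second layout no host reaches 1.0, so under this assumption a task
running next to the region would have to read the remaining blocks from
another host.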

Also, is it correct to say that if I set a smaller data block size, data
locality gets worse, and if the data block size gets bigger, data locality
gets better?

Best regards,
-- 

*Benjamin Kim*
*benkimkimben at gmail*
