While data locality is nice, you may see it becoming less of a bonus or an issue.

With coprocessors available, indexing becomes viable. So you may see jobs where, within the M/R, you process a row from table A, maybe hit an index to find a value in table B, and then do some processing. There's more to it, like using the intersection of indexes across a common key to join data, which blows the whole issue of data locality out of the water. But that's a longer discussion requiring a whiteboard, and some alcohol or coffee... ;-)
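To make that pattern concrete, here is a minimal sketch of a mapper that scans table A and, for each row, does a point lookup against a secondary-index table. All names here ("indexTable", the "cf:fk" column) are hypothetical, not from this thread, and the index itself would be maintained elsewhere, e.g. by a coprocessor. Note the lookup is a plain client Get that goes to whichever region server holds the index row, so locality does not apply to it.

import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

public class IndexLookupMapper extends TableMapper<ImmutableBytesWritable, Text> {
  private HTable indexTable; // remote point reads; data locality does not apply here

  @Override
  protected void setup(Context context) throws IOException {
    // "indexTable" is a hypothetical secondary-index table, maintained
    // for example by a coprocessor on table B.
    indexTable = new HTable(context.getConfiguration(), "indexTable");
  }

  @Override
  protected void map(ImmutableBytesWritable row, Result value, Context context)
      throws IOException, InterruptedException {
    // Use a value from table A's row as the key into the index.
    byte[] joinKey = value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("fk"));
    if (joinKey == null) return;
    Result indexHit = indexTable.get(new Get(joinKey)); // network round trip per row
    if (!indexHit.isEmpty()) {
      context.write(row, new Text(Bytes.toString(indexHit.getRow())));
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    indexTable.close();
  }
}

Since the Get is a round trip per row, at scale you would batch the lookups or cache hot index rows; the sketch keeps it simple.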
On Jun 21, 2012, at 5:07 AM, Lars George wrote:

> Hi Ben,
>
> According to your fsck dump, the first copy is located on hadoop-143, which
> has all the blocks for the region. So if you check, I would assume that the
> region is currently open and served by hadoop-143, right?
>
> The TableInputFormat getSplits() will report that server to the MapReduce
> framework, so the task will run on that node and access the data locally.
> *Only* if you have speculative execution turned on will you have a task
> run on another, random node in parallel, which would need to do heaps of
> remote reads. That is why it is recommended to turn it off when combining
> HBase and MapReduce.
>
> Lars
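The speculative-execution setting Lars refers to is plain job configuration. Below is a minimal sketch of a job driver that turns it off before scanning the table. Assumptions: the Hadoop 1.x property names ("mapred.*") of that era, "testtable" taken from Ben's fsck path below, and the hypothetical IndexLookupMapper sketched above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class LocalScanJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Turn speculative execution off so no duplicate attempt runs on a
    // random node and does heaps of remote reads (Hadoop 1.x property names).
    conf.setBoolean("mapred.map.tasks.speculative.execution", false);
    conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);

    Job job = new Job(conf, "scan-testtable");
    job.setJarByClass(LocalScanJob.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // fewer RPCs per mapper
    scan.setCacheBlocks(false);  // don't churn the block cache from an MR scan

    // This helper sets TableInputFormat, whose getSplits() reports the
    // region's server so each map task is scheduled on that node.
    TableMapReduceUtil.initTableMapperJob("testtable", scan,
        IndexLookupMapper.class, ImmutableBytesWritable.class, Text.class, job);
    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class); // map-only, no output files
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}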
> On Jun 21, 2012, at 6:57 AM, Ben Kim wrote:
>
>> Hi Lars,
>> Thanks a lot for your reply.
>>
>> As you said, a region server writes its HFiles so that all data blocks end
>> up on the same physical machine, unless the region server fails.
>> I ran the following hadoop command to see the locations of an HFile:
>>
>> *hadoop fsck
>> /hbase/testtable/9488ef7fbd23b62b9bf85b722c015e90/testcf/08dc1940944b4952b23f0cbee51bcea8
>> -files -locations -blocks*
>>
>> Here is the output...
>>
>>> FSCK started by hadoop from /203.235.211.142 for path
>>> /hbase/testtable/9488ef7fbd23b62b9bf85b722c015e90/testcf/08dc1940944b4952b23f0cbee51bcea8
>>> at Thu Jun 21 13:40:11 KST 2012
>>> /hbase/testtable/9488ef7fbd23b62b9bf85b722c015e90/testcf/08dc1940944b4952b23f0cbee51bcea8
>>> 727156659 bytes, 11 block(s): OK
>>> 0. blk_1832396139416350298_1296638 len=67108864 repl=3 [hadoop-145:50010, hadoop-143:50010, hadoop-144:50010]
>>> 1. blk_8910330590545256327_1296640 len=67108864 repl=3 [hadoop-143:50010, hadoop-157:50010, hadoop-159:50010]
>>> 2. blk_-3868612696419011016_1296640 len=67108864 repl=3 [hadoop-145:50010, hadoop-143:50010, hadoop-156:50010]
>>> 3. blk_-7551946394410945015_1296640 len=67108864 repl=3 [hadoop-145:50010, hadoop-143:50010, hadoop-157:50010]
>>> 4. blk_-1875839158119319613_1296640 len=67108864 repl=3 [hadoop-145:50010, hadoop-143:50010, hadoop-146:50010]
>>> 5. blk_-6953623390282045248_1296640 len=67108864 repl=3 [hadoop-143:50010, hadoop-157:50010, hadoop-144:50010]
>>> 6. blk_-3016727256928339770_1296640 len=67108864 repl=3 [hadoop-143:50010, hadoop-146:50010, hadoop-159:50010]
>>> 7. blk_3526351456802007773_1296640 len=67108864 repl=3 [hadoop-143:50010, hadoop-160:50010, hadoop-156:50010]
>>> 8. blk_5134681308608742320_1296640 len=67108864 repl=3 [hadoop-145:50010, hadoop-143:50010, hadoop-144:50010]
>>> 9. blk_-6875541109589395450_1296640 len=67108864 repl=3 [hadoop-145:50010, hadoop-143:50010, hadoop-156:50010]
>>> 10. blk_-553661064097182668_1296640 len=56068019 repl=3 [hadoop-143:50010, hadoop-146:50010, hadoop-159:50010]
>>>
>>> Status: HEALTHY
>>> Total size: 727156659 B
>>> Total dirs: 0
>>> Total files: 1
>>> Total blocks (validated): 11 (avg. block size 66105150 B)
>>> Minimally replicated blocks: 11 (100.0 %)
>>> Over-replicated blocks: 0 (0.0 %)
>>> Under-replicated blocks: 0 (0.0 %)
>>> Mis-replicated blocks: 0 (0.0 %)
>>> Default replication factor: 3
>>> Average block replication: 3.0
>>> Corrupt blocks: 0
>>> Missing replicas: 0 (0.0 %)
>>> Number of data-nodes: 9
>>> Number of racks: 1
>>> FSCK ended at Thu Jun 21 13:40:11 KST 2012 in 4 milliseconds
>>
>> As you see, the data blocks of the HFile are stored across two different
>> datanodes (hadoop-145 and hadoop-143).
>>
>> Let's say a map task runs on hadoop-145 and needs to access block 7. Then
>> the map task has to remotely read block 7 from hadoop-143. Almost half of
>> the data blocks would be stored and accessed remotely. Going by the above
>> example, it's hard to say that data locality applies to HBase.
>>
>> Ben
>>
>> On Fri, Jun 15, 2012 at 5:21 PM, Lars George <lars.geo...@gmail.com> wrote:
>>
>>> Hi Ben,
>>>
>>> See inline...
>>>
>>> On Jun 15, 2012, at 6:56 AM, Ben Kim wrote:
>>>
>>>> Hi,
>>>>
>>>> I've been posting questions to the mailing list quite often lately, and
>>>> here goes another one, about data locality.
>>>> I read the excellent blog post about data locality that Lars George wrote
>>>> at http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html
>>>>
>>>> I understand data locality in HBase as locating a region in a region
>>>> server where most of its data blocks reside.
>>>
>>> The opposite is happening, i.e. the region server process causes all the
>>> data it writes to be located on the same physical machine.
>>>
>>>> So that way fast data access is guaranteed when running an MR job,
>>>> because each map/reduce task runs for a region on the tasktracker where
>>>> that region's data is co-located.
>>>
>>> Correct.
>>>
>>>> But what if the data blocks of the region are evenly spread over
>>>> multiple region servers?
>>>
>>> This will not happen unless the original server fails. Then the region is
>>> moved to another server, which now needs to do a lot of remote reads over
>>> the network. This is why there is work being done to allow for custom
>>> placement policies in HDFS. That way you can store the entire region and
>>> all copies as complete units on three data nodes. In case of a failure
>>> you can then move the region to one of the two copies. This is not
>>> available yet, but it is being worked on (so I heard).
>>>
>>>> Does an MR task have to remotely access the data blocks from other
>>>> region servers?
>>>
>>> For the above failure case, it would be the region server accessing the
>>> remote data, yes.
>>>
>>>> How good is HBase at locating data blocks where a region resides?
>>>
>>> That is again the wrong way around. HBase has no clue as to where blocks
>>> reside, nor does it know that the file system in fact uses separate
>>> blocks. HBase stores files; HDFS does the block magic under the hood,
>>> transparently to HBase.
>>>
>>>> Also, is it correct to say that with a smaller data block size data
>>>> locality gets worse, and with a bigger data block size it gets better?
>>>
>>> This is not applicable here; I am assuming it stems from the above
>>> confusion about which system is handling the blocks, HBase or HDFS. See
>>> above.
>>>
>>> HTH,
>>> Lars
>>>
>>>>
>>>> Best regards,
>>>> --
>>>>
>>>> *Benjamin Kim*
>>>> *benkimkimben at gmail*
>>
>>
>> --
>>
>> *Benjamin Kim*
>> *benkimkimben at gmail*
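One way to verify Lars's point at the top of the thread (hadoop-143 holds a replica of every block, so fsck does not contradict locality) is to tally replica hosts per block line. A minimal sketch that reads the fsck output from stdin; the 50010 port in the pattern is the standard datanode port seen in the dump above:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FsckLocalityTally {
  public static void main(String[] args) throws Exception {
    // Matches replica locations like "hadoop-143:50010" in fsck -locations output.
    Pattern host = Pattern.compile("([\\w.-]+):50010");
    Map<String, Integer> counts = new HashMap<String, Integer>();
    int blocks = 0;
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
    String line;
    while ((line = in.readLine()) != null) {
      if (!line.contains("blk_")) continue; // only per-block lines
      blocks++;
      Matcher m = host.matcher(line);
      while (m.find()) {
        Integer c = counts.get(m.group(1));
        counts.put(m.group(1), c == null ? 1 : c + 1);
      }
    }
    // A node hosting a replica of every block can serve the whole HFile locally.
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      System.out.printf("%s: %d/%d blocks local%n", e.getKey(), e.getValue(), blocks);
    }
  }
}

Fed Ben's dump, this reports hadoop-143 with 11/11 blocks and hadoop-145 with 6/11: a map task scheduled on hadoop-143, where the region is served, does no remote reads, while Ben's hypothetical task on hadoop-145 would.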