Hi, 
I have a 10-node cluster, and 8 of the nodes each run a datanode, task node and
HBase region server.
I have an HBase table with one column family and 1.5 TB of data, spread across
55 regions on these 8 region servers. When I run a test scan MR job, it
generates 55 map tasks (matching the 55 regions), but all of them are
rack-local map tasks; not a single one is data-local. The cluster has been
running for weeks, and I ran a major compaction before the MR job. I have run
the MR job several times, and every time I get 55 rack-local map tasks and no
data-local ones. I think something is wrong with my cluster/HBase settings, but
I am not sure what.
All 8 child boxes are running datanode, tasknode and hbase region servers. All 
10 boxes are in one rack.
Here are some differences I observed.
In the MR job running on the HBase table, here is one example task attempt:
Task Attempt:          attempt_201402131137_0469_m_000000_0
Machine:               /default-rack/10.xx.xx.xx
Status:                SUCCEEDED
Progress:              100.00%
Start Time:            24-Feb-2014 09:58:23
Finish Time:           24-Feb-2014 10:31:41 (33mins, 18sec)
Counters:              13
Input Split Locations: /default-rack/real_hostname


As you can see, the input split location shows the real hostname of the box,
while the machine shown for the task attempt is the real IP address of the
machine running the task, which is NOT the same string as the input split
location.
On the other hand, if I run an MR job on HDFS files in this cluster, 30 of the
32 mappers are data-local tasks. Here is the output:
Task Attempt:          attempt_201402131137_0467_m_000000_0
Machine:               /default-rack/10.xx.xx.133
Status:                SUCCEEDED
Progress:              100.00%
Start Time:            24-Feb-2014 09:49:58
Finish Time:           24-Feb-2014 09:50:29 (30sec)
Counters:              20
Input Split Locations: /default-rack/10.xx.xx.133
                       /default-rack/10.xx.xx.135
                       /default-rack/10.xx.xx.140


The difference I see is that the input split locations in the MR job on HDFS
files are shown as real IP addresses, instead of hostnames as in the HBase job.
Could that be the reason I get 0 data-local map tasks in the HBase MR job? If
not, what else could it be?
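
To make the suspicion concrete, here is a minimal Python sketch of the kind of
mismatch being asked about. It is only an illustration under stated
assumptions, not Hadoop's or HBase's actual scheduling code: the function names
(`naive_match`, `normalize`, `is_data_local`), the `FAKE_DNS` table and the
example hostname and addresses are all hypothetical. It shows that a plain
string comparison between a task's machine and its split locations fails when
one side is a hostname and the other an IP address, while comparing after
resolving both sides succeeds.

```python
import socket

def naive_match(task_machine, split_locations):
    """Plain string comparison: a hostname never equals an IP address."""
    return task_machine in split_locations

def normalize(location, resolve=socket.gethostbyname):
    """Drop the rack prefix and resolve the host part to an IP string."""
    host = location.rsplit("/", 1)[-1]
    try:
        return resolve(host)
    except OSError:
        return host  # leave unresolvable names unchanged

def is_data_local(task_machine, split_locations, resolve=socket.gethostbyname):
    """True if the task's machine matches any split location after resolving."""
    machine = normalize(task_machine, resolve)
    return any(normalize(loc, resolve) == machine for loc in split_locations)

# Hypothetical name table standing in for DNS; the name and address are made up.
FAKE_DNS = {"node133.example.com": "10.0.0.133"}

def fake_resolve(host):
    """Return the mapped IP, or the input unchanged if it is not in the table."""
    return FAKE_DNS.get(host, host)
```

With the made-up table above, `naive_match("/default-rack/10.0.0.133",
["/default-rack/node133.example.com"])` is false, whereas `is_data_local` with
`fake_resolve` treats the same pair as a match. Whether the job tracker
actually compares locations this way is exactly the open question here.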
Thanks

Yong                                      
