Hi,

We have a setup with 4 regionservers and a replication factor of 3. We run MapReduce jobs that use HBase as both data source and sink. When running jobs over data stored on the 4 nodes, we noticed in the statistics of a successfully completed job that the majority of the map tasks are "rack-local" rather than "data-local". In this particular case there were 48 maps, of which 19 were data-local and 29 rack-local.

I would have expected the majority of them to be data-local, since with replication the data should be available on 3 out of 4 nodes. Is this a configuration issue, or is my reasoning wrong?
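
For reference, here is a rough sketch of the arithmetic behind my expectation. It naively assumes that every map task is placed on a uniformly chosen node and that the input of each map has a replica on 3 of the 4 nodes. Both are my assumptions, not something I have verified about the scheduler, and the function name is made up for illustration.

```python
from fractions import Fraction


def expected_data_local(num_maps, num_nodes, replication):
    """Naive expected number of data-local maps.

    Assumption (not verified): each map lands on a uniformly random node,
    and its input is replicated on `replication` of the `num_nodes` nodes.
    """
    p_local = Fraction(min(replication, num_nodes), num_nodes)
    return p_local * num_maps


# Numbers taken from the job statistics described above.
total_maps = 48
observed_local = 19
observed_rack = 29

expected = expected_data_local(total_maps, num_nodes=4, replication=3)

print(f"observed data-local: {observed_local}/{total_maps} = {observed_local / total_maps:.1%}")
print(f"observed rack-local: {observed_rack}/{total_maps} = {observed_rack / total_maps:.1%}")
print(f"naively expected data-local: {int(expected)}/{total_maps}")
```

Under these assumptions I would expect about 36 of the 48 maps to be data-local, compared with the 19 we actually observed.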
Thanks,
Felix
