Hi,

We have a setup with 4 regionservers and a replication factor of 3. We are
running MapReduce jobs that use HBase as both data source and sink. In the
statistics of a successfully completed job over data stored on the 4 nodes,
we noticed that the majority of the map tasks are "rack-local" rather than
"data-local". In this particular case we had 48 maps, of which 19 were
data-local and 29 rack-local. I would have expected the majority to be
data-local, since replication should make the data available on 3 out of 4
nodes. Is this a configuration issue, or is my reasoning wrong?
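
To make the expectation concrete, here is a rough sketch of the arithmetic.
It uses only the figures above and rests on a simplifying assumption that
may well be wrong: that each map task lands on a uniformly random node and
that a copy of its input sits on 3 of the 4 nodes. The variable names are
just for illustration.

```python
from fractions import Fraction

# Figures from the job statistics quoted above.
total_maps = 48
data_local = 19
rack_local = 29

# Cluster setup from the post: 4 nodes, replication factor 3.
nodes = 4
replication = 3

# Share of map tasks that actually ran data-local.
observed = Fraction(data_local, total_maps)

# ASSUMPTION (not confirmed): a task is placed on a uniformly random node,
# and its input is present on `replication` distinct nodes out of `nodes`.
naive_expected = Fraction(replication, nodes)

print(f"observed data-local share: {float(observed):.1%}")        # 39.6%
print(f"naive expected share:      {float(naive_expected):.1%}")  # 75.0%
```

So the observed share is roughly half of what this naive model predicts,
which is why I suspect either the configuration or the model itself.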

thanks,
Felix
