Did you pre-split your table, or did you let the balancer assign regions to the regionservers for you?
Did your regionserver(s) fail?

On Thu, Aug 2, 2012 at 8:31 AM, Bryan Keller <[email protected]> wrote:

> I have an 8 node cluster and a table that is pretty well balanced with on
> average 36 regions/node. When I run a mapreduce job on the cluster against
> this table, the data locality of the mappers is poor, e.g. 100 rack local
> mappers and only 188 data local mappers. I would expect nearly all of the
> mappers to be data local. DNS appears to be fine, i.e. the hostname in the
> splits is the same as the hostnames in the task attempts.
>
> The performance of the rack local mappers is poor and causes overall scan
> performance to suffer.
>
> The table isn't new and from what I understand, HDFS replication will
> eventually keep region data blocks local to the regionserver. Are there
> other reasons for data locality to be poor and any way to fix it?

--
Adrien Mogenet
06.59.16.64.22
http://www.mogenet.me
