Thanks, Stack. We're looking into this in depth.
As far as we can tell, DNS and the machine host names are correct.
In .META. it uses fully qualified names (c4n5.gbif.org) so I guess I'll
start looking at the job launching machine.
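
For the check on the launching machine, here is a minimal sketch of a forward-then-reverse lookup using java.net.InetAddress. The class and method names (DnsRoundTrip, reverseOfForward) are hypothetical, and whether this mirrors the lookup TableInputFormatBase performs in a given version is an assumption. In common JDKs getCanonicalHostName is best effort and hands back the textual IP when it cannot determine a name, which would match the symptom here.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper; not part of HBase or Hadoop.
final class DnsRoundTrip {

    // Resolves hostOrIp to an address, then asks for the canonical name of
    // that address (a reverse lookup). Returns null if the forward lookup fails.
    static String reverseOfForward(String hostOrIp) {
        try {
            InetAddress address = InetAddress.getByName(hostOrIp);
            // Best effort: may return the textual IP if no name can be found.
            return address.getCanonicalHostName();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Pass host names or IPs as arguments, e.g. c4n1.gbif.org 130.226.238.181
        for (String name : args) {
            System.out.println(name + " -> " + reverseOfForward(name));
        }
    }
}
```

Running it on the launching machine with both the names and the IPs of the region servers should show whether any of them come back as a bare IP.
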
The code you link to is quite different from the TableInputFormatBase in
CDH3u3. I patched the CDH3u3 version with the following to verify for
myself that it would work, and it did (I've got a blog post about the
performance, which you'll like):

// patch the possible GBIF DNS issue - TT report differing things to
// split locations
// Task attempts show as /default-rack/c4n2.gbif.org
// splits are coming in as /default-rack/130.226.238.182
regionLocation = regionLocation.replaceAll("130.226.238.181", "c4n1.gbif.org");
regionLocation = regionLocation.replaceAll("130.226.238.182", "c4n2.gbif.org");
regionLocation = regionLocation.replaceAll("130.226.238.183", "c4n3.gbif.org");
regionLocation = regionLocation.replaceAll("130.226.238.184", "c4n4.gbif.org");
regionLocation = regionLocation.replaceAll("130.226.238.185", "c4n5.gbif.org");
regionLocation = regionLocation.replaceAll("130.226.238.186", "c4n6.gbif.org");
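
A more general form of the same workaround, as a minimal sketch only. The class name SplitLocationNormalizer and its normalize method are hypothetical (they are not part of HBase or CDH), and the address table is assumed from the six nodes listed above. It uses String.replace, which matches literally, whereas replaceAll treats each "." in the IP as a regex wildcard.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of HBase or CDH.
final class SplitLocationNormalizer {

    // Assumed IP-to-hostname table for the six nodes c4n1..c4n6.
    private static final Map<String, String> IP_TO_HOST =
            new LinkedHashMap<String, String>();

    static {
        for (int i = 1; i <= 6; i++) {
            IP_TO_HOST.put("130.226.238.18" + i, "c4n" + i + ".gbif.org");
        }
    }

    // Replaces any known IP in the location string with its host name.
    // Locations that contain no known IP are returned unchanged.
    static String normalize(String regionLocation) {
        String result = regionLocation;
        for (Map.Entry<String, String> entry : IP_TO_HOST.entrySet()) {
            // Literal replacement: "." is not treated as a regex wildcard.
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
```

In the patched class this would be called as regionLocation = SplitLocationNormalizer.normalize(regionLocation); it is still a workaround rather than a fix for the underlying name resolution.
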
More when we know more.
Tim
On Mon, May 28, 2012 at 12:32 AM, Stack <[email protected]> wrote:
> On Sun, May 27, 2012 at 1:05 PM, Tim Robertson
> <[email protected]> wrote:
> > Hi all,
> >
> > When I run MR jobs, I don't see data locality because the TT sees
> > /default-rack/c4n1.gbif.org but the TableInputFormat is
> > giving /default-rack/130.226.238.181 (the same machine) when it
> > determines the splits for the job.
>
> It's doing this, Tim:
>
>
> http://hbase.apache.org/xref/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.html#145
>
> On the machine launching the job, it's asking what the region location
> is. What is in the .META. table? Names or IPs? If the former, then it's
> the resolve on the machine launching the job that is mangling it (DNS
> falls back to the IP if there is a problem figuring out the name). Can
> you mess with the DNS on the machine that is launching the job? See if
> you can find an issue in its DNS. (Is this 0.90.X? If so, do its forward
> and reverse DNS give the same answer? If 0.92.1, it shouldn't matter.)
>
> St.Ack
>