I've been asked to migrate an HBase table from HBase 0.92.1 (CDH 4.1.3) to HBase 0.98 (CDH 5.3.0). I originally tried using distcp to copy the data between the HBase versions, but distcp fails, and both the hadoop-user and cdh-user mailing lists have been unable to help me find a way to make it work. I also could not export the HBase table and copy it between clusters.
So I decided to run a MapReduce job against the HBase table to read the rows, serialize them, and push them into Kafka, from which they can be transferred to the newer cluster. Most of my maps have finished, but 10 remain running, and those have now been running 3-4x longer than the longest map that did finish (7 hours). At this point I'm afraid the job will keep running forever.

The job has also slowed down significantly: previously I was seeing 50-150k requests per second against HBase, and now I'm getting only 100-150k requests per minute. Looking at HDFS, I see 43 output paths of roughly 5-10 GB each, which appear to correspond one-to-one with the per-region maps. Given that, I would expect all the maps to be finished, since the longest completed map took 7 hours, yet the remaining ones have been running for 21-36+ hours.

I should also mention that there were 6 failed maps at the start of the job, all within the first 10 minutes, and all on the servers hosting the ROOT and META regions. Unfortunately I cannot view the logs for those failures. My guess is that this is all due to data locality and replication issues.

Is there a simple way for me to map each map task to its respective HBase region? Has anybody experienced a similar slowdown when running a MapReduce job against HBase? HBase experts, can I get some advice on this issue?
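For context, the job is the standard TableInputFormat/TableMapper setup. Below is a simplified sketch, not my exact code: the class name and the Kafka hand-off comment are placeholders, and the setup() logging is there to illustrate one way of tying a map task to its region, since with TableInputFormat each input split should be a TableSplit backed by a single region.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.hbase.util.Bytes;

public class ExportMapper extends TableMapper<ImmutableBytesWritable, Result> {

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // With TableInputFormat, each split is a TableSplit covering one region;
    // logging its location and row range identifies the region this task scans.
    TableSplit split = (TableSplit) context.getInputSplit();
    System.out.println("Region server: " + split.getRegionLocation()
        + " startRow=" + Bytes.toStringBinary(split.getStartRow())
        + " endRow=" + Bytes.toStringBinary(split.getEndRow()));
  }

  @Override
  protected void map(ImmutableBytesWritable rowKey, Result result, Context context)
      throws IOException, InterruptedException {
    // ... serialize the Result and hand it to the Kafka producer here ...
    context.write(rowKey, result);
  }
}
```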