I've been asked to migrate an HBase table from HBase 0.92.1 (CDH 4.1.3) to HBase 0.98 (CDH 5.3.0). I originally tried using distcp to copy the data between the HBase versions, but distcp fails, and both the hadoop-user and cdh-user mailing lists have been unable to help me find a way to make it work. I also could not export the HBase table and copy it between clusters.
So I decided to run a MapReduce job against the HBase table to read the rows, serialize them, and push them into Kafka, from which they can be transferred to the newer cluster. Most of my maps have finished, but 10 remain running, and those have now been running 3-4x longer than the longest map that did finish (7 hours). At this point I'm afraid the job will keep running forever.

The job has also slowed down significantly: previously I was seeing 50-150k requests per second against HBase, and now I'm getting only 100-150k requests per minute. Looking at HDFS, I see 43 output paths of roughly 5-10 GB each, which appear to correspond one-to-one with the per-region maps. Given that, I would expect all the maps to be finished, since the longest completed map took 7 hours, yet the remaining ones have been running for 21-36+ hours.

I should also mention that there were 6 failed maps at the start of the job, all within the first 10 minutes, and all on the servers hosting the ROOT and META regions. Unfortunately I cannot view the logs for those failures. My guess is that this is all due to data locality and replication issues.

Is there a simple way for me to map each map task to its respective HBase region? Has anybody experienced a similar slowdown when running a MapReduce job against HBase? HBase experts, can I get some advice on this issue?
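For context, the job is the standard TableInputFormat/TableMapper setup. Below is a simplified sketch, not my exact code: the class name and the Kafka hand-off comment are placeholders, and the setup() logging is there to illustrate one way of tying a map task to its region, since with TableInputFormat each input split should be a TableSplit backed by a single region.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.hbase.util.Bytes;

public class ExportMapper extends TableMapper<ImmutableBytesWritable, Result> {

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    // With TableInputFormat, each split is a TableSplit covering one region;
    // logging its location and row range identifies the region this task scans.
    TableSplit split = (TableSplit) context.getInputSplit();
    System.out.println("Region server: " + split.getRegionLocation()
        + " startRow=" + Bytes.toStringBinary(split.getStartRow())
        + " endRow=" + Bytes.toStringBinary(split.getEndRow()));
  }

  @Override
  protected void map(ImmutableBytesWritable rowKey, Result result, Context context)
      throws IOException, InterruptedException {
    // ... serialize the Result and hand it to the Kafka producer here ...
    context.write(rowKey, result);
  }
}
```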