After trying to get help with distcp on the hadoop-user and cdh-user mailing lists, I've given up on using distcp and exporttable to migrate my HBase from 0.92.1 on CDH 4.1.3 to 0.98 on CDH 5.3.0.
Instead I've been working on an HBase MapReduce job to serialize my entries and insert them into Kafka, after which I plan to re-import them into CDH 5.3.0. Currently I'm having trouble with this job. I have 43 maps: 33 have finished successfully and 10 are still running. I had previously seen request rates of 50-150k per second; for the final 10 maps I'm now seeing 100-150k per minute. I should also mention that there were 6 failures near application start. Unfortunately I cannot read the logs for those 6 failures; there is an exception related to the YARN logging for those maps, maybe because they failed to start.

I had a look around HDFS, and it appears the regions are all between 5-10 GB. The longest completed map so far took 7 hours, with the majority taking around 3.5 hours. The remaining 10 maps have each been running for 23-27 hours. As for data locality: 6 of the remaining maps are running on the same rack, and the other 4 are split between my other two racks. There should be a replica on each rack, since replication appears to be set to 3, so I'm not sure locality is really the cause of the slowdown. So I'm looking for advice on what I can do to troubleshoot this job.

I'm setting up the map job like this:

    main(String[] args) {
        ...
        Scan fromScan = new Scan();
        System.out.println(fromScan);

        TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan, Map.class,
            null, null, job, true, TableInputFormat.class);

        // My guess is this controls the output type for the reduce function, based on
        // setOutputKeyClass and setOutputValueClass from p. 27. Since there is no
        // reduce step, this is currently null.
        job.setOutputFormatClass(NullOutputFormat.class);
        job.setNumReduceTasks(0);

        job.submit();
        ...
    }

I'm not performing a reduce step, and I'm traversing the row keys like this:

    public void map(final ImmutableBytesWritable fromRowKey, Result fromResult, Context context) throws IOException {
        ...
        // Should I assume that each KeyValue is a version of the stored row?
        for (KeyValue kv : fromResult.raw()) {
            ADTreeMap.get(kv.getQualifier()).fakeLambda(messageBuilder, kv.getValue());
            // TODO: add a counter for each qualifier (sketch at the end of this post)
        }
        ...
    }

I also have a list of simple questions:

1. Has anybody experienced a significant slowdown in map tasks tied to a portion of their HBase regions? If so, what issues did you come across?
2. Can anyone suggest a way to show which map corresponds to which region, so I can troubleshoot from there? Is this already logged somewhere by default, or is there a way to set it up through TableMapReduceUtil.initTableMapperJob? (I've sketched what I was planning to try at the end of this post.)

Any other suggestions would be appreciated.
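For question 2 above, this is roughly what I was planning to try myself: log the split for each map from setup(), assuming TableInputFormat hands each task a TableSplit. Untested sketch:

    // Added to the mapper class; uses org.apache.hadoop.hbase.mapreduce.TableSplit
    // and org.apache.hadoop.hbase.util.Bytes.
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Assumption: with TableInputFormat each map task's input split is a TableSplit,
        // so logging it should let me match a slow attempt back to its region boundaries
        // and the region server it was assigned to.
        TableSplit split = (TableSplit) context.getInputSplit();
        System.out.println("split startRow=" + Bytes.toStringBinary(split.getStartRow())
            + " endRow=" + Bytes.toStringBinary(split.getEndRow())
            + " location=" + split.getRegionLocation());
    }

If that works, the task attempt's stdout in the job history should let me line up the 23-27 hour maps with specific regions, but I'd still welcome a better way.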
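And the per-qualifier counter TODO in the map method was just going to be something along these lines (the "Qualifiers" group name is a placeholder):

    // Inside the loop over fromResult.raw(): a dynamic counter keyed by qualifier name
    // (Bytes is org.apache.hadoop.hbase.util.Bytes); the totals show up with the rest
    // of the job counters.
    context.getCounter("Qualifiers", Bytes.toString(kv.getQualifier())).increment(1);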