I've noticed that I've omitted the following Scan settings, which appear to be suggestions from the examples:

    scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
    scan.setCacheBlocks(false);  // don't set to true for MR jobs

Still, I am not sure whether this explains the significant request slowdown on the final 25% of the maps.
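For reference, this is roughly how I plan to wire those settings into the existing job setup (an untested sketch; 500 is just the value from the examples, not something I've tuned):

    Scan fromScan = new Scan();
    fromScan.setCaching(500);        // fetch 500 rows per scanner RPC instead of the default 1
    fromScan.setCacheBlocks(false);  // don't churn the region servers' block cache with a full scan
    System.out.println(fromScan);
    TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan, Map.class,
        null, null, job, true, TableInputFormat.class);

I've also put a sketch below the quoted mail of how I plan to log which region each map is reading, in case that helps narrow things down.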
On Tue, Apr 12, 2016 at 10:36 PM, Colin Kincaid Williams <disc...@uw.edu> wrote:
> Excuse my double post. I thought I deleted my draft, and then
> constructed a cleaner, more detailed, more readable mail.
>
> On Tue, Apr 12, 2016 at 10:26 PM, Colin Kincaid Williams <disc...@uw.edu>
> wrote:
>> After trying to get help with distcp on the hadoop-user and cdh-user
>> mailing lists, I've given up on using distcp and exporttable
>> to migrate my hbase from .92.1 on cdh4.1.3 to .98 on cdh5.3.0.
>>
>> I've been working on an hbase map-reduce job to serialize my entries
>> and insert them into kafka. Then I plan to re-import them into
>> cdh5.3.0.
>>
>> Currently I'm having trouble with my map-reduce job. I have 43 maps:
>> 33 have finished successfully, and 10 are still running. I had
>> previously seen request rates of 50-150k per second. Now, for the
>> final 10 maps, I'm seeing 100-150k per minute.
>>
>> I might also mention that there were 6 failures near the application
>> start. Unfortunately, I cannot read the logs for these 6 failures;
>> there is an exception related to the YARN logging for these maps,
>> maybe because they failed to start.
>>
>> I had a look around HDFS. It appears that the regions are all between
>> 5-10GB. The longest completed map so far took 7 hours, with the
>> majority taking around 3.5 hours.
>>
>> The remaining 10 maps have each been running for 23-27 hours.
>>
>> Considering data locality: 6 of the remaining maps are running on the
>> same rack, and the other 4 are split between my other two racks. There
>> should currently be a replica on each rack, since the replication
>> factor appears to be 3, so I'm not sure locality is really the cause
>> of the slowdown.
>>
>> So I'm looking for advice on what I can do to troubleshoot my job.
>> I'm setting up my map job like:
>>
>> main(String[] args) {
>>   ...
>>   Scan fromScan = new Scan();
>>   System.out.println(fromScan);
>>   TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan, Map.class,
>>       null, null, job, true, TableInputFormat.class);
>>
>>   // My guess is this controls the output type for the reduce function,
>>   // based on setOutputKeyClass and setOutputValueClass from p.27. Since
>>   // there is no reduce step, this is currently null.
>>   job.setOutputFormatClass(NullOutputFormat.class);
>>   job.setNumReduceTasks(0);
>>   job.submit();
>>   ...
>> }
>>
>> I'm not performing a reduce step, and I'm traversing row keys like:
>>
>> map(final ImmutableBytesWritable fromRowKey,
>>     Result fromResult, Context context) throws IOException {
>>   ...
>>   // should I assume that each KeyValue is a version of the stored row?
>>   for (KeyValue kv : fromResult.raw()) {
>>     ADTreeMap.get(kv.getQualifier()).fakeLambda(messageBuilder,
>>         kv.getValue());
>>     // TODO: add a counter for each qualifier
>>   }
>>   ...
>>
>> I also have a list of simple questions.
>>
>> Has anybody experienced a significant slowdown on map jobs related to
>> a portion of their hbase regions? If so, what issues did you come
>> across?
>>
>> Can I get a suggestion on how to show which map corresponds to which
>> region, so I can troubleshoot from there? Is this already logged
>> somewhere by default, or is there a way to set this up with
>> TableMapReduceUtil.initTableMapperJob?
>>
>> Any other suggestions would be appreciated.
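As mentioned above, here is the rough, untested sketch of the setup() override I intend to add to my Map class to log which region each map task is scanning. It assumes the input splits handed out by TableInputFormat are TableSplit instances, and the class declaration below is my guess at the shell, since I didn't paste my real Map class:

    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.mapreduce.TableSplit;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.mapreduce.InputSplit;

    public class Map extends TableMapper<ImmutableBytesWritable, Result> {

        @Override
        protected void setup(Context context) {
            InputSplit split = context.getInputSplit();
            if (split instanceof TableSplit) {
                TableSplit ts = (TableSplit) split;
                // Goes to the task attempt's stdout log, so each map attempt can be
                // matched to the region (start/end row and region server) it reads.
                System.out.println("region start=" + Bytes.toStringBinary(ts.getStartRow())
                    + " end=" + Bytes.toStringBinary(ts.getEndRow())
                    + " location=" + ts.getRegionLocation());
            }
        }

        // map(...) as before
    }

If the slow maps all report the same region server or rack, that would at least tell me where to look next.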