Excuse the double post. I thought I had deleted my draft before constructing this cleaner, more detailed, more readable mail.
On Tue, Apr 12, 2016 at 10:26 PM, Colin Kincaid Williams <disc...@uw.edu> wrote:
> After trying to get help with distcp on the hadoop-user and cdh-user
> mailing lists, I've given up on using distcp and exporttable to
> migrate my HBase from .92.1 on cdh4.1.3 to .98 on cdh5.3.0.
>
> I've been working on an HBase map-reduce job to serialize my entries
> and insert them into Kafka. Then I plan to re-import them into
> cdh5.3.0.
>
> Currently I'm having trouble with my map-reduce job. I have 43 maps:
> 33 have finished successfully, and 10 are still running. I had
> previously seen request rates of 50-150k per second. For the final 10
> maps, I'm now seeing 100-150k per minute.
>
> I might also mention that there were 6 failures near the application
> start. Unfortunately, I cannot read the logs for these 6 failures;
> there is an exception related to the YARN logging for these maps,
> maybe because they failed to start.
>
> I had a look around HDFS. It appears that the regions are all between
> 5-10GB. The longest completed map so far took 7 hours, with the
> majority appearing to take around 3.5 hours.
>
> The remaining 10 maps have each been running between 23-27 hours.
>
> Considering data locality issues: 6 of the remaining maps are running
> on the same rack, and the other 4 are split between my other two
> racks. There should currently be a replica on each rack, since the
> replication factor appears to be set to 3, so I'm not sure locality is
> really the cause of the slowdown.
>
> So I'm looking for advice on what I can do to troubleshoot my job.
> I'm setting up my map job like this:
>
> main(String[] args){
>   ...
>   Scan fromScan = new Scan();
>   System.out.println(fromScan);
>   TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan,
>       Map.class, null, null, job, true, TableInputFormat.class);
>
>   // My guess is that the two nulls control the output types for the
>   // reduce function, based on setOutputKeyClass and setOutputValueClass
>   // from p.27. Since there is no reduce step, they are currently null.
>   job.setOutputFormatClass(NullOutputFormat.class);
>   job.setNumReduceTasks(0);
>   job.submit();
>   ...
> }
>
> I'm not performing a reduce step, and I'm traversing row keys like this:
>
> map(final ImmutableBytesWritable fromRowKey,
>     Result fromResult, Context context) throws IOException {
>   ...
>   // Should I assume that each KeyValue is a version of the stored row?
>   for (KeyValue kv : fromResult.raw()) {
>     ADTreeMap.get(kv.getQualifier()).fakeLambda(messageBuilder,
>         kv.getValue());
>     // TODO: add a counter for each qualifier
>   }
> }
>
> I also have a list of simple questions.
>
> Has anybody experienced a significant slowdown on map jobs related to
> a portion of their HBase regions? If so, what issues did you come
> across?
>
> Can I get a suggestion on how to show which map corresponds to which
> region, so I can troubleshoot from there? Is this already logged
> somewhere by default, or is there a way to set it up with
> TableMapReduceUtil.initTableMapperJob?
>
> Any other suggestions would be appreciated.
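
In case it helps anyone reading along, here is a minimal, self-contained
sketch of the driver described above. The class names (ExportDriver, and
ExportMapper for the mapper sketched further down), the job name, and the
scan settings are placeholders I made up, not values from my real job; the
caching/cacheBlocks calls are just things I've been meaning to experiment
with on a full-table scan. As far as I can tell, the two null arguments to
initTableMapperJob only set the map output key/value classes, which don't
matter here since there is no reduce step and NullOutputFormat discards any
output.

// Sketch of the driver above, cleaned up so it compiles on its own.
// ExportDriver, ExportMapper, the job name, and the scan settings are
// placeholders/assumptions, not values taken from my real job.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class ExportDriver {
  public static void main(String[] args) throws Exception {
    String fromTableName = args[0];

    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "hbase-to-kafka-export");
    job.setJarByClass(ExportDriver.class);

    Scan fromScan = new Scan();
    fromScan.setCaching(500);        // rows per RPC; a value I'd experiment with
    fromScan.setCacheBlocks(false);  // full scans shouldn't churn the block cache
    System.out.println(fromScan);

    // The two nulls are the map output key/value classes. With no reduce step
    // and NullOutputFormat below, nothing is ever written, so null is fine.
    TableMapReduceUtil.initTableMapperJob(fromTableName, fromScan,
        ExportMapper.class, null, null, job, true, TableInputFormat.class);

    job.setOutputFormatClass(NullOutputFormat.class);
    job.setNumReduceTasks(0);
    job.submit();
  }
}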
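
And the mapper, fleshed out to cover the two things I was unsure about:
whether each KeyValue is a version of the row, and how to tell which region a
given map task is scanning. ExportMapper and the NullWritable output types are
placeholders, and the Kafka serialization work is left out. If I read
TableInputFormat right, each map task gets exactly one region as its split, so
logging the TableSplit in setup() should show the map-to-region correspondence
in the task logs.

import java.io.IOException;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;

// Placeholder mapper; the real serialization/Kafka work is omitted.
public class ExportMapper extends TableMapper<NullWritable, NullWritable> {

  @Override
  protected void setup(Context context) {
    // Each map task processes one TableSplit, i.e. one region. Logging it to
    // the task's stdout is one way to see which map corresponds to which
    // region (and which region server hosts it).
    TableSplit split = (TableSplit) context.getInputSplit();
    System.out.println("region location=" + split.getRegionLocation()
        + " start=" + Bytes.toStringBinary(split.getStartRow())
        + " end=" + Bytes.toStringBinary(split.getEndRow()));
  }

  @Override
  protected void map(ImmutableBytesWritable fromRowKey, Result fromResult,
      Context context) throws IOException, InterruptedException {
    // Result.raw() returns one KeyValue per cell the scan handed back: one per
    // column (family:qualifier) per version. A default Scan only returns one
    // version per column; how many show up depends on Scan.setMaxVersions()
    // and the column family's VERSIONS setting, so each KeyValue is not
    // necessarily a distinct version of the whole row.
    for (KeyValue kv : fromResult.raw()) {
      // One counter per qualifier. Note the framework caps the number of
      // distinct counters (mapreduce.job.counters.max), so this only works if
      // the set of qualifiers is small.
      context.getCounter("qualifiers",
          Bytes.toString(kv.getQualifier())).increment(1);

      byte[] value = kv.getValue();  // placeholder: serialize and send to Kafka
    }
  }
}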
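
Finally, to match those logged split boundaries against actual regions, hosts,
and racks from the client side, something like the standalone listing below
should do it. It sticks to the old HTable client API that both .92 and .98
still carry (the class name and the table-name argument are placeholders).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Pair;

// Standalone sketch: print every region's name, key range, and hosting server
// for a table, so slow maps can be traced back to specific regions and racks.
public class ListRegions {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, args[0]);  // table name from the command line
    try {
      // Parallel arrays of start/end keys, one entry per region.
      Pair<byte[][], byte[][]> keys = table.getStartEndKeys();
      byte[][] startKeys = keys.getFirst();
      byte[][] endKeys = keys.getSecond();
      for (int i = 0; i < startKeys.length; i++) {
        HRegionLocation loc = table.getRegionLocation(startKeys[i]);
        System.out.println(loc.getRegionInfo().getRegionNameAsString()
            + "\tstart=" + Bytes.toStringBinary(startKeys[i])
            + "\tend=" + Bytes.toStringBinary(endKeys[i])
            + "\thost=" + loc.getHostname());
      }
    } finally {
      table.close();
    }
  }
}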