HFileOutputFormat2 is used at the final output step, i.e., the reduce output in an MR job or the map output in a map-only job. It uses ImmutableBytesWritable and Cell as key and value, so I think your problem is not related to HFileOutputFormat2 itself. If you want to use KeyValue or Put at the shuffle step (the output types of the Mapper and the input types of the Reducer), you have to register Serializers for them yourself. My suggestion is to use plain Writable classes at the shuffle step, then convert them to ImmutableBytesWritable and Cell in the Reducer and collect them out.
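For reference, registering the serializers looks roughly like the sketch below. This is untested and the class names (MutationSerialization, ResultSerialization, KeyValueSerialization in org.apache.hadoop.hbase.mapreduce) should be checked against your HBase version; as far as I know this is also what HFileOutputFormat2.configureIncrementalLoad() does for you internally, which is why jobs set up through it don't hit your NPE:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SerializationSetup {
  // Sketch only: make Hadoop's SerializationFactory aware of HBase types
  // so Put/KeyValue can be used as map output values.
  public static void configure(Job job) {
    Configuration conf = job.getConfiguration();
    // Append the HBase serializations to whatever is already configured
    // (by default org.apache.hadoop.io.serializer.WritableSerialization).
    conf.setStrings("io.serializations",
        conf.get("io.serializations"),
        "org.apache.hadoop.hbase.mapreduce.MutationSerialization",  // Put, Delete
        "org.apache.hadoop.hbase.mapreduce.ResultSerialization",
        "org.apache.hadoop.hbase.mapreduce.KeyValueSerialization"); // KeyValue
  }
}
```

With that in place, SerializationFactory.getSerializer(Put.class) should return a non-null serializer instead of the null you were seeing.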
Thanks.

2015-04-29 7:49 GMT+08:00 Jean-Marc Spaggiari <[email protected]>:

> ImmutableBytesWritable works because it implements WritableComparable.
> The others don't. So that makes sense.
>
> Now the question is: should Put implement it too? If not, how are we
> expecting HFileOutputFormat2 to work with MR? Or at least Writable?
>
> 2015-04-28 18:43 GMT-04:00 Jean-Marc Spaggiari <[email protected]>:
>
> > Hi all,
> >
> > Quick question. I'm trying to do a very simple MR job that does
> > nothing... just to try to get it to run.
> >
> > But as soon as I set the output value to be KeyValue or Put, I get an
> > exception from the MR framework.
> >
> > The exception is the following:
> > java.lang.Exception: java.lang.NullPointerException
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
> > Caused by: java.lang.NullPointerException
> >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:988)
> >     at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
> >     at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
> >     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:745)
> >
> > If you look into the code, that means Hadoop is not able to serialize
> > KeyValue or Put, and so it is not able to use them in the Mapper class.
> >
> > To validate, I tried this:
> > SerializationFactory serializationFactory = new SerializationFactory(conf);
> > System.out.println(serializationFactory.getSerializer(KeyValue.class));
> > System.out.println(serializationFactory.getSerializer(Put.class));
> > System.out.println(serializationFactory.getSerializer(Cell.class));
> > And they all return null, which is consistent with the exception.
> >
> > So you don't even need to run MR to see it fail. Just a small main with
> > those four lines.
> >
> > Am I missing something? Like doing some initialization to help Hadoop
> > serialize those classes?
> >
> > Thanks,
> >
> > JM
