HFileOutputFormat2 is used at the final output step, i.e., the reduce output in an MR job or the map output in a map-only job. It uses ImmutableBytesWritable and Cell as key and value, so I think your problem is not related to HFileOutputFormat2 itself. If you want to use KeyValue or Put at the shuffle step (the output types of the Mapper and the input types of the Reducer), you have to register Serializers for them yourself. My suggestion is to use plain Writable classes at the shuffle step, then convert them to ImmutableBytesWritable and Cell in the Reducer and collect them out.
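For reference, registering the serializers looks roughly like the sketch below. This is untested and the class names (MutationSerialization, ResultSerialization, KeyValueSerialization in org.apache.hadoop.hbase.mapreduce) should be checked against your HBase version; as far as I know this is also what HFileOutputFormat2.configureIncrementalLoad() does for you internally, which is why jobs set up through it don't hit your NPE:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SerializationSetup {
  // Sketch only: make Hadoop's SerializationFactory aware of HBase types
  // so Put/KeyValue can be used as map output values.
  public static void configure(Job job) {
    Configuration conf = job.getConfiguration();
    // Append the HBase serializations to whatever is already configured
    // (by default org.apache.hadoop.io.serializer.WritableSerialization).
    conf.setStrings("io.serializations",
        conf.get("io.serializations"),
        "org.apache.hadoop.hbase.mapreduce.MutationSerialization",  // Put, Delete
        "org.apache.hadoop.hbase.mapreduce.ResultSerialization",
        "org.apache.hadoop.hbase.mapreduce.KeyValueSerialization"); // KeyValue
  }
}
```

With that in place, SerializationFactory.getSerializer(Put.class) should return a non-null serializer instead of the null you were seeing.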
Thanks.

2015-04-29 7:49 GMT+08:00 Jean-Marc Spaggiari <[email protected]>:

> ImmutableBytesWritable works because it implements WritableComparable.
> The others don't. So that makes sense.
>
> Now the question is: should Put implement it too? If not, how are we
> expecting HFileOutputFormat2 to work with MR? Or at least Writable?
>
> 2015-04-28 18:43 GMT-04:00 Jean-Marc Spaggiari <[email protected]>:
>
> > Hi all,
> >
> > Quick question. I'm trying to do a very simple MR job that does
> > nothing... just to try to get it to run.
> >
> > But as soon as I set the output value to be KeyValue or Put, I get an
> > exception from the MR framework.
> >
> > The exception is the following:
> > java.lang.Exception: java.lang.NullPointerException
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
> > Caused by: java.lang.NullPointerException
> >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:988)
> >     at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
> >     at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
> >     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
> >     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:745)
> >
> > If you look into the code, that means Hadoop is not able to serialize
> > KeyValue or Put, and so it is not able to use them in the Mapper class.
> >
> > To validate, I tried this:
> > SerializationFactory serializationFactory = new SerializationFactory(conf);
> > System.out.println(serializationFactory.getSerializer(KeyValue.class));
> > System.out.println(serializationFactory.getSerializer(Put.class));
> > System.out.println(serializationFactory.getSerializer(Cell.class));
> > And they all return null, which is consistent with the exception.
> >
> > So you don't even need to run MR to see it fail. Just a small main with
> > those four lines.
> >
> > Am I missing something? Like doing some initialization to help Hadoop
> > serialize those classes?
> >
> > Thanks,
> >
> > JM
