Ok, I got it passing with your settings and using Cell (KeyValue)... I don't need the io.serializations settings because HFileOutputFormat2 already sets them, as you said.

I need to look a bit more into all of that... Thanks!

2015-04-28 20:49 GMT-04:00 Jean-Marc Spaggiari <[email protected]>:

> Which HBase version are you running that with?
>
> 2015-04-28 20:42 GMT-04:00 张铎 <[email protected]>:
>
>> Then this is a Map-only job? Try this:
>>
>>     job.setNumReduceTasks(0);
>>
>> This is my code to set up a Map-only job which generates HFiles for
>> bulkload:
>>
>>     job.setMapperClass(ImportMapper.class);
>>     job.setNumReduceTasks(0);
>>     job.setOutputKeyClass(ImmutableBytesWritable.class);
>>     job.setOutputValueClass(Cell.class);
>>     job.setOutputFormatClass(HFileOutputFormat2.class);
>>
>> And I found that there are Serializers for KeyValue and Put... I found
>> this in HFileOutputFormat2:
>>
>>     conf.setStrings("io.serializations", conf.get("io.serializations"),
>>         MutationSerialization.class.getName(),
>>         ResultSerialization.class.getName(),
>>         KeyValueSerialization.class.getName());
>>
>> 2015-04-29 8:31 GMT+08:00 Jean-Marc Spaggiari <[email protected]>:
>>
>>> Hum. My goal was to output the Puts directly from the Mapper, so it
>>> seems that this is not doable, right? You will always need the reducers
>>> to create as many files as you have regions. Since I have just one, I
>>> was trying to save some steps and generate the output directly from the
>>> mapper side. So I guess this is not doable.
>>>
>>> I will try the Reducer then...
>>>
>>> 2015-04-28 20:27 GMT-04:00 张铎 <[email protected]>:
>>>
>>>> HFileOutputFormat2 is used at the final output step, i.e., the reduce
>>>> output in an MR job or the map output in a Map-only job. It uses
>>>> ImmutableBytesWritable and Cell as key and value. So I think your
>>>> problem is not related to HFileOutputFormat2? If you want to use
>>>> KeyValue or Put at the shuffle step (the output types of the Mapper
>>>> and the input types of the Reducer), you have to implement Serializers
>>>> for them yourself.
>>>> And I suggest using general Writable classes at the shuffle step,
>>>> converting them to ImmutableBytesWritable and Cell in the Reducer, and
>>>> then collecting them out.
>>>>
>>>> Thanks.
>>>>
>>>> 2015-04-29 7:49 GMT+08:00 Jean-Marc Spaggiari <[email protected]>:
>>>>
>>>>> ImmutableBytesWritable works because it implements
>>>>> WritableComparable... The others don't. So that makes sense.
>>>>>
>>>>> Now the question is: should Put implement it too? If not, how are we
>>>>> expecting HFileOutputFormat2 to work with MR? Or at least Writable?
>>>>>
>>>>> 2015-04-28 18:43 GMT-04:00 Jean-Marc Spaggiari <[email protected]>:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Quick question. I'm trying to run a very simple MR job that does
>>>>>> nothing... Just to try to get it to run.
>>>>>>
>>>>>> But as soon as I set the output value to be KeyValue or Put, I get
>>>>>> an exception from the MR framework.
>>>>>>
>>>>>> The exception is the following:
>>>>>>
>>>>>> java.lang.Exception: java.lang.NullPointerException
>>>>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>>>>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
>>>>>> Caused by: java.lang.NullPointerException
>>>>>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:988)
>>>>>>     at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
>>>>>>     at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
>>>>>>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
>>>>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
>>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>>>>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>>>>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>>>
>>>>>> If you look into the code, that means that Hadoop is not able to
>>>>>> serialize KeyValue nor Put, and so is not able to use them in the
>>>>>> Mapper class.
>>>>>>
>>>>>> To validate, I tried this:
>>>>>>
>>>>>>     SerializationFactory serializationFactory = new SerializationFactory(conf);
>>>>>>     System.out.println(serializationFactory.getSerializer(KeyValue.class));
>>>>>>     System.out.println(serializationFactory.getSerializer(Put.class));
>>>>>>     System.out.println(serializationFactory.getSerializer(Cell.class));
>>>>>>
>>>>>> And they all return null. Which is consistent with the exception.
>>>>>>
>>>>>> So you don't even need to run MR to get it to fail. Just a small
>>>>>> main with those 4 lines.
>>>>>>
>>>>>> Am I missing something? Like doing some initialization to help
>>>>>> Hadoop serialize those classes?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> JM
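The Map-only setup quoted in the thread can be sketched end to end. This is a hedged illustration, not code from the thread: the body of `ImportMapper`, the assumed input format (one "rowkey,value" pair per text line), and the column family `f` and qualifier `q` are all hypothetical placeholders.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch of a mapper for the Map-only HFile job discussed above. It
// emits ImmutableBytesWritable/Cell pairs, which is what
// HFileOutputFormat2 consumes directly, so no extra io.serializations
// setup is needed on the user side.
public class ImportMapper
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Cell> {

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // Assume each input line is "rowkey,value" (hypothetical format).
    String[] parts = line.toString().split(",", 2);
    byte[] row = Bytes.toBytes(parts[0]);
    // KeyValue implements Cell, so it can be written as the map output
    // value while the job declares Cell.class as the output value class.
    KeyValue kv = new KeyValue(row,
        Bytes.toBytes("f"),   // hypothetical column family
        Bytes.toBytes("q"),   // hypothetical qualifier
        Bytes.toBytes(parts[1]));
    context.write(new ImmutableBytesWritable(row), kv);
  }
}
```

This mapper would plug into the driver settings shown in 张铎's message (`setNumReduceTasks(0)`, `setOutputFormatClass(HFileOutputFormat2.class)`), and since the Cells go straight to the output format rather than through the shuffle, the missing-serializer problem from the original post never arises.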

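For the subthread about `SerializationFactory` returning null for Put and KeyValue: a minimal sketch of registering HBase's serializations by hand, mirroring the `conf.setStrings` call that the thread quotes from HFileOutputFormat2. The class names are given as strings on the assumption that some of these classes are package-private in `org.apache.hadoop.hbase.mapreduce`; whether you need this at all depends on your HBase version, and HFileOutputFormat2 already does it for you when you use it as the job's output format.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.io.serializer.SerializationFactory;

// Sketch: make Hadoop's SerializationFactory aware of HBase's
// serializers, mirroring what HFileOutputFormat2 does internally.
public class SerializationCheck {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Referenced by name rather than by .class because the classes may
    // be package-private in org.apache.hadoop.hbase.mapreduce.
    conf.setStrings("io.serializations",
        conf.get("io.serializations"),  // keep the Writable defaults
        "org.apache.hadoop.hbase.mapreduce.MutationSerialization",  // Put/Delete
        "org.apache.hadoop.hbase.mapreduce.ResultSerialization",
        "org.apache.hadoop.hbase.mapreduce.KeyValueSerialization");

    SerializationFactory factory = new SerializationFactory(conf);
    // With the strings above set, these lookups should no longer be null,
    // which is what the 4-line check in the original post was probing.
    System.out.println(factory.getSerializer(Put.class));
    System.out.println(factory.getSerializer(KeyValue.class));
  }
}
```

This also explains the original NullPointerException: `MapTask$MapOutputBuffer.init` asks the factory for a serializer for the map output value class, gets null with an unconfigured `Configuration`, and dereferences it.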