Hello all,
[My environment: Hadoop 2.6.0, Hive 1.2.1, Tez 0.7.0]
Our team developed a Hive plug-in whose function is similar to
hive-hbase-handler.
When I run the HQL "select count(*) from h_im;" in the Hive CLI (h_im is an
external table backed by an HBase table), it throws the exceptions below.
(I am sorry that I cannot copy the full error output here: we work on an
internal network, so some information is omitted.)
-------------------------------------------------------------------------------------
INFO [Dispatcher thread: Central] history.HistoryEventHandler: .... .....
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor.run(TezProcessor.java:172)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
    ...... .......
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable org.apache.hadoop.hive.hbase....
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:367)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor.run(TezProcessor.java:149)
    .... 14 more
Caused by: java.lang.NullPointerException
    at com.fiberhome.nebula.datacenter.hbasehandler.NBHBaseSerde.deserialize(NBHBaseSerde.java:210)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:145)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$2(MapOperator.java:143)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:512)
    ..... 18 more
-------------------------------------------------------------------------------------
I know this comes from our custom Hive StorageHandler, but my questions are
about how the two (Tez and Hive) work together.
NBHBaseSerde is a custom SerDe:
NBHBaseSerde extends ColumnarSerDeBase implements Configurable {
    @Override
    initialize() { ...... }
    deserialize() { ...... }
    ....
}
To debug the error above, I added log statements to the related classes
(local mode runs correctly; cluster mode is difficult to debug), but none of
the messages appear in the YARN container logs (web UI on port 8088):
(1) As shown above, the NullPointerException occurred at
"NBHBaseSerde.deserialize(NBHBaseSerde.java:210)", and line 210 is:
-------------------------------------------------------------------------------------------------------------
line 210: this.pair.setValue(zoneid);
-------------------------------------------------------------------------------------------------------------
I guessed that "pair" might be null, so I added two log statements (line 210
is not the first line of deserialize()):
---------------------------------------------------------------------------------------------------------------
LOG.info("deserialize begine ....."); // first line of deserialize()
LOG.info("....pair.toString....." + pair.toString()); // just before "this.pair.setValue(zoneid)"
----------------------------------------------------------------------------------------------------------------
After I replaced NBHBaseSerde.class in the JAR file, some strange things
happened that I still do not understand:
① There is no log message in the Hive log or in the YARN container log
(port 8088): neither "deserialize begine ....." nor "....pair.toString.....".
② The exception now says "Caused by : java.lang.NullPointerException at
com.fiberhome.nebula.datacenter.hbasehandler.NBHBaseSerde.deserialize(NBHBaseSerde.java:211)",
which means that 'LOG.info("....pair.toString....." + pair.toString());' is
the failing line.
I am confused: both log statements should have been executed, so where are
the log messages?
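One detail about the second log statement: if "pair" really is null, then
pair.toString() itself throws the NullPointerException before LOG.info is
ever called, which would be consistent with the reported line moving from 210
to 211. Below is a minimal, self-contained sketch of a null-safe way to log the
field. The class name NullSafeLogDemo and the use of Map.Entry as a stand-in
for the real type of "pair" are assumptions made only for illustration; the
real field type is not shown in this mail.

```java
import java.util.AbstractMap;
import java.util.Map;

public class NullSafeLogDemo {

    // Null-safe: string concatenation renders a null reference as "null"
    // instead of dereferencing it.
    static String describe(Map.Entry<String, String> pair) {
        return "....pair.toString....." + pair;
    }

    // Mirrors the statement I added: it dereferences pair, so it throws
    // NullPointerException when pair is null and nothing gets logged.
    static String describeUnsafe(Map.Entry<String, String> pair) {
        return "....pair.toString....." + pair.toString();
    }

    public static void main(String[] args) {
        Map.Entry<String, String> pair = null; // hypothetical: field never assigned
        System.out.println(describe(pair));
        try {
            describeUnsafe(pair);
        } catch (NullPointerException e) {
            System.out.println("NPE raised by the log statement itself");
        }
        // With a real value the safe form behaves like the original statement.
        System.out.println(describe(new AbstractMap.SimpleEntry<>("zoneid", "z1")));
    }
}
```

This does not explain why "deserialize begine ....." is missing, but it would
at least let the second message be printed when "pair" is null.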
(2) The field "pair" is assigned in NBHBaseSerde.initialize().
There is a log message "Serde initializeation begine.." on the first line of
NBHBaseSerde.initialize(), and I can find only one occurrence of it in the
Hive log. So I guess NBHBaseSerde.initialize() was executed only once during
the entire execution of the HQL.
In other words, the log suggests that NBHBaseSerde.initialize() ran only once,
in the Hive client, and was not called again after the job was submitted.
Am I right?
Other fields that, like "pair", are set in NBHBaseSerde.initialize() also
lose their values after the DAG job is submitted to the cluster. So I added a
setter in NBHiveHBaseUtils.java to save these values, and I call it from
MapRecordProcessor.init() to reset them, like this:
-------------------------------------------------------------------------
legacyMRInput = getMRInput(inputs); // existing Hive source code
......
NBHiveHBaseUtils.setPair(pair); // line I added
....... .....
---------------------------------------------------------------------------
This failed. I found that when I set "hive.compute.splits.in.am=true", the
logic differs from that of traditional MapReduce: it seems that
MapRecordProcessor.init() was not executed, because the log messages I put in
MapRecordProcessor.init() were not printed.
But in the exception message I also found this frame:
"org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor.run(TezProcessor.java:149)".
In my Hive source code:
---------------------------------
line 147: MRTaskReporter mrReporter = new MRTaskReporter(getContext());
line 148: rproc.init(mrReporter, input, outputs);
line 149: rproc.run();
----------------------------------------
Here rproc is the MapRecordProcessor. Since line 149 was reached, line 148 must
have run, so MapRecordProcessor.init() was executed. But then why can't I find
any of the log messages I put in it?
I also added a LOG message before line 149, and it was not printed in the Hive
log or in the container log either. Why? I cannot understand this.
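One check I am considering, to rule out that the tasks load an older copy of
a class or that the logger configuration in the container drops my messages:
print where the class was loaded from, and write to System.err instead of the
logger. This is only a sketch. The class name WhereLoaded is hypothetical, and
I am assuming that the task's standard error ends up in the "stderr" file of
the YARN container log directory, which I have not verified for this setup.

```java
import java.security.CodeSource;

public class WhereLoaded {

    // Returns the JAR or directory a class was loaded from, or a placeholder
    // when the JVM does not expose it (e.g. for bootstrap classes).
    static String locationOf(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "<unknown>";
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) {
        // In the real code this line would go at the top of deserialize() or
        // MapRecordProcessor.init(), with the class of interest as argument.
        System.err.println("loaded from: " + locationOf(WhereLoaded.class));
    }
}
```

If the printed location is not the JAR I rebuilt, the missing log messages
would simply mean that the cluster is still running the old class.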
(3) As the title says, I really do not understand Tez's logic for processing
HiveQL when serialization and deserialization are needed. I have also studied
the Hive and Tez source code, and I know that Tez's split mechanism can reach
a custom StorageHandler through HiveInputFormat. I think maybe I should call
NBHBaseSerde.initialize() again somewhere so that this logic runs on the
cluster side, but I have not found an appropriate place.
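As a fallback, instead of finding a place to call initialize() again, the
SerDe could create the state it needs lazily inside deserialize(). The sketch
below is a simplified stand-in, not the real NBHBaseSerde and not the Hive
SerDe API: the class name LazyStateSketch, the Map.Entry type used for "pair",
the simplified method signatures, and the assumption that "pair" can be
created without any configuration are all hypothetical.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.Properties;

public class LazyStateSketch {

    private Properties tableProps;            // set by initialize(); null in a fresh JVM
    private Map.Entry<String, String> pair;   // stand-in for the real "pair" field

    // Simplified stand-in for SerDe initialization.
    public void initialize(Properties tbl) {
        this.tableProps = tbl;
        this.pair = newPair();
    }

    private static Map.Entry<String, String> newPair() {
        return new AbstractMap.SimpleEntry<>("zoneid", null);
    }

    public boolean isInitialized() {
        return tableProps != null;
    }

    // Simplified stand-in for deserialize(): creates "pair" on first use so
    // that a missing initialize() call no longer causes a NullPointerException.
    public String deserialize(String zoneid) {
        if (pair == null) {
            pair = newPair();
        }
        pair.setValue(zoneid);
        return pair.getKey() + "=" + pair.getValue();
    }

    public static void main(String[] args) {
        LazyStateSketch serde = new LazyStateSketch(); // initialize() never called
        System.out.println(serde.deserialize("z1"));
        System.out.println("initialized: " + serde.isInitialized());
    }
}
```

This only works for fields that do not depend on the table properties; fields
that do would still need initialize() to run in the task.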
I am eager for your guidance and would very much appreciate your help.
Any reply will be appreciated.
Thank you & Best Regards.
---LLBian