Xingxing Di created HIVE-24720:
--
Summary: Got error while join iceberg table and hive table
Key: HIVE-24720
URL: https://issues.apache.org/jira/browse/HIVE-24720
Project: Hive
Issue Type: Bug
Components: Serializers/Deserializers, StorageHandler
Affects Versions: 2.0.1
Environment: Iceberg : 0.11
hive : 2.0.1
hadoop : 2.7.2
JDK : 1.8
Reporter: Xingxing Di
We got an error while joining an Iceberg table and a Hive table in the same
query. Most mappers succeeded, but some failed with a `cannot find field` error:
{code:java}
Caused by: java.lang.RuntimeException: cannot find field log_src from [org.apache.iceberg.mr.hive.serde.objectinspector.IcebergRecordObjectInspector$IcebergRecordStructField@8838736]
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:442)
    at org.apache.iceberg.mr.hive.serde.objectinspector.IcebergRecordObjectInspector.getStructFieldRef(IcebergRecordObjectInspector.java:78)
    at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:139)
    at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:980)
    at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:1006)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:77)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:355)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:504)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:457)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:365)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:504)
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:457)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:365)
    at org.apache.hadoop.hive.ql.exec.MapOperator.initializeMapOperator(MapOperator.java:498)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:115)
    ... 22 more
{code}
{code:java}
2021-02-02 15:57:35,036 INFO [main] org.apache.hadoop.mapred.MapTask:
Processing split:
Paths:/tmp/hive/flink/25369d0b-110e-4eca-8553-f2f3953f0303/hive_2021-02-02_15-56-22_060_2459779186626852210-1/-mr-10005/0/emptyFile:0+0InputFormatClass:
org.apache.hadoop.mapred.TextInputFormat
{code}
*The split file comes from the `hive table` (which is an empty table), but the
mapper still uses `org.apache.iceberg.mr.hive.HiveIcebergSerDe`, and the table
properties also come from the Iceberg table.*
It seems that Hive uses the wrong SerDe to deserialize the data. I am not a
Hive expert, so I hope someone can give me some clues :).
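To illustrate the suspected failure mode, here is a minimal, self-contained sketch. It is not Hive's or Iceberg's actual code: the class `FieldLookupSketch`, the method `findField`, and the single-column list `[dt]` are hypothetical names and values chosen only for illustration. It shows how a name-based field lookup fails when the field list handed to it does not contain the column the operator asks for, which is what the exception message above suggests.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a name-based struct field lookup.
// This is a simplified illustration, not the real ObjectInspectorUtils logic.
final class FieldLookupSketch {

    // Returns the index of the requested field, or throws if it is absent.
    static int findField(String name, List<String> fields) {
        for (int i = 0; i < fields.size(); i++) {
            if (fields.get(i).equals(name)) {
                return i;
            }
        }
        throw new RuntimeException("cannot find field " + name + " from " + fields);
    }

    public static void main(String[] args) {
        // Columns the SELECT operator expects, taken from the query in this report.
        List<String> expected = Arrays.asList("wx_source", "log_src", "dt", "hour");
        System.out.println("index of log_src: " + findField("log_src", expected));

        // Assumed mismatch: the inspector exposes a different, shorter field list.
        List<String> mismatched = Arrays.asList("dt");
        try {
            findField("log_src", mismatched);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the split of the Hive table is really paired with the Iceberg SerDe and its table properties, a mismatch of this kind during operator initialization would be consistent with the stack trace, although this sketch does not prove that this is the cause.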
SQL:
{code:sql}
select count(1), count(a.dt), count(b.dt)
from (
  select dt, concat_ws('###', wx_source, log_src, dt, hour) as str
  from flink_fdm_iceberg.iceberg_table1
  where dt = '2021-01-29' and hour = '16'
) a
full outer join (
  select dt, concat_ws('###', wx_source, log_src, dt, hour) as str
  from flink_fdm_iceberg.hive_table1
  where dt = '2021-01-29' and hour = '16'
) b on a.str = b.str
where a.str is null or b.str is null;
{code}
The full log of the failed mapper:
{code:java}
2021-02-02 15:57:30,060 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2021-02-02 15:57:30,060 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2021-02-02 15:57:30,118 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2021-02-02 15:57:30,118 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2021-02-02 15:57:30,131 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2021-02-02 15:57:30,131 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1605975810748_1427, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@4b2bac3f)
2021-02-02 15:57:30,167 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ns2, Ident: (HDFS_DELEGATION_TOKEN token 151282235 for flink)
2021-02-02 15:57:30,168 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ns1, Ident: (HDFS_DELEGATION_TOKEN token 151249944 for flink)
2021-02-02 15:57:30,168 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ns4, Ident: (HDFS_DELEGATION_TOKEN token 151235141 for flink)
2021-02-02 15:57:30,169 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: