## Version:
phoenix: 4.13.2-cdh5.11.2
Advertising
hive: 1.1.0-cdh5.11.2
There is an ColumnNotFoundException when joining a hive internal table with an
Phoenix external table.
## Table1: phoenix external table "ext_tmp":
+-----------+------------+--------------------+--+
| col_name | data_type | comment |
+-----------+------------+--------------------+--+
| cola | string | from deserializer |
| colb | string | from deserializer |
+-----------+------------+--------------------+--+
### Backend Phoenix table "TMP":
select * from TMP;
+-------+------------+
| cola | colb |
+-------+------------+
| a | aaaa |
| b | bbaaaa |
| ccc | cccbbaaaa |
+-------+------------+
### Hive external table creation statement:
+----------------------------------------------------+--+
| createtab_stmt |
+----------------------------------------------------+--+
| CREATE EXTERNAL TABLE `ext_tmp`( |
| `cola` string COMMENT 'from deserializer', |
| `colb` string COMMENT 'from deserializer') |
| ROW FORMAT SERDE |
| 'org.apache.phoenix.hive.PhoenixSerDe' |
| STORED BY |
| 'org.apache.phoenix.hive.PhoenixStorageHandler' |
| WITH SERDEPROPERTIES ( |
| 'serialization.format'='1') |
| LOCATION |
| 'hdfs://st:8020/data/user/hive/warehouse/ext_tmp'|
| TBLPROPERTIES ( |
| 'phoenix.column.mapping'='cola:cola,colb:colb', |
| 'phoenix.rowkeys'='cola', |
| 'phoenix.table.name'='tmp', |
| 'phoenix.zookeeper.client.port'='2181', |
| 'phoenix.zookeeper.quorum'='st1,st2,st3', |
| 'phoenix.zookeeper.znode.parent'='/hbase', |
| 'transient_lastDdlTime'='1523607352') |
+----------------------------------------------------+--+
## Table2: hive internal table "native1":
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| cola | string | |
| colb | string | |
+-----------+------------+----------+--+
## When join the two tables:
select * from native1 join ext_tmp t on native1.cola= t.cola;
Exception:
org.apache.phoenix.schema.ColumnNotFoundException: ERROR 504 (42703): Undefined
column. columnName=TMP
Detailed exception as below. There is an empty column "" as readcolumn name,
which is not exsit in Pheonix table at all. so the phoenix query will never be
correct: select /*+ NO_CACHE */ "","cola","colb" from tmp where "cola" is not
null
Any thoughts?
---------
2018-04-14 21:13:40,923 INFO org.apache.hadoop.hive.ql.io.HiveInputFormat:
[HiveServer2-Background-Pool: Thread-304]: hive.io.file.readcolumn.ids=
2018-04-14 21:13:40,923 INFO org.apache.hadoop.hive.ql.io.HiveInputFormat:
[HiveServer2-Background-Pool: Thread-304]:
hive.io.file.readcolumn.names=,cola,colb
2018-04-14 21:13:40,923 INFO org.apache.hadoop.hive.ql.io.HiveInputFormat:
[HiveServer2-Background-Pool: Thread-304]: Generating splits
2018-04-14 21:13:40,924 INFO
org.apache.phoenix.hive.query.PhoenixQueryBuilder:
[HiveServer2-Background-Pool: Thread-304]: Input query : select /*+ NO_CACHE
*/ "","cola","colb" from tmp where "cola" is not null
2018-04-14 21:13:40,932 ERROR
org.apache.phoenix.hive.mapreduce.PhoenixInputFormat:
[HiveServer2-Background-Pool: Thread-304]: Failed to get the query plan with
error [ERROR 504 (42703): Undefined column. columnName=TMP]
Thanks,
Wei