## Version: phoenix: 4.13.2-cdh5.11.2
hive: 1.1.0-cdh5.11.2 There is an ColumnNotFoundException when joining a hive internal table with an Phoenix external table. ## Table1: phoenix external table "ext_tmp": +-----------+------------+--------------------+--+ | col_name | data_type | comment | +-----------+------------+--------------------+--+ | cola | string | from deserializer | | colb | string | from deserializer | +-----------+------------+--------------------+--+ ### Backend Phoenix table "TMP": select * from TMP; +-------+------------+ | cola | colb | +-------+------------+ | a | aaaa | | b | bbaaaa | | ccc | cccbbaaaa | +-------+------------+ ### Hive external table creation statement: +----------------------------------------------------+--+ | createtab_stmt | +----------------------------------------------------+--+ | CREATE EXTERNAL TABLE `ext_tmp`( | | `cola` string COMMENT 'from deserializer', | | `colb` string COMMENT 'from deserializer') | | ROW FORMAT SERDE | | 'org.apache.phoenix.hive.PhoenixSerDe' | | STORED BY | | 'org.apache.phoenix.hive.PhoenixStorageHandler' | | WITH SERDEPROPERTIES ( | | 'serialization.format'='1') | | LOCATION | | 'hdfs://st:8020/data/user/hive/warehouse/ext_tmp'| | TBLPROPERTIES ( | | 'phoenix.column.mapping'='cola:cola,colb:colb', | | 'phoenix.rowkeys'='cola', | | 'phoenix.table.name'='tmp', | | 'phoenix.zookeeper.client.port'='2181', | | 'phoenix.zookeeper.quorum'='st1,st2,st3', | | 'phoenix.zookeeper.znode.parent'='/hbase', | | 'transient_lastDdlTime'='1523607352') | +----------------------------------------------------+--+ ## Table2: hive internal table "native1": +-----------+------------+----------+--+ | col_name | data_type | comment | +-----------+------------+----------+--+ | cola | string | | | colb | string | | +-----------+------------+----------+--+ ## When join the two tables: select * from native1 join ext_tmp t on native1.cola= t.cola; Exception: org.apache.phoenix.schema.ColumnNotFoundException: ERROR 504 (42703): Undefined column. columnName=TMP Detailed exception as below. There is an empty column "" as readcolumn name, which is not exsit in Pheonix table at all. so the phoenix query will never be correct: select /*+ NO_CACHE */ "","cola","colb" from tmp where "cola" is not null Any thoughts? --------- 2018-04-14 21:13:40,923 INFO org.apache.hadoop.hive.ql.io.HiveInputFormat: [HiveServer2-Background-Pool: Thread-304]: hive.io.file.readcolumn.ids= 2018-04-14 21:13:40,923 INFO org.apache.hadoop.hive.ql.io.HiveInputFormat: [HiveServer2-Background-Pool: Thread-304]: hive.io.file.readcolumn.names=,cola,colb 2018-04-14 21:13:40,923 INFO org.apache.hadoop.hive.ql.io.HiveInputFormat: [HiveServer2-Background-Pool: Thread-304]: Generating splits 2018-04-14 21:13:40,924 INFO org.apache.phoenix.hive.query.PhoenixQueryBuilder: [HiveServer2-Background-Pool: Thread-304]: Input query : select /*+ NO_CACHE */ "","cola","colb" from tmp where "cola" is not null 2018-04-14 21:13:40,932 ERROR org.apache.phoenix.hive.mapreduce.PhoenixInputFormat: [HiveServer2-Background-Pool: Thread-304]: Failed to get the query plan with error [ERROR 504 (42703): Undefined column. columnName=TMP] Thanks, Wei