Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21657 )


Change subject: IMPALA-13284: Loading test data on Apache Hive3
......................................................................

IMPALA-13284: Loading test data on Apache Hive3

Loading test data on Apache Hive 3.1.3 fails for several reasons:
 - STORED AS JSONFILE is not supported.
 - STORED BY ICEBERG is not supported, and neither is STORED BY ICEBERG
   STORED AS AVRO.
 - The iceberg-hive-runtime jar is missing from the CLASSPATH of HMS and
   Tez jobs.
 - Tables created in Impala are not translated to EXTERNAL tables in HMS.
 - Hive INSERTs into insert-only tables fail when generating InsertEvents
   (HIVE-20067).

This patch fixes the syntax issues by falling back to the older syntax
that Apache Hive 3.1.3 supports:
 - Convert STORED AS JSONFILE to ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.JsonSerDe'
 - Convert STORED BY ICEBERG to STORED BY
   'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
 - Convert STORED BY ICEBERG STORED AS AVRO to the above and add an
   ALTER TABLE statement that sets the table property
   'write.format.default' to 'avro'

Most of the conversions are done in generate-schema-statements.py. The
one exception is testdata/bin/load-dependent-tables.sql, for which a
converted copy of the file is generated at the point where it is used.
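
A minimal Python sketch of these rewrites, assuming a simple regex pass
over each CREATE TABLE statement. The function name convert_for_hive3 and
its signature are hypothetical and are not taken from
generate-schema-statements.py:

```python
import re

# Replacement clauses accepted by Apache Hive 3.1.3 (from the list above).
JSON_SERDE = "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'"
ICEBERG_HANDLER = "STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'"

AVRO_ICEBERG = re.compile(r"STORED\s+BY\s+ICEBERG\s+STORED\s+AS\s+AVRO", re.IGNORECASE)
PLAIN_ICEBERG = re.compile(r"STORED\s+BY\s+ICEBERG\b", re.IGNORECASE)
JSONFILE = re.compile(r"STORED\s+AS\s+JSONFILE\b", re.IGNORECASE)


def convert_for_hive3(table_name, create_stmt):
  """Hypothetical helper: rewrite one CREATE TABLE statement for Hive 3.1.3.

  Returns a list of statements: the rewritten CREATE TABLE, optionally
  followed by an ALTER TABLE that sets the Iceberg write format.
  """
  extra = []
  # Handle the AVRO combination first so the plain Iceberg rule below
  # does not rewrite it and leave a dangling STORED AS AVRO behind.
  if AVRO_ICEBERG.search(create_stmt):
    create_stmt = AVRO_ICEBERG.sub(ICEBERG_HANDLER, create_stmt)
    extra.append("ALTER TABLE %s SET TBLPROPERTIES "
                 "('write.format.default'='avro')" % table_name)
  # STORED BY ICEBERG -> STORED BY '<HiveIcebergStorageHandler>'
  create_stmt = PLAIN_ICEBERG.sub(ICEBERG_HANDLER, create_stmt)
  # STORED AS JSONFILE -> ROW FORMAT SERDE '<JsonSerDe>'
  create_stmt = JSONFILE.sub(JSON_SERDE, create_stmt)
  return [create_stmt] + extra
```

For example, convert_for_hive3("db.ice", "CREATE TABLE db.ice (i INT)
STORED BY ICEBERG STORED AS AVRO") would return the CREATE TABLE with the
storage handler clause, followed by ALTER TABLE db.ice SET TBLPROPERTIES
('write.format.default'='avro').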

The missing iceberg-hive-runtime jar is added to HIVE_AUX_JARS_PATH in
bin/impala-config.sh. This is only needed for Apache Hive3, since CDP
Hive3 already ships the hive-iceberg-handler jar in its lib folder.

To fix the InsertEvents failure, we add the HIVE-20067 patch and modify
testdata/bin/patch_hive.sh to also recompile the standalone-metastore
submodule.

Some statements in
testdata/datasets/functional/functional_schema_template.sql are also
modified to make them more robust when retried.

Tests
 - Verified that the test data can be loaded in ubuntu-20.04-from-scratch

Change-Id: I8f52c91602da8822b0f46f19dc4111c7187ce400
---
M bin/impala-config.sh
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
M testdata/bin/load-dependent-tables.sql
M testdata/bin/patch_hive.sh
M testdata/cluster/hive/README
A testdata/cluster/hive/patch3-HIVE-20067.diff
M testdata/datasets/functional/functional_schema_template.sql
8 files changed, 90 insertions(+), 10 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/21657/1
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8f52c91602da8822b0f46f19dc4111c7187ce400
Gerrit-Change-Number: 21657
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <[email protected]>
