Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21657 )
Change subject: IMPALA-13284: Loading test data on Apache Hive3
......................................................................

IMPALA-13284: Loading test data on Apache Hive3

There are some failures when loading test data on Apache Hive 3.1.3:
 - STORED AS JSONFILE is not supported.
 - STORED BY ICEBERG is not supported. Similarly, STORED BY ICEBERG
   STORED AS AVRO is not supported.
 - The iceberg-hive-runtime jar is missing from the CLASSPATH of HMS and
   Tez jobs.
 - Creating a table in Impala is not translated to an EXTERNAL table in
   HMS.
 - Hive INSERT on insert-only tables fails to generate InsertEvents
   (HIVE-20067).

This patch fixes the syntax issues by using the older syntax supported
by Apache Hive 3.1.3:
 - Convert STORED AS JSONFILE to
   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.JsonSerDe'
 - Convert STORED BY ICEBERG to
   STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
 - Convert STORED BY ICEBERG STORED AS AVRO to the above, plus an
   additional ALTER TABLE statement that sets the table property
   'write.format.default'.

Most of the conversions are done in generate-schema-statements.py. The
one exception is testdata/bin/load-dependent-tables.sql, for which we
generate a new, converted copy of the file when using it.

The missing iceberg-hive-runtime jar is added to HIVE_AUX_JARS_PATH in
bin/impala-config.sh. Note that this is only needed for Apache Hive3,
since CDP Hive3 already has the hive-iceberg-handler jar in its lib
folder.

To fix the InsertEvents failure, we add the HIVE-20067 patch and modify
testdata/bin/patch_hive.sh to also recompile the standalone-metastore
submodule.

Also modified some statements in
testdata/datasets/functional/functional_schema_template.sql to be more
reliable on retry.
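The three syntax conversions listed above can be sketched as a small regex-based rewrite in Python. This is a minimal illustration only, not the code in testdata/bin/generate-schema-statements.py: the helper name convert_to_hive3_syntax is hypothetical, and it assumes the Avro case should set 'write.format.default' to 'avro' and that each statement contains at most one storage clause.

```python
import re

# Class names taken from the commit message above.
JSON_SERDE = "org.apache.hadoop.hive.serde2.JsonSerDe"
ICEBERG_HANDLER = "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler"

ICEBERG_AVRO_RE = re.compile(
    r"STORED\s+BY\s+ICEBERG\s+STORED\s+AS\s+AVRO\b", re.IGNORECASE)
ICEBERG_RE = re.compile(r"STORED\s+BY\s+ICEBERG\b", re.IGNORECASE)
JSONFILE_RE = re.compile(r"STORED\s+AS\s+JSONFILE\b", re.IGNORECASE)


def convert_to_hive3_syntax(table_name, create_stmt):
    """Hypothetical helper: rewrite a CREATE TABLE for Apache Hive 3.1.3.

    Returns a list of statements: the rewritten CREATE TABLE and, for
    Iceberg tables stored as Avro, an ALTER TABLE setting the file format.
    """
    extra_stmts = []
    handler_clause = f"STORED BY '{ICEBERG_HANDLER}'"
    # STORED BY ICEBERG STORED AS AVRO: use the storage handler class and
    # set the default write format in a follow-up ALTER TABLE (assumed 'avro').
    if ICEBERG_AVRO_RE.search(create_stmt):
        create_stmt = ICEBERG_AVRO_RE.sub(handler_clause, create_stmt)
        extra_stmts.append(
            f"ALTER TABLE {table_name} SET TBLPROPERTIES "
            "('write.format.default'='avro')")
    # Plain STORED BY ICEBERG: only swap in the storage handler class.
    create_stmt = ICEBERG_RE.sub(handler_clause, create_stmt)
    # STORED AS JSONFILE: fall back to ROW FORMAT SERDE with Hive's JsonSerDe.
    create_stmt = JSONFILE_RE.sub(f"ROW FORMAT SERDE '{JSON_SERDE}'", create_stmt)
    return [create_stmt] + extra_stmts


if __name__ == "__main__":
    for stmt in convert_to_hive3_syntax(
            "t", "CREATE TABLE t (i INT) STORED BY ICEBERG STORED AS AVRO"):
        print(stmt)
```

Returning a list rather than a single string keeps the Avro case uniform: the caller executes every statement in order, and most tables simply yield a one-element list.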
Tests:
 - Verified that the test data can be loaded in ubuntu-20.04-from-scratch

Change-Id: I8f52c91602da8822b0f46f19dc4111c7187ce400
---
M bin/impala-config.sh
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
M testdata/bin/load-dependent-tables.sql
M testdata/bin/patch_hive.sh
M testdata/cluster/hive/README
A testdata/cluster/hive/patch3-HIVE-20067.diff
M testdata/datasets/functional/functional_schema_template.sql
8 files changed, 90 insertions(+), 10 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/21657/1
--
To view, visit http://gerrit.cloudera.org:8080/21657

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8f52c91602da8822b0f46f19dc4111c7187ce400
Gerrit-Change-Number: 21657
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang
