Joe McDonnell has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8350 )

Change subject: IMPALA-6068: Fix dataload for complextypes_fileformat
......................................................................

IMPALA-6068: Fix dataload for complextypes_fileformat

Dataload typically follows a pattern of loading data into a text
version of a table, and then using an insert overwrite from the text
table to populate the table for other file formats. This insert is
always done in Impala for Parquet and Kudu. Otherwise it runs in Hive.

Since Impala doesn't support writing nested data, the population of
complextypes_fileformat tries to hack the insert to run in Hive by
including it in the ALTER part of the table definition. ALTER runs
immediately after CREATE and always runs in Hive. The problem is that
ALTER also runs before the base table
(functional.complextypes_fileformat) is populated. The insert
succeeds, but it inserts zero rows.

This code change introduces a way to force the Parquet load to run
using Hive. This lets complextypes_fileformat specify that the insert
should happen in Hive and fixes the ordering so that the table is
populated correctly.

This is also useful for loading custom Parquet files into Parquet
tables. Hive supports the LOAD DATA LOCAL syntax, which can read a
file from the local filesystem. This means that several locations that
currently use the hdfs command line can be modified to use this SQL.
This change speeds up dataload by a few minutes, as it avoids the
overhead of the hdfs command line.

Any other location that could use LOAD DATA LOCAL is also switched
over to use it. Any location that already uses LOAD DATA LOCAL is also
switched to indicate that it must run in Hive.

Testing:
 Ran dataload and verified that
 functional_parquet.complextypes_fileformat has rows.

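The engine selection and the local-file load described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in generate-schema-statements.py: the function names, the force_hive flag, and the example path and table name are hypothetical, and the statement builder does no quoting or escaping of its inputs.

```python
# Illustrative sketch only. All names here are hypothetical and are not
# taken from generate-schema-statements.py.


def choose_insert_engine(file_format, force_hive=False):
    """Pick the engine that runs the insert for a table of this format."""
    if force_hive:
        # The table definition asked for Hive explicitly, for example
        # because it holds nested data that Impala cannot write.
        return "hive"
    if file_format in ("parquet", "kudu"):
        # Default described in the commit message: Impala does the insert.
        return "impala"
    return "hive"


def load_data_local_statement(local_path, table_name):
    """Build a Hive LOAD DATA LOCAL statement for a local file.

    Assumes local_path and table_name need no escaping.
    """
    return "LOAD DATA LOCAL INPATH '{0}' OVERWRITE INTO TABLE {1};".format(
        local_path, table_name)


if __name__ == "__main__":
    # Hypothetical usage: force a Parquet table's load to run in Hive.
    print(choose_insert_engine("parquet", force_hive=True))
    print(load_data_local_statement("/tmp/example.parq",
                                    "functional_parquet.example_table"))
```

Running the statement through Hive instead of shelling out to the hdfs command line is what the commit message credits for the dataload speedup.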
Change-Id: I7152306b2907198204a6d8d282a0bad561129b82
---
M testdata/bin/create-load-data.sh
M testdata/bin/generate-schema-statements.py
M testdata/datasets/functional/functional_schema_template.sql
3 files changed, 133 insertions(+), 130 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/8350/1
--
To view, visit http://gerrit.cloudera.org:8080/8350
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7152306b2907198204a6d8d282a0bad561129b82
Gerrit-Change-Number: 8350
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>