IMPALA-6070: Parallelize another bit of data load.

The two Kudu loads and the Hive UDF load can all run in parallel. This should shave about 4 minutes off the data load. (Current timings are 3.5, 4, and 0.6 minutes; see below.)
I've run dataload with this change many times.

  Loading Kudu functional (logging to /home/ubuntu/Impala/logs/data_loading/load-kudu.log)...
    Loading workload 'functional-query' using exploration strategy 'core' in table formats 'kudu/none/none' OK (Took: 3 min 29 sec)
  Loading Kudu TPCH (logging to /home/ubuntu/Impala/logs/data_loading/load-kudu-tpch.log)...
    Loading workload 'tpch' using exploration strategy 'core' in table formats 'kudu/none/none' OK (Took: 4 min 0 sec)
  Loading Hive UDFs (logging to /home/ubuntu/Impala/logs/data_loading/build-and-copy-hive-udfs.log)...
    Loading Hive UDFs OK (Took: 0 min 41 sec)

Change-Id: I7e93ee5a77ec9271b980b88bef7ad512ecbe0407
Reviewed-on: http://gerrit.cloudera.org:8080/8822
Reviewed-by: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Tested-by: Impala Public Jenkins

Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/11dbb395
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/11dbb395
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/11dbb395

Branch: refs/heads/master
Commit: 11dbb3952a1c598f27de281c5020ed2df325d6e8
Parents: 5c593be
Author: Philip Zeyliger <phi...@cloudera.com>
Authored: Tue Dec 12 15:38:54 2017 -0800
Committer: Impala Public Jenkins <impala-public-jenk...@gerrit.cloudera.org>
Committed: Thu Dec 14 02:28:40 2017 +0000

----------------------------------------------------------------------
 testdata/bin/create-load-data.sh | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/impala/blob/11dbb395/testdata/bin/create-load-data.sh
----------------------------------------------------------------------
diff --git a/testdata/bin/create-load-data.sh b/testdata/bin/create-load-data.sh
index 099fe59..df6622a 100755
--- a/testdata/bin/create-load-data.sh
+++ b/testdata/bin/create-load-data.sh
@@ -507,13 +507,14 @@ fi
 if $KUDU_IS_SUPPORTED; then
   # Tests depend on the kudu data being clean, so load the data from scratch.
-  run-step "Loading Kudu functional" load-kudu.log \
+  run-step-backgroundable "Loading Kudu functional" load-kudu.log \
         load-data "functional-query" "core" "kudu/none/none" force
-  run-step "Loading Kudu TPCH" load-kudu-tpch.log \
+  run-step-backgroundable "Loading Kudu TPCH" load-kudu-tpch.log \
         load-data "tpch" "core" "kudu/none/none" force
 fi
-run-step "Loading Hive UDFs" build-and-copy-hive-udfs.log \
+run-step-backgroundable "Loading Hive UDFs" build-and-copy-hive-udfs.log \
     build-and-copy-hive-udfs
+run-step-wait-all
 run-step "Running custom post-load steps" custom-post-load-steps.log \
     custom-post-load-steps
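
For readers unfamiliar with the pattern in the diff above, here is a minimal sketch of how "start several steps in the background, then wait for all of them" can be written in plain shell. The helpers bg_step and wait_all_steps, the BG_PIDS variable, and the demo commands are hypothetical names invented for this illustration; they are not the run-step-backgroundable and run-step-wait-all functions from the Impala tree, whose implementation is not shown in this commit.

```shell
#!/bin/sh
# Sketch only: bg_step and wait_all_steps are hypothetical helpers, not the
# run-step-backgroundable / run-step-wait-all functions from the Impala tree.

BG_PIDS=""   # space-separated PIDs of background steps not yet waited on

# Run "$@" in the background with stdout and stderr redirected to a log file.
bg_step() {
  bg_name=$1
  bg_log=$2
  shift 2
  echo "${bg_name} (logging to ${bg_log})..."
  "$@" >"${bg_log}" 2>&1 &
  BG_PIDS="${BG_PIDS} $!"
}

# Wait for every recorded step; return non-zero if any of them failed.
wait_all_steps() {
  bg_failed=0
  for bg_pid in ${BG_PIDS}; do
    if ! wait "${bg_pid}"; then
      echo "background step with PID ${bg_pid} failed" >&2
      bg_failed=1
    fi
  done
  BG_PIDS=""
  return "${bg_failed}"
}

# Demo: three independent steps; wall time is roughly that of the slowest one.
LOG_DIR=$(mktemp -d)
bg_step "Step A" "${LOG_DIR}/a.log" sleep 1
bg_step "Step B" "${LOG_DIR}/b.log" sleep 1
bg_step "Step C" "${LOG_DIR}/c.log" echo "done"
wait_all_steps && echo "All background steps OK"
```

With the timings quoted in the commit message (3 min 29 sec, 4 min 0 sec, and 0 min 41 sec), running the three steps serially costs about 8 min 10 sec, while running them in parallel is bounded by the slowest step at about 4 min. That is consistent with the roughly 4-minute saving claimed, assuming the steps do not contend heavily for the same resources.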