Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8822 )
Change subject: IMPALA-6070: Parallelize another bit of data load.
......................................................................
IMPALA-6070: Parallelize another bit of data load.
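The two Kudu loads and the Hive UDF build can all run in parallel.
Run serially they take about 8 minutes in total; run in parallel they
are bounded by the slowest step, so this should shave about 4 minutes
off the data load. (Current timings are roughly 3.5, 4, and 0.7
minutes; see below.)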
I've run dataload with this change many times.
Loading Kudu functional (logging to
/home/ubuntu/Impala/logs/data_loading/load-kudu.log)...
Loading workload 'functional-query' using exploration strategy 'core' in
table formats 'kudu/none/none' OK (Took: 3 min 29 sec)
Loading Kudu TPCH (logging to
/home/ubuntu/Impala/logs/data_loading/load-kudu-tpch.log)...
Loading workload 'tpch' using exploration strategy 'core' in table formats
'kudu/none/none' OK (Took: 4 min 0 sec)
Loading Hive UDFs (logging to
/home/ubuntu/Impala/logs/data_loading/build-and-copy-hive-udfs.log)...
Loading Hive UDFs OK (Took: 0 min 41 sec)
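
As a rough illustration of the idea (not the actual contents of testdata/bin/create-load-data.sh), independent steps can be started as background jobs and then waited on, so the wall-clock time is close to the slowest step rather than the sum. The function names below (load_kudu_functional, load_kudu_tpch, build_hive_udfs, run_parallel) are hypothetical stand-ins, and the sleeps only simulate work:

```shell
#!/usr/bin/env bash
# Minimal sketch, assuming the three steps are independent of each other.
# All function names here are hypothetical; they are NOT the helpers used
# by create-load-data.sh.
set -euo pipefail

# Stand-ins for the real steps; the sleeps are scaled-down placeholders.
load_kudu_functional() { sleep 0.3; echo "kudu functional done"; }
load_kudu_tpch()       { sleep 0.4; echo "kudu tpch done"; }
build_hive_udfs()      { sleep 0.1; echo "hive udfs done"; }

# Start every step in the background, then wait for each one.
# Returns non-zero if any step failed.
run_parallel() {
  local pids=() failed=0 step pid
  for step in "$@"; do
    "$step" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    # wait returns the exit status of that specific background job.
    wait "$pid" || failed=1
  done
  return "$failed"
}

run_parallel load_kudu_functional load_kudu_tpch build_hive_udfs
```

Waiting on each PID individually, rather than issuing a bare `wait`, is what lets a failure in any one step propagate to the caller.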
Change-Id: I7e93ee5a77ec9271b980b88bef7ad512ecbe0407
---
M testdata/bin/create-load-data.sh
1 file changed, 4 insertions(+), 3 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/8822/1
--
To view, visit http://gerrit.cloudera.org:8080/8822
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7e93ee5a77ec9271b980b88bef7ad512ecbe0407
Gerrit-Change-Number: 8822
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>