Hello Philip Zeyliger, Impala Public Jenkins,
I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/10167
to review the following change.
Change subject: IMPALA-6899: Optimize the HDFS commands used in dataload
......................................................................
IMPALA-6899: Optimize the HDFS commands used in dataload
HDFS command-line calls can be expensive due to JVM
startup and other costs. Since most HDFS command-line
calls accept multiple paths, one way to reduce
execution time is to consolidate multiple HDFS
commands into a single HDFS call. Since HDFS put
commands follow symbolic links and can copy
recursively, further consolidation is possible by
creating the full directory structure locally and
copying it in a single HDFS call.
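
The consolidation described above can be sketched roughly as follows. This is an illustration only, not the code in this change: the function name stage_and_put, the HDFS_CMD override, the "udfs" directory name, and all paths are hypothetical. It relies on the behavior stated above, that put follows symbolic links and copies recursively.

```shell
#!/usr/bin/env bash
# Sketch only: HDFS_CMD, stage_and_put, and the "udfs" directory are
# hypothetical names chosen for illustration.
set -euo pipefail

# Set HDFS_CMD="echo hdfs" to print the command instead of running it.
HDFS_CMD=${HDFS_CMD:-hdfs}

# stage_and_put DEST SRC...
# Symlink every SRC into one temporary local directory, then upload that
# directory with a single "hdfs dfs -put" instead of one call per file.
stage_and_put() {
  local dest=$1; shift
  local stage src abs
  stage=$(mktemp -d)
  mkdir -p "${stage}/udfs"
  for src in "$@"; do
    # Resolve to an absolute path so the symlink stays valid in the stage dir.
    abs="$(cd "$(dirname "${src}")" && pwd)/$(basename "${src}")"
    # Symlinks avoid copying local data before the upload.
    ln -s "${abs}" "${stage}/udfs/$(basename "${src}")"
  done
  # One JVM startup for the whole tree. HDFS_CMD is left unquoted on
  # purpose so that "echo hdfs" splits into two words.
  ${HDFS_CMD} dfs -put -f "${stage}/udfs" "${dest}"
  rm -rf "${stage}"
}
```

Calling it with HDFS_CMD="echo hdfs" prints the single put command that would run, which makes it easy to confirm that several local files collapse into one HDFS invocation.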
This change applies several of these optimizations
throughout the dataload code path. The savings per
step range from 15 to 40 seconds:
Loading Hive Builtins: 1:10 -> 0:30
Loading custom schemas: 0:35 -> 0:20
Loading Hive UDFs: 0:45 -> 0:25
Conflicts:
testdata/bin/copy-udfs-udas.sh - conflict due to
"Loosen hive-exec.jar glob pattern..."
Change-Id: I0934353329dc7312394fc4457ab8db2a272c6282
Reviewed-on: http://gerrit.cloudera.org:8080/10120
Reviewed-by: Philip Zeyliger <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
(cherry picked from commit da363a99a4b1afff91600c71650e26932be9350a)
---
M testdata/bin/copy-udfs-udas.sh
M testdata/bin/create-load-data.sh
M testdata/bin/load-hive-builtins.sh
3 files changed, 131 insertions(+), 122 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/10167/1
--
To view, visit http://gerrit.cloudera.org:8080/10167
Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0934353329dc7312394fc4457ab8db2a272c6282
Gerrit-Change-Number: 10167
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>