Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10167 )

Change subject: IMPALA-6899: Optimize the HDFS commands used in dataload
......................................................................

IMPALA-6899: Optimize the HDFS commands used in dataload

HDFS commandline calls can be expensive due to JVM startup and other costs. Since most HDFS commandline calls can take multiple paths, one way to reduce execution time is to consolidate multiple HDFS commands into a single HDFS call.

Since HDFS put commands will follow symbolic links and can copy recursively, this can allow for further consolidation by creating the full directory structure and copying it in a single HDFS call.

This does several of these optimizations throughout the dataload codepath. It saves a few seconds here and there:
Loading Hive Builtins: 1:10 -> 0:30
Loading custom schemas: 0:35 -> 0:20
Loading Hive UDFs: 0:45 -> 0:25

Conflicts:
  testdata/bin/copy-udfs-udas.sh
  - conflict due to "Loosen hive-exec.jar glob pattern..."

Change-Id: I0934353329dc7312394fc4457ab8db2a272c6282
Reviewed-on: http://gerrit.cloudera.org:8080/10120
Reviewed-by: Philip Zeyliger <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
(cherry picked from commit da363a99a4b1afff91600c71650e26932be9350a)
Reviewed-on: http://gerrit.cloudera.org:8080/10167
Reviewed-by: Joe McDonnell <[email protected]>
---
M testdata/bin/copy-udfs-udas.sh
M testdata/bin/create-load-data.sh
M testdata/bin/load-hive-builtins.sh
3 files changed, 131 insertions(+), 122 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/10167
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: merged
Gerrit-Change-Id: I0934353329dc7312394fc4457ab8db2a272c6282
Gerrit-Change-Number: 10167
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
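
For readers unfamiliar with the consolidation idea in the commit message, the following is a minimal sketch of it and not the code from this change. It assumes a POSIX-like shell with GNU `readlink -f`. The function names `mkdir_all` and `stage_and_put`, the `HDFS_CMD` override, and the example paths are all hypothetical and do not come from the patch.

```shell
#!/usr/bin/env bash
set -eu

# HDFS_CMD is a hypothetical override (not part of the change) so the sketch
# can be dry-run without a cluster, e.g. HDFS_CMD=echo.
HDFS_CMD=${HDFS_CMD:-hdfs}

# Create several HDFS directories with one CLI invocation (one JVM startup)
# instead of one invocation per path.
# usage: mkdir_all <hdfs_dir>...
mkdir_all() {
  ${HDFS_CMD} dfs -mkdir -p "$@"
}

# Stage local files under one temporary directory via symlinks, then upload
# them with a single put. Per the commit message, put follows symbolic links.
# usage: stage_and_put <hdfs_dest> <local_file>...
stage_and_put() {
  local dest="$1"
  shift
  local staging
  staging=$(mktemp -d)
  local f
  for f in "$@"; do
    # Link by absolute path so the symlink resolves from the staging dir.
    ln -s "$(readlink -f "$f")" "${staging}/$(basename "$f")"
  done
  # One invocation for all staged files rather than one per file.
  ${HDFS_CMD} dfs -put -f "${staging}"/* "${dest}"
  rm -rf "${staging}"
}
```

A call such as `stage_and_put /some/hdfs/dir first.jar second.jar` would then issue one `hdfs dfs -put` for both files. Whether this saves time in a given setup depends on how much of the cost is JVM startup, as the commit message notes.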
