Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17145 )
Change subject: IMPALA-10552: External Frontend CTAS support ...................................................................... Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/17145/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17145/4//COMMIT_MSG@12 PS4, Line 12: - When this is set, the external frontend is responsible for : for moving and managing the results I assume the external frontend is also handling any metadata operations as well. Is that right? http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc File be/src/exec/hdfs-table-sink.cc: http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc@127 PS4, Line 127: if (HasExternalStagingDir()) { : // When an external FE has provided a staging path, we use it directly. : staging_dir_ = external_staging_dir_; : } else { : staging_dir_ = Substitute("$0/_impala_insert_staging/$1", : table_desc_->hdfs_base_dir(), PrintId(state->query_id(), "_")); : } The external staging directory is a staging directory from the point of view of an external frontend, but in this file, it doesn't use the staging codepath. It uses the direct writing codepath with an alternate location (i.e. ShouldSkipStaging() returns true and it customizes final_hdfs_file_name_prefix). So, I think there is no need to set staging_dir_ to something different. That doesn't interact with the external staging directory codepath. Separately, what do you think about calling this something other than a staging directory? Custom output location? External output directory? http://gerrit.cloudera.org:8080/#/c/17145/4/be/src/exec/hdfs-table-sink.cc@279 PS4, Line 279: // The 0 padding on base and delta is to match the behavior of Hive since various : // systems will expect a certain format for dynamic partition creation. Additionally : // include an 0 statement id for delta directory so various Hive AcidUtils detect : // the directory (such as AcidUtils.baseOrDeltaSubdir()). Multiple statements in a : // single transaction is not supported. : if (overwrite_) { : output_partition->final_hdfs_file_name_prefix += StringPrintf("/base_%07ld/", : write_id_); : } else { : output_partition->final_hdfs_file_name_prefix += StringPrintf( : "/delta_%07ld_%07ld_0000/", write_id_, write_id_); : } Should this be triggered by its own variable independently from the external staging directory? It seems orthogonal to where the output goes. -- To view, visit http://gerrit.cloudera.org:8080/17145 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iae0ea4a832d8281c563427d0d7da1623bfce437b Gerrit-Change-Number: 17145 Gerrit-PatchSet: 4 Gerrit-Owner: John Sherman <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: John Sherman <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-Comment-Date: Mon, 08 Mar 2021 20:52:53 +0000 Gerrit-HasComments: Yes
