Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17144 )
Change subject: IMPALA-10551: Add result sink support for external frontends ...................................................................... Patch Set 6: (4 comments) This is making sense to me. http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@796 PS6, Line 796: // All instances must have reported their final statuses before finalization, which is a : // post-condition of Wait. If the query was not successful, still try to clean up the : // staging directory. This is a copy/paste from FinalizeHdfsDml(), so the second sentence doesn't mean that we actually thought that this would remove files in the case of errors. Based on what we do today, we should change the second sentence to say that this does not clean up files in the event of an error. The external frontend handles that (or that is my assumption). http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@802 PS6, Line 802: RETURN_IF_ERROR(UpdateExecState(Status::OK(), nullptr, FLAGS_hostname)); If there is an error from execution, it would show up here and this would return. The rest of this is the success case. It sounds intentional based on other comments, and if the external frontend is doing cleanup, this makes sense. The only concern would be query retries on the Impala side (the "retry_failed_queries" query option). I'm a bit unclear about how retry_failed_queries would interact with an external frontend, but to be safe, an external frontend would want to make sure "retry_failed_queries" is not true for these types of queries until we can test/fix it. It might be worth adding a DCHECK somewhere that the retry_failed_queries query option is false when using the result sink. http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@807 PS6, Line 807: 0 Nit: This "0" is the table id. I'm guessing 0 is a special constant for the result sink? It would be nice to have this be a named constant. http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@810 PS6, Line 810: << finalize_params()->table_id; I think this should be "0" (i.e. the special table id used in the CreateHdfsTblDescriptor() call). I think this got carried over from FinalizeHdfsDml(). -- To view, visit http://gerrit.cloudera.org:8080/17144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072 Gerrit-Change-Number: 17144 Gerrit-PatchSet: 6 Gerrit-Owner: John Sherman <j...@cloudera.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: John Sherman <j...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Comment-Date: Thu, 11 Mar 2021 19:17:39 +0000 Gerrit-HasComments: Yes