Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17144 )

Change subject: IMPALA-10551: Add result sink support for external frontends
......................................................................


Patch Set 6:

(4 comments)

This is making sense to me.

http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@796
PS6, Line 796:   // All instances must have reported their final statuses 
before finalization, which is a
             :   // post-condition of Wait. If the query was not successful, 
still try to clean up the
             :   // staging directory.
This is a copy/paste from FinalizeHdfsDml(), so the second sentence doesn't 
mean that we actually thought that this would remove files in the case of 
errors.

Based on what we do today, we should change the second sentence to say that 
this does not clean up files in the event of an error. The external frontend 
handles that (or that is my assumption).


http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@802
PS6, Line 802:   RETURN_IF_ERROR(UpdateExecState(Status::OK(), nullptr, 
FLAGS_hostname));
If there is an error from execution, it would show up here and this would 
return. The rest of this is the success case. It sounds intentional based on 
other comments, and if the external frontend is doing cleanup, this makes sense.

The only concern would be query retries on the Impala side (the 
"retry_failed_queries" query option). I'm a bit unclear about how 
retry_failed_queries would interact with an external frontend, but to be safe, 
an external frontend would want to make sure "retry_failed_queries" is not true 
for these types of queries until we can test/fix it. It might be worth adding a 
DCHECK somewhere that the retry_failed_queries query option is false when using 
the result sink.


http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@807
PS6, Line 807: 0
Nit: This "0" is the table id. I'm guessing 0 is a special constant for the 
result sink? It would be nice to have this be a named constant.


http://gerrit.cloudera.org:8080/#/c/17144/6/be/src/runtime/coordinator.cc@810
PS6, Line 810:     << finalize_params()->table_id;
I think this should be "0" (i.e. the special table id used in the 
CreateHdfsTblDescriptor() call). I think this got carried over from 
FinalizeHdfsDml().



--
To view, visit http://gerrit.cloudera.org:8080/17144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024bf41d77bb81f1ab0debdbd31ec3687c83f072
Gerrit-Change-Number: 17144
Gerrit-PatchSet: 6
Gerrit-Owner: John Sherman <j...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: John Sherman <j...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Mar 2021 19:17:39 +0000
Gerrit-HasComments: Yes

Reply via email to