Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16706 )

Change subject: IMPALA-10317: Add query option that limits huge joins at runtime
......................................................................


Patch Set 3:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG@11
PS3, Line 11: exceed
nit: 'exceeding' instead of exceed


http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG@18
PS3, Line 18: backend metrics for RowsReturned. Example from "select count(*) 
from
For this example, can you also add the query option ? I assume it was set to 
10M ?


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h
File be/src/runtime/coordinator.h:

http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@230
PS3, Line 230:     /// Total num rows produced by each join node
Pls indicate what the key of the map is. I am also wondering whether the string 
(which is the node name) is the only way to identify the node. Is there no 
numeric id that you can use ?


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@231
PS3, Line 231: per_num_join_rows_produced
This could more simply be named 'per_join_rows_produced'  (see similar comment 
elsewhere  on the option name)


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc
File be/src/runtime/fragment-instance-state.cc:

http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@331
PS3, Line 331:         // stats join node
nit: this comment is a bit confusing.  Suggest rephrasing to 'row count stats 
for a join node'


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@335
PS3, Line 335: rfind(nested_loop_type
A bit odd that for hash_type you are passing a second argument of 0 while not 
doing the same for nested_loop_type ? If there's a reason for that it would be 
good to add a comment.


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@336
PS3, Line 336:           per_num_join_rows_produced[node->name()] = 
rows_counter->value();
Is there not a unique numeric id that can be used here instead of the node name 
? (related comment in coordinator.h)


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/impala-server.cc@2688
PS3, Line 2688:     Status err = 
Status::Expected(TErrorCode::JOIN_ROWS_PRODUCED_LIMIT_EXCEEDED,
it would be useful to know which join node (with the node id) in a complex plan 
exceeded this limit. Is it possible to print that info in the error message ?


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc@991
PS3, Line 991:       case TImpalaQueryOptions::NUM_JOIN_ROWS_PRODUCED_LIMIT: {
I think the prefix 'NUM' can be removed and simplify the naming.  
JOIN_ROWS_PRODUCED_LIMIT gives sufficient information that it is about the 
number of rows.  Similar simplification of the variable names can be done in 
other parts of the code.


http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc@996
PS3, Line 996:           return Status(Substitute("Invalid rows returned limit: 
'$0'. "
'rows returned limit' doesn't match the name of the query option. Should this 
be 'Invalid join rows produced limit'?


http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test
File 
testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test:

http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test@71
PS3, Line 71: ---- QUERY
Suggest adding a test with a mix of joins ..i.e couple of hash joins and a 
nested loop join where one or more of these might exceed the threshold.


http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test@87
PS3, Line 87: in
nit: remove trailing whitespace



--
To view, visit http://gerrit.cloudera.org:8080/16706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02
Gerrit-Change-Number: 16706
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu <chufu...@hotmail.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 Dec 2020 08:11:26 +0000
Gerrit-HasComments: Yes

Reply via email to