Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 )
Change subject: IMPALA-10317: Add query option that limits huge joins at runtime ...................................................................... Patch Set 3: (12 comments) http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG@11 PS3, Line 11: exceed nit: 'exceeding' instead of exceed http://gerrit.cloudera.org:8080/#/c/16706/3//COMMIT_MSG@18 PS3, Line 18: backend metrics for RowsReturned. Example from "select count(*) from For this example, can you also add the query option ? I assume it was set to 10M ? http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h File be/src/runtime/coordinator.h: http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@230 PS3, Line 230: /// Total num rows produced by each join node Pls indicate what the key of the map is. I am also wondering whether the string (which is the node name) is the only way to identify the node. Is there no numeric id that you can use ? http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/coordinator.h@231 PS3, Line 231: per_num_join_rows_produced This could more simply be named 'per_join_rows_produced' (see similar comment elsewhere on the option name) http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc File be/src/runtime/fragment-instance-state.cc: http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@331 PS3, Line 331: // stats join node nit: this comment is a bit confusing. Suggest rephrasing to 'row count stats for a join node' http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@335 PS3, Line 335: rfind(nested_loop_type A bit odd that for hash_type you are passing a second argument of 0 while not doing the same for nested_loop_type ? If there's a reason for that it would be good to add a comment. http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/runtime/fragment-instance-state.cc@336 PS3, Line 336: per_num_join_rows_produced[node->name()] = rows_counter->value(); Is there not a unique numeric id that can be used here instead of the node name ? (related comment in coordinator.h) http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/impala-server.cc@2688 PS3, Line 2688: Status err = Status::Expected(TErrorCode::JOIN_ROWS_PRODUCED_LIMIT_EXCEEDED, it would be useful to know which join node (with the node id) in a complex plan exceeded this limit. Is it possible to print that info in the error message ? http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc@991 PS3, Line 991: case TImpalaQueryOptions::NUM_JOIN_ROWS_PRODUCED_LIMIT: { I think the prefix 'NUM' can be removed and simplify the naming. JOIN_ROWS_PRODUCED_LIMIT gives sufficient information that it is about the number of rows. Similar simplification of the variable names can be done in other parts of the code. http://gerrit.cloudera.org:8080/#/c/16706/3/be/src/service/query-options.cc@996 PS3, Line 996: return Status(Substitute("Invalid rows returned limit: '$0'. " 'rows returned limit' doesn't match the name of the query option. Should this be 'Invalid join rows produced limit'? http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test File testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test: http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test@71 PS3, Line 71: ---- QUERY Suggest adding a test with a mix of joins ..i.e couple of hash joins and a nested loop join where one or more of these might exceed the threshold. http://gerrit.cloudera.org:8080/#/c/16706/3/testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test@87 PS3, Line 87: in nit: remove trailing whitespace -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu <chufu...@hotmail.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Comment-Date: Wed, 16 Dec 2020 08:11:26 +0000 Gerrit-HasComments: Yes