[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 5: (5 comments) http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG@10 PS3, Line 10: This happens : because results are spilled using a SpillableRowBatchQueue which needs 2 : buffer (read and write) with at least MAX_ROW_SIZE bytes per buffer. : This patch fixes this by s > nit: how about : Done http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py File tests/query_test/test_result_spooling.py: http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@68 PS3, Line 68: exec_options['max_row_size'] = 16 * 1024 > I think you can remove this change and the one in test_query_retries since test_spilling and test_query_retries need to force spill by lowering spill related query options. As consequence of this patch, MAX_ROW_SIZE now also need to be lowered in order to force spill. The default MAX_ROW_SIZE is too high (512 KB) for the context of the tests. With changes introduced in PlanRootSink.java, default MAX_ROW_SIZE will now contribute to pushing maxMemReservationBytes to 1 MB, which in turn will fit all rows in memory (no spill happens). I have double check by commenting the MAX_ROW_SIZE option, and these tests failed without it. http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@106 PS3, Line 106: assert re.search(plan_root_sink_reservation_limit, profile) This assertion, however, is OK to delete, because it is not the focus of the test. I just add this to verify that ReservationLimit stay the same as before when MAX_ROW_SIZE does not contribute to maxMemReservationBytes. Similarly with the one I added in test_query_retries. Let me know what you think of. http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@419 PS3, Line 419: """These tests verify that while calculating max_reservation for spooling these query > nit: you can add a comment here saying that these tests verify that while c Done http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@479 PS3, Line 479: ") > nit: spills Done -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 26 Nov 2020 04:22:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16821 ) Change subject: IMPALA-9865: part 1: basic profile log parser .. Patch Set 2: (2 comments) > Uploaded patch set 2. Hi Tim, Thanks for writing this parser. This is very useful. I just have 2 comment. http://gerrit.cloudera.org:8080/#/c/16821/2/be/src/util/impala-profile-tool.cc File be/src/util/impala-profile-tool.cc: http://gerrit.cloudera.org:8080/#/c/16821/2/be/src/util/impala-profile-tool.cc@31 PS2, Line 31: // is pretty-printed to standard output. Add simple usage example in the doc maybe? like impala-profile-tool < impala_profile_log_1.1-1607057366897 http://gerrit.cloudera.org:8080/#/c/16821/2/be/src/util/impala-profile-tool.cc@59 PS2, Line 59: getline(cin, line); Tried to run the parser against my local runtime profile log. It seems It always hit "Error reading line" when it reach EOF. What if we move this getline as the loop condition? say for (std::string line; std::getline(cin, line); ) { -- To view, visit http://gerrit.cloudera.org:8080/16821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3 Gerrit-Change-Number: 16821 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 05 Dec 2020 02:48:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16765 to look at the new patch set (#7). Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation PlanRootSink can fail silently if result spooling is enabled and maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. This happens because results are spilled using a SpillableRowBatchQueue which needs 2 buffer (read and write) with at least MAX_ROW_SIZE bytes per buffer. This patch fixes this by setting a lower bound of 2 * MAX_ROW_SIZE while computing the min reservation for the PlanRootSink. Testing: - Pass exhaustive tests. - Add e2e TestResultSpoolingMaxReservation. - Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to extremely low value. Also verify that PLAN_ROOT_SINK's ReservationLimit remain unchanged after lowering the MAX_ROW_SIZE. Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 --- M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/spillable-row-batch-queue.cc M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M tests/custom_cluster/test_query_retries.py M tests/query_test/test_result_spooling.py 5 files changed, 118 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/16765/7 -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 7: (3 comments) Thanks Bikram, http://gerrit.cloudera.org:8080/#/c/16765/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16765/6//COMMIT_MSG@14 PS6, Line 14: computing the min reservation for the PlanRootS > nit: i think this description is slightly opaque since the reader would hav Done http://gerrit.cloudera.org:8080/#/c/16765/6/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/16765/6/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@90 PS6, Line 90: bufferSize, queryOptions.getMax_row_size()); > nit: can you add a small comment on why we need 2 maxRowBufferSize, either I edited the method documentation a bit to explain this. Hope it is clear. http://gerrit.cloudera.org:8080/#/c/16765/6/tests/query_test/test_result_spooling.py File tests/query_test/test_result_spooling.py: http://gerrit.cloudera.org:8080/#/c/16765/6/tests/query_test/test_result_spooling.py@426 PS6, Line 426: DEBUG_ACTION_V > nit: switch to ALL_CAPS to be consistent with rest of the codebase. Done -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 04 Dec 2020 04:55:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16821 ) Change subject: IMPALA-9865: part 1: basic profile log parser .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/16821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3 Gerrit-Change-Number: 16821 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Dec 2020 22:39:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/16765/1/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/16765/1/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@87 PS1, Line 87: getDefault_spillable_buffer_size > Is there any relation to this and the max_row_Size? My understanding is that DEFAULT_SPILLABLE_BUFFER_SIZE control the minimal unit of page size when increasing memory reservation. It is intended to tune performance in spill operation (larger buffer sizes result in Impala issuing larger I/O requests to storage devices, which might result in higher throughput) rather than in-memory allocation. it contribute to calculation of maxRowBufferSize, but not to maxMemReservationBytes. Let say user only set MAX_ROW_SIZE to 256MB, and leave DEFAULT_SPILLABLE_BUFFER_SIZE and MAX_RESULT_SPOOLING_MEM as default (2MB and 100 MB accordingly). Without the patch, the result is the following: bufferSize = 2MB maxRowBufferSize = 256MB (rounded up to nearest power of two) minMemReservationBytes = 4MB maxMemReservationBytes = 100MB maxMemReservationBytes should be at least 2 * MAX_ROW_SIZE for reservation of read+write page to be satisfied, which is not true in this case. http://gerrit.cloudera.org:8080/#/c/16765/1/tests/custom_cluster/test_query_retries.py File tests/custom_cluster/test_query_retries.py: http://gerrit.cloudera.org:8080/#/c/16765/1/tests/custom_cluster/test_query_retries.py@589 PS1, Line 589: 'max_row_size': 4 * 1024, > is this having any effect on the query? the max_result_spooling_mem is alre It intended to keep the test the same, keeping the maxMemReservationBytes to 8KB. Default MAX_ROW_SIZE is 512KB. With the proposed changes in PlannerRootSink.java and without explicitly lowering MAX_ROW_SIZE to 4KB, MAX_RESULT_SPOOLING_MEM will be ignored and maxMemReservationBytes will be set to 1MB instead of intended 8KB. -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 24 Nov 2020 01:28:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 6: (5 comments) http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16765/3//COMMIT_MSG@26 PS3, Line 26: > A better test is using SET_DENY_RESERVATION_PROBABILITY to verify the Buffe Added debug action SET_DENY_RESERVATION_PROBABILITY@1.0 in TestResultSpoolingMaxReservation. test_spilling_large_rows already exercise SET_DENY_RESERVATION_PROBABILITY with mt_dop=1. http://gerrit.cloudera.org:8080/#/c/16765/5/be/src/runtime/spillable-row-batch-queue.cc File be/src/runtime/spillable-row-batch-queue.cc: http://gerrit.cloudera.org:8080/#/c/16765/5/be/src/runtime/spillable-row-batch-queue.cc@97 PS5, Line 97: // have been set. > It'd be helpful to log the row size and batch_queue_->DebugString() here. Added row size and batch_queue_->DebugString() info in the log. However, this reminds me of IMPALA-9851. BufferedTupleStream::DebugString() iterate std::list that can potentially grow very large. While most references to this method are behind DCHEK logging (thus stripped in release build), one of them is used to build Status message through call to GroupingAggregator::Partition::DebugString() at this line: https://github.com/apache/impala/blob/5530b62/be/src/exec/grouping-aggregator-partition.cc#L155 IMPALA-9851 already cap error message to 128kb at most, but that does not eliminate the cost to iterate page list in BufferedTupleStream::DebugString(). Should I file this as separate Jira? http://gerrit.cloudera.org:8080/#/c/16765/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/16765/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@90 PS3, Line 90: long minMemReservationBytes = 2 * maxRowBufferSize; > We should be able to work using the minimal reservation. So I think the pro Make sense. Bump up minMemReservation instead of maxMemReservationBytes in patch set #6. http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py File tests/query_test/test_result_spooling.py: http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@68 PS3, Line 68: exec_options['max_row_size'] = 16 * 1024 > test_spilling and test_query_retries need to force spill by lowering spill Done http://gerrit.cloudera.org:8080/#/c/16765/3/tests/query_test/test_result_spooling.py@106 PS3, Line 106: assert re.search(plan_root_sink_reservation_limit, profile) > Thanks for the context above, makes sense to keep this too. Done -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 30 Nov 2020 22:32:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16765 to look at the new patch set (#6). Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation PlanRootSink can fail silently if result spooling is enabled and maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. This happens because results are spilled using a SpillableRowBatchQueue which needs 2 buffer (read and write) with at least MAX_ROW_SIZE bytes per buffer. This patch fixes this by setting the PlanRootSink's minMemReservationBytes as 2 * maxRowBufferSize. Testing: - Pass exhaustive tests. - Add e2e TestResultSpoolingMaxReservation. - Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to extremely low value. Also verify that PLAN_ROOT_SINK's ReservationLimit remain unchanged after lowering the MAX_ROW_SIZE. Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 --- M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/spillable-row-batch-queue.cc M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M tests/custom_cluster/test_query_retries.py M tests/query_test/test_result_spooling.py 5 files changed, 115 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/16765/6 -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16765 ) Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/16765/5/be/src/runtime/spillable-row-batch-queue.cc File be/src/runtime/spillable-row-batch-queue.cc: http://gerrit.cloudera.org:8080/#/c/16765/5/be/src/runtime/spillable-row-batch-queue.cc@97 PS5, Line 97: // have been set. > Good point! I didin't realize IMPALA-9851 when adding that DebugString() in Done. Filed as IMPALA-10374. -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 02 Dec 2020 18:34:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10390: impala-profile-tool JSON output
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16855 ) Change subject: IMPALA-10390: impala-profile-tool JSON output .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/16855/2/be/src/util/impala-profile-tool.cc File be/src/util/impala-profile-tool.cc: http://gerrit.cloudera.org:8080/#/c/16855/2/be/src/util/impala-profile-tool.cc@44 PS2, Line 44: " json: output as JSON with one profile per line\n" missing description for prettyjson option? http://gerrit.cloudera.org:8080/#/c/16855/2/be/src/util/impala-profile-tool.cc@120 PS2, Line 120: CHECK_EQ("json", profile_format); For json and prettyjson format, what if we return array of json profile? So if this parser return multiple profile, the output is still a valid json and can be piped to other program that read json like jq. -- To view, visit http://gerrit.cloudera.org:8080/16855 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I82ae0fe9379b7e3cbe93166adaa4c37212ea0f67 Gerrit-Change-Number: 16855 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 10 Dec 2020 21:16:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9856: Enable result spooling by default.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16755 ) Change subject: IMPALA-9856: Enable result spooling by default. .. Patch Set 3: (10 comments) Patch set 3 is a rebase after several commits get in, including IMPALA-10337. I add more descriptive comments on tests where we need to explicitly disable result spooling. We still need to wait for IMPALA-10371 to be resolved before we can enable result spooling by default. http://gerrit.cloudera.org:8080/#/c/16755/1/tests/custom_cluster/test_admission_controller.py File tests/custom_cluster/test_admission_controller.py: http://gerrit.cloudera.org:8080/#/c/16755/1/tests/custom_cluster/test_admission_controller.py@337 PS1, Line 337: _timeout_ > nit: mention briefly what we are trying to assert, that way we know from th Done http://gerrit.cloudera.org:8080/#/c/16755/1/tests/custom_cluster/test_admission_controller.py@873 PS1, Line 873: u > flake8: E203 whitespace before ':' Done http://gerrit.cloudera.org:8080/#/c/16755/2/tests/custom_cluster/test_admission_controller.py File tests/custom_cluster/test_admission_controller.py: http://gerrit.cloudera.org:8080/#/c/16755/2/tests/custom_cluster/test_admission_controller.py@883 PS2, Line 883: : > flake8: E203 whitespace before ':' Done http://gerrit.cloudera.org:8080/#/c/16755/1/tests/custom_cluster/test_observability.py File tests/custom_cluster/test_observability.py: http://gerrit.cloudera.org:8080/#/c/16755/1/tests/custom_cluster/test_observability.py@37 PS1, Line 37: ue to unresolve > nit: mention why will this cause a crash Done http://gerrit.cloudera.org:8080/#/c/16755/1/tests/custom_cluster/test_observability.py@38 PS1, Line 38: l > flake8: E203 whitespace before ':' Done http://gerrit.cloudera.org:8080/#/c/16755/2/tests/custom_cluster/test_observability.py File tests/custom_cluster/test_observability.py: http://gerrit.cloudera.org:8080/#/c/16755/2/tests/custom_cluster/test_observability.py@40 PS2, Line 40: : > flake8: E203 whitespace before ':' Done http://gerrit.cloudera.org:8080/#/c/16755/1/tests/query_test/test_udfs.py File tests/query_test/test_udfs.py: http://gerrit.cloudera.org:8080/#/c/16755/1/tests/query_test/test_udfs.py@623 PS1, Line 623: l > flake8: E203 whitespace before ':' Done http://gerrit.cloudera.org:8080/#/c/16755/1/tests/query_test/test_udfs.py@630 PS1, Line 630: > flake8: E501 line too long (98 > 90 characters) Done http://gerrit.cloudera.org:8080/#/c/16755/2/tests/query_test/test_udfs.py File tests/query_test/test_udfs.py: http://gerrit.cloudera.org:8080/#/c/16755/2/tests/query_test/test_udfs.py@625 PS2, Line 625: : > flake8: E203 whitespace before ':' Done http://gerrit.cloudera.org:8080/#/c/16755/2/tests/query_test/test_udfs.py@632 PS2, Line 632: > flake8: E501 line too long (98 > 90 characters) Done -- To view, visit http://gerrit.cloudera.org:8080/16755 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4 Gerrit-Change-Number: 16755 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 10 Dec 2020 18:38:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9856: Enable result spooling by default.
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16755 to look at the new patch set (#2). Change subject: IMPALA-9856: Enable result spooling by default. .. IMPALA-9856: Enable result spooling by default. Result spooling has been relatively stable since it was introduced, and it has several benefits described in IMPALA-8656. This patch enable result spooling (SPOOL_QUERY_RESULTS) query options by default. Furthermore, some tests need to be adjusted to account for result spooling by default. The following are the adjustment categories and list of tests that fall under such category. Change in assertions: PlannerTest#testAcidTableScans PlannerTest#testBloomFilterAssignment PlannerTest#testConstantFolding PlannerTest#testFkPkJoinDetection PlannerTest#testFkPkJoinDetectionWithHDFSNumRowsEstDisabled PlannerTest#testKuduSelectivity PlannerTest#testMaxRowSize PlannerTest#testMinMaxRuntimeFilters PlannerTest#testMinMaxRuntimeFiltersWithHDFSNumRowsEstDisabled PlannerTest#testMtDopValidation PlannerTest#testParquetFiltering PlannerTest#testParquetFilteringDisabled PlannerTest#testPartitionPruning PlannerTest#testPreaggBytesLimit PlannerTest#testResourceRequirements PlannerTest#testRuntimeFilterQueryOptions PlannerTest#testSortExprMaterialization PlannerTest#testSpillableBufferSizing PlannerTest#testTableSample PlannerTest#testTpch PlannerTest#testKuduTpch PlannerTest#testTpchNested PlannerTest#testUnion TpcdsPlannerTest custom_cluster/test_admission_controller.py::TestAdmissionController::test_dedicated_coordinator_planner_estimates custom_cluster/test_admission_controller.py::TestAdmissionController::test_memory_rejection custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_mem_limit_configs metadata/test_explain.py::TestExplain::test_explain_level2 metadata/test_explain.py::TestExplain::test_explain_level3 metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation Increase BUFFER_POOL_LIMIT: query_test/test_queries.py::TestQueries::test_analytic_fns query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filter_reservation query_test/test_sort.py::TestQueryFullSort::test_multiple_mem_limits_full_output query_test/test_spilling.py::TestSpillingBroadcastJoins::test_spilling_broadcast_joins query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_aggs query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive query_test/test_udfs.py::TestUdfExecution::test_mem_limits Increase MEM_LIMIT: query_test/test_mem_usage_scaling.py::TestExchangeMemUsage::test_exchange_mem_usage_scaling query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling Increase MAX_ROW_SIZE: custom_cluster/test_parquet_max_page_header.py::TestParquetMaxPageHeader::test_large_page_header_config query_test/test_insert.py::TestInsertQueries::test_insert_large_string query_test/test_query_mem_limit.py::TestQueryMemLimit::test_mem_limit query_test/test_scanners.py::TestTextSplitDelimiters::test_text_split_across_buffers_delimiter query_test/test_scanners.py::TestWideRow::test_wide_row Disable result spooling to maintain assertion: custom_cluster/test_admission_controller.py::TestAdmissionController::test_set_request_pool custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_host_memory custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_pool_memory custom_cluster/test_admission_controller.py::TestAdmissionController::test_queue_reasons_memory custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_config_change_while_queued custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_fetched_rows custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_finished_query custom_cluster/test_scratch_disk.py::TestScratchDir::test_no_dirs custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_existing_dirs custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_writable_dirs query_test/test_insert.py::TestInsertQueries::test_insert_large_string (the last query only) query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_kudu_scan_mem_usage query_test/test_queries.py::TestQueriesParquetTables::test_very_large_strings query_test/test_query_mem_limit.py::TestCodegenMemLimit::test_codegen_mem_limit shell/test_shell_client.py::TestShellClient::test_fetch_size Disable result spooling to avoid crash (IMPALA-10371): custom_cluster/test_observability.py::TestObservability::test_host_profile_jvm_gc_metrics query_test/test_udfs.py::TestUdfExecution::test_java_udfs query_test/test_udfs.py::TestUdfTargeted::test_udf_profile
[Impala-ASF-CR] IMPALA-9856: Enable result spooling by default.
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16755 to look at the new patch set (#3). Change subject: IMPALA-9856: Enable result spooling by default. .. IMPALA-9856: Enable result spooling by default. Result spooling has been relatively stable since it was introduced, and it has several benefits described in IMPALA-8656. This patch enable result spooling (SPOOL_QUERY_RESULTS) query options by default. Furthermore, some tests need to be adjusted to account for result spooling by default. The following are the adjustment categories and list of tests that fall under such category. Change in assertions: PlannerTest#testAcidTableScans PlannerTest#testBloomFilterAssignment PlannerTest#testConstantFolding PlannerTest#testFkPkJoinDetection PlannerTest#testFkPkJoinDetectionWithHDFSNumRowsEstDisabled PlannerTest#testKuduSelectivity PlannerTest#testMaxRowSize PlannerTest#testMinMaxRuntimeFilters PlannerTest#testMinMaxRuntimeFiltersWithHDFSNumRowsEstDisabled PlannerTest#testMtDopValidation PlannerTest#testParquetFiltering PlannerTest#testParquetFilteringDisabled PlannerTest#testPartitionPruning PlannerTest#testPreaggBytesLimit PlannerTest#testResourceRequirements PlannerTest#testRuntimeFilterQueryOptions PlannerTest#testSortExprMaterialization PlannerTest#testSpillableBufferSizing PlannerTest#testTableSample PlannerTest#testTpch PlannerTest#testKuduTpch PlannerTest#testTpchNested PlannerTest#testUnion TpcdsPlannerTest custom_cluster/test_admission_controller.py::TestAdmissionController::test_dedicated_coordinator_planner_estimates custom_cluster/test_admission_controller.py::TestAdmissionController::test_memory_rejection custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_mem_limit_configs metadata/test_explain.py::TestExplain::test_explain_level2 metadata/test_explain.py::TestExplain::test_explain_level3 metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation Increase BUFFER_POOL_LIMIT: query_test/test_queries.py::TestQueries::test_analytic_fns query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filter_reservation query_test/test_sort.py::TestQueryFullSort::test_multiple_mem_limits_full_output query_test/test_spilling.py::TestSpillingBroadcastJoins::test_spilling_broadcast_joins query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_aggs query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive query_test/test_udfs.py::TestUdfExecution::test_mem_limits Increase MEM_LIMIT: query_test/test_mem_usage_scaling.py::TestExchangeMemUsage::test_exchange_mem_usage_scaling query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling Increase MAX_ROW_SIZE: custom_cluster/test_parquet_max_page_header.py::TestParquetMaxPageHeader::test_large_page_header_config query_test/test_insert.py::TestInsertQueries::test_insert_large_string query_test/test_query_mem_limit.py::TestQueryMemLimit::test_mem_limit query_test/test_scanners.py::TestTextSplitDelimiters::test_text_split_across_buffers_delimiter query_test/test_scanners.py::TestWideRow::test_wide_row Disable result spooling to maintain assertion: custom_cluster/test_admission_controller.py::TestAdmissionController::test_set_request_pool custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_host_memory custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_pool_memory custom_cluster/test_admission_controller.py::TestAdmissionController::test_queue_reasons_memory custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_config_change_while_queued custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_fetched_rows custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_finished_query custom_cluster/test_scratch_disk.py::TestScratchDir::test_no_dirs custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_existing_dirs custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_writable_dirs query_test/test_insert.py::TestInsertQueries::test_insert_large_string (the last query only) query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_kudu_scan_mem_usage query_test/test_queries.py::TestQueriesParquetTables::test_very_large_strings query_test/test_query_mem_limit.py::TestCodegenMemLimit::test_codegen_mem_limit shell/test_shell_client.py::TestShellClient::test_fetch_size Disable result spooling to avoid crash (IMPALA-10371): custom_cluster/test_observability.py::TestObservability::test_host_profile_jvm_gc_metrics query_test/test_udfs.py::TestUdfExecution::test_java_udfs query_test/test_udfs.py::TestUdfTargeted::test_udf_profile
[Impala-ASF-CR] IMPALA-10390: impala-profile-tool JSON output
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16855 ) Change subject: IMPALA-10390: impala-profile-tool JSON output .. Patch Set 3: Code-Review+1 (1 comment) Thank you, Tim. LGTM! http://gerrit.cloudera.org:8080/#/c/16855/2/be/src/util/impala-profile-tool.cc File be/src/util/impala-profile-tool.cc: http://gerrit.cloudera.org:8080/#/c/16855/2/be/src/util/impala-profile-tool.cc@120 PS2, Line 120: if (profile_format == "text") { > The one line per document is pretty common and is a bit easier to handle wi Ah, didn't now there is a .jsonl format. I learn new thing. -- To view, visit http://gerrit.cloudera.org:8080/16855 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I82ae0fe9379b7e3cbe93166adaa4c37212ea0f67 Gerrit-Change-Number: 16855 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 11 Dec 2020 02:10:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString
Hello Quanlong Huang, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16884 to look at the new patch set (#2). Change subject: IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString .. IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString BufferedTupleStream::DebugString() iterate std::list that can potentially grow very large. As consequent, the returned string can grow large as well and cause a problem as previously happen in IMPALA-9851. With this patch, BufferedTupleStream::DebugString() only include maximum of 100 first pages of page list. Testing: - Add new be test SimpleTupleStreamTest.ShortDebugString in buffered-tuple-stream-test.cc - Pass core tests Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d --- M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h 3 files changed, 73 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/16884/2 -- To view, visit http://gerrit.cloudera.org:8080/16884 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d Gerrit-Change-Number: 16884 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16884 ) Change subject: IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/16884/1/be/src/runtime/buffered-tuple-stream-test.cc File be/src/runtime/buffered-tuple-stream-test.cc: http://gerrit.cloudera.org:8080/#/c/16884/1/be/src/runtime/buffered-tuple-stream-test.cc@1472 PS1, Line 1472: SimpleTupleStreamTest > nit: To be consistent with other tests, I think the label should be "Simple Done http://gerrit.cloudera.org:8080/#/c/16884/1/be/src/runtime/buffered-tuple-stream-test.cc@1485 PS1, Line 1485: bool b = stream.AddRow(batch->GetRow(j), ); > I think we can remove this TODO. Done http://gerrit.cloudera.org:8080/#/c/16884/1/be/src/runtime/buffered-tuple-stream-test.cc@1508 PS1, Line 1508: > Can we calculate 148 out based on the vars? I'm concerning it will introduc It is not obvious for me to calculate it based on the vars. So instead, I change it to look up stream.num_pages_ directly and add verification that num_pages_ > MAX_PAGE_ITER_DEBUG, given the workload. http://gerrit.cloudera.org:8080/#/c/16884/2/be/src/runtime/buffered-tuple-stream.h File be/src/runtime/buffered-tuple-stream.h: http://gerrit.cloudera.org:8080/#/c/16884/2/be/src/runtime/buffered-tuple-stream.h@563 PS2, Line 563: friend class ArrayTupleStreamTest_TestArrayDeepCopy_Test; : friend class ArrayTupleStreamTest_TestComputeRowSize_Test; : friend class MultiNullableTupleStreamTest_TestComputeRowSize_Test; I notice that since IMPALA-10337 changed BufferedTupleStream::ComputeRowSize() from private to public, these lines are not required anymore. Can I delete them along with this patch? http://gerrit.cloudera.org:8080/#/c/16884/1/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: http://gerrit.cloudera.org:8080/#/c/16884/1/be/src/runtime/buffered-tuple-stream.cc@189 PS1, Line 189: \ > Comparing to the original codes, we should have "\n" here. Done. Thanks for catching this! -- To view, visit http://gerrit.cloudera.org:8080/16884 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d Gerrit-Change-Number: 16884 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 21 Dec 2020 22:14:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16884 Change subject: IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString .. IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString BufferedTupleStream::DebugString() iterate std::list that can potentially grow very large. As consequent, the returned string can grow large as well and cause a problem as previously happen in IMPALA-9851. With this patch, BufferedTupleStream::DebugString() only include maximum of 100 first pages of page list. Testing: - Add new be test SimpleNullStreamTest.ShortDebugString in buffered-tuple-stream-test.cc - Pass core tests Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d --- M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h 3 files changed, 71 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/16884/1 -- To view, visit http://gerrit.cloudera.org:8080/16884 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d Gerrit-Change-Number: 16884 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString
Hello Quanlong Huang, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16884 to look at the new patch set (#3). Change subject: IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString .. IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString BufferedTupleStream::DebugString() iterate std::list that can potentially grow very large. As consequent, the returned string can grow large as well and cause a problem as previously happen in IMPALA-9851. With this patch, BufferedTupleStream::DebugString() only include maximum of 100 first pages of page list. Testing: - Add new be test SimpleTupleStreamTest.ShortDebugString in buffered-tuple-stream-test.cc - Pass core tests Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d --- M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h 3 files changed, 73 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/16884/3 -- To view, visit http://gerrit.cloudera.org:8080/16884 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d Gerrit-Change-Number: 16884 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-9865: part 2/2: add verbosity to profile tool
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16881 ) Change subject: IMPALA-9865: part 2/2: add verbosity to profile tool .. Patch Set 8: Code-Review+1 (2 comments) Thanks for the explanation. LGTM. http://gerrit.cloudera.org:8080/#/c/16881/7/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/16881/7/be/src/util/runtime-profile.cc@2723 PS7, Line 2723: // Legacy profile did not show per-instance stats for averaged profile. > Done Resolved http://gerrit.cloudera.org:8080/#/c/16881/6/tests/observability/test_profile_tool.py File tests/observability/test_profile_tool.py: http://gerrit.cloudera.org:8080/#/c/16881/6/tests/observability/test_profile_tool.py@36 PS6, Line 36: self._compare_profile_tool_output([], > Yeah, producing identical output is not necessarily the goal of this tool, Got it, thanks! -- To view, visit http://gerrit.cloudera.org:8080/16881 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I82618a813e29af7996dfaed78873b2a73bc0231d Gerrit-Change-Number: 16881 Gerrit-PatchSet: 8 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 22 Dec 2020 06:58:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16884 ) Change subject: IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/16884/2/be/src/runtime/buffered-tuple-stream.h File be/src/runtime/buffered-tuple-stream.h: http://gerrit.cloudera.org:8080/#/c/16884/2/be/src/runtime/buffered-tuple-stream.h@563 PS2, Line 563: friend class SimpleTupleStreamTest_ShortDebugString_Test; : : /// Runtime state instance used to check for cancellation. Not own > Yeah, please delete them. Done -- To view, visit http://gerrit.cloudera.org:8080/16884 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d Gerrit-Change-Number: 16884 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 22 Dec 2020 03:26:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9865: part 2/2: add verbosity to profile tool
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16881 ) Change subject: IMPALA-9865: part 2/2: add verbosity to profile tool .. Patch Set 7: (6 comments) I have 1 follow up question and 1 new finding. http://gerrit.cloudera.org:8080/#/c/16881/6/be/src/util/impala-profile-tool.cc File be/src/util/impala-profile-tool.cc: http://gerrit.cloudera.org:8080/#/c/16881/6/be/src/util/impala-profile-tool.cc@88 PS6, Line 88: RuntimeProfileBase::Verbosity configured_verbosity = > We shouldn't actually be using the default value, since we only use if it's Done http://gerrit.cloudera.org:8080/#/c/16881/6/be/src/util/impala-profile-tool.cc@136 PS6, Line 136: if (FLAGS_profile_verbosity == "") { > Yes and no. We can set the verbosity level to any level for any profile, bu Make sense. Thanks for clarifying this. http://gerrit.cloudera.org:8080/#/c/16881/7/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/16881/7/be/src/util/runtime-profile.cc@2723 PS7, Line 2723: if (verbosity <= Verbosity::LEGACY) continue; Looks like this condition can be flipped and merged with the branch bellow? if (verbosity >= Verbosity::DEFAULT && v.second.second.size() > 1) { http://gerrit.cloudera.org:8080/#/c/16881/6/bin/rat_exclude_files.txt File bin/rat_exclude_files.txt: http://gerrit.cloudera.org:8080/#/c/16881/6/bin/rat_exclude_files.txt@157 PS6, Line 157: testdata/impala-profiles/impala_profile_log_tpcds_compute_stats.expected.txt > Done Resolved http://gerrit.cloudera.org:8080/#/c/16881/6/tests/observability/test_profile_tool.py File tests/observability/test_profile_tool.py: http://gerrit.cloudera.org:8080/#/c/16881/6/tests/observability/test_profile_tool.py@36 PS6, Line 36: self._compare_profile_tool_output([], > Leaving --profile_verbosity unset is different from setting it to --legacy Ok, I understand the difference now. Setting --profile_verbosity=legacy means forcing legacy format, regardless of the profile_version. Since we don't test with --profile_verbosity=legacy, am I correct to assume that accurate backward-compatible parsing of v2 profile to v1 representation is not the goal of impala-profile-tool? http://gerrit.cloudera.org:8080/#/c/16881/6/tests/observability/test_profile_tool.py@38 PS6, Line 38: get_profile_path('impala_profile_log_tpcds_compute_stats.expected.txt')) > See above comment - this is actually kinda different from legacy Got it, thanks! -- To view, visit http://gerrit.cloudera.org:8080/16881 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I82618a813e29af7996dfaed78873b2a73bc0231d Gerrit-Change-Number: 16881 Gerrit-PatchSet: 7 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 22 Dec 2020 03:12:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16884 ) Change subject: IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString .. Patch Set 4: > Patch Set 4: Verified-1 > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6800/ Hit by IMPALA-9550. Ironically, I just close IMPALA-9550 yesterday as not reproducible. I will reopen the JIRA again. -- To view, visit http://gerrit.cloudera.org:8080/16884 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d Gerrit-Change-Number: 16884 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 22 Dec 2020 15:46:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test fetch
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16895 Change subject: IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch .. IMPALA-9550: Fix flakiness in TestResultSpoolingFetchSize.test_fetch TestResultSpoolingFetchSize.test_fetch has been flaky in ubuntu-16.04-dockerised environment for not reaching finished state within 10 seconds. This patch increase the timeout of the test to 30 seconds. Testing: - Looped the test locally. Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5 --- M tests/query_test/test_result_spooling.py 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/16895/1 -- To view, visit http://gerrit.cloudera.org:8080/16895 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id2e8a9db904da5f1e4acc9e18b3987b8a4ec24e5 Gerrit-Change-Number: 16895 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
[Impala-ASF-CR] IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16884 ) Change subject: IMPALA-10374: Limit iteration at BufferedTupleStream::DebugString .. Patch Set 5: Patch set 5 is a rebase on top of IMPALA-9550 fix. The flakiness should be gone now. -- To view, visit http://gerrit.cloudera.org:8080/16884 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6626c8d54f35f303c01f85be1dd9aa54c8ad9a2d Gerrit-Change-Number: 16884 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 23 Dec 2020 01:33:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9865: part 2/2: add verbosity to profile tool
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16881 ) Change subject: IMPALA-9865: part 2/2: add verbosity to profile tool .. Patch Set 6: (5 comments) http://gerrit.cloudera.org:8080/#/c/16881/6/be/src/util/impala-profile-tool.cc File be/src/util/impala-profile-tool.cc: http://gerrit.cloudera.org:8080/#/c/16881/6/be/src/util/impala-profile-tool.cc@88 PS6, Line 88: RuntimeProfileBase::Verbosity configured_verbosity; Is this meant to be initialized as RuntimeProfileBase::Verbosity::LEGACY ? http://gerrit.cloudera.org:8080/#/c/16881/6/be/src/util/impala-profile-tool.cc@136 PS6, Line 136: verbosity = profile_version < 2 ? RuntimeProfileBase::Verbosity::LEGACY : In case of v1 profile input and --profile_verbosity=default, this line does not seem to override it back to Verbosity::LEGACY. Does this implies that both v1 and v2 profile input can be parsed into any verbosity level? http://gerrit.cloudera.org:8080/#/c/16881/6/bin/rat_exclude_files.txt File bin/rat_exclude_files.txt: http://gerrit.cloudera.org:8080/#/c/16881/6/bin/rat_exclude_files.txt@157 PS6, Line 157: testdata/impala-profiles/impala_profile_log_tpcds_compute_stats.expected Should we add `.txt` suffix for expected output in text format? http://gerrit.cloudera.org:8080/#/c/16881/6/tests/observability/test_profile_tool.py File tests/observability/test_profile_tool.py: http://gerrit.cloudera.org:8080/#/c/16881/6/tests/observability/test_profile_tool.py@36 PS6, Line 36: self._compare_profile_tool_output([], Since default verbosity might change in the future, can we add a test using argument `--profile_verbosity=legacy` ? Should we also test between v1 vs v2 profile input when --profile_verbosity is not specified? http://gerrit.cloudera.org:8080/#/c/16881/6/tests/observability/test_profile_tool.py@38 PS6, Line 38: get_profile_path('impala_profile_log_tpcds_compute_stats.expected')) Rename expected file name to "impala_profile_log_tpcds_compute_stats_legacy.expected" for clarity? -- To view, visit http://gerrit.cloudera.org:8080/16881 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I82618a813e29af7996dfaed78873b2a73bc0231d Gerrit-Change-Number: 16881 Gerrit-PatchSet: 6 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 21 Dec 2020 19:18:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9856: Enable result spooling by default.
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16755 Change subject: IMPALA-9856: Enable result spooling by default. .. IMPALA-9856: Enable result spooling by default. Result spooling has been relatively stable since it was introduced, and it has several benefits described in IMPALA-8656. This patch enable result spooling (SPOOL_QUERY_RESULTS) query options by default. Furthermore, some tests need to be adjusted to account for result spooling by default. The following are the adjustment categories and list of tests that fall under such category. Change in assertions: PlannerTest TpcdsPlannerTest custom_cluster/test_admission_controller.py::TestAdmissionController::test_dedicated_coordinator_planner_estimates custom_cluster/test_admission_controller.py::TestAdmissionController::test_memory_rejection custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_mem_limit_configs metadata/test_explain.py::TestExplain::test_explain_level2 metadata/test_explain.py::TestExplain::test_explain_level3 metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation Increase BUFFER_POOL_LIMIT: query_test/test_queries.py::TestQueries::test_analytic_fns query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filter_reservation query_test/test_spilling.py::TestSpillingBroadcastJoins::test_spilling_broadcast_joins query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_aggs query_test/test_udfs.py::TestUdfExecution::test_mem_limits Increase MEM_LIMIT: query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling Increase MAX_ROW_SIZE: custom_cluster/test_parquet_max_page_header.py::TestParquetMaxPageHeader::test_large_page_header_config query_test/test_scanners.py::TestTextSplitDelimiters::test_text_split_across_buffers_delimiter Disable result spooling to maintain assertion: custom_cluster/test_admission_controller.py::TestAdmissionController::test_set_request_pool custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_host_memory custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_pool_memory custom_cluster/test_admission_controller.py::TestAdmissionController::test_queue_reasons_memory custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_fetched_rows custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_finished_query custom_cluster/test_scratch_disk.py::TestScratchDir::test_no_dirs custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_existing_dirs custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_writable_dirs query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_kudu_scan_mem_usage query_test/test_query_mem_limit.py::TestCodegenMemLimit::test_codegen_mem_limit query_test/test_query_mem_limit.py::TestQueryMemLimit::test_mem_limit query_test/test_sort.py::TestQueryFullSort::test_multiple_mem_limits_full_output query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive shell/test_shell_client.py::TestShellClient::test_fetch_size Disable result spooling to avoid crash / silent error: custom_cluster/test_observability.py::TestObservability::test_host_profile_jvm_gc_metrics query_test/test_insert.py::TestInsertQueries::test_insert_large_string query_test/test_queries.py::TestQueriesParquetTables::test_very_large_strings query_test/test_scanners.py::TestWideRow::test_wide_row query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_large_rows query_test/test_udfs.py::TestUdfExecution::test_java_udfs query_test/test_udfs.py::TestUdfTargeted::test_udf_profile Further investigation need to be done to address the last category. Testing: - Pass exhaustive tests. Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4 --- M common/thrift/ImpalaInternalService.thrift M testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection-hdfs-num-rows-est-enabled.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test M
[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16765 Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation .. IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation PlanRootSink can fail silently if result spooling is enabled and maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. Underneath, a SpillableRowBatchQueue need 2 buffer (read and write) each to fit at least MAX_ROW_SIZE bytes. This patch change the ResourceProfile's maxMemReservationBytes as: max(minMemReservationBytes, MAX_RESULT_SPOOLING_MEM, 2 * MAX_ROW_SIZE) Testing: - Pass exhaustive tests. - Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to extremely low value. Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 --- M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M tests/custom_cluster/test_query_retries.py M tests/query_test/test_result_spooling.py 3 files changed, 4 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/16765/1 -- To view, visit http://gerrit.cloudera.org:8080/16765 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726 Gerrit-Change-Number: 16765 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
[Impala-ASF-CR] IMPALA-9856: Enable result spooling by default.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16755 ) Change subject: IMPALA-9856: Enable result spooling by default. .. Patch Set 1: I file IMPALA-10337 separately to solve the crash / silent error case related to large rows. I think that should get in first before this patch. I will investigate the other one that involve Udf. -- To view, visit http://gerrit.cloudera.org:8080/16755 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4 Gerrit-Change-Number: 16755 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 21 Nov 2020 16:24:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10266: Identify FileSystem type based on the protocol scheme.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16628 ) Change subject: IMPALA-10266: Identify FileSystem type based on the protocol scheme. .. Patch Set 1: > Patch Set 1: > > It looks like there are some more instanceof usages sprinkled around the code: > > $ git grep 'instanceof.*FileSy' > fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java: if (!(fs > instanceof DistributedFileSystem) && !(fs instanceof S3AFileSystem) && > fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java: !(fs > instanceof AzureBlobFileSystem) && > fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java: !(fs > instanceof SecureAzureBlobFileSystem) && > fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java: !(fs > instanceof AdlFileSystem) && > fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java: !(fs > instanceof OzoneFileSystem)) { > fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java: boolean > shouldCheckPerms = !(fs instanceof AdlFileSystem || > fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java:fs > instanceof AzureBlobFileSystem || fs instanceof SecureAzureBlobFileSystem); > fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: > if (FileSystemUtil.getDefaultFileSystem() instanceof DistributedFileSystem) { > fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: > if (FileSystemUtil.getDefaultFileSystem() instanceof DistributedFileSystem) { > fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: > Preconditions.checkState(fs instanceof DistributedFileSystem); > fe/src/main/java/org/apache/impala/service/JniFrontend.java: if (!(fs > instanceof DistributedFileSystem || > fe/src/main/java/org/apache/impala/service/JniFrontend.java:fs > instanceof S3AFileSystem || > fe/src/main/java/org/apache/impala/service/JniFrontend.java:fs > instanceof AzureBlobFileSystem || > fe/src/main/java/org/apache/impala/service/JniFrontend.java:fs > instanceof SecureAzureBlobFileSystem || > fe/src/main/java/org/apache/impala/service/JniFrontend.java:fs > instanceof AdlFileSystem)) { Ack. Will look into these codes. -- To view, visit http://gerrit.cloudera.org:8080/16628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b Gerrit-Change-Number: 16628 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 22 Oct 2020 17:47:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10266: Identify FileSystem type based on the protocol scheme.
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16628 Change subject: IMPALA-10266: Identify FileSystem type based on the protocol scheme. .. IMPALA-10266: Identify FileSystem type based on the protocol scheme. Frontend identifies the type of FileSystem in two ways. The first is done using the instanceof operator with subclasses of org.apache.hadoop.fs.FileSystem. The second is by checking the FileSystem protocol scheme. This patch standardizes the FileSystem identification based on the scheme only. Testing: - Add testSupportStorageIds and testWriteableByImpala in FileSystemUtilTest. - Run and pass core tests. Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b --- M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java 2 files changed, 141 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/16628/1 -- To view, visit http://gerrit.cloudera.org:8080/16628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b Gerrit-Change-Number: 16628 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
[Impala-ASF-CR] IMPALA-10266: Identify FileSystem type based on the protocol scheme.
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16628 to look at the new patch set (#4). Change subject: IMPALA-10266: Identify FileSystem type based on the protocol scheme. .. IMPALA-10266: Identify FileSystem type based on the protocol scheme. Frontend identifies the type of FileSystem in two ways. The first is done using the instanceof operator with subclasses of org.apache.hadoop.fs.FileSystem. The second is by checking the FileSystem protocol scheme. This patch standardizes the FileSystem identification based on the scheme only. Testing: - Add several tests in FileSystemUtilTest to check validity of some FileSystemUtil functions. - Run and pass core tests. Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b --- M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java 5 files changed, 261 insertions(+), 67 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/16628/4 -- To view, visit http://gerrit.cloudera.org:8080/16628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b Gerrit-Change-Number: 16628 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10266: Identify FileSystem type based on the protocol scheme.
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16628 to look at the new patch set (#2). Change subject: IMPALA-10266: Identify FileSystem type based on the protocol scheme. .. IMPALA-10266: Identify FileSystem type based on the protocol scheme. Frontend identifies the type of FileSystem in two ways. The first is done using the instanceof operator with subclasses of org.apache.hadoop.fs.FileSystem. The second is by checking the FileSystem protocol scheme. This patch standardizes the FileSystem identification based on the scheme only. Testing: - Add several tests in FileSystemUtilTest to check validity of some FileSystemUtil functions. - Run and pass core tests. Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b --- M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java 5 files changed, 220 insertions(+), 60 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/16628/2 -- To view, visit http://gerrit.cloudera.org:8080/16628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b Gerrit-Change-Number: 16628 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10266: Identify FileSystem type based on the protocol scheme.
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16628 to look at the new patch set (#3). Change subject: IMPALA-10266: Identify FileSystem type based on the protocol scheme. .. IMPALA-10266: Identify FileSystem type based on the protocol scheme. Frontend identifies the type of FileSystem in two ways. The first is done using the instanceof operator with subclasses of org.apache.hadoop.fs.FileSystem. The second is by checking the FileSystem protocol scheme. This patch standardizes the FileSystem identification based on the scheme only. Testing: - Add several tests in FileSystemUtilTest to check validity of some FileSystemUtil functions. - Run and pass core tests. Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b --- M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java 5 files changed, 222 insertions(+), 65 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/16628/3 -- To view, visit http://gerrit.cloudera.org:8080/16628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b Gerrit-Change-Number: 16628 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10266: Identify FileSystem type based on the protocol scheme.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16628 ) Change subject: IMPALA-10266: Identify FileSystem type based on the protocol scheme. .. Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/16628/3/fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java File fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java: http://gerrit.cloudera.org:8080/#/c/16628/3/fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java@63 PS3, Line 63: Preconditions.checkNotNull(tableName); > Maybe move this to FileSystemUtil? That way all these lists of capabilities Done http://gerrit.cloudera.org:8080/#/c/16628/3/fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java@161 PS3, Line 161: > Can we generate this error message from the list of supported schemes? I th Done http://gerrit.cloudera.org:8080/#/c/16628/3/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: http://gerrit.cloudera.org:8080/#/c/16628/3/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@72 PS3, Line 72: ImmutableSet.builder() > I like this pattern! Agree, I feel it is easier to understand by grouping them like this. http://gerrit.cloudera.org:8080/#/c/16628/2/fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java File fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java: http://gerrit.cloudera.org:8080/#/c/16628/2/fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java@100 PS2, Line 100: testFsType( > line too long (91 > 90) Done http://gerrit.cloudera.org:8080/#/c/16628/2/fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java@186 PS2, Line 186: return "adl://dummy-account.azuredatalakestore.net/dummy-part-3"; > line too long (91 > 90) Done -- To view, visit http://gerrit.cloudera.org:8080/16628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b Gerrit-Change-Number: 16628 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 22 Oct 2020 21:13:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10266: Identify FileSystem type based on the protocol scheme.
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16628 to look at the new patch set (#5). Change subject: IMPALA-10266: Identify FileSystem type based on the protocol scheme. .. IMPALA-10266: Identify FileSystem type based on the protocol scheme. Frontend identifies the type of FileSystem in two ways. The first is done using the instanceof operator with subclasses of org.apache.hadoop.fs.FileSystem. The second is by checking the FileSystem protocol scheme. This patch standardizes the FileSystem identification based on the scheme only. Testing: - Add several tests in FileSystemUtilTest to check validity of some FileSystemUtil functions. - Run and pass core tests. Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b --- M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java 5 files changed, 261 insertions(+), 67 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/16628/5 -- To view, visit http://gerrit.cloudera.org:8080/16628 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b Gerrit-Change-Number: 16628 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16963 ) Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits .. Patch Set 2: (2 comments) Thank you for the review. Patch set 2 move the handle variable and scope exit trigger closer around the file handle retrieval. http://gerrit.cloudera.org:8080/#/c/16963/1/be/src/runtime/io/hdfs-file-reader.cc File be/src/runtime/io/hdfs-file-reader.cc: http://gerrit.cloudera.org:8080/#/c/16963/1/be/src/runtime/io/hdfs-file-reader.cc@99 PS1, Line 99: Status status = Status::OK(); : { : ScopedTimer req_context_read_timer( : scan_range_->reader_->read_timer_); : bool logged_slow_read = false; // True if we already logged about a slow read. : : // Try reading from the remote data cache if it's enabled for the scan range. : DataCache* remote_data_cache = io_mgr->remote_data_cache(); : bool try_data_cache = scan_range_->UseDataCache() && remote_data_cache != nullptr; : i > Nit: Since these are now initialized later in the function, I would also mo Done http://gerrit.cloudera.org:8080/#/c/16963/1/be/src/runtime/io/hdfs-file-reader.cc@126 PS1, Line 126: hdfsFile hdfs_file; > For the clang-tidy issue you are seeing here: Thank you. I didn't read thoroughly to realize that the logic later won't execute if all bytes successfully read from data cache. -- To view, visit http://gerrit.cloudera.org:8080/16963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 Gerrit-Change-Number: 16963 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 21 Jan 2021 17:13:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits
Hello Zoltan Borok-Nagy, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16963 to look at the new patch set (#2). Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits .. IMPALA-10147: Avoid getting a file handle for data cache hits When reading from the data cache, the disk IO thread first gets a file handle, then it checks the data cache for a hit. The file handle is only used if there is a data cache miss. It is not used when data cache hit and in turns becomes an overhead. This patch move the file handle retrieval later when data cache miss hapens. Testing: - Add custom cluster test test_no_fd_caching_on_cached_data. - Pass core tests. Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 --- M be/src/runtime/io/hdfs-file-reader.cc M tests/custom_cluster/test_hdfs_fd_caching.py 2 files changed, 71 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/16963/2 -- To view, visit http://gerrit.cloudera.org:8080/16963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 Gerrit-Change-Number: 16963 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10147: Avoid getting a file handle for data cache hits
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16963 Change subject: IMPALA-10147: Avoid getting a file handle for data cache hits .. IMPALA-10147: Avoid getting a file handle for data cache hits When reading from the data cache, the disk IO thread first gets a file handle, then it checks the data cache for a hit. The file handle is only used if there is a data cache miss. It is not used when data cache hit and in turns becomes an overhead. This patch move the file handle retrieval later when data cache miss hapens. Testing: - Add custom cluster test test_no_fd_caching_on_cached_data. - Pass core tests. Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 --- M be/src/runtime/io/hdfs-file-reader.cc M tests/custom_cluster/test_hdfs_fd_caching.py 2 files changed, 59 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/16963/1 -- To view, visit http://gerrit.cloudera.org:8080/16963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Icc68f233518f862454e87bcbbef14d65fcdb7c91 Gerrit-Change-Number: 16963 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
[Impala-ASF-CR] IMPALA-9355: TestExchangeMemUsage.test exchange mem usage scaling doesn't hit the memory limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17586 ) Change subject: IMPALA-9355: TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory limit .. Patch Set 1: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/17586 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia4ad4508028645b67de419cfdfa2327d2847cfc4 Gerrit-Change-Number: 17586 Gerrit-PatchSet: 1 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 11 Jun 2021 23:22:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 28: (3 comments) http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc@337 PS27, Line 337: for (std::vector::const_iterator it = minmax_filter_ctxs_.begin(); > Sounds like a good idea. Looks good, thanks! http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc@404 PS27, Line 404: } > Partial filter publishing with propagation to scan nodes may be a little bi Make sense. I suppose execution nodes consuming the minmax filters also still need to wait for the remaining filters to arrive before start reading. http://gerrit.cloudera.org:8080/#/c/17295/28/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/17295/28/be/src/exec/partitioned-hash-join-builder.cc@335 PS28, Line 335: if (filter_ctxs_.size() == 0) return; Looks like we can remove this branch? In case minmax_filter_ctxs_ is empty, the loop below will stop immediately. -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 28 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 May 2021 18:09:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10650: Bailout min/max filters in hash join builder early
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17295 ) Change subject: IMPALA-10650: Bailout min/max filters in hash join builder early .. Patch Set 27: (2 comments) Hi Qifan, I wonder if we can improve the minmax filter performance from the build side. I have the following questions and comments. http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc File be/src/exec/partitioned-hash-join-builder.cc: http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc@337 PS27, Line 337: for (const FilterContext& ctx : filter_ctxs_) { I wonder if we can speed this up by iterating ONLY the minmax filters. Maybe copy reference of the minmax filters into separate vector? This function seems to be called frequently on every PhjBuilder::AddBatch. I imagine if minmax filter is enabled, only half of filter_ctxs_ elements are actually minmax filter. We can also pop filter out of the vector once it deemed not useful, therefore speeding up the next iteration. http://gerrit.cloudera.org:8080/#/c/17295/27/be/src/exec/partitioned-hash-join-builder.cc@404 PS27, Line 404: PublishRuntimeFilters(num_build_rows); It seems to me that PublishRuntimeFilters is only called here in FinalizeBuild (I assume near the end of the build process). Since minmax filter can be quickly disabled after reading few early RowBatch, shall we consider to publish them as soon as possible? Say, immediately publish disabled minmax filter from PhjBuilder::DetermineUsefulnessForMinmaxFilters()? -- To view, visit http://gerrit.cloudera.org:8080/17295 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183 Gerrit-Change-Number: 17295 Gerrit-PatchSet: 27 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 18 May 2021 02:52:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page if stream only has 2 pages.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17195 ) Change subject: IMPALA-10584: Defer advancing read page if stream only has 2 pages. .. Patch Set 7: > Patch Set 7: Verified-1 > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7091/ Looks like one of the maven dependency download failed 08:31:54 08:31:54 [ERROR] Failed to execute goal on project impala-executor-deps: Could not resolve dependencies for project org.apache.impala:impala-executor-deps:pom:4.0.0-SNAPSHOT: Failed to collect dependencies at com.google.cloud.bigdataoss:gcs-connector:jar:2.1.2.7.2.9.0-146 -> com.google.cloud.bigdataoss:util:jar:2.1.2.7.2.9.0-146: Failed to read artifact descriptor for com.google.cloud.bigdataoss:util:jar:2.1.2.7.2.9.0-146: Could not transfer artifact com.google.cloud.bigdataoss:util:pom:2.1.2.7.2.9.0-146 from/to impala.cdp.repo (https://native-toolchain.s3.amazonaws.com/build/cdp_components/11920537/maven): Connection reset -> [Help 1] -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 22 Apr 2021 15:58:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10497: Fix flakiness in test no fd caching on cached data.
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17054 Change subject: IMPALA-10497: Fix flakiness in test_no_fd_caching_on_cached_data. .. IMPALA-10497: Fix flakiness in test_no_fd_caching_on_cached_data. test_no_fd_caching_on_cached_data has been flaky for not having all of the data fully cached in warm up phase. This patch fix the test by: 1. Reduce the number of rows written to table cachefd.simple. 2. Repeat the warm up query 5 times. 3. Lower the cluster size to 1. Testing: - Loop the test manually 100 times and see no more failures. Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952 --- M tests/custom_cluster/test_hdfs_fd_caching.py 1 file changed, 31 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/17054/1 -- To view, visit http://gerrit.cloudera.org:8080/17054 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952 Gerrit-Change-Number: 17054 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
[Impala-ASF-CR] IMPALA-10497: Fix flakiness in test no fd caching on cached data.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17054 ) Change subject: IMPALA-10497: Fix flakiness in test_no_fd_caching_on_cached_data. .. Patch Set 1: (1 comment) Thanks Joe for the feedback. I will revert most of these and leave only the extra warm up part. http://gerrit.cloudera.org:8080/#/c/17054/1/tests/custom_cluster/test_hdfs_fd_caching.py File tests/custom_cluster/test_hdfs_fd_caching.py: http://gerrit.cloudera.org:8080/#/c/17054/1/tests/custom_cluster/test_hdfs_fd_caching.py@190 PS1, Line 190: cluster_size=1 > I think this should not impact the caching. Our scheduling algorithm maps f I will remove this so the cluster size back to default. Previously, when I kept cluster size as default (3), after several iteration of looping this test, one impalad failed with the following error. I0210 17:50:00.781529 10340 mem_tracker.cc:83] Creating tracker /tmp-sharded_lru_cache->root I0210 17:50:00.782321 10340 status.cc:129] Failed to delete old cache file /tmp/impala-cache-file-fc4060e3ab7e45b0:a680f0d3b05005a1: Not found: /tmp/impala-cache-file-fc4060e3ab7e45b0:a680f0d3b05005a1: No such file or directory (error 2) @ 0x2c008d7 _ZN6impala13GetStackTraceB5cxx11Ev @ 0x1c9003e impala::Status::Status() @ 0x36016f9 impala::io::DataCache::Partition::DeleteExistingFiles() @ 0x3601e22 impala::io::DataCache::Partition::Init() @ 0x360975e impala::io::DataCache::Init() @ 0x35b4db0 impala::io::DiskIoMgr::Init() @ 0x251b229 impala::ExecEnv::Init() @ 0x29a0412 ImpaladMain() @ 0x1bdf4d3 main @ 0x7faa8e3bebf7 __libc_start_main @ 0x1ae81da _start I0210 17:50:00.897989 10340 mem_tracker.cc:87] Destroying tracker /tmp-sharded_lru_cache->root I was guessing that since all three write their data cache to common /tmp/ dir, one might accidentally delete cache file that belong to other impalad. Just running the test one time doesn't reveal this error. So in this patchset 1, I follow examples from custom_cluster/test_data_cache.py to set custom_cluster=1. -- To view, visit http://gerrit.cloudera.org:8080/17054 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952 Gerrit-Change-Number: 17054 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 11 Feb 2021 02:18:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10497: Fix flakiness in test no fd caching on cached data.
Hello Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17054 to look at the new patch set (#2). Change subject: IMPALA-10497: Fix flakiness in test_no_fd_caching_on_cached_data. .. IMPALA-10497: Fix flakiness in test_no_fd_caching_on_cached_data. test_no_fd_caching_on_cached_data has been flaky for not having all of the data fully cached in the warm-up phase. There is a limit on concurrency in writing to the cache such that we may fail to cache data the first time read it. This patch fixes the test by repeating the warm-up query 5 times. This patch also reduce the test cluster size to just 1 to avoid potential flakiness from multiple impalad racing to delete the same old fata cache files. Testing: - Loop the test manually 100 times and see no more failures. Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952 --- M tests/custom_cluster/test_hdfs_fd_caching.py 1 file changed, 7 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/17054/2 -- To view, visit http://gerrit.cloudera.org:8080/17054 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952 Gerrit-Change-Number: 17054 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10497: Fix flakiness in test no fd caching on cached data.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17054 ) Change subject: IMPALA-10497: Fix flakiness in test_no_fd_caching_on_cached_data. .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/17054/1/tests/custom_cluster/test_hdfs_fd_caching.py File tests/custom_cluster/test_hdfs_fd_caching.py: http://gerrit.cloudera.org:8080/#/c/17054/1/tests/custom_cluster/test_hdfs_fd_caching.py@179 PS1, Line 179: le handles cached > One thing to watch out for here is if you inherit from TestHdfsFdCaching, y Done http://gerrit.cloudera.org:8080/#/c/17054/1/tests/custom_cluster/test_hdfs_fd_caching.py@184 PS1, Line 184: # handle ca > I believe that the number of rows should not impact the behavior. The diffe Done http://gerrit.cloudera.org:8080/#/c/17054/1/tests/custom_cluster/test_hdfs_fd_caching.py@190 PS1, Line 190: > I will remove this so the cluster size back to default. I change my mind, I think we should keep cluster_size=1 here. On start up, impalad delete the old cache file by listing file matching CACHE_FILE_PREFIX and delete them one by one. https://github.com/apache/impala/blob/b42c649/be/src/runtime/io/data-cache.cc#L415 So if cluster_size > 1, multiple impalad might race to delete the same old cache file and crash the losing ones. Also, setting cluster_size=1 does not change the behavior being tested by this test, so I think it should be okay. http://gerrit.cloudera.org:8080/#/c/17054/1/tests/custom_cluster/test_hdfs_fd_caching.py@207 PS1, Line 207: : def cached_handles(self): : return self.get_ag > This makes a lot of sense to me as the explanation. There are times when da Done http://gerrit.cloudera.org:8080/#/c/17054/1/tests/custom_cluster/test_hdfs_fd_caching.py@224 PS1, Line 224: > I'm fine with either value here, but I also don't see this changing the beh Done -- To view, visit http://gerrit.cloudera.org:8080/17054 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952 Gerrit-Change-Number: 17054 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 11 Feb 2021 03:09:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17166 to look at the new patch set (#4). Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. IMPALA-10565: Adjust result spooling memory based on scratch_limit IMPALA-9856 enables result spooling by default. Result spooling depends on the ability to spill its entire BufferedTupleStream to disk once it hits maximum memory reservation. However, if the query option scratch_limit is set lower than max_spilled_result_spooling_mem, the query might fail in the middle of execution due to insufficient scratch space. This patch adds planner change to consider scratch_limit and scratch_dirs query option when computing resource used by result spooling. The algorithm is as follow: * If scratch_dirs is empty or scratch_limit < minMemReservationBytes required to use BufferedPlanRootSink, we set spool_query_results to false and fallback to use BlockingPlanRootSink. * If scratch_limit > minMemReservationBytes but still fairly low, we lower the max_result_spooling_mem (default is 100MB) and max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit. * if scratch_limit > max_spilled_result_spooling_mem, do nothing. Testing: - Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit - Verify that spool_query_results query option is disabled in TestScratchDir::test_no_dirs - Pass exhaustive tests. Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 --- M be/src/service/query-options-test.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test M tests/custom_cluster/test_scratch_disk.py M tests/query_test/test_scratch_limit.py 8 files changed, 140 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/4 -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 4: (8 comments) My exhaustive test run last night reveal that scratch_limit might still get violated if maxMemReservationBytes is equal to scratch_limit. This is because the content of SpillableRowBatchQueue can be slightly higher than maxMemReservationBytes when it decide to spill. To anticipate that, I lower the spooling mem config a little further here in Patch Set 4. http://gerrit.cloudera.org:8080/#/c/17166/3/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/17166/3/be/src/service/query-options.cc@1104 PS3, Line 1104: // max_spilled_result_spooling_mem (a value of 0 means memory is unbounded). > I just figured out in ParseUtil::ParseMemSpec() that -1 for memory query op Done http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@77 PS3, Line 77:* If SPOOL_QUERY_RESULTS is true, then the ResourceProfile sets a min/max resevation, > Some of the method level comment should be updated to reflect the behavior Done http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@92 PS3, Line 92: > nit: typo ? Done http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@110 PS3, Line 110: long bufferSize = queryOptions.getDefault_spillable_buffer_size(); : long maxRowBufferSize = PlanNode.computeMaxSpillableBufferSize( > It sounds like an existing bug. If you can create a test case for it can y I filed IMPALA-10583. Will work on that next. http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@126 PS3, Line 126: > Suggest rewording: 'to >=' minMemReservationBytes Done http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@126 PS3, Line 126: > nit: 'increasing' Done http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142 PS3, Line 142: maxMemReservationBytes = scratchLimit - maxRowBufferSize; > Would be useful to add a trace level log message here as well. Done http://gerrit.cloudera.org:8080/#/c/17166/3/testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test File testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test: http://gerrit.cloudera.org:8080/#/c/17166/3/testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test@2 PS3, Line 2: QUERY > Could you add 1 tests with empty scratch dirs ? Since scratch_dirs is a backend flag, I piggy back the test under TestScratchDir::test_no_dirs -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 12 Mar 2021 20:00:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17166 to look at the new patch set (#3). Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. IMPALA-10565: Adjust result spooling memory based on scratch_limit IMPALA-9856 enables result spooling by default. Result spooling depends on the ability to spill its entire BufferedTupleStream to disk once it hits maximum memory reservation. However, if the query option scratch_limit is set lower than max_spilled_result_spooling_mem, the query might fail in the middle of execution due to insufficient scratch space. This patch adds planner change to consider scratch_limit and scratch_dirs query option when computing resource used by result spooling. The algorithm is as follow: * If scratch_dirs is empty or scratch_limit < minMemReservationBytes required to use BufferedPlanRootSink, we set spool_query_results to false and fallback to use BlockingPlanRootSink. * If scratch_limit > minMemReservationBytes but still fairly low, we lower the max_result_spooling_mem (default is 100MB) and max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit. * if scratch_limit > max_spilled_result_spooling_mem, do nothing. This patch also fix validation between max_result_spooling_mem and max_spilled_result_spooling_mem that should treat both value 0 and -1 as unbounded. Testing: - Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit - Remove spool_query_results query option in custom_cluster/test_scratch_disk.py - Change be test QueryOptions.ResultSpooling to also test -1 as unbounded. - Pass exhaustive tests. Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test M tests/custom_cluster/test_scratch_disk.py M tests/query_test/test_scratch_limit.py 9 files changed, 140 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/3 -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 3: Following up on private discussion, we decide that it is better to override result spooling configuration if scratch_limit is too low. Patch set 3 implement this strategy. I will rerun an exhaustive tests over this patch set later tonight. -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 12 Mar 2021 00:00:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@120 PS4, Line 120: > ACK. Done http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142 PS4, Line 142: queryOptions.setSpool_query_results(false); > Yeah, that will make the code cleaner. Agree about the first point about u Done -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 03:59:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17166 to look at the new patch set (#5). Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. IMPALA-10565: Adjust result spooling memory based on scratch_limit IMPALA-9856 enables result spooling by default. Result spooling depends on the ability to spill its entire BufferedTupleStream to disk once it hits maximum memory reservation. However, if the query option scratch_limit is set lower than max_spilled_result_spooling_mem, the query might fail in the middle of execution due to insufficient scratch space. This patch adds planner change to consider scratch_limit and scratch_dirs query option when computing resource used by result spooling. The algorithm is as follow: * If scratch_dirs is empty or scratch_limit < minMemReservationBytes required to use BufferedPlanRootSink, we set spool_query_results to false and fallback to use BlockingPlanRootSink. * If scratch_limit > minMemReservationBytes but still fairly low, we lower the max_result_spooling_mem (default is 100MB) and max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit. * if scratch_limit > max_spilled_result_spooling_mem, do nothing. Testing: - Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit - Verify that spool_query_results query option is disabled in TestScratchDir::test_no_dirs - Pass exhaustive tests. Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 --- M be/src/service/query-options-test.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test M tests/custom_cluster/test_scratch_disk.py M tests/query_test/test_scratch_limit.py 8 files changed, 143 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/5 -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17166 to look at the new patch set (#6). Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. IMPALA-10565: Adjust result spooling memory based on scratch_limit IMPALA-9856 enables result spooling by default. Result spooling depends on the ability to spill its entire BufferedTupleStream to disk once it hits maximum memory reservation. However, if the query option scratch_limit is set lower than max_spilled_result_spooling_mem, the query might fail in the middle of execution due to insufficient scratch space. This patch adds planner change to consider scratch_limit and scratch_dirs query option when computing resource used by result spooling. The algorithm is as follow: * If scratch_dirs is empty or scratch_limit < minMemReservationBytes required to use BufferedPlanRootSink, we set spool_query_results to false and fallback to use BlockingPlanRootSink. * If scratch_limit > minMemReservationBytes but still fairly low, we lower the max_result_spooling_mem (default is 100MB) and max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit. * if scratch_limit > max_spilled_result_spooling_mem, do nothing. Testing: - Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit - Verify that spool_query_results query option is disabled in TestScratchDir::test_no_dirs - Pass exhaustive tests. Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 --- M be/src/service/query-options-test.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test M tests/custom_cluster/test_scratch_disk.py M tests/query_test/test_scratch_limit.py 8 files changed, 143 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/6 -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@123 PS5, Line 123: // If maxAllowedScratchLimit < minMemReservat > nit: Shouldn't this be maxAllowedScratchLimit < minMemReservationBytes ? Thanks! Sorry for missing this. http://gerrit.cloudera.org:8080/#/c/17166/5/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@125 PS5, Line 125: If maxAllowedScratchLimit < maxMemReserva > nit: Similar update needed here ? Done -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 06:44:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@120 PS4, Line 120: scratch_limit < minMemReservationBytes > Update this and the one below to account for the extra maxRowBufferSize ACK. http://gerrit.cloudera.org:8080/#/c/17166/4/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@142 PS4, Line 142: maxMemReservationBytes = scratchLimit - maxRowBufferSize; > For this adjustment for maxRowBufferSize, can we not just do it up front (o If scratch_limit is unbounded, the maxMemReservationBytes calculation in line 114 is OK. Little overspill will not fail the query. In contrary, if scratch_limit is bounded, just a little overspill will terminate the query because scratch_limit is strictly enforced. What if I tidy up the comparison a bit so that it looks simpler? We define long maxAllowedScratchLimit = scratchLimit - maxRowBufferSize; Instead of comparing against scratchLimit, these should compare against maxAllowedScratchLimit; -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 01:47:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 6: > Yes, let's do that. I presume you will explicitly enable result_spooling only > for the following test? > def test_result_spooling_and_varying_scratch_limit(self, vector): Yes. My plan is to disable result spooling as a separate patch for IMPALA-10559. This patch then only have result spooling enabled for test_result_spooling_and_varying_scratch_limit. -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 17:36:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 6: > Patch Set 6: Verified-1 > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/ Looks like it hit flaky TestScratchLimit::test_with_unlimited_scratch_limit (IMPALA-10559). -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 15:49:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 6: > Patch Set 6: > > > Patch Set 6: > > > > > Patch Set 6: Verified-1 > > > > > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6965/ > > > > Looks like it hit flaky TestScratchLimit::test_with_unlimited_scratch_limit > > (IMPALA-10559). > > Yeah, that is one failure..but I also see for the dockerized tests bunch of > other failures: > https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/3935/ although it > seems the failures are all related to lost connection. > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EINNER EXCEPTION: 'thrift.transport.TTransport.TTransportException'> > EMESSAGE: TSocket read 0 bytes TestScratchLimit::test_with_unlimited_scratch_limit crash one of the impalad due to DCHECK violation in reservation-tracker.cc:436. It is a violation when BufferedTupleStream is spilling and subsequently allocating new page for read. It is unrelated with this patch. But the combination of sort query + result spooling in this test seems to reveal another bug in BufferedTupleStream. At this point, I think we should disable result spooling for all test in TestScratchLimit (per discussion in IMPALA-10559 jira) and investigate it separately. What do you think? -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 13 Mar 2021 17:10:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10565: Check max spilled result spooling mem vs scratch limit
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17166 to look at the new patch set (#2). Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit .. IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit IMPALA-9856 enables result spooling by default. However, if the query option scratch_limit is set lower than max_spilled_result_spooling_mem, the query might fail in the middle of execution due to insufficient scratch space. This patch validation that when result spooling is enabled, max_spilled_result_spooling_mem <= scratch_limit. This patch also fix validation between max_result_spooling_mem and max_spilled_result_spooling_mem that should treat both value 0 and -1 as unbounded. Testing: - Lower max_spilled_result_spooling_mem in test_with_high_scratch_limit and test_with_low_scratch_limit. - Toggle off spool_query_results in tests that set scratch_limit=0 to ensure that result spilling will not happen. - Add test_with_scratch_limit_less_than_max_spilled_result_spooling_mem. - Add be test QueryOptions.ResultSpoolingWithScratchLimit. - Pass exhaustive tests. Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 --- M be/src/runtime/bufferpool/buffer-pool-test.cc M be/src/service/query-options-test.cc M be/src/service/query-options.cc M testdata/workloads/functional-query/queries/QueryTest/spilling-naaj-no-deny-reservation.test M testdata/workloads/functional-query/queries/QueryTest/spilling-naaj.test M testdata/workloads/tpch/queries/sort-reservation-usage-single-node.test M tests/query_test/test_scratch_limit.py 7 files changed, 129 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/2 -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10565: Check max spilled result spooling mem vs scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@ PS1, Line : $1 > slightly misleading,does not necessarily have to be the same, could be 0 or Agree. I will rephrase into this: "If max_result_spooling_mem is set to unbounded ($0) max_spilled_result_spooling_mem must be set to unbounded (0 or -1) as well." http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@1124 PS1, Line 1124: if (query_options->spool_query_results && scratch_limit != -1) { > since now we have spilling on by default, spool_query_results is set to 1GB Good point. Quick git grep show some more places where scratch_limit is set to 0, like some in spilling-naaj-no-deny-reservation.test. I will also set spool_query_results=0 in these places. -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 10 Mar 2021 23:28:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10565: Check max spilled result spooling mem vs scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit .. Patch Set 2: (2 comments) Patch set 2 disable result spooling in tests that set scratch_limit > -1. http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@ PS1, Line : un > Agree. I will rephrase into this: Done http://gerrit.cloudera.org:8080/#/c/17166/1/be/src/service/query-options.cc@1124 PS1, Line 1124: if (query_options->spool_query_results && scratch_limit != -1) { > Good point. Quick git grep show some more places where scratch_limit is set Done -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 11 Mar 2021 00:53:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10565: Check max spilled result spooling mem vs scratch limit
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17166 Change subject: IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit .. IMPALA-10565: Check max_spilled_result_spooling_mem vs scratch_limit IMPALA-9856 enables result spooling by default. However, if the query option scratch_limit is set lower than max_spilled_result_spooling_mem, the query might fail in the middle of execution due to insufficient scratch space. This patch validation that when result spooling is enabled, max_spilled_result_spooling_mem <= scratch_limit. This patch also fix validation between max_result_spooling_mem and max_spilled_result_spooling_mem that should treat both value 0 and -1 as unbounded. Testing: - Lower max_spilled_result_spooling_mem in test_with_high_scratch_limit and test_with_low_scratch_limit. - Toggle off spool_query_results in test_with_zero_scratch_limit_no_memory_limit to ensure that spill will not happen. - Add test_with_scratch_limit_less_than_max_spilled_result_spooling_mem. - Add be test QueryOptions.ResultSpoolingWithScratchLimit. - Pass exhaustive tests. Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 --- M be/src/service/query-options-test.cc M be/src/service/query-options.cc M tests/query_test/test_scratch_limit.py 3 files changed, 120 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/1 -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
[Impala-ASF-CR] IMPALA-10583: Fix bug on unbounded result spooling memory config.
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17187 Change subject: IMPALA-10583: Fix bug on unbounded result spooling memory config. .. IMPALA-10583: Fix bug on unbounded result spooling memory config. Two bugs happening on result spooling when we set two of its query options to 0 (unbounded). The first bug happens if max_spilled_result_spooling_mem = 0 (unbounded). max_unpinned_bytes_ in SpillableRowBatchQueue will be set to 0, SpillableRowBatchQueue::IsFull() then will always return true, and the query hang. This patch fix it by setting max_unpinned_bytes_ to INT64_MAX if max_spilled_result_spooling_mem = 0. The second bug happens if we set max_result_spooling_mem = 0 (unbounded). PlanRootSink.java will peg maxMemReservationBytes to always equal to minMemReservationBytes. This patch fixes this by reverting to the default max_result_spooling_mem (100MB). Testing: - Add test_unbounded_result_spooling_mem. - Pass core tests. Change-Id: If8f5e3668281bba8813f8082f45b4faa7721530e --- M be/src/exec/buffered-plan-root-sink.cc M be/src/runtime/spillable-row-batch-queue.cc M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M tests/query_test/test_result_spooling.py 4 files changed, 43 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/17187/1 -- To view, visit http://gerrit.cloudera.org:8080/17187 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: If8f5e3668281bba8813f8082f45b4faa7721530e Gerrit-Change-Number: 17187 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/17166/3/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/17166/3/be/src/service/query-options.cc@1104 PS3, Line 1104: // (a value of 0 or -1 means memory is unbounded). I just figured out in ParseUtil::ParseMemSpec() that -1 for memory query option is accepted as indicator for infinite memory and translated into 0 return value. So testing against -1 here is not necessary. I will revert this change. http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java File fe/src/main/java/org/apache/impala/planner/PlanRootSink.java: http://gerrit.cloudera.org:8080/#/c/17166/3/fe/src/main/java/org/apache/impala/planner/PlanRootSink.java@110 PS3, Line 110: long maxMemReservationBytes = Math.max( : queryOptions.getMax_result_spooling_mem(), minMemReservationBytes); This maxMemReservationBytes does not seems to take account of max_result_spooling_mem = 0 (unbounded), which can peg maxMemReservationBytes = minMemReservationBytes. Im not sure what to set maxMemReservationBytes in this case. Similarly, SpillableRowBatchQueue does not seem to recognize max_spilled_result_spooling_mem = 0 as unbounded. -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 12 Mar 2021 05:00:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Hello Aman Sinha, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17166 to look at the new patch set (#7). Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. IMPALA-10565: Adjust result spooling memory based on scratch_limit IMPALA-9856 enables result spooling by default. Result spooling depends on the ability to spill its entire BufferedTupleStream to disk once it hits maximum memory reservation. However, if the query option scratch_limit is set lower than max_spilled_result_spooling_mem, the query might fail in the middle of execution due to insufficient scratch space. This patch adds planner change to consider scratch_limit and scratch_dirs query option when computing resource used by result spooling. The algorithm is as follow: * If scratch_dirs is empty or scratch_limit < minMemReservationBytes required to use BufferedPlanRootSink, we set spool_query_results to false and fallback to use BlockingPlanRootSink. * If scratch_limit > minMemReservationBytes but still fairly low, we lower the max_result_spooling_mem (default is 100MB) and max_spilled_result_spooling_mem (default is 1GB) to fit scratch_limit. * if scratch_limit > max_spilled_result_spooling_mem, do nothing. Testing: - Add TestScratchLimit::test_result_spooling_and_varying_scratch_limit - Verify that spool_query_results query option is disabled in TestScratchDir::test_no_dirs - Pass exhaustive tests. Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 --- M be/src/service/query-options-test.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java A testdata/workloads/functional-query/queries/QueryTest/scratch-limit.test M tests/custom_cluster/test_scratch_disk.py M tests/query_test/test_scratch_limit.py 8 files changed, 143 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/17166/7 -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10559: Fix flakiness in TestScratchLimit.
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17182 Change subject: IMPALA-10559: Fix flakiness in TestScratchLimit. .. IMPALA-10559: Fix flakiness in TestScratchLimit. TestScratchLimit has been flaky in ubuntu-16.04-dockerised-tests environment since results spooling is enabled by default in IMPALA-9856. A combination of result spooling, sort query, and low buffer_pool_limit in TestScratchLimit::test_with_unlimited_scratch_limit seems to reveal a memory reservation bug in BufferedTutpleStream. This patch disables result spooling for tests under TestScratchLimit until the underlying bug is found. We will investigate the bug in a separate JIRA. Testing: - Disable result spooling in all tests of TestScratchLimit before IMPALA-9856 gets in. - Run and pass TestScratchLimit locally. Change-Id: I68736d6bfb0001423fd138000670ac60b2117fbe --- M tests/query_test/test_scratch_limit.py 1 file changed, 6 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/17182/1 -- To view, visit http://gerrit.cloudera.org:8080/17182 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I68736d6bfb0001423fd138000670ac60b2117fbe Gerrit-Change-Number: 17182 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
[Impala-ASF-CR] IMPALA-10565: Adjust result spooling memory based on scratch limit
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17166 ) Change subject: IMPALA-10565: Adjust result spooling memory based on scratch_limit .. Patch Set 7: Patch set 7 is a rebase of patch set 6. -- To view, visit http://gerrit.cloudera.org:8080/17166 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I541f46e6911694e14c0fc25be1a6982fd929d3a9 Gerrit-Change-Number: 17166 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sun, 14 Mar 2021 03:32:06 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10584: Allow UnpinStream to fail if reservation is not enough
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17195 Change subject: IMPALA-10584: Allow UnpinStream to fail if reservation is not enough .. IMPALA-10584: Allow UnpinStream to fail if reservation is not enough TestScratchLimit::test_with_unlimited_scratch_limit has been intermittently crashing in ubuntu-16.04-dockerised-tests environment after result spooling is enabled by default in IMPALA-9856. DCHECK violation occurs in ReservationTracker::CheckConsistency() due to BufferedTupleStream wrongly tries to reclaim memory reservation while unpinning the stream. For this bug to surface, all of the following needs to happen: - Stream is in pinned mode. - There are only 2 pages in the stream: 1 read and 1 write. - Stream can not increase reservation anymore either due to memory pressure or low buffer/memory limit. - The stream read page has been fully read and is attached to output RowBatch. But the output RowBatch has not cleaned up yet. - BufferedTupleStream::UnpinStream is invoked. The bug happens because UnpinStream proceeds to NextReadPage where the read page buffer was mistakenly assumed as released. default_page_len_ bytes were added into write_page_reservation_ and subsequently violates the total memory reservation. This patch fixes the bug by disallowing UnpinStream to advance the read iterator if the read page is attached to output RowBatch and there is no more unused reservation. If UnpinStream in turns unable to unpin any page to reclaim memory reservation, TErrorCode::NO_PAGE_UNPINNED will be returned and the stream will stay in pinned mode. In the context of BufferedPlanRootSink, if this error occurs in Send(), the writer will release the lock to give a chance for the reader to read some more rows and clean up the output RowBatch (current_batch_) before retying to add more rows again. Testing: - Add BE test UnpinReadPageMinReservation. - Reenable result spooling in TestScratchLimit that was disabled in IMPALA-10559. - Pass core tests. Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a --- M be/src/exec/buffered-plan-root-sink.cc M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M be/src/runtime/spillable-row-batch-queue.cc M be/src/runtime/spillable-row-batch-queue.h M common/thrift/generate_error_codes.py M tests/query_test/test_scratch_limit.py 8 files changed, 200 insertions(+), 47 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17195/1 -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
[Impala-ASF-CR] IMPALA-10492: Lower default MAX CNF EXPRS query option
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17132 Change subject: IMPALA-10492: Lower default MAX_CNF_EXPRS query option .. IMPALA-10492: Lower default MAX_CNF_EXPRS query option MAX_CNF_EXPRS was set to unlimited by default. A complex query that container many predicates the CNF rewrite can lead to significant frontend memory usage and eventually OutOfMemory. We need to lower the default value to avoid the memory problem while maintaining performance for our TPC-DS and TPC-H workloads. We investigate the maximum number of CFN expressions in TPC-DS and TPC-H by printing out the final value of 'numCnfExprs_' from ConvertToCNFRule.java to the query profile. We found 5 queries that applies CNF rewrite rules as follow: | Query | numCnfExprs_ | |---+--| | TPCDS-Q13 | 168 | | TPCDS-Q85 | 100 | | TPCDS-Q48 | 34 | | TPCH-Q19 | 124 | | TPCH-Q7 |3 | This patch lower the default value from unlimited to 200 based on the result above. Testing: - Manually verify that MAX_CNF_EXPRS 200 is enough for our TPC-DS and TPC-H worloads. - Pass core tests. Change-Id: I7ca3d0e094ac01c24a046c25d6a1b56bf134faa8 --- M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift 2 files changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/17132/1 -- To view, visit http://gerrit.cloudera.org:8080/17132 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I7ca3d0e094ac01c24a046c25d6a1b56bf134faa8 Gerrit-Change-Number: 17132 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
[Impala-ASF-CR] IMPALA-10492: Lower default MAX CNF EXPRS query option
Hello Aman Sinha, David Rorke, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17132 to look at the new patch set (#2). Change subject: IMPALA-10492: Lower default MAX_CNF_EXPRS query option .. IMPALA-10492: Lower default MAX_CNF_EXPRS query option MAX_CNF_EXPRS was set to unlimited by default. The CNF rewrite can lead to significant frontend memory usage and eventually OutOfMemory for a complex query that contain many predicates. We need to lower the default value to avoid this memory problem while maintaining performance for our TPC-DS and TPC-H workloads. We investigate the maximum number of CFN expressions in TPC-DS and TPC-H by printing out the final value of 'numCnfExprs_' from ConvertToCNFRule.java to the query profile. We found 5 queries that applies CNF rewrite rules as follow: | Query | numCnfExprs_ | |---+--| | TPCDS-Q13 | 168 | | TPCDS-Q85 | 100 | | TPCDS-Q48 | 34 | | TPCH-Q19 | 124 | | TPCH-Q7 |3 | This patch lower the default value from unlimited to 200 based on the result above. Testing: - Manually verify that MAX_CNF_EXPRS 200 is enough for our TPC-DS and TPC-H worloads. - Pass core tests. Change-Id: I7ca3d0e094ac01c24a046c25d6a1b56bf134faa8 --- M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift 2 files changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/17132/2 -- To view, visit http://gerrit.cloudera.org:8080/17132 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7ca3d0e094ac01c24a046c25d6a1b56bf134faa8 Gerrit-Change-Number: 17132 Gerrit-PatchSet: 2 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-9856: Enable result spooling by default.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/16755 ) Change subject: IMPALA-9856: Enable result spooling by default. .. Patch Set 4: (2 comments) Patch set 4 is a rebase after IMPALA-10371 resolved. http://gerrit.cloudera.org:8080/#/c/16755/3/tests/custom_cluster/test_parquet_max_page_header.py File tests/custom_cluster/test_parquet_max_page_header.py: http://gerrit.cloudera.org:8080/#/c/16755/3/tests/custom_cluster/test_parquet_max_page_header.py@110 PS3, Line 110: # IMPALA-9856: Since this test expect to read a row up to 10 > nit: add comment as to why this query option is needed and why 11mb Done http://gerrit.cloudera.org:8080/#/c/16755/3/tests/query_test/test_query_mem_limit.py File tests/query_test/test_query_mem_limit.py: http://gerrit.cloudera.org:8080/#/c/16755/3/tests/query_test/test_query_mem_limit.py@99 PS3, Line 99: # IMPALA-9856: For the group_concat > nit: add why we need this and why it is 18M Done -- To view, visit http://gerrit.cloudera.org:8080/16755 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4 Gerrit-Change-Number: 16755 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 24 Feb 2021 17:11:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9856: Enable result spooling by default.
Hello Quanlong Huang, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16755 to look at the new patch set (#4). Change subject: IMPALA-9856: Enable result spooling by default. .. IMPALA-9856: Enable result spooling by default. Result spooling has been relatively stable since it was introduced, and it has several benefits described in IMPALA-8656. This patch enable result spooling (SPOOL_QUERY_RESULTS) query options by default. Furthermore, some tests need to be adjusted to account for result spooling by default. The following are the adjustment categories and list of tests that fall under such category. Change in assertions: PlannerTest#testAcidTableScans PlannerTest#testBloomFilterAssignment PlannerTest#testConstantFolding PlannerTest#testFkPkJoinDetection PlannerTest#testFkPkJoinDetectionWithHDFSNumRowsEstDisabled PlannerTest#testKuduSelectivity PlannerTest#testMaxRowSize PlannerTest#testMinMaxRuntimeFilters PlannerTest#testMinMaxRuntimeFiltersWithHDFSNumRowsEstDisabled PlannerTest#testMtDopValidation PlannerTest#testParquetFiltering PlannerTest#testParquetFilteringDisabled PlannerTest#testPartitionPruning PlannerTest#testPreaggBytesLimit PlannerTest#testResourceRequirements PlannerTest#testRuntimeFilterQueryOptions PlannerTest#testSortExprMaterialization PlannerTest#testSpillableBufferSizing PlannerTest#testTableSample PlannerTest#testTpch PlannerTest#testKuduTpch PlannerTest#testTpchNested PlannerTest#testUnion TpcdsPlannerTest custom_cluster/test_admission_controller.py::TestAdmissionController::test_dedicated_coordinator_planner_estimates custom_cluster/test_admission_controller.py::TestAdmissionController::test_memory_rejection custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_mem_limit_configs metadata/test_explain.py::TestExplain::test_explain_level2 metadata/test_explain.py::TestExplain::test_explain_level3 metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation Increase BUFFER_POOL_LIMIT: query_test/test_queries.py::TestQueries::test_analytic_fns query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filter_reservation query_test/test_sort.py::TestQueryFullSort::test_multiple_mem_limits_full_output query_test/test_spilling.py::TestSpillingBroadcastJoins::test_spilling_broadcast_joins query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_aggs query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive query_test/test_udfs.py::TestUdfExecution::test_mem_limits Increase MEM_LIMIT: query_test/test_mem_usage_scaling.py::TestExchangeMemUsage::test_exchange_mem_usage_scaling query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling Increase MAX_ROW_SIZE: custom_cluster/test_parquet_max_page_header.py::TestParquetMaxPageHeader::test_large_page_header_config query_test/test_insert.py::TestInsertQueries::test_insert_large_string query_test/test_query_mem_limit.py::TestQueryMemLimit::test_mem_limit query_test/test_scanners.py::TestTextSplitDelimiters::test_text_split_across_buffers_delimiter query_test/test_scanners.py::TestWideRow::test_wide_row Disable result spooling to maintain assertion: custom_cluster/test_admission_controller.py::TestAdmissionController::test_set_request_pool custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_host_memory custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_pool_memory custom_cluster/test_admission_controller.py::TestAdmissionController::test_queue_reasons_memory custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_config_change_while_queued custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_fetched_rows custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_finished_query custom_cluster/test_scratch_disk.py::TestScratchDir::test_no_dirs custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_existing_dirs custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_writable_dirs query_test/test_insert.py::TestInsertQueries::test_insert_large_string (the last query only) query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_kudu_scan_mem_usage query_test/test_queries.py::TestQueriesParquetTables::test_very_large_strings query_test/test_query_mem_limit.py::TestCodegenMemLimit::test_codegen_mem_limit shell/test_shell_client.py::TestShellClient::test_fetch_size Testing: - Pass exhaustive tests. Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4 --- M common/thrift/ImpalaInternalService.thrift M testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test M
[Impala-ASF-CR] IMPALA-10492: Lower default MAX CNF EXPRS query option
Hello Aman Sinha, David Rorke, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17132 to look at the new patch set (#3). Change subject: IMPALA-10492: Lower default MAX_CNF_EXPRS query option .. IMPALA-10492: Lower default MAX_CNF_EXPRS query option MAX_CNF_EXPRS was set to unlimited by default. The CNF rewrite can lead to significant frontend memory usage and eventually OutOfMemory for a complex query that contain many predicates. We need to lower the default value to avoid this memory problem while maintaining performance for our TPC-DS and TPC-H workloads. We investigate the maximum number of CNF expressions in TPC-DS and TPC-H by printing out the final value of 'numCnfExprs_' from ConvertToCNFRule.java to the query profile. We found 5 queries that applies CNF rewrite rules as follow: | Query | numCnfExprs_ | |---+--| | TPCDS-Q13 | 168 | | TPCDS-Q85 | 100 | | TPCDS-Q48 | 34 | | TPCH-Q19 | 124 | | TPCH-Q7 |3 | This patch lower the default value from unlimited to 200 based on the result above. Testing: - Manually verify that MAX_CNF_EXPRS 200 is enough for our TPC-DS and TPC-H worloads. - Pass core tests. Change-Id: I7ca3d0e094ac01c24a046c25d6a1b56bf134faa8 --- M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift 2 files changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/17132/3 -- To view, visit http://gerrit.cloudera.org:8080/17132 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7ca3d0e094ac01c24a046c25d6a1b56bf134faa8 Gerrit-Change-Number: 17132 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10492: Lower default MAX CNF EXPRS query option
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17132 ) Change subject: IMPALA-10492: Lower default MAX_CNF_EXPRS query option .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/17132/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17132/2//COMMIT_MSG@15 PS2, Line 15: CNF > nit: CNF Done -- To view, visit http://gerrit.cloudera.org:8080/17132 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7ca3d0e094ac01c24a046c25d6a1b56bf134faa8 Gerrit-Change-Number: 17132 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Sat, 27 Feb 2021 14:25:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 ) Change subject: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky .. Patch Set 10: (3 comments) http://gerrit.cloudera.org:8080/#/c/17252/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17252/10//COMMIT_MSG@7 PS10, Line 7: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky In my opinion, we should split this into 2 separate changes. One is focus on fixing the test_overlap_min_max_filters, and the other is for the query profile improvement. http://gerrit.cloudera.org:8080/#/c/17252/10/be/src/exec/scan-node.cc File be/src/exec/scan-node.cc: http://gerrit.cloudera.org:8080/#/c/17252/10/be/src/exec/scan-node.cc@236 PS10, Line 236: Current " :"time(ms): $2", I think this is redundant and potentially confuse users who try to debug query profile. The information from wait_time (Maximum arrival delay) should be sufficient. As far as I know, MonotonicMilis mostly used to calculate time interval, not to represent system time. Besides, filter arrival time is already displayed under host profile like this: impala-executor-000-1:27000: Filter 16 arrival: 662ms Filter 14 arrival: 10s454ms Filter 6 arrival: 10s514ms Filter 8 arrival: 5m46s Filter 9 arrival: 5m46s Filter 10 arrival: 5m46s Filter 0 arrival: 7m29s Filter 1 arrival: 7m29s Filter 2 arrival: 7m29s http://gerrit.cloudera.org:8080/#/c/17252/10/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/17252/10/be/src/runtime/coordinator.cc@1614 PS10, Line 1614: minmaxDisabled_ = true; I think we should maintain the same behavior as bloom_filter_ in here, making the logic consistent for both filter type. I understand that this change is made to distinguish between 1) an actual AlwaysTrue filter that originate from executor, vs 2) AlwaysTrue filter that comes from disabling by coordinator. Instead, what if we have a new boolean variable to track 1). Say, a boolean variable 'always_true_minmax_received_'. Coordinator::FilterDebugString() then should refer this boolean variable instead of checking minmax_filterPB.always_true(). Similarly also for AlwaysFalse filter arrival. -- To view, visit http://gerrit.cloudera.org:8080/17252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 Gerrit-Change-Number: 17252 Gerrit-PatchSet: 10 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Thu, 08 Apr 2021 16:25:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 ) Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator .. Patch Set 21: (1 comment) http://gerrit.cloudera.org:8080/#/c/17252/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test File testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test: http://gerrit.cloudera.org:8080/#/c/17252/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@285 PS21, Line 285: set minmax_filter_threshold=0.5; : select straight_join count(*) from : lineitem_orderkey_only a join [SHUFFLE] tpch_parquet.orders b : where a.l_orderkey = b.o_orderkey; It looks like this test behave differently if filter arrived on time vs arrived late. The jenkins failure seemingly because of both filters arrived late. It is curious that late filter arrival lead to less row from lineitem_orderkey_only scan, not more. In my local machine, if I add query option "set runtime_filter_wait_time_ms=5000;", this test pass. But without setting wait time, an AlwaysFalse minmax filter seems to be broadcasted. This is a section of query profile where this test failed: lv-desktop:27000: Filter 1 arrival: 1s367ms Filter 0 arrival: 1s371ms ... Runtime filters: Not all filters arrived (arrived: [], missing [1, 0]), waited for 785ms. Arrival delay: 1s000ms. ... Filter 1 (0): - Files processed: 0 (0) - Files rejected: 0 (0) - Files total: 0 (0) - InactiveTotalTime: 0.000ns - RowGroups processed: 0 (0) - RowGroups rejected: 0 (0) - RowGroups total: 1 (1) - Rows processed: 1.81M (1810767) - Rows rejected: 1.81M (1810767) - Rows total: 2.14M (2142543) - Splits processed: 0 (0) - Splits rejected: 0 (0) - Splits total: 0 (0) - TotalTime: 0.000ns -- To view, visit http://gerrit.cloudera.org:8080/17252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 Gerrit-Change-Number: 17252 Gerrit-PatchSet: 21 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 15 Apr 2021 21:07:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 ) Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator .. Patch Set 15: (2 comments) Thanks Qifan for adjusting the changes. I have 2 comments for patch set 15. http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG@29 PS15, Line 29: Testing: : 1. Ran unit tests; : 2. Ran core tests. Can we add a test to verify the content of MIN/MAX column in filter routing table? http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc@659 PS15, Line 659: minmax_filterPB.has_always_true() && minmax_filterPB.always_true() After a filter is disabled and broadcasted, isn't this always evaluate to True? Maybe we should check state.AlwaysTrueFilterReceived() instead? -- To view, visit http://gerrit.cloudera.org:8080/17252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 Gerrit-Change-Number: 17252 Gerrit-PatchSet: 15 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Sat, 10 Apr 2021 01:19:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.
Hello Quanlong Huang, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17195 to look at the new patch set (#3). Change subject: IMPALA-10584: Defer advancing read page when reservation is tight. .. IMPALA-10584: Defer advancing read page when reservation is tight. TestScratchLimit::test_with_unlimited_scratch_limit has been intermittently crashing in ubuntu-16.04-dockerised-tests environment after result spooling is enabled by default in IMPALA-9856. DCHECK violation occurs in ReservationTracker::CheckConsistency() due to BufferedTupleStream wrongly tries to reclaim memory reservation while unpinning the stream. For this bug to surface, all of the following needs to happen: - Stream is in pinned mode. - There are only 2 pages in the stream: 1 read and 1 write. - Stream can not increase reservation anymore either due to memory pressure or low buffer/memory limit. - The stream read page has been fully read and is attached to output RowBatch. But the output RowBatch has not cleaned up yet. - BufferedTupleStream::UnpinStream is invoked. The memory accounting bug happens because UnpinStream proceeds to NextReadPage where the read page buffer was mistakenly assumed as released. default_page_len_ bytes were added into write_page_reservation_ and subsequently violates the total memory reservation. This patch fixes the bug by deferring advancement of the read iterator in UnpinStream if the read page is attached to output RowBatch and there is no more unused reservation. This is OK because after UnpinStream finished, the stream is now in unpinned mode and has_read_write_page is false. The next AddRow operation is then allowed to unpin the previous write page first before reusing the reservation to allocate a new write page. Testing: - Loop the TestScratchLimit::test_with_unlimited_scratch_limit in my local dev machine and verify that each test passed without triggering the DCHECK violation. - Reenable result spooling in TestScratchLimit that was disabled in IMPALA-10559. - Pass core tests. Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a --- M be/src/runtime/buffered-tuple-stream.cc M tests/query_test/test_scratch_limit.py 2 files changed, 24 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17195/3 -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 ) Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator .. Patch Set 17: (5 comments) http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG@29 PS15, Line 29: row group is processed. By comparing these two timestamps, one : can easily diagnose issues related to late arrival of min/max : filters. > Consider adding these three tests for Final table as its content is sending The test added in Patch set 17 looks good to me, thanks! http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/runtime/coordinator-filter-state.h File be/src/runtime/coordinator-filter-state.h: http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/runtime/coordinator-filter-state.h@178 PS17, Line 178: True value means the always false flag in aggregated filter is flipped. To further clarify the doc, please mention that a True value means the filter was flipped from True to False by coordinator. http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc@659 PS15, Line 659: > If AlwaysTrueFilterReceived() is true, then the accumulated filter is logic New logic looks good to me. I'll continue my comments on patch set 17. http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/runtime/coordinator.cc@668 PS17, Line 668: state.AlwaysFalseFlipped() The method name sounds ambiguous. Was it flipped from true to false, or false to true. Seems like the former. The method return true IF filter was an AlwaysFalse before being disabled (flipped to an AlwaysTrue) by coordinator. Maybe "WasAlwaysFalse" is more descriptive? http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/util/min-max-filter.h File be/src/util/min-max-filter.h: http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/util/min-max-filter.h@358 PS17, Line 358: : std::string DebugString(const MinMaxFilterPB& filter); : bool AlwaysTrue(const MinMaxFilterPB& filter); : bool AlwaysFalse(const MinMaxFilterPB& filter); : std::string DebugString(const ColumnValuePB& value); Any reason not making these methods as static methods under MinMaxFilter class? -- To view, visit http://gerrit.cloudera.org:8080/17252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 Gerrit-Change-Number: 17252 Gerrit-PatchSet: 17 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 12 Apr 2021 18:59:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17195 ) Change subject: IMPALA-10584: Defer advancing read page when reservation is tight. .. Patch Set 3: It looks like the solution of this problem is much more simple than I thought: just don't advance the read page. The initial reason why we are advancing read page in the first place is explained in IMPALA-8890 (review is in https://gerrit.cloudera.org/c/14144/): a DCHECK hit in ExpectedPinCount() because we tried to unpin a fully exhausted and attached read_page_. IMPALA-8890 choose to advance the read iterator to avoid this DCHECK. Patch set 3 in this review choose to NOT advance the read page, but instead skip the first page (that is the read page) when iterating pages_ to call UnpinPageIfNeeded. -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Mon, 12 Apr 2021 20:18:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 ) Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator .. Patch Set 18: Code-Review+1 (1 comment) Looks good to me. Just have one more nit. Please carry my +1 after fix. http://gerrit.cloudera.org:8080/#/c/17252/18//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17252/18//COMMIT_MSG@12 PS18, Line 12: always_false_flipped_ nit: always_false_flipped_to_false_ -- To view, visit http://gerrit.cloudera.org:8080/17252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 Gerrit-Change-Number: 17252 Gerrit-PatchSet: 18 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 12 Apr 2021 21:00:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10611: Fix flakiness in test wide row
Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17324 Change subject: IMPALA-10611: Fix flakiness in test_wide_row .. IMPALA-10611: Fix flakiness in test_wide_row test_wide_row has been intermittently failed with "Failed to allocate row batch" error message. This is due to recent change in IMPALA-9856 that add query option max_row_size=10MB without raising the mem_limit. This patch fix the flakiness by increasing the mem_limit from 100 MB to 132 MB to account for 32 MB reservation needed by BufferedPlanRootSink. Testing: - Loop the test in local dev machine. Change-Id: Ie1f0b7d4d6b3a875d9b408f057d46fdbdbdf2a34 --- M tests/query_test/test_scanners.py 1 file changed, 4 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/17324/1 -- To view, visit http://gerrit.cloudera.org:8080/17324 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ie1f0b7d4d6b3a875d9b408f057d46fdbdbdf2a34 Gerrit-Change-Number: 17324 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto
[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 ) Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator .. Patch Set 22: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/17252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 Gerrit-Change-Number: 17252 Gerrit-PatchSet: 22 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 19 Apr 2021 18:04:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 ) Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator .. Patch Set 21: > Patch Set 21: > > Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7067/ > DRY_RUN=false I think we need to rerun the gerrit-verify-dryrun since jenkins.impala.io down yesterday. -- To view, visit http://gerrit.cloudera.org:8080/17252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 Gerrit-Change-Number: 17252 Gerrit-PatchSet: 21 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 14 Apr 2021 11:39:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 ) Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator .. Patch Set 20: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/17252 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964 Gerrit-Change-Number: 17252 Gerrit-PatchSet: 20 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 13 Apr 2021 16:01:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.
Hello Quanlong Huang, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17195 to look at the new patch set (#4). Change subject: IMPALA-10584: Defer advancing read page when reservation is tight. .. IMPALA-10584: Defer advancing read page when reservation is tight. TestScratchLimit::test_with_unlimited_scratch_limit has been intermittently crashing in ubuntu-16.04-dockerised-tests environment after result spooling is enabled by default in IMPALA-9856. DCHECK violation occurs in ReservationTracker::CheckConsistency() due to BufferedTupleStream wrongly tries to reclaim memory reservation while unpinning the stream. For this bug to surface, all of the following needs to happen: - Stream is in pinned mode. - There are only 2 pages in the stream: 1 read and 1 write. - Stream can not increase reservation anymore either due to memory pressure or low buffer/memory limit. - The stream read page has been fully read and is attached to output RowBatch. But the output RowBatch has not cleaned up yet. - BufferedTupleStream::UnpinStream is invoked. The memory accounting bug happens because UnpinStream proceeds to NextReadPage where the read page buffer was mistakenly assumed as released. default_page_len_ bytes were added into write_page_reservation_ and subsequently violates the total memory reservation. This patch fixes the bug by deferring advancement of the read iterator in UnpinStream if the read page is attached to output RowBatch and there is no more unused reservation. This is OK because after UnpinStream finished, the stream is now in unpinned mode and has_read_write_page is false. The next AddRow operation is then allowed to unpin the previous write page first before reusing the reservation to allocate a new write page. Testing: - Add be test DeferAdvancingReadPage. - Loop the TestScratchLimit::test_with_unlimited_scratch_limit in my local dev machine and verify that each test passed without triggering the DCHECK violation. - Reenable result spooling in TestScratchLimit that was disabled in IMPALA-10559. - Pass core tests. Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a --- M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M tests/query_test/test_scratch_limit.py 4 files changed, 97 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17195/4 -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17195 ) Change subject: IMPALA-10584: Defer advancing read page when reservation is tight. .. Patch Set 4: (2 comments) Patch set 4 submitted. http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream-test.cc File be/src/runtime/buffered-tuple-stream-test.cc: http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream-test.cc@1237 PS3, Line 1237: TEST_F(SimpleTupleStreamTest, UnpinReadPage) { > It is possible to reproduce the bug with some changes in this test? Added DeferAdvancingReadPage to test this corner case. http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: http://gerrit.cloudera.org:8080/#/c/17195/3/be/src/runtime/buffered-tuple-stream.cc@723 PS3, Line 723: > Does this assume the size of the read page is default_page_len_? Should we That is a good point, thanks! However, replacing it with read_it_.read_page_->len() caught this DCHECK failure: "buffer-pool.cc:93] Check failed: is_open()". I think this because the the page handle has been closed in AttachBufferToBatch(). I made a work around by memorizing the page len in last_attached_page_len_ variable. -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 13 Apr 2021 19:07:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17195 ) Change subject: IMPALA-10584: Defer advancing read page when reservation is tight. .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc@733 PS4, Line 733: dvancing_read_page = true; > This DCHECK was part of my debugging when investigating the bug. Done http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc@750 PS4, Line 750: > Ack Done. Reword it a bit. -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 13 Apr 2021 20:41:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17195 ) Change subject: IMPALA-10584: Defer advancing read page when reservation is tight. .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc@733 PS4, Line 733: read_it_.read_page_ == pages_.begin() > I wonder if this assertion is always true, since in pinned mode, there can This DCHECK was part of my debugging when investigating the bug. The intent is to match assertion in NextReadPage (buffered-tuple-stream.cc:529). Agree that this DCHECK most likely will always be true since we check earlier that attached_to_output_batch == true. Will remove this DCHECK in next patch set. http://gerrit.cloudera.org:8080/#/c/17195/4/be/src/runtime/buffered-tuple-stream.cc@750 PS4, Line 750: unpin this first page > May add a comment on when this currently unpinned page is finally unpinned. Ack -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 13 Apr 2021 20:06:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17195 to look at the new patch set (#5). Change subject: IMPALA-10584: Defer advancing read page when reservation is tight. .. IMPALA-10584: Defer advancing read page when reservation is tight. TestScratchLimit::test_with_unlimited_scratch_limit has been intermittently crashing in ubuntu-16.04-dockerised-tests environment after result spooling is enabled by default in IMPALA-9856. DCHECK violation occurs in ReservationTracker::CheckConsistency() due to BufferedTupleStream wrongly tries to reclaim memory reservation while unpinning the stream. For this bug to surface, all of the following needs to happen: - Stream is in pinned mode. - There are only 2 pages in the stream: 1 read and 1 write. - Stream can not increase reservation anymore either due to memory pressure or low buffer/memory limit. - The stream read page has been fully read and is attached to output RowBatch. But the output RowBatch has not cleaned up yet. - BufferedTupleStream::UnpinStream is invoked. The memory accounting bug happens because UnpinStream proceeds to NextReadPage where the read page buffer was mistakenly assumed as released. default_page_len_ bytes were added into write_page_reservation_ and subsequently violates the total memory reservation. This patch fixes the bug by deferring advancement of the read iterator in UnpinStream if the read page is attached to output RowBatch and there is no more unused reservation. This is OK because after UnpinStream finished, the stream is now in unpinned mode and has_read_write_page is false. The next AddRow operation is then allowed to unpin the previous write page first before reusing the reservation to allocate a new write page. Testing: - Add be test DeferAdvancingReadPage. - Loop the TestScratchLimit::test_with_unlimited_scratch_limit in my local dev machine and verify that each test passed without triggering the DCHECK violation. - Reenable result spooling in TestScratchLimit that was disabled in IMPALA-10559. - Pass core tests. Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a --- M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M tests/query_test/test_scratch_limit.py 4 files changed, 96 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17195/5 -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17195 ) Change subject: IMPALA-10584: Defer advancing read page when reservation is tight. .. Patch Set 5: (1 comment) s http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@728 PS5, Line 728: if (buffer_pool_client_->GetUnusedReservation() < last_attached_page_len_) { : // Since attached_to_output_batch is true and GetUnusedReservation() is less : // than last_attached_page_len_, the reader is most likely has not clean up the : // RowBatch where the read page is attached to. We defer advancing the read page : // until the next GetNext() call by the reader (see IMPALA-10584). : defer_advancing_read_page = true; : } > Your understanding is correct. Ideally, BufferedTupleStream should be 100% Ok, after some investigation, I'm not confident I can implement solution 2) safely. The refactoring will touch several code that I don't fully understand yet such as grouping-aggregator, partitioned-hash-join-builder, and analytic-eval-node. What if I change the logic into this instead: defer_advancing_read_page = num_pages_ <= 2; Basically, we avoid getting into situation where we ended up with only 1 read-write page while unpinning the stream. I think this is also a valid reason not to advance the read page (the other reason is that the reader has not freed the attached buffer). The "stealing" still won't happen and the DCHECK still won't hit. -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 14 Apr 2021 21:27:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17195 ) Change subject: IMPALA-10584: Defer advancing read page when reservation is tight. .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@728 PS5, Line 728: if (buffer_pool_client_->GetUnusedReservation() < last_attached_page_len_) { : // Since attached_to_output_batch is true and GetUnusedReservation() is less : // than last_attached_page_len_, the reader is most likely has not clean up the : // RowBatch where the read page is attached to. We defer advancing the read page : // until the next GetNext() call by the reader (see IMPALA-10584). : defer_advancing_read_page = true; : } > Maybe I'm misunderstanding something. This still seems tricky for me. I agr Your understanding is correct. Ideally, BufferedTupleStream should be 100% aware whether its client has freed the attached buffer or not. It raise a question though on how to do that, because the stream lost reference to the buffer once it attach the buffer to output row batch. In relation of BufferedPlanRootSink and SpillableRowBatchQueue, I can think of 2 solution: 1). Track the amount of both used and unused reservation at the end of GetNext() call. We can then check the reservation amount again in UnpinStream() to determine whether client has freed the buffer or not. However, this can be tricky and lead to more corner cases such as what if we increase/reduce the total reservation in between? 2). Add an explicit method in BufferedTupleStream to signal the free. SpillableRowBatchQueue then need similar method as well to chain the signal down from BufferedPlanRootSink to BufferedTupleStream. The mechanics might becomes awkward, but it maintain correctness. I'll try to implement solution 2) first. -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 14 Apr 2021 18:13:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10596: De-flake TestAdmissionControllerStress
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17272 ) Change subject: IMPALA-10596: De-flake TestAdmissionControllerStress .. Patch Set 1: Code-Review+1 Looks good to me, thanks Bikram for digging into this. Agree that result spooling is not necessary for this test. -- To view, visit http://gerrit.cloudera.org:8080/17272 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1594a82f507db8dc094d4441dd8739d037f974ff Gerrit-Change-Number: 17272 Gerrit-PatchSet: 1 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 09 Apr 2021 17:35:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10583: Fix bug on unbounded result spooling memory config.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17187 ) Change subject: IMPALA-10583: Fix bug on unbounded result spooling memory config. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17187/1/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/17187/1/be/src/exec/buffered-plan-root-sink.cc@55 PS1, Line 55: if (max_spilled_mem <= 0) max_spilled_mem = INT64_MAX; > In practice, this would eventually encounter other limits such as running o The closest limit I find is scratch_dirs startup flag. In this flag, we define the path of scratch dirs and capacity quota (maximum bytes allowed to spill) for each dirs. But the quota specification itself is optional. So without quota being specified, I suppose we depend on OS to tell impala when disk space has ran out. https://impala.apache.org/docs/build/html/topics/impala_disk_space.html -- To view, visit http://gerrit.cloudera.org:8080/17187 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If8f5e3668281bba8813f8082f45b4faa7721530e Gerrit-Change-Number: 17187 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 09 Apr 2021 17:26:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page if stream only has 2 pages.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17195 ) Change subject: IMPALA-10584: Defer advancing read page if stream only has 2 pages. .. Patch Set 6: (8 comments) Patch set 5 change the condition on when to NOT advance the read page. Read page will not be advanced in UnpinStream if read page is attached to output RowBatch and there are only 2 pages in the stream. http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc File be/src/runtime/buffered-tuple-stream-test.cc: http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1271 PS5, Line 1271: ASSERT_OK(stream.UnpinStream(BufferedTupleStream::UNPIN_ALL_EXCEPT_CURRENT)); > Not related to this patch, but I'm confused how do we verify we have advanc I think this verify the read page has been advanced by not hitting DCHECK at ExpectedPinCount(). http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1300 PS5, Line 1300: // Test that UnpinStream defer advancing the read page when all rows from the read page > Could you add some comments above this? e.g. Done http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1333 PS5, Line 1333: ASSERT_TRUE(read_batch.num_rows() < num_rows); > Could you add a comment here that we are adding rows without releasing the Done http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1334 PS5, Line 1334: ASSERT_TRUE(!eos); > Could you leave a comment about why do we run this twice? I guess we want t Done http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1354 PS5, Line 1354: // Retry inserting this row by decreasing the index. > Could you add ASSERT_FALSE(stream.is_pinned()) at the end? Done http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream-test.cc@1356 PS5, Line 1356: // even if we're not immediately cleaning up the read_batch. > Do we need read_batch.Reset() as well? Done http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@728 PS5, Line 728: // defer advancing the read page until the next GetNext() call by the reader : // (see IMPALA-10584). : defer_advancing_read_page = true; : } : } : : if > Ok, after some investigation, I'm not confident I can implement solution 2) Marking this as Done. Lets continue this discussion in patch set 5. http://gerrit.cloudera.org:8080/#/c/17195/5/be/src/runtime/buffered-tuple-stream.cc@747 PS5, Line 747: > May DCHECK() on this assertion here. Done -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 14 Apr 2021 22:22:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page if stream only has 2 pages.
Hello Quanlong Huang, Qifan Chen, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17195 to look at the new patch set (#6). Change subject: IMPALA-10584: Defer advancing read page if stream only has 2 pages. .. IMPALA-10584: Defer advancing read page if stream only has 2 pages. TestScratchLimit::test_with_unlimited_scratch_limit has been intermittently crashing in ubuntu-16.04-dockerised-tests environment after result spooling is enabled by default in IMPALA-9856. DCHECK violation occurs in ReservationTracker::CheckConsistency() due to BufferedTupleStream wrongly tries to reclaim memory reservation while unpinning the stream. For this bug to surface, all of the following needs to happen: - Stream is in pinned mode. - There are only 2 pages in the stream: 1 read and 1 write. - Stream can not increase reservation anymore either due to memory pressure or low buffer/memory limit. - The stream read page has been fully read and is attached to output RowBatch. But the output RowBatch has not cleaned up yet. - BufferedTupleStream::UnpinStream is invoked. The memory accounting bug happens because UnpinStream proceeds to NextReadPage where the read page buffer was mistakenly assumed as released. default_page_len_ bytes were added into write_page_reservation_ and subsequently violates the total memory reservation. This patch fixes the bug by deferring advancement of the read iterator in UnpinStream if the read page is attached to output RowBatch and there are only 2 pages in the stream. This is OK because after UnpinStream finished, the stream is now in unpinned mode and has_read_write_page is false. The next AddRow operation is then allowed to unpin the previous write page first before reusing the reservation to allocate a new write page. The next GetNext call will be responsible to advance the read page. Testing: - Add be test DeferAdvancingReadPage. - Loop the TestScratchLimit::test_with_unlimited_scratch_limit in my local dev machine and verify that each test passed without triggering the DCHECK violation. - Reenable result spooling in TestScratchLimit that was disabled in IMPALA-10559. - Pass core tests. Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a --- M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/buffered-tuple-stream.cc M be/src/runtime/buffered-tuple-stream.h M tests/query_test/test_scratch_limit.py 4 files changed, 100 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17195/6 -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10584: Defer advancing read page if stream only has 2 pages.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17195 ) Change subject: IMPALA-10584: Defer advancing read page if stream only has 2 pages. .. Patch Set 6: > Patch Set 6: > > (8 comments) > > Patch set 5 change the condition on when to NOT advance the read page. > Read page will not be advanced in UnpinStream if read page is attached to > output RowBatch and there are only 2 pages in the stream. Sorry, I should mention patch set 6 instead of 5. -- To view, visit http://gerrit.cloudera.org:8080/17195 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a Gerrit-Change-Number: 17195 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 14 Apr 2021 22:43:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10497: Fix flakiness in test no fd caching on cached data.
Hello Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17054 to look at the new patch set (#3). Change subject: IMPALA-10497: Fix flakiness in test_no_fd_caching_on_cached_data. .. IMPALA-10497: Fix flakiness in test_no_fd_caching_on_cached_data. test_no_fd_caching_on_cached_data has been flaky for not having all of the data fully cached in the warm-up phase. There is a limit on concurrency in writing to the cache such that we may fail to cache data the first time read it. This patch fixes the test by repeating the warm-up query 5 times. This patch also reduce the test cluster size to just 1 to avoid potential flakiness from multiple impalad racing to delete the same old fata cache files. Testing: - Loop the test manually 100 times and see no more failures. Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952 --- M tests/custom_cluster/test_hdfs_fd_caching.py 1 file changed, 8 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/17054/3 -- To view, visit http://gerrit.cloudera.org:8080/17054 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952 Gerrit-Change-Number: 17054 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-10497: Fix flakiness in test no fd caching on cached data.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17054 ) Change subject: IMPALA-10497: Fix flakiness in test_no_fd_caching_on_cached_data. .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/17054/2/tests/custom_cluster/test_hdfs_fd_caching.py File tests/custom_cluster/test_hdfs_fd_caching.py: http://gerrit.cloudera.org:8080/#/c/17054/2/tests/custom_cluster/test_hdfs_fd_caching.py@168 PS2, Line 168: 500MB", > Under the covers, custom cluster test uses bin/start-impala-cluster.py and Done. Thanks for pointing out this solution. -- To view, visit http://gerrit.cloudera.org:8080/17054 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952 Gerrit-Change-Number: 17054 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 16 Feb 2021 21:10:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10497: Fix flakiness in test no fd caching on cached data.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17054 ) Change subject: IMPALA-10497: Fix flakiness in test_no_fd_caching_on_cached_data. .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/17054/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17054/3//COMMIT_MSG@13 PS3, Line 13: This patch also add a proper start_args to the : test so that each impalad will write their data cache file in their own : directory. > Oh, can you revise this statement? Done. Thanks for catching this! -- To view, visit http://gerrit.cloudera.org:8080/17054 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I774f9dfea7dcc107c3c7f2b76db3aaf4b2dd7952 Gerrit-Change-Number: 17054 Gerrit-PatchSet: 4 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 16 Feb 2021 22:10:50 + Gerrit-HasComments: Yes