[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 ) Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection .. Patch Set 1: Code-Review+1 (1 comment) I'm comfortable with this. Not sure if anyone else wants to take a look. http://gerrit.cloudera.org:8080/#/c/8623/1/testdata/workloads/functional-query/queries/QueryTest/scanners.test File testdata/workloads/functional-query/queries/QueryTest/scanners.test: http://gerrit.cloudera.org:8080/#/c/8623/1/testdata/workloads/functional-query/queries/QueryTest/scanners.test@80 PS1, Line 80: select count(*) from alltypes where rand() * 10 > 0; I think rand() can return 0.0, so maybe it would be better to make this >= so it's guaranteed to be always true. -- To view, visit http://gerrit.cloudera.org:8080/8623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517 Gerrit-Change-Number: 8623 Gerrit-PatchSet: 1 Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 27 Nov 2017 17:35:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5936: operator '%' overflows on large decimals
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8574 ) Change subject: IMPALA-5936: operator '%' overflows on large decimals .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8574 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2b06c8acd5aa490943e84013faf2eaac7c26ceb4 Gerrit-Change-Number: 8574 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 28 Nov 2017 17:35:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6241: timeout in admission control test under ASAN
Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8652 Change subject: IMPALA-6241: timeout in admission control test under ASAN .. IMPALA-6241: timeout in admission control test under ASAN The fix for IMPALA-6241 is to increase the timeout for all slow builds. While testing that fix, I discovered that the ASAN build detection logic was failing silently, resulting in it assuming that it was testing a DEBUG build. The error was: Unexpected DW_AT_name in first CU: /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/llvm/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_preinit.cc; choosing DEBUG The fix for that issue is to remove the build type detection heuristic and instead just write a file with the build type as part of the build process. Testing: Before this change I was able to reproduce locally every 5-10 test iterations. After this change I haven't seen it reproduce. Change-Id: Ia4ed949cac99b9925f72e19e4adaa2ead370b536 --- M .gitignore M CMakeLists.txt M bin/clean-cmake.sh M infra/python/deps/requirements.txt M tests/common/environ.py M tests/custom_cluster/test_admission_controller.py 6 files changed, 54 insertions(+), 140 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/8652/1 -- To view, visit http://gerrit.cloudera.org:8080/8652 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ia4ed949cac99b9925f72e19e4adaa2ead370b536 Gerrit-Change-Number: 8652 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 ) Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc@226 PS2, Line 226: DCHECK_EQ(scan_node_->tuple_desc()->byte_size(), 0); This seems to violate the invariant that the scan node only produces non-NULL tuples (the previous version could too). I'm not sure if the planner can produce plans that have a TupleIsNull() predicate on a scan producing zero slots, because as far as I understand, TupleIsNull() predicates are only emitted when a slot from the scan is referenced, but I could be wrong - I feel like there could be some edge case if you get creative with nested loop joins, etc (which pass the tuple pointers through unmodified). I think we should either write a dummy non-NULL pointer or have a comment explaining why that's not necessary. http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/runtime/row-batch.cc File be/src/runtime/row-batch.cc: http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/runtime/row-batch.cc@60 PS2, Line 60: tuple_ptrs_ = reinterpret_cast(calloc(1, tuple_ptrs_size_)); I don't think we should zero-initialise this. We can't depend on the tuple_ptrs_ being zero-initialised since Reset() doesn't zero-initialise them, so it's just unnecessary overhead. -- To view, visit http://gerrit.cloudera.org:8080/8623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517 Gerrit-Change-Number: 8623 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 28 Nov 2017 19:52:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6227: deflake admission stress tests
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8631 ) Change subject: IMPALA-6227: deflake admission stress tests .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/8631/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8631/1//COMMIT_MSG@11 PS1, Line 11: All of the accounting in the test implicitly relies on queries not being : dequeued until queries are later explicitly ended, so if this happened, : the test broke in multiple subtle ways. > and this assumption is no longer true because of the final change for IMPAL I don't think it was ever true for the mem_limit tests. Even before IMPALA-1575 it depended implicitly on the memory staying reserved on all backends. >From what I can tell, the bug was always there, it was a tweak to the >statestore frequency that threw it out of balance. http://gerrit.cloudera.org:8080/#/c/8631/1/tests/custom_cluster/test_admission_controller.py File tests/custom_cluster/test_admission_controller.py: http://gerrit.cloudera.org:8080/#/c/8631/1/tests/custom_cluster/test_admission_controller.py@705 PS1, Line 705: b4 > where is that used? Leftover debugging code that I missed removing. http://gerrit.cloudera.org:8080/#/c/8631/1/tests/custom_cluster/test_admission_controller.py@792 PS1, Line 792: amount of time > is this actually timing sensitive, or is it that as long as we don't fetch It shouldn't be timing sensitive. Updated the comment. -- To view, visit http://gerrit.cloudera.org:8080/8631 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iafb3af0ce68f96e5d713dbb3b37dd0b50ea66bb4 Gerrit-Change-Number: 8631 Gerrit-PatchSet: 1 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Nov 2017 22:45:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6227: deflake admission stress tests
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8631 ) Change subject: IMPALA-6227: deflake admission stress tests .. Patch Set 5: Code-Review+2 Carry -- To view, visit http://gerrit.cloudera.org:8080/8631 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iafb3af0ce68f96e5d713dbb3b37dd0b50ea66bb4 Gerrit-Change-Number: 8631 Gerrit-PatchSet: 5 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Nov 2017 23:17:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6227: deflake admission stress tests
Hello Bikramjeet Vig, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8631 to look at the new patch set (#2). Change subject: IMPALA-6227: deflake admission stress tests .. IMPALA-6227: deflake admission stress tests The problem was that, during the initial admission decision phase, some queries were initially queued then dequeued once memory came available. All of the accounting in the test implicitly relies on queries not being dequeued until queries are later explicitly ended, so if this happened, the test broke in multiple subtle ways. This happened because the query only scanned a small number of rows, which could be all buffered on the receiver side of the exchange even before the client fetched any rows from the coordinator. This means that the reserved memory on some backends could increase then decrease during the initial admission phase, resulting in a query being queued then dequeued. The fix is to increase the number of rows returned by the query so that all fragments remain active during the initial admission phase. This increased test execution time somewhat, so I also had to bump the queue wait timeout for the admission stress tests (they assume that queries don't time out in the queue). Testing: Ran the test under debug, release and ASAN builds, i.e. impala-py.test tests/custom_cluster/test_admission_controller.py \ --workload_exploration_strategy="functional-query:exhaustive" I looped the mem_limit test for a while to confirm it didn't reproduce (it reproduced reliably every 2-3 iterations before this fix). I will try looping it a bit more under different build types concurrently with the review. Change-Id: Iafb3af0ce68f96e5d713dbb3b37dd0b50ea66bb4 --- M fe/src/test/resources/llama-site-test2.xml M tests/custom_cluster/test_admission_controller.py 2 files changed, 89 insertions(+), 47 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/8631/2 -- To view, visit http://gerrit.cloudera.org:8080/8631 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iafb3af0ce68f96e5d713dbb3b37dd0b50ea66bb4 Gerrit-Change-Number: 8631 Gerrit-PatchSet: 2 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht
[Impala-ASF-CR] IMPALA-6227: deflake admission stress tests
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8631 ) Change subject: IMPALA-6227: deflake admission stress tests .. Patch Set 3: (1 comment) Just to summarise where this is at, it's a lot less flaky but I'm still seeing occasional flakes that I think are the result of the test seeing inconsistent metrics (i.e. admitted, queued, dequeued, etc are out of sync). http://gerrit.cloudera.org:8080/#/c/8631/3/tests/custom_cluster/test_admission_controller.py File tests/custom_cluster/test_admission_controller.py: http://gerrit.cloudera.org:8080/#/c/8631/3/tests/custom_cluster/test_admission_controller.py@706 PS3, Line 706: sleep(0.2) > what's that about? does the QUERY_TIMEOUT comment above need updating? I was experimenting with ways to reduce test runtime and reduce the chances of it hitting various timeouts. Factored out a constant to make the relationship clearer. -- To view, visit http://gerrit.cloudera.org:8080/8631 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iafb3af0ce68f96e5d713dbb3b37dd0b50ea66bb4 Gerrit-Change-Number: 8631 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Nov 2017 23:03:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6227: deflake admission stress tests
Hello Bikramjeet Vig, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8631 to look at the new patch set (#4). Change subject: IMPALA-6227: deflake admission stress tests .. IMPALA-6227: deflake admission stress tests The problem was that, during the initial admission decision phase, some queries were initially queued then dequeued once memory came available. All of the accounting in the test implicitly relies on queries not being dequeued until queries are later explicitly ended, so if this happened, the test broke in multiple subtle ways. This happened because the query only scanned a small number of rows, which could be all buffered on the receiver side of the exchange even before the client fetched any rows from the coordinator. This means that the reserved memory on some backends could increase then decrease during the initial admission phase, resulting in a query being queued then dequeued. The fix is to increase the number of rows returned by the query so that all fragments remain active during the initial admission phase. This increased test execution time somewhat, so I also had to bump the queue wait timeout for the admission stress tests (they assume that queries don't time out in the queue). Testing: Ran the test under debug, release and ASAN builds, i.e. impala-py.test tests/custom_cluster/test_admission_controller.py \ --workload_exploration_strategy="functional-query:exhaustive" I looped the mem_limit test for a while to confirm it didn't reproduce (it reproduced reliably every 2-3 iterations before this fix). It still reproduces every 5-10 runs with exhaustive+release, so I need to do further work to make it more robust. Change-Id: Iafb3af0ce68f96e5d713dbb3b37dd0b50ea66bb4 --- M fe/src/test/resources/llama-site-test2.xml M tests/custom_cluster/test_admission_controller.py 2 files changed, 114 insertions(+), 47 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/8631/4 -- To view, visit http://gerrit.cloudera.org:8080/8631 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iafb3af0ce68f96e5d713dbb3b37dd0b50ea66bb4 Gerrit-Change-Number: 8631 Gerrit-PatchSet: 4 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6227: deflake admission stress tests
Hello Bikramjeet Vig, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8631 to look at the new patch set (#5). Change subject: IMPALA-6227: deflake admission stress tests .. IMPALA-6227: deflake admission stress tests The problem was that, during the initial admission decision phase, some queries were initially queued then dequeued once memory came available. All of the accounting in the test implicitly relies on queries not being dequeued until queries are later explicitly ended, so if this happened, the test broke in multiple subtle ways. This happened because the query only scanned a small number of rows, which could be all buffered on the receiver side of the exchange even before the client fetched any rows from the coordinator. This means that the reserved memory on some backends could increase then decrease during the initial admission phase, resulting in a query being queued then dequeued. The fix is to increase the number of rows returned by the query so that all fragments remain active during the initial admission phase. This increased test execution time somewhat, so I also had to bump the queue wait timeout for the admission stress tests (they assume that queries don't time out in the queue). Testing: Ran the test under debug, release and ASAN builds, i.e. impala-py.test tests/custom_cluster/test_admission_controller.py \ --workload_exploration_strategy="functional-query:exhaustive" I looped the mem_limit test for a while to confirm it didn't reproduce (it reproduced reliably every 2-3 iterations before this fix). It still reproduces every 5-10 runs with exhaustive+release, so I need to do further work to make it more robust. Change-Id: Iafb3af0ce68f96e5d713dbb3b37dd0b50ea66bb4 --- M fe/src/test/resources/llama-site-test2.xml M tests/custom_cluster/test_admission_controller.py 2 files changed, 116 insertions(+), 49 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/8631/5 -- To view, visit http://gerrit.cloudera.org:8080/8631 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iafb3af0ce68f96e5d713dbb3b37dd0b50ea66bb4 Gerrit-Change-Number: 8631 Gerrit-PatchSet: 5 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6227: deflake admission stress tests
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8631 ) Change subject: IMPALA-6227: deflake admission stress tests .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/8631/4/tests/custom_cluster/test_admission_controller.py File tests/custom_cluster/test_admission_controller.py: http://gerrit.cloudera.org:8080/#/c/8631/4/tests/custom_cluster/test_admission_controller.py@698 PS4, Line 698: every : # second > shouldn't that say "at least every second" or something? Done -- To view, visit http://gerrit.cloudera.org:8080/8631 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iafb3af0ce68f96e5d713dbb3b37dd0b50ea66bb4 Gerrit-Change-Number: 8631 Gerrit-PatchSet: 4 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Nov 2017 23:16:40 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3804: Re-enable per-scan filtering for sequence-based scanners
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8684 ) Change subject: IMPALA-3804: Re-enable per-scan filtering for sequence-based scanners .. Patch Set 1: (2 comments) The code looks correct to me, but I had some ideas about how a cleaner solution - let me know if they make sense or if I'm missing something. http://gerrit.cloudera.org:8080/#/c/8684/1/be/src/exec/base-sequence-scanner.cc File be/src/exec/base-sequence-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8684/1/be/src/exec/base-sequence-scanner.cc@174 PS1, Line 174: if (!scan_node_->PartitionPassesFilters(context_->partition_descriptor()->id(), This is a bit inconsistent with the other file types because it checks once per 1024-row batch (GetNextInternal() is called many times per file) instead of once per scan range. Not the biggest deal but it adds another special case to this code (which unfortunately already has a lot of them). http://gerrit.cloudera.org:8080/#/c/8684/1/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/8684/1/be/src/exec/hdfs-scan-node.cc@a497 PS1, Line 497: : After looking at the code I'm wondering if this idea would lead to a more elegant solution. If I understand correctly, the idea is that, instead of marking the scan range done by incrementing progress_, for sequence file header ranges, we should instead mark all the ranges in the file as done by calling RangeComplete() with all of the ranges in the file. I think then we could maybe remove the special casing for FileFormatIsSequenceBased() and instead check something like if (metadata->is_header_range()) Which seems to express the logic more directly. -- To view, visit http://gerrit.cloudera.org:8080/8684 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4b38c26bcbe67f83efcc65a1723d766626ae3d3e Gerrit-Change-Number: 8684 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 29 Nov 2017 17:20:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Update incubator-impala -> impala URLs
Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8685 Change subject: Update incubator-impala -> impala URLs .. Update incubator-impala -> impala URLs This fixes push_to_asf.py and various other scripts that had the Apache repo location hard-coded. Also fixed the location of the github mirror and mailing list archives. Testing: Ran push_to_asf.py to check I got the URL right. Checked a couple of the github and mailing list URLs to make sure the new URL is valid. Change-Id: Ie49221300340ef34bdd7c01670c35bdbbce3e84f --- M bin/bootstrap_system.sh M bin/push_to_asf.py M docs/README.md M docs/impala_keydefs.ditamap M docs/topics/impala_install.xml M docs/topics/impala_reserved_words.xml M tests/benchmark/report_benchmark_results.py 7 files changed, 11 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/8685/1 -- To view, visit http://gerrit.cloudera.org:8080/8685 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ie49221300340ef34bdd7c01670c35bdbbce3e84f Gerrit-Change-Number: 8685 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 ) Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517 Gerrit-Change-Number: 8623 Gerrit-PatchSet: 4 Gerrit-Owner: Michael HoGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 29 Nov 2017 01:24:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6255: Add device names to DiskIoMgr thread names
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8669 ) Change subject: IMPALA-6255: Add device names to DiskIoMgr thread names .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8669 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I30faeda6db8846e4aad64ce29ca811366d84910b Gerrit-Change-Number: 8669 Gerrit-PatchSet: 2 Gerrit-Owner: Lars VolkerGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 29 Nov 2017 22:33:26 + Gerrit-HasComments: No
[Impala-ASF-CR] Remove "incubator-" from URLs.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8698 ) Change subject: Remove "incubator-" from URLs. .. Patch Set 1: Heh :) -- To view, visit http://gerrit.cloudera.org:8080/8698 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6a8e10418f75c89a1f0b2111f860417a80dc10c5 Gerrit-Change-Number: 8698 Gerrit-PatchSet: 1 Gerrit-Owner: Lars VolkerGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 30 Nov 2017 16:32:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2248: Make idle session timeout a query option
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8490 ) Change subject: IMPALA-2248: Make idle_session_timeout a query option .. Patch Set 10: (5 comments) Did an initial pass. Still need to look at the tests in detail. http://gerrit.cloudera.org:8080/#/c/8490/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8490/10//COMMIT_MSG@7 PS10, Line 7: IMPALA-2248: Make idle_session_timeout a query option I did an initial pass over the code and wanted to clarify my understanding of the current behaviour: * Does "set idle_session_timeout" in impala-shell have any effect? I wouldn't think so since it just updates a client-side map, right? * The request pool query options overlay doesn't have an effect on this option, does it? Since a session doesn't belong to a request pool. I think unfortunately this will end up being a bit of a special case in terms of behaviour, but I think this is expected. http://gerrit.cloudera.org:8080/#/c/8490/10/be/src/service/client-request-state.h File be/src/service/client-request-state.h: http://gerrit.cloudera.org:8080/#/c/8490/10/be/src/service/client-request-state.h@423 PS10, Line 423: void UpdateSessionTimeout(int32_t requested_timeout); I think this assumes that session_->lock_ is held. We should be careful to document assumptions about which locks are and aren't held in this part of the code (there are unfortunately a lot of locks). http://gerrit.cloudera.org:8080/#/c/8490/10/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/8490/10/be/src/service/client-request-state.cc@207 PS10, Line 207: UpdateSessionTimeout(atoi(value.c_str())); It feels a bit weird that we're doing string parsing of a query option here. Maybe we should set the query option first and then get the parsed value? http://gerrit.cloudera.org:8080/#/c/8490/10/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8490/10/be/src/service/impala-server.h@436 PS10, Line 436: void SetTimeout(int32_t requested_timeout); The caller must hold 'lock', right? http://gerrit.cloudera.org:8080/#/c/8490/7/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/8490/7/be/src/service/impala-server.h@423 PS7, Line 423: /// session may be correctly expired after a timeout (when ref_count == 0). Typically Need to think about whether this is necessary - should we just use set_query_options? -- To view, visit http://gerrit.cloudera.org:8080/8490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e Gerrit-Change-Number: 8490 Gerrit-PatchSet: 10 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 01 Dec 2017 02:13:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6201: Fix test basic filters on ASAN
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8646 ) Change subject: IMPALA-6201: Fix test_basic_filters on ASAN .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8646/1/tests/query_test/test_runtime_filters.py File tests/query_test/test_runtime_filters.py: http://gerrit.cloudera.org:8080/#/c/8646/1/tests/query_test/test_runtime_filters.py@27 PS1, Line 27: WAIT_TIME_MS = specific_build_type_timeout(6, slow_build_timeout=10) Just curious - did the longer timeout take effect on ASAN for you? I was seeing some cases locally where the ASAN build type detection logic flaked out on me and fell back to the DEBUG timing. -- To view, visit http://gerrit.cloudera.org:8080/8646 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8c20cbb75a9b6da73137f220657aa75dea9dfdce Gerrit-Change-Number: 8646 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 27 Nov 2017 22:46:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6241: timeout in admission control test under ASAN
Hello Michael Brown, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8652 to look at the new patch set (#2). Change subject: IMPALA-6241: timeout in admission control test under ASAN .. IMPALA-6241: timeout in admission control test under ASAN The fix for IMPALA-6241 is to increase the timeout for all slow builds. While testing that fix, I discovered that the ASAN build detection logic was failing silently, resulting in it assuming that it was testing a DEBUG build. The error was: Unexpected DW_AT_name in first CU: /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/llvm/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_preinit.cc; choosing DEBUG The fix for that issue is to remove the build type detection heuristic and instead just write a file with the build type as part of the build process. Testing: Before this change I was able to reproduce locally every 5-10 test iterations. After this change I haven't seen it reproduce. Change-Id: Ia4ed949cac99b9925f72e19e4adaa2ead370b536 --- M .gitignore M CMakeLists.txt M bin/clean-cmake.sh M infra/python/deps/requirements.txt M tests/common/environ.py M tests/custom_cluster/test_admission_controller.py 6 files changed, 61 insertions(+), 143 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/8652/2 -- To view, visit http://gerrit.cloudera.org:8080/8652 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia4ed949cac99b9925f72e19e4adaa2ead370b536 Gerrit-Change-Number: 8652 Gerrit-PatchSet: 2 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Michael Brown
[Impala-ASF-CR] IMPALA-6241: timeout in admission control test under ASAN
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8652 ) Change subject: IMPALA-6241: timeout in admission control test under ASAN .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/8652/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8652/1//COMMIT_MSG@10 PS1, Line 10: : While testing that fix, I discovered that the ASAN build detection logic : was failing silently > Sorry to hear this wasn't working. Thanks for making an improvement. It looks like it was my bug that caused it to fail actually :) (it's unfortunate that the silent nature of the failure caused us to go this long without noticing it though). http://gerrit.cloudera.org:8080/#/c/8652/1/infra/python/deps/requirements.txt File infra/python/deps/requirements.txt: http://gerrit.cloudera.org:8080/#/c/8652/1/infra/python/deps/requirements.txt@45 PS1, Line 45: monkeypatch == 0.1rc3 > I don't think anything else is using this, so it can be removed. Done http://gerrit.cloudera.org:8080/#/c/8652/1/tests/common/environ.py File tests/common/environ.py: http://gerrit.cloudera.org:8080/#/c/8652/1/tests/common/environ.py@a198 PS1, Line 198: > I guess the bug is that this should have been elif. Oops, that is my bug then - I didn't see this when looking at the code. It doesn't seem like the changed approach really directly addresses the bug then. http://gerrit.cloudera.org:8080/#/c/8652/1/tests/common/environ.py@19 PS1, Line 19: import pytest > Unused; remove. Done http://gerrit.cloudera.org:8080/#/c/8652/1/tests/common/environ.py@112 PS1, Line 112: is_dev > Do you want to add ubsan to this method? Done http://gerrit.cloudera.org:8080/#/c/8652/1/tests/common/environ.py@122 PS1, Line 122: runs_slowly > Do you want to add ubsan to this method? Done -- To view, visit http://gerrit.cloudera.org:8080/8652 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia4ed949cac99b9925f72e19e4adaa2ead370b536 Gerrit-Change-Number: 8652 Gerrit-PatchSet: 1 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 28 Nov 2017 01:20:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2181: Add query option levels for display
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8447 ) Change subject: IMPALA-2181: Add query option levels for display .. Patch Set 14: (5 comments) http://gerrit.cloudera.org:8080/#/c/8447/18/be/src/service/query-options.h File be/src/service/query-options.h: http://gerrit.cloudera.org:8080/#/c/8447/18/be/src/service/query-options.h@113 PS18, Line 113: per discussion on the JIRA, I think this should be under DEVELOPMENT until it's documented. http://gerrit.cloudera.org:8080/#/c/8447/18/shell/impala_client.py File shell/impala_client.py: http://gerrit.cloudera.org:8080/#/c/8447/18/shell/impala_client.py@100 PS18, Line 100: _indent_level, output): This check shouldn't be necessary, right? Unless you're doing what I did and running impala-shell against the wrong thrift output because you're too lazy to rebuild Impala :). http://gerrit.cloudera.org:8080/#/c/8447/18/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/8447/18/shell/impala_shell.py@83 PS18, Line 83: correspond http://gerrit.cloudera.org:8080/#/c/8447/18/shell/impala_shell.py@242 PS18, Line 242: self._print_option_group(advanced_options) This is subtle - can you add a comment? If I understand correctly, this is detecting that the impala server didn't send back any advanced query options, so we're assuming it's an old one. http://gerrit.cloudera.org:8080/#/c/8447/14/tests/hs2/test_hs2.py File tests/hs2/test_hs2.py: http://gerrit.cloudera.org:8080/#/c/8447/14/tests/hs2/test_hs2.py@53 PS14, Line 53: fetch_results_req.operationHandle = execute_statement_resp.operationHandle > This function is actually a copy paste: I moved it out from test_session_op Seems ok then. -- To view, visit http://gerrit.cloudera.org:8080/8447 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I75720d0d454527e1a0ed19bb43cf9e4f018ce1d1 Gerrit-Change-Number: 8447 Gerrit-PatchSet: 14 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 22 Nov 2017 19:06:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6241: timeout in admission control test under ASAN
Hello Michael Brown, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8652 to look at the new patch set (#4). Change subject: IMPALA-6241: timeout in admission control test under ASAN .. IMPALA-6241: timeout in admission control test under ASAN The fix for IMPALA-6241 is to increase the timeout for all slow builds. While testing that fix, I discovered that the ASAN build detection logic was failing silently, resulting in it assuming that it was testing a DEBUG build. The error was: Unexpected DW_AT_name in first CU: /data/jenkins/workspace/verify-impala-toolchain-package-build/label/ec2-package-ubuntu-16-04/toolchain/source/llvm/llvm-3.9.1.src/projects/compiler-rt/lib/asan/asan_preinit.cc; choosing DEBUG The fix for that issue is to remove the build type detection heuristic and instead just write a file with the build type as part of the build process. Testing: Before this change I was able to reproduce locally every 5-10 test iterations. After this change I haven't seen it reproduce. Change-Id: Ia4ed949cac99b9925f72e19e4adaa2ead370b536 --- M .gitignore M CMakeLists.txt M bin/clean-cmake.sh M infra/python/deps/requirements.txt M tests/common/environ.py M tests/custom_cluster/test_admission_controller.py 6 files changed, 62 insertions(+), 133 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/8652/4 -- To view, visit http://gerrit.cloudera.org:8080/8652 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia4ed949cac99b9925f72e19e4adaa2ead370b536 Gerrit-Change-Number: 8652 Gerrit-PatchSet: 4 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6241: timeout in admission control test under ASAN
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8652 ) Change subject: IMPALA-6241: timeout in admission control test under ASAN .. Patch Set 4: Code-Review+2 I discovered that some of the the code I thought was dead was actually imported by another test. Reinstated it. -- To view, visit http://gerrit.cloudera.org:8080/8652 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia4ed949cac99b9925f72e19e4adaa2ead370b536 Gerrit-Change-Number: 8652 Gerrit-PatchSet: 4 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 28 Nov 2017 23:56:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8549 ) Change subject: IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6cefaf1dae78baae238289816a7cb9d210fb38e2 Gerrit-Change-Number: 8549 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 29 Nov 2017 00:12:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8549 ) Change subject: IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6cefaf1dae78baae238289816a7cb9d210fb38e2 Gerrit-Change-Number: 8549 Gerrit-PatchSet: 9 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 29 Nov 2017 00:12:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 ) Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection .. Patch Set 2: (1 comment) Would it make sense to leave the existing behaviour and file the potential bug with NULL tuples as a separate JIRA? It sounds like the problem and solution for this bug are clearer but the other problem is very unclear. http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc@226 PS2, Line 226: DCHECK_EQ(scan_node_->tuple_desc()->byte_size(), 0); > I agree writing a dummy non-NULL pointer seems most correct, but have we de I was able to construct a query (based on an existing test) that had a TupleIsNull() predicate referencing the tuple from scan that doesn't materialise anything: SELECT /* +straight_join */ COUNT(t1.id) FROM functional.alltypessmall t1 LEFT OUTER JOIN ( SELECT /* +straight_join */ IFNULL(t2.int_col, 1) AS c FROM functional.alltypessmall t2 LEFT OUTER JOIN functional.alltypestiny t3 ON t2.id < 1000 ) v ON t1.int_col = v.c; The relevant part of the plan is: | 04:HASH JOIN [LEFT OUTER JOIN, PARTITIONED] | | | hash predicates: t1.int_col = if(TupleIsNull(1, 2), NULL, ifnull(t2.int_col, 1)) | | | fk/pk conjuncts: assumed fk/pk | | | mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB | | | tuple-ids=0,1N,2N row-size=16B cardinality=100 | | | | | |--08:EXCHANGE [HASH(if(TupleIsNull(1, 2), NULL, ifnull(t2.int_col, 1)))] | | | | mem-estimate=0B mem-reservation=0B | | | | tuple-ids=1,2N row-size=8B cardinality=100 | | | | | | | F01:PLAN FRAGMENT [RANDOM] hosts=3 instances=3 | | | Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B | | | 03:NESTED LOOP JOIN [LEFT OUTER JOIN, BROADCAST] | | | | join predicates: t2.id < 1000 | | | | mem-estimate=0B mem-reservation=0B | | | | tuple-ids=1,2N row-size=8B cardinality=100 | | | | | | | |--06:EXCHANGE [BROADCAST] | | | | | mem-estimate=0B mem-reservation=0B | | | | | tuple-ids=2 row-size=0B cardinality=8 | | | | | | | | | F02:PLAN FRAGMENT [RANDOM] hosts=3 instances=3 | | | | Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B | | | | 02:SCAN HDFS [functional.alltypestiny t3, RANDOM] | | | | partitions=4/4 files=4 size=460B | | | | stats-rows=8 extrapolated-rows=disabled | | | | table stats: rows=8 size=unavailable | | | | column stats: all | | | | mem-estimate=32.00MB mem-reservation=0B | | | | tuple-ids=2 row-size=0B cardinality=8 | | | | | | | 01:SCAN HDFS [functional.alltypessmall t2, RANDOM] | | | partitions=4/4 files=4 size=6.32KB | | | stats-rows=100 extrapolated-rows=disabled | | | table stats: rows=100 size=unavailable | | | column stats: all | | | mem-estimate=32.00MB mem-reservation=0B | | | tuple-ids=1 row-size=8B cardinality=100 | So it does look like the scenario is somehow possible. I don't really understand exactly why it is needed here. -- To view, visit
[Impala-ASF-CR] IMPALA-4132: Use -fno-omit-frame-pointer
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8612 ) Change subject: IMPALA-4132: Use -fno-omit-frame-pointer .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8612/1/be/CMakeLists.txt File be/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/8612/1/be/CMakeLists.txt@42 PS1, Line 42: # -fno-omit-frame-pointers: Keep frame pointer for functions. Produces better > From my side no such consideration was made. I didn't do any deep research into it. It's unclear to me what tooling supports that, whether it gets stripped out when binaries are stripped, etc. It may also be a good option, but I'm in favour of moving forward with the frame pointer approach since we know that it is widely supported and doesn't appear to have a significant perf impact. -- To view, visit http://gerrit.cloudera.org:8080/8612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib7d0d88ba015a847356ed0274fe91017b98cb941 Gerrit-Change-Number: 8612 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 29 Nov 2017 00:08:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1474: Add a metric for running queries
Tim Armstrong has abandoned this change. ( http://gerrit.cloudera.org:8080/7228 ) Change subject: IMPALA-1474: Add a metric for running queries .. Abandoned Seems like this is probably redundant now. Can always reopen. -- To view, visit http://gerrit.cloudera.org:8080/7228 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I41bc88d0e91427ca2fd70136dc6e06fa7a2663da Gerrit-Change-Number: 7228 Gerrit-PatchSet: 1 Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5987: LZ4 Codec silently produces bogus compressed data for large inputs
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8748 ) Change subject: IMPALA-5987: LZ4 Codec silently produces bogus compressed data for large inputs .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8748 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifb0bc4ed98c5d7b628b791aa90ead36347b9fbb8 Gerrit-Change-Number: 8748 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 04 Dec 2017 18:01:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6232: Disable file handle cache by default
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8750 ) Change subject: IMPALA-6232: Disable file handle cache by default .. Patch Set 1: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/8750/1/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8750/1/be/src/runtime/io/disk-io-mgr.cc@96 PS1, Line 96: // or smaller, depending on the replication factor for this file or the path name. Maybe add a TODO to re-enable it? http://gerrit.cloudera.org:8080/#/c/8750/1/be/src/runtime/io/disk-io-mgr.cc@109 PS1, Line 109: DEFINE_uint64(unused_file_handle_timeout_sec, 270, "Maximum time, in seconds, that an " Maybe also mention here why you chose 270 (I see there's an explanation in the commit message already) -- To view, visit http://gerrit.cloudera.org:8080/8750 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iea7f943f63b72b42286a9e8b9987308baa79d7b0 Gerrit-Change-Number: 8750 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnellGerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 04 Dec 2017 19:22:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3804: Re-enable per-scan filtering for sequence-based scanners
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8684 ) Change subject: IMPALA-3804: Re-enable per-scan filtering for sequence-based scanners .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8684/3/be/src/exec/base-sequence-scanner.cc File be/src/exec/base-sequence-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8684/3/be/src/exec/base-sequence-scanner.cc@56 PS3, Line 56: ScanRangeMetadata* header_metadata = new ScanRangeMetadata(*metadata); We need to add this to an ObjectPool so it's freed when the query is torn down (see what is done for the other place where these are allocated). -- To view, visit http://gerrit.cloudera.org:8080/8684 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4b38c26bcbe67f83efcc65a1723d766626ae3d3e Gerrit-Change-Number: 8684 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 04 Dec 2017 19:28:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2248: Make idle session timeout a query option
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8490 ) Change subject: IMPALA-2248: Make idle_session_timeout a query option .. Patch Set 10: (8 comments) http://gerrit.cloudera.org:8080/#/c/8490/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8490/10//COMMIT_MSG@7 PS10, Line 7: IMPALA-2248: Make idle_session_timeout a query option > * Yeah, currently it doesn't have any effect for the shell. As you said, it Sounds reasonable. We'll have to make sure to document the limitations clearly - we can file a Doc JIRA as a follow-up. http://gerrit.cloudera.org:8080/#/c/8490/10/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/8490/10/be/src/service/client-request-state.cc@207 PS10, Line 207: UpdateSessionTimeout(atoi(value.c_str())); > At this point the value can be invalid, e.g. the client wants to set the ti Right, but if the client provides an invalid value, I think it will just get swallowed up here. E.g. I tried set idle_session_timeout="foo"; and it seems to fail silently. I'd expect it return the "Invalid idle session timeout: " error to the client. I think we should add tests to set.test for validation of the option. http://gerrit.cloudera.org:8080/#/c/8490/11/fe/src/test/java/org/apache/impala/service/JdbcTest.java File fe/src/test/java/org/apache/impala/service/JdbcTest.java: http://gerrit.cloudera.org:8080/#/c/8490/11/fe/src/test/java/org/apache/impala/service/JdbcTest.java@616 PS11, Line 616: ew Long( nit: is the new Long() necessary? Seems like it should be automatically cast http://gerrit.cloudera.org:8080/#/c/8490/11/fe/src/test/java/org/apache/impala/service/JdbcTest.java@626 PS11, Line 626: int sleepPeriod = (int)(timeout * timeoutTolerance * 1000) + 500; I think the wakeup timing could do with a brief explanation. Is the idea is that each iteration might let one session expire and then renews the timeout for the remaining sessions? http://gerrit.cloudera.org:8080/#/c/8490/11/fe/src/test/java/org/apache/impala/service/JdbcTest.java@634 PS11, Line 634: new Long same here http://gerrit.cloudera.org:8080/#/c/8490/11/fe/src/test/java/org/apache/impala/service/JdbcTest.java@636 PS11, Line 636: Boolean bool? Not sure why we need to use the boxed type http://gerrit.cloudera.org:8080/#/c/8490/11/fe/src/test/java/org/apache/impala/service/JdbcTest.java@639 PS11, Line 639: try(ResultSet rs = connection.createStatement().executeQuery("SELECT 1+2")) { nit: whitespace http://gerrit.cloudera.org:8080/#/c/8490/11/fe/src/test/java/org/apache/impala/util/Metrics.java File fe/src/test/java/org/apache/impala/util/Metrics.java: http://gerrit.cloudera.org:8080/#/c/8490/11/fe/src/test/java/org/apache/impala/util/Metrics.java@43 PS11, Line 43: public Object getMetric(String metric) throws Exception { A brief comment would be helpful, mainly to explain what it returns - when reading the test I was initially confused about the casting to (Long). -- To view, visit http://gerrit.cloudera.org:8080/8490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32e2775f80da387b0df4195fe2c5435b3f8e585e Gerrit-Change-Number: 8490 Gerrit-PatchSet: 10 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 05 Dec 2017 01:25:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6265 Query cancellation test enhancements
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8713 ) Change subject: IMPALA-6265 Query cancellation test enhancements .. Patch Set 2: (6 comments) http://gerrit.cloudera.org:8080/#/c/8713/2/tests/shell/test_shell_commandline.py File tests/shell/test_shell_commandline.py: http://gerrit.cloudera.org:8080/#/c/8713/2/tests/shell/test_shell_commandline.py@349 PS2, Line 349: def wait_query_until_desired_state(self, stmt, state, max_retry=10): Thanks, this looks good. The name is a little non-idiomatic. Maybe something like wait_for_query_state()? http://gerrit.cloudera.org:8080/#/c/8713/2/tests/shell/test_shell_commandline.py@362 PS2, Line 362: return "The found in flight query is not the one under test: " + \ Maybe just raise an exception? Makes it hard for tests to accidentally forget to check the return value http://gerrit.cloudera.org:8080/#/c/8713/2/tests/shell/test_shell_commandline.py@366 PS2, Line 366: ++retry_count I don't think ++ works in python. It is syntactically valid but doesn't increment the variable. I think "retry_count += 1" is the most concise way of doing this. http://gerrit.cloudera.org:8080/#/c/8713/2/tests/shell/test_shell_commandline.py@368 PS2, Line 368: return "Query didn't reach desired state: " + state Same here - maybe raise an exception or just assert? http://gerrit.cloudera.org:8080/#/c/8713/2/tests/shell/test_shell_commandline.py@385 PS2, Line 385: get_statement_from_args It looks like we have the original statement in the tests so it seems better to pass the statement through wait_query_until_desired_state() instead of trying to extract it from the command line. http://gerrit.cloudera.org:8080/#/c/8713/2/tests/shell/test_shell_commandline.py@386 PS2, Line 386: ImapalShell ImpalaShell -- To view, visit http://gerrit.cloudera.org:8080/8713 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie0bff485a872df7be8efd784314a6ca91aaadd11 Gerrit-Change-Number: 8713 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 04 Dec 2017 17:39:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation
Tim Armstrong has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/8414 ) Change subject: IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation .. IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation In preparation for switching the I/O mgr to the buffer pool, this removes and cleans up a lot of code so that the switchover patch starts from a cleaner slate. * Remove the free buffer cache (which will be replaced by buffer pool's own caching). * Make memory limit exceeded error checking synchronous (in anticipation of having to propagate buffer pool errors synchronously). * Simplify error propagation - remove the (ineffectual) code that enqueued BufferDescriptors containing error statuses. * Document locking scheme better in a few places, make it part of the function signature when it seemed reasonable. * Move ReturnBuffer() to ScanRange, because it is intrinsically connected with the lifecycle of a scan range. * Separate external ReturnBuffer() and internal CleanUpBuffer() interfaces - previously callers of ReturnBuffer() were fudging the num_buffers_in_reader accounting to make the external interface work. * Eliminate redundant state in ScanRange: 'eosr_returned_' and 'is_cancelled_'. * Clarify the logic around calling Close() for the last BufferDescriptor. -> There appeared to be an implicit assumption that buffers would be freed in the order they were returned from the scan range, so that the "eos" buffer was returned last. Instead just count the number of outstanding buffers to detect the last one. -> Touching the is_cancelled_ field without holding a lock was hard to reason about - violated locking rules and it was unclear that it was race-free. * Remove DiskIoMgr::Read() to simplify interface. It is trivial to inline at the callsites. This will probably regress performance somewhat because of the cache removal, so my plan is to merge it around the same time as switching the I/O mgr to allocate from the buffer pool. I'm keeping the patches separate to make reviewing easier. Testing: * Ran exhaustive tests * Ran the disk-io-mgr-stress-test overnight Change-Id: If5cb42437d11c13bc4a55c3ab426b66777332bd1 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/scanner-context.cc M be/src/runtime/exec-env.cc M be/src/runtime/io/disk-io-mgr-stress.cc M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/mem-tracker.h M be/src/runtime/test-env.cc M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h 19 files changed, 575 insertions(+), 904 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8414/14 -- To view, visit http://gerrit.cloudera.org:8080/8414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If5cb42437d11c13bc4a55c3ab426b66777332bd1 Gerrit-Change-Number: 8414 Gerrit-PatchSet: 14 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time.
Hello Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8814 to look at the new patch set (#3). Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time. .. IMPALA-6290: limit ScannerContext to 1 buffer at a time. This is a prerequisite for constraining the number of buffers per scan range. Before this patch, calling ReadBytes(), SkipBytes(), etc could cause an arbitrary number of I/O buffers to accumulate in 'completed_io_buffers_'. E.g. if we allocated 3 * 8MB I/O buffers for a range and then called ReadBytes(30MB), we would hit resource exhaustion as soon as 3 buffers were accumulated in 'completed_io_buffers_'. The fix is to avoid accumulating any buffers in 'completed_io_buffers_'. Instead of adding them to 'completed_io_buffers_', completed buffers are just returned to the I/O manager. It turned out that this did not weaken the ScannerContext's guarantees about memory lifetime, because ScannerContext::GetBytesInternal() cleared 'boundary_buffer_' each time it was called regardless. I checked that this behaviour wasn't a bug by inspecting the scanner code. I could not find any cases where scanners depended on returned memory remaining valid beyond the next Read*()/Get*()/Skip*() call on the stream. This change makes that lifetime explicit in the comments. A side-effect of this fix is that scanners do not need to call ReleaseCompletedResources() in CommitRows() and means that the ScannerContext only ever needs to hold one I/O buffer at a time. This change also reimplements SkipBytes() to avoid it accumulating memory in the boundary buffer for large skip sizes. Also clarifies some of the invariants in ScannerContext. E.g. some places assumed io_buffer_ != NULL, but that is no longer needed. Testing: Ran core tests with ASAN and exhaustive tests with DEBUG. Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/exec/scanner-context.inline.h M testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-scan-node-errors.test 10 files changed, 241 insertions(+), 174 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8814/3 -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8820 ) Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE .. Patch Set 3: (3 comments) Took a quick look - I had some concerns with the solution at a high level http://gerrit.cloudera.org:8080/#/c/8820/3/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: http://gerrit.cloudera.org:8080/#/c/8820/3/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java@299 PS3, Line 299: if (!isReAnalyzed_) setManagedKuduTableName(); Alex and Dimitris are more authoritative on this than me, but adding a new re-analyzed sub-state to all statements seems like too global a solution to solve this local problem. I think we could solve this locally by either keeping track of whether KEY_TABLE_NAME was present in the original table properties, or by checking whether KEY_TABLE_NAME is the same as the managed table name (which is arguably less good because it allows the property to be specified if it's exactly the right value). http://gerrit.cloudera.org:8080/#/c/8820/3/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: http://gerrit.cloudera.org:8080/#/c/8820/3/tests/query_test/test_kudu.py@a861 PS3, Line 861: I think we still want the test coverage from the test. It seems like the first part needs to use an external table instead. http://gerrit.cloudera.org:8080/#/c/8820/3/tests/query_test/test_kudu.py@a881 PS3, Line 881: I this second part of the test is still valid - we want to make sure that "show create table" doesn't show the table name. -- To view, visit http://gerrit.cloudera.org:8080/8820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f Gerrit-Change-Number: 8820 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 12 Dec 2017 16:34:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8814 ) Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8814/2/be/src/exec/scanner-context.h File be/src/exec/scanner-context.h: http://gerrit.cloudera.org:8080/#/c/8814/2/be/src/exec/scanner-context.h@65 PS2, Line 65: buffer need to clarify "buffer". Should probably stick to "memory returned" like the first sentence. -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 2 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 12 Dec 2017 18:06:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6284: Mark the intermediate decimal avg struct as packed
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8836 ) Change subject: IMPALA-6284: Mark the intermediate decimal avg struct as packed .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8836/1/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/8836/1/be/src/exprs/aggregate-functions-ir.cc@388 PS1, Line 388: DCHECK_EQ(dst->len, sizeof(DecimalAvgState)); Did you test on a debug build? I think there's a constant in the FE that needs to be updated too (DECIMAL_AVG_INTERMEDIATE_SIZE). This DCHECK is meant to assert that the fe and be sizes are the same. -- To view, visit http://gerrit.cloudera.org:8080/8836 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id25ec6e20dde3f50fb37a22135b355ad251809e0 Gerrit-Change-Number: 8836 Gerrit-PatchSet: 1 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 14 Dec 2017 17:50:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6284: Mark the intermediate decimal avg struct as packed
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8836 ) Change subject: IMPALA-6284: Mark the intermediate decimal avg struct as packed .. Patch Set 1: UBSAN might also be able to detect problems like this (although I haven't investigated) -- To view, visit http://gerrit.cloudera.org:8080/8836 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id25ec6e20dde3f50fb37a22135b355ad251809e0 Gerrit-Change-Number: 8836 Gerrit-PatchSet: 1 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 14 Dec 2017 17:54:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5315: Cast to timestamp fails for YYYY-M-D format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7009 ) Change subject: IMPALA-5315: Cast to timestamp fails for -M-D format .. Patch Set 10: (5 comments) http://gerrit.cloudera.org:8080/#/c/7009/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/7009/10//COMMIT_MSG@39 PS10, Line 39: Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f I think we need some additional tests for bad data and some query tests that scan tables with mixed-format timestamps to test that it correctly switches to a different context. I was able to crash things with the current patch fairly easily. Repro: drop table if exists timestamp_strs; create table timestamp_strs as select '2001-1-2' as str; insert into timestamp_strs select '2001-10-2'; insert into timestamp_strs select * from timestamp_strs; insert into timestamp_strs select * from timestamp_strs; insert into timestamp_strs select * from timestamp_strs; insert into timestamp_strs select * from timestamp_strs; insert into timestamp_strs select '2001-foo-bar'; select str, cast(str as timestamp) from timestamp_strs; http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.h File be/src/runtime/timestamp-parse-util.h: http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.h@267 PS10, Line 267: static DateTimeFormatContext LAZY_CTX; Having this as a static variable doesn't seem right, since it will be shared between multiple threads, which will updated it. I don't see a particularly easy place to put the last context so that it's local to the thread/column that's executing Parse(). I'm wondering if it's just easiest to have lazy_ctx be a local variable that we create on the fly - we'd take a 50% perf hit for the lazy formats according to your benchmark but it's not clear to me that the optimisation is worth the extra complexity of having to manage LAZY_CTX. http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.cc File be/src/runtime/timestamp-parse-util.cc: http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.cc@378 PS10, Line 378: LIKELY LIKELY doesn't seem right here - I think we can just remove it. http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.cc@381 PS10, Line 381: if (LAZY_CTX.fmt != NULL) dt_ctx = _CTX; nit: we use braces on both branches if there is an else. http://gerrit.cloudera.org:8080/#/c/7009/10/be/src/runtime/timestamp-parse-util.cc@383 PS10, Line 383: LAZY_CTX.Reset(str, len); nit: missing indentation. -- To view, visit http://gerrit.cloudera.org:8080/7009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib9a184a09d7e7783f04d47588537612c2ecec28f Gerrit-Change-Number: 7009 Gerrit-PatchSet: 10 Gerrit-Owner: Vincent TranGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vincent Tran Gerrit-Comment-Date: Thu, 14 Dec 2017 23:21:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6284: Mark the intermediate decimal avg struct as packed
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8836 ) Change subject: IMPALA-6284: Mark the intermediate decimal avg struct as packed .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8836 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id25ec6e20dde3f50fb37a22135b355ad251809e0 Gerrit-Change-Number: 8836 Gerrit-PatchSet: 2 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 14 Dec 2017 23:21:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6300: Fix decimal modulo overflow
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8833 ) Change subject: IMPALA-6300: Fix decimal modulo overflow .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8833/1/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: http://gerrit.cloudera.org:8080/#/c/8833/1/be/src/runtime/decimal-value.inline.h@261 PS1, Line 261: #ifdef DEBUG Is there some way we can avoid having different control flow on release and debug builds. E.g. a SafeMultiply overload that takes a bool may_overflow param? -- To view, visit http://gerrit.cloudera.org:8080/8833 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I27c7f25f68353c19c315e1639311ec06b2dea686 Gerrit-Change-Number: 8833 Gerrit-PatchSet: 1 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Thu, 14 Dec 2017 23:38:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8814 ) Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time. .. Patch Set 3: (14 comments) http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/hdfs-scanner.h File be/src/exec/hdfs-scanner.h: http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/hdfs-scanner.h@97 PS3, Line 97: /// This class also encapsulates row batch management. Subclasses should call CommitRows() > nit: long line (and the new Gerrit UI breaks it now, making it look bad). Done http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h File be/src/exec/scanner-context.h: http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@143 PS3, Line 143: /// (the scan range could be longer than the file). > Can we extend this comment with what the implication are? Is the stream sti Done http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@146 PS3, Line 146: /// If true, the stream has reached the end of the file. > Can we extend this comment with what the implication are? Is the stream sti Done http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@161 PS3, Line 161: bool ReadBoolean(bool* boolean, Status* status); > Should we add WARN_UNUSED_RESULT while you're here? Done. The calling convention of these functions is a bit weir d to avoid overhead of status, because returning false implies !status->ok() and vice-versa, so some caller were ignoring the return value. I kept the calling convention and adjusted the callers to use a pattern that won't trip the warning. http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@193 PS3, Line 193: /// Release resources from previous reads in this stream. If 'done' is true all > This only releases buffers that are completely read, right? In that case, w Updated this and the other ReleaseCompleteResources() comment. http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@224 PS3, Line 224: initial > "initial" implies that there are more, no? We create extra scan ranges to read past the end of the stream. Agree the adjective is confusing so reworded ot be less ambiguous. http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@254 PS3, Line 254: // We always want output_buffer_bytes_left_ to be non-NULL, so we can avoid a NULL check > nit: long line Done http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@298 PS3, Line 298: 2 > Huh? Fat finger error. http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.h@320 PS3, Line 320: Always : /// returns the current I/O buffer to the I/O manager. > This only seems true when the buffer has been read/skipped completely. Done http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.cc File be/src/exec/scanner-context.cc: http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.cc@101 PS3, Line 101: buffer_range->ReturnBuffer(move(io_buffer_)); > Can we reset io_buffer_pos_ here, too? It looks to me below like we're usin I don't think we should be inferring eos from the state of 'io_buffer_' - where did you see that? I think we should only be checking it immediately after the GetNext() calls. I factored out this logic into a ReturnIoBuffer() that resets the related members though. http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.cc@182 PS3, Line 182: if (eosr()) return Status::OK(); > Can you explain here (or in the header at either this function or scan_rang eosr() mean that all the data in the range was returned to the client. scan_range_eosr_ is true and eosr() is false when the last buffer is in the scan range but hasn't been fully returned to the client. http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.cc@226 PS3, Line 226: Status ScannerContext::Stream::GetBytesInternal(int64_t requested_len, > This method still looks very confusing to me. Can you think of ways to make I added some comments to try and make it clearer. I think the current code structure is reasonable with more explanation - it first ensures that it has all the data, then figures out how to return it. I don't think separating the case 1/2 branches earlier results in clearer code, because if we don't have a current io_buffer_, we need to fetch the next I/O buffer to figure out if it has all the required bytes. http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.cc@228 PS3, Line 228: DCHECK_GT(requested_len, boundary_buffer_bytes_left_); > Can you also add a DCHECK to document and assert requested_len > io_buffer_ Done. That precondition isn't exactly right but I added one. http://gerrit.cloudera.org:8080/#/c/8814/3/be/src/exec/scanner-context.cc@260 PS3, Line 260: num_bytes > This is the
[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time.
Hello Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8814 to look at the new patch set (#4). Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time. .. IMPALA-6290: limit ScannerContext to 1 buffer at a time. This is a prerequisite for constraining the number of buffers per scan range. Before this patch, calling ReadBytes(), SkipBytes(), etc could cause an arbitrary number of I/O buffers to accumulate in 'completed_io_buffers_'. E.g. if we allocated 3 * 8MB I/O buffers for a range and then called ReadBytes(30MB), we would hit resource exhaustion as soon as 3 buffers were accumulated in 'completed_io_buffers_'. The fix is to avoid accumulating any buffers in 'completed_io_buffers_'. Instead of adding them to 'completed_io_buffers_', completed buffers are just returned to the I/O manager. It turned out that this did not weaken the ScannerContext's guarantees about memory lifetime, because ScannerContext::GetBytesInternal() cleared 'boundary_buffer_' each time it was called regardless. I checked that this behaviour wasn't a bug by inspecting the scanner code. I could not find any cases where scanners depended on returned memory remaining valid beyond the next Read*()/Get*()/Skip*() call on the stream. This change makes that lifetime explicit in the comments. A side-effect of this fix is that scanners do not need to call ReleaseCompletedResources() in CommitRows() and means that the ScannerContext only ever needs to hold one I/O buffer at a time. This change also reimplements SkipBytes() to avoid it accumulating memory in the boundary buffer for large skip sizes. Also clarifies some of the invariants in ScannerContext. E.g. some places assumed io_buffer_ != NULL, but that is no longer needed. Testing: Ran core tests with ASAN and exhaustive tests with DEBUG. Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/exec/scanner-context.inline.h M testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-scan-node-errors.test 10 files changed, 295 insertions(+), 201 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8814/4 -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 4 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4664: Unexpected string conversion in Shell
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8762 ) Change subject: IMPALA-4664: Unexpected string conversion in Shell .. Patch Set 11: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifdce9781d1d97596c188691b62a141b9bd137610 Gerrit-Change-Number: 8762 Gerrit-PatchSet: 11 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Andre Araujo Gerrit-Reviewer: John Russell Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Thu, 14 Dec 2017 22:39:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6177: Cleanup incomplete handcrafted IRs before finalizing module
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8541 ) Change subject: IMPALA-6177: Cleanup incomplete handcrafted IRs before finalizing module .. Patch Set 10: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8541 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If975cfb3906482b36dd6ede32ca81de6fcee1d7f Gerrit-Change-Number: 8541 Gerrit-PatchSet: 10 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 14 Dec 2017 22:38:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8414 ) Change subject: IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation .. Patch Set 15: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If5cb42437d11c13bc4a55c3ab426b66777332bd1 Gerrit-Change-Number: 8414 Gerrit-PatchSet: 15 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 14 Dec 2017 22:53:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time
Hello Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8814 to look at the new patch set (#5). Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time .. IMPALA-6290: limit ScannerContext to 1 buffer at a time This is a prerequisite for constraining the number of buffers per scan range. Before this patch, calling ReadBytes(), SkipBytes(), etc could cause an arbitrary number of I/O buffers to accumulate in 'completed_io_buffers_'. E.g. if we allocated 3 * 8MB I/O buffers for a range and then called ReadBytes(30MB), we would hit resource exhaustion as soon as 3 buffers were accumulated in 'completed_io_buffers_'. The fix is to avoid accumulating any buffers in 'completed_io_buffers_'. Instead of adding them to 'completed_io_buffers_', completed buffers are just returned to the I/O manager. It turned out that this did not weaken the ScannerContext's guarantees about memory lifetime, because ScannerContext::GetBytesInternal() cleared 'boundary_buffer_' each time it was called regardless. I checked that this behaviour wasn't a bug by inspecting the scanner code. I could not find any cases where scanners depended on returned memory remaining valid beyond the next Read*()/Get*()/Skip*() call on the stream. This change makes that lifetime explicit in the comments. A side-effect of this fix is that scanners do not need to call ReleaseCompletedResources() in CommitRows() and means that the ScannerContext only ever needs to hold one I/O buffer at a time. This change also reimplements SkipBytes() to avoid it accumulating memory in the boundary buffer for large skip sizes. Also clarifies some of the invariants in ScannerContext. E.g. some places assumed io_buffer_ != NULL, but that is no longer needed. Testing: Ran core tests with ASAN and exhaustive tests with DEBUG. Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/exec/scanner-context.inline.h M testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-scan-node-errors.test 10 files changed, 295 insertions(+), 202 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8814/5 -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 5 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8814 ) Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time. .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/8814/4/be/src/exec/scanner-context.h File be/src/exec/scanner-context.h: http://gerrit.cloudera.org:8080/#/c/8814/4/be/src/exec/scanner-context.h@231 PS4, Line 231: ScannerContext > scan range? Done http://gerrit.cloudera.org:8080/#/c/8814/4/be/src/exec/scanner-context.cc File be/src/exec/scanner-context.cc: http://gerrit.cloudera.org:8080/#/c/8814/4/be/src/exec/scanner-context.cc@238 PS4, Line 238: // First this loop ensures that we've got all of the requested bytes in either > nit: Assuming that this comment covers the while loop to the empty line aft Wording was confusing, I meant what you said. Reworded to hopefully be less confusing. Also moved the case1/2 comment below since I think this comment is sufficient to explain the loop now. -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 4 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 15 Dec 2017 16:40:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3703: Store query context in thread-local variables
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8621 ) Change subject: IMPALA-3703: Store query context in thread-local variables .. Patch Set 12: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/8621/12/be/src/common/thread-debug-info-test.cc File be/src/common/thread-debug-info-test.cc: http://gerrit.cloudera.org:8080/#/c/8621/12/be/src/common/thread-debug-info-test.cc@30 PS12, Line 30: Not a big deal, but there's a bit more vertical whitespace in these tests than we usually prefer. -- To view, visit http://gerrit.cloudera.org:8080/8621 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I566f7f1db5117c498e86e0bd05b33bdcff624609 Gerrit-Change-Number: 8621 Gerrit-PatchSet: 12 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 15 Dec 2017 00:54:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8820 ) Change subject: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE .. Patch Set 4: (1 comment) The code and tests look good to me, had one outstanding question that came out of a discussion. http://gerrit.cloudera.org:8080/#/c/8820/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8820/4//COMMIT_MSG@7 PS4, Line 7: IMPALA-5654: Disallow setting Kudu table name in CREATE TABLE Gabor mentioned to me offline that we might want to catch it in alter table too (unsure if that is already disallowed). -- To view, visit http://gerrit.cloudera.org:8080/8820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieca037498abf8f5fde67b77e824b720482cdbe6f Gerrit-Change-Number: 8820 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 15 Dec 2017 01:00:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time
Hello Michael Ho, Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8814 to look at the new patch set (#6). Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time .. IMPALA-6290: limit ScannerContext to 1 buffer at a time This is a prerequisite for constraining the number of buffers per scan range. Before this patch, calling ReadBytes(), SkipBytes(), etc could cause an arbitrary number of I/O buffers to accumulate in 'completed_io_buffers_'. E.g. if we allocated 3 * 8MB I/O buffers for a range and then called ReadBytes(30MB), we would hit resource exhaustion as soon as 3 buffers were accumulated in 'completed_io_buffers_'. The fix is to avoid accumulating any buffers in 'completed_io_buffers_'. Instead of adding them to 'completed_io_buffers_', completed buffers are just returned to the I/O manager. It turned out that this did not weaken the ScannerContext's guarantees about memory lifetime, because ScannerContext::GetBytesInternal() cleared 'boundary_buffer_' each time it was called regardless. I checked that this behaviour wasn't a bug by inspecting the scanner code. I could not find any cases where scanners depended on returned memory remaining valid beyond the next Read*()/Get*()/Skip*() call on the stream. This change makes that lifetime explicit in the comments. A side-effect of this fix is that scanners do not need to call ReleaseCompletedResources() in CommitRows() and means that the ScannerContext only ever needs to hold one I/O buffer at a time. This change also reimplements SkipBytes() to avoid it accumulating memory in the boundary buffer for large skip sizes. Also clarifies some of the invariants in ScannerContext. E.g. some places assumed io_buffer_ != NULL, but that is no longer needed. Testing: Ran core tests with ASAN and exhaustive tests with DEBUG. Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/parquet-column-readers.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/exec/scanner-context.inline.h M testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-scan-node-errors.test 10 files changed, 294 insertions(+), 201 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8814/6 -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8814 ) Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time .. Patch Set 6: Code-Review+1 Rebased onto master. There wasn't really a dependency between this and the patch it was on top of. -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 6 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 15 Dec 2017 21:44:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6318: Adjustment for hanging query cancellation test
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8852 ) Change subject: IMPALA-6318: Adjustment for hanging query cancellation test .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/8852/3/tests/shell/util.py File tests/shell/util.py: http://gerrit.cloudera.org:8080/#/c/8852/3/tests/shell/util.py@168 PS3, Line 168: stdout = None if omit_stdout else PIPE Looking at the Popen docs, I think this means that stdout gets sent to the stdout of the current process, which I think means that we're capturing it as part of the test output. Is this what we want? -- To view, visit http://gerrit.cloudera.org:8080/8852 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I082c83b91b6d0c527de92c7992f0dc9d1b290433 Gerrit-Change-Number: 8852 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 15 Dec 2017 16:28:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6290: limit ScannerContext to 1 buffer at a time.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8814 ) Change subject: IMPALA-6290: limit ScannerContext to 1 buffer at a time. .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/8814/4/be/src/exec/scanner-context.cc File be/src/exec/scanner-context.cc: http://gerrit.cloudera.org:8080/#/c/8814/4/be/src/exec/scanner-context.cc@232 PS4, Line 232: // 1. if the read starts at an I/O buffer boundary and 'requested_len' is smaller than This was not quite right - we also return a pointer into the I/O buffer if the previous read was from a boundary buffer, ending partway into an I/O buffer -- To view, visit http://gerrit.cloudera.org:8080/8814 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74c5960a75f7d88b0e1de4199af731fb13e592f0 Gerrit-Change-Number: 8814 Gerrit-PatchSet: 4 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 15 Dec 2017 16:43:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4664: Unexpected string conversion in Shell
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8762 ) Change subject: IMPALA-4664: Unexpected string conversion in Shell .. Patch Set 15: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifdce9781d1d97596c188691b62a141b9bd137610 Gerrit-Change-Number: 8762 Gerrit-PatchSet: 15 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Andre Araujo Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: John Russell Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 15 Dec 2017 17:56:56 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2640: Make a given command case-sensitive
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8762 ) Change subject: IMPALA-2640: Make a given command case-sensitive .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/8762/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8762/6//COMMIT_MSG@7 PS6, Line 7: IMPALA-2640: Make a given command case-sensitive > In the fix of IMPALA-4664(https://gerrit.cloudera.org/#/c/8639/4/shell/impa I think the lower-case column description is intended. The tests you added in https://gerrit.cloudera.org/#/c/8639/4/tests/shell/test_shell_interactive.py all appear to be fixed by this patch set, so I think we should add those tests to this patch. http://gerrit.cloudera.org:8080/#/c/8762/9/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/8762/9/shell/impala_shell.py@555 PS9, Line 555: > I added some comment for the coping. The PSF license is already included in We still need to add a line in LICENSE.txt like: Parts of shell/impala_shell.py: Python Software License V2 -- To view, visit http://gerrit.cloudera.org:8080/8762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifdce9781d1d97596c188691b62a141b9bd137610 Gerrit-Change-Number: 8762 Gerrit-PatchSet: 6 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Andre Araujo Gerrit-Reviewer: John Russell Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Dec 2017 23:09:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2640: Make a given command case-sensitive
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8762 ) Change subject: IMPALA-2640: Make a given command case-sensitive .. Patch Set 6: (6 comments) http://gerrit.cloudera.org:8080/#/c/8762/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8762/6//COMMIT_MSG@7 PS6, Line 7: IMPALA-2640: Make a given command case-sensitive Unless I'm missing something, doesn't this solve IMPALA-4664 too? I checked out this commit and played around and I get: [localhost:21000] > sElect'FOO'; Query: sElect 'FOO' Query submitted at: 2017-12-12 15:30:44 (Coordinator: http://tarmstrong-box:25000) Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=c049cc68f0842616:4419587c +---+ | 'foo' | +---+ | FOO | +---+ Fetched 1 row(s) in 0.00s http://gerrit.cloudera.org:8080/#/c/8762/6/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/8762/6/shell/impala_shell.py@572 PS6, Line 572: return func(arg, cmd_) > Done. It's a matter of taste but I think passing in cmd as an argument is clearer - fewer member variables and implicit argument parsing seems easier to follow. I agree it still feels a bit weird in some ways. Anyway, I think I'm ok with the current version. http://gerrit.cloudera.org:8080/#/c/8762/9/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/8762/9/shell/impala_shell.py@305 PS9, Line 305: self.set_query_options) Long line http://gerrit.cloudera.org:8080/#/c/8762/9/shell/impala_shell.py@310 PS9, Line 310:! Long line. http://gerrit.cloudera.org:8080/#/c/8762/9/shell/impala_shell.py@555 PS9, Line 555: Can you include attribution for where the copied code came from (e.g. see be/src/runtime/string-search.h). It also needs to be added to LICENSE.txt. It looks like it's licensed with the PSF license V2, which we already include in LICENSE.txt. http://gerrit.cloudera.org:8080/#/c/8762/9/shell/impala_shell.py@571 PS9, Line 571: etur _ on the end of a local variable is a bit weird to me - can we call it command instead? -- To view, visit http://gerrit.cloudera.org:8080/8762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifdce9781d1d97596c188691b62a141b9bd137610 Gerrit-Change-Number: 8762 Gerrit-PatchSet: 6 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Andre Araujo Gerrit-Reviewer: John Russell Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 12 Dec 2017 23:31:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6177: Cleanup incomplete handcrafted IRs before finalizing module
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8541 ) Change subject: IMPALA-6177: Cleanup incomplete handcrafted IRs before finalizing module .. Patch Set 9: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/8541/8/be/src/codegen/llvm-codegen-test.cc File be/src/codegen/llvm-codegen-test.cc: http://gerrit.cloudera.org:8080/#/c/8541/8/be/src/codegen/llvm-codegen-test.cc@167 PS8, Line 167: } > that would crash llvm(and hence the process) during FinalizeModule We could use the magic IMPALA_ASSERT_DEBUG_DEATH() macro if we want to test that code path. It might not be worth testing if it's an invalid use of the API. -- To view, visit http://gerrit.cloudera.org:8080/8541 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If975cfb3906482b36dd6ede32ca81de6fcee1d7f Gerrit-Change-Number: 8541 Gerrit-PatchSet: 9 Gerrit-Owner: Bikramjeet VigGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 13 Dec 2017 00:01:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5191: Standardize column alias behavior
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8801 ) Change subject: IMPALA-5191: Standardize column alias behavior .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/8801/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8801/5//COMMIT_MSG@14 PS5, Line 14: SELECT int_col / 2 AS x Do we have existing end-to-end tests for queries of these forms? Would be good to confirm that the frontend produces plans that are executable and correct. http://gerrit.cloudera.org:8080/#/c/8801/5/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/8801/5/fe/src/main/java/org/apache/impala/analysis/Expr.java@817 PS5, Line 817: boolean preserveRootType, boolean substituteChildren I find it a bit hard to read code calling methods with multiple boolean arguments since it's easy to get the flags mixed up at the callsite. I'm ok with this as-is, but we could consider adding public methods instead of adding flags. E.g. trySubsitute() vs trySubsituteRoot() or trySubstituteRecursive() vs trySubstituteRoot(). -- To view, visit http://gerrit.cloudera.org:8080/8801 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f82483b486acf6953876cfa672b0d034f3709a8 Gerrit-Change-Number: 8801 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 13 Dec 2017 21:50:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6222: Add details to error msg on failure to get min reservation
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8781 ) Change subject: IMPALA-6222: Add details to error msg on failure to get min reservation .. Patch Set 3: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/8781/3/be/src/runtime/mem-tracker.cc File be/src/runtime/mem-tracker.cc: http://gerrit.cloudera.org:8080/#/c/8781/3/be/src/runtime/mem-tracker.cc@324 PS3, Line 324: std::priority_queue, vector >, nit: add a using up the top of the file or in common/names.h instead of using the std:: prefixes. http://gerrit.cloudera.org:8080/#/c/8781/3/be/src/runtime/mem-tracker.cc@340 PS3, Line 340: int limit) { nit: weird wrapping. -- To view, visit http://gerrit.cloudera.org:8080/8781 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic4675fe923b33fdc4ddefd1872e6d6b803993d74 Gerrit-Change-Number: 8781 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 13 Dec 2017 22:51:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6184: Clean up aftr ScalarExprEvaluator::Clone() fails
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8572 ) Change subject: IMPALA-6184: Clean up aftr ScalarExprEvaluator::Clone() fails .. Patch Set 1: (2 comments) Just a couple of questions. http://gerrit.cloudera.org:8080/#/c/8572/1/be/src/exprs/scalar-expr-evaluator.cc File be/src/exprs/scalar-expr-evaluator.cc: http://gerrit.cloudera.org:8080/#/c/8572/1/be/src/exprs/scalar-expr-evaluator.cc@193 PS1, Line 193: Status status = Kinda subtle, can you add a comment like the one above? // Always add the evaluator to the vector so it can be cleaned up. http://gerrit.cloudera.org:8080/#/c/8572/1/testdata/workloads/functional-query/queries/QueryTest/udf-errors.test File testdata/workloads/functional-query/queries/QueryTest/udf-errors.test: http://gerrit.cloudera.org:8080/#/c/8572/1/testdata/workloads/functional-query/queries/QueryTest/udf-errors.test@85 PS1, Line 85: on (bad_expr2(rand()) = (t2.bool_col && t1.bool_col)); Is this one guaranteed to fail? It seems like if we're unlucky, rand() could not return one of the values that BadExpr fails on. -- To view, visit http://gerrit.cloudera.org:8080/8572 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45ffd722d0a69ad05ae3c748cf504c7f1a959a1d Gerrit-Change-Number: 8572 Gerrit-PatchSet: 1 Gerrit-Owner: Michael HoGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 16 Nov 2017 15:50:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4835 (prep only): create io subfolder and namespace
Hello Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8424 to look at the new patch set (#9). Change subject: IMPALA-4835 (prep only): create io subfolder and namespace .. IMPALA-4835 (prep only): create io subfolder and namespace Instead of using the DiskIoMgr class as a namespace, which prevents forward-declaration of inner classes, create an impala::io namespace and unnested the inner class. This is done in anticipation of DiskIoMgr depending on BufferPool. This helps avoid a circular dependency between DiskIoMgr, TmpFileMgr and BufferPool headers that could not be broken with forward declarations. Testing: Ran core tests. Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b --- M be/CMakeLists.txt M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node-mt.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/runtime/CMakeLists.txt D be/src/runtime/disk-io-mgr.h M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h A be/src/runtime/io/CMakeLists.txt R be/src/runtime/io/disk-io-mgr-internal.h R be/src/runtime/io/disk-io-mgr-stress-test.cc R be/src/runtime/io/disk-io-mgr-stress.cc R be/src/runtime/io/disk-io-mgr-stress.h R be/src/runtime/io/disk-io-mgr-test.cc R be/src/runtime/io/disk-io-mgr.cc A be/src/runtime/io/disk-io-mgr.h R be/src/runtime/io/handle-cache.h R be/src/runtime/io/handle-cache.inline.h R be/src/runtime/io/request-context.cc R be/src/runtime/io/request-context.h A be/src/runtime/io/request-ranges.h R be/src/runtime/io/scan-range.cc M be/src/runtime/row-batch.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/test-env.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h 38 files changed, 1,417 insertions(+), 1,310 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/8424/9 -- To view, visit http://gerrit.cloudera.org:8080/8424 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b Gerrit-Change-Number: 8424 Gerrit-PatchSet: 9 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation
Hello Tianyi Wang, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8414 to look at the new patch set (#10). Change subject: IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation .. IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation In preparation for switching the I/O mgr to the buffer pool, this removes and cleans up a lot of code so that the switchover patch starts from a cleaner slate. * Remove the free buffer cache (which will be replaced by buffer pool's own caching). * Make memory limit exceeded error checking synchronous (in anticipation of having to propagate buffer pool errors synchronously). * Simplify error propagation - remove the (ineffectual) code that enqueued BufferDescriptors containing error statuses. * Document locking scheme better in a few places, make it part of the function signature when it seemed reasonable. * Move ReturnBuffer() to ScanRange, because it is intrinsically connected with the lifecycle of a scan range. * Separate external ReturnBuffer() and internal CleanUpBuffer() interfaces - previously callers of ReturnBuffer() were fudging the num_buffers_in_reader accounting to make the external interface work. * Eliminate redundant state in ScanRange: 'eosr_returned_' and 'is_cancelled_'. * Clarify the logic around calling Close() for the last BufferDescriptor. -> There appeared to be an implicit assumption that buffers would be freed in the order they were returned from the scan range, so that the "eos" buffer was returned last. Instead just count the number of outstanding buffers to detect the last one. -> Touching the is_cancelled_ field without holding a lock was hard to reason about - violated locking rules and it was unclear that it was race-free. This will probably regress performance somewhat because of the cache removal, so my plan is to merge it around the same time as switching the I/O mgr to allocate from the buffer pool. I'm keeping the patches separate to make reviewing easier. Testing: * Ran exhaustive tests * Ran the disk-io-mgr-stress-test overnight Change-Id: If5cb42437d11c13bc4a55c3ab426b66777332bd1 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/scanner-context.cc M be/src/runtime/exec-env.cc M be/src/runtime/io/disk-io-mgr-stress.cc M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/mem-tracker.h M be/src/runtime/test-env.cc M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h 19 files changed, 552 insertions(+), 861 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8414/10 -- To view, visit http://gerrit.cloudera.org:8080/8414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If5cb42437d11c13bc4a55c3ab426b66777332bd1 Gerrit-Change-Number: 8414 Gerrit-PatchSet: 10 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6206: Fix data load failure with -notests
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8580 ) Change subject: IMPALA-6206: Fix data load failure with -notests .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8580/2/be/src/udf_samples/CMakeLists.txt File be/src/udf_samples/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/8580/2/be/src/udf_samples/CMakeLists.txt@44 PS2, Line 44: # Data loading depends on building these targets This feels a little weird since those targets aren't really part of the ImpalaUdf library (or in that directory). How about added a UdfSamples target and adding it to MAKE_TARGETS in bin/make_impala.sh. -- To view, visit http://gerrit.cloudera.org:8080/8580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idaa193989d77d56b72db05ad54e1fb0a345938fb Gerrit-Change-Number: 8580 Gerrit-PatchSet: 2 Gerrit-Owner: Zach AmsdenGerrit-Reviewer: David Knupp Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 17 Nov 2017 17:15:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4835 (prep only): create io subfolder and namespace
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8424 ) Change subject: IMPALA-4835 (prep only): create io subfolder and namespace .. Patch Set 9: rebased and tweaked the #define guard to include _IO so that text scanner plugins like Impala-lzo can detect whether it is pre/post refactor. -- To view, visit http://gerrit.cloudera.org:8080/8424 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b Gerrit-Change-Number: 8424 Gerrit-PatchSet: 9 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 17 Nov 2017 17:47:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4835 (prep only): create io subfolder and namespace
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8424 ) Change subject: IMPALA-4835 (prep only): create io subfolder and namespace .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8424 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b Gerrit-Change-Number: 8424 Gerrit-PatchSet: 9 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 17 Nov 2017 17:47:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6128: Spill-to-disk Encryption(AES-CFB + SHA256) is slow
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8510 ) Change subject: IMPALA-6128: Spill-to-disk Encryption(AES-CFB + SHA256) is slow .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/8510/7/be/src/util/openssl-util.cc File be/src/util/openssl-util.cc: http://gerrit.cloudera.org:8080/#/c/8510/7/be/src/util/openssl-util.cc@147 PS7, Line 147: return (SSLeay() >= OPENSSL_VERSION_1_0_1); When we re-do this patch we should also add a test option to force use of CFB mode and make sure that the unit test tests both encryption modes. -- To view, visit http://gerrit.cloudera.org:8080/8510 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib97939f2334838263364b53ef3413871638bf53e Gerrit-Change-Number: 8510 Gerrit-PatchSet: 7 Gerrit-Owner: Xianda KeGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mike Yoder Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Xianda Ke Gerrit-Comment-Date: Mon, 20 Nov 2017 17:57:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6220: Revert "IMPALA-6128: Spill-to-disk Encryption(AES-CFB + SHA256) is slow"
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8597 ) Change subject: IMPALA-6220: Revert "IMPALA-6128: Spill-to-disk Encryption(AES-CFB + SHA256) is slow" .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8597 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id31d5fcfec5c6d777d4acee5c1be2d4fc4605efb Gerrit-Change-Number: 8597 Gerrit-PatchSet: 1 Gerrit-Owner: Michael HoGerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Xianda Ke Gerrit-Comment-Date: Mon, 20 Nov 2017 17:00:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8549 ) Change subject: IMPALA-1144: Fix exception when cancelling query in Impala-shell with CTRL-C .. Patch Set 5: (10 comments) Overall approach seems good to me, but had quite a few comments about comments and naming. http://gerrit.cloudera.org:8080/#/c/8549/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8549/2//COMMIT_MSG@25 PS2, Line 25: Change-Id: I6cefaf1dae78baae238289816a7cb9d210fb38e2 > Thanks for the nice example, Phil! It seems ok that it doesn't reproduce it every time - it's inherently racy. We could probably have a deterministic unit test by mocking out enough things, but I think I prefer the end-to-end shell test. It would be good to also test on release and ASAN builds to make sure that it doesn't immediately fail when the daemon is slow or fast. http://gerrit.cloudera.org:8080/#/c/8549/5/shell/impala_client.py File shell/impala_client.py: http://gerrit.cloudera.org:8080/#/c/8549/5/shell/impala_client.py@58 PS5, Line 58: QueryCancelledException Can we call this QueryCancelledByShellException or ExpectedQueryCancellationException or something like that to distinguish from the case when the query was cancelled unexpectedly (e.g. by an admin from the impala debug ui). http://gerrit.cloudera.org:8080/#/c/8549/5/shell/impala_client.py@60 PS5, Line 60: self.value = value The value might not be necessary if we're always going to pass in the same string. The existing exceptions do this but seems like an odd pattern. We could just print that string instead of stringifying the exception. http://gerrit.cloudera.org:8080/#/c/8549/5/shell/impala_client.py@83 PS5, Line 83: self.is_query_cancelled = False Can you add a brief comment explaining how this works. I.e. that it's set to true by the cancellation signal handler and suppresses any errors after cancellation. http://gerrit.cloudera.org:8080/#/c/8549/5/shell/impala_client.py@435 PS5, Line 435: except BeeswaxService.QueryNotFoundException: It's weird that we don't need the logic for QueryNotFoundException. Does Impala even throw this? I'm ok with leaving this in since it's part of the beeswax protocol but we should also check for cancellation for consistency. http://gerrit.cloudera.org:8080/#/c/8549/5/shell/impala_shell.py File shell/impala_shell.py: http://gerrit.cloudera.org:8080/#/c/8549/5/shell/impala_shell.py@1026 PS5, Line 1026: print_to_stderr(str(e)) It might be to just not print anything - we don't always throw this exception when the query is cancelled so the message only gets printed sometimes. http://gerrit.cloudera.org:8080/#/c/8549/5/tests/shell/test_shell_commandline.py File tests/shell/test_shell_commandline.py: http://gerrit.cloudera.org:8080/#/c/8549/5/tests/shell/test_shell_commandline.py@333 PS5, Line 333: 'select * from v, v v2, v v3, v v4, v v5, v v6, v v7, v v8, v v9;"' Maybe add a few more tables to the cross join to make sure that it runs for a very long time. We don't want a flaky test if it sometimes finishes quicker. http://gerrit.cloudera.org:8080/#/c/8549/5/tests/shell/test_shell_commandline.py@342 PS5, Line 342:'tpch.customer c3 order by c1.c_name"' You could also do something like: order by c1.c_name, sleep(1) That would make it pause 1ms per row and slow it down further. http://gerrit.cloudera.org:8080/#/c/8549/5/tests/shell/test_shell_commandline.py@346 PS5, Line 346: query_txt Isn't this the command-line arguments? http://gerrit.cloudera.org:8080/#/c/8549/5/tests/shell/test_shell_commandline.py@349 PS5, Line 349: sleep(3) Can you add a comment to the function explaining that it waits 3 seconds before cancelling and expects the query to be cancelled. Is there any reason why 3 seconds is the right amount of time? Currently it just seems like a magic number. Would be good to reduce it if possible, since it will make the tests very slow (other shell tests have this problem already). -- To view, visit http://gerrit.cloudera.org:8080/8549 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6cefaf1dae78baae238289816a7cb9d210fb38e2 Gerrit-Change-Number: 8549 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8414 ) Change subject: IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation .. Patch Set 9: (19 comments) http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/disk-io-mgr.h File be/src/runtime/io/disk-io-mgr.h: http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/disk-io-mgr.h@a222 PS11, Line 222: : : > did you mean to just delete that? Done http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/disk-io-mgr.h@457 PS11, Line 457: XCEED > queued? Done http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/disk-io-mgr.cc@333 PS11, Line 333: CANCELLED > seems like the right thing to do but are you sure we weren't depending on p Before this patch the only place when RequestContext::Cancel() was called with a non-CANCELLED status was with MEM_LIMIT_EXCEEDED. This patch removes that call and the need for any asynchronous error propagation via RequestContext, so can't introduce any new bugs in this area. Error propagation bugs are still possible in client code if they cancel the context before propagating the original error, but I believe the HDFS scan node code is written to avoid this. http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/disk-io-mgr.cc@581 PS11, Line 581: DCHECK(!buffer->is_cached()) << "HDFS cache reads don't go through this code path."; > maybe put that dcheck upstream too (like ReadRange() or even earlier) to he There is already a DCHECK in ReadRange() that catches this, but its non-obvious. I added a string to it to make it clearer what it's checking. http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/disk-io-mgr.cc@665 PS11, Line 665: DCHECK > nit: _EQ The strong typed enums prevent that unfortunately. I did change this to print out the integer value on failure. http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-context.h File be/src/runtime/io/request-context.h: http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-context.h@448 PS11, Line 448: again. > disk threads? Done http://gerrit.cloudera.org:8080/#/c/8414/9/be/src/runtime/io/request-context.h File be/src/runtime/io/request-context.h: http://gerrit.cloudera.org:8080/#/c/8414/9/be/src/runtime/io/request-context.h@27 PS9, Line 27: For most I/O manager clients it is an : /// opaque pointer, but some clients may need to include this heade > is that still true? Done http://gerrit.cloudera.org:8080/#/c/8414/9/be/src/runtime/io/request-context.h@93 PS9, Line 93: void Cancel(); > how did you decide to make this a first class thing in the RequestContext, Yeah, I'd probably call it DiskIoMgrClient or something like that if I was writing this from scratch. The ones I moved were the trivial methods that were only called during RequestContext set-up and tear-down. The other DiskIoMgr methods are called from more places and are more about the I/O operations than setting up the context. It isn't really principled now that I think about it. http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-context.cc File be/src/runtime/io/request-context.cc: http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-context.cc@42 PS11, Line 42: << static_cast(range->external_buffer_tag_); > oh, I guess DCHECK_EQ doesn't work with enum or something? Yeah, not with the strongly-typed "enum class" enums. It works with weakly typed enums because they get implicitly converted to integers. http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-context.cc@75 PS11, Line 75: reader > what about the writing case? is that relevant? Yep, this needed updating. http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-context.cc@76 PS11, Line 76: GetNext > GetNext() Done http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-context.cc@79 PS11, Line 79: it > does this referring to the context or something else? Done http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-context.cc@90 PS11, Line 90: cancelled Status > does that mean the CANCELLED status or the status that lead to cancellation The RequestContext can't be cancelled with an arbitrary status. If there was an earlier error for a ScanRange, the cancelled status doesn't overwrite the prior status. http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-context.cc@91 PS11, Line 91: the cancelled state. > what does that mean? reworded. http://gerrit.cloudera.org:8080/#/c/8414/11/be/src/runtime/io/request-context.cc@95 PS11, Line 95: decrements the number of disk queues the context : // is on. > which code is this talking about? Oh, I guess that's
[Impala-ASF-CR] IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation
Hello Tianyi Wang, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8414 to look at the new patch set (#12). Change subject: IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation .. IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation In preparation for switching the I/O mgr to the buffer pool, this removes and cleans up a lot of code so that the switchover patch starts from a cleaner slate. * Remove the free buffer cache (which will be replaced by buffer pool's own caching). * Make memory limit exceeded error checking synchronous (in anticipation of having to propagate buffer pool errors synchronously). * Simplify error propagation - remove the (ineffectual) code that enqueued BufferDescriptors containing error statuses. * Document locking scheme better in a few places, make it part of the function signature when it seemed reasonable. * Move ReturnBuffer() to ScanRange, because it is intrinsically connected with the lifecycle of a scan range. * Separate external ReturnBuffer() and internal CleanUpBuffer() interfaces - previously callers of ReturnBuffer() were fudging the num_buffers_in_reader accounting to make the external interface work. * Eliminate redundant state in ScanRange: 'eosr_returned_' and 'is_cancelled_'. * Clarify the logic around calling Close() for the last BufferDescriptor. -> There appeared to be an implicit assumption that buffers would be freed in the order they were returned from the scan range, so that the "eos" buffer was returned last. Instead just count the number of outstanding buffers to detect the last one. -> Touching the is_cancelled_ field without holding a lock was hard to reason about - violated locking rules and it was unclear that it was race-free. * Remove DiskIoMgr::Read() to simplify interface. It is trivial to inline at the callsites. This will probably regress performance somewhat because of the cache removal, so my plan is to merge it around the same time as switching the I/O mgr to allocate from the buffer pool. I'm keeping the patches separate to make reviewing easier. Testing: * Ran exhaustive tests * Ran the disk-io-mgr-stress-test overnight Change-Id: If5cb42437d11c13bc4a55c3ab426b66777332bd1 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/scanner-context.cc M be/src/runtime/exec-env.cc M be/src/runtime/io/disk-io-mgr-stress.cc M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/mem-tracker.h M be/src/runtime/test-env.cc M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h 19 files changed, 575 insertions(+), 904 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8414/12 -- To view, visit http://gerrit.cloudera.org:8080/8414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If5cb42437d11c13bc4a55c3ab426b66777332bd1 Gerrit-Change-Number: 8414 Gerrit-PatchSet: 12 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2181: Add query option levels for display
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8447 ) Change subject: IMPALA-2181: Add query option levels for display .. Patch Set 15: (17 comments) http://gerrit.cloudera.org:8080/#/c/8447/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8447/14//COMMIT_MSG@10 PS14, Line 10: diplayed displayed http://gerrit.cloudera.org:8080/#/c/8447/14//COMMIT_MSG@12 PS14, Line 12: called SET ALL shows all the options grouped by their option levels. Mention that there are also server-side versions of these commands? http://gerrit.cloudera.org:8080/#/c/8447/14/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/8447/14/be/src/service/impala-server.cc@1236 PS14, Line 1236: else { Nit: } else { http://gerrit.cloudera.org:8080/#/c/8447/15/be/src/service/query-options.h File be/src/service/query-options.h: http://gerrit.cloudera.org:8080/#/c/8447/15/be/src/service/query-options.h@45 PS15, Line 45: QUERY_OPT_FN(disable_cached_reads, DISABLE_CACHED_READS, TQueryOptionLevel::REGULAR)\ Looks like disable_cached_reads is deprecated: see IMPALA-2963 . I missed that on an earlier pass http://gerrit.cloudera.org:8080/#/c/8447/15/be/src/service/query-options.h@89 PS15, Line 89: QUERY_OPT_FN(enable_expr_rewrites, ENABLE_EXPR_REWRITES, TQueryOptionLevel::REGULAR)\ I think enable_expr_rewrites should be advanced. It's only disabled as a workaround to planner bugs. http://gerrit.cloudera.org:8080/#/c/8447/15/be/src/service/query-options.h@91 PS15, Line 91: QUERY_OPT_FN(parquet_dictionary_filtering, PARQUET_DICTIONARY_FILTERING, TQueryOptionLevel::REGULAR)\ parquet_dictionary_filtering should probably be advanced, there's typically not much reason to disable it unless working around a bug. http://gerrit.cloudera.org:8080/#/c/8447/15/be/src/service/query-options.h@93 PS15, Line 93: QUERY_OPT_FN(parquet_read_statistics, PARQUET_READ_STATISTICS, TQueryOptionLevel::REGULAR)\ Same for parquet_read_statistics. http://gerrit.cloudera.org:8080/#/c/8447/14/be/src/service/query-options.h File be/src/service/query-options.h: http://gerrit.cloudera.org:8080/#/c/8447/14/be/src/service/query-options.h@39 PS14, Line 39: QUERY_OPT_FN(abort_on_default_limit_exceeded, ABORT_ON_DEFAULT_LIMIT_EXCEEDED, TQueryOptionLevel::DEPRECATED)\ nit: can we wrap these to 90 chars? http://gerrit.cloudera.org:8080/#/c/8447/14/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/8447/14/be/src/service/query-options.cc@41 PS14, Line 41: using namespace beeswax; nit: probably best to individually import the names so it's easier to figure out what got imported. http://gerrit.cloudera.org:8080/#/c/8447/14/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/8447/14/common/thrift/ImpalaService.thrift@294 PS14, Line 294: typedef mapTQueryOptionLevels Does this need to be a typedef here? I don't see any references in any thrift files. Maybe we can just use the map type directly where its needed in the backend http://gerrit.cloudera.org:8080/#/c/8447/14/common/thrift/beeswax.thrift File common/thrift/beeswax.thrift: http://gerrit.cloudera.org:8080/#/c/8447/14/common/thrift/beeswax.thrift@111 PS14, Line 111: 4: optional TQueryOptionLevel level Comment that this was added for impala-shell? http://gerrit.cloudera.org:8080/#/c/8447/14/fe/src/main/java/org/apache/impala/analysis/SetStmt.java File fe/src/main/java/org/apache/impala/analysis/SetStmt.java: http://gerrit.cloudera.org:8080/#/c/8447/14/fe/src/main/java/org/apache/impala/analysis/SetStmt.java@55 PS14, Line 55: if (isSetAll_) { Nit: could be one line if (isSetAll_) return ...; http://gerrit.cloudera.org:8080/#/c/8447/14/fe/src/main/java/org/apache/impala/analysis/SetStmt.java@76 PS14, Line 76: request.setIs_set_all(true); nit: could be one line http://gerrit.cloudera.org:8080/#/c/8447/14/shell/impala_client.py File shell/impala_client.py: http://gerrit.cloudera.org:8080/#/c/8447/14/shell/impala_client.py@97 PS14, Line 97: self.query_option_levels[option.key.upper()] = option.level Does this still work if we're talking to an older Impala server that doesn't return the level? Update: looks like it doesn't: Traceback (most recent call last): File "/home/tarmstrong/Impala/incubator-impala/shell/impala_shell.py", line 15 27, in shell = ImpalaShell(options, query_options) File "/home/tarmstrong/Impala/incubator-impala/shell/impala_shell.py", line 20 5, in __init__ self.do_connect(options.impalad) File "/home/tarmstrong/Impala/incubator-impala/shell/impala_shell.py", line 73 1, in do_connect self.imp_client.build_default_query_options_dict() File
[Impala-ASF-CR] IMPALA-6217: fix DCHECK in Parquet fuzz test
Tim Armstrong has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/8594 ) Change subject: IMPALA-6217: fix DCHECK in Parquet fuzz test .. IMPALA-6217: fix DCHECK in Parquet fuzz test The IMPALA-4177 change accidentally removed a Status check that could be hit with a corrupt parquet file. Change-Id: I6ceca7de31f602b75d744dacbdf37afa75983344 --- M be/src/exec/parquet-column-readers.cc 1 file changed, 1 insertion(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/8594/2 -- To view, visit http://gerrit.cloudera.org:8080/8594 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ceca7de31f602b75d744dacbdf37afa75983344 Gerrit-Change-Number: 8594 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-6217: fix DCHECK in Parquet fuzz test
Hello Lars Volker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8594 to look at the new patch set (#3). Change subject: IMPALA-6217: fix DCHECK in Parquet fuzz test .. IMPALA-6217: fix DCHECK in Parquet fuzz test The IMPALA-4177 change accidentally removed a Status check that could be hit with a corrupt parquet file. Testing: Ran TestScannersFuzzing in a loop for 2 days. Change-Id: I6ceca7de31f602b75d744dacbdf37afa75983344 --- M be/src/exec/parquet-column-readers.cc 1 file changed, 1 insertion(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/8594/3 -- To view, visit http://gerrit.cloudera.org:8080/8594 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6ceca7de31f602b75d744dacbdf37afa75983344 Gerrit-Change-Number: 8594 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4132: Use -fno-omit-frame-pointer
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8612 ) Change subject: IMPALA-4132: Use -fno-omit-frame-pointer .. Patch Set 1: Code-Review+1 (1 comment) The perf results looked acceptable to me - there didn't seem to be any significant regressions. http://gerrit.cloudera.org:8080/#/c/8612/1/be/CMakeLists.txt File be/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/8612/1/be/CMakeLists.txt@42 PS1, Line 42: pointers nit: extra s -- To view, visit http://gerrit.cloudera.org:8080/8612 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib7d0d88ba015a847356ed0274fe91017b98cb941 Gerrit-Change-Number: 8612 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 21 Nov 2017 18:38:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Correct bit patterns in comments for shiftright() examples
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8624 ) Change subject: [DOCS] Correct bit patterns in comments for shiftright() examples .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8624 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0d1bc00e4c8d1ae3352e18a8df95745d9c9b5ea6 Gerrit-Change-Number: 8624 Gerrit-PatchSet: 1 Gerrit-Owner: John RussellGerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 21 Nov 2017 20:02:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4964: Fix Decimal modulo overflow
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8329 ) Change subject: IMPALA-4964: Fix Decimal modulo overflow .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8329 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5420201d4440d421e33e443df005cdcc16b8a6cd Gerrit-Change-Number: 8329 Gerrit-PatchSet: 4 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Wed, 22 Nov 2017 00:44:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4835 (prep only): create io subfolder and namespace
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8424 ) Change subject: IMPALA-4835 (prep only): create io subfolder and namespace .. Patch Set 8: rebased and resolved conflicts with other patches -- To view, visit http://gerrit.cloudera.org:8080/8424 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b Gerrit-Change-Number: 8424 Gerrit-PatchSet: 8 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 16 Nov 2017 18:46:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8414 ) Change subject: IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation .. Patch Set 8: Ended up doing some further changes to make the DecrementRequestThread*(), locking and error propagation a bit more comprehensible. -- To view, visit http://gerrit.cloudera.org:8080/8414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If5cb42437d11c13bc4a55c3ab426b66777332bd1 Gerrit-Change-Number: 8414 Gerrit-PatchSet: 8 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 16 Nov 2017 19:08:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6080: clean up table descriptor handling
Hello Tianyi Wang, anujphadke, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8330 to look at the new patch set (#3). Change subject: IMPALA-6080: clean up table descriptor handling .. IMPALA-6080: clean up table descriptor handling * Add DescriptorTbl::CreateHdfsTableDescriptor to avoid having to create an entire DescriptorTbl during INSERT finalization (when only a descriptor for the output table is needed) * Remove TQueryExecRequest.desc_tbl, there's already a home for it in TQueryContext.desc_tbl This required fixing a problem in the planner test infrastructure where the TQueryCtx was reused for planning multiple times despite being modified during planning. This is based on Marcel Kornacker's coordinator cleanup patch. Testing: Ran core tests. Change-Id: Id427dab0c196b556bd8b2d64ec618403d5cbd4d6 --- M be/src/runtime/coordinator.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java 6 files changed, 116 insertions(+), 110 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/8330/3 -- To view, visit http://gerrit.cloudera.org:8080/8330 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id427dab0c196b556bd8b2d64ec618403d5cbd4d6 Gerrit-Change-Number: 8330 Gerrit-PatchSet: 3 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-6080: clean up table descriptor handling
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8330 ) Change subject: IMPALA-6080: clean up table descriptor handling .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/8330/2/be/src/benchmarks/expr-benchmark.cc File be/src/benchmarks/expr-benchmark.cc: http://gerrit.cloudera.org:8080/#/c/8330/2/be/src/benchmarks/expr-benchmark.cc@57 PS2, Line 57: #include "service/frontend.h" > I don't think this include is needed. Done http://gerrit.cloudera.org:8080/#/c/8330/2/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: http://gerrit.cloudera.org:8080/#/c/8330/2/be/src/runtime/descriptors.h@79 PS2, Line 79: struct LlvmTupleStruct { > Is this struct used? Done -- To view, visit http://gerrit.cloudera.org:8080/8330 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id427dab0c196b556bd8b2d64ec618403d5cbd4d6 Gerrit-Change-Number: 8330 Gerrit-PatchSet: 2 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: anujphadke Gerrit-Comment-Date: Thu, 16 Nov 2017 18:20:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8267 ) Change subject: IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding .. Patch Set 15: Code-Review+2 Rebased -- To view, visit http://gerrit.cloudera.org:8080/8267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6 Gerrit-Change-Number: 8267 Gerrit-PatchSet: 15 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 16 Nov 2017 17:52:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4835 (prep only): create io subfolder and namespace
Hello Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8424 to look at the new patch set (#8). Change subject: IMPALA-4835 (prep only): create io subfolder and namespace .. IMPALA-4835 (prep only): create io subfolder and namespace Instead of using the DiskIoMgr class as a namespace, which prevents forward-declaration of inner classes, create an impala::io namespace and unnested the inner class. This is done in anticipation of DiskIoMgr depending on BufferPool. This helps avoid a circular dependency between DiskIoMgr, TmpFileMgr and BufferPool headers that could not be broken with forward declarations. Testing: Ran core tests. Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b --- M be/CMakeLists.txt M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-parquet-scanner.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node-mt.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-text-scanner.cc M be/src/exec/kudu-scan-node.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/runtime/CMakeLists.txt D be/src/runtime/disk-io-mgr.h M be/src/runtime/exec-env.cc M be/src/runtime/exec-env.h A be/src/runtime/io/CMakeLists.txt R be/src/runtime/io/disk-io-mgr-internal.h R be/src/runtime/io/disk-io-mgr-stress-test.cc R be/src/runtime/io/disk-io-mgr-stress.cc R be/src/runtime/io/disk-io-mgr-stress.h R be/src/runtime/io/disk-io-mgr-test.cc R be/src/runtime/io/disk-io-mgr.cc A be/src/runtime/io/disk-io-mgr.h R be/src/runtime/io/handle-cache.h R be/src/runtime/io/handle-cache.inline.h R be/src/runtime/io/request-context.cc R be/src/runtime/io/request-context.h A be/src/runtime/io/request-ranges.h R be/src/runtime/io/scan-range.cc M be/src/runtime/row-batch.h M be/src/runtime/runtime-state.cc M be/src/runtime/runtime-state.h M be/src/runtime/test-env.h M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h 38 files changed, 1,416 insertions(+), 1,310 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/8424/8 -- To view, visit http://gerrit.cloudera.org:8080/8424 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If807f93a47d8027a43e56dd80b1b535d0bb74e1b Gerrit-Change-Number: 8424 Gerrit-PatchSet: 8 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation
Hello Tianyi Wang, Dan Hecht, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8414 to look at the new patch set (#8). Change subject: IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation .. IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation In preparation for switching the I/O mgr to the buffer pool, this removes and cleans up a lot of code so that the switchover patch starts from a cleaner slate. * Remove the free buffer cache (which will be replaced by buffer pool's own caching). * Make memory limit exceeded error checking synchronous (in anticipation of having to propagate buffer pool errors synchronously). * Simplify error propagation - remove the (ineffectual) code that enqueued BufferDescriptors containing error statuses. * Document locking scheme better in a few places, make it part of the function signature when it seemed reasonable. * Move ReturnBuffer() to ScanRange, because it is intrinsically connected with the lifecycle of a scan range. * Separate external ReturnBuffer() and internal CleanUpBuffer() interfaces - previously callers of ReturnBuffer() were fudging the num_buffers_in_reader accounting to make the external interface work. * Eliminate redundant state in ScanRange: 'eosr_returned_' and 'is_cancelled_'. * Clarify the logic around calling Close() for the last BufferDescriptor. -> There appeared to be an implicit assumption that buffers would be freed in the order they were returned from the scan range, so that the "eos" buffer was returned last. Instead just count the number of outstanding buffers to detect the last one. -> Touching the is_cancelled_ field without holding a lock was hard to reason about - violated locking rules and it was unclear that it was race-free. This will probably regress performance somewhat because of the cache removal, so my plan is to merge it around the same time as switching the I/O mgr to allocate from the buffer pool. I'm keeping the patches separate to make reviewing easier. Testing: * Ran exhaustive tests * Ran the disk-io-mgr-stress-test overnight Change-Id: If5cb42437d11c13bc4a55c3ab426b66777332bd1 --- M be/src/exec/hdfs-parquet-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node.cc M be/src/exec/scanner-context.cc M be/src/runtime/exec-env.cc M be/src/runtime/io/disk-io-mgr-stress.cc M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/mem-tracker.h M be/src/runtime/test-env.cc M be/src/runtime/tmp-file-mgr-test.cc M be/src/runtime/tmp-file-mgr.cc M be/src/util/impalad-metrics.cc M be/src/util/impalad-metrics.h 19 files changed, 537 insertions(+), 847 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8414/8 -- To view, visit http://gerrit.cloudera.org:8080/8414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If5cb42437d11c13bc4a55c3ab426b66777332bd1 Gerrit-Change-Number: 8414 Gerrit-PatchSet: 8 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8309 ) Change subject: IMPALA-5019: Decimal V2 addition .. Patch Set 4: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/8309/3/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: http://gerrit.cloudera.org:8080/#/c/8309/3/be/src/runtime/decimal-value.inline.h@243 PS3, Line 243: // of the numbers can be zero and one must be positive and the other one negative. > The way it is currently implemented, I don't see an advantage in doing that I looked at it again and it seems fine as is. http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h File be/src/runtime/decimal-value.inline.h: http://gerrit.cloudera.org:8080/#/c/8309/2/be/src/runtime/decimal-value.inline.h@310 PS2, Line 310: DCHECK(!*overflow) << "Cannot overflow unless result is Decimal16Value"; > Modified the comment of AdjustToSameScale() slightly. I was debating chang wfm -- To view, visit http://gerrit.cloudera.org:8080/8309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336 Gerrit-Change-Number: 8309 Gerrit-PatchSet: 4 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Thu, 16 Nov 2017 23:39:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5019: Decimal V2 addition
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8309 ) Change subject: IMPALA-5019: Decimal V2 addition .. Patch Set 4: I'm happy with this - did anyone else want to take a pass over it? -- To view, visit http://gerrit.cloudera.org:8080/8309 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I401049c56d910eb1546a178c909c923b01239336 Gerrit-Change-Number: 8309 Gerrit-PatchSet: 4 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Thu, 16 Nov 2017 23:40:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1474: Add a metric for running queries
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7228 ) Change subject: IMPALA-1474: Add a metric for running queries .. Patch Set 1: I think it probably does. There's some miscellaneous cleanup in this patch but I think they implement the same functionality, more or less. -- To view, visit http://gerrit.cloudera.org:8080/7228 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I41bc88d0e91427ca2fd70136dc6e06fa7a2663da Gerrit-Change-Number: 7228 Gerrit-PatchSet: 1 Gerrit-Owner: Henry RobinsonGerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 16 Nov 2017 21:33:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6184: Clean up after ScalarExprEvaluator::Clone() fails
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8572 ) Change subject: IMPALA-6184: Clean up after ScalarExprEvaluator::Clone() fails .. Patch Set 2: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/8572/1/testdata/workloads/functional-query/queries/QueryTest/udf-errors.test File testdata/workloads/functional-query/queries/QueryTest/udf-errors.test: http://gerrit.cloudera.org:8080/#/c/8572/1/testdata/workloads/functional-query/queries/QueryTest/udf-errors.test@85 PS1, Line 85: on (bad_expr2(rand()) = (t2.bool_col && t1.bool_col)); > PS2 has been updated to fail more deterministically. rand() with the same s Yeah I wasn't sure if it could behave differently if the scan ranges were scheduled differently or something. -- To view, visit http://gerrit.cloudera.org:8080/8572 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45ffd722d0a69ad05ae3c748cf504c7f1a959a1d Gerrit-Change-Number: 8572 Gerrit-PatchSet: 2 Gerrit-Owner: Michael HoGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 16 Nov 2017 21:19:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1575: part 2: yield admission control resources
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8581 ) Change subject: IMPALA-1575: part 2: yield admission control resources .. Patch Set 1: This is essentially the same patch as before, except with the statestore update frequency in the test tweaked to avoid the flaky timeout problem we were hitting on release builds. The problem is that queries were sometimes in the queue long enough (60s) to hit the default admission control timeout. The reason it was so slow is that the test frequently waits for 10 statestore updates or 5 seconds. -- To view, visit http://gerrit.cloudera.org:8080/8581 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1fae8dc1c4b0eca7bfa8fadae4a56ef2b37947a Gerrit-Change-Number: 8581 Gerrit-PatchSet: 1 Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 17 Nov 2017 00:57:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6128: Spill-to-disk Encryption(AES-CFB + SHA256) is slow
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8510 ) Change subject: IMPALA-6128: Spill-to-disk Encryption(AES-CFB + SHA256) is slow .. Patch Set 4: Code-Review+1 Change makes sense to me. I don't know the story with what OpenSSL versions we have to interoperate with, so I'll let Sailesh weigh in on that. -- To view, visit http://gerrit.cloudera.org:8080/8510 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib97939f2334838263364b53ef3413871638bf53e Gerrit-Change-Number: 8510 Gerrit-PatchSet: 4 Gerrit-Owner: Xianda KeGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Mike Yoder Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Xianda Ke Gerrit-Comment-Date: Thu, 16 Nov 2017 23:55:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6084: Avoid using of global namespace for llvm
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8489 ) Change subject: IMPALA-6084: Avoid using of global namespace for llvm .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/8489/5/be/src/exprs/scalar-expr.cc File be/src/exprs/scalar-expr.cc: http://gerrit.cloudera.org:8080/#/c/8489/5/be/src/exprs/scalar-expr.cc@260 PS5, Line 260: // Compute the alignment of this value. Values should be self-aligned for > Thanks for the fix. The weird formatting comes from my vi setting. Anyway, No problem, it's easy to miss in a large patch. -- To view, visit http://gerrit.cloudera.org:8080/8489 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7098baff5f746bcc2965583ca99f6188997b733a Gerrit-Change-Number: 8489 Gerrit-PatchSet: 5 Gerrit-Owner: Kim Jin ChulGerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kim Jin Chul Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 16 Nov 2017 00:19:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6206: Fix data load failure with -notests
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8580 ) Change subject: IMPALA-6206: Fix data load failure with -notests .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idaa193989d77d56b72db05ad54e1fb0a345938fb Gerrit-Change-Number: 8580 Gerrit-PatchSet: 3 Gerrit-Owner: Zach AmsdenGerrit-Reviewer: David Knupp Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zach Amsden Gerrit-Comment-Date: Fri, 17 Nov 2017 23:46:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6292: Fix incorrect DCHECK in decimal subtraction
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8796 ) Change subject: IMPALA-6292: Fix incorrect DCHECK in decimal subtraction .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8796 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I42d42ad85efe32b7a0db0d2353385939fee09934 Gerrit-Change-Number: 8796 Gerrit-PatchSet: 1 Gerrit-Owner: Taras BobrovytskyGerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 08 Dec 2017 01:23:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6285: Don't print stack trace on RPC errors.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8788 ) Change subject: IMPALA-6285: Don't print stack trace on RPC errors. .. Patch Set 2: Code-Review+2 (2 comments) I'm assuming Sailesh is happy with this based on his previous comment. http://gerrit.cloudera.org:8080/#/c/8788/2/be/src/runtime/client-cache.h File be/src/runtime/client-cache.h: http://gerrit.cloudera.org:8080/#/c/8788/2/be/src/runtime/client-cache.h@314 PS2, Line 314: return Status::Expected(msg); This idiom is a bit inefficient - i think it results in three copies of the string - first to construct it, then to copy it into the ErrorMsg, then to copy it into a new ErrorMsg in the status. It probably doesn't really matter though. http://gerrit.cloudera.org:8080/#/c/8788/2/be/src/runtime/client-cache.h@329 PS2, Line 329: VLOG_QUERY << msg.msg() << " rpc: " << typeid(*response).name(); Should we also include the typeid() in the Status? Or is thta overkill? -- To view, visit http://gerrit.cloudera.org:8080/8788 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia83294494442ef21f7934f92ba9112e80d81fa58 Gerrit-Change-Number: 8788 Gerrit-PatchSet: 2 Gerrit-Owner: Michael HoGerrit-Reviewer: Michael Ho Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 07 Dec 2017 17:59:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8034 ) Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. Patch Set 18: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-Change-Number: 8034 Gerrit-PatchSet: 18 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Bikramjeet VigGerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 08 Dec 2017 00:57:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6265 Query cancellation test enhancements
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8713 ) Change subject: IMPALA-6265 Query cancellation test enhancements .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8713 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie0bff485a872df7be8efd784314a6ca91aaadd11 Gerrit-Change-Number: 8713 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 05 Dec 2017 18:08:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6265 Query cancellation test enhancements
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8713 ) Change subject: IMPALA-6265 Query cancellation test enhancements .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/8713/2/tests/shell/test_shell_commandline.py File tests/shell/test_shell_commandline.py: http://gerrit.cloudera.org:8080/#/c/8713/2/tests/shell/test_shell_commandline.py@385 PS2, Line 385: test_get_log_once(self, > I'm not sure I get this comment :) Yep, that looks better. -- To view, visit http://gerrit.cloudera.org:8080/8713 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie0bff485a872df7be8efd784314a6ca91aaadd11 Gerrit-Change-Number: 8713 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor KaszabGerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 05 Dec 2017 18:08:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8766 ) Change subject: IMPALA-6268: KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8766 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6c750850ff916617a06e3cfac330072d8e2179e8 Gerrit-Change-Number: 8766 Gerrit-PatchSet: 1 Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 05 Dec 2017 19:25:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3804: Re-enable per-scan filtering for sequence-based scanners
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8684 ) Change subject: IMPALA-3804: Re-enable per-scan filtering for sequence-based scanners .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8684 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4b38c26bcbe67f83efcc65a1723d766626ae3d3e Gerrit-Change-Number: 8684 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-NagyGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 05 Dec 2017 18:13:46 + Gerrit-HasComments: No