[Impala-ASF-CR] IMPALA-8021: Add estimated cardinality to EXPLAIN output
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12136 ) Change subject: IMPALA-8021: Add estimated cardinality to EXPLAIN output .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1735/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12136 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986 Gerrit-Change-Number: 12136 Gerrit-PatchSet: 14 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 08 Jan 2019 07:06:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8021: Add estimated cardinality to EXPLAIN output
Hello Bharath Vissapragada, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12136 to look at the new patch set (#14). Change subject: IMPALA-8021: Add estimated cardinality to EXPLAIN output .. IMPALA-8021: Add estimated cardinality to EXPLAIN output Cardinality is vital to understanding why a plan has the form it does, yet the planner normally emits cardinality information only for the detailed levels. Unfortunately, most query profiles we see are at the standard level without this information (except in the summary table), making it hard to understand what happened. This patch adds cardinality to the standard EXPLAIN output. It also changes the displayed cardinality value to be in abbreviated "metric" form: 1.23K instead of 1234, etc. Changing the DESCRIBE output has a huge impact on PlannerTest: all the "golden" test files must change. To avoid doing this twice, this patch also includes: IMPALA-7919: Add predicates line in plan output for partition key predicates This is also the time to also include: IMPALA-8022: Add cardinality checks to PlannerTest The comparison code was changed to allow a set of validators, one of which compares cardinality to ensure it is within 5% of the expected value. This should ensure we don't change estimates unintentionally. While many planner tests are concerned with cardinality, many others are not. Testing showed that the cardinality is actually unstable within tests. For such tests, added filters to ignore cardinality. The filter is enabled by default (for backward compatibility) but disabled (to allow cardinality verification) for the critical tests. Rebasing the tests was complicated by a bug in the error-matching code, so this patch also fixes: IMPALA-8023: Fix PlannerTest to handle error lines consistently Now, the error output written to the output "save results" file matches that expected in the "golden" file -- no more handling these specially. Testing: * Added cardinality verification. * Reran all FE tests. * Rebased all PlannerTest .test files. * Adjusted the metadata/test_explain.py test to handle the changed EXPLAIN output. Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986 --- M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java M fe/src/main/java/org/apache/impala/common/PrintUtils.java M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M fe/src/test/java/org/apache/impala/testutil/TestUtils.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/complex-types-file-formats.test M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/constant.test M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/default-join-distr-mode-broadcast.test M testdata/workloads/functional-planner/queries/PlannerTest/default-join-distr-mode-shuffle.test M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test M testdata/workloads/functional-planner/queries/PlannerTest/disable-preaggregations.test M testdata/workloads/functional-planner/queries/PlannerTest/distinct-estimate.test M testdata/workloads/functional-planner/queries/PlannerTest/distinct.test M testdata/workloads/functional-planner/queries/PlannerTest/empty.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M
[Impala-ASF-CR] IMPALA-8021: Add estimated cardinality to EXPLAIN output
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12136 ) Change subject: IMPALA-8021: Add estimated cardinality to EXPLAIN output .. Patch Set 14: (3 comments) http://gerrit.cloudera.org:8080/#/c/12136/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/12136/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@276 PS14, Line 276: List partitions, TableRef hdfsTblRef, AggregateInfo aggInfo, line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/12136/14/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: http://gerrit.cloudera.org:8080/#/c/12136/14/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@729 PS14, Line 729: assertEquals(" foo=bar cardinality=", filter.transform(" foo=bar cardinality=unavailable")); line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/12136/14/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@731 PS14, Line 731: assertEquals(" row-size= cardinality=10.3K", filter.transform(" row-size=10B cardinality=10.3K")); line too long (102 > 90) -- To view, visit http://gerrit.cloudera.org:8080/12136 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986 Gerrit-Change-Number: 12136 Gerrit-PatchSet: 14 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 08 Jan 2019 06:34:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7666: Propagate name of test into CLIENT IDENTIFIER.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12177 ) Change subject: IMPALA-7666: Propagate name of test into CLIENT_IDENTIFIER. .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1734/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12177 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2f685fd16982d73ad3fc0f4a7578c5ad83b9a84c Gerrit-Change-Number: 12177 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 08 Jan 2019 02:44:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7666: Adding an opaque client identifier to query options.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12130 ) Change subject: IMPALA-7666: Adding an opaque client identifier to query options. .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1732/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12130 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0a7708492f05d33b2bc99fc3a03b461bbb6f3ea4 Gerrit-Change-Number: 12130 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 08 Jan 2019 02:36:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8026: Fix #rows accounting for NLJ
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12164 ) Change subject: IMPALA-8026: Fix #rows accounting for NLJ .. IMPALA-8026: Fix #rows accounting for NLJ Use the same, much simpler, approach used by all other ExecNodes. Testing: Manually tested by adding logging to print values of the counter. Confirmed that the fix led to the counter having the correct values (instead of values exceeding num_rows_returned_). Change-Id: I43e2e1d8f85478ff36c8cddc69bac3bdccd2c824 Reviewed-on: http://gerrit.cloudera.org:8080/12164 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/exec/nested-loop-join-node.cc 1 file changed, 8 insertions(+), 20 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/12164 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I43e2e1d8f85478ff36c8cddc69bac3bdccd2c824 Gerrit-Change-Number: 12164 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall
[Impala-ASF-CR] IMPALA-8021: Add estimated cardinality to EXPLAIN output
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12136 ) Change subject: IMPALA-8021: Add estimated cardinality to EXPLAIN output .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1733/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12136 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986 Gerrit-Change-Number: 12136 Gerrit-PatchSet: 13 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 08 Jan 2019 02:28:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7986,IMPALA-7987: run daemons in docker containers
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12095 ) Change subject: IMPALA-7986,IMPALA-7987: run daemons in docker containers .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/12095/11/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/12095/11/bin/impala-config.sh@275 PS11, Line 275: export EXTERNAL_LISTEN_HOST="${EXTERNAL_LISTEN_HOST-localhost}" This should be 0.0.0.0 by default. -- To view, visit http://gerrit.cloudera.org:8080/12095 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5975cced33fa93df43101dd47d19b8af12e93d11 Gerrit-Change-Number: 12095 Gerrit-PatchSet: 11 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 08 Jan 2019 02:15:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7970 : Add support for metastore event based automatic invalidate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12118 ) Change subject: IMPALA-7970 : Add support for metastore event based automatic invalidate .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1731/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12118 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7 Gerrit-Change-Number: 12118 Gerrit-PatchSet: 14 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Tue, 08 Jan 2019 02:12:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7666: Propagate name of test into CLIENT IDENTIFIER.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12177 ) Change subject: IMPALA-7666: Propagate name of test into CLIENT_IDENTIFIER. .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/12177/1/tests/conftest.py File tests/conftest.py: http://gerrit.cloudera.org:8080/#/c/12177/1/tests/conftest.py@595 PS1, Line 595: @pytest.hookimpl(trylast=True) flake8: E302 expected 2 blank lines, found 1 http://gerrit.cloudera.org:8080/#/c/12177/1/tests/conftest.py@601 PS1, Line 601: , flake8: E231 missing whitespace after ',' http://gerrit.cloudera.org:8080/#/c/12177/1/tests/conftest.py@601 PS1, Line 601: : flake8: E501 line too long (95 > 90 characters) -- To view, visit http://gerrit.cloudera.org:8080/12177 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2f685fd16982d73ad3fc0f4a7578c5ad83b9a84c Gerrit-Change-Number: 12177 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 08 Jan 2019 02:07:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7666: Propagate name of test into CLIENT IDENTIFIER.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12177 Change subject: IMPALA-7666: Propagate name of test into CLIENT_IDENTIFIER. .. IMPALA-7666: Propagate name of test into CLIENT_IDENTIFIER. To facilitate correlating test failures (where we sometimes know things like fragment id) with the tests that generated those queries, we can stuff the test name into CLIENT_IDENTIFIER. The mechanics here are to create a global, tests.common.current_node to store the current test, create a plugin in conftest to set this when entering a test, and then configuring connections as they're created deep in test code. Change-Id: I2f685fd16982d73ad3fc0f4a7578c5ad83b9a84c --- M tests/common/impala_connection.py M tests/conftest.py M tests/query_test/test_observability.py 3 files changed, 32 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/12177/1 -- To view, visit http://gerrit.cloudera.org:8080/12177 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I2f685fd16982d73ad3fc0f4a7578c5ad83b9a84c Gerrit-Change-Number: 12177 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] IMPALA-8021: Add estimated cardinality to EXPLAIN output
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12136 ) Change subject: IMPALA-8021: Add estimated cardinality to EXPLAIN output .. Patch Set 13: (7 comments) Addressed review comments, rebased on latest master, and fixed a potentially unstable test. Will attempt to remove the range checks for cardinality. Please hold off on another review until we see the results. http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@100 PS12, Line 100:* Return a list of partitions left after applying the conjuncts. Please note > Needs an update to explain the Pair<> Done http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1230 PS12, Line 1230: if (partitionConjuncts_ != null && !partitionConjuncts_.isEmpty()) { > Looks like Greg had a different opinion on the jira when I mentioned separa Agreed. I added this partly because of Greg's request (and your response), but also because I found it very helpful to understand the cardinality in a plan. We have a table with 11K rows. The scan node had no (visible) predicates, yet claimed to return 5K rows. The "5/11 partitions" should have clued me in. Had to debug the code to see why that was the case. Once these partition predicates appeared, however, it was obvious what was happening. So, is that a good reason to include this info in the EXPLAIN output? IMHO it is, what do you think? Discovered one reason that we did not include them: we had some partitioning tests that use the system time for a partitioning value. As a result, the predicate changes on each run and it is hard to verify the test results. Simply changed it to use a constant date/time to work around that issue. http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1283 PS12, Line 1283: Pair, List> pair = > line too long (110 > 90) Done http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1331 PS12, Line 1331: scanNode.init(analyzer); > nit: Can we force this in the c'tor above? Setting it separately seems weir Done http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java: http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java@541 PS12, Line 541: if (!testOptions.contains(PlannerTestOption.VALIDATE_CARDINALITY)) { > Did you run into some cases where this was required to make tests pass? Or Mostly it is to reduce redundant output. By definition, an exchange just passes along the rows from its input. So, repeating the row width and cardinality for an Exchange adds no information. Added a comment to explain this thought. http://gerrit.cloudera.org:8080/#/c/12136/4/fe/src/test/java/org/apache/impala/testutil/TestUtils.java File fe/src/test/java/org/apache/impala/testutil/TestUtils.java: http://gerrit.cloudera.org:8080/#/c/12136/4/fe/src/test/java/org/apache/impala/testutil/TestUtils.java@302 PS4, Line 302: } > Just for reference, notice that the old code consumed three two per pass: o Argh.. That should be "..two tokens per pass..." http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/test/java/org/apache/impala/testutil/TestUtils.java File fe/src/test/java/org/apache/impala/testutil/TestUtils.java: http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/test/java/org/apache/impala/testutil/TestUtils.java@207 PS12, Line 207: If found, then parse the values and compare them. Report a match :* if the values are within 5%, else report a mismatch. > Yeah I'm not sure if we expect the cardinality estimates to be exactly repr On my first attempt, I enabled cardinality checks for all tests. I saw some variation. Then, when running on the Jenkins server, I found that the metadata actually differs from what we have on dev machines, so I disabled cardinality checks by default, turning them on only for selected tests where they add value. So, with that revision, it may be that I can remove this code. I'll commit the other review comments and do a commit. Then, I'll remove this code, do another commit, and do a
[Impala-ASF-CR] IMPALA-8021: Add estimated cardinality to EXPLAIN output
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12136 ) Change subject: IMPALA-8021: Add estimated cardinality to EXPLAIN output .. Patch Set 13: (3 comments) http://gerrit.cloudera.org:8080/#/c/12136/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/12136/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@276 PS13, Line 276: List partitions, TableRef hdfsTblRef, AggregateInfo aggInfo, line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/12136/13/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: http://gerrit.cloudera.org:8080/#/c/12136/13/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@729 PS13, Line 729: assertEquals(" foo=bar cardinality=", filter.transform(" foo=bar cardinality=unavailable")); line too long (96 > 90) http://gerrit.cloudera.org:8080/#/c/12136/13/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@731 PS13, Line 731: assertEquals(" row-size= cardinality=10.3K", filter.transform(" row-size=10B cardinality=10.3K")); line too long (102 > 90) -- To view, visit http://gerrit.cloudera.org:8080/12136 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986 Gerrit-Change-Number: 12136 Gerrit-PatchSet: 13 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 08 Jan 2019 02:01:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8021: Add estimated cardinality to EXPLAIN output
Hello Bharath Vissapragada, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12136 to look at the new patch set (#13). Change subject: IMPALA-8021: Add estimated cardinality to EXPLAIN output .. IMPALA-8021: Add estimated cardinality to EXPLAIN output Cardinality is vital to understanding why a plan has the form it does, yet the planner normally emits cardinality information only for the detailed levels. Unfortunately, most query profiles we see are at the standard level without this information (except in the summary table), making it hard to understand what happened. This patch adds cardinality to the standard EXPLAIN output. It also changes the displayed cardinality value to be in abbreviated "metric" form: 1.23K instead of 1234, etc. Changing the DESCRIBE output has a huge impact on PlannerTest: all the "golden" test files must change. To avoid doing this twice, this patch also includes: IMPALA-7919: Add predicates line in plan output for partition key predicates This is also the time to also include: IMPALA-8022: Add cardinality checks to PlannerTest The comparison code was changed to allow a set of validators, one of which compares cardinality to ensure it is within 5% of the expected value. This should ensure we don't change estimates unintentionally. While many planner tests are concerned with cardinality, many others are not. Testing showed that the cardinality is actually unstable within tests. For such tests, added filters to ignore cardinality. The filter is enabled by default (for backward compatibility) but disabled (to allow cardinality verification) for the critical tests. Rebasing the tests was complicated by a bug in the error-matching code, so this patch also fixes: IMPALA-8023: Fix PlannerTest to handle error lines consistently Now, the error output written to the output "save results" file matches that expected in the "golden" file -- no more handling these specially. Testing: * Added cardinality verification. * Reran all FE tests. * Rebased all PlannerTest .test files. * Adjusted the metadata/test_explain.py test to handle the changed EXPLAIN output. Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986 --- M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java M fe/src/main/java/org/apache/impala/common/PrintUtils.java M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M fe/src/test/java/org/apache/impala/testutil/TestUtils.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/complex-types-file-formats.test M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/constant.test M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/default-join-distr-mode-broadcast.test M testdata/workloads/functional-planner/queries/PlannerTest/default-join-distr-mode-shuffle.test M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test M testdata/workloads/functional-planner/queries/PlannerTest/disable-preaggregations.test M testdata/workloads/functional-planner/queries/PlannerTest/distinct-estimate.test M testdata/workloads/functional-planner/queries/PlannerTest/distinct.test M testdata/workloads/functional-planner/queries/PlannerTest/empty.test M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M
[Impala-ASF-CR] IMPALA-7666: Adding an opaque client identifier to query options.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12130 ) Change subject: IMPALA-7666: Adding an opaque client identifier to query options. .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/12130/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12130/2//COMMIT_MSG@15 PS2, Line 15: protoype > *prototype I'm going to do the py.test stuff in a separate patch. impala-shell is a good idea, and I've done that. http://gerrit.cloudera.org:8080/#/c/12130/2/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/12130/2/be/src/service/query-options.cc@724 PS2, Line 724: .c_str() > Can't we just pass the original string? Done -- To view, visit http://gerrit.cloudera.org:8080/12130 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0a7708492f05d33b2bc99fc3a03b461bbb6f3ea4 Gerrit-Change-Number: 12130 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 08 Jan 2019 01:59:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7666: Adding an opaque client identifier to query options.
Hello Lars Volker, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12130 to look at the new patch set (#3). Change subject: IMPALA-7666: Adding an opaque client identifier to query options. .. IMPALA-7666: Adding an opaque client identifier to query options. We sometimes struggle to identify the client (e.g., a given version of a JDBC driver, Tableau, Hue, etc.) for a given query. This commit adds a User-Agent header style, called "Client Identifier", which clients can set as a Query Option. Nothing is done with this header, but it's written into logs and query profiles. This commit includes changes to impala-shell to include the version of impala shell with an associated test. A future commit will serialize the name of the py.test being run into this field, which is handy for figuring out where a query came from. Change-Id: I0a7708492f05d33b2bc99fc3a03b461bbb6f3ea4 --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M shell/impala_shell.py M tests/shell/test_shell_commandline.py 6 files changed, 27 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/12130/3 -- To view, visit http://gerrit.cloudera.org:8080/12130 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0a7708492f05d33b2bc99fc3a03b461bbb6f3ea4 Gerrit-Change-Number: 12130 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7970 : Add support for metastore event based automatic invalidate
Vihang Karajgaonkar has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/12118 ) Change subject: IMPALA-7970 : Add support for metastore event based automatic invalidate .. IMPALA-7970 : Add support for metastore event based automatic invalidate This change adds support to CatalogD to poll metastore events to issue invalidate on tables automatically. It adds basic infrastructure to poll HMS notifications events at a configurable frequency using a backend config called hms_event_polling_frequency_s flag. Currently, it issues invalidate at tables when it received alter events on table and partitions. It also adds tables/databases and removes tables from catalogD when it receives create_table/create_database and drop_table/drop_database events. The default value of hms_event_polling_frequency_s is 0 which disables the feature. A non-zero value in seconds of this configuration can be used to enable the feature and set the polling frequency. In order to process each event atomically, this feature relies on version lock in CatalogServiceCatalog. It adds new methods in CatalogServiceCatalog which takes a write lock on version so that readers are blocked until the catalog state is updated based on the events. In case of processing events, the metastore operation is already completed and only catalog state needs to be updated. Hence we do not need to make new metastore calls while processing the events and only version lock is sufficient to serialize updates to the catalog objects based on events. This locking protocol is similar to what is done in case of DDL processing in CatalogOpExecutor except it does not need to take metastoreDdlLock since no metastore operations are needed during event processing. The change also adds a new test class to test the basic functionality for each of the event type which is supported. Note that this feature is still a work in progress and additional improvements will be done in subsequent patches. Be default the feature is turned off. Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7 --- M be/src/common/global-flags.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java A fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java A fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java A fe/src/main/java/org/apache/impala/catalog/MetastoreNotificationException.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java A fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java M fe/src/test/resources/postgresql-hive-site.xml.template 10 files changed, 1,585 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/12118/14 -- To view, visit http://gerrit.cloudera.org:8080/12118 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7 Gerrit-Change-Number: 12118 Gerrit-PatchSet: 14 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-8026: Fix #rows accounting for NLJ
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12164 ) Change subject: IMPALA-8026: Fix #rows accounting for NLJ .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/12164 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I43e2e1d8f85478ff36c8cddc69bac3bdccd2c824 Gerrit-Change-Number: 12164 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Tue, 08 Jan 2019 01:22:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7905: Hive keywords not quoted for identifiers
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12009 ) Change subject: IMPALA-7905: Hive keywords not quoted for identifiers .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1730/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I06cc20b052a3a66535a171c36b4b31477c0ba6d0 Gerrit-Change-Number: 12009 Gerrit-PatchSet: 8 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Comment-Date: Tue, 08 Jan 2019 01:07:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7468: Port CancelQueryFInstances() to KRPC.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12142 ) Change subject: IMPALA-7468: Port CancelQueryFInstances() to KRPC. .. IMPALA-7468: Port CancelQueryFInstances() to KRPC. When the Coordinator needs to cancel a query (for example because a user has hit Control-C), it does this by sending a CancelQueryFInstances message to each fragment instance. This change switches this code to use KRPC. Add new protobuf definitions for the messages, and remove the old thrift definitions. Move the server-side implementation of Cancel() from ImpalaInternalService to ControlService. Rework the scheduler so that the FInstanceExecParams always contains the KRPC address of the fragment executors, this address can then be used if a query is to be cancelled. For now keep the KRPC calls to CancelQueryFInstances() as synchronous. While moving the client-side code, remove the fault injection code that was inserted with FAULT_INJECTION_SEND_RPC_EXCEPTION and FAULT_INJECTION_RECV_RPC_EXCEPTION (triggered by running impalad with --fault_injection_rpc_exception_type=1) as this tickles code in client-cache.h which is now not used. TESTING: Ran all end-to-end tests. No new tests as test_cancellation.py provides good coverage. Checked in debugger that DebugAction style fault injection (triggered from test_cancellation.py) was working correctly. Change-Id: I625030c3f1068061aa029e6e242f016cadd84969 Reviewed-on: http://gerrit.cloudera.org:8080/12142 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/runtime/backend-client.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/scheduling/query-schedule.h M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/control-service.cc M be/src/service/control-service.h M be/src/service/impala-internal-service.cc M be/src/service/impala-internal-service.h M common/protobuf/control_service.proto M common/thrift/ImpalaInternalService.thrift 12 files changed, 192 insertions(+), 139 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/12142 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I625030c3f1068061aa029e6e242f016cadd84969 Gerrit-Change-Number: 12142 Gerrit-PatchSet: 6 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-7468: Port CancelQueryFInstances() to KRPC.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12142 ) Change subject: IMPALA-7468: Port CancelQueryFInstances() to KRPC. .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/12142 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I625030c3f1068061aa029e6e242f016cadd84969 Gerrit-Change-Number: 12142 Gerrit-PatchSet: 5 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Tue, 08 Jan 2019 01:05:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7905: Hive keywords not quoted for identifiers
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12009 ) Change subject: IMPALA-7905: Hive keywords not quoted for identifiers .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1729/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I06cc20b052a3a66535a171c36b4b31477c0ba6d0 Gerrit-Change-Number: 12009 Gerrit-PatchSet: 7 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Comment-Date: Tue, 08 Jan 2019 00:50:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6503: Support reading complex types from ORC format files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12168 ) Change subject: IMPALA-6503: Support reading complex types from ORC format files .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1728/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790 Gerrit-Change-Number: 12168 Gerrit-PatchSet: 7 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 08 Jan 2019 00:43:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7905: Hive keywords not quoted for identifiers
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12009 ) Change subject: IMPALA-7905: Hive keywords not quoted for identifiers .. Patch Set 7: (2 comments) Addressed recent review comments. http://gerrit.cloudera.org:8080/#/c/12009/5/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java: http://gerrit.cloudera.org:8080/#/c/12009/5/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@116 PS5, Line 116: an identifier > I meant to refer to the method argument @ident here. But it is fine. Not to Took a second look. The text was ambiguous, using "identifier" for two distinct concepts. Reworded to be clearer. http://gerrit.cloudera.org:8080/#/c/12009/5/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@148 PS5, Line 148: / : public static boolean impalaNeedsQuotes(String ident) { : return SqlScanner.isReserved(ident) || : // Impala's scanner recognizes the ".123" portion of "db.123e45" as a decimal, : // so while the quoting is not necessary > ya, 123e45 is a valid decimal literal but not .123 part like you mentioned Reworded the comment and inserted an actual example that illustrates the issue. "3a3" is a fine unquoted identifier. "3e3" must be quoted, else it is a double. So, to be safe, we quote all identifiers that start with a digit. -- To view, visit http://gerrit.cloudera.org:8080/12009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I06cc20b052a3a66535a171c36b4b31477c0ba6d0 Gerrit-Change-Number: 12009 Gerrit-PatchSet: 7 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Comment-Date: Tue, 08 Jan 2019 00:37:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7905: Hive keywords not quoted for identifiers
Hello Bharath Vissapragada, Fredy Wijaya, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12009 to look at the new patch set (#8). Change subject: IMPALA-7905: Hive keywords not quoted for identifiers .. IMPALA-7905: Hive keywords not quoted for identifiers Impala often generates SQL for statements in the toSql() call, often used during testing or when writing the query plan. Impala keywords such as "create", when used as identifiers, must be quoted: SELECT `select`, `from` FROM `order` ... The code in ToSqlUtils.getIdentSql() also quotes the identifier if it is a Hive keyword. But, that code contained a flaw: it uses the Hive lexer to detect a keyword, but the lexer expects a case-insensitive input. We provide a case sensitive input. As a result, "MONTH" is caught as a Hive keyword and quoted, but "month" is not. This patch fixes that flaw. This patch also fixes: IMPALA-8051: Compute stats fails on a column with comment character in name The code uses the Hive lexical analyzer to check names. Since "#" and "--" are comment characters, a name like "foo#" is parsed as "foo" which does not need quotes, hence we don't quote "foo#", which causes issues. Added a special check for "#" and "--" to resolve this issue. Testing: * Refactored getIdentSql() easier testing. * Added a tests to the recently added ToSqlUtilsTest for this case and several others. * Making this change caused the columns `month`, `year`, and `key` to be quoted when before they were not. Updated many tests as a result. * Added a new identSql() function, for use in tests, to match the quoting that Impala uses, and to handle the wildcard, and multi-part names. Used this in ToSqlTest to handle the quoted names. * Reran all FE tests. Change-Id: I06cc20b052a3a66535a171c36b4b31477c0ba6d0 --- M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlUtilsTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test M testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/union.test 31 files changed, 741 insertions(+), 503 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/12009/8 -- To view, visit http://gerrit.cloudera.org:8080/12009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I06cc20b052a3a66535a171c36b4b31477c0ba6d0 Gerrit-Change-Number: 12009 Gerrit-PatchSet: 8 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers
[Impala-ASF-CR] IMPALA-7905: Hive keywords not quoted for identifiers
Hello Bharath Vissapragada, Fredy Wijaya, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12009 to look at the new patch set (#7). Change subject: IMPALA-7905: Hive keywords not quoted for identifiers .. IMPALA-7905: Hive keywords not quoted for identifiers Impala often generates SQL for statements in the toSql() call, often used during testing or when writing the query plan. Impala keywords such as "create", when used as identifiers, must be quoted: SELECT `select`, `from` FROM `order` ... The code in ToSqlUtils.getIdentSql() also quotes the identifier if it is a Hive keyword. But, that code contained a flaw: it uses the Hive lexer to detect a keyword, but the lexer expects a case-insensitive input. We provide a case sensitive input. As a result, "MONTH" is caught as a Hive keyword and quoted, but "month" is not. This patch fixes that flaw. This patch also fixes: IMPALA-8051: Compute stats fails on a column with comment character in name The code uses the Hive lexical analyzer to check names. Since "#" and "--" are comment characters, a name like "foo#" is parsed as "foo" which does not need quotes, hence we don't quote "foo#", which causes issues. Added a special check for "#" and "--" to resolve this issue. Testing: * Refactored getIdentSql() easier testing. * Added a tests to the recently added ToSqlUtilsTest for this case and several others. * Making this change caused the columns `month`, `year`, and `key` to be quoted when before they were not. Updated many tests as a result. * Added a new identSql() function, for use in tests, to match the quoting that Impala uses, and to handle the wildcard, and multi-part names. Used this in ToSqlTest to handle the quoted names. * Reran all FE tests. Change-Id: I06cc20b052a3a66535a171c36b4b31477c0ba6d0 --- M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlUtilsTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test M testdata/workloads/functional-planner/queries/PlannerTest/joins.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test M testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans.test M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test M testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/union.test 31 files changed, 738 insertions(+), 503 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/12009/7 -- To view, visit http://gerrit.cloudera.org:8080/12009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I06cc20b052a3a66535a171c36b4b31477c0ba6d0 Gerrit-Change-Number: 12009 Gerrit-PatchSet: 7 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers
[Impala-ASF-CR] IMPALA-7905: Hive keywords not quoted for identifiers
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12009 ) Change subject: IMPALA-7905: Hive keywords not quoted for identifiers .. Patch Set 6: Rebased on latest master to resolve merge conflict. Added a trivial fix for IMPALA-8051: using a comment character in a column name. Please review the fix and rebase and renew your approval if you agree. -- To view, visit http://gerrit.cloudera.org:8080/12009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I06cc20b052a3a66535a171c36b4b31477c0ba6d0 Gerrit-Change-Number: 12009 Gerrit-PatchSet: 6 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Comment-Date: Tue, 08 Jan 2019 00:17:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6503: Support reading complex types from ORC format files
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12168 to look at the new patch set (#7). Change subject: IMPALA-6503: Support reading complex types from ORC format files .. IMPALA-6503: Support reading complex types from ORC format files We’ve supported reading primitive types from ORC files (IMPALA-5717). In this patch we add support for complex types (struct/array/map). In IMPALA-5717, we depend on the ORC lib to read ORC binaries. The ORC lib can materialize ORC column binaries into its representation (orc::ColumnVectorBatch), so we don’t need to do anything about decoding/decompression in hdfs-orc-scanner. Since it already supports complex types, we’ll still depend on it. What we need to add in IMPALA-6503 are two things: 1. Specify which nested columns we need to the ORC lib 2. Transform outputs of ORC lib (nested orc::ColumnVectorBatch) into Impala’s representation To format the materialization, we implement several ORC column readers used in hdfs-orc-scanner. Each kind of reader treats a column type. Don’t like the Parquet readers (used in hdfs-parquet-scanner) which materializes Parquet column binaries into tuple values directly, the ORC readers (in hdfs-orc-scanner) just need to transform outputs of the ORC lib into tuple/slot values. Tests: * Enable existing tests for complex types (test_nested_types.py, test_tpch_nested_queries.py) for ORC. Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790 --- M be/src/exec/CMakeLists.txt M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-orc-scanner.h A be/src/exec/orc-column-readers.cc A be/src/exec/orc-column-readers.h A be/src/exec/orc-metadata-utils.cc A be/src/exec/orc-metadata-utils.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java A testdata/ComplexTypesTbl/README A testdata/ComplexTypesTbl/nonnullable.orc A testdata/ComplexTypesTbl/nullable.orc M testdata/bin/create-load-data.sh M testdata/bin/load_nested.py M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/complex-types-file-formats.test M testdata/workloads/functional-query/queries/QueryTest/max-nesting-depth.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch-limit.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch-mem-limit.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch.test M testdata/workloads/tpch_nested/tpch_nested_core.csv M testdata/workloads/tpch_nested/tpch_nested_dimensions.csv M testdata/workloads/tpch_nested/tpch_nested_exhaustive.csv M testdata/workloads/tpch_nested/tpch_nested_pairwise.csv M tests/query_test/test_nested_types.py M tests/query_test/test_tpch_nested_queries.py 29 files changed, 1,685 insertions(+), 444 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/12168/7 -- To view, visit http://gerrit.cloudera.org:8080/12168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790 Gerrit-Change-Number: 12168 Gerrit-PatchSet: 7 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8041, Part 2: Refactor SELECT list
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12144 ) Change subject: IMPALA-8041, Part 2: Refactor SELECT list .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1727/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5e173dad717e739c097f1f0cd86a0352648ff886 Gerrit-Change-Number: 12144 Gerrit-PatchSet: 2 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Comment-Date: Mon, 07 Jan 2019 23:15:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8041, Part 2: Refactor SELECT list
Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/12144 ) Change subject: IMPALA-8041, Part 2: Refactor SELECT list .. Patch Set 2: Rebased on master. Fixed code style issues. -- To view, visit http://gerrit.cloudera.org:8080/12144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5e173dad717e739c097f1f0cd86a0352648ff886 Gerrit-Change-Number: 12144 Gerrit-PatchSet: 2 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Comment-Date: Mon, 07 Jan 2019 22:42:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8041, Part 2: Refactor SELECT list
Hello Bharath Vissapragada, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12144 to look at the new patch set (#2). Change subject: IMPALA-8041, Part 2: Refactor SELECT list .. IMPALA-8041, Part 2: Refactor SELECT list Another step toward combining analysis and rewrites. This step refactors the SELECT list to: * Enable each select list item to hold its "source" (before analysis and rewrite) SQL. * Split the select list item into two classes: one for the wildcard (which holds no expresion) and the other for actual select expressions. * Moves some analysis steps into the select list item itself. * Cleans up a few other minor items. Tests: No functional change, just refactoring. Reran all FE tests. Change-Id: I5e173dad717e739c097f1f0cd86a0352648ff886 --- M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AbstractExpression.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java M fe/src/main/java/org/apache/impala/analysis/SelectList.java M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java 12 files changed, 300 insertions(+), 133 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/12144/2 -- To view, visit http://gerrit.cloudera.org:8080/12144 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5e173dad717e739c097f1f0cd86a0352648ff886 Gerrit-Change-Number: 12144 Gerrit-PatchSet: 2 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers
[Impala-ASF-CR] IMPALA-7625: test web pages.py backend tests are failing
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12024 ) Change subject: IMPALA-7625: test_web_pages.py backend tests are failing .. IMPALA-7625: test_web_pages.py backend tests are failing The backend tests in test_web_page.py do some basic validation of the query_backends and query_finstances endpoints. The tests were failing due to a race condition in the test itself. The issue is that the backend endpoints are only usable while a query is running. When queried against completed queries, the backend endpoints return nothing. The original test worked by running a "complex" query asynchronously and then querying the backend endpoints before the query completed. The assumption was that the query would run long enough for the request to the backend endpoint to complete. However, that is not always the case and the query easily completes before the backend endpoints can be reached. This patch updates the test so that instead of running a "complex" query, it runs a query that calls the sleep UDF, guaranteeing the query will take enough time for the backend endpoints to be reached. Furthermore, the patch adds additional validation and cleanup of all the backend endpoint tests. Testing: Ran core tests. Change-Id: I2092afe1bfaec30cd3e7b0040f06865e43fe62fb Reviewed-on: http://gerrit.cloudera.org:8080/12024 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M tests/common/impala_test_suite.py M tests/custom_cluster/test_admission_controller.py M tests/webserver/test_web_pages.py 3 files changed, 83 insertions(+), 48 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/12024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I2092afe1bfaec30cd3e7b0040f06865e43fe62fb Gerrit-Change-Number: 12024 Gerrit-PatchSet: 7 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7625: test web pages.py backend tests are failing
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12024 ) Change subject: IMPALA-7625: test_web_pages.py backend tests are failing .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/12024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2092afe1bfaec30cd3e7b0040f06865e43fe62fb Gerrit-Change-Number: 12024 Gerrit-PatchSet: 6 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Jan 2019 22:05:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8047 Support .proto files in .clang-format
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12165 ) Change subject: IMPALA-8047 Support .proto files in .clang-format .. IMPALA-8047 Support .proto files in .clang-format The .proto file extension is used for the Google Protocol Buffers language. Impala uses this language to specify the format of messages used by KRPC. Add support for this language to .clang-format so that we can have consistent formatting. The proposed support is: Language: Proto BasedOnStyle: Google ColumnLimit: 90 This produces only a few diffs when run against the existing Impala code. I’m not proposing to make any changes to .proto files, this is just to show what clang-format will do. Apart from wrapping comments and code at 90 chars, the diffs are mostly of the form -syntax="proto2"; +syntax = "proto2"; - message Certificate {}; + message Certificate { + }; - optional bool client_timeout_defined = 4 [ default = false ]; + optional bool client_timeout_defined = 4 [default = false]; -UNKNOWN= 999; -NEGOTIATE = 1; -SASL_SUCCESS = 0; -SASL_INITIATE = 2; +UNKNOWN = 999; +NEGOTIATE = 1; +SASL_SUCCESS = 0; +SASL_INITIATE = 2; This last change can be configured using “AlignConsecutiveAssignments: true” but that creates a different set of diffs. Change-Id: I0c2dcfc21fc8f9206adb64166fbd05580516791f Reviewed-on: http://gerrit.cloudera.org:8080/12165 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M .clang-format 1 file changed, 4 insertions(+), 0 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/12165 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I0c2dcfc21fc8f9206adb64166fbd05580516791f Gerrit-Change-Number: 12165 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-8047 Support .proto files in .clang-format
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12165 ) Change subject: IMPALA-8047 Support .proto files in .clang-format .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/12165 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0c2dcfc21fc8f9206adb64166fbd05580516791f Gerrit-Change-Number: 12165 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Jan 2019 21:55:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7087: Read Parquet decimal columns with lower precision/scale
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/12163 ) Change subject: IMPALA-7087: Read Parquet decimal columns with lower precision/scale .. Patch Set 1: > > (1 comment) > > > > Thanks for taking a look @Csaba! I've only responded to your > > comment on overflow handling so far, as I want to get that > behavior > > locked down first. > > Are you sure that the query will be aborted - isn't it only one > warning / file? If there are "normal" files processed before the > one with overflows, then some rows will be returned before running > to the error. In the current implementation it is also possible > that some rows are returned from the file before reaching the > overflowing value. > > I am also not completely sure about the best approach. > > I would prefer the error to be non-fatal, so to print one > warning/file and continue the query. > > If no important use case is lost this way, then I would prefer to > use the formula and skip files based on metadata instead of > checking every row. > > This would make it possible to print a useful error message, so > that the user will know the precision needed to process the file, > and can call ALTER TABLE to increase precision / decrease scale and > make the file usable. If the error is printed based the first row, > then this can become a long iterative process. Thats a fair point. It is throwing an error. The last test in parquet-decimal-precision-and-scale-widening.test has a `CATCH` section, so if I understand the logic in ImpalaTestSuite#run_test_case then that means the query threw an exception. I agree that its possible for some rows to be returned before this error is throw, but I guess thats an issue with many runtime errors in Impala. I'm good with printing a non-fatal error and setting the row to NULL. This is what Hive does, so we will be consistent with Hive. Using the formula will prevent overflows from happening in the first place, so it somewhat orthogonal to the discussion of error handling. The only drawback I can see to using the formula you suggested is that it is not exactly SQL Standard compliant. It's entirely possible that the formula prevents a query from running, even though none of the values in the table would cause an overflow. The standard says that values should be rounded or truncated to the match target precision / scale, but I guess it doesn't explicitly mention overflows so its hard to say. I think that following Hive's behavior might be best here. I've added @Lars to the review to see what he thinks. -- To view, visit http://gerrit.cloudera.org:8080/12163 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iafc8efd12379a39756e3e70f022a81a636dadb61 Gerrit-Change-Number: 12163 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Mon, 07 Jan 2019 21:32:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8026: Fix #rows accounting for NLJ
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12164 ) Change subject: IMPALA-8026: Fix #rows accounting for NLJ .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12164 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I43e2e1d8f85478ff36c8cddc69bac3bdccd2c824 Gerrit-Change-Number: 12164 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Mon, 07 Jan 2019 21:12:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8026: Fix #rows accounting for NLJ
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12164 ) Change subject: IMPALA-8026: Fix #rows accounting for NLJ .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3618/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/12164 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I43e2e1d8f85478ff36c8cddc69bac3bdccd2c824 Gerrit-Change-Number: 12164 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Mon, 07 Jan 2019 21:12:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7446: enable buffer pool GC when near process mem limit
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/12133 ) Change subject: IMPALA-7446: enable buffer pool GC when near process mem limit .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/12133/4/be/src/runtime/bufferpool/buffer-pool-test.cc File be/src/runtime/bufferpool/buffer-pool-test.cc: http://gerrit.cloudera.org:8080/#/c/12133/4/be/src/runtime/bufferpool/buffer-pool-test.cc@2260 PS4, Line 2260: // Make sure we have a process memory tracker that uses TCMalloc metrics. nit: maybe explain here briefly why those metrics are necessary. Its not obvious unless you read the JIRA or the patch and know which code-path this is trying to hit http://gerrit.cloudera.org:8080/#/c/12133/4/be/src/runtime/bufferpool/buffer-pool-test.cc@2277 PS4, Line 2277: ASSERT_OK(client.DecreaseReservationTo(numeric_limits::max(), 0)); nit: should we add a check here to see that the relevant Buffer Pool metrics are in the required state (FREE_BUFFER_BYTES, CLEAN_PAGE_BYTES and UNUSED_RESERVATION_BYTES), to make sure the right code path is hit http://gerrit.cloudera.org:8080/#/c/12133/4/be/src/runtime/exec-env.h File be/src/runtime/exec-env.h: http://gerrit.cloudera.org:8080/#/c/12133/4/be/src/runtime/exec-env.h@254 PS4, Line 254: Must be called after : /// InitBufferPool(). and RegisterMemoryMetrics() http://gerrit.cloudera.org:8080/#/c/12133/4/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: http://gerrit.cloudera.org:8080/#/c/12133/4/be/src/runtime/exec-env.cc@436 PS4, Line 436: DCHECK(AggregateMemoryMetrics::TOTAL_USED != nullptr); nit: DCHECK(AggregateMemoryMetrics::TOTAL_USED != nullptr) << "Mem metrics not registered."; -- To view, visit http://gerrit.cloudera.org:8080/12133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I81e8e29f1ba319f1b499032f9518d32c511b4b21 Gerrit-Change-Number: 12133 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Bikramjeet Vig Gerrit-Comment-Date: Mon, 07 Jan 2019 21:07:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8026: Fix #rows accounting for NLJ
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/12164 ) Change subject: IMPALA-8026: Fix #rows accounting for NLJ .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12164 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I43e2e1d8f85478ff36c8cddc69bac3bdccd2c824 Gerrit-Change-Number: 12164 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Mon, 07 Jan 2019 21:02:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7468: Port CancelQueryFInstances() to KRPC.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12142 ) Change subject: IMPALA-7468: Port CancelQueryFInstances() to KRPC. .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3617/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/12142 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I625030c3f1068061aa029e6e242f016cadd84969 Gerrit-Change-Number: 12142 Gerrit-PatchSet: 5 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Mon, 07 Jan 2019 21:02:19 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7468: Port CancelQueryFInstances() to KRPC.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12142 ) Change subject: IMPALA-7468: Port CancelQueryFInstances() to KRPC. .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12142 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I625030c3f1068061aa029e6e242f016cadd84969 Gerrit-Change-Number: 12142 Gerrit-PatchSet: 5 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Mon, 07 Jan 2019 21:02:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7468: Port CancelQueryFInstances() to KRPC.
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/12142 ) Change subject: IMPALA-7468: Port CancelQueryFInstances() to KRPC. .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12142 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I625030c3f1068061aa029e6e242f016cadd84969 Gerrit-Change-Number: 12142 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Mon, 07 Jan 2019 20:54:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6533: Add min-max filter for decimal types on kudu tables.
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/12113 ) Change subject: IMPALA-6533: Add min-max filter for decimal types on kudu tables. .. Patch Set 14: (4 comments) http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/runtime/decimal-value.h File be/src/runtime/decimal-value.h: http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/runtime/decimal-value.h@199 PS14, Line 199: value_ nit: /// nit: 'value_' (i.e. quotes around variable names) More importantly, I don't understand this comment - eg. what is "the receiving end" in this context? I assume you're thinking in the context of this patch, that a DecimalMinMaxFilter will be reconstructed after being sent over the network using this function, but the function is more general so I think it makes more sense to have a more general comment. Its also generally nice to specifically mention the use of the out parameter. Maybe something like: /// Store the binary representation of this DecimalValue in 'tvalue'. http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/util/min-max-filter-ir.cc File be/src/util/min-max-filter-ir.cc: http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/util/min-max-filter-ir.cc@95 PS14, Line 95: switch (size_) { So this is pretty perf-sensitive code (executed once per row on the build side of joins), and since 'size_' is a constant we really don't want to be doing this switch here. There are at least two ways to fix this: - Use codegen, eg. introduce a MinMaxFilter::CodegenInsert() which returns an llvm::Function. For other types, this would just call codegen->GetFunction(GetInsertIRFunctionType()) but for DecimalMinMaxFilter it could replace this with a constant (eg. by having a function GetSize() here and using ReplaceCallSites()), then call this new function in FilterContext::CodegenInsert(). - Use macros. eg. have separate classes for Decimal4MinMaxFilter, Decimal8MinMaxFilter, and Decimal16MinMaxFilter generated with macros just as we do for NUMERIC_MIN_MAX_FILTER, and return an object of the appropriate type from MinMaxFilter::Create() I have a strong preference for the macros option as codegen can be tricky and I think using macros will be more efficient (eg. because it doesn't add to codegen time, eliminates switching in other function like DecimalMinMaxFilter::Or()/ToThrift()) http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/util/min-max-filter.h File be/src/util/min-max-filter.h: http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/util/min-max-filter.h@298 PS14, Line 298: static int GetSize(int precision) { I think that you can just use ColumnType::GetDecimalByteSize() instead of defining this function yourself. http://gerrit.cloudera.org:8080/#/c/12113/13/tests/query_test/test_runtime_filters.py File tests/query_test/test_runtime_filters.py: http://gerrit.cloudera.org:8080/#/c/12113/13/tests/query_test/test_runtime_filters.py@116 PS13, Line 116: self.run_test_case('QueryTest/decimal_min_max_filters', vector) > How do I find out the time it takes to run? Hmm, alright. It doesn't make much sense to me to split the decimal-related tests based on the join type (and my guess is that isn't really what Tim had in mind anyways), so my recommendation is to move all of the decimal-specific test cases into this decimal-specific file (eg. test cases 5+), and leave the decimal stuff that you added to non-decimal-specific test cases in min_max_filters.test (eg. the stuff in test case 1). Then probably have test_decimal_min_max_filters run only in exhaustive (based on a combination of the long running time and the assumption that the tests in case 1 combined with various tests elsewhere that address different join types cover most of this well enough to catch most issues, and the specific things being tested by your join-type-specific decimal tests are unlikely to break regularly). -- To view, visit http://gerrit.cloudera.org:8080/12113 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib7e7278e902160d7060f8097290bc172d9031f94 Gerrit-Change-Number: 12113 Gerrit-PatchSet: 14 Gerrit-Owner: Janaki Lahorani Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Janaki Lahorani Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Jan 2019 19:42:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7979: Enhance decoders to support value-skipping
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12172 ) Change subject: IMPALA-7979: Enhance decoders to support value-skipping .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1726/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12172 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib848f1bd71735fe84e8064daf700417b32589f57 Gerrit-Change-Number: 12172 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Jan 2019 19:13:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7468: Port CancelQueryFInstances() to KRPC.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12142 ) Change subject: IMPALA-7468: Port CancelQueryFInstances() to KRPC. .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1725/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12142 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I625030c3f1068061aa029e6e242f016cadd84969 Gerrit-Change-Number: 12142 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Mon, 07 Jan 2019 19:12:17 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: IMPALA-5843: Use page index in Parquet files to skip pages
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12065 ) Change subject: WIP: IMPALA-5843: Use page index in Parquet files to skip pages .. Patch Set 1: (7 comments) Thanks for the comments. This time I only reply to the ones related to the decoders. I started a new patch that only focus on the decoders and value-skipping: https://gerrit.cloudera.org/#/c/12172/ http://gerrit.cloudera.org:8080/#/c/12065/1/be/src/exec/parquet/parquet-bool-decoder.cc File be/src/exec/parquet/parquet-bool-decoder.cc: PS1: > Might be worth pulling out this code change to a separate patch and adding Good idea, thanks! http://gerrit.cloudera.org:8080/#/c/12065/1/be/src/exec/parquet/parquet-bool-decoder.cc@70 PS1, Line 70: int skip_cached = min(num_unpacked_values_ - unpacked_value_idx_, num_values); > Maybe add DCHECK_GT(num_values, 0). Done http://gerrit.cloudera.org:8080/#/c/12065/1/be/src/exec/parquet/parquet-bool-decoder.cc@86 PS1, Line 86: } else { > The else branch seems inconsistent with ParquetBoolDecoder::DecodeValue, wh I'm not sure I follow. When we skip values we try to avoid decoding and unpacking whenever possible. Am I missing something? http://gerrit.cloudera.org:8080/#/c/12065/1/be/src/util/rle-encoding.h File be/src/util/rle-encoding.h: http://gerrit.cloudera.org:8080/#/c/12065/1/be/src/util/rle-encoding.h@636 PS1, Line 636: DCHECK_GE(num_values, 0); > Does skipping 0 values make sense? Changed to GT http://gerrit.cloudera.org:8080/#/c/12065/1/be/src/util/rle-encoding.h@686 PS1, Line 686: inline bool RleBatchDecoder::SkipLiteralValues( : int32_t num_literals_to_skip) { > nit: could fit to one line Done http://gerrit.cloudera.org:8080/#/c/12065/1/be/src/util/rle-encoding.h@688 PS1, Line 688: DCHECK_GE(num_literals_to_skip, 0); > Does skipping 0 values make sense? Changed to GT http://gerrit.cloudera.org:8080/#/c/12065/1/be/src/util/rle-encoding.h@698 PS1, Line 698: // run avoid ending on a non-byte boundary. > nit: missing 'to' Done -- To view, visit http://gerrit.cloudera.org:8080/12065 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a Gerrit-Change-Number: 12065 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 07 Jan 2019 18:44:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7979: Enhance decoders to support value-skipping
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12172 Change subject: IMPALA-7979: Enhance decoders to support value-skipping .. IMPALA-7979: Enhance decoders to support value-skipping This commit adds value-skipping functionality to the decoders. Value-skipping is useful when we know that we won't need the following N values, so instead of decoding and dropping them, we can just "jump" through them. This feature is a prerequisite for Parquet page skipping (IMPALA-5843). I added backend tests for all the decoders. Change-Id: Ib848f1bd71735fe84e8064daf700417b32589f57 --- M be/src/exec/parquet/CMakeLists.txt A be/src/exec/parquet/parquet-bool-decoder-test.cc M be/src/exec/parquet/parquet-bool-decoder.cc M be/src/exec/parquet/parquet-bool-decoder.h M be/src/util/bit-stream-utils.h M be/src/util/bit-stream-utils.inline.h M be/src/util/dict-encoding.h M be/src/util/dict-test.cc M be/src/util/rle-encoding.h M be/src/util/rle-test.cc 10 files changed, 515 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/12172/1 -- To view, visit http://gerrit.cloudera.org:8080/12172 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ib848f1bd71735fe84e8064daf700417b32589f57 Gerrit-Change-Number: 12172 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-7468: Port CancelQueryFInstances() to KRPC.
Andrew Sherman has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/12142 ) Change subject: IMPALA-7468: Port CancelQueryFInstances() to KRPC. .. IMPALA-7468: Port CancelQueryFInstances() to KRPC. When the Coordinator needs to cancel a query (for example because a user has hit Control-C), it does this by sending a CancelQueryFInstances message to each fragment instance. This change switches this code to use KRPC. Add new protobuf definitions for the messages, and remove the old thrift definitions. Move the server-side implementation of Cancel() from ImpalaInternalService to ControlService. Rework the scheduler so that the FInstanceExecParams always contains the KRPC address of the fragment executors, this address can then be used if a query is to be cancelled. For now keep the KRPC calls to CancelQueryFInstances() as synchronous. While moving the client-side code, remove the fault injection code that was inserted with FAULT_INJECTION_SEND_RPC_EXCEPTION and FAULT_INJECTION_RECV_RPC_EXCEPTION (triggered by running impalad with --fault_injection_rpc_exception_type=1) as this tickles code in client-cache.h which is now not used. TESTING: Ran all end-to-end tests. No new tests as test_cancellation.py provides good coverage. Checked in debugger that DebugAction style fault injection (triggered from test_cancellation.py) was working correctly. Change-Id: I625030c3f1068061aa029e6e242f016cadd84969 --- M be/src/runtime/backend-client.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator-backend-state.h M be/src/scheduling/query-schedule.h M be/src/scheduling/scheduler.cc M be/src/scheduling/scheduler.h M be/src/service/control-service.cc M be/src/service/control-service.h M be/src/service/impala-internal-service.cc M be/src/service/impala-internal-service.h M common/protobuf/control_service.proto M common/thrift/ImpalaInternalService.thrift 12 files changed, 192 insertions(+), 139 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/12142/4 -- To view, visit http://gerrit.cloudera.org:8080/12142 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I625030c3f1068061aa029e6e242f016cadd84969 Gerrit-Change-Number: 12142 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-7468: Port CancelQueryFInstances() to KRPC.
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/12142 ) Change subject: IMPALA-7468: Port CancelQueryFInstances() to KRPC. .. Patch Set 3: (11 comments) Thanks for the code reviews http://gerrit.cloudera.org:8080/#/c/12142/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12142/3//COMMIT_MSG@9 PS3, Line 9: (usually because a user has : hit Control-C) > Not that important, but is this really true? Some other cancellation situat Thanks good point, "usually" is definitely wrong. http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/runtime/coordinator-backend-state.h@239 PS3, Line 239: Execution backend. > Thrift execution backend? Done http://gerrit.cloudera.org:8080/#/c/12142/1/be/src/runtime/coordinator-backend-state.h File be/src/runtime/coordinator-backend-state.h: http://gerrit.cloudera.org:8080/#/c/12142/1/be/src/runtime/coordinator-backend-state.h@294 PS1, Line 294: Status DoCancelQueryFInstancesRrpcWithRetry( > I believe our current clang-format rules will tell you to put the symbols n Done http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/runtime/coordinator-backend-state.cc@427 PS3, Line 427: auto > We generally avoid the use of auto as it hurts readability. Done http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/runtime/coordinator-backend-state.cc@440 PS3, Line 440: DoCancelQueryFInstancesRrpcWithRetry > Is there a reasonable way to make this generic like DoRpcWithRetry()? We'll I'm doing retry using a lambda http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/scheduling/query-schedule.h File be/src/scheduling/query-schedule.h: http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/scheduling/query-schedule.h@88 PS3, Line 88: execution backend > Thrift execution backend? Done http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/service/control-service.h File be/src/service/control-service.h: http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/service/control-service.h@41 PS3, Line 41: public: > Not your code, but while you're here would you mind fixing the spacing in t Done http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/service/control-service.h@63 PS3, Line 63: /// Cancel any executing fragment instances for the query id specified in 'req'. > nit: could you move this under ReportExecStatus() to keep all of the rpc ha Done http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/service/control-service.h@64 PS3, Line 64: void > virtual Done http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/service/control-service.cc File be/src/service/control-service.cc: http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/service/control-service.cc@173 PS3, Line 173: "query" > Since this is a constant, its not really necessary to Substitute() it. Done http://gerrit.cloudera.org:8080/#/c/12142/3/common/protobuf/control_service.proto File common/protobuf/control_service.proto: http://gerrit.cloudera.org:8080/#/c/12142/3/common/protobuf/control_service.proto@181 PS3, Line 181: rpc CancelQueryFInstances(CancelQueryFInstancesRequestPB) returns (CancelQueryFInstancesResponsePB); > Long line. Done, I added IMPALA-8047 for the clang-format support -- To view, visit http://gerrit.cloudera.org:8080/12142 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I625030c3f1068061aa029e6e242f016cadd84969 Gerrit-Change-Number: 12142 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Mon, 07 Jan 2019 18:36:44 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7087: Read Parquet decimal columns with lower precision/scale
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12163 ) Change subject: IMPALA-7087: Read Parquet decimal columns with lower precision/scale .. Patch Set 1: > (1 comment) > > Thanks for taking a look @Csaba! I've only responded to your > comment on overflow handling so far, as I want to get that behavior > locked down first. Are you sure that the query will be aborted - isn't it only one warning / file? If there are "normal" files processed before the one with overflows, then some rows will be returned before running to the error. In the current implementation it is also possible that some rows are returned from the file before reaching the overflowing value. I am also not completely sure about the best approach. I would prefer the error to be non-fatal, so to print one warning/file and continue the query. If no important use case is lost this way, then I would prefer to use the formula and skip files based on metadata instead of checking every row. This would make it possible to print a useful error message, so that the user will know the precision needed to process the file, and can call ALTER TABLE to increase precision / decrease scale and make the file usable. If the error is printed based the first row, then this can become a long iterative process. -- To view, visit http://gerrit.cloudera.org:8080/12163 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iafc8efd12379a39756e3e70f022a81a636dadb61 Gerrit-Change-Number: 12163 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Mon, 07 Jan 2019 18:22:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7924: Generate Thrift 11 Python Code
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12036 ) Change subject: IMPALA-7924: Generate Thrift 11 Python Code .. Patch Set 2: (2 comments) LGTM, just one ask about the commit message then I can start GVD http://gerrit.cloudera.org:8080/#/c/12036/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12036/2//COMMIT_MSG@13 PS2, Line 13: The Thrift 11 Python deserialization code has some big performance Might be worth briefly mentioning how to take advantage of it - does one just set PYTHONPATH to include it manually? http://gerrit.cloudera.org:8080/#/c/12036/2/common/thrift/CMakeLists.txt File common/thrift/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/12036/2/common/thrift/CMakeLists.txt@a230 PS2, Line 230: > thrift-deps depends on thrift11-py, which depends on thrift-ext-data-src. M Ah I missed that. I suppose that works. -- To view, visit http://gerrit.cloudera.org:8080/12036 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3432c3e29d28ec3ef6a0a22156a18910f511fed0 Gerrit-Change-Number: 12036 Gerrit-PatchSet: 2 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Jan 2019 18:22:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7924: Generate Thrift 11 Python Code
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/12036 ) Change subject: IMPALA-7924: Generate Thrift 11 Python Code .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/12036/2/common/thrift/CMakeLists.txt File common/thrift/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/12036/2/common/thrift/CMakeLists.txt@a230 PS2, Line 230: > Did you mean to remove this as a dependency? thrift-deps depends on thrift11-py, which depends on thrift-ext-data-src. My understanding is thats its basically a chain of dependencies, so the intention was to just add thrift11-py as a node in that chain. But I might be missing something, pretty new to CMake :) -- To view, visit http://gerrit.cloudera.org:8080/12036 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3432c3e29d28ec3ef6a0a22156a18910f511fed0 Gerrit-Change-Number: 12036 Gerrit-PatchSet: 2 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Jan 2019 18:12:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7917 (Part 1): Decouple Sentry from Impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12020 ) Change subject: IMPALA-7917 (Part 1): Decouple Sentry from Impala .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1724/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If1fd1df0b38ddd7cfa41299e95f5827f8a9e9c1f Gerrit-Change-Number: 12020 Gerrit-PatchSet: 7 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 07 Jan 2019 18:09:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7625: test web pages.py backend tests are failing
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12024 ) Change subject: IMPALA-7625: test_web_pages.py backend tests are failing .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2092afe1bfaec30cd3e7b0040f06865e43fe62fb Gerrit-Change-Number: 12024 Gerrit-PatchSet: 6 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Jan 2019 18:09:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7625: test web pages.py backend tests are failing
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12024 ) Change subject: IMPALA-7625: test_web_pages.py backend tests are failing .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3616/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/12024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2092afe1bfaec30cd3e7b0040f06865e43fe62fb Gerrit-Change-Number: 12024 Gerrit-PatchSet: 6 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Jan 2019 18:09:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7625: test web pages.py backend tests are failing
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12024 ) Change subject: IMPALA-7625: test_web_pages.py backend tests are failing .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2092afe1bfaec30cd3e7b0040f06865e43fe62fb Gerrit-Change-Number: 12024 Gerrit-PatchSet: 5 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Jan 2019 18:09:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7924: Generate Thrift 11 Python Code
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12036 ) Change subject: IMPALA-7924: Generate Thrift 11 Python Code .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/12036/2/common/thrift/CMakeLists.txt File common/thrift/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/12036/2/common/thrift/CMakeLists.txt@a230 PS2, Line 230: Did you mean to remove this as a dependency? -- To view, visit http://gerrit.cloudera.org:8080/12036 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3432c3e29d28ec3ef6a0a22156a18910f511fed0 Gerrit-Change-Number: 12036 Gerrit-PatchSet: 2 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Jan 2019 18:05:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8047 Support .proto files in .clang-format
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12165 ) Change subject: IMPALA-8047 Support .proto files in .clang-format .. Patch Set 2: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/12165/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12165/2//COMMIT_MSG@20 PS2, Line 20: This produces only a few diffs when run against the existing Impala Thanks for including this, made it way easier to review. -- To view, visit http://gerrit.cloudera.org:8080/12165 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0c2dcfc21fc8f9206adb64166fbd05580516791f Gerrit-Change-Number: 12165 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Jan 2019 17:57:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8047 Support .proto files in .clang-format
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12165 ) Change subject: IMPALA-8047 Support .proto files in .clang-format .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12165 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0c2dcfc21fc8f9206adb64166fbd05580516791f Gerrit-Change-Number: 12165 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Jan 2019 17:57:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8047 Support .proto files in .clang-format
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12165 ) Change subject: IMPALA-8047 Support .proto files in .clang-format .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3615/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/12165 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0c2dcfc21fc8f9206adb64166fbd05580516791f Gerrit-Change-Number: 12165 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Jan 2019 17:57:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6742: Profiles of running queries should include execution summary.
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11591 ) Change subject: IMPALA-6742: Profiles of running queries should include execution summary. .. IMPALA-6742: Profiles of running queries should include execution summary. Currently execution summary is not included in the profiles of running queries, and it's only reported when the query is finished. This jira makes the execution summary to the profile reported when queries are still running. Testing: Added unit test. Done with real cluster. Change-Id: Idc7f714c9427d4b26d4e78cf27ceca2b0b336699 Reviewed-on: http://gerrit.cloudera.org:8080/11591 Reviewed-by: Joe McDonnell Tested-by: Impala Public Jenkins --- M be/src/service/impala-server.cc M be/src/service/impala-server.h M tests/query_test/test_observability.py 3 files changed, 42 insertions(+), 7 deletions(-) Approvals: Joe McDonnell: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/11591 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Idc7f714c9427d4b26d4e78cf27ceca2b0b336699 Gerrit-Change-Number: 11591 Gerrit-PatchSet: 6 Gerrit-Owner: Yongjun Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Yongjun Zhang
[Impala-ASF-CR] IMPALA-8021: Add estimated cardinality to EXPLAIN output
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12136 ) Change subject: IMPALA-8021: Add estimated cardinality to EXPLAIN output .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/test/java/org/apache/impala/testutil/TestUtils.java File fe/src/test/java/org/apache/impala/testutil/TestUtils.java: http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/test/java/org/apache/impala/testutil/TestUtils.java@207 PS12, Line 207: If found, then parse the values and compare them. Report a match :* if the values are within 5%, else report a mismatch. > Actually, looking at this again, it appears that we are just comparing the Yeah I'm not sure if we expect the cardinality estimates to be exactly reproducible from run-to-run. I can't think of a reason why they wouldn't be exactly equal - the collected stats should always be the same (there's no randomness or non-determinism in the cardinality there AFAIK). There are some other cases where there's variation in the exact numbers produced by the planner - memory reservations for scans depend on the file size, which can vary over time slightly for various reasons. I figured that having this tolerance factor didn't significantly weaken the tests (especially since we didn't even have cardinality before) and would alleviate some amount of labour if estimates change marginally. -- To view, visit http://gerrit.cloudera.org:8080/12136 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986 Gerrit-Change-Number: 12136 Gerrit-PatchSet: 12 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 07 Jan 2019 17:42:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7917 (Part 1): Decouple Sentry from Impala
Fredy Wijaya has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/12020 ) Change subject: IMPALA-7917 (Part 1): Decouple Sentry from Impala .. IMPALA-7917 (Part 1): Decouple Sentry from Impala The first part of this patch is to provide an initial work to decouple Sentry from Impala by creating a generic authorization provider interface that Sentry implements. The idea is to allow more authorization providers in the future. The patch updates the following: - Renamed Authorizeable to Authorizable to fix typographical error. - Moved any clases that uses Sentry specific code to org.apache.impala.authorization.sentry package and created interfaces when necessary. - Moved all generic authorization related classes to org.apache.impala.authorization package. - Minor clean up on authorization related code. In this patch, switching the authorization provider implementation still requires updating the code in many different places. A follow up patch will make it easy to switch an authorization provider implementation. Testing: - Ran all FE tests - Ran all E2E authorization tests Change-Id: If1fd1df0b38ddd7cfa41299e95f5827f8a9e9c1f --- M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java M fe/src/main/java/org/apache/impala/analysis/DropFunctionStmt.java M fe/src/main/java/org/apache/impala/analysis/HdfsUri.java M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java R fe/src/main/java/org/apache/impala/authorization/Authorizable.java M fe/src/main/java/org/apache/impala/authorization/AuthorizationChecker.java M fe/src/main/java/org/apache/impala/authorization/AuthorizationConfig.java R fe/src/main/java/org/apache/impala/authorization/AuthorizationException.java C fe/src/main/java/org/apache/impala/authorization/AuthorizationPolicyReaderException.java R fe/src/main/java/org/apache/impala/authorization/AuthorizationProxy.java D fe/src/main/java/org/apache/impala/authorization/AuthorizeableUri.java A fe/src/main/java/org/apache/impala/authorization/Authorizer.java R fe/src/main/java/org/apache/impala/authorization/AuthorizerUnavailableException.java M fe/src/main/java/org/apache/impala/authorization/Privilege.java M fe/src/main/java/org/apache/impala/authorization/PrivilegeRequest.java M fe/src/main/java/org/apache/impala/authorization/PrivilegeRequestBuilder.java A fe/src/main/java/org/apache/impala/authorization/sentry/ImpalaAction.java R fe/src/main/java/org/apache/impala/authorization/sentry/ImpalaActionFactory.java R fe/src/main/java/org/apache/impala/authorization/sentry/ImpalaPrivilegeModel.java R fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthProvider.java A fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizable.java R fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizableColumn.java R fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizableDb.java R fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizableFn.java R fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizableServer.java R fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizableTable.java A fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizableUri.java A fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationChecker.java A fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationConfig.java A fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationPolicy.java R fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizer.java R fe/src/main/java/org/apache/impala/authorization/sentry/SentryConfig.java R fe/src/main/java/org/apache/impala/authorization/sentry/SentryUtil.java M fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java D fe/src/main/java/org/apache/impala/common/SentryPolicyReaderException.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/service/Frontend.java M fe/src/main/java/org/apache/impala/service/JniCatalog.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/test/java/org/apache/impala/analysis/AnalysisSessionFixture.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeAuthStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AuditingTest.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M
[Impala-ASF-CR] IMPALA-7917 (Part 1): Decouple Sentry from Impala
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/12020 ) Change subject: IMPALA-7917 (Part 1): Decouple Sentry from Impala .. Patch Set 7: (7 comments) http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/AuthorizationProxy.java File fe/src/main/java/org/apache/impala/authorization/AuthorizationProxy.java: http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/AuthorizationProxy.java@222 PS5, Line 222: m authoriz > nit: should be lower case Done http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/Privilege.java File fe/src/main/java/org/apache/impala/authorization/Privilege.java: http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/Privilege.java@23 PS5, Line 23: * List of Impala privileges. > not understanding this Done http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/Privilege.java@40 PS5, Line 40: static { > it's sort of odd that you use a static block to set up the 'privileges_' me Java doesn't allow this construct. public enum Privilege { ... VIEW_METADATA(EnumSet.of(INSERT, SELECT, REFRESH), true); // compilation error. Privilege(EnumSet privileges, boolean anyOf) { ... } } http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/Privilege.java@53 PS5, Line 53: Privilege> implied_; > maybe this should be like 'implied_' or something? If I understand correctl Yeah implied is a better name. It basically means if a SQL has VIEW_METADATA privilege, it requires either INSERT, SELECT, or REFRESH. http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/PrivilegeRequest.java File fe/src/main/java/org/apache/impala/authorization/PrivilegeRequest.java: http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/PrivilegeRequest.java@45 PS5, Line 45: > seems to no longer be permitted Done http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/PrivilegeRequestBuilder.java File fe/src/main/java/org/apache/impala/authorization/PrivilegeRequestBuilder.java: http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/PrivilegeRequestBuilder.java@74 PS5, Line 74: /** > worth adding Preconditions.checkState(authorizable_ == null) in all of thes Done http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/sentry/ImpalaAction.java File fe/src/main/java/org/apache/impala/authorization/sentry/ImpalaAction.java: http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/sentry/ImpalaAction.java@48 PS5, Line 48: mpalaAction(String value, int code) { : bitFieldAction_ = new BitFieldAction(value, code); : } > do we still need to support both versions? maybe we can have a TODO to remo We only need to support Sentry 2.0.0. I'll update the code. Done. -- To view, visit http://gerrit.cloudera.org:8080/12020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If1fd1df0b38ddd7cfa41299e95f5827f8a9e9c1f Gerrit-Change-Number: 12020 Gerrit-PatchSet: 7 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 07 Jan 2019 17:34:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7087: Read Parquet decimal columns with lower precision/scale
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/12163 ) Change subject: IMPALA-7087: Read Parquet decimal columns with lower precision/scale .. Patch Set 1: (1 comment) Thanks for taking a look @Csaba! I've only responded to your comment on overflow handling so far, as I want to get that behavior locked down first. http://gerrit.cloudera.org:8080/#/c/12163/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12163/1//COMMIT_MSG@25 PS1, Line 25: If the underlying Parquet files contain rows that cannot be converted to : the higher scale without overflow, an error is thrown and the query : fails. This is different from other engines such as Hive which set the : row to NULL and allow the query to run to completion without error. > I do not know the use case, so I may be wrong, but I dislike the idea of fa Yes, the example you provided would trigger an overflow. The formula you provided sounds like it should work. I couldn't find much info in the SQL standard on how to handle these types of overflows, so maybe the behavior is ambiguous. For relational databases, you get an error when you try to insert a row that causes an overflow. However, the error is upon row insertion, so the table is still queryable after a failed attempt at inserting the row. As mentioned in the comments, Hive sets the row to NULL, and the query runs to completion. So sounds like there are three options: (1) the current behavior in the patch, (2) use the formula you provided to prevent overflows in which case no runtime errors or warnings are necessary, or (3) set the row to NULL and print out a warning (I think we do this for other types of Parquet scanning errors?). FWIW, the current behavior is exactly what you described. If you put a Parquet file into a table with a lower precision / scale, the table will be unqueryable until the files are removed. So if we went with option (2), any file inside the table that doesn't match the formula will make the table unqueryable. The main difference with this patch is that the table will be unqueryable only if certain rows that overflow are present. I don't feel strongly about any of the approaches, and I'm not super familiar with how Impala handles these types of issues, so open to suggestions. -- To view, visit http://gerrit.cloudera.org:8080/12163 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iafc8efd12379a39756e3e70f022a81a636dadb61 Gerrit-Change-Number: 12163 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Mon, 07 Jan 2019 16:26:28 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6503: Support reading complex types from ORC format files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12168 ) Change subject: IMPALA-6503: Support reading complex types from ORC format files .. Patch Set 5: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/1723/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/12168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790 Gerrit-Change-Number: 12168 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 07 Jan 2019 15:28:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7087: Read Parquet decimal columns with lower precision/scale
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/12163 ) Change subject: IMPALA-7087: Read Parquet decimal columns with lower precision/scale .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/12163/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12163/1//COMMIT_MSG@25 PS1, Line 25: If the underlying Parquet files contain rows that cannot be converted to : the higher scale without overflow, an error is thrown and the query : fails. This is different from other engines such as Hive which set the : row to NULL and allow the query to run to completion without error. I do not know the use case, so I may be wrong, but I dislike the idea of failing the query if the Parquet file is not corrupt and only the data is "too large". If a file with such data is added to a table, the table can become unusable and Impala offers no tool to get rid of the problematic file. Would it make sense to be stricter during scale + precision checking to exclude the possibility of overflow? If I understand correctly, overflow can only occur if the scale in the file is lower then the expected scale: e.g value 2 in decimal (1,0) would overflow when converted to decimal(1, 1). If precision_file - scale_file <= precision_expected - scale_expected, then the conversion should happen without overflow, shouldn't it? http://gerrit.cloudera.org:8080/#/c/12163/1/be/src/exec/parquet/parquet-metadata-utils.cc File be/src/exec/parquet/parquet-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/12163/1/be/src/exec/parquet/parquet-metadata-utils.cc@243 PS1, Line 243: // TODO: we could allow a mismatch and do a conversion at this step. This TODO could be probably updated. http://gerrit.cloudera.org:8080/#/c/12163/1/be/src/exec/parquet/parquet-metadata-utils.cc@255 PS1, Line 255: // TODO: we could allow a mismatch and do a conversion at this step. This TODO could be probably updated. http://gerrit.cloudera.org:8080/#/c/12163/1/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-widening.test File testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-widening.test: http://gerrit.cloudera.org:8080/#/c/12163/1/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-widening.test@85 PS1, Line 85: Please add some tests with 'where col=const' clauses to exercise the dictionary and stat filtering code paths. Dictionary filtering is disabled if 'needs_conversion_' is true, but it may be enabled in the future if conversion will be moved to dictionary construction. I think that stat filtering won't work correctly with the current implementation - the stat value will be interpreted as if it had the same precision/scale as the SQL column, which can lead to incorrectly dropping row groups. I had to deal with somewhat similar problems in https://gerrit.cloudera.org/#/c/11057/ http://gerrit.cloudera.org:8080/#/c/12163/1/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/12163/1/tests/query_test/test_scanners.py@842 PS1, Line 842: decimal_stored_as_int32.parquet I do not see this file in the patch and it is not mentioned in the README. http://gerrit.cloudera.org:8080/#/c/12163/1/tests/query_test/test_scanners.py@848 PS1, Line 848: decimal_stored_as_int64.parquet I do not see this file in the patch and it is not mentioned in the README. http://gerrit.cloudera.org:8080/#/c/12163/1/tests/query_test/test_scanners.py@856 PS1, Line 856: binary_decimal_no_dictionary.parquet I do not see this file in the patch and it is not mentioned in the README. -- To view, visit http://gerrit.cloudera.org:8080/12163 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iafc8efd12379a39756e3e70f022a81a636dadb61 Gerrit-Change-Number: 12163 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Comment-Date: Mon, 07 Jan 2019 15:14:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6503: Support reading complex types from ORC format files
Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12168 Change subject: IMPALA-6503: Support reading complex types from ORC format files .. IMPALA-6503: Support reading complex types from ORC format files We’ve supported reading primitive types from ORC files (IMPALA-5717). In this patch we add support for complex types (struct/array/map). In IMPALA-5717, we depend on the ORC lib to read ORC binaries. The ORC lib can materialize ORC column binaries into its representation (orc::ColumnVectorBatch), so we don’t need to do anything about decoding/decompression in hdfs-orc-scanner. Since it already supports complex types, we’ll still depend on it. What we need to add in IMPALA-6503 are two things: 1. Specify which nested columns we need to the ORC lib 2. Transform outputs of ORC lib (nested orc::ColumnVectorBatch) into Impala’s representation To format the materialization, we implement several ORC column readers used in hdfs-orc-scanner. Each kind of reader treats a column type. Don’t like the Parquet readers (used in hdfs-parquet-scanner) which materializes Parquet column binaries into tuple values directly, the ORC readers (in hdfs-orc-scanner) just need to transform outputs of the ORC lib into tuple/slot values. Tests: * Enable existing tests for complex types (test_nested_types.py, test_tpch_nested_queries.py) for ORC. Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790 --- M be/src/exec/CMakeLists.txt M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-orc-scanner.h A be/src/exec/orc-column-readers.cc A be/src/exec/orc-column-readers.h A be/src/exec/orc-metadata-utils.cc A be/src/exec/orc-metadata-utils.h M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java A testdata/ComplexTypesTbl/README A testdata/ComplexTypesTbl/nonnullable.orc A testdata/ComplexTypesTbl/nullable.orc M testdata/bin/create-load-data.sh M testdata/bin/load_nested.py M testdata/datasets/functional/functional_schema_template.sql M testdata/datasets/functional/schema_constraints.csv M testdata/workloads/functional-planner/queries/PlannerTest/complex-types-file-formats.test M testdata/workloads/functional-query/queries/QueryTest/max-nesting-depth.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch-limit.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch-mem-limit.test M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch.test M testdata/workloads/tpch_nested/tpch_nested_core.csv M testdata/workloads/tpch_nested/tpch_nested_dimensions.csv M testdata/workloads/tpch_nested/tpch_nested_exhaustive.csv M testdata/workloads/tpch_nested/tpch_nested_pairwise.csv M tests/query_test/test_nested_types.py M tests/query_test/test_tpch_nested_queries.py 29 files changed, 1,656 insertions(+), 442 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/12168/5 -- To view, visit http://gerrit.cloudera.org:8080/12168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790 Gerrit-Change-Number: 12168 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang
[Impala-ASF-CR] IMPALA-6503: Support reading complex types from ORC format files
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12168 ) Change subject: IMPALA-6503: Support reading complex types from ORC format files .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/12168/5/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/12168/5/be/src/exec/hdfs-orc-scanner.cc@291 PS5, Line 291: RETURN_IF_ERROR(schema_resolver_->ResolveColumn(tuple_desc.tuple_path(), , _field, line too long (93 > 90) -- To view, visit http://gerrit.cloudera.org:8080/12168 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790 Gerrit-Change-Number: 12168 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 07 Jan 2019 14:55:52 + Gerrit-HasComments: Yes