[Impala-ASF-CR] IMPALA-8021: Add estimated cardinality to EXPLAIN output

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12136 )

Change subject: IMPALA-8021: Add estimated cardinality to EXPLAIN output
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1735/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12136
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986
Gerrit-Change-Number: 12136
Gerrit-PatchSet: 14
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 08 Jan 2019 07:06:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8021: Add estimated cardinality to EXPLAIN output

2019-01-07 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12136

to look at the new patch set (#14).

Change subject: IMPALA-8021: Add estimated cardinality to EXPLAIN output
..

IMPALA-8021: Add estimated cardinality to EXPLAIN output

Cardinality is vital to understanding why a plan has the form it does,
yet the planner normally emits cardinality information only for the
detailed levels. Unfortunately, most query profiles we see are at the
standard level without this information (except in the summary table),
making it hard to understand what happened.

This patch adds cardinality to the standard EXPLAIN output. It also
changes the displayed cardinality value to be in abbreviated "metric"
form: 1.23K instead of 1234, etc.

Changing the DESCRIBE output has a huge impact on PlannerTest: all the
"golden" test files must change. To avoid doing this twice, this patch
also includes:

IMPALA-7919: Add predicates line in plan output for partition key
predicates

This is also the time to also include:

IMPALA-8022: Add cardinality checks to PlannerTest

The comparison code was changed to allow a set of validators, one of
which compares cardinality to ensure it is within 5% of the expected
value. This should ensure we don't change estimates unintentionally.

While many planner tests are concerned with cardinality, many others are
not. Testing showed that the cardinality is actually unstable within
tests. For such tests, added filters to ignore cardinality. The filter
is enabled by default (for backward compatibility) but disabled (to
allow cardinality verification) for the critical tests.

Rebasing the tests was complicated by a bug in the error-matching code,
so this patch also fixes:

IMPALA-8023: Fix PlannerTest to handle error lines consistently

Now, the error output written to the output "save results" file matches
that expected in the "golden" file -- no more handling these specially.

Testing:

* Added cardinality verification.
* Reran all FE tests.
* Rebased all PlannerTest .test files.
* Adjusted the metadata/test_explain.py test to handle the changed
  EXPLAIN output.

Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986
---
M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
M fe/src/main/java/org/apache/impala/common/PrintUtils.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/complex-types-file-formats.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/constant.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/default-join-distr-mode-broadcast.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/default-join-distr-mode-shuffle.test
M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/disable-preaggregations.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/distinct-estimate.test
M testdata/workloads/functional-planner/queries/PlannerTest/distinct.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M 

[Impala-ASF-CR] IMPALA-8021: Add estimated cardinality to EXPLAIN output

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12136 )

Change subject: IMPALA-8021: Add estimated cardinality to EXPLAIN output
..


Patch Set 14:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/12136/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/12136/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@276
PS14, Line 276:   List partitions, TableRef 
hdfsTblRef, AggregateInfo aggInfo,
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/12136/14/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/12136/14/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@729
PS14, Line 729: assertEquals(" foo=bar cardinality=", filter.transform(" 
foo=bar cardinality=unavailable"));
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/12136/14/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@731
PS14, Line 731: assertEquals(" row-size= cardinality=10.3K", 
filter.transform(" row-size=10B cardinality=10.3K"));
line too long (102 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/12136
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986
Gerrit-Change-Number: 12136
Gerrit-PatchSet: 14
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 08 Jan 2019 06:34:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7666: Propagate name of test into CLIENT IDENTIFIER.

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12177 )

Change subject: IMPALA-7666: Propagate name of test into CLIENT_IDENTIFIER.
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1734/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12177
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f685fd16982d73ad3fc0f4a7578c5ad83b9a84c
Gerrit-Change-Number: 12177
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 08 Jan 2019 02:44:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7666: Adding an opaque client identifier to query options.

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12130 )

Change subject: IMPALA-7666: Adding an opaque client identifier to query 
options.
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1732/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0a7708492f05d33b2bc99fc3a03b461bbb6f3ea4
Gerrit-Change-Number: 12130
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 08 Jan 2019 02:36:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8026: Fix #rows accounting for NLJ

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/12164 )

Change subject: IMPALA-8026: Fix #rows accounting for NLJ
..

IMPALA-8026: Fix #rows accounting for NLJ

Use the same, much simpler, approach used by all other ExecNodes.

Testing:
Manually tested by adding logging to print values of the counter.
Confirmed that the fix led to the counter having the correct
values (instead of values exceeding num_rows_returned_).

Change-Id: I43e2e1d8f85478ff36c8cddc69bac3bdccd2c824
Reviewed-on: http://gerrit.cloudera.org:8080/12164
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exec/nested-loop-join-node.cc
1 file changed, 8 insertions(+), 20 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/12164
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I43e2e1d8f85478ff36c8cddc69bac3bdccd2c824
Gerrit-Change-Number: 12164
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 


[Impala-ASF-CR] IMPALA-8021: Add estimated cardinality to EXPLAIN output

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12136 )

Change subject: IMPALA-8021: Add estimated cardinality to EXPLAIN output
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1733/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12136
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986
Gerrit-Change-Number: 12136
Gerrit-PatchSet: 13
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 08 Jan 2019 02:28:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7986,IMPALA-7987: run daemons in docker containers

2019-01-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12095 )

Change subject: IMPALA-7986,IMPALA-7987: run daemons in docker containers
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12095/11/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/12095/11/bin/impala-config.sh@275
PS11, Line 275: export EXTERNAL_LISTEN_HOST="${EXTERNAL_LISTEN_HOST-localhost}"
This should be 0.0.0.0 by default.



--
To view, visit http://gerrit.cloudera.org:8080/12095
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5975cced33fa93df43101dd47d19b8af12e93d11
Gerrit-Change-Number: 12095
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 08 Jan 2019 02:15:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7970 : Add support for metastore event based automatic invalidate

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12118 )

Change subject: IMPALA-7970 : Add support for metastore event based automatic 
invalidate
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1731/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12118
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7
Gerrit-Change-Number: 12118
Gerrit-PatchSet: 14
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 08 Jan 2019 02:12:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7666: Propagate name of test into CLIENT IDENTIFIER.

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12177 )

Change subject: IMPALA-7666: Propagate name of test into CLIENT_IDENTIFIER.
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/12177/1/tests/conftest.py
File tests/conftest.py:

http://gerrit.cloudera.org:8080/#/c/12177/1/tests/conftest.py@595
PS1, Line 595: @pytest.hookimpl(trylast=True)
flake8: E302 expected 2 blank lines, found 1


http://gerrit.cloudera.org:8080/#/c/12177/1/tests/conftest.py@601
PS1, Line 601: ,
flake8: E231 missing whitespace after ','


http://gerrit.cloudera.org:8080/#/c/12177/1/tests/conftest.py@601
PS1, Line 601: :
flake8: E501 line too long (95 > 90 characters)



--
To view, visit http://gerrit.cloudera.org:8080/12177
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f685fd16982d73ad3fc0f4a7578c5ad83b9a84c
Gerrit-Change-Number: 12177
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 08 Jan 2019 02:07:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7666: Propagate name of test into CLIENT IDENTIFIER.

2019-01-07 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12177


Change subject: IMPALA-7666: Propagate name of test into CLIENT_IDENTIFIER.
..

IMPALA-7666: Propagate name of test into CLIENT_IDENTIFIER.

To facilitate correlating test failures (where we sometimes know things
like fragment id) with the tests that generated those queries, we can
stuff the test name into CLIENT_IDENTIFIER.

The mechanics here are to create a global, tests.common.current_node to
store the current test, create a plugin in conftest to set this when
entering a test, and then configuring connections as they're created
deep in test code.

Change-Id: I2f685fd16982d73ad3fc0f4a7578c5ad83b9a84c
---
M tests/common/impala_connection.py
M tests/conftest.py
M tests/query_test/test_observability.py
3 files changed, 32 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/12177/1
--
To view, visit http://gerrit.cloudera.org:8080/12177
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2f685fd16982d73ad3fc0f4a7578c5ad83b9a84c
Gerrit-Change-Number: 12177
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-8021: Add estimated cardinality to EXPLAIN output

2019-01-07 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12136 )

Change subject: IMPALA-8021: Add estimated cardinality to EXPLAIN output
..


Patch Set 13:

(7 comments)

Addressed review comments, rebased on latest master, and fixed a potentially 
unstable test.

Will attempt to remove the range checks for cardinality. Please hold off on 
another review until we see the results.

http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@100
PS12, Line 100:* Return a list of partitions left after applying the 
conjuncts. Please note
> Needs an update to explain the Pair<>
Done


http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1230
PS12, Line 1230:   if (partitionConjuncts_ != null && 
!partitionConjuncts_.isEmpty()) {
> Looks like Greg had a different opinion on the jira when I mentioned separa
Agreed. I added this partly because of Greg's request (and your response), but 
also because I found it very helpful to understand the cardinality in a plan. 
We have a table with 11K rows. The scan node had no (visible) predicates, yet 
claimed to return 5K rows. The "5/11 partitions" should have clued me in. Had 
to debug the code to see why that was the case. Once these partition predicates 
appeared, however, it was obvious what was happening.

So, is that a good reason to include this info in the EXPLAIN output? IMHO it 
is, what do you think?

Discovered one reason that we did not include them: we had some partitioning 
tests that use the system time for a partitioning value. As a result, the 
predicate changes on each run and it is hard to verify the test results. Simply 
changed it to use a constant date/time to work around that issue.


http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1283
PS12, Line 1283: Pair, List> pair =
> line too long (110 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1331
PS12, Line 1331: scanNode.init(analyzer);
> nit: Can we force this in the c'tor above? Setting it separately seems weir
Done


http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java@541
PS12, Line 541:   if 
(!testOptions.contains(PlannerTestOption.VALIDATE_CARDINALITY)) {
> Did you run into some cases where this was required to make tests pass? Or
Mostly it is to reduce redundant output. By definition, an exchange just passes 
along the rows from its input. So, repeating the row width and cardinality for 
an Exchange adds no information.

Added a comment to explain this thought.


http://gerrit.cloudera.org:8080/#/c/12136/4/fe/src/test/java/org/apache/impala/testutil/TestUtils.java
File fe/src/test/java/org/apache/impala/testutil/TestUtils.java:

http://gerrit.cloudera.org:8080/#/c/12136/4/fe/src/test/java/org/apache/impala/testutil/TestUtils.java@302
PS4, Line 302: }
> Just for reference, notice that the old code consumed three two per pass: o
Argh.. That should be "..two tokens per pass..."


http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/test/java/org/apache/impala/testutil/TestUtils.java
File fe/src/test/java/org/apache/impala/testutil/TestUtils.java:

http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/test/java/org/apache/impala/testutil/TestUtils.java@207
PS12, Line 207:  If found, then parse the values and compare them. Report a 
match
  :* if the values are within 5%, else report a mismatch.
> Yeah I'm not sure if we expect the cardinality estimates to be exactly repr
On my first attempt, I enabled cardinality checks for all tests. I saw some 
variation. Then, when running on the Jenkins server, I found that the metadata 
actually differs from what we have on dev machines, so I disabled cardinality 
checks by default, turning them on only for selected tests where they add value.

So, with that revision, it may be that I can remove this code. I'll commit the 
other review comments and do a commit. Then, I'll remove this code, do another 
commit, and do a 

[Impala-ASF-CR] IMPALA-8021: Add estimated cardinality to EXPLAIN output

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12136 )

Change subject: IMPALA-8021: Add estimated cardinality to EXPLAIN output
..


Patch Set 13:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/12136/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/12136/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@276
PS13, Line 276:   List partitions, TableRef 
hdfsTblRef, AggregateInfo aggInfo,
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/12136/13/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/12136/13/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@729
PS13, Line 729: assertEquals(" foo=bar cardinality=", filter.transform(" 
foo=bar cardinality=unavailable"));
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/12136/13/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@731
PS13, Line 731: assertEquals(" row-size= cardinality=10.3K", 
filter.transform(" row-size=10B cardinality=10.3K"));
line too long (102 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/12136
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986
Gerrit-Change-Number: 12136
Gerrit-PatchSet: 13
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 08 Jan 2019 02:01:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8021: Add estimated cardinality to EXPLAIN output

2019-01-07 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12136

to look at the new patch set (#13).

Change subject: IMPALA-8021: Add estimated cardinality to EXPLAIN output
..

IMPALA-8021: Add estimated cardinality to EXPLAIN output

Cardinality is vital to understanding why a plan has the form it does,
yet the planner normally emits cardinality information only for the
detailed levels. Unfortunately, most query profiles we see are at the
standard level without this information (except in the summary table),
making it hard to understand what happened.

This patch adds cardinality to the standard EXPLAIN output. It also
changes the displayed cardinality value to be in abbreviated "metric"
form: 1.23K instead of 1234, etc.

Changing the DESCRIBE output has a huge impact on PlannerTest: all the
"golden" test files must change. To avoid doing this twice, this patch
also includes:

IMPALA-7919: Add predicates line in plan output for partition key
predicates

This is also the time to also include:

IMPALA-8022: Add cardinality checks to PlannerTest

The comparison code was changed to allow a set of validators, one of
which compares cardinality to ensure it is within 5% of the expected
value. This should ensure we don't change estimates unintentionally.

While many planner tests are concerned with cardinality, many others are
not. Testing showed that the cardinality is actually unstable within
tests. For such tests, added filters to ignore cardinality. The filter
is enabled by default (for backward compatibility) but disabled (to
allow cardinality verification) for the critical tests.

Rebasing the tests was complicated by a bug in the error-matching code,
so this patch also fixes:

IMPALA-8023: Fix PlannerTest to handle error lines consistently

Now, the error output written to the output "save results" file matches
that expected in the "golden" file -- no more handling these specially.

Testing:

* Added cardinality verification.
* Reran all FE tests.
* Rebased all PlannerTest .test files.
* Adjusted the metadata/test_explain.py test to handle the changed
  EXPLAIN output.

Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986
---
M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
M fe/src/main/java/org/apache/impala/common/PrintUtils.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/complex-types-file-formats.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/constant.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/default-join-distr-mode-broadcast.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/default-join-distr-mode-shuffle.test
M testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/disable-preaggregations.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/distinct-estimate.test
M testdata/workloads/functional-planner/queries/PlannerTest/distinct.test
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M 

[Impala-ASF-CR] IMPALA-7666: Adding an opaque client identifier to query options.

2019-01-07 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12130 )

Change subject: IMPALA-7666: Adding an opaque client identifier to query 
options.
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12130/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12130/2//COMMIT_MSG@15
PS2, Line 15: protoype
> *prototype
I'm going to do the py.test stuff in a separate patch. impala-shell is a good 
idea, and I've done that.


http://gerrit.cloudera.org:8080/#/c/12130/2/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/12130/2/be/src/service/query-options.cc@724
PS2, Line 724: .c_str()
> Can't we just pass the original string?
Done



--
To view, visit http://gerrit.cloudera.org:8080/12130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0a7708492f05d33b2bc99fc3a03b461bbb6f3ea4
Gerrit-Change-Number: 12130
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 08 Jan 2019 01:59:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7666: Adding an opaque client identifier to query options.

2019-01-07 Thread Philip Zeyliger (Code Review)
Hello Lars Volker, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12130

to look at the new patch set (#3).

Change subject: IMPALA-7666: Adding an opaque client identifier to query 
options.
..

IMPALA-7666: Adding an opaque client identifier to query options.

We sometimes struggle to identify the client (e.g., a given version of a
JDBC driver, Tableau, Hue, etc.) for a given query. This commit adds a
User-Agent header style, called "Client Identifier", which clients can
set as a Query Option. Nothing is done with this header, but it's
written into logs and query profiles.

This commit includes changes to impala-shell to include the version of
impala shell with an associated test.

A future commit will serialize the name of the py.test being run into
this field, which is handy for figuring out where a query came from.

Change-Id: I0a7708492f05d33b2bc99fc3a03b461bbb6f3ea4
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M shell/impala_shell.py
M tests/shell/test_shell_commandline.py
6 files changed, 27 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/12130/3
--
To view, visit http://gerrit.cloudera.org:8080/12130
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0a7708492f05d33b2bc99fc3a03b461bbb6f3ea4
Gerrit-Change-Number: 12130
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7970 : Add support for metastore event based automatic invalidate

2019-01-07 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/12118 )

Change subject: IMPALA-7970 : Add support for metastore event based automatic 
invalidate
..

IMPALA-7970 : Add support for metastore event based automatic invalidate

This change adds support to CatalogD to poll metastore events to issue
invalidate on tables automatically. It adds basic infrastructure to poll
HMS notifications events at a configurable frequency using a backend
config called hms_event_polling_frequency_s flag. Currently, it issues
invalidate at tables when it received alter events on table and
partitions. It also adds tables/databases and removes tables from
catalogD when it receives create_table/create_database and
drop_table/drop_database events. The default value of
hms_event_polling_frequency_s is 0 which disables the feature. A
non-zero value in seconds of this configuration can be used to enable
the feature and set the polling frequency.

In order to process each event atomically, this feature relies on
version lock in CatalogServiceCatalog. It adds new methods in
CatalogServiceCatalog which takes a write lock on version so that
readers are blocked until the catalog state is updated based on the
events. In case of processing events, the metastore operation is already
completed and only catalog state needs to be updated. Hence we do not
need to make new metastore calls while processing the events and only
version lock is sufficient to serialize updates to the catalog objects
based on events. This locking protocol is similar to what is done in
case of DDL processing in CatalogOpExecutor except it does not need to
take metastoreDdlLock since no metastore operations are needed during
event processing.

The change also adds a new test class to test the basic functionality
for each of the event type which is supported.

Note that this feature is still a work in progress and additional
improvements will be done in subsequent patches. Be default the feature
is turned off.

Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
A fe/src/main/java/org/apache/impala/catalog/MetastoreEventHandlerUtils.java
A fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java
A fe/src/main/java/org/apache/impala/catalog/MetastoreNotificationException.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
A fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java
M fe/src/test/resources/postgresql-hive-site.xml.template
10 files changed, 1,585 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/18/12118/14
--
To view, visit http://gerrit.cloudera.org:8080/12118
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic70b27780560b7ac9b33418d132b36cd0ca4abf7
Gerrit-Change-Number: 12118
Gerrit-PatchSet: 14
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8026: Fix #rows accounting for NLJ

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12164 )

Change subject: IMPALA-8026: Fix #rows accounting for NLJ
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/12164
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43e2e1d8f85478ff36c8cddc69bac3bdccd2c824
Gerrit-Change-Number: 12164
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Tue, 08 Jan 2019 01:22:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7905: Hive keywords not quoted for identifiers

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12009 )

Change subject: IMPALA-7905: Hive keywords not quoted for identifiers
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1730/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06cc20b052a3a66535a171c36b4b31477c0ba6d0
Gerrit-Change-Number: 12009
Gerrit-PatchSet: 8
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Tue, 08 Jan 2019 01:07:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7468: Port CancelQueryFInstances() to KRPC.

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/12142 )

Change subject: IMPALA-7468: Port CancelQueryFInstances() to KRPC.
..

IMPALA-7468: Port CancelQueryFInstances() to KRPC.

When the Coordinator needs to cancel a query (for example because a user
has hit Control-C), it does this by sending a CancelQueryFInstances
message to each fragment instance. This change switches this code to use
KRPC.

Add new protobuf definitions for the messages, and remove the old thrift
definitions. Move the server-side implementation of Cancel() from
ImpalaInternalService to ControlService. Rework the scheduler so
that the FInstanceExecParams always contains the KRPC address of the
fragment executors, this address can then be used if a query is to be
cancelled.

For now keep the KRPC calls to CancelQueryFInstances() as synchronous.

While moving the client-side code, remove the fault injection code that
was inserted with FAULT_INJECTION_SEND_RPC_EXCEPTION and
FAULT_INJECTION_RECV_RPC_EXCEPTION (triggered by running impalad with
--fault_injection_rpc_exception_type=1) as this tickles code in
client-cache.h which is now not used.

TESTING:
  Ran all end-to-end tests.
  No new tests as test_cancellation.py provides good coverage.
  Checked in debugger that DebugAction style fault injection (triggered
  from test_cancellation.py) was working correctly.

Change-Id: I625030c3f1068061aa029e6e242f016cadd84969
Reviewed-on: http://gerrit.cloudera.org:8080/12142
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/control-service.cc
M be/src/service/control-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
12 files changed, 192 insertions(+), 139 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/12142
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I625030c3f1068061aa029e6e242f016cadd84969
Gerrit-Change-Number: 12142
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] IMPALA-7468: Port CancelQueryFInstances() to KRPC.

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12142 )

Change subject: IMPALA-7468: Port CancelQueryFInstances() to KRPC.
..


Patch Set 5: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/12142
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I625030c3f1068061aa029e6e242f016cadd84969
Gerrit-Change-Number: 12142
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Tue, 08 Jan 2019 01:05:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7905: Hive keywords not quoted for identifiers

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12009 )

Change subject: IMPALA-7905: Hive keywords not quoted for identifiers
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1729/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06cc20b052a3a66535a171c36b4b31477c0ba6d0
Gerrit-Change-Number: 12009
Gerrit-PatchSet: 7
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Tue, 08 Jan 2019 00:50:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6503: Support reading complex types from ORC format files

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12168 )

Change subject: IMPALA-6503: Support reading complex types from ORC format files
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1728/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
Gerrit-Change-Number: 12168
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 08 Jan 2019 00:43:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7905: Hive keywords not quoted for identifiers

2019-01-07 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12009 )

Change subject: IMPALA-7905: Hive keywords not quoted for identifiers
..


Patch Set 7:

(2 comments)

Addressed recent review comments.

http://gerrit.cloudera.org:8080/#/c/12009/5/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

http://gerrit.cloudera.org:8080/#/c/12009/5/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@116
PS5, Line 116: an identifier
> I meant to refer to the method argument @ident here. But it is fine. Not to
Took a second look. The text was ambiguous, using "identifier" for two distinct 
concepts. Reworded to be clearer.


http://gerrit.cloudera.org:8080/#/c/12009/5/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@148
PS5, Line 148: /
 :   public static boolean impalaNeedsQuotes(String ident) {
 : return SqlScanner.isReserved(ident) ||
 :   // Impala's scanner recognizes the ".123" portion of 
"db.123e45" as a decimal,
 :   // so while the quoting is not necessary
> ya, 123e45 is a valid decimal literal but not .123 part like you mentioned
Reworded the comment and inserted an actual example that illustrates the issue. 
"3a3" is a fine unquoted identifier. "3e3" must be quoted, else it is a double. 
So, to be safe, we quote all identifiers that start with a digit.



--
To view, visit http://gerrit.cloudera.org:8080/12009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06cc20b052a3a66535a171c36b4b31477c0ba6d0
Gerrit-Change-Number: 12009
Gerrit-PatchSet: 7
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Tue, 08 Jan 2019 00:37:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7905: Hive keywords not quoted for identifiers

2019-01-07 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Fredy Wijaya, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12009

to look at the new patch set (#8).

Change subject: IMPALA-7905: Hive keywords not quoted for identifiers
..

IMPALA-7905: Hive keywords not quoted for identifiers

Impala often generates SQL for statements in the toSql() call, often
used during testing or when writing the query plan. Impala keywords such
as "create", when used as identifiers, must be quoted:

SELECT `select`, `from` FROM `order` ...

The code in ToSqlUtils.getIdentSql() also quotes the identifier if it is
a Hive keyword. But, that code contained a flaw: it uses the Hive lexer
to detect a keyword, but the lexer expects a case-insensitive input.
We provide a case sensitive input. As a result, "MONTH" is caught as a
Hive keyword and quoted, but "month" is not.

This patch fixes that flaw.

This patch also fixes:

IMPALA-8051: Compute stats fails on a column with comment character in
name

The code uses the Hive lexical analyzer to check names. Since "#" and
"--" are comment characters, a name like "foo#" is parsed as "foo" which
does not need quotes, hence we don't quote "foo#", which causes issues.

Added a special check for "#" and "--" to resolve this issue.

Testing:

* Refactored getIdentSql() easier testing.
* Added a tests to the recently added ToSqlUtilsTest for this case and
  several others.
* Making this change caused the columns `month`, `year`, and `key` to be
  quoted when before they were not. Updated many tests as a result.
* Added a new identSql() function, for use in tests, to match the
  quoting that Impala uses, and to handle the wildcard, and multi-part
  names. Used this in ToSqlTest to handle the quoted names.
* Reran all FE tests.

Change-Id: I06cc20b052a3a66535a171c36b4b31477c0ba6d0
---
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlUtilsTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
31 files changed, 741 insertions(+), 503 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/12009/8
--
To view, visit http://gerrit.cloudera.org:8080/12009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I06cc20b052a3a66535a171c36b4b31477c0ba6d0
Gerrit-Change-Number: 12009
Gerrit-PatchSet: 8
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 


[Impala-ASF-CR] IMPALA-7905: Hive keywords not quoted for identifiers

2019-01-07 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Fredy Wijaya, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12009

to look at the new patch set (#7).

Change subject: IMPALA-7905: Hive keywords not quoted for identifiers
..

IMPALA-7905: Hive keywords not quoted for identifiers

Impala often generates SQL for statements in the toSql() call, often
used during testing or when writing the query plan. Impala keywords such
as "create", when used as identifiers, must be quoted:

SELECT `select`, `from` FROM `order` ...

The code in ToSqlUtils.getIdentSql() also quotes the identifier if it is
a Hive keyword. But, that code contained a flaw: it uses the Hive lexer
to detect a keyword, but the lexer expects a case-insensitive input.
We provide a case sensitive input. As a result, "MONTH" is caught as a
Hive keyword and quoted, but "month" is not.

This patch fixes that flaw.

This patch also fixes:

IMPALA-8051: Compute stats fails on a column with comment character in
name

The code uses the Hive lexical analyzer to check names. Since "#" and
"--" are comment characters, a name like "foo#" is parsed as "foo" which
does not need quotes, hence we don't quote "foo#", which causes issues.

Added a special check for "#" and "--" to resolve this issue.

Testing:

* Refactored getIdentSql() easier testing.
* Added a tests to the recently added ToSqlUtilsTest for this case and
  several others.
* Making this change caused the columns `month`, `year`, and `key` to be
  quoted when before they were not. Updated many tests as a result.
* Added a new identSql() function, for use in tests, to match the
  quoting that Impala uses, and to handle the wildcard, and multi-part
  names. Used this in ToSqlTest to handle the quoted names.
* Reran all FE tests.

Change-Id: I06cc20b052a3a66535a171c36b4b31477c0ba6d0
---
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlUtilsTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
31 files changed, 738 insertions(+), 503 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/12009/7
--
To view, visit http://gerrit.cloudera.org:8080/12009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I06cc20b052a3a66535a171c36b4b31477c0ba6d0
Gerrit-Change-Number: 12009
Gerrit-PatchSet: 7
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 


[Impala-ASF-CR] IMPALA-7905: Hive keywords not quoted for identifiers

2019-01-07 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12009 )

Change subject: IMPALA-7905: Hive keywords not quoted for identifiers
..


Patch Set 6:

Rebased on latest master to resolve merge conflict.

Added a trivial fix for IMPALA-8051: using a comment character in a column name.

Please review the fix and rebase and renew your approval if you agree.


--
To view, visit http://gerrit.cloudera.org:8080/12009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I06cc20b052a3a66535a171c36b4b31477c0ba6d0
Gerrit-Change-Number: 12009
Gerrit-PatchSet: 6
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Tue, 08 Jan 2019 00:17:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6503: Support reading complex types from ORC format files

2019-01-07 Thread Quanlong Huang (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12168

to look at the new patch set (#7).

Change subject: IMPALA-6503: Support reading complex types from ORC format files
..

IMPALA-6503: Support reading complex types from ORC format files

We’ve supported reading primitive types from ORC files (IMPALA-5717).
In this patch we add support for complex types (struct/array/map).

In IMPALA-5717, we depend on the ORC lib to read ORC binaries. The ORC
lib can materialize ORC column binaries into its representation
(orc::ColumnVectorBatch), so we don’t need to do anything about
decoding/decompression in hdfs-orc-scanner. Since it already supports
complex types, we’ll still depend on it.

What we need to add in IMPALA-6503 are two things:
1. Specify which nested columns we need to the ORC lib
2. Transform outputs of ORC lib (nested orc::ColumnVectorBatch) into
  Impala’s representation

To format the materialization, we implement several ORC column readers
used in hdfs-orc-scanner. Each kind of reader treats a column type.
Don’t like the Parquet readers (used in hdfs-parquet-scanner) which
materializes Parquet column binaries into tuple values directly, the ORC
readers (in hdfs-orc-scanner) just need to transform outputs of the ORC
lib into tuple/slot values.

Tests:
* Enable existing tests for complex types (test_nested_types.py,
test_tpch_nested_queries.py) for ORC.

Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
---
M be/src/exec/CMakeLists.txt
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
A be/src/exec/orc-column-readers.cc
A be/src/exec/orc-column-readers.h
A be/src/exec/orc-metadata-utils.cc
A be/src/exec/orc-metadata-utils.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
A testdata/ComplexTypesTbl/README
A testdata/ComplexTypesTbl/nonnullable.orc
A testdata/ComplexTypesTbl/nullable.orc
M testdata/bin/create-load-data.sh
M testdata/bin/load_nested.py
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/complex-types-file-formats.test
M testdata/workloads/functional-query/queries/QueryTest/max-nesting-depth.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch-limit.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch-mem-limit.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch.test
M testdata/workloads/tpch_nested/tpch_nested_core.csv
M testdata/workloads/tpch_nested/tpch_nested_dimensions.csv
M testdata/workloads/tpch_nested/tpch_nested_exhaustive.csv
M testdata/workloads/tpch_nested/tpch_nested_pairwise.csv
M tests/query_test/test_nested_types.py
M tests/query_test/test_tpch_nested_queries.py
29 files changed, 1,685 insertions(+), 444 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/12168/7
--
To view, visit http://gerrit.cloudera.org:8080/12168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
Gerrit-Change-Number: 12168
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8041, Part 2: Refactor SELECT list

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12144 )

Change subject: IMPALA-8041, Part 2: Refactor SELECT list
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1727/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5e173dad717e739c097f1f0cd86a0352648ff886
Gerrit-Change-Number: 12144
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Mon, 07 Jan 2019 23:15:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8041, Part 2: Refactor SELECT list

2019-01-07 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12144 )

Change subject: IMPALA-8041, Part 2: Refactor SELECT list
..


Patch Set 2:

Rebased on master. Fixed code style issues.


--
To view, visit http://gerrit.cloudera.org:8080/12144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5e173dad717e739c097f1f0cd86a0352648ff886
Gerrit-Change-Number: 12144
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Mon, 07 Jan 2019 22:42:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8041, Part 2: Refactor SELECT list

2019-01-07 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12144

to look at the new patch set (#2).

Change subject: IMPALA-8041, Part 2: Refactor SELECT list
..

IMPALA-8041, Part 2: Refactor SELECT list

Another step toward combining analysis and rewrites. This step refactors
the SELECT list to:

* Enable each select list item to hold its "source" (before analysis and
  rewrite) SQL.
* Split the select list item into two classes: one for the wildcard
  (which holds no expresion) and the other for actual select
  expressions.
* Moves some analysis steps into the select list item itself.
* Cleans up a few other minor items.

Tests: No functional change, just refactoring. Reran all FE tests.

Change-Id: I5e173dad717e739c097f1f0cd86a0352648ff886
---
M fe/src/main/cup/sql-parser.cup
A fe/src/main/java/org/apache/impala/analysis/AbstractExpression.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
12 files changed, 300 insertions(+), 133 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/12144/2
--
To view, visit http://gerrit.cloudera.org:8080/12144
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5e173dad717e739c097f1f0cd86a0352648ff886
Gerrit-Change-Number: 12144
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 


[Impala-ASF-CR] IMPALA-7625: test web pages.py backend tests are failing

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/12024 )

Change subject: IMPALA-7625: test_web_pages.py backend tests are failing
..

IMPALA-7625: test_web_pages.py backend tests are failing

The backend tests in test_web_page.py do some basic validation of
the query_backends and query_finstances endpoints. The tests were
failing due to a race condition in the test itself. The issue is that
the backend endpoints are only usable while a query is running. When
queried against completed queries, the backend endpoints return nothing.
The original test worked by running a "complex" query asynchronously and
then querying the backend endpoints before the query completed. The
assumption was that the query would run long enough for the request to
the backend endpoint to complete. However, that is not always the case
and the query easily completes before the backend endpoints can be
reached.

This patch updates the test so that instead of running a "complex"
query, it runs a query that calls the sleep UDF, guaranteeing the query
will take enough time for the backend endpoints to be reached.

Furthermore, the patch adds additional validation and cleanup of all the
backend endpoint tests.

Testing:

Ran core tests.

Change-Id: I2092afe1bfaec30cd3e7b0040f06865e43fe62fb
Reviewed-on: http://gerrit.cloudera.org:8080/12024
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_admission_controller.py
M tests/webserver/test_web_pages.py
3 files changed, 83 insertions(+), 48 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/12024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2092afe1bfaec30cd3e7b0040f06865e43fe62fb
Gerrit-Change-Number: 12024
Gerrit-PatchSet: 7
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7625: test web pages.py backend tests are failing

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12024 )

Change subject: IMPALA-7625: test_web_pages.py backend tests are failing
..


Patch Set 6: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/12024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2092afe1bfaec30cd3e7b0040f06865e43fe62fb
Gerrit-Change-Number: 12024
Gerrit-PatchSet: 6
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 07 Jan 2019 22:05:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8047 Support .proto files in .clang-format

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/12165 )

Change subject: IMPALA-8047 Support .proto files in .clang-format
..

IMPALA-8047 Support .proto files in .clang-format

The .proto file extension is used for the Google Protocol Buffers
language. Impala uses this language to specify the format of messages
used by KRPC. Add support for this language to .clang-format so that we
can have consistent formatting.

The proposed support is:

Language: Proto
BasedOnStyle: Google
ColumnLimit: 90

This produces only a few diffs when run against the existing Impala
code. I’m not proposing to make any changes to .proto files, this is
just to show what clang-format will do. Apart from wrapping comments and
code at 90 chars, the diffs are mostly of the form

-syntax="proto2";
+syntax = "proto2";

-  message Certificate {};
+  message Certificate {
+  };

-  optional bool client_timeout_defined = 4 [ default = false ];
+  optional bool client_timeout_defined = 4 [default = false];

-UNKNOWN= 999;
-NEGOTIATE  = 1;
-SASL_SUCCESS   = 0;
-SASL_INITIATE  = 2;
+UNKNOWN = 999;
+NEGOTIATE = 1;
+SASL_SUCCESS = 0;
+SASL_INITIATE = 2;

This last change can be configured using “AlignConsecutiveAssignments:
true” but that creates a different set of diffs.

Change-Id: I0c2dcfc21fc8f9206adb64166fbd05580516791f
Reviewed-on: http://gerrit.cloudera.org:8080/12165
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M .clang-format
1 file changed, 4 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/12165
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I0c2dcfc21fc8f9206adb64166fbd05580516791f
Gerrit-Change-Number: 12165
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8047 Support .proto files in .clang-format

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12165 )

Change subject: IMPALA-8047 Support .proto files in .clang-format
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/12165
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0c2dcfc21fc8f9206adb64166fbd05580516791f
Gerrit-Change-Number: 12165
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 07 Jan 2019 21:55:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7087: Read Parquet decimal columns with lower precision/scale

2019-01-07 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12163 )

Change subject: IMPALA-7087: Read Parquet decimal columns with lower 
precision/scale
..


Patch Set 1:

> > (1 comment)
 > >
 > > Thanks for taking a look @Csaba! I've only responded to your
 > > comment on overflow handling so far, as I want to get that
 > behavior
 > > locked down first.
 >
 > Are you sure that the query will be aborted - isn't it only one
 > warning / file? If there are "normal" files processed before the
 > one with overflows, then some rows will be returned before running
 > to the error. In the current implementation it is also possible
 > that some rows are returned from the file before reaching the
 > overflowing value.
 >
 > I am also not completely sure about the best approach.
 >
 > I would prefer the error to be non-fatal, so to print one
 > warning/file and continue the query.
 >
 > If no important use case is lost this way, then I would prefer to
 > use the formula and skip files based on metadata instead of
 > checking every row.
 >
 > This would make it possible to print a useful error message, so
 > that the user will know the precision needed to process the file,
 > and can call ALTER TABLE to increase precision / decrease scale and
 > make the file usable. If the error is printed based the first row,
 > then this can become a long iterative process.

Thats a fair point. It is throwing an error. The last test in 
parquet-decimal-precision-and-scale-widening.test has a `CATCH` section, so if 
I understand the logic in ImpalaTestSuite#run_test_case then that means the 
query threw an exception. I agree that its possible for some rows to be 
returned before this error is throw, but I guess thats an issue with many 
runtime errors in Impala.

I'm good with printing a non-fatal error and setting the row to NULL. This is 
what Hive does, so we will be consistent with Hive.

Using the formula will prevent overflows from happening in the first place, so 
it somewhat orthogonal to the discussion of error handling. The only drawback I 
can see to using the formula you suggested is that it is not exactly SQL 
Standard compliant. It's entirely possible that the formula prevents a query 
from running, even though none of the values in the table would cause an 
overflow. The standard says that values should be rounded or truncated to the 
match target precision / scale, but I guess it doesn't explicitly mention 
overflows so its hard to say.

I think that following Hive's behavior might be best here. I've added @Lars to 
the review to see what he thinks.


--
To view, visit http://gerrit.cloudera.org:8080/12163
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iafc8efd12379a39756e3e70f022a81a636dadb61
Gerrit-Change-Number: 12163
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Mon, 07 Jan 2019 21:32:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8026: Fix #rows accounting for NLJ

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12164 )

Change subject: IMPALA-8026: Fix #rows accounting for NLJ
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12164
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43e2e1d8f85478ff36c8cddc69bac3bdccd2c824
Gerrit-Change-Number: 12164
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Mon, 07 Jan 2019 21:12:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8026: Fix #rows accounting for NLJ

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12164 )

Change subject: IMPALA-8026: Fix #rows accounting for NLJ
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3618/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/12164
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43e2e1d8f85478ff36c8cddc69bac3bdccd2c824
Gerrit-Change-Number: 12164
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Mon, 07 Jan 2019 21:12:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7446: enable buffer pool GC when near process mem limit

2019-01-07 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12133 )

Change subject: IMPALA-7446: enable buffer pool GC when near process mem limit
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12133/4/be/src/runtime/bufferpool/buffer-pool-test.cc
File be/src/runtime/bufferpool/buffer-pool-test.cc:

http://gerrit.cloudera.org:8080/#/c/12133/4/be/src/runtime/bufferpool/buffer-pool-test.cc@2260
PS4, Line 2260:   // Make sure we have a process memory tracker that uses 
TCMalloc metrics.
nit: maybe explain here briefly why those metrics are necessary. Its not 
obvious unless you read the JIRA or the patch and know which code-path this is 
trying to hit


http://gerrit.cloudera.org:8080/#/c/12133/4/be/src/runtime/bufferpool/buffer-pool-test.cc@2277
PS4, Line 2277:   
ASSERT_OK(client.DecreaseReservationTo(numeric_limits::max(), 0));
nit: should we add a check here to see that the relevant Buffer Pool metrics 
are in the required state (FREE_BUFFER_BYTES, CLEAN_PAGE_BYTES and 
UNUSED_RESERVATION_BYTES), to make sure the right code path is hit


http://gerrit.cloudera.org:8080/#/c/12133/4/be/src/runtime/exec-env.h
File be/src/runtime/exec-env.h:

http://gerrit.cloudera.org:8080/#/c/12133/4/be/src/runtime/exec-env.h@254
PS4, Line 254: Must be called after
 :   /// InitBufferPool().
and RegisterMemoryMetrics()


http://gerrit.cloudera.org:8080/#/c/12133/4/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/12133/4/be/src/runtime/exec-env.cc@436
PS4, Line 436: DCHECK(AggregateMemoryMetrics::TOTAL_USED != nullptr);
nit: DCHECK(AggregateMemoryMetrics::TOTAL_USED != nullptr) << "Mem metrics not 
registered.";



--
To view, visit http://gerrit.cloudera.org:8080/12133
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I81e8e29f1ba319f1b499032f9518d32c511b4b21
Gerrit-Change-Number: 12133
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Comment-Date: Mon, 07 Jan 2019 21:07:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8026: Fix #rows accounting for NLJ

2019-01-07 Thread Thomas Marshall (Code Review)
Thomas Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12164 )

Change subject: IMPALA-8026: Fix #rows accounting for NLJ
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12164
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43e2e1d8f85478ff36c8cddc69bac3bdccd2c824
Gerrit-Change-Number: 12164
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Comment-Date: Mon, 07 Jan 2019 21:02:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7468: Port CancelQueryFInstances() to KRPC.

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12142 )

Change subject: IMPALA-7468: Port CancelQueryFInstances() to KRPC.
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3617/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/12142
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I625030c3f1068061aa029e6e242f016cadd84969
Gerrit-Change-Number: 12142
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Mon, 07 Jan 2019 21:02:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7468: Port CancelQueryFInstances() to KRPC.

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12142 )

Change subject: IMPALA-7468: Port CancelQueryFInstances() to KRPC.
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12142
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I625030c3f1068061aa029e6e242f016cadd84969
Gerrit-Change-Number: 12142
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Mon, 07 Jan 2019 21:02:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7468: Port CancelQueryFInstances() to KRPC.

2019-01-07 Thread Thomas Marshall (Code Review)
Thomas Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12142 )

Change subject: IMPALA-7468: Port CancelQueryFInstances() to KRPC.
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12142
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I625030c3f1068061aa029e6e242f016cadd84969
Gerrit-Change-Number: 12142
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Mon, 07 Jan 2019 20:54:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6533: Add min-max filter for decimal types on kudu tables.

2019-01-07 Thread Thomas Marshall (Code Review)
Thomas Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12113 )

Change subject: IMPALA-6533: Add min-max filter for decimal types on kudu 
tables.
..


Patch Set 14:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/runtime/decimal-value.h@199
PS14, Line 199: value_
nit: ///
nit: 'value_' (i.e. quotes around variable names)

More importantly, I don't understand this comment - eg. what is "the receiving 
end" in this context?

I assume you're thinking in the context of this patch, that a 
DecimalMinMaxFilter will be reconstructed after being sent over the network 
using this function, but the function is more general so I think it makes more 
sense to have a more general comment.

Its also generally nice to specifically mention the use of the out parameter.

Maybe something like:
/// Store the binary representation of this DecimalValue in 'tvalue'.


http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/util/min-max-filter-ir.cc
File be/src/util/min-max-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/util/min-max-filter-ir.cc@95
PS14, Line 95:   switch (size_) {
So this is pretty perf-sensitive code (executed once per row on the build side 
of joins), and since 'size_' is a constant we really don't want to be doing 
this switch here.

There are at least two ways to fix this:
- Use codegen, eg. introduce a MinMaxFilter::CodegenInsert() which returns an 
llvm::Function. For other types, this would just call 
codegen->GetFunction(GetInsertIRFunctionType()) but for DecimalMinMaxFilter it 
could replace this with a constant (eg. by having a function GetSize() here and 
using ReplaceCallSites()), then call this new function in 
FilterContext::CodegenInsert().

- Use macros. eg. have separate classes for Decimal4MinMaxFilter, 
Decimal8MinMaxFilter, and Decimal16MinMaxFilter generated with macros just as 
we do for NUMERIC_MIN_MAX_FILTER, and return an object of the appropriate type 
from MinMaxFilter::Create()

I have a strong preference for the macros option as codegen can be tricky and I 
think using macros will be more efficient (eg. because it doesn't add to 
codegen time, eliminates switching in other function like 
DecimalMinMaxFilter::Or()/ToThrift())


http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/12113/14/be/src/util/min-max-filter.h@298
PS14, Line 298:   static int GetSize(int precision) {
I think that you can just use ColumnType::GetDecimalByteSize() instead of 
defining this function yourself.


http://gerrit.cloudera.org:8080/#/c/12113/13/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/12113/13/tests/query_test/test_runtime_filters.py@116
PS13, Line 116: self.run_test_case('QueryTest/decimal_min_max_filters', 
vector)
> How do I find out the time it takes to run?
Hmm, alright. It doesn't make much sense to me to split the decimal-related 
tests based on the join type (and my guess is that isn't really what Tim had in 
mind anyways), so my recommendation is to move all of the decimal-specific test 
cases into this decimal-specific file (eg. test cases 5+), and leave the 
decimal stuff that you added to non-decimal-specific test cases in 
min_max_filters.test (eg. the stuff in test case 1).

Then probably have test_decimal_min_max_filters run only in exhaustive (based 
on a combination of the long running time and the assumption that the tests in 
case 1 combined with various tests elsewhere that address different join types 
cover most of this well enough to catch most issues, and the specific things 
being tested by your join-type-specific decimal tests are unlikely to break 
regularly).



--
To view, visit http://gerrit.cloudera.org:8080/12113
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib7e7278e902160d7060f8097290bc172d9031f94
Gerrit-Change-Number: 12113
Gerrit-PatchSet: 14
Gerrit-Owner: Janaki Lahorani 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Janaki Lahorani 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 07 Jan 2019 19:42:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7979: Enhance decoders to support value-skipping

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12172 )

Change subject: IMPALA-7979: Enhance decoders to support value-skipping
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1726/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib848f1bd71735fe84e8064daf700417b32589f57
Gerrit-Change-Number: 12172
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 07 Jan 2019 19:13:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7468: Port CancelQueryFInstances() to KRPC.

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12142 )

Change subject: IMPALA-7468: Port CancelQueryFInstances() to KRPC.
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1725/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12142
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I625030c3f1068061aa029e6e242f016cadd84969
Gerrit-Change-Number: 12142
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Mon, 07 Jan 2019 19:12:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP: IMPALA-5843: Use page index in Parquet files to skip pages

2019-01-07 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12065 )

Change subject: WIP: IMPALA-5843: Use page index in Parquet files to skip pages
..


Patch Set 1:

(7 comments)

Thanks for the comments. This time I only reply to the ones related to the 
decoders. I started a new patch that only focus on the decoders and 
value-skipping: https://gerrit.cloudera.org/#/c/12172/

http://gerrit.cloudera.org:8080/#/c/12065/1/be/src/exec/parquet/parquet-bool-decoder.cc
File be/src/exec/parquet/parquet-bool-decoder.cc:

PS1:
> Might be worth pulling out this code change to a separate patch and adding
Good idea, thanks!


http://gerrit.cloudera.org:8080/#/c/12065/1/be/src/exec/parquet/parquet-bool-decoder.cc@70
PS1, Line 70:   int skip_cached = min(num_unpacked_values_ - 
unpacked_value_idx_, num_values);
> Maybe add DCHECK_GT(num_values, 0).
Done


http://gerrit.cloudera.org:8080/#/c/12065/1/be/src/exec/parquet/parquet-bool-decoder.cc@86
PS1, Line 86:   } else {
> The else branch seems inconsistent with ParquetBoolDecoder::DecodeValue, wh
I'm not sure I follow.
When we skip values we try to avoid decoding and unpacking whenever possible. 
Am I missing something?


http://gerrit.cloudera.org:8080/#/c/12065/1/be/src/util/rle-encoding.h
File be/src/util/rle-encoding.h:

http://gerrit.cloudera.org:8080/#/c/12065/1/be/src/util/rle-encoding.h@636
PS1, Line 636: DCHECK_GE(num_values, 0);
> Does skipping 0 values make sense?
Changed to GT


http://gerrit.cloudera.org:8080/#/c/12065/1/be/src/util/rle-encoding.h@686
PS1, Line 686: inline bool RleBatchDecoder::SkipLiteralValues(
 : int32_t num_literals_to_skip) {
> nit: could fit to one line
Done


http://gerrit.cloudera.org:8080/#/c/12065/1/be/src/util/rle-encoding.h@688
PS1, Line 688: DCHECK_GE(num_literals_to_skip, 0);
> Does skipping 0 values make sense?
Changed to GT


http://gerrit.cloudera.org:8080/#/c/12065/1/be/src/util/rle-encoding.h@698
PS1, Line 698:   // run avoid ending on a non-byte boundary.
> nit: missing 'to'
Done



--
To view, visit http://gerrit.cloudera.org:8080/12065
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a
Gerrit-Change-Number: 12065
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 07 Jan 2019 18:44:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7979: Enhance decoders to support value-skipping

2019-01-07 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12172


Change subject: IMPALA-7979: Enhance decoders to support value-skipping
..

IMPALA-7979: Enhance decoders to support value-skipping

This commit adds value-skipping functionality to the decoders.
Value-skipping is useful when we know that we won't need the
following N values, so instead of decoding and dropping them, we
can just "jump" through them.

This feature is a prerequisite for Parquet page skipping (IMPALA-5843).

I added backend tests for all the decoders.

Change-Id: Ib848f1bd71735fe84e8064daf700417b32589f57
---
M be/src/exec/parquet/CMakeLists.txt
A be/src/exec/parquet/parquet-bool-decoder-test.cc
M be/src/exec/parquet/parquet-bool-decoder.cc
M be/src/exec/parquet/parquet-bool-decoder.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
M be/src/util/rle-encoding.h
M be/src/util/rle-test.cc
10 files changed, 515 insertions(+), 16 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/12172/1
--
To view, visit http://gerrit.cloudera.org:8080/12172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib848f1bd71735fe84e8064daf700417b32589f57
Gerrit-Change-Number: 12172
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-7468: Port CancelQueryFInstances() to KRPC.

2019-01-07 Thread Andrew Sherman (Code Review)
Andrew Sherman has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/12142 )

Change subject: IMPALA-7468: Port CancelQueryFInstances() to KRPC.
..

IMPALA-7468: Port CancelQueryFInstances() to KRPC.

When the Coordinator needs to cancel a query (for example because a user
has hit Control-C), it does this by sending a CancelQueryFInstances
message to each fragment instance. This change switches this code to use
KRPC.

Add new protobuf definitions for the messages, and remove the old thrift
definitions. Move the server-side implementation of Cancel() from
ImpalaInternalService to ControlService. Rework the scheduler so
that the FInstanceExecParams always contains the KRPC address of the
fragment executors, this address can then be used if a query is to be
cancelled.

For now keep the KRPC calls to CancelQueryFInstances() as synchronous.

While moving the client-side code, remove the fault injection code that
was inserted with FAULT_INJECTION_SEND_RPC_EXCEPTION and
FAULT_INJECTION_RECV_RPC_EXCEPTION (triggered by running impalad with
--fault_injection_rpc_exception_type=1) as this tickles code in
client-cache.h which is now not used.

TESTING:
  Ran all end-to-end tests.
  No new tests as test_cancellation.py provides good coverage.
  Checked in debugger that DebugAction style fault injection (triggered
  from test_cancellation.py) was working correctly.

Change-Id: I625030c3f1068061aa029e6e242f016cadd84969
---
M be/src/runtime/backend-client.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/coordinator-backend-state.h
M be/src/scheduling/query-schedule.h
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/control-service.cc
M be/src/service/control-service.h
M be/src/service/impala-internal-service.cc
M be/src/service/impala-internal-service.h
M common/protobuf/control_service.proto
M common/thrift/ImpalaInternalService.thrift
12 files changed, 192 insertions(+), 139 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/42/12142/4
--
To view, visit http://gerrit.cloudera.org:8080/12142
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I625030c3f1068061aa029e6e242f016cadd84969
Gerrit-Change-Number: 12142
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] IMPALA-7468: Port CancelQueryFInstances() to KRPC.

2019-01-07 Thread Andrew Sherman (Code Review)
Andrew Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12142 )

Change subject: IMPALA-7468: Port CancelQueryFInstances() to KRPC.
..


Patch Set 3:

(11 comments)

Thanks for the code reviews

http://gerrit.cloudera.org:8080/#/c/12142/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12142/3//COMMIT_MSG@9
PS3, Line 9: (usually because a user has
   : hit Control-C)
> Not that important, but is this really true? Some other cancellation situat
Thanks good point, "usually" is definitely wrong.


http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/runtime/coordinator-backend-state.h@239
PS3, Line 239: Execution backend.
> Thrift execution backend?
Done


http://gerrit.cloudera.org:8080/#/c/12142/1/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/12142/1/be/src/runtime/coordinator-backend-state.h@294
PS1, Line 294:   Status DoCancelQueryFInstancesRrpcWithRetry(
> I believe our current clang-format rules will tell you to put the symbols n
Done


http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/runtime/coordinator-backend-state.cc@427
PS3, Line 427: auto
> We generally avoid the use of auto as it hurts readability.
Done


http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/runtime/coordinator-backend-state.cc@440
PS3, Line 440: DoCancelQueryFInstancesRrpcWithRetry
> Is there a reasonable way to make this generic like DoRpcWithRetry()? We'll
I'm doing retry using a lambda


http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/scheduling/query-schedule.h
File be/src/scheduling/query-schedule.h:

http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/scheduling/query-schedule.h@88
PS3, Line 88: execution backend
> Thrift execution backend?
Done


http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/service/control-service.h
File be/src/service/control-service.h:

http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/service/control-service.h@41
PS3, Line 41:   public:
> Not your code, but while you're here would you mind fixing the spacing in t
Done


http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/service/control-service.h@63
PS3, Line 63:   /// Cancel any executing fragment instances for the query id 
specified in 'req'.
> nit: could you move this under ReportExecStatus() to keep all of the rpc ha
Done


http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/service/control-service.h@64
PS3, Line 64: void
> virtual
Done


http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/service/control-service.cc
File be/src/service/control-service.cc:

http://gerrit.cloudera.org:8080/#/c/12142/3/be/src/service/control-service.cc@173
PS3, Line 173: "query"
> Since this is a constant, its not really necessary to Substitute() it.
Done


http://gerrit.cloudera.org:8080/#/c/12142/3/common/protobuf/control_service.proto
File common/protobuf/control_service.proto:

http://gerrit.cloudera.org:8080/#/c/12142/3/common/protobuf/control_service.proto@181
PS3, Line 181:   rpc CancelQueryFInstances(CancelQueryFInstancesRequestPB) 
returns (CancelQueryFInstancesResponsePB);
> Long line.
Done, I added IMPALA-8047 for the clang-format support



--
To view, visit http://gerrit.cloudera.org:8080/12142
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I625030c3f1068061aa029e6e242f016cadd84969
Gerrit-Change-Number: 12142
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Mon, 07 Jan 2019 18:36:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7087: Read Parquet decimal columns with lower precision/scale

2019-01-07 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12163 )

Change subject: IMPALA-7087: Read Parquet decimal columns with lower 
precision/scale
..


Patch Set 1:

> (1 comment)
 >
 > Thanks for taking a look @Csaba! I've only responded to your
 > comment on overflow handling so far, as I want to get that behavior
 > locked down first.

Are you sure that the query will be aborted - isn't it only one warning / file? 
If there are "normal" files processed before the one with overflows, then some 
rows will be returned before running to the error. In the current 
implementation it is also possible that some rows are returned from the file 
before reaching the overflowing value.

I am also not completely sure about the best approach.

I would prefer the error to be non-fatal, so to print one warning/file and 
continue the query.

If no important use case is lost this way, then I would prefer to use the 
formula and skip files based on metadata instead of checking every row.

This would make it possible to print a useful error message, so that the user 
will know the precision needed to process the file, and can call ALTER TABLE to 
increase precision / decrease scale and make the file usable. If the error is 
printed based the first row, then this can become a long iterative process.


--
To view, visit http://gerrit.cloudera.org:8080/12163
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iafc8efd12379a39756e3e70f022a81a636dadb61
Gerrit-Change-Number: 12163
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Mon, 07 Jan 2019 18:22:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7924: Generate Thrift 11 Python Code

2019-01-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12036 )

Change subject: IMPALA-7924: Generate Thrift 11 Python Code
..


Patch Set 2:

(2 comments)

LGTM, just one ask about the commit message then I can start GVD

http://gerrit.cloudera.org:8080/#/c/12036/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12036/2//COMMIT_MSG@13
PS2, Line 13: The Thrift 11 Python deserialization code has some big performance
Might be worth briefly mentioning how to take advantage of it - does one just 
set PYTHONPATH to include it manually?


http://gerrit.cloudera.org:8080/#/c/12036/2/common/thrift/CMakeLists.txt
File common/thrift/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/12036/2/common/thrift/CMakeLists.txt@a230
PS2, Line 230:
> thrift-deps depends on thrift11-py, which depends on thrift-ext-data-src. M
Ah I missed that. I suppose that works.



--
To view, visit http://gerrit.cloudera.org:8080/12036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3432c3e29d28ec3ef6a0a22156a18910f511fed0
Gerrit-Change-Number: 12036
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 07 Jan 2019 18:22:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7924: Generate Thrift 11 Python Code

2019-01-07 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12036 )

Change subject: IMPALA-7924: Generate Thrift 11 Python Code
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12036/2/common/thrift/CMakeLists.txt
File common/thrift/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/12036/2/common/thrift/CMakeLists.txt@a230
PS2, Line 230:
> Did you mean to remove this as a dependency?
thrift-deps depends on thrift11-py, which depends on thrift-ext-data-src. My 
understanding is thats its basically a chain of dependencies, so the intention 
was to just add thrift11-py as a node in that chain. But I might be missing 
something, pretty new to CMake :)



--
To view, visit http://gerrit.cloudera.org:8080/12036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3432c3e29d28ec3ef6a0a22156a18910f511fed0
Gerrit-Change-Number: 12036
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 07 Jan 2019 18:12:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7917 (Part 1): Decouple Sentry from Impala

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12020 )

Change subject: IMPALA-7917 (Part 1): Decouple Sentry from Impala
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/1724/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If1fd1df0b38ddd7cfa41299e95f5827f8a9e9c1f
Gerrit-Change-Number: 12020
Gerrit-PatchSet: 7
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 07 Jan 2019 18:09:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7625: test web pages.py backend tests are failing

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12024 )

Change subject: IMPALA-7625: test_web_pages.py backend tests are failing
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2092afe1bfaec30cd3e7b0040f06865e43fe62fb
Gerrit-Change-Number: 12024
Gerrit-PatchSet: 6
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 07 Jan 2019 18:09:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7625: test web pages.py backend tests are failing

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12024 )

Change subject: IMPALA-7625: test_web_pages.py backend tests are failing
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3616/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/12024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2092afe1bfaec30cd3e7b0040f06865e43fe62fb
Gerrit-Change-Number: 12024
Gerrit-PatchSet: 6
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 07 Jan 2019 18:09:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7625: test web pages.py backend tests are failing

2019-01-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12024 )

Change subject: IMPALA-7625: test_web_pages.py backend tests are failing
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2092afe1bfaec30cd3e7b0040f06865e43fe62fb
Gerrit-Change-Number: 12024
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 07 Jan 2019 18:09:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7924: Generate Thrift 11 Python Code

2019-01-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12036 )

Change subject: IMPALA-7924: Generate Thrift 11 Python Code
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12036/2/common/thrift/CMakeLists.txt
File common/thrift/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/12036/2/common/thrift/CMakeLists.txt@a230
PS2, Line 230:
Did you mean to remove this as a dependency?



--
To view, visit http://gerrit.cloudera.org:8080/12036
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3432c3e29d28ec3ef6a0a22156a18910f511fed0
Gerrit-Change-Number: 12036
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 07 Jan 2019 18:05:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8047 Support .proto files in .clang-format

2019-01-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12165 )

Change subject: IMPALA-8047 Support .proto files in .clang-format
..


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12165/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12165/2//COMMIT_MSG@20
PS2, Line 20: This produces only a few diffs when run against the existing 
Impala
Thanks for including this, made it way easier to review.



--
To view, visit http://gerrit.cloudera.org:8080/12165
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0c2dcfc21fc8f9206adb64166fbd05580516791f
Gerrit-Change-Number: 12165
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 07 Jan 2019 17:57:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8047 Support .proto files in .clang-format

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12165 )

Change subject: IMPALA-8047 Support .proto files in .clang-format
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12165
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0c2dcfc21fc8f9206adb64166fbd05580516791f
Gerrit-Change-Number: 12165
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 07 Jan 2019 17:57:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8047 Support .proto files in .clang-format

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12165 )

Change subject: IMPALA-8047 Support .proto files in .clang-format
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3615/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/12165
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0c2dcfc21fc8f9206adb64166fbd05580516791f
Gerrit-Change-Number: 12165
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 07 Jan 2019 17:57:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6742: Profiles of running queries should include execution summary.

2019-01-07 Thread Joe McDonnell (Code Review)
Joe McDonnell has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11591 )

Change subject: IMPALA-6742: Profiles of running queries should include 
execution summary.
..

IMPALA-6742: Profiles of running queries should include execution summary.

Currently execution summary is not included in the profiles of running
queries, and it's only reported when the query is finished. This jira makes
the execution summary to the profile reported when queries are still running.

Testing:
Added unit test.
Done with real cluster.

Change-Id: Idc7f714c9427d4b26d4e78cf27ceca2b0b336699
Reviewed-on: http://gerrit.cloudera.org:8080/11591
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 
---
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M tests/query_test/test_observability.py
3 files changed, 42 insertions(+), 7 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/11591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idc7f714c9427d4b26d4e78cf27ceca2b0b336699
Gerrit-Change-Number: 11591
Gerrit-PatchSet: 6
Gerrit-Owner: Yongjun Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Yongjun Zhang 


[Impala-ASF-CR] IMPALA-8021: Add estimated cardinality to EXPLAIN output

2019-01-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12136 )

Change subject: IMPALA-8021: Add estimated cardinality to EXPLAIN output
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/test/java/org/apache/impala/testutil/TestUtils.java
File fe/src/test/java/org/apache/impala/testutil/TestUtils.java:

http://gerrit.cloudera.org:8080/#/c/12136/12/fe/src/test/java/org/apache/impala/testutil/TestUtils.java@207
PS12, Line 207:  If found, then parse the values and compare them. Report a 
match
  :* if the values are within 5%, else report a mismatch.
> Actually, looking at this again, it appears that we are just comparing the
Yeah I'm not sure if we expect the cardinality estimates to be exactly 
reproducible from run-to-run. I can't think of a reason why they wouldn't be 
exactly equal - the collected stats should always be the same (there's no 
randomness or non-determinism in the cardinality there AFAIK).

There are some other cases where there's variation in the exact numbers 
produced by the planner - memory reservations for scans depend on the file 
size, which can vary over time slightly for various reasons.

I figured that having this tolerance factor didn't significantly weaken the 
tests (especially since we didn't even have cardinality before) and would 
alleviate some amount of labour if estimates change marginally.



--
To view, visit http://gerrit.cloudera.org:8080/12136
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie9aa2d715b04cbb279aaffec8c5692686562d986
Gerrit-Change-Number: 12136
Gerrit-PatchSet: 12
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 07 Jan 2019 17:42:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7917 (Part 1): Decouple Sentry from Impala

2019-01-07 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/12020 )

Change subject: IMPALA-7917 (Part 1): Decouple Sentry from Impala
..

IMPALA-7917 (Part 1): Decouple Sentry from Impala

The first part of this patch is to provide an initial work to decouple
Sentry from Impala by creating a generic authorization provider
interface that Sentry implements. The idea is to allow more
authorization providers in the future. The patch updates the following:
- Renamed Authorizeable to Authorizable to fix typographical error.
- Moved any clases that uses Sentry specific code to
  org.apache.impala.authorization.sentry package and created interfaces
  when necessary.
- Moved all generic authorization related classes to
  org.apache.impala.authorization package.
- Minor clean up on authorization related code.

In this patch, switching the authorization provider implementation
still requires updating the code in many different places. A follow up
patch will make it easy to switch an authorization provider
implementation.

Testing:
- Ran all FE tests
- Ran all E2E authorization tests

Change-Id: If1fd1df0b38ddd7cfa41299e95f5827f8a9e9c1f
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java
M fe/src/main/java/org/apache/impala/analysis/DropFunctionStmt.java
M fe/src/main/java/org/apache/impala/analysis/HdfsUri.java
M fe/src/main/java/org/apache/impala/analysis/ResetMetadataStmt.java
R fe/src/main/java/org/apache/impala/authorization/Authorizable.java
M fe/src/main/java/org/apache/impala/authorization/AuthorizationChecker.java
M fe/src/main/java/org/apache/impala/authorization/AuthorizationConfig.java
R fe/src/main/java/org/apache/impala/authorization/AuthorizationException.java
C 
fe/src/main/java/org/apache/impala/authorization/AuthorizationPolicyReaderException.java
R fe/src/main/java/org/apache/impala/authorization/AuthorizationProxy.java
D fe/src/main/java/org/apache/impala/authorization/AuthorizeableUri.java
A fe/src/main/java/org/apache/impala/authorization/Authorizer.java
R 
fe/src/main/java/org/apache/impala/authorization/AuthorizerUnavailableException.java
M fe/src/main/java/org/apache/impala/authorization/Privilege.java
M fe/src/main/java/org/apache/impala/authorization/PrivilegeRequest.java
M fe/src/main/java/org/apache/impala/authorization/PrivilegeRequestBuilder.java
A fe/src/main/java/org/apache/impala/authorization/sentry/ImpalaAction.java
R 
fe/src/main/java/org/apache/impala/authorization/sentry/ImpalaActionFactory.java
R 
fe/src/main/java/org/apache/impala/authorization/sentry/ImpalaPrivilegeModel.java
R 
fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthProvider.java
A 
fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizable.java
R 
fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizableColumn.java
R 
fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizableDb.java
R 
fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizableFn.java
R 
fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizableServer.java
R 
fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizableTable.java
A 
fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizableUri.java
A 
fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationChecker.java
A 
fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationConfig.java
A 
fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationPolicy.java
R fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizer.java
R fe/src/main/java/org/apache/impala/authorization/sentry/SentryConfig.java
R fe/src/main/java/org/apache/impala/authorization/sentry/SentryUtil.java
M fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
D fe/src/main/java/org/apache/impala/common/SentryPolicyReaderException.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalysisSessionFixture.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeAuthStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AuditingTest.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M 

[Impala-ASF-CR] IMPALA-7917 (Part 1): Decouple Sentry from Impala

2019-01-07 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12020 )

Change subject: IMPALA-7917 (Part 1): Decouple Sentry from Impala
..


Patch Set 7:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/AuthorizationProxy.java
File fe/src/main/java/org/apache/impala/authorization/AuthorizationProxy.java:

http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/AuthorizationProxy.java@222
PS5, Line 222: m authoriz
> nit: should be lower case
Done


http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/Privilege.java
File fe/src/main/java/org/apache/impala/authorization/Privilege.java:

http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/Privilege.java@23
PS5, Line 23:  * List of Impala privileges.
> not understanding this
Done


http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/Privilege.java@40
PS5, Line 40:   static {
> it's sort of odd that you use a static block to set up the 'privileges_' me
Java doesn't allow this construct.

public enum Privilege {
  ...
  VIEW_METADATA(EnumSet.of(INSERT, SELECT, REFRESH), true); // compilation 
error.

  Privilege(EnumSet privileges, boolean anyOf) { ... }
}


http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/Privilege.java@53
PS5, Line 53: Privilege> implied_;
> maybe this should be like 'implied_' or something? If I understand correctl
Yeah implied is a better name. It basically means if a SQL has VIEW_METADATA 
privilege, it requires either INSERT, SELECT, or REFRESH.


http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/PrivilegeRequest.java
File fe/src/main/java/org/apache/impala/authorization/PrivilegeRequest.java:

http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/PrivilegeRequest.java@45
PS5, Line 45:
> seems to no longer be permitted
Done


http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/PrivilegeRequestBuilder.java
File 
fe/src/main/java/org/apache/impala/authorization/PrivilegeRequestBuilder.java:

http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/PrivilegeRequestBuilder.java@74
PS5, Line 74:   /**
> worth adding Preconditions.checkState(authorizable_ == null) in all of thes
Done


http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/sentry/ImpalaAction.java
File fe/src/main/java/org/apache/impala/authorization/sentry/ImpalaAction.java:

http://gerrit.cloudera.org:8080/#/c/12020/5/fe/src/main/java/org/apache/impala/authorization/sentry/ImpalaAction.java@48
PS5, Line 48: mpalaAction(String value, int code) {
: bitFieldAction_ = new BitFieldAction(value, code);
:   }
> do we still need to support both versions? maybe we can have a TODO to remo
We only need to support Sentry 2.0.0. I'll update the code. Done.



--
To view, visit http://gerrit.cloudera.org:8080/12020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If1fd1df0b38ddd7cfa41299e95f5827f8a9e9c1f
Gerrit-Change-Number: 12020
Gerrit-PatchSet: 7
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 07 Jan 2019 17:34:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7087: Read Parquet decimal columns with lower precision/scale

2019-01-07 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12163 )

Change subject: IMPALA-7087: Read Parquet decimal columns with lower 
precision/scale
..


Patch Set 1:

(1 comment)

Thanks for taking a look @Csaba! I've only responded to your comment on 
overflow handling so far, as I want to get that behavior locked down first.

http://gerrit.cloudera.org:8080/#/c/12163/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12163/1//COMMIT_MSG@25
PS1, Line 25: If the underlying Parquet files contain rows that cannot be 
converted to
: the higher scale without overflow, an error is thrown and the 
query
: fails. This is different from other engines such as Hive which 
set the
: row to NULL and allow the query to run to completion without 
error.
> I do not know the use case, so I may be wrong, but I dislike the idea of fa
Yes, the example you provided would trigger an overflow. The formula you 
provided sounds like it should work.

I couldn't find much info in the SQL standard on how to handle these types of 
overflows, so maybe the behavior is ambiguous. For relational databases, you 
get an error when you try to insert a row that causes an overflow. However, the 
error is upon row insertion, so the table is still queryable after a failed 
attempt at inserting the row.

As mentioned in the comments, Hive sets the row to NULL, and the query runs to 
completion.

So sounds like there are three options: (1) the current behavior in the patch, 
(2) use the formula you provided to prevent overflows in which case no runtime 
errors or warnings are necessary, or (3) set the row to NULL and print out a 
warning (I think we do this for other types of Parquet scanning errors?).

FWIW, the current behavior is exactly what you described. If you put a Parquet 
file into a table with a lower precision / scale, the table will be unqueryable 
until the files are removed. So if we went with option (2), any file inside the 
table that doesn't match the formula will make the table unqueryable. The main 
difference with this patch is that the table will be unqueryable only if 
certain rows that overflow are present.

I don't feel strongly about any of the approaches, and I'm not super familiar 
with how Impala handles these types of issues, so open to suggestions.



--
To view, visit http://gerrit.cloudera.org:8080/12163
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iafc8efd12379a39756e3e70f022a81a636dadb61
Gerrit-Change-Number: 12163
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Mon, 07 Jan 2019 16:26:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6503: Support reading complex types from ORC format files

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12168 )

Change subject: IMPALA-6503: Support reading complex types from ORC format files
..


Patch Set 5:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/1723/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/12168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
Gerrit-Change-Number: 12168
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 07 Jan 2019 15:28:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7087: Read Parquet decimal columns with lower precision/scale

2019-01-07 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12163 )

Change subject: IMPALA-7087: Read Parquet decimal columns with lower 
precision/scale
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/12163/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12163/1//COMMIT_MSG@25
PS1, Line 25: If the underlying Parquet files contain rows that cannot be 
converted to
: the higher scale without overflow, an error is thrown and the 
query
: fails. This is different from other engines such as Hive which 
set the
: row to NULL and allow the query to run to completion without 
error.
I do not know the use case, so I may be wrong, but I dislike the idea of 
failing the query if the Parquet file is not corrupt and only the data is "too 
large". If a file with such data is added to a table, the table can become 
unusable and Impala offers no tool to get rid of the problematic file.

Would it make sense to be stricter during scale + precision checking to exclude 
the possibility of overflow? If I understand correctly, overflow can only occur 
if the scale in the file is lower then the expected scale: e.g value 2 in 
decimal (1,0) would overflow when converted to decimal(1, 1). If precision_file 
- scale_file <= precision_expected - scale_expected, then the conversion should 
happen without overflow, shouldn't it?


http://gerrit.cloudera.org:8080/#/c/12163/1/be/src/exec/parquet/parquet-metadata-utils.cc
File be/src/exec/parquet/parquet-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/12163/1/be/src/exec/parquet/parquet-metadata-utils.cc@243
PS1, Line 243: // TODO: we could allow a mismatch and do a conversion at this 
step.
This TODO could be probably updated.


http://gerrit.cloudera.org:8080/#/c/12163/1/be/src/exec/parquet/parquet-metadata-utils.cc@255
PS1, Line 255: // TODO: we could allow a mismatch and do a conversion at this 
step.
This TODO could be probably updated.


http://gerrit.cloudera.org:8080/#/c/12163/1/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-widening.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-widening.test:

http://gerrit.cloudera.org:8080/#/c/12163/1/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-widening.test@85
PS1, Line 85: 
Please add some tests with 'where col=const' clauses to exercise the dictionary 
and stat filtering code paths.

Dictionary filtering is disabled if 'needs_conversion_' is true, but it may be 
enabled in the future if conversion will be moved to dictionary construction.

I think that stat filtering won't work correctly with the current 
implementation - the stat value will be interpreted as if it had the same 
precision/scale as the SQL column, which can lead to incorrectly dropping row 
groups.

I had to deal with somewhat similar problems in 
https://gerrit.cloudera.org/#/c/11057/


http://gerrit.cloudera.org:8080/#/c/12163/1/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/12163/1/tests/query_test/test_scanners.py@842
PS1, Line 842: decimal_stored_as_int32.parquet
I do not see this file in the patch and it is not mentioned in the README.


http://gerrit.cloudera.org:8080/#/c/12163/1/tests/query_test/test_scanners.py@848
PS1, Line 848: decimal_stored_as_int64.parquet
I do not see this file in the patch and it is not mentioned in the README.


http://gerrit.cloudera.org:8080/#/c/12163/1/tests/query_test/test_scanners.py@856
PS1, Line 856: binary_decimal_no_dictionary.parquet
I do not see this file in the patch and it is not mentioned in the README.



--
To view, visit http://gerrit.cloudera.org:8080/12163
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iafc8efd12379a39756e3e70f022a81a636dadb61
Gerrit-Change-Number: 12163
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Mon, 07 Jan 2019 15:14:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6503: Support reading complex types from ORC format files

2019-01-07 Thread Quanlong Huang (Code Review)
Quanlong Huang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12168


Change subject: IMPALA-6503: Support reading complex types from ORC format files
..

IMPALA-6503: Support reading complex types from ORC format files

We’ve supported reading primitive types from ORC files (IMPALA-5717).
In this patch we add support for complex types (struct/array/map).

In IMPALA-5717, we depend on the ORC lib to read ORC binaries. The ORC
lib can materialize ORC column binaries into its representation
(orc::ColumnVectorBatch), so we don’t need to do anything about
decoding/decompression in hdfs-orc-scanner. Since it already supports
complex types, we’ll still depend on it.

What we need to add in IMPALA-6503 are two things:
1. Specify which nested columns we need to the ORC lib
2. Transform outputs of ORC lib (nested orc::ColumnVectorBatch) into
  Impala’s representation

To format the materialization, we implement several ORC column readers
used in hdfs-orc-scanner. Each kind of reader treats a column type.
Don’t like the Parquet readers (used in hdfs-parquet-scanner) which
materializes Parquet column binaries into tuple values directly, the ORC
readers (in hdfs-orc-scanner) just need to transform outputs of the ORC
lib into tuple/slot values.

Tests:
* Enable existing tests for complex types (test_nested_types.py,
test_tpch_nested_queries.py) for ORC.

Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
---
M be/src/exec/CMakeLists.txt
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
A be/src/exec/orc-column-readers.cc
A be/src/exec/orc-column-readers.h
A be/src/exec/orc-metadata-utils.cc
A be/src/exec/orc-metadata-utils.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
A testdata/ComplexTypesTbl/README
A testdata/ComplexTypesTbl/nonnullable.orc
A testdata/ComplexTypesTbl/nullable.orc
M testdata/bin/create-load-data.sh
M testdata/bin/load_nested.py
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M 
testdata/workloads/functional-planner/queries/PlannerTest/complex-types-file-formats.test
M testdata/workloads/functional-query/queries/QueryTest/max-nesting-depth.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-subplan.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch-limit.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch-mem-limit.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-tpch.test
M testdata/workloads/tpch_nested/tpch_nested_core.csv
M testdata/workloads/tpch_nested/tpch_nested_dimensions.csv
M testdata/workloads/tpch_nested/tpch_nested_exhaustive.csv
M testdata/workloads/tpch_nested/tpch_nested_pairwise.csv
M tests/query_test/test_nested_types.py
M tests/query_test/test_tpch_nested_queries.py
29 files changed, 1,656 insertions(+), 442 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/12168/5
--
To view, visit http://gerrit.cloudera.org:8080/12168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
Gerrit-Change-Number: 12168
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 


[Impala-ASF-CR] IMPALA-6503: Support reading complex types from ORC format files

2019-01-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12168 )

Change subject: IMPALA-6503: Support reading complex types from ORC format files
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12168/5/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12168/5/be/src/exec/hdfs-orc-scanner.cc@291
PS5, Line 291:   
RETURN_IF_ERROR(schema_resolver_->ResolveColumn(tuple_desc.tuple_path(), , 
_field,
line too long (93 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/12168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
Gerrit-Change-Number: 12168
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 07 Jan 2019 14:55:52 +
Gerrit-HasComments: Yes