[Impala-ASF-CR] IMPALA-10156: test unmatched schema should use unique database

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16758 )

Change subject: IMPALA-10156: test_unmatched_schema should use unique_database
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16758
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I66c5388d62f87795176b20243a4ccca70412b18c
Gerrit-Change-Number: 16758
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sat, 21 Nov 2020 06:54:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10156: test unmatched schema should use unique database

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16758 )

Change subject: IMPALA-10156: test_unmatched_schema should use unique_database
..

IMPALA-10156: test_unmatched_schema should use unique_database

This updates the test to use the unique_database fixture instead
of trying to generate a unique table name on its own.

Change-Id: I66c5388d62f87795176b20243a4ccca70412b18c
Reviewed-on: http://gerrit.cloudera.org:8080/16758
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M tests/query_test/test_scanners.py
1 file changed, 14 insertions(+), 18 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16758
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I66c5388d62f87795176b20243a4ccca70412b18c
Gerrit-Change-Number: 16758
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6689/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 10
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 21 Nov 2020 05:46:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-20 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 10: Code-Review+2

Rebased on latest master.  Carry forward +2 .


--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 10
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 21 Nov 2020 05:36:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7711/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 9
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 21 Nov 2020 04:17:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-20 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 9:

> Patch Set 8:
>
> > Patch Set 8:
> >
> > > Patch Set 8:
> > >
> > > > Patch Set 8: Verified-1
> > > >
> > > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6687/
> > >
> > > There's 1 failure:
> > > [gw1] FAILED 
> > > query_test/test_exprs.py::TestExprLimits::test_statement_expression_limit
> > >
> > > However, on my desktop I ran query_test/test_exprs.py and all tests under 
> > > it passed.
> >
> > On Jenkins this test hit an OOM GC overhead limit exceeded:
> > gw1] linux2 -- Python 2.7.16 
> > /home/ubuntu/Impala/bin/../infra/python/env-gcc7.5.0/bin/python
> > query_test/test_exprs.py:176: in test_statement_expression_limit
> > assert re.search(expected_err_re, str(err))
> > E   assert None
> > E+  where None = ('Exceeded the 
> > statement expression limit \\(25\\)\nStatement has .* expressions.', 
> > "ImpalaBeeswaxException:\n INNER EXCEPTION:  > 'beeswaxd.ttypes.BeeswaxException'>\n MESSAGE: OutOfMemoryError: GC 
> > overhead limit exceeded\n")
> > E+where  = re.search
> > E+and   "ImpalaBeeswaxException:\n INNER EXCEPTION:  > 'beeswaxd.ttypes.BeeswaxException'>\n MESSAGE: OutOfMemoryError: GC 
> > overhead limit exceeded\n" = str(ImpalaBeeswaxException())
>
> Oh wait, there's actually another failure in FE:
>
> 23:47:52 [INFO] Running org.apache.impala.analysis.ParserTest
> 23:47:52 [ERROR] Tests run: 98, Failures: 1, Errors: 0, Skipped: 0, Time 
> elapsed: 0.829 s <<< FAILURE! - in org.apache.impala.analysis.ParserTest
> 23:47:52 [ERROR] TestGetErrorMsg(org.apache.impala.analysis.ParserTest)  Time 
> elapsed: 0.005 s  <<< FAILURE!
> at 
> org.apache.impala.analysis.ParserTest.ParserError(ParserTest.java:77)
> at 
> org.apache.impala.analysis.ParserTest.TestGetErrorMsg(ParserTest.java:3475)
>
> Will need to look into this one.

PS9 fixes the ParserTest issue..just had to update the list of expected 
keywords in the test after a WHERE clause.  Not yet sure about the other OOM 
failure since I cannot repro locally.


--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 9
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 21 Nov 2020 03:58:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-20 Thread Aman Sinha (Code Review)
Hello Qifan Chen, Shant Hovsepian, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16723

to look at the new patch set (#9).

Change subject: IMPALA-10314: Optimize planning time for simple limits
..

IMPALA-10314: Optimize planning time for simple limits

This patch optimizes the planning time for simple limit
queries by only considering a minimal set of partitions
whose file descriptors add up to N (the specified limit).
Each file is conservatively estimated to contain 1 row.

This reduces the number of partitions processed by
HdfsScanNode.computeScanRangeLocations() which, according
to query profiling, has been the main contributor to the
planning time especially for large number of partitions.
Further, within each partition, we only consider the number
of non-empty files that brings the total to N.

This is an opt-in optimization. A new planner option
OPTIMIZE_SIMPLE_LIMIT enables this optimization. Further,
if there's a WHERE clause, it must have an 'always_true'
hint in order for the optimization to be considered. For
example:
  set optimize_simple_limit = true;
  SELECT * FROM T
WHERE /* +always_true */ 
  LIMIT 10;

If there are too many empty files in the partitions, it is
possible that the query may produce fewer rows although
those are still valid rows.

Testing:
 - Added planner tests for the optimization
 - Ran query_test.py tests by enabling the optimize_simple_limit
 - Added an e2e test. Since result rows are non-deterministic,
   only simple count(*) query on top of subquery with limit
   was added.

Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test
M 
testdata/workloads/functional-query/queries/QueryTest/range-constant-propagation.test
17 files changed, 507 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16723/9
--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 9
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-20 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 8:

> Patch Set 8:
>
> > Patch Set 8:
> >
> > > Patch Set 8: Verified-1
> > >
> > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6687/
> >
> > There's 1 failure:
> > [gw1] FAILED 
> > query_test/test_exprs.py::TestExprLimits::test_statement_expression_limit
> >
> > However, on my desktop I ran query_test/test_exprs.py and all tests under 
> > it passed.
>
> On Jenkins this test hit an OOM GC overhead limit exceeded:
> gw1] linux2 -- Python 2.7.16 
> /home/ubuntu/Impala/bin/../infra/python/env-gcc7.5.0/bin/python
> query_test/test_exprs.py:176: in test_statement_expression_limit
> assert re.search(expected_err_re, str(err))
> E   assert None
> E+  where None = ('Exceeded the 
> statement expression limit \\(25\\)\nStatement has .* expressions.', 
> "ImpalaBeeswaxException:\n INNER EXCEPTION:  'beeswaxd.ttypes.BeeswaxException'>\n MESSAGE: OutOfMemoryError: GC overhead 
> limit exceeded\n")
> E+where  = re.search
> E+and   "ImpalaBeeswaxException:\n INNER EXCEPTION:  'beeswaxd.ttypes.BeeswaxException'>\n MESSAGE: OutOfMemoryError: GC overhead 
> limit exceeded\n" = str(ImpalaBeeswaxException())

Oh wait, there's actually another failure in FE:

23:47:52 [INFO] Running org.apache.impala.analysis.ParserTest
23:47:52 [ERROR] Tests run: 98, Failures: 1, Errors: 0, Skipped: 0, Time 
elapsed: 0.829 s <<< FAILURE! - in org.apache.impala.analysis.ParserTest
23:47:52 [ERROR] TestGetErrorMsg(org.apache.impala.analysis.ParserTest)  Time 
elapsed: 0.005 s  <<< FAILURE!
at org.apache.impala.analysis.ParserTest.ParserError(ParserTest.java:77)
at 
org.apache.impala.analysis.ParserTest.TestGetErrorMsg(ParserTest.java:3475)

Will need to look into this one.


--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 8
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 21 Nov 2020 03:24:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-20 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 8:

> Patch Set 8:
>
> > Patch Set 8: Verified-1
> >
> > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6687/
>
> There's 1 failure:
> [gw1] FAILED 
> query_test/test_exprs.py::TestExprLimits::test_statement_expression_limit
>
> However, on my desktop I ran query_test/test_exprs.py and all tests under it 
> passed.

On Jenkins this test hit an OOM GC overhead limit exceeded:
gw1] linux2 -- Python 2.7.16 
/home/ubuntu/Impala/bin/../infra/python/env-gcc7.5.0/bin/python
query_test/test_exprs.py:176: in test_statement_expression_limit
assert re.search(expected_err_re, str(err))
E   assert None
E+  where None = ('Exceeded the 
statement expression limit \\(25\\)\nStatement has .* expressions.', 
"ImpalaBeeswaxException:\n INNER EXCEPTION: \n MESSAGE: OutOfMemoryError: GC overhead 
limit exceeded\n")
E+where  = re.search
E+and   "ImpalaBeeswaxException:\n INNER EXCEPTION: \n MESSAGE: OutOfMemoryError: GC overhead 
limit exceeded\n" = str(ImpalaBeeswaxException())


--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 8
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 21 Nov 2020 03:14:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-20 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 8:

> Patch Set 8: Verified-1
>
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6687/

There's 1 failure:
[gw1] FAILED 
query_test/test_exprs.py::TestExprLimits::test_statement_expression_limit

However, on my desktop I ran query_test/test_exprs.py and all tests under it 
passed.


--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 8
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 21 Nov 2020 03:09:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 8: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6687/


--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 8
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 21 Nov 2020 02:48:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10258: Fixed flaky TestQueryRetries.test original query cancel

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16763 )

Change subject: IMPALA-10258: Fixed flaky 
TestQueryRetries.test_original_query_cancel
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7710/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16763
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib89f7b01a0f2a66a97f312e779a4ab04f4f347f3
Gerrit-Change-Number: 16763
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sat, 21 Nov 2020 02:08:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10312: bump timeout in test ddl queries are closed

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16762 )

Change subject: IMPALA-10312: bump timeout in test_ddl_queries_are_closed
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7709/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16762
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5885df6494122dffe2bbc6877cec3b90a9eb4ec6
Gerrit-Change-Number: 16762
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 21 Nov 2020 01:58:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10258: Fixed flaky TestQueryRetries.test original query cancel

2020-11-20 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16763


Change subject: IMPALA-10258: Fixed flaky 
TestQueryRetries.test_original_query_cancel
..

IMPALA-10258: Fixed flaky TestQueryRetries.test_original_query_cancel

When TestQueryRetries.test_original_query_cancel was ran on s3
with query option spool_query_results enabled, the query was
timeout before reaching the expected state.
This patch double the timeout for the query when the test is
running on S3 and double the timeout for query to reaching
"FINISHED" state.

For IMPALA-10109, test_retries_from_cancellation_pool did not
trigger query-retry when one of impalad was killed. It seems
that membership updating message was not received and processed
by coordinator before reaching terminated state, hence the
query-retry was not triggered.
This patch reduce the heartbeat_frequency and
max_missed_heartbeats so that statestore will take much less
time to update membership when one impalad was killed so that
coordinator could start query-retry.

Testing:
 - Ran the two tests in a loop for more than 3 hours. The test
   failures did not happen.

Change-Id: Ib89f7b01a0f2a66a97f312e779a4ab04f4f347f3
---
M tests/custom_cluster/test_query_retries.py
1 file changed, 12 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/16763/1
--
To view, visit http://gerrit.cloudera.org:8080/16763
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib89f7b01a0f2a66a97f312e779a4ab04f4f347f3
Gerrit-Change-Number: 16763
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-10312: bump timeout in test ddl queries are closed

2020-11-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16762


Change subject: IMPALA-10312: bump timeout in test_ddl_queries_are_closed
..

IMPALA-10312: bump timeout in test_ddl_queries_are_closed

This increases the timeout from 10s to 30s for waiting for the
queries to be closed under the theory that the test failure is
caused by random slowness.

Change-Id: I5885df6494122dffe2bbc6877cec3b90a9eb4ec6
---
M tests/shell/test_shell_interactive.py
1 file changed, 5 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/16762/1
--
To view, visit http://gerrit.cloudera.org:8080/16762
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5885df6494122dffe2bbc6877cec3b90a9eb4ec6
Gerrit-Change-Number: 16762
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9050: fix TestScanRangeLengths params

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16761 )

Change subject: IMPALA-9050: fix TestScanRangeLengths params
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7708/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16761
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b8591335dcdc85ce27674b35661444a46d30d5a
Gerrit-Change-Number: 16761
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 21 Nov 2020 01:29:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10156: test unmatched schema should use unique database

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16758 )

Change subject: IMPALA-10156: test_unmatched_schema should use unique_database
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6688/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16758
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I66c5388d62f87795176b20243a4ccca70412b18c
Gerrit-Change-Number: 16758
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sat, 21 Nov 2020 01:29:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10156: test unmatched schema should use unique database

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16758 )

Change subject: IMPALA-10156: test_unmatched_schema should use unique_database
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16758
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I66c5388d62f87795176b20243a4ccca70412b18c
Gerrit-Change-Number: 16758
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sat, 21 Nov 2020 01:29:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4238: make TestClientSsl more robust

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16760 )

Change subject: IMPALA-4238: make TestClientSsl more robust
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7707/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0c884f76659005e7245a156ee33c249b86662b75
Gerrit-Change-Number: 16760
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 21 Nov 2020 01:15:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9050: fix TestScanRangeLengths params

2020-11-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16761


Change subject: IMPALA-9050: fix TestScanRangeLengths params
..

IMPALA-9050: fix TestScanRangeLengths params

This test is only relevant from HDFS-based table formats. The option
under test does not affect behaviour for Kudu or HBase.

Change-Id: I9b8591335dcdc85ce27674b35661444a46d30d5a
---
M tests/query_test/test_scanners.py
1 file changed, 3 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/16761/1
--
To view, visit http://gerrit.cloudera.org:8080/16761
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9b8591335dcdc85ce27674b35661444a46d30d5a
Gerrit-Change-Number: 16761
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-4238: make TestClientSsl more robust

2020-11-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16760


Change subject: IMPALA-4238: make TestClientSsl more robust
..

IMPALA-4238: make TestClientSsl more robust

This changes the test to wait until it is executing in the backend
before trying to cancel it. This should remove planning time as
a variable that might cause the test to be flaky (e.g. if planning
is slow on S3 because of the time taken to list files).

Also dump the /queries debug page when the assertion is hit to
aid debugging.

Change-Id: I0c884f76659005e7245a156ee33c249b86662b75
---
M tests/custom_cluster/test_client_ssl.py
1 file changed, 8 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/16760/1
--
To view, visit http://gerrit.cloudera.org:8080/16760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0c884f76659005e7245a156ee33c249b86662b75
Gerrit-Change-Number: 16760
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10156: test unmatched schema should use unique database

2020-11-20 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16758 )

Change subject: IMPALA-10156: test_unmatched_schema should use unique_database
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16758
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I66c5388d62f87795176b20243a4ccca70412b18c
Gerrit-Change-Number: 16758
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sat, 21 Nov 2020 00:48:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10156: test unmatched schema should use unique database

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16758 )

Change subject: IMPALA-10156: test_unmatched_schema should use unique_database
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7706/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16758
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I66c5388d62f87795176b20243a4ccca70412b18c
Gerrit-Change-Number: 16758
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Sat, 21 Nov 2020 00:24:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10156: test unmatched schema should use unique database

2020-11-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16758


Change subject: IMPALA-10156: test_unmatched_schema should use unique_database
..

IMPALA-10156: test_unmatched_schema should use unique_database

This updates the test to use the unique_database fixture instead
of trying to generate a unique table name on its own.

Change-Id: I66c5388d62f87795176b20243a4ccca70412b18c
---
M tests/query_test/test_scanners.py
1 file changed, 14 insertions(+), 18 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/16758/1
--
To view, visit http://gerrit.cloudera.org:8080/16758
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I66c5388d62f87795176b20243a4ccca70412b18c
Gerrit-Change-Number: 16758
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-11-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325 Parquet scan should use min/max statistics to skip 
pages based on equi-join predicate
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16720/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16720/12//COMMIT_MSG@9
PS12, Line 9: This patch adds the logic to utilize min/max stats
> Does this patch also leads to utilizing min/max filters per-row, similarly
I think this would be a good thing to do (I think the patch does this 
automatically).

One question I have is, if this is the case, whether the min/max filter is 
evaluated before the bloom filter or vice-versa. That might have some perf 
implications. It's not clear to me which order is better or whether it really 
matters.



--
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 12
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 20 Nov 2020 23:17:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-11-20 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325 Parquet scan should use min/max statistics to skip 
pages based on equi-join predicate
..


Patch Set 12:

(3 comments)

I have some high level comments. I plan to go through the patch in more detail 
later.

http://gerrit.cloudera.org:8080/#/c/16720/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16720/12//COMMIT_MSG@9
PS12, Line 9: This patch adds the logic to utilize min/max stats
Does this patch also leads to utilizing min/max filters per-row, similarly to 
bloom filters?


http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc@549
PS12, Line 549: if ( eval_min_max ) {
I am wondering if it is possible to handle min/max runtime filters more 
similarly to existing stat filtering.

A possible idea is to split the new filter do distinct data_min>join_max and 
data_maxhttps://github.com/apache/impala/blob/master/be/src/exec/parquet/parquet-column-stats.cc#L87


http://gerrit.cloudera.org:8080/#/c/16720/12/be/src/exec/parquet/hdfs-parquet-scanner.cc@862
PS12, Line 862: TYPE_DATETIME
You meant TYPE_TIMESTAMP, right? DATETIME is completely unsupported in Impala



--
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 12
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 20 Nov 2020 22:33:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-20 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..

IMPALA-10152: Add support for Iceberg HiveCatalog

HiveCatalog is one of Iceberg's catalog implementations. It uses
the Hive metastore and it is the recommended catalog implementation
when the table data is stored in object stores like S3.

This commit updates the Iceberg version to a newer one, and it also
retrieves Iceberg from the CDP distribution because that version of
Iceberg is built against Hive 3 (Impala is only compatible with
Hive 3).

This commit makes HiveCatalog the default Iceberg catalog in Impala
because it can be used in more environments (e.g. cloud stores),
and it is more featureful. Also, other engines that store their
table metadata in HMS will probably use HiveCatalog as well.

Tables stored in HiveCatalog are similar to Kudu tables with HMS
integration, i.e. modifying an Iceberg table via the Iceberg APIs
also modifies the HMS table. So in CatalogOpExecutor we handle
such Iceberg tables similarly to integrated Kudu tables.

Testing:
 * Added e2e tests for creating, writing, and altering Iceberg
   tables
 * Added SHOW CREATE TABLE tests

Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Reviewed-on: http://gerrit.cloudera.org:8080/16721
Reviewed-by: wangsheng 
Tested-by: Impala Public Jenkins 
---
M bin/impala-config.sh
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
16 files changed, 577 insertions(+), 92 deletions(-)

Approvals:
  wangsheng: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6687/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 8
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 20 Nov 2020 21:20:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 7: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 7
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 20 Nov 2020 21:20:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 8: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 8
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 20 Nov 2020 21:20:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10334 test stats extrapolation output doesn't match on erasure coding build

2020-11-20 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16756 )

Change subject: IMPALA-10334 test_stats_extrapolation output doesn't match on 
erasure coding build
..


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16756/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16756/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-10334 test_stats_extrapolation output doesn't match on 
erasure coding build
nit: we usually put : after jira



--
To view, visit http://gerrit.cloudera.org:8080/16756
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16c11aa0a1ec2d4569c272d2454915041039f950
Gerrit-Change-Number: 16756
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 20 Nov 2020 20:57:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10334 test stats extrapolation output doesn't match on erasure coding build

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16756 )

Change subject: IMPALA-10334 test_stats_extrapolation output doesn't match on 
erasure coding build
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7705/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16756
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16c11aa0a1ec2d4569c272d2454915041039f950
Gerrit-Change-Number: 16756
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 20 Nov 2020 20:46:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..


Patch Set 7: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 20 Nov 2020 20:39:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10334 test stats extrapolation output doesn't match on erasure coding build

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16756 )

Change subject: IMPALA-10334 test_stats_extrapolation output doesn't match on 
erasure coding build
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7704/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16756
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16c11aa0a1ec2d4569c272d2454915041039f950
Gerrit-Change-Number: 16756
Gerrit-PatchSet: 1
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 20 Nov 2020 20:28:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10334 test stats extrapolation output doesn't match on erasure coding build

2020-11-20 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/16756 )

Change subject: IMPALA-10334 test_stats_extrapolation output doesn't match on 
erasure coding build
..

IMPALA-10334 test_stats_extrapolation output doesn't match on erasure coding 
build

This patch skips test_stats_extrapolation for erasure code builds. The
reason is that an extra erasure code information line can be included
in the scan explain section when a hdfs table is erasure coded. This
makes the explain output different between a normal build and an
erasure code build. A new reason 'contain_full_explain' is added to
SkipIfEC to facilitate this.

Testing:
  Ran erasure coding version of the EE and CLUSTER tests.
  Ran core tests

Change-Id: I16c11aa0a1ec2d4569c272d2454915041039f950
---
M tests/common/skip.py
M tests/metadata/test_stats_extrapolation.py
2 files changed, 4 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/16756/2
--
To view, visit http://gerrit.cloudera.org:8080/16756
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16c11aa0a1ec2d4569c272d2454915041039f950
Gerrit-Change-Number: 16756
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] [WIP] IMPALA-10334 test stats extrapolation output doesn't match on erasure coding build

2020-11-20 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16756


Change subject: [WIP] IMPALA-10334 test_stats_extrapolation output doesn't 
match on erasure coding build
..

[WIP] IMPALA-10334 test_stats_extrapolation output doesn't match on erasure 
coding build

This patch skips test_stats_extrapolation for erasure code builds. The
reason is that an extra information line can be included in the scan
explain section when a hdfs table is erasure coded, which makes the
explain output non-deterministic. A new reason 'contain_full_explain'
is added to SkipIfEC to facilitate this.

Testing:
  Ran erasure coding version of the EE and CLUSTER tests.
  Ran core tests

Change-Id: I16c11aa0a1ec2d4569c272d2454915041039f950
---
M tests/common/skip.py
M tests/metadata/test_stats_extrapolation.py
2 files changed, 4 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/16756/1
--
To view, visit http://gerrit.cloudera.org:8080/16756
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I16c11aa0a1ec2d4569c272d2454915041039f950
Gerrit-Change-Number: 16756
Gerrit-PatchSet: 1
Gerrit-Owner: Qifan Chen 


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7703/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 7
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 20 Nov 2020 19:00:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7702/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 6
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 20 Nov 2020 18:55:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-20 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/cup/sql-parser.cup@3115
PS5, Line 3115: // clause. An attempt was made to set this for individual exprs
> I guess this is a bit limiting in the it applies only to the whole where cl
That was my original attempt too..setting the hint at the predicate level but 
when I was running into shift/reduce conflicts.  I gave it another try today 
with a few variations and still could not get it to work.  I have left a 
comment in the code about this.

For reference, here's one change I tried:
expr ::=
  non_pred_expr:e
  {: RESULT = e; :}
  | opt_plan_hints:pred_hints predicate:p
  {:
p.setPredicateHints(pred_hints);
RESULT = p;
  :};

 This generated quite a few conflicts..example:
Warning : *** Shift/Reduce conflict found in state #253
  between opt_plan_hints ::= (*)
  and case_expr ::= (*) KW_CASE expr case_when_clause_list case_else_clause 
KW_END
  and case_expr ::= (*) KW_CASE case_when_clause_list case_else_clause 
KW_END
  under symbol KW_CASE
  Resolved in favor of shifting.

Warning : *** Shift/Reduce conflict found in state #253
  between opt_plan_hints ::= (*)
  and cast_expr ::= (*) KW_CAST LPAREN expr KW_AS type_def cast_format_val 
RPAREN
  under symbol KW_CAST
  Resolved in favor of shifting.
..
...


http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/analysis/Predicate.java
File fe/src/main/java/org/apache/impala/analysis/Predicate.java:

http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/analysis/Predicate.java@30
PS5, Line 30:  {
> maybe hasAlwaysTrueHint_ just to make it crystal-clear that it's not actual
Changed this and the names of the setter/getter also.


http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@869
PS5, Line 869: if ((fsHasBlocks && fd.getNumFileBlocks() == 0)
> nit: use braces for multi-line if
Done


http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@870
PS5, Line 870: d.getFileLength() <
> We already had to deal with a similar issue here
I will follow up with the doc writer on this.



--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 7
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 20 Nov 2020 18:47:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-20 Thread Aman Sinha (Code Review)
Hello Qifan Chen, Shant Hovsepian, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16723

to look at the new patch set (#7).

Change subject: IMPALA-10314: Optimize planning time for simple limits
..

IMPALA-10314: Optimize planning time for simple limits

This patch optimizes the planning time for simple limit
queries by only considering a minimal set of partitions
whose file descriptors add up to N (the specified limit).
Each file is conservatively estimated to contain 1 row.

This reduces the number of partitions processed by
HdfsScanNode.computeScanRangeLocations() which, according
to query profiling, has been the main contributor to the
planning time especially for large number of partitions.
Further, within each partition, we only consider the number
of non-empty files that brings the total to N.

This is an opt-in optimization. A new planner option
OPTIMIZE_SIMPLE_LIMIT enables this optimization. Further,
if there's a WHERE clause, it must have an 'always_true'
hint in order for the optimization to be considered. For
example:
  set optimize_simple_limit = true;
  SELECT * FROM T
WHERE /* +always_true */ 
  LIMIT 10;

If there are too many empty files in the partitions, it is
possible that the query may produce fewer rows although
those are still valid rows.

Testing:
 - Added planner tests for the optimization
 - Ran query_test.py tests by enabling the optimize_simple_limit
 - Added an e2e test. Since result rows are non-deterministic,
   only simple count(*) query on top of subquery with limit
   was added.

Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test
M 
testdata/workloads/functional-query/queries/QueryTest/range-constant-propagation.test
16 files changed, 505 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16723/7
--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 7
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-20 Thread Aman Sinha (Code Review)
Hello Qifan Chen, Shant Hovsepian, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16723

to look at the new patch set (#6).

Change subject: IMPALA-10314: Optimize planning time for simple limits
..

IMPALA-10314: Optimize planning time for simple limits

This patch optimizes the planning time for simple limit
queries by only considering a minimal set of partitions
whose file descriptors add up to N (the specified limit).
Each file is conservatively estimated to contain 1 row.

This reduces the number of partitions processed by
HdfsScanNode.computeScanRangeLocations() which, according
to query profiling, has been the main contributor to the
planning time especially for large number of partitions.
Further, within each partition, we only consider the number
of non-empty files that brings the total to N.

This is an opt-in optimization. A new planner option
OPTIMIZE_SIMPLE_LIMIT enables this optimization. Further,
if there's a WHERE clause, it must have an 'always_true'
hint in order for the optimization to be considered. For
example:
  set optimize_simple_limit = true;
  SELECT * FROM T
WHERE /* +always_true */ 
  LIMIT 10;

If there are too many empty files in the partitions, it is
possible that the query may produce fewer rows although
those are still valid rows.

Testing:
 - Added planner tests for the optimization
 - Ran query_test.py tests by enabling the optimize_simple_limit
 - Added an e2e test. Since result rows are non-deterministic,
   only simple count(*) query on top of subquery with limit
   was added.

Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test
M 
testdata/workloads/functional-query/queries/QueryTest/range-constant-propagation.test
16 files changed, 505 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16723/6
--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 6
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9856: Enable result spooling by default.

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16755 )

Change subject: IMPALA-9856: Enable result spooling by default.
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7701/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16755
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4
Gerrit-Change-Number: 16755
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 20 Nov 2020 18:27:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9856: Enable result spooling by default.

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16755 )

Change subject: IMPALA-9856: Enable result spooling by default.
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16755/1/tests/custom_cluster/test_admission_controller.py
File tests/custom_cluster/test_admission_controller.py:

http://gerrit.cloudera.org:8080/#/c/16755/1/tests/custom_cluster/test_admission_controller.py@873
PS1, Line 873:
flake8: E203 whitespace before ':'


http://gerrit.cloudera.org:8080/#/c/16755/1/tests/custom_cluster/test_observability.py
File tests/custom_cluster/test_observability.py:

http://gerrit.cloudera.org:8080/#/c/16755/1/tests/custom_cluster/test_observability.py@38
PS1, Line 38:
flake8: E203 whitespace before ':'


http://gerrit.cloudera.org:8080/#/c/16755/1/tests/query_test/test_udfs.py
File tests/query_test/test_udfs.py:

http://gerrit.cloudera.org:8080/#/c/16755/1/tests/query_test/test_udfs.py@623
PS1, Line 623:
flake8: E203 whitespace before ':'


http://gerrit.cloudera.org:8080/#/c/16755/1/tests/query_test/test_udfs.py@630
PS1, Line 630: _
flake8: E501 line too long (98 > 90 characters)



--
To view, visit http://gerrit.cloudera.org:8080/16755
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4
Gerrit-Change-Number: 16755
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 20 Nov 2020 18:06:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9856: Enable result spooling by default.

2020-11-20 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16755


Change subject: IMPALA-9856: Enable result spooling by default.
..

IMPALA-9856: Enable result spooling by default.

Result spooling has been relatively stable since it was introduced, and
it has several benefits described in IMPALA-8656. This patch enable
result spooling (SPOOL_QUERY_RESULTS) query options by default.

Furthermore, some tests need to be adjusted to account for result
spooling by default. The following are the adjustment categories and
list of tests that fall under such category.

Change in assertions:
PlannerTest
TpcdsPlannerTest
custom_cluster/test_admission_controller.py::TestAdmissionController::test_dedicated_coordinator_planner_estimates
custom_cluster/test_admission_controller.py::TestAdmissionController::test_memory_rejection
custom_cluster/test_admission_controller.py::TestAdmissionController::test_pool_mem_limit_configs
metadata/test_explain.py::TestExplain::test_explain_level2
metadata/test_explain.py::TestExplain::test_explain_level3
metadata/test_stats_extrapolation.py::TestStatsExtrapolation::test_stats_extrapolation

Increase BUFFER_POOL_LIMIT:
query_test/test_queries.py::TestQueries::test_analytic_fns
query_test/test_runtime_filters.py::TestRuntimeRowFilters::test_row_filter_reservation
query_test/test_spilling.py::TestSpillingBroadcastJoins::test_spilling_broadcast_joins
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_aggs
query_test/test_udfs.py::TestUdfExecution::test_mem_limits

Increase MEM_LIMIT:
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_hdfs_scanner_thread_mem_scaling

Increase MAX_ROW_SIZE:
custom_cluster/test_parquet_max_page_header.py::TestParquetMaxPageHeader::test_large_page_header_config
query_test/test_scanners.py::TestTextSplitDelimiters::test_text_split_across_buffers_delimiter

Disable result spooling to maintain assertion:
custom_cluster/test_admission_controller.py::TestAdmissionController::test_set_request_pool
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_host_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_timeout_reason_pool_memory
custom_cluster/test_admission_controller.py::TestAdmissionController::test_queue_reasons_memory
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_fetched_rows
custom_cluster/test_query_retries.py::TestQueryRetries::test_retry_finished_query
custom_cluster/test_scratch_disk.py::TestScratchDir::test_no_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_existing_dirs
custom_cluster/test_scratch_disk.py::TestScratchDir::test_non_writable_dirs
query_test/test_kudu.py::TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan
query_test/test_mem_usage_scaling.py::TestScanMemLimit::test_kudu_scan_mem_usage
query_test/test_query_mem_limit.py::TestCodegenMemLimit::test_codegen_mem_limit
query_test/test_query_mem_limit.py::TestQueryMemLimit::test_mem_limit
query_test/test_sort.py::TestQueryFullSort::test_multiple_mem_limits_full_output
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_regression_exhaustive
shell/test_shell_client.py::TestShellClient::test_fetch_size

Disable result spooling to avoid crash / silent error:
custom_cluster/test_observability.py::TestObservability::test_host_profile_jvm_gc_metrics
query_test/test_insert.py::TestInsertQueries::test_insert_large_string
query_test/test_queries.py::TestQueriesParquetTables::test_very_large_strings
query_test/test_scanners.py::TestWideRow::test_wide_row
query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_large_rows
query_test/test_udfs.py::TestUdfExecution::test_java_udfs
query_test/test_udfs.py::TestUdfTargeted::test_udf_profile

Further investigation need to be done to address the last category.

Testing:
- Pass exhaustive tests.

Change-Id: I9e360c1428676d8f3fab5d95efee18aca085eba4
---
M common/thrift/ImpalaInternalService.thrift
M testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection-hdfs-num-rows-est-enabled.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M 

[Impala-ASF-CR] IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.

2020-11-20 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16728 )

Change subject: IMPALA-10332: Add file formats to HdfsScanNode's thrift 
representation.
..

IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.

List all file formats that a HdfsScanNode needs to process in any
fragment instance. It is possible that some file formats will not be
needed in all fragment instances.

This is a step towards sharing codegen between different impala
backends. Using the file formats provided in the thrift file, a backend
can codegen code for file formats that are not needed in its own process
but are needed in other fragment instances running on other backends,
and the resulting binary can be shared between multiple backends.

Codegenning for file formats will be done based on the thrift message
and not on what is needed for the actual backend. This leads to some
extra work in case a file format is not needed for the current backend
and codegen sharing is not available (at this point it is not
implemented). However, the overall number of such cases is low.

Also adding the file formats to the node's explain string at level 3.

Testing:
 - Added tests to verify that the file formats are present in the
   explain string at level 3.

Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d
Reviewed-on: http://gerrit.cloudera.org:8080/16728
Tested-by: Impala Public Jenkins 
Reviewed-by: Csaba Ringhofer 
---
M be/src/exec/hdfs-scan-node-base.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
5 files changed, 52 insertions(+), 8 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Csaba Ringhofer: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/16728
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d
Gerrit-Change-Number: 16728
Gerrit-PatchSet: 11
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.

2020-11-20 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16728 )

Change subject: IMPALA-10332: Add file formats to HdfsScanNode's thrift 
representation.
..


Patch Set 10: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16728
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d
Gerrit-Change-Number: 16728
Gerrit-PatchSet: 10
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 20 Nov 2020 17:54:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16728 )

Change subject: IMPALA-10332: Add file formats to HdfsScanNode's thrift 
representation.
..


Patch Set 10: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16728
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d
Gerrit-Change-Number: 16728
Gerrit-PatchSet: 10
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 20 Nov 2020 17:25:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10346: Rename Iceberg test tables' name with specific cases

2020-11-20 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16753 )

Change subject: IMPALA-10346: Rename Iceberg test tables' name with specific 
cases
..

IMPALA-10346: Rename Iceberg test tables' name with specific cases

We used some unrecognized table names in Iceberg related test cases,
such as iceberg_test1/iceberg_test2 and so on, which resulted in poor
readability. So we better rename these Iceberg test tables' name by
specific cases.

Testing:
  - Renamed tables' name in iceberg-create.test
  - Renamed tables' name in iceberg-alter.test

Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5
Reviewed-on: http://gerrit.cloudera.org:8080/16753
Reviewed-by: Zoltan Borok-Nagy 
Tested-by: Impala Public Jenkins 
---
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
2 files changed, 62 insertions(+), 62 deletions(-)

Approvals:
  Zoltan Borok-Nagy: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/16753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5
Gerrit-Change-Number: 16753
Gerrit-PatchSet: 3
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 


[Impala-ASF-CR] IMPALA-10346: Rename Iceberg test tables' name with specific cases

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16753 )

Change subject: IMPALA-10346: Rename Iceberg test tables' name with specific 
cases
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5
Gerrit-Change-Number: 16753
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 20 Nov 2020 16:21:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10282: Implement ds cpc sketch() and ds cpc estimate() functions

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16656 )

Change subject: IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() 
functions
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7700/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
Gerrit-Change-Number: 16656
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 20 Nov 2020 15:36:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-20 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..


Patch Set 7:

Thanks for the review! After +2 we can run the verify job with DRY_RUN=false, 
so on success the job submits the patch.


--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 20 Nov 2020 15:18:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10282: Implement ds cpc sketch() and ds cpc estimate() functions

2020-11-20 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16656


Change subject: IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() 
functions
..

IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions

These functions can be used to get cardinality estimates of data
using CPC algorithm from Apache DataSketches. ds_cpc_sketch()
receives a dataset, e.g. a column from a table, and returns a
serialized CPC sketch in string format. This can be written to a
table or be fed directly to ds_cpc_estimate() that returns the
cardinality estimate for that sketch.

Similar to the HLL sketch, the primary use-case for the CPC sketch
is for counting distinct values as a stream, and then merging
multiple sketches together for a total distinct count.

For more details about Apache DataSketches' CPC see:
http://datasketches.apache.org/docs/CPC/CPC.html
Figures-of-Merit Comparison of the HLL and CPC Sketches see:
https://datasketches.apache.org/docs/DistinctCountMeritComparisons.html

Testing:
 - Added some tests running estimates for small datasets where the
   amount of data is small enough to get the correct results.
 - Ran manual tests on tpch_parquet.lineitem to compare perfomance
   with ndv(). Depending on data characteristics ndv() appears 2x-3x
   faster. CPC gives closer estimate than current ndv(). CPC is more
   accurate than HLL in some cases

Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/cpc_sketches_from_hive.parquet
A testdata/workloads/functional-query/queries/QueryTest/datasketches-cpc.test
M tests/query_test/test_datasketches.py
12 files changed, 398 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/56/16656/4
--
To view, visit http://gerrit.cloudera.org:8080/16656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
Gerrit-Change-Number: 16656
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6686/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 20 Nov 2020 15:10:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-20 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..


Patch Set 7: Code-Review+2

Thanks for this new feature, Zoltan, LGTM!


--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 20 Nov 2020 15:10:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325 Parquet scan should use min/max statistics to skip 
pages based on equi-join predicate
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7699/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 12
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 20 Nov 2020 14:45:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-11-20 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325 Parquet scan should use min/max statistics to skip 
pages based on equi-join predicate
..

IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on 
equi-join predicate

This patch adds the logic to utilize min/max stats for Parquet row
groups or pages to skip these entities when they don't qualify an
equi-join predicate.

A new class of predicates called overlap predicates is introduced to aid
in the determination of whether a Parquet row group or a page overlap
with the a range computed from the hash join. If not, then the entire
Parquet row group or the page are skipped. The new class of predicates
co-exist with the existing min/max conjuncts that are introduced based
on the local scan predicates.

Both classes of predicates can work individually or togther with each
other. The overlap predicates are evaualted after the existing min/max
conjuncts.

To be done:
1. Handle all data types;
2. Unit/performance testing;
3. Core testing.

Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
---
M be/src/exec/exec-node.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
14 files changed, 386 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16720/12
--
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 12
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7698/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 20 Nov 2020 13:40:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-20 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..


Patch Set 7:

(2 comments)

Thanks for the quick review!

http://gerrit.cloudera.org:8080/#/c/16721/6/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16721/6/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@84
PS6, Line 84:   // Internal Iceberg table property that specifies the absolute 
path of the current
:   // table metadata.
> We may add some explain here: this propery is only valid for 'hive.catalog'
Done


http://gerrit.cloudera.org:8080/#/c/16721/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test:

http://gerrit.cloudera.org:8080/#/c/16721/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@127
PS6, Line 127: CREATE TABLE iceberg_hadoop_cat_with_metadata_locacti
> Shall we add a test for HadoopCatalog here?
Done



--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 20 Nov 2020 13:18:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-20 Thread Zoltan Borok-Nagy (Code Review)
Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16721

to look at the new patch set (#7).

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..

IMPALA-10152: Add support for Iceberg HiveCatalog

HiveCatalog is one of Iceberg's catalog implementations. It uses
the Hive metastore and it is the recommended catalog implementation
when the table data is stored in object stores like S3.

This commit updates the Iceberg version to a newer one, and it also
retrieves Iceberg from the CDP distribution because that version of
Iceberg is built against Hive 3 (Impala is only compatible with
Hive 3).

This commit makes HiveCatalog the default Iceberg catalog in Impala
because it can be used in more environments (e.g. cloud stores),
and it is more featureful. Also, other engines that store their
table metadata in HMS will probably use HiveCatalog as well.

Tables stored in HiveCatalog are similar to Kudu tables with HMS
integration, i.e. modifying an Iceberg table via the Iceberg APIs
also modifies the HMS table. So in CatalogOpExecutor we handle
such Iceberg tables similarly to integrated Kudu tables.

Testing:
 * Added e2e tests for creating, writing, and altering Iceberg
   tables
 * Added SHOW CREATE TABLE tests

Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
---
M bin/impala-config.sh
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
16 files changed, 577 insertions(+), 92 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16721/7
--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 


[Impala-ASF-CR] IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16728 )

Change subject: IMPALA-10332: Add file formats to HdfsScanNode's thrift 
representation.
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7697/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16728
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d
Gerrit-Change-Number: 16728
Gerrit-PatchSet: 9
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 20 Nov 2020 12:16:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.

2020-11-20 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16728 )

Change subject: IMPALA-10332: Add file formats to HdfsScanNode's thrift 
representation.
..


Patch Set 9: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16728
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d
Gerrit-Change-Number: 16728
Gerrit-PatchSet: 9
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 20 Nov 2020 11:57:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16728 )

Change subject: IMPALA-10332: Add file formats to HdfsScanNode's thrift 
representation.
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6685/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16728
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d
Gerrit-Change-Number: 16728
Gerrit-PatchSet: 10
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 20 Nov 2020 11:55:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.

2020-11-20 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/16728 )

Change subject: IMPALA-10332: Add file formats to HdfsScanNode's thrift 
representation.
..

IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.

List all file formats that a HdfsScanNode needs to process in any
fragment instance. It is possible that some file formats will not be
needed in all fragment instances.

This is a step towards sharing codegen between different impala
backends. Using the file formats provided in the thrift file, a backend
can codegen code for file formats that are not needed in its own process
but are needed in other fragment instances running on other backends,
and the resulting binary can be shared between multiple backends.

Codegenning for file formats will be done based on the thrift message
and not on what is needed for the actual backend. This leads to some
extra work in case a file format is not needed for the current backend
and codegen sharing is not available (at this point it is not
implemented). However, the overall number of such cases is low.

Also adding the file formats to the node's explain string at level 3.

Testing:
 - Added tests to verify that the file formats are present in the
   explain string at level 3.

Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d
---
M be/src/exec/hdfs-scan-node-base.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
5 files changed, 52 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/16728/9
--
To view, visit http://gerrit.cloudera.org:8080/16728
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d
Gerrit-Change-Number: 16728
Gerrit-PatchSet: 9
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10346: Rename Iceberg test tables' name with specific cases

2020-11-20 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16753 )

Change subject: IMPALA-10346: Rename Iceberg test tables' name with specific 
cases
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5
Gerrit-Change-Number: 16753
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 20 Nov 2020 11:47:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-20 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..


Patch Set 6:

(2 comments)

Thanks for a quick turnaround, just two nits.

http://gerrit.cloudera.org:8080/#/c/16721/6/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16721/6/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@84
PS6, Line 84:   // Internal Iceberg table property that specifies the absolute 
path of the current
:   // table metadata.
We may add some explain here: this propery is only valid for 'hive.catalog' or 
'HiveCatalog'


http://gerrit.cloudera.org:8080/#/c/16721/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test:

http://gerrit.cloudera.org:8080/#/c/16721/6/testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test@127
PS6, Line 127: CREATE TABLE iceberg_hive_cat_with_metadata_locaction
Shall we add a test for HadoopCatalog here?



--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 20 Nov 2020 11:22:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10346: Rename Iceberg test tables' name with specific cases

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16753 )

Change subject: IMPALA-10346: Rename Iceberg test tables' name with specific 
cases
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7696/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5
Gerrit-Change-Number: 16753
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 20 Nov 2020 11:17:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10346: Rename Iceberg test tables' name with specific cases

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16753 )

Change subject: IMPALA-10346: Rename Iceberg test tables' name with specific 
cases
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6684/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/16753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5
Gerrit-Change-Number: 16753
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 20 Nov 2020 10:56:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10346: Rename Iceberg test tables' name with specific cases

2020-11-20 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16753 )

Change subject: IMPALA-10346: Rename Iceberg test tables' name with specific 
cases
..


Patch Set 2:

(2 comments)

Thanks for review!

http://gerrit.cloudera.org:8080/#/c/16753/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16753/1//COMMIT_MSG@7
PS1, Line 7: cases
> nit: probably the word 'cases' fits better here
Done


http://gerrit.cloudera.org:8080/#/c/16753/1//COMMIT_MSG@10
PS1, Line 10: iceberg_test
> nit: iceberg_test1
Done



--
To view, visit http://gerrit.cloudera.org:8080/16753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5
Gerrit-Change-Number: 16753
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 20 Nov 2020 10:56:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10346: Rename Iceberg test tables' name with specific cases

2020-11-20 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/16753 )

Change subject: IMPALA-10346: Rename Iceberg test tables' name with specific 
cases
..

IMPALA-10346: Rename Iceberg test tables' name with specific cases

We used some unrecognized table names in Iceberg related test cases,
such as iceberg_test1/iceberg_test2 and so on, which resulted in poor
readability. So we better rename these Iceberg test tables' name by
specific cases.

Testing:
  - Renamed tables' name in iceberg-create.test
  - Renamed tables' name in iceberg-alter.test

Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5
---
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
2 files changed, 62 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/16753/2
--
To view, visit http://gerrit.cloudera.org:8080/16753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5
Gerrit-Change-Number: 16753
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10346: Rename Iceberg test tables' name with specific situations

2020-11-20 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16753 )

Change subject: IMPALA-10346: Rename Iceberg test tables' name with specific 
situations
..


Patch Set 1: Code-Review+2

(2 comments)

Thanks for making the tests more readable!

http://gerrit.cloudera.org:8080/#/c/16753/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16753/1//COMMIT_MSG@7
PS1, Line 7: situations
nit: probably the word 'cases' fits better here


http://gerrit.cloudera.org:8080/#/c/16753/1//COMMIT_MSG@10
PS1, Line 10: iceberg_tes1
nit: iceberg_test1



--
To view, visit http://gerrit.cloudera.org:8080/16753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5
Gerrit-Change-Number: 16753
Gerrit-PatchSet: 1
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 20 Nov 2020 10:25:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7695/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 20 Nov 2020 10:05:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7694/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 20 Nov 2020 10:04:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-20 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..


Patch Set 6:

(3 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@419
PS4, Line 419: cebergTable.METAD
> This table property is generated by HiveCatalog, I'm curious about:
Good point! 'metadata_location' is internal to Iceberg so we shouldn't allow 
users modifying it.

Updated the code accordingly.


http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java@65
PS4, Line 65: return hiveCatalog_.createTable(identifier, schema, spec, 
location, properties);
:   }
> nits: One line is ok, unnecessary for two lines.
Done


http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@242
PS4, Line 242:* Get TIcebergFileFormat from a string, usually from table 
properties.
 :*
> Maybe we need add a comment here, since we change the default value to 'PAR
Updated the code a bit. Now this method is returning PARQUET when 'format' is 
null. Format can be null when the table was created by other engines. And 
returning null when the format string is invalid.



--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 20 Nov 2020 09:45:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-20 Thread Zoltan Borok-Nagy (Code Review)
Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16721

to look at the new patch set (#6).

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..

IMPALA-10152: Add support for Iceberg HiveCatalog

HiveCatalog is one of Iceberg's catalog implementations. It uses
the Hive metastore and it is the recommended catalog implementation
when the table data is stored in object stores like S3.

This commit updates the Iceberg version to a newer one, and it also
retrieves Iceberg from the CDP distribution because that version of
Iceberg is built against Hive 3 (Impala is only compatible with
Hive 3).

This commit makes HiveCatalog the default Iceberg catalog in Impala
because it can be used in more environments (e.g. cloud stores),
and it is more featureful. Also, other engines that store their
table metadata in HMS will probably use HiveCatalog as well.

Tables stored in HiveCatalog are similar to Kudu tables with HMS
integration, i.e. modifying an Iceberg table via the Iceberg APIs
also modifies the HMS table. So in CatalogOpExecutor we handle
such Iceberg tables similarly to integrated Kudu tables.

Testing:
 * Added e2e tests for creating, writing, and altering Iceberg
   tables
 * Added SHOW CREATE TABLE tests

Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
---
M bin/impala-config.sh
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
16 files changed, 568 insertions(+), 92 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16721/6
--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-20 Thread Zoltan Borok-Nagy (Code Review)
Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16721

to look at the new patch set (#5).

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..

IMPALA-10152: Add support for Iceberg HiveCatalog

HiveCatalog is one of Iceberg's catalog implementations. It uses
the Hive metastore and it is the recommended catalog implementation
when the table data is stored in object stores like S3.

This commit updates the Iceberg version to a newer one, and it also
retrieves Iceberg from the CDP distribution because that version of
Iceberg is built against Hive 3 (Impala is only compatible with
Hive 3).

This commit makes HiveCatalog the default Iceberg catalog in Impala
because it can be used in more environments (e.g. cloud stores),
and it is more featureful. Also, other engines that store their
table metadata in HMS will probably use HiveCatalog as well.

Tables stored in HiveCatalog are similar to Kudu tables with HMS
integration, i.e. modifying an Iceberg table via the Iceberg APIs
also modifies the HMS table. So in CatalogOpExecutor we handle
such Iceberg tables similarly to integrated Kudu tables.

Testing:
 * Added e2e tests for creating, writing, and altering Iceberg
   tables
 * Added SHOW CREATE TABLE tests

Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
---
M bin/impala-config.sh
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
16 files changed, 568 insertions(+), 92 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16721/5
--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 


[Impala-ASF-CR] IMPALA-10346: Rename Iceberg test tables' name with specific situations

2020-11-20 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16753 )

Change subject: IMPALA-10346: Rename Iceberg test tables' name with specific 
situations
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7693/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5
Gerrit-Change-Number: 16753
Gerrit-PatchSet: 1
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 20 Nov 2020 09:19:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10346: Rename Iceberg test tables' name with specific situations

2020-11-20 Thread wangsheng (Code Review)
wangsheng has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16753


Change subject: IMPALA-10346: Rename Iceberg test tables' name with specific 
situations
..

IMPALA-10346: Rename Iceberg test tables' name with specific situations

We used some unrecognized table names in Iceberg related test cases,
such as iceberg_tes1/iceberg_test2 and so on, which resulted in poor
readability. So we better rename these Iceberg test tables' name by
specific situations.

Testing:
  - Renamed tables' name in iceberg-create.test
  - Renamed tables' name in iceberg-alter.test

Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5
---
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
2 files changed, 62 insertions(+), 62 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/16753/1
--
To view, visit http://gerrit.cloudera.org:8080/16753
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifdaeaaeed69753222668342dcac852677fdd9ae5
Gerrit-Change-Number: 16753
Gerrit-PatchSet: 1
Gerrit-Owner: wangsheng 


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-20 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java@419
PS4, Line 419: metadata_location
This table property is generated by HiveCatalog, I'm curious about:
1. Can we set this table property when creating table? If not, I think we need 
to add some check in code;
2. Can we alter table to set this table property? If not, we also need to add 
some check in code;
3. Maybe we should define a static variable in IcebergTable.java, just like 
ICEBERG_CATALOG, and we can use a reference here.

As far as I know, 'metadata_location' is refer to a metadata file's absolute 
path, and maybe we cannot modify this property manually.


http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
File fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java@65
PS4, Line 65: return hiveCatalog_.createTable(identifier, schema, spec, 
location,
: properties);
nits: One line is ok, unnecessary for two lines.


http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/16721/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@242
PS4, Line 242:* Get TIcebergFileFormat from a string, usually from table 
properties
 :*/
Maybe we need add a comment here, since we change the default value to 
'PARQUET' to replace 'null'



--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 20 Nov 2020 07:59:49 +
Gerrit-HasComments: Yes