[Impala-ASF-CR] IMPALA-10329 Change apt install retry times to 30

2020-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16751 )

Change subject: IMPALA-10329 Change apt install retry times to 30
..

IMPALA-10329 Change apt install retry times to 30

Change the apt install retry count to 30 in bootstrap_system.sh,
because the install has been timing out consistently of late.
Also add a wait for apt's lock-frontend.
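
The retry-and-wait behaviour described above can be sketched roughly as follows. This is an illustrative Python sketch, not the contents of bin/bootstrap_system.sh: the names `retry` and `wait_until_free`, their parameters, and the polling defaults are assumptions made for illustration; only the retry count of 30 comes from the change description.

```python
import time
from typing import Callable


def wait_until_free(is_locked: Callable[[], bool],
                    poll_seconds: float = 1.0,
                    max_polls: int = 60,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until is_locked() reports False. Returns True once the lock is free."""
    for _ in range(max_polls):
        if not is_locked():
            return True
        sleep(poll_seconds)
    # One final check after the last sleep.
    return not is_locked()


def retry(action: Callable[[], bool],
          attempts: int = 30,
          delay_seconds: float = 1.0,
          sleep: Callable[[float], None] = time.sleep) -> int:
    """Run action until it returns True.

    Returns the 1-based number of the attempt that succeeded and raises
    RuntimeError if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        if attempt < attempts:
            sleep(delay_seconds)
    raise RuntimeError(f"gave up after {attempts} attempts")
```

In a real bootstrap, `action` would presumably wrap the package install command and `is_locked` would check whether another process holds /var/lib/dpkg/lock-frontend; both are injected callables here so the sketch runs without apt.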

Change-Id: Id664dd66874ac65d6b78e630c974a6a563408147
Reviewed-on: http://gerrit.cloudera.org:8080/16751
Reviewed-by: Jim Apple 
Tested-by: Impala Public Jenkins 
---
M bin/bootstrap_system.sh
1 file changed, 6 insertions(+), 1 deletion(-)

Approvals:
  Jim Apple: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/16751
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id664dd66874ac65d6b78e630c974a6a563408147
Gerrit-Change-Number: 16751
Gerrit-PatchSet: 2
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 


[Impala-ASF-CR] IMPALA-10329 Change apt install retry times to 30

2020-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16751 )

Change subject: IMPALA-10329 Change apt install retry times to 30
..


Patch Set 1: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16751

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id664dd66874ac65d6b78e630c974a6a563408147
Gerrit-Change-Number: 16751
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Comment-Date: Fri, 20 Nov 2020 07:44:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8990: Fix flakiness in test_set_request_pool

2020-11-19 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16749 )

Change subject: IMPALA-8990: Fix flakiness in test_set_request_pool
..


Patch Set 2:

hit IMPALA-9355


--
To view, visit http://gerrit.cloudera.org:8080/16749

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife06509e936443579ca60780013ce01352c8932e
Gerrit-Change-Number: 16749
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 20 Nov 2020 03:16:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10329 Change apt install retry times to 30

2020-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16751 )

Change subject: IMPALA-10329 Change apt install retry times to 30
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7692/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16751

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id664dd66874ac65d6b78e630c974a6a563408147
Gerrit-Change-Number: 16751
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Comment-Date: Fri, 20 Nov 2020 02:30:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10329 Change apt install retry times to 30

2020-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16751 )

Change subject: IMPALA-10329 Change apt install retry times to 30
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6683/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16751

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id664dd66874ac65d6b78e630c974a6a563408147
Gerrit-Change-Number: 16751
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Comment-Date: Fri, 20 Nov 2020 02:28:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10329 Change apt install retry times to 30

2020-11-19 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16751 )

Change subject: IMPALA-10329 Change apt install retry times to 30
..


Patch Set 1: Code-Review+2

Thank you!


--
To view, visit http://gerrit.cloudera.org:8080/16751

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id664dd66874ac65d6b78e630c974a6a563408147
Gerrit-Change-Number: 16751
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Comment-Date: Fri, 20 Nov 2020 02:27:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10329 Change apt install retry times to 30

2020-11-19 Thread Anonymous Coward (Code Review)
zhaoren...@hotmail.com has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16751 )


Change subject: IMPALA-10329 Change apt install retry times to 30
..

IMPALA-10329 Change apt install retry times to 30

Change the apt install retry count to 30 in bootstrap_system.sh,
because the install has been timing out consistently of late.
Also add a wait for apt's lock-frontend.

Change-Id: Id664dd66874ac65d6b78e630c974a6a563408147
---
M bin/bootstrap_system.sh
1 file changed, 6 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/16751/1
--
To view, visit http://gerrit.cloudera.org:8080/16751

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id664dd66874ac65d6b78e630c974a6a563408147
Gerrit-Change-Number: 16751
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward 


[Impala-ASF-CR] IMPALA-10329 Change apt install retry times to 30

2020-11-19 Thread Anonymous Coward (Code Review)
zhaoren...@hotmail.com has abandoned this change. ( 
http://gerrit.cloudera.org:8080/16725 )

Change subject: IMPALA-10329 Change apt install retry times to 30
..


Abandoned

duplicate
--
To view, visit http://gerrit.cloudera.org:8080/16725

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I790750da36ad53c87a830dfab6803a1862490daf
Gerrit-Change-Number: 16725
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 


[Impala-ASF-CR] IMPALA-10329 Change apt install retry times to 30

2020-11-19 Thread Anonymous Coward (Code Review)
zhaoren...@hotmail.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16725 )

Change subject: IMPALA-10329 Change apt install retry times to 30
..


Patch Set 3:

Sorry, Jim, my development environment was recreated, so I created a new commit 
here: http://gerrit.cloudera.org:8080/16751
I will abandon this one.


--
To view, visit http://gerrit.cloudera.org:8080/16725

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I790750da36ad53c87a830dfab6803a1862490daf
Gerrit-Change-Number: 16725
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Comment-Date: Fri, 20 Nov 2020 02:07:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-19 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/cup/sql-parser.cup@3115
PS5, Line 3115:   KW_WHERE opt_plan_hints:pred_hints expr:e
I guess this is a bit limiting in that it applies only to the whole where 
clause. Should it be part of the expr production below so it can be attached to 
any expression?

I don't think this affects the functionality of this patch, since we're only 
checking the top-level statement anyway, but it seems like it would be more 
elegant to have the expr hint be associated with the expr in the parser?

If there are complications with that, maybe a comment here explaining the 
limitation would be sufficient.


http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/analysis/Predicate.java
File fe/src/main/java/org/apache/impala/analysis/Predicate.java:

http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/analysis/Predicate.java@30
PS5, Line 30: isAlwaysTrue_
maybe hasAlwaysTrueHint_ just to make it crystal-clear that it's not actually a 
guarantee?


http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@869
PS5, Line 869: if (fsHasBlocks && fd.getNumFileBlocks() == 0) 
continue;
nit: use braces for multi-line if


http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@870
PS5, Line 870: fd.getFileLength()
> Yes. Totally agree. We probably can live with the 0-row data files through
We already had to deal with a similar issue here:
https://impala.apache.org/docs/build/html/topics/impala_optimize_partition_key_scans.html
We should document this similarly.

Generally it doesn't make any sense to write files with 0 rows and it should be 
rare. Our experience is that some misbehaving tools can generate 0 row files 
(we've seen Spark do it with issues like 
https://issues.apache.org/jira/browse/SPARK-10216).

You're right that the files are non-empty because they have the footer with the 
schema. I don't think there's an upper bound on the size either, though, because 
they could have an arbitrarily complex schema.
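
To make the point concrete, here is a small Python sketch of why a length-only check cannot filter out 0-row files. It is illustrative only: `FileInfo`, `passes_length_check`, and the sample sizes are hypothetical, and it assumes the check amounts to a non-zero length test rather than reproducing the actual HdfsScanNode logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical stand-in for a file descriptor."""
    name: str
    length_bytes: int
    num_rows: int


def passes_length_check(f: FileInfo) -> bool:
    """Format-agnostic check: only the byte length is consulted."""
    return f.length_bytes > 0


files = [
    FileInfo("data.parq", 4096, 100),
    # Zero rows, but the footer with the schema makes the file non-empty.
    FileInfo("footer_only.parq", 512, 0),
    FileInfo("empty.txt", 0, 0),
]

# The footer-only file survives the check even though it holds no rows.
kept = [f.name for f in files if passes_length_check(f)]
```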



--
To view, visit http://gerrit.cloudera.org:8080/16723

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 20 Nov 2020 01:29:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8990: Fix flakiness in test_set_request_pool

2020-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16749 )

Change subject: IMPALA-8990: Fix flakiness in test_set_request_pool
..


Patch Set 2: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6681/


--
To view, visit http://gerrit.cloudera.org:8080/16749

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife06509e936443579ca60780013ce01352c8932e
Gerrit-Change-Number: 16749
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 20 Nov 2020 00:38:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325 Parquet scan should use min/max statistics to skip 
pages based on equi-join predicate
..


Patch Set 11:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/7691/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/16720

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 11
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 19 Nov 2020 21:58:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325 Parquet scan should use min/max statistics to skip 
pages based on equi-join predicate
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16720/11/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/16720/11/common/thrift/PlanNodes.thrift@299
PS11, Line 299:   12: optional i32 overlap_predicate_start_index
line has trailing whitespace



--
To view, visit http://gerrit.cloudera.org:8080/16720

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 11
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 19 Nov 2020 21:37:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-11-19 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: IMPALA-10325 Parquet scan should use min/max statistics to skip 
pages based on equi-join predicate
..

IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on 
equi-join predicate

This patch adds the logic to utilize min/max stats for Parquet row
groups or pages to skip these entities when they don't qualify an
equi-join predicate.

A new class of predicates called overlap predicates is introduced to aid
in determining whether a Parquet row group or page overlaps with a
range computed from the hash join. If not, the entire Parquet row
group or page is skipped. The new class of predicates co-exists with
the existing min/max conjuncts that are introduced based on the local
scan predicates.

Both classes of predicates can work individually or together with each
other. The overlap predicates are evaluated after the existing min/max
conjuncts.
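
As a rough illustration of the overlap test described above, the following Python sketch skips row groups whose min/max stats fall outside a join-side range. It is not the patch's implementation: `MinMax`, `overlaps`, and `row_groups_to_scan` are hypothetical names, and integer keys with closed intervals are assumed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MinMax:
    """Closed interval [lo, hi]; stands in for column min/max stats."""
    lo: int
    hi: int


def overlaps(stats: MinMax, join_range: MinMax) -> bool:
    """True if the two closed intervals share at least one value."""
    return stats.lo <= join_range.hi and join_range.lo <= stats.hi


def row_groups_to_scan(row_groups: List[MinMax], join_range: MinMax) -> List[int]:
    """Return the indexes of row groups that cannot be skipped."""
    return [i for i, stats in enumerate(row_groups) if overlaps(stats, join_range)]
```

For example, with row groups covering [0, 9], [10, 19] and [20, 29] and a join range of [12, 21], only the first row group can be skipped. The same test would apply per page when page-level stats are available.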

To be done:
1. Handle all data types;
2. Unit testing;
3. Core testing.

Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
---
M be/src/exec/exec-node.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
14 files changed, 386 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16720/11
--
To view, visit http://gerrit.cloudera.org:8080/16720

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 11
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] [WIP] IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: [WIP] IMPALA-10325 Parquet scan should use min/max statistics 
to skip pages based on equi-join predicate
..


Patch Set 9:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/7690/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/16720

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 9
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 19 Nov 2020 19:52:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] [WIP] IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: [WIP] IMPALA-10325 Parquet scan should use min/max statistics 
to skip pages based on equi-join predicate
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16720/9/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/16720/9/common/thrift/PlanNodes.thrift@299
PS9, Line 299:   12: optional list slot_usage_map
line has trailing whitespace



--
To view, visit http://gerrit.cloudera.org:8080/16720

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 9
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 19 Nov 2020 19:36:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [WIP] IMPALA-10325 Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-11-19 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: [WIP] IMPALA-10325 Parquet scan should use min/max statistics 
to skip pages based on equi-join predicate
..

[WIP] IMPALA-10325 Parquet scan should use min/max statistics to skip pages 
based on equi-join predicate

This patch adds the logic to utilize min/max stats for Parquet row
groups or pages to skip these entities when they don't qualify an
equi-join predicate.

A new class of predicates called overlap predicates is introduced to aid
in determining whether a Parquet row group or page overlaps with a
range computed from the hash join. If not, the entire Parquet row
group or page is skipped. The new class of predicates co-exists with
the existing min/max conjuncts that are introduced based on the local
scan predicates. Both classes of predicates can work individually or
together with each other.

Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
---
M be/src/exec/exec-node.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
14 files changed, 442 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16720/9
--
To view, visit http://gerrit.cloudera.org:8080/16720

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 9
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8990: Fix flakiness in test_set_request_pool

2020-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16749 )

Change subject: IMPALA-8990: Fix flakiness in test_set_request_pool
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6681/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16749

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife06509e936443579ca60780013ce01352c8932e
Gerrit-Change-Number: 16749
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 19 Nov 2020 19:13:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8990: Fix flakiness in test_set_request_pool

2020-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16749 )

Change subject: IMPALA-8990: Fix flakiness in test_set_request_pool
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16749

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife06509e936443579ca60780013ce01352c8932e
Gerrit-Change-Number: 16749
Gerrit-PatchSet: 2
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 19 Nov 2020 19:13:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.

2020-11-19 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16728 )

Change subject: IMPALA-10332: Add file formats to HdfsScanNode's thrift 
representation.
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16728/8/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test:

http://gerrit.cloudera.org:8080/#/c/16728/8/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@298
PS8, Line 298: |  | file formats: [ORC]
We shouldn't see this, as explain_level is only 2, right?



--
To view, visit http://gerrit.cloudera.org:8080/16728

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d
Gerrit-Change-Number: 16728
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 19 Nov 2020 18:44:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.

2020-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16728 )

Change subject: IMPALA-10332: Add file formats to HdfsScanNode's thrift 
representation.
..


Patch Set 8: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6680/


--
To view, visit http://gerrit.cloudera.org:8080/16728

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d
Gerrit-Change-Number: 16728
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 19 Nov 2020 18:35:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-19 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 5: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@870
PS5, Line 870: fd.getFileLength()
> The getFileLength() check is used in other places too..so I borrowed that f
Yes. Totally agree. We probably can live with the 0-row data files through 
documentation.



--
To view, visit http://gerrit.cloudera.org:8080/16723

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 19 Nov 2020 17:52:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-19 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@870
PS5, Line 870: fd.getFileLength()
> Is it possible for a Parquet data file with empty # of rows pass this test?
The getFileLength() check is used in other places too, so I borrowed that from 
generateScanRangeSpec(). For self-describing file formats like Parquet, yes, 
it is possible for the length to be non-zero and num_rows to be zero. I think 
that handling such cases will need a major rework of this 
computeScanRangeLocation() method, since right now it is agnostic to the file 
format (it does care about the file system type but not so much the format). 
Further, I believe other changes will be needed in the metadata catalog layer 
to ensure this FileMetaData is plumbed through, although I haven't looked 
closely into that. The trade-off is that the size of the metadata in the catalog 
cache could blow up for a large number of files, and we already run into 
significant memory issues.


http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@947
PS5, Line 947: if (isSimpleLimit && simpleLimitNumRows ==
 :   analyzer.getSimpleLimitStatus().second) {
 : // for the simple limit case if the estimated rows has already reached the limit
 : // there's no need to process more partitions
 : break;
 :   }
> This is good.
Ack
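
For reference, the early exit in the snippet quoted above can be sketched in Python as follows. This is a simplified illustration rather than the planner code: `select_partitions` and the (name, estimated rows) tuples are hypothetical, and the running estimate is capped at the limit so that the equality test mirrors the quoted condition.

```python
from typing import Iterable, List, Tuple


def select_partitions(partitions: Iterable[Tuple[str, int]],
                      limit: int) -> Tuple[List[str], int]:
    """Pick partitions in order until the estimated row count reaches the limit.

    Each partition is a (name, estimated_rows) pair. Returns the selected
    partition names and the capped row estimate.
    """
    selected: List[str] = []
    estimated_rows = 0
    for name, rows in partitions:
        selected.append(name)
        # Cap the running estimate so it never exceeds the limit.
        estimated_rows = min(limit, estimated_rows + rows)
        if estimated_rows == limit:
            # The limit is already covered, so later partitions are skipped.
            break
    return selected, estimated_rows
```

With partitions estimated at 3, 4, 5 and 6 rows and a limit of 10, the loop stops after the third partition and never visits the fourth.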



--
To view, visit http://gerrit.cloudera.org:8080/16723

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 19 Nov 2020 16:51:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8990: Fix flakiness in test_set_request_pool

2020-11-19 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16749 )

Change subject: IMPALA-8990: Fix flakiness in test_set_request_pool
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16749

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife06509e936443579ca60780013ce01352c8932e
Gerrit-Change-Number: 16749
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 19 Nov 2020 16:49:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10329 Change apt install retry times to 30

2020-11-19 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16725 )

Change subject: IMPALA-10329 Change apt install retry times to 30
..


Patch Set 3:

> for 'why should it be done', no reason, just don't output to the
 > console, I already tested, adding it or not don't impact the logic.

In that case, let's not redirect. I think that's more the style of the rest of 
the script.


--
To view, visit http://gerrit.cloudera.org:8080/16725

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I790750da36ad53c87a830dfab6803a1862490daf
Gerrit-Change-Number: 16725
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Comment-Date: Thu, 19 Nov 2020 16:17:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-19 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..


Patch Set 4: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16721/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16721/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2012
PS2, Line 2012: ncompleteTable && isSynchronizedIceber
> We still need to invoke HMS dropTable for synchronized tables that don't ha
I am also unsure about this scenario, but I preferred not to change the 
original handling.



--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 19 Nov 2020 15:10:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9121: try to avoid ASAN error in hdfs-util-test

2020-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16748 )

Change subject: IMPALA-9121: try to avoid ASAN error in hdfs-util-test
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16748
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7b42be0f8b5d6c6a31095f9d1a278fd82bd500c
Gerrit-Change-Number: 16748
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 19 Nov 2020 14:51:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9121: try to avoid ASAN error in hdfs-util-test

2020-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16748 )

Change subject: IMPALA-9121: try to avoid ASAN error in hdfs-util-test
..

IMPALA-9121: try to avoid ASAN error in hdfs-util-test

I couldn't discern the likely root cause of the ASAN error,
but have a hunch that it's a background thread accessing
some data structure that is being torn down as the
process exits.

The tests in this file are simple so there shouldn't really
be that much that can go wrong, except for the stuff
started by ExecEnv::Init().

I modified the test to only initialize the necessary configs
in ExecEnv, not start up the whole thing. Hopefully that
makes the problem go away.

Testing:
Looped the test locally with ASAN.

Change-Id: Ic7b42be0f8b5d6c6a31095f9d1a278fd82bd500c
Reviewed-on: http://gerrit.cloudera.org:8080/16748
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/util/hdfs-util-test.cc
3 files changed, 18 insertions(+), 15 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16748
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic7b42be0f8b5d6c6a31095f9d1a278fd82bd500c
Gerrit-Change-Number: 16748
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

2020-11-19 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16741 )

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition 
transforms as built-in functions
..


Patch Set 1:

Thanks for working on this. The change looks great, though I became a bit 
unsure about whether we want to make these functions visible to the user. 
However, it's definitely useful during development. Maybe write C++ unit tests 
instead of e2e tests, and later we can decide the visibility of these functions.


--
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 19 Nov 2020 14:46:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10314: Optimize planning time for simple limits

2020-11-19 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
..


Patch Set 5:

(2 comments)

Looks very good!

An empty Parquet data file may be a corner case to worry about.

http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@870
PS5, Line 870: fd.getFileLength()
Is it possible for a Parquet data file with zero rows to pass this test? 
Note that because of the metadata portion, such a file still has a non-zero 
number of bytes.  See one case here: https://github.com/G-Research/ParquetSharp/issues/110

If we can look at the meta-data of such a file, the number of rows is right 
there.

struct FileMetaData {
  /** Version of this file **/
  1: required i32 version

  /** Parquet schema for this file.  This schema contains metadata for all the columns.
   * The schema is represented as a tree with a single root.  The nodes of the tree
   * are flattened to a list by doing a depth-first traversal.
   * The column metadata contains the path in the schema for that column which can be
   * used to map columns to nodes in the schema.
   * The first element is the root **/
  2: required list<SchemaElement> schema;

  /** Number of rows in this file **/
  3: required i64 num_rows
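
The concern above can be sketched in a few lines of Java. This is only an illustration: FileInfo, hasRowsByLength and hasRowsByMetadata are hypothetical names, not Impala classes or methods, the 512-byte size is made up, and reading the footer itself is left out. It also assumes the test at line 870 amounts to a non-zero length check.

```java
// Hypothetical sketch: a length check alone cannot tell whether a Parquet
// file holds rows, because the footer metadata gives even an empty file
// a non-zero size.
final class EmptyParquetCheck {
  /** Illustrative holder; not an Impala class. */
  static final class FileInfo {
    final long fileLength;  // size in bytes, footer included
    final long numRows;     // FileMetaData.num_rows as read from the footer

    FileInfo(long fileLength, long numRows) {
      this.fileLength = fileLength;
      this.numRows = numRows;
    }
  }

  /** The kind of check under discussion (assumed): any bytes at all. */
  static boolean hasRowsByLength(FileInfo f) {
    return f.fileLength > 0;
  }

  /** The alternative suggested in the comment: trust num_rows. */
  static boolean hasRowsByMetadata(FileInfo f) {
    return f.numRows > 0;
  }
}
```

For a file described as new FileInfo(512, 0), the length-based check reports rows while the metadata-based check does not, which is the corner case raised here.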


http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@947
PS5, Line 947: if (isSimpleLimit && simpleLimitNumRows ==
 :   analyzer.getSimpleLimitStatus().second) {
 : // for the simple limit case if the estimated rows has already reached the limit
 : // there's no need to process more partitions
 : break;
 :   }
This is good.
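
For readers without the patch open, the early exit quoted above can be sketched roughly as follows. The names SimpleLimitSketch and partitionsVisited are hypothetical, and capping the running total at the limit is an assumption made so that the equality test from the quoted snippet can fire; the real planner code may count differently.

```java
import java.util.List;

// Hypothetical sketch of stopping partition processing once the estimated
// row count has reached a simple LIMIT.
final class SimpleLimitSketch {
  /** Returns how many partitions were visited before the limit was reached. */
  static int partitionsVisited(List<Long> estimatedRowsPerPartition, long limit) {
    long rows = 0;
    int visited = 0;
    for (long estimate : estimatedRowsPerPartition) {
      visited++;
      // Cap the running total at the limit (assumption), so that an
      // equality comparison is enough to detect "limit reached".
      rows = Math.min(limit, rows + estimate);
      if (rows == limit) {
        break;  // no need to process more partitions
      }
    }
    return visited;
  }
}
```

With per-partition estimates of 3, 4, 5 and 6 rows and a limit of 10, the loop stops after the third partition instead of visiting all four.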



--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 19 Nov 2020 14:30:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.

2020-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16728 )

Change subject: IMPALA-10332: Add file formats to HdfsScanNode's thrift 
representation.
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6680/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16728
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d
Gerrit-Change-Number: 16728
Gerrit-PatchSet: 8
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 19 Nov 2020 13:08:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.

2020-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16728 )

Change subject: IMPALA-10332: Add file formats to HdfsScanNode's thrift 
representation.
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7689/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16728
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d
Gerrit-Change-Number: 16728
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 19 Nov 2020 12:38:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.

2020-11-19 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/16728 )

Change subject: IMPALA-10332: Add file formats to HdfsScanNode's thrift 
representation.
..

IMPALA-10332: Add file formats to HdfsScanNode's thrift representation.

List all file formats that an HdfsScanNode needs to process in any
fragment instance. It is possible that some file formats will not be
needed in all fragment instances.

This is a step towards sharing codegen between different Impala
backends. Using the file formats provided in the thrift file, a backend
can codegen code for file formats that are not needed in its own process
but are needed in other fragment instances running on other backends,
and the resulting binary can be shared between multiple backends.

Codegenning for file formats will be done based on the thrift message
and not on what is needed for the actual backend. This leads to some
extra work in case a file format is not needed for the current backend
and codegen sharing is not available (at this point it is not
implemented). However, the overall number of such cases is low.

Also adding the file formats to the node's explain string at level 3.
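
As a rough illustration of the idea, the set of formats can be thought of as the union over every partition the scan touches, independent of which backend reads which file. The sketch below is not the patch itself: ScanFileFormatsSketch, its FileFormat enum and allFileFormats are hypothetical names, and the enum lists only an illustrative subset of formats.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: collect every file format a scan node may need,
// across all partitions and therefore across all fragment instances.
final class ScanFileFormatsSketch {
  enum FileFormat { TEXT, PARQUET, ORC, AVRO }  // illustrative subset

  /** Union of the formats of all scanned partitions, without duplicates. */
  static Set<FileFormat> allFileFormats(List<FileFormat> partitionFormats) {
    Set<FileFormat> formats = EnumSet.noneOf(FileFormat.class);
    formats.addAll(partitionFormats);
    return formats;  // EnumSet iterates in enum declaration order
  }
}
```

A backend that only reads Parquet files would still see TEXT in the set if another backend's partitions are text, which is what allows generated code to be shared later.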

Testing:
 - Added tests to verify that the file formats are present in the
   explain string at level 3.

Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d
---
M be/src/exec/hdfs-scan-node-base.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
5 files changed, 60 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/16728/7
--
To view, visit http://gerrit.cloudera.org:8080/16728
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iad6b8271bd248983f327c07883a3bedf50f25b5d
Gerrit-Change-Number: 16728
Gerrit-PatchSet: 7
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7688/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 19 Nov 2020 11:58:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7687/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 19 Nov 2020 11:48:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-19 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..


Patch Set 4:

PS4 is only a rebase.


--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 19 Nov 2020 11:42:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-19 Thread Zoltan Borok-Nagy (Code Review)
Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16721

to look at the new patch set (#4).

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..

IMPALA-10152: Add support for Iceberg HiveCatalog

HiveCatalog is one of Iceberg's catalog implementations. It uses
the Hive metastore and it is the recommended catalog implementation
when the table data is stored in object stores like S3.

This commit updates the Iceberg version to a newer one, and it also
retrieves Iceberg from the CDP distribution because that version of
Iceberg is built against Hive 3 (Impala is only compatible with
Hive 3).

This commit makes HiveCatalog the default Iceberg catalog in Impala
because it can be used in more environments (e.g. cloud stores),
and it is more featureful. Also, other engines that store their
table metadata in HMS will probably use HiveCatalog as well.
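
A minimal sketch of what "default catalog" means in practice, under loudly stated assumptions: the property key "iceberg.catalog", the value spellings, and the class and method names below are illustrative guesses, not taken from this change.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: choose the Iceberg catalog for a table, falling back
// to HiveCatalog when the table properties do not name one.
final class IcebergCatalogDefaultSketch {
  enum CatalogType { HADOOP_TABLES, HADOOP_CATALOG, HIVE_CATALOG }

  // Assumed property key; the real key may differ.
  static final String CATALOG_PROP = "iceberg.catalog";

  static CatalogType resolve(Map<String, String> tblProps) {
    String value = tblProps.get(CATALOG_PROP);
    if (value == null) {
      return CatalogType.HIVE_CATALOG;  // the new default described above
    }
    // Assumed spelling, e.g. "hadoop.tables" maps to HADOOP_TABLES.
    // Unknown values make valueOf throw IllegalArgumentException.
    return CatalogType.valueOf(value.toUpperCase(Locale.ROOT).replace('.', '_'));
  }
}
```

A table created without the property would resolve to HIVE_CATALOG, while one that explicitly names another catalog keeps it.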

Tables stored in HiveCatalog are similar to Kudu tables with HMS
integration, i.e. modifying an Iceberg table via the Iceberg APIs
also modifies the HMS table. So in CatalogOpExecutor we handle
such Iceberg tables similarly to integrated Kudu tables.

Testing:
 * Added e2e tests for creating, writing, and altering Iceberg
   tables
 * Added SHOW CREATE TABLE tests

Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
---
M bin/impala-config.sh
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
14 files changed, 524 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16721/4
--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-19 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16721 )

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16721/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16721/1//COMMIT_MSG@29
PS1, Line 29: e2e
> e2e
Done


http://gerrit.cloudera.org:8080/#/c/16721/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16721/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2012
PS2, Line 2012:
> This will "double drop" Kudu tables where  existingTbl instanceof Incomplet
We still need to invoke HMS dropTable for synchronized tables that don't have 
HMS integration enabled.

So the "double drop" can only happen when

 existingTbl instanceof IncompleteTable &&
 msTbl table could be retrieved &&
 isHmsIntegrationAutomatic(msTbl)

I'm not sure if we can hit such a scenario with normal usage, but anyway I 
restricted this condition to Iceberg tables.
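
The two conditions discussed above can be written out as plain predicates. This is a sketch with hypothetical names (DropTableConditionSketch and both methods); the boolean parameters stand in for the real checks in CatalogOpExecutor, and the actual patch may combine them differently.

```java
// Hypothetical sketch of the conditions described in this comment.
final class DropTableConditionSketch {
  /**
   * HMS dropTable is still needed for synchronized tables whose HMS
   * integration is not automatic.
   */
  static boolean needsHmsDropTable(boolean isSynchronizedTable,
      boolean hmsIntegrationAutomatic) {
    return isSynchronizedTable && !hmsIntegrationAutomatic;
  }

  /** The "double drop" can only happen when all three conditions hold. */
  static boolean doubleDropPossible(boolean existingTblIsIncomplete,
      boolean msTblRetrieved, boolean hmsIntegrationAutomatic) {
    return existingTblIsIncomplete && msTblRetrieved && hmsIntegrationAutomatic;
  }
}
```

Written this way, the double-drop case needs all three flags to be true at once, which is why it is narrow.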


http://gerrit.cloudera.org:8080/#/c/16721/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2015
PS2, Line 2015: !isHmsIntegrationA
> it calls dropTable, so needsHmsDropTable would be clearer
Done



--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 19 Nov 2020 11:29:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10152: Add support for Iceberg HiveCatalog

2020-11-19 Thread Zoltan Borok-Nagy (Code Review)
Hello Gabor Kaszab, wangsheng, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16721

to look at the new patch set (#3).

Change subject: IMPALA-10152: Add support for Iceberg HiveCatalog
..

IMPALA-10152: Add support for Iceberg HiveCatalog

HiveCatalog is one of Iceberg's catalog implementations. It uses
the Hive metastore and it is the recommended catalog implementation
when the table data is stored in object stores like S3.

This commit updates the Iceberg version to a newer one, and it also
retrieves Iceberg from the CDP distribution because that version of
Iceberg is built against Hive 3 (Impala is only compatible with
Hive 3).

This commit makes HiveCatalog the default Iceberg catalog in Impala
because it can be used in more environments (e.g. cloud stores),
and it is more featureful. Also, other engines that store their
table metadata in HMS will probably use HiveCatalog as well.

Tables stored in HiveCatalog are similar to Kudu tables with HMS
integration, i.e. modifying an Iceberg table via the Iceberg APIs
also modifies the HMS table. So in CatalogOpExecutor we handle
such Iceberg tables similarly to integrated Kudu tables.

Testing:
 * Added e2e tests for creating, writing, and altering Iceberg
   tables
 * Added SHOW CREATE TABLE tests

Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
---
M bin/impala-config.sh
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
A fe/src/main/java/org/apache/impala/catalog/iceberg/IcebergHiveCatalog.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-alter.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
14 files changed, 524 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16721/3
--
To view, visit http://gerrit.cloudera.org:8080/16721
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie574589a1751aaa9ccbd34a89c6819714d103197
Gerrit-Change-Number: 16721
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 


[Impala-ASF-CR] IMPALA-9121: try to avoid ASAN error in hdfs-util-test

2020-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16748 )

Change subject: IMPALA-9121: try to avoid ASAN error in hdfs-util-test
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6679/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16748
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7b42be0f8b5d6c6a31095f9d1a278fd82bd500c
Gerrit-Change-Number: 16748
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 19 Nov 2020 09:26:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9121: try to avoid ASAN error in hdfs-util-test

2020-11-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16748 )

Change subject: IMPALA-9121: try to avoid ASAN error in hdfs-util-test
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16748
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7b42be0f8b5d6c6a31095f9d1a278fd82bd500c
Gerrit-Change-Number: 16748
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 19 Nov 2020 09:26:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9121: try to avoid ASAN error in hdfs-util-test

2020-11-19 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16748 )

Change subject: IMPALA-9121: try to avoid ASAN error in hdfs-util-test
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16748
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7b42be0f8b5d6c6a31095f9d1a278fd82bd500c
Gerrit-Change-Number: 16748
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 19 Nov 2020 09:26:32 +
Gerrit-HasComments: No