[Impala-ASF-CR] IMPALA-10361: Use field id to resolve columns for Iceberg tables

2020-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16788 )

Change subject: IMPALA-10361: Use field id to resolve columns for Iceberg tables
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7784/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I057bdc6ab2859cc4d40de5ed428d0c20028b8435
Gerrit-Change-Number: 16788
Gerrit-PatchSet: 6
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Sat, 05 Dec 2020 04:12:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10361: Use field id to resolve columns for Iceberg tables

2020-12-04 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16788 )

Change subject: IMPALA-10361: Use field id to resolve columns for Iceberg tables
..


Patch Set 6:

(2 comments)

Hi Zoltan, thanks for reviewing again. I think it is a good idea to handle 
ORC tables in another patch. I will look into this later.
I also modified the code to always use FIELD_ID resolution for Iceberg tables, 
which means 'PARQUET_FALLBACK_SCHEMA_RESOLUTION' has no effect for Iceberg 
tables. If you agree with this design, I will update the commit message later.

http://gerrit.cloudera.org:8080/#/c/16788/5/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java:

http://gerrit.cloudera.org:8080/#/c/16788/5/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@339
PS5, Line 339: boolean isFullAcidTable = 
AcidUtils.isFullAcidTable(msTbl.getParameters());
> Iceberg tables cannot be full ACID, maybe it can be a precondition.
Done


http://gerrit.cloudera.org:8080/#/c/16788/5/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/16788/5/testdata/data/README@608
PS5, Line 608: generated file will contains multi blocks, multi pages per block.
> Please add information about the newly added files and tests.
Done



--
To view, visit http://gerrit.cloudera.org:8080/16788

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I057bdc6ab2859cc4d40de5ed428d0c20028b8435
Gerrit-Change-Number: 16788
Gerrit-PatchSet: 6
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Sat, 05 Dec 2020 03:54:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10361: Use field id to resolve columns for Iceberg tables

2020-12-04 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/16788 )

Change subject: IMPALA-10361: Use field id to resolve columns for Iceberg tables
..

IMPALA-10361: Use field id to resolve columns for Iceberg tables

This patch adds support for resolving columns by field id for Iceberg
tables. Use 'set PARQUET_FALLBACK_SCHEMA_RESOLUTION=FIELD_ID' or
'set PARQUET_FALLBACK_SCHEMA_RESOLUTION=2' to choose field id
resolution. Note that if this option is used for a non-Iceberg table,
the result will be NULL.
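
As a rough illustration of the difference between name-based and field-id-based
resolution, here is a minimal sketch. It is not Impala's implementation:
FileColumn, kNoFieldId, ResolveByName and ResolveByFieldId are hypothetical
names made up for this example. The idea is that after a column is renamed in
the table schema, a lookup by name misses the column in older data files, while
a lookup by field id still finds it. A file whose columns carry no field ids
never matches, which is presumably why non-Iceberg tables return NULL.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical marker for "the writer recorded no field id".
const int kNoFieldId = -1;

// Hypothetical, simplified view of one column stored in a data file.
struct FileColumn {
  std::string name;  // column name at the time the file was written
  int field_id;      // kNoFieldId if the file has no field id for this column
};

// Returns the index of the file column with the given name, or -1.
int ResolveByName(const std::vector<FileColumn>& cols, const std::string& name) {
  for (size_t i = 0; i < cols.size(); ++i) {
    if (cols[i].name == name) return static_cast<int>(i);
  }
  return -1;
}

// Returns the index of the file column with the given field id, or -1.
// Columns without a field id never match.
int ResolveByFieldId(const std::vector<FileColumn>& cols, int field_id) {
  assert(field_id >= 0);
  for (size_t i = 0; i < cols.size(); ++i) {
    if (cols[i].field_id == field_id) return static_cast<int>(i);
  }
  return -1;
}
```

For example, if a file was written with columns (id, event) carrying field ids
1 and 2, and the table column with id 2 was later renamed to "action", then
ResolveByName(cols, "action") finds nothing while ResolveByFieldId(cols, 2)
still returns index 1.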

Change-Id: I057bdc6ab2859cc4d40de5ed428d0c20028b8435
---
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/exec/parquet/parquet-metadata-utils.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/types.cc
M be/src/runtime/types.h
M be/src/service/query-options-test.cc
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/Types.thrift
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java
A fe/src/main/java/org/apache/impala/catalog/IcebergStructField.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/StructType.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/data/README
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/1-1-13d79bd6-4b97-4680-b4e1-52e93b6ce04e-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/6-6-305c9b7a-f42d-4245-b806-dfa7a792593f-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/9-9-224fe2d6-b0d9-42d6-bc95-15f52ecb29ad-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-1a38e294-5992-48d9-a18e-08e129bb418c-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-74cfcf22-3de2-489a-b1ec-d5141e75a8e8-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-5f91dc85-b8f3-4cc2-a5c6-38b7fee49709-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-dc3510cc-e765-43bc-be03-c5561a8d50a3-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-364afc4a-b718-406d-a532-58fab5c8f85d-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/4-4-7a1a8e89-8aeb-4405-be64-76557432cf21-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-765d552a-fddc-42f3-adfd-ecba20a01d80-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-9957db43-3b9a-4a50-9946-d003cc1d461c-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-1e1895d0-1f42-4c30-989f-968802831077-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00020-20-bb59ac6d-aeee-4c35-9f8a-1a03127d33b8-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00028-28-44ba3ad9-737c-4416-a32c-501cc9a4aa90-0.parquet
A 

[Impala-ASF-CR] IMPALA-10377 Improve the accuracy of resource estimation PlanNode does not consider some factors when estimating memory, this will cause a large error rate

2020-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16822 )

Change subject: IMPALA-10377 Improve the accuracy of resource estimation 
PlanNode does not consider some factors when estimating memory, this will cause 
a large error rate
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7783/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16822

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46c656bc88b969f4de99e187df16be3887592f3d
Gerrit-Change-Number: 16822
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward <54liu...@163.com>
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 05 Dec 2020 03:39:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10377 Improve the accuracy of resource estimation PlanNode does not consider some factors when estimating memory, this will cause a large error rate

2020-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16822 )

Change subject: IMPALA-10377 Improve the accuracy of resource estimation 
PlanNode does not consider some factors when estimating memory, this will cause 
a large error rate
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6731/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/16822

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46c656bc88b969f4de99e187df16be3887592f3d
Gerrit-Change-Number: 16822
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward <54liu...@163.com>
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 05 Dec 2020 03:31:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators

2020-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16435 )

Change subject: IMPALA-9936: Only send invalidations in DDL responses to 
LocalCatalog coordinators
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7782/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16435

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38
Gerrit-Change-Number: 16435
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Sat, 05 Dec 2020 03:27:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators

2020-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16435 )

Change subject: IMPALA-9936: Only send invalidations in DDL responses to 
LocalCatalog coordinators
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6730/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16435

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38
Gerrit-Change-Number: 16435
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Sat, 05 Dec 2020 03:18:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10377 Improve the accuracy of resource estimation PlanNode does not consider some factors when estimating memory, this will cause a large error rate

2020-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16822 )

Change subject: IMPALA-10377 Improve the accuracy of resource estimation 
PlanNode does not consider some factors when estimating memory, this will cause 
a large error rate
..


Patch Set 1:

(39 comments)

http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@64
PS1, Line 64:   // If the group clause is empty ( aggInfo.getGroupingExprs() is 
empty ),
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@538
PS1, Line 538: // A skew factor of 1.5 was added to account for 
data skew among multiple fragment instances.
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@538
PS1, Line 538: // A skew factor of 1.5 was added to account for 
data skew among multiple fragment instances.
line too long (106 > 90)


http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@539
PS1, Line 539: // This number was derived using empirical analysis 
of real-world and benchmark (tpch, tpcds) queries.
line too long (114 > 90)


http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@544
PS1, Line 544: perInstanceInputCardinality = (long) 
Math.ceil(inputCardinality / numInstances);
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@551
PS1, Line 551:
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@553
PS1, Line 553:   // A reduction factor of 2 (input rows divided by 
output rows) was added to grow hash tables.
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@554
PS1, Line 554:   // If the reduction factor is lower than 2, only part 
of the data will be inserted into the hash table.
line too long (113 > 90)


http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@562
PS1, Line 562:   // The memory of the data stored in hash table and the 
memory of the hash table's structure
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@563
PS1, Line 563:   perInstanceDataBytes = 
(long)Math.ceil(perInstanceCardinality *
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@564
PS1, Line 564:(avgRowSize_ + 
PlannerContext.SIZE_OF_BUCKET));
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@239
PS1, Line 239:   // The memory of the data stored in hash table and
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@242
PS1, Line 242:   BitUtil.roundUpToPowerOf2((long) Math.ceil(3 * rhsCard 
/ 2)) *
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@245
PS1, Line 245: perBuildInstanceDataBytes += (rhsCard - rhsNdv) *
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/PlannerContext.java
File fe/src/main/java/org/apache/impala/planner/PlannerContext.java:

http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/PlannerContext.java@45
PS1, Line 45:
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/SortNode.java
File fe/src/main/java/org/apache/impala/planner/SortNode.java:

http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/main/java/org/apache/impala/planner/SortNode.java@338
PS1, Line 338:   perInstanceMemEstimate = fullInputSize < 0 ?
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/16822/1/fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
File fe/src/test/java/org/apache/impala/planner/CardinalityTest.java:


[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators

2020-12-04 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16435 )

Change subject: IMPALA-9936: Only send invalidations in DDL responses to 
LocalCatalog coordinators
..


Patch Set 5: Code-Review+2

(1 comment)

Thanks for the quick review, Tim! Carrying over Tim's +2.

http://gerrit.cloudera.org:8080/#/c/16435/4/be/src/exec/catalog-op-executor.cc
File be/src/exec/catalog-op-executor.cc:

http://gerrit.cloudera.org:8080/#/c/16435/4/be/src/exec/catalog-op-executor.cc@72
PS4, Line 72: VerifyMinimalResponse
> nit: VerifyMinimalResponse
Done



--
To view, visit http://gerrit.cloudera.org:8080/16435

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38
Gerrit-Change-Number: 16435
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Sat, 05 Dec 2020 03:17:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10377 Improve the accuracy of resource estimation PlanNode does not consider some factors when estimating memory, this will cause a large error rate

2020-12-04 Thread Anonymous Coward (Code Review)
54liu...@163.com has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16822 )


Change subject: IMPALA-10377 Improve the accuracy of resource estimation 
PlanNode does not consider some factors when estimating memory, this will cause 
a large error rate
..

IMPALA-10377 Improve the accuracy of resource estimation
PlanNode does not consider some factors when estimating memory, this will cause 
a large error rate

AggregationNode
1. The memory occupied by the hash table's own data structure is not 
considered. Each new value inserted into the hash table adds a bucket. The size 
of a bucket is 16 bytes.
2. When estimating the NDV of a merge aggregation with multiple grouping exprs, 
the NDV may be divided by the number of fragment instances several times; it 
should be divided only once.
3. When estimating the NDV of a merge aggregation with multiple grouping 
exprs, the estimated memory is much smaller than the actual use.
4. If there are no grouping exprs, the estimated memory is much larger than the 
actual use.
5. If the NDV of the grouping exprs is very small, the estimated memory is much 
larger than the actual use.

SortNode
1. Estimate the memory usage of external sort. The estimated memory is much 
smaller than the actual use.

HashJoinNode
1. The memory occupied by the hash table's own data structure is not 
considered. The hash table keeps duplicate data, so the size of DuplicateNode 
should be considered.
2. The hash table creates multiple buckets in advance. The size of these 
buckets should be considered.

KuduScanNode
1. Memory is estimated as if all columns are scanned, so the estimated memory 
is much larger than the actual use.
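
The hash table points above can be sketched as simple arithmetic. This is only
an illustration under stated assumptions, not the patch's code (the patch itself
changes Java planner classes). The 16-byte bucket size comes from the text
above. The function names, the 1.5x pre-sizing of buckets rounded up to a power
of two, and the duplicate_node_bytes parameter are assumptions chosen for the
example.

```cpp
#include <cassert>
#include <cstdint>

// Bucket size quoted in the commit message above.
const int64_t kBucketBytes = 16;

// Smallest power of two >= v. Assumes 0 < v <= 2^62 so the shift cannot overflow.
int64_t RoundUpToPowerOf2(int64_t v) {
  assert(v > 0 && v <= (int64_t(1) << 62));
  int64_t p = 1;
  while (p < v) p <<= 1;
  return p;
}

// Aggregation: each distinct group stores one row plus one hash table bucket.
int64_t EstimateAggBytes(int64_t num_groups, int64_t avg_row_bytes) {
  assert(num_groups >= 0 && avg_row_bytes >= 0);
  return num_groups * (avg_row_bytes + kBucketBytes);
}

// Hash join build side (assumed model): row data, buckets created in advance,
// and one duplicate node for every row beyond the first with the same key.
int64_t EstimateJoinBuildBytes(int64_t rows, int64_t ndv, int64_t avg_row_bytes,
                               int64_t duplicate_node_bytes) {
  assert(rows > 0 && ndv > 0 && ndv <= rows);
  int64_t buckets = RoundUpToPowerOf2((3 * rows + 1) / 2);  // ceil(1.5 * rows)
  return rows * avg_row_bytes + buckets * kBucketBytes +
         (rows - ndv) * duplicate_node_bytes;
}
```

With 1000 groups of 24-byte rows, the aggregation estimate is
1000 * (24 + 16) = 40000 bytes, of which 16000 bytes are bucket overhead that an
estimate based on row size alone would miss.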

Change-Id: I46c656bc88b969f4de99e187df16be3887592f3d
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlannerContext.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
8 files changed, 227 insertions(+), 16 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/16822/1
--
To view, visit http://gerrit.cloudera.org:8080/16822

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I46c656bc88b969f4de99e187df16be3887592f3d
Gerrit-Change-Number: 16822
Gerrit-PatchSet: 1
Gerrit-Owner: Anonymous Coward <54liu...@163.com>


[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators

2020-12-04 Thread Quanlong Huang (Code Review)
Hello Vihang Karajgaonkar, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16435

to look at the new patch set (#5).

Change subject: IMPALA-9936: Only send invalidations in DDL responses to 
LocalCatalog coordinators
..

IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog 
coordinators

Catalogd RPC response contains the updated catalog objects in a full
form. For instance, an RPC for adding a new partition to an HdfsTable
will return the whole HdfsTable object (metadata) containing all the
partitions. This is required by legacy coordinators where the whole
HdfsTable object is used to replace the stale object (metadata snapshot).
However, LocalCatalog coordinators just need the object names for
invalidations. It's a waste of space to send the full catalog objects to
LocalCatalog coordinators. On the other hand, there is a risk of OOM due
to hitting the Java array limit when serializing a table that has a huge
metadata footprint.

This patch refactors the catalogd RPC responses to only send back the
invalidations that are needed. To distinguish between legacy and LocalCatalog
coordinators, a new field, want_minimal_response, is introduced in
TCatalogServiceRequestHeader which is the header for most of the
Catalogd RPC requests (e.g. TDdlExecRequest, TUpdateCatalogRequest and
TResetMetadataRequest). LocalCatalog coordinators will set this field to
true. When adding updated catalog objects to the response, catalogd will
add invalidations which only contain the object names (e.g. db name,
table name). Note that function objects are small so are ignored in this
optimization.
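
A minimal sketch of the idea, assuming a simplified object layout: when the
request asks for a minimal response, only the identifying names are copied into
the response object. The want_minimal_response flag is the field named above;
CatalogObject and ToResponseObject are hypothetical names for illustration and
do not reflect the actual Thrift structures or catalogd code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified catalog object. The partition list stands in for
// the potentially huge metadata of a real table.
struct CatalogObject {
  std::string db_name;
  std::string table_name;
  std::vector<std::string> partitions;
};

// Builds the object to put into an RPC response. Legacy coordinators get the
// full object; LocalCatalog coordinators (want_minimal_response == true) get
// only the names, which is enough to invalidate their cached copy.
CatalogObject ToResponseObject(const CatalogObject& full,
                               bool want_minimal_response) {
  assert(!full.db_name.empty());  // every object in this sketch has a database
  if (!want_minimal_response) return full;
  CatalogObject minimal;
  minimal.db_name = full.db_name;
  minimal.table_name = full.table_name;
  return minimal;
}
```

The size of the minimal object is independent of the number of partitions,
which is the property that avoids the serialization limit described above.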

Tests:
 - Add DCHECKs in catalog-op-executor.cc to verify the catalog objects
   received by LocalCatalog coordinators are in minimal mode.
 - Run test_ddl.py in both legacy catalog mode and local catalog mode.

Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38
---
M be/src/exec/catalog-op-executor.cc
M be/src/service/client-request-state.cc
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/CatalogObject.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
10 files changed, 239 insertions(+), 139 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/16435/5
--
To view, visit http://gerrit.cloudera.org:8080/16435

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38
Gerrit-Change-Number: 16435
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser

2020-12-04 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16821 )

Change subject: IMPALA-9865: part 1: basic profile log parser
..


Patch Set 2:

(2 comments)

> Uploaded patch set 2.

Hi Tim,
Thanks for writing this parser. This is very useful.
I just have 2 comments.

http://gerrit.cloudera.org:8080/#/c/16821/2/be/src/util/impala-profile-tool.cc
File be/src/util/impala-profile-tool.cc:

http://gerrit.cloudera.org:8080/#/c/16821/2/be/src/util/impala-profile-tool.cc@31
PS2, Line 31: // is pretty-printed to standard output.
Maybe add a simple usage example in the doc? For example:

impala-profile-tool < impala_profile_log_1.1-1607057366897


http://gerrit.cloudera.org:8080/#/c/16821/2/be/src/util/impala-profile-tool.cc@59
PS2, Line 59: getline(cin, line);
I tried to run the parser against my local runtime profile log. It seems it 
always hits "Error reading line" when it reaches EOF.
What if we move this getline into the loop condition? For example:

for (std::string line; std::getline(cin, line); ) {
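
A self-contained sketch of that loop shape, for illustration only. ReadLines
and ReadLinesFromString are hypothetical helper names, not code from the patch.
Because std::getline returns the stream, which converts to false once nothing
more can be read, reaching EOF simply ends the loop. A real I/O failure can
still be detected afterwards through the stream's bad() state.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads every line from `in` into `lines`. Hitting EOF ends the loop normally;
// the function returns false only if the stream reports an unrecoverable error.
bool ReadLines(std::istream& in, std::vector<std::string>* lines) {
  assert(lines != nullptr);
  for (std::string line; std::getline(in, line);) {
    lines->push_back(line);
  }
  return !in.bad();
}

// Convenience wrapper for trying the loop on an in-memory string.
std::vector<std::string> ReadLinesFromString(const std::string& text) {
  std::istringstream in(text);
  std::vector<std::string> lines;
  ReadLines(in, &lines);
  return lines;
}
```

The same function works with std::cin, and a final line without a trailing
newline is still returned.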



--
To view, visit http://gerrit.cloudera.org:8080/16821

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3
Gerrit-Change-Number: 16821
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 05 Dec 2020 02:48:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators

2020-12-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16435 )

Change subject: IMPALA-9936: Only send invalidations in DDL responses to 
LocalCatalog coordinators
..


Patch Set 4: Code-Review+2

(1 comment)

This makes sense to me.

http://gerrit.cloudera.org:8080/#/c/16435/4/be/src/exec/catalog-op-executor.cc
File be/src/exec/catalog-op-executor.cc:

http://gerrit.cloudera.org:8080/#/c/16435/4/be/src/exec/catalog-op-executor.cc@72
PS4, Line 72: verifyMinimalResponse
nit: VerifyMinimalResponse



--
To view, visit http://gerrit.cloudera.org:8080/16435

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38
Gerrit-Change-Number: 16435
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Sat, 05 Dec 2020 01:48:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser

2020-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16821 )

Change subject: IMPALA-9865: part 1: basic profile log parser
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7781/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16821

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3
Gerrit-Change-Number: 16821
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 05 Dec 2020 01:24:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser

2020-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16821 )

Change subject: IMPALA-9865: part 1: basic profile log parser
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7780/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16821

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3
Gerrit-Change-Number: 16821
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 05 Dec 2020 01:23:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser

2020-12-04 Thread Tim Armstrong (Code Review)
Hello Riza Suminto, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16821

to look at the new patch set (#2).

Change subject: IMPALA-9865: part 1: basic profile log parser
..

IMPALA-9865: part 1: basic profile log parser

This adds a utility that consumes the Impala profile log format from
stdin and pretty-prints the profiles.

It supports some basic filters - --query_id, --min_timestamp and
--max_timestamp.
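
The filtering described above could look roughly like the following sketch.
It is not the tool's code: ProfileRecord, ProfileFilter and Matches are
hypothetical names, and treating both timestamp bounds as inclusive and an
empty query id as "match any" are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical record: the fields of one profile log entry that filters use.
struct ProfileRecord {
  int64_t timestamp;
  std::string query_id;
};

// Hypothetical filter mirroring --query_id, --min_timestamp, --max_timestamp.
// By default nothing is filtered out.
struct ProfileFilter {
  std::string query_id;   // empty means "any query"
  int64_t min_timestamp;  // inclusive lower bound (assumption)
  int64_t max_timestamp;  // inclusive upper bound (assumption)
  ProfileFilter()
      : min_timestamp(std::numeric_limits<int64_t>::min()),
        max_timestamp(std::numeric_limits<int64_t>::max()) {}
};

// Returns true if the record passes every configured filter.
bool Matches(const ProfileFilter& f, const ProfileRecord& r) {
  assert(f.min_timestamp <= f.max_timestamp);
  if (!f.query_id.empty() && r.query_id != f.query_id) return false;
  return r.timestamp >= f.min_timestamp && r.timestamp <= f.max_timestamp;
}
```

Checking the cheap fields first means a record that is filtered out never needs
to be decoded or pretty-printed.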

If --gen_experimental_profile=true is set, it dumps the aggregated
part of the profile with the full output for the new experimental
profiles. In a future change, we should detect this based on
the profile version set.

This utility will be extended in future with more options, but
is already useful in that it can handle the new experimental
profile format and produce pretty-printed output consistent
with the Impala web UI and impala-shell.

Change-Id: I6178399ac96e176f7067cc47347e51cda2f3
---
M be/src/util/CMakeLists.txt
A be/src/util/impala-profile-tool.cc
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
4 files changed, 115 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16821/2
--
To view, visit http://gerrit.cloudera.org:8080/16821

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3
Gerrit-Change-Number: 16821
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser

2020-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16821 )

Change subject: IMPALA-9865: part 1: basic profile log parser
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16821/1/be/src/util/impala-profile-tool.cc
File be/src/util/impala-profile-tool.cc:

http://gerrit.cloudera.org:8080/#/c/16821/1/be/src/util/impala-profile-tool.cc@34
PS1, Line 34: // --query_id=: given an impala query ID, only process 
profiles with this query id
line too long (92 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3
Gerrit-Change-Number: 16821
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 05 Dec 2020 01:02:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9865: part 1: basic profile log parser

2020-12-04 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16821 )


Change subject: IMPALA-9865: part 1: basic profile log parser
..

IMPALA-9865: part 1: basic profile log parser

This adds a utility that consumes the Impala profile log format from
stdin and pretty-prints the profiles.

It supports some basic filters - --query_id, --min_timestamp and
--max_timestamp.

If --gen_experimental_profile=true is set, it dumps the aggregated
part of the profile with the full output for the new experimental
profiles. In a future change, we should detect this based on
the profile version set.

This utility will be extended in future with more options, but
is already useful in that it can handle the new experimental
profile format and produce pretty-printed output consistent
with the Impala web UI and impala-shell.

Change-Id: I6178399ac96e176f7067cc47347e51cda2f3
---
M be/src/util/CMakeLists.txt
A be/src/util/impala-profile-tool.cc
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
4 files changed, 114 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/16821/1
--
To view, visit http://gerrit.cloudera.org:8080/16821
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6178399ac96e176f7067cc47347e51cda2f3
Gerrit-Change-Number: 16821
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16765 )

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..

IMPALA-10337: Consider MAX_ROW_SIZE when computing max reservation

PlanRootSink can fail silently if result spooling is enabled and
maxMemReservationBytes is less than 2 * MAX_ROW_SIZE. This happens
because results are spilled using a SpillableRowBatchQueue which needs 2
buffers (read and write) with at least MAX_ROW_SIZE bytes per buffer.
This patch fixes this by setting a lower bound of 2 * MAX_ROW_SIZE while
computing the min reservation for the PlanRootSink.
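
The lower bound described above can be sketched in a few lines of Java. This is not the actual PlanRootSink code; the SpoolingReservation class and its method names are hypothetical and only illustrate clamping a requested reservation to at least two MAX_ROW_SIZE buffers.

```java
// Hypothetical helper for illustration; not the actual PlanRootSink code.
final class SpoolingReservation {
  // One read buffer plus one write buffer, each at least maxRowSizeBytes.
  static long lowerBound(long maxRowSizeBytes) {
    return Math.multiplyExact(2L, maxRowSizeBytes);
  }

  // Clamps the requested reservation so that it never drops below the bound.
  static long effectiveReservation(long requestedBytes, long maxRowSizeBytes) {
    return Math.max(requestedBytes, lowerBound(maxRowSizeBytes));
  }

  public static void main(String[] args) {
    long maxRowSize = 512L * 1024; // 512 KB, chosen only for this example
    System.out.println(effectiveReservation(1024L, maxRowSize));
    System.out.println(effectiveReservation(4L * 1024 * 1024, maxRowSize));
  }
}
```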

Testing:
- Pass exhaustive tests.
- Add e2e TestResultSpoolingMaxReservation.
- Lower MAX_ROW_SIZE on tests where MAX_RESULT_SPOOLING_MEM is set to
  an extremely low value. Also verify that PLAN_ROOT_SINK's
  ReservationLimit remains unchanged after lowering MAX_ROW_SIZE.

Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Reviewed-on: http://gerrit.cloudera.org:8080/16765
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/runtime/buffered-tuple-stream.h
M be/src/runtime/spillable-row-batch-queue.cc
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M tests/custom_cluster/test_query_retries.py
M tests/query_test/test_result_spooling.py
5 files changed, 118 insertions(+), 9 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16765 )

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..


Patch Set 8: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 04 Dec 2020 23:55:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint

2020-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16792 )

Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7779/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b
Gerrit-Change-Number: 16792
Gerrit-PatchSet: 6
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 04 Dec 2020 22:30:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint

2020-12-04 Thread Aman Sinha (Code Review)
Hello Qifan Chen, Shant Hovsepian, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16792

to look at the new patch set (#6).

Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint
..

IMPALA-10360: Allow simple limit to be treated as sampling hint

As a follow-up to IMPALA-10314, it is sometimes useful to consider
a simple limit as a way to sample from a table if a relevant hint
has been provided. Doing a sample instead of a pure limit serves
dual purposes: (a) it still helps with reducing the planning time
since the scan ranges need to be computed only for the sample files,
(b) it allows a sufficient number of files/rows to be read from
the table such that after applying filter conditions or joins with
another table, the query may still produce the N rows needed for
the limit.

This functionality is especially useful if the query is against a
view (note that TABLESAMPLE clause cannot be applied to a view).

In this patch, a new table level hint, 'convert_limit_to_sample'
is added. If this hint is attached to a table either in the main
query block or within a view/subquery and simple limit optimization
conditions are satisfied (according to IMPALA-10314), the limit
is converted to a table sample. For example:

 set optimize_simple_limit = true;
 CREATE VIEW v1 as SELECT * FROM T [convert_limit_to_sample]
WHERE [always_true] ;
 SELECT * FROM v1 LIMIT 10;

In this case, the limit 10 is converted to a sample of T and the
sampling percent is the greater of 1% or ratio (in percent) of
limit to the estimated row count of the table (after partition
pruning).
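
To make the percentage rule concrete, here is a small Java sketch. It assumes a whole-number percentage, rounds the ratio up, and caps the result at 100; the rounding, the cap, and the LimitToSample name are assumptions of this example and are not taken from the patch.

```java
// Hypothetical helper for illustration; not the code in HdfsPartitionPruner.
final class LimitToSample {
  // Returns the greater of 1% and the limit-to-row-count ratio in percent.
  // Rounding up and capping at 100 are assumptions of this sketch.
  static long samplePercent(long limit, long estimatedRows) {
    if (limit < 0 || estimatedRows <= 0) {
      throw new IllegalArgumentException("limit must be >= 0 and estimatedRows > 0");
    }
    long scaled = Math.multiplyExact(limit, 100L);
    long ratioPercent = scaled / estimatedRows + (scaled % estimatedRows == 0 ? 0 : 1);
    return Math.min(100L, Math.max(1L, ratioPercent));
  }

  public static void main(String[] args) {
    System.out.println(samplePercent(10, 1_000_000)); // tiny ratio, so the 1% floor applies
    System.out.println(samplePercent(10, 100));       // ratio of 10% exceeds the floor
  }
}
```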

Testing:
 - Added an alltypes_date_partition_2 table where the date and
   timestamp values match (this helps with setting the
   'always_true' hint).
 - Added views with 'convert_limit_to_sample' and 'always_true'
   hints and added new tests against the views. Modified a few
   existing tests to reference the new table variant.
 - Added an end-to-end test.

Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b
---
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test
M testdata/workloads/functional-query/queries/QueryTest/range-constant-propagation.test
10 files changed, 285 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/16792/6
--
To view, visit http://gerrit.cloudera.org:8080/16792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b
Gerrit-Change-Number: 16792
Gerrit-PatchSet: 6
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: [WIP] IMPALA-10325: Parquet scan should use min/max statistics 
to skip pages based on equi-join predicate
..


Patch Set 27:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7778/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 27
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 04 Dec 2020 21:24:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: [WIP] IMPALA-10325: Parquet scan should use min/max statistics 
to skip pages based on equi-join predicate
..


Patch Set 27:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/16720/27/be/src/util/min-max-filter-test.cc
File be/src/util/min-max-filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/16720/27/be/src/util/min-max-filter-test.cc@589
PS27, Line 589: EXPECT_EQ(overflow, false); 
  \
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/16720/27/be/src/util/min-max-filter-test.cc@592
PS27, Line 592: EXPECT_EQ(overflow, false); 
  \
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/16720/27/be/src/util/min-max-filter-test.cc@597
PS27, Line 597: EXPECT_EQ(overflow, false); 
  \
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/16720/27/be/src/util/min-max-filter-test.cc@600
PS27, Line 600: EXPECT_EQ(overflow, false); 
  \
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/16720/27/be/src/util/min-max-filter-test.cc@649
PS27, Line 649: CheckDecimalVals(filter##SIZE, decimal##SIZE##_type, 
d1##SIZE, d1##SIZE);  \
line too long (108 > 90)


http://gerrit.cloudera.org:8080/#/c/16720/27/be/src/util/min-max-filter-test.cc@653
PS27, Line 653: CheckDecimalVals(filter##SIZE, decimal##SIZE##_type, 
d1##SIZE, d2##SIZE);  \
line too long (108 > 90)


http://gerrit.cloudera.org:8080/#/c/16720/27/be/src/util/min-max-filter-test.cc@657
PS27, Line 657: CheckDecimalVals(filter##SIZE, decimal##SIZE##_type, 
d3##SIZE, d2##SIZE);  \
line too long (108 > 90)


http://gerrit.cloudera.org:8080/#/c/16720/27/be/src/util/min-max-filter-test.cc@669
PS27, Line 669: CheckDecimalVals(filter##SIZE##2, decimal##SIZE##_type, 
d3##SIZE, d2##SIZE); \
line too long (110 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 27
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 04 Dec 2020 21:05:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate

2020-12-04 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#27). ( 
http://gerrit.cloudera.org:8080/16720 )

Change subject: [WIP] IMPALA-10325: Parquet scan should use min/max statistics 
to skip pages based on equi-join predicate
..

[WIP] IMPALA-10325: Parquet scan should use min/max statistics to skip pages 
based on equi-join predicate

This patch adds the logic to utilize min/max stats for Parquet row
groups or pages to skip these entities when they don't satisfy an
equi-join predicate.

A new class of predicates called overlap predicates is introduced to aid
in determining whether a Parquet row group or page overlaps with a
range computed from the hash join. If not, then the entire Parquet row
group or page is skipped.
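
The overlap test itself amounts to a closed-interval comparison. The following Java sketch shows the idea under the assumption that both the row group (or page) stats and the join-side range are inclusive [min, max] pairs; the RangeOverlap name is hypothetical and the real implementation in min-max-filter.cc is C++ and type specific.

```java
// Illustrative sketch; RangeOverlap is a hypothetical name.
final class RangeOverlap {
  // True if the closed ranges [statsMin, statsMax] and [filterMin, filterMax]
  // share at least one value.
  static <T extends Comparable<T>> boolean overlaps(
      T statsMin, T statsMax, T filterMin, T filterMax) {
    return statsMin.compareTo(filterMax) <= 0 && filterMin.compareTo(statsMax) <= 0;
  }

  // A row group or page can be skipped when its stats range misses the filter range.
  static <T extends Comparable<T>> boolean canSkip(
      T statsMin, T statsMax, T filterMin, T filterMax) {
    return !overlaps(statsMin, statsMax, filterMin, filterMax);
  }

  public static void main(String[] args) {
    System.out.println(canSkip(1, 10, 11, 20)); // disjoint ranges
    System.out.println(canSkip(1, 10, 5, 20));  // ranges share 5..10
  }
}
```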

The new class of predicates co-exists with the existing min/max conjuncts
that are introduced based on the local or transitive scan predicates.
Both classes of predicates can work individually or together with each
other. The overlap predicates are evaluated after the existing min/max
conjuncts.

Two new run-time profile counters are added for the number of row groups
or pages filtered out via the overlap predicates respectively:
  1. NumMinMaxFilteredRowGroups
  2. NumMinMaxFilteredPages

An overlap predicate associated with a join column of type J and a scan
column of type S will be formed provided one of the following is true:
   Both S and J are Booleans
   Both S and J are Integers (tinyint, smallint, int, or bigint)
   Both S and J are approximate numeric (float or double)
   Both S and J are Decimals with the same precision and scale
   Both S and J are strings (STRING, CHAR or VARCHAR)
   Both S and J are date
   Both S and J are timestamp
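
The compatibility rules listed above can be expressed as a small type-family check. The Java sketch below is only an illustration of that list; the OverlapTypeCheck class, its enums, and the ColType holder are hypothetical and do not mirror the planner's actual type classes.

```java
// Illustrative sketch of the rules listed above; all names are hypothetical.
final class OverlapTypeCheck {
  enum Kind {
    BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE,
    DECIMAL, STRING, CHAR, VARCHAR, DATE, TIMESTAMP
  }

  enum Family { BOOLEAN, INTEGER, APPROX_NUMERIC, DECIMAL, STRING, DATE, TIMESTAMP }

  static final class ColType {
    final Kind kind;
    final int precision; // only meaningful for DECIMAL
    final int scale;     // only meaningful for DECIMAL

    private ColType(Kind kind, int precision, int scale) {
      this.kind = kind;
      this.precision = precision;
      this.scale = scale;
    }

    static ColType of(Kind kind) { return new ColType(kind, 0, 0); }

    static ColType decimal(int precision, int scale) {
      return new ColType(Kind.DECIMAL, precision, scale);
    }
  }

  static Family familyOf(Kind kind) {
    switch (kind) {
      case BOOLEAN: return Family.BOOLEAN;
      case TINYINT: case SMALLINT: case INT: case BIGINT: return Family.INTEGER;
      case FLOAT: case DOUBLE: return Family.APPROX_NUMERIC;
      case DECIMAL: return Family.DECIMAL;
      case STRING: case CHAR: case VARCHAR: return Family.STRING;
      case DATE: return Family.DATE;
      case TIMESTAMP: return Family.TIMESTAMP;
      default: throw new IllegalArgumentException("unknown kind: " + kind);
    }
  }

  // Same family is required; decimals must also agree on precision and scale.
  static boolean canFormOverlapPredicate(ColType scan, ColType join) {
    Family family = familyOf(scan.kind);
    if (family != familyOf(join.kind)) return false;
    if (family == Family.DECIMAL) {
      return scan.precision == join.precision && scan.scale == join.scale;
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(canFormOverlapPredicate(ColType.of(Kind.INT), ColType.of(Kind.BIGINT)));
    System.out.println(canFormOverlapPredicate(ColType.decimal(10, 2), ColType.decimal(12, 2)));
  }
}
```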

Testing:
1. Added data type specific overlap method tests in
   min-max-filter-test.cc (boolean, int, string, date, timestamp and
   decimal);
2. Unit tested on various column types (int, bigint, string
   and decimal) with TPCH and TPCDS tables. Benefits were significant
   when the join column on the outer table is sorted, or when the
   min/max boundary values of the pages or row groups are monotonic;
3. Added new tests in min_max_filters.test to demonstrate filtered
   pages and row groups.

TBD:
1. Compute a usefulness score for the overlap predicate and integrate
   it into MAX_NUM_RUNTIME_FILTERS limit;
2. Performance measurement;
3. Core testing.

Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
---
M be/src/exec/exec-node.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.h
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/date-value.cc
M be/src/runtime/date-value.h
M be/src/runtime/decimal-value.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test
23 files changed, 1,090 insertions(+), 153 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16720/27
--
To view, visit http://gerrit.cloudera.org:8080/16720
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691
Gerrit-Change-Number: 16720
Gerrit-PatchSet: 27
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint

2020-12-04 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16792 )

Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@223
PS5, Line 223: if (getTableRefs().size() == 1) return true;
> Should we remove this? It seems hasConvertLimitToSampleHint() can return tr
We cannot remove this because, as I mentioned in a previous comment (patchset 2), 
the table level hint is not required in order for simple limit optimization to 
be applied. For example, there are 2 cases:
 1.   select * from (select * from  t where [always_true] a > 0) limit 10;
 2.   select * from (select * from t [convert_limit_to_sample] where 
[always_true] a > 0) limit 10;
In both cases, we want to be able to apply the optimization. In case 1, it 
will just pick the first 10 files, while in case 2 it will sample across multiple 
partitions. Case 1 will typically have a much faster planning time, so we should 
support that.


http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@209
PS2, Line 209: estimatedTotalRows
> Sounds about right.
I haven't looked into why the past decision was to only support whole numbers 
for the sampling, but probably the use case wasn't there to motivate supporting 
fractional values. You may want to look into the history, but as I said, a 
smaller sample size would be useful in this situation.


http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@217
PS5, Line 217: partitions.size()/numTotalPartitions
> Cool! This will work well when the partitions are about the same size, whic
Yes, there is an assumption of uniform distribution (I should add a comment) as 
this is still a heuristic. I would like to avoid the per partition numRows 
estimate since that can be way off and in the past Tim has also discouraged 
using it. The total estimated row count can be completely off as well, so I 
acknowledge there's weakness here.
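
For readers following along, the uniform-distribution heuristic could look roughly like the Java sketch below: the table-level row estimate is scaled by the fraction of partitions that survive pruning. The ScaledRowEstimate name and the fallback for a zero partition count are assumptions of this example and are not the code under review.

```java
// Illustrative sketch of the heuristic; names and edge-case handling are assumptions.
final class ScaledRowEstimate {
  // Scales the table-level estimate by survivingPartitions / totalPartitions.
  // Multiplying before dividing avoids the truncation to zero that integer
  // division of the two partition counts would cause.
  static long scaledRows(long estimatedTotalRows, int survivingPartitions, int totalPartitions) {
    if (totalPartitions <= 0) return estimatedTotalRows; // nothing to scale by
    return Math.multiplyExact(estimatedTotalRows, (long) survivingPartitions) / totalPartitions;
  }

  public static void main(String[] args) {
    // 10 of 100 equally sized partitions survive pruning.
    System.out.println(scaledRows(1_000_000L, 10, 100));
  }
}
```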

Even if it was accurate, I also didn't want to add a for loop to add up the 
numRows of the surviving partitions since that could potentially run into tens 
of thousands or hundreds of thousands (especially if no pruning happens which 
is quite common).

I am beginning to think the only foolproof way is to let the user specify an exact 
percentage in the hint, e.g. [convert_limit_to_sample=5]. This guarantees a 
5% sampling of surviving partitions if a limit is present and does not rely on 
stats etc. What do you guys think? I will have to add a bit of parsing logic 
to the hint processing.



--
To view, visit http://gerrit.cloudera.org:8080/16792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b
Gerrit-Change-Number: 16792
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 04 Dec 2020 18:57:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16765 )

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..


Patch Set 8: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 04 Dec 2020 18:23:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16765 )

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6729/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 04 Dec 2020 18:23:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10337: Consider MAX ROW SIZE when computing max reservation

2020-12-04 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16765 )

Change subject: IMPALA-10337: Consider MAX_ROW_SIZE when computing max 
reservation
..


Patch Set 7: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id7138e1e034ea5d1cd15cf8de399690e52a9d726
Gerrit-Change-Number: 16765
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 04 Dec 2020 17:46:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint

2020-12-04 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16792 )

Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint
..


Patch Set 5:

(3 comments)

Looks good to me!

http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@223
PS5, Line 223: if (getTableRefs().size() == 1) return true;
Should we remove this? It seems hasConvertLimitToSampleHint() can return true 
or false depending on whether the hint has been set on the only table ref here. 
It might not be set.


http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@209
PS2, Line 209: estimatedTotalRows
> I made this change to use a scaled down value of the estimated row count  (
Sounds about right.

I also like the idea of specifying the sample size in terms of the number of rows, which 
will speed up the sampling of a few rows from a very large table, where 1% 
could be on the order of a million rows. I can file a JIRA on this and work on it 
after the min/max work.


http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@217
PS5, Line 217: partitions.size()/numTotalPartitions
Cool! This will work well when the partitions are about the same size, which is 
mostly true with hash partitions.

For other partition schemes with unequal sizes, such as range partitioning, I 
wonder if the use of HdfsPartition::numRows_ would work:  sample rate = #rows 
to return / # rows in the surviving partitions.



--
To view, visit http://gerrit.cloudera.org:8080/16792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b
Gerrit-Change-Number: 16792
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 04 Dec 2020 14:34:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10361: Use field id to resolve columns for Iceberg tables

2020-12-04 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16788 )

Change subject: IMPALA-10361: Use field id to resolve columns for Iceberg tables
..


Patch Set 5: Code-Review+1

(2 comments)

Thanks for adding the tests, the change looks great. I'm planning to do another 
round next week, so only giving it +1 for now.

I think for Iceberg tables we should always try to resolve columns via field 
id, i.e. for Iceberg tables we can ignore the value of 
PARQUET_FALLBACK_SCHEMA_RESOLUTION.

Do you plan to implement this for ORC tables as well (in a separate patch)? 
Maybe we should open another Jira/subtask for that.

http://gerrit.cloudera.org:8080/#/c/16788/5/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java:

http://gerrit.cloudera.org:8080/#/c/16788/5/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@339
PS5, Line 339: boolean isFullAcidTable = 
AcidUtils.isFullAcidTable(msTbl.getParameters());
Iceberg tables cannot be full ACID, maybe it can be a precondition.


http://gerrit.cloudera.org:8080/#/c/16788/5/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/16788/5/testdata/data/README@608
PS5, Line 608: generated file will contains multi blocks, multi pages per block.
Please add information about the newly added files and tests.



--
To view, visit http://gerrit.cloudera.org:8080/16788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I057bdc6ab2859cc4d40de5ed428d0c20028b8435
Gerrit-Change-Number: 16788
Gerrit-PatchSet: 5
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 04 Dec 2020 13:19:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10361: Use field id to resolve columns for Iceberg tables

2020-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16788 )

Change subject: IMPALA-10361: Use field id to resolve columns for Iceberg tables
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks// : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16788
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I057bdc6ab2859cc4d40de5ed428d0c20028b8435
Gerrit-Change-Number: 16788
Gerrit-PatchSet: 5
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Fri, 04 Dec 2020 11:48:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] WiP: IMPALA-10237: Support Bucket and Truncate partition transforms as built-in functions

2020-12-04 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16741 )

Change subject: WiP: IMPALA-10237: Support Bucket and Truncate partition 
transforms as built-in functions
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16741/7/be/src/exprs/iceberg-functions-ir.cc
File be/src/exprs/iceberg-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16741/7/be/src/exprs/iceberg-functions-ir.cc@59
PS7, Line 59:   if (input.val4 < 0 && result.val4 > 0) {
: return 
TruncatePartitionTransformDecimalImpl(input.val4, width.val);
:   }
Could you add a comment explaining what happens here? Shouldn't we use something like 
RETURN_IF_OVERFLOW in decimal-operators-ir.cc?

impala_udf::DecimalVal is able to hold decimals of any size, but 
impala::DecimalVal might only have 4 bytes of storage, which might be 
problematic in some cases.
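
For context, a check of the form "input < 0 && result > 0" is one way to detect wraparound when a negative value is truncated downward in fixed-width arithmetic. The Java sketch below shows this with a 32-bit int as a stand-in for the 4-byte decimal storage. It assumes truncation rounds toward negative infinity to a multiple of the width; the TruncateTransform class and its methods are hypothetical and are not the functions in iceberg-functions-ir.cc.

```java
import java.util.OptionalInt;

// Illustrative sketch with 32-bit ints; names are hypothetical.
final class TruncateTransform {
  // Rounds v down to a multiple of width; the subtraction may wrap around.
  static int truncateWrapping(int v, int width) {
    return v - Math.floorMod(v, width);
  }

  // Truncation never increases the value, so a negative input with a
  // positive result means the subtraction wrapped around.
  static boolean overflowed(int input, int result) {
    return input < 0 && result > 0;
  }

  // Returns the truncated value, or empty if it does not fit in an int.
  static OptionalInt truncateChecked(int v, int width) {
    if (width <= 0) throw new IllegalArgumentException("width must be positive");
    int result = truncateWrapping(v, width);
    return overflowed(v, result) ? OptionalInt.empty() : OptionalInt.of(result);
  }

  public static void main(String[] args) {
    System.out.println(truncateChecked(25, 10));
    System.out.println(truncateChecked(-1, 10));
    System.out.println(truncateChecked(Integer.MIN_VALUE + 1, 10));
  }
}
```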



--
To view, visit http://gerrit.cloudera.org:8080/16741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I485680cf79d96d578dd8cfbfd554bec468fe84bd
Gerrit-Change-Number: 16741
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 04 Dec 2020 11:48:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10361: Use field id to resolve columns for Iceberg tables

2020-12-04 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/16788 )

Change subject: IMPALA-10361: Use field id to resolve columns for Iceberg tables
..

IMPALA-10361: Use field id to resolve columns for Iceberg tables

This patch adds support for resolving columns by field id for Iceberg
tables. Use 'set PARQUET_FALLBACK_SCHEMA_RESOLUTION=FIELD_ID' or
'set PARQUET_FALLBACK_SCHEMA_RESOLUTION=2' to choose field id
resolution. Note that if you use this option on a non-Iceberg
table, the result will be NULL.
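
The difference between name-based and field-id-based resolution can be illustrated with a minimal Python sketch. It is not the patch's implementation: the dictionaries and the function names resolve_by_name and resolve_by_field_id are hypothetical, and the sketch assumes that a file written outside Iceberg carries no field ids.

```python
def resolve_by_name(file_columns, table_column):
    """Return the index of the file column with the same name, or None."""
    for idx, col in enumerate(file_columns):
        if col["name"] == table_column["name"]:
            return idx
    return None

def resolve_by_field_id(file_columns, table_column):
    """Return the index of the file column with the same field id, or None.

    A missing field id on either side never matches, so a file without
    field ids resolves to None, which a reader would surface as NULL.
    """
    wanted = table_column.get("field_id")
    if wanted is None:
        return None
    for idx, col in enumerate(file_columns):
        if col.get("field_id") == wanted:
            return idx
    return None

# A data file written before the column "user" was renamed to "user_name".
iceberg_file = [{"name": "id", "field_id": 1}, {"name": "user", "field_id": 2}]
# A file with no field ids at all (assumed here for a non-Iceberg table).
plain_file = [{"name": "id"}, {"name": "user_name"}]
renamed_column = {"name": "user_name", "field_id": 2}
```

In this sketch the renamed column is still found in the old Iceberg file by its field id, while name-based lookup misses it; for the file without field ids the field-id lookup finds nothing.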

Change-Id: I057bdc6ab2859cc4d40de5ed428d0c20028b8435
---
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/exec/parquet/parquet-metadata-utils.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/types.cc
M be/src/runtime/types.h
M be/src/service/query-options-test.cc
M common/thrift/CatalogObjects.thrift
M common/thrift/Descriptors.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/Types.thrift
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java
A fe/src/main/java/org/apache/impala/catalog/IcebergStructField.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/StructType.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/1-1-13d79bd6-4b97-4680-b4e1-52e93b6ce04e-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/6-6-305c9b7a-f42d-4245-b806-dfa7a792593f-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/9-9-224fe2d6-b0d9-42d6-bc95-15f52ecb29ad-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00017-17-1a38e294-5992-48d9-a18e-08e129bb418c-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00023-23-74cfcf22-3de2-489a-b1ec-d5141e75a8e8-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00027-27-5f91dc85-b8f3-4cc2-a5c6-38b7fee49709-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00030-30-dc3510cc-e765-43bc-be03-c5561a8d50a3-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-08/action=view/00031-31-364afc4a-b718-406d-a532-58fab5c8f85d-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/4-4-7a1a8e89-8aeb-4405-be64-76557432cf21-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00014-14-765d552a-fddc-42f3-adfd-ecba20a01d80-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00015-15-9957db43-3b9a-4a50-9946-d003cc1d461c-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00019-19-1e1895d0-1f42-4c30-989f-968802831077-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00020-20-bb59ac6d-aeee-4c35-9f8a-1a03127d33b8-0.parquet
A testdata/data/iceberg_test/hadoop_catalog/iceberg_resolution_test/functional_parquet/iceberg_resolution_test/data/event_time_hour=2020-01-01-09/action=click/00028-28-44ba3ad9-737c-4416-a32c-501cc9a4aa90-0.parquet
A 

[Impala-ASF-CR] IMPALA-9936: Only send invalidations in DDL responses to LocalCatalog coordinators

2020-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16435 )

Change subject: IMPALA-9936: Only send invalidations in DDL responses to 
LocalCatalog coordinators
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16435

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id45827295ddee3eb6e98a11c55f582b2aebe5f38
Gerrit-Change-Number: 16435
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 04 Dec 2020 11:12:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint

2020-12-04 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16792 )

Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/7776/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16792

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b
Gerrit-Change-Number: 16792
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 04 Dec 2020 08:31:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint

2020-12-04 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16792 )

Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@209
PS2, Line 209: estimatedTotalRows
> The TABLESAMPLE is a long type, so yeah the minimum can be 1%.  You're righ
I made this change to use a scaled-down value of the estimated row count 
(after partition pruning). I also added a test that exercises both partition 
pruning and convert_limit_to_sample. When adding the test I realized that in 
my previous patch set, compute stats was not run on the 
alltypes_date_partition_2 table. I added that to the compute-table-stats.sh 
script and made related updates to the plans.



--
To view, visit http://gerrit.cloudera.org:8080/16792

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b
Gerrit-Change-Number: 16792
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 04 Dec 2020 08:13:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint

2020-12-04 Thread Aman Sinha (Code Review)
Hello Qifan Chen, Shant Hovsepian, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16792

to look at the new patch set (#5).

Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint
..

IMPALA-10360: Allow simple limit to be treated as sampling hint

As a follow-up to IMPALA-10314, it is sometimes useful to consider
a simple limit as a way to sample from a table if a relevant hint
has been provided. Doing a sample instead of pure limit serves
dual purposes: (a) it still helps with reducing the planning time
since the scan ranges need be computed only for the sample files,
(b) it allows a sufficient number of files/rows to be read from
the table such that after applying filter conditions or joins with
another table, the query may still produce the N rows needed for
limit.

This functionality is especially useful if the query is against a
view (note that TABLESAMPLE clause cannot be applied to a view).

In this patch, a new table level hint, 'convert_limit_to_sample'
is added. If this hint is attached to a table either in the main
query block or within a view/subquery and simple limit optimization
conditions are satisfied (according to IMPALA-10314), the limit
is converted to a table sample. For example:

 set optimize_simple_limit = true;
 CREATE VIEW v1 as SELECT * FROM T [convert_limit_to_sample]
WHERE [always_true] ;
 SELECT * FROM v1 LIMIT 10;

In this case, the limit 10 is converted to a sample of T and the
sampling percent is the greater of 1% or ratio (in percent) of
limit to the estimated row count of the table (after partition
pruning).
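
The percentage rule above can be written out as a short Python sketch. The function name sample_percent is hypothetical, and three details are assumptions rather than facts from the patch: the ratio is rounded up to a whole percent, the result is capped at 100, and a table with no row estimate falls back to 100.

```python
def sample_percent(limit, estimated_rows):
    """Greater of 1% and the limit-to-row-count ratio, as a whole percent.

    limit:          the N in LIMIT N
    estimated_rows: estimated row count after partition pruning
    """
    if estimated_rows <= 0:
        # Assumption: without a usable estimate, sample everything.
        return 100
    # Integer ceiling of 100 * limit / estimated_rows (assumed rounding).
    ratio_percent = (100 * limit + estimated_rows - 1) // estimated_rows
    # Floor of 1% from the rule above; cap of 100% is an assumption.
    return min(100, max(1, ratio_percent))
```

For instance, LIMIT 10 against an estimated 7300 rows gives a ratio well under 1%, so the 1% floor applies, while LIMIT 500 against an estimated 1000 rows gives 50%.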

Testing:
 - Added an alltypes_date_partition_2 table where the date and
   timestamp values match (this helps with setting the
   'always_true' hint).
 - Added views with 'convert_limit_to_sample' and 'always_true'
   hints and added new tests against the views. Modified a few
   existing tests to reference the new table variant.
 - Added an end-to-end test.

Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b
---
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test
M testdata/workloads/functional-query/queries/QueryTest/range-constant-propagation.test
10 files changed, 279 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/16792/5
--
To view, visit http://gerrit.cloudera.org:8080/16792

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b
Gerrit-Change-Number: 16792
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong