[Impala-ASF-CR] IMPALA-9741: Supported query Iceberg table by impala

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Iceberg table by impala
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6612/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 7
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 16 Jul 2020 05:13:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9741: Supported query Iceberg table by impala

2020-07-15 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Iceberg table by impala
..

IMPALA-9741: Supported query Iceberg table by impala

This patch mainly implements querying of Iceberg tables through Impala.
The following SQL creates an external Iceberg table:
CREATE EXTERNAL TABLE default.iceberg_test (
level string,
event_time timestamp,
message string
)
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
Or specify only the table name and location, like this:
CREATE EXTERNAL TABLE default.iceberg_test
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
'iceberg_file_format' is the file format used in Iceberg. Currently only
PARQUET is supported; other formats will be supported in the future. If
this property is not specified in the SQL, the default file format is
PARQUET.

We achieve this by treating the Iceberg table as a normal
unpartitioned HDFS table. When querying an Iceberg table, we push down
partition column predicates to Iceberg to decide which data files
need to be scanned, and then pass this information to the BE to
do the real scan operation.
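
The file selection step described above can be pictured with a small
Python sketch. This is only an illustration and not the code in this
patch: DataFile and plan_files are hypothetical names, and the
per-file partition values stand in for whatever metadata Iceberg
actually consults.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for one data file entry in the table metadata."""
    path: str
    partition: Dict[str, object]  # partition column -> value for this file


def plan_files(files: List[DataFile],
               predicates: Dict[str, Callable[[object], bool]]) -> List[str]:
    """Return the paths of the files that still need to be scanned.

    A file is pruned only when a predicate on one of its partition
    columns evaluates to False. Predicates on other columns cannot
    prune anything at this stage, so they are ignored here.
    """
    selected = []
    for f in files:
        if all(pred(f.partition[col])
               for col, pred in predicates.items()
               if col in f.partition):
            selected.append(f.path)
    return selected


files = [
    DataFile("hdfs://xxx/data/a.parquet", {"level": "ERROR"}),
    DataFile("hdfs://xxx/data/b.parquet", {"level": "INFO"}),
    DataFile("hdfs://xxx/data/c.parquet", {"level": "ERROR"}),
]
# Roughly what a filter such as "WHERE level = 'ERROR'" would push down.
to_scan = plan_files(files, {"level": lambda v: v == "ERROR"})
```

Only the surviving paths would then be handed to the backend for the
actual scan.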

Testing:
- Unit test for Iceberg in FileMetadataLoaderTest
- Create table tests in functional_schema_template.sql
- Iceberg table query test in custom cluster test test_iceberg.py

Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
---
M be/src/runtime/descriptors.cc
M bin/rat_exclude_files.txt
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M testdata/data/README
A testdata/data/iceberg_test/iceberg_non_partitioned/data/1-100-e1a80ed6-1064-494d-9cdd-c4a30c1ab8dc-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/3-102-511427f2-85f0-43ae-9b39-a456f8dc57b6-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/4-103-00fc55e1-6ef7-4241-ace2-6d075b9737fc-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/6-105-ef9e76d5-c060-4040-8aa1-b7c275610daa-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/7-106-c09c9c8d-9478-44f9-8501-f85f53112bc3-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/9-108-3b4f06ac-dca3-4f4e-be60-bf42d9927b5b-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00011-110-1e653ccf-0963-4fb0-941c-32c9de13268b-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00012-111-dfa70658-eb4b-4fa0-9ffa-b892cf90d6ac-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00014-113-2d16e751-e2a4-4856-ab89-145996e3815e-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00015-114-0f710621-cbbf-4509-a93d-b58808978e2e-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00017-116-0b666c79-53df-4507-906c-542e65a83443-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00019-118-1bc6bc6e-e061-4da3-9d1e-a427a306c471-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00020-119-ae7b2c67-1538-4429-8246-4998960e3817-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00022-121-8db0f1e1-d88c-4aad-a8b3-24fd07329cdb-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00023-122-de57b6b0-f54b-40ac-85cd-e783505094b6-0.parquet
A 

[Impala-ASF-CR] IMPALA-3127: Support incremental metadata updates in partition level

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16159 )

Change subject: IMPALA-3127: Support incremental metadata updates in partition 
level
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6611/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16159

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0abfb346903d6e7cdc603af91c2b8937d24d870
Gerrit-Change-Number: 16159
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 16 Jul 2020 04:01:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3127: Support incremental metadata updates in partition level

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16159 )

Change subject: IMPALA-3127: Support incremental metadata updates in partition 
level
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6139/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/16159

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0abfb346903d6e7cdc603af91c2b8937d24d870
Gerrit-Change-Number: 16159
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 16 Jul 2020 03:35:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3127: Support incremental metadata updates in partition level

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16159 )

Change subject: IMPALA-3127: Support incremental metadata updates in partition 
level
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16159/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16159/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1574
PS2, Line 1574: for (Map.Entry part : 
hdfsTable.getPartitions().entrySet()) {
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/16159/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16159/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4241
PS2, Line 4241: // TODO(IMPALA-9937): if client is a 'v1' impalad, only 
send back incremental updates
line too long (93 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16159

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0abfb346903d6e7cdc603af91c2b8937d24d870
Gerrit-Change-Number: 16159
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 16 Jul 2020 03:34:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3127: Support incremental metadata updates in partition level

2020-07-15 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16159 )

Change subject: IMPALA-3127: Support incremental metadata updates in partition 
level
..


Patch Set 2:

(6 comments)

Thanks for the review! Addressed the comments.

http://gerrit.cloudera.org:8080/#/c/16159/1/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/16159/1/common/thrift/CatalogObjects.thrift@424
PS1, Line 424:
> nit, partition
Removed this field


http://gerrit.cloudera.org:8080/#/c/16159/1/common/thrift/CatalogObjects.thrift@425
PS1, Line 425:   // Each TNetworkAddress is a datanode which contains blocks of 
a file in the table.
 :   // Used so that each THdfsFileBlock can just reference an 
index in this list rather
 :   // than duplicate the list of network address, w
> Is there any value of having a new field? Seems like this list is always de
Done. Merged the list into the partition map and introduced some flags to 
indicate the state.


http://gerrit.cloudera.org:8080/#/c/16159/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java
File fe/src/main/java/org/apache/impala/catalog/Catalog.java:

http://gerrit.cloudera.org:8080/#/c/16159/1/fe/src/main/java/org/apache/impala/catalog/Catalog.java@632
PS1, Line 632: ":"
> I feel having space in the catalogObjectKey is bit unconventional and can c
Sure. Will change to ":"


http://gerrit.cloudera.org:8080/#/c/16159/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16159/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@710
PS1, Line 710:   byte[] data = serializer.serialize(minimalObject);
 :   String v2Key = 
CatalogServiceConstants.CATALOG_TOPIC_V2_PREFIX + key;
 :
> You may want to consider sending the updates for partitions as well since t
Yeah, good point! We can send invalidation on the old (replaced) partition id.


http://gerrit.cloudera.org:8080/#/c/16159/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16159/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1614
PS1, Line 1614: ptor tableDesc = new TTab
> perhaps a better name could be toThriftWithMinimalPartitions since we are a
Done


http://gerrit.cloudera.org:8080/#/c/16159/1/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
File fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16159/1/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java@469
PS1, Line 469: pdates.
 : if (newTable instanceof HdfsTable
> I think it would be more readable if we move this to a method called isFrom
Done. Merged the list into the map.



--
To view, visit http://gerrit.cloudera.org:8080/16159

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0abfb346903d6e7cdc603af91c2b8937d24d870
Gerrit-Change-Number: 16159
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 16 Jul 2020 03:34:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3127: Support incremental metadata updates in partition level

2020-07-15 Thread Quanlong Huang (Code Review)
Hello Anurag Mantripragada, Vihang Karajgaonkar, Tim Armstrong, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16159

to look at the new patch set (#2).

Change subject: IMPALA-3127: Support incremental metadata updates in partition 
level
..

IMPALA-3127: Support incremental metadata updates in partition level

Currently, partitions are tightly integrated into the HdfsTable objects.
Catalogd has to transmit the entire table metadata even when few
partitions change. This is a waste of resources and can lead to OOM in
transmitting large tables due to the 2GB JVM array limit.

This patch makes HdfsPartition extend CatalogObject so the catalogd can
send partitions as individual catalog objects. Consequently, table
objects in the catalog topic update can have minimal partition maps that
only contain the partition ids, which reduces the thrift object size for
large tables. The catalog object key of HdfsPartition consists of db
name, table name and partition name.

In "full" topic mode (catalog_topic_mode=full), catalogd only sends
changed partitions with their latest table states. The latest table
states are table objects with the minimal partition map. Legacy
coordinators use the partition list to pick up existing (unchanged)
partitions from the existing table object and new partitions in the
catalog update.

Currently, partition instances are immutable - all partition
modifications are implemented by deleting the old instance and adding a
new one with a new partition id. Since partition ids are generated by a
global counter, newer partition instances will have larger partition
ids. So catalogd maintains a watermark for each table as the max sent
partition id. Partition instances with ids larger than this are new
partitions that should be sent in the next catalog update. Deleted
partition instances are kept in a set for each table until
the next catalog update. If there are no updates on the same partition
name, catalogd will send a deletion for the partition.
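
As a rough illustration of the watermark bookkeeping described above,
here is a minimal Python sketch. It is not the catalogd implementation:
TableState, upsert, drop and collect_update are hypothetical names,
and a plain counter stands in for the global partition id generator.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Set, Tuple

_next_id = count()  # stand-in global counter: newer instances get larger ids


@dataclass
class TableState:
    partitions: Dict[str, int] = field(default_factory=dict)  # name -> id
    max_sent_id: int = -1                                     # the watermark
    dropped_names: Set[str] = field(default_factory=set)      # since last update

    def upsert(self, name: str) -> None:
        """Add or modify a partition by creating a fresh instance."""
        if name in self.partitions:
            self.dropped_names.add(name)  # the old instance is deleted
        self.partitions[name] = next(_next_id)

    def drop(self, name: str) -> None:
        del self.partitions[name]
        self.dropped_names.add(name)

    def collect_update(self) -> Tuple[List[str], List[str]]:
        """Return (partition names to send, partition names to delete)."""
        updates = sorted(name for name, pid in self.partitions.items()
                         if pid > self.max_sent_id)
        # A deletion is sent only if no newer instance reuses the name.
        deletions = sorted(name for name in self.dropped_names
                           if name not in self.partitions)
        if self.partitions:
            self.max_sent_id = max(self.max_sent_id,
                                   max(self.partitions.values()))
        self.dropped_names.clear()
        return updates, deletions


t = TableState()
t.upsert("year=2009")
t.upsert("year=2010")
first = t.collect_update()   # both partitions are new
t.upsert("year=2010")        # modified: new instance with a larger id
t.drop("year=2009")
second = t.collect_update()  # one update, one deletion
```

In this sketch the second update carries only the new year=2010
instance plus a deletion for year=2009; unchanged partitions are never
resent.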

For dropped or invalidated tables, catalogd will still send deletions on
their partitions. Although they are not used in coordinators
(coordinators delete the partitions when they delete the table
instances), they help in avoiding topic entry leak in the statestore
catalog topic.

In "minimal" topic mode (catalog_topic_mode=minimal), catalogd only
sends invalidations on tables and stale partition instances. Each
partition instance is identified by its partition id. LocalCatalog
coordinators use the partition invalidations to evict stale partitions
in time. For instance, let's say partition(year=2010) is updated in
catalogd. This is done by deleting the old partition instance
partition(id=0, year=2010) and adding a new partition instance
partition(id=1, year=2010). Catalogd will send invalidations on the
table and partition instance with id=0, but not the one with id=1. A
LocalCatalog coordinator will invalidate the partition instance(id=0) if
it's in the cache. If the partition instance(id=1) is cached, it's
already the latest version since partition instances are immutable. So
we don't need to invalidate it.
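
The year=2010 example above can be mimicked with a toy cache keyed by
partition id. This is a sketch under the assumption that the
coordinator cache behaves like a simple dictionary; PartitionCache and
its methods are hypothetical names, not LocalCatalog APIs.

```python
from typing import Dict, Iterable, List


class PartitionCache:
    """Hypothetical coordinator-side cache keyed by immutable partition id."""

    def __init__(self) -> None:
        self._by_id: Dict[int, str] = {}

    def put(self, part_id: int, metadata: str) -> None:
        self._by_id[part_id] = metadata

    def invalidate(self, stale_ids: Iterable[int]) -> int:
        """Evict the listed instances; ids that are not cached are ignored."""
        evicted = 0
        for part_id in stale_ids:
            if self._by_id.pop(part_id, None) is not None:
                evicted += 1
        return evicted

    def cached_ids(self) -> List[int]:
        return sorted(self._by_id)


cache = PartitionCache()
cache.put(0, "partition(id=0, year=2010)")
cache.put(1, "partition(id=1, year=2010)")
# The invalidation names only the replaced instance (id=0).
evicted = cache.invalidate([0])
```

The instance with id=1 stays cached, which is safe because an
immutable instance can never become stale in place.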

Tests
 - Run exhaustive tests.
 - Run exhaustive test_ddl.py in LocalCatalog mode.
 - (TODO) Add tests with a long statestore update interval so that
   several table changes are sent in the same topic update.
 - (TODO) Add tests for straggler coordinators that need to process
   several incremental updates at once.
 - (TODO) Add tests verifying that no statestore topic entries leak.

Change-Id: Ia0abfb346903d6e7cdc603af91c2b8937d24d870
---
M be/src/catalog/catalog-util.cc
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogObject.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
12 files changed, 501 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/16159/2
--
To view, visit http://gerrit.cloudera.org:8080/16159

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia0abfb346903d6e7cdc603af91c2b8937d24d870
Gerrit-Change-Number: 16159
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 

[Impala-ASF-CR] IMPALA-6788: Abort ExecFInstance() RPC loop early after query failure

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16192 )

Change subject: IMPALA-6788: Abort ExecFInstance() RPC loop early after query 
failure
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6610/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16192

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I034788f7720fc97c25c54f006ff72dce6cb199c3
Gerrit-Change-Number: 16192
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Thu, 16 Jul 2020 02:34:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6788: Abort ExecFInstance() RPC loop early after query failure

2020-07-15 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/16192 )

Change subject: IMPALA-6788: Abort ExecFInstance() RPC loop early after query 
failure
..

IMPALA-6788: Abort ExecFInstance() RPC loop early after query failure

Stops issuing ExecQueryFInstance RPCs and cancels any in-flight RPCs
when a backend reports failure.
Adds new debug action CONSTRUCT_QUERY_STATE_REPORT that runs when
constructing a query state report.
Adds a new test case for handling errors reported from query state.
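
The early-abort behaviour can be sketched in a few lines of Python.
This is a simplified, sequential model and not the coordinator code:
start_backends, send_exec_rpc and cancel_rpc are hypothetical names,
and a threading.Event stands in for the failure reported by a backend.

```python
import threading
from typing import Callable, List


def start_backends(backends: List[str],
                   send_exec_rpc: Callable[[str], None],
                   cancel_rpc: Callable[[str], None],
                   query_failed: threading.Event) -> List[str]:
    """Issue one Exec RPC per backend, stopping once a failure is flagged.

    Returns the backends whose RPC was issued before the abort.
    """
    issued: List[str] = []
    for backend in backends:
        if query_failed.is_set():
            break  # do not start any further RPCs
        send_exec_rpc(backend)
        issued.append(backend)
    if query_failed.is_set():
        for backend in issued:
            cancel_rpc(backend)  # cancel whatever was already sent
    return issued


failed = threading.Event()
cancelled: List[str] = []


def fake_send(backend: str) -> None:
    # Pretend that a status report from host2 carries an error.
    if backend == "host2":
        failed.set()


issued = start_backends(["host1", "host2", "host3"],
                        fake_send, cancelled.append, failed)
```

Here host3 never receives an RPC, and the two RPCs that were already
issued are cancelled.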

Testing:
 - Ran the following command for the new test case and verified that the
   code works as expected:
 ./bin/impala-py.test \
   tests/custom_cluster/test_rpc_exception.py::TestRPCException::test_state_report_error \
   --workload_exploration_strategy=functional-query:exhaustive
 - Passed exhaustive tests.

Change-Id: I034788f7720fc97c25c54f006ff72dce6cb199c3
---
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.cc
M tests/custom_cluster/test_rpc_exception.py
3 files changed, 39 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/16192/3
--
To view, visit http://gerrit.cloudera.org:8080/16192

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I034788f7720fc97c25c54f006ff72dce6cb199c3
Gerrit-Change-Number: 16192
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-9956: inline hot functions in Sorter

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16202 )

Change subject: IMPALA-9956: inline hot functions in Sorter
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16202

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a8034ab6d2e3c71a2d2f2fcc3d6b788e9398194
Gerrit-Change-Number: 16202
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 16 Jul 2020 00:01:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9956: inline hot functions in Sorter

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16202 )

Change subject: IMPALA-9956: inline hot functions in Sorter
..

IMPALA-9956: inline hot functions in Sorter

Add some compiler hints to force inlining of small
functions into the hot Partition() loop.

Performance:
A single node perf run on TPC-H showed no perf change.

A single node performance run with the queries that target
sort performance showed up to a 19% reduction in time spent
in the sort.

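+-------------------+-----------------------+---------+------------+------------+----------------+
| Workload          | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-------------------+-----------------------+---------+------------+------------+----------------+
| TARGETED-PERF(30) | parquet / none / none | 5.52    | -5.82%     | 4.00       | -9.74%         |
+-------------------+-----------------------+---------+------------+------------+----------------+

+-------------------+-------------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| Workload          | Query                               | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+-------------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| TARGETED-PERF(30) | primitive_orderby_all               | parquet / none / none | 11.89  | 12.22       | -2.73%     | 1.07%     | 1.20%          | 10    | -2.88%         | -3.13   | -5.42  |
| TARGETED-PERF(30) | primitive_orderby_bigint_expression | parquet / none / none | 2.61   | 2.94        | I -11.27%  | 0.83%     | 1.14%          | 10    | I -12.56%      | -3.58   | -26.25 |
| TARGETED-PERF(30) | primitive_orderby_bigint            | parquet / none / none | 2.06   | 2.42        | I -14.80%  | 0.94%     | 0.68%          | 10    | I -17.43%      | -3.58   | -44.37 |
+-------------------+-------------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+

(I) Improvement: TARGETED-PERF(30) primitive_orderby_bigint_expression [parquet / none / none] (2.94s -> 2.61s [-11.27%])
+---------------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+-------+-----------+
| Operator            | % of Query | Avg      | Base Avg | Delta(Avg) | StdDev(%) | Max      | Base Max | Delta(Max) | #Hosts | #Inst | #Rows | Est #Rows |
+---------------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+-------+-----------+
| 02:ANALYTIC         | 11.84%     | 332.95ms | 337.56ms | -1.37%     | 4.86%     | 360.86ms | 379.52ms | -4.92%     | 1      | 1     | 5.09M | 18.00M    |
| F00:EXCHANGE SENDER | 15.61%     | 439.03ms | 454.63ms | -3.43%     | 4.86%     | 478.29ms | 485.79ms | -1.55%     | 1      | 1     | -1    | -1        |
| 01:SORT             | 67.05%     | 1.89s    | 2.21s    | -14.88%    | 0.98%     | 1.92s    | 2.26s    | -15.07%    | 1      | 1     | 5.09M | 18.00M    |
+---------------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+-------+-----------+

(I) Improvement: TARGETED-PERF(30) primitive_orderby_bigint [parquet / none / none] (2.42s -> 2.06s [-14.80%])
+---------------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+-------+-----------+
| Operator            | % of Query | Avg      | Base Avg | Delta(Avg) | StdDev(%) | Max      | Base Max | Delta(Max) | #Hosts | #Inst | #Rows | Est #Rows |
+---------------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+-------+-----------+
| 02:ANALYTIC         | 15.39%     | 367.90ms | 373.26ms | -1.44%     | 3.48%     | 390.03ms | 393.01ms | -0.76%     | 1      | 1     | 5.09M | 18.00M    |
| F00:EXCHANGE SENDER | 15.64%     | 373.88ms | 374.12ms | -0.07%     | 2.83%     | 389.96ms | 386.36ms | +0.93%     | 1      | 1     | -1    | -1        |
| 01:SORT             | 56.28%     | 1.35s    | 1.68s    | -20.10%    | 1.14%     | 1.38s    | 1.70s    | -18.92%    | 1      | 1     | 5.09M | 18.00M    |
| 00:SCAN HDFS        | 9.67%      | 231.18ms | 231.77ms | -0.25%     | 7.06%     | 247.79ms | 250.70ms | -1.16%     | 1      | 1     | 5.09M | 18.00M    |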

[Impala-ASF-CR] IMPALA-8125: Add query option to limit number of hdfs writer instances

2020-07-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16204 )

Change subject: IMPALA-8125: Add query option to limit number of hdfs writer 
instances
..


Patch Set 1:

(6 comments)

I had some initial things that I noticed while doing a pass over it.

http://gerrit.cloudera.org:8080/#/c/16204/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16204/1//COMMIT_MSG@9
PS1, Line 9: This patch adds a new query option MAX_HDFS_WRITERS that limits the
Maybe we should call it FS instead of HDFS? Just cause the name is a bit 
anachronistic at this point.


http://gerrit.cloudera.org:8080/#/c/16204/1//COMMIT_MSG@26
PS1, Line 26: - Added e2e tests to confirm that the scheduler is enforcing the 
limit
It seemed based on first glance that we should probably have end-to-end tests 
for more of the different plan shapes that could be generated. Maybe I am 
missing something though.


http://gerrit.cloudera.org:8080/#/c/16204/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/16204/1/be/src/scheduling/scheduler.cc@496
PS1, Line 496: // This implementation ensures that instances on the same 
host get consecutive
Can you also comment what it's trying to achieve (i.e. Create the desired 
number of instances while balancing them across hosts).
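
For illustration, the goal stated in this comment (create the desired
number of instances, balance them across hosts, and keep the instances
of one host on consecutive indexes) could look like the Python sketch
below. It is an assumption about the intent, not the logic in
scheduler.cc; assign_instances is a hypothetical name.

```python
from typing import Dict, List


def assign_instances(hosts: List[str], num_instances: int) -> Dict[str, List[int]]:
    """Spread num_instances over hosts as evenly as possible.

    Instances placed on the same host receive consecutive indexes.
    """
    if not hosts or num_instances <= 0:
        return {}
    base, extra = divmod(num_instances, len(hosts))
    assignment: Dict[str, List[int]] = {}
    next_idx = 0
    for i, host in enumerate(hosts):
        n = base + (1 if i < extra else 0)  # first `extra` hosts take one more
        if n == 0:
            continue  # fewer instances than hosts: this host gets none
        assignment[host] = list(range(next_idx, next_idx + n))
        next_idx += n
    return assignment


plan = assign_instances(["host1", "host2", "host3"], 4)
```

With three hosts and a limit of four writers, host1 ends up with
indexes 0 and 1, while host2 and host3 get one instance each.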


http://gerrit.cloudera.org:8080/#/c/16204/1/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/16204/1/be/src/service/query-options.cc@685
PS1, Line 685: break;
Uhh... oops. It would be good to file a separate JIRA for the missing breaks. 
Just cause it could be something people actually run into.


http://gerrit.cloudera.org:8080/#/c/16204/1/testdata/workloads/functional-planner/queries/PlannerTest/insert-hdfs-writer-limit.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/insert-hdfs-writer-limit.test:

http://gerrit.cloudera.org:8080/#/c/16204/1/testdata/workloads/functional-planner/queries/PlannerTest/insert-hdfs-writer-limit.test@6
PS1, Line 6:  PLAN
Maybe we should only include DISTRIBUTEDPLAN in these tests? I feel like the 
single node plans are mostly adding noise.


http://gerrit.cloudera.org:8080/#/c/16204/1/tests/custom_cluster/test_mt_dop.py
File tests/custom_cluster/test_mt_dop.py:

http://gerrit.cloudera.org:8080/#/c/16204/1/tests/custom_cluster/test_mt_dop.py@117
PS1, Line 117:   
@CustomClusterTestSuite.with_args(impalad_args="--unlock_mt_dop=true", 
cluster_size=3)
We actually set unlock_mt_dop=true for the e2e tests, so I think this could be 
an end-to-end test.



--
To view, visit http://gerrit.cloudera.org:8080/16204

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17c8e61b9a32d908eec82c83618ff9caa41078a5
Gerrit-Change-Number: 16204
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 15 Jul 2020 23:47:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7001: Fix Privilege inconsistency between SHOW TABLES and SHOW FUNCTIONS

2020-07-15 Thread Fang-Yu Rao (Code Review)
Fang-Yu Rao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16199 )

Change subject: IMPALA-7001: Fix Privilege inconsistency between SHOW TABLES 
and SHOW FUNCTIONS
..


Patch Set 2: Code-Review+1

Hi Adam, thanks for working on this patch!

The patch looks good to me since you have implemented what Fredy had suggested 
at https://issues.apache.org/jira/browse/IMPALA-7001. I only have two minor 
comments regarding the test and the commit message.

Specifically, after your patch, a user granted only the privilege of CREATE on 
a specified database, e.g., functional, would be able to execute a statement 
like "SHOW FUNCTIONS IN functional", since according to 
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/authorization/Privilege.java
 and 
https://github.com/apache/impala/blob/3a6022ce80ca1cedb629400b18caaf0d1f54137c/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationChecker.java#L431-L453,
 such a statement would succeed as long as the user is granted any privilege in 
the set {ALL, OWNER, ALTER, DROP, CREATE, INSERT, SELECT, REFRESH}.

Before your patch, in order for the statement above to succeed, a user has to 
be granted any privilege in the set {INSERT, SELECT, REFRESH}. Thus I think it 
would be good to add one more test case in 
https://github.com/apache/impala/blob/master/tests/authorization/test_ranger.py,
 where we 1) grant the privilege of CREATE to a user (as admin_client), and 2)  
execute a statement like "SHOW FUNCTIONS IN unique_database" to verify there is 
no exception thrown.
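
To make the before/after difference concrete, the check can be modeled
as a set intersection. The two privilege sets below are the ones
quoted in this comment; can_show_functions is a hypothetical helper
and not part of Impala's authorization code.

```python
# Privileges that let "SHOW FUNCTIONS IN <db>" succeed, per the comment above.
BEFORE_PATCH = {"INSERT", "SELECT", "REFRESH"}
AFTER_PATCH = {"ALL", "OWNER", "ALTER", "DROP", "CREATE",
               "INSERT", "SELECT", "REFRESH"}


def can_show_functions(granted, accepted):
    """True if the user holds at least one of the accepted privileges."""
    return bool(set(granted) & accepted)


create_only = {"CREATE"}
allowed_before = can_show_functions(create_only, BEFORE_PATCH)
allowed_after = can_show_functions(create_only, AFTER_PATCH)
```

A user holding only CREATE is rejected under the old set and accepted
under the new one, which is the case the suggested test would cover.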

On the other hand, I think it may also be good to provide more detail of the 
difference before and after the patch. For instance, we could mention that a 
user granted only the privilege of CREATE is now able to execute that SQL 
statement above after this patch, making it easier for the user to manage the 
functions it creates.


--
To view, visit http://gerrit.cloudera.org:8080/16199

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ae7546c206daaf98ecc3de449069027c43c6e1a
Gerrit-Change-Number: 16199
Gerrit-PatchSet: 2
Gerrit-Owner: Adam Tamas 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 15 Jul 2020 23:27:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8125: Add query option to limit number of hdfs writer instances

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16204 )

Change subject: IMPALA-8125: Add query option to limit number of hdfs writer 
instances
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6609/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16204

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17c8e61b9a32d908eec82c83618ff9caa41078a5
Gerrit-Change-Number: 16204
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 15 Jul 2020 21:48:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8125: Add query option to limit number of hdfs writer instances

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16204 )

Change subject: IMPALA-8125: Add query option to limit number of hdfs writer 
instances
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16204/1/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/16204/1/common/thrift/ImpalaService.thrift@547
PS1, Line 547:   // Sets an upper limit on the number of hdfs writer instances 
used scheduled during insert.
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/16204/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/16204/1/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@325
PS1, Line 325: "create table 
test_hdfs_insert_writer_limit.unpartitioned_table (id int) location '/'");
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/16204/1/tests/custom_cluster/test_mt_dop.py
File tests/custom_cluster/test_mt_dop.py:

http://gerrit.cloudera.org:8080/#/c/16204/1/tests/custom_cluster/test_mt_dop.py@105
PS1, Line 105: class TestMtDopHdfsWriterLimit(CustomClusterTestSuite):
flake8: E302 expected 2 blank lines, found 1


http://gerrit.cloudera.org:8080/#/c/16204/1/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/16204/1/tests/query_test/test_insert.py@354
PS1, Line 354: class TestInsertHdfsWriterLimit(ImpalaTestSuite):
flake8: E302 expected 2 blank lines, found 1


http://gerrit.cloudera.org:8080/#/c/16204/1/tests/query_test/test_insert.py@382
PS1, Line 382: ,
flake8: E231 missing whitespace after ','



--
To view, visit http://gerrit.cloudera.org:8080/16204
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17c8e61b9a32d908eec82c83618ff9caa41078a5
Gerrit-Change-Number: 16204
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 15 Jul 2020 21:21:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8125: Add query option to limit number of hdfs writer instances

2020-07-15 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16204 )


Change subject: IMPALA-8125: Add query option to limit number of hdfs writer 
instances
..

IMPALA-8125: Add query option to limit number of hdfs writer instances

This patch adds a new query option MAX_HDFS_WRITERS that limits the
number of HDFS writer instances.

Highlights:
- Depending on the plan, it either restricts the num of instances of
  the root fragment or adds an exchange and then limits the num of
  instances of that.
- Assigns instances evenly across available backends.
- "no-shuffle" query hint is ignored when using query option.
- Plan behavior changes only when this query option is used.
- The only exception to the previous point is that the optimization
  logic that decides to add an exchange now looks at the num of
  instances instead of the number of nodes.
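
As a rough illustration of the "assigns instances evenly across available
backends" point above, the following Python sketch spreads a capped number of
writer instances over a list of backends. It is not the scheduler code from
this patch: the function name assign_writer_instances, the round-robin
strategy, and the host names are assumptions made only for the example.

```python
def assign_writer_instances(backends, max_writers):
    """Return {backend: instance_count} for at most max_writers instances.

    Hypothetical helper: instances are handed out round-robin, so the counts
    on any two backends differ by at most one.
    """
    if max_writers < 0:
        raise ValueError("max_writers must be non-negative")
    counts = {backend: 0 for backend in backends}
    if not backends:
        return counts
    for i in range(max_writers):
        # Cycle through the backends so no backend gets a second instance
        # before every backend has received its first.
        counts[backends[i % len(backends)]] += 1
    return counts
```

With three hypothetical backends and a limit of 4, one backend receives two
instances and the other two receive one each.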

Testing:
- Added planner tests to cover all cases where this enforcement kicks
  in and to highlight the behavior.
- Added e2e tests to confirm that the scheduler is enforcing the limit
  and distributing the instances evenly across backends.

Change-Id: I17c8e61b9a32d908eec82c83618ff9caa41078a5
---
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/insert-hdfs-writer-limit.test
M tests/custom_cluster/test_mt_dop.py
M tests/query_test/test_insert.py
17 files changed, 1,271 insertions(+), 34 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/16204/1
--
To view, visit http://gerrit.cloudera.org:8080/16204
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I17c8e61b9a32d908eec82c83618ff9caa41078a5
Gerrit-Change-Number: 16204
Gerrit-PatchSet: 1
Gerrit-Owner: Bikramjeet Vig 


[Impala-ASF-CR] Bump up CDP_BUILD_NUMBER to 4493826

2020-07-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16195 )

Change subject: Bump up CDP_BUILD_NUMBER to 4493826
..

Bump up CDP_BUILD_NUMBER to 4493826

This change bumps up the CDP_BUILD_NUMBER to 4493826. This is needed
to fix a failing test.

Hive started to assign bucket ids to files differently. Because of
that I had to modify the test_full_acid_rowid test that had an
assumption about how bucket ids are assigned to files.

If you have problems restarting the Hive Metastore, try the following:

  buildall.sh  -upgrade_metastore_db

If you have problems restarting Kudu, try the following:

  Unset LD_LIBRARY_PATH in your shell, and stop setting it in
  impala-config-local.sh

Change-Id: Ia4635feef146c945624135e0715495bb01ea4699
Reviewed-on: http://gerrit.cloudera.org:8080/16195
Tested-by: Impala Public Jenkins 
Reviewed-by: Tim Armstrong 
---
M bin/impala-config.sh
M fe/pom.xml
M testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test
3 files changed, 30 insertions(+), 17 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/16195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia4635feef146c945624135e0715495bb01ea4699
Gerrit-Change-Number: 16195
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] Bump up CDP_BUILD_NUMBER to 4493826

2020-07-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16195 )

Change subject: Bump up CDP_BUILD_NUMBER to 4493826
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4635feef146c945624135e0715495bb01ea4699
Gerrit-Change-Number: 16195
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 15 Jul 2020 19:14:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9956: inline hot functions in Sorter

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16202 )

Change subject: IMPALA-9956: inline hot functions in Sorter
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6137/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a8034ab6d2e3c71a2d2f2fcc3d6b788e9398194
Gerrit-Change-Number: 16202
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 15 Jul 2020 18:54:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9956: inline hot functions in Sorter

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16202 )

Change subject: IMPALA-9956: inline hot functions in Sorter
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a8034ab6d2e3c71a2d2f2fcc3d6b788e9398194
Gerrit-Change-Number: 16202
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 15 Jul 2020 18:54:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9956: inline hot functions in Sorter

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16202 )

Change subject: IMPALA-9956: inline hot functions in Sorter
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6608/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a8034ab6d2e3c71a2d2f2fcc3d6b788e9398194
Gerrit-Change-Number: 16202
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 15 Jul 2020 18:48:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9956: inline hot functions in Sorter

2020-07-15 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16202 )

Change subject: IMPALA-9956: inline hot functions in Sorter
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16202
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a8034ab6d2e3c71a2d2f2fcc3d6b788e9398194
Gerrit-Change-Number: 16202
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 15 Jul 2020 18:32:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9956: inline hot functions in Sorter

2020-07-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16202 )


Change subject: IMPALA-9956: inline hot functions in Sorter
..

IMPALA-9956: inline hot functions in Sorter

Add some compiler hints to force inlining of small
functions into the hot Partition() loop.

Performance:
A single node perf run on TPC-H showed no perf change.

A single node performance run with the queries that target
sort performance showed up to a 19% reduction in time spent
in the sort.

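+-------------------+-----------------------+---------+------------+------------+----------------+
| Workload          | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+-------------------+-----------------------+---------+------------+------------+----------------+
| TARGETED-PERF(30) | parquet / none / none | 5.52    | -5.82%     | 4.00       | -9.74%         |
+-------------------+-----------------------+---------+------------+------------+----------------+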

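+-------------------+-------------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| Workload          | Query                               | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-------------------+-------------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+
| TARGETED-PERF(30) | primitive_orderby_all               | parquet / none / none | 11.89  | 12.22       | -2.73%     | 1.07%     | 1.20%          | 10    | -2.88%         | -3.13   | -5.42  |
| TARGETED-PERF(30) | primitive_orderby_bigint_expression | parquet / none / none | 2.61   | 2.94        | I -11.27%  | 0.83%     | 1.14%          | 10    | I -12.56%      | -3.58   | -26.25 |
| TARGETED-PERF(30) | primitive_orderby_bigint            | parquet / none / none | 2.06   | 2.42        | I -14.80%  | 0.94%     | 0.68%          | 10    | I -17.43%      | -3.58   | -44.37 |
+-------------------+-------------------------------------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+--------+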

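(I) Improvement: TARGETED-PERF(30) primitive_orderby_bigint_expression [parquet / none / none] (2.94s -> 2.61s [-11.27%])
+---------------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+-------+-----------+
| Operator            | % of Query | Avg      | Base Avg | Delta(Avg) | StdDev(%) | Max      | Base Max | Delta(Max) | #Hosts | #Inst | #Rows | Est #Rows |
+---------------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+-------+-----------+
| 02:ANALYTIC         | 11.84%     | 332.95ms | 337.56ms | -1.37%     | 4.86%     | 360.86ms | 379.52ms | -4.92%     | 1      | 1     | 5.09M | 18.00M    |
| F00:EXCHANGE SENDER | 15.61%     | 439.03ms | 454.63ms | -3.43%     | 4.86%     | 478.29ms | 485.79ms | -1.55%     | 1      | 1     | -1    | -1        |
| 01:SORT             | 67.05%     | 1.89s    | 2.21s    | -14.88%    | 0.98%     | 1.92s    | 2.26s    | -15.07%    | 1      | 1     | 5.09M | 18.00M    |
+---------------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+-------+-----------+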

(I) Improvement: TARGETED-PERF(30) primitive_orderby_bigint [parquet / none / none] (2.42s -> 2.06s [-14.80%])
+---------------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+-------+-----------+
| Operator            | % of Query | Avg      | Base Avg | Delta(Avg) | StdDev(%) | Max      | Base Max | Delta(Max) | #Hosts | #Inst | #Rows | Est #Rows |
+---------------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+-------+-----------+
| 02:ANALYTIC         | 15.39%     | 367.90ms | 373.26ms | -1.44%     | 3.48%     | 390.03ms | 393.01ms | -0.76%     | 1      | 1     | 5.09M | 18.00M    |
| F00:EXCHANGE SENDER | 15.64%     | 373.88ms | 374.12ms | -0.07%     | 2.83%     | 389.96ms | 386.36ms | +0.93%     | 1      | 1     | -1    | -1        |
| 01:SORT             | 56.28%     | 1.35s    | 1.68s    | -20.10%    | 1.14%     | 1.38s    | 1.70s    | -18.92%    | 1      | 1     | 5.09M | 18.00M    |
| 00:SCAN HDFS        | 9.67%      | 231.18ms | 231.77ms | -0.25%     | 7.06%     | 247.79ms | 250.70ms | -1.16%     | 1      | 1     | 5.09M | 18.00M    |

[Impala-ASF-CR] IMPALA-1270: add distinct aggregation to semi joins

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16180 )

Change subject: IMPALA-1270: add distinct aggregation to semi joins
..

IMPALA-1270: add distinct aggregation to semi joins

When generating plans with left semi/anti joins (typically
resulting from subquery rewrites), the planner now
considers inserting a distinct aggregation on the inner
side of the join. The decision is based on whether that
aggregation would reduce the number of rows by more than
75%. This is fairly conservative and the optimization
might be beneficial for smaller reductions, but the
conservative threshold is chosen to reduce the number
of potential plan regressions.
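
The threshold rule described above can be sketched in a few lines of Python.
This is only an illustration of the stated 75% criterion, not the planner code
from this change: the function name should_add_distinct_agg, its parameters,
and the handling of missing statistics (non-positive or negative estimates
simply disable the optimization) are assumptions.

```python
# Taken from the commit message: the aggregation must remove more than 75%
# of the rows on the inner side of the join.
REDUCTION_THRESHOLD = 0.75


def should_add_distinct_agg(input_rows, distinct_rows):
    """Decide whether a distinct aggregation is worth inserting.

    input_rows: estimated rows on the inner side before aggregation.
    distinct_rows: estimated rows after the distinct aggregation.
    Assumption: unknown estimates (<= 0 input, < 0 output) mean "do not apply".
    """
    if input_rows <= 0 or distinct_rows < 0:
        return False
    reduction = 1.0 - distinct_rows / input_rows
    return reduction > REDUCTION_THRESHOLD
```

Under this sketch, going from 1000 rows to 100 (a 90% reduction) qualifies,
while going from 1000 rows to 250 (exactly 75%) does not, because the
reduction must exceed the threshold.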

The aggregation can both reduce the # of rows and the
width of the rows, by projecting out unneeded slots.

ENABLE_DISTINCT_SEMI_JOIN_OPTIMIZATION query option is
added to allow toggling the optimization.

Tests:
* Add positive and negative planner tests for various
  cases - including semi/anti joins, missing stats,
  broadcast/shuffle, different numbers of join predicates.
* Add some end-to-end tests to verify plans execute correctly.

Change-Id: Icbb955e805d9e764edf11c57b98f341b88a37fcc
Reviewed-on: http://gerrit.cloudera.org:8080/16180
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-loop-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
A testdata/workloads/functional-planner/queries/PlannerTest/semi-join-distinct.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M testdata/workloads/functional-query/queries/QueryTest/nested-types-runtime.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
25 files changed, 3,746 insertions(+), 467 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16180
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Icbb955e805d9e764edf11c57b98f341b88a37fcc
Gerrit-Change-Number: 16180
Gerrit-PatchSet: 13
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-1270: add distinct aggregation to semi joins

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16180 )

Change subject: IMPALA-1270: add distinct aggregation to semi joins
..


Patch Set 12: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16180
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbb955e805d9e764edf11c57b98f341b88a37fcc
Gerrit-Change-Number: 16180
Gerrit-PatchSet: 12
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 15 Jul 2020 17:10:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] Bump up CDP_BUILD_NUMBER to 4493826

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16195 )

Change subject: Bump up CDP_BUILD_NUMBER to 4493826
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4635feef146c945624135e0715495bb01ea4699
Gerrit-Change-Number: 16195
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 15 Jul 2020 15:16:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala

2020-07-15 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
..


Patch Set 6:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/16143/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16143/6//COMMIT_MSG@7
PS6, Line 7: Icebreg
It's still misspelled


http://gerrit.cloudera.org:8080/#/c/16143/6//COMMIT_MSG@7
PS6, Line 7: Supported query
nit: Support querying


http://gerrit.cloudera.org:8080/#/c/16143/6//COMMIT_MSG@26
PS6, Line 26: identity
specify


http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@512
PS6, Line 512: source_cols_map
nit: column_to_source_id ?


http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@515
PS6, Line 515: partition_col_to_source_id_map
The mapping is reversed. Name it "source_id_to_partition" ?


http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@516
PS6, Line 516: map file_descriptors
Please follow the above conventions for naming maps.


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
File fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java@28
PS6, Line 28:   // The id of the source field in iceberg table Schema, you can 
get these source
:   // fields by Schema.columns(), the return type is 
List.
It might be worth rewording it a bit:

"The id of the source column in the Iceberg table schema. The source column is 
used as the input for this partition field."


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java@88
PS6, Line 88: if (table_ instanceof FeIcebergTable) {
:   if (((FeIcebergTable) 
table_).getPartitionColToSourceIdMap().isEmpty()) {
: notPartitioned = true;
:   }
We should probably treat all Iceberg tables as not partitioned, since their 
partitioning is different from that of other file system tables.


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@66
PS6, Line 66: getFileDescMap
nit: getPartitionToFileDescMap


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@219
PS6, Line 219: isPartitionTable
nit: isPartitioned?


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@258
PS6, Line 258: PartitionColToSourceId
It returns a mapping from source ids to partition columns, therefore please 
name it "sourceIdToPartitionCol".


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@271
PS6, Line 271: getSourceColsMap
nit: getColumnToSourceIdMap?


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@305
PS6, Line 305:
nit: wrong indentation


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@114
PS6, Line 114: if (format == null) return null;
 : format = format.toUpperCase();
 : if (format.equals("PARQUET")) {
 :   return TIcebergFileFormat.PARQUET;
 : }
 : return null;
How about:

 if ("PARQUET".equalsIgnoreCase(format)) return TIcebergFileFormat.PARQUET;
 return null;


http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py@193
PS6, Line 193:   'iceberg': 'ICEBERG'
You probably don't need to modify this file. I think adding HUDIPARQUET to this 
file was also unnecessary.

Probably we can do the same thing that we did for Hudi, i.e. add the Iceberg 
tables under the functional_parquet database.

https://gerrit.cloudera.org/c/14711/25/testdata/datasets/functional/schema_constraints.csv

[Impala-ASF-CR] IMPALA-1270: add distinct aggregation to semi joins

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16180 )

Change subject: IMPALA-1270: add distinct aggregation to semi joins
..


Patch Set 11: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16180
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbb955e805d9e764edf11c57b98f341b88a37fcc
Gerrit-Change-Number: 16180
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 15 Jul 2020 12:16:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1270: add distinct aggregation to semi joins

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16180 )

Change subject: IMPALA-1270: add distinct aggregation to semi joins
..


Patch Set 12: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16180
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbb955e805d9e764edf11c57b98f341b88a37fcc
Gerrit-Change-Number: 16180
Gerrit-PatchSet: 12
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 15 Jul 2020 11:59:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1270: add distinct aggregation to semi joins

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16180 )

Change subject: IMPALA-1270: add distinct aggregation to semi joins
..


Patch Set 12:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6136/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16180
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbb955e805d9e764edf11c57b98f341b88a37fcc
Gerrit-Change-Number: 16180
Gerrit-PatchSet: 12
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 15 Jul 2020 12:00:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1270: add distinct aggregation to semi joins

2020-07-15 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16180 )

Change subject: IMPALA-1270: add distinct aggregation to semi joins
..


Patch Set 11: Code-Review+2

Great work, LGTM!


--
To view, visit http://gerrit.cloudera.org:8080/16180
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbb955e805d9e764edf11c57b98f341b88a37fcc
Gerrit-Change-Number: 16180
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 15 Jul 2020 11:59:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] Bump up CDP_BUILD_NUMBER to 4493826

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16195 )

Change subject: Bump up CDP_BUILD_NUMBER to 4493826
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6607/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4635feef146c945624135e0715495bb01ea4699
Gerrit-Change-Number: 16195
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 15 Jul 2020 10:29:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7001: Fix Privilege inconsistency between SHOW TABLES and SHOW FUNCTIONS

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16199 )

Change subject: IMPALA-7001: Fix Privilege inconsistency between SHOW TABLES 
and SHOW FUNCTIONS
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6606/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ae7546c206daaf98ecc3de449069027c43c6e1a
Gerrit-Change-Number: 16199
Gerrit-PatchSet: 2
Gerrit-Owner: Adam Tamas 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 15 Jul 2020 10:15:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] Bump up CDP_BUILD_NUMBER to 4493826

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16195 )

Change subject: Bump up CDP_BUILD_NUMBER to 4493826
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6135/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4635feef146c945624135e0715495bb01ea4699
Gerrit-Change-Number: 16195
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 15 Jul 2020 10:06:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] Bump up CDP_BUILD_NUMBER to 4493826

2020-07-15 Thread Zoltan Borok-Nagy (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16195

to look at the new patch set (#3).

Change subject: Bump up CDP_BUILD_NUMBER to 4493826
..

Bump up CDP_BUILD_NUMBER to 4493826

This change bumps up the CDP_BUILD_NUMBER to 4493826. This is needed
to fix a failing test.

Hive started to assign bucket ids to files differently. Because of
that I had to modify the test_full_acid_rowid test that had an
assumption about how bucket ids are assigned to files.

If you have problems restarting the Hive Metastore, try the following:

  buildall.sh  -upgrade_metastore_db

If you have problems restarting Kudu, try the following:

  Unset LD_LIBRARY_PATH in your shell, and stop setting it in
  impala-config-local.sh

Change-Id: Ia4635feef146c945624135e0715495bb01ea4699
---
M bin/impala-config.sh
M fe/pom.xml
M testdata/workloads/functional-query/queries/QueryTest/full-acid-rowid.test
3 files changed, 30 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/16195/3
--
To view, visit http://gerrit.cloudera.org:8080/16195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia4635feef146c945624135e0715495bb01ea4699
Gerrit-Change-Number: 16195
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7001: Fix Privilege inconsistency between SHOW TABLES and SHOW FUNCTIONS

2020-07-15 Thread Adam Tamas (Code Review)
Adam Tamas has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/16199 )

Change subject: IMPALA-7001: Fix Privilege inconsistency between SHOW TABLES 
and SHOW FUNCTIONS
..

IMPALA-7001: Fix Privilege inconsistency between SHOW TABLES and SHOW FUNCTIONS

"show tables" required the ANY privilege, whereas "show functions"
required VIEW_METADATA.
To resolve the inconsistency, "show functions" now requires ANY instead
of VIEW_METADATA, the same as "show tables".
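
The effect of the change can be illustrated with a small model. This is only a sketch: the privilege names in ALL_PRIVILEGES, the set accepted by VIEW_METADATA, and the helper functions are assumptions made for illustration and are not taken from the patch.

```python
# Hypothetical model of the privilege check; the sets below are assumptions.
ALL_PRIVILEGES = {"SELECT", "INSERT", "REFRESH", "ALTER", "CREATE", "DROP"}

# Each required privilege is satisfied by holding at least one of these.
ACCEPTED_BY = {
    "ANY": ALL_PRIVILEGES,
    "VIEW_METADATA": {"SELECT", "INSERT", "REFRESH"},
}


def required_privilege(statement, patched):
    """Return the privilege a statement requires before/after the change."""
    if statement == "show tables":
        return "ANY"
    if statement == "show functions":
        return "ANY" if patched else "VIEW_METADATA"
    raise ValueError("unknown statement: " + statement)


def is_authorized(granted, required):
    """True if the user holds at least one privilege accepted by `required`."""
    return bool(set(granted) & ACCEPTED_BY[required])
```

Under these assumptions, a user holding only ALTER could run "show tables" but not "show functions" before the change, and can run both after it.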

Testing:
-Ran CORE tests.
-Added new test to check the privilege.

Change-Id: I9ae7546c206daaf98ecc3de449069027c43c6e1a
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/ShowFunctionsStmt.java
M fe/src/test/java/org/apache/impala/analysis/AuditingTest.java
3 files changed, 15 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/16199/2
--
To view, visit http://gerrit.cloudera.org:8080/16199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9ae7546c206daaf98ecc3de449069027c43c6e1a
Gerrit-Change-Number: 16199
Gerrit-PatchSet: 2
Gerrit-Owner: Adam Tamas 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7001: Fix Privilege inconsistency between SHOW TABLES and SHOW FUNCTIONS

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16199 )

Change subject: IMPALA-7001: Fix Privilege inconsistency between SHOW TABLES 
and SHOW FUNCTIONS
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6605/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ae7546c206daaf98ecc3de449069027c43c6e1a
Gerrit-Change-Number: 16199
Gerrit-PatchSet: 1
Gerrit-Owner: Adam Tamas 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 15 Jul 2020 09:15:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7001: Fix Privilege inconsistency between SHOW TABLES and SHOW FUNCTIONS

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16199 )

Change subject: IMPALA-7001: Fix Privilege inconsistency between SHOW TABLES 
and SHOW FUNCTIONS
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16199/1/fe/src/test/java/org/apache/impala/analysis/AuditingTest.java
File fe/src/test/java/org/apache/impala/analysis/AuditingTest.java:

http://gerrit.cloudera.org:8080/#/c/16199/1/fe/src/test/java/org/apache/impala/analysis/AuditingTest.java@372
PS1, Line 372:   Set accessEvents = 
AnalyzeAccessEvents(String.format("show %s in functional", qual));
line too long (109 > 90)


http://gerrit.cloudera.org:8080/#/c/16199/1/fe/src/test/java/org/apache/impala/analysis/AuditingTest.java@374
PS1, Line 374:   Sets.newHashSet(new TAccessEvent("functional", 
TCatalogObjectType.DATABASE, "ANY")));
line too long (103 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ae7546c206daaf98ecc3de449069027c43c6e1a
Gerrit-Change-Number: 16199
Gerrit-PatchSet: 1
Gerrit-Owner: Adam Tamas 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 15 Jul 2020 08:47:16 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7001: Fix Privilege inconsistency between SHOW TABLES and SHOW FUNCTIONS

2020-07-15 Thread Adam Tamas (Code Review)
Adam Tamas has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16199


Change subject: IMPALA-7001: Fix Privilege inconsistency between SHOW TABLES 
and SHOW FUNCTIONS
..

IMPALA-7001: Fix Privilege inconsistency between SHOW TABLES and SHOW FUNCTIONS

"show tables" required the ANY privilege, whereas "show functions"
required VIEW_METADATA.
To resolve the inconsistency, "show functions" now requires ANY instead
of VIEW_METADATA, the same as "show tables".

Testing:
-Ran CORE tests.
-Added new test to check the privilege.

Change-Id: I9ae7546c206daaf98ecc3de449069027c43c6e1a
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/ShowFunctionsStmt.java
M fe/src/test/java/org/apache/impala/analysis/AuditingTest.java
3 files changed, 13 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/16199/1
--
To view, visit http://gerrit.cloudera.org:8080/16199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9ae7546c206daaf98ecc3de449069027c43c6e1a
Gerrit-Change-Number: 16199
Gerrit-PatchSet: 1
Gerrit-Owner: Adam Tamas 


[Impala-ASF-CR] IMPALA-9741: Supported query Iceberg table by impala

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Iceberg table by impala
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6604/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 6
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 15 Jul 2020 08:32:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9741: Supported query Iceberg table by impala

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Iceberg table by impala
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py@766
PS6, Line 766: n
flake8: E501 line too long (94 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/16143/6/tests/common/test_dimensions.py
File tests/common/test_dimensions.py:

http://gerrit.cloudera.org:8080/#/c/16143/6/tests/common/test_dimensions.py@32
PS6, Line 32: c
flake8: E501 line too long (98 > 90 characters)



--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 6
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 15 Jul 2020 08:05:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9741: Supported query Iceberg table by impala

2020-07-15 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Iceberg table by impala
..

IMPALA-9741: Supported query Iceberg table by impala

This patch adds support for querying Iceberg tables through Impala.
The following SQL creates an external Iceberg table:
CREATE EXTERNAL TABLE default.iceberg_test (
level string,
event_time timestamp,
message string
)
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
Alternatively, specify only the table name and location:
CREATE EXTERNAL TABLE default.iceberg_test
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
'iceberg_file_format' is the file format used in Iceberg. Currently only
PARQUET is supported; other formats may be supported in the future. If
this property is not specified in the SQL, the file format defaults to
PARQUET.

This is implemented by treating the Iceberg table as a normal
unpartitioned HDFS table. When querying an Iceberg table, partition
column predicates are pushed down to Iceberg to decide which data files
need to be scanned, and this information is then passed to the BE to
perform the actual scan.
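
The file-selection step described above might look roughly like the following. This is a simplified sketch, not the patch's implementation: the DataFile class, the prune_files helper, and the sample paths are hypothetical, and the real pruning is delegated to the Iceberg library rather than done by hand.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Hypothetical stand-in for a data file tracked by table metadata."""
    path: str
    partition: dict = field(default_factory=dict)  # partition column -> value


def prune_files(files, predicates):
    """Keep files whose partition values satisfy every predicate.

    `predicates` maps a partition column name to a callable(value) -> bool.
    A file that has no value for a column cannot be ruled out, so it is kept.
    """
    return [
        f for f in files
        if all(pred(f.partition[col])
               for col, pred in predicates.items()
               if col in f.partition)
    ]


files = [
    DataFile("data/a.parquet", {"level": "ERROR"}),
    DataFile("data/b.parquet", {"level": "INFO"}),
    DataFile("data/c.parquet", {}),
]
to_scan = prune_files(files, {"level": lambda v: v == "ERROR"})
```

Only the surviving file paths would then need to be handed to the backend scanner.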

Testing:
- Unit test for Iceberg in FileMetadataLoaderTest
- Create table tests in functional_schema_template.sql
- Iceberg table query test in custom cluster test test_iceberg.py

Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
---
M be/src/runtime/descriptors.cc
M bin/rat_exclude_files.txt
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M testdata/bin/generate-schema-statements.py
M testdata/data/README
A 
testdata/data/iceberg_test/iceberg_non_partitioned/data/1-100-e1a80ed6-1064-494d-9cdd-c4a30c1ab8dc-0.parquet
A 
testdata/data/iceberg_test/iceberg_non_partitioned/data/3-102-511427f2-85f0-43ae-9b39-a456f8dc57b6-0.parquet
A 
testdata/data/iceberg_test/iceberg_non_partitioned/data/4-103-00fc55e1-6ef7-4241-ace2-6d075b9737fc-0.parquet
A 
testdata/data/iceberg_test/iceberg_non_partitioned/data/6-105-ef9e76d5-c060-4040-8aa1-b7c275610daa-0.parquet
A 
testdata/data/iceberg_test/iceberg_non_partitioned/data/7-106-c09c9c8d-9478-44f9-8501-f85f53112bc3-0.parquet
A 
testdata/data/iceberg_test/iceberg_non_partitioned/data/9-108-3b4f06ac-dca3-4f4e-be60-bf42d9927b5b-0.parquet
A 
testdata/data/iceberg_test/iceberg_non_partitioned/data/00011-110-1e653ccf-0963-4fb0-941c-32c9de13268b-0.parquet
A 
testdata/data/iceberg_test/iceberg_non_partitioned/data/00012-111-dfa70658-eb4b-4fa0-9ffa-b892cf90d6ac-0.parquet
A 
testdata/data/iceberg_test/iceberg_non_partitioned/data/00014-113-2d16e751-e2a4-4856-ab89-145996e3815e-0.parquet
A 
testdata/data/iceberg_test/iceberg_non_partitioned/data/00015-114-0f710621-cbbf-4509-a93d-b58808978e2e-0.parquet
A 
testdata/data/iceberg_test/iceberg_non_partitioned/data/00017-116-0b666c79-53df-4507-906c-542e65a83443-0.parquet
A 
testdata/data/iceberg_test/iceberg_non_partitioned/data/00019-118-1bc6bc6e-e061-4da3-9d1e-a427a306c471-0.parquet
A 
testdata/data/iceberg_test/iceberg_non_partitioned/data/00020-119-ae7b2c67-1538-4429-8246-4998960e3817-0.parquet
A 
testdata/data/iceberg_test/iceberg_non_partitioned/data/00022-121-8db0f1e1-d88c-4aad-a8b3-24fd07329cdb-0.parquet
A 
testdata/data/iceberg_test/iceberg_non_partitioned/data/00023-122-de57b6b0-f54b-40ac-85cd-e783505094b6-0.parquet
A 

[Impala-ASF-CR] IMPALA-1270: add distinct aggregation to semi joins

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16180 )

Change subject: IMPALA-1270: add distinct aggregation to semi joins
..


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6134/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/16180
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbb955e805d9e764edf11c57b98f341b88a37fcc
Gerrit-Change-Number: 16180
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 15 Jul 2020 07:07:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1270: add distinct aggregation to semi joins

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16180 )

Change subject: IMPALA-1270: add distinct aggregation to semi joins
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6603/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16180
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbb955e805d9e764edf11c57b98f341b88a37fcc
Gerrit-Change-Number: 16180
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 15 Jul 2020 06:59:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP: IMPALA-9889: Fixed flaky test_runtime_filters on Kudu table

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16191 )

Change subject: WIP: IMPALA-9889: Fixed flaky test_runtime_filters on Kudu table
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6602/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16191
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I94a08e272f0870c04c96563fa614e3416fb5379b
Gerrit-Change-Number: 16191
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 15 Jul 2020 06:47:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6788: Abort ExecFInstance() RPC loop early after query failure

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16192 )

Change subject: IMPALA-6788: Abort ExecFInstance() RPC loop early after query 
failure
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6601/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I034788f7720fc97c25c54f006ff72dce6cb199c3
Gerrit-Change-Number: 16192
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Wed, 15 Jul 2020 06:47:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1270: add distinct aggregation to semi joins

2020-07-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16180 )

Change subject: IMPALA-1270: add distinct aggregation to semi joins
..


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16180/10/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16180/10/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1877
PS10, Line 1877: List distinctExprs = new ArrayList<>();
> Should this be Set instead of List ?  For example,  if there's a correlatio
Yeah, Expr.getIds() deduplicates the slot ids (I checked that when I was 
writing the code, but didn't add any breadcrumbs). It isn't actually documented 
on the method, so added a comment.


http://gerrit.cloudera.org:8080/#/c/16180/10/testdata/workloads/functional-planner/queries/PlannerTest/semi-join-distinct.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/semi-join-distinct.test:

http://gerrit.cloudera.org:8080/#/c/16180/10/testdata/workloads/functional-planner/queries/PlannerTest/semi-join-distinct.test@826
PS10, Line 826: |  |  group by: count(*)
> It is strange to see an aggregate expr in the group-by since it is not vali
There is some weirdness with how expressions are shown in the explain after 
substitution. It also happens with the transpose agg with the CASE statements 
appearing in places in the plan where they're not actually evaluated.



--
To view, visit http://gerrit.cloudera.org:8080/16180
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbb955e805d9e764edf11c57b98f341b88a37fcc
Gerrit-Change-Number: 16180
Gerrit-PatchSet: 10
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 15 Jul 2020 06:36:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1270: add distinct aggregation to semi joins

2020-07-15 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16180 )

Change subject: IMPALA-1270: add distinct aggregation to semi joins
..


Patch Set 11: Code-Review+1

carry +1


--
To view, visit http://gerrit.cloudera.org:8080/16180
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbb955e805d9e764edf11c57b98f341b88a37fcc
Gerrit-Change-Number: 16180
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 15 Jul 2020 06:36:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1270: add distinct aggregation to semi joins

2020-07-15 Thread Tim Armstrong (Code Review)
Hello Aman Sinha, Shant Hovsepian, David Rorke, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16180

to look at the new patch set (#11).

Change subject: IMPALA-1270: add distinct aggregation to semi joins
..

IMPALA-1270: add distinct aggregation to semi joins

When generating plans with left semi/anti joins (typically
resulting from subquery rewrites), the planner now
considers inserting a distinct aggregation on the inner
side of the join. The decision is based on whether that
aggregation would reduce the number of rows by more than
75%. This is fairly conservative and the optimization
might be beneficial for smaller reductions, but the
conservative threshold is chosen to reduce the number
of potential plan regressions.

The aggregation can both reduce the # of rows and the
width of the rows, by projecting out unneeded slots.

ENABLE_DISTINCT_SEMI_JOIN_OPTIMIZATION query option is
added to allow toggling the optimization.
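
The decision rule described above can be sketched as follows. This is an illustration only: the function name, its parameters, and the treatment of missing statistics (non-positive cardinalities are rejected) are assumptions, not the planner's actual code.

```python
# Threshold from the description: rows must shrink by more than 75%.
REDUCTION_THRESHOLD = 0.75


def should_add_distinct_agg(input_rows, distinct_rows, enabled=True):
    """Decide whether to insert a distinct aggregation on the inner side.

    input_rows:    estimated rows produced by the inner side of the join
    distinct_rows: estimated rows remaining after the distinct aggregation
    enabled:       models the ENABLE_DISTINCT_SEMI_JOIN_OPTIMIZATION option
    """
    if not enabled:
        return False
    # Assumption: without usable estimates, do not apply the optimization.
    if input_rows <= 0 or distinct_rows < 0:
        return False
    reduction = 1.0 - distinct_rows / input_rows
    return reduction > REDUCTION_THRESHOLD
```

For example, an estimated reduction from 1000 rows to 200 (80%) would qualify, while a reduction to 250 (exactly 75%) would not, since the threshold is strict.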

Tests:
* Add positive and negative planner tests for various
  cases - including semi/anti joins, missing stats,
  broadcast/shuffle, different numbers of join predicates.
* Add some end-to-end tests to verify plans execute correctly.

Change-Id: Icbb955e805d9e764edf11c57b98f341b88a37fcc
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/join-order.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-loop-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/semi-join-distinct.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite-hdfs-num-rows-est-enabled.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
M 
testdata/workloads/functional-query/queries/QueryTest/nested-types-runtime.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
25 files changed, 3,746 insertions(+), 467 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/16180/11
--
To view, visit http://gerrit.cloudera.org:8080/16180
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icbb955e805d9e764edf11c57b98f341b88a37fcc
Gerrit-Change-Number: 16180
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] WIP: IMPALA-9889: Fixed flaky test_runtime_filters on Kudu table

2020-07-15 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16191


Change subject: WIP: IMPALA-9889: Fixed flaky test_runtime_filters on Kudu table
..

WIP: IMPALA-9889: Fixed flaky test_runtime_filters on Kudu table

Test cases in test_runtime_filters failed occasionally in ASAN
builds because runtime filters did not arrive at scan nodes in time.
Query profiles showed that codegen took 2 to 4 minutes for one
fragment when this issue happened. This caused hash join nodes
to wait a long time before generating and publishing runtime filters,
which delayed their arrival at scan nodes. To avoid the delay, turn on
ASYNC_CODEGEN for test_runtime_filters against Kudu tables when
the test runs on an ASAN build.
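
The conditional setup could be sketched like this. The helper name and the build-type and table-format strings are hypothetical and chosen for illustration; only the ASYNC_CODEGEN option name comes from the description above, and the test framework's real hooks are not shown.

```python
def extra_exec_options(build_type, table_format):
    """Return extra query options for a test run.

    Assumption: build_type and table_format are plain strings such as
    "asan"/"debug" and "kudu"/"parquet"; the real test code may detect
    these differently.
    """
    options = {}
    if build_type == "asan" and table_format == "kudu":
        # Let the query start before codegen finishes so that slow codegen
        # does not delay runtime filter publication.
        options["async_codegen"] = 1
    return options
```

With this shape, all other build and table-format combinations keep their default options.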

Testing:
 - Passed core tests on a regular debug build.

TODO: pass core tests on an ASAN build.
Some unrelated issues currently cause many failures
for the ASAN build on Jenkins. The daily ASAN builds have
the same issue.

Change-Id: I94a08e272f0870c04c96563fa614e3416fb5379b
---
M tests/query_test/test_runtime_filters.py
1 file changed, 19 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/16191/2
--
To view, visit http://gerrit.cloudera.org:8080/16191
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I94a08e272f0870c04c96563fa614e3416fb5379b
Gerrit-Change-Number: 16191
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-6788: Abort ExecFInstance() RPC loop early after query failure

2020-07-15 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16192


Change subject: IMPALA-6788: Abort ExecFInstance() RPC loop early after query 
failure
..

IMPALA-6788: Abort ExecFInstance() RPC loop early after query failure

Stops issuing ExecQueryFInstance RPCs and cancels any in-flight RPCs
when a backend reports failure.
Adds a new debug action, CONSTRUCT_QUERY_STATE_REPORT, that runs when
constructing a query state report.
Adds a new test case for handling errors reported from query state.
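
The early-abort behavior can be sketched as a loop that checks for a reported failure before each RPC. This is a simplified, sequential model: the function and its callback parameters are hypothetical, and the real coordinator issues RPCs asynchronously in C++.

```python
def issue_exec_rpcs(backends, send_rpc, cancel_rpc, failure_reported):
    """Issue one RPC per backend, aborting early if a failure is reported.

    send_rpc(backend):   starts the RPC for a backend
    cancel_rpc(backend): cancels an RPC that was already issued
    failure_reported():  returns True once any backend has reported failure
    Returns True if the loop was aborted, False if all RPCs were issued.
    """
    inflight = []
    for backend in backends:
        if failure_reported():
            # Stop issuing new RPCs and cancel the ones already in flight.
            for issued in inflight:
                cancel_rpc(issued)
            return True
        send_rpc(backend)
        inflight.append(backend)
    return False


sent, cancelled = [], []
aborted = issue_exec_rpcs(
    ["a", "b", "c", "d"],
    sent.append,
    cancelled.append,
    lambda: len(sent) >= 2,  # simulate a failure report after two RPCs
)
```

In this run, only the first two backends receive an RPC, and both of those RPCs are cancelled.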

Testing:
 - Ran the following command for the new test case and verified that the
   code works as expected:
 ./bin/impala-py.test \
   tests/custom_cluster/test_rpc_exception.py::TestRPCException::test_state_report_error \
   --workload_exploration_strategy=functional-query:exhaustive
 - Passed core tests.

Change-Id: I034788f7720fc97c25c54f006ff72dce6cb199c3
---
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.cc
M tests/custom_cluster/test_rpc_exception.py
3 files changed, 38 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/16192/2
--
To view, visit http://gerrit.cloudera.org:8080/16192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I034788f7720fc97c25c54f006ff72dce6cb199c3
Gerrit-Change-Number: 16192
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Thomas Tauber-Marshall