[Impala-ASF-CR](asf-site) Add slack channel in the community page

2021-04-12 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17311 )

Change subject: Add slack channel in the community page
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17311
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: comment
Gerrit-Change-Id: I26112aff63cabf6f4ca437f12d71928e80106687
Gerrit-Change-Number: 17311
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 13 Apr 2021 05:12:06 +
Gerrit-HasComments: No


[Impala-ASF-CR](asf-site) Add slack channel in the community page

2021-04-12 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17311 )

Change subject: Add slack channel in the community page
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17311/1/community.html
File community.html:

http://gerrit.cloudera.org:8080/#/c/17311/1/community.html@155
PS1, Line 155: sign up
> Do join links expire after a certain amount of time or a certain number of
I remember those things too. But this time when I copy the link in "Invite 
people to Apache Impala", it notifies me "never expires". So I think it's fine 
(including user limits).

BTW, Presto uses such kind of a link as well in their doc: 
https://prestodb.io/community.html



--
To view, visit http://gerrit.cloudera.org:8080/17311
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: comment
Gerrit-Change-Id: I26112aff63cabf6f4ca437f12d71928e80106687
Gerrit-Change-Number: 17311
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 13 Apr 2021 03:48:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) Add slack channel in the community page

2021-04-12 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17311 )

Change subject: Add slack channel in the community page
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17311/1/community.html
File community.html:

http://gerrit.cloudera.org:8080/#/c/17311/1/community.html@155
PS1, Line 155: sign up
Do join links expire after a certain amount of time or a certain number of 
users? I have a recollection of running into that.



--
To view, visit http://gerrit.cloudera.org:8080/17311
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: comment
Gerrit-Change-Id: I26112aff63cabf6f4ca437f12d71928e80106687
Gerrit-Change-Number: 17311
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 13 Apr 2021 02:51:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) Add slack channel in the community page

2021-04-12 Thread Quanlong Huang (Code Review)
Quanlong Huang has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17311


Change subject: Add slack channel in the community page
..

Add slack channel in the community page

This patch adds links to join the apache-impala slack channel.

Change-Id: I26112aff63cabf6f4ca437f12d71928e80106687
---
M community.html
1 file changed, 1 insertion(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/17311/1
--
To view, visit http://gerrit.cloudera.org:8080/17311
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: newchange
Gerrit-Change-Id: I26112aff63cabf6f4ca437f12d71928e80106687
Gerrit-Change-Number: 17311
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

2021-04-12 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17289 )

Change subject: IMPALA-10532: 
TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17289/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17289/2//COMMIT_MSG@10
PS2, Line 10: due to the query plan change (from 3-node scan to
: 2-node scan) which splits the row groups among scan nodes 
differently.
> One more piece of details on instance 2. The overlap ratio of 0 indicates t
Ah, I see! Thanks for the detailed explanation! It's really helpful.


http://gerrit.cloudera.org:8080/#/c/17289/2/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/17289/2/tests/query_test/test_runtime_filters.py@270
PS2, Line 270: @SkipIfNotHdfsMinicluster.scheduling
IIUC, the issue is due to the underlying parquet files have different stats 
between EC and non-EC runs. It's not about scheduling. So I suggest we add a 
new skip type in SkipIfEC and use that instead, e.g. SkipIfEC.different_stats, 
and adding this in SkipIfEC (tests/common/skip.py)

  different_stats = pytest.mark.skipif(IS_EC, reason="Parquet/ORC files that 
are generated"
  " in tests have different stats in EC mode due to larger block group 
size.")



--
To view, visit http://gerrit.cloudera.org:8080/17289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f
Gerrit-Change-Number: 17289
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 13 Apr 2021 02:15:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.

2021-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis 
which modify tables and partitions.
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8557/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 3
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 12 Apr 2021 22:29:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.

2021-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17298 )

Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis 
which modify tables and partitions.
..


Patch Set 3:

(50 comments)

http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
File 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@348
PS3, Line 348: LOG.info("Invalidate catalogd cache for DDLs on non 
transactional tables is set to {}",
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@658
PS3, Line 658:   removeNonTransactionalTableIfExists(dbname, tblname, 
"drop_table_with_environment_context");
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@843
PS3, Line 843:   
invalidateNonTransactionalTableIfExists(newTable.getDbName(), 
newTable.getTableName(),
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@890
PS3, Line 890:   Partition addedPartition = 
client.getHiveClient().getThriftClient().add_partition(partition);
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@891
PS3, Line 891:   
invalidateNonTransactionalTableIfExists(partition.getDbName(), 
partition.getTableName(),
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@904
PS3, Line 904:   
invalidateNonTransactionalTableIfExists(partition.getDbName(), 
partition.getTableName(),
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@914
PS3, Line 914:   int numPartitionsAdded = 
client.getHiveClient().getThriftClient().add_partitions(partitionList);
line too long (102 > 90)


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@916
PS3, Line 916: 
invalidateNonTransactionalTableIfExists(partition.getDbName(), 
partition.getTableName(),
line too long (96 > 90)


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@930
PS3, Line 930: 
invalidateNonTransactionalTableIfExists(partitionSpec.getDbName(), 
partitionSpec.getTableName(),
line too long (104 > 90)


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@955
PS3, Line 955:   
invalidateNonTransactionalTableIfExists(addPartitionsRequest.getDbName(), 
addPartitionsRequest.getTblName(),
line too long (114 > 90)


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@968
PS3, Line 968:   .append_partition_with_environment_context(dbname, 
tblname, partVals, environmentContext);
line too long (104 > 90)


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@994
PS3, Line 994:   
.append_partition_by_name_with_environment_context(dbname, tblname, partName,
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@1021
PS3, Line 1021:   .drop_partition_with_environment_context(dbname, 
tblname, partNames, deleteData,
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@1034
PS3, Line 1034:   boolean partitionsDropped =  
client.getHiveClient().getThriftClient().drop_partition_by_name(dbname,
line too long (106 > 90)


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@1049
PS3, Line 1049:   
.drop_partition_by_name_with_environment_context(dbName, tableName, partName,
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17298/3/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@1064
PS3, Line 1064:   
invalidateNonTransactionalTableIfExists(dropPartitionsRequest.getDbName(), 
dropPartitionsRequest.getTblName(),
line too long (116 > 90)



[Impala-ASF-CR] IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify tables and partitions.

2021-04-12 Thread Sourabh Goyal (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17298

to look at the new patch set (#3).

Change subject: IMPALA-10648: Invalidate catalogd table cache for hms ddl apis 
which modify tables and partitions.
..

IMPALA-10648: Invalidate catalogd table cache for hms ddl apis which modify 
tables and partitions.

For non transactional tables, invalidate the table from cache if HMS DDL apis 
are accessed
from catalogd's metastore server. Any subsequent get table request fetches the 
table from
HMS and loads it in cache. This ensures that any get_table/get_partition 
requests after DDL
operations on the same table return updated table. This behavior has a 
performance penalty
(since table loading in cache takes time) but ensures consistency. This change 
is behind catalogd
server's flag: invalidate_hms_cache_on_ddls which is enabled by default. The 
flag needs to be
turned off if this change becomes a performance bottleneck.

Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
---
M be/src/catalog/catalog-server.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M 
fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M tests/custom_cluster/test_metastore_service.py
6 files changed, 420 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/98/17298/3
--
To view, visit http://gerrit.cloudera.org:8080/17298
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idb9cc22ebfb51948433e4d57f4705ce201acaf98
Gerrit-Change-Number: 17298
Gerrit-PatchSet: 3
Gerrit-Owner: Sourabh Goyal 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 18:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8556/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 12 Apr 2021 21:12:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10619: Minor refactoring of analytic function methods

2021-04-12 Thread Aman Sinha (Code Review)
Aman Sinha has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17237 )

Change subject: IMPALA-10619: Minor refactoring of analytic function methods
..

IMPALA-10619: Minor refactoring of analytic function methods

The FIRST_VALUE, LAST_VALUE functions go through standardization
process in AnalyticExpr where they may be rewritten with different
number of parameters or with different window frame. In order for an
external FE to leverage this standardization, this patch creates a
wrapper method for FunctionCallExpr creation and does minor refactoring.

Also added accessor methods to AnalyticEvalNode and changed visibility
of couple of methods in PlanNode for use by external FE.

Testing: Ran PlannerTests. No new tests are added since this does not
change the existing behavior.

Change-Id: I39e4268c0c5500f09acf98357a80763c28f615c2
Reviewed-on: http://gerrit.cloudera.org:8080/17237
Reviewed-by: Impala Public Jenkins 
Tested-by: Aman Sinha 
Reviewed-by: Aman Sinha 
---
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
3 files changed, 18 insertions(+), 7 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved
  Aman Sinha: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I39e4268c0c5500f09acf98357a80763c28f615c2
Gerrit-Change-Number: 17237
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10619: Minor refactoring of analytic function methods

2021-04-12 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17237 )

Change subject: IMPALA-10619: Minor refactoring of analytic function methods
..


Patch Set 4: Code-Review+2

Carry forward +2


--
To view, visit http://gerrit.cloudera.org:8080/17237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39e4268c0c5500f09acf98357a80763c28f615c2
Gerrit-Change-Number: 17237
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 12 Apr 2021 21:06:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-12 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 18: Code-Review+1

(1 comment)

Looks good to me. Just have one more nit.
Please carry my +1 after fix.

http://gerrit.cloudera.org:8080/#/c/17252/18//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17252/18//COMMIT_MSG@12
PS18, Line 12: always_false_flipped_
nit: always_false_flipped_to_false_



--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 12 Apr 2021 21:00:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-12 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#18). ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..

IMPALA-10647 Improve always-true min/max filter handling in coordinator

The change improves how a coordinator behaves when a just
arriving min/max filter is always true. A new member
'always_true_filter_received_' is introduced to record such a
fact. Similarily, the new member always_false_flipped_ is
added to indicate that the always false flag is flipped from
'true' to 'false'. These two members only influence how the min
and max columns in "Filter routing table" and "Final filter
table" in profile are displayed as follows.

  1. 'PartialUpdates' - The min and the max are partially updated;
  2. 'AlwaysTrue' - One received filter is AlwaysTrue;
  3. 'AlwaysFalse'- No filter is received or all received
filters are empty;
  4. 'Real values'- The final accumulated min/max from all
received filters.

A second change introduced is to record, in scan node, the
arrival time of min/max filters (as a timestamp since the system
is rebooted, obtained by calling MonotonicMillis()). A timestamp
of similar nature is recorded for hdfs parquet scanners when a
row group is processed. By comparing these two timestamps, one
can easily diagnose issues related to late arrival of min/max
filters.

Testing:
  1. Ran unit tests;
  2. Ran core tests.

Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M 
testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
8 files changed, 223 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/17252/18
--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

2021-04-12 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17289 )

Change subject: IMPALA-10532: 
TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17289/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17289/2//COMMIT_MSG@10
PS2, Line 10: due to the query plan change (from 3-node scan to
: 2-node scan) which splits the row groups among scan nodes 
differently.
> Here are some details if it is not too much :-) The bottom line is that onl
One more piece of details on instance 2. The overlap ratio of 0 indicates that 
the entire row group is filtered out (which is very good). We do not drill down 
to figure out how many pages there are in the group, for the sake of reporting.



--
To view, visit http://gerrit.cloudera.org:8080/17289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f
Gerrit-Change-Number: 17289
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 12 Apr 2021 20:49:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-12 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..


Patch Set 3:

It looks like the solution of this problem is much more simple than I thought: 
just don't advance the read page.

The initial reason why we are advancing read page in the first place is 
explained in IMPALA-8890 (review is in https://gerrit.cloudera.org/c/14144/): a 
DCHECK hit in ExpectedPinCount() because we tried to unpin a fully exhausted 
and attached read_page_. IMPALA-8890 choose to advance the read iterator to 
avoid this DCHECK.

Patch set 3 in this review choose to NOT advance the read page, but instead 
skip the first page (that is the read page) when iterating pages_ to call 
UnpinPageIfNeeded.


--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 12 Apr 2021 20:18:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17195 )

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8555/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 12 Apr 2021 20:12:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10584: Defer advancing read page when reservation is tight.

2021-04-12 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Joe McDonnell, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17195

to look at the new patch set (#3).

Change subject: IMPALA-10584: Defer advancing read page when reservation is 
tight.
..

IMPALA-10584: Defer advancing read page when reservation is tight.

TestScratchLimit::test_with_unlimited_scratch_limit has been
intermittently crashing in ubuntu-16.04-dockerised-tests environment
after result spooling is enabled by default in IMPALA-9856. DCHECK
violation occurs in ReservationTracker::CheckConsistency() due to
BufferedTupleStream wrongly tries to reclaim memory reservation while
unpinning the stream.

For this bug to surface, all of the following needs to happen:
- Stream is in pinned mode.
- There are only 2 pages in the stream: 1 read and 1 write.
- Stream can not increase reservation anymore either due to memory
  pressure or low buffer/memory limit.
- The stream read page has been fully read and is attached to output
  RowBatch. But the output RowBatch has not cleaned up yet.
- BufferedTupleStream::UnpinStream is invoked.

The memory accounting bug happens because UnpinStream proceeds to
NextReadPage where the read page buffer was mistakenly assumed as
released. default_page_len_ bytes were added into
write_page_reservation_ and subsequently violates the total memory
reservation.

This patch fixes the bug by deferring advancement of the read iterator
in UnpinStream if the read page is attached to output RowBatch and there
is no more unused reservation. This is OK because after UnpinStream
finished, the stream is now in unpinned mode and has_read_write_page is
false. The next AddRow operation is then allowed to unpin the previous
write page first before reusing the reservation to allocate a new write
page.

Testing:
- Loop the TestScratchLimit::test_with_unlimited_scratch_limit in my
  local dev machine and verify that each test passed without triggering
  the DCHECK violation.
- Reenable result spooling in TestScratchLimit that was disabled in
  IMPALA-10559.
- Pass core tests.

Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
---
M be/src/runtime/buffered-tuple-stream.cc
M tests/query_test/test_scratch_limit.py
2 files changed, 24 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/17195/3
--
To view, visit http://gerrit.cloudera.org:8080/17195
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I16137b6e423f190f60c3115a06ccd0f77e9f585a
Gerrit-Change-Number: 17195
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-10350 Fix precision loss while converting DecimalValue to double.

2021-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17303 )

Change subject: IMPALA-10350 Fix precision loss while converting DecimalValue 
to double.
..


Patch Set 4:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/8554/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1
Gerrit-Change-Number: 17303
Gerrit-PatchSet: 4
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 12 Apr 2021 19:18:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10350 Fix precision loss while converting DecimalValue to double.

2021-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17303 )

Change subject: IMPALA-10350 Fix precision loss while converting DecimalValue 
to double.
..


Patch Set 4:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h
File be/src/thirdparty/fast_double_parser/fast_double_parser.h:

http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@13
PS4, Line 13: #if (defined(sun) || defined(__sun))
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@17
PS4, Line 17: #if defined(__CYGWIN__) || defined(__MINGW32__) || 
defined(__MINGW64__)
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@22
PS4, Line 22:  * Determining whether we should import xlocale.h or not is
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@25
PS4, Line 25: #if defined(FAST_DOUBLE_PARSER_SOLARIS) || 
defined(FAST_DOUBLE_PARSER_CYGWIN)
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@67
PS4, Line 67: #endif //  defined(FAST_DOUBLE_PARSER_SOLARIS) || 
defined(FAST_DOUBLE_PARSER_CYGWIN)
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@84
PS4, Line 84:  * However, we have that
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@86
PS4, Line 86:  * Thus it is possible for a number of the form w * 10^-342 where
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@94
PS4, Line 94:  * Any number of form w * 10^309 where w>= 1 is going to be
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@153
PS4, Line 153: // credit: 
https://stackoverflow.com/questions/28868367/getting-the-high-part-of-64-bit-integer-multiplication
line too long (110 > 90)


http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@154
PS4, Line 154: really_inline uint64_t Emulate64x64to128(uint64_t& r_hi, const 
uint64_t x, const uint64_t y) {
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@159
PS4, Line 159: 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@224
PS4, Line 224:  */
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@958
PS4, Line 958:
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@960
PS4, Line 960:   // The exponent is 1024 + 63 + power
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@976
PS4, Line 976:   // The 65536 is (1<<16) and corresponds to
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@979
PS4, Line 979:   // ((152170 * power ) >> 16) is equal to
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@980
PS4, Line 980:   // floor(log(5**power)/log(2))
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@982
PS4, Line 982:   // Note that this is not magic: 152170/(1<<16) is
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@984
PS4, Line 984:   // The 1<<16 value is a power of two; we could use a
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17303/4/be/src/thirdparty/fast_double_parser/fast_double_parser.h@1097
PS4, Line 1097: #if defined(FAST_DOUBLE_PARSER_SOLARIS) || 
defined(FAST_DOUBLE_PARSER_CYGWIN)
line has trailing whitespace



--
To view, visit http://gerrit.cloudera.org:8080/17303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1
Gerrit-Change-Number: 17303
Gerrit-PatchSet: 4
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 12 Apr 2021 18:59:23 +

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-12 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 17:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG@29
PS15, Line 29: row group is processed. By comparing these two timestamps, one
 : can easily diagnose issues related to late arrival of min/max
 : filters.
> Consider adding these three tests for Final table as its content is sending
The test added in Patch set 17 looks good to me, thanks!


http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/runtime/coordinator-filter-state.h
File be/src/runtime/coordinator-filter-state.h:

http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/runtime/coordinator-filter-state.h@178
PS17, Line 178: True value means the always false flag in aggregated filter is 
flipped.
To further clarify the doc, please mention that a True value means the filter 
was flipped from True to False by coordinator.


http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc@659
PS15, Line 659:
> If AlwaysTrueFilterReceived() is true, then the accumulated filter is logic
New logic looks good to me. I'll continue my comments on patch set 17.


http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/runtime/coordinator.cc@668
PS17, Line 668: state.AlwaysFalseFlipped()
The method name sounds ambiguous. Was it flipped from true to false, or false 
to true. Seems like the former. The method return true IF filter was an 
AlwaysFalse before being disabled (flipped to an AlwaysTrue) by coordinator.

Maybe "WasAlwaysFalse" is more descriptive?


http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/util/min-max-filter.h@358
PS17, Line 358:
  : std::string DebugString(const MinMaxFilterPB& filter);
  : bool AlwaysTrue(const MinMaxFilterPB& filter);
  : bool AlwaysFalse(const MinMaxFilterPB& filter);
  : std::string DebugString(const ColumnValuePB& value);
Any reason not making these methods as static methods under MinMaxFilter class?



--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 12 Apr 2021 18:59:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10350 Fix precision loss while converting DecimalValue to double.

2021-04-12 Thread Amogh Margoor (Code Review)
Amogh Margoor has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/17303 )

Change subject: IMPALA-10350 Fix precision loss while converting DecimalValue 
to double.
..

IMPALA-10350 Fix precision loss while converting DecimalValue to double.

Original approach to convert DecimalValue(internal representation of decimals) 
was not accurate.
It was:
   static_cast(value_) / pow(10.0, scale).
However only integers from −2^53 to 2^53 can be represented accurately by
double precision without any loss.
Hence, it would not work for numbers like -0.43149576573887316.
For DecimalValue representing -0.43149576573887316, value_ would be
-43149576573887316 and scale would be 17. As value_ < -2^53, result would
not be accurate. In newer approach we are using third party library
https://github.com/lemire/fast_double_parser, which handles above scenario
in a performant manner.

Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1
---
M be/src/runtime/decimal-value.inline.h
A be/src/thirdparty/fast_double_parser/LICENSE
A be/src/thirdparty/fast_double_parser/LICENSE.BSL
A be/src/thirdparty/fast_double_parser/README.md
A be/src/thirdparty/fast_double_parser/fast_double_parser.h
M bin/rat_exclude_files.txt
M testdata/workloads/functional-query/queries/QueryTest/values.test
7 files changed, 1,551 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/17303/4
--
To view, visit http://gerrit.cloudera.org:8080/17303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1
Gerrit-Change-Number: 17303
Gerrit-PatchSet: 4
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] [WIP] IMPALA-10650: Bailout min/max filters in hash join builder early

2021-04-12 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17295 )

Change subject: [WIP] IMPALA-10650: Bailout min/max filters in hash join 
builder early
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17295/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17295/5//COMMIT_MSG@10
PS5, Line 10: periodically measuring
> What's the average sampling rate for typical load?
It is per every batch of rows (1024) for now. The performance test is pending.



--
To view, visit http://gerrit.cloudera.org:8080/17295
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I193646e7acfdd3023f7c947d8107da58a1f41183
Gerrit-Change-Number: 17295
Gerrit-PatchSet: 5
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 12 Apr 2021 17:53:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10619: Minor refactoring of analytic function methods

2021-04-12 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17237 )

Change subject: IMPALA-10619: Minor refactoring of analytic function methods
..


Patch Set 4: Verified+1

> Patch Set 4: Verified-1
>
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7063/

2 HBASE test failures are flaky tests as mentioned in IMPALA-1995.


--
To view, visit http://gerrit.cloudera.org:8080/17237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39e4268c0c5500f09acf98357a80763c28f615c2
Gerrit-Change-Number: 17237
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 12 Apr 2021 17:51:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10619: Minor refactoring of analytic function methods

2021-04-12 Thread Aman Sinha (Code Review)
Aman Sinha has removed a vote on this change.

Change subject: IMPALA-10619: Minor refactoring of analytic function methods
..


Removed Verified-1 by Impala Public Jenkins 
--
To view, visit http://gerrit.cloudera.org:8080/17237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I39e4268c0c5500f09acf98357a80763c28f615c2
Gerrit-Change-Number: 17237
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 17:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8553/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 12 Apr 2021 17:48:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

2021-04-12 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17289 )

Change subject: IMPALA-10532: 
TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17289/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17289/2//COMMIT_MSG@10
PS2, Line 10: due to the query plan change (from 3-node scan to
: 2-node scan) which splits the row groups among scan nodes 
differently.
> It's not clear to me why no pages are filtered by the min-max filter. Could
Here are some details if it is not too much :-) The bottom line is that only 2 
nodes are involved and the split is such that in instance 1, the overlap ratio 
(0.749748) is higher than the threshold (0.5) which implies no min/max 
filtering at the page and row level.

In instance 2, the overlap ration is 0 which implies no min/max filtering.

Combined, no page level filtering is done and thus the total pages filtered is 
0.

The details is purposely left out in the commit message.

Executor instance 1:

1732 E RowGroup Debug: Try to filter out a rowgroup via overlap 
predicate: MonotonicMillis=13987739, fid=1, SchemaNode=optional int64 
l_orderkey  [i:0 d:1 r:0], columnType=BIGINT, overlap ratio=0.749748, 
threshold=0.5, worthiness=0, enabled for page=0, enabled for row=0, data min=1, 
data max=3209607, content=BigIntMinMaxFilter(min=224167, max=2630562, 
always_false=false), always_true=false)

1778 E  - NumRuntimeFilteredPages: 0 (0)

1779 E  - NumRuntimeFilteredRowGroups: 0 (0)


Executor instance 2

1919 E RowGroup Debug: Try to filter out a rowgroup via overlap 
predicate: MonotonicMillis=13987773, fid=1, SchemaNode=optional int64 
l_orderkey  [i:0 d:1 r:0], columnType=BIGINT, overlap ratio=0, 
threshold=0.5, worthiness=1, enabled for page=1, enabled for row=0, data 
min=3209632, data max=600, content=BigIntMinMaxFilter(min=224167, 
max=2630562, always_false=false), always_true=false)

1965 E  - NumRuntimeFilteredPages: 0 (0) 
1966 E  - NumRuntimeFilteredRowGroups: 1 (1)



--
To view, visit http://gerrit.cloudera.org:8080/17289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f
Gerrit-Change-Number: 17289
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 12 Apr 2021 17:45:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-12 Thread Qifan Chen (Code Review)
Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..


Patch Set 17:

Three new tests are added in Test/overlap_min_max_filters.test.


--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 12 Apr 2021 17:29:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

2021-04-12 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#17). ( 
http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in 
coordinator
..

IMPALA-10647 Improve always-true min/max filter handling in coordinator

The change improves how a coordinator behaves when a just
arriving min/max filter is always true. A new member
'always_true_filter_received_' is introduced to record such a
fact.  Similarily, the new member always_false_flipped_ is
added to indicate that the always false flag is flipped from
true to false. These two members only influence how the min
and max columns in "Filter routing table" and "Final filter
table" in profile are displayed as follows.

  1. 'PartialUpdates' - The min and the max are partially updated;
  2. 'AlwaysTrue' - One received filter is AlwaysTrue;
  3. 'AlwaysFalse'- No filter is received or all received
filters are empty;
  4. 'Real values'- The final accumulated min/max from all
received filters.

A second change introduced is to record, in scan node, the
arrival time of min/max filters (as a timestamp since the system
is rebooted obtained by calling MonotonicMillis()). A timestamp
of similar nature is recorded for hdfs parquet scanners when a
row group is processed. By comparing these two timestamps, one
can easily diagnose issues related to late arrival of min/max
filters.

Testing:
  1. Ran unit tests;
  2. Ran core tests.

Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M 
testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
8 files changed, 214 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/17252/17
--
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-9732: Improve exceptions of unsupported HdfsTableSink formats

2021-04-12 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17300 )

Change subject: IMPALA-9732: Improve exceptions of unsupported HdfsTableSink 
formats
..


Patch Set 1: Code-Review+2

(3 comments)

Nice improvement!

http://gerrit.cloudera.org:8080/#/c/17300/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17300/1//COMMIT_MSG@13
PS1, Line 13: eaerlier
earlier


http://gerrit.cloudera.org:8080/#/c/17300/1//COMMIT_MSG@15
PS1, Line 15: Iceberg
Actually Iceberg tables are related because they might use Parquet or ORC 
underneath. FeIcebergTable inherits from FeFsTable and it overrides 
'getFileFormats()' so your patch works well on iceberg tables.


http://gerrit.cloudera.org:8080/#/c/17300/1//COMMIT_MSG@16
PS1, Line 16: realted
related



--
To view, visit http://gerrit.cloudera.org:8080/17300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7fa2f949336a422acb4d01c9347b9b2e808e4aec
Gerrit-Change-Number: 17300
Gerrit-PatchSet: 1
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 12 Apr 2021 16:44:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option

2021-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8552/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 8
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 12 Apr 2021 16:37:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option

2021-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8551/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 7
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 12 Apr 2021 16:33:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option

2021-04-12 Thread fifteencai (Code Review)
fifteencai has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option
..

IMPALA-10445: Adjust NDV's precision/scale by query option

This is a new way to control NDV's scale.

Since IMPALA-2658, we can trade additional memory for more accurate
estimation by setting larger `scale`. The scale is decided by SQL
writers. However, it is a bumpy road to smoothly upgrade ndv by setting
scales larger than the default value of 2. Here lies 2 reasons:

- Firstly, SQL writers are reluctant to lower expectations, they prone
to write ndv(id, 10) other than ndv(id, 9), ndv(id, 8) and so on. But
larger scales like 10 will use much more memories, especially when
there are `group by`s with high cardinality. So it is wiser to let
cluster admin to choose appropriate scale instead.

- Secondly, The queries are stored in a BI tool's configuration DB.
Rewriting thousands of SQLs is a risky job.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`.
During to its runtime essence, We can either set it before query
submission according to cluster's overall load, or set it by placing
a default query option for dynamic resource pool.

We also modified the `Analyzer` to make sure APPX_COUNT_DISTINCT
can work with this query option. So if needed, we can degrade service
by transforming `count (distinct id)` to `ndv(id, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change
the default ndv behavior.
- We port `CountDistinctToNdv` transform logic from
`SelectStmt.analyze()` to `ExprRewriter`, making it compatible with
further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;  -- The range is [1, 10]
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality=40,090,650):
++
|   Metric| Count Distinct |NDV2|NDV5|NDV10  |
++
|  Memory(GB) |   3.83 |1.84|1.85| 1.89  |
| Duration(s) |  182.89|   30.22|29.72   | 29.24 |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06% |
++

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 229 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/8
--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 8
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option

2021-04-12 Thread fifteencai (Code Review)
fifteencai has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option
..

IMPALA-10445: Adjust NDV's precision/scale by query option

We introduce a new way to control NDV's scale.

Since IMPALA-2658, we can trade additional memory for more accurate
estimation by setting larger `scale`. The scale is decided by SQL
writers. However, it is a bumpy road to smoothly upgrade ndv by setting
scales larger than the default value of 2. Here lies 2 reasons:
-Firstly, SQL writers are reluctant to lower expectations, they prone
to write ndv(id, 10) other than ndv(id, 9), ndv(id, 8) and so on. But
larger scales like 10 will use much more memories, especially when
there are `group by`s with high cardinality. So it is wiser to let
cluster admin to choose appropriate scale instead.
Secondly, The queries are stored in a BI tool's configuration DB.
Rewriting thousands of SQLs is a risky job.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`.
During to its runtime essence, We can either set it before query
submission according to cluster's overall load, or set it by placing
a default query option for dynamic resource pool.

We also modified the `Analyzer` to make sure APPX_COUNT_DISTINCT
can work with this query option. So if needed, we can degrade service
by transforming `count (distinct id)` to `ndv(id, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change
the default ndv behavior.
- We port `CountDistinctToNdv` transform logic from
`SelectStmt.analyze()` to `ExprRewriter`, making it compatible with
further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;  -- The range is [1, 10]
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality=40,090,650):
++
|   Metric| Count Distinct |NDV2|NDV5|NDV10  |
++
|  Memory(GB) |   3.83 |1.84|1.85| 1.89  |
| Duration(s) |  182.89|   30.22|29.72   | 29.24 |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06% |
++

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 229 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/7
--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 7
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option

2021-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option
..


Patch Set 3:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/8550/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 3
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 12 Apr 2021 15:58:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10640: Support reading Parquet Bloom filters - most common types

2021-04-12 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17026 )

Change subject: IMPALA-10640: Support reading Parquet Bloom filters - most 
common types
..


Patch Set 25: Code-Review+1

(5 comments)

Thanks Daniel, the change LGTM. Tamas said he's planning to take another look 
so I'm only giving +1 for now.

http://gerrit.cloudera.org:8080/#/c/17026/24//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17026/24//COMMIT_MSG@7
PS24, Line 7: IMPALA-10640: Support reading Parquet Bloom filters - most common 
types
> I think it can only apply to top level columns as I check whether the type
I see. Is there a plan to extend it to non top level fields? Btw, does 
Parquet-MR generate bloom filters for them as well?


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc
File be/src/exec/parquet/hdfs-parquet-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@887
PS24, Line 887:
> I've added the query option and put this code and the call to CreateColIdx2
Yeah, adding a test is never a bad idea and it shouldn't be hard to add such a 
test in a .test file.


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@1483
PS24, Line 1483:
> I've considered that but ParquetPageIndex::ReadAll()
Oh, I see. Thanks for taking a look.


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/hdfs-parquet-scanner.cc@1716
PS24, Line 1716:   
bloom_filter.InitFromDirectoryNoCopy(data_buffer.buffer(), data_buffer.Size()));
> I included the row group idx, column idx and filename in the log message. I
I think it's OK as it is. Based on the column idx and query text I think we 
should be able to identify the conjunct.


http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/parquet-bloom-filter-util.h
File be/src/exec/parquet/parquet-bloom-filter-util.h:

http://gerrit.cloudera.org:8080/#/c/17026/24/be/src/exec/parquet/parquet-bloom-filter-util.h@37
PS24, Line 37: .
> I do not want to guarantee in which cases '*storage' will be used as the im
I see, thanks for extending the comment.



--
To view, visit http://gerrit.cloudera.org:8080/17026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7119c7161fa3658e561fc1265430cb90079d8287
Gerrit-Change-Number: 17026
Gerrit-PatchSet: 25
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 12 Apr 2021 15:55:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10350 Fix precision loss while converting DecimalValue to double.

2021-04-12 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17303 )

Change subject: IMPALA-10350 Fix precision loss while converting DecimalValue 
to double.
..


Patch Set 3:

(7 comments)

The change looks great overall!

http://gerrit.cloudera.org:8080/#/c/17303/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17303/3//COMMIT_MSG@7
PS3, Line 7: IMPALA-10350
Please add : after IMPALA-10350


http://gerrit.cloudera.org:8080/#/c/17303/3//COMMIT_MSG@9
PS3, Line 9: imals) was not accurate.
In commit message bodies we use lines with 72 chars width.


http://gerrit.cloudera.org:8080/#/c/17303/3//COMMIT_MSG@20
PS3, Line 20:
Please add section about testing.


http://gerrit.cloudera.org:8080/#/c/17303/3/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

http://gerrit.cloudera.org:8080/#/c/17303/3/be/src/runtime/decimal-value.inline.h@727
PS3, Line 727:   // Original approach was to use:
Thanks for the detailed comments!


http://gerrit.cloudera.org:8080/#/c/17303/3/be/src/runtime/decimal-value.inline.h@745
PS3, Line 745: int64_t> (value_
Type of value_ can be __int128_t. If that's the case and the value is greater 
than the max value of int64_t, then I think we also want to fallback to the 
original algorithm.


http://gerrit.cloudera.org:8080/#/c/17303/3/be/src/runtime/decimal-value.inline.h@747
PS3, Line 747:   // Fallback to original approach. This is not always accurate 
as described above.
 :   // Other alternative would be to convert value_ to string and 
parse it into
 :   // double using std:strtod. However std::strtod is atleast 4X 
slower
 :   // than this approach 
(https://github.com/lemire/fast_double_parser#sample-results)
 :   // and that is excluding cost to convert value_ to string.
 :   if (!success) {
 : result = static_cast(value_) / pow(10.0, scale);
 :   }
I'm happy with the current patch as it is a big step forward, but IMPALA-10350 
is about correctness and I think we still want to track that problem in Jira.

How about creating a sub-task for IMPALA-10350 and commit this patch with that 
Jira ID? We'll close the sub-task once this patch is committed, but leave 
IMPALA-10350 open so we'll be aware of the limitations. Hopefully we'll find a 
proper algorithm in the future. (If this patch was using strtod then we'd still 
need another Jira to track perf improvements possibilities in the future).


http://gerrit.cloudera.org:8080/#/c/17303/3/testdata/workloads/functional-query/queries/QueryTest/values.test
File testdata/workloads/functional-query/queries/QueryTest/values.test:

http://gerrit.cloudera.org:8080/#/c/17303/3/testdata/workloads/functional-query/queries/QueryTest/values.test@106
PS3, Line 106: # insert one negative value < -2^53 * 10^(-17)
Please add test where the value needs to be stored in __int128_t.



--
To view, visit http://gerrit.cloudera.org:8080/17303
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56f0652cb8f81a491b87d9b108a94c00ae6c99a1
Gerrit-Change-Number: 17303
Gerrit-PatchSet: 3
Gerrit-Owner: Amogh Margoor 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 12 Apr 2021 15:50:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option

2021-04-12 Thread fifteencai (Code Review)
fifteencai has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option
..

IMPALA-10445: Adjust NDV's precision/scale by query option

In this commit, we introduced a new Query Option to control NDV's default 
HyperLogLog scale.

Since IMPALA-2658, we can trade additional memory for more accurate estimation 
by setting larger `scale`.
The scale is decided by SQL writers. However, it is a bumpy road to smoothly 
upgrade ndv by setting
scales larger than the default value of 2. Here lies 2 reasons, firstly, SQL 
writers are reluctant to
lower expectations, they prone to write ndv(id, 10) other than ndv(id, 9), 
ndv(id, 8) and so on. But
larger scales like 10 will use much more memories, especially when there are 
`group by`s with high
cardinality. So it is wiser to let cluster admin to choose appropriate scale 
instead. Secondly, The
queries are stored in a BI tool's configuration DB. Rewriting thousands of SQLs 
is a risky job.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`. During to 
its runtime essence, We
can either set it before query submission according to cluster's overall load, 
or set it by placing a
default query option for dynamic resource pool.

We also modified the `Analyzer` to make sure APPX_COUNT_DISTINCT works with 
this query option. So if
needed, we can degrade the service by transforming `count (distinct id)` to 
`ndv(id, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change the default 
ndv behavior.
- We port `CountDistinctToNdv` transform logic from `SelectStmt.analyze()` to 
`ExprRewriter`, making
it compatible with further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied after 
`CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;  -- The range is [1, 10]
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality = 40,090,650):
+=+
|   Metric| Count Distinct |NDV2|NDV5|NDV10   |
+-+
|  Memory(GB) |   3.83 |1.84|1.85| 1.89   |
| Duration(s) |  182.89|   30.22|29.72   | 29.24  |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06%  |
+=+

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 196 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/3
--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 3
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types

2021-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17262 )

Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - 
most common types
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8549/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792
Gerrit-Change-Number: 17262
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 12 Apr 2021 15:32:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types

2021-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17262 )

Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - 
most common types
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/hdfs-parquet-table-writer.cc@495
PS3, Line 495:   // Write dictionary keys to Parquet Bloom filter if we 
haven't been filling it so
line too long (91 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792
Gerrit-Change-Number: 17262
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 12 Apr 2021 15:12:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types

2021-04-12 Thread Daniel Becker (Code Review)
Daniel Becker has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17262


Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - 
most common types
..

WIP - IMPALA-10642: Write support for Parquet Bloom filters - most common types

This change adds support for writing Parquet Bloom filters for the types
for which read support was added in IMPALA-10640.

Writing of Parquet Bloom filters can be controlled by the
'parquet_bloom_filter_write' query option which has the following
possible values:
  NEVER  - never write Parquet Bloom filters
  TBL_PROPS  - write Parquet Bloom filters as set in table properties
  IF_NO_DICT - write Parquet Bloom filters if the row group is not
   fully dictionary encoded
  ALWAYS - always write Parquet Bloom filters, even if the row
   group is fully dictionary encoded

TODO: Implement table properties involving Parquet Bloom filters.

TODO: Decide size of Parquet Bloom filter based on NDV heuristics or
configuration.

Testing:
  TODO

Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
M be/src/exec/parquet/parquet-bloom-filter-util.cc
M be/src/exec/parquet/parquet-bloom-filter-util.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/dict-encoding.h
M be/src/util/parquet-bloom-filter.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
12 files changed, 313 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/17262/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792
Gerrit-Change-Number: 17262
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option

2021-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option
..


Patch Set 2:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/8548/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 2
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 12 Apr 2021 14:57:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option

2021-04-12 Thread fifteencai (Code Review)
fifteencai has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option
..

IMPALA-10445: Adjust NDV's precision/scale by query option

In this commit, we introduced a new Query Option to control NDV's default 
HyperLogLog scale.

Since IMPALA-2658, we can trade additional memory for more accurate estimation 
by setting larger `scale`.
The scale is decided by SQL writers. However, it is a bumpy road to smoothly 
upgrade ndv by setting
scales larger than the default value of 2. Here lies 2 reasons, firstly, SQL 
writers are reluctant to
lower expectations, they prone to write ndv(id, 10) other than ndv(id, 9), 
ndv(id, 8) and so on. But
larger scales like 10 will use much more memories, especially when there are 
`group by`s with high
cardinality. So it is wiser to let cluster admin to choose appropriate scale 
instead. Secondly, The
queries are stored in a BI tool's configuration DB. Rewriting thousands of SQLs 
is a risky job.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`. During to 
its runtime essence, We
can either set it before query submission according to cluster's overall load, 
or set it by placing a
default query option for dynamic resource pool.

We also modified the `Analyzer` to make sure APPX_COUNT_DISTINCT works with 
this query option. So if
needed, we can degrade the service by transforming `count (distinct id)` to 
`ndv(id, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change the default 
ndv behavior.
- We port `CountDistinctToNdv` transform logic from `SelectStmt.analyze()` to 
`ExprRewriter`, making
it compatible with further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied after 
`CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;  -- The range is [1, 10]
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality = 40,090,650):
+=+
|   Metric| Count Distinct |NDV2|NDV5|NDV10   |
+-+
|  Memory(GB) |   3.83 |1.84|1.85| 1.89   |
| Duration(s) |  182.89|   30.22|29.72   | 29.24  |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06%  |
+=+

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 196 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/2
--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 2
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR](asf-site) Switch to .asf.yaml for site publishing

2021-04-12 Thread Jim Apple (Code Review)
Jim Apple has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17301 )

Change subject: Switch to .asf.yaml for site publishing
..

Switch to .asf.yaml for site publishing

Our existing method of web publication will be going offline.
.asf.yaml is the replacement.

See also

https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-Publishingabranchtoyourprojectwebsite

(not visible to all users)
https://lists.apache.org/thread.html/r8d023c0f5afefca7f6ce4e26d02404762bd6234fbe328011e1564249%40%3Cusers.infra.apache.org%3E

https://infra-reports.apache.org/site-source/

Change-Id: I9154ab397674d4c98e20f7c4f3774dfc19ff2932
Reviewed-on: http://gerrit.cloudera.org:8080/17301
Reviewed-by: Jim Apple 
Tested-by: Jim Apple 
---
A .asf.yaml
1 file changed, 2 insertions(+), 0 deletions(-)

Approvals:
  Jim Apple: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17301
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: merged
Gerrit-Change-Id: I9154ab397674d4c98e20f7c4f3774dfc19ff2932
Gerrit-Change-Number: 17301
Gerrit-PatchSet: 3
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Shajini Thayasingh 


[Impala-ASF-CR](asf-site) Switch to .asf.yaml for site publishing

2021-04-12 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17301 )

Change subject: Switch to .asf.yaml for site publishing
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17301
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: comment
Gerrit-Change-Id: I9154ab397674d4c98e20f7c4f3774dfc19ff2932
Gerrit-Change-Number: 17301
Gerrit-PatchSet: 2
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Shajini Thayasingh 
Gerrit-Comment-Date: Mon, 12 Apr 2021 14:22:31 +
Gerrit-HasComments: No


[Impala-ASF-CR](asf-site) Switch to .asf.yaml for site publishing

2021-04-12 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17301 )

Change subject: Switch to .asf.yaml for site publishing
..


Patch Set 2: Code-Review+2

Carry +2 (only change is in formatting of commit description


--
To view, visit http://gerrit.cloudera.org:8080/17301
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: comment
Gerrit-Change-Id: I9154ab397674d4c98e20f7c4f3774dfc19ff2932
Gerrit-Change-Number: 17301
Gerrit-PatchSet: 2
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Shajini Thayasingh 
Gerrit-Comment-Date: Mon, 12 Apr 2021 14:20:42 +
Gerrit-HasComments: No


[Impala-ASF-CR](asf-site) Switch to .asf.yaml for site publishing

2021-04-12 Thread Jim Apple (Code Review)
Jim Apple has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17301


Change subject: Switch to .asf.yaml for site publishing
..

Switch to .asf.yaml for site publishing

Our existing method of web publication will be going offline. .asf.yaml is the 
replacement.

See also

https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-Publishingabranchtoyourprojectwebsite

(not visible to all users)
https://lists.apache.org/thread.html/r8d023c0f5afefca7f6ce4e26d02404762bd6234fbe328011e1564249%40%3Cusers.infra.apache.org%3E

https://infra-reports.apache.org/site-source/

Change-Id: I9154ab397674d4c98e20f7c4f3774dfc19ff2932
---
A .asf.yaml
1 file changed, 2 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/17301/1
--
To view, visit http://gerrit.cloudera.org:8080/17301
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9154ab397674d4c98e20f7c4f3774dfc19ff2932
Gerrit-Change-Number: 17301
Gerrit-PatchSet: 1
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Shajini Thayasingh 


[Impala-ASF-CR](asf-site) Switch to .asf.yaml for site publishing

2021-04-12 Thread Jim Apple (Code Review)
Hello Quanlong Huang, Shajini Thayasingh, Sahil Takiar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17301

to look at the new patch set (#2).

Change subject: Switch to .asf.yaml for site publishing
..

Switch to .asf.yaml for site publishing

Our existing method of web publication will be going offline.
.asf.yaml is the replacement.

See also

https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-Publishingabranchtoyourprojectwebsite

(not visible to all users)
https://lists.apache.org/thread.html/r8d023c0f5afefca7f6ce4e26d02404762bd6234fbe328011e1564249%40%3Cusers.infra.apache.org%3E

https://infra-reports.apache.org/site-source/

Change-Id: I9154ab397674d4c98e20f7c4f3774dfc19ff2932
---
A .asf.yaml
1 file changed, 2 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/17301/2
--
To view, visit http://gerrit.cloudera.org:8080/17301
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9154ab397674d4c98e20f7c4f3774dfc19ff2932
Gerrit-Change-Number: 17301
Gerrit-PatchSet: 2
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Shajini Thayasingh 


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option

2021-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option
..


Patch Set 1:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/8547/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 1
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 12 Apr 2021 13:47:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option

2021-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option
..


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
File fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java:

http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java@13
PS1, Line 13: public static 
org.apache.impala.rewrite.CountDistinctToNdvRule INSTANCE = new 
org.apache.impala.rewrite.CountDistinctToNdvRule();
line too long (133 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java@25
PS1, Line 25: if 
(!analyzer.getQueryCtx().client_request.query_options.appx_count_distinct) 
return expr;
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java@31
PS1, Line 31: || !oldFunctionCallExpr.isDistinct() || 
oldFunctionCallExpr.getParams().exprs().size() != 1 ) {
line too long (111 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java@36
PS1, Line 36: FunctionCallExpr ndvFunc = new FunctionCallExpr("ndv", 
oldFunctionCallExpr.getParams().exprs());
line too long (104 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
File fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java:

http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java@28
PS1, Line 28:  * changes were made, the transformed Expr is guaranteed to 
be a different Expr object,
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java:

http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@521
PS1, Line 521: assertToSql(createAnalysisCtx(options), countDistinctSql, 
countDistinctSql, "SELECT ndv(id) FROM functional.alltypes");
line too long (123 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@525
PS1, Line 525: assertToSql(createAnalysisCtx(options), countDistinctSql, 
countDistinctSql, "SELECT ndv(id, 10) FROM functional.alltypes");
line too long (127 > 90)


http://gerrit.cloudera.org:8080/#/c/17306/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java@535
PS1, Line 535: assertToSql(createAnalysisCtx(options), ndvSql, ndvSql, 
"SELECT ndv(id, 9) FROM functional.alltypes");
line too long (106 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 1
Gerrit-Owner: fifteencai 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 12 Apr 2021 13:29:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10445: Adjust NDV's precision/scale by query option

2021-04-12 Thread fifteencai (Code Review)
fifteencai has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17306


Change subject: IMPALA-10445: Adjust NDV's precision/scale by query option
..

IMPALA-10445: Adjust NDV's precision/scale by query option

In this commit, we introduced a new Query Option to control NDV's default 
HyperLogLog scale.

Since IMPALA-2658, we can trade additional memory for more accurate estimation 
by setting larger `scale`.
The scale is decided by SQL writers. However, it is a bumpy road to smoothly 
upgrade ndv by setting
scales larger than the default value of 2. Here lies 2 reasons, firstly, SQL 
writers are reluctant to
lower expectations, they prone to write ndv(id, 10) other than ndv(id, 9), 
ndv(id, 8) and so on. But
larger scales like 10 will use much more memories, especially when there are 
`group by`s with high
cardinality. So it is wiser to let cluster admin to choose appropriate scale 
instead. Secondly, The
queries are stored in a BI tool's configuration DB. Rewriting thousands of SQLs 
is a risky job.

In this commit, we introduced a new Query Option `DEFAULT_NDV_SCALE`. During to 
its runtime essence, We
can either set it before query submission according to cluster's overall load, 
or set it by placing a
default query option for dynamic resource pool.

We also modified the `Analyzer` to make sure APPX_COUNT_DISTINCT works with 
this query option. So if
needed, we can degrade the service by transforming `count (distinct id)` to 
`ndv(id, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so we won't change the default 
ndv behavior.
- We port `CountDistinctToNdv` transform logic from `SelectStmt.analyze()` to 
`ExprRewriter`, making
it compatible with further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied after 
`CountDistinctToNdvRule`.

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;  -- The range is [1, 10]
```

To unset:
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality = 40,090,650):
+=+
|   Metric| Count Distinct |NDV2|NDV5|NDV10   |
+-+
|  Memory(GB) |   3.83 |1.84|1.85| 1.89   |
| Duration(s) |  182.89|   30.22|29.72   | 29.24  |
|  ErrorRate  |0%  |1.8%|1.17%   | 0.06%  |
+=+

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit test, passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 190 insertions(+), 34 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/1
--
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 1
Gerrit-Owner: fifteencai 


[Impala-ASF-CR] IMPALA-10619: Minor refactoring of analytic function methods

2021-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17237 )

Change subject: IMPALA-10619: Minor refactoring of analytic function methods
..


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7063/


--
To view, visit http://gerrit.cloudera.org:8080/17237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39e4268c0c5500f09acf98357a80763c28f615c2
Gerrit-Change-Number: 17237
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 12 Apr 2021 13:12:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0

2021-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17294 )

Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/8546/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/17294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9
Gerrit-Change-Number: 17294
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 12 Apr 2021 10:37:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10631: Upgrade DataSketches to version 3.0.0

2021-04-12 Thread Fucun Chu (Code Review)
Fucun Chu has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/17294 )

Change subject: IMPALA-10631: Upgrade DataSketches to version 3.0.0
..

IMPALA-10631: Upgrade DataSketches to version 3.0.0

Upgrade the external DataSketches files CPC/HLL/KLL/Theta to version
3.0.0

tests:
 -Ran the tests from tests/query_test/test_datasketches.py

Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9
---
M be/src/exprs/datasketches-test.cc
M be/src/thirdparty/datasketches/AuxHashMap-internal.hpp
M be/src/thirdparty/datasketches/AuxHashMap.hpp
M be/src/thirdparty/datasketches/CompositeInterpolationXTable.hpp
M be/src/thirdparty/datasketches/CouponHashSet-internal.hpp
M be/src/thirdparty/datasketches/CouponHashSet.hpp
M be/src/thirdparty/datasketches/CouponList-internal.hpp
M be/src/thirdparty/datasketches/CouponList.hpp
M be/src/thirdparty/datasketches/CubicInterpolation.hpp
M be/src/thirdparty/datasketches/HarmonicNumbers.hpp
M be/src/thirdparty/datasketches/Hll4Array-internal.hpp
M be/src/thirdparty/datasketches/Hll4Array.hpp
M be/src/thirdparty/datasketches/Hll6Array-internal.hpp
M be/src/thirdparty/datasketches/Hll6Array.hpp
M be/src/thirdparty/datasketches/Hll8Array-internal.hpp
M be/src/thirdparty/datasketches/Hll8Array.hpp
M be/src/thirdparty/datasketches/HllArray-internal.hpp
M be/src/thirdparty/datasketches/HllArray.hpp
M be/src/thirdparty/datasketches/HllSketch-internal.hpp
M be/src/thirdparty/datasketches/HllSketchImpl.hpp
M be/src/thirdparty/datasketches/HllSketchImplFactory.hpp
M be/src/thirdparty/datasketches/HllUnion-internal.hpp
M be/src/thirdparty/datasketches/HllUtil.hpp
M be/src/thirdparty/datasketches/MurmurHash3.h
M be/src/thirdparty/datasketches/README.md
M be/src/thirdparty/datasketches/RelativeErrorTables.hpp
A be/src/thirdparty/datasketches/bounds_on_ratios_in_sampled_sets.hpp
A be/src/thirdparty/datasketches/bounds_on_ratios_in_theta_sketched_sets.hpp
M be/src/thirdparty/datasketches/cpc_common.hpp
M be/src/thirdparty/datasketches/cpc_compressor.hpp
M be/src/thirdparty/datasketches/cpc_compressor_impl.hpp
M be/src/thirdparty/datasketches/cpc_sketch.hpp
M be/src/thirdparty/datasketches/cpc_sketch_impl.hpp
M be/src/thirdparty/datasketches/cpc_union.hpp
M be/src/thirdparty/datasketches/cpc_union_impl.hpp
M be/src/thirdparty/datasketches/cpc_util.hpp
M be/src/thirdparty/datasketches/hll.hpp
M be/src/thirdparty/datasketches/icon_estimator.hpp
M be/src/thirdparty/datasketches/kll_quantile_calculator.hpp
M be/src/thirdparty/datasketches/kll_quantile_calculator_impl.hpp
M be/src/thirdparty/datasketches/kll_sketch.hpp
M be/src/thirdparty/datasketches/kll_sketch_impl.hpp
M be/src/thirdparty/datasketches/memory_operations.hpp
M be/src/thirdparty/datasketches/theta_a_not_b.hpp
M be/src/thirdparty/datasketches/theta_a_not_b_impl.hpp
A be/src/thirdparty/datasketches/theta_comparators.hpp
A be/src/thirdparty/datasketches/theta_constants.hpp
A be/src/thirdparty/datasketches/theta_helpers.hpp
M be/src/thirdparty/datasketches/theta_intersection.hpp
A be/src/thirdparty/datasketches/theta_intersection_base.hpp
A be/src/thirdparty/datasketches/theta_intersection_base_impl.hpp
M be/src/thirdparty/datasketches/theta_intersection_impl.hpp
A be/src/thirdparty/datasketches/theta_jaccard_similarity.hpp
A be/src/thirdparty/datasketches/theta_jaccard_similarity_base.hpp
A be/src/thirdparty/datasketches/theta_set_difference_base.hpp
A be/src/thirdparty/datasketches/theta_set_difference_base_impl.hpp
M be/src/thirdparty/datasketches/theta_sketch.hpp
M be/src/thirdparty/datasketches/theta_sketch_impl.hpp
M be/src/thirdparty/datasketches/theta_union.hpp
A be/src/thirdparty/datasketches/theta_union_base.hpp
A be/src/thirdparty/datasketches/theta_union_base_impl.hpp
M be/src/thirdparty/datasketches/theta_union_impl.hpp
A be/src/thirdparty/datasketches/theta_update_sketch_base.hpp
A be/src/thirdparty/datasketches/theta_update_sketch_base_impl.hpp
M be/src/thirdparty/datasketches/u32_table.hpp
M be/src/thirdparty/datasketches/u32_table_impl.hpp
66 files changed, 2,646 insertions(+), 1,873 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/17294/3
--
To view, visit http://gerrit.cloudera.org:8080/17294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I37622a7643d015b80f55b802421eae826aa7a4f9
Gerrit-Change-Number: 17294
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[native-toolchain-CR] IMPALA-10488: Add jwt-cpp 0.5.0 to the toolchain

2021-04-12 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17118 )

Change subject: IMPALA-10488: Add jwt-cpp 0.5.0 to the toolchain
..


Patch Set 3: Code-Review+2

>here are some CMake extensions that can go fetch a project as of a particular 
>revision and build it

I agree, that would be the best. It adds one more step that can fail, but if 
github doesn't work, then we wouldn't even get to that step.


--
To view, visit http://gerrit.cloudera.org:8080/17118
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77aa3b36b45e8ef3c2d7873327948197c2c65d11
Gerrit-Change-Number: 17118
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Mon, 12 Apr 2021 10:17:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

2021-04-12 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17289 )

Change subject: IMPALA-10532: 
TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17289/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17289/2//COMMIT_MSG@10
PS2, Line 10: due to the query plan change (from 3-node scan to
: 2-node scan) which splits the row groups among scan nodes 
differently.
It's not clear to me why no pages are filtered by the min-max filter. Could you 
help to add more details?

BTW, I left a comment in the JIRA about how to reproduce this issue locally: 
https://issues.apache.org/jira/browse/IMPALA-10532?focusedCommentId=17319265=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17319265



--
To view, visit http://gerrit.cloudera.org:8080/17289
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I527de530f7db1ce959e7ef2ae3ced18677221c9f
Gerrit-Change-Number: 17289
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 12 Apr 2021 10:09:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10619: Minor refactoring of analytic function methods

2021-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17237 )

Change subject: IMPALA-10619: Minor refactoring of analytic function methods
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7063/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39e4268c0c5500f09acf98357a80763c28f615c2
Gerrit-Change-Number: 17237
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 12 Apr 2021 07:22:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10619: Minor refactoring of analytic function methods

2021-04-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17237 )

Change subject: IMPALA-10619: Minor refactoring of analytic function methods
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39e4268c0c5500f09acf98357a80763c28f615c2
Gerrit-Change-Number: 17237
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 12 Apr 2021 07:22:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10619: Minor refactoring of analytic function methods

2021-04-12 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17237 )

Change subject: IMPALA-10619: Minor refactoring of analytic function methods
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17237/1/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
File fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java:

http://gerrit.cloudera.org:8080/#/c/17237/1/fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java@833
PS1, Line 833:   protected FunctionCallExpr 
createRewrittenFunction(FunctionName funcName,
> Sure. Don't want to block this. I'm ok with this since I think the external
Yes, that's indeed how it is being checked in the derived class ..through a 
Preconditions.checkArgument on the instance of getFnCall().



--
To view, visit http://gerrit.cloudera.org:8080/17237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I39e4268c0c5500f09acf98357a80763c28f615c2
Gerrit-Change-Number: 17237
Gerrit-PatchSet: 3
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 12 Apr 2021 07:20:49 +
Gerrit-HasComments: Yes