[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16094 )

Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter
..


Patch Set 6: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
Gerrit-Change-Number: 16094
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 26 Jun 2020 06:56:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16094 )

Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter
..

IMPALA-9691: Support Kudu Timestamp and Date bloom filter

Impala stores a timestamp as a 12-byte TimestampValue structure with
nanosecond precision. Kudu stores a timestamp as 8 bytes of Unix time in
microseconds. To avoid the data truncation issue in the bloom filter,
add a FunctionCallExpr with 'utc_to_unix_micros' as the root of the
source expression of the bloom filter to convert timestamp values to
microseconds when building a timestamp bloom filter for Kudu.
Generated the functional date_tbl table in Kudu format for unit tests.
Added new test cases for Kudu Timestamp and Date bloom filters.
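
A minimal sketch of why the conversion matters, written in Python rather than Impala's own Java/C++. If the build side hashed a nanosecond-precision value while Kudu holds microseconds, two values for the same microsecond could produce different filter keys. The helper `to_unix_micros` and the (seconds, nanos) tuple layout are hypothetical illustrations, not Impala's implementation of 'utc_to_unix_micros'.

```python
from datetime import datetime, timezone

NANOS_PER_MICRO = 1_000


def to_unix_micros(seconds: int, nanos: int) -> int:
    """Convert (seconds since epoch, nanosecond remainder) to Unix microseconds.

    Sub-microsecond digits are dropped, which is the precision Kudu keeps.
    """
    return seconds * 1_000_000 + nanos // NANOS_PER_MICRO


# Two hypothetical Impala-side values that differ only below microsecond precision.
epoch_seconds = int(datetime(2020, 6, 25, tzinfo=timezone.utc).timestamp())
a = (epoch_seconds, 123_456_789)
b = (epoch_seconds, 123_456_001)

# The raw nanosecond representations are distinct filter keys...
assert a != b

# ...but after conversion both map to the same microsecond value.
micros_a = to_unix_micros(*a)
micros_b = to_unix_micros(*b)
assert micros_a == micros_b
print(micros_a)
```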

Testing:
Passed all core tests.

Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
Reviewed-on: http://gerrit.cloudera.org:8080/16094
Reviewed-by: Thomas Tauber-Marshall 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
7 files changed, 311 insertions(+), 30 deletions(-)

Approvals:
  Thomas Tauber-Marshall: Looks good to me, approved
  Impala Public Jenkins: Verified

--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
Gerrit-Change-Number: 16094
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 6
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jun 2020 05:19:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6055/ 
DRY_RUN=false


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 6
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jun 2020 05:19:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
..


Patch Set 5: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6052/


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 5
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jun 2020 03:43:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter

2020-06-25 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16094 )

Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter
..


Patch Set 6: Code-Review+2


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
Gerrit-Change-Number: 16094
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 26 Jun 2020 01:57:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16094 )

Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6054/ 
DRY_RUN=false


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
Gerrit-Change-Number: 16094
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 26 Jun 2020 01:57:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16094 )

Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6427/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
Gerrit-Change-Number: 16094
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 26 Jun 2020 01:43:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter

2020-06-25 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16094 )

Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16094/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/16094/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@199
PS5, Line 199: // If set, indicates that the filter is targeted for Kudu 
scan node with source
> mention that this is for Kudu
Ok, will fix it.


http://gerrit.cloudera.org:8080/#/c/16094/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@346
PS5, Line 346: } catch (AnalysisException e) {
> Let's add a Log.warn with the error message here.
Will add log message.


http://gerrit.cloudera.org:8080/#/c/16094/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@644
PS5, Line 644:   joinNode, filterType, bloomFilterSizeLimits_,
> Add a brief comment here saying something like "For timestamp bloom filters
Will add comments.


http://gerrit.cloudera.org:8080/#/c/16094/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@773
PS5, Line 773: nue;
> I think it would make this 'if' clearer if you surrounded this with parenth
Will fix it.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
Gerrit-Change-Number: 16094
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 26 Jun 2020 01:16:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter

2020-06-25 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/16094 )

Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter
..

IMPALA-9691: Support Kudu Timestamp and Date bloom filter

Impala stores a timestamp as a 12-byte TimestampValue structure with
nanosecond precision. Kudu stores a timestamp as 8 bytes of Unix time in
microseconds. To avoid the data truncation issue in the bloom filter,
add a FunctionCallExpr with 'utc_to_unix_micros' as the root of the
source expression of the bloom filter to convert timestamp values to
microseconds when building a timestamp bloom filter for Kudu.
Generated the functional date_tbl table in Kudu format for unit tests.
Added new test cases for Kudu Timestamp and Date bloom filters.

Testing:
Passed all core tests.

Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
---
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
7 files changed, 311 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/16094/6
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
Gerrit-Change-Number: 16094
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16073 )

Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions
..

IMPALA-7020: fix costing of non-trivial CAST expressions

Some cast operations are quite expensive to evaluate,
which was not reflected in the uniform costing of CAST
expressions.

We fix this by increasing the cost of non-trivial
casts to be the same as an arbitrary function call.

Testing:
Ran exhaustive tests.

Add planner tests to check that CAST expressions are
materialized or not based on the input and output
types - the planner output lists 'materialized:'
expressions for the SORT operator.

A few existing planner tests had changes in predicate
ordering. I checked manually that these changes made
sense.

Perf:
I sanity-checked that this actually helped (a variant of)
the example query from IMPALA-7020. The following query
went from ~8s to ~2s in my dev environment:

select *
FROM
  (
    SELECT
      o.*,
      ROW_NUMBER() OVER(ORDER BY evt_ts DESC) AS rn
    FROM
      (
        SELECT
          l_orderkey,l_partkey,l_linenumber,l_quantity, cast (l_shipdate as date) evt_ts
        FROM
          tpch_parquet.lineitem
      ) o
  ) r
WHERE
  rn BETWEEN 1 AND 101
ORDER BY rn;

Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6
Reviewed-on: http://gerrit.cloudera.org:8080/16073
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
6 files changed, 212 insertions(+), 10 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/16073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6
Gerrit-Change-Number: 16073
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16073 )

Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions
..


Patch Set 6: Verified+1


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6
Gerrit-Change-Number: 16073
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 26 Jun 2020 00:05:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter

2020-06-25 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16094 )

Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16094/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/16094/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@199
PS5, Line 199: // If set, indicates that the filter need to truncate 
timestamp.
mention that this is for Kudu


http://gerrit.cloudera.org:8080/#/c/16094/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@346
PS5, Line 346:   return null;
Let's add a Log.warn with the error message here.


http://gerrit.cloudera.org:8080/#/c/16094/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@644
PS5, Line 644:   if (filterType == TRuntimeFilterType.BLOOM
Add a brief comment here saying something like "For timestamp bloom filters we 
also generate a RuntimeFilter with the src timestamp truncated for Kudu scan 
node targets"


http://gerrit.cloudera.org:8080/#/c/16094/5/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@773
PS5, Line 773: targetExpr.getType().isTimestamp() && 
!filter.isTimestampTruncation()
I think it would make this 'if' clearer if you surrounded this with 
parentheses, to show that these conditions are related.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
Gerrit-Change-Number: 16094
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 26 Jun 2020 00:00:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9294: Support DATE for min-max runtime filter

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16103 )

Change subject: IMPALA-9294: Support DATE for min-max runtime filter
..


Patch Set 2: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6050/


--
To view, visit http://gerrit.cloudera.org:8080/16103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c
Gerrit-Change-Number: 16103
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 25 Jun 2020 23:19:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16094 )

Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6426/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
Gerrit-Change-Number: 16094
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 25 Jun 2020 23:11:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6052/ 
DRY_RUN=false


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 5
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 22:45:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
..


Patch Set 5: Code-Review+2


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 5
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 22:45:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter

2020-06-25 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/16094 )

Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter
..

IMPALA-9691: Support Kudu Timestamp and Date bloom filter

Impala stores a timestamp as a 12-byte TimestampValue structure with
nanosecond precision. Kudu stores a timestamp as 8 bytes of Unix time in
microseconds. To avoid the data truncation issue in the bloom filter,
add a FunctionCallExpr with 'utc_to_unix_micros' as the root of the
source expression of the bloom filter to convert timestamp values to
microseconds when building a timestamp bloom filter for Kudu.
Generated the functional date_tbl table in Kudu format for unit tests.
Added new test cases for Kudu Timestamp and Date bloom filters.

Testing:
Passed all core tests.

Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
---
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
7 files changed, 299 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/16094/5
--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
Gerrit-Change-Number: 16094
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter

2020-06-25 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16094 )

Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16094/4/fe/src/main/java/org/apache/impala/analysis/TimestampTruncationExpr.java
File fe/src/main/java/org/apache/impala/analysis/TimestampTruncationExpr.java:

http://gerrit.cloudera.org:8080/#/c/16094/4/fe/src/main/java/org/apache/impala/analysis/TimestampTruncationExpr.java@41
PS4, Line 41: public class TimestampTruncationExpr extends Expr {
> So I'm sorry I didn't think of this before and wasted your time, but I real
Right, the code change will be much simpler and won't need any change in the 
backend. Thanks.



--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
Gerrit-Change-Number: 16094
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 25 Jun 2020 22:18:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16083 )

Change subject: IMPALA-9829: Add Write Metrics for Spilling
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6425/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16083
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
Gerrit-Change-Number: 16083
Gerrit-PatchSet: 13
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 25 Jun 2020 22:14:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
..


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6049/


--

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 4
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 22:09:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2658: Extend the NDV function to accept a precision

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/15997 )

Change subject: IMPALA-2658: Extend the NDV function to accept a precision
..

IMPALA-2658: Extend the NDV function to accept a precision

This work addresses a limitation of the NDV function by
extending it to optionally take a second argument
called scale.

   NDV([DISTINCT | ALL] expression [, scale])

Without the second argument, the existing syntax and semantics are
preserved. The precision, which determines the number of
estimators used by the HLL algorithm, remains 10.

When supplied, the scale argument must be an integer literal
in the range from 1 to 10. Its value is internally mapped
to a precision used by the HLL algorithm with the following
formula:

  precision = scale + 8.

Thus, a scale of 1 is mapped to a precision of 9 and a scale of
10 is mapped to a precision of 18.

A larger precision value generally produces a better estimate
(i.e. one with less error) than a smaller one, because more
estimators are involved. The cost is the extra memory needed:
for a given precision p, the HLL algorithm uses on the order of
2^p bytes.
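
As a rough worked example of the mapping above, the following Python sketch computes the precision and the approximate memory for a given scale. The function names are hypothetical and are not part of Impala. The error figure uses the textbook HyperLogLog approximation of about 1.04 / sqrt(2^p), which is an assumption brought in for illustration and is not stated in this commit.

```python
import math


def ndv_precision(scale: int) -> int:
    """Map the NDV scale argument (1..10) to an HLL precision (9..18)."""
    if not 1 <= scale <= 10:
        raise ValueError("scale must be an integer from 1 to 10")
    return scale + 8


def hll_memory_bytes(precision: int) -> int:
    """Approximate sketch size: on the order of 2^p bytes."""
    return 2 ** precision


def hll_relative_std_error(precision: int) -> float:
    """Textbook HyperLogLog standard error, about 1.04 / sqrt(2^p)."""
    return 1.04 / math.sqrt(2 ** precision)


for scale in (1, 2, 10):
    p = ndv_precision(scale)
    err_pct = round(hll_relative_std_error(p) * 100, 2)
    print(scale, p, hll_memory_bytes(p), err_pct)
```

Under these assumptions the default precision of 10 corresponds to a scale of 2, and a scale of 10 needs 2^18 = 262144 bytes per sketch.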

Testing:
1. Ran unit tests against table store_sales in TPC-DS and table customer
   in TPCH in both serial and parallel plan settings;
2. Added and ran a new regression test (test_ndv) in
   TestAggregationQueries section to compute NDV() for every supported
   Impala data type over all valid scale values;
3. Ran "core" tests.

Performance:
1. Ran estimation error tests against a total of 22 distinct data sets
   loaded into external Impala tables.

   The error was computed as
   abs(estimated NDV - actual NDV) / actual NDV.

   Overall, the precision of 18 (or the scale value of 10) gave
   the best result, with a worst-case estimation error of 0.42% (for
   one set of 10 million integers) and an average error of no more
   than 0.17%, at the cost of 256 KB of memory for the internal data
   structure per evaluation of the HLL algorithm.  Other precisions
   (such as 16 and 17) were also very reasonable but with slightly
   larger estimation errors.

2. Ran execution time tests against a total of 6 distinct data files
   on a single node EC2 VM in debug mode. These data files were loaded
   in turn into a single column in an external Impala table.  The
   total execution time was roughly the same across different scales
   for a given table configuration. The execution time for tables
   involving multiple data files across multiple nodes has not yet
   been measured.

3. Ran execution time tests comparing the before- and
   after-enhancement version of NDV().
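
The error metric in item 1 reads as a relative error. A minimal Python sketch, assuming its two operands are the estimated and the exact distinct counts. The names `estimated_ndv` and `actual_ndv` and the sample numbers are illustrative only and do not come from the commit:

```python
def relative_error(estimated_ndv: int, actual_ndv: int) -> float:
    """Relative estimation error: abs(estimate - actual) / actual."""
    if actual_ndv <= 0:
        raise ValueError("actual_ndv must be positive")
    return abs(estimated_ndv - actual_ndv) / actual_ndv


# Hypothetical numbers chosen only to illustrate the arithmetic.
print(f"{relative_error(1_005, 1_000):.2%}")
```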

Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
Reviewed-on: http://gerrit.cloudera.org:8080/15997
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/common/logging.h
M be/src/exec/incr-stats-util-test.cc
M be/src/exec/incr-stats-util.cc
M be/src/exec/incr-stats-util.h
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M tests/query_test/test_aggregation.py
9 files changed, 426 insertions(+), 82 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/15997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
Gerrit-Change-Number: 15997
Gerrit-PatchSet: 42
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 


[Impala-ASF-CR] IMPALA-2658: Extend the NDV function to accept a precision

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15997 )

Change subject: IMPALA-2658: Extend the NDV function to accept a precision
..


Patch Set 41: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/15997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
Gerrit-Change-Number: 15997
Gerrit-PatchSet: 41
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Thu, 25 Jun 2020 21:55:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling

2020-06-25 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16083 )

Change subject: IMPALA-9829: Add Write Metrics for Spilling
..


Patch Set 13:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/16083/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16083/10//COMMIT_MSG@9
PS10, Line 9: Currently, we have read metrics for spilling, in this patch, we add
> Nit: in the commit message it can be best to be more high level, and descri
Thank you for the suggestion. I have added a summary of the task and switched 
to more descriptive terms in the commit message.


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc
File be/src/runtime/io/disk-io-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1648
PS10, Line 1648: // the write operations.
> Finish comment with a period.
Done


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1664
PS10, Line 1664:   // Reset the Metric if it exists.
> Add period.
Done


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1686
PS10, Line 1686:   // Issue a number of writes to the disks.
> Add period.
Done


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1706
PS10, Line 1706:   // Check the count and max/min of the histogram metric.
> Add period.
Done


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1712
PS10, Line 1712: // The count should be added by num_ranges/num_disks per disk.
> Add period.
Done


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1714
PS10, Line 1714: // Check if the min and max of write size are the same as the written len.
> Add period.
Done


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1732
PS10, Line 1732: // Issue a writing operation to a non-existent tmp file path.
> Add periods
Done


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1737
PS10, Line 1737:   string tmp_file = "/non-existent/file.txt";
> Another test uses "/non-existent/file.txt" to indicate a non-existing file,
Done


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1739
PS10, Line 1739:   // Reset the Metric if it exists.
> Add period.
Done


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1746
PS10, Line 1746:   // Remove the path in case it exists.
> Add period.
Done


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1767
PS10, Line 1767:   // One IO Error should be added to the metrics counter.
> Add period.
Done


http://gerrit.cloudera.org:8080/#/c/16083/10/common/thrift/metrics.json
File common/thrift/metrics.json:

http://gerrit.cloudera.org:8080/#/c/16083/10/common/thrift/metrics.json@663
PS10, Line 663: "description": "The number of write io errors on disk.",
> Should be "errors".
Done



--
To view, visit http://gerrit.cloudera.org:8080/16083
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
Gerrit-Change-Number: 16083
Gerrit-PatchSet: 13
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 25 Jun 2020 21:51:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling

2020-06-25 Thread Yida Wu (Code Review)
Yida Wu has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/16083 )

Change subject: IMPALA-9829: Add Write Metrics for Spilling
..

IMPALA-9829: Add Write Metrics for Spilling

Currently, we have read metrics for spilling. This patch adds support
for write metrics. The new metrics could be useful for measuring write
operations and for targeting performance issues when spilling to
remote disks (S3) (IMPALA-9828).

The added metrics record the following information:
1. write latency of each write operation to the disk, metric kind:
HistogramMetric, unit: nanosecond.
2. write size of each write operation to the disk, metric kind:
HistogramMetric, unit: Bytes.
3. number of write IO errors when writing to the disk, metric kind:
IntCounter.
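
As a rough illustration of the three metrics listed above, the sketch below
records latency, size, and an error count around a single write call. It is
not the DiskIoMgr code: the class name, the field names, and the choice to
skip latency and size samples for failed writes are all assumptions made for
this example.

```python
import time


class WriteMetrics:
    """Hypothetical per-queue write metrics; names are illustrative only."""

    def __init__(self):
        self.latencies_ns = []  # stands in for a latency histogram (nanoseconds)
        self.sizes_bytes = []   # stands in for a size histogram (bytes)
        self.io_errors = 0      # stands in for an integer error counter

    def record_write(self, write_fn, data):
        """Run write_fn(data) and update the metrics; return True on success."""
        start = time.monotonic_ns()
        try:
            write_fn(data)
        except OSError:
            # Assumption of this sketch: a failed write only bumps the counter.
            self.io_errors += 1
            return False
        self.latencies_ns.append(time.monotonic_ns() - start)
        self.sizes_bytes.append(len(data))
        return True
```

A caller would wrap each write in record_write() and later read the lists
and the counter when reporting metrics.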

Testing:
 * added DiskIoMgrTest.MetricsOfWriteSizeAndLatency
 * added DiskIoMgrTest.MetricsOfWriteIoError
Ran unit test disk-io-mgr-test and pre-commit test

Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
---
M be/src/runtime/io/disk-io-mgr-internal.h
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/util/histogram-metric.h
M common/thrift/metrics.json
6 files changed, 258 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/16083/13
--
To view, visit http://gerrit.cloudera.org:8080/16083
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
Gerrit-Change-Number: 16083
Gerrit-PatchSet: 13
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 


[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16083 )

Change subject: IMPALA-9829: Add Write Metrics for Spilling
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6424/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16083
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
Gerrit-Change-Number: 16083
Gerrit-PatchSet: 11
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 25 Jun 2020 21:43:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling

2020-06-25 Thread Yida Wu (Code Review)
Yida Wu has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/16083 )

Change subject: IMPALA-9829: Add Write Metrics for Spilling
..

IMPALA-9829: Add Write Metrics for Spilling

Currently, we have read metrics for spilling. This patch adds support
for write metrics. The new metrics could be useful for measuring write
operations and for targeting performance issues when spilling to
remote disks (S3) (IMPALA-9828).

Three types of metrics are added in disk-io-mgr:
1. impala-server.io-mgr.queue-$0.write-latency, unit: ns,
kind: HistogramMetric
2. impala-server.io-mgr.queue-$0.write-size, unit: Bytes,
kind: HistogramMetric
3. impala-server.io-mgr.queue-$0.write-io-error, kind: IntCounter

Write size, latency and io errors will be recorded in
impala::io::DiskIoMgr::Write.

Testing:
 * added DiskIoMgrTest.MetricsOfWriteSizeAndLatency
 * added DiskIoMgrTest.MetricsOfWriteIoError
Ran unit test disk-io-mgr-test and pre-commit test

Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
---
M be/src/runtime/io/disk-io-mgr-internal.h
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/util/histogram-metric.h
M common/thrift/metrics.json
6 files changed, 258 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/16083/11
--
To view, visit http://gerrit.cloudera.org:8080/16083
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
Gerrit-Change-Number: 16083
Gerrit-PatchSet: 11
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 


[Impala-ASF-CR] IMPALA-9691: Support Kudu Timestamp and Date bloom filter

2020-06-25 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16094 )

Change subject: IMPALA-9691: Support Kudu Timestamp and Date bloom filter
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16094/4/fe/src/main/java/org/apache/impala/analysis/TimestampTruncationExpr.java
File fe/src/main/java/org/apache/impala/analysis/TimestampTruncationExpr.java:

http://gerrit.cloudera.org:8080/#/c/16094/4/fe/src/main/java/org/apache/impala/analysis/TimestampTruncationExpr.java@41
PS4, Line 41: public class TimestampTruncationExpr extends Expr {
So I'm sorry I didn't think of this before and wasted your time, but I realized 
I don't think this is actually necessary.

There's an existing function called 'utc_to_unix_micros' that does exactly what 
you're doing here. All you should need to do is create a FunctionCallExpr with 
that function name in RuntimeFilterGenerator.
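
To illustrate why the conversion matters, here is a minimal sketch of
reducing a nanosecond-precision timestamp to Unix microseconds. The function
name unix_micros and the (seconds, nanos) representation are hypothetical,
and truncating rather than rounding the sub-microsecond part is an
assumption of this sketch, not a statement about 'utc_to_unix_micros'.

```python
def unix_micros(epoch_seconds: int, nanos: int) -> int:
    """Convert (seconds since epoch, nanoseconds within the second) to whole
    microseconds since epoch, dropping any sub-microsecond remainder."""
    if not 0 <= nanos < 1_000_000_000:
        raise ValueError("nanos must be in [0, 1e9)")
    return epoch_seconds * 1_000_000 + nanos // 1_000


# Two values that differ only below microsecond precision map to the same
# key, so a filter built on the converted values matches 8-byte microsecond
# values stored on the other side.
same_key = unix_micros(1, 1_500) == unix_micros(1, 1_999)
```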



--
To view, visit http://gerrit.cloudera.org:8080/16094
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c1e9bcc9fd6d79a39f25eaa3396188fc0a52a48
Gerrit-Change-Number: 16094
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 25 Jun 2020 20:27:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-25 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90
PS12, Line 90: GetRowSize
> I didn't dig too deep, but row_descriptor_->GetRowSize() seems to contain t
I will look into the possibility of accessing that average size data in the 
backend.

But just to make sure I understand correctly:
For rows that contain varlen data, GetRowSize() will most likely 
underestimate the size, since it only accounts for the pointer, not 
the string length itself?
That, in turn, will cause the return value of ComputeInputSizeEstimate() to 
be an underestimate as well.

But isn't underestimating the input size better than overestimating it? In 
the case of underestimation, the worst case is that we don't enforce 
sort_run_bytes_limit for the first run (hoping that everything will fit in 
memory), this turns out to be wrong and we spill, but we then enforce 
sort_run_bytes_limit for the next runs. Overestimation is worse, because we 
spill unnecessarily from the beginning when the input could have fit in 
memory.
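
The underestimation described above can be sketched as follows. This is only
an illustration: the 12-byte slot size and both function names are
hypothetical, and the real estimate is computed in the backend from the row
descriptor.

```python
STRING_SLOT_BYTES = 12  # hypothetical fixed slot (pointer + length) per string


def estimated_input_size(rows):
    """Estimate that counts only the fixed-size slot of each string value."""
    return len(rows) * STRING_SLOT_BYTES


def actual_input_size(rows):
    """Size that also counts the variable-length string payload."""
    return sum(STRING_SLOT_BYTES + len(s.encode("utf-8")) for s in rows)


rows = ["a" * 100] * 10
estimate = estimated_input_size(rows)  # slots only
actual = actual_input_size(rows)       # slots plus string bytes
```

With long strings the estimate is far below the actual size, which is the
case where the first run may spill even though the estimate suggested it
would fit.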



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 20:09:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8984: Uncorrelated scalar subqueries in the select list

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16007 )

Change subject: IMPALA-8984: Uncorrelated scalar subqueries in the select list
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6422/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16007
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66
Gerrit-Change-Number: 16007
Gerrit-PatchSet: 4
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 20:01:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9784: Non correlated subqueries in HAVING.

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16052 )

Change subject: IMPALA-9784: Non correlated subqueries in HAVING.
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6423/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16052
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461
Gerrit-Change-Number: 16052
Gerrit-PatchSet: 2
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 19:57:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9784: Non correlated subqueries in HAVING.

2020-06-25 Thread Shant Hovsepian (Code Review)
Shant Hovsepian has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16052 )

Change subject: IMPALA-9784: Non correlated subqueries in HAVING.
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16052/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/16052/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@466
PS2, Line 466: // TODO: IMPALA-5100 to cover all cases, we do let through runtime scalars with
I relaxed some of these rules to let through subqueries such as (select 
count(a) from t where b=1 group by b).
I referenced the JIRA for enhancing the scalar subquery planner checks to 
handle more expression evaluation, but for now I thought the better tradeoff 
was to let these queries through, wrapped in a CardinalityCheckNode.

There are cases where two different runtime scalar subqueries in nested query 
blocks could run and have runtime errors that interfere with each other, since 
we don't have independent execution, but I checked and many other databases 
(Hive, Vertica, Vectorwise) have this kind of behavior. It feels like a 
worthwhile tradeoff to allow more queries to run, where some might hit a 
runtime error on the off chance, when we would otherwise not let the query run 
at all. It's also needed to support Q44 from TPC-DS.



--
To view, visit http://gerrit.cloudera.org:8080/16052
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461
Gerrit-Change-Number: 16052
Gerrit-PatchSet: 2
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 19:52:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16073 )

Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6421/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6
Gerrit-Change-Number: 16073
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 19:40:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8984: Uncorrelated scalar subqueries in the select list

2020-06-25 Thread Shant Hovsepian (Code Review)
Shant Hovsepian has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16007 )

Change subject: IMPALA-8984: Uncorrelated scalar subqueries in the select list
..


Patch Set 4:

(9 comments)

Some more tests and review comments addressed. Still want to get a good test 
run out of jenkins.

http://gerrit.cloudera.org:8080/#/c/16007/3/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/16007/3/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@292
PS3, Line 292: "Invariant violated: Only subqueries that are guaranteed to return a "
> nit:  "guaranteed"
Done


http://gerrit.cloudera.org:8080/#/c/16007/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/16007/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@937
PS3, Line 937:  * supported in the FROM clause, WHERE clause and SELECT list. The rewrite is
> Update this comment for SELECT clause.
Done


http://gerrit.cloudera.org:8080/#/c/16007/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1137
PS3, Line 1137:  *returned per group so a run time cardinality check must be applied. An exception
> nit: duplicate 'primary'
Done


http://gerrit.cloudera.org:8080/#/c/16007/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/16007/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1117
PS4, Line 1117:  *
Done


http://gerrit.cloudera.org:8080/#/c/16007/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1117
PS4, Line 1117:  *
Done


http://gerrit.cloudera.org:8080/#/c/16007/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1135
PS4, Line 1135:  *
Done


http://gerrit.cloudera.org:8080/#/c/16007/4/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1230
PS4, Line 1230:   // rewrite to a LOJ.
I added a test for this. I know it feels weird, but since the slotref for the 
subquery is marked as materialized and the other join queries get bound by the 
USING/ON clause, nothing explodes. Since there are scalar subqueries, the only 
weird situation is that if the cardinality of all the joins were equal to 1, 
it might get reordered, but the results would still be correct.

If it were a correlated subquery then we'd need to handle things more 
carefully, but that's for a later commit.


http://gerrit.cloudera.org:8080/#/c/16007/4/testdata/workloads/functional-query/queries/QueryTest/subquery.test
File testdata/workloads/functional-query/queries/QueryTest/subquery.test:

http://gerrit.cloudera.org:8080/#/c/16007/4/testdata/workloads/functional-query/queries/QueryTest/subquery.test@1044
PS4, Line 1044: select id, 1+(select min(id) from functional.alltypessmall)
Added the tpc-ds query, it's a pretty complex plan.


http://gerrit.cloudera.org:8080/#/c/16007/4/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q9.test
File testdata/workloads/tpcds/queries/tpcds-decimal_v2-q9.test:

http://gerrit.cloudera.org:8080/#/c/16007/4/testdata/workloads/tpcds/queries/tpcds-decimal_v2-q9.test@3
PS4, Line 3: select case when (select count(*)
Done



--
To view, visit http://gerrit.cloudera.org:8080/16007
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66
Gerrit-Change-Number: 16007
Gerrit-PatchSet: 4
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 19:38:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9784: Non correlated subqueries in HAVING.

2020-06-25 Thread Shant Hovsepian (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16052

to look at the new patch set (#2).

Change subject: IMPALA-9784: Non correlated subqueries in HAVING.
..

IMPALA-9784: Non correlated subqueries in HAVING.

Support rewriting subqueries in the HAVING clause by nesting the
aggregation query and pulling up the subquery predicates into the outer
WHERE clause.
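
The shape of this rewrite can be checked on a toy table. The sketch below
uses SQLite through Python's sqlite3 module, not Impala, and the table t and
its data are made up for illustration; it only shows that a HAVING predicate
with an uncorrelated subquery gives the same rows as the nested aggregation
with the predicate pulled up into the outer WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "x"), (2, "x"), (10, "y"), (20, "y"), (3, "z")],
)

# Original form: uncorrelated subquery inside HAVING.
having_rows = conn.execute(
    "SELECT b, SUM(a) AS s FROM t GROUP BY b "
    "HAVING SUM(a) > (SELECT AVG(a) FROM t) ORDER BY b"
).fetchall()

# Rewritten form: aggregation nested in an inline view, predicate in WHERE.
rewritten_rows = conn.execute(
    "SELECT b, s FROM (SELECT b, SUM(a) AS s FROM t GROUP BY b) v "
    "WHERE s > (SELECT AVG(a) FROM t) ORDER BY b"
).fetchall()
```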

Testing:
  * New analyzer tests
  * New functional subquery tests
  * Added Q23, Q24 and Q44 to the tpcds workload

Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
A testdata/workloads/tpcds/queries/tpcds-q23-1.test
A testdata/workloads/tpcds/queries/tpcds-q23-2.test
A testdata/workloads/tpcds/queries/tpcds-q24-1.test
A testdata/workloads/tpcds/queries/tpcds-q24-2.test
A testdata/workloads/tpcds/queries/tpcds-q44.test
M tests/query_test/test_tpcds_queries.py
11 files changed, 1,123 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/16052/2
--
To view, visit http://gerrit.cloudera.org:8080/16052
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I124a58a09a1a47e1222a22d84b54fe7d07844461
Gerrit-Change-Number: 16052
Gerrit-PatchSet: 2
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8984: Uncorrelated scalar subqueries in the select list

2020-06-25 Thread Shant Hovsepian (Code Review)
Hello Aman Sinha, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16007

to look at the new patch set (#4).

Change subject: IMPALA-8984: Uncorrelated scalar subqueries in the select list
..

IMPALA-8984: Uncorrelated scalar subqueries in the select list

Extend StmtRewriter with the ability to rewrite scalar subqueries in the
select list into cross joins. Currently the subquery must pass plan-time
checks that determine it returns a single row. These checks may miss
cases that would be valid at runtime or that need more complex
evaluation of the predicate expressions in the planner. Support for
correlated subqueries will be a follow-on change.
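
As an illustration of the idea (run on SQLite via Python's sqlite3, not on
Impala, with a made-up table t), a scalar subquery in the select list and
the equivalent cross join return the same rows when the subquery is known to
produce exactly one row, as an aggregate without GROUP BY does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(3,), (5,), (7,)])

# Original form: uncorrelated scalar subquery in the select list.
subquery_rows = conn.execute(
    "SELECT id, 1 + (SELECT MIN(id) FROM t) FROM t ORDER BY id"
).fetchall()

# Rewritten form: the single-row subquery becomes a cross-joined inline view.
cross_join_rows = conn.execute(
    "SELECT t.id, 1 + s.m FROM t "
    "CROSS JOIN (SELECT MIN(id) AS m FROM t) s ORDER BY t.id"
).fetchall()
```

The single-row guarantee matters: if the subquery could return zero rows,
the cross join would drop every outer row, while the scalar subquery form
would yield NULL instead.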

With this change Q9 of TPC-DS is supported. We now load the 'reasons'
table as part of the TPC-DS workload for use by Q9.

Testing:
  * Added new analyzer tests, updated previous subquery tests
  * test_queries.py::TestQueries::test_subquery

Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66
---
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
M testdata/datasets/tpcds/tpcds_schema_template.sql
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
M testdata/workloads/functional-query/queries/QueryTest/subquery.test
M testdata/workloads/tpcds/queries/count.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q9.test
A testdata/workloads/tpcds/queries/tpcds-q9.test
M tests/query_test/test_tpcds_queries.py
11 files changed, 1,455 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/16007/4
--
To view, visit http://gerrit.cloudera.org:8080/16007
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibcf55d26889aa01d69bb85f18c9241dda095fb66
Gerrit-Change-Number: 16007
Gerrit-PatchSet: 4
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-8253: Parquet delta encoding and decoding.

2020-06-25 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12621 )

Change subject: IMPALA-8253: Parquet delta encoding and decoding.
..


Patch Set 16:

Hmm, this CR somehow got forgotten. Anyway, I'm planning to take a look in the 
following days. Daniel, do you plan to continue this work?


--
To view, visit http://gerrit.cloudera.org:8080/12621
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie7378ac1a490a6c89a0a4349aae86cbc0fbc80f8
Gerrit-Change-Number: 12621
Gerrit-PatchSet: 16
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 25 Jun 2020 19:12:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions

2020-06-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16073 )

Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6
Gerrit-Change-Number: 16073
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 19:10:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions

2020-06-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16073 )

Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16073/4/fe/src/main/java/org/apache/impala/analysis/CastExpr.java
File fe/src/main/java/org/apache/impala/analysis/CastExpr.java:

http://gerrit.cloudera.org:8080/#/c/16073/4/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@a280
PS4, Line 280:
> I think this isn't used anywhere now, so you could remove its definition in
Done



--
To view, visit http://gerrit.cloudera.org:8080/16073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6
Gerrit-Change-Number: 16073
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 19:10:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions

2020-06-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16073 )

Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6
Gerrit-Change-Number: 16073
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 19:11:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16073 )

Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6051/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6
Gerrit-Change-Number: 16073
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 19:10:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions

2020-06-25 Thread Tim Armstrong (Code Review)
Hello Aman Sinha, Thomas Tauber-Marshall, Shant Hovsepian, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16073

to look at the new patch set (#5).

Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions
..

IMPALA-7020: fix costing of non-trivial CAST expressions

Some cast operations are quite expensive to evaluate,
which was not reflected in the uniform costing of CAST
expressions.

We fix this by increasing the cost of non-trivial
casts to be the same as an arbitrary function call.
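
A minimal sketch of the costing idea described above, assuming made-up cost units and a made-up set of "trivial" casts. None of these names or values are taken from CastExpr.java or Expr.java; they only illustrate how charging a non-trivial cast like a function call changes the relative cost of an expression:

```python
# Hypothetical cost units; the real planner constants are not shown in this thread.
TRIVIAL_CAST_COST = 1.0
FUNCTION_CALL_COST = 10.0

# Hypothetical set of (from_type, to_type) casts treated as cheap.
TRIVIAL_CASTS = {("INT", "BIGINT"), ("FLOAT", "DOUBLE")}


def cast_cost(from_type, to_type):
    """Return an illustrative evaluation cost for CAST(from_type AS to_type)."""
    if from_type == to_type or (from_type, to_type) in TRIVIAL_CASTS:
        return TRIVIAL_CAST_COST
    # Non-trivial casts are charged like an arbitrary function call.
    return FUNCTION_CALL_COST


# A string-to-date cast is now costed like a function call, while a
# widening integer cast stays cheap.
expensive = cast_cost("STRING", "DATE")
cheap = cast_cost("INT", "BIGINT")
```

With a uniform cast cost both expressions would look equally cheap to the planner, which is the situation the change sets out to fix.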

Testing:
Ran exhaustive tests.

Add planner tests to check that CAST expressions are
materialized or not based on the input and output
types - the planner output lists 'materialized:'
expressions for the SORT operator.

A few existing planner tests had changes in predicate
ordering. I checked manually that these changes made
sense.

Perf:
I sanity-checked that this actually helped (a variant of)
the example query from IMPALA-7020. The following query
went from ~8s to ~2s in my dev environment:

SELECT *
FROM
  (
    SELECT
      o.*,
      ROW_NUMBER() OVER(ORDER BY evt_ts DESC) AS rn
    FROM
      (
        SELECT
          l_orderkey, l_partkey, l_linenumber, l_quantity,
          cast(l_shipdate as date) evt_ts
        FROM
          tpch_parquet.lineitem
      ) o
  ) r
WHERE
  rn BETWEEN 1 AND 101
ORDER BY rn;

Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6
---
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
6 files changed, 212 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/16073/5
--
To view, visit http://gerrit.cloudera.org:8080/16073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6
Gerrit-Change-Number: 16073
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions

2020-06-25 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16073 )

Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions
..


Patch Set 4: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16073/4/fe/src/main/java/org/apache/impala/analysis/CastExpr.java
File fe/src/main/java/org/apache/impala/analysis/CastExpr.java:

http://gerrit.cloudera.org:8080/#/c/16073/4/fe/src/main/java/org/apache/impala/analysis/CastExpr.java@a280
PS4, Line 280:
I think this isn't used anywhere now, so you could remove its definition in 
Expr.java



--
To view, visit http://gerrit.cloudera.org:8080/16073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6
Gerrit-Change-Number: 16073
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 18:30:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9294: Support DATE for min-max runtime filter

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16103 )

Change subject: IMPALA-9294: Support DATE for min-max runtime filter
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6050/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c
Gerrit-Change-Number: 16103
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 25 Jun 2020 18:14:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9294: Support DATE for min-max runtime filter

2020-06-25 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16103 )

Change subject: IMPALA-9294: Support DATE for min-max runtime filter
..


Patch Set 2: Code-Review+2

Thanks for doing this


--
To view, visit http://gerrit.cloudera.org:8080/16103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c
Gerrit-Change-Number: 16103
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 25 Jun 2020 18:14:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-25 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90
PS12, Line 90: GetRowSize
> So what is the nature of varlen column? Is each row possibly will have diff
I didn't dig too deep, but row_descriptor_->GetRowSize() seems to contain the 
size of the tuple that holds a row - but in case of string and varchar it 
contains a pointer (+length), so there is additional data in some buffer.

The column stats contain AvgSize and MaxSize - these are constants for fixed 
sized types, but we calculate them for strings during COMPUTE STATS, so we can 
get a more or less accurate estimation for the total amount of memory consumed.

I don't know off the top of my head how to access this data in the backend.

Strings are very common, so many queries contain varlen slots. I am not sure if 
it is a good idea to create an optimization specifically for queries without 
strings.
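
As a rough illustration of the estimate discussed in this thread, the sketch below combines the fixed tuple size with per-column average sizes for varlen columns, and gives up when the cardinality is unknown. The function name, the parameters and the -1 sentinel handling are assumptions made for this example; it is not the code in sort-node.cc:

```python
# Assumed sentinel, following the "-1 (unknown)" remark in the review thread.
UNKNOWN_CARDINALITY = -1


def estimate_sort_input_bytes(cardinality, fixed_row_size, avg_varlen_sizes):
    """Estimate total sort input size in bytes, or None if it cannot be estimated.

    cardinality:      estimated number of input rows (-1 when unknown).
    fixed_row_size:   size of the fixed-length tuple, which holds only a
                      pointer and length for string/varchar slots.
    avg_varlen_sizes: average data sizes of the varlen columns, for example
                      taken from column stats (hypothetical input).
    """
    if cardinality < 0:
        # No row count, so any product would be meaningless.
        return None
    per_row = fixed_row_size + sum(avg_varlen_sizes)
    return int(cardinality * per_row)


# Ignoring varlen data underestimates the input when strings are present.
fixed_only = estimate_sort_input_bytes(1000, 32, [])
with_strings = estimate_sort_input_bytes(1000, 32, [18.5])
```

Whether an unknown cardinality should disable the estimate or still allow spilling is the open question in the review; the sketch only shows the first option.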



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 18:11:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling

2020-06-25 Thread Andrew Sherman (Code Review)
Andrew Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16083 )

Change subject: IMPALA-9829: Add Write Metrics for Spilling
..


Patch Set 10:

(13 comments)

Good change! I like the detailed unit tests.
I think a few cosmetic changes are all that is needed.
This may seem like a picky nit but in Impala code, comments start with a 
capital letter and end with a period.
This may seem weirdly prescriptive but it does enhance readability.

http://gerrit.cloudera.org:8080/#/c/16083/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16083/10//COMMIT_MSG@9
PS10, Line 9: Three types of metrics are added in disk-io-mgr:
Nit: in the commit message it can be best to be more high level, and describe 
what is added in more descriptive terms.


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc
File be/src/runtime/io/disk-io-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1648
PS10, Line 1648: // the write operations
Finish comment with a period.


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1664
PS10, Line 1664:   // Reset the Metric if it exists
Add period.


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1686
PS10, Line 1686:   // Issue a number of writes to the disks
Add period.


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1706
PS10, Line 1706:   // Check the count and max/min of the histogram metric
Add period.


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1712
PS10, Line 1712: // The count should be added by num_ranges/num_disks per disk
Add period.


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1714
PS10, Line 1714: // Check if the min and max of write size are the same as the written len
Add period.


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1732
PS10, Line 1732: // Issue a writing operation to a non-existent tmp file path
Add periods


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1737
PS10, Line 1737:   string tmp_file = "/tmp/disk_io_mgr_test/MetricsOfWriteIoError";
Another test uses "/non-existent/file.txt" to indicate a non-existing file, 
which makes it clearer what is happening.


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1739
PS10, Line 1739:   // Reset the Metric if it exists
Add period.


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1746
PS10, Line 1746:   // Remove the path in case it exists
Add period.


http://gerrit.cloudera.org:8080/#/c/16083/10/be/src/runtime/io/disk-io-mgr-test.cc@1767
PS10, Line 1767:   // One IO Error should be added to the metrics counter
Add period.


http://gerrit.cloudera.org:8080/#/c/16083/10/common/thrift/metrics.json
File common/thrift/metrics.json:

http://gerrit.cloudera.org:8080/#/c/16083/10/common/thrift/metrics.json@663
PS10, Line 663: "description": "The number of write io error on disk.",
Should be "errors".



--
To view, visit http://gerrit.cloudera.org:8080/16083
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
Gerrit-Change-Number: 16083
Gerrit-PatchSet: 10
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 25 Jun 2020 17:36:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9294: Support DATE for min-max runtime filter

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16103 )

Change subject: IMPALA-9294: Support DATE for min-max runtime filter
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6420/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c
Gerrit-Change-Number: 16103
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 25 Jun 2020 17:34:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9790: option to use resolved hostname everywhere

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16108 )

Change subject: IMPALA-9790: option to use resolved hostname everywhere
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6419/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16108
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d5cb9c68c60ce8dc838cde9dcf1c590017f5c9a
Gerrit-Change-Number: 16108
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Comment-Date: Thu, 25 Jun 2020 17:17:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9294: Support DATE for min-max runtime filter

2020-06-25 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/16103 )

Change subject: IMPALA-9294: Support DATE for min-max runtime filter
..

IMPALA-9294: Support DATE for min-max runtime filter

Implemented a Date min-max filter and applied it to Kudu like the other
min-max runtime filters.
Added new test cases for Date min-max filters.

Testing:
Passed all core tests.

Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/runtime/date-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/protobuf/common.proto
M common/thrift/Data.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test
12 files changed, 274 insertions(+), 164 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/16103/2
--
To view, visit http://gerrit.cloudera.org:8080/16103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c
Gerrit-Change-Number: 16103
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

2020-06-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
..


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16091/2/be/src/runtime/tmp-file-mgr.cc
File be/src/runtime/tmp-file-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16091/2/be/src/runtime/tmp-file-mgr.cc@540
PS2, Line 540: for (int index = start; index <= end; ++index) {
> Maybe print the values in the DCHECK error
Checked that this was fixed



--
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 3
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 17:04:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6049/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 4
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 17:04:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9697: Support priority based scratch directory selection

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16091 )

Change subject: IMPALA-9697: Support priority based scratch directory selection
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I381c3a358e1382e6696325fec74667f1fa18dd17
Gerrit-Change-Number: 16091
Gerrit-PatchSet: 4
Gerrit-Owner: Abhishek Rawat 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 17:04:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2658: Extend the NDV function to accept a precision

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15997 )

Change subject: IMPALA-2658: Extend the NDV function to accept a precision
..


Patch Set 41:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6048/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/15997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
Gerrit-Change-Number: 15997
Gerrit-PatchSet: 41
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Thu, 25 Jun 2020 16:56:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2658: Extend the NDV function to accept a precision

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15997 )

Change subject: IMPALA-2658: Extend the NDV function to accept a precision
..


Patch Set 41: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
Gerrit-Change-Number: 15997
Gerrit-PatchSet: 41
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Thu, 25 Jun 2020 16:56:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2658: Extend the NDV function to accept a precision

2020-06-25 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15997 )

Change subject: IMPALA-2658: Extend the NDV function to accept a precision
..


Patch Set 40: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
Gerrit-Change-Number: 15997
Gerrit-PatchSet: 40
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Thu, 25 Jun 2020 16:56:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7020: fix costing of non-trivial CAST expressions

2020-06-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16073 )

Change subject: IMPALA-7020: fix costing of non-trivial CAST expressions
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16073/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16073/4//COMMIT_MSG@11
PS4, Line 11: expresions
spelling



--
To view, visit http://gerrit.cloudera.org:8080/16073
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3f1a16fc45191a2eedf38cc243c70173d44806c6
Gerrit-Change-Number: 16073
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 16:52:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9515: Full ACID Milestone 3: Read support for "original files"

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16001 )

Change subject: IMPALA-9515: Full ACID Milestone 3: Read support for "original files"
..


Patch Set 11:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6047/


--
To view, visit http://gerrit.cloudera.org:8080/16001
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953
Gerrit-Change-Number: 16001
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 25 Jun 2020 16:51:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9294: Support DATE for min-max runtime filter

2020-06-25 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16103 )

Change subject: IMPALA-9294: Support DATE for min-max runtime filter
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16103/1/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/16103/1/be/src/util/min-max-filter.h@250
PS1, Line 250: class DateMinMaxFilter : public MinMaxFilter {
> Oh, I actually meant macro, which is more consistent with the rest of this
Will define macros for timestamp and date as suggested.



--
To view, visit http://gerrit.cloudera.org:8080/16103
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2f6e2dc6949735d5f0fcf317361cc2969a5e82c
Gerrit-Change-Number: 16103
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 25 Jun 2020 16:51:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9790: option to use resolved hostname everywhere

2020-06-25 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16108 )


Change subject: IMPALA-9790: option to use resolved hostname everywhere
..

IMPALA-9790: option to use resolved hostname everywhere

This adds a flag --use_resolved_hostname, which replaces
--hostname with a resolved IP on startup. This is useful
for containerized environments where the hostname -> IP
mapping can be very dynamic.

This flag is used by default in the dockerized minicluster.

This also fixes a bug in the test code that incorrectly
identified command line flags. Specifically it only checked
the suffix, so it confused use_resolved_hostname and hostname.
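
The flag-matching bug described above can be illustrated with a small sketch. The helper names and the argument list are hypothetical and are not taken from tests/common/impala_cluster.py; the point is only that a suffix check treats --use_resolved_hostname as if it were --hostname, while comparing the whole flag name does not:

```python
def has_flag_suffix_match(args, flag):
    """Buggy check: any argument whose name merely ends with `flag` matches."""
    return any(arg.split("=", 1)[0].endswith(flag) for arg in args)


def has_flag_exact(args, flag):
    """Stricter check: strip the leading dashes and compare the whole name."""
    return any(arg.split("=", 1)[0].lstrip("-") == flag for arg in args)


# Hypothetical command line that sets use_resolved_hostname but not hostname.
args = ["--use_resolved_hostname=true", "--log_dir=/tmp/logs"]

suffix_result = has_flag_suffix_match(args, "hostname")  # wrongly reports a match
exact_result = has_flag_exact(args, "hostname")          # correctly reports no match
```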

Change-Id: I0d5cb9c68c60ce8dc838cde9dcf1c590017f5c9a
---
M be/src/common/global-flags.cc
M be/src/common/init.cc
M docker/catalogd/Dockerfile
M docker/impalad_coord_exec/Dockerfile
M docker/impalad_coordinator/Dockerfile
M docker/impalad_executor/Dockerfile
M docker/statestored/Dockerfile
M tests/common/impala_cluster.py
8 files changed, 23 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/16108/2
--
To view, visit http://gerrit.cloudera.org:8080/16108
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0d5cb9c68c60ce8dc838cde9dcf1c590017f5c9a
Gerrit-Change-Number: 16108
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-25 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 12:

(3 comments)

Thank you Csaba for your feedback!
I have a couple of follow-up questions.

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@89
PS12, Line 89: cardinality
> What is the default value of this? Can it be -1 (unknown)? The result seems
OK, in that case we should just abandon the estimate when the cardinality is -1.


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90
PS12, Line 90: GetRowSize
> I think that this doesn't contain varlen data, so it can greatly underestim
So what is the nature of a varlen column? Can each row have a different 
size, with large variations?
And what does GetRowSize() return in that case? I am wondering whether we should 
abandon the estimate entirely for input rows that have varlen data.


http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py@74
PS12, Line 74: """The first sort run is given a privilege to ignore sort_run_bytes_limit, except
 :when estimate hints that spill is inevitable. The lower sort_run_bytes_limit of
 :a query is, the more sort runs are likely to be produced.
 :Case 1 : 1 run produced, because all rows fit within the maximum reservation.
 : sort_run_bytes_limit is not enforced.
 :Case 2 : 3 run produced, because the first run hit reservation limit, and the
 : next 2 runs are capped to 150m.
 :Case 3 : 4 run produced, because sort node estimate that spill is inevitable.
 : So all runs are capped to 130m, including the first one."""
> Isn't there something in query_result.runtime_profile that could be used to
I will look at that 'query_result.runtime_profile'.
Otherwise, I will change this test to run_test_case and verify the profile via 
regex.



--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 16:15:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-06-25 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 12:

(13 comments)

Sorry for the many grammar comments, I was also the victim of this in the past 
:)

My only real concern is about the case when the cardinality is unknown. My 
preference would be to try to allow spilling in that case.

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.h
File be/src/exec/sort-node.h:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.h@77
PS12, Line 77: going
nit: "will go"?


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc
File be/src/exec/sort-node.cc:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@89
PS12, Line 89: cardinality
What is the default value of this? Can it be -1 (unknown)? The result seems 
pretty wrong in that case.


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@90
PS12, Line 90: GetRowSize
I think that this doesn't contain varlen data, so it can greatly underestimate 
the input size if there are strings.


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/exec/sort-node.cc@92
PS12, Line 92: 2)
I think that VLOG(3) is enough.


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h
File be/src/runtime/sorter.h:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@101
PS12, Line 101: a
"the" would be better


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@101
PS12, Line 101:   /// 'estimated_input_size' is a total rows in bytes that estimated to get added into
nit: missing "are"


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@102
PS12, Line 102:   /// this sorter. This is used to decide if sorter need to proactively spilling for
nit: needs


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@102
PS12, Line 102: spilling
nit: spill


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.h@223
PS12, Line 223: run
nit: "do an"?


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc@816
PS12, Line 816:   VLOG(2) << Substitute(
I think that VLOG(3) is enough here - this should happen if the cardinality 
estimation was wrong, which may make WARNING logical, but this seems 
unavoidable for many queries, so I wouldn't spam the warning log.


http://gerrit.cloudera.org:8080/#/c/15963/12/be/src/runtime/sorter.cc@907
PS12, Line 907: VLOG(2) << Substitute(
Same as line 816.


http://gerrit.cloudera.org:8080/#/c/15963/12/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/15963/12/common/thrift/ImpalaInternalService.thrift@645
PS12, Line 645: backeds
typo: backends


http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py
File tests/query_test/test_sort.py:

http://gerrit.cloudera.org:8080/#/c/15963/12/tests/query_test/test_sort.py@74
PS12, Line 74: """The first sort run is given a privilege to ignore sort_run_bytes_limit, except
 :when estimate hints that spill is inevitable. The lower sort_run_bytes_limit of
 :a query is, the more sort runs are likely to be produced.
 :Case 1 : 1 run produced, because all rows fit within the maximum reservation.
 : sort_run_bytes_limit is not enforced.
 :Case 2 : 3 run produced, because the first run hit reservation limit, and the
 : next 2 runs are capped to 150m.
 :Case 3 : 4 run produced, because sort node estimate that spill is inevitable.
 : So all runs are capped to 130m, including the first one."""
Isn't there something in query_result.runtime_profile that could be used to 
check some of these statements? E.g. I think we can check that no spilling 
occurred for case 1, but it did occur for cases 2 and 3.
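
One way such a check could look, as a minimal sketch only: it assumes the runtime profile is available as text and that the sort node reports a counter named `SpilledRuns`. Both the counter name and the sample profile snippets below are assumptions for illustration and would need to be confirmed against a real profile.

```python
import re


def counter_values(profile_text, counter_name):
    """Return every integer value reported for counter_name in the profile text.

    Only plain integer values are handled; values printed with unit
    suffixes would need extra parsing.
    """
    pattern = re.compile(r"\b" + re.escape(counter_name) + r":\s*(\d+)")
    return [int(m.group(1)) for m in pattern.finditer(profile_text)]


def spilled(profile_text, counter_name="SpilledRuns"):
    """True if any occurrence of the (assumed) spill counter is non-zero."""
    return any(value > 0 for value in counter_values(profile_text, counter_name))


# Hypothetical profile fragments, written by hand for this sketch.
sample_no_spill = "SORT_NODE (id=1)\n   - SpilledRuns: 0 (0)\n"
sample_spill = "SORT_NODE (id=1)\n   - SpilledRuns: 3 (3)\n"
```

In the test, case 1 would then assert `not spilled(profile)` and cases 2 and 3 would assert `spilled(profile)`, with `profile` taken from query_result.runtime_profile.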



--
To view, visit http://gerrit.cloudera.org:8080/15963

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 12
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Jun 2020 15:00:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16083 )

Change subject: IMPALA-9829: Add Write Metrics for Spilling
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6418/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16083

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
Gerrit-Change-Number: 16083
Gerrit-PatchSet: 10
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 25 Jun 2020 14:59:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling

2020-06-25 Thread Yida Wu (Code Review)
Yida Wu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16083 )

Change subject: IMPALA-9829: Add Write Metrics for Spilling
..


Patch Set 10:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/runtime/io/disk-io-mgr-test.cc
File be/src/runtime/io/disk-io-mgr-test.cc:

http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/runtime/io/disk-io-mgr-test.cc@1665
PS8, Line 1665:   for (int i = 0; i < num_disks; i++) {
> Missing spaces:
I have set up clang-format-diff, so the style problems should be fixed now.


http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/runtime/io/disk-io-mgr-test.cc@1666
PS8, Line 1666: string key_prefix = "impala-server.io-mgr.queue-";
> The space at the end before the semicolon can probably be removed. Same app
Done


http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/runtime/io/disk-io-mgr-test.cc@1673
PS8, Line 1673: auto write_latency_org =
> Use proper spacing like follows:
Done


http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/runtime/io/disk-io-mgr-test.cc@1677
PS8, Line 1677: if (write_latency_org != nullptr) write_latency_org->Reset();
> Use proper spacing like follows:
Done


http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/runtime/io/disk-io-mgr-test.cc@1699
PS8, Line 1699: ASSERT_OK(writer->AddWriteRange(*new_range));
> Note the disk id is always 0 because of `num_ranges % num_disks`. Was the i
Done


http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/runtime/io/disk-io-mgr-test.cc@1711
PS8, Line 1711: uint64_t max_value = metric->MaxValue();
> Suggest removing the space before semicolon.
Done


http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/runtime/io/disk-io-mgr-test.cc@1721
PS8, Line 1721:   for (int i = 0; i < num_disks; i++) {
> Don't think you need the if else block here. The code right now is only usi
Done


http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/runtime/io/disk-io-mgr-test.cc@1735
PS8, Line 1735:   InitRootReservation(LARGE_RESERVATION_LIMIT);
> Missing space after 'for'
Done


http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/runtime/io/disk-io-mgr-test.cc@1783
PS8, Line 1783:
> The space before closing bracket could be removed.
Done


http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/util/histogram-metric.h
File be/src/util/histogram-metric.h:

http://gerrit.cloudera.org:8080/#/c/16083/8/be/src/util/histogram-metric.h@49
PS8, Line 49:   uint64_t MinValue() const { return histogram_->MinValue(); }
> The preferred style is no space before the semicolon, but space between sem
Done



--
To view, visit http://gerrit.cloudera.org:8080/16083

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
Gerrit-Change-Number: 16083
Gerrit-PatchSet: 10
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 25 Jun 2020 14:32:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9829: Add Write Metrics for Spilling

2020-06-25 Thread Yida Wu (Code Review)
Yida Wu has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/16083 )

Change subject: IMPALA-9829: Add Write Metrics for Spilling
..

IMPALA-9829: Add Write Metrics for Spilling

Three types of metrics are added in disk-io-mgr:
1. impala-server.io-mgr.queue-$0.write-latency, unit: ns,
kind: HistogramMetric
2. impala-server.io-mgr.queue-$0.write-size, unit: Bytes,
kind: HistogramMetric
3. impala-server.io-mgr.queue-$0.write-io-error, kind: IntCounter

Write size, latency and io errors will be recorded in
impala::io::DiskIoMgr::Write.
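
The metric keys listed above can be illustrated with a small sketch. It assumes that `$0` is a positional placeholder filled with a per-queue identifier; the helper names and the identifier values ("0", "1") are hypothetical and are not taken from the patch.

```python
# Metric key templates quoted from the commit message above.
WRITE_METRIC_TEMPLATES = {
    "write-latency": "impala-server.io-mgr.queue-$0.write-latency",    # HistogramMetric, ns
    "write-size": "impala-server.io-mgr.queue-$0.write-size",          # HistogramMetric, bytes
    "write-io-error": "impala-server.io-mgr.queue-$0.write-io-error",  # IntCounter
}


def metric_key(kind, queue_id):
    """Fill the $0 placeholder with a queue identifier (hypothetical helper)."""
    return WRITE_METRIC_TEMPLATES[kind].replace("$0", queue_id)


def all_metric_keys(queue_ids):
    """Expand every write metric template for every queue identifier."""
    return [metric_key(kind, q) for q in queue_ids for kind in WRITE_METRIC_TEMPLATES]


if __name__ == "__main__":
    for key in all_metric_keys(["0", "1"]):
        print(key)
```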

Testing:
 * added DiskIoMgrTest.MetricsOfWriteSizeAndLatency
 * added DiskIoMgrTest.MetricsOfWriteIoError
Ran unit test disk-io-mgr-test and pre-commit test

Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
---
M be/src/runtime/io/disk-io-mgr-internal.h
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/util/histogram-metric.h
M common/thrift/metrics.json
6 files changed, 258 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/16083/10
--
To view, visit http://gerrit.cloudera.org:8080/16083

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I152b9c5339cedabe33f8873a2bbf651aa5dbb914
Gerrit-Change-Number: 16083
Gerrit-PatchSet: 10
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Yida Wu 


[Impala-ASF-CR] IMPALA-2658: Extend the NDV function to accept a precision

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15997 )

Change subject: IMPALA-2658: Extend the NDV function to accept a precision
..


Patch Set 40:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6417/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15997

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
Gerrit-Change-Number: 15997
Gerrit-PatchSet: 40
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Thu, 25 Jun 2020 14:28:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2658: Extend the NDV function to accept a precision

2020-06-25 Thread Qifan Chen (Code Review)
Qifan Chen has uploaded a new patch set (#40). ( 
http://gerrit.cloudera.org:8080/15997 )

Change subject: IMPALA-2658: Extend the NDV function to accept a precision
..

IMPALA-2658: Extend the NDV function to accept a precision

This work addresses the current limitation in the NDV function by
extending the function to optionally take a secondary argument
called scale.

   NDV([DISTINCT | ALL] expression [, scale])

Without the secondary argument, all the syntax and semantics are
preserved. The precision, which determines the total number
of different estimators in the HLL algorithm, is still 10.

When supplied, the scale argument must be an integer literal
in the range from 1 to 10. Its value is internally mapped
to a precision used by the HLL algorithm, with the following
mapping formula:

  precision = scale + 8.

Thus, a scale of 1 is mapped to a precision of 9 and a scale of
10 is mapped to a precision of 18.

A larger precision value generally produces a better estimate
(i.e. with less error) than a smaller one, because more estimators
are involved. The cost is the extra memory needed: for a given
precision p, the amount of memory used by the HLL algorithm is on
the order of 2^p bytes.
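
The mapping and the memory estimate above can be sketched as follows. This is an illustration only, not the patch's code: the function names are hypothetical, and the memory figure simply evaluates 2^p, which assumes roughly one byte per estimator.

```python
def scale_to_precision(scale):
    """Map an NDV scale argument (1..10) to an HLL precision: precision = scale + 8."""
    if not 1 <= scale <= 10:
        raise ValueError("scale must be an integer in the range 1 to 10")
    return scale + 8


def hll_memory_bytes(precision):
    """Order-of-magnitude memory use, 2^p bytes (assumes ~1 byte per estimator)."""
    return 1 << precision


if __name__ == "__main__":
    for scale in (1, 2, 10):
        p = scale_to_precision(scale)
        print(f"scale={scale} precision={p} memory={hll_memory_bytes(p)} bytes")
```

Under this mapping a scale of 2 yields a precision of 10, the same precision that is used when the scale argument is omitted, and a scale of 10 yields 2^18 = 262144 bytes.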

Testing:
1. Ran unit tests against table store_sales in TPC-DS and table customer
   in TPCH in both serial and parallel plan settings;
2. Added and ran a new regression test (test_ndv) in
   TestAggregationQueries section to compute NDV() for every supported
   Impala data type over all valid scale values;
3. Ran "core" tests.

Performance:
1. Ran estimation error tests against a total of 22 distinct data sets
   loaded into external Impala tables.

   The error was computed as
   abs(<estimated NDV> - <actual NDV>) / <actual NDV>.

   Overall, the precision of 18 (or the scale value of 10) gave
   the best result, with a worst-case estimation error of 0.42% (for
   one set of 10 million integers) and an average error of no more
   than 0.17%, at the cost of 256 KB of memory for the internal data
   structure per evaluation of the HLL algorithm.  Other precisions
   (such as 16 and 17) were also very reasonable but with slightly
   larger estimation errors.

2. Ran execution time tests against a total of 6 distinct data files
   on a single node EC2 VM in debug mode. These data files were loaded
   in turn into a single column in an external Impala table.  The
   total execution time was roughly the same across different scales
   for a given table configuration. The execution time for tables
   involving multiple data files across multiple nodes remains to be
   measured.

3. Ran execution time tests comparing the before- and
   after-enhancement versions of NDV().
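
The estimation error used in item 1 can be illustrated with a short sketch, assuming it is the usual relative error, i.e. the absolute difference between the estimated and the actual distinct count divided by the actual count. The function names and the sample numbers are hypothetical and are not measurements from these tests.

```python
def relative_error(estimate, actual):
    """Relative estimation error: abs(estimate - actual) / actual."""
    if actual == 0:
        raise ValueError("actual distinct count must be non-zero")
    return abs(estimate - actual) / actual


def as_percent(error):
    """Format a relative error as a percentage with two decimals."""
    return f"{error * 100:.2f}%"


if __name__ == "__main__":
    # Hypothetical values: an estimate of 990 for 1000 actual distinct values.
    print(as_percent(relative_error(990, 1000)))
```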

Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
---
M be/src/common/logging.h
M be/src/exec/incr-stats-util-test.cc
M be/src/exec/incr-stats-util.cc
M be/src/exec/incr-stats-util.h
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M tests/query_test/test_aggregation.py
9 files changed, 426 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/15997/40
--
To view, visit http://gerrit.cloudera.org:8080/15997

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I48a4517bd0959f7021143073d37505a46c551a58
Gerrit-Change-Number: 15997
Gerrit-PatchSet: 40
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 


[Impala-ASF-CR] IMPALA-9515: Full ACID Milestone 3: Read support for "original files"

2020-06-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16001 )

Change subject: IMPALA-9515: Full ACID Milestone 3: Read support for "original files"
..


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6047/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16001

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I176497ef9873ed7589bd3dee07d048a42dfad953
Gerrit-Change-Number: 16001
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 25 Jun 2020 11:45:05 +
Gerrit-HasComments: No