[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

2022-06-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), 
COVAR_SAMP()  and COVAR_POP()
..

IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Reviewed-on: http://gerrit.cloudera.org:8080/18413
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,074 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 30
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

2022-06-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 29: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 29
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sun, 19 Jun 2022 04:35:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

2022-06-18 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 28: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 28
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sat, 18 Jun 2022 23:49:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

2022-06-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 29:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8246/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 29
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sat, 18 Jun 2022 23:50:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

2022-06-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 29: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 29
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sat, 18 Jun 2022 23:50:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

2022-06-18 Thread Code Review
Gergely Fürnstáhl has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 28: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 28
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sat, 18 Jun 2022 17:22:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

2022-06-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 28:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10793/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 28
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 17 Jun 2022 08:44:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

2022-06-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 27:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10792/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 27
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 17 Jun 2022 08:40:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

2022-06-17 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 28:

(4 comments)

> Patch Set 25:
>
> (1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/26//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18413/26//COMMIT_MSG@7
PS26, Line 7: :
> nit: remove the space before ":"
Done


http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@300
PS25, Line 300: // Correlation coefficient formula used:
  : // r = covar / (√(xvar * yvar)
  : struct CorrState {
  :   int64_t count; // number of elements
  :   double xavg; // average of x elements
  :   double yavg; // average of y elements
  :   double xvar; // n times the variance of x elements
  :
> As we discussed in the call, removing the code duplication brings up more u
Sure, I've added that.


http://gerrit.cloudera.org:8080/#/c/18413/26/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/26/be/src/exprs/aggregate-functions-ir.cc@343
PS26, Line 343:   if (state->count <= 1) {
  : mems
> nit: put "}" and "else {" at the same line
Done


http://gerrit.cloudera.org:8080/#/c/18413/26/be/src/exprs/aggregate-functions-ir.cc@513
PS26, Line 513: memset(state, 0, sizeof(CovarState));
  :   } else
> nit: put "}" and "else {" at the same line
Done



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 28
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 17 Jun 2022 08:25:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

2022-06-17 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#28). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), 
COVAR_SAMP()  and COVAR_POP()
..

IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,074 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/28
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 28
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()

2022-06-17 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#27). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions: CORR(), 
COVAR_SAMP()  and COVAR_POP()
..

IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,074 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/27
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 27
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-17 Thread Code Review
Gergely Fürnstáhl has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 25:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@300
PS25, Line 300: struct CorrState {
  :   int64_t count; // number of elements
  :   double xavg; // average of x elements
  :   double yavg; // average of y elements
  :   double xvar; // n times the variance of x elements
  :   double yvar; // n times the variance of y elements
  :   double covar; // n times the covariance
  : };
> In Patchset12, you used CorrState for Covar calculations as well, which I a
As we discussed in the call, removing the code duplication brings up more 
unnecessary complications in the code and its cleaner to keep them separate, as 
it is. Maybe you can add a cross reference in the comment to the structs, e.g. 
"If you change Corr functions you probably have to change Covar functions too 
and vice versa.



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 25
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 17 Jun 2022 07:17:24 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-16 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 26:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/18413/26//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18413/26//COMMIT_MSG@7
PS26, Line 7:
nit: remove the space before ":"


http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@324
PS25, Line 324: if (state->count > 1
> Yeah, you're right, Quanlong and I had a discussion on this, we decided to
We can improve this later if we do find it impacts the performance. I think 
most of the group sizes are large in practise. So the cpu prediction 
correctness rate won't be so bad. Maybe we can add a LIKELY() marcro here.


http://gerrit.cloudera.org:8080/#/c/18413/26/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/26/be/src/exprs/aggregate-functions-ir.cc@343
PS26, Line 343:   }
  :   else {
nit: put "}" and "else {" at the same line


http://gerrit.cloudera.org:8080/#/c/18413/26/be/src/exprs/aggregate-functions-ir.cc@513
PS26, Line 513:   }
  :   else {
nit: put "}" and "else {" at the same line



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 26
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 17 Jun 2022 03:11:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 26:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10775/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 26
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 15 Jun 2022 15:47:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-15 Thread Code Review
Gergely Fürnstáhl has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 25:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@300
PS25, Line 300: struct CorrState {
  :   int64_t count; // number of elements
  :   double xavg; // average of x elements
  :   double yavg; // average of y elements
  :   double xvar; // n times the variance of x elements
  :   double yvar; // n times the variance of y elements
  :   double covar; // n times the covariance
  : };
> Yeah, that's right, we did something similar in our 12th patch where we did
In Patchset12, you used CorrState for Covar calculations as well, which I agree 
is not great, wasteful and confusing

My proposal is to still have 2 separate structs, but CorrState could utilize 
CovarState.

Some sketch, you can use composition instead of inheritance as well.

struct CovarState {
  int64_t count; // number of elements
  double xavg; // average of x elements
  double yavg; // average of y elements
  double covar; // n times the covariance
  void update(double x, double y) {...}
  void remove(double x, double y) {...}
  void merge(CovarState& other) {...}
};

struct CorrState : public CovarState {\
  double xvar; // n times the variance of x elements
  double yvar; // n times the variance of y elements
  void update(double x, double y) {
double prevXavg = xavg;
double prevYavg = yavg;
CovarState::update(x,y);
xvar = ...
yvar = ...
  }
  ...
};



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 25
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 15 Jun 2022 15:42:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-15 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 26:

(3 comments)

> Patch Set 25: Code-Review+1
>
> (3 comments)
>
> Math seems good to me

http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@300
PS25, Line 300: struct CorrState {
  :   int64_t count; // number of elements
  :   double xavg; // average of x elements
  :   double yavg; // average of y elements
  :   double xvar; // n times the variance of x elements
  :   double yvar; // n times the variance of y elements
  :   double covar; // n times the covariance
  : };
> As I understand, both CorrState and CovarState calculates xavg,yavg and cov
Yeah, that's right, we did something similar in our 12th patch where we didn't 
create a new state for covar_samp() and covar_pop() but two fields were not 
getting used in calculating covar_samp() and covar_pop() and it was causing a 
lot of unnecessary computation so we separated it into two different states.


http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@324
PS25, Line 324: if (state->count > 1
> nit: do we need this? for first items state->xavg = deltaX = x, so we will
Yeah, you're right, Quanlong and I had a discussion on this, we decided to add 
this check as it saves some floating point computation. More computation would 
means more floating point rounding error.


http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@331
PS25, Line 331: // vx_n = vx_(n - 1) + (x_n - mx_(n - 1)) * (x_n - mx_n)
> nit: the order is a bit misleading as deltax=(x_n - mx_(n - 1))
Yeah, I just changed it.



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 26
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 15 Jun 2022 15:28:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-15 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#26). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,074 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/26
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 26
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-15 Thread Code Review
Gergely Fürnstáhl has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 25: Code-Review+1

(3 comments)

Math seems good to me

http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@300
PS25, Line 300: struct CorrState {
  :   int64_t count; // number of elements
  :   double xavg; // average of x elements
  :   double yavg; // average of y elements
  :   double xvar; // n times the variance of x elements
  :   double yvar; // n times the variance of y elements
  :   double covar; // n times the covariance
  : };
As I understand, both CorrState and CovarState calculates xavg,yavg and covar 
the same way. I think we could restructure this a bit to spare code duplication.

E.g.:
CovarState::update, remove and merge could encapsulate the logic regarding the 
common members, and CorrState could have a CovarState member, and call its 
functions meanwhile calculating xvar and yvar too


http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@324
PS25, Line 324: if (state->count > 1
nit: do we need this? for first items state->xavg = deltaX = x, so we will add 
0 to state->covar, does not change anything. This way we pay a branching for 
every count unnecessarily


http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@331
PS25, Line 331: // vx_n = vx_(n - 1) + (x_n - mx_n) * (x_n - mx_(n - 1))
nit: the order is a bit misleading as deltax=(x_n - mx_(n - 1))



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 25
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Gergely Fürnstáhl 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 15 Jun 2022 14:57:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 25: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8230/


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 25
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 14 Jun 2022 07:49:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 25:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8230/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 25
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 14 Jun 2022 06:33:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 25: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 25
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 14 Jun 2022 06:33:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-14 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 24: Code-Review+2

> Patch Set 24:
>
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8229/

The failure is unrelated to this patch
https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5845
https://issues.apache.org/jira/browse/IMPALA-10125


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 14 Jun 2022 06:22:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 24:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8229/


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 14 Jun 2022 04:41:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-13 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 24: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 14 Jun 2022 00:39:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 24:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8229/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 14 Jun 2022 00:14:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-13 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 24:

recheck


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 13 Jun 2022 16:39:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-12 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 24: Code-Review+1

The test failure is due to a flaky test (IMPALA-1995) which is unrelated to 
this patch.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 13 Jun 2022 01:32:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 24: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8220/


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sun, 12 Jun 2022 18:11:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 24:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8220/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sun, 12 Jun 2022 13:49:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 24:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10750/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 10 Jun 2022 12:37:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 22:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8212/


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 10 Jun 2022 12:35:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-10 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#24). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,074 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/24
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 24
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-10 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 23: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 23
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 10 Jun 2022 11:07:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 23:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10749/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 23
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 10 Jun 2022 10:12:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-10 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 23:

(4 comments)

> Patch Set 22: Code-Review+1
>
> (4 comments)

http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc@341
PS22, Line 341: if (state->count <= 1) {
> nit: this is part of a multi-line if-else statement so our code style still
Done


http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc@452
PS22, Line 452:   DCHECK(state->xvar >= -1E-8);
  :   DCHECK(state->yvar >= -1E-8
> nit: add brackets {} for multi-line if-statement
Done


http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc@507
PS22, Line 507:   state->xavg += (x - state->xavg) / state->count;
> nit: this is part of a multi-line if-else statement so our code style still
Done


http://gerrit.cloudera.org:8080/#/c/18413/22/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/18413/22/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1324
PS22, Line 1324: List
> nit: use the same idention as the above metionds.
Done



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 23
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 10 Jun 2022 09:50:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-10 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#23). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,074 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/23
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 23
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 22:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8212/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 10 Jun 2022 08:06:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-10 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 22: Code-Review+1

(4 comments)

http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc@341
PS22, Line 341: if (state->count <= 1) memset(state, 0, sizeof(CorrState));
nit: this is part of a multi-line if-else statement so our code style still 
prefer adding brackets {} here.


http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc@452
PS22, Line 452:   if (state->count == 0 || state->count == 1 || state->xvar <= 
0.0 || state->yvar <= 0.0)
  : return DoubleVal::null();
nit: add brackets {} for multi-line if-statement


http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc@507
PS22, Line 507:   if (state->count <= 1) memset(state, 0, sizeof(CovarState));
nit: this is part of a multi-line if-else statement so our code style still 
prefer adding brackets {} here.


http://gerrit.cloudera.org:8080/#/c/18413/22/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/18413/22/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1324
PS22, Line 1324: 
nit: use the same idention as the above metionds.



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 10 Jun 2022 08:05:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 22: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8205/


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 08 Jun 2022 23:20:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 22:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10734/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 08 Jun 2022 18:44:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 22:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8205/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 08 Jun 2022 18:43:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-08 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 22:

(21 comments)

> Patch Set 20:
>
> (19 comments)

http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@301
PS20, Line 301:   int64_t count; // number of elements
> Change all occurrences of long to int64_t
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@341
PS20, Line 341: )
> nit: remove the space before ")"
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@342
PS20, Line 342:   else {
  : --state->count;
  : // mx_(n - 1) = mx_n - (x_n - mx_n) / (n - 1)
  : state->xavg -= deltaX / state->count;
  : // my_(n - 1) = my_n - (y_n - my_n) / (n - 1)
  : state->yavg -= delt
> nit: these can be simplified by memset(), just like what we do at line 313.
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@348
PS20, Line 348: // c_(n - 1) = c_n - (x_n - mx_n) * (y_n - my_(n -1))
  : st
> nit: put "}" and "else" at the same line, i.e. "} else {"
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@355
PS20, Line 355:
> This comment seems incorrect. 'deltaX' is actually (x_n - mx_n). 'y - state
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@375
PS20, Line 375:
> This is not clear to me. Do you mean 'count'?
In functions which don't have finalize() function, we need to check explicitly 
if num_removes() >= num_updates(), but here since we have finalize() function, 
we don't need to explicitly check the number of update() and remove() calls.


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@378
PS20, Line 378: void AggregateFunctions::TimestampCorrUpdate(FunctionContext* 
ctx,
> Change all occurrences of NULL to nullptr
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@393
PS20, Line 393:   // Remove doesn't need to explicitly
> nit: please add brackets {} for multi-line if-statement.
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@398
PS20, Line 398: lue::FromTimestampVal(src1);
> Could you explain more? Why do we need to check the number of Update/Remove
In functions which don't have finalize() function, we need to check explicitly 
if num_removes() >= num_updates(), but here since we have finalize() function, 
we don't need to explicitly check the number of update() and remove() calls.


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@407
PS20, Line 407: void AggregateFunctions::CorrMerge(Func
> nit: please add brackets {} for multi-line if-statement.
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@451
PS20, Line 451: e->yvar >= -1E-8);
> We should check whether 'state->xvar * state->yvar' become negative before
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@452
PS20, Line 452:   if (state->count == 0 || state->count == 1 || state->xvar <= 
0.0 || state->yvar <= 0.0)
  : return DoubleVal::null();
  :   double r = sqrt(state->xvar * state->yvar);
> move this to the begining of this method, and split it into two DCHECKs so
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@467
PS20, Line 467:   ctx->Free(src.ptr);
  :   return r;
> This is a use-after-free pattern. We should not free the resource if we sti
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@507
PS20, Line 507: )
> nit: remove the space before ")".
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@508
PS20, Line 508:   else {
  : --state->count;
  : // my_(n - 1) = my_n - (y_n - my_n) / (n - 1)
  : state->yavg -= (y -
> nit: use memset instead.
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@512
PS20, Line 512: // c_(n - 1) = c_n - (x_n - mx_(n - 1)) * (y_n - my_n)
  : stat
> nit: put them into one line, i.e. "} else {"
Done


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@535
PS20, Line 535:
> Need explanation here too
In functions which don't have finalize() function, we need to check explicitly 
if num_removes() >= num_updates(), but here since we have 

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-08 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#22). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,068 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/22
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 22
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 21: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 08 Jun 2022 11:02:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 21:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8200/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 21
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 08 Jun 2022 06:32:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-07 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 20:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@341
PS20, Line 341:
nit: remove the space before ")"


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@342
PS20, Line 342: state->count = 0;
  : state->xavg = 0.0;
  : state->yavg = 0.0;
  : state->xvar = 0.0;
  : state->yvar = 0.0;
  : state->covar = 0.0;
nit: these can be simplified by memset(), just like what we do at line 313.

 memset(state, 0, sizeof(CorrState))


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@348
PS20, Line 348:   }
  :   else
nit: put "}" and "else" at the same line, i.e. "} else {"


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@355
PS20, Line 355: (x_n - mx_(n - 1)) * (y_n - my_n)
This comment seems incorrect. 'deltaX' is actually (x_n - mx_n). 'y - 
state->yavg' is (y_n - my_(n-1)).

BTW, please fix the idention of line 355 ~ 360.


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@375
PS20, Line 375: the number of calls to Update() or Remove()
This is not clear to me. Do you mean 'count'?


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@393
PS20, Line 393: CorrUpdateState(val1, val2, state);
nit: please add brackets {} for multi-line if-statement.


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@398
PS20, Line 398: the number of calls to Update() or Remove()
Could you explain more? Why do we need to check the number of Update/Remove 
calls?


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@407
PS20, Line 407: CorrRemoveState(val1, val2, state);
nit: please add brackets {} for multi-line if-statement.


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@451
PS20, Line 451: sqrt((state->xvar * state->yvar))
We should check whether 'state->xvar * state->yvar' become negative before 
calling sqrt(). Please split the if-statement at line 455 into two: One for 
checking counts and vars before this line, and the other one for checking 'r == 
0.0' after this line.

BTW, don't need duplicated parentheses here.


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@452
PS20, Line 452:   // xvar and yvar become negative in certain cases due to 
floating point rounding error.
  :   // Since these values are very small, they can be ignored and 
rounded to 0.
  :   DCHECK(state->xvar >= -1E-8 && state->yvar >= -1E-8);
move this to the begining of this method, and split it into two DCHECKs so we 
know exactly which one breaks if it happens.

 DCHECK(state->xvar >= -1E-8);
 DCHECK(state->yvar >= -1E-8);


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@467
PS20, Line 467:   ctx->Free(src.ptr);
  :   return CorrGetValue(ctx, src);
This is a use-after-free pattern. We should not free the resource if we still 
want to use it. Chang it to

 DoubleVal r = CorrGetValue(ctx, src);
 ctx->Free(src.ptr);
 return r;


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@507
PS20, Line 507:
nit: remove the space before ")".


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@508
PS20, Line 508: state->count = 0;
  : state->xavg = 0.0;
  : state->yavg = 0.0;
  : state->covar = 0.0;
nit: use memset instead.

 memset(state, 0, sizeof(CovarState))


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@512
PS20, Line 512:   }
  :   else {
nit: put them into one line, i.e. "} else {"


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@535
PS20, Line 535: the number of calls to Update() or Remove()
Need explanation here too


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@553
PS20, Line 553: CovarUpdateState(val1, val2, state);
nit: add brackets {} for multi-line if-statement.


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@567
PS20, Line 567: CovarRemoveState(val1, val2, state);
nit: add brackets {} for multi-line if-statement.


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@625
PS20, Line 625:   

[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-07 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 20:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@301
PS20, Line 301:   long count; // number of elements
Change all occurrences of long to int64_t


http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@378
PS20, Line 378:   DCHECK(dst->ptr != NULL);
Change all occurrences of NULL to nullptr



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 07 Jun 2022 13:10:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 20: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8191/


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 07 Jun 2022 09:07:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 20:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8191/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 07 Jun 2022 04:36:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 19: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8189/


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 06 Jun 2022 21:50:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 20:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10712/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 06 Jun 2022 18:51:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-06 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 20:

(1 comment)

> Patch Set 18:
>
> (1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@408
PS18, Line 408: }
> Code is still skipping NaN..
Oh sorry, I missed it, its fixed now.



-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 06 Jun 2022 18:34:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-06 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#20). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,067 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/20
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 20
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-06 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@408
PS18, Line 408: if (isnan(val1) || isnan(val2)) return;
> Skipping NaN here seems questionable? Please add comments or handle appropr
Code is still skipping NaN..



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 06 Jun 2022 18:02:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-06 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 19:

(3 comments)

> Patch Set 19:
>
> (10 comments)
>
> > Patch Set 18:
> >
> > (1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@416
PS17, Line 416:   CorrState* src_state = reinterpret_cast(src.ptr);
  :   DCHECK(dst->ptr != NULL);
  :   DCHECK_EQ(sizeof(CorrState), dst->len);
  :   CorrState* dst_state = reinterpret_cast(dst->ptr);
  :   if (src.ptr != nullptr) {
  : long nA = dst_state->count;
> This was causing impala crash so didn't add this as of now
Added


http://gerrit.cloudera.org:8080/#/c/18413/19/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/19/be/src/exprs/aggregate-functions-ir.cc@324
PS19, Line 324: count
This check is just added as it saves some floating point computation. More 
computation would means more floating point rounding error.


http://gerrit.cloudera.org:8080/#/c/18413/19/be/src/exprs/aggregate-functions-ir.cc@458
PS19, Line 458: =
Please let me know if this tolerance value is suitable



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 06 Jun 2022 17:36:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 19:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10709/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 06 Jun 2022 17:33:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 19:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8189/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 06 Jun 2022 17:26:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-06 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 19:

(10 comments)

> Patch Set 18:
>
> (1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@296
PS18, Line 296: // 
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online
> could you explain more about how the `CorrState` is updated and used to cal
Added a few links


http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@408
PS18, Line 408:   tm_src2.ToSubsecondUnixTime(UTCPTR, )) {
> Skipping NaN here seems questionable? Please add comments or handle appropr
Yeah, its fixed now.


http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@422
PS18, Line 422: long nB = src_state->count;
> Does (nB == 0) need handling?
nB is the count of src->state. nB == 0 has been accounted for in CorrInit() and 
CorrRemove(). We alse make sure that src.ptr is not equal to nullptr.


http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@462
PS18, Line 462:   return DoubleVal(corr);
> Add an assert that negative values are small and comment that this can happ
Sure,added, I've taken the tolerance value to be around -1E-8 as it seemed safe 
to me. We can probably go for smaller values like -1E-10 as well. Please let me 
know what would be a suitable tolerance value.


http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@543
PS18, Line 543:   DCHECK_EQ(sizeof(CovarState), dst->len);
> Skipping Nan?
Updated in the latest patch.


http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@575
PS18, Line 575:   }
> Skipping NaN
Updated in the latest patch.


http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@589
PS18, Line 589:   return;
> handle (nB == 0) ?
nB is the count of src->state. nB == 0 has been accounted for in CorrInit() and 
CorrRemove(). We alse make sure that src.ptr is not equal to nullptr.


http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1526
PS18, Line 1526:  from tpcds.store;
> This may be non-deterministic (flaky) without an order by on the window cla
Yeah, its fixed in the latest patch


http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1544
PS18, Line 1544: select id, double_col, corr(double_col, int_col) over 
(partition by month order by id) from functional.alltypes
> This may be non-deterministic (flaky) without an order by on the window cla
Yeah, its fixed in the latest patch


http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1578
PS18, Line 1578: select s_store_sk, corr(s_number_employees, s_floor_space) 
over (partition by s_city order by s_store_sk
> Check if order by city is strong ordering.
It wasn't strong ordering, I've changed it to s_store_sk which contains unique 
values.



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 06 Jun 2022 17:23:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-06 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#19). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,075 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/19
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 19
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-03 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@462
PS18, Line 462: return DoubleVal::null();
Add an assert that negative values are small and comment that this can happen 
due to floating point rounding error.



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 03 Jun 2022 13:42:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-03 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 18:

> Patch Set 18:
>
> (1 comment)
Hi, sure Jian, I’ll be adding a few details in the new patch after I get a few 
clarifications, till then you can checkout these links :
Basic overview :
https://docs.google.com/document/d/1TH-907nK5JWIGZ-ePo9CQLQYlW2Y-jEAajfCYHhswus/edit
This contains implementation details :
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online
https://www.osti.gov/biblio/1028931
Hive also implements using a similar method :
https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java#L366


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 03 Jun 2022 06:32:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-02 Thread Jian Zhang (Code Review)
Jian Zhang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@296
PS18, Line 296: // r = covar / (√(xvar * yvar)
could you explain more about how the `CorrState` is updated and used to 
calculate the final correlation of two variables?

it would be helpful for readers unfamiliar with the algorithms if you could add 
links about the "Pearson's correlation" and "Welford's online algorithm".



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jian Zhang 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 03 Jun 2022 01:38:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-01 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 18:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@408
PS18, Line 408: if (isnan(val1) || isnan(val2)) return;
Skipping NaN here seems questionable? Please add comments or handle 
appropriately.


http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@422
PS18, Line 422: if (nA == 0) {
Does (nB == 0) need handling?


http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@543
PS18, Line 543:   if (src1.is_null || isnan(src1.val) || src2.is_null || 
isnan(src2.val)) return;
Skipping Nan?


http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@575
PS18, Line 575: if (isnan(val1) || isnan(val2)) return;
Skipping NaN


http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@589
PS18, Line 589: if (nA == 0) {
handle (nB == 0) ?


http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1526
PS18, Line 1526: select s_store_sk, corr(s_number_employees, s_floor_space) 
over (partition by s_city) from tpcds.store;
This may be non-deterministic (flaky) without an order by on the window clause.


http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1544
PS18, Line 1544: select id, double_col, corr(double_col, int_col) over 
(partition by month) from functional.alltypes
This may be non-deterministic (flaky) without an order by on the window clause.


http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1578
PS18, Line 1578: select s_store_sk, corr(s_number_employees, s_floor_space) 
over (partition by s_city order by s_city
Check if order by city is strong ordering.



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 01 Jun 2022 18:38:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-06-01 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 18:

(11 comments)

> Patch Set 17:
>
> (9 comments)

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@293
PS17, Line 293: // online algorithm. This is calcula
> nit: please also mention "Welford's online algorithm".
Done


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@337
PS17, Line 337:   double deltaY = y - state->yavg;
> If state->count is 1, it will be decreased to 0. We will hit divide-by-zero
Yeah, it's been taken care of in the new patch.


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@342
PS17, Line 342: state->xvar = 0.0;
> If the original count is 2, it will be decreased to 1 at line 337. Then we
Sure, updated in the latest patch


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@355
PS17, Line 355:  vx_
> nit: we prefer nullptr to NULL.
Done


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@358
PS17, Line 358:   state->yvar -= deltaY * (y - state->yavg);
> nit: we can merge this check to line 354, i.e.
Done


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@371
PS17, Line 371:
> nit: same as above. We can merge the check to line 367.
Done


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@412
PS17, Line 412:
> nit: add a space after "if" and replace NULL with nullptr.
Done


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@416
PS17, Line 416:   DCHECK(dst->ptr != NULL);
  :   DCHECK_EQ(sizeof(CorrState), dst->len);
  :   CorrState* dst_state = reinterpret_cast(dst->ptr);
  :   if (src.ptr != nullptr) {
  : long nA = dst_state->count;
  : long nB = src_state->count;
> nit: Can we simplify this by using memcpy? i.e.
This was causing impala crash so didn't add this as of now


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@423
PS17, Line 423:   dst_state->count = src_state->count;
> nit: replace "if" with "else if" to save one check, or add a "return" at th
Done


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@453
PS17, Line 453:
> sqrt() is expensive. Let's change this to sqrt(state->xvar * state->yvar).
Done


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@493
PS17, Line 493:
> nit: both terms? Maybe you want to add another equation after line 491:
Done



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 01 Jun 2022 07:05:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 18: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 01 Jun 2022 01:58:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 18:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10664/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 31 May 2022 21:59:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-31 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#18). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 1,078 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/18
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 18
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-31 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 17:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@293
PS17, Line 293: // using a stable one-pass algorithm
nit: please also mention "Welford's online algorithm".


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@355
PS17, Line 355: NULL
nit: we prefer nullptr to NULL.


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@358
PS17, Line 358:   if (!isnan(src1.val) && !isnan(src2.val)) {
nit: we can merge this check to line 354, i.e.

  if (src1.is_null || isnan(src1.val) || src2.is_null || isnan(src2.val)) 
return;


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@371
PS17, Line 371:   if (!isnan(src1.val) && !isnan(src2.val)) {
nit: same as above. We can merge the check to line 367.


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@412
PS17, Line 412:   if(src.ptr != NULL) {
nit: add a space after "if" and replace NULL with nullptr.


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@416
PS17, Line 416:   dst_state->count = src_state->count;
  :   dst_state->xavg = src_state->xavg;
  :   dst_state->yavg = src_state->yavg;
  :   dst_state->xvar = src_state->xvar;
  :   dst_state->yvar = src_state->yvar;
  :   dst_state->covar = src_state->covar;
nit: Can we simplify this by using memcpy? i.e.

  memcpy(dst, , sizeof(CorrState)


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@423
PS17, Line 423: if (nA != 0 && nB != 0) {
nit: replace "if" with "else if" to save one check, or add a "return" at the 
end of the above if-branch.


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@453
PS17, Line 453: sqrt(state->xvar) / sqrt(state->yvar)
sqrt() is expensive. Let's change this to sqrt(state->xvar * state->yvar).


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@493
PS17, Line 493: both terms
nit: both terms? Maybe you want to add another equation after line 491:

 // c_n = c_(n - 1) + (x_n - mx_n) * (y_n - my_(n - 1))



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 31 May 2022 06:57:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-30 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 17:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@337
PS17, Line 337:   if (state->count > 0) --state->count;
If state->count is 1, it will be decreased to 0. We will hit divide-by-zero 
errors in the following lines.


http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@342
PS17, Line 342:   if (state->count > 1) {
If the original count is 2, it will be decreased to 1 at line 337. Then we will 
miss it here which is wrong IIUC.



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 31 May 2022 05:08:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-24 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 17:

(2 comments)

> Patch Set 15:
>
> (3 comments)

http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc@328
PS15, Line 328: state->xvar += deltaX * (x - state-
> This seems wrong to me. Let's say mx_n is the avg of [x_1, x_2, ..., x_n].
Right, I was skeptical about it too! I've made the required changes.


http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc@332
PS15, Line 332: }
  :
  : static inline void CorrRemoveState(double x, d
> I think we need to find a reference for these.
Here are a few useful links:
https://www.osti.gov/servlets/purl/1028931
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online
https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java#L366



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 24 May 2022 07:34:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 17: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 24 May 2022 02:06:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 17:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10623/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 23 May 2022 21:56:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 16:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10622/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 16
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 23 May 2022 21:50:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 17:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8146/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 23 May 2022 21:37:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-23 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#17). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 940 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/17
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 17
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-23 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#16). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 940 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/16
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 16
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-23 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has restored this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Restored
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: restore
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-23 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has abandoned this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Abandoned

minor change
-- 
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 16
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-19 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 15:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc@328
PS15, Line 328:   state->xavg -= deltaX / state->count;
This seems wrong to me. Let's say mx_n is the avg of [x_1, x_2, ..., x_n]. In 
CorrUpdateState(), we have
mx_n = mx_(n-1) + [x_n - mx_(n-1)]/n
Reverting the formula we have
mx_(n-1) = (n * mx_n - x_n)/(n-1) = mx_n - (x_n - mx_n)/(n-1)

The above code corresponds to
mx_(n-1) = mx_n - (x_n - mx_n) / n
So it's incorrect. We should decrease state->count before this.

I think the correct code is

  --state->count;
  state->xavg -= deltaX / state->count;
  state->yavg -= deltaY / state->count;

We also need to check if state->count becomes 0.


http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc@332
PS15, Line 332: state->covar -= deltaX * (y - state->yavg);
  : state->xvar -= deltaX * (x - state->xavg);
  : state->yvar -= deltaY * (y - state->yavg);
I think we need to find a reference for these.


http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1492
PS14, Line 1492: NULL
> Resolved
FWIW, this is resolved by changing the formula to use another one with better 
error precision.
https://www.johndcook.com/blog/standard_deviation/



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 19 May 2022 07:02:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 15: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 18 May 2022 12:26:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10591/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 18 May 2022 08:11:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 15:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8115/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 18 May 2022 07:54:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-18 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 15:

(2 comments)

> Patch Set 14:
>
> (2 comments)
>
> I double the tests and find a question we need to address.

http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1492
PS14, Line 1492: NULL
> Hive returns NULL in this case. I think the reason is https://issues.apache
Resolved


http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1558
PS14, Line 1558: 
> For tests on window functions, can we also copy Hive's test?
We still don't have support for window clause



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 18 May 2022 07:51:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-18 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#15). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 853 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/15
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-15 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 14:

(2 comments)

I double the tests and find a question we need to address.

http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1492
PS14, Line 1492: 1.01233222875e-08
Hive returns NULL in this case. I think the reason is 
https://issues.apache.org/jira/browse/HIVE-16178:

 If N * SUMX2 equals SUMX * SUMX , then the result is the null value.
 If N * SUMY2 equals SUMY * SUMY , then the result is the null value.

The commit is 
https://github.com/apache/hive/commit/d97e4874dc0e09c535e8cb908f6b17698e49a5d6

Could you check if indeed N * SUMY2 equals SUMY * SUMY in this case, or it's a 
precision issue in Hive?


http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1558
PS14, Line 1558: 
For tests on window functions, can we also copy Hive's test?
https://github.com/apache/hive/blob/3b3da9ed7f3813bae3e959670df55682fea648d3/ql/src/test/results/clientpositive/llap/windowing.q.out#L853-L890



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 16 May 2022 01:36:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 14: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 13 May 2022 15:47:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-13 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 14: Code-Review+1

LGTM


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 13 May 2022 12:45:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10572/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 13 May 2022 11:43:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-13 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 14:

(3 comments)

> Patch Set 13:
>
> (3 comments)

http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1526
PS12, Line 1526: select s_store_sk, corr(s_number_employees, s_floor_space) 
over (partition by s_city) from tpcds.store;
> sorry that I mean changing the query to
No problem at all, I've updated the query.


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1683
PS12, Line 1683: 
> sorry, I mean changing the query to
No problem at all, I've updated the query.


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1700
PS12, Line 1700:  TYPES
> sorry, I mean changing it to
No problem at all, I've updated the query.



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 13 May 2022 11:31:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-13 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 759 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/14
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 14:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8101/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 14
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 13 May 2022 11:24:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-13 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 13:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1526
PS12, Line 1526: select corr(s_store_sk, s_floor_space) over (partition by 
s_city) from tpcds.store;
> Done
sorry that I mean changing the query to

 select s_store_sk, corr(s_number_employees,s_floor_space) over (partition by 
s_city) from tpcds.store

Some result rows are identical. Adding the s_store_sk column helps to 
distinguish them.


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1683
PS12, Line 1683: 
> Done
sorry, I mean changing the query to

  select s_store_sk, covar_samp(s_number_employees, s_floor_space) over 
(partition by s_city) from tpcds.store


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1700
PS12, Line 1700:  TYPES
> Done
sorry, I mean changing it to

  select s_store_sk, covar_pop(s_number_employees, s_floor_space) over 
(partition by s_city) from tpcds.store



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Fri, 13 May 2022 10:46:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 13: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 12 May 2022 23:21:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8099/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 12 May 2022 18:53:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10564/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 12 May 2022 18:29:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-12 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 13:

(12 comments)

> Patch Set 12:
>
> (12 comments)
>
> Thanks for adding the other two stats functions!

http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@301
PS12, Line 301:
> nit: add a space
Done


http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@309
PS12, Line 309:
  : void AggregateFunctions::CorrInit(FunctionContext* ctx, 
StringVal* dst) {
  :   dst->is_null = false;
  :   dst->len = sizeof(Co
> These two fields are not used in covar_samp() and covar_pop(). Can we have
Sure, I added a new state, CovarState, and a few related functions.


http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@349
PS12, Line 349: rns NULL if
> nit: please fix the space
Done


http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@362
PS12, Line 362: pret_cast nit: fix the space
Done


http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@376
PS12, Line 376: return
> nit: add spaces
Done


http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@392
PS12, Line 392: L);
> nit: add spaces
Done


http://gerrit.cloudera.org:8080/#/c/18413/12/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/18413/12/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1347
PS12, Line 1347: //
> nit: add one space
Done


http://gerrit.cloudera.org:8080/#/c/18413/12/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1372
PS12, Line 1372: //
> nit: add one space
Done


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1526
PS12, Line 1526: select corr(s_store_sk, s_floor_space) over (partition by 
s_city) from tpcds.store;
> nit: could you add s_store_sk as the first column? Also we don't need "limi
Done


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1619
PS12, Line 1619:
> Could you leave a commet for this?
Done


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1683
PS12, Line 1683: 
> nit: add s_store_sk column and remove the limit clause
Done


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1700
PS12, Line 1700:  TYPES
> nit: add s_store_sk column and remove the limit clause
Done



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 12 May 2022 18:17:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-12 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has uploaded a new patch set (#13). ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..

IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP()
 and COVAR_POP()

CORR() function takes two numeric type columns as arguments and returns
the Pearson's correlation coefficient between them.
COVAR_SAMP() function takes two numeric type columns and returns sample
 covariance between them.
COVAR_POP() function takes two numeric type columns and returns
 population covariance between them.
These UDAFs are tested with a few query tests written in aggregation.test.

Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/aggregation.test
4 files changed, 759 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/13
--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 13
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-12 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 12:

(12 comments)

Thanks for adding the other two stats functions!

http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@301
PS12, Line 301: //
nit: add a space


http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@309
PS12, Line 309:   // Sum(x) squared
  :   double sum_squaredx;
  :   // Sum(y) squared
  :   double sum_squaredy;
These two fields are not used in covar_samp() and covar_pop(). Can we have a 
separate State for covar_samp() and covar_pop()?


http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@349
PS12, Line 349: ,src2.val ,
nit: please fix the space


http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@362
PS12, Line 362: ,src2.val ,
nit: fix the space


http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@376
PS12, Line 376: ,val2,
nit: add spaces


http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@392
PS12, Line 392: ,val2,
nit: add spaces


http://gerrit.cloudera.org:8080/#/c/18413/12/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/18413/12/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1347
PS12, Line 1347: //
nit: add one space


http://gerrit.cloudera.org:8080/#/c/18413/12/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1372
PS12, Line 1372: //
nit: add one space


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1526
PS12, Line 1526: select corr(s_number_employees,s_floor_space) over (partition 
by s_city) from tpcds.store limit 10;
nit: could you add s_store_sk as the first column? Also we don't need "limit 
10" since this table only has 12 rows.


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1619
PS12, Line 1619: /1E+10
Could you leave a commet for this?


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1683
PS12, Line 1683: select covar_samp(s_number_employees, s_floor_space) over 
(partition by s_city) from tpcds.store limit 10;
nit: add s_store_sk column and remove the limit clause


http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1700
PS12, Line 1700: select covar_pop(s_number_employees, s_floor_space) over 
(partition by s_city) from tpcds.store limit 10;
nit: add s_store_sk column and remove the limit clause



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 12 May 2022 07:48:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/10556/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 11 May 2022 09:38:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()

2022-05-11 Thread Anonymous Coward (Code Review)
pranav.lo...@cloudera.com has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
..


Patch Set 12:

(7 comments)

> Patch Set 11:
>
> (1 comment)

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@296
PS10, Line 296: // r = (n(Σxy)−(Σx)(Σy))/(√(nΣx^2−(Σx)^2)*(nΣy^2−(Σy)^2))
> Seems ok to stick with double as other databases do this and there does not
Done


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@339
PS10, Line 339:
> nit: remove the space
Done


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@350
PS10, Line 350:
> nit: remove the space
Done


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@364
PS10, Line 364:
> nit: fix the indention
Done


http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@365
PS10, Line 365:
> nit: remove blank lines
Done


http://gerrit.cloudera.org:8080/#/c/18413/11/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/11/be/src/exprs/aggregate-functions-ir.cc@295
PS11, Line 295: // Correlation coefficient:
> line too long (97 > 90)
Ack


http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1539
PS10, Line 1539: double
> Also test empty table and all zero values cases.
Done



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 12
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 11 May 2022 09:18:21 +
Gerrit-HasComments: Yes


  1   2   >