[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP() and COVAR_POP() .. IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP() and COVAR_POP() CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. COVAR_SAMP() function takes two numeric type columns and returns sample covariance between them. COVAR_POP() function takes two numeric type columns and returns population covariance between them. These UDAFs are tested with a few query tests written in aggregation.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Reviewed-on: http://gerrit.cloudera.org:8080/18413 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 1,074 insertions(+), 0 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 30 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 29: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 29 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sun, 19 Jun 2022 04:35:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 28: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 28 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 18 Jun 2022 23:49:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 29: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8246/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 29 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 18 Jun 2022 23:50:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 29: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 29 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 18 Jun 2022 23:50:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 28: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 28 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 18 Jun 2022 17:22:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 28: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10793/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 28 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 17 Jun 2022 08:44:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 27: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10792/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 27 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 17 Jun 2022 08:40:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 28: (4 comments) > Patch Set 25: > > (1 comment) http://gerrit.cloudera.org:8080/#/c/18413/26//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18413/26//COMMIT_MSG@7 PS26, Line 7: : > nit: remove the space before ":" Done http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@300 PS25, Line 300: // Correlation coefficient formula used: : // r = covar / (√(xvar * yvar) : struct CorrState { : int64_t count; // number of elements : double xavg; // average of x elements : double yavg; // average of y elements : double xvar; // n times the variance of x elements : > As we discussed in the call, removing the code duplication brings up more u Sure, I've added that. http://gerrit.cloudera.org:8080/#/c/18413/26/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/26/be/src/exprs/aggregate-functions-ir.cc@343 PS26, Line 343: if (state->count <= 1) { : mems > nit: put "}" and "else {" at the same line Done http://gerrit.cloudera.org:8080/#/c/18413/26/be/src/exprs/aggregate-functions-ir.cc@513 PS26, Line 513: memset(state, 0, sizeof(CovarState)); : } else > nit: put "}" and "else {" at the same line Done -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 28 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 17 Jun 2022 08:25:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has uploaded a new patch set (#28). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP() and COVAR_POP() .. IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP() and COVAR_POP() CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. COVAR_SAMP() function takes two numeric type columns and returns sample covariance between them. COVAR_POP() function takes two numeric type columns and returns population covariance between them. These UDAFs are tested with a few query tests written in aggregation.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 1,074 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/28 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 28 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions: CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has uploaded a new patch set (#27). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP() and COVAR_POP() .. IMPALA-11205: Implement Statistical functions: CORR(), COVAR_SAMP() and COVAR_POP() CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. COVAR_SAMP() function takes two numeric type columns and returns sample covariance between them. COVAR_POP() function takes two numeric type columns and returns population covariance between them. These UDAFs are tested with a few query tests written in aggregation.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 1,074 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/27 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 27 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 25: (1 comment) http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@300 PS25, Line 300: struct CorrState { : int64_t count; // number of elements : double xavg; // average of x elements : double yavg; // average of y elements : double xvar; // n times the variance of x elements : double yvar; // n times the variance of y elements : double covar; // n times the covariance : }; > In Patchset12, you used CorrState for Covar calculations as well, which I a As we discussed in the call, removing the code duplication brings up more unnecessary complications in the code and its cleaner to keep them separate, as it is. Maybe you can add a cross reference in the comment to the structs, e.g. "If you change Corr functions you probably have to change Covar functions too and vice versa. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 25 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 17 Jun 2022 07:17:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 26: (4 comments) http://gerrit.cloudera.org:8080/#/c/18413/26//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/18413/26//COMMIT_MSG@7 PS26, Line 7: nit: remove the space before ":" http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@324 PS25, Line 324: if (state->count > 1 > Yeah, you're right, Quanlong and I had a discussion on this, we decided to We can improve this later if we do find it impacts the performance. I think most of the group sizes are large in practise. So the cpu prediction correctness rate won't be so bad. Maybe we can add a LIKELY() marcro here. http://gerrit.cloudera.org:8080/#/c/18413/26/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/26/be/src/exprs/aggregate-functions-ir.cc@343 PS26, Line 343: } : else { nit: put "}" and "else {" at the same line http://gerrit.cloudera.org:8080/#/c/18413/26/be/src/exprs/aggregate-functions-ir.cc@513 PS26, Line 513: } : else { nit: put "}" and "else {" at the same line -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 26 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 17 Jun 2022 03:11:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 26: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10775/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 26 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 15 Jun 2022 15:47:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 25: (1 comment) http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@300 PS25, Line 300: struct CorrState { : int64_t count; // number of elements : double xavg; // average of x elements : double yavg; // average of y elements : double xvar; // n times the variance of x elements : double yvar; // n times the variance of y elements : double covar; // n times the covariance : }; > Yeah, that's right, we did something similar in our 12th patch where we did In Patchset12, you used CorrState for Covar calculations as well, which I agree is not great, wasteful and confusing My proposal is to still have 2 separate structs, but CorrState could utilize CovarState. Some sketch, you can use composition instead of inheritance as well. struct CovarState { int64_t count; // number of elements double xavg; // average of x elements double yavg; // average of y elements double covar; // n times the covariance void update(double x, double y) {...} void remove(double x, double y) {...} void merge(CovarState& other) {...} }; struct CorrState : public CovarState {\ double xvar; // n times the variance of x elements double yvar; // n times the variance of y elements void update(double x, double y) { double prevXavg = xavg; double prevYavg = yavg; CovarState::update(x,y); xvar = ... yvar = ... } ... }; -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 25 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 15 Jun 2022 15:42:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 26: (3 comments) > Patch Set 25: Code-Review+1 > > (3 comments) > > Math seems good to me http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@300 PS25, Line 300: struct CorrState { : int64_t count; // number of elements : double xavg; // average of x elements : double yavg; // average of y elements : double xvar; // n times the variance of x elements : double yvar; // n times the variance of y elements : double covar; // n times the covariance : }; > As I understand, both CorrState and CovarState calculates xavg,yavg and cov Yeah, that's right, we did something similar in our 12th patch where we didn't create a new state for covar_samp() and covar_pop() but two fields were not getting used in calculating covar_samp() and covar_pop() and it was causing a lot of unnecessary computation so we separated it into two different states. http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@324 PS25, Line 324: if (state->count > 1 > nit: do we need this? for first items state->xavg = deltaX = x, so we will Yeah, you're right, Quanlong and I had a discussion on this, we decided to add this check as it saves some floating point computation. More computation would means more floating point rounding error. http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@331 PS25, Line 331: // vx_n = vx_(n - 1) + (x_n - mx_(n - 1)) * (x_n - mx_n) > nit: the order is a bit misleading as deltax=(x_n - mx_(n - 1)) Yeah, I just changed it. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 26 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 15 Jun 2022 15:28:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has uploaded a new patch set (#26). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. COVAR_SAMP() function takes two numeric type columns and returns sample covariance between them. COVAR_POP() function takes two numeric type columns and returns population covariance between them. These UDAFs are tested with a few query tests written in aggregation.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 1,074 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/26 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 26 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 25: Code-Review+1 (3 comments) Math seems good to me http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@300 PS25, Line 300: struct CorrState { : int64_t count; // number of elements : double xavg; // average of x elements : double yavg; // average of y elements : double xvar; // n times the variance of x elements : double yvar; // n times the variance of y elements : double covar; // n times the covariance : }; As I understand, both CorrState and CovarState calculates xavg,yavg and covar the same way. I think we could restructure this a bit to spare code duplication. E.g.: CovarState::update, remove and merge could encapsulate the logic regarding the common members, and CorrState could have a CovarState member, and call its functions meanwhile calculating xvar and yvar too http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@324 PS25, Line 324: if (state->count > 1 nit: do we need this? for first items state->xavg = deltaX = x, so we will add 0 to state->covar, does not change anything. This way we pay a branching for every count unnecessarily http://gerrit.cloudera.org:8080/#/c/18413/25/be/src/exprs/aggregate-functions-ir.cc@331 PS25, Line 331: // vx_n = vx_(n - 1) + (x_n - mx_n) * (x_n - mx_(n - 1)) nit: the order is a bit misleading as deltax=(x_n - mx_(n - 1)) -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 25 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Gergely Fürnstáhl Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 15 Jun 2022 14:57:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 25: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8230/ -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 25 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 14 Jun 2022 07:49:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 25: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8230/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 25 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 14 Jun 2022 06:33:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 25: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 25 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 14 Jun 2022 06:33:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 24: Code-Review+2 > Patch Set 24: > > Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8229/ The failure is unrelated to this patch https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5845 https://issues.apache.org/jira/browse/IMPALA-10125 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 24 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 14 Jun 2022 06:22:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 24: Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8229/ -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 24 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 14 Jun 2022 04:41:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 24: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 24 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 14 Jun 2022 00:39:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 24: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8229/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 24 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 14 Jun 2022 00:14:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 24: recheck -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 24 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 13 Jun 2022 16:39:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 24: Code-Review+1 The test failure is due to a flaky test (IMPALA-1995) which is unrelated to this patch. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 24 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 13 Jun 2022 01:32:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 24: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8220/ -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 24 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sun, 12 Jun 2022 18:11:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 24: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8220/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 24 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sun, 12 Jun 2022 13:49:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 24: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10750/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 24 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 10 Jun 2022 12:37:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 22: Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8212/ -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 22 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 10 Jun 2022 12:35:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has uploaded a new patch set (#24). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. COVAR_SAMP() function takes two numeric type columns and returns sample covariance between them. COVAR_POP() function takes two numeric type columns and returns population covariance between them. These UDAFs are tested with a few query tests written in aggregation.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 1,074 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/24 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 24 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 23: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 23 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 10 Jun 2022 11:07:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 23: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10749/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 23 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 10 Jun 2022 10:12:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 23: (4 comments) > Patch Set 22: Code-Review+1 > > (4 comments) http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc@341 PS22, Line 341: if (state->count <= 1) { > nit: this is part of a multi-line if-else statement so our code style still Done http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc@452 PS22, Line 452: DCHECK(state->xvar >= -1E-8); : DCHECK(state->yvar >= -1E-8 > nit: add brackets {} for multi-line if-statement Done http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc@507 PS22, Line 507: state->xavg += (x - state->xavg) / state->count; > nit: this is part of a multi-line if-else statement so our code style still Done http://gerrit.cloudera.org:8080/#/c/18413/22/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/18413/22/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1324 PS22, Line 1324: List > nit: use the same idention as the above metionds. Done -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 23 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 10 Jun 2022 09:50:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has uploaded a new patch set (#23). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. COVAR_SAMP() function takes two numeric type columns and returns sample covariance between them. COVAR_POP() function takes two numeric type columns and returns population covariance between them. These UDAFs are tested with a few query tests written in aggregation.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 1,074 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/23 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 23 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 22: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8212/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 22 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 10 Jun 2022 08:06:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 22: Code-Review+1 (4 comments) http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc@341 PS22, Line 341: if (state->count <= 1) memset(state, 0, sizeof(CorrState)); nit: this is part of a multi-line if-else statement so our code style still prefer adding brackets {} here. http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc@452 PS22, Line 452: if (state->count == 0 || state->count == 1 || state->xvar <= 0.0 || state->yvar <= 0.0) : return DoubleVal::null(); nit: add brackets {} for multi-line if-statement http://gerrit.cloudera.org:8080/#/c/18413/22/be/src/exprs/aggregate-functions-ir.cc@507 PS22, Line 507: if (state->count <= 1) memset(state, 0, sizeof(CovarState)); nit: this is part of a multi-line if-else statement so our code style still prefer adding brackets {} here. http://gerrit.cloudera.org:8080/#/c/18413/22/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/18413/22/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1324 PS22, Line 1324: nit: use the same idention as the above metionds. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 22 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 10 Jun 2022 08:05:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 22: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8205/ -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 22 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 08 Jun 2022 23:20:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 22: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10734/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 22 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 08 Jun 2022 18:44:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 22: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8205/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 22 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 08 Jun 2022 18:43:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 22: (21 comments) > Patch Set 20: > > (19 comments) http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@301 PS20, Line 301: int64_t count; // number of elements > Change all occurrences of long to int64_t Done http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@341 PS20, Line 341: ) > nit: remove the space before ")" Done http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@342 PS20, Line 342: else { : --state->count; : // mx_(n - 1) = mx_n - (x_n - mx_n) / (n - 1) : state->xavg -= deltaX / state->count; : // my_(n - 1) = my_n - (y_n - my_n) / (n - 1) : state->yavg -= delt > nit: these can be simplified by memset(), just like what we do at line 313. Done http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@348 PS20, Line 348: // c_(n - 1) = c_n - (x_n - mx_n) * (y_n - my_(n -1)) : st > nit: put "}" and "else" at the same line, i.e. "} else {" Done http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@355 PS20, Line 355: > This comment seems incorrect. 'deltaX' is actually (x_n - mx_n). 'y - state Done http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@375 PS20, Line 375: > This is not clear to me. Do you mean 'count'? In functions which don't have finalize() function, we need to check explicitly if num_removes() >= num_updates(), but here since we have finalize() function, we don't need to explicitly check the number of update() and remove() calls. http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@378 PS20, Line 378: void AggregateFunctions::TimestampCorrUpdate(FunctionContext* ctx, > Change all occurrences of NULL to nullptr Done http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@393 PS20, Line 393: // Remove doesn't need to explicitly > nit: please add brackets {} for multi-line if-statement. Done http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@398 PS20, Line 398: lue::FromTimestampVal(src1); > Could you explain more? Why do we need to check the number of Update/Remove In functions which don't have finalize() function, we need to check explicitly if num_removes() >= num_updates(), but here since we have finalize() function, we don't need to explicitly check the number of update() and remove() calls. http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@407 PS20, Line 407: void AggregateFunctions::CorrMerge(Func > nit: please add brackets {} for multi-line if-statement. Done http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@451 PS20, Line 451: e->yvar >= -1E-8); > We should check whether 'state->xvar * state->yvar' become negative before Done http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@452 PS20, Line 452: if (state->count == 0 || state->count == 1 || state->xvar <= 0.0 || state->yvar <= 0.0) : return DoubleVal::null(); : double r = sqrt(state->xvar * state->yvar); > move this to the begining of this method, and split it into two DCHECKs so Done http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@467 PS20, Line 467: ctx->Free(src.ptr); : return r; > This is a use-after-free pattern. We should not free the resource if we sti Done http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@507 PS20, Line 507: ) > nit: remove the space before ")". Done http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@508 PS20, Line 508: else { : --state->count; : // my_(n - 1) = my_n - (y_n - my_n) / (n - 1) : state->yavg -= (y - > nit: use memset instead. Done http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@512 PS20, Line 512: // c_(n - 1) = c_n - (x_n - mx_(n - 1)) * (y_n - my_n) : stat > nit: put them into one line, i.e. "} else {" Done http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@535 PS20, Line 535: > Need explanation here too In functions which don't have finalize() function, we need to check explicitly if num_removes() >= num_updates(), but here since we have
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has uploaded a new patch set (#22). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. COVAR_SAMP() function takes two numeric type columns and returns sample covariance between them. COVAR_POP() function takes two numeric type columns and returns population covariance between them. These UDAFs are tested with a few query tests written in aggregation.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 1,068 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/22 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 22 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 21: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 21 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 08 Jun 2022 11:02:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 21: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8200/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 21 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 08 Jun 2022 06:32:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 20: (19 comments) http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@341 PS20, Line 341: nit: remove the space before ")" http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@342 PS20, Line 342: state->count = 0; : state->xavg = 0.0; : state->yavg = 0.0; : state->xvar = 0.0; : state->yvar = 0.0; : state->covar = 0.0; nit: these can be simplified by memset(), just like what we do at line 313. memset(state, 0, sizeof(CorrState)) http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@348 PS20, Line 348: } : else nit: put "}" and "else" at the same line, i.e. "} else {" http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@355 PS20, Line 355: (x_n - mx_(n - 1)) * (y_n - my_n) This comment seems incorrect. 'deltaX' is actually (x_n - mx_n). 'y - state->yavg' is (y_n - my_(n-1)). BTW, please fix the idention of line 355 ~ 360. http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@375 PS20, Line 375: the number of calls to Update() or Remove() This is not clear to me. Do you mean 'count'? http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@393 PS20, Line 393: CorrUpdateState(val1, val2, state); nit: please add brackets {} for multi-line if-statement. http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@398 PS20, Line 398: the number of calls to Update() or Remove() Could you explain more? Why do we need to check the number of Update/Remove calls? http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@407 PS20, Line 407: CorrRemoveState(val1, val2, state); nit: please add brackets {} for multi-line if-statement. http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@451 PS20, Line 451: sqrt((state->xvar * state->yvar)) We should check whether 'state->xvar * state->yvar' become negative before calling sqrt(). Please split the if-statement at line 455 into two: One for checking counts and vars before this line, and the other one for checking 'r == 0.0' after this line. BTW, don't need duplicated parentheses here. http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@452 PS20, Line 452: // xvar and yvar become negative in certain cases due to floating point rounding error. : // Since these values are very small, they can be ignored and rounded to 0. : DCHECK(state->xvar >= -1E-8 && state->yvar >= -1E-8); move this to the begining of this method, and split it into two DCHECKs so we know exactly which one breaks if it happens. DCHECK(state->xvar >= -1E-8); DCHECK(state->yvar >= -1E-8); http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@467 PS20, Line 467: ctx->Free(src.ptr); : return CorrGetValue(ctx, src); This is a use-after-free pattern. We should not free the resource if we still want to use it. Chang it to DoubleVal r = CorrGetValue(ctx, src); ctx->Free(src.ptr); return r; http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@507 PS20, Line 507: nit: remove the space before ")". http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@508 PS20, Line 508: state->count = 0; : state->xavg = 0.0; : state->yavg = 0.0; : state->covar = 0.0; nit: use memset instead. memset(state, 0, sizeof(CovarState)) http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@512 PS20, Line 512: } : else { nit: put them into one line, i.e. "} else {" http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@535 PS20, Line 535: the number of calls to Update() or Remove() Need explanation here too http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@553 PS20, Line 553: CovarUpdateState(val1, val2, state); nit: add brackets {} for multi-line if-statement. http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@567 PS20, Line 567: CovarRemoveState(val1, val2, state); nit: add brackets {} for multi-line if-statement. http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@625 PS20, Line 625:
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 20: (2 comments) http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@301 PS20, Line 301: long count; // number of elements Change all occurrences of long to int64_t http://gerrit.cloudera.org:8080/#/c/18413/20/be/src/exprs/aggregate-functions-ir.cc@378 PS20, Line 378: DCHECK(dst->ptr != NULL); Change all occurrences of NULL to nullptr -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 20 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 07 Jun 2022 13:10:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 20: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8191/ -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 20 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 07 Jun 2022 09:07:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 20: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8191/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 20 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 07 Jun 2022 04:36:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 19: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8189/ -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 19 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 06 Jun 2022 21:50:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 20: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10712/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 20 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 06 Jun 2022 18:51:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 20: (1 comment) > Patch Set 18: > > (1 comment) http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@408 PS18, Line 408: } > Code is still skipping NaN.. Oh sorry, I missed it, its fixed now. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 20 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 06 Jun 2022 18:34:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. COVAR_SAMP() function takes two numeric type columns and returns sample covariance between them. COVAR_POP() function takes two numeric type columns and returns population covariance between them. These UDAFs are tested with a few query tests written in aggregation.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 1,067 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/20 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 20 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 18: (1 comment) http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@408 PS18, Line 408: if (isnan(val1) || isnan(val2)) return; > Skipping NaN here seems questionable? Please add comments or handle appropr Code is still skipping NaN.. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 18 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 06 Jun 2022 18:02:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 19: (3 comments) > Patch Set 19: > > (10 comments) > > > Patch Set 18: > > > > (1 comment) http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@416 PS17, Line 416: CorrState* src_state = reinterpret_cast(src.ptr); : DCHECK(dst->ptr != NULL); : DCHECK_EQ(sizeof(CorrState), dst->len); : CorrState* dst_state = reinterpret_cast(dst->ptr); : if (src.ptr != nullptr) { : long nA = dst_state->count; > This was causing impala crash so didn't add this as of now Added http://gerrit.cloudera.org:8080/#/c/18413/19/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/19/be/src/exprs/aggregate-functions-ir.cc@324 PS19, Line 324: count This check is just added as it saves some floating point computation. More computation would means more floating point rounding error. http://gerrit.cloudera.org:8080/#/c/18413/19/be/src/exprs/aggregate-functions-ir.cc@458 PS19, Line 458: = Please let me know if this tolerance value is suitable -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 19 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 06 Jun 2022 17:36:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 19: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10709/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 19 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 06 Jun 2022 17:33:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 19: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8189/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 19 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 06 Jun 2022 17:26:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 19: (10 comments) > Patch Set 18: > > (1 comment) http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@296 PS18, Line 296: // https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online > could you explain more about how the `CorrState` is updated and used to cal Added a few links http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@408 PS18, Line 408: tm_src2.ToSubsecondUnixTime(UTCPTR, )) { > Skipping NaN here seems questionable? Please add comments or handle appropr Yeah, its fixed now. http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@422 PS18, Line 422: long nB = src_state->count; > Does (nB == 0) need handling? nB is the count of src->state. nB == 0 has been accounted for in CorrInit() and CorrRemove(). We alse make sure that src.ptr is not equal to nullptr. http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@462 PS18, Line 462: return DoubleVal(corr); > Add an assert that negative values are small and comment that this can happ Sure,added, I've taken the tolerance value to be around -1E-8 as it seemed safe to me. We can probably go for smaller values like -1E-10 as well. Please let me know what would be a suitable tolerance value. http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@543 PS18, Line 543: DCHECK_EQ(sizeof(CovarState), dst->len); > Skipping Nan? Updated in the latest patch. http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@575 PS18, Line 575: } > Skipping NaN Updated in the latest patch. http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@589 PS18, Line 589: return; > handle (nB == 0) ? nB is the count of src->state. nB == 0 has been accounted for in CorrInit() and CorrRemove(). We alse make sure that src.ptr is not equal to nullptr. http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1526 PS18, Line 1526: from tpcds.store; > This may be non-deterministic (flaky) without an order by on the window cla Yeah, its fixed in the latest patch http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1544 PS18, Line 1544: select id, double_col, corr(double_col, int_col) over (partition by month order by id) from functional.alltypes > This may be non-deterministic (flaky) without an order by on the window cla Yeah, its fixed in the latest patch http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1578 PS18, Line 1578: select s_store_sk, corr(s_number_employees, s_floor_space) over (partition by s_city order by s_store_sk > Check if order by city is strong ordering. It wasn't strong ordering, I've changed it to s_store_sk which contains unique values. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 19 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 06 Jun 2022 17:23:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. COVAR_SAMP() function takes two numeric type columns and returns sample covariance between them. COVAR_POP() function takes two numeric type columns and returns population covariance between them. These UDAFs are tested with a few query tests written in aggregation.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 1,075 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/19 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 19 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 18: (1 comment) http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@462 PS18, Line 462: return DoubleVal::null(); Add an assert that negative values are small and comment that this can happen due to floating point rounding error. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 18 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 03 Jun 2022 13:42:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 18: > Patch Set 18: > > (1 comment) Hi, sure Jian, I’ll be adding a few details in the new patch after I get a few clarifications, till then you can checkout these links : Basic overview : https://docs.google.com/document/d/1TH-907nK5JWIGZ-ePo9CQLQYlW2Y-jEAajfCYHhswus/edit This contains implementation details : https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online https://www.osti.gov/biblio/1028931 Hive also implements using a similar method : https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java#L366 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 18 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 03 Jun 2022 06:32:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Jian Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 18: (1 comment) http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@296 PS18, Line 296: // r = covar / (√(xvar * yvar) could you explain more about how the `CorrState` is updated and used to calculate the final correlation of two variables? it would be helpful for readers unfamiliar with the algorithms if you could add links about the "Pearson's correlation" and "Welford's online algorithm". -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 18 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jian Zhang Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 03 Jun 2022 01:38:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 18: (8 comments) http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@408 PS18, Line 408: if (isnan(val1) || isnan(val2)) return; Skipping NaN here seems questionable? Please add comments or handle appropriately. http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@422 PS18, Line 422: if (nA == 0) { Does (nB == 0) need handling? http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@543 PS18, Line 543: if (src1.is_null || isnan(src1.val) || src2.is_null || isnan(src2.val)) return; Skipping Nan? http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@575 PS18, Line 575: if (isnan(val1) || isnan(val2)) return; Skipping NaN http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@589 PS18, Line 589: if (nA == 0) { handle (nB == 0) ? http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1526 PS18, Line 1526: select s_store_sk, corr(s_number_employees, s_floor_space) over (partition by s_city) from tpcds.store; This may be non-deterministic (flaky) without an order by on the window clause. http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1544 PS18, Line 1544: select id, double_col, corr(double_col, int_col) over (partition by month) from functional.alltypes This may be non-deterministic (flaky) without an order by on the window clause. http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1578 PS18, Line 1578: select s_store_sk, corr(s_number_employees, s_floor_space) over (partition by s_city order by s_city Check if order by city is strong ordering. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 18 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 01 Jun 2022 18:38:53 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 18: (11 comments) > Patch Set 17: > > (9 comments) http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@293 PS17, Line 293: // online algorithm. This is calcula > nit: please also mention "Welford's online algorithm". Done http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@337 PS17, Line 337: double deltaY = y - state->yavg; > If state->count is 1, it will be decreased to 0. We will hit divide-by-zero Yeah, it's been taken care of in the new patch. http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@342 PS17, Line 342: state->xvar = 0.0; > If the original count is 2, it will be decreased to 1 at line 337. Then we Sure, updated in the latest patch http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@355 PS17, Line 355: vx_ > nit: we prefer nullptr to NULL. Done http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@358 PS17, Line 358: state->yvar -= deltaY * (y - state->yavg); > nit: we can merge this check to line 354, i.e. Done http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@371 PS17, Line 371: > nit: same as above. We can merge the check to line 367. Done http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@412 PS17, Line 412: > nit: add a space after "if" and replace NULL with nullptr. Done http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@416 PS17, Line 416: DCHECK(dst->ptr != NULL); : DCHECK_EQ(sizeof(CorrState), dst->len); : CorrState* dst_state = reinterpret_cast(dst->ptr); : if (src.ptr != nullptr) { : long nA = dst_state->count; : long nB = src_state->count; > nit: Can we simplify this by using memcpy? i.e. This was causing impala crash so didn't add this as of now http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@423 PS17, Line 423: dst_state->count = src_state->count; > nit: replace "if" with "else if" to save one check, or add a "return" at th Done http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@453 PS17, Line 453: > sqrt() is expensive. Let's change this to sqrt(state->xvar * state->yvar). Done http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@493 PS17, Line 493: > nit: both terms? Maybe you want to add another equation after line 491: Done -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 18 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 01 Jun 2022 07:05:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 18: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 18 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 01 Jun 2022 01:58:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 18: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10664/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 18 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 31 May 2022 21:59:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. COVAR_SAMP() function takes two numeric type columns and returns sample covariance between them. COVAR_POP() function takes two numeric type columns and returns population covariance between them. These UDAFs are tested with a few query tests written in aggregation.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 1,078 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/18 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 18 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 17: (9 comments) http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@293 PS17, Line 293: // using a stable one-pass algorithm nit: please also mention "Welford's online algorithm". http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@355 PS17, Line 355: NULL nit: we prefer nullptr to NULL. http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@358 PS17, Line 358: if (!isnan(src1.val) && !isnan(src2.val)) { nit: we can merge this check to line 354, i.e. if (src1.is_null || isnan(src1.val) || src2.is_null || isnan(src2.val)) return; http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@371 PS17, Line 371: if (!isnan(src1.val) && !isnan(src2.val)) { nit: same as above. We can merge the check to line 367. http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@412 PS17, Line 412: if(src.ptr != NULL) { nit: add a space after "if" and replace NULL with nullptr. http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@416 PS17, Line 416: dst_state->count = src_state->count; : dst_state->xavg = src_state->xavg; : dst_state->yavg = src_state->yavg; : dst_state->xvar = src_state->xvar; : dst_state->yvar = src_state->yvar; : dst_state->covar = src_state->covar; nit: Can we simplify this by using memcpy? i.e. memcpy(dst, , sizeof(CorrState) http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@423 PS17, Line 423: if (nA != 0 && nB != 0) { nit: replace "if" with "else if" to save one check, or add a "return" at the end of the above if-branch. http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@453 PS17, Line 453: sqrt(state->xvar) / sqrt(state->yvar) sqrt() is expensive. Let's change this to sqrt(state->xvar * state->yvar). http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@493 PS17, Line 493: both terms nit: both terms? Maybe you want to add another equation after line 491: // c_n = c_(n - 1) + (x_n - mx_n) * (y_n - my_(n - 1)) -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 17 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 31 May 2022 06:57:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 17: (2 comments) http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@337 PS17, Line 337: if (state->count > 0) --state->count; If state->count is 1, it will be decreased to 0. We will hit divide-by-zero errors in the following lines. http://gerrit.cloudera.org:8080/#/c/18413/17/be/src/exprs/aggregate-functions-ir.cc@342 PS17, Line 342: if (state->count > 1) { If the original count is 2, it will be decreased to 1 at line 337. Then we will miss it here which is wrong IIUC. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 17 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 31 May 2022 05:08:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 17: (2 comments) > Patch Set 15: > > (3 comments) http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc@328 PS15, Line 328: state->xvar += deltaX * (x - state- > This seems wrong to me. Let's say mx_n is the avg of [x_1, x_2, ..., x_n]. Right, I was skeptical about it too! I've made the required changes. http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc@332 PS15, Line 332: } : : static inline void CorrRemoveState(double x, d > I think we need to find a reference for these. Here are a few useful links: https://www.osti.gov/servlets/purl/1028931 https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCorrelation.java#L366 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 17 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 24 May 2022 07:34:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 17: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 17 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 24 May 2022 02:06:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 17: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10623/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 17 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 23 May 2022 21:56:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 16: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10622/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 16 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 23 May 2022 21:50:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 17: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8146/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 17 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 23 May 2022 21:37:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. COVAR_SAMP() function takes two numeric type columns and returns sample covariance between them. COVAR_POP() function takes two numeric type columns and returns population covariance between them. These UDAFs are tested with a few query tests written in aggregation.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 940 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/17 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 17 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. COVAR_SAMP() function takes two numeric type columns and returns sample covariance between them. COVAR_POP() function takes two numeric type columns and returns population covariance between them. These UDAFs are tested with a few query tests written in aggregation.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 940 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/16 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 16 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has restored this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Restored -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: restore Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 15 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has abandoned this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Abandoned minor change -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 16 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 15: (3 comments) http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc@328 PS15, Line 328: state->xavg -= deltaX / state->count; This seems wrong to me. Let's say mx_n is the avg of [x_1, x_2, ..., x_n]. In CorrUpdateState(), we have mx_n = mx_(n-1) + [x_n - mx_(n-1)]/n Reverting the formula we have mx_(n-1) = (n * mx_n - x_n)/(n-1) = mx_n - (x_n - mx_n)/(n-1) The above code corresponds to mx_(n-1) = mx_n - (x_n - mx_n) / n So it's incorrect. We should decrease state->count before this. I think the correct code is --state->count; state->xavg -= deltaX / state->count; state->yavg -= deltaY / state->count; We also need to check if state->count becomes 0. http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc@332 PS15, Line 332: state->covar -= deltaX * (y - state->yavg); : state->xvar -= deltaX * (x - state->xavg); : state->yvar -= deltaY * (y - state->yavg); I think we need to find a reference for these. http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1492 PS14, Line 1492: NULL > Resolved FWIW, this is resolved by changing the formula to use another one with better error precision. https://www.johndcook.com/blog/standard_deviation/ -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 15 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 19 May 2022 07:02:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 15: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 15 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 18 May 2022 12:26:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 15: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10591/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 15 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 18 May 2022 08:11:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 15: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8115/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 15 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 18 May 2022 07:54:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 15: (2 comments) > Patch Set 14: > > (2 comments) > > I double the tests and find a question we need to address. http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1492 PS14, Line 1492: NULL > Hive returns NULL in this case. I think the reason is https://issues.apache Resolved http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1558 PS14, Line 1558: > For tests on window functions, can we also copy Hive's test? We still don't have support for window clause -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 15 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 18 May 2022 07:51:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. COVAR_SAMP() function takes two numeric type columns and returns sample covariance between them. COVAR_POP() function takes two numeric type columns and returns population covariance between them. These UDAFs are tested with a few query tests written in aggregation.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 853 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/15 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 15 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 14: (2 comments) I double the tests and find a question we need to address. http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1492 PS14, Line 1492: 1.01233222875e-08 Hive returns NULL in this case. I think the reason is https://issues.apache.org/jira/browse/HIVE-16178: If N * SUMX2 equals SUMX * SUMX , then the result is the null value. If N * SUMY2 equals SUMY * SUMY , then the result is the null value. The commit is https://github.com/apache/hive/commit/d97e4874dc0e09c535e8cb908f6b17698e49a5d6 Could you check if indeed N * SUMY2 equals SUMY * SUMY in this case, or it's a precision issue in Hive? http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1558 PS14, Line 1558: For tests on window functions, can we also copy Hive's test? https://github.com/apache/hive/blob/3b3da9ed7f3813bae3e959670df55682fea648d3/ql/src/test/results/clientpositive/llap/windowing.q.out#L853-L890 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 14 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 16 May 2022 01:36:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 14: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 14 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 13 May 2022 15:47:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 14: Code-Review+1 LGTM -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 14 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 13 May 2022 12:45:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 14: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10572/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 14 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 13 May 2022 11:43:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 14: (3 comments) > Patch Set 13: > > (3 comments) http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1526 PS12, Line 1526: select s_store_sk, corr(s_number_employees, s_floor_space) over (partition by s_city) from tpcds.store; > sorry that I mean changing the query to No problem at all, I've updated the query. http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1683 PS12, Line 1683: > sorry, I mean changing the query to No problem at all, I've updated the query. http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1700 PS12, Line 1700: TYPES > sorry, I mean changing it to No problem at all, I've updated the query. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 14 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 13 May 2022 11:31:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. COVAR_SAMP() function takes two numeric type columns and returns sample covariance between them. COVAR_POP() function takes two numeric type columns and returns population covariance between them. These UDAFs are tested with a few query tests written in aggregation.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 759 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/14 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 14 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 14: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8101/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 14 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 13 May 2022 11:24:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 13: (3 comments) http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1526 PS12, Line 1526: select corr(s_store_sk, s_floor_space) over (partition by s_city) from tpcds.store; > Done sorry that I mean changing the query to select s_store_sk, corr(s_number_employees,s_floor_space) over (partition by s_city) from tpcds.store Some result rows are identical. Adding the s_store_sk column helps to distinguish them. http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1683 PS12, Line 1683: > Done sorry, I mean changing the query to select s_store_sk, covar_samp(s_number_employees, s_floor_space) over (partition by s_city) from tpcds.store http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1700 PS12, Line 1700: TYPES > Done sorry, I mean changing it to select s_store_sk, covar_pop(s_number_employees, s_floor_space) over (partition by s_city) from tpcds.store -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 13 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Fri, 13 May 2022 10:46:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 13: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 13 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 12 May 2022 23:21:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 13: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8099/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 13 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 12 May 2022 18:53:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 13: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10564/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 13 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 12 May 2022 18:29:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 13: (12 comments) > Patch Set 12: > > (12 comments) > > Thanks for adding the other two stats functions! http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@301 PS12, Line 301: > nit: add a space Done http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@309 PS12, Line 309: : void AggregateFunctions::CorrInit(FunctionContext* ctx, StringVal* dst) { : dst->is_null = false; : dst->len = sizeof(Co > These two fields are not used in covar_samp() and covar_pop(). Can we have Sure, I added a new state, CovarState, and a few related functions. http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@349 PS12, Line 349: rns NULL if > nit: please fix the space Done http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@362 PS12, Line 362: pret_cast nit: fix the space Done http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@376 PS12, Line 376: return > nit: add spaces Done http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@392 PS12, Line 392: L); > nit: add spaces Done http://gerrit.cloudera.org:8080/#/c/18413/12/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/18413/12/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1347 PS12, Line 1347: // > nit: add one space Done http://gerrit.cloudera.org:8080/#/c/18413/12/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1372 PS12, Line 1372: // > nit: add one space Done http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1526 PS12, Line 1526: select corr(s_store_sk, s_floor_space) over (partition by s_city) from tpcds.store; > nit: could you add s_store_sk as the first column? Also we don't need "limi Done http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1619 PS12, Line 1619: > Could you leave a commet for this? Done http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1683 PS12, Line 1683: > nit: add s_store_sk column and remove the limit clause Done http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1700 PS12, Line 1700: TYPES > nit: add s_store_sk column and remove the limit clause Done -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 13 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 12 May 2022 18:17:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() CORR() function takes two numeric type columns as arguments and returns the Pearson's correlation coefficient between them. COVAR_SAMP() function takes two numeric type columns and returns sample covariance between them. COVAR_POP() function takes two numeric type columns and returns population covariance between them. These UDAFs are tested with a few query tests written in aggregation.test. Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/workloads/functional-query/queries/QueryTest/aggregation.test 4 files changed, 759 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/18413/13 -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 13 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 12: (12 comments) Thanks for adding the other two stats functions! http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@301 PS12, Line 301: // nit: add a space http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@309 PS12, Line 309: // Sum(x) squared : double sum_squaredx; : // Sum(y) squared : double sum_squaredy; These two fields are not used in covar_samp() and covar_pop(). Can we have a separate State for covar_samp() and covar_pop()? http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@349 PS12, Line 349: ,src2.val , nit: please fix the space http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@362 PS12, Line 362: ,src2.val , nit: fix the space http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@376 PS12, Line 376: ,val2, nit: add spaces http://gerrit.cloudera.org:8080/#/c/18413/12/be/src/exprs/aggregate-functions-ir.cc@392 PS12, Line 392: ,val2, nit: add spaces http://gerrit.cloudera.org:8080/#/c/18413/12/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/18413/12/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1347 PS12, Line 1347: // nit: add one space http://gerrit.cloudera.org:8080/#/c/18413/12/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1372 PS12, Line 1372: // nit: add one space http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1526 PS12, Line 1526: select corr(s_number_employees,s_floor_space) over (partition by s_city) from tpcds.store limit 10; nit: could you add s_store_sk as the first column? Also we don't need "limit 10" since this table only has 12 rows. http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1619 PS12, Line 1619: /1E+10 Could you leave a commet for this? http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1683 PS12, Line 1683: select covar_samp(s_number_employees, s_floor_space) over (partition by s_city) from tpcds.store limit 10; nit: add s_store_sk column and remove the limit clause http://gerrit.cloudera.org:8080/#/c/18413/12/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1700 PS12, Line 1700: select covar_pop(s_number_employees, s_floor_space) over (partition by s_city) from tpcds.store limit 10; nit: add s_store_sk column and remove the limit clause -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 12 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Thu, 12 May 2022 07:48:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10556/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 12 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 11 May 2022 09:38:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-11205: Implement Statistical functions : CORR(), COVAR SAMP() and COVAR POP()
pranav.lo...@cloudera.com has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 ) Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() .. Patch Set 12: (7 comments) > Patch Set 11: > > (1 comment) http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@296 PS10, Line 296: // r = (n(Σxy)−(Σx)(Σy))/(√(nΣx^2−(Σx)^2)*(nΣy^2−(Σy)^2)) > Seems ok to stick with double as other databases do this and there does not Done http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@339 PS10, Line 339: > nit: remove the space Done http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@350 PS10, Line 350: > nit: remove the space Done http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@364 PS10, Line 364: > nit: fix the indention Done http://gerrit.cloudera.org:8080/#/c/18413/10/be/src/exprs/aggregate-functions-ir.cc@365 PS10, Line 365: > nit: remove blank lines Done http://gerrit.cloudera.org:8080/#/c/18413/11/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/11/be/src/exprs/aggregate-functions-ir.cc@295 PS11, Line 295: // Correlation coefficient: > line too long (97 > 90) Ack http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/18413/10/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1539 PS10, Line 1539: double > Also test empty table and all zero values cases. Done -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 12 Gerrit-Owner: Anonymous Coward Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 11 May 2022 09:18:21 + Gerrit-HasComments: Yes