Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18413 )

Change subject: IMPALA-11205: Implement Statistical functions : CORR(), 
COVAR_SAMP()  and COVAR_POP()
......................................................................


Patch Set 15:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc@328
PS15, Line 328:   state->xavg -= deltaX / state->count;
This seems wrong to me. Let's say mx_n is the avg of [x_1, x_2, ..., x_n]. In 
CorrUpdateState(), we have
mx_n = mx_(n-1) + [x_n - mx_(n-1)]/n
Reverting the formula we have
mx_(n-1) = (n * mx_n - x_n)/(n-1) = mx_n - (x_n - mx_n)/(n-1)

The above code corresponds to
mx_(n-1) = mx_n - (x_n - mx_n) / n
So it's incorrect. We should decrease state->count before this.

I think the correct code is

  --state->count;
  state->xavg -= deltaX / state->count;
  state->yavg -= deltaY / state->count;

We also need to check if state->count becomes 0.


http://gerrit.cloudera.org:8080/#/c/18413/15/be/src/exprs/aggregate-functions-ir.cc@332
PS15, Line 332:     state->covar -= deltaX * (y - state->yavg);
              :     state->xvar -= deltaX * (x - state->xavg);
              :     state->yvar -= deltaY * (y - state->yavg);
I think we need to find a reference for these.


http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/18413/14/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1492
PS14, Line 1492: NULL
> Resolved
FWIW, this is resolved by changing the formula to use another one with better 
error precision.
https://www.johndcook.com/blog/standard_deviation/



--
To view, visit http://gerrit.cloudera.org:8080/18413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06
Gerrit-Change-Number: 18413
Gerrit-PatchSet: 15
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Thu, 19 May 2022 07:02:19 +0000
Gerrit-HasComments: Yes

Reply via email to