[email protected] has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
......................................................................


Patch Set 4:

(54 comments)

> Patch Set 4:
>
> (1 comment)

A few more tests are yet to be added.

http://gerrit.cloudera.org:8080/#/c/19569/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19569/2//COMMIT_MSG@18
PS2, Line 18:
> You could have a newline after the ':'.
Done


http://gerrit.cloudera.org:8080/#/c/19569/2//COMMIT_MSG@18
PS2, Line 18:
> Nit: Hive (capital H).
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@291
PS1, Line 291: // Implementation of regr_slope() and regr_intercept():
> Did you use the Oracle link as specification? Did you take the implementati
Yes, I've used the Oracle link just for specification, couldn't find these 
exact formulas on any paper but their constituent parts like xvar, yvar, covar 
etc are mentioned in papers and have a similar implementation as correlation 
and covariance functions.


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@294
PS1, Line 294: he regression slope of the line and the y-int
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@294
PS1, Line 294: return
> Nit: return
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@296
PS1, Line 296: a set of nu
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@297
PS1, Line 297: // analytic functions.
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@303
PS1, Line 303: // regr_intercept() formula used:
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@308
PS1, Line 308: a
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@317
PS1, Line 317:   dst->len = sizeof(RegrSlopeState);
> Not addressed yet.
As discussed, to maintain uniformity across all the other aggregate functions, 
it'd be better to stick with memset(). Open for discussion.


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@341
PS1, Line 341:
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@344
PS1, Line 344:     memset(state, 0, sizeof(RegrSlopeState));
> Not addressed yet.
As discussed, to maintain uniformity across all the other aggregate functions, 
it'd be better to stick with memset(). Open for discussion.


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@418
PS1, Line 418:     int64_t nB = src_state->count;
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@421
PS1, Line 421:       return;
> Could use else if, and then no need for the condition "nA != 0".
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@442
PS1, Line 442:
> line too long (93 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@452
PS1, Line 452:   }
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@453
PS1, Line 453: regr
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@457
PS1, Line 457: DoubleVal AggregateFunctions::RegrSlopeFinalize(FunctionContext* 
ctx,
> line too long (93 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@459
PS1, Line 459:
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@468
PS1, Line 468: DoubleVal 
AggregateFunctions::RegrInterceptGetValue(FunctionContext* ctx,
> line too long (97 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@479
PS1, Line 479:   double regrIntercept = state->yavg - (regrSlope * state->xavg);
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@480
PS1, Line 480: DoubleVal
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@486
PS1, Line 486:     ctx->Free(src.ptr);
> line too long (97 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@488
PS1, Line 488:
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@690
PS1, Line 690:
> This condition is superfluous, it follows from the ones above, but in case
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@691
PS1, Line 691: DoubleVal AggregateFunctions::Regr_r2GetValue(FunctionContext* 
ctx,
> See comment on L303 about the independent and dependent variables, it appli
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@692
PS1, Line 692:     const StringVal& src) {
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@699
PS1, Line 699: state->xvar <= 0.0 ||
             :       state->yvar <= 0.0
> Not addressed yet. It is important that we create tests for this case.
Under process, I ran a bunch of queries but couldn't find a case where yvar 
becomes negative, am still looking for it and will update the new patch with it.


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@703
PS1, Line 703: n
> Please find a meaningful name for 'r'.
I think r_squared should make sense. Please let me know if we should change it 
to something else.


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@705
PS1, Line 705:     double r_squared = state->xvar * state->yvar;
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@706
PS1, Line 706:     // Returns null if xvar and yvar are very small and their 
product
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@707
PS1, Line 707: e of floating point
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@708
PS1, Line 708: quared == 0.0) return DoubleVal::nu
> Done
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/be/src/exprs/aggregate-functions-ir.cc@712
PS1, Line 712: DoubleVal AggregateFunctions::Regr_r2Finalize(FunctionContext* 
ctx,
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/19569/2/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/2/be/src/exprs/aggregate-functions-ir.cc@299
PS2, Line 299: // 
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/REGR_-Linear-Regression-Functions.html#GUID-A675B68F-2A88-4843-BE2C-FCDE9C65F9A9
> I think URLs should be an exception to the line width rule as if they are b
Done


http://gerrit.cloudera.org:8080/#/c/19569/2/be/src/exprs/aggregate-functions-ir.cc@301
PS2, Line 301: // regr_slope() formula used:
> Nit: add a newline after the link.
Done


http://gerrit.cloudera.org:8080/#/c/19569/2/be/src/exprs/aggregate-functions-ir.cc@302
PS2, Line 302: x, y
> Although covar_pop is commutative, it's more conventional to write covar_po
Done


http://gerrit.cloudera.org:8080/#/c/19569/2/be/src/exprs/aggregate-functions-ir.cc@304
PS2, Line 304: regr_intercept
> Should be 'regr_intercepr(y, x)'.
Done


http://gerrit.cloudera.org:8080/#/c/19569/2/be/src/exprs/aggregate-functions-ir.cc@305
PS2, Line 305: wh
> Nit: no need for these slashes.
Done


http://gerrit.cloudera.org:8080/#/c/19569/2/be/src/exprs/aggregate-functions-ir.cc@423
PS2, Line 423:       double yavgA = dst_state->yavg;
> Start the 'else if' on the previous line after the '}'.
Done


http://gerrit.cloudera.org:8080/#/c/19569/2/be/src/exprs/aggregate-functions-ir.cc@423
PS2, Line 423: yavgA =
> No need for this as if nA==0 then we go into the IF-THEN block, not the ELS
Done


http://gerrit.cloudera.org:8080/#/c/19569/2/be/src/exprs/aggregate-functions-ir.cc@460
PS2, Line 460:     ctx->Free(src.ptr);
> Nit: unnecessary line.
Done


http://gerrit.cloudera.org:8080/#/c/19569/2/be/src/exprs/aggregate-functions-ir.cc@488
PS2, Line 488:   }
> Nit: unnecessary line.
Done


http://gerrit.cloudera.org:8080/#/c/19569/2/be/src/exprs/aggregate-functions-ir.cc@689
PS2, Line 689: Note th
> Should be 'regr_r2(y, x)', also for the next two lines.
Done


http://gerrit.cloudera.org:8080/#/c/19569/2/be/src/exprs/aggregate-functions-ir.cc@691
PS2, Line 691:
             :     const StringVal&
> If we keep this:
Done


http://gerrit.cloudera.org:8080/#/c/19569/2/be/src/exprs/aggregate-functions-ir.cc@706
PS2, Line 706: l
> This should be 'yvar', see the formula.
Done


http://gerrit.cloudera.org:8080/#/c/19569/2/be/src/exprs/aggregate-functions-ir.cc@706
PS2, Line 706:   //
> Start it on the previous line, after the '}'.
Done


http://gerrit.cloudera.org:8080/#/c/19569/2/be/src/exprs/aggregate-functions-ir.cc@706
PS2, Line 706: nd yvar
> If we have an ELSE or ELSE IF branch, we always use blocks, i.e.
Noted, done!


http://gerrit.cloudera.org:8080/#/c/19569/2/be/src/exprs/aggregate-functions-ir.cc@708
PS2, Line 708:     if (r_squared == 0.0) return DoubleVal::null();
> Am I right that the formula should be
Makes sense, updated!


http://gerrit.cloudera.org:8080/#/c/19569/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/19569/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1369
PS1, Line 1369:
> Not addressed yet.
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1394
PS1, Line 1394:
> Not addressed yet.
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1415
PS1, Line 1415:             prefix + 
"17RegrSlopeFinalizeEPN10impala_udf15FunctionContextERKNS1_9StringValE",
> Not addressed yet.
Done


http://gerrit.cloudera.org:8080/#/c/19569/1/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/19569/1/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2157
PS1, Line 2157: NULL,NULL,NULL
> Why are these NULLs? Is it correct?
Yes, the results match with hive.


http://gerrit.cloudera.org:8080/#/c/19569/1/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2443
PS1, Line 2443: state->xvar <= 0.0 || state->yvar <= 0.0
> See L699 in aggregate-functions-ir.cc about why we can't handle these two c
Under process and discussion.



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Tue, 04 Apr 2023 21:23:26 +0000
Gerrit-HasComments: Yes

Reply via email to