Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 )
Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() ...................................................................... Patch Set 7: (10 comments) http://gerrit.cloudera.org:8080/#/c/19569/7//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/19569/7//COMMIT_MSG@18 PS7, Line 18: regr_r2() takes two arguments of numeric type and returns the coefficient This line is still too long (73 vs. 73). http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@678 PS7, Line 678: // Implementation of regr_r2(): In source files the max line width is 90. Many lines here are shorter than that for no reason. We should normally not use shorter lines, only if a line break has a reason, for example a new paragraph. http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@679 PS7, Line 679: is Nit: no need for this 'is'. http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@684 PS7, Line 684: // regr_2(dependent, independent) = NULL if var_pop(independent) = 0, else I think using 'dependent' and 'independent' here in the doc comment is overkill, you see 'y' and 'x' in most descriptions, where 'y' is the dependent var and 'x' is the independent var. http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@685 PS7, Line 685: ( You left out var_pop. http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@698 PS7, Line 698: double dependent_var = state->xvar; We should add a comment describing why we assign 'state->xvar' to 'dependent' and not to 'independent' (and vice versa). Something like this: In this function we use 'dependent_var' and 'independent_var' instead of 'y_var' and 'x_var'. This is to avoid confusion, because for regr_r2() the dependent variable is the first parameter and the independent variable is the second parameter, but CorrUpdate(), which we use to produce the intermediate values, has the opposite order. Our aggregate function framework passes the variables in order to CorrUpdate(), so in CorrUpdate() 'x' corresponds to the dependent variable of regr_r2() and 'y' to the independent variable of regr_r2(). http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@702 PS7, Line 702: dependent_var < 0.0 This still doesn't seem right. If 'dependent_var' is less than zero, then we should treat it as zero, but if at the same time 'independent_var' is not zero, then we should return 1, like on L705. Maybe it's easier if we do the following: First check if independent_var is negative, if it is we set it to 0. Then we do the same with dependent_var. We remove the checks for negativity from L702, so we only have (state->count < 2 || independent_var == 0) as the condition. http://gerrit.cloudera.org:8080/#/c/19569/7/be/src/exprs/aggregate-functions-ir.cc@708 PS7, Line 708: / Returns null if dependent_var and independent_var are very small and : // if their product might become 0 because of floating point underflow. This is unnecessary here, the following lines contain this information. http://gerrit.cloudera.org:8080/#/c/19569/7/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/19569/7/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1344 PS7, Line 1344: Nit: trailing whitespace, should be removed. http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2398 PS7, Line 2398: (state->xvar <= 0.0 || state->yvar <= 0.0) This is not accurate, we shouldn't handle these cases as the same. See comment in AggregateFunctions::Regr_r2GetValue(). Applies also to the other tests. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 7 Gerrit-Owner: Anonymous Coward <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Thu, 27 Apr 2023 14:47:18 +0000 Gerrit-HasComments: Yes
