Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 )
Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() ...................................................................... Patch Set 10: (8 comments) Thanks, it is getting better and better. We'll still need to find out more about the problem of negative covariances of the dependent variable. http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@712 PS9, Line 712: d_squared' can only be 0 if eit > No, because covariance will also get divided by state->count which will can I see. In this case I don't think we should divide 'independent_var' and 'dependent_var' in the other places either, they are just checks for zero-ness. If they are not zero before division it means they shouldn't be zero after division either, the only way they may become zero is by floating point rounding. But we never actually use the divided values later, so we don't have to treat them as zero in this case. This involves L705-706 and L708 in Patch Set 10. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@683 PS10, Line 683: // 1 if (var_pop(y) = 0) and (var_pop(x) != 0), else Add one more space so that "1" aligns with "NULL". Also applies to the next line. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@683 PS10, Line 683: ( I don't see much value in surrounding "var_pop(y) = 0" and "var_pop(x) != 0" with parentheses. What I thought of was ... 1 if var_pop(y) = 0 (and var_pop(x) != 0), else ... This puts "var_pop(x) != 0" in parentheses because it is redundant - it is implied by the previous line as it ends with "else". I would actually even omit "var_pop(x) != 0". http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@700 PS10, Line 700: // dependent_var and independent_var become negative in certain cases due to floating Nit: I would leave an empty line before this. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@706 PS10, Line 706: < I've done some experiments with Hive and indeed we get the results when we treat this case like this, but it needs more investigation because it is not clear why Hive returns NULL in this case, for example for this query: select s_store_sk, regr_r2(s_number_employees, s_floor_space) over (partition by s_city order by s_store_sk rows between 1 preceding and 1 following) from tpcds.store; It doesn't seem to handle negative values here: https://github.com/apache/hive/blob/aa1e067033ef0b5468f725cfd3776810800af96d/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFBinarySetFunctions.java#L331 I'll try to do some more digging. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@713 PS10, Line 713: are This is the one that should be "is", as in "if either 'dependent_var' or 'independent_var' is 0". In the next sentence it should be "are", as in "if both 'dependent_var' and 'independent_var' are very small". http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2444 PS7, Line 2444: dependednt It is the dependent variable that becomes negative here. http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2695 PS7, Line 2695: independe It is the independent var that becomes negative here. -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 10 Gerrit-Owner: Anonymous Coward <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Thu, 18 May 2023 13:05:37 +0000 Gerrit-HasComments: Yes
