Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
......................................................................


Patch Set 10:

(8 comments)

Thanks, it is getting better and better. We'll still need to find out more 
about the problem of negative covariances of the dependent variable.

http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@712
PS9, Line 712: d_squared' can only be 0 if eit
> No, because covariance will also get divided by state->count which will can
I see. In this case I don't think we should divide 'independent_var' and 
'dependent_var' in the other places either, they are just checks for zero-ness.

If they are not zero before division it means they shouldn't be zero after 
division either, the only way they may become zero is by floating point 
rounding. But we never actually use the divided values later, so we don't have 
to treat them as zero in this case.
This involves L705-706 and L708 in Patch Set 10.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@683
PS10, Line 683: //               1 if (var_pop(y) = 0) and (var_pop(x) != 0), 
else
Add one more space so that "1" aligns with "NULL". Also applies to the next 
line.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@683
PS10, Line 683: (
I don't see much value in surrounding "var_pop(y) = 0" and "var_pop(x) != 0" 
with parentheses. What I thought of was

  ...
  1 if var_pop(y) = 0 (and var_pop(x) != 0), else
  ...

This puts "var_pop(x) != 0" in parentheses because it is redundant - it is 
implied by the previous line as it ends with "else". I would actually even omit 
"var_pop(x) != 0".


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@700
PS10, Line 700:   // dependent_var and independent_var become negative in 
certain cases due to floating
Nit: I would leave an empty line before this.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@706
PS10, Line 706: <
I've done some experiments with Hive and indeed we get the results when we 
treat this case like this, but it needs more investigation because it is not 
clear why Hive returns NULL in this case, for example for this query:

select s_store_sk, regr_r2(s_number_employees, s_floor_space) over (partition 
by s_city order by s_store_sk
 rows between 1 preceding and 1 following) from tpcds.store;

It doesn't seem to handle negative values here: 
https://github.com/apache/hive/blob/aa1e067033ef0b5468f725cfd3776810800af96d/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFBinarySetFunctions.java#L331

I'll try to do some more digging.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@713
PS10, Line 713: are
This is the one that should be "is", as in "if either 'dependent_var' or 
'independent_var' is 0".

In the next sentence it should be "are", as in "if both 'dependent_var' and 
'independent_var' are very small".


http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2444
PS7, Line 2444: dependednt
It is the dependent variable that becomes negative here.


http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2695
PS7, Line 2695: independe
It is the independent var that becomes negative here.



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 10
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Thu, 18 May 2023 13:05:37 +0000
Gerrit-HasComments: Yes

Reply via email to