[email protected] has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19569 )

Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), 
regr_intercept() and regr_r2()
......................................................................


Patch Set 11:

(23 comments)

> Patch Set 11:
>
> (1 comment)

http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@712
PS9, Line 712: as 0.
> I see. In this case I don't think we should divide 'independent_var' and 'd
I see, but if the values become zero due to floating point rounding, shouldn't 
we deal with those specifically, as in return null or 1.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@317
PS10, Line 317:   *(reinterpret_cast<RegrSlopeState*>(dst->ptr)) = {};
> Probably faster to use:
Sure, after running a pseudo code, it does perform a little better.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@342
PS10, Line 342:     
*(reinterpret_cast<RegrSlopeState*>(sizeof(RegrSlopeState))) = {};
> Probably faster to use:
Sure, after running a pseudo code, it does perform a little better.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@386
PS10, Line 386:   if (tm_src1.ToSubsecondUnixTime(UTCPTR, &val1) &&
> Can UtcToUnixTime be used here? Even better to add a function to convert di
Unfortunately no, since UtcToUnixTime takes a time_t* argument which would 
defeat the purpose of the whole operation. ToSubsecondUnixTime() takes a 
double* argument which converts the timestamp into double values. These values 
can then be used in the standard update and remove functions.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@402
PS10, Line 402:   if (tm_src1.ToSubsecondUnixTime(UTCPTR, &val1) &&
> Can UtcToUnixTime be used here?
Unfortunately no, since UtcToUnixTime takes a time_t* argument which would 
defeat the purpose of the whole operation. ToSubsecondUnixTime() takes a 
double* argument which converts the timestamp into double values. These values 
can then be used in the standard update and remove functions.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@410
PS10, Line 410:   const RegrSlopeState* src_state = 
reinterpret_cast<RegrSlopeState*>(src.ptr);
> declare const
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@415
PS10, Line 415:     int64_t nA = dst_state->count;
> It's often faster to not copy these member references to variables, especia
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@423
PS10, Line 423:
> src fields are not updated and do not need to be copied.
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@443
PS10, Line 443:   // Since these values are very small, they can be ignored and 
rounded to 0.
> Declare as const
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@451
PS10, Line 451: DoubleVal 
AggregateFunctions::RegrSlopeFinalize(FunctionContext* ctx,
> eliminate this variable.
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@472
PS10, Line 472:  stat
> define as constant or use an existing one
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@476
PS10, Line 476: DoubleVal 
AggregateFunctions::RegrInterceptFinalize(FunctionContext* ctx,
> Check if removing these variables improves code generation.
@Daniel, can we discuss about this offline.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@683
PS10, Line 683:     const StringVal& src) {
> Add one more space so that "1" aligns with "NULL". Also applies to the next
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@683
PS10, Line 683: r
> I don't see much value in surrounding "var_pop(y) = 0" and "var_pop(x) != 0
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@689
PS10, Line 689:   // CorrUpdate(), which we use to produce the intermediate 
values, has the opposite
> Declare const
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@700
PS10, Line 700:   DCHECK(independent_var >= FloatingErrorEstimator);
> Nit: I would leave an empty line before this.
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@706
PS10, Line 706:
> I've done some experiments with Hive and indeed we get the results when we
I checked the behavior of a few queries with hive and postgreSQL, our code is 
in sync with them. Please let me know if you find any ambiguities.


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@713
PS10, Line 713: ==
> This is the one that should be "is", as in "if either 'dependent_var' or 'i
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@729
PS10, Line 729: // population covariance between two columns of numeric types 
respectively using
> Changing function to have one Free call should reduce code size
Done


http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@751
PS10, Line 751:   // my_n = my_(n - 1) + [y_n - my_(n - 1)] / n
> Probably faster to use:
Done, probably would be better to do this in another patch, or if its okay I 
can do the changes in other functions in the same patch. Kindly let me know.


http://gerrit.cloudera.org:8080/#/c/19569/10/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java:

http://gerrit.cloudera.org:8080/#/c/19569/10/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1344
PS10, Line 1344:
> nit: extra WS
Done


http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test
File testdata/workloads/functional-query/queries/QueryTest/aggregation.test:

http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2444
PS7, Line 2444: dependednt
> It is the dependent variable that becomes negative here.
Done


http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2695
PS7, Line 2695: independe
> It is the independent var that becomes negative here.
Done



--
To view, visit http://gerrit.cloudera.org:8080/19569
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3
Gerrit-Change-Number: 19569
Gerrit-PatchSet: 11
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Wed, 14 Jun 2023 18:51:53 +0000
Gerrit-HasComments: Yes

Reply via email to