[email protected] has posted comments on this change. ( http://gerrit.cloudera.org:8080/19569 )
Change subject: IMPALA-11957: Implement Regression functions: regr_slope(), regr_intercept() and regr_r2() ...................................................................... Patch Set 11: (23 comments) > Patch Set 11: > > (1 comment) http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/9/be/src/exprs/aggregate-functions-ir.cc@712 PS9, Line 712: as 0. > I see. In this case I don't think we should divide 'independent_var' and 'd I see, but if the values become zero due to floating point rounding, shouldn't we deal with those specifically, as in return null or 1. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@317 PS10, Line 317: *(reinterpret_cast<RegrSlopeState*>(dst->ptr)) = {}; > Probably faster to use: Sure, after running a pseudo code, it does perform a little better. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@342 PS10, Line 342: *(reinterpret_cast<RegrSlopeState*>(sizeof(RegrSlopeState))) = {}; > Probably faster to use: Sure, after running a pseudo code, it does perform a little better. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@386 PS10, Line 386: if (tm_src1.ToSubsecondUnixTime(UTCPTR, &val1) && > Can UtcToUnixTime be used here? Even better to add a function to convert di Unfortunately no, since UtcToUnixTime takes a time_t* argument which would defeat the purpose of the whole operation. ToSubsecondUnixTime() takes a double* argument which converts the timestamp into double values. These values can then be used in the standard update and remove functions. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@402 PS10, Line 402: if (tm_src1.ToSubsecondUnixTime(UTCPTR, &val1) && > Can UtcToUnixTime be used here? Unfortunately no, since UtcToUnixTime takes a time_t* argument which would defeat the purpose of the whole operation. ToSubsecondUnixTime() takes a double* argument which converts the timestamp into double values. These values can then be used in the standard update and remove functions. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@410 PS10, Line 410: const RegrSlopeState* src_state = reinterpret_cast<RegrSlopeState*>(src.ptr); > declare const Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@415 PS10, Line 415: int64_t nA = dst_state->count; > It's often faster to not copy these member references to variables, especia Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@423 PS10, Line 423: > src fields are not updated and do not need to be copied. Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@443 PS10, Line 443: // Since these values are very small, they can be ignored and rounded to 0. > Declare as const Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@451 PS10, Line 451: DoubleVal AggregateFunctions::RegrSlopeFinalize(FunctionContext* ctx, > eliminate this variable. Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@472 PS10, Line 472: stat > define as constant or use an existing one Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@476 PS10, Line 476: DoubleVal AggregateFunctions::RegrInterceptFinalize(FunctionContext* ctx, > Check if removing these variables improves code generation. @Daniel, can we discuss about this offline. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@683 PS10, Line 683: const StringVal& src) { > Add one more space so that "1" aligns with "NULL". Also applies to the next Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@683 PS10, Line 683: r > I don't see much value in surrounding "var_pop(y) = 0" and "var_pop(x) != 0 Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@689 PS10, Line 689: // CorrUpdate(), which we use to produce the intermediate values, has the opposite > Declare const Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@700 PS10, Line 700: DCHECK(independent_var >= FloatingErrorEstimator); > Nit: I would leave an empty line before this. Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@706 PS10, Line 706: > I've done some experiments with Hive and indeed we get the results when we I checked the behavior of a few queries with hive and postgreSQL, our code is in sync with them. Please let me know if you find any ambiguities. http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@713 PS10, Line 713: == > This is the one that should be "is", as in "if either 'dependent_var' or 'i Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@729 PS10, Line 729: // population covariance between two columns of numeric types respectively using > Changing function to have one Free call should reduce code size Done http://gerrit.cloudera.org:8080/#/c/19569/10/be/src/exprs/aggregate-functions-ir.cc@751 PS10, Line 751: // my_n = my_(n - 1) + [y_n - my_(n - 1)] / n > Probably faster to use: Done, probably would be better to do this in another patch, or if its okay I can do the changes in other functions in the same patch. Kindly let me know. http://gerrit.cloudera.org:8080/#/c/19569/10/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java File fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java: http://gerrit.cloudera.org:8080/#/c/19569/10/fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java@1344 PS10, Line 1344: > nit: extra WS Done http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2444 PS7, Line 2444: dependednt > It is the dependent variable that becomes negative here. Done http://gerrit.cloudera.org:8080/#/c/19569/7/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@2695 PS7, Line 2695: independe > It is the independent var that becomes negative here. Done -- To view, visit http://gerrit.cloudera.org:8080/19569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab6bd84ae3e0c02ec924c30183308123b951caa3 Gerrit-Change-Number: 19569 Gerrit-PatchSet: 11 Gerrit-Owner: Anonymous Coward <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Daniel Becker <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Wed, 14 Jun 2023 18:51:53 +0000 Gerrit-HasComments: Yes
