[email protected] has posted comments on this change. ( http://gerrit.cloudera.org:8080/18413 )
Change subject: IMPALA-11205: Implement Statistical functions : CORR(), COVAR_SAMP() and COVAR_POP() ...................................................................... Patch Set 19: (10 comments) > Patch Set 18: > > (1 comment) http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@296 PS18, Line 296: // https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online > could you explain more about how the `CorrState` is updated and used to cal Added a few links http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@408 PS18, Line 408: tm_src2.ToSubsecondUnixTime(UTCPTR, &val2)) { > Skipping NaN here seems questionable? Please add comments or handle appropr Yeah, its fixed now. http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@422 PS18, Line 422: long nB = src_state->count; > Does (nB == 0) need handling? nB is the count of src->state. nB == 0 has been accounted for in CorrInit() and CorrRemove(). We alse make sure that src.ptr is not equal to nullptr. http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@462 PS18, Line 462: return DoubleVal(corr); > Add an assert that negative values are small and comment that this can happ Sure,added, I've taken the tolerance value to be around -1E-8 as it seemed safe to me. We can probably go for smaller values like -1E-10 as well. Please let me know what would be a suitable tolerance value. http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@543 PS18, Line 543: DCHECK_EQ(sizeof(CovarState), dst->len); > Skipping Nan? Updated in the latest patch. http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@575 PS18, Line 575: } > Skipping NaN Updated in the latest patch. http://gerrit.cloudera.org:8080/#/c/18413/18/be/src/exprs/aggregate-functions-ir.cc@589 PS18, Line 589: return; > handle (nB == 0) ? nB is the count of src->state. nB == 0 has been accounted for in CorrInit() and CorrRemove(). We alse make sure that src.ptr is not equal to nullptr. http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test File testdata/workloads/functional-query/queries/QueryTest/aggregation.test: http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1526 PS18, Line 1526: from tpcds.store; > This may be non-deterministic (flaky) without an order by on the window cla Yeah, its fixed in the latest patch http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1544 PS18, Line 1544: select id, double_col, corr(double_col, int_col) over (partition by month order by id) from functional.alltypes > This may be non-deterministic (flaky) without an order by on the window cla Yeah, its fixed in the latest patch http://gerrit.cloudera.org:8080/#/c/18413/18/testdata/workloads/functional-query/queries/QueryTest/aggregation.test@1578 PS18, Line 1578: select s_store_sk, corr(s_number_employees, s_floor_space) over (partition by s_city order by s_store_sk > Check if order by city is strong ordering. It wasn't strong ordering, I've changed it to s_store_sk which contains unique values. -- To view, visit http://gerrit.cloudera.org:8080/18413 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I32ad627c953ba24d9cde2d5549bdd0d27a9c0d06 Gerrit-Change-Number: 18413 Gerrit-PatchSet: 19 Gerrit-Owner: Anonymous Coward <[email protected]> Gerrit-Reviewer: Anonymous Coward <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Jian Zhang <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Mon, 06 Jun 2022 17:23:00 +0000 Gerrit-HasComments: Yes
