Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17008 )
Change subject: IMPALA-10463: Implement ds_theta_sketch() and ds_theta_estimate() functions ...................................................................... Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG@7 PS2, Line 7: ds_theta_estimate > nit: typo Done http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG@13 PS2, Line 13: ds_theta_estimate > nit: same typo Done http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG@28 PS2, Line 28: see IMPALA-10464. > I'd also include some highlights from that perf measurement doc into the co Done http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/aggregate-functions-ir.cc@1646 PS2, Line 1646: SerializeDsThetaSketch( > In contrast with HLL as I see Theta doesn't compact the sketch just seriali Done http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/aggregate-functions-ir.cc@1899 PS2, Line 1899: or dst->len == sizeof(datasketches::theta_union)); > I;m a bit lost here. Could you help me understand why is it needed to conve Previously, it was processed along the idea that the size of dst is unchanged, and it is better to return union_sketch. http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/datasketches-functions-ir.cc File be/src/exprs/datasketches-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/datasketches-functions-ir.cc@110 PS2, Line 110: return 0; > HLL returns a null here. Have you checked the behaviour in Hive to be in sy Comparing the test cases of HLL and Theta, the results are different. Theta: https://github.com/apache/datasketches-hive/blob/master/src/test/java/org/apache/datasketches/hive/theta/EstimateSketchUDFTest.java#L34 HLL: https://github.com/apache/datasketches-hive/blob/master/src/test/java/org/apache/datasketches/hive/hll/SketchToEstimateUDFTest.java#L31 http://gerrit.cloudera.org:8080/#/c/17008/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test File testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test: http://gerrit.cloudera.org:8080/#/c/17008/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@138 PS2, Line 138: # Check that ds_theta_estimate returns error for strings that are not serialized sketches. > Please add a test when ds_theta_estimate() is used on an HLL sketch. I gues Done -- To view, visit http://gerrit.cloudera.org:8080/17008 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc Gerrit-Change-Number: 17008 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu <[email protected]> Gerrit-Reviewer: Fucun Chu <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Comment-Date: Wed, 10 Feb 2021 13:38:14 +0000 Gerrit-HasComments: Yes
