Fucun Chu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17008 )

Change subject: IMPALA-10463: Implement ds_theta_sketch() and 
ds_theta_estimate() functions
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG@7
PS2, Line 7: ds_theta_estimate
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG@13
PS2, Line 13: ds_theta_estimate
> nit: same typo
Done


http://gerrit.cloudera.org:8080/#/c/17008/2//COMMIT_MSG@28
PS2, Line 28:    see IMPALA-10464.
> I'd also include some highlights from that perf measurement doc into the co
Done


http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/aggregate-functions-ir.cc@1646
PS2, Line 1646: SerializeDsThetaSketch(
> In contrast with HLL as I see Theta doesn't compact the sketch just seriali
Done


http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/aggregate-functions-ir.cc@1899
PS2, Line 1899:       or dst->len == sizeof(datasketches::theta_union));
> I;m a bit lost here. Could you help me understand why is it needed to conve
Previously, it was processed along the idea that the size of dst is unchanged, 
and it is better to return union_sketch.


http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/datasketches-functions-ir.cc
File be/src/exprs/datasketches-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17008/2/be/src/exprs/datasketches-functions-ir.cc@110
PS2, Line 110: return 0;
> HLL returns a null here. Have you checked the behaviour in Hive to be in sy
Comparing the test cases of HLL and Theta, the results are different.
Theta:
https://github.com/apache/datasketches-hive/blob/master/src/test/java/org/apache/datasketches/hive/theta/EstimateSketchUDFTest.java#L34
HLL:
https://github.com/apache/datasketches-hive/blob/master/src/test/java/org/apache/datasketches/hive/hll/SketchToEstimateUDFTest.java#L31


http://gerrit.cloudera.org:8080/#/c/17008/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
File 
testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test:

http://gerrit.cloudera.org:8080/#/c/17008/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@138
PS2, Line 138: # Check that ds_theta_estimate returns error for strings that 
are not serialized sketches.
> Please add a test when ds_theta_estimate() is used on an HLL sketch. I gues
Done



--
To view, visit http://gerrit.cloudera.org:8080/17008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14f24c16b815eec75cf90bb92c8b8b0363dcbfbc
Gerrit-Change-Number: 17008
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu <[email protected]>
Gerrit-Reviewer: Fucun Chu <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Wed, 10 Feb 2021 13:38:14 +0000
Gerrit-HasComments: Yes

Reply via email to