Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16656 )

Change subject: IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions
......................................................................


Patch Set 4: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16656/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16656/4//COMMIT_MSG@28
PS4, Line 28:  Ran manual tests on tpch_parquet.lineitem to compare perfomance
            :    with ndv(). Depending on data characteristics ndv() appears 2x-3x
            :    faster. CPC gives closer estimate than current ndv(). CPC is more
            :    accurate than HLL in some cases
Have you compared CPC and HLL in terms of runtime performance? It would be nice
to see whether either of them is faster. I see the link to the comparison above,
but I would also like to see some numbers from running these algorithms through
Impala.

Additionally, could you share more details on how CPC and HLL compare in terms
of accuracy? You mention that CPC is more accurate in some cases. Could you say
which cases these are and what the difference between the algorithms is?



--
To view, visit http://gerrit.cloudera.org:8080/16656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
Gerrit-Change-Number: 16656
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Wed, 02 Dec 2020 08:07:17 +0000
Gerrit-HasComments: Yes
