Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16656 )
Change subject: IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions
......................................................................


Patch Set 4: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16656/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16656/4//COMMIT_MSG@28
PS4, Line 28: Ran manual tests on tpch_parquet.lineitem to compare perfomance
            : with ndv(). Depending on data characteristics ndv() appears 2x-3x
            : faster. CPC gives closer estimate than current ndv(). CPC is more
            : accurate than HLL in some cases
Have you compared CPC and HLL in terms of runtime performance? It would be nice to see whether either of them is faster. I see the link to the comparison above; I would just like to see some numbers from running these algorithms through Impala.

Additionally, could you share more details on how CPC and HLL compare in terms of accuracy? You mention that CPC is more accurate in some cases. Could you describe which cases these are and what the difference between the algorithms is?


--
To view, visit http://gerrit.cloudera.org:8080/16656

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
Gerrit-Change-Number: 16656
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Wed, 02 Dec 2020 08:07:17 +0000
Gerrit-HasComments: Yes
