[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function .. IMPALA-10017: Implement ds_kll_union() function This function receives a set of serialized Apache DataSketches KLL sketches produced by ds_kll_sketch() and merges them into a single sketch. An example usage is to create a sketch for each partition of a table, write these sketches to a separate table and, based on which partitions the user is interested in, union the relevant sketches together to get an estimate. E.g.: SELECT ds_kll_quantile(ds_kll_union(sketch_col), 0.5) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Testing: - Apart from the automated tests I added to this patch, I also tested ds_kll_union() on a bigger dataset to check that the serialization, deserialization and merging steps work well. I took TPCH25.lineitem, created a number of sketches by grouping on l_shipdate and called ds_kll_union() on those sketches.
Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Reviewed-on: http://gerrit.cloudera.org:8080/16267 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/data/README A testdata/data/kll_sketches_from_impala.parquet M testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test M tests/query_test/test_datasketches.py 7 files changed, 204 insertions(+), 39 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 10 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
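The partition-wise workflow from the commit message can be illustrated with a toy mergeable summary. This is only a sketch under stated assumptions: ToySketch, UnionPartitions and the in-memory sketch_tbl map are hypothetical names made up for illustration, and ToySketch simply stores every value, so it is exact and unbounded, unlike a real KLL sketch. It is not the code from this patch; it only shows why per-partition sketches can be combined after the fact.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical stand-in for a KLL sketch. It keeps every value, so it is
// exact and unbounded in size; a real KLL sketch keeps a bounded sample.
struct ToySketch {
  std::vector<double> values;

  void Update(double v) { values.push_back(v); }

  // Union: absorb another sketch, the role ds_kll_union() plays across rows.
  void Merge(const ToySketch& other) {
    values.insert(values.end(), other.values.begin(), other.values.end());
  }

  // Value at the given rank in [0, 1]. The sketch must be non-empty.
  double Quantile(double rank) const {
    assert(!values.empty());
    std::vector<double> sorted(values);
    std::sort(sorted.begin(), sorted.end());
    std::size_t idx = static_cast<std::size_t>(rank * (sorted.size() - 1));
    return sorted[idx];
  }
};

// Merge only the sketches of the requested partitions, mirroring
// WHERE partition_col=1 OR partition_col=5 in the example query.
ToySketch UnionPartitions(const std::map<int, ToySketch>& sketch_tbl,
                          const std::vector<int>& partitions) {
  ToySketch result;
  for (int p : partitions) {
    auto it = sketch_tbl.find(p);
    if (it != sketch_tbl.end()) result.Merge(it->second);
  }
  return result;
}
```

Calling UnionPartitions(sketch_tbl, {1, 5}).Quantile(0.5) then corresponds to the ds_kll_quantile(ds_kll_union(sketch_col), 0.5) query above, with the map standing in for the separate sketch table.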
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 9 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 08 Aug 2020 11:50:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6253/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 9 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 08 Aug 2020 06:35:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 9 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 08 Aug 2020 06:35:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function .. Patch Set 8: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6252/ -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 08 Aug 2020 01:16:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6839/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 07 Aug 2020 21:28:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 07 Aug 2020 21:04:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function .. Patch Set 8: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6252/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 07 Aug 2020 21:04:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function .. Patch Set 7: Code-Review+2 Carry +2 from Csaba -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 07 Aug 2020 21:04:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function .. Patch Set 7: PS7 is a rebase with master. -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 07 Aug 2020 21:03:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Hello Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16267 to look at the new patch set (#7). Change subject: IMPALA-10017: Implement ds_kll_union() function .. IMPALA-10017: Implement ds_kll_union() function This function receives a set of serialized Apache DataSketches KLL sketches produced by ds_kll_sketch() and merges them into a single sketch. An example usage is to create a sketch for each partition of a table, write these sketches to a separate table and, based on which partitions the user is interested in, union the relevant sketches together to get an estimate. E.g.: SELECT ds_kll_quantile(ds_kll_union(sketch_col), 0.5) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Testing: - Apart from the automated tests I added to this patch, I also tested ds_kll_union() on a bigger dataset to check that the serialization, deserialization and merging steps work well. I took TPCH25.lineitem, created a number of sketches by grouping on l_shipdate and called ds_kll_union() on those sketches.
Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/data/README A testdata/data/kll_sketches_from_impala.parquet M testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test M tests/query_test/test_datasketches.py 7 files changed, 204 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/16267/7 -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6824/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 07 Aug 2020 10:56:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6823/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 07 Aug 2020 10:45:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function ..
Patch Set 6: (1 comment)
http://gerrit.cloudera.org:8080/#/c/16267/3/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:
http://gerrit.cloudera.org:8080/#/c/16267/3/be/src/exprs/aggregate-functions-ir.cc@1851
PS3, Line 1851:
> Can you also add a similar block for HLL (line 1796)? It is ok to do that i
Done
-- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 07 Aug 2020 10:44:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Hello Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16267 to look at the new patch set (#6). Change subject: IMPALA-10017: Implement ds_kll_union() function .. IMPALA-10017: Implement ds_kll_union() function This function receives a set of serialized Apache DataSketches KLL sketches produced by ds_kll_sketch() and merges them into a single sketch. An example usage is to create a sketch for each partition of a table, write these sketches to a separate table and, based on which partitions the user is interested in, union the relevant sketches together to get an estimate. E.g.: SELECT ds_kll_quantile(ds_kll_union(sketch_col), 0.5) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Testing: - Apart from the automated tests I added to this patch, I also tested ds_kll_union() on a bigger dataset to check that the serialization, deserialization and merging steps work well. I took TPCH25.lineitem, created a number of sketches by grouping on l_shipdate and called ds_kll_union() on those sketches.
Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/data/README A testdata/data/kll_sketches_from_impala.parquet M testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test M tests/query_test/test_datasketches.py 7 files changed, 204 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/16267/6 -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function .. Patch Set 5: PS5 is a rebase with master. -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 07 Aug 2020 10:24:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Hello Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16267 to look at the new patch set (#5). Change subject: IMPALA-10017: Implement ds_kll_union() function .. IMPALA-10017: Implement ds_kll_union() function This function receives a set of serialized Apache DataSketches KLL sketches produced by ds_kll_sketch() and merges them into a single sketch. An example usage is to create a sketch for each partition of a table, write these sketches to a separate table and, based on which partitions the user is interested in, union the relevant sketches together to get an estimate. E.g.: SELECT ds_kll_quantile(ds_kll_union(sketch_col), 0.5) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Testing: - Apart from the automated tests I added to this patch, I also tested ds_kll_union() on a bigger dataset to check that the serialization, deserialization and merging steps work well. I took TPCH25.lineitem, created a number of sketches by grouping on l_shipdate and called ds_kll_union() on those sketches.
Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/data/README A testdata/data/kll_sketches_from_impala.parquet M testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test M tests/query_test/test_datasketches.py 7 files changed, 199 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/16267/5 -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6806/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 06 Aug 2020 13:15:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function ..
Patch Set 4: Code-Review+2 (1 comment)
http://gerrit.cloudera.org:8080/#/c/16267/3/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:
http://gerrit.cloudera.org:8080/#/c/16267/3/be/src/exprs/aggregate-functions-ir.cc@1851
PS3, Line 1851: etch)) {
> The code you linked is unrelated here, it is for HLL sketches. However, the
Can you also add a similar block for HLL (line 1796)? It is OK to do that in another patch, but I think it is simplest to do it here.
-- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 06 Aug 2020 13:12:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function ..
Patch Set 4: (2 comments)
http://gerrit.cloudera.org:8080/#/c/16267/3/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:
http://gerrit.cloudera.org:8080/#/c/16267/3/be/src/exprs/aggregate-functions-ir.cc@1851
PS3, Line 1851: etch)) {
> Can you add a try-catch block? I randomly checked a deserialize function an
The code you linked is unrelated here; it is for HLL sketches. However, the one for KLL can also throw: https://github.com/apache/impala/blob/074731e2bcf37643710f2fdf236829991a462fc3/be/src/thirdparty/datasketches/kll_sketch_impl.hpp#L534 E.g. ensure_minimum_memory() throws. I haven't checked the rest, but I'll add a try-catch block. Thanks for spotting this.
Done
http://gerrit.cloudera.org:8080/#/c/16267/3/be/src/exprs/aggregate-functions-ir.cc@1922
PS3, Line 1922: DCHECK(!dst->is_null);
> merge can throw an exception, please put it in a try-catch block:
Done. I also found one occurrence above and changed that as well.
-- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 06 Aug 2020 12:52:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Hello Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16267 to look at the new patch set (#4). Change subject: IMPALA-10017: Implement ds_kll_union() function .. IMPALA-10017: Implement ds_kll_union() function This function receives a set of serialized Apache DataSketches KLL sketches produced by ds_kll_sketch() and merges them into a single sketch. An example usage is to create a sketch for each partition of a table, write these sketches to a separate table and, based on which partitions the user is interested in, union the relevant sketches together to get an estimate. E.g.: SELECT ds_kll_quantile(ds_kll_union(sketch_col), 0.5) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Testing: - Apart from the automated tests I added to this patch, I also tested ds_kll_union() on a bigger dataset to check that the serialization, deserialization and merging steps work well. I took TPCH25.lineitem, created a number of sketches by grouping on l_shipdate and called ds_kll_union() on those sketches.
Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/data/README A testdata/data/kll_sketches_from_impala.parquet M testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test M tests/query_test/test_datasketches.py 7 files changed, 199 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/16267/4 -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 4 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function ..
Patch Set 3: (2 comments)
http://gerrit.cloudera.org:8080/#/c/16267/3/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:
http://gerrit.cloudera.org:8080/#/c/16267/3/be/src/exprs/aggregate-functions-ir.cc@1851
PS3, Line 1851: deserialize
Can you add a try-catch block? I randomly checked a deserialize function and it can throw an exception: https://github.com/apache/impala/blob/074731e2bcf37643710f2fdf236829991a462fc3/be/src/thirdparty/datasketches/HllSketchImplFactory.hpp#L86 It would be nice to do this for other similar calls to datasketches functions too.
http://gerrit.cloudera.org:8080/#/c/16267/3/be/src/exprs/aggregate-functions-ir.cc@1922
PS3, Line 1922: dst_sketch->merge(src_sketch);
merge can throw an exception, please put it in a try-catch block: https://github.com/apache/impala/blob/074731e2bcf37643710f2fdf236829991a462fc3/be/src/thirdparty/datasketches/kll_sketch_impl.hpp#L182
-- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 05 Aug 2020 15:10:57 + Gerrit-HasComments: Yes
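The try-catch pattern requested in this review could look roughly like the following. This is a minimal sketch, not the code that landed in the patch: FakeSketch, TryDeserialize and TryMerge are hypothetical names, the throwing conditions are made up for illustration, and the real DataSketches deserialize and merge signatures are not reproduced here. The point is only that exceptions thrown by the library are caught at the call site and turned into a status the caller can check.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a sketch type whose operations may throw.
// This is not the DataSketches API.
struct FakeSketch {
  std::size_t k = 200;  // illustrative accuracy parameter
  std::size_t n = 0;    // illustrative item count

  static FakeSketch Deserialize(const std::string& bytes) {
    if (bytes.empty()) throw std::invalid_argument("empty sketch buffer");
    FakeSketch s;
    s.n = bytes.size();
    return s;
  }

  void Merge(const FakeSketch& other) {
    if (other.k != k) throw std::invalid_argument("incompatible sketches");
    n += other.n;
  }
};

// Returns false instead of letting an exception escape to the caller.
bool TryDeserialize(const std::string& bytes, FakeSketch* out) {
  assert(out != nullptr);
  try {
    *out = FakeSketch::Deserialize(bytes);
    return true;
  } catch (const std::exception&) {
    return false;
  }
}

// Same guard around merge; dst is left unchanged if merge throws
// before modifying it, as it does in this stand-in.
bool TryMerge(const FakeSketch& src, FakeSketch* dst) {
  assert(dst != nullptr);
  try {
    dst->Merge(src);
    return true;
  } catch (const std::exception&) {
    return false;
  }
}
```

A caller would check the returned flag and skip or report the offending input rather than continue with a half-initialized sketch. How the actual patch reports the failure is not shown in this thread.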
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6751/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 31 Jul 2020 14:09:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6750/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 31 Jul 2020 14:05:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16267 to look at the new patch set (#2). Change subject: IMPALA-10017: Implement ds_kll_union() function .. IMPALA-10017: Implement ds_kll_union() function This function receives a set of serialized Apache DataSketches KLL sketches produced by ds_kll_sketch() and merges them into a single sketch. An example usage is to create a sketch for each partition of a table, write these sketches to a separate table and, based on which partitions the user is interested in, union the relevant sketches together to get an estimate. E.g.: SELECT ds_kll_quantile(ds_kll_union(sketch_col), 0.5) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Testing: - Apart from the automated tests I added to this patch, I also tested ds_kll_union() on a bigger dataset to check that the serialization, deserialization and merging steps work well. I took TPCH25.lineitem, created a number of sketches by grouping on l_shipdate and called ds_kll_union() on those sketches. Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/data/README A testdata/data/kll_sketches_from_impala.parquet M testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test M tests/query_test/test_datasketches.py 7 files changed, 184 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/16267/2 -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 2 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10017: Implement ds_kll_union() function
Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16267 ) Change subject: IMPALA-10017: Implement ds_kll_union() function .. IMPALA-10017: Implement ds_kll_union() function This function receives a set of serialized Apache DataSketches KLL sketches produced by ds_kll_sketch() and merges them into a single sketch. An example usage is to create a sketch for each partition of a table, write these sketches to a separate table and, based on which partitions the user is interested in, union the relevant sketches together to get an estimate. E.g.: SELECT ds_kll_quantile(ds_kll_union(sketch_col), 0.5) FROM sketch_tbl WHERE partition_col=1 OR partition_col=5; Testing: - Apart from the automated tests I added to this patch, I also tested ds_kll_union() on a bigger dataset to check that the serialization, deserialization and merging steps work well. I took TPCH25.lineitem, created a number of sketches by grouping on l_shipdate and called ds_kll_union() on those sketches. Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/data/README M testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test M tests/query_test/test_datasketches.py 6 files changed, 184 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/16267/1 -- To view, visit http://gerrit.cloudera.org:8080/16267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I020aea28d36f9b6ef9fb57c08411f2170f5c24bf Gerrit-Change-Number: 16267 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab