Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16000 )
Change subject: IMPALA-9632: Implement ds_hll_sketch() and ds_hll_estimate() ...................................................................... Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/16000/5/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/16000/5/be/src/exprs/aggregate-functions-ir.cc@1578 PS5, Line 1578: void AggregateFunctions::DsHllUpdate( > Removed decimal support, however, Impala converts them automatically to big I have experimented a bit with upstream Hive and it turned out that (apart from TIMESTAMP and DECIMAL) the following types are also not supported: - BOOLEAN - SMALLINT - DATE I think it would be great if we could disable DECIMAL too, e.g. by adding a new parameter "strict_type_check" to functions. The rationale is that in functions that hash values, implicit casts shouldn't be allowed, as even if the cast is lossless, the change of representation can still change the results. We can go another way if this turns out to be very hard, but I think it worth to take a deeper look. http://gerrit.cloudera.org:8080/#/c/16000/6/be/src/exprs/datasketches-functions-ir.cc File be/src/exprs/datasketches-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/16000/6/be/src/exprs/datasketches-functions-ir.cc@37 PS6, Line 37: if (ctx->impl()->state()->abort_on_error()) I would vote on treating this always as error. We can relax this later if it turns out that there are occasionally corrupt sketches and we want to ignore them. http://gerrit.cloudera.org:8080/#/c/16000/5/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test File testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test: http://gerrit.cloudera.org:8080/#/c/16000/5/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test@64 PS5, Line 64: 0,0 Hive seems to return NULL if there are no non-NULL values. I think that the two different solution could interoperate well, but I would prefer to work the same if it is not very hard to do it. http://gerrit.cloudera.org:8080/#/c/16000/6/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test File testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test: http://gerrit.cloudera.org:8080/#/c/16000/6/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test@121 PS6, Line 121: typo -- To view, visit http://gerrit.cloudera.org:8080/16000 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic602cb6eb2bfbeab37e5e4cba11fbf0ca40b03fe Gerrit-Change-Number: 16000 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Comment-Date: Tue, 23 Jun 2020 16:50:58 +0000 Gerrit-HasComments: Yes