Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16000 )

Change subject: IMPALA-9632: Implement ds_hll_sketch() and ds_hll_estimate()
......................................................................


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16000/5/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16000/5/be/src/exprs/aggregate-functions-ir.cc@1578
PS5, Line 1578: void AggregateFunctions::DsHllUpdate(
> Removed decimal support, however, Impala converts them automatically to big
I have experimented a bit with upstream Hive and it turned out that (apart from 
TIMESTAMP and DECIMAL) the following types are also not supported:
  - BOOLEAN
  - SMALLINT
  - DATE

I think it would be great if we could disable DECIMAL too, e.g. by adding a new 
parameter "strict_type_check" to functions. The rationale is that in functions 
that hash values, implicit casts shouldn't be allowed, as even if the cast is 
lossless, the change of representation can still change the results. We can go 
another way if this turns out to be very hard, but I think it worth to take a 
deeper look.


http://gerrit.cloudera.org:8080/#/c/16000/6/be/src/exprs/datasketches-functions-ir.cc
File be/src/exprs/datasketches-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16000/6/be/src/exprs/datasketches-functions-ir.cc@37
PS6, Line 37: if (ctx->impl()->state()->abort_on_error())
I would vote on treating this always as error. We can relax this later if it 
turns out that there are occasionally corrupt sketches and we want to ignore 
them.


http://gerrit.cloudera.org:8080/#/c/16000/5/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
File 
testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test:

http://gerrit.cloudera.org:8080/#/c/16000/5/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test@64
PS5, Line 64: 0,0
Hive seems to return NULL if there are no non-NULL values. I think that the two 
different solution could interoperate well, but I would prefer to work the same 
if it is not very hard to do it.


http://gerrit.cloudera.org:8080/#/c/16000/6/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
File 
testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test:

http://gerrit.cloudera.org:8080/#/c/16000/6/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test@121
PS6, Line 121:
typo



--
To view, visit http://gerrit.cloudera.org:8080/16000
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic602cb6eb2bfbeab37e5e4cba11fbf0ca40b03fe
Gerrit-Change-Number: 16000
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Jun 2020 16:50:58 +0000
Gerrit-HasComments: Yes

Reply via email to