Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11535 )

Change subject: IMPALA-6661 Make NaN values equal for grouping purposes.
......................................................................


Patch Set 11:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/codegen/codegen-anyval.cc
File be/src/codegen/codegen-anyval.cc:

http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/codegen/codegen-anyval.cc@748
PS9, Line 748:       llvm::Value* cmp_raw = builder_->CreateFCmpOEQ(local_val, 
val, "cmp_raw");
             :       if (!inclusive_equality) return cmp_raw;
> I need to generate that comparison in either case, so if I followed you sug
Good point. Missed that it's used twice.


http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/exec/hash-table.cc@801
PS9, Line 801:
> There is no "inclusive_equality" flag here.  The restriction to probe-only
The probe-only behavior here doesn't match the behavior of the non-codegen'd 
version of the code (i.e. EvalRow which is called by EvalBuildRow() and 
EvalProbeRow()) which unconditionally converts Nan to its canonical form.


http://gerrit.cloudera.org:8080/#/c/11535/11/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

http://gerrit.cloudera.org:8080/#/c/11535/11/be/src/exec/hash-table.cc@237
PS11, Line 237: INCLUSIVE_EQUALITY
This change in the logic of hash table may have implication to Join too.

It may warrant a test case to verify the behavior of Nan <=> Nan (which should 
evaluate to false isn't changed after this patch in this file: 
https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/joins.test

Please see the section "Handling NULLs in Join Columns" here 
(https://www.cloudera.com/documentation/enterprise/latest/topics/impala_joins.html)
 for background.

Can you please also verify the behavior of join on columns with Nan value 
doesn't change after this patch ?

  create table nan_test (c1 float, c2 float);

  insert into nan_test select cast(sqrt(0.5-x) as FLOAT), cast(sqrt(3.5-y) as 
FLOAT) from (VALUES((1.6 x, 2 y), (3.2, 4), (5.4,6), (0.5, 2), (0.5, 3), (-0.5, 
1))) T;

  select t1.c1,t2.c1 from nan_test t1 right outer join nan_test t2 on t1.c1 = 
t2.c1;


http://gerrit.cloudera.org:8080/#/c/11535/11/be/src/exec/hash-table.cc@256
PS11, Line 256: val_is_ambiguous
nit: val_is_nan


http://gerrit.cloudera.org:8080/#/c/11535/11/be/src/exec/hash-table.cc@257
PS11, Line 257: local_is_ambiguous
nit: local_is_nan


http://gerrit.cloudera.org:8080/#/c/11535/11/be/src/runtime/raw-value.h
File be/src/runtime/raw-value.h:

http://gerrit.cloudera.org:8080/#/c/11535/11/be/src/runtime/raw-value.h@40
PS11, Line 40:
nit: extra space


http://gerrit.cloudera.org:8080/#/c/11535/9/testdata/workloads/functional-query/queries/QueryTest/exprs.test
File testdata/workloads/functional-query/queries/QueryTest/exprs.test:

http://gerrit.cloudera.org:8080/#/c/11535/9/testdata/workloads/functional-query/queries/QueryTest/exprs.test@25
PS9, Line 25: group by Z
> Done
The new test case still seems to be group-by a single column. May be you can 
try something like:

  select count(*), cast(sqrt(0.5-x) as FLOAT) as Z, cast(sqrt(0.5+x) as FLOAT) 
as Y
   from (VALUES((1.6 x, 2 y), (3.2, 4), (5.4,6), (0.5, 2), (0.5, 3), (-0.5, 
1))) T
   group by Y,Z;


http://gerrit.cloudera.org:8080/#/c/11535/11/testdata/workloads/functional-query/queries/QueryTest/exprs.test
File testdata/workloads/functional-query/queries/QueryTest/exprs.test:

http://gerrit.cloudera.org:8080/#/c/11535/11/testdata/workloads/functional-query/queries/QueryTest/exprs.test@12
PS11, Line 12:  GROUP BY of NaN values aggregates NaN's as one grouping
             : select count(*), sqrt(0.5-x) as Z
             : from (VALUES((1.6 x, 2 y), (3.2, 4), (5.4,6))) T
             : group by Z
             : ---- RESULTS
             : 3, NaN
             : ---- TYPES
             : bigint, double
             : ====
             : ---- QUERY
             : # GROUP BY of NaN values aggregates NaN's as one grouping
             : select count(*), cast(sqrt(0.5-x) as FLOAT) as Z
             : from (VALUES((1.6 x, 2 y), (3.2, 4), (5.4,6), (0.5, 2), (0.5, 
3), (-0.5, 1))) T
             : group by Z order by Z
             : ---- RESULTS
             : 3, NaN
             : 2, 0
             : 1, 1
             : ---- TYPES
             : bigint, float
             : ====
             : ---- QUERY
These tests are testing the behavior of group-by so they should belong to 
aggregation.test



--
To view, visit http://gerrit.cloudera.org:8080/11535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I996c4a2e1934fd887046ed0c55457b7285375086
Gerrit-Change-Number: 11535
Gerrit-PatchSet: 11
Gerrit-Owner: Michal Ostrowski <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Michal Ostrowski <[email protected]>
Gerrit-Reviewer: Thomas Marshall <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Fri, 12 Oct 2018 00:49:35 +0000
Gerrit-HasComments: Yes

Reply via email to