Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11535 )
Change subject: IMPALA-6661 Make NaN values equal for grouping purposes. ...................................................................... Patch Set 11: (8 comments) http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/codegen/codegen-anyval.cc File be/src/codegen/codegen-anyval.cc: http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/codegen/codegen-anyval.cc@748 PS9, Line 748: llvm::Value* cmp_raw = builder_->CreateFCmpOEQ(local_val, val, "cmp_raw"); : if (!inclusive_equality) return cmp_raw; > I need to generate that comparison in either case, so if I followed you sug Good point. Missed that it's used twice. http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/exec/hash-table.cc@801 PS9, Line 801: > There is no "inclusive_equality" flag here. The restriction to probe-only The probe-only behavior here doesn't match the behavior of the non-codegen'd version of the code (i.e. EvalRow which is called by EvalBuildRow() and EvalProbeRow()) which unconditionally converts Nan to its canonical form. http://gerrit.cloudera.org:8080/#/c/11535/11/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: http://gerrit.cloudera.org:8080/#/c/11535/11/be/src/exec/hash-table.cc@237 PS11, Line 237: INCLUSIVE_EQUALITY This change in the logic of hash table may have implication to Join too. It may warrant a test case to verify the behavior of Nan <=> Nan (which should evaluate to false isn't changed after this patch in this file: https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/joins.test Please see the section "Handling NULLs in Join Columns" here (https://www.cloudera.com/documentation/enterprise/latest/topics/impala_joins.html) for background. Can you please also verify the behavior of join on columns with Nan value doesn't change after this patch ? create table nan_test (c1 float, c2 float); insert into nan_test select cast(sqrt(0.5-x) as FLOAT), cast(sqrt(3.5-y) as FLOAT) from (VALUES((1.6 x, 2 y), (3.2, 4), (5.4,6), (0.5, 2), (0.5, 3), (-0.5, 1))) T; select t1.c1,t2.c1 from nan_test t1 right outer join nan_test t2 on t1.c1 = t2.c1; http://gerrit.cloudera.org:8080/#/c/11535/11/be/src/exec/hash-table.cc@256 PS11, Line 256: val_is_ambiguous nit: val_is_nan http://gerrit.cloudera.org:8080/#/c/11535/11/be/src/exec/hash-table.cc@257 PS11, Line 257: local_is_ambiguous nit: local_is_nan http://gerrit.cloudera.org:8080/#/c/11535/11/be/src/runtime/raw-value.h File be/src/runtime/raw-value.h: http://gerrit.cloudera.org:8080/#/c/11535/11/be/src/runtime/raw-value.h@40 PS11, Line 40: nit: extra space http://gerrit.cloudera.org:8080/#/c/11535/9/testdata/workloads/functional-query/queries/QueryTest/exprs.test File testdata/workloads/functional-query/queries/QueryTest/exprs.test: http://gerrit.cloudera.org:8080/#/c/11535/9/testdata/workloads/functional-query/queries/QueryTest/exprs.test@25 PS9, Line 25: group by Z > Done The new test case still seems to be group-by a single column. May be you can try something like: select count(*), cast(sqrt(0.5-x) as FLOAT) as Z, cast(sqrt(0.5+x) as FLOAT) as Y from (VALUES((1.6 x, 2 y), (3.2, 4), (5.4,6), (0.5, 2), (0.5, 3), (-0.5, 1))) T group by Y,Z; http://gerrit.cloudera.org:8080/#/c/11535/11/testdata/workloads/functional-query/queries/QueryTest/exprs.test File testdata/workloads/functional-query/queries/QueryTest/exprs.test: http://gerrit.cloudera.org:8080/#/c/11535/11/testdata/workloads/functional-query/queries/QueryTest/exprs.test@12 PS11, Line 12: GROUP BY of NaN values aggregates NaN's as one grouping : select count(*), sqrt(0.5-x) as Z : from (VALUES((1.6 x, 2 y), (3.2, 4), (5.4,6))) T : group by Z : ---- RESULTS : 3, NaN : ---- TYPES : bigint, double : ==== : ---- QUERY : # GROUP BY of NaN values aggregates NaN's as one grouping : select count(*), cast(sqrt(0.5-x) as FLOAT) as Z : from (VALUES((1.6 x, 2 y), (3.2, 4), (5.4,6), (0.5, 2), (0.5, 3), (-0.5, 1))) T : group by Z order by Z : ---- RESULTS : 3, NaN : 2, 0 : 1, 1 : ---- TYPES : bigint, float : ==== : ---- QUERY These tests are testing the behavior of group-by so they should belong to aggregation.test -- To view, visit http://gerrit.cloudera.org:8080/11535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I996c4a2e1934fd887046ed0c55457b7285375086 Gerrit-Change-Number: 11535 Gerrit-PatchSet: 11 Gerrit-Owner: Michal Ostrowski <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Michal Ostrowski <[email protected]> Gerrit-Reviewer: Thomas Marshall <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Fri, 12 Oct 2018 00:49:35 +0000 Gerrit-HasComments: Yes
