Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/11535 )
Change subject: IMPALA-6661 Make NaN values equal for grouping purposes. ...................................................................... Patch Set 9: (25 comments) http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/codegen/codegen-anyval.h File be/src/codegen/codegen-anyval.h: http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/codegen/codegen-anyval.h@236 PS9, Line 236: ensure ensures http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/codegen/codegen-anyval.h@239 PS9, Line 239: /// value of the value. May want to clarify that this is a no-op for types other than FLOAT and DOUBLE or if the value itself is not a Nan to begin with. http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/codegen/codegen-anyval.h@240 PS9, Line 240: ConvertToCanonicalForm ConvertNanToCanonicalForm(); http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/codegen/codegen-anyval.h@249 PS9, Line 249: inclusive_equality Warrants a quick comment. http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/codegen/codegen-anyval.cc File be/src/codegen/codegen-anyval.cc: http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/codegen/codegen-anyval.cc@476 PS9, Line 476: llvm::Value * llvm::Value* . Same below. http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/codegen/codegen-anyval.cc@477 PS9, Line 477: NULL Seems unnecessary to initialize this value here as it's set in all paths which can be taken below, http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/codegen/codegen-anyval.cc@483 PS9, Line 483: cond "is_nan" is a more fitting name. http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/codegen/codegen-anyval.cc@487 PS9, Line 487: default: : ; May as well remove these lines. http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/codegen/codegen-anyval.cc@747 PS9, Line 747: llvm::Value * llvm::Value* . Same below. http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/codegen/codegen-anyval.cc@748 PS9, Line 748: llvm::Value *cmp_raw = builder_->CreateFCmpOEQ(local_val, val, "cmp_raw"); : if (!inclusive_equality) return cmp_raw; nit: seems slightly easier to follow: if (!inclusive_equality) { return builder_->CreateFCmpOEQ(local_val, val, "cmp_raw"); } http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/exec/hash-table.h File be/src/exec/hash-table.h: http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/exec/hash-table.h@464 PS9, Line 464: broadens the meaning of equality to null means "NULL == NULL" and "Nan == Nan" http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/exec/hash-table.cc File be/src/exec/hash-table.cc: http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/exec/hash-table.cc@198 PS9, Line 198: const ColumnType & nit: const ColumnType& expr_type http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/exec/hash-table.cc@253 PS9, Line 253: const ColumnType & nit: const Columntype& expr_type http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/exec/hash-table.cc@257 PS9, Line 257: loc huh ? Did you mean val ? Also, the entire thing seems easier to parse if it's broken down: if (!RawValue::Eq(loc, val, expr_type)) { if (INCLUSIVE_EQUALITY) { // Check for the case of Nan == Nan bool loc_is_nan = RawValue::IsNan(loc, expr_type) ; bool val_is_nan = RawValue::IsNan(val, expr_type); if (loc_is_nan && val_is_nan) continue; } return false; } http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/exec/hash-table.cc@801 PS9, Line 801: if (!build) { Don't we need to check for inclusive_equality flag here ? Also, why is the conversion to canonical form limited to probe only ? (i.e. !build) ? http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/exec/hash-table.cc@1149 PS9, Line 1149: if (inclusive_equality) { : result.ConvertToCanonicalForm(); : } nit: one line http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/exec/hash-table.cc@1179 PS9, Line 1179: nit: extra blank line http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/runtime/raw-value.h File be/src/runtime/raw-value.h: http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/runtime/raw-value.h@40 PS9, Line 40: static const double CANONICAL_DOUBLE_NAN; : static const float CANONICAL_FLOAT_NAN; These deserve some comments on their purposes. http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/runtime/raw-value.h@107 PS9, Line 107: IsAmbiguous IsNan() seems to be a more appropriate name. The name and the description of this function is a bit ambiguous (punt unintended). The comment seems clearer to just say: Returns true if 'val' is of type float or double and it has a Nan value. http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/runtime/raw-value.h@112 PS9, Line 112: const ColumnType & nit: const ColumnType& type http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/runtime/raw-value.h@112 PS9, Line 112: CanonicalValue CanonicalNanValue(). The comment can be: Returns the canonical representation of Nan value corresponding to 'type'. http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/runtime/raw-value.inline.h File be/src/runtime/raw-value.inline.h: http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/runtime/raw-value.inline.h@34 PS9, Line 34: nit: blank line. http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/runtime/raw-value.inline.h@50 PS9, Line 50: DCHECK(false); Calling this function with a type other than float or double seems legitimate. This DCHECK() seems unnecessary. http://gerrit.cloudera.org:8080/#/c/11535/9/be/src/runtime/raw-value.inline.h@63 PS9, Line 63: NULL nullptr http://gerrit.cloudera.org:8080/#/c/11535/9/testdata/workloads/functional-query/queries/QueryTest/exprs.test File testdata/workloads/functional-query/queries/QueryTest/exprs.test: http://gerrit.cloudera.org:8080/#/c/11535/9/testdata/workloads/functional-query/queries/QueryTest/exprs.test@25 PS9, Line 25: group by Z May warrant some test cases which group by multiple columns, Some columns which may evaluate to Nan and some don't. -- To view, visit http://gerrit.cloudera.org:8080/11535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I996c4a2e1934fd887046ed0c55457b7285375086 Gerrit-Change-Number: 11535 Gerrit-PatchSet: 9 Gerrit-Owner: Michal Ostrowski <mostr...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Michal Ostrowski <mostr...@cloudera.com> Gerrit-Reviewer: Thomas Marshall <thomasmarsh...@cmu.edu> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Tue, 09 Oct 2018 01:17:45 +0000 Gerrit-HasComments: Yes