Dan Hecht has posted comments on this change. Change subject: Add a build flag for the undefined behavior sanitizer, aka "ubsan". ......................................................................
Patch Set 1: (22 comments) what's the plan for keeping this compiling? http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt File be/CMakeLists.txt: PS1, Line 95: synbol typo http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/data-source-scan-node.cc File be/src/exec/data-source-scan-node.cc: PS1, Line 150: ) add "!= 0" as we try to avoid implicit comparisons against zero conversion to bool. http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/hdfs-avro-scanner-ir.cc File be/src/exec/hdfs-avro-scanner-ir.cc: Line 276: *decimal >>= bytes_to_fill * 8; > If bytes_to_fill >= sizeof(decimal), this is UB since len comes from the data file, we could also have bytes_to_fill < 0, no? isn't that also undefined? Line 278: *decimal = 0; can this happen with well-formed avro? if not, this be a return false case. http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/partitioned-aggregation-node.cc File be/src/exec/partitioned-aggregation-node.cc: Line 1379: double percent = total_rows * 100 / std::max(1L, num_input_rows); > num_input_rows is sometimes 0. if that were the case, we would have seen crashes. In any case, when num_input_rows == 0, shouldn't we set precent to either 0 or 100? or i guess that happens since it implies total_rows == 0, but would probably be more clear what's going on to just write: percent = num_input_rows == 0 ? 0 : total_rows * 100 / num_input_rows; http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/read-write-util.cc File be/src/exec/read-write-util.cc: PS1, Line 97: * nit: we put the * next to type. Line 100: memcpy(&uinteger, &integer, sizeof(integer)); > type punning is UB. using memcpy doesn't seem necessary. can't we just write: uint32_t uinteger = integer; uinteger = (uinteger << 1) | (uinteger >> 31); neither of those stmts are undefined. http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exprs/bit-byte-functions-ir.cc File be/src/exprs/bit-byte-functions-ir.cc: PS1, Line 151: v * why is this necessary? and does it generate the same code (let's make sure there is no multiply)? http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exprs/string-functions-ir.cc File be/src/exprs/string-functions-ir.cc: Line 820: if (!haystack) return NULL; okay but this check is still unnecessary - no loop iterations will be taken. let's avoid the extraneous cmp/branch. http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/buffered-tuple-stream.cc File be/src/runtime/buffered-tuple-stream.cc: PS1, Line 804: t nit: "t != nullptr" per our style. http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/raw-value.cc File be/src/runtime/raw-value.cc: Line 161: if (dest->len) memcpy(dest->ptr, src->ptr, dest->len); Don't we return a non-null ptr (zero_length_region_) in this case to avoid the need for this check? http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: Line 975: if (string_val->ptr) memcpy(dest, string_val->ptr, string_val->len); != nullptr http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/string-value.inline.h File be/src/runtime/string-value.inline.h: Line 58: const int result = len ? strncmp(s1, s2, len) : 0; > if either s1 or s2 is nullptr, this is UB len != 0 though, we may define StringCompare() to have the same semantics, making this unnecessary. e.g. Compare() wrapper, though maybe other callsites can pass nulls? http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/tuple.h File be/src/runtime/tuple.h: Line 75: if (tuple_desc.num_null_bytes()) { != 0 http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/udf/udf.cc File be/src/udf/udf.cc: Line 487: if (LIKELY(len && !result.is_null)) { len != 0 though this may be another place we expect to always have a valid pointer (as long as i!s_null). We may need to switch the order of the FLAGS_stress_free_pool_alloc code in free-pool and its check for 0, however. That said, I'm okay with adding the len check here since we don't make the non-null assumption in the StringVal::StringVal(FunctionContext* context, int len) constructor itself either. http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: Line 36: if (bit_width >= CHAR_BIT * sizeof(uint32_t)) return ~0; > lshift past the end is UB But where would we ever do that? i.e. shouldn't this be a precondition to this routine (DCHECK)? http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/bitmap.h File be/src/util/bitmap.h: Line 78: if (buffer_.size()) memset(buffer_.data(), 255 * b, buffer_.size() * sizeof(uint64_t)); != 0 http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/coding-util-test.cc File be/src/util/coding-util-test.cc: Line 45: if (input.size()) memcpy(input_vector.data(), input.c_str(), input.size()); != 0 http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: Line 387: ::max(1L, total_time_counter()->value()); > sometimes divides by 0: UB same comment - should it be set to 0 or 100 in this case rather than local_time_ns? Line 998: if (num) { != 0 http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/streaming-sampler.h File be/src/util/streaming-sampler.h: Line 55: if (samples_collected_) { != 0 Line 112: if (samples_collected_) memcpy(samples_, samples.data(), sizeof(T) * samples_collected_); same -- To view, visit http://gerrit.cloudera.org:8080/5082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple <jbap...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Jim Apple <jbap...@cloudera.com> Gerrit-HasComments: Yes