Dan Hecht has posted comments on this change.

Change subject: Add a build flag for the undefined behavior sanitizer, aka 
"ubsan".
......................................................................


Patch Set 1:

(22 comments)

what's the plan for keeping this compiling?

http://gerrit.cloudera.org:8080/#/c/5082/1/be/CMakeLists.txt
File be/CMakeLists.txt:

PS1, Line 95: synbol
typo


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/data-source-scan-node.cc
File be/src/exec/data-source-scan-node.cc:

PS1, Line 150: )
add "!= 0" as we try to avoid implicit comparisons against zero conversion to 
bool.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/hdfs-avro-scanner-ir.cc
File be/src/exec/hdfs-avro-scanner-ir.cc:

Line 276:           *decimal >>= bytes_to_fill * 8;
> If bytes_to_fill >= sizeof(decimal), this is UB
since len comes from the data file, we could also have bytes_to_fill < 0, no? 
isn't that also undefined?


Line 278:           *decimal = 0;
can this happen with well-formed avro? if not, this be a return false case.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/partitioned-aggregation-node.cc
File be/src/exec/partitioned-aggregation-node.cc:

Line 1379:     double percent = total_rows * 100 / std::max(1L, num_input_rows);
> num_input_rows is sometimes 0.
if that were the case, we would have seen crashes.
In any case, when num_input_rows == 0, shouldn't we set precent to either 0 or 
100? or i guess that happens since it implies total_rows == 0, but would 
probably be more clear what's going on to just write:

percent = num_input_rows == 0 ? 0 : total_rows * 100 / num_input_rows;


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exec/read-write-util.cc
File be/src/exec/read-write-util.cc:

PS1, Line 97: *
nit: we put the * next to type.


Line 100:   memcpy(&uinteger, &integer, sizeof(integer));
> type punning is UB.
using memcpy doesn't seem necessary. can't we just write:

uint32_t uinteger = integer;
uinteger = (uinteger << 1) | (uinteger >> 31);

neither of those stmts are undefined.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exprs/bit-byte-functions-ir.cc
File be/src/exprs/bit-byte-functions-ir.cc:

PS1, Line 151: v * 
why is this necessary?  and does it generate the same code (let's make sure 
there is no multiply)?


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

Line 820:   if (!haystack) return NULL;
okay but this check is still unnecessary - no loop iterations will be taken. 
let's avoid the extraneous cmp/branch.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/buffered-tuple-stream.cc
File be/src/runtime/buffered-tuple-stream.cc:

PS1, Line 804: t
nit: "t != nullptr" per our style.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/raw-value.cc
File be/src/runtime/raw-value.cc:

Line 161:         if (dest->len) memcpy(dest->ptr, src->ptr, dest->len);
Don't we return a non-null ptr (zero_length_region_) in this case to avoid the 
need for this check?


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

Line 975:     if (string_val->ptr) memcpy(dest, string_val->ptr, 
string_val->len);
!= nullptr


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/string-value.inline.h
File be/src/runtime/string-value.inline.h:

Line 58:   const int result = len ? strncmp(s1, s2, len) : 0;
> if either s1 or s2 is nullptr, this is UB
len != 0

though, we may define StringCompare() to have the same semantics, making this 
unnecessary. e.g. Compare() wrapper, though  maybe other callsites can pass 
nulls?


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/runtime/tuple.h
File be/src/runtime/tuple.h:

Line 75:     if (tuple_desc.num_null_bytes()) {
!= 0


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/udf/udf.cc
File be/src/udf/udf.cc:

Line 487:   if (LIKELY(len && !result.is_null)) {
len != 0

though this may be another place we expect to always have a valid pointer (as 
long as i!s_null).  We may need to switch the order of the 
FLAGS_stress_free_pool_alloc code in free-pool and its check for 0, however.  

That said, I'm okay with adding the len check here since we don't make the 
non-null assumption in the StringVal::StringVal(FunctionContext* context, int 
len) constructor itself either.


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/bit-packing-test.cc
File be/src/util/bit-packing-test.cc:

Line 36:   if (bit_width >= CHAR_BIT * sizeof(uint32_t)) return ~0;
> lshift past the end is UB
But where would we ever do that?  i.e. shouldn't this be a precondition to this 
routine (DCHECK)?


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/bitmap.h
File be/src/util/bitmap.h:

Line 78:     if (buffer_.size()) memset(buffer_.data(), 255 * b, buffer_.size() 
* sizeof(uint64_t));
!= 0


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/coding-util-test.cc
File be/src/util/coding-util-test.cc:

Line 45:   if (input.size()) memcpy(input_vector.data(), input.c_str(), 
input.size());
!= 0


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

Line 387:                         ::max(1L, total_time_counter()->value());
> sometimes divides by 0: UB
same comment - should it be set to 0 or 100 in this case rather than 
local_time_ns?


Line 998:   if (num) {
!= 0


http://gerrit.cloudera.org:8080/#/c/5082/1/be/src/util/streaming-sampler.h
File be/src/util/streaming-sampler.h:

Line 55:     if (samples_collected_) {
!= 0


Line 112:     if (samples_collected_) memcpy(samples_, samples.data(), 
sizeof(T) * samples_collected_);
same


-- 
To view, visit http://gerrit.cloudera.org:8080/5082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <jbap...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbap...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to