Jim Apple has posted comments on this change.

Change subject: Add a build flag for the undefined behavior sanitizer, aka 
"ubsan".
......................................................................


Patch Set 5: Code-Review+1

(18 comments)

Carry Tim's +1.

http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/exec/hdfs-avro-scanner-ir.cc
File be/src/exec/hdfs-avro-scanner-ir.cc:

Line 275:       switch (slot_byte_size) {
> this doesn't seem necessary: we know that:
It is actually undefined to shift int32 by 32.

Added a check outside the switch; nice catch.


http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/exec/zigzag-test.cc
File be/src/exec/zigzag-test.cc:

PS4, Line 101: 
> what's undefined about that?
left shift of negative values is undefined.


http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS4, Line 161: src.ptr != nullptr)
> dst->ptr equals copy, which we've already checked against NULL at line 154,
Done


http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/exprs/expr-value.h
File be/src/exprs/expr-value.h:

PS4, Line 72: empty(
> size() != 0
Done


http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/exprs/math-functions-ir.cc
File be/src/exprs/math-functions-ir.cc:

Line 379:     }
> why don't we need to check overflow here? and then why do we need to check 
We don't need to check it here because the original didn't check it here: the 
Check on L373 is independent of the produc on L371.

The check on L367 of PS4 should not be there, I think. Removed.


http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/exprs/operators-ir.cc
File be/src/exprs/operators-ir.cc:

Line 176: BINARY_OP_NUMERIC_TYPES(Subtract, -, Overflow::UnsignedDifference);
> for these, we eventually probably want to detect overflow and return NULL (
Noted. I'd like to leave that for a future patch.


PS4, Line 177: Overflow::UnsignedProduct
> won't this give the wrong result for negative inputs?
I think that signed two's complement and unsigned multiplications yield the 
same results on a product of two N-bit numbers where the result is the low N 
bits. Concretely, I think the following do the same thing:

    int32_t mul(int32_t x, int32_t y) { return x * y; }
    uint32_t mul(uint32_t x, uint32_t y) { return x * y; }

https://gcc.godbolt.org/ compiles them to the same thing.


http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

PS4, Line 819: 
> why remove that? do we sometimes pass null?
Yes: select split_part('', 'a', 1);
    F0129 14:38:29.681943 31628 string-functions-ir.cc:819] Check failed: 
haystack != __null && needle != __null


Line 629:   if (strs[0].len) memcpy(ptr, strs[0].ptr, strs[0].len);
> != 0.
If either pointer is nullptr, it's UB.

It would be more standards-compliant to check both pointers. I can do that, 
rather than checking just one length, but it would be more source code and 
likely slower.

What do you think?


http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

PS4, Line 388: (
> these are extraneous
I think they make the code much more readable.


http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

PS4, Line 83: 
> not your change, but please fix this formatting.
Done


PS4, Line 95:  |= abs(result) >= max_value / mult;
> can't result be negative, in which case this isn't correct.  also note that
See earlier comment about unsigned and signed-two's-complement multiplications 
being identical on the low N bits.

Regarding the callers ignoring result if overflow is true: are you suggesting 
that this multiplication not happen if overflow is already true?


Line 143:       // TODO: faster to return later? We don't care at all about the 
perf on the overflow
> this comment seemed to imply that this change will make this code slower. i
There is no BE benchmark for this, as far as I can tell. I can add one, but 
this patch is already quite large.


PS4, Line 149: 
> why do we need UnsignedSum() on line 144 case but not here?
Done


http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/runtime/multi-precision.h
File be/src/runtime/multi-precision.h:

Line 111:     scale = Overflow::UnsignedProduct(scale, base);
> given that callers won't actually use result if *overflow is true, do we ne
We still need it. The frustrating thing about UB is that it isn't just 
undefined results but undefined behavior. There are cases where compilers will 
take both branches of a conditional, for instance.

The change written here seems less invasive than changing 'result' and 'scale'. 
For instance, scale can be negative, which influences line 107.


http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/runtime/sorter.cc
File be/src/runtime/sorter.cc:

PS4, Line 975: string_val->ptr != nullptr
> is this actually possible? it's not checked upstream?
Yes; every check I added was because UBSAN found it dynamically.


http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/udf/udf.cc
File be/src/udf/udf.cc:

PS4, Line 493: !result
> len will almost always be > 0 so this condition should go second (if we nee
Done


http://gerrit.cloudera.org:8080/#/c/5082/4/be/src/util/decompress-test.cc
File be/src/util/decompress-test.cc:

PS4, Line 118: input_len
> != 0 here and elsewhere. but why needed?
added '> 0'. Once we settle the best way to check for nulls in C standard 
library functions with undefined semantics, I'll apply those here as needed. 
See comments earlier in this patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/5082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I88c7234bd7c5eb7404490a0913d90470c10835e7
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <jbapple-imp...@apache.org>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple-imp...@apache.org>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to