Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14585 )
Change subject: gutil/strings: fix UB in FindNth ...................................................................... Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/14585/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/14585/1//COMMIT_MSG@9 PS1, Line 9: It's not OK to overflow an unsigned type (size_t) in this way > Why? As of my knowledge, size_t is unsigned type. It overflows safely, an I pasted the UBSAN error below. The docs say: -fsanitize=unsigned-integer-overflow: Unsigned integer overflow, where the result of an unsigned integer computation cannot be represented in its type. Unlike signed integer overflow, this is not undefined behavior, but it is often unintentional. I can add this to the build-support/ubsan-blacklist.txt instead, but we've only done that for thirdparty libraries. http://gerrit.cloudera.org:8080/#/c/14585/1//COMMIT_MSG@11 PS1, Line 11: int instead of size_t > I think this is even more unsafe: it's not safe to overflow signed types. We're initializing a variable to -1 and then adding 1 to it. That overflows an unsigned type (intentionally), but not a signed type. http://gerrit.cloudera.org:8080/#/c/14585/1/src/kudu/gutil/strings/string_util-test.cc File src/kudu/gutil/strings/string_util-test.cc: http://gerrit.cloudera.org:8080/#/c/14585/1/src/kudu/gutil/strings/string_util-test.cc@64 PS1, Line 64: } > Could you add a test for: Done http://gerrit.cloudera.org:8080/#/c/14585/1/src/kudu/gutil/strings/util.cc File src/kudu/gutil/strings/util.cc: http://gerrit.cloudera.org:8080/#/c/14585/1/src/kudu/gutil/strings/util.cc@1026 PS1, Line 1026: int n > What if I pass std::max<int>() here? Shouldn't matter; n is only used to derive i, and i is only used to control the number of loop iterations. Neither i nor n are involved in the find_first_of() call. http://gerrit.cloudera.org:8080/#/c/14585/1/src/kudu/gutil/strings/util.cc@1030 PS1, Line 1030: s.find_first_of(c, pos + 1); > Instead, could we leave 'pos' as 'size_t' and we rewrite it like: I thought about doing that, but it adds an extra branch to each iteration of the loop. That's why I opted to change the type to int instead. -- To view, visit http://gerrit.cloudera.org:8080/14585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I00a36f999a8c3b13469600b785da95c487643d4a Gerrit-Change-Number: 14585 Gerrit-PatchSet: 1 Gerrit-Owner: Adar Dembo <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 01 Nov 2019 00:21:40 +0000 Gerrit-HasComments: Yes
