> regcomp.c uses the "start + count < end" idiom to check that there are
> "count" bytes available in an array of char "start" and "end" both point
> to.
>
> This is fine, unless "start + count" goes beyond the last element of the
> array. In this case, pedantic interpretation of the C standard makes
> the comparison of such a pointer against "end" undefined, and optimizers
> from hell will happily remove as much code as possible because of this.

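For illustration, here is a minimal sketch of how such a check can be rewritten so that no out-of-range pointer is ever formed. The helper name more_than() is hypothetical and is not taken from regcomp.c; it only assumes that "start" and "end" point into (or one past the end of) the same array of char, with start <= end. As far as I can tell, it is already the pointer addition itself that is undefined once the result goes further than one past the last element (C11 6.5.6p8), so the safe form compares lengths instead of pointers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of regcomp.c.
 *
 * Problematic idiom:   start + count < end
 * (undefined once start + count points more than one element past
 * the end of the array)
 *
 * Equivalent safe form: compare count against the distance end - start,
 * which is well defined whenever both pointers refer to the same array
 * and start <= end.
 */
static int
more_than(const char *start, const char *end, size_t count)
{
	assert(start <= end);	/* assumed precondition */
	return (size_t)(end - start) > count;
}
```

With an 8-byte buffer, more_than(buf, buf + 8, 7) is true and more_than(buf, buf + 8, 100) is false, and neither call ever computes buf + 100.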
I have only just noticed that llvm contains a copy of OpenBSD's libc regex code (with an llvm_ prefix on the public interfaces) in its llvmSupport library. I am thus surprised that none of their sanitizers has exposed this wrong construct already. I'll report this to the llvm project tomorrow. In the meantime, under OpenBSD, it might be worth investigating removing that copy and making the llvm_re* functions aliases of libc's re* functions, if only to make the code smaller.
