https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89670
Bug ID: 89670
Summary: __builtin_ctz(_mm256_movemask_epi8(foo)) assumed to be <31 ?
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: joern at purestorage dot com
Target Milestone: ---

Created attachment 45945
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45945&action=edit
matchlen testcase extracted from lz compressor

I ran across this while working on an LZ compression library. One way of
calculating the match length is through vector comparison, movemask and ctz.
This is useful because it covers up to 32 equal bytes without a branch.

If 32 bytes match, the true match length might be much longer than 32. So
naturally the code contains a branch

    if (ml == 32) {
        /* calculate actual match length */
    }

That branch was optimized away, which surprised me a bit. I have reduced the
problem to the attached testcase.

The testcase seems to work fine with gcc 4.8, but fails with 4.9, 5, 6, 7 and 8.
It also fails with clang 3.5, 3.8, 4.0, 6.0 and 7, fwiw. The system is an old
Debian unstable; the compilers are from Debian.
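
The attached testcase is not reproduced in this message, so what follows is only a hypothetical sketch of the match-length technique described above, not the reporter's code. It replaces the AVX2 compare and _mm256_movemask_epi8 step with a scalar loop so that it builds without -mavx2. The names eq_mask32, lowest_set_bit and match_len32 are made up for illustration. One possible explanation for the missing branch, assuming the original code passes a value that can be zero to __builtin_ctz: GCC documents the result of __builtin_ctz(0) as undefined, so a compiler may assume the result lies in 0..31 and drop a later comparison against 32. The sketch therefore tests for the all-equal case before calling ctz.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for _mm256_movemask_epi8(_mm256_cmpeq_epi8(a, b)):
   bit i of the result is set when a[i] == b[i]. */
static uint32_t eq_mask32(const unsigned char *a, const unsigned char *b)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < 32; i++)
        if (a[i] == b[i])
            mask |= (uint32_t)1 << i;
    return mask;
}

/* Index of the lowest set bit; the caller must rule out zero first. */
static unsigned lowest_set_bit(uint32_t x)
{
    assert(x != 0); /* the result of __builtin_ctz(0) is undefined */
    return (unsigned)__builtin_ctz(x);
}

/* Number of leading equal bytes in two 32-byte blocks, 0..32. */
static unsigned match_len32(const unsigned char *a, const unsigned char *b)
{
    uint32_t diff = ~eq_mask32(a, b); /* bit i set when a[i] != b[i] */
    if (diff == 0)
        return 32; /* all 32 bytes match: decide this before ctz, not after */
    return lowest_set_bit(diff);
}
```

With this shape, a caller can still write if (ml == 32) to extend the match, because the value 32 comes from an explicit test on the mask and not from the result of ctz on zero.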