https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89670

            Bug ID: 89670
           Summary: __builtin_ctz(_mm256_movemask_epi8(foo)) assumed to be
                    <31 ?
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: joern at purestorage dot com
  Target Milestone: ---

Created attachment 45945
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45945&action=edit
matchlen testcase extracted from lz compressor

I ran across this while working on an LZ compression library.  One way of
calculating the match length is through vector comparison, movemask and ctz.
It is useful because it handles up to 32 equal bytes without a branch.

If all 32 bytes match, the true match length may be much longer than 32, so
the code naturally contains a branch:
if (ml == 32) {
    /* calculate actual match length */
}

That branch was optimized away, which surprised me a bit.  I have reduced the
problem to the attached testcase.  The testcase seems to work fine with gcc
4.8, but fails with 4.9, 5, 6, 7 and 8.  It also fails with clang 3.5, 3.8,
4.0, 6.0 and 7, fwiw.

The system is an old Debian unstable; the compilers are from Debian.
