[Bug c/89670] __builtin_ctz(_mm256_movemask_epi8(foo)) assumed to be <31 ?

2019-03-12 Thread joern at purestorage dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89670 --- Comment #15 from Jörn Engel ---
> int foo (int x) { return __builtin_ctz (x); }
>
> Without -mbmi, gcc emits:
> xorl %eax, %eax
> rep bsfl %edi, %eax
> ret
That example convinces me. Code would be broken

[Bug c/89670] __builtin_ctz(_mm256_movemask_epi8(foo)) assumed to be <31 ?

2019-03-11 Thread joern at purestorage dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89670 --- Comment #13 from Jörn Engel --- None of those examples convince me. If you or I know that a zero-argument is impossible, but the compiler doesn't know, wouldn't that still be UB? And if the compiler knows, it can remove the branch either

[Bug c/89670] __builtin_ctz(_mm256_movemask_epi8(foo)) assumed to be <31 ?

2019-03-11 Thread joern at purestorage dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89670 --- Comment #11 from Jörn Engel --- I stand corrected. Thank you very much! Out of curiosity, if the only non-broken way to call __builtin_ctz(foo) is via "foo ? __builtin_ctz(foo) : 32", why isn't the conditional moved into __builtin_ctz()?
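For readers following the thread, the guarded form quoted in this comment can be wrapped in a small helper. This is only a sketch: the name ctz32_or_32 is made up for illustration and does not appear in the bug report. It assumes GCC or Clang, where __builtin_ctz is available and its result is undefined for a zero argument.

```c
#include <stdint.h>

/* Hypothetical helper (name not from the bug report).
 * __builtin_ctz(0) has an undefined result, so the zero case is
 * handled explicitly and mapped to 32, as in the quoted expression. */
static inline unsigned ctz32_or_32(uint32_t x)
{
    return x ? (unsigned)__builtin_ctz(x) : 32u;
}
```

Whether the compiler folds the conditional away depends on the target flags and compiler version; the thread itself reports a case where the branch was kept.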

[Bug c/89670] __builtin_ctz(_mm256_movemask_epi8(foo)) assumed to be <31 ?

2019-03-11 Thread joern at purestorage dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89670 --- Comment #8 from Jörn Engel --- Updated testcase below fails to remove the branch with my gcc-8. /* * usage: * gcc -std=gnu11 -Wall -Wextra -g -march=core-avx2 -mbmi -fPIC -O3 % && ./a.out < /dev/zero */ #include #include #include

[Bug c/89670] __builtin_ctz(_mm256_movemask_epi8(foo)) assumed to be <31 ?

2019-03-11 Thread joern at purestorage dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89670 --- Comment #6 from Jörn Engel ---
True for one, but not the other.
return mask ? __builtin_ctz(mask) : 32;
1099: 83 f6 ff    xor $0xffffffff,%esi
109c: 74 47       je 10e5
109e:

[Bug c/89670] __builtin_ctz(_mm256_movemask_epi8(foo)) assumed to be <31 ?

2019-03-11 Thread joern at purestorage dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89670 --- Comment #4 from Jörn Engel --- Fair enough. That means the only way to get tzcnt without a conditional is by using inline asm. Annoying, but something I can work with. Annoying because for CPUs with BMI1, tzcnt is well-defined and I
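The inline-asm route mentioned in this comment might look roughly like the sketch below. The helper name tzcnt32 is hypothetical, and the code assumes GCC or Clang extended asm on an x86 target. On CPUs without BMI1 the same encoding executes as bsf, whose result for a zero input is undefined, so the zero case is only reliable where BMI1 is present. For non-x86 targets the sketch falls back to the guarded builtin.

```c
#include <stdint.h>

/* Hypothetical helper (name not from the bug report). */
static inline uint32_t tzcnt32(uint32_t x)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t r;
    /* tzcnt writes the flags, hence the "cc" clobber.
     * With BMI1, tzcnt of 0 is defined and yields 32. */
    __asm__("tzcnt %1, %0" : "=r"(r) : "r"(x) : "cc");
    return r;
#else
    /* Assumed fallback for other targets: explicit zero check. */
    return x ? (uint32_t)__builtin_ctz(x) : 32u;
#endif
}
```

A caller on a BMI1 machine could then compare the result against 32 without a separate zero test, which is the use case the comment describes.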

[Bug c/89670] __builtin_ctz(_mm256_movemask_epi8(foo)) assumed to be <31 ?

2019-03-11 Thread joern at purestorage dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89670 --- Comment #2 from Jörn Engel --- The input is 32. Does the "undefined-if-zero" thing give gcc license to remove code depending on the output? If it does, why is the code only removed when comparing against 31/32, not when comparing against

[Bug c/89670] New: __builtin_ctz(_mm256_movemask_epi8(foo)) assumed to be <31 ?

2019-03-11 Thread joern at purestorage dot com
mal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: joern at purestorage dot com Target Milestone: --- Created attachment 45945 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45945&action=edit matchlen testcase extracted from lz compressor