[Bug c++/85684] New: output of intrinsic _xgetbv is wrongly overwritten
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85684

Bug ID: 85684
Summary: output of intrinsic _xgetbv is wrongly overwritten
Product: gcc
Version: 8.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: gregory.hainaut at gmail dot com
Target Milestone: ---

Hello,

GCC 8 adds the intrinsic long long _xgetbv(unsigned int), but the value produced by the xgetbv instruction is not taken into account. xgetbv returns its result in EDX:EAX (EDX holds the high 32 bits, EAX the low 32 bits) in both 32-bit and 64-bit mode; in 64-bit mode it also clears the upper halves of RAX and RDX.

The following code snippet

    #include <stdio.h>
    #include <x86intrin.h>

    int main()
    {
        if ((_xgetbv(0) & 6) == 6)
            printf("Yes\n");
        else
            printf("No\n");
        return 0;
    }

compiles as follows with g++ -mavx2 in an -O0 build:

    ...
    xgetbv
    movq    %rsi, %rax
    andl    $6, %eax
    cmpq    $6, %rax
    sete    %al
    testb   %al, %al
    je      .L2
    ...

The result is read from %rsi, which xgetbv never writes, instead of from EDX:EAX.

Note: at -O1 the whole of main is optimized down to only 2 instructions!

    main:
    movl    $0, %ecx
    xgetbv

Best regards
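For comparison, a minimal sketch (not GCC's implementation of the intrinsic) that reads XCR0 with GCC/Clang inline assembly and spells out the EDX:EAX outputs the intrinsic has to honor. The helper names read_xcr, os_supports_xgetbv and os_saves_avx_state are hypothetical. The OSXSAVE check (CPUID leaf 1, ECX bit 27) is included because xgetbv raises #UD when the OS has not enabled it.

```cpp
#include <cpuid.h>   // __get_cpuid (GCC/Clang, x86 only)
#include <cstdint>

// Hypothetical helper: xgetbv reads the XCR selected by ECX into EDX:EAX.
static std::uint64_t read_xcr(std::uint32_t index)
{
    std::uint32_t eax, edx;
    __asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(index));
    // Combine the two 32-bit halves into the 64-bit XCR value.
    return (static_cast<std::uint64_t>(edx) << 32) | eax;
}

// Hypothetical helper: xgetbv is only usable when CPUID.1:ECX.OSXSAVE (bit 27) is set.
static bool os_supports_xgetbv()
{
    unsigned int a, b, c, d;
    if (!__get_cpuid(1, &a, &b, &c, &d))
        return false;
    return ((c >> 27) & 1u) != 0;
}

// Same check as the test case: XCR0 bits 1 (SSE state) and 2 (AVX state) both set.
static bool os_saves_avx_state()
{
    return os_supports_xgetbv() && (read_xcr(0) & 6) == 6;
}
```

os_saves_avx_state() gives the same Yes/No answer the test case prints, without depending on the _xgetbv intrinsic.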
[Bug rtl-optimization/80799] New: [7 Regression] x86-32 bits generates MMX without EMMS
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80799

Bug ID: 80799
Summary: [7 Regression] x86-32 bits generates MMX without EMMS
Product: gcc
Version: 7.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: gregory.hainaut at gmail dot com
Target Milestone: ---

Created attachment 41373
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41373=edit
MMX without EMMS instead of SSE2

Dear GCC developers,

GCC 7 breaks the build of the PCSX2 project. We found that MMX instructions are generated without any EMMS instruction in the 32-bit build (built with -m32 on an x86-64 system). Without EMMS the x87 register stack is left in MMX state, so later x87 floating-point code (the default for float and double on 32-bit x86) can produce wrong results.

Note: it would likely be better to use the equivalent SSE2 instructions.

Please find a small example attached. Compiled with:

    g++ -msse -msse2 -O2 -m32 -g -c gcc7_mmx.cpp

Here is the generated code:

    GIFRegHandlerTRXPOS(GIFRegTRXPOS const&):
        push     ebx
        sub      esp, 8
        mov      ebx, DWORD PTR [esp+16]
        movq     xmm0, QWORD PTR TRXPOS
        movq     mm0, QWORD PTR [ebx]
        movq2dq  xmm1, mm0
        pcmpeqd  xmm0, xmm1
        pmovmskb eax, xmm0
        cmp      eax, 65535
        je       .L2
        call     dummy_call()
        movq     mm0, QWORD PTR [ebx]
    .L2:
        movq     QWORD PTR TRXPOS, mm0
        add      esp, 8
        pop      ebx
        ret

Best Regards,
Gregory
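Following the note about SSE2, a minimal sketch of the 64-bit compare and 64-bit store seen in the generated code, written with SSE2 intrinsics only (the attachment itself is not reproduced here). Reg64, reg64_equal and reg64_store are hypothetical stand-ins for GIFRegTRXPOS and the handler logic, and SSE2 is assumed (-msse2 on 32-bit targets, always available on x86-64). Every value stays in XMM registers, so no MMX state is touched and no EMMS is needed; code that does use MMX intrinsics has to call _mm_empty() (EMMS) before any x87 code runs.

```cpp
#include <emmintrin.h>   // SSE2 intrinsics
#include <cstdint>

// Hypothetical stand-in for a 64-bit GS register such as GIFRegTRXPOS.
struct alignas(8) Reg64 { std::uint64_t u64; };

// True when both registers hold identical bits. _mm_loadl_epi64 zeroes the
// upper 64 bits, so all 16 byte-mask bits are set only if the low halves match.
static bool reg64_equal(const Reg64& a, const Reg64& b)
{
    __m128i va = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(&a));
    __m128i vb = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(&b));
    return _mm_movemask_epi8(_mm_cmpeq_epi32(va, vb)) == 0xFFFF;
}

// Copies src into dst with a 64-bit load/store through an XMM register.
static void reg64_store(Reg64& dst, const Reg64& src)
{
    __m128i v = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(&src));
    _mm_storel_epi64(reinterpret_cast<__m128i*>(&dst), v);
}
```

The 0xFFFF mask test mirrors the `cmp eax, 65535` in the listing; only the register class differs.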
[Bug target/80286] [5/6 Regression] AVX2 _mm_cvtsi128_si32 doesn't return a proper 32bits int
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80286

--- Comment #8 from gregory hainaut ---
(In reply to Jakub Jelinek from comment #7)
> wrong-code fixed on the trunk so far, optimizations still not.

Thank you for the quick fix :)
[Bug c++/80286] [4.9/5/6/7 regressions] AVX2 _mm_cvtsi128_si32 doesn't return a proper 32bits int
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80286

--- Comment #1 from gregory hainaut ---
I forgot to mention that the code was compiled with the following options:

    g++ -O2 -c -std=c++11 -mavx2
[Bug c++/80286] New: [4.9/5/6/7 regressions] AVX2 _mm_cvtsi128_si32 doesn't return a proper 32bits int
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80286

Bug ID: 80286
Summary: [4.9/5/6/7 regressions] AVX2 _mm_cvtsi128_si32 doesn't return a proper 32bits int
Product: gcc
Version: 4.9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: gregory.hainaut at gmail dot com
Target Milestone: ---

Dear GCC developers,

It seems that G++ 4.9 introduced an optimization to reduce stack/memory accesses that broke the behavior of _mm_cvtsi128_si32.

Note: I tested the various GCC versions on godbolt.org; I don't know how recent its GCC 7 snapshot is.
Note 2: maybe the issue belongs to RTL/tree optimization, but I have no clue.

Here is a small test case:

-8<---
#include <immintrin.h>

__m256i m;

__m128i extract(__m128i minmax)
{
    int shift = _mm_cvtsi128_si32(_mm256_castsi256_si128(m));
    return _mm_srli_epi16(minmax, shift);
}
->8---

On recent GCC it is compiled to the following 2 asm instructions:

    vmovdqa m(%rip), %ymm1
    vpsrlw  %xmm1, %xmm0, %xmm0

The issue is that the shift operand of vpsrlw is 64 bits: the count is taken from the low 64 bits of %xmm1, so any nonzero bit in bits 32-63 of m corrupts the shift. "shift" must be zero-extended to 64 bits first; Clang, for example, uses vpmovzxdq.

Best Regards,
Gregory
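To make the expected semantics concrete, a minimal SSE2-only sketch. It assumes the global __m256i is replaced by a 128-bit value m so the example runs without AVX2. shift_with_int_count models the correct lowering, where the int from _mm_cvtsi128_si32 is put back into a fresh count vector whose bits 32-127 are zero; shift_with_raw_vector models the miscompiled code, which feeds m straight to the shift. Both function names are hypothetical.

```cpp
#include <emmintrin.h>   // SSE2 intrinsics

// Correct behavior: only the low 32 bits of m form the count.
// _mm_cvtsi32_si128 (MOVD) zeroes everything above those 32 bits.
static __m128i shift_with_int_count(__m128i v, __m128i m)
{
    int shift = _mm_cvtsi128_si32(m);
    return _mm_srl_epi16(v, _mm_cvtsi32_si128(shift));
}

// Miscompiled behavior: PSRLW reads its count from the low 64 bits of m,
// so bits 32-63 of m leak into the shift amount.
static __m128i shift_with_raw_vector(__m128i v, __m128i m)
{
    return _mm_srl_epi16(v, m);
}
```

With m holding 3 in bits 0-31 and 1 in bits 32-63, the first function shifts each 16-bit lane right by 3, while the second sees a count of 0x100000003 and clears every lane.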