[Bug c++/85684] New: output of intrinsic _xgetbv is wrongly overwritten

2018-05-07 Thread gregory.hainaut at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85684

Bug ID: 85684
   Summary: output of intrinsic _xgetbv is wrongly overwritten
   Product: gcc
   Version: 8.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gregory.hainaut at gmail dot com
  Target Milestone: ---

Hello,

GCC 8.0 adds the intrinsic long long _xgetbv(unsigned int); however, its output
isn't taken into account. xgetbv returns its result in edx:eax (in 64-bit mode
the upper halves of rdx and rax are cleared).
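For reference, the value an _xgetbv-style function should hand back is just the two 32-bit halves glued together (edx high, eax low). The sketch below illustrates that; it is not GCC's implementation, and `combine_edx_eax` and `xgetbv_asm` are hypothetical names. The inline-asm wrapper assumes a GCC-compatible compiler on x86 and is a commonly used style of workaround, not something taken from the report: its constraints name eax and edx as outputs, so the compiler is told explicitly where the result lives.

```cpp
#include <cstdint>

// Glue XGETBV's two 32-bit outputs (edx = high half, eax = low half)
// into the 64-bit value an _xgetbv-style function should return.
constexpr std::uint64_t combine_edx_eax(std::uint32_t edx, std::uint32_t eax) {
    return (static_cast<std::uint64_t>(edx) << 32) | eax;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
// Hypothetical workaround (not from the report): read extended control
// register `xcr` with inline asm whose constraints declare eax/edx as
// outputs and ecx as the input. Only call it once CPUID.1:ECX.OSXSAVE
// (bit 27) is set; otherwise XGETBV raises #UD.
inline std::uint64_t xgetbv_asm(std::uint32_t xcr) {
    std::uint32_t eax, edx;
    __asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(xcr));
    return combine_edx_eax(edx, eax);
}
#endif
```

In the snippet below, `xgetbv_asm(0)` could stand in for `_xgetbv(0)` until the compiler is fixed.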

The following code snippet

#include <immintrin.h>
#include <stdio.h>

int main()
{
    if ((_xgetbv(0) & 6) == 6)
        printf("Yes\n");
    else
        printf("No\n");
    return 0;
}

is compiled (g++ -mavx2, -O0 build) as follows around the xgetbv instruction:
...
xgetbv
movq %rsi, %rax
andl $6, %eax
cmpq $6, %rax
sete %al
testb %al, %al
je .L2
...

Note: at -O1 the whole function is optimized down to only 2 instructions!
main:
  movl $0, %ecx
  xgetbv
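The mask 6 in the snippet tests bits 1 (SSE, i.e. XMM state) and 2 (AVX, i.e. upper halves of the YMM registers) of XCR0; the OS must enable both before AVX code can be used. In the -O0 listing above, that test is applied to whatever was in rsi instead of to the xgetbv result. The sketch below simply names the bits; the identifiers are hypothetical.

```cpp
#include <cstdint>

// XCR0 state-component bits (XSAVE feature set):
// bit 1 = SSE state (XMM registers), bit 2 = AVX state (upper halves of YMM).
constexpr std::uint64_t kXcr0Sse = std::uint64_t(1) << 1;
constexpr std::uint64_t kXcr0Avx = std::uint64_t(1) << 2;

// Same test as the report's (_xgetbv(0) & 6) == 6: true only when the OS
// saves and restores both XMM and YMM-upper state (a prerequisite for AVX).
constexpr bool os_saves_avx_state(std::uint64_t xcr0) {
    return (xcr0 & (kXcr0Sse | kXcr0Avx)) == (kXcr0Sse | kXcr0Avx);
}
```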

Best regards

[Bug rtl-optimization/80799] New: [7 Regression] x86-32 bits generates MMX without EMMS

2017-05-17 Thread gregory.hainaut at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80799

Bug ID: 80799
   Summary: [7 Regression] x86-32 bits generates MMX without EMMS
   Product: gcc
   Version: 7.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gregory.hainaut at gmail dot com
  Target Milestone: ---

Created attachment 41373
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41373=edit
MMX without EMMS instead of SSE2

Dear GCC developers,

GCC 7 breaks the build of the PCSX2 project. We found that MMX instructions are
generated without any EMMS instruction in the 32-bit build (compiled on an
x86-64 host). Note: it would likely be better to use the equivalent SSE2
instructions.
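A minimal sketch of the SSE2-only alternative suggested above, assuming the handler only needs to compare two 8-byte values (this is not the PCSX2 source; `equal64` is a hypothetical name). Both operands are loaded with `_mm_loadl_epi64` (movq into an xmm register), so the data is meant to stay in xmm registers and no EMMS is required; with `-m32 -msse2`, `__SSE2__` is defined and the intrinsic path is taken. Code that does use MMX intrinsics must call `_mm_empty()` (EMMS) before any later x87 floating-point code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Compare two 8-byte blocks for equality without touching MMX registers.
inline bool equal64(const void* a, const void* b) {
    assert(a != nullptr && b != nullptr);
#if defined(__SSE2__)
    // movq xmm, m64: loads 8 bytes and zeroes the upper 64 bits.
    __m128i va = _mm_loadl_epi64(static_cast<const __m128i*>(a));
    __m128i vb = _mm_loadl_epi64(static_cast<const __m128i*>(b));
    // pcmpeqd + pmovmskb, as in the listing: 0xFFFF means all 16 bytes equal.
    return _mm_movemask_epi8(_mm_cmpeq_epi32(va, vb)) == 0xFFFF;
#else
    return std::memcmp(a, b, 8) == 0;  // portable fallback without SSE2
#endif
}
```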

Please find a small example attached. Compiled with:
g++ -msse -msse2 -O2 -m32 -g -c gcc7_mmx.cpp

Here is the generated code:

GIFRegHandlerTRXPOS(GIFRegTRXPOS const&):
        push     ebx
        sub      esp, 8
        mov      ebx, DWORD PTR [esp+16]
        movq     xmm0, QWORD PTR TRXPOS

        movq     mm0, QWORD PTR [ebx]
        movq2dq  xmm1, mm0

        pcmpeqd  xmm0, xmm1
        pmovmskb eax, xmm0
        cmp      eax, 65535
        je       .L2
        call     dummy_call()

        movq     mm0, QWORD PTR [ebx]
.L2:
        movq     QWORD PTR TRXPOS, mm0

        add      esp, 8
        pop      ebx
        ret

Best Regards,
Gregory

[Bug target/80286] [5/6 Regression] AVX2 _mm_cvtsi128_si32 doesn't return a proper 32bits int

2017-04-04 Thread gregory.hainaut at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80286

--- Comment #8 from gregory hainaut  ---
(In reply to Jakub Jelinek from comment #7)
> wrong-code fixed on the trunk so far, optimizations still not.

Thank you for the quick fix :)

[Bug c++/80286] [4.9/5/6/7 regressions] AVX2 _mm_cvtsi128_si32 doesn't return a proper 32bits int

2017-04-02 Thread gregory.hainaut at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80286

--- Comment #1 from gregory hainaut  ---
Of course, I forgot to mention that the code was compiled with the following
options:
g++ -O2 -c -std=c++11 -mavx2

[Bug c++/80286] New: [4.9/5/6/7 regressions] AVX2 _mm_cvtsi128_si32 doesn't return a proper 32bits int

2017-04-02 Thread gregory.hainaut at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80286

Bug ID: 80286
   Summary: [4.9/5/6/7 regressions] AVX2 _mm_cvtsi128_si32 doesn't
return a proper 32bits int
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gregory.hainaut at gmail dot com
  Target Milestone: ---

Dear GCC developers,

It seems that G++ 4.9 introduced an optimization to reduce stack/memory
accesses that broke the behavior of _mm_cvtsi128_si32.

Note: I tested the various GCC versions on godbolt.org; I don't know whether
the GCC 7 snapshot there is recent.
Note 2: maybe the issue belongs to RTL/tree optimization, but I have no clue.

Here is a small test case:
-8<---
#include <immintrin.h>

__m256i m;

__m128i extract(__m128i minmax)
{
    int shift = _mm_cvtsi128_si32(_mm256_castsi256_si128(m));
    return _mm_srli_epi16(minmax, shift);
}
->8---
On recent GCC versions it compiles to the 2 asm instructions below. The issue
is that the shift-count operand of vpsrlw is 64 bits wide (the low quadword of
the xmm register), so "shift" must first be zero-extended to 64 bits. Clang,
for instance, does this with vpmovzxdq.

 vmovdqa m(%rip), %ymm1
 vpsrlw  %xmm1, %xmm0, %xmm0
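A scalar model makes the failure concrete. With an xmm count operand, PSRLW/VPSRLW reads the whole low 64 bits of that register, and any count above 15 clears every 16-bit lane. In the listing, xmm1 still holds the low quadword of m, so element 1 of m ends up in the upper half of the count. The functions below are hypothetical names that model a single lane; they are an illustration, not GCC or Clang code.

```cpp
#include <cstdint>

// One 16-bit lane of PSRLW with a 64-bit count: counts above 15 give 0.
constexpr std::uint16_t psrlw_lane(std::uint16_t lane, std::uint64_t count) {
    return count > 15 ? std::uint16_t(0) : std::uint16_t(lane >> count);
}

// Count the buggy code effectively uses: the low quadword of m,
// i.e. element 0 (the intended int) with element 1 in the upper half.
constexpr std::uint64_t count_from_low_qword(std::uint32_t elem0, std::uint32_t elem1) {
    return (static_cast<std::uint64_t>(elem1) << 32) | elem0;
}

// Count the source asks for: the 32-bit value zero-extended to 64 bits.
constexpr std::uint64_t count_zero_extended(std::uint32_t elem0) {
    return static_cast<std::uint64_t>(elem0);
}
```

With element 0 equal to 3 and element 1 equal to 1, the intended shift turns 0x8000 into 0x1000, while the polluted count (0x100000003) clears the lane.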

Best Regards,
Gregory