Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: spatel at rotateright dot com
$ cat fneg.c
#include <xmmintrin.h>
__m128 fneg4(__m128 x) {
return _mm_sub_ps(_mm_set1_ps(-0.0), x);
}
$ ~gcc49/local/bin/gcc -march=core-avx2 -O2 -S
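For reference, a minimal scalar sketch of what fneg.c computes. It assumes a 32-bit IEEE-754 float and default floating-point settings (no -ffast-math); the names fneg_sub, fneg_xor and fneg_same_bits are made up for illustration and do not come from the report. For non-NaN inputs, subtracting from -0.0 should give the same bits as flipping the sign bit:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Negation written as a subtraction from -0.0, as in fneg.c (scalar form). */
float fneg_sub(float x) { return -0.0f - x; }

/* Negation written as a sign-bit flip. Assumes float is 32-bit IEEE-754. */
float fneg_xor(float x) {
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t)); /* check the size assumption */
    memcpy(&bits, &x, sizeof bits);            /* read the raw bit pattern */
    bits ^= UINT32_C(0x80000000);              /* toggle only the sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Returns 1 when both forms produce the same bit pattern for x. */
int fneg_same_bits(float x) {
    float a = fneg_sub(x);
    float b = fneg_xor(x);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

NaN inputs are left out on purpose: IEEE-754 does not pin down the sign of a NaN produced by a subtraction, so the two forms are only compared for ordinary values, zeros and infinities.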
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: spatel at rotateright dot com
$ cat fabs.c
#include <math.h>
float foo(float a) {
return fabsf(a);
}
$ gcc49 -O1 fabs.c -S -o -
.text
.globl _foo
_foo:
LFB19:
movss
: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: spatel at rotateright dot com
$ cat fnabs.c
#include <math.h>
float foo(float a) {
return -fabsf(a);
}
$ gcc49 -O1 fnabs.c -S -o -
.text
.globl _foo
_foo:
LFB19
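The assembly above is cut off, so the instructions GCC 4.9 actually emits are not shown. As a hedged illustration of what fabs.c and fnabs.c compute, assuming a 32-bit IEEE-754 float, both reduce to a single mask on the sign bit. The helper names fabs_mask and fnabs_mask are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* fabsf(a) as a bit operation: clear the sign bit. */
float fabs_mask(float a) {
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t)); /* check the size assumption */
    memcpy(&bits, &a, sizeof bits);
    bits &= UINT32_C(0x7fffffff);              /* keep exponent and mantissa */
    memcpy(&a, &bits, sizeof a);
    return a;
}

/* -fabsf(a) as a bit operation: force the sign bit on. */
float fnabs_mask(float a) {
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t)); /* check the size assumption */
    memcpy(&bits, &a, sizeof bits);
    bits |= UINT32_C(0x80000000);              /* set only the sign bit */
    memcpy(&a, &bits, sizeof a);
    return a;
}
```

Written this way, the negated form needs one OR rather than an AND followed by an XOR.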
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62054
Sanjay Patel <spatel at rotateright dot com> changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62054
--- Comment #3 from Sanjay Patel <spatel at rotateright dot com> ---
I think there's still an optimization possible here regarding the constant pool
data - see bug 62055. Hopefully, I didn't mess that one up. :)
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: spatel at rotateright dot com
Using gcc 4.9:
$ cat sdiv.c
typedef int vecint __attribute__((vector_size(16)));
vecint f(vecint x) {
return x/2;
}
$ gcc -O2 sdiv.c -S -o
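The command line and its output are truncated here, so the code GCC 4.9 generates is not shown. As a reminder of what x/2 has to compute for signed elements, here is a small sketch: C division rounds toward zero, so a plain arithmetic shift is not enough for negative inputs. The names div2 and div2_shift are made up for illustration, and the shift version assumes a 32-bit int whose >> on negative values is an arithmetic shift (implementation-defined in C, true for GCC):

```c
#include <assert.h>

typedef int vecint __attribute__((vector_size(16)));

/* Same operation as in sdiv.c: element-wise signed division by 2. */
vecint div2(vecint x) { return x / 2; }

/* Scalar equivalent built from a shift. Assumes 32-bit int and that
   >> on a negative int is an arithmetic shift. */
int div2_shift(int x) {
    assert((-1 >> 1) == -1);             /* check the arithmetic-shift assumption */
    int bias = (int)((unsigned)x >> 31); /* 1 for negative x, 0 otherwise */
    return (x + bias) >> 1;              /* bias makes the shift round toward zero */
}
```

Without the bias, -7 >> 1 would give -4, while -7 / 2 is -3.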
Assignee: unassigned at gcc dot gnu.org
Reporter: spatel at rotateright dot com
With gcc 4.9.0 (version details below), the x86 bit manipulation instruction
(BMI) C intrinsics are not being recognized. This appears to be a regression
from gcc 4.8.2.
$ cat bmi.c
#include <x86intrin.h>
int foo
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60847
--- Comment #1 from Sanjay Patel <spatel at rotateright dot com> ---
It looks like an extra leading underscore is required to recognize the BMI
intrinsics. This is not happening with other (BMI2, SSE4) intrinsics.
According to the Intel reference
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60847
Sanjay Patel <spatel at rotateright dot com> changed:
What|Removed |Added
Component|target |c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60847
--- Comment #8 from Sanjay Patel <spatel at rotateright dot com> ---
Thanks, Jakub.
I see that the fix duplicates all of the intrinsics with a
double-leading-underscore variant. Why do we need that? AFAIK, no other x86
intrinsics have this kind
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60847
--- Comment #10 from Sanjay Patel <spatel at rotateright dot com> ---
Ah - thank you for the explanation! I found the original checkin from AMD:
http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01356.html
Strangely, I can't find any documentation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64677
--- Comment #5 from Sanjay Patel <spatel at rotateright dot com> ---
(In reply to Mikhail Maltsev from comment #3)
> So, compile-time result is more precise. BTW, what does the disassembly look
> like?
In the -O0 case, it looks like all of the math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64677
--- Comment #2 from Sanjay Patel <spatel at rotateright dot com> ---
This is on plain x86-64 with SSE (before the addition of any FMA instructions),
so lack of FMA must be accounted for?
The answers differ in the last digit / ULP. Is there some
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64677
--- Comment #9 from Sanjay Patel <spatel at rotateright dot com> ---
(In reply to Sanjay Patel from comment #8)
> It seems I don't need the -std=c++11 flag as I do on OS X?
Actually, I screwed that up. We don't need that flag on OS X either
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64677
--- Comment #11 from Sanjay Patel <spatel at rotateright dot com> ---
(In reply to Mikhail Maltsev from comment #10)
> C++11 supports constexpr (and std::complex has constexpr constructor).
Ah, that makes sense. Yes, we're only generating
++
Assignee: unassigned at gcc dot gnu.org
Reporter: spatel at rotateright dot com
I'm not sure if this is a bug at -O0, at -O1 (in MPFR because all math is
folded out in this case?), or neither:
#include <complex>
#include <iostream>
#include <iomanip>
int main()
{
std::complex<double>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64677
--- Comment #8 from Sanjay Patel <spatel at rotateright dot com> ---
(In reply to Andrew Pinski from comment #7)
> Can you try this under Linux too, just to double check there?
Wow, that other bug shows that there are a lot of variables here.
I