https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
--- Comment #3 from Matthias Kretz ---
Did you consider the error introduced by scaling with __amax? I made sure that
the division is without error by zeroing the mantissa bits. Here's a motivating
example that shows an error of 1 ulp otherwise:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052
--- Comment #5 from Matthias Kretz ---
Thank you Jakub! Here's a tested x86 library implementation for all conversions
and different ISA extension support for reference:
https://github.com/mattkretz/gcc/blob/mkretz/simd/libstdc%2B%2B-v3/include/
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949
--- Comment #7 from Matthias Kretz ---
Example showing the discrepancy: https://godbolt.org/z/D15m71
Also PR83875 is relevant wrt. giving different answers depending on function
attributes.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84949
Matthias Kretz changed:
What|Removed |Added
CC||kretz at kde dot org
--- Comment #6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=6
Matthias Kretz changed:
What|Removed |Added
CC||kretz at kde dot org
--- Comment #1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88152
--- Comment #5 from Matthias Kretz ---
> -fno-signed-zeros isn't a guarantee the operand will not be -0.0 and having
> x < 0.0 behave differently based on whether x is -0.0 or 0.0 (with
> -fno-signed-zeros quite randomly) is IMHO very bad.
I agr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88152
--- Comment #2 from Matthias Kretz ---
I just realized, the movmsk(x<0) => movmsk(x) transformation also applies to
float and double if -ffinite-math-only (i.e. no NaN, it's alright for inf) and
-fno-signed-zeros are active.
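A minimal scalar sketch of why both flags are needed, assuming IEEE 754 floats. The helper names `lane_less_than_zero`, `lane_sign_bit` and `agree` are hypothetical and only model a single vector lane; this is not the transformation as implemented in GCC.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// One lane of movmsk(x < 0): the bit is set only if the comparison is true.
inline bool lane_less_than_zero(float x) { return x < 0.0f; }

// One lane of movmsk(x): the bit is simply the sign bit of the value.
inline bool lane_sign_bit(float x) { return std::signbit(x); }

// True if both formulations produce the same mask bit for x.
inline bool agree(float x) { return lane_less_than_zero(x) == lane_sign_bit(x); }

// Self-check: ordinary values and infinities agree, -0.0 and negative NaN do not.
inline void check_lanes() {
    const float inf = std::numeric_limits<float>::infinity();
    const float neg_nan = std::copysign(std::numeric_limits<float>::quiet_NaN(), -1.0f);
    assert(agree(-1.5f) && agree(2.0f) && agree(inf) && agree(-inf));
    assert(!agree(-0.0f));   // -0.0 < 0 is false, but the sign bit is set
    assert(!agree(neg_nan)); // NaN < 0 is false, but the sign bit is set
}
```

The two formulations differ only for -0.0 and for NaNs with the sign bit set, which are exactly the inputs that -fno-signed-zeros and -ffinite-math-only allow the compiler to disregard.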
tion
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*
Testcase (https://godbolt.org/z/YNPZyf):
#include
template
u
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44551
--- Comment #20 from Matthias Kretz ---
The original issue I meant to report is fixed. There are many more missed
optimizations in the original example, though.
I.e. https://godbolt.org/z/7P1o3O should compile to:
use_insert_extract():
vmovdqu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44551
--- Comment #18 from Matthias Kretz ---
FWIW, the issue is resolved on trunk. GCC8.2 still has the missed optimization:
https://godbolt.org/z/hbgIIi
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87989
--- Comment #4 from Matthias Kretz ---
Yes, looks like a duplicate of 86246.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87989
Matthias Kretz changed:
What|Removed |Added
Known to work||7.3.0
Known to fail|
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Testcase (https://godbolt.org/z/sStNGV):
struct X {
template <class T> operator T() const;
operator float() const;
};
template
T f(const X &a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87631
--- Comment #2 from Matthias Kretz ---
My (current) use case is structures (nested) of builtin types and vector types.
These structures have a trivial copy constructor.
Generalization
---
I believe generalization of this approach s
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Consider:
using V [[gnu::vector_size(16)]] = float;
struct X1 { V a
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Test case (unreduced) at
https://web-docs.gsi.de/~mkretz/invalid_knl_instruction.cpp
Compile the test case with `g++ -std=c++17 -O1 -march=knl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86655
--- Comment #2 from Matthias Kretz ---
http://eel.is/c++draft/c.math#sf.cmath-1.3 might be the reason why `m <= l` is
enforced. But unless I'm confused the footnote on "mathematically defined"
tells us it should work:
- "(a) if it is explicitly
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
https://wg21.link/c.math#sf.cmath.assoc_legendre leaves m unconstrained.
__detail::__assoc_legendre_p documents "@param m The order o
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86267
--- Comment #2 from Matthias Kretz ---
Sorry for the delay. Vacation...
This pattern appears in many variations in the implementation of
wg21.link/p0214r9. The fixed_size ABI tag used with a simd_mask type
requires a decision from the implemente
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Testcase (cf. https://godbolt.org/g/gi6f7V):
#include
auto f(__m256i a, __m256i b) {
__m256i k = a < b;
l
: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
make_(un)signed_t of char16_t, char32_t, or wchar_t should never be
char16_t/char32_t/wchar_t, just like it is the case for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85827
--- Comment #3 from Matthias Kretz ---
But macros are different. They remove the code before the C++ parser sees it
(at least as-if). One great improvement of constexpr-if over macros is that all
the other branches are parsed and their syntax che
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85827
--- Comment #1 from Matthias Kretz ---
Same issue for -Wunused-variable
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Testcase `-std=c++17 -Wall` (cf. https://godbolt.org/g/kfgN2V):
template int f()
{
constexpr bool _1 = N == 1;
constexpr bool _2
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Testcase (cf. https://godbolt.org/g/UoU3zj):
using T = float;
using To [[gnu::vector_size(32)]] = T;
using From [[gnu::vector_size(32)]] = unsigned;
#define A2(I) (T)a[I
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
The absolute value for 64-bit integer SSE vectors is only optimized when
AVX512VL is available. Test case (`-O2 -ffast-math` and one of -mavx512vl,
-msse4, or -msse2):
#include
__v2di
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85538
--- Comment #3 from Matthias Kretz ---
Some more observations:
1. The instruction sequence:
kmovq %k1,-0x8(%rsp)
vmovq -0x8(%rsp),%xmm1
vmovq %xmm1,%rax
kmovq %rax,%k0
should be a simple `kmovq %k1,%k0` instead.
2. Adding `
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85538
--- Comment #1 from Matthias Kretz ---
Sorry, I was trying to force GCC to use the k1 register and playing with
register asm (which didn't have any effect at all). f8 should actually be (cf.
https://godbolt.org/g/hSkoJV):
bool f8(__m512i x, __m5
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Test case (`-O2 -march=skylake-avx512`, cf. https://godbolt.org/g/ou3oAZ):
#include
// bad:
bool f8(__m512i x, __m512i y) {
register __mmask64 k asm("
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85482
Matthias Kretz changed:
What|Removed |Added
Keywords||missed-optimization
Target|
: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Test case (cf. https://godbolt.org/g/QkJYSK):
#include
__m256 zero_extend1(__m128 a) {
return _mm256_insertf128_ps(__m256(), a, 0);
}
__m256d zero_extend1(__m128d a
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Test case (cf. https://godbolt.org/g/p4Kt8X):
#include
__m512 zero_extend2(__m128 a) {
return _mm512_insertf32x4(__m512(), a, 0
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
The following test case shows that constant propagation through conversion
intrinsics does not work:
#include
template using V [[gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85323
--- Comment #1 from Matthias Kretz ---
Created attachment 43898
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43898&action=edit
idea for a partial solution
Constant propagation works using the built in shift operators. At least for the
sh
: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
In the following test case, all three functions should compile to just `ret`:
#include
__m128i f(__m128i x) {
x = _mm_sll_epi64(x, __m128i());
x = _mm_sll_epi32(x
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
The following test case shows that the movemask intrinsics are a barrier
for constant propagation. All of these functions should have a trivial constant
return value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85077
--- Comment #8 from Matthias Kretz ---
Thanks! FWIW my abs implementation now uses:
template
[[gnu::optimize("finite-math-only,no-signed-zeros")]]
constexpr Storage abs(Storage v)
{
return v.d < 0 ? -v.d : v.d;
}
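As a hedged illustration of what the attribute permits, the sketch below contrasts the comparison form with a sign-bit mask on scalar floats. It assumes IEEE 754 binary32 `float`. The names `abs_by_compare` and `abs_by_sign_mask` are hypothetical and are not part of the implementation referenced above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");

// Comparison form: -0.0 < 0 is false, so -0.0 is returned unchanged.
inline float abs_by_compare(float v) { return v < 0 ? -v : v; }

// Sign-bit form: clears the top bit, which also maps -0.0 to +0.0.
inline float abs_by_sign_mask(float v) {
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof bits);
    bits &= 0x7fffffffu;
    std::memcpy(&v, &bits, sizeof v);
    return v;
}

// Self-check of the cases where the two forms agree and where they differ.
inline void check_abs() {
    const float inf = std::numeric_limits<float>::infinity();
    assert(abs_by_compare(-inf) == inf && abs_by_sign_mask(-inf) == inf);
    assert(abs_by_compare(-3.0f) == 3.0f && abs_by_sign_mask(-3.0f) == 3.0f);
    assert(std::signbit(abs_by_compare(-0.0f)));    // sign survives the comparison form
    assert(!std::signbit(abs_by_sign_mask(-0.0f))); // sign is cleared by the mask form
}
```

Under strict IEEE semantics the two forms are therefore not interchangeable. Once signed zeros and NaNs may be disregarded, a compiler is free to treat the comparison form as a plain sign-bit clear.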
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84786
--- Comment #15 from Matthias Kretz ---
Here's an idea for a test case (https://godbolt.org/g/SjM2HE: it appears fixed
on GCC 8):
typedef unsigned short V __attribute__((vector_size (16)));
V foo (V x, int y)
{
x <<= y;
asm volatile (""::"x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84786
--- Comment #14 from Matthias Kretz ---
I applied both patches to my GCC 7.2 installation and as a result my complete
testsuite passes now. Anything else I can help with?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84786
--- Comment #13 from Matthias Kretz ---
I'll try to apply it locally and will report my findings.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84786
--- Comment #11 from Matthias Kretz ---
Created attachment 43762
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43762&action=edit
test case that produces incorrect vpsrlw
Compiled with `g++-7 -std=c++17 -O0 -fabi-version=0 -fabi-compat-ver
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84786
--- Comment #10 from Matthias Kretz ---
This is all I have right now:
TID 0 SDE-ERROR: Executed instruction not valid for specified chip (KNL):
0x70d281: vpsrlw xmm0, xmm0, xmm16
Image:
/home/travis/build/VcDevel/Vc/build-Experimental/c2dd920conc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84786
--- Comment #8 from Matthias Kretz ---
There seems to be a similar bug for vpsrlw and vpsllw. Do you need a testcase?
(It's hard to hit the bug... just had one occur on a Travis CI build)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85077
--- Comment #4 from Matthias Kretz ---
Oh, there seems to be a regression in GCC 8. In 7 it works as you say. In 8 I
can't get the andps to show up
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85077
--- Comment #3 from Matthias Kretz ---
Ouch, right I didn't think of non-finite values.
I.e. -0 < 0 is false...
However, this is what I wanted:
abs(-inf) -> inf
abs( inf) -> inf
abs( nan) -> nan
abs( -0) -> 0
abs( 0) -> 0
The sign bit manip
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
The following test case (also at https://godbolt.org/g/XEPk7M) shows that `x <
0 ? -x : x` is not optimized to an efficient abs implementation. This is not
only the case for SSE, but also
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48701
--- Comment #3 from Matthias Kretz ---
Updated test case at https://godbolt.org/g/D5P1N1.
`testLoad` was fixed with 4.7.
`testStore` still combines via the stack.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43514
Matthias Kretz changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85048
--- Comment #3 from Matthias Kretz ---
Just opened PR85052 for tracking __builtin_convertvector support.
y: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
clang implements __builtin_convertvector to simplify conversions between
different vector builtins. In contrast to bitcasts, supported through C casts,
this builtin con
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85048
--- Comment #1 from Matthias Kretz ---
Godbolt link:
https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAKxAEZSAbAQwDtRkBSAJgCFufSAZ1QBXYskwgA5NwDMeFsgYisAag6yAwskEF8LAhuwcADAEFTZgpgC2AB2bX1WpU0GDVAFVKqFBVQByPn6qAMp4AF6YzgAigaoAVKqCkZi
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
The following testcase lists all integer and/or float conversions applied to
vector builtins of the same number of elements. All of those functions can be
compiled to a single
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84786
--- Comment #2 from Matthias Kretz ---
Created attachment 43618
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43618&action=edit
unreduced testcase
Compile with `g++ -std=c++17 -O2 -march=knl -o knl-fail knl-fail.cpp`.
The function `Tests:
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
I see generated code, such as:
424821:· vpxord %zmm17,%zmm17,%zmm17
424827:· vpxord %zmm18,%zmm18,%zmm18
[...]
424855:· vunpcklpd
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Testcase: https://godbolt.org/g/S3tfrL
#include
int f(__m128 a) { return _mm_movemask_ps(a)& 0xf; }
int f(__m128d a) { return _mm_movemask_pd(a)& 0x3;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83875
--- Comment #9 from Matthias Kretz ---
> inside multi-versioned (target_clones/target) function it depends on the
> active target
Yes, this part is easy.
> inside a constexpr context (function/variable, your examples) or
> always_inline func
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83875
--- Comment #7 from Matthias Kretz ---
Hmm,
what should the following print?
constexpr int native_simd_width = __builtin_target_supports("avx512f") ? 64 :
__builtin_target_supports("avx") ? 32 : __builtin_target_supports("sse") ? 16 :
__builtin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83875
Matthias Kretz changed:
What|Removed |Added
CC||kretz at kde dot org
--- Comment #6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83894
--- Comment #2 from Matthias Kretz ---
I compiled with:
g++-7 -march=haswell -std=c++17 -O3 -flax-vector-conversions -o char_shift
char_shift.cpp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83894
--- Comment #1 from Matthias Kretz ---
Created attachment 43149
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43149&action=edit
tsc.h
Header required for the benchmark code.
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Created attachment 43148
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43148&action=edit
benchmark
shifts of vector builtins
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83793
--- Comment #4 from Matthias Kretz ---
(In reply to Jonathan Wakely from comment #2)
> Looks like a dup of PR 47226
Ah, yes. Sorry for missing it, I recall seeing it before. I agree, a backport
would be nice, but an overhaul is not a backportabl
: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Testcase (cf. https://godbolt.org/g/jFkk7N):
```
#include
template struct simd
{
static constexpr size_t size() { return 4; }
template simd(F &&gen, decltype(std::declval()(0)) * =
: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Testcase (https://godbolt.org/g/mNhetZ):
#include
#include
using std::size_t;
template auto f(std::index_sequence) {
std
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47226
Matthias Kretz changed:
What|Removed |Added
CC||kretz at kde dot org
--- Comment #10
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Related: #55894
Testcase:
#include
int f() {
__m128i x{};
x = _mm_cmpeq_epi16(x, x);
return _pext_u32(_mm_movemask_epi8(x
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68049
--- Comment #1 from Matthias Kretz ---
Is there anything I can do to help finding a resolution to this issue? It's a
rather annoying issue for my SIMD code.
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
The following testcase fails to compile at -O0, but works at -O1 and higher
(-std=c++11 is the only required compiler flag):
template struct
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67011
Matthias Kretz changed:
What|Removed |Added
CC||kretz at kde dot org
--- Comment #3
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
The following testcase fails at -O2:
#include
typedef short A __attribute__((__may_alias__));
short extr(const __m128i &d, int index) { re
: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Testcase:
template (F::operator())(A()))>
void test();
Compile it with '-std=c++11'. Tested to fail with GCC 4.8.[0123]. GCC 4.9.x
does not
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63385
Matthias Kretz changed:
What|Removed |Added
Known to work||4.9.0, 4.9.1, 4.9.2
Known to fail|
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Testcase:
template <class F> void f(F closure) { auto g = [&]() { return closure; }; }
Compile with '
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50800
Matthias Kretz changed:
What|Removed |Added
CC||kretz at kde dot org
--- Comment #9
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57156
--- Comment #8 from Matthias Kretz ---
I just noticed the following in the Intel Optimization Reference Manual
(Version 028 from July 2013), section 2.2 "Sandy Bridge":
2.2.3.1 Renamer
[...]
There is another dependency breaking idiom - the "ones i
rmal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
int main()
{
return (int() & int());
}
compile with g++ -O0 -o test main.cpp
this compiles with all compilers I know, except GCC 4.8.1
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57156
Matthias Kretz changed:
What|Removed |Added
Component|target |tree-optimization
--- Comment #7 from M
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47769
--- Comment #5 from Matthias Kretz 2013-05-03 11:45:49
UTC ---
Another ping.
The bug status is still WAITING...
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57156
--- Comment #5 from Matthias Kretz 2013-05-03 09:56:00
UTC ---
(In reply to comment #4)
> I wouldn't know how to counter this for the _mm_cmpeq_epi8 case
Actually, I have yet to find something in the standard that says using an
uninitia
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57156
--- Comment #4 from Matthias Kretz 2013-05-03 09:37:58
UTC ---
(In reply to comment #3)
> I think this is undefined code as you use a uninitialized.
I wouldn't know how to counter this for the _mm_cmpeq_epi8 case, but for
_mm_comtrue_ep
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57156
--- Comment #2 from Matthias Kretz 2013-05-03 09:15:33
UTC ---
The failure disappears with -fno-tree-ccp
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57156
Matthias Kretz changed:
What|Removed |Added
Known to fail||4.7.0, 4.7.1, 4.7.2, 4.8.0
---
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57156
Bug #: 57156
Summary: miscompilation of call to _mm_cmpeq_epi8(a, a) or
_mm_comtrue_epu8(a, a) with uninitialized a
Classification: Unclassified
Product: gcc
Version: 4.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56920
Bug #: 56920
Summary: another static initialization of an array miscompiled
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: nor
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56918
Bug #: 56918
Summary: incorrect auto-vectorization of array initialization
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: norm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56038
Matthias Kretz changed:
What|Removed |Added
CC||kretz at kde dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56253
Bug #: 56253
Summary: fp-contract does not work with SSE and AVX FMAs
(neither FMA4 nor FMA3)
Classification: Unclassified
Product: gcc
Version: 4.7.2
Statu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56022
Bug #: 56022
Summary: [4.8 regression] ICE (segfault) at
convert_memory_address_addr_space (explow.c:334)
Classification: Unclassified
Product: gcc
Version: 4.8.0
St
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56019
Bug #: 56019
Summary: max_align_t should be in std namespace
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55727
Matthias Kretz changed:
What|Removed |Added
Attachment #29002|0 |1
is obsolete|
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55727
--- Comment #4 from Matthias Kretz 2012-12-18 18:20:00
UTC ---
Created attachment 29002
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29002
support for over-aligned types in new_allocator
I finished my allocator to fix the issue and it wa
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55727
--- Comment #3 from Matthias Kretz 2012-12-18 13:26:21
UTC ---
(In reply to comment #2)
> (In reply to comment #0)
> > Right now it does not even suffice to reimplement new/delete inside Foo to
> > make
> > std::vector work.
>
> Sorry
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55727
--- Comment #2 from Matthias Kretz 2012-12-18 09:11:41
UTC ---
(In reply to comment #0)
> Right now it does not even suffice to reimplement new/delete inside Foo to
> make
> std::vector work.
Sorry, this statement seems to be wrong. Th
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55727
--- Comment #1 from Matthias Kretz 2012-12-18 08:53:24
UTC ---
Created attachment 28992
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28992
simple testcase for std::vector
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55727
Bug #: 55727
Summary: better support for dynamic allocation of over-aligned
types
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55448
--- Comment #3 from Matthias Kretz 2012-11-24 21:38:21
UTC ---
BTW, the problem is just as well visible with only SSE. The __m128 case then
compiles to movlps and movhps instead of the memory operand.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55448
Bug #: 55448
Summary: using const-reference SSE or AVX types leads to
unnecessary unaligned loads
Classification: Unclassified
Product: gcc
Version: 4.7.2
S
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54703
--- Comment #7 from Matthias Kretz 2012-09-26 10:52:38
UTC ---
Thanks for the quick response! You guys are cool! :)
The pattern here is for calculation with extended precision:
xh = x & mask;
xl = x - xh;
yh = y & mask;
yl = y - yh;
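A minimal scalar sketch of this split, assuming IEEE 754 binary64 `double`. The mask value, the `Split` struct and `split_by_mask` are hypothetical choices for illustration; the mask used in the original code is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "sketch assumes 64-bit double");

struct Split { double hi; double lo; };

// Splits x into a high part with the low mantissa bits cleared and the remainder.
inline Split split_by_mask(double x) {
    // Hypothetical mask: clears the low 27 mantissa bits.
    const std::uint64_t mask = 0xFFFFFFFFF8000000ull;
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= mask;                      // xh = x & mask
    double hi;
    std::memcpy(&hi, &bits, sizeof hi);
    return {hi, x - hi};               // xl = x - xh
}

// Self-check: the two parts add back up to the original value.
inline void check_split() {
    const double x = 1.0 / 3.0;
    const Split s = split_by_mask(x);
    assert(s.hi != x);          // some low bits were cleared
    assert(s.hi + s.lo == x);   // the split loses no information
}
```

The pattern depends on the subtraction `x - xh` being carried out as written. If it is replaced by a bitwise operation, `xl` no longer holds the numeric remainder.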
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54703
--- Comment #1 from Matthias Kretz 2012-09-25 13:32:27
UTC ---
Um, sorry. Forgot to note the compiler switches:
gcc -O1 -march=bdver1
I can't reproduce the error with corei7-avx or any other non-AVX target.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54703
Bug #: 54703
Summary: [miscompilation] _mm_sub_pd is incorrectly substituted
with vandnps
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: U