[Bug target/72827] [7 Regression] gnat bootstrap broken on powerpc64le-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72827 Marc Glisse changed: What|Removed |Added CC||marc.glisse at normalesup dot org --- Comment #9 from Marc Glisse --- (In reply to Eric Botcazou from comment #8) > Unfortunately I don't seem to be able to connect to gcc112 in the > CompileFarm: > > eric@arcturus:~> ssh -l ebotcazou gcc112.osuosl.org > ssh: Could not resolve hostname gcc112.osuosl.org: Name or service not known gcc112.fsffrance.org (aka gcc2-power8.osuosl.org)
[Bug target/63789] g++ -m32 on solaris has trouble finding abs with int64_t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63789 --- Comment #6 from Marc Glisse --- Sorry, by recent I meant at least 6.1, I should have been more specific.
[Bug target/63789] g++ -m32 on solaris has trouble finding abs with int64_t
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63789 Marc Glisse changed: What|Removed |Added CC||marc.glisse at normalesup dot org --- Comment #4 from Marc Glisse --- I would expect that this is fixed (or at least the behavior changed) in recent versions of gcc, that ship a C++ stdlib.h wrapper. Can someone confirm?
[Bug tree-optimization/73714] [Regression 7] Incorrect unsigned long long arithmetic optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=73714 Marc Glisse changed: What|Removed |Added CC||marc.glisse at normalesup dot org --- Comment #2 from Marc Glisse --- Ah, my verifier considers that some_32bit_int << 57 is 0, while it actually is undefined or unspecified, and in particular may yield some_32bit_int << 25 on some platforms. I hope I didn't introduce too many similar bugs... I'll look at it more closely when I get time, but anyone should feel free to revert the first hunk of my patch if they need a quicker resolution.
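To make the pitfall concrete without invoking undefined behavior, here is a minimal sketch that models the two outcomes explicitly. The names `shl_as_zero` and `shl_masked` are hypothetical, and masking the count with 31 is an assumption about targets that only look at the low 5 bits of a 32-bit shift count; it is not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Model A (hypothetical): what a verifier might assume, an oversized
// shift count simply produces 0.
inline std::uint32_t shl_as_zero(std::uint32_t x, unsigned n) {
    return n >= 32 ? 0u : x << n;
}

// Model B (hypothetical): a target that only uses the low 5 bits of the
// count, so a shift by 57 behaves like a shift by 57 - 32 = 25.
inline std::uint32_t shl_masked(std::uint32_t x, unsigned n) {
    return x << (n & 31u);
}
```

The two models agree for counts below 32, for instance `assert(shl_as_zero(3u, 4) == shl_masked(3u, 4));`, and disagree for 57, which is why a transformation validated only against model A can be wrong on real hardware.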
[Bug tree-optimization/51938] missed optimization: 2 comparisons
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51938 Marc Glisse changed: What|Removed |Added Component|rtl-optimization|tree-optimization --- Comment #4 from Marc Glisse 2012-06-07 14:54:02 UTC --- Changing to tree-optimization (doing the optimization at RTL level would require finite-math-only). There is plenty of code that corresponds to A&&B and A||B, but (almost) nothing for A&&!B. Quite a big missing piece... : if (x_2(D) > 0.0) goto ; else goto ; : if (x_2(D) < 0.0) goto ; else goto ; The 2 conditions don't share the same then branch or the same else branch (it is a mix), so ifcombine doesn't even try to turn it into if (x_2(D) > 0.0 || !(x_2(D) < 0.0)) goto ; else goto ; Besides, it doesn't look like the logic is in place to fold that condition into just its second half (but I may have missed it).
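As a small illustration of the folding mentioned above, the sketch below checks that the combined condition is equivalent to its second half for ordinary values, infinity and NaN. The function names are hypothetical; the point is only that `x > 0.0` implies `!(x < 0.0)`, so the disjunction adds nothing. It also shows why `!(x < 0.0)` cannot be rewritten as `x >= 0.0` without finite-math-only: the two differ for NaN.

```cpp
#include <cassert>
#include <limits>

// The condition ifcombine could have produced (hypothetical helper).
inline bool combined(double x) { return x > 0.0 || !(x < 0.0); }

// Just the second half of it.
inline bool second_half(double x) { return !(x < 0.0); }

// True if both forms agree on a few representative inputs, NaN included.
inline bool forms_agree() {
    const double samples[] = {
        -1.0, -0.0, 0.0, 1.0,
        std::numeric_limits<double>::infinity(),
        -std::numeric_limits<double>::infinity(),
        std::numeric_limits<double>::quiet_NaN()};
    for (double x : samples)
        if (combined(x) != second_half(x)) return false;
    return true;
}
```

This assumes default floating-point semantics (no -ffast-math); under those, `assert(forms_agree());` holds, while `second_half(NaN)` is true and `NaN >= 0.0` is false.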
[Bug c++/53360] Problems with -std=gnu++0x
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53360 --- Comment #1 from Marc Glisse 2012-05-15 15:00:58 UTC --- clang and gcc reject it, but intel and oracle accept it.
[Bug c++/53350] Internal compiler error when compiling boost/smart_ptr/intrusive_ptr.hpp 1.49
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53350 --- Comment #5 from Marc Glisse 2012-05-15 14:50:42 UTC --- You may first want to check whether you still get the bug with a more recent gcc version.
[Bug c/53216] fmaf() alters rounding mode of sse2 FPU
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53216 Marc Glisse changed: What|Removed |Added CC||marc.glisse at normalesup ||dot org --- Comment #1 from Marc Glisse 2012-05-03 19:47:45 UTC --- This looks like a glibc issue, doesn't it? Or do you see something wrong with the code gcc produces for this example?
[Bug target/53101] Recognize casts to sub-vectors
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101 --- Comment #4 from Marc Glisse 2012-05-03 19:19:00 UTC --- (define_peephole2 [(set (mem:VI8F_256 (match_operand 2)) (match_operand:VI8F_256 1 "register_operand")) (set (match_operand: 0 "register_operand") (mem: (match_dup 2)))] "TARGET_AVX" [(set (match_dup 0) (vec_select: (match_dup 1) (parallel [(const_int 0) (const_int 1)])))] ) (and similar for VI4F_256) is much less hackish than the XEXP stuff. I was quite sure I'd tested exactly this and it didn't work, but now it looks like it does :-/ Except that following http://gcc.gnu.org/ml/gcc-patches/2012-05/msg00197.html , this is not the right place to try and add such logic. That's a good thing because it is way too fragile, another instruction can easily squeeze between the two sets and disable the peephole.
[Bug tree-optimization/30318] VRP does not create ANTI_RANGEs on overflow
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30318 --- Comment #7 from Marc Glisse 2012-05-02 14:33:42 UTC --- (In reply to comment #6) > On Sat, 28 Apr 2012, marc.glisse at normalesup dot org wrote: > > I find it easier to use bignum and wrap at the end, instead of checking for > > each operation if it overflows. > I think using GMP is way too expensive for this (simple) task. As long as you only try to handle operations on types no larger than HOST_WIDE_INT, using double_int should be possible. But if you want to handle wrapping multiplication of __int128, that's going to be hard without a widening multiplication to __int256. I guess I could implement a mulhi on double_int... Or at least make sure the slow path is only used for __int128 and not for small types. Or even fall back to VR_VARYING when __int128 overflows, but that's sad. (as a side note, it is strange that double_int is signed, it seems it should break with strict overflow) > Well, my original idea was to simultanely do range propagation for > wrapping and undefined overflow, and in the case that both results > result in different final transforms warn (to avoid the fact that > we do not fully take advantage of undefined overflow during propagation > and to avoid false positives on the warnings for undefined overflow). Good idea. I guess one of my problems is that there are several possible notions of overflow and I don't really know which gcc wants. 
- wrap (unsigned and -fwrapv)
- saturating (not currently)
- trap (has to detect overflows and do something about them)
- unspecified (don't know anything about the value produced by an overflow, but it is legal)
- illegal (we are allowed to crash the computer if such a path is ever taken, but also to just keep going with a random value, that may not even be consistent between uses, I guess that's -fstrict-overflow)

The comments at the definition of TYPE_OVERFLOW_UNDEFINED seem to indicate that it means "illegal", but tree-vrp tends to use: non-wrapping => unspecified. And I don't think value_range_d has a notion of an empty range (VR_UNDEFINED or VR_RANGE with max
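For readers less familiar with the first three notions, here is a rough user-level sketch of wrapping, saturating and detecting ("trap"-like) addition on int. It relies on the GCC/Clang builtin `__builtin_add_overflow`; the wrapper names `add_wrap`, `add_saturate` and `add_checked` are hypothetical and have nothing to do with how tree-vrp represents these cases internally.

```cpp
#include <cassert>
#include <climits>

// wrap: keep the low bits of the exact result (what -fwrapv gives).
inline int add_wrap(int a, int b) {
    int r;
    __builtin_add_overflow(a, b, &r);  // r holds the wrapped value
    return r;
}

// saturating: clamp to the nearest representable bound.
inline int add_saturate(int a, int b) {
    int r;
    if (__builtin_add_overflow(a, b, &r))
        return b > 0 ? INT_MAX : INT_MIN;  // overflow direction follows the sign of b
    return r;
}

// trap-like: report the overflow and let the caller decide what to do.
inline bool add_checked(int a, int b, int* out) {
    return !__builtin_add_overflow(a, b, out);
}
```

For example, `add_wrap(INT_MAX, 1)` yields `INT_MIN`, `add_saturate(INT_MAX, 1)` yields `INT_MAX`, and `add_checked(INT_MAX, 1, &r)` returns false. The "unspecified" and "illegal" notions have no such library-level equivalent, since they describe what the compiler is allowed to assume.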
[Bug c++/53177] 20_util/function/cons/callable.cc failed with -m32 -march=corei7
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53177 Marc Glisse changed: What|Removed |Added CC||marc.glisse at normalesup ||dot org --- Comment #4 from Marc Glisse 2012-05-02 00:54:38 UTC --- (In reply to comment #2) > I was seeing an ICE in the same place with an earlier version of the changes > which caused this testcase regression. I have only managed to reduce it to > 10k > lines so far - that delta-reduced file is attached, I haven't had time to try > manually reducing it. If you only want a small example causing the ICE, here is one (-std=c++0x is enough, no need for -m32 or -march). If you want something that looks vaguely like a valid C++ program, it's going to be bigger... extern "C++" namespace __attribute__ ) template < } template < typename > struct add_rvalue_reference ; template < _Tp > typename declval ( ) noexcept ; struct { typedef long __type } struct { } template < typename _Res , typename ... _ArgTypes > class function < _Res ( _ArgTypes ) { _Signature_type ( _ArgTypes ) template < typename _Functor > using _Invoke decltype ( ( declval < _Functor > ) ) template < typename , typename > struct _CheckResult { } template < ; template < typename _Functor > using _Callable _CheckResult < _Invoke < _Functor > , _Res > template < typename , typename > using _Requires template < typename _Functor , typename = _Requires < _Callable < _Functor > , void > > function ( _Functor ; } ; f ( function < void > ) { f ( [ ] )
[Bug target/53101] Recognize casts to sub-vectors
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101 --- Comment #3 from Marc Glisse 2012-05-01 17:17:42 UTC --- (In reply to comment #2) > but operands[2] and operands[3] don't compare equal with rtx_equal_p, and > trying a match_dup refuses to compile because of the mode mismatch, so I don't > know how to constrain 2 and 3 to be "the same". rtx_equal_p (XEXP (operands[2], 0), XEXP (operands[3], 0)) seems to give the right answer in the 3 manual tests I did. Currently checking if the testsuite finds something. It is very likely not the right way to do it, but I didn't find any inspiring pattern in the .md files. Then I'll see if I understand how the fancy macros make it possible to have a single piece of code for all modes, and if instead of calling gen_vec_extract_lo_v8sf I shouldn't give a replacement pattern like (set (match_dup 0) (vec_select (match_dup 1) (const_int 0))).
[Bug target/53101] Recognize casts to sub-vectors
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101 --- Comment #2 from Marc Glisse 2012-05-01 15:10:26 UTC ---
(In reply to comment #1)
> We get MEM[(T * {ref-all})&x] for the casting (not a BIT_FIELD_REF for
> example).
> This gets expanded to
>
> (insn 6 5 7 (set (reg:OI 63)
>         (subreg:OI (reg/v:V4DF 61 [ x ]) 0)) t.c:8 -1
>      (nil))
>
> (insn 7 6 8 (set (reg:V2DF 60 [ ])
>         (subreg:V2DF (reg:OI 63) 0)) t.c:8 -1
>      (nil))
>
> but that should be perfectly optimizable.

A bit hard for me (never touched those md files before)... This obviously incorrect code does the transformation:

(define_peephole2
  [(set (match_operand:V8SF 2 "memory_operand")
        (match_operand:V8SF 1 "register_operand"))
   (set (match_operand:V4SF 0 "register_operand")
        (match_operand:V4SF 3 "memory_operand"))]
  "TARGET_AVX"
  [(const_int 0)]
{
  emit_insn (gen_vec_extract_lo_v8sf (operands[0], operands[1]));
  DONE;
})

(the code in this experiment uses __v4sf and __v8sf instead of __m128d/__m256d in the description above)

but operands[2] and operands[3] don't compare equal with rtx_equal_p, and trying a match_dup refuses to compile because of the mode mismatch, so I don't know how to constrain 2 and 3 to be "the same". I tried adding some (subreg: ...) in there, but it didn't match, and looking at the rtl peephole dump, there isn't any subreg there. Then maybe peephole isn't the right place, but that's the only one where I managed to get something that compiles and is executed by the compiler on this testcase.
[Bug middle-end/53100] Optimize __int128 with range information
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53100 --- Comment #3 from Marc Glisse 2012-05-01 12:47:03 UTC ---
(In reply to comment #2)
> and not to introduce them just before an optimization that removes them.

Usually, doing (long)num1*(__int128)(long)num2 does the right thing. I tried in the example here replacing the plain __int128 multiplications with:

inline bool g1(__int128 x){
  //return(x<=LONG_MAX)&&(x>=LONG_MIN);
  //on 2 lines because of PR30318, unless you apply the patch I posted there
  bool b1 = x<=LONG_MAX;
  bool b2 = x>=LONG_MIN;
  return b1&&b2;
}
inline __int128 mul(__int128 a,__int128 b){
  bool B=g1(a)&&g1(b);
  if(__builtin_constant_p(B)&&B)
    return (long)a*(__int128)(long)b;
  return a*b;
}

__builtin_constant_p does detect we are in the right case, however, because of bad timing between the various optimizations, the double cast (__int128)(long)(u-x) is simplified to just (u-x) before it gets a chance to help. I need to replace the subtraction instead of (or in addition to) the multiplication:

inline __int128 sub(__int128 a,__int128 b){
  bool B=g1(a)&&g1(b)&& g1(a-b);
  if(__builtin_constant_p(B)&&B)
    return (long)a-(long)b;
  return a-b;
}

But it would fit better inside the compiler than as a fragile use of __builtin_constant_p.
[Bug middle-end/27139] Optimize double INT->FP->INT conversions with -ffast-math
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27139 --- Comment #4 from Marc Glisse 2012-05-01 09:32:25 UTC --- Hello Uros, is there any other case you think should be handled, or should we close the bug?
[Bug c++/53173] PROD02
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53173 --- Comment #1 from Marc Glisse 2012-04-30 20:02:59 UTC --- Uh, where are you reporting a bug in gcc? (In reply to comment #0) > I am trying to upgrade (GCC) 4.4.0 to (GCC) 4.6.2. I see bunch of > incompatible > error from code which works with (GCC) 4.4.0 but NOT with (GCC) 4.6.2. Yes, g++ becomes better at detecting illegal code. > 1. error: ‘constexpr’ needed for in-class initialization of static data member Are you using -std=c++0x? Why? > 2. error: no matching function for call to ‘std::pair boost::shared_ptr 3. /usr/include/sigc++-2.0/sigc++/signal.h:38:11: error: 'ptrdiff_t' does not > name a typeFix: #include actually stddef.h if you want ptrdiff_t and not just std::ptrdiff_t (unless there is a using namespace std, as 6. makes me fear) > 4. error: no matching function for call to ‘make_pair(std::string&, > std::string&)’ #include > 5. error: declaration of ‘~typename Missing most of the message again > 6. error: call of overloaded ‘isnan(double&)’ is ambiguous PR48891 maybe? > I do refer https://wiki.edubuntu.org/GCC4.6 to fix some of the issue. I > rebuilt boost_1_47_0, SQLAPI-3.7.35, etc. with (GCC) 4.6.2 as well to remove > incompatibilty between these. Gcc release notes often also contain relevant information, too. > I am suspicious if some of the issue is already fixed in (GCC) 4.6.3 (already > released). What do you mean, fixed? The bugs are in your code. > Please let me know if we can use (GCC) 4.6.3 instead of (GCC) 4.6.2. Sure, more bugs fixed.
[Bug c++/51312] [C++0x] Wrong interpretation of converted constant expressions (for enumerator initializers)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51312 --- Comment #5 from Marc Glisse 2012-04-29 14:12:12 UTC ---
Created attachment 27261 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27261
build_enumerator patch

Changes the behavior on g++.dg/cpp0x/enum_base.C from an error to a warning in the first 2 cases. And reusing the narrowing warning message may look a bit strange.

enum E4 : char {
  val = 500 // { dg-error "too large" }
};

enum_base.C:9:9: warning: narrowing conversion of '500' from 'int' to 'char' inside { } [-Wnarrowing]
   val = 500 // { dg-error "too large" }
         ^
enum_base.C:9:9: warning: overflow in implicit constant conversion [-Woverflow]
   val = 500 // { dg-error "too large" }
         ^
[Bug c++/53159] New: Missing narrowing check
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53159
Bug #: 53159
Summary: Missing narrowing check
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: marc.gli...@normalesup.org

In this example, I get narrowing warnings for a and b but not c.

struct X {
  constexpr operator int() { return __INT_MAX__; }
};
int f(){ return __INT_MAX__; }
signed char a { __INT_MAX__ };
signed char b { f() };
signed char c { X{} };
[Bug libstdc++/48891] std functions conflicts with C functions when building with c++0x support (and using namespace std)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48891 --- Comment #6 from Marc Glisse 2012-04-29 13:15:40 UTC ---
I don't think it matters that much whether the return type is int or bool, compared to the inconvenience of having 2 functions that conflict. The constexpr qualifier is nice, but not required by the standard, and not even by gcc which recognizes that extern "C" int isnan(double) is a builtin (note that it doesn't recognize it anymore if you change the return type to bool, that should be fixed). For the same reason (recognized as a builtin), there is no performance advantage to having it inline. So I think:

* glibc could change the return type of isnan to bool in C++ (there would be a regression in that ::isnan wouldn't be constexpr and inline until g++ is taught the right prototype)
* libstdc++ could import ::isnan in std::, assuming isnan exists. Maybe that requires a configure test. Maybe that test would be rather fragile (depends on feature macros). Maybe that's where this stops being a good idea :-(
[Bug middle-end/53100] Optimize __int128 with range information
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53100 --- Comment #2 from Marc Glisse 2012-04-29 08:42:36 UTC --- (In reply to comment #1) > On the other hand, tree-vrp does have the information that the > differences are in [-4294967295, 4294967295], which comfortably fits in a type > half the size of __int128. It seems a possible strategy would be to have > tree-vrp mark variables that fit in a type half their size (only for TImode?), > try and preserve that information along the way, and finally use it in > expand_doubleword_mult. An other possibility would be, when the range analysis detects this situation, to have it introduce a double-cast: (__int128)(long)var. In the example here, it would give: ((__int128)(long)((__int128)c-(__int128)a))*((__int128)(long)((__int128)f-(__int128)b)) and existing optimizations already handle: (long)((__int128)c-(__int128)a) as (long)c-(long)a and (__int128)mylong1*(__int128)mylong2 as a widening multiplication. But then we'd have to be careful not to introduce too many such casts, not to introduce them too late, and not to introduce them just before an optimization that removes them. And find the appropriate half-sized type to cast to. And possibly do this only for modes not handled natively.
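The effect of the double cast can be checked at the source level with a small sketch. It assumes a compiler providing `__int128` and uses `std::int64_t` as a stand-in for `long` on LP64 targets; the helper names are hypothetical. It only shows that the narrowed form computes the same product when both operands fit in 64 bits, not what code the compiler emits for either form.

```cpp
#include <cassert>
#include <cstdint>

// Does x fit in the half-sized (64-bit) type?
inline bool fits_in_64(__int128 x) {
    return x >= INT64_MIN && x <= INT64_MAX;
}

// Plain 128-bit multiplication.
inline __int128 mul_full(__int128 a, __int128 b) { return a * b; }

// Same product written with the double cast, valid only when both
// operands satisfy fits_in_64; this is the form that can be expanded
// as a single widening 64x64->128 multiplication.
inline __int128 mul_narrowed(__int128 a, __int128 b) {
    return (__int128)(std::int64_t)a * (__int128)(std::int64_t)b;
}
```

With a = 4294967295 and b = -4294967295 (the bounds quoted above) both functions return the same value, and that value does not itself fit in 64 bits, which is why the multiplication still has to be a widening one.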
[Bug middle-end/53100] Optimize __int128 with range information
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53100 --- Comment #1 from Marc Glisse 2012-04-29 08:05:59 UTC --- (In reply to comment #0) > It would be convenient if I > could just write the whole code with __int128 and let the compiler do the > optimization by tracking the range of numbers. The transformation from an __int128 to a pair of long happens extremely late (optabs.c), so we can't count on tree-vrp to notice that one of them is always zero (and actually it is either 0 or -1, as a sign extension, which would make this hard). On the other hand, tree-vrp does have the information that the differences are in [-4294967295, 4294967295], which comfortably fits in a type half the size of __int128. It seems a possible strategy would be to have tree-vrp mark variables that fit in a type half their size (only for TImode?), try and preserve that information along the way, and finally use it in expand_doubleword_mult. But that seems to imply storing the information in an rtx, and rtx seems a bit too densely packed to add this. Better ideas?
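The remark that the high word is "either 0 or -1, as a sign extension" can be seen by splitting a value by hand. This is only a sketch: the `Halves` struct and both helpers are hypothetical, it assumes a compiler with `__int128`, and it relies on right shift of a negative value being arithmetic (which GCC documents).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view of an __int128 as a (high, low) pair of 64-bit words.
struct Halves {
    std::int64_t hi;
    std::uint64_t lo;
};

inline Halves split(__int128 x) {
    return {static_cast<std::int64_t>(x >> 64), static_cast<std::uint64_t>(x)};
}

// True when the high word carries no information beyond the sign of the
// low word, i.e. the value would fit in a signed 64-bit type.
inline bool high_is_sign_extension(Halves h) {
    const std::int64_t expected = (h.lo >> 63) ? -1 : 0;
    return h.hi == expected;
}
```

For 5 the high word is 0, for -5 it is -1, and for 1 << 70 it is 64, so only the first two pass `high_is_sign_extension`.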
[Bug c/43772] Errant -Wlogical-op warning when testing limits
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43772 --- Comment #19 from Marc Glisse 2012-04-28 22:16:55 UTC --- (In reply to comment #18) > I'm afraid that false positives would still be likely. > For example, suppose we're on a platform where > INT_MAX = LONG_MAX < INTMAX_MAX. Then: > > intmax_t i = (whatever); > if (INT_MAX < i && i <= LONG_MAX) > print ("i is in 'long' but not 'int' range"); Have you actually seen that? I would imagine the following to be more common: if(i<=INT_MAX) print("i is in 'int'"); else if(i<=LONG_MAX) ... > This sort of thing is fairly common in portable code, > and GCC shouldn't warn about it merely because > we're on a platform where the two tests cannot both > be true when INT_MAX == LONG_MAX. Well, can you define a set of circumstances where gcc could / should warn? a
[Bug testsuite/53155] Not parallel: test for -j fails with new make
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53155 --- Comment #2 from Marc Glisse 2012-04-28 21:49:43 UTC ---
laptop-mg /tmp/m $ cat Makefile
all:
	$(MAKE) plouf
plouf:
	echo $(MFLAGS) "$(filter -j, $(MFLAGS))"
laptop-mg /tmp/m $ make -j
make plouf
make[1]: Entering directory `/tmp/m'
echo -wj ""
-wj
make[1]: Leaving directory `/tmp/m'

version 3.81-8.2
[Bug testsuite/53155] New: Not parallel: test for -j fails
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53155
Bug #: 53155
Summary: Not parallel: test for -j fails
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: testsuite
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: marc.gli...@normalesup.org

Hello, in order to decide whether they should run the testsuite in parallel, the makefiles in gcc/ and libstdc++-v3/testsuite/ use the following test:

[ "$(filter -j, $(MFLAGS))" = "-j" ]

However, at least with the gnu make 3.81 shipped by debian, MFLAGS merges all options, so it would normally be something like -wkj, which doesn't match the filter.
[Bug c/43772] Errant -Wlogical-op warning when testing limits
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43772 --- Comment #17 from Marc Glisse 2012-04-28 18:49:49 UTC ---
(In reply to comment #16)
> I understand now, and I think you are right. We don't have a warning for
> "((int)x) < INT_MIN" or ((int)x) > INT_MAX but I think it should go to
> Wtype-limits.

Interestingly, for an int i, we don't warn for x<=INT_MAX, but we do warn for x<=(long)INT_MAX (adapt if your platform has int and long of the same size).

> Do you think we could test this situation just before the Wlogical-op warning?

It is easy to re-check inside warn_logical_operator if one of the tests is always true. I have no idea how to pass the information from Wtype-limits that warn_logical_operator shouldn't be called.

> I can see that some macros may generate x >= INT_MIN but the x < INT_MIN case
> seems less likely to be intented and we should warn (and then return and avoid
> warning with Wlogical-op).

I think < INT_MIN and >= INT_MIN should either both warn or both be quiet. It is a matter of style whether people write: if (x in range) do the work; or if (x out of range) abort; do the work;

(In reply to comment #12)
> Do you mean:
>
> if (or_op && integer_onep(tem)) { warn();}
> else if (!or_op && integer_zerop(tem)) { warn();}

Even smaller would be to replace the current (TREE_CODE (tem) != INTEGER_CST) with integer_zerop(tem) and pass build_range_check in_p^or_op (or in_p==or_op, don't know which) instead of just in_p. It would already be an improvement over the current situation, and I expect the remaining false positives to be very rare. i>=INT_MIN&&isomething are common, but isomething seems less likely.
[Bug tree-optimization/30318] VRP does not create ANTI_RANGEs on overflow
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30318 --- Comment #5 from Marc Glisse 2012-04-28 13:18:25 UTC --- Created attachment 27260 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27260 Wrap using gmp I find it easier to use bignum and wrap at the end, instead of checking for each operation if it overflows. There is something wrong about having better range propagation for the wrapping case than for the case where overflow is undefined behavior. There are cases where a range is set to varying whereas it could be set to empty, and the branch marked as unreachable (haven't seen how that's done). But that's not the subject of this bug.
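The "compute exactly, wrap at the end" idea can be sketched on a toy domain. The code below is not the attached patch (which works on GCC trees with GMP); it is a hypothetical model for 8-bit unsigned values, where plain int is wide enough to play the role of the bignum. All names (`Kind`, `ValueRange`, `add_u8_ranges`) are made up for the illustration.

```cpp
#include <cassert>

enum class Kind { Range, AntiRange, Varying };

// Range means [lo, hi]; AntiRange means everything in [0, 255] except [lo, hi].
struct ValueRange {
    Kind kind;
    int lo;
    int hi;
};

// Add [alo, ahi] and [blo, bhi], both ranges of 8-bit unsigned values.
inline ValueRange add_u8_ranges(int alo, int ahi, int blo, int bhi) {
    const int lo = alo + blo;  // exact sums, no overflow in int
    const int hi = ahi + bhi;
    if (hi - lo >= 255) return {Kind::Varying, 0, 255};       // covers every residue
    if (hi <= 255) return {Kind::Range, lo, hi};              // nothing wraps
    if (lo >= 256) return {Kind::Range, lo - 256, hi - 256};  // everything wraps
    // Only the upper part wraps: the result is [lo, 255] plus [0, hi - 256],
    // i.e. everything except [hi - 255, lo - 1].
    return {Kind::AntiRange, hi - 255, lo - 1};
}
```

For instance [200, 255] + [10, 10] gives the exact interval [210, 265], which wraps partially and becomes the anti-range ~[10, 209], while [250, 255] + [10, 10] wraps entirely and becomes the plain range [4, 9].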
[Bug c/43772] Errant -Wlogical-op warning when testing limits
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43772 --- Comment #15 from Marc Glisse 2012-04-28 12:55:28 UTC --- (In reply to comment #14) > (In reply to comment #13) > > > > Except that this version would warn for xINT_MAX, whereas this > > belongs to other warnings. So testing the triviality of the first ranges > > seems > > best. > > I don't understand. This warning (whatever its name) should precisely warn for > that with "logical 'and' of mutually exclusive tests is always false". No, there could be a warning that the first test is always false, another one that the second one is always false, but adding a third warning that the conjunction of the 2 is always false seems bogus. This warning is meant for: x<5&&x>10, where each test independently could be true, just not both at the same time. At least that is my understanding...
[Bug c/53131] -Wlogical-op: ready for prime time in -Wall ?
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53131 --- Comment #6 from Marc Glisse 2012-04-28 12:45:19 UTC --- (In reply to comment #5) > It seems a pretty small warning, but I guess #1 and #2 could > be split up, if that helps get #2 in. I think it is the opposite actually, #2 is more controversial than #1 (at least until PR43772 is fixed).
[Bug c/43772] Errant -Wlogical-op warning when testing limits
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43772 --- Comment #13 from Marc Glisse 2012-04-28 12:40:14 UTC --- (In reply to comment #10) > But there is something strange, because it is warning "it is always false", > which is obviously not true. So I think at some moment it is doing some > transformation we don't want to do. It notices that it should warn, and unless one of the first ranges is trivial (a case it forgot), with an operator &&, the only warning that makes sense is that it is always false. It never shows that it is false, it is just a bit hasty in deciding which warning to pick. And indeed the "logical and...always true" sentence does not exist, because it doesn't make sense. (In reply to comment #11) > (In reply to comment #9) > > It forgets to check first whether the first 2 ranges are trivial. > Or easier, instead of checking: > if (TREE_CODE (tem) != INTEGER_CST) > it could check integer_onep(tem) or integer_zerop(tem) depending on or_op. Or > build a tree integer constant from or_op and tree_int_cst_equal it to tem. Except that this version would warn for xINT_MAX, whereas this belongs to other warnings. So testing the triviality of the first ranges seems best.
[Bug c/43772] Errant -Wlogical-op warning when testing limits
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43772 --- Comment #11 from Marc Glisse 2012-04-28 12:33:26 UTC --- (In reply to comment #9) > It forgets to check first whether the first 2 ranges are trivial. Or easier, instead of checking: if (TREE_CODE (tem) != INTEGER_CST) it could check integer_onep(tem) or integer_zerop(tem) depending on or_op. Or build a tree integer constant from or_op and tree_int_cst_equal it to tem.
[Bug c/43772] Errant -Wlogical-op warning when testing limits
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43772 Marc Glisse changed: What|Removed |Added CC||marc.glisse at normalesup ||dot org --- Comment #9 from Marc Glisse 2012-04-28 12:19:54 UTC --- For : x>=INT_MIN && x<=INT_MAX the code creates a range for x>=INT_MIN, another range for x<=INT_MAX, merges them into a single range, checks that that range is trivial (empty or full), and then warns according to the operator && or ||. It forgets to check first whether the first 2 ranges are trivial.
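The logic described above, and the missing check, can be modelled with plain intervals. This is a sketch for the && case only, with hypothetical names (`Interval`, `warns_unchecked`, `warns_checked`); the real code works on trees through the range helpers in fold-const.c and is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>

// Closed interval of int values; lo > hi encodes the empty range.
struct Interval {
    long long lo;
    long long hi;
};

inline bool is_empty(Interval r) { return r.lo > r.hi; }
inline bool is_full(Interval r) { return r.lo <= INT_MIN && r.hi >= INT_MAX; }
inline bool is_trivial(Interval r) { return is_empty(r) || is_full(r); }

// Merging two tests joined by && is an intersection.
inline Interval intersect(Interval a, Interval b) {
    return {std::max(a.lo, b.lo), std::min(a.hi, b.hi)};
}

// Behaviour described in the comment: warn as soon as the merged range is trivial.
inline bool warns_unchecked(Interval a, Interval b) {
    return is_trivial(intersect(a, b));
}

// Suggested behaviour: stay quiet when one of the inputs was already trivial
// (that belongs to other warnings), warn only for mutually exclusive tests.
inline bool warns_checked(Interval a, Interval b) {
    if (is_trivial(a) || is_trivial(b)) return false;
    return is_empty(intersect(a, b));
}
```

With both tests of `x>=INT_MIN && x<=INT_MAX` mapped to the full interval, `warns_unchecked` fires and `warns_checked` does not, while `x<5 && x>10` (intervals [INT_MIN, 4] and [11, INT_MAX]) triggers both.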
[Bug c/53131] -Wlogical-op: ready for prime time in -Wall ?
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53131 Marc Glisse changed: What|Removed |Added CC||marc.glisse at normalesup ||dot org --- Comment #4 from Marc Glisse 2012-04-28 12:05:20 UTC ---
(In reply to comment #2)
> > Do the warnings indicate bugs or not?
> Yes. I checked the first ten.

Could you give a sample? -Wlogical-op merges 2 unrelated warnings:

*) x && 2 (you would expect a boolean, not 2, so maybe x&2 was meant)
*) x<0 && x>0 (not so likely to happen) or x>=-5 || x<2 (always true)

and it is not clear which one you are most interested in.
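To spell out why the first kind of warning is useful, here is a tiny sketch contrasting the logical and bitwise forms. The helper names are hypothetical, and the mask is passed as a parameter so that the example itself compiles without triggering the very warning under discussion.

```cpp
#include <cassert>

// Logical form: true whenever both operands are nonzero, whatever their bits.
inline bool logical_and(int x, int mask) { return x && mask; }

// Bitwise form: true only when x and mask share a set bit.
inline bool bitwise_and(int x, int mask) { return (x & mask) != 0; }
```

With x = 1 and mask = 2 the logical form is true while the bitwise form is false, so writing `x && 2` when `x & 2` was meant silently changes the result.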
[Bug c/53153] ice in tree_low_cst, at tree.c:6569
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53153 Marc Glisse changed: What|Removed |Added CC||marc.glisse at normalesup ||dot org --- Comment #1 from Marc Glisse 2012-04-28 09:11:45 UTC ---
Reduced:

void f (char *BufPtr) {
  int Char = *BufPtr;
  switch (Char) {
    case 'a':
    case 181:
    case ~(0xff & (~180)):
      PrintError();
  }
}

$ gcc a.c -c -O2
a.c: In function 'f':
a.c:3:3: internal compiler error: in tree_low_cst, at tree.c:6569
   switch (Char) {
   ^

The regression is recent.
[Bug c++/53139] internal compiler error: expected a type, got '#'tree_vec' not supported by dump_expr#'
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53139 Marc Glisse changed: What|Removed |Added CC||marc.glisse at normalesup ||dot org --- Comment #1 from Marc Glisse 2012-04-27 10:26:35 UTC --- Works fine on trunk (since very recently). The 4.6 message looks fine (it indeed wasn't implemented in 4.6). Can you check whether it works with a 4.7 snapshot?
[Bug c++/29131] [DR 225] Bad name lookup for templates due to fundamental types namespace for ADL.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29131 Marc Glisse changed: What|Removed |Added CC||marc.glisse at normalesup ||dot org --- Comment #24 from Marc Glisse 2012-04-26 19:14:45 UTC ---
(In reply to comment #22)
> I am sorry if my knowledge on this issue is limited, but if I put t() and f()
> in namespace glm (re. the code in comment #20), should this compile? (That is
> what you comment #19 implies). Actually it does not.

So you are talking about this? Notice how vec3 isn't actually in glm. Interactions between namespaces and name lookup can be difficult.

namespace glm {
  namespace detail { struct vec3{}; }
  using detail::vec3;
}
template <typename T> int t(T i) { return f (i); }
namespace glm { int f (glm::vec3 i) { return 0; } }
int main() { glm::vec3 b; return t(b); }
[Bug c++/53121] New: Allow static_cast from pointer-to-vector to pointer-to-object
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53121
Bug #: 53121
Summary: Allow static_cast from pointer-to-vector to pointer-to-object
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: marc.gli...@normalesup.org

Hello, casting a __m128d* to a double* currently requires an ugly C cast. However, this pointer cast is the official way to access the elements of the vector, so I believe it should be allowed in a static_cast. Whether that should extend to casts with references and arrays is a harder question, but the answer is probably yes. Casts to sub-vectors (__m256d* to __m128d*) would be a bonus.

#include
double* f(__m128d* x){
  return static_cast<double*>(x);
  // return (double*)(x);
}

v.cc: In function ‘double* f(__m128d*)’:
v.cc:3:32: error: invalid static_cast from type ‘__m128d* {aka __vector(2) double*}’ to type ‘double*’
[Bug c++/53000] Conditional operator does not behave as standardized
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53000 --- Comment #7 from Marc Glisse 2012-04-24 23:23:09 UTC --- (In reply to comment #6) > which way is the standards committee leaning? The DR is young, there hasn't been a meeting since. There weren't many objections to the proposed resolution, although it did seem strange to some that common_type::type would be int and not int&. I am too new to the process to say more... (I guess the proposed resolution should make the one-argument version of common_type equivalent to decay, to be consistent)
[Bug c++/53000] Conditional operator does not behave as standardized
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53000 --- Comment #5 from Marc Glisse 2012-04-24 22:35:31 UTC ---
(In reply to comment #4)
> it's not obvious to me what the right fix is
> either so I'm not in a rush to change anything.

Actually, I now believe it is a good idea to rush (well, maybe not quite) the change:

- it is needed by clang,
- it gives users an opportunity to complain against the proposed resolution (if they don't, it is an argument in favor of it),
- it removes an excuse not to fix ?: with xvalues.

I think I've canceled my comment #3 enough that we are back to your comment #2 where you were proposing to make the change ;-)
[Bug c++/53000] Conditional operator does not behave as standardized
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53000 Marc Glisse changed: What|Removed |Added CC||marc.glisse at normalesup ||dot org --- Comment #3 from Marc Glisse 2012-04-24 19:31:52 UTC --- (In reply to comment #2) > Confirmed. > > I suppose we could make the libstdc++ change now rather than waiting for the > FE > fix, as it shouldn't change the current behaviour of the library. It doesn't seem completely obvious to me that this is the right library fix. What happens if instead of the standard declval you use the trivial version? template _Tp __declval2() noexcept; (except for the obvious problem with indestructible types, but then the decay version may give you an answer that isn't constructible from the input for references to a non-copyable type, so that's fair) Rereading the DR, it appears that some people actually want to decay independently from this rvalue issue, which is quite a strong change. And after all, people can use decay, but if decay is included in common_type, it can't be undone. Although now that I think as a library writer who has to specialize common_type for some of his types, I don't really want to specialize it for all cv-ref variants of my types, so I'd actually like the default common_type to decay not only the result, but also its arguments! And while we are at it, it could even try canonicalizing them, like operator auto(). Hmm, I guess you can forget this rant and go ahead (I am still posting it because there may be real arguments somewhere).
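For context, the sketch below shows at the language level why a common_type built directly on the conditional operator can end up naming a reference, and what applying decay to the result changes. It uses only standard `<type_traits>` facilities; the alias `LvalueCond` and the helper function are hypothetical, and nothing here reproduces the libstdc++ implementation or the wording of the DR.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Type of `cond ? a : b` when both operands are lvalues of type int:
// the conditional expression is itself an lvalue, so decltype gives int&.
using LvalueCond = decltype(true ? std::declval<int&>() : std::declval<int&>());

static_assert(std::is_same<LvalueCond, int&>::value,
              "?: on two int lvalues yields an lvalue");

// decay strips the reference (and cv-qualifiers), giving a plain value type.
static_assert(std::is_same<std::decay<LvalueCond>::type, int>::value,
              "decay turns int& into int");
static_assert(std::is_same<std::decay<const int&>::type, int>::value,
              "decay also drops const");

// Runtime-visible form of the second check, for use in a test.
inline bool decay_strips_reference() {
    return std::is_same<std::decay<LvalueCond>::type, int>::value;
}
```

As noted above, the decay step cannot be undone once it is baked into common_type, whereas a user can always apply decay to a non-decaying result; the sketch takes no side on which default is preferable.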
[Bug c++/51033] generic vector subscript and shuffle support was not added to C++
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51033 --- Comment #23 from Marc Glisse 2012-04-24 11:57:22 UTC --- (In reply to comment #21) > What does it mean "exercise the backend a lot"? Do you mean it takes a lot of > time? I think so. > I haven't looked at the tests, but I think it is not a problem to run > compile-only tests with both gcc and g++. compile-time tests are not always sufficient. The __builtin_shuffle tests are spread in: gcc.dg{,/torture} gcc.target/{i386,powerpc} gcc.c-torture/{compile,execute} I assume the tests in gcc.dg can move to c-c++-common. The target tests should stay in target. Not sure about gcc.c-torture. But one interesting thing to test is if the front-end passes the arguments as constants and thus the backend can use specialized code instead of the slow generic one. And this kind of test seems necessarily target-specific. Bah, I guess I shouldn't ask for too much and moving the gcc.dg tests would be enough.
[Bug target/53101] New: Recognize casts to sub-vectors
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53101 Bug #: 53101 Summary: Recognize casts to sub-vectors Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: target AssignedTo: unassig...@gcc.gnu.org ReportedBy: marc.gli...@normalesup.org Target: x86_64-linux-gnu Hello, starting from an AVX __m256d vector x, getting its first element is best done with *(double*)&x, which is what x[0] internally does, and which generates no instruction (well, the following has vzeroupper, but let's forget that). However, *(__m128d*)&x generates 2 movs and I have to explicitly use _mm256_extractf128_pd to get the proper nop. Could the compiler be taught to recognize the casts between pointers to vectors of the same object type the same way it recognizes casts to pointers to that object type? #include #if 0 typedef double T; #else typedef __m128d T; #endif T f(__m256d x){ return *(T*)&x; } The closest report I found is PR 44551, which is quite different. PR 29881 shows that using a union is not an interesting alternative. I marked this one as target, but it may very well be that the recognition should be in the middle-end, or even that the front-end should mark the cast somehow.
[Bug middle-end/53100] New: Optimize __int128 with range information
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53100

             Bug #: 53100
           Summary: Optimize __int128 with range information
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: middle-end
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: marc.gli...@normalesup.org

(not sure about the "component" field)

In the following program, on x86_64, the first version generates two imulq, while the second generates 4 imulq and 2 mulq. It would be convenient if I could just write the whole code with __int128 and let the compiler do the optimization by tracking the range of numbers.

int f(int a,int b,int c,int d,int e,int f){
#if 0
  long x=a;
  long y=b;
  long z=c;
  long t=d;
  long u=e;
  long v=f;
  return (z-x)*(__int128)(v-y) < (u-x)*(__int128)(t-y);
#else
  __int128 x=a;
  __int128 y=b;
  __int128 z=c;
  __int128 t=d;
  __int128 u=e;
  __int128 v=f;
  return (z-x)*(v-y) < (u-x)*(t-y);
#endif
}
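The range argument behind this request can be checked with a small sketch. It assumes a 32-bit int and a compiler with the __int128 extension. The function names are made up for this example, and long long replaces long so that it does not depend on the LP64 model. The difference of two ints fits in 33 bits, so each product needs only one 64x64->128 multiplication, even though the product itself can exceed 64 bits.

```cpp
#include <climits>

// First form from the report: differences computed in 64 bits, then a
// single widening to __int128 per product.
constexpr bool less_widened_once(int a, int b, int c, int d, int e, int f) {
  return (static_cast<long long>(c) - a) *
             static_cast<__int128>(static_cast<long long>(f) - b) <
         (static_cast<long long>(e) - a) *
             static_cast<__int128>(static_cast<long long>(d) - b);
}

// Second form from the report: everything is __int128 from the start.
constexpr bool less_all_wide(int a, int b, int c, int d, int e, int f) {
  return (static_cast<__int128>(c) - a) * (static_cast<__int128>(f) - b) <
         (static_cast<__int128>(e) - a) * (static_cast<__int128>(d) - b);
}

// Both forms agree, including at the extremes of the int range where the
// product no longer fits in 64 bits.
static_assert(less_widened_once(0, 0, 1, 3, 2, 1) == less_all_wide(0, 0, 1, 3, 2, 1),
              "forms agree on small inputs");
static_assert(less_widened_once(INT_MIN, INT_MIN, INT_MAX, 0, 0, INT_MAX) ==
                  less_all_wide(INT_MIN, INT_MIN, INT_MAX, 0, 0, INT_MAX),
              "forms agree at the extremes");
```

The two functions compute the same value; the enhancement request is only about getting the cheaper code for the second spelling.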
[Bug middle-end/27139] Optimize double INT->FP->INT conversions with -ffast-math
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27139

Marc Glisse changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marc.glisse at normalesup dot org

--- Comment #2 from Marc Glisse 2012-04-24 06:35:43 UTC ---
(In reply to comment #0)
> int test (int a)
> {
>   return (double) a;
> }

I just wrote the very same testcase today, extracted from my code...

> Produces:
>
>         cvtsi2sd        %edi, %xmm0
>         cvttsd2si       %xmm0, %eax
>         ret

Still does. Did you have any idea how to handle it?

> However, following code does the same (at least for -ffast-math):
>         movl    %edi, %eax
>         ret

I don't think -ffast-math is relevant here, on x86 the int->double conversion is exact hence the reverse has to be as well.

(In reply to comment #1)
> Confirmed, I doubt this shows up that much anyways.

Just posting to mention that it does show up...
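The exactness claim can be illustrated with a short sketch, assuming a binary double whose significand has at least as many digits as int has value bits (true for IEEE binary64 and a 32-bit int). round_trip is a name chosen for this example.

```cpp
#include <climits>
#include <limits>

// Every int is exactly representable in double under this assumption,
// so converting there and back cannot change the value.
static_assert(std::numeric_limits<double>::radix == 2 &&
                  std::numeric_limits<double>::digits >=
                      std::numeric_limits<int>::digits,
              "double must represent every int exactly");

constexpr int round_trip(int a) {
  return static_cast<int>(static_cast<double>(a));
}

static_assert(round_trip(0) == 0, "zero survives");
static_assert(round_trip(INT_MAX) == INT_MAX, "largest int survives");
static_assert(round_trip(INT_MIN) == INT_MIN, "smallest int survives");
```

Since the round trip is the identity for every input, folding it to a plain move needs no fast-math assumption.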
[Bug c++/53094] New: vector literal
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53094 Bug #: 53094 Summary: vector literal Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: marc.gli...@normalesup.org Hello, VECTOR_TYPE should be a literal type in C++11, so we can have for instance: constexpr __m128i v = { 1, 0 }; constexpr __m128i s = v + v; Once PR c++/51033 is fixed, ideally, the following would also work: constexpr long long i = v[1]; constexpr __m128i w = __builtin_shuffle (m, m); but I guess this can be made in several steps as long as the compiler doesn't ICE on those.
[Bug middle-end/53082] local malloc/free optimization
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53082 Marc Glisse changed: What|Removed |Added CC||marc.glisse at normalesup ||dot org --- Comment #2 from Marc Glisse 2012-04-23 07:11:14 UTC --- (In reply to comment #1) > Dup of an older bug 19831. The second part (coalescing mallocs and/or replacing them with alloca) doesn't look like a dup of 19831.
[Bug c++/51033] generic vector subscript and shuffle support was not added to C++
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51033

--- Comment #22 from Marc Glisse 2012-04-22 15:09:23 UTC ---
(In reply to comment #20)
> And then I still need to write a cxx_eval_vec_perm function so the result of
> __builtin_shuffle can be constexpr. I haven't seen how the C front-end handles
> shuffles of constants. Maybe a "sorry" would do for now.

Making vectors literals is too much for now, the following seems sufficient as long as they are not.

--- cp/semantics.c (revision 186667)
+++ cp/semantics.c (working copy)
@@ -8262,10 +8262,11 @@ potential_constant_expression_1 (tree t,
     case TRANSACTION_EXPR:
     case IF_STMT:
     case DO_STMT:
     case FOR_STMT:
     case WHILE_STMT:
+    case VEC_PERM_EXPR:
       if (flags & tf_error)
         error ("expression %qE is not a constant-expression", t);
       return false;
 
     case TYPEID_EXPR:
[Bug c++/51033] generic vector subscript and shuffle support was not added to C++
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51033 --- Comment #20 from Marc Glisse 2012-04-22 13:21:14 UTC --- (In reply to comment #19) > Created attachment 27217 [details] > shuffle Doesn't work with -std=c++11, which requires: --- semantics.c(revision 186667) +++ semantics.c(working copy) @@ -5603,11 +5603,12 @@ float_const_decimal64_p (void) bool literal_type_p (tree t) { if (SCALAR_TYPE_P (t) - || TREE_CODE (t) == REFERENCE_TYPE) + || TREE_CODE (t) == REFERENCE_TYPE + || TREE_CODE (t) == VECTOR_TYPE) return true; if (CLASS_TYPE_P (t)) { t = complete_type (t); gcc_assert (COMPLETE_TYPE_P (t) || errorcount); @@ -8487,10 +8488,11 @@ potential_constant_expression_1 (tree t, want_rval, flags)) return false; return true; case FMA_EXPR: +case VEC_PERM_EXPR: for (i = 0; i < 3; ++i) if (!potential_constant_expression_1 (TREE_OPERAND (t, i), true, flags)) return false; return true; And then I still need to write a cxx_eval_vec_perm function so the result of __builtin_shuffle can be constexpr. I haven't seen how the C front-end handles shuffles of constants. Maybe a "sorry" would do for now.
[Bug c++/51033] generic vector subscript and shuffle support was not added to C++
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51033 --- Comment #19 from Marc Glisse 2012-04-22 10:31:33 UTC --- Created attachment 27217 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27217 shuffle With this patch, g++ passes the few __builtin_shuffle tests I tried, and generates generic code for non-constant indexes and special code for constant indexes. I don't really know what to do about the testsuite. The tests exercise the backend a lot, and it probably doesn't make sense to run everything with both gcc and g++. But we still want to test that g++ accepts the syntax, and maybe even that it handles constants well. Content of the patch: - move c_build_vec_perm_expr to c-common and condition the maybe_const stuff to the dialect - adapt the C RID_BUILTIN_SHUFFLE parser code to the C++ FE (the 2 are different enough that it isn't easy to share) - remove the C_ONLY tag from __builtin_shuffle As usual, my limited knowledge of the compiler means I may have missed fundamental things.
[Bug c/53060] Typo in build_binary_op for scalar-vector ops
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53060

--- Comment #2 from Marc Glisse 2012-04-21 14:59:54 UTC ---
(In reply to comment #1)
> * gcc.dg/scal-to-vec2.c: New test.

This one runs the problematic code, but since this is a compile-only test, it can't detect a problem. A variant that does fail:

extern void abort (void);
int f(void) { return 2; }
unsigned int g(void) { return 5; }
unsigned int h = 1;
typedef unsigned int vec __attribute__((vector_size(16)));
vec i = { 1, 2, 3, 4};
vec fv1(void) { return i + (h ? f() : g()); }
vec fv2(void) { return (h ? f() : g()) + i; }
int main(){
  vec j = fv1();
  if (j[0] != 3) abort();
}

(it works ok with fv2)
[Bug c/53060] New: Typo in build_binary_op for scalar-vector ops
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53060 Bug #: 53060 Summary: Typo in build_binary_op for scalar-vector ops Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassig...@gcc.gnu.org ReportedBy: marc.gli...@normalesup.org Hello, in file c-typeck.c, function build_binary_op, for mixed scalar-vector operations, there are 2 cases: stv_firstarg and stv_secondarg. The first one has: op0 = c_wrap_maybe_const (op0, true); while the second has: op0 = c_wrap_maybe_const (op1, true); I think the second one should read "op1 = ...", for symmetry. I haven't managed to come up with a testcase that runs this line of code :-(
[Bug c++/53057] [c++0x] ICE on construction off of initializer list with overloads for constructor
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53057 Marc Glisse changed: What|Removed |Added CC||marc.glisse at normalesup ||dot org --- Comment #1 from Marc Glisse 2012-04-21 07:57:23 UTC --- This seems to have been fixed recently on trunk. Maybe related to PR c++/52905 ?
[Bug c++/53025] [C++11] noexcept operator depends on copy-elision
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53025 --- Comment #2 from Marc Glisse 2012-04-21 07:45:57 UTC --- Created attachment 27210 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27210 patch Bootstrapped and regression tested. Not posting it to gcc-patches yet, for several reasons: - I have other patches (at least 3) waiting for a review, - I am not 100% certain that this can't cause legitimate elisions to be missed (say if something is first instantiated inside the noexcept), - people may not like using globals that way, - I might prefer the old behavior... but if anyone wants to submit it, feel free.
[Bug c++/53055] ICE in cp_build_indirect_ref, at cp/typeck.c:2836
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53055 Marc Glisse changed: What|Removed |Added CC||marc.glisse at normalesup ||dot org --- Comment #2 from Marc Glisse 2012-04-20 14:44:29 UTC --- A brutal application of delta gives this short but non-sensical code: void f () ; struct A A :: * p ; int i = p ->* f ;
[Bug c++/51314] [C++0x] sizeof... and parentheses
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51314 --- Comment #2 from Marc Glisse 2012-04-19 21:19:23 UTC --- Created attachment 27200 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27200 patch s.cc: In function 'void f(U ...)': s.cc:3:18: error: 'sizeof...' argument must be surrounded by parentheses A x; // template argument 1 is invalid ^ s.cc:3:19: error: template argument 1 is invalid A x; // template argument 1 is invalid ^ s.cc:3:22: error: invalid type in declaration before ';' token A x; // template argument 1 is invalid ^ s.cc: At global scope: s.cc:10:37: error: 'sizeof...' argument must be surrounded by parentheses typedef Indices type; // OK ^ Error recovery is not that great in the first case, but fine in the second.
[Bug c++/53036] [c++11] trivial class fails std::is_trivial test
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53036 --- Comment #2 from Marc Glisse 2012-04-19 12:14:04 UTC --- Created attachment 27189 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27189 basic patch The patch detects D as trivial. Sadly, on this case: struct A { A()=default; A(int=2); }; it says A is trivial whereas I guess the ambiguity makes it non-trivial. That could be solved for the traits by combining it with is_default_constructible, but it may be problematic to let g++ internally believe that the class is trivially default constructible. For some strange reason, in the case of an ellipsis: struct A { A()=default; A(...); }; it does say: non-trivial. Maybe the whole dance should only be done if the constructor argument is a parameter pack (one that belongs to the function? or several packs?).
[Bug c++/53036] [c++11] trivial class fails std::is_trivial test
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53036

Marc Glisse changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marc.glisse at normalesup dot org

--- Comment #1 from Marc Glisse 2012-04-19 06:28:57 UTC ---
(In reply to comment #0)
> In my understanding of the new C++ standard, the following code should compile.
> It does not.
>
> struct D
> {
>   D() = default;
>   D(D const &) = default;
>   template<typename ...U>
>   constexpr D(U ...u)
>   {}
> };
> static_assert(std::is_trivial<D>::value, "here");

With the declarations in this order, it seems easy to fix: in grok_special_member_properties, only set TYPE_HAS_COMPLEX_DFLT to 1 if we didn't already have TYPE_HAS_DEFAULT_CONSTRUCTOR (might have hidden issues, but they are not obvious to me).

Now if you put the defaulted constructor after the user-provided variadic one, it becomes much harder, and it looks like we'd have to remember one extra bit of information: the reason why we set TYPE_HAS_COMPLEX_DFLT.
[Bug c++/53025] [C++11] noexcept operator depends on copy-elision
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53025 Marc Glisse changed: What|Removed |Added CC||marc.glisse at normalesup ||dot org --- Comment #1 from Marc Glisse 2012-04-18 06:18:57 UTC --- (In reply to comment #0) > b) An even more convincing argument is that when adding the compiler argument > > --no-elide-constructors > > the original code becomes accepted as well, thus the outcome indeed depends on > copy-elision taking place or not. The semantics of the noexcept operator > (5.3.7) are described by "potentially evaluated functions calls" and 3.2 p3 > says in a note that "A constructor selected to copy or move an object of class > type is odr-used even if the call is actually elided by the implementation", > so > this observable behaviour seems to be non-conforming. It seems you are right, because the standard gives an unusual definition of "potentially evaluated". In English an elided function call is not potentially evaluated as the code for it isn't even generated. It looks like the standard may require noexcept to be computed as if there were no elisions, but that is a code pessimization that may not be necessary (or it may be, so we can better rely on noexcept not subtly changing when the circumstances are just different enough that the compiler won't elide a copy). I wonder if saving flag_elide_constructors and setting it to false in cp_parser_unary_expression before calling cp_parser_expression, and restoring it afterwards (like many other flags get saved, set and restored) would be enough, or if the elision is sometimes done later.
[Bug c/53024] New: Power of 2 requirement on vector_size not documented
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53024 Bug #: 53024 Summary: Power of 2 requirement on vector_size not documented Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c AssignedTo: unassig...@gcc.gnu.org ReportedBy: marc.gli...@normalesup.org Hello, typedef float VEC __attribute__ ((__vector_size__ (12))); fails to compile with the message: error: number of components of the vector not a power of two This is quite clear, and I guess it makes sense. However, http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html says: "Specifying a combination that is not valid for the current architecture will cause GCC to synthesize the instructions using a narrower mode." so I was expecting gcc to handle it somehow. Could we add a sentence, anywhere in that page, that makes the requirement that the size is a power of 2 explicit? Or if the requirement can be lifted... (I don't care so much about 3 float, I can just store 4 and ignore the last, but I do care about 12 double and don't want to store 16 until we get 512bit vectors)
[Bug c++/51033] generic vector subscript and shuffle support was not added to C++
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51033 --- Comment #18 from Marc Glisse 2012-04-17 16:41:58 UTC --- (In reply to comment #17) > > And now I should actually bootstrap and run the testsuite ;-) > Good luck! It worked fine, same failures as I got the other day for another patch. > BTW, it may be handy to get an account in the GCC compile farm: > http://gcc.gnu.org/wiki/CompileFarm Thanks for the advice. I looked into it once, but don't currently need it: make -j check leaves my 3 year old desktop at least 60% idle, and the few architectures that could tempt me are not available in the farm (an x64 with AVX? a sparc with VIS3?).
[Bug c++/51033] generic vector subscript and shuffle support was not added to C++
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51033 --- Comment #16 from Marc Glisse 2012-04-17 13:57:05 UTC --- (In reply to comment #15) > Are you planning to send it to gcc-patches for approval or are you not happy > with it yet? There is the problem of moving the testcases. What svn diff prints is nonsense, so I guess I should just write the Changelog and let whoever commits do the moves? The following can move to c-c++-common: gcc.dg/vector-2.c gcc.dg/vector-subscript-2.c gcc.dg/vector-3.c gcc.dg/vector-subscript-3.c gcc.dg/vector-init-1.c gcc.dg/vector-4.c gcc.dg/vector-init-2.c gcc.dg/vector-1.c gcc.dg/vector-subscript-1.c with these minor modifications: Index: c-c++-common/vector-subscript-1.c === --- c-c++-common/vector-subscript-1.c(revision 186523) +++ c-c++-common/vector-subscript-1.c(working copy) @@ -6,7 +6,7 @@ float vf(vector float a) { - return 0[a]; /* { dg-error "subscripted value is neither array nor pointer nor vector" } */ + return 0[a]; /* { dg-error "subscripted value is neither array nor pointer nor vector|invalid types .* for array subscript" } */ } Index: c-c++-common/vector-3.c === --- c-c++-common/vector-3.c(revision 186523) +++ c-c++-common/vector-3.c(working copy) @@ -2,4 +2,7 @@ /* Check that we error out when using vector_size on the bool type. */ +#ifdef __cplusplus +#define _Bool bool +#endif __attribute__((vector_size(16) )) _Bool a; /* { dg-error "" } */ And now I should actually bootstrap and run the testsuite ;-)
[Bug c++/51033] generic vector subscript and shuffle support was not added to C++
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51033 --- Comment #14 from Marc Glisse 2012-04-17 13:06:40 UTC --- Created attachment 27178 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27178 subscript 2 (Manuel-compliant)
[Bug c++/51033] generic vector subscript and shuffle support was not added to C++
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51033 --- Comment #12 from Marc Glisse 2012-04-17 11:59:12 UTC --- (In reply to comment #11) > If it is indeed a copy, you should move the code c-common.c and share it. The > C-family FEs should share as much code as possible. I agree on the principle. If more code was shared, C++ would already support this feature ;-) On the other hand, here I am copying a small block of code in the middle of a function. Making just that paragraph common wouldn't make much sense imho. Factoring most of (cp_)build_array_ref might make sense, but requires someone with a better understanding of the FEs, because there are slight differences that may or may not be relevant.
[Bug c++/53017] New: Integer constant not constant enough for vector_size
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53017

             Bug #: 53017
           Summary: Integer constant not constant enough for vector_size
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: marc.gli...@normalesup.org

In the following code, s is apparently not an acceptable parameter for the vector_size attribute, but s+0 is.

constexpr int s=32;
typedef double VEC __attribute__ ((__vector_size__ (s
#ifndef BUG
+ 0
#endif
)));

VEC a={2.,3.,4.};

$ g++ -std=c++0x v.cc -Wall -W -c -O3
$ g++ -std=c++0x v.cc -Wall -W -c -O3 -DBUG
v.cc:6:4: warning: '__vector_size__' attribute ignored [-Wattributes]
v.cc:8:16: error: scalar object 'a' requires one element in initializer
[Bug c++/51033] generic vector subscript and shuffle support was not added to C++
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51033 --- Comment #10 from Marc Glisse 2012-04-17 10:22:07 UTC --- Created attachment 27176 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27176 subscript This patch (a simple copy of a paragraph from the C front-end) seems sufficient to add vector subscript support to the C++ front-end. At least, on the related testcases I could find in the testsuite (vector-init-2.c, vector-subscript-[123].c), g++ produces the same results as gcc (some error messages have different content, but the same meaning, and the carets point to '[' in C and ']' in C++). I don't know if any of the functions called have more idiomatic counterparts in the C++ front-end. __builtin_shuffle seems a bit harder to move for someone not familiar with the code. Note that in C++ operator[] can only be a member function, which means we don't need to worry about overloading or anything like that.
[Bug c++/50025] [DR 1288] C++0x initialization syntax doesn't work for class members of reference type
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50025 Marc Glisse changed: What|Removed |Added CC||marc.glisse at normalesup ||dot org --- Comment #7 from Marc Glisse 2012-04-14 07:07:05 UTC --- Link changed now that it has been voted into the working paper: http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1288 should it be un-suspended?
[Bug target/52607] v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607

--- Comment #29 from Marc Glisse 2012-04-11 20:35:00 UTC ---
Created attachment 27136
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27136
V4DF generic shuffle

A patch (independent from the others) implementing what is explained in the last 2 comments. It is simple and works really well, all V4DF shuffles (even with 2 vectors) take only 3 insn (and often just 2). It only requires AVX, but also improves a lot on the current AVX2 code which casts to vectors of integers and uses up to 9 insn (although my "default case" patch also goes down to 3 insn on AVX2).

The drawback is that it is limited to V4DF. vshufps is a different enough beast from vshufpd that it would require a different code, which wouldn't even apply that often. For V8SF, my "default case" patch seems more interesting. Integer vectors have different instructions again...

By the way, I tested all V4DF permutations (there are only 2^12 of them) in the simulator. I also have a file (400K) with the code for each permutation, that looks like the following:

0,0,0,0
        vpermilpd       $0, %ymm0, %ymm0
        vperm2f128      $0, %ymm0, %ymm0, %ymm0
[...]
1,7,6,3
        vperm2f128      $48, %ymm1, %ymm0, %ymm2
        vperm2f128      $19, %ymm1, %ymm0, %ymm0
        vshufpd         $11, %ymm0, %ymm2, %ymm0
1,7,6,4
        vperm2f128      $48, %ymm1, %ymm0, %ymm0
        vperm2f128      $33, %ymm1, %ymm1, %ymm1
        vshufpd         $3, %ymm1, %ymm0, %ymm0
[...]

If anyone wants to take a look, tell me and I'll attach it.
[Bug target/52607] v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607 --- Comment #28 from Marc Glisse 2012-04-11 16:48:47 UTC --- A difficulty I hadn't foreseen is that the code that canonicalizes permutations (and in particular checks if one of the operands is unused) is in ix86_expand_vec_perm_const. So if I ask expand_vec_perm_1 to generate the 2-operand 0,1,2,3 permutation, it will happily generate vperm2f128 with immediate 16 without noticing that it is the identity on the first operand. I should probably move that code into its own function so I can call it before expand_vec_perm_1.
[Bug libstdc++/52931] New: std::hash shouldn't be defined for unknown types
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52931 Bug #: 52931 Summary: std::hash shouldn't be defined for unknown types Classification: Unclassified Product: gcc Version: 4.8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: libstdc++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: marc.gli...@normalesup.org As explained by Daniel Krügler in c++std-lib-32420 and nearby messages, the default definition of std::hash is non-standard, it is supposed to be undefined so we can test with sfinae whether hash was specialized for some type.
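For illustration, the kind of SFINAE test this report has in mind might look like the sketch below. It assumes a standard library in which std::hash<T> for an unsupported T is either undefined or not default constructible, as C++17 requires. has_std_hash, make_void, Hashable and Unhashable are names invented for this example.

```cpp
#include <cstddef>
#include <functional>
#include <type_traits>
#include <utility>

// Local stand-in for std::void_t, so the sketch does not need C++17 syntax.
template <typename... Ts>
struct make_void { using type = void; };

// Primary template: assume no usable hash.
template <typename T, typename = void>
struct has_std_hash : std::false_type {};

// Chosen only when std::hash<T> can be constructed and called on a T.
template <typename T>
struct has_std_hash<
    T, typename make_void<decltype(std::hash<T>()(std::declval<const T&>()))>::type>
    : std::true_type {};

struct Unhashable {};       // no std::hash specialization provided
struct Hashable { int id; };

namespace std {
template <>
struct hash<Hashable> {
  std::size_t operator()(const Hashable& h) const noexcept {
    return std::hash<int>()(h.id);
  }
};
}  // namespace std

static_assert(has_std_hash<int>::value, "built-in types are hashable");
static_assert(has_std_hash<Hashable>::value, "user specialization is detected");
static_assert(!has_std_hash<Unhashable>::value, "unknown types are rejected");
```

With a default definition of std::hash for every type, the last assertion could not hold, which is the problem described in the report.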
[Bug target/52607] v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607

--- Comment #27 from Marc Glisse 2012-04-09 16:50:47 UTC ---
Notes to self (or other):
- Intel's SDE makes it possible to test without appropriate hardware;
- for V4DF shuffles, there seems to be a very simple generic solution that performs two vperm2f128 and then one vshufpd.

permutation (a,b,c,d), input (x,y):
t1 = vperm2f128(x,y,(a/2)+16*(c/2));
t2 = vperm2f128(x,y,(b/2)+16*(d/2));
return vshufpd(t1,t2,(a%2)+2*(b%2)+4*(c%2)+8*(d%2));

(when t1 or t2 is equal to x or y, it generates only 2 insn in cases that the current code doesn't detect, like {3,1,2,2})
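The recipe above can be checked without AVX hardware using a scalar model. The sketch below is only a model written for this note, not the GCC expander code. V4 holds four element tags instead of doubles, and vperm2f128 and vshufpd here are plain functions that imitate the lane selection of the instructions, ignoring the zeroing bits of the vperm2f128 immediate. It needs C++14 for the constexpr loops.

```cpp
// Four "double" slots, represented by integer tags so results are easy to compare.
struct V4 { int e[4]; };

// Model of vperm2f128: imm bits 1:0 pick the 128-bit lane for the low half
// of the result, bits 5:4 for the high half. Lanes 0,1 come from x; 2,3 from y.
constexpr V4 vperm2f128(V4 x, V4 y, int imm) {
  V4 r{};
  for (int half = 0; half < 2; ++half) {
    const int sel = (imm >> (4 * half)) & 3;
    const V4& src = (sel & 2) ? y : x;
    const int lane = sel & 1;
    r.e[2 * half] = src.e[2 * lane];
    r.e[2 * half + 1] = src.e[2 * lane + 1];
  }
  return r;
}

// Model of the 256-bit vshufpd: within each 128-bit lane, the even result
// slot comes from a and the odd one from b, each picked by one imm bit.
constexpr V4 vshufpd(V4 a, V4 b, int imm) {
  V4 r{};
  r.e[0] = a.e[(imm >> 0) & 1];
  r.e[1] = b.e[(imm >> 1) & 1];
  r.e[2] = a.e[2 + ((imm >> 2) & 1)];
  r.e[3] = b.e[2 + ((imm >> 3) & 1)];
  return r;
}

// The three-instruction recipe from the comment, for permutation (a,b,c,d).
constexpr V4 shuffle_v4df(V4 x, V4 y, int a, int b, int c, int d) {
  const V4 t1 = vperm2f128(x, y, (a / 2) + 16 * (c / 2));
  const V4 t2 = vperm2f128(x, y, (b / 2) + 16 * (d / 2));
  return vshufpd(t1, t2, (a % 2) + 2 * (b % 2) + 4 * (c % 2) + 8 * (d % 2));
}

// Exhaustive check over all 8^4 = 4096 two-operand permutations.
constexpr bool all_v4df_shuffles_ok() {
  const V4 x{{0, 1, 2, 3}};
  const V4 y{{4, 5, 6, 7}};
  for (int p = 0; p < 4096; ++p) {
    const int idx[4] = {p & 7, (p >> 3) & 7, (p >> 6) & 7, (p >> 9) & 7};
    const V4 r = shuffle_v4df(x, y, idx[0], idx[1], idx[2], idx[3]);
    for (int i = 0; i < 4; ++i)
      if (r.e[i] != idx[i]) return false;
  }
  return true;
}

static_assert(all_v4df_shuffles_ok(), "recipe reproduces every V4DF permutation");
```

In this model t1 carries the lanes that contain elements a and c, t2 the lanes that contain b and d, and vshufpd then picks the right element inside each lane.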
[Bug c++/52901] invalid rvalue reference
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52901

Marc Glisse changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marc.glisse at normalesup dot org

--- Comment #3 from Marc Glisse 2012-04-08 15:32:45 UTC ---
(In reply to comment #2)
> > X&& f() {
> >   X x;
> >   return std::move(x);
> > }
>
> This function is unsafe, it returns a reference to a local variable. You
> probably meant it to return X not X&&
>
> It is effectively the same as:
>
> X& f() {
>    X x;
>    return x;
> }
>
> (except G++ warns about that, because it's simpler)

Maybe this could be taken as a RFE for a warning with std::move? Many people learning C++11 are bound to try similar things. g++ warns for

  return X();
  return static_cast<X&&>(x);

but not

  return std::move(x);

I expect the case of std::move to be important enough that if doing a generic warning is too hard, special-casing std::move could be worth the trouble (assuming it is easier).
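For readers who land on this thread, a minimal sketch of the safe spelling follows. X and make_x are placeholder names, and the function is constexpr (C++14) only so the result can be checked at compile time. The dangling variant is shown only in a comment.

```cpp
#include <type_traits>

struct X { int value; };

// Unsafe variant from the thread, kept as a comment: the returned
// reference refers to x, which no longer exists once f returns.
//
//   X&& f() { X x{42}; return std::move(x); }

// Safe variant: return by value. The local is moved (or the copy is
// elided) automatically, with no std::move needed.
constexpr X make_x() {
  X x{42};
  return x;
}

static_assert(std::is_same<decltype(make_x()), X>::value,
              "returns an object, not a reference");
static_assert(make_x().value == 42, "the value survives the return");
```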
[Bug c++/49152] Unhelpful diagnostic for iterator dereference
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49152 Marc Glisse changed: What|Removed |Added CC||marc.glisse at normalesup ||dot org --- Comment #29 from Marc Glisse 2012-04-01 20:28:14 UTC --- (In reply to comment #24) > Personally, I don't believe Gaby is open to other solutions outside the > full-fledged "caret diagnostics" context, He didn't seem opposed to _adding_ the type information (without removing the current information).
[Bug c++/52654] [C++11] Warn on overflow in user-defined literals
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52654 --- Comment #7 from Marc Glisse 2012-03-31 17:18:37 UTC --- (In reply to comment #6) > Also, what about this: > > -3_w; What about it? IIUC, it is just -(3_w), I don't think it requires a particular treatment.
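A quick way to see that the minus sign is not part of the literal, as a sketch with an arbitrary _w operator that multiplies by ten (this body is made up for the example, unlike the one in the report which returns 0):

```cpp
// The literal operator only ever sees the unsigned token 3; the unary
// minus is applied to the value it returns.
constexpr int operator""_w(unsigned long long v) {
  return static_cast<int>(v) * 10;
}

static_assert(-3_w == -(3_w), "unary minus applies to the operator's result");
static_assert(-3_w == -30, "operator sees 3, result is then negated");
```

So an overflow warning would only ever need to look at the digits of the literal itself.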
[Bug target/52607] v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607 Marc Glisse changed: What|Removed |Added Attachment #26979|0 |1 is obsolete|| --- Comment #26 from Marc Glisse 2012-03-31 14:02:54 UTC --- Created attachment 27052 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27052 default case Updated with your comments, still can't properly test.
[Bug target/52607] v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607 --- Comment #25 from Marc Glisse 2012-03-31 09:37:51 UTC --- The test for AVX2 in expand_vec_perm_interleave2 might be too strict. For the V4DF shuffle 4,0,2,6, removing that check lets the compiler generate a nice vunpcklpd+vpermilpd (as opposed to 3 insn with my patch and 5+ without). The expansion of dfinal is already protected (so the function returns false for 4,2,0,6), I haven't checked whether something else (dremap?) needs protecting, but it doesn't look like it.
[Bug target/52607] v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607

--- Comment #24 from Marc Glisse 2012-03-29 14:19:11 UTC ---
(In reply to comment #23)
> (In reply to comment #18)
>
> +  if (!d->testing_p)
> +    dsecond.target = gen_reg_rtx (dsecond.vmode);
> +  dfirst.op1 = dsecond.target;
>
> This bit has a problem with testing_p in that we'll have op0==op1
> while testing and not when expanding.  Which means that testing_p
> will be checking something else.

Unless d->target==d->op0 (is that the case? I was kind of assuming it wasn't), it looks ok, but I agree that it should be improved. From other code, it looks like using gen_reg_rtx in testing is fine and avoiding it is just an optimization. On the other hand, if I remember correctly, the function could just return true early when testing (like the other function does) and assert during expansion, since it is not supposed to fail (except for the initial mode/target check), that would document the intent better.

> I've been meaning to convert i386 from op0==op1 to one_operand_p,
> like I used in targets I converted later, like ia64.  I'll see about
> making this change this afternoon, and then you can update your
> patch to match.

ok (no promise timewise).

> +  ok = expand_vec_perm_1 (&dsecond);
> +  ok &= ix86_expand_vec_perm_const_1 (&dfirst);
> +
> +  if (!ok)
> +    return false;
> +
> +  return true;
>
> Better with a short-circuit to avoid extra work:
>
>   return (expand_vec_perm_1 (&dsecond)
>           && ix86_expand_vec_perm_const_1 (&dfirst));

Indeed!

Thanks for the comments.
[Bug target/52607] v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607 --- Comment #22 from Marc Glisse 2012-03-27 20:57:16 UTC --- (In reply to comment #20) > Lastly for each routine it is desirable to think whether it might be useful > for > other vector modes (likely 32-byte only) for TARGET_AVX2. I am not very familiar with the integer versions, so I tried: #include __v32qi f(__v32qi x){ __v32qi m={0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}; return __builtin_shuffle(x,m); } $ gcc r.c -S -O1 -mavx && cat r.s r.c: In function 'f': r.c:2:9: error: invalid position or size operand to BIT_FIELD_REF BIT_FIELD_REF r.c:2:9: note: in statement D.5992_24 = BIT_FIELD_REF ; r.c:2:9: error: invalid position or size operand to BIT_FIELD_REF BIT_FIELD_REF r.c:2:9: note: in statement D.5993_25 = BIT_FIELD_REF ; [...] r.c:2:9: internal compiler error: verify_gimple failed (with -mavx2 it works fine)
[Bug target/52607] v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607 --- Comment #21 from Marc Glisse 2012-03-27 18:21:39 UTC --- (In reply to comment #20) > I don't like much the calls to ix86_expand_vec_perm_const_1, if you are > looking > for exactly two insn permutations, Actually, it isn't just 2 insn. The call in expand_vec_perm_vperm2f128_merge can take 3, and the calls in expand_vec_perm_perm_blend(...,true) up to 4 (this is how I get a maximum of 9 insn, 1+2*4). But some more splits of ix86_expand_vec_perm_const_1 to avoid recursive calls should be doable, if you don't like the recursion. > then really the two insn permutation > functions should be groupped together into expand_vec_perm_2 and you should > call that instead, or if it is 1 or 2, then expand_vec_perm_1 || > expand_vec_perm_2. Yes, this grouping by size makes sense, whether it ends up being used or not. Although there are expanders in the "3" category that occasionally get lucky and generate only 2 :-) > expand_vec_perm_vperm2f128_merge has probably swapped the meaning of dfirst > and > dsecond permutations when it first performs the dsecond permutation. If you are just talking of the naming of the variables, yes, I completely agree they should be swapped (or given more explicit names, like swap_lanes and dintra).
[Bug target/52607] v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607 Marc Glisse changed: What|Removed |Added Attachment #26938|0 |1 is obsolete|| --- Comment #18 from Marc Glisse 2012-03-25 13:52:09 UTC ---
Created attachment 26979 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26979 default case

An updated version of this simple, generic-case shuffle (do note that I didn't run the generated code, just checked that it compiled and the instructions generated looked roughly ok).

With the patch, we have (concerning v4df and v8sf):
- no single-vector shuffle takes more than 4 insn,
- no 2-vector shuffle takes more than 9 insn (or 3 (+ 2 movs for constants...) with AVX2).

I think the current code already guarantees that anything that can be done in a single instruction is.

Some possible goals (making everything optimal may be a bit hard) would be:
- everything that can be done in 2 insn is,
- no single-vector v4df takes more than 3 insn,
- one or two extra optimizations, if they are generic enough.

I do wonder occasionally about allowing wild indexes (jokers, places where you can put anything) in shuffles, whether it is exposed to users or just an internal tool.
[Bug c++/52521] [C++11] user defined literals and order of declaration
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52521 --- Comment #12 from Marc Glisse 2012-03-22 09:42:43 UTC --- (In reply to comment #11) > GCC 4.7.0 is being released, adjusting target milestone. I think it is already fixed, actually. (not closing with this message to leave someone a chance to contradict me)
[Bug c++/52654] New: [C++11] Warn on overflow in user-defined literals
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52654 Bug #: 52654 Summary: [C++11] Warn on overflow in user-defined literals Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: marc.gli...@normalesup.org

Hello,

should there be a warning for this kind of overflow? (-Wall -Wextra is currently silent)

    int operator"" _w(unsigned long long){return 0;}
    int main(){
      return 12345678901234567890123456789012345678901234567890_w;
    }
[Bug target/52607] v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607 Marc Glisse changed: What|Removed |Added Attachment #26912|0 |1 is obsolete|| --- Comment #17 from Marc Glisse 2012-03-20 21:50:40 UTC --- Created attachment 26938 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26938 intra-lane shuffle in 3 insn This (mostly untested) patch is a reformulation of the generic v8sf single vector shuffle in 4 insn as a generic intra-lane 2 vector shuffle in at most 3 insn. Reformulating __builtin_shuffle(x,m) as __builtin_shuffle(x,vperm2f128(x,1),mm) would then guarantee a maximum size of 4. Note that the strategy of doing a 2-vector shuffle by shuffling (not restricted to one vpermilp*) each vector and blending the results gives a maximum of 9 insn, whereas the current code often generates twice that number. By the way, I have trouble understanding this comment: /* For d->op0 == d->op1 the only useful vperm2f128 permutation is 0x10. */ Is it really 0x10, or is there a stray 0 at the end and it is really just 1?
[Bug target/52607] v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607 --- Comment #16 from Marc Glisse 2012-03-20 19:05:22 UTC --- (In reply to comment #15) > If I am not mistaken, the V8SF shuffle 22022246 is doable by a vperm2f128 that > takes 01234567 to 01230123, followed by a vshufps (mask 138 maybe). Was your > patch supposed to handle it? Uh, no it isn't supposed to handle it (there would be redundancy and it wouldn't know where to take elements from), sorry, forget that comment.
[Bug target/52607] v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607 --- Comment #15 from Marc Glisse 2012-03-20 19:00:32 UTC --- If I am not mistaken, the V8SF shuffle 22022246 is doable by a vperm2f128 that takes 01234567 to 01230123, followed by a vshufps (mask 138 maybe). Was your patch supposed to handle it?
[Bug target/52607] v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607 --- Comment #9 from Marc Glisse 2012-03-19 18:29:50 UTC ---
(In reply to comment #8)
> I'm not very keen on having too many different routines, the more generic they
> are, the better.

Agreed, that was one of my concerns from the first message in this bug, but to experiment it was easier to have separate functions.

> So IMHO e.g. the two insn sequence, vperm2[if]128 + some one
> insn shuffle could look like:
>
> /* A subroutine of ix86_expand_vec_perm_builtin_1. Try to expand
>    a vector permutation using two instructions, vperm2f128 resp.
>    vperm2i128 followed by any single in-lane permutation. */

I haven't yet looked at it closely enough to understand what it does (those functions are surprisingly confusing when you don't write them yourself), but that looks interesting.

My first idea in order to make things more generic was to tentatively turn __builtin_shuffle(x,m) into __builtin_shuffle(x,vperm2f128(x,x,33),mm) where mm avoids any cross-lane. The 2-vector no-cross-lane shuffle should take at most 3 instructions in v4df or v8sf (I haven't checked if it works now) and that's where most of the work would happen (instead of having many routines for single-vector shuffles that almost all start with vperm2f128). Then you would probably want to check how many instructions it used, since it could be more or less than one of the few instruction sequences that don't start with vperm2f128.

From a quick look, it looks like you may be doing something even more generic...

> This will handle e.g. vperm2f128 + {vshufpd,vblendpd,vunpcklpd,vunpckhpd} etc.

Cool!
[Bug target/52607] v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607 --- Comment #6 from Marc Glisse 2012-03-18 18:58:44 UTC --- Created attachment 26912 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26912 generic shuffle of a single v8sf An additional function (I should find better names...) to handle generic shuffles of a single v8sf in 4 instructions. Only tested on {6,2,3,3,5,2,3,7}. By the way, expand_vec_perm_vperm2f128_vblend2 does vpermilpd+vperm2f128 in this order, but it would be better to do it in the reverse order (adapting the mask), because it is common to need several __builtin_shuffle(x,*) and the vperm2f128 can then be shared. I also noticed while experimenting that -mavx2 generates vpermd instead of vpermps (the vpermq->vpermpd change didn't affect that).
[Bug target/52607] v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607 Marc Glisse changed: What|Removed |Added Attachment #26909|0 |1 is obsolete|| --- Comment #5 from Marc Glisse 2012-03-18 12:53:13 UTC --- Created attachment 26911 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26911 patch With this one, all V4DF shuffles on one vector are done in at most 3 instructions (and are correct). Doing V8SF at the same time was getting confusing so I dropped it for the last 2 functions, which end up looking almost like: "if the pattern is 0112 do this, if it is 0130 do that, etc". I didn't check if all the functions are still used by at least one pattern... Note: my access to an avx machine is not sufficient to submit a patch, so feel free to take pieces of this and modify/test/submit them (I have a copyright assignment).
[Bug target/52607] v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607 Marc Glisse changed: What|Removed |Added Attachment #26908|0 |1 is obsolete|| --- Comment #4 from Marc Glisse 2012-03-17 22:03:08 UTC --- Created attachment 26909 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26909 patch Here is a try. Again, I just looked at the generated code on a couple examples, which isn't very reliable... expand_vec_perm_vperm2f128_vblend0 is already covered by expand_vec_perm_vperm2f128_vblend1, but it is confusing to have a 3-instruction function generate only 2. I didn't do generic permutations with 4 instructions. There is probably more that can be done with vshufp[sd].
[Bug target/52607] v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607 --- Comment #3 from Marc Glisse 2012-03-17 19:55:18 UTC --- Uh. I feel silly, but it looks like vshufpd could replace vpermilpd+vblendpd in many cases, including the original 1230 from PR52568...
[Bug target/52607] v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607 --- Comment #2 from Marc Glisse 2012-03-17 19:20:36 UTC --- Created attachment 26908 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26908 copy-paste patch for 0213 and 1302 This seems to handle 0213 and 1302 (I only vaguely looked at the generated code, can't do proper testing). It is really a copy-paste of the function that handles 1230. I didn't try to understand everything, so there may be things that made sense in the original function but don't anymore here. It should be possible to merge the 2 new functions, but merging them with the previous one looks harder.
[Bug target/52607] v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607 --- Comment #1 from Marc Glisse 2012-03-17 01:05:57 UTC --- Note that {1,2,0,3} seems harder, I need one extra vpermilpd. Actually, it looks like every v4df shuffle can be realized as a vblendpd of a vpermilpd and a vpermilpd+vperm2f128. For v8sf, it also seems true but may require the version of vpermilps that takes its controls from a register/memory.
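The v4df part of that claim can be checked by brute force without touching real hardware. The following is only a sketch under a simplified model: plain integers stand in for the four doubles, and the names V4, permil, swap_lanes, blend, reachable and all_reachable are hypothetical helpers written for this illustration, not anything from the GCC sources. It builds, for each of the 256 index patterns, one vpermilpd for the elements that stay in their lane, one vpermilpd followed by a lane-swapping vperm2f128 for the others, and a vblendpd to merge the two.

```cpp
// Simulation only: plain integers stand in for the four doubles of a v4df.
struct V4 { int e[4]; };

// Model of vpermilpd: result element i picks the low or high element of its
// own 128-bit lane (elements 0-1 or 2-3) according to bit i of imm.
constexpr V4 permil(V4 x, int imm) {
    V4 r{};
    for (int i = 0; i < 4; ++i)
        r.e[i] = x.e[(i & 2) + ((imm >> i) & 1)];
    return r;
}

// Model of vperm2f128 with both sources equal and the two lanes exchanged.
constexpr V4 swap_lanes(V4 x) {
    return V4{{x.e[2], x.e[3], x.e[0], x.e[1]}};
}

// Model of vblendpd: bit i of imm set takes element i from b, else from a.
constexpr V4 blend(V4 a, V4 b, int imm) {
    V4 r{};
    for (int i = 0; i < 4; ++i)
        r.e[i] = ((imm >> i) & 1) ? b.e[i] : a.e[i];
    return r;
}

// Builds the sequence for the shuffle {m0,m1,m2,m3} and reports whether it
// reproduces that shuffle when applied to the identity vector 0123.
constexpr bool reachable(int m0, int m1, int m2, int m3) {
    const int m[4] = {m0, m1, m2, m3};
    const V4 x{{0, 1, 2, 3}};
    int sel_same = 0, sel_other = 0, mask = 0;
    for (int i = 0; i < 4; ++i) {
        const int bit = m[i] & 1;          // low or high element within a lane
        sel_same |= bit << i;              // used when the source lane matches
        sel_other |= bit << (i ^ 2);       // pre-swap position in the other lane
        if ((m[i] & 2) != (i & 2))
            mask |= 1 << i;                // element must come from the other lane
    }
    const V4 same = permil(x, sel_same);
    const V4 other = swap_lanes(permil(x, sel_other));
    const V4 r = blend(same, other, mask);
    for (int i = 0; i < 4; ++i)
        if (r.e[i] != m[i]) return false;
    return true;
}

// Checks all 4^4 = 256 index patterns.
constexpr bool all_reachable() {
    for (int code = 0; code < 256; ++code)
        if (!reachable(code & 3, (code >> 2) & 3, (code >> 4) & 3, code >> 6))
            return false;
    return true;
}

static_assert(all_reachable(), "every v4df index pattern is covered in this model");
```

The sketch needs C++14 for the loops in constexpr functions. It says nothing about v8sf, where the register-controlled form of vpermilps mentioned above would have to be modeled separately.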
[Bug target/52607] New: v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2}
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52607 Bug #: 52607 Summary: v4df __builtin_shuffle with {0,2,1,3} or {1,3,0,2} Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: target AssignedTo: unassig...@gcc.gnu.org ReportedBy: marc.gli...@normalesup.org

Hello,

this is really just a follow-up to PR52568. The permutations {0,2,1,3} and {1,3,0,2} can be realized with a very similar technique. Starting from 0123:

    vpermilpd+vperm2f128->3210
    vblendpd(0123,3210)->0213

or:

    vpermilpd->1032
    vperm2f128->2301
    vblendpd(1032,2301)->1302

I am not sure if there is a nice way to generalize this or if the function expand_vec_perm_vperm2f128_vblend should be cloned a few times and slightly modified.

(these permutations are less important to me than 1230 was)
[Bug c++/52521] [C++11] user defined literals and order of declaration
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52521 --- Comment #7 from Marc Glisse 2012-03-16 19:39:14 UTC ---
(In reply to comment #6)
> constexpr long double operator"" _degrees(long double d)
> {
>    return d * 0.0175;
> }
>
> int main()
> {
>    long double pi = 180_degrees;
>    std::cout << pi << std::endl;
> }

There is no dot in 180, so it is looking for an unsigned long long overload (which you could provide). 180._degrees works.
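As an illustration of the "which you could provide" remark, here is a minimal sketch with both overloads. The unsigned long long overload is an addition made for this example; the conversion factor is the one from the quoted code.

```cpp
// Floating-point literals such as 180._degrees select this overload.
constexpr long double operator""_degrees(long double d) {
    return d * 0.0175;
}

// Integer literals such as 180_degrees have no dot, so lookup wants an
// unsigned long long parameter; this overload forwards to the same formula.
constexpr long double operator""_degrees(unsigned long long d) {
    return static_cast<long double>(d) * 0.0175;
}

static_assert(180_degrees == 180._degrees, "both spellings agree");
```

With only the long double overload declared, 180._degrees compiles and 180_degrees is rejected, which is the behavior described above.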
[Bug target/52572] suboptimal assignment to avx element
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52572 --- Comment #3 from Marc Glisse 2012-03-13 17:57:58 UTC ---
Or for this variant:

    __m256d f(__m256d *y){
      __m256d x=*y;
      x[0]=0; // or x[3]
      return x;
    }

it looks like vmaskmovpd could replace:

    vmovapd (%rdi), %ymm0
    vmovapd %xmm0, %xmm1
    vmovlpd .LC0(%rip), %xmm1, %xmm1
    vinsertf128 $0x0, %xmm1, %ymm0, %ymm0

(I tried a version with __builtin_shuffle but it wouldn't generate vmaskmovpd either)

(sorry for the naive suggestions, there are too many possibilities to optimize them all...)
[Bug target/52572] suboptimal assignment to avx element
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52572 --- Comment #2 from Marc Glisse 2012-03-13 08:16:58 UTC ---
(In reply to comment #1)
> Have you actually tried that?

Ah, no, sorry, I only have occasional access to such a machine to benchmark the code. From a -Os perspective it is still shorter (but indeed that matters less to me than -O3 performance).

> Mixing VEX encoded insns with legacy encoded SSE* insns is very costly, for
> good performance there needs to be a vzeroupper in between (but then you lose
> the upper bits). See e.g. 2.8 in the AVX Programming Reference.

Thanks, I'd missed that. The vblendpd solution should still apply (from the initial 'v' it sounds safe), no?
[Bug target/52572] New: suboptimal assignment to avx element
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52572 Bug #: 52572 Summary: suboptimal assignment to avx element Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassig...@gcc.gnu.org ReportedBy: marc.gli...@normalesup.org

For the following program:

    #include
    __m256d f(__m256d x){
      x[0]=0;
      return x;
    }

gcc -O3 generates:

    vmovlpd .LC0(%rip), %xmm0, %xmm1
    vinsertf128 $0x0, %xmm1, %ymm0, %ymm0

or with -Os:

    vxorps %xmm2, %xmm2, %xmm2
    vmovsd %xmm2, %xmm0, %xmm1
    vinsertf128 $0x0, %xmm1, %ymm0, %ymm0

If I understand correctly, it first constructs {0,x[1],0,0} and then merges it with the upper part of x. However, using the legacy movlpd instruction would avoid zeroing the upper 128 bits and thus the vinsertf128 wouldn't be needed. Is there a policy not to generate the non-VEX instructions anymore, or is this a missed optimization?

Setting x[1] is similar. For x[2] or x[3], we get extract+mov+insert, but it might be better to do something with vblendpd.
[Bug target/52568] New: suboptimal __builtin_shuffle on cycles with AVX
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52568 Bug #: 52568 Summary: suboptimal __builtin_shuffle on cycles with AVX Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassig...@gcc.gnu.org ReportedBy: marc.gli...@normalesup.org

Hello,

I compiled the following with -O3 (or -Os) and -mavx

    #include
    __m256d left(__m256d x){
      __m256i mask={1,2,3,0};
      return __builtin_shuffle(x,mask);
    }

(by the way, for some reason, gcc insists that 'mask' is set but not used with -Wall)

and got:

    vunpckhpd %xmm0, %xmm0, %xmm3
    vmovapd %xmm0, %xmm1
    vextractf128 $0x1, %ymm0, %xmm0
    vmovaps %xmm0, %xmm2
    vunpckhpd %xmm0, %xmm0, %xmm0
    vunpcklpd %xmm1, %xmm0, %xmm1
    vunpcklpd %xmm2, %xmm3, %xmm0
    vinsertf128 $0x1, %xmm1, %ymm0, %ymm0
    ret

That doesn't really match the code I currently use to do this:

    #ifdef __AVX2__
      __m256d d=_mm256_permute4x64_pd(x,1+2*4+3*16+0*64);
    #else
      __m256d b=_mm256_shuffle_pd(x,x,5);
      __m256d c=_mm256_permute2f128_pd(b,b,1);
      __m256d d=_mm256_blend_pd(b,c,10);
    #endif

Could something recognizing this permutation pattern (and the right cyclic shift) be added? I know there are too many shuffles to hand-code them all, but cycles seem like they shouldn't be too uncommon.

With -mavx2, I get a single vpermq, which is close enough to the expected vpermpd.
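To see why the immediates 5, 1 and 10 (and 1+2*4+3*16+0*64 for AVX2) give the rotation 1230, the sequence can be traced on element indices. This is a sketch under a simplified model, not a use of the real intrinsics: V4 and the functions below are hypothetical stand-ins that mimic only the element selection of _mm256_shuffle_pd, _mm256_permute2f128_pd, _mm256_blend_pd and _mm256_permute4x64_pd, and the zeroing bits of vperm2f128 are ignored.

```cpp
// Simulation only: integers stand in for the four doubles of a __m256d.
struct V4 { int e[4]; };

// Even result elements come from a, odd ones from b; each is chosen inside
// its own 128-bit lane by one bit of imm.
constexpr V4 shuffle_pd(V4 a, V4 b, int imm) {
    return V4{{a.e[imm & 1], b.e[(imm >> 1) & 1],
               a.e[2 + ((imm >> 2) & 1)], b.e[2 + ((imm >> 3) & 1)]}};
}

// Element k of the lane named by sel: 0 = a low, 1 = a high, 2 = b low, 3 = b high.
constexpr int pick(V4 a, V4 b, int sel, int k) {
    return (sel & 2) ? b.e[2 * (sel & 1) + k] : a.e[2 * (sel & 1) + k];
}

// Bits 1:0 of imm choose the low result lane, bits 5:4 the high result lane.
constexpr V4 permute2f128(V4 a, V4 b, int imm) {
    return V4{{pick(a, b, imm & 3, 0), pick(a, b, imm & 3, 1),
               pick(a, b, (imm >> 4) & 3, 0), pick(a, b, (imm >> 4) & 3, 1)}};
}

// Bit i of imm set takes element i from b, otherwise from a.
constexpr V4 blend_pd(V4 a, V4 b, int imm) {
    return V4{{(imm & 1) ? b.e[0] : a.e[0], (imm & 2) ? b.e[1] : a.e[1],
               (imm & 4) ? b.e[2] : a.e[2], (imm & 8) ? b.e[3] : a.e[3]}};
}

// Two bits of imm per result element select any source element.
constexpr V4 permute4x64(V4 a, int imm) {
    return V4{{a.e[imm & 3], a.e[(imm >> 2) & 3],
               a.e[(imm >> 4) & 3], a.e[(imm >> 6) & 3]}};
}

// The three-instruction AVX path: 0123 -> 1032, then 3210, then blend to 1230.
constexpr V4 left_avx(V4 x) {
    return blend_pd(shuffle_pd(x, x, 5),
                    permute2f128(shuffle_pd(x, x, 5), shuffle_pd(x, x, 5), 1),
                    10);
}

// The single-instruction AVX2 path.
constexpr V4 left_avx2(V4 x) {
    return permute4x64(x, 1 + 2 * 4 + 3 * 16 + 0 * 64);
}

constexpr bool same(V4 a, V4 b) {
    return a.e[0] == b.e[0] && a.e[1] == b.e[1] &&
           a.e[2] == b.e[2] && a.e[3] == b.e[3];
}

static_assert(same(left_avx(V4{{0, 1, 2, 3}}), V4{{1, 2, 3, 0}}), "AVX path");
static_assert(same(left_avx2(V4{{0, 1, 2, 3}}), V4{{1, 2, 3, 0}}), "AVX2 path");
```

Both paths agree on the index pattern, which is all the model can show; instruction counts and latencies still have to be judged from the generated assembly.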
[Bug c++/52567] constant expression not recognized as being constant
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52567 --- Comment #1 from Marc Glisse 2012-03-12 18:10:16 UTC --- 1<<31 overflows and is thus not a constant. Try maybe 1LL<<31 ?
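A small sketch of the suggested workaround, assuming the usual 32-bit int. The names bit31 and ubit31 are made up for the example. Widening the left operand before shifting keeps the result representable, so the expression can be used wherever a constant is required.

```cpp
// The shift is done in long long, which has at least 64 bits, so 2^31 fits.
constexpr long long bit31 = 1LL << 31;

// An unsigned operand is another option when only the bit pattern matters
// (unsigned long long is used here to avoid depending on the width of int).
constexpr unsigned long long ubit31 = 1ULL << 31;

static_assert(bit31 == 2147483648LL, "1LL << 31 is a constant expression");
static_assert(ubit31 == 2147483648ULL, "1ULL << 31 is a constant expression");
```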
[Bug c++/52521] New: [C++11] user defined literals and order of declaration
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52521 Bug #: 52521 Summary: [C++11] user defined literals and order of declaration Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassig...@gcc.gnu.org ReportedBy: marc.gli...@normalesup.org

    #include
    int operator "" _w(const char*);
    int operator "" _w(const char*, std::size_t);
    int main() {
      123_w;
    }

    a.cc: In function 'int main()':
    a.cc:5:3: error: unable to find numeric literal operator 'operator"" _w'

The problem disappears if I switch the 2 declarations...

Btw, mangling these operators like functions called li_w taking the same arguments is strange, I could have such a function in my code.
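For reference, a sketch of what the two declarations are expected to do once lookup works regardless of their order. The function bodies are invented for the example (the report only declares the operators): the one-argument form is the raw literal operator that receives the digits of a numeric literal, and the two-argument form handles string literals.

```cpp
#include <cstddef>

// Raw literal operator: 123_w calls this with the string "123".
// The body is a placeholder that returns the value of the first digit.
constexpr int operator""_w(const char* digits) {
    return digits[0] - '0';
}

// String literal operator: "abcd"_w calls this with the text and its length.
// The body is a placeholder that returns the length.
constexpr int operator""_w(const char*, std::size_t len) {
    return static_cast<int>(len);
}

static_assert(123_w == 1, "numeric literal goes to the raw operator");
static_assert("abcd"_w == 4, "string literal goes to the two-argument operator");
```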
[Bug libstdc++/22200] numeric_limits::is_modulo is inconsistent with gcc
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22200 Marc Glisse changed: What|Removed |Added CC||marc.glisse at normalesup ||dot org --- Comment #40 from Marc Glisse 2012-02-29 12:32:10 UTC --- I haven't seen it mentioned in the discussion here, but in C++11, the definition of is_modulo was clarified as: "True if the type is modulo. A type is modulo if, for any operation involving +, -, or * on values of that type whose result would fall outside the range [min(),max()], the value returned differs from the true value by an integer multiple of max() - min() + 1." Do people have objections to switching numeric_limits::is_modulo to false (setting it to true when -fwrapv is used can still be discussed afterwards)?
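To make the quoted definition concrete, here is a sketch using unsigned int, a type for which the wording holds on every implementation. The helper wrap_add is hypothetical. Signed types are deliberately left out, since their value of is_modulo is exactly what is being debated here.

```cpp
#include <limits>

// Unsigned addition: a result outside [min(), max()] comes back reduced
// by a multiple of max() - min() + 1.
constexpr unsigned wrap_add(unsigned a, unsigned b) {
    return a + b;
}

// max() + 1 has true value max() - min() + 1, and the value returned is 0,
// so the two differ by exactly one multiple of max() - min() + 1.
static_assert(std::numeric_limits<unsigned>::is_modulo, "unsigned is modulo");
static_assert(wrap_add(std::numeric_limits<unsigned>::max(), 1u) == 0u,
              "max() + 1 wraps to min()");
```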
[Bug libstdc++/51785] gets not anymore declared
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51785 --- Comment #12 from Marc Glisse 2012-02-28 15:47:04 UTC --- (In reply to comment #10) > If the libstdc++ people are going to do something for 4.7, it really needs > to be done very soon. The question is: what do the glibc people want? By removing the gets prototype, they are explicitly going against the C++ standard. Seems to me that libstdc++ should respect that choice (add a test in configure to see if gets is provided, and protect "using ::gets;" with #ifdef) and not provide gets. The alternative is to disagree with the glibc developers and fixinclude stdio.h.