Re: [Bug tree-optimization/32183] [4.3 Regression] reassoc2 can more extra calculations into a loop
On 10 Oct 2007 08:58:00, steven at gcc dot gnu dot org wrote:
> --- Comment #33 from steven at gcc dot gnu dot org 2007-10-10 08:57 ---
> What happened with the suggestion to only do this in reassoc2 (see comment #27)?

Yeah, i'm not sure why we just made both reassocs more expensive when we only care what happens with the second.
Re: [Bug c++/33604] [4.3 Regression] Revision 119502 causes significantly slower results with 4.3 compared to 4.2
I'm not fixing this until someone can tell me what exactly is going wrong.

There have been *so* many changes to PTA since that revision that the majority of the code it touched doesn't even do the same thing anymore.

My guess is that this is a case where adding extra vdefs/vuses made some dumb optimizer able to see something it can't when the chains are separate like they should be.

On 1 Oct 2007 21:04:40, hjl at lucon dot org wrote:
> --- Comment #4 from hjl at lucon dot org 2007-10-01 21:04 ---
> I saw 40% performance regression at -O3 with testcase in comment #1 on Linux/x86-64.
>
> --
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33604
Re: [Bug c/32575] [4.2/4.3 regression] With -ftree-vrp miscompiles a single line of code in SQLite
On 28 Aug 2007 15:58:29, jakub at gcc dot gnu dot org wrote:
> --- Comment #6 from jakub at gcc dot gnu dot org 2007-08-28 15:58 ---
> if (a == 0) a = bar (); isn't necessary either. salias has:
>
>   # BLOCK 2 freq:1
>   # PRED: ENTRY [100.0%] (fallthru,exec)
>   # VUSE <qD.2026_12(D), SMT.25D.2079_13(D)> { qD.2026 SMT.25D.2079 }
>   D.2027_3 = foo ();
>   pD.2025_4 = (struct S *) D.2027_3;
>   if (pD.2025_4 == 0B) goto <bb 3>; else goto <bb 4>;
>   # SUCC: 3 [7.3%] (true,exec) 4 [92.7%] (false,exec)
>
>   # BLOCK 3 freq:735
>   # PRED: 2 [7.3%] (true,exec)
>   # qD.2026_15 = VDEF <qD.2026_12(D)>
>   # SMT.25D.2079_16 = VDEF <SMT.25D.2079_13(D)>
>   # SMT.26D.2080_17 = VDEF <SMT.26D.2080_14(D)> { qD.2026 SMT.25D.2079 SMT.26D.2080 }
>   __builtin_memset (qD.2026, 0, 24);
>   # SUCC: 4 [100.0%] (fallthru,exec)
>
>   # BLOCK 4 freq:1
>   # PRED: 2 [92.7%] (false,exec) 3 [100.0%] (fallthru,exec)
>   # qD.2026_11 = PHI <qD.2026_12(D)(2), qD.2026_15(3)>
>   # pD.2025_1 = PHI <pD.2025_4(2), qD.2026(3)>
>   # qD.2026_18 = VDEF <qD.2026_11> { qD.2026 }
>   pD.2025_1->s1D.2008 = aD.2021_6(D);
>   # qD.2026_19 = VDEF <qD.2026_18> { qD.2026 }
>   pD.2025_1->s2D.2009 = bD.2022_7(D);
>
> Shouldn't the VDEFs be a PHI of some SMT and qD?

For VDEF/VUSE, you will never have a PHI of anything other than multiple versions of the same SMT/virtual variable.

The above looks right to me at a glance. It is probably pruning the result using TBAA, which is why p->s isn't thought to access the SMT.
Re: [Bug tree-optimization/33159] [4.3 Regression] wrong VDEF for gcc.target/i386/cmov4.c
Yes, you are right. I wasn't thinking clearly.

> --- Comment #4 from bonzini at gnu dot org 2007-08-23 14:04 ---
> Hmmm, a store into an int * could not touch nodekind itself, only a store into an int ** could.
> Isn't SMT.8 the VDEF saying it could touch *the thing pointed to by nodekind*?
Re: [Bug c++/32900] New: [4.2/4.3 regression] compile time and memory regression
Points-to memory with these is almost nothing, so don't look at meef.
It looks like size goes up for each function and is not fully recovered by the time we start the next.

On 25 Jul 2007 22:25:22, debian-gcc at lists dot debian dot org wrote:
> [forwarded from http://bugs.debian.org/431608]
>
> c++ source files generated with sip-qt take much longer (4.2) and much more memory (4.3) to build, than building with 4.1:
>
>   4.1 20070718      0m58.881s   about 400mb
>   4.2.1 release     86m13.933s  about 400mb
>   4.3 20070720      14m51.718s  about 1.5gb
>
> built on i486-linux-gnu. 4.1 and 4.2 are built with --enable-checking=release, 4.3 with the setting from trunk.
>
>   Matthias
>
> --
>            Summary: [4.2/4.3 regression] compile time and memory regression
>            Product: gcc
>            Version: 4.2.2
>             Status: UNCONFIRMED
>           Severity: normal
>           Priority: P3
>          Component: c++
>         AssignedTo: unassigned at gcc dot gnu dot org
>         ReportedBy: debian-gcc at lists dot debian dot org
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32900
Re: [Bug tree-optimization/32746] [4.3 Regression] tree-ssa-operands int.comp error
I already submitted a patch for this (see my followup to HP that fixes valid_gimple_expression_p). As soon as i can bootstrap on darwin, i will commit it. If someone wants to do so before me, all you need to do is change is_gimple_addressable to is_gimple_id in valid_gimple_expression_p
Re: [Bug tree-optimization/32746] [4.3 Regression] tree-ssa-operands int.comp error
valid_gimple_expression_p claims

  ((struct RegisterLayout *) (char *) SimulatedRegisters)->intmask;

is valid GIMPLE, when it is not.

On 13 Jul 2007 23:37:00, hp at gcc dot gnu dot org wrote:
> --- Comment #4 from hp at gcc dot gnu dot org 2007-07-13 23:36 ---
> Also happens for cris-axis-elf and likely other 32-bit platforms.
>
> --
> hp at gcc dot gnu dot org changed:
>
>            What    |Removed                     |Added
>                  CC|                            |hp at gcc dot gnu dot org
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32746
Re: [Bug tree-optimization/32705] [4.3 regression] ICE in set_ssa_val_to, at tree-ssa-sccvn.c:1022
The only way i can see this happening is if you have a truly uninitialized variable, or there is something we have missed.

Does this function have cfun->static_chain_decl being used, and we have a copy of that here?

It is theoretically safe to call set_ssa_val_to with to == vn_top, but it's probably a bug somewhere, and i'd rather eliminate the bug cases before turning it off.

On 11 Jul 2007 20:10:10, ebotcazou at gcc dot gnu dot org wrote:
> --- Comment #5 from ebotcazou at gcc dot gnu dot org 2007-07-11 20:10 ---
> > Can someone paste the output of debug_generic_stmt (to) and debug_tree(to) at the point of failure?
>
> (gdb) p debug_tree(to)
>  <var_decl 0x557f7114 vn_top.181
>     type <void_type 0x55716804 void sizes-gimplified visited VOID
>         align 8 symtab 0 alias set 36 canonical type 0x55716804
>         pointer_to_this <pointer_type 0x55716870>>
>     used ignored VOID file ../c87b26b.adb line 4
>     align 8>
> $4 = void
> (gdb) p debug_generic_stmt(to)
> vn_top.181
>
> --
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32705
Re: [Bug tree-optimization/32328] [4.2/4.3 Regression] -fstrict-aliasing causes skipped code
On 4 Jul 2007 03:29:25, mmitchel at gcc dot gnu dot org wrote:
> --

Just as an update: I have been working with richi (I code, he tests :P) diligently on a patch for mainline, and have one that fixes the dealii regression (and thus, should fix this as well).
Re: [Bug middle-end/30075] Missed optimizations with -fwhole-program -combine
On 26 Jun 2007 03:10:26, acahalan at gmail dot com wrote:
> --- Comment #4 from acahalan at gmail dot com 2007-06-26 03:10 ---
> (In reply to comment #3)
> > Subject: Re: Missed optimizations with -fwhole-program -combine
> >
> > I would not expect this to be fixed anytime soon.
> > I have yet to find any real people who use either combine or -fwhole-program.
> > They use *way* too much memory on real programs.
> > As a result, no real people involved in optimization work on optimizers for them.
>
> I'm real, and I want to use those.

That's nice and all, but i still wouldn't expect any work on them until LTO is finished. They are useless options right now. I'd vote to remove them.
Re: [Bug tree-optimization/30052] [4.2 Regression] possible quadratic behaviour.
On 20 May 2007 04:57:45, pluto at agmk dot net wrote:
> --- Comment #25 from pluto at agmk dot net 2007-05-20 05:57 ---
> Subject: Re: [4.2 Regression] possible quadratic behaviour.
>
> --

Change line 4275 of the patched tree-ssa-structalias.c to be

  rhs.var = vi->id

instead of

  rhs.var = id

Remove the id variable declaration.

This would have only affected fortran.

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30052
Re: [Bug tree-optimization/30052] [4.2 Regression] possible quadratic behaviour.
On 19 May 2007 14:30:43, pluto at agmk dot net wrote:
> --- Comment #21 from pluto at agmk dot net 2007-05-19 15:30 ---
> with this patch gcc works much better.
>
> xf86ScanPci.i : 84MB / ~5sec.
> sipQtCorepart0.ii.bz2 : 340MB / ~440sec

There are optimizations that could be made to the 440 seconds if they are in PTA solving, but they wouldn't really help mainline much, so i'm not sure if it is worth it.
Re: [Bug tree-optimization/30052] [4.2 Regression] possible quadratic behaviour.
On 19 May 2007 17:16:35, pluto at agmk dot net wrote:
> --- Comment #23 from pluto at agmk dot net 2007-05-19 18:16 ---
> bad news, this patch ices fortran build:
>
> (...)
> ../../../libgfortran/intrinsics/selected_int_kind.f90:22: internal compiler error: in process_constraint, at tree-ssa-structalias.c:2260

Meh, send me the file. This is just a small bug somewhere in the backport.
Re: [Bug libstdc++/29286] [4.0/4.1/4.2/4.3 Regression] placement new does not change the dynamic type as it should
On 14 May 2007 08:25:27, rguenth at gcc dot gnu dot org wrote:
> --- Comment #60 from rguenth at gcc dot gnu dot org 2007-05-14 09:25 ---
> But it doesn't have a result, does it? Given that, I wonder how moving stmts across it is prevented?

Okay, so then it needs an LHS that defines a new SSA name, otherwise, we'll end up with dead ones everywhere, and they will keep other dead code alive.
Re: [Bug tree-optimization/30604] Unable to coalesce ssa_names x and y which are marked as MUST COALESCE
On 8 Mar 2007 20:12:16, amacleod at redhat dot com wrote:
> --- Comment #7 from amacleod at redhat dot com 2007-03-08 20:12 ---
> Looking at the original testcase, the complaint is that _t_8232 and _t_3 are both used in the PHI definition of _t_7. (using mainline from march 5th)
>
> ie, _t_7(ab) = PHI , _t_8232, ... , _t_3, ...

Uh, did you not put the (ab) next to the arguments, or do they really not have SSA_NAME_OCCURS_IN_ABNORMAL_PHI set on them? (They should)

> I can't really read the detailed output from FRE, but it does seem to have replaced a bunch of expressions with _t_3, so that would appear to be the culprit.

It won't value number things with SSA_NAME_OCCURS_IN_ABNORMAL_PHI set, so it should never eliminate anything with them.
Re: [Bug tree-optimization/30089] Compiling FreeFem3d uses unreasonable amount of time and memory
okay, i'll update changelog, submit and commit.

On 13 Jan 2007 23:02:13, rguenth at gcc dot gnu dot org wrote:
> --- Comment #21 from rguenth at gcc dot gnu dot org 2007-01-13 23:02 ---
> The patch fixed the freefem memory regression.
>
> --
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30089
Re: [Bug tree-optimization/30089] Compiling FreeFem3d uses unreasonable amount of time and memory
Try the attached, let me know how it goes.

On 9 Jan 2007 21:17:05, rguenth at gcc dot gnu dot org wrote:
> --- Comment #16 from rguenth at gcc dot gnu dot org 2007-01-09 21:17 ---
> Pling!
>
> --
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30089

--- gcc/tree.h (/mirror/gcc-trunk) (revision 1114)
+++ gcc/tree.h (/local/gcc-clean) (revision 1114)
@@ -2449,10 +2449,14 @@ struct tree_decl_minimal GTY(())
 struct tree_memory_tag GTY(())
 {
   struct tree_decl_minimal common;
+
+  bitmap GTY ((skip)) aliases;
+
   unsigned int is_global:1;
 };
 
 #define MTAG_GLOBAL(NODE) (TREE_MEMORY_TAG_CHECK (NODE)->mtag.is_global)
+#define MTAG_ALIASES(NODE) (TREE_MEMORY_TAG_CHECK (NODE)->mtag.aliases)
 
 struct tree_struct_field_tag GTY(())
 {
--- gcc/tree-ssa-alias.c (/mirror/gcc-trunk) (revision 1114)
+++ gcc/tree-ssa-alias.c (/local/gcc-clean) (revision 1114)
@@ -90,6 +90,7 @@ struct alias_stats_d
 
 /* Local variables.  */
 static struct alias_stats_d alias_stats;
+static bitmap_obstack alias_bitmap_obstack;
 
 /* Local functions.  */
 static void compute_flow_insensitive_aliasing (struct alias_info *);
@@ -99,7 +100,7 @@ static bool may_alias_p (tree, HOST_WIDE
 static tree create_memory_tag (tree type, bool is_type_tag);
 static tree get_smt_for (tree, struct alias_info *);
 static tree get_nmt_for (tree);
-static void add_may_alias (tree, tree, struct pointer_set_t *);
+static void add_may_alias (tree, tree);
 static struct alias_info *init_alias_info (void);
 static void delete_alias_info (struct alias_info *);
 static void compute_flow_sensitive_aliasing (struct alias_info *);
@@ -194,19 +195,21 @@ static void
 mark_aliases_call_clobbered (tree tag, VEC (tree, heap) **worklist,
                              VEC (int, heap) **worklist2)
 {
+  bitmap aliases;
+  bitmap_iterator bi;
   unsigned int i;
-  VEC (tree, gc) *ma;
   tree entry;
   var_ann_t ta = var_ann (tag);
 
   if (!MTAG_P (tag))
     return;
-  ma = may_aliases (tag);
-  if (!ma)
+  aliases = may_aliases (tag);
+  if (!aliases)
     return;
 
-  for (i = 0; VEC_iterate (tree, ma, i, entry); i++)
+  EXECUTE_IF_SET_IN_BITMAP (aliases, 0, i, bi)
     {
+      entry = referenced_var (i);
       if (!unmodifiable_var_p (entry))
         {
           add_to_worklist (entry, worklist, worklist2, ta->escape_mask);
@@ -264,7 +267,8 @@ compute_tag_properties (void)
       changed = false;
       for (k = 0; VEC_iterate (tree, taglist, k, tag); k++)
         {
-          VEC (tree, gc) *ma;
+          bitmap ma;
+          bitmap_iterator bi;
           unsigned int i;
           tree entry;
           bool tagcc = is_call_clobbered (tag);
@@ -277,8 +281,9 @@ compute_tag_properties (void)
           if (!ma)
             continue;
 
-          for (i = 0; VEC_iterate (tree, ma, i, entry); i++)
+          EXECUTE_IF_SET_IN_BITMAP (ma, 0, i, bi)
             {
+              entry = referenced_var (i);
               /* Call clobbered entries cause the tag to be marked call
                  clobbered.  */
               if (!tagcc && is_call_clobbered (entry))
@@ -508,8 +513,9 @@ sort_mp_info (VEC(mp_info_t,heap) *list)
 static void
 create_partition_for (mp_info_t mp_p)
 {
+  bitmap_iterator bi;
   tree mpt, sym;
-  VEC(tree,gc) *aliases;
+  bitmap aliases;
   unsigned i;
 
   if (mp_p->num_vops <= (long) MAX_ALIASED_VOPS)
@@ -556,11 +562,12 @@ create_partition_for (mp_info_t mp_p)
   else
     {
       aliases = may_aliases (mp_p->var);
-      gcc_assert (VEC_length (tree, aliases) > 1);
+      gcc_assert (!bitmap_empty_p (aliases));
 
       mpt = NULL_TREE;
-      for (i = 0; VEC_iterate (tree, aliases, i, sym); i++)
+      EXECUTE_IF_SET_IN_BITMAP (aliases, 0, i, bi)
         {
+          sym = referenced_var (i);
           /* Only set the memory partition for aliased symbol SYM if
              SYM does not belong to another partition.  */
           if (memory_partition (sym) == NULL_TREE)
@@ -614,11 +621,10 @@ rewrite_alias_set_for (tree tag, bitmap
   else
     {
       /* Create a new alias set for TAG with the new partitions.  */
-      var_ann_t ann;
 
-      ann = var_ann (tag);
-      for (i = 0; VEC_iterate (tree, ann->may_aliases, i, sym); i++)
+      EXECUTE_IF_SET_IN_BITMAP (MTAG_ALIASES (tag), 0, i, bi)
         {
+          sym = referenced_var (i);
           mpt = memory_partition (sym);
           if (mpt)
             bitmap_set_bit (new_aliases, DECL_UID (mpt));
@@ -627,9 +633,7 @@ rewrite_alias_set_for (tree tag, bitmap
         }
 
       /* Rebuild the may-alias array for TAG.  */
-      VEC_free (tree, gc, ann->may_aliases);
-      EXECUTE_IF_SET_IN_BITMAP (new_aliases, 0, i, bi)
-        VEC_safe_push (tree, gc, ann->may_aliases, referenced_var (i));
+      bitmap_copy (MTAG_ALIASES (tag), new_aliases);
     }
 }
 
@@ -691,7 +695,10 @@ compute_memory_partitions (void)
           /* Each reference to VAR will produce as many VOPs as elements
              exist in its alias set.  */
           mp.var = var;
-          mp.num_vops = VEC_length (tree, may_aliases (var));
+          if (!may_aliases (var))
+            mp.num_vops = 0;
+          else
+            mp.num_vops = bitmap_count_bits (may_aliases (var));
 
           /* No point grouping singleton alias sets.  */
           if
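For readers who don't have the GCC tree handy, the data-structure change in the patch above (an alias list stored as a vector of trees becomes a bitmap indexed by variable UID) can be sketched roughly as below. This is a minimal standalone illustration, not GCC's bitmap implementation: UidBitmap, members and every method name are hypothetical, and a plain vector of 64-bit words stands in for GCC's sparse bitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a bitmap keyed by variable UID.
// Not GCC's bitmap API; names are illustrative only.
class UidBitmap {
public:
    // Record that the variable with this UID is in the alias set.
    void set_bit(std::size_t uid) {
        const std::size_t word = uid / 64;
        if (word >= words_.size()) words_.resize(word + 1, 0);
        words_[word] |= (std::uint64_t{1} << (uid % 64));
    }

    bool test_bit(std::size_t uid) const {
        const std::size_t word = uid / 64;
        return word < words_.size() && ((words_[word] >> (uid % 64)) & 1u) != 0;
    }

    // Plays the role of an "is the alias set empty" check.
    bool empty() const {
        for (std::uint64_t w : words_)
            if (w != 0) return false;
        return true;
    }

    // Plays the role of counting the aliases (one VOP per member).
    std::size_t count_bits() const {
        std::size_t n = 0;
        for (std::uint64_t w : words_)
            for (; w != 0; w &= w - 1) ++n;  // clear the lowest set bit
        return n;
    }

    // Visit every member in increasing UID order, the way a
    // set-bit iteration macro would.
    template <typename Fn>
    void for_each_set_bit(Fn fn) const {
        for (std::size_t word = 0; word < words_.size(); ++word)
            for (std::size_t bit = 0; bit < 64; ++bit)
                if (((words_[word] >> bit) & 1u) != 0) fn(word * 64 + bit);
    }

private:
    std::vector<std::uint64_t> words_;
};

// Collect the member UIDs of an alias set into a sorted vector.
std::vector<std::size_t> members(const UidBitmap& aliases) {
    std::vector<std::size_t> out;
    aliases.for_each_set_bit([&out](std::size_t uid) { out.push_back(uid); });
    assert(out.size() == aliases.count_bits());
    return out;
}
```

Setting the same UID twice leaves the set unchanged, which is one thing a vector of trees does not give you for free, and replacing one alias set with another is a plain copy of the words.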
Re: [Bug libstdc++/29286] [4.0/4.1/4.2/4.3 Regression] placement new does not change the dynamic type as it should
On 1 Jan 2007 00:41:44, mark at codesourcery dot com wrote:
> --- Comment #26 from mark at codesourcery dot com 2007-01-01 00:41 ---
> Subject: Re: [4.0/4.1/4.2/4.3 Regression] placement new does not change the dynamic type as it should
>
> dberlin at gcc dot gnu dot org wrote:
>
> > If we add a placement_new_expr, and not try to revisit our interpretation of the standard, we can just DTRT and fix placement new.
> > This would be best for optimizations, and IMHO, for users.
>
> I agree that treating placement new specially makes sense. The first argument to a placement new operator could be considered to have an unspecified dynamic type on entrance to the operator, while the return value has the dynamic type specified by the operator. (So that the pointer returned by new (x) int has type int *.)

Right.

> I'm not sure that placement_new_expr is the best way to accomplish this, but, maybe it is. Another possibility would be to define an attribute or attributes to specify the dynamic type of arguments and return types, and then have the C++ front end annotate all placement new operators with those attributes.

It would be nice if we could transform those attributes on gimplification to something like an alias preserving cast (or something of that nature) that states that the cast is type unioning for alias purposes (IE that the possible types of the result for TBAA/etc purposes is the union of the type of the cast and the type of the cast's operand).

Not a fully fleshed out idea, just something that popped into my head.
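To make the "new (x) int" case above concrete, here is a small sketch of the pattern under discussion: one piece of storage holding first a float and then an int. It is only an illustration of the source-level idiom, assuming nothing about how the compiler models it; the function name reuse_storage_as_int is made up for the example.

```cpp
#include <cassert>
#include <new>

// Reuse one piece of storage first as a float, then as an int.
// reuse_storage_as_int is a hypothetical name used only in this sketch.
int reuse_storage_as_int() {
    // Storage large enough and suitably aligned for either type.
    alignas(float) alignas(int) unsigned char
        storage[sizeof(float) > sizeof(int) ? sizeof(float) : sizeof(int)];

    // After this placement new, the storage holds a float.
    float* f = new (storage) float(1.5f);
    assert(*f == 1.5f);

    // A second placement new ends the float's lifetime and starts an int's.
    // The returned pointer has type int *, and the storage must now be
    // read through it rather than through f.
    int* i = new (storage) int(42);
    return *i;
}
```

The concern in this bug is that a type-based alias analysis sees a float store and an int store to what it believes are unrelated objects, so it needs some way to know that the second placement new changed the dynamic type of the storage.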
Re: [Bug tree-optimization/29922] [4.3 Regression] [Linux] ICE in insert_into_preds_of_block
I will try to get back to this bug this week. I was fighting some other fights last week, i apologize.
Re: [Bug libstdc++/30203] New: std::vector::size() 10x speedup (patch)
And what are the timings with a recent version of g++ and actually turning on optimization?

On 13 Dec 2006 17:38:06, charles at rebelbase dot com wrote:
> vector::size() in bits/stl_vector.h is currently implemented as
>
>   size_type size() const { return size_type(end() - begin()); }
>
> A faster implementation is
>
>   size_type size() const { return _M_impl._M_finish - _M_impl._M_start; }
>
> Which avoids the temporary iterators' life cycles and operator- calls.
>
> I tried a simple timing test on both implementations, and the latter appears to be 10x faster:
>
> (11:35:56)(charles xyzzy)(~): cat test.cc
> #include <vector>
>
> int main ()
> {
>   std::vector<int> x (100);
>   unsigned long l = 0;
>   const unsigned long iterations = 1;
>   for (unsigned long i=0; i<iterations; ++i)
>     l += x.size ();
>   return 0;
> }
> (11:35:58)(charles xyzzy)(~): g++ -o test test.cc -lstdc++
> (11:36:05)(charles xyzzy)(~): time ./test
>
> real 0m3.692s
> user 0m3.676s
> sys  0m0.004s
>
> (11:36:10)(charles xyzzy)(~): cat test2.cc
> #include <vector>
>
> int main ()
> {
>   std::vector<int> x (100);
>   unsigned long l = 0;
>   const unsigned long iterations = 1;
>   for (unsigned long i=0; i<iterations; ++i)
>     l += x._M_impl._M_finish - x._M_impl._M_start;
>   return 0;
> }
> (11:36:13)(charles xyzzy)(~): g++ -o test2 test2.cc -lstdc++
> (11:36:19)(charles xyzzy)(~): time ./test2
>
> real 0m0.342s
> user 0m0.336s
> sys  0m0.004s
>
> --
>            Summary: std::vector::size() 10x speedup (patch)
>            Product: gcc
>            Version: unknown
>             Status: UNCONFIRMED
>           Severity: enhancement
>           Priority: P3
>          Component: libstdc++
>         AssignedTo: unassigned at gcc dot gnu dot org
>         ReportedBy: charles at rebelbase dot com
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30203
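For anyone who wants to answer the question at the top of this message, a timing harness along the following lines is one way to measure it. This is a sketch, not the reporter's test: it compares size() against end() - begin() through the public interface only (it does not touch _M_impl), the function names are made up for the example, and the volatile accumulator is there so that an optimizing build cannot delete the loop.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Add v.size() into an accumulator `iterations` times.
// The volatile accumulator keeps the loop alive under optimization.
unsigned long sum_via_size(const std::vector<int>& v, unsigned long iterations) {
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < iterations; ++i)
        sink = sink + static_cast<unsigned long>(v.size());
    return sink;
}

// Same loop, but computing the element count as end() - begin().
unsigned long sum_via_iterators(const std::vector<int>& v, unsigned long iterations) {
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < iterations; ++i)
        sink = sink + static_cast<unsigned long>(v.end() - v.begin());
    return sink;
}

// Wall-clock seconds taken by one call of fn(v, iterations).
template <typename Fn>
double seconds_for(Fn fn, const std::vector<int>& v, unsigned long iterations) {
    const auto start = std::chrono::steady_clock::now();
    fn(v, iterations);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling seconds_for(sum_via_size, x, n) and seconds_for(sum_via_iterators, x, n) on the same vector, once in a build without optimization and once with it turned on, gives the two pairs of numbers being asked for.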
Re: [Bug middle-end/30075] Missed optimizations with -fwhole-program -combine
I would not expect this to be fixed anytime soon.
I have yet to find any real people who use either combine or -fwhole-program.
They use *way* too much memory on real programs.
As a result, no real people involved in optimization work on optimizers for them.

On 5 Dec 2006 19:38:51, pinskia at gcc dot gnu dot org wrote:
> --
Re: [Bug debug/29792] DWARF: Not all inline concrete instances are being generated
> OK, so I'll have to find another way of using the DWARF info to see if an inline routine, such as __task_rq_lock, was used at all in the build or was just included in the DWARF info but not referenced anywhere, have to dig more into the available information...
>
> BTW, if, in these cases, DW_TAG_subroutine is not referenced, what is the purpose of it being included? Is there a reason my limited knowledge is not realising?

Well, it is referenced. It did exist in the source, and was inlined. That's what we output.

DW_TAG_subprogram with no PC range is actually common. Because all the inlined instances were optimized away, there are no DW_TAG_inlined_* entries for them.

> --
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29792
Re: [Bug debug/29792] DWARF: Not all inline concrete instances are being generated
On 12 Nov 2006 20:39:43, acme at mandriva dot com wrote:
> --- Comment #5 from acme at mandriva dot com 2006-11-12 20:39 ---
> (In reply to comment #4)
> > The only thing left from __task_rq_lock is a label.
>
> SNIP
>
> > task_cpu were inlined and we constant proped the value of rq the first of the way through the function which we inlined this to.
>
> OK, I thought that this was due to something like what you described, even not knowing that much about gcc internals, but I thought that even in this case the DW_TAG_inlined_subroutine would be emitted, or hoped to as it would allow me to do what I want with my tools :-\

There is nothing to emit debug info about, so we don't.
Re: [Bug debug/29792] DWARF: Not all inline concrete instances are being generated
On 13 Nov 2006 16:16:50, acme at mandriva dot com wrote:
> --- Comment #8 from acme at mandriva dot com 2006-11-13 16:16 ---
> > > OK, I thought that this was due to something like what you described, even not knowing that much about gcc internals, but I thought that even in this case the DW_TAG_inlined_subroutine would be emitted, or hoped to as it would allow me to do what I want with my tools :-\
> >
> > There is nothing to emit debug info about, so we don't.
>
> Well, at least gcc emits this:
>
>  <1><a2f2>: Abbrev Number: 65 (DW_TAG_subprogram)
>      DW_AT_sibling     : <a324>
>      DW_AT_name        : (indirect string, offset: 0x4515): __task_rq_lock
>      DW_AT_decl_file   : 1
>      DW_AT_decl_line   : 378
>      DW_AT_prototyped  : 1
>      DW_AT_type        : <9a2f>
>      DW_AT_inline      : 3 (declared as inline and inlined)
>
> But no DW_TAG_inlined_subroutine, as we've been discussing:
>
> [net-2.6.20]$ readelf -wi ../OUTPUT/qemu/net-2.6.20/kernel/sched.o | grep a2f2
>      DW_AT_sibling     : <a2f2>
>  <1><a2f2>: Abbrev Number: 65 (DW_TAG_subprogram)
> [net-2.6.20]$

I'm quite aware of what GCC outputs here :)

However, past the initial declarations, we don't output debug information about what the state of the IR is at random points in the compilation, only about what the final output looks like.

Since there is no inlined code left, we don't end up saying there is an inlined subroutine.

Even if we could change this, i'm not sure we'd want to. It doesn't seem incorrect at all to do what we do. Otherwise, you'd end up with inlined subroutine dies with no low pc/high pc associated with them, which seems nonsensical.
Re: [Bug java/29587] jc1: out of memory allocating 4072 bytes after a total of 708630224 bytes
Can you try the attached and let me know if it fixes it?

[attachment: fordanglin.diff]
Re: [Bug tree-optimization/29680] [4.3 Regression] Misscompilation of spec2006 gcc
A detailed proposal:

So here is what i was thinking of. When i say symbols below, I mean some VAR_DECL or structure that has a name (like our memory tags do). A symbol is *not* a real variable that occurred in the user program. When I say variable i mean a variable that occurred in the user program.

The real problem with our alias system in terms of precision, and often in terms of number of useless vops, is that we are trying to use real, existing, variables, to approximate the portions of the heap a statement accesses.

When things access portions of the heap we can't see (nonlocal variables), we fall down badly in terms of precision because we can eliminate every single local variable as an alias, and need to then just say it accesses some nonlocal variable. This causes precision problems because it means that statements accessing nonlocal variables that we can *prove* don't interfere, still currently share a VUSE between them.

We also have useless vops whenever we have points-to sets that intersect between all statements that interfere, because we end up adding aliases for you can eliminate the members of the alias set.

We also currently rely on operand-scan time pruning, which is very ugly.

There is a way to get the minimal number of vuses/vdefs necessary to represent completely precise (in terms of info we have) aliasing, while still being able to degrade the precision gracefully in order to enable the vuses/vdefs necessary to go down.

The scheme i propose *never* has overlapping live ranges of the individual symbols, even though the symbols may represent pieces of the same heap. In other words, you can rely on the fact that once an individual symbol has a new version, there will never be a vuse of an old version of that symbol.

The current vdef/vuse scheme consists of creating memory tags to represent portions of the heap. When a memory tag has aliases, we use its alias list to generate virtual operands.
When a memory tag does not have aliases, we generate a virtual operand of the base symbol.

The basic idea in the new scheme is to never have a list of aliases for a symbol representing portions of the heap. The symbols representing portions of the heap are themselves always the target of a vuse/vdef. The aliases they represent are immaterial (though we can keep a list if something wants it). This enables us to have a smaller number of vops, and have something else generate the set of symbols in a precise manner, rather than have things like the operand scanner try to post process it.

The symbols are also attached to the load/store statements, and not to the variables.

The operand renamer only has to add vuses/vdefs for all the symbols attached to a statement, and it is done.

In the simplest, dumb, non-precise version of this scheme, this means you only have one symbol, called MEM, and generate vuse/vdefs linking every load/store together.

In the absolute most-precise version of this scheme, you partition the loads/store conflicts in statements into symbols that represent statement conflictingness.

In a completely naive, O(N^3) version, the following algorithm will work and generate completely precise results:

Collect all loads and stores into a list (lslist)
for each statement in lslist (x):
  for each statement in lslist (y):
    if x conflicts with y:
      if there is no partition for x, y, create a new one containing x and y.
      otherwise for every partition y belongs to:
        if all members of this partition have memory access that conflicts with x:
          add x to this partition
        otherwise
          create a new partition containing all members of the partition except the ones x does not conflict with.
          add x to this partition

This is a very very slow way to do it, but it should be clear (there are much much much faster ways to do this).

Basically, a single load/store statement can belong to multiple partitions. All members of a given partition conflict with each other.
Given the following set of memory access statements: a, b, c, d

where:
  a conflicts with b and c
  b conflicts with c and d
  c conflicts with a and b
  d conflicts with a and c

you will end up with 3 partitions:
  part1: {a, b, c}
  part2: {b, c, d}
  part3: {d, a, c}

statement c will conflict with every member of partition 1 and thus get partition 1, rather than a new partition.

You now create symbols for each partition, and for each statement in the partition, add the symbol to its list.

Thus, in the above example we get
  statement a - symbols: MEM.PART1, MEM.PART3
  statement b - symbols: MEM.PART1, MEM.PART2
  statement c - symbols: MEM.PART1, MEM.PART2, MEM.PART3
  statement d - symbols: MEM.PART2, MEM.PART3

As mentioned before, the operand renamer simply adds a vdef/vuse for each symbol in the statement list.

Note that this is the minimal number of symbols necessary to precisely represent the conflicting accesses.

If the number of partitions grows
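The partition-to-symbol step in the example above (not the partitioning algorithm itself) can be sketched in a few lines. This is an illustration only: symbols_for_statements and example_partitions are made-up names, statements are represented by their letters, and the partitions are simply taken as given from the message.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// partitions[k] lists the statements that belong to partition k+1.
// Returns, for each statement, the partition symbols attached to it,
// named MEM.PART1, MEM.PART2, ... as in the example.
std::map<std::string, std::vector<std::string>>
symbols_for_statements(const std::vector<std::vector<std::string>>& partitions) {
    std::map<std::string, std::vector<std::string>> symbols;
    for (std::size_t k = 0; k < partitions.size(); ++k) {
        const std::string symbol = "MEM.PART" + std::to_string(k + 1);
        for (const std::string& stmt : partitions[k])
            symbols[stmt].push_back(symbol);
    }
    return symbols;
}

// The three partitions from the example in the message.
std::vector<std::vector<std::string>> example_partitions() {
    return {{"a", "b", "c"}, {"b", "c", "d"}, {"d", "a", "c"}};
}
```

With the example partitions, statement c ends up with all three symbols and every other statement with two, so under this scheme the renamer would add three virtual operands to c and two to each of a, b and d.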
Re: [Bug tree-optimization/29680] [4.3 Regression] Misscompilation of spec2006 gcc
> Memory SSA brings down the number of virtual operators to exactly one per statement.

However, it does so in a way that makes the traditional things that actually want to do cool memory optimizations, harder.

I'm still on the fence over whether it's a good idea or not.

> > verified before we introduce million new bugs with mem-ssa (nothing personal, it simply is too large and too intrusive change not to bring any).
>
> Intrusive? Well, the only pass that was wired to the previous virtual operator scheme was PRE. DSE is also wired but to a lesser extent. No other optimization had to be changed for mem-ssa. It's obviously intrusive in the renamer, but that's it.

Uh, LIM and store sinking are too. Roughly all of our memory optimizations are.

The basic problem is in mem-ssa that vdefs and vuses don't accurately reflect what symbols are being defined and used anymore. They represent the factoring of a use and definition of a whole bunch of symbols.

Things like PRE and DSE break not because they are wired to the previous virtual operator scheme so much, but because they rely on the virtual use/def chains accurately representing where a symbol representing a memory access dies.

In mem-ssa, you have VDEF's of the same symbol all over the place.

The changes i have to make to PRE (and to the other things) to account for this is actually to rebuild the non-mem-ssa-factored (IE the current factored) form out of the chains by seeing what symbols they really affect.

This is going to be expensive, and IMHO, is what almost all of our SSA memory optimizations are going to have to do.

So while mem-ssa doesn't affect *precision*, it does affect how you can use the chains in a very significant way. For at least all the opts i see us doing, it makes them more or less useless without doing things (like reexpanding them) first. Because this is true, I'm not sure it's a good idea at all, which is why i'm still on the fence.
Re: [Bug tree-optimization/29680] [4.3 Regression] Misscompilation of spec2006 gcc
> In mem-ssa, you have VDEF's of the same symbol all over the place.

version of a symbol
Re: [Bug tree-optimization/29680] [4.3 Regression] Misscompilation of spec2006 gcc
Zdenek, can you revert your patch until we fix this? It might be a month or two before i get back to it.
(Yeah, i know it sucks to have to do this, but)

On 6 Nov 2006 15:12:30, hjl at lucon dot org wrote:
> --- Comment #14 from hjl at lucon dot org 2006-11-06 15:12 ---
> I checked gcc 4.3. The same source code, which is miscompiled in gcc from SPEC CPU 2006, is there. It is most likely that gcc 4.3 is also miscompiled and now generating wrong unwind/debug info, if not wrong instructions.
>
> --
> hjl at lucon dot org changed:
>
>            What    |Removed                     |Added
>              Status|UNCONFIRMED                 |NEW
>      Ever Confirmed|0                           |1
>    Last reconfirmed|-00-00 00:00:00             |2006-11-06 15:12:29
>                date|                            |
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29680
Re: [Bug java/29587] jc1: out of memory allocating 4072 bytes after a total of 708630224 bytes
On 5 Nov 2006 21:22:24, dave at hiauly1 dot hia dot nrc dot ca wrote:
> --- Comment #7 from dave at hiauly1 dot hia dot nrc dot ca 2006-11-05 21:22 ---
> Subject: Re: jc1: out of memory allocating 4072 bytes after a total of 708630224 bytes
>
> > Can you bzip2 compress -fdump-tree-alias-vops-details-blocks-stats (it's going to be very large) and put it somewhere for me?
>
> The files are here: ftp://hiauly1.hia.nrc.ca/outgoing/berlin/.

Thanks!

So this ends up being what i thought. The variables aren't being collapsed, but i can't figure out why (IE it can't prove they are the same). This causes it to give them separate solution bitmaps, and the solutions are very large, and involve thousands of variables, so thousands * thousands = a lot of memory.

However, all of these variables should collapse, as they do in the earlier functions. They also collapse on my machine on this testcase (which admittedly has different code there).

It is, in fact, *incredibly* strange that not a single variable is collapsed or unified in this function.

I'm not sure what to do here. Can you poke around in perform_var_substitution and see if you can figure out what conditions are causing all the variables to fail out of the collapsing? Particularly, roughly every variable that has a constraint like foo = ESCAPED_VARS in the dump should be getting collapsed to ESCAPED_VARS.

Can you poke
Re: [Bug java/29587] jc1: out of memory allocating 4072 bytes after a total of 708630224 bytes
The change on the 19th caused a significant increase in memory consumption http://gcc.gnu.org/ml/gcc-patches/2006-10/msg01029.html and java bootstrap failures on s390, s390x and ia64. See this thread http://gcc.gnu.org/ml/gcc-patches/2006-10/msg01058.html. Except that all of these were fixed in the followup patch and a later typo fix, *including* the memory usage (see honza's tester).
Re: [Bug tree-optimization/14784] [Tree-ssa] alias analysis deficiency
Details, source, etc needed. On 31 Oct 2006 15:02:02 -, hjl at lucon dot org [EMAIL PROTECTED] wrote: --- Comment #10 from hjl at lucon dot org 2006-10-31 15:02 --- It miscompiles dwarf2out.c in gcc in SPEC CPU 2006.
Re: [Bug tree-optimization/29585] [4.2/4.3 Regression] tree check: expected ssa_name, have var_decl in is_old_name, at tree-into-ssa.c:558
On 25 Oct 2006 05:23:00 -, pinskia at gcc dot gnu dot org [EMAIL PROTECTED] wrote: --- Comment #4 from pinskia at gcc dot gnu dot org 2006-10-25 05:22 --- _ZTCN33_GLOBAL__N_t.cc__2292CFAC11NullostreamE0_13basic_ostream # _ZTI13basic_ostream = V_MAY_DEF _ZTI13basic_ostream_16; # _ZTIN33_GLOBAL__N_t.cc__2292CFAC11NullostreamE = V_MAY_DEF _ZTIN33_GLOBAL__N_t.cc__2292CFAC11NullostreamE_17; # _ZTCN33_GLOBAL__N_t.cc__2292CFAC11NullostreamE0_13basic_ostream = V_MAY_DEF _ZTCN33_GLOBAL__N_t.cc__2292CFAC11NullostreamE0_13basic_ostream; # _ZTSN33_GLOBAL__N_t.cc__2292CFAC11NullostreamE = V_MAY_DEF _ZTSN33_GLOBAL__N_t.cc__2292CFAC11NullostreamE; # _ZTVN10__cxxabiv120__si_class_type_infoE = V_MAY_DEF _ZTVN10__cxxabiv120__si_class_type_infoE; # _ZTI8ios_base = V_MAY_DEF _ZTI8ios_base; # _ZTS13basic_ostream = V_MAY_DEF _ZTS13basic_ostream; # _ZTVN10__cxxabiv121__vmi_class_type_infoE = V_MAY_DEF _ZTVN10__cxxabiv121__vmi_class_type_infoE; # _ZTS8ios_base = V_MAY_DEF _ZTS8ios_base; # _ZTVN10__cxxabiv117__class_type_infoE = V_MAY_DEF _ZTVN10__cxxabiv117__class_type_infoE; # SFT.5 = V_MAY_DEF SFT.5; # SFT.6 = V_MAY_DEF SFT.6; # SFT.7 = V_MAY_DEF SFT.7; # SFT.8 = V_MAY_DEF SFT.8; # SFT.9 = V_MAY_DEF SFT.9; # NONLOCAL.15 = V_MAY_DEF NONLOCAL.15; this_9-_vptr.basic_ostream = iftmp.1_13; Uh, this is pretty weird. *all* of these should have been marked for renaming, not just NONLOCAL.
Re: [Bug tree-optimization/25737] ACATS c974001 c974013 hang with struct aliasing
On 24 Sep 2006 18:23:41 -, ebotcazou at gcc dot gnu dot org [EMAIL PROTECTED] wrote: --- Comment #37 from ebotcazou at gcc dot gnu dot org 2006-09-24 18:23 --- No, really, you don't seem to understand. If you respect these DECL_NONADDRESSABLE_P or TYPE_NONALIASED_COMPONENT flags, you are going to make them unaliased. Your whole bug report is that they are not aliased and should be, and that the loads and stores currently don't interfere but should. I think I understand your viewpoint: the name of TYPE_NONALIASED_COMPONENT and DECL_NONADDRESSABLE_P seems to imply that setting them would always result in less V_MAY_DEF's in the code. But... The name, and all the documentation, which say they cannot be addressed, which means they cannot be pointed to by any pointer, which means they are unaliased. Diego, the short summary is that Eric has some Ada testcases where we end up with less V_MAY_DEF's than he thinks we should. He believes that respecting these flags, which specify you cannot form the address of a certain component, etc, will somehow cause him to end up with more aliasing and fix his testcase by anything other than luck. ...that's not so simple. If you look at how these flags work in GCC 3.x, you'll see that setting them has some impact on the alias sets used to access memory references, via can_address_p and the MEM_KEEP_ALIAS_SET_P flag. In GCC 4 dialect, this would result in different V_MAY_DEF's, not less. If so, then you've both hacked around something more fundamental, and the documentation of all these flags doesn't actually match what you really mean. I'm not saying that this is a sane design or that we should try to replicate it in GCC 4, I'm just saying that for the time being struct aliasing totally overlooks this mechanism and doesn't work for Ada because of that. Okay, and i'm saying i don't plan on accepting fixes that appear to hack around well accepted infrastructure to try to fix symptoms. Really. That's all.
I'm not going to approve patches that randomly skip fields because it seems to get the right result sometimes. If you want to try to explain what all this is actually trying to do, i'm happy to work with you to come up with a sane solution.
Re: [Bug tree-optimization/28944] New: tree-dce incorrectly removes an assignment.
asm volatile ( "push %1 \n\t" "call *%0 \n\t" "add $4, %%esp \n\t" : : "r" ( test ), "r" ( x ) ); asm statements are not allowed to alter control flow.
Re: [Bug tree-optimization/28937] [4.2 regression] ICE in add_virtual_operand, at tree-ssa-operands.c:1309
Why does loop change the SMT usage? In addition, since there are times loop doesn't do anything, you should simply be returning PROP_smt_usage when it does do something, and nothing otherwise. On 4 Sep 2006 03:52:04 -, pinskia at gcc dot gnu dot org [EMAIL PROTECTED] wrote: --- Comment #4 from pinskia at gcc dot gnu dot org 2006-09-04 03:52 --- Note the patch is: Index: tree-ssa-loop.c === --- tree-ssa-loop.c (revision 116671) +++ tree-ssa-loop.c (working copy) @@ -405,9 +405,11 @@ struct tree_opt_pass pass_complete_unrol TV_COMPLETE_UNROLL, /* tv_id */ PROP_cfg | PROP_ssa, /* properties_required */ 0, /* properties_provided */ - 0, /* properties_destroyed */ + PROP_smt_usage, /* properties_destroyed */ 0, /* todo_flags_start */ - TODO_dump_func | TODO_verify_loops, /* todo_flags_finish */ + TODO_dump_func +| TODO_verify_loops +| PROP_smt_usage, /* todo_flags_finish */ 0/* letter */ }; -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28937
Re: [Bug tree-optimization/28798] remove_phi_node attempts removal of a phi node resized by resize_phi_node
hosking at cs dot purdue dot edu wrote: --- Comment #13 from hosking at cs dot purdue dot edu 2006-08-24 15:27 --- Is this enough? Here is the dump output, followed by stack traces at the resize and remove points (the remove goes on to fail). So, this edge can't exist. Note: Its src is: (gdb) p *(e->src) $12 = { index = 0, } Its dest is: (gdb) p *(e->dest) $13 = { index = 0, } It claims to be an edge from block 0 to block 0, but according to your dump, block 0 is not a successor of block 0 (IE it is not a self loop). --Dan
Re: [Bug tree-optimization/15452] [tree-ssa] Optimize cascaded a = a == 0;
pinskia at gcc dot gnu dot org wrote: --- Comment #6 from pinskia at gcc dot gnu dot org 2006-08-24 04:27 --- Another interesting case would be (but which could be handled by VRP): int foo (int a) { a = a!=0; a = a!=0; a = a!=0; a = a!=0; a = a!=0; return a; } Which should be optimized to: int foo(int a) { return a!=0;} Uh, FRE could also optimize this to the same thing, I just don't remember whether it bothers to look at conditionals as eliminable expressions.
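For reference, the reason the cascade collapses is that a = a != 0 is idempotent after its first application: once a is 0 or 1, applying it again changes nothing. A minimal C sketch of that equivalence follows; foo_cascaded, foo_folded and forms_agree are names made up for this illustration, not anything in GCC or the testsuite.

```c
#include <assert.h>

/* The cascaded form from the testcase. */
int foo_cascaded(int a)
{
    a = a != 0;
    a = a != 0;
    a = a != 0;
    a = a != 0;
    a = a != 0;
    return a;
}

/* The form the cascade should be reduced to. */
int foo_folded(int a)
{
    return a != 0;
}

/* Returns 1 if both forms agree for this input.  The result of the
   cascade is always a boolean value, which is what makes every
   assignment after the first one redundant. */
int forms_agree(int a)
{
    int r = foo_cascaded(a);
    assert(r == 0 || r == 1);
    return r == foo_folded(a);
}
```

Whether a given pass (VRP, FRE) actually performs the reduction is a separate question; the sketch only shows that the two forms compute the same value.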
Re: [Bug tree-optimization/28798] remove_phi_node attempts removal of a phi node resized by resize_phi_node
hosking at cs dot purdue dot edu wrote: --- Comment #7 from hosking at cs dot purdue dot edu 2006-08-23 22:29 --- This is with the Modula-3 backend. I am porting it to 4.1.1 and encountered this problem with -O3 turned on. Does 4.1 have the check for EDGE_CRITICAL_P in insert_aux? If not, that is the problem.
Re: [Bug tree-optimization/28798] remove_phi_node attempts removal of a phi node resized by resize_phi_node
hosking at cs dot purdue dot edu wrote: --- Comment #11 from hosking at cs dot purdue dot edu 2006-08-24 00:57 --- (In reply to comment #9) Does 4.1 have the check for EDGE_CRITICAL_P in insert_aux? Yes: /* This can happen in the very weird case that our fake infinite loop edges have caused a critical edge to appear. */ if (EDGE_CRITICAL_P (pred)) { cant_insert = true; break; } Honestly, there should be no other case in which the edge actually needs to be split. It is just a shortcut rather than trying to figure out whether we want the beginning of the succ or the end of the pred (it figures it out for us). If you could attach the dump from -fdump-tree-crited-vops-details-blocks-stats, and tell me what pred, src, and block are, that would be helpful. Without more, it's either something *very* strange in the code modula3 is creating (or broken gimplification), *or* the edge inserter is confused and believes it needs to create a block in a case it doesn't.
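As background for the EDGE_CRITICAL_P check: an edge is critical when its source block has more than one successor and its destination block has more than one predecessor, so code cannot be placed on it without creating a new block. A toy C sketch of that test is below. The toy_block and toy_edge structures are hypothetical simplifications invented for this note and are not GCC's basic_block or edge types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified CFG node: only the edge counts matter here. */
struct toy_block {
    int num_preds;
    int num_succs;
};

/* Hypothetical, simplified CFG edge. */
struct toy_edge {
    const struct toy_block *src;
    const struct toy_block *dest;
};

/* An edge is critical if its source has several successors and its
   destination has several predecessors.  Inserting code at the end of
   src would affect src's other successors, and inserting at the start
   of dest would affect dest's other predecessors, so the edge would
   have to be split. */
bool toy_edge_critical_p(const struct toy_edge *e)
{
    assert(e != 0 && e->src != 0 && e->dest != 0);
    return e->src->num_succs >= 2 && e->dest->num_preds >= 2;
}
```

When the edge is not critical, insertion can go at the end of the predecessor or the start of the successor, which is the shortcut described above.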
Re: [Bug tree-optimization/28798] remove_phi_node attempts removal of a phi node resized by resize_phi_node
pinskia at gcc dot gnu dot org wrote: --- Comment #2 from pinskia at gcc dot gnu dot org 2006-08-22 06:17 --- We should never have needed resize_phi_node inside PRE and resize_phi_node also does an exact replacement so that means you are keeping a reference to the old PHI node when adding an edge which is wrong. PRE never directly calls resize_phi_node. The insert_on_edge call PRE makes should *never* cause the number of predecessors to change, so i can't see why resize_phi_node would ever be called. Without an example case where it does, i can't debug this further. However, it's not wrong to keep a reference to a phi node when a random edge in the program changes. The API that doesn't allow such a thing is just broken. This is a symptom of the fact that our phi node arguments are stored in pretend vectors, even though it would be saner to use an embedded vec in that structure. This would allow reallocating arguments without having to change the entire phi node structure.
Re: [Bug tree-optimization/28643] redundant phi-node in latch-block prevents vectorization
pinskia at gcc dot gnu dot org wrote: --- Comment #1 from pinskia at gcc dot gnu dot org 2006-08-08 01:47 --- SSA copy prop with dce after that should really be the correct way. Err, SSA copy prop should be enough, actually, since after copy-prop, the phi will have no users (and they shouldn't care about code with no uses that doesn't access memory). Though it's interesting that this redundant phi survives so long. What is creating it?
Re: [Bug c/28073] Type-punned pointer passed as function parameter generates bad assembly sequence
sorenj at us dot ibm dot com wrote: --- Comment #2 from sorenj at us dot ibm dot com 2006-06-19 16:44 --- Changing just one line of the test program to the (AFAIK) legal C code. By casting through void *, we are addressing Andrew's concerns about violating the C rules. No you aren't. The only thing that matters is what the type of the dereferenced pointer is, not the intermediate casts. For example, int *foo; float b; float *c; b = 5.0; foo = (int *)&b; c = (float *)foo; printf("%f\n", *c); is legal. Foo *pFoo = *(Foo **) ((void *)longPtr); /* // BAD! */ Still not legal. eliminates the type-punned warning, even at the highest possible warning level, and continues to generate code that results in a bad return value. This test case illustrates that this problem is actually worse than we originally thought, as now incorrect code is generated without any warning. We can't issue warnings in every case because it is impossible to detect every case. We could probably issue a warning in this case.
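To make the rule concrete, here is a small C sketch. The function names are hypothetical and chosen for this note only. The first function passes a pointer through int * but only ever dereferences it as float *, which matches the legal example above. The second shows the usual portable alternative when you really do want to look at an object's bytes as another type: copy them with memcpy instead of dereferencing a punned pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fine under the aliasing rules: the intermediate int * is never
   dereferenced; the object is only accessed through its real type. */
float roundtrip_through_int_ptr(float value)
{
    float b = value;
    int *foo = (int *)&b;      /* intermediate cast only */
    float *c = (float *)foo;   /* back to the object's real type */
    return *c;
}

/* Portable way to inspect the representation of a float: copy the
   bytes rather than dereferencing a pointer of the wrong type.
   Assumes float and uint32_t have the same size, which is checked. */
uint32_t float_bits(float value)
{
    uint32_t bits;
    assert(sizeof bits == sizeof value);
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

The intermediate casts change nothing about legality in either direction: dereferencing as the wrong type stays invalid no matter how many void * casts are put in between.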
Re: [Bug tree-optimization/28003] [4.2 Regression] optimizer bug
pinskia at gcc dot gnu dot org wrote: --- Comment #2 from pinskia at gcc dot gnu dot org 2006-06-13 04:41 --- Hmm, we get after dce, just: reduced_cell_two_folds[26] = {}; And DCE removes: this_616 = reduced_cell_two_folds[26].u; # SMT.68_1055 = V_MAY_DEF SMT.68_1054; this_616->elems[0] = 1; # SMT.68_1056 = V_MAY_DEF SMT.68_1055; this_616->elems[1] = 0; # SMT.68_1057 = V_MAY_DEF SMT.68_1056; this_616->elems[2] = 0; ... this_621 = reduced_cell_two_folds[26].h; ... # SMT.68_1058 = V_MAY_DEF SMT.68_1057; this_621->elems[0] = 2; # SMT.68_1059 = V_MAY_DEF SMT.68_1058; this_621->elems[1] = 1; # SMT.68_1060 = V_MAY_DEF SMT.68_1059; this_621->elems[2] = 1; Which does not make sense. Nothing is special in alias shows what is going wrong. The only thing i can think of is that SMT.68 is not marked global. Is it?
Re: [Bug target/27855] reassociation pass produces ~30% slower matrix multiplication code
steven at gcc dot gnu dot org wrote: --- Comment #4 from steven at gcc dot gnu dot org 2006-06-02 23:19 --- Real bug, despite Andrew's usual portion of x86-hate. It'd be good to know what exactly is going wrong. Reassociation only touches floating point because someone asked me to make it touch floating point. It still shouldn't have *this* much of an effect; my guess is it is triggering some bad behavior elsewhere.
Re: [Bug middle-end/27445] create_tmp_var_raw (gimplify.c) inadventently asserts 'volatile' on temps
I haven't looked into the rev. history, to see why/when this fix was made, but will ask the hypothetical: was this fix made to work around the misbehavior in create_tmp_var_raw()? Note that create_tmp_var_raw() is exported from gimplify.c and appears to be called from quite a few places. The question arises: what are the preconditions for calling create_tmp_var_raw()? If you want to assert that it uses whatever type was passed in and all the callers have to remove qualifiers as necessary that's fine, but requires some knowledge of the original intent behind create_tmp_var_raw() and the assumptions its callers make. I'd be tempted to add an assert that the type passed in has no qualifiers if that is a pre-condition. Compiler temporaries we generate explicitly have the same qualifiers as the expression they are generated from. This is by design.
Re: [Bug tree-optimization/26304] [4.2 Regression] 25_algorithms/prev_permutation/1.cc on powerpc{64,}-linux and powerpc-darwin
On Sun, 2006-04-23 at 23:14 +, pinskia at gcc dot gnu dot org wrote: --- Comment #17 from pinskia at gcc dot gnu dot org 2006-04-23 23:14 --- Rewriting that loop like: [kudzu:local/trunk/gcc] pinskia% svn diff tree-ssa-loop-niter.c Index: tree-ssa-loop-niter.c === --- tree-ssa-loop-niter.c (revision 113199) +++ tree-ssa-loop-niter.c (working copy) @@ -1939,6 +1939,7 @@ scev_probably_wraps_p (tree type, tree b tree unsigned_type, valid_niter; tree base_plus_step, bpsps; int cps, cpsps; + bool known_not_to_wrap; /* FIXME: The following code will not be used anymore once http://gcc.gnu.org/ml/gcc-patches/2005-06/msg02025.html is @@ -2077,8 +2078,10 @@ scev_probably_wraps_p (tree type, tree b estimate_numbers_of_iterations_loop (loop); for (bound = loop->bounds; bound; bound = bound->next) -if (proved_non_wrapping_p (at_stmt, bound, type, valid_niter)) - return false; +if (!proved_non_wrapping_p (at_stmt, bound, type, valid_niter)) + known_not_to_wrap = false; + if (known_not_to_wrap) + return false; /* At this point we still don't have a proof that the iv does not overflow: give up. */ known_not_to_wrap may be uninitialized at the if statement here. You need to init it to true.
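The initialization point can be shown outside GCC. In the sketch below, struct bound and its proves_no_wrap field are stand-ins invented for this note, not the real tree-ssa-loop-niter.c types or the real proved_non_wrapping_p call. The only thing illustrated is the pattern: a flag that the loop can only clear has to start out true, otherwise the test after the loop reads an indeterminate value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one recorded loop bound. */
struct bound {
    bool proves_no_wrap;   /* stand-in for the result of the real check */
    struct bound *next;
};

/* Returns true only if every bound in the list proves non-wrapping. */
bool all_bounds_prove_no_wrap(const struct bound *bounds)
{
    /* Must start as true: the loop below only ever clears the flag,
       so without this initializer the final test would read an
       uninitialized variable whenever no bound fails. */
    bool known_not_to_wrap = true;

    for (const struct bound *b = bounds; b != NULL; b = b->next)
        if (!b->proves_no_wrap)
            known_not_to_wrap = false;

    return known_not_to_wrap;
}
```

Note that with this shape an empty list also yields true, which may or may not be what the real code wants; that is a property of the sketch, not a claim about the patch.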
Re: [Bug tree-optimization/27140] Compiling LLVM now takes nearly 5x as long with 4.1 as it did with 4.0
On Apr 13, 2006, at 1:30 PM, rspencer at x10sys dot com wrote: --- Comment #6 from rspencer at x10sys dot com 2006-04-13 20:30 --- Created an attachment (id=11261) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=11261&action=view) Timing results with -fno-tree-salias Andrew Pinskia suggested that I try -fno-tree-salias. This decreased compilation time by about 10% (244 secs vs 265 secs). Only by virtue of the fact that you have a smaller number of phi nodes. It's not going to give an order of magnitude improvement here.
Re: [Bug tree-optimization/19590] IVs with the same evolution not eliminated
--- Comment #10 from stevenb dot gcc at gmail dot com 2006-04-08 21:13 --- Subject: Re: IVs with the same evolution not eliminated The new SCC value numberer for PRE i'm working on gets this case right (and this is in fact, one of the advantages of SCC based value numbering). Is the SCC-VN patch I posted long ago still of some use to you, or are you writing something new from scratch? I ended up rewriting it from scratch, for other reasons. In particular 1. I keep separate hash tables for unary, binary, references, and phi expressions, each with their own structure. This is because you really want valuized structures in the hash table. Your implementation will get the wrong answers during optimistic lookup at times, because the value representative for a phi argument can change and will get hashed to the wrong value. 2. I keep track of what expressions simplified to, and whether they have constants in the simplified expression. This enables much more simplification than simply storing the value number name. In particular, in something like int main(int argc) { int a; int b; int c; int d; a = argc + 4; b = argc + 8; c = a b; d = a + 4; return c + d; } We will prove that d and b have the same value. BTW, you missed the part of the thesis where he explains that phi nodes in different blocks can't be congruent to each other (this isn't quite true, but it's a much harder property to prove). 3. I needed the structures i made so i could directly transform the results into value handles.
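The d and b claim in point 2 can be checked by hand: with a recorded as argc + 4, the expression a + 4 simplifies to argc + 8, which is the same expression as b. A tiny C sketch of just that step follows. The helper names are made up for illustration, and this shows only the arithmetic identity, not how the value numberer represents or discovers it.

```c
#include <assert.h>

/* b from the example: computed directly from argc. */
int value_of_b(int argc)
{
    return argc + 8;
}

/* d from the example: computed through a.  Knowing that a simplified
   to "argc + 4" (an expression with a constant in it) lets a + 4 be
   simplified further to "argc + 8". */
int value_of_d(int argc)
{
    int a = argc + 4;
    return a + 4;
}

/* Returns 1 if b and d have the same value for this argc. */
int b_and_d_agree(int argc)
{
    return value_of_b(argc) == value_of_d(argc);
}
```

If only an opaque value number were stored for a, the constant 4 inside it would not be visible and the second simplification could not happen.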
Re: [Bug tree-optimization/27056] New: ICE in loop_depth_of_name
On Thu, 2006-04-06 at 11:49 +, jakub at gcc dot gnu dot org wrote: On the attached testcase with today's gcc-4_1-branch -m32 -g -O2 I get ICE during copy propagation. Unfortunately, even doing minor changes in different routines makes the problem go away. What I see in the dumps is: 1) at *t26.ssa, in draw_digit, there are two SSA_NAMEs with version 2: This is already wrong :)
Re: [Bug tree-optimization/26944] [4.1/4.2 Regression] -ftree-ch generates worse code
Compare pretmp.28_49 with pretmp.32_11, why are the arguments in a different order? Is there something unstable in the PRE algorithm? No, we just call fold on the expressions we build, and whatever it gives us, we use :)
Re: [Bug tree-optimization/26781] [4.2 Regression] ICE in tree-ssa-pre.c at create_component_ref_by_piec
On Tue, 2006-03-21 at 15:02 +, malitzke at metronets dot com wrote: --- Comment #5 from malitzke at metronets dot com 2006-03-21 15:02 --- The two if (tree_code(genop) == VALUE_HANDLE) at lines 2190 of tree-ssa-pre.c look suspicious to me. They aren't suspicious at all.
Re: [Bug tree-optimization/26726] -fivopts producing out of bounds array refs
On Fri, 2006-03-17 at 12:40 +, mueller at gcc dot gnu dot org wrote: --- Comment #2 from mueller at gcc dot gnu dot org 2006-03-17 12:40 --- one possible workaround would be to lower the ARRAY_REF's to indirect mem refs, which I don't track Uh, no. We are in fact, trying to do the exact opposite in the future (keep things array ref as long as possible)
Re: [Bug tree-optimization/26626] [4.2 Regression] ICE in in add_virtual_operand
On Thu, 2006-03-09 at 22:54 +, pinskia at gcc dot gnu dot org wrote: --- Comment #3 from pinskia at gcc dot gnu dot org 2006-03-09 22:54 --- The difference between copyprop and before is the following. Before: rv.0_3 = rv.0_2; # VUSE NMT.7_13; D.1900_4 = rv.0_3->d; After: rv.0_3 = rv.0_2; # VUSE SMT.6; D.1900_4 = rv.0_2->d; This is nonsensical, and very bad.
Re: [Bug tree-optimization/26608] New: address of local variables are said to escape even though it is obvious they don't
On Wed, 2006-03-08 at 18:59 +, pinskia at gcc dot gnu dot org wrote: Testcase: int *d1; int g(int *b) { d1 = b; } int f(int a, int b, int c) { int i, j; int *d; if (a) d = &i; else d = &j; i = 2; j = 3; g(&b); if (i!=2) link_error(); if (j!=3) link_error(); return *d; } int main(void) { f(1, 2,3); return 0; } This should link with optimize but right now i and j are said to be call clobbered for some reason. What does the dump say? My guess is that it believes that they are returned from the call, even though they are not.
Re: [Bug tree-optimization/26443] [4.2 regression] ICE in add_virtual_operand, at tree-ssa-operands.c:1867
On Fri, 2006-02-24 at 13:06 +, pinskia at gcc dot gnu dot org wrote: --- Comment #2 from pinskia at gcc dot gnu dot org 2006-02-24 13:06 --- Confirmed. Though VRP2 is just doing constant propagation at this point. Last time i looked at a bug like this, it was actually some other pass not rescanning operands when it should have.
Re: [Bug fortran/26444] gfortran does not compile cp2k
On Thu, 2006-02-23 at 18:37 +, jb at gcc dot gnu dot org wrote: --- Comment #2 from jb at gcc dot gnu dot org 2006-02-23 18:37 --- I have the current CVS of cp2k, it fails with gfortran -c -O3 -g -ffast-math -fomit-frame-pointer message_passing.f90 ... message_passing.f90: In function 'mp_perf_env_create': message_passing.f90:58: internal compiler error: in add_virtual_operand, at tree-ssa-operands.c:1867 Confirmed. And yes, it seems cp2k is a good testsuite for modern Fortran features. This assert means some pass changed TMT usage without the right update flags. Andrew, can you try to figure out what pass did this (it should be relatively simple to see what the last pass touching the statement in question is).
Re: [Bug tree-optimization/14784] [Tree-ssa] alias analysis deficiency
On Thu, 2006-02-16 at 21:40 +, pinskia at gcc dot gnu dot org wrote: --- Comment #4 from pinskia at gcc dot gnu dot org 2006-02-16 21:40 --- We get: # bitmap_free_7 = PHI bitmap_free_1(4), bitmap_free_6(5); L0:; # bitmap_free_1 = PHI bitmap_free_7(3), bitmap_free_2(2); L4:; # VUSE bitmap_free_1; D.1534_4 = head_3-using_obstack; if (D.1534_4 != 0) goto L1; else goto L0; L1:; # bitmap_free_6 = V_MUST_DEF bitmap_free_1; bitmap_free = elt_5; goto bb 3 (L0); I cannot figure out why Daniel's recent patches did not fix this one. Probably the !POINTER_TYPE_P check
Re: [Bug tree-optimization/8361] [4.1/4.2 regression] C++ compile-time performance regression
Flags: -O3
GCC 4.0 (release branch today):
real 0m24.412s 0m25.000s 0m24.771s
user 0m23.921s 0m24.430s 0m24.210s
sys 0m0.368s 0m0.408s 0m0.420s
GCC 4.1 (release branch today):
real 0m33.260s 0m33.140s 0m33.188s
user 0m32.602s 0m32.522s 0m32.554s
sys 0m0.556s 0m0.544s 0m0.600s
GCC 4.2 (trunk today):
real 0m36.544s 0m36.614s 0m36.492s
user 0m35.950s 0m35.942s 0m35.994s
sys 0m0.544s 0m0.600s 0m0.464s
Significant compile time sinks in GCC 4.1 that don't appear in GCC 4.0:
tree PTA : 2.31 ( 7%) usr
tree SSA incremental : 2.14 ( 6%) usr
expand : 1.71 ( 5%) usr
So, could you do me a favor if you get a chance, and change the macro DONT_PROPAGATE_WITH_ANYTHING to 1 in tree-ssa-structalias.c, and see if it speeds it up at all?
Re: [Bug tree-optimization/24169] Address (full struct) escapes even though the called function does not cause it to escape
On Sun, 2006-01-01 at 00:41 +, pinskia at gcc dot gnu dot org wrote: --- Comment #1 from pinskia at gcc dot gnu dot org 2006-01-01 00:41 --- Just a clarification here, I just want the SFT for k.j to be considered call clobbered for this testcase. This is not anywhere near as easy as you think it is. In fact, we used to only call clobber k.j. Because our standards experts tell us that doing pointer arithmetic magic to get back to k.i is legal, we could only consider this function to clobber *just* k.j if the pointer doesn't escape from f, *and* f does not do any pointer arithmetic on its arguments. This is usually *not* the case, making this testcase more or less not interesting at all.
Re: [Bug rtl-optimization/24762] [killloop-branch] code motion of non-invariant expressions with hard registers.
On Wed, 2005-11-09 at 23:45 +, steven at gcc dot gnu dot org wrote: --- Comment #10 from steven at gcc dot gnu dot org 2005-11-09 23:45 --- Actually, flow.c does get it right. Okay, then df.c on dataflow branch should get it right too.
Re: [Bug tree-optimization/24694] New: Address taken and addressable variables and call clobber
On Sun, 2005-11-06 at 15:46 +, pinskia at gcc dot gnu dot org wrote: Take the following code: int f(int); int g(void) { int i; int *iptr = &i; int **ipp = &iptr; **ipp = 1; f(i); return **ipp; } -- Here we consider i being call clobber because we lose the fact that iptr is addressable but we don't look to see if its address escapes at all (which in this case it does not). No, we don't actually. In fact, that's not even close to what happens. iptr isn't renamed, and thus, we assume the address taking of i and storage into iptr is the same as a global store, because we know nothing about unrenamed variables.
Re: [Bug rtl-optimization/8361] [3.4/4.0/4.1 regression] C++ compile-time performance regression
On Thu, 2005-10-13 at 03:34 +, pinskia at gcc dot gnu dot org wrote: --- Comment #57 from pinskia at gcc dot gnu dot org 2005-10-13 03:34 --- A semi recent 4.1 (the 10th) gives: tree PTA : 1.60 ( 6%) usr 0.02 ( 1%) sys 1.73 ( 6%) wall 10338 kB ( 1%) ggc tree alias analysis : 1.32 ( 5%) usr 0.19 (10%) sys 1.48 ( 5%) wall 18910 kB ( 3%) ggc while 4.0 gave: tree PTA : 0.50 ( 2%) usr 0.00 ( 0%) sys 0.48 ( 2%) wall tree alias analysis : 0.73 ( 3%) usr 0.00 ( 0%) sys 0.76 ( 3%) wall So this is definitely a 4.1 regression. I'm pretty sure we run PTA more times in 4.1 than 4.0 Maybe i'm wrong. Can you oprofile this and give me some kind of hotspot to look into in PTA?
Re: [Bug libgcj/24170] [SECURITY] readdir_r considered harmful
On Sun, 2 Oct 2005, ben at decadentplace dot org dot uk wrote: --- Comment #1 from ben at decadentplace dot org dot uk 2005-10-02 23:16 --- Can someone please remove this from public view, as Mozilla does for security bugs on their Bugzilla? Unlike mozilla, we do not remove security bugs from public view. Nobody has ever set a policy for gcc that says we should (IE without taking a position on the merits of whether we should have such a policy, we don't).
Re: [Bug tree-optimization/24146] Optimizes away FPU control word store
On Fri, 2005-09-30 at 13:58 +, rearnsha at gcc dot gnu dot org wrote: --- Additional Comments From rearnsha at gcc dot gnu dot org 2005-09-30 13:58 --- (In reply to comment #1) volatile is needed here. No, the manual says: An @code{asm} instruction without any output operands will be treated identically to a volatile @code{asm} instruction. So this insn should be kept even though it isn't explicitly volatile. Then i guess we should teach the FE to just mark them volatile, so we don't have to worry about this in the middle end.
Re: [Bug tree-optimization/24146] [4.0 Regression] Optimizes away FPU control word store
On Fri, 2005-09-30 at 14:07 +, pinskia at gcc dot gnu dot org wrote: --- Additional Comments From pinskia at gcc dot gnu dot org 2005-09-30 14:07 --- I still say this is invalid. well, that just makes you wrong. the docs clearly say it's supposed to be treated as volatile.
Re: [Bug tree-optimization/24001] Simple redundancy not eliminated
On Thu, 2005-09-22 at 08:31 +, rguenth at gcc dot gnu dot org wrote: --- Additional Comments From rguenth at gcc dot gnu dot org 2005-09-22 08:31 --- load-pre should sink the load and fix the problem at the tree level. Uh, load PRE doesn't sink loads, it would lift it.
Re: [Bug middle-end/23672] Fold does not fold (a^b)^a to b
On Sat, 2005-09-17 at 02:12 +, pinskia at gcc dot gnu dot org wrote: --- Additional Comments From pinskia at gcc dot gnu dot org 2005-09-17 02:12 --- Confirmed. The new reassoc should take care of this
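The identity being asked for follows from xor being commutative and associative with a ^ a == 0, so (a ^ b) ^ a reassociates to (a ^ a) ^ b == b. A minimal C sketch of the two forms, with made-up function names and no claim about how fold or reassoc actually implement it:

```c
#include <assert.h>

/* The expression as written in the source. */
unsigned xor_unfolded(unsigned a, unsigned b)
{
    return (a ^ b) ^ a;
}

/* What it should simplify to: (a ^ a) ^ b == 0 ^ b == b. */
unsigned xor_folded(unsigned a, unsigned b)
{
    (void)a;   /* a cancels out entirely */
    return b;
}

/* Returns 1 if both forms agree for these inputs. */
int xor_forms_agree(unsigned a, unsigned b)
{
    return xor_unfolded(a, b) == xor_folded(a, b);
}
```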
Re: [Bug tree-optimization/23386] [4.1 Regression] bitmap.c is being miscompiled (VRP)
On Sun, 2005-08-14 at 17:32 +, pinskia at gcc dot gnu dot org wrote: --- Additional Comments From pinskia at gcc dot gnu dot org 2005-08-14 17:32 --- Here is something which is a little more reduced: int f[100]; int g[100]; unsigned char f1 (int a, int b) { unsigned ix; if (a == b) return 1; for (ix = 4; ix--;) if (f[ix] != g[ix]) return 0; return 1; } int main(void) { if (!f1 (1, 2)) __builtin_abort(); return 0; } The SSA version used in the pointer arithmetic doesn't wrap. The other SSA versions do. We can't afford to simply assume that everything wraps, or else we can't calculate the number of iterations on pretty much any loop.
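For anyone following along, the counter in the reduced testcase really does wrap: for (ix = 4; ix--;) tests ix and then decrements it, so the final test sees 0, exits, and leaves ix at UINT_MAX. A small C sketch with the loop body removed is below. The function names are invented for this note, and it only demonstrates the source-level behaviour, not what VRP or the niter analysis conclude about it.

```c
#include <assert.h>
#include <limits.h>

/* Runs the loop header from the testcase with an empty body and
   returns the counter's value after the loop has exited. */
unsigned final_counter_value(void)
{
    unsigned ix;
    for (ix = 4; ix--;)
        ;   /* the body would see ix = 3, 2, 1, 0 */
    return ix;   /* the last ix-- took 0 to UINT_MAX */
}

/* Counts how many times the body executes. */
unsigned iteration_count(void)
{
    unsigned ix, n = 0;
    for (ix = 4; ix--;)
        n++;
    return n;
}
```

Unsigned wraparound is well defined in C, so an analysis has to account for it here even though the values seen inside the body never wrap.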
Re: [Bug tree-optimization/23361] Can't eliminate empty loops with power of two step and variable bounds
On Fri, 2005-08-12 at 19:10 +, pinskia at gcc dot gnu dot org wrote: --- Additional Comments From pinskia at gcc dot gnu dot org 2005-08-12 19:10 --- Personally, i think -funsafe-loop-optimizations should be on by default in -O3, with a warning for when we rely on it. It's *incredibly* rare that a user actually intends for a loop counter to be able to overflow.
Re: [Bug libstdc++/23278] SJLJ-exceptions broken
On Tue, 9 Aug 2005, jacob dot navia at ants dot com wrote: --- Additional Comments From jacob dot navia at ants dot com 2005-08-09 19:57 --- If I can't mix SJLJ exceptions with DWARF2 exceptions how this is supposed to work? How is what supposed to work? I mean I have to rebuild all libraries including libc, libm, and whatever Yes. This can't be. It is. Besides, why this mixing should lead to the address of a function being stored in the high 32 bits of a 64 bit address? Possibly because it's attempting to read the wrong place as if it was an unwind table, and gets confused.
Re: [Bug java/1427] gcj should generate N_MAIN stab or DW_AT_entry_point dwarf2 debug info
On Tue, 2005-08-09 at 04:11 +, woodzltc at sources dot redhat dot com wrote: --- Additional Comments From woodzltc at sources dot redhat dot com 2005-08-09 04:11 --- OK. I had some time and would like to have a look into this, and I found something inconsistent. My founding is listed below, wishing that it can help clarify the situation a little: 1. Someone mentioned DW_AT_entry_point in above comments. It should be a typo IMHO. In DWARF standard, there is no such an attribute named DW_AT_entry_point, but there does exist a tag named DW_TAG_entry_point. 2. Seen from the DWARF standard, DW_TAG_entry_point doesn't live to act as what was supposed to do. Section-3.3 of DWARF-3 standard (Subroutine and Entry Point Entries) says: DWARF3 is not quite standardized yet. But it's weeks away. DW_TAG_entry_pointA Fortran alternate entry point Yes, well, i can bring it up if you want, but it seems the right way to describe your entry points. Although I am not very sure about what it means by alternate entry point. But I believe that it is not to represent the entry point in the final executable. This is wrong, at least for fortran. 3. I had a browsing over the DWARF standard, didn't found anything that is the same as N_MAIN in stabs. Maybe we can suggest DWARF to add such a tag? Any comments? Please add an issue on dwarf.freestandards.org and i'll take it from there. Regards - Wu Zhou
Re: [Bug c++/23278] New: SJLJ-exceptions broken
On Sun, 2005-08-07 at 19:50 +, jacob dot navia at ants dot com wrote: We have a program (c++) that needs c++ SJLJ exceptions. We have built all compilers from 3.3.1 to 3.3.6 and they all have the same bug: In the first throw that the program does, we get an exception in the runtime Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 1166014832 (LWP 24573)] parse_lsda_header (context=0x457f6978, p=0xd5a040 Address 0xd5a040 out of bounds, info=0x457f6900) at ../../../../gcc-3.3.6/libstdc++-v3/libsupc++/eh_personality.cc:62 62lpstart_encoding = *p++; You can't mix SJLJ exceptions and dwarf2 exceptions, which is what happened here, AFAICT
Re: [Bug c++/22602] New: I can't enter a bug here
On Fri, 2005-07-22 at 00:57 +, jacob dot navia at ants dot com wrote: Because there is a size limitation to 64K in this software. I prepared a single file with no includes that faithfully reproduced the bug: bug0.cpp: In member function 'double AtomicDouble::CompareExchange(double, double) volatile': bug0.cpp:4999: internal compiler error: in create_tmp_var, at gimplify.c:368 Please submit a full bug report, with preprocessed source if appropriate. See URL:http://gcc.gnu.org/bugs.html for instructions. This took me hours. THEN, I entered here the file. This software told me when I pressed the submit button that my stuff was bigger than 64K, then IT DISCARDED ALL MY INPUT. NICE. I have worked like 3 hours more but the file size went down from 350k to 162K only. It is becoming increasingly difficult to reduce the size. In this times, *ANY* include directive will produce file sizes of more than 64K. Why this stupid limitation? Uh, because we want you to *attach the file*, not *paste it into the comments*. Click create new attachment. Why the heck would we want to see 65k of text in the comments of a bug?
Re: [Bug tree-optimization/22376] PTA is slow on a silly unrealistic test case
On Thu, 2005-07-14 at 17:13 +, pinskia at gcc dot gnu dot org wrote: --- Additional Comments From pinskia at gcc dot gnu dot org 2005-07-14 17:13 --- Confirmed, patch here: http://gcc.gnu.org/ml/gcc-patches/2005-07/msg00918.html. I'm waiting for mainline to settle a bit before committing to make sure we don't cause more problems.
Re: Someone introduced a libiberty crashing bug in the past week
On Mon, 2005-06-20 at 16:05 +, Joseph S. Myers wrote: On Mon, 20 Jun 2005, Daniel Berlin wrote: The crash line is 3729 if (pedantic && !DECL_IN_SYSTEM_HEADER (fundecl)) Here, fundecl is null. Any problem with fundecl being null should also be reproducible with a call through a function pointer where fundecl would never have been set to non-null anyway. Restoring fundecl = function; in the if (TREE_CODE (function) == FUNCTION_DECL) part of build_function_call should fix the particular ICE, but the problem with function pointers should still get a PR filed. I'll do this.
Re: [Bug tree-optimization/21712] missed optimization due with const function and pulling out of loops
On Sun, 2005-05-22 at 19:36 +, rakdver at gcc dot gnu dot org wrote: --- Additional Comments From rakdver at gcc dot gnu dot org 2005-05-22 19:36 --- Because do_something does not have to return, therefore get_type2 does not necessarily have to be executed. In this case we cannot move the call to get_type2 from the loop (since do_something could for example initialize some table used internally by get_type2). This is wrong. do_something can't write. it's const.
Re: [Bug tree-optimization/21712] missed optimization due with const function and pulling out of loops
. Nevertheless, even if we are very strict with the definition, moving get_type2 out of the loop is not a good idea, since get_type2 might potentially be very expensive (and we have no way how to determine that this is not the case), thus we would lose in case get_type2 should be never executed. Don't we attempt to detect zero trip loops? (If not, we should :P)
Re: [Bug tree-optimization/21712] missed optimization due with const function and pulling out of loops
On Sun, 2005-05-22 at 21:13 +, rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: --- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-05-22 21:13 --- Subject: Re: missed optimization due with const function and pulling out of loops Nevertheless, even if we are very strict with the definition, moving get_type2 out of the loop is not a good idea, since get_type2 might potentially be very expensive (and we have no way how to determine that this is not the case), thus we would lose in case get_type2 should be never executed. Don't we attempt to detect zero trip loops? (If not, we should :P) I don't see how this is relevant to the PR. Uh, you claimed we won't move get_type2 out, even if it is const, because it might not normally execute. If we can't prove we don't execute the loop, you should move it out. Otherwise, your logic would hold for get_type1 just the same, which we *do* move out of the loop. IOW, there is no reason to move get_type1 out but not get_type2.
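To make the zero-trip argument concrete, here is a minimal hand-written sketch of the transformation under discussion. It is not GCC's implementation: get_type2 below is a hypothetical stand-in for the function in the PR, the other names are made up for illustration, and the __attribute__((const)) spelling assumes a GCC-compatible compiler. The hoisted version evaluates the call once, and only behind a guard showing the loop runs at least once, so a call that would never have executed is never introduced.

```c
#include <assert.h>

/* Hypothetical stand-in for get_type2 from the PR: the result depends
   only on the argument and no memory is read or written. */
__attribute__((const)) int get_type2(int x)
{
    return 2 * x + 1;
}

/* Original shape: the const call is re-evaluated on every iteration. */
int sum_in_loop(int n, int x)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += get_type2(x);
    return sum;
}

/* Hand-hoisted shape: the call runs once, and only when the guard
   proves the loop body executes at least once (not a zero-trip loop). */
int sum_hoisted(int n, int x)
{
    int sum = 0;
    if (n > 0) {
        int t = get_type2(x);
        for (int i = 0; i < n; i++)
            sum += t;
    }
    return sum;
}

/* Both shapes agree, including for zero-trip and negative counts. */
void check_hoisting_equivalence(void)
{
    for (int n = -1; n <= 4; n++)
        assert(sum_in_loop(n, 5) == sum_hoisted(n, 5));
}
```

Calling check_hoisting_equivalence() from a test driver exercises both shapes; the guard is what addresses the "might never be executed" objection without giving up the hoist.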
Re: [Bug tree-optimization/21712] missed optimization due with const function and pulling out of loops
On Sun, 2005-05-22 at 21:36 +, rakdver at gcc dot gnu dot org wrote: --- Additional Comments From rakdver at gcc dot gnu dot org 2005-05-22 21:36 --- Do you still believe we should move gettype2 out of the loop??? Okay, let's compromise. If i move cgraph do noreturn and infinite loop detection, so that we know everything we can about do_something and gettype2 that is possible, and we detect neither for do_something, are you still going to claim that we shouldn't move it out of the loop? ISTM that presuming a call in a loop is incredibly expensive seems wrong, when that call is const. Your case seems the very extreme corner case, not the common case. People mark const on simple calls (remember, const can't read from anything but readonly memory), not huge monster calls that do lots of stuff.
Re: [Bug tree-optimization/21712] missed optimization due with const function and pulling out of loops
On Sun, 2005-05-22 at 21:51 +, rakdver at atrey dot karlin dot mff dot cuni dot cz wrote: --- Additional Comments From rakdver at atrey dot karlin dot mff dot cuni dot cz 2005-05-22 21:50 --- Subject: Re: missed optimization due with const function and pulling out of loops const is different from pure, const cannot read from memory. this is something that has been discussed many times; some people like the definition with "behaves like if" (that enables you for example to cache or precompute the results of the function) more, and it is used in several existing programs. Anyway, the argument that the function may be costly is valid regardless of whether you want to strictly enforce the no memory access constraint, or whether you use the more useful definition. These people are strictly wrong, and will in fact get burned by the new pure/const detection (which is better about recursive calls). We shouldn't let people who have the wrong definition of const get in the way of optimization.
Re: [Bug tree-optimization/21712] missed optimization due with const function and pulling out of loops
on the other hand, we should not let the definition make the concept useless. Being able to make The definition actually matches what other compilers call isolated (no access to global variables) combined with the property called side-effect free (calling multiple times with same parameters is same as calling once). We could of course, split these concepts if we wanted to. int something(int i) { static int a[100]; if (a[i] == 0) a[i] = somewhat_slow_computation; return a[i]; } const is fairly useful. Anyway, moving possibly non-executed const function may cause also other problems. Consider int my_fancy_divide(int x, int z) attribute(const) { return x / z; } while (...) { if (z != 0) x = my_fancy_divide (x,z) } Uh, this may be const, but you can't move these out anyway because the value of the parameters has changed, so i'm not sure what you are going for. Thus you would also have to require the const function to be total. Making const still more and more useless. const has a very specific definition already. Moving get_type2 out of the loop is consistent with that definition.
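For readers following the caching example quoted above, here is a runnable sketch of it. The body of somewhat_slow_computation and the slow_calls counter are assumptions added purely for illustration; the original only names the function. The sketch shows why such a function "behaves as if" it were const from the caller's side, while still reading and writing static memory, which is exactly what the strict definition argued for in this reply rules out.

```c
#include <assert.h>

/* Illustration-only counter showing how often the slow path runs. */
int slow_calls = 0;

/* Hypothetical slow computation; it must never return 0, because 0
   marks an empty cache slot in something() below. */
int somewhat_slow_computation(int i)
{
    slow_calls++;
    return i * i + 1;
}

/* Memoizing wrapper in the shape quoted above. Callers always see the
   same result for the same argument, but the function reads and writes
   the static table a[], so it is not const under the strict reading. */
int something(int i)
{
    static int a[100];
    assert(i >= 0 && i < 100);
    if (a[i] == 0)
        a[i] = somewhat_slow_computation(i);
    return a[i];
}
```

Calling something(3) twice runs the slow path only once; the second call is served from the table.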
Re: [Bug tree-optimization/13761] [tree-ssa] component refs to the same struct should not alias
On Sat, 2005-04-23 at 16:52 +, steven at gcc dot gnu dot org wrote: --- Additional Comments From steven at gcc dot gnu dot org 2005-04-23 16:52 --- Will the second part of the struct alias merge fix Dann's original test case? (http://gcc.gnu.org/wiki/Structure Aliasing Part II) Yes, but not immediately. Structure aliasing part ii is really two parts. The first is a new alias analyzer to handle structure fields and allow interprocedural analysis. The second is improving our representation to handle base+offset dereferences. The second is a lot harder than one would think.
Re: [Bug middle-end/20674] unexpected result from floating compare
On Mon, 2005-03-28 at 23:05 +, piaget at us dot ibm dot com wrote: --- Additional Comments From piaget at us dot ibm dot com 2005-03-28 23:05 --- 323 compares 2 values across a function call ... something a programmer can reasonably consider. My problem occurs with 2 successive lines of code (admittedly with 2 compares per line). I don't have a problem that the value of the variable changes after precision truncation ... but it seems like a bug that the compiler uses a full precision value for the 1st test and a truncated value for the 2nd test (the 2nd test being the next line of C++ code). Except, the value could have been spilled and reloaded from registers between those two source lines, which on x86, is where the problem comes from. The problem is no different simply because the *source* lines happen to be right next to each other.
Re: [Bug rtl-optimization/20376] The missed-optimization of general induction variables in the new rtl-level loop optimizer cause performance degradation.
On Tue, 2005-03-08 at 03:18 +, pinskia at physics dot uc dot edu wrote: --- Additional Comments From pinskia at physics dot uc dot edu 2005-03-08 03:18 --- Subject: Re: The missed-optimization of general induction variables in the new rtl-level loop optimizer cause performance degradation. On Mar 7, 2005, at 10:16 PM, Diego Novillo wrote: pinskia at gcc dot gnu dot org wrote: Why isn't the tree level loop IV-OPTs doing this? Because variable i is static. I think you are commenting on the wrong bug. In swim, most of the loop bounds are accessed through the COMMON block, which is a structure.
Re: [Bug tree-optimization/20134] New: 176.gcc miscompare with -m64 after DOM change
On Tue, 2005-02-22 at 00:12 +, janis at gcc dot gnu dot org wrote: The SPEC CPU2000 test 176.gcc has been failing on powerpc64-*-linux-gnu with -m64 -O1 since this patch was added: 2004-10-23 Daniel Berlin [EMAIL PROTECTED] * tree-ssa-dom.c (record_equality): Use loop depth to determine which way to record the equality as well. (loop_depth_of_name): New function. This can't be the real cause of the problem, however, it must just be exposing the latent bug. It just changes the direction we record the equality, so that we will use one variable instead of another. The code still believes both variables to be equal. In other words, there is something in record_equality that isn't correct, or some pass later on is now doing something wrong as a result. Can you print out the values of x, y, and prev_x we are passing to record_const_or_copy_1 in record_equality before and after the patch, for that function?
Re: [Bug tree-optimization/14741] missing transformations lead to poorly optimized code
On Fri, 28 Jan 2005, jv244 at cam dot ac dot uk wrote: --- Additional Comments From jv244 at cam dot ac dot uk 2005-01-28 16:31 --- You could try gfortran -O3 -mtune=pentium4 -ffast-math -mfpmath=sse -ftree-loop-linear -ftree-vectorize yourcode.f90 and see if it helps. Unhappily, seems to make things slower: multgen/basic_mult gfortran -O3 -mtune=pentium4 -ffast-math -mfpmath=sse -ftree-loop-linear -ftree-vectorize mult.f90 mult.f90:0: warning: SSE instruction set disabled, using 387 arithmetics You'd need -msse2 or -msse (or is it -march=pentium4 that enables these?)
Re: [Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)
I believe seb/zdenek already submitted patches for speeding up scev quite recently, with the goal of alleviating this problem. I'm pretty sure they have not been applied yet.
Re: [Bug tree-optimization/18595] [4.0 Regression] IV-OPTS is O(N^3)
On Sun, 24 Jan 2005, rakdver at gcc dot gnu dot org wrote: --- Additional Comments From rakdver at gcc dot gnu dot org 2005-01-24 01:46 --- On a side note, PRE also seems to have problems with the testcase. With the patch mentioned above, the largest consumers of compile time are ivopts (45%) and pre (20%). Uh, there was a bug filed about this, and i fixed it, last i looked.
Re: [Bug inline-asm/11203] source doesn't compile with -O0 but they compile with -O3
The reason is dead simple: register allocation is NP-complete, so it is even *theoretically* not possible to write register allocators that always find a coloring. register allocation in general is NP-complete, yes, but it seems u forget that this is about finding the optimal solution while gcc fails finding any solution which in practice is a matter of assigning the registers beginning from the most constrained operands to the least, and copying a few things on the stack if gcc cant figure out howto access them, sure this method might fail in 0.001% of the practical cases and need a 2nd or 3rd pass where it tries different registers it might also happen that in some intentionally overconstrained cases it ends up searching the whole 5040 possible assignments of 7 registers onto 7 non memory operands but still it wont fail Just to also point out, it doesn't appear to be NP complete for register interference graphs, because they all seem to be 1-perfect. Various papers have observed this, and i've actually compiled all of gcc, libstdc++, etc, and every package ever on my computer, and not once has a single non-1-perfect interference graph occurred [my compiler would abort if it was true]. On 1-perfect graphs you can solve this problem in O(time it takes to determine the max clique), and there already exists a polynomial time algorithm for max-clique on perfect graphs. That means any register allocator will always fail on some very constrained asm input. now that statement is just false, not to mention irrelevant as none of these asm statements are unreasonably constrained You are correct, NP completeness does not imply impossibility. There are only a finite number of possibilities. And you cannot allow it to run indefinitely until a coloring is found, because then you've turned the graph coloring problem into the halting problem because you can't prove that a coloring exists and that the register allocator algorithm will terminate.
this is ridiculous, the number of possible colorings is finite, u can always try them all in finite time You are right, he is wrong.
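The finiteness argument can be sketched in a few lines. This is not how GCC's register allocator works; it is a hypothetical brute-force search, with all names (NREGS, NOPS, assign_registers, the bitmask encoding of constraints) invented for illustration. Each of 7 operands carries a mask of the registers it may use, and the search tries assignments of distinct registers, visiting at most the 7! = 5040 permutations mentioned above, so it always terminates with either an assignment or a definite "no".

```c
#include <assert.h>

#define NREGS 7   /* hypothetical number of allocatable registers */
#define NOPS  7   /* hypothetical number of non-memory asm operands */

/* Try every still-free register permitted for operand `op`, then
   recurse on the next operand. Bit r of allowed[op] is set when
   operand `op` may live in register r; `used` tracks taken registers. */
static int try_operand(int op, const unsigned allowed[NOPS],
                       unsigned used, int out[NOPS])
{
    if (op == NOPS)
        return 1;                       /* every operand has a register */
    for (int r = 0; r < NREGS; r++) {
        unsigned bit = 1u << r;
        if ((allowed[op] & bit) && !(used & bit)) {
            out[op] = r;
            if (try_operand(op + 1, allowed, used | bit, out))
                return 1;
        }
    }
    return 0;                           /* dead end: backtrack */
}

/* Returns 1 and fills out[] if some assignment of distinct registers
   satisfies all constraints, 0 if none exists. The search space is
   finite, so this cannot run forever. */
int assign_registers(const unsigned allowed[NOPS], int out[NOPS])
{
    assert(allowed && out);
    return try_operand(0, allowed, 0u, out);
}
```

An overconstrained input, such as two operands that both accept only register 0, makes the function return 0 after exhausting the candidates, which is the case where a compiler would have to report an error rather than loop.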
Re: [Bug debug/19367] [4.0 Regression] ICE: tree_check in lookup_local_die with local `using'
On Mon, 10 Jan 2005, pinskia at gcc dot gnu dot org wrote: --- Additional Comments From pinskia at gcc dot gnu dot org 2005-01-10 21:56 --- Confirmed, I think this is the boost ICE. This happens because the orig_decl that we are trying to use in emitting the using decl info appears to have been trashed or garbage collected before we emit it. I think i know why.
Re: [Bug debug/19267] New: [4.0 regression] execute/921215-1.c fails with -fpic at -O3 -g
On Wed, 5 Jan 2005, ghazi at gcc dot gnu dot org wrote: When running the testsuite with -fpic/-fPIC, I get an additional failure in the testsuite with mainline: FAIL: gcc.c-torture/execute/921215-1.c compilation, -O3 -g The regression appeared sometime in the last day or so between these postings: http://gcc.gnu.org/ml/gcc-testresults/2005-01/msg00135.html http://gcc.gnu.org/ml/gcc-testresults/2005-01/msg00179.html The compilation dies like this: 921215-1.c:22: internal compiler error: in gen_subprogram_die, at dwarf2out.c:11207 in the source we have: 11207 gcc_assert (errorcount); The problem is that errorcount is zero, so the gcc_assert() dies. I'm about to submit a patch that will fix this.
Fix longstanding bugzilla annoyance
Accept bug should now assign the bug to you, as one expects it to. Sorry it took so long for me to fix this, it kept falling off my todo list since it was really a minor annoyance :) --Dan
Re: GCC C bug: sizeof a union of structs returns zero value
On Thu, 16 Dec 2004, Hugh Daniel wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Note, I gave up on GNATS after repeatedly getting this error message no matter what I did to the text: You have not described how to repeat the bug You have not defined a category for the bug If there is a maintainer of the [EMAIL PROTECTED] bot I would be happy to help debug the problem with your script. If you can pass me the full raw email message you sent to the script (including headers, etc), i'm happy to try to debug it. Note that the [EMAIL PROTECTED] is (or should be) deprecated. The bug reporting instructions will point you to report bugs using our bugzilla system now. The gcc-gnats script is only really to handle the occasional gcc-gnats email that comes in. --Dan
Re: [Bug rtl-optimization/16613] [3.4 Regression] compile time regression, when adding cerr usage
On Fri, 10 Dec 2004, andre maute wrote: Once more i couldn't upload an attachment with the bugzilla upload form, so i send it here. You can email it to [EMAIL PROTECTED] with a subject of Bug 16613 (or whatever the bug number is), and it'll auto-add it to the bug for you.
Re: [Bug c++/18368] New: C++ error message regression
Yes, it happens at global scope too. struct foo {} void method () {} will give the same error On Sun, 8 Nov 2004, sabre at nondot dot org wrote: On this c++ code: struct C { struct foo { int A; } void method(); }; This probably also happens at global scope. -Chris