A question about macro replacement
Hi, all,

With the following code:

[CODE]
struct B
{
  int c;
  int d;
};

#define X(a, b, c) \
  do\
  {\
    if (a)\
      printf("%d, %d\n", b.c, c);\
    else\
      printf("%d\n", c);\
  }while(0);
[/CODE]

Why does

  int d = 24;
  X(1, b, d);

compile successfully, but

  X(1, b, 24);

does not? I cannot find any description of this behavior in the C standard.
Re: After GIMPLE...
On 2/6/07, Diego Novillo <[EMAIL PROTECTED]> wrote:
> Paulo J. Matos wrote on 02/06/07 14:19:
>
> > Why before pass_build_ssa? (version 4.1.1)
>
> It depends on the properties your pass requires. If you ask for
> PROP_cfg and PROP_gimple_any then you should schedule it after the
> CFG has been built, but if you need PROP_ssa, then you must be after
> pass_build_ssa which implies that your pass only gets enabled at -O1+.

Ok, thank you very much.

--
Paulo Jorge Matos - pocm at soton.ac.uk
http://www.personal.soton.ac.uk/pocm
PhD Student @ ECS
University of Southampton, UK
Re: A question about macro replacement
[EMAIL PROTECTED] wrote:
> With following code:
> [CODE]
> struct B
> {
>   int c;
>   int d;
> };
>
> #define X(a, b, c) \
>   do\
>   {\
>     if (a)\
>       printf("%d, %d\n", b.c, c);\
>     else\
>       printf("%d\n", c);\
>   }while(0);
> [/CODE]
> Why int d = 24; X(1, b, d); can be compiled successfully but
> X(1, b, 24); not. I cannot find any description about this behavior
> in C standard.

Well, with the X(1, b, 24) case, the b.c in the first printf line becomes b.24, which is obviously a syntax error.

This sort of thing would be a fair bit easier to track down if you quoted the error message rather than just saying that it cannot be compiled successfully.

- Brooks
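To make the substitution described in this reply concrete, here is a minimal sketch. The macro name X_FIXED, its parameter names, and the helpers demo_literal and check_fixed_macro are made up for illustration and do not come from the original post. It formats into a buffer instead of calling printf so that the result can be checked. The only point is that no macro parameter shares its name with a member of struct B, so the member access survives macro replacement.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct B { int c; int d; };

/* In the original X(a, b, c) the third parameter is named `c`, so the
   preprocessor replaces every `c` token in the body, including the
   member name in `b.c`.  X(1, b, d) therefore expands to `b.d` (a valid
   member), while X(1, b, 24) expands to `b.24` (a syntax error).

   Hypothetical fix: parameter names that match no member name.  The
   semicolon after while (0) is also dropped so that the macro behaves
   like a single statement.  */
#define X_FIXED(out, size, flag, obj, val)                      \
  do                                                            \
    {                                                           \
      if (flag)                                                 \
        snprintf ((out), (size), "%d, %d", (obj).c, (val));     \
      else                                                      \
        snprintf ((out), (size), "%d", (val));                  \
    }                                                           \
  while (0)

/* Illustrative helper: expands the macro with a literal third argument.  */
static const char *
demo_literal (char *out, size_t size)
{
  struct B b = { 7, 9 };
  X_FIXED (out, size, 1, b, 24);
  return out;
}

/* Illustrative self-check: a literal and a variable both work now.  */
static void
check_fixed_macro (void)
{
  char buf[32];
  int d = 24;
  struct B b = { 7, 9 };

  assert (strcmp (demo_literal (buf, sizeof buf), "7, 24") == 0);
  X_FIXED (buf, sizeof buf, 0, b, d);
  assert (strcmp (buf, "24") == 0);
}
```

Calling check_fixed_macro () from main exercises both expansions. Running the original source through the preprocessor alone (for example with gcc -E) is another way to see what a macro call really expands to.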
Re: False 'noreturn' function does return warnings
Jan Hubicka <[EMAIL PROTECTED]> writes:

[...]
>> static inline void __attribute__((noreturn)) BUG(void)
>> {
>>         __asm__ __volatile__("trap");
>>         __builtin_unreached();
>
> This is a bit difficult to do in general since it introduces a new kind of
> control flow construct. It would be better to express such functions
> explicitly to GCC.

How about

static inline void __attribute__((noreturn)) BUG(void)
{
        __asm__ __volatile__ __noreturn__("trap");
}

then ;)

-- Sergei.
which opt. flags go where? - references
Hello,

I'm planning to do some research on the optimization flags available for GCC (currently, using 4.1.1). More in particular, we want to see how we can come up with a set of combinations of flags which allow a tradeoff between compilation time, execution time and code size (as with -O1, -O2, -O3, -Os).

Of course, we don't want to do an exhaustive search of all possible combinations of flags, because that would be totally infeasible (using the 56 flags enabled in -O3 for gcc 4.1.1 yields ~72*10^15 (= 2^56-1) possible candidates).

It seems there has already been some work done on this subject, or at least that's what richi on #gcc (OFTC) told me. He wasn't able to refer me to work in that area though. I have found some references myself (partially listed below), but I'm hoping people more familiar with the GCC community can help expand this list.

[1] Almagor et al., Finding effective compilation sequences (LCTES'04)
[2] Cooper et al., Optimizing for Reduced Code Space using Genetic Algorithms (LCTES'99)
[3] Almagor et al., Compilation Order Matters: Exploring the Structure of the Space of Compilation Sequences Using Randomized Search Algorithms (Tech.Report)
[4] Acovea: Using Natural Selection to Investigate Software Complexities (http://www.coyotegulch.com/products/acovea/)

Some other questions:

* I'm planning to do this work on an x86 platform (i.e. Pentium4), but richi told me that's probably not a good idea, because of the low number of registers available on x86. Comments?

* Since we have done quite some analysis on the SPEC2k benchmarks, we'll also be using them for this work. Other suggestions are highly appreciated.

* Since there has been some previous work on this, I wonder why none of it has made it into GCC development. Were the methods proposed infeasible for some reason? What would be needed to make an approach to automatically find suitable flags for -Ox interesting enough to incorporate it into GCC? Any references to this previous work?

greetings,

Kenneth Hoste
Paris, ELIS, Ghent University (Belgium)

--
Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital (Aaron Levenstein)

Kenneth Hoste
ELIS - Ghent University
[EMAIL PROTECTED]
http://www.elis.ugent.be/~kehoste
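For readers who want to check the search-space figure quoted in this message, the arithmetic can be sketched as follows. The function names and the one-build-per-second rate are assumptions made purely for illustration; nothing here comes from GCC itself, and the year length is taken as 365 days.

```c
#include <assert.h>
#include <stdint.h>

/* Number of non-empty subsets of n independent on/off flags: 2^n - 1.
   Returns 0 when the result would not fit in 64 bits.  */
static uint64_t
nonempty_flag_subsets (unsigned n)
{
  if (n > 64)
    return 0;
  if (n == 64)
    return UINT64_MAX;
  return (UINT64_C (1) << n) - 1;
}

/* Rough time to try every candidate at a given (assumed) build rate,
   in 365-day years.  */
static double
years_to_enumerate (uint64_t candidates, double builds_per_second)
{
  return (double) candidates / builds_per_second / (365.0 * 24.0 * 3600.0);
}

/* Illustrative self-check of the figure from the message.  */
static void
check_search_space (void)
{
  uint64_t candidates = nonempty_flag_subsets (56);

  /* 2^56 - 1, i.e. roughly 72 * 10^15.  */
  assert (candidates == UINT64_C (72057594037927935));

  /* Even at one build per second this is on the order of 10^9 years.  */
  double years = years_to_enumerate (candidates, 1.0);
  assert (years > 2.2e9 && years < 2.3e9);
}
```

Under these assumptions an exhaustive search is clearly out of reach, which is consistent with the message's motivation for looking at smarter search strategies.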
Re: which opt. flags go where? - references
Kenneth Hoste wrote on 02/07/07 08:56:

> [1] Almagor et al., Finding effective compilation sequences (LCTES'04)
> [2] Cooper et al., Optimizing for Reduced Code Space using Genetic
>     Algorithms (LCTES'99)
> [3] Almagor et al., Compilation Order Matters: Exploring the Structure
>     of the Space of Compilation Sequences Using Randomized Search
>     Algorithms (Tech.Report)
> [4] Acovea: Using Natural Selection to Investigate Software Complexities
>     (http://www.coyotegulch.com/products/acovea/)

You should also contact Ben Elliston (CC'd) and Grigori Fursin (sorry, no email).

Ben worked on dynamic reordering of passes; his thesis will have more information about it.

Grigori is working on an API for iterative and adaptive optimization, implemented in GCC. He presented at the last HiPEAC 2007 GCC workshop. Their presentation should be available at http://www.hipeac.net/node/746

> Some other questions:
>
> * I'm planning to do this work on an x86 platform (i.e. Pentium4), but
>   richi told me that's probably not a good idea, because of the low
>   number of registers available on x86. Comments?

When deriving ideal flag combinations for -Ox, we will probably want common sets for the more popular architectures, so I would definitely include x86.

> * Since we have done quite some analysis on the SPEC2k benchmarks,
>   we'll also be using them for this work. Other suggestions are highly
>   appreciated.

We have a collection of tests from several user communities that we use as performance benchmarks (DLV, TRAMP3D, MICO). There should be links to the testers somewhere in http://gcc.gnu.org/

> * Since there has been some previous work on this, I wonder why none of
>   it has made it into GCC development. Were the methods proposed
>   unfeasible for some reason? What would be needed to make an approach
>   to automatically find suitable flags for -Ox interesting enough to
>   incorporate it into GCC? Any references to this previous work?

It's one of the things I would like to see implemented in GCC in the near future. I've been chatting with Ben and Grigori about their work and it would be a great idea if we could discuss this at the next GCC Summit. I'm hoping someone will propose a BoF about it.
Re: 27% regression of gcc 4.3 performance on cpu2k6/calculix
Hi!

I created a test to reproduce the issue with cpu2006/454.calculix; see the attachment.

File e_c3d.f contains a cut-down subroutine from calculix.
tr535.f is the main entry point of the test.
You can use go-script as a reference for how I got these results.
find_stall.pl is a script which finds the problematic instruction combinations.

The problem is that the new compiler generates a read instruction right after a write. See some dumps below.

This is the inner loop near line #42 generated by the rev. 119759 compiler:

.L13:
.LBB22:
        .loc 1 42 0
        movapd  %xmm2, %xmm0
        leaq    (%rdx,%rbx), %rax
        .loc 1 38 0
        addl    $1, %edi
        addq    $24, %rdx
        .loc 1 42 0
        mulsd   72(%rcx), %xmm0
        .loc 1 38 0
        addq    $72, %rcx
        cmpl    $4, %edi
        .loc 1 42 0
        mulsd   %xmm3, %xmm0
        mulsd   -8(%rax,%r9,8), %xmm0
        mulsd   %xmm4, %xmm0
        addsd   %xmm0, %xmm1
        .loc 1 38 0
        jne     .L13

This is for line 42 generated by the rev. 119760 compiler:

.L13:
.LBB23:
        .loc 1 42 0
        movsd   72(%rdx), %xmm0
        movq    80(%rsp), %rax
        addq    $72, %rdx
        mulsd   -8(%r9,%r15,8), %xmm0
        addq    %rdi, %rax
        addq    $24, %rdi
        .loc 1 38 0
        cmpq    $72, %rdi
        .loc 1 42 0
        mulsd   -8(%r11,%r14,8), %xmm0
        mulsd   -8(%rax,%r13,8), %xmm0
        movq    440(%rsp), %rax
        mulsd   (%rax), %xmm0
        addsd   (%rsi,%r10,8), %xmm0    <-|
        movsd   %xmm0, (%rsi,%r10,8)    <-+- problems
        .loc 1 38 0
        jne     .L13

My output is:

real    0m3.781s
user    0m3.776s
sys     0m0.004s

real    0m5.956s
user    0m5.948s
sys     0m0.004s
hey... we are going
hey... we are going
Line 31
addsd (%rsi,%r10,8), %xmm0
movsd %xmm0, (%rsi,%r10,8)
Line 42
addsd (%rsi,%r10,8), %xmm0
movsd %xmm0, (%rsi,%r10,8)

Feel free to ask if any problems with reproducing occur.

-Vladimir

--
 * From: Grigory Zagorodnev
 * To: gcc at gcc dot gnu dot org, dnovillo at redhat dot com
 * Cc: "H. J. Lu"
 * Date: Mon, 15 Jan 2007 17:59:31 +0300
 * Subject: 27% regression of gcc 4.3 performance on cpu2k6/calculix

Hi!

There is a huge regression of gcc 4.3 performance detected on the cpu2006/454.calculix benchmark at the -O2 optimization level on x86_64-redhat-linux. The regression is caused by the mem-ssa merge of 12/12/2006 (revision 119760).

http://gcc.gnu.org/viewcvs?view=rev&revision=119760

PS: I'm trying to get a small reproducer

- Grigory

test_calculix.tar.bz2
Description: BZip2 compressed data
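The instruction pair flagged in the second dump adds to a value loaded from memory and immediately stores the result back to the same address, once per iteration, whereas the first dump keeps the running sum in %xmm1. The C sketch below only illustrates that general difference. It is not the calculix source (which is Fortran), the function names are invented, and it is not claimed to reproduce the regression on any particular compiler revision.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulates directly into *acc.  Because acc may alias a or b, a
   compiler generally has to write *acc back on every iteration, which
   gives a load/add/store chain through memory.  */
static void
dot_accumulate_in_memory (double *acc, const double *a, const double *b,
                          size_t n)
{
  for (size_t i = 0; i < n; i++)
    *acc += a[i] * b[i];
}

/* Same computation, but the running sum lives in a local variable and
   is stored once after the loop, so it can stay in a register.  */
static void
dot_accumulate_in_register (double *acc, const double *a, const double *b,
                            size_t n)
{
  double sum = *acc;
  for (size_t i = 0; i < n; i++)
    sum += a[i] * b[i];
  *acc = sum;
}

/* Illustrative self-check: both variants compute the same value when
   acc does not alias the inputs.  */
static void
check_accumulators (void)
{
  const double a[3] = { 1.0, 2.0, 3.0 };
  const double b[3] = { 4.0, 5.0, 6.0 };
  double in_memory = 0.0;
  double in_register = 0.0;

  dot_accumulate_in_memory (&in_memory, a, b, 3);
  dot_accumulate_in_register (&in_register, a, b, 3);
  assert (in_memory == 32.0);
  assert (in_register == 32.0);
}
```

Compiling both functions with -O2 -S and comparing the loop bodies is one way to see whether a given compiler keeps the accumulator in a register.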
Re: which opt. flags go where? - references
Hi,

On 07 Feb 2007, at 15:22, Diego Novillo wrote:

> Kenneth Hoste wrote on 02/07/07 08:56:
>
> > [1] Almagor et al., Finding effective compilation sequences (LCTES'04)
> > [2] Cooper et al., Optimizing for Reduced Code Space using Genetic
> >     Algorithms (LCTES'99)
> > [3] Almagor et al., Compilation Order Matters: Exploring the Structure
> >     of the Space of Compilation Sequences Using Randomized Search
> >     Algorithms (Tech.Report)
> > [4] Acovea: Using Natural Selection to Investigate Software
> >     Complexities (http://www.coyotegulch.com/products/acovea/)
>
> You should also contact Ben Elliston (CC'd) and Grigori Fursin (sorry,
> no email).
>
> Ben worked on dynamic reordering of passes; his thesis will have more
> information about it.
>
> Grigori is working on an API for iterative and adaptive optimization,
> implemented in GCC. He presented at the last HiPEAC 2007 GCC workshop.
> Their presentation should be available at
> http://www.hipeac.net/node/746

I actually talked to Grigori about the -Ox flags, I was at the HiPEAC conference too ;-) I didn't include references to his work, because my aim wouldn't be at reordering of passes, but just selecting them. I understand that reordering is of great importance while optimizing, but I think this project is big enough as is.

> > Some other questions:
> >
> > * I'm planning to do this work on an x86 platform (i.e. Pentium4),
> >   but richi told me that's probably not a good idea, because of the
> >   low number of registers available on x86. Comments?
>
> When deriving ideal flag combinations for -Ox, we will probably want
> common sets for the more popular architectures, so I would definitely
> include x86.

OK. I think richi's comment on x86 was about the fact that evaluating the technique we are thinking about might produce results which are hard to 'port' to a different architecture. But then again, we won't be stating we have found _the_ best set of flags for a given goal... Thank you for your comment.

> > * Since we have done quite some analysis on the SPEC2k benchmarks,
> >   we'll also be using them for this work. Other suggestions are
> >   highly appreciated.
>
> We have a collection of tests from several user communities that we
> use as performance benchmarks (DLV, TRAMP3D, MICO). There should be
> links to the testers somewhere in http://gcc.gnu.org/

OK, sounds interesting, I'll look into it. In which way are these benchmarks used? Just to test the general performance of GCC? Have they been compared to, say, SPEC CPU, or other 'research/industrial' benchmark suites (such as MiBench, MediaBench, EEMBC, ...)?

> > * Since there has been some previous work on this, I wonder why none
> >   of it has made it into GCC development. Were the methods proposed
> >   unfeasible for some reason? What would be needed to make an
> >   approach to automatically find suitable flags for -Ox interesting
> >   enough to incorporate it into GCC? Any references to this previous
> >   work?
>
> It's one of the things I would like to see implemented in GCC in the
> near future. I've been chatting with Ben and Grigori about their work
> and it would be a great idea if we could discuss this at the next GCC
> Summit. I'm hoping someone will propose a BoF about it.

I'm hoping my ideas will lead to significant results, because I think this is an important issue.

greetings,

Kenneth

--
Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital (Aaron Levenstein)

Kenneth Hoste
ELIS - Ghent University
[EMAIL PROTECTED]
http://www.elis.ugent.be/~kehoste
Bug in value-prof.c:visit_hist
There appears to be a bug in value-prof.c:visit_hist rev 121554. This function always returns 0, which causes htab_traverse to exit early. This means that only the first histogram that appears in cfun->value_histograms->entries is ever checked, so verify_histograms will only indicate an error if the first histogram is unreachable.

The attached patch changes the return value to ensure that all histograms are checked. This patch bootstraps and passes make check on x86_64.

Robert Kidd
[EMAIL PROTECTED]

Index: gcc/value-prof.c
===
--- gcc/value-prof.c	(revision 121671)
+++ gcc/value-prof.c	(working copy)
@@ -353,8 +353,9 @@ visit_hist (void **slot, void *data)
       dump_histogram_value (stderr, hist);
       debug_generic_stmt (hist->hvalue.stmt);
       error_found = true;
+      return 0;
     }
-  return 0;
+  return 1;
 }
 
 /* Verify sanity of the histograms.  */
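The early exit described in this report follows from the callback contract: the traversal stops as soon as the callback returns zero. The sketch below mimics that contract with a plain array. traverse_slots and the visit_* callbacks are hypothetical stand-ins written for this illustration; they are not the libiberty hash table code or the GCC sources.

```c
#include <assert.h>
#include <stddef.h>

/* Callback type with the same shape as visit_hist in the patch.  */
typedef int (*visit_fn) (void **slot, void *data);

/* Hypothetical stand-in for a traversal with the same contract:
   call the callback on each slot and stop when it returns 0.  */
static void
traverse_slots (void **slots, size_t n, visit_fn callback, void *data)
{
  for (size_t i = 0; i < n; i++)
    if (!callback (&slots[i], data))
      break;
}

/* Counts the visit, then asks the traversal to stop (the old behavior).  */
static int
visit_always_stop (void **slot, void *data)
{
  (void) slot;
  ++*(size_t *) data;
  return 0;
}

/* Counts the visit, then asks the traversal to continue.  */
static int
visit_always_continue (void **slot, void *data)
{
  (void) slot;
  ++*(size_t *) data;
  return 1;
}

/* Returns how many of three slots the given callback gets to see.  */
static size_t
count_visited (visit_fn callback)
{
  int a = 0, b = 0, c = 0;
  void *slots[3] = { &a, &b, &c };
  size_t visited = 0;

  traverse_slots (slots, 3, callback, &visited);
  return visited;
}

/* Illustrative self-check of the two behaviors.  */
static void
check_traversal (void)
{
  assert (count_visited (visit_always_stop) == 1);
  assert (count_visited (visit_always_continue) == 3);
}
```

With a callback that always returns 0, only the first slot is ever inspected, which matches the symptom described for verify_histograms.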
Re: Bug in value-prof.c:visit_hist
> This patch bootstraps and passes make check on x86_64. Please do not cross-post. Patches should go to gcc-patches@ only. -- Eric Botcazou
Regarding tree traversal
I am new to this list, so please excuse any obvious mistakes.

I am trying to check, within the compiler, whether two types are equal or one is derived from the other. One of the types is a struct that is defined in the std namespace. How do I search for a "node" that is a TYPE_DECL of the structure that I want? I would like to search for the TYPE_DECL of "struct foo" in the tree std_node.

Regards,
Anju

--
This too shall pass
Re: ICE in gcc/libgcc2.c:566 (gcc trunk)
Hi Ian,
sorry to bother again. I reduced the code (attached) that segfaults here on Core 2 Duo [1]. If I add -fno-split-wide-types the code does not segfault. That flag comes from your patchset [2].

execute:
# ./cc1 -quiet -m64 -O1 test.c -o test.o

Any ideas?

Regards,
Hanno

[1] http://gcc.gnu.org/ml/gcc/2007-02/msg00095.html
[2] http://gcc.gnu.org/ml/gcc-patches/2007-02/msg2.html

typedef int TItype __attribute__ ((mode (TI)));
typedef int DItype __attribute__ ((mode (DI)));
typedef unsigned int UDItype __attribute__ ((mode (DI)));

struct DWstruct {DItype low, high;};

typedef union
{
  struct DWstruct s;
  TItype ll;
} DWunion;

TItype
__multi3 (TItype u, TItype v)
{
  const DWunion uu = {.ll = u};
  const DWunion vv = {.ll = v};
  DWunion w = { .ll = ({
    DWunion __w;
    do {
      UDItype __x0, __x1, __x2, __x3;
      UDItype __ul, __vl, __uh, __vh;
      __ul = ((UDItype) (uu.s.low) & (((UDItype) 1 << ((8 * 8) / 2)) - 1));
      __uh = ((UDItype) (uu.s.low) >> ((8 * 8) / 2));
      __vl = ((UDItype) (vv.s.low) & (((UDItype) 1 << ((8 * 8) / 2)) - 1));
      __vh = ((UDItype) (vv.s.low) >> ((8 * 8) / 2));
      __x0 = (UDItype) __ul * __vl;
      __x1 = (UDItype) __ul * __vh;
      __x2 = (UDItype) __uh * __vl;
      __x3 = (UDItype) __uh * __vh;
      __x1 += ((UDItype) (__x0) >> ((8 * 8) / 2));
      __x1 += __x2;
      if (__x1 < __x2)
        __x3 += ((UDItype) 1 << ((8 * 8) / 2));
      (__w.s.high) = __x3 + ((UDItype) (__x1) >> ((8 * 8) / 2));
      (__w.s.low) = ((UDItype) (__x1) & (((UDItype) 1 << ((8 * 8) / 2)) - 1)) * ((UDItype) 1 << ((8 * 8) / 2)) + ((UDItype) (__x0) & (((UDItype) 1 << ((8 * 8) / 2)) - 1));
    } while (0);
    __w.ll;
  } )};
  w.s.high += ((UDItype) uu.s.low * (UDItype) vv.s.high + (UDItype) uu.s.high * (UDItype) vv.s.low);
  return w.ll;
}
Fw: Scheduling an early complete loop unrolling pass?
...

>Ah, right... I wonder if we can keep the loop structure in place, even
>after completely unrolling the loop - I mean the 'struct loop' in
>'current_loops' (not the actual CFG), so that the "SLP in loops" would have
>a chance to at least consider vectorizing this "loop".

Having a "loop" structure for a piece of CFG that is not a loop was used in some other compiler we worked with - the notion of 'region' was such that it corresponded to loops, and in addition the entire function belonged to a "universal" region (Peter - please correct if I'm wrong). But I think you were looking for some marking of a basic block saying "this used to be a loop but got completely unrolled". I wonder how much such a "dummy" loop structure can really help the vectorizer, except for (convenience of) keeping intact the driver that traverses all such structures or the hanging of additional data off of them.

Ayal.
Re: ICE in gcc/libgcc2.c:566 (gcc trunk)
Hanno Meyer-Thurow <[EMAIL PROTECTED]> writes:

> Hi Ian,
> sorry to bother again. I reduced the code (attached) that segfaults here
> on Core 2 Duo [1]. If I add -fno-split-wide-types the code does not segfault.
> That flag comes from your patchset [2].
>
> execute:
> # ./cc1 -quiet -m64 -O1 test.c -o test.o
>
> Any ideas?

The test case works for me. Note that I've committed several cleanup patches for lower-subreg.c over the last several days. In particular, do you have this change in your sources?

2007-02-01  Ian Lance Taylor  <[EMAIL PROTECTED]>

	* lower-subreg.c (resolve_clobber): Handle a subreg of a concatn.

?

Let me know if you still see the problem with up to date sources.

Ian
Re: ICE in gcc/libgcc2.c:566 (gcc trunk)
On 07 Feb 2007 13:46:43 -0800
Ian Lance Taylor <[EMAIL PROTECTED]> wrote:

> * lower-subreg.c (resolve_clobber): Handle a subreg of a concatn.

Yes, that is there. I have revision 121690.

Hanno
gcc-4.2-20070207 is now available
Snapshot gcc-4.2-20070207 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.2-20070207/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.2 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_2-branch revision 121698

You'll find:

gcc-4.2-20070207.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.2-20070207.tar.bz2       C front end and core compiler
gcc-ada-4.2-20070207.tar.bz2        Ada front end and runtime
gcc-fortran-4.2-20070207.tar.bz2    Fortran front end and runtime
gcc-g++-4.2-20070207.tar.bz2        C++ front end and runtime
gcc-java-4.2-20070207.tar.bz2       Java front end and runtime
gcc-objc-4.2-20070207.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.2-20070207.tar.bz2  The GCC testsuite

Diffs from 4.2-20070131 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.2
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Re: Regarding tree traversal
On Feb 7, 2007, at 1:05 PM, Prabhanjan Kambadur wrote:

> I am trying to check if two types are equal

equal, what's that? :-) (That's a joke for the rest of the folks here. See the CANONICAL types work that Doug did recently for some of the more recent email threads.)

> One of the types is a struct that is defined under the std namespace.
> How do I search for a "node" that is a TYPE_DECL of the structure
> that I want? I would like to search for the TYPE_DECL of "struct foo"
> in the tree std_node.

This process is called lookup. Glance around at routines like lookup_qualified_name. Don't be afraid to fire up gdb under emacs, set a breakpoint in the parser for the construct you're interested in, run the sample testcase through it and watch what the compiler does. It'd take you right there.
Re: Regarding tree traversal
On Feb 7, 2007, at 1:05 PM, Prabhanjan Kambadur wrote:

> I am new to this list, so please excuse any obvious mistakes. I am
> trying to check if two types are equal or one is derived from the
> other within the compiler. One of the types is a struct that is
> defined under the std namespace. How do I search for a "node" that is
> a TYPE_DECL of the structure that I want? I would like to search for
> the TYPE_DECL of "struct foo" in the tree std_node.

Just to be clear, given "std" and given "foo" you want to find std::foo, right? That's the question I previously answered.

Or, is your question: given the shape expressed by tree bar (a TYPE_DECL), find a "foo" with the same shape? The second question is answered by looping over all the members of std, and checking each one for the right shape.
Re: Regarding tree traversal
Yup, what you answered is indeed what I want.

Thanks,
Anju
Re: ICE in gcc/libgcc2.c:566 (gcc trunk)
Hanno Meyer-Thurow <[EMAIL PROTECTED]> writes:

> sorry to bother again. I reduced the code (attached) that segfaults here
> on Core 2 Duo [1]. If I add -fno-split-wide-types the code does not segfault.
> That flag comes from your patchset [2].
>
> execute:
> # ./cc1 -quiet -m64 -O1 test.c -o test.o
>
> Any ideas?

I don't know what is causing this. I just checked again, and it does not happen for me.

Looking at your backtrace from
http://gcc.gnu.org/ml/gcc/2007-02/msg00095.html
count_pseudo is being called with register 71. Register 71 no longer exists; it was split. That is why you are getting the SIGSEGV. But when I run my copy of the compiler, count_pseudo is never called with register 71.

count_pseudo is being called from this code in order_regs_for_reload:

      EXECUTE_IF_SET_IN_REG_SET
	(&chain->live_throughout, FIRST_PSEUDO_REGISTER, i, rsi)
	{
	  count_pseudo (i);
	}

Since register 71 no longer exists, it should not be in chain->live_throughout. So why is it set?

I'm not sure what else to say, since I can't recreate the problem myself. Can anybody else out there recreate this on their x86_64 system?

Ian
"error: unable to generate reloads for...", any hints?
Hi,

I am working on gcc 4.1.1 and the Itanium architecture. I want to modify the machine description in ia64.md to add some checks before each ld instruction. The following is the original define_insn:

(define_insn "*movqi_internal"
  [(set (match_operand:QI 0 "destination_operand" "=r,r,r, m, r,*f,*f")
        (match_operand:QI 1 "move_operand"        "rO,J,m,rO,*f,rO,*f"))]
  "ia64_move_ok (operands[0], operands[1])"
  "@
   mov %0 = %r1
   addl %0 = %1, r0
   ld1%O1 %0 = %1%P1
   st1%Q0 %0 = %r1%P0
   getf.sig %0 = %1
   setf.sig %0 = %r1
   mov %0 = %1"
  [(set_attr "itanium_class" "ialu,ialu,ld,st,frfr,tofr,fmisc")])

I observed that there is a ld instruction in the 3rd alternative, so I added a new define_insn before it in the hope that it would be matched first.

(define_insn "*ld_movqi_internal"
  [(set (match_operand:QI 0 "destination_operand" "=r")
        (match_operand:QI 1 "move_operand" "m"))]
  "ia64_move_ok (operands[0], operands[1]) && flag_check_ld"
{
  printf("define_insn ld_movqi_internal\n");
  return "ld1%O1 %0 = %1%P1";
}
  [(set_attr "itanium_class" "ld")])

I kept everything the same as the 3rd alternative in the original define_insn, except that I use a C statement to return the desired output template.

However, when I use the newly built gcc to compile the following program, it crashes.

#include <stdio.h>

char characters[8192]={'a',};

int main()
{
  char c = characters[0];
  printf("Hello World! c:%c\n", c);
}

The error reported is:

hi.c:9: error: unable to generate reloads for:
(insn 10 9 12 1 (set (mem/c/i:QI (reg/f:DI 111 loc79) [0 c+0 S1 A128])
        (reg:QI 14 r14 [orig:342 characters ] [342])) 3 {*gift_movqi_internal_ld} (nil)
    (expr_list:REG_DEAD (reg:QI 14 r14 [orig:342 characters ] [342])
        (nil)))
hi.c:9: internal compiler error: in find_reloads, at reload.c:3738

On IA64, the first pseudo register number is 334, thus register 111 and register 14 are both hardware registers.

I looked at find_reloads in reload.c and found the following code fragment and comment:

  /* The operands don't meet the constraints.
     goal_alternative describes the alternative
     that we could reach by reloading the fewest operands.
     Reload so as to fit it.  */

  if (best == MAX_RECOG_OPERANDS * 2 + 600)
    {
      /* No alternative works with reloads??  */
      if (insn_code_number >= 0)
	fatal_insn ("unable to generate reloads for:", insn);
      ...

So, what is going on here? In particular, what is find_reloads trying to accomplish, and why does it go wrong here?

I would appreciate any help on this question, thx!

Best Regards
--andy.wu