improve -fverbose-asm option
Hello,

I'd like to get more helpful information from the final .S file, such as basic block info, so that I can draw a cfg graph through a script. Perhaps the -fverbose-asm option is the right way to open this functionality. Here's a simple patch based on the current trunk svn.

Index: gcc/final.c
===================================================================
--- gcc/final.c (revision 144878)
+++ gcc/final.c (working copy)
@@ -1830,10 +1830,38 @@ final_scan_insn (rtx insn, FILE *file, i
       targetm.asm_out.unwind_emit (asm_out_file, insn);
 #endif
 
-      if (flag_debug_asm)
+      if (flag_debug_asm && !flag_verbose_asm)
         fprintf (asm_out_file, "\t%s basic block %d\n",
                  ASM_COMMENT_START, NOTE_BASIC_BLOCK (insn)->index);
 
+      /* Print basic block info. */
+      if (flag_verbose_asm)
+        {
+          fprintf (asm_out_file, "\t%s BLOCK %d",
+                   ASM_COMMENT_START, NOTE_BASIC_BLOCK (insn)->index);
+          if (NOTE_BASIC_BLOCK (insn)->frequency)
+            fprintf (asm_out_file, "freq: %d",
+                     NOTE_BASIC_BLOCK (insn)->frequency);
+          if (NOTE_BASIC_BLOCK (insn)->count)
+            fprintf (asm_out_file, "count: %d",
+                     NOTE_BASIC_BLOCK (insn)->count);
+          fprintf (asm_out_file, "\n");
+
+          fprintf (asm_out_file, "\t%s PRED:", ASM_COMMENT_START);
+          FOR_EACH_EDGE (e, ei, NOTE_BASIC_BLOCK (insn)->preds)
+            {
+              dump_edge_info (asm_out_file, e, 0);
+            }
+          fprintf (asm_out_file, "\n");
+
+          fprintf (asm_out_file, "\t%s SUCC:", ASM_COMMENT_START);
+          FOR_EACH_EDGE (e, ei, NOTE_BASIC_BLOCK (insn)->succs)
+            {
+              dump_edge_info (asm_out_file, e, 1);
+            }
+          fprintf (asm_out_file, "\n");
+        }
+
       if ((*seen & (SEEN_EMITTED | SEEN_BB)) == SEEN_BB)
         {
           *seen |= SEEN_EMITTED;

Also, I think it will be better to generate one label for each basic block, and the local label should have the function name as the suffix. Because some profile tools, such as oprofile, will output samples based on the labels. So this will help us to analyze the samples for each basic block. But current generated code will have many local labels with the same name. Perhaps it's again the -fverbose-asm to enable this functionality. But where should I go if I wanna implement this functionality?

Cheers,
Eric Fisher

Mar 16, 2009
Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax
On 3/14/09, Paolo Bonzini bonz...@gnu.org wrote:
Hans-Peter Nilsson wrote:
The answer to the question is no, but I'd guess the more useful answer is yes, for different definitions of truncate.

Ok, after my patches you will be able to teach GCC about this definition of truncate.

I expect it's a bit too extreme an example, but I've just found (to my horror) that the MaverickCrunch FPU truncates all its shift counts to 6-bit signed (-32 (right) to +31 (left)), including on 64-bit integers, which is not very helpful to compile for.

...unless it happens to come easy to handle "shift count is truncated to less than size of word" in your new framework

M
Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax
Martin Guy wrote:
On 3/14/09, Paolo Bonzini bonz...@gnu.org wrote:
Hans-Peter Nilsson wrote:
The answer to the question is no, but I'd guess the more useful answer is yes, for different definitions of truncate.

Ok, after my patches you will be able to teach GCC about this definition of truncate.

I expect it's a bit too extreme an example, but I've just found (to my horror) that the MaverickCrunch FPU truncates all its shift counts to 6-bit signed (-32 (right) to +31 (left)), including on 64-bit integers, which is not very helpful to compile for.

...unless it happens to come easy to handle "shift count is truncated to less than size of word" in your new framework

Uhm, well, no. :-) This could already be handled by faking a 63 bit truncation and using a splitter to expand those into something like this (I only know integer ARM assembly, so I'm making this up):

    AND   R1, R0, #31
    MOV   R2, R2, SHIFT R1
    ANDS  R1, R0, #32
    MOVNE R2, R2, SHIFT #31
    MOVNE R2, R2, SHIFT #1

or

    ANDS  R1, R0, #32
    MOVNE R2, R2, SHIFT #-32
    SUB   R1, R1, R0        ; R1 = (x >= 32 ? 32 - x : -x)
    MOV   R2, R2, SHIFT R1

(which requires a scratch register, so it cannot be done postreload... this might be a problem)

But my new stuff won't change anything.

Paolo
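
To make the splitting idea in the message above concrete, here is a minimal C sketch under stated assumptions. It models a shifter that only looks at the low 6 bits of the count and treats them as signed (-32 to +31, negative meaning right), and then builds a full 0..63 left shift out of it in the same way as the first instruction sequence. The names hw_shift and shift_left_full are hypothetical, and the right shift is assumed to be logical; none of this is taken from the MaverickCrunch documentation.

```c
#include <stdint.h>

/* Hypothetical model of the hardware shifter: only the low 6 bits of the
   count are used, and they are read as a signed value in -32..+31.
   Positive counts shift left, negative counts shift right (assumed
   logical here). */
static uint64_t hw_shift(uint64_t value, int count)
{
    int c = count & 63;      /* keep the low 6 bits */
    if (c & 32)
        c -= 64;             /* sign-extend bit 5 */
    if (c >= 0)
        return value << c;   /* c is 0..31 */
    return value >> -c;      /* -c is 1..32 */
}

/* Left shift by any count in 0..63 using only counts the model accepts:
   first the low five bits, then 31 + 1 when bit 5 of the count is set. */
static uint64_t shift_left_full(uint64_t value, unsigned count)
{
    value = hw_shift(value, (int)(count & 31));
    if (count & 32) {
        value = hw_shift(value, 31);
        value = hw_shift(value, 1);
    }
    return value;
}
```

The conditional pair of shifts corresponds to the two MOVNE instructions; a single shift by 32 is not available because +32 does not fit in the signed 6-bit range.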
Re: Preprocessor for assembler macros?
Philipp Marek philipp at marek.priv.at writes:
gcc -S tmp.S for some reason prints to stdout, so gcc -S tmp.S > tmp.s is what you need
Thank you very much, I'll take a look.

I tried very hard to achieve that; and one time it seemed to work, but I cannot make it work again. As an example I'm trying to expand the macros in the linux kernel source file arch/x86/kernel/entry_64.S

I tried to call gcc -S, to put the various -I.. paths as needed, and I even renamed my as to as.bin and tried to get the assembler source directly (by using gcc -S $COLLECT_GCC_OPTIONS sourcefile) ... I cannot make it work again ...

Do you have some other hint for me? Thank you very much.

Regards,
Phil
Re: -mfpmath=sse,387 is experimental ?
Hi,

Timothy Madden terminato...@gmail.com wrote in message news:5078d8af0903120218i23b69a4bma28ad9b3f1bd4...@mail.gmail.com...
On Thu, Mar 12, 2009 at 1:15 AM, Jan Hubicka hubi...@ucw.cz wrote:
Timothy Madden wrote:
Hello. Is -mfpmath=both for i386 and x86-64 still experimental in gcc 4.3, as in the online manual page ? [...]

The fundamental problem here is that the backend lies to the compiler about the fact that an FP operation can not take one operand from SSE and the other from X87. This is something I want to look into once I have more time. With the new RA, perhaps we can drop all these fake constraints.

That would be great ! I am sure having twice the number of registers (sse+387) would make a big difference. Even if the SSE and FPU instruction sets can not mix operands, using both at the same time (each with its registers) will be an improvement.

Until then I would have a question: if I compile with -msse, then would using -mfpmath=387 help floating-point operations not steal SSE registers that are already used by CPU operations ? And would using -mfpmath=sse make FPU and CPU share the SSE registers and compete on them ?

How would I know if my AMD Sempron 2200+ has separate execution units for SSE and FPU instructions, with independent registers ?

Most CPUs use the same FP unit for both x87 and SIMD operations, so it wouldn't give you double the performance. The only exception I know of is K6-2/3, whose x87 and 3DNow! units are separate.

--
Zuxy
Re: GCC 4.4.0 Status Report (2009-03-13)
NightStrike wrote:
On Fri, Mar 13, 2009 at 1:58 PM, Joseph S. Myers jos...@codesourcery.com wrote:
Given the SC request we need to stay in Stage 4 rather than trying to work around it.

What if GCC went back to stage 3 until the issue is resolved, thus opening the door for a number of stage3-type patches that don't affect 1) licensing and 2) plugin frameworks, but are merely bug fixes which would have long been shaken out by now.

No, not at all. The only benefit we're having from this is that GCC 4.4 should be quite stable already in GCC 4.4.0, let's not destroy this one too.

Paolo
Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax
On 3/16/09, Paolo Bonzini bonz...@gnu.org wrote:

    AND   R1, R0, #31
    MOV   R2, R2, SHIFT R1
    ANDS  R1, R0, #32
    MOVNE R2, R2, SHIFT #31
    MOVNE R2, R2, SHIFT #1

or

    ANDS  R1, R0, #32
    MOVNE R2, R2, SHIFT #-32
    SUB   R1, R1, R0        ; R1 = (x >= 32 ? 32 - x : -x)
    MOV   R2, R2, SHIFT R1

Thanks for the tips. Yes, I was contemplating cooking up something like that, hobbled by the fact that if you use maverick instructions conditionally you either have to put seven nops either side of them or risk death by astonishment.

M
Re: -mfpmath=sse,387 is experimental ?
Zuxy Meng wrote:
Hi,
Timothy Madden terminato...@gmail.com wrote in message
I am sure having twice the number of registers (sse+387) would make a big difference.

You're not counting the rename registers, you're talking about 32-bit mode only, and you're discounting the different mode of accessing the registers.

How would I know if my AMD Sempron 2200+ has separate execution units for SSE and FPU instructions, with independent registers ?

Most CPUs use the same FP unit for both x87 and SIMD operations, so it wouldn't give you double the performance. The only exception I know of is K6-2/3, whose x87 and 3DNow! units are separate.

-march=pentium-m observed the preference of those CPUs for mixing the types of code. This was due more to the limited issue rate for SSE instructions than to the expanded number of registers in use. You are welcome to test it on your CPU; however, AMD CPUs were designed to perform well with SSE alone, particularly in 64-bit mode.
RE: ARM compiler rewriting code to be longer and slower
[Resent because of account funnies. Apologies to those who get this twice]

Hi,

This problem is reported every once in a while, all targets with small load-immediate instructions suffer from this, especially since GCC 4.0 (i.e. since tree-ssa). But it seems there is just not enough interest in having it fixed somehow, or someone would have taken care of it by now. I've summed up before how the problem _could_ be fixed, but I can't find where. So here we go again.

This could be solved in CSE by extending the notion of related expressions to constants that can be generated from other constants by a shift. Alternatively, you could create a simple, separate pass that applies CSE's related expressions thing in dominator tree walk.

See http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00158.html for handling something similar when related expressions differ by a small additive constant. I am planning to finish this and submit it for 4.5.

Wouldn't doing this in CSE only solve the problem within an extended basic block and not necessarily across the program ? Surely you'd want to do it globally or am I missing something very basic here ?

Ramana
Does gcc provide any function to build def-use chain in RTL form
hi,

Now I'm trying to construct the def-use chain after PASS_LEAF_REGS, because the SSA form structure has been destroyed during the former passes. I have found that gcc provides a way to build the def-use chain in PASS_REGRENAME, but it only contains the defs and uses all in one basic block. So if I want to get the global def-use data of the whole function, do I need to construct it myself ? Does gcc provide any function to build the def-use chain in RTL form?

thank you
Re: Does gcc provide any function to build def-use chain in RTL form
villa gogh wrote:
hi, Now I'm trying to construct the def-use chain after PASS_LEAF_REGS, because the SSA form structure has been destroyed during the former passes. I have found that gcc provides a way to build the def-use chain in PASS_REGRENAME, but it only contains the defs and uses all in one basic block.

No, don't look at those. Instead look at fwprop.c, which uses use-def chains -- DU chains are the same, but they are computed with

    df_chain_add_problem (DF_DU_CHAIN);

instead of

    df_chain_add_problem (DF_UD_CHAIN);

before df_analyze. fwprop accesses use-def chains by using DF_REF_CHAIN (use); def-use chains are the same but the DF_REF_CHAIN macro is used with a def argument instead.

Paolo
Re: GCC 4.4.0 Status Report (2009-03-13)
What about allowing for more backports from the graphite branch if this drags out for an extended period of time? In particular, I am thinking of those changes in graphite branch that might reduce those cases where -fgraphite-identity degrades the performance of the resulting code.

Jack

On Mon, Mar 16, 2009 at 11:10:07AM +0100, Paolo Bonzini wrote:
NightStrike wrote:
On Fri, Mar 13, 2009 at 1:58 PM, Joseph S. Myers jos...@codesourcery.com wrote:
Given the SC request we need to stay in Stage 4 rather than trying to work around it.

What if GCC went back to stage 3 until the issue is resolved, thus opening the door for a number of stage3-type patches that don't affect 1) licensing and 2) plugin frameworks, but are merely bug fixes which would have long been shaken out by now.

No, not at all. The only benefit we're having from this is that GCC 4.4 should be quite stable already in GCC 4.4.0, let's not destroy this one too.

Paolo
Re: sign/zero extension of function arguments on x86-64
I got mixed results with icc for
--
short a;
void g(short);
void f(void) { g(a); }
--
it produces a movswl. For
--
void g(int);
void f(short a) { g(a); }
--
it produces a movswq. For the original test
--
void g(short);
void f(short a) { g(a); }
--
it avoids the extension.

Cheers,
--
Rafael Avila de Espindola
Google | Gordon House | Barrow Street | Dublin 4 | Ireland
Registered in Dublin, Ireland | Registration Number: 368047
Re: Typo or intended?
Bingfeng Mei wrote:
I just updated our port to include the last 2-3 weeks of GCC developments. I noticed a large number of test failures at -O1 that use a user-defined data type (based on a special register file of our processor). All variables of such type are now spilled to memory, which we don't allow at -O1 because it is too expensive. After investigation, I found that it is the following new code that causes the trouble. I don't quite understand the function of the new code, but I don't see what's special for -O1 in terms of register allocation in comparison with higher optimizing levels. If I change it to (optimize < 1), everything is fine as before. I start to wonder whether (optimize <= 1) is a typo or intended. Thanks in advance.

-O1 is supposed to allow debugging but still optimize, so it's quite possible that Vlad did intend to do this.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39432

Andrew.
RE: ARM compiler rewriting code to be longer and slower
Ramana Radhakrishnan writes:
[Resent because of account funnies. Apologies to those who get this twice]

Hi,

This problem is reported every once in a while, all targets with small load-immediate instructions suffer from this, especially since GCC 4.0 (i.e. since tree-ssa). But it seems there is just not enough interest in having it fixed somehow, or someone would have taken care of it by now. I've summed up before how the problem _could_ be fixed, but I can't find where. So here we go again.

This could be solved in CSE by extending the notion of related expressions to constants that can be generated from other constants by a shift. Alternatively, you could create a simple, separate pass that applies CSE's related expressions thing in dominator tree walk.

See http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00158.html for handling something similar when related expressions differ by a small additive constant. I am planning to finish this and submit it for 4.5.

Wouldn't doing this in CSE only solve the problem within an extended basic block and not necessarily across the program ? Surely you'd want to do it globally or am I missing something very basic here ?

No, you're not. There are plans to move some of what's in CSE to a new LCM (global) pass.

Also note that for a global pass you clearly need some more sophisticated cost model for deciding when CSEing is beneficial. On a multi-scalar architecture, instructions synthesizing consts sometimes appear to be free, whereas holding a value in a register for an extended period of time is not.

Adam
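
The "related by a shift" notion discussed in this thread can be sketched as a small standalone test. The following is only an illustration in plain C, not GCC code: the function name related_by_left_shift is hypothetical, and it considers 32-bit constants and left shifts only. A pass along the lines described could use such a check to decide that a second constant is derivable from one already held in a register, subject to the cost concerns raised above.

```c
#include <stdint.h>

/* Return the shift amount s (0..31) such that (base << s) == target,
   or -1 if target cannot be produced from base by a left shift.
   Hypothetical helper for illustration; not part of GCC. */
static int related_by_left_shift(uint32_t base, uint32_t target)
{
    for (int s = 0; s < 32; s++) {
        if ((uint32_t)(base << s) == target)
            return s;
    }
    return -1;
}
```

For example, 0x1200 is related to 0x12 by a shift of 8, so a target with small load-immediates could form it with one shift instead of a fresh multi-instruction constant load.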
Typo or intended?
Hello,

I just updated our port to include the last 2-3 weeks of GCC developments. I noticed a large number of test failures at -O1 that use a user-defined data type (based on a special register file of our processor). All variables of such type are now spilled to memory, which we don't allow at -O1 because it is too expensive. After investigation, I found that it is the following new code that causes the trouble. I don't quite understand the function of the new code, but I don't see what's special for -O1 in terms of register allocation in comparison with higher optimizing levels. If I change it to (optimize < 1), everything is fine as before. I start to wonder whether (optimize <= 1) is a typo or intended. Thanks in advance.

Cheers,
Bingfeng Mei
Broadcom UK

  if ((! flag_caller_saves && ALLOCNO_CALLS_CROSSED_NUM (a) != 0)
      /* For debugging purposes don't put user defined variables in
         callee-clobbered registers. */
      || (optimize <= 1    <- why include -O1?
          && (attrs = REG_ATTRS (regno_reg_rtx [ALLOCNO_REGNO (a)])) != NULL
          && (decl = attrs->decl) != NULL
          && VAR_OR_FUNCTION_DECL_P (decl)
          && ! DECL_ARTIFICIAL (decl)))
    {
      IOR_HARD_REG_SET (ALLOCNO_TOTAL_CONFLICT_HARD_REGS (a),
                        call_used_reg_set);
      IOR_HARD_REG_SET (ALLOCNO_CONFLICT_HARD_REGS (a),
                        call_used_reg_set);
    }
  else if (ALLOCNO_CALLS_CROSSED_NUM (a) != 0)
    {
      IOR_HARD_REG_SET (ALLOCNO_TOTAL_CONFLICT_HARD_REGS (a),
                        no_caller_save_reg_set);
      IOR_HARD_REG_SET (ALLOCNO_TOTAL_CONFLICT_HARD_REGS (a),
                        temp_hard_reg_set);
      IOR_HARD_REG_SET (ALLOCNO_CONFLICT_HARD_REGS (a),
                        no_caller_save_reg_set);
      IOR_HARD_REG_SET (ALLOCNO_CONFLICT_HARD_REGS (a),
                        temp_hard_reg_set);
    }
Re: ARM compiler rewriting code to be longer and slower
On Mon, Mar 16, 2009 at 2:52 PM, Ramana Radhakrishnan ramana.radhakrish...@arm.com wrote:
Wouldn't doing this in CSE only solve the problem within an extended basic block and not necessarily across the program ? Surely you'd want to do it globally or am I missing something very basic here ?

Why so serious^Wsurely? I think doing this optimization over extended basic blocks would catch 90% of the cases. The loop-carried form is covered by auto-increment generation (and yes I know that pass also needs to be improved ;-)

Ciao!
Steven
Re: ARM compiler rewriting code to be longer and slower
On Mon, Mar 16, 2009 at 12:11 PM, Adam Nemet ane...@caviumnetworks.com wrote:
Ramana Radhakrishnan writes:
[Resent because of account funnies. Apologies to those who get this twice]

Hi,

This problem is reported every once in a while, all targets with small load-immediate instructions suffer from this, especially since GCC 4.0 (i.e. since tree-ssa). But it seems there is just not enough interest in having it fixed somehow, or someone would have taken care of it by now. I've summed up before how the problem _could_ be fixed, but I can't find where. So here we go again.

This could be solved in CSE by extending the notion of related expressions to constants that can be generated from other constants by a shift. Alternatively, you could create a simple, separate pass that applies CSE's related expressions thing in dominator tree walk.

See http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00158.html for handling something similar when related expressions differ by a small additive constant. I am planning to finish this and submit it for 4.5.

Wouldn't doing this in CSE only solve the problem within an extended basic block and not necessarily across the program ? Surely you'd want to do it globally or am I missing something very basic here ?

No, you're not. There are plans to move some of what's in CSE to a new LCM (global) pass.

Also note that for a global pass you clearly need some more sophisticated cost model for deciding when CSEing is beneficial. On a multi-scalar architecture, instructions synthesizing consts sometimes appear to be free, whereas holding a value in a register for an extended period of time is not.

Right. You probably want something closer to Nigel Horspool's isothermal speculative PRE, which takes into account (using heuristics and profiles) where the best place to put things is based on costs, instead of LCM, which uses a notion of lifetime optimality.

See http://webhome.cs.uvic.ca/~nigelh/pubs.html for "Fast Profile-Based Partial Redundancy Elimination".

There was a working implementation of this done for GCC 4.1 that used profile info and execution counts. If you are interested, and can hunt down David Pereira (he isn't at uvic anymore, and I haven't talked to him since, so I don't have his email), he'd probably give you the code :)
Re: Fwd: Mips, -fpie and TLS management
2009/3/12 Daniel Jacobowitz d...@false.org:
On Thu, Mar 12, 2009 at 02:02:36PM +0100, Joel Porquet wrote:

Check what symbol is at, or near, 0x4003 + 22368. It's probably the GOT plus a constant bias.

It seems there is nothing at this address. Here is the program header:

Don't know then. Look at compiler-generated assembly instead of disassembly; that often helps.

Do you mean the object file produced by gcc before linkage? If yes, the code looks like:

    3c05 lui a1,0x0
    40: R_MIPS_TLS_DTPREL_HI16 a

which will be computed later as

    3c054003 lui a1,0x4003

By the way, how did you test the code of TLS for mips? I mean, uclibc seems the more advanced lib for mips, and although this lib seems to have the necessary code to manage tls once it is installed, the ldso doesn't contain any code for handling TLS (relocation, tls allocation, etc)...

That statement about uclibc strikes me as bizarre. I tested it with glibc, naturally. GLIBC has a much more reliable TLS implementation than uclibc's in-progress one.

I just downloaded the glibc archive without noticing that the mips port was in another archive... My mistake..

Last question, is there a difference between DSO and PIE objects other than the INTERP entry in the program header?

Yes. Symbol preemption is allowed for DSOs but not for PIEs or normal executables. That explains the different choice of model.

But this is only a property, isn't it? I was meaning, how can you differentiate them at loading time, when you analyse the elf file.

You can't.

As you surely know, the ELF_R_SYM() macro performs (val >> 8), which gives the symbol index in order to retrieve the name of the symbol. This name then allows the symbol to be looked up. Unfortunately, in the case of local-dynamic, ELF_R_SYM will return 0, which is not correct (the same for global-dynamic will return 9): we can see by the way that readelf is not able to get the symbol name. What do you think about this?

This is a *module* relocation. In local dynamic the module is always the current DSO; it does not need a symbol.

But what if the DSO accesses another module's TLS?

Finally, I noticed another problem. GCC seems not to make room for the 4 arguments, as specified in the ABI, when calling __tls_get_addr. For example, here is an extract of the calling code (we see that data are stored directly at the top of the stack):

    ...
    5ffe0bfc: 27bdfff0 addiu sp,sp,-16
    5ffe0c00: afbf000c sw ra,12(sp)
    5ffe0c04: afbc sw gp,0(sp)
    5ffe0c08: afa40010 sw a0,16(sp)
    5ffe0c0c: 100d b 5ffe0c44 <puts+0x54>
    5ffe0c10: nop
    5ffe0c14: 8f998030 lw t9,-32720(gp)
    5ffe0c18: 27848038 addiu a0,gp,-32712
    5ffe0c1c: 0320f809 jalr t9
    5ffe0c20: nop
    5ffe0c24: 8fbc lw gp,0(sp)
    ...

The jalr t9 is the call to __tls_get_addr, whose code is:

    ...
    5ffe0b40: 27bdffe8 addiu sp,sp,-24
    5ffe0b44: afbc sw gp,0(sp)
    5ffe0b48: afa40018 sw a0,24(sp)
    5ffe0b4c: 7c03e83b 0x7c03e83b
    ...

We notice then that sw a0,24(sp) will overwrite the $gp value that the caller saved at the same place (sw gp,0(sp)): the callee's 24(sp) is the caller's 0(sp).

Regards,
Joel
Re: Fwd: Mips, -fpie and TLS management
On Mon, Mar 16, 2009 at 06:19:01PM +0100, Joel Porquet wrote:
2009/3/12 Daniel Jacobowitz d...@false.org:
On Thu, Mar 12, 2009 at 02:02:36PM +0100, Joel Porquet wrote:

Check what symbol is at, or near, 0x4003 + 22368. It's probably the GOT plus a constant bias.

It seems there is nothing at this address. Here is the program header:

Don't know then. Look at compiler-generated assembly instead of disassembly; that often helps.

Do you mean the object file produced by gcc before linkage?

That will do, but the actual assembly (-S) is more helpful sometimes.

This is a *module* relocation. In local dynamic the module is always the current DSO; it does not need a symbol.

But what if the DSO access other module's TLS?

Then it does not use Local Dynamic to do so.

Finally, I noticed another problem. GCC seems to not make room for the 4 arguments as specified in the ABI, when calling __get_tls_addr. For example, here is an extract of the code for calling (we see that data are stored directly at the top of the stack):

    ...
    5ffe0bfc: 27bdfff0 addiu sp,sp,-16
    5ffe0c00: afbf000c sw ra,12(sp)
    5ffe0c04: afbc sw gp,0(sp)

That line is bogus. Figure out where it came from; the cprestore offset should not be zero.

--
Daniel Jacobowitz
CodeSourcery
[Fwd: gomp - cost of threadprivate data access]
[ Perhaps we need a somewhat larger audience for this one, as it isn't a gfortran specific issue (despite the COMMONs). ]

The reporter of this problem (perhaps it's necessary to open a bugzilla PR) uses:

It is GNU/linux on x86_64, fedora 10
kernel 2.6.27.12-170.2.5.fc10.x86_64
glibc-2.9-3.x86_64

--
Toon Moene - e-mail: t...@moene.org (*NEW*) - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/gcc-4.4/changes.html

---BeginMessage---
Hello,

We have parallelized a relatively large f77 project (GEANT3, ~200k loc) using OpenMP. Now we are running comparisons between the standard and parallel versions, and it turns out that just making the commons threadprivate results in a 20 percent speed penalty. This extra time is spent in the __tls_get_addr() function, which seems to be called for every access of a threadprivate variable.

Would it be in principle possible to optimize this access? I figure that the base address of all referenced commons could be obtained once per function, thus drastically reducing the __tls_get_addr() call count.

We are using the gcc-4.3 branch from the beginning of February, with patches to allow equivalence statements among threadprivate data.

Callgrind output of a sample run is available at:
-O2 https://mtadel.web.cern.ch/mtadel/callgrind.out.13032
-O2 -g https://mtadel.web.cern.ch/mtadel/callgrind.out.13055

Best,
Matevz
---End Message---
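
The "base address once per function" idea in the forwarded message can be written out by hand in C. This is only a rough sketch under assumptions: it uses the GNU __thread storage class in place of an OpenMP threadprivate COMMON block, and the struct and function names (common_block, fill_block, sum_energy) are hypothetical. Whether the compiler emits __tls_get_addr() calls at all depends on the TLS model in use, so the sketch shows the manual hoisting, not a measured speedup.

```c
/* Stand-in for a threadprivate COMMON block (hypothetical layout). */
struct common_block {
    double energy[64];
    int count;
};

/* One instance per thread, using the GNU __thread extension. */
static __thread struct common_block block;

/* Fill the first n entries (n is assumed to be at most 64). */
static void fill_block(int n)
{
    struct common_block *b = &block;   /* TLS address resolved once */
    b->count = n;
    for (int i = 0; i < n; i++)
        b->energy[i] = (double)i;
}

/* Sum the entries through a pointer obtained once per call, instead of
   naming the thread-local object inside the loop. */
static double sum_energy(void)
{
    struct common_block *b = &block;   /* TLS address resolved once */
    double total = 0.0;
    for (int i = 0; i < b->count; i++)
        total += b->energy[i];
    return total;
}
```

Every access inside the loops goes through the local pointer b, so at most one address computation per call is needed regardless of how the thread-local address is obtained.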
Re: [Fwd: gomp - cost of threadprivate data access]
On Mon, Mar 16, 2009 at 7:06 PM, Toon Moene t...@moene.org wrote:
[ Perhaps we need a somewhat larger audience for this one, as it isn't a gfortran specific issue (despite the COMMONs). ]

The reporter of this problem (perhaps it's necessary to open a bugzilla PR) uses:

It is GNU/linux on x86_64, fedora 10
kernel 2.6.27.12-170.2.5.fc10.x86_64
glibc-2.9-3.x86_64

The __tls_get_addr() calls should already be optimized if the proper TLS model is used. Do we have a test case?

Ciao!
Steven
Re: improve -fverbose-asm option
Eric Fisher joefoxr...@gmail.com writes:
I'd like to get more helpful information from the final .S file, such as basic block info, so that I can draw a cfg graph through a script.

The basic block information and the CFG graph is not reliable at that point in the compilation. Your patch will work reliably for some targets and optimization levels but not for others. The CFG information is messed up by the machine dependent reorg pass and the delay slot pass. I would be worried about confusing people.

Also, I think it will be better to generate one label for each basic block, and the local label should have the function name as the suffix. Because some profile tools, such as oprofile, will output samples based on the labels. So this will help us to analyze the samples for each basic block. But current generated code will have many local labels with the same name. Perhaps it's again the -fverbose-asm to enable this functionality. But where should I go if I wanna implement this functionality?

The local labels used for blocks are normally discarded by the assembler and thus are never seen by tools like oprofile. Using named symbols for basic blocks seems like a reasonable option if it will indeed give better information from oprofile, but it should be an option separate from -fverbose-asm.

The labels in RTL are CODE_LABEL insns, so you would want to change the way that they are emitted in final_scan_insn. The fact that there can be several CODE_LABELs in sequence doesn't seem to matter too much, since only one will be picked up by profiling tools.

To be clear, I would want to see that you really do get better results from profiling tools before accepting such a patch.

Ian
Re: Preprocessor for assembler macros?
Ph. Marek phil...@marek.priv.at writes:
Philipp Marek philipp at marek.priv.at writes:
gcc -S tmp.S for some reason prints to stdout, so gcc -S tmp.S > tmp.s is what you need
Thank you very much, I'll take a look.

I tried very hard to achieve that; and one time it seemed to work, but I cannot make it work again.

I already asked you to take this question to a different mailing list, and I already answered your question.

http://gcc.gnu.org/ml/gcc/2009-03/msg00187.html

Please take any followups to a different mailing list.

Ian
Difference between local/global/parameter array handling
Dear all,

I've been working on explaining to GCC the cost of loads/stores on my target and I arrived at this problem. Consider the following code:

uint64_t sum = 0;
for (i = 0; i < N; i += 2) {    /* N is defined by a macro */
    z0 = buff[i];
    z1 = buff[i+1];
    sum += z0 + z1;
}

Depending on the type (local/global or parameter of the function) of buff, I get different code generations for the loop.

For global and local definitions of buff:

$L2:
    ldd    r6,8(r10)
    ldd    r7,0(r10)
    addi   r10,r10,16
    cmpne  r8,r11,r10
    add    r6,r6,r7
    add    r9,r9,r6
    bt     r8,$L2

For the parameter, I get this:

$L7:
    add    r6,r48,r10
    ldd    r8,0(r6)
    ldd    r7,0(r11)
    addi   r10,r10,16
    cmpine r6,r10,1024
    addi   r11,r11,16
    add    r7,r7,r8
    add    r9,r9,r7
    bt     r6,$L7

I don't seem to see why the compiler handles the case of buff as a parameter to the function differently. It uses 2 registers and fails to see that it could use the same one with the offset, like how it does it in the global/local cases. Any idea of why this happens to my code generation?

I wonder, now that I look at this, if it's an address issue. If you compare the way it handles the end test, for local and global (where the compiler has the information of the array), the compare is done using the end address of the array, whereas this is no longer the case for the parameter. Instead it uses the number of iterations.

I have just now confirmed this by defining the global array as a pointer or an array (int *tab or int tab[128];). In the case of the array, I get the solution I would expect. In the case of the pointer, I get the version that I do not like.

Any ideas?

Thank you very much for your help,
Jean Christophe Beyler
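
For anyone who wants to reproduce the comparison, the two variants of the loop above can be put side by side in a self-contained file. This is a minimal sketch with assumptions: the element type is taken to be uint64_t (suggested by the ldd loads and the 1024-byte bound), N is set to 128, and the function names sum_global and sum_param are hypothetical. Compiling it with -S for the target in question should show whether the array and pointer forms get different addressing and end tests.

```c
#include <stdint.h>

#define N 128                      /* assumed value of the macro */

static uint64_t buff_global[N];    /* array case: bounds known to the compiler */

/* Loop over the global array. */
static uint64_t sum_global(void)
{
    uint64_t sum = 0;
    for (int i = 0; i < N; i += 2) {
        uint64_t z0 = buff_global[i];
        uint64_t z1 = buff_global[i + 1];
        sum += z0 + z1;
    }
    return sum;
}

/* Same loop, but the buffer arrives as a pointer parameter. */
static uint64_t sum_param(const uint64_t *buff)
{
    uint64_t sum = 0;
    for (int i = 0; i < N; i += 2) {
        uint64_t z0 = buff[i];
        uint64_t z1 = buff[i + 1];
        sum += z0 + z1;
    }
    return sum;
}
```

Both functions compute the same value; only the way the base address reaches the loop differs, which is the property the message is asking about.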
Re: Understand BLKmode and returning structure in register.
Bingfeng Mei b...@broadcom.com writes:
In foo function, compute_record_mode function will set the mode for struct COMPLEX as BLKmode partly because STRICT_ALIGNMENT is 1 on my target. In TARGET_RETURN_IN_MEMORY hook, I return 1 for BLKmode type and 0 otherwise for small size (8) (like MIPS). Thus, this structure is still returned through memory, which is not very efficient. More importantly, ABI is NOT FIXED under such situation. If an assembly code programmer writes a function returning a structure. How does he know the structure will be treated as BLKmode or otherwise? So he doesn't know whether to pass result through memory or register. Do I understand correctly?

Yes. I think having TARGET_RETURN_IN_MEMORY depend on internal details like the RTL mode is often seen as an historical mistake. As you say, the ABI should be defined directly by the type instead.

Unfortunately, once you start using a mode, it's difficult to stop using a mode without breaking compatibility. So one of the main reasons the MIPS port still uses the mode is because no-one dares touch it.

Likewise, it's now difficult to change the mode attached to a structure (which could potentially make structure accesses more efficient) without accidentally breaking someone's ABI.

On the other hand, if I return 0 only according to struct type's size regardless BLKmode or not, GCC will produces very inefficient code. For example, stack setup code in foo is still generated even it is totally unnecessary.

Yeah, there's definitely room for improvement here. And as you say, it's already a problem for MIPS. I think it's just one of those things that doesn't occur often enough in critical code for anyone to have spent time optimising it.

Richard
generic bug in fixed-point constant folding
Hi,

I think I found a generic problem for fixed point constant folding. In fold-const.c:11872 gcc tries to apply:

/* Transform (x >> c) << c into x & (-1<<c), or transform (x << c) >> c
   into x & ((unsigned)-1 >> c) for unsigned types. */

I attached a simple patch which fixes the problem by not applying this optimization to fixed point types. I would like to have this optimization because it is possible.. but the problem is fixed-point types do not support bitwise operations like & | ^ ~.. so without supporting these somehow internally but not allowing the user to have them, this can't take place. I am open to other suggestions.

For future reference should this be posted as a bug report? It seems simple enough that it could be included right away.. but I feel like if it's a bug report no one will notice since fixed-point support is not widely used.

Sean

Index: fold-const.c
===================================================================
--- fold-const.c (revision 144210)
+++ fold-const.c (working copy)
@@ -11877,7 +11877,8 @@ fold_binary (enum tree_code code, tree t
          && host_integerp (arg1, false)
          && TREE_INT_CST_LOW (arg1) < TYPE_PRECISION (type)
          && host_integerp (TREE_OPERAND (arg0, 1), false)
-         && TREE_INT_CST_LOW (TREE_OPERAND (arg0, 1)) < TYPE_PRECISION (type))
+         && TREE_INT_CST_LOW (TREE_OPERAND (arg0, 1)) < TYPE_PRECISION (type)
+         && TREE_CODE (type) != FIXED_POINT_TYPE)
        {
          HOST_WIDE_INT low0 = TREE_INT_CST_LOW (TREE_OPERAND (arg0, 1));
          HOST_WIDE_INT low1 = TREE_INT_CST_LOW (arg1);
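
For reference, the integer identity behind the folding comment can be checked with a few lines of plain C. This is a sketch for unsigned 32-bit integers only, with hypothetical function names; it says nothing about how fixed-point types behave, which is exactly where the message above reports the transformation going wrong.

```c
#include <stdint.h>

/* (x >> c) << c clears the low c bits ... */
static uint32_t shift_right_then_left(uint32_t x, unsigned c)
{
    return (x >> c) << c;
}

/* ... which equals masking with all-ones shifted left by c. */
static uint32_t mask_low_bits(uint32_t x, unsigned c)
{
    return x & (uint32_t)(UINT32_MAX << c);
}

/* (x << c) >> c clears the high c bits for an unsigned type ... */
static uint32_t shift_left_then_right(uint32_t x, unsigned c)
{
    return (uint32_t)(x << c) >> c;
}

/* ... which equals masking with all-ones shifted right by c. */
static uint32_t mask_high_bits(uint32_t x, unsigned c)
{
    return x & (UINT32_MAX >> c);
}
```

The shift count c is assumed to be smaller than the type width (0..31 here), matching the TYPE_PRECISION checks in the guarded condition of the patch.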
[Bug libobjc/39465] libobjc does not find classes of DLLs
--- Comment #2 from ayers at gcc dot gnu dot org 2009-03-16 07:27 --- So the situation seems to be: - libobjc is a static library. - libfoo is a dll statically linked against libobjc. - test is a program which is linked against both libfoo and libobjc. I'm guessing here, since I have no experience with mingw or with linking libobjc statically, but I could imagine that you may have two copies of libobjc in your executable, each with its own set of runtime structures, which may cause confusion. Is there any reason why libobjc isn't dynamically linked if you are going to use DLLs? Note I'll still need to build a mingw compiler and look into the auto-import warning, and I'm not sure when I'll get around to it, so I haven't assigned the bug yet in case someone else can easily test it. Cheers, David -- ayers at gcc dot gnu dot org changed: What|Removed |Added CC||ayers at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39465
[Bug debug/39355] [4.4 Regression] Revision 144529 miscompiled libcpp/expr.c
--- Comment #25 from jakub at gcc dot gnu dot org 2009-03-16 07:52 --- I'd say first try to add noinline attribute on all callers of num_positive, if it fails even with those, add also __attribute__((__optimize__(0))) to them one by one. If the noinline attribute to those makes the miscompilation go away, search one by one which one it is and retry with all callers of that function. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39355
[Bug tree-optimization/39455] [4.3/4.4 Regression] ICE : in compare_values_warnv, at tree-vrp.c:1073
--- Comment #7 from jakub at gcc dot gnu dot org 2009-03-16 08:15 --- Reduced testcase:

/* { dg-do compile } */
/* { dg-options "-O2 -fprefetch-loop-arrays" } */

void
foo (char *x, unsigned long y, unsigned char *z)
{
  unsigned int c[256], *d;

  for (d = c + 1; d < c + 256; ++d)
    *d += d[-1];
  x[--c[z[y]]] = 0;
}

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39455
[Bug tree-optimization/39455] [4.3/4.4 Regression] ICE : in compare_values_warnv, at tree-vrp.c:1073
--- Comment #8 from pinskia at gmail dot com 2009-03-16 08:28 --- Subject: Re: [4.3/4.4 Regression] ICE : in compare_values_warnv, at tree-vrp.c:1073

On Mar 16, 2009, at 1:15 AM, jakub at gcc dot gnu dot org gcc-bugzi...@gcc.gnu.org wrote:

--- Comment #7 from jakub at gcc dot gnu dot org 2009-03-16 08:15 --- Reduced testcase:

/* { dg-do compile } */
/* { dg-options "-O2 -fprefetch-loop-arrays" } */

void
foo (char *x, unsigned long y, unsigned char *z)
{
  unsigned int c[256], *d;

  for (d = c + 1; d < c + 256; ++d)
    *d += d[-1];
  x[--c[z[y]]] = 0;

Hmm. Could this be the char-- bug? Where the front-end/gimplifier does not promote that to int?

Thanks, Andrew Pinski

}

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39455
[Bug rtl-optimization/30688] Branch registers loaded too late on ia64
--- Comment #5 from steven at gcc dot gnu dot org 2009-03-16 08:46 --- Can someone point me to the IA64 optimization manuals mentioned in comment #0? I'm looking for some answers, for example:

* Which branch registers can I use? bt-load can actually perform register renaming. It has to, of course, because bt-load runs after the register allocator. The register allocator prefers to always use tr0 on sh64, and it probably always tries to use the same branch register on ia64 too. So register renaming is a Good Thing here. But which regs can I use on IA64?

* What does "as early as possible" mean in comment #0? Are there recommendations for what is considered "too early" (for example due to interactions with calls and such)?

* What happens if a value is assigned to a branch register on IA64? Is the prefetcher always triggered? What is the latency of the prefetching after a branch register has been assigned a value?

* Is there a possibility to add a prediction hint to say "branch register A is more likely to be used than branch register B" when multiple branch registers are assigned a value in the same basic block?

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30688
[Bug tree-optimization/39455] [4.3/4.4 Regression] ICE : in compare_values_warnv, at tree-vrp.c:1073
--- Comment #9 from jakub at gcc dot gnu dot org 2009-03-16 08:49 --- No, this seems to be aprefetch's pass fault, at least in quick skim *.cunroll seems to be ok typewise, while *.aprefetch has: D.1649_44 = c + 1024; D.1650_43 = (long unsigned int) D.1649_44; if (c[2] = D.1650_43) D.1650 is long unsigned int and c is unsigned int c[256], so obviously the comparison above is wrong. Will try to debug it. -- jakub at gcc dot gnu dot org changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |jakub at gcc dot gnu dot org |dot org | Status|NEW |ASSIGNED Last reconfirmed|2009-03-13 14:03:54 |2009-03-16 08:49:06 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39455
[Bug driver/39356] assembler isn't called
--- Comment #9 from ktietz at gcc dot gnu dot org 2009-03-16 09:15 --- (In reply to comment #8) (In reply to comment #7)

The following patch solves this problem and prevents the name collision for 32 and 64 bits win32 systems.

ChangeLog
* config/i386/i386.md (allocate_stack_worker_32): Use ___gnu_chkstk. (allocate_stack_worker_64): Likewise.
* config/i386/cygwin.asm (__alloca): Renamed to __gnu_alloca. (___chkstk): Renamed to ___gnu_chkstk.

No. This breaks backward compatibility. Static libraries and objects built with current and older versions of gcc will not be able to resolve references to __alloca or ___chkstk. Why not add labels with the new names as aliases rather than replace? Danny

Ok, for 32-bit it makes sense to keep the old symbol names. Besides, there is still a chance that a user manually uses the chkstk.o file, which can lead to undefined behaviour (at least if the user code references __chkstk). For 64-bit I prefer to avoid those old names and simply rename them. Is this ok for you? Shall I then file a patch for it? Kai

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39356
[Bug target/39115] [4.3 Regression] Value of variable is not read again
-- rguenth at gcc dot gnu dot org changed: What|Removed |Added Known to fail||4.3.3 Known to work|4.2.4 |4.2.4 4.4.0 Priority|P3 |P2 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39115
[Bug tree-optimization/39455] [4.3/4.4 Regression] ICE : in compare_values_warnv, at tree-vrp.c:1073
--- Comment #10 from jakub at gcc dot gnu dot org 2009-03-16 09:43 --- Seems tree-ssa-loop-niter.c has a lot of p+ issues. The following untested patch fixes just the number_of_iterations_lt_to_ne bugs and fixes this testcase:

--- gcc/tree-ssa-loop-niter.c.jj 2009-03-04 20:06:31.0 +0100
+++ gcc/tree-ssa-loop-niter.c 2009-03-16 10:30:39.0 +0100
@@ -699,8 +699,10 @@ number_of_iterations_lt_to_ne (tree type
          iv0->base <= iv1->base + MOD.  */
       if (!iv0->no_overflow && !integer_zerop (mod))
         {
-          bound = fold_build2 (MINUS_EXPR, type,
+          bound = fold_build2 (MINUS_EXPR, type1,
                                TYPE_MAX_VALUE (type1), tmod);
+          if (POINTER_TYPE_P (type))
+            bound = fold_convert (type, bound);
           assumption = fold_build2 (LE_EXPR, boolean_type_node,
                                     iv1->base, bound);
           if (integer_zerop (assumption))
@@ -708,6 +710,11 @@ number_of_iterations_lt_to_ne (tree type
         }
       if (mpz_cmp (mmod, bnds->below) < 0)
         noloop = boolean_false_node;
+      else if (POINTER_TYPE_P (type))
+        noloop = fold_build2 (GT_EXPR, boolean_type_node,
+                              iv0->base,
+                              fold_build2 (POINTER_PLUS_EXPR, type,
+                                           iv1->base, tmod));
       else
         noloop = fold_build2 (GT_EXPR, boolean_type_node,
                               iv0->base,
@@ -723,6 +730,8 @@ number_of_iterations_lt_to_ne (tree type
         {
           bound = fold_build2 (PLUS_EXPR, type1,
                                TYPE_MIN_VALUE (type1), tmod);
+          if (POINTER_TYPE_P (type))
+            bound = fold_convert (type, bound);
           assumption = fold_build2 (GE_EXPR, boolean_type_node,
                                     iv0->base, bound);
           if (integer_zerop (assumption))
@@ -730,6 +739,13 @@ number_of_iterations_lt_to_ne (tree type
         }
       if (mpz_cmp (mmod, bnds->below) < 0)
         noloop = boolean_false_node;
+      else if (POINTER_TYPE_P (type))
+        noloop = fold_build2 (GT_EXPR, boolean_type_node,
+                              fold_build2 (POINTER_PLUS_EXPR, type,
+                                           iv0->base,
+                                           fold_unary (NEGATE_EXPR,
+                                                       type1, tmod)),
+                              iv1->base);
       else
         noloop = fold_build2 (GT_EXPR, boolean_type_node,
                               fold_build2 (MINUS_EXPR, type1,

but e.g. number_of_iterations_le doesn't look correct at all either.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39455
[Bug middle-end/39333] gcc 4.3.3 miscompiles when -finline-small-functions is used
--- Comment #19 from falk at debian dot org 2009-03-16 10:24 --- (In reply to comment #18) Well, I've got bad news for you anyway: it seems that the problem affects gcc-4.3.2 too: it seems it's reproducible in another app, however one potentially much harder to debug. Please read http://bugs.winehq.org/show_bug.cgi?id=17406 and give some ideas for a test. The fact that -fno-inline helps gives only very little indication that this is actually the same problem. In any case, I don't think there's really anything we can do without a complete test case (that is, a single file with a main() that exits with 0 when everything's fine and 1 otherwise). This is very difficult to do for someone who doesn't know the freeciv codebase. -- falk at debian dot org changed: What|Removed |Added Status|UNCONFIRMED |WAITING http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39333
[Bug libobjc/39465] libobjc does not find classes of DLLs
--- Comment #3 from js-gcc at webkeks dot org 2009-03-16 11:24 --- When the target is mingw32, it seems that libobjc is only built as a static library. This isn't a bad idea after all, because I guess no win32 user has a libobjc.so installed somewhere, so you would need to ship that file with every binary produced from ObjC-sources. I heard from the GNUstep guys that they had the same problem until they linked libobjc dynamically. But IMO, this is only a workaround - it should also work if libobjc is linked statically. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39465
[Bug target/36047] -pg does not work on large binaries and m68k
--- Comment #2 from mkuvyrkov at gcc dot gnu dot org 2009-03-16 11:35 --- Would you please attach a preprocessed testcase so one can reproduce the problem. -- mkuvyrkov at gcc dot gnu dot org changed: What|Removed |Added CC||mkuvyrkov at gcc dot gnu dot ||org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36047
[Bug libobjc/39465] libobjc does not find classes of DLLs
--- Comment #4 from ayers at gcc dot gnu dot org 2009-03-16 11:41 --- Well, consider me a GNUstep guy, yet I'm definitely not a GNUstep on MinGW32 guy. (Or anything on MinGW32... which is why this is a bit difficult, yet I'm trying to help maintain libobjc so I'll see what I can do.) Could you please add a link to that discussion? It seems that I missed it. I've found a few mingw32 discussions searching the archive but nothing recent wrt static linking. In the meantime I'm learning how to set up a cross tool chain... please be patient. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39465
[Bug libobjc/39465] libobjc does not find classes of DLLs
--- Comment #5 from js-gcc at webkeks dot org 2009-03-16 11:46 --- It would be hard to link to that discussion as that was IRL on FOSDEM in the GNUstep Dev Room :). I reported that bug once on the mingw32 list, but they wouldn't really care about it. After speaking to Nicola Pero on FOSDEM, I decided that it'd be best to file a bug for libobjc - and so I did :). For building mingw32 with gcc 4, you could have a look at these Port files I wrote: https://webkeks.org/hg/crux_ports/file/6062794869e8/mingw32-api https://webkeks.org/hg/crux_ports/file/6062794869e8/mingw32-binutils https://webkeks.org/hg/crux_ports/file/6062794869e8/mingw32-gcc https://webkeks.org/hg/crux_ports/file/6062794869e8/mingw32-runtime -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39465
[Bug debug/37890] Incorrect nesting for DW_TAG_imported_declaration
--- Comment #1 from jan dot kratochvil at redhat dot com 2009-03-16 14:24 --- Verified that the problem exists on GNU C++ 4.4.0 20090315 (experimental). Also tried a non-main function and a slightly more complicated function. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37890
[Bug debug/39471] New: DW_TAG_imported_module should be used (not DW_TAG_imported_declaration)
Regression from g++-4.3 for GNU C++ 4.4.0 20090315 (experimental) (+also for 4.4.0 20090313 (Red Hat 4.4.0-0.26)). For full namespace import one should use DW_TAG_imported_module.

namespace A
{
  int i = 1;
}

int
main ()
{
  using namespace A;
  i = 2;
  return 0;
}

Using g++-4.4 DWARF one must use `A::i' at `main' in the debugger. The whole namespace `A' should be imported there instead.

WRONG g++-4.4 debuginfo:
    <c> DW_AT_producer: (indirect string, offset: 0x0): GNU C++ 4.4.0 20090315 (experimental)
 <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
    <2f> DW_AT_name: (indirect string, offset: 0x7d): main
 <2><51>: Abbrev Number: 3 (DW_TAG_lexical_block)
    <52> DW_AT_low_pc : 0x4
    <5a> DW_AT_high_pc : 0x13
 <3><62>: Abbrev Number: 4 (DW_TAG_imported_declaration)
    <65> DW_AT_name: A
    <67> DW_AT_import : 0x74 [Abbrev Number: 6 (DW_TAG_namespace)]
 <1><74>: Abbrev Number: 6 (DW_TAG_namespace)
    <75> DW_AT_name: A
 <2><7d>: Abbrev Number: 7 (DW_TAG_variable)
    <7e> DW_AT_name: i
    <82> DW_AT_MIPS_linkage_name: (indirect string, offset: 0x74): _ZN1A1iE

Correct g++-4.3 debuginfo:
    <c> DW_AT_producer: (indirect string, offset: 0x0): GNU C++ 4.3.2 20081105 (Red Hat 4.3.2-7)
 <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
    <2f> DW_AT_name: (indirect string, offset: 0x80): main
 <2><51>: Abbrev Number: 3 (DW_TAG_imported_module)
    <54> DW_AT_import : 0x60 [Abbrev Number: 5 (DW_TAG_namespace)]
 <1><60>: Abbrev Number: 5 (DW_TAG_namespace)
    <61> DW_AT_name: A
 <2><69>: Abbrev Number: 6 (DW_TAG_variable)
    <6a> DW_AT_name: i
    <6e> DW_AT_MIPS_linkage_name: (indirect string, offset: 0x77): _ZN1A1iE
    <72> DW_AT_type: 0x59

It causes regressions on gdb.cp/namespace-using.exp for the GDB project Archer.
-- Summary: DW_TAG_imported_module should be used (not DW_TAG_imported_declaration) Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: debug AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: jan dot kratochvil at redhat dot com GCC target triplet: x86_64-unknown-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39471
[Bug tree-optimization/39455] [4.3/4.4 Regression] ICE : in compare_values_warnv, at tree-vrp.c:1073
--- Comment #11 from jakub at gcc dot gnu dot org 2009-03-16 16:07 --- Subject: Bug 39455 Author: jakub Date: Mon Mar 16 16:07:07 2009 New Revision: 144885 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144885 Log: PR tree-optimization/39455 * tree-ssa-loop-niter.c (number_of_iterations_lt_to_ne): Fix types mismatches for POINTER_TYPE_P (type). (number_of_iterations_le): Likewise. * gcc.dg/pr39455.c: New test. Added: trunk/gcc/testsuite/gcc.dg/pr39455.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-ssa-loop-niter.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39455
[Bug tree-optimization/39455] [4.3 Regression] ICE : in compare_values_warnv, at tree-vrp.c:1073
--- Comment #12 from jakub at gcc dot gnu dot org 2009-03-16 16:27 --- Fixed on the trunk so far. -- jakub at gcc dot gnu dot org changed: What|Removed |Added Known to fail|4.3.3 4.4.0 |4.3.3 Known to work|4.1.2 4.2.4 |4.1.2 4.2.4 4.4.0 Summary|[4.3/4.4 Regression] ICE : |[4.3 Regression] ICE : in |in compare_values_warnv, at |compare_values_warnv, at |tree-vrp.c:1073 |tree-vrp.c:1073 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39455
[Bug target/39472] New: Add -mabi=[ms|sysv]
UEFI uses MS x64 calling convention. It will be nice to support -mabi=ms on Linux so that we can use gcc 4.4 to build UEFI applications on Linux. -- Summary: Add -mabi=[ms|sysv] Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: hjl dot tools at gmail dot com GCC build triplet: x86_64-unknown-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39472
[Bug c/39375] asm with a =X output overwrites the output
--- Comment #4 from balrogg at gmail dot com 2009-03-16 16:53 --- Reopening because int params; __asm__ ("xxx" : "=X" (params)); and int params[1]; __asm__ ("xxx" : "=X" (params[0])); still produce different output in a way that is undocumented. -- balrogg at gmail dot com changed: What|Removed |Added Status|RESOLVED|UNCONFIRMED Resolution|INVALID | http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39375
[Bug c/39375] asm with a =X output overwrites the output
--- Comment #5 from pinskia at gcc dot gnu dot org 2009-03-16 17:02 --- (In reply to comment #4) Reopening because int params; __asm__ ("xxx" : "=X" (params)); and int params[1]; __asm__ ("xxx" : "=X" (params[0])); still produce different output in a way that is undocumented. How so? "=X" (params[0]) says it can be in memory, which means params is addressable. This is documented, as "=X" really means "=rfm" (plus extra constraints which don't correspond to r, f, or m). -- pinskia at gcc dot gnu dot org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||INVALID http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39375
[Bug debug/39355] [4.4 Regression] Revision 144529 miscompiled libcpp/expr.c
--- Comment #26 from dave at hiauly1 dot hia dot nrc dot ca 2009-03-16 17:20 --- Subject: Re: [4.4 Regression] Revision 144529 miscompiled libcpp/expr.c Since revision 144529: http://gcc.gnu.org/ml/gcc-patches/2009-03/msg0.html is the cause and it is inline related, I suggest you use revision 144529 as base and revert the tree-inline.c change to see if it fixes libcpp/expr.c. The regressions don't occur with revision 144874 if I replace tree-inline.c with the version from revision 144528. http://gcc.gnu.org/ml/gcc-testresults/2009-03/msg01655.html 144529 significantly changed the amount of inlining. Thus, it's very difficult to determine the location of the miscompilation in expr.o by comparing the difference in code between 144528 and 144529. It's also my impression that the miscompilation has moved in subsequent revisions. The miscompilation is related to the generation of dwarf2 debug information as it doesn't appear with hpux. While it may be that the changes to tree-inline.c are not directly responsible for the regressions, they are definitely a contributing factor. I note that Jan does have an account on a hppa linux machine, gsyprf11.external.hp.com. Probably, I should rebuild 144529 and try Jakub's suggestions. Dave -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39355
[Bug debug/39355] [4.4 Regression] Revision 144529 miscompiled libcpp/expr.c
--- Comment #27 from hjl dot tools at gmail dot com 2009-03-16 17:26 --- (In reply to comment #26) Probably, I should rebuild 144529 and try Jakub's suggestions. You need the fix for PR 39345 on top of revision 144529. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39355
[Bug target/39473] New: Typo in untyped_call in i386.md
untyped_call in i386.md has ix86_expand_call ((TARGET_FLOAT_RETURNS_IN_80387 ? gen_rtx_REG (XCmode, FIRST_FLOAT_REG) : NULL), operands[0], const0_rtx, GEN_INT ((DEFAULT_ABI == SYSV_ABI ? X86_64_SSE_REGPARM_MAX : X64_SSE_REGPARM_MAX) - 1), NULL, 0); It doesn't look right for 32bit. Shouldn't it be GEN_INT (SSE_REGPARM_MAX) instead? -- Summary: Typo in untyped_call in i386.md Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: hjl dot tools at gmail dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39473
[Bug target/39473] Typo in untyped_call in i386.md
--- Comment #1 from hjl dot tools at gmail dot com 2009-03-16 18:26 --- Also

void
ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1, rtx callarg2, rtx pop, int sibcall)
{
  rtx use = NULL, call;
  enum calling_abi function_call_abi;

  if (callarg2 && INTVAL (callarg2) == -2)
    function_call_abi = MS_ABI;
  else
    function_call_abi = SYSV_ABI;

doesn't look right either. Where does -2 come from? Shouldn't it check TARGET_64BIT?

-- hjl dot tools at gmail dot com changed: What|Removed |Added CC||ktietz at onevision dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39473
[Bug target/39473] Typo in untyped_call in i386.md
--- Comment #2 from hjl dot tools at gmail dot com 2009-03-16 18:40 --- (In reply to comment #1) Also

void
ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1, rtx callarg2, rtx pop, int sibcall)
{
  rtx use = NULL, call;
  enum calling_abi function_call_abi;

  if (callarg2 && INTVAL (callarg2) == -2)
    function_call_abi = MS_ABI;
  else
    function_call_abi = SYSV_ABI;

doesn't look right either. Where does -2 come from? Shouldn't it check TARGET_64BIT?

This was added by revision 142859: http://gcc.gnu.org/ml/gcc-cvs/2008-12/msg00559.html

-- hjl dot tools at gmail dot com changed: What|Removed |Added CC||jh at suse dot cz http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39473
[Bug target/39473] Typo in untyped_call in i386.md
--- Comment #3 from hjl dot tools at gmail dot com 2009-03-16 18:47 --- (In reply to comment #0) untyped_call in i386.md has

  ix86_expand_call ((TARGET_FLOAT_RETURNS_IN_80387
                     ? gen_rtx_REG (XCmode, FIRST_FLOAT_REG) : NULL),
                    operands[0], const0_rtx,
                    GEN_INT ((DEFAULT_ABI == SYSV_ABI ? X86_64_SSE_REGPARM_MAX
                                                     : X64_SSE_REGPARM_MAX)
                             - 1),
                    NULL, 0);

It doesn't look right for 32bit. Shouldn't it be GEN_INT (SSE_REGPARM_MAX) instead?

This was changed by revision 136311: http://gcc.gnu.org/ml/gcc-cvs/2008-06/msg00067.html Those changes:

@@ -1953,9 +1972,22 @@
    is also used as the pic register in ELF.  So for now, don't allow more than
    3 registers to be passed in registers.  */

-#define REGPARM_MAX (TARGET_64BIT ? 6 : 3)
-
-#define SSE_REGPARM_MAX (TARGET_64BIT ? 8 : (TARGET_SSE ? 3 : 0))
+/* Abi specific values for REGPARM_MAX and SSE_REGPARM_MAX */
+#define X86_64_REGPARM_MAX 6
+#define X64_REGPARM_MAX 4
+#define X86_32_REGPARM_MAX 3
+
+#define X86_64_SSE_REGPARM_MAX 8
+#define X64_SSE_REGPARM_MAX 4
+#define X86_32_SSE_REGPARM_MAX (TARGET_SSE ? 3 : 0)
+
+#define REGPARM_MAX (TARGET_64BIT ? (TARGET_64BIT_MS_ABI ? X64_REGPARM_MAX \
+                                     : X86_64_REGPARM_MAX) \
+                     : X86_32_REGPARM_MAX)
+
+#define SSE_REGPARM_MAX (TARGET_64BIT ? (TARGET_64BIT_MS_ABI ? X64_SSE_REGPARM_MAX \
+                                         : X86_64_SSE_REGPARM_MAX) \
+                         : X86_32_SSE_REGPARM_MAX)

weren't properly mentioned in ChangeLog.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39473
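To make the 32-bit concern concrete, the following is a rough sketch that models the quoted macros as plain C functions. The function names and the boolean parameters are hypothetical stand-ins for TARGET_64BIT, TARGET_64BIT_MS_ABI, TARGET_SSE and the DEFAULT_ABI comparison; only the numeric values are taken from the hunk above.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from the quoted hunk.  */
enum {
    X86_64_SSE_REGPARM_MAX = 8,
    X64_SSE_REGPARM_MAX = 4
};

/* Hypothetical stand-in for the SSE_REGPARM_MAX macro.  The three flags
   model TARGET_64BIT, TARGET_64BIT_MS_ABI and TARGET_SSE.  */
int sse_regparm_max(bool target_64bit, bool ms_abi, bool target_sse)
{
    if (target_64bit)
        return ms_abi ? X64_SSE_REGPARM_MAX : X86_64_SSE_REGPARM_MAX;
    return target_sse ? 3 : 0;  /* X86_32_SSE_REGPARM_MAX */
}

/* The operand expression quoted from untyped_call.  Note that it only
   looks at the default ABI and never takes the 32-bit branch.  */
int untyped_call_operand(bool default_abi_is_sysv)
{
    return (default_abi_is_sysv ? X86_64_SSE_REGPARM_MAX
                                : X64_SSE_REGPARM_MAX) - 1;
}
```

Under these assumptions, a 32-bit target with SSE would get 3 from the macro model, while the expression quoted from untyped_call yields 7 or 3 depending only on the default ABI. Whether the trailing `- 1` is intentional is not something this sketch can answer.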
[Bug debug/39474] New: DW_AT_location missing for unused variables even at -O0
It is a regression since gcc-4.3, but it was found only with an artificial (GDB) testcase. At -O2 such behavior is even expected. The variable is considered optimized-out, which should not happen at -O0.

Testcase:
--
int main (void) { int var; return 0; }
--
gcc -Wall -g

WRONG gcc-4.4:
    <c> DW_AT_producer: (indirect string, offset: 0xb): GNU C 4.4.0 20090315 (experimental)
 <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
    <2f> DW_AT_name: (indirect string, offset: 0x39): main
 <2><52>: Abbrev Number: 3 (DW_TAG_variable)
    <53> DW_AT_name: var
    <59> DW_AT_type: 0x5e

Correct gcc-4.3:
    <c> DW_AT_producer: (indirect string, offset: 0xf): GNU C 4.3.2 20081105 (Red Hat 4.3.2-7)
 <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
    <2f> DW_AT_name: (indirect string, offset: 0x36): main
 <2><52>: Abbrev Number: 3 (DW_TAG_variable)
    <53> DW_AT_name: var
    <59> DW_AT_type: 0x61
    <5d> DW_AT_location: 2 byte block: 91 6c (DW_OP_fbreg: -20)

-- Summary: DW_AT_location missing for unused variables even at -O0 Product: gcc Version: 4.4.0 Status: UNCONFIRMED Severity: minor Priority: P3 Component: debug AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: jan dot kratochvil at redhat dot com GCC target triplet: x86_64-unknown-linux-gnu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39474
[Bug c++/39475] New: c++0x type-traits should error out in case of incompleteness
The current implementation returns misleading results if used the wrong way. A simple example is:

#include <iostream>
struct X;
int main() { std::cout << __is_abstract(X) << std::endl; }

This compiles and prints 0. Things get worse when templates are involved. PR libstdc++/39405 shows why this can be a real problem. I attach the example code from 39405 to this PR again.

-- Summary: c++0x type-traits should error out in case of incompleteness Product: gcc Version: 4.3.2 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: d dot frey at gmx dot de http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39475
[Bug c++/39475] c++0x type-traits should error out in case of incompleteness
--- Comment #1 from d dot frey at gmx dot de 2009-03-16 19:05 --- Created an attachment (id=17468) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17468&action=view) show inconsistency for is_abstract -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39475
[Bug rtl-optimization/30688] Branch registers loaded too late on ia64
--- Comment #6 from wilson at codesourcery dot com 2009-03-16 19:07 --- Subject: Re: Branch registers loaded too late on ia64

steven at gcc dot gnu dot org wrote: --- Comment #5 from steven at gcc dot gnu dot org 2009-03-16 08:46 --- Can someone point me to the IA64 optimization manuals mentioned in comment #0?

You can find manuals on the Intel web site. You want the Intel Itanium 2 Processor Reference Manual (For Software Development and Optimization). Chapter 7 talks about branch instructions.

* Which branch registers can I use?

Any one of the 8 special branch registers, class BR_REGS.

* What does "as early as possible" mean in comment #0?

The manual says there should be several cycles between the branch register write and the branch for correct prediction. There is probably no "too early" to worry about, as long as you don't use more than the available 8 registers. You want to avoid reloads here. Some of the regs are call clobbered, some are preserved, and probably some are reserved for call/return. I don't recall all of the ABI details. You can look them up in the manuals. See the Itanium Software Conventions and Runtime Architecture Guide.

* What happens if a value is assigned to a branch register on IA64? Is the prefetcher always triggered? What is the latency of the prefetching after a branch register has been assigned a value?

This is complicated. I suggest downloading the docs and reading them.

* Is there a possibility to add a prediction hint to say "branch register A is more likely to be used than branch register B" when multiple branch registers are assigned a value in the same basic block?

There is separate predication support for each branch register, but I assume this is about priority for prefetching? Yes, there are branch hints for that. See the Itanium Architecture Software Developer's Manual, Volume 1; section 4.5 covers branch instructions. There is a "few" completer for prefetching a few lines, and a "many" completer for prefetching many lines. ia64.md uses "many" for call and return.

Jim

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30688
[Bug libstdc++/39405] [4.3 regression] std::shared_ptr barfs on incomplete template class that boost::shared_ptr accepts
--- Comment #27 from d dot frey at gmx dot de 2009-03-16 19:08 --- Thanks Paolo. I've opened PR c++/39475 for the type traits intrinsics. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39405
[Bug c++/39475] c++0x type-traits should error out in case of incompleteness
--- Comment #2 from paolo dot carlini at oracle dot com 2009-03-16 19:20 --- Indeed, ICC errors out. -- paolo dot carlini at oracle dot com changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |paolo dot carlini at oracle |dot org |dot com Status|UNCONFIRMED |ASSIGNED Ever Confirmed|0 |1 Last reconfirmed|-00-00 00:00:00 |2009-03-16 19:20:13 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39475
[Bug target/39473] Typo in untyped_call in i386.md
--- Comment #4 from hjl dot tools at gmail dot com 2009-03-16 19:21 --- A patch is posted at http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00749.html -- hjl dot tools at gmail dot com changed: What|Removed |Added URL||http://gcc.gnu.org/ml/gcc- ||patches/2009- ||03/msg00749.html http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39473
[Bug testsuite/37628] gcc.c-torture/execute/pr35456.c is not generic
--- Comment #1 from janis at gcc dot gnu dot org 2009-03-16 19:58 --- Subject: Bug 37628 Author: janis Date: Mon Mar 16 19:58:32 2009 New Revision: 144890 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144890 Log: PR testsuite/37628 * gcc.c-torture/execute/pr35456.x: New, skip for vax. Added: trunk/gcc/testsuite/gcc.c-torture/execute/pr35456.x Modified: trunk/gcc/testsuite/ChangeLog -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37628
[Bug testsuite/37630] gcc.dg/20001012-1.c depends on IEEE FP encoding
--- Comment #2 from janis at gcc dot gnu dot org 2009-03-16 19:59 --- Subject: Bug 37630 Author: janis Date: Mon Mar 16 19:59:37 2009 New Revision: 144891 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144891 Log: PR testsuite/37630 * lib/target-supports.exp (check_effective_target_ieee): New. * gcc.c-torture/execute/ieee/ieee.exp: Use it. * gcc.dg/20001012-1.c: Require ieee. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.c-torture/execute/ieee/ieee.exp trunk/gcc/testsuite/gcc.dg/20001012-1.c trunk/gcc/testsuite/lib/target-supports.exp -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37630
[Bug testsuite/37960] FAIL: gcc.dg/pr11492.c (test for bogus messages, line 8)
--- Comment #10 from janis at gcc dot gnu dot org 2009-03-16 20:01 --- Subject: Bug 37960 Author: janis Date: Mon Mar 16 20:01:15 2009 New Revision: 144892 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144892 Log: PR testsuite/37960 * gcc.dg/pr11492.c: Replace constant and remove xfail. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/pr11492.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37960
[Bug debug/39471] DW_TAG_imported_module should be used (not DW_TAG_imported_declaration)
--- Comment #1 from jakub at gcc dot gnu dot org 2009-03-16 20:55 --- Created an attachment (id=17469) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17469&action=view) gcc44-pr39471.patch

Untested patch. Dodji, any reason why you started emitting DW_TAG_imported_declaration for this instead of DW_TAG_imported_module? Also, looking at the http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37410#c6 comment, I'm wondering about the "C++ doesn't allow that usage" part in the comment. Isn't:

namespace A
{
  int i = 1;
  int j = 2;
}

namespace B
{
  int k = 3;
}

int k = 13;

int
main ()
{
  using namespace A;
  i++;
  j++;
  k++;
  {
    using B::k;
    k++;
  }
  return 0;
}

a testcase which needs IMPORTED_DECL with non-NAMESPACE_DECL IMPORTED_DECL_ASSOCIATED_DECL?

-- jakub at gcc dot gnu dot org changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |jakub at gcc dot gnu dot org |dot org | Status|UNCONFIRMED |ASSIGNED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39471
[Bug target/39473] Typo in untyped_call in i386.md
--- Comment #5 from hjl dot tools at gmail dot com 2009-03-16 21:00 --- An updated patch is posted at http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00754.html -- hjl dot tools at gmail dot com changed: What|Removed |Added URL|http://gcc.gnu.org/ml/gcc- |http://gcc.gnu.org/ml/gcc- |patches/2009- |patches/2009- |03/msg00749.html|03/msg00754.html http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39473
[Bug testsuite/37630] gcc.dg/20001012-1.c depends on IEEE FP encoding
--- Comment #3 from janis at gcc dot gnu dot org 2009-03-16 21:12 --- Subject: Bug 37630 Author: janis Date: Mon Mar 16 21:11:57 2009 New Revision: 144893 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=144893 Log: Revert patch for PR testsuite/37630. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.c-torture/execute/ieee/ieee.exp trunk/gcc/testsuite/gcc.dg/20001012-1.c trunk/gcc/testsuite/lib/target-supports.exp -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37630
[Bug target/39291] _Unwind_Backtrace fails.
--- Comment #6 from pluto at agmk dot net 2009-03-16 21:24 --- i've tested u-dw2.exe on wine and got more info. $ ./u-dw2.exe err:process:start_wineboot failed to start wineboot, err 2 err:process:__wine_kernel_init boot event wait timed out fixme:msvcrt:__lconv_init stub foo:enter bar:enter zoo:enter boom! signalHandler:enter lookupSymbol: 00401887 lookupSymbol: 0040166A signalHandler:longjmp err:seh:raise_exception Unhandled exception code c096 flags 0 addr 0x409461 $ i486-pc-mingw32-objdump -hw u-dw2.exe u-dw2.exe: file format pei-i386 Sections: Idx Name Size VMA LMA File off Algn Flags 0 .text 54c4 00401000 00401000 0400 2**4 CONTENTS, ALLOC, LOAD, READONLY, CODE, DATA 1 .data 0030 00407000 00407000 5a00 2**2 CONTENTS, ALLOC, LOAD, DATA 2 .rdata0c28 00408000 00408000 5c00 2**2 CONTENTS, ALLOC, LOAD, READONLY, DATA 3 .bss 0538 00409000 00409000 2**5 ALLOC 4 .idata0558 0040a000 0040a000 6a00 2**2 CONTENTS, ALLOC, LOAD, DATA 5 .CRT 0034 0040b000 0040b000 7000 2**2 CONTENTS, ALLOC, LOAD, DATA 6 .tls 0008 0040c000 0040c000 7200 2**2 CONTENTS, ALLOC, LOAD, DATA the 0xc096 code means 'EXCEPTION_PRIV_INSTRUCTION' and the 0x409461 points to the .bss section. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39291
[Bug debug/39471] DW_TAG_imported_module should be used (not DW_TAG_imported_declaration)
--- Comment #2 from jan dot kratochvil at redhat dot com 2009-03-16 21:37 ---
Thanks, although there is still an excessive DW_AT_name:

 <3><422>: Abbrev Number: 12 (DW_TAG_imported_module)
    <425>   DW_AT_name        : A
    <427>   DW_AT_import      : <0x113> [Abbrev Number: 2 (DW_TAG_namespace)]

DW_AT_name looks undefined to me for DW_TAG_imported_module, and it certainly breaks the current Archer C++ implementation.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39471
[Bug target/39476] New: Typo in ix86_function_regparm in i386.c
ix86_function_regparm in i386.c has

  if (TARGET_64BIT)
    {
      if (ix86_function_type_abi (type) == ix86_abi)
        return regparm;
      return ix86_abi != SYSV_ABI ? X86_64_REGPARM_MAX : X64_REGPARM_MAX;
    }

Shouldn't it be

      return ix86_abi == SYSV_ABI ? X86_64_REGPARM_MAX : X64_REGPARM_MAX;

--
Summary: Typo in ix86_function_regparm in i386.c
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: hjl dot tools at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39476
[Bug target/39476] Typo in ix86_function_regparm in i386.c
--- Comment #1 from hjl dot tools at gmail dot com 2009-03-16 21:59 ---
It is

  if (TARGET_64BIT)
    {
      if (ix86_function_type_abi (type) == DEFAULT_ABI)
        return regparm;
      return DEFAULT_ABI != SYSV_ABI ? X86_64_REGPARM_MAX : X64_REGPARM_MAX;
    }

Shouldn't it be

      return DEFAULT_ABI == SYSV_ABI ? X86_64_REGPARM_MAX : X64_REGPARM_MAX;

-- hjl dot tools at gmail dot com changed: What|Removed |Added CC||ktietz at onevision dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39476
[Bug bootstrap/39470] [melt] - lrand48_r() and srand48_r() are GNU extensions and are not portable
--- Comment #2 from rob1weld at aol dot com 2009-03-16 22:08 --- My next difficulty (on OpenSolaris) is the lack of a fopencookie() function (and the related support in FILE). I'm now building melt on i686-pc-linux-gnu and running into a few other errors; thus melt does need some fixing, even on a Linux Operating System. Rob -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39470
[Bug target/39476] Typo in ix86_function_regparm in i386.c
--- Comment #2 from hjl dot tools at gmail dot com 2009-03-16 22:09 ---
We never change regparm for 64bit. Does this patch

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c (revision 144817)
+++ gcc/config/i386/i386.c (working copy)
@@ -4273,17 +4273,15 @@
 static int
 ix86_function_regparm (const_tree type, const_tree decl)
 {
   tree attr;
-  int regparm = ix86_regparm;
+  int regparm;
   static bool error_issued;
 
   if (TARGET_64BIT)
-    {
-      if (ix86_function_type_abi (type) == DEFAULT_ABI)
-        return regparm;
-      return DEFAULT_ABI != SYSV_ABI ? X86_64_REGPARM_MAX : X64_REGPARM_MAX;
-    }
+    return (ix86_function_type_abi (type) == SYSV_ABI
+            ? X86_64_REGPARM_MAX : X64_REGPARM_MAX);
 
+  regparm = ix86_regparm;
   attr = lookup_attribute ("regparm", TYPE_ATTRIBUTES (type));
   if (attr)
     {

look OK?
-- hjl dot tools at gmail dot com changed: What|Removed |Added CC||jh at suse dot cz http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39476
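The intent of the proposed change can be sketched outside GCC. The following is a minimal, self-contained model, not the code in i386.c: the enum `abi_kind` and the function `regparm_max_64bit` are hypothetical names introduced for illustration, and the limits 6 and 4 are the numbers of integer argument registers in the System V and Microsoft x64 calling conventions. The point it shows is that the 64-bit limit follows from the ABI of the function being called alone, not from the compiler's default ABI.

```c
/* Hypothetical stand-ins for GCC's SYSV_ABI / MS_ABI values.  */
enum abi_kind { ABI_SYSV, ABI_MS };

/* Integer argument registers available under each 64-bit convention.  */
#define SYSV_REGPARM_MAX 6   /* rdi, rsi, rdx, rcx, r8, r9 */
#define MS_REGPARM_MAX   4   /* rcx, rdx, r8, r9 */

/* The limit depends only on the ABI of the function itself; the
   default ABI of the compilation does not enter into it.  */
static int
regparm_max_64bit (enum abi_kind function_abi)
{
  return function_abi == ABI_SYSV ? SYSV_REGPARM_MAX : MS_REGPARM_MAX;
}
```

Under this model a System V function always gets 6 and a Microsoft-ABI function always gets 4, whichever ABI happens to be the default.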
[Bug middle-end/39447] ICE in create_data_ref with -O1 -floop-interchange
--- Comment #3 from il dot basso dot buffo at gmail dot com 2009-03-16 22:21 ---
Here's a further reduction:

struct Point
{
  int line, col;
  Point( int l = -1, int c = 0 ) throw() : line( l ), col( c ) {}
  bool operator==( const Point p ) const throw()
    { return ( line == p.line && col == p.col ); }
  bool operator<( const Point p ) const throw()
    { return ( line < p.line || ( line == p.line && col < p.col ) ); }
};

class Basic_buffer
{
protected:
  void add_line( const char * const buf, const int len );
public:
  Basic_buffer( const Basic_buffer b, const Point p1, const Point p2 );
  int characters( const int line ) const throw();
  int pgetc( Point p ) const throw();
  Point eof() const throw() { return Point( 0, 0 ); }
  bool pisvalid( const Point p ) const throw()
    { return ( ( p.col >= 0 && p.col < characters( p.line ) ) || p == eof() ); }
};

class Buffer : public Basic_buffer
{
public:
  bool save( Point p1 = Point(), Point p2 = Point() ) const;
};

bool Buffer::save( Point p1, Point p2 ) const
{
  if( !this->pisvalid( p1 ) ) p1 = eof();
  if( !this->pisvalid( p2 ) ) p2 = eof();
  for( Point p = p1; p < p2; ) { pgetc( p ); }
  return true;
}

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39447
[Bug middle-end/39447] ICE in create_data_ref with -O1 -floop-interchange
--- Comment #4 from il dot basso dot buffo at gmail dot com 2009-03-16 22:24 ---
Bah, here's an even smaller example:

struct Point
{
  int line, col;
  Point( int l = -1, int c = 0 ) throw() : line( l ), col( c ) {}
  bool operator==( const Point p ) const throw()
    { return ( line == p.line && col == p.col ); }
  bool operator<( const Point p ) const throw()
    { return ( line < p.line || ( line == p.line && col < p.col ) ); }
};

class Buffer
{
public:
  int characters( const int line ) const throw();
  int pgetc( Point p ) const throw();
  Point eof() const throw() { return Point( 0, 0 ); }
  bool pisvalid( const Point p ) const throw()
    { return ( ( p.col >= 0 && p.col < characters( p.line ) ) || p == eof() ); }
  bool save( Point p1 = Point(), Point p2 = Point() ) const;
};

bool Buffer::save( Point p1, Point p2 ) const
{
  if( !this->pisvalid( p1 ) ) p1 = eof();
  if( !this->pisvalid( p2 ) ) p2 = eof();
  for( Point p = p1; p < p2; ) { pgetc( p ); }
  return true;
}

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39447
[Bug middle-end/39447] ICE in create_data_ref with -O1 -floop-interchange
--- Comment #5 from sebpop at gmail dot com 2009-03-16 22:34 --- Subject: Re: ICE in create_data_ref with -O1 -floop-interchange Thanks for the reduced testcase, it completely went out of my radar (by now my delta script should have finished reducing it as well on the gcc-farm, but I won't even look at it). Thanks again for the reduced case. I will look at the bug now. Sebastian -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39447
[Bug debug/39474] DW_AT_location missing for unused variables even at -O0
--- Comment #1 from rguenth at gcc dot gnu dot org 2009-03-16 22:59 --- Well, it doesn't even have a value assigned. So I consider this a valid optimization for -O0. Does the variable have a location once you initialize it? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39474
[Bug c++/39429] compiler create bad asm codes.
--- Comment #2 from rearnsha at gcc dot gnu dot org 2009-03-16 22:53 --- Confirmed. This is a bug in the arith_adjacent_mem pattern that only triggers when the offset to the memory from the base pointer exceeds the range of a simple add instruction (i.e. more than 1024 bytes). In that case we fall back to emitting two ldr instructions, but fail to consider the case when the first load overwrites the base address. -- rearnsha at gcc dot gnu dot org changed: What|Removed |Added CC||rearnsha at gcc dot gnu dot ||org, ramana dot r at gmail ||dot com Status|UNCONFIRMED |NEW Ever Confirmed|0 |1 Last reconfirmed|-00-00 00:00:00 |2009-03-16 22:53:12 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39429
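The hazard described here can be illustrated with a small simulation. This is only a sketch of the ordering problem, not the arith_adjacent_mem pattern or the eventual fix: `struct machine`, `load_word` and `load_adjacent` are hypothetical names, and the "registers" and "memory" are plain arrays. When the destination of the first load is also the base register, loading it first destroys the address needed for the second load, so the sketch performs the second load first in that case.

```c
#include <stdint.h>

/* Hypothetical model: 16 registers and a word-addressed memory.  */
struct machine
{
  uint32_t reg[16];
  const uint32_t *mem;   /* mem[i] holds the word at byte address 4*i */
};

static uint32_t
load_word (const struct machine *m, uint32_t addr)
{
  return m->mem[addr / 4];
}

/* Load the words at base+off and base+off+4 into rd0 and rd1
   (rd0 and rd1 are assumed to be different registers).  */
static void
load_adjacent (struct machine *m, int rd0, int rd1, int base, uint32_t off)
{
  if (rd0 == base)
    {
      /* Loading rd0 first would overwrite the base address,
         so fetch the second word before the first.  */
      m->reg[rd1] = load_word (m, m->reg[base] + off + 4);
      m->reg[rd0] = load_word (m, m->reg[base] + off);
    }
  else
    {
      /* Safe order: if rd1 is the base, it is overwritten last.  */
      m->reg[rd0] = load_word (m, m->reg[base] + off);
      m->reg[rd1] = load_word (m, m->reg[base] + off + 4);
    }
}
```

With the naive order (always loading rd0 first), the call in which rd0 is the base register would compute the second address from the freshly loaded data instead of from the original pointer.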
[Bug target/39429] compiler create bad asm codes.
-- rearnsha at gcc dot gnu dot org changed: What|Removed |Added Keywords||wrong-code Priority|P3 |P2 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39429
Re: [Bug middle-end/39447] ICE in create_data_ref with -O1 -floop-interchange
Hi,

I don't know who coded the overly complicated exclude_component_ref. In the graphite branch we already cleaned up all this code, but in trunk we still have it. Attached is a patch that fixes the problem by looking at whether the operand contains COMPONENT_REFs before calling the data reference analysis. I'm testing the patch on the gcc farm, and will send it to the gcc-patches once it finishes regstrap.

Sebastian

    * graphite.c (exclude_component_ref): Renamed contains_component_ref_p.
    (is_simple_operand): Call contains_component_ref_p before calling data
    reference analysis that would fail on COMPONENT_REFs.

Index: graphite.c
===================================================================
--- graphite.c (revision 144893)
+++ graphite.c (working copy)
@@ -1062,27 +1062,20 @@ loop_affine_expr (basic_block scop_entry
    is component_ref.  */
 
 static bool
-exclude_component_ref (tree op)
+contains_component_ref_p (tree op)
 {
   int i;
-  int len;
 
-  if (op)
-    {
-      if (TREE_CODE (op) == COMPONENT_REF)
-        return false;
-      else
-        {
-          len = TREE_OPERAND_LENGTH (op);
-          for (i = 0; i < len; ++i)
-            {
-              if (!exclude_component_ref (TREE_OPERAND (op, i)))
-                return false;
-            }
-        }
-    }
+  if (!op)
+    return false;
 
-  return true;
+  if (TREE_CODE (op) == COMPONENT_REF)
+    return true;
+
+  for (i = 0; i < TREE_OPERAND_LENGTH (op); i++)
+    return contains_component_ref_p (TREE_OPERAND (op, i));
+
+  return false;
 }
 
 /* Return true if the operand OP is simple.  */
@@ -1094,13 +1087,15 @@ is_simple_operand (loop_p loop, gimple s
   if (DECL_P (op)
       /* or a structure,  */
       || AGGREGATE_TYPE_P (TREE_TYPE (op))
+      /* or a COMPONENT_REF,  */
+      || contains_component_ref_p (op)
       /* or a memory access that cannot be analyzed by the data
          reference analysis.  */
       || ((handled_component_p (op) || INDIRECT_REF_P (op))
           && !stmt_simple_memref_p (loop, stmt, op)))
     return false;
 
-  return exclude_component_ref (op);
+  return true;
 }
 
 /* Return true only when STMT is simple enough for being handled by
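One detail of the posted loop is worth noting: as written, it returns the result for the first operand unconditionally, so later operands are never examined. A recursive "contains" check normally returns early only on a match. The following is a minimal sketch of that shape on a toy expression tree; `struct node`, `enum node_code` and `contains_component_ref` are hypothetical names for illustration and are not GCC's `tree` API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature expression tree; not GCC's `tree' type.  */
enum node_code { PLUS_NODE, VAR_NODE, COMPONENT_REF_NODE };

struct node
{
  enum node_code code;
  size_t n_operands;
  struct node *operand[2];
};

/* Return true if OP, or any operand below it, is a COMPONENT_REF_NODE.  */
static bool
contains_component_ref (const struct node *op)
{
  size_t i;

  if (!op)
    return false;

  if (op->code == COMPONENT_REF_NODE)
    return true;

  /* Return early only when a match is found, so that every
     operand gets examined before the search gives up.  */
  for (i = 0; i < op->n_operands; i++)
    if (contains_component_ref (op->operand[i]))
      return true;

  return false;
}
```

For a node whose first operand is a plain variable and whose second operand is a component reference, this version reports a match, whereas a loop that returns on the first iteration would not.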
[Bug middle-end/39447] ICE in create_data_ref with -O1 -floop-interchange
--- Comment #6 from sebpop at gmail dot com 2009-03-16 23:18 --- Subject: Re: ICE in create_data_ref with -O1 -floop-interchange Hi, I don't know who coded the overly complicated exclude_component_ref. In the graphite branch we already cleaned up all this code, but in trunk we still have it. Attached is a patch that fixes the problem by looking at whether the operand contains COMPONENT_REFs before calling the data reference analysis. I'm testing the patch on the gcc farm, and will send it to the gcc-patches once it finishes regstrap. Sebastian
--- Comment #7 from sebpop at gmail dot com 2009-03-16 23:18 --- Created an attachment (id=17470) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17470&action=view) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39447
[Bug inline-asm/38815] Taking the address of a __thread variable prevents the r0 register from being loaded
--- Comment #2 from rearnsha at gcc dot gnu dot org 2009-03-16 23:27 --- I believe this is a bug in the way we expand local reg vars. The manual says: Local register variables in specific registers do not reserve the registers, except at the point where they are used as input or output operands in an @code{asm} statement and the @code{asm} statement itself is not deleted. The compiler's data flow analysis is capable of determining where the specified registers contain live values, and where they are available for other uses. There are two key points to note in the above: 1) The only point at which a register variable *has* to be in the named register is when an inline ASM appears. 2) Data flow is supposed to know when the value is live. I thus believe we need to expand local vars as used in this test-case by copying a pseudo reg that contains the real value into the required register immediately before its use in an ASM -- and to leave optimizing this code path to the register allocator -- so that ideally no copy is necessary. In the test-case cited, the user assigns the variable r0 with a value and then tries to assign another value to the variable r1. The second step requires a libcall sequence that clobbers the value previously stored into r0 -- to avoid this happening the value previously assigned must be copied to a call-saved register (or the assignment deferred until after the libcall). -- rearnsha at gcc dot gnu dot org changed: What|Removed |Added CC||rearnsha at gcc dot gnu dot ||org, ramana dot r at gmail ||dot com Status|UNCONFIRMED |NEW Component|target |inline-asm Ever Confirmed|0 |1 Keywords||wrong-code Last reconfirmed|-00-00 00:00:00 |2009-03-16 23:27:22 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38815
[Bug middle-end/38674] When storing in a register the address of a value contained in the same register, gcc 4.3.2 on ARM clobbers the register before saving its content on the stack.
--- Comment #2 from rearnsha at gcc dot gnu dot org 2009-03-16 23:38 --- Confirmed. We need a way to represent an early-clobber between a register and a memory-address with side-effects. -- rearnsha at gcc dot gnu dot org changed: What|Removed |Added CC||rearnsha at gcc dot gnu dot ||org, ramana dot r at gmail ||dot com Status|UNCONFIRMED |NEW Ever Confirmed|0 |1 Last reconfirmed|-00-00 00:00:00 |2009-03-16 23:38:45 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38674
[Bug middle-end/38674] When storing in a register the address of a value contained in the same register, gcc 4.3.2 on ARM clobbers the register before saving its content on the stack.
-- rearnsha at gcc dot gnu dot org changed: What|Removed |Added Keywords||wrong-code Priority|P3 |P2 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38674
[Bug c++/39475] c++0x type-traits should error out in case of incompleteness
--- Comment #3 from d dot frey at gmx dot de 2009-03-16 23:49 ---
One more thought on the diagnostics: There are two cases: incomplete types (as in the initial example in the description of this PR) and recursive template instantiations (see attachment). I think the latter produces a diagnostic which suggests it is the former. This problem not only affects C++0x, it also happens for normal C++:

f...@viasko:~/work/test/recursive_instantiation$ cat t.cc
template< typename T >
struct foo
{
  typename T::type dummy();
};

template< typename T >
struct bar
{
  typedef void type;
  foo< bar > p;
};

foo< bar< int > > x;
f...@viasko:~/work/test/recursive_instantiation$ g++ t.cc
t.cc: In instantiation of 'bar<int>':
t.cc:4:   instantiated from 'foo<bar<int> >'
t.cc:14:   instantiated from here
t.cc:11: error: 'bar<T>::p' has incomplete type
t.cc:3: error: declaration of 'struct foo<bar<int> >'
f...@viasko:~/work/test/recursive_instantiation$

g++ is Ubuntu's GCC 4.3.2. The error message says bar<T>::p is incomplete, but there is no hint why this is the case. Would it be possible to generally improve this type of diagnostic? Should I open yet another PR or is that not possible/worth it/...?
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39475
[Bug libobjc/39465] libobjc does not find classes of DLLs
--- Comment #6 from ayers at gcc dot gnu dot org 2009-03-16 23:51 --- I've played a bit with creating a trivial static library and linking it into a dynamic library and into an executable. After tweaking back and forth it seems that, at least on GNU/Linux, the static version linked into the executable actually replaces the version that was linked into the dynamic library... not sure what would happen if the version linked in last doesn't satisfy all the requirements needed by the dynamic library. All very intriguing, yet I believe it has nothing to do with your issue. Since I wasn't able to get a cross tool chain running (and www.mingw.org doesn't seem to support that with the current gcc versions) I went ahead and updated an old Windows VM, installed all kinds of updates... and then installed MinGW/MSYS natively. First I reproduced your issue successfully and then went about installing GNUstep. Note that GNUstep's MinGW HOWTO explicitly states: It's a good idea to remove the libobjc.a and libobjc.la and include/objc headers that come with gcc (gcc -v for location) so that they are not accidentally found instead of the libobjc DLL that you will compile below. ... After installing the GNUstep packages, I was able to build and execute applications. Now GNUstep uses its own build environment (gnustep-make) to hide all the fancy stuff that needs to be done on Windows. I was hoping to see something with messages=yes to give me an indication of what you need to do. Yet I had no luck in identifying anything interesting. Well, except that GNUstep is using a shared libobjc. I'm going to throw in the towel here, but I don't believe your issue has to do with libobjc. I think you're missing some flag or extra processing that gnustep-make might do for your dll or the program. But I also believe that statically linking (potentially different versions) of libobjc into different modules is error prone. 
I guess it would be OK, if you only have a single executable, but the constellation of the dll linking one version and the executable potentially linking another scares me... even if that itself is most likely not your issue either. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39465
[Bug target/38644] Optimization flag -O1 -fschedule-insns2 causes wrong code
--- Comment #2 from rearnsha at gcc dot gnu dot org 2009-03-17 00:03 --- Confirmed, this is a nasty bug that might silently bite users after a long period of apparently correct operation. -- rearnsha at gcc dot gnu dot org changed: What|Removed |Added CC||rearnsha at gcc dot gnu dot ||org, ramana dot r at gmail ||dot com Status|UNCONFIRMED |NEW Ever Confirmed|0 |1 Priority|P3 |P2 Last reconfirmed|-00-00 00:00:00 |2009-03-17 00:03:45 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38644
[Bug target/10242] [ARM] subsequent use of plus and minus operators could be improved
--- Comment #4 from ramana dot r at gmail dot com 2009-03-17 00:05 --- Still present with 4.4 mainline as of the 20090312 revision. It looks like some sort of relic left behind by the calculations of the soft frame pointer. Maybe a peephole will help. -- ramana dot r at gmail dot com changed: What|Removed |Added CC||rearnsha at gcc dot gnu dot ||org, ramana dot r at gmail ||dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10242
[Bug rtl-optimization/11222] arm/thumb __Unwind_SjLj_Register prologue optimization causes crash on interrupts
--- Comment #10 from ramana dot r at gmail dot com 2009-03-17 00:11 --- This should be a target bug. Also with mainline the testcase empty described in comment #9 appears fixed. -- ramana dot r at gmail dot com changed: What|Removed |Added CC||rearnsha at gcc dot gnu dot ||org, ramana dot r at gmail ||dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=11222
[Bug target/10242] [ARM] subsequent use of plus and minus operators could be improved
--- Comment #5 from rearnsha at gcc dot gnu dot org 2009-03-17 00:15 --- This is a case where early splitting (before register allocation) of a constant in a plus expression leads to poor code. We should try disabling the split of a plus when combined with the internal frame pointer. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10242
[Bug target/39477] New: Incorrect document for regparm attribute
extend.texi has

---
@item regparm (@var{number})
@cindex @code{regparm} attribute
@cindex functions that are passed arguments in registers on the 386
On the Intel 386, the @code{regparm} attribute causes the compiler to
pass arguments number one to @var{number} if they are of integral type
in registers EAX, EDX, and ECX instead of on the stack. Functions that
take a variable number of arguments will continue to be passed all of
their arguments on the stack.

Beware that on some ELF systems this attribute is unsuitable for
global functions in shared libraries with lazy binding (which is the
default). Lazy binding will send the first call via resolving code in
the loader, which might assume EAX, EDX and ECX can be clobbered, as
per the standard calling conventions. Solaris 8 is affected by this.
GNU systems with GLIBC 2.1 or higher, and FreeBSD, are believed to be
safe since the loaders there save all registers. (Lazy binding can be
disabled with the linker or the loader if desired, to avoid the
problem.)
---

Although glibc is safe since it preserves EAX, EDX and ECX:

_dl_runtime_resolve:
        cfi_adjust_cfa_offset (8)
        pushl %eax              # Preserve registers otherwise clobbered.
        cfi_adjust_cfa_offset (4)
        pushl %ecx
        cfi_adjust_cfa_offset (4)
        pushl %edx
        cfi_adjust_cfa_offset (4)
        movl 16(%esp), %edx     # Copy args pushed by PLT in register.  Note
        movl 12(%esp), %eax     # that `fixup' takes its parameters in regs.
        call _dl_fixup          # Call resolver.

it doesn't save all registers.

--
Summary: Incorrect document for regparm attribute
Product: gcc
Version: 4.4.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: hjl dot tools at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39477
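For readers unfamiliar with the attribute under discussion, here is a minimal usage sketch. The macro `REGPARM3` and the function `add3` are hypothetical names for illustration. The attribute is only meaningful on 32-bit x86, so the sketch guards it and expands to nothing on other targets, where the function simply uses the platform's default convention.

```c
/* regparm only has an effect on 32-bit x86; elsewhere the macro is empty.  */
#if defined(__GNUC__) && defined(__i386__)
# define REGPARM3 __attribute__ ((regparm (3)))
#else
# define REGPARM3
#endif

/* On i386 the three integer arguments are passed in EAX, EDX and ECX
   instead of on the stack.  */
static int REGPARM3
add3 (int a, int b, int c)
{
  return a + b + c;
}
```

A `static` function like this one is never reached through the PLT, so the lazy-binding concern quoted above applies only to global functions exported from shared libraries.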
[Bug target/39477] Incorrect document for regparm attribute
-- hjl dot tools at gmail dot com changed: What|Removed |Added GCC target triplet||i686-pc-linux-gnu Target Milestone|--- |4.4.0 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39477
[Bug c++/39475] c++0x type-traits should error out in case of incompleteness
--- Comment #4 from paolo dot carlini at oracle dot com 2009-03-17 00:34 --- Maybe Daniel, but this is a completely separate issue. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39475
[Bug target/39477] Incorrect document for regparm attribute
--- Comment #1 from hjl dot tools at gmail dot com 2009-03-17 00:45 --- A patch is posted at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39477 -- hjl dot tools at gmail dot com changed: What|Removed |Added CC||ubizjak at gmail dot com URL||http://gcc.gnu.org/bugzilla/ ||show_bug.cgi?id=39477 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39477
[Bug target/39476] Typo in ix86_function_regparm in i386.c
--- Comment #3 from hjl dot tools at gmail dot com 2009-03-17 01:24 --- A patch is posted at http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00761.html -- hjl dot tools at gmail dot com changed: What|Removed |Added URL||http://gcc.gnu.org/ml/gcc- ||patches/2009- ||03/msg00761.html Target Milestone|--- |4.4.0 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39476
[Bug target/39473] Typo in untyped_call in i386.md
-- hjl dot tools at gmail dot com changed: What|Removed |Added Target Milestone|--- |4.4.0 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39473
[Bug target/35180] built-in-setjmp.x2
--- Comment #1 from hp at gcc dot gnu dot org 2009-03-17 04:18 --- Does this still happen? See also PR38609. -- hp at gcc dot gnu dot org changed: What|Removed |Added CC||hp at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35180
[Bug target/38609] [4.4 Regression]: gcc.c-torture/execute/built-in-setjmp.c execute -O2 and above
--- Comment #9 from hp at gcc dot gnu dot org 2009-03-17 05:35 --- (In reply to comment #8) Guess it probably won't be TARGET_BUILTIN_SETJMP_FRAME_VALUE then. At any rate, changing it to hard_frame_pointer_rtx doesn't help by itself. (Resulting diffs in RTL dumps are gone after 132r.unshare, for r144898.) Either, GCC should punt and force p to the stack, or calculate p / keep track of the stack-pointer correctly: the value is off by 20 when used after the longjump. (It should be move.d [$sp+28],$r10, not $sp+8.) Right, that's the sp -= 20 due to the __builtin_alloca (20) before the __builtin_setjmp call. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38609