Re: [PATCH] [RFC, GCC 4.8] Optimize conditional moves from adjacent memory locations
On Fri, 2012-02-24 at 15:41 -0600, William J. Schmidt wrote: On Fri, 2012-02-10 at 15:46 -0500, Michael Meissner wrote: I was looking at the routelookup EEMBC benchmark and it has code of the form: while ( this_node->cmpbit > next_node->cmpbit ) { this_node = next_node; if ( proute->dst_addr & (0x1 << this_node->cmpbit) ) next_node = this_node->rlink; else next_node = this_node->llink; } Andrew and Richard both suggested this would be better handled as a tree optimization. Here is a proposed patch to do that. I think this is slightly different from what was suggested by me, since my suggestion takes the load from this_node->cmpbit into account, and not just that this_node->rlink and this_node->llink might be in the same page. Thanks, Andrew Pinski
Re: [PR52001] too many cse reverse equiv exprs (take2)
Alexandre Oliva aol...@redhat.com writes: On Feb 19, 2012, Richard Sandiford rdsandif...@googlemail.com wrote: and it still isn't obvious to me when canonical_cselib_val is supposed to be used. For comparison of VALUEs, it avoids the need for recursive or combinatorial compares, for all equivalent VALUEs map directly to the single canonical value. For recursive searches and the like, it's just an optimization to avoid an additional recursion (avoiding recursing into *any* VALUEs is recommended along with it). Other algorithms that iterate over loc lists and recurse should take care to avoid infinite recursion, by marking already-visited nodes (cselib and var-tracking do some of this), temporarily zeroing out their loc lists (like find_base_term), or using other mechanisms to break recursion cycles (like get_addr). Algorithms that didn't expect naked VALUEs in loc lists (like get_addr) may need adjusting to iterate over the loc lists of the canonical value (for non-canonical values have a single loc, the canonical value), and to disregard values in canonical value's lists (unless e.g. we happen to be looking for VALUEs that turn out to be noncanonical). In other cases, the use of canonical values instead of noncanonical ones when filling in data structures (say, building expressions to record in cselib) may save memory by avoiding duplication, but since it causes cselib to compute different hashes, we'd better use whatever is most likely to be searched for by hashing. (We could tweak lookups to use canonical values and to recompute hashes when adding equivalences between values already used in expressions, but this hasn't been done yet). I hope this makes some sense ;-) Yeah, it does, thanks. It seemed that when we recorded two values V1 and V2 were equivalent, we added V1 to V2's location list and V2 to V1's location list. 
But it sounds from the above like the canonical value is what we want in almost all cases, so if V2 is the one that becomes noncanonical, is it really worth adding V2 to V1's location list? Richard
[patch committed SH] Fix some comment typos and formattings
Hello, I've just committed the attached patch as obvious. Cheers, Oleg 2012-02-26 Oleg Endo olege...@gcc.gnu.org * config/sh/predicates.md: Remove blank lines. * config/sh/sh.c: Fix typos in comments. * config/sh/constraints.md: Likewise. * config/sh/sh.md: Remove blank lines. Fix typos in comments. Use ;; as comment characters. Index: gcc/config/sh/predicates.md === --- gcc/config/sh/predicates.md (revision 184582) +++ gcc/config/sh/predicates.md (working copy) @@ -448,7 +448,6 @@ return general_operand (op, mode); }) - ;; Returns 1 if OP is a POST_INC on stack pointer register. (define_predicate "sh_no_delay_pop_operand" @@ -466,7 +465,6 @@ return 0; }) - ;; Returns 1 if OP is a MEM that can be source of a simple move operation. (define_predicate "unaligned_load_operand" Index: gcc/config/sh/sh.c === --- gcc/config/sh/sh.c (revision 184582) +++ gcc/config/sh/sh.c (working copy) @@ -1850,7 +1850,7 @@ } /* ??? How should we distribute probabilities when more than one branch - is generated. So far we only have soem ad-hoc observations: + is generated. So far we only have some ad-hoc observations: - If the operands are random, they are likely to differ in both parts. - If comparing items in a hash chain, the operands are random or equal; operation should be EQ or NE. @@ -5380,7 +5380,7 @@ /* If relaxing, generate pseudo-ops to associate function calls with the symbols they call. It does no harm to not generate these - pseudo-ops. However, when we can generate them, it enables to + pseudo-ops. However, when we can generate them, it enables the linker to potentially relax the jsr to a bsr, and eliminate the register load and, possibly, the constant pool entry. */ @@ -9259,7 +9259,7 @@ #if 0 /* If this is a label that existed before reload, then the register - if dead here. However, if this is a label added by reorg, then + is dead here. However, if this is a label added by reorg, then the register may still be live here. 
We can't tell the difference, so we just ignore labels completely. */ if (code == CODE_LABEL) @@ -9569,7 +9569,7 @@ { int size; - /* Check if this the address of an unaligned load / store. */ + /* Check if this is the address of an unaligned load / store. */ if (mode == VOIDmode) return CONST_OK_FOR_I06 (INTVAL (op)); Index: gcc/config/sh/constraints.md === --- gcc/config/sh/constraints.md (revision 184582) +++ gcc/config/sh/constraints.md (working copy) @@ -139,7 +139,7 @@ (match_test "ival >= 0 && ival <= 255"))) (define_constraint "K12" - "An unsigned 8-bit constant, as used in SH2A 12-bit display." + "An unsigned 8-bit constant, as used in SH2A 12-bit displacement addressing." (and (match_code "const_int") (match_test "ival >= 0 && ival <= 4095"))) Index: gcc/config/sh/sh.md === --- gcc/config/sh/sh.md (revision 184582) +++ gcc/config/sh/sh.md (working copy) @@ -4405,8 +4405,6 @@ sub r63, %1, %0 [(set_attr "type" "arith_media")]) - - ;; Don't expand immediately because otherwise neg:DI (abs:DI) will not be ;; combined. (define_expand "negdi2" @@ -4497,7 +4495,6 @@ DONE; }) - ;; The SH4 202 can do zero-offset branches without pipeline stalls. ;; This can be used as some kind of conditional execution, which is useful ;; for abs. @@ -5468,7 +5465,7 @@ operands[3] = gen_rtx_REG (DImode, REGNO (operands[2])); }) -/* When storing r0, we have to avoid reg+reg addressing. */ +;; When storing r0, we have to avoid reg+reg addressing. (define_insn "movhi_i" [(set (match_operand:HI 0 "general_movdst_operand" "=r,r,r,r,m,r,l,r") (match_operand:HI 1 "general_movsrc_operand" "Q,rI08,m,t,r,l,r,i"))] @@ -7472,7 +7469,7 @@ (set_attr "fp_set" "unknown")]) ;; This is TBR relative jump instruction for SH2A architecture. -;; Its use is enabled assigning an attribute function_vector +;; Its use is enabled by assigning an attribute function_vector ;; and the vector number to a function during its declaration. 
(define_insn "call_valuei_tbr_rel" @@ -9581,8 +9578,6 @@ DONE; ) - - ;; sne moves the complement of the T reg to DEST like this: ;; cmp/eq ... ;; mov #-1,temp @@ -9605,7 +9600,6 @@ operands[1] = gen_reg_rtx (SImode); }) - ;; Recognize mov #-1/negc/neg sequence, and change it to movt/add #-1. ;; This prevents a regression that occurred when we switched from xor to ;; mov/neg for sne. @@ -9659,7 +9653,6 @@ DONE; ) - ;; - ;; Instructions to cope with inline literal tables ;;
New Swedish PO file for 'gcc' (version 4.7-b20120128)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'gcc' has been submitted by the Swedish team of translators. The file is available at: http://translationproject.org/latest/gcc/sv.po (This file, 'gcc-4.7-b20120128.sv.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: http://translationproject.org/latest/gcc/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: http://translationproject.org/domain/gcc.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator. coordina...@translationproject.org
Re: [PATCH] [RFC, GCC 4.8] Optimize conditional moves from adjacent memory locations
On Sun, 2012-02-26 at 00:39 -0800, Andrew T Pinski wrote: On Fri, 2012-02-24 at 15:41 -0600, William J. Schmidt wrote: On Fri, 2012-02-10 at 15:46 -0500, Michael Meissner wrote: I was looking at the routelookup EEMBC benchmark and it has code of the form: while ( this_node->cmpbit > next_node->cmpbit ) { this_node = next_node; if ( proute->dst_addr & (0x1 << this_node->cmpbit) ) next_node = this_node->rlink; else next_node = this_node->llink; } Andrew and Richard both suggested this would be better handled as a tree optimization. Here is a proposed patch to do that. I think this is slightly different from what was suggested by me, since my suggestion takes the load from this_node->cmpbit into account, and not just that this_node->rlink and this_node->llink might be in the same page. Hi Andrew, That's true, but I must be missing your concern here. If this_node is NULL, an exception will occur with or without the transformation, whether this_node is dereferenced in bb0 or not. Are you worried about exception ordering or...? Sorry if I'm being dense. Thanks, Bill Thanks, Andrew Pinski
[patch committed SH] PR 49263 - add missing test case
Hello, The test case that was originally included in the patch for PR 49263 didn't make it into trunk. Committed as rev 184585. Cheers, Oleg testsuite/ChangeLog: 2012-02-26 Oleg Endo olege...@gcc.gnu.org PR target/49263 * gcc.target/sh/pr49263.c: New. Index: gcc/testsuite/gcc.target/sh/pr49263.c === --- gcc/testsuite/gcc.target/sh/pr49263.c (revision 0) +++ gcc/testsuite/gcc.target/sh/pr49263.c (revision 0) @@ -0,0 +1,86 @@ +/* Verify that TST #imm, R0 instruction is generated if the constant + allows it. Under some circumstances another compare instruction might + be selected, which is also fine. Any AND instructions are considered + counter productive and fail the test. */ +/* { dg-do compile { target sh*-*-* } } */ +/* { dg-options "-O2" } */ +/* { dg-final { scan-assembler-not "and" } } */ + +#define make_func(__valtype__, __valget__, __tstval__, __suff__)\ + int test_imm_##__tstval__##__suff__ (__valtype__ val) \ +{\ + return ((__valget__) & (0x##__tstval__ << 0)) ? -20 : -40;\ +} + +#define make_func_0_F(__valtype__, __valget__, __y__, __suff__)\ + make_func (__valtype__, __valget__, __y__##0, __suff__)\ + make_func (__valtype__, __valget__, __y__##1, __suff__)\ + make_func (__valtype__, __valget__, __y__##2, __suff__)\ + make_func (__valtype__, __valget__, __y__##3, __suff__)\ + make_func (__valtype__, __valget__, __y__##4, __suff__)\ + make_func (__valtype__, __valget__, __y__##5, __suff__)\ + make_func (__valtype__, __valget__, __y__##6, __suff__)\ + make_func (__valtype__, __valget__, __y__##7, __suff__)\ + make_func (__valtype__, __valget__, __y__##8, __suff__)\ + make_func (__valtype__, __valget__, __y__##9, __suff__)\ + make_func (__valtype__, __valget__, __y__##A, __suff__)\ + make_func (__valtype__, __valget__, __y__##B, __suff__)\ + make_func (__valtype__, __valget__, __y__##C, __suff__)\ + make_func (__valtype__, __valget__, __y__##D, __suff__)\ + make_func (__valtype__, __valget__, __y__##E, __suff__)\ + make_func (__valtype__, __valget__, __y__##F, __suff__)\ + +#define make_funcs_0_FF(__valtype__, __valget__, __suff__)\ + make_func_0_F (__valtype__, __valget__, 0, __suff__)\ + make_func_0_F (__valtype__, __valget__, 1, __suff__)\ + make_func_0_F (__valtype__, __valget__, 2, __suff__)\ + make_func_0_F (__valtype__, __valget__, 3, __suff__)\ + make_func_0_F (__valtype__, __valget__, 4, __suff__)\ + make_func_0_F (__valtype__, __valget__, 5, __suff__)\ + make_func_0_F (__valtype__, __valget__, 6, __suff__)\ + make_func_0_F (__valtype__, __valget__, 7, __suff__)\ + make_func_0_F (__valtype__, __valget__, 8, __suff__)\ + make_func_0_F (__valtype__, __valget__, 9, __suff__)\ + make_func_0_F (__valtype__, __valget__, A, __suff__)\ + make_func_0_F (__valtype__, __valget__, B, __suff__)\ + make_func_0_F (__valtype__, __valget__, C, __suff__)\ + make_func_0_F (__valtype__, __valget__, D, __suff__)\ + make_func_0_F (__valtype__, __valget__, E, __suff__)\ + make_func_0_F (__valtype__, __valget__, F, __suff__)\ + +make_funcs_0_FF (signed char*, *val, int8_mem) +make_funcs_0_FF (signed char, val, int8_reg) + +make_funcs_0_FF (unsigned char*, *val, uint8_mem) +make_funcs_0_FF (unsigned char, val, uint8_reg) + +make_funcs_0_FF (short*, *val, int16_mem) +make_funcs_0_FF (short, val, int16_reg) + +make_funcs_0_FF (unsigned short*, *val, uint16_mem) +make_funcs_0_FF (unsigned short, val, uint16_reg) + +make_funcs_0_FF (int*, *val, int32_mem) +make_funcs_0_FF (int, val, int32_reg) + +make_funcs_0_FF (unsigned int*, *val, uint32_mem) +make_funcs_0_FF (unsigned int, val, uint32_reg) + +make_funcs_0_FF (long long*, *val, int64_lowword_mem) +make_funcs_0_FF (long long, val, int64_lowword_reg) + +make_funcs_0_FF (unsigned long long*, *val, uint64_lowword_mem) +make_funcs_0_FF (unsigned long long, val, uint64_lowword_reg) + +make_funcs_0_FF (long long*, *val >> 32, int64_highword_mem) +make_funcs_0_FF (long long, val >> 32, int64_highword_reg) + +make_funcs_0_FF (unsigned long long*, *val >> 32, uint64_highword_mem) +make_funcs_0_FF (unsigned long long, val >> 32, uint64_highword_reg) + +make_funcs_0_FF (long long*, *val >> 16, int64_midword_mem) +make_funcs_0_FF (long long, val >> 16, int64_midword_reg) + +make_funcs_0_FF (unsigned long long*, *val >> 16, uint64_midword_mem) +make_funcs_0_FF (unsigned long long, val >> 16, uint64_midword_reg) +
Re: [PATCH] Fix PR52298
On Fri, Feb 24, 2012 at 2:16 PM, Ulrich Weigand uweig...@de.ibm.com wrote: Richard Guenther wrote: On Thu, 23 Feb 2012, Ulrich Weigand wrote: The assert in question looks like: if (nested_in_vect_loop && (TREE_INT_CST_LOW (STMT_VINFO_DR_STEP (stmt_info)) % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0)) { gcc_assert (alignment_support_scheme != dr_explicit_realign_optimized); compute_in_loop = true; } where your patch changed DR_STEP to STMT_VINFO_DR_STEP (reverting just this one change makes the ICEs go away). However, at the place where the decision to use the dr_explicit_realign_optimized strategy is made (tree-vect-data-refs.c:vect_supportable_dr_alignment), we still have: if ((nested_in_vect_loop && (TREE_INT_CST_LOW (DR_STEP (dr)) != GET_MODE_SIZE (TYPE_MODE (vectype)))) || !loop_vinfo) return dr_explicit_realign; else return dr_explicit_realign_optimized; Should this now also use STMT_VINFO_DR_STEP? Yes, I think so. Hmmm. Reading the comment in vect_supportable_dr_alignment: However, in the case of outer-loop vectorization, when vectorizing a memory access in the inner-loop nested within the LOOP that is now being vectorized, while it is guaranteed that the misalignment of the vectorized memory access will remain the same in different outer-loop iterations, it is *not* guaranteed that it will remain the same throughout the execution of the inner-loop. This is because the inner-loop advances with the original scalar step (and not in steps of VS). If the inner-loop step happens to be a multiple of VS, then the misalignment remains fixed and we can use the optimized realignment scheme. it would appear that in this case, checking the inner-loop step is deliberate. Given the comment in vectorizable_load: /* If the misalignment remains the same throughout the execution of the loop, we can create the init_addr and permutation mask at the loop preheader. Otherwise, it needs to be created inside the loop. 
This can only occur when vectorizing memory accesses in the inner-loop nested within an outer-loop that is being vectorized. */ this looks to me like, since the check is intended to verify that misalignment remains the same throughout the execution of the loop, we actually want to check the inner-loop step here as well, i.e. revert this chunk of your patch ... Hmm. I have to admit I don't know the outer loop vectorization code very well, but the comments indeed suggest that reverting that hunk is ok. Can you check that patch on powerpc and commit it if it works ok? Thanks, Richard. Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: [PR51752] publication safety violations in loop invariant motion pass
On Fri, Feb 24, 2012 at 2:10 PM, Torvald Riegel trie...@redhat.com wrote: On Fri, 2012-02-24 at 09:58 +0100, Richard Guenther wrote: On Thu, Feb 23, 2012 at 10:11 PM, Aldy Hernandez al...@redhat.com wrote: On 02/23/12 12:19, Aldy Hernandez wrote: about hit me. Instead now I save all loads in a function and iterate through them in a brute force way. I'd like to rewrite this into a hash of some sort, but before I go any further I'm interested to know if the main idea is ok. For the record, it may be ideal to reuse some of the iterations we already do over the function's basic blocks, so we don't have to iterate yet again over the IL stream. Though it may complicate the pass unnecessarily. It's definitely ugly that you need to do this. Indeed. But that's simply the price we have to pay for making publication with transactions easier for the programmer yet still at acceptable performance. Also note that what we primarily have to care about for this PR is to not hoist loads _within_ transactions if they would violate publication safety. I didn't have time to look at Aldy's patch yet, but a first safe and conservative way would be to treat transactions as full transformation barriers, and prevent publication-safety-violating transformations _within_ transactions. Which I would prefer until we're confident that we understood all of it. For hoisting out of or across transactions, we have to reason about more than just publication safety. And yes, you have to look at very many passes I guess, PRE comes to my mind at least. I still do not understand the situation very well, at least why for transaction { for () if (b) load; } load; it should be safe to hoist load out of the transaction This one is funny. *Iff* this is an atomic txn, we can assume that the transaction does not wait for a concurrent event. 
If it is a relaxed txn, then we must not have a loop which terminates depending on an atomic load (which we don't in that example); otherwise, we cannot hoist load to before the txn (same reason as why we must not hoist to before an atomic load with memory_order_acquire). Now, if these things hold, then the load will always be executed after the txn. Thus, we can assume that it will happen anyway and nothing can stop it (irrespective of which position the txn gets in the Transactional Synchronization Order (see the C++ TM spec)). It is a nonatomic load, and we can rely on the program being data-race-free, so we cannot have other threads storing to the same location, so we can hoist it across because in that part of the program, the location is guaranteed to be thread-local data (or immutable). As I said, this isn't related to just pub safety anymore. And this is tricky enough that I'd rather not try to optimize this currently but instead wait until we have more confidence in our understanding of the matter. while for load; transaction { for () if (b) load; } it is not. If the same assumptions as above hold, I think you can hoist it out, because again you can assume that it targets thread-local/immutable data: the nontransactional (+nonatomic!) load can happen at any time, essentially, irrespective of b's value or how/when other threads modify it. Thus, it cannot have been changed between the two loads in a data-race-free program. Neither do I understand why it's not ok for transaction { for () if (b) load; } to hoist the load out of the transaction. You would violate publication safety. Also, you don't have any reason to believe that load targets thread-local/immutable data, so you must not make it nontransactional (otherwise, you could introduce a race condition). I assume all the issue is about hoisting things over the transaction start. So - why does the transaction start not simply clobber all global memory? That would avoid the hoisting. 
I assume that transforming the above to transaction { tem = load; for () if (b) = tem; } is ok? No, it is not. Actually, this is precisely the transformation that we need to prevent from happening. As Aldy said, please see the explanations in the PR, or have a look at the C++ TM specification and the C++11 memory model alternatively. We could also discuss this on IRC, if this would be easier. Ok. I see. So, I think what would be best is to have a way to check whether a store/load is part of a transaction - do we have a way to do that right now? (For example a flag on a gimple stmt?) Then we can simply avoid the LIM transform by making transaction load/store stmts behave the same as potentially trapping stmts (thus, only optimize if the memory is accessed unconditional somewhere else). That would work for PRE as well. [easiest would be to make *_could_trap_p return true for memory ops inside a transaction] Does that sound like it would make TM happy (well, apart from missed optimizations)? Thus, is it enough to avoid transforms that would be
Re: [PR51752] publication safety violations in loop invariant motion pass
On Fri, Feb 24, 2012 at 5:34 PM, Aldy Hernandez al...@redhat.com wrote: On 02/24/12 07:10, Torvald Riegel wrote: On Fri, 2012-02-24 at 09:58 +0100, Richard Guenther wrote: On Thu, Feb 23, 2012 at 10:11 PM, Aldy Hernandezal...@redhat.com wrote: On 02/23/12 12:19, Aldy Hernandez wrote: about hit me. Instead now I save all loads in a function and iterate through them in a brute force way. I'd like to rewrite this into a hash of some sort, but before I go any further I'm interested to know if the main idea is ok. For the record, it may be ideal to reuse some of the iterations we already do over the function's basic blocks, so we don't have to iterate yet again over the IL stream. Though it may complicate the pass unnecessarily. It's definitely ugly that you need to do this. Indeed. But that's simply the price we have to pay for making publication with transactions easier for the programmer yet still at acceptable performance. Also note that what we primarily have to care about for this PR is to not hoist loads _within_ transactions if they would violate publication For that matter, didn't rth add a memory barrier at the beginning of transactions last week? That would mean that we can't hoist anything outside of a transaction anyhow. Or was it not a full memory barrier? It's now a full memory barrier for all global memory and for local statics if their address is taken (and for automatic vars with their address taken). Do we need to be concerned about non-address-taken local statics? safety. I didn't have time to look at Aldy's patch yet, but a first safe and conservative way would be to treat transactions as full transformation barriers, and prevent publication-safety-violating transformations _within_ transactions. Which I would prefer until we're confident that we understood all of it. Do you mean disallow hoisting of *any* loads that happen inside of a transaction (regardless of whether a subsequent load happens on every path out of the loop)? 
This would definitely be safe and quite easily doable, simply by checking if loads to be hoisted are within a transaction. Yes, simply handle memory accesses inside a transaction as if they may trap. Then all optimizers should be fine (or they are buggy even without TM). Richard. For hoisting out of or across transactions, we have to reason about more than just publication safety. Again, __transactions being barriers and all, I don't think we should complicate things unnecessarily at this point, since it doesn't happen. Aldy
[Patch libiberty/mach-o] fix byte-swapping of indices in wrapper sections.
Found while testing x86 X ppc ... .. I missed byte-swapping the indices when outputting the index of the GNU wrapper for LTO sections. OK/When? Iain libiberty: * simple-object-mach-o.c (simple_object_mach_o_write_segment): Byte-swap indices when required. diff --git a/libiberty/simple-object-mach-o.c b/libiberty/simple-object-mach-o.c index af5e4f9..fbf6a3f 100644 --- a/libiberty/simple-object-mach-o.c +++ b/libiberty/simple-object-mach-o.c @@ -1224,6 +1224,11 @@ simple_object_mach_o_write_segment (simple_object_write *sobj, int descriptor, index[4 * i] -= index[0]; index[0] = 0; + /* Swap the indices, if required. */ + + for (i = 0; i < (nsects_in * 4); ++i) + set_32 (&index[i], index[i]); + sechdr_offset += sechdrsize; /* Write out the section names.
Re: [Patch libiberty/mach-o] fix byte-swapping of indices in wrapper sections.
Iain Sandoe develo...@sandoe-acoustics.co.uk writes: Found while testing x86 X ppc ... .. I missed byte-swapping the indices when outputting the index of the GNU wrapper for LTO sections. OK/When? Iain libiberty: * simple-object-mach-o.c (simple_object_mach_o_write_segment): Byte-swap indices when required. diff --git a/libiberty/simple-object-mach-o.c b/libiberty/simple-object-mach-o.c index af5e4f9..fbf6a3f 100644 --- a/libiberty/simple-object-mach-o.c +++ b/libiberty/simple-object-mach-o.c @@ -1224,6 +1224,11 @@ simple_object_mach_o_write_segment (simple_object_write *sobj, int descriptor, index[4 * i] -= index[0]; index[0] = 0; + /* Swap the indices, if required. */ + + for (i = 0; i < (nsects_in * 4); ++i) + set_32 (&index[i], index[i]); + sechdr_offset += sechdrsize; /* Write out the section names. I think that is not the only way the code is broken. When you pass the array to simple_object_internal_write, you are assuming that the type unsigned int is exactly 4 bytes long. I would strongly recommend changing the code to do something like index_bytes = XNEWVEC (unsigned char, nsects_in * 16); for (i = 0; i < nsects_in * 4; ++i) set_32 (index_bytes + i * 4, index[i]); if (!simple_object_internal_write (descriptor, offset, index_bytes, nsects_in * 16, errmsg, err)) return 0; XFREEVEC (index_bytes); If that works, I'll approve it. I think this is OK in stage 4, as it is a bug fix. Ian
[wwwdocs] SH update for 4.7
Hello, The attached patch adds some SH update notes for GCC 4.7. OK to commit? Cheers, Oleg Index: htdocs/gcc-4.7/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/changes.html,v retrieving revision 1.88 diff -u -r1.88 changes.html --- htdocs/gcc-4.7/changes.html 23 Feb 2012 13:24:36 - 1.88 +++ htdocs/gcc-4.7/changes.html 26 Feb 2012 21:55:51 - @@ -735,6 +735,24 @@ being defined in preprocessor output.</li> </ul> +<h3>SH</h3> + <ul> +<li>A new option <code>-msoft-atomic</code> has been added. When it is +specified, GCC will generate GNU/Linux compatible gUSA atomic sequences +for the new <code>__atomic</code> routines. +<li>Since it is neither supported by GAS nor officially documented, code +generation for little endian SH2A has been disabled. Specifying +<code>-ml</code> with <code>-m2a*</code> will now result in a compiler +error. +<li>The defunct <code>-mbranch-cost</code> option has been fixed. +<li>Some improvements to the generated code of: + <ul> +<li>Utilization of the <code>tst #imm,R0</code> instruction. +<li>Dynamic shift instructions on SH2A. +<li>Integer absolute value calculations. + </ul> + </ul> + <h3>SPARC</h3> <ul> <li>The option <code>-mflat</code> has been reinstated. When it is
Re: [SH] Delete dead GO_IF_LEGITIMATE_INDEX macro
Oleg Endo oleg.e...@t-online.de wrote: The attached patch deletes the dead GO_IF_LEGITIMATE_INDEX macro in sh.h. OK to apply to trunk? OK. Regards, kaz
Re: [wwwdocs] SH update for 4.7
Oleg Endo oleg.e...@t-online.de wrote: The attached patch adds some SH update notes for GCC 4.7. OK to commit? Looks fine to me, though it requires OK from wwwdocs maintainer. Regards, kaz
[SH] Use SImode for return value in atomic_compare_and_swap*
Hello, The attached patch changes the atomic_compare_and_swap expander/insn to use SImode for the return value instead of QImode. This is more aligned to the other insns which handle the T bit as SImode and avoids some unnecessary test instructions in cases where the result of the atomic op in the T bit is re-used. Tested against rev 184582 with make -k check RUNTESTFLAGS=--target_board=sh-sim \{-m2/-ml/-msoft-atomic, -m2/-mb/-msoft-atomic, -m2a-single/-mb/-msoft-atomic, -m4-single/-ml/-msoft-atomic, -m4-single/-mb/-msoft-atomic, -m4a-single/-ml/-msoft-atomic, -m4a-single/-mb/-msoft-atomic} and no new failures. Cheers, Oleg 2012-02-27 Oleg Endo olege...@gcc.gnu.org * config/sh/sync.md (atomic_compare_and_swap<mode>): Use SImode for return value instead of QImode. (atomic_compare_and_swap<mode>_soft): Likewise. Index: gcc/config/sh/sync.md === --- gcc/config/sh/sync.md (revision 184582) +++ gcc/config/sh/sync.md (working copy) @@ -109,7 +109,7 @@ [(plus "add") (minus "sub") (ior "or") (xor "xor") (and "and")]) (define_expand "atomic_compare_and_swap<mode>" - [(match_operand:QI 0 "register_operand" "") ;; bool success output + [(match_operand:SI 0 "register_operand" "") ;; bool success output (match_operand:I124 1 "register_operand" "") ;; oldval output (match_operand:I124 2 "memory_operand" "") ;; memory (match_operand:I124 3 "register_operand" "") ;; expected input @@ -131,7 +131,7 @@ else if (<MODE>mode == HImode) emit_insn (gen_zero_extendhisi2 (gen_lowpart (SImode, operands[1]), operands[1])); - emit_insn (gen_movqi (operands[0], gen_rtx_REG (QImode, T_REG))); + emit_insn (gen_movsi (operands[0], gen_rtx_REG (SImode, T_REG))); DONE; }) @@ -144,8 +144,8 @@ UNSPECV_CMPXCHG_1)) (set (mem:I124 (match_dup 1)) (unspec_volatile:I124 [(const_int 0)] UNSPECV_CMPXCHG_2)) - (set (reg:QI T_REG) - (unspec_volatile:QI [(const_int 0)] UNSPECV_CMPXCHG_3)) + (set (reg:SI T_REG) + (unspec_volatile:SI [(const_int 0)] UNSPECV_CMPXCHG_3)) (clobber (match_scratch:SI 4 "=&u")) (clobber (reg:SI R0_REG)) (clobber (reg:SI 
R1_REG))]
[SH] Add atomic_exchange patterns
Hello, The attached patch adds atomic_exchange patterns to the SH target. This results in slightly better generated code compared to the default compare_and_swap loop that is generated if atomic_exchange patterns are absent. Tested against rev 184582 with make -k check RUNTESTFLAGS=--target_board=sh-sim \{-m2/-ml/-msoft-atomic, -m2/-mb/-msoft-atomic, -m2a-single/-mb/-msoft-atomic, -m4-single/-ml/-msoft-atomic, -m4-single/-mb/-msoft-atomic, -m4a-single/-ml/-msoft-atomic, -m4a-single/-mb/-msoft-atomic} and no new failures. Cheers, Oleg 2012-02-27 Oleg Endo olege...@gcc.gnu.org * config/sh/sync.md (atomic_exchange<mode>): New expander. (atomic_exchange<mode>_soft): New insn. Index: gcc/config/sh/sync.md === --- gcc/config/sh/sync.md (revision 184582) +++ gcc/config/sh/sync.md (working copy) @@ -164,6 +164,45 @@ } [(set_attr "length" "20")]) +(define_expand "atomic_exchange<mode>" + [(match_operand:I124 0 "register_operand" "") ;; oldval output + (match_operand:I124 1 "memory_operand" "") ;; memory + (match_operand:I124 2 "register_operand" "") ;; newval input + (match_operand:SI 3 "const_int_operand" "")] ;; memory model + "TARGET_SOFT_ATOMIC && !TARGET_SHMEDIA" +{ + rtx addr = force_reg (Pmode, XEXP (operands[1], 0)); + emit_insn (gen_atomic_exchange<mode>_soft + (operands[0], addr, operands[2])); + if (<MODE>mode == QImode) +emit_insn (gen_zero_extendqisi2 (gen_lowpart (SImode, operands[0]), + operands[0])); + else if (<MODE>mode == HImode) +emit_insn (gen_zero_extendhisi2 (gen_lowpart (SImode, operands[0]), + operands[0])); + DONE; +}) + +(define_insn "atomic_exchange<mode>_soft" + [(set (match_operand:I124 0 "register_operand" "=&u") + (mem:I124 (match_operand:SI 1 "register_operand" "u"))) + (set (mem:I124 (match_dup 1)) + (unspec:I124 + [(match_operand:I124 2 "register_operand" "u")] UNSPEC_ATOMIC)) + (clobber (reg:SI R0_REG)) + (clobber (reg:SI R1_REG))] + "TARGET_SOFT_ATOMIC && !TARGET_SHMEDIA" +{ + return "mova 1f,r0\n" + " .align 2 \n" + " mov r15,r1 \n" + " mov #(0f-1f),r15 \n" + "0: mov.<i124suffix> @%1,%0 \n" + " mov.<i124suffix> %2,@%1 \n" + "1: mov r1,r15"; +} + [(set_attr "length" "14")]) + (define_expand "atomic_fetch_<fetchop_name><mode>" [(set (match_operand:I124 0 "register_operand" "") (match_operand:I124 1 "memory_operand" ""))
Re: [wwwdocs] SH update for 4.7
On Mon, 27 Feb 2012, Kaz Kojima wrote:
>> The attached patch adds some SH update notes for GCC 4.7.  OK to commit?
> Looks fine to me, though it requires OK from wwwdocs maintainer.

It does not. :-)  As port maintainer, you are more than welcome (and obviously qualified), Kaz!

Oleg, just one detail regarding markup: please close all list items with </li> at the end, or the validator will haunt you.  I was going to show this using an example, but then figured, why not be helpful and go ahead and make this change for you?  The patch now looks as follows and I applied it already.

Thanks,
Gerald

Index: changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/changes.html,v
retrieving revision 1.88
diff -u -3 -p -r1.88 changes.html
--- changes.html	23 Feb 2012 13:24:36 -0000	1.88
+++ changes.html	27 Feb 2012 00:30:46 -0000
@@ -735,6 +735,24 @@ int add_values (const __flash int *p, in
     being defined in preprocessor output.</li>
 </ul>
 
+<h3>SH</h3>
+  <ul>
+    <li>A new option <code>-msoft-atomic</code> has been added.  When it is
+      specified, GCC will generate GNU/Linux compatible gUSA atomic sequences
+      for the new <code>__atomic</code> routines.</li>
+    <li>Since it is neither supported by GAS nor officially documented, code
+      generation for little endian SH2A has been disabled.  Specifying
+      <code>-ml</code> with <code>-m2a*</code> will now result in a compiler
+      error.</li>
+    <li>The defunct <code>-mbranch-cost</code> option has been fixed.</li>
+    <li>Some improvements to the generated code of:
+      <ul>
+        <li>Utilization of the <code>tst #imm,R0</code> instruction.</li>
+        <li>Dynamic shift instructions on SH2A.</li>
+        <li>Integer absolute value calculations.</li>
+      </ul></li>
+  </ul>
+
 <h3>SPARC</h3>
 <ul>
   <li>The option <code>-mflat</code> has been reinstated.  When it is
Re: [SH] Use SImode for return value in atomic_compare_and_swap*
Oleg Endo <oleg.e...@t-online.de> wrote:
> The attached patch changes the atomic_compare_and_swap expander/insn to use SImode for the return value instead of QImode.  This is more consistent with the other insns, which handle the T bit as SImode, and avoids some unnecessary test instructions in cases where the result of the atomic op in the T bit is re-used.
> Tested against rev 184582 with
> make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml/-msoft-atomic,-m2/-mb/-msoft-atomic,-m2a-single/-mb/-msoft-atomic,-m4-single/-ml/-msoft-atomic,-m4-single/-mb/-msoft-atomic,-m4a-single/-ml/-msoft-atomic,-m4a-single/-mb/-msoft-atomic}"
> and no new failures.

OK for 4.8.

Regards,
kaz
Re: [SH] Add atomic_exchange patterns
Oleg Endo <oleg.e...@t-online.de> wrote:
> The attached patch adds atomic_exchange patterns to the SH target.  This results in slightly better generated code compared to the default compare_and_swap loop that is generated if atomic_exchange patterns are absent.
> Tested against rev 184582 with
> make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml/-msoft-atomic,-m2/-mb/-msoft-atomic,-m2a-single/-mb/-msoft-atomic,-m4-single/-ml/-msoft-atomic,-m4-single/-mb/-msoft-atomic,-m4a-single/-ml/-msoft-atomic,-m4a-single/-mb/-msoft-atomic}"
> and no new failures.

OK for 4.8.

Regards,
kaz
[patch libgcc]: Fix float128 soft-float for mingw targets
Hi,

by recent tests in gcc.target/i386 I noticed that the testcase float128-2.c failed at execution.  This failure is caused by an incompatible bitfield-structure definition in soft-fp/quad.h when the ms-bitfields layout is enabled.  The patch marks those structures as 'gcc_struct' for mingw targets.

ChangeLog

2012-02-27  Kai Tietz  <kti...@redhat.com>

	* soft-fp/quad.h: Mark bitfield-structures as gcc_struct.

Regression tested for i686-w64-mingw32, x86_64-w64-mingw32, and x86_64-unknown-linux-gnu for all languages (including Ada and Obj-C++).  Ok for apply?

Regards,
Kai

Index: soft-fp/quad.h
===================================================================
--- soft-fp/quad.h	(revision 184486)
+++ soft-fp/quad.h	(working copy)
@@ -67,7 +67,11 @@
 union _FP_UNION_Q
 {
    TFtype flt;
-   struct
+   struct
+#ifdef __MINGW32__
+     /* Make sure we are using gnu-style bitfield handling.  */
+     __attribute__ ((gcc_struct))
+#endif
    {
 #if __BYTE_ORDER == __BIG_ENDIAN
     unsigned sign : 1;
@@ -174,7 +178,12 @@
   struct {
     _FP_W_TYPE a, b;
   } longs;
-  struct {
+  struct
+#ifdef __MINGW32__
+    /* Make sure we are using gnu-style bitfield handling.  */
+    __attribute__ ((gcc_struct))
+#endif
+  {
 #if __BYTE_ORDER == __BIG_ENDIAN
     unsigned sign: 1;
     unsigned exp : _FP_EXPBITS_Q;
Re: [PR52001] too many cse reverse equiv exprs (take2)
On Feb 26, 2012, Richard Sandiford <rdsandif...@googlemail.com> wrote:

> It seemed that when we recorded two values V1 and V2 were equivalent, we added V1 to V2's location list and V2 to V1's location list.  But it sounds from the above like the canonical value is what we want in almost all cases, so if V2 is the one that becomes noncanonical, is it really worth adding V2 to V1's location list?

I'd given that some thought and concluded that it wasn't safe to take V2 out of V1's list, in case what we were searching for among V1's locations was precisely V2.  Now, maybe there are ways around that (say, canonicalizing a value before searching for it) that I haven't given much thought.  I didn't think it would buy us much, but I could easily be wrong, and I'd be glad to look into this given evidence that I am.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer