[Fortran, Patch] Coarray: libcaf patch for _gfortran_caf_deregister
Allocatable coarrays are freed and deregistered via the libcaf function _gfortran_caf_deregister. Currently, the front end does not generate calls to that function; however, this patch already implements it. See http://gcc.gnu.org/wiki/CoarrayLib and http://gcc.gnu.org/ml/fortran/2010-04/msg00168.html for details.

The function is called with the coarray token as argument. The token identifies the coarray in a way defined by the library. In single.c, it just contains the address of the allocated memory of the coarray. In mpi.c, it is an array of memory addresses on all images such that token[this_image()-1] is the memory location of the current image.

The patch also adds stat= and errmsg= diagnostics.

TODO: Add calls to the function in the code generated by the compiler, and test the function.

Tested by compiling with mpicc and gcc with -Wall -Wextra -std=c99.

OK for the trunk?

Tobias

2011-08-26  Tobias Burnus  <bur...@net-b.de>

	* caf/libcaf.h (_gfortran_caf_deregister): Update prototype.
	* caf/mpi.c (_gfortran_caf_deregister): Modify prototype,
	actually free memory and add error diagnostic.
	* caf/single.c (_gfortran_caf_deregister): Ditto.

diff --git a/libgfortran/caf/libcaf.h b/libgfortran/caf/libcaf.h
index 4fe09e4..e6be7ce 100644
--- a/libgfortran/caf/libcaf.h
+++ b/libgfortran/caf/libcaf.h
@@ -69,7 +69,7 @@ void _gfortran_caf_finalize (void);
 
 void * _gfortran_caf_register (ptrdiff_t, caf_register_t, void **, int *,
                                char *, int);
-int _gfortran_caf_deregister (void **);
+void _gfortran_caf_deregister (void **, int *, char *, int);
 
 
 void _gfortran_caf_sync_all (int *, char *, int);
diff --git a/libgfortran/caf/mpi.c b/libgfortran/caf/mpi.c
index ea4c0f0..711c6ee 100644
--- a/libgfortran/caf/mpi.c
+++ b/libgfortran/caf/mpi.c
@@ -103,7 +103,7 @@ _gfortran_caf_finalize (void)
 {
   while (caf_static_list != NULL)
     {
-      free(caf_static_list->token[caf_this_image-1]);
+      free (caf_static_list->token[caf_this_image-1]);
       caf_static_list = caf_static_list->prev;
     }
 
@@ -187,10 +187,36 @@ error:
 }
 
 
-int
-_gfortran_caf_deregister (void **token __attribute__ ((unused)))
+void
+_gfortran_caf_deregister (void **token, int *stat, char *errmsg, int errmsg_len)
 {
-  return 0;
+  if (unlikely (caf_is_finalized))
+    {
+      const char msg[] = "Failed to deallocate coarray - "
+                         "there are stopped images";
+      if (stat)
+        {
+          *stat = STAT_STOPPED_IMAGE;
+
+          if (errmsg_len > 0)
+            {
+              int len = ((int) sizeof (msg) - 1 > errmsg_len)
+                        ? errmsg_len : (int) sizeof (msg) - 1;
+              memcpy (errmsg, msg, len);
+              if (errmsg_len > len)
+                memset (&errmsg[len], ' ', errmsg_len-len);
+            }
+          return;
+        }
+      caf_runtime_error (msg);
+    }
+
+  _gfortran_caf_sync_all (NULL, NULL, 0);
+
+  if (stat)
+    *stat = 0;
+
+  free (token[caf_this_image-1]);
 }
 
 
@@ -267,7 +293,7 @@ _gfortran_caf_sync_images (int count, int images[], int *stat, char *errmsg,
     }
 
   /* Handle SYNC IMAGES(*).  */
-  if (unlikely(caf_is_finalized))
+  if (unlikely (caf_is_finalized))
     ierr = STAT_STOPPED_IMAGE;
   else
     ierr = MPI_Barrier (MPI_COMM_WORLD);
diff --git a/libgfortran/caf/single.c b/libgfortran/caf/single.c
index 09cc62f..50acc3d 100644
--- a/libgfortran/caf/single.c
+++ b/libgfortran/caf/single.c
@@ -121,10 +121,15 @@ _gfortran_caf_register (ptrdiff_t size, caf_register_t type, void **token,
 }
 
 
-int
-_gfortran_caf_deregister (void **token __attribute__ ((unused)))
+void
+_gfortran_caf_deregister (void **token, int *stat,
+                          char *errmsg __attribute__ ((unused)),
+                          int errmsg_len __attribute__ ((unused)))
 {
-  return 0;
+  free (*token);
+
+  if (stat)
+    *stat = 0;
 }
 
 
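
The errmsg handling in the mpi.c hunk follows the Fortran convention for character buffers: the message is truncated if the buffer is too short and blank-padded if it is too long, and no NUL terminator is written. A minimal sketch of just that step, pulled out into a helper for illustration; copy_errmsg is a hypothetical name and is not part of libcaf:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of libcaf): store MSG in the Fortran
   character buffer ERRMSG of length ERRMSG_LEN.  Fortran strings carry
   no NUL terminator, so the message is truncated if it is too long and
   padded with blanks if it is too short.  */
static void
copy_errmsg (char *errmsg, int errmsg_len, const char *msg)
{
  int msg_len = (int) strlen (msg);
  int len;

  /* Nothing to do when the caller passed no ERRMSG= buffer.  */
  if (errmsg_len <= 0)
    return;

  assert (errmsg != NULL);

  /* Copy at most ERRMSG_LEN characters of the message.  */
  len = msg_len > errmsg_len ? errmsg_len : msg_len;
  memcpy (errmsg, msg, len);

  /* Blank-pad the remainder of the buffer.  */
  if (errmsg_len > len)
    memset (&errmsg[len], ' ', errmsg_len - len);
}
```

With an 8-character buffer, the message "stopped images" is cut to "stopped " (seven letters and the blank); with a 16-character buffer it is stored in full, followed by two blanks.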
Re: PATCH: Support BMI, BMI2 and LZCNT in immintrin.h
On Fri, Aug 26, 2011 at 1:11 AM, H.J. Lu <hongjiu...@intel.com> wrote:

> immintrin.h should support all Intel intrinsics. This patch adds
> BMI, BMI2 and LZCNT support to immintrin.h. OK for trunk?

OK if passes regression testing.

Uros.
Re: [PATCH, middle-end]: Fix PR50083: All 32-bit fortran tests fail on 32-bit Solaris
On Thu, 25 Aug 2011, Uros Bizjak wrote:

> Hello!
>
> As noted in the PR, we also have to protect conversion from
> round -> lround for non-TARGET_C99_FUNCTIONS targets. Otherwise, gcc
> chokes in fold_fixed_mathfn, trying to canonicalize iround to
> (non-existent) lround.
>
> It looks to me, that we can trigger the same problem trying to convert
> (long long) round -> llround -> lround on non-TARGET_C99_FUNCTIONS LP64
> targets, so this fix probably applies to other release branches as well.
>
> 2011-08-25  Uros Bizjak  <ubiz...@gmail.com>
>
> 	PR middle-end/50083
> 	* convert.c (convert_to_integer) <BUILT_IN_ROUND{,F,L}>: Convert
> 	only when TARGET_C99_FUNCTIONS.
> 	<BUILT_IN_NEARBYINT{,F,L}>: Ditto.
> 	<BUILT_IN_RINT{,F,L}>: Ditto.
>
> Bootstrapped on x86_64-pc-linux-gnu, regtesting in progress.
>
> OK for SVN and 4.6?

Hmm. In builtins.c we usually check if the target has support to expand the builtins directly in case we have named patterns for them. IMHO these convert.c optimizations belong somewhere else (so that they trigger for all languages). Somewhere else these days would be tree-ssa-forwprop.c.

I'm not asking you to do this move but please consider also doing the optimization when the target can expand the function directly.

Thanks,
Richard.
Re: [PATCH, middle-end]: Fix PR50083: All 32-bit fortran tests fail on 32-bit Solaris
On Fri, Aug 26, 2011 at 9:05 AM, Richard Guenther <rguent...@suse.de> wrote:

>> As noted in the PR, we also have to protect conversion from
>> round -> lround for non-TARGET_C99_FUNCTIONS targets. Otherwise, gcc
>> chokes in fold_fixed_mathfn, trying to canonicalize iround to
>> (non-existent) lround.
>>
>> It looks to me, that we can trigger the same problem trying to convert
>> (long long) round -> llround -> lround on non-TARGET_C99_FUNCTIONS LP64
>> targets, so this fix probably applies to other release branches as well.
>>
>> 2011-08-25  Uros Bizjak  <ubiz...@gmail.com>
>>
>> 	PR middle-end/50083
>> 	* convert.c (convert_to_integer) <BUILT_IN_ROUND{,F,L}>: Convert
>> 	only when TARGET_C99_FUNCTIONS.
>> 	<BUILT_IN_NEARBYINT{,F,L}>: Ditto.
>> 	<BUILT_IN_RINT{,F,L}>: Ditto.
>>
>> Bootstrapped on x86_64-pc-linux-gnu, regtesting in progress.
>>
>> OK for SVN and 4.6?
>
> Hmm. In builtins.c we usually check if the target has support to expand
> the builtins directly in case we have named patterns for them. IMHO
> these convert.c optimizations belong somewhere else (so that they
> trigger for all languages). Somewhere else these days would be
> tree-ssa-forwprop.c.
>
> I'm not asking you to do this move but please consider also doing
> the optimization when the target can expand the function directly.

Yes, I know from our previous communication (ilogb handling) that this whole convert.c part is fishy, but my attached patch fixes the unwanted conversion in the same way as other similar builtins are handled.

Uros.
Re: [PATCH, middle-end]: Fix PR50083: All 32-bit fortran tests fail on 32-bit Solaris
On Fri, Aug 26, 2011 at 9:30 AM, Richard Guenther <rguent...@suse.de> wrote:

>>>> As noted in the PR, we also have to protect conversion from
>>>> round -> lround for non-TARGET_C99_FUNCTIONS targets. Otherwise, gcc
>>>> chokes in fold_fixed_mathfn, trying to canonicalize iround to
>>>> (non-existent) lround.
>>>>
>>>> It looks to me, that we can trigger the same problem trying to convert
>>>> (long long) round -> llround -> lround on non-TARGET_C99_FUNCTIONS LP64
>>>> targets, so this fix probably applies to other release branches as well.
>>>>
>>>> 2011-08-25  Uros Bizjak  <ubiz...@gmail.com>
>>>>
>>>> 	PR middle-end/50083
>>>> 	* convert.c (convert_to_integer) <BUILT_IN_ROUND{,F,L}>: Convert
>>>> 	only when TARGET_C99_FUNCTIONS.
>>>> 	<BUILT_IN_NEARBYINT{,F,L}>: Ditto.
>>>> 	<BUILT_IN_RINT{,F,L}>: Ditto.
>>>>
>>>> Bootstrapped on x86_64-pc-linux-gnu, regtesting in progress.
>>>>
>>>> OK for SVN and 4.6?
>>>
>>> Hmm. In builtins.c we usually check if the target has support to expand
>>> the builtins directly in case we have named patterns for them. IMHO
>>> these convert.c optimizations belong somewhere else (so that they
>>> trigger for all languages). Somewhere else these days would be
>>> tree-ssa-forwprop.c.
>>>
>>> I'm not asking you to do this move but please consider also doing
>>> the optimization when the target can expand the function directly.
>>
>> Yes, I know from our previous communication (ilogb handling) that this
>> whole convert.c part is fishy, but my attached patch fixes the unwanted
>> conversion in the same way as other similar builtins are handled.
>
> Hmm, right, I see that now. Well, patch is ok then.

I will wait for the confirmation from Rainer before committing the patch.

Uros.
Re: fix for segmentation violation in dump_generic_node
On Thu, Aug 25, 2011 at 5:51 PM, Tom de Vries <vr...@codesourcery.com> wrote:

> Hi Richard,
>
> thanks for the review.
>
> On 08/25/2011 12:45 PM, Richard Guenther wrote:
>> On Thu, Aug 25, 2011 at 12:32 PM, Tom de Vries <vr...@codesourcery.com> wrote:
>>> Jakub,
>>>
>>> This patch fixes a segmentation violation, which occurs when printing a
>>> MEM_REF or COMPONENT_REF containing a released ssa name. This can happen
>>> when we print basic blocks upon removal, enabled by
>>> -ftree-dump-tree-*-details (see remove_bb:tree-cfg.c).
>>
>> Where do we dump stmts there?
>
> In dump_bb:
>
>   static void
>   remove_bb (basic_block bb)
>   {
>     gimple_stmt_iterator i;
>
>     if (dump_file)
>       {
>         fprintf (dump_file, "Removing basic block %d\n", bb->index);
>         if (dump_flags & TDF_DETAILS)
>           {
>             dump_bb (bb, dump_file, 0);
>             fprintf (dump_file, "\n");
>           }
>       }
>
>>> Bootstrapped and reg-tested on x86_64.
>>>
>>> OK for trunk?
>>
>> At least TREE_TYPE (TREE_OPERAND (node, 1)) != NULL_TREE is always true.
>
> Right.
>
>> The comment before the new lines is now in the wrong place and this
>> check at least needs a comment as well.
>
> Ok, fixed that.
>
>> But - it's broken to dump freed stuff, why and where do we do this?
>
> Sorry, I did not realize that.
>
> The scenario is as follows: fnsplit splits a function, and as todo
> cleanup_tree_cfg is called and unreachable blocks are removed, among
> which blocks 12 and 13.
>
> Block 12 contains a use of 45:
>
>   # BLOCK 12 freq:9100
>   # PRED: 13
>   D.13888_46 = *sD.13886_45;
>
> Block 13 contains a def of 45:
>
>   # BLOCK 13
>   # PRED: 11 12
>   ...
>   # sD.13886_45 = PHI <sD.13886_44(11), sD.13886_49(12)>
>   ...
>   if (sizeD.8479_2 > iD.13887_50)
>     goto <bb 12>;
>   else
>     goto <bb 14>;
>   # SUCC: 12 14
>
> First block 13 is removed, and
> remove_phi_nodes_and_edges_for_unreachable_block in remove_bb removes
> the phi def and releases version 45. Then block 12 is removed, and
> before removal it is dumped by dump_bb in remove_bb, triggering the segv.
>
> The order of removal is determined by the 2nd loop in
> delete_unreachable_blocks, which is chosen because there is no
> dominator info present:
>
>   for (b = EXIT_BLOCK_PTR->prev_bb; b != ENTRY_BLOCK_PTR; b = prev_bb)
>     {
>       prev_bb = b->prev_bb;
>
>       if (!(b->flags & BB_REACHABLE))
>         {
>           delete_basic_block (b);
>           changed = true;
>         }
>     }
>
> I'm not sure how to fix this.

Hm, it's probably easiest to fixup the dumper here indeed.

> Another occurrence of the same segv is in remove_dead_inserted_code:
>
>   EXECUTE_IF_SET_IN_BITMAP (inserted_exprs, 0, i, bi)
>     {
>       t = SSA_NAME_DEF_STMT (ssa_name (i));
>       if (!gimple_plf (t, NECESSARY))
>         {
>           gimple_stmt_iterator gsi;
>
>           if (dump_file && (dump_flags & TDF_DETAILS))
>             {
>               fprintf (dump_file, "Removing unnecessary insertion:");
>               print_gimple_stmt (dump_file, t, 0, 0);
>             }
>
>           gsi = gsi_for_stmt (t);
>           if (gimple_code (t) == GIMPLE_PHI)
>             remove_phi_node (&gsi, true);
>           else
>             {
>               gsi_remove (&gsi, true);
>               release_defs (t);
>             }
>         }
>     }
>
> Here a version is released, while it's used in the defining statement of
> version+1, which is subsequently printed. This is easy to fix by
> splitting the loop, I'll make a patch for this.

Probably also not worth fixing. I guess we can simply go with your patch, which in its updated form is ok for trunk.

Thanks,
Richard.

> There might be other occurrences (I triggered these 2 doing a gcc
> build), but I cannot trigger others until delete_unreachable_blocks
> does not trigger anymore.
>
>> Richard.
>
> Updated untested patch attached, I'll test this patch together with the
> remove_dead_inserted_code patch.
>
> Thanks,
> - Tom
>
> 2011-08-25  Tom de Vries  <t...@codesourcery.com>
>
> 	* tree-pretty-print (dump_generic_node): Test for NULL_TREE before
> 	accessing TREE_TYPE.
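
The approach settled on in this thread, making the dumper tolerate a released name instead of reordering the removals, can be illustrated with a toy model. This is only a sketch under stated assumptions: toy_type, toy_node and toy_dump_node are hypothetical stand-ins, not GCC's tree or SSA_NAME structures, and the "<released_N>" output format is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for a type and an SSA name; these are not the
   GCC data structures.  */
struct toy_type
{
  const char *name;
};

struct toy_node
{
  struct toy_type *type;	/* NULL once the name has been released.  */
  int version;
};

/* Print NODE into BUF (at most SIZE bytes, NUL-terminated) and return the
   number of characters the full text needs, as snprintf does.  The type
   is tested for NULL before it is dereferenced, so a released node is
   printed as a placeholder instead of crashing the dumper.  */
static int
toy_dump_node (char *buf, size_t size, const struct toy_node *node)
{
  assert (buf != NULL && node != NULL);

  if (node->type == NULL)
    return snprintf (buf, size, "<released_%d>", node->version);

  return snprintf (buf, size, "%s_%d", node->type->name, node->version);
}
```

Splitting the loop, as proposed for remove_dead_inserted_code, would address the same problem from the other side: print every statement first, and release definitions only in a second pass.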
Re: [PATCH] Add infrastructure to merge standard builtin enums with backend builtins
On Thu, Aug 25, 2011 at 10:35 PM, Michael Meissner <meiss...@linux.vnet.ibm.com> wrote:

> On Wed, Aug 24, 2011 at 11:06:55AM +0200, Richard Guenther wrote:
>> This basically would make DECL_BUILT_IN_CLASS no longer necessary
>> if all targets were converted, right? (We don't currently have any
>> BUILT_IN_FRONTEND builtins). That would sound appealing if this
>> patch weren't a partial transition ;)
>
> Or we could reduce it to 1 bit if we aren't going to change all of the
> backends.
>
>> Now for the possible downsides. How can we reliably distinguish
>> middle-end from target builtins for purpose of lazy initialization?
>> Doesn't this complicate the idea of pluggable targets, thus something
>> like a hybrid ppc / spu compiler? In this light merging middle-end
>> and target builtin enums and arrays sounds like a step backward.
>
> If we are willing to pay the storage costs, we could have 1 or 2 bytes
> for builtin owner, and 2 bytes for builtin index, and then reserve 0 for
> standard builtins and 1 for machine dependent builtins. However, then
> you still have the potential problem that sooner or later somebody else
> will omit the checks.

I don't think that the issue you only can index BUILT_IN_NORMAL builtins in built_in_decls is an issue and worth thinking about at all. It's simply bugs.

> We could reserve a fixed range for plugin builtins if you think that is
> desirable.

Oh, plugin builtins - I didn't even think about the possibility of having those ;) In the end I think we should stick with BUILT_IN_CLASS and maybe add BUILT_IN_PLUGIN then ;)

>> What I _do_ like is having common machinery for defining builtins.
>> Though instead of continuing the .def file way with all the current
>> warts of ways of adding attributes, etc. to builtins I would have
>> preferred a genbuiltins.c program that can parse standard C
>> declarations and generate whatever is necessary to setup the builtin
>> decls. Thus, instead of
>>
>>   DEF_GCC_BUILTIN (BUILT_IN_CLZ, "clz", BT_FN_INT_UINT,
>>                    ATTR_CONST_NOTHROW_LEAF_LIST)
>>
>> have simply
>>
>>   int __builtin_clz (unsigned int) __attribute__((const,nothrow,leaf));
>>
>> in a header file which genbuiltins.c would parse. My first idea when
>> discussing this was a -fgenbuiltins flag to the C frontend (because
>> that already can do all the parsing ...), but Micha suggested a parser
>> that can deal with the above is easy enough to re-implement.
>
> Yes, that is certainly do-able. My main intention is to see what kind
> of infrastructure people wanted before changing all of the ppc builtins.

Sure. I agree that all the duplicated code we have in backends for a way to create target builtins, defining enums (or not) for them and having a way to reference them for targetm.builtin_decl (or not) is bad. But unifying those, or providing common infrastructure for them should be orthogonal to the issue whether we want to merge the builtin classes or their storage in some way (I think we don't).

>> It would of course be nice if the infrastructure to create target
>> builtins were generic enough to eventually handle builtin creation in
>> the middle-end (and the frontends) as well.
>>
>> Hm, I guess this pushes back a bit on your patch. Sorry for that.
>> If you're not excited to try the above idea, can you split out the
>> pieces that do the .def file thing for rs6000, keeping the separation
>> of md and middle-end builtin arrays and enums?
>
> I have several goals for the 4.7 time frame:
>
> 1) Make target attribute and pragma enable appropriate machine
>    dependent builtins;

That's now something completely new ;) Why do we need builtins for this?

> 2) Make it less likely we will again be bitten by code that blindly
>    references built_in_decl without checking if it is MD or standard;

I don't think this is important at all. Proposed solution: transition builtin decl access to a functional interface:

  tree built_in_decl (enum built_in_code)

which when building with C++ will get you warnings if indexed with a bogus enum type or an integer type.

> 3) Make at least the MD builtins created on demand. It would be nice
>    to do the standard builtins as well, but that may be somewhat more
>    problematic.
>
> I do think all references to built_in_decl and implicit_built_in_decl
> should be moved to a macro wrapper.

To a (inline) function wrapper with the same name, indeed.

> If we restrict the types and attributes for a C like header file, it
> shouldn't be that hard (famous last words). I would think adding
> #ifdef also, so:
>
>   #ifdef __ALTIVEC__
>   extern vector float __builtin_altivec_vaddfp (vector float, vector float)
>     __attribute__ ((...));
>   #endif
>
> The backend would need to specify a list of valid #ifdef's and the
> mapping to TARGET_xxx, and valid extra types with a mapping to the
> internal type node.

Yes. For the middle-end/frontend stuff we also need a way to specify the difference between C89 and C99 builtins and GCC internal builtins. Not sure if I'd use #ifdef like above or simply
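
The functional interface proposed above can be sketched in plain C as follows. This is a minimal illustration, not GCC code: the toy_ enum, table and accessor are hypothetical names. In C the enum parameter alone does not reject integer arguments, so the sketch adds a run-time range check; the compile-time diagnostics mentioned in the message rely on building with C++.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical builtin codes; the real enum built_in_code is far larger.  */
enum toy_built_in_code
{
  TOY_BUILT_IN_CLZ,
  TOY_BUILT_IN_CTZ,
  TOY_END_BUILTINS
};

/* Hypothetical stand-in for a builtin's declaration.  */
struct toy_decl
{
  const char *name;
};

/* The table is private to this file; users go through the accessor.  */
static struct toy_decl toy_decls[TOY_END_BUILTINS] =
{
  { "__builtin_clz" },
  { "__builtin_ctz" }
};

/* Functional interface: the only way to reach the table.  The assert
   catches out-of-range codes, for instance a machine-dependent builtin
   code passed by mistake.  */
static inline struct toy_decl *
toy_built_in_decl (enum toy_built_in_code code)
{
  assert ((unsigned int) code < (unsigned int) TOY_END_BUILTINS);
  return &toy_decls[code];
}
```

Routing every access through one function also leaves room to create declarations on demand later without touching the callers.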
Re: [lto] Refactor streamer (1/N) (issue4809083)
On Mon, Aug 8, 2011 at 5:17 PM, Richard Guenther <rguent...@suse.de> wrote:

> On Mon, 8 Aug 2011, Diego Novillo wrote:
>> On Mon, Aug 8, 2011 at 10:52, Michael Matz <m...@suse.de> wrote:
>>> Sound. ;) Looking forward to some bikeshedding about naming in (2)
>>> and overabstraction in (3) :)
>>
>> Heh, yeah. I am going to be sending the renaming patch later today or
>> tomorrow. In principle, the things I want to abstract are those that
>> are forcing me to include lto-streamer.h from
>> {tree,gimple,data}-streamer.*. I will know better when I merge this
>> into the pph branch, though.
>
> Yeah, I think we discussed this already and agreed on that this is
> a sensible plan.

This patch caused http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50165; it seems that LTO string hashing is seriously broken now.

> Richard.

Richard.
[Patch ARM] Fix scheduling descriptions for smull.
Hi,

This fixes the missing scheduling descriptions for some of the DImode multiply instructions. Tested on arm-linux-gnueabi and benchmarked with SPEC2k showing minor improvements.

Will be committed shortly.

cheers
Ramana

2011-08-26  Ramana Radhakrishnan  <ramana.radhakrish...@arm.com>

	* config/arm/cortex-a9.md ("cortex_a9_mult_long"): New.
	("cortex_a9_multiply_long"): New and use above. Handle all
	long multiply cases.
	("cortex_a9_multiply"): Handle smmul and smmulr.
	("cortex_a9_mac"): Handle smmla.

diff --git a/gcc/config/arm/cortex-a9.md b/gcc/config/arm/cortex-a9.md
index b74ace8..12c19ef 100644
--- a/gcc/config/arm/cortex-a9.md
+++ b/gcc/config/arm/cortex-a9.md
@@ -68,7 +68,8 @@ cortex_a9_p1_e2 + cortex_a9_p0_e1 + cortex_a9_p1_e1")
   "cortex_a9_mac_m1*2, cortex_a9_mac_m2, cortex_a9_p0_wb")
 (define_reservation "cortex_a9_mac"
   "cortex_a9_multcycle1*2 ,cortex_a9_mac_m2, cortex_a9_p0_wb")
-
+(define_reservation "cortex_a9_mult_long"
+  "cortex_a9_mac_m1*3, cortex_a9_mac_m2, cortex_a9_p0_wb")
 
 ;; Issue at the same time along the load store pipeline and
 ;; the VFP / Neon pipeline is not possible.
@@ -139,29 +140,35 @@ cortex_a9_p1_e2 + cortex_a9_p0_e1 + cortex_a9_p1_e1")
        (eq_attr "insn" "smlaxy"))
   "cortex_a9_mac16")
 
-
 (define_insn_reservation "cortex_a9_multiply" 4
   (and (eq_attr "tune" "cortexa9")
-       (eq_attr "insn" "mul"))
+       (eq_attr "insn" "mul,smmul,smmulr"))
     "cortex_a9_mult")
 
 (define_insn_reservation "cortex_a9_mac" 4
   (and (eq_attr "tune" "cortexa9")
-       (eq_attr "insn" "mla"))
+       (eq_attr "insn" "mla,smmla"))
     "cortex_a9_mac")
 
+(define_insn_reservation "cortex_a9_multiply_long" 5
+  (and (eq_attr "tune" "cortexa9")
+       (eq_attr "insn" "smull,umull,smulls,umulls,smlal,smlals,umlal,umlals"))
+    "cortex_a9_mult_long")
+
 ;; An instruction with a result in E2 can be forwarded
 ;; to E2 or E1 or M1 or the load store unit in the next cycle.
 
 (define_bypass 1 "cortex_a9_dp"
                "cortex_a9_dp_shift, cortex_a9_multiply,
  cortex_a9_load1_2, cortex_a9_dp, cortex_a9_store1_2,
- cortex_a9_mult16, cortex_a9_mac16, cortex_a9_mac, cortex_a9_store3_4, cortex_a9_load3_4")
+ cortex_a9_mult16, cortex_a9_mac16, cortex_a9_mac, cortex_a9_store3_4, cortex_a9_load3_4,
+ cortex_a9_multiply_long")
 
 (define_bypass 2 "cortex_a9_dp_shift"
                "cortex_a9_dp_shift, cortex_a9_multiply,
  cortex_a9_load1_2, cortex_a9_dp, cortex_a9_store1_2,
- cortex_a9_mult16, cortex_a9_mac16, cortex_a9_mac, cortex_a9_store3_4, cortex_a9_load3_4")
+ cortex_a9_mult16, cortex_a9_mac16, cortex_a9_mac, cortex_a9_store3_4, cortex_a9_load3_4,
+ cortex_a9_multiply_long")
 
 ;; An instruction in the load store pipeline can provide
 ;; read access to a DP instruction in the P0 default pipeline
@@ -212,7 +219,7 @@ cortex_a9_store3_4, cortex_a9_store1_2, cortex_a9_load3_4")
 
 (define_bypass 1
   "cortex_a9_fps"
-  "cortex_a9_fadd, cortex_a9_fps, cortex_a9_fcmp, cortex_a9_dp, cortex_a9_dp_shift, cortex_a9_multiply")
+  "cortex_a9_fadd, cortex_a9_fps, cortex_a9_fcmp, cortex_a9_dp, cortex_a9_dp_shift, cortex_a9_multiply, cortex_a9_multiply_long")
 
 ;; Scheduling on the FP_ADD pipeline.
 (define_reservation "ca9fp_add" "ca9_issue_vfp_neon + ca9fp_add1, ca9fp_add2, ca9fp_add3, ca9fp_add4")
[PATCH, testsuite, ARM] change XFAIL to pass for ARM on a case testing tree-ssa-dom
Test case gcc.dg/tree-ssa/20040204-1.c can pass for -O1 after Richard Guenther <rguent...@suse.de> fixed something in tree-ssa-dom. The link_error should be optimized away for ARM targets as well.

The patch is:

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c b/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
index 45e44a1..470b585 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
@@ -33,5 +33,5 @@ void test55 (int x, int y)
    that the && should be emitted (based on BRANCH_COST).  Fix this
    by teaching dom to look through && and register all components
    as true.  */
-/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! "alpha*-*-* powerpc*-*-* cris-*-* crisv32-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* sparc*-*-* spu-*-* x86_64-*-*" } } } } */
+/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! "alpha*-*-* arm*-*-* powerpc*-*-* cris-*-* crisv32-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* sparc*-*-* spu-*-* x86_64-*-*" } } } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */

gcc/testsuite/ChangeLog:

2011-08-26  Jiangning Liu  <jiangning@arm.com>

	PR tree-optimization/46021
	* gcc.dg/tree-ssa/20040204-1.c: Don't XFAIL on arm*-*-*.

Thanks,
-Jiangning
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
Double ping.

2011/8/19 Ilya Enkovich <enkovich@gmail.com>:
> Ping.
>
> 2011/8/10 Ilya Enkovich <enkovich@gmail.com>:
>> Hello,
>>
>> Here is a new version of the patch. Changes from the previous version
>> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02240.html):
>>  - updated to trunk
>>  - TODO_remove_unused_locals flag was removed from todo_flags_finish
>>    of reassoc pass
>>
>> Bootstrapped and checked on x86_64-linux.
>>
>> Thanks,
>> Ilya
>> ---
>> gcc/
>>
>> 2011-08-10  Enkovich Ilya  <ilya.enkov...@intel.com>
>>
>> 	PR middle-end/44382
>> 	* target.def (reassociation_width): New hook.
>> 	* doc/tm.texi.in (reassociation_width): Likewise.
>> 	* doc/tm.texi (reassociation_width): Likewise.
>> 	* doc/invoke.texi (tree-reassoc-width): New param documented.
>> 	* hooks.h (hook_int_uint_mode_1): New default hook.
>> 	* hooks.c (hook_int_uint_mode_1): Likewise.
>> 	* config/i386/i386.h (ix86_tune_indices): Add
>> 	X86_TUNE_REASSOC_INT_TO_PARALLEL and
>> 	X86_TUNE_REASSOC_FP_TO_PARALLEL.
>> 	(TARGET_REASSOC_INT_TO_PARALLEL): New.
>> 	(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.
>> 	* config/i386/i386.c (initial_ix86_tune_features): Add
>> 	X86_TUNE_REASSOC_INT_TO_PARALLEL and
>> 	X86_TUNE_REASSOC_FP_TO_PARALLEL.
>> 	(ix86_reassociation_width): Implementation of
>> 	new hook for i386 target.
>> 	* params.def (PARAM_TREE_REASSOC_WIDTH): New param added.
>> 	* tree-ssa-reassoc.c (get_required_cycles): New function.
>> 	(get_reassociation_width): Likewise.
>> 	(swap_ops_for_binary_stmt): Likewise.
>> 	(rewrite_expr_tree_parallel): Likewise.
>> 	(rewrite_expr_tree): Refactored. Part of code moved into
>> 	swap_ops_for_binary_stmt.
>> 	(reassociate_bb): Now checks reassociation width to be used
>> 	and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
>> 	if needed.
>>
>> gcc/testsuite/
>>
>> 2011-08-10  Enkovich Ilya  <ilya.enkov...@intel.com>
>>
>> 	* gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
>> 	--param tree-reassoc-width=1.
>> 	* gcc.dg/tree-ssa/reassoc-24.c: New test.
>> 	* gcc.dg/tree-ssa/reassoc-25.c: Likewise.
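
To illustrate what a reassociation width greater than one is meant to buy, here is a hand-written C sketch of the two evaluation orders for a four-operand sum. It is not output of the patch; the function names are hypothetical, and unsigned arithmetic is used so that both orders are guaranteed to agree even when the sum wraps.

```c
#include <assert.h>

/* Width 1: a serial chain.  Each addition depends on the previous one,
   so the critical path is three additions long.  */
static unsigned int
sum_serial (unsigned int a, unsigned int b, unsigned int c, unsigned int d)
{
  unsigned int t1 = a + b;
  unsigned int t2 = t1 + c;
  return t2 + d;
}

/* Width 2: the first two additions are independent and can issue in
   parallel on a target with two suitable units, so the critical path is
   two additions long.  */
static unsigned int
sum_width2 (unsigned int a, unsigned int b, unsigned int c, unsigned int d)
{
  unsigned int t1 = a + b;
  unsigned int t2 = c + d;
  return t1 + t2;
}

/* Unsigned addition is associative modulo 2^N, so the rewrite preserves
   the result for all inputs.  */
static void
check_reassociation (unsigned int a, unsigned int b,
		     unsigned int c, unsigned int d)
{
  assert (sum_serial (a, b, c, d) == sum_width2 (a, b, c, d));
}
```

For floating-point operands the two orders can round differently, which is presumably why the patch keeps separate integer and FP tuning flags.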
Re: [PATCH][ARM] Thumb2 replicated constants
On 09/05/11 17:23, Andrew Stubbs wrote:
> On 06/05/11 12:18, Richard Earnshaw wrote:
>> OK with a change to do that.
>
> Thanks, I can't commit this until my ADDW/SUBW patch has been committed.

There was a bug I found in final testing, so this has been delayed somewhat.

I've just committed this version. There are a few minor changes to the way negative/inverted constants are generated.

Andrew

2011-08-26  Andrew Stubbs  <a...@codesourcery.com>

	gcc/
	* config/arm/arm.c (struct four_ints): New type.
	(count_insns_for_constant): Delete function.
	(find_best_start): Delete function.
	(optimal_immediate_sequence): New function.
	(optimal_immediate_sequence_1): New function.
	(arm_gen_constant): Move constant splitting code to
	optimal_immediate_sequence.
	Rewrite constant negation/inversion code.

	gcc/testsuite/
	* gcc.target/arm/thumb2-replicated-constant1.c: New file.
	* gcc.target/arm/thumb2-replicated-constant2.c: New file.
	* gcc.target/arm/thumb2-replicated-constant3.c: New file.
	* gcc.target/arm/thumb2-replicated-constant4.c: New file.

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -64,6 +64,11 @@ typedef struct minipool_fixup   Mfix;
 
 void (*arm_lang_output_object_attributes_hook)(void);
 
+struct four_ints
+{
+  int i[4];
+};
+
 /* Forward function declarations.  */
 static bool arm_needs_doubleword_align (enum machine_mode, const_tree);
 static int arm_compute_static_chain_stack_bytes (void);
@@ -128,7 +133,13 @@ static void arm_output_function_prologue (FILE *, HOST_WIDE_INT);
 static int arm_comp_type_attributes (const_tree, const_tree);
 static void arm_set_default_type_attributes (tree);
 static int arm_adjust_cost (rtx, rtx, rtx, int);
-static int count_insns_for_constant (HOST_WIDE_INT, int);
+static int optimal_immediate_sequence (enum rtx_code code,
+                                       unsigned HOST_WIDE_INT val,
+                                       struct four_ints *return_sequence);
+static int optimal_immediate_sequence_1 (enum rtx_code code,
+                                         unsigned HOST_WIDE_INT val,
+                                         struct four_ints *return_sequence,
+                                         int i);
 static int arm_get_strip_length (int);
 static bool arm_function_ok_for_sibcall (tree, tree);
 static enum machine_mode arm_promote_function_mode (const_tree,
@@ -2513,68 +2524,41 @@ arm_split_constant (enum rtx_code code, enum machine_mode mode, rtx insn,
                            1);
 }
 
-/* Return the number of instructions required to synthesize the given
-   constant, if we start emitting them from bit-position I.  */
-static int
-count_insns_for_constant (HOST_WIDE_INT remainder, int i)
-{
-  HOST_WIDE_INT temp1;
-  int step_size = TARGET_ARM ? 2 : 1;
-  int num_insns = 0;
-
-  gcc_assert (TARGET_ARM || i == 0);
-
-  do
-    {
-      int end;
-
-      if (i <= 0)
-        i += 32;
-      if (remainder & (((1 << step_size) - 1) << (i - step_size)))
-        {
-          end = i - 8;
-          if (end < 0)
-            end += 32;
-          temp1 = remainder & ((0x0ff << end)
-                                    | ((i < end) ? (0xff >> (32 - end)) : 0));
-          remainder &= ~temp1;
-          num_insns++;
-          i -= 8 - step_size;
-        }
-      i -= step_size;
-    } while (remainder);
-  return num_insns;
-}
-
+/* Return a sequence of integers, in RETURN_SEQUENCE that fit into
+   ARM/THUMB2 immediates, and add up to VAL.
+   Thr function return value gives the number of insns required.  */
 static int
-find_best_start (unsigned HOST_WIDE_INT remainder)
+optimal_immediate_sequence (enum rtx_code code, unsigned HOST_WIDE_INT val,
+                            struct four_ints *return_sequence)
 {
   int best_consecutive_zeros = 0;
   int i;
   int best_start = 0;
+  int insns1, insns2;
+  struct four_ints tmp_sequence;
 
   /* If we aren't targetting ARM, the best place to start is always at
-     the bottom.  */
-  if (! TARGET_ARM)
-    return 0;
-
-  for (i = 0; i < 32; i += 2)
+     the bottom, otherwise look more closely.  */
+  if (TARGET_ARM)
     {
-      int consecutive_zeros = 0;
-
-      if (!(remainder & (3 << i)))
+      for (i = 0; i < 32; i += 2)
         {
-          while ((i < 32) && !(remainder & (3 << i)))
-            {
-              consecutive_zeros += 2;
-              i += 2;
-            }
-          if (consecutive_zeros > best_consecutive_zeros)
+          int consecutive_zeros = 0;
+
+          if (!(val & (3 << i)))
             {
-              best_consecutive_zeros = consecutive_zeros;
-              best_start = i - consecutive_zeros;
+              while ((i < 32) && !(val & (3 << i)))
+                {
+                  consecutive_zeros += 2;
+                  i += 2;
+                }
+              if (consecutive_zeros > best_consecutive_zeros)
+                {
+                  best_consecutive_zeros = consecutive_zeros;
+                  best_start = i - consecutive_zeros;
+                }
+              i -= 2;
             }
-          i -= 2;
         }
     }
 
@@ -2601,13 +2585,161 @@ find_best_start (unsigned HOST_WIDE_INT remainder)
      the constant starting from `best_start', and also starting from
      zero (i.e. with bit 31 first to be output).  If `best_start' doesn't
      yield a shorter sequence, we may as well use zero.  */
+  insns1 = optimal_immediate_sequence_1 (code, val, return_sequence, best_start);
   if (best_start != 0
-      && ((((unsigned HOST_WIDE_INT) 1) << best_start) < remainder)
-
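
As background for the constant-splitting code above, the following sketch checks whether a single 32-bit value fits one immediate operand. It is a standalone illustration and not taken from arm.c: both function names are hypothetical. The first test models the classic ARM-mode encoding (an 8-bit value rotated right by an even amount); the second models only the three Thumb-2 byte-replication patterns that give this thread its subject, and ignores the other Thumb-2 immediate forms.

```c
#include <assert.h>
#include <stdint.h>

/* Return nonzero if VAL is an 8-bit value rotated right by an even
   amount, i.e. encodable as a classic ARM-mode immediate.  */
static int
is_arm_immediate (uint32_t val)
{
  int rot;

  for (rot = 0; rot < 32; rot += 2)
    {
      /* Rotating VAL left by ROT undoes a rotate-right by ROT.  The
         ROT == 0 case is handled separately to avoid a 32-bit shift.  */
      uint32_t v = rot == 0 ? val : (val << rot) | (val >> (32 - rot));

      if ((v & ~(uint32_t) 0xff) == 0)
        return 1;
    }
  return 0;
}

/* Return nonzero if VAL matches one of the Thumb-2 replicated-byte
   patterns 0x00XY00XY, 0xXY00XY00 or 0xXYXYXYXY.  This checks the bit
   pattern only; encoding restrictions on XY are not modelled.  */
static int
is_thumb2_replicated (uint32_t val)
{
  uint32_t lo = val & 0xff;		/* Candidate XY in bits 0-7.  */
  uint32_t hi = (val >> 8) & 0xff;	/* Candidate XY in bits 8-15.  */

  assert (lo <= 0xff && hi <= 0xff);

  if (val == (lo | (lo << 16)))			/* 0x00XY00XY */
    return 1;
  if (val == ((hi << 8) | (hi << 24)))		/* 0xXY00XY00 */
    return 1;
  if (val == lo * (uint32_t) 0x01010101)	/* 0xXYXYXYXY */
    return 1;
  return 0;
}
```

A value such as 0x00ab00ab fails the ARM-mode test but passes the replicated test, which is the kind of constant a Thumb-2 aware splitter can emit in one instruction.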
Re: [PATCH] PR42554/49992: avoid use of '-c' flag with ranlib on darwin10 and later
* Jack Howarth wrote on Fri, Aug 12, 2011 at 01:27:21AM CEST:
> The following patch addresses
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42554#c15
> by extending the logic used in...
>
> URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157563
> Log:
> 	PR ada/42554
> 	* configure.ac: Only pass -c to ranlib for darwin9 and earlier.
> 	* configure: Regenerate.
>
> Okay for gcc trunk?

OK with ...

> 2010-08-11  Jack Howarth  <howa...@bromo.med.uc.edu>
>
> 	PR 42554/49992
> 	* gcc/configure.ac: Only pass -c to ranlib for darwin9 and earlier.
> 	* gcc/configure.ac: Regenerate.

... typo in file name fixed.

Thanks,
Ralf

> --- gcc/configure.ac	(revision 177684)
> +++ gcc/configure.ac	(working copy)
> @@ -821,11 +821,8 @@ gcc_AC_PROG_LN_S
>  ACX_PROG_LN($LN_S)
>  AC_PROG_RANLIB
>  case "${host}" in
> -*-*-darwin*)
> -  # By default, the Darwin ranlib will not treat common symbols as
> -  # definitions when building the archive table of contents.  Other
> -  # ranlibs do that; pass an option to the Darwin ranlib that makes
> -  # it behave similarly.
> +*-*-darwin[[3-9]]*)
> +  # ranlib before Darwin10 requires the -c flag to look at common symbols.
>    ranlib_flags="-c"
>    ;;
>  *)
PING: [PATCH]: Fix -fbranch-probabilities
Hello,

Could I have a review for the trivial patch posted in http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01123.html?

-fprofile-use sets flag_branch_probabilities. But we should also be able to use -fbranch-probabilities on its own, using the information generated by -fprofile-arcs, as documented.

Many thanks
Christian
Re: [PATCH] PR42554/49992: avoid use of '-c' flag with ranlib on darwin10 and later
On 26 Aug 2011, at 11:27, Ralf Wildenhues wrote:

> * Jack Howarth wrote on Fri, Aug 12, 2011 at 01:27:21AM CEST:
>> The following patch addresses
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42554#c15
>> by extending the logic used in...
>>
>> URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157563
>> Log:
>> 	PR ada/42554
>> 	* configure.ac: Only pass -c to ranlib for darwin9 and earlier.
>> 	* configure: Regenerate.
>>
>> Okay for gcc trunk?
>
> OK with ...
>
>> 2010-08-11  Jack Howarth  <howa...@bromo.med.uc.edu>
>>
>> 	PR 42554/49992
>> 	* gcc/configure.ac: Only pass -c to ranlib for darwin9 and earlier.
>> 	* gcc/configure.ac: Regenerate.
>
> ... typo in file name fixed.
>
> Thanks,
> Ralf
>
>> --- gcc/configure.ac	(revision 177684)
>> +++ gcc/configure.ac	(working copy)
>> @@ -821,11 +821,8 @@ gcc_AC_PROG_LN_S
>>  ACX_PROG_LN($LN_S)
>>  AC_PROG_RANLIB
>>  case "${host}" in
>> -*-*-darwin*)
>> -  # By default, the Darwin ranlib will not treat common symbols as
>> -  # definitions when building the archive table of contents.  Other
>> -  # ranlibs do that; pass an option to the Darwin ranlib that makes
>> -  # it behave similarly.
>> +*-*-darwin[[3-9]]*)
>> +  # ranlib before Darwin10 requires the -c flag to look at common symbols.
>>    ranlib_flags="-c"
>>    ;;
>>  *)

I am still investigating this -- getting Ada bootstrapped on ppc has taken some time...

I am not objecting to the patch, but I think we can go further. As commented in the PR, I would say that we can likely remove the special casing of ranlib completely for all Darwin (some more testing on ppc/ada still under way).

So far OK on ppc/darwin8 and x86_64/darwin10 (incl. ada on *86*).

As things stand, darwin 8 will not bootstrap GCC 4.6 or trunk with its native toolset; it requires the use of odcctools or similar to make use of newer versions of ld. (Thus, support of ancient darwin is conditional on use of a toolset from at least the darwin 8 era.)

cheers
Iain
[testsuite, i386] Fix for PR50185
Hi,

Here is a fix for http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

testsuite/ChangeLog entry:

2011-08-26 Kirill Yukhin kirill.yuk...@intel.com

PR testsuite/50185
* gcc.target/i386/avx2-vmovmskb-2.c: Rename to ...
* gcc.target/i386/avx2-vpmovmskb-2.c: ... this. Update.

Test passes. Ok for trunk?

Thanks, K

Attachment: pr50185.gcc.patch (binary data)
Re: [testsuite, i386] Fix for PR50185
According to Jakub's input, I've updated the test to scan for the instruction, not the pattern name.

Is it ok?

Thanks, K

On Fri, Aug 26, 2011 at 3:45 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
> Hi,
>
> Here is a fix for http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
>
> testsuite/ChangeLog entry:
>
> 2011-08-26 Kirill Yukhin kirill.yuk...@intel.com
>
> PR testsuite/50185
> * gcc.target/i386/avx2-vmovmskb-2.c: Rename to ...
> * gcc.target/i386/avx2-vpmovmskb-2.c: ... this. Update.
>
> Test passes. Ok for trunk?
>
> Thanks, K

Attachment: pr50185-2.gcc.patch (binary data)
Re: [lto] Refactor streamer (1/N) (issue4809083)
Hi,

On Fri, 26 Aug 2011, Richard Guenther wrote:

>> I am going to be sending the renaming patch later today or tomorrow.
>> In principle, the things I want to abstract are those that are forcing
>> me to include lto-streamer.h from {tree,gimple,data}-streamer.*. I
>> will know better when I merge this into the pph branch, though.
>
> Yeah, I think we discussed this already and agreed on that this is a
> sensible plan.
>
> This patch caused http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50165
> it seems that LTO string hashing is seriously broken now.

Once regstrap passes on x86_64-linux I'm checking this in as obvious.

Ciao,
Michael.
--
PR lto/50165
* lto-streamer-in.c (canon_file_name): Initialize new_slot->len.

Index: lto-streamer-in.c
===================================================================
--- lto-streamer-in.c (revision 178040)
+++ lto-streamer-in.c (working copy)
@@ -113,6 +113,7 @@ canon_file_name (const char *string)
       new_slot = XCNEW (struct string_slot);
       strcpy (saved_string, string);
       new_slot->s = saved_string;
+      new_slot->len = len;
       *slot = new_slot;
       return saved_string;
     }
Re: [lto] Refactor streamer (1/N) (issue4809083)
On 11-08-26 04:24 , Richard Guenther wrote:

> This patch caused http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50165
> it seems that LTO string hashing is seriously broken now.

Sorry about this. Bad timing as I will be away until 7/Sep. Would it make things easier if the commit that introduced this was reverted?

Diego.
[patch, libfortran] Fix PR 50192 - fix wide-char comparison
Hello world,

the attached patch fixes the PR by doing comparisons for wide characters as unsigned 4-byte ints. I have put the comparison function into libgfortran.h because I will need it for MINLOC and friends for characters.

OK for trunk? Which branches should this be backported to?

Thomas

2011-08-26 Thomas Koenig tkoe...@gcc.gnu.org

PR libfortran/50192
* intrinsics/string_intrinsics.c (memcmp_char4): New function.
* intrinsics/string_intrinsics_inc.c: New macro MEMCMP, either set to memcmp or memcmp_char4.
(compare_string): Use MEMCMP, with correct size for it.
* libgfortran.h: Add prototype for memcmp_char4.

2011-08-26 Thomas Koenig tkoe...@gcc.gnu.org

PR libfortran/50192
* gfortran.dg/widechar_compare_1.f90: New test.

Index: intrinsics/string_intrinsics_inc.c
===================================================================
--- intrinsics/string_intrinsics_inc.c (revision 178067)
+++ intrinsics/string_intrinsics_inc.c (working copy)
@@ -90,7 +90,7 @@ compare_string (gfc_charlen_type len1, const CHART
   gfc_charlen_type len;
   int res;
 
-  res = memcmp (s1, s2, ((len1 < len2) ? len1 : len2) * sizeof (CHARTYPE));
+  res = MEMCMP (s1, s2, ((len1 < len2) ? len1 : len2));
   if (res != 0)
     return res;
 
Index: intrinsics/string_intrinsics.c
===================================================================
--- intrinsics/string_intrinsics.c (revision 178067)
+++ intrinsics/string_intrinsics.c (working copy)
@@ -51,7 +51,24 @@ memset_char4 (gfc_char4_t *b, gfc_char4_t c, size_
   return b;
 }
 
+/* Compare wide character types, which are handled internally as
+   unsigned 4-byte integers.  */
+int
+memcmp_char4 (const void *a, const void *b, size_t len)
+{
+  const GFC_UINTEGER_4 *pa = a;
+  const GFC_UINTEGER_4 *pb = b;
+  while (len-- > 0)
+    {
+      if (*pa != *pb)
+        return *pa < *pb ? -1 : 1;
+      pa ++;
+      pb ++;
+    }
+  return 0;
+}
 
+
 /* All other functions are defined using a few generic macros in
    string_intrinsics_inc.c, so we avoid code duplication between the
    various character type kinds.  */
@@ -64,6 +81,8 @@ memset_char4 (gfc_char4_t *b, gfc_char4_t c, size_
 #define SUFFIX(x) x
 #undef MEMSET
 #define MEMSET memset
+#undef MEMCMP
+#define MEMCMP memcmp
 
 #include "string_intrinsics_inc.c"
 
@@ -76,6 +95,8 @@ memset_char4 (gfc_char4_t *b, gfc_char4_t c, size_
 #define SUFFIX(x) x ## _char4
 #undef MEMSET
 #define MEMSET memset_char4
+#undef MEMCMP
+#define MEMCMP memcmp_char4
 
 #include "string_intrinsics_inc.c"
 
Index: libgfortran.h
===================================================================
--- libgfortran.h (revision 178067)
+++ libgfortran.h (working copy)
@@ -1266,6 +1266,10 @@ extern int compare_string_char4 (gfc_charlen_type,
                                  gfc_charlen_type, const gfc_char4_t *);
 iexport_proto(compare_string_char4);
 
+extern int memcmp_char4 (const void *, const void *, size_t);
+internal_proto(memcmp_char4);
+
+
 /* random.c */
 
 extern void random_seed_i4 (GFC_INTEGER_4 * size, gfc_array_i4 * put,

! { dg-do run }
! PR 50192 - on little-endian systems, this used to fail.
program main
  character(kind=4,len=2) :: c1, c2
  c1 = 4_' '
  c2 = 4_' '
  c1(1:1) = transfer(257, mold=c1(1:1))
  c2(1:1) = transfer(64, mold=c2(1:1))
  if (c1 < c2) call abort
end program main
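
The reason the test case uses 257 and 64 can be seen in a small standalone sketch. It compares the same two code units once byte by byte (as plain memcmp does) and once as unsigned 32-bit values (the approach of the patch). The names compare_u32 and compare_bytes are illustrative, and uint32_t stands in for GFC_UINTEGER_4 so that the sketch compiles outside libgfortran.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Element-wise comparison of 4-byte code units, following the logic of
   memcmp_char4 in the patch.  LEN counts elements, not bytes.  */
static int
compare_u32 (const uint32_t *a, const uint32_t *b, size_t len)
{
  while (len-- > 0)
    {
      if (*a != *b)
        return *a < *b ? -1 : 1;
      a++;
      b++;
    }
  return 0;
}

/* Byte-wise comparison, which is what calling memcmp on the raw storage
   amounts to.  On a little-endian machine the least significant byte is
   compared first, so the result need not reflect the numeric order.  */
static int
compare_bytes (const uint32_t *a, const uint32_t *b, size_t len)
{
  return memcmp (a, b, len * sizeof (uint32_t));
}
```

On a little-endian host, 257 is stored as the bytes 01 01 00 00 and 64 as 40 00 00 00, so the byte-wise version sees 0x01 < 0x40 and reports 257 as the smaller value, while the element-wise version orders them correctly on any host.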
Re: [patch, libfortran] Fix PR 50192 - fix wide-char comparison
On 26.08.2011 14:40, Thomas Koenig wrote:

> OK for trunk? Which branches should this be backported to?

I forgot - also regression-tested.

Thomas
Re: [lto] Refactor streamer (1/N) (issue4809083)
On Fri, Aug 26, 2011 at 02:34:29PM +0200, Michael Matz wrote:
> Hi,
>
> On Fri, 26 Aug 2011, Richard Guenther wrote:
>
>>> I am going to be sending the renaming patch later today or tomorrow.
>>> In principle, the things I want to abstract are those that are forcing
>>> me to include lto-streamer.h from {tree,gimple,data}-streamer.*. I
>>> will know better when I merge this into the pph branch, though.
>>
>> Yeah, I think we discussed this already and agreed on that this is a
>> sensible plan.
>>
>> This patch caused http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50165
>> it seems that LTO string hashing is seriously broken now.
>
> Once regstrap passes on x86_64-linux I'm checking this in as obvious.

While you are touching it, I think we should also optimize it as in the patch below. I'm afraid no string length optimization would be able to figure out that it doesn't have to call strlen twice, because the htab_find_slot isn't pure.

2011-08-26 Jakub Jelinek ja...@redhat.com

* lto-streamer-in.c (canon_file_name): Avoid calling strlen twice, use memcpy instead of strcpy.

--- gcc/lto-streamer-in.c.jj 2011-08-26 14:39:52.0 +0200
+++ gcc/lto-streamer-in.c 2011-08-26 14:40:59.543884012 +0200
@@ -98,21 +98,20 @@ canon_file_name (const char *string)
 {
   void **slot;
   struct string_slot s_slot;
+  size_t len = strlen (string);
   s_slot.s = string;
-  s_slot.len = strlen (string);
+  s_slot.len = len;
 
   slot = htab_find_slot (file_name_hash_table, &s_slot, INSERT);
   if (*slot == NULL)
     {
-      size_t len;
       char *saved_string;
       struct string_slot *new_slot;
 
-      len = strlen (string);
       saved_string = (char *) xmalloc (len + 1);
       new_slot = XCNEW (struct string_slot);
       new_slot->len = len;
-      strcpy (saved_string, string);
+      memcpy (saved_string, string, len + 1);
       new_slot->s = saved_string;
       *slot = new_slot;
       return saved_string;

Jakub
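
The two points made in this sub-thread (the slot length must be set, and the string only needs to be measured once) can be sketched outside GCC roughly as follows. This is a simplified model, not the lto-streamer-in.c code: struct string_slot here is a hypothetical stand-in, make_slot and slot_eq are invented names, plain malloc/calloc replace xmalloc/XCNEW, and the length-first equality test is an assumption about how a length-aware hash table might compare entries.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for GCC's struct string_slot.  */
struct string_slot
{
  const char *s;
  size_t len;
};

/* Copy STRING into a freshly allocated slot, measuring its length once.
   Copying len + 1 bytes with memcpy includes the terminating NUL, so no
   second scan of the string (as strcpy would perform) is needed.
   Returns NULL on allocation failure; GCC's xmalloc aborts instead.  */
static struct string_slot *
make_slot (const char *string)
{
  assert (string != NULL);

  size_t len = strlen (string);
  char *saved_string = malloc (len + 1);
  struct string_slot *new_slot = calloc (1, sizeof *new_slot);

  if (saved_string == NULL || new_slot == NULL)
    {
      free (saved_string);
      free (new_slot);
      return NULL;
    }

  memcpy (saved_string, string, len + 1);
  new_slot->s = saved_string;
  new_slot->len = len;  /* Without this, len stays 0 after calloc.  */
  return new_slot;
}

/* Assumed equality test: lengths first, then contents.  A stored slot
   whose len was never set can then fail to match an identical probe.  */
static int
slot_eq (const struct string_slot *a, const struct string_slot *b)
{
  return a->len == b->len && memcmp (a->s, b->s, a->len) == 0;
}
```

With a comparison of this shape, a slot stored with len left at zero would not compare equal to a probe carrying the real length, which is one plausible way for lookups of already-interned names to miss.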
Re: [testsuite, i386] Fix for PR50185
On Fri, Aug 26, 2011 at 2:04 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
> According to Jakub's input, I've updated test to scan instruction, not
> pattern name.
>
> Is it ok?
>
> Thanks, K
>
> On Fri, Aug 26, 2011 at 3:45 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
>> Hi,
>>
>> Here is a fix for http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
>>
>> testsuite/ChangeLog entry:
>>
>> 2011-08-26 Kirill Yukhin kirill.yuk...@intel.com
>>
>> PR testsuite/50185
>> * gcc.target/i386/avx2-vmovmskb-2.c: Rename to ...
>> * gcc.target/i386/avx2-vpmovmskb-2.c: ... this. Update.
>>
>> Test passes. Ok for trunk?

Is this the correct ChangeLog? Looking into the patch, you are changing one test to look for the insn name, while adding avx2-vpmovmskb-2.c, which still looks for the pattern name.

Please update the ChangeLog and/or the attached patch.

Uros.
[v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h
All my testing of the __cplusplus 199711L patches had been on Solaris 8+/x86. During last weekend's bootstrap on the whole range of systems (Solaris 8 to 11, SPARC and x86), it turned out that there are possible variations of iso/math_iso.h and iso/stdlib_iso.h between Solaris 8 FCS and patches, so we cannot statically configure which overloads are present, but need autoconf checks for that.

The situation is as follows:

* Solaris 8 FCS shipped rev. 1.1 of iso/math_iso.h which only had double std::abs(double). Later, in patches 111721-04 (SPARC) and 112757-01 (x86), rev. 1.3 was shipped, which has everything that's also present in Solaris 9 and up.

* Similarly, Solaris 8 FCS has rev. 1.1 of iso/stdlib_iso.h without any overloads. Patches 109607-02 (SPARC) and 109608-02 (x86) added long std::abs(long) and ldiv_t div(long, long) in rev. 1.3.

Since bits/os_defines.h is included before configure results, configure needs to define the affected __CORRECT_ISO_CPP_MATH_H_PROTO[12] and __CORRECT_ISO_CPP_STDLIB_H_PROTO directly. The following patch does just that.

Bootstrapped without regressions on x86_64-unknown-linux-gnu and i386-pc-solaris2.11; bootstraps on i386-pc-solaris2.8 (with the old rev. 1.1 headers) and sparc-sun-solaris2.8 (with the rev. 1.3 headers) are still in progress, but I've verified that the __CORRECT_ISO_CPP_* macros are all defined correctly. Since errors in previous versions of the patch manifested themselves in build failures immediately, I'm pretty certain that there are no errors.

Ok for mainline if bootstraps pass?

Thanks.
Rainer

2011-08-25 Rainer Orth r...@cebitec.uni-bielefeld.de

* acinclude.m4 (GLIBCXX_CHECK_MATH_PROTO)
(GLIBCXX_CHECK_STDLIB_PROTO): New tests.
* configure.ac (GLIBCXX_CHECK_MATH_PROTO)
(GLIBCXX_CHECK_STDLIB_PROTO): Call them.
* configure: Regenerate.
* config.h.in: Regenerate.
* config/os/solaris/solaris2.8/os_defines.h
(__CORRECT_ISO_CPP_MATH_H_PROTO2): Don't define.
* config/os/solaris/solaris2.9: Remove.
* configure.host (solaris2.8): Merge with ... (solaris2.9, solaris2.1[0-9]): ... this. Always use os/solaris/solaris2.8. # HG changeset patch # Parent b3524f20d0077532a567b222d37ef05976af2743 Handle different versions of Solaris 8 iso/math_iso.h diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4 --- a/libstdc++-v3/acinclude.m4 +++ b/libstdc++-v3/acinclude.m4 @@ -1693,6 +1693,100 @@ AC_DEFUN([GLIBCXX_COMPUTE_STDIO_INTEGER_ ]) dnl +dnl Check whether required C++ overloads are present in math.h. +dnl + +AC_DEFUN([GLIBCXX_CHECK_MATH_PROTO], [ + + AC_LANG_SAVE + AC_LANG_CPLUSPLUS + + case $host in +*-*-solaris2.*) + # Solaris 8 FCS only had an overload for double std::abs(double) in + # iso/math_iso.h. Patches 111721-04 (SPARC) and 112757-01 (x86) + # introduced the full set also found from Solaris 9 onwards. + AC_MSG_CHECKING([for float std::abs(float) overload]) + AC_CACHE_VAL(glibcxx_cv_abs_float, [ + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [#include math.h + namespace std { + inline float abs(float __x) + { return __builtin_fabsf(__x); } + } + ])], +[glibcxx_cv_abs_float=no], +[glibcxx_cv_abs_float=yes] + )]) + + # autoheader cannot handle indented templates. + AH_VERBATIM([__CORRECT_ISO_CPP_MATH_H_PROTO1], +[/* Define if all C++ overloads are available in math.h. */ +#if __cplusplus = 199711L +#undef __CORRECT_ISO_CPP_MATH_H_PROTO1 +#endif]) + AH_VERBATIM([__CORRECT_ISO_CPP_MATH_H_PROTO2], +[/* Define if only double std::abs(double) is available in math.h. */ +#if __cplusplus = 199711L +#undef __CORRECT_ISO_CPP_MATH_H_PROTO2 +#endif]) + + if test $glibcxx_cv_abs_float = yes; then +AC_DEFINE(__CORRECT_ISO_CPP_MATH_H_PROTO1) + else +AC_DEFINE(__CORRECT_ISO_CPP_MATH_H_PROTO2) + fi + AC_MSG_RESULT($glibcxx_cv_abs_float) + ;; + esac + + AC_LANG_RESTORE +]) + +dnl +dnl Check whether required C++ overloads are present in stdlib.h. 
+dnl + +AC_DEFUN([GLIBCXX_CHECK_STDLIB_PROTO], [ + + AC_LANG_SAVE + AC_LANG_CPLUSPLUS + + case $host in +*-*-solaris2.*) + # Solaris 8 FCS lacked the overloads for long std::abs(long) and + # ldiv_t std::div(long, long) in iso/stdlib_iso.h. Patches 109607-02 + # (SPARC) and 109608-02 (x86) introduced them. + AC_MSG_CHECKING([for long std::abs(long) overload]) + AC_CACHE_VAL(glibcxx_cv_abs_long, [ + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [#include stdlib.h + namespace std { + inline long + abs(long __i) { return labs(__i); } + } +])], +[glibcxx_cv_abs_long=no], +[glibcxx_cv_abs_long=yes] + )]) + + # autoheader cannot handle indented templates. + AH_VERBATIM([__CORRECT_ISO_CPP_STDLIB_H_PROTO], +[/* Define if all C++ overloads are available in stdlib.h. */
Re: [PATCH] PR42554/49992: avoid use of '-c' flag with ranlib on darwin10 and later
On Fri, Aug 26, 2011 at 12:09:53PM +0100, Iain Sandoe wrote:
>
> On 26 Aug 2011, at 11:27, Ralf Wildenhues wrote:
>
>> * Jack Howarth wrote on Fri, Aug 12, 2011 at 01:27:21AM CEST:
>>> The following patch addresses
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42554#c15
>>> by extending the logic used in...
>>>
>>> URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157563
>>> Log:
>>> PR ada/42554
>>> * configure.ac: Only pass -c to ranlib for darwin9 and earlier.
>>> * configure: Regenerate.
>>>
>>> Okay for gcc trunk?
>>
>> OK with ...
>>
>>> 2010-08-11 Jack Howarth howa...@bromo.med.uc.edu
>>>
>>> PR 42554/49992
>>> * gcc/configure.ac: Only pass -c to ranlib for darwin9 and earlier.
>>> * gcc/configure.ac: Regenerate.
>>
>> ... typo in file name fixed.
>>
>> Thanks,
>> Ralf
>>
>>> --- gcc/configure.ac (revision 177684)
>>> +++ gcc/configure.ac (working copy)
>>> @@ -821,11 +821,8 @@ gcc_AC_PROG_LN_S
>>>  ACX_PROG_LN($LN_S)
>>>  AC_PROG_RANLIB
>>>  case ${host} in
>>> -*-*-darwin*)
>>> -  # By default, the Darwin ranlib will not treat common symbols as
>>> -  # definitions when building the archive table of contents. Other
>>> -  # ranlibs do that; pass an option to the Darwin ranlib that makes
>>> -  # it behave similarly.
>>> +*-*-darwin[[3-9]]*)
>>> +  # ranlib before Darwin10 requires the -c flag to look at common symbols.
>>>    ranlib_flags=-c
>>>    ;;
>>>  *)
>
> I am still investigating this -- getting Ada bootstrapped on ppc has
> taken some time...

Iain,
Why don't you take the path of least resistance and just post your proposed patch to unconditionally drop -c from ranflags with a cc to the AdaCore developers? They should already be configured to test on darwin.
Jack

> not objecting to the patch - but I think we can go further ...
>
> as commented in the PR, I would say that we can likely remove the
> special casing of ranlib completely for all Darwin (some more testing
> on ppc/ada still under way).
>
> So far OK on ppc/darwin8 & x86_64/darwin10 (incl. ada on *86*)
>
> As things stand, darwin 8 will not bootstrap GCC 4.6 or trunk with its
> native toolset; it requires the use of odcctools or similar to make
> use of newer versions of ld. (thus, support of ancient darwin is
> conditional on use of a toolset from at least darwin 8 era).
>
> cheers
> Iain
Re: [PATCH, ARM] Generate conditional compares in Thumb2 state
On 19 August 2011 11:06, Ramana Radhakrishnan ramana.radhakrish...@linaro.org wrote: Regression test against cortex-M0/M3/M4 profile with -mthumb option doesn't show any new failures. Please test on ARM state as well and make sure there are no regressions before committing. Jiangning told me privately that the test-results for v7-a were fine for cross-testing for arm-eabi with C and C++. And this is what I committed cheers Ramana 2011-08-26 Jiangning Liu jiangning@arm.com * config/arm/arm.md (*ior_scc_scc): Enable for Thumb2 as well. (*ior_scc_scc_cmp): Likewise (*and_scc_scc): Likewise. (*and_scc_scc_cmp): Likewise. (*and_scc_scc_nodom): Likewise. (*cmp_ite0, *cmp_ite1, *cmp_and, *cmp_ior): Handle Thumb2. 2011-08-26 Jiangning Liu jiangning@arm.com * gcc.target/arm/thumb2-cond-cmp-1.c: New. * gcc.target/arm/thumb2-cond-cmp-2.c: Likewise. * gcc.target/arm/thumb2-cond-cmp-3.c: Likewise. * gcc.target/arm/thumb2-cond-cmp-4.c: Likewise. Ok if no regressions. Ramana Thanks, -Jiangning Index: gcc/config/arm/arm.md === --- gcc/config/arm/arm.md (revision 178097) +++ gcc/config/arm/arm.md (working copy) @@ -49,6 +49,15 @@ (DOM_CC_X_OR_Y 2) ] ) +;; conditional compare combination +(define_constants + [(CMP_CMP 0) + (CMN_CMP 1) + (CMP_CMN 2) + (CMN_CMN 3) + (NUM_OF_COND_CMP 4) + ] +) ;; UNSPEC Usage: ;; Note: sin and cos are no-longer used. @@ -8980,40 +8989,85 @@ (set_attr length 8,12)] ) -;; ??? Is it worth using these conditional patterns in Thumb-2 mode? 
(define_insn *cmp_ite0 [(set (match_operand 6 dominant_cc_register ) (compare (if_then_else:SI (match_operator 4 arm_comparison_operator - [(match_operand:SI 0 s_register_operand r,r,r,r) - (match_operand:SI 1 arm_add_operand rI,L,rI,L)]) + [(match_operand:SI 0 s_register_operand + l,l,l,r,r,r,r,r,r) + (match_operand:SI 1 arm_add_operand + lPy,lPy,lPy,rI,L,rI,L,rI,L)]) (match_operator:SI 5 arm_comparison_operator - [(match_operand:SI 2 s_register_operand r,r,r,r) - (match_operand:SI 3 arm_add_operand rI,rI,L,L)]) + [(match_operand:SI 2 s_register_operand + l,r,r,l,l,r,r,r,r) + (match_operand:SI 3 arm_add_operand + lPy,rI,L,lPy,lPy,rI,rI,L,L)]) (const_int 0)) (const_int 0)))] - TARGET_ARM + TARGET_32BIT * { -static const char * const opcodes[4][2] = +static const char * const cmp1[NUM_OF_COND_CMP][2] = { - {\cmp\\t%2, %3\;cmp%d5\\t%0, %1\, - \cmp\\t%0, %1\;cmp%d4\\t%2, %3\}, - {\cmp\\t%2, %3\;cmn%d5\\t%0, #%n1\, - \cmn\\t%0, #%n1\;cmp%d4\\t%2, %3\}, - {\cmn\\t%2, #%n3\;cmp%d5\\t%0, %1\, - \cmp\\t%0, %1\;cmn%d4\\t%2, #%n3\}, - {\cmn\\t%2, #%n3\;cmn%d5\\t%0, #%n1\, - \cmn\\t%0, #%n1\;cmn%d4\\t%2, #%n3\} + {\cmp%d5\\t%0, %1\, + \cmp%d4\\t%2, %3\}, + {\cmn%d5\\t%0, #%n1\, + \cmp%d4\\t%2, %3\}, + {\cmp%d5\\t%0, %1\, + \cmn%d4\\t%2, #%n3\}, + {\cmn%d5\\t%0, #%n1\, + \cmn%d4\\t%2, #%n3\} }; +static const char * const cmp2[NUM_OF_COND_CMP][2] = +{ + {\cmp\\t%2, %3\, + \cmp\\t%0, %1\}, + {\cmp\\t%2, %3\, + \cmn\\t%0, #%n1\}, + {\cmn\\t%2, #%n3\, + \cmp\\t%0, %1\}, + {\cmn\\t%2, #%n3\, + \cmn\\t%0, #%n1\} +}; +static const char * const ite[2] = +{ + \it\\t%d5\, + \it\\t%d4\ +}; +static const int cmp_idx[9] = {CMP_CMP, CMP_CMP, CMP_CMN, + CMP_CMP, CMN_CMP, CMP_CMP, + CMN_CMP, CMP_CMN, CMN_CMN}; int swap = comparison_dominates_p (GET_CODE (operands[5]), GET_CODE (operands[4])); -return opcodes[which_alternative][swap]; +output_asm_insn (cmp2[cmp_idx[which_alternative]][swap], operands); +if (TARGET_THUMB2) { + output_asm_insn (ite[swap], operands); +} +output_asm_insn 
(cmp1[cmp_idx[which_alternative]][swap], operands); +return \\; } [(set_attr conds set) - (set_attr length 8)] + (set_attr arch t2,t2,t2,t2,t2,any,any,any,any) + (set_attr_alternative length + [(const_int 6) + (const_int 8) + (const_int 8) + (const_int 8) + (const_int 8) + (if_then_else (eq_attr is_thumb no) + (const_int 8) + (const_int 10)) + (if_then_else (eq_attr is_thumb no) + (const_int 8) + (const_int 10)) + (if_then_else (eq_attr is_thumb no) + (const_int 8) + (const_int 10)) + (if_then_else (eq_attr is_thumb no) + (const_int 8) + (const_int 10))])] ) (define_insn *cmp_ite1 @@ -9021,35 +9075,81 @@ (compare (if_then_else:SI
[PATCH] Handle MEM_REF in decode_addr_const
Another missed piece, exposed by less MEM_REF -> ARRAY_REF folding. Interestingly only for Ada testcases.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2011-08-26 Richard Guenther rguent...@suse.de

* varasm.c (decode_addr_const): Handle MEM_REF[X, OFF].

Index: gcc/varasm.c
===================================================================
*** gcc/varasm.c (revision 178096)
--- gcc/varasm.c (working copy)
*************** decode_addr_const (tree exp, struct addr
*** 2592,2597 ****
--- 2592,2603 ----
                      * tree_low_cst (TREE_OPERAND (target, 1), 0));
            target = TREE_OPERAND (target, 0);
          }
+       else if (TREE_CODE (target) == MEM_REF
+                && TREE_CODE (TREE_OPERAND (target, 0)) == ADDR_EXPR)
+         {
+           offset += mem_ref_offset (target).low;
+           target = TREE_OPERAND (TREE_OPERAND (target, 0), 0);
+         }
        else if (TREE_CODE (target) == INDIRECT_REF
                 && TREE_CODE (TREE_OPERAND (target, 0)) == NOP_EXPR
                 && TREE_CODE (TREE_OPERAND (TREE_OPERAND (target, 0), 0))
Re: [v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h
Hi,

> Ok for mainline if bootstraps pass?

Not a comment strictly about this patch, but why do we have things like

#if __cplusplus >= 199711L

anywhere? For sure the library is not supposed to be used together with old C++ front-ends.

Paolo.
Re: Rename across basic block boundaries
Rather than using global variables and then copying them into a bb structure, would it be possible to write directly into the bb structure? The answer's probably no, just asking. :-)

Bernd Schmidt ber...@codesourcery.com writes:
> * regrename.c (struct du_head): Make nregs signed.
> (scan_rtx_reg, scan_rtx_address, dump_def_use_chain): Remove
> declarations.

This bit was split out.

> (mark_conflict, create_new_chain): Move upwards in the file.

Same here. Should mention the change to create_new_chain's interface though.

> -   2. For each chain, the set of possible renaming registers is computed.
> +   2. We try combine the local chains across basic block boundaries by
> +      comparing chains that were open at the start or end of a block to
> +      those in successor/predecessor blocks.

try to combine

> +/* Dump all def/use chains, starting at id FROM.  */
>
>  static void
> -dump_def_use_chain (struct du_head *head)
> +dump_def_use_chain (int from)
>  {
> -  while (head)
> +  du_head_p head;
> +  int i;
> +  FOR_EACH_VEC_ELT (du_head_p, id_to_chain, i, head)
>      {
>        struct du_chain *this_du = head->first;
> +      if (i < from)
> +        continue;
>        fprintf (dump_file, "Register %s (%d):",
>                 reg_names[head->regno], head->nregs);
>        while (this_du)

I know it's only a dumping function, but maybe this'd be a good excuse to add:

#define FOR_EACH_VEC_ELT_FROM(T, V, I, P, FROM) \
  for (I = (FROM); VEC_iterate (T, (V), (I), (P)); ++(I))

> +/* A structure recording information about each basic block.  It is saved
> +   and restored around basic block boundaries.  */
> +struct bb_rename_info

Probably worth saying here or elsewhere that the bb->aux field points to this information and that (more importantly) the bb->aux is null for blocks that could not be optimised, including the exit block.

> +/* Record in RI that the block corresponding to it has an incoming
> +   live value, described by CHAIN.  */
> +static void
> +set_incoming_from_chain (struct bb_rename_info *ri, du_head_p chain)
> +{
> +  int min_reg, max_reg, i;
> +  int incoming_nregs = ri->incoming[chain->regno].nregs;
> +  int nregs;
> +
> +  /* If we've recorded the same information before, everything is fine.  */
> +  if (incoming_nregs == chain->nregs)
> +    {
> +      if (dump_file)
> +        fprintf (dump_file, "reg %d/%d already recorded\n",
> +                 chain->regno, chain->nregs);
> +      return;
> +    }
> +
> +  /* If we have no information for any of the involved registers, update
> +     the incoming array.  */
> +  nregs = chain->nregs;
> +  while (nregs-- > 0)
> +    if (ri->incoming[chain->regno + nregs].nregs != 0
> +        || ri->incoming[chain->regno + nregs].unusable)
> +      break;
> +  if (nregs < 0)
> +    {
> +      nregs = chain->nregs;
> +      ri->incoming[chain->regno].nregs = nregs;
> +      while (nregs-- > 1)
> +        ri->incoming[chain->regno + nregs].nregs = -nregs;
> +      if (dump_file)
> +        fprintf (dump_file, "recorded reg %d/%d\n",
> +                 chain->regno, chain->nregs);
> +      return;
> +    }
> +
> +  /* There must be some kind of conflict.  Set the unusable for all
> +     overlapping registers.  */
> +  min_reg = chain->regno;
> +  if (incoming_nregs < 0)
> +    min_reg += incoming_nregs;
> +  max_reg = chain->regno + chain->nregs;
> +  for (i = min_reg; i < max_reg; i++)
> +    ri->incoming[i].unusable = true;

In the incoming_nregs < 0 case, we only need to set ri->incoming[chain->regno + incoming_nregs] itself, right, not the other registers between that and ri->incoming[chain->regno]? If so, I think it'd be clearer to have:

  /* There must be some kind of conflict.  Prevent both the old and
     new ranges from being used.  */
  if (incoming_nregs < 0)
    ri->incoming[chain->regno + incoming_nregs].unusable = true;
  for (i = 0; i < chain->nregs; i++)
    ri->incoming[chain->regno + i].unusable = true;

When I first looked at the code, I was wondering why we changed every register in (chain->regno + incoming_nregs, chain->regno), but none in [chain->regno + chain->nregs, OLD_END). Seems like we should do neither (as in the above suggestion) or both.

> +  /* Process all basic blocks by using a worklist, adding unvisited successor
> +     blocks whenever we reach the end of one basic blocks.  This ensures that
> +     whenever possible, we only process a block after at least one of its
> +     predecessors, which provides a seeding effect to make the logic in
> +     set_incoming_from_chain and init_rename_info useful.  */

Wouldn't a reverse post-order (inverted_post_order_compute) allow even more pre-opening (as well as being less code)?

Looked good to me otherwise.

Richard
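
The encoding that the review discusses for the incoming array (a positive nregs on the first register of a chain, a negated offset on the following ones) may be easier to follow in a small standalone model. The sketch below is not regrename.c code: struct incoming_reg, NUM_REGS, record_chain and mark_unusable_on_conflict are hypothetical names, and the conflict handling follows the simplified form suggested in the review rather than the loop in the posted patch.

```c
#include <stdbool.h>

#define NUM_REGS 16  /* Hypothetical register count for this sketch.  */

/* Simplified stand-in for one element of ri->incoming.  nregs > 0 marks
   the first register of a recorded chain, nregs < 0 is the negated
   distance back to that first register, and 0 means nothing recorded.  */
struct incoming_reg
{
  int nregs;
  bool unusable;
};

/* Record a chain of NREGS registers starting at REGNO, mirroring what the
   patch does when none of the involved registers has prior information.  */
static void
record_chain (struct incoming_reg *incoming, int regno, int nregs)
{
  incoming[regno].nregs = nregs;
  while (nregs-- > 1)
    incoming[regno + nregs].nregs = -nregs;
}

/* Conflict handling in the form suggested in the review: if REGNO lies
   inside an older chain, mark that chain's first register unusable, then
   mark every register of the new chain unusable.  */
static void
mark_unusable_on_conflict (struct incoming_reg *incoming, int regno,
                           int nregs)
{
  int incoming_nregs = incoming[regno].nregs;
  int i;

  if (incoming_nregs < 0)
    incoming[regno + incoming_nregs].unusable = true;
  for (i = 0; i < nregs; i++)
    incoming[regno + i].unusable = true;
}
```

For example, recording a three-register chain at register 4 stores 3, -1 and -2 in entries 4, 5 and 6. A later conflicting two-register chain at register 6 then marks entries 4, 6 and 7 unusable and leaves entry 5 untouched, which is the "neither" behaviour described in the review.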
Re: [v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h
Hi Paolo,

>> Ok for mainline if bootstraps pass?
>
> Not a comment strictly about this patch, but why we have things like
>
> #if __cplusplus >= 199711L
>
> anywhere? For sure the library is not supposed to be used together
> with old C++ front-ends.

I thought about this myself, but at least the overloads are only present with __cplusplus >= 199711L. I think it's best to match this to avoid strange problems if a user plays strange games with __cplusplus.

Rainer

--
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [testsuite, i386] Fix for PR50185
On Fri, Aug 26, 2011 at 5:04 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
> According to Jakub's input, I've updated test to scan instruction, not
> pattern name.
>
> Is it ok?
>
> Thanks, K
>
> On Fri, Aug 26, 2011 at 3:45 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
>> Hi,
>>
>> Here is a fix for http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182
>>
>> testsuite/ChangeLog entry:
>>
>> 2011-08-26 Kirill Yukhin kirill.yuk...@intel.com
>>
>> PR testsuite/50185
>> * gcc.target/i386/avx2-vmovmskb-2.c: Rename to ...
>> * gcc.target/i386/avx2-vpmovmskb-2.c: ... this. Update.
>>
>> Test passes. Ok for trunk?
>>
>> Thanks, K

Please check ALL AVX2 tests to see if they have similar problems.

--
H.J.
Re: [v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h
On 8/26/11 2:59 PM, Rainer Orth wrote:
> Hi Paolo,
>
>>> Ok for mainline if bootstraps pass?
>>
>> Not a comment strictly about this patch, but why we have things like
>>
>> #if __cplusplus >= 199711L
>>
>> anywhere? For sure the library is not supposed to be used together
>> with old C++ front-ends.
>
> I thought about this myself, but at least the overloads are only present
> with __cplusplus >= 199711L.

I don't understand: isn't __cplusplus now *always* >= 199711L? Or do you want to protect against the user undefining __cplusplus and then defining it to a different value?!? I don't have the Standard at hand (in theory I'm on vacation ;), maybe Marc can help, but I don't think it's legal, is it?

Paolo.
Re: [PATCH, i386, testsuite] FMA intrinsics
On Fri, Aug 26, 2011 at 1:41 AM, Ilya Tocar tocarip.in...@gmail.com wrote:
> So if this is ok can someone please commit it?
>
> 2011/8/25 Ilya Tocar tocarip.in...@gmail.com:
>> Fixed.
>> Changelog:
>>
>> 2011-08-25 Ilya Tocar ilya.to...@intel.com
>>
>> * config/i386/fmaintrin.h: New.
>> * config.gcc: Add fmaintrin.h.
>> * config/i386/i386.c
>> (enum ix86_builtins) IX86_BUILTIN_VFMADDSS3: New.
>> IX86_BUILTIN_VFMADDSD3: Likewise.
>> * config/i386/sse.md (fmai_vmfmadd_<mode>): New.
>> (*fmai_fmadd_<mode>): Likewise.
>> (*fmai_fmsub_<mode>): Likewise.
>> (*fmai_fnmadd_<mode>): Likewise.
>> (*fmai_fnmsub_<mode>): Likewise.
>> * config/i386/x86intrin.h: Add fmaintrin.h.

Please include fmaintrin.h in immintrin.h, not x86intrin.h, since immintrin.h should include all Intel intrinsics.

--
H.J.
Re: [PATCH] [JAVA] patch for Java on RTEMS
* Jie Liu wrote on Mon, Aug 15, 2011 at 04:07:36PM CEST: Looks OK, but there is no ChangeLog. Do you have copyright assignment? Have added ChangeLog to the patch, please see the attachment. And I think I have copyright assignment, because I have Free Software Foundation paperwork, as ASSIGNMENT - GNU GCC ... JIE RT688742 The build-system specific parts of the patch are OK, provided that they have been sufficiently tested. When committing top-level changes, please make sure they are synced to the src repository; if you don't have write access to src, please ask someone who has to do that for you. I think you still need approval for the boehm-gc related changes. Please also try to send patches with some text MIME type. Thanks, Ralf --- boehm-gc/ChangeLog(revision 172224) +++ boehm-gc/ChangeLog(working copy) @@ -1,3 +1,22 @@ +2011-08-15 Jie Liu lj8...@gmail.com + * configure.ac: Add configure for RTEMS. + * configure: Add configure for RTEMS. + * include/gc_config.h.in: Add GC_RTEMS_PTHREADS for RTEMS. + * mach_dep.c (GC_with_callee_saves_pushed): Use setjmp for + RTEMS. + * include/gc_config_macros.h: Define GC_PTHREADS for rtems. + * include/private/gcconfig.h: Add configure for RTEMS/i386; + Use calloc for RTEMS to GET_MEM. + * pthread_stop_world.c (GC_stop_init): Add judge SA_RESTART + for operating system; Use sigprocmask unblock the signal + for RTEMS. + * pthread_support.c: Define USE_PTHREAD_SPECIFIC for RTEMS; + Do not include sys/mman.h for RTEMS; Add default GC_nprocs + for RTEMS. + * gc_dlopen.c: Do not include dlfcn.h for RTEMS. + * os_dep.c: Do not use auxiliary routines for obtaining + memory from RTEMS. --- ChangeLog (revision 172224) +++ ChangeLog (working copy) @@ -1,3 +1,8 @@ +2011-08-15 Jie Liu lj8...@gmail.com + + * configure.ac (*-*-rtems*): Remove ${libgcj} in nonconfigdirs. + * configure: Regenerate. 
--- libjava/ChangeLog (revision 172224) +++ libjava/ChangeLog (working copy) @@ -1,3 +1,13 @@ +2011-08-15 Jie Liu lj8...@gmail.com + + * configure.ac (THREADS): Add configuration for RTEMS. + * configure.host (host): Add configuration for RTEMS. + * include/config.h.in: Add RTEMS_PTHREADS for RTEMS. + * configure: Add configure for RTEMS. + * classpath/native/fdlibm/mprec.c: Remove _mprec_log10 for RTEMS. + * posix-threads.cc: Use SIGHUP for INTR on RTEMS. + * java/lang/natClass.cc: Undef HAVE_TLS for RTEMS.
Re: PATCH: PR middle-end/49721: convert_memory_address_addr_space may generate invalid new insns
H.J. Lu hjl.to...@gmail.com writes: On Sun, Aug 14, 2011 at 9:22 AM, H.J. Lu hjl.to...@gmail.com wrote: Hi, This patch is needed for x32 and only affects x32. Any comments/objections to apply this to finish x32 support? Thanks. H.J. On Thu, Aug 11, 2011 at 6:25 AM, H.J. Lu hjl.to...@gmail.com wrote: Hi, This is the last patch needed for x32 support. convert_memory_address_addr_space is called to convert a memory address without overflow/underflow. It should be safe to transform (zero_extend:DI (plus:SI (FOO:SI) (const_int Y))) to (plus:DI (zero_extend:DI (FOO:SI)) (const_int Y)) GCC only works this way. Any comments? Thanks. H.J. On Sun, Aug 7, 2011 at 1:08 PM, H.J. Lu hongjiu...@intel.com wrote: Hi, We transform ptr_extend:DI (plus:SI (FOO:SI) (const_int Y))) to (plus:DI (ptr_extend:DI (FOO:SI)) (const_int Y)) since this is how Pmode != ptr_mode is supported even if the resulting address may overflow/underflow. It is also true for x32 which has zero_extend instead of ptr_extend. I have tried different approaches to avoid transforming (zero_extend:DI (plus:SI (FOO:SI) (const_int Y))) to (plus:DI (zero_extend:DI (FOO:SI)) (const_int Y)) without success. This patch relaxes the condition to check POINTERS_EXTEND_UNSIGNED != 0 instead if POINTERS_EXTEND_UNSIGNED 0 to cover both ptr_extend and zero_extend. We can investigate a better approach for ptr_extend and zero_extend later. For now, I believe it is the saftest way to support ptr_extend and zero_extend. Any comments? Thanks. H.J. I am checking in this patch, which only affects x32 and nothing else. This one character change, from POINTERS_EXTEND_UNSIGNED 0 to POINTERS_EXTEND_UNSIGNED != 0 creates a working x32 GCC. This isn't perfect. I have tried many different approaches without any success. I will revisit it if we run into any problems with x32 applications. Sorry, I know it's frustrating when things don't get reviewed, but I strongly object to a nonobvious patch like this being applied without approval. 
(And for the record, I can't approve it. :-)) Richard
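The overflow concern in this thread can be illustrated outside GCC with a minimal C sketch. The function names below are hypothetical and only model the two RTL forms with fixed-width integers; none of this is code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Models (zero_extend:DI (plus:SI (FOO:SI) (const_int Y))):
   the addition wraps in 32 bits, then the result is zero-extended.  */
uint64_t
extend_after_add (uint32_t foo, int32_t y)
{
  uint32_t sum = foo + (uint32_t) y;   /* wraps modulo 2^32 */
  return (uint64_t) sum;
}

/* Models (plus:DI (zero_extend:DI (FOO:SI)) (const_int Y)):
   FOO is zero-extended first, then Y is added in 64 bits.  */
uint64_t
add_after_extend (uint32_t foo, int32_t y)
{
  return (uint64_t) foo + (uint64_t) (int64_t) y;   /* wraps modulo 2^64 */
}

/* Returns nonzero when both forms compute the same 64-bit address.  */
int
forms_agree (uint32_t foo, int32_t y)
{
  return extend_after_add (foo, y) == add_after_extend (foo, y);
}

/* Self-check: the forms agree only when the 32-bit add does not wrap.  */
void
check_forms (void)
{
  assert (forms_agree (0x1000u, 16));          /* no wrap: same address */
  assert (!forms_agree (0xFFFFFFF0u, 0x20));   /* wraps: 0x10 vs 0x100000010 */
}
```

The transformation is therefore only equivalent when the 32-bit address arithmetic is known not to wrap, which is the assumption the thread is arguing about.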
Re: [PATCH] [JAVA] patch for Java on RTEMS
On 08/14/2011 03:03 PM, Jie Liu wrote: Hi, I have added the boehm-gc patch and the configure patch for gcc to the attached patch, so we can apply this patch and then compile gcj for RTEMS. Best Regards, Jie
--- boehm-gc/include/private/gcconfig.h (revision 172224)
+++ boehm-gc/include/private/gcconfig.h (working copy)
..
@@ -1297,6 +1302,19 @@
 # define STACKBOTTOM ((ptr_t)0xc000)
 # define DATAEND /* not needed */
 # endif
+# ifdef RTEMS
+# define OS_TYPE "RTEMS"
+# include <sys/unistd.h>
Why sys/unistd.h? sys/unistd.h is not supposed to be accessed directly. This likely should be a plain simple #include <unistd.h> Ralf
Re: [testsuite, i386] Fix for PR50185
Hi guys, Thanks for your objections. HJ, I scanned all AVX2 tests. So, every test has at least a tab which distinguishes it from the filename: $ pwd /export/users/kyukhin/ws/gcc/gcc/testsuite/gcc.target/i386 $ grep scan-assembler avx2-* |grep -v \t |wc -l 0 Uros, you're right. The patch contains a useless file. An updated one is attached. Thanks, K On Fri, Aug 26, 2011 at 5:04 PM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 5:04 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: According to Jakub's input, I've updated the test to scan for the instruction, not the pattern name. Is it ok? Thanks, K On Fri, Aug 26, 2011 at 3:45 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hi, Here is a fix for http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182 testsuite/ChangeLog entry: 2011-08-26 Kirill Yukhin kirill.yuk...@intel.com PR testsuite/50185 * gcc.target/i386/avx2-vmovmskb-2.c: Rename to ... * gcc.target/i386/avx2-vpmovmskb-2.c: ... this. Update. Test passes. Ok for trunk? Thanks, K Please check ALL AVX2 tests to see if they have similar problems. -- H.J. pr50185-3.gcc.patch Description: Binary data
Re: [testsuite, i386] Fix for PR50185
On Fri, Aug 26, 2011 at 6:45 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hi guys, Thanks for your objections. HJ, I scanned all AVX2 tests. So, every test has at least a tab which distinguishes it from the filename: $ pwd /export/users/kyukhin/ws/gcc/gcc/testsuite/gcc.target/i386 $ grep scan-assembler avx2-* |grep -v \t |wc -l 0 Uros, you're right. The patch contains a useless file. An updated one is attached. Thanks, K On Fri, Aug 26, 2011 at 5:04 PM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 5:04 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: According to Jakub's input, I've updated the test to scan for the instruction, not the pattern name. Is it ok? Thanks, K On Fri, Aug 26, 2011 at 3:45 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hi, Here is a fix for http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182 testsuite/ChangeLog entry: 2011-08-26 Kirill Yukhin kirill.yuk...@intel.com PR testsuite/50185 * gcc.target/i386/avx2-vmovmskb-2.c: Rename to ... * gcc.target/i386/avx2-vpmovmskb-2.c: ... this. Update. Test passes. Ok for trunk? Thanks, K Please check ALL AVX2 tests to see if they have similar problems. Thanks. -- H.J.
Passes uses rather than defs to df_set_dead_notes_for_mw
As described here: http://gcc.gnu.org/ml/gcc/2011-08/msg00294.html df is currently failing to create REG_DEAD notes for the last use of a multi-reg hard register. This appears to be a typo: df_set_dead_notes_for_mw is supposed to handle uses, and the comment above it says so, but df_note_bb_compute is passing defs instead. Bootstrapped & regression-tested on x86_64-linux-gnu. OK to install? Richard
gcc/
	* df-problems.c (df_note_bb_compute): Pass uses rather than defs to df_set_dead_notes_for_mw.
Index: gcc/df-problems.c
===================================================================
--- gcc/df-problems.c 2011-08-16 16:27:24.641037124 +0100
+++ gcc/df-problems.c 2011-08-26 14:48:48.521897439 +0100
@@ -3376,7 +3376,7 @@ df_note_bb_compute (unsigned int bb_inde
       while (*mws_rec)
         {
           struct df_mw_hardreg *mws = *mws_rec;
-          if ((DF_MWS_REG_DEF_P (mws))
+          if (DF_MWS_REG_USE_P (mws)
               && !df_ignore_stack_reg (mws->start_regno))
             {
               bool really_add_notes = debug_insn != 0;
Re: [v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h
Paolo, Ok for mainline if bootstraps pass? Not a comment strictly about this patch, but why do we have things like #if __cplusplus >= 199711L anywhere? For sure the library is not supposed to be used together with old C++ front-ends. I thought about this myself, but at least the overloads are only present with __cplusplus >= 199711L. I don't understand: isn't __cplusplus now *always* >= 199711L? Or you want to protect vs the user undefining __cplusplus and then defining it to a different value?!? Exactly: just g++ -D__cplusplus=1 or something. I don't have the Standard at hand (in theory I'm in vacation ;), maybe Marc can help, but I don't think it's legal, is it? No idea. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH, middle-end]: Fix PR50083: All 32-bit fortran tests fail on 32-bit Solaris
Uros, I will wait for the confirmation from Rainer before committing the patch. an i386-pc-solaris2.9 bootstrap just finished, and all the failures are gone. Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: New automaton_option collapse-ndfa
On 08/25/2011 06:21 PM, Bernd Schmidt wrote: On 07/18/11 18:47, Vladimir Makarov wrote: But I guess comb-vector is popular for a reason. We could tolerate slow compression time because it is done once but worse compression and slower access would have a really bad impact on the compiler time. With some fixes that I need to make to the C6X machine description, comb vector generation time is no longer tolerable. Ok to apply the following patch? (Bootstrapped and tested on i686-linux). Ok. Thanks, Bernd.
Re: Passes uses rather than defs to df_set_dead_notes_for_mw
this looks right to me. ok for commit. On 08/26/2011 09:54 AM, Richard Sandiford wrote: As described here: http://gcc.gnu.org/ml/gcc/2011-08/msg00294.html df is currently failing to create REG_DEAD notes for the last use of a multi-reg hard register. This appears to be a typo: df_set_dead_notes_for_mw is supposed to handle uses, and the comment above it says so, but df_note_bb_compute is passing defs instead. Bootstrapped & regression-tested on x86_64-linux-gnu. OK to install? Richard
gcc/
	* df-problems.c (df_note_bb_compute): Pass uses rather than defs to df_set_dead_notes_for_mw.
Index: gcc/df-problems.c
===================================================================
--- gcc/df-problems.c 2011-08-16 16:27:24.641037124 +0100
+++ gcc/df-problems.c 2011-08-26 14:48:48.521897439 +0100
@@ -3376,7 +3376,7 @@ df_note_bb_compute (unsigned int bb_inde
       while (*mws_rec)
         {
           struct df_mw_hardreg *mws = *mws_rec;
-          if ((DF_MWS_REG_DEF_P (mws))
+          if (DF_MWS_REG_USE_P (mws)
               && !df_ignore_stack_reg (mws->start_regno))
             {
               bool really_add_notes = debug_insn != 0;
Re: [PATCH] Add infrastructure to merge standard builtin enums with backend builtins
On Fri, Aug 26, 2011 at 10:19:24AM +0200, Richard Guenther wrote: On Thu, Aug 25, 2011 at 10:35 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote: On Wed, Aug 24, 2011 at 11:06:55AM +0200, Richard Guenther wrote: This basically would make DECL_BUILT_IN_CLASS no longer necessary if all targets where converted, right? (We don't currently have any BUILT_IN_FRONTEND builtins). That would sound appealing if this patch weren't a partial transition ;) Or we could reduce it to 1 bit if we aren't going to change all of the backends. Now for the possible downsides. How can we reliably distinguish middle-end from target builtins for purpose of lazy initialization? Doesn't this complicate the idea of pluggable targets, thus something like a hybrid ppc / spu compiler? In this light merging middle-end and target builtin enums and arrays sounds like a step backward. If we are willing to pay the storage costs, we could have 1 or 2 bytes for builtin owner, and 2 bytes for builtin index, and then reserve 0 for standard builtins and 1 for machine dependent builtins. However, then you still have the potential problem that sooner or later somebody else will omit the checks. I don't think that the issue you only can index BUILT_IN_NORMAL builtins in built_in_decls is an issue and worth thinking about at all. It's simply bugs. I've probably spent about 2-3 weeks total tracking down those bugs in the past, because they are hard to pin down, but if we don't want to merge the two numbers it isn't a deal breaker to me. It was more while I'm playing in the builtin space, fix the problem. We could reserve a fixed range for plugin builtins if you think that is desirable. Oh, plugin builtins - I didn't even think about the possibility of having those ;) In the end I think we should stick with BUILT_IN_CLASS and maybe add BUILT_IN_PLUGIN then ;) I think if we do this, we should re-use the front end builtin class, and add methods that front ends can add their builtins to the main list. 
Otherwise we need to grow the class by 1 bit. What I _do_ like is having common machinery for defining builtins. Though instead of continuing the .def file way with all the current warts of ways of adding attributes, etc. to builtins I would have preferred a genbuiltins.c program that can parse standard C declarations and generate whatever is necessary to set up the builtin decls. Thus, instead of DEF_GCC_BUILTIN (BUILT_IN_CLZ, "clz", BT_FN_INT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST) have simply int __builtin_clz (unsigned int) __attribute__((const,nothrow,leaf)); in a header file which genbuiltins.c would parse. My first idea when discussing this was a -fgenbuiltins flag to the C frontend (because that already can do all the parsing ...), but Micha suggested a parser that can deal with the above is easy enough to re-implement. Yes, that is certainly do-able. My main intention is to see what kind of infrastructure people wanted before changing all of the ppc builtins. Sure. I agree that all the duplicated code we have in backends for a way to create target builtins, defining enums (or not) for them and having a way to reference them for targetm.builtin_decl (or not) is bad. But unifying those, or providing common infrastructure for them should be orthogonal to the issue whether we want to merge the builtin classes or their storage in some way (I think we don't). It would of course be nice if the infrastructure to create target builtins were generic enough to eventually handle builtin creation in the middle-end (and the frontends) as well. Hm, I guess this pushes back a bit on your patch. Sorry for that. If you're not excited to try the above idea, can you split out the pieces that do the .def file thing for rs6000, keeping the separation of md and middle-end builtin arrays and enums?
I have several goals for the 4.7 time frame: 1) Make target attribute and pragma enable appropriate machine dependent builtins; That's now something completely new ;) Why do we need builtins for this? I ran out of time when I added target pragma support in 4.6 to enable the builtins for target functions. We don't need new builtins, but the ppc backend needs to enable the builtins that exist when the target is selected, which the x86 already does. In the end, I want to be able to do: void v4sf_add (float *, float *, float *, size_t) __attribute__ ((__ifunc__ (resolve_v4sf_add))); static void v4sf_power7_add (float *, float *, float *, size_t) __attribute__ ((__target__ (cpu=power7))); static void v4sf_altivec_add (float *, float *, float *, size_t) __attribute__ ((__target__ (altivec))); static void v4sf_generic_add (float *, float *, float *, size_t); static void *resolve_v4sf_add (void); static void
Re: [PATCH] Fix -Wunused-but-set-* in C with stmt expression and array in it (PR c/50179)
On Fri, 26 Aug 2011, Jakub Jelinek wrote: Hi! As the following testcase shows, if the last expression in statement expression is array, mark_exp_read wasn't called on it. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.6? OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [testsuite, i386] Fix for PR50185
Hi! On Fri, Aug 26, 2011 at 06:04:20AM -0700, H.J. Lu wrote: Please check ALL AVX2 tests to see if they have similar problems. Checking all current i386 tests revealed another problematic testcase: grep scan-assembler' [a-z0-9]*' testsuite/gcc.target/i386/*.c | grep '\(.*\).*\1' (minmax-*.c only match the path, which will show up only with -g and I bet many other scan-assembler tests would fail with RUNTESTFLAGS=--target_board=unix/-g (matching various stuff in the debug info)).
2011-08-26 Jakub Jelinek ja...@redhat.com
	* gcc.target/i386/cmpxchg16b-1.c: Match also space after the instruction.
--- gcc/testsuite/gcc.target/i386/cmpxchg16b-1.c.jj 2011-07-11 10:39:29.0 +0200
+++ gcc/testsuite/gcc.target/i386/cmpxchg16b-1.c 2011-08-26 16:20:46.0 +0200
@@ -10,4 +10,4 @@ void test(TItype x_128)
   m_128 = __sync_val_compare_and_swap (m_128, x_128, m_128);
 }

-/* { dg-final { scan-assembler "cmpxchg16b" } } */
+/* { dg-final { scan-assembler "cmpxchg16b\[ \\t]" } } */
Jakub
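Why the trailing whitespace class matters can be shown with a small C sketch. It uses POSIX regcomp/regexec as a stand-in for the Tcl regexp that scan-assembler really uses, and the two assembly lines are hypothetical examples of what a compiler might emit for cmpxchg16b-1.c; the helper name line_matches is likewise an assumption of this sketch.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if PATTERN (a POSIX extended regex) matches anywhere in LINE,
   0 if it does not, and -1 if the pattern fails to compile.  */
int
line_matches (const char *line, const char *pattern)
{
  regex_t re;
  int rc;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, line, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}

/* Hypothetical lines from the assembly generated for cmpxchg16b-1.c.  */
const char file_directive[] = "\t.file\t\"cmpxchg16b-1.c\"";
const char instruction[] = "\tlock cmpxchg16b\t(%rdi)";

/* Self-check: the bare name also matches the file name, so the test
   would pass even without the instruction; requiring a following
   space or tab matches only the instruction line.  */
void
check_patterns (void)
{
  assert (line_matches (file_directive, "cmpxchg16b") == 1);
  assert (line_matches (file_directive, "cmpxchg16b[ \t]") == 0);
  assert (line_matches (instruction, "cmpxchg16b[ \t]") == 1);
}
```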
Re: [PATCH 4/6] Shrink-wrapping
On 24 August 2011 18:23, Bernd Schmidt ber...@codesourcery.com wrote: On 08/24/11 19:17, Richard Sandiford wrote: OK with that change from a MIPS and rtl and perspective. Thanks. What else is in there? Trivial x86 changes, and a slightly less trivial but still tiny ARM bit, I suppose. Richard/Ramana? Sorry about the delayed review - I read through this for a bit this afternoon and for a while I must admit I was confused for a while by why the arm.md changes and the other changes in the backend hadn't made it in here from the original patch. This is OK but please watch out for any fall-out next week. cheers Ramana
Fix .init_array/.fini_array testcase (PR target/50166)
As detailed in the PR, H.J.'s .init_array/.fini_array patch 2011-08-20 H.J. Lu hongjiu...@intel.com PR other/46770 * config.gcc (tm_file): Add initfini-array.h if .init_arrary/.fini_array are supported. broke Solaris bootstrap since the testcase incorrectly succeeds on Solaris, failing to notice that none of the constructors and destructors were ever run. The following patch fixes that, allows i386-pc-solaris2.11 bootstrap to succeed and was also bootstrapped on x86_64-unknown-linux-gnu (CentOS 5.5 with gas/gld 2.21). The testcase still fails on my Linux system, so I'm uncertain if the fix is right. Ok for mainline? Rainer 2011-08-26 Rainer Orth r...@cebitec.uni-bielefeld.de PR target/50166 * acinclude.m4 (gcc_AC_INITFINI_ARRAY): Check count in main. * configure: Regenerate. # HG changeset patch # Parent f622b6f398b4f552dcc1450c8caf6368a5937748 Disable .init_array/.fini_array support on Solaris (PR target/50166) diff --git a/gcc/acinclude.m4 b/gcc/acinclude.m4 --- a/gcc/acinclude.m4 +++ b/gcc/acinclude.m4 @@ -477,6 +477,8 @@ void (*const dtors65535[]) () int main () { + if (count != 65535) +abort (); return 0; } #endif diff --git a/gcc/configure b/gcc/configure --- a/gcc/configure +++ b/gcc/configure @@ -10888,6 +10888,8 @@ void (*const dtors65535) () int main () { + if (count != 65535) +abort (); return 0; } #endif @@ -17913,7 +17915,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat conftest.$ac_ext _LT_EOF -#line 17916 configure +#line 17918 configure #include confdefs.h #if HAVE_DLFCN_H @@ -18019,7 +18021,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat conftest.$ac_ext _LT_EOF -#line 18022 configure +#line 18024 configure #include confdefs.h #if HAVE_DLFCN_H -- - Rainer Orth, Center for Biotechnology, Bielefeld University
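The idea behind checking count in main can be sketched in a few lines of C. This is a simplified stand-in, not the configure test itself: it relies on the GCC/Clang constructor attribute rather than explicit .init_array or .ctors sections, and the names and the expected count of 2 belong to this sketch only.

```c
#include <assert.h>

/* Incremented by each constructor before main starts.  */
static int count;

/* GCC/Clang extension: these functions are run before main.  */
__attribute__ ((constructor)) static void ctor_a (void) { count++; }
__attribute__ ((constructor)) static void ctor_b (void) { count++; }

/* Returns nonzero only if every constructor actually ran.  A test
   program that merely links and returns 0 cannot notice constructors
   that were silently skipped; checking the counter makes that failure
   visible.  */
int
all_constructors_ran (void)
{
  return count == 2;
}

/* Self-check in the spirit of the patched test: fail loudly otherwise.  */
void
check_constructors (void)
{
  assert (all_constructors_ran ());
}
```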
Re: [PATCH 4/6] Shrink-wrapping
On 08/26/11 16:32, Ramana Radhakrishnan wrote: On 24 August 2011 18:23, Bernd Schmidt ber...@codesourcery.com wrote: On 08/24/11 19:17, Richard Sandiford wrote: OK with that change from a MIPS and rtl and perspective. Thanks. What else is in there? Trivial x86 changes, and a slightly less trivial but still tiny ARM bit, I suppose. Richard/Ramana? Sorry about the delayed review - I read through this for a bit this Nothing delayed about it really :) afternoon and for a while I must admit I was confused for a while by why the arm.md changes and the other changes in the backend hadn't made it in here from the original patch. You mean the introduction of simple_return patterns for ARM? The patch is split up further (this one is now piece 2/3 of the original patch 4/6) and I've postponed these until the final shrink-wrapping patch. In this patch I've only made some MIPS changes in this area, more as a proof-of-concept rather than because they gain anything yet. This is OK but please watch out for any fall-out next week. Thanks! Bernd
Re: [SPARC] Fix bugs with setjmp/longjmp + alloca
From: Eric Botcazou ebotca...@adacore.com Date: Sun, 22 May 2011 00:45:55 +0200 SPARC maintainers, any objection to me eliminating this SETJMP_VIA_SAVE_AREA kludge? This would make it possible to have a shared implementation with the flat mode and remove specific support in a few locations. Even IA-64 does things the canonical way here. Absolutely no objection to getting rid of the setjmp kludge. :-) The thing about the setjmp+alloca handling on sparc is that the code is simply trying to leave the original stack frame and thus original setjmp area alone. Basically so that the JB_SP/JB_PC don't get overwritten. It would seem to me that, for example with C code, we don't need to update anything. Because any local stack allocation happening later than the setjmp() can be safely ignored since that allocated memory does not exist at the setjmp() point, thus it is safe to always longjmp to the pre-alloca()'d state. I guess when using setjmp/longjmp for exceptions the requirements increase above and beyond what is normally sufficient, and that's why you have to update the buffer?
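The plain C situation described above, an alloca that happens after the setjmp point and is abandoned by longjmp, might look like the following sketch. It assumes a system that provides the non-standard <alloca.h>; the function name is hypothetical and nothing here is SPARC-specific.

```c
#include <alloca.h>
#include <assert.h>
#include <setjmp.h>
#include <string.h>

static jmp_buf env;

/* Returns the number of times the body ran before longjmp came back.  */
int
jump_back_over_alloca (void)
{
  /* volatile, because it is modified between setjmp and longjmp.  */
  volatile int passes = 0;

  if (setjmp (env) != 0)
    return passes;            /* resumed with the pre-alloca stack state */

  passes = passes + 1;
  char *scratch = alloca (64);   /* allocated after the setjmp point */
  memset (scratch, 0, 64);
  longjmp (env, 1);              /* the scratch block is simply abandoned */
}

/* Self-check: the body runs once, then control returns via setjmp.  */
void
check_jump (void)
{
  assert (jump_back_over_alloca () == 1);
}
```

Nothing allocated after setjmp needs to survive the jump here, which is the case the message argues requires no update of the saved buffer.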
Re: Fix .init_array/.fini_array testcase (PR target/50166)
On Fri, Aug 26, 2011 at 7:35 AM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: As detailed in the PR, H.J.'s .init_array/.fini_array patch 2011-08-20 H.J. Lu hongjiu...@intel.com PR other/46770 * config.gcc (tm_file): Add initfini-array.h if .init_arrary/.fini_array are supported. broke Solaris bootstrap since the testcase incorrectly succeeds on Solaris, failing to notice that none of the constructors and destructors were ever run. The following patch fixes that, allows i386-pc-solaris2.11 bootstrap to succeed and was also bootstrapped on x86_64-unknown-linux-gnu (CentOS 5.5 with gas/gld 2.21). The testcase still fails on my Linux system, so I'm uncertain if the fix is right. Ok for mainline? Rainer 2011-08-26 Rainer Orth r...@cebitec.uni-bielefeld.de PR target/50166 * acinclude.m4 (gcc_AC_INITFINI_ARRAY): Check count in main. * configure: Regenerate. That explains why init_array was enabled on AIX. It looks good to me and still works on Fedora 15. Thanks. -- H.J.
Re: [PATCH 4/6] Shrink-wrapping
On 26 August 2011 15:36, Bernd Schmidt ber...@codesourcery.com wrote: On 08/26/11 16:32, Ramana Radhakrishnan wrote: On 24 August 2011 18:23, Bernd Schmidt ber...@codesourcery.com wrote: On 08/24/11 19:17, Richard Sandiford wrote: You mean the introduction of simple_return patterns for ARM? The patch is split up further (this one is now piece 2/3 of the original patch 4/6) and I've postponed these until the final shrink-wrapping patch. In this patch I've only made some MIPS changes in this area, more as a proof-of-concept rather than because they gain anything yet. Yes that's what I meant and figured out later. Thanks for making that explicit. Richard Sandiford did point that out to me on IRC as I was pretty much scratching my head about why some of the other changes were missing :) . cheers Ramana
Re: Fix .init_array/.fini_array testcase (PR target/50166)
H.J. Lu hjl.to...@gmail.com writes: 2011-08-26 Rainer Orth r...@cebitec.uni-bielefeld.de PR target/50166 * acinclude.m4 (gcc_AC_INITFINI_ARRAY): Check count in main. * configure: Regenerate. That explains why init_array was enabled on AIX. It looks good to me and still works on Fedora 15. What support do you need on the Linux side for .init_array/.fini_array to work? I'd have expected that gld 2.21 is enough, or is ld-linux.so.2 support required, too? Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
bb partitioning vs optimize_function_for_speed_p
In rest_of_reorder_blocks, we avoid reordering if !optimize_function_for_speed_p. However, we still call insert_section_boundary_note, which can cause problems because now, if we have a sequence of HOT-COLD-HOT blocks, the second set of HOT blocks will end up in the cold section. This causes assembler failures when using exception handling (subtracting labels from different sections). Unfortunately, the only way I have of reproducing it is to apply a 67-patch quilt tree backporting the preliminary shrink-wrapping patches to gcc-4.6; then we get FAIL: g++.dg/tree-prof/partition2.C compilation, -Os -fprofile-use However, the problem is reasonably obvious. Bootstrapped and currently testing in the aforementioned 4.6 tree. Ok for trunk after testing there? Bernd
	* bb-reorder.c (insert_section_boundary_note): Only do it if we reordered the blocks; i.e. not if !optimize_function_for_speed_p.
Index: gcc/bb-reorder.c
===================================================================
--- gcc/bb-reorder.c (revision 178030)
+++ gcc/bb-reorder.c (working copy)
@@ -1965,8 +1965,11 @@ insert_section_boundary_note (void)
   rtx new_note;
   int first_partition = 0;

-  if (flag_reorder_blocks_and_partition)
-    FOR_EACH_BB (bb)
+  if (!flag_reorder_blocks_and_partition
+      || !optimize_function_for_speed_p (cfun))
+    return;
+
+  FOR_EACH_BB (bb)
     {
       if (!first_partition)
         first_partition = BB_PARTITION (bb);
Re: Fix .init_array/.fini_array testcase (PR target/50166)
On Fri, Aug 26, 2011 at 7:45 AM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: H.J. Lu hjl.to...@gmail.com writes: 2011-08-26 Rainer Orth r...@cebitec.uni-bielefeld.de PR target/50166 * acinclude.m4 (gcc_AC_INITFINI_ARRAY): Check count in main. * configure: Regenerate. That explains why init_array was enabled on AIX. It looks good to me and still works on Fedora 15. What support do you need on the Linux side for .init_array/.fini_array to work? I'd have expected that gld 2.21 is enough, or is ld-linux.so.2 support required, too? You need the latest Linux binutils. Support for mixing .init_array/.ctors sections was added after binutils 2.21 was released: http://sourceware.org/git/?p=binutils.git;a=commit;h=30dfd0308a8551174634494822e194fcf24a7ddb -- H.J.
Re: [Patch ARM] Fix vec_pack_trunc pattern for vectorize_with_neon_quad.
On 16 August 2011 15:20, Ramana Radhakrishnan ramana.radhakrish...@linaro.org wrote: Hi, While looking at a failure with regrename and mvectorize-with-neon-quad I noticed that the early-clobber in this vec_pack_trunc pattern is superfluous given that we can use reg_overlap_mentioned_p to decide in which order we want to emit these 2 instructions. While it works around the problem in regrename.c I still think that the behaviour in regrename is a bit suspicious and needs some more investigation. RichardS finally fixed the problem in data-flow and hence we should be able to turn on vectorize_with_quad anyway. Here's the patch which I thought I should have committed as a workaround but I think it's better to split this further in the case where the 2 registers are equal because otherwise you are pointlessly creating a stall in the Neon pipe for the vmovn result to arrive. Hence I'm not committing this patch. Tests finished OK btw for this patch. cheers Ramana index 24dd941..2c60c5f 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -5631,14 +5631,29 @@ ; the semantics of the instructions require. (define_insn vec_pack_trunc_mode - [(set (match_operand:V_narrow_pack 0 register_operand =w) + [(set (match_operand:V_narrow_pack 0 register_operand =w) (vec_concat:V_narrow_pack (truncate:V_narrow (match_operand:VN 1 register_operand w)) (truncate:V_narrow (match_operand:VN 2 register_operand w] TARGET_NEON !BYTES_BIG_ENDIAN - vmovn.iV_sz_elem\t%e0, %q1\;vmovn.iV_sz_elem\t%f0, %q2 + { + /* If operand1 and operand2 are identical, then the second +narrowing operation isn't needed as the values obtained +in both parts of the destination q register are identical. +This precludes the need for an early clobber in the destination +operand. 
*/ + if (rtx_equal_p (operands[1], operands[2])) +return vmovn.iV_sz_elem\\t%e0, %q1\;vmov.iV_sz_elem\\t%f0, %e0; + else + { + if (reg_overlap_mentioned_p (operands[0], operands[2])) + return vmovn.iV_sz_elem\\t%f0, %q2\;vmovn.iV_sz_elem\\t%e0, %q1; + else + return vmovn.iV_sz_elem\\t%e0, %q1\;vmovn.iV_sz_elem\\t%f0, %q2; + } + } [(set_attr neon_type neon_shift_1) (set_attr length 8)] )
Re: Fix .init_array/.fini_array testcase (PR target/50166)
H.J. Lu hjl.to...@gmail.com writes: What support do you need on the Linux side for .init_array/.fini_array to work? I'd have expected that gld 2.21 is enough, or is ld-linux.so.2 support required, too? You need the latest Linux binutils. Support for mixing .init_array/.ctors sections was added after binutils 2.21 was released: http://sourceware.org/git/?p=binutils.git;a=commit;h=30dfd0308a8551174634494822e194fcf24a7ddb I see, thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: Fix .init_array/.fini_array testcase (PR target/50166)
On Fri, Aug 26, 2011 at 04:35:18PM +0200, Rainer Orth wrote: Ok for mainline? Yes. 2011-08-26 Rainer Orth r...@cebitec.uni-bielefeld.de PR target/50166 * acinclude.m4 (gcc_AC_INITFINI_ARRAY): Check count in main. * configure: Regenerate. Jakub
Re: [v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h
Hi, exactly: just g++ -D__cplusplus=1 or something. Irrespective of what the Standard strictly says, I think the latter would only make sense if it would allow the user to return, consistently, to the pre-4.7 behavior, for compatibility reasons or something. Is it the case? Is the above enough for that? Or some of the changes which went in are effective anyway even if __cplusplus is reverted by hand to 1? I think this is the question deciding what we really want to do. Paolo
Re: [v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h
Hi Paolo, exactly: just g++ -D__cplusplus=1 or something. Irrespective of what the Standard strictly says, I think the latter would only make sense if it would allow the user to return, consistently, to the pre-4.7 behavior, for compatibility reasons or something. Is it the case? Is the above enough for that? Or some of the changes which went in are effective anyway even if __cplusplus is reverted by hand to 1? I think this is the question deciding what we really want to do. I'm pretty sure this is the case for Solaris. The other changes we've made to support __cplusplus 199711L were no-ops without the last one to change __cplusplus from 1 to the C++ 98 value. So, redefining __cplusplus to 1 should return us back to the old status. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[PATCH][ARM] -m{cpu,tune,arch}=native
Hi all, This patch adds support for -mcpu=native, -mtune=native, and -march=native for ARM Linux hosts. So far, it only recognises Cortex-A8 and Cortex-A9, so I really need to find out what the magic part numbers are for other cpus before this patch is complete. I couldn't find this information listed anywhere. I think there are a lot of clues in the kernel code, but it's hard to mine and it mostly only goes as far as the architecture version, not the individual cpu. Any suggestions? Otherwise, is this OK? Andrew 2011-08-26 Andrew Stubbs a...@codesourcery.com gcc/ * config.host (arm*-*-linux*): Add driver-arm.o and x-arm. * config/arm/arm-tables.opt: Add 'native' processor type and architecture type. * config/arm/arm.h (host_detect_local_cpu): New prototype. (EXTRA_SPEC_FUNCTIONS): New define. (MCPU_MTUNE_NATIVE_SPECS): New define. (DRIVER_SELF_SPECS): New define. * config/arm/driver-arm.c: New file. * config/arm/x-arm: New file. * doc/invoke.texi (ARM Options): Document -mcpu=native, -mtune=native and -march=native. 
--- a/gcc/config.host +++ b/gcc/config.host @@ -100,6 +100,14 @@ case ${host} in esac case ${host} in + arm*-*-linux*) +case ${target} in + arm*-*-*) + host_extra_gcc_objs=driver-arm.o + host_xmake_file=${host_xmake_file} arm/x-arm + ;; +esac +;; alpha*-*-linux* | alpha*-dec-osf*) case ${target} in alpha*-*-linux* | alpha*-dec-osf*) --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -25,6 +25,9 @@ Name(processor_type) Type(enum processor_type) Known ARM CPUs (for use with the -mcpu= and -mtune= options): EnumValue +Enum(processor_type) String(native) Value(-1) DriverOnly + +EnumValue Enum(processor_type) String(arm2) Value(arm2) EnumValue @@ -269,6 +272,9 @@ Name(arm_arch) Type(int) Known ARM architectures (for use with the -march= option): EnumValue +Enum(arm_arch) String(native) Value(-1) DriverOnly + +EnumValue Enum(arm_arch) String(armv2) Value(0) EnumValue --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -2223,4 +2223,21 @@ extern int making_const_table; instruction. */ #define MAX_LDM_STM_OPS 4 +/* -mcpu=native handling only makes sense with compiler running on + an ARM chip. */ +#if defined(__arm__) +extern const char *host_detect_local_cpu (int argc, const char **argv); +# define EXTRA_SPEC_FUNCTIONS \ + { local_cpu_detect, host_detect_local_cpu }, + +# define MCPU_MTUNE_NATIVE_SPECS \ +%{march=native:%march=native %:local_cpu_detect(arch)} \ +%{mcpu=native:%mcpu=native %:local_cpu_detect(cpu)} \ +%{mtune=native:%mtune=native %:local_cpu_detect(tune)} +#else +# define MCPU_MTUNE_NATIVE_SPECS +#endif + +#define DRIVER_SELF_SPECS MCPU_MTUNE_NATIVE_SPECS + #endif /* ! GCC_ARM_H */ --- /dev/null +++ b/gcc/config/arm/driver-arm.c @@ -0,0 +1,86 @@ +/* Subroutines for the gcc driver. + Copyright (C) 2011 Free Software Foundation, Inc. + +This file is part of GCC. 
+ +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +http://www.gnu.org/licenses/. */ + +#include config.h +#include system.h +#include coretypes.h +#include tm.h + +static struct { + const char *part_no; + const char *arch_name; + const char *cpu_name; +} cpu_table[] = { +{0xc08, armv7-a, cortex-a8}, +{0xc09, armv7-a, cortex-a9}, +{NULL, NULL, NULL} +}; + +/* This will be called by the spec parser in gcc.c when it sees + a %:local_cpu_detect(args) construct. Currently it will be called + with either arch, cpu or tune as argument depending on if + -march=native, -mcpu=native or -mtune=native is to be substituted. + + It returns a string containing new command line parameters to be + put at the place of the above two options, depending on what CPU + this is executed. E.g. -march=armv7-a on a Cortex-A8 for + -march=native. If the routine can't detect a known processor, + the -march or -mtune option is discarded. + + ARGC and ARGV are set depending on the actual arguments given + in the spec. */ +const char * +host_detect_local_cpu (int argc, const char **argv) +{ + const char *val = NULL; + char buf[128]; + FILE *f; + bool arch; + + if (argc 1) +return NULL; + + arch = strcmp (argv[0], arch) == 0; + if (!arch strcmp (argv[0], cpu) != 0 strcmp (argv[0], tune)) +return NULL; + + f = fopen (/proc/cpuinfo, r); + if (f == NULL) +return NULL; + + while (fgets (buf, sizeof (buf), f) != NULL) +if (strncmp
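The quoted patch text is cut off above before the lookup itself. As a rough, separate illustration of the idea (not the patch's code), the sketch below maps one line of /proc/cpuinfo to a CPU name using the two part numbers from the patch's table. It assumes the kernel reports the part number on a line beginning with "CPU part", as ARM Linux kernels commonly do; the helper name cpu_name_from_line is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Part numbers and names as listed in the table in the patch.  */
static const struct
{
  const char *part_no;
  const char *arch_name;
  const char *cpu_name;
} cpu_table[] = {
  { "0xc08", "armv7-a", "cortex-a8" },
  { "0xc09", "armv7-a", "cortex-a9" },
  { NULL, NULL, NULL }
};

/* Hypothetical helper: given one line of /proc/cpuinfo, return the CPU
   name if it is a "CPU part" line with a known part number, else NULL.  */
const char *
cpu_name_from_line (const char *line)
{
  static const char key[] = "CPU part";
  const char *value;
  size_t i;

  if (strncmp (line, key, sizeof (key) - 1) != 0)
    return NULL;

  value = strchr (line, ':');
  if (value == NULL)
    return NULL;
  value++;
  while (*value == ' ' || *value == '\t')
    value++;

  for (i = 0; cpu_table[i].part_no != NULL; i++)
    {
      size_t len = strlen (cpu_table[i].part_no);
      /* Match the whole field so that 0xc08 does not match 0xc081.  */
      if (strncmp (value, cpu_table[i].part_no, len) == 0
          && (value[len] == '\0' || value[len] == '\n' || value[len] == ' '))
        return cpu_table[i].cpu_name;
    }
  return NULL;
}

/* Self-check with hypothetical /proc/cpuinfo lines.  */
void
check_lookup (void)
{
  assert (strcmp (cpu_name_from_line ("CPU part\t: 0xc09\n"), "cortex-a9") == 0);
  assert (cpu_name_from_line ("CPU part\t: 0xc0f\n") == NULL);
  assert (cpu_name_from_line ("CPU implementer\t: 0x41\n") == NULL);
}
```

An unknown part number yields NULL, which corresponds to the documented behaviour of discarding the option when no known processor is detected.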
Re: [v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h
Hi, I'm pretty sure this is the case for Solaris. The other changes we've made to support __cplusplus 199711L were no-ops without the last one to change __cplusplus from 1 to the C++ 98 value. So, redefining __cplusplus to 1 should return us back to the old status. I see, then I think the patch is Ok. Since you are so well positioned to test on Solaris machines, I would recommend running the library testsuite with -D__cplusplus=1 added to CXXFLAGS, as a final check. Paolo
[PATCH] Support (as an extension) threadprivate procedure pointers
Hi!

This patch adds (tiny) code to handle procedure pointers in !$omp threadprivate plus a testcase. This is outside of the scope of OpenMP standard, i.e. an extension so far, hopefully OpenMP 4.0 will cover at least F2003, C++11 and maybe also F2008.

Haven't touched any other OpenMP places wrt. procedure pointers, so e.g. they aren't allowed in various other clauses.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2011-08-26  Jakub Jelinek  ja...@redhat.com

	* trans-decl.c (get_proc_pointer_decl): Set DECL_TLS_MODEL
	if threadprivate.
	* symbol.c (check_conflict): Allow threadprivate attribute with
	FL_PROCEDURE if proc_pointer.

	* testsuite/libgomp.fortran/threadprivate4.f90: New test.

--- gcc/fortran/trans-decl.c.jj	2011-08-18 08:35:51.0 +0200
+++ gcc/fortran/trans-decl.c	2011-08-26 11:34:31.0 +0200
@@ -1533,6 +1533,11 @@ get_proc_pointer_decl (gfc_symbol *sym)
 						  false, true);
     }
 
+  /* Handle threadprivate procedure pointers.  */
+  if (sym->attr.threadprivate
+      && (TREE_STATIC (decl) || DECL_EXTERNAL (decl)))
+    DECL_TLS_MODEL (decl) = decl_default_tls_model (decl);
+
   attributes = add_attributes_to_decl (sym->attr, NULL_TREE);
   decl_attributes (decl, attributes, 0);
 
--- gcc/fortran/symbol.c.jj	2011-08-22 08:17:04.0 +0200
+++ gcc/fortran/symbol.c	2011-08-26 12:31:10.0 +0200
@@ -673,7 +673,8 @@ check_conflict (symbol_attribute *attr,
 	  conf2 (codimension);
 	  conf2 (dimension);
 	  conf2 (function);
-	  conf2 (threadprivate);
+	  if (!attr->proc_pointer)
+	    conf2 (threadprivate);
 	}
 
       if (!attr->proc_pointer)
--- libgomp/testsuite/libgomp.fortran/threadprivate4.f90.jj	2011-08-26 11:54:50.0 +0200
+++ libgomp/testsuite/libgomp.fortran/threadprivate4.f90	2011-08-26 12:35:22.0 +0200
@@ -0,0 +1,78 @@
+! { dg-do run }
+! { dg-require-effective-target tls_runtime }
+
+module threadprivate4
+  integer :: vi
+  procedure(), pointer :: foo
+!$omp threadprivate (foo, vi)
+
+contains
+  subroutine fn0
+    vi = 0
+  end subroutine fn0
+  subroutine fn1
+    vi = 1
+  end subroutine fn1
+  subroutine fn2
+    vi = 2
+  end subroutine fn2
+  subroutine fn3
+    vi = 3
+  end subroutine fn3
+end module threadprivate4
+
+  use omp_lib
+  use threadprivate4
+
+  integer :: i
+  logical :: l
+
+  procedure(), pointer :: bar1
+  common /thrc/ bar1
+!$omp threadprivate (/thrc/)
+
+  procedure(), pointer, save :: bar2
+!$omp threadprivate (bar2)
+
+  l = .false.
+  call omp_set_dynamic (.false.)
+  call omp_set_num_threads (4)
+
+!$omp parallel num_threads (4) reduction (.or.:l) private (i)
+  i = omp_get_thread_num ()
+  if (i.eq.0) then
+    foo => fn0
+    bar1 => fn0
+    bar2 => fn0
+  elseif (i.eq.1) then
+    foo => fn1
+    bar1 => fn1
+    bar2 => fn1
+  elseif (i.eq.2) then
+    foo => fn2
+    bar1 => fn2
+    bar2 => fn2
+  else
+    foo => fn3
+    bar1 => fn3
+    bar2 => fn3
+  end if
+  vi = -1
+!$omp barrier
+  vi = -1
+  call foo ()
+  l=l.or.(vi.ne.i)
+  vi = -2
+  call bar1 ()
+  l=l.or.(vi.ne.i)
+  vi = -3
+  call bar2 ()
+  l=l.or.(vi.ne.i)
+  vi = -1
+!$omp end parallel
+
+  if (l) call abort
+
+end
+
+! { dg-final { cleanup-modules threadprivate4 } }

	Jakub
Re: [PATCH, i386, testsuite] FMA intrinsics
On Fri, Aug 26, 2011 at 8:06 AM, Ilya Tocar tocarip.in...@gmail.com wrote:
> Done.
> Also fixed changelog:
>
> 2011-08-26  Ilya Tocar  ilya.to...@intel.com
>
>	* config/i386/fmaintrin.h: New.
>	* config.gcc: Add fmaintrin.h.
>	* config/i386/i386.c (enum ix86_builtins)
>	IX86_BUILTIN_VFMADDSS3: New.
>	IX86_BUILTIN_VFMADDSD3: Likewise.
>	* config/i386/sse.md (fmai_vmfmadd_<mode>): New.
>	(*fmai_fmadd_<mode>): Likewise.
>	(*fmai_fmsub_<mode>): Likewise.
>	(*fmai_fnmadd_<mode>): Likewise.
>	(*fmai_fnmsub_<mode>): Likewise.
>	* config/i386/immintrin.h: Add fmaintrin.h.
> --
> +++ b/gcc/config/i386/fmaintrin.h
> @@ -0,0 +1,229 @@
> +/* Copyright (C) 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
> +

It should be just 2011.

H.J.
Re: [PATCH, i386, testsuite] FMA intrinsics
On Fri, Aug 26, 2011 at 8:47 AM, H.J. Lu hjl.to...@gmail.com wrote:
> On Fri, Aug 26, 2011 at 8:06 AM, Ilya Tocar tocarip.in...@gmail.com wrote:
>> Done.
>> Also fixed changelog:
>>
>> 2011-08-26  Ilya Tocar  ilya.to...@intel.com
>>
>>	* config/i386/fmaintrin.h: New.
>>	* config.gcc: Add fmaintrin.h.
>>	* config/i386/i386.c (enum ix86_builtins)
>>	IX86_BUILTIN_VFMADDSS3: New.
>>	IX86_BUILTIN_VFMADDSD3: Likewise.
>>	* config/i386/sse.md (fmai_vmfmadd_<mode>): New.
>>	(*fmai_fmadd_<mode>): Likewise.
>>	(*fmai_fmsub_<mode>): Likewise.
>>	(*fmai_fnmadd_<mode>): Likewise.
>>	(*fmai_fnmsub_<mode>): Likewise.
>>	* config/i386/immintrin.h: Add fmaintrin.h.
>> --
>> +++ b/gcc/config/i386/fmaintrin.h
>> @@ -0,0 +1,229 @@
>> +/* Copyright (C) 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
>> +
>
> It should be just 2011.

Also lines in fmaintrin.h are too long. I prefer 72 columns.

-- 
H.J.
Re: [lto] Refactor streamer (1/N) (issue4809083)
Hi,

On Fri, 26 Aug 2011, Jakub Jelinek wrote:

> While you are touching it, I think we should also optimize it as in the
> patch below. I'm afraid no string length optimization would be able to
> figure out that it doesn't have to call strlen twice, because the
> htab_find_slot isn't pure.

Sure. Regstrapped the below patch and checked in as r178118.

Ciao,
Michael.
-- 
Index: lto-streamer-in.c
===================================================================
--- lto-streamer-in.c	(revision 178117)
+++ lto-streamer-in.c	(revision 178118)
@@ -98,21 +98,22 @@ canon_file_name (const char *string)
 {
   void **slot;
   struct string_slot s_slot;
+  size_t len = strlen (string);
+
   s_slot.s = string;
-  s_slot.len = strlen (string);
+  s_slot.len = len;
 
   slot = htab_find_slot (file_name_hash_table, &s_slot, INSERT);
   if (*slot == NULL)
     {
-      size_t len;
       char *saved_string;
       struct string_slot *new_slot;
 
-      len = strlen (string);
       saved_string = (char *) xmalloc (len + 1);
       new_slot = XCNEW (struct string_slot);
-      strcpy (saved_string, string);
+      memcpy (saved_string, string, len + 1);
       new_slot->s = saved_string;
+      new_slot->len = len;
       *slot = new_slot;
       return saved_string;
     }
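
For readers skimming the thread, here is a minimal standalone sketch of the pattern the patch applies: measure the string once, then reuse that length for both the slot and the copy. The helper name save_string and the cut-down struct string_slot are hypothetical stand-ins, and plain malloc/calloc replace GCC's xmalloc and XCNEW, so this is only an illustration and not the lto-streamer-in.c code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the string_slot in the patch.  */
struct string_slot
{
  const char *s;
  size_t len;
};

/* Copy STRING into a newly allocated slot, calling strlen only once.
   Returns NULL if allocation fails.  */
static struct string_slot *
save_string (const char *string)
{
  assert (string != NULL);

  size_t len = strlen (string);
  char *saved = malloc (len + 1);
  struct string_slot *slot = calloc (1, sizeof *slot);

  if (saved == NULL || slot == NULL)
    {
      free (saved);
      free (slot);
      return NULL;
    }

  /* len + 1 also copies the terminating NUL, so no second scan
     of the source string is needed.  */
  memcpy (saved, string, len + 1);
  slot->s = saved;
  slot->len = len;
  return slot;
}
```

Because the length is already known, memcpy can replace strcpy, which would otherwise walk the string again looking for the terminator.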
[PATCH][ARM] Generic tuning
Hi all,

This patch is step 1 towards having generic (best-blend) tuning on ARM. The patch adds an option '-mtune=generic-armv7-a' but does not actually do any tuning tweaks yet - those are for follow up patches.

x86 has simply '-mtune=generic', and from that (the documentation suggests) the compiler selects the most common architecture/cpu variants to tune for. I don't think that translates well to the ARM world, so I have chosen to make it generic within the architecture family.

My intention is to make this the default tuning whenever the user specifies '-march=armv7-a', but that will have to wait until it does something meaningful.

OK?

Andrew

2011-08-26  Andrew Stubbs  a...@codesourcery.com

	gcc/
	* config/arm/arm-cores.def (generic-armv7-a): New architecture.
	* config/arm/arm-tables.opt: Add generic-armv7-a tune/cpu type.
	* config/arm/arm-tune.md: Regenerate.
	* config/arm/arm.c (arm_file_start): Output .arch directive when
	user passes -mcpu=generic-*.
	(arm_issue_rate): Add genericv7a support.
	* config/arm/arm.h (EXTRA_SPECS): Add asm_cpu_spec.
	(ASM_CPU_SPEC): New define.
	* config/arm/elf.h (ASM_SPEC): Use %(asm_cpu_spec).
	* config/arm/semi.h (ASM_SPEC): Likewise.
	* doc/invoke.texi (ARM Options): Document -mcpu=generic-* and
	-mtune=generic-*.

--- a/gcc/config/arm/arm-cores.def +++ b/gcc/config/arm/arm-cores.def @@ -124,6 +124,7 @@ ARM_CORE(mpcorenovfp, mpcorenovfp, 6K, FL_LDSCHED, 9e) ARM_CORE(mpcore, mpcore, 6K, FL_LDSCHED | FL_VFPV2, 9e) ARM_CORE(arm1156t2-s, arm1156t2s, 6T2, FL_LDSCHED, v6t2) ARM_CORE(arm1156t2f-s, arm1156t2fs, 6T2, FL_LDSCHED | FL_VFPV2, v6t2) +ARM_CORE(generic-armv7-a, genericv7a, 7A, FL_LDSCHED, cortex) ARM_CORE(cortex-a5, cortexa5, 7A, FL_LDSCHED, cortex_a5) ARM_CORE(cortex-a8, cortexa8, 7A, FL_LDSCHED, cortex) ARM_CORE(cortex-a9, cortexa9, 7A, FL_LDSCHED, cortex_a9) @@ -135,3 +136,4 @@ ARM_CORE(cortex-m4, cortexm4, 7EM, FL_LDSCHED, cortex) ARM_CORE(cortex-m3, cortexm3, 7M, FL_LDSCHED, cortex) ARM_CORE(cortex-m1, cortexm1, 6M, FL_LDSCHED, cortex) ARM_CORE(cortex-m0, cortexm0, 6M, FL_LDSCHED, cortex) + --- a/gcc/config/arm/arm-tables.opt +++ b/gcc/config/arm/arm-tables.opt @@ -235,6 +235,9 @@ EnumValue Enum(processor_type) String(arm1156t2f-s) Value(arm1156t2fs) EnumValue +Enum(processor_type) String(generic-armv7-a) Value(genericv7a) + +EnumValue Enum(processor_type) String(cortex-a5) Value(cortexa5) EnumValue --- a/gcc/config/arm/arm-tune.md +++ b/gcc/config/arm/arm-tune.md @@ -1,5 +1,5 @@ ;; -*- buffer-read-only: t -*- ;; Generated automatically by gentune.sh from arm-cores.def (define_attr tune - arm2,arm250,arm3,arm6,arm60,arm600,arm610,arm620,arm7,arm7d,arm7di,arm70,arm700,arm700i,arm710,arm720,arm710c,arm7100,arm7500,arm7500fe,arm7m,arm7dm,arm7dmi,arm8,arm810,strongarm,strongarm110,strongarm1100,strongarm1110,fa526,fa626,arm7tdmi,arm7tdmis,arm710t,arm720t,arm740t,arm9,arm9tdmi,arm920,arm920t,arm922t,arm940t,ep9312,arm10tdmi,arm1020t,arm9e,arm946es,arm966es,arm968es,arm10e,arm1020e,arm1022e,xscale,iwmmxt,iwmmxt2,fa606te,fa626te,fmp626,fa726te,arm926ejs,arm1026ejs,arm1136js,arm1136jfs,arm1176jzs,arm1176jzfs,mpcorenovfp,mpcore,arm1156t2s,arm1156t2fs,cortexa5,cortexa8,cortexa9,cortexa15,cortexr4,cortexr4f,cortexr5,cortexm4,cortexm3,cortexm1,cortexm0 + 
arm2,arm250,arm3,arm6,arm60,arm600,arm610,arm620,arm7,arm7d,arm7di,arm70,arm700,arm700i,arm710,arm720,arm710c,arm7100,arm7500,arm7500fe,arm7m,arm7dm,arm7dmi,arm8,arm810,strongarm,strongarm110,strongarm1100,strongarm1110,fa526,fa626,arm7tdmi,arm7tdmis,arm710t,arm720t,arm740t,arm9,arm9tdmi,arm920,arm920t,arm922t,arm940t,ep9312,arm10tdmi,arm1020t,arm9e,arm946es,arm966es,arm968es,arm10e,arm1020e,arm1022e,xscale,iwmmxt,iwmmxt2,fa606te,fa626te,fmp626,fa726te,arm926ejs,arm1026ejs,arm1136js,arm1136jfs,arm1176jzs,arm1176jzfs,mpcorenovfp,mpcore,arm1156t2s,arm1156t2fs,genericv7a,cortexa5,cortexa8,cortexa9,cortexa15,cortexr4,cortexr4f,cortexr5,cortexm4,cortexm3,cortexm1,cortexm0 (const (symbol_ref ((enum attr_tune) arm_tune --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -22195,6 +22195,8 @@ arm_file_start (void) const char *fpu_name; if (arm_selected_arch) asm_fprintf (asm_out_file, \t.arch %s\n, arm_selected_arch-name); + else if (strncmp (arm_selected_cpu-name, generic, 7) == 0) + asm_fprintf (asm_out_file, \t.arch %s\n, arm_selected_cpu-name + 8); else asm_fprintf (asm_out_file, \t.cpu %s\n, arm_selected_cpu-name); @@ -23719,6 +23721,7 @@ arm_issue_rate (void) case cortexr4: case cortexr4f: case cortexr5: +case genericv7a: case cortexa5: case cortexa8: case cortexa9: --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -189,6 +189,7 @@ extern void (*arm_lang_output_object_attributes_hook)(void); Do not define this macro if it does not need to do anything. */ #define EXTRA_SPECS \ { subtarget_cpp_spec, SUBTARGET_CPP_SPEC },
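
As a rough illustration of the arm_file_start hunk in this patch: a CPU name beginning with "generic-" is emitted as an .arch directive with that prefix stripped, and any other name as a .cpu directive. The helper format_cpu_directive below is hypothetical; it writes into a caller-supplied buffer rather than asm_out_file, and it compares the full 8-character prefix "generic-" (the patch itself compares 7 characters and then skips 8).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the assembler directive for CPU_NAME
   into BUF (of SIZE bytes).  */
static void
format_cpu_directive (char *buf, size_t size, const char *cpu_name)
{
  assert (buf != NULL && cpu_name != NULL);

  if (strncmp (cpu_name, "generic-", 8) == 0)
    /* Skip the "generic-" prefix; what remains names the architecture.  */
    snprintf (buf, size, "\t.arch %s\n", cpu_name + 8);
  else
    snprintf (buf, size, "\t.cpu %s\n", cpu_name);
}
```

Matching the hyphen as part of the prefix means the "+ 8" offset can never step past the end of a shorter name such as "generic".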
Re: [PATCH][ARM] -m{cpu,tune,arch}=native
On Fri, 26 Aug 2011, Andrew Stubbs wrote:

> Hi all,
>
> This patch adds support for -mcpu=native, -mtune=native, and
> -march=native for ARM Linux hosts.
>
> So far, it only recognises Cortex-A8 and Cortex-A9, so I really need to
> find out what the magic part numbers are for other cpus before this
> patch is complete. I couldn't just find this information listed
> anywhere. I think there are a lot of clues in the kernel code, but it's
> hard to mine and it mostly only goes as far as the architecture version,
> not the individual cpu.
>
> Any suggestions?
>
> Otherwise, is this OK?

arm-tables.opt is a generated file. You need to modify the source files and regenerate it, not modify the generated file.

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: [PATCH][ARM] Generic tuning
Again, arm-tables.opt is generated - so the log entry should just be * config/arm/arm-tables.opt: Regenerate. and the file should be what you get from regeneration. -- Joseph S. Myers jos...@codesourcery.com
PING: PATCH: PR preprocessor/39533: -MM may list a header file twice
On Wed, Apr 15, 2009 at 1:07 PM, H.J. Lu hjl.to...@gmail.com wrote:
> On Wed, Apr 15, 2009 at 11:51 AM, Tom Tromey tro...@redhat.com wrote:
>>>>>>> "H.J." == H J Lu hjl.to...@gmail.com writes:
>>
>> H.J.> Can you take a look at my patch:
>> H.J.> http://gcc.gnu.org/ml/gcc-patches/2009-03/msg01829.html
>>
>> I looked at this today.
>>
>> I don't understand why the check is not done in the loop.
>> Also I don't understand whether this patch can change the directory
>> search order in cases like #include_next.
>> Can you comment on this issue?
>
> gcc.dg/cpp/pr20356.c checks the expected behavior for #include_next.
> My patch works with it.
>
>> And more generally, could you try to provide some explanation for how
>> these patches are supposed to function? FWIW the reason it takes me so
>> long to look at them is that I have to reverse engineer the logic,
>> usually by applying the patch and stepping through with the
>> debugger... which is an awful lot of work for a bug which is
>> essentially cosmetic.
>
> There is only one patch:
>
> http://gcc.gnu.org/ml/gcc-patches/2009-03/msg01829.html
>
> search_cache checks if the file can be found when starting searching
> at START_DIR with a trailing '/'. If the start_dir field of head
> of hash entry isn't NULL, it is the start search directory for the
> cached file. If START_DIR + name is the same as pathname for head
> and START_DIR is the directory which contains the file,
>
> (!strncmp (start_dir->name, file->path, start_dir->len)
>  && !strcmp (file->name, file->path + start_dir->len))
>
> that means the cached head is a perfect match. We don't need to
> add START_DIR to start search at START_DIR (with trailing '/')
> and then START_DIR (without trailing '/')

This is very old. I also forgot about it. OK for trunk?

Thanks.

-- 
H.J.
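
The "perfect match" test described above can be sketched in isolation. This is only an illustration of the string comparison, assuming (as the message states) that the directory name carries its trailing '/'; is_perfect_match and its flat parameter list are hypothetical and do not reflect libcpp's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true when PATH is exactly DIR_NAME
   (which ends in '/') immediately followed by NAME.  */
static bool
is_perfect_match (const char *dir_name, size_t dir_len,
                  const char *name, const char *path)
{
  assert (dir_len == strlen (dir_name));

  /* If PATH is shorter than DIR_NAME, strncmp already reports a
     mismatch, so PATH + DIR_LEN is only formed when it is in range.  */
  return strncmp (dir_name, path, dir_len) == 0
         && strcmp (name, path + dir_len) == 0;
}
```

For example, the directory "include/" with name "foo.h" matches the path "include/foo.h" but not "include/sub/foo.h", since the remainder after the directory prefix must equal the file name exactly.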
Re: [PATCH, ARM] Unaligned accesses for packed structures [1/2]
On Thu, 25 Aug 2011 18:31:21 +0100
Julian Brown jul...@codesourcery.com wrote:

> On Thu, 25 Aug 2011 16:46:50 +0100
> Julian Brown jul...@codesourcery.com wrote:
>
>> So, OK to apply this version, assuming testing comes out OK? (And
>> the followup patch [2/2], which remains unchanged?)
>
> FWIW, all tests pass, apart from
> gcc.target/arm/volatile-bitfields-3.c, which regresses. The output
> contains:
>
>         ldrh    r0, [r3, #2]    @ unaligned
>
> I believe that, to conform to the ARM EABI, GCC must use an
> (aligned) ldr in this case. Is that correct? If so, it looks like the
> middle-end bitfield code does not take the setting of
> -fstrict-volatile-bitfields into account.

This version fixes the last issue, by adding additional checks for volatile accesses/-fstrict-volatile-bitfields. Tests now show no regressions.

OK to apply?

Thanks,

Julian

ChangeLog

    gcc/
    * config/arm/arm.c (arm_option_override): Add unaligned_access
    support.
    (arm_file_start): Emit attribute for unaligned access as
    appropriate.
    * config/arm/arm.md (UNSPEC_UNALIGNED_LOAD)
    (UNSPEC_UNALIGNED_STORE): Add constants for unspecs.
    (insv, extzv): Add unaligned-access support.
    (extv): Change to expander. Likewise.
    (extzv_t1, extv_regsi): Add helpers.
    (unaligned_loadsi, unaligned_loadhis, unaligned_loadhiu)
    (unaligned_storesi, unaligned_storehi): New.
    (*extv_reg): New (previous extv implementation).
    * config/arm/arm.opt (munaligned_access): Add option.
    * config/arm/constraints.md (Uw): New constraint.
    * expmed.c (store_bit_field_1): Adjust bitfield numbering according
    to size of access, not size of unit, when BITS_BIG_ENDIAN
    != BYTES_BIG_ENDIAN. Don't use bitfield accesses for
    volatile accesses when -fstrict-volatile-bitfields is in effect.
    (extract_bit_field_1): Likewise.

commit 645a7c99ff91ea2841c8101fb3c76e3b1fddb2c7 Author: Julian Brown jul...@henry8.codesourcery.com Date: Tue Aug 23 05:46:22 2011 -0700 Unaligned support for packed structs diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 3162b30..cc1eb80 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -1905,6 +1905,28 @@ arm_option_override (void) fix_cm3_ldrd = 0; } + /* Enable -munaligned-access by default for + - all ARMv6 architecture-based processors + - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors. + + Disable -munaligned-access by default for + - all pre-ARMv6 architecture-based processors + - ARMv6-M architecture-based processors. */ + + if (unaligned_access == 2) +{ + if (arm_arch6 (arm_arch_notm || arm_arch7)) + unaligned_access = 1; + else + unaligned_access = 0; +} + else if (unaligned_access == 1 + !(arm_arch6 (arm_arch_notm || arm_arch7))) +{ + warning (0, target CPU does not support unaligned accesses); + unaligned_access = 0; +} + if (TARGET_THUMB1 flag_schedule_insns) { /* Don't warn since it's on by default in -O2. */ @@ -22145,6 +22167,10 @@ arm_file_start (void) val = 6; asm_fprintf (asm_out_file, \t.eabi_attribute 30, %d\n, val); + /* Tag_CPU_unaligned_access. */ + asm_fprintf (asm_out_file, \t.eabi_attribute 34, %d\n, + unaligned_access); + /* Tag_ABI_FP_16bit_format. */ if (arm_fp16_format) asm_fprintf (asm_out_file, \t.eabi_attribute 38, %d\n, diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 0f23400..0ea0f7f 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -103,6 +103,10 @@ UNSPEC_SYMBOL_OFFSET ; The offset of the start of the symbol from ; another symbolic address. UNSPEC_MEMORY_BARRIER ; Represent a memory barrier. + UNSPEC_UNALIGNED_LOAD ; Used to represent ldr/ldrh instructions that access + ; unaligned locations, on architectures which support + ; that. + UNSPEC_UNALIGNED_STORE ; Same for str/strh. 
]) ;; UNSPEC_VOLATILE Usage: @@ -2468,10 +2472,10 @@ ;;; this insv pattern, so this pattern needs to be reevalutated. (define_expand insv - [(set (zero_extract:SI (match_operand:SI 0 s_register_operand ) - (match_operand:SI 1 general_operand ) - (match_operand:SI 2 general_operand )) -(match_operand:SI 3 reg_or_int_operand ))] + [(set (zero_extract (match_operand 0 nonimmediate_operand ) + (match_operand 1 general_operand ) + (match_operand 2 general_operand )) +(match_operand 3 reg_or_int_operand ))] TARGET_ARM || arm_arch_thumb2 { @@ -2482,35 +2486,70 @@ if (arm_arch_thumb2) { - bool use_bfi = TRUE; - - if (GET_CODE (operands[3]) == CONST_INT) +if (unaligned_access MEM_P (operands[0]) + s_register_operand (operands[3], GET_MODE (operands[3])) + (width == 16 || width == 32) (start_bit % BITS_PER_UNIT) == 0) { - HOST_WIDE_INT val = INTVAL (operands[3]) mask; +
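
The default-selection logic in the arm_option_override hunk of this patch can be modelled as a small pure function. This is a sketch under stated assumptions: resolve_unaligned_access is a hypothetical name, the boolean parameters stand in for GCC's arm_arch6, arm_arch_notm and arm_arch7 globals, and the value 2 is read as "option not given by the user", which is inferred from the diff rather than stated in it.

```c
#include <assert.h>
#include <stdbool.h>

/* REQUESTED: 0 = disabled, 1 = enabled, 2 = assumed to mean "not set
   by the user".  Returns the effective setting and stores in *WARN
   whether the "target CPU does not support unaligned accesses"
   warning would be issued.  */
static int
resolve_unaligned_access (int requested, bool arch6, bool arch_notm,
                          bool arch7, bool *warn)
{
  /* ARMv6 and later, excluding ARMv6-M (which is neither "notm"
     nor ARMv7).  */
  bool supported = arch6 && (arch_notm || arch7);

  assert (requested >= 0 && requested <= 2);
  *warn = false;

  if (requested == 2)
    return supported ? 1 : 0;

  if (requested == 1 && !supported)
    {
      *warn = true;
      return 0;
    }

  return requested;
}
```

With these inputs, an ARMv7-A target left at the default resolves to 1, an ARMv6-M target resolves to 0, and explicitly requesting unaligned access on a pre-ARMv6 target yields 0 plus the warning.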
Re: [v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h
Hi Paolo,

>> I'm pretty sure this is the case for Solaris. The other changes we've
>> made to support __cplusplus 199711L were no-ops without the last one to
>> change __cplusplus from 1 to the C++ 98 value. So, redefining
>> __cplusplus to 1 should return us back to the old status.
>
> I see, then I think the patch is Ok. Since you are so well positioned to
> test on Solaris machines, I would recommend running the library
> testsuite with -D__cplusplus=1 added to CXXFLAGS, as a final check.

I've just done that on i386-pc-solaris2.11, but had to use -U__cplusplus -D__cplusplus=1 to avoid the redefinition warning. This way, I get only a single regression:

-PASS: abi/header_cxxabi.c (test for excess errors)
+FAIL: abi/header_cxxabi.c (test for excess errors)

FAIL: abi/header_cxxabi.c (test for excess errors)
Excess errors:
/var/gcc/regression/trunk/11-gcc/build/i386-pc-solaris2.11/libstdc++-v3/include/i386-pc-solaris2.11/bits/c++config.h:167:1: error: unknown type name 'namespace'
/var/gcc/regression/trunk/11-gcc/build/i386-pc-solaris2.11/libstdc++-v3/include/i386-pc-solaris2.11/bits/c++config.h:168:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token

which is pretty obvious given that this test is supposed to be compiled as C :-)

I guess the patch is ok now?

Thanks.
        Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote:
> On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com wrote:
>> Hi,
>>
>> Thanks for all the comments. I am attaching a new patch
>> incorporating all of the changes mentioned, mainly :
>>
>> 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
>> it only once.
>
> This is unreliable and you don't need 3 symbols from libgcc. You can use

Do you mean it is unreliable because of the constructor ordering problem?

> static struct cpu_indicator
> {
>   feature
>   model
>   status
> } cpu_indicator;
>
> struct cpu_indicator *
> __get_cpu_indicator ()
> {
>   if cpu_indicator is uninitialized; then
>     initialize cpu_indicator;
>   return &cpu_indicator;
> }
>
> You can simply call __get_cpu_indicator to get a pointer to cpu_indicator;
>
> --
> H.J.
Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam tmsri...@google.com wrote:
> On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote:
>> On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com wrote:
>>> Hi,
>>>
>>> Thanks for all the comments. I am attaching a new patch
>>> incorporating all of the changes mentioned, mainly :
>>>
>>> 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
>>> it only once.
>>
>> This is unreliable and you don't need 3 symbols from libgcc. You can use
>
> Do you mean it is unreliable because of the constructor ordering problem?

You do not have total control when __cpu_indicator_init is called.

Also you shouldn't use bitfield in

struct __processor_model
+{
+  unsigned int __cpu_is_amd : 1;
+  unsigned int __cpu_is_intel : 1;
+  unsigned int __cpu_is_intel_atom : 1;
+  unsigned int __cpu_is_intel_core2 : 1;
+  unsigned int __cpu_is_intel_corei7_nehalem : 1;
+  unsigned int __cpu_is_intel_corei7_westmere : 1;
+  unsigned int __cpu_is_intel_corei7_sandybridge : 1;
+  unsigned int __cpu_is_amdfam10_barcelona : 1;
+  unsigned int __cpu_is_amdfam10_shanghai : 1;
+  unsigned int __cpu_is_amdfam10_istanbul : 1;
+} __cpu_model = {0};
+

A processor can't be both Atom and Core 2.

-- 
H.J.
Re: [v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h
On 26 August 2011 14:09, Paolo Carlini wrote:
> On 8/26/11 2:59 PM, Rainer Orth wrote:
>> Hi Paolo,
>>
>>>> Ok for mainline if bootstraps pass?
>>>
>>> Not a comment strictly about this patch, but why we have things like
>>> #if __cplusplus >= 199711L anywhere? For sure the library is not
>>> supposed to be used together with old C++ front-ends.
>>
>> I thought about this myself, but at least the overloads are only present
>> with __cplusplus >= 199711L.
>
> I don't understand: isn't __cplusplus now *always* >= 199711L? Or you want
> to protect vs the user undefining __cplusplus and then defining it to a
> different value?!? I don't have the Standard at hand (in theory I'm in
> vacation ;), maybe Marc can help, but I don't think it's legal, is it?

[cpp.predefined]/3: If any of the pre-defined macro names in this subclause, or the identifier defined, is the subject of a #define or a #undef preprocessing directive, the behavior is undefined.
Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam tmsri...@google.com wrote: On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote: On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com wrote: Hi, Thanks for all the comments. I am attaching a new patch incorporating all of the changes mentioned, mainly : 1) Make __cpu_indicator_init a constructor in libgcc and guard to call it only once. This is unreliable and you don't need 3 symbols from libgcc. You can use Do you mean it is unreliable because of the constructor ordering problem? You do not have total control when __cpu_indicator_init is called. Like discussed before, for non-ctor functions, which in my opinion is the common use case, it works out great because __cpu_indicator_init is guaranteed to be called and I save doing an extra check. It is only for other ctors where this is a problem. So other ctors call this explicitly. What did I miss? Thanks, -Sri. Also you shouldn't use bitfield in struct __processor_model +{ + unsigned int __cpu_is_amd : 1; + unsigned int __cpu_is_intel : 1; + unsigned int __cpu_is_intel_atom : 1; + unsigned int __cpu_is_intel_core2 : 1; + unsigned int __cpu_is_intel_corei7_nehalem : 1; + unsigned int __cpu_is_intel_corei7_westmere : 1; + unsigned int __cpu_is_intel_corei7_sandybridge : 1; + unsigned int __cpu_is_amdfam10_barcelona : 1; + unsigned int __cpu_is_amdfam10_shanghai : 1; + unsigned int __cpu_is_amdfam10_istanbul : 1; +} __cpu_model = {0}; + A processor can't be both Atom and Core 2. -- H.J.
Re: [v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h
On 26 August 2011 18:13, Jonathan Wakely wrote:
> On 26 August 2011 14:09, Paolo Carlini wrote:
>> On 8/26/11 2:59 PM, Rainer Orth wrote:
>>> Hi Paolo,
>>>
>>>>> Ok for mainline if bootstraps pass?
>>>>
>>>> Not a comment strictly about this patch, but why we have things like
>>>> #if __cplusplus >= 199711L anywhere? For sure the library is not
>>>> supposed to be used together with old C++ front-ends.
>>>
>>> I thought about this myself, but at least the overloads are only present
>>> with __cplusplus >= 199711L.
>>
>> I don't understand: isn't __cplusplus now *always* >= 199711L? Or you want
>> to protect vs the user undefining __cplusplus and then defining it to a
>> different value?!? I don't have the Standard at hand (in theory I'm in
>> vacation ;), maybe Marc can help, but I don't think it's legal, is it?
>
> [cpp.predefined]/3: If any of the pre-defined macro names in this
> subclause, or the identifier defined, is the subject of a #define or a
> #undef preprocessing directive, the behavior is undefined.

More specifically, __cplusplus is ***NOT*** a feature-test macro like _POSIX_SOURCE that can be set by users to request different language standards.

Setting __cplusplus will have no effect on the front-end, but might confuse the library (or other third-party headers) just as using -D__GXX_EXPERIMENTAL_CXX0X__ without -std=gnu++0x will cause big problems, because the front-end will be using -std=c++98 mode but the library will think C++0x support is enabled. Doing this will cause pain.

If there is ***any*** maintenance overhead involved in supporting users who try to redefine __cplusplus then I think it's a mistake. I'm certainly not going to think of the effects on those users when I make changes to the library.
Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam tmsri...@google.com wrote: On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam tmsri...@google.com wrote: On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote: On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com wrote: Hi, Thanks for all the comments. I am attaching a new patch incorporating all of the changes mentioned, mainly : 1) Make __cpu_indicator_init a constructor in libgcc and guard to call it only once. This is unreliable and you don't need 3 symbols from libgcc. You can use Do you mean it is unreliable because of the constructor ordering problem? You do not have total control when __cpu_indicator_init is called. Like discussed before, for non-ctor functions, which in my opinion is the common use case, it works out great because __cpu_indicator_init is guaranteed to be called and I save doing an extra check. It is only for other ctors where this is a problem. So other ctors call this explicitly. What did I miss? I have static void foo ( void ) __attribute__((constructor)); static void foo ( void ) { ... call bar (); ... } in my application. bar () uses those cpu specific functions. foo () is called before __cpu_indicator_init. Since IFUNC returns the cpu specific function address only for the first call, the proper cpu specific functions will never be used. -- H.J.
Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
On Fri, Aug 26, 2011 at 10:24 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam tmsri...@google.com wrote: On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam tmsri...@google.com wrote: On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote: On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com wrote: Hi, Thanks for all the comments. I am attaching a new patch incorporating all of the changes mentioned, mainly : 1) Make __cpu_indicator_init a constructor in libgcc and guard to call it only once. This is unreliable and you don't need 3 symbols from libgcc. You can use Do you mean it is unreliable because of the constructor ordering problem? You do not have total control when __cpu_indicator_init is called. Like discussed before, for non-ctor functions, which in my opinion is the common use case, it works out great because __cpu_indicator_init is guaranteed to be called and I save doing an extra check. It is only for other ctors where this is a problem. So other ctors call this explicitly. What did I miss? I have static void foo ( void ) __attribute__((constructor)); static void foo ( void ) { ... call bar (); ... } in my application. bar () uses those cpu specific functions. foo () is called before __cpu_indicator_init. Since IFUNC returns the cpu specific function address only for the first call, the proper cpu specific functions will never be used. Please correct me if I am wrong since I did not follow the IFUNC part you mentioned. However, it looks like this could be solved with adding an explicit call to __cpu_indicator_init from within the ctor foo. To me, it seems like the pain of adding this call explicitly in other ctors is worth it because it works cleanly for non-ctors. static void foo ( void ) __attribute__((constructor)); static void foo ( void ) { ... __cpu_indicator_init (); call bar (); ... } Will this work? 
Thanks, -Sri. -- H.J.
Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
On Fri, Aug 26, 2011 at 10:37 AM, Sriraman Tallam tmsri...@google.com wrote: On Fri, Aug 26, 2011 at 10:24 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam tmsri...@google.com wrote: On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam tmsri...@google.com wrote: On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote: On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com wrote: Hi, Thanks for all the comments. I am attaching a new patch incorporating all of the changes mentioned, mainly : 1) Make __cpu_indicator_init a constructor in libgcc and guard to call it only once. This is unreliable and you don't need 3 symbols from libgcc. You can use Do you mean it is unreliable because of the constructor ordering problem? You do not have total control when __cpu_indicator_init is called. Like discussed before, for non-ctor functions, which in my opinion is the common use case, it works out great because __cpu_indicator_init is guaranteed to be called and I save doing an extra check. It is only for other ctors where this is a problem. So other ctors call this explicitly. What did I miss? I have static void foo ( void ) __attribute__((constructor)); static void foo ( void ) { ... call bar (); ... } in my application. bar () uses those cpu specific functions. foo () is called before __cpu_indicator_init. Since IFUNC returns the cpu specific function address only for the first call, the proper cpu specific functions will never be used. Please correct me if I am wrong since I did not follow the IFUNC part you mentioned. However, it looks like this could be solved with adding an explicit call to __cpu_indicator_init from within the ctor foo. To me, it seems like the pain of adding this call explicitly in other ctors is worth it because it works cleanly for non-ctors. 
static void foo ( void ) __attribute__((constructor)); static void foo ( void ) { ... __cpu_indicator_init (); call bar (); ... } Will this work? Do I have to do that in every constructor, including C++ global constructors? It is ridiculous. -- H.J.
Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
Is there a standard way to force this init function to be called before all ctors? Adding a ctor in one crtx.o ? David On Fri, Aug 26, 2011 at 10:45 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 10:37 AM, Sriraman Tallam tmsri...@google.com wrote: On Fri, Aug 26, 2011 at 10:24 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam tmsri...@google.com wrote: On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam tmsri...@google.com wrote: On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote: On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com wrote: Hi, Thanks for all the comments. I am attaching a new patch incorporating all of the changes mentioned, mainly : 1) Make __cpu_indicator_init a constructor in libgcc and guard to call it only once. This is unreliable and you don't need 3 symbols from libgcc. You can use Do you mean it is unreliable because of the constructor ordering problem? You do not have total control when __cpu_indicator_init is called. Like discussed before, for non-ctor functions, which in my opinion is the common use case, it works out great because __cpu_indicator_init is guaranteed to be called and I save doing an extra check. It is only for other ctors where this is a problem. So other ctors call this explicitly. What did I miss? I have static void foo ( void ) __attribute__((constructor)); static void foo ( void ) { ... call bar (); ... } in my application. bar () uses those cpu specific functions. foo () is called before __cpu_indicator_init. Since IFUNC returns the cpu specific function address only for the first call, the proper cpu specific functions will never be used. Please correct me if I am wrong since I did not follow the IFUNC part you mentioned. However, it looks like this could be solved with adding an explicit call to __cpu_indicator_init from within the ctor foo. 
To me, it seems like the pain of adding this call explicitly in other ctors is worth it because it works cleanly for non-ctors. static void foo ( void ) __attribute__((constructor)); static void foo ( void ) { ... __cpu_indicator_init (); call bar (); ... } Will this work? Do I have to do that in every constructor, including C++ global constructors? It is ridiculous. -- H.J.
Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
On Fri, Aug 26, 2011 at 10:45 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 10:37 AM, Sriraman Tallam tmsri...@google.com wrote: On Fri, Aug 26, 2011 at 10:24 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam tmsri...@google.com wrote: On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam tmsri...@google.com wrote: On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote: On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com wrote: Hi, Thanks for all the comments. I am attaching a new patch incorporating all of the changes mentioned, mainly : 1) Make __cpu_indicator_init a constructor in libgcc and guard to call it only once. This is unreliable and you don't need 3 symbols from libgcc. You can use Do you mean it is unreliable because of the constructor ordering problem? You do not have total control when __cpu_indicator_init is called. Like discussed before, for non-ctor functions, which in my opinion is the common use case, it works out great because __cpu_indicator_init is guaranteed to be called and I save doing an extra check. It is only for other ctors where this is a problem. So other ctors call this explicitly. What did I miss? I have static void foo ( void ) __attribute__((constructor)); static void foo ( void ) { ... call bar (); ... } in my application. bar () uses those cpu specific functions. foo () is called before __cpu_indicator_init. Since IFUNC returns the cpu specific function address only for the first call, the proper cpu specific functions will never be used. Please correct me if I am wrong since I did not follow the IFUNC part you mentioned. However, it looks like this could be solved with adding an explicit call to __cpu_indicator_init from within the ctor foo. To me, it seems like the pain of adding this call explicitly in other ctors is worth it because it works cleanly for non-ctors. 
static void foo ( void ) __attribute__((constructor)); static void foo ( void ) { ... __cpu_indicator_init (); call bar (); ... } Will this work? Do I have to do that in every constructor, including C++ global constructors? It is ridiculous.

It seems like libgcc is on the link line after user code in the command-line and so __cpu_indicator_init should fire first, both when statically and dynamically linked. Example:

foo.cc: int __attribute__ ((constructor)) foo () { return 0; }

However, with something like this:

g++ -Wl,--u,__cpu_indicator_init -lgcc foo.cc

foo gets called ahead of __cpu_indicator_init. For these abnormal link usages, call it explicitly. So, can you please give me a common use case where __cpu_indicator_init will get called after a constructor? Thanks, -Sri. -- H.J.
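The ordering hazard discussed in this thread can be sketched in a few lines of C. This is only an illustration, not code from the patch: `my_cpu_indicator_init`, `cpu_info_ready`, `cpu_init_calls` and `lib_ctor` are hypothetical stand-ins for the libgcc routine and its data. The guard makes an explicit call from another constructor harmless, whichever constructor happens to run first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the CPU data that libgcc would fill in.  */
static bool cpu_info_ready = false;
static int cpu_init_calls = 0;

/* Guarded initializer: only the first call does any work, so it can be
   invoked from any number of constructors.  */
static void
my_cpu_indicator_init (void)
{
  if (cpu_info_ready)
    return;
  cpu_init_calls++;		/* Real code would query the CPU here.  */
  cpu_info_ready = true;
}

/* The library's own constructor.  Its order relative to foo is not
   something the application controls.  */
static void lib_ctor (void) __attribute__ ((constructor));
static void
lib_ctor (void)
{
  my_cpu_indicator_init ();
}

static bool foo_saw_ready = false;

/* An application constructor that needs the CPU data: it calls the
   initializer explicitly instead of relying on constructor order.  */
static void foo (void) __attribute__ ((constructor));
static void
foo (void)
{
  my_cpu_indicator_init ();
  assert (cpu_info_ready);
  foo_saw_ready = cpu_info_ready;
}
```

The cost H.J. objects to is visible here: every constructor that may touch the CPU data has to repeat the explicit call.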
Re: [C++0x] contiguous bitfields race implementation
This is a slight update from the last revision, with your issues addressed as I explained in the last email. However, everything turned out to be much trickier than I expected (variable length offsets with arrays, bit fields spanning multiple words, surprising padding gymnastics by GCC, etc.). It turns out that what we need is to know the precise bit region size at all times, and adjust it as we rearrange and cut things into pieces throughout the RTL bit field machinery.

I enabled the C++ memory model, and forced a bootstrap and regression test with it. This brought about many interesting cases, which I was able to distill and add to the testsuite. Of particular interest were the struct-layout-1.exp tests. Since many of the tests set a global bit field, only to later check it against a local variable containing the same value, it is the perfect stressor because, while globals are restricted under the memory model, locals are not. So we can check that we can interoperate with the less restrictive model, and that the patch does not introduce ABI inconsistencies. After much grief, we are now passing all the struct-layout-1.exp tests. Eventually, I'd like to force the struct-layout-1.exp tests to run for --param allow-store-data-races=0 as well. Unfortunately, this will increase testing time.

I have (unfortunately) introduced an additional call to get_inner_reference(), but only for the field itself (one time). I can't remember the details, but it was something to the effect of the bit position + padding being impossible to calculate in one variable array reference case. I can dig up the case if you'd like.

I am currently tackling a reload miscompilation failure while building a 32-bit library. I am secretly hoping your review will uncover the flaw without me having to pick this up. Otherwise, this is a much more comprehensive approach than what is currently in mainline, and we now pass all the bitfield tests the GCC testsuite could throw at it. Fire away.
* machmode.h (get_best_mode): Remove 2 arguments. * fold-const.c (optimize_bit_field_compare): Same. (fold_truthop): Same. * expr.c (store_field): Change argument types in prototype. (emit_group_store): Change argument types to store_bit_field call. (copy_blkmode_from_reg): Same. (write_complex_part): Same. (optimize_bitfield_assignment_op): Change argument types. Change arguments to get_best_mode. (get_bit_range): Rewrite. (expand_assignment): Adjust new call to get_bit_range. Adjust bitregion_offset when to_rtx is changed. Adjust calls to store_field with new argument types. (store_field): New argument types. Adjust calls to store_bit_field with new arguments. * expr.h (store_bit_field): Change argument types. * stor-layout.c (get_best_mode): Remove use of bitregion* arguments. * expmed.c (store_bit_field_1): Change argument types. Do not calculate maxbits. Adjust bitregion_maxbits if offset changes. (store_bit_field): Change argument types. Adjust address taking into account bitregion_offset. (store_fixed_bit_field): Change argument types. Do not calculate maxbits. (store_split_bit_field): Change argument types. (extract_bit_field_1): Adjust arguments to get_best_mode. (extract_fixed_bit_field): Same. Index: machmode.h === --- machmode.h (revision 176891) +++ machmode.h (working copy) @@ -249,8 +249,6 @@ extern enum machine_mode mode_for_vector /* Find the best mode to use to access a bit field. */ extern enum machine_mode get_best_mode (int, int, - unsigned HOST_WIDE_INT, - unsigned HOST_WIDE_INT, unsigned int, enum machine_mode, int); Index: fold-const.c === --- fold-const.c(revision 176891) +++ fold-const.c(working copy) @@ -3394,7 +3394,7 @@ optimize_bit_field_compare (location_t l flag_strict_volatile_bitfields 0) nmode = lmode; else -nmode = get_best_mode (lbitsize, lbitpos, 0, 0, +nmode = get_best_mode (lbitsize, lbitpos, const_p ? 
TYPE_ALIGN (TREE_TYPE (linner)) : MIN (TYPE_ALIGN (TREE_TYPE (linner)), TYPE_ALIGN (TREE_TYPE (rinner))), @@ -5221,7 +5221,7 @@ fold_truthop (location_t loc, enum tree_ to be relative to a field of that size. */ first_bit = MIN (ll_bitpos, rl_bitpos); end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize); - lnmode = get_best_mode (end_bit - first_bit, first_bit, 0, 0, + lnmode = get_best_mode (end_bit - first_bit, first_bit,
Re: [PATCH] Handle MEM_REF in decode_addr_const
On Fri, Aug 26, 2011 at 5:53 AM, Richard Guenther rguent...@suse.de wrote: Another missed piece, exposed by less MEM_REF - ARRAY_REF folding. Interestingly only for Ada testcases. I think this also fixed http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50116 but I don't know for sure. Thanks, Andrew Pinski
[lra] patch to fix ppc32 code size degradation and a small clean up
LRA on ppc32 had some code size degradation in comparison with the reload pass. The reason is the systematic use of memory-to-memory moves through two integer registers for DFmode, instead of the one floating point register that reload uses. The following patch solves the problem. It does so by preferring the insn alternative with the smallest number of registers involved when higher-priority rules (such as the number of needed reloads) give the same result.

I wish I could also use register pressure information for choosing an alternative, but unfortunately that would make LRA slower because the info is not available at this subpass (constraints). Another wish would be to use insn length, but again that needs (a temporary) transformation to the final insn, which is not known yet at this stage because we have not yet assigned hard registers to reload pseudos or memory to spilled pseudos.

The patch also contains a clean up of function mark_not_eliminable. The patch was bootstrapped on x86-64 and ppc64.

2011-08-26 Vladimir Makarov vmaka...@redhat.com
* lra-constraints.c (best_reload_nregs): New variable. (process_alt_operands): Add preferences for smaller hard registers involved. Increase reject for all failed non registers.
* lra-eliminations.c (mark_not_eliminable): Add check on hard register before looping on eliminations.

Index: lra-constraints.c
===
--- lra-constraints.c (revision 178120)
+++ lra-constraints.c (working copy)
@@ -1143,6 +1143,10 @@ static int best_losers, best_overall;
 /* Number of small register classes used for operands of the best
    alternative.  */
 static int best_small_class_operands_num;
+/* Overall number hard registers used for reloads.  For example, on
+   some targets we need 2 general registers to reload DFmode and only
+   one floating point register.  */
+static int best_reload_nregs;
 /* Overall number reflecting distances of previous reloading the same
    value.  It is used to improve inheritance chances.
*/ static int best_reload_sum; @@ -1415,7 +1419,7 @@ process_alt_operands (int only_alternati rtx no_subreg_operand[MAX_RECOG_OPERANDS], operand_reg[MAX_RECOG_OPERANDS]; int hard_regno[MAX_RECOG_OPERANDS]; enum machine_mode biggest_mode[MAX_RECOG_OPERANDS]; - int reload_sum; + int reload_nregs, reload_sum; /* Calculate some data common for all alternatives to speed up the function. */ @@ -1460,7 +1464,7 @@ process_alt_operands (int only_alternati (only_alternative = 0 nalt != only_alternative)) continue; - overall = losers = reject = reload_sum = 0; + overall = losers = reject = reload_nregs = reload_sum = 0; for (nop = 0; nop n_operands; nop++) reject += (curr_static_id -operand_alternative[nalt * n_operands + nop].reject); @@ -2003,7 +2007,7 @@ process_alt_operands (int only_alternati /* Input reloads can be inherited more often than output reloads can be removed, so penalize output reloads. */ - if (curr_static_id-operand[nop].type != OP_IN) + if (!REG_P (op) || curr_static_id-operand[nop].type != OP_IN) reject++; /* SUBREGS ??? */ if (this_alternative_matches = 0) @@ -2012,6 +2016,9 @@ process_alt_operands (int only_alternati } else if (no_regs_p ! this_alternative_offmemok ! constmemok) goto fail; + + if (! 
no_regs_p) + reload_nregs += ira_reg_class_max_nregs[this_alternative][mode]; } if (early_clobber_p) @@ -2128,7 +2135,9 @@ process_alt_operands (int only_alternati best_small_class_operands_num || (small_class_operands_num == best_small_class_operands_num - best_reload_sum reload_sum)) + (reload_nregs best_reload_nregs + || (reload_nregs == best_reload_nregs + best_reload_sum reload_sum { for (nop = 0; nop n_operands; nop++) { @@ -2145,6 +2154,7 @@ process_alt_operands (int only_alternati best_overall = overall; best_losers = losers; best_small_class_operands_num = small_class_operands_num; + best_reload_nregs = reload_nregs; best_reload_sum = reload_sum; goal_alt_number = nalt; } Index: lra-eliminations.c === --- lra-eliminations.c (revision 178120) +++ lra-eliminations.c (working copy) @@ -671,49 +671,46 @@ mark_not_eliminable (rtx x) case POST_DEC: case POST_MODIFY: case PRE_MODIFY: - /* If we modify the source of an
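The tie-breaking idea described in this message can be sketched as a lexicographic comparison. This is a simplified illustration, not the logic of process_alt_operands: the real function compares several more criteria (losers, small register classes, reload_sum), which are collapsed here into a single hypothetical `overall` cost, and `struct alt_cost`, `alt_is_better` and `pick_alternative` are names invented for the example.

```c
#include <assert.h>

/* Hypothetical, reduced view of the per-alternative costs.  */
struct alt_cost
{
  int overall;		/* Higher-priority cost; lower is better.  */
  int reload_nregs;	/* Hard registers needed for the reloads.  */
};

/* Return nonzero if CAND should replace BEST.  The register count is
   consulted only when the higher-priority cost is tied.  */
static int
alt_is_better (const struct alt_cost *cand, const struct alt_cost *best)
{
  if (cand->overall != best->overall)
    return cand->overall < best->overall;
  return cand->reload_nregs < best->reload_nregs;
}

/* Return the index of the preferred alternative among N candidates.  */
int
pick_alternative (const struct alt_cost *alts, int n)
{
  int best = 0;

  assert (n > 0);
  for (int i = 1; i < n; i++)
    if (alt_is_better (&alts[i], &alts[best]))
      best = i;
  return best;
}
```

With two alternatives of equal `overall` cost, one needing two general registers and one needing a single floating point register, the second is chosen, which is the DFmode case the message describes.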
[PATCH, i386]: Vectorize round insn
Hello! Attached patch enables vectorization of round function using sse4.1 round insn. AZ stands for Away from Zero. 2011-08-26 Uros Bizjak ubiz...@gmail.com * config/i386/sse.md (roundmode2): New expander. * config/i386/i386.c (enum ix86_builtins): Add IX86_BUILTIN_ROUND{PS,PD}_AZ{,256}. (struct builtin_description): Add __builtin_ia32_round{ps,pd}_az{,256} descriptions. (ix86_builtin_vectorized_function): Handle BUILT_IN_ROUND{,F} builtins. testsuite/ChangeLog: 2011-08-26 Uros Bizjak ubiz...@gmail.com * gcc.target/i386/sse_4_1-round-vec.c: New test. * gcc.target/i386/sse_4_1-roundf-vec.c: New test. * gcc.target/i386/avx-round-vec.c: New test. * gcc.target/i386/avx-roundf-vec.c: New test. Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline. Uros. Index: config/i386/sse.md === --- config/i386/sse.md (revision 178119) +++ config/i386/sse.md (working copy) @@ -9646,6 +9646,40 @@ (set_attr prefix orig,vex) (set_attr mode MODE)]) +(define_expand roundmode2 + [(set (match_dup 4) + (plus:VF + (match_operand:VF 1 nonimmediate_operand ) + (match_dup 3))) + (set (match_operand:VF 0 register_operand ) + (unspec:VF + [(match_dup 4) (match_dup 5)] + UNSPEC_ROUND))] + TARGET_ROUND !flag_trapping_math +{ + enum machine_mode scalar_mode; + const struct real_format *fmt; + REAL_VALUE_TYPE pred_half, half_minus_pred_half; + rtx half, vec_half; + + scalar_mode = GET_MODE_INNER (MODEmode); + + /* load nextafter (0.5, 0.0) */ + fmt = REAL_MODE_FORMAT (scalar_mode); + real_2expN (half_minus_pred_half, -(fmt-p) - 1, scalar_mode); + REAL_ARITHMETIC (pred_half, MINUS_EXPR, dconsthalf, half_minus_pred_half); + half = const_double_from_real_value (pred_half, scalar_mode); + + vec_half = ix86_build_const_vector (MODEmode, true, half); + vec_half = force_reg (MODEmode, vec_half); + + operands[3] = gen_reg_rtx (MODEmode); + emit_insn (gen_copysignmode3 (operands[3], vec_half, operands[1])); + + operands[4] = gen_reg_rtx (MODEmode); + operands[5] = GEN_INT 
(ROUND_TRUNC); +}) + ; ;; ;; Intel SSE4.2 string/text processing instructions Index: config/i386/i386.c === --- config/i386/i386.c (revision 178119) +++ config/i386/i386.c (working copy) @@ -23661,10 +23661,12 @@ enum ix86_builtins IX86_BUILTIN_CEILPD, IX86_BUILTIN_TRUNCPD, IX86_BUILTIN_RINTPD, + IX86_BUILTIN_ROUNDPD_AZ, IX86_BUILTIN_FLOORPS, IX86_BUILTIN_CEILPS, IX86_BUILTIN_TRUNCPS, IX86_BUILTIN_RINTPS, + IX86_BUILTIN_ROUNDPS_AZ, IX86_BUILTIN_PTESTZ, IX86_BUILTIN_PTESTC, @@ -23837,10 +23839,12 @@ enum ix86_builtins IX86_BUILTIN_CEILPD256, IX86_BUILTIN_TRUNCPD256, IX86_BUILTIN_RINTPD256, + IX86_BUILTIN_ROUNDPD_AZ256, IX86_BUILTIN_FLOORPS256, IX86_BUILTIN_CEILPS256, IX86_BUILTIN_TRUNCPS256, IX86_BUILTIN_RINTPS256, + IX86_BUILTIN_ROUNDPS_AZ256, IX86_BUILTIN_UNPCKHPD256, IX86_BUILTIN_UNPCKLPD256, @@ -25063,11 +25067,15 @@ static const struct builtin_description bdesc_args { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundpd, __builtin_ia32_truncpd, IX86_BUILTIN_TRUNCPD, (enum rtx_code) ROUND_TRUNC, (int) V2DF_FTYPE_V2DF_ROUND }, { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundpd, __builtin_ia32_rintpd, IX86_BUILTIN_RINTPD, (enum rtx_code) ROUND_MXCSR, (int) V2DF_FTYPE_V2DF_ROUND }, + { OPTION_MASK_ISA_ROUND, CODE_FOR_roundv2df2, __builtin_ia32_roundpd_az, IX86_BUILTIN_ROUNDPD_AZ, UNKNOWN, (int) V2DF_FTYPE_V2DF }, + { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundps, __builtin_ia32_floorps, IX86_BUILTIN_FLOORPS, (enum rtx_code) ROUND_FLOOR, (int) V4SF_FTYPE_V4SF_ROUND }, { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundps, __builtin_ia32_ceilps, IX86_BUILTIN_CEILPS, (enum rtx_code) ROUND_CEIL, (int) V4SF_FTYPE_V4SF_ROUND }, { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundps, __builtin_ia32_truncps, IX86_BUILTIN_TRUNCPS, (enum rtx_code) ROUND_TRUNC, (int) V4SF_FTYPE_V4SF_ROUND }, { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundps, __builtin_ia32_rintps, IX86_BUILTIN_RINTPS, (enum rtx_code) ROUND_MXCSR, (int) V4SF_FTYPE_V4SF_ROUND }, + { OPTION_MASK_ISA_ROUND, CODE_FOR_roundv4sf2, 
__builtin_ia32_roundps_az, IX86_BUILTIN_ROUNDPS_AZ, UNKNOWN, (int) V4SF_FTYPE_V4SF }, + { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_ptest, __builtin_ia32_ptestz128, IX86_BUILTIN_PTESTZ, EQ, (int) INT_FTYPE_V2DI_V2DI_PTEST }, { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_ptest, __builtin_ia32_ptestc128, IX86_BUILTIN_PTESTC, LTU, (int) INT_FTYPE_V2DI_V2DI_PTEST }, { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_ptest, __builtin_ia32_ptestnzc128, IX86_BUILTIN_PTESTNZC, GTU, (int) INT_FTYPE_V2DI_V2DI_PTEST }, @@ -25185,11 +25193,15 @@
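The arithmetic behind the new expander (copy the sign of the input onto nextafter (0.5, 0.0), add, then truncate) can be tried on scalars. The sketch below is an assumption-laden illustration rather than the vector code: `round_away_from_zero` is a hypothetical name, truncation is done with an integer cast that is only valid for moderately sized inputs, and a negative zero input comes back as positive zero.

```c
#include <assert.h>

/* Round X to the nearest integer, halfway cases away from zero, by
   adding the largest double below 0.5 (with the sign of X) and then
   truncating toward zero.  */
double
round_away_from_zero (double x)
{
  /* nextafter (0.5, 0.0) written as a C99 hex literal: 0.5 - 2^-54.  */
  const double pred_half = 0x1.fffffffffffffp-2;
  double biased;

  /* The integer cast below is only safe for moderately sized inputs.  */
  assert (x > -0x1p62 && x < 0x1p62);

  biased = x < 0.0 ? x - pred_half : x + pred_half;
  return (double) (long long) biased;	/* Truncate toward zero.  */
}
```

Using 0.5 itself as the addend would be wrong for the largest double below 0.5: the sum rounds up to exactly 1.0, so truncation would return 1 instead of 0. With the predecessor of 0.5 the sum stays just below 1.0.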
Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)
IFUNC selector will need to call get_cpu_indicator (as proposed by HJ or something similar), while in other contexts, the implementation should find a way to make sure the indicator is already initialized such that the builtins accessing the features can be directly used (See also Michael and Richard's previous comments). The runtime penalty is much smaller. david On Fri, Aug 26, 2011 at 10:37 AM, Sriraman Tallam tmsri...@google.com wrote: On Fri, Aug 26, 2011 at 10:24 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam tmsri...@google.com wrote: On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu hjl.to...@gmail.com wrote: On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam tmsri...@google.com wrote: On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote: On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com wrote: Hi, Thanks for all the comments. I am attaching a new patch incorporating all of the changes mentioned, mainly : 1) Make __cpu_indicator_init a constructor in libgcc and guard to call it only once. This is unreliable and you don't need 3 symbols from libgcc. You can use Do you mean it is unreliable because of the constructor ordering problem? You do not have total control when __cpu_indicator_init is called. Like discussed before, for non-ctor functions, which in my opinion is the common use case, it works out great because __cpu_indicator_init is guaranteed to be called and I save doing an extra check. It is only for other ctors where this is a problem. So other ctors call this explicitly. What did I miss? I have static void foo ( void ) __attribute__((constructor)); static void foo ( void ) { ... call bar (); ... } in my application. bar () uses those cpu specific functions. foo () is called before __cpu_indicator_init. Since IFUNC returns the cpu specific function address only for the first call, the proper cpu specific functions will never be used. 
Please correct me if I am wrong since I did not follow the IFUNC part you mentioned. However, it looks like this could be solved with adding an explicit call to __cpu_indicator_init from within the ctor foo. To me, it seems like the pain of adding this call explicitly in other ctors is worth it because it works cleanly for non-ctors. static void foo ( void ) __attribute__((constructor)); static void foo ( void ) { ... __cpu_indicator_init (); call bar (); ... } Will this work? Thanks, -Sri. -- H.J.
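David's suggestion, an accessor that initializes on demand and is called by the selector, can be sketched in portable C. This does not use the real IFUNC mechanism: a function pointer that starts out pointing at a resolver stands in for it, and `get_cpu_indicator`, `struct cpu_indicator`, `detect_fast_path` and the `sum_*` functions are hypothetical names chosen for the example. The sketch is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU description; a real one would hold CPUID results.  */
struct cpu_indicator
{
  int initialized;
  int has_fast_path;
};

static struct cpu_indicator indicator;

/* Placeholder for real feature detection (assumption: always "no").  */
static int
detect_fast_path (void)
{
  return 0;
}

/* Accessor that initializes on first use, so it is safe to call from a
   selector that runs before any constructor.  */
static const struct cpu_indicator *
get_cpu_indicator (void)
{
  if (!indicator.initialized)
    {
      indicator.has_fast_path = detect_fast_path ();
      indicator.initialized = 1;
    }
  return &indicator;
}

static long
sum_generic (const int *v, size_t n)
{
  long s = 0;
  for (size_t i = 0; i < n; i++)
    s += v[i];
  return s;
}

/* Stand-in for a CPU specific version; same result as sum_generic.  */
static long
sum_fast (const int *v, size_t n)
{
  return sum_generic (v, n);
}

static long sum_resolve (const int *v, size_t n);

/* Starts at the resolver and is rebound on the first call.  */
static long (*sum_impl) (const int *, size_t) = sum_resolve;

static long
sum_resolve (const int *v, size_t n)
{
  sum_impl = get_cpu_indicator ()->has_fast_path ? sum_fast : sum_generic;
  assert (sum_impl != sum_resolve);
  return sum_impl (v, n);
}

long
sum (const int *v, size_t n)
{
  return sum_impl (v, n);
}
```

Because the selector asks the accessor rather than reading a variable filled in by a constructor, the choice is correct even when the first call to `sum` comes from a constructor.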
Re: [PATCH] Handle MEM_REF in decode_addr_const
On Fri, Aug 26, 2011 at 9:02 PM, Andrew Pinski pins...@gmail.com wrote: On Fri, Aug 26, 2011 at 5:53 AM, Richard Guenther rguent...@suse.de wrote: Another missed piece, exposed by less MEM_REF - ARRAY_REF folding. Interestingly only for Ada testcases. I think this also fixed http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50116 but I don't know for sure. Yes, that's exactly the ICEs I got. I'll backport the fix. Richard. Thanks, Andrew Pinski
[PATCH, i386]: Rewrite ix86_build_const_vector
Hello! No functional change. 2011-08-26 Uros Bizjak ubiz...@gmail.com * config/i386/i386.c (ix86_build_const_vector): Rewrite using loop with RTVEC_ELT accessor. Tested on x86_64-pc-linux-gnu, committed to mainline. Uros. Index: config/i386/i386.c === --- config/i386/i386.c (revision 178123) +++ config/i386/i386.c (working copy) @@ -16512,53 +16512,30 @@ ix86_expand_convert_uns_sisf_sse (rtx target, rtx rtx ix86_build_const_vector (enum machine_mode mode, bool vect, rtx value) { + int i, n_elt; rtvec v; + enum machine_mode scalar_mode; + switch (mode) { case V4SImode: - gcc_assert (vect); - v = gen_rtvec (4, value, value, value, value); - return gen_rtx_CONST_VECTOR (V4SImode, v); - case V2DImode: gcc_assert (vect); - v = gen_rtvec (2, value, value); - return gen_rtx_CONST_VECTOR (V2DImode, v); - case V8SFmode: - if (vect) - v = gen_rtvec (8, value, value, value, value, - value, value, value, value); - else - v = gen_rtvec (8, value, CONST0_RTX (SFmode), - CONST0_RTX (SFmode), CONST0_RTX (SFmode), - CONST0_RTX (SFmode), CONST0_RTX (SFmode), - CONST0_RTX (SFmode), CONST0_RTX (SFmode)); - return gen_rtx_CONST_VECTOR (V8SFmode, v); - case V4SFmode: - if (vect) - v = gen_rtvec (4, value, value, value, value); - else - v = gen_rtvec (4, value, CONST0_RTX (SFmode), - CONST0_RTX (SFmode), CONST0_RTX (SFmode)); - return gen_rtx_CONST_VECTOR (V4SFmode, v); - case V4DFmode: - if (vect) - v = gen_rtvec (4, value, value, value, value); - else - v = gen_rtvec (4, value, CONST0_RTX (DFmode), - CONST0_RTX (DFmode), CONST0_RTX (DFmode)); - return gen_rtx_CONST_VECTOR (V4DFmode, v); - case V2DFmode: - if (vect) - v = gen_rtvec (2, value, value); - else - v = gen_rtvec (2, value, CONST0_RTX (DFmode)); - return gen_rtx_CONST_VECTOR (V2DFmode, v); + n_elt = GET_MODE_NUNITS (mode); + v = rtvec_alloc (n_elt); + scalar_mode = GET_MODE_INNER (mode); + RTVEC_ELT (v, 0) = value; + + for (i = 1; i n_elt; ++i) + RTVEC_ELT (v, i) = vect ? 
value : CONST0_RTX (scalar_mode); + + return gen_rtx_CONST_VECTOR (mode, v); + default: gcc_unreachable (); }
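The shape of the rewritten function (element 0 always holds the value, the remaining elements hold either the value or zero) can be shown with a plain array in place of an rtvec. This is a stand-alone sketch: `build_const_vector` and its `double` element type are assumptions for illustration and do not use any GCC internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fill OUT[0..N_ELT-1]: element 0 is VALUE; the others are VALUE when
   VECT is true and zero otherwise.  */
void
build_const_vector (double *out, size_t n_elt, bool vect, double value)
{
  assert (n_elt >= 1);
  out[0] = value;

  for (size_t i = 1; i < n_elt; ++i)
    out[i] = vect ? value : 0.0;
}
```

One loop replaces the per-mode gen_rtvec calls because the element count and the zero constant are derived from the mode instead of being spelled out for each case.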
Re: [PATCH] Add infrastructure to merge standard builtin enums with backend builtins
On Aug 26, 2011, at 7:19 AM, Michael Meissner wrote: The alternative is something like what Kenney and Mike are doing in their private port, where they have new syntax in the MD file for builtins. But are those user-exposed builtins? Certainly interesting to combine builtin definition and the instruction it expands to. Yes, these are user exposed builtins. Massive amounts of user exposed builtins (Mike said he needs 13 bits for the builtin index). I think it would be better if Mike comments on this.

I gave the quick intro yesterday. You wind up specifying the built-ins that you have, and the generator does things like assign enum values, create a file that makes the builtins appear in the user name space from the __builtin_ namespace, and generate compilation test cases for all the built-ins with all the different types they support. It also generates executable testcases to ensure everything works flawlessly. We have mods to the overload builtin mechanism so that one can do things like:

template <class T> T foo(T x, T y) { x = add(x, y); return x; }

Or, if you prefer the C version:

int fooi(int x, int y) { return add(x, y); }
short foos(short x, short y) { return add(x, y); }

and have it work out just fine when T is instantiated with all the various types that are supported by the hardware, and it works in C. This permits a nice api for the machine builtins, as you don't have to mangle in types into the builtin-name.

The system is complete enough to handle the needs of anything coming down the pike in the next decade. It can handle input/output parameters that have register assignments. It can handle reference parameters (like the input/output parameters, but these are done as values in memory). The generator builds up _all_ the types one needs, handles all the registration and all the wiring up for codegen.
There is a mechanism to remap arguments going to the rtl generators, so the operand ordering of the builtin doesn't have to match the operand ordering of the md pattern for the semantics that back the builtin. There is a beefy macro system built into the generator so that you can have nice simple patterns and it is beefier than the iterators one can use today. So, for example, we have:

(define_special_iterator imath3 [add sub mul])

to define some built-ins that are regular with respect to the operation, but, this isn't a code nor mode iterator, it just iterates the pattern with the string substituted. For machines with any regularity, the patterns wind up being smaller and easier to maintain. I'd be happy to answer questions about it.
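For readers without access to that port, the user-visible effect of a type-overloaded `add` in C can be approximated with standard C11 `_Generic`. This is only an analogy under stated assumptions: it is not the overload mechanism Mike describes, and `add_int`, `add_short` and `add_self_check` are hypothetical names standing in for machine builtins.

```c
#include <assert.h>

/* Hypothetical per-type implementations standing in for builtins.  */
static int
add_int (int x, int y)
{
  return x + y;
}

static short
add_short (short x, short y)
{
  return (short) (x + y);
}

/* Select the implementation from the type of the first argument, so the
   type does not have to be mangled into the name at the call site.  */
#define add(x, y) _Generic ((x), short: add_short, int: add_int) ((x), (y))

int
fooi (int x, int y)
{
  return add (x, y);
}

short
foos (short x, short y)
{
  return add (x, y);
}

/* Small usage check for both instantiations.  */
void
add_self_check (void)
{
  assert (fooi (2, 3) == 5);
  assert (foos ((short) 2, (short) 3) == 5);
}
```

The call sites read the same for every supported type, which is the API property the message is after.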
Re: PING: [PATCH]: Fix -fbranch-probabilities
Hello, Could I have a review for the trivial patch posted in http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01123.html? -fprofile-use sets flag_branch_probabilities. But we should also be able to use -fbranch-probabilities on its own using the information generated by -fprofile-arcs, as documented.

OK, thanks! I was under the impression that some of the gcov tests still use the -fprofile-arcs -fbranch-probabilities pair. That doesn't seem to be the case, so if you add a testcase, you get extra score ;) Honza

Many thanks Christian
Re: Vector Comparison patch
Hi

Here is a patch with vector comparison only. Comparison is expanded using VEC_COND_EXPR, conversions between the different types inside the VEC_COND_EXPR are happening in optabs.c. The comparison generally works, however, the x86 backend does not recognize vectors of all 1s of type float and double, which is very bad, but I hope it could be fixed easily. Here is my humble attempt:

Index: gcc/config/i386/predicates.md
===
--- gcc/config/i386/predicates.md (revision 177665)
+++ gcc/config/i386/predicates.md (working copy)
@@ -763,7 +763,19 @@ (define_predicate vector_all_ones_opera
       for (i = 0; i < nunits; ++i)
         {
           rtx x = CONST_VECTOR_ELT (op, i);
-          if (x != constm1_rtx)
+          rtx y;
+
+          if (GET_MODE_CLASS (GET_MODE (x)) == MODE_FLOAT)
+            {
+              REAL_VALUE_TYPE r;
+              REAL_VALUE_FROM_INT (r, -1, -1, GET_MODE (x));
+              y = CONST_DOUBLE_FROM_REAL_VALUE (r, GET_MODE (x));
+            }
+          else
+            y = constm1_rtx;
+
+          /* if (x != constm1_rtx) */
+          if (!rtx_equal_p (x, y))
             return false;
         }
       return true;

But the problem I have here is that -1 actually converts to -1.0, where I need to treat -0x1 as float. Something like:

int p = -1; void *x = &p; float r = *((float *)x);

Is there any way to do that in this context? Or maybe there is another way to support real-typed vectors of -1 as constants?

ChangeLog

2011-08-27 Artjoms Sinkarovs artyom.shinkar...@gmail.com

gcc/
* optabs.c (vector_compare_rtx): Allow comparison operands and vcond operands have different type. (expand_vec_cond_expr): Convert operands in case they do not match.
* fold-const.c (constant_boolean_node): Adjust the meaning of boolean for vector types: true = {-1,..}, false = {0,..}. (fold_unary_loc): Avoid conversion of vector comparison to boolean type.
* expr.c (expand_expr_real_2): Expand vector comparison by building an appropriate VEC_COND_EXPR.
* c-typeck.c (build_binary_op): Typecheck vector comparisons. (c_objc_common_truthvalue_conversion): Adjust.
* gimplify.c (gimplify_expr): Support vector comparison in gimple.
* tree.def: Adjust comment.
* tree-vect-generic.c (do_compare): Helper function. (expand_vector_comparison): Check if hardware supports vector comparison of the given type or expand vector piecewise. (expand_vector_operation): Treat comparison as binary operation of vector type. (expand_vector_operations_1): Adjust.
* tree-cfg.c (verify_gimple_comparison): Adjust.

gcc/config/i386
* i386.c (ix86_expand_sse_movcc): Consider a case when vcond operators are {-1,..} and {0,..}.

gcc/doc
* extend.texi: Adjust.

gcc/testsuite
* gcc.c-torture/execute/vector-compare-1.c: New test.
* gcc.c-torture/execute/vector-compare-2.c: New test.
* gcc.dg/vector-compare-1.c: New test.
* gcc.dg/vector-compare-2.c: New test.

Bootstrapped and tested on x86_64-unknown-linux-gnu. Artem.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 177665)
+++ gcc/doc/extend.texi (working copy)
@@ -6553,6 +6553,29 @@ invoke undefined behavior at runtime. W
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
+In GNU C vector comparison is supported within standard comparison
+operators: @code{==, !=, <, <=, >, >=}. Comparison operands can be
+vector expressions of integer-type or real-type. Comparison between
+integer-type vectors and real-type vectors are not supported. The
+result of the comparison is a vector of the same width and number of
+elements as the comparison operands with a signed integral element
+type.
+
+Vectors are compared element-wise producing 0 when comparison is false
+and -1 (constant of the appropriate type where all bits are set)
+otherwise. Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{3,2,1,4@};
+v4si c;
+
+c = a > b; /* The result would be @{0, 0,-1, 0@} */
+c = a == b; /* The result would be @{0,-1, 0,-1@} */
+@end smallexample
+
 You can declare variables and use them in function calls and returns, as well as in assignments and some casts.
You can specify a vector type as a return type for a function. Vector types can also be used as function Index: gcc/optabs.c === --- gcc/optabs.c(revision 177665) +++ gcc/optabs.c(working copy) @@ -6502,7 +6502,8 @@ get_rtx_code (enum tree_code tcode, bool unsigned operators. Do not generate compare instruction. */