date:20110826

[Fortran, Patch] Coarray: libcaf patch for _gfortran_caf_deregister

2011-08-26 Tobias Burnus

Allocatable coarrays are freed and deregistered via the libcaf function
_gfortran_caf_deregister. Currently, the front end does not generate
calls to that function; however, this patch already implements the
function.


See http://gcc.gnu.org/wiki/CoarrayLib and 
http://gcc.gnu.org/ml/fortran/2010-04/msg00168.html for details.


The function is called with the coarray token as argument. The token
identifies the coarray in a way defined by the library. In the case of
single.c, it just contains the address of the coarray's allocated
memory. In the case of mpi.c, it is an array of memory addresses on all
images, such that token[this_image()-1] is the memory location on the
current image.
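The sketch below models the two token layouts just described. The helper names are hypothetical. The code covers only the addressing, not libcaf's registration logic:

```c
/* Hypothetical helpers modelling the two token layouts described
   above; they are illustrations, not libcaf's actual code.  */

/* single.c: *token holds the address of the coarray's local memory.  */
static void *
single_local_address (void **token)
{
  return *token;
}

/* mpi.c: token points to an array with one address per image, so the
   current image's memory is token[this_image - 1] (images are numbered
   from 1, as this_image() is in Fortran).  */
static void *
mpi_local_address (void **token, int this_image)
{
  return token[this_image - 1];
}
```

This is why the single.c hunk below frees *token, while the mpi.c hunk frees token[caf_this_image-1].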


The patch also adds stat= and errmsg= diagnostics.
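Fortran's errmsg= variable is a fixed-length CHARACTER buffer, padded with blanks, not a NUL-terminated C string. Below is a minimal sketch of the truncate-or-pad copy. The helper name is hypothetical; the mpi.c hunk below does the same thing inline when stat= is present:

```c
#include <string.h>

/* Hypothetical helper: copy MSG into a Fortran CHARACTER(len=ERRMSG_LEN)
   buffer, truncating if MSG is too long and blank-padding otherwise.
   No NUL terminator is written; Fortran passes the length separately.  */
static void
set_fortran_errmsg (char *errmsg, int errmsg_len, const char *msg)
{
  int msg_len = (int) strlen (msg);
  int len;

  if (errmsg == NULL || errmsg_len <= 0)
    return;

  len = msg_len < errmsg_len ? msg_len : errmsg_len;
  memcpy (errmsg, msg, len);
  if (errmsg_len > len)
    memset (&errmsg[len], ' ', errmsg_len - len);
}
```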

TODO: Add calls to the function in the code generated by the compiler,
and test the function.



Tested by compiling with mpicc and gcc with -Wall -Wextra -std=c99.
OK for the trunk?

Tobias
2011-08-26  Tobias Burnus  bur...@net-b.de

	* caf/libcaf.h (_gfortran_caf_deregister): Update prototype.
	* caf/mpi.c (_gfortran_caf_deregister): Modify prototype,
	actually free memory and add error diagnostic.
	* caf/single.c (_gfortran_caf_deregister): Ditto.

diff --git a/libgfortran/caf/libcaf.h b/libgfortran/caf/libcaf.h
index 4fe09e4..e6be7ce 100644
--- a/libgfortran/caf/libcaf.h
+++ b/libgfortran/caf/libcaf.h
@@ -69,7 +69,7 @@ void _gfortran_caf_finalize (void);
 
 void * _gfortran_caf_register (ptrdiff_t, caf_register_t, void **, int *,
 			   char *, int);
-int _gfortran_caf_deregister (void **);
+void _gfortran_caf_deregister (void **, int *, char *, int);
 
 
 void _gfortran_caf_sync_all (int *, char *, int);
diff --git a/libgfortran/caf/mpi.c b/libgfortran/caf/mpi.c
index ea4c0f0..711c6ee 100644
--- a/libgfortran/caf/mpi.c
+++ b/libgfortran/caf/mpi.c
@@ -103,7 +103,7 @@ _gfortran_caf_finalize (void)
 {
   while (caf_static_list != NULL)
     {
-      free(caf_static_list->token[caf_this_image-1]);
+      free (caf_static_list->token[caf_this_image-1]);
       caf_static_list = caf_static_list->prev;
     }
 
@@ -187,10 +187,36 @@ error:
 }
 
 
-int
-_gfortran_caf_deregister (void **token __attribute__ ((unused)))
+void
+_gfortran_caf_deregister (void **token, int *stat, char *errmsg, int errmsg_len)
 {
-  return 0;
+  if (unlikely (caf_is_finalized))
+    {
+      const char msg[] = "Failed to deallocate coarray - "
+			  "there are stopped images";
+      if (stat)
+	{
+	  *stat = STAT_STOPPED_IMAGE;
+	
+	  if (errmsg_len > 0)
+	    {
+	      int len = ((int) sizeof (msg) - 1 > errmsg_len)
+			? errmsg_len : (int) sizeof (msg) - 1;
+	      memcpy (errmsg, msg, len);
+	      if (errmsg_len > len)
+		memset (&errmsg[len], ' ', errmsg_len-len);
+	    }
+	  return;
+	}
+      caf_runtime_error (msg);
+    }
+
+  _gfortran_caf_sync_all (NULL, NULL, 0);
+
+  if (stat)
+    *stat = 0;
+
+  free (token[caf_this_image-1]);
 }
 
 
@@ -267,7 +293,7 @@ _gfortran_caf_sync_images (int count, int images[], int *stat, char *errmsg,
 }
 
   /* Handle SYNC IMAGES(*).  */
-  if (unlikely(caf_is_finalized))
+  if (unlikely (caf_is_finalized))
     ierr = STAT_STOPPED_IMAGE;
   else
     ierr = MPI_Barrier (MPI_COMM_WORLD);
diff --git a/libgfortran/caf/single.c b/libgfortran/caf/single.c
index 09cc62f..50acc3d 100644
--- a/libgfortran/caf/single.c
+++ b/libgfortran/caf/single.c
@@ -121,10 +121,15 @@ _gfortran_caf_register (ptrdiff_t size, caf_register_t type, void **token,
 }
 
 
-int
-_gfortran_caf_deregister (void **token __attribute__ ((unused)))
+void
+_gfortran_caf_deregister (void **token, int *stat,
+			  char *errmsg __attribute__ ((unused)),
+			  int errmsg_len __attribute__ ((unused)))
 {
-  return 0;
+  free (*token);
+
+  if (stat)
+    *stat = 0;
 }

Re: PATCH: Support BMI, BMI2 and LZCNT in immintrin.h

2011-08-26 Uros Bizjak

On Fri, Aug 26, 2011 at 1:11 AM, H.J. Lu hongjiu...@intel.com wrote:

 immintrin.h should support all Intel intrinsics.  This patch adds
 BMI, BMI2 and LZCNT support to immintrin.h.  OK for trunk?

OK if passes regression testing.

Uros.

Re: [PATCH, middle-end]: Fix PR50083: All 32-bit fortran tests fail on 32-bit Solaris

2011-08-26 Richard Guenther

On Thu, 25 Aug 2011, Uros Bizjak wrote:

 Hello!
 
 As noted in the PR, we also have to protect conversion from
 round->lround for non-TARGET_C99_FUNCTIONS targets. Otherwise, gcc
 chokes in fold_fixed_mathfn, trying to canonicalize iround to
 (non-existent) lround. It looks to me that we can trigger the same
 problem trying to convert (long long) round -> llround -> lround on
 non-TARGET_C99_FUNCTIONS LP64 targets, so this fix probably applies to
 other release branches as well.
 
 2011-08-25  Uros Bizjak  ubiz...@gmail.com
 
   PR middle-end/50083
   * convert.c (convert_to_integer) <BUILT_IN_ROUND{,F,L}>: Convert
   only when TARGET_C99_FUNCTIONS.
   <BUILT_IN_NEARBYINT{,F,L}>: Ditto.
   <BUILT_IN_RINT{,F,L}>: Ditto.
 
 Bootstrapped on x86_64-pc-linux-gnu, regtesting in progress.
 
 OK for SVN and 4.6?

Hmm.  In builtins.c we usually check if the target has support to
expand the builtins directly in case we have named patterns for them.
IMHO these convert.c optimizations belong somewhere else (so that
they trigger for all languages).  Somewhere else these days would
be tree-ssa-forwprop.c.

I'm not asking you to do this move but please consider also doing
the optimization when the target can expand the function directly.

Thanks,
Richard.
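Below is a toy model of the decision discussed in this thread. All names are hypothetical, and this is not the convert.c code. It only shows where the TARGET_C99_FUNCTIONS guard sits. It also shows how narrower or equal-width destinations canonicalize towards lround:

```c
#include <stdbool.h>

/* Hypothetical toy model of the convert.c decision discussed above;
   none of these names exist in GCC.  */
enum toy_math_fn { TOY_ROUND, TOY_NEARBYINT, TOY_RINT };
enum toy_result { TOY_KEEP_CAST, TOY_LROUND, TOY_LLROUND, TOY_LRINT, TOY_LLRINT };

/* What may "(integer type) FN (x)" become?  DEST_PREC and LONG_PREC are
   the precisions, in bits, of the destination type and of "long".  */
static enum toy_result
toy_convert_round_cast (enum toy_math_fn fn, int dest_prec, int long_prec,
                        bool target_c99_functions, bool trapping_math)
{
  /* The PR50083 guard: without the C99 functions, lround and friends
     may simply not exist in the target's math library.  */
  if (!target_c99_functions)
    return TOY_KEEP_CAST;

  /* Assumption: nearbyint is only folded when math exceptions may be
     ignored (i.e. without trapping math).  */
  if (fn == TOY_NEARBYINT && trapping_math)
    return TOY_KEEP_CAST;

  /* Destinations no wider than long canonicalize to the "l" variant:
     (int) round becomes lround, and on LP64 so does (long long) round.  */
  if (dest_prec <= long_prec)
    return fn == TOY_ROUND ? TOY_LROUND : TOY_LRINT;
  return fn == TOY_ROUND ? TOY_LLROUND : TOY_LLRINT;
}
```

For example, on an LP64 target without the C99 functions, (long long) round (x) is left alone. Without the guard, it would be canonicalized towards an lround that does not exist.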

Re: [PATCH, middle-end]: Fix PR50083: All 32-bit fortran tests fail on 32-bit Solaris

2011-08-26 Uros Bizjak

On Fri, Aug 26, 2011 at 9:05 AM, Richard Guenther rguent...@suse.de wrote:

 As noted in the PR, we also have to protect conversion from
 round->lround for non-TARGET_C99_FUNCTIONS targets. Otherwise, gcc
 chokes in fold_fixed_mathfn, trying to canonicalize iround to
 (non-existent) lround. It looks to me that we can trigger the same
 problem trying to convert (long long) round -> llround -> lround on
 non-TARGET_C99_FUNCTIONS LP64 targets, so this fix probably applies to
 other release branches as well.

 2011-08-25  Uros Bizjak  ubiz...@gmail.com

       PR middle-end/50083
       * convert.c (convert_to_integer) <BUILT_IN_ROUND{,F,L}>: Convert
       only when TARGET_C99_FUNCTIONS.
       <BUILT_IN_NEARBYINT{,F,L}>: Ditto.
       <BUILT_IN_RINT{,F,L}>: Ditto.

 Bootstrapped on x86_64-pc-linux-gnu, regtesting in progress.

 OK for SVN and 4.6?

 Hmm.  In builtins.c we usually check if the target has support to
 expand the builtins directly in case we have named patterns for them.
 IMHO these convert.c optimizations belong somewhere else (so that
 they trigger for all languages).  Somewhere else these days would
 be tree-ssa-forwprop.c.

 I'm not asking you to do this move but please consider also doing
 the optimization when the target can expand the function directly.

Yes, I know from our previous communication (ilogb handling) that this
whole convert.c part is fishy, but my attached patch fixes the
unwanted conversion in the same way as other similar builtins are
handled.

Uros.

Re: [PATCH, middle-end]: Fix PR50083: All 32-bit fortran tests fail on 32-bit Solaris

2011-08-26 Uros Bizjak

On Fri, Aug 26, 2011 at 9:30 AM, Richard Guenther rguent...@suse.de wrote:

  As noted in the PR, we also have to protect conversion from
  round->lround for non-TARGET_C99_FUNCTIONS targets. Otherwise, gcc
  chokes in fold_fixed_mathfn, trying to canonicalize iround to
  (non-existent) lround. It looks to me that we can trigger the same
  problem trying to convert (long long) round -> llround -> lround on
  non-TARGET_C99_FUNCTIONS LP64 targets, so this fix probably applies to
  other release branches as well.
 
  2011-08-25  Uros Bizjak  ubiz...@gmail.com
 
        PR middle-end/50083
        * convert.c (convert_to_integer) <BUILT_IN_ROUND{,F,L}>: Convert
        only when TARGET_C99_FUNCTIONS.
        <BUILT_IN_NEARBYINT{,F,L}>: Ditto.
        <BUILT_IN_RINT{,F,L}>: Ditto.
 
  Bootstrapped on x86_64-pc-linux-gnu, regtesting in progress.
 
  OK for SVN and 4.6?
 
  Hmm.  In builtins.c we usually check if the target has support to
  expand the builtins directly in case we have named patterns for them.
  IMHO these convert.c optimizations belong somewhere else (so that
  they trigger for all languages).  Somewhere else these days would
  be tree-ssa-forwprop.c.
 
  I'm not asking you to do this move but please consider also doing
  the optimization when the target can expand the function directly.

 Yes, I know from our previous communication (ilogb handling) that this
 whole convert.c part is fishy, but my attached patch fixes the
 unwanted conversion in the same way as other similar builtins are
 handled.

 Hmm, right, I see that now.

 Well, patch is ok then.

I will wait for the confirmation from Rainer before committing the patch.

Uros.

Re: fix for segmentation violation in dump_generic_node

2011-08-26 Richard Guenther

On Thu, Aug 25, 2011 at 5:51 PM, Tom de Vries vr...@codesourcery.com wrote:
 Hi Richard,

 thanks for the review.

 On 08/25/2011 12:45 PM, Richard Guenther wrote:
 On Thu, Aug 25, 2011 at 12:32 PM, Tom de Vries vr...@codesourcery.com wrote:
 Jakub,

 This patch fixes a segmentation violation, which occurs when printing a MEM_REF
 or COMPONENT_REF containing a released ssa name.  This can happen when we print
 basic blocks upon removal, enabled by -fdump-tree-*-details (see
 remove_bb:tree-cfg.c).

 Where do we dump stmts there?


 In dump_bb:

 static void
 remove_bb (basic_block bb)
 {
  gimple_stmt_iterator i;

  if (dump_file)
    {
      fprintf (dump_file, "Removing basic block %d\n", bb->index);
      if (dump_flags & TDF_DETAILS)
        {
          dump_bb (bb, dump_file, 0);
          fprintf (dump_file, "\n");
        }
    }

 Bootstrapped and reg-tested on x86_64.

 OK for trunk?

 At least

   TREE_TYPE (TREE_OPERAND (node, 1)) != NULL_TREE

 is always true.


 Right.

 The comment before the new lines is now in the wrong place and this
 check at least needs a comment as well.


 Ok, fixed that.

 But - it's broken to dump freed stuff, why and where do we do this?


 Sorry, I did not realize that.

 The scenario is as follows: fnsplit splits a function, and as todo
 cleanup_tree_cfg is called and unreachable blocks are removed, among which
 blocks 12 and 13.

 Block 12 contains a use of 45:

  # BLOCK 12 freq:9100
  # PRED: 13
  D.13888_46 = *sD.13886_45;

 Block 13 contains a def of 45:

 Block 13
  # BLOCK 13
  # PRED: 11 12
  ...
  # sD.13886_45 = PHI sD.13886_44(11), sD.13886_49(12)
  ...
  if (sizeD.8479_2  iD.13887_50)
    goto bb 12;
  else
    goto bb 14;
  # SUCC: 12 14


 First block 13 is removed, and remove_phi_nodes_and_edges_for_unreachable_block
 in remove_bb removes the phi def and releases version 45. Then block 12 is
 removed, and before removal it is dumped by dump_bb in remove_bb, triggering the
 segv.

 The order of removal is determined by the 2nd loop in delete_unreachable_blocks,
 which is chosen because there is no dominator info present:

     for (b = EXIT_BLOCK_PTR->prev_bb; b != ENTRY_BLOCK_PTR; b = prev_bb)
        {
          prev_bb = b->prev_bb;

          if (!(b->flags & BB_REACHABLE))
            {
              delete_basic_block (b);
              changed = true;
            }
        }

 I'm not sure how to fix this.

Hm, it's probably easiest to fixup the dumper here indeed.


 Another occurrence of the same segv is in remove_dead_inserted_code:

  EXECUTE_IF_SET_IN_BITMAP (inserted_exprs, 0, i, bi)
    {
      t = SSA_NAME_DEF_STMT (ssa_name (i));
      if (!gimple_plf (t, NECESSARY))
        {
          gimple_stmt_iterator gsi;

          if (dump_file && (dump_flags & TDF_DETAILS))
            {
              fprintf (dump_file, "Removing unnecessary insertion:");
              print_gimple_stmt (dump_file, t, 0, 0);
            }

          gsi = gsi_for_stmt (t);
          if (gimple_code (t) == GIMPLE_PHI)
            remove_phi_node (&gsi, true);
          else
            {
              gsi_remove (&gsi, true);
              release_defs (t);
            }
        }
    }

 Here a version is released, while it's used in the defining statement of
 version+1, which is subsequently printed. This is easy to fix by splitting the
 loop, I'll make a patch for this.

Probably also not worth fixing.  I guess we can simply go with your
patch, which in its updated form is ok for trunk.

Thanks,
Richard.


 There might be other occurrences (I triggered these 2 doing a gcc build), but I
 cannot trigger others as long as the delete_unreachable_blocks case still
 triggers.

 Richard.


 Updated untested patch attached, I'll test this patch together with the
 remove_dead_inserted_code patch.

 Thanks,
 - Tom

 2011-08-25  Tom de Vries  t...@codesourcery.com

        * tree-pretty-print.c (dump_generic_node): Test for NULL_TREE before
        accessing TREE_TYPE.
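Below is a toy sketch of the kind of guard the ChangeLog describes. It assumes, as the crash suggests, that a released SSA name no longer has a type. The types and the function are hypothetical; GCC's trees and tree-pretty-print.c are far more involved:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy types; GCC's trees look nothing like this.  */
struct toy_type { const char *name; };
struct toy_ssa_name
{
  const struct toy_type *type;   /* NULL once the name is released.  */
  unsigned version;
};

/* Print an SSA name into BUF.  A released name is printed as such
   instead of dereferencing its cleared type.  Returns the length
   snprintf reports.  */
static int
toy_print_ssa_name (char *buf, size_t len, const struct toy_ssa_name *name)
{
  if (name->type == NULL)
    return snprintf (buf, len, "<released %u>", name->version);
  return snprintf (buf, len, "(%s) _%u", name->type->name, name->version);
}
```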

Re: [PATCH] Add infrastructure to merge standard builtin enums with backend builtins

2011-08-26 Richard Guenther

On Thu, Aug 25, 2011 at 10:35 PM, Michael Meissner
meiss...@linux.vnet.ibm.com wrote:
 On Wed, Aug 24, 2011 at 11:06:55AM +0200, Richard Guenther wrote:
 This basically would make DECL_BUILT_IN_CLASS no longer necessary
 if all targets were converted, right?  (We don't currently have any
 BUILT_IN_FRONTEND builtins).  That would sound appealing if this
 patch weren't a partial transition ;)

 Or we could reduce it to 1 bit if we aren't going to change all of the
 backends.

 Now for the possible downsides.  How can we reliably distinguish
 middle-end from target builtins for purpose of lazy initialization?
 Doesn't this complicate the idea of pluggable targets, thus
 something like a hybrid ppc / spu compiler?  In this light merging
 middle-end and target builtin enums and arrays sounds like a step
 backward.

 If we are willing to pay the storage costs, we could have 1 or 2 bytes for
 builtin owner, and 2 bytes for builtin index, and then reserve 0 for standard
 builtins and 1 for machine dependent builtins.  However, then you still have
 the potential problem that sooner or later somebody else will omit the checks.

I don't think the fact that you can only index BUILT_IN_NORMAL builtins
in built_in_decls is an issue worth thinking about at all.  Code that
does otherwise is simply buggy.

 We could reserve a fixed range for plugin builtins if you think that is
 desirable.

Oh, plugin builtins - I didn't even think about the possibility of having
those ;)

In the end I think we should stick with BUILT_IN_CLASS and maybe
add BUILT_IN_PLUGIN then ;)

 What I _do_ like is having common machinery for defining builtins.
 Though instead of continuing the .def file way with all the current
 warts of ways of adding attributes, etc. to builtins I would have
 preferred a genbuiltins.c program that can parse standard C
 declarations and generate whatever is necessary to setup the
 builtin decls.  Thus, instead of

 DEF_GCC_BUILTIN        (BUILT_IN_CLZ, "clz", BT_FN_INT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)

 have simply

 int __builtin_clz (unsigned int) __attribute__((const,nothrow,leaf));

 in a header file which genbuiltins.c would parse.  My first idea
 when discussing this was a -fgenbuiltins flag to the C frontend
 (because that already can do all the parsing ...), but Micha suggested
 a parser that can deal with the above is easy enough to re-implement.

 Yes, that is certainly do-able.  My main intention is to see what kind of
 infrastructure people wanted before changing all of the ppc builtins.

Sure.  I agree that all the duplicated code we have in backends for a
way to create target builtins, defining enums (or not) for them and
having a way to reference them for targetm.builtin_decl (or not) is bad.
But unifying those, or providing common infrastructure for them should
be orthogonal to the issue whether we want to merge the builtin
classes or their storage in some way (I think we don't).  It would of
course be nice if the infrastructure to create target builtins were
generic enough to eventually handle builtin creation in the middle-end
(and the frontends) as well.

 Hm, I guess this pushes back a bit on your patch.  Sorry for that.
 If you're not excited to try the above idea, can you split out the
 pieces that do the .def file thing for rs6000, keeping the separation
 of md and middle-end builtin arrays and enums?
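A builtins .def file is normally consumed with the X-macro technique: one list of entries is expanded several times, with a different definition of the DEF_* macro each time. The sketch below shows this with hypothetical names. GCC keeps the list in a separate .def file that is #included repeatedly; a list macro keeps this example in one file:

```c
/* Hypothetical stand-in for a builtins .def file: one entry per builtin,
   expanded below with different definitions of DEF.  */
#define TOY_BUILTINS(DEF)                                          \
  DEF (TOY_BUILT_IN_CLZ, "__builtin_clz", "int (unsigned int)")    \
  DEF (TOY_BUILT_IN_CTZ, "__builtin_ctz", "int (unsigned int)")    \
  DEF (TOY_BUILT_IN_POPCOUNT, "__builtin_popcount", "int (unsigned int)")

/* Expansion 1: the enum of builtin codes.  */
#define TOY_DEF_ENUM(CODE, NAME, PROTO) CODE,
enum toy_built_in_function { TOY_BUILTINS (TOY_DEF_ENUM) TOY_END_BUILTINS };
#undef TOY_DEF_ENUM

/* Expansion 2: a table indexed by that enum, kept in sync for free.  */
struct toy_builtin_info { const char *name; const char *prototype; };
#define TOY_DEF_INFO(CODE, NAME, PROTO) { NAME, PROTO },
static const struct toy_builtin_info toy_builtin_table[] = {
  TOY_BUILTINS (TOY_DEF_INFO)
};
#undef TOY_DEF_INFO
```

Keeping middle-end and MD builtins in separate lists, as requested above, just means separate .def files expanded into separate enums and tables.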

 I have several goals for the 4.7 time frame:

  1) Make target attribute and pragma enable appropriate machine dependent
     builtins;

That's now something completely new ;)  Why do we need builtins for this?

  2) Make it less likely we will again be bitten by code that blindly
     references built_in_decl without checking if it is MD or standard;

I don't think this is important at all.  Proposed solution: transition
builtin decl access to a functional interface:

tree built_in_decl (enum built_in_code)

which when building with C++ will get you warnings if indexed with
a bogus enum type or an integer type.

  3) Make at least the MD builtins created on demand.  It would be nice to do
     the standard builtins as well, but that may be somewhat more problematic.
     I do think all references to built_in_decl and implicit_built_in_decl
     should be moved to a macro wrapper.

To a (inline) function wrapper with the same name, indeed.
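Below is a minimal sketch of the inline-function wrapper proposed above, using hypothetical stand-in types. In C, an int or another enum converts silently to the parameter's enum type. Only a range check at a single access point is gained there. The stricter diagnostics mentioned above come from compiling the same code as C++:

```c
#include <assert.h>

/* Hypothetical stand-ins: GCC's enum is much larger and its decls are
   trees, not strings.  */
enum toy_built_in_function { TOY_BI_CLZ, TOY_BI_MEMCPY, TOY_END_BI };
typedef const char *toy_tree;

static toy_tree toy_built_in_decls[TOY_END_BI] = {
  "__builtin_clz", "__builtin_memcpy"
};

/* Single access point in the spirit of the proposed
   "tree built_in_decl (enum built_in_code)".  An out-of-range code
   trips the assertion instead of silently reading past the array.  */
static inline toy_tree
built_in_decl (enum toy_built_in_function fncode)
{
  assert ((unsigned) fncode < TOY_END_BI);
  return toy_built_in_decls[fncode];
}
```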

 If we restrict the types and attributes for a C like header file, it shouldn't
 be that hard (famous last words).  I would think adding #ifdef also, so:

        #ifdef __ALTIVEC__
        extern vector float __builtin_altivec_vaddfp (vector float, vector
                                float) __attribute__ ((...));
        #endif

 The backend would need to specify a list of valid #ifdef's and the mapping to
 TARGET_xxx, and valid extra types with a mapping to the internal type node.

Yes.  For the middle-end/frontend stuff we also need a way to specify
the difference between C89 and C99 builtins and GCC internal builtins.

Not sure if I'd use #ifdef like above or simply

Re: [lto] Refactor streamer (1/N) (issue4809083)

2011-08-26 Richard Guenther

On Mon, Aug 8, 2011 at 5:17 PM, Richard Guenther rguent...@suse.de wrote:
 On Mon, 8 Aug 2011, Diego Novillo wrote:

 On Mon, Aug 8, 2011 at 10:52, Michael Matz m...@suse.de wrote:

  Sound.  ;)  Looking forward to some bikeshedding about naming in (2) and
  overabstraction in (3) :)

 Heh, yeah.

 I am going to be sending the renaming patch later today or tomorrow.
 In principle, the things I want to abstract are those that are forcing
 me to include lto-streamer.h from {tree,gimple,data}-streamer.*.
 I will know better when I merge this into the pph branch, though.

 Yeah, I think we discussed this already and agreed that this is
 a sensible plan.

This patch caused http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50165
it seems that LTO string hashing is seriously broken now.

Richard.

 Richard.

[Patch ARM] Fix scheduling descriptions for smull.

2011-08-26 Ramana Radhakrishnan


Hi,

This fixes the missing scheduling descriptions for some of the DImode 
multiply instructions. Tested on arm-linux-gnueabi and benchmarked with 
SPEC2k showing minor improvements.



Will be committed shortly.

cheers
Ramana


2011-08-26  Ramana Radhakrishnan  ramana.radhakrish...@arm.com

	* config/arm/cortex-a9.md (cortex_a9_mult_long): New.
	(cortex_a9_multiply_long): New and use above.  Handle all long
	multiply cases.
	(cortex_a9_multiply): Handle smmul and smmulr.
	(cortex_a9_mac): Handle smmla.


diff --git a/gcc/config/arm/cortex-a9.md b/gcc/config/arm/cortex-a9.md
index b74ace8..12c19ef 100644
--- a/gcc/config/arm/cortex-a9.md
+++ b/gcc/config/arm/cortex-a9.md
@@ -68,7 +68,8 @@ cortex_a9_p1_e2 + cortex_a9_p0_e1 + cortex_a9_p1_e1")
   cortex_a9_mac_m1*2, cortex_a9_mac_m2, cortex_a9_p0_wb")
 (define_reservation "cortex_a9_mac"
   "cortex_a9_multcycle1*2 ,cortex_a9_mac_m2, cortex_a9_p0_wb")
-
+(define_reservation "cortex_a9_mult_long"
+  "cortex_a9_mac_m1*3, cortex_a9_mac_m2, cortex_a9_p0_wb")

 ;; Issue at the same time along the load store pipeline and
 ;; the VFP / Neon pipeline is not possible.
@@ -139,29 +140,35 @@ cortex_a9_p1_e2 + cortex_a9_p0_e1 + cortex_a9_p1_e1")
 	(eq_attr "insn" "smlaxy"))
   "cortex_a9_mac16")
 
-
 (define_insn_reservation "cortex_a9_multiply" 4
   (and (eq_attr "tune" "cortexa9")
-       (eq_attr "insn" "mul"))
+       (eq_attr "insn" "mul,smmul,smmulr"))
        "cortex_a9_mult")
 
 (define_insn_reservation "cortex_a9_mac" 4
   (and (eq_attr "tune" "cortexa9")
-       (eq_attr "insn" "mla"))
+       (eq_attr "insn" "mla,smmla"))
        "cortex_a9_mac")
 
+(define_insn_reservation "cortex_a9_multiply_long" 5
+  (and (eq_attr "tune" "cortexa9")
+       (eq_attr "insn" "smull,umull,smulls,umulls,smlal,smlals,umlal,umlals"))
+       "cortex_a9_mult_long")
+
 ;; An instruction with a result in E2 can be forwarded
 ;; to E2 or E1 or M1 or the load store unit in the next cycle.

 (define_bypass 1 "cortex_a9_dp"
  "cortex_a9_dp_shift, cortex_a9_multiply,
  cortex_a9_load1_2, cortex_a9_dp, cortex_a9_store1_2,
- cortex_a9_mult16, cortex_a9_mac16, cortex_a9_mac, cortex_a9_store3_4, cortex_a9_load3_4")
+ cortex_a9_mult16, cortex_a9_mac16, cortex_a9_mac, cortex_a9_store3_4, cortex_a9_load3_4,
+ cortex_a9_multiply_long")
 
 (define_bypass 2 "cortex_a9_dp_shift"
  "cortex_a9_dp_shift, cortex_a9_multiply,
  cortex_a9_load1_2, cortex_a9_dp, cortex_a9_store1_2,
- cortex_a9_mult16, cortex_a9_mac16, cortex_a9_mac, cortex_a9_store3_4, cortex_a9_load3_4")
+ cortex_a9_mult16, cortex_a9_mac16, cortex_a9_mac, cortex_a9_store3_4, cortex_a9_load3_4,
+ cortex_a9_multiply_long")

 ;; An instruction in the load store pipeline can provide
 ;; read access to a DP instruction in the P0 default pipeline
@@ -212,7 +219,7 @@ cortex_a9_store3_4, cortex_a9_store1_2, cortex_a9_load3_4")
 
 (define_bypass 1
   "cortex_a9_fps"
-  "cortex_a9_fadd, cortex_a9_fps, cortex_a9_fcmp, cortex_a9_dp, cortex_a9_dp_shift, cortex_a9_multiply")
+  "cortex_a9_fadd, cortex_a9_fps, cortex_a9_fcmp, cortex_a9_dp, cortex_a9_dp_shift, cortex_a9_multiply, cortex_a9_multiply_long")
 
 ;; Scheduling on the FP_ADD pipeline.
 (define_reservation "ca9fp_add" "ca9_issue_vfp_neon + ca9fp_add1, ca9fp_add2, ca9fp_add3, ca9fp_add4")

[PATCH, testsuite, ARM] change XFAIL to pass for ARM on a case testing tree-ssa-dom

2011-08-26 Jiangning Liu

Test case gcc.dg/tree-ssa/20040204-1.c now passes at -O1 since Richard
Guenther (rguent...@suse.de) fixed tree-ssa-dom. The link_error call
should be optimized away on ARM targets as well.

The patch is:

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c b/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
index 45e44a1..470b585 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20040204-1.c
@@ -33,5 +33,5 @@ void test55 (int x, int y)
    that the && should be emitted (based on BRANCH_COST).  Fix this
    by teaching dom to look through && and register all components
    as true.  */
-/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! "alpha*-*-* powerpc*-*-* cris-*-* crisv32-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* sparc*-*-* spu-*-* x86_64-*-*" } } } } */
+/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! "alpha*-*-* arm*-*-* powerpc*-*-* cris-*-* crisv32-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* sparc*-*-* spu-*-* x86_64-*-*" } } } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */

gcc/testsuite/ChangeLog:

2011-08-26  Jiangning Liu  jiangning@arm.com

   PR tree-optimization/46021
   * gcc.dg/tree-ssa/20040204-1.c: Don't XFAIL on arm*-*-*.

Thanks,
-Jiangning

Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement

2011-08-26 Ilya Enkovich

Double ping.

2011/8/19 Ilya Enkovich enkovich@gmail.com:
 Ping.

 2011/8/10 Ilya Enkovich enkovich@gmail.com:
 Hello,

 Here is a new version of the patch. Changes from the previous version
 (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg02240.html):
  - updated to trunk
  - TODO_remove_unused_locals flag was removed from todo_flags_finish
 of reassoc pass

 Bootstrapped and checked on x86_64-linux.

 Thanks,
 Ilya
 ---
 gcc/

 2011-08-10  Enkovich Ilya  ilya.enkov...@intel.com

        PR middle-end/44382
        * target.def (reassociation_width): New hook.

        * doc/tm.texi.in (reassociation_width): Likewise.

        * doc/tm.texi (reassociation_width): Likewise.

        * doc/invoke.texi (tree-reassoc-width): New param documented.

        * hooks.h (hook_int_uint_mode_1): New default hook.

        * hooks.c (hook_int_uint_mode_1): Likewise.

        * config/i386/i386.h (ix86_tune_indices): Add
        X86_TUNE_REASSOC_INT_TO_PARALLEL and
        X86_TUNE_REASSOC_FP_TO_PARALLEL.

        (TARGET_REASSOC_INT_TO_PARALLEL): New.
        (TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

        * config/i386/i386.c (initial_ix86_tune_features): Add
        X86_TUNE_REASSOC_INT_TO_PARALLEL and
        X86_TUNE_REASSOC_FP_TO_PARALLEL.

        (ix86_reassociation_width): Implementation of
        new hook for i386 target.

        * params.def (PARAM_TREE_REASSOC_WIDTH): New param added.

        * tree-ssa-reassoc.c (get_required_cycles): New function.
        (get_reassociation_width): Likewise.
        (swap_ops_for_binary_stmt): Likewise.
        (rewrite_expr_tree_parallel): Likewise.

        (rewrite_expr_tree): Refactored. Part of code moved into
        swap_ops_for_binary_stmt.

        (reassociate_bb): Now checks reassociation width to be used
        and calls rewrite_expr_tree_parallel instead of rewrite_expr_tree
        if needed.

 gcc/testsuite/

 2011-08-10  Enkovich Ilya  ilya.enkov...@intel.com

        * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
        --param tree-reassoc-width=1.

        * gcc.dg/tree-ssa/reassoc-24.c: New test.
        * gcc.dg/tree-ssa/reassoc-25.c: Likewise.
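A linear chain such as ((a + b) + c) + d needs one cycle per addition. A rewrite like (a + b) + (c + d) lets independent additions issue together. Below is a toy cost model of that idea. It assumes that every operand is available at once and that each cycle can issue up to WIDTH independent binary operations. It is hypothetical and not tree-ssa-reassoc.c's get_required_cycles:

```c
/* Hypothetical cost model: OPS_NUM operands are combined with binary
   operations, and each cycle can issue up to WIDTH independent ones.
   Returns the number of cycles on the critical path.  */
static int
toy_required_cycles (int ops_num, int width)
{
  int values = ops_num;
  int cycles = 0;

  if (width < 1)
    width = 1;

  while (values > 1)
    {
      int pairs = values / 2;
      int issued = pairs < width ? pairs : width;
      values -= issued;   /* each operation turns two values into one */
      cycles++;
    }
  return cycles;
}
```

With eight operands, width 1 (the linear chain) needs 7 cycles, width 2 needs 4 and width 4 needs 3. Past that point, extra width stops paying off.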

Re: [PATCH][ARM] Thumb2 replicated constants

2011-08-26 Andrew Stubbs


On 09/05/11 17:23, Andrew Stubbs wrote:

On 06/05/11 12:18, Richard Earnshaw wrote:

OK with a change to do that.


Thanks, I can't commit this until my ADDW/SUBW patch has been committed.


There was a bug I found in final testing, so this has been delayed somewhat.

I've just committed this version. There are a few minor changes to the 
way negative/inverted constants are generated.


Andrew
2011-08-26  Andrew Stubbs  a...@codesourcery.com

	gcc/
	* config/arm/arm.c (struct four_ints): New type.
	(count_insns_for_constant): Delete function.
	(find_best_start): Delete function.
	(optimal_immediate_sequence): New function.
	(optimal_immediate_sequence_1): New function.
	(arm_gen_constant): Move constant splitting code to
	optimal_immediate_sequence.
	Rewrite constant negation/invertion code.

	gcc/testsuite/
	* gcc.target/arm/thumb2-replicated-constant1.c: New file.
	* gcc.target/arm/thumb2-replicated-constant2.c: New file.
	* gcc.target/arm/thumb2-replicated-constant3.c: New file.
	* gcc.target/arm/thumb2-replicated-constant4.c: New file.
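Unlike ARM mode, Thumb-2 can encode some replicated-byte constants as a single immediate. Below is a sketch of the encodability test, following the ThumbExpandImm rules of the ARM Architecture Reference Manual. The function name is hypothetical and this is not the arm.c code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch: can V be encoded as a single Thumb-2 modified immediate?
   With XY any byte, the forms are 0x000000XY, 0x00XY00XY, 0xXY00XY00,
   0xXYXYXYXY, or an 8-bit value 1bcdefgh rotated right by 8..31 bits.
   The rotated form is any value whose set bits fit in an 8-bit window
   that does not wrap around bit 31.  */
static bool
thumb2_modified_immediate_p (uint32_t v)
{
  uint32_t lo = v & 0xff;
  uint32_t hi = (v >> 8) & 0xff;

  if (v == lo                               /* 0x000000XY (and 0) */
      || v == (lo | (lo << 16))             /* 0x00XY00XY */
      || v == ((hi << 8) | (hi << 24))      /* 0xXY00XY00 */
      || v == lo * 0x01010101u)             /* 0xXYXYXYXY */
    return true;

  /* V is nonzero here: strip trailing zeros, then check the width.  */
  while ((v & 1) == 0)
    v >>= 1;
  return v <= 0xff;
}
```

A constant that fails this test needs a sequence of instructions. Building that sequence is what optimal_immediate_sequence, below, is for.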

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -64,6 +64,11 @@ typedef struct minipool_fixup   Mfix;
 
 void (*arm_lang_output_object_attributes_hook)(void);
 
+struct four_ints
+{
+  int i[4];
+};
+
 /* Forward function declarations.  */
 static bool arm_needs_doubleword_align (enum machine_mode, const_tree);
 static int arm_compute_static_chain_stack_bytes (void);
@@ -128,7 +133,13 @@ static void arm_output_function_prologue (FILE *, HOST_WIDE_INT);
 static int arm_comp_type_attributes (const_tree, const_tree);
 static void arm_set_default_type_attributes (tree);
 static int arm_adjust_cost (rtx, rtx, rtx, int);
-static int count_insns_for_constant (HOST_WIDE_INT, int);
+static int optimal_immediate_sequence (enum rtx_code code,
+   unsigned HOST_WIDE_INT val,
+   struct four_ints *return_sequence);
+static int optimal_immediate_sequence_1 (enum rtx_code code,
+	 unsigned HOST_WIDE_INT val,
+	 struct four_ints *return_sequence,
+	 int i);
 static int arm_get_strip_length (int);
 static bool arm_function_ok_for_sibcall (tree, tree);
 static enum machine_mode arm_promote_function_mode (const_tree,
@@ -2513,68 +2524,41 @@ arm_split_constant (enum rtx_code code, enum machine_mode mode, rtx insn,
 			   1);
 }
 
-/* Return the number of instructions required to synthesize the given
-   constant, if we start emitting them from bit-position I.  */
-static int
-count_insns_for_constant (HOST_WIDE_INT remainder, int i)
-{
-  HOST_WIDE_INT temp1;
-  int step_size = TARGET_ARM ? 2 : 1;
-  int num_insns = 0;
-
-  gcc_assert (TARGET_ARM || i == 0);
-
-  do
-    {
-      int end;
-
-      if (i <= 0)
-	i += 32;
-      if (remainder & (((1 << step_size) - 1) << (i - step_size)))
-	{
-	  end = i - 8;
-	  if (end < 0)
-	    end += 32;
-	  temp1 = remainder & ((0x0ff << end)
-				    | ((i < end) ? (0xff >> (32 - end)) : 0));
-	  remainder &= ~temp1;
-	  num_insns++;
-	  i -= 8 - step_size;
-	}
-      i -= step_size;
-    } while (remainder);
-  return num_insns;
-}
-
-
+/* Return a sequence of integers, in RETURN_SEQUENCE that fit into
+   ARM/THUMB2 immediates, and add up to VAL.
+   Thr function return value gives the number of insns required.  */
 static int
-find_best_start (unsigned HOST_WIDE_INT remainder)
+optimal_immediate_sequence (enum rtx_code code, unsigned HOST_WIDE_INT val,
+			struct four_ints *return_sequence)
 {
   int best_consecutive_zeros = 0;
   int i;
   int best_start = 0;
+  int insns1, insns2;
+  struct four_ints tmp_sequence;
 
   /* If we aren't targetting ARM, the best place to start is always at
-     the bottom.  */
-  if (! TARGET_ARM)
-    return 0;
-
-  for (i = 0; i < 32; i += 2)
+     the bottom, otherwise look more closely.  */
+  if (TARGET_ARM)
     {
-      int consecutive_zeros = 0;
-
-      if (!(remainder & (3 << i)))
+      for (i = 0; i < 32; i += 2)
 	{
-	  while ((i < 32) && !(remainder & (3 << i)))
-	    {
-	      consecutive_zeros += 2;
-	      i += 2;
-	    }
-	  if (consecutive_zeros > best_consecutive_zeros)
+	  int consecutive_zeros = 0;
+
+	  if (!(val & (3 << i)))
 	    {
-	      best_consecutive_zeros = consecutive_zeros;
-	      best_start = i - consecutive_zeros;
+	      while ((i < 32) && !(val & (3 << i)))
+		{
+		  consecutive_zeros += 2;
+		  i += 2;
+		}
+	      if (consecutive_zeros > best_consecutive_zeros)
+		{
+		  best_consecutive_zeros = consecutive_zeros;
+		  best_start = i - consecutive_zeros;
+		}
+	      i -= 2;
 	    }
-	  i -= 2;
 	}
     }
 
@@ -2601,13 +2585,161 @@ find_best_start (unsigned HOST_WIDE_INT remainder)
  the constant starting from `best_start', and also starting from
  zero (i.e. with bit 31 first to be output).  If `best_start' doesn't
  yield a shorter sequence, we may as well use zero.  */
+  insns1 = optimal_immediate_sequence_1 (code, val, return_sequence, best_start);
   if (best_start != 0
-      && (((unsigned HOST_WIDE_INT) 1) << best_start) < remainder)
-

Re: [PATCH] PR42554/49992: avoid use of '-c' flag with ranlib on darwin10 and later

2011-08-26 Ralf Wildenhues

* Jack Howarth wrote on Fri, Aug 12, 2011 at 01:27:21AM CEST:
The following patch addresses 
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42554#c15
 by extending the logic used in...

 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157563
 Log:
 PR ada/42554
 * configure.ac: Only pass -c to ranlib for darwin9 and earlier.
 * configure: Regenerate.

 Okay for gcc trunk?

OK with ...

 2010-08-11  Jack Howarth howa...@bromo.med.uc.edu
 
   PR 42554/49992
 
   * gcc/configure.ac: Only pass -c to ranlib for darwin9 and earlier.
   * gcc/configure.ac: Regenerate.

... typo in file name fixed.

Thanks,
Ralf

 --- gcc/configure.ac  (revision 177684)
 +++ gcc/configure.ac  (working copy)
 @@ -821,11 +821,8 @@ gcc_AC_PROG_LN_S
  ACX_PROG_LN($LN_S)
  AC_PROG_RANLIB
  case "${host}" in
 -*-*-darwin*)
 -  # By default, the Darwin ranlib will not treat common symbols as
 -  # definitions when  building the archive table of contents.  Other 
 -  # ranlibs do that; pass an option to the Darwin ranlib that makes
 -  # it behave similarly.
 +*-*-darwin[[3-9]]*)
 +  # ranlib before Darwin10 requires the -c flag to look at common symbols.
ranlib_flags=-c 
;;
  *)
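The new case pattern relies on shell glob matching. In configure.ac the doubled brackets are m4 quoting, so the generated configure sees `*-*-darwin[3-9]*`. Because `[3-9]` matches exactly one character, darwin10 and later fall through to the default case: the character after `darwin` is `1`. The small C sketch below uses POSIX `fnmatch()`, which follows the same pattern rules as shell `case`, to show which host triplets would still get `ranlib_flags=-c`. The function name and the triplets are only illustrative.

```c
#include <fnmatch.h>
#include <stdbool.h>

/* The case pattern as configure sees it after m4 removes one level of
   [ ] quoting from *-*-darwin[[3-9]]*.  */
static const char darwin_pre10_pattern[] = "*-*-darwin[3-9]*";

/* Return true if HOST would select ranlib_flags=-c under the patch.  */
static bool
host_gets_ranlib_c (const char *host)
{
  return fnmatch (darwin_pre10_pattern, host, 0) == 0;
}
```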

PING: [PATCH]: Fix -fbranch-probabilities

2011-08-26 Thread Christian Bruel


Hello,

Could I have a review for the trivial patch posted in
http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01123.html

-fprofile-use sets flag_branch_probabilities.

But we should also be able to use -fbranch-probabilities on its own 
using the information generated by -fprofile-arcs, as documented.


Many thanks

Christian

Re: [PATCH] PR42554/49992: avoid use of '-c' flag with ranlib on darwin10 and later

2011-08-26 Thread Iain Sandoe



On 26 Aug 2011, at 11:27, Ralf Wildenhues wrote:


* Jack Howarth wrote on Fri, Aug 12, 2011 at 01:27:21AM CEST:

  The following patch addresses 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42554#c15
by extending the logic used in...



URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157563
Log:
   PR ada/42554
   * configure.ac: Only pass -c to ranlib for darwin9 and earlier.
   * configure: Regenerate.



Okay for gcc trunk?


OK with ...


2010-08-11  Jack Howarth howa...@bromo.med.uc.edu

PR 42554/49992

* gcc/configure.ac: Only pass -c to ranlib for darwin9 and earlier.
* gcc/configure.ac: Regenerate.


... typo in file name fixed.

Thanks,
Ralf


--- gcc/configure.ac(revision 177684)
+++ gcc/configure.ac(working copy)
@@ -821,11 +821,8 @@ gcc_AC_PROG_LN_S
ACX_PROG_LN($LN_S)
AC_PROG_RANLIB
case ${host} in
-*-*-darwin*)
-  # By default, the Darwin ranlib will not treat common symbols as
-  # definitions when  building the archive table of contents.  Other
-  # ranlibs do that; pass an option to the Darwin ranlib that makes
-  # it behave similarly.
+*-*-darwin[[3-9]]*)
+  # ranlib before Darwin10 requires the -c flag to look at common symbols.

  ranlib_flags=-c
  ;;
*)



I am still investigating this -- getting Ada bootstrapped on ppc has  
taken some time...


not objecting to the patch - but I think we can go further
... as commented in the PR, I would say that we can likely remove the  
special casing of ranlib completely for all Darwin (some more testing  
on ppc/ada still under way).  So far OK on ppc/darwin8 and x86_64/
darwin10 (incl. ada on *86*)


As things stand, darwin < 8 will not bootstrap GCC 4.6 or trunk with
its native toolset; it requires the use of odcctools or similar to
make use of newer versions of ld.  (thus, support of ancient darwin
is conditional on use of a toolset from at least darwin 8 era).


cheers
Iain

[testsuite, i386] Fix for PR50185

2011-08-26 Thread Kirill Yukhin

Hi,
Here is a fix for http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

testsuite/ChangeLog entry:

2011-08-26  Kirill Yukhin  kirill.yuk...@intel.com

PR testsuite/50185
* gcc.target/i386/avx2-vmovmskb-2.c: Rename to ...
* gcc.target/i386/avx2-vpmovmskb-2.c: ... this. Update.

Test passes.
Ok for trunk?

Thanks, K


pr50185.gcc.patch
Description: Binary data

Re: [testsuite, i386] Fix for PR50185

2011-08-26 Thread Kirill Yukhin

According to Jakub's input, I've updated test to scan instruction, not
pattern name.

Is it ok?

Thanks, K

On Fri, Aug 26, 2011 at 3:45 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hi,
 Here is a fix for http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

 testsuite/ChangeLog entry:

 2011-08-26  Kirill Yukhin  kirill.yuk...@intel.com

        PR testsuite/50185
        * gcc.target/i386/avx2-vmovmskb-2.c: Rename to ...
        * gcc.target/i386/avx2-vpmovmskb-2.c: ... this. Update.

 Test passes.
 Ok for trunk?

 Thanks, K



pr50185-2.gcc.patch
Description: Binary data

Re: [lto] Refactor streamer (1/N) (issue4809083)

2011-08-26 Thread Michael Matz

Hi,

On Fri, 26 Aug 2011, Richard Guenther wrote:

  I am going to be sending the renaming patch later today or tomorrow. 
  In principle, the things I want to abstract are those that are 
  forcing me to include lto-streamer.h from 
  {tree,gimple,data}-streamer.*. I will know better when I merge this 
  into the pph branch, though.
 
  Yeah, I think we discussed this already and agreed on that this is a 
  sensible plan.
 
 This patch caused http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50165 it 
 seems that LTO string hashing is seriously broken now.

Once regstrap passes on x86_64-linux I'm checking this in as obvious.


Ciao,
Michael.
-- 
PR lto/50165
* lto-streamer-in.c (canon_file_name): Initialize new_slot->len.

Index: lto-streamer-in.c
===
--- lto-streamer-in.c   (revision 178040)
+++ lto-streamer-in.c   (working copy)
@@ -113,6 +113,7 @@ canon_file_name (const char *string)
   new_slot = XCNEW (struct string_slot);
   strcpy (saved_string, string);
   new_slot->s = saved_string;
+  new_slot->len = len;
   *slot = new_slot;
   return saved_string;
 }
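Why a missing `len` breaks lookups: `XCNEW` returns zeroed memory, so without the new assignment every stored slot keeps `len == 0`. The sketch below assumes, for illustration, an equality callback that checks the cached length before comparing bytes. `struct string_slot` here is a simplified stand-in and `string_slot_eq` is a hypothetical name; this is not necessarily how lto-streamer-in.c implements its hash table callbacks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the hash table entry: string plus cached
   length.  */
struct string_slot
{
  const char *s;
  size_t len;
};

/* Hypothetical length-aware equality: lengths first, then bytes.  A
   stored slot whose len was never set (still 0) can never equal a
   probe for a non-empty string.  */
static bool
string_slot_eq (const struct string_slot *a, const struct string_slot *b)
{
  return a->len == b->len && memcmp (a->s, b->s, a->len) == 0;
}
```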

Re: [lto] Refactor streamer (1/N) (issue4809083)

2011-08-26 Thread Diego Novillo


On 11-08-26 04:24 , Richard Guenther wrote:


This patch caused http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50165
it seems that LTO string hashing is seriously broken now.


Sorry about this.  Bad timing as I will be away until 7/Sep.  Would it 
make things easier if the commit that introduced this was reverted?



Diego.

[patch, libfortran] Fix PR 50192 - fix wide-char comparison

2011-08-26 Thread Thomas Koenig


Hello world,

the attached patch fixes the PR by doing comparisons for wide
characters as unsigned 4-byte ints.

I have put the comparison function into libgfortran.h because I will
need it for MINLOC and friends for characters.

OK for trunk?  Which branches should this be backported to?

Thomas


2011-08-26  Thomas Koenig  tkoe...@gcc.gnu.org

PR libfortran/50192
* intrinsics/string_intrinsics.c (memcmp_char4):  New function.
* intrinsics/string_intrinsics_inc.c:  New macro MEMCMP, either
set to memcmp or memcmp_char4.
(compare_string):  Use MEMCMP, with correct size for it.
* libgfortran.h:  Add prototype for memcmp_char4.

2011-08-26  Thomas Koenig  tkoe...@gcc.gnu.org

PR libfortran/50192
* gfortran.dg/widechar_compare_1.f90:  New test.
Index: intrinsics/string_intrinsics_inc.c
===
--- intrinsics/string_intrinsics_inc.c	(revision 178067)
+++ intrinsics/string_intrinsics_inc.c	(working copy)
@@ -90,7 +90,7 @@ compare_string (gfc_charlen_type len1, const CHART
   gfc_charlen_type len;
   int res;
 
-  res = memcmp (s1, s2, ((len1 < len2) ? len1 : len2) * sizeof (CHARTYPE));
+  res = MEMCMP (s1, s2, ((len1 < len2) ? len1 : len2));
   if (res != 0)
 return res;
 
Index: intrinsics/string_intrinsics.c
===
--- intrinsics/string_intrinsics.c	(revision 178067)
+++ intrinsics/string_intrinsics.c	(working copy)
@@ -51,7 +51,24 @@ memset_char4 (gfc_char4_t *b, gfc_char4_t c, size_
   return b;
 }
 
+/* Compare wide character types, which are handled internally as
+   unsigned 4-byte integers.  */
+int
+memcmp_char4 (const void *a, const void *b, size_t len)
+{
+  const GFC_UINTEGER_4 *pa = a;
+  const GFC_UINTEGER_4 *pb = b;
+  while (len-- > 0)
+{
+  if (*pa != *pb)
+	return *pa < *pb ? -1 : 1;
+  pa ++;
+  pb ++;
+}
+  return 0;
+}
 
+
 /* All other functions are defined using a few generic macros in
string_intrinsics_inc.c, so we avoid code duplication between the
various character type kinds.  */
@@ -64,6 +81,8 @@ memset_char4 (gfc_char4_t *b, gfc_char4_t c, size_
 #define SUFFIX(x) x
 #undef  MEMSET
 #define MEMSET memset
+#undef  MEMCMP
+#define MEMCMP memcmp
 
 #include "string_intrinsics_inc.c"
 
@@ -76,6 +95,8 @@ memset_char4 (gfc_char4_t *b, gfc_char4_t c, size_
 #define SUFFIX(x) x ## _char4
 #undef  MEMSET
 #define MEMSET memset_char4
+#undef  MEMCMP
+#define MEMCMP memcmp_char4
 
 #include "string_intrinsics_inc.c"
 
Index: libgfortran.h
===
--- libgfortran.h	(revision 178067)
+++ libgfortran.h	(working copy)
@@ -1266,6 +1266,10 @@ extern int compare_string_char4 (gfc_charlen_type,
  gfc_charlen_type, const gfc_char4_t *);
 iexport_proto(compare_string_char4);
 
+extern int memcmp_char4 (const void *, const void *, size_t);
+internal_proto(memcmp_char4);
+
+
 /* random.c */
 
 extern void random_seed_i4 (GFC_INTEGER_4 * size, gfc_array_i4 * put,
! { dg-do run }
! PR 50192 - on little-endian systems, this used to fail.
program main
  character(kind=4,len=2) :: c1, c2
  c1 = 4_' '
  c2 = 4_' '
  c1(1:1) = transfer(257, mold=c1(1:1))
  c2(1:1) = transfer(64, mold=c2(1:1))
  if (c1 < c2) call abort
end program main
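The test above stores 257 and 64 in the first character. A byte-wise `memcmp` over 4-byte characters compares the lowest-addressed byte first, which on a little-endian host is the least significant one: 0x01 for 257 against 0x40 for 64, so 257 wrongly compares as smaller. The sketch below contrasts that with an element-wise comparison in the style of `memcmp_char4`; `uint32_t` stands in for `GFC_UINTEGER_4`, and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Element-wise comparison of 4-byte characters, as memcmp_char4 does.  */
static int
cmp_char4 (const uint32_t *a, const uint32_t *b, size_t len)
{
  while (len-- > 0)
    {
      if (*a != *b)
	return *a < *b ? -1 : 1;
      a++;
      b++;
    }
  return 0;
}

/* Byte-wise comparison, as the old code did via memcmp with
   len * sizeof (CHARTYPE).  The sign depends on host byte order.  */
static int
cmp_bytes (const uint32_t *a, const uint32_t *b, size_t len)
{
  return memcmp (a, b, len * sizeof (uint32_t));
}
```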

Re: [patch, libfortran] Fix PR 50192 - fix wide-char comparison

2011-08-26 Thread Thomas Koenig


Am 26.08.2011 14:40, schrieb Thomas Koenig:


OK for trunk?  Which branches should this be backported to?


I forgot - also regression-tested.

Thomas

Re: [lto] Refactor streamer (1/N) (issue4809083)

2011-08-26 Thread Jakub Jelinek

On Fri, Aug 26, 2011 at 02:34:29PM +0200, Michael Matz wrote:
 Hi,
 
 On Fri, 26 Aug 2011, Richard Guenther wrote:
 
   I am going to be sending the renaming patch later today or tomorrow. 
   In principle, the things I want to abstract are those that are 
   forcing me to include lto-streamer.h from 
   {tree,gimple,data}-streamer.*. I will know better when I merge this 
   into the pph branch, though.
  
   Yeah, I think we discussed this already and agreed on that this is a 
   sensible plan.
  
  This patch caused http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50165 it 
  seems that LTO string hashing is seriously broken now.
 
 Once regstrap passes on x86_64-linux I'm checking this in as obvious.

While you are touching it, I think we should also optimize it as in the
patch below.  I'm afraid no string length optimization would be able to
figure out that it doesn't have to call strlen twice, because the
htab_find_slot isn't pure.
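The pattern the patch applies, measuring the string once and copying `len + 1` bytes so the terminating NUL comes along, looks like this in isolation. `dup_with_len` is a hypothetical helper, and it uses plain `malloc` where GCC uses `xmalloc` (which aborts instead of returning NULL).

```c
#include <stdlib.h>
#include <string.h>

/* Measure STRING once, then copy LEN + 1 bytes so the NUL terminator
   is included without a second scan.  Stores the length in *LEN_OUT.  */
static char *
dup_with_len (const char *string, size_t *len_out)
{
  size_t len = strlen (string);
  char *copy = malloc (len + 1);

  if (copy == NULL)
    return NULL;
  memcpy (copy, string, len + 1);
  *len_out = len;
  return copy;
}
```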

2011-08-26  Jakub Jelinek  ja...@redhat.com

* lto-streamer-in.c (canon_file_name): Avoid calling strlen twice,
use memcpy instead of strcpy.

--- gcc/lto-streamer-in.c.jj2011-08-26 14:39:52.0 +0200
+++ gcc/lto-streamer-in.c   2011-08-26 14:40:59.543884012 +0200
@@ -98,21 +98,20 @@ canon_file_name (const char *string)
 {
   void **slot;
   struct string_slot s_slot;
+  size_t len = strlen (string);
   s_slot.s = string;
-  s_slot.len = strlen (string);
+  s_slot.len = len;
 
   slot = htab_find_slot (file_name_hash_table, &s_slot, INSERT);
   if (*slot == NULL)
 {
-  size_t len;
   char *saved_string;
   struct string_slot *new_slot;
 
-  len = strlen (string);
   saved_string = (char *) xmalloc (len + 1);
   new_slot = XCNEW (struct string_slot);
   new_slot->len = len;
-  strcpy (saved_string, string);
+  memcpy (saved_string, string, len + 1);
   new_slot->s = saved_string;
   *slot = new_slot;
   return saved_string;

Jakub

Re: [testsuite, i386] Fix for PR50185

2011-08-26 Thread Uros Bizjak

On Fri, Aug 26, 2011 at 2:04 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 According to Jakub's input, I've updated test to scan instruction, not
 pattern name.

 Is it ok?

 Thanks, K

 On Fri, Aug 26, 2011 at 3:45 PM, Kirill Yukhin kirill.yuk...@gmail.com 
 wrote:
 Hi,
 Here is a fix for http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

 testsuite/ChangeLog entry:

 2011-08-26  Kirill Yukhin  kirill.yuk...@intel.com

        PR testsuite/50185
        * gcc.target/i386/avx2-vmovmskb-2.c: Rename to ...
        * gcc.target/i386/avx2-vpmovmskb-2.c: ... this. Update.

 Test passes.
 Ok for trunk?

Is this the correct ChangeLog? Looking into the patch, you are changing
one test to look for insn name, while adding avx2-vpmovmskb-2.c which
still looks for pattern name.

Please update ChangeLog and/or attached patch.

Uros.

[v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h

2011-08-26 Thread Rainer Orth

All my testing of the __cplusplus 199711L patches had been on Solaris
8+/x86.  During last weekend's bootstrap on the whole range of systems
(Solaris 8 to 11, SPARC and x86), it turned out that there are possible
variations of iso/math_iso.h and iso/stdlib_iso.h between Solaris 8
FCS and patches, so we cannot statically configure which overloads are
present, but need autoconf checks for that.

The situation is as follows:

* Solaris 8 FCS shipped rev. 1.1 of iso/math_iso.h which only had
  double std::abs(double).  Later, in patches 111721-04 (SPARC) and
  112757-01 (x86), rev. 1.3 was shipped that has everything that's also
  present in Solaris 9 and up.

* Similarly, Solaris 8 FCS has rev. 1.1 of iso/stdlib_iso.h without
  any overloads.  Patches 109607-02 (SPARC) and 109608-02 (x86) added
  long std::abs(long) and ldiv_t div(long, long) in rev. 1.3.

Since bits/os_defines.h is included before the configure results,
configure needs to define the affected
__CORRECT_ISO_CPP_MATH_H_PROTO[12] and __CORRECT_ISO_CPP_STDLIB_H_PROTO
macros directly.  The following patch does just that.

Bootstrapped without regressions on x86_64-unknown-linux-gnu and
i386-pc-solaris2.11, bootstraps on i386-pc-solaris2.8 (with the old
rev. 1.1 headers) and sparc-sun-solaris2.8 (with the the rev. 1.3
headers) are still in progress, but I've verified that the
__CORRECT_ISO_CPP_* macros are all defined correctly..  Since errors in
previous versions of the patch manifested themselves in build failures
immediately, I'm pretty certain that there are no errors.

Ok for mainline if bootstraps pass?

Thanks.
Rainer


2011-08-25  Rainer Orth  r...@cebitec.uni-bielefeld.de

* acinclude.m4 (GLIBCXX_CHECK_MATH_PROTO)
(GLIBCXX_CHECK_STDLIB_PROTO): New tests.
* configure.ac (GLIBCXX_CHECK_MATH_PROTO)
(GLIBCXX_CHECK_STDLIB_PROTO): Call them.
* configure: Regenerate.
* config.h.in: Regenerate.
* config/os/solaris/solaris2.8/os_defines.h
(__CORRECT_ISO_CPP_MATH_H_PROTO2): Don't define.
* config/os/solaris/solaris2.9: Remove.
* configure.host (solaris2.8): Merge with ...
(solaris2.9, solaris2.1[0-9]): ... this.
Always use os/solaris/solaris2.8.

# HG changeset patch
# Parent b3524f20d0077532a567b222d37ef05976af2743
Handle different versions of Solaris 8 iso/math_iso.h

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -1693,6 +1693,100 @@ AC_DEFUN([GLIBCXX_COMPUTE_STDIO_INTEGER_
 ])
 
 dnl
+dnl Check whether required C++ overloads are present in math.h.
+dnl
+
+AC_DEFUN([GLIBCXX_CHECK_MATH_PROTO], [
+
+  AC_LANG_SAVE
+  AC_LANG_CPLUSPLUS
+
+  case $host in
+*-*-solaris2.*)
+  # Solaris 8 FCS only had an overload for double std::abs(double) in
+  # iso/math_iso.h.  Patches 111721-04 (SPARC) and 112757-01 (x86)
+  # introduced the full set also found from Solaris 9 onwards.
+  AC_MSG_CHECKING([for float std::abs(float) overload])
+  AC_CACHE_VAL(glibcxx_cv_abs_float, [
+	AC_COMPILE_IFELSE([AC_LANG_SOURCE(
+	  [#include <math.h>
+	   namespace std {
+	 inline float abs(float __x)
+	 {  return __builtin_fabsf(__x); }
+	   }
+	])],
+[glibcxx_cv_abs_float=no],
+[glibcxx_cv_abs_float=yes]
+  )])
+
+  # autoheader cannot handle indented templates.
+  AH_VERBATIM([__CORRECT_ISO_CPP_MATH_H_PROTO1],
+[/* Define if all C++ overloads are available in math.h.  */
+#if __cplusplus >= 199711L
+#undef __CORRECT_ISO_CPP_MATH_H_PROTO1
+#endif])
+  AH_VERBATIM([__CORRECT_ISO_CPP_MATH_H_PROTO2],
+[/* Define if only double std::abs(double) is available in math.h.  */
+#if __cplusplus >= 199711L
+#undef __CORRECT_ISO_CPP_MATH_H_PROTO2
+#endif])
+
+  if test $glibcxx_cv_abs_float = yes; then
+AC_DEFINE(__CORRECT_ISO_CPP_MATH_H_PROTO1)
+  else
+AC_DEFINE(__CORRECT_ISO_CPP_MATH_H_PROTO2)
+  fi
+  AC_MSG_RESULT($glibcxx_cv_abs_float)
+  ;;
+  esac
+
+  AC_LANG_RESTORE
+])
+
+dnl
+dnl Check whether required C++ overloads are present in stdlib.h.
+dnl
+
+AC_DEFUN([GLIBCXX_CHECK_STDLIB_PROTO], [
+
+  AC_LANG_SAVE
+  AC_LANG_CPLUSPLUS
+
+  case $host in
+*-*-solaris2.*)
+  # Solaris 8 FCS lacked the overloads for long std::abs(long) and
+  # ldiv_t std::div(long, long) in iso/stdlib_iso.h.  Patches 109607-02
+  # (SPARC) and 109608-02 (x86) introduced them.
+  AC_MSG_CHECKING([for long std::abs(long) overload])
+  AC_CACHE_VAL(glibcxx_cv_abs_long, [
+	AC_COMPILE_IFELSE([AC_LANG_SOURCE(
+	  [#include <stdlib.h>
+	   namespace std {
+	 inline long
+	 abs(long __i) { return labs(__i); }
+	   }
+])],
+[glibcxx_cv_abs_long=no],
+[glibcxx_cv_abs_long=yes]
+  )])
+
+  # autoheader cannot handle indented templates.
+  AH_VERBATIM([__CORRECT_ISO_CPP_STDLIB_H_PROTO],
+[/* Define if all C++ overloads are available in stdlib.h.  */

Re: [PATCH] PR42554/49992: avoid use of '-c' flag with ranlib on darwin10 and later

2011-08-26 Thread Jack Howarth

On Fri, Aug 26, 2011 at 12:09:53PM +0100, Iain Sandoe wrote:

 On 26 Aug 2011, at 11:27, Ralf Wildenhues wrote:

 * Jack Howarth wrote on Fri, Aug 12, 2011 at 01:27:21AM CEST:
   The following patch addresses 
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42554#c15
 by extending the logic used in...

 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=157563
 Log:
PR ada/42554
* configure.ac: Only pass -c to ranlib for darwin9 and earlier.
* configure: Regenerate.

 Okay for gcc trunk?

 OK with ...

 2010-08-11  Jack Howarth howa...@bromo.med.uc.edu

 PR 42554/49992

 * gcc/configure.ac: Only pass -c to ranlib for darwin9 and earlier.
 * gcc/configure.ac: Regenerate.

 ... typo in file name fixed.

 Thanks,
 Ralf

 --- gcc/configure.ac(revision 177684)
 +++ gcc/configure.ac(working copy)
 @@ -821,11 +821,8 @@ gcc_AC_PROG_LN_S
 ACX_PROG_LN($LN_S)
 AC_PROG_RANLIB
 case ${host} in
 -*-*-darwin*)
 -  # By default, the Darwin ranlib will not treat common symbols as
 -  # definitions when  building the archive table of contents.  Other
 -  # ranlibs do that; pass an option to the Darwin ranlib that makes
 -  # it behave similarly.
 +*-*-darwin[[3-9]]*)
 +  # ranlib before Darwin10 requires the -c flag to look at common symbols.
   ranlib_flags=-c
   ;;
 *)


 I am still investigating this -- getting Ada bootstrapped on ppc has  
 taken some time...

Iain,
   Why don't you take the path of least resistance and just post your
proposed patch to unconditionally drop -c from ranlib_flags with a Cc
to the AdaCore developers. They should already be configured to test
on darwin.
 Jack


 not objecting to the patch - but I think we can go further
 ... as commented in the PR, I would say that we can likely remove the  
 special casing of ranlib completely for all Darwin (some more testing on 
 ppc/ada still under way).  So far OK on ppc/darwin8 and x86_64/darwin10
 (incl. ada on *86*)

 As things stand, darwin < 8 will not bootstrap GCC 4.6 or trunk with its
 native toolset; it requires the use of odcctools or similar to make use
 of newer versions of ld.  (thus, support of ancient darwin is
 conditional on use of a toolset from at least darwin 8 era).

 cheers
 Iain

Re: [PATCH, ARM] Generate conditional compares in Thumb2 state

2011-08-26 Thread Ramana Radhakrishnan

On 19 August 2011 11:06, Ramana Radhakrishnan
ramana.radhakrish...@linaro.org wrote:

 Regression test against cortex-M0/M3/M4 profile with -mthumb option
 doesn't show any new failures.

 Please test on ARM state as well and make sure there are no
 regressions before committing.


Jiangning told me privately that the test-results for v7-a were fine
for cross-testing for arm-eabi with C and C++.

And this is what I committed:


cheers
Ramana


2011-08-26  Jiangning Liu  jiangning@arm.com

   * config/arm/arm.md (*ior_scc_scc): Enable for Thumb2 as well.
   (*ior_scc_scc_cmp): Likewise.
   (*and_scc_scc): Likewise.
   (*and_scc_scc_cmp): Likewise.
   (*and_scc_scc_nodom): Likewise.
   (*cmp_ite0, *cmp_ite1, *cmp_and, *cmp_ior): Handle Thumb2.

2011-08-26  Jiangning Liu  jiangning@arm.com

   * gcc.target/arm/thumb2-cond-cmp-1.c: New.
   * gcc.target/arm/thumb2-cond-cmp-2.c: Likewise.
   * gcc.target/arm/thumb2-cond-cmp-3.c: Likewise.
   * gcc.target/arm/thumb2-cond-cmp-4.c: Likewise.


 Ok if no regressions.

 Ramana


 Thanks,
 -Jiangning

Index: gcc/config/arm/arm.md
===
--- gcc/config/arm/arm.md   (revision 178097)
+++ gcc/config/arm/arm.md   (working copy)
@@ -49,6 +49,15 @@
(DOM_CC_X_OR_Y   2)
   ]
 )
+;; conditional compare combination
+(define_constants
+  [(CMP_CMP 0)
+   (CMN_CMP 1)
+   (CMP_CMN 2)
+   (CMN_CMN 3)
+   (NUM_OF_COND_CMP 4)
+  ]
+)
 
 ;; UNSPEC Usage:
 ;; Note: sin and cos are no-longer used.
@@ -8980,40 +8989,85 @@
(set_attr length 8,12)]
 )
 
-;; ??? Is it worth using these conditional patterns in Thumb-2 mode?
 (define_insn "*cmp_ite0"
   [(set (match_operand 6 "dominant_cc_register" "")
(compare
 (if_then_else:SI
  (match_operator 4 "arm_comparison_operator"
-  [(match_operand:SI 0 "s_register_operand" "r,r,r,r")
-   (match_operand:SI 1 "arm_add_operand" "rI,L,rI,L")])
+  [(match_operand:SI 0 "s_register_operand"
+   "l,l,l,r,r,r,r,r,r")
+   (match_operand:SI 1 "arm_add_operand"
+   "lPy,lPy,lPy,rI,L,rI,L,rI,L")])
  (match_operator:SI 5 "arm_comparison_operator"
-  [(match_operand:SI 2 "s_register_operand" "r,r,r,r")
-   (match_operand:SI 3 "arm_add_operand" "rI,rI,L,L")])
+  [(match_operand:SI 2 "s_register_operand"
+   "l,r,r,l,l,r,r,r,r")
+   (match_operand:SI 3 "arm_add_operand"
+   "lPy,rI,L,lPy,lPy,rI,rI,L,L")])
  (const_int 0))
 (const_int 0)))]
-  "TARGET_ARM"
+  "TARGET_32BIT"
  "*
   {
-static const char * const opcodes[4][2] =
+static const char * const cmp1[NUM_OF_COND_CMP][2] =
 {
-  {\"cmp\\t%2, %3\;cmp%d5\\t%0, %1\",
-   \"cmp\\t%0, %1\;cmp%d4\\t%2, %3\"},
-  {\"cmp\\t%2, %3\;cmn%d5\\t%0, #%n1\",
-   \"cmn\\t%0, #%n1\;cmp%d4\\t%2, %3\"},
-  {\"cmn\\t%2, #%n3\;cmp%d5\\t%0, %1\",
-   \"cmp\\t%0, %1\;cmn%d4\\t%2, #%n3\"},
-  {\"cmn\\t%2, #%n3\;cmn%d5\\t%0, #%n1\",
-   \"cmn\\t%0, #%n1\;cmn%d4\\t%2, #%n3\"}
+  {\"cmp%d5\\t%0, %1\",
+   \"cmp%d4\\t%2, %3\"},
+  {\"cmn%d5\\t%0, #%n1\",
+   \"cmp%d4\\t%2, %3\"},
+  {\"cmp%d5\\t%0, %1\",
+   \"cmn%d4\\t%2, #%n3\"},
+  {\"cmn%d5\\t%0, #%n1\",
+   \"cmn%d4\\t%2, #%n3\"}
 };
+static const char * const cmp2[NUM_OF_COND_CMP][2] =
+{
+  {\"cmp\\t%2, %3\",
+   \"cmp\\t%0, %1\"},
+  {\"cmp\\t%2, %3\",
+   \"cmn\\t%0, #%n1\"},
+  {\"cmn\\t%2, #%n3\",
+   \"cmp\\t%0, %1\"},
+  {\"cmn\\t%2, #%n3\",
+   \"cmn\\t%0, #%n1\"}
+};
+static const char * const ite[2] =
+{
+  \"it\\t%d5\",
+  \"it\\t%d4\"
+};
+static const int cmp_idx[9] = {CMP_CMP, CMP_CMP, CMP_CMN,
+   CMP_CMP, CMN_CMP, CMP_CMP,
+   CMN_CMP, CMP_CMN, CMN_CMN};
 int swap =
   comparison_dominates_p (GET_CODE (operands[5]), GET_CODE (operands[4]));
 
-return opcodes[which_alternative][swap];
+output_asm_insn (cmp2[cmp_idx[which_alternative]][swap], operands);
+if (TARGET_THUMB2) {
+  output_asm_insn (ite[swap], operands);
+}
+output_asm_insn (cmp1[cmp_idx[which_alternative]][swap], operands);
+return \"\";
   }"
   [(set_attr "conds" "set")
-   (set_attr "length" "8")]
+   (set_attr "arch" "t2,t2,t2,t2,t2,any,any,any,any")
+   (set_attr_alternative "length"
+  [(const_int 6)
+   (const_int 8)
+   (const_int 8)
+   (const_int 8)
+   (const_int 8)
+   (if_then_else (eq_attr "is_thumb" "no")
+   (const_int 8)
+   (const_int 10))
+   (if_then_else (eq_attr "is_thumb" "no")
+   (const_int 8)
+   (const_int 10))
+   (if_then_else (eq_attr "is_thumb" "no")
+   (const_int 8)
+   (const_int 10))
+   (if_then_else (eq_attr "is_thumb" "no")
+   (const_int 8)
+   (const_int 10))])]
 )
 
 (define_insn "*cmp_ite1"
@@ -9021,35 +9075,81 @@
(compare
 (if_then_else:SI

[PATCH] Handle MEM_REF in decode_addr_const

2011-08-26 Thread Richard Guenther


Another missed piece, exposed by less MEM_REF -> ARRAY_REF folding.
Interestingly only for Ada testcases.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2011-08-26  Richard Guenther  rguent...@suse.de

* varasm.c (decode_addr_const): Handle MEM_REF[&X, OFF].

Index: gcc/varasm.c
===
*** gcc/varasm.c(revision 178096)
--- gcc/varasm.c(working copy)
*** decode_addr_const (tree exp, struct addr
*** 2592,2597 
--- 2592,2603 
 * tree_low_cst (TREE_OPERAND (target, 1), 0));
  target = TREE_OPERAND (target, 0);
}
+   else if (TREE_CODE (target) == MEM_REF
+   && TREE_CODE (TREE_OPERAND (target, 0)) == ADDR_EXPR)
+   {
+ offset += mem_ref_offset (target).low;
+ target = TREE_OPERAND (TREE_OPERAND (target, 0), 0);
+   }
else if (TREE_CODE (target) == INDIRECT_REF
	   && TREE_CODE (TREE_OPERAND (target, 0)) == NOP_EXPR
	   && TREE_CODE (TREE_OPERAND (TREE_OPERAND (target, 0), 0))
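The new branch folds `MEM_REF[&X, OFF]` into the decoded base `X` plus a byte offset, in the same way the existing branches peel off component and array references. The toy model below walks a chain of references down to the base declaration and sums the offsets. The node kinds and names are hypothetical and do not correspond to GCC's tree API.

```c
#include <stddef.h>

/* Hypothetical node kinds for a toy reference chain.  */
enum toy_code { TOY_DECL, TOY_COMPONENT_REF, TOY_MEM_REF_ADDR };

struct toy_ref
{
  enum toy_code code;
  long byte_offset;             /* field offset, or OFF of MEM_REF[&X, OFF] */
  const struct toy_ref *inner;  /* operand 0; NULL for TOY_DECL */
};

/* Walk from the outermost reference to the base declaration, summing
   byte offsets into *OFFSET, and return the base.  TOY_MEM_REF_ADDR
   mirrors the new branch: it adds OFF and continues with X.  */
static const struct toy_ref *
toy_decode (const struct toy_ref *target, long *offset)
{
  *offset = 0;
  while (target->code != TOY_DECL)
    {
      *offset += target->byte_offset;
      target = target->inner;
    }
  return target;
}
```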

Re: [v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h

2011-08-26 Thread Paolo Carlini


Hi,

Ok for mainline if bootstraps pass?
Not a comment strictly about this patch, but why we have things like #if 
__cplusplus >= 199711L anywhere? For sure the library is not supposed to 
be used together with old C++ front-ends.


Paolo.

Re: Rename across basic block boundaries

2011-08-26 Thread Richard Sandiford

Rather than using global variables and then copying them into a bb
structure, would it be possible to write directly into the bb structure?
The answer's probably no, just asking. :-)

Bernd Schmidt ber...@codesourcery.com writes:
   * regrename.c (struct du_head): Make nregs signed.
   (scan_rtx_reg, scan_rtx_address, dump_def_use_chain): Remove
   declarations.

This bit was split out.

   (mark_conflict, create_new_chain): Move upwards in the file.

Same here.  Should mention the change to create_new_chain's interface
though.

 - 2. For each chain, the set of possible renaming registers is computed.
 + 2. We try combine the local chains across basic block boundaries by
 +comparing chains that were open at the start or end of a block to
 + those in successor/predecessor blocks.

try to combine

 +/* Dump all def/use chains, starting at id FROM.  */
  
  static void
 -dump_def_use_chain (struct du_head *head)
 +dump_def_use_chain (int from)
  {
 -  while (head)
 +  du_head_p head;
 +  int i;
 +  FOR_EACH_VEC_ELT (du_head_p, id_to_chain, i, head)
  {
struct du_chain *this_du = head->first;
 +  if (i < from)
 + continue;
fprintf (dump_file, "Register %s (%d):",
  reg_names[head->regno], head->nregs);
while (this_du)

I know it's only a dumping function, but maybe this'd be a good excuse
to add:

#define FOR_EACH_VEC_ELT_FROM(T, V, I, P, FROM) \
  for (I = (FROM); VEC_iterate (T, (V), (I), (P)); ++(I))

 +/* A structure recording information about each basic block.  It is saved
 +   and restored around basic block boundaries.  */
 +struct bb_rename_info

Probably worth saying here or elsewhere that the bb->aux field points
to this information and that (more importantly) the bb->aux is null
for blocks that could not be optimised, including the exit block.

 +/* Record in RI that the block corresponding to it has an incoming
 +   live value, described by CHAIN.  */
 +static void
 +set_incoming_from_chain (struct bb_rename_info *ri, du_head_p chain)
 +{
 +  int min_reg, max_reg, i;
 +  int incoming_nregs = ri->incoming[chain->regno].nregs;
 +  int nregs;
 +
 +  /* If we've recorded the same information before, everything is fine.  */
 +  if (incoming_nregs == chain->nregs)
 +{
 +  if (dump_file)
 + fprintf (dump_file, "reg %d/%d already recorded\n",
 +  chain->regno, chain->nregs);
 +  return;
 +}
 +
 +  /* If we have no information for any of the involved registers, update
 + the incoming array.  */
 +  nregs = chain->nregs;
 +  while (nregs-- > 0)
 +if (ri->incoming[chain->regno + nregs].nregs != 0
 + || ri->incoming[chain->regno + nregs].unusable)
 +  break;
 +  if (nregs < 0)
 +{
 +  nregs = chain->nregs;
 +  ri->incoming[chain->regno].nregs = nregs;
 +  while (nregs-- > 1)
 + ri->incoming[chain->regno + nregs].nregs = -nregs;
 +  if (dump_file)
 + fprintf (dump_file, "recorded reg %d/%d\n",
 +  chain->regno, chain->nregs);
 +  return;
 +}
 +
 +  /* There must be some kind of conflict.  Set the unusable for all
 + overlapping registers.  */
 +  min_reg = chain->regno;
 +  if (incoming_nregs < 0)
 +min_reg += incoming_nregs;
 +  max_reg = chain->regno + chain->nregs;
 +  for (i = min_reg; i < max_reg; i++)
 +ri->incoming[i].unusable = true;

In the incoming_nregs < 0 case, we only need to set
ri->incoming[chain->regno + incoming_nregs] itself, right,
not the other registers between that and ri->incoming[chain->regno]?
If so, I think it'd be clearer to have:

  /* There must be some kind of conflict.  Prevent both the old and
     new ranges from being used.  */
  if (incoming_nregs < 0)
    ri->incoming[chain->regno + incoming_nregs].unusable = true;
  for (i = 0; i < chain->nregs; i++)
    ri->incoming[chain->regno + i].unusable = true;

When I first looked at the code, I was wondering why we changed every
register in (chain->regno + incoming_nregs, chain->regno), but none in
[chain->regno + chain->nregs, OLD_END).  Seems like we should do neither
(as in the above suggestion) or both.
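The suggested conflict handling can be exercised on a toy version of the `incoming` array. The sketch assumes the encoding set_incoming_from_chain uses above: `nregs > 0` on the first register of a recorded value, `-k` on the register `k` places after it, and `0` when nothing is recorded. `struct toy_incoming` and `mark_conflict_ranges` are hypothetical names, and only the conflict path is modeled.

```c
#include <stdbool.h>

#define TOY_NREGS 16

/* Toy per-register incoming information.  */
struct toy_incoming
{
  int nregs;
  bool unusable;
};

/* Mark the start of the old range (if REGNO lies inside one) and every
   register of the new range REGNO .. REGNO + NREGS - 1 as unusable.  */
static void
mark_conflict_ranges (struct toy_incoming *incoming, int regno, int nregs)
{
  int incoming_nregs = incoming[regno].nregs;
  int i;

  /* REGNO is a later register of an older value: poison its start.  */
  if (incoming_nregs < 0)
    incoming[regno + incoming_nregs].unusable = true;
  /* Poison every register of the new value.  */
  for (i = 0; i < nregs; i++)
    incoming[regno + i].unusable = true;
}
```

With an old 4-register value in registers 2..5 and a new 2-register chain at 4, this marks registers 2, 4 and 5, leaving 3 alone.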

 +  /* Process all basic blocks by using a worklist, adding unvisited successor
 + blocks whenever we reach the end of one basic blocks.  This ensures that
 + whenever possible, we only process a block after at least one of its
 + predecessors, which provides a seeding effect to make the logic in
 + set_incoming_from_chain and init_rename_info useful.  */

Wouldn't a reverse post-order (inverted_post_order_compute) allow even
more pre-opening (as well as being less code)?
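For reference, a reverse post-order is the reverse of the order in which a depth-first search finishes blocks; every block reachable from the entry then comes after all of its predecessors, apart from predecessors along back edges. A generic sketch on a hypothetical array-based CFG follows. It is the textbook construction, not a reproduction of GCC's `inverted_post_order_compute`.

```c
#include <stdbool.h>

#define MAX_BLOCKS 16

/* Hypothetical CFG: succs[b] lists the successors of block b,
   terminated by -1.  */
struct toy_cfg
{
  int n_blocks;
  int succs[MAX_BLOCKS][MAX_BLOCKS + 1];
};

/* Depth-first search appending each block to POST once all of its
   unvisited successors are finished.  */
static void
dfs_post (const struct toy_cfg *cfg, int b, bool *visited,
	  int *post, int *n_post)
{
  int i;

  visited[b] = true;
  for (i = 0; cfg->succs[b][i] != -1; i++)
    if (!visited[cfg->succs[b][i]])
      dfs_post (cfg, cfg->succs[b][i], visited, post, n_post);
  post[(*n_post)++] = b;
}

/* Fill RPO with the blocks reachable from ENTRY in reverse post-order
   and return their number.  */
static int
reverse_post_order (const struct toy_cfg *cfg, int entry, int *rpo)
{
  bool visited[MAX_BLOCKS] = { false };
  int post[MAX_BLOCKS];
  int n = 0, i;

  dfs_post (cfg, entry, visited, post, &n);
  for (i = 0; i < n; i++)
    rpo[i] = post[n - 1 - i];
  return n;
}
```

On a diamond 0 -> {1, 2}, 1 -> 3, 2 -> 3 this yields 0, 2, 1, 3, so block 3 is visited only after both of its predecessors.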

Looked good to me otherwise.

Richard

Re: [v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h

2011-08-26 Thread Rainer Orth

Hi Paolo,

 Ok for mainline if bootstraps pass?
 Not a comment strictly about this patch, but why we have things like #if
 __cplusplus >= 199711L anywhere? For sure the library is not supposed to be
 used together with old C++ front-ends.

I thought about this myself, but at least the overloads are only present
with __cplusplus >= 199711L.  I think it's best to match this to avoid
strange problems if a user plays strange games with __cplusplus.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [testsuite, i386] Fix for PR50185

2011-08-26 Thread H.J. Lu

On Fri, Aug 26, 2011 at 5:04 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 According to Jakub's input, I've updated test to scan instruction, not
 pattern name.

 Is it ok?

 Thanks, K

 On Fri, Aug 26, 2011 at 3:45 PM, Kirill Yukhin kirill.yuk...@gmail.com 
 wrote:
 Hi,
 Here is a fix for http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

 testsuite/ChangeLog entry:

 2011-08-26  Kirill Yukhin  kirill.yuk...@intel.com

        PR testsuite/50185
        * gcc.target/i386/avx2-vmovmskb-2.c: Rename to ...
        * gcc.target/i386/avx2-vpmovmskb-2.c: ... this. Update.

 Test passes.
 Ok for trunk?

 Thanks, K



Please check ALL AVX2 tests to see if they have similar problems.

-- 
H.J.

Re: [v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h

2011-08-26 Thread Paolo Carlini


On 8/26/11 2:59 PM, Rainer Orth wrote:

Hi Paolo,


Ok for mainline if bootstraps pass?

Not a comment strictly about this patch, but why we have things like #if
__cplusplus >= 199711L anywhere? For sure the library is not supposed to be
used together with old C++ front-ends.

I thought about this myself, but at least the overloads are only present
with __cplusplus >= 199711L.
I don't understand: isn't __cplusplus now *always* >= 199711L? Or you 
want to protect vs the user undefining __cplusplus and then defining it 
to a different value?!? I don't have the Standard at hand (in theory I'm 
on vacation ;), maybe Marc can help, but I don't think it's legal, is it?


Paolo.

Re: [PATCH, i386, testsuite] FMA intrinsics

2011-08-26 Thread H.J. Lu

On Fri, Aug 26, 2011 at 1:41 AM, Ilya Tocar tocarip.in...@gmail.com wrote:
 So if this is ok can someone please commit it?

 2011/8/25 Ilya Tocar tocarip.in...@gmail.com:
 Fixed.

 Changelog:

 2011-08-25  Ilya Tocar  ilya.to...@intel.com

             * config/i386/fmaintrin.h: New.
             * config.gcc: Add fmaintrin.h.
             * config/i386/i386.c
            (enum ix86_builtins) IX86_BUILTIN_VFMADDSS3: New.
             IX86_BUILTIN_VFMADDSD3: Likewise.
             * config/i386/sse.md (fmai_vmfmadd_<mode>): New.
             (*fmai_fmadd_<mode>): Likewise.
             (*fmai_fmsub_<mode>): Likewise.
             (*fmai_fnmadd_<mode>): Likewise.
             (*fmai_fnmsub_<mode>): Likewise.
             * config/i386/x86intrin.h: Add fmaintrin.h.


Please include fmaintrin.h in immintrin.h, not x86intrin.h, since
immintrin.h should include all Intel intrinsics.


-- 
H.J.

Re: [PATCH] [JAVA] patch for Java on RTEMS

2011-08-26 Thread Ralf Wildenhues

* Jie Liu wrote on Mon, Aug 15, 2011 at 04:07:36PM CEST:
  Looks OK, but there is no ChangeLog.  Do you have copyright
  assignment?
 
 Have added ChangeLog to the patch, please see the attachment. And I
 think I have copyright assignment, because I have Free Software
 Foundation paperwork, as ASSIGNMENT - GNU GCC ... JIE RT688742

The build-system specific parts of the patch are OK, provided
that they have been sufficiently tested.  When committing
top-level changes, please make sure they are synced to the
src repository; if you don't have write access to src, please
ask someone who has to do that for you.

I think you still need approval for the boehm-gc related changes.

Please also try to send patches with some text MIME type.

Thanks,
Ralf

 --- boehm-gc/ChangeLog(revision 172224)
 +++ boehm-gc/ChangeLog(working copy)
 @@ -1,3 +1,22 @@
 +2011-08-15  Jie Liu  lj8...@gmail.com
 + * configure.ac: Add configure for RTEMS.
 + * configure: Add configure for RTEMS.
 + * include/gc_config.h.in: Add GC_RTEMS_PTHREADS for RTEMS.
 + * mach_dep.c (GC_with_callee_saves_pushed): Use setjmp for
 + RTEMS.
 + * include/gc_config_macros.h: Define GC_PTHREADS for rtems.
 + * include/private/gcconfig.h: Add configure for RTEMS/i386;
 + Use calloc for RTEMS to GET_MEM.
 + * pthread_stop_world.c (GC_stop_init): Add judge SA_RESTART
 + for operating system; Use sigprocmask unblock the signal
 + for RTEMS.
 + * pthread_support.c: Define USE_PTHREAD_SPECIFIC for RTEMS;
 + Do not include sys/mman.h for RTEMS; Add default GC_nprocs
 + for RTEMS.
 + * gc_dlopen.c: Do not include dlfcn.h for RTEMS.
 + * os_dep.c: Do not use auxiliary routines for obtaining 
 + memory from RTEMS.

 --- ChangeLog (revision 172224)
 +++ ChangeLog (working copy)
 @@ -1,3 +1,8 @@
 +2011-08-15  Jie Liu  lj8...@gmail.com
 +
 + * configure.ac (*-*-rtems*): Remove ${libgcj} in nonconfigdirs.
 + * configure: Regenerate.


 --- libjava/ChangeLog (revision 172224)
 +++ libjava/ChangeLog (working copy)
 @@ -1,3 +1,13 @@
 +2011-08-15  Jie Liu  lj8...@gmail.com
 +
 + * configure.ac (THREADS): Add configuration for RTEMS.
 + * configure.host (host): Add configuration for RTEMS.
 + * include/config.h.in: Add RTEMS_PTHREADS for RTEMS.
 + * configure: Add configure for RTEMS.
 + * classpath/native/fdlibm/mprec.c: Remove _mprec_log10 for RTEMS.
 + * posix-threads.cc: Use SIGHUP for INTR on RTEMS.
 + * java/lang/natClass.cc: Undef HAVE_TLS for RTEMS.

Re: PATCH: PR middle-end/49721: convert_memory_address_addr_space may generate invalid new insns

2011-08-26 Thread Richard Sandiford

H.J. Lu hjl.to...@gmail.com writes:
 On Sun, Aug 14, 2011 at 9:22 AM, H.J. Lu hjl.to...@gmail.com wrote:
 Hi,

 This patch is needed for x32 and only affects x32.  Any comments/objections
 to apply this to finish x32 support?

 Thanks.


 H.J.
 
 On Thu, Aug 11, 2011 at 6:25 AM, H.J. Lu hjl.to...@gmail.com wrote:
 Hi,

 This is the last patch needed for x32 support.
 convert_memory_address_addr_space is called to convert a memory address
 without overflow/underflow.  It should be safe to transform

 (zero_extend:DI (plus:SI (FOO:SI) (const_int Y)))

 to

 (plus:DI (zero_extend:DI (FOO:SI)) (const_int Y))

 GCC only works this way.  Any comments?

 Thanks.

 H.J.
 
 On Sun, Aug 7, 2011 at 1:08 PM, H.J. Lu hongjiu...@intel.com wrote:
 Hi,

 We transform

 (ptr_extend:DI (plus:SI (FOO:SI) (const_int Y)))

 to

 (plus:DI (ptr_extend:DI (FOO:SI)) (const_int Y))

 since this is how Pmode != ptr_mode is supported even if the resulting
 address may overflow/underflow.   It is also true for x32 which has
 zero_extend instead of ptr_extend.  I have tried different approaches
 to avoid transforming

 (zero_extend:DI (plus:SI (FOO:SI) (const_int Y)))

 to

 (plus:DI (zero_extend:DI (FOO:SI)) (const_int Y))

 without success.  This patch relaxes the condition to check
 POINTERS_EXTEND_UNSIGNED != 0 instead of POINTERS_EXTEND_UNSIGNED < 0
 to cover both ptr_extend and zero_extend. We can investigate a better
 approach for ptr_extend and zero_extend later.  For now, I believe it
 is the safest way to support ptr_extend and zero_extend.

 Any comments?

 Thanks.


 H.J.

 I am checking in this patch, which only affects x32
 and nothing else.  This one character change, from

 POINTERS_EXTEND_UNSIGNED < 0

 to

 POINTERS_EXTEND_UNSIGNED != 0

 creates a working x32 GCC. This isn't perfect. I have
 tried many different approaches without any success.
 I will revisit it if we run into any problems with x32
 applications.

Sorry, I know it's frustrating when things don't get reviewed,
but I strongly object to a nonobvious patch like this being applied
without approval.

(And for the record, I can't approve it. :-))

Richard
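The disputed transformation, rewriting (zero_extend:DI (plus:SI FOO (const_int Y))) as (plus:DI (zero_extend:DI FOO) (const_int Y)), is only equivalent when the 32-bit addition does not wrap, which is the "without overflow/underflow" assumption in the thread. A minimal C sketch of the two RTL shapes, assuming a 32-bit SImode, a 64-bit DImode, and a DImode const_int that holds the sign-extended Y (the function names are hypothetical):

```c
#include <stdint.h>

/* (zero_extend:DI (plus:SI foo (const_int y))):
   add in 32 bits (wrapping modulo 2^32), then zero-extend.  */
uint64_t
extend_after_add (uint32_t foo, int32_t y)
{
  uint32_t sum = foo + (uint32_t) y;
  return (uint64_t) sum;
}

/* (plus:DI (zero_extend:DI foo) (const_int y)):
   zero-extend first, then add the sign-extended constant in 64 bits.  */
uint64_t
add_after_extend (uint32_t foo, int32_t y)
{
  return (uint64_t) foo + (uint64_t) (int64_t) y;
}
```

For foo = 0x1000 and y = 0x10 both shapes give 0x1010, but for foo = 0xfffffff0 and y = 0x20 the first gives 0x10 and the second 0x100000010, so the rewrite is only safe if such wrap-around never happens for the addresses involved.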

Re: [PATCH] [JAVA] patch for Java on RTEMS

2011-08-26 Thread Ralf Corsepius


On 08/14/2011 03:03 PM, Jie Liu wrote:

Hi,

I have added the boehm-gc patch and the configure for gcc patch to the
patch attached. So we can add this patch and then compile gcj for
RTEMS.

Best Regards,
Jie


--- boehm-gc/include/private/gcconfig.h (revision 172224)
+++ boehm-gc/include/private/gcconfig.h (working copy)
..
@@ -1297,6 +1302,19 @@
 #  define STACKBOTTOM ((ptr_t)0xc000)
 #  define DATAEND  /* not needed */
 #   endif
+#   ifdef RTEMS
+#   define OS_TYPE RTEMS
+#   include <sys/unistd.h>

Why <sys/unistd.h>?

<sys/unistd.h> is not supposed to be accessed directly.
This likely should be a plain simple #include <unistd.h>

Ralf

Re: [testsuite, i386] Fix for PR50185

2011-08-26 Thread Kirill Yukhin

Hi guys,
Thanks for your objections.

HJ, I scanned all AVX2 tests. Every test has at least a tab, which
distinguishes it from the filename:
$ pwd
/export/users/kyukhin/ws/gcc/gcc/testsuite/gcc.target/i386
$ grep scan-assembler avx2-* |grep -v \t |wc -l
0

Uros, you're right. The patch contained a useless file. An updated one is attached.

Thanks, K


On Fri, Aug 26, 2011 at 5:04 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 5:04 AM, Kirill Yukhin kirill.yuk...@gmail.com 
 wrote:
 According to Jakub's input, I've updated test to scan instruction, not
 pattern name.

 Is it ok?

 Thanks, K

 On Fri, Aug 26, 2011 at 3:45 PM, Kirill Yukhin kirill.yuk...@gmail.com 
 wrote:
 Hi,
 Here is a fix for http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

 testsuite/ChangeLog entry:

 2011-08-26  Kirill Yukhin  kirill.yuk...@intel.com

        PR testsuite/50185
        * gcc.target/i386/avx2-vmovmskb-2.c: Rename to ...
        * gcc.target/i386/avx2-vpmovmskb-2.c: ... this. Update.

 Test passes.
 Ok for trunk?

 Thanks, K



 Please check ALL AVX2 tests to see if they have similar problems.

 --
 H.J.



pr50185-3.gcc.patch
Description: Binary data

Re: [testsuite, i386] Fix for PR50185

2011-08-26 Thread H.J. Lu

On Fri, Aug 26, 2011 at 6:45 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hi guys,
 Thanks for your objections.

 HJ, I scanned all AVX2 tests. Every test has at least a tab, which
 distinguishes it from the filename:
 $ pwd
 /export/users/kyukhin/ws/gcc/gcc/testsuite/gcc.target/i386
 $ grep scan-assembler avx2-* |grep -v \t |wc -l
 0

 Uros, you're right. The patch contained a useless file. An updated one is attached.

 Thanks, K


 On Fri, Aug 26, 2011 at 5:04 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 5:04 AM, Kirill Yukhin kirill.yuk...@gmail.com 
 wrote:
 According to Jakub's input, I've updated test to scan instruction, not
 pattern name.

 Is it ok?

 Thanks, K

 On Fri, Aug 26, 2011 at 3:45 PM, Kirill Yukhin kirill.yuk...@gmail.com 
 wrote:
 Hi,
 Here is a fix for http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50182

 testsuite/ChangeLog entry:

 2011-08-26  Kirill Yukhin  kirill.yuk...@intel.com

        PR testsuite/50185
        * gcc.target/i386/avx2-vmovmskb-2.c: Rename to ...
        * gcc.target/i386/avx2-vpmovmskb-2.c: ... this. Update.

 Test passes.
 Ok for trunk?

 Thanks, K



 Please check ALL AVX2 tests to see if they have similar problems.

Thanks.



-- 
H.J.

Passes uses rather than defs to df_set_dead_notes_for_mw

2011-08-26 Thread Richard Sandiford

As described here:

http://gcc.gnu.org/ml/gcc/2011-08/msg00294.html

df is currently failing to create REG_DEAD notes for the last use
of a multi-reg hard register.  This appears to be a typo:
df_set_dead_notes_for_mw is supposed to handle uses, and the comment
above it says so, but df_note_bb_compute is passing defs instead.

Bootstrapped & regression-tested on x86_64-linux-gnu.  OK to install?

Richard


gcc/
* df-problems.c (df_note_bb_compute): Pass uses rather than defs
to df_set_dead_notes_for_mw.

Index: gcc/df-problems.c
===
--- gcc/df-problems.c   2011-08-16 16:27:24.641037124 +0100
+++ gcc/df-problems.c   2011-08-26 14:48:48.521897439 +0100
@@ -3376,7 +3376,7 @@ df_note_bb_compute (unsigned int bb_inde
       while (*mws_rec)
         {
           struct df_mw_hardreg *mws = *mws_rec;
-          if ((DF_MWS_REG_DEF_P (mws))
+          if (DF_MWS_REG_USE_P (mws)
               && !df_ignore_stack_reg (mws->start_regno))
             {
               bool really_add_notes = debug_insn != 0;
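Why the notes must come from uses can be seen in a deliberately simplified backward-liveness model. This is a hypothetical sketch, not GCC's df code: each insn's hard-register uses and defs are bitmasks, so a multi-word hard register (say a DImode value held in two 32-bit registers) is just several bits in one mask. A register gets a REG_DEAD note on the insn that uses it when it is not live below that insn; computing this from defs instead, as the typo did, misses the last use of the register pair.

```c
#include <stdint.h>

/* Toy insn: the hard registers it reads and writes, one bit each.  */
struct toy_insn
{
  uint32_t uses;
  uint32_t defs;
};

/* Scan a block backwards.  dead[i] receives the registers that would get a
   REG_DEAD note on insn i: those it uses that are not live after it, once
   its own defs have been removed from the live set.  */
void
toy_dead_notes (const struct toy_insn *insns, int n, uint32_t live_out,
                uint32_t *dead)
{
  uint32_t live = live_out;
  for (int i = n - 1; i >= 0; i--)
    {
      live &= ~insns[i].defs;           /* a def ends the value's lifetime */
      dead[i] = insns[i].uses & ~live;  /* used here, not needed below */
      live |= insns[i].uses;            /* used registers are live above */
    }
}
```

With insn 0 setting the pair r0/r1, insn 1 reading the pair and setting r2, and insn 2 reading r2, the pair dies at insn 1 and r2 at insn 2.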

Re: [v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h

2011-08-26 Thread Rainer Orth

Paolo,

 Ok for mainline if bootstraps pass?
 Not a comment strictly about this patch, but why do we have things like #if
 __cplusplus >= 199711L anywhere? For sure the library is not supposed to be
 used together with old C++ front-ends.
 I thought about this myself, but at least the overloads are only present
 with __cplusplus >= 199711L.
 I don't understand: isn't __cplusplus now *always* >= 199711L? Or you want
 to protect vs the user undefining __cplusplus and then defining it to a
 different value?!? I don't have the Standard at hand (in theory I'm in

exactly: just g++ -D__cplusplus=1 or something.

 vacation ;), maybe Marc can help, but I don't think it's legal, is it?

No idea.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [PATCH, middle-end]: Fix PR50083: All 32-bit fortran tests fail on 32-bit Solaris

2011-08-26 Thread Rainer Orth

Uros,

 I will wait for the confirmation from Rainer before committing the patch.

an i386-pc-solaris2.9 bootstrap just finished, and all the failures are
gone.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: New automaton_option collapse-ndfa

2011-08-26 Thread Vladimir Makarov


On 08/25/2011 06:21 PM, Bernd Schmidt wrote:

On 07/18/11 18:47, Vladimir Makarov wrote:

But I guess comb-vector is popular for a reason.  We could tolerate slow
compression time because it is done once but worse compression and
slower access would have a really bad impact on the compiler time.

With some fixes that I need to make to the C6X machine description, comb
vector generation time is no longer tolerable. Ok to apply the following
patch? (Bootstrapped and tested on i686-linux).


Ok.  Thanks, Bernd.
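Comb-vector compression, mentioned above, overlays the rows of a sparse transition table into one shared vector, with each row shifted by a per-row base offset and a check array recording which row owns each slot. The textbook sketch below is a simplified illustration, not genautomata's code; the sizes and names are hypothetical. It shows both sides of the trade-off described in the thread: lookup is one index computation plus an ownership check, while the first-fit search for each row's base is where generation time goes as tables grow.

```c
#define NSTATES 3
#define NSYMS 4
#define EMPTY (-1)
/* Enough for the worst case here: every row placed past all earlier ones.  */
#define VEC_SIZE (NSTATES * NSYMS + NSYMS)

int comb_base[NSTATES];
int comb_value[VEC_SIZE];
int comb_check[VEC_SIZE];

/* Place each row at the first base where all of its non-empty entries
   land on free slots, then record the entries and their owner row.  */
void
comb_build (const int table[NSTATES][NSYMS])
{
  for (int i = 0; i < VEC_SIZE; i++)
    {
      comb_value[i] = EMPTY;
      comb_check[i] = -1;
    }
  for (int s = 0; s < NSTATES; s++)
    {
      int base = 0;
      for (;; base++)
        {
          int fits = 1;
          for (int c = 0; c < NSYMS && fits; c++)
            if (table[s][c] != EMPTY && comb_check[base + c] != -1)
              fits = 0;
          if (fits)
            break;
        }
      comb_base[s] = base;
      for (int c = 0; c < NSYMS; c++)
        if (table[s][c] != EMPTY)
          {
            comb_value[base + c] = table[s][c];
            comb_check[base + c] = s;
          }
    }
}

/* Constant-time lookup: the slot only counts if row s owns it.  */
int
comb_lookup (int s, int c)
{
  int i = comb_base[s] + c;
  return comb_check[i] == s ? comb_value[i] : EMPTY;
}
```

For a 3x4 table with five non-empty entries the rows interleave into the first five slots of the vector instead of twelve.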

Re: Passes uses rather than defs to df_set_dead_notes_for_mw

2011-08-26 Thread Kenneth Zadeck


this looks right to me.  ok for commit.



On 08/26/2011 09:54 AM, Richard Sandiford wrote:

As described here:

 http://gcc.gnu.org/ml/gcc/2011-08/msg00294.html

df is currently failing to create REG_DEAD notes for the last use
of a multi-reg hard register.  This appears to be a typo:
df_set_dead_notes_for_mw is supposed to handle uses, and the comment
above it says so, but df_note_bb_compute is passing defs instead.

Bootstrapped & regression-tested on x86_64-linux-gnu.  OK to install?

Richard


gcc/
* df-problems.c (df_note_bb_compute): Pass uses rather than defs
to df_set_dead_notes_for_mw.

Index: gcc/df-problems.c
===
--- gcc/df-problems.c   2011-08-16 16:27:24.641037124 +0100
+++ gcc/df-problems.c   2011-08-26 14:48:48.521897439 +0100
@@ -3376,7 +3376,7 @@ df_note_bb_compute (unsigned int bb_inde
       while (*mws_rec)
         {
           struct df_mw_hardreg *mws = *mws_rec;
-          if ((DF_MWS_REG_DEF_P (mws))
+          if (DF_MWS_REG_USE_P (mws)
               && !df_ignore_stack_reg (mws->start_regno))
             {
               bool really_add_notes = debug_insn != 0;

Re: [PATCH] Add infrastructure to merge standard builtin enums with backend builtins

2011-08-26 Thread Michael Meissner

On Fri, Aug 26, 2011 at 10:19:24AM +0200, Richard Guenther wrote:
 On Thu, Aug 25, 2011 at 10:35 PM, Michael Meissner
 meiss...@linux.vnet.ibm.com wrote:
  On Wed, Aug 24, 2011 at 11:06:55AM +0200, Richard Guenther wrote:
  This basically would make DECL_BUILT_IN_CLASS no longer necessary
  if all targets were converted, right?  (We don't currently have any
  BUILT_IN_FRONTEND builtins).  That would sound appealing if this
  patch weren't a partial transition ;)
 
  Or we could reduce it to 1 bit if we aren't going to change all of the
  backends.
 
  Now for the possible downsides.  How can we reliably distinguish
  middle-end from target builtins for purpose of lazy initialization?
  Doesn't this complicate the idea of pluggable targets, thus
  something like a hybrid ppc / spu compiler?  In this light merging
  middle-end and target builtin enums and arrays sounds like a step
  backward.
 
  If we are willing to pay the storage costs, we could have 1 or 2 bytes for
  the builtin owner and 2 bytes for the builtin index, and then reserve 0 for
  standard builtins and 1 for machine-dependent builtins.  However, you still
  have the potential problem that sooner or later somebody else will omit the
  checks.
 
 I don't think the fact that you can only index BUILT_IN_NORMAL builtins
 in built_in_decls is an issue worth thinking about at all.  It's simply
 bugs.

I've probably spent about 2-3 weeks total tracking down those bugs in the past,
because they are hard to pin down, but if we don't want to merge the two
numbers it isn't a deal breaker to me.  It was more a case of fixing the
problem while I'm playing in the builtin space anyway.

  We could reserve a fixed range for plugin builtins if you think that is
  desirable.
 
 Oh, plugin builtins - I didn't even think about the possibility of having
 those ;)
 
 In the end I think we should stick with BUILT_IN_CLASS and maybe
 add BUILT_IN_PLUGIN then ;)

I think if we do this, we should re-use the front end builtin class, and add
methods by which front ends can add their builtins to the main list.
Otherwise we need to grow the class by 1 bit.

  What I _do_ like is having common machinery for defining builtins.
  Though instead of continuing the .def file way with all the current
  warts of ways of adding attributes, etc. to builtins I would have
  preferred a genbuiltins.c program that can parse standard C
  declarations and generate whatever is necessary to setup the
  builtin decls.  Thus, instead of
 
  DEF_GCC_BUILTIN        (BUILT_IN_CLZ, clz, BT_FN_INT_UINT,
  ATTR_CONST_NOTHROW_LEAF_LIST)
 
  have simply
 
  int __builtin_clz (unsigned int) __attribute__((const,nothrow,leaf));
 
  in a header file which genbuiltins.c would parse.  My first idea
  when discussing this was a -fgenbuiltins flag to the C frontend
  (because that already can do all the parsing ...), but Micha suggested
  a parser that can deal with the above is easy enough to re-implement.
 
  Yes, that is certainly do-able.  My main intention is to see what kind of
  infrastructure people wanted before changing all of the ppc builtins.
 
 Sure.  I agree that all the duplicated code we have in backends for a
 way to create target builtins, defining enums (or not) for them and
 having a way to reference them for targetm.builtin_decl (or not) is bad.
 But unifying those, or providing common infrastructure for them should
 be orthogonal to the issue whether we want to merge the builtin
 classes or their storage in some way (I think we don't).  It would of
 course be nice if the infrastructure to create target builtins were
 generic enough to eventually handle builtin creation in the middle-end
 (and the frontends) as well.
 
  Hm, I guess this pushes back a bit on your patch.  Sorry for that.
  If you're not excited to try the above idea, can you split out the
  pieces that do the .def file thing for rs6000, keeping the separation
  of md and middle-end builtin arrays and enums?
 
  I have several goals for the 4.7 time frame:
 
   1) Make target attribute and pragma enable appropriate machine dependent
      builtins;
 
 That's now something completely new ;)  Why do we need builtins for this?

I ran out of time when I added target pragma support in 4.6 to enable the
builtins for target functions.  We don't need new builtins, but the ppc backend
needs to enable the builtins that exist when the target is selected, which the
x86 already does.  In the end, I want to be able to do:

void v4sf_add (float *, float *, float *, size_t)
__attribute__ ((__ifunc__ ("resolve_v4sf_add")));

static void v4sf_power7_add (float *, float *, float *, size_t)
__attribute__ ((__target__ ("cpu=power7")));

static void v4sf_altivec_add (float *, float *, float *, size_t)
__attribute__ ((__target__ ("altivec")));

static void v4sf_generic_add (float *, float *, float *, size_t);

static void *resolve_v4sf_add (void);

static void

Re: [PATCH] Fix -Wunused-but-set-* in C with stmt expression and array in it (PR c/50179)

2011-08-26 Thread Joseph S. Myers

On Fri, 26 Aug 2011, Jakub Jelinek wrote:

 Hi!
 
 As the following testcase shows, if the last expression in a statement
 expression is an array, mark_exp_read wasn't called on it.
 Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
 ok for trunk/4.6?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com
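The shape of code affected by PR c/50179 can be sketched as follows. This is a hypothetical illustration, not the testcase from Jakub's patch: the last expression of the GNU statement expression is the array `buf`, which the C front end converts to a pointer. Before the fix, that conversion path did not mark `buf` as read, so -Wunused-but-set-variable could warn about it spuriously even though its contents are clearly used.

```c
#include <string.h>

/* The statement expression yields `buf` decayed to char *, which is
   then written through and read back.  */
size_t
stmt_expr_array (void)
{
  char buf[16];
  return strlen (strcpy (({ buf; }), "hello"));
}
```

This relies on the GNU statement-expression extension, so it needs GCC (or a compiler that implements the extension) and is not ISO C.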

Re: [testsuite, i386] Fix for PR50185

2011-08-26 Thread Jakub Jelinek

Hi!

On Fri, Aug 26, 2011 at 06:04:20AM -0700, H.J. Lu wrote:
 Please check ALL AVX2 tests to see if they have similar problems.

Checking all current i386 tests revealed another problematic testcase:
grep scan-assembler' [a-z0-9]*' testsuite/gcc.target/i386/*.c | grep 
'\(.*\).*\1'

(minmax-*.c only match the path, which will show up only with -g and
I bet many other scan-assembler tests would fail with
RUNTESTFLAGS=--target_board=unix/-g
(matching various stuff in the debug info)).

2011-08-26  Jakub Jelinek  ja...@redhat.com

* gcc.target/i386/cmpxchg16b-1.c: Match also space after the
instruction.

--- gcc/testsuite/gcc.target/i386/cmpxchg16b-1.c.jj	2011-07-11 10:39:29.0 +0200
+++ gcc/testsuite/gcc.target/i386/cmpxchg16b-1.c	2011-08-26 16:20:46.0 +0200
@@ -10,4 +10,4 @@ void test(TItype x_128)
   m_128 = __sync_val_compare_and_swap (&m_128, x_128, m_128);
 }
 
-/* { dg-final { scan-assembler "cmpxchg16b" } } */
+/* { dg-final { scan-assembler "cmpxchg16b\[ \\t]" } } */


Jakub
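The reason for requiring whitespace after the mnemonic can be checked with POSIX regular expressions. This is a sketch under assumptions: DejaGnu's scan-assembler uses Tcl regexps, where the quoted pattern `cmpxchg16b\[ \\t]` becomes `cmpxchg16b[ \t]`; POSIX bracket expressions do not interpret `\t`, so the C string below embeds a real tab instead. The helper name and the sample assembly lines are hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the POSIX extended regular expression `pattern` matches
   somewhere in `text`, 0 if it does not, and -1 if it fails to compile.  */
int
asm_line_matches (const char *pattern, const char *text)
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int rc = regexec (&re, text, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

The bare pattern `cmpxchg16b` matches both an instruction line such as `lock cmpxchg16b<TAB>m_128(%rip)` and a `.file "cmpxchg16b-1.c"` directive, whereas `cmpxchg16b[ <TAB>]` only matches the instruction, because in the filename the name is followed by `-`.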

Re: [PATCH 4/6] Shrink-wrapping

2011-08-26 Thread Ramana Radhakrishnan

On 24 August 2011 18:23, Bernd Schmidt ber...@codesourcery.com wrote:
 On 08/24/11 19:17, Richard Sandiford wrote:
 OK with that change from a MIPS and rtl and perspective.

 Thanks. What else is in there? Trivial x86 changes, and a slightly less
 trivial but still tiny ARM bit, I suppose. Richard/Ramana?

Sorry about the delayed review - I read through this for a bit this
afternoon and I must admit I was confused for a while about
why the arm.md changes and the other changes in the backend hadn't
made it in here from the original patch.

This is OK but please watch out for any fall-out next week.

cheers
Ramana

Fix .init_array/.fini_array testcase (PR target/50166)

2011-08-26 Thread Rainer Orth

As detailed in the PR, H.J.'s .init_array/.fini_array patch

2011-08-20  H.J. Lu  hongjiu...@intel.com

PR other/46770
* config.gcc (tm_file): Add initfini-array.h if
.init_array/.fini_array are supported.

broke Solaris bootstrap since the testcase incorrectly succeeds on
Solaris, failing to notice that none of the constructors and destructors
were ever run.

The following patch fixes that, allows i386-pc-solaris2.11 bootstrap to
succeed and was also bootstrapped on x86_64-unknown-linux-gnu (CentOS
5.5 with gas/gld 2.21).  The testcase still fails on my Linux system, so
I'm uncertain if the fix is right.

Ok for mainline?

Rainer


2011-08-26  Rainer Orth  r...@cebitec.uni-bielefeld.de

PR target/50166
* acinclude.m4 (gcc_AC_INITFINI_ARRAY): Check count in main.
* configure: Regenerate.

# HG changeset patch
# Parent f622b6f398b4f552dcc1450c8caf6368a5937748
Disable .init_array/.fini_array support on Solaris (PR target/50166)

diff --git a/gcc/acinclude.m4 b/gcc/acinclude.m4
--- a/gcc/acinclude.m4
+++ b/gcc/acinclude.m4
@@ -477,6 +477,8 @@ void (*const dtors65535[]) ()
 int
 main ()
 {
+  if (count != 65535)
+    abort ();
   return 0;
 }
 #endif
diff --git a/gcc/configure b/gcc/configure
--- a/gcc/configure
+++ b/gcc/configure
@@ -10888,6 +10888,8 @@ void (*const dtors65535) ()
 int
 main ()
 {
+  if (count != 65535)
+    abort ();
   return 0;
 }
 #endif
@@ -17913,7 +17915,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 17916 "configure"
+#line 17918 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -18019,7 +18021,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 18022 "configure"
+#line 18024 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University
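The idea behind the fix, checking at run time that code placed in .init_array actually ran instead of treating a successful link as proof of support, can be sketched as below. This is a reduced illustration, not the configure probe itself (which uses many priority-ordered entries and compares the count against 65535). It assumes an ELF target whose linker and runtime honour .init_array, such as GNU/Linux with a recent binutils; the names are hypothetical.

```c
static int count;

static void
bump_count (void)
{
  count++;
}

/* A function pointer placed directly in .init_array; `used` keeps the
   otherwise unreferenced array from being discarded.  */
static void (*const init_entry[]) (void)
  __attribute__ ((used, section (".init_array"), aligned (sizeof (void *))))
  = { bump_count };

/* Returns nonzero only if the .init_array entry ran exactly once before
   this is called (e.g. from main).  */
int
init_array_entries_ran (void)
{
  return count == 1;
}
```

On a platform that accepts the section but never runs its entries, the link still succeeds and only this run-time check notices, which matches the Solaris failure described above.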

Re: [PATCH 4/6] Shrink-wrapping

2011-08-26 Thread Bernd Schmidt

On 08/26/11 16:32, Ramana Radhakrishnan wrote:
 On 24 August 2011 18:23, Bernd Schmidt ber...@codesourcery.com wrote:
 On 08/24/11 19:17, Richard Sandiford wrote:
 OK with that change from a MIPS and rtl and perspective.

 Thanks. What else is in there? Trivial x86 changes, and a slightly less
 trivial but still tiny ARM bit, I suppose. Richard/Ramana?
 
 Sorry about the delayed review -  I read through this for a bit this

Nothing delayed about it really :)

 afternoon and I must admit I was confused for a while about
 why the arm.md changes and the other changes in the backend hadn't
 made it in here from the original patch.

You mean the introduction of simple_return patterns for ARM? The patch
is split up further (this one is now piece 2/3 of the original patch
4/6) and I've postponed these until the final shrink-wrapping patch. In
this patch I've only made some MIPS changes in this area, more as a
proof-of-concept rather than because they gain anything yet.

 This is OK but please watch out for any fall-out next week.

Thanks!


Bernd

Re: [SPARC] Fix bugs with setjmp/longjmp + alloca

2011-08-26 Thread David Miller

From: Eric Botcazou ebotca...@adacore.com
Date: Sun, 22 May 2011 00:45:55 +0200

 SPARC maintainers, any objection to me eliminating this SETJMP_VIA_SAVE_AREA 
 kludge?  This would make it possible to have a shared implementation with the 
 flat mode and remove specific support in a few locations.  Even IA-64 does 
 things the canonical way here.

Absolutely no objection to getting rid of the setjmp kludge. :-)

The thing about the setjmp+alloca handling on sparc is that the code
is simply trying to leave the original stack frame, and thus the original
setjmp area, alone, basically so that JB_SP/JB_PC don't get
overwritten.

It would seem to me that, for example with C code, we don't need to
update anything, because any local stack allocation happening later
than the setjmp() can be safely ignored since that allocated memory
does not exist at the setjmp() point, thus it is safe to always longjmp
to the pre-alloca()'d state.

I guess when using setjmp/longjmp for exceptions the requirements
increase above and beyond what is normally sufficient, and that's why
you have to update the buffer?
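The point about plain C code can be illustrated with a generic sketch (not SPARC-specific, and not how the back end implements the save area): longjmp returns to the stack state saved by setjmp, so a block obtained with alloca after setjmp is simply abandoned and needs no cleanup. It assumes a platform that provides alloca in <alloca.h>, such as glibc; the function names are hypothetical.

```c
#include <alloca.h>
#include <setjmp.h>
#include <string.h>

static jmp_buf env;

static void
bail_out (void)
{
  longjmp (env, 1);
}

/* Allocates scratch space with alloca after setjmp, then longjmps back.
   Each jump resumes at the stack state recorded by setjmp, so the block
   from the previous attempt is just left behind.  `attempts` is volatile
   because it is modified between setjmp and longjmp.  */
int
retry_with_alloca (int max_attempts)
{
  volatile int attempts = 0;

  if (setjmp (env) != 0)
    {
      if (attempts >= max_attempts)
        return attempts;
    }

  attempts++;
  char *scratch = alloca (256);
  memset (scratch, 0, 256);
  bail_out ();
  return -1; /* not reached */
}
```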

Re: Fix .init_array/.fini_array testcase (PR target/50166)

2011-08-26 Thread H.J. Lu

On Fri, Aug 26, 2011 at 7:35 AM, Rainer Orth
r...@cebitec.uni-bielefeld.de wrote:
 As detailed in the PR, H.J.'s .init_array/.fini_array patch

 2011-08-20  H.J. Lu  hongjiu...@intel.com

        PR other/46770
        * config.gcc (tm_file): Add initfini-array.h if
        .init_array/.fini_array are supported.

 broke Solaris bootstrap since the testcase incorrectly succeeds on
 Solaris, failing to notice that none of the constructors and destructors
 were ever run.

 The following patch fixes that, allows i386-pc-solaris2.11 bootstrap to
 succeed and was also bootstrapped on x86_64-unknown-linux-gnu (CentOS
 5.5 with gas/gld 2.21).  The testcase still fails on my Linux system, so
 I'm uncertain if the fix is right.

 Ok for mainline?

        Rainer


 2011-08-26  Rainer Orth  r...@cebitec.uni-bielefeld.de

        PR target/50166
        * acinclude.m4 (gcc_AC_INITFINI_ARRAY): Check count in main.
        * configure: Regenerate.



That explains why init_array was enabled on AIX.  It looks good to me and
still works on Fedora 15.

Thanks.

-- 
H.J.

Re: [PATCH 4/6] Shrink-wrapping

2011-08-26 Thread Ramana Radhakrishnan

On 26 August 2011 15:36, Bernd Schmidt ber...@codesourcery.com wrote:
 On 08/26/11 16:32, Ramana Radhakrishnan wrote:
 On 24 August 2011 18:23, Bernd Schmidt ber...@codesourcery.com wrote:
 On 08/24/11 19:17, Richard Sandiford wrote:

 You mean the introduction of simple_return patterns for ARM? The patch
 is split up further (this one is now piece 2/3 of the original patch
 4/6) and I've postponed these until the final shrink-wrapping patch. In
 this patch I've only made some MIPS changes in this area, more as a
 proof-of-concept rather than because they gain anything yet.


Yes that's what I meant and figured out later. Thanks for making that
explicit. Richard Sandiford did point that out to me on IRC as I was
pretty much scratching my head about why some of the other changes
were missing :) .

cheers
Ramana

Re: Fix .init_array/.fini_array testcase (PR target/50166)

2011-08-26 Thread Rainer Orth

H.J. Lu hjl.to...@gmail.com writes:

 2011-08-26  Rainer Orth  r...@cebitec.uni-bielefeld.de

        PR target/50166
        * acinclude.m4 (gcc_AC_INITFINI_ARRAY): Check count in main.
        * configure: Regenerate.

 That explains why init_array was enabled on AIX.  It looks good to me and
 still works on Fedora 15.

What support do you need on the Linux side for .init_array/.fini_array
to work?  I'd have expected that gld 2.21 is enough, or is ld-linux.so.2
support required, too?

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

bb partitioning vs optimize_function_for_speed_p

2011-08-26 Thread Bernd Schmidt

In rest_of_reorder_blocks, we avoid reordering if
!optimize_function_for_speed_p. However, we still call
insert_section_boundary_note, which can cause problems because now, if we
have a sequence of HOT-COLD-HOT blocks, the second set of HOT blocks
will end up in the cold section. This causes assembler failures when
using exception handling (subtracting labels from different sections).

Unfortunately, the only way I have of reproducing it is to apply a
67-patch quilt tree backporting the preliminary shrink-wrapping patches
to gcc-4.6; then we get

FAIL: g++.dg/tree-prof/partition2.C compilation,  -Os  -fprofile-use

However, the problem is reasonably obvious. Bootstrapped and currently
testing in the aforementioned 4.6 tree. Ok for trunk after testing there?


Bernd
* bb-reorder.c (insert_section_boundary_note): Only do it if
we reordered the blocks; i.e. not if !optimize_function_for_speed_p.

Index: gcc/bb-reorder.c
===
--- gcc/bb-reorder.c(revision 178030)
+++ gcc/bb-reorder.c(working copy)
@@ -1965,8 +1965,11 @@ insert_section_boundary_note (void)
   rtx new_note;
   int first_partition = 0;
 
-  if (flag_reorder_blocks_and_partition)
-    FOR_EACH_BB (bb)
+  if (!flag_reorder_blocks_and_partition
+      || !optimize_function_for_speed_p (cfun))
+    return;
+
+  FOR_EACH_BB (bb)
     {
       if (!first_partition)
         first_partition = BB_PARTITION (bb);
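The failure mode described here can be modelled in a few lines. This is a hypothetical simplification, not bb-reorder.c: a single section-switch note goes before the first block whose partition differs from the first block's, so every later block is emitted in the second section regardless of its own partition. Without reordering, a HOT-COLD-HOT sequence therefore leaves the final hot block in the cold section.

```c
enum toy_partition { TOY_HOT = 1, TOY_COLD = 2 };

/* Index of the first block whose partition differs from block 0's, i.e.
   where the single section-switch note would go; n if there is none.  */
int
first_switch_index (const enum toy_partition *bbs, int n)
{
  for (int i = 1; i < n; i++)
    if (bbs[i] != bbs[0])
      return i;
  return n;
}

/* Blocks emitted after the switch that still belong to block 0's
   partition, and therefore end up in the wrong section.  */
int
blocks_in_wrong_section (const enum toy_partition *bbs, int n)
{
  int wrong = 0;
  for (int i = first_switch_index (bbs, n); i < n; i++)
    if (bbs[i] == bbs[0])
      wrong++;
  return wrong;
}
```

After proper partitioned reordering (all hot blocks first) the count is zero, which is why the note is only correct when the blocks were actually reordered.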

Re: Fix .init_array/.fini_array testcase (PR target/50166)

2011-08-26 Thread H.J. Lu

On Fri, Aug 26, 2011 at 7:45 AM, Rainer Orth
r...@cebitec.uni-bielefeld.de wrote:
 H.J. Lu hjl.to...@gmail.com writes:

 2011-08-26  Rainer Orth  r...@cebitec.uni-bielefeld.de

        PR target/50166
        * acinclude.m4 (gcc_AC_INITFINI_ARRAY): Check count in main.
        * configure: Regenerate.

 That explains why init_array was enabled on AIX.  It looks good to me and
 still works on Fedora 15.

 What support do you need on the Linux side for .init_array/.fini_array
 to work?  I'd have expected that gld 2.21 is enough, or is ld-linux.so.2
 support required, too?

You need the latest Linux binutils. Support for mixing .init_array/.ctors
sections was added after binutils 2.21 was released:

http://sourceware.org/git/?p=binutils.git;a=commit;h=30dfd0308a8551174634494822e194fcf24a7ddb


-- 
H.J.

Re: [Patch ARM] Fix vec_pack_trunc pattern for vectorize_with_neon_quad.

2011-08-26 Thread Ramana Radhakrishnan

On 16 August 2011 15:20, Ramana Radhakrishnan
ramana.radhakrish...@linaro.org wrote:
 Hi,

 While looking at a failure with regrename and
 mvectorize-with-neon-quad I noticed that the early-clobber in this
 vec_pack_trunc pattern is superfluous given that we can use
 reg_overlap_mentioned_p to decide in which order we want to emit these
 2 instructions. While it works around the problem in regrename.c I
 still think that the behaviour in regrename is a bit suspicious and
 needs some more investigation.


RichardS finally fixed the problem in data-flow and hence we should be
able to turn on vectorize_with_quad anyway.

Here's the patch which I thought I should have committed as a
workaround but I think it's better to split this further in the case
where the 2 registers are equal because otherwise you are pointlessly
creating a stall in the Neon pipe for the vmovn result to arrive.
Hence I'm not committing this patch.

Tests finished OK btw for this patch.


cheers
Ramana

index 24dd941..2c60c5f 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -5631,14 +5631,29 @@
 ; the semantics of the instructions require.

 (define_insn "vec_pack_trunc_<mode>"
- [(set (match_operand:<V_narrow_pack> 0 "register_operand" "=&w")
+ [(set (match_operand:<V_narrow_pack> 0 "register_operand" "=w")
        (vec_concat:<V_narrow_pack>
          (truncate:<V_narrow>
            (match_operand:VN 1 "register_operand" "w"))
          (truncate:<V_narrow>
            (match_operand:VN 2 "register_operand" "w"))))]
  "TARGET_NEON && !BYTES_BIG_ENDIAN"
- "vmovn.i<V_sz_elem>\t%e0, %q1\;vmovn.i<V_sz_elem>\t%f0, %q2"
+ {
+ /* If operand1 and operand2 are identical, then the second
+narrowing operation isn't needed as the values obtained
+in both parts of the destination q register are identical.
+This precludes the need for an early clobber in the destination
+operand.  */
+ if (rtx_equal_p (operands[1], operands[2]))
+   return "vmovn.i<V_sz_elem>\\t%e0, %q1\;vmov.i<V_sz_elem>\\t%f0, %e0";
+ else
+   {
+     if (reg_overlap_mentioned_p (operands[0], operands[2]))
+       return "vmovn.i<V_sz_elem>\\t%f0, %q2\;vmovn.i<V_sz_elem>\\t%e0, %q1";
+     else
+       return "vmovn.i<V_sz_elem>\\t%e0, %q1\;vmovn.i<V_sz_elem>\\t%f0, %q2";
+   }
+ }
   [(set_attr "neon_type" "neon_shift_1")
    (set_attr "length" "8")]
 )
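The ordering argument that lets this pattern drop the early clobber can be checked with a toy C model. It is hypothetical and only mimics register aliasing: Q register n overlaps D registers 2n (low half) and 2n+1 (high half), each narrowing reads its whole source before writing, and overlap is modelled as equal Q numbers where the patch uses reg_overlap_mentioned_p. Writing the low half first when the output overlaps operand 2 would clobber operand 2 before it is read; writing the high half first avoids that. The equal-operand branch mirrors the rtx_equal_p case.

```c
#include <stdint.h>
#include <string.h>

/* Toy register file: 32 D registers of 8 bytes, i.e. 16 Q registers.  */
static uint8_t regs[32 * 8];

void
set_q32 (int q, const uint32_t lanes[4])
{
  memcpy (&regs[q * 16], lanes, 16);
}

void
get_d16 (int d, uint16_t lanes[4])
{
  memcpy (lanes, &regs[d * 8], 8);
}

/* vmovn.i32 Dd, Qm: read all of Qm, then write the narrowed lanes.  */
static void
vmovn_i32 (int d, int q)
{
  uint32_t src[4];
  uint16_t dst[4];
  memcpy (src, &regs[q * 16], 16);
  for (int i = 0; i < 4; i++)
    dst[i] = (uint16_t) src[i];
  memcpy (&regs[d * 8], dst, 8);
}

/* Qq0 = { narrow (Qq1), narrow (Qq2) } with no early-clobber output.  */
void
pack_trunc (int q0, int q1, int q2)
{
  if (q1 == q2)
    {
      /* Same source twice: narrow once, copy the low half to the high.  */
      vmovn_i32 (2 * q0, q1);
      memcpy (&regs[(2 * q0 + 1) * 8], &regs[2 * q0 * 8], 8);
    }
  else if (q0 == q2)
    {
      /* Output overlaps operand 2: consume it before the low-half write
         could clobber it.  */
      vmovn_i32 (2 * q0 + 1, q2);
      vmovn_i32 (2 * q0, q1);
    }
  else
    {
      vmovn_i32 (2 * q0, q1);
      vmovn_i32 (2 * q0 + 1, q2);
    }
}
```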

Re: Fix .init_array/.fini_array testcase (PR target/50166)

2011-08-26 Thread Rainer Orth

H.J. Lu hjl.to...@gmail.com writes:

 What support do you need on the Linux side for .init_array/.fini_array
 to work?  I'd have expected that gld 2.21 is enough, or is ld-linux.so.2
 support required, too?

 You need the latest Linux binutils. Support for mixing .init_array/.ctors
 sections was added after binutils 2.21 was released:

 http://sourceware.org/git/?p=binutils.git;a=commit;h=30dfd0308a8551174634494822e194fcf24a7ddb

I see, thanks.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: Fix .init_array/.fini_array testcase (PR target/50166)

2011-08-26 Thread Jakub Jelinek

On Fri, Aug 26, 2011 at 04:35:18PM +0200, Rainer Orth wrote:
 Ok for mainline?

Yes.
 
 2011-08-26  Rainer Orth  r...@cebitec.uni-bielefeld.de
 
   PR target/50166
   * acinclude.m4 (gcc_AC_INITFINI_ARRAY): Check count in main.
   * configure: Regenerate.
 

Jakub

Re: [v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h

2011-08-26 Thread Paolo Carlini

Hi,

 exactly: just g++ -D__cplusplus=1 or something.

Irrespective of what the Standard strictly says, I think the latter would only
make sense if it allowed the user to return, consistently, to the pre-4.7
behavior, for compatibility reasons or something. Is that the case? Is the above
enough for that? Or are some of the changes that went in effective anyway, even
if __cplusplus is reverted by hand to 1? I think this question decides what we
really want to do.

Paolo

Re: [v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h

2011-08-26 Thread Rainer Orth

Hi Paolo,

 exactly: just g++ -D__cplusplus=1 or something.

 Irrespective of what the Standard strictly says, I think the latter would
 only make sense if it allowed the user to return, consistently, to the
 pre-4.7 behavior, for compatibility reasons or something. Is that the case? Is
 the above enough for that? Or are some of the changes that went in effective
 anyway, even if __cplusplus is reverted by hand to 1? I think this question
 decides what we really want to do.

I'm pretty sure this is the case for Solaris.  The other changes we've
made to support __cplusplus 199711L were no-ops without the last one to
change __cplusplus from 1 to the C++ 98 value.  So, redefining
__cplusplus to 1 should return us back to the old status.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

[PATCH][ARM] -m{cpu,tune,arch}=native

2011-08-26 Thread Andrew Stubbs


Hi all,

This patch adds support for -mcpu=native, -mtune=native, and 
-march=native for ARM Linux hosts.


So far, it only recognises Cortex-A8 and Cortex-A9, so I really need to
find out what the magic part numbers are for other CPUs before this
patch is complete. I couldn't find this information listed anywhere.
I think there are a lot of clues in the kernel code, but it's hard to
mine and it mostly only goes as far as the architecture version, not
the individual CPU. Any suggestions?


Otherwise, is this OK?

Andrew
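The patch as posted is cut off at the /proc/cpuinfo scan, so the following is only a sketch of how a line might be matched against a table like cpu_table. It assumes the kernel reports the part number on a line of the form `CPU part\t: 0xc09`; the struct, table and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table mirroring cpu_table in the patch.  */
struct arm_part
{
  const char *part_no;
  const char *arch_name;
  const char *cpu_name;
};

static const struct arm_part arm_parts[] = {
  {"0xc08", "armv7-a", "cortex-a8"},
  {"0xc09", "armv7-a", "cortex-a9"},
  {NULL, NULL, NULL}
};

/* Return the entry for a line of the assumed form "CPU part\t: 0xc09",
   or NULL if LINE is not a "CPU part" line or the part is unknown.  */
static const struct arm_part *
match_cpu_part_line (const char *line)
{
  const char *p;
  size_t i;

  assert (line != NULL);
  if (strncmp (line, "CPU part", 8) != 0)
    return NULL;
  p = strchr (line, ':');
  if (p == NULL)
    return NULL;
  for (p++; *p == ' ' || *p == '\t'; p++)
    ;   /* skip blanks after the colon */
  for (i = 0; arm_parts[i].part_no != NULL; i++)
    {
      size_t n = strlen (arm_parts[i].part_no);
      /* Require the whole token to match, so "0xc08" does not match a
         longer part number that merely starts with it.  */
      if (strncmp (p, arm_parts[i].part_no, n) == 0
          && (p[n] == '\0' || p[n] == '\n' || p[n] == ' ' || p[n] == '\t'))
        return &arm_parts[i];
    }
  return NULL;
}
```

In the driver, each line returned by fgets could be passed to such a matcher, and the chosen entry's arch_name or cpu_name turned into the replacement option.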

2011-08-26  Andrew Stubbs  a...@codesourcery.com

	gcc/
	* config.host (arm*-*-linux*): Add driver-arm.o and x-arm.
	* config/arm/arm-tables.opt: Add 'native' processor type and
	architecture type.
	* config/arm/arm.h (host_detect_local_cpu): New prototype.
	(EXTRA_SPEC_FUNCTIONS): New define.
	(MCPU_MTUNE_NATIVE_SPECS): New define.
	(DRIVER_SELF_SPECS): New define.
	* config/arm/driver-arm.c: New file.
	* config/arm/x-arm: New file.
	* doc/invoke.texi (ARM Options): Document -mcpu=native,
	-mtune=native and -march=native.

--- a/gcc/config.host
+++ b/gcc/config.host
@@ -100,6 +100,14 @@ case ${host} in
 esac
 
 case ${host} in
+  arm*-*-linux*)
+    case ${target} in
+      arm*-*-*)
+	host_extra_gcc_objs="driver-arm.o"
+	host_xmake_file="${host_xmake_file} arm/x-arm"
+	;;
+    esac
+    ;;
   alpha*-*-linux* | alpha*-dec-osf*)
 case ${target} in
   alpha*-*-linux* | alpha*-dec-osf*)
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -25,6 +25,9 @@ Name(processor_type) Type(enum processor_type)
 Known ARM CPUs (for use with the -mcpu= and -mtune= options):
 
 EnumValue
+Enum(processor_type) String(native) Value(-1) DriverOnly
+
+EnumValue
 Enum(processor_type) String(arm2) Value(arm2)
 
 EnumValue
@@ -269,6 +272,9 @@ Name(arm_arch) Type(int)
 Known ARM architectures (for use with the -march= option):
 
 EnumValue
+Enum(arm_arch) String(native) Value(-1) DriverOnly
+
+EnumValue
 Enum(arm_arch) String(armv2) Value(0)
 
 EnumValue
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2223,4 +2223,21 @@ extern int making_const_table;
instruction.  */
 #define MAX_LDM_STM_OPS 4
 
+/* -mcpu=native handling only makes sense with compiler running on
+   an ARM chip.  */
+#if defined(__arm__)
+extern const char *host_detect_local_cpu (int argc, const char **argv);
+# define EXTRA_SPEC_FUNCTIONS		\
+  { "local_cpu_detect", host_detect_local_cpu },
+
+# define MCPU_MTUNE_NATIVE_SPECS	\
+"%{march=native:%<march=native %:local_cpu_detect(arch)}"		\
+"%{mcpu=native:%<mcpu=native %:local_cpu_detect(cpu)}"		\
+"%{mtune=native:%<mtune=native %:local_cpu_detect(tune)}"
+#else
+# define MCPU_MTUNE_NATIVE_SPECS ""
+#endif
+
+#define DRIVER_SELF_SPECS MCPU_MTUNE_NATIVE_SPECS
+
 #endif /* ! GCC_ARM_H */
--- /dev/null
+++ b/gcc/config/arm/driver-arm.c
@@ -0,0 +1,86 @@
+/* Subroutines for the gcc driver.
+   Copyright (C) 2011 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+http://www.gnu.org/licenses/.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+
+static struct {
+  const char *part_no;
+  const char *arch_name;
+  const char *cpu_name;
+} cpu_table[] = {
+  {"0xc08", "armv7-a", "cortex-a8"},
+  {"0xc09", "armv7-a", "cortex-a9"},
+  {NULL, NULL, NULL}
+};
+
+/* This will be called by the spec parser in gcc.c when it sees
+   a %:local_cpu_detect(args) construct.  Currently it will be called
+   with either "arch", "cpu" or "tune" as argument depending on whether
+   -march=native, -mcpu=native or -mtune=native is to be substituted.
+
+   It returns a string containing new command line parameters to be
+   put in place of that option, depending on the CPU this is executed
+   on.  E.g. "-march=armv7-a" on a Cortex-A8 for
+   -march=native.  If the routine can't detect a known processor,
+   the -march or -mtune option is discarded.
+
+   ARGC and ARGV are set depending on the actual arguments given
+   in the spec.  */
+const char *
+host_detect_local_cpu (int argc, const char **argv)
+{
+  const char *val = NULL;
+  char buf[128];
+  FILE *f;
+  bool arch;
+
+  if (argc < 1)
+    return NULL;
+
+  arch = strcmp (argv[0], "arch") == 0;
+  if (!arch && strcmp (argv[0], "cpu") != 0 && strcmp (argv[0], "tune"))
+    return NULL;
+
+  f = fopen ("/proc/cpuinfo", "r");
+  if (f == NULL)
+    return NULL;
+
+  while (fgets (buf, sizeof (buf), f) != NULL)
+    if (strncmp

Re: [v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h

2011-08-26 Thread Paolo Carlini

Hi,

 I'm pretty sure this is the case for Solaris.  The other changes we've
 made to support __cplusplus 199711L were no-ops without the last one to 
 change __cplusplus from 1 to the C++ 98 value.  So, redefining
 __cplusplus to 1 should return us back to the old status.

I see, then I think the patch is Ok. Since you are so well positioned to test 
on Solaris machines, I would recommend running the library testsuite with 
-D__cplusplus=1 added to CXXFLAGS, as a final check.

Paolo

[PATCH] Support (as an extension) threadprivate procedure pointers

2011-08-26 Thread Jakub Jelinek

Hi!

This patch adds (tiny) code to handle procedure pointers in !$omp
threadprivate, plus a testcase.  This is outside the scope of the OpenMP
standard, i.e. an extension for now; hopefully OpenMP 4.0 will cover at least
F2003, C++11 and maybe also F2008.  I haven't touched any other OpenMP places
wrt. procedure pointers, so e.g. they aren't allowed in various other
clauses.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.
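A rough C analogue of what the new test checks (not what gfortran emits): the patch gives the procedure-pointer decl a TLS model, which makes it behave like a C function pointer declared with GCC's `__thread`. The sketch assumes POSIX threads with barriers (link with -pthread on older glibc); all names are illustrative and mirror the Fortran test.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NTHREADS 4

/* Each thread gets its own VI and its own procedure pointer FOO,
   like the threadprivate variables in threadprivate4.f90.  */
static __thread int vi;
static __thread void (*foo) (void);

static void fn0 (void) { vi = 0; }
static void fn1 (void) { vi = 1; }
static void fn2 (void) { vi = 2; }
static void fn3 (void) { vi = 3; }

static void (*const targets[NTHREADS]) (void) = { fn0, fn1, fn2, fn3 };
static pthread_barrier_t barrier;

static void *
worker (void *arg)
{
  intptr_t i = (intptr_t) arg;

  assert (i >= 0 && i < NTHREADS);
  foo = targets[i];                 /* sets only this thread's copy */
  pthread_barrier_wait (&barrier);  /* every thread has assigned FOO */
  vi = -1;
  foo ();                           /* calls this thread's own target */
  return (void *) (intptr_t) (vi != i);   /* non-NULL means mismatch */
}

/* Returns the number of threads that saw the wrong VI, or -1 if setup
   failed (a sketch: a partial pthread_create failure is not cleaned up).  */
static int
run_threads (void)
{
  pthread_t t[NTHREADS];
  intptr_t i;
  int bad = 0;

  if (pthread_barrier_init (&barrier, NULL, NTHREADS) != 0)
    return -1;
  for (i = 0; i < NTHREADS; i++)
    if (pthread_create (&t[i], NULL, worker, (void *) i) != 0)
      return -1;
  for (i = 0; i < NTHREADS; i++)
    {
      void *res;
      pthread_join (t[i], &res);
      bad += res != NULL;
    }
  pthread_barrier_destroy (&barrier);
  return bad;
}
```

If FOO were an ordinary global, the barrier would let the last assignment win and most threads would call the wrong function.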

2011-08-26  Jakub Jelinek  ja...@redhat.com

	* trans-decl.c (get_proc_pointer_decl): Set DECL_TLS_MODEL
	if threadprivate.
	* symbol.c (check_conflict): Allow threadprivate attribute with
	FL_PROCEDURE if proc_pointer.

	* testsuite/libgomp.fortran/threadprivate4.f90: New test.

--- gcc/fortran/trans-decl.c.jj 2011-08-18 08:35:51.0 +0200
+++ gcc/fortran/trans-decl.c	2011-08-26 11:34:31.0 +0200
@@ -1533,6 +1533,11 @@ get_proc_pointer_decl (gfc_symbol *sym)
  false, true);
 }
 
+  /* Handle threadprivate procedure pointers.  */
+  if (sym->attr.threadprivate
+      && (TREE_STATIC (decl) || DECL_EXTERNAL (decl)))
+    DECL_TLS_MODEL (decl) = decl_default_tls_model (decl);
+
   attributes = add_attributes_to_decl (sym->attr, NULL_TREE);
   decl_attributes (decl, attributes, 0);
 
--- gcc/fortran/symbol.c.jj 2011-08-22 08:17:04.0 +0200
+++ gcc/fortran/symbol.c	2011-08-26 12:31:10.0 +0200
@@ -673,7 +673,8 @@ check_conflict (symbol_attribute *attr, 
  conf2 (codimension);
  conf2 (dimension);
  conf2 (function);
- conf2 (threadprivate);
+ if (!attr->proc_pointer)
+   conf2 (threadprivate);
}
 
   if (!attr->proc_pointer)
--- libgomp/testsuite/libgomp.fortran/threadprivate4.f90.jj	2011-08-26 11:54:50.0 +0200
+++ libgomp/testsuite/libgomp.fortran/threadprivate4.f90	2011-08-26 12:35:22.0 +0200
@@ -0,0 +1,78 @@
+! { dg-do run }
+! { dg-require-effective-target tls_runtime }
+
+module threadprivate4
+  integer :: vi
+  procedure(), pointer :: foo
+!$omp threadprivate (foo, vi)
+
+contains
+  subroutine fn0
+vi = 0
+  end subroutine fn0
+  subroutine fn1
+vi = 1
+  end subroutine fn1
+  subroutine fn2
+vi = 2
+  end subroutine fn2
+  subroutine fn3
+vi = 3
+  end subroutine fn3
+end module threadprivate4
+
+  use omp_lib
+  use threadprivate4
+
+  integer :: i
+  logical :: l
+
+  procedure(), pointer :: bar1
+  common /thrc/ bar1
+!$omp threadprivate (/thrc/)
+
+  procedure(), pointer, save :: bar2
+!$omp threadprivate (bar2)
+
+  l = .false.
+  call omp_set_dynamic (.false.)
+  call omp_set_num_threads (4)
+
+!$omp parallel num_threads (4) reduction (.or.:l) private (i)
+  i = omp_get_thread_num ()
+  if (i.eq.0) then
+    foo => fn0
+    bar1 => fn0
+    bar2 => fn0
+  elseif (i.eq.1) then
+    foo => fn1
+    bar1 => fn1
+    bar2 => fn1
+  elseif (i.eq.2) then
+    foo => fn2
+    bar1 => fn2
+    bar2 => fn2
+  else
+    foo => fn3
+    bar1 => fn3
+    bar2 => fn3
+  end if
+  vi = -1
+!$omp barrier
+  vi = -1
+  call foo ()
+  l=l.or.(vi.ne.i)
+  vi = -2
+  call bar1 ()
+  l=l.or.(vi.ne.i)
+  vi = -3
+  call bar2 ()
+  l=l.or.(vi.ne.i)
+  vi = -1
+!$omp end parallel
+
+  if (l) call abort
+
+end
+
+! { dg-final { cleanup-modules threadprivate4 } }

Jakub

Re: [PATCH, i386, testsuite] FMA intrinsics

2011-08-26 Thread H.J. Lu

On Fri, Aug 26, 2011 at 8:06 AM, Ilya Tocar tocarip.in...@gmail.com wrote:
 Done.



 Also fixed  changelog:

  2011-08-26  Ilya Tocar  ilya.to...@intel.com

             * config/i386/fmaintrin.h: New.
             * config.gcc: Add fmaintrin.h.
             * config/i386/i386.c
            (enum ix86_builtins) IX86_BUILTIN_VFMADDSS3: New.
             IX86_BUILTIN_VFMADDSD3: Likewise.
             * config/i386/sse.md (fmai_vmfmadd_<mode>): New.
             (*fmai_fmadd_<mode>): Likewise.
             (*fmai_fmsub_<mode>): Likewise.
             (*fmai_fnmadd_<mode>): Likewise.
             (*fmai_fnmsub_<mode>): Likewise.
             * config/i386/immintrin.h: Add fmaintrin.h.



-- +++ b/gcc/config/i386/fmaintrin.h
@@ -0,0 +1,229 @@
+/* Copyright (C) 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
+

It should be just 2011.

H.J.

Re: [PATCH, i386, testsuite] FMA intrinsics

2011-08-26 Thread H.J. Lu

On Fri, Aug 26, 2011 at 8:47 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 8:06 AM, Ilya Tocar tocarip.in...@gmail.com wrote:
 Done.



 Also fixed  changelog:

  2011-08-26  Ilya Tocar  ilya.to...@intel.com

             * config/i386/fmaintrin.h: New.
             * config.gcc: Add fmaintrin.h.
             * config/i386/i386.c
            (enum ix86_builtins) IX86_BUILTIN_VFMADDSS3: New.
             IX86_BUILTIN_VFMADDSD3: Likewise.
             * config/i386/sse.md (fmai_vmfmadd_<mode>): New.
             (*fmai_fmadd_<mode>): Likewise.
             (*fmai_fmsub_<mode>): Likewise.
             (*fmai_fnmadd_<mode>): Likewise.
             (*fmai_fnmsub_<mode>): Likewise.
             * config/i386/immintrin.h: Add fmaintrin.h.



 -- +++ b/gcc/config/i386/fmaintrin.h
 @@ -0,0 +1,229 @@
 +/* Copyright (C) 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
 +

 It should be just 2011.


Also lines in fmaintrin.h are too long.  I prefer 72 columns.


-- 
H.J.

Re: [lto] Refactor streamer (1/N) (issue4809083)

2011-08-26 Thread Michael Matz

Hi,

On Fri, 26 Aug 2011, Jakub Jelinek wrote:

 While you are touching it, I think we should also optimize it as in the 
 patch below.  I'm afraid no string length optimization would be able to 
 figure out that it doesn't have to call strlen twice, because the 
 htab_find_slot isn't pure.

Sure.  Regstrapped the below patch and checked in as r178118.


Ciao,
Michael.
-- 
Index: lto-streamer-in.c
===
--- lto-streamer-in.c   (revision 178117)
+++ lto-streamer-in.c   (revision 178118)
@@ -98,21 +98,22 @@ canon_file_name (const char *string)
 {
   void **slot;
   struct string_slot s_slot;
+  size_t len = strlen (string);
+
   s_slot.s = string;
-  s_slot.len = strlen (string);
+  s_slot.len = len;

   slot = htab_find_slot (file_name_hash_table, &s_slot, INSERT);
   if (*slot == NULL)
 {
-  size_t len;
   char *saved_string;
   struct string_slot *new_slot;

-  len = strlen (string);
   saved_string = (char *) xmalloc (len + 1);
   new_slot = XCNEW (struct string_slot);
-  strcpy (saved_string, string);
+  memcpy (saved_string, string, len + 1);
   new_slot->s = saved_string;
+  new_slot->len = len;
   *slot = new_slot;
   return saved_string;
 }
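The same idea in isolation: measure the string once and reuse the length for the allocation, the copy and the cached slot. This is a minimal standalone sketch; `struct string_slot_sketch` and `save_string` are hypothetical stand-ins for the streamer's `struct string_slot` and `canon_file_name`, and it uses plain `malloc` instead of `xmalloc`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct string_slot_sketch
{
  const char *s;
  size_t len;
};

/* Copy STRING into fresh storage, recording it and its length in SLOT.
   Returns the copy, or NULL if allocation fails.  */
static char *
save_string (const char *string, struct string_slot_sketch *slot)
{
  size_t len;
  char *saved;

  assert (string != NULL && slot != NULL);
  len = strlen (string);            /* the only strlen call */
  saved = malloc (len + 1);
  if (saved == NULL)
    return NULL;
  memcpy (saved, string, len + 1);  /* LEN + 1 also copies the NUL */
  slot->s = saved;
  slot->len = len;                  /* cache the length with the string */
  return saved;
}
```

Using memcpy with the known length avoids the second scan for the terminator that strcpy would do.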

[PATCH][ARM] Generic tuning

2011-08-26 Thread Andrew Stubbs


Hi all,

This patch is step 1 towards having generic (best-blend) tuning on ARM.

The patch adds an option '-mtune=generic-armv7-a' but does not actually 
do any tuning tweaks yet - those are for follow up patches.


x86 has simply '-mtune=generic', and from that (the documentation 
suggests) the compiler selects the most common architecture/cpu variants 
to tune for. I don't think that translates well to the ARM world, so I 
have chosen to make it generic within the architecture family.


My intention is to make this the default tuning whenever the user 
specifies '-march=armv7-a', but that will have to wait until it does 
something meaningful.
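For reference, the directive selection done by the arm_file_start hunk in this patch can be modelled as below. This is a sketch, not the patch: the helper name is hypothetical, and it tests for the full `generic-` prefix (8 characters) rather than `generic` (7), so that skipping 8 characters always stays within the matched prefix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the assembler directive for the selected arch/CPU into BUF:
   an explicit -march wins; "generic-<arch>" CPU names become
   ".arch <arch>"; anything else becomes ".cpu <name>".  */
static int
format_cpu_directive (char *buf, size_t size, const char *arch_name,
                      const char *cpu_name)
{
  assert (cpu_name != NULL);
  if (arch_name != NULL)
    return snprintf (buf, size, "\t.arch %s\n", arch_name);
  if (strncmp (cpu_name, "generic-", 8) == 0)
    return snprintf (buf, size, "\t.arch %s\n", cpu_name + 8);
  return snprintf (buf, size, "\t.cpu %s\n", cpu_name);
}
```

So -mcpu=generic-armv7-a produces `.arch armv7-a`, which the assembler understands, rather than a `.cpu` name it would not recognise.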


OK?

Andrew
2011-08-26  Andrew Stubbs  a...@codesourcery.com

	gcc/
	* config/arm/arm-cores.def (generic-armv7-a): New architecture.
	* config/arm/arm-tables.opt: Add generic-armv7-a tune/cpu type.
	* config/arm/arm-tune.md: Regenerate.
	* config/arm/arm.c (arm_file_start): Output .arch directive when
	user passes -mcpu=generic-*.
	(arm_issue_rate): Add genericv7a support.
	* config/arm/arm.h (EXTRA_SPECS): Add asm_cpu_spec.
	(ASM_CPU_SPEC): New define.
	* config/arm/elf.h (ASM_SPEC): Use %(asm_cpu_spec).
	* config/arm/semi.h (ASM_SPEC): Likewise.
	* doc/invoke.texi (ARM Options): Document -mcpu=generic-*
	and -mtune=generic-*.

--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -124,6 +124,7 @@ ARM_CORE("mpcorenovfp",	  mpcorenovfp,	6K, FL_LDSCHED, 9e)
 ARM_CORE("mpcore",	  mpcore,	6K, FL_LDSCHED | FL_VFPV2, 9e)
 ARM_CORE("arm1156t2-s",	  arm1156t2s,	6T2, FL_LDSCHED, v6t2)
 ARM_CORE("arm1156t2f-s",  arm1156t2fs,  6T2, FL_LDSCHED | FL_VFPV2, v6t2)
+ARM_CORE("generic-armv7-a", genericv7a,	7A, FL_LDSCHED, cortex)
 ARM_CORE("cortex-a5",	  cortexa5,	7A, FL_LDSCHED, cortex_a5)
 ARM_CORE("cortex-a8",	  cortexa8,	7A, FL_LDSCHED, cortex)
 ARM_CORE("cortex-a9",	  cortexa9,	7A, FL_LDSCHED, cortex_a9)
@@ -135,3 +136,4 @@ ARM_CORE("cortex-m4",	  cortexm4,	7EM, FL_LDSCHED, cortex)
 ARM_CORE("cortex-m3",	  cortexm3,	7M, FL_LDSCHED, cortex)
 ARM_CORE("cortex-m1",	  cortexm1,	6M, FL_LDSCHED, cortex)
 ARM_CORE("cortex-m0",	  cortexm0,	6M, FL_LDSCHED, cortex)
+
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -235,6 +235,9 @@ EnumValue
 Enum(processor_type) String(arm1156t2f-s) Value(arm1156t2fs)
 
 EnumValue
+Enum(processor_type) String(generic-armv7-a) Value(genericv7a)
+
+EnumValue
 Enum(processor_type) String(cortex-a5) Value(cortexa5)
 
 EnumValue
--- a/gcc/config/arm/arm-tune.md
+++ b/gcc/config/arm/arm-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from arm-cores.def
 (define_attr tune
-	arm2,arm250,arm3,arm6,arm60,arm600,arm610,arm620,arm7,arm7d,arm7di,arm70,arm700,arm700i,arm710,arm720,arm710c,arm7100,arm7500,arm7500fe,arm7m,arm7dm,arm7dmi,arm8,arm810,strongarm,strongarm110,strongarm1100,strongarm1110,fa526,fa626,arm7tdmi,arm7tdmis,arm710t,arm720t,arm740t,arm9,arm9tdmi,arm920,arm920t,arm922t,arm940t,ep9312,arm10tdmi,arm1020t,arm9e,arm946es,arm966es,arm968es,arm10e,arm1020e,arm1022e,xscale,iwmmxt,iwmmxt2,fa606te,fa626te,fmp626,fa726te,arm926ejs,arm1026ejs,arm1136js,arm1136jfs,arm1176jzs,arm1176jzfs,mpcorenovfp,mpcore,arm1156t2s,arm1156t2fs,cortexa5,cortexa8,cortexa9,cortexa15,cortexr4,cortexr4f,cortexr5,cortexm4,cortexm3,cortexm1,cortexm0
+	arm2,arm250,arm3,arm6,arm60,arm600,arm610,arm620,arm7,arm7d,arm7di,arm70,arm700,arm700i,arm710,arm720,arm710c,arm7100,arm7500,arm7500fe,arm7m,arm7dm,arm7dmi,arm8,arm810,strongarm,strongarm110,strongarm1100,strongarm1110,fa526,fa626,arm7tdmi,arm7tdmis,arm710t,arm720t,arm740t,arm9,arm9tdmi,arm920,arm920t,arm922t,arm940t,ep9312,arm10tdmi,arm1020t,arm9e,arm946es,arm966es,arm968es,arm10e,arm1020e,arm1022e,xscale,iwmmxt,iwmmxt2,fa606te,fa626te,fmp626,fa726te,arm926ejs,arm1026ejs,arm1136js,arm1136jfs,arm1176jzs,arm1176jzfs,mpcorenovfp,mpcore,arm1156t2s,arm1156t2fs,genericv7a,cortexa5,cortexa8,cortexa9,cortexa15,cortexr4,cortexr4f,cortexr5,cortexm4,cortexm3,cortexm1,cortexm0
 	(const (symbol_ref ((enum attr_tune) arm_tune
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22195,6 +22195,8 @@ arm_file_start (void)
   const char *fpu_name;
   if (arm_selected_arch)
 	asm_fprintf (asm_out_file, "\t.arch %s\n", arm_selected_arch->name);
+  else if (strncmp (arm_selected_cpu->name, "generic", 7) == 0)
+	asm_fprintf (asm_out_file, "\t.arch %s\n", arm_selected_cpu->name + 8);
   else
 	asm_fprintf (asm_out_file, "\t.cpu %s\n", arm_selected_cpu->name);
 
@@ -23719,6 +23721,7 @@ arm_issue_rate (void)
 case cortexr4:
 case cortexr4f:
 case cortexr5:
+case genericv7a:
 case cortexa5:
 case cortexa8:
 case cortexa9:
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -189,6 +189,7 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
Do not define this macro if it does not need to do anything.  */
 #define EXTRA_SPECS		\
   { "subtarget_cpp_spec",	SUBTARGET_CPP_SPEC },

Re: [PATCH][ARM] -m{cpu,tune,arch}=native

2011-08-26 Thread Joseph S. Myers

On Fri, 26 Aug 2011, Andrew Stubbs wrote:

 Hi all,
 
 This patch adds support for -mcpu=native, -mtune=native, and -march=native for
 ARM Linux hosts.
 
 So far, it only recognises Cortex-A8 and Cortex-A9, so I really need to find
 out what the magic part numbers are for other CPUs before this patch is
 complete. I couldn't find this information listed anywhere. I think there
 are a lot of clues in the kernel code, but it's hard to mine and it mostly
 only goes as far as the architecture version, not the individual CPU. Any
 suggestions?
 
 Otherwise, is this OK?

arm-tables.opt is a generated file.  You need to modify the source files 
and regenerate it, not modify the generated file.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH][ARM] Generic tuning

2011-08-26 Thread Joseph S. Myers

Again, arm-tables.opt is generated - so the log entry should just be

* config/arm/arm-tables.opt: Regenerate.

and the file should be what you get from regeneration.

-- 
Joseph S. Myers
jos...@codesourcery.com

PING: PATCH: PR preprocessor/39533: -MM may list a header file twice

2011-08-26 Thread H.J. Lu

On Wed, Apr 15, 2009 at 1:07 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Wed, Apr 15, 2009 at 11:51 AM, Tom Tromey tro...@redhat.com wrote:
 H.J. == H J Lu hjl.to...@gmail.com writes:

 H.J. Can you take a look at my patch:
 H.J. http://gcc.gnu.org/ml/gcc-patches/2009-03/msg01829.html

 I looked at this today.  I don't understand why the check is not done
 in the loop.  Also I don't understand whether this patch can change
 the directory search order in cases like #include_next.  Can you
 comment on this issue?

 gcc.dg/cpp/pr20356.c checks the expected behavior for #include_next.
 My patch works with it.

 And more generally, could you try to provide some explanation for how
 these patches are supposed to function?  FWIW the reason it takes me
 so long to look at them is that I have to reverse engineer the logic,
 usually by applying the patch and stepping through with the
 debugger... which is an awful lot of work for a bug which is
 essentially cosmetic.


 There is only one patch:

 http://gcc.gnu.org/ml/gcc-patches/2009-03/msg01829.html

 search_cache checks if the file can be found when starting
 searching at START_DIR with a trailing '/'.  If the start_dir field of
 head of hash entry isn't NULL, it is the start search directory for
 the cached file. If START_DIR + name is the same as
 pathname for head and START_DIR is the directory which
 contains the file,

 (!strncmp (start_dir->name, file->path, start_dir->len)
   && !strcmp (file->name, file->path + start_dir->len)))

 that means the cached head is a perfect match.
 We don't need to add START_DIR to start search at
 START_DIR (with trailing '/') and then START_DIR (without
 trailing '/')


This is very old.  I also forgot about it.  OK for trunk?

Thanks.

-- 
H.J.
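Restated as a standalone predicate, the quoted condition checks that the cached path is exactly the start directory followed by the requested name. This is a sketch with hypothetical names; following the description above, it assumes the directory name carries a trailing '/' and that `dir_len` counts it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if PATH is DIR_NAME's first DIR_LEN characters followed by NAME,
   i.e. the cached file was found directly in that directory.  */
static bool
cached_path_matches (const char *dir_name, size_t dir_len,
                     const char *name, const char *path)
{
  assert (dir_len <= strlen (dir_name));
  /* strncmp fails first if PATH is shorter than DIR_LEN, so
     PATH + DIR_LEN is only read when it is in bounds.  */
  return strncmp (dir_name, path, dir_len) == 0
         && strcmp (name, path + dir_len) == 0;
}
```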

Re: [PATCH, ARM] Unaligned accesses for packed structures [1/2]

2011-08-26 Thread Julian Brown

On Thu, 25 Aug 2011 18:31:21 +0100
Julian Brown jul...@codesourcery.com wrote:

 On Thu, 25 Aug 2011 16:46:50 +0100
 Julian Brown jul...@codesourcery.com wrote:
 
  So, OK to apply this version, assuming testing comes out OK? (And
  the followup patch [2/2], which remains unchanged?)
 
 FWIW, all tests pass, apart from
 gcc.target/arm/volatile-bitfields-3.c, which regresses. The output
 contains:
 
 ldrh	r0, [r3, #2]	@ unaligned
 
 I believe that, to conform to the ARM EABI, GCC must use an
 (aligned) ldr in this case. Is that correct? If so, it looks like the
 middle-end bitfield code does not take the setting of
 -fstrict-volatile-bitfields into account.

This version fixes the last issue, by adding additional checks for
volatile accesses/-fstrict-volatile-bitfields. Tests now show no
regressions.

OK to apply?

Thanks,

Julian

ChangeLog

gcc/
	* config/arm/arm.c (arm_option_override): Add unaligned_access
	support.
	(arm_file_start): Emit attribute for unaligned access as
	appropriate.
	* config/arm/arm.md (UNSPEC_UNALIGNED_LOAD)
	(UNSPEC_UNALIGNED_STORE): Add constants for unspecs.
	(insv, extzv): Add unaligned-access support.
	(extv): Change to expander. Likewise.
	(extzv_t1, extv_regsi): Add helpers.
	(unaligned_loadsi, unaligned_loadhis, unaligned_loadhiu)
	(unaligned_storesi, unaligned_storehi): New.
	(*extv_reg): New (previous extv implementation).
	* config/arm/arm.opt (munaligned_access): Add option.
	* config/arm/constraints.md (Uw): New constraint.
	* expmed.c (store_bit_field_1): Adjust bitfield numbering according
	to size of access, not size of unit, when BITS_BIG_ENDIAN !=
	BYTES_BIG_ENDIAN. Don't use bitfield accesses for
	volatile accesses when -fstrict-volatile-bitfields is in effect.
	(extract_bit_field_1): Likewise.
commit 645a7c99ff91ea2841c8101fb3c76e3b1fddb2c7
Author: Julian Brown jul...@henry8.codesourcery.com
Date:   Tue Aug 23 05:46:22 2011 -0700

Unaligned support for packed structs

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3162b30..cc1eb80 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1905,6 +1905,28 @@ arm_option_override (void)
 	fix_cm3_ldrd = 0;
 }
 
+  /* Enable -munaligned-access by default for
+ - all ARMv6 architecture-based processors
+ - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors.
+
+ Disable -munaligned-access by default for
+ - all pre-ARMv6 architecture-based processors
+ - ARMv6-M architecture-based processors.  */
+
+  if (unaligned_access == 2)
+    {
+      if (arm_arch6 && (arm_arch_notm || arm_arch7))
+	unaligned_access = 1;
+      else
+	unaligned_access = 0;
+    }
+  else if (unaligned_access == 1
+	   && !(arm_arch6 && (arm_arch_notm || arm_arch7)))
+    {
+      warning (0, "target CPU does not support unaligned accesses");
+      unaligned_access = 0;
+    }
+
   if (TARGET_THUMB1 && flag_schedule_insns)
 {
   /* Don't warn since it's on by default in -O2.  */
@@ -22145,6 +22167,10 @@ arm_file_start (void)
 	val = 6;
   asm_fprintf (asm_out_file, "\t.eabi_attribute 30, %d\n", val);
 
+  /* Tag_CPU_unaligned_access.  */
+  asm_fprintf (asm_out_file, "\t.eabi_attribute 34, %d\n",
+		   unaligned_access);
+
   /* Tag_ABI_FP_16bit_format.  */
   if (arm_fp16_format)
 	asm_fprintf (asm_out_file, "\t.eabi_attribute 38, %d\n",
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0f23400..0ea0f7f 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -103,6 +103,10 @@
   UNSPEC_SYMBOL_OFFSET  ; The offset of the start of the symbol from
 ; another symbolic address.
   UNSPEC_MEMORY_BARRIER ; Represent a memory barrier.
+  UNSPEC_UNALIGNED_LOAD	; Used to represent ldr/ldrh instructions that access
+			; unaligned locations, on architectures which support
+			; that.
+  UNSPEC_UNALIGNED_STORE ; Same for str/strh.
 ])
 
 ;; UNSPEC_VOLATILE Usage:
@@ -2468,10 +2472,10 @@
 ;;; this insv pattern, so this pattern needs to be reevalutated.
 
 (define_expand "insv"
-  [(set (zero_extract:SI (match_operand:SI 0 "s_register_operand" "")
-                         (match_operand:SI 1 "general_operand" "")
-                         (match_operand:SI 2 "general_operand" ""))
-        (match_operand:SI 3 "reg_or_int_operand" ""))]
+  [(set (zero_extract (match_operand 0 "nonimmediate_operand" "")
+                      (match_operand 1 "general_operand" "")
+                      (match_operand 2 "general_operand" ""))
+        (match_operand 3 "reg_or_int_operand" ""))]
   "TARGET_ARM || arm_arch_thumb2"
   "
   {
@@ -2482,35 +2486,70 @@
 
 if (arm_arch_thumb2)
   {
-	bool use_bfi = TRUE;
-
-	if (GET_CODE (operands[3]) == CONST_INT)
+	if (unaligned_access && MEM_P (operands[0])
+	    && s_register_operand (operands[3], GET_MODE (operands[3]))
+	    && (width == 16 || width == 32) && (start_bit % BITS_PER_UNIT) == 0)
 	  {
-	    HOST_WIDE_INT val = INTVAL (operands[3]) & mask;
+

Re: [v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h

2011-08-26 Thread Rainer Orth

Hi Paolo,

 I'm pretty sure this is the case for Solaris.  The other changes we've
 made to support __cplusplus 199711L were no-ops without the last one to 
 change __cplusplus from 1 to the C++ 98 value.  So, redefining
 __cplusplus to 1 should return us back to the old status.

 I see, then I think the patch is Ok. Since you are so well positioned to test 
 on Solaris machines, I would recommend running the library testsuite with 
 -D__cplusplus=1 added to CXXFLAGS, as a final check.

I've just done that on i386-pc-solaris2.11, but had to use -U__cplusplus
-D__cplusplus=1 to avoid the redefinition warning.  This way, I get only
a single regression:

-PASS: abi/header_cxxabi.c (test for excess errors)
+FAIL: abi/header_cxxabi.c (test for excess errors)

FAIL: abi/header_cxxabi.c (test for excess errors)
Excess errors:
/var/gcc/regression/trunk/11-gcc/build/i386-pc-solaris2.11/libstdc++-v3/include/i386-pc-solaris2.11/bits/c++config.h:167:1:
 error: unknown type name 'namespace'
/var/gcc/regression/trunk/11-gcc/build/i386-pc-solaris2.11/libstdc++-v3/include/i386-pc-solaris2.11/bits/c++config.h:168:1:
 error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token

which is pretty obvious given that this test is supposed to be compiled
as C :-)

I guess the patch is ok now?

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University

Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-26 Thread Sriraman Tallam

On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com wrote:
 Hi,

  Thanks for all the comments. I am attaching a new patch
 incorporating all of the changes mentioned, mainly :

 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
 it only once.

 This is unreliable and you don't need 3 symbols from libgcc. You can use

Do you mean it is unreliable because of the constructor ordering problem?


 static struct cpu_indicator
 {
  feature
  model
  status
 } cpu_indicator;

 struct cpu_indicator *
 __get_cpu_indicator ()
 {
   if cpu_indicator is uninitialized; then
      initialize cpu_indicator;
  return &cpu_indicator;
 }

 You can simply call __get_cpu_indicator to
 get a pointer to cpu_indicator;

 --
 H.J.
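H.J.'s pseudocode, written out as C. This is a sketch: the field names are placeholders, the detection step is stubbed out, and the accessor is called `get_cpu_indicator` instead of `__get_cpu_indicator` to stay out of the reserved namespace. As written it is not thread-safe; a real version would need something like `pthread_once` or an atomic flag.

```c
#include <assert.h>
#include <stdbool.h>

struct cpu_indicator
{
  bool initialized;
  unsigned int feature;   /* placeholder fields */
  unsigned int model;
};

static struct cpu_indicator cpu_indicator;
static int init_count;    /* only here to show detection runs once */

/* Stand-in for the real CPUID-based detection.  */
static void
detect_cpu (struct cpu_indicator *ci)
{
  ci->feature = 0;
  ci->model = 0;
  init_count++;
}

/* Initialize on first use, then always hand out the same object.  */
static struct cpu_indicator *
get_cpu_indicator (void)
{
  if (!cpu_indicator.initialized)
    {
      detect_cpu (&cpu_indicator);
      cpu_indicator.initialized = true;
    }
  return &cpu_indicator;
}
```

Callers never depend on constructor ordering, at the cost of one flag test per call.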

Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-26 Thread H.J. Lu

On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam tmsri...@google.com wrote:
 On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com wrote:
 Hi,

  Thanks for all the comments. I am attaching a new patch
 incorporating all of the changes mentioned, mainly :

 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
 it only once.

 This is unreliable and you don't need 3 symbols from libgcc. You can use

 Do you mean it is unreliable because of the constructor ordering problem?


You do not have total control over when __cpu_indicator_init is called.

Also you shouldn't use bitfield in

struct __processor_model
+{
+  unsigned int __cpu_is_amd : 1;
+  unsigned int __cpu_is_intel : 1;
+  unsigned int __cpu_is_intel_atom : 1;
+  unsigned int __cpu_is_intel_core2 : 1;
+  unsigned int __cpu_is_intel_corei7_nehalem : 1;
+  unsigned int __cpu_is_intel_corei7_westmere : 1;
+  unsigned int __cpu_is_intel_corei7_sandybridge : 1;
+  unsigned int __cpu_is_amdfam10_barcelona : 1;
+  unsigned int __cpu_is_amdfam10_shanghai : 1;
+  unsigned int __cpu_is_amdfam10_istanbul : 1;
+} __cpu_model = {0};
+

A processor can't be both Atom and Core 2.

-- 
H.J.

Re: [v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h

2011-08-26 Thread Jonathan Wakely

On 26 August 2011 14:09, Paolo Carlini wrote:
 On 8/26/11 2:59 PM, Rainer Orth wrote:

 Hi Paolo,

 Ok for mainline if bootstraps pass?

 Not a comment strictly about this patch, but why do we have things like
 #if __cplusplus >= 199711L anywhere? For sure the library is not supposed
 to be used together with old C++ front-ends.

 I thought about this myself, but at least the overloads are only present
 with __cplusplus >= 199711L.

 I don't understand: isn't __cplusplus now *always* >= 199711L? Or do you want
 to protect against the user undefining __cplusplus and then defining it to a
 different value?!? I don't have the Standard at hand (in theory I'm on
 vacation ;), maybe Marc can help, but I don't think it's legal, is it?

[cpp.predefined]/3:

If any of the pre-defined macro names in this subclause, or the
identifier defined, is the subject of a #define or a #undef
preprocessing directive, the behavior is undefined.

Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-26 Thread Sriraman Tallam

On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam tmsri...@google.com wrote:
 On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com 
 wrote:
 Hi,

  Thanks for all the comments. I am attaching a new patch
 incorporating all of the changes mentioned, mainly :

 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
 it only once.

 This is unreliable and you don't need 3 symbols from libgcc. You can use

 Do you mean it is unreliable because of the constructor ordering problem?


 You do not have total control over when __cpu_indicator_init is called.

As discussed before, for non-ctor functions, which in my opinion are
the common use case, it works out great because __cpu_indicator_init
is guaranteed to be called and I save an extra check. It is only a
problem for other ctors, so other ctors call it explicitly.  What did
I miss?

Thanks,
-Sri.


 Also you shouldn't use bitfield in

 struct __processor_model
 +{
 +  unsigned int __cpu_is_amd : 1;
 +  unsigned int __cpu_is_intel : 1;
 +  unsigned int __cpu_is_intel_atom : 1;
 +  unsigned int __cpu_is_intel_core2 : 1;
 +  unsigned int __cpu_is_intel_corei7_nehalem : 1;
 +  unsigned int __cpu_is_intel_corei7_westmere : 1;
 +  unsigned int __cpu_is_intel_corei7_sandybridge : 1;
 +  unsigned int __cpu_is_amdfam10_barcelona : 1;
 +  unsigned int __cpu_is_amdfam10_shanghai : 1;
 +  unsigned int __cpu_is_amdfam10_istanbul : 1;
 +} __cpu_model = {0};
 +

 A processor can't be both Atom and Core 2.

 --
 H.J.
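H.J.'s objection is that these flags are mutually exclusive, so one bit per possibility can encode impossible states. A minimal sketch of an alternative, assuming hypothetical names (`cpu_vendor`, `cpu_type`, `processor_model`, `cpu_is`) and not necessarily the layout the patch ends up using: one enum field for the vendor and one for the CPU type, so exactly one type can be recorded at a time.

```c
/* Hypothetical enum-based layout: a processor has exactly one vendor and
   one type, so each is a single field instead of one bit per possibility.
   Zero means "unknown / not yet initialized".  */
enum cpu_vendor { VENDOR_UNKNOWN = 0, VENDOR_INTEL, VENDOR_AMD };

enum cpu_type
{
  CPU_TYPE_UNKNOWN = 0,
  INTEL_ATOM,
  INTEL_CORE2,
  INTEL_COREI7_NEHALEM,
  INTEL_COREI7_WESTMERE,
  INTEL_COREI7_SANDYBRIDGE,
  AMDFAM10_BARCELONA,
  AMDFAM10_SHANGHAI,
  AMDFAM10_ISTANBUL
};

struct processor_model
{
  enum cpu_vendor vendor;
  enum cpu_type type;
};

/* Static storage is zero-initialized: VENDOR_UNKNOWN / CPU_TYPE_UNKNOWN.  */
static struct processor_model cpu_model;

/* Each query compares against the single stored value.  */
static int cpu_is (enum cpu_type t) { return cpu_model.type == t; }
static int cpu_is_vendor (enum cpu_vendor v) { return cpu_model.vendor == v; }
```

With `cpu_model.type = INTEL_ATOM`, `cpu_is (INTEL_CORE2)` is false by construction; with the bit-field version nothing prevents both bits from being set.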

Re: [v3] Handle different versions of Solaris 8 iso/math_iso.h, iso/stdlib_iso.h

2011-08-26 Thread Jonathan Wakely

On 26 August 2011 18:13, Jonathan Wakely wrote:
 On 26 August 2011 14:09, Paolo Carlini wrote:
 On 8/26/11 2:59 PM, Rainer Orth wrote:

 Hi Paolo,

 Ok for mainline if bootstraps pass?

 Not a comment strictly about this patch, but why do we have things like
 #if __cplusplus >= 199711L anywhere? For sure the library is not supposed
 to be used together with old C++ front-ends.

 I thought about this myself, but at least the overloads are only present
 with __cplusplus >= 199711L.

 I don't understand: isn't __cplusplus now *always* >= 199711L? Or do you
 want to protect against the user undefining __cplusplus and then defining
 it to a different value?!? I don't have the Standard at hand (in theory I'm
 on vacation ;), maybe Marc can help, but I don't think it's legal, is it?

 [cpp.predefined]/3:

 If any of the pre-defined macro names in this subclause, or the
 identifier defined, is the subject of a #define or a #undef
 preprocessing directive, the behavior is undefined.


More specifically, __cplusplus is ***NOT*** a feature-test macro like
_POSIX_SOURCE that can be set by users to request different language
standards.

Setting __cplusplus will have no effect on the front-end, but might
confuse the library (or other third-party headers) just as using
-D__GXX_EXPERIMENTAL_CXX0X__ without -std=g++0x will cause big
problems, because the front-end will be using -std=c++98 mode but the
library will think C++0x support is enabled.  Doing this will cause
pain.

If there is ***any*** maintenance overhead involved in supporting
users who try to redefine __cplusplus then I think it's a mistake.
I'm certainly not going to think of the effects on those users when I
make changes to the library.
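To illustrate the kind of guard under discussion: the front end predefines `__cplusplus` according to the language mode, so a header may test it but must never `#define` or `#undef` it. A minimal sketch in C, where the macro `HAVE_CXX98_OVERLOADS` and the function `have_cxx98_overloads` are hypothetical; a C compiler does not predefine `__cplusplus`, so the C++-only branch is skipped.

```c
/* __cplusplus is set by the compiler from the language mode (e.g. -std=c++98);
   code may only test it, never redefine it ([cpp.predefined]/3).  */
#if defined (__cplusplus) && __cplusplus >= 199711L
#  define HAVE_CXX98_OVERLOADS 1   /* C++98 or later: overloads are visible */
#else
#  define HAVE_CXX98_OVERLOADS 0   /* plain C, or a pre-standard C++ front end */
#endif

/* Hypothetical query so the preprocessor decision can be checked at run time.  */
static int have_cxx98_overloads (void)
{
  return HAVE_CXX98_OVERLOADS;
}
```

Built as C, `have_cxx98_overloads ()` returns 0; built by a conforming C++ front end, it returns 1.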

Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-26 Thread H.J. Lu

On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam tmsri...@google.com wrote:
 On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com 
 wrote:
 Hi,

  Thanks for all the comments. I am attaching a new patch
 incorporating all of the changes mentioned, mainly :

 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
 it only once.

 This is unreliable and you don't need 3 symbols from libgcc. You can use

 Do you mean it is unreliable because of the constructor ordering problem?


 You do not have total control when __cpu_indicator_init is called.

 Like  discussed before, for non-ctor functions, which in my opinion is
 the common use case, it works out great because __cpu_indicator_init
 is guaranteed to be called and I save doing an extra check. It is only
 for other ctors where this is a problem. So other ctors call this
 explicitly.  What did I miss?


I have

static void foo ( void ) __attribute__((constructor));

static void foo ( void )
{
   /* ... */
   bar ();
   /* ... */
}

in my application. bar () uses those cpu specific functions.
foo () is called before __cpu_indicator_init.  Since IFUNC
returns the cpu specific function address only for the
first call, the proper cpu specific functions will never be used.


-- 
H.J.
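A minimal model of the failure H.J. describes, assuming a dispatcher that, like an IFUNC slot, picks an implementation on first use and never re-resolves. Every name here (`hypothetical_cpu_init`, `cpu_has_fast_path`, `bar_generic`, `bar_fast`) is hypothetical, and a real IFUNC resolver runs in the dynamic linker rather than behind a function-pointer check; the sketch only shows why resolving before initialization sticks.

```c
/* Hypothetical CPU state, filled in by an initializer.  */
static int cpu_model_ready;
static int cpu_has_fast_path;

static void hypothetical_cpu_init (void)
{
  if (cpu_model_ready)          /* guard: later calls are no-ops */
    return;
  cpu_has_fast_path = 1;        /* stand-in for real CPUID probing */
  cpu_model_ready = 1;
}

/* Two implementations of bar; the return value says which one ran.  */
static int bar_generic (void) { return 0; }
static int bar_fast (void) { return 1; }

/* Resolved on the first call, then cached forever (like an IFUNC slot).  */
static int (*bar_impl) (void);

static int bar (void)
{
  if (bar_impl == 0)
    bar_impl = cpu_has_fast_path ? bar_fast : bar_generic;
  return bar_impl ();
}
```

If a constructor calls `bar ()` while the state is still zero, `bar_generic` is cached; initializing afterwards cannot change that choice.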

Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-26 Thread Sriraman Tallam

On Fri, Aug 26, 2011 at 10:24 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam tmsri...@google.com wrote:
 On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com 
 wrote:
 Hi,

  Thanks for all the comments. I am attaching a new patch
 incorporating all of the changes mentioned, mainly :

 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
 it only once.

 This is unreliable and you don't need 3 symbols from libgcc. You can use

 Do you mean it is unreliable because of the constructor ordering problem?


 You do not have total control when __cpu_indicator_init is called.

 Like  discussed before, for non-ctor functions, which in my opinion is
 the common use case, it works out great because __cpu_indicator_init
 is guaranteed to be called and I save doing an extra check. It is only
 for other ctors where this is a problem. So other ctors call this
 explicitly.  What did I miss?


 I have

 static void foo ( void ) __attribute__((constructor));

 static void foo ( void )
 {
   ...
   call bar ();
   ...
 }

 in my application. bar () uses those cpu specific functions.
 foo () is called before __cpu_indicator_init.  Since IFUNC
 returns the cpu specific function address only for the
 first call, the proper cpu specific functions will never be used.

Please correct me if I am wrong, since I did not follow the IFUNC part
you mentioned.  However, it looks like this could be solved by
adding an explicit call to __cpu_indicator_init from within the ctor
foo. To me, the pain of adding this call explicitly in
other ctors is worth it because it works cleanly for non-ctors.

static void foo ( void ) __attribute__((constructor));

static void foo ( void )
{
  /* ... */
  __cpu_indicator_init ();
  bar ();
  /* ... */
}

Will this work?

Thanks,
-Sri.



 --
 H.J.

Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-26 Thread H.J. Lu

On Fri, Aug 26, 2011 at 10:37 AM, Sriraman Tallam tmsri...@google.com wrote:
 On Fri, Aug 26, 2011 at 10:24 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com 
 wrote:
 Hi,

  Thanks for all the comments. I am attaching a new patch
 incorporating all of the changes mentioned, mainly :

 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
 it only once.

 This is unreliable and you don't need 3 symbols from libgcc. You can use

 Do you mean it is unreliable because of the constructor ordering problem?


 You do not have total control when __cpu_indicator_init is called.

 Like  discussed before, for non-ctor functions, which in my opinion is
 the common use case, it works out great because __cpu_indicator_init
 is guaranteed to be called and I save doing an extra check. It is only
 for other ctors where this is a problem. So other ctors call this
 explicitly.  What did I miss?


 I have

 static void foo ( void ) __attribute__((constructor));

 static void foo ( void )
 {
   ...
   call bar ();
   ...
 }

 in my application. bar () uses those cpu specific functions.
 foo () is called before __cpu_indicator_init.  Since IFUNC
 returns the cpu specific function address only for the
 first call, the proper cpu specific functions will never be used.

 Please correct me if I am wrong since I did not follow the IFUNC part
 you mentioned.  However, it looks like this could be solved with
 adding an explicit call to __cpu_indicator_init from within the ctor
 foo. To me, it seems like the pain of adding this call explicitly in
 other ctors is worth it because it works cleanly for non-ctors.

 static void foo ( void ) __attribute__((constructor));

 static void foo ( void )
 {
  ...
  __cpu_indicator_init ();
  call bar ();
  ...
 }

 Will this work?



Do I have to do that in every constructor, including
C++ global constructors?  It is ridiculous.

-- 
H.J.

Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-26 Thread Xinliang David Li

Is there a standard way to force this init function to be called
before all ctors?  By adding a ctor in one crtx.o?

David
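One GCC mechanism relevant to David's question is the optional priority argument of the `constructor` attribute: a constructor with a smaller priority number runs before one with a larger number, and priorities 0 to 100 are reserved for the implementation. A minimal sketch with hypothetical names, assuming an ELF target where constructors that specify a priority run before those that do not; it does not address C++ `init_priority` or cross-library ordering.

```c
/* Hypothetical CPU state.  */
static int cpu_model_ready;
static int cpu_has_fast_path;

/* 101 is the smallest priority not reserved for the implementation, so this
   is assumed to run before constructors declared without a priority.  */
static void hypothetical_cpu_init (void) __attribute__ ((constructor (101)));

static void hypothetical_cpu_init (void)
{
  if (cpu_model_ready)          /* guard: harmless if also called explicitly */
    return;
  cpu_has_fast_path = 1;        /* stand-in for real CPUID probing */
  cpu_model_ready = 1;
}

/* An ordinary user constructor, playing the role of foo () in the thread.  */
static int seen_by_user_ctor = -1;
static void user_ctor (void) __attribute__ ((constructor));

static void user_ctor (void)
{
  seen_by_user_ctor = cpu_model_ready;   /* records what foo () would observe */
}
```

If the ordering assumption holds, `user_ctor` already sees the initialized state without calling the init function itself.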

On Fri, Aug 26, 2011 at 10:45 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 10:37 AM, Sriraman Tallam tmsri...@google.com wrote:
 On Fri, Aug 26, 2011 at 10:24 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com 
 wrote:
 Hi,

  Thanks for all the comments. I am attaching a new patch
 incorporating all of the changes mentioned, mainly :

 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
 it only once.

 This is unreliable and you don't need 3 symbols from libgcc. You can use

 Do you mean it is unreliable because of the constructor ordering problem?


 You do not have total control when __cpu_indicator_init is called.

 Like  discussed before, for non-ctor functions, which in my opinion is
 the common use case, it works out great because __cpu_indicator_init
 is guaranteed to be called and I save doing an extra check. It is only
 for other ctors where this is a problem. So other ctors call this
 explicitly.  What did I miss?


 I have

 static void foo ( void ) __attribute__((constructor));

 static void foo ( void )
 {
   ...
   call bar ();
   ...
 }

 in my application. bar () uses those cpu specific functions.
 foo () is called before __cpu_indicator_init.  Since IFUNC
 returns the cpu specific function address only for the
 first call, the proper cpu specific functions will never be used.

 Please correct me if I am wrong since I did not follow the IFUNC part
 you mentioned.  However, it looks like this could be solved with
 adding an explicit call to __cpu_indicator_init from within the ctor
 foo. To me, it seems like the pain of adding this call explicitly in
 other ctors is worth it because it works cleanly for non-ctors.

 static void foo ( void ) __attribute__((constructor));

 static void foo ( void )
 {
  ...
  __cpu_indicator_init ();
  call bar ();
  ...
 }

 Will this work?



 Do I have to do that in every constructor, including
 C++ global constructors?  It is ridiculous.

 --
 H.J.

Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-26 Thread Sriraman Tallam

On Fri, Aug 26, 2011 at 10:45 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 10:37 AM, Sriraman Tallam tmsri...@google.com wrote:
 On Fri, Aug 26, 2011 at 10:24 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com 
 wrote:
 Hi,

  Thanks for all the comments. I am attaching a new patch
 incorporating all of the changes mentioned, mainly :

 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
 it only once.

 This is unreliable and you don't need 3 symbols from libgcc. You can use

 Do you mean it is unreliable because of the constructor ordering problem?


 You do not have total control when __cpu_indicator_init is called.

 Like  discussed before, for non-ctor functions, which in my opinion is
 the common use case, it works out great because __cpu_indicator_init
 is guaranteed to be called and I save doing an extra check. It is only
 for other ctors where this is a problem. So other ctors call this
 explicitly.  What did I miss?


 I have

 static void foo ( void ) __attribute__((constructor));

 static void foo ( void )
 {
   ...
   call bar ();
   ...
 }

 in my application. bar () uses those cpu specific functions.
 foo () is called before __cpu_indicator_init.  Since IFUNC
 returns the cpu specific function address only for the
 first call, the proper cpu specific functions will never be used.

 Please correct me if I am wrong since I did not follow the IFUNC part
 you mentioned.  However, it looks like this could be solved with
 adding an explicit call to __cpu_indicator_init from within the ctor
 foo. To me, it seems like the pain of adding this call explicitly in
 other ctors is worth it because it works cleanly for non-ctors.

 static void foo ( void ) __attribute__((constructor));

 static void foo ( void )
 {
  ...
  __cpu_indicator_init ();
  call bar ();
  ...
 }

 Will this work?



 Do I have to do that in every constructor, including
 C++ global constructors?  It is ridiculous.

It seems that libgcc comes after user code on the link command line,
so __cpu_indicator_init should fire first, both when statically and
dynamically linked.
Example:

foo.cc:
int  __attribute__ ((constructor))
foo ()
{
  return 0;
}


However, with something like this:

g++ -Wl,--u,__cpu_indicator_init  -lgcc foo.cc

foo gets called ahead of __cpu_indicator_init. For these abnormal link
usages, call it explicitly. So, can you please give me a common use
case where __cpu_indicator_init will get called after a constructor?

Thanks,
-Sri.


 --
 H.J.

Re: [C++0x] contiguous bitfields race implementation

2011-08-26 Thread Aldy Hernandez

This is a slight update from the last revision, with your issues 
addressed as I explained in the last email.  However, everything turned 
out to be much trickier than I expected (variable-length offsets with 
arrays, bit fields spanning multiple words, surprising padding 
gymnastics by GCC, etc.).


It turns out that what we need is to know the precise bit region size at 
all times, and adjust it as we rearrange and cut things into pieces 
throughout the RTL bit field machinery.
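The constraint comes from the C++11 memory model: a memory location is a scalar object or a maximal sequence of adjacent bit-fields of nonzero width, and a store to one location must not write another. A simplified sketch of computing such a bit region, assuming a hand-written, hypothetical layout table (`struct field_desc`) rather than GCC's trees and `get_bit_range`; a zero-width bit-field is modeled as an entry with `bitsize` 0 that ends a run.

```c
/* Hypothetical layout description: one entry per field, in declaration order.  */
struct field_desc
{
  unsigned bitpos;   /* offset from the start of the struct, in bits */
  unsigned bitsize;  /* width in bits; 0 for a zero-width bit-field */
  int is_bitfield;
};

/* Compute the bit region [*start, *end) that a store to field IDX may touch:
   the maximal run of adjacent nonzero-width bit-fields containing IDX, or
   just the field itself when it is not a bit-field.  */
static void
bit_region (const struct field_desc *f, int n, int idx,
            unsigned *start, unsigned *end)
{
  int lo = idx, hi = idx;

  if (f[idx].is_bitfield)
    {
      while (lo > 0 && f[lo - 1].is_bitfield && f[lo - 1].bitsize != 0)
        lo--;
      while (hi < n - 1 && f[hi + 1].is_bitfield && f[hi + 1].bitsize != 0)
        hi++;
    }
  *start = f[lo].bitpos;
  *end = f[hi].bitpos + f[hi].bitsize;
}
```

For `struct { char a; int b : 7; int c : 5; char d; }`, with the layout assumed here (as on typical x86-64: `a` at bit 0, `b` at 8, `c` at 15, `d` at 24), a store to `b` may rewrite bits 8 to 19 but must leave `a` and `d` alone, even if a wider access would be cheaper.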


I enabled the C++ memory model and forced a bootstrap and regression 
test with it.  This brought about many interesting cases, which I was 
able to distill and add to the testsuite.


Of particular interest were the struct-layout-1.exp tests.  Since many of 
the tests set a global bit field, only to later check it against a local 
variable containing the same value, they are the perfect stressor because, 
while globals are restricted under the memory model, locals are not.  So 
we can check that we can interoperate with the less restrictive model, 
and that the patch does not introduce ABI inconsistencies.  After much 
grief, we are now passing all the struct-layout-1.exp tests. 
Eventually, I'd like to force the struct-layout-1.exp tests to run for 
--param allow-store-data-races=0 as well.  Unfortunately, this will 
increase testing time.


I have (unfortunately) introduced an additional call to 
get_inner_reference(), but only for the field itself (one time).  I 
can't remember the details, but it was something to the effect of the bit 
position + padding being impossible to calculate in one variable array 
reference case.  I can dig up the case if you'd like.


I am currently tackling a reload miscompilation failure while building a 
32-bit library.  I am secretly hoping your review will uncover the flaw 
without me having to pick this up.  Otherwise, this is a much more 
comprehensive approach than what is currently in mainline, and we now 
pass all the bitfield tests the GCC testsuite could throw at it.


Fire away.
* machmode.h (get_best_mode): Remove 2 arguments.
* fold-const.c (optimize_bit_field_compare): Same.
(fold_truthop): Same.
* expr.c (store_field): Change argument types in prototype.
(emit_group_store): Change argument types to store_bit_field call.
(copy_blkmode_from_reg): Same.
(write_complex_part): Same.
(optimize_bitfield_assignment_op): Change argument types.
Change arguments to get_best_mode.
(get_bit_range): Rewrite.
(expand_assignment): Adjust new call to get_bit_range.
Adjust bitregion_offset when to_rtx is changed.
Adjust calls to store_field with new argument types.
(store_field): New argument types.
Adjust calls to store_bit_field with new arguments.
* expr.h (store_bit_field): Change argument types.
* stor-layout.c (get_best_mode): Remove use of bitregion* arguments.
* expmed.c (store_bit_field_1): Change argument types.
Do not calculate maxbits.
Adjust bitregion_maxbits if offset changes.
(store_bit_field): Change argument types.
Adjust address taking into account bitregion_offset.
(store_fixed_bit_field): Change argument types.
Do not calculate maxbits.
(store_split_bit_field): Change argument types.
(extract_bit_field_1): Adjust arguments to get_best_mode.
(extract_fixed_bit_field): Same.

Index: machmode.h
===
--- machmode.h  (revision 176891)
+++ machmode.h  (working copy)
@@ -249,8 +249,6 @@ extern enum machine_mode mode_for_vector
 /* Find the best mode to use to access a bit field.  */
 
 extern enum machine_mode get_best_mode (int, int,
-   unsigned HOST_WIDE_INT,
-   unsigned HOST_WIDE_INT,
unsigned int,
enum machine_mode, int);
 
Index: fold-const.c
===
--- fold-const.c(revision 176891)
+++ fold-const.c(working copy)
@@ -3394,7 +3394,7 @@ optimize_bit_field_compare (location_t l
&& flag_strict_volatile_bitfields > 0)
 nmode = lmode;
   else
-nmode = get_best_mode (lbitsize, lbitpos, 0, 0,
+nmode = get_best_mode (lbitsize, lbitpos,
   const_p ? TYPE_ALIGN (TREE_TYPE (linner))
   : MIN (TYPE_ALIGN (TREE_TYPE (linner)),
  TYPE_ALIGN (TREE_TYPE (rinner))),
@@ -5221,7 +5221,7 @@ fold_truthop (location_t loc, enum tree_
  to be relative to a field of that size.  */
   first_bit = MIN (ll_bitpos, rl_bitpos);
   end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
-  lnmode = get_best_mode (end_bit - first_bit, first_bit, 0, 0,
+  lnmode = get_best_mode (end_bit - first_bit, first_bit,

Re: [PATCH] Handle MEM_REF in decode_addr_const

2011-08-26 Thread Andrew Pinski

On Fri, Aug 26, 2011 at 5:53 AM, Richard Guenther rguent...@suse.de wrote:

 Another missed piece, exposed by less MEM_REF -> ARRAY_REF folding.
 Interestingly only for Ada testcases.

I think this also fixed
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50116  but I don't know
for sure.

Thanks,
Andrew Pinski

[lra] patch to fix ppc32 code size degradation and a small clean up

2011-08-26 Thread Vladimir Makarov

  LRA on ppc32 had some code size degradation in comparison with the 
reload pass.  The reason is the systematic use of memory-to-memory 
moves through two integer registers for DFmode instead of one 
floating-point register, as reload does.


  The following patch solves the problem.  It does so by preferring 
the insn alternative with the smallest number of hard registers 
involved when higher-priority rules (like the number of needed reloads) 
give the same result.


  I would also like to use register pressure information when choosing 
an alternative, but unfortunately that would make LRA slower because 
the information is not available at this subpass (constraints).


  Another option would be to use insn length, but again that needs a 
(temporary) transformation to the final insn, which is not known yet at 
this stage because we have not yet assigned hard registers to reload 
pseudos or memory to spilled pseudos.
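The new tie-breaker can be pictured as a lexicographic comparison. A deliberately simplified sketch, assuming a hypothetical `struct alt_cost` that keeps only a combined reload cost and the new register count (LRA's real test in `process_alt_operands` also weighs losers, small-class operands and `reload_sum`), with illustrative cost numbers: at equal cost, the alternative needing fewer hard registers wins, which is what favors one FP register over two integer registers for a DFmode move on ppc32.

```c
/* Hypothetical, simplified cost record for one insn alternative.  */
struct alt_cost
{
  int overall;       /* combined reload cost (the higher-priority criterion) */
  int reload_nregs;  /* hard registers needed by the reloads */
};

/* Return nonzero if CAND should replace BEST as the chosen alternative.  */
static int
alt_better (const struct alt_cost *cand, const struct alt_cost *best)
{
  if (cand->overall != best->overall)
    return cand->overall < best->overall;
  /* Equal cost: prefer the alternative that uses fewer hard registers.  */
  return cand->reload_nregs < best->reload_nregs;
}
```

A strictly cheaper alternative still wins regardless of its register count; the register count only breaks ties.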


The patch also contains a cleanup of the function mark_not_eliminable.

The patch was bootstrapped on x86-64 and ppc64.

2011-08-26  Vladimir Makarov vmaka...@redhat.com

* lra-constraints.c (best_reload_nregs): New variable.
(process_alt_operands): Prefer alternatives involving fewer hard
registers.  Increase reject for all failed non-registers.

* lra-eliminations.c (mark_not_eliminable): Add check on hard
register before looping on eliminations.

Index: lra-constraints.c
===
--- lra-constraints.c   (revision 178120)
+++ lra-constraints.c   (working copy)
@@ -1143,6 +1143,10 @@ static int best_losers, best_overall;
 /* Number of small register classes used for operands of the best
alternative.  */
 static int best_small_class_operands_num;
+/* Overall number hard registers used for reloads.  For example, on
+   some targets we need 2 general registers to reload DFmode and only
+   one floating point register.  */
+static int best_reload_nregs;
 /* Overall number reflecting distances of previous reloading the same
value.  It is used to improve inheritance chances.  */
 static int best_reload_sum;
@@ -1415,7 +1419,7 @@ process_alt_operands (int only_alternati
   rtx no_subreg_operand[MAX_RECOG_OPERANDS], operand_reg[MAX_RECOG_OPERANDS];
   int hard_regno[MAX_RECOG_OPERANDS];
   enum machine_mode biggest_mode[MAX_RECOG_OPERANDS];
-  int reload_sum;
+  int reload_nregs, reload_sum;
 
   /* Calculate some data common for all alternatives to speed up the
  function.  */
@@ -1460,7 +1464,7 @@ process_alt_operands (int only_alternati
  (only_alternative >= 0 && nalt != only_alternative))
continue;
 
-  overall = losers = reject = reload_sum = 0;
+  overall = losers = reject = reload_nregs = reload_sum = 0;
   for (nop = 0; nop < n_operands; nop++)
 reject += (curr_static_id
    ->operand_alternative[nalt * n_operands + nop].reject);
@@ -2003,7 +2007,7 @@ process_alt_operands (int only_alternati
  /* Input reloads can be inherited more often than output
 reloads can be removed, so penalize output
 reloads.  */
- if (curr_static_id->operand[nop].type != OP_IN)
+ if (!REG_P (op) || curr_static_id->operand[nop].type != OP_IN)
reject++;
  /* SUBREGS ??? */
  if (this_alternative_matches >= 0)
@@ -2012,6 +2016,9 @@ process_alt_operands (int only_alternati
}
  else if (no_regs_p && ! this_alternative_offmemok && ! constmemok)
goto fail;
+
+ if (! no_regs_p)
+   reload_nregs += ira_reg_class_max_nregs[this_alternative][mode];
}
   
  if (early_clobber_p)
@@ -2128,7 +2135,9 @@ process_alt_operands (int only_alternati
   best_small_class_operands_num
  || (small_class_operands_num
  == best_small_class_operands_num
-  && best_reload_sum < reload_sum))
+  && (reload_nregs < best_reload_nregs
+ || (reload_nregs == best_reload_nregs
+  && best_reload_sum < reload_sum
{
  for (nop = 0; nop < n_operands; nop++)
{
@@ -2145,6 +2154,7 @@ process_alt_operands (int only_alternati
  best_overall = overall;
  best_losers = losers;
  best_small_class_operands_num = small_class_operands_num;
+ best_reload_nregs = reload_nregs;
  best_reload_sum = reload_sum;
  goal_alt_number = nalt;
}
Index: lra-eliminations.c
===
--- lra-eliminations.c  (revision 178120)
+++ lra-eliminations.c  (working copy)
@@ -671,49 +671,46 @@ mark_not_eliminable (rtx x)
 case POST_DEC:
 case POST_MODIFY:
 case PRE_MODIFY:
-  /* If we modify the source of an

[PATCH, i386]: Vectorize round insn

2011-08-26 Thread Uros Bizjak

Hello!

The attached patch enables vectorization of the round function using the
SSE4.1 round insn. AZ stands for Away from Zero.
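The `round<mode>2` expander below computes round(x) as trunc(x + copysign(nextafter(0.5, 0.0), x)). A scalar C sketch of the same arithmetic, with hypothetical function names, assuming IEEE double arithmetic without x87 excess precision (e.g. SSE2 on x86-64). It avoids libm so it links without `-lm`: the hex literal `0x1.fffffffffffffp-2` is the largest double below 0.5, and truncation uses an integer conversion, so unlike the vector code it does not preserve the sign of a zero result. Biasing by exactly 0.5 would be wrong for 0.49999999999999994, because that sum rounds up to 1.0 under round-to-nearest-even.

```c
/* Largest double below 0.5, i.e. nextafter (0.5, 0.0) = 0.5 - 2^-54.  */
static const double pred_half = 0x1.fffffffffffffp-2;

/* Truncate toward zero.  NaN and values with magnitude >= 2^52 (already
   integral, including infinities) are returned unchanged.  */
static double
trunc_toward_zero (double x)
{
  if (x != x || x >= 0x1p52 || x <= -0x1p52)
    return x;
  return (double) (long long) x;
}

/* Round half away from zero, mirroring the expander:
   trunc (x + copysign (pred_half, x)).  */
static double
round_away_from_zero (double x)
{
  double bias = x < 0.0 ? -pred_half : pred_half;  /* copysign for nonzero x */
  return trunc_toward_zero (x + bias);
}
```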

2011-08-26  Uros Bizjak  ubiz...@gmail.com

* config/i386/sse.md (round<mode>2): New expander.
* config/i386/i386.c (enum ix86_builtins): Add
IX86_BUILTIN_ROUND{PS,PD}_AZ{,256}.
(struct builtin_description): Add __builtin_ia32_round{ps,pd}_az{,256}
descriptions.
(ix86_builtin_vectorized_function): Handle BUILT_IN_ROUND{,F} builtins.

testsuite/ChangeLog:

2011-08-26  Uros Bizjak  ubiz...@gmail.com

* gcc.target/i386/sse_4_1-round-vec.c: New test.
* gcc.target/i386/sse_4_1-roundf-vec.c: New test.
* gcc.target/i386/avx-round-vec.c: New test.
* gcc.target/i386/avx-roundf-vec.c: New test.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32},
committed to mainline.

Uros.
Index: config/i386/sse.md
===
--- config/i386/sse.md  (revision 178119)
+++ config/i386/sse.md  (working copy)
@@ -9646,6 +9646,40 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "<MODE>")])
 
+(define_expand "round<mode>2"
+  [(set (match_dup 4)
+   (plus:VF
+ (match_operand:VF 1 "nonimmediate_operand" "")
+ (match_dup 3)))
+   (set (match_operand:VF 0 "register_operand" "")
+   (unspec:VF
+ [(match_dup 4) (match_dup 5)]
+ UNSPEC_ROUND))]
+  "TARGET_ROUND && !flag_trapping_math"
+{
+  enum machine_mode scalar_mode;
+  const struct real_format *fmt;
+  REAL_VALUE_TYPE pred_half, half_minus_pred_half;
+  rtx half, vec_half;
+
+  scalar_mode = GET_MODE_INNER (<MODE>mode);
+
+  /* load nextafter (0.5, 0.0) */
+  fmt = REAL_MODE_FORMAT (scalar_mode);
+  real_2expN (&half_minus_pred_half, -(fmt->p) - 1, scalar_mode);
+  REAL_ARITHMETIC (pred_half, MINUS_EXPR, dconsthalf, half_minus_pred_half);
+  half = const_double_from_real_value (pred_half, scalar_mode);
+
+  vec_half = ix86_build_const_vector (<MODE>mode, true, half);
+  vec_half = force_reg (<MODE>mode, vec_half);
+
+  operands[3] = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_copysign<mode>3 (operands[3], vec_half, operands[1]));
+
+  operands[4] = gen_reg_rtx (<MODE>mode);
+  operands[5] = GEN_INT (ROUND_TRUNC);
+})
+
 ;
 ;;
 ;; Intel SSE4.2 string/text processing instructions
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 178119)
+++ config/i386/i386.c  (working copy)
@@ -23661,10 +23661,12 @@ enum ix86_builtins
   IX86_BUILTIN_CEILPD,
   IX86_BUILTIN_TRUNCPD,
   IX86_BUILTIN_RINTPD,
+  IX86_BUILTIN_ROUNDPD_AZ,
   IX86_BUILTIN_FLOORPS,
   IX86_BUILTIN_CEILPS,
   IX86_BUILTIN_TRUNCPS,
   IX86_BUILTIN_RINTPS,
+  IX86_BUILTIN_ROUNDPS_AZ,
 
   IX86_BUILTIN_PTESTZ,
   IX86_BUILTIN_PTESTC,
@@ -23837,10 +23839,12 @@ enum ix86_builtins
   IX86_BUILTIN_CEILPD256,
   IX86_BUILTIN_TRUNCPD256,
   IX86_BUILTIN_RINTPD256,
+  IX86_BUILTIN_ROUNDPD_AZ256,
   IX86_BUILTIN_FLOORPS256,
   IX86_BUILTIN_CEILPS256,
   IX86_BUILTIN_TRUNCPS256,
   IX86_BUILTIN_RINTPS256,
+  IX86_BUILTIN_ROUNDPS_AZ256,
 
   IX86_BUILTIN_UNPCKHPD256,
   IX86_BUILTIN_UNPCKLPD256,
@@ -25063,11 +25067,15 @@ static const struct builtin_description bdesc_args
  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundpd, "__builtin_ia32_truncpd", IX86_BUILTIN_TRUNCPD, (enum rtx_code) ROUND_TRUNC, (int) V2DF_FTYPE_V2DF_ROUND },
  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundpd, "__builtin_ia32_rintpd", IX86_BUILTIN_RINTPD, (enum rtx_code) ROUND_MXCSR, (int) V2DF_FTYPE_V2DF_ROUND },

+  { OPTION_MASK_ISA_ROUND, CODE_FOR_roundv2df2, "__builtin_ia32_roundpd_az", IX86_BUILTIN_ROUNDPD_AZ, UNKNOWN, (int) V2DF_FTYPE_V2DF },
+
  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundps, "__builtin_ia32_floorps", IX86_BUILTIN_FLOORPS, (enum rtx_code) ROUND_FLOOR, (int) V4SF_FTYPE_V4SF_ROUND },
  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundps, "__builtin_ia32_ceilps", IX86_BUILTIN_CEILPS, (enum rtx_code) ROUND_CEIL, (int) V4SF_FTYPE_V4SF_ROUND },
  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundps, "__builtin_ia32_truncps", IX86_BUILTIN_TRUNCPS, (enum rtx_code) ROUND_TRUNC, (int) V4SF_FTYPE_V4SF_ROUND },
  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_roundps, "__builtin_ia32_rintps", IX86_BUILTIN_RINTPS, (enum rtx_code) ROUND_MXCSR, (int) V4SF_FTYPE_V4SF_ROUND },

+  { OPTION_MASK_ISA_ROUND, CODE_FOR_roundv4sf2, "__builtin_ia32_roundps_az", IX86_BUILTIN_ROUNDPS_AZ, UNKNOWN, (int) V4SF_FTYPE_V4SF },
+
  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_ptest, "__builtin_ia32_ptestz128", IX86_BUILTIN_PTESTZ, EQ, (int) INT_FTYPE_V2DI_V2DI_PTEST },
  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_ptest, "__builtin_ia32_ptestc128", IX86_BUILTIN_PTESTC, LTU, (int) INT_FTYPE_V2DI_V2DI_PTEST },
  { OPTION_MASK_ISA_ROUND, CODE_FOR_sse4_1_ptest, "__builtin_ia32_ptestnzc128", IX86_BUILTIN_PTESTNZC, GTU, (int) INT_FTYPE_V2DI_V2DI_PTEST },
@@ -25185,11 +25193,15 @@

Re: [4.7][google]Support for getting CPU type and feature information at run-time. (issue4893046)

2011-08-26 Thread Xinliang David Li

An IFUNC selector will need to call get_cpu_indicator (as proposed by HJ,
or something similar), while in other contexts the implementation
should find a way to make sure the indicator is already initialized,
so that the builtins accessing the features can be used directly
(see also Michael's and Richard's previous comments).  The runtime
penalty is much smaller.

david
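A minimal sketch of the split David describes, with hypothetical names (`get_cpu_model`, `resolve_bar`, `the_cpu_model`): the accessor used by selectors initializes on demand, since an IFUNC resolver can run before any constructor, while ordinary feature tests read the already-initialized state with a plain load and no check.

```c
/* Hypothetical CPU model; ready == 0 means "not yet initialized".  */
struct cpu_model { int ready; int has_fast_path; };
static struct cpu_model the_cpu_model;

static void init_cpu_model (void)
{
  the_cpu_model.has_fast_path = 1;   /* stand-in for real CPUID probing */
  the_cpu_model.ready = 1;
}

/* For selectors: initialize on demand, then hand out the model.  */
static const struct cpu_model *get_cpu_model (void)
{
  if (!the_cpu_model.ready)
    init_cpu_model ();
  return &the_cpu_model;
}

/* Hypothetical implementations and the selector that chooses between them.  */
static int bar_generic (void) { return 0; }
static int bar_fast (void) { return 1; }

typedef int (*bar_fn) (void);

static bar_fn resolve_bar (void)
{
  return get_cpu_model ()->has_fast_path ? bar_fast : bar_generic;
}

/* Elsewhere, once initialization is guaranteed: a plain load, no check.  */
static int cpu_has_fast_path (void)
{
  return the_cpu_model.has_fast_path;
}
```

The direct read is only correct after something has initialized the model, which is exactly the guarantee the thread is trying to establish for non-selector code.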

On Fri, Aug 26, 2011 at 10:37 AM, Sriraman Tallam tmsri...@google.com wrote:
 On Fri, Aug 26, 2011 at 10:24 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 10:17 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Fri, Aug 26, 2011 at 10:10 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 10:06 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Thu, Aug 25, 2011 at 6:02 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, Aug 25, 2011 at 5:37 PM, Sriraman Tallam tmsri...@google.com 
 wrote:
 Hi,

  Thanks for all the comments. I am attaching a new patch
 incorporating all of the changes mentioned, mainly :

 1) Make __cpu_indicator_init a constructor in libgcc and guard to call
 it only once.

 This is unreliable and you don't need 3 symbols from libgcc. You can use

 Do you mean it is unreliable because of the constructor ordering problem?


 You do not have total control when __cpu_indicator_init is called.

 Like  discussed before, for non-ctor functions, which in my opinion is
 the common use case, it works out great because __cpu_indicator_init
 is guaranteed to be called and I save doing an extra check. It is only
 for other ctors where this is a problem. So other ctors call this
 explicitly.  What did I miss?


 I have

 static void foo ( void ) __attribute__((constructor));

 static void foo ( void )
 {
   ...
   call bar ();
   ...
 }

 in my application. bar () uses those cpu specific functions.
 foo () is called before __cpu_indicator_init.  Since IFUNC
 returns the cpu specific function address only for the
 first call, the proper cpu specific functions will never be used.

 Please correct me if I am wrong since I did not follow the IFUNC part
 you mentioned.  However, it looks like this could be solved with
 adding an explicit call to __cpu_indicator_init from within the ctor
 foo. To me, it seems like the pain of adding this call explicitly in
 other ctors is worth it because it works cleanly for non-ctors.

 static void foo ( void ) __attribute__((constructor));

 static void foo ( void )
 {
  ...
  __cpu_indicator_init ();
  call bar ();
  ...
 }

 Will this work?

 Thanks,
 -Sri.



 --
 H.J.

Re: [PATCH] Handle MEM_REF in decode_addr_const

2011-08-26 Thread Richard Guenther

On Fri, Aug 26, 2011 at 9:02 PM, Andrew Pinski pins...@gmail.com wrote:
 On Fri, Aug 26, 2011 at 5:53 AM, Richard Guenther rguent...@suse.de wrote:

 Another missed piece, exposed by less MEM_REF -> ARRAY_REF folding.
 Interestingly only for Ada testcases.

 I think this also fixed
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50116  but I don't know
 for sure.

Yes, that's exactly the ICEs I got.  I'll backport the fix.

Richard.

 Thanks,
 Andrew Pinski

[PATCH, i386]: Rewrite ix86_build_const_vector

2011-08-26 Thread Uros Bizjak

Hello!

No functional change.

2011-08-26  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.c (ix86_build_const_vector): Rewrite using loop
with RTVEC_ELT accessor.

Tested on x86_64-pc-linux-gnu, committed to mainline.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 178123)
+++ config/i386/i386.c  (working copy)
@@ -16512,53 +16512,30 @@ ix86_expand_convert_uns_sisf_sse (rtx target, rtx
 rtx
 ix86_build_const_vector (enum machine_mode mode, bool vect, rtx value)
 {
+  int i, n_elt;
   rtvec v;
+  enum machine_mode scalar_mode;
+
   switch (mode)
 {
 case V4SImode:
-  gcc_assert (vect);
-  v = gen_rtvec (4, value, value, value, value);
-  return gen_rtx_CONST_VECTOR (V4SImode, v);
-
 case V2DImode:
   gcc_assert (vect);
-  v = gen_rtvec (2, value, value);
-  return gen_rtx_CONST_VECTOR (V2DImode, v);
-
 case V8SFmode:
-  if (vect)
-   v = gen_rtvec (8, value, value, value, value,
-  value, value, value, value);
-  else
-   v = gen_rtvec (8, value, CONST0_RTX (SFmode),
-  CONST0_RTX (SFmode), CONST0_RTX (SFmode),
-  CONST0_RTX (SFmode), CONST0_RTX (SFmode),
-  CONST0_RTX (SFmode), CONST0_RTX (SFmode));
-  return gen_rtx_CONST_VECTOR (V8SFmode, v);
-
 case V4SFmode:
-  if (vect)
-   v = gen_rtvec (4, value, value, value, value);
-  else
-   v = gen_rtvec (4, value, CONST0_RTX (SFmode),
-  CONST0_RTX (SFmode), CONST0_RTX (SFmode));
-  return gen_rtx_CONST_VECTOR (V4SFmode, v);
-
 case V4DFmode:
-  if (vect)
-   v = gen_rtvec (4, value, value, value, value);
-  else
-   v = gen_rtvec (4, value, CONST0_RTX (DFmode),
-  CONST0_RTX (DFmode), CONST0_RTX (DFmode));
-  return gen_rtx_CONST_VECTOR (V4DFmode, v);
-
 case V2DFmode:
-  if (vect)
-   v = gen_rtvec (2, value, value);
-  else
-   v = gen_rtvec (2, value, CONST0_RTX (DFmode));
-  return gen_rtx_CONST_VECTOR (V2DFmode, v);
+  n_elt = GET_MODE_NUNITS (mode);
+  v = rtvec_alloc (n_elt);
+  scalar_mode = GET_MODE_INNER (mode);
 
+  RTVEC_ELT (v, 0) = value;
+
+  for (i = 1; i < n_elt; ++i)
+   RTVEC_ELT (v, i) = vect ? value : CONST0_RTX (scalar_mode);
+
+  return gen_rtx_CONST_VECTOR (mode, v);
+
 default:
   gcc_unreachable ();
 }

Re: [PATCH] Add infrastructure to merge standard builtin enums with backend builtins

2011-08-26 Thread Mike Stump

On Aug 26, 2011, at 7:19 AM, Michael Meissner wrote:
 The alternative is something like what Kenney and Mike are doing in their
 private port, where they have new syntax in the MD file for builtins.
 
 But are those user-exposed builtins?  Certainly interesting to combine
 builtin definition and the instruction it expands to.
 
 Yes, these are user exposed builtins.  Massive amounts of user exposed builtins
 (Mike said he needs 13 bits for the builtin index).  I think it would be better
 if Mike comments on this.

I gave the quick intro yesterday.  You wind up specifying the built-ins that
you have, and the generator does things like assign enum values, create a file
that maps the builtins into the user namespace from the __builtin_
namespace, generate compilation test cases for all the built-ins with all the
different types they support, and generate executable testcases to ensure
everything works flawlessly.  We have mods to the overload builtin mechanism so
that one can do things like:

template <class T>
T foo(T x, T y) {
  x = add(x, y);
  return x;
}

Or, if you prefer the C version:

int fooi(int x, int y) {
  return add(x, y);
}

short foos(short x, short y) {
  return add(x, y);
}

and have it work out just fine when T is instantiated with all the various 
types that are supported by the hardware, and it works in C.  This permits a 
nice API for the machine builtins, as you don't have to mangle types into
the builtin name.  The system is complete enough to handle the needs of
anything coming down the pike in the next decade.  It can handle input/output
parameters that have register assignments.  It can handle reference parameters
(like the input/output parameters, but these are done as values in memory).  The
generator builds up _all_ the types one needs, handles all the registration and 
all the wiring up for codegen.  There is a mechanism to remap arguments going 
to the rtl generators, so the operand ordering of the builtin doesn't have to 
match the operand ordering of the md pattern for the semantics that back the 
builtin.  There is a beefy macro system built into the generator so that you 
can have nice simple patterns and it is beefier than the iterators one can use 
today.  So, for example, we have:

(define_special_iterator imath3 [add sub mul])

to define some built-ins that are regular with respect to the operation, but, 
this isn't a code or mode iterator; it just iterates the pattern with the
string substituted.  For machines with any regularity, the patterns wind up 
being smaller and easier to maintain.  I'd be happy to answer questions about 
it.

Re: PING: [PATCH]: Fix -fbranch-probabilities

2011-08-26 Thread Jan Hubicka

 Hello,

 Could I have a review for the trivial patch posted in
 http://gcc.gnu.org/ml/gcc-patches/2011-08/msg01123.html

 -fprofile-use sets flag_branch_probabilities.

 But we should also be able to use -fbranch-probabilities on its own  
 using the information generated by -fprofile-arcs, as documented.

OK, thanks!  I was under the impression that some of the gcov tests still use
the -fprofile-arcs -fbranch-probabilities pair.  That doesn't seem to be the
case, so if you add a testcase, you get extra score ;)

Honza

 Many thanks

 Christian

Re: Vector Comparison patch

2011-08-26 Thread Artem Shinkarov

Hi

Here is a patch with vector comparison only.
Comparison is expanded using VEC_COND_EXPR, conversions between the
different types inside the VEC_COND_EXPR are happening in optabs.c.

The comparison generally works; however, the x86 backend does not
recognize vectors of all 1s of type float and double.  That is a real
problem, but I hope it can be fixed easily.  Here is my humble attempt:

Index: gcc/config/i386/predicates.md
===
--- gcc/config/i386/predicates.md   (revision 177665)
+++ gcc/config/i386/predicates.md   (working copy)
@@ -763,7 +763,19 @@ (define_predicate vector_all_ones_opera
   for (i = 0; i < nunits; ++i)
 {
   rtx x = CONST_VECTOR_ELT (op, i);
-  if (x != constm1_rtx)
+ rtx y;
+
+ if (GET_MODE_CLASS (GET_MODE (x)) == MODE_FLOAT)
+   {
+ REAL_VALUE_TYPE r;
+ REAL_VALUE_FROM_INT (r, -1, -1, GET_MODE (x));
+ y = CONST_DOUBLE_FROM_REAL_VALUE (r, GET_MODE (x));
+   }
+ else
+   y = constm1_rtx;
+
+ /* if (x != constm1_rtx) */
+ if (!rtx_equal_p (x, y))
 return false;
 }
   return true;

But the problem I have here is that -1 actually converts to the value -1.0,
whereas I need the bit pattern of -1 (all bits set) treated as a float.
Something like:

int p = -1;
void *x = &p;
float r = *((float *)x);

Is there any way to do that in this context? Or may be there is
another way to support real-typed vectors of -1 as constants?


ChangeLog

2011-08-27 Artjoms Sinkarovs artyom.shinkar...@gmail.com

gcc/
* optabs.c (vector_compare_rtx): Allow comparison operands
and vcond operands to have different types.
(expand_vec_cond_expr): Convert operands in case they do
not match.
* fold-const.c (constant_boolean_node): Adjust the meaning
of boolean for vector types: true = {-1,..}, false = {0,..}.
(fold_unary_loc): Avoid conversion of vector comparison to
boolean type.
* expr.c (expand_expr_real_2): Expand vector comparison by
building an appropriate VEC_COND_EXPR.
* c-typeck.c (build_binary_op): Typecheck vector comparisons.
(c_objc_common_truthvalue_conversion): Adjust.
* gimplify.c (gimplify_expr): Support vector comparison
in gimple.
* tree.def: Adjust comment.
* tree-vect-generic.c (do_compare): Helper function.
(expand_vector_comparison): Check if hardware supports
vector comparison of the given type or expand vector
piecewise.
(expand_vector_operation): Treat comparison as binary
operation of vector type.
(expand_vector_operations_1): Adjust.
* tree-cfg.c (verify_gimple_comparison): Adjust.

gcc/config/i386
* i386.c (ix86_expand_sse_movcc): Consider a case when
vcond operands are {-1,..} and {0,..}.

gcc/doc
* extend.texi: Adjust.

gcc/testsuite
* gcc.c-torture/execute/vector-compare-1.c: New test.
* gcc.c-torture/execute/vector-compare-2.c: New test.
* gcc.dg/vector-compare-1.c: New test.
* gcc.dg/vector-compare-2.c: New test.

Bootstrapped and tested on x86_64-unknown-linux-gnu.


Artem.
Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 177665)
+++ gcc/doc/extend.texi (working copy)
@@ -6553,6 +6553,29 @@ invoke undefined behavior at runtime.  W
 accesses for vector subscription can be enabled with
 @option{-Warray-bounds}.
 
+In GNU C, vector comparison is supported with the standard comparison
+operators: @code{==, !=, <, <=, >, >=}. Comparison operands can be
+vector expressions of integer-type or real-type. Comparison between
+integer-type vectors and real-type vectors is not supported.  The
+result of the comparison is a vector of the same width and number of
+elements as the comparison operands with a signed integral element
+type.
+
+Vectors are compared element-wise, producing 0 when the comparison is false
+and -1 (constant of the appropriate type where all bits are set)
+otherwise. Consider the following example.
+
+@smallexample
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si a = @{1,2,3,4@};
+v4si b = @{3,2,1,4@};
+v4si c;
+
+c = a >  b; /* The result would be @{0, 0,-1, 0@}  */
+c = a == b; /* The result would be @{0,-1, 0,-1@}  */
+@end smallexample
+
 You can declare variables and use them in function calls and returns, as
 well as in assignments and some casts.  You can specify a vector type as
 a return type for a function.  Vector types can also be used as function
Index: gcc/optabs.c
===
--- gcc/optabs.c(revision 177665)
+++ gcc/optabs.c(working copy)
@@ -6502,7 +6502,8 @@ get_rtx_code (enum tree_code tcode, bool
unsigned operators. Do not generate compare instruction.  */
