Re: [patch, mips] Do not make -Os the default with mips-mti-elf target.

2012-11-07 Thread Richard Sandiford
"Steve Ellcey "  writes:
> I noticed that because my new mips-mti-elf target includes config/mt-sde
> it uses the -Os option by default when building runtime libraries.  I would
> like to remove the use of -Os so that the runtime performance for the
> mips-mti-elf target is improved.  If users want the -Os flag they can use
> the existing --enable-target-optspace configure option to get it.
>
> This patch creates config/mt-mti that is like mt-sde but without -Os and
> changes the mips*-mti-elf* target to use that include file instead of mt-sde.
>
> Tested with the mips-mti-elf target.
>
> OK to checkin?

OK with me, thanks.

Just for the record, the SDE configuration is also effectively owned by MTI,
so if you think the same reasoning applies there, we could change mt-sde
instead.  But if you want to keep SDE the way it is now then the patch is OK.

Richard


RE: [PATCH,RX] Support Bit Manipulation on Memory Operands

2012-11-07 Thread Naveen H. S
Hi,

Thank you for reviewing the patch and for your valuable comments.

>> The output constraint is now an in-out:  s/=Q/+Q/.
Done.

Please find the modified patch attached and let me know if it is okay.

Thanks & Regards,
Naveen



rx_bit_insn.patch
Description: rx_bit_insn.patch


Re: [PATCH] Vzeroupper placement/47440

2012-11-07 Thread Vladimir Yakovlev
I tested the changes with the following configure line:

../gcc/configure --enable-clocale=gnu --with-system-zlib
--enable-shared --with-demangler-in-ld --with-fpmath=sse
--enable-languages=c,c++,fortran,java,lto,objc --with-arch=corei7-avx
--with-cpu=corei7-avx

Bootstrap passes and there are no new failures in make check.

Thank you,
Vladimir


2012/11/7 Vladimir Yakovlev:
> Hello,
>
> Thank you for investigating and fixing the problem.  I'll respond to the
> remarks later.
>
> Regards,
> Vladimir
>
> 2012/11/7 Jakub Jelinek :
>> On Tue, Nov 06, 2012 at 02:11:50PM -0800, H.J. Lu wrote:
>>> On Tue, Nov 6, 2012 at 2:30 AM, Kirill Yukhin  
>>> wrote:
>>> > Hello,
>>> >> OK for mainline SVN, please commit.
>>> > Checked into GCC trunk: 
>>> > http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00176.html
>>> >
>>> > Thanks, K
>>>
>>> This caused:
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55224
>>
>> Not only that, it also broke --enable-checking=yes,rtl bootstrap.
>> SET_DEST isn't valid on CALL, but XEXP (call, 0) is a MEM anyway and
>> the code looks for reg, so I think looking for CALL was just a mistake.
>>
>> This fixes the bootstrap, ok for trunk?
>>
>> 2012-11-06  Jakub Jelinek  
>>
>> * config/i386/i386.c (ix86_avx_u128_mode_after): Don't
>> look for reg in CALL operand.
>>
>> --- gcc/config/i386/i386.c.jj   2012-11-06 18:10:22.0 +0100
>> +++ gcc/config/i386/i386.c  2012-11-06 20:15:09.068912242 +0100
>> @@ -15084,9 +15084,9 @@ ix86_avx_u128_mode_after (int mode, rtx
>>/* Check for CALL instruction.  */
>>if (CALL_P (insn))
>>  {
>> -  if (GET_CODE (pat) == SET || GET_CODE (pat) == CALL)
>> +  if (GET_CODE (pat) == SET)
>> reg = SET_DEST (pat);
>> -  else if (GET_CODE (pat) ==  PARALLEL)
>> +  else if (GET_CODE (pat) == PARALLEL)
>> for (i = XVECLEN (pat, 0) - 1; i >= 0; i--)
>>   {
>> rtx x = XVECEXP (pat, 0, i);
>>
>>
>> Jakub


Re: [PATCH] Add extensive commentary to sparc's "U" constraint.

2012-11-07 Thread David Miller
From: Steven Bosscher 
Date: Thu, 8 Nov 2012 01:19:11 +0100

> On Wed, Nov 7, 2012 at 11:39 PM, David Miller wrote:
>> One idea that occurred to me was perhaps to extend
>> define_register_constraint such that an extra condition can be
>> specified.  So for sparc's constraint "U" it would evaluate to
>> GENERAL_REGS but also express the condition that the hard register
>> must be even.
> 
> I haven't looked at the details of this all, but there are ports that
> use define_predicate to request an even-numbered register. See e.g.
> frv and v850. I'm not sure if/how the RA takes predicates into account
> when selecting a register.

That would only influence instruction recognition.

> (bfin uses define_register_constraints, but it has separate register
> classes for the even-numbered registers, so apparently that's not for
> multi-word hardregs like your case.)

Right.

>> diff --git a/gcc/config/sparc/constraints.md 
>> b/gcc/config/sparc/constraints.md
>> index 2f8c6ad..440dc57 100644
>> --- a/gcc/config/sparc/constraints.md
>> +++ b/gcc/config/sparc/constraints.md
>> @@ -130,7 +130,43 @@
>>(match_code "mem")
>>(match_test "memory_ok_for_ldd (op)")))
>>
>> -;; Not needed in 64-bit mode
>> +;; This awkward register constraint is necessary because it is not
>> +;; possible to express the "must be even numbered regsiter" condition
> 
> s/regsiter/register/

Thanks, I'll fix that.


Re: [v3] Fix profile mode failures

2012-11-07 Thread Paolo Carlini

On 11/08/2012 02:37 AM, Jonathan Wakely wrote:
> Bah, it's nothing to do with me, the profile-mode list should never
> have worked! I'm testing this overnight.

Ah! Thanks!

Paolo.



RE: [RFC] New feature to reuse one multilib among different targets

2012-11-07 Thread Terry Guo

[...]
> > Please help to review this new Multilib feature. It intends to provide
> > user
> 
> Your patch doesn't include documentation for fragments.texi (which
> needs to define the semantics without reference to the details of what
> gcc.c's internal datastructures for multilibs, as output by genmultilib,
> might look like).
> 
> I am unconvinced that directly adding to the drivers' internal
> datastructures like this is a sensible interface for specifying
> multilib choice in target makefile fragments.
> 

Thank you very much for your review and comments. Here is an updated patch that
follows the approach used in the current multilib implementation. With this
update, the following statement means that the target represented by "optC optD"
can reuse the existing multilib built with options "optA optB":

MULTILIB_REUSE = optA/optB=optC/optD

To convert such statements to the data structure used by multilib_raw, I
refactored the code in genmultilib into two functions, combo_to_dir and
options_output. combo_to_dir converts the left part into the multilib folder
name, and options_output converts the right part into the option list.

Inside gcc.c, these reuse rules are used when gcc cannot find a multilib
that exactly matches the current command-line options.
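
To make the rule syntax concrete, here is a minimal C++ sketch of how one such statement could be split into its two sides. It is not the genmultilib or gcc.c implementation: `ReuseRule`, `split` and `parse_reuse_rule` are hypothetical names, and the sketch assumes that a single '=' separates the sides and that the options themselves contain neither '=' nor '/'.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory form of one MULTILIB_REUSE rule.
struct ReuseRule {
  std::vector<std::string> built_with;  // left side: options of the existing multilib
  std::vector<std::string> reused_for;  // right side: options of the target reusing it
};

// Split TEXT at every occurrence of SEP; always returns at least one element.
static std::vector<std::string> split(const std::string &text, char sep) {
  std::vector<std::string> parts;
  std::string::size_type start = 0;
  for (;;) {
    std::string::size_type pos = text.find(sep, start);
    if (pos == std::string::npos) {
      parts.push_back(text.substr(start));
      return parts;
    }
    parts.push_back(text.substr(start, pos - start));
    start = pos + 1;
  }
}

// Parse a rule of the form "optA/optB=optC/optD".
// Assumption: the first '=' is the separator between the two sides.
ReuseRule parse_reuse_rule(const std::string &rule) {
  std::string::size_type eq = rule.find('=');
  assert(eq != std::string::npos);
  ReuseRule r;
  r.built_with = split(rule.substr(0, eq), '/');
  r.reused_for = split(rule.substr(eq + 1), '/');
  return r;
}
```

Calling `parse_reuse_rule("optA/optB=optC/optD")` yields `built_with` holding optA and optB and `reused_for` holding optC and optD, which mirrors the "left part" and "right part" roles described above.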

I built trunk with this patch and --enable-multilib for the targets
arm-none-eabi/x86/m6800/mips/powerpc. No problems were found.

Is this patch OK? Please comment.

BR,
Terry

2012-11-08  Terry Guo  

* genmultilib (combo_to_dir): New function.
(options_output): New function.
(MULTILIB_REUSE): New argument.
* Makefile.in (s-mlib): Add a new argument MULTILIB_REUSE.
* gcc.c (multilib_reuse): New spec.
(set_multilib_dir): Use multilib_reuse.

multilib-reuse-v2.patch
Description: Binary data


Re: [v3] Fix profile mode failures

2012-11-07 Thread Jonathan Wakely
On 7 November 2012 10:55, Jonathan Wakely wrote:
> On 7 November 2012 10:25, Paolo Carlini wrote:
>>
>> I'm for example seeing in the log:
>>
>> 23_containers/list/init-list.cc execution test
>>
>> pretty mysterious,
>
> Yes, I had a quick look at it but couldn't see the problem, so wanted
> to fix the trivial vector problem first.
>
>> I think it's the first time I ever see it.
>
> Huh, then I guess I broke that one too.  I won't rest until it's fixed ;-)

Bah, it's nothing to do with me, the profile-mode list should never
have worked!  I'm testing this overnight.

commit 756c968f9d35778e0b1c068c76833cbe8358a9d4
Author: Jonathan Wakely 
Date:   Thu Nov 8 01:27:24 2012 +

* include/profile/iterator_tracker.h (operator++): Fix returning
dangling reference.
(operator--): Likewise.

diff --git a/libstdc++-v3/include/profile/iterator_tracker.h 
b/libstdc++-v3/include/profile/iterator_tracker.h
index 733429d..91f733c 100644
--- a/libstdc++-v3/include/profile/iterator_tracker.h
+++ b/libstdc++-v3/include/profile/iterator_tracker.h
@@ -93,7 +93,7 @@ namespace __profile
return *this;
   }
 
-  __iterator_tracker&
+  __iterator_tracker
   operator++(int)
   {
_M_ds->_M_profile_iterate();
@@ -110,7 +110,7 @@ namespace __profile
return *this;
   }
 
-  __iterator_tracker&
+  __iterator_tracker
   operator--(int)
   {
_M_ds->_M_profile_iterate(1);
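
For readers unfamiliar with this bug class, the following minimal sketch shows why post-increment and post-decrement need to return by value. `Tracker` is a hypothetical stand-in, not the libstdc++ `__iterator_tracker`; the sketch assumes the usual shape of such operators, where the old state is copied into a local, the object is advanced, and the copy is returned. Returning that local by reference would leave the caller with a reference to a destroyed object.

```cpp
#include <cassert>

// Toy stand-in for an iterator wrapper (hypothetical, for illustration only).
struct Tracker {
  int pos;

  // Pre-increment modifies *this, so returning a reference to it is safe.
  Tracker &operator++() {
    ++pos;
    return *this;
  }

  // Post-increment must hand back the OLD state.  That state lives in a
  // local variable, so it has to be returned by value, not by reference.
  Tracker operator++(int) {
    Tracker old(*this);
    ++pos;
    return old;
  }

  // Post-decrement follows the same rule.
  Tracker operator--(int) {
    Tracker old(*this);
    --pos;
    return old;
  }
};
```

With `Tracker t = {5};`, the expression `t++` returns a copy whose `pos` is 5 while `t.pos` becomes 6; with a reference return type the same expression would have undefined behaviour.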


Re: [Bug libstdc++/54075] [4.7.1] unordered_map insert still slower than 4.6.2

2012-11-07 Thread Jonathan Wakely
On 7 November 2012 22:02, François Dumont wrote:
>
> Ok to commit ? If so, where ?

That patch is OK for trunk and 4.7, thanks.


Re: [PATCH] Add extensive commentary to sparc's "U" constraint.

2012-11-07 Thread Steven Bosscher
On Wed, Nov 7, 2012 at 11:39 PM, David Miller wrote:
> One idea that occurred to me was perhaps to extend
> define_register_constraint such that an extra condition can be
> specified.  So for sparc's constraint "U" it would evaluate to
> GENERAL_REGS but also express the condition that the hard register
> must be even.

I haven't looked at the details of this all, but there are ports that
use define_predicate to request an even-numbered register. See e.g.
frv and v850. I'm not sure if/how the RA takes predicates into account
when selecting a register.

(bfin uses define_register_constraints, but it has separate register
classes for the even-numbered registers, so apparently that's not for
multi-word hardregs like your case.)


> diff --git a/gcc/config/sparc/constraints.md b/gcc/config/sparc/constraints.md
> index 2f8c6ad..440dc57 100644
> --- a/gcc/config/sparc/constraints.md
> +++ b/gcc/config/sparc/constraints.md
> @@ -130,7 +130,43 @@
>(match_code "mem")
>(match_test "memory_ok_for_ldd (op)")))
>
> -;; Not needed in 64-bit mode
> +;; This awkward register constraint is necessary because it is not
> +;; possible to express the "must be even numbered regsiter" condition

s/regsiter/register/

Ciao!
Steven


Re: RFA: hookize ADJUST_INSN_LENGTH

2012-11-07 Thread Joern Rennecke

Quoting Joern Rennecke :


+  varying_length[uid] = (varying_length[inner_uid] & 1);

Typo; I meant:
+  varying_length[uid] |= (varying_length[inner_uid] & 1);
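
The difference between the two lines is plain C semantics and can be sketched in isolation. The helper names below are hypothetical, and what the individual bits of `varying_length` mean is not stated in this mail; the sketch only shows that `=` discards any bits the entry already held, while `|=` merges in just the low bit of the inner entry.

```cpp
#include <cassert>

// Behaviour of the mistyped line: OUTER is overwritten with the low bit of INNER.
unsigned assign_low_bit(unsigned outer, unsigned inner) {
  outer = (inner & 1);
  return outer;
}

// Behaviour of the corrected line: the low bit of INNER is OR-ed into OUTER,
// and every bit already set in OUTER is preserved.
unsigned merge_low_bit(unsigned outer, unsigned inner) {
  outer |= (inner & 1);
  return outer;
}
```

For example, with `outer == 2` and `inner == 1` the first form yields 1 and the second yields 3.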


[patch, mips] Do not make -Os the default with mips-mti-elf target.

2012-11-07 Thread Steve Ellcey
I noticed that because my new mips-mti-elf target includes config/mt-sde
it uses the -Os option by default when building runtime libraries.  I would
like to remove the use of -Os so that the runtime performance for the
mips-mti-elf target is improved.  If users want the -Os flag they can use
the existing --enable-target-optspace configure option to get it.

This patch creates config/mt-mti that is like mt-sde but without -Os and
changes the mips*-mti-elf* target to use that include file instead of mt-sde.

Tested with the mips-mti-elf target.

OK to checkin?

Steve Ellcey
sell...@mips.com


2012-11-07  Steve Ellcey  

* config/mt-mti: New file.
* configure.ac (mips*-mti-elf*): Use config/mt-mti.
* configure: Regenerate.


diff --git a/config/mt-mti b/config/mt-mti
new file mode 100644
index 000..85ad9e7
--- /dev/null
+++ b/config/mt-mti
@@ -0,0 +1,13 @@
+# We use -minterlink-mips16 so that the non-MIPS16 libraries can still be
+# linked against partly-MIPS16 code.  The -mcode-readable=pcrel option allows
+# MIPS16 libraries to run on Harvard-style split I/D memories, so long
+# as they have the D-to-I redirect for PC-relative loads.  -mno-gpopt
+# has two purposes: it allows libraries to be used in situations where
+# $gp != our _gp, and it allows them to be built with -G8 while
+# retaining link compatibility with -G0 and -G4.
+#
+# We do not default to -Os like mt-sde does, users who want that can configure
+# with --enable-target-optspace.
+
+CFLAGS_FOR_TARGET += -minterlink-mips16 -mcode-readable=pcrel -mno-gpopt
+CXXFLAGS_FOR_TARGET += -minterlink-mips16 -mcode-readable=pcrel -mno-gpopt
diff --git a/configure.ac b/configure.ac
index c346c2c..a87185a 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2299,9 +2299,12 @@ case "${target}" in
   spu-*-*)
 target_makefile_frag="config/mt-spu"
 ;;
-  mips*-sde-elf* | mips*-mti-elf*)
+  mips*-sde-elf*)
 target_makefile_frag="config/mt-sde"
 ;;
+  mips*-mti-elf*)
+target_makefile_frag="config/mt-mti"
+;;
   mipsisa*-*-elfoabi*)
 target_makefile_frag="config/mt-mips-elfoabi"
 ;;


Re: [trans-mem][rfc] Improvements to uninstrumented code paths

2012-11-07 Thread Andi Kleen
Richard Henderson  writes:
>
> Is it ever likely that we'd choose an uninstrumented path for a
> nested transaction, given that we're already executing the instrumented
> path for an outer transaction?

I don't see why not. A small inner transaction may well succeed 
in HTM, even if the big outer one does not.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [PATCH] Add extensive commentary to sparc's "U" constraint.

2012-11-07 Thread Vladimir Makarov

On 12-11-07 5:39 PM, David Miller wrote:
> Vlad, I wanted to make you aware of the following because it's
> a major barrier for using LRA on sparc at this time.  I therefore
> do not think moving to LRA on this target is possible in the 4.8
> timeframe, which is fine.  The situation is described completely
> in the comment I am adding in the patch below.

David, thanks very much for reporting this.  I was quite surprised when
you decided to switch SPARC to LRA for gcc4.8.  I was a bit scared too,
because I never paid much attention to SPARC.  Even LRA for x86/x86-64,
on which I worked hard, is not easy to port, and I have a lot of PRs now.
I did eight LRA ports on the branch mostly to prove that replacing reload
with the LRA approach is doable.


So your decision is a big relief for me.  Thanks very much for working
on the SPARC LRA port.  I am sure we will solve all the problems and switch
on LRA for SPARC for gcc4.9.  I'll return to my work on other targets and
SPARC on the branch when I am done with the current LRA PRs for x86/x86-64.




> The most alarming aspect of this to me was discovering that IRA could
> allocate registers to a pseudo that did not pass HARD_REGNO_MODE_OK,
> and this anomaly is completely masked because reload and our splitters
> end up fixing things up.

I think the SPARC port is far from the worst.  I also recently found some
inconsistency between LRA and IRA (LRA recognizes conflicts for multi-word
pseudos when IRA does not), which I'd like to address too.  I think
Richard Sandiford is probably right in saying that we should use LRA as
an opportunity to clean up the ports.

> I wanted to explicitly thank you for your work on LRA because without
> it we would never have discovered these inconsistencies in the sparc
> backend.




Re: [trans-mem][rfc] Improvements to uninstrumented code paths

2012-11-07 Thread Richard Henderson
On 11/07/2012 03:01 PM, Richard Henderson wrote:
> Thoughts?

Now with 100% more patches per mail!


r~

From 6e97eb1f7086b4392545cc73254037cd3ff09fe6 Mon Sep 17 00:00:00 2001
From: Richard Henderson 
Date: Wed, 7 Nov 2012 14:32:21 -0800
Subject: [PATCH 1/2] tm: Handle nested transactions in
 ipa_uninstrument_transaction

* trans-mem.c (ipa_uninstrument_transaction0): Rename from
ipa_uninstrument_transaction.
(ipa_uninstrument_transaction): New function.
---
 gcc/ChangeLog   |  4 
 gcc/trans-mem.c | 23 ---
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 24d9845..dc1909c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,9 @@
 2012-11-07  Richard Henderson  
 
+	* trans-mem.c (ipa_uninstrument_transaction0): Rename from
+	ipa_uninstrument_transaction.
+	(ipa_uninstrument_transaction): New function.
+
 	* trans-mem.c (pass_ipa_tm): Don't use TODO_update_ssa.
 
 2012-11-07  Peter Bergner  
diff --git a/gcc/trans-mem.c b/gcc/trans-mem.c
index a7b4a9c..478ce71 100644
--- a/gcc/trans-mem.c
+++ b/gcc/trans-mem.c
@@ -3882,12 +3882,12 @@ maybe_push_queue (struct cgraph_node *node,
code path.  QUEUE are the basic blocks inside the transaction
represented in REGION.
 
-   Later in split_code_paths() we will add the conditional to choose
+   Later in expand_transaction we will add the conditional to choose
between the two alternatives.  */
 
 static void
-ipa_uninstrument_transaction (struct tm_region *region,
-			  VEC (basic_block, heap) *queue)
+ipa_uninstrument_transaction0 (struct tm_region *region,
+			   VEC (basic_block, heap) *queue)
 {
   gimple transaction = region->transaction_stmt;
   basic_block transaction_bb = gimple_bb (transaction);
@@ -3907,6 +3907,23 @@ ipa_uninstrument_transaction (struct tm_region *region,
   free (new_bbs);
 }
 
+static void
+ipa_uninstrument_transaction (struct tm_region *region,
+			  VEC (basic_block, heap) *bbs)
+{
+  ipa_uninstrument_transaction0 (region, bbs);
+
+  // Recurse for the inner transactions to make sure they all have
+  // uninstrumented code paths.
+  for (region = region->inner; region; region = region->next)
+{
+  bbs = get_tm_region_blocks (region->entry_block, region->exit_blocks,
+  NULL, NULL, false);
+  ipa_uninstrument_transaction (region, bbs);
+  VEC_free (basic_block, heap, bbs);
+}
+}
+
 /* A subroutine of ipa_tm_scan_calls_transaction and ipa_tm_scan_calls_clone.
Queue all callees within block BB.  */
 
-- 
1.7.11.7

From 9253d4138a0cdb76c40345a1e32694968f375a86 Mon Sep 17 00:00:00 2001
From: Richard Henderson 
Date: Wed, 7 Nov 2012 14:36:01 -0800
Subject: [PATCH 2/2] tm: Optimize nested transactions in an uninstrumented
 code path

* trans-mem.c (tm_region_init_0): Consider all edges when looking
for the entry_block.
(collect_bb2reg): Handle uninstrumented only transactions.
(generate_tm_state): Likewise.
(expand_transaction): Likewise.
(execute_tm_memopt): Likewise.
(ipa_uninstrument_transaction0): Convert nested transactions in
the uninstrumented code path to uninstrumented only.
---
 gcc/ChangeLog   |  9 
 gcc/trans-mem.c | 66 -
 2 files changed, 60 insertions(+), 15 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index dc1909c..8555e8c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,14 @@
 2012-11-07  Richard Henderson  
 
+	* trans-mem.c (tm_region_init_0): Consider all edges when looking
+	for the entry_block.
+	(collect_bb2reg): Handle uninstrumented only transactions.
+	(generate_tm_state): Likewise.
+	(expand_transaction): Likewise.
+	(execute_tm_memopt): Likewise.
+	(ipa_uninstrument_transaction0): Convert nested transactions in
+	the uninstrumented code path to uninstrumented only.
+
 	* trans-mem.c (ipa_uninstrument_transaction0): Rename from
 	ipa_uninstrument_transaction.
 	(ipa_uninstrument_transaction): New function.
diff --git a/gcc/trans-mem.c b/gcc/trans-mem.c
index 478ce71..c0987dc 100644
--- a/gcc/trans-mem.c
+++ b/gcc/trans-mem.c
@@ -868,10 +868,13 @@ typedef struct tm_log_entry
 {
   /* Address to save.  */
   tree addr;
+
   /* Entry block for the transaction this address occurs in.  */
   basic_block entry_block;
+
   /* Dominating statements the store occurs in.  */
   gimple_vec stmts;
+
   /* Initially, while we are building the log, we place a nonzero
  value here to mean that this address *will* be saved with a
  save/restore sequence.  Later, when generating the save sequence
@@ -1721,8 +1724,8 @@ struct tm_region
   /* Return value from BUILT_IN_TM_START.  */
   tree tm_state;
 
-  /* The entry block to this region.  This will always be the first
- block of the body of the transaction.  */
+  /* The entry block to the instrumented code path for this region.
+ This will always be the first block o

[trans-mem][rfc] Improvements to uninstrumented code paths

2012-11-07 Thread Richard Henderson
I wrote the second of these patches first, and I'm uncertain about the
desirability of the first of the patches.

While working on the uninstrumented code path bulk patch, I noticed that
nested transactions within the copy of the outermost transaction were
not being processed for an uninstrumented code path, and so were only
receiving an instrumented path.  This is clearly less than ideal when
considering HTM.

Now, it seemed to me that if we're already in an uninstrumented code
path, we're extremely likely to want to stay in one.  This is certainly
true for HTM, as well as when we've selected the serialirr method.  I
can't think off hand of any other reason we'd be on the uninstrumented
code path.

Therefore the second patch arranges for all nested transactions in the
uninstrumented path to _only_ have uninstrumented paths themselves.

While reviewing the results of this second patch in detail, I noticed
that nested transactions on the instrumented code paths were not 
generating both instrumented and uninstrumented code paths.  My first
reaction was that this was a bug, and so I wrote the first patch to
fix it.
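
The traversal the first patch performs can be sketched on a toy region tree. This is an illustration under stated assumptions, not the trans-mem.c code: `Region`, `uninstrument_one` and `uninstrument_all` are hypothetical names, and the boolean flag merely stands in for "this region has been given an uninstrumented code path". Only the linkage (a pointer to the first inner region plus a pointer to the next sibling) is taken from the patch.

```cpp
#include <cassert>
#include <cstddef>

// Toy region tree with the inner/next linkage that the patch walks.
struct Region {
  Region *inner;                  // first nested transaction, or NULL
  Region *next;                   // next sibling at the same nesting level, or NULL
  bool has_uninstrumented_path;   // stand-in for the real per-region work
};

// Stand-in for the per-region step (hypothetical).
void uninstrument_one(Region *region) {
  region->has_uninstrumented_path = true;
}

// Process REGION, then recurse into each directly nested region so that
// every level of nesting is covered.  Siblings of REGION itself are not
// visited here; the caller is responsible for those.
void uninstrument_all(Region *region) {
  uninstrument_one(region);
  for (Region *r = region->inner; r; r = r->next)
    uninstrument_all(r);
}
```

Applied to an outermost region, the function marks that region and everything nested inside it, while a sibling of the outermost region is left for the caller's own loop.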

But as I was reviewing the patch to write the changelog, I began to
wonder whether the same logic concerning the original instrumented/
uninstrumented choice applies as well to the instrumented path.

Is it ever likely that we'd choose an uninstrumented path for a
nested transaction, given that we're already executing the instrumented
path for an outer transaction?

It now seems to me that the only time we'd switch from instrumented
to uninstrumented code path would be during a transaction restart,
after having selected to retry with a serialirr method.

Which means that I should apply the second patch only.

Thoughts?


r~


Re: patch fixing a test for PR55151

2012-11-07 Thread Vladimir Makarov

On 12-11-07 5:27 PM, H.J. Lu wrote:
> On Wed, Nov 7, 2012 at 2:21 PM, Vladimir Makarov wrote:
>>   The following patch adds omitted target for the test.  The test was
>> supposed to run on x86-64 only.  On 32-bit x86, it should fail.  Reload
>> fails on this test on x86 too although with an error message.  I am going to
>> add a generation of a message too.
>>
>>   Committed as rev. 193311.
>>
>> 2012-11-07  Vladimir Makarov
>>
>>     PR rtl-optimization/55151
>>     * gcc.dg/pr55151.c: Compile it only for x86_64.
>
> Checking x86_64-*-* target is incorrect since i686 GCC can support
> 64-bit.  You should check !ia32 target:
>
> /* { dg-do compile { target { ! { ia32 } } } } */

Thanks, H.J.  I've just fixed it.

Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog (revision 193316)
+++ testsuite/ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2012-11-07  Vladimir Makarov  
+
+   PR rtl-optimization/55151
+   * gcc.dg/pr55151.c: Use ia32 instead of x86_64.
+
 2012-11-05  Uros Bizjak  

* gcc.dg/tree-ssa/cunroll-1.c: Scan cunrolli dump.
Index: testsuite/gcc.dg/pr55151.c
===
--- testsuite/gcc.dg/pr55151.c  (revision 193316)
+++ testsuite/gcc.dg/pr55151.c  (working copy)
@@ -1,5 +1,5 @@
 /* PR rtl-optimization/55151 */
-/* { dg-do compile  { target x86_64-*-* } } */
+/* { dg-do compile  { target { ! { ia32 } } } } */
 /* { dg-options "-fPIC" } */

 int a, b, c, d, e, f, g, h, i, j, k, l;



Re: [PATCH, testsuite]: UNRESOLVED: gcc.dg/tree-ssa/cunroll-1.c

2012-11-07 Thread Uros Bizjak
On Wed, Nov 7, 2012 at 1:40 PM, Uros Bizjak  wrote:

> Attached patch addresses UNRESOLVED part of cunroll-1.c test failure,
> but with fixed dump filename, I got:
>
> FAIL: gcc.dg/tree-ssa/cunroll-1.c scan-tree-dump cunrolli "Unrolled
> loop 1 completely .duplicated 1 times.."

Now committed as obvious, with following ChangeLog:

2012-11-07  Uros Bizjak  

* gcc.dg/tree-ssa/cunroll-1.c: Scan cunrolli dump.

Uros.


[PATCH] Add extensive commentary to sparc's "U" constraint.

2012-11-07 Thread David Miller

Vlad, I wanted to make you aware of the following because it's
a major barrier for using LRA on sparc at this time.  I therefore
do not think moving to LRA on this target is possible in the 4.8
timeframe, which is fine.  The situation is described completely
in the comment I am adding in the patch below.

The most alarming aspect of this to me was discovering that IRA could
allocate registers to a pseudo that did not pass HARD_REGNO_MODE_OK,
and this anomaly is completely masked because reload and our splitters
end up fixing things up.

I wanted to explicitly thank you for your work on LRA because without
it we would never have discovered these inconsistencies in the sparc
backend.

One idea that occurred to me was perhaps to extend
define_register_constraint such that an extra condition can be
specified.  So for sparc's constraint "U" it would evaluate to
GENERAL_REGS but also express the condition that the hard register
must be even.  Then we could make the implementation of the macro
REG_CLASS_FROM_CONSTRAINT test the extra condition specified in
define_register_constraint, and return NO_REGS if that condition does
not pass.

But it would be much nicer if register classes could do what we need
them to.  Such a solution would be both cleaner, and significantly
more efficient.
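
The two ideas in this mail can be illustrated with a toy model, offered as a sketch rather than as GCC code. The 32-register file, `RegSet`, `in_reg_set`, `even_only_set` and `class_for_U` are all assumptions made for illustration. Only two things come from the text: the membership rule (every register of a multi-register value must be in the set) and the idea of a constraint that evaluates to GENERAL_REGS when an extra evenness condition holds and to NO_REGS otherwise.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

const int NUM_REGS = 32;                 // assumed size of the toy register file
typedef std::bitset<NUM_REGS> RegSet;

enum RegClass { NO_REGS, GENERAL_REGS }; // toy enum, not the GCC definition

// A value occupying NREGS consecutive hard registers starting at REGNO is a
// member of SET only if every one of those registers is in the set.
bool in_reg_set(const RegSet &set, int regno, int nregs) {
  for (int i = 0; i < nregs; ++i) {
    int r = regno + i;
    if (r < 0 || r >= NUM_REGS || !set.test(static_cast<std::size_t>(r)))
      return false;
  }
  return true;
}

// A register set that contains only the even-numbered registers.
RegSet even_only_set() {
  RegSet s;
  for (int r = 0; r < NUM_REGS; r += 2)
    s.set(static_cast<std::size_t>(r));
  return s;
}

// The proposed alternative: a class plus an extra condition on the register
// number.  Returns NO_REGS when the condition fails.
RegClass class_for_U(int regno) {
  return (regno % 2 == 0) ? GENERAL_REGS : NO_REGS;
}
```

In this model a two-register value starting at register 2 is not a member of the even-only set, because register 3 is missing, whereas a set holding all registers accepts a pair starting at register 3 just as readily as one starting at register 2. That is the dilemma described above: the class either rejects every pair or restricts nothing. `class_for_U` sidesteps it by testing the starting register directly.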

* config/sparc/constraints.md ("U"): Document, in detail,
which this constraint is necessary.
---
 gcc/ChangeLog   |  5 +
 gcc/config/sparc/constraints.md | 38 +-
 2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 24d9845..64e7596 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2012-11-07  David S. Miller  
+
+   * config/sparc/constraints.md ("U"): Document, in detail,
+   which this constraint is necessary.
+
 2012-11-07  Richard Henderson  
 
* trans-mem.c (pass_ipa_tm): Don't use TODO_update_ssa.
diff --git a/gcc/config/sparc/constraints.md b/gcc/config/sparc/constraints.md
index 2f8c6ad..440dc57 100644
--- a/gcc/config/sparc/constraints.md
+++ b/gcc/config/sparc/constraints.md
@@ -130,7 +130,43 @@
   (match_code "mem")
   (match_test "memory_ok_for_ldd (op)")))
 
-;; Not needed in 64-bit mode
+;; This awkward register constraint is necessary because it is not
+;; possible to express the "must be even numbered regsiter" condition
+;; using register classes.  The problem is that membership in a
+;; register class requires that all registers of a multi-regno
+;; register be included in the set.  It is add_to_hard_reg_set
+;; and in_hard_reg_set_p which populate and test regsets with these
+;; semantics.
+;;
+;; So this means that we would have to put both the even and odd
+;; register into the register class, which would not restrict things
+;; at all.
+;;
+;; Using a combination of GENERAL_REGS and HARD_REGNO_MODE_OK is not a
+;; full solution either.  In fact, even though IRA uses the macro
+;; HARD_REGNO_MODE_OK to calculate which registers are prohibited from
+;; use in certain modes, it still can allocate an odd hard register
+;; for DImode values.  This is due to how IRA populates the table
+;; ira_useful_class_mode_regs[][].  It suffers from the same problem
+;; as using a register class to describe this restriction.  Namely, it
+;; sets both the odd and even part of an even register pair in the
+;; regset.  Therefore IRA can and will allocate odd registers for
+;; DImode values on 32-bit.
+;;
+;; There are legitimate cases where DImode values can end up in odd
+;; hard registers, the most notable example is argument passing.
+;;
+;; What saves us is reload and the DImode splitters.  Both are
+;; necessary.  The odd register splitters cannot match if, for
+;; example, we have a non-offsetable MEM.  Reload will notice this
+;; case and reload the address into a single hard register.
+;;
+;; The real downfall of this awkward register constraint is that it does
+;; not evaluate to a true register class like a bonafide use of
+;; define_register_constraint would.  This currently means that we cannot
+;; use LRA on Sparc, since the constraint processing of LRA really depends
+;; upon whether an extra constraint is for registers or not.  It uses
+;; REG_CLASS_FROM_CONSTRAINT, and checks it against NO_REGS.
 (define_constraint "U"
  "Pseudo-register or hard even-numbered integer register"
  (and (match_test "TARGET_ARCH32")
-- 
1.7.12.2.dirty



Re: [PATCH, middle-end]: FIX PR55253, [4.8 Regression] FAIL: gcc.target/i386/pr44948-2a.c

2012-11-07 Thread Uros Bizjak
On Wed, Nov 7, 2012 at 11:08 PM, Uros Bizjak  wrote:

> The patch simply removes the call to emit_block_move, while still
> calling copy_blkmode_from_reg when appropriate. The patch fixes the
> testsuite failure and produces the same code as gcc-4.7.
>
> 2012-11-07  Uros Bizjak  
>
> PR middle-end/55235
> * expr.c (store_expr): Do not call emit_block_move for
> non-BLKmode values.
>
> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.
>
> OK for mainline?

Oh, I didn't notice that Eric already approved the patch in the PR.
Although he thinks the "feature" that the patch fixes is a bit
controversial, it actually restores previous functionality.

So, committed to mainline SVN.

Uros.


Fix PR middle-end/55219

2012-11-07 Thread Eric Botcazou
This is a regression present on the mainline and 4.7 branch.  For expressions 
of the form:

  h = (g ? c : g ? f : g ? e : g ? i : g ? f : g ? e : g ? d : x)
  + (a ? : a ? : a ? : a ? : a ? : a ? : a ? : a ? : a ? : a ? : a
 ? j : a ? : 0 ? : a ? : a ? : a ? : a ? : a ? : a ? k : a ? : x);

there is a memory explosion in the folder: fold_binary_op_with_conditional_arg 
is distributing the + inside the conditional expressions at each level 
recursively.  The transformation was originally applied only if the argument 
was TREE_CONSTANT, but during 4.7 development I extended it to more general 
arguments to help Ada.

Fixed by disabling recursion entirely, as it was originally.


Bootstrapped/regtested on x86_64-suse-linux, applied on mainline and 4.7 
branch as obvious.


2012-11-07  Eric Botcazou  

PR middle-end/55219
* fold-const.c (fold_binary_op_with_conditional_arg): Do not fold if
the argument is itself a conditional expression.


2012-11-07  Eric Botcazou  

* gcc.c-torture/compile/20121107-1.c: New test.


-- 
Eric Botcazou

Index: fold-const.c
===
--- fold-const.c	(revision 193280)
+++ fold-const.c	(working copy)
@@ -5987,10 +5987,11 @@ fold_binary_op_with_conditional_arg (loc
 cond_code = VEC_COND_EXPR;
 
   /* This transformation is only worthwhile if we don't have to wrap ARG
- in a SAVE_EXPR and the operation can be simplified on at least one
- of the branches once its pushed inside the COND_EXPR.  */
+ in a SAVE_EXPR and the operation can be simplified without recursing
+ on at least one of the branches once its pushed inside the COND_EXPR.  */
   if (!TREE_CONSTANT (arg)
   && (TREE_SIDE_EFFECTS (arg)
+	  || TREE_CODE (arg) == COND_EXPR || TREE_CODE (arg) == VEC_COND_EXPR
 	  || TREE_CONSTANT (true_value) || TREE_CONSTANT (false_value)))
 return NULL_TREE;

/* PR middle-end/55219 */
/* Testcase by Markus Trippelsdorf */

int x, c, d, e, f, g, h, i;
double j;
const int k;
const enum { B } a;
void
fn1 (void)
{
  h = (g ? c : g ? f : g ? e : g ? i : g ? f : g ? e : g ? d : x)
  + (a ? : a ? : a ? : a ? : a ? : a ? : a ? : a ? : a ? : a ? : a
 ? j : a ? : 0 ? : a ? : a ? : a ? : a ? : a ? : a ? k : a ? : x);
}


Re: patch fixing a test for PR55151

2012-11-07 Thread H.J. Lu
On Wed, Nov 7, 2012 at 2:21 PM, Vladimir Makarov  wrote:
>   The following patch adds omitted target for the test.  The test was
> supposed to run on x86-64 only.  On 32-bit x86, it should fail.  Reload
> fails on this test on x86 too although with an error message.  I am going to
> add a generation of a message too.
>
>   Committed as rev. 193311.
>
> 2012-11-07  Vladimir Makarov  
>
> PR rtl-optimization/55151
> * gcc.dg/pr55151.c: Compile it only for x86_64.
>

Checking x86_64-*-* target is incorrect since i686 GCC can support
64-bit.  You should check !ia32 target:

/* { dg-do compile { target { ! { ia32 } } } } */


-- 
H.J.


patch fixing a test for PR55151

2012-11-07 Thread Vladimir Makarov
  The following patch adds omitted target for the test.  The test was 
supposed to run on x86-64 only.  On 32-bit x86, it should fail.  Reload 
fails on this test on x86 too although with an error message.  I am 
going to add a generation of a message too.


  Committed as rev. 193311.

2012-11-07  Vladimir Makarov  

PR rtl-optimization/55151
* gcc.dg/pr55151.c: Compile it only for x86_64.



patch to fix PR55122

2012-11-07 Thread Vladimir Makarov

  The following patch fixes

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55122

  The problem was in the generation of a reload pseudo for matching operands 
with a unique value, which prevented assigning the same hard register to the 
reload pseudo and the original input pseudo when the choice of hard regs 
was quite small (AD regs).
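
The idea can be illustrated with a toy model.  This is only a sketch and
not LRA code: toy_reg_info, toy_may_share_hard_reg and toy_sync_value are
hypothetical names.  The one assumption carried over from the patch is that
pseudos with equal value numbers may be given the same hard register, so
copying the value of a dying input pseudo to the reload pseudo makes the
sharing possible.

```c
#include <stdbool.h>

/* Toy stand-in for lra_reg_info: each pseudo carries a value number.  */
struct toy_reg_info
{
  int val;
};

/* Two pseudos may be assigned the same hard register only when they
   are known to hold the same value.  */
static bool
toy_may_share_hard_reg (const struct toy_reg_info *info, int a, int b)
{
  return info[a].val == info[b].val;
}

/* Mirror of the patch's idea: when the input pseudo dies in the
   current insn, give the reload pseudo the input's value number.  */
static void
toy_sync_value (struct toy_reg_info *info, int reload_regno,
                int in_regno, bool in_dies_here)
{
  if (in_dies_here)
    info[reload_regno].val = info[in_regno].val;
}
```

Without the sync, the reload pseudo keeps its own value number and the
allocator in this model has to pick a second register, which is the case
that fails when only a couple of registers are available.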


  The patch was successfully bootstrapped and tested on x86/x86-64.

  Committed as rev. 193310.

2012-11-07  Vladimir Makarov  

PR rtl-optimization/55122
* lra-constraints.c (match_reload): Sync values for dead input
pseudos.

2012-11-07  Vladimir Makarov  

PR rtl-optimization/55122
* gcc.dg/pr55122.c: New test.


Index: lra-constraints.c
===
--- lra-constraints.c   (revision 193303)
+++ lra-constraints.c   (working copy)
@@ -682,6 +682,11 @@ match_reload (signed char out, signed ch
new_out_reg = gen_lowpart_SUBREG (outmode, reg);
  else
new_out_reg = gen_rtx_SUBREG (outmode, reg, 0);
+ /* If the input reg is dying here, we can use the same hard
+register for REG and IN_RTX.  */
+ if (REG_P (in_rtx)
+ && find_regno_note (curr_insn, REG_DEAD, REGNO (in_rtx)))
+   lra_reg_info[REGNO (reg)].val = lra_reg_info[REGNO (in_rtx)].val;
}
   else
{
@@ -698,6 +703,19 @@ match_reload (signed char out, signed ch
 it at the end of LRA work.  */
  clobber = emit_clobber (new_out_reg);
  LRA_TEMP_CLOBBER_P (PATTERN (clobber)) = 1;
+ if (GET_CODE (in_rtx) == SUBREG)
+   {
+ rtx subreg_reg = SUBREG_REG (in_rtx);
+ 
+ /* If SUBREG_REG is dying here and sub-registers IN_RTX
+and NEW_IN_REG are similar, we can use the same hard
+register for REG and SUBREG_REG.  */
+ if (REG_P (subreg_reg) && GET_MODE (subreg_reg) == outmode
+ && SUBREG_BYTE (in_rtx) == SUBREG_BYTE (new_in_reg)
+ && find_regno_note (curr_insn, REG_DEAD, REGNO (subreg_reg)))
+   lra_reg_info[REGNO (reg)].val
+ = lra_reg_info[REGNO (subreg_reg)].val;
+   }
}
 }
   else
Index: testsuite/gcc.dg/pr55122.c
===
--- testsuite/gcc.dg/pr55122.c  (revision 0)
+++ testsuite/gcc.dg/pr55122.c  (working copy)
@@ -0,0 +1,14 @@
+/* PR rtl-optimization/55122 */
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+int i, a;
+unsigned long long b;
+
+void f(void)
+{
+for(i = 0; i < 15; i++)
+b *= b;
+
+b *= a ? 0 : b;
+}


[PATCH, middle-end]: FIX PR55235, [4.8 Regression] FAIL: gcc.target/i386/pr44948-2a.c

2012-11-07 Thread Uros Bizjak
Hello!

The attached patch fixes an oversight introduced in revision 192641 [1]
that caused the following testsuite failures on i686:

FAIL: gcc.target/i386/pr44948-2a.c (internal compiler error)
FAIL: gcc.target/i386/pr44948-2a.c (test for excess errors)

As shown in the PR [2], the referenced patch activated a call to
emit_block_move that was previously effectively dead code (see how the
modes of temp and target are checked). emit_block_move ICEs when a
constant is passed to it, so we got an ICE for the following arguments:

(gdb) p debug_rtx (x)
(mem/j/c:BLK (plus:SI (reg/f:SI 54 virtual-stack-vars)
(const_int -16 [0xfff0])) [0 a.V4SF+0 S16 A128])
$1 = void
(gdb) p debug_rtx (y)
(const_vector:V4SF [
(const_double:SF 0.0 [0x0.0p+0])
(const_double:SF 1.0e+0 [0x0.8p+1])
(const_double:SF 2.0e+0 [0x0.8p+2])
(const_double:SF 3.0e+0 [0x0.cp+2])
])
$2 = void

The patch simply removes the call to emit_block_move, while still
calling copy_blkmode_from_reg when appropriate. The patch fixes the
testsuite failure and produces the same code as gcc-4.7.
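
The control flow after the patch can be modelled as below.  This is a
minimal sketch rather than the expr.c code: the enum and the function name
are hypothetical, and the two booleans stand in for REG_P (temp) and
TREE_CODE (exp) == CALL_EXPR.

```c
#include <stdbool.h>

enum toy_store_kind
{
  TOY_COPY_BLKMODE_FROM_REG,
  TOY_STORE_BIT_FIELD
};

/* Select how a non-BLKmode TEMP is stored into a BLKmode TARGET,
   following the shape of the patched branch.  There is no block-move
   alternative any more, so a constant TEMP ends up in the bit-field
   store path.  */
static enum toy_store_kind
toy_choose_store (bool temp_is_reg, bool exp_is_call)
{
  if (temp_is_reg && exp_is_call)
    return TOY_COPY_BLKMODE_FROM_REG;
  return TOY_STORE_BIT_FIELD;
}
```

In this model the const_vector from the debug output above corresponds to
temp_is_reg == false, which now selects the bit-field store.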

2012-11-07  Uros Bizjak  

PR middle-end/55235
* expr.c (store_expr): Do not call emit_block_move for
non-BLKmode values.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

OK for mainline?

[1] http://gcc.gnu.org/ml/gcc-cvs/2012-10/msg00764.html
[2] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55235#c3

Uros.
Index: expr.c
===
--- expr.c  (revision 193296)
+++ expr.c  (working copy)
@@ -5246,19 +5246,12 @@ store_expr (tree exp, rtx target, int call_param_p
{
  if (GET_MODE (target) == BLKmode)
{
- if (REG_P (temp))
-   {
- if (TREE_CODE (exp) == CALL_EXPR)
-   copy_blkmode_from_reg (target, temp, TREE_TYPE (exp));
- else
-   store_bit_field (target,
-INTVAL (expr_size (exp)) * BITS_PER_UNIT,
-0, 0, 0, GET_MODE (temp), temp);
-   }
+ if (REG_P (temp) && TREE_CODE (exp) == CALL_EXPR)
+   copy_blkmode_from_reg (target, temp, TREE_TYPE (exp));
  else
-   emit_block_move (target, temp, expr_size (exp),
-(call_param_p
- ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
+   store_bit_field (target,
+INTVAL (expr_size (exp)) * BITS_PER_UNIT,
+0, 0, 0, GET_MODE (temp), temp);
}
  else
convert_move (target, temp, TYPE_UNSIGNED (TREE_TYPE (exp)));


Re: [Bug libstdc++/54075] [4.7.1] unordered_map insert still slower than 4.6.2

2012-11-07 Thread François Dumont

Here is the patch to fix the redundant rehash/reserve issue.

2012-11-07  François Dumont  

PR libstdc++/54075
* include/bits/hashtable.h (_Hashtable<>::rehash): Reset hash
policy state if no rehash.
* testsuite/23_containers/unordered_set/modifiers/reserve.cc
(test02): New.
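
To illustrate what "reset hash policy state if no rehash" means, here is a
toy model in C.  It is only a sketch under assumptions: the structures, the
power-of-two bucket choice and the grow-only rule are hypothetical and do
not reproduce the libstdc++ policy.  The point it shows is that computing
the bucket count updates the policy as a side effect, so the saved state has
to be put back when the table is left untouched.

```c
#include <stddef.h>

struct toy_policy
{
  size_t next_resize;   /* element count that triggers the next rehash */
};

struct toy_table
{
  size_t bucket_count;
  struct toy_policy policy;
};

/* Return the smallest power of two >= N and, as a side effect, record
   it in the policy as the next resize threshold.  */
static size_t
toy_next_bkt (struct toy_policy *policy, size_t n)
{
  size_t buckets = 1;
  while (buckets < n)
    buckets *= 2;
  policy->next_resize = buckets;
  return buckets;
}

/* Grow-only rehash request.  When no rehash happens, restore the
   saved policy so it stays consistent with the real bucket count.  */
static void
toy_rehash (struct toy_table *table, size_t n)
{
  struct toy_policy saved = table->policy;
  size_t buckets = toy_next_bkt (&table->policy, n);

  if (buckets > table->bucket_count)
    table->bucket_count = buckets;   /* real code would move elements */
  else
    table->policy = saved;
}
```

Without the else branch, a request that does not change the table would
leave a threshold behind that no longer matches the bucket count, and a
later insertion could rehash needlessly.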

I prepared and tested it on the 4.7 branch, but I can apply the same 
change to trunk.


OK to commit? If so, where?

François

On 11/06/2012 10:33 PM, paolo.carlini at oracle dot com wrote:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54075

--- Comment #39 from Paolo Carlini  2012-11-06 
21:33:57 UTC ---
Ok thanks. I guess depending on the complexity of the fixes we can apply some
only to mainline first and reconsider the 4_7 branch later. Please do your best
to work on both issues: we just entered Stage 3 thus no new features from now
on, we are all concentrated on bug fixes until the release.



Index: include/bits/hashtable.h
===
--- include/bits/hashtable.h	(revision 193258)
+++ include/bits/hashtable.h	(working copy)
@@ -1597,6 +1597,9 @@
 	  // level.
 	  _M_rehash_policy._M_prev_resize = 0;
 	}
+  else
+	// No rehash, restore previous state to keep a consistent state.
+	_M_rehash_policy._M_reset(__saved_state);
 }
 
   template Set;
+  Set s;
+  s.reserve(N);
+  s.reserve(N);
+
+  std::size_t bkts = s.bucket_count();
+  for (int i = 0; i != N; ++i)
+{
+  s.insert(i);
+  // As long as we insert less than the reserved number of elements we
+  // shouldn't experience any rehash.
+  VERIFY( s.bucket_count() == bkts );
+}
+}
+
 int main()
 {
   test01();
+  test02();
   return 0;
 }


Re: [PATCH] gcc-{ar,nm,ranlib}: Find binutils binaries relative to self

2012-11-07 Thread Meador Inge
Ping ^ 4.

On 10/29/2012 10:46 AM, Meador Inge wrote:
> Ping ^ 3.
> 
> On 10/18/2012 10:30 AM, Meador Inge wrote:
>> Ping ^ 2.
>>
>> On 10/09/2012 09:44 PM, Meador Inge wrote:
>>> Ping.
>>>
>>> On 10/04/2012 03:45 PM, Meador Inge wrote:
>>>> Hi All,
>>>>
>>>> Currently the gcc-{ar,nm,ranlib} utilities assume that binutils is in
>>>> the PATH when invoking the wrapped binutils program.  This goes against the
>>>> accepted practice in GCC to find sub-programs relative to where the
>>>> GCC binaries are stored and to not make assumptions about the PATH.
>>>>
>>>> This patch changes the gcc-{ar,nm,ranlib} utilities to do the same
>>>> by factoring out some utility code for finding files from collect2.c.
>>>> These functions are then leveraged to find the binutils programs.
>>>> Note that similar code exists in gcc.c.  Perhaps one day everything
>>>> can be merged to the file-find files.
>>>>
>>>> Tested for Windows and GNU/Linux hosts and i686-pc-linux-gnu and
>>>> arm-none-eabi targets.  OK?
>>>>
>>>> P.S. I am not quite sure what is best for the copyrights and contributed
>>>> by comments in the file-find* files I added since that code was just moved.
>>>> This patch drops the contributed by and keeps all the copyright dates from
>>>> collect2.c.
>>>>
>>>> 2012-10-04  Meador Inge
>>>>
>>>> * collect2.c (main): Call find_file_set_debug.
>>>> (find_a_file, add_prefix, prefix_from_env, prefix_from_string):
>>>> Factor out into ...
>>>> * file-find.c (New file): ... here and ...
>>>> * file-find.h (New file): ... here.
>>>> * gcc-ar.c (standard_exec_prefix): New variable.
>>>> (standard_libexec_prefix): Ditto.
>>>> (tooldir_base_prefix): Ditto.
>>>> (self_exec_prefix): Ditto.
>>>> (self_libexec_prefix): Ditto.
>>>> (self_tooldir_prefix): Ditto.
>>>> (target_version): Ditto.
>>>> (path): Ditto.
>>>> (target_path): Ditto.
>>>> (setup_prefixes): New function.
>>>> (main): Rework how wrapped programs are found.
>>>> * Makefile.in (OBJS-libcommon-target): Add file-find.o.
>>>> (AR_OBJS): New variable.
>>>> (gcc-ar$(exeext)): Add dependency on $(AR_OBJS).
>>>> (gcc-nm$(exeext)): Ditto.
>>>> (gcc-ranlib$(exeext)): Ditto.
>>>> (COLLECT2_OBJS): Add file-find.o.
>>>> (collect2.o): Add file-find.h prerequisite.
>>>> (file-find.o): New rule.
>>>>

>>>> Index: gcc/gcc-ar.c
>>>> ===
>>>> --- gcc/gcc-ar.c   (revision 192099)
>>>> +++ gcc/gcc-ar.c   (working copy)
>>>> @@ -21,21 +21,110 @@
>>>>  #include "config.h"
>>>>  #include "system.h"
>>>>  #include "libiberty.h"
>>>> +#include "file-find.h"
>>>>
>>>>  #ifndef PERSONALITY
>>>>  #error "Please set personality"
>>>>  #endif
>>>>
>>>> +/* The exec prefix as derived at compile-time from --prefix.  */
>>>> +
>>>> +static const char standard_exec_prefix[] = STANDARD_EXEC_PREFIX;
>>>> +
>>>> +/* The libexec prefix as derived at compile-time from --prefix.  */
>>>> +
>>>>  static const char standard_libexec_prefix[] = STANDARD_LIBEXEC_PREFIX;
>>>> +
>>>> +/* The bindir prefix as derived at compile-time from --prefix.  */
>>>> +
>>>>  static const char standard_bin_prefix[] = STANDARD_BINDIR_PREFIX;
>>>> -static const char *const target_machine = TARGET_MACHINE;
>>>>
>>>> +/* A relative path to be used in finding the location of tools
>>>> +   relative to this program.  */
>>>> +
>>>> +static const char *const tooldir_base_prefix = TOOLDIR_BASE_PREFIX;
>>>> +
>>>> +/* The exec prefix as relocated from the location of this program.  */
>>>> +
>>>> +static const char *self_exec_prefix;
>>>> +
>>>> +/* The libexec prefix as relocated from the location of this program.  */
>>>> +
>>>> +static const char *self_libexec_prefix;
>>>> +
>>>> +/* The tools prefix as relocated from the location of this program.  */
>>>> +
>>>> +static const char *self_tooldir_prefix;
>>>> +
>>>> +/* The name of the machine that is being targeted.  */
>>>> +
>>>> +static const char *const target_machine = DEFAULT_TARGET_MACHINE;
>>>> +
>>>> +/* The target version.  */
>>>> +
>>>> +static const char *const target_version = DEFAULT_TARGET_VERSION;
>>>> +
>>>> +/* The collection of target specific path prefixes.  */
>>>> +
>>>> +static struct path_prefix target_path;
>>>> +
>>>> +/* The collection path prefixes.  */
>>>> +
>>>> +static struct path_prefix path;
>>>> +
>>>> +/* The directory separator.  */
>>>> +
>>>>  static const char dir_separator[] = { DIR_SEPARATOR, 0 };
>>>>
>>>> +static void
>>>> +setup_prefixes (const char *exec_path)
>>>> +{
>>>> +  const char *self;
>>>> +
>>>> +  self = getenv ("GCC_EXEC_PREFIX");
>>>> +  if (!self)
>>>> +self = exec_path;
>>>> +  else
>>>> +self = concat (self, "gcc-" PERSONALITY, NULL);
>>>> +
>>>> +  /* Relocate the exec prefix.  */
>>>> +  self_exec_prefix = make_relative_prefix (self,
>>>> + standard_bin_prefix,
>>>> + standard_exec_prefix);
>>>> +  i
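
The general idea described in the quoted message, locating a tool relative
to the running program instead of trusting PATH, can be sketched without
libiberty.  This is a hedged illustration only: toy_sibling_path is a
hypothetical helper, it handles '/' separators only, and it does not
reproduce what make_relative_prefix or the file-find functions do.

```c
#include <stdio.h>
#include <string.h>

/* Build in OUT the path of TOOL located in the same directory as SELF.
   Return 0 on success, -1 if OUT (of OUT_SIZE bytes) is too small.  */
static int
toy_sibling_path (const char *self, const char *tool,
                  char *out, size_t out_size)
{
  const char *slash = strrchr (self, '/');
  /* Length of the directory part, including the trailing slash.  */
  size_t dir_len = slash ? (size_t) (slash - self) + 1 : 0;
  int n = snprintf (out, out_size, "%.*s%s", (int) dir_len, self, tool);

  return (n >= 0 && (size_t) n < out_size) ? 0 : -1;
}
```

A wrapper started as /opt/cross/bin/gcc-ar would then try
/opt/cross/bin/ar first, whatever PATH contains.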

Re: [Committed] S/390: Add support for the new IBM zEnterprise EC12

2012-11-07 Thread Gerald Pfeifer
On Wed, 7 Nov 2012, Andreas Krebbel wrote:
> Sure. What about something like this?
> 
> Index: htdocs/index.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v
> retrieving revision 1.865
> diff -u -r1.865 index.html
> --- htdocs/index.html 6 Nov 2012 12:17:13 -   1.865
> +++ htdocs/index.html 7 Nov 2012 09:36:17 -
> @@ -53,6 +53,12 @@
> 
>  
> 
> +IBM zEnterprise EC12 support
> +[2012-10-10]
> +Support for the latest release of the System z mainframe
> +<a href="http://www.ibm.com/systems/z/hardware/zenterprise/zec12.html">zEC12</a>
> +has been added to the architecture back-end.

Note that RMS has asked us to not add links to products/companies
that are not (doing) exclusively free software.  This is why you
generally won't find links to corporate home pages in this section.

(Where it goes for technical references such as readings.html, I
believe we can/should be pragmatic.)

Also, don't you want to add acknowledgements who did the work and
which company contributed it?  (Slightly obvious as far as the 
latter goes, I think, but... ;-)

> Index: htdocs/gcc-4.8/changes.html
> ===
> +Register pressure sensitive insn scheduling is enabled by
> +  default.

I'd use "instruction scheduling".

> +The IFUNC function attribute is enabled by default.

IFUNC

> +memcpy and memcmp invokations on big memory chunks or with

memcpy and memcmp

> +  runtime lengths are not generated inline anymore when tuning for
> +  z10 or higher.  The purpose is to make use of the IFUNC
> +  optimized versions in Glibc.

Note http://gcc.gnu.org/codingconventions.html as far as the use of
"runtime" goes.


This is fine with these minor adjustments; thanks!

Gerald


[trans-mem] Don't update_ssa twice

2012-11-07 Thread Richard Henderson
When I patched Aldy's code to perform the update_ssa explicitly,
I forgot to take out the TODO_update_ssa that Aldy had added.

Tested on x86_64-linux and committed.


r~
* trans-mem.c (pass_ipa_tm): Don't use TODO_update_ssa.


diff --git a/gcc/trans-mem.c b/gcc/trans-mem.c
index 642e088..a7b4a9c 100644
--- a/gcc/trans-mem.c
+++ b/gcc/trans-mem.c
@@ -5355,7 +5355,7 @@ struct simple_ipa_opt_pass pass_ipa_tm =
   0,   /* properties_provided */
   0,   /* properties_destroyed */
   0,   /* todo_flags_start */
-  TODO_update_ssa, /* todo_flags_finish */
+  0,   /* todo_flags_finish */
  },
 };
 


Re: [PATCH] Enable -mcpu=power8 for PowerPC

2012-11-07 Thread Peter Bergner
On Tue, 2012-11-06 at 10:35 -0500, David Edelsohn wrote:
> > * doc/invoke.texi (-mcpu=power8): Document.
> > * config.in (HAVE_AS_POWER8): New.
> > * config.gcc: Add cpu_type power8.
> > * configure.ac: (HAVE_AS_POWER8): Check for assembler support for 
> > the
> > POWER8 instructions.
> > * configure: Regenerate.
> > * config/rs6000/rs6000.h: (ASM_CPU_POWER8_SPEC): Define.
> > (ASM_CPU_SPEC): Pass %(asm_cpu_power8) for -mcpu=power8.
> > (EXTRA_SPECS): Add asm_cpu_power8 spec string.
> > * config/rs6000/rs6000-cpus.def (processor_target_table): Alias
> > POWER8 to POWER7.
> > * config/rs6000/driver-rs6000.c (ASM_CPU_SPEC): For -mcpu=power8,
> > pass %(asm_cpu_power8)/-mpwr8.
> > * config/rs6000/aix53.h: Likewise.
> > * config/rs6000/aix61.h: Likewise.
> 
> This patch is okay.

Thanks, committed as revision 193307.

Peter




[PATCH, i386]: Fix PR55224, FAIL: gcc.target/i386/tailcall-1.c scan-assembler jmp

2012-11-07 Thread Uros Bizjak
Hello!

Apparently, the vzeroupper patch removed a couple of unrelated lines.
The attached patch puts back what was there in gcc-4.5.

(Also, the patch finds a better place for check_avx256_stores.)

2012-11-07  Uros Bizjak  

PR target/55224
* config/i386/i386.c (ix86_function_ok_for_sibcall): Put back exception
to make a sibcall if one of the functions has void return type.
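
For illustration, the kind of call covered by this exception looks like the
sketch below.  The function names are made up, and whether a jmp is really
emitted depends on the target and the optimization options.

```c
int toy_calls;

/* A callee that returns a value.  */
int
toy_callee (int x)
{
  toy_calls += x;
  return toy_calls;
}

/* The caller returns void, so the callee's return value is simply
   discarded and nothing is left to do after the call.  The call is
   therefore a sibcall candidate even though the return types differ.  */
void
toy_caller (int x)
{
  toy_callee (x);
}
```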

Patch was tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN.

Uros.

Index: i386.c
===
--- i386.c  (revision 193296)
+++ i386.c  (working copy)
@@ -4638,6 +4622,8 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
   if (!rtx_equal_p (a, b))
return false;
 }
+  else if (VOID_TYPE_P (TREE_TYPE (DECL_RESULT (cfun->decl))))
+;
   else if (!rtx_equal_p (a, b))
 return false;


Re: [PATCH] Vtable pointer verification, gcc changes (patch 2 of 2)

2012-11-07 Thread Xinliang David Li
See some random comments below.  Some test cases should also be added.
It should be easy to fake the attack by using placement new with an
incompatible type.

David


>  /* Start the process of running a particular set of global constructors
> or destructors.  Subroutine of do_[cd]tors.  */
>
> -static tree
> -start_objects (int method_type, int initp)
> +tree
> +start_objects (int method_type, int initp, const char *extra_name)


Why do you need to make this global? The name start_objects is too short
and can cause name conflicts. If you really need it to be global, defining
a wrapper function with a longer name as the global seems better.

The new parameter is not documented.


>  {
>tree body;
>tree fndecl;
> -  char type[14];
> +  char *type = NULL;
>
>/* Make ctor or dtor function.  METHOD_TYPE may be 'I' or 'D'.  */
>
> @@ -2984,15 +2982,22 @@ start_objects (int method_type, int init
>joiner = '_';
>  #endif
>
> -  sprintf (type, "sub_%c%c%.5u", method_type, joiner, initp);
> +  type = (char *) xmalloc ((17 + strlen (extra_name)) * sizeof (char));
> +  sprintf (type, "sub_%c%c%.5u%s", method_type, joiner, initp, 
> extra_name);


Why change the type name?


>  }
>else
> -sprintf (type, "sub_%c", method_type);
> +{
> +  type = (char *) xmalloc (5 * sizeof (char));
> +  sprintf (type, "sub_%c", method_type);
> +}
>
>fndecl = build_lang_decl (FUNCTION_DECL,
>  get_file_function_name (type),
>  build_function_type_list (void_type_node,
>NULL_TREE));
> +
> +  free (type);
> +
>start_preparsed_function (fndecl, /*attrs=*/NULL_TREE, SF_PRE_PARSED);
>
>TREE_PUBLIC (current_function_decl) = 0;
> @@ -3018,7 +3023,7 @@ start_objects (int method_type, int init
>  /* Finish the process of running a particular set of global constructors
> or destructors.  Subroutine of do_[cd]tors.  */
>
> -static void
> +tree

Document the return value.


>  finish_objects (int method_type, int initp, tree body)
>  {
>tree fn;
> @@ -3031,6 +3036,10 @@ finish_objects (int method_type, int ini
>  {
>DECL_STATIC_CONSTRUCTOR (fn) = 1;
>decl_init_priority_insert (fn, initp);
> +
> +  if (flag_vtable_verify
> +  && strstr (IDENTIFIER_POINTER (DECL_NAME (fn)), ".vtable"))
> +return fn;
>  }
>else
>  {
> @@ -3039,6 +3048,7 @@ finish_objects (int method_type, int ini
>  }
>
>
> ===
> --- gcc/cp/vtable-class-hierarchy.c (revision 0)
> +++ gcc/cp/vtable-class-hierarchy.c (revision 0)
> @@ -0,0 +1,918 @@

Please add documentation (comments) for all functions defined in this file.

Some high level description of implementation structure at
the top of the file may also be helpful.

>
> --- gcc/tree-vtable-verify.c (revision 0)
> +++ gcc/tree-vtable-verify.c (revision 0)

Same comments as above.

On Mon, Nov 5, 2012 at 9:48 AM, Caroline Tice  wrote:
> As requested, I have split the original patch into two parts: GCC
> changes and runtime library changes.  The attached patch is for the
> gcc changes.
>
> -- Caroline Tice
> cmt...@google.com
>
> 2012-11-05  Caroline Tice  
>
> * tree.h (save_vtable_map_decl): New function decl.
> * tree-pass.h (pass_vtable_verify): New pass declaration.
> * cp/init.c (build_vtbl_address): Remove 'static' qualifier from
> function declaration and definition.
> * cp/class.c (finish_struct_1):  Add call to vtv_save_class_info,
> if the vtable verify flag is set.
> * cp/Make-lang.in: Add vtable-class-hierarchy.o to list of object
> files.  Add definition for building vtable-class-hierarchy.o.
> * cp/pt.c (mark_class_instantiated):  Add call to vtv_save_class_info
> if the vtable verify flag is set.
> * cp/decl2.c (start_objects): Remove 'static' qualifier from function
> declaration and definition.  Add new parameter, 'extra_name'.  Change
> 'type' var from char array to char *.  Call xmalloc & free for 'type'.
> Add 'extra_name' to 'type' string.
> (finish_objects): Remove 'static' qualifier from function declaration
> and definition. Change return type from void to tree.  Make function
> return early if we're doing vtable verification and the function is
> a vtable verification constructor init function.  Make this function
> return 'fn'.
> (generate_ctor_or_dtor_function):  Add third argument to calls to
> start_objects.
> (cp_write_global_declarations):  Add calls to vtv_recover_class_info,
> vtv_compute_class_hierarchy_transitive_closure, and
> vtv_generate_init_routine, if the vtable verify flag is set.
> * cp/config-lang.in (gtfiles): Add vtable-class-hierarchy.c to the
> list of gtfiles.
> * cp/vtable-class-hierarchy.c: New file.
> * cp/mangle.c (get_mangled_id): Remove static 

Re: [PATCH,RX] Support Bit Manipulation on Memory Operands

2012-11-07 Thread Richard Henderson
On 2012-11-07 00:51, Naveen H. S wrote:
> +  [(set (match_operand:QI 0 "rx_restricted_mem_operand" "=Q")
> + (ior:QI (match_dup 0)

The output constraint is now an in-out:  s/=Q/+Q/.
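
The "=" and "+" modifiers mean the same thing in machine descriptions and in
extended asm, so the difference can be shown with a small C sketch.  It
assumes GCC or a compatible compiler, and toy_touch is a made-up name.

```c
/* Requires GCC or a compatible compiler (extended asm).  */
static int
toy_touch (int x)
{
  /* "+r": X is both read and written by the (empty) asm, so the
     compiler must have its current value in a register before the
     statement and must assume it may have changed afterwards.  With
     "=r" the old value would be treated as dead on entry.  */
  __asm__ ("" : "+r" (x));
  return x;
}
```

The pattern above reads operand 0 through (match_dup 0) before writing it,
which is why write-only "=Q" is wrong there.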


r~


Re: [google] Add attributes: always_patch_for_instrumentation and never_patch_for_instrumentation (issue6821051)

2012-11-07 Thread Xinliang David Li
ok for google branches.

David

On Tue, Nov 6, 2012 at 5:17 PM, Harshit Chopra  wrote:
> Yes, will do, but probably not so soon. Once I have some spare time to
> prepare my case for this being useful to public.
>
> Meanwhile, this patch is just for google-main and then I will port it
> to google_4-7 and adds to the already existing functionality of
> -mpatch-function-for-instrumentation.
>
> Thanks,
> Harshit
>
>
> On Mon, Nov 5, 2012 at 12:29 PM, Xinliang David Li  wrote:
>> It does not hurt to submit the patch for review -- you need to provide
>> more background and motivation for this work
>> 1) comparison with -finstrument-functions (runtime overhead etc)
>> 2) use model difference (production binary ..)
>> 3) Interesting examples of use cases (with graphs).
>>
>> thanks,
>>
>> David
>>
>> On Mon, Nov 5, 2012 at 12:20 PM, Harshit Chopra  wrote:
>>> Thanks David for the review. My comments are inline.
>>>
>>>
>>> On Sat, Nov 3, 2012 at 12:38 PM, Xinliang David Li  
>>> wrote:

>>>> Harshit, Nov 5 is the gcc48 cutoff date. If you want to have the x-ray
>>>> instrumentation feature into this release, you will need to port your
>>>> patch and submit for trunk review now.
>>>
>>>
>>> I am a bit too late now, I guess. If I target for the next release,
>>> will it create any issues for the gcc48 release?
>>>



>>>> On Tue, Oct 30, 2012 at 5:15 PM, Harshit Chopra  wrote:
>>>> > Adding function attributes: 'always_patch_for_instrumentation' and 
>>>> > 'never_patch_for_instrumentation' to always patch a function or to never 
>>>> > patch a function, respectively, when given the option 
>>>> > -mpatch-functions-for-instrumentation. Additionally, the attribute 
>>>> > always_patch_for_instrumentation disables inlining of that function.
>>>> >
>>>> > Tested:
>>>> >   Tested by 'crosstool-validate.py --crosstool_ver=16 
>>>> > --testers=crosstool'
>>>> >
>>>> > ChangeLog:
>>>> >
>>>> > 2012-10-30  Harshit Chopra 
>>>> >
>>>> > * gcc/c-family/c-common.c 
>>>> > (handle_always_patch_for_instrumentation_attribute): Handle
>>>> >   always_patch_for_instrumentation attribute and turn inlining off for 
>>>> > the function.
>>>> > (handle_never_patch_for_instrumentation_attribute): Handle 
>>>> > never_patch_for_instrumentation
>>>> >   attribute of a function.
>>>> > * gcc/config/i386/i386.c (check_should_patch_current_function): 
>>>> > Takes into account
>>>> >   always_patch_for_instrumentation or never_patch_for_instrumentation 
>>>> > attribute when
>>>> >   deciding that a function should be patched.
>>>> > * 
>>>> > gcc/testsuite/gcc.target/i386/patch-functions-force-no-patching.c: Test 
>>>> > case
>>>> >   to test for never_patch_for_instrumentation attribute.
>>>> > * 
>>>> > gcc/testsuite/gcc.target/i386/patch-functions-force-patching.c: Test 
>>>> > case to
>>>> >   test for always_patch_for_instrumentation attribute.
>>>> > * gcc/tree.h (struct GTY): Add fields for the two attributes and 
>>>> > macros to access
>>>> >   the fields.
>>>> > diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
>>>> > index ab416ff..998645d 100644
>>>> > --- a/gcc/c-family/c-common.c
>>>> > +++ b/gcc/c-family/c-common.c
>>>> > @@ -396,6 +396,13 @@ static tree ignore_attribute (tree *, tree, tree, 
>>>> > int, bool *);
>>>> >  static tree handle_no_split_stack_attribute (tree *, tree, tree, int, 
>>>> > bool *);
>>>> >  static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
>>>> >
>>>> > +static tree handle_always_patch_for_instrumentation_attribute (tree *, 
>>>> > tree,
>>>> > +   tree, 
>>>> > int,
>>>> > +   bool *);
>>>>
>>>> Move bool * to the previous line.
>>>
>>>
>>> If I do that, it goes beyond the 80 char boundary.
>>>


>>>> > +static tree handle_never_patch_for_instrumentation_attribute (tree *, 
>>>> > tree,
>>>> > +  tree, int,
>>>> > +  bool *);
>>>> > +
>>>>
>>>> Same here.
>>>
>>>
>>> As above.
>>>


>>>> >  static void check_function_nonnull (tree, int, tree *);
>>>> >  static void check_nonnull_arg (void *, tree, unsigned HOST_WIDE_INT);
>>>> >  static bool nonnull_check_p (tree, unsigned HOST_WIDE_INT);
>>>> > @@ -804,6 +811,13 @@ const struct attribute_spec 
>>>> > c_common_attribute_table[] =
>>>> >   The name contains space to prevent its usage in source code.  */
>>>> >{ "fn spec",   1, 1, false, true, true,
>>>> >   handle_fnspec_attribute, false },
>>>> > +  { "always_patch_for_instrumentation", 0, 0, true,  false, false,
>>>> > +  
>>>> > handle_always_patch_for_instrumentation_attribute,
>>>> > +  false },
>>>> > +  { "never_patch_for_instrumentation", 0, 0,

Re: [C++11] PR54413 Option for turning off compiler extensions for numeric literals.

2012-11-07 Thread Jakub Jelinek
On Wed, Nov 07, 2012 at 10:22:57AM -0500, Jason Merrill wrote:
> >I thought about that.  We'd need some machinery that would allow cpp to 
> >query what has been declared already.
> 
> Or alternately, always treat them as user-defined in C++ mode and
> have the front end decide to use the built-in interpretation if no
> literal operator is declared.

Yeah, I think that would be best.

Jakub


Re: [C++11] PR54413 Option for turning off compiler extensions for numeric literals.

2012-11-07 Thread Jason Merrill

On 11/06/2012 05:20 PM, 3dw...@verizon.net wrote:
> So how about
>    -f[no-]ext-numeric-literals

Sure.

>> I think the ideal behavior for these suffixes would be to treat them as
>> user-defined literals if a corresponding literal operator is available,
>> or use the built-in extension if not. But that doesn't need to happen now.
>
> I thought about that.  We'd need some machinery that would allow cpp to query 
> what has been declared already.

Or alternately, always treat them as user-defined in C++ mode and have 
the front end decide to use the built-in interpretation if no literal 
operator is declared.


Jason



Re: [PATCH] Vzeroupper placement/47440

2012-11-07 Thread Vladimir Yakovlev
Hello,

Thank you for investigating and fixing the problem.  I'll respond to the remarks later.

Regards,
Vladimir

2012/11/7 Jakub Jelinek :
> On Tue, Nov 06, 2012 at 02:11:50PM -0800, H.J. Lu wrote:
>> On Tue, Nov 6, 2012 at 2:30 AM, Kirill Yukhin  
>> wrote:
>> > Hello,
>> >> OK for mainline SVN, please commit.
>> > Checked into GCC trunk: http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00176.html
>> >
>> > Thanks, K
>>
>> This caused:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55224
>
> Not only that, it also broke --enable-checking=yes,rtl bootstrap.
> SET_DEST isn't valid on CALL, but XEXP (call, 0) is a MEM anyway and
> the code looks for reg, so I think looking for CALL was just a mistake.
>
> This fixes the bootstrap, ok for trunk?
>
> 2012-11-06  Jakub Jelinek  
>
> * config/i386/i386.c (ix86_avx_u128_mode_after): Don't
> look for reg in CALL operand.
>
> --- gcc/config/i386/i386.c.jj   2012-11-06 18:10:22.0 +0100
> +++ gcc/config/i386/i386.c  2012-11-06 20:15:09.068912242 +0100
> @@ -15084,9 +15084,9 @@ ix86_avx_u128_mode_after (int mode, rtx
>/* Check for CALL instruction.  */
>if (CALL_P (insn))
>  {
> -  if (GET_CODE (pat) == SET || GET_CODE (pat) == CALL)
> +  if (GET_CODE (pat) == SET)
> reg = SET_DEST (pat);
> -  else if (GET_CODE (pat) ==  PARALLEL)
> +  else if (GET_CODE (pat) == PARALLEL)
> for (i = XVECLEN (pat, 0) - 1; i >= 0; i--)
>   {
> rtx x = XVECEXP (pat, 0, i);
>
>
> Jakub


Re: Asan/Tsan Unit/Regression testing (was [asan] Emit GIMPLE direclty, small cleanups)

2012-11-07 Thread Kostya Serebryany
On Tue, Nov 6, 2012 at 4:26 PM, Xinliang David Li  wrote:
>
> As asan/tsan functionality is getting into trunk, we need to set up
> testing as soon as possible to avoid bitrot.
>
> Kostya can probably shed some light on the test case requirements,
> and we can continue discussions on how to extend dejagnu to import
> those tests.

asan has 3 kinds of tests.

1. LLVM unittests (Text file with LLVM IR and the expected result of
the transformations).
Example:  
http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Instrumentation/AddressSanitizer/do-not-touch-threadlocal.ll?revision=145092&view=markup
I am not sure if anything similar is possible with GCC.

2. Large Gtest-based unittest. This is a set of C++ files that should
be built with the asan switch and that depends on gtest
(http://code.google.com/p/googletest/).
http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/asan/tests/asan_test.cc?revision=166104&view=markup
This should be easy to port to GCC, but it requires gtest.

3. Full output tests (a .cc file should be built with the asan switch, the
executable should be run, and its stderr is compared with the expected
output).
Example: 
http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/asan/lit_tests/stack-overflow.cc?revision=165391&view=markup
These can be ported to GCC, but the uses of FileCheck
(http://llvm.org/docs/CommandGuide/FileCheck.html) will need to be
replaced with GCC's analog.
We should probably start with these tests.
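
At its core, the third kind of test checks that a few expected strings show
up, in order, in the captured stderr.  A minimal C sketch of that check
follows.  toy_output_matches is a hypothetical helper, it does plain
substring matching only, and it is neither FileCheck nor a DejaGnu
directive.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if every string in EXPECTED occurs in OUTPUT, in order
   and without overlapping.  */
static bool
toy_output_matches (const char *output, const char *const *expected,
                    size_t n)
{
  const char *pos = output;

  for (size_t i = 0; i < n; i++)
    {
      const char *hit = strstr (pos, expected[i]);
      if (hit == NULL)
        return false;
      /* Continue searching after the text just matched.  */
      pos = hit + strlen (expected[i]);
    }
  return true;
}
```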

tsan tests have similar structure.

--kcc


Re: [PATCH] Make IPA-CP work on aggregates

2012-11-07 Thread Jan Hubicka
> On Wed, Nov 07, 2012 at 03:39:15PM +0100, Martin Jambor wrote:
> > another bootstrap finishes.  I'm not sure if it would be OK to commit
> > it now, given it is stage3, though.  OTOH, I think it would be worth
> > it.
> 
> I'm ok with getting that in now from RM POV, but not familiar with the
> code enough to review it.  So, if somebody acks it (Honza?), it can be
> added.
> 
> > 2012-11-07  Martin Jambor  
> > 
> > * ipa-prop.c (determine_known_aggregate_parts): Skip writes to
> > different declarations when tracking writes to a declaration.
> > 
> > Index: src/gcc/ipa-prop.c
> > ===
> > --- src.orig/gcc/ipa-prop.c
> > +++ src/gcc/ipa-prop.c
> > @@ -1318,7 +1318,12 @@ determine_known_aggregate_parts (gimple
> > break;
> > }
> >else if (lhs_base != arg_base)
> > -   break;
> > +   {
> > + if (DECL_P (lhs_base))
> > +   continue;
> > + else
> > +   break;
> > +   }

OK, so the point of the patch is to not stop on writes to decls while looking
for the value the field is initialized to?

It looks OK.
Thanks,
Honza
> >  
> >if (lhs_offset + lhs_size < arg_offset
> >   || lhs_offset >= (arg_offset + arg_size))
> 
>   Jakub


Re: [PATCH] Make IPA-CP work on aggregates

2012-11-07 Thread Jakub Jelinek
On Wed, Nov 07, 2012 at 03:39:15PM +0100, Martin Jambor wrote:
> another bootstrap finishes.  I'm not sure if it would be OK to commit
> it now, given it is stage3, though.  OTOH, I think it would be worth
> it.

I'm ok with getting that in now from RM POV, but not familiar with the
code enough to review it.  So, if somebody acks it (Honza?), it can be
added.

> 2012-11-07  Martin Jambor  
> 
>   * ipa-prop.c (determine_known_aggregate_parts): Skip writes to
>   different declarations when tracking writes to a declaration.
> 
> Index: src/gcc/ipa-prop.c
> ===
> --- src.orig/gcc/ipa-prop.c
> +++ src/gcc/ipa-prop.c
> @@ -1318,7 +1318,12 @@ determine_known_aggregate_parts (gimple
>   break;
>   }
>else if (lhs_base != arg_base)
> - break;
> + {
> +   if (DECL_P (lhs_base))
> + continue;
> +   else
> + break;
> + }
>  
>if (lhs_offset + lhs_size < arg_offset
> || lhs_offset >= (arg_offset + arg_size))

Jakub


Re: [PATCH] Make IPA-CP work on aggregates

2012-11-07 Thread Martin Jambor
On Tue, Nov 06, 2012 at 02:35:30PM +0100, Jakub Jelinek wrote:
> On Tue, Nov 06, 2012 at 12:58:07AM +0100, Martin Jambor wrote:
> > 2012-11-05  Martin Jambor  
> > 
> > PR tree-optimization/53787
> > * ipa-cp.c (ipcp_value_source): New field offset.
> ...
> 
> Is this supposed to do something about Fortran array descriptors
> where some fields in the descriptors have known constant values in the
> caller?
> 
> Say
> subroutine bar (a, b, n)
>   integer :: a(n), b(n)
>   call foo (a, b)
> contains
> subroutine foo (a, b)
>   integer :: a(:), b(:)
>   a = b
> end subroutine
> end
> -O2 -fno-inline (there could be thousands of better testcases though, this
> one doesn't look at too many fields).
> With your patch
> foo.1899.constprop.0 is created, but I don't see any immediate other
> effects.  Certainly e.g.
>   _2 = a_1(D)->dim[0].stride;
>   if (_2 != 0)
> remains till *.optimized dump, even when in the caller it is set to 1.
> I guess for Fortran being able to optimize on constant (or even better
> constant one) stride would be very worthwhile.
> 

Oh... but that is not due to a bug in this patch but to an
unnecessarily strict bail-out condition when building the jump
functions (code that went in in July), a thinko really.  The following
patch fixes it.  So far it is untested but I'll give it a go when
another bootstrap finishes.  I'm not sure if it would be OK to commit
it now, given it is stage3, though.  OTOH, I think it would be worth
it.

Thanks,

Martin
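
For readers who do not have ipa-prop.c at hand, the control flow of the
change can be sketched roughly as below.  This is only an illustration
under stated assumptions: struct store and count_tracked_stores are
made-up names, not GCC code, and the sketch assumes the tracked base is
itself a declaration, so a write to a different declaration cannot
touch it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one store statement preceding a call.  */
struct store
{
  int base;           /* identifies the object written to */
  bool base_is_decl;  /* true if the base is a named declaration */
};

/* Walk the stores backwards from the call and count those that write
   to ARG_BASE.  A store to a different declaration is skipped; a store
   through anything else (for example a pointer) ends the walk, since
   it might modify the tracked object.  */
size_t
count_tracked_stores (const struct store *stores, size_t n, int arg_base)
{
  size_t found = 0;

  assert (stores != NULL || n == 0);
  for (size_t i = n; i-- > 0;)
    {
      if (stores[i].base != arg_base)
        {
          if (stores[i].base_is_decl)
            continue;   /* unrelated declaration: keep looking */
          else
            break;      /* possibly aliasing write: give up */
        }
      found++;
    }
  return found;
}
```

With the old behaviour (an unconditional break), the first store to any
other object would have ended the walk, which matches what the Fortran
descriptor testcase runs into.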


2012-11-07  Martin Jambor  

* ipa-prop.c (determine_known_aggregate_parts): Skip writes to
different declarations when tracking writes to a declaration.

Index: src/gcc/ipa-prop.c
===
--- src.orig/gcc/ipa-prop.c
+++ src/gcc/ipa-prop.c
@@ -1318,7 +1318,12 @@ determine_known_aggregate_parts (gimple
break;
}
   else if (lhs_base != arg_base)
-   break;
+   {
+ if (DECL_P (lhs_base))
+   continue;
+ else
+   break;
+   }
 
   if (lhs_offset + lhs_size < arg_offset
  || lhs_offset >= (arg_offset + arg_size))


Re: [Patch]: Update bb->count to avoid erroneous partitioning decisions

2012-11-07 Thread Jan Hubicka
> 
> > OK,
> > is bb1 going to die?  If not, probably bb1->count = 0 should be there, if so,
> > then the bb1->frequency = 0 is redundant.
> 
> Agree, we do 'delete_basic_block (bb1)' and the frequency is not used in
> between, so the setting to 0 seems unnecessary.
> 
> testing it:
> 
> Index: tree-ssa-tail-merge.c
> ===
> --- tree-ssa-tail-merge.c   (revision 193283)
> +++ tree-ssa-tail-merge.c   (working copy)
> @@ -1488,8 +1488,9 @@ replace_block_by (basic_block bb1, basic_block bb2
>bb2->frequency += bb1->frequency;
>if (bb2->frequency > BB_FREQ_MAX)
>  bb2->frequency = BB_FREQ_MAX;
> -  bb1->frequency = 0;
> 
> +  bb2->count += bb1->count;
> +
>/* Do updates that use bb1, before deleting bb1.  */
>release_last_vdef (bb1);
>same_succ_flush_bb (bb1);
> 
> OK when validation completes ?

OK,
thanks.
Honza
> 
> thanks
> 
> Christian


Re: [PATCH] fix libgomp.c++/pr24455.C failures on darwin

2012-11-07 Thread David Edelsohn
AIX has the exact same problem.  Thanks for tracking down the solution
on Darwin.  I applied the equivalent testsuite option for AIX.

Thanks, David

* testsuite/libgomp.c++/pr24455.C: Use -Wl,-G on AIX.

--- a/libgomp/testsuite/libgomp.c++/pr24455.C 2012-06-18 17:57:13.0 -0400
+++ b/libgomp/testsuite/libgomp.c++/pr24455.C   2012-11-06 11:43:55.0 -0500
@@ -1,6 +1,7 @@
 // { dg-do run }
 // { dg-additional-sources pr24455-1.C }
 // { dg-require-effective-target tls_runtime }
+// { dg-options "-Wl,-G" { target powerpc-ibm-aix* } }

 extern "C" void abort (void);


Re: [libbacktrace] Use getexecname() on Solaris

2012-11-07 Thread Rainer Orth
Gerald Pfeifer  writes:

> Just a small note, in the following
>
>   +#ifdef __FreeBSD__
>   +# define DEFAULT_PROCESS_FILENAME "/proc/curproc/file"
>   +#elif defined(HAVE_GETEXECNAME)
>   +# define DEFAULT_PROCESS_FILENAME getexecname ()
>   +#else
>   +# define DEFAULT_PROCESS_FILENAME "/proc/self/exe"
>   +#endif
>
> would it make sense to have the feature test (HAVE_GETEXECNAME) before
> the OS test (__FreeBSD__), so that when/if the OS implements the feature
> in newer versions that takes precedence?

Good point.  I've incorporated this into my patch and regularly include
it in my *-*-solaris2.{9, 10, 11} and x86_64-unknown-linux-gnu
bootstraps.

Ok for mainline?

Rainer
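
To make the ordering point concrete, here is a small self-contained
sketch of the selection with the feature test first.  The macro name
and the three alternatives are taken from the patch; process_filename
is a hypothetical helper added only for illustration, and the
assumption that <stdlib.h> declares getexecname () applies to Solaris.

```c
#include <assert.h>
#include <stddef.h>
#ifdef HAVE_GETEXECNAME
# include <stdlib.h>   /* assumed to declare getexecname () on Solaris */
#endif

/* Feature test first, OS test second, generic fallback last, so that a
   system gaining getexecname () later picks it up automatically.  */
#ifdef HAVE_GETEXECNAME
# define DEFAULT_PROCESS_FILENAME getexecname ()
#elif defined (__FreeBSD__)
# define DEFAULT_PROCESS_FILENAME "/proc/curproc/file"
#else
# define DEFAULT_PROCESS_FILENAME "/proc/self/exe"
#endif

/* Hypothetical helper: prefer an explicitly configured name and fall
   back to the platform default otherwise.  */
const char *
process_filename (const char *configured)
{
  assert (configured == NULL || configured[0] != '\0');
  return configured != NULL ? configured : DEFAULT_PROCESS_FILENAME;
}
```

Built without HAVE_GETEXECNAME on a non-FreeBSD host, the fallback
branch is the one that gets used.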


2012-10-05  Rainer Orth  
Gerald Pfeifer  

libbacktrace:
* configure.ac: Check for getexecname.
* configure: Regenerate.
* config.h.in: Regenerate.
* internal.h (DEFAULT_PROCESS_FILENAME): Define.
* fileline.c (fileline_initialize): Use it.
* print.c (error_callback): Likewise.
Include .

# HG changeset patch
# Parent a6a174227cae12381edf325b21adc905e8fa50e6
Use getexecname() on Solaris

diff --git a/libbacktrace/configure.ac b/libbacktrace/configure.ac
--- a/libbacktrace/configure.ac
+++ b/libbacktrace/configure.ac
@@ -289,6 +289,19 @@ fi
 
 AC_CHECK_DECLS(strnlen)
 
+# Check for getexecname function.
+if test -n "${with_target_subdir}"; then
+   case "${host}" in
+   *-*-solaris2*) have_getexecname=yes ;;
+   *) have_getexecname=no ;;
+   esac
+else
+  AC_CHECK_FUNC(getexecname, [have_getexecname=yes], [have_getexecname=no])
+fi
+if test "$have_getexecname" = "yes"; then
+  AC_DEFINE(HAVE_GETEXECNAME, 1, [Define if getexecname is available.])
+fi
+
 AC_CACHE_CHECK([whether tests can run],
   [libbacktrace_cv_sys_native],
   [AC_RUN_IFELSE([AC_LANG_PROGRAM([], [return 0;])],
diff --git a/libbacktrace/fileline.c b/libbacktrace/fileline.c
--- a/libbacktrace/fileline.c
+++ b/libbacktrace/fileline.c
@@ -82,7 +82,8 @@ fileline_initialize (struct backtrace_st
   if (state->filename != NULL)
 descriptor = backtrace_open (state->filename, error_callback, data, NULL);
   else
-descriptor = backtrace_open ("/proc/self/exe", error_callback, data, NULL);
+descriptor = backtrace_open (DEFAULT_PROCESS_FILENAME, error_callback,
+ data, NULL);
   if (descriptor < 0)
 failed = 1;
 
diff --git a/libbacktrace/internal.h b/libbacktrace/internal.h
--- a/libbacktrace/internal.h
+++ b/libbacktrace/internal.h
@@ -56,6 +56,14 @@ POSSIBILITY OF SUCH DAMAGE.  */
 # endif
 #endif
 
+#ifdef HAVE_GETEXECNAME
+# define DEFAULT_PROCESS_FILENAME getexecname ()
+#elif defined(__FreeBSD__)
+# define DEFAULT_PROCESS_FILENAME "/proc/curproc/file"
+#else
+# define DEFAULT_PROCESS_FILENAME "/proc/self/exe"
+#endif
+
 #ifndef HAVE_SYNC_FUNCTIONS
 
 /* Define out the sync functions.  These should never be called if
diff --git a/libbacktrace/print.c b/libbacktrace/print.c
--- a/libbacktrace/print.c
+++ b/libbacktrace/print.c
@@ -35,6 +35,7 @@ POSSIBILITY OF SUCH DAMAGE.  */
 #include 
 #include 
 #include 
+#include 
 
 #include "backtrace.h"
 #include "internal.h"
@@ -73,7 +74,7 @@ error_callback (void *data, const char *
 
   name = pdata->state->filename;
   if (name == NULL)
-name = "/proc/self/exe";
+name = DEFAULT_PROCESS_FILENAME;
   fprintf (stderr, "%s: libbacktrace: %s", name, msg);
   if (errnum > 0)
 fprintf (stderr, ": %s", strerror (errnum));

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] Fix fold reassociation (PR c++/55137)

2012-11-07 Thread Jakub Jelinek
Hi!

The first (C++) testcase is rejected since my SIZEOF_EXPR folding deferral
changes, the problem is that
-1 + (int) (sizeof (int) - 1)
is first changed into
-1 + (int) ((unsigned) sizeof (int) + UINT_MAX)
and then fold_binary_loc reassociates it in int type into
(int) sizeof (int) + [-2(overflow)], thus introducing overflow where
there hasn't been originally.  maybe_const_value then refuses to fold it
into constant due to that.  I've fixed that by moving the TYPE_OVERFLOW
check from before the operation to a check whether the whole reassociation
doesn't introduce overflows (while previously it was testing solely
for overflow introduced on fold_converted lit0/lit1 being combined
together, not e.g. when overflow was introduced by fold_converting lit0
resp. lit1, or for minus_lit{0,1} constants).
Unfortunately that patch led to a regression on loop-15.c:
+FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "+" 0
+FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "n_. * n_." 1
as we were no longer reassociating an expression used in number of
iterations calculation.

Looking at that led me to the second testcase below, which I hope is
valid C: the overflow happens there in an unsigned type, thus with defined
wrapping, then there is an (implementation-defined?) conversion to a signed
type (but all our targets are two's complement), and associate_trees
was reassociating it in the signed type, thus introducing signed overflow
where there wasn't any before.  Fixed by doing the reassociation in the unsigned
type if (at least) one of the operands is of unsigned type.  This fixes
the loop-15.c testcase as well as the new testcase.
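
As a rough illustration of the idea (not the fold-const.c code), the
sketch below evaluates the expression from the first testcase once as
written and once with the constants combined in unsigned arithmetic and
a single conversion at the end.  The function names are made up for
this example, and the range asserts are assumptions chosen so that the
final conversion to int stays in range.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* The expression as written: the subtraction happens in an unsigned
   type, the result is converted to int, then -1 is added.  */
int
as_written (size_t size)
{
  assert (size >= 1 && size <= INT_MAX);
  return -1 + (int) (size - 1);
}

/* The same value with both constants combined in unsigned arithmetic,
   where wrap-around is defined.  Adding UINT_MAX is the same as
   subtracting 1 modulo UINT_MAX + 1.  */
int
reassociated_unsigned (size_t size)
{
  unsigned int u;

  assert (size >= 2 && size <= INT_MAX);
  u = (unsigned int) size;
  u += UINT_MAX;   /* u - 1, wrapping */
  u += UINT_MAX;   /* u - 2, wrapping */
  return (int) u;  /* in range for size >= 2 */
}
```

Doing the two additions of -1 in int instead would mean adding
constants that do not fit the signed type, which is where the spurious
overflow flag comes from.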

CCing Eric as the author of the last changes in that area.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2012-11-07  Jakub Jelinek  

PR c++/55137
* fold-const.c (fold_binary_loc) : Don't introduce
TREE_OVERFLOW through reassociation.  If type doesn't have defined
overflow, but one or both of the operands do, use the wrapping type
for reassociation and only convert to type at the end.

* g++.dg/opt/pr55137.C: New test.
* gcc.c-torture/execute/pr55137.c: New test.

--- gcc/fold-const.c.jj 2012-11-07 09:16:41.929494183 +0100
+++ gcc/fold-const.c2012-11-07 09:47:12.227710542 +0100
@@ -10337,6 +10337,7 @@ fold_binary_loc (location_t loc,
{
  tree var0, con0, lit0, minus_lit0;
  tree var1, con1, lit1, minus_lit1;
+ tree atype = type;
  bool ok = true;
 
  /* Split both trees into variables, constants, and literals.  Then
@@ -10352,11 +10353,25 @@ fold_binary_loc (location_t loc,
  if (code == MINUS_EXPR)
code = PLUS_EXPR;
 
- /* With undefined overflow we can only associate constants with one
-variable, and constants whose association doesn't overflow.  */
+ /* With undefined overflow prefer doing association in a type
+which wraps on overflow, if that is one of the operand types.  */
  if ((POINTER_TYPE_P (type) && POINTER_TYPE_OVERFLOW_UNDEFINED)
  || (INTEGRAL_TYPE_P (type) && !TYPE_OVERFLOW_WRAPS (type)))
{
+ if (INTEGRAL_TYPE_P (TREE_TYPE (arg0))
+ && TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg0)))
+   atype = TREE_TYPE (arg0);
+ else if (INTEGRAL_TYPE_P (TREE_TYPE (arg1))
+  && TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg1)))
+   atype = TREE_TYPE (arg1);
+ gcc_assert (TYPE_PRECISION (atype) == TYPE_PRECISION (type));
+   }
+
+ /* With undefined overflow we can only associate constants with one
+variable, and constants whose association doesn't overflow.  */
+ if ((POINTER_TYPE_P (atype) && POINTER_TYPE_OVERFLOW_UNDEFINED)
+ || (INTEGRAL_TYPE_P (atype) && !TYPE_OVERFLOW_WRAPS (atype)))
+   {
  if (var0 && var1)
{
  tree tmp0 = var0;
@@ -10367,14 +10382,14 @@ fold_binary_loc (location_t loc,
  if (CONVERT_EXPR_P (tmp0)
  && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (tmp0, 0)))
  && (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (tmp0, 0)))
- <= TYPE_PRECISION (type)))
+ <= TYPE_PRECISION (atype)))
tmp0 = TREE_OPERAND (tmp0, 0);
  if (TREE_CODE (tmp1) == NEGATE_EXPR)
tmp1 = TREE_OPERAND (tmp1, 0);
  if (CONVERT_EXPR_P (tmp1)
  && INTEGRAL_TYPE_P (TREE_TYPE (TREE_OPERAND (tmp1, 0)))
  && (TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (tmp1, 0)))
- <= TYPE_PRECISION (type)))
+ <= TYPE_PRECISION (atype)))
tmp1 = TREE_OPERAND (tmp1, 0);
  /* The only case we can still associate with two variab

Re: [Patch]: Update bb->count to avoid erroneous partitioning decisions

2012-11-07 Thread Christian Bruel

> OK,
> is bb1 going to die?  If not, probably bb1->count = 0 should be there, if so,
> then the bb1->frequency = 0 is redundant.

Agree, we do 'delete_basic_block (bb1)' and the frequency is not used in
between, so the setting to 0 seems unnecessary.

testing it:

Index: tree-ssa-tail-merge.c
===
--- tree-ssa-tail-merge.c   (revision 193283)
+++ tree-ssa-tail-merge.c   (working copy)
@@ -1488,8 +1488,9 @@ replace_block_by (basic_block bb1, basic_block bb2
   bb2->frequency += bb1->frequency;
   if (bb2->frequency > BB_FREQ_MAX)
 bb2->frequency = BB_FREQ_MAX;
-  bb1->frequency = 0;

+  bb2->count += bb1->count;
+
   /* Do updates that use bb1, before deleting bb1.  */
   release_last_vdef (bb1);
   same_succ_flush_bb (bb1);

OK when validation completes ?

thanks

Christian
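
A reduced sketch of the bookkeeping in this version of the patch, for
anyone following along without the tree.  struct block_profile and
merge_profile are hypothetical names, and the BB_FREQ_MAX value here is
only assumed for illustration; the real fields live in basic_block.

```c
#include <assert.h>

#define BB_FREQ_MAX 10000   /* assumed cap, for illustration only */

/* Hypothetical, reduced stand-in for a basic block's profile data.  */
struct block_profile
{
  int frequency;    /* estimated frequency, capped at BB_FREQ_MAX */
  long long count;  /* execution count from the profile */
};

/* Fold FROM's profile into INTO before FROM is deleted.  FROM itself
   is left alone, since it is about to go away.  */
void
merge_profile (struct block_profile *into, const struct block_profile *from)
{
  assert (into != from);
  into->frequency += from->frequency;
  if (into->frequency > BB_FREQ_MAX)
    into->frequency = BB_FREQ_MAX;
  into->count += from->count;
}
```

The frequency saturates while the count simply accumulates, which is
what keeps the incoming sums consistent after the merge.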


[PATCH, testsuite]: UNRESOLVED: gcc.dg/tree-ssa/cunroll-1.c

2012-11-07 Thread Uros Bizjak
Hello!

The attached patch addresses the UNRESOLVED part of the cunroll-1.c test
failure, but with the dump filename fixed, I get:

FAIL: gcc.dg/tree-ssa/cunroll-1.c scan-tree-dump cunrolli "Unrolled loop 1 completely .duplicated 1 times.."

I'll leave this to Honza to decide.

Uros.

Index: gcc.dg/tree-ssa/cunroll-1.c
===
--- gcc.dg/tree-ssa/cunroll-1.c (revision 193292)
+++ gcc.dg/tree-ssa/cunroll-1.c (working copy)
@@ -8,6 +8,6 @@
 a[i]=5;
 }
 /* Array bounds says the loop will not roll much.  */
-/* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 1 times.." "cunroll"} } */
-/* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." "cunroll"} } */
+/* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 1 times.." "cunrolli"} } */
+/* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." "cunrolli"} } */
 /* { dg-final { cleanup-tree-dump "cunrolli" } } */


Re: [PATCH] Vzeroupper placement/47440

2012-11-07 Thread Uros Bizjak
On Wed, Nov 7, 2012 at 9:04 AM, Jakub Jelinek  wrote:

> Or I wonder why is call handled specially at all, doesn't
>   /* Check if a 256bit AVX register is referenced in stores.  */
>   state = unused;
>   note_stores (pat, check_avx256_stores, &state);
>   if (state == used)
> return AVX_U128_DIRTY;
> handle it?  Then it would just need to be if (CALL_P (insn)) return AVX_U128_CLEAN.
> BTW, the formatting is wrong in some spots, e.g.
> check_avx256_stores (rtx dest, const_rtx set, void *data)
> {
>   if (((REG_P (dest) || MEM_P(dest))
>
> I'd prefer to leave this to the original submitter.

I have committed the following patch, which addresses all the above issues.

2012-11-07  Uros Bizjak  

* config/i386/i386.c (enum upper_128bits_state): Remove.
(check_avx256_store): Use bool pointer argument.
(ix86_avx_u128_mode_needed): Use note_stores also for CALL insns.
* config/i386/predicates.md (vzeroupper_operation): Use match_test.

Bootstrapped and regression tested on x86_64-pc-linux-gnu, committed.

Uros.
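
For reference, the "bool pointer argument" pattern from the ChangeLog
looks roughly like this in isolation.  This is only a sketch with
hypothetical names (walk_stores, check_wide_store, stores_use_256bit)
and plain ints standing in for machine modes; it does not use the real
note_stores interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*store_fn) (int mode_bits, void *data);

/* Hypothetical walker: calls FN once per stored-to operand.  */
static void
walk_stores (const int *mode_bits, size_t n, store_fn fn, void *data)
{
  for (size_t i = 0; i < n; i++)
    fn (mode_bits[i], data);
}

/* Callback: DATA points to a bool that is set once a 256-bit operand
   has been seen.  It is never reset by the callback.  */
static void
check_wide_store (int mode_bits, void *data)
{
  if (mode_bits == 256)
    {
      bool *used = (bool *) data;
      *used = true;
    }
}

/* Return true if any of the N operands is 256 bits wide.  */
bool
stores_use_256bit (const int *mode_bits, size_t n)
{
  bool used = false;

  assert (mode_bits != NULL || n == 0);
  walk_stores (mode_bits, n, check_wide_store, &used);
  return used;
}
```

A plain bool is enough here because the caller only needs a yes/no
answer, which is why the three-state enum could go.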
Index: predicates.md
===
--- predicates.md   (revision 193292)
+++ predicates.md   (working copy)
@@ -1231,10 +1231,8 @@
 
 ;; return true if OP is a vzeroupper operation.
 (define_predicate "vzeroupper_operation"
-  (match_code "unspec_volatile")
-{
-  return XINT (op, 1) == UNSPECV_VZEROUPPER;
-})
+  (and (match_code "unspec_volatile")
+   (match_test "XINT (op, 1) == UNSPECV_VZEROUPPER")))
 
 ;; Return true if OP is a parallel for a vbroadcast permute.
 
Index: i386.c
===
--- i386.c  (revision 193292)
+++ i386.c  (working copy)
@@ -65,27 +65,19 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "tree-flow.h"
 
-enum upper_128bits_state
-{
-  unknown = 0,
-  unused,
-  used
-};
-
 /* Check if a 256bit AVX register is referenced in stores.   */
 
 static void
 check_avx256_stores (rtx dest, const_rtx set, void *data)
 {
-  if (((REG_P (dest) || MEM_P(dest))
+  if (((REG_P (dest) || MEM_P (dest))
&& VALID_AVX256_REG_OR_OI_MODE (GET_MODE (dest)))
   || (GET_CODE (set) == SET
  && (REG_P (SET_SRC (set)) || MEM_P (SET_SRC (set)))
  && VALID_AVX256_REG_OR_OI_MODE (GET_MODE (SET_SRC (set)
 {
-  enum upper_128bits_state *state
-   = (enum upper_128bits_state *) data;
-  *state = used;
+  bool *used = (bool *) data;
+  *used = true;
 }
 }
 
@@ -14967,23 +14959,24 @@ output_387_binary_op (rtx insn, rtx *operands)
 static int
 ix86_avx_u128_mode_needed (rtx insn)
 {
-  rtx pat = PATTERN (insn);
-  rtx arg;
-  enum upper_128bits_state state;
+  bool avx_u128_used;
 
   if (CALL_P (insn))
 {
+  rtx link;
+
   /* Needed mode is set to AVX_U128_CLEAN if there are
 no 256bit modes used in function arguments.  */
-  for (arg = CALL_INSN_FUNCTION_USAGE (insn); arg;
-  arg = XEXP (arg, 1))
+  for (link = CALL_INSN_FUNCTION_USAGE (insn);
+  link;
+  link = XEXP (link, 1))
{
- if (GET_CODE (XEXP (arg, 0)) == USE)
+ if (GET_CODE (XEXP (link, 0)) == USE)
{
- rtx reg = XEXP (XEXP (arg, 0), 0);
+ rtx arg = XEXP (XEXP (link, 0), 0);
 
- if (reg && REG_P (reg)
- && VALID_AVX256_REG_OR_OI_MODE (GET_MODE (reg)))
+ if (REG_P (arg)
+ && VALID_AVX256_REG_OR_OI_MODE (GET_MODE (arg)))
return AVX_U128_ANY;
}
}
@@ -14992,10 +14985,11 @@ ix86_avx_u128_mode_needed (rtx insn)
 }
 
   /* Check if a 256bit AVX register is referenced in stores.  */
-  state = unused;
-  note_stores (pat, check_avx256_stores, &state);
-  if (state == used)
+  avx_u128_used = false;
+  note_stores (PATTERN (insn), check_avx256_stores, &avx_u128_used);
+  if (avx_u128_used)
 return AVX_U128_DIRTY;
+
   return AVX_U128_ANY;
 }
 
@@ -15079,39 +15073,21 @@ static int
 ix86_avx_u128_mode_after (int mode, rtx insn)
 {
   rtx pat = PATTERN (insn);
-  rtx reg = NULL;
-  int i;
-  enum upper_128bits_state state;
+  bool avx_u128_used;
 
-  /* Check for CALL instruction.  */
-  if (CALL_P (insn))
-{
-  if (GET_CODE (pat) == SET)
-   reg = SET_DEST (pat);
-  else if (GET_CODE (pat) == PARALLEL)
-   for (i = XVECLEN (pat, 0) - 1; i >= 0; i--)
- {
-   rtx x = XVECEXP (pat, 0, i);
-   if (GET_CODE(x) == SET)
- reg = SET_DEST (x);
- }
-  /* Mode after call is set to AVX_U128_DIRTY if there are
-256bit modes used in the function return register.  */
-  if (reg && REG_P (reg) && VALID_AVX256_REG_OR_OI_MODE (GET_MODE (reg)))
-   return AVX_U128_DIRTY;
-  else
-   return AVX_U128_CLEAN;
-}
-
   if (vzeroupper_operation (pat, VOIDmode)
   || vzeroall_operation (pat, VOIDmode))
 return AVX_U128_CLEAN;
 

Re: [PATCH, generic] Support printing of escaped curly braces and vertical bar in assembler output

2012-11-07 Thread Maksim Kuznetsov
> There are four in-tree target architectures that already use %|.  I think
> it would be better if you made these new escapes target-specific.

Escaped curly braces cannot be target-specific since
do_assembler_dialects() in final.c ignores any % and considers '{' and
'}' to be alternative delimiters.

> For the logic to find the end of an alternative, you can simply always
> skip over the next char after any percent sign (well, check for end of
> string, of course); there is no need to count percent signs.

That would not be a general approach. %% stands for printing a percent
sign, so in the assembler string "{%%}" the closing curly brace should be
handled as the end of an alternative, even though it follows a percent
sign.

-- 
Maxim Kuznetsov


Re: [Patch]: Update bb->count to avoid erroneous partitioning decisions

2012-11-07 Thread Jan Hubicka
> Hello,
> 
> This tiny patch fixes the issue previously discussed in
> http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00794.html
> 
> Not maintaining bb->count while merging basic blocks results in wrong
> partitioning (and surely other) decisions. This is visible on the SH4
> with shrink-wrapping. I haven't noticed any difference on x86.
> 
> This also solves a few "Invalid sum of incoming frequencies" messages
> while dumping the CFG.
> 
> Reg-tested on x86 and sh-superh-elf. Is it OK for the 4.7 and 4.8 branches ?
> 
> Thanks
> 
> Christian

> 2012-11-07  Christian Bruel  
> 
>   * tree-ssa-tail-merge.c (replace_block_by): Update target bb count.
> 
> Index: tree-ssa-tail-merge.c
> ===
> --- tree-ssa-tail-merge.c (revision 193283)
> +++ tree-ssa-tail-merge.c (working copy)
> @@ -1490,6 +1490,8 @@ replace_block_by (basic_block bb1, basic_block bb2
>  bb2->frequency = BB_FREQ_MAX;
>bb1->frequency = 0;
>  
> +  bb2->count += bb1->count;
> +
OK,
is bb1 going to die?  If not, probably bb1->count = 0 should be there, if so,
then the bb1->frequency = 0 is redundant.

honza
>/* Do updates that use bb1, before deleting bb1.  */
>release_last_vdef (bb1);
>same_succ_flush_bb (bb1);



[Patch]: Update bb->count to avoid erroneous partitioning decisions

2012-11-07 Thread Christian Bruel
Hello,

This tiny patch fixes the issue previously discussed in
http://gcc.gnu.org/ml/gcc-patches/2012-09/msg00794.html

Not maintaining bb->count while merging basic blocks results in wrong
partitioning (and surely other) decisions. This is visible on the SH4
with shrink-wrapping. I haven't noticed any difference on x86.

This also solves a few "Invalid sum of incoming frequencies" messages
while dumping the CFG.

Reg-tested on x86 and sh-superh-elf. Is it OK for the 4.7 and 4.8 branches ?

Thanks

Christian
2012-11-07  Christian Bruel  

	* tree-ssa-tail-merge.c (replace_block_by): Update target bb count.

Index: tree-ssa-tail-merge.c
===
--- tree-ssa-tail-merge.c	(revision 193283)
+++ tree-ssa-tail-merge.c	(working copy)
@@ -1490,6 +1490,8 @@ replace_block_by (basic_block bb1, basic_block bb2
 bb2->frequency = BB_FREQ_MAX;
   bb1->frequency = 0;
 
+  bb2->count += bb1->count;
+
   /* Do updates that use bb1, before deleting bb1.  */
   release_last_vdef (bb1);
   same_succ_flush_bb (bb1);


Re: [C++ Patch] PR 54922

2012-11-07 Thread Paolo Carlini

On 10/23/2012 07:55 PM, Jason Merrill wrote:

> OK.

Unfortunately the patch as-is seems at least incomplete, so to be safe
I have reverted it for now and re-opened the PR: trying to actually use
the type showed issues in the gimplifier, see below. If you have hints
about that I would be glad to look further into the issue (but, honestly,
this isn't a regression, so I don't think it can be considered a
high-priority issue now).


Thanks,
Paolo.

//

54922.C: In function ‘int main()’:
54922.C:14:16: internal compiler error: in gimplify_init_ctor_eval, at gimplify.c:3787
   nullable_int n;
^
0x974d3a gimplify_init_ctor_eval
        /scratch/Gcc/svn-dirs/trunk/gcc/gimplify.c:3787
0x967c06 gimplify_init_constructor
        /scratch/Gcc/svn-dirs/trunk/gcc/gimplify.c:4145
0x9688bf gimplify_modify_expr_rhs
        /scratch/Gcc/svn-dirs/trunk/gcc/gimplify.c:4530
0x968cb1 gimplify_modify_expr
        /scratch/Gcc/svn-dirs/trunk/gcc/gimplify.c:4840
0x96a3f0 gimplify_expr(tree_node**, gimple_statement_d**, gimple_statement_d**, bool (*)(tree_node*), int)
        /scratch/Gcc/svn-dirs/trunk/gcc/gimplify.c:7167
0x972b66 gimplify_stmt(tree_node**, gimple_statement_d**)
        /scratch/Gcc/svn-dirs/trunk/gcc/gimplify.c:5700
0x975339 gimplify_and_add
        /scratch/Gcc/svn-dirs/trunk/gcc/gimplify.c:371
0x975339 gimplify_decl_expr
        /scratch/Gcc/svn-dirs/trunk/gcc/gimplify.c:1484
0x96aa5a gimplify_expr(tree_node**, gimple_statement_d**, gimple_statement_d**, bool (*)(tree_node*), int)
        /scratch/Gcc/svn-dirs/trunk/gcc/gimplify.c:7334
0x972b66 gimplify_stmt(tree_node**, gimple_statement_d**)
        /scratch/Gcc/svn-dirs/trunk/gcc/gimplify.c:5700
0x96c4dc gimplify_cleanup_point_expr
        /scratch/Gcc/svn-dirs/trunk/gcc/gimplify.c:5477
0x96c4dc gimplify_expr(tree_node**, gimple_statement_d**, gimple_statement_d**, bool (*)(tree_node*), int)
        /scratch/Gcc/svn-dirs/trunk/gcc/gimplify.c:7504
0x972b66 gimplify_stmt(tree_node**, gimple_statement_d**)
        /scratch/Gcc/svn-dirs/trunk/gcc/gimplify.c:5700
0x973ae5 gimplify_bind_expr
        /scratch/Gcc/svn-dirs/trunk/gcc/gimplify.c:1230
0x96a44a gimplify_expr(tree_node**, gimple_statement_d**, gimple_statement_d**, bool (*)(tree_node*), int)
        /scratch/Gcc/svn-dirs/trunk/gcc/gimplify.c:7338
0x972b66 gimplify_stmt(tree_node**, gimple_statement_d**)
        /scratch/Gcc/svn-dirs/trunk/gcc/gimplify.c:5700
0x96b4ab gimplify_statement_list
        /scratch/Gcc/svn-dirs/trunk/gcc/gimplify.c:1537
0x96b4ab gimplify_expr(tree_node**, gimple_statement_d**, gimple_statement_d**, bool (*)(tree_node*), int)
        /scratch/Gcc/svn-dirs/trunk/gcc/gimplify.c:7556
0x972b66 gimplify_stmt(tree_node**, gimple_statement_d**)
        /scratch/Gcc/svn-dirs/trunk/gcc/gimplify.c:5700
0x972c7e gimplify_body(tree_node*, bool)
        /scratch/Gcc/svn-dirs/trunk/gcc/gimplify.c:8200




Re: [v3] Fix profile mode failures

2012-11-07 Thread Jonathan Wakely
On 7 November 2012 10:25, Paolo Carlini wrote:
> Hi,
>
>
> On 11/07/2012 10:18 AM, Jonathan Wakely wrote:
>>
>> On 6 November 2012 19:41, Jonathan Wakely wrote:
>>>
>>> On 6 November 2012 18:21, Paolo Carlini wrote:
>>>>
>>>> testsuite/20_util/scoped_allocator/1.cc:79: void test02(): Assertion
>>>> `evv[0].get_allocator().get_personality() == 2' failed.
>>>>
>>>> I didn't really investigate it...
>>>
>>> Oops, looks like I missed something, will fix it asap ...
>>
>> Fixed with this patch, which also adds an allocator parameter to the
>> vector(size_type) constructor, which is missing from the standard but
>> I hope to get fixed via
>> http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#2210
>
> Thanks a lot!
>
>> I still see some other profile-mode failures, not sure if they're old
>> or I caused them recently, will investigate further.
>
> I'm for example seeing in the log:
>
> 23_containers/list/init-list.cc execution test
>
> pretty mysterious,

Yes, I had a quick look at it but couldn't see the problem, so wanted
to fix the trivial vector problem first.

> I think it's the first time I ever see it.

Huh, then I guess I broke that one too.  I won't rest until it's fixed ;-)


Re: [patch RFA middle-end] Fix PR middle-end/49220

2012-11-07 Thread Kaz Kojima
Uros Bizjak  wrote:
> Please also add the testcase from the PR to the testsuite.

For the record, I've committed the testcase below from the PR.

Regards,
kaz
--
2012-11-07  Kaz Kojima  

* gcc.c-torture/compile/pr49220.c: New test.

--- ORIG/trunk/gcc/testsuite/gcc.c-torture/compile/pr49220.c 1970-01-01 09:00:00.0 +0900
+++ trunk/gcc/testsuite/gcc.c-torture/compile/pr49220.c 2012-11-07 19:29:26.0 +0900
@@ -0,0 +1,25 @@
+int array[2];
+
+static int
+func1 (int b)
+{
+  return b;
+}
+
+static int
+func2 (int a, int b)
+{
+  return b == 0 ? a : b;
+}
+
+int
+func3 (int a)
+{
+}
+
+int *
+func4 (int *arg)
+{
+  *arg = func1 ((func2 (func3 (array[0]), *arg)) | array[0]);
+  return &array[1];
+}


Re: Enable inliner to bypass inline-insns-single/auto when it knows the performance will improve

2012-11-07 Thread Jan Hubicka
> On Wed, Nov 7, 2012 at 10:40 AM, Jan Hubicka wrote:
> > Hi,
> > with inliner predicates, the inliner heuristic now is able to prove that
> > some of the inlined function body will be optimized out after inlining.
> > This makes it possible to estimate the speedup that is now used to drive
> > the badness metric, but it is ignored in actual decision whether function
> > is inline candidate.
> 
> Is it really still the time for this kind of changes? Development
> stage3 means "regression fixes only" and this isn't a regression...

I discussed with Jakub/Richi that I would like to do inliner heuristic
re-tuning in early stage 3. This is part of it.  I am hoping to be done soon.
While the changes were done a while ago, I am pushing them out slowly so they
can be independently benchmarked.  I am not able to do too many SPEC2k6 runs
in a week.

I had a bit of a hard time getting the inliner to the level of 4.7 on Mozilla
LTO and tramp3d, which are both hard to analyze.  This turned out to be mostly
the addr_expr issues. We no longer forward propagate as much as we did, to
keep info for the objsize pass; this made a lot of C++ abstraction no longer
zero cost.  Also there was the stupid overflow in the time metric making some
inlining completely random.  The inliner seems to be in relatively good shape
performance-wise, getting quite consistent improvements in C++ (tramp3d is 50%
smaller and faster than before, wave and DLV also improved in both code size
and speed, Mozilla is faster & smaller and we now get smaller code from -Os
than -O2 on the C++ stuff, LTO SPEC builds got smaller with the same speed,
http://gcc.opensuse.org/c++bench-frescobaldi/).

Overall plan: I plan to add one extra inliner hint for array indexes to help
Fortran array descriptors, and to enable use of the gcov histograms that
Google apparently forgot to do (that is FDO only). So if I wait today to see
the effect of the ipa-cp change probably going in, I should be done by
Saturday (at a speed of a patch a day).

Next week I plan to run some benchmarks to see if the inlining limits can be
pushed down a bit, but it does not seem to be critical.  Pushing overall growth
to 15% or less would do wonders for Firefox with LTO (that probably won't
matter much in practice since we are impractically slow and memory hungry at
WPA); reducing inline-insns-auto/single may work given that we can now bypass
it in cases that matter. Neither one is too critical however.

I also still need to analyze the botan regression, which is the only one left
on the table (not necessarily inliner related) for x86, and see if the IA-64
regressions are inliner related or something else.  There is also an EON
regression at -O2 that seems to be related to an unrolling heuristic decision.
It seems to reproduce on AMD hardware only, so it may be a simple code layout
problem.

I also plan to look into the compile time regression with a large number of
callees in a single function (one of the old Lucier's PRs). This can be fixed
by incrementally updating the call statement costs as edges are added/removed
instead of recomputing them from scratch.

Honza
> 
> Ciao!
> Steven


Re: [v3] Fix profile mode failures

2012-11-07 Thread Paolo Carlini

Hi,

On 11/07/2012 10:18 AM, Jonathan Wakely wrote:
> On 6 November 2012 19:41, Jonathan Wakely wrote:
>> On 6 November 2012 18:21, Paolo Carlini wrote:
>>> testsuite/20_util/scoped_allocator/1.cc:79: void test02(): Assertion
>>> `evv[0].get_allocator().get_personality() == 2' failed.
>>>
>>> I didn't really investigate it...
>> Oops, looks like I missed something, will fix it asap ...
> Fixed with this patch, which also adds an allocator parameter to the
> vector(size_type) constructor, which is missing from the standard but
> I hope to get fixed via
> http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#2210

Thanks a lot!

> I still see some other profile-mode failures, not sure if they're old
> or I caused them recently, will investigate further.

I'm for example seeing in the log:

23_containers/list/init-list.cc execution test

pretty mysterious, I think it's the first time I ever see it.

Paolo.


[v3] Add _GLIBCXX_THROW_OR_ABORT

2012-11-07 Thread Paolo Carlini

Hi,

instead of writing again and again the same conditional, I'm finishing 
testing the below, will install soon if everything goes well.


Thanks,
Paolo.


2012-11-07  Paolo Carlini  

* include/debug/array (_GLIBCXX_THROW_OR_ABORT): Move...
* include/bits/c++config: ... here.
* include/bits/shared_ptr_base.h (__throw_bad_weak_ptr): Use it.
* include/ext/pb_ds/exception.hpp: Likewise.
* include/ext/throw_allocator.h (__throw_forced_error): Likewise.
* include/ext/concurrence.h (__throw_concurrence_lock_error,
__throw_concurrence_unlock_error, __throw_concurrence_broadcast_error,
__throw_concurrence_wait_error): Likewise.
* include/tr1/shared_ptr.h (__throw_bad_weak_ptr): Likewise.
* include/tr1/functional (function<_Res(_ArgTypes...)>::operator()
(_ArgTypes...)): Likewise.
* libsupc++/eh_aux_runtime.cc (__cxxabiv1::__cxa_bad_cast,
__cxxabiv1::__cxa_bad_typeid): Likewise.
* libsupc++/vec.cc (compute_size): Likewise.
* libsupc++/new_op.cc (operator new (std::size_t)): Likewise.
* src/c++11/functexcept.cc: Likewise.
* testsuite/util/io/illegal_input_error.hpp
(__throw_illegal_input_error): Likewise.
* libsupc++/eh_personality.cc: Avoid warning with -fno-exceptions.
* testsuite/ext/profile/mutex_extensions_neg.cc: Adjust dg-error line
numbers.
* testsuite/tr1/2_general_utilities/shared_ptr/cons/43820_neg.cc:
Likewise.
* testsuite/23_containers/array/tuple_interface/
tuple_element_debug_neg.cc: Likewise.
* testsuite/23_containers/array/tuple_interface/get_debug_neg.cc:
Likewise.
* testsuite/20_util/shared_ptr/cons/43820_neg.cc: Likewise.

Index: include/bits/c++config
===
--- include/bits/c++config  (revision 193278)
+++ include/bits/c++config  (working copy)
@@ -1,7 +1,6 @@
 // Predefined symbols and macros -*- C++ -*-
 
-// Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
-// 2006, 2007, 2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
+// Copyright (C) 1997-2012 Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the
@@ -115,6 +114,14 @@
 # define _GLIBCXX_NOTHROW _GLIBCXX_USE_NOEXCEPT
 #endif
 
+#ifndef _GLIBCXX_THROW_OR_ABORT
+# if __EXCEPTIONS
+#  define _GLIBCXX_THROW_OR_ABORT(_EXC) (throw (_EXC))
+# else
+#  define _GLIBCXX_THROW_OR_ABORT(_EXC) (__builtin_abort())
+# endif
+#endif
+
 // Macro for extern template, ie controling template linkage via use
 // of extern keyword on template declaration. As documented in the g++
 // manual, it inhibits all implicit instantiations and is used
Index: include/bits/shared_ptr_base.h
===
--- include/bits/shared_ptr_base.h  (revision 193278)
+++ include/bits/shared_ptr_base.h  (working copy)
@@ -73,13 +73,7 @@
   // Substitute for bad_weak_ptr object in the case of -fno-exceptions.
   inline void
   __throw_bad_weak_ptr()
-  {
-#if __EXCEPTIONS
-throw bad_weak_ptr();
-#else
-__builtin_abort();
-#endif
-  }
+  { _GLIBCXX_THROW_OR_ABORT(bad_weak_ptr()); }
 
   using __gnu_cxx::_Lock_policy;
   using __gnu_cxx::__default_lock_policy;
Index: include/debug/array
===
--- include/debug/array (revision 193278)
+++ include/debug/array (working copy)
@@ -33,14 +33,6 @@
 
 #include 
 
-#ifndef _GLIBCXX_THROW_OR_ABORT
-# if __EXCEPTIONS
-#  define _GLIBCXX_THROW_OR_ABORT(_Exc) (throw (_Exc))
-# else
-#  define _GLIBCXX_THROW_OR_ABORT(_Exc) (__builtin_abort())
-# endif
-#endif
-
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 namespace __debug
@@ -165,7 +157,7 @@
   operator[](size_type __n) const noexcept
   {
return __n < _Nm ? _AT_Type::_S_ref(_M_elems, __n)
-: (_GLIBCXX_THROW_OR_ABORT (_Array_check_subscript<_Nm>(__n)),
+: (_GLIBCXX_THROW_OR_ABORT(_Array_check_subscript<_Nm>(__n)),
_AT_Type::_S_ref(_M_elems, 0));
   }
 
@@ -198,7 +190,7 @@
   front() const
   {
return _Nm ? _AT_Type::_S_ref(_M_elems, 0)
- : (_GLIBCXX_THROW_OR_ABORT (_Array_check_nonempty<_Nm>()),
+ : (_GLIBCXX_THROW_OR_ABORT(_Array_check_nonempty<_Nm>()),
 _AT_Type::_S_ref(_M_elems, 0));
   }
 
@@ -213,7 +205,7 @@
   back() const
   {
return _Nm ? _AT_Type::_S_ref(_M_elems, _Nm - 1)
- : (_GLIBCXX_THROW_OR_ABORT (_Array_check_nonempty<_Nm>()),
+ : (_GLIBCXX_THROW_OR_ABORT(_Array_check_nonempty<_Nm>()),
 _AT_Type::_S_ref(_M_elems, 0));
   }
 
@@ -316,6 +308,4 @@
 };
 } // namespace std
 
-#undef _GLIBCXX_THROW_OR_ABORT
-
 #endif // _GLIBCXX_DEBUG_ARRAY
Index: 

Re: [patch RFA middle-end] Fix PR middle-end/49220

2012-11-07 Thread Uros Bizjak
On Wed, Nov 7, 2012 at 10:57 AM, Eric Botcazou  wrote:
>> 2012-11-07  Kaz Kojima  
>>
>>   PR middle-end/49220
>>   * mode-switching.c (create_pre_exit): Set short_block if there
>>   are no copy insns.
>
> OK, but clearly a rewrite of the function would be in order.

Please also add the testcase from the PR to the testsuite.

Uros.


Re: [patch RFA middle-end] Fix PR middle-end/49220

2012-11-07 Thread Eric Botcazou
> 2012-11-07  Kaz Kojima  
> 
>   PR middle-end/49220
>   * mode-switching.c (create_pre_exit): Set short_block if there
>   are no copy insns.

OK, but clearly a rewrite of the function would be in order.

-- 
Eric Botcazou


Re: RFC/A: set_mem_attributes_minus_bitpos tweak

2012-11-07 Thread Eric Botcazou
> expand_assignment calls:
> 
>if (MEM_P (to_rtx))
>  {
>/* If the field is at offset zero, we could have been given
>       the DECL_RTX of the parent struct.  Don't munge it.  */
>    to_rtx = shallow_copy_rtx (to_rtx);
> 
>set_mem_attributes_minus_bitpos (to_rtx, to, 0, bitpos);
>...

The MEM_KEEP_ALIAS_SET_P line seems to be redundant here (but not the 
MEM_VOLATILE_P line).

> But set_mem_attributes_minus_bitpos has:
> 
>   /* Default values from pre-existing memory attributes if present.  */
>   refattrs = MEM_ATTRS (ref);
>   if (refattrs)
> {
>   /* ??? Can this ever happen?  Calling this routine on a MEM that
>already carries memory attributes should probably be invalid.  */
>   attrs.expr = refattrs->expr;
>   attrs.offset_known_p = refattrs->offset_known_p;
>   attrs.offset = refattrs->offset;
>   attrs.size_known_p = refattrs->size_known_p;
>   attrs.size = refattrs->size;
>   attrs.align = refattrs->align;
> }
> 
> which of course applies in this case: we have a MEM for g__style.
> I would expect many calls from this site are for MEMs with an
> existing MEM_EXPR.

Indeed.  Not very clear what to do though.

> Then later:
> 
>   /* If this is a field reference and not a bit-field, record it.  */
>   /* ??? There is some information that can be gleaned from bit-fields,
>such as the word offset in the structure that might be modified.
>But skip it for now.  */
>   else if (TREE_CODE (t) == COMPONENT_REF
>  && ! DECL_BIT_FIELD (TREE_OPERAND (t, 1)))
> 
> so we leave the offset and expr alone.  The end result is:
> 
>   (mem/j/c:SI (symbol_ref:DI ("g__style") [flags 0x4]  0x76e4ee40 g__style>) [0 g__style+0 S1 A64])
> 
> an SImode reference to the first byte (and only the first byte) of g__style.
> Then, when we apply adjust_bitfield_address, it looks like we're moving
> past the end of the region and so we drop the MEM_EXPR.
> 
> In cases where set_mem_attributes_minus_bitpos does set MEM_EXPR based
> on the new tree, it also adds the bitpos to the size.  But I think we
> should do that whenever we set the size based on the new tree,
> regardless of whether we were able to record a MEM_EXPR too.
> 
> That said, this code has lots of ???s in it, so I'm not exactly
> confident about this change.  Thoughts?

It also seems a little bold to me.  Since we now have the new processing of 
MEM_EXPR for bitfields, can't we remove the ! DECL_BIT_FIELD check?

  /* If this is a field reference and not a bit-field, record it.  */
  /* ??? There is some information that can be gleaned from bit-fields,
 such as the word offset in the structure that might be modified.
 But skip it for now.  */
  else if (TREE_CODE (t) == COMPONENT_REF
   && ! DECL_BIT_FIELD (TREE_OPERAND (t, 1)))
{
  attrs.expr = t;
  attrs.offset_known_p = true;
  attrs.offset = 0;
  apply_bitpos = bitpos;
  /* ??? Any reason the field size would be different than
 the size we got from the type?  */
}

This would mean removing the first ??? comment.  As for the second ??? 
comment, the answer is easy: because that's pretty much what defines a 
bitfield!  The size is DECL_SIZE_UNIT and not TYPE_SIZE_UNIT for them.
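
A source-level illustration of that last point, as a minimal sketch only: the
struct and function names are hypothetical, and this is ordinary C++ rather
than GCC tree code.  The width declared on the field (what the field's own
size describes) differs from the width of the field's declared type.

```cpp
#include <cassert>
#include <climits>

// Hypothetical struct: 'kind' is declared with type unsigned but only 3 bits.
struct Style
{
  unsigned kind : 3;
  unsigned rest : 5;
};

// Measure how many bits the field can actually hold by storing all-ones
// and counting the bits that survive the truncation to the field width.
int kind_width_in_bits()
{
  Style s = {};
  unsigned all_ones = ~0u;
  s.kind = all_ones;
  int bits = 0;
  for (unsigned v = s.kind; v != 0; v >>= 1)
    ++bits;
  assert(bits > 0);
  return bits;
}

// Width of the field's declared type, i.e. the size one would get
// by looking only at the type.
int kind_type_width_in_bits()
{
  return static_cast<int>(sizeof(unsigned) * CHAR_BIT);
}
```

Here kind_width_in_bits() reports the declared field width, while
kind_type_width_in_bits() reports the width of unsigned, which is larger.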

-- 
Eric Botcazou


Re: Enable inliner to bypass inline-insns-single/auto when it knows the performance will improve

2012-11-07 Thread Steven Bosscher
On Wed, Nov 7, 2012 at 10:40 AM, Jan Hubicka wrote:
> Hi,
> with inliner predicates, the inliner heuristic now is able to prove that
> some of the inlined function body will be optimized out after inlining.
> This makes it possible to estimate the speedup that is now used to drive
> the badness metric, but it is ignored in actual decision whether function
> is inline candidate.

Is this really still the time for this kind of change? Development
stage3 means "regression fixes only" and this isn't a regression...

Ciao!
Steven


Re: [PATCH] Vzeroupper placement/47440

2012-11-07 Thread Uros Bizjak
On Wed, Nov 7, 2012 at 9:04 AM, Jakub Jelinek  wrote:
> On Wed, Nov 07, 2012 at 08:08:08AM +0100, Uros Bizjak wrote:
>> >> 2012-11-06  Jakub Jelinek  
>> >>
>> >> * config/i386/i386.c (ix86_avx_u128_mode_after): Don't
>> >> look for reg in CALL operand.
>> >
>> > OK.
>>
>> You can also break the loop after reg is found.
>
> I have committed the patch as is to fix the bootstrap, as anything else
> needs another bootstrap/regtest.  I don't think breaking out of the loop
> would be correct, then say for *{,sib}call_value_pop patterns reg would be
> stack pointer rather than the return value of the call.  Due to that pattern
> we can't use single_set, but I wonder if we just can't use XVECEXP (pat, 0, 0)
> unconditionally for the return value, or perhaps check
> the condition inside of the loop
> (REG_P (reg) && VALID_AVX256_REG_OR_OI_MODE (GET_MODE (reg))),
> return AVX_U128_DIRTY if true (and that way break out of the loop),
> and return AVX_U128_CLEAN after the loop.

Indeed, I didn't notice the reverse loop.

> Or I wonder why is call handled specially at all, doesn't
>   /* Check if a 256bit AVX register is referenced in stores.  */
>   state = unused;
>   note_stores (pat, check_avx256_stores, &state);
>   if (state == used)
> return AVX_U128_DIRTY;
> handle it?  Then it would just need to be
> if (CALL_P (insn)) return AVX_U128_CLEAN.

You are right, I am testing the attached patch.

> BTW, the formatting is wrong in some spots, e.g.
> check_avx256_stores (rtx dest, const_rtx set, void *data)
> {
>   if (((REG_P (dest) || MEM_P(dest))

I have some doubts that this function is fully correct. The comment says:

/* Check if a 256bit AVX register is referenced in stores.   */

But, we are in fact checking stores and uses.

IIRC, U128_DIRTY state is only set for stores to YMM register, so:

static void
check_avx256_stores (rtx dest, const_rtx set ATTRIBUTE_UNUSED, void *data)
{
  if (REG_P (dest)
  && VALID_AVX256_REG_OR_OI_MODE (GET_MODE (dest)))
{
  enum upper_128bits_state *state
= (enum upper_128bits_state *) data;
  *state = used;
}
}

> I'd prefer to leave this to the original submitter.

Yes, Vladimir, can you please comment on these issues?

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 193280)
+++ config/i386/i386.c  (working copy)
@@ -15079,30 +15079,8 @@
 ix86_avx_u128_mode_after (int mode, rtx insn)
 {
   rtx pat = PATTERN (insn);
-  rtx reg = NULL;
-  int i;
   enum upper_128bits_state state;
 
-  /* Check for CALL instruction.  */
-  if (CALL_P (insn))
-{
-  if (GET_CODE (pat) == SET || GET_CODE (pat) == CALL)
-   reg = SET_DEST (pat);
-  else if (GET_CODE (pat) ==  PARALLEL)
-   for (i = XVECLEN (pat, 0) - 1; i >= 0; i--)
- {
-   rtx x = XVECEXP (pat, 0, i);
-   if (GET_CODE(x) == SET)
- reg = SET_DEST (x);
- }
-  /* Mode after call is set to AVX_U128_DIRTY if there are
-256bit modes used in the function return register.  */
-  if (reg && REG_P (reg) && VALID_AVX256_REG_OR_OI_MODE (GET_MODE (reg)))
-   return AVX_U128_DIRTY;
-  else
-   return AVX_U128_CLEAN;
-}
-
   if (vzeroupper_operation (pat, VOIDmode)
   || vzeroall_operation (pat, VOIDmode))
 return AVX_U128_CLEAN;
@@ -15112,6 +15090,10 @@
   note_stores (pat, check_avx256_stores, &state);
   if (state == used)
 return AVX_U128_DIRTY;
+  /* We know that state is clean after CALL insn if there are no
+ 256bit modes used in the function return register.  */
+  else if (CALL_P (insn) && state == unused)
+return AVX_U128_CLEAN;
 
   return mode;
 }
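
For readers following the control flow, the decision order in the patch above
can be modelled roughly as below.  This is only a sketch: the types Store and
Insn and the enum U128 are hypothetical stand-ins, not the rtx or
upper_128bits_state types from i386.c, and the "stores to a 256-bit register"
test follows the stricter REG_P-only check proposed in the message.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the AVX upper-128 modes.
enum class U128 { Clean, Dirty, Any };

// Hypothetical model of one store destination in an insn pattern.
struct Store
{
  bool dest_is_reg;
  int dest_mode_bits;
};

// Hypothetical model of an insn: what kind it is and what it stores to.
struct Insn
{
  bool is_call;
  bool is_vzero;   // vzeroupper / vzeroall
  std::vector<Store> stores;
};

// Plays the role of note_stores + check_avx256_stores: does any store
// target a 256-bit register?
static bool stores_to_256bit_reg(const Insn& insn)
{
  for (const Store& s : insn.stores)
    {
      assert(s.dest_mode_bits >= 0);
      if (s.dest_is_reg && s.dest_mode_bits == 256)
        return true;
    }
  return false;
}

// Decision order as in the patched function: vzero* cleans, a 256-bit
// register store dirties, a call without such a store cleans, and
// anything else leaves the incoming mode unchanged.
U128 mode_after(U128 mode, const Insn& insn)
{
  if (insn.is_vzero)
    return U128::Clean;
  if (stores_to_256bit_reg(insn))
    return U128::Dirty;
  if (insn.is_call)
    return U128::Clean;
  return mode;
}
```

In this model a call returning a value in a 256-bit register falls out of the
generic store check, so no separate CALL-specific scan of the pattern is
needed.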


Enable inliner to bypass inline-insns-single/auto when it knows the performance will improve

2012-11-07 Thread Jan Hubicka
Hi,
with inliner predicates, the inliner heuristic now is able to prove that
some of the inlined function body will be optimized out after inlining.
This makes it possible to estimate the speedup that is now used to drive
the badness metric, but it is ignored in actual decision whether function
is inline candidate.

In general the decision on when to inline can be
 1) conservative on code size - when we know code will shrink it is almost
surely a win
 2) uninformed guess - we can just inline and hope something will simplify.
this makes sense to do with small enough function especially when user
asks for -O3
 3) informed inline - we know something important will simplify.

We already have inline hints handling some cases of 3), like loop strides
and bounds.  This patch just adds the time based hint.
If speedup of runtime of caller+callee exceeds 10%, it is quite likely inlining
is a win.
The inlining still may not happen in the end due to other inlining limits.
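
As a rough worked example of the threshold test (a sketch only, outside GCC:
rdiv is assumed to mirror the rounding-division RDIV macro, and big_speedup is
a hypothetical stand-in for big_speedup_p taking plain integers instead of a
cgraph edge):

```cpp
#include <cassert>
#include <cstdint>

// Rounded integer division; assumed to behave like GCC's RDIV macro.
constexpr std::int64_t rdiv(std::int64_t x, std::int64_t y)
{
  return (x + y / 2) / y;
}

// True when inlining is estimated to save strictly more than
// min_speedup_percent of the uninlined caller+callee time.
bool big_speedup(std::int64_t uninlined_time, std::int64_t inlined_time,
                 int min_speedup_percent)
{
  assert(uninlined_time >= 0 && inlined_time >= 0);
  return uninlined_time - inlined_time
         > rdiv(uninlined_time * min_speedup_percent, 100);
}
```

With a 10% threshold and an uninlined estimate of 1000, an inlined estimate of
850 passes (a saving of 150 against a bound of 100), while 900 does not, since
the comparison is strict.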

Bootstrapped/regtested x86_64-linux. Benchmarked on SPEC2k, SPEC2k6, C++
tests, polyhedron and Mozilla.  Largest single win is on c-ray where we now
inline ray_sphere because it will become loop invariant.  There are also
improvements on polyhedron and Mozilla.

Will commit it today or tomorrow depending on when autotesters will hit other
changes.

Honza

PR middle-end/48636
* ipa-inline.c (big_speedup_p): New function.
(want_inline_small_function_p): Use it.
(edge_badness): Dump it.
* params.def (inline-min-speedup): New parameter.
* doc/invoke.texi (inline-min-speedup): Document.

Index: doc/invoke.texi
===
*** doc/invoke.texi (revision 193284)
--- doc/invoke.texi (working copy)
*** by the compiler are investigated.  To th
*** 8941,8946 
--- 8941,8952 
  be applied.
  The default value is 40.
  
+ @item inline-min-speedup
+ When estimated performance improvement of caller + callee runtime exceeds this
+ threshold (in percent), the function can be inlined regardless of the limit on
+ @option{--param max-inline-insns-single} and @option{--param
+ max-inline-insns-auto}.
+ 
  @item large-function-insns
  The limit specifying really large functions.  For functions larger than this
  limit after inlining, inlining is constrained by
Index: ipa-inline.c
===
*** ipa-inline.c(revision 193284)
--- ipa-inline.c(working copy)
*** compute_inlined_call_time (struct cgraph
*** 493,498 
--- 493,514 
return time;
  }
  
+ /* Return true if the speedup for inlining E is bigger than
+PARAM_INLINE_MIN_SPEEDUP.  */
+ 
+ static bool
+ big_speedup_p (struct cgraph_edge *e)
+ {
+   gcov_type time = compute_uninlined_call_time (inline_summary (e->callee),
+ e);
+   gcov_type inlined_time = compute_inlined_call_time (e,
+   estimate_edge_time (e));
+   if (time - inlined_time
+   > RDIV (time * PARAM_VALUE (PARAM_INLINE_MIN_SPEEDUP), 100))
+ return true;
+   return false;
+ }
+ 
  /* Return true if we are interested in inlining small function.
 When REPORT is true, report reason to dump file.  */
  
*** want_inline_small_function_p (struct cgr
*** 514,519 
--- 530,536 
  {
int growth = estimate_edge_growth (e);
inline_hints hints = estimate_edge_hints (e);
+   bool big_speedup = big_speedup_p (e);
  
if (growth <= 0)
;
*** want_inline_small_function_p (struct cgr
*** 521,526 
--- 538,544 
 hints suggests that inlining given function is very profitable.  */
else if (DECL_DECLARED_INLINE_P (callee->symbol.decl)
   && growth >= MAX_INLINE_INSNS_SINGLE
+  && !big_speedup
   && !(hints & (INLINE_HINT_indirect_call
 | INLINE_HINT_loop_iterations
 | INLINE_HINT_loop_stride)))
*** want_inline_small_function_p (struct cgr
*** 574,579 
--- 592,598 
 Upgrade it to MAX_INLINE_INSNS_SINGLE when hints suggests that
 inlining given function is very profitable.  */
else if (!DECL_DECLARED_INLINE_P (callee->symbol.decl)
+  && !big_speedup
   && growth >= ((hints & (INLINE_HINT_indirect_call
   | INLINE_HINT_loop_iterations
   | INLINE_HINT_loop_stride))
*** edge_badness (struct cgraph_edge *edge, 
*** 836,841 
--- 855,862 
   growth,
   edge_time);
dump_inline_hints (dump_file, hints);
+   if (big_speedup_p (edge))
+   fprintf (dump_file, " big_speedup");
fprintf (dump_file, "\n");
  }
  
Index: params.def
===

Re: [Committed] S/390: Add support for the new IBM zEnterprise EC12

2012-11-07 Thread Andreas Krebbel
On 21/10/12 00:14, Gerald Pfeifer wrote:
> On Wed, 10 Oct 2012, Andreas Krebbel wrote:
>> the attached patch adds initial support for the latest release of
>> the IBM mainframe series - the IBM zEnterprise EC12 (zEC12).
> 
> Nice.  Can you please also add a note to the release notes at
> gcc-4.8/changes.html ?
> 
> In principle, I'm also in favor of adding a news item to our
> main page for updates like this since it shows how GCC is
> evolving and supporting the latest hardware releases (even
> if, like here, the code changes are not huge).
> 
> Gerald
> 

Sure. What about something like this?

Index: htdocs/index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v
retrieving revision 1.865
diff -u -r1.865 index.html
--- htdocs/index.html   6 Nov 2012 12:17:13 -   1.865
+++ htdocs/index.html   7 Nov 2012 09:36:17 -
@@ -53,6 +53,12 @@

 

+IBM zEnterprise EC12 support
+[2012-10-10]
+Support for the latest release of the System z mainframe
+http://www.ibm.com/systems/z/hardware/zenterprise/zec12.html";>zEC12
+has been added to the architecture back-end.
+
 GCC 4.7.2 released
 [2012-09-20]
 
Index: htdocs/gcc-4.8/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v
retrieving revision 1.53
diff -u -r1.53 changes.html
--- htdocs/gcc-4.8/changes.html 4 Nov 2012 15:22:03 -   1.53
+++ htdocs/gcc-4.8/changes.html 7 Nov 2012 09:36:17 -
@@ -325,6 +325,29 @@
 command-line option.
   

+S/390, System z
+  
+Support for the IBM zEnterprise zEC12 processor has been
+  added.  When using the -march=zEC12 option, the
+  compiler will generate code making use of the following new
+  instructions:
+  
+   load and trap instructions
+   2 new compare and trap instructions
+   rotate and insert selected bits - without CC clobber
+  
+  The -mtune=zEC12 option enables zEC12 specific
+  instruction scheduling without making use of new
+  instructions.
+Register pressure sensitive insn scheduling is enabled by
+  default.
+The IFUNC function attribute is enabled by default.
+memcpy and memcmp invocations on big memory chunks or with
+  runtime lengths are not generated inline anymore when tuning for
+  z10 or higher.  The purpose is to make use of the IFUNC
+  optimized versions in Glibc.
+  
+
 SH
   
 The default alignment settings have been reduced to be less aggressive.



Minor fixes to ipa-inline-analysis.c

2012-11-07 Thread Jan Hubicka
Hi,
while analyzing c-ray I noticed two issues.  First is that I originally set the
number of size/time entries to 32.  Once we reach this limit we conservatively
account everything as unconditional.  This limit is now reached on relatively
simple testcases, like ray-sphere.  The reason is that aggregate tracking now
adds a lot of new conditionals on individual fields.  While the number of
arguments hardly exceeds 5, the number of fields passed and used easily does.
So there is a need to increase the bound.

While propagating info about non-constant parameters, we should also work in
reverse postorder rather than in random order, since we can propagate things 
down across SSA graph.
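
For reference, a reverse postorder can be computed with one depth-first walk.
The sketch below is standalone C++ and only illustrative: Cfg, visit and
reverse_postorder are hypothetical names, not the
pre_and_rev_post_order_compute routine used in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Successor lists indexed by block number; block 0 is the entry
// (a hypothetical layout chosen for this sketch).
using Cfg = std::vector<std::vector<int>>;

static void visit(const Cfg& cfg, int bb, std::vector<bool>& seen,
                  std::vector<int>& post)
{
  seen[bb] = true;
  for (int succ : cfg[bb])
    if (!seen[succ])
      visit(cfg, succ, seen, post);
  post.push_back(bb);  // postorder: emitted after all reachable successors
}

// Reversing the postorder puts every block before its successors,
// ignoring back edges, so facts can flow forward in one pass.
std::vector<int> reverse_postorder(const Cfg& cfg)
{
  assert(!cfg.empty());
  std::vector<bool> seen(cfg.size(), false);
  std::vector<int> post;
  visit(cfg, 0, seen, post);
  std::reverse(post.begin(), post.end());
  return post;
}
```

On a diamond-shaped graph (0 branches to 1 and 2, both joining in 3), the join
block comes last, after both of its predecessors have been processed.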

Bootstrapped/regtested & committed.

Honza

Index: ChangeLog
===
--- ChangeLog   (revision 193284)
+++ ChangeLog   (working copy)
@@ -1,3 +1,12 @@
+2012-11-07  Jan Hubicka  
+
+   * ipa-inline-analysis.c (true_predicate, single_cond_predicate,
+   reset_inline_edge_summary): Fix
+   formatting.
+   (account_size_time): Bump up the limit on number of size/time entries to
+   256.
+   (estimate_function_body_sizes): Work in reverse postorder.
+
 2012-11-07  David S. Miller  
 
PR bootstrap/55211
Index: ipa-inline-analysis.c
===
--- ipa-inline-analysis.c   (revision 193284)
+++ ipa-inline-analysis.c   (working copy)
@@ -149,7 +149,7 @@ static inline struct predicate
 true_predicate (void)
 {
   struct predicate p;
-  p.clause[0]=0;
+  p.clause[0] = 0;
   return p;
 }
 
@@ -160,8 +160,8 @@ static inline struct predicate
 single_cond_predicate (int cond)
 {
   struct predicate p;
-  p.clause[0]=1 << cond;
-  p.clause[1]=0;
+  p.clause[0] = 1 << cond;
+  p.clause[1] = 0;
   return p;
 }
 
@@ -692,12 +692,14 @@ account_size_time (struct inline_summary
found = true;
 break;
   }
-  if (i == 32)
+  if (i == 256)
 {
   i = 0;
   found = true;
   e = &VEC_index (size_time_entry, summary->entry, 0);
   gcc_assert (!e->predicate.clause[0]);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "\t\tReached limit on number of entries, ignoring 
the predicate.");
 }
   if (dump_file && (dump_flags & TDF_DETAILS) && (time || size))
 {
@@ -970,7 +972,7 @@ reset_inline_edge_summary (struct cgraph
 {
   struct inline_edge_summary *es = inline_edge_summary (e);
 
-  es->call_stmt_size = es->call_stmt_time =0;
+  es->call_stmt_size = es->call_stmt_time = 0;
   if (es->predicate)
pool_free (edge_predicate_pool, es->predicate);
   es->predicate = NULL;
@@ -2280,6 +2282,8 @@ estimate_function_body_sizes (struct cgr
   struct predicate bb_predicate;
   struct ipa_node_params *parms_info = NULL;
   VEC (predicate_t, heap) *nonconstant_names = NULL;
+  int nblocks, n;
+  int *order;
 
   info->conds = 0;
   info->entry = 0;
@@ -2312,8 +2316,12 @@ estimate_function_body_sizes (struct cgr
   gcc_assert (my_function && my_function->cfg);
   if (parms_info)
 compute_bb_predicates (node, parms_info, info);
-  FOR_EACH_BB_FN (bb, my_function)
+  gcc_assert (cfun == my_function);
+  order = XNEWVEC (int, n_basic_blocks);
+  nblocks = pre_and_rev_post_order_compute (NULL, order, false);
+  for (n = 0; n < nblocks; n++)
 {
+  bb = BASIC_BLOCK (order[n]);
   freq = compute_call_stmt_bb_frequency (node->symbol.decl, bb);
 
   /* TODO: Obviously predicates can be propagated down across CFG.  */
@@ -2486,6 +2494,7 @@ estimate_function_body_sizes (struct cgr
   time = (time + CGRAPH_FREQ_BASE / 2) / CGRAPH_FREQ_BASE;
   if (time > MAX_TIME)
 time = MAX_TIME;
+  free (order);
 
   if (!early && nonconstant_names)
 {


[patch RFA middle-end] Fix PR middle-end/49220

2012-11-07 Thread Kaz Kojima
Hi,

The attached is yet another create_pre_exit patch to fix
PR middle-end/49220 which is an ice-on-invalid-code.  It's for
non-void function which returns without value.  The patch is
tested with bootstrap and the top level "make -k check" on
i686-pc-linux-gnu with no new failures and regtested on cross
sh4-unknown-linux-gnu.

Regards,
kaz
--
2012-11-07  Kaz Kojima  

PR middle-end/49220
* mode-switching.c (create_pre_exit): Set short_block if there
are no copy insns.

--- ORIG/trunk/gcc/mode-switching.c 2012-11-06 07:33:20.0 +0900
+++ trunk/gcc/mode-switching.c  2012-11-07 07:55:25.0 +0900
@@ -322,7 +322,14 @@ create_pre_exit (int n_entities, int *en
 && GET_CODE (SUBREG_REG (copy_reg)) == REG)
  copy_start = REGNO (SUBREG_REG (copy_reg));
else
- break;
+ {
+   /* When control reaches end of non-void function,
+  there are no return copy insns at all.  This
+  avoids an ice on that invalid function.  */
+   if (ret_start + nregs == ret_end)
+ short_block = 1;
+   break;
+ }
if (copy_start >= FIRST_PSEUDO_REGISTER)
  break;
copy_num


Re: [v3] Fix profile mode failures

2012-11-07 Thread Jonathan Wakely
On 6 November 2012 19:41, Jonathan Wakely wrote:
> On 6 November 2012 18:21, Paolo Carlini wrote:
>>
>> testsuite/20_util/scoped_allocator/1.cc:79: void test02(): Assertion
>> `evv[0].get_allocator().get_personality() == 2' failed.
>>
>> I didn't really investigate it...
>
> Oops, looks like I missed something, will fix it asap ...

Fixed with this patch, which also adds an allocator parameter to the
vector(size_type) constructor, which is missing from the standard but
I hope to get fixed via
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#2210

Tested x86_64-linux, in normal, debug and profile mode, committed to trunk.

I still see some other profile-mode failures, not sure if they're old
or I caused them recently, will investigate further.
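
A small usage sketch of the constructor form in question, assuming a library
that provides vector(size_type, const allocator_type&).  TaggedAlloc and
tag_of_sized_vector are hypothetical names made up for this example; they are
not the testsuite allocator used in scoped_allocator/1.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stateful allocator: the tag records which instance
// the container was constructed with.
template <typename T>
struct TaggedAlloc
{
  using value_type = T;
  int tag;

  explicit TaggedAlloc(int t = 0) : tag(t) { }

  template <typename U>
  TaggedAlloc(const TaggedAlloc<U>& other) : tag(other.tag) { }

  T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
  void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};

template <typename T, typename U>
bool operator==(const TaggedAlloc<T>& a, const TaggedAlloc<U>& b)
{ return a.tag == b.tag; }

template <typename T, typename U>
bool operator!=(const TaggedAlloc<T>& a, const TaggedAlloc<U>& b)
{ return !(a == b); }

// Build a vector of n value-initialized ints with a tagged allocator and
// report the tag the vector ended up holding.
int tag_of_sized_vector(std::size_t n, int tag)
{
  std::vector<int, TaggedAlloc<int>> v(n, TaggedAlloc<int>(tag));
  assert(v.size() == n);
  return v.get_allocator().tag;
}
```

Without the allocator parameter, the sized constructor can only use a
default-constructed allocator, so the tag passed in would be lost.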
commit f1d4bcfe4fb152e0db4bdd7093ad8885e6d88a80
Author: Jonathan Wakely 
Date:   Wed Nov 7 01:03:35 2012 +

* include/bits/stl_vector.h (vector(size_type)): Add missing allocator
parameter.
* include/bits/stl_bvector.h: Likewise.
* include/debug/vector (vector(size_type)): Likewise.
* include/profile/vector (vector(size_type)): Likewise. Pass allocator
to base constructor.
* testsuite/23_containers/vector/requirements/dr438/assign_neg.cc:
Adjust dg-error line numbers.
* testsuite/23_containers/vector/requirements/dr438/
constructor_1_neg.cc: Likewise.
* testsuite/23_containers/vector/requirements/dr438/
constructor_2_neg.cc: Likewise.
* testsuite/23_containers/vector/requirements/dr438/insert_neg.cc:
Likewise.

diff --git a/libstdc++-v3/include/bits/stl_bvector.h 
b/libstdc++-v3/include/bits/stl_bvector.h
index 3adbfa1..b8d3efb 100644
--- a/libstdc++-v3/include/bits/stl_bvector.h
+++ b/libstdc++-v3/include/bits/stl_bvector.h
@@ -555,6 +555,21 @@ template
 vector(const allocator_type& __a)
 : _Base(__a) { }
 
+#ifdef __GXX_EXPERIMENTAL_CXX0X__
+explicit
+vector(size_type __n, const allocator_type& __a = allocator_type())
+: vector(__n, false, __a)
+{ }
+
+vector(size_type __n, const bool& __value, 
+  const allocator_type& __a = allocator_type())
+: _Base(__a)
+{
+  _M_initialize(__n);
+  std::fill(this->_M_impl._M_start._M_p, this->_M_impl._M_end_of_storage, 
+   __value ? ~0 : 0);
+}
+#else
 explicit
 vector(size_type __n, const bool& __value = bool(), 
   const allocator_type& __a = allocator_type())
@@ -564,6 +579,7 @@ template
   std::fill(this->_M_impl._M_start._M_p, this->_M_impl._M_end_of_storage, 
__value ? ~0 : 0);
 }
+#endif
 
 vector(const vector& __x)
 : _Base(__x._M_get_Bit_allocator())
diff --git a/libstdc++-v3/include/bits/stl_vector.h 
b/libstdc++-v3/include/bits/stl_vector.h
index 6e229aa..1f14f7e 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -261,13 +261,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
   /**
*  @brief  Creates a %vector with default constructed elements.
*  @param  __n  The number of elements to initially create.
+   *  @param  __a  An allocator.
*
*  This constructor fills the %vector with @a __n default
*  constructed elements.
*/
   explicit
-  vector(size_type __n)
-  : _Base(__n)
+  vector(size_type __n, const allocator_type& __a = allocator_type())
+  : _Base(__n, __a)
   { _M_default_initialize(__n); }
 
   /**
diff --git a/libstdc++-v3/include/debug/vector 
b/libstdc++-v3/include/debug/vector
index 9c33fdf..fe65bab 100644
--- a/libstdc++-v3/include/debug/vector
+++ b/libstdc++-v3/include/debug/vector
@@ -83,8 +83,8 @@ namespace __debug
 
 #ifdef __GXX_EXPERIMENTAL_CXX0X__
   explicit
-  vector(size_type __n)
-  : _Base(__n), _M_guaranteed_capacity(__n) { }
+  vector(size_type __n, const _Allocator& __a = _Allocator())
+  : _Base(__n, __a), _M_guaranteed_capacity(__n) { }
 
   vector(size_type __n, const _Tp& __value,
 const _Allocator& __a = _Allocator())
diff --git a/libstdc++-v3/include/profile/vector 
b/libstdc++-v3/include/profile/vector
index fcd6962..ec931a3 100644
--- a/libstdc++-v3/include/profile/vector
+++ b/libstdc++-v3/include/profile/vector
@@ -84,8 +84,8 @@ namespace __profile
 
 #ifdef __GXX_EXPERIMENTAL_CXX0X__
   explicit
-  vector(size_type __n)
-  : _Base(__n)
+  vector(size_type __n, const _Allocator& __a = _Allocator())
+  : _Base(__n, __a)
   {
 __profcxx_vector_construct(this, this->capacity());
 __profcxx_vector_construct2(this);
@@ -147,7 +147,7 @@ namespace __profile
   }
 
   vector(const _Base& __x, const _Allocator& __a)
-  : _Base(__x) 
+  : _Base(__x, __a)
   { 
 __profcxx_vector_construct(this, this->capacity());
 __profcxx_vector_construct2(this);
diff --git 
a/libstdc++-v3/testsuite/23_containers/vector/requirem

Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-11-07 Thread Dominique Dhumieres
> This should be fixed by a patch I committed directly before you
> sent your mail (which is why you did not see it yet).  Can you
> please verify?

Bootstrap has completed at revision 193278 (with the patch for
dwarf2out.c).

Thanks,

Dominique



RE: [PATCH,RX] Support Bit Manipulation on Memory Operands

2012-11-07 Thread Naveen H. S
Hi,

Thank you for reviewing the patch and valuable comments.

>> You need to use match_dup instead of a matching constraint.
Done.

>> Every one that isn't explicitly invoked should have a leading "*" 
>> in the name.
Done.

Please find attached the modified patch and let me know if it's okay?

Thanks & Regards,
Naveen



rx_bit_insn.patch
Description: rx_bit_insn.patch


[PATCH] Revert sparc "U" constraint removal.

2012-11-07 Thread David Miller

PR bootstrap/55211
Revert:
* config/sparc/constraints.md ("U"): Delete.
* config/sparc/sparc.md: Use 'r' constraint instead of 'U'.
* config/sparc/sync.md: Likewise.
And revert parts of:
* doc/md.texi: Sync sparc constraint documentation with reality.
---
 gcc/ChangeLog   | 10 ++
 gcc/config/sparc/constraints.md | 11 ++-
 gcc/config/sparc/sparc.md   | 16 
 gcc/config/sparc/sync.md|  4 ++--
 gcc/doc/md.texi |  3 +++
 5 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index eb4bd88..dc62c59 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,13 @@
+2012-11-07  David S. Miller  
+
+   PR bootstrap/55211
+   Revert:
+   * config/sparc/constraints.md ("U"): Delete.
+   * config/sparc/sparc.md: Use 'r' constraint instead of 'U'.
+   * config/sparc/sync.md: Likewise.
+   And revert parts of:
+   * doc/md.texi: Sync sparc constraint documentation with reality.
+
 2012-11-07  Jakub Jelinek  
 
* config/i386/i386.c (ix86_avx_u128_mode_after): Don't
diff --git a/gcc/config/sparc/constraints.md b/gcc/config/sparc/constraints.md
index 71670ee..2f8c6ad 100644
--- a/gcc/config/sparc/constraints.md
+++ b/gcc/config/sparc/constraints.md
@@ -18,7 +18,7 @@
 ;; .
 
 ;;; Unused letters:
-;;;AB  U
+;;;AB
 ;;;ajklq  tuv xyz
 
 
@@ -130,6 +130,15 @@
   (match_code "mem")
   (match_test "memory_ok_for_ldd (op)")))
 
+;; Not needed in 64-bit mode
+(define_constraint "U"
+ "Pseudo-register or hard even-numbered integer register"
+ (and (match_test "TARGET_ARCH32")
+  (match_code "reg")
+  (ior (match_test "REGNO (op) < FIRST_PSEUDO_REGISTER")
+  (not (match_test "reload_in_progress && reg_renumber [REGNO (op)] < 
0")))
+  (match_test "register_ok_for_ldd (op)")))
+
 ;; Equivalent to 'T' but available in 64-bit mode
 (define_memory_constraint "W"
  "Memory reference for 'e' constraint floating-point register"
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index 4a44078..f604f46 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -1595,9 +1595,9 @@
 
 (define_insn "*movdi_insn_sp32"
   [(set (match_operand:DI 0 "nonimmediate_operand"
-   
"=T,o,T,r,o,r,r,r,?T,?*f,?*f,?o,?*e,?*e,  r,?*f,?*e,?W,b,b")
+   
"=T,o,T,U,o,r,r,r,?T,?*f,?*f,?o,?*e,?*e,  r,?*f,?*e,?W,b,b")
 (match_operand:DI 1 "input_operand"
-   " J,J,r,T,r,o,i,r,*f,  T,  o,*f, *e, 
*e,?*f,  r,  W,*e,J,P"))]
+   " J,J,U,T,r,o,i,r,*f,  T,  o,*f, *e, 
*e,?*f,  r,  W,*e,J,P"))]
   "! TARGET_ARCH64
&& (register_operand (operands[0], DImode)
|| register_or_zero_operand (operands[1], DImode))"
@@ -2302,8 +2302,8 @@
 })
 
 (define_insn "*movdf_insn_sp32"
-  [(set (match_operand:DF 0 "nonimmediate_operand" "=b,b,e,e,*r, f,  
e,T,W,r,T,  f,  *r,  o,o")
-   (match_operand:DF 1 "input_operand" "G,C,e,e, 
f,*r,W#F,G,e,T,r,o#F,*roF,*rG,f"))]
+  [(set (match_operand:DF 0 "nonimmediate_operand" "=b,b,e,e,*r, f,  
e,T,W,U,T,  f,  *r,  o,o")
+   (match_operand:DF 1 "input_operand" "G,C,e,e, 
f,*r,W#F,G,e,T,U,o#F,*roF,*rG,f"))]
   "! TARGET_ARCH64
&& (register_operand (operands[0], DFmode)
|| register_or_zero_or_all_ones_operand (operands[1], DFmode))"
@@ -2541,8 +2541,8 @@
 })
 
 (define_insn "*movtf_insn_sp32"
-  [(set (match_operand:TF 0 "nonimmediate_operand" "=b, e,o, o,r,  r")
-   (match_operand:TF 1 "input_operand"" G,oe,e,rG,o,roG"))]
+  [(set (match_operand:TF 0 "nonimmediate_operand" "=b, e,o,  o,U,  r")
+   (match_operand:TF 1 "input_operand"" G,oe,e,rGU,o,roG"))]
   "! TARGET_ARCH64
&& (register_operand (operands[0], TFmode)
|| register_or_zero_operand (operands[1], TFmode))"
@@ -7911,8 +7911,8 @@
(set_attr "cpu_feature" "vis,vis,vis,*,*,*,*,*,vis3,vis3,*")])
 
 (define_insn "*mov_insn_sp32"
-  [(set (match_operand:VM64 0 "nonimmediate_operand" "=e,e,e,*r, f,e,m,m,r,T, 
o,*r")
-   (match_operand:VM64 1 "input_operand" "Y,C,e, 
f,*r,m,e,Y,T,r,*r,*r"))]
+  [(set (match_operand:VM64 0 "nonimmediate_operand" "=e,e,e,*r, f,e,m,m,U,T, 
o,*r")
+   (match_operand:VM64 1 "input_operand" "Y,C,e, 
f,*r,m,e,Y,T,U,*r,*r"))]
   "TARGET_VIS
&& ! TARGET_ARCH64
&& (register_operand (operands[0], mode)
diff --git a/gcc/config/sparc/sync.md b/gcc/config/sparc/sync.md
index 302cd74..d11f663 100644
--- a/gcc/config/sparc/sync.md
+++ b/gcc/config/sparc/sync.md
@@ -115,7 +115,7 @@
 })
 
 (define_insn "atomic_loaddi_1"
-  [(set (match_operand:DI 0 "register_operand" "=r,?*f")
+  [(set (match_operand:DI 0 "register_operand" "=U,?*f")
(unspec:DI [(match_operand:DI 1 "memory_operand

Re: [PATCH] Vzeroupper placement/47440

2012-11-07 Thread Jakub Jelinek
On Wed, Nov 07, 2012 at 08:08:08AM +0100, Uros Bizjak wrote:
> >> 2012-11-06  Jakub Jelinek  
> >>
> >> * config/i386/i386.c (ix86_avx_u128_mode_after): Don't
> >> look for reg in CALL operand.
> >
> > OK.
> 
> You can also break the loop after reg is found.

I have committed the patch as is to fix the bootstrap, as anything else
needs another bootstrap/regtest.  I don't think breaking out of the loop
would be correct, then say for *{,sib}call_value_pop patterns reg would be
stack pointer rather than the return value of the call.  Due to that pattern
we can't use single_set, but I wonder if we just can't use XVECEXP (pat, 0, 0)
unconditionally for the return value, or perhaps check
the condition inside of the loop
(REG_P (reg) && VALID_AVX256_REG_OR_OI_MODE (GET_MODE (reg))),
return AVX_U128_DIRTY if true (and that way break out of the loop),
and return AVX_U128_CLEAN after the loop.
Or I wonder why is call handled specially at all, doesn't
  /* Check if a 256bit AVX register is referenced in stores.  */
  state = unused;
  note_stores (pat, check_avx256_stores, &state);
  if (state == used)
return AVX_U128_DIRTY;
handle it?  Then it would just need to be
if (CALL_P (insn)) return AVX_U128_CLEAN.
BTW, the formatting is wrong in some spots, e.g.
check_avx256_stores (rtx dest, const_rtx set, void *data)
{
  if (((REG_P (dest) || MEM_P(dest))

I'd prefer to leave this to the original submitter.

Jakub