Re: [PATCH 5/5] [AARCH64] Add variant support to -m="native"and add thunderxt88p1.

2016-11-02 Thread Jones, Joel
What is currently submitted for LLVM review was submitted before we determined 
this naming scheme. I will mark the current submittal as abandoned, as the 
scheduling model needs to be split out and revised.

Joel Jones 

Sent from my AArch64 powered iPhone

> On Nov 2, 2016, at 3:55 AM, James Greenhalgh  wrote:
> 
>> On Tue, Nov 01, 2016 at 11:08:53AM -0700, Andrew Pinski wrote:
>>> On Tue, Nov 17, 2015 at 2:10 PM, Andrew Pinski  wrote:
>>> Since ThunderX T88 pass 1 (variant 0) is a ARMv8 part while pass 2 (variant 
>>> 1)
>>> is an ARMv8.1 part, I needed to add detecting of the variant also for this
>>> difference. Also I simplify a little bit and combined the single core and
>>> arch detecting cases so it would be easier to add variant.
>> 
>> Actually it is a bit more complex than what I said here, see below for
>> the full table of options and what are enabled/disabled now.
>> 
>>> OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>>> Tested -mcpu=native on both T88 pass 1 and T88 pass 2 to make sure it is
>>> deecting the two seperately.
>> 
>> 
>> Here is the final patch in this series updated; I changed the cpu name
>> slightly and made sure I updated invoke.texi too.
>> 
>> The names are going to match the names in LLVM (worked with our LLVM
>> engineer here at Cavium about the names).
>> Here are the names recorded and
>> -mpcu=thunderx:
>> *Matches part num 0xA0 (reserved for ThunderX 8x series)
>> *T88 Pass 2 scheduling
>> *Hardware prefetching (software prefetching disabled)
>> *LSE enabled
>> *no v8.1
> 
> This doesn't match the current LLVM proposal
> ( https://reviews.llvm.org/D24540 ) which enables full ARMv8.1-A support
> for -mcpu=thunderx.
> 
>> -mcpu=thunderxt88:
>> *Matches part num 0xA1
>> *T88 Pass 2 scheduling
>> *software prefetching enabled
>> *LSE enabled
>> *no v8.1
>> 
>> -mcpu=thunderxt88p1 (only for GCC):
>> *Matches part num 0xA1, variant 0
>> *T88 Pass 1 scheduling
>> *software prefetching enabled
>> *no LSE enabled
>> *no v8.1
>> 
>> -mcpu=thunderxt81 and -mcpu=thunderxt83:
>> *Matches part num 0xA2/0xA3
>> *T88 Pass 2 scheduling
>> *Hardware prefetching (software prefetching disabled)
>> *LSE enabled
>> *v8.1
> 
> This looks like what has been added to LLVM as -mcpu=thunderx.
> 
>> I have not hooked up software vs hardware prefetching and the
>> scheduler parts (the next patch will do part of that); both ARMv8.1-a
>> and LSE parts are hooked up as those parts are only in
>> aarch64-cores.def.
>> 
>> OK?  Bootstrapped and tested on ThunderX T88 and ThunderX T81
>> (aarch64-linux-gnu).
>> 
>> Index: common/config/aarch64/aarch64-common.c
>> ===
>> --- common/config/aarch64/aarch64-common.c(revision 241727)
>> +++ common/config/aarch64/aarch64-common.c(working copy)
>> @@ -145,7 +145,7 @@ struct arch_to_arch_name
>>the default set of architectural feature flags they support.  */
>> static const struct processor_name_to_arch all_cores[] =
>> {
>> -#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART) \
>> +#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART, 
>> VARIANT) \
>>   {NAME, AARCH64_ARCH_##ARCH_IDENT, FLAGS},
>> #include "config/aarch64/aarch64-cores.def"
>>   {"generic", AARCH64_ARCH_8A, AARCH64_FL_FOR_ARCH8},
>> Index: config/aarch64/aarch64-cores.def
>> ===
>> --- config/aarch64/aarch64-cores.def(revision 241727)
>> +++ config/aarch64/aarch64-cores.def(working copy)
>> @@ -21,7 +21,7 @@
>> 
>>Before using #include to read this file, define a macro:
>> 
>> -  AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH_IDENT, 
>> FLAGS, COSTS, IMP, PART)
>> +  AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH_IDENT, 
>> FLAGS, COSTS, IMP, PART, VARIANT)
>> 
>>The CORE_NAME is the name of the core, represented as a string constant.
>>The CORE_IDENT is the name of the core, represented as an identifier.
>> @@ -39,39 +39,45 @@
>>PART is the part number of the CPU.  On a GNU/Linux system it can be
>>found in /proc/cpuinfo.  For big.LITTLE systems this should use the
>>macro AARCH64_BIG_LITTLE where the big part number comes as the first
>> -   argument to the macro and little is the second.  */
>> +   argument to the macro and little is the second.
>> +   VARIANT is the variant of the CPU.  In a GNU/Linux system it can found
>> +   in /proc/cpuinfo.  If this is -1, this means it can match any variant.  
>> */
>> 
>> /* V8 Architecture Processors.  */
>> 
>> /* ARM ('A') cores. */
>> -AARCH64_CORE("cortex-a35",  cortexa35, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 
>> | AARCH64_FL_CRC, cortexa35, 0x41, 0xd04)
>> 

Re: [PATCH][rtlanal.c] Convert conditional compilation on WORD_REGISTER_OPERATIONS

2016-11-02 Thread Eric Botcazou
> I think you're right. I suppose the new condition should be:
> 
> #ifdef LOAD_EXTEND_OP
> /* If this is a typical RISC machine, we only have to worry
>about the way loads are extended.  */
> if (!WORD_REGISTER_OPERATIONS
> 
> || ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
> 
>? val_signbit_known_set_p (inner_mode, nonzero)
> 
>: LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
>  || 
>  || !MEM_P (SUBREG_REG (x
> 
> #endif

Agreed.

> Would you prefer me to make this change or just revert the patch?

Go ahead and make the change, but please do a bit of comment massaging in the 
process, for example:

#ifdef LOAD_EXTEND_OP
  /* On many CISC machines, accessing an object in a wider mode
 causes the high-order bits to become undefined.  So they are
 not known to be zero.  */
  if (!WORD_REGISTER_OPERATIONS
 /* If this is a typical RISC machine, we only have to worry
about the way loads are extended.  */
  || ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
 ? val_signbit_known_set_p (inner_mode, nonzero)
 : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
   || !MEM_P (SUBREG_REG (x
#endif
{
  if (GET_MODE_PRECISION (GET_MODE (x))
  > GET_MODE_PRECISION (inner_mode))
nonzero |= (GET_MODE_MASK (GET_MODE (x))
& ~GET_MODE_MASK (inner_mode));
}

-- 
Eric Botcazou


Re: [PATCH] Fix store-merging alias check, apply some TLC

2016-11-02 Thread Richard Biener
On Wed, 2 Nov 2016, Richard Biener wrote:

> 
> This fixes the alias check in terminate_all_aliasing_chains -- the
> base we use for the hash table indexing does not constitute an object
> that aliases all stores in the chain (consider for example the MEM_REF
> handling which strips the offset completely).

I fat-fingered this thinking one step ahead (forgot to take the address).

> I've refactored data structures a bit in the process of making
> (as a followup) 'base_addr' a true address (and thus avoid building
> that new MEM_REF for example).  A followup will then try to support
> (some) bases with offset != NULL_TREE.
> 
> Bootstrap / regtest running on x86_64-unknown-linux-gnu.

So I had to re-start the testing when I already finished the very minor
change to make addr_base actually the address.  Thus the 2nd part is now
included.

This also has the advantage that the hash table comparison will
now apply the lax rules valid for comparing addresses (thus the
TREE_THIS_VOLATILE check should be no longer needed for example
and also different TBAA stores will now be merged).  I took the
liberty to explicitely rule out TARGET_MEM_REFs from the
merging (they'll have failed to form groups anyway as they are
not subsetted and we don't handle the const offset stripping yet
to eventually merge byte to word stores).

Richard.

2016-11-02  Richard Biener  

* gimple-ssa-store-merging.c (struct store_immediate_info): Remove
redundant val and dest members.
(store_immediate_info::store_immediate_info): Adjust.
(merged_store_group::merged_store_group): Adjust.
(merged_store_group::apply_stores): Likewise.
(struct imm_store_chain_info): Add base_addr field.
(imm_store_chain_info::imm_store_chain_info): New constructor.
(imm_store_chain_info::terminate_and_process_chain): Do not pass base.
(imm_store_chain_info::output_merged_store): Likewise.  Use
addr_base which is already the address.
(imm_store_chain_info::output_merged_stores): Likewise.
(pass_tree_store_merging::terminate_all_aliasing_chains): Take
imm_store_chain_info instead of base.  Fix alias check.
(pass_tree_store_merging::terminate_and_release_chain): Likewise.
(imm_store_chain_info::coalesce_immediate_stores): Adjust.
(pass_store_merging::execute): Refuse to operate on TARGET_MEM_REF.
use the address of the base and adjust for other changes.

Index: gcc/gimple-ssa-store-merging.c
===
--- gcc/gimple-ssa-store-merging.c  (revision 241779)
+++ gcc/gimple-ssa-store-merging.c  (working copy)
@@ -140,19 +140,17 @@ struct store_immediate_info
 {
   unsigned HOST_WIDE_INT bitsize;
   unsigned HOST_WIDE_INT bitpos;
-  tree val;
-  tree dest;
   gimple *stmt;
   unsigned int order;
-  store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, tree,
-   tree, gimple *, unsigned int);
+  store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+   gimple *, unsigned int);
 };
 
 store_immediate_info::store_immediate_info (unsigned HOST_WIDE_INT bs,
-   unsigned HOST_WIDE_INT bp, tree v,
-   tree d, gimple *st,
+   unsigned HOST_WIDE_INT bp,
+   gimple *st,
unsigned int ord)
-  : bitsize (bs), bitpos (bp), val (v), dest (d), stmt (st), order (ord)
+  : bitsize (bs), bitpos (bp), stmt (st), order (ord)
 {
 }
 
@@ -557,7 +555,7 @@ merged_store_group::merged_store_group (
   /* VAL has memory allocated for it in apply_stores once the group
  width has been finalized.  */
   val = NULL;
-  align = get_object_alignment (info->dest);
+  align = get_object_alignment (gimple_assign_lhs (info->stmt));
   stores.create (1);
   stores.safe_push (info);
   last_stmt = info->stmt;
@@ -654,14 +652,16 @@ merged_store_group::apply_stores ()
   FOR_EACH_VEC_ELT (stores, i, info)
 {
   unsigned int pos_in_buffer = info->bitpos - start;
-  bool ret = encode_tree_to_bitpos (info->val, val, info->bitsize,
-pos_in_buffer, buf_size);
+  bool ret = encode_tree_to_bitpos (gimple_assign_rhs1 (info->stmt),
+   val, info->bitsize,
+   pos_in_buffer, buf_size);
   if (dump_file && (dump_flags & TDF_DETAILS))
{
  if (ret)
{
  fprintf (dump_file, "After writing ");
- print_generic_expr (dump_file, info->val, 0);
+ print_generic_expr (dump_file,
+ gimple_assign_rhs1 (info->stmt), 0);
  fprintf (dump_file, " of size " HOST_WIDE_INT_PRINT_DEC
" at 

Re: [PATCH v2][AArch32][NEON] Implementing vmaxnmQ_ST and vminnmQ_ST intrinsincs.

2016-11-02 Thread Bin.Cheng
On Tue, Nov 1, 2016 at 12:21 PM, Kyrill Tkachov
 wrote:
> Hi Tamar,
>
>
> On 26/10/16 16:01, Tamar Christina wrote:
>>
>> Hi Christophe,
>>
>> Here's the updated patch.
>>
>> Cheers,
>> Tamar
>> 
>> From: Christophe Lyon 
>> Sent: Wednesday, October 19, 2016 11:23:56 AM
>> To: Tamar Christina
>> Cc: GCC Patches; Kyrylo Tkachov; nd
>> Subject: Re: [PATCH v2][AArch32][NEON] Implementing vmaxnmQ_ST and
>> vminnmQ_ST intrinsincs.
>>
>> On 19 October 2016 at 11:36, Tamar Christina 
>> wrote:
>>>
>>> Hi All,
>>>
>>> This patch implements the vmaxnmQ_ST and vminnmQ_ST intrinsics. The
>>> current builtin registration code is deficient since it can't access
>>> standard pattern names, to which vmaxnmQ_ST and vminnmQ_ST map
>>> directly. Thus, to enable the vectoriser to have access to these
>>> intrinsics, we implement them using builtin functions, which we
>>> expand to the proper standard pattern using a define_expand.
>>>
>>> This patch also implements the __ARM_FEATURE_NUMERIC_MAXMIN macro,
>>> which is defined when __ARM_ARCH >= 8, and which enables the
>>> intrinsics.
>>>
>>> Regression tested on arm-none-eabi and no regressions.
>>>
>>> This patch is a rework of a previous patch:
>>> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01971.html
>>>
>>> OK for trunk?
These cases failed on arm-none-linux-gnueabihf as below:
FAIL: gcc.target/arm/simd/vmaxnm_f32_1.c execution test
FAIL: gcc.target/arm/simd/vmaxnmq_f32_1.c execution test
FAIL: gcc.target/arm/simd/vminnm_f32_1.c execution test
FAIL: gcc.target/arm/simd/vminnmq_f32_1.c execution test

For such changes, I would suggest reg test for both bare-metal and
linux toolchains, plus a bootstrap for linux toolchain.

Thanks,
bin

>
>
> Ok.
> Thanks,
> Kyrill
>
>
>>> Thanks,
>>> Tamar
>>>
>>> ---
>>>
>>> gcc/
>>>
>>> 2016-10-19  Bilyan Borisov  
>>>  Tamar Christina 
>>>
>>>  * config/arm/arm-c.c (arm_cpu_builtins): New macro definition.
>>>  * config/arm/arm_neon.h (vmaxnm_f32): New intrinsinc.
>>>  (vmaxnmq_f32): Likewise.
>>>  (vminnm_f32): Likewise.
>>>  (vminnmq_f32): Likewise.
>>>  * config/arm/arm_neon_builtins.def (vmaxnm): New builtin.
>>>  (vminnm): Likewise.
>>>  * config/arm/neon.md (neon_, VCVTF): New
>>>  expander.
>>>
>>> gcc/testsuite/
>>>
>>> 2016-10-19  Bilyan Borisov  
>>>
>>>  * gcc.target/arm/simd/vmaxnm_f32_1.c: New.
>>>  * gcc.target/arm/simd/vmaxnmq_f32_1.c: Likewise.
>>>  * gcc.target/arm/simd/vminnm_f32_1.c: Likewise.
>>>  * gcc.target/arm/simd/vminnmq_f32_1.c: Likewise.
>>>
>> I think you forgot to attach the new tests.
>>
>> Christophe
>>
>


Re: [PATCH testsuite]Require vect_cond_mixed for test case gcc.dg/vect/pr56541.c

2016-11-02 Thread Bin.Cheng
On Wed, Nov 2, 2016 at 11:08 AM, Richard Biener
 wrote:
> On Tue, Nov 1, 2016 at 4:24 PM, Bin.Cheng  wrote:
>> On Thu, Aug 11, 2016 at 11:38 AM, Richard Biener
>>  wrote:
>>> On Thu, Aug 11, 2016 at 11:56 AM, Bin.Cheng  wrote:
 On Thu, Aug 11, 2016 at 10:50 AM, Richard Biener
  wrote:
> On Wed, Aug 10, 2016 at 5:58 PM, Bin Cheng  wrote:
>> Hi,
>> Due to some reasons, tree-if-conv.c now factors floating point 
>> comparison out of cond_expr,
>> resulting in mixed types in it.  This does help CSE on common comparison 
>> operations.
>> Only problem is that test gcc.dg/vect/pr56541.c now requires 
>> vect_cond_mixed to be
>> vectorized.  This patch changes the test in that way.
>> Test result checked.  Is it OK?
>
> Hmm, I think the fix is to fix if-conversion not doing that.  Can you
> track down why this happens?
 Hmm, but there are several common floating comparison operations in
 the case, by doing this, we could do CSE on GIMPLE, otherwise we
 depends on RTL optimizers.
>>>
>>> I see.
>>>
 I thought we prefer GIMPLE level
 transforms?
>>>
>>> Yes, but the vectorizer is happier with the conditions present in the 
>>> COND_EXPR
>>> and thus we concluded we always want to have them there.  forwprop will
>>> also aggressively put them back.  Note that we cannot put back
>>> tree_could_throw_p
>>> conditions (FP compares with signalling nans for example) to properly model 
>>> EH
>>> (though for VEC_COND_EXPRs we don't really care here).
>>>
>>> Note that nothing between if-conversion and vectorization will perform
>>> the desired
>>> CSE anyway.
>> Hi Richard,
>> I looked into this one and found it was not if-conv factors cond_expr
>> out.  For test case:
>>
>>   for (i=0; i!=1024; ++i)
>> {
>>   float rR = a*z[i];
>>   float rL = b*z[i];
>>   float rMin = (rR>   float rMax = (rR>   rMin = (rMax>0) ? rMin : rBig;
>>   rMin = (rMin>0) ? rMin : rMax;
>>   ok[i] = rMin-c> }
>> Dump before jump threading is like:
>>
>>   :
>>   # iftmp.3_12 = PHI 
>>   if (iftmp.3_12 > 0.0)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>
>>   :
>>   # iftmp.4_13 = PHI 
>>   if (iftmp.4_13 > 0.0)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>
>>   :
>>   # iftmp.5_14 = PHI 
>>
>> Jump thread in dom pass threads edges (bb7 -> bb8 -> ... bb11) to (bb6
>> -> bb12 -> bb9) as below:
>>
>>   :
>>   # iftmp.3_12 = PHI 
>>   # iftmp.2_23 = PHI 
>>   if (iftmp.3_12 > 0.0)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>   # iftmp.4_13 = PHI 
>>   if (iftmp.4_13 > 0.0)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>
>>   :
>>   # iftmp.5_14 = PHI 
>>
>>   //...
>>   :
>>   # iftmp.4_22 = PHI <1.5e+2(6)>
>>   goto ;
>>
>> This transform saves one comparison on the path, but creates multi-arg
>> phi, resulting in cond_expr being factored out.
>>
>> Looks like threading corrupts vectorization opportunity for target
>> doesn't support vect_cond_mixed, but I guess it's hard to tell in
>> threading itself.  Any ideas?
>
> First of all I wonder why MIN/MAX are not pattern matched earlier...
> (I'd generally avoid threading through them).
Hmm, this case is not MIN/MAX cases, though it uses rMin/rMax as
variable names.  So it cannot be pattern matched.

As for below cases you mentioned.  Since I am introducing
more/advanced pattern in match.pd and adding call to fold_stmt in
fold_stmt, I think they can be addressed in this way too, I will keep
this in mind in my cond_expr pattern work.  So we don't need to go
that far to vector pattern matches.

Thanks,
bin
>
> But yes, I see that if-conversion can cause factored out conditons.
> Also for the simple case of ! condition:
>
> _49 = iftmp.3_12 > 0.0;
>   _50 = ~_49;
>   iftmp.5_14 = _50 ? 1.5e+2 : _ifc__48;
>
> but those should have been eliminated by swapping operands (looks like
> that doesn't reliably work).
>
> In the end we have the bool patterns in the vectorizer to recover from this,
> so I wonder why that doesn't work then.  It should re-write the above to
>
>  _111 = iftmp.3_12 > 0.0 ? 1 : 0;  // of proper type
>  _112 = ~ _111;
>  iftmp.5_14 = _112 != 0 ? ;
>
> Richard.
>
>> Thanks,
>> bin


Re: [PATCH][rtlanal.c] Convert conditional compilation on WORD_REGISTER_OPERATIONS

2016-11-02 Thread Kyrill Tkachov

Hi Eric,

On 02/11/16 10:47, Eric Botcazou wrote:

This converts the preprocessor check for WORD_REGISTER_OPERATIONS into a
runtime one in rtlanal.c.

Since this one was in combination with an "#if defined" and used to guard an
if-statement I'd appreciate it if someone gave it a double-check that I
dind't screw up the intended behaviour.

Unfortunately I think you did, as the old version was:

#if WORD_REGISTER_OPERATIONS && defined (LOAD_EXTEND_OP)
  /* If this is a typical RISC machine, we only have to worry
 about the way loads are extended.  */
  if ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
   ? val_signbit_known_set_p (inner_mode, nonzero)
   : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
  || !MEM_P (SUBREG_REG (x)))
#endif

and the new version is:

#ifdef LOAD_EXTEND_OP
  /* If this is a typical RISC machine, we only have to worry
 about the way loads are extended.  */
  if (WORD_REGISTER_OPERATIONS
  && ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
 ? val_signbit_known_set_p (inner_mode, nonzero)
 : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
   || !MEM_P (SUBREG_REG (x
#endif

So if WORD_REGISTER_OPERATIONS is zero and LOAD_EXTEND_OP is defined, for
example on PowerPC, the block guarded by the condition is always executed in
the former case but never in the latter case.


I think you're right. I suppose the new condition should be:

#ifdef LOAD_EXTEND_OP
  /* If this is a typical RISC machine, we only have to worry
 about the way loads are extended.  */
  if (!WORD_REGISTER_OPERATIONS
  || ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
 ? val_signbit_known_set_p (inner_mode, nonzero)
 : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
   || !MEM_P (SUBREG_REG (x
#endif

Would you prefer me to make this change or just revert the patch?

Thanks,
Kyrill




Re: [PATCH testsuite]Require vect_cond_mixed for test case gcc.dg/vect/pr56541.c

2016-11-02 Thread Richard Biener
On Tue, Nov 1, 2016 at 4:24 PM, Bin.Cheng  wrote:
> On Thu, Aug 11, 2016 at 11:38 AM, Richard Biener
>  wrote:
>> On Thu, Aug 11, 2016 at 11:56 AM, Bin.Cheng  wrote:
>>> On Thu, Aug 11, 2016 at 10:50 AM, Richard Biener
>>>  wrote:
 On Wed, Aug 10, 2016 at 5:58 PM, Bin Cheng  wrote:
> Hi,
> Due to some reasons, tree-if-conv.c now factors floating point comparison 
> out of cond_expr,
> resulting in mixed types in it.  This does help CSE on common comparison 
> operations.
> Only problem is that test gcc.dg/vect/pr56541.c now requires 
> vect_cond_mixed to be
> vectorized.  This patch changes the test in that way.
> Test result checked.  Is it OK?

 Hmm, I think the fix is to fix if-conversion not doing that.  Can you
 track down why this happens?
>>> Hmm, but there are several common floating comparison operations in
>>> the case, by doing this, we could do CSE on GIMPLE, otherwise we
>>> depends on RTL optimizers.
>>
>> I see.
>>
>>> I thought we prefer GIMPLE level
>>> transforms?
>>
>> Yes, but the vectorizer is happier with the conditions present in the 
>> COND_EXPR
>> and thus we concluded we always want to have them there.  forwprop will
>> also aggressively put them back.  Note that we cannot put back
>> tree_could_throw_p
>> conditions (FP compares with signalling nans for example) to properly model 
>> EH
>> (though for VEC_COND_EXPRs we don't really care here).
>>
>> Note that nothing between if-conversion and vectorization will perform
>> the desired
>> CSE anyway.
> Hi Richard,
> I looked into this one and found it was not if-conv factors cond_expr
> out.  For test case:
>
>   for (i=0; i!=1024; ++i)
> {
>   float rR = a*z[i];
>   float rL = b*z[i];
>   float rMin = (rR   float rMax = (rR   rMin = (rMax>0) ? rMin : rBig;
>   rMin = (rMin>0) ? rMin : rMax;
>   ok[i] = rMin-c }
> Dump before jump threading is like:
>
>   :
>   # iftmp.3_12 = PHI 
>   if (iftmp.3_12 > 0.0)
> goto ;
>   else
> goto ;
>
>   :
>
>   :
>   # iftmp.4_13 = PHI 
>   if (iftmp.4_13 > 0.0)
> goto ;
>   else
> goto ;
>
>   :
>
>   :
>   # iftmp.5_14 = PHI 
>
> Jump thread in dom pass threads edges (bb7 -> bb8 -> ... bb11) to (bb6
> -> bb12 -> bb9) as below:
>
>   :
>   # iftmp.3_12 = PHI 
>   # iftmp.2_23 = PHI 
>   if (iftmp.3_12 > 0.0)
> goto ;
>   else
> goto ;
>
>   :
>   # iftmp.4_13 = PHI 
>   if (iftmp.4_13 > 0.0)
> goto ;
>   else
> goto ;
>
>   :
>
>   :
>   # iftmp.5_14 = PHI 
>
>   //...
>   :
>   # iftmp.4_22 = PHI <1.5e+2(6)>
>   goto ;
>
> This transform saves one comparison on the path, but creates multi-arg
> phi, resulting in cond_expr being factored out.
>
> Looks like threading corrupts vectorization opportunity for target
> doesn't support vect_cond_mixed, but I guess it's hard to tell in
> threading itself.  Any ideas?

First of all I wonder why MIN/MAX are not pattern matched earlier...
(I'd generally avoid threading through them).

But yes, I see that if-conversion can cause factored out conditons.
Also for the simple case of ! condition:

_49 = iftmp.3_12 > 0.0;
  _50 = ~_49;
  iftmp.5_14 = _50 ? 1.5e+2 : _ifc__48;

but those should have been eliminated by swapping operands (looks like
that doesn't reliably work).

In the end we have the bool patterns in the vectorizer to recover from this,
so I wonder why that doesn't work then.  It should re-write the above to

 _111 = iftmp.3_12 > 0.0 ? 1 : 0;  // of proper type
 _112 = ~ _111;
 iftmp.5_14 = _112 != 0 ? ;

Richard.

> Thanks,
> bin


Re: [PATCH 5/5] [AARCH64] Add variant support to -m="native"and add thunderxt88p1.

2016-11-02 Thread James Greenhalgh
On Tue, Nov 01, 2016 at 11:08:53AM -0700, Andrew Pinski wrote:
> On Tue, Nov 17, 2015 at 2:10 PM, Andrew Pinski  wrote:
> > Since ThunderX T88 pass 1 (variant 0) is a ARMv8 part while pass 2 (variant 
> > 1)
> > is an ARMv8.1 part, I needed to add detecting of the variant also for this
> > difference. Also I simplify a little bit and combined the single core and
> > arch detecting cases so it would be easier to add variant.
> 
> Actually it is a bit more complex than what I said here, see below for
> the full table of options and what are enabled/disabled now.
> 
> > OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.
> > Tested -mcpu=native on both T88 pass 1 and T88 pass 2 to make sure it is
> > deecting the two seperately.
> 
> 
> Here is the final patch in this series updated; I changed the cpu name
> slightly and made sure I updated invoke.texi too.
> 
> The names are going to match the names in LLVM (worked with our LLVM
> engineer here at Cavium about the names).
> Here are the names recorded and
> -mpcu=thunderx:
> *Matches part num 0xA0 (reserved for ThunderX 8x series)
> *T88 Pass 2 scheduling
> *Hardware prefetching (software prefetching disabled)
> *LSE enabled
> *no v8.1

This doesn't match the current LLVM proposal
( https://reviews.llvm.org/D24540 ) which enables full ARMv8.1-A support
for -mcpu=thunderx.

> -mcpu=thunderxt88:
> *Matches part num 0xA1
> *T88 Pass 2 scheduling
> *software prefetching enabled
> *LSE enabled
> *no v8.1
> 
> -mcpu=thunderxt88p1 (only for GCC):
> *Matches part num 0xA1, variant 0
> *T88 Pass 1 scheduling
> *software prefetching enabled
> *no LSE enabled
> *no v8.1
> 
> -mcpu=thunderxt81 and -mcpu=thunderxt83:
> *Matches part num 0xA2/0xA3
> *T88 Pass 2 scheduling
> *Hardware prefetching (software prefetching disabled)
> *LSE enabled
> *v8.1

This looks like what has been added to LLVM as -mcpu=thunderx.

> I have not hooked up software vs hardware prefetching and the
> scheduler parts (the next patch will do part of that); both ARMv8.1-a
> and LSE parts are hooked up as those parts are only in
> aarch64-cores.def.
> 
> OK?  Bootstrapped and tested on ThunderX T88 and ThunderX T81
> (aarch64-linux-gnu).
> 
> Index: common/config/aarch64/aarch64-common.c
> ===
> --- common/config/aarch64/aarch64-common.c(revision 241727)
> +++ common/config/aarch64/aarch64-common.c(working copy)
> @@ -145,7 +145,7 @@ struct arch_to_arch_name
> the default set of architectural feature flags they support.  */
>  static const struct processor_name_to_arch all_cores[] =
>  {
> -#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART) \
> +#define AARCH64_CORE(NAME, X, IDENT, ARCH_IDENT, FLAGS, COSTS, IMP, PART, 
> VARIANT) \
>{NAME, AARCH64_ARCH_##ARCH_IDENT, FLAGS},
>  #include "config/aarch64/aarch64-cores.def"
>{"generic", AARCH64_ARCH_8A, AARCH64_FL_FOR_ARCH8},
> Index: config/aarch64/aarch64-cores.def
> ===
> --- config/aarch64/aarch64-cores.def  (revision 241727)
> +++ config/aarch64/aarch64-cores.def  (working copy)
> @@ -21,7 +21,7 @@
>  
> Before using #include to read this file, define a macro:
>  
> -  AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH_IDENT, 
> FLAGS, COSTS, IMP, PART)
> +  AARCH64_CORE(CORE_NAME, CORE_IDENT, SCHEDULER_IDENT, ARCH_IDENT, 
> FLAGS, COSTS, IMP, PART, VARIANT)
>  
> The CORE_NAME is the name of the core, represented as a string constant.
> The CORE_IDENT is the name of the core, represented as an identifier.
> @@ -39,39 +39,45 @@
> PART is the part number of the CPU.  On a GNU/Linux system it can be
> found in /proc/cpuinfo.  For big.LITTLE systems this should use the
> macro AARCH64_BIG_LITTLE where the big part number comes as the first
> -   argument to the macro and little is the second.  */
> +   argument to the macro and little is the second.
> +   VARIANT is the variant of the CPU.  In a GNU/Linux system it can found
> +   in /proc/cpuinfo.  If this is -1, this means it can match any variant.  */
>  
>  /* V8 Architecture Processors.  */
>  
>  /* ARM ('A') cores. */
> -AARCH64_CORE("cortex-a35",  cortexa35, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC, cortexa35, 0x41, 0xd04)
> -AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC, cortexa53, 0x41, 0xd03)
> -AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC, cortexa57, 0x41, 0xd07)
> -AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | AARCH64_FL_CRC, cortexa72, 0x41, 0xd08)
> -AARCH64_CORE("cortex-a73",  cortexa73, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 
> | 

[PATCH] Fix store-merging alias check, apply some TLC

2016-11-02 Thread Richard Biener

This fixes the alias check in terminate_all_aliasing_chains -- the
base we use for the hash table indexing does not constitute an object
that aliases all stores in the chain (consider for example the MEM_REF
handling which strips the offset completely).

I've refactored data structures a bit in the process of making
(as a followup) 'base_addr' a true address (and thus avoid building
that new MEM_REF for example).  A followup will then try to support
(some) bases with offset != NULL_TREE.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-11-02  Richard Biener  

* gimple-ssa-store-merging.c (struct store_immediate_info): Remove
redundant val and dest members.
(store_immediate_info::store_immediate_info): Adjust.
(merged_store_group::merged_store_group): Adjust.
(merged_store_group::apply_stores): Likewise.
(struct imm_store_chain_info): Add base_addr field.
(imm_store_chain_info::imm_store_chain_info): New constructor.
(imm_store_chain_info::terminate_and_process_chain): Do not pass base.
(imm_store_chain_info::output_merged_store): Likewise.
(imm_store_chain_info::output_merged_stores): Likewise.
(pass_tree_store_merging::terminate_all_aliasing_chains): Take
imm_store_chain_info instead of base.  Fix alias check.
(pass_tree_store_merging::terminate_and_release_chain): Likewise.
(imm_store_chain_info::coalesce_immediate_stores): Adjust.

Index: gcc/gimple-ssa-store-merging.c
===
--- gcc/gimple-ssa-store-merging.c  (revision 241779)
+++ gcc/gimple-ssa-store-merging.c  (working copy)
@@ -140,19 +140,17 @@ struct store_immediate_info
 {
   unsigned HOST_WIDE_INT bitsize;
   unsigned HOST_WIDE_INT bitpos;
-  tree val;
-  tree dest;
   gimple *stmt;
   unsigned int order;
-  store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT, tree,
-   tree, gimple *, unsigned int);
+  store_immediate_info (unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+   gimple *, unsigned int);
 };
 
 store_immediate_info::store_immediate_info (unsigned HOST_WIDE_INT bs,
-   unsigned HOST_WIDE_INT bp, tree v,
-   tree d, gimple *st,
+   unsigned HOST_WIDE_INT bp,
+   gimple *st,
unsigned int ord)
-  : bitsize (bs), bitpos (bp), val (v), dest (d), stmt (st), order (ord)
+  : bitsize (bs), bitpos (bp), stmt (st), order (ord)
 {
 }
 
@@ -557,7 +555,7 @@ merged_store_group::merged_store_group (
   /* VAL has memory allocated for it in apply_stores once the group
  width has been finalized.  */
   val = NULL;
-  align = get_object_alignment (info->dest);
+  align = get_object_alignment (gimple_assign_lhs (info->stmt));
   stores.create (1);
   stores.safe_push (info);
   last_stmt = info->stmt;
@@ -654,14 +652,16 @@ merged_store_group::apply_stores ()
   FOR_EACH_VEC_ELT (stores, i, info)
 {
   unsigned int pos_in_buffer = info->bitpos - start;
-  bool ret = encode_tree_to_bitpos (info->val, val, info->bitsize,
-pos_in_buffer, buf_size);
+  bool ret = encode_tree_to_bitpos (gimple_assign_rhs1 (info->stmt),
+   val, info->bitsize,
+   pos_in_buffer, buf_size);
   if (dump_file && (dump_flags & TDF_DETAILS))
{
  if (ret)
{
  fprintf (dump_file, "After writing ");
- print_generic_expr (dump_file, info->val, 0);
+ print_generic_expr (dump_file,
+ gimple_assign_rhs1 (info->stmt), 0);
  fprintf (dump_file, " of size " HOST_WIDE_INT_PRINT_DEC
" at position %d the merged region contains:\n",
info->bitsize, pos_in_buffer);
@@ -680,13 +680,15 @@ merged_store_group::apply_stores ()
 
 struct imm_store_chain_info
 {
+  tree base_addr;
   auto_vec m_store_info;
   auto_vec m_merged_store_groups;
 
-  bool terminate_and_process_chain (tree);
+  imm_store_chain_info (tree b_a) : base_addr (b_a) {}
+  bool terminate_and_process_chain ();
   bool coalesce_immediate_stores ();
-  bool output_merged_store (tree, merged_store_group *);
-  bool output_merged_stores (tree);
+  bool output_merged_store (merged_store_group *);
+  bool output_merged_stores ();
 };
 
 const pass_data pass_data_tree_store_merging = {
@@ -722,8 +724,9 @@ private:
   hash_map m_stores;
 
   bool terminate_and_process_all_chains ();
-  bool terminate_all_aliasing_chains (tree, tree, bool, gimple *);
-  bool terminate_and_release_chain (tree);
+  bool 

Re: [PATCH][rtlanal.c] Convert conditional compilation on WORD_REGISTER_OPERATIONS

2016-11-02 Thread Eric Botcazou
> This converts the preprocessor check for WORD_REGISTER_OPERATIONS into a
> runtime one in rtlanal.c.
> 
> Since this one was in combination with an "#if defined" and used to guard an
> if-statement I'd appreciate it if someone gave it a double-check that I
> dind't screw up the intended behaviour.

Unfortunately I think you did, as the old version was:

#if WORD_REGISTER_OPERATIONS && defined (LOAD_EXTEND_OP)
  /* If this is a typical RISC machine, we only have to worry
 about the way loads are extended.  */
  if ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
   ? val_signbit_known_set_p (inner_mode, nonzero)
   : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
  || !MEM_P (SUBREG_REG (x)))
#endif

and the new version is:

#ifdef LOAD_EXTEND_OP
  /* If this is a typical RISC machine, we only have to worry
 about the way loads are extended.  */
  if (WORD_REGISTER_OPERATIONS
  && ((LOAD_EXTEND_OP (inner_mode) == SIGN_EXTEND
 ? val_signbit_known_set_p (inner_mode, nonzero)
 : LOAD_EXTEND_OP (inner_mode) != ZERO_EXTEND)
   || !MEM_P (SUBREG_REG (x
#endif

So if WORD_REGISTER_OPERATIONS is zero and LOAD_EXTEND_OP is defined, for 
example on PowerPC, the block guarded by the condition is always executed in 
the former case but never in the latter case.

-- 
Eric Botcazou


Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)

2016-11-02 Thread Steven Bosscher
 On Wed, Nov 2, 2016 at 10:02 AM, Richard Biener
 wrote:
> On Mon, Oct 31, 2016 at 4:35 PM, Segher Boessenkool wrote:
>> On Mon, Oct 31, 2016 at 04:09:48PM +0100, Steven Bosscher wrote:
>>> On Sun, Oct 30, 2016 at 8:10 PM, Segher Boessenkool wrote:
>>> > +  cfg_layout_finalize ();
>>>
>>> I don't think you have to go into/out-of cfglayout mode for each iteration.
>>
>> Yeah probably.  I was too lazy :-)  It needs the cleanup_cfg every
>> iteration though, I was afraid that interacts.
>
> Ick.  Why would it need a cfg-cleanup every iteration?

I don't believe it needs a cleanup on every iteration. One cleanup at
the end should work fine.

Ciao!
Steven


Re: [PATCH, gcc/ARM, ping] Add support for Cortex-M33

2016-11-02 Thread Kyrill Tkachov


On 02/11/16 10:07, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 26/10/16 17:42, Thomas Preudhomme wrote:

Hi,

This patch adds support for the Cortex-M33 processor launched by ARM [1]. The
patch adds support for the name and wires it up to the ARMv8-M Mainline with DSP
extensions architecture and arm_v7m_tune tuning parameters for the time being.
It also updates documentation to mention this new processor.

[1] http://www.arm.com/products/processors/cortex-m/cortex-m33-processor.php

ChangeLog entry is as follows:

*** gcc/Changelog ***

2016-10-26  Thomas Preud'homme 

* config/arm/arm-arches.def (armv8-m.main+dsp): Set Cortex-M33 as
representative core for this architecture.
* config/arm/arm-cores.def (cortex-m33): Define new processor.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Likewise.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M33 to the list of
valid -mcpu options.
* doc/invoke.texi (ARM Options): Document new Cortex-M33 processor.


Tested by building libgcc and libstdc++ for Cortex-M33 and running a hello world
compiled for it.

Is this ok for master?



Ok.
Thanks,
Kyrill


Best regards,

Thomas




Re: [PATCH, gcc/ARM, ping] Add support for Cortex-M23

2016-11-02 Thread Kyrill Tkachov


On 02/11/16 10:07, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 26/10/16 17:42, Thomas Preudhomme wrote:

Hi,

This patch adds support for the Cortex-M23 processor launched by ARM [1]. The
patch adds support for the name and wires it up to the ARMv8-M Baseline
architecture and arm_v6m_tune tuning parameters for the time being. It also
updates documentation to mention this new processor.

[1] http://www.arm.com/products/processors/cortex-m/cortex-m23-processor.php

ChangeLog entry is as follows:

*** gcc/Changelog ***

2016-10-26  Thomas Preud'homme 

* config/arm/arm-arches.def (armv8-m.base): Set Cortex-M23 as
representative core for this architecture.
* config/arm/arm-cores.def (cortex-m23): Define new processor.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Likewise.
* config/arm/arm.c (arm_v6m_tune): Add Cortex-M23 to the list of cores
this tuning parameters apply to in the comment.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M23 to the list of
valid -mcpu options.
* doc/invoke.texi (ARM Options): Document new Cortex-M23 processor.


Tested by building libgcc and libstdc++ for Cortex-M23 and running a hello world
compiled for it.

Is this ok for master?



Ok.
Thanks,
Kyrill


Best regards,

Thomas




Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Jakub Jelinek
On Wed, Nov 02, 2016 at 10:59:26AM +0100, Jakub Jelinek wrote:
> > Which is gimplified as:
> > 
> > int * ptr;
> > 
> > switch (argc) , case 1: >
> > {
> >   int a;
> > 
> >   try
> > {
> >   ASAN_MARK (2, , 4);
> >   :
> >   goto ;
> >   :
> >   ptr = 
> >   goto ;
> > }
> >   finally
> > {
> >   ASAN_MARK (1, , 4);
> > }

> Shouldn't there be also ASAN_MARK on the implicit gotos from the switch
> statement?  Otherwise, consider this being done in a loop, after the first
> iteration you ASAN_MARK (1, , 4) (i.e. poison), then you iterate say with
> args 1 and have in case 1: a = 0;, won't that trigger runtime error?

Wonder if for the variables declared inside of switch body, because we don't
care about uses before scope, it wouldn't be more efficient to arrange for
int *p, *q, *r;
switch (x)
  {
int a;
  case 1:
p = 
a = 5;
break;
int b;
  case 2:
int c;
q = 
r = 
b = 3;
c = 4;
break;
  }
to effectively ASAN_MARK (2, , 4); ASAN_MARK (2, , 4); ASAN_MARK (2, , 4);
before the GIMPLE_SWITCH, instead of unpoisoning them on every case label
where they might be in scope.  Though, of course, at least until lower pass
that is quite ugly, because it would refer to "not yet declared" variables.
Perhaps we'd need to move the ASAN_MARK and GIMPLE_SWITCH (but of course not
the expression evaluation of the switch control expression) inside of the
switches' GIMPLE_BIND.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Martin Liška
On 11/02/2016 10:59 AM, Jakub Jelinek wrote:
> On Wed, Nov 02, 2016 at 10:36:44AM +0100, Martin Liška wrote:
>> On 11/01/2016 03:53 PM, Jakub Jelinek wrote:
>>> What kind of false positives it is for each case?  Is it with normal
>>> asan-bootstrap (without your -fsanitize-use-after-scope changes), or
>>> only with those changes, or only with those changes and
>>> -fsanitize-use-after-scope used during bootstrap?
>>
>> Ok, the situation is simpler than I thought:
> 
> CCing also Marek.
>>
>> #include 
>>
>> int main(int argc, char **argv)
>> {
>>   int *ptr;
>>
>>   switch (argc)
>> {
>>   int a;
>>
>> case 1:
>>   break;
>>
>> default:
>>   ptr = 
>>   break;
>> }
>>
>>   fprintf (stderr, "v: %d\n", *ptr);
>>   return 0;
>> }
>>
>> Which is gimplified as:
>>
>> int * ptr;
>>
>> switch (argc) , case 1: >
>> {
>>   int a;
>>
>>   try
>> {
>>   ASAN_MARK (2, , 4);
>>   :
>>   goto ;
>>   :
>>   ptr = 
>>   goto ;
>> }
>>   finally
>> {
>>   ASAN_MARK (1, , 4);
>> }
>> }
>> :
>> _1 = *ptr;
>> stderr.0_2 = stderr;
>> fprintf (stderr.0_2, "v: %d\n", _1);
>> D.2577 = 0;
>> return D.2577;
>>   }
>>   D.2577 = 0;
>>   return D.2577;
>>
>> and thus we get:
>> /tmp/switch-case.c:9:11: warning: statement will never be executed 
>> [-Wswitch-unreachable]
>>int a;
>>
>> I'm wondering where properly fix that, we can either find all these 
>> ASAN_MARKs in gimplify_switch_expr
>> and distribute it to all labels (which are gimplified). Or we can put such 
>> variables to asan_poisoned_variables
>> if we have information that we're gimplifing statements before a first 
>> label. Do we know that from gimple context?
>> If so, these variables will be unpoisoned at the very beginning of each 
>> label and the ASAN_MARK call in between
>> switch statement and a first label can be removed.
> 
> Wouldn't it be easiest if -Wswitch-unreachable warning just ignored
> the ASAN_MARK internal calls altogether?
> Do you emit there any other statements, or just ASAN_MARK and nothing else?

Yep, skipping warning can be done easily, however gimplified code is wrong as
un-poisoning is not done for variable (even for a valid program).

> 
> Shouldn't there be also ASAN_MARK on the implicit gotos from the switch
> statement?  Otherwise, consider this being done in a loop, after the first
> iteration you ASAN_MARK (1, , 4) (i.e. poison), then you iterate say with
> args 1 and have in case 1: a = 0;, won't that trigger runtime error?

Hopefully having the un-poisoning call be encapsulated in finally block would 
properly
clean up the variable. Or am I wrong?

Martin

> 
>   Jakub
> 



Re: [PATCH, gcc/ARM, ping] Add support for Cortex-M23

2016-11-02 Thread Thomas Preudhomme

Ping?

Best regards,

Thomas

On 26/10/16 17:42, Thomas Preudhomme wrote:

Hi,

This patch adds support for the Cortex-M23 processor launched by ARM [1]. The
patch adds support for the name and wires it up to the ARMv8-M Baseline
architecture and arm_v6m_tune tuning parameters for the time being. It also
updates documentation to mention this new processor.

[1] http://www.arm.com/products/processors/cortex-m/cortex-m23-processor.php

ChangeLog entry is as follows:

*** gcc/Changelog ***

2016-10-26  Thomas Preud'homme  

* config/arm/arm-arches.def (armv8-m.base): Set Cortex-M23 as
representative core for this architecture.
* config/arm/arm-cores.def (cortex-m23): Define new processor.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Likewise.
* config/arm/arm.c (arm_v6m_tune): Add Cortex-M23 to the list of cores
this tuning parameters apply to in the comment.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M23 to the list of
valid -mcpu options.
* doc/invoke.texi (ARM Options): Document new Cortex-M23 processor.


Tested by building libgcc and libstdc++ for Cortex-M23 and running a hello world
compiled for it.

Is this ok for master?

Best regards,

Thomas
diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 4b196a7d1188de5eca028e5c2597bbc20835201f..9293429b3f9a026bcdacc1651c534bdf14d4df1e 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -69,7 +69,7 @@ ARM_ARCH ("armv8.2-a", cortexa53,  8A,
 ARM_ARCH ("armv8.2-a+fp16", cortexa53,  8A,
 	  ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A,
 			 FL2_FOR_ARCH8_2A | FL2_FP16INST))
-ARM_ARCH("armv8-m.base", cortexm0, 8M_BASE,
+ARM_ARCH("armv8-m.base", cortexm23, 8M_BASE,
 	 ARM_FSET_MAKE_CPU1 (			  FL_FOR_ARCH8M_BASE))
 ARM_ARCH("armv8-m.main", cortexm7, 8M_MAIN,
 	 ARM_FSET_MAKE_CPU1(FL_CO_PROC |	  FL_FOR_ARCH8M_MAIN))
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 2072e1e6f8d84533deead24e6fb0b6aff7603f24..940b5de82f0340fc0c26be80d47729bc1f193db0 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -166,6 +166,7 @@ ARM_CORE("cortex-a15.cortex-a7", cortexa15cortexa7, cortexa7,	7A,	ARM_FSET_MAKE_
 ARM_CORE("cortex-a17.cortex-a7", cortexa17cortexa7, cortexa7,	7A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV | FL_FOR_ARCH7A), cortex_a12)
 
 /* V8 Architecture Processors */
+ARM_CORE("cortex-m23",	cortexm23, cortexm23,	8M_BASE, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_FOR_ARCH8M_BASE), v6m)
 ARM_CORE("cortex-a32",	cortexa32, cortexa53,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a35)
 ARM_CORE("cortex-a35",	cortexa35, cortexa53,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a35)
 ARM_CORE("cortex-a53",	cortexa53, cortexa53,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a53)
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index ee9e3bb7ec57e0e8f2f15b83442711b9faf82d20..de712924afd33ba1e6e65cb56a5b260858d0cc4f 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -307,6 +307,9 @@ EnumValue
 Enum(processor_type) String(cortex-a17.cortex-a7) Value(cortexa17cortexa7)
 
 EnumValue
+Enum(processor_type) String(cortex-m23) Value(cortexm23)
+
+EnumValue
 Enum(processor_type) String(cortex-a32) Value(cortexa32)
 
 EnumValue
diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
index 594ce9d1734451f89812200191cb35f1f579289e..46c2c9258bcad43618a50f6201414fa084cb5b56 100644
--- a/gcc/config/arm/arm-tune.md
+++ b/gcc/config/arm/arm-tune.md
@@ -32,9 +32,9 @@
 	cortexr4f,cortexr5,cortexr7,
 	cortexr8,cortexm7,cortexm4,
 	cortexm3,marvell_pj4,cortexa15cortexa7,
-	cortexa17cortexa7,cortexa32,cortexa35,
-	cortexa53,cortexa57,cortexa72,
-	cortexa73,exynosm1,qdf24xx,
-	xgene1,cortexa57cortexa53,cortexa72cortexa53,
-	cortexa73cortexa35,cortexa73cortexa53"
+	cortexa17cortexa7,cortexm23,cortexa32,
+	cortexa35,cortexa53,cortexa57,
+	cortexa72,cortexa73,exynosm1,
+	qdf24xx,xgene1,cortexa57cortexa53,
+	cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
 	(const (symbol_ref "((enum attr_tune) arm_tune)")))
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 022c1d72a1272e56397dc7e2018483e77f18b90d..39b2da05d2135c68032231bb7780104061355786 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2243,7 +2243,8 @@ const struct tune_params arm_cortex_m7_tune =
 };
 
 /* The arm_v6m_tune is duplicated from arm_cortex_tune, rather than
-   arm_v6t2_tune. It is used for cortex-m0, cortex-m1 and cortex-m0plus.  */
+   arm_v6t2_tune.  It is used for cortex-m0, cortex-m1, cortex-m0plus and
+   cortex-m23.  */
 const struct tune_params arm_v6m_tune =
 {
   arm_9e_rtx_costs,
diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
index 

Re: [PATCH, gcc/ARM, ping] Add support for Cortex-M33

2016-11-02 Thread Thomas Preudhomme

Ping?

Best regards,

Thomas

On 26/10/16 17:42, Thomas Preudhomme wrote:

Hi,

This patch adds support for the Cortex-M33 processor launched by ARM [1]. The
patch adds support for the name and wires it up to the ARMv8-M Mainline with DSP
extensions architecture and arm_v7m_tune tuning parameters for the time being.
It also updates documentation to mention this new processor.

[1] http://www.arm.com/products/processors/cortex-m/cortex-m33-processor.php

ChangeLog entry is as follows:

*** gcc/Changelog ***

2016-10-26  Thomas Preud'homme  

* config/arm/arm-arches.def (armv8-m.main+dsp): Set Cortex-M33 as
representative core for this architecture.
* config/arm/arm-cores.def (cortex-m33): Define new processor.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Likewise.
* config/arm/bpabi.h (BE8_LINK_SPEC): Add Cortex-M33 to the list of
valid -mcpu options.
* doc/invoke.texi (ARM Options): Document new Cortex-M33 processor.


Tested by building libgcc and libstdc++ for Cortex-M33 and running a hello world
compiled for it.

Is this ok for master?

Best regards,

Thomas
diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 9293429b3f9a026bcdacc1651c534bdf14d4df1e..cd79bc505853d4dda6cf2e58bdc2d129032befef 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -73,7 +73,7 @@ ARM_ARCH("armv8-m.base", cortexm23, 8M_BASE,
 	 ARM_FSET_MAKE_CPU1 (			  FL_FOR_ARCH8M_BASE))
 ARM_ARCH("armv8-m.main", cortexm7, 8M_MAIN,
 	 ARM_FSET_MAKE_CPU1(FL_CO_PROC |	  FL_FOR_ARCH8M_MAIN))
-ARM_ARCH("armv8-m.main+dsp", cortexm7, 8M_MAIN,
+ARM_ARCH("armv8-m.main+dsp", cortexm33, 8M_MAIN,
 	 ARM_FSET_MAKE_CPU1(FL_CO_PROC | FL_ARCH7EM | FL_FOR_ARCH8M_MAIN))
 ARM_ARCH("iwmmxt",  iwmmxt, 5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT))
 ARM_ARCH("iwmmxt2", iwmmxt2,5TE,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2))
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 940b5de82f0340fc0c26be80d47729bc1f193db0..ec63ee4abe54af06cd5531486f294f9a8dae71a1 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -168,6 +168,7 @@ ARM_CORE("cortex-a17.cortex-a7", cortexa17cortexa7, cortexa7,	7A,	ARM_FSET_MAKE_
 /* V8 Architecture Processors */
 ARM_CORE("cortex-m23",	cortexm23, cortexm23,	8M_BASE, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_FOR_ARCH8M_BASE), v6m)
 ARM_CORE("cortex-a32",	cortexa32, cortexa53,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a35)
+ARM_CORE("cortex-m33",	cortexm33, cortexm33,	8M_MAIN, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_ARCH7EM | FL_FOR_ARCH8M_MAIN), v7m)
 ARM_CORE("cortex-a35",	cortexa35, cortexa53,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a35)
 ARM_CORE("cortex-a53",	cortexa53, cortexa53,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a53)
 ARM_CORE("cortex-a57",	cortexa57, cortexa57,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index de712924afd33ba1e6e65cb56a5b260858d0cc4f..f7886b94be779fcba91506e77574662fe7188876 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -313,6 +313,9 @@ EnumValue
 Enum(processor_type) String(cortex-a32) Value(cortexa32)
 
 EnumValue
+Enum(processor_type) String(cortex-m33) Value(cortexm33)
+
+EnumValue
 Enum(processor_type) String(cortex-a35) Value(cortexa35)
 
 EnumValue
diff --git a/gcc/config/arm/arm-tune.md b/gcc/config/arm/arm-tune.md
index 46c2c9258bcad43618a50f6201414fa084cb5b56..e782baccf424e51ac19ef5f02d25ed4f4eb0541d 100644
--- a/gcc/config/arm/arm-tune.md
+++ b/gcc/config/arm/arm-tune.md
@@ -33,8 +33,9 @@
 	cortexr8,cortexm7,cortexm4,
 	cortexm3,marvell_pj4,cortexa15cortexa7,
 	cortexa17cortexa7,cortexm23,cortexa32,
-	cortexa35,cortexa53,cortexa57,
-	cortexa72,cortexa73,exynosm1,
-	qdf24xx,xgene1,cortexa57cortexa53,
-	cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
+	cortexm33,cortexa35,cortexa53,
+	cortexa57,cortexa72,cortexa73,
+	exynosm1,qdf24xx,xgene1,
+	cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,
+	cortexa73cortexa53"
 	(const (symbol_ref "((enum attr_tune) arm_tune)")))
diff --git a/gcc/config/arm/bpabi.h b/gcc/config/arm/bpabi.h
index 302302f0d2d522fe282bb1d12687b53de72cae25..d45a1ca421901da25e16d965a9474438ea10f349 100644
--- a/gcc/config/arm/bpabi.h
+++ b/gcc/config/arm/bpabi.h
@@ -97,7 +97,7 @@
|march=armv8.2-a+fp16\
|march=armv8-m.base|mcpu=cortex-m23			\
|march=armv8-m.main	\
-   |march=armv8-m.main+dsp\
+   |march=armv8-m.main+dsp|mcpu=cortex-m33		\
:%{!r:--be8}}}"
 #else
 #define BE8_LINK_SPEC \
@@ -136,7 +136,7 @@
|march=armv8.2-a+fp16\
|march=armv8-m.base|mcpu=cortex-m23			\

Re: [PATCH, GCC/ARM 2/2, ping2] Allow combination of aprofile and rmprofile multilibs

2016-11-02 Thread Thomas Preudhomme

Ping?

Best regards,

Thomas

On 24/10/16 09:07, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 13/10/16 16:35, Thomas Preudhomme wrote:

Hi ARM maintainers,

This patchset aims at adding multilib support for R and M profile ARM
architectures and allowing it to be built alongside multilib for A profile ARM
architectures. This specific patch is concerned with the latter. The patch works
by moving the bits shared by both aprofile and rmprofile multilib build
(variable initilization as well as ISA and float ABI to build multilib for) to a
new t-multilib file. Then, based on which profile was requested in
--with-multilib-list option, that files includes t-aprofile and/or t-rmprofile
where the architecture and FPU to build the multilib for are specified.

Unfortunately the duplication of CPU to A profile architectures could not be
avoided because substitution due to MULTILIB_MATCHES are not transitive.
Therefore, mapping armv7-a to armv7 for rmprofile multilib build does not have
the expected effect. Two patches were written to allow this using 2 different
approaches but I decided against it because this is not the right solution IMO.
See caveats below for what I believe is the correct approach.


*** combined build caveats ***

As the documentation in this patch warns, there is a few caveats to using a
combined multilib build due to the way the multilib framework works.

1) For instance, when using only rmprofile the combination of options -mthumb
-march=armv7 -mfpu=neon the thumb/-march=armv7 multilib but in a combined
multilib build the default multilib would be used. This is because in the
rmprofile build -mfpu=neon is not specified in MULTILIB_OPTION and thus the
option is ignored when considering MULTILIB_REQUIRED entries.

2) Another issue is the fact that aprofile and rmprofile multilib build have
some conflicting requirements in terms of how to map options for which no
multilib is built to another option. (i) A first example of this is the
difference of CPU to architecture mapping mentionned above: rmprofile multilib
build needs A profile CPUs and architectures to be mapped down to ARMv7 so that
one of the v7-ar multilib gets chosen in such a case but aprofile needs A
profile architectures to stand on their own because multilibs are built for
several architectures.

(ii) Another example of this is that in aprofile multilib build no multilib is
built with -mfpu=fpv5-d16 but some multilibs are built with -mfpu=fpv4-d16.
Therefore, aprofile defines a match rule to map fpv5-d16 onto fpv4-d16. However,
rmprofile multilib profile *does* build some multilibs with -mfpu=fpv5-d16. This
has the consequence that when building for -mthumb -march=armv7e-m
-mfpu=fpv5-d16 -mfloat-abi=hard the default multilib is chosen because this is
rewritten into -mthumb -march=armv7e-m -mfpu=fpv5-d16 -mfloat-abi=hard and there
is no multilib for that.

Both of these issues could be handled by using MULTILIB_REUSE instead of
MULTILIB_MATCHES but this would require a large set of rules. I believe instead
the right approach is to create a new mechanism to inform GCC on how options can
be down mapped _when no multilib can be found_ which would require a smaller set
of rules and would make it explicit that the options are not equivalent. A patch
will be posted to this effect at a later time.

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2016-10-03  Thomas Preud'homme  

* config.gcc: Allow combinations of aprofile and rmprofile values for
--with-multilib-list.
* config/arm/t-multilib: New file.
* config/arm/t-aprofile: Remove initialization of MULTILIB_*
variables.  Remove setting of ISA and floating-point ABI in
MULTILIB_OPTIONS and MULTILIB_DIRNAMES.  Set architecture and FPU in
MULTI_ARCH_OPTS_A and MULTI_ARCH_DIRS_A rather than MULTILIB_OPTIONS
and MULTILIB_DIRNAMES respectively.  Add comment to introduce all
matches.  Add architecture matches for marvel-pj4 and generic-armv7-a
CPU options.
* config/arm/t-rmprofile: Likewise except for the matches changes.
* doc/install.texi (--with-multilib-list): Document the combination of
aprofile and rmprofile values and warn about pitfalls in doing that.


Testing:

* "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same before and after the
patchset for both aprofile and rmprofile
* "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same for aprofile,rmprofile
and rmprofile,aprofile
* default spec (gcc -dumpspecs) is the same for aprofile,rmprofile and
rmprofile,aprofile

* Difference in --print-multi-directory between aprofile or rmprofile and
aprofile,rmprofile for all combination of ISA (ARM/Thumb), architecture, CPU,
FPU and float ABI is as expected (best multilib for the combination is chosen),
modulo the caveat mentionned above and the new marvel-pj4 and generic-armv7-a
CPU to architecture mapping.


Is this ok for master?

Best 

Re: [PATCH, GCC/ARM 1/2, ping] Add multilib support for embedded bare-metal targets

2016-11-02 Thread Thomas Preudhomme

Ping?

Best regards,

Thomas

On 27/10/16 15:26, Thomas Preudhomme wrote:

Hi Kyrill,

On 27/10/16 10:45, Kyrill Tkachov wrote:

Hi Thomas,

On 24/10/16 09:06, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 13/10/16 16:35, Thomas Preudhomme wrote:

Hi ARM maintainers,

This patchset aims at adding multilib support for R and M profile ARM
architectures and allowing it to be built alongside multilib for A profile ARM
architectures. This specific patch adds the t-rmprofile multilib Makefile
fragment for the former objective. Multilib are built for all M profile
architecture involved: ARMv6S-M, ARMv7-M and ARMv7E-M as well as ARMv7. ARMv7
multilib is used for R profile architectures but also A profile architectures.

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2016-10-03  Thomas Preud'homme 

* config.gcc: Allow new rmprofile value for configure option
--with-multilib-list.
* config/arm/t-rmprofile: New file.
* doc/install.texi (--with-multilib-list): Document new rmprofile value
for ARM.


Testing:

== aprofile ==
* "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same before and after the
patchset for both aprofile and rmprofile
* default spec (gcc -dumpspecs) is the same before and after the patchset for
aprofile
* No difference in --print-multi-directory between before and after the
patchset
for aprofile for all combination of ISA (ARM/Thumb), architecture, CPU, FPU and
float ABI

== rmprofile ==
* aprofile and rmprofile use similar directory structure (ISA/arch/FPU/float
ABI) and directory naming
* Difference in --print-multi-directory between before [1] and after the
patchset for rmprofile for all combination of ISA (ARM/Thumb), architecture,
CPU, FPU and float ABI modulo the name and directory structure changes

[1] as per patch applied in ARM embedded branches
https://gcc.gnu.org/viewcvs/gcc/branches/ARM/embedded-5-branch/gcc/config/arm/t-baremetal?view=markup




== aprofile + rmprofile ==
* aprofile,rmprofile and rmprofile,aprofile builds give an error saying it is
not supported


Is this ok for master branch?

Best regards,

Thomas


+# Arch Matches
+MULTILIB_MATCHES   += march?armv6s-m=march?armv6-m
+MULTILIB_MATCHES   += march?armv8-m.main=march?armv8-m.main+dsp
+MULTILIB_MATCHES   += march?armv7=march?armv7-r
+ifeq (,$(HAS_APROFILE))
+MULTILIB_MATCHES   += march?armv7=march?armv7-a
+MULTILIB_MATCHES   += march?armv7=march?armv7ve
+MULTILIB_MATCHES   += march?armv7=march?armv8-a
+MULTILIB_MATCHES   += march?armv7=march?armv8-a+crc
+MULTILIB_MATCHES   += march?armv7=march?armv8.1-a
+MULTILIB_MATCHES   += march?armv7=march?armv8.1-a+crc
+endif

I think you want to update the patch to handle -march=armv8.2-a and
armv8.2-a+fp16
Thanks,
Kyrill


Indeed. Please find updated ChangeLog and patch (attached):

*** gcc/ChangeLog ***

2016-10-03  Thomas Preud'homme  

* config.gcc: Allow new rmprofile value for configure option
--with-multilib-list.
* config/arm/t-rmprofile: New file.
* doc/install.texi (--with-multilib-list): Document new rmprofile value
for ARM.

Ok for trunk?

Best regards,

Thomas
diff --git a/gcc/config.gcc b/gcc/config.gcc
index d956da22ad60abfe9c6b4be0882f9e7dd64ac39f..15b662ad5449f8b91eb760b7fbe45f33d8cecb4b 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3739,6 +3739,16 @@ case "${target}" in
 # pragmatic.
 tmake_profile_file="arm/t-aprofile"
 ;;
+			rmprofile)
+# Note that arm/t-rmprofile is a
+# stand-alone make file fragment to be
+# used only with itself.  We do not
+# specifically use the
+# TM_MULTILIB_OPTION framework because
+# this shorthand is more
+# pragmatic.
+tmake_profile_file="arm/t-rmprofile"
+;;
 			default)
 ;;
 			*)
@@ -3748,9 +3758,10 @@ case "${target}" in
 			esac
 
 			if test "x${tmake_profile_file}" != x ; then
-# arm/t-aprofile is only designed to work
-# without any with-cpu, with-arch, with-mode,
-# with-fpu or with-float options.
+# arm/t-aprofile and arm/t-rmprofile are only
+# designed to work without any with-cpu,
+# with-arch, with-mode, with-fpu or with-float
+# options.
 if test "x$with_arch" != x \
 || test "x$with_cpu" != x \
 || test "x$with_float" != x \
diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
new file mode 100644
index ..c8b5c9cbd03694eea69855e20372afa3e97d6b4c
--- /dev/null
+++ b/gcc/config/arm/t-rmprofile
@@ -0,0 +1,174 @@
+# Copyright (C) 2016 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it 

Re: [PATCH, ARM/testsuite 6/7, ping] Force soft float in ARMv6-M and ARMv8-M Baseline options

2016-11-02 Thread Thomas Preudhomme

Ping?

Best regards,

Thomas

On 28/10/16 10:49, Thomas Preudhomme wrote:

On 22/09/16 16:47, Richard Earnshaw (lists) wrote:

On 22/09/16 15:51, Thomas Preudhomme wrote:

Sorry, noticed an error in the patch. It was not caught during testing
because GCC was built with --with-mode=thumb. Correct patch attached.

Best regards,

Thomas

On 22/09/16 14:49, Thomas Preudhomme wrote:

Hi,

ARMv6-M and ARMv8-M Baseline only support soft float ABI. Therefore, the
arm_arch_v8m_base add option should pass -mfloat-abi=soft, much like
-mthumb is
passed for architectures that only support Thumb instruction set. This
patch
adds -mfloat-abi=soft to both arm_arch_v6m and arm_arch_v8m_base add
options.
Patch is in attachment.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2016-07-15  Thomas Preud'homme  

* lib/target-supports.exp (add_options_for_arm_arch_v6m): Add
-mfloat-abi=soft option.
(add_options_for_arm_arch_v8m_base): Likewise.


Is this ok for trunk?

Best regards,

Thomas


6_softfloat_testing_v6m_v8m_baseline.patch


diff --git a/gcc/testsuite/lib/target-supports.exp
b/gcc/testsuite/lib/target-supports.exp
index
0dabea0850124947a7fe333e0b94c4077434f278..b5d72f1283be6a6e4736a1d20936e169c1384398
100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3540,24 +3540,25 @@ proc check_effective_target_arm_fp16_hw { } {
 # Usage: /* { dg-require-effective-target arm_arch_v5_ok } */
 #/* { dg-add-options arm_arch_v5 } */
 # /* { dg-require-effective-target arm_arch_v5_multilib } */
-foreach { armfunc armflag armdef } { v4 "-march=armv4 -marm" __ARM_ARCH_4__
- v4t "-march=armv4t" __ARM_ARCH_4T__
- v5 "-march=armv5 -marm" __ARM_ARCH_5__
- v5t "-march=armv5t" __ARM_ARCH_5T__
- v5te "-march=armv5te" __ARM_ARCH_5TE__
- v6 "-march=armv6" __ARM_ARCH_6__
- v6k "-march=armv6k" __ARM_ARCH_6K__
- v6t2 "-march=armv6t2" __ARM_ARCH_6T2__
- v6z "-march=armv6z" __ARM_ARCH_6Z__
- v6m "-march=armv6-m -mthumb" __ARM_ARCH_6M__
- v7a "-march=armv7-a" __ARM_ARCH_7A__
- v7r "-march=armv7-r" __ARM_ARCH_7R__
- v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
- v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
- v8a "-march=armv8-a" __ARM_ARCH_8A__
- v8_1a "-march=armv8.1a" __ARM_ARCH_8A__
- v8m_base "-march=armv8-m.base -mthumb"
__ARM_ARCH_8M_BASE__
- v8m_main "-march=armv8-m.main -mthumb"
__ARM_ARCH_8M_MAIN__ } {
+foreach { armfunc armflag armdef } {
+v4 "-march=armv4 -marm" __ARM_ARCH_4__
+v4t "-march=armv4t" __ARM_ARCH_4T__
+v5 "-march=armv5 -marm" __ARM_ARCH_5__
+v5t "-march=armv5t" __ARM_ARCH_5T__
+v5te "-march=armv5te" __ARM_ARCH_5TE__
+v6 "-march=armv6" __ARM_ARCH_6__
+v6k "-march=armv6k" __ARM_ARCH_6K__
+v6t2 "-march=armv6t2" __ARM_ARCH_6T2__
+v6z "-march=armv6z" __ARM_ARCH_6Z__
+v6m "-march=armv6-m -mthumb -mfloat-abi=soft" __ARM_ARCH_6M__
+v7a "-march=armv7-a" __ARM_ARCH_7A__
+v7r "-march=armv7-r" __ARM_ARCH_7R__
+v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
+v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
+v8a "-march=armv8-a" __ARM_ARCH_8A__
+v8_1a "-march=armv8.1a" __ARM_ARCH_8A__
+v8m_base "-march=armv8-m.base -mthumb -mfloat-abi=soft"
__ARM_ARCH_8M_BASE__
+v8m_main "-march=armv8-m.main -mthumb" __ARM_ARCH_8M_MAIN__ } {
 eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
 proc check_effective_target_arm_arch_FUNC_ok { } {
 if { [ string match "*-marm*" "FLAG" ] &&



I think if you're going to do this you need to also check that changing
the ABI in this way isn't incompatible with other aspects of how the
user has invoked dejagnu.


The reason this patch was made is that without it dg-require-effective-target
arm_arch_v8m_base_ok evaluates to true for an arm-none-linux-gnueabihf toolchain
but then any testcase containing a function for such a target (such as the
atomic-op-* in gcc.target/arm) will error out because ARMv8-M Baseline does not
support hard float ABI.

I see 2 ways to fix this:

1) the approach taken in this patch, ie saying that to select ARMv8-M baseline
architecture you need the right -march, -mthumb but also the right float ABI.

Note that the comment at the top of that procedure says:
# Creates a series of routines that return 1 if the given architecture
# can be selected and a routine to give the flags to select that architecture

2) Add a function to the assembly that is used to test support for the
architecture.

The reason I favor the first one is that it enables more test while the second
test would just skip ARMv6-M and ARMv8-M Baseline tests for

Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Jakub Jelinek
On Wed, Nov 02, 2016 at 10:36:44AM +0100, Martin Liška wrote:
> On 11/01/2016 03:53 PM, Jakub Jelinek wrote:
> > What kind of false positives it is for each case?  Is it with normal
> > asan-bootstrap (without your -fsanitize-use-after-scope changes), or
> > only with those changes, or only with those changes and
> > -fsanitize-use-after-scope used during bootstrap?
> 
> Ok, the situation is simpler than I thought:

CCing also Marek.
> 
> #include 
> 
> int main(int argc, char **argv)
> {
>   int *ptr;
> 
>   switch (argc)
> {
>   int a;
> 
> case 1:
>   break;
> 
> default:
>   ptr = 
>   break;
> }
> 
>   fprintf (stderr, "v: %d\n", *ptr);
>   return 0;
> }
> 
> Which is gimplified as:
> 
> int * ptr;
> 
> switch (argc) , case 1: >
> {
>   int a;
> 
>   try
> {
>   ASAN_MARK (2, , 4);
>   :
>   goto ;
>   :
>   ptr = 
>   goto ;
> }
>   finally
> {
>   ASAN_MARK (1, , 4);
> }
> }
> :
> _1 = *ptr;
> stderr.0_2 = stderr;
> fprintf (stderr.0_2, "v: %d\n", _1);
> D.2577 = 0;
> return D.2577;
>   }
>   D.2577 = 0;
>   return D.2577;
> 
> and thus we get:
> /tmp/switch-case.c:9:11: warning: statement will never be executed 
> [-Wswitch-unreachable]
>int a;
> 
> I'm wondering where properly fix that, we can either find all these 
> ASAN_MARKs in gimplify_switch_expr
> and distribute it to all labels (which are gimplified). Or we can put such 
> variables to asan_poisoned_variables
> if we have information that we're gimplifing statements before a first label. 
> Do we know that from gimple context?
> If so, these variables will be unpoisoned at the very beginning of each label 
> and the ASAN_MARK call in between
> switch statement and a first label can be removed.

Wouldn't it be easiest if -Wswitch-unreachable warning just ignored
the ASAN_MARK internal calls altogether?
Do you emit there any other statements, or just ASAN_MARK and nothing else?

Shouldn't there be also ASAN_MARK on the implicit gotos from the switch
statement?  Otherwise, consider this being done in a loop, after the first
iteration you ASAN_MARK (1, , 4) (i.e. poison), then you iterate say with
args 1 and have in case 1: a = 0;, won't that trigger runtime error?

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Jakub Jelinek
On Wed, Nov 02, 2016 at 10:40:35AM +0100, Richard Biener wrote:
> > I wonder if the sanitize_asan_mark wouldn't better be some PROP_* property
> > set during the asan pass and kept on until end of compilation of that
> > function.  That means even if a var only addressable because of ASAN_MARK is
> > discovered after this pass we'd still be able to rewrite it into SSA.
> 
> Note that being TREE_ADDRESSABLE also has effects on alias analysis
> (didn't follow the patches to see whether you handle ASAN_MARK specially
> in points-to analysis and/or alias analysis).
> 
> Generally in update-address-taken you can handle ASAN_MARK similar to
> MEM_REF (and drop it in the rewrite phase?).

That is the intent, but we can't do that before the asan pass, because
otherwise as Martin explained we don't diagnose at runtime bugs where
a variable is used outside of its scope only through a MEM_REF.
So we need to wait for asan pass to actually add a real builtin call that
takes the address in that case.  Except now I really don't see how that
can work for the case where the var is used only properly when it is in the
scope, because the asan pass will still see those being addressable.

Unless I'm missing something we either need to perform further analysis
during the addressable subpass - this variable could be made
non-addressable, but is used in ASAN_MARK, would if we rewrote it into SSA
form there be any SSA uses of the poisoning ASAN_MARK?  If yes, keep it
addressable, otherwise rewrite into SSA.
Or, just rewrite it into SSA always, but turn the poisoning ASAN_MARK into
some new internal ECF_CONST call var_5 = ASAN_POISON (); and if we have any
uses of those, rewrite it back into addressable immediately or later or
something.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Martin Liška
On 11/01/2016 04:12 PM, Jakub Jelinek wrote:
> On Tue, Nov 01, 2016 at 03:53:46PM +0100, Martin Liška wrote:
>> @@ -1504,7 +1505,7 @@ non_rewritable_lvalue_p (tree lhs)
>>  
>>  static void
>>  maybe_optimize_var (tree var, bitmap addresses_taken, bitmap not_reg_needs,
>> -bitmap suitable_for_renaming)
>> +bitmap suitable_for_renaming, bitmap marked_nonaddressable)
>>  {
>>/* Global Variables, result decls cannot be changed.  */
>>if (is_global_var (var)
>> @@ -1522,6 +1523,7 @@ maybe_optimize_var (tree var, bitmap addresses_taken, 
>> bitmap not_reg_needs,
>>|| !bitmap_bit_p (not_reg_needs, DECL_UID (var
>>  {
>>TREE_ADDRESSABLE (var) = 0;
>> +  bitmap_set_bit (marked_nonaddressable, DECL_UID (var));
> 
> Why do you need the marked_nonaddressable bitmap?

Because the later loop (which visits every gimple statement) iterates only
if there's an entry in suitable_for_renaming.

> 
>>if (is_gimple_reg (var))
>>  bitmap_set_bit (suitable_for_renaming, DECL_UID (var));
>>if (dump_file)
>> @@ -1550,20 +1552,43 @@ maybe_optimize_var (tree var, bitmap 
>> addresses_taken, bitmap not_reg_needs,
>>  }
>>  }
>>  
>> -/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables.  */
>> +/* Return true when STMT is ASAN mark where second argument is an address
>> +   of a local variable.  */
>>  
>> -void
>> -execute_update_addresses_taken (void)
>> +static bool
>> +is_asan_mark_p (gimple *stmt)
>> +{
>> +  if (!gimple_call_internal_p (stmt, IFN_ASAN_MARK))
>> +return false;
>> +
>> +  tree addr = get_base_address (gimple_call_arg (stmt, 1));
>> +  if (TREE_CODE (addr) == ADDR_EXPR
>> +  && TREE_CODE (TREE_OPERAND (addr, 0)) == VAR_DECL)
> 
> Just check here if dropping TREE_ADDRESSABLE from the VAR (use VAR_P btw)
> would turn it into is_gimple_reg), and don't return true if not.

Well, the predicate is called once before maybe_optimize_var, thus I need to 
have
it conservative and not consider TREE_ADDRESSABLE flag. Having argument would 
work
for that?

> 
>> +return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables.
>> +   If SANITIZE_ASAN_MARK is set to true, sanitize also ASAN_MARK built-ins. 
>>  */
>> +
>> +
>> +static void
>> +execute_update_addresses_taken (bool sanitize_asan_mark = false)
> 
> I wonder if the sanitize_asan_mark wouldn't better be some PROP_* property
> set during the asan pass and kept on until end of compilation of that
> function.  That means even if a var only addressable because of ASAN_MARK is
> discovered after this pass we'd still be able to rewrite it into SSA.

It's doable (please see attached patch) and also nicer. However, one would need 
to
extend curr_properties to long type as we already use 16 existing values.

Martin

> 
>   Jakub
> 

>From ad5f68a010674118fac7ca8b6953f7b99fd3c2a8 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 1 Nov 2016 11:21:20 +0100
Subject: [PATCH] Use-after-scope: do not mark variables that are no longer
 addressable

gcc/ChangeLog:

2016-11-02  Martin Liska  

	* asan.c: Update properties_provided and todo_flags_finish.
	* function.h (struct GTY): Change int to long as there's not
	enough space for a new value.
	* tree-pass.h: Define PROP_asan_check_done.
	* tree-ssa.c (maybe_optimize_var): Add new argument.
	(is_asan_mark_p): New function.
	(execute_update_addresses_taken): Handle ASAN_MARK internal fns.
---
 gcc/asan.c  |  5 +++--
 gcc/function.h  |  2 +-
 gcc/tree-pass.h |  1 +
 gcc/tree-ssa.c  | 69 +
 4 files changed, 60 insertions(+), 17 deletions(-)

diff --git a/gcc/asan.c b/gcc/asan.c
index 95495d2..94ee877 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -59,6 +59,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "builtins.h"
 #include "fnmatch.h"
+#include "tree-ssa.h"
 
 /* AddressSanitizer finds out-of-bounds and use-after-free bugs
with <2x slowdown on average.
@@ -2993,10 +2994,10 @@ const pass_data pass_data_asan =
   OPTGROUP_NONE, /* optinfo_flags */
   TV_NONE, /* tv_id */
   ( PROP_ssa | PROP_cfg | PROP_gimple_leh ), /* properties_required */
-  0, /* properties_provided */
+  PROP_asan_check_done, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
-  TODO_update_ssa, /* todo_flags_finish */
+  TODO_update_ssa | TODO_update_address_taken, /* todo_flags_finish */
 };
 
 class pass_asan : public gimple_opt_pass
diff --git a/gcc/function.h b/gcc/function.h
index e854c7f..5600488 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -289,7 +289,7 @@ struct GTY(()) function {
   location_t function_end_locus;
 
   /* Properties used by the pass manager.  */
-  unsigned int curr_properties;
+  unsigned long curr_properties;
   unsigned int last_verified;
 
   /* Non-null if the function does 

Re: [PATCH] Fix nonoverlapping_memrefs_p ICE (PR target/77834)

2016-11-02 Thread Richard Biener
On Wed, 2 Nov 2016, Jakub Jelinek wrote:

> On Wed, Nov 02, 2016 at 10:34:15AM +0100, Richard Biener wrote:
> > Ok, just looking at regex.c (I happen to have regex.i around) shows
> > the first hit as
> > 
> > #2  0x008d9914 in true_dependence_1 (mem=0x761ef360, 
> > mem_mode=DImode, mem_addr=0x76220cf0, x=0x76605060, 
> > x_addr=0x7662ffd8, mem_canonicalized=true)
> > at /space/rguenther/src/svn/trunk/gcc/alias.c:2928
> > 2928  if (nonoverlapping_memrefs_p (mem, x, false))
> > (gdb) p debug_tree (expry)
> >   > type  > 
> > 2928  if (nonoverlapping_memrefs_p (mem, x, false))
> > (gdb) p debug_rtx (mem)
> > (mem/f/c:DI (plus:DI (reg/f:DI 7 sp)
> > (const_int 200 [0xc8])) [5 p+0 S8 A64])
> > $4 = void
> > (gdb) p debug_rtx (x)
> > (mem:QI (symbol_ref:DI ("malloc") [flags 0x41]  > 0x76943000 malloc>) [0 __builtin_malloc S1 A8])
> > $5 = void
> > 
> > called from DSE check_mem_read_rtx.  I believe that's a bogus query.
> 
> I guess I should also log debug_rtx (DECL_RTL (expr{x,y})) in those cases
> then and perhaps also if we ever return non-zero in those cases.

Yeah, plus if a followup test would have disambiguated things (the
dispatch to the tree oracle for example).

> > (I see only a single hit on regex.c, your log seems to have multiple
> > ones...)
> 
> That was statistics gathered across x86_64-linux and i686-linux simultaneous
> bootstraps + regtests (and most likely I've mistyped the first value
> from (int) BITS_PER_WORD to (int) BITS_PER_UNIT :( ).  So regex.c is built
> there many times (3 times each bootstrap at least).

Ah, ok.  At least 12 times for x86_64 with multilibs (we build PIC and 
non-PIC variants IIRC).

Richard.


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Martin Liška
On 11/02/2016 10:40 AM, Richard Biener wrote:
> On Tue, Nov 1, 2016 at 4:12 PM, Jakub Jelinek  wrote:
>> On Tue, Nov 01, 2016 at 03:53:46PM +0100, Martin Liška wrote:
>>> @@ -1504,7 +1505,7 @@ non_rewritable_lvalue_p (tree lhs)
>>>
>>>  static void
>>>  maybe_optimize_var (tree var, bitmap addresses_taken, bitmap not_reg_needs,
>>> - bitmap suitable_for_renaming)
>>> + bitmap suitable_for_renaming, bitmap 
>>> marked_nonaddressable)
>>>  {
>>>/* Global Variables, result decls cannot be changed.  */
>>>if (is_global_var (var)
>>> @@ -1522,6 +1523,7 @@ maybe_optimize_var (tree var, bitmap addresses_taken, 
>>> bitmap not_reg_needs,
>>> || !bitmap_bit_p (not_reg_needs, DECL_UID (var
>>>  {
>>>TREE_ADDRESSABLE (var) = 0;
>>> +  bitmap_set_bit (marked_nonaddressable, DECL_UID (var));
>>
>> Why do you need the marked_nonaddressable bitmap?
>>
>>>if (is_gimple_reg (var))
>>>   bitmap_set_bit (suitable_for_renaming, DECL_UID (var));
>>>if (dump_file)
>>> @@ -1550,20 +1552,43 @@ maybe_optimize_var (tree var, bitmap 
>>> addresses_taken, bitmap not_reg_needs,
>>>  }
>>>  }
>>>
>>> -/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables.  */
>>> +/* Return true when STMT is ASAN mark where second argument is an address
>>> +   of a local variable.  */
>>>
>>> -void
>>> -execute_update_addresses_taken (void)
>>> +static bool
>>> +is_asan_mark_p (gimple *stmt)
>>> +{
>>> +  if (!gimple_call_internal_p (stmt, IFN_ASAN_MARK))
>>> +return false;
>>> +
>>> +  tree addr = get_base_address (gimple_call_arg (stmt, 1));
>>> +  if (TREE_CODE (addr) == ADDR_EXPR
>>> +  && TREE_CODE (TREE_OPERAND (addr, 0)) == VAR_DECL)
>>
>> Just check here if dropping TREE_ADDRESSABLE from the VAR (use VAR_P btw)
>> would turn it into is_gimple_reg), and don't return true if not.
>>
>>> +return true;
>>> +
>>> +  return false;
>>> +}
>>> +
>>> +/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables.
>>> +   If SANITIZE_ASAN_MARK is set to true, sanitize also ASAN_MARK 
>>> built-ins.  */
>>> +
>>> +
>>> +static void
>>> +execute_update_addresses_taken (bool sanitize_asan_mark = false)
>>
>> I wonder if the sanitize_asan_mark wouldn't better be some PROP_* property
>> set during the asan pass and kept on until end of compilation of that
>> function.  That means even if a var only addressable because of ASAN_MARK is
>> discovered after this pass we'd still be able to rewrite it into SSA.
> 
> Note that being TREE_ADDRESSABLE also has effects on alias analysis
> (didn't follow the patches to see whether you handle ASAN_MARK specially
> in points-to analysis and/or alias analysis).

Currently all manipulation with shadow memory is done via a pointer type
which has created a separate aliasing set:

static void
asan_init_shadow_ptr_types (void)
{
  asan_shadow_set = new_alias_set ();
  tree types[3] = { signed_char_type_node, short_integer_type_node,
integer_type_node };

  for (unsigned i = 0; i < 3; i++)
{
  shadow_ptr_types[i] = build_distinct_type_copy (types[i]);
  TYPE_ALIAS_SET (shadow_ptr_types[i]) = asan_shadow_set;
  shadow_ptr_types[i] = build_pointer_type (shadow_ptr_types[i]);
}
...

Martin

> 
> Generally in update-address-taken you can handle ASAN_MARK similar to
> MEM_REF (and drop it in the rewrite phase?).
> 
> As said, I didnt look at the patches and just came by here seeing
> tree-ssa.c pieces...
> 
> Richard.
> 
>> Jakub



Re: [PATCH] Fix nonoverlapping_memrefs_p ICE (PR target/77834)

2016-11-02 Thread Jakub Jelinek
On Wed, Nov 02, 2016 at 10:34:15AM +0100, Richard Biener wrote:
> Ok, just looking at regex.c (I happen to have regex.i around) shows
> the first hit as
> 
> #2  0x008d9914 in true_dependence_1 (mem=0x761ef360, 
> mem_mode=DImode, mem_addr=0x76220cf0, x=0x76605060, 
> x_addr=0x7662ffd8, mem_canonicalized=true)
> at /space/rguenther/src/svn/trunk/gcc/alias.c:2928
> 2928  if (nonoverlapping_memrefs_p (mem, x, false))
> (gdb) p debug_tree (expry)
>   type  
> 2928  if (nonoverlapping_memrefs_p (mem, x, false))
> (gdb) p debug_rtx (mem)
> (mem/f/c:DI (plus:DI (reg/f:DI 7 sp)
> (const_int 200 [0xc8])) [5 p+0 S8 A64])
> $4 = void
> (gdb) p debug_rtx (x)
> (mem:QI (symbol_ref:DI ("malloc") [flags 0x41]  0x76943000 malloc>) [0 __builtin_malloc S1 A8])
> $5 = void
> 
> called from DSE check_mem_read_rtx.  I believe that's a bogus query.

I guess I should also log debug_rtx (DECL_RTL (expr{x,y})) in those cases
then and perhaps also if we ever return non-zero in those cases.

> (I see only a single hit on regex.c, your log seems to have multiple
> ones...)

That was statistics gathered across x86_64-linux and i686-linux simultaneous
bootstraps + regtests (and most likely I've mistyped the first value
from (int) BITS_PER_WORD to (int) BITS_PER_UNIT :( ).  So regex.c is built
there many times (3 times each bootstrap at least).

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Richard Biener
On Tue, Nov 1, 2016 at 4:12 PM, Jakub Jelinek  wrote:
> On Tue, Nov 01, 2016 at 03:53:46PM +0100, Martin Liška wrote:
>> @@ -1504,7 +1505,7 @@ non_rewritable_lvalue_p (tree lhs)
>>
>>  static void
>>  maybe_optimize_var (tree var, bitmap addresses_taken, bitmap not_reg_needs,
>> - bitmap suitable_for_renaming)
>> + bitmap suitable_for_renaming, bitmap marked_nonaddressable)
>>  {
>>/* Global Variables, result decls cannot be changed.  */
>>if (is_global_var (var)
>> @@ -1522,6 +1523,7 @@ maybe_optimize_var (tree var, bitmap addresses_taken, 
>> bitmap not_reg_needs,
>> || !bitmap_bit_p (not_reg_needs, DECL_UID (var
>>  {
>>TREE_ADDRESSABLE (var) = 0;
>> +  bitmap_set_bit (marked_nonaddressable, DECL_UID (var));
>
> Why do you need the marked_nonaddressable bitmap?
>
>>if (is_gimple_reg (var))
>>   bitmap_set_bit (suitable_for_renaming, DECL_UID (var));
>>if (dump_file)
>> @@ -1550,20 +1552,43 @@ maybe_optimize_var (tree var, bitmap 
>> addresses_taken, bitmap not_reg_needs,
>>  }
>>  }
>>
>> -/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables.  */
>> +/* Return true when STMT is ASAN mark where second argument is an address
>> +   of a local variable.  */
>>
>> -void
>> -execute_update_addresses_taken (void)
>> +static bool
>> +is_asan_mark_p (gimple *stmt)
>> +{
>> +  if (!gimple_call_internal_p (stmt, IFN_ASAN_MARK))
>> +return false;
>> +
>> +  tree addr = get_base_address (gimple_call_arg (stmt, 1));
>> +  if (TREE_CODE (addr) == ADDR_EXPR
>> +  && TREE_CODE (TREE_OPERAND (addr, 0)) == VAR_DECL)
>
> Just check here if dropping TREE_ADDRESSABLE from the VAR (use VAR_P btw)
> would turn it into is_gimple_reg), and don't return true if not.
>
>> +return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Compute TREE_ADDRESSABLE and DECL_GIMPLE_REG_P for local variables.
>> +   If SANITIZE_ASAN_MARK is set to true, sanitize also ASAN_MARK built-ins. 
>>  */
>> +
>> +
>> +static void
>> +execute_update_addresses_taken (bool sanitize_asan_mark = false)
>
> I wonder if the sanitize_asan_mark wouldn't better be some PROP_* property
> set during the asan pass and kept on until end of compilation of that
> function.  That means even if a var only addressable because of ASAN_MARK is
> discovered after this pass we'd still be able to rewrite it into SSA.

Note that being TREE_ADDRESSABLE also has effects on alias analysis
(didn't follow the patches to see whether you handle ASAN_MARK specially
in points-to analysis and/or alias analysis).

Generally in update-address-taken you can handle ASAN_MARK similar to
MEM_REF (and drop it in the rewrite phase?).

As said, I didnt look at the patches and just came by here seeing
tree-ssa.c pieces...

Richard.

> Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-11-02 Thread Martin Liška
On 11/01/2016 03:53 PM, Jakub Jelinek wrote:
> What kind of false positives it is for each case?  Is it with normal
> asan-bootstrap (without your -fsanitize-use-after-scope changes), or
> only with those changes, or only with those changes and
> -fsanitize-use-after-scope used during bootstrap?

Ok, the situation is simpler than I thought:

#include 

int main(int argc, char **argv)
{
  int *ptr;

  switch (argc)
{
  int a;

case 1:
  break;

default:
  ptr = 
  break;
}

  fprintf (stderr, "v: %d\n", *ptr);
  return 0;
}

Which is gimplified as:

int * ptr;

switch (argc) , case 1: >
{
  int a;

  try
{
  ASAN_MARK (2, , 4);
  :
  goto ;
  :
  ptr = 
  goto ;
}
  finally
{
  ASAN_MARK (1, , 4);
}
}
:
_1 = *ptr;
stderr.0_2 = stderr;
fprintf (stderr.0_2, "v: %d\n", _1);
D.2577 = 0;
return D.2577;
  }
  D.2577 = 0;
  return D.2577;

and thus we get:
/tmp/switch-case.c:9:11: warning: statement will never be executed 
[-Wswitch-unreachable]
   int a;

I'm wondering where properly fix that, we can either find all these ASAN_MARKs 
in gimplify_switch_expr
and distribute it to all labels (which are gimplified). Or we can put such 
variables to asan_poisoned_variables
if we have information that we're gimplifing statements before a first label. 
Do we know that from gimple context?
If so, these variables will be unpoisoned at the very beginning of each label 
and the ASAN_MARK call in between
switch statement and a first label can be removed.

Thoughts?
Thanks,
Martin



Re: [PATCH] Fix nonoverlapping_memrefs_p ICE (PR target/77834)

2016-11-02 Thread Richard Biener
On Wed, 2 Nov 2016, Jakub Jelinek wrote:

> On Wed, Nov 02, 2016 at 10:08:25AM +0100, Richard Biener wrote:
> > On Mon, 31 Oct 2016, Jakub Jelinek wrote:
> > 
> > > Hi!
> > > 
> > > Some automatic VAR_DECLs don't get DECL_RTL set - e.g. if its SSA_NAMEs
> > > expand to multiple rtls, then there is not a single one that can be used.
> > > Using DECL_RTL on such VAR_DECLs ICEs.
> > > 
> > > I've tried to just return 0 in nonoverlapping_memrefs_p if either
> > > DECL_RTL_SET_P (expr{x,y}) wasn't true, but that triggered way too often
> > > during bootstrap/regtest (3800+ times).  So the following patch narrows it
> > > down more and triggers only on the testcase below.
> > 
> > What kind of cases did this trigger on?  I would have expected we
> > have DECL_RTL_SET_P on (almost?) all decls that can have it.  And for
> > those that don't it should be uninteresting to have?
> 
> I admit I havne't studied it in detail yet.  Attaching list of
> BITS_PER_WORD, main_input_filename, current_function_name ()
> where nonoverlapping_memrefs_p returned 0 early because
> either exprx or expry didn't have DECL_RTL_SET_P.  As except for the new
> testcase that didn't result into ICE, all of those must have been something
> where make_decl_rtl created RTL in that case.

Ok, just looking at regex.c (I happen to have regex.i around) shows
the first hit as

#2  0x008d9914 in true_dependence_1 (mem=0x761ef360, 
mem_mode=DImode, mem_addr=0x76220cf0, x=0x76605060, 
x_addr=0x7662ffd8, mem_canonicalized=true)
at /space/rguenther/src/svn/trunk/gcc/alias.c:2928
2928  if (nonoverlapping_memrefs_p (mem, x, false))
(gdb) p debug_tree (expry)
 ) [0 __builtin_malloc S1 A8])
$5 = void

called from DSE check_mem_read_rtx.  I believe that's a bogus query.

(I see only a single hit on regex.c, your log seems to have multiple
ones...)

I wouldn't worry about the lost cases, and if I'd be ambitious I'd
try to investigate the above and see what remains...

> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > Hmm.  Can you try splitting out a decl_can_have_rtl () predicate
> > from make_decl_rtl and use that?
> 
> Will do.

Thanks,
Richard.


Re: [PATCH, RFC, rs6000] Add overloaded built-in function support to altivec.h, and re-implement vec_add

2016-11-02 Thread Jakub Jelinek
On Wed, Nov 02, 2016 at 10:19:26AM +0100, Richard Biener wrote:
> On Tue, Nov 1, 2016 at 1:09 AM, Jakub Jelinek  wrote:
> > On Mon, Oct 31, 2016 at 05:28:42PM -0500, Bill Schmidt wrote:
> >> The PowerPC back end loses performance on vector intrinsics, because 
> >> currently
> >> all of them are treated as calls throughout the middle-end phases and only
> >> expanded when they reach RTL.  Our version of altivec.h currently defines 
> >> the
> >> public names of overloaded functions (like vec_add) to be #defines for 
> >> hidden
> >> functions (like __builtin_vec_add), which are recognized in the parser as
> >> requiring special back-end support.  Tables in rs6000-c.c handle dispatch 
> >> of
> >> the overloaded functions to specific function calls appropriate to the 
> >> argument
> >> types.
> >
> > This doesn't look very nice.  If all you care is that the builtins like
> > __builtin_altivec_vaddubm etc. that __builtin_vec_add overloads into fold
> > into generic vector operations under certain conditions, just fold those
> > into whatever you want in targetm.gimple_fold_builtin (gsi).
> 
> Note that traditionally "overloading" with GCC "builtins" is done by
> using varargs
> and the "type generic" attribute.  That doesn't scale to return type 
> overloading
> though for which we usually added direct support to the parser (for example
> for __builtin_shuffle).

My understanding is that rs6000 already does that, it hooks into
resolve_overloaded_builtin which already handles the fully type generic
builtins where not just the arguments, but also the return type can be
picked up.  But it resolves the overloaded builtins into calls to other
builtins that are not type-generic.

So, either that function instead of returning the specific md builtin calls
in some cases already returns trees with the generic behavior of the
builtin, or it returns what it does now and then in the gimple fold builtin
target hook (note, the normal fold builtin target hook is not right for
that, because it is mostly used for folding builtins into constant - callers
will usually throw away other results) fold those specific md builtins
into whatever GIMPLE you want.  If we want to decrease amount of folding in
the FEs, the gimple fold builtin hook is probably better.

Jakub


Re: [PATCH] PR tree-optimization/78162: Reject negative offsets in store merging early

2016-11-02 Thread Richard Biener
On Tue, Nov 1, 2016 at 12:54 PM, Kyrill Tkachov
 wrote:
> Hi all,
>
> Store merging ICEs on this invalid testcase because it trips up on the
> negative bitposition to store to.
> It doesn't really expect to handle negative offsets and I believe they won't
> occur very often in valid code anyway.
> Filling out structs/bitfields/class members involves positive offsets.
> I can look into removing all the assumptions about positive offsets if folks
> want me to (should involve removing
> some 'unsigned' modifiers from HOST_WIDE_INTs and double-checking some range
> checks), but for now
> this patch just fixes the ICE by rejecting negative offsets early on.
>
> Bootstrapped and tested on aarch64-none-linux-gnu and x86_64.
>
> Ok for trunk?

Ok (an improvement would be to only reject it after eventually
processing a MEM_REF base_addr).

Richard.

> Thanks,
> Kyrill
>
> 2016-11-01  Kyrylo Tkachov  
>
> PR tree-optimization/78162
> * gimple-ssa-store-merging.c (execute): Mark stores with bitpos < 0
> as invalid.
>
> 2016-11-01  Kyrylo Tkachov  
>
> PR tree-optimization/78162
> * gcc.c-torture/compile/pr78162.c: New test.


Re: [PATCH] Fix nonoverlapping_memrefs_p ICE (PR target/77834)

2016-11-02 Thread Jakub Jelinek
On Wed, Nov 02, 2016 at 10:08:25AM +0100, Richard Biener wrote:
> On Mon, 31 Oct 2016, Jakub Jelinek wrote:
> 
> > Hi!
> > 
> > Some automatic VAR_DECLs don't get DECL_RTL set - e.g. if its SSA_NAMEs
> > expand to multiple rtls, then there is not a single one that can be used.
> > Using DECL_RTL on such VAR_DECLs ICEs.
> > 
> > I've tried to just return 0 in nonoverlapping_memrefs_p if either
> > DECL_RTL_SET_P (expr{x,y}) wasn't true, but that triggered way too often
> > during bootstrap/regtest (3800+ times).  So the following patch narrows it
> > down more and triggers only on the testcase below.
> 
> What kind of cases did this trigger on?  I would have expected we
> have DECL_RTL_SET_P on (almost?) all decls that can have it.  And for
> those that don't it should be uninteresting to have?

I admit I havne't studied it in detail yet.  Attaching list of
BITS_PER_WORD, main_input_filename, current_function_name ()
where nonoverlapping_memrefs_p returned 0 early because
either exprx or expry didn't have DECL_RTL_SET_P.  As except for the new
testcase that didn't result into ICE, all of those must have been something
where make_decl_rtl created RTL in that case.

> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Hmm.  Can you try splitting out a decl_can_have_rtl () predicate
> from make_decl_rtl and use that?

Will do.

Jakub


alias.bz2
Description: BZip2 compressed data


Re: [PATCH] PR tree-optimization/78170: Truncate sign-extended padding when encoding bitfields

2016-11-02 Thread Richard Biener
On Tue, Nov 1, 2016 at 12:54 PM, Kyrill Tkachov
 wrote:
> Hi all,
>
> In this PR the code writes a -1 to a bitfield of size 17 bits and ends up
> overwriting another bitfields.
> The problem is that the intermediate buffer in encode_tree_to_bitpos holding
> the value to merge holds
> a 24-bit temporary with -1 written to it i.e. sign-extended to all ones.
> That is how native_encode_expr works.This gets then written to
> the final buffer (well, a shifted version of it).
>
> We should instead be truncating the intermediate value to contain zeros in
> all the bits that we don't want.
> This is already performed for big-endian, this patch just wires it up for
> little-endian.
>
> Bootstrapped and tested on x86_64.
> Ok for trunk?

Ok.

Richard.

> Thanks,
> Kyrill
>
> 2016-11-01  Kyrylo Tkachov  
>
> PR tree-optimization/78170
> * gimple-ssa-store-merging.c (encode_tree_to_bitpos): Truncate padding
> introduced by native_encode_expr on little-endian as well.
>
> 2016-11-01  Kyrylo Tkachov  
>
> PR tree-optimization/78170
> * gcc.c-torture/execute/pr78170.c: New test.


Re: [PATCH, RFC, rs6000] Add overloaded built-in function support to altivec.h, and re-implement vec_add

2016-11-02 Thread Richard Biener
On Tue, Nov 1, 2016 at 1:09 AM, Jakub Jelinek  wrote:
> On Mon, Oct 31, 2016 at 05:28:42PM -0500, Bill Schmidt wrote:
>> The PowerPC back end loses performance on vector intrinsics, because 
>> currently
>> all of them are treated as calls throughout the middle-end phases and only
>> expanded when they reach RTL.  Our version of altivec.h currently defines the
>> public names of overloaded functions (like vec_add) to be #defines for hidden
>> functions (like __builtin_vec_add), which are recognized in the parser as
>> requiring special back-end support.  Tables in rs6000-c.c handle dispatch of
>> the overloaded functions to specific function calls appropriate to the 
>> argument
>> types.
>
> This doesn't look very nice.  If all you care is that the builtins like
> __builtin_altivec_vaddubm etc. that __builtin_vec_add overloads into fold
> into generic vector operations under certain conditions, just fold those
> into whatever you want in targetm.gimple_fold_builtin (gsi).

Note that traditionally "overloading" with GCC "builtins" is done by
using varargs
and the "type generic" attribute.  That doesn't scale to return type overloading
though for which we usually added direct support to the parser (for example
for __builtin_shuffle).

The folding trick of course should work just fine.

Richard.

> Jakub


Re: [PATCH] Five patches for std::experimental::filesystem

2016-11-02 Thread Christophe Lyon
On 27 October 2016 at 15:34, Jonathan Wakely  wrote:
> On 26/10/16 09:24 +0200, Christophe Lyon wrote:
>>
>> Hi Jonathan,
>>
>> On 25 October 2016 at 17:32, Jonathan Wakely  wrote:
>>>
>>> Two more fixes for the filesystem TS, and improved tests.
>>>
>>>   Handle negative times in filesystem::last_write_time
>>>   * src/filesystem/ops.cc
>>>(last_write_time(const path&, file_time_type, error_code&)):
>>> Handle
>>>negative times correctly.
>>>* testsuite/experimental/filesystem/operations/last_write_time.cc:
>>>Test writing file times.
>>>
>>>Fix error handling in copy_file and equivalent
>>>   * src/filesystem/ops.cc (do_copy_file): Report an error if
>>> source
>>> or
>>>destination is not a regular file (LWG 2712).
>>>(equivalent): Fix error handling and result when only one file
>>> exists.
>>>* testsuite/experimental/filesystem/operations/copy.cc: Remove
>>> files
>>>created by tests. Test copying directories.
>>>* testsuite/experimental/filesystem/operations/copy_file.cc:
>>> Remove
>>>files created by tests.
>>>* testsuite/experimental/filesystem/operations/equivalent.cc: New.
>>>* testsuite/experimental/filesystem/operations/is_empty.cc: New.
>>>* testsuite/experimental/filesystem/operations/read_symlink.cc:
>>> Remove
>>>file created by test.
>>>* testsuite/experimental/filesystem/operations/remove_all.cc: New.
>>>* testsuite/util/testsuite_fs.h (~scoped_file): Only try to remove
>>>file if path is non-empty, to support removal by other means.
>>>
>>> Tested x86_64-linux, committed to trunk.
>>>
>>>
>> I can see failures in
>> experimental/filesystem/operations/last_write_time.cc after your
>> committed this patch:
>>
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/libstdc++-v3/testsuite/experimental/filesystem/operations/last_write_time.cc:127:
>> void test02(): Assertion 'last_write_time(f.path) == time' failed.
>> on arm*linux* and aarch64*linux* targets.
>
>
> That test will fail for targets where _GLIBCXX_USE_UTIMENSAT is not
> defined, as they use utime() instead which only supports second
> granularity.
>
> This should solve it, by only checking that the file times are within
> one second of the expected value.
>

Hi Jonathan,
Indeed your patch fixes the problem I reported.
Sorry for the delay, I was on holidays.

Thanks,

Christophe


>
> Tested x86_64-linux, committed to trunk.


Re: [PATCH] Fix nonoverlapping_memrefs_p ICE (PR target/77834)

2016-11-02 Thread Richard Biener
On Mon, 31 Oct 2016, Jakub Jelinek wrote:

> Hi!
> 
> Some automatic VAR_DECLs don't get DECL_RTL set - e.g. if its SSA_NAMEs
> expand to multiple rtls, then there is not a single one that can be used.
> Using DECL_RTL on such VAR_DECLs ICEs.
> 
> I've tried to just return 0 in nonoverlapping_memrefs_p if either
> DECL_RTL_SET_P (expr{x,y}) wasn't true, but that triggered way too often
> during bootstrap/regtest (3800+ times).  So the following patch narrows it
> down more and triggers only on the testcase below.

What kind of cases did this trigger on?  I would have expected we
have DECL_RTL_SET_P on (almost?) all decls that can have it.  And for
those that don't it should be uninteresting to have?

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Hmm.  Can you try splitting out a decl_can_have_rtl () predicate
from make_decl_rtl and use that?

Thanks,
Richard.

> 2016-10-31  Jakub Jelinek  
> 
>   PR target/77834
>   * alias.c (nonoverlapping_memrefs_p): Return 0 if exprx or expry
>   doesn't have rtl set and it can't be safely created.
> 
>   * gcc.dg/pr77834.c: New test.
> 
> --- gcc/alias.c.jj2016-10-21 17:06:27.0 +0200
> +++ gcc/alias.c   2016-10-31 11:38:29.448031590 +0100
> @@ -2755,6 +2755,27 @@ nonoverlapping_memrefs_p (const_rtx x, c
>|| TREE_CODE (expry) == CONST_DECL)
>  return 1;
>  
> +  /* Don't try to create RTL for decls that intentionally don't have
> + DECL_RTL set (e.g. marked as living in multiple places).  */
> +  if (!DECL_RTL_SET_P (exprx))
> +{
> +  if (TREE_CODE (exprx) == PARM_DECL
> +   || TREE_CODE (exprx) == RESULT_DECL
> +   || (VAR_P (exprx)
> +   && !TREE_STATIC (exprx)
> +   && !DECL_EXTERNAL (exprx)))
> + return 0;
> +}
> +  if (!DECL_RTL_SET_P (expry))
> +{
> +  if (TREE_CODE (expry) == PARM_DECL
> +   || TREE_CODE (expry) == RESULT_DECL
> +   || (VAR_P (expry)
> +   && !TREE_STATIC (expry)
> +   && !DECL_EXTERNAL (expry)))
> + return 0;
> +}
> +
>rtlx = DECL_RTL (exprx);
>rtly = DECL_RTL (expry);
>  
> --- gcc/testsuite/gcc.dg/pr77834.c.jj 2016-10-31 11:41:46.290521464 +0100
> +++ gcc/testsuite/gcc.dg/pr77834.c2016-10-31 11:41:24.0 +0100
> @@ -0,0 +1,18 @@
> +/* PR target/77834 */
> +/* { dg-do compile } */
> +/* { dg-options "-O -ftree-pre -Wno-psabi" } */
> +/* { dg-additional-options "-mstringop-strategy=libcall" { target i?86-*-* 
> x86_64-*-* } } */
> +
> +typedef int V __attribute__ ((vector_size (64)));
> +
> +V
> +foo (V u, V v, int w)
> +{
> +  do
> +{
> +  if (u[0]) v ^= u[w];
> +}
> +  while ((V) { 0, u[w] }[1]);
> +  u = (V) { v[v[0]], u[u[0]] };
> +  return v + u;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] bb-reorder: Improve compgotos pass (PR71785)

2016-11-02 Thread Richard Biener
On Mon, Oct 31, 2016 at 4:35 PM, Segher Boessenkool
 wrote:
> On Mon, Oct 31, 2016 at 04:09:48PM +0100, Steven Bosscher wrote:
>> On Sun, Oct 30, 2016 at 8:10 PM, Segher Boessenkool wrote:
>> > This patch solves this problem by simply running the 
>> > duplicate_computed_gotos
>> > pass again, as long as it does any work.  The patch looks much bigger than
>> > it is, because I factored out two routines to simplify the control flow.
>>
>> It's made the patch a bit difficult to read. Condensing it a bit...
>
> Well, it would have a goto crossing a loop, or two gotos crossing each
> other, otherwise.  I should have done it as two patches I guess (first
> factor, then change).
>
>> > +  for (;;)
>> >  {
>> > +  if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1)
>> > +   return 0;
>>
>> This test should not be needed in the loop. This pass can never
>> collapse the function to a single basic block.
>
> Yeah maybe, but that relies on quite a few assumptions.  This is strictly
> an optimisation anyway, will move it outside the loop.
>
>> > +  basic_block bb;
>> > +  FOR_EACH_BB_FN (bb, fun)
>> > +   {
>> > + /* Build the reorder chain for the original order of blocks.  */
>> > + if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (fun))
>> > +   bb->aux = bb->next_bb;
>> > +   }
>> >
>> > +  duplicate_computed_gotos_find_candidates (fun, candidates, 
>> > max_size);
>> >
>> > +  bool changed = false;
>> > +  if (!bitmap_empty_p (candidates))
>> > +   changed = duplicate_computed_gotos_do_duplicate (fun, candidates);
>> >
>> > +  if (changed)
>> > +   fixup_partitions ();
>> > +
>> > +  cfg_layout_finalize ();
>>
>> I don't think you have to go into/out-of cfglayout mode for each iteration.
>
> Yeah probably.  I was too lazy :-)  It needs the cleanup_cfg every
> iteration though, I was afraid that interacts.

Ick.  Why would it need a cfg-cleanup every iteration?  I fear this is quadratic
complexity in the number of edges to the compgoto block (and the size of the
function).  Can't the unfactoring perform the "cleanup" we rely on here?

>> >/* Merge the duplicated blocks into predecessors, when possible.  */
>> > +  if (changed)
>> > +   cleanup_cfg (0);
>> > +  else
>> > +   break;
>> >  }
>>
>> Maybe a gcc_assert that the loop doesn't iterate more often than num_edges?
>
> Good plan (num blocks even).
>
> Thanks,
>
>
> Segher


Re: [PATCH] Fix host_size_t_cst_p predicate

2016-11-02 Thread Richard Biener
On Mon, Oct 31, 2016 at 3:56 PM, Martin Liška  wrote:
> On 10/31/2016 12:11 PM, Richard Biener wrote:
>> On Mon, Oct 31, 2016 at 11:18 AM, Richard Sandiford
>>  wrote:
>>> Richard Biener  writes:
 On Mon, Oct 31, 2016 at 10:58 AM, Richard Sandiford
  wrote:
> Richard Biener  writes:
>> On Mon, Oct 31, 2016 at 10:10 AM, Martin Liška  wrote:
>>> On 10/31/2016 01:12 AM, Richard Sandiford wrote:
 Richard Biener  writes:
> On Thu, Oct 27, 2016 at 5:06 PM, Martin Liška  wrote:
>> On 10/27/2016 03:35 PM, Richard Biener wrote:
>>> On Thu, Oct 27, 2016 at 9:41 AM, Martin Liška  
>>> wrote:
 Running simple test-case w/o the proper header file causes ICE:
 strncmp ("a", "b", -1);

 0xe74462 tree_to_uhwi(tree_node const*)
 ../../gcc/tree.c:7324
 0x90a23f host_size_t_cst_p
 ../../gcc/fold-const-call.c:63
 0x90a23f fold_const_call(combined_fn, tree_node*, tree_node*,
 tree_node*, tree_node*)
 ../../gcc/fold-const-call.c:1512
 0x787b01 fold_builtin_3
 ../../gcc/builtins.c:8385
 0x787b01 fold_builtin_n(unsigned int, tree_node*, tree_node**,
 int, bool)
 ../../gcc/builtins.c:8465
 0x9052b1 fold(tree_node*)
 ../../gcc/fold-const.c:11919
 0x6de2bb c_fully_fold_internal
 ../../gcc/c/c-fold.c:185
 0x6e1f6b c_fully_fold(tree_node*, bool, bool*)
 ../../gcc/c/c-fold.c:90
 0x67cbbf c_process_expr_stmt(unsigned int, tree_node*)
 ../../gcc/c/c-typeck.c:10369
 0x67cfbd c_finish_expr_stmt(unsigned int, tree_node*)
 ../../gcc/c/c-typeck.c:10414
 0x6cb578 c_parser_statement_after_labels
 ../../gcc/c/c-parser.c:5430
 0x6cd333 c_parser_compound_statement_nostart
 ../../gcc/c/c-parser.c:4944
 0x6cdbde c_parser_compound_statement
 ../../gcc/c/c-parser.c:4777
 0x6c93ac c_parser_declaration_or_fndef
 ../../gcc/c/c-parser.c:2176
 0x6d51ab c_parser_external_declaration
 ../../gcc/c/c-parser.c:1574
 0x6d5c09 c_parser_translation_unit
 ../../gcc/c/c-parser.c:1454
 0x6d5c09 c_parse_file()
 ../../gcc/c/c-parser.c:18173
 0x72ffd2 c_common_parse_file()
 ../../gcc/c-family/c-opts.c:1087

 Following patch improves the host_size_t_cst_p predicate.

 Patch can bootstrap on ppc64le-redhat-linux and survives
 regression tests.

 Ready to be installed?
>>>
>>> I believe the wi::min_precision (t, UNSIGNED) <= sizeof (size_t) *
>>> CHAR_BIT test is now redundant.
>>>
>>> OTOH it was probably desired to allow -1 here?  A little looking 
>>> back
>>> in time should tell.
>>
>> Ok, it started with r229922, where it was changed from:
>>
>>   if (tree_fits_uhwi_p (len) && p1 && p2)
>> {
>>   const int i = strncmp (p1, p2, tree_to_uhwi (len));
>> ...
>>
>> to current version:
>>
>> case CFN_BUILT_IN_STRNCMP:
>>   {
>> bool const_size_p = host_size_t_cst_p (arg2, );
>>
>> Thus I'm suggesting to change to back to it.
>>
>> Ready to be installed?
>
> Let's ask Richard.

 The idea with the:

   wi::min_precision (t, UNSIGNED) <= sizeof (size_t) * CHAR_BIT

 test was to stop us attempting 64-bit size_t operations on ILP32 hosts.
 I think we still want that.
>>>
>>> OK, so is the consensus to add tree_fits_uhwi_p predicate to the current
>>> wi::min_precision check, right?
>>
>> Not sure.  If we have host_size_t_cst_p then we should have a 
>> corresponding
>> size_t host_size_t (const_tree) and should use those in pairs.  Not sure
>> why we have sth satisfying host_size_t_cst_p but not tree_fits_uhwi_p.
>
> It's the other way around: something can satisfy tree_fits_uhwi_p
> (i.e. fit within a uint64_t) but not fit within the host's size_t.
> The kind of case I'm thinking of is:
>
>   strncmp ("fi", "fo", (1L << 32) + 1)
>
> for an ILP32 host and LP64 target.  There's a danger that by passing
> the uint64_t value (1L << 32) + 1 to the host's strncmp that 

Re: [PATCH, testsuite]: Cleanup lib/target-supports.exp, ...

2016-11-02 Thread Jakub Jelinek
On Wed, Nov 02, 2016 at 12:39:08PM +0530, Prathamesh Kulkarni wrote:
> On 1 November 2016 at 23:41, Uros Bizjak  wrote:
> > On Tue, Nov 1, 2016 at 5:05 PM, Jakub Jelinek  wrote:
> >> On Tue, Nov 01, 2016 at 10:05:22AM +0100, Uros Bizjak wrote:
> >>> ... simplify some conditions and add i?86-*-* target where missing.
> >>>
> >>> 2016-11-01  Uros Bizjak  
> >>>
> >>> * lib/target-supports.exp: Normalize order of i?86 and x86_64 targets.
> >>> Whitespace fixes.
> >> ...
> >>> (check_effective_target_divmod): Add i?86-*-* target.
> >>
> >> This part likely broke
> >> +FAIL: gcc.dg/divmod-1.c scan-tree-dump-times widening_mul "DIVMOD" 7
> >> +FAIL: gcc.dg/divmod-2.c scan-tree-dump-times widening_mul "DIVMOD" 7
> >> +FAIL: gcc.dg/divmod-3.c scan-tree-dump-times widening_mul "DIVMOD" 7
> >> +FAIL: gcc.dg/divmod-4.c scan-tree-dump-times widening_mul "DIVMOD" 7
> >> +FAIL: gcc.dg/divmod-6.c scan-tree-dump-times widening_mul "DIVMOD" 7
> >> on i686-linux (i.e. 32-bit).
> >
> > No, this is expected (these tests already fail with x86_64 -m32
> > multilib). These will be fixed by [1].
> Oops, sorry for the breakage.
> The tests are meant to check if the divmod transform triggered, which
> is done by scanning
> DIVMOD in the widening_mul dump.
> 
> Apparently I only checked for the triplet "x86_64-*-*" in
> check_effective_target_divmod()
> and it returned 1, which probably caused the divmod DImode tests to
> fail with -m32.
> 
> In general, could I check in check_effective_target_*(), what options
> are passed ?

On some targets, yes.  E.g. on i?86-*-*/x86_64-*-*, one can additionally
test lp64 or ia32 or negation thereof - there are -m32, -mx32 and -m64
modes, -m64 satisfies lp64, -m32 ia32.

Though the predicate seems to be misnamed and not properly documented
if it is about properties of DImode divmod rather than other modes
(SImode, HImode, QImode, TImode, ...).

Jakub


[PATCH] Infer value ranges from stmt ops in EVRP

2016-11-02 Thread Richard Biener

The following makes EVRP use infer_value_range.  It also adds a bit of
dump verbosity to make EVRP traceable with -details.

Bootstrapped and tested on x86_64-unkown-linux-gnu, applied to trunk.

Richard.

2016-11-02  Richard Biener  

* tree-vrp.c (evrp_dom_walker::before_dom_children): Call
infer_value_range on stmt ops and update value-ranges.
Dump visited stmts and blocks.
(evrp_dom_walker::push_value_range): Dump changes.
(evrp_dom_walker::pop_value_range): Likewise.
(evrp_dom_walker::try_find_new_range): Avoid noop changes.

* gcc.dg/tree-ssa/vrp111.c: New testcase.
* gcc.dg/tree-ssa/pr20702.c: Disable EVRP.
* gcc.dg/tree-ssa/pr21086.c: Likewise.
* gcc.dg/tree-ssa/pr58480.c: Likewise.
* gcc.dg/tree-ssa/vrp08.c: Likewise.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 241697)
+++ gcc/tree-vrp.c  (working copy)
@@ -10650,18 +10650,17 @@ public:
 }
   virtual edge before_dom_children (basic_block);
   virtual void after_dom_children (basic_block);
-  void push_value_range (const_tree var, value_range *vr);
-  value_range *pop_value_range (const_tree var);
+  void push_value_range (tree var, value_range *vr);
+  value_range *pop_value_range (tree var);
   value_range *try_find_new_range (tree op, tree_code code, tree limit);
 
   /* Cond_stack holds the old VR.  */
-  auto_vec > stack;
+  auto_vec > stack;
   bitmap need_eh_cleanup;
   auto_vec stmts_to_fixup;
   auto_vec stmts_to_remove;
 };
 
-
 /*  Find new range for OP such that (OP CODE LIMIT) is true.  */
 
 value_range *
@@ -10679,6 +10678,10 @@ evrp_dom_walker::try_find_new_range (tre
  PUSH old value in the stack with the old VR.  */
   if (vr.type == VR_RANGE || vr.type == VR_ANTI_RANGE)
 {
+  if (old_vr->type == vr.type
+ && vrp_operand_equal_p (old_vr->min, vr.min)
+ && vrp_operand_equal_p (old_vr->max, vr.max))
+   return NULL;
   value_range *new_vr = vrp_value_range_pool.allocate ();
   *new_vr = vr;
   return new_vr;
@@ -10696,7 +10699,10 @@ evrp_dom_walker::before_dom_children (ba
   edge_iterator ei;
   edge e;
 
-  push_value_range (NULL_TREE, NULL);
+  if (dump_file && (dump_flags & TDF_DETAILS))
+fprintf (dump_file, "Visiting BB%d\n", bb->index);
+
+  stack.safe_push (std::make_pair (NULL_TREE, (value_range *)NULL));
 
   edge pred_e = NULL;
   FOR_EACH_EDGE (e, ei, bb->preds)
@@ -10723,6 +10729,11 @@ evrp_dom_walker::before_dom_children (ba
  && (INTEGRAL_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt)))
  || POINTER_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt)
{
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "Visiting controlling predicate ");
+ print_gimple_stmt (dump_file, stmt, 0, 0);
+   }
  /* Entering a new scope.  Try to see if we can find a VR
 here.  */
  tree op1 = gimple_cond_rhs (stmt);
@@ -10778,6 +10789,11 @@ evrp_dom_walker::before_dom_children (ba
continue;
   value_range vr_result = VR_INITIALIZER;
   bool interesting = stmt_interesting_for_vrp (phi);
+  if (interesting && dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "Visiting PHI node ");
+ print_gimple_stmt (dump_file, phi, 0, 0);
+   }
   if (!has_unvisited_preds
  && interesting)
extract_range_from_phi_node (phi, _result);
@@ -10814,6 +10830,12 @@ evrp_dom_walker::before_dom_children (ba
   bool was_noreturn = (is_gimple_call (stmt)
   && gimple_call_noreturn_p (stmt));
 
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "Visiting stmt ");
+ print_gimple_stmt (dump_file, stmt, 0, 0);
+   }
+
   if (gcond *cond = dyn_cast  (stmt))
{
  vrp_visit_cond_stmt (cond, _edge);
@@ -10825,6 +10847,7 @@ evrp_dom_walker::before_dom_children (ba
gimple_cond_make_false (cond);
  else
gcc_unreachable ();
+ update_stmt (stmt);
}
}
   else if (stmt_interesting_for_vrp (stmt))
@@ -10873,6 +10896,55 @@ evrp_dom_walker::before_dom_children (ba
   else
set_defs_to_varying (stmt);
 
+  /* See if we can derive a range for any of STMT's operands.  */
+  tree op;
+  ssa_op_iter i;
+  FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_USE)
+   {
+ tree value;
+ enum tree_code comp_code;
+
+ /* If OP is used in such a way that we can infer a value
+range for it, and we don't find a previous assertion for
+it, create a new assertion location node for OP.  */
+ if (infer_value_range (stmt, op, _code, ))
+   {
+ /* If we are able to infer a nonzero value range 

Re: [PATCH] Fix PR77407

2016-11-02 Thread Richard Biener
On Tue, 1 Nov 2016, Marc Glisse wrote:

> On Mon, 31 Oct 2016, Richard Biener wrote:
> 
> > On Fri, 28 Oct 2016, Marc Glisse wrote:
> > 
> > > On Wed, 28 Sep 2016, Richard Biener wrote:
> > > 
> > > > The following patch implements patterns to catch x / abs (x)
> > > > and x / -x, taking advantage of undefinedness at x == 0 as
> > > > opposed to the PR having testcases with explicit != 0 checks.
> > > > 
> > > > Bootstrap / regtest pending on x86_64-unknown-linux-gnu.
> > > > 
> > > > Richard.
> > > > 
> > > > 2016-09-28  Richard Biener  
> > > > 
> > > > PR middle-end/77407
> > > > * match.pd: Add X / abs (X) -> X < 0 ? -1 : 1 and
> > > > X / -X -> -1 simplifications.
> > > 
> > > I notice that we still have the following comment a few lines above:
> > > 
> > > /* Make sure to preserve divisions by zero.  This is the reason why
> > >we don't simplify x / x to 1 or 0 / x to 0.  */
> > > 
> > > Did we give up on preserving divisions by 0? Can we now do the 2
> > > simplifications listed by the comment?
> > 
> > At some point there was at least diagnostics fallout when doing them.
> > There may be also undefined sanitizer fallout depending on when we
> > instrument for that.
> > 
> > But in general yes, we do want to do the two simplifications.  Maybe
> > we can compromise (in case of early fallout) to do them on GIMPLE
> > only.
> > 
> > We could at least add them with a proper nonzero_p predicate.
> 
>  (for div (trunc_div ceil_div floor_div round_div exact_div)
> + (simplify (div @0 @0) { build_one_cst (type); })
> + (simplify (div integer_zerop@0 @1) @0)
> 
> causes no regression on powerpc64le-unknown-linux-gnu with
> --enable-languages=all,obj-c++,go.

Good.  I probably tried last before the C++ early folding changes.

If you'd formally post a patch adding the above (and adjusting the
comment) I'll approve that.

This eventually means we can remove the if (integer_zerop ()) early out
in fold_binary_loc as well.

Richard.


Re: [PATCH] enhance buffer overflow warnings (and c/53562)

2016-11-02 Thread Jakub Jelinek
On Tue, Nov 01, 2016 at 08:55:03PM -0600, Martin Sebor wrote:
> struct S {
>   int a, b, c, d;
> };
> 
> #define bos(p, t) __builtin_object_size (p, t)
> #define memset0(p, i, n) __builtin___memset_chk (p, i, n, bos (p, 0))
> #define memset1(p, i, n) __builtin___memset_chk (p, i, n, bos (p, 1))
> 
> void f0 (struct S *s)
> {
>   memset0 (>d, 0, 1024);   // no warning here (bos 0)
> }

But we do not want the warning here, there is nothing wrong on it.
The caller may be

void
bar (void)
{
  struct T { struct S header; char payload[1024 - offsetof (struct S, d)]; } t;
  initialize ();
  f0 ();
}

and the callee might rely on that.  Using some header structure at the
beginning and then conditionally on fields in that structure various
payloads occurs in many projects, starting with glibc, gcc, Linux kernel,
... The warning really must not be detached from reality.

If you want a warning for suspicious calls, sure, but
1) it has to be clearly worded significantly differently from how do you
   word it, so that users really understand you are warning about
   suspicious code (though, I really believe it is very common and there
   is really nothing the users can do about it)
2) it really shouldn't be enabled in -Wall, and I think not even in -Wextra

Jakub


[ping * 4] PR35503 - warn for restrict

2016-11-02 Thread Prathamesh Kulkarni
Pinging patch: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01545.html

Thanks,
Prathamesh


Re: [PATCH, testsuite]: Cleanup lib/target-supports.exp, ...

2016-11-02 Thread Prathamesh Kulkarni
On 1 November 2016 at 23:41, Uros Bizjak  wrote:
> On Tue, Nov 1, 2016 at 5:05 PM, Jakub Jelinek  wrote:
>> On Tue, Nov 01, 2016 at 10:05:22AM +0100, Uros Bizjak wrote:
>>> ... simplify some conditions and add i?86-*-* target where missing.
>>>
>>> 2016-11-01  Uros Bizjak  
>>>
>>> * lib/target-supports.exp: Normalize order of i?86 and x86_64 targets.
>>> Whitespace fixes.
>> ...
>>> (check_effective_target_divmod): Add i?86-*-* target.
>>
>> This part likely broke
>> +FAIL: gcc.dg/divmod-1.c scan-tree-dump-times widening_mul "DIVMOD" 7
>> +FAIL: gcc.dg/divmod-2.c scan-tree-dump-times widening_mul "DIVMOD" 7
>> +FAIL: gcc.dg/divmod-3.c scan-tree-dump-times widening_mul "DIVMOD" 7
>> +FAIL: gcc.dg/divmod-4.c scan-tree-dump-times widening_mul "DIVMOD" 7
>> +FAIL: gcc.dg/divmod-6.c scan-tree-dump-times widening_mul "DIVMOD" 7
>> on i686-linux (i.e. 32-bit).
>
> No, this is expected (these tests already fail with x86_64 -m32
> multilib). These will be fixed by [1].
Oops, sorry for the breakage.
The tests are meant to check if the divmod transform triggered, which
is done by scanning
DIVMOD in the widening_mul dump.

Apparently I only checked for the triplet "x86_64-*-*" in
check_effective_target_divmod()
and it returned 1, which probably caused the divmod DImode tests to
fail with -m32.

In general, could I check in check_effective_target_*(), what options
are passed ?
So in case of -m32, I wanted to return 0 instead of 1 to make the
tests on 32-bit
UNSUPPORTED.

Thanks for fixing the test-cases!

Thanks,
Prathamesh
>
> [1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02483.html
>
> Uros.
>
>> Dunno what exactly the tests are meant to test, most likely they just
>> need extra guards or something.  Can be reproduced even on x86_64-linux
>> with
>> make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} dg.exp=divmod*'
>>
>>> @@ -8110,7 +8090,7 @@
>>>  #TODO: Add checks for all targets that have either hardware divmod insn
>>>  # or define libfunc for divmod.
>>>  if { [istarget arm*-*-*]
>>> -  || [istarget x86_64-*-*] } {
>>> +  || [istarget i?86-*-*] || [istarget x86_64-*-*] } {
>>>   return 1
>>>  }
>>>  return 0
>>
>>
>> Jakub


Re: [PATCH, C++] Warn on redefinition of builtin functions (PR c++/71973)

2016-11-02 Thread Bernd Edlinger
On 11/01/16 22:31, Jason Merrill wrote:
> On Tue, Nov 1, 2016 at 4:00 PM, Bernd Edlinger
>  wrote:
>> On 11/01/16 20:48, Jason Merrill wrote:
   else if ((DECL_EXTERN_C_P (newdecl)
 && DECL_EXTERN_C_P (olddecl))
|| compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
  TYPE_ARG_TYPES (TREE_TYPE (olddecl
>
> So I was thinking to drop the "else" and the compparms test.

 Yes.  But then we must somehow avoid:

else
  /* Discard the old built-in function.  */
  return NULL_TREE;
>
 It maybe easier, just to copy the warning to the DECL_ANTICIPATED case?
>>>
>>> Or even move it there; removing the existing warning doesn't change
>>> anything in the testsuite, and I'm having trouble imagining how to
>>> trigger it.
>>>
>>
>> Nice.  It must be something, which does not anticipate a declaration.
>>
>> like:
>>
>> cat test.cc
>> int __builtin_abort(void*);
>>
>> g++ -c test.cc
>> test.cc:1:5: warning: new declaration 'int __builtin_abort(void*)'
>> ambiguates built-in declaration 'void __builtin_abort()'
>>   int __builtin_abort(void*);
>>   ^~~
>>
>> Intersting, the warning comes even though I forgot to add the
>> extern "C".
>
> Looks like anything starting with __builtin is implicitly extern "C"
> (set in grokfndecl).
>

Oh, Well.

we do also have such builtins that begin with __sync_
like

extern "C"
unsigned char __sync_fetch_and_add_1(volatile void*, unsigned char);

> If we remove the DECL_ANTICIPATED check, I see some failures in
> builtin* tests due to missing extern "C".  That seems appropriate at
> file scope, but I'm not sure it's right for namespace std.
>


Interesting, what have you done?

Sounds a bit like the issue I had with the std::abs overloads.

To get rid of that I added this in the DECL_ANTICIPATED path:

+ /* A new declaration doesn't match a built-in one unless it
+is also extern "C".  */
+ gcc_assert (DECL_IS_BUILTIN (olddecl));
+ gcc_assert (DECL_EXTERN_C_P (olddecl));
+ if (!DECL_EXTERN_C_P (newdecl))
+   return NULL_TREE;
+


Bernd.


Re: [PATCH, C++] Warn on redefinition of builtin functions (PR c++/71973)

2016-11-02 Thread Bernd Edlinger
On 11/02/16 07:11, Bernd Edlinger wrote:
> On 11/01/16 19:15, Bernd Edlinger wrote:
>> On 11/01/16 18:11, Jason Merrill wrote:
>>> On Tue, Nov 1, 2016 at 11:45 AM, Bernd Edlinger
>>>  wrote:
 On 11/01/16 16:20, Jason Merrill wrote:
> On 10/17/2016 03:18 PM, Bernd Edlinger wrote:
> I'm not even sure we need a new warning.  Can we combine this warning
> with the block that currently follows?

 After 20 years of not having a warning on that,
 an implicitly enabled warning would at least break lots of bogus
 test cases.
>>>
>>> Would it, though?  Which test cases still break with the current patch?
>>>
>>
>> Less than before, but there are still at least a few of them.
>>
>> I can make a list and send it tomorrow.
>>
>
> FAIL: g++.dg/cpp1y/lambda-generic-udt.C  -std=gnu++14 (test for excess
> errors)
> FAIL: g++.dg/cpp1y/lambda-generic-xudt.C  -std=gnu++14 (test for excess
> errors)
> FAIL: g++.dg/init/new15.C  -std=c++11 (test for excess errors)
> FAIL: g++.dg/init/new15.C  -std=c++14 (test for excess errors)
> FAIL: g++.dg/init/new15.C  -std=c++98 (test for excess errors)
> FAIL: g++.dg/ipa/inline-1.C  -std=gnu++11 (test for excess errors)
> FAIL: g++.dg/ipa/inline-1.C  -std=gnu++14 (test for excess errors)
> FAIL: g++.dg/ipa/inline-1.C  -std=gnu++98 (test for excess errors)
> FAIL: g++.dg/ipa/inline-2.C  -std=gnu++11 (test for excess errors)
> FAIL: g++.dg/ipa/inline-2.C  -std=gnu++14 (test for excess errors)
> FAIL: g++.dg/ipa/inline-2.C  -std=gnu++98 (test for excess errors)
> FAIL: g++.dg/tc1/dr20.C  -std=c++11 (test for excess errors)
> FAIL: g++.dg/tc1/dr20.C  -std=c++14 (test for excess errors)
> FAIL: g++.dg/tc1/dr20.C  -std=c++98 (test for excess errors)
> FAIL: g++.dg/tree-ssa/inline-1.C  -std=gnu++11 (test for excess errors)
> FAIL: g++.dg/tree-ssa/inline-1.C  -std=gnu++14 (test for excess errors)
> FAIL: g++.dg/tree-ssa/inline-1.C  -std=gnu++98 (test for excess errors)
> FAIL: g++.dg/tree-ssa/inline-2.C  -std=gnu++11 (test for excess errors)
> FAIL: g++.dg/tree-ssa/inline-2.C  -std=gnu++14 (test for excess errors)
> FAIL: g++.dg/tree-ssa/inline-2.C  -std=gnu++98 (test for excess errors)
> FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O0 -flto
> -flto-partition=1to1 -fno-use-linker-plugin
> FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O0 -flto
> -flto-partition=none -fuse-linker-plugin
> FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O0 -flto
> -fuse-linker-plugin -fno-fat-lto-objects
> FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O2 -flto
> -flto-partition=1to1 -fno-use-linker-plugin
> FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O2 -flto
> -flto-partition=none -fuse-linker-plugin -fno-fat-lto-objects
> FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O2 -flto
> -fuse-linker-plugin
> FAIL: g++.dg/lto/pr68811 cp_lto_pr68811_0.o assemble, -O2
> FAIL: g++.old-deja/g++.law/except1.C  -std=gnu++11 (test for excess errors)
> FAIL: g++.old-deja/g++.law/except1.C  -std=gnu++14 (test for excess errors)
> FAIL: g++.old-deja/g++.law/except1.C  -std=gnu++98 (test for excess errors)
> FAIL: g++.old-deja/g++.mike/p700.C  -std=gnu++11 (test for excess errors)
> FAIL: g++.old-deja/g++.mike/p700.C  -std=gnu++14 (test for excess errors)
> FAIL: g++.old-deja/g++.mike/p700.C  -std=gnu++98 (test for excess errors)
> FAIL: g++.old-deja/g++.other/builtins10.C  -std=c++11 (test for excess
> errors)
> FAIL: g++.old-deja/g++.other/builtins10.C  -std=c++14 (test for excess
> errors)
> FAIL: g++.old-deja/g++.other/realloc.C  -std=c++11 (test for excess errors)
> FAIL: g++.old-deja/g++.other/realloc.C  -std=c++14 (test for excess errors)
> FAIL: g++.old-deja/g++.other/realloc.C  -std=c++98 (test for excess errors)
> FAIL: g++.old-deja/g++.other/vbase5.C  -std=c++11 (test for excess errors)
> FAIL: g++.old-deja/g++.other/vbase5.C  -std=c++14 (test for excess errors)
> FAIL: g++.old-deja/g++.other/vbase5.C  -std=c++98 (test for excess errors)
>
>
> The lto test case does emit the warning when assembling, but
> it still produces an executable and even executes it.
>
> Also g++.dg/cpp1y/lambda-generic-udt.C, g++.dg/tc1/dr20.C
> and g++.old-deja/g++.other/vbase5.C are execution tests.
>
> So I was wrong to assume these were all compile-only tests.
>
> I think that list should be fixable, if we decide to enable
> the warning by default.
>

Aehm, sorry.  I forgot to look at other languages...

 === obj-c++ tests ===


Running target unix
FAIL: obj-c++.dg/lto/trivial-1 obj_cpp_lto_trivial-1_0.o assemble, -O0 
-flto -fg
nu-runtime
FAIL: obj-c++.dg/lto/trivial-1 obj_cpp_lto_trivial-1_0.o assemble, -O2 
-flto -fg
nu-runtime
FAIL: obj-c++.dg/lto/trivial-1 obj_cpp_lto_trivial-1_0.o assemble, -O0 
-flto -fl
to-partition=none -fgnu-runtime
FAIL: obj-c++.dg/lto/trivial-1 obj_cpp_lto_trivial-1_0.o assemble, -O2 
-flto -fl
to-partition=none -fgnu-runtime

 === 

Re: [PATCH, C++] Warn on redefinition of builtin functions (PR c++/71973)

2016-11-02 Thread Bernd Edlinger
On 11/01/16 19:15, Bernd Edlinger wrote:
> On 11/01/16 18:11, Jason Merrill wrote:
>> On Tue, Nov 1, 2016 at 11:45 AM, Bernd Edlinger
>>  wrote:
>>> On 11/01/16 16:20, Jason Merrill wrote:
 On 10/17/2016 03:18 PM, Bernd Edlinger wrote:
 I'm not even sure we need a new warning.  Can we combine this warning
 with the block that currently follows?
>>>
>>> After 20 years of not having a warning on that,
>>> an implicitly enabled warning would at least break lots of bogus
>>> test cases.
>>
>> Would it, though?  Which test cases still break with the current patch?
>>
>
> Less than before, but there are still at least a few of them.
>
> I can make a list and send it tomorrow.
>

FAIL: g++.dg/cpp1y/lambda-generic-udt.C  -std=gnu++14 (test for excess 
errors)
FAIL: g++.dg/cpp1y/lambda-generic-xudt.C  -std=gnu++14 (test for excess 
errors)
FAIL: g++.dg/init/new15.C  -std=c++11 (test for excess errors)
FAIL: g++.dg/init/new15.C  -std=c++14 (test for excess errors)
FAIL: g++.dg/init/new15.C  -std=c++98 (test for excess errors)
FAIL: g++.dg/ipa/inline-1.C  -std=gnu++11 (test for excess errors)
FAIL: g++.dg/ipa/inline-1.C  -std=gnu++14 (test for excess errors)
FAIL: g++.dg/ipa/inline-1.C  -std=gnu++98 (test for excess errors)
FAIL: g++.dg/ipa/inline-2.C  -std=gnu++11 (test for excess errors)
FAIL: g++.dg/ipa/inline-2.C  -std=gnu++14 (test for excess errors)
FAIL: g++.dg/ipa/inline-2.C  -std=gnu++98 (test for excess errors)
FAIL: g++.dg/tc1/dr20.C  -std=c++11 (test for excess errors)
FAIL: g++.dg/tc1/dr20.C  -std=c++14 (test for excess errors)
FAIL: g++.dg/tc1/dr20.C  -std=c++98 (test for excess errors)
FAIL: g++.dg/tree-ssa/inline-1.C  -std=gnu++11 (test for excess errors)
FAIL: g++.dg/tree-ssa/inline-1.C  -std=gnu++14 (test for excess errors)
FAIL: g++.dg/tree-ssa/inline-1.C  -std=gnu++98 (test for excess errors)
FAIL: g++.dg/tree-ssa/inline-2.C  -std=gnu++11 (test for excess errors)
FAIL: g++.dg/tree-ssa/inline-2.C  -std=gnu++14 (test for excess errors)
FAIL: g++.dg/tree-ssa/inline-2.C  -std=gnu++98 (test for excess errors)
FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O0 -flto 
-flto-partition=1to1 -fno-use-linker-plugin
FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O0 -flto 
-flto-partition=none -fuse-linker-plugin
FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O0 -flto 
-fuse-linker-plugin -fno-fat-lto-objects
FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O2 -flto 
-flto-partition=1to1 -fno-use-linker-plugin
FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O2 -flto 
-flto-partition=none -fuse-linker-plugin -fno-fat-lto-objects
FAIL: g++.dg/lto/20080908-1 cp_lto_20080908-1_0.o assemble, -O2 -flto 
-fuse-linker-plugin
FAIL: g++.dg/lto/pr68811 cp_lto_pr68811_0.o assemble, -O2
FAIL: g++.old-deja/g++.law/except1.C  -std=gnu++11 (test for excess errors)
FAIL: g++.old-deja/g++.law/except1.C  -std=gnu++14 (test for excess errors)
FAIL: g++.old-deja/g++.law/except1.C  -std=gnu++98 (test for excess errors)
FAIL: g++.old-deja/g++.mike/p700.C  -std=gnu++11 (test for excess errors)
FAIL: g++.old-deja/g++.mike/p700.C  -std=gnu++14 (test for excess errors)
FAIL: g++.old-deja/g++.mike/p700.C  -std=gnu++98 (test for excess errors)
FAIL: g++.old-deja/g++.other/builtins10.C  -std=c++11 (test for excess 
errors)
FAIL: g++.old-deja/g++.other/builtins10.C  -std=c++14 (test for excess 
errors)
FAIL: g++.old-deja/g++.other/realloc.C  -std=c++11 (test for excess errors)
FAIL: g++.old-deja/g++.other/realloc.C  -std=c++14 (test for excess errors)
FAIL: g++.old-deja/g++.other/realloc.C  -std=c++98 (test for excess errors)
FAIL: g++.old-deja/g++.other/vbase5.C  -std=c++11 (test for excess errors)
FAIL: g++.old-deja/g++.other/vbase5.C  -std=c++14 (test for excess errors)
FAIL: g++.old-deja/g++.other/vbase5.C  -std=c++98 (test for excess errors)


The lto test case does emit the warning when assembling, but
it still produces an executable and even executes it.

Also g++.dg/cpp1y/lambda-generic-udt.C, g++.dg/tc1/dr20.C
and g++.old-deja/g++.other/vbase5.C are execution tests.

So I was wrong to assume these were all compile-only tests.

I think that list should be fixable, if we decide to enable
the warning by default.

>>> Of course in C we have an implicitly enabled warning,
>>> so I would like to at least enable the warning on -Wall, thus
>>> -Wshadow is too weak IMO.
>>
>> Right.  The -Wshadow warning is for file-local declarations, so that
>> doesn't apply to your testcase; I was thinking that we should hit the
>> first (currently unconditional) warning.
>>
>   else if ((DECL_EXTERN_C_P (newdecl)
> && DECL_EXTERN_C_P (olddecl))
>|| compparms (TYPE_ARG_TYPES (TREE_TYPE (newdecl)),
>  TYPE_ARG_TYPES (TREE_TYPE
> (olddecl
>>
>> So I was thinking to drop the "else" and the compparms test.
>>
>
> Yes.  But then we must somehow avoid:
>

<    1   2