RFA: Small PATCH to add pow2p_hwi to hwint.h

2016-09-07 Thread Jason Merrill
Various places in GCC use negate, bit-and and compare to test whether
an integer is a power of 2, but I think it would be clearer for this
test to be wrapped in a function.

OK for trunk?
commit e2ca9914ce46d56775854f50c21506b220fd50b6
Author: Jason Merrill 
Date:   Wed Sep 7 16:22:32 2016 -0400

* hwint.h (pow2p_hwi): New.

diff --git a/gcc/hwint.h b/gcc/hwint.h
index 6b4d537..3d85fc3 100644
--- a/gcc/hwint.h
+++ b/gcc/hwint.h
@@ -299,4 +299,12 @@ absu_hwi (HOST_WIDE_INT x)
   return x >= 0 ? (unsigned HOST_WIDE_INT)x : -(unsigned HOST_WIDE_INT)x;
 }
 
+/* True if X is a power of two.  */
+
+inline bool
+pow2p_hwi (unsigned HOST_WIDE_INT x)
+{
+  return (x & -x) == x;
+}
+
 #endif /* ! GCC_HWINT_H */
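For illustration only (not part of the patch): note that the raw idiom also
holds for x == 0, so a caller that needs to reject zero would have to check
for it separately, along these lines (the helper name here is made up):

  static inline bool
  nonzero_pow2p_example (unsigned HOST_WIDE_INT x)
  {
    /* (x & -x) keeps only the lowest set bit; it equals x exactly when
       at most one bit is set, i.e. for 0, 1, 2, 4, 8, ...  */
    return x != 0 && (x & -x) == x;
  }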


Re: [RFC][SSA] Iterator to visit SSA

2016-09-07 Thread Kugan Vivekanandarajah
Hi Richard,

On 7 September 2016 at 19:35, Richard Biener  wrote:
> On Wed, Sep 7, 2016 at 2:21 AM, Kugan Vivekanandarajah
>  wrote:
>> Hi Richard,
>>
>> On 6 September 2016 at 19:08, Richard Biener  
>> wrote:
>>> On Tue, Sep 6, 2016 at 2:24 AM, Kugan Vivekanandarajah
>>>  wrote:
 Hi Richard,

 On 5 September 2016 at 17:57, Richard Biener  
 wrote:
> On Mon, Sep 5, 2016 at 7:26 AM, Kugan Vivekanandarajah
>  wrote:
>> Hi All,
>>
>> While looking at the GCC source, I noticed that we are iterating over SSA
>> variables from 0 to num_ssa_names in some cases and from 1 to num_ssa_names
>> in others. It seems that variable 0 is always NULL_TREE.
>
> Yeah, that's artificial because we don't assign SSA name version 0 (for 
> some
> unknown reason).
>
>> But it can
>> confuse people who are looking at it for the first time. Therefore it might
>> be good to follow some consistent usage here.
>>
>> It might also be good to have a FOR_EACH_SSAVAR iterator as we do in
>> other cases. Here is an attempt to do this based on what is done in other
>> places. Bootstrapped and regression tested on x86_64-linux-gnu with no
>> new regressions. Is this OK?
>
> First of all some bikeshedding - FOR_EACH_SSA_NAME would be better
> as SSAVAR might be confused with iterating over SSA_NAME_VAR.
>
> Then, if you add an iterator why leave the name == NULL handling to 
> consumers?
> That looks odd.
>
> Then, SSA names are just in a vector thus why not re-use the vector 
> iterator?
>
> That is,
>
> #define FOR_EACH_SSA_NAME(name) \
>   for (unsigned i = 1; SSANAMES (cfun)->iterate (i, &name); ++i)
>
> would be equivalent to your patch?
>
> Please also don't add new iterators that implicitly use 'cfun' but
> always use
> one that passes it as an explicit arg.

 I think defining FOR_EACH_SSA_NAME with the vector iterator is better. But
 we will not be able to skip NULL ssa_names with that.
>>>
>>> Why?  Can't you simply do
>>>
>>>   #define FOR_EACH_SSA_NAME(name) \
>>> for (unsigned i = 1; SSANAMES (cfun)->iterate (i, &name); ++i) \
>>>      if (name)
>>>
>>> ?
>>
>> Indeed.  I missed the if at the end :(.  Here is an updated patch.
>> Bootstrapped and regression tested on x86_64-linux-gnu with no new
>> regressions.
>
> Quoting myself:
>
>> As I said
>> I'd like 'cfun' to be explicit, thus
>>
>> #define FOR_EACH_SSA_NAME(I, VAR, FN) \
>>  for (I = 1; SSANAMES (FN)->iterate ((I), &(VAR)); ++(I)) \
>>   if (VAR)
>
> the patch doesn't seem to contain that FN part.  Please also put declarations
> of 'name' you need to add right before the FOR_EACH_SSA_NAME rather
> than far away.

Please find attached a patch that does this. Bootstrapped and regression
tested on x86_64-linux-gnu with no new regressions.
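
For reference, a use site then looks roughly like this (a sketch following
the macro form quoted above, not an excerpt from the attached patch):

  /* Sketch: count the live SSA names of FN; NAME is guaranteed non-NULL
     inside the loop body because the macro filters out released names.  */
  static unsigned
  count_live_ssa_names (function *fn)
  {
    unsigned i;
    tree name;
    unsigned count = 0;

    FOR_EACH_SSA_NAME (i, name, fn)
      count++;
    return count;
  }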

Is this OK?

Thanks,
Kugan

>
> Thanks,
> Richard.
>
>> Thanks,
>> Kugan
>>>
 I also added an
 index variable to the macro so that there won't be any conflicts if the
 index variable "i" (or whatever) is also defined in the loop.

 Bootstrapped and regression tested on x86_64-linux-gnu with no new
 regressions. Is this OK for trunk?

 Thanks,
 Kugan


 gcc/ChangeLog:

 2016-09-06  Kugan Vivekanandarajah  

 * tree-ssanames.h (FOR_EACH_SSA_NAME): New.
 * cfgexpand.c (update_alias_info_with_stack_vars): Use
 FOR_EACH_SSA_NAME to iterate over SSA variables.
 (pass_expand::execute): Likewise.
 * omp-simd-clone.c (ipa_simd_modify_function_body): Likewise.
 * tree-cfg.c (dump_function_to_file): Likewise.
 * tree-into-ssa.c (pass_build_ssa::execute): Likewise.
 (update_ssa): Likewise.
 * tree-ssa-alias.c (dump_alias_info): Likewise.
 * tree-ssa-ccp.c (ccp_finalize): Likewise.
 * tree-ssa-coalesce.c (build_ssa_conflict_graph): Likewise.
 (create_outofssa_var_map): Likewise.
 (coalesce_ssa_name): Likewise.
 * tree-ssa-operands.c (dump_immediate_uses): Likewise.
 * tree-ssa-pre.c (compute_avail): Likewise.
 * tree-ssa-sccvn.c (init_scc_vn): Likewise.
 (scc_vn_restore_ssa_info): Likewise.
 (free_scc_vn): Likewise.
 (run_scc_vn): Likewise.
 * tree-ssa-structalias.c (compute_points_to_sets): Likewise.
 * tree-ssa-ter.c (new_temp_expr_table): Likewise.
 * tree-ssa-copy.c (fini_copy_prop): Likewise.
 * tree-ssa.c (verify_ssa): Likewise.

>
> Thanks,
> Richard.
>
>
>> Thanks,
>> Kugan
>>
>>
>> gcc/ChangeLog:
>>
>> 2016-09-05  Kugan Vivekanandarajah  
>>
>> * tree-ssanames.h 

Re: [PATCH GCC 9/9]Prove no-overflow in computation of LOOP_VINFO_NITERS and improve code generation

2016-09-07 Thread kugan

Hi Bin,

On 07/09/16 17:52, Bin.Cheng wrote:

On Wed, Sep 7, 2016 at 1:10 AM, kugan  wrote:

Hi Bin,


On 07/09/16 04:54, Bin Cheng wrote:


Hi,
LOOP_VINFO_NITERS is computed as LOOP_VINFO_NITERSM1 + 1, which could
overflow in the loop niters' type.  The vectorizer needs to generate more
code to compute the vectorized niters if overflow does happen.  However,
for common loops there is actually no overflow; this patch tries to prove
the no-overflow property and uses it to improve code generation.  At the
moment, the no-overflow information comes either from loop niter analysis,
or from the fact that we know the loop is peeled for a non-zero number of
iterations in prologue peeling.  In the latter case, it doesn't matter
whether the original LOOP_VINFO_NITERS overflows or not, because the
computation LOOP_VINFO_NITERS - LOOP_VINFO_PEELING_FOR_ALIGNMENT cancels
the overflow with an underflow.

Thanks,
bin

2016-09-01  Bin Cheng  

* tree-vect-loop.c (loop_niters_no_overflow): New func.
(vect_transform_loop): Call loop_niters_no_overflow.  Pass the
no-overflow information to vect_do_peeling_for_loop_bound and
vect_gen_vector_loop_niters.


009-prove-no_overflow-for-vect-niters-20160902.txt


diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 0d37f55..2ef1f9b 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6610,6 +6610,38 @@ vect_loop_kill_debug_uses (struct loop *loop,
gimple *stmt)
 }
 }

+/* Given loop represented by LOOP_VINFO, return true if computation of
+   LOOP_VINFO_NITERS (= LOOP_VINFO_NITERSM1 + 1) doesn't overflow, false
+   otherwise.  */
+
+static bool
+loop_niters_no_overflow (loop_vec_info loop_vinfo)
+{
+  /* Constant case.  */
+  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+{
+  int cst_niters = LOOP_VINFO_INT_NITERS (loop_vinfo);



Wouldn't it truncate when assigning this to int?

Probably.  Now I think it's unnecessary to use the int version of niters
here; LOOP_VINFO_NITERS can be used directly.




+  tree cst_nitersm1 = LOOP_VINFO_NITERSM1 (loop_vinfo);
+
+  gcc_assert (TREE_CODE (cst_nitersm1) == INTEGER_CST);
+  if (wi::to_widest (cst_nitersm1) < cst_niters)



Shouldn't you do the addition and comparison in the type of the loop
index instead of widest_int to see if that overflows?

You mean the type of the loop niters?  NITERS is computed from NITERSM1 +
1; I don't think we need to do it again here.


Imagine that you have LOOP_VINFO_NITERSM1 as TYPE_MAX (loop niters 
type). In this case, when you add 1, it will overflow in loop niters 
type but not when you do the computation in widest_int.


But, as you said, if NITERS is already computed in loop niters type, yes 
this compare should be sufficient.


You could do the comparison as a wide_int or a tree.  I think this would
make it clearer.
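
Something like the following is what I have in mind (just an untested
sketch, reusing cst_nitersm1 from the hunk quoted above):

  /* Sketch: NITERSM1 + 1 cannot wrap in the loop niters type as long as
     NITERSM1 is strictly below the maximum value of that type.  */
  tree ntype = TREE_TYPE (cst_nitersm1);
  widest_int type_max = wi::to_widest (TYPE_MAX_VALUE (ntype));
  bool niters_no_overflow = wi::to_widest (cst_nitersm1) < type_max;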


Thanks,
Kugan



Thanks,
bin



Re: Make max_align_t respect _Float128 [version 2]

2016-09-07 Thread Paul Eggert

On 09/07/2016 04:52 AM, Mark Wielaard wrote:

If valgrind believes the
memory isn't valid memory then it will complain about an invalid access.
But if the memory is accessible but uninitialised then it will just track
the undefinedness and complain later if such a value is used.


I think the former is what happened in Gnulib fts.c before Gnulib was fixed.


valgrind also has --partial-loads-ok (which in newer versions defaults
to =yes):

Controls how Memcheck handles 32-, 64-, 128- and 256-bit naturally
aligned loads from addresses for which some bytes are addressable
and others are not.


Although this helps in some cases, it does not suffice in general since 
the problem can occur with 16-bit aligned loads. I think that is what 
happened with fts.c.



Does anybody have an example program of the above issue compiled with
gcc that produces false positives with valgrind?



Sure, attached. On Fedora 24 x86-64 (GCC 6.1.1 20160621, valgrind 
3.11.0), when I compile with "gcc -O2 flexouch.c" and run with "valgrind 
./a.out", valgrind complains "Invalid read of size 2". This is because 
GCC compiles "p->d[0] == 2 && p->d[1] == 3" into "cmpw $770, 8(%rax); 
sete %al", which loads the uninitialized byte p->d[1] simultaneously 
with the initialized byte p->d[0].


As mentioned previously, although flexouch.c does not conform to C11, 
this is arguably a defect in C11.


#include <stddef.h>
#include <stdlib.h>

struct s { struct s *next; char d[]; };

int
main (void)
{
  struct s *p = malloc (offsetof (struct s, d) + 1);
  p->d[0] = 1;
  return p->d[0] == 2 && p->d[1] == 3;
}


Re: [Patch libgcc] Enable HCmode multiply and divide (mulhc3/divhc3)

2016-09-07 Thread Bernd Schmidt

On 09/07/2016 06:05 PM, James Greenhalgh wrote:


Hi,

This patch arranges for half-precision complex multiply and divide
routines to be built if __LIBGCC_HAS_HF_MODE__.  This will be true
if the target supports the _Float16 type.

OK?


Ok, but please see Joseph's patch from today: I think you probably want 
to invert the NOTRUNC case here as well.



Bernd


Re: Correct libgcc complex multiply excess precision handling

2016-09-07 Thread Bernd Schmidt

On 09/07/2016 11:48 PM, Joseph Myers wrote:

libgcc complex multiply is meant to eliminate excess
precision from certain internal values by forcing them to memory in
exactly those cases where the type has excess precision.  But in
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01894.html I
accidentally inverted the logic so that values get forced to memory in
exactly the cases where it's not needed.  (This is a pessimization in
the no-excess-precision case, in principle could lead to bad results
depending on code generation in the excess-precision case.  Note: I do
not have a test demonstrating bad results.)


Ok.


Bernd



Correct libgcc complex multiply excess precision handling

2016-09-07 Thread Joseph Myers
libgcc complex multiply is meant to eliminate excess
precision from certain internal values by forcing them to memory in
exactly those cases where the type has excess precision.  But in
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01894.html I
accidentally inverted the logic so that values get forced to memory in
exactly the cases where it's not needed.  (This is a pessimization in
the no-excess-precision case, in principle could lead to bad results
depending on code generation in the excess-precision case.  Note: I do
not have a test demonstrating bad results.)

Bootstrapped with no regressions on x86_64-pc-linux-gnu.  Code size
went down on x86_64 as expected; old sizes:

   text    data     bss     dec     hex filename
    887       0       0     887     377 _muldc3.o
    810       0       0     810     32a _mulsc3.o
   2032       0       0    2032     7f0 _multc3.o
    983       0       0     983     3d7 _mulxc3.o

New sizes:

    847       0       0     847     34f _muldc3.o
    770       0       0     770     302 _mulsc3.o
   2032       0       0    2032     7f0 _multc3.o
    951       0       0     951     3b7 _mulxc3.o

OK to commit?

2016-09-07  Joseph Myers  

PR libgcc/77519
* libgcc2.c (NOTRUNC): Invert settings.

Index: libgcc/libgcc2.c
===
--- libgcc/libgcc2.c(revision 240028)
+++ libgcc/libgcc2.c(working copy)
@@ -1866,25 +1866,25 @@
 # define CTYPE SCtype
 # define MODE  sc
 # define CEXT  __LIBGCC_SF_FUNC_EXT__
-# define NOTRUNC __LIBGCC_SF_EXCESS_PRECISION__
+# define NOTRUNC (!__LIBGCC_SF_EXCESS_PRECISION__)
 #elif defined(L_muldc3) || defined(L_divdc3)
 # define MTYPE DFtype
 # define CTYPE DCtype
 # define MODE  dc
 # define CEXT  __LIBGCC_DF_FUNC_EXT__
-# define NOTRUNC __LIBGCC_DF_EXCESS_PRECISION__
+# define NOTRUNC (!__LIBGCC_DF_EXCESS_PRECISION__)
 #elif defined(L_mulxc3) || defined(L_divxc3)
 # define MTYPE XFtype
 # define CTYPE XCtype
 # define MODE  xc
 # define CEXT  __LIBGCC_XF_FUNC_EXT__
-# define NOTRUNC __LIBGCC_XF_EXCESS_PRECISION__
+# define NOTRUNC (!__LIBGCC_XF_EXCESS_PRECISION__)
 #elif defined(L_multc3) || defined(L_divtc3)
 # define MTYPE TFtype
 # define CTYPE TCtype
 # define MODE  tc
 # define CEXT  __LIBGCC_TF_FUNC_EXT__
-# define NOTRUNC __LIBGCC_TF_EXCESS_PRECISION__
+# define NOTRUNC (!__LIBGCC_TF_EXCESS_PRECISION__)
 #else
 # error
 #endif

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Make max_align_t respect _Float128 [version 2]

2016-09-07 Thread Joseph Myers
On Wed, 7 Sep 2016, Bernd Edlinger wrote:

> Apparently the different -msse default setting made the situation worse.
> I think that will not run on a pentium4 any more.

I think that's x86_64-* defaulting to an x86_64 processor (which implies 
SSE2 support) even with -m32 (unless a --with-arch-32= option is used to 
select a different 32-bit default).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Fortran, Patch] First patch for coarray FAILED IMAGES (TS 18508)

2016-09-07 Thread Alessandro Fanfarillo
Dear all,
the attached patch supports failed images also when -fcoarray=single is used.

Built and regtested on x86_64-pc-linux-gnu.

Cheers,
Alessandro

2016-08-09 5:22 GMT-06:00 Paul Richard Thomas :
> Hi Sandro,
>
> As far as I can see, this is OK barring a couple of minor wrinkles and
> a question:
>
> For coarray_failed_images_err.f90 and coarray_image_status_err.f90 you
> have used the option -fdump-tree-original without making use of the
> tree dump.
>
> Mikael asked you to provide an executable test with -fcoarray=single.
> Is this not possible for some reason?
>
> Otherwise, this is OK for trunk.
>
> Thanks for the patch.
>
> Paul
>
> On 4 August 2016 at 05:07, Alessandro Fanfarillo
>  wrote:
>> * PING *
>>
>> 2016-07-21 13:05 GMT-06:00 Alessandro Fanfarillo :
>>> Dear Mikael and all,
>>>
>>> in attachment the new patch, built and regtested on x86_64-pc-linux-gnu.
>>>
>>> Cheers,
>>> Alessandro
>>>
>>> 2016-07-20 13:17 GMT-06:00 Mikael Morin :
 Le 20/07/2016 à 11:39, Andre Vehreschild a écrit :
>
> Hi Mikael,
>
>
>>> +  if(st == ST_FAIL_IMAGE)
>>> +new_st.op = EXEC_FAIL_IMAGE;
>>> +  else
>>> +gcc_unreachable();
>>
>> You can use
>> gcc_assert (st == ST_FAIL_IMAGE);
>> foo...;
>> instead of
>> if (st == ST_FAIL_IMAGE)
>> foo...;
>> else
>> gcc_unreachable ();
>
>
> Be careful, this is not 100% identical in the general case. For older
> gcc version (gcc < 4008) gcc_assert() is mapped to nothing, esp. not to
> an abort(), so the behavior can change. But in this case everything is
> fine, because the patch is most likely not backported.
>
 Didn't know about this. The difference seems to be very subtle.
 I don't mind much anyway. The original version can stay if preferred, this
 was just a suggestion.

 By the way, if the function is inlined in its single caller, the assert or
 unreachable statement can be removed, which avoids choosing between them.
 That's another suggestion.


>>> +
>>> +  return MATCH_YES;
>>> +
>>> + syntax:
>>> +  gfc_syntax_error (st);
>>> +
>>> +  return MATCH_ERROR;
>>> +}
>>> +
>>> +match
>>> +gfc_match_fail_image (void)
>>> +{
>>> +  /* if (!gfc_notify_std (GFC_STD_F2008_TS, "FAIL IMAGE statement
>>> at %C")) */
>>> +  /*   return MATCH_ERROR; */
>>> +
>>
>> Can this be uncommented?
>>
>>> +  return fail_image_statement (ST_FAIL_IMAGE);
>>> +}
>>>
>>>  /* Match LOCK/UNLOCK statement. Syntax:
>>>   LOCK ( lock-variable [ , lock-stat-list ] )
>>> diff --git a/gcc/fortran/trans-intrinsic.c
>>> b/gcc/fortran/trans-intrinsic.c index 1aaf4e2..b2f5596 100644
>>> --- a/gcc/fortran/trans-intrinsic.c
>>> +++ b/gcc/fortran/trans-intrinsic.c
>>> @@ -1647,6 +1647,24 @@ trans_this_image (gfc_se * se, gfc_expr
>>> *expr) m, lbound));
>>>  }
>>>
>>> +static void
>>> +gfc_conv_intrinsic_image_status (gfc_se *se, gfc_expr *expr)
>>> +{
>>> +  unsigned int num_args;
>>> +  tree *args,tmp;
>>> +
>>> +  num_args = gfc_intrinsic_argument_list_length (expr);
>>> +  args = XALLOCAVEC (tree, num_args);
>>> +
>>> +  gfc_conv_intrinsic_function_args (se, expr, args, num_args);
>>> +
>>> +  if (flag_coarray == GFC_FCOARRAY_LIB)
>>> +{
>>
>> Can everything be put under the if?
>> Does it work with -fcoarray=single?
>
>
> IMO coarray=single should not generate code here, therefore putting
> everything under the if should to fine.
>
 My point was more avoiding generating code for the arguments if they are 
 not
 used in the end.
 Regarding the -fcoarray=single case, the function returns a result, which
 can be used in an expression, so I don't think it will work without at 
 least
 hardcoding a fixed value as result in that case.
 But even that wouldn't be enough, as the function wouldn't work 
 consistently
 with the fail image statement.

> Sorry for the comments ...
>
 Comments are welcome here, as far as I know. ;-)

 Mikael
>
>
>
> --
> The difference between genius and stupidity is; genius has its limits.
>
> Albert Einstein
commit 13213642603b4941a2e4ea085b0bfd5cb37f
Author: Alessandro Fanfarillo 
Date:   Wed Sep 7 13:00:17 2016 -0600

Second Review of failed image patch

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index ff5e80b..110bec0 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -1217,6 +1217,82 @@ gfc_check_event_query (gfc_expr *event, gfc_expr *count, 
gfc_expr *stat)
   return true;
 }
 
+bool
+gfc_check_image_status (gfc_expr *image, gfc_expr 

Re: [Patch RFC] Modify excess precision logic to permit FLT_EVAL_METHOD=16

2016-09-07 Thread Joseph Myers
On Wed, 7 Sep 2016, Joseph Myers wrote:

> How about instead having more than one target macro / hook.  One would 
> indicate that excess precision is used by insn patterns (and be set only 
> for i386 and m68k).  Another would indicate the API-level excess precision 

Or, maybe there would be a single hook taking a tristate argument.

Target hooks need to provide the following information:

(a) What excess precision is implicit in the insn patterns (that is, when 
a value described in middle-end IR as having a particular mode/type may in 
fact implicitly have extra range and precision because insn patterns 
produce results with such extra range and precision, not just with the 
range and precision of the output mode).

(b) What excess precision should be added explicitly to the IR by the 
front end in "fast" mode.

(c) What excess precision should be added explicitly to the IR by the 
front end in "standard" mode.

All of these may be represented by FLT_EVAL_METHOD values.  In what 
follows, "none" means no excess precision (whether the value is 0 or 16); 
"unpredictable" means inherently unpredictable even in the absence of 
register spills (like -mfpmath=sse+387), so FLT_EVAL_METHOD == -1.  In 
principle there could be cases of predictable excess precision that have 
to map to -1 because there is no other value they could map to, but in 
practice I don't expect that to be an issue (given the TS 18661-3 values 
of FLT_EVAL_METHOD being available).

(a) should always be "none" except for the existing x86 and m68k cases.

(b) is "none" in all existing cases, but we have the issue of ARM cases 
without direct binary16 arithmetic where it would be desirable for it to 
apply excess precision to _Float16 values.

(c) is not "none" at present in exactly those cases where 
TARGET_FLT_EVAL_METHOD is nonzero (but in the case of s390 this is really 
a target bug that should be fixed).  It might also be not "none" in future 
for ARM cases like in (b).

(a) can be "unpredictable".  (b) and (c) never can.
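
To make that concrete, the tristate could be spelled roughly as follows
(names here are placeholders for illustration, not a proposed interface):

  /* Placeholder sketch of the three questions above, as a single argument
     a combined excess-precision hook could take.  */
  enum excess_precision_kind
  {
    EP_IMPLICIT_IN_PATTERNS,  /* (a): excess precision implicit in insns.  */
    EP_EXPLICIT_FAST,         /* (b): what the front end adds for "fast".  */
    EP_EXPLICIT_STANDARD      /* (c): what the front end adds for "standard".  */
  };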

Rather than init_excess_precision setting flag_excess_precision, possibly 
turning "standard" into "fast", I think it should set some variable that 
describes the result of whichever of (b) and (c) is applicable - and in 
the cases where "standard" turns into "fast", it would simply happen that 
both (b) and (c) produce the same result.

The effective excess precision seen by the user is the result of applying 
first one of (b) and (c), then (a).  If (a) is not "none", this is not 
entirely predictable.  It's a broken compiler configuration if applying 
(c) yields a type on which (a) is not a no-op, except in the case where 
(a) is "unpredictable" and (c) is "none".  I'm not aware of any likely 
cases where a type would actually get promoted twice by applying those two 
operations.

Right now TARGET_FLT_EVAL_METHOD_NON_DEFAULT is used to give errors for 
-fexcess-precision=standard for languages not supporting it.  With a 
conversion to hooks, that needs to be rethought.  The point is to give an 
error if predictability was requested but cannot be achieved, so I suppose 
ideally the error should be about (a) being not "none", together with 
-fexcess-precision=standard being used.  But if the relevant back-end 
options aren't available at this point to use the hook for (a), the error 
could just be given for all targets (for those languages when that option 
is given).

Effective predictability, for __GCC_IEC_559 in flag_iso mode, means that 
(a) does nothing to any type resulting from whichever of (b) and (c) is in 
effect.

The way __LIBGCC_*_EXCESS_PRECISION__ is used is about eliminating excess 
precision from results assigned to variables - meaning it should be about 
(a) only.

That leaves the question of setting FLT_EVAL_METHOD.  It should relate to 
the effective excess precision seen by the user, the combination of 
whichever of (b) and (c) is in effect with (a).  The only problem is the 
case where that combination is most precisely described by "16", which as 
discussed is not a C11 value and may affect existing code not expecting 
such a value.  The value -1 is compatible with C11 and TS 18661-3 but 
suboptimal, while the value 0 is compatible with C11 only, not with TS 
18661-3 even when no feature test macros are defined.

We already have the option -fno-fp-int-builtin-inexact to ensure certain 
built-in functions follow TS 18661-1 semantics.  It might be reasonable to 
have a new option to enable FLT_EVAL_METHOD using new values.  However, 
I'd be inclined to think that such an option should be on by default for 
-std=gnu*, only off for strict conformance modes.  (There would be both 
__FLT_EVAL_METHOD__ and __FLT_EVAL_METHOD_C99__, say, predefined macros, 
so that <float.h> could also always use the new value if 
__STDC_WANT_IEC_60559_TYPES_EXT__ is defined.)
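
A <float.h> fragment along those lines might look like the following
(purely illustrative, assuming the two predefined macros suggested above):

  #if defined __STDC_WANT_IEC_60559_TYPES_EXT__
  # define FLT_EVAL_METHOD __FLT_EVAL_METHOD__
  #else
  # define FLT_EVAL_METHOD __FLT_EVAL_METHOD_C99__
  #endif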

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Make max_align_t respect _Float128 [version 2]

2016-09-07 Thread Bernd Edlinger
On 09/07/16 22:04, Joseph Myers wrote:
> On Wed, 7 Sep 2016, Bernd Edlinger wrote:
>
>> interesting.  I just tried the test case from PR 77330 with _Decimal128.
>> result: _Decimal128 did *not* trap with gcc4.8.4, but it does trap with
>> gcc-7.0.0.
>
> I checked with GCC 4.3; __alignof__ (_Decimal128) was 16 back then.
> Whether particular code happens to make use of that alignment requirement
> is inherently unpredictable.
>

Oh, now I see...

Alignof(_Decimal128) was 16, but gcc4.8 did not enable -msse, and
that must have changed.  When I use gcc -m32 -msse the test case
starts to fail with gcc-4.8.4.

With gcc-7.0.0 -m32 -mno-sse fixes the test case but the alignment is
still 16, as you already said.

Apparently the different -msse default setting made the situation worse.
I think that will not run on a pentium4 any more.


Bernd.


Re: [PATCH][v3] GIMPLE store merging pass

2016-09-07 Thread Jakub Jelinek
On Wed, Sep 07, 2016 at 10:19:11AM +0200, Richard Biener wrote:
> > If you want a 64-bit store, you'd need to merge the two, and that would be
> > even more expensive.  It is a matter of say:
> > movl $0x12345678, (%rsp)
> > movl $0x09abcdef, 4(%rsp)
> > vs.
> > movabsq $0x09abcdef12345678, %rax
> > movq %rax, (%rsp)
> > vs.
> > movl $0x09abcdef, %eax
> > salq $32, %rax
> > orq $0x12345678, %rax
> > movq %rax, (%rsp)
> 
> vs.
> 
> movq $LC0, (%rsp)

You don't want to store the address, so you'd use
movq .LC0, %rax
movq %rax, (%rsp)

> I think the important part to notice is that it should be straight forward
> for a target / the expander to split a large store from an immediate
> into any of the above but very hard to do the opposite.  Thus from a
> GIMPLE side "canonicalizing" to large stores (that are eventually
> supported and well-aligned) seems best to me.

I bet many programs assume that say 64-bit aligned store in the source is
atomic in 64-bit apps, without using __atomic_store (..., __ATOMIC_RELAXED);
So such a change would break that.
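
A simplified example of the kind of code I mean (illustrative only):

  /* Illustrative only: a plain aligned 64-bit store that the program
     assumes is performed as one store, without using __atomic_store.  */
  struct shared { long long payload; long long ready; };

  void
  publish (struct shared *s, long long v)
  {
    s->payload = v;
    s->ready = 1;  /* assumed not to be split or merged with neighbours */
  }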

Jakub


Re: [PATCH][v3] GIMPLE store merging pass

2016-09-07 Thread Bernhard Reutner-Fischer
On September 6, 2016 5:14:47 PM GMT+02:00, Kyrill Tkachov 
 wrote:
>Hi all,

s/contigous/contiguous/
s/ where where/ where/

+struct merged_store_group
+{
+  HOST_WIDE_INT start;
+  HOST_WIDE_INT width;
+  unsigned char *val;
+  unsigned int align;
+  auto_vec stores;
+  /* We record the first and last original statements in the sequence because
+ because we'll need their vuse/vdef and replacement position.  */
+  gimple *last_stmt;

s/ because because/ because/

Why aren't these two HWIs unsigned, likewise in store_immediate_info and in 
most other spots in the patch?

+ fprintf (dump_file, "Afer writing ");
s/Afer /After/

/access if prohibitively slow/s/ if /is /

I'd get rid of successful_p in imm_store_chain_info::output_merged_stores.


+unsigned int
+pass_store_merging::execute (function *fun)
+{
+  basic_block bb;
+  hash_set orig_stmts;
+
+  FOR_EACH_BB_FN (bb, fun)
+{
+  gimple_stmt_iterator gsi;
+  HOST_WIDE_INT num_statements = 0;
+  /* Record the original statements so that we can keep track of
+statements emitted in this pass and not re-process new
+statements.  */
+  for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next ())
+   {
+ gimple_set_visited (gsi_stmt (gsi), false);
+ num_statements++;
+   }
+
+  if (num_statements < 2)
+   continue;

What about debug statements? ISTM you should skip those.
(Isn't visited reset before entry of a pass?)

Maybe I missed the bikeshedding about the name but I'd have used -fmerge-stores 
instead.

Thanks,
>
>The v3 of this patch addresses feedback I received on the version
>posted at [1].
>The merged store buffer is now represented as a char array that we
>splat values onto with
>native_encode_expr and native_interpret_expr. This allows us to merge
>anything that native_encode_expr
>accepts, including floating point values and short vectors. So this
>version extends the functionality
>of the previous one in that it handles floating point values as well.
>
>The first phase of the algorithm that detects the contiguous stores is
>also slightly refactored according
>to feedback to read more fluently.
>
>Richi, I experimented with merging up to MOVE_MAX bytes rather than
>word size but I got worse results on aarch64.
>MOVE_MAX there is 16 (because it has load/store register pair
>instructions) but the 128-bit immediates that we ended
>synthesising were too complex. Perhaps the TImode immediate store RTL
>expansions could be improved, but for now
>I've left the maximum merge size to be BITS_PER_WORD.
>
>I've disabled the pass for PDP-endian targets as the merging code
>proved to be quite fiddly to get right for different
>endiannesses and I didn't feel comfortable writing logic for
>BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN targets without serious
>testing capabilities. I hope that's ok (I note the bswap pass also
>doesn't try to do anything on such targets).
>
>Tested on arm, aarch64, x86_64 and on big-endian arm and aarch64.
>
>How does this version look?
>Thanks,
>Kyrill
>
>[1] https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01512.html
>
>2016-09-06  Kyrylo Tkachov  
>
> PR middle-end/22141
> * Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
> * common.opt (fstore-merging): New Optimization option.
> * opts.c (default_options_table): Add entry for
> OPT_ftree_store_merging.
> * params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
> * passes.def: Insert pass_tree_store_merging.
> * tree-pass.h (make_pass_store_merging): Declare extern
> prototype.
> * gimple-ssa-store-merging.c: New file.
> * doc/invoke.texi (Optimization Options): Document
> -fstore-merging.
>
>2016-09-06  Kyrylo Tkachov  
> Jakub Jelinek  
>
> PR middle-end/22141
> * gcc.c-torture/execute/pr22141-1.c: New test.
> * gcc.c-torture/execute/pr22141-2.c: Likewise.
> * gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging.
> * gcc.target/aarch64/ldp_stp_4.c: Likewise.
> * gcc.dg/store_merging_1.c: New test.
> * gcc.dg/store_merging_2.c: Likewise.
> * gcc.dg/store_merging_3.c: Likewise.
> * gcc.dg/store_merging_4.c: Likewise.
> * gcc.dg/store_merging_5.c: Likewise.
> * gcc.dg/store_merging_6.c: Likewise.
> * gcc.target/i386/pr22141.c: Likewise.
> * gcc.target/i386/pr34012.c: Add -fno-store-merging to dg-options.




Re: [PATCH] -fsanitize=thread fixes (PR sanitizer/68260)

2016-09-07 Thread Jakub Jelinek
On Wed, Sep 07, 2016 at 09:07:45AM +0200, Richard Biener wrote:
> > @@ -493,6 +504,8 @@ instrument_builtin_call (gimple_stmt_ite
> > if (!tree_fits_uhwi_p (last_arg)
> > || memmodel_base (tree_to_uhwi (last_arg)) >= MEMMODEL_LAST)
> >   return;
> > +   if (lookup_stmt_eh_lp (stmt))
> > + remove_stmt_from_eh_lp (stmt);
> 
> These changes look bogus to me -- how can the tsan instrumentation
> function _not_ throw when the original function we replace it can?

The __tsan*atomic* functions are right now declared to be nothrow, so the
patch just matches how they are declared.
While the sync-builtins.def use
#define ATTR_NOTHROWCALL_LEAF_LIST (flag_non_call_exceptions ? \
ATTR_LEAF_LIST : ATTR_NOTHROW_LEAF_LIST)
I guess I could use the same for the tsan atomics, but wonder if it will work
properly when the libtsan is built with exceptions disabled and without
-fnon-call-exceptions.  Perhaps it would, at least if it is built with
-fasynchronous-unwind-tables (which is the case for x86_64 and aarch64 and
tsan isn't supported elsewhere).
> 
> It seems you are just avoiding the ICEs for now "wrong-code".  (and
> how does this relate to -fnon-call-exceptions as both are calls?)
> 
> The instrument_expr case seems to leave the original stmt in-place
> (and thus the EH).

Those are different.  For loads and stores there is just added call that
logs in the load or store, but for atomics the atomics are done inside of
the library and the library tracks everything.

Jakub


Re: [PATCH] Fix location of command line backend reported issues (PR middle-end/77475)

2016-09-07 Thread Christophe Lyon
On 6 September 2016 at 12:41, Christophe Lyon
 wrote:
> On 6 September 2016 at 12:12, Jakub Jelinek  wrote:
>> On Tue, Sep 06, 2016 at 12:07:47PM +0200, Christophe Lyon wrote:
>>> On 5 September 2016 at 19:20, Jakub Jelinek  wrote:
>>> > Hi!
>>> >
>>> > While it would be perhaps nice to pass explicit location_t in the target
>>> > option handling code, there are hundreds of error/warning/sorry calls
>>> > in lots of backends, and lots of those routines are used not just
>>> > for the process_options time (i.e. command line options), but also for
>>> > pragma GCC target and target option handling, so at least for the time 
>>> > being
>>> > I think it is easier to just use UNKNOWN_LOCATION for the command line
>>> > option diagnostics.
>>> >
>>> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>> >
>>> > 2016-09-05  Jakub Jelinek  
>>> >
>>> > PR middle-end/77475
>>> > * toplev.c (process_options): Temporarily set input_location
>>> > to UNKNOWN_LOCATION around targetm.target_option.override () call.
>>> >
>>>
>>> This patch caused regressions on aarch64. The following tests now fail:
>>>   gcc.target/aarch64/arch-diagnostics-1.c  (test for errors, line 1)
>>>   gcc.target/aarch64/arch-diagnostics-1.c (test for excess errors)
>>>   gcc.target/aarch64/arch-diagnostics-2.c  (test for errors, line 1)
>>>   gcc.target/aarch64/arch-diagnostics-2.c (test for excess errors)
>>>   gcc.target/aarch64/cpu-diagnostics-1.c  (test for errors, line 1)
>>>   gcc.target/aarch64/cpu-diagnostics-1.c (test for excess errors)
>>>   gcc.target/aarch64/cpu-diagnostics-2.c  (test for errors, line 1)
>>>   gcc.target/aarch64/cpu-diagnostics-2.c (test for excess errors)
>>>   gcc.target/aarch64/cpu-diagnostics-3.c  (test for errors, line 1)
>>>   gcc.target/aarch64/cpu-diagnostics-3.c (test for excess errors)
>>>   gcc.target/aarch64/cpu-diagnostics-4.c  (test for errors, line 1)
>>>   gcc.target/aarch64/cpu-diagnostics-4.c (test for excess errors)
>>
>> Those tests need adjustments, not to expect such errors on line 1, but on
>> line 0.
>>
>> I think following untested patch should fix that:
>>
> Indeed it does, thanks!
>

I took the liberty of committing it (r240030) on your behalf with this
ChangeLog:
2016-09-07  Jakub Jelinek  

PR middle-end/77475
* gcc.target/aarch64/arch-diagnostics-1.c: Expect error on line 0.
* gcc.target/aarch64/arch-diagnostics-2.c: Likewise.
* gcc.target/aarch64/cpu-diagnostics-1.c: Likewise.
* gcc.target/aarch64/cpu-diagnostics-2.c: Likewise.
* gcc.target/aarch64/cpu-diagnostics-3.c: Likewise.
* gcc.target/aarch64/cpu-diagnostics-4.c: Likewise.

Thanks,

Christophe

> Christophe
>
>> --- gcc/testsuite/gcc.target/aarch64/arch-diagnostics-1.c.jj2012-10-23 
>> 19:54:58.0 +0200
>> +++ gcc/testsuite/gcc.target/aarch64/arch-diagnostics-1.c   2016-09-06 
>> 12:10:32.241560531 +0200
>> @@ -1,4 +1,4 @@
>> -/* { dg-error "unknown" "" {target "aarch64*-*-*" } } */
>> +/* { dg-error "unknown" "" {target "aarch64*-*-*" } 0 } */
>>  /* { dg-options "-O2 -march=dummy" } */
>>
>>  void f ()
>> --- gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-2.c.jj 2016-05-18 
>> 10:59:49.0 +0200
>> +++ gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-2.c2016-09-06 
>> 12:11:08.170110003 +0200
>> @@ -1,4 +1,4 @@
>> -/* { dg-error "missing" "" {target "aarch64*-*-*" } } */
>> +/* { dg-error "missing" "" {target "aarch64*-*-*" } 0 } */
>>  /* { dg-skip-if "do not override -mcpu" { *-*-* } { "-mcpu=*" } { "" } } */
>>  /* { dg-options "-O2 -mcpu=cortex-a53+no" } */
>>
>> --- gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-1.c.jj 2016-05-18 
>> 10:59:49.0 +0200
>> +++ gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-1.c2016-09-06 
>> 12:11:01.698191158 +0200
>> @@ -1,4 +1,4 @@
>> -/* { dg-error "unknown" "" {target "aarch64*-*-*" } } */
>> +/* { dg-error "unknown" "" {target "aarch64*-*-*" } 0 } */
>>  /* { dg-skip-if "do not override -mcpu" { *-*-* } { "-mcpu=*" } { "" } } */
>>  /* { dg-options "-O2 -mcpu=dummy" } */
>>
>> --- gcc/testsuite/gcc.target/aarch64/arch-diagnostics-2.c.jj2012-10-23 
>> 19:54:58.0 +0200
>> +++ gcc/testsuite/gcc.target/aarch64/arch-diagnostics-2.c   2016-09-06 
>> 12:10:39.737466536 +0200
>> @@ -1,4 +1,4 @@
>> -/* { dg-error "missing" "" {target "aarch64*-*-*" } } */
>> +/* { dg-error "missing" "" {target "aarch64*-*-*" } 0 } */
>>  /* { dg-options "-O2 -march=+dummy" } */
>>
>>  void f ()
>> --- gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-3.c.jj 2016-05-18 
>> 10:59:49.0 +0200
>> +++ gcc/testsuite/gcc.target/aarch64/cpu-diagnostics-3.c2016-09-06 
>> 12:11:13.628041564 +0200
>> @@ -1,4 +1,4 @@
>> -/* { dg-error "invalid feature" "" {target "aarch64*-*-*" } } */
>> +/* { dg-error "invalid feature" "" {target 

Re: Ping**2! Re: [PATCH, Fortran] Extension: AUTOMATIC/STATIC symbol attributes with -fdec-static

2016-09-07 Thread Fritz Reese
On Wed, Sep 7, 2016 at 9:31 AM, Andre Vehreschild  wrote:
> Hi Fritz,
>
> please note: I do not have official review privileges. So my vote here
> is rather advice to you and the official reviewers. Often such an
> unofficial review helps to speed things up, because the official ones
> are pointed to the nits and nacks and don't have to bother with the
> minor things.

Andre,

Thank you very much for your comments. I have only been contributing
to GNU/GCC for a year or two and appreciate the advice. I definitely
strive to keep my patches in line with the relevant standards and
style guides. Attached is a replacement for the original patch which
addresses your concerns.


> - Do I understand this correctly: AUTOMATIC and STATIC have to come last,
>   i.e., right before the :: when declaring, e.g., a variable?
Not quite, you are probably misled by this code in decl.c:

> match
> gfc_match_static (void)
> {
...
> gfc_match (" ::");

(And equivalent for gfc_match_automatic.) These are similar to
gfc_match_save, and like gfc_match_save are only called to match
variable attribute specification _statements_, such as:
> SAVE :: x
> AUTOMATIC :: y

STATIC and AUTOMATIC are matched in any order in an attribute
specification _list_, as with SAVE, through the giant switch() earlier
in decl.c/match_attr_spec(). This applies to the following:
> INTEGER, SAVE, DIMENSION(3) :: x
> INTEGER, AUTOMATIC, DIMENSION(3) :: y


> - Running:
>
>   $ contrib/check_GNU_style.sh dec_static.patch
>
>   Reports some style issues in the C code, that should be fixed before
>   commit. (Style in Fortran testcases does not matter that much.)
I was not aware of this script - thanks!


> Please change formatting in a separate patch or not at all (here!).
> This policy is to distinguish cosmetic changes from relevant ones.
Fixed. These changes are usually accidental - I try not to reformat
code that I'm not otherwise touching.


>> diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
...
>> +With @option{-fdec-static} GNU Fortran supports the explicit specification 
>> of
>> +two addition variable attributes: @code{STATIC} and @code{AUTOMATIC}. These
...
> But is it only for variables? Can't it be used for equivalences or
> other constructs, too?
Yes, good point, perhaps 'entities' is a better term here:

+With @option{-fdec-static} GNU Fortran supports the DEC extended attributes
+@code{STATIC} and @code{AUTOMATIC} to provide explicit specification of entity
+storage. [...]

>> diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
...
>> +@item -fdec-static
>> +@opindex @code{fdec-static}
>> +Enable STATIC and AUTOMATIC as attributes specifying storage location.
>> +STATIC is equivalent to SAVE, and locals are typically AUTOMATIC by default.
>
> Well, this description to me sounds like: "Those attributes are
> useless, because they can be substituted." This is clearly not what you
> intend. I propose to include into the description that with "this
> switch the dec-extension" is available "to explicitly specify the
> storage of entities". Then the last sentence is still a good hint for
> all fortraneers that don't know the extension.

I guess I subconsciously made them sound "useless" because I hoped
users would think twice about using the extensions and use standard
conforming constructs instead. :-)  But, maybe you are right and this
would be clearer:

+@item -fdec-static
+@opindex @code{fdec-static}
+Enable DEC-style STATIC and AUTOMATIC attributes to explicitly specify
+the storage of variables and other objects.


>> diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
...
>> +fdec-static
>> +Fortran Var(flag_dec_static)
>> +Enable STATIC and AUTOMATIC attributes.
>
> How about: Enable the dec-extension of STATIC and AUTOMATIC attributes.
> Just a proposal.
How about this, to match invoke.texi:

+fdec-static
+Fortran Var(flag_dec_static)
+Enable DEC-style STATIC and AUTOMATIC attributes.


> -  Please add some testcases where the new error messages are tested.
Yes, good idea! cf. attached for dec_static_3.f90 and
dec_static_4.f90. These tests gave me a chance to realize I should
emit some better error messages so I made a minor change there:

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index db431dd..be8e9f7 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -7838,6 +7838,8 @@ gfc_match_automatic (void)
   switch (m)
   {
   case MATCH_NO:
+break;
+
   case MATCH_ERROR:
 return MATCH_ERROR;

@@ -7856,7 +7858,7 @@ gfc_match_automatic (void)

   if (!seen_symbol)
 {
-  gfc_error ("Expected var-list in AUTOMATIC statement at %C");
+  gfc_error ("Expected entity-list in AUTOMATIC statement at %C");
   return MATCH_ERROR;
 }

... And similar for gfc_match_static. (Nb. "entity-list" was chosen to
correspond with the "save-entity-list" descriptor used by F90 to
specify the SAVE statement.)

Andre, thanks again for your comments. I 

[ARM] PR 67591 ARM v8 Thumb IT blocks are deprecated

2016-09-07 Thread Christophe Lyon
Hi,

The attached patch is a first part to solve PR 67591: it removes
several occurrences of "IT blocks containing 32-bit Thumb
instructions are deprecated in ARMv8" messages in the
gcc/g++/libstdc++/fortran testsuites.

It does not remove them all yet. This patch only modifies the
*cmp_and, *cmp_ior, *ior_scc_scc, *ior_scc_scc_cmp,
*and_scc_scc and *and_scc_scc_cmp patterns.
Additional work is required in sub_shiftsi etc, at least.
I've started looking at these, but I decided I could already
post this self-contained patch to check if this implementation
is OK.

Regarding *cmp_and and *cmp_ior patterns, the addition of the
enabled_for_depr_it attribute is aggressive in the sense that it keeps
only the alternatives with 'l' and 'Py' constraints, while in some
cases the constraints could be relaxed. Indeed, these 2 patterns can
swap their input comparisons, meaning that any of them can be emitted
in the IT-block, and is thus subject to the ARMv8 deprecation.
The generated code is possibly suboptimal in the cases where the
operands are not swapped, since 'r' could be used.

Cross-tested on arm-none-linux-gnueabihf with -mthumb/-march=armv8-a
and --with-cpu=cortex-a57 --with-mode=thumb, showing only improvements:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/239850-depr-it-4/report-build-info.html

Bootstrapped OK on armv8l HW.

Is this OK?

Thanks,

Christophe
2016-09-05  Christophe Lyon  

PR target/67591
* config/arm/arm.md (*cmp_and): Add enabled_for_depr_it attribute.
(*cmp_ior): Likewise.
(*ior_scc_scc): Add alternative for enabled_for_depr_it attribute.
(*ior_scc_scc_cmp): Likewise.
(*and_scc_scc): Likewise.
(*and_scc_scc_cmp): Likewise.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 318db75..0374bdd 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9340,6 +9340,7 @@
   [(set_attr "conds" "set")
(set_attr "predicable" "no")
(set_attr "arch" "t2,t2,t2,t2,t2,any,any,any,any")
+   (set_attr "enabled_for_depr_it" "yes,no,no,no,no,no,no,no,no")
(set_attr_alternative "length"
   [(const_int 6)
(const_int 8)
@@ -9422,6 +9423,7 @@
   "
   [(set_attr "conds" "set")
(set_attr "arch" "t2,t2,t2,t2,t2,any,any,any,any")
+   (set_attr "enabled_for_depr_it" "yes,no,no,no,no,no,no,no,no")
(set_attr_alternative "length"
   [(const_int 6)
(const_int 8)
@@ -9444,13 +9446,13 @@
 )
 
 (define_insn_and_split "*ior_scc_scc"
-  [(set (match_operand:SI 0 "s_register_operand" "=Ts")
+  [(set (match_operand:SI 0 "s_register_operand" "=Ts,Ts")
(ior:SI (match_operator:SI 3 "arm_comparison_operator"
-[(match_operand:SI 1 "s_register_operand" "r")
- (match_operand:SI 2 "arm_add_operand" "rIL")])
+[(match_operand:SI 1 "s_register_operand" "r,l")
+ (match_operand:SI 2 "arm_add_operand" "rIL,lPy")])
(match_operator:SI 6 "arm_comparison_operator"
-[(match_operand:SI 4 "s_register_operand" "r")
- (match_operand:SI 5 "arm_add_operand" "rIL")])))
+[(match_operand:SI 4 "s_register_operand" "r,l")
+ (match_operand:SI 5 "arm_add_operand" "rIL,lPy")])))
(clobber (reg:CC CC_REGNUM))]
   "TARGET_32BIT
&& (arm_select_dominance_cc_mode (operands[3], operands[6], DOM_CC_X_OR_Y)
@@ -9469,6 +9471,7 @@
  DOM_CC_X_OR_Y),
CC_REGNUM);"
   [(set_attr "conds" "clob")
+   (set_attr "enabled_for_depr_it" "no,yes")
(set_attr "length" "16")
(set_attr "type" "multiple")]
 )
@@ -9478,13 +9481,13 @@
 (define_insn_and_split "*ior_scc_scc_cmp"
   [(set (match_operand 0 "dominant_cc_register" "")
(compare (ior:SI (match_operator:SI 3 "arm_comparison_operator"
- [(match_operand:SI 1 "s_register_operand" "r")
-  (match_operand:SI 2 "arm_add_operand" "rIL")])
+ [(match_operand:SI 1 "s_register_operand" "r,l")
+  (match_operand:SI 2 "arm_add_operand" "rIL,lPy")])
 (match_operator:SI 6 "arm_comparison_operator"
- [(match_operand:SI 4 "s_register_operand" "r")
-  (match_operand:SI 5 "arm_add_operand" "rIL")]))
+ [(match_operand:SI 4 "s_register_operand" "r,l")
+  (match_operand:SI 5 "arm_add_operand" "rIL,lPy")]))
 (const_int 0)))
-   (set (match_operand:SI 7 "s_register_operand" "=Ts")
+   (set (match_operand:SI 7 "s_register_operand" "=Ts,Ts")
(ior:SI (match_op_dup 3 [(match_dup 1) (match_dup 2)])
(match_op_dup 6 [(match_dup 4) (match_dup 5)])))]
   "TARGET_32BIT"
@@ -9499,18 +9502,19 @@
(set (match_dup 7) (ne:SI (match_dup 0) (const_int 0)))]
   ""
   [(set_attr "conds" "set")
+   

Re: Make max_align_t respect _Float128 [version 2]

2016-09-07 Thread Joseph Myers
On Wed, 7 Sep 2016, Bernd Edlinger wrote:

> interesting.  I just tried the test case from PR 77330 with _Decimal128.
> result: _Decimal128 did *not* trap with gcc4.8.4, but it does trap with
> gcc-7.0.0.

I checked with GCC 4.3; __alignof__ (_Decimal128) was 16 back then.  
Whether particular code happens to make use of that alignment requirement 
is inherently unpredictable.

> So it looks to me, the ABI has changed incompatible recently, and I

No, it's been the same since _Decimal128 was added in GCC 4.3 (__float128 
was first supported for 32-bit x86 in 4.4, and also had alignment 16 back 
then).

> I think that we need more flexibility in that area than we have
> right now, because config/i386/i386.h is doing the ABI,
> but it is shared by linux/glibc, windows, vxworks, darwin,
> solaris, to name a few.  I can't believe that we can just
> change that ABI for the whole world at the same time.

If the system compilers don't support these types, effectively we can.  
If the system compilers do support these types (and don't follow HJ's ABI 
document which says they are 16-byte-aligned), it's up to the target OS 
maintainers to make sure we stay compatible with them.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][AArch64] Improve legitimize_address

2016-09-07 Thread Christophe Lyon
Hi Wilco,

On 7 September 2016 at 14:43, Richard Earnshaw (lists)
 wrote:
> On 06/09/16 14:14, Wilco Dijkstra wrote:
>> Improve aarch64_legitimize_address - avoid splitting the offset if it is
>> supported.  When we do split, take the mode size into account.  BLKmode
>> falls into the unaligned case but should be treated like LDP/STP.
>> This improves codesize slightly due to fewer base address calculations:
>>
>> int f(int *p) { return p[5000] + p[7000]; }
>>
>> Now generates:
>>
>> f:
>>   add x0, x0, 16384
>>   ldr w1, [x0, 3616]
>>   ldr w0, [x0, 11616]
>>   add w0, w1, w0
>>   ret
>>
>> instead of:
>>
>> f:
>>   add x1, x0, 16384
>>   add x0, x0, 24576
>>   ldr w1, [x1, 3616]
>>   ldr w0, [x0, 3424]
>>   add w0, w1, w0
>>   ret
>>
>> OK for trunk?
>>
>> ChangeLog:
>> 2016-09-06  Wilco Dijkstra  
>>
>> gcc/
>>   * config/aarch64/aarch64.c (aarch64_legitimize_address):
>>   Avoid use of base_offset if offset already in range.
>
> OK.
>
> R.

After this patch, I've noticed a regression:
FAIL: gcc.target/aarch64/ldp_vec_64_1.c scan-assembler ldp\td[0-9]+, d[0-9]
You probably need to adjust the scan pattern.

Thanks,

Christophe


>
>> --
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 
>> 27bbdbad8cddc576f9ed4fd0670116bd6d318412..119ff0aecb0c9f88899fa141b2c7f9158281f9c3
>>  100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -5058,9 +5058,19 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, 
>> machine_mode mode)
>>/* For offsets aren't a multiple of the access size, the limit is
>>-256...255.  */
>>else if (offset & (GET_MODE_SIZE (mode) - 1))
>> - base_offset = (offset + 0x100) & ~0x1ff;
>> + {
>> +   base_offset = (offset + 0x100) & ~0x1ff;
>> +
>> +   /* BLKmode typically uses LDP of X-registers.  */
>> +   if (mode == BLKmode)
>> + base_offset = (offset + 512) & ~0x3ff;
>> + }
>> +  /* Small negative offsets are supported.  */
>> +  else if (IN_RANGE (offset, -256, 0))
>> + base_offset = 0;
>> +  /* Use 12-bit offset by access size.  */
>>else
>> - base_offset = offset & ~0xfff;
>> + base_offset = offset & (~0xfff * GET_MODE_SIZE (mode));
>>
>>if (base_offset != 0)
>>   {
>>
>


Re: Make max_align_t respect _Float128 [version 2]

2016-09-07 Thread Bernd Edlinger
On Wed, 7 Sep 2016, Joseph Myers wrote:
 > On Wed, 7 Sep 2016, Florian Weimer wrote:
 >
 > > The existence of such a cut-off constant appears useful, but it's 
not clear if
 > > it should be tied to the fundamental alignment (especially, as this 
discussion
 > > shows, the fundamental alignment will be somewhat in flux).
 >
 > I don't think it's in flux.  I think it's very rare for a new basic type
 > to be added that has increased alignment requirements compared to the
 > existing basic types.  _Decimal128 was added in GCC 4.3, which increased
 > the requirement on 32-bit x86 (only) (32-bit x86 malloc having been 
buggy
 > in that regard ever since then); __float128 / _Float128 did not increase
 > the requirement relative to that introduced with _Decimal128. 
(Obviously
 > if CPLEX part 2 defines vector types that might have larger alignment
requirements it must avoid defining them as basic types.)


Hmm...

interesting.  I just tried the test case from PR 77330 with _Decimal128.
result: _Decimal128 did *not* trap with gcc4.8.4, but it does trap with
gcc-7.0.0.

#include <stdlib.h>
#include <stdio.h>

struct s {
 _Decimal128 f1;
};

int main(void)
{
 struct s *p = malloc(sizeof(struct s));
 printf("%p\n", p);
 p->f1 = 1.234;
 return 0;
}

gcc -m32 test.c

./a.out
0x9588008
Segmentation fault (core dumped)


So it looks to me like the ABI has changed incompatibly recently, and I
believe that we need to proceed more carefully here.  While in the
future most targets will support aligned _Float128, we can
still support 4-byte aligned _Decimal128 on a per-target basis.

I think that we need more flexibility in that area than we have
right now, because config/i386/i386.h is doing the ABI,
but it is shared by linux/glibc, windows, vxworks, darwin,
solaris, to name a few.  I can't believe that we can just
change that ABI for the whole world at the same time.

See my experiment at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77330#c9
which may be a hack, but shows that it is not impossible.



Bernd.


Re: [Patch RFC] Modify excess precision logic to permit FLT_EVAL_METHOD=16

2016-09-07 Thread Joseph Myers
On Tue, 6 Sep 2016, James Greenhalgh wrote:

> In c-family/c-cppbuiltin.c I've updated cpp_iec_559_value such that it also
> allows setting __GCC_IEC_559 if FLT_EVAL_METHOD=16, and I've updated
> c_cpp_builtins to handle the new value, and use the new enum names.

I think the special cases in this patch show that the abstractions are 
wrong.

How about instead having more than one target macro / hook.  One would 
indicate that excess precision is used by insn patterns (and be set only 
for i386 and m68k).  Another would indicate the API-level excess precision 
requested by the target and might take an argument to indicate whether the 
"fast" or "standard" case is in use (in the x86 case the results would 
differ depending on the argument, in the ARM case they wouldn't).

* If the first hook returns true, excess precision is unpredictable in the 
"fast" case.  Otherwise excess precision is predictable.

* For short-circuiting excess_precision_type, maybe an internal setting 
EXCESS_PRECISION_NONE would make sense, and other settings would be turned 
into it if permitted by the results of the second hook.

(For s390, talk to the maintainers, but I think they really need to 
eliminate the bogus float_t definition in glibc and the FLT_EVAL_METHOD 
setting that goes along with it.  This is the sort of theoretical ABI 
change that should be safe in practice.)

> +   machine_mode float16_type_mode = (FLOATN_TYPE_NODE (0)
> + ? TYPE_MODE (FLOATN_TYPE_NODE (0))
> + : VOIDmode);

This is obviously wrong.  Use float16_type_node.

> +   ||mode == TYPE_MODE (float_type_node));

Missing space after ||.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Patch libgcc] Enable HCmode multiply and divide (mulhc3/divhc3)

2016-09-07 Thread Joseph Myers
On Wed, 7 Sep 2016, James Greenhalgh wrote:

> 2016-09-07  James Greenhalgh  
> 

I'd think this should say

PR target/63250

as being part of fixing that bug (although not a fix by itself).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 3/4][Ada,DJGPP] Ada support for DJGPP

2016-09-07 Thread Andris Pavenis

On 09/05/2016 09:42 AM, Eric Botcazou wrote:

Attached output is from last test build (r239639 with DJGPP related patches
applied, last version of patches for Ada).

Very strange error, line 28 of gtype-ada.h is supposed to have a guard for
nodes containing the 'common' structure.  Can you post an excerpt of the file?

I verified that the contents of gtype-ada.h from the DJGPP build are identical to the one from my Linux
build of yesterday's trunk (except for CR LF being used as the line separator in the DJGPP build).


I'll try to find the revision from which the problem appears.

Andris





Re: Make max_align_t respect _Float128 [version 2]

2016-09-07 Thread Joseph Myers
On Wed, 7 Sep 2016, Florian Weimer wrote:

> The existence of such a cut-off constant appears useful, but it's not clear if
> it should be tied to the fundamental alignment (especially, as this discussion
> shows, the fundamental alignment will be somewhat in flux).

I don't think it's in flux.  I think it's very rare for a new basic type 
to be added that has increased alignment requirements compared to the 
existing basic types.  _Decimal128 was added in GCC 4.3, which increased 
the requirement on 32-bit x86 (only) (32-bit x86 malloc having been buggy 
in that regard ever since then); __float128 / _Float128 did not increase 
the requirement relative to that introduced with _Decimal128.  (Obviously 
if CPLEX part 2 defines vector types that might have larger alignment 
requirements it must avoid defining them as basic types.)

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] [PR libcpp/71681] Fix handling header.gcc in subdirectories

2016-09-07 Thread Andris Pavenis
This patch fixes the handling of header.gcc in subdirectories when the command line option -remap is
used. The current version finds header.gcc in directories on the GCC include directory search path, but
fails to find it in subdirectories because of a missing directory separator.
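
For example (paths invented purely to illustrate the failure mode): with
dir->name = "/usr/remap" and fname = "sub/foo.h", the old code built the
remapped directory as "/usr/remapsub/" instead of "/usr/remap/sub/", so a
header.gcc placed in "/usr/remap/sub" was never consulted.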


Andris

2016-09-07 Andris Pavenis  

* files.c (remap_filename): Fix handling -remap in subdirectories


>From 77e02ba755fa9ea66e046ecf6dbc61c306bc2a71 Mon Sep 17 00:00:00 2001
From: Andris Pavenis 
Date: Wed, 7 Sep 2016 18:22:32 +0300
Subject: [PATCH] * files.c (remap_filename): Fix handling -remap in
 subdirectories

---
 libcpp/files.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/libcpp/files.c b/libcpp/files.c
index c8bb637..26a7330 100644
--- a/libcpp/files.c
+++ b/libcpp/files.c
@@ -1672,7 +1672,7 @@ static char *
 remap_filename (cpp_reader *pfile, _cpp_file *file)
 {
   const char *fname, *p;
-  char *new_dir;
+  char *new_dir, *p3;
   cpp_dir *dir;
   size_t index, len;
 
@@ -1701,9 +1701,15 @@ remap_filename (cpp_reader *pfile, _cpp_file *file)
 	return NULL;
 
   len = dir->len + (p - fname + 1);
-  new_dir = XNEWVEC (char, len + 1);
+  new_dir = XNEWVEC (char, len + 2);
+  p3 = new_dir + dir->len;
   memcpy (new_dir, dir->name, dir->len);
-  memcpy (new_dir + dir->len, fname, p - fname + 1);
+  if (dir->len && !IS_DIR_SEPARATOR(dir->name[dir->len - 1]))
+	{
+	  *p3++ = '/';
+	  len++;
+	}
+  memcpy (p3, fname, p - fname + 1);
   new_dir[len] = '\0';
 
   dir = make_cpp_dir (pfile, new_dir, dir->sysp);
-- 
2.7.4



[committed] Move class substring_loc from c-family into gcc

2016-09-07 Thread David Malcolm
On Sat, 2016-09-03 at 10:13 +0100, Manuel López-Ibáñez wrote:
> On 02/09/16 23:55, Martin Sebor wrote:
> > > diff --git a/gcc/substring-locations.h b/gcc/substring
> > > -locations.h
> > > index f839c74..bb0de4f 100644
> > > --- a/gcc/substring-locations.h
> > > +++ b/gcc/substring-locations.h
> > > @@ -20,6 +20,73 @@ along with GCC; see the file COPYING3.  If not
> > > see
> > >   #ifndef GCC_SUBSTRING_LOCATIONS_H
> > >   #define GCC_SUBSTRING_LOCATIONS_H
> > > 
> > > +#include 
> > > +
> 
> Is this header file going to be used in the middle-end? If so, then
> it is
> suspicious that it includes cpplib.h. Otherwise, perhaps it should
> live in
> c-family/

The include of cpplib.h turned out not to be necessary, so I removed it.
I also reworded the comment for class substring_loc; here's the patch
I ended up committing.

Successfully bootstrapped on x86_64-pc-linux-gnu.

Committed to trunk as r240028.

gcc/ChangeLog:
* Makefile.in (OBJS): Add substring-locations.o.
* langhooks-def.h (class substring_loc): New forward decl.
(lhd_get_substring_location): New decl.
(LANG_HOOKS_GET_SUBSTRING_LOCATION): New macro.
(LANG_HOOKS_INITIALIZER): Add LANG_HOOKS_GET_SUBSTRING_LOCATION.
* langhooks.c (lhd_get_substring_location): New function.
* langhooks.h (class substring_loc): New forward decl.
(struct lang_hooks): Add field get_substring_location.
* substring-locations.c: New file, taking definition of
format_warning_va and format_warning_at_substring from
c-family/c-format.c, making them non-static.
* substring-locations.h (class substring_loc): Move class here
from c-family/c-common.h.  Add and rewrite comments.
(format_warning_va): New decl.
(format_warning_at_substring): New decl.
(get_source_location_for_substring): Add comment.

gcc/c-family/ChangeLog:
* c-common.c (get_cpp_ttype_from_string_type): Handle being passed
a POINTER_TYPE.
(substring_loc::get_location): Move to substring-locations.c,
keeping implementation as...
(c_get_substring_location): New function, from the above, reworked
to use accessors rather than member lookup.
* c-common.h (class substring_loc): Move to substring-locations.h,
replacing with a forward decl.
(c_get_substring_location): New decl.
* c-format.c: Include "substring-locations.h".
(format_warning_va): Move to substring-locations.c.
(format_warning_at_substring): Likewise.

gcc/c/ChangeLog:
* c-lang.c (LANG_HOOKS_GET_SUBSTRING_LOCATION): Use
c_get_substring_location for this new langhook.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic_plugin_test_string_literals.c: Include
"substring-locations.h".
---
 gcc/Makefile.in|   1 +
 gcc/c-family/c-common.c|  23 +--
 gcc/c-family/c-common.h|  32 +---
 gcc/c-family/c-format.c| 157 +
 gcc/c/c-lang.c |   3 +
 gcc/langhooks-def.h|   8 +-
 gcc/langhooks.c|   8 +
 gcc/langhooks.h|   9 +
 gcc/substring-locations.c  | 195 +
 gcc/substring-locations.h  |  71 
 .../diagnostic_plugin_test_string_literals.c   |   1 +
 11 files changed, 312 insertions(+), 196 deletions(-)
 create mode 100644 gcc/substring-locations.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 7c18285..332c85e 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1443,6 +1443,7 @@ OBJS = \
store-motion.o \
streamer-hooks.o \
stringpool.o \
+   substring-locations.o \
target-globals.o \
targhooks.o \
timevar.o \
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 399ba97..ec1d87a 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -1122,6 +1122,9 @@ static enum cpp_ttype
 get_cpp_ttype_from_string_type (tree string_type)
 {
   gcc_assert (string_type);
+  if (TREE_CODE (string_type) == POINTER_TYPE)
+string_type = TREE_TYPE (string_type);
+
   if (TREE_CODE (string_type) != ARRAY_TYPE)
 return CPP_OTHER;
 
@@ -1148,23 +1151,23 @@ get_cpp_ttype_from_string_type (tree string_type)
 
 GTY(()) string_concat_db *g_string_concat_db;
 
-/* Attempt to determine the source location of the substring.
-   If successful, return NULL and write the source location to *OUT_LOC.
-   Otherwise return an error message.  Error messages are intended
-   for GCC developers (to help debugging) rather than for end-users.  */
+/* Implementation of LANG_HOOKS_GET_SUBSTRING_LOCATION.  */
 
 const char *
-substring_loc::get_location (location_t *out_loc) const
+c_get_substring_location (const 

[Patch libgcc] Enable HCmode multiply and divide (mulhc3/divhc3)

2016-09-07 Thread James Greenhalgh

Hi,

This patch arranges for half-precision complex multiply and divide
routines to be built if __LIBGCC_HAS_HF_MODE__.  This will be true
if the target supports the _Float16 type.
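
For context, a minimal usage sketch (illustrative only; assumes a target where
_Float16 and its _Complex counterpart are available, e.g. AArch64): complex
half-precision multiply and divide are lowered to these libgcc routines unless
the -fcx-limited-range style shortcuts apply.

/* Illustrative only: with _Float16 support these expand to calls to
   __mulhc3 and __divhc3 respectively.  */
_Complex _Float16
prod (_Complex _Float16 a, _Complex _Float16 b)
{
  return a * b;   /* __mulhc3 (__real__ a, __imag__ a, __real__ b, __imag__ b)  */
}

_Complex _Float16
quot (_Complex _Float16 a, _Complex _Float16 b)
{
  return a / b;   /* __divhc3 (__real__ a, __imag__ a, __real__ b, __imag__ b)  */
}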

OK?

Thanks,
James

---

libgcc/

2016-09-07  James Greenhalgh  

*  Makefile.in (lib2funcs): Build _mulhc3 and _divhc3.
* libgcc2.h (LIBGCC_HAS_HF_MODE): Conditionally define.
(HFtype): Likewise.
(HCtype): Likewise.
(__divhc3): Likewise.
(__mulhc3): Likewise.
* libgcc2.c: Support _mulhc3 and _divhc3.

diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index ba37c65..53e3ea2 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -414,8 +414,9 @@ lib2funcs = _muldi3 _negdi2 _lshrdi3 _ashldi3 _ashrdi3 _cmpdi2 _ucmpdi2	   \
 	_negvsi2 _negvdi2 _ctors _ffssi2 _ffsdi2 _clz _clzsi2 _clzdi2  \
 	_ctzsi2 _ctzdi2 _popcount_tab _popcountsi2 _popcountdi2	   \
 	_paritysi2 _paritydi2 _powisf2 _powidf2 _powixf2 _powitf2	   \
-	_mulsc3 _muldc3 _mulxc3 _multc3 _divsc3 _divdc3 _divxc3	   \
-	_divtc3 _bswapsi2 _bswapdi2 _clrsbsi2 _clrsbdi2
+	_mulhc3 _mulsc3 _muldc3 _mulxc3 _multc3 _divhc3 _divsc3	   \
+	_divdc3 _divxc3 _divtc3 _bswapsi2 _bswapdi2 _clrsbsi2	   \
+	_clrsbdi2
 
 # The floating-point conversion routines that involve a single-word integer.
 # XX stands for the integer mode.
diff --git a/libgcc/libgcc2.c b/libgcc/libgcc2.c
index 0a716bf..ec3b21f 100644
--- a/libgcc/libgcc2.c
+++ b/libgcc/libgcc2.c
@@ -1852,7 +1852,8 @@ NAME (TYPE x, int m)
 
 #endif
 
-#if ((defined(L_mulsc3) || defined(L_divsc3)) && LIBGCC2_HAS_SF_MODE) \
+#if ((defined(L_mulhc3) || defined(L_divhc3)) && LIBGCC2_HAS_HF_MODE) \
+|| ((defined(L_mulsc3) || defined(L_divsc3)) && LIBGCC2_HAS_SF_MODE) \
 || ((defined(L_muldc3) || defined(L_divdc3)) && LIBGCC2_HAS_DF_MODE) \
 || ((defined(L_mulxc3) || defined(L_divxc3)) && LIBGCC2_HAS_XF_MODE) \
 || ((defined(L_multc3) || defined(L_divtc3)) && LIBGCC2_HAS_TF_MODE)
@@ -1861,7 +1862,13 @@ NAME (TYPE x, int m)
 #undef double
 #undef long
 
-#if defined(L_mulsc3) || defined(L_divsc3)
+#if defined(L_mulhc3) || defined(L_divhc3)
+# define MTYPE	HFtype
+# define CTYPE	HCtype
+# define MODE	hc
+# define CEXT	__LIBGCC_HF_FUNC_EXT__
+# define NOTRUNC __LIBGCC_HF_EXCESS_PRECISION__
+#elif defined(L_mulsc3) || defined(L_divsc3)
 # define MTYPE	SFtype
 # define CTYPE	SCtype
 # define MODE	sc
@@ -1922,7 +1929,7 @@ extern void *compile_type_assert[sizeof(INFINITY) == sizeof(MTYPE) ? 1 : -1];
 # define TRUNC(x)	__asm__ ("" : "=m"(x) : "m"(x))
 #endif
 
-#if defined(L_mulsc3) || defined(L_muldc3) \
+#if defined(L_mulhc3) || defined(L_mulsc3) || defined(L_muldc3) \
 || defined(L_mulxc3) || defined(L_multc3)
 
 CTYPE
@@ -1992,7 +1999,7 @@ CONCAT3(__mul,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
 }
 #endif /* complex multiply */
 
-#if defined(L_divsc3) || defined(L_divdc3) \
+#if defined(L_divhc3) || defined(L_divsc3) || defined(L_divdc3) \
 || defined(L_divxc3) || defined(L_divtc3)
 
 CTYPE
diff --git a/libgcc/libgcc2.h b/libgcc/libgcc2.h
index 72bb873..c46fb77 100644
--- a/libgcc/libgcc2.h
+++ b/libgcc/libgcc2.h
@@ -34,6 +34,12 @@ extern void __clear_cache (char *, char *);
 extern void __eprintf (const char *, const char *, unsigned int, const char *)
   __attribute__ ((__noreturn__));
 
+#ifdef __LIBGCC_HAS_HF_MODE__
+#define LIBGCC2_HAS_HF_MODE 1
+#else
+#define LIBGCC2_HAS_HF_MODE 0
+#endif
+
 #ifdef __LIBGCC_HAS_SF_MODE__
 #define LIBGCC2_HAS_SF_MODE 1
 #else
@@ -133,6 +139,10 @@ typedef unsigned int UTItype	__attribute__ ((mode (TI)));
 #endif
 #endif
 
+#if LIBGCC2_HAS_HF_MODE
+typedef		float HFtype	__attribute__ ((mode (HF)));
+typedef _Complex float HCtype	__attribute__ ((mode (HC)));
+#endif
 #if LIBGCC2_HAS_SF_MODE
 typedef 	float SFtype	__attribute__ ((mode (SF)));
 typedef _Complex float SCtype	__attribute__ ((mode (SC)));
@@ -424,6 +434,10 @@ extern SItype __negvsi2 (SItype);
 #endif /* COMPAT_SIMODE_TRAPPING_ARITHMETIC */
 
 #undef int
+#if LIBGCC2_HAS_HF_MODE
+extern HCtype __divhc3 (HFtype, HFtype, HFtype, HFtype);
+extern HCtype __mulhc3 (HFtype, HFtype, HFtype, HFtype);
+#endif
 #if LIBGCC2_HAS_SF_MODE
 extern DWtype __fixsfdi (SFtype);
 extern SFtype __floatdisf (DWtype);


Re: [PATCH] Detect whether target can use -fprofile-update=atomic

2016-09-07 Thread Christophe Lyon
On 7 September 2016 at 11:34, Martin Liška  wrote:
> On 09/07/2016 09:45 AM, Christophe Lyon wrote:
>> On 6 September 2016 at 15:45, Martin Liška  wrote:
>>> On 09/06/2016 03:31 PM, Jakub Jelinek wrote:
 sizeof (gcov_type) talks about the host gcov type, you want instead the
 target gcov type.  So
 TYPE_SIZE (gcov_type_node) == 32 vs. 64 (or TYPE_SIZE_UNIT (gcov_type_node)
 == 4 vs. 8).
 As SImode and DImode are in fact 4*BITS_PER_UNIT and 8*BITS_PER_UNIT,
 TYPE_SIZE_UNIT comparisons for 4 and 8 are most natural.
 And I wouldn't add gcc_unreachable, just warn for weirdo arches always.

   Jakub
>>>
>>> Thank you Jakub for helping me with that. I've used TYPE_SIZE_UNIT macro.
>>>
>>> Ready for trunk?
>>> Martin
>>
>> Hi Martin,
>>
>> On targets which do not support atomic profile update, your patch generates a
>> warning on gcc.dg/tree-prof/val-profiler-threads-1.c, making it fail.
>>
>> Do we need a new effective-target ?
>>
>> Christophe
>>
>
> Hi.
>
> Thanks for the observation; I'm sending a patch that does that.
> Can you please test it?
>
It does work indeed, thanks.
(tested on arm* targets)

Christophe

> Thanks,
> Martin


Re: [PING] Re: [PATCH, i386] Fix some warnings/errors that appear when enabling -Wnarrowing when building gcc

2016-09-07 Thread Uros Bizjak
On Tue, Sep 6, 2016 at 8:06 PM, Eric Gallager  wrote:
> On 9/6/16, Uros Bizjak  wrote:
>> On Tue, Sep 6, 2016 at 5:33 PM, Eric Gallager  wrote:
>>> Ping? CC-ing an i386 maintainer since the patch mostly touches
>>> i386-specific files. Also, to clarify, I say "warnings/errors" because
>>> they start off as warnings in stage 1 but then become errors in stage
>>> 2. Note also that my patch leaves out the part where I modify the
>>> configure script to enable -Wnarrowing, because the rest of the code
>>> isn't quite ready for that yet.
>>
>> You are probably referring to [1]? It looks OK, modulo:
>>
>> +DEF_TUNE (X86_TUNE_QIMODE_MATH, "qimode_math", ~(0U))
>>
>> where parenthesis are not needed.
>>
>>
>> Please resubmit the patch with a ChangeLog entry, as instructed in [2]
>>
>> [1] https://gcc.gnu.org/ml/gcc-patches/2016-08/msg02129.html
>> [2] https://gcc.gnu.org/contribute.html#patches
>>
>> Uros.
>>
>
>
> Okay, reattached. Here's a ChangeLog entry to put in gcc/ChangeLog:
>
> 2016-09-06  Eric Gallager  
>
> * config/i386/i386.c: Add 'U' suffix to constants to avoid
> -Wnarrowing.
> * config/i386/x86-tune.def: Likewise.
> * opts.c: Likewise.
>
>
> (Please also note that I don't have commit access.)

Thanks, committed with slightly adjusted ChangeLog:

2016-09-07  Eric Gallager  

* config/i386/i386.c: Add 'U' suffix to processor feature bits
to avoid -Wnarrowing warning.
* config/i386/x86-tune.def: Likewise for DEF_TUNE selector bitmasks.
* opts.c: Likewise for SANITIZER_OPT bitmasks.

Uros.


Re: [x86] Disable STV pass if -mstackrealign is enabled.

2016-09-07 Thread H.J. Lu
On Wed, Aug 31, 2016 at 12:29 PM, Uros Bizjak  wrote:
>> the new STV pass generates SSE instructions in 32-bit mode very late in the
>> pipeline and doesn't bother about realigning the stack, so it wreaks havoc on
>> OSes where you need to realign the stack, e.g. Windows, but I guess Solaris 
>> is
>> equally affected.  Therefore the attached patch disables it if -mstackrealign
>> is enabled (the option is automatically enabled on Windows and Solaris when
>> SSE support is enabled), as already done for -mpreferred-stack-boundary={2,3}
>> and -mincoming-stack-boundary={2,3}.
>>
>> Tested on x86/Windows, OK for mainline and 6 branch?
>>
>>
>> 2016-08-31  Eric Botcazou  
>>
>>* config/i386/i386.c (ix86_option_override_internal): Also disable the
>>STV pass if -mstackrealign is enabled.
>
> OK for mainline and gcc-6 branch.
>

Is there a testcase to show the problem with -mincoming-stack-boundary=
on Linux?

-- 
H.J.


PING: Re: [PATCH, LRA] Fix PR rtl-optimization 77289, LRA matching constraint problem

2016-09-07 Thread Peter Bergner
Ping this patch:

https://gcc.gnu.org/ml/gcc-patches/2016-08/msg02099.html

Peter



Re: Ping**2! Re: [PATCH, Fortran] Extension: AUTOMATIC/STATIC symbol attributes with -fdec-static

2016-09-07 Thread Andre Vehreschild
Hi Fritz,

please note: I do not have official review privileges, so my vote here
is rather advice to you and the official reviewers. Often such an
unofficial review helps to speed things up, because the official
reviewers are pointed to the nitpicks and don't have to bother with the
minor things.

So here it comes:

- Do I understand this correctly: AUTOMATIC and STATIC have to come last,
  i.e., right before the :: when declaring, e.g., a variable?


- Running:

  $ contrib/check_GNU_style.sh dec_static.patch

  Reports some style issues in the C code, that should be fixed before
  commit. (Style in Fortran testcases does not matter that much.)


- I have deleted huge parts of the diff and just kept the parts I had a
  question/remark for:

> diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
> index b34ae86..a0cf78b 100644
> --- a/gcc/fortran/gfortran.texi
> +++ b/gcc/fortran/gfortran.texi
> @@ -2120,7 +2121,6 @@ consider @code{BACKSPACE} or @code{REWIND} to properly 
> position
>  the file before the EOF marker.  As an extension, the run-time error may
>  be disabled using -std=legacy.
>  
> -

Please change formatting in a separate patch or not at all (here!).
This policy is to distinguish cosmetic changes from relevant ones.

>  @node STRUCTURE and RECORD
>  @subsection @code{STRUCTURE} and @code{RECORD}
>  @cindex @code{STRUCTURE}
> @@ -2420,6 +2420,53 @@ here:
>@tab @code{--} @tab @code{FLOATI} @tab @code{FLOATJ} @tab @code{FLOATK}
>  @end multitable
>  
> +@node AUTOMATIC and STATIC attributes
> +@subsection @code{AUTOMATIC} and @code{STATIC} attributes
> +@cindex variable attributes
> +@cindex @code{AUTOMATIC}
> +@cindex @code{STATIC}
> +
> +With @option{-fdec-static} GNU Fortran supports the explicit specification of
> +two addition variable attributes: @code{STATIC} and @code{AUTOMATIC}. These

two additional variable ...
^^ 

But is it only for variables? Can't it be used for equivalences or
other constructs, too?

> diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
> index 15c131a..a5da59e 100644
> --- a/gcc/fortran/invoke.texi
> +++ b/gcc/fortran/invoke.texi
> @@ -255,6 +255,11 @@ instead where possible.
>  Enable B/I/J/K kind variants of existing integer functions (e.g. BIAND, 
> IIAND,
>  JIAND, etc...). For a complete list of intrinsics see the full documentation.
>  
> +@item -fdec-static
> +@opindex @code{fdec-static}
> +Enable STATIC and AUTOMATIC as attributes specifying storage location.
> +STATIC is equivalent to SAVE, and locals are typically AUTOMATIC by default.

Well, this description to me sounds like: "Those attributes are
useless, because they can be substituted." This is clearly not what you
intend. I propose to state in the description that with this switch the
DEC extension is available "to explicitly specify the storage of
entities". Then the last sentence is still a good hint for all Fortran
users who don't know the extension.

> +
>  @item -fdollar-ok
>  @opindex @code{fdollar-ok}
>  @cindex @code{$}
> diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
> index 8ec5400..260512d 100644
> --- a/gcc/fortran/lang.opt
> +++ b/gcc/fortran/lang.opt
> @@ -432,6 +432,10 @@ fdec-structure
>  Fortran
>  Enable support for DEC STRUCTURE/RECORD.
>  
> +fdec-static
> +Fortran Var(flag_dec_static)
> +Enable STATIC and AUTOMATIC attributes.

How about: Enable the dec-extension of STATIC and AUTOMATIC attributes.
Just a proposal.

> +
>  fdefault-double-8
>  Fortran Var(flag_default_double)
>  Set the default double precision kind to an 8 byte wide type.


> diff --git a/gcc/testsuite/gfortran.dg/dec_static_1.f90 
> b/gcc/testsuite/gfortran.dg/dec_static_1.f90
> new file mode 100644
> index 000..4dcfc7c
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/dec_static_1.f90

-  Please add some testcases where the new error messages are tested.

So much from my side. Btw, I haven't applied the patch and tested
whether it runs or collides with other proposed patches. That is
usually done by Dominique and I did not want to duplicate that work.

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [PATCH][v3] GIMPLE store merging pass

2016-09-07 Thread Bernd Schmidt



On 09/07/2016 10:19 AM, Richard Biener wrote:

On Tue, 6 Sep 2016, Jakub Jelinek wrote:



If you want a 64-bit store, you'd need to merge the two, and that would be
even more expensive.  It is a matter of say:
movl $0x12345678, (%rsp)
movl $0x09abcdef, 4(%rsp)
vs.
movabsq $0x09abcdef12345678, %rax
movq %rax, (%rsp)
vs.
movl $0x09abcdef, %eax
salq $32, %rax
orq $0x12345678, %rax
movq $rax, (%rsp)


vs.

movq $LC0, (%rsp)

?


Not the same. That moves the address of $LC0.


Bernd


[PATCH][expmed.c] PR middle-end/77426 Delete duplicate condition in synth_mult

2016-09-07 Thread Kyrill Tkachov

Hi all,

The duplicate mode check in synth_mult can just be deleted IMO. It was introduced as 
part of r139821, which was a much larger change introducing size/speed differentiation 
in the RTL midend. So I think it's just a typo/copy-pasto.

Tested on aarch64-none-elf.
Ok?

Thanks,
Kyrill

2016-09-07  Kyrylo Tkachov  

PR middle-end/77426
* expmed.c (synth_mult): Delete duplicate mode check.
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 1cedf023c8e8916d887bd3a9d9a723e3cc2354f7..a5da8836f21debcda3b834cb869348ea6cb33414 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -2572,7 +2572,6 @@ synth_mult (struct algorithm *alg_out, unsigned HOST_WIDE_INT t,
   entry_ptr = alg_hash_entry_ptr (hash_index);
   if (entry_ptr->t == t
   && entry_ptr->mode == mode
-  && entry_ptr->mode == mode
   && entry_ptr->speed == speed
   && entry_ptr->alg != alg_unknown)
 {


Re: [PATCH][AArch64] Improve legitimize_address

2016-09-07 Thread Richard Earnshaw (lists)
On 06/09/16 14:14, Wilco Dijkstra wrote:
> Improve aarch64_legitimize_address - avoid splitting the offset if it is
> supported.  When we do split, take the mode size into account.  BLKmode
> falls into the unaligned case but should be treated like LDP/STP.
> This improves codesize slightly due to fewer base address calculations:
> 
> int f(int *p) { return p[5000] + p[7000]; }
> 
> Now generates:
> 
> f:
>   add x0, x0, 16384
>   ldr w1, [x0, 3616]
>   ldr w0, [x0, 11616]
>   add w0, w1, w0
>   ret
> 
> instead of:
> 
> f:
>   add x1, x0, 16384
>   add x0, x0, 24576
>   ldr w1, [x1, 3616]
>   ldr w0, [x0, 3424]
>   add w0, w1, w0
>   ret
> 
> OK for trunk?
> 
> ChangeLog:
> 2016-09-06  Wilco Dijkstra  
> 
> gcc/
>   * config/aarch64/aarch64.c (aarch64_legitimize_address):
>   Avoid use of base_offset if offset already in range.

OK.

R.

> --
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 27bbdbad8cddc576f9ed4fd0670116bd6d318412..119ff0aecb0c9f88899fa141b2c7f9158281f9c3
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -5058,9 +5058,19 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  */, 
> machine_mode mode)
>/* For offsets aren't a multiple of the access size, the limit is
>-256...255.  */
>else if (offset & (GET_MODE_SIZE (mode) - 1))
> - base_offset = (offset + 0x100) & ~0x1ff;
> + {
> +   base_offset = (offset + 0x100) & ~0x1ff;
> +
> +   /* BLKmode typically uses LDP of X-registers.  */
> +   if (mode == BLKmode)
> + base_offset = (offset + 512) & ~0x3ff;
> + }
> +  /* Small negative offsets are supported.  */
> +  else if (IN_RANGE (offset, -256, 0))
> + base_offset = 0;
> +  /* Use 12-bit offset by access size.  */
>else
> - base_offset = offset & ~0xfff;
> + base_offset = offset & (~0xfff * GET_MODE_SIZE (mode));
>  
>if (base_offset != 0)
>   {
> 



Re: [PATCH GCC 8/9]Adjust test case for CFG changes in vectorized loop

2016-09-07 Thread Jeff Law

On 09/06/2016 12:53 PM, Bin Cheng wrote:

Hi,
After the CFG changes in the vectorizer, the epilog loop can now be completely peeled, 
resulting in changes in the number of instructions that these tests check.  
This patch adjusts the related checking strings.

Thanks,
bin


gcc/testsuite/ChangeLog
2016-09-01  Bin Cheng  

* gcc.target/i386/l_fma_float_1.c: Revise test.
* gcc.target/i386/l_fma_float_2.c: Ditto.
* gcc.target/i386/l_fma_float_3.c: Ditto.
* gcc.target/i386/l_fma_float_4.c: Ditto.
* gcc.target/i386/l_fma_float_5.c: Ditto.
* gcc.target/i386/l_fma_float_6.c: Ditto.
* gcc.target/i386/l_fma_double_1.c: Ditto.
* gcc.target/i386/l_fma_double_2.c: Ditto.
* gcc.target/i386/l_fma_double_3.c: Ditto.
* gcc.target/i386/l_fma_double_4.c: Ditto.
* gcc.target/i386/l_fma_double_5.c: Ditto.
* gcc.target/i386/l_fma_double_6.c: Ditto.

OK when prerequisites are approved.
jeff



Re: [PATCH][v3] GIMPLE store merging pass

2016-09-07 Thread Jeff Law

On 09/07/2016 02:19 AM, Richard Biener wrote:

On Tue, 6 Sep 2016, Jakub Jelinek wrote:


On Tue, Sep 06, 2016 at 04:59:23PM +0100, Kyrill Tkachov wrote:

On 06/09/16 16:32, Jakub Jelinek wrote:

On Tue, Sep 06, 2016 at 04:14:47PM +0100, Kyrill Tkachov wrote:

The v3 of this patch addresses feedback I received on the version posted at [1].
The merged store buffer is now represented as a char array that we splat values 
onto with
native_encode_expr and native_interpret_expr. This allows us to merge anything 
that native_encode_expr
accepts, including floating point values and short vectors. So this version 
extends the functionality
of the previous one in that it handles floating point values as well.

The first phase of the algorithm that detects the contiguous stores is also 
slightly refactored according
to feedback to read more fluently.

Richi, I experimented with merging up to MOVE_MAX bytes rather than word size 
but I got worse results on aarch64.
MOVE_MAX there is 16 (because it has load/store register pair instructions) but 
the 128-bit immediates that we ended up
synthesising were too complex. Perhaps the TImode immediate store RTL 
expansions could be improved, but for now
I've left the maximum merge size to be BITS_PER_WORD.

At least from playing with this kind of things in the RTL PR22141 patch,
I remember storing 64-bit constants on x86_64 compared to storing 2 32-bit
constants usually isn't a win (not just for speed optimized blocks but also for
-Os).  For 64-bit store if the constant isn't signed 32-bit or unsigned
32-bit you need movabsq into some temporary register which has like 3 times 
worse
latency than normal store if I remember well, and then store it.


We could restrict the maximum width of the stores generated to 32 bits on 
x86_64.
I think this would need another parameter or target macro for the target to set.
Alternatively, is it a possibility for x86 to be a bit smarter in its DImode 
mov-immediate
expansion? For example break up the 64-bit movabsq immediate into two SImode 
immediates?


If you want a 64-bit store, you'd need to merge the two, and that would be
even more expensive.  It is a matter of say:
movl $0x12345678, (%rsp)
movl $0x09abcdef, 4(%rsp)
vs.
movabsq $0x09abcdef12345678, %rax
movq %rax, (%rsp)
vs.
movl $0x09abcdef, %eax
salq $32, %rax
orq $0x12345678, %rax
movq $rax, (%rsp)


vs.

movq $LC0, (%rsp)

?


etc.  Guess it needs to be benchmarked on contemporary CPUs, I'm pretty sure
the last sequence is the worst one.


I think the important part to notice is that it should be straightforward
for a target / the expander to split a large store from an immediate
into any of the above but very hard to do the opposite.  Thus from a
GIMPLE side "canonicalizing" to large stores (that are eventually
supported and well-aligned) seems best to me.

Agreed.





I'm aware of that. The patch already has logic to avoid emitting unaligned 
accesses
for SLOW_UNALIGNED_ACCESS targets. Beyond that the patch introduces the 
parameter
PARAM_STORE_MERGING_ALLOW_UNALIGNED that can be used by the user or target to
forbid generation of unaligned stores by the pass altogether. Beyond that I'm 
not sure
how to behave more intelligently here. Any ideas?


Dunno, the heuristics was the main problem with my patch.  Generally, I'd
say there is a difference between cold and hot blocks, in cold ones perhaps
unaligned stores are more appropriate (if supported at all and not way too
slow), while in hot ones less desirable.


Note that I repeatedly argue that if we can canonicalize sth to "larger"
then even if unaligned, the expander should be able to produce optimal
code again (it might not do, of course).
And agreed.  Furthermore, it's in line with our guiding principles WRT 
separation of the tree/SSA optimizers from target dependencies.


So let's push those decisions into the expanders/backend/target and 
canonicalize to the larger stores.


jeff




Re: [PATCH GCC 7/9]Skip loops iterating only 1 time in predictive commoning

2016-09-07 Thread Jeff Law

On 09/06/2016 12:53 PM, Bin Cheng wrote:

Hi,
For loops which are bounded to iterate only 1 time (thus the loop's latch doesn't 
roll), there is nothing for predictive commoning to do; this patch detects/skips these 
cases.  A test is also added in 
gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f for this.

Thanks,
bin

2016-09-01  Bin Cheng  

* tree-predcom.c (tree_predictive_commoning_loop): Skip loop that only
iterates 1 time.

gcc/testsuite/ChangeLog
2016-09-01  Bin Cheng  

* gfortran.dg/vect/fast-math-mgrid-resid.f: New test string.


OK.
jeff


Re: [PATCH GCC 3/9]Support rewriting non-lcssa phis for vars live outside of vect-loop

2016-09-07 Thread Jeff Law

On 09/06/2016 12:51 PM, Bin Cheng wrote:

Hi,
The current implementation requires that variables live outside of the vect-loop 
satisfy LCSSA form; this patch relaxes that restriction.  It keeps the old 
behavior for LCSSA PHI nodes by replacing the use of the live var with the result of that 
PHI; for other uses of the live var, it simply replaces all uses outside the loop with 
the newly computed var.

Thanks,
bin

2016-09-01  Bin Cheng  

* tree-vect-loop.c (vectorizable_live_operation): Support handling
for live variable outside loop but not in lcssa form.


OK.
jeff


Re: [PATCH GCC 5/9]Put copied loop after its preheader and after the original loop's latch in basic block link list

2016-09-07 Thread Jeff Law

On 09/06/2016 12:52 PM, Bin Cheng wrote:

Hi,
This simple patch changes slpeel_tree_duplicate_loop_to_edge_cfg by putting the copied 
loop after its new preheader and after the original loop's latch in the basic 
block linked list.  It doesn't change the CFG at all, but makes the dumped CFG a 
little bit easier to read.  I assume this is good for basic block reordering 
too?

Thanks,
bin

2016-09-01  Bin Cheng  

* tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg): Put
duplicated loop after its preheader and after the original loop.

In theory bb reordering ought to clean this up.  But better to generate 
good code from the start when it's easy to do so.


OK.

jeff


Re: [PATCH GCC 4/9]Check niters for peeling for data access gaps in analyzer

2016-09-07 Thread Jeff Law

On 09/06/2016 12:51 PM, Bin Cheng wrote:

Hi,
This patch checks whether the loop has enough niters for peeling for data access gaps in 
vect_analyze_loop_2, whereas currently this check is done at the vect_transform_loop stage.  The 
problem is that the vectorizer may vectorize loops without enough iterations and 
generate a false guard on the vectorized loop.  Though the loop is successfully 
vectorized, it will never be executed, and most likely it will be removed 
during cfg-cleanup.  Examples can be found in the revised tests of this patch.

Thanks,
bin

2016-09-01  Bin Cheng  

* tree-vect-loop.c (vect_analyze_loop_2): Check and skip loop if it
has no enough iterations for LOOP_VINFO_PEELING_FOR_GAPS.

gcc/testsuite/ChangeLog
2016-09-01  Bin Cheng  

* gcc.dg/vect/vect-98.c: Refine test case.
* gcc.dg/vect/vect-strided-a-u8-i8-gap2.c: Ditto.
* gcc.dg/vect/vect-strided-u8-i8-gap2.c: Ditto.
* gcc.dg/vect/vect-strided-u8-i8-gap4.c: Ditto.


OK.
jeff


Re: Advice sought for debugging a lto1 ICE (was: Implement C _FloatN, _FloatNx types [version 6])

2016-09-07 Thread Richard Biener
On Wed, Sep 7, 2016 at 1:52 PM, Thomas Schwinge  wrote:
> Hi!
>
> I trimmed the CC list -- I'm looking for advice about debugging a lto1
> ICE.
>
> On Fri, 19 Aug 2016 11:05:59 +, Joseph Myers  
> wrote:
>> On Fri, 19 Aug 2016, Richard Biener wrote:
>> > Can you quickly verify if LTO works with the new types?  I don't see 
>> > anything
>> > that would prevent it but having new global trees and backends 
>> > initializing them
>> > might come up with surprises (see tree-streamer.c:preload_common_nodes)
>>
>> Well, the execution tests are in gcc.dg/torture, which is run with various
>> options including -flto (and I've checked the testsuite logs to confirm
>> these tests are indeed run with such options).  Is there something else
>> you think should be tested?
>
> As I noted in :
>
> As of the PR32187 commit r239625 "Implement C _FloatN, _FloatNx types", 
> nvptx
> offloading is broken, ICEs in LTO stream-in.  Probably some kind of 
> data-type
> mismatch that is not visible with Intel MIC offloading (using the same 
> data
> types) but explodes with nvptx.  I'm having a look.
>
> I know how to use "-save-temps -v" to re-run the ICEing lto1 in GDB; a
> backtrace of the ICE looks as follows:
>
> #0  fancy_abort (file=file@entry=0x10d61d0 "[...]/source-gcc/gcc/vec.h", 
> line=line@entry=727, function=function@entry=0x10d6e3a 
> <_ZZN3vecIP9tree_node7va_heap8vl_embedEixEjE12__FUNCTION__> "operator[]") at 
> [...]/source-gcc/gcc/diagnostic.c:1414
> #1  0x0058c9ef in vec::operator[] 
> (this=0x16919c0, ix=ix@entry=185) at [...]/source-gcc/gcc/vec.h:727
> #2  0x0058ca33 in vec::operator[] 
> (this=this@entry=0x1691998, ix=ix@entry=185) at 
> [...]/source-gcc/gcc/vec.h:1211

so it wants tree 185, which is (given the low number) likely one streamed by
preload_common_nodes.  This is carefully crafted to _not_ diverge by
frontend (!); it wasn't even designed to cope with global trees being present
or not depending on the target (well, because the target is always the
same! mind you!)

Now -- in theory it should deal with NULLs just fine (resulting in
error_mark_node), but it can diverge when there are additional
compound types (like vector, complex,
array or record types) whose element types are not in the set of
global trees.
The complex _FloatN types would be such a case, given they appear before their
components.  That mixes up the ordering at least.

So I suggest adding a print_tree where it does the streamer_tree_cache_append
and comparing the cc1 and lto1 outcomes.
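
A rough sketch of such a debugging hunk (hypothetical; the function and
variable names are assumptions based on gcc/tree-streamer.c, this is not a
posted patch):

  /* Placed just before the streamer_tree_cache_append call in
     record_common_node, so the cc1 and lto1 preload orders can be diffed.  */
  static unsigned preload_ix;
  fprintf (stderr, "=== preloaded node %u ===\n", preload_ix++);
  debug_tree (node);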

The ICE above means the lto1 has fewer preloaded nodes I guess.

Richard.

> #3  0x00c73e54 in streamer_tree_cache_get_tree (cache=0x1691990, 
> ix=ix@entry=185) at [...]/source-gcc/gcc/tree-streamer.h:98
> #4  0x00c73eb9 in streamer_get_pickled_tree 
> (ib=ib@entry=0x7fffceb0, data_in=data_in@entry=0x1691930) at 
> [...]/source-gcc/gcc/tree-streamer-in.c:1112
> #5  0x008f841b in lto_input_tree_1 (ib=ib@entry=0x7fffceb0, 
> data_in=data_in@entry=0x1691930, tag=tag@entry=LTO_tree_pickle_reference, 
> hash=hash@entry=0) at [...]/source-gcc/gcc/lto-streamer-in.c:1404
> #6  0x008f8844 in lto_input_tree (ib=0x7fffceb0, 
> data_in=0x1691930) at [...]/source-gcc/gcc/lto-streamer-in.c:1444
> #7  0x00c720d2 in lto_input_ts_list_tree_pointers 
> (ib=ib@entry=0x7fffceb0, data_in=data_in@entry=0x1691930, 
> expr=expr@entry=0x76993780) at [...]/source-gcc/gcc/tree-streamer-in.c:861
> #8  0x00c7444e in streamer_read_tree_body 
> (ib=ib@entry=0x7fffceb0, data_in=data_in@entry=0x1691930, 
> expr=expr@entry=0x76993780) at 
> [...]/source-gcc/gcc/tree-streamer-in.c:1077
> #9  0x008f6428 in lto_read_tree_1 (ib=ib@entry=0x7fffceb0, 
> data_in=data_in@entry=0x1691930, expr=expr@entry=0x76993780) at 
> [...]/source-gcc/gcc/lto-streamer-in.c:1285
> #10 0x008f651b in lto_read_tree (ib=ib@entry=0x7fffceb0, 
> data_in=data_in@entry=0x1691930, tag=tag@entry=4, hash=hash@entry=4086308758) 
> at [...]/source-gcc/gcc/lto-streamer-in.c:1315
> #11 0x008f85db in lto_input_tree_1 (ib=ib@entry=0x7fffceb0, 
> data_in=data_in@entry=0x1691930, tag=tag@entry=4, hash=hash@entry=4086308758) 
> at [...]/source-gcc/gcc/lto-streamer-in.c:1427
> #12 0x008f8673 in lto_input_scc (ib=ib@entry=0x7fffceb0, 
> data_in=data_in@entry=0x1691930, len=len@entry=0x7fffceac, 
> entry_len=entry_len@entry=0x7fffcea8) at 
> [...]/source-gcc/gcc/lto-streamer-in.c:1339
> #13 0x005890f7 in lto_read_decls 
> (decl_data=decl_data@entry=0x77fc, data=data@entry=0x169d570, 
> resolutions=...) at [...]/source-gcc/gcc/lto/lto.c:1693
> #14 0x005898c8 in lto_file_finalize 
> (file_data=file_data@entry=0x77fc, 

Re: [PATCH GCC 2/9]Add interface reseting original copy tables in cfg.c

2016-09-07 Thread Jeff Law

On 09/06/2016 12:50 PM, Bin Cheng wrote:

Hi,
This simple patch adds an interface for resetting the original copy tables in cfg.c.  This 
will be used in rewriting the vect_do_peeling_* functions in the vectorizer so that we 
don't need to release/allocate tables between prolog and epilog peeling.

Thanks,
bin

2016-09-01  Bin Cheng  

* cfg.c (reset_original_copy_tables): New func.
* cfg.h (reset_original_copy_tables): New decl.

Needs a function comment for reset_original_copy_tables.  Should be fine 
with that change.


Jeff


Re: [PATCH GCC 1/9]Delete useless code in tree-vect-loop-manip.c

2016-09-07 Thread Jeff Law

On 09/06/2016 12:49 PM, Bin Cheng wrote:

Hi,
This is a patch set generating a new control flow graph for the vectorized loop and 
its peeling loops.  At the moment, the CFG for a vectorized loop is complicated and 
sub-optimal.  The major issues are:
A) For both the prologue and the vectorized loop, it generates a guard/branch before 
the loops checking if the following (prologue/vectorized) loop should be skipped.  
It also generates a guard/branch after the loops checking if the next 
(vectorized/epilogue) loop should be skipped.
B) Depending on how conditional set is supported by targets, it may generate 
one additional if-statement (branch) setting the niters for the prologue loop.
C) In the worst cases, up to 4 branch instructions need to be executed before 
the vectorized loop is entered.
D) For loops without enough niters, it checks some (niters_prologue) 
iterations with the prologue loop; then checks if the remaining number of iterations (niters 
- niters_prologue) is enough for vectorization; if not, it skips the vectorized loop 
and continues with the epilogue loop.  This is bad since the vectorized loop won't be 
executed at all after all the hassle.

This patch set improves it by merging different checks so that only 2 branch 
instructions (could be further reduced in combination with loop versioning) are 
executed before the vectorized loop; it does better compile-time analysis in 
order to avoid prologue/epilogue peeling if possible; it improves code 
generation in various ways (live overflow handling, generating short live 
ranges).  In terms of implementation, it tries to factor SSA updating code out 
of CFG changing code; I think this may help future work replacing slpeel_* with 
a generic GIMPLE loop copier.

So far there are 9 patches in the set; patches [1-5] are small prerequisites for 
the major change, which is done by patch 6.  Patches [7-9] are small patches that either 
address test cases or improve code generation.  A final bootstrap and test of the 
patch set is ongoing on x86_64 and AArch64.  Assuming no new failures (or that they will be 
fixed), any comments on this?

This is the first patch, deleting useless code in tree-vect-loop-manip.c, as 
well as fixing an obvious code style issue.

Thanks,
bin

2016-09-01  Bin Cheng  

* tree-vect-loop-manip.c (slpeel_can_duplicate_loop_p): Fix code
style issue.
(vect_do_peeling_for_loop_bound, vect_do_peeling_for_alignment):
Remove useless code.
Seems obvious to me -- I can't think of any reason why we'd emit a NULL 
sequence to the loop preheader edge.


jeff



Advice sought for debugging a lto1 ICE (was: Implement C _FloatN, _FloatNx types [version 6])

2016-09-07 Thread Thomas Schwinge
Hi!

I trimmed the CC list -- I'm looking for advice about debugging a lto1
ICE.

On Fri, 19 Aug 2016 11:05:59 +, Joseph Myers  
wrote:
> On Fri, 19 Aug 2016, Richard Biener wrote:
> > Can you quickly verify if LTO works with the new types?  I don't see 
> > anything
> > that would prevent it but having new global trees and backends initializing 
> > them
> > might come up with surprises (see tree-streamer.c:preload_common_nodes)
> 
> Well, the execution tests are in gcc.dg/torture, which is run with various 
> options including -flto (and I've checked the testsuite logs to confirm 
> these tests are indeed run with such options).  Is there something else 
> you think should be tested?

As I noted in :

As of the PR32187 commit r239625 "Implement C _FloatN, _FloatNx types", 
nvptx
offloading is broken, ICEs in LTO stream-in.  Probably some kind of 
data-type
mismatch that is not visible with Intel MIC offloading (using the same data
types) but explodes with nvptx.  I'm having a look.

I know how to use "-save-temps -v" to re-run the ICEing lto1 in GDB; a
backtrace of the ICE looks as follows:

#0  fancy_abort (file=file@entry=0x10d61d0 "[...]/source-gcc/gcc/vec.h", 
line=line@entry=727, function=function@entry=0x10d6e3a 
<_ZZN3vecIP9tree_node7va_heap8vl_embedEixEjE12__FUNCTION__> "operator[]") at 
[...]/source-gcc/gcc/diagnostic.c:1414
#1  0x0058c9ef in vec::operator[] 
(this=0x16919c0, ix=ix@entry=185) at [...]/source-gcc/gcc/vec.h:727
#2  0x0058ca33 in vec::operator[] 
(this=this@entry=0x1691998, ix=ix@entry=185) at [...]/source-gcc/gcc/vec.h:1211
#3  0x00c73e54 in streamer_tree_cache_get_tree (cache=0x1691990, 
ix=ix@entry=185) at [...]/source-gcc/gcc/tree-streamer.h:98
#4  0x00c73eb9 in streamer_get_pickled_tree 
(ib=ib@entry=0x7fffceb0, data_in=data_in@entry=0x1691930) at 
[...]/source-gcc/gcc/tree-streamer-in.c:1112
#5  0x008f841b in lto_input_tree_1 (ib=ib@entry=0x7fffceb0, 
data_in=data_in@entry=0x1691930, tag=tag@entry=LTO_tree_pickle_reference, 
hash=hash@entry=0) at [...]/source-gcc/gcc/lto-streamer-in.c:1404
#6  0x008f8844 in lto_input_tree (ib=0x7fffceb0, 
data_in=0x1691930) at [...]/source-gcc/gcc/lto-streamer-in.c:1444
#7  0x00c720d2 in lto_input_ts_list_tree_pointers 
(ib=ib@entry=0x7fffceb0, data_in=data_in@entry=0x1691930, 
expr=expr@entry=0x76993780) at [...]/source-gcc/gcc/tree-streamer-in.c:861
#8  0x00c7444e in streamer_read_tree_body 
(ib=ib@entry=0x7fffceb0, data_in=data_in@entry=0x1691930, 
expr=expr@entry=0x76993780) at [...]/source-gcc/gcc/tree-streamer-in.c:1077
#9  0x008f6428 in lto_read_tree_1 (ib=ib@entry=0x7fffceb0, 
data_in=data_in@entry=0x1691930, expr=expr@entry=0x76993780) at 
[...]/source-gcc/gcc/lto-streamer-in.c:1285
#10 0x008f651b in lto_read_tree (ib=ib@entry=0x7fffceb0, 
data_in=data_in@entry=0x1691930, tag=tag@entry=4, hash=hash@entry=4086308758) 
at [...]/source-gcc/gcc/lto-streamer-in.c:1315
#11 0x008f85db in lto_input_tree_1 (ib=ib@entry=0x7fffceb0, 
data_in=data_in@entry=0x1691930, tag=tag@entry=4, hash=hash@entry=4086308758) 
at [...]/source-gcc/gcc/lto-streamer-in.c:1427
#12 0x008f8673 in lto_input_scc (ib=ib@entry=0x7fffceb0, 
data_in=data_in@entry=0x1691930, len=len@entry=0x7fffceac, 
entry_len=entry_len@entry=0x7fffcea8) at 
[...]/source-gcc/gcc/lto-streamer-in.c:1339
#13 0x005890f7 in lto_read_decls 
(decl_data=decl_data@entry=0x77fc, data=data@entry=0x169d570, 
resolutions=...) at [...]/source-gcc/gcc/lto/lto.c:1693
#14 0x005898c8 in lto_file_finalize 
(file_data=file_data@entry=0x77fc, file=file@entry=0x15eedb0) at 
[...]/source-gcc/gcc/lto/lto.c:2037
#15 0x00589928 in lto_create_files_from_ids 
(file=file@entry=0x15eedb0, file_data=file_data@entry=0x77fc, 
count=count@entry=0x7fffd054) at [...]/source-gcc/gcc/lto/lto.c:2047
#16 0x00589a7a in lto_file_read (file=0x15eedb0, 
resolution_file=resolution_file@entry=0x0, count=count@entry=0x7fffd054) at 
[...]/source-gcc/gcc/lto/lto.c:2088
#17 0x00589e84 in read_cgraph_and_symbols (nfiles=1, 
fnames=0x160e990) at [...]/source-gcc/gcc/lto/lto.c:2798
#18 0x0058a572 in lto_main () at [...]/source-gcc/gcc/lto/lto.c:3299
#19 0x00a48eff in compile_file () at 
[...]/source-gcc/gcc/toplev.c:466
#20 0x00550943 in do_compile () at 
[...]/source-gcc/gcc/toplev.c:2010
#21 toplev::main (this=this@entry=0x7fffd180, argc=argc@entry=20, 
argv=0x15daf20, argv@entry=0x7fffd288) at [...]/source-gcc/gcc/toplev.c:2144
#22 0x00552717 in main (argc=20, argv=0x7fffd288) at 
[...]/source-gcc/gcc/main.c:39

(Comparing to yesterday's r240004, the 

Re: Make max_align_t respect _Float128 [version 2]

2016-09-07 Thread Mark Wielaard
On Wed, Sep 07, 2016 at 11:15:34AM +0200, Florian Weimer wrote:
> On 09/06/2016 11:31 PM, Paul Eggert wrote:
> >On 09/06/2016 01:40 PM, Joseph Myers wrote:
> >>Sounds like a defect in C11 to me - none of the examples of flexible
> >>array
> >>members anticipate needing to add to the size to allow for tail padding
> >>with unknown alignment requirements.
> >
> >Yes, I would prefer calling it a defect, as most code I've seen dealing
> >with flexible array members does not align the tail size. However, GCC +
> >valgrind does take advantage of this "defect" and I would not be
> >surprised if other picky implementations do too.
> 
> It might be an inherent limitation of the valgrind approach. 
> Speculative loads which cannot result in data races (in the C11 sense) 
> due to the way the architecture behaves should be fine.  The alignment 
> ensures that the load is on the same page, which is what typically 
> prevent this optimization.

It might or might not be an issue for valgrind. If valgrind believes the
memory isn't valid then it will complain about an invalid access.
But if the memory is accessible but uninitialised then it will just track
the undefinedness and complain later if such a value is used.

> Some implementation techniques for C string functions result in the same 
> behavior.  valgrind intercepts them or suppresses errors there, but 
> that's not possible for code that GCC emits inline, obviously.

valgrind also has --partial-loads-ok (which in newer versions defaults
to =yes):

   Controls how Memcheck handles 32-, 64-, 128- and 256-bit naturally
   aligned loads from addresses for which some bytes are addressable
   and others are not. When yes, such loads do not produce an address
   error. Instead, loaded bytes originating from illegal addresses are
   marked as uninitialised, and those corresponding to legal addresses
   are handled in the normal way.

> valgrind would still treat the bytes beyond the allocation boundary as 
> undefined.  But I agree that false positives in this area are annoying.

Does anybody have an example program of the above issue compiled with
gcc that produces false positives with valgrind?

Thanks,

Mark
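
For reference, a minimal sketch of the flexible-array-member pattern this
sub-thread is about (hypothetical code, not a confirmed valgrind reproducer):

#include <stdlib.h>
#include <string.h>

struct msg
{
  size_t len;
  char data[];                /* flexible array member  */
};

struct msg *
make_msg (const char *s)
{
  size_t n = strlen (s);
  /* The size is not rounded up to _Alignof (max_align_t); whether an
     implementation may touch the would-be tail padding is exactly the
     point under discussion.  */
  struct msg *m = (struct msg *) malloc (sizeof *m + n);
  m->len = n;
  memcpy (m->data, s, n);
  return m;
}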


Re: [PATCH] Delete GCJ

2016-09-07 Thread Richard Earnshaw (lists)
On 06/09/16 22:17, Jeff Law wrote:
> On 09/06/2016 03:08 AM, Jakub Jelinek wrote:
>> On Tue, Sep 06, 2016 at 11:06:36AM +0200, Richard Biener wrote:
>>> On Mon, Sep 5, 2016 at 6:17 PM, Andrew Haley  wrote:
 On 05/09/16 17:15, Richard Biener wrote:
> On September 5, 2016 5:13:06 PM GMT+02:00, Andrew Haley
>  wrote:
>> As discussed.  I think I should ask a Global reviewer to approve this
>> one.  For obvious reasons I haven't included the diffs to the deleted
>> gcc/java and libjava directories.  The whole tree, post GCJ-deletion,
>> is at svn+ssh://gcc.gnu.org/svn/gcc/branches/gcj/gcj-deletion-branch
>> if anyone would like to try it.
>
> Isn't there also java specific C++ frontend parts?

 There certainly are, but deleting them without breaking anything else
 is going to be rather delicate.  I'm trying to do this one step at a
 time, rather cautiously.
>>>
>>> Ok, that sounds reasonable.
>>>
>>> You have my approval for this first part then.  Please wait until
>>> after the
>>> GNU Cauldron to allow other global reviewers to object.
>>
>> No objection from me.
> No objection from me either (I'm guessing that's not a surprise).
> 

Nor from me.

R.

> jeff



Re: [PATCH] Set -fprofile-update=atomic when -pthread is present

2016-09-07 Thread Martin Liška
On 08/18/2016 06:06 PM, Richard Biener wrote:
> On August 18, 2016 5:54:49 PM GMT+02:00, Jakub Jelinek  
> wrote:
>> On Thu, Aug 18, 2016 at 08:51:31AM -0700, Andi Kleen wrote:
 I'd prefer to make updates atomic in multi-threaded applications.
 The best proxy we have for that is -pthread.

 Is it slower, most definitely, but odds are we're giving folks
 garbage data otherwise, which in many ways is even worse.
>>>
>>> It will likely be catastrophically slower in some cases. 
>>>
>>> Catastrophically as in too slow to be usable.
>>>
>>> An atomic instruction is a lot more expensive than a single
>> increment. Also
>>> they sometimes are really slow depending on the state of the machine.
>>
>> Can't we just have thread-local copies of all the counters (perhaps
>> using
>> __thread pointer as base) and just atomically merge at thread
>> termination?
> 
> I suggested that as well but of course it'll have its own class of issues 
> (short lived threads, so we need to somehow re-use counters from terminated 
> threads, large number of threads and thus using too much memory for the 
> counters)
> 
> Richard.

Hello.

I've got the approach written on my TODO list; let's see whether it would be 
doable in a reasonable amount of time.

I've just finished some measurements to illustrate the slow-down of the 
-fprofile-update=atomic approach.
All numbers are: no profile, -fprofile-generate, -fprofile-generate 
-fprofile-update=atomic
c-ray benchmark (utilizing 8 threads, -O3): 1.7s, 15.5s, 38.1s
unrar (utilizing 8 threads, -O3): 3.6s, 11.6s, 38s
tramp3d (1 thread, -O3): 18.0s, 46.6s, 168s

So the slow-down is roughly 300% compared to -fprofile-generate. I don't have 
much experience with default option
selection, but these numbers can probably help.

Thoughts?
Martin

> 
>>  Jakub
> 
> 
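
A rough sketch of the per-thread counter idea discussed above (hypothetical
code for illustration; this is not what GCC emits today, and the names are
made up).  It trades the atomic on every increment for one atomic add per
counter at thread exit, at the price of the issues mentioned above
(short-lived threads, per-thread memory):

#include <stdint.h>
#include <pthread.h>

typedef int64_t gcov_type;                 /* assume a 64-bit target gcov type  */

static gcov_type shared_counter;           /* the counter kept in the profile data  */
static __thread gcov_type local_counter;   /* hot-path updates, no atomics  */
static pthread_key_t merge_key;

static void
merge_on_thread_exit (void *dummy)
{
  (void) dummy;
  __atomic_fetch_add (&shared_counter, local_counter, __ATOMIC_RELAXED);
}

static void
init_merge_key (void)                      /* run once, e.g. via pthread_once  */
{
  pthread_key_create (&merge_key, merge_on_thread_exit);
}

static inline void
count_edge (void)
{
  /* The key value only needs to be non-NULL for the destructor to fire.  */
  pthread_setspecific (merge_key, &local_counter);
  local_counter++;
}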



Re: [RFC][SSA] Iterator to visit SSA

2016-09-07 Thread Richard Biener
On Wed, Sep 7, 2016 at 2:21 AM, Kugan Vivekanandarajah
 wrote:
> Hi Richard,
>
> On 6 September 2016 at 19:08, Richard Biener  
> wrote:
>> On Tue, Sep 6, 2016 at 2:24 AM, Kugan Vivekanandarajah
>>  wrote:
>>> Hi Richard,
>>>
>>> On 5 September 2016 at 17:57, Richard Biener  
>>> wrote:
 On Mon, Sep 5, 2016 at 7:26 AM, Kugan Vivekanandarajah
  wrote:
> Hi All,
>
> While looking at gcc source, I noticed that we are iterating over SSA
> variable from 0 to num_ssa_names in some cases and 1 to num_ssa_names
> in others. It seems that variable 0 is always NULL TREE.

 Yeah, that's artificial because we don't assign SSA name version 0 (for 
 some
 unknown reason).

> But it can
> confuse people who are looking for the first time. Therefore It might
> be good to follow some consistent usage here.
>
> It might be also good to gave a FOR_EACH_SSAVAR iterator as we do in
> other case. Here is attempt to do this based on what is done in other
> places. Bootstrapped and regression tested on X86_64-linux-gnu with no
> new regressions. is this OK?

 First of all some bikeshedding - FOR_EACH_SSA_NAME would be better
 as SSAVAR might be confused with iterating over SSA_NAME_VAR.

 Then, if you add an iterator why leave the name == NULL handling to 
 consumers?
 That looks odd.

 Then, SSA names are just in a vector thus why not re-use the vector 
 iterator?

 That is,

 #define FOR_EACH_SSA_NAME (name) \
   for (unsigned i = 1; SSANAMES (cfun).iterate (i, ); ++i)

 would be equivalent to your patch?

 Please also don't add new iterators that implicitely use 'cfun' but always 
 use
 one that passes it as explicit arg.
>>>
>>> I think defining FOR_EACH_SSA_NAME with vector iterator is better. But
>>> we will not be able to skill NULL ssa_names with that.
>>
>> Why?  Can't you simply do
>>
>>   #define FOR_EACH_SSA_NAME (name) \
>> for (unsigned i = 1; SSANAMES (cfun).iterate (i, ); ++i) \
>>if (name)
>>
>> ?
>
> Indeed.  I missed the if at the end :(.  Here is an updated patch.
> Bootstrapped and regression tested on x86_64-linux-gnu with no new
> regressions.

Quoting myself:

> As I said
> I'd like 'cfun' to be explicit, thus
>
> #define FOR_EACH_SSA_NAME(I, VAR, FN) \
>  for (I = 1; SSANAMES (FN)->iterate ((I), &(VAR)); ++(I)) \
>   if (VAR)

the patch doesn't seem to contain that FN part.  Please also put declarations
of 'name' you need to add right before the FOR_EACH_SSA_NAME rather
than far away.
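
A minimal usage sketch of the FN-taking form (illustrative only, assuming
dump_file is open):

  unsigned i;
  tree name;
  FOR_EACH_SSA_NAME (i, name, cfun)
    print_generic_expr (dump_file, name, 0);  /* NULL slots are skipped by the macro.  */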

Thanks,
Richard.

> Thanks,
> Kugan
>>
>>> I also added
>>> index variable to the macro so that there want be any conflicts if the
>>> index variable "i" (or whatever) is also defined in the loop.
>>>
>>> Bootstrapped and regression tested on x86_64-linux-gnu with no new
>>> regressions. Is this OK for trunk?
>>>
>>> Thanks,
>>> Kugan
>>>
>>>
>>> gcc/ChangeLog:
>>>
>>> 2016-09-06  Kugan Vivekanandarajah  
>>>
>>> * tree-ssanames.h (FOR_EACH_SSA_NAME): New.
>>> * cfgexpand.c (update_alias_info_with_stack_vars): Use
>>> FOR_EACH_SSA_NAME to iterate over SSA variables.
>>> (pass_expand::execute): Likewise.
>>> * omp-simd-clone.c (ipa_simd_modify_function_body): Likewise.
>>> * tree-cfg.c (dump_function_to_file): Likewise.
>>> * tree-into-ssa.c (pass_build_ssa::execute): Likewise.
>>> (update_ssa): Likewise.
>>> * tree-ssa-alias.c (dump_alias_info): Likewise.
>>> * tree-ssa-ccp.c (ccp_finalize): Likewise.
>>> * tree-ssa-coalesce.c (build_ssa_conflict_graph): Likewise.
>>> (create_outofssa_var_map): Likewise.
>>> (coalesce_ssa_name): Likewise.
>>> * tree-ssa-operands.c (dump_immediate_uses): Likewise.
>>> * tree-ssa-pre.c (compute_avail): Likewise.
>>> * tree-ssa-sccvn.c (init_scc_vn): Likewise.
>>> (scc_vn_restore_ssa_info): Likewise.
>>> (free_scc_vn): Likwise.
>>> (run_scc_vn): Likewise.
>>> * tree-ssa-structalias.c (compute_points_to_sets): Likewise.
>>> * tree-ssa-ter.c (new_temp_expr_table): Likewise.
>>> * tree-ssa-copy.c (fini_copy_prop): Likewise.
>>> * tree-ssa.c (verify_ssa): Likewise.
>>>

 Thanks,
 Richard.


> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2016-09-05  Kugan Vivekanandarajah  
>
> * tree-ssanames.h (ssa_iterator::ssa_iterator): New.
> (ssa_iterator::get): Likewise.
> (ssa_iterator::next): Likewise.
> (FOR_EACH_SSAVAR): Likewise.
> * cfgexpand.c (update_alias_info_with_stack_vars): Use
> FOR_EACH_SSAVAR to iterate over SSA variables.
> (pass_expand::execute): Likewise.
> * omp-simd-clone.c (ipa_simd_modify_function_body): Likewise.

Re: [PATCH] Detect whether target can use -fprofile-update=atomic

2016-09-07 Thread Martin Liška
On 09/07/2016 09:45 AM, Christophe Lyon wrote:
> On 6 September 2016 at 15:45, Martin Liška  wrote:
>> On 09/06/2016 03:31 PM, Jakub Jelinek wrote:
>>> sizeof (gcov_type) talks about the host gcov type, you want instead the
>>> target gcov type.  So
>>> TYPE_SIZE (gcov_type_node) == 32 vs. 64 (or TYPE_SIZE_UNIT (gcov_type_node)
>>> == 4 vs. 8).
>>> As SImode and DImode are in fact 4*BITS_PER_UNIT and 8*BITS_PER_UNIT,
>>> TYPE_SIZE_UNIT comparisons for 4 and 8 are most natural.
>>> And I wouldn't add gcc_unreachable, just warn for weirdo arches always.
>>>
>>>   Jakub
>>
>> Thank you Jakub for helping me with that. I've used TYPE_SIZE_UNIT macro.
>>
>> Ready for trunk?
>> Martin
> 
> Hi Martin,
> 
> On targets which do not support atomic profile update, your patch generates a
> warning on gcc.dg/tree-prof/val-profiler-threads-1.c, making it fail.
> 
> Do we need a new effective-target ?
> 
> Christophe
> 

Hi.

Thanks for the observation; I'm sending a patch that does that.
Can you please test it?

Thanks,
Martin
From 9a68f2fbf2b5cb547aee7860926c846d5f15d398 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 7 Sep 2016 11:28:13 +0200
Subject: [PATCH] Add new effective target: profile_update_atomic

gcc/testsuite/ChangeLog:

2016-09-07  Martin Liska  

	* g++.dg/gcov/gcov-threads-1.C: Use profile_update_atomic
	effective target.
	* gcc.dg/tree-prof/val-profiler-threads-1.c: Likewise.
	* lib/target-supports.exp: Define the new target.
---
 gcc/testsuite/g++.dg/gcov/gcov-threads-1.C  | 1 +
 gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c | 2 ++
 gcc/testsuite/lib/target-supports.exp   | 7 +++
 3 files changed, 10 insertions(+)

diff --git a/gcc/testsuite/g++.dg/gcov/gcov-threads-1.C b/gcc/testsuite/g++.dg/gcov/gcov-threads-1.C
index a4a6f0a..cc9266a 100644
--- a/gcc/testsuite/g++.dg/gcov/gcov-threads-1.C
+++ b/gcc/testsuite/g++.dg/gcov/gcov-threads-1.C
@@ -1,5 +1,6 @@
 /* { dg-options "-fprofile-arcs -ftest-coverage -pthread -fprofile-update=atomic" } */
 /* { dg-do run { target native } } */
+/* { dg-require-effective-target profile_update_atomic } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c b/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c
index e9b04a0..95d6ee3 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/val-profiler-threads-1.c
@@ -1,4 +1,6 @@
 /* { dg-options "-O0 -pthread -fprofile-update=atomic" } */
+/* { dg-require-effective-target profile_update_atomic } */
+
 #include 
 
 #define NUM_THREADS	8
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 545b3dc..6724a7f 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7699,3 +7699,10 @@ proc check_effective_target_offload_hsa { } {
 	int main () {return 0;}
 } "-foffload=hsa" ]
 }
+
+# Return 1 if the target support -fprofile-update=atomic
+proc check_effective_target_profile_update_atomic {} {
+return [check_no_compiler_messages profile_update_atomic assembly {
+	int main (void) { return 0; }
+} "-fprofile-update=atomic -fprofile-generate"]
+}
-- 
2.9.2



Re: [PATCH, PING] DWARF: process all TYPE_DECL nodes when iterating on scopes

2016-09-07 Thread Richard Biener
On Tue, Sep 6, 2016 at 10:43 AM, Pierre-Marie de Rodat
 wrote:
> Hello,
>
> On 08/29/2016 11:03 AM, Pierre-Marie de Rodat wrote:
>>
>> Here is another attempt to solve the original issue. This time, with a
>> proper testcase. ;-)
>>
>> Rebased against trunk, boostrapped and regtested on x86_64-linux.
>
>
> Ping for the patch submitted at
> . Thank you in
> advance!

Ok, had time to look at this issue again.  I see the patch works like dwarf2out
works currently with respect to DIE creation order and re-location.

-   /* Output a DIE to represent the typedef itself.  */
-   gen_typedef_die (decl, context_die);
+   {
+ /* Output a DIE to represent the typedef itself.  */
+ gen_typedef_die (decl, context_die);
+
+ /* The above may create a typedef in the proper scope, but the
+underlying type itself could have been created earlier, at a point
+when the scope was not available yet.  If it's the case, relocate
+it.  This is analogous to what is done in process_scope_var,
+except we deal with a TYPE and not a DECL, here.  */
+ dw_die_ref type_die = lookup_type_die (TREE_TYPE (decl));
+ if (type_die != NULL && type_die->die_parent == NULL
+ && DECL_CONTEXT (decl) == TYPE_CONTEXT (TREE_TYPE (decl)))
+   add_child_die (context_die, type_die);
+   }

this might be incomplete though for the case where it's, say,

 typedef const T X;

thus the type of decl is a qualified type?  In this case the qualification might
be at the correct scope but the actual type not, or you just relocate the
qualification but not the type DIE it refers to?

That said, with the idea of early debug in place and thus giving more
responsibility to the frontends, I wonder in what order the Ada FE calls
debug_hooks.early_global_decl ()?
Maybe it's the middle-end and it should arrange for a more natural
order on function nests.  So I'd appreciate it if you could investigate this
side of the issue a bit, that is, simply avoid the bad ordering.

Richard.

> --
> Pierre-Marie de Rodat


Re: [PATCH, c++, PR77427 ] Set TYPE_STRUCTURAL_EQUALITY for sysv_abi va_list

2016-09-07 Thread Richard Biener
On Mon, Sep 5, 2016 at 6:11 PM, Tom de Vries  wrote:
> On 05/09/16 09:49, Richard Biener wrote:
>>
>> On Sun, Sep 4, 2016 at 11:30 PM, Tom de Vries 
>> wrote:
>>>
>>> > On 04/09/16 16:08, Richard Biener wrote:

 >>
 >> On September 4, 2016 12:33:02 PM GMT+02:00, Tom de Vries
 >>  wrote:
>
> >>>
> >>> On 04/09/16 08:12, Richard Biener wrote:
>>
>> 
>>  On September 3, 2016 5:23:35 PM GMT+02:00, Tom de Vries
>
> >>>
> >>>  wrote:

 >>
 >> Hi,
 >>
 >> this patch fixes a c++ ICE, a p1 6/7 regression.
 >>
 >>
 >> Consider test.C:
 >> ...
 >> void bar (__builtin_va_list &);
 >>
 >> struct c
 >> {
 >>   operator const __builtin_va_list &();
 >>   operator __builtin_va_list &();
 >> };
 >>
 >> void
 >> foo (void) {
 >>   struct c c1;
 >>
 >>   bar (c1);
 >> }
 >> ...
 >>
 >> The compiler ICEs as follows:
 >> ...
 >> test.C: In function ‘void foo()’:
 >> test.C:13:10: internal compiler error: canonical types differ
 >> for
 >> identical types __va_list_tag [1] and __va_list_tag [1]
 >>bar (c1);
 >>   ^
 >> comptypes(tree_node*, tree_node*, int)
 >> src/gcc/cp/typeck.c:1430
 >> reference_related_p(tree_node*, tree_node*)
 >> src/gcc/cp/call.c:1415
 >> reference_binding
 >> src/gcc/cp/call.c:1559
 >> implicit_conversion
 >> src/gcc/cp/call.c:1805
 >> build_user_type_conversion_1
 >> src/gcc/cp/call.c:3776
 >> reference_binding
 >> src/gcc/cp/call.c:1664
 >> implicit_conversion
 >> src/gcc/cp/call.c:1805
 >> add_function_candidate
 >> src/gcc/cp/call.c:2141
 >> add_candidates
 >> src/gcc/cp/call.c:5394
 >> perform_overload_resolution
 >> src/gcc/cp/call.c:4066
 >> build_new_function_call(tree_node*, vec<tree_node*, va_gc, vl_embed>**,
 >>   bool, int)
 >> src/gcc/cp/call.c:4143
 >> finish_call_expr(tree_node*, vec<tree_node*, va_gc, vl_embed>**, bool,
 >>   bool, int)
 >> src/gcc/cp/semantics.c:2440
 >> ...
 >>
 >> The regression is caused by the commit for PR70955, that adds
 >> a
 >> "sysv_abi va_list" attribute to the struct in the va_list
 >> type
>
> >>>
> >>> (which

 >>
 >> is
 >> an array of one of struct).
 >>
 >> The ICE in comptypes happens as follows: we're comparing two
>
> >>>
> >>> versions

 >>
 >> of
 >> va_list type (with identical array element type), each with
 >> the
 >> canonical type set to themselves. Since the types are
 >> considered
 >> identical, they're supposed to have identical canonical
 >> types,
>
> >>>
> >>> which is
> >>>
>>
>>  Did you figure out why they are not assigned the same canonical
>>  type?
>
> >>>
> >>>
> >>> When constructing the first type in ix86_build_builtin_va_list_64,
> >>> it's
> >>>
> >>> cached.
> >>>
> >>> When constructing the second type in build_array_type_1 (with call
> >>> stack: grokdeclarator -> cp_build_qualified_type_real ->
> >>> build_cplus_array_type -> build_cplus_array_type ->
> >>> build_array_type ->
> >>>
> >>> build_array_type_1), we call type_hash_canon.
> >>>
> >>> But the cached type has name __builtin_sysv_va_list, and the new
> >>> type
> >>> has no name, so we hit the clause 'TYPE_NAME (a->type) != TYPE_NAME
> >>> (b->type)' in type_cache_hasher::equal.
> >>>
> >>> Consequently, TYPE_CANONICAL for the new type remain set to self.

 >>
 >>
 >> But how did it then work before the patch causing this?
>>>
>>> >
>>> >
>>> > Before the patch that introduced the attribute, rather than assigning
>>> > the
>>> > result of ix86_build_builtin_va_list_64 directly
>>> > sysv_va_list_type_node, an
>>> > intermediate build_variant_type_copy was used.
>>> >
>>> > This had as consequence 

Re: Make max_align_t respect _Float128 [version 2]

2016-09-07 Thread Florian Weimer

On 09/06/2016 11:31 PM, Paul Eggert wrote:

On 09/06/2016 01:40 PM, Joseph Myers wrote:

Sounds like a defect in C11 to me - none of the examples of flexible
array
members anticipate needing to add to the size to allow for tail padding
with unknown alignment requirements.


Yes, I would prefer calling it a defect, as most code I've seen dealing
with flexible array members does not align the tail size. However, GCC +
valgrind does take advantage of this "defect" and I would not be
surprised if other picky implementations do too.


It might be an inherent limitation of the valgrind approach. 
Speculative loads which cannot result in data races (in the C11 sense) 
due to the way the architecture behaves should be fine.  The alignment 
ensures that the load stays on the same page; a potential page crossing is 
what typically prevents this optimization.
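
A small sketch of the kind of access in question (illustrative only,
assuming 8-byte words and page-granular memory protection):

  #include <stdint.h>

  /* An aligned 8-byte load cannot straddle a page boundary, so loading
     the whole word that contains a valid byte never faults, even if some
     bytes of that word lie past the end of the object.  (Sketch only;
     real code would also have to deal with aliasing rules.)  */
  uint64_t
  word_containing (const unsigned char *p)
  {
    const uint64_t *w = (const uint64_t *) ((uintptr_t) p & ~(uintptr_t) 7);
    return *w;
  }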


Some implementation techniques for C string functions result in the same 
behavior.  valgrind intercepts them or suppresses errors there, but 
that's not possible for code that GCC emits inline, obviously.


valgrind would still treat the bytes beyond the allocation boundary as 
undefined.  But I agree that false positives in this area are annoying.


Florian



Re: Make max_align_t respect _Float128 [version 2]

2016-09-07 Thread Florian Weimer

On 09/06/2016 10:40 PM, Joseph Myers wrote:

On Tue, 6 Sep 2016, Paul Eggert wrote:


One way to correct the code is to increase malloc's argument up to a multiple
of alignof(max_align_t). (One cannot portably use alignof(struct s) due to


Sounds like a defect in C11 to me - none of the examples of flexible array
members anticipate needing to add to the size to allow for tail padding
with unknown alignment requirements.


I agree, this is a defect in C99 and C11.  The language hasn't changed 
since C99, and C99 has the same issue because it's unrelated to 
alignment specifiers.  It's a confusion between struct sizes (which are 
multiples of the struct alignment) and object sizes (which are not 
necessarily so).
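
For illustration, the correction Paul describes could look roughly like
this (a sketch; alloc_s is a made-up helper, and the rounding to
_Alignof (max_align_t) is exactly the point under discussion):

  #include <stddef.h>
  #include <stdlib.h>

  struct s
  {
    size_t len;
    char data[];		/* flexible array member */
  };

  static struct s *
  alloc_s (size_t n)
  {
    size_t size = offsetof (struct s, data) + n;
    size_t a = _Alignof (max_align_t);

    /* Round up so that the tail padding implied by the struct's
       alignment is actually part of the allocation.  */
    size = (size + a - 1) / a * a;
    struct s *p = malloc (size);
    if (p)
      p->len = n;
    return p;
  }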


I have reopened PR1 with a more elaborate test case which shows that 
GCC packs objects more tightly than the struct alignment would permit.


Thanks,
Florian


Re: [PATCH][v3] GIMPLE store merging pass

2016-09-07 Thread Richard Biener
On Tue, 6 Sep 2016, Jakub Jelinek wrote:

> On Tue, Sep 06, 2016 at 04:59:23PM +0100, Kyrill Tkachov wrote:
> > On 06/09/16 16:32, Jakub Jelinek wrote:
> > >On Tue, Sep 06, 2016 at 04:14:47PM +0100, Kyrill Tkachov wrote:
> > >>The v3 of this patch addresses feedback I received on the version posted 
> > >>at [1].
> > >>The merged store buffer is now represented as a char array that we splat 
> > >>values onto with
> > >>native_encode_expr and native_interpret_expr. This allows us to merge 
> > >>anything that native_encode_expr
> > >>accepts, including floating point values and short vectors. So this 
> > >>version extends the functionality
> > >>of the previous one in that it handles floating point values as well.
> > >>
> > >>The first phase of the algorithm that detects the contiguous stores is 
> > >>also slightly refactored according
> > >>to feedback to read more fluently.
> > >>
> > >>Richi, I experimented with merging up to MOVE_MAX bytes rather than word 
> > >>size but I got worse results on aarch64.
> > >>MOVE_MAX there is 16 (because it has load/store register pair 
> > >>instructions) but the 128-bit immediates that we ended
> > >>synthesising were too complex. Perhaps the TImode immediate store RTL 
> > >>expansions could be improved, but for now
> > >>I've left the maximum merge size to be BITS_PER_WORD.
> > >At least from playing with this kind of things in the RTL PR22141 patch,
> > >I remember storing 64-bit constants on x86_64 compared to storing 2 32-bit
> > >constants usually isn't a win (not just for speed optimized blocks but 
> > >also for
> > >-Os).  For 64-bit store if the constant isn't signed 32-bit or unsigned
> > >32-bit you need movabsq into some temporary register which has like 3 
> > >times worse
> > >latency than normal store if I remember well, and then store it.
> > 
> > We could restrict the maximum width of the stores generated to 32 bits on 
> > x86_64.
> > I think this would need another parameter or target macro for the target to 
> > set.
> > Alternatively, is it a possibility for x86 to be a bit smarter in its 
> > DImode mov-immediate
> > expansion? For example break up the 64-bit movabsq immediate into two 
> > SImode immediates?
> 
> If you want a 64-bit store, you'd need to merge the two, and that would be
> even more expensive.  It is a matter of say:
>   movl $0x12345678, (%rsp)
>   movl $0x09abcdef, 4(%rsp)
> vs.
>   movabsq $0x09abcdef12345678, %rax
>   movq %rax, (%rsp)
> vs.
>   movl $0x09abcdef, %eax
>   salq $32, %rax
>   orq $0x12345678, %rax
>   movq %rax, (%rsp)

vs.

movq $LC0, (%rsp)

?

> etc.  Guess it needs to be benchmarked on contemporary CPUs, I'm pretty sure
> the last sequence is the worst one.

I think the important part to notice is that it should be straightforward
for a target / the expander to split a large store from an immediate
into any of the above but very hard to do the opposite.  Thus from a
GIMPLE side "canonicalizing" to large stores (that are eventually
supported and well-aligned) seems best to me.
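
As a concrete illustration (not from the patch) of what such
canonicalization acts on:

  struct s { char a, b, c, d; };

  void
  init (struct s *p)
  {
    /* Four adjacent byte stores of constants; merged, they become one
       aligned 32-bit store of 0x04030201 (little endian), which the
       expander can split again if the immediate turns out too costly.  */
    p->a = 1;
    p->b = 2;
    p->c = 3;
    p->d = 4;
  }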
 
> > >What alias set is used for the accesses if there are different alias sets
> > >involved in between the merged stores?
> > 
> > As per https://gcc.gnu.org/ml/gcc/2016-06/msg00162.html the type used in 
> > those cases
> > would be ptr_type_node. See the get_type_for_merged_store function in the 
> > patch.
> 
> Richi knows this best.  I just wonder if e.g. all the stores go into fields
> of the same structure it wouldn't be better if you need to punt use that
> structure as the type rather than alias set 0.

Well, yes - if the IL always accesses a common handled component you
can use the alias set of that component.  But it's some work to do
this correctly as you can have MEM[&s + 4] = 1; which stores an 'int'
to a struct with two floats.   So you do have to be careful, also when
merging overlapping stores (in which case you'll certainly have an
irregular situation).

Which is why I suggested the "easy" approach above which should handle
a lot of cases well already.

> > I'm aware of that. The patch already has logic to avoid emitting unaligned 
> > accesses
> > for SLOW_UNALIGNED_ACCESS targets. Beyond that the patch introduces the 
> > parameter
> > PARAM_STORE_MERGING_ALLOW_UNALIGNED that can be used by the user or target 
> > to
> > forbid generation of unaligned stores by the pass altogether. Beyond that 
> > I'm not sure
> > how to behave more intelligently here. Any ideas?
> 
> Dunno, the heuristics was the main problem with my patch.  Generally, I'd
> say there is a difference between cold and hot blocks, in cold ones perhaps
> unaligned stores are more appropriate (if supported at all and not way too
> slow), while in hot ones less desirable.

Note that I repeatedly argue that if we can canonicalize sth to "larger"
then even if unaligned, the expander should be able to produce optimal
code again (it might not do, of course).

Hope to look at the patch in detail soon.


Re: [PATCH] Fix PR77450

2016-09-07 Thread Richard Biener
On Wed, 7 Sep 2016, Yvan Roux wrote:

> Hi Richard,
> 
> On 6 September 2016 at 14:41, Richard Biener  wrote:
> >
> > The following fixes PR77450.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> >
> > Richard.
> >
> > 2016-09-06  Richard Biener  
> >
> > PR c/77450
> > c-family/
> > * c-common.c (c_common_mark_addressable_vec): Handle
> > COMPOUND_LITERAL_EXPR.
> >
> > * c-c++-common/vector-subscript-7.c: Adjust.
> > * c-c++-common/vector-subscript-8.c: New testcase.
> 
> This new testcase fails in our validation (ARM and x86 targets):
> 
> gcc/testsuite/c-c++-common/vector-subscript-8.c:8:17: error: lvalue
> required as left operand of assignment
> compiler exited with status 1
> FAIL: c-c++-common/vector-subscript-8.c  -std=c++98 (test for excess errors)

Ah, seems my dev tree eats the NON_LVALUE_EXPR we build around it.

So it looks like the testcase is invalid for C++, will move it.
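
For illustration, one way to keep the test while avoiding the C++98 error
would be to restrict it to the C front end -- a guess at what the move
could look like, not necessarily what gets committed:

  /* { dg-do compile { target c } } */

  typedef int V __attribute__((vector_size(4)));

  void
  foo (void)
  {
    (V){ 0 }[0] = 0;
  }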

Richard.

> 
> > Index: gcc/c-family/c-common.c
> > ===
> > --- gcc/c-family/c-common.c (revision 240004)
> > +++ gcc/c-family/c-common.c (working copy)
> > @@ -10918,7 +10918,9 @@ c_common_mark_addressable_vec (tree t)
> >  {
> >while (handled_component_p (t))
> >  t = TREE_OPERAND (t, 0);
> > -  if (!VAR_P (t) && TREE_CODE (t) != PARM_DECL)
> > +  if (!VAR_P (t)
> > +  && TREE_CODE (t) != PARM_DECL
> > +  && TREE_CODE (t) != COMPOUND_LITERAL_EXPR)
> >  return;
> >TREE_ADDRESSABLE (t) = 1;
> >  }
> > Index: gcc/testsuite/c-c++-common/vector-subscript-7.c
> > ===
> > --- gcc/testsuite/c-c++-common/vector-subscript-7.c (revision 240004)
> > +++ gcc/testsuite/c-c++-common/vector-subscript-7.c (working copy)
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-O -fdump-tree-ccp1" } */
> > +/* { dg-options "-O -fdump-tree-fre1" } */
> >
> >  typedef int v4si __attribute__ ((vector_size (16)));
> >
> > @@ -11,4 +11,4 @@ main (int argc, char** argv)
> >return ((v4si){1, 2, 42, 0})[j];
> >  }
> >
> > -/* { dg-final { scan-tree-dump "return 42;" "ccp1" } } */
> > +/* { dg-final { scan-tree-dump "return 42;" "fre1" } } */
> > Index: gcc/testsuite/c-c++-common/vector-subscript-8.c
> > ===
> > --- gcc/testsuite/c-c++-common/vector-subscript-8.c (revision 0)
> > +++ gcc/testsuite/c-c++-common/vector-subscript-8.c (working copy)
> > @@ -0,0 +1,9 @@
> > +/* { dg-do compile } */
> > +
> > +typedef int V __attribute__((vector_size(4)));
> > +
> > +void
> > +foo(void)
> > +{
> > +  (V){ 0 }[0] = 0;
> > +}
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [RFC][SSA] Iterator to visit SSA

2016-09-07 Thread Dominik Inführ
I am not sure about the process, but it may also be nice/useful to add your new 
macro to ForEachMacros in contrib/clang-format.

Dominik

> On 07 Sep 2016, at 02:21, Kugan Vivekanandarajah 
>  wrote:
> 
> Hi Richard,
> 
> On 6 September 2016 at 19:08, Richard Biener  
> wrote:
>> On Tue, Sep 6, 2016 at 2:24 AM, Kugan Vivekanandarajah
>>  wrote:
>>> Hi Richard,
>>> 
>>> On 5 September 2016 at 17:57, Richard Biener  
>>> wrote:
 On Mon, Sep 5, 2016 at 7:26 AM, Kugan Vivekanandarajah
  wrote:
> Hi All,
> 
> While looking at gcc source, I noticed that we are iterating over SSA
> variable from 0 to num_ssa_names in some cases and 1 to num_ssa_names
> in others. It seems that variable 0 is always NULL TREE.
 
 Yeah, that's artificial because we don't assign SSA name version 0 (for 
 some
 unknown reason).
 
> But it can
> confuse people who are looking for the first time. Therefore It might
> be good to follow some consistent usage here.
> 
> It might be also good to gave a FOR_EACH_SSAVAR iterator as we do in
> other case. Here is attempt to do this based on what is done in other
> places. Bootstrapped and regression tested on X86_64-linux-gnu with no
> new regressions. is this OK?
 
 First of all some bikeshedding - FOR_EACH_SSA_NAME would be better
 as SSAVAR might be confused with iterating over SSA_NAME_VAR.
 
 Then, if you add an iterator why leave the name == NULL handling to 
 consumers?
 That looks odd.
 
 Then, SSA names are just in a vector thus why not re-use the vector 
 iterator?
 
 That is,
 
 #define FOR_EACH_SSA_NAME (name) \
  for (unsigned i = 1; SSANAMES (cfun).iterate (i, &name); ++i)
 
 would be equivalent to your patch?
 
 Please also don't add new iterators that implicitely use 'cfun' but always 
 use
 one that passes it as explicit arg.
>>> 
>>> I think defining FOR_EACH_SSA_NAME with vector iterator is better. But
>>> we will not be able to skip NULL ssa_names with that.
>> 
>> Why?  Can't you simply do
>> 
>>  #define FOR_EACH_SSA_NAME (name) \
>>    for (unsigned i = 1; SSANAMES (cfun).iterate (i, &name); ++i) \
>>   if (name)
>> 
>> ?
> 
> Indeed.  I missed the if at the end :(.  Here is an updated patch.
> Bootstrapped and regression tested on x86_64-linux-gnu with no new
> regressions.
> 
> Thanks,
> Kugan
>> 
>>> I also added
>>> index variable to the macro so that there want be any conflicts if the
>>> index variable "i" (or whatever) is also defined in the loop.
>>> 
>>> Bootstrapped and regression tested on x86_64-linux-gnu with no new
>>> regressions. Is this OK for trunk?
>>> 
>>> Thanks,
>>> Kugan
>>> 
>>> 
>>> gcc/ChangeLog:
>>> 
>>> 2016-09-06  Kugan Vivekanandarajah  
>>> 
>>>* tree-ssanames.h (FOR_EACH_SSA_NAME): New.
>>>* cfgexpand.c (update_alias_info_with_stack_vars): Use
>>>FOR_EACH_SSA_NAME to iterate over SSA variables.
>>>(pass_expand::execute): Likewise.
>>>* omp-simd-clone.c (ipa_simd_modify_function_body): Likewise.
>>>* tree-cfg.c (dump_function_to_file): Likewise.
>>>* tree-into-ssa.c (pass_build_ssa::execute): Likewise.
>>>(update_ssa): Likewise.
>>>* tree-ssa-alias.c (dump_alias_info): Likewise.
>>>* tree-ssa-ccp.c (ccp_finalize): Likewise.
>>>* tree-ssa-coalesce.c (build_ssa_conflict_graph): Likewise.
>>>(create_outofssa_var_map): Likewise.
>>>(coalesce_ssa_name): Likewise.
>>>* tree-ssa-operands.c (dump_immediate_uses): Likewise.
>>>* tree-ssa-pre.c (compute_avail): Likewise.
>>>* tree-ssa-sccvn.c (init_scc_vn): Likewise.
>>>(scc_vn_restore_ssa_info): Likewise.
>>>(free_scc_vn): Likewise.
>>>(run_scc_vn): Likewise.
>>>* tree-ssa-structalias.c (compute_points_to_sets): Likewise.
>>>* tree-ssa-ter.c (new_temp_expr_table): Likewise.
>>>* tree-ssa-copy.c (fini_copy_prop): Likewise.
>>>* tree-ssa.c (verify_ssa): Likewise.
>>> 
 
 Thanks,
 Richard.
 
 
> Thanks,
> Kugan
> 
> 
> gcc/ChangeLog:
> 
> 2016-09-05  Kugan Vivekanandarajah  
> 
>* tree-ssanames.h (ssa_iterator::ssa_iterator): New.
>(ssa_iterator::get): Likewise.
>(ssa_iterator::next): Likewise.
>(FOR_EACH_SSAVAR): Likewise.
>* cfgexpand.c (update_alias_info_with_stack_vars): Use
>FOR_EACH_SSAVAR to iterate over SSA variables.
>(pass_expand::execute): Likewise.
>* omp-simd-clone.c (ipa_simd_modify_function_body): Likewise.
>* tree-cfg.c (dump_function_to_file): Likewise.
>* tree-into-ssa.c (pass_build_ssa::execute): Likewise.
>(update_ssa): Likewise.
>* tree-ssa-alias.c (dump_alias_info): Likewise.
>* 

Re: [PATCH GCC 9/9]Prove no-overflow in computation of LOOP_VINFO_NITERS and improve code generation

2016-09-07 Thread Bin.Cheng
On Wed, Sep 7, 2016 at 1:10 AM, kugan  wrote:
> Hi Bin,
>
>
> On 07/09/16 04:54, Bin Cheng wrote:
>>
>> Hi,
>> LOOP_VINFO_NITERS is computed as LOOP_VINFO_NITERSM1 + 1, which could
>> overflow in loop niters' type.  Vectorizer needs to generate more code
>> computing vectorized niters if overflow does happen.  However, For common
>> loops, there is no overflow actually, this patch tries to prove the
>> no-overflow information and use that to improve code generation.  At the
>> moment, no-overflow information comes either from loop niter analysis, or
>> the truth that we know loop is peeled for non-zero iterations in prologue
>> peeling.  For the latter case, it doesn't matter if the original
>> LOOP_VINFO_NITERS overflows or not, because computation LOOP_VINFO_NITERS -
>> LOOP_VINFO_PEELING_FOR_ALIGNMENT cancels the overflow by underflow.
>>
>> Thanks,
>> bin
>>
>> 2016-09-01  Bin Cheng  
>>
>> * tree-vect-loop.c (loop_niters_no_overflow): New func.
>> (vect_transform_loop): Call loop_niters_no_overflow.  Pass the
>> no-overflow information to vect_do_peeling_for_loop_bound and
>> vect_gen_vector_loop_niters.
>>
>>
>> 009-prove-no_overflow-for-vect-niters-20160902.txt
>>
>>
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index 0d37f55..2ef1f9b 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -6610,6 +6610,38 @@ vect_loop_kill_debug_uses (struct loop *loop,
>> gimple *stmt)
>>  }
>>  }
>>
>> +/* Given loop represented by LOOP_VINFO, return true if computation of
>> +   LOOP_VINFO_NITERS (= LOOP_VINFO_NITERSM1 + 1) doesn't overflow, false
>> +   otherwise.  */
>> +
>> +static bool
>> +loop_niters_no_overflow (loop_vec_info loop_vinfo)
>> +{
>> +  /* Constant case.  */
>> +  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>> +{
>> +  int cst_niters = LOOP_VINFO_INT_NITERS (loop_vinfo);
>
>
> Wouldn't it truncate by assigning this to int?
Probably.  Now I think it's unnecessary to use the int version of niters
here; LOOP_VINFO_NITERS can be used directly.
>
>
>> +  tree cst_nitersm1 = LOOP_VINFO_NITERSM1 (loop_vinfo);
>> +
>> +  gcc_assert (TREE_CODE (cst_nitersm1) == INTEGER_CST);
>> +  if (wi::to_widest (cst_nitersm1) < cst_niters)
>
>
> Shouldn't you do the addition and comparison in the type of the loop
> index instead of widest_int to see if that overflows?
You mean the type of loop niters?  NITERS is computed from NITERSM1 +
1, I don't think we need to do it again here.
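
A minimal illustration (not from the patch) of how NITERSM1 + 1 can wrap
in the niter type, which is the overflow being discussed:

  void
  f (unsigned char *p)
  {
    unsigned int i = 0;
    /* The latch runs for i = 0 .. 0xffffffff, so NITERSM1 is 0xffffffff
       in the 32-bit niter type and NITERSM1 + 1 wraps to 0 unless the
       sum is computed in a wider type.  */
    do
      p[i] = 0;
    while (++i != 0);
  }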

Thanks,
bin


Re: [PATCH] Detect whether target can use -fprofile-update=atomic

2016-09-07 Thread Christophe Lyon
On 6 September 2016 at 15:45, Martin Liška  wrote:
> On 09/06/2016 03:31 PM, Jakub Jelinek wrote:
>> sizeof (gcov_type) talks about the host gcov type, you want instead the
>> target gcov type.  So
>> TYPE_SIZE (gcov_type_node) == 32 vs. 64 (or TYPE_SIZE_UNIT (gcov_type_node)
>> == 4 vs. 8).
>> As SImode and DImode are in fact 4*BITS_PER_UNIT and 8*BITS_PER_UNIT,
>> TYPE_SIZE_UNIT comparisons for 4 and 8 are most natural.
>> And I wouldn't add gcc_unreachable, just warn for weirdo arches always.
>>
>>   Jakub
>
> Thank you Jakub for helping me with that. I've used TYPE_SIZE_UNIT macro.
>
> Ready for trunk?
> Martin

Hi Martin,

On targets which do not support atomic profile update, your patch generates a
warning on gcc.dg/tree-prof/val-profiler-threads-1.c, making it fail.

Do we need a new effective-target ?

Christophe


Re: Make max_align_t respect _Float128 [version 2]

2016-09-07 Thread Florian Weimer

On 09/06/2016 11:23 AM, Richard Biener wrote:


The question then is, how can we make max_align_t more useful?  If it is
supposed to reflect a malloc property, it cannot be a compile-time constant
because we don't know at compile time which malloc is going to be used
(malloc has to remain interposable).  If there is no relationship to malloc,
GCC essentially supports unbounded alignment on many targets.  How is it
possible to have a meaningful definition of max_align_t under such
circumstances?


I thought max_align_t was to replace uses like

  union {
long long x;
double y;
...
char buf[N];
  };

to get statically allocated "max" aligned memory.  That is, you now know
_which_ type you need to put into the union.


Yes, that's the way I have used it too.  But it's difficult to use such 
an idiom and make it valid C11 (you basically have to use memcpy to 
access the buffer, direct access is not permitted because the dynamic 
type is char).
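
A small C11 sketch of that pattern (illustrative; the memcpy is what keeps
the access within the effective-type rules):

  #include <stddef.h>
  #include <string.h>

  static union
  {
    char buf[64];
    max_align_t align;	/* requests the "max" fundamental alignment */
  } storage;

  double
  read_back (void)
  {
    double d;
    /* Reading via *(double *) storage.buf would rely on the dynamic type;
       memcpy avoids that.  */
    memcpy (&d, storage.buf, sizeof d);
    return d;
  }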


But it's unclear what this is trying to achieve.  Ignoring the aliasing 
issue, you can store any object there if you haven't used an _Alignas 
specification.  If you used an _Alignas specification for the object's 
type, you may have to manually align the pointer if the alignment is 
larger than that of max_align_t.


The existence of such a cut-off constant appears useful, but it's not 
clear if it should be tied to the fundamental alignment (especially, as 
this discussion shows, the fundamental alignment will be somewhat in flux).


Florian


Re: [PATCH][RFC] Early LTO debug

2016-09-07 Thread Richard Biener
On Tue, 6 Sep 2016, Markus Trippelsdorf wrote:

> On 2016.09.06 at 14:26 +0200, Richard Biener wrote:
> > On Tue, 6 Sep 2016, Markus Trippelsdorf wrote:
> > 
> > > On 2016.09.06 at 13:17 +0200, Richard Biener wrote:
> > > > 
> > > > The following is an updated patch, mainly stripped out old unnecessary
> > > > cruft and some fixes for an issue that arised when LTOing Firefox.
> > > 
> > > One minor issue that I've noticed is that the patch generates a lot of
> > > empty *debugobj* files in $TMPDIR, e.g.:
> > > 
 % echo 'int main(){}' | g++ -flto -o /dev/null -x c++ -
> > >  % ls -l $TMPDIR
> > > total 0
> > > -rw---. 1 trippels trippels 0 Sep  6 12:11 ccenD5Tcdebugobj
> > > -rw---. 1 trippels trippels 0 Sep  6 12:11 ccXzvE4udebugobjtem
> > 
> > Ah, make_temp_file actually _creates_ the files already...
> > 
> > Fixed with the patch below.
> 
> Thanks, it takes care of the debugobjtems, but still creates bogus
> debugobjs.

Ah, yeah.  The following incremental patch fixes that as well.

Richard.

Index: early-lto-debug/gcc/lto-wrapper.c
===
--- early-lto-debug.orig/gcc/lto-wrapper.c  2016-09-07 09:12:25.339333496 +0200
+++ early-lto-debug/gcc/lto-wrapper.c   2016-09-07 09:09:19.017138725 +0200
@@ -1435,17 +1435,24 @@ cont1:
}
 }
   else
-skip_debug = true;
+{
+  unlink_if_ordinary (debug_obj);
+  free (debug_obj);
+  debug_obj = NULL;
+  skip_debug = true;
+}
 
   if (lto_mode == LTO_MODE_LTO)
 {
   printf ("%s\n", flto_out);
   if (!skip_debug)
-   printf ("%s\n", debug_obj);
+   {
+ printf ("%s\n", debug_obj);
+ free (debug_obj);
+ debug_obj = NULL;
+   }
   free (flto_out);
   flto_out = NULL;
-  free (debug_obj);
-  debug_obj = NULL;
 }
   else
 {
@@ -1593,9 +1600,11 @@ cont:
maybe_unlink (input_names[i]);
}
   if (!skip_debug)
-   printf ("%s\n", debug_obj);
-  free (debug_obj);
-  debug_obj = NULL;
+   {
+ printf ("%s\n", debug_obj);
+ free (debug_obj);
+ debug_obj = NULL;
+   }
   for (i = 0; i < nr; ++i)
{
  fputs (output_names[i], stdout);


Re: [PATCH] -fsanitize=thread fixes (PR sanitizer/68260)

2016-09-07 Thread Richard Biener
On Tue, 6 Sep 2016, Jakub Jelinek wrote:

> Hi!
> 
> As the PRs show, -fsanitize=thread hasn't been instrumenting __atomic_clear
> and __atomic_test_and_set builtins (i.e. those that always operate on bool),
> so we reported a data race on the testcase even when there wasn't one.
> And, as the other testcase shows, there were various ICEs with
> -fnon-call-exceptions -fsanitize=thread.
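
For concreteness, a sketch of the kind of code this is about (illustrative
only, not the testcase from the patch): a flag-based lock built from these
builtins, where the flag-protected access used to be reported as a race:

  static char lock;
  static int counter;

  void *
  worker (void *arg)	/* started once per thread, e.g. via pthread_create */
  {
    while (__atomic_test_and_set (&lock, __ATOMIC_ACQUIRE))
      ;				/* spin until the flag is clear */
    ++counter;			/* protected by the flag, not a data race */
    __atomic_clear (&lock, __ATOMIC_RELEASE);
    return arg;
  }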
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
> 
> 2016-09-06  Jakub Jelinek  
> 
>   PR sanitizer/68260
>   * tsan.c: Include target.h and tree-eh.h.
>   (enum tsan_atomic_action): Add bool_clear and bool_test_and_set.
>   (BOOL_CLEAR, BOOL_TEST_AND_SET): Define.
>   (tsan_atomic_table): Add BUILT_IN_ATOMIC_CLEAR and
>   BUILT_IN_ATOMIC_TEST_AND_SET entries.
>   (instrument_builtin_call): Remove EH edges of replaced call stmts.
>   Handle bool_clear and bool_test_and_set.
>   (instrument_memory_accesses): Add RET argument, purge dead eh edges
>   at the end of bbs and set *RET in that case to TODO_cleanup_cfg.
>   (tsan_pass): Adjust instrument_memory_accesses caller, return ret.
> 
>   * c-c++-common/tsan/pr68260.c: New test.
>   * g++.dg/tsan/pr68260.C: New test.
> 
> --- gcc/tsan.c.jj 2016-07-11 22:18:08.0 +0200
> +++ gcc/tsan.c2016-09-06 14:06:21.193642590 +0200
> @@ -40,6 +40,8 @@ along with GCC; see the file COPYING3.
>  #include "tsan.h"
>  #include "asan.h"
>  #include "builtins.h"
> +#include "target.h"
> +#include "tree-eh.h"
>  
>  /* Number of instrumented memory accesses in the current function.  */
>  
> @@ -240,7 +242,8 @@ instrument_expr (gimple_stmt_iterator gs
>  enum tsan_atomic_action
>  {
>check_last, add_seq_cst, add_acquire, weak_cas, strong_cas,
> -  bool_cas, val_cas, lock_release, fetch_op, fetch_op_seq_cst
> +  bool_cas, val_cas, lock_release, fetch_op, fetch_op_seq_cst,
> +  bool_clear, bool_test_and_set
>  };
>  
>  /* Table how to map sync/atomic builtins to their corresponding
> @@ -274,6 +277,10 @@ static const struct tsan_map_atomic
>TRANSFORM (fcode, tsan_fcode, fetch_op, code)
>  #define FETCH_OPS(fcode, tsan_fcode, code) \
>TRANSFORM (fcode, tsan_fcode, fetch_op_seq_cst, code)
> +#define BOOL_CLEAR(fcode, tsan_fcode) \
> +  TRANSFORM (fcode, tsan_fcode, bool_clear, ERROR_MARK)
> +#define BOOL_TEST_AND_SET(fcode, tsan_fcode) \
> +  TRANSFORM (fcode, tsan_fcode, bool_test_and_set, ERROR_MARK)
>  
>CHECK_LAST (ATOMIC_LOAD_1, TSAN_ATOMIC8_LOAD),
>CHECK_LAST (ATOMIC_LOAD_2, TSAN_ATOMIC16_LOAD),
> @@ -463,7 +470,11 @@ static const struct tsan_map_atomic
>LOCK_RELEASE (SYNC_LOCK_RELEASE_2, TSAN_ATOMIC16_STORE),
>LOCK_RELEASE (SYNC_LOCK_RELEASE_4, TSAN_ATOMIC32_STORE),
>LOCK_RELEASE (SYNC_LOCK_RELEASE_8, TSAN_ATOMIC64_STORE),
> -  LOCK_RELEASE (SYNC_LOCK_RELEASE_16, TSAN_ATOMIC128_STORE)
> +  LOCK_RELEASE (SYNC_LOCK_RELEASE_16, TSAN_ATOMIC128_STORE),
> +
> +  BOOL_CLEAR (ATOMIC_CLEAR, TSAN_ATOMIC8_STORE),
> +
> +  BOOL_TEST_AND_SET (ATOMIC_TEST_AND_SET, TSAN_ATOMIC8_EXCHANGE)
>  };
>  
>  /* Instrument an atomic builtin.  */
> @@ -493,6 +504,8 @@ instrument_builtin_call (gimple_stmt_ite
>   if (!tree_fits_uhwi_p (last_arg)
>   || memmodel_base (tree_to_uhwi (last_arg)) >= MEMMODEL_LAST)
> return;
> + if (lookup_stmt_eh_lp (stmt))
> +   remove_stmt_from_eh_lp (stmt);

These changes look bogus to me -- how can the tsan instrumentation
function _not_ throw when the original function it replaces can?

It seems you are just avoiding the ICEs, trading them for "wrong-code".  (And
how does this relate to -fnon-call-exceptions, as both are calls?)

The instrument_expr case seems to leave the original stmt in-place
(and thus the EH).

Richard.

>   gimple_call_set_fndecl (stmt, decl);
>   update_stmt (stmt);
>   if (tsan_atomic_table[i].action == fetch_op)
> @@ -514,6 +527,8 @@ instrument_builtin_call (gimple_stmt_ite
>  != add_acquire
>  ? MEMMODEL_SEQ_CST
>  : MEMMODEL_ACQUIRE);
> + if (lookup_stmt_eh_lp (stmt))
> +   remove_stmt_from_eh_lp (stmt);
>   update_gimple_call (gsi, decl, num + 1, args[0], args[1], args[2]);
>   stmt = gsi_stmt (*gsi);
>   if (tsan_atomic_table[i].action == fetch_op_seq_cst)
> @@ -561,6 +576,8 @@ instrument_builtin_call (gimple_stmt_ite
>   if (!tree_fits_uhwi_p (args[5])
>   || memmodel_base (tree_to_uhwi (args[5])) >= MEMMODEL_LAST)
> return;
> + if (lookup_stmt_eh_lp (stmt))
> +   remove_stmt_from_eh_lp (stmt);
>   update_gimple_call (gsi, decl, 5, args[0], args[1], args[2],
>   args[4], args[5]);
>   return;
> @@ -584,6 +601,8 @@ instrument_builtin_call (gimple_stmt_ite
>   g = gimple_build_assign (t, 

Re: [PATCH] Fix PR77450

2016-09-07 Thread Yvan Roux
Hi Richard,

On 6 September 2016 at 14:41, Richard Biener  wrote:
>
> The following fixes PR77450.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>
> Richard.
>
> 2016-09-06  Richard Biener  
>
> PR c/77450
> c-family/
> * c-common.c (c_common_mark_addressable_vec): Handle
> COMPOUND_LITERAL_EXPR.
>
> * c-c++-common/vector-subscript-7.c: Adjust.
> * c-c++-common/vector-subscript-8.c: New testcase.

This new testcase fails in our validation (ARM and x86 targets):

gcc/testsuite/c-c++-common/vector-subscript-8.c:8:17: error: lvalue
required as left operand of assignment
compiler exited with status 1
FAIL: c-c++-common/vector-subscript-8.c  -std=c++98 (test for excess errors)


> Index: gcc/c-family/c-common.c
> ===
> --- gcc/c-family/c-common.c (revision 240004)
> +++ gcc/c-family/c-common.c (working copy)
> @@ -10918,7 +10918,9 @@ c_common_mark_addressable_vec (tree t)
>  {
>while (handled_component_p (t))
>  t = TREE_OPERAND (t, 0);
> -  if (!VAR_P (t) && TREE_CODE (t) != PARM_DECL)
> +  if (!VAR_P (t)
> +  && TREE_CODE (t) != PARM_DECL
> +  && TREE_CODE (t) != COMPOUND_LITERAL_EXPR)
>  return;
>TREE_ADDRESSABLE (t) = 1;
>  }
> Index: gcc/testsuite/c-c++-common/vector-subscript-7.c
> ===
> --- gcc/testsuite/c-c++-common/vector-subscript-7.c (revision 240004)
> +++ gcc/testsuite/c-c++-common/vector-subscript-7.c (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O -fdump-tree-ccp1" } */
> +/* { dg-options "-O -fdump-tree-fre1" } */
>
>  typedef int v4si __attribute__ ((vector_size (16)));
>
> @@ -11,4 +11,4 @@ main (int argc, char** argv)
>return ((v4si){1, 2, 42, 0})[j];
>  }
>
> -/* { dg-final { scan-tree-dump "return 42;" "ccp1" } } */
> +/* { dg-final { scan-tree-dump "return 42;" "fre1" } } */
> Index: gcc/testsuite/c-c++-common/vector-subscript-8.c
> ===
> --- gcc/testsuite/c-c++-common/vector-subscript-8.c (revision 0)
> +++ gcc/testsuite/c-c++-common/vector-subscript-8.c (working copy)
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +
> +typedef int V __attribute__((vector_size(4)));
> +
> +void
> +foo(void)
> +{
> +  (V){ 0 }[0] = 0;
> +}