Re: [PATCH] Instrument only selected files (PR gcov-profile/87442).

2018-11-11 Thread Martin Liška
On 11/9/18 11:00 PM, Jeff Law wrote:
> On 11/8/18 6:42 AM, Martin Liška wrote:
>> Hi.
>>
>> The patch is about possibility to filter which files are instrumented. The 
>> usage
>> is explained in the PR.
>>
>> Patch can bootstrap and survives regression tests on x86_64-linux-gnu.
>>
>> Ready for trunk?
>> Thanks,
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2018-11-08  Martin Liska  
>>
>>  PR gcov-profile/87442
>>  * common.opt: Add -fprofile-filter-files and -fprofile-exclude-files
>>  options.
>>  * doc/invoke.texi: Document them.
>>  * tree-profile.c (parse_profile_filter): New.
>>  (parse_profile_file_filtering): Likewise.
>>  (release_profile_file_filtering): Likewise.
>>  (include_source_file_for_profile): Likewise.
>>  (tree_profiling): Filter source files based on the
>>  newly added options.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2018-11-08  Martin Liska  
>>
>>  PR gcov-profile/87442
>>  * gcc.dg/profile-filtering-1.c: New test.
>>  * gcc.dg/profile-filtering-2.c: New test.
> Extra credit if we could also do this on a function level.  I've
> certainly talked to developers that want finer grained control over what
> gets instrumented and what doesn't.  This is probably enough to help
> them, but I'm sure they'll want more :-)
> 
> 
> OK.
> jeff
> 

Hi.

May I consider this Jeff as approval of the patch?

Thanks,
Martin


Fix __gnu_cxx::throw_allocator 2 * O(log(N)) complexity

2018-11-11 Thread François Dumont
When doing some debugging session I noticed that the 
__gnu_cxx::throw_allocator doubles all lookup on both insert and erase.


Using map::insert result and erasing the found iterator avoids this 
double lookup.


    * include/ext/throw_allocator.h
    (annotate_base::insert(void*, size_t)): Use insert result to check for
    double insert attempt.
    (annotate_base::insert_construct(void*)): Likewise.
    (annotate_base::check_allocated(void*, size_t)): Return found iterator.
    (annotate_base::erase(void*, size_t)): Use latter method returned
    iterator.
    (annotate_base::check_constructed(void*, size_t)): Return found 
iterator.

    (annotate_base::erase_construct(void*)): Use latter method returned
    iterator.

Tested under linux x86_64.

Ok to commit ?

François

diff --git a/libstdc++-v3/include/ext/throw_allocator.h b/libstdc++-v3/include/ext/throw_allocator.h
index dd7c69e..e513571b766 100644
--- a/libstdc++-v3/include/ext/throw_allocator.h
+++ b/libstdc++-v3/include/ext/throw_allocator.h
@@ -104,31 +104,27 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 void
 insert(void* p, size_t size)
 {
+  entry_type entry = make_entry(p, size);
   if (!p)
 	{
 	  std::string error("annotate_base::insert null insert!\n");
-	  log_to_string(error, make_entry(p, size));
+	  log_to_string(error, entry);
 	  std::__throw_logic_error(error.c_str());
 	}
 
-  const_iterator found = map_alloc().find(p);
-  if (found != map_alloc().end())
+  pair inserted = map_alloc().insert(entry);
+  if (!inserted.second)
 	{
 	  std::string error("annotate_base::insert double insert!\n");
-	  log_to_string(error, make_entry(p, size));
-	  log_to_string(error, *found);
+	  log_to_string(error, entry);
+	  log_to_string(error, *inserted.first);
 	  std::__throw_logic_error(error.c_str());
 	}
-
-  map_alloc().insert(make_entry(p, size));
 }
 
 void
 erase(void* p, size_t size)
-{
-  check_allocated(p, size);
-  map_alloc().erase(p);
-}
+{ map_alloc().erase(check_allocated(p, size)); }
 
 #if __cplusplus >= 201103L
 void
@@ -140,31 +136,26 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  std::__throw_logic_error(error.c_str());
 	}
 
-  auto found = map_construct().find(p);
-  if (found != map_construct().end())
+  auto inserted = map_construct().insert(std::make_pair(p, get_label()));
+  if (!inserted.second)
 	{
 	  std::string error("annotate_base::insert_construct double insert!\n");
 	  log_to_string(error, std::make_pair(p, get_label()));
-	  log_to_string(error, *found);
+	  log_to_string(error, *inserted.first);
 	  std::__throw_logic_error(error.c_str());
 	}
-
-  map_construct().insert(std::make_pair(p, get_label()));
 }
 
 void
 erase_construct(void* p)
-{
-  check_constructed(p);
-  map_construct().erase(p);
-}
+{ map_construct().erase(check_constructed(p)); }
 #endif
 
 // See if a particular address and allocation size has been saved.
-inline void
+inline typename map_alloc_type::iterator
 check_allocated(void* p, size_t size)
 {
-  const_iterator found = map_alloc().find(p);
+  iterator found = map_alloc().find(p);
   if (found == map_alloc().end())
 	{
 	  std::string error("annotate_base::check_allocated by value "
@@ -181,6 +172,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  log_to_string(error, *found);
 	  std::__throw_logic_error(error.c_str());
 	}
+
+  return found;
 }
 
 // See if a given label has been allocated.
@@ -256,7 +249,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
 #if __cplusplus >= 201103L
-inline void
+inline typename map_construct_type::iterator
 check_constructed(void* p)
 {
   auto found = map_construct().find(p);
@@ -267,6 +260,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	  log_to_string(error, std::make_pair(p, get_label()));
 	  std::__throw_logic_error(error.c_str());
 	}
+
+  return found;
 }
 
 inline void


[doc, committed] fix documentation of align type attribute

2018-11-11 Thread Sandra Loosemore
PR69502 notes that the documentation of the align type attribute had 
gotten out of sync with the corresponding variable attribute with 
respect to whether decreasing the alignment of a type was permitted. 
I've checked in this patch to replace the incorrect paragraph with a 
copy of the correct one.


-Sandra
2018-11-11  Sandra Loosemore  

	PR c/69502

	gcc/
	* doc/extend.texi (Common Type Attributes): For the align type
	attribute, copy language about decreasing alignment from the
	corresponding variable attribute.
Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi	(revision 266023)
+++ gcc/doc/extend.texi	(working copy)
@@ -7101,8 +7101,11 @@ up to a maximum of 8-byte alignment, the
 in an @code{__attribute__} still only provides you with 8-byte
 alignment.  See your linker documentation for further information.
 
-The @code{aligned} attribute can only increase alignment.  Alignment
-can be decreased by specifying the @code{packed} attribute.  See below.
+When used on a struct, or struct member, the @code{aligned} attribute can
+only increase the alignment; in order to decrease it, the @code{packed}
+attribute must be specified as well.  When used as part of a typedef, the
+@code{aligned} attribute can both increase and decrease alignment, and
+specifying the @code{packed} attribute generates a warning.
 
 @cindex @code{warn_if_not_aligned} type attribute
 @item warn_if_not_aligned (@var{alignment})


Re: [RFC][PR87528][PR86677] Disable builtin popcount detection when back-end does not define it

2018-11-11 Thread Kugan Vivekanandarajah
Hi Richard,

Thanks for the review.
On Thu, 8 Nov 2018 at 00:03, Richard Biener  wrote:
>
> On Fri, Nov 2, 2018 at 10:02 AM Kugan Vivekanandarajah
>  wrote:
> >
> > Hi Richard,
> > Thanks for the review.
> > On Tue, 30 Oct 2018 at 01:25, Richard Biener  
> > wrote:
> > >
> > > On Mon, Oct 29, 2018 at 2:06 AM Kugan Vivekanandarajah
> > >  wrote:
> > > >
> > > > Hi Richard and Jeff,
> > > >
> > > > Thanks for your comments.
> > > >
> > > > On Fri, 26 Oct 2018 at 19:40, Richard Biener 
> > > >  wrote:
> > > > >
> > > > > On Fri, Oct 26, 2018 at 4:55 AM Jeff Law  wrote:
> > > > > >
> > > > > > On 10/25/18 4:33 PM, Kugan Vivekanandarajah wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > PR87528 showed a case where libgcc generated popcount is causing
> > > > > > > regression for Skylake.
> > > > > > > We also have PR86677 where kernel build is failing because the 
> > > > > > > kernel
> > > > > > > does not use the libgcc (when backend is not defining popcount
> > > > > > > pattern).  While I agree that the kernel should implement its own
> > > > > > > functionality when it is not using the libgcc, I am afraid that 
> > > > > > > the
> > > > > > > implementation can have the same performance issues reported for
> > > > > > > Skylake in PR87528.
> > > > > > >
> > > > > > > Therefore, I would like to propose that we disable popcount 
> > > > > > > detection
> > > > > > > when we don't have a pattern for that. The attached patch (based 
> > > > > > > on
> > > > > > > previous discussions) does this.
> > > > > > >
> > > > > > > Bootstrapped and regression tested on x86_64-linux-gnu with no new
> > > > > > > regressions. We need to disable the popcount* testcases. I will 
> > > > > > > have
> > > > > > > to define a effective_target_with_popcount in
> > > > > > > gcc/testsuite/lib/target-supports.exp if this patch is OK?
> > > > > > > Thanks,
> > > > > > > Kugan
> > > > > > >
> > > > > > >
> > > > > > > gcc/ChangeLog:
> > > > > > >
> > > > > > > 2018-10-25  Kugan Vivekanandarajah  
> > > > > > >
> > > > > > > * tree-scalar-evolution.c (expression_expensive_p): Make 
> > > > > > > BUILTIN POPCOUNT
> > > > > > > as expensive when backend does not define it.
> > > > > > >
> > > > > > >
> > > > > > > gcc/testsuite/ChangeLog:
> > > > > > >
> > > > > > > 2018-10-25  Kugan Vivekanandarajah  
> > > > > > >
> > > > > > > * gcc.target/aarch64/popcount4.c: New test.
> > > > > > >
> > > > > > FWIW, I've been disabling by checking direct_optab_handler elsewhere
> > > > > > (number_of_iterations_popcount) in my tester.  It may in fact be an 
> > > > > > old
> > > > > > patch from you.
> > > > > >
> > > > > > Richi argued that it's the kernel team's responsibility to provide a
> > > > > > popcount since they don't link with libgcc.  And I'm generally in
> > > > > > agreement with that position, though it does tend to generate some
> > > > > > friction with the kernel developers.  We also run the real risk of 
> > > > > > GCC 9
> > > > > > not being able to build the kernel which, IMHO, would be a disaster 
> > > > > > from
> > > > > > a PR standpoint.
> > > > > >
> > > > > > I'd like to hear from others here.  I fully realize we're beyond the
> > > > > > realm of what is strictly technically correct here from a review 
> > > > > > standpoint.
> > > > >
> > > > > As said final value replacement to a library call is probably not 
> > > > > wanted
> > > > > for optimization purpose, so adjusting expression_expensive_p is OK 
> > > > > with
> > > > > me.  It might not fully solve the (non-)issue in case another 
> > > > > optimization pass
> > > > > chooses to materialize niter computation result.
> > > > >
> > > > > Few comments on the patch:
> > > > >
> > > > > +  tree fndecl = get_callee_fndecl (expr);
> > > > > +
> > > > > +  if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
> > > > > +   {
> > > > > + combined_fn cfn = as_combined_fn (DECL_FUNCTION_CODE 
> > > > > (fndecl));
> > > > >
> > > > >   combined_fn cfn = gimple_call_combined_fn (expr);
> > > > >   switch (cfn)
> > > > > {
> > > >
> > > > Did you mean:
> > > > combined_fn cfn = get_call_combined_fn (expr);
> > >
> > > Yes.
> > >
> > > > > ...
> > > > >
> > > > > cfn will be CFN_LAST for a non-builtin/internal call.  I know Richard 
> > > > > is mostly
> > > > > offline but eventually he knows whether there is a better way to query
> > > > >
> > > > > +   CASE_CFN_POPCOUNT:
> > > > > + /* Check if opcode for popcount is available.  */
> > > > > + if (optab_handler (popcount_optab,
> > > > > +TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG
> > > > > (expr, 0
> > > > > + == CODE_FOR_nothing)
> > > > > +   return true;
> > > > >
> > > > > note that we currently generate builtin calls rather than IFN calls
> > > > > (when a direct
> > > > > optab is supported).
> > > > >
> > > > > Another comment on the patch is that you probably have to 

[PATCH][LRA] Fix PR87899: r264897 cause mis-compiled native arm-linux-gnueabihf toolchain

2018-11-11 Thread Peter Bergner
Renlin, Jeff and Vlad: requests and questions for you below...

PR87899 shows another latent LRA bug exposed by my r264897 commit.
In the bugzilla report, we have the following rtl in LRA:

  (insn 1 (set (reg:SI 1 r1) (reg/f:SI 2040)))
...
  (insn 2 (set (mem/f/c:SI (pre_modify:SI (reg:SI 1 r1)
  (plus:SI (reg:SI 1 r1)
   (const_int 12
   (reg:SI 1048))
  (expr_list:REG_INC (reg:SI 1 r1)))
...
  

My earlier patch now sees the reg copy in insn "1" and correctly skips
adding a conflict between r1 and r2040 due to the copy.  However, insn "2"
updates r1 and r2040 is live across that update and so we should create
a conflict between them, but we currently do not and that leads to us
assigning r1 to one of r2040's reload pseudos which gets clobbered by
the r1 update in insn "2".

The reason a conflict was never added between r1 and r2040 is that LRA
skips INOUT operands when computing conflicts and so misses the definition
of r1 in insn "2" and so never adds conflicts for it.  The reason the code
skips the INOUT operands is that LRA doesn't want to create new program
points for INOUT operands, since unnecessary program points can slow down
remove_some_program_points_and_update_live_ranges.  This was all fine
before when we had conservative conflict info, but now we cannot ignore
INOUT operands.

The heart of the problem is that the {make,mark}_*_{live,dead} routines
update the liveness, conflict and program point information for operands.
My solution to the problem was to pull out the updating of the program point
info from {make,mark}_*_{live,dead} and have them only update liveness and
conflict information.  I then created a separate function that is used for
updating an operand's program points.  This allowed me to modify the insn
operand scanning to handle all operand types (IN, OUT and INOUT) and always
call the {make,mark}_*_{live,dead} functions for all operand types, while
only calling the new program point update function for IN and OUT operands.

This change then allowed me to remove the hacky handling of conflicts for
reg copies and instead use the more common method of removing the src reg
of a copy from the live set before handling the copy's definition, thereby
skipping the unwanted conflict.  Bonus! :-)

This passes bootstrap and regtesting on powerpc64le-linux with no regressions.

Renlin, can you please verify that this patch fixes the issue you ran into?

Jeff, could you please throw this on your test builders to see whether
it exposes any problems with the patch I didn't see?

Otherwise, Vlad and Jeff, what do you think of this solution?

I'll note that I have compiled quite a few largish test cases and so far
I am seeing the same exact program points being used in the LRA rtl dump
before and after my patch.  I'm continuing to do that with more tests
to make sure I have everything working correctly.

I'll also note I took the opportunity to convert lra-lives.c over to using
the HARD_REGISTER_P and HARD_REGISTER_NUM_P convenience macros.

Peter


gcc/
PR rtl-optimization/87899
* lra-lives.c (start_living): Update white space in comment.
(enum point_type): New.
(sparseset_contains_pseudos_p): New function.
(update_pseudo_point): Likewise.
(make_hard_regno_live): Use HARD_REGISTER_NUM_P macro.
(make_hard_regno_dead): Likewise.  Remove ignore_reg_for_conflicts
handling.  Move early exit after adding conflicts.
(mark_pseudo_live): Use HARD_REGISTER_NUM_P macro.  Add early exit
if regno is already live.  Remove all handling of program points.
(mark_pseudo_dead): Use HARD_REGISTER_NUM_P macro.  Add early exit
after adding conflicts.  Remove all handling of program points and
ignore_reg_for_conflicts.
(mark_regno_live): Use HARD_REGISTER_NUM_P macro.  Remove return value
and do not guard call to mark_pseudo_live.
(mark_regno_dead): Use HARD_REGISTER_NUM_P macro.  Remove return value
and do not guard call to mark_pseudo_dead.
(check_pseudos_live_through_calls): Use HARD_REGISTER_NUM_P macro.
(process_bb_lives): Use HARD_REGISTER_NUM_P and HARD_REGISTER_P macros.
Use new function update_pseudo_point.  Handle register copies by
removing the source register from the live set.  Handle INOUT operands.
Update to the next program point using the unused_set, dead_set and
start_dying sets.
(lra_create_live_ranges_1): Use HARD_REGISTER_NUM_P macro.

Index: gcc/lra-lives.c
===
--- gcc/lra-lives.c (revision 265971)
+++ gcc/lra-lives.c (working copy)
@@ -83,7 +83,7 @@ static HARD_REG_SET hard_regs_live;
 
 /* Set of pseudos and hard registers start living/dying in the current
insn.  These sets are used to update REG_DEAD and REG_UNUSED 

Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-11 Thread Andi Kleen
On Sun, Nov 11, 2018 at 10:06:21AM +0100, Richard Biener wrote:
> That is, usually debuggers look for a location list of a variable
> and find, say, %rax.  But for ptwrite the debugger needs to
> examine all active location lists for, say, %rax and figure out
> that it contains the value for variable 'a'?

In dwarf output you end up with a list of

start-IP...stop-IP ...  variable locations

Both the original load/store and PTWRITE are in the same scope,
and the debugger just looks it up based on the IP,
so it all works without any extra modifications.

I even had an earlier version of this that instrumented
assembler output of the compiler with PTWRITE in a separate script,
and it worked fine too.
> 
> When there isn't any such relation between the ptwrite stored
> value and any variable the ptwrite is useless, right?

A programmer might still be able to make use of it
based on the context or the order.

e.g. if you don't instrument everything, but only specific
variables, or you only instrument arguments and returns or
similar then it could be still useful just based on the IP->symbol
resolution. If you instrument too many things yes it will be
hard to use without debug info resolution.

> I hope you don't mind if this eventually slips to GCC 10 given
> as you say there is no HW available right now.  (still waiting
> for a CPU with CET ...)

:-/

Actually there is.  Gemini Lake Atom Hardware with Goldmont Plus 
is shipping for some time and you can buy them.

-Andi


[PATCH][lower-subreg] Fix PR87507

2018-11-11 Thread Peter Bergner
PR87507 shows a problem where IRA assigns a non-volatile TImode reg pair to
a pseudo when there is a volatile reg pair available to use.  This then
causes us to emit save/restore code for the non-volatile reg usage.

My first attempt at solving this was to adjust the costs used by non-volatile
registers, but Vlad was hesitant to accept the patch since this is a sensitive
area that can cause performance issues:

  https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00460.html

Since we're late in stage1, I decided to try another solution that doesn't
involve RA at all.  The only reason the register pairs exist at the moment
is that lower-subreg could not decompose them when compiling on ppc in LE
mode.  The reason is that for POWER8, some of the vector loads and stores
do not properly byte swap the values being loaded/stores, so our mov
patterns add an explicit rotate to the rtl insns.  So the following:

  (insn (set (mem:TI (reg/v/f:DI 122))
 (reg/v:TI 123)))

is replaced with:

  (insn (set (reg:TI 129)
 (rotate:TI (reg/v:TI 123) (const_int 64

  (insn (set (mem:TI (reg/v/f:DI 122))
 (rotate:TI (reg:TI 129) (const_int 64

On BE, we are able to correctly decompose the TImode access into two DImode
accesses, but on LE, lower-subreg doesn't see the accesses as simple moves
anymore, and so fails to decompose them.  However, is we look at what
lower-subreg tries, it sees a:

  (insn (set (concatn/v:TI [(reg:DI 131 [src])
(reg:DI 132 [src+8])])
 (rotate:TI (concatn:TI [(reg:DI 133)
 (reg:DI 134 [+8])])
(const_int 64

This is just a word sized rotate of a double word sized register pair
and that is just equivalent to stripping the rotate and swapping the
two registers like so:

  (insn (set (concatn/v:TI [(reg:DI 131 [src])
(reg:DI 132 [src+8])])
 (concatn:TI [(reg:DI 134 [+8])
  (reg:DI 133)])))

The following patch extends lower-subreg to recognize these word sized
rotates as simple moves so that it can replace the rotates with swapped
decomposed registers.

This has passed bootstrap and regtesting on powerpc64le-linux with no
regressions.  Is this ok for mainline?

Peter

gcc/
PR rtl-optimization/87507
* lower-subreg.c (simple_move_operator): New function.
(simple_move): Strip simple operators.
(find_pseudo_copy): Likewise.
(resolve_simple_move): Strip simple operators and swap operands.

gcc/testsuite/
PR rtl-optimization/87507
* gcc.target/powerpc/pr87507.c: New test.
* gcc.target/powerpc/pr68805.c: Update expected results.

Index: gcc/lower-subreg.c
===
--- gcc/lower-subreg.c  (revision 265971)
+++ gcc/lower-subreg.c  (working copy)
@@ -320,6 +320,24 @@ simple_move_operand (rtx x)
   return true;
 }
 
+/* If X is an operator that can be treated as a simple move that we
+   can split, then return the operand that is operated on.  */
+
+static rtx
+simple_move_operator (rtx x)
+{
+  /* A word sized rotate of a register pair is equivalent to swapping
+ the registers in the register pair.  */
+  if (GET_CODE (x) == ROTATE
+  && GET_MODE (x) == twice_word_mode
+  && simple_move_operand (XEXP (x, 0))
+  && CONST_INT_P (XEXP (x, 1))
+  && INTVAL (XEXP (x, 1)) == BITS_PER_WORD)
+return XEXP (x, 0);;
+
+  return NULL_RTX;
+}
+
 /* If INSN is a single set between two objects that we want to split,
return the single set.  SPEED_P says whether we are optimizing
INSN for speed or size.
@@ -330,7 +348,7 @@ simple_move_operand (rtx x)
 static rtx
 simple_move (rtx_insn *insn, bool speed_p)
 {
-  rtx x;
+  rtx x, op;
   rtx set;
   machine_mode mode;
 
@@ -348,6 +366,9 @@ simple_move (rtx_insn *insn, bool speed_
 return NULL_RTX;
 
   x = SET_SRC (set);
+  if ((op = simple_move_operator (x)) != NULL_RTX)
+x = op;
+
   if (x != recog_data.operand[0] && x != recog_data.operand[1])
 return NULL_RTX;
   /* For the src we can handle ASM_OPERANDS, and it is beneficial for
@@ -386,9 +407,13 @@ find_pseudo_copy (rtx set)
 {
   rtx dest = SET_DEST (set);
   rtx src = SET_SRC (set);
+  rtx op;
   unsigned int rd, rs;
   bitmap b;
 
+  if ((op = simple_move_operator (src)) != NULL_RTX)
+src = op;
+
   if (!REG_P (dest) || !REG_P (src))
 return false;
 
@@ -853,7 +878,7 @@ can_decompose_p (rtx x)
 static rtx_insn *
 resolve_simple_move (rtx set, rtx_insn *insn)
 {
-  rtx src, dest, real_dest;
+  rtx src, dest, real_dest, src_op;
   rtx_insn *insns;
   machine_mode orig_mode, dest_mode;
   unsigned int orig_size, words;
@@ -876,6 +901,33 @@ resolve_simple_move (rtx set, rtx_insn *
 
   real_dest = NULL_RTX;
 
+  if ((src_op = simple_move_operator (src)) != NULL_RTX)
+{
+  if (resolve_reg_p (dest))
+   {
+ /* DEST is a CONCATN, so swap 

Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-11 Thread Andi Kleen
On Sun, Nov 11, 2018 at 11:37:57AM -0700, Martin Sebor wrote:
> One other high-level comment: a more powerful interface to
> variable tracing than annotating declarations in the source
> would be to provide either the names of the symbols to trace
> on the command line or in an external file.  That way tracing
> could be enabled for objects and types declared in read-only
> files (such as system headers), and would let the user more
> easily experiment with annotations.

For variables/functions if you add at the end of the source file

typeof(foo) __attribute__(("vartrace"));

it should enable it in theory (haven't tested) for both
variables or functions. Not sure about types, probably not,
but that might not be needed.

But it has to be at the end of the file, so -include doesn't work.
If an -include-after would be added to the preprocessor
it would work.

> This could be in addition to the attributes, and would require
> coming up with a way of identifying symbols with internal or
> no linkage, such as local variables, and perhaps also function

Individual local variables are hard, but you could likely
enable tracing for everything in the function with the 
attribute trick above.

-Andi


[PATCH, csky] Handle -frounding-math.

2018-11-11 Thread 瞿仙淼
Hi, 
I have submitted a patch to handle -frounding-math for C-SKY


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 266023)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2018-11-11  Xianmiao Qu  
+
+   * config/csky/csky.md (*fpuv2_nmulsf3_1, *fpuv2_nmuldf3_1): Handle
+   -frounding-math.
+
 2018-11-11  Sandra Loosemore  
 
PR c++/43105
Index: gcc/config/csky/csky_insn_fpu.md
===
--- gcc/config/csky/csky_insn_fpu.md(revision 266023)
+++ gcc/config/csky/csky_insn_fpu.md(working copy)
@@ -131,7 +131,7 @@
   [(set (match_operand:SF 0 "register_operand" "=v")
(mult:SF (neg:SF (match_operand:SF 1 "register_operand" "%v"))
 (match_operand:SF 2 "register_operand" "v")))]
-  "CSKY_ISA_FEATURE (fpv2_sf)"
+  "CSKY_ISA_FEATURE (fpv2_sf) && !flag_rounding_math"
   "fnmuls\t%0, %1, %2")
 
 (define_insn "*fpuv2_nmulsf3_2"
@@ -145,7 +145,7 @@
   [(set (match_operand:DF 0 "register_operand" "=v")
(mult:DF (neg:DF (match_operand:DF 1 "register_operand" "%v"))
 (match_operand:DF 2 "register_operand" "v")))]
- "CSKY_ISA_FEATURE (fpv2_df)"
+ "CSKY_ISA_FEATURE (fpv2_df) && !flag_rounding_math"
  "fnmuld\t%0, %1, %2")
 
 (define_insn "*fpuv2_nmuldf3_2"
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog (revision 266023)
+++ gcc/testsuite/ChangeLog (working copy)
@@ -1,3 +1,10 @@
+2018-11-11  Xianmiao Qu  
+
+   * gcc.target/csky/fnmul-1.c: New.
+   * gcc.target/csky/fnmul-2.c: New.
+   * gcc.target/csky/fnmul-3.c: New.
+   * gcc.target/csky/fnmul-4.c: New.
+
 2018-11-11  Uros Bizjak  
 
PR target/87928
Index: gcc/testsuite/gcc.target/csky/fnmul-1.c
===
--- gcc/testsuite/gcc.target/csky/fnmul-1.c (nonexistent)
+++ gcc/testsuite/gcc.target/csky/fnmul-1.c (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=ck810f -mhard-float -O2" } */
+
+double
+fnmuld (double a, double b)
+{
+  /* { dg-final { scan-assembler "fnmuld" } } */
+  return -a * b;
+}
+
+float
+fnmuls (float a, float b)
+{
+  /* { dg-final { scan-assembler "fnmuls" } } */
+  return -a * b;
+}
+
Index: gcc/testsuite/gcc.target/csky/fnmul-2.c
===
--- gcc/testsuite/gcc.target/csky/fnmul-2.c (nonexistent)
+++ gcc/testsuite/gcc.target/csky/fnmul-2.c (working copy)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=ck810f -mhard-float -O2 -frounding-math" } */
+
+double
+fnmuld (double a, double b)
+{
+  /* { dg-final { scan-assembler "fnegd" } } */
+  /* { dg-final { scan-assembler "fmuld" } } */
+  return -a * b;
+}
+
+float
+fnmuls (float a, float b)
+{
+  /* { dg-final { scan-assembler "fnegs" } } */
+  /* { dg-final { scan-assembler "fmuls" } } */
+  return -a * b;
+}
+
Index: gcc/testsuite/gcc.target/csky/fnmul-3.c
===
--- gcc/testsuite/gcc.target/csky/fnmul-3.c (nonexistent)
+++ gcc/testsuite/gcc.target/csky/fnmul-3.c (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=ck810f -mhard-float -O2" } */
+
+double
+fnmuld (double a, double b)
+{
+  /* { dg-final { scan-assembler "fnmuld" } } */
+  return -(a * b);
+}
+
+float
+fnmuls (float a, float b)
+{
+  /* { dg-final { scan-assembler "fnmuls" } } */
+  return -(a * b);
+}
+
Index: gcc/testsuite/gcc.target/csky/fnmul-4.c
===
--- gcc/testsuite/gcc.target/csky/fnmul-4.c (nonexistent)
+++ gcc/testsuite/gcc.target/csky/fnmul-4.c (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-mcpu=ck810f -mhard-float -O2 -frounding-math" } */
+
+double
+fnmuld (double a, double b)
+{
+  /* { dg-final { scan-assembler "fnmuld" } } */
+  return -(a * b);
+}
+
+float
+fnmuls (float a, float b)
+{
+  /* { dg-final { scan-assembler "fnmuls" } } */
+  return -(a * b);
+}
+





Re: [PATCH][RFC] Come up with -flive-patching master option.

2018-11-11 Thread Qing Zhao
Hi,


> On Nov 10, 2018, at 2:51 AM, Martin Liška  wrote:
> 
> On 11/9/18 6:43 PM, Qing Zhao wrote:
>> Hi, Martin,
>> 
>> thanks a lot for the previous two new options for live-patching. 
>> 
>> 
>> I have two more questions below:
> 
> Hello.
> 
>> 
>> 1. do we still need new options to disable the following:
>>   A. unreachable code/variable removal? 
> 
> I hope it's guarded with newly added option -fipa-reference-addressable. 
> Correct me
> if I'm wrong.

The followings are some previous discussions on this:

“
>>> 
>>> We perform discovery of functions/variables with no address taken and
>>> optimizations that are not valid otherwise such as duplicating them
>>> or doing skipping them for alias analysis (no flag to disable)
>> 
>> Can you be please more verbose here? What optimizations do you mean?

See ipa_discover_readonly_nonaddressable_vars. If addressable bit is
cleared we start analyzing uses of the variable via ipa_reference or so.
If writeonly bit is set, we start removing writes to the variable and if
readonly bit is set we skip any analysis about whether vairable changed.
“

the new -fipa-reference-addressable is to control the above,  seems not the 
unreachable code/variable removal?

> 
>>   B. Visibility changes with -flto and/or -fwhole-program?
> 
> The options are not used in linux kernel, thus I didn't consider.

Okay, I see.

> 
>> 
>> 2. for this new patch, could you please explain a little bit more on the 
>> problem?
> 
> We want to enable a single option that will disable all possible (and future) 
> optimizations
> that influence live patching.

Okay, I see.

I am also working on a similar option as yours, but make the -flive-patching as 
two level control:

+flive-patching
+Common RejectNegative Alias(flive-patching=,inline-clone)
+
+flive-patching=
+Common Report Joined RejectNegative Enum(live_patching_level) 
Var(flag_live_patching) Init(LIVE_NONE)
+-flive-patching=[inline-only-static|inline-clone]  Control optimizations 
to provide a safe comp for live-patching purpose.

the implementation for -flive-patching=inline-clone (the default) is exactly as 
yours,  the new level -flive-patching=inline-only-static
is to only enable inlining of static function for live patching, which is 
important for multiple-processes live patching to control memory
consumption. 

(please see my 2nd version of the -flive-patching proposal).

I will send out my complete patch in another email.

thanks.

Qing




[PATCH] C++: fix-it hint for missing parentheses

2018-11-11 Thread David Malcolm
Consider:

  class t1
  {
  public:
double length () const { return m_length; }
  private:
double m_length;
  };

missing-parens-fixit.C: In function 'bool test_1(const t1&)':
missing-parens-fixit.C:14:15: error: invalid use of member function
  'double t1::length() const' (did you forget the '()' ?)
   14 |   return inst.length > 0.0;
  |  ~^~

This patch adds a fix-it hint for the case where the member function
takes no parameters, suggesting the addition of the parentheses:

   14 |   return inst.length > 0.0;
  |  ~^~
  | ()

so that an IDE can potentially apply the fix.

OK for trunk?

gcc/cp/ChangeLog:
* typeck2.c: Include "gcc-rich-location.h".
(cxx_incomplete_type_diagnostic): When complaining about possibly
missing parens, add a fix-it hint if the member function takes no
additional params.

gcc/ChangeLog:
* diagnostic-core.h (emit_diagnostic): New decl.
* diagnostic.c (emit_diagnostic): New overload, taking a
rich_location *.

gcc/testsuite/ChangeLog:
* g++.dg/parse/missing-parens-fixit.C: New test.
---
 gcc/cp/typeck2.c  | 14 +++---
 gcc/diagnostic-core.h |  2 ++
 gcc/diagnostic.c  | 14 ++
 gcc/testsuite/g++.dg/parse/missing-parens-fixit.C | 32 +++
 4 files changed, 59 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/parse/missing-parens-fixit.C

diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index fec1db0..c3c59a8 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stor-layout.h"
 #include "varasm.h"
 #include "intl.h"
+#include "gcc-rich-location.h"
 
 static tree
 process_init_constructor (tree type, tree init, int nested,
@@ -507,9 +508,16 @@ cxx_incomplete_type_diagnostic (location_t loc, const_tree 
value,
 
if (DECL_FUNCTION_MEMBER_P (member)
&& ! flag_ms_extensions)
- emit_diagnostic (diag_kind, loc, 0,
-  "invalid use of member function %qD "
-  "(did you forget the %<()%> ?)", member);
+ {
+   gcc_rich_location richloc (loc);
+   /* If "member" has no arguments (other than "this"), then
+  add a fix-it hint.  */
+   if (type_num_arguments (TREE_TYPE (member)) == 1)
+ richloc.add_fixit_insert_after ("()");
+   emit_diagnostic (diag_kind, , 0,
+"invalid use of member function %qD "
+"(did you forget the %<()%> ?)", member);
+ }
else
  emit_diagnostic (diag_kind, loc, 0,
   "invalid use of member %qD "
diff --git a/gcc/diagnostic-core.h b/gcc/diagnostic-core.h
index 80ff395..6c40f26 100644
--- a/gcc/diagnostic-core.h
+++ b/gcc/diagnostic-core.h
@@ -105,6 +105,8 @@ extern void inform_n (location_t, unsigned HOST_WIDE_INT, 
const char *,
 extern void verbatim (const char *, ...) ATTRIBUTE_GCC_DIAG(1,2);
 extern bool emit_diagnostic (diagnostic_t, location_t, int,
 const char *, ...) ATTRIBUTE_GCC_DIAG(4,5);
+extern bool emit_diagnostic (diagnostic_t, rich_location *, int,
+const char *, ...) ATTRIBUTE_GCC_DIAG(4,5);
 extern bool emit_diagnostic_valist (diagnostic_t, location_t, int, const char 
*,
va_list *) ATTRIBUTE_GCC_DIAG (4,0);
 extern bool seen_error (void);
diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index a572c08..9b583ce 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -1168,6 +1168,20 @@ emit_diagnostic (diagnostic_t kind, location_t location, 
int opt,
   return ret;
 }
 
+/* As above, but for rich_location *.  */
+
+bool
+emit_diagnostic (diagnostic_t kind, rich_location *richloc, int opt,
+const char *gmsgid, ...)
+{
+  auto_diagnostic_group d;
+  va_list ap;
+  va_start (ap, gmsgid);
+  bool ret = diagnostic_impl (richloc, opt, gmsgid, , kind);
+  va_end (ap);
+  return ret;
+}
+
 /* Wrapper around diagnostic_impl taking a va_list parameter.  */
 
 bool
diff --git a/gcc/testsuite/g++.dg/parse/missing-parens-fixit.C 
b/gcc/testsuite/g++.dg/parse/missing-parens-fixit.C
new file mode 100644
index 000..a0fd7dd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/parse/missing-parens-fixit.C
@@ -0,0 +1,32 @@
+// { dg-options "-fdiagnostics-show-caret" }
+
+class t1
+{
+public:
+  double length () const { return m_length; }
+  double area (double width) const { return m_length * width; }
+
+private:
+  double m_length;
+};
+
+bool test_1 (const t1 )
+{
+  return inst.length > 0.0; // { dg-error "did you forget the '\\(\\)'" }
+  /* We expect a fix-it hint.  */
+  /* { dg-begin-multiline-output "" }
+   return inst.length > 0.0;
+  ~^~
+ 

Re: [PATCH PR84648]Adjust loop exit conditions for loop-until-wrap cases.

2018-11-11 Thread Bin.Cheng
On Sun, Nov 11, 2018 at 7:20 PM Bernhard Reutner-Fischer
 wrote:
>
> On Sun, 11 Nov 2018 16:02:33 +0800
> "bin.cheng"  wrote:
>
> Quick observation unrelated to the real patch.
>
> I think the coding style mandates to use the type itself and not the
> underlying structure.
>
> I know there are alot of places from before our switch to C++ that
> still use enum tree_code or enum machine_mode, but new code should
> adhere to the style please.
>
> So: s/enum tree_code/tree_code/g
Thanks very much for reminding, I forgot the switch when writing the
patch, will update along with changes for other reviews.

Thanks,
bin
>
> thanks,


[PATCH] RFC: elide repeated source locations (PR other/84889)

2018-11-11 Thread David Malcolm
We often emit more than one diagnostic at the same source location.
For example, the C++ frontend can emit many diagnostics at
the same source location when suggesting overload candidates.

For example:

../../src/gcc/testsuite/g++.dg/diagnostic/bad-binary-ops.C: In function 'int 
test_3(s, t)':
../../src/gcc/testsuite/g++.dg/diagnostic/bad-binary-ops.C:38:18: error: no 
match for 'operator&&' (operand types are 's' and 't')
   38 |   return param_s && param_t;
  |  ^~
../../src/gcc/testsuite/g++.dg/diagnostic/bad-binary-ops.C:38:18: note: 
candidate: 'operator&&(bool, bool)' 
../../src/gcc/testsuite/g++.dg/diagnostic/bad-binary-ops.C:38:18: note:   no 
known conversion for argument 2 from 't' to 'bool'

This is overly verbose.  Note how the same location has been printed
three times, obscuring the pertinent messages.

This patch add a new "elide" value to -fdiagnostics-show-location=
and makes it the default (previously it was "once").  With elision
the above is printed as:

../../src/gcc/testsuite/g++.dg/diagnostic/bad-binary-ops.C: In function 'int 
test_3(s, t)':
../../src/gcc/testsuite/g++.dg/diagnostic/bad-binary-ops.C:38:18: error: no 
match for 'operator&&' (operand types are 's' and 't')
   38 |   return param_s && param_t;
  |  ^~
  = note: candidate: 'operator&&(bool, bool)' 
  = note:   no known conversion for argument 2 from 't' to 'bool'

where the followup notes are printed with a '=' lined up with
the source code margin.

Thoughts?
Dave

TODO: the selftests currently assume LANG=C.  Clearly this would
need fixing before committing.

TODO: potentially integrate this with auto_diagnostic_group, so
we only elide within a logically-related diagnostic (and thus
make auto_diagnostic_group do something user-visible).

(FWIW, the output format is inspired by Rust)

gcc/ChangeLog:
PR other/84889
* common.opt (fdiagnostics-show-location=): Add "elide".
(Enum(diagnostic_prefixing_rule)): Add "elide" value.
* diagnostic-show-locus.c (layout::print_source_line): Set
last_margin.
(diagnostic_show_locus): Reset last_margin, saving it and
restoring it if bailing out.
(selftest::test_one_liner_elide_location): New test.
FIXME: require LANG=C, or fix the test to work with i18n.
(selftest::test_diagnostic_show_locus_one_liner): Call it.
* diagnostic.c (diagnostic_initialize): Initialize "last_margin"
and "override_file".
(diagnostic_get_location_text): Use context->override_file
if non-NULL.
(diagnostic_build_prefix): Handle DIAGNOSTICS_SHOW_PREFIX_ELIDE.
(selftest::assert_location_text): Set dc.override_file.
* diagnostic.h (struct diagnostic_context): Add fields
"last_margin" and "override_file".
* doc/invoke.texi (-fdiagnostics-show-location=elide): Document.
(-fdiagnostics-show-location=once): Remove "default behavior"
text.
* pretty-print.c (pp_set_real_maximum_length): Handle
DIAGNOSTICS_SHOW_PREFIX_ELIDE.
(pp_emit_prefix): Likewise.
(pretty_printer::pretty_printer): Default to
DIAGNOSTICS_SHOW_PREFIX_ELIDE.
* pretty-print.h (enum diagnostic_prefixing_rule_t): Add
DIAGNOSTICS_SHOW_PREFIX_ELIDE.  Document inline.
* selftest-diagnostic.c
(selftest::test_diagnostic_context::test_diagnostic_context):
Set buffer's stream and flush_p.  Set override_file.
(selftest::global_dc_test::global_dc_test): New ctor.
(selftest::global_dc_test::~global_dc_test): New dtor.
* selftest-diagnostic.h (class selftest::global_dc_test): New
class.

gcc/testsuite/ChangeLog:
PR other/84889
* lib/prune.exp (TEST_ALWAYS_FLAGS): Add
-fdiagnostics-show-location=once.
---
 gcc/common.opt  |   5 +-
 gcc/diagnostic-show-locus.c | 135 +++-
 gcc/diagnostic.c|  70 ++-
 gcc/diagnostic.h|   8 +++
 gcc/doc/invoke.texi |  10 +++-
 gcc/pretty-print.c  |   7 ++-
 gcc/pretty-print.h  |  22 +---
 gcc/selftest-diagnostic.c   |  21 +++
 gcc/selftest-diagnostic.h   |  16 ++
 gcc/testsuite/lib/prune.exp |   1 +
 10 files changed, 279 insertions(+), 16 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 5a5d332..010ca0c 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1222,7 +1222,7 @@ Try to convert virtual calls to direct ones.
 
 fdiagnostics-show-location=
 Common Joined RejectNegative Enum(diagnostic_prefixing_rule)
--fdiagnostics-show-location=[once|every-line]  How often to emit source 
location at the beginning of line-wrapped diagnostics.
+-fdiagnostics-show-location=[elide|once|every-line]How often to emit 
source location at the beginning of diagnostics.
 
 ; Required for these enum values.
 SourceInclude
@@ -1237,6 +1237,9 @@ 

[PATCH] pretty-print.c: add selftest::test_prefixes_and_wrapping

2018-11-11 Thread David Malcolm
gcc/ChangeLog:
* pretty-print.c (class selftest::test_pretty_printer): New
subclass of pretty_printer.
(selftest::test_prefixes_and_wrapping): New test.
(selftest::pretty_print_c_tests): Call it.
---
 gcc/pretty-print.c | 96 ++
 1 file changed, 96 insertions(+)

diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c
index 19ef75b..691dbb6 100644
--- a/gcc/pretty-print.c
+++ b/gcc/pretty-print.c
@@ -2217,6 +2217,101 @@ test_pp_format ()
"problem with %qs at line %i", "bar", 10);
 }
 
+/* A subclass of pretty_printer for use by test_prefixes_and_wrapping.  */
+
+class test_pretty_printer : public pretty_printer
+{
+ public:
+  test_pretty_printer (enum diagnostic_prefixing_rule_t rule,
+  int max_line_length)
+  {
+pp_set_prefix (this, xstrdup ("PREFIX: "));
+wrapping.rule = rule;
+pp_set_line_maximum_length (this, max_line_length);
+  }
+};
+
+/* Verify that the various values of enum diagnostic_prefixing_rule_t work
+   as expected, with and without line wrapping.  */
+
+static void
+test_prefixes_and_wrapping ()
+{
+  /* Tests of the various prefixing rules, without wrapping.
+ Newlines embedded in pp_string don't affect it; we have to
+ explicitly call pp_newline.  */
+  {
+test_pretty_printer pp (DIAGNOSTICS_SHOW_PREFIX_ONCE, 0);
+pp_string (, "the quick brown fox");
+pp_newline ();
+pp_string (, "jumps over the lazy dog");
+pp_newline ();
+ASSERT_STREQ (pp_formatted_text (),
+ "PREFIX: the quick brown fox\n"
+ "   jumps over the lazy dog\n");
+  }
+  {
+test_pretty_printer pp (DIAGNOSTICS_SHOW_PREFIX_NEVER, 0);
+pp_string (, "the quick brown fox");
+pp_newline ();
+pp_string (, "jumps over the lazy dog");
+pp_newline ();
+ASSERT_STREQ (pp_formatted_text (),
+ "the quick brown fox\n"
+ "jumps over the lazy dog\n");
+  }
+  {
+test_pretty_printer pp (DIAGNOSTICS_SHOW_PREFIX_EVERY_LINE, 0);
+pp_string (, "the quick brown fox");
+pp_newline ();
+pp_string (, "jumps over the lazy dog");
+pp_newline ();
+ASSERT_STREQ (pp_formatted_text (),
+ "PREFIX: the quick brown fox\n"
+ "PREFIX: jumps over the lazy dog\n");
+  }
+
+  /* Tests of the various prefixing rules, with wrapping.  */
+  {
+test_pretty_printer pp (DIAGNOSTICS_SHOW_PREFIX_ONCE, 20);
+pp_string (, "the quick brown fox jumps over the lazy dog");
+pp_newline ();
+pp_string (, "able was I ere I saw elba");
+pp_newline ();
+ASSERT_STREQ (pp_formatted_text (),
+ "PREFIX: the quick \n"
+ "   brown fox jumps \n"
+ "   over the lazy \n"
+ "   dog\n"
+ "   able was I ere I \n"
+ "   saw elba\n");
+  }
+  {
+test_pretty_printer pp (DIAGNOSTICS_SHOW_PREFIX_NEVER, 20);
+pp_string (, "the quick brown fox jumps over the lazy dog");
+pp_newline ();
+pp_string (, "able was I ere I saw elba");
+pp_newline ();
+ASSERT_STREQ (pp_formatted_text (),
+ "the quick brown fox \n"
+ "jumps over the lazy \n"
+ "dog\n"
+ "able was I ere I \n"
+ "saw elba\n");
+  }
+  {
+test_pretty_printer pp (DIAGNOSTICS_SHOW_PREFIX_EVERY_LINE, 20);
+pp_string (, "the quick brown fox jumps over the lazy dog");
+pp_newline ();
+pp_string (, "able was I ere I saw elba");
+pp_newline ();
+ASSERT_STREQ (pp_formatted_text (),
+ "PREFIX: the quick brown fox jumps over the lazy dog\n"
+ "PREFIX: able was I ere I saw elba\n");
+  }
+
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -2224,6 +2319,7 @@ pretty_print_c_tests ()
 {
   test_basic_printing ();
   test_pp_format ();
+  test_prefixes_and_wrapping ();
 }
 
 } // namespace selftest
-- 
1.8.5.3



Re: [PATCH 2/2] asm inline

2018-11-11 Thread Segher Boessenkool
On Sun, Nov 11, 2018 at 04:41:19PM -0700, Martin Sebor wrote:
> On 11/11/2018 03:00 PM, Segher Boessenkool wrote:
> >On Sun, Nov 11, 2018 at 02:33:34PM -0700, Martin Sebor wrote:
> >>On 10/30/2018 11:30 AM, Segher Boessenkool wrote:
> >>>The Linux kernel people want a feature that makes GCC pretend some
> >>>inline assembler code is tiny (while it would think it is huge), so
> >>>that such code will be inlined essentially always instead of
> >>>essentially never.
> >>>
> >>>This patch lets you say "asm inline" instead of just "asm", with the
> >>>result that that inline assembler is always counted as minimum cost
> >>>for inlining.  It implements this for C and C++.
> >>
> >>Rather than overloading the inline keyword I think it would be
> >>more in keeping both with the design of existing features to
> >>control inlining in GCC and with the way the languages are
> >>evolving to allow the always_inline (and perhaps also noinline)
> >>attribute to apply to asm statements.
> >
> >We already "overloaded" the volatile and goto keywords here (it's not
> >overloading, the contexts are completely disjoint).
> >
> >always_inline is a function attribute, and you want to make it a statement
> >attribute (which we do not currently support except for empty statements!)
> >as well.
> 
> Yes, I suggest using a statement attribute for this.  I believe
> it would be more appropriate that overloading the meaning of
> an existing keyword in this context.

It is *not* overloading the meaning in this context.  There *was* no
meaning in this context.

Also, all of the other keywords allowed here (const, volatile, restrict,
goto) have different meanings in other contexts already.

> C++ is entertaining proposals to introduce a few statement
> attributes (e.g., likely/unlikely and even unreachable) so
> as I said, adding one to GCC would be in like with
> the direction the languages are moving (C has adopted the C++
> 11 attributes for C2X and it likely will continue to do so going
> forward).

As I've said many times now, I think attributes are a bad match for this.
Also, I am not going to implement statement attributes just for this.

> Supporting an attribute in this context also has a lower cost
> for third party tools that parse code (and are already prepared
> to deal with attributes).  Adding a new keyword here has a good
> chance of breaking some of those tools.

I don't see why I would care.  Sorry.  Extended asm is a GCC extension,
so we make the rules.

The first patch makes any order and combination of qualifiers (and goto
and inline) valid for asm, where before only zero or one qualifiers and
zero or one goto were allowed (in that order).  This gives exactly the
same problem for 3rd party tools, but it is an obvious improvement (and
even a bugfix).

> >>The inline keyword is commonly understood to be just a hint to
> >>the compiler.
> >
> >That is exactly what it is here.  It hints the compiler that this asm
> >is cheap, whatever it may look like.
> 
> The way you described the purpose of asm inline: "such code will
> be inlined essentially always instead of essentially never"
> sounded to me like it's close to the effect of attribute
> always_inline.  But after reading the patch it looks like the asm
> just overrides the number of instructions GCC estimates
> the statement expands into.

Only for the inlining decisions.  It still uses the conservative
estimate for things like the length attribute (the machine description
thing), and other things required for correctness.

> If I read the code right then
> a different attribute altogether would seem more appropriate
> (e.g., perhaps even something like insn_count (N); that would
> make it possible to provide an exact instruction count and also
> prevent the inlining of the enclosing function by specifying
> a very large count).

No, the user does not know what units is counted in here.  Would there
be any real benefit anyway?  I don't see it.

> In any event, overloading the inline
> keyword for this purpose doesn't seem like a good approach
> to me.

It is not overloading anything.


Segher


Re: [PATCH 2/2] asm inline

2018-11-11 Thread Martin Sebor

On 11/11/2018 03:00 PM, Segher Boessenkool wrote:

On Sun, Nov 11, 2018 at 02:33:34PM -0700, Martin Sebor wrote:

On 10/30/2018 11:30 AM, Segher Boessenkool wrote:

The Linux kernel people want a feature that makes GCC pretend some
inline assembler code is tiny (while it would think it is huge), so
that such code will be inlined essentially always instead of
essentially never.

This patch lets you say "asm inline" instead of just "asm", with the
result that that inline assembler is always counted as minimum cost
for inlining.  It implements this for C and C++.


Rather than overloading the inline keyword I think it would be
more in keeping both with the design of existing features to
control inlining in GCC and with the way the languages are
evolving to allow the always_inline (and perhaps also noinline)
attribute to apply to asm statements.


We already "overloaded" the volatile and goto keywords here (it's not
overloading, the contexts are completely disjoint).

always_inline is a function attribute, and you want to make it a statement
attribute (which we do not currently support except for empty statements!)
as well.


Yes, I suggest using a statement attribute for this.  I believe
it would be more appropriate that overloading the meaning of
an existing keyword in this context.

C++ is entertaining proposals to introduce a few statement
attributes (e.g., likely/unlikely and even unreachable) so
as I said, adding one to GCC would be in like with
the direction the languages are moving (C has adopted the C++
11 attributes for C2X and it likely will continue to do so going
forward).

Supporting an attribute in this context also has a lower cost
for third party tools that parse code (and are already prepared
to deal with attributes).  Adding a new keyword here has a good
chance of breaking some of those tools.


The inline keyword is commonly understood to be just a hint to
the compiler.


That is exactly what it is here.  It hints the compiler that this asm
is cheap, whatever it may look like.


The way you described the purpose of asm inline: "such code will
be inlined essentially always instead of essentially never"
sounded to me like it's close to the effect of attribute
always_inline.  But after reading the patch it looks like the asm
just overrides the number of instructions GCC estimates
the statement expands into.  If I read the code right then
a different attribute altogether would seem more appropriate
(e.g., perhaps even something like insn_count (N); that would
make it possible to provide an exact instruction count and also
prevent the inlining of the enclosing function by specifying
a very large count).  In any event, overloading the inline
keyword for this purpose doesn't seem like a good approach
to me.

Martin


Re: [GCC][AArch64] [middle-end][docs] Document the xorsign optab

2018-11-11 Thread Sandra Loosemore

On 11/11/18 3:14 AM, Tamar Christina wrote:

Hi All,

This patch just adds documentation for the xorsign optab that was added a while 
ago.

Ok for trunk?

+@cindex @code{xorsign@var{m}3} instruction pattern
+@item @samp{xorsign@var{m}3}
+Target suppports an efficient expansion of x * copysign (1.0, y)
+as xorsign (x, y).  Store a value with the magnitude of operand 1
+and the sign of operand 2 into operand 0.  All operands have mode
+@var{m}, which is a scalar or vector floating-point mode.
+
+This pattern is not allowed to @code{FAIL}.
+


Hmmm, needs markup, plus it's a little confusing.  How about describing 
it as


Equivalent to @samp{op0 = op1 * copysign (1.0, op2)}: store a value with
the magnitude of operand 1 and the sign of operand 2 into operand 0.
All operands have mode @var{m}, which is a scalar or vector
floating-point mode.

This pattern is not allowed to @code{FAIL}.

-Sandra


[RFC PATCH] rs6000: Early remat for CR fields

2018-11-11 Thread Segher Boessenkool
This implements early remat for condition codes, for rs6000.  Saving
and restoring the CR register is pretty expensive, so we want to avoid
doing it.

It is an RFC for two reasons.  Firstly, I needed to delete the
NUM_POLY_INT_COEFFS > 1 check from generic code, since that has nothing
to do with early remat, it's just for Arm SVE.  This should be moved to
the aarch64 hook implementation?

Secondly, it makes almost all (or all?) asan tests fail.  I do not know
why; everything else works fine.  This needs to be investigated,

Early remat does not handle instructions that set a condition reg as
well as another (general purpose) register.  It could, by
rematerializing the instruction as just the comparison.  I don't know
how much this would help (it would allow rematerializing Power "dot"
instructions, which are pretty frequent).


Segher

---
 gcc/config/rs6000/rs6000.c | 17 +
 gcc/early-remat.c  |  2 +-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index c60e1b2..3411570 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1589,6 +1589,9 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 #undef TARGET_SET_UP_BY_PROLOGUE
 #define TARGET_SET_UP_BY_PROLOGUE rs6000_set_up_by_prologue
 
+#undef TARGET_SELECT_EARLY_REMAT_MODES
+#define TARGET_SELECT_EARLY_REMAT_MODES rs6000_select_early_remat_modes
+
 #undef TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS
 #define TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS 
rs6000_get_separate_components
 #undef TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB
@@ -26031,6 +26034,20 @@ rs6000_global_entry_point_needed_p (void)
   return cfun->machine->r2_setup_needed;
 }
 
+/* Implement TARGET_SELECT_EARLY_REMAT_MODES.  */
+
+static void
+rs6000_select_early_remat_modes (sbitmap modes)
+{
+  /* We want to rematerialize all condition codes.  */
+  for (int i = 0; i < NUM_MACHINE_MODES; ++i)
+{
+  machine_mode mode = (machine_mode) i;
+  if (GET_MODE_CLASS (mode) == MODE_CC)
+   bitmap_set_bit (modes, i);
+}
+}
+
 /* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
 static sbitmap
 rs6000_get_separate_components (void)
diff --git a/gcc/early-remat.c b/gcc/early-remat.c
index 776b2d0..b969e28 100644
--- a/gcc/early-remat.c
+++ b/gcc/early-remat.c
@@ -2588,7 +2588,7 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *)
   {
-return optimize > 1 && NUM_POLY_INT_COEFFS > 1;
+return optimize > 1;
   }
 
   virtual unsigned int execute (function *f)
-- 
1.8.3.1



[PATCH] C++: improvements to diagnostics that use %P (more PR c++/85110)

2018-11-11 Thread David Malcolm
This patch is based on grepping the C++ frontend for %P and %qP
i.e. diagnostics that refer to a parameter number.  It fixes up
these diagnostics to highlight the pertinent param, where appropriate
(and possible), along with various other tweaks, as described in the
ChangeLog.

OK for trunk if it passes testing?

gcc/cp/ChangeLog:
PR c++/85110
* call.c (conversion_null_warnings): Try to use the location of
the expression for the warnings.  Add notes showing the parameter
of the function decl, where available.
(get_fndecl_argument_location): Gracefully reject
non-FUNCTION_DECLs.
(maybe_inform_about_fndecl_for_bogus_argument_init): New function.
(convert_like_real): Use it in various places to avoid repetition.
* cp-tree.h (maybe_inform_about_fndecl_for_bogus_argument_init):
New declaration.
* decl2.c: Include "gcc-rich-location.h".
(check_default_args): Use the location of the parameter when
complaining about parameters with missing default arguments in
preference to that of the fndecl.
Attempt to record the location of first parameter with a default
argument and add it as a secondary range to such errors.
* typeck.c (convert_arguments): When complaining about parameters
with incomplete types, attempt to use the location of the
argument.  Where available, add a note showing the pertinent
parameter in the fndecl.
(convert_for_assignment): When complaining about bad conversions
at function calls, use the location of the unstripped argument.
Update the note about the callee to use the location of the
pertinent parameter, and to use
maybe_inform_about_fndecl_for_bogus_argument_init.
(convert_for_initialization): When checking for bogus references,
add an auto_diagnostic_group, and update the note to use the
location of the pertinent parameter, rather than just the callee.

gcc/testsuite/ChangeLog:
PR c++/85110
* g++.dg/diagnostic/missing-default-args.C: New test.
* g++.dg/diagnostic/param-type-mismatch-3.C: New test.
* g++.dg/diagnostic/param-type-mismatch.C: Add tests for invalid
references and incomplete types.
* g++.dg/warn/Wconversion-null-4.C: New test.
---
 gcc/cp/call.c  | 83 ++
 gcc/cp/cp-tree.h   |  1 +
 gcc/cp/decl2.c | 16 -
 gcc/cp/typeck.c| 19 +++--
 .../g++.dg/diagnostic/missing-default-args.C   | 53 ++
 .../g++.dg/diagnostic/param-type-mismatch-3.C  | 26 +++
 .../g++.dg/diagnostic/param-type-mismatch.C| 41 +++
 gcc/testsuite/g++.dg/warn/Wconversion-null-4.C | 37 ++
 8 files changed, 238 insertions(+), 38 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/missing-default-args.C
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/param-type-mismatch-3.C
 create mode 100644 gcc/testsuite/g++.dg/warn/Wconversion-null-4.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 6f40156..ea5e73b 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -6636,16 +6636,24 @@ conversion_null_warnings (tree totype, tree expr, tree 
fn, int argnum)
   if (null_node_p (expr) && TREE_CODE (totype) != BOOLEAN_TYPE
   && ARITHMETIC_TYPE_P (totype))
 {
-  source_location loc =
-   expansion_point_location_if_in_system_header (input_location);
-
   if (fn)
-   warning_at (loc, OPT_Wconversion_null,
-   "passing NULL to non-pointer argument %P of %qD",
-   argnum, fn);
+   {
+ source_location loc = EXPR_LOC_OR_LOC (expr, input_location);
+ loc = expansion_point_location_if_in_system_header (loc);
+ auto_diagnostic_group d;
+ if (warning_at (loc, OPT_Wconversion_null,
+ "passing NULL to non-pointer argument %P of %qD",
+ argnum, fn))
+   inform (get_fndecl_argument_location (fn, argnum),
+   "  declared here");
+   }
   else
-   warning_at (loc, OPT_Wconversion_null,
-   "converting to non-pointer type %qT from NULL", totype);
+   {
+ source_location loc
+   = expansion_point_location_if_in_system_header (input_location);
+ warning_at (loc, OPT_Wconversion_null,
+ "converting to non-pointer type %qT from NULL", totype);
+   }
 }
 
   /* Issue warnings if "false" is converted to a NULL pointer */
@@ -6653,9 +6661,15 @@ conversion_null_warnings (tree totype, tree expr, tree 
fn, int argnum)
   && TYPE_PTR_P (totype))
 {
   if (fn)
-   warning_at (input_location, OPT_Wconversion_null,
-   "converting % to pointer type for argument %P "
-   "of 

[doc, committed] document restrictions on mixing -fno-rtti and -frtti

2018-11-11 Thread Sandra Loosemore
I've checked in this patch to fix another old but easy doc issue from 
bugzilla, PR43105.


-Sandra
2018-11-11  Sandra Loosemore  

	PR c++/43105

	gcc/
	* doc/invoke.texi (C++ Dialect Options): Add warning about mixing
	-frtti and -fno-rtti code.
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 266019)
+++ gcc/doc/invoke.texi	(working copy)
@@ -2679,6 +2679,11 @@ needed. The @code{dynamic_cast} operator
 do not require run-time type information, i.e.@: casts to @code{void *} or to
 unambiguous base classes.
 
+Mixing code compiled with @option{-frtti} with that compiled with
+@option{-fno-rtti} may not work.  For example, programs may
+fail to link if a class compiled with @option{-fno-rtti} is used as a base 
+for a class compiled with @option{-frtti}.  
+
 @item -fsized-deallocation
 @opindex fsized-deallocation
 Enable the built-in global declarations


[PATCH] RFC: C/C++: print help when a header can't be found

2018-11-11 Thread David Malcolm
When gcc can't find a header file, it's a hard error that stops the build,
typically requiring the user to mess around with compile flags, Makefiles,
dependencies, and so forth.

Often the exact search paths aren't obvious to the user.  Consider the
case where the include paths are injected via a tool such as pkg-config,
such as e.g.:

  gcc $(pkg-config --cflags glib-2.0) demo.c

This patch is an attempt at being more helpful for such cases.  Given that
the user can't proceed until the issue is resolved, I think it's reasonable
to default to telling the user as much as possible about what happened.
This patch list all of the search paths, and any close matches (e.g. for
misspellings).

Without the patch, the current behavior is:

misspelled-header-1.c:1:10: fatal error: test-header.hpp: No such file or 
directory
1 | #include "test-header.hpp"
  |  ^
compilation terminated.

With the patch, the user gets this output:

misspelled-header-1.c:1:10: fatal error: test-header.hpp: No such file or 
directory
1 | #include "test-header.hpp"
  |  ^
misspelled-header-1.c:1:10: note: paths searched:
misspelled-header-1.c:1:10: note:  path: ''
misspelled-header-1.c:1:10: note:   not found: 'test-header.hpp'
misspelled-header-1.c:1:10: note:   close match: 'test-header.h'
1 | #include "test-header.hpp"
  |  ^
  |  "test-header.h"
misspelled-header-1.c:1:10: note:  path: '/usr/include/glib-2.0' (via '-I')
misspelled-header-1.c:1:10: note:   not found: 
'/usr/include/glib-2.0/test-header.hpp'
misspelled-header-1.c:1:10: note:  path: '/usr/lib64/glib-2.0/include' (via 
'-I')
misspelled-header-1.c:1:10: note:   not found: 
'/usr/lib64/glib-2.0/include/test-header.hpp'
misspelled-header-1.c:1:10: note:  path: './include' (system directory)
misspelled-header-1.c:1:10: note:   not found: './include/test-header.hpp'
misspelled-header-1.c:1:10: note:  path: './include-fixed' (system directory)
misspelled-header-1.c:1:10: note:   not found: './include-fixed/test-header.hpp'
misspelled-header-1.c:1:10: note:  path: '/usr/local/include' (system directory)
misspelled-header-1.c:1:10: note:   not found: 
'/usr/local/include/test-header.hpp'
misspelled-header-1.c:1:10: note:  path: '/usr/include' (system directory)
misspelled-header-1.c:1:10: note:   not found: '/usr/include/test-header.hpp'
compilation terminated.

showing the paths that were tried, and why (e.g. the -I paths injected by
the pkg-config invocation), and the .hpp vs .h issue (with a fix-it hint).

It's verbose, but as I said above, the user can't proceed until they
resolve it, so I think being verbose is appropriate here.

Thoughts?
Dave

gcc/c-family/ChangeLog:
* c-common.c: Include , , and
"gcc-rich-location.h".
(maybe_suggest_misspelled_headers): New function.
(cb_on_missing_header): New function.
* c-common.h (cb_on_missing_header): New decl.
* c-lex.c (init_c_lex): Wire up cb_on_missing_header.

gcc/ChangeLog:
* diagnostic.c (diagnostic_action_after_output): Add "rich_loc"
param.
(diagnostic_action_after_output): When handling a fatal error,
call the rich_location's on_fatal_error vfunc.
(diagnostic_report_diagnostic): Pass the rich_location to
diagnostic_action_after_output.
* diagnostic.h (diagnostic_action_after_output): Add optional
rich_location * param.
* spellcheck.c (get_copy_of_closest_string): New function.
* spellcheck.h (get_copy_of_closest_string): New decl.

gcc/testsuite/ChangeLog:
* gcc.dg/cpp/misspelled-header-1.c: New test.
* gcc.dg/cpp/misspelled-header-2.c: New test.
* gcc.dg/cpp/misspelled-header-3.c: New test.

libcpp/ChangeLog:
* errors.c (cpp_errno_filename): Add overload taking a
rich_location *.
* files.c (open_file_failed): Add "start_dir" param.
(find_file_in_dir): Update for renaming of append_file_to_dir.
(_cpp_find_file): Pass start_dir to open_file_failed.
(class open_file_failed_location): New subclass of rich_location.
(open_file_failed): Add "start_dir" param.  If we're not
outputting dependencies, then use open_file_failed_location to
provide hints to the user on which paths were searched.
(append_file_to_dir): Rename to...
(cpp_append_file_to_dir): ...this.
(read_name_map): Update for renaming.
* include/cpplib.h (struct cpp_callbacks): Add on_missing_header.
(cpp_errno_filename): Convert final patam from source_location
to rich_location *.
(cpp_append_file_to_dir): New decl.
* include/line-map.h (rich_location::~rich_location): Make
virtual.
(rich_location::on_fatal_error): New vfunc.
---
 gcc/c-family/c-common.c| 85 ++
 gcc/c-family/c-common.h|  5 

libstdc++ PR54005M is_lock_free; consistently avoid referring to object

2018-11-11 Thread Hans-Peter Nilsson
This patch should have no visible effect.  It was approved in the BZ and
is what remains of PR54005.  _M_i is declared "alignas(_S_alignment) _Tp
_M_i;" and is_lock_free is supposed to refer to the *type* not the
specific *object*.  (Note how the actual address of the object is not
used in the __atomic_is_lock_free call.)  Better then also not refer to
the object indirectly like that, even though the alignas-declaration
*should* make it the same number.  Now I don't have to write tests to
assert that being true.

For an earlier version that also replaced __atomic_is_lock_free with
__atomic_always_lock_free I wrote some tests that are posted in the PR,
but as they aren't related to the effect of the patch anymore, it
doesn't seem useful to add them.

Belatedly committed.

PR libstdc++-v3/54005
* include/bits/atomic_base.h (__atomic_base<_TTp>::is_lock_free(),
__atomic_base<_PTp*>::is_lock_free()): Call __atomic_is_lock_free
with the type-derived _S_alignment instead of __alignof the object.
* include/std/atomic (atomic::is_lock_free()): Likewise.

Index: libstdc++-v3/include/std/atomic
===
--- libstdc++-v3/include/std/atomic (revision 265027)
+++ libstdc++-v3/include/std/atomic (working copy)
@@ -222,7 +222,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
// Produce a fake, minimally aligned pointer.
return __atomic_is_lock_free(sizeof(_M_i),
-   reinterpret_cast(-__alignof(_M_i)));
+   reinterpret_cast(-_S_alignment));
   }
 
   bool
@@ -230,7 +230,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
// Produce a fake, minimally aligned pointer.
return __atomic_is_lock_free(sizeof(_M_i),
-   reinterpret_cast(-__alignof(_M_i)));
+   reinterpret_cast(-_S_alignment));
   }
 
 #if __cplusplus >= 201703L
Index: libstdc++-v3/include/bits/atomic_base.h
===
--- libstdc++-v3/include/bits/atomic_base.h (revision 265027)
+++ libstdc++-v3/include/bits/atomic_base.h (working copy)
@@ -355,7 +355,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
// Use a fake, minimally aligned pointer.
return __atomic_is_lock_free(sizeof(_M_i),
-   reinterpret_cast(-__alignof(_M_i)));
+   reinterpret_cast(-_S_alignment));
   }
 
   bool
@@ -363,7 +363,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
// Use a fake, minimally aligned pointer.
return __atomic_is_lock_free(sizeof(_M_i),
-   reinterpret_cast(-__alignof(_M_i)));
+   reinterpret_cast(-_S_alignment));
   }
 
   _GLIBCXX_ALWAYS_INLINE void

brgds, H-P


Re: [PATCH 2/2] asm inline

2018-11-11 Thread Segher Boessenkool
On Sun, Nov 11, 2018 at 02:33:34PM -0700, Martin Sebor wrote:
> On 10/30/2018 11:30 AM, Segher Boessenkool wrote:
> >The Linux kernel people want a feature that makes GCC pretend some
> >inline assembler code is tiny (while it would think it is huge), so
> >that such code will be inlined essentially always instead of
> >essentially never.
> >
> >This patch lets you say "asm inline" instead of just "asm", with the
> >result that that inline assembler is always counted as minimum cost
> >for inlining.  It implements this for C and C++.
> 
> Rather than overloading the inline keyword I think it would be
> more in keeping both with the design of existing features to
> control inlining in GCC and with the way the languages are
> evolving to allow the always_inline (and perhaps also noinline)
> attribute to apply to asm statements.

We already "overloaded" the volatile and goto keywords here (it's not
overloading, the contexts are completely disjoint).

always_inline is a function attribute, and you want to make it a statement
attribute (which we do not currently support except for empty statements!)
as well.

> The inline keyword is commonly understood to be just a hint to
> the compiler.

That is exactly what it is here.  It hints the compiler that this asm
is cheap, whatever it may look like.

> The attributes, on the other hand, are recognized
> as binding requests to inline (if possible) or avoid inlining,
> respectively.

And that is not what this does (that would be hard to do, in fact).

> >+You can use @code{asm inline} instead of @code{asm} to have the assembler
> >+code counted as mimimum size for inlining purposes; @pxref{Size of an 
> >asm}.


Segher


Re: [PATCH] Ensure that dump calls are guarded with dump_enabled_p

2018-11-11 Thread Segher Boessenkool
On Sun, Nov 11, 2018 at 04:01:42PM -0500, David Malcolm wrote:
> On Sun, 2018-11-11 at 12:10 -0700, Martin Sebor wrote:
> > > +#define VERIFY_DUMP_ENABLED_P \
> > > +  do {   \
> > > +gcc_assert (dump_enabled_p ());  \
> > > +if (!dump_enabled_p ())  \
> > > +  return;\
> > > +  } while (0)
> > 
> > Since it behaves more like a function call (compound statement
> > to be exact) I would suggest to make VERIFY_DUMP_ENABLED_P
> > a function-like macro rather than object-like one.
> 
> FWIW it's not quite a function call: the "return" statement within it
> returns from the function it's been put inside.  Maybe that needs
> expressing in the name of the macro?  (all this depends on whether we
> want to return or abort, or both.  If we just want to return, I'd write
> that directly, without a macro)

Why would we want to return?  Is the assert testing something that is not
required?  Then why is it an assert?  If on the other hand it is vital,
just returning is terrible (and won't work like this for non-void
functions of course).

It should be either

#define VERIFY_DUMP_ENABLED_P gcc_assert (dump_enabled_p ())

or

#define VERIFY_DUMP_ENABLED_P do { if (!dump_enabled_p ()) abort (); } while (0)

(depending on if you want to abort only with ENABLE_ASSERT_CHECKING or
always).


Segher


Re: [PATCH 2/2] asm inline

2018-11-11 Thread Martin Sebor

On 10/30/2018 11:30 AM, Segher Boessenkool wrote:

The Linux kernel people want a feature that makes GCC pretend some
inline assembler code is tiny (while it would think it is huge), so
that such code will be inlined essentially always instead of
essentially never.

This patch lets you say "asm inline" instead of just "asm", with the
result that that inline assembler is always counted as minimum cost
for inlining.  It implements this for C and C++.


Rather than overloading the inline keyword I think it would be
more in keeping both with the design of existing features to
control inlining in GCC and with the way the languages are
evolving to allow the always_inline (and perhaps also noinline)
attribute to apply to asm statements.

The inline keyword is commonly understood to be just a hint to
the compiler.  The attributes, on the other hand, are recognized
as binding requests to inline (if possible) or avoid inlining,
respectively.

Martin




2018-10-30  Segher Boessenkool  

* doc/extend.texi (Using Assembly Language with C): Document asm inline.
(Size of an asm): Fix typo.  Document asm inline.
* gimple-pretty-print.c (dump_gimple_asm): Handle asm inline.
* gimple.h (enum gf_mask): Add GF_ASM_INLINE.
(gimple_asm_set_volatile): Fix typo.
* gimple_asm_inline_p: New.
* gimple_asm_set_inline: New.
* gimplify.c (gimplify_asm_expr): Propagate the asm inline flag from
tree to gimple.
* ipa-icf-gimple.c (func_checker::compare_gimple_asm): Compare the
gimple_asm_inline_p flag, too.
* tree-core.h (tree_base): Document that protected_flag is ASM_INLINE_P
in an ASM_EXPR.
* tree-inline.c (estimate_num_insns): If gimple_asm_inline_p return
a minimum size for an asm.
* tree.h (ASM_INLINE_P): New.

gcc/c/
* c-parser.c (c_parser_asm_statement): Detect the inline keyword
after asm.  Pass a flag for it to build_asm_expr.
* c-tree.h (build_asm_expr): Update declaration.
* c-typeck.c (build_asm_stmt): Add is_inline parameter.  Use it to
set ASM_INLINE_P.

gcc/cp/
* cp-tree.h (finish_asm_stmt): Update declaration.
* parser.c (cp_parser_asm_definition): Detect the inline keyword
after asm.  Pass a flag for it to finish_asm_stmt.
* pt.c (tsubst_expr): Pass the ASM_INLINE_P flag to finish_asm_stmt.
* semantics.c (finish_asm_stmt): Add inline_p parameter.  Use it to
set ASM_INLINE_P.

gcc/testsuite/
* c-c++-common/torture/asm-inline.c: New testcase.

---
 gcc/c/c-parser.c| 15 +--
 gcc/c/c-tree.h  |  3 +-
 gcc/c/c-typeck.c|  3 +-
 gcc/cp/cp-tree.h|  2 +-
 gcc/cp/parser.c | 15 ++-
 gcc/cp/pt.c |  2 +-
 gcc/cp/semantics.c  |  3 +-
 gcc/doc/extend.texi | 10 -
 gcc/gimple-pretty-print.c   |  2 +
 gcc/gimple.h| 24 ++-
 gcc/gimplify.c  |  1 +
 gcc/ipa-icf-gimple.c|  3 ++
 gcc/testsuite/c-c++-common/torture/asm-inline.c | 53 +
 gcc/tree-core.h |  3 ++
 gcc/tree-inline.c   |  3 ++
 gcc/tree.h  |  3 ++
 16 files changed, 133 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/torture/asm-inline.c

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index ce9921e..b28b712 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -6283,11 +6283,12 @@ c_parser_for_statement (c_parser *parser, bool ivdep, 
unsigned short unroll,
 }

 /* Parse an asm statement, a GNU extension.  This is a full-blown asm
-   statement with inputs, outputs, clobbers, and volatile and goto tag
-   allowed.
+   statement with inputs, outputs, clobbers, and volatile, inline, and goto
+   tags allowed.

asm-qualifier:
  type-qualifier
+ inline
  goto

asm-qualifier-list:
@@ -6315,7 +6316,7 @@ static tree
 c_parser_asm_statement (c_parser *parser)
 {
   tree quals, str, outputs, inputs, clobbers, labels, ret;
-  bool simple, is_goto;
+  bool simple, is_inline, is_goto;
   location_t asm_loc = c_parser_peek_token (parser)->location;
   int section, nsections;

@@ -6323,6 +6324,7 @@ c_parser_asm_statement (c_parser *parser)
   c_parser_consume_token (parser);

   quals = NULL_TREE;
+  is_inline = false;
   is_goto = false;
   for (bool done = false; !done; )
 switch (c_parser_peek_token (parser)->keyword)
@@ -6340,6 +6342,10 @@ c_parser_asm_statement (c_parser *parser)
c_parser_peek_token (parser)->value);
c_parser_consume_token (parser);
break;
+  case RID_INLINE:

Re: [PATCH] C/C++: add fix-it hints for missing '&' and '*' (PR c++/87850)

2018-11-11 Thread David Malcolm
On Sun, 2018-11-11 at 11:01 -0700, Martin Sebor wrote:
> On 11/10/2018 12:01 AM, Eric Gallager wrote:
> > On 11/9/18, David Malcolm  wrote:
> > > This patch adds a fix-it hint to various pointer-vs-non-pointer
> > > diagnostics, suggesting the addition of a leading '&' or '*'.
> > > 
> > > For example, note the ampersand fix-it hint in the following:
> > > 
> > > demo.c:5:22: error: invalid conversion from 'pthread_key_t' {aka
> > > 'unsigned
> > > int'}
> > >to 'pthread_key_t*' {aka 'unsigned int*'} [-fpermissive]
> > > 5 |   pthread_key_create(key, NULL);
> > >   |  ^~~
> > >   |  |
> > >   |  pthread_key_t {aka unsigned int}
> > >   |  &
> > 
> > Having both the type and the fixit underneath the caret looks kind
> > of confusing
> 
> I agree it's rather subtle.  Keeping the diagnostics separate from
> the suggested fix should avoid the confusion.

FWIW, the fix-it hint is in a different color (assuming that gcc is
invoked in an environment that prints that...)

Dave


Re: [PATCH] Ensure that dump calls are guarded with dump_enabled_p

2018-11-11 Thread David Malcolm
On Sun, 2018-11-11 at 12:10 -0700, Martin Sebor wrote:
> On 11/10/2018 07:57 PM, David Malcolm wrote:
> > On Mon, 2018-10-22 at 16:08 +0200, Richard Biener wrote:
> > > On Mon, 22 Oct 2018, David Malcolm wrote:
> > > 
> > > > On Mon, 2018-10-22 at 15:56 +0200, Richard Biener wrote:
> > > > [...snip...]
> > > > 
> > > > > This is what I finally applied for the original patch after
> > > > > fixing
> > > > > the above issue.
> > > > > 
> > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
> > > > > 
> > > > > Richard.
> > > > > 
> > > > > 2018-10-22  Richard Biener  
> > > > > 
> > > > >  * gimple-ssa-evrp-analyze.c
> > > > >  (evrp_range_analyzer::record_ranges_from_incoming_edge):
> > > > > Be
> > > > >  smarter about what ranges to use.
> > > > >  * tree-vrp.c (add_assert_info): Dump here.
> > > > >  (register_edge_assert_for_2): Instead of here at
> > > > > multiple
> > > > > but
> > > > >  not all places.
> > > > 
> > > > [...snip...]
> > > > 
> > > > > Index: gcc/tree-vrp.c
> > > > > =
> > > > > 
> > > > > ==
> > > > > --- gcc/tree-vrp.c  (revision 265381)
> > > > > +++ gcc/tree-vrp.c  (working copy)
> > > > > @@ -2299,6 +2299,9 @@ add_assert_info (vec
> > > > > 
> > > > > info.val = val;
> > > > > info.expr = expr;
> > > > > asserts.safe_push (info);
> > > > > +  dump_printf (MSG_NOTE | MSG_PRIORITY_INTERNALS,
> > > > > +  "Adding assert for %T from %T %s %T\n",
> > > > > +  name, expr, op_symbol_code (comp_code), val);
> > > > >   }
> > > > 
> > > > I think this dump_printf call needs to be wrapped in:
> > > >if (dump_enabled_p ())
> > > > since otherwise it does non-trivial work, which is then
> > > > discarded
> > > > for
> > > > the common case where dumping is disabled.
> > > > 
> > > > Alternatively, should dump_printf and dump_printf_loc test have
> > > > an
> > > > early-reject internally for that?
> > > 
> > > Oh, I thought it had one - at least the "old" implementation
> > > did nothing expensive so if (dump_enabled_p ()) was just used
> > > to guard multiple printf stmts, avoiding multiple no-op calls.
> > > 
> > > Did you check that all existing dump_* calls are wrapped inside
> > > a dump_enabled_p () region?  If so I can properly guard the
> > > above.
> > > Otherwise I think we should restore previous expectation?
> > > 
> > > Richard.
> > 
> > Here's a patch to address the above.
> > 
> > If called when !dump_enabled_p, the dump_* functions effectively do
> > nothing, but as of r263178 this doing "nothing" involves non-
> > trivial
> > work internally.
> > 
> > I wasn't sure whether the dump_* functions should assert that
> >   dump_enabled_p ()
> > is true when they're called, or if they should bail out immediately
> > for this case, so in this patch I implemented both, so that we get
> > an assertion failure, and otherwise bail out for the case where
> > !dump_enabled_p when assertions are disabled.
> > 
> > Alternatively, we could remove the assertion, and simply have the
> > dump_* functions immediately bail out.
> > 
> > Richard, do you have a preference?
> > 
> > The patch also fixes all of the places I found during testing
> > (on x86_64-pc-linux-gnu) that call into dump_* but which
> > weren't guarded by
> >   if (dump_enabled_p ())
> > The patch adds such conditionals.
> > 
> > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > 
> > OK for trunk?
> > 
> > gcc/ChangeLog:
> > * dumpfile.c (VERIFY_DUMP_ENABLED_P): New macro.
> > (dump_gimple_stmt): Use it.
> > (dump_gimple_stmt_loc): Likewise.
> > (dump_gimple_expr): Likewise.
> > (dump_gimple_expr_loc): Likewise.
> > (dump_generic_expr): Likewise.
> > (dump_generic_expr_loc): Likewise.
> > (dump_printf): Likewise.
> > (dump_printf_loc): Likewise.
> > (dump_dec): Likewise.
> > (dump_dec): Likewise.
> > (dump_hex): Likewise.
> > (dump_symtab_node): Likewise.
> > 
> > gcc/ChangeLog:
> > * gimple-loop-interchange.cc
> > (tree_loop_interchange::interchange):
> > Guard dump call with dump_enabled_p.
> > * graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl):
> > Likewise.
> > * graphite-optimize-isl.c (optimize_isl): Likewise.
> > * graphite.c (graphite_transform_loops): Likewise.
> > * tree-loop-distribution.c (pass_loop_distribution::execute):
> > Likewise.
> > * tree-parloops.c (parallelize_loops): Likewise.
> > * tree-ssa-loop-niter.c (number_of_iterations_exit): Likewise.
> > * tree-vect-data-refs.c (vect_analyze_group_access_1):
> > Likewise.
> > (vect_prune_runtime_alias_test_list): Likewise.
> > * tree-vect-loop.c (vect_update_vf_for_slp): Likewise.
> > (vect_estimate_min_profitable_iters): Likewise.
> > * tree-vect-slp.c (vect_record_max_nunits): Likewise.
> > (vect_build_slp_tree_2): Likewise.
> > (vect_supported_load_permutation_p): Likewise.
> > 

Re: [RFT PATCH, i386]: Fix PR87928, ICE in ix86_compute_frame_layout

2018-11-11 Thread Uros Bizjak
On Fri, Nov 9, 2018 at 7:37 PM Iain Sandoe  wrote:

> Bootstrap succeeds and the new test passes for Darwin.
>
> This does not alter that Darwin has breakage in this area 
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78444)
> see : https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00884.html and HJ’s reply.

You can't use crtl->is_leaf in ix86_update_stack_boundary, since that
function gets called from cfgexpand, while crtl->is_leaf is set only
in IRA pass.

I *think* the fix should be along the lines of TARGET_64BIT_MS_ABI
fixup in ix86_compute_frame_layout (BTW: the existing fixup is strange
by itself, since TARGET_64BIT_MS_ABI declares STACK_BOUNDARY to 128,
and I can't see how leaf functions with crtl->preferred_stack_boundary
< 128 survive "gcc_assert (preferred_alignment >= STACK_BOUNDARY /
BITS_PER_UNIT);" a couple of lines below).

So, I think that fixup you proposed in the patch is in the right
direction. What happens if you add TARGET_MACHO to the existing fixup?

Uros.


Re: [PATCH] Ensure that dump calls are guarded with dump_enabled_p

2018-11-11 Thread Martin Sebor

On 11/10/2018 07:57 PM, David Malcolm wrote:

On Mon, 2018-10-22 at 16:08 +0200, Richard Biener wrote:

On Mon, 22 Oct 2018, David Malcolm wrote:


On Mon, 2018-10-22 at 15:56 +0200, Richard Biener wrote:
[...snip...]


This is what I finally applied for the original patch after
fixing
the above issue.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-10-22  Richard Biener  

 * gimple-ssa-evrp-analyze.c
 (evrp_range_analyzer::record_ranges_from_incoming_edge): Be
 smarter about what ranges to use.
 * tree-vrp.c (add_assert_info): Dump here.
 (register_edge_assert_for_2): Instead of here at multiple
but
 not all places.


[...snip...]


Index: gcc/tree-vrp.c
=
==
--- gcc/tree-vrp.c  (revision 265381)
+++ gcc/tree-vrp.c  (working copy)
@@ -2299,6 +2299,9 @@ add_assert_info (vec 
info.val = val;
info.expr = expr;
asserts.safe_push (info);
+  dump_printf (MSG_NOTE | MSG_PRIORITY_INTERNALS,
+  "Adding assert for %T from %T %s %T\n",
+  name, expr, op_symbol_code (comp_code), val);
  }


I think this dump_printf call needs to be wrapped in:
   if (dump_enabled_p ())
since otherwise it does non-trivial work, which is then discarded
for
the common case where dumping is disabled.

Alternatively, should dump_printf and dump_printf_loc test have an
early-reject internally for that?


Oh, I thought it had one - at least the "old" implementation
did nothing expensive so if (dump_enabled_p ()) was just used
to guard multiple printf stmts, avoiding multiple no-op calls.

Did you check that all existing dump_* calls are wrapped inside
a dump_enabled_p () region?  If so I can properly guard the above.
Otherwise I think we should restore previous expectation?

Richard.


Here's a patch to address the above.

If called when !dump_enabled_p, the dump_* functions effectively do
nothing, but as of r263178 this doing "nothing" involves non-trivial
work internally.

I wasn't sure whether the dump_* functions should assert that
  dump_enabled_p ()
is true when they're called, or if they should bail out immediately
for this case, so in this patch I implemented both, so that we get
an assertion failure, and otherwise bail out for the case where
!dump_enabled_p when assertions are disabled.

Alternatively, we could remove the assertion, and simply have the
dump_* functions immediately bail out.

Richard, do you have a preference?

The patch also fixes all of the places I found during testing
(on x86_64-pc-linux-gnu) that call into dump_* but which
weren't guarded by
  if (dump_enabled_p ())
The patch adds such conditionals.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
* dumpfile.c (VERIFY_DUMP_ENABLED_P): New macro.
(dump_gimple_stmt): Use it.
(dump_gimple_stmt_loc): Likewise.
(dump_gimple_expr): Likewise.
(dump_gimple_expr_loc): Likewise.
(dump_generic_expr): Likewise.
(dump_generic_expr_loc): Likewise.
(dump_printf): Likewise.
(dump_printf_loc): Likewise.
(dump_dec): Likewise.
(dump_dec): Likewise.
(dump_hex): Likewise.
(dump_symtab_node): Likewise.

gcc/ChangeLog:
* gimple-loop-interchange.cc (tree_loop_interchange::interchange):
Guard dump call with dump_enabled_p.
* graphite-isl-ast-to-gimple.c (graphite_regenerate_ast_isl): Likewise.
* graphite-optimize-isl.c (optimize_isl): Likewise.
* graphite.c (graphite_transform_loops): Likewise.
* tree-loop-distribution.c (pass_loop_distribution::execute): Likewise.
* tree-parloops.c (parallelize_loops): Likewise.
* tree-ssa-loop-niter.c (number_of_iterations_exit): Likewise.
* tree-vect-data-refs.c (vect_analyze_group_access_1): Likewise.
(vect_prune_runtime_alias_test_list): Likewise.
* tree-vect-loop.c (vect_update_vf_for_slp): Likewise.
(vect_estimate_min_profitable_iters): Likewise.
* tree-vect-slp.c (vect_record_max_nunits): Likewise.
(vect_build_slp_tree_2): Likewise.
(vect_supported_load_permutation_p): Likewise.
(vect_slp_analyze_operations): Likewise.
(vect_slp_analyze_bb_1): Likewise.
(vect_slp_bb): Likewise.
* tree-vect-stmts.c (vect_analyze_stmt): Likewise.
* tree-vectorizer.c (try_vectorize_loop_1): Likewise.
(pass_slp_vectorize::execute): Likewise.
(increase_alignment): Likewise.
---
 gcc/dumpfile.c   | 25 
 gcc/gimple-loop-interchange.cc   |  2 +-
 gcc/graphite-isl-ast-to-gimple.c | 11 --
 gcc/graphite-optimize-isl.c  | 36 ++---
 gcc/graphite.c   |  3 +-
 gcc/tree-loop-distribution.c |  9 +++--
 gcc/tree-parloops.c  | 17 
 gcc/tree-ssa-loop-niter.c|  2 +-
 gcc/tree-vect-data-refs.c| 10 +++--
 

[doc, committed] document probability associated with __builtin_expect

2018-11-11 Thread Sandra Loosemore
I've checked in this patch to fix PR26366, another small but ancient 
documentation bug.


-Sandra
2018-11-11  Sandra Loosemore  

	PR c/26366

	gcc/
	* doc/extend.texi (Other Builtins): Document probability associated
	with __builtin_expect.
Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi	(revision 266016)
+++ gcc/doc/extend.texi	(working copy)
@@ -12121,6 +12121,12 @@ if (__builtin_expect (ptr != NULL, 1))
 
 @noindent
 when testing pointer or floating-point values.
+
+For the purposes of branch prediction optimizations, the probability that
+a @code{__builtin_expect} expression is true is controlled by GCC's
+@code{builtin-expect-probability} parameter, which defaults to 90%.  
+You can also use @code{__builtin_expect_with_probability} to explicitly 
+assign a probability value to individual expressions.
 @end deftypefn
 
 @deftypefn {Built-in Function} long __builtin_expect_with_probability


Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-11 Thread Martin Sebor

One other high-level comment: a more powerful interface to
variable tracing than annotating declarations in the source
would be to provide either the names of the symbols to trace
on the command line or in an external file.  That way tracing
could be enabled for objects and types declared in read-only
files (such as system headers), and would let the user more
easily experiment with annotations.

This could be in addition to the attributes, and would require
coming up with a way of identifying symbols with internal or
no linkage, such as local variables, and perhaps also function
arguments, return values, etc., if this mechanisms were to
provide access to those as well (I think it would be fine if
this "external" mechanism provided support to only a subset
of symbols).

Martin

On 11/04/2018 12:32 AM, Andi Kleen wrote:

From: Andi Kleen 

Add a new pass to automatically instrument changes to variables
with the new PTWRITE instruction on x86. PTWRITE writes a 4 or 8 byte
field into an Processor Trace log, which allows log over head
logging of informatin.

This allows to reconstruct how values later, which can be useful for
debugging or other analysis of the program behavior. With the compiler
support this can be done with without having to manually add instrumentation
to the code.

Using dwarf information this can be later mapped back to the variables.

There are new options to enable instrumentation for different types,
and also a new attribute to control analysis fine grained per
function or variable level. The attributes can be set on both
the variable and the type level, and also on structure fields.
This allows to enable tracing only for specific code in large
programs.

The pass is generic, but only the x86 backend enables the necessary
hooks. When the backend enables the necessary hooks (with -mptwrite)
there is an additional pass that looks through the code for
attribute vartrace enabled functions or variables.

The -fvartrace-locals options is experimental: it works, but it
generates redundant ptwrites because the pass doesn't use
the SSA information to minimize instrumentation. This could be optimized
later.

Currently the code can be tested with SDE, or on a Intel
Gemini Lake system with a new enough Linux kernel (v4.10+)
that supports PTWRITE for PT. Linux perf can be used to
record the values

perf record -e intel_pt/ptw=1,branch=0/ program
perf script --itrace=crw -F +synth ...

I have an experimential version of perf that can also use
dwarf information to symbolize many[1] values back to their variable
names. So far it is not in standard perf, but available at

https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=perf/var-resolve-4

It is currently not able to decode all variable locations to names,
but a large subset.

Longer term hopefully gdb will support this information too.

The CPU can potentially generate very data high bandwidths when
code doing a lot of computation is heavily instrumented.
This can cause some data loss in both the CPU and also in perf
logging the data when the disk cannot keep up.

Running some larger workloads most workloads do not cause
CPU level overflows, but I've seen it with -fvartrace
with crafty, and with more workloads with -fvartrace-locals.

Recommendation is to not fully instrument programs,
but only areas of interest either at the file level or using
the attributes.

The other thing is that perf and the disk often cannot keep up
with the data bandwidth for longer computations. In this case
it's possible to use perf snapshot mode (add --snapshot
to the command line above). The data will be only logged to
a memory ring buffer then, and only dump the buffers on events
of interest by sending SIGUSR2 to the perf binrary.

In the future this will be hopefully better supported with
core files and gdb.

Passes bootstrap and test suite on x86_64-linux, also
bootstrapped and tested gcc itself with full -fvartrace
and -fvartrace-locals instrumentation.

gcc/:

2018-11-03  Andi Kleen  

* Makefile.in: Add tree-vartrace.o.
* common.opt: Add -fvartrace, -fvartrace-returns,
-fvartrace-args, -fvartrace-reads, -fvartrace-writes,
-fvartrace-locals
* config/i386/i386.c (ix86_vartrace_func): Add.
(TARGET_VARTRACE_FUNC): Add.
* doc/extend.texi: Document vartrace/no_vartrace
attributes.
* doc/invoke.texi: Document -fvartrace, -fvartrace-returns,
-fvartrace-args, -fvartrace-reads, -fvartrace-writes,
-fvartrace-locals
* doc/tm.texi (TARGET_VARTRACE_FUNC): Add.
* passes.def: Add vartrace pass.
* target.def (vartrace_func): Add.
* tree-pass.h (make_pass_vartrace): Add.
* tree-vartrace.c: New file to implement vartrace pass.

gcc/c-family/:

2018-11-03  Andi Kleen  

* c-attribs.c (handle_vartrace_attribute): New function.

config/:

2018-11-03  Andi Kleen  

* bootstrap-vartrace.mk: New.
* 

Re: [PATCH] C/C++: add fix-it hints for missing '&' and '*' (PR c++/87850)

2018-11-11 Thread Martin Sebor

On 11/10/2018 12:01 AM, Eric Gallager wrote:

On 11/9/18, David Malcolm  wrote:

This patch adds a fix-it hint to various pointer-vs-non-pointer
diagnostics, suggesting the addition of a leading '&' or '*'.

For example, note the ampersand fix-it hint in the following:

demo.c:5:22: error: invalid conversion from 'pthread_key_t' {aka 'unsigned
int'}
   to 'pthread_key_t*' {aka 'unsigned int*'} [-fpermissive]
5 |   pthread_key_create(key, NULL);
  |  ^~~
  |  |
  |  pthread_key_t {aka unsigned int}
  |  &


Having both the type and the fixit underneath the caret looks kind of confusing


I agree it's rather subtle.  Keeping the diagnostics separate from
the suggested fix should avoid the confusion.

Martin


Re: [PATCH 3/3] OpenACC 2.6 manual deep copy support (attach/detach)

2018-11-11 Thread Bernhard Reutner-Fischer
On Sat, 10 Nov 2018 09:11:20 -0800
Julian Brown  wrote:

Two nits.

> diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
> index 6430e61..ebba7ca 100644
> --- a/gcc/fortran/openmp.c
> +++ b/gcc/fortran/openmp.c

> @@ -3781,9 +3805,6 @@ check_array_not_assumed (gfc_symbol *sym, locus loc, 
> const char *name)
>  static void
>  resolve_oacc_data_clauses (gfc_symbol *sym, locus loc, const char *name)
>  {
> -  if (sym->ts.type == BT_DERIVED && sym->attr.allocatable)
> -gfc_error ("ALLOCATABLE object %qs of derived type in %s clause at %L",
> -sym->name, name, );
>if ((sym->ts.type == BT_ASSUMED && sym->attr.allocatable)
>|| (sym->ts.type == BT_CLASS && CLASS_DATA (sym)
> && CLASS_DATA (sym)->attr.allocatable))
> @@ -4153,11 +4174,23 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses 
> *omp_clauses,
>   && (list != OMP_LIST_REDUCTION || !openacc))
>for (n = omp_clauses->lists[list]; n; n = n->next)
>   {
> -   if (n->sym->mark)
> - gfc_error ("Symbol %qs present on multiple clauses at %L",
> -n->sym->name, >where);
> -   else
> - n->sym->mark = 1;
> +   bool array_only_p = true;
> +   /* Disallow duplicate bare variable references and multiple
> +  subarrays of the same array here, but allow multiple components of
> +  the same (e.g. derived-type) variable.  For the latter, duplicate
> +  components are detected elsewhere.  */
> +   if (openacc && n->expr && n->expr->expr_type == EXPR_VARIABLE)
> + for (gfc_ref *ref = n->expr->ref; ref; ref = ref->next)
> +   if (ref->type != REF_ARRAY)
> + array_only_p = false;

you could break here.
It seems we do not perform this optimization ourself even when we
could (or should) prove that the dataflow does not force the loop to run
to completion?!

Consider:

int foo(char *str) {
_Bool cond = 0; 
for (char *chp = str; *chp != 0; chp++) 
if (*chp == 42) {   
cond = 1;   
#ifdef BREAK
break;  
#endif  
}   
return 0x0815 + cond;   
}

$ for i in 0 1 2 3 s;do gcc -UBREAK -O$i -c -o $i.o /tmp/1/foo.c  
-fomit-frame-pointer -m32 -mtune=i386;done;size ?.o
   textdata bss dec hex filename
109   0   0 109  6d 0.o
 97   0   0  97  61 1.o
107   0   0 107  6b 2.o
107   0   0 107  6b 3.o
 86   0   0  86  56 s.o
$ for i in 0 1 2 3 s;do gcc -DBREAK -O$i -c -o $i.o /tmp/1/foo.c  
-fomit-frame-pointer -m32 -mtune=i386;done;size ?.o
   textdata bss dec hex filename
111   0   0 111  6f 0.o
 82   0   0  82  52 1.o
 82   0   0  82  52 2.o
 82   0   0  82  52 3.o
 77   0   0  77  4d s.o

respectively

$ for i in 0 1 2 3 s;do gcc -UBREAK -O$i -c -o $i.o /tmp/1/foo.c  
-fomit-frame-pointer -m32 -mtune=i386 -falign-functions=1 -falign-jumps=1 
-falign-labels=1 -falign-loops=1 -fno-unwind-tables 
-fno-asynchronous-unwind-tables -fno-guess-branch-probability;done;size ?.o
   textdata bss dec hex filename
 61   0   0  61  3d 0.o
 38   0   0  38  26 1.o
 43   0   0  43  2b 2.o
 43   0   0  43  2b 3.o
 34   0   0  34  22 s.o
$ for i in 0 1 2 3 s;do gcc -DBREAK -O$i -c -o $i.o /tmp/1/foo.c  
-fomit-frame-pointer -m32 -mtune=i386 -falign-functions=1 -falign-jumps=1 
-falign-labels=1 -falign-loops=1 -fno-unwind-tables 
-fno-asynchronous-unwind-tables -fno-guess-branch-probability;done;size ?.o
   textdata bss dec hex filename
 63   0   0  63  3f 0.o
 39   0   0  39  27 1.o
 39   0   0  39  27 2.o
 39   0   0  39  27 3.o
 33   0   0  33  21 s.o

that's really a pity.


> +   if (array_only_p)
> + {
> +   if (n->sym->mark)
> + gfc_error ("Symbol %qs present on multiple clauses at %L",
> +n->sym->name, >where);
> +   else
> + n->sym->mark = 1;
> + }
>   }

> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index 274edc0..aa7723d 100644
> --- a/gcc/gimplify.c
> +++ 

Simplify enumerate and array types

2018-11-11 Thread Jan Hubicka
Hi,
this patch should be last patch for type simplification (modulo possible bits
that needs clearing I still notice).  It does the following
 1) enables the patch to simplify aggregates also for enums.
While this does not affect C++, in C we want pointers to incomplete
and complete variant of enums the same.
 2) turns arrays to arrays of incomplete type. This is used for pointers
to arrays
 3) turns arrays to arrays of simplified type. This is used for arrays of
pointers to agreggates.

The patch extends fld_type_variant_equal_p and fld_type_variant to live with
fact that arrays may have different TREE_TYPE within the TYPE_NEXT_VARIANT
lists and also introduced simplified_type_hash that holds the simplified arrays.
If build_array_type_1 was able to share arrays with TYPELESS_STORAGE flag
this hash would not be necessary. But I am unsure why that is done in there.

With this there are cca 9k types in Firefox (out of 80k) with duplicates and
20k type duplicates overall.  I inspected manually some of them and the reasons
can be tracked down to differences in VAR_DECL flags for vtables and
inherently by referring such types by TYPE_CONTEXT or as a field.
I will try to cleanup visibilities next stage1, but I think practically this
is quite non-critical as it would be very hard to make number of copies to
explide (there is never more than 3 types in the duplicate chain for Firefox
and we used to have more than 4k duplicates of single type before the changes)

Bootstrapped/regtested x86_64-linux.
OK?

Honza
* ipa-devirt.c (add_type_duplicate): Do not ICE on incomplete enums.
* tree.c (build_array_type_1): Forward declare.
(fld_type_variant_equal_p): Add INNER_TYPE parameter.
(fld_type_variant): Likewise.
(fld_simplified_types): New hash.
(fld_incomplete_type_of): Handle array and enumeration types.
(fld_simplified_type): Handle simplification of arrays.
(free_lang_data): Allocate and free simplified types hash.
Index: ipa-devirt.c
===
--- ipa-devirt.c(revision 266011)
+++ ipa-devirt.c(working copy)
@@ -1705,7 +1705,8 @@ add_type_duplicate (odr_type val, tree t
   else if (!COMPLETE_TYPE_P (val->type) && COMPLETE_TYPE_P (type))
 {
   prevail = true;
-  build_bases = TYPE_BINFO (type);
+  if (TREE_CODE (type) == RECORD_TYPE)
+build_bases = TYPE_BINFO (type);
 }
   else if (COMPLETE_TYPE_P (val->type) && !COMPLETE_TYPE_P (type))
 ;
Index: tree.c
===
--- tree.c  (revision 266011)
+++ tree.c  (working copy)
@@ -265,6 +265,8 @@ static void print_type_hash_statistics (
 static void print_debug_expr_statistics (void);
 static void print_value_expr_statistics (void);
 
+static tree build_array_type_1 (tree, tree, bool, bool);
+
 tree global_trees[TI_MAX];
 tree integer_types[itk_none];
 
@@ -5108,10 +5110,11 @@ fld_simplified_type_name (tree type)
 
 /* Do same comparsion as check_qualified_type skipping lang part of type
and be more permissive about type names: we only care that names are
-   same (for diagnostics) and that ODR names are the same.  */
+   same (for diagnostics) and that ODR names are the same.
+   If INNER_TYPE is non-NULL, be sure that TREE_TYPE match it.  */
 
 static bool
-fld_type_variant_equal_p (tree t, tree v)
+fld_type_variant_equal_p (tree t, tree v, tree inner_type)
 {
   if (TYPE_QUALS (t) != TYPE_QUALS (v)
   /* We want to match incomplete variants with complete types.
@@ -5121,21 +5124,24 @@ fld_type_variant_equal_p (tree t, tree v
  || TYPE_USER_ALIGN (t) != TYPE_USER_ALIGN (v)))
   || fld_simplified_type_name (t) != fld_simplified_type_name (v)
   || !attribute_list_equal (TYPE_ATTRIBUTES (t),
-   TYPE_ATTRIBUTES (v)))
+   TYPE_ATTRIBUTES (v))
+  || (inner_type && TREE_TYPE (v) != inner_type))
 return false;
- 
+
   return true;
 }
 
-/* Find variant of FIRST that match T and create new one if necessary.  */
+/* Find variant of FIRST that match T and create new one if necessary.
+   Set TREE_TYPE to INNER_TYPE if non-NULL.  */
 
 static tree
-fld_type_variant (tree first, tree t, struct free_lang_data_d *fld)
+fld_type_variant (tree first, tree t, struct free_lang_data_d *fld,
+ tree inner_type = NULL)
 {
   if (first == TYPE_MAIN_VARIANT (t))
 return t;
   for (tree v = first; v; v = TYPE_NEXT_VARIANT (v))
-if (fld_type_variant_equal_p (t, v))
+if (fld_type_variant_equal_p (t, v, inner_type))
   return v;
   tree v = build_variant_type_copy (first);
   TYPE_READONLY (v) = TYPE_READONLY (t);
@@ -5153,7 +5159,9 @@ fld_type_variant (tree first, tree t, st
   SET_TYPE_ALIGN (v, TYPE_ALIGN (t));
   TYPE_USER_ALIGN (v) = TYPE_USER_ALIGN (t);
 }
-  gcc_checking_assert (fld_type_variant_equal_p (t,v));

[patch, fortran] Fix PR 70260, ICE on invalid

2018-11-11 Thread Thomas Koenig

Hello world,

the attached patch fixes both ICEs in the PR by adding some tests.
It was necessary to shuffle around a bit of code, plus to make sure that
double error reporting did not become too bad.

Regression-tested. OK for trunk?

Regards

Thomas


2018-11-11  Thomas Koenig  

PR fortran/70260
* expr.c (gfc_check_assign): Reject assigning to an external
symbol.
(gfc_check_pointer_assign): Add suppress_type_test
argument. Insert line after if. A non-proc pointer can not point
to a constant.  Only check types if suppress_type_test is false.
* gfortran.h (gfc_check_pointer_assign): Add optional
suppress_type_test argument.
* resolve.c (gfc_resolve_code):  Move up gfc_check_pointer_assign
and give it the extra argument.
(resolve_fl_procedure): Set error on value for a function with
an inizializer.

2018-11-11  Thomas Koenig  

PR fortran/70260
* gfortran.dg/proc_ptr_result_5.f90:  Add dg-error directive.
* gfortran.dg/protected_4.f90: Split line to allow for extra error.
* gfortran.dg/protected_6.f90: Likewise.
* gfortran.dg/assign_11.f90: New test.
* gfortran.dg/pointer_assign_12.f90: New test.
Index: fortran/expr.c
===
--- fortran/expr.c	(Revision 265732)
+++ fortran/expr.c	(Arbeitskopie)
@@ -3507,6 +3507,18 @@ gfc_check_assign (gfc_expr *lvalue, gfc_expr *rval
 	  return false;
 	}
 }
+  else
+{
+  /* Reject assigning to an external symbol.  For initializers, this
+	 was already done before, in resolve_fl_procedure.  */
+  if (sym->attr.flavor == FL_PROCEDURE && sym->attr.external
+	  && sym->attr.proc != PROC_MODULE && !rvalue->error)
+	{
+	  gfc_error ("Illegal assignment to external procedure at %L",
+		 >where);
+	  return false;
+	}
+}
 
   if (rvalue->rank != 0 && lvalue->rank != rvalue->rank)
 {
@@ -3643,7 +3655,8 @@ gfc_check_assign (gfc_expr *lvalue, gfc_expr *rval
NULLIFY statement.  */
 
 bool
-gfc_check_pointer_assign (gfc_expr *lvalue, gfc_expr *rvalue)
+gfc_check_pointer_assign (gfc_expr *lvalue, gfc_expr *rvalue,
+			  bool suppress_type_test)
 {
   symbol_attribute attr, lhs_attr;
   gfc_ref *ref;
@@ -3771,6 +3784,7 @@ bool
 		 >where);
 	  return false;
 	}
+
   if (rvalue->expr_type == EXPR_VARIABLE && !attr.proc_pointer)
 	{
   	  /* Check for intrinsics.  */
@@ -3967,6 +3981,16 @@ bool
 
   return true;
 }
+  else
+{
+  /* A non-proc pointer cannot point to a constant.  */
+  if (rvalue->expr_type == EXPR_CONSTANT)
+	{
+	  gfc_error_now ("Pointer assignment target cannot be a constant at %L",
+			 >where);
+	  return false;
+	}
+}
 
   if (!gfc_compare_types (>ts, >ts))
 {
@@ -3980,7 +4004,7 @@ bool
 		   "polymorphic, or of a type with the BIND or SEQUENCE "
 		   "attribute, to be compatible with an unlimited "
 		   "polymorphic target", >where);
-  else
+  else if (!suppress_type_test)
 	gfc_error ("Different types in pointer assignment at %L; "
 		   "attempted assignment of %s to %s", >where,
 		   gfc_typename (>ts),
Index: fortran/gfortran.h
===
--- fortran/gfortran.h	(Revision 265732)
+++ fortran/gfortran.h	(Arbeitskopie)
@@ -3219,7 +3219,8 @@ int gfc_kind_max (gfc_expr *, gfc_expr *);
 
 bool gfc_check_conformance (gfc_expr *, gfc_expr *, const char *, ...) ATTRIBUTE_PRINTF_3;
 bool gfc_check_assign (gfc_expr *, gfc_expr *, int, bool c = true);
-bool gfc_check_pointer_assign (gfc_expr *, gfc_expr *);
+bool gfc_check_pointer_assign (gfc_expr *lvalue, gfc_expr *rvalue,
+  bool suppres_type_test = false);
 bool gfc_check_assign_symbol (gfc_symbol *, gfc_component *, gfc_expr *);
 
 gfc_expr *gfc_build_default_init_expr (gfc_typespec *, locus *);
Index: fortran/resolve.c
===
--- fortran/resolve.c	(Revision 265732)
+++ fortran/resolve.c	(Arbeitskopie)
@@ -11420,11 +11420,12 @@ start:
 	  t = gfc_check_vardef_context (e, false, false, false,
 	_("pointer assignment"));
 	gfc_free_expr (e);
+
+	t = gfc_check_pointer_assign (code->expr1, code->expr2, !t) && t;
+
 	if (!t)
 	  break;
 
-	gfc_check_pointer_assign (code->expr1, code->expr2);
-
 	/* Assigning a class object always is a regular assign.  */
 	if (code->expr2->ts.type == BT_CLASS
 		&& code->expr1->ts.type == BT_CLASS
@@ -12540,6 +12541,9 @@ resolve_fl_procedure (gfc_symbol *sym, int mp_flag
 {
   gfc_error ("Function %qs at %L cannot have an initializer",
 		 sym->name, >declared_at);
+
+  /* Make sure no second error is issued for this.  */
+  sym->value->error = 1;
   return false;
 }
 
Index: testsuite/gfortran.dg/proc_ptr_result_5.f90
===
--- 

[PATCH, csky] Support -profile

2018-11-11 Thread 瞿仙淼
Hi, 
I have submitted a patch to support ‘-profile' for C-SKY


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 266012)
+++ gcc/ChangeLog   (working copy)
@@ -1,5 +1,9 @@
 2018-11-11  Xianmiao Qu  
 
+   * config/csky/csky-linux-elf.h (CC1_SPEC): Support -profile.
+
+2018-11-11  Xianmiao Qu  
+
* config/csky/csky.h (ASM_PREFERRED_EH_DATA_FORMAT): Define.
 
 2018-11-11  Richard Biener  
Index: gcc/config/csky/csky-linux-elf.h
===
--- gcc/config/csky/csky-linux-elf.h(revision 266011)
+++ gcc/config/csky/csky-linux-elf.h(working copy)
@@ -35,6 +35,7 @@
 #define CC1_SPEC  \
   "%{EB:-EB} \
%{EL:-EL} \
+   %{profile:-p}  \
   "
 
 #undef ASM_SPEC





[PATCH, csky] Define ASM_PREFERRED_EH_DATA_FORMAT

2018-11-11 Thread 瞿仙淼
Hi, 
I have submitted a patch to Define ASM_PREFERRED_EH_DATA_FORMAT for 
C-SKY


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 266011)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2018-11-11  Xianmiao Qu  
+
+   * config/csky/csky.h (ASM_PREFERRED_EH_DATA_FORMAT): Define.
+
 2018-11-11  Richard Biener  
 
* tree-vrp.h (class value_range_base): New base class for
Index: gcc/config/csky/csky.h
===
--- gcc/config/csky/csky.h  (revision 266011)
+++ gcc/config/csky/csky.h  (working copy)
@@ -292,6 +292,9 @@ extern int csky_arch_isa_features[];
 /* The register that holds the return address in exception handlers.  */
 #define EH_RETURN_STACKADJ_RTX gen_rtx_REG (SImode, CSKY_EH_STACKADJ_REGNUM)
 
+/* Select a format to encode pointers in exception handling data.  */
+#define ASM_PREFERRED_EH_DATA_FORMAT(CODE, GLOBAL) \
+  (((GLOBAL) ? DW_EH_PE_indirect : 0) | DW_EH_PE_pcrel | DW_EH_PE_sdata4)
 
 /* Registers That Address the Stack Frame  */
 




Re: [PATCH] MIPS: Add `-mfix-r5900' option for the R5900 short loop erratum

2018-11-11 Thread Fredrik Noring
Hi Matthew,

> [ As mentioned in the commit message, the test was done with GCC 8.2.0
> since there was an unrelated problem compiling some sanitizer component
> at the time. I will try to test a current GCC version during the weekend. ]

The compilation problem, more specifically, is

In file included from 
../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc:20:
../../../../libsanitizer/sanitizer_common/sanitizer_internal_defs.h:339:72: 
error: narrowing conversion of ‘-1’ from ‘int’ to ‘unsigned int’  [-Wnarrowing]
  339 | typedef char IMPL_PASTE(assertion_failed_##_, line)[2*(int)(pred)-1]
  |^
../../../../libsanitizer/sanitizer_common/sanitizer_internal_defs.h:333:30: 
note: in expansion of macro ‘IMPL_COMPILER_ASSERT’
  333 | #define COMPILER_CHECK(pred) IMPL_COMPILER_ASSERT(pred, __LINE__)
  |  ^~~~
../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc:71:1:
 note: in expansion of macro ‘COMPILER_CHECK’
   71 | COMPILER_CHECK(struct_kernel_stat_sz == sizeof(struct stat));
  | ^~
../../../../libsanitizer/sanitizer_common/sanitizer_internal_defs.h:339:72: 
warning: size of array is not an integral constant-expression [-Wpedantic]
  339 | typedef char IMPL_PASTE(assertion_failed_##_, line)[2*(int)(pred)-1]
  |^
../../../../libsanitizer/sanitizer_common/sanitizer_internal_defs.h:333:30: 
note: in expansion of macro ‘IMPL_COMPILER_ASSERT’
  333 | #define COMPILER_CHECK(pred) IMPL_COMPILER_ASSERT(pred, __LINE__)
  |  ^~~~
../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc:71:1:
 note: in expansion of macro ‘COMPILER_CHECK’
   71 | COMPILER_CHECK(struct_kernel_stat_sz == sizeof(struct stat));
  | ^~
../../../../libsanitizer/sanitizer_common/sanitizer_internal_defs.h:339:72: 
error: size of array ‘assertion_failed__71’ is negative
  339 | typedef char IMPL_PASTE(assertion_failed_##_, line)[2*(int)(pred)-1]
  |^
../../../../libsanitizer/sanitizer_common/sanitizer_internal_defs.h:333:30: 
note: in expansion of macro ‘IMPL_COMPILER_ASSERT’
  333 | #define COMPILER_CHECK(pred) IMPL_COMPILER_ASSERT(pred, __LINE__)
  |  ^~~~
../../../../libsanitizer/sanitizer_common/sanitizer_platform_limits_linux.cc:71:1:
 note: in expansion of macro ‘COMPILER_CHECK’
   71 | COMPILER_CHECK(struct_kernel_stat_sz == sizeof(struct stat));
  | ^~

on commit a1054504a5da (origin/trunk as of now). The host is ppc64le with
gcc-7.3.0 (Gentoo 7.3.0-r3 p1.4) and the target is mipsr5900el-unknown-
linux-gnu.

Fredrik


Re: [PATCH, AArch64, v3 0/6] LSE atomics out-of-line

2018-11-11 Thread Richard Henderson
Ping.

On 11/1/18 10:46 PM, Richard Henderson wrote:
> From: Richard Henderson 
> 
> Changes since v2:
>   * Committed half of the patch set.
>   * Split inline TImode support from out-of-line patches.
>   * Removed the ST out-of-line functions, to match inline.
>   * Moved the out-of-line functions to assembly.
> 
> What I have not done, but is now a possibility, is to use a custom
> calling convention for the out-of-line routines.  I now only clobber
> 2 (or 3, for TImode) temp regs and set a return value.
> 
> 
> r~
>   
> 
> Richard Henderson (6):
>   aarch64: Extend %R for integer registers
>   aarch64: Implement TImode compare-and-swap
>   aarch64: Tidy aarch64_split_compare_and_swap
>   aarch64: Add out-of-line functions for LSE atomics
>   aarch64: Implement -matomic-ool
>   Enable -matomic-ool by default
> 
>  gcc/config/aarch64/aarch64-protos.h   |  13 +
>  gcc/common/config/aarch64/aarch64-common.c|   6 +-
>  gcc/config/aarch64/aarch64.c  | 211 
>  .../atomic-comp-swap-release-acquire.c|   2 +-
>  .../gcc.target/aarch64/atomic-op-acq_rel.c|   2 +-
>  .../gcc.target/aarch64/atomic-op-acquire.c|   2 +-
>  .../gcc.target/aarch64/atomic-op-char.c   |   2 +-
>  .../gcc.target/aarch64/atomic-op-consume.c|   2 +-
>  .../gcc.target/aarch64/atomic-op-imm.c|   2 +-
>  .../gcc.target/aarch64/atomic-op-int.c|   2 +-
>  .../gcc.target/aarch64/atomic-op-long.c   |   2 +-
>  .../gcc.target/aarch64/atomic-op-relaxed.c|   2 +-
>  .../gcc.target/aarch64/atomic-op-release.c|   2 +-
>  .../gcc.target/aarch64/atomic-op-seq_cst.c|   2 +-
>  .../gcc.target/aarch64/atomic-op-short.c  |   2 +-
>  .../aarch64/atomic_cmp_exchange_zero_reg_1.c  |   2 +-
>  .../atomic_cmp_exchange_zero_strong_1.c   |   2 +-
>  .../gcc.target/aarch64/sync-comp-swap.c   |   2 +-
>  .../gcc.target/aarch64/sync-op-acquire.c  |   2 +-
>  .../gcc.target/aarch64/sync-op-full.c |   2 +-
>  libgcc/config/aarch64/lse-init.c  |  45 
>  gcc/config/aarch64/aarch64.opt|   4 +
>  gcc/config/aarch64/atomics.md | 185 +-
>  gcc/config/aarch64/iterators.md   |   3 +
>  gcc/doc/invoke.texi   |  14 +-
>  libgcc/config.host|   4 +
>  libgcc/config/aarch64/lse.S   | 238 ++
>  libgcc/config/aarch64/t-lse   |  44 
>  28 files changed, 717 insertions(+), 84 deletions(-)
>  create mode 100644 libgcc/config/aarch64/lse-init.c
>  create mode 100644 libgcc/config/aarch64/lse.S
>  create mode 100644 libgcc/config/aarch64/t-lse
> 



Re: [PATCH PR84648]Adjust loop exit conditions for loop-until-wrap cases.

2018-11-11 Thread Bernhard Reutner-Fischer
On Sun, 11 Nov 2018 16:02:33 +0800
"bin.cheng"  wrote:

Quick observation unrelated to the real patch.

I think the coding style mandates to use the type itself and not the
underlying structure.

I know there are alot of places from before our switch to C++ that
still use enum tree_code or enum machine_mode, but new code should
adhere to the style please.

So: s/enum tree_code/tree_code/g

thanks,


[PATCH 7/9][GCC][Arm] Enable autovectorization of Half float values

2018-11-11 Thread Tamar Christina
Hi All,

The AArch32 backend is currently not able to support autovectorization of 
half-float values
on ARM. This is because we never told the vectorizer what the vector modes are 
for Half floats.

This enables autovectorization by definiting V4HF and V8HF as the vector modes.

Bootstrap and Regtest on aarch64-none-linux-gnu, arm-none-gnueabihf and 
x86_64-pc-linux-gnu
are still on going but previous patch showed no regressions.

Ok for trunk?

Thanks,
Tamar

gcc/ChangeLog:

2018-11-11  Tamar Christina  

* config/arm/arm.c (arm_preferred_simd_mode): Add V4HF and V8HF.

-- 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 8393f0b87f34c04c9dcc89c63d2e9bbd042c969c..79502606b632e6a187732c8b3be118df8bde149a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -27211,6 +27211,8 @@ arm_preferred_simd_mode (scalar_mode mode)
   if (TARGET_NEON)
 switch (mode)
   {
+  case E_HFmode:
+	return TARGET_NEON_VECTORIZE_DOUBLE ? V4HFmode : V8HFmode;
   case E_SFmode:
 	return TARGET_NEON_VECTORIZE_DOUBLE ? V2SFmode : V4SFmode;
   case E_SImode:



[PATCH 8/9][GCC][Arm] Add autovectorization support for complex multiplication and addition

2018-11-11 Thread Tamar Christina
Hi All,

This patch adds the expander support for supporting autovectorization of 
complex number operations
such as Complex addition with a rotation along the Argand plane.  This also 
adds support for complex
FMA.

The instructions are described in the ArmARM [1] and are available from 
Armv8.3-a onwards.

Concretely, this generates

f90:
add ip, r1, #15
add r3, r0, #15
sub r3, r3, r2
sub ip, ip, r2
cmp ip, #30
cmphi   r3, #30
add r3, r0, #1600
bls .L5
.L3:
vld1.32 {q8}, [r0]!
vld1.32 {q9}, [r1]!
vcadd.f32   q8, q8, q9, #90
vst1.32 {q8}, [r2]!
cmp r0, r3
bne .L3
bx  lr
.L5:
vld1.32 {d16}, [r0]!
vld1.32 {d17}, [r1]!
vcadd.f32   d16, d16, d17, #90
vst1.32 {d16}, [r2]!
cmp r0, r3
bne .L5
bx  lr



now instead of

f90:
add ip, r1, #31
add r3, r0, #31
sub r3, r3, r2
sub ip, ip, r2
cmp ip, #62
cmphi   r3, #62
add r3, r0, #1600
bls .L2
.L3:
vld2.32 {d20-d23}, [r0]!
vld2.32 {d24-d27}, [r1]!
cmp r0, r3
vsub.f32q8, q10, q13
vadd.f32q9, q12, q11
vst2.32 {d16-d19}, [r2]!
bne .L3
bx  lr
.L2:
vldrd19, .L10
.L5:
vld1.32 {d16}, [r1]!
vld1.32 {d18}, [r0]!
vrev64.32   d16, d16
cmp r0, r3
vsub.f32d17, d18, d16
vadd.f32d16, d16, d18
vswpd16, d17
vtbl.8  d16, {d16, d17}, d19
vst1.32 {d16}, [r2]!
bne .L5
bx  lr
.L11:
.align  3
.L10:
.byte   0
.byte   1
.byte   2
.byte   3
.byte   12
.byte   13
.byte   14
.byte   15


For complex additions with a 90* rotation along the Argand plane.

[1] 
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

Bootstrap and Regtest on aarch64-none-linux-gnu, arm-none-gnueabihf and 
x86_64-pc-linux-gnu
are still on going but previous patch showed no regressions.

The instructions have also been tested on aarch64-none-elf and arm-none-eabi on 
a Armv8.3-a model
and -march=Armv8.3-a+fp16 and all tests pass.

Ok for trunk?

Thanks,
Tamar

gcc/ChangeLog:

2018-11-11  Tamar Christina  

* config/arm/arm.c (arm_arch8_3, arm_arch8_4): New.
* config/arm/arm.h (TARGET_COMPLEX, arm_arch8_3, arm_arch8_4): New.
(arm_option_reconfigure_globals): Use them.
* config/arm/iterators.md (VDF, VQ_HSF): New.
(VCADD, VCMLA): New.
(VF_constraint, rot, rotsplit1, rotsplit2): Add V4HF and V8HF.
* config/arm/neon.md (neon_vcadd, fcadd3,
neon_vcmla, fcmla4): New.
* config/arm/unspecs.md (UNSPEC_VCADD90, UNSPEC_VCADD270,
UNSPEC_VCMLA, UNSPEC_VCMLA90, UNSPEC_VCMLA180, UNSPEC_VCMLA270): New.

gcc/testsuite/ChangeLog:

2018-11-11  Tamar Christina  

* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_1.c: Add Arm 
support.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_2.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_3.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_4.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_5.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_6.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_1.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_2.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_3.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_4.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_5.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_6.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_1.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_1.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_2.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_3.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_2.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_1.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_2.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_3.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_3.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_1.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_2.c: Likewise.
* 

[PATCH 4/9][GCC][AArch64/Arm] Add new testsuite directives to check complex instructions.

2018-11-11 Thread Tamar Christina
Hi All,

This patch adds new testsuite directive for both Arm and AArch64 to support
testing of the Complex Arithmetic operations form Armv8.3-a.

Bootstrap and Regtest on aarch64-none-linux-gnu, arm-none-gnueabihf and 
x86_64-pc-linux-gnu
and no regressions.

The instructions have also been tested on aarch64-none-elf and arm-none-eabi on 
a Armv8.3-a model
and -march=Armv8.3-a+fp16 and all tests pass.

Ok for trunk?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

2018-11-11  Tamar Christina  

* lib/target-supports.exp
(check_effective_target_arm_v8_3a_complex_neon_ok_nocache,
check_effective_target_arm_v8_3a_complex_neon_ok,
add_options_for_arm_v8_3a_complex_neon,
check_effective_target_arm_v8_3a_complex_neon_hw,
check_effective_target_vect_complex_rot_N): New.

-- 
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index fd74c04d092b0e2341b85eefbebb1f0df9423492..0c37413d828637b944f88b84db0f0a23ded3e659 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8933,3 +8933,111 @@ proc check_effective_target_cet { } {
 	}
 } "-O2" ]
 }
+
+# Return 1 if the target supports ARMv8.3 Adv.SIMD Complex instructions
+# instructions, 0 otherwise.  The test is valid for ARM and for AArch64.
+# Record the command line options needed.
+
+proc check_effective_target_arm_v8_3a_complex_neon_ok_nocache { } {
+global et_arm_v8_3a_complex_neon_flags
+set et_arm_v8_3a_complex_neon_flags ""
+
+if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
+return 0;
+}
+
+# Iterate through sets of options to find the compiler flags that
+# need to be added to the -march option.
+foreach flags {"" "-mfloat-abi=softfp -mfpu=auto" "-mfloat-abi=hard -mfpu=auto"} {
+if { [check_no_compiler_messages_nocache \
+  arm_v8_3a_complex_neon_ok object {
+#if !defined (__ARM_FEATURE_COMPLEX)
+#error "__ARM_FEATURE_COMPLEX not defined"
+#endif
+} "$flags -march=armv8.3-a"] } {
+set et_arm_v8_3a_complex_neon_flags "$flags -march=armv8.3-a"
+return 1
+}
+}
+
+return 0;
+}
+
+proc check_effective_target_arm_v8_3a_complex_neon_ok { } {
+return [check_cached_effective_target arm_v8_3a_complex_neon_ok \
+check_effective_target_arm_v8_3a_complex_neon_ok_nocache]
+}
+
+proc add_options_for_arm_v8_3a_complex_neon { flags } {
+if { ! [check_effective_target_arm_v8_3a_complex_neon_ok] } {
+return "$flags"
+}
+global et_arm_v8_3a_complex_neon_flags
+return "$flags $et_arm_v8_3a_complex_neon_flags"
+}
+
+# Return 1 if the target supports executing AdvSIMD instructions from ARMv8.3
+# with the complex instruction extension, 0 otherwise.  The test is valid for
+# ARM and for AArch64.
+
+proc check_effective_target_arm_v8_3a_complex_neon_hw { } {
+if { ![check_effective_target_arm_v8_3a_complex_neon_ok] } {
+return 0;
+}
+return [check_runtime arm_v8_3a_complex_neon_hw_available {
+#include "arm_neon.h"
+int
+main (void)
+{
+
+  float32x2_t results = {-4.0,5.0};
+  float32x2_t a = {1.0,3.0};
+  float32x2_t b = {2.0,5.0};
+
+  #ifdef __ARM_ARCH_ISA_A64
+  asm ("fcadd %0.2s, %1.2s, %2.2s, #90"
+   : "=w"(results)
+   : "w"(a), "w"(b)
+   : /* No clobbers.  */);
+
+  #else
+  asm ("vcadd.f32 %P0, %P1, %P2, #90"
+   : "=w"(results)
+   : "w"(a), "w"(b)
+   : /* No clobbers.  */);
+  #endif
+
+  return (results[0] == 8 && results[1] == 24) ? 1 : 0;
+}
+} [add_options_for_arm_v8_3a_complex_neon ""]]
+}
+
+# Return 1 if the target plus current options supports a vector
+# complex addition with rotate of half and single float modes, 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+foreach N {hf sf} {
+eval [string map [list N $N] {
+proc check_effective_target_vect_complex_rot_N { } {
+return [check_cached_effective_target_indexed vect_complex_rot_N {
+expr { [istarget aarch64*-*-*]
+|| [istarget arm*-*-*] }}]
+}
+}]
+}
+
+# Return 1 if the target plus current options supports a vector
+# complex addition with rotate of double float modes, 0 otherwise.
+#
+# This won't change for different subtargets so cache the result.
+
+foreach N {df} {
+eval [string map [list N $N] {
+proc check_effective_target_vect_complex_rot_N { } {
+return [check_cached_effective_target_indexed vect_complex_rot_N {
+expr { [istarget aarch64*-*-*] }}]
+}
+}]
+}
+



[PATCH 5/9][GCC][AArch64/Arm] Add auto-vectorization tests.

2018-11-11 Thread Tamar Christina
Hi All,

This patch adds tests for AArch64 and Arm to test the autovectorization
of complex numbers using the Armv8.3-a instructions.

This patch enables them only for AArch64 at this point.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

The instructions have also been tested on aarch64-none-elf on a Armv8.3-a model
and -march=Armv8.3-a+fp16 and all tests pass.

Ok for trunk?

Thanks,
Tamar

gcc/testsuite/ChangeLog:

2018-11-11  Tamar Christina  

* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-270.c: New 
test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-90.c: New 
test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_4.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_5.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_6.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex-autovec.c: New 
test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_4.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_5.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcadd-complex_6.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex-autovec.c: New 
test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_180_3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_270_3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vcmla-complex_90_3.c: New test.

-- 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-270.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-270.c
new file mode 100644
index ..8f660f392153c3a6a83b31486e275be316c6ad2b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-270.c
@@ -0,0 +1,13 @@
+/* { dg-skip-if "" { *-*-* } } */
+
+#define N 200
+
+__attribute__ ((noinline))
+void calc (TYPE a[N], TYPE b[N], TYPE *c)
+{
+  for (int i=0; i < N; i+=2)
+{
+  c[i] = a[i] + b[i+1];
+  c[i+1] = a[i+1] - b[i];
+}
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-90.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-90.c
new file mode 100644
index ..14014b9d4f2c41e75be3e253d2e47e639e4224c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays-autovec-90.c
@@ -0,0 +1,12 @@
+/* { dg-skip-if "" { *-*-* } } */
+#define N 200
+
+__attribute__ ((noinline))
+void calc (TYPE a[N], TYPE b[N], TYPE *c)
+{
+  for (int i=0; i < N; i+=2)
+{
+  c[i] = a[i] - b[i+1];
+  c[i+1] = a[i+1] + b[i];
+}
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_1.c
new file mode 100644
index ..627f2e78daee9c4a4f86c2071080b4114820c209
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcadd-arrays_1.c
@@ -0,0 +1,28 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_v8_3a_complex_neon_ok } */
+/* { dg-require-effective-target vect_complex_rot_df } */
+/* { dg-add-options arm_v8_3a_complex_neon } */
+/* { dg-additional-options "-Ofast -save-temps" } */
+
+#define TYPE double
+#include "vcadd-arrays-autovec-90.c"
+
+extern void abort(void);
+
+int main()
+{
+  TYPE a[N] = {1.0, 2.0, 3.0, 4.0};
+  TYPE b[N] = {4.0, 2.0, 1.5, 4.5};
+  TYPE c[N] = {0};
+  calc (a, b, c);
+
+  if (c[0] != -1.0 || c[1] != 6.0)
+abort ();
+
+  if (c[2] != -1.5 || c[3] != 5.5)
+abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {fcadd\tv[0-9]+\.2d, v[0-9]+\.2d, 

[PATCH 3/9][GCC][AArch64] Add autovectorization support for Complex instructions

2018-11-11 Thread Tamar Christina
Hi All,

This patch adds the expander support for supporting autovectorization of 
complex number operations
such as Complex addition with a rotation along the Argand plane.  This also 
adds support for complex
FMA.

The instructions are described in the ArmARM [1] and are available from 
Armv8.3-a onwards.

Concretely, this generates

f90:
mov x3, 0
.p2align 3,,7
.L2:
ldr q0, [x0, x3]
ldr q1, [x1, x3]
fcadd   v0.2d, v0.2d, v1.2d, #90
str q0, [x2, x3]
add x3, x3, 16
cmp x3, 3200
bne .L2
ret

now instead of

f90:
mov x4, x1
mov x1, x2
add x3, x4, 31
add x2, x0, 31
sub x3, x3, x1
sub x2, x2, x1
cmp x3, 62
mov x3, 62
ccmpx2, x3, 0, hi
bls .L5
mov x2, x4
add x3, x0, 3200
.p2align 3,,7
.L3:
ld2 {v2.2d - v3.2d}, [x0], 32
ld2 {v4.2d - v5.2d}, [x2], 32
cmp x0, x3
fsubv0.2d, v2.2d, v5.2d
faddv1.2d, v4.2d, v3.2d
st2 {v0.2d - v1.2d}, [x1], 32
bne .L3
ret
.L5:
add x6, x0, 8
add x5, x4, 8
add x2, x1, 8
mov x3, 0
.p2align 3,,7
.L2:
ldr d1, [x0, x3]
ldr d3, [x5, x3]
ldr d0, [x6, x3]
ldr d2, [x4, x3]
fsubd1, d1, d3
faddd0, d0, d2
str d1, [x1, x3]
str d0, [x2, x3]
add x3, x3, 16
cmp x3, 3200
bne .L2
ret

For complex additions with a 90* rotation along the Argand plane.

[1] 
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

Bootstrap and Regtest on aarch64-none-linux-gnu, arm-none-gnueabihf and 
x86_64-pc-linux-gnu
are still on going but previous patch showed no regressions.

The instructions have also been tested on aarch64-none-elf and arm-none-eabi on 
a Armv8.3-a model
and -march=Armv8.3-a+fp16 and all tests pass.

Ok for trunk?

Thanks,
Tamar

gcc/ChangeLog:

2018-11-11  Tamar Christina  

* config/aarch64/aarch64-simd.md (aarch64_fcadd,
fcadd3, aarch64_fcmla,
fcmla4): New.
* config/aarch64/aarch64.h (TARGET_COMPLEX): New.
* config/aarch64/iterators.md (UNSPEC_FCADD90, UNSPEC_FCADD270,
UNSPEC_FCMLA, UNSPEC_FCMLA90, UNSPEC_FCMLA180, UNSPEC_FCMLA270): New.
(FCADD, FCMLA): New.
(rot, rotsplit1, rotsplit2): New.
* config/arm/types.md (neon_fcadd, neon_fcmla): New.

-- 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c4be3101fdec930707918106cd7c53cf7584553e..12a91183a98ea23015860c77a97955cb1b30bfbb 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -419,6 +419,63 @@
 }
 )
 
+;; The fcadd and fcmla patterns are made UNSPEC for the explicitly due to the
+;; fact that their usage need to guarantee that the source vectors are
+;; contiguous.  It would be wrong to describe the operation without being able
+;; to describe the permute that is also required, but even if that is done
+;; the permute would have been created as a LOAD_LANES which means the values
+;; in the registers are in the wrong order.
+(define_insn "aarch64_fcadd"
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+	(unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w")
+		   (match_operand:VHSDF 2 "register_operand" "w")]
+		   FCADD))]
+  "TARGET_COMPLEX"
+  "fcadd\t%0., %1., %2., #"
+  [(set_attr "type" "neon_fcadd")]
+)
+
+(define_expand "fcadd3"
+  [(set (match_operand:VHSDF 0 "register_operand")
+	(unspec:VHSDF [(match_operand:VHSDF 1 "register_operand")
+		   (match_operand:VHSDF 2 "register_operand")]
+		   FCADD))]
+  "TARGET_COMPLEX"
+{
+  emit_insn (gen_aarch64_fcadd (operands[0], operands[1],
+	   operands[2]));
+  DONE;
+})
+
+(define_insn "aarch64_fcmla"
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+	(plus:VHSDF (match_operand:VHSDF 1 "register_operand" "0")
+		(unspec:VHSDF [(match_operand:VHSDF 2 "register_operand" "w")
+   (match_operand:VHSDF 3 "register_operand" "w")]
+   FCMLA)))]
+  "TARGET_COMPLEX"
+  "fcmla\t%0., %2., %3., #"
+  [(set_attr "type" "neon_fcmla")]
+)
+
+;; The complex mla operations always need to expand to two instructions.
+;; The first operation does half the computation and the second does the
+;; remainder.  Because of this, expand early.
+(define_expand "fcmla4"
+  [(set (match_operand:VHSDF 0 "register_operand")
+	(plus:VHSDF (match_operand:VHSDF 1 "register_operand")
+		(unspec:VHSDF [(match_operand:VHSDF 2 "register_operand")
+   (match_operand:VHSDF 3 "register_operand")]
+   FCMLA)))]
+  "TARGET_COMPLEX"
+{
+  emit_insn (gen_aarch64_fcmla (operands[0], operands[1],
+		 operands[2], 

[PATCH 2/9][GCC][AArch64][middle-end] Add rules to strip away unneeded type casts in expressions

2018-11-11 Thread Tamar Christina
Hi All,

This patch adds a match.pd rule for stripping away the type converts when you're
converting to a type that has twice the precision of the current type in the 
same
class, doing a simple math operation on it and converting back to the smaller 
type.

The change makes it so the operations are kept in the smaller type.  The 
motivating
reason behind this is that the imaginary constant I in C99 is defined to be a
single precision float.  For Half precision this means the entire operation is 
carried
out in single precision which means that it adds a lot of type casting 
instructions in
the output and prevents optimal vectorization as it lowers your vectorization 
factor.


It means that if a and b are fp16 values, doing a * b * I will get vectorized 
in SFmode
instead of HFmode.

Bootstrap and Regtest on aarch64-none-linux-gnu, arm-none-gnueabihf and 
x86_64-pc-linux-gnu
are still on going but previous patch showed regressions in the 
builtin-arith-overflow-8 to -11.

However since it doesn't show any regression anywhere else I am wondering if 
it's just the test
that need updating or if the idea is not acceptable. Perhaps it should be done 
only for unsafe math?

So I am posting the patch for comments.

Thanks,
Tamar

gcc/ChangeLog:

2018-11-11  Tamar Christina  

* match.pd: Add type conversion stripping.

-- 
diff --git a/gcc/match.pd b/gcc/match.pd
index d07ceb7d087b8b5c5a7d7362ad9d8f71ac90dc08..3c2f8caca42d6a163fbf7faba6220d7304200100 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4709,6 +4709,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 	 (convert (op (convert:utype @0)
 		  (convert:utype @1
 
+/* Strip out useless type conversions:
+   ((F)((X)a op (X)b)) -> a op b
+
+   when ((F)((X)a op (X)b)) where a and b are both of type F,
+   and X has twice the precision of F then the conversion is useless
+   and should be stripped away to allow more optimizations.  */
+
+(for op (plus minus mult rdiv)
+ (simplify
+   (convert (op:s (convert@0 @1) (convert@2 @3)))
+   (if (types_match (@1, @3)
+&& types_match (type, @1)
+&& types_match (@0, @2)
+	&& GET_MODE_CLASS (TYPE_MODE (type))
+	   == GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (@0)))
+&& TYPE_PRECISION (type) == (TYPE_PRECISION (TREE_TYPE (@0)) / 2))
+ (op @1 @3
+
 /* This is another case of narrowing, specifically when there's an outer
BIT_AND_EXPR which masks off bits outside the type of the innermost
operands.   Like the previous case we have to convert the operands



[PATCH 1/9][GCC][AArch64][middle-end] Implement SLP recognizer for Complex addition with rotate and complex MLA with rotation

2018-11-11 Thread Tamar Christina
Hi All,

This patch adds support for SLP vectorization of Complex number arithmetic with 
rotations
along with Argand plane.

For this to work it has to recognize two statements in parallel as it needs to 
match
against operations towards both the real and imaginary numbers.  The 
instructions generated
by this change is also only available in their vector forms.  As such I add 
them as pattern
statements to the stmt.  The BB is left untouched and so the scalar loop is 
untouched.

The instructions also require the loads to be contiguous and so when a match is 
made, and
the code decides it is able to do the replacement it re-organizes the SLP tree 
such that the
loads are contiguous.  Since this operation cannot be undone it only does this 
if it's sure that
the resulting loads can be made continguous.

It gets this guarantee by only allowing the replacement if between the matched 
expression and the
loads there are no other expressions it doesn't know aside from type casts.

When a match occurs over multiple expressions, the dead statements are 
immediately removed from the
tree to prevent verification failures later.

Because the pattern matching is done after SLP analysis has analyzed the usage 
of the instruction it
also marks the instructions as used and the old ones as unusued.

When a replacement is done a new internal function is generated which the 
back-end has to expand to
the proper instruction sequences.

For now, this patch only adds support for Complex addition with rotate and 
Complex FMLA
with rotation of 0 and 180. However it is the intention to in the future add 
support for
Complex subtraction and Complex multiplication.

Concretely, this generates

ldr q1, [x1, x3]
ldr q2, [x0, x3]
ldr q0, [x2, x3]
fcmla   v0.2d, v1.2d, v2.2d, #180
fcmla   v0.2d, v1.2d, v2.2d, #90
str q0, [x2, x3]
add x3, x3, 16
cmp x3, 3200
bne .L2
ret

now instead of

add x3, x2, 31
sub x4, x3, x1
sub x3, x3, x0
cmp x4, 62
mov x4, 62
ccmpx3, x4, 0, hi
bls .L5
mov x3, x0
mov x0, x1
add x1, x2, 3200
.p2align 3,,7
.L3:
ld2 {v16.2d - v17.2d}, [x2]
ld2 {v2.2d - v3.2d}, [x3], 32
ld2 {v0.2d - v1.2d}, [x0], 32
mov v7.16b, v17.16b
fmulv6.2d, v0.2d, v3.2d
fmlav7.2d, v1.2d, v3.2d
fmlav6.2d, v1.2d, v2.2d
fmlsv7.2d, v2.2d, v0.2d
faddv4.2d, v6.2d, v16.2d
mov v5.16b, v7.16b
st2 {v4.2d - v5.2d}, [x2], 32
cmp x2, x1
bne .L3
ret
.L5:
add x4, x2, 8
add x6, x0, 8
add x5, x1, 8
mov x3, 0
.p2align 3,,7
.L2:
ldr d1, [x6, x3]
ldr d4, [x1, x3]
ldr d5, [x5, x3]
ldr d3, [x0, x3]
fmuld2, d4, d1
ldr d0, [x4, x3]
fmadd   d0, d5, d1, d0
ldr d1, [x2, x3]
fmadd   d2, d5, d3, d2
fmsub   d0, d4, d3, d0
faddd1, d1, d2
str d1, [x2, x3]
str d0, [x4, x3]
add x3, x3, 16
cmp x3, 3200
bne .L2
ret

Bootstrap and Regtests on aarch64-none-linux-gnu, arm-none-gnueabihf and 
x86_64-pc-linux-gnu
are still on going but previous patch showed no regressions.

The instructions have also been tested on aarch64-none-elf and arm-none-eabi on 
a Armv8.3-a model
and -march=Armv8.3-a+fp16 and all tests pass.

Ok for trunk?

Thanks,
Tamar

gcc/ChangeLog:

2018-11-11  Tamar Christina  

* doc/md.texi (fcadd, fcmla): New.
* doc/sourcebuild.texi (vect_complex_rot_): New.
* internal-fn.def (FCOMPLEX_ADD_ROT90): New.
(FCOMPLEX_ADD_ROT270): New.
(FCOMPLEX_FMA_ROT0): New.
(FCOMPLEX_FMA_ROT180): New.
* optabs.def (fcadd90_optab, fcadd270_optab,
fcmla0_optab, fcmla180_optab):  New.
* tree-vect-patterns.c (vect_mark_pattern_stmts): Export function.
* tree-vect-slp.c (vect_create_complex_patt_stmt): New.
(vect_is_complex_transform_valid): New.
(vect_reorder_based_on_load_chain): New.
(vect_determine_place_in_load_1): New.
(vect_determine_place_in_load): New.
(vect_balance_load_nodes_1): New.
(vect_balance_load_nodes): New.
(vect_match_expression_p): New.
(vect_detect_pair_op): New.
(vect_delete_slp_nodes_1): New.
(vect_delete_slp_nodes): New.
(vect_match_slp_patterns_1): New.
(vect_match_call_complex_add): New.
(vect_match_call_complex_mla_1): New.
(vect_match_call_complex_mla): New.
(vect_match_slp_patterns): New.
(vect_analyze_slp_instance): Use it.
* tree-vectorizer.h (vect_mark_pattern_stmts): Export function.

-- 
diff --git 

[PATCH 0/9][GCC][AArch64/Arm][middle-end] Add support for SLP vectorization of complex number instructions.

2018-11-11 Thread Tamar Christina
Hi All,

This patch series adds support for SLP vectorization of complex instructions 
[1].

These instructions exist only in their vector forms and require you to recognize
two statements in parallel.  Complex operations usually require a permute due to
the fact that the real and imaginary numbers are stored intermixed but these 
vector
instructions expect this and no longer need the compiler to generate a permute.

For this reason the pass also re-orders the loads in the SLP tree such that they
become contiguous and no longer need the permutes.  The Basic Blocks are left
untouched such that the scalar loop will still correctly issue permutes.

The instructions also support rotations along the Argand plane, as such the 
operands
have to be re-ordered to coincide with their load group.

For now, this patch only adds support for Complex addition with rotate and 
Complex FMLA
with rotation of 0 and 180. However it is the intention to in the future add 
support for
Complex subtraction and Complex multiplication.

The operations rely on the early lowering of complex numbers by GCC into real 
and imaginary
pairs, and so just recognizes any instruction sequence matching the operations 
requested.

To be safe when the it is not sure it can support the operation or if it finds 
something it
does not understand it backs off.

The hit rate of such patterns in SPEC CPU 2006 are as follows

Unsupported due to type casts: 28
Successfully matched and created: 43
Aborted due to unknown instruction in sequence: 354
Total times pattern matched: 403

Which shows that this and the future enhancements are worth while.  On AArch64 
the code size
difference when the new instructions are used is about 2-3x smaller.

[1] 
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

Thanks,
Tamar

-- 


[GCC][AArch64] [middle-end][docs] Document the xorsign optab

2018-11-11 Thread Tamar Christina
Hi All,

This patch just adds documentation for the xorsign optab that was added a while 
ago.

Ok for trunk?

Thanks,
Tamar

gcc/ChangeLog:

2018-11-11  Tamar Christina  

* doc/md.texi (xorsign): Document it.

-- 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 360b36b862f7eb13964e60ff53b04e1274f89fe4..510321ef12a167f79875cd13a8683271ee590ebe 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5997,6 +5997,15 @@ vector floating-point mode.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{xorsign@var{m}3} instruction pattern
+@item @samp{xorsign@var{m}3}
+Target suppports an efficient expansion of x * copysign (1.0, y)
+as xorsign (x, y).  Store a value with the magnitude of operand 1
+and the sign of operand 2 into operand 0.  All operands have mode
+@var{m}, which is a scalar or vector floating-point mode.
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{ffs@var{m}2} instruction pattern
 @item @samp{ffs@var{m}2}
 Store into operand 0 one plus the index of the least significant 1-bit



Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-11 Thread Richard Biener
On Fri, Nov 9, 2018 at 7:18 PM Andi Kleen  wrote:
>
> Hi Richard,
>
> On Fri, Nov 09, 2018 at 04:27:22PM +0100, Richard Biener wrote:
> > > Passes bootstrap and test suite on x86_64-linux, also
> > > bootstrapped and tested gcc itself with full -fvartrace
> > > and -fvartrace-locals instrumentation.
> >
> > So how is this supposed to be used?  I guess in a
> > edit-debug cycle and not for production code?
>
> It can actually be used for production code.
>
> When processor trace is disabled the PTWRITE
> instructions acts as nops. So it's only increasing
> the code foot print. Since the instrumentation
> should only log values which are already computed
> it normally doesn't cause any other code.
>
> Even when it is enabled the primary overhead is the
> additional memory bandwidth, since the CPU can
> do the logging in parallel to other code. As long
> as the instrumentation is not too excessive to generate
> too much memory bandwidth, it might be actually
> quite reasonable to even keep the logging on for
> production code, and use it as a "flight recorder",
> which is dumped on failures.

I see.

> This would also be the model in gdb, once we have support
> in it. You would run the program in the debugger
> and it just logs the data to a memory buffer,
> but when stopping the value history can be examined.

Hmm, so the debugger still needs to relate the ptwrite
instruction with the actual variable the data is for.  I suppose
practically this means that var-tracking needs to be able to
compute a location list for a variable that happens to overlap
with the stored value?

That is, usually debuggers look for a location list of a variable
and find, say, %rax.  But for ptwrite the debugger needs to
examine all active location lists for, say, %rax and figure out
that it contains the value for variable 'a'?

When there isn't any such relation between the ptwrite stored
value and any variable the ptwrite is useless, right?

> There's also some ongoing work to add (optional) support
> for PT to Linux crash dumps, so eventually that will
> work without having to always run the debugger.
>
> Today it can be done by running perf in the background
> to record the PT, however there the setup is a bit
> more complicated.
>
> The primary use case I was envisioning was to set
> the attribute on some critical functions/structures/types
> of interest and then have a very overhead logging
> option for them (generally cheaper than
> equivalent software instrumentation). And then
> they automatically gets logged without the programmer
> needing to add lots of instrumentation code to
> catch every instance. So think of it as a
> "hardware accelerated printf"
>
> >
> > What do you actually write with PTWRITE?  I suppose
> > you need to keep a ID to something mapping somewhere
> > so you can make sense of the perf records?
>
> PTWRITE writes 32bit/64bit values. The CPU reports the
> IP of PTWRITE in the log, either explicitely, or implicitely if branch
> trace is enabled too. The IP can then be used to look up
> the DWARF scope for that IP. Then the decoder
> decodes the operand of PTWRITE and maps it back using
> the dwarf information. So it all works using
> existing debugger infrastructure, and a quite simple
> instruction decoder.
>
> I'll clarify that in the description.
>
> >
> > Few comments inline below, but I'm not sure if this
> > whole thing is interesting for GCC (as opposed to being
> > a instrumentation plugin)
>
> I'm biased, but I think automatic data tracing is a very exciting
> use case, so hopefully it can be considered for mainstream gcc.
>
> > >   
> > > handle_no_profile_instrument_function_attribute,
> > >   NULL },
> > > @@ -767,6 +775,21 @@ handle_no_sanitize_undefined_attribute (tree *node, 
> > > tree name, tree, int,
> > >return NULL_TREE;
> > >  }
> > >
> > > +/* Handle "vartrace"/"no_vartrace" attributes; arguments as in
> > > +   struct attribute_spec.handler.  */
> > > +
> > > +static tree
> > > +handle_vartrace_attribute (tree *node, tree, tree, int flags,
> > > +  bool *)
> > > +{
> > > +  if (TYPE_P (*node) && !(flags & (int) ATTR_FLAG_TYPE_IN_PLACE))
> > > +*node = build_variant_type_copy (*node);
> >
> > I don't think you want the attribute on types.  As far as I understood your
> > descriptions it should only be on variables and functions.
>
> The idea was that it's possible to trace all instances of a type,
> especially structure members. Otherwise it will be harder for
> the programmer to hunt down every instance.
>
> For example if I have a structure that is used over a program,
> and one member gets the wrong value.
>
> I can do then in the header:
>
> struct foo {
> int member __attribute__(("vartrace"));
> };
>
> and then recompile the program. Every instance of writing to
> member will then be automatically instrumented (assuming
> the program stays type safe)
>
> Makes sense?

OK.  The 

Re: [PATCH] Add value_range_base (w/o equivs)

2018-11-11 Thread Richard Biener
On Fri, 9 Nov 2018, Aldy Hernandez wrote:

> On 11/9/18 9:19 AM, Richard Biener wrote:
> > 
> > This adds value_range_base, a base class of class value_range
> > with all members but the m_equiv one.
> 
> First of all, thanks so much for doing this!
> 
> > 
> > I have looked into the sole GC user, IPA propagation, and replaced
> > the value_range use there with value_range_base since it also
> > asserted the equiv member is always NULL.
> > 
> > This in turn means I have written down that GC users only can
> > use value_range_base (and fixed the accessability issue with
> > adding a bunch of friends).
> 
> > +
> >  /* Range of values that can be associated with an SSA_NAME after VRP
> > -   has executed.  */
> > -class GTY((for_user)) value_range
> > +   has executed.  Note you may only use value_range_base with GC memory.
> > */
> > +class GTY((for_user)) value_range_base
> > +{
> 
> GC users cannot use the derived value_range?  Either way could you document
> the "why" this is the case above?

I've changed the comment as it was said to be confusing.  The reason is
the marking isn't implemented.

> And thanks for adding those accessibility friends, they're a pain to get
> right.  It's a pity we have to go to such lengths to deal with this shit.
> 
> > 
> > I have moved all methods that are required for this single user
> > sofar (without looking with others are trivially movable because
> > they do not look at equivs - that's for a followup).  I ended
> 
> That would be great.
> 
> > up basically duplicating the ::union_ wrapper around union_ranges
> > (boo).  Some more refactoring might save us a few lines here.
> > 
> > There are two places where IPA prop calls extract_range_from_unary_expr.
> > I've temporarily made it use value_range temporaries there given
> > I need to think about the few cases of ->deep_copy () we have in
> > there.  IMHO equiv handling would need to move to the callers or
> > rather "copies" shouldn't be handled by extract_range_from_unary_expr.
> > Well, I need to think about it.  (no, I don't want to make that
> > function virtual)
> > 
> > The other workers might be even easier to massage to work on
> > value_range_base only.
> 
> If value_range_base is the predominant idiom, perhaps it should be called
> value_range and the derived class be called value_range_with_equiv or some
> such?

Sure - the value_range_base approach was less intrusive.

> > 
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > 
> > If that goes well I want to apply this even in its incomplete form.
> > Unless there are any complaints?
> 
> Fine by me.  Again, thanks.

Installed as follows.  I'll see how I can complete it while
woring out of virtual office on Monday.

Richard.

2018-11-11  Richard Biener  

* tree-vrp.h (class value_range_base): New base class for
value_range containing all but the m_equiv member.
(dump_value_range_base): Add.
(range_includes_zero_p): Work on value_range_base.
* tree-vrp.c (value_range_base::set): Split out base handling
from...
(value_range::set): this.
(value_range::set_equiv): New.
(value_range_base::value_range_base): New constructors.
(value_range_base::check): Split out base handling from...
(value_range::check): this.
(value_range::equal_p): Refactor in terms of
ignore_equivs_equal_p which is now member of the base.
(value_range_base::set_undefined): New.
(value_range_base::set_varying): Likewise.
(value_range_base::dump):Split out base handling from...
(value_range::dump): this.
(value_range_base::set_and_canonicalize): Split out base handling
from...
(value_range::set_and_canonicalize): this.
(value_range_base::union_): New.

* ipa-prop.h (struct ipa_jump_func): Use value_range_base *
for m_vr.
* ipa-cp.c (class ipcp_vr_lattice): Use value_range_base
instead of value_range everywhere.
(ipcp_vr_lattice::print): Use dump_value_range_base.
(ipcp_vr_lattice::meet_with): Adjust.
(ipcp_vr_lattice::meet_with_1): Likewise.
(ipa_vr_operation_and_type_effects): Likewise.
(propagate_vr_across_jump_function): Likewise.
* ipa-prop.c (struct ipa_vr_ggc_hash_traits): Likewise.
(ipa_get_value_range): Likewise.
(ipa_set_jfunc_vr): Likewise.
(ipa_compute_jump_functions_for_edge): Likewise.

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 4471bae11c7..4f147eb37cc 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -307,18 +307,18 @@ private:
 class ipcp_vr_lattice
 {
 public:
-  value_range m_vr;
+  value_range_base m_vr;
 
   inline bool bottom_p () const;
   inline bool top_p () const;
   inline bool set_to_bottom ();
-  bool meet_with (const value_range *p_vr);
+  bool meet_with (const value_range_base *p_vr);
   bool meet_with (const ipcp_vr_lattice );
   void init () { gcc_assert (m_vr.undefined_p 

RFA: Fix add_predicate_code to acknowledge ZERO_EXTRACT as an lvalue.

2018-11-11 Thread Joern Wolfgang Rennecke
With a configurable vector size, it is not really feasible to represent 
every vector register

inside GCC as a collection of lots of imaginary BITS_PER_WORD registers.
So you got your general purpose registers that are BITS_PER_WORD, and 
vector registers
that are a bit or a lot larger.  To void invalid code being emitted by 
reload, you have to define TARGET_CAN_CHANGE_MODE_CLASS to reject the 
use of vector registers for values

where certain kinds of SUBREGs are used.  In practice, that's most of them.
To avoid register allocation mayhem, the port has to steer the 
middle-end away from
the tried-and-true-and-generating-absymal-code path of SUBREGs. There 
are a number
of choices for lvalues.  vec_select is sort of obvious and works to a 
point, but it doesn't
scale well because the access representation changes according to the 
content of

vector registers.  And it doesn't work at all as an lvalue.
ZERO_EXTRACT has none of these problems.  It can describe a bitfield 
access independent of
the vector structure (if any) of outer and inner mode, and it is valid 
as an lvalue.

Unfortunately, add_predicate_code in gensupport.c didn't get the message.
This patch fixes that.

Bootstrapped and regression tested on  x86_64-pc-linux-gnu .
2018-11-10  Joern Rennecke  

* gensupport.c (add_predicate_code): Properly handle ZERO_EXTRACT
as an lvalue.

Index: gensupport.c
===
--- gensupport.c(revision 266008)
+++ gensupport.c(working copy)
@@ -2827,6 +2827,7 @@ add_predicate_code (struct pred_data *pr
  && code != CONCAT
  && code != PARALLEL
  && code != STRICT_LOW_PART
+ && code != ZERO_EXTRACT
  && code != SCRATCH)
pred->allows_non_lvalue = true;
 


RFA: vectorizer patches 2/2: reduction splitting

2018-11-11 Thread Joern Wolfgang Rennecke
It's nice to use the processors vector arithmetic to good effect, but 
it's all for naught when
there are too many moves from/to general registers cluttering up the 
loop.  With a
double-vector reduction variable, the standard final reduction code got 
so awkward that
the register allocator decided that the reduction variable must live in 
general purpose

registers, not only after the loop, but across the loop patch.
Splitting the reduction to force the first step to be done as a vector 
operation

seemed the obvious solution. The hook was called, but the vectorizer still
generated the vanilla final reduction code.  It turns out that the 
reduction splitting

was calculated, but the result not used, and the calculation started anew.

The attached patch fixes this.

bootstrapped and regression tested on x86_64-pc-linux-gnu .
2018-10-31  Joern Rennecke  

* tree-vect-loop.c (vect_create_epilog_for_reduction):
If we split the reduction, use the result in Case 3 too.

Index: tree-vect-loop.c
===
--- tree-vect-loop.c(revision 266008)
+++ tree-vect-loop.c(working copy)
@@ -5139,6 +5139,7 @@ vect_create_epilog_for_reduction (vec vect_d
   while (sz > sz1)
{
  gcc_assert (!slp_reduc);
+ gcc_assert (new_phis.length () == 1);
  sz /= 2;
  vectype1 = get_vectype_for_scalar_type_and_size (scalar_type, sz);
 
@@ -5301,7 +5302,12 @@ vect_create_epilog_for_reduction (vec vect_d
   FOR_EACH_VEC_ELT (new_phis, i, new_phi)
 {
   int bit_offset;
-  if (gimple_code (new_phi) == GIMPLE_PHI)
+
+ /* If we did reduction splitting, make sure to use the result.  */
+ if (!slp_reduc && new_phis.length () == 1
+ && new_temp != new_phi_result)
+   vec_temp = new_temp;
+ else if (gimple_code (new_phi) == GIMPLE_PHI)
 vec_temp = PHI_RESULT (new_phi);
   else
 vec_temp = gimple_assign_lhs (new_phi);


[PATCH PR84648]Adjust loop exit conditions for loop-until-wrap cases.

2018-11-11 Thread bin.cheng
Hi,
This patch fixes PR84648 by adjusting exit conditions for loop-until-wrap cases.
It only handles simple cases in which IV.base are constants because we rely on
current niter analyzer which doesn't handle parameterized bound in wrapped
case.  It could be relaxed in the future.

Bootstrap and test on x86_64 in progress.

Thanks,
bin
2018-11-11  Bin Cheng  

PR tree-optimization/84648
* tree-ssa-loop-niter.c (adjust_cond_for_loop_until_wrap): New.
(number_of_iterations_cond): Adjust exit cond for loop-until-wrap case
by calling adjust_cond_for_loop_until_wrap.

2018-11-11  Bin Cheng  

PR tree-optimization/84648
* gcc.dg/tree-ssa/pr84648.c: New test.

0001-Fix-pr84648-by-adjusting-exit-cond-for-loop-until-wr.patch
Description: Binary data