date:20170824

Re: [70/77] Make expand_fix/float check for scalar modes

2017-08-24 Thread Jeff Law

On 07/13/2017 03:03 AM, Richard Sandiford wrote:
> The expand_float code:
> 
>   /* Unsigned integer, and no way to convert directly.  Convert as signed,
>  then unconditionally adjust the result.  */
> 
> and the expand_fix code:
> 
>   /* For an unsigned conversion, there is one more way to do it.
>  If we have a signed conversion, we generate code that compares
>  the real value to the largest representable positive number.  If if
>  is smaller, the conversion is done normally.  Otherwise, subtract
>  one plus the highest signed number, convert, and add it back.
> 
> are restricted to scalars, since the expansion branches on a
> comparison of the value.  This patch makes that explicit.
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * optabs.c (expand_float): Explicitly check for scalars before
>   using a branching expansion.
>   (expand_fix): Likewise.
OK.
jeff

Re: [69/77] Split scalar-only part out of convert_mode

2017-08-24 Thread Jeff Law

On 07/13/2017 03:03 AM, Richard Sandiford wrote:
> This patch splits the final scalar-only part of convert_mode out
> into its own subroutine and treats the modes as scalar_modes there.
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * expr.c (convert_mode): Split scalar handling out into...
>   (convert_mode_scalar): ...this new function.  Treat the modes
>   as scalar_modes.
OK.
jeff

Re: [68/77] Use scalar_mode for is_int_mode/is_float_mode pairs

2017-08-24 Thread Jeff Law

On 07/13/2017 03:02 AM, Richard Sandiford wrote:
> This patch uses scalar_mode for code that operates only on MODE_INT
> and MODE_FLOAT.
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * omp-expand.c (expand_omp_atomic): Use is_int_mode, is_float_mode
>   and scalar_mode.
>   * tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Likewise.
OK.
jeff

Re: [66/77] Use scalar_mode for constant integers

2017-08-24 Thread Jeff Law

On 07/13/2017 03:02 AM, Richard Sandiford wrote:
> This patch treats the mode associated with an integer constant as a
> scalar_mode.  We can't use the more natural-sounding scalar_int_mode
> because we also use (const_int 0) for bounds-checking modes.  (It might
> be worth adding a bounds-specific code instead, but that's for another
> day.)
Is that the only reason why we can't use scalar_int_mode here -- the
bounds checking stuff?  What if it were to just magically disappear?


> 
> This exposes a latent bug in simplify_immed_subreg, which for
> vectors of CONST_WIDE_INTs would pass the vector mode rather than
> the element mode to rtx_mode_t.
> 
> I think the:
> 
> /* We can get a 0 for an error mark.  */
> || GET_MODE_CLASS (mode) == MODE_VECTOR_INT
> || GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT
An interesting nugget.   It appears Aldy's commit message, but not in
the approved patch from June 2002.
> 
> in immed_double_const is dead.  trunc_int_mode (via gen_int_mode)
> would go on to ICE if the mode fitted in a HWI, and surely plenty
> of other code would be confused to see a const_int be interpreted
> as a vector.  We should instead be using CONST0_RTX (mode) if we
> need a safe constant for a particular mode.
Yea, if we wanted to use 0 as an error mark, we should be using it via
CONST0_RTX (mode).
> 
> We didn't try to make these functions take scalar_mode arguments
> because in many cases that would be too invasive at this stage.
> Maybe it would become feasible in future.  Also, the long-term
> direction should probably be to add modes to constant integers
> rather than have then as VOIDmode odd-ones-out.  That would remove
> the need for rtx_mode_t and thus remove the question whether they
> should use scalar_int_mode, scalar_mode or machine_mode.
THe lack of a mode on CONST_INTs is a long standing wart.  It's been
eons since we really thought about how to fix it.I'd have to dig
real deep to remember why we've let the wart stand so long

> 
> The patch also uses scalar_mode for the CONST_DOUBLE handling
> in loc_descriptor.  In that case the mode can legitimately be
> either floating-point or integral.
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * emit-rtl.c (immed_double_const): Use is_a  instead
>   of separate mode class checks.  Do not allow vector modes here.
>   (immed_wide_int_const): Use as_a .
>   * explow.c (trunc_int_for_mode): Likewise.
>   * rtl.h (wi::int_traits::get_precision): Likewise.
>   (wi::shwi): Likewise.
>   (wi::min_value): Likewise.
>   (wi::max_value): Likewise.
>   * dwarf2out.c (loc_descriptor): Likewise.
>   * simplify-rtx.c (simplify_immed_subreg): Fix rtx_mode_t argument
>   for CONST_WIDE_INT.
OK.
jeff

Re: [65/77] Add a SCALAR_TYPE_MODE macro

2017-08-24 Thread Jeff Law

On 07/13/2017 03:01 AM, Richard Sandiford wrote:
> This patch adds a SCALAR_TYPE_MODE macro, along the same lines as
> SCALAR_INT_TYPE_MODE and SCALAR_FLOAT_TYPE_MODE.  It also adds
> two instances of as_a  to c_common_type, when converting
> an unsigned fixed-point SCALAR_TYPE_MODE to the equivalent signed mode.
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree.h (SCALAR_TYPE_MODE): New macro.
>   * expr.c (expand_expr_addr_expr_1): Use it.
>   (expand_expr_real_2): Likewise.
>   * fold-const.c (fold_convert_const_fixed_from_fixed): Likeise.
>   (fold_convert_const_fixed_from_int): Likewise.
>   (fold_convert_const_fixed_from_real): Likewise.
>   (native_encode_fixed): Likewise
>   (native_encode_complex): Likewise
>   (native_encode_vector): Likewise.
>   (native_interpret_fixed): Likewise.
>   (native_interpret_real): Likewise.
>   (native_interpret_complex): Likewise.
>   (native_interpret_vector): Likewise.
>   * omp-simd-clone.c (simd_clone_adjust_return_type): Likewise.
>   (simd_clone_adjust_argument_types): Likewise.
>   (simd_clone_init_simd_arrays): Likewise.
>   (simd_clone_adjust): Likewise.
>   * stor-layout.c (layout_type): Likewise.
>   * tree.c (build_minus_one_cst): Likewise.
>   * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
>   * tree-inline.c (estimate_move_cost): Likewise.
>   * tree-ssa-math-opts.c (convert_plusminus_to_widen): Likewise.
>   * tree-vect-loop.c (vect_create_epilog_for_reduction): Likewise.
>   (vectorizable_reduction): Likewise.
>   * tree-vect-patterns.c (vect_recog_widen_mult_pattern): Likewise.
>   (vect_recog_mixed_size_cond_pattern): Likewise.
>   (check_bool_pattern): Likewise.
>   (adjust_bool_pattern): Likewise.
>   (search_type_for_mask_1): Likewise.
>   * tree-vect-slp.c (vect_schedule_slp_instance): Likewise.
>   * tree-vect-stmts.c (vectorizable_conversion): Likewise.
>   * ubsan.c (ubsan_encode_value): Likewise.
>   * varasm.c (output_constant): Likewise.
> 
> gcc/c-family/
>   * c-lex.c (interpret_fixed): Use SCALAR_TYPE_MODE.
>   * c-common.c (c_build_vec_perm_expr): Likewise.
> 
> gcc/c/
>   * c-typeck.c (build_binary_op): Use SCALAR_TYPE_MODE.
>   (c_common_type): Likewise.  Use as_a  when setting
>   m1 and m2 to the signed equivalent of a fixed-point
>   SCALAR_TYPE_MODE.
> 
> gcc/cp/
>   * typeck.c (cp_build_binary_op): Use SCALAR_TYPE_MODE.
OK.
Jeff

Re: [64/77] Add a scalar_mode class

2017-08-24 Thread Jeff Law

On 07/13/2017 03:01 AM, Richard Sandiford wrote:
> This patch adds a scalar_mode class that can hold any scalar mode,
> specifically:
> 
>   - scalar integers
>   - scalar floating-point values
>   - scalar fractional modes
>   - scalar accumulator modes
>   - pointer bounds modes
> 
> To start with this patch uses this type for GET_MODE_INNER.
> Later patches add more uses.
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * coretypes.h (scalar_mode): New class.
>   * machmode.h (scalar_mode): Likewise.
>   (scalar_mode::includes_p): New function.
>   (mode_to_inner): Return a scalar_mode rather than a machine_mode.
>   * gdbhooks.py (build_pretty_printers): Handle scalar_mode.
>   * genmodes.c (get_mode_class): Handle remaining scalar modes.
>   * cfgexpand.c (expand_debug_expr): Use scalar_mode.
>   * expmed.c (store_bit_field_1): Likewise.
>   (extract_bit_field_1): Likewise.
>   * expr.c (write_complex_part): Likewise.
>   (read_complex_part): Likewise.
>   (emit_move_complex_push): Likewise.
>   (expand_expr_real_2): Likewise.
>   * function.c (assign_parm_setup_reg): Likewise.
>   (assign_parms_unsplit_complex): Likewise.
>   * optabs.c (expand_binop): Likewise.
>   * rtlanal.c (subreg_get_info): Likewise.
>   * simplify-rtx.c (simplify_immed_subreg): Likewise.
>   * varasm.c (output_constant_pool_2): Likewise.
OK.
jeff

Re: [PATCH] handle pathological anti-ranges in gimple_fold_builtin_memory_op (PR 81908)

2017-08-24 Thread Martin Sebor


On 08/24/2017 03:52 PM, Jeff Law wrote:

On 08/24/2017 09:03 AM, Martin Sebor wrote:

On 08/24/2017 08:03 AM, Richard Biener wrote:

On Wed, Aug 23, 2017 at 9:42 PM, Martin Sebor  wrote:

Bug 81908 is about a -Wstringop-overflow warning for a Fortran
test triggered by a recent VRP improvement.  A simple test case
that approximates the warning is:

  void f (char *d, const char *s, size_t n)
  {
if (n > 0 && n <= SIZE_MAX / 2)
  n = 0;

memcpy (d, s, n);   // n in ~[1, SIZE_MAX / 2]
  }

Since the only valid value of n is zero the call to memcpy can
be considered a no-op (a value of n > SIZE_MAX is in excess of
the maximum size of the largest object and would surely make
the call crash).

The important difference between the test case and Bug 81908
is that in the latter, the code is emitted by GCC itself from
what appears to be correct source (looks like it's the result
of the loop distribution pass).  I believe the warning for
the test case above and for other human-written code like it
is helpful, but warning for code emitted by GCC, even if it's
dead or unreachable, is obviously not (at least not to users).

The attached patch enhances the gimple_fold_builtin_memory_op
function to eliminate this patholohgical case by making use
of range information to fold into no-ops calls to memcpy whose
size argument is in a range where the only valid value is zero.
This gets rid of the warning and improves the emitted code.

Tested on x86_64-linux.


@@ -646,6 +677,12 @@ gimple_fold_builtin_memory_op
(gimple_stmt_iterator *gsi,
   tree destvar, srcvar;
   location_t loc = gimple_location (stmt);

+  if (tree valid_len = only_valid_value (len))
+{
+  /* LEN is in range whose only valid value is zero.  */
+  len = valid_len;
+}
+
   /* If the LEN parameter is zero, return DEST.  */
   if (integer_zerop (len))

just enhance this check to

  if (integer_zerop (len)
  || size_must_be_zero_p (len))

?  'only_valid_value' looks too generic for this.


Sure.

FWIW, the reason I did it this was because my original patch
returned error_mark_node for entirely invalid ranges and had
the caller replace the call with a trap.  I decided not to
include that part in this fix to keep it contained.

Seems reasonable.  Though I would suggest going forward with trap
replacement for clearly invalid ranges as a follow-up.  Once you do trap
replacement, the input operands all become dead as does all the code
after the trap and the outgoing edges in the CFG.

That often exposes a fair amount of cleanup.  Furthermore on targets
that have conditional traps, the controlling condition often turns into
a conditional trap.  In all, you get a lot of nice cascading effects
when you turn something that is clearly bogus into a trap.

gimple-ssa-isolate-erroneous-paths probably has the infrastructure you
need, including a code you can crib to detect PHI arguments which would
cause bogus behavior and allow you to isolate that specific path.


Thanks for the pointer.  I'll look into it.


+  if (!wi::fits_uhwi_p (min) || !wi::fits_uhwi_p (max))
+return NULL_TREE;
+

why?


Only because I never remember what APIs are safe to use with
what input.


+  if (wi::eq_p (min, wone)
+  && wi::geu_p (max + 1, ssize_max))

   if (wi::eq_p (min, 1)
  && wi::gtu_p (max, wi::max_value (prec, SIGNED)))

your ssize_max isn't signed size max, and max + 1 might overflow to zero.


You're right that the addition to max would be better done
as subtraction from the result of (1 << N).  Thank you.

If (max + 1) overflowed then (max == TYPE_MAX) would have
to hold which I thought could never be true for an anti
range. (The patch includes tests for this case.)  Was I
wrong?

Couldn't we have an anti range like ~[TYPE_MAX,TYPE_MAX]?  Or am I
misunderstanding something.


I've never seen such an anti-range.  In my tests, a range one
of whose bounds is equal to one of the type's limits) ends up
canonicalized into an ordinary range.  If I'm reading the code
right it's done in set_and_canonicalize_value_range.


Attached is an updated version with the suggested changes
plus an additional test to verify the absence of warnings.

The patch is OK.


Thanks.  Patch committed in r251347.



I'll note this is another use of anti-ranges.  I'd really like to see
Aldy's work on the new range API move forward and get rid of anti-ranges.


You and me both :)

Martin

Re: [Patch][aarch64] Use IFUNCs to enable LSE instructions in libatomic on aarch64

2017-08-24 Thread Steve Ellcey

Ping.

> 2017-08-07  Steve Ellcey  
> 
>   * Makefile.am (ARCH_AARCH64_LINUX_LSE): Add IFUNC_OPTIONS and
>   libatomic_la_LIBADD.
>   * config/linux/aarch64/host-config.h: New file.
>   * config/linux/aarch64/init.c: New file.
>   * configure.ac (AC_CHECK_HEADERS): Check for sys/auxv.h.
>   (AC_CHECK_FUNCS): Check for getauxval.
>   (ARCH_AARCH64_LINUX_LSE): New conditional for IFUNC builds.
>   * configure.tgt (aarch64): Set AARCH and try_ifunc.
>   (aarch64*-*-linux*) Update config_path.
>   * Makefile.in: Regenerate.
>   * auto-config.h.in: Regenerate.
>   * configure: Regenerate.
>

Re: [PATCH GCC][04/06]Add copying interface for dependence_info

2017-08-24 Thread Jeff Law

On 08/14/2017 03:19 AM, Bin Cheng wrote:
> HI,
> This patch adds copying interface for dependence_info.  The methodology
> is we don't copy such information by default, and this interface should
> be called explicitly when it is safe and necessary to do so.  Just like
> this patch uses the interface in ivopts.
> Bootstrap and test in series.  Is it OK?
> 
> Thanks,
> bin
> 2017-08-10  Bin Cheng  
> 
>   * tree-ssa-address.c (copy_dependence_info): New function.
>   * tree-ssa-address.h (copy_dependence_info): New declaration.
>   * tree-ssa-loop-ivopts.c (rewrite_use_address): Call above func.
So do we have any structure sharing assumptions on the alias structures?
 ie, are we setting up the possibility that these objects will be shared
and that someone will modify them in a way that works in one context,
but not another?

If they're readonly after creation, then obviously this isn't a concern.

I wouldn't consider this an object or an ACK for the patch at this
point. More a design question we need to answer.

Jeff

Re: [PATCH v2] Simplify pow with constant

2017-08-24 Thread Jeff Law

On 08/17/2017 09:43 AM, Alexander Monakov wrote:
> On Thu, 17 Aug 2017, Wilco Dijkstra wrote:
> 
>> This patch simplifies pow (C, x) into exp (x * C1) if C > 0, C1 = log (C).
> 
> Note this changes the outcome for C == +Inf, x == 0 (pow is specified to
> return 1.0 in that case, but x * C1 == NaN).  There's another existing
> transform with the same issue, 'pow(expN(x), y) -> expN(x*y)', so this is
> not a new problem.
That may be part of the reason why they're guarded with -ffast-math.  In
addition to the changes in accuracy.  I haven't followed those
transformations on the GCC side to know with any certainty.

jeff

Re: [PATCH v2] Simplify pow with constant

2017-08-24 Thread Jeff Law

On 08/17/2017 07:56 AM, Wilco Dijkstra wrote:
> This patch simplifies pow (C, x) into exp (x * C1) if C > 0, C1 = log (C).
> Do this only for fast-math as accuracy is reduced.  This is much faster
> since pow is more complex than exp - with current GLIBC the speedup is
> more than 7 times for this transformation.
Right.  exp is painful in glibc, but pow is *dramatically* more painful
and likely always will be.

Siddhesh did some great work in bringing those costs down in glibc but
the more code we can reasonably shunt into exp instead of pow, the better.

It's likely pow will always be significantly more expensive than exp.
It's also likely that predicting when these functions are going to go
off the fast paths is painful.

I think Richi already ack'd the V2 patch.  Just wanted to chime in given
I've been fairly deep on this in the past with customers on the glibc side.

jeff

Re: [PING #2] [PATCH] enhance -Wrestrict to handle string built-ins (PR 78918)

2017-08-24 Thread Jeff Law

On 08/22/2017 02:45 AM, Richard Biener wrote:
> On Mon, Aug 21, 2017 at 10:10 PM, Martin Sebor  wrote:
>> On 08/09/2017 10:14 AM, Jeff Law wrote:
>>>
>>> On 08/06/2017 05:08 PM, Martin Sebor wrote:
>>>
>
> Well, simply because the way as implemented isn't a must-alias query
> but maybe one that's good enough for warnings (reduces false positives
> but surely doesn't eliminate them).


 I'm very interested in reducing the rate of false positives in
 these and all other warnings.  As I mentioned in my comments,
 I did my best to weed them out of the implementation by building
 GDB, Glibc, Busybox, and the Linux kernel.  That of course isn't
 a guarantee that there aren't any.  But the first implementation
 of any non-trivial feature is never perfect, and hardly any
 warning of sufficient complexity is free of false positives, no
 matter here it's implemented (the middle-end, front-end, or
 a standalone static analysis tool).  If you spotted some cases
 I had missed I'd certainly be grateful for examples.  Otherwise,
 they will undoubtedly be reported as more software is exposed
 to the warning and, if possible, fixed, as happens with all
 other warnings.
>>>
>>> I think Richi is saying that the must alias query you've built isn't
>>> proper/correct.  It's less about false positives for Richi and more
>>> about building a proper must alias query if I understand him correctly.
>>>
>>> I suspect he's also saying that you can't reasonably build must-alias on
>>> top of a may-alias query framework.  They're pretty different queries.
>>>
>>> If you need something that is close to, but not quite a must alias
>>> query, then you're going to have to make a argument for that and you
>>> can't call it a must alias query.
>>
>>
>> Attached is an updated and simplified patch that avoids making
>> changes to any of the may-alias functions.  It turns out that
>> all the information the logic needs to determine the overlap
>> is already in the ao_ref structures populated by
>> ao_ref_init_from_ptr_and_size.  The only changes to the pass
>> are the enhancement to ao_ref_init_from_ptr_and_size to make
>> use of range info and the addition of the two new functions
>> used by the -Wrestrict clients outside the pass.
> 
> Warning for memcpy (p, p, ...) is going to fire false positives all around
> given the C++ FE emits those in some cases and optimization can
> expose that we are dealing with self-assignments.  And *p = *p is
> valid.
Correct.  I wound my way through this mess a while back.  Essentially
Red Hat had a customer with code that had overlapped memcpy arguments.
We had them use the memstomp interposition library to start tracking
these problems down.

One of the things that popped up was structure/class copies which were
implemented via calls to memcpy.In the case of self assignment, the
interposition library would note the overlap and (rightly IMHO) complain.


One could argue that GCC should emit memmove by default for structure
assignments, only using memcpy when it knows its not doing self
assignment (which may be hard to determine).  Furthermore, GCC should
eliminate self structure/class assignment.


> 
> @@ -1028,6 +1066,10 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator 
> *gsi,
> }
> }
> 
> +  /* Avoid folding the call if overlap is detected.  */
> +  if (check_overlap && detect_overlap (loc, stmt, dest, src, len))
> +   return false;
> +
> 
> no, please not.  You diagnosed the call (which might be a false positive)
> so why keep it undefined?  The folded stmt will either have the same
> semantics (aggregate copies expanded as memcpy) or have all reads
> performed before writes.
So can we distinguish here between overlap and the self-copy case?

A self-copy should just be folded away.  It's no different than x = x on
scalars except that we've dropped it to a memcpy in the IL.  Doing so
makes the code more efficient and removes false positives from tools
like the memstomp interposition library, making those tools more useful.




Jeff

Re: [PATCH, rs6000] Fix PR81504 (vec_ld / vec_st bug)

2017-08-24 Thread Segher Boessenkool

Hi!

On Thu, Aug 24, 2017 at 04:04:23PM -0500, Bill Schmidt wrote:
> @@ -1501,7 +1503,21 @@ recombine_lvx_pattern (rtx_insn *insn, del_info *t
> to_delete[INSN_UID (swap_insn)].replace = true;
> to_delete[INSN_UID (swap_insn)].replace_insn = swap_insn;
>  
> -   XEXP (mem, 0) = and_operation;
> +   /* However, first we must be sure that we make the
> +  base register from the AND operation available
> +  in case the register has been overwritten.  Copy
> +  the base register to a new pseudo and use that
> +  as the base register of the AND operation in
> +  the new LVX instruction.  */
> +   rtx and_base = XEXP (and_operation, 0);
> +   rtx new_reg = gen_reg_rtx (GET_MODE (and_base));
> +   rtx copy = gen_rtx_SET (new_reg, and_base);
> +   rtx_insn *new_insn = emit_insn_after (copy, and_insn);
> +   set_block_for_insn (new_insn, BLOCK_FOR_INSN (and_insn));
> +   df_insn_rescan (new_insn);

Are those last two lines needed?  Doesn't emit_insn_after do this
already?

Oh, "copy" isn't an insn yet.  Is it simpler if you change this?

Okay with or without that change (also for 7).  Thanks!


Segher

Re: [PATCH] Improve alloca alignment

2017-08-24 Thread Jeff Law

On 08/22/2017 08:15 AM, Wilco Dijkstra wrote:
> Jeff Law wrote:
> On 07/26/2017 05:29 PM, Wilco Dijkstra wrote:
> 
>>> But then the check size_align % MAX_SUPPORTED_STACK_ALIGNMENT != 0
>>> seems wrong too given that round_push uses a different alignment to align 
>>> to. 
>> I had started to dig into the history of this code, but just didn't have
>> the time to do so fully before needing to leave for the day.  To some
>> degree I was hoping you knew the rationale behind the test against
>> MAX_SUPPORTED_STACK_ALIGNMENT and I wouldn't have to do a ton of digging :-)
> 
> I looked further into this - it appears this works correctly since it is only 
> bypassed if
> size_align is already maximally aligned. round_push aligns to the preferred 
> alignment,
> which may be lower or equal to MAX_SUPPORTED_STACK_ALIGNMENT (which is
> at least STACK_BOUNDARY).
> 
> Here is the updated version:
> 
> ChangeLog:
> 2017-08-22  Wilco Dijkstra  
> 
>   * explow.c (get_dynamic_stack_size): Improve dynamic alignment.
OK.  I wonder if this will make it easier to write stack-clash tests of
the dynamic space for boundary conditions :-)  I was always annoyed that
I had to fiddle around with magic adjustments to the sizes of objects to
tickle boundary cases.

jeff

Re: [PATCH] handle pathological anti-ranges in gimple_fold_builtin_memory_op (PR 81908)

2017-08-24 Thread Jeff Law

On 08/24/2017 09:03 AM, Martin Sebor wrote:
> On 08/24/2017 08:03 AM, Richard Biener wrote:
>> On Wed, Aug 23, 2017 at 9:42 PM, Martin Sebor  wrote:
>>> Bug 81908 is about a -Wstringop-overflow warning for a Fortran
>>> test triggered by a recent VRP improvement.  A simple test case
>>> that approximates the warning is:
>>>
>>>   void f (char *d, const char *s, size_t n)
>>>   {
>>> if (n > 0 && n <= SIZE_MAX / 2)
>>>   n = 0;
>>>
>>> memcpy (d, s, n);   // n in ~[1, SIZE_MAX / 2]
>>>   }
>>>
>>> Since the only valid value of n is zero the call to memcpy can
>>> be considered a no-op (a value of n > SIZE_MAX is in excess of
>>> the maximum size of the largest object and would surely make
>>> the call crash).
>>>
>>> The important difference between the test case and Bug 81908
>>> is that in the latter, the code is emitted by GCC itself from
>>> what appears to be correct source (looks like it's the result
>>> of the loop distribution pass).  I believe the warning for
>>> the test case above and for other human-written code like it
>>> is helpful, but warning for code emitted by GCC, even if it's
>>> dead or unreachable, is obviously not (at least not to users).
>>>
>>> The attached patch enhances the gimple_fold_builtin_memory_op
>>> function to eliminate this patholohgical case by making use
>>> of range information to fold into no-ops calls to memcpy whose
>>> size argument is in a range where the only valid value is zero.
>>> This gets rid of the warning and improves the emitted code.
>>>
>>> Tested on x86_64-linux.
>>
>> @@ -646,6 +677,12 @@ gimple_fold_builtin_memory_op
>> (gimple_stmt_iterator *gsi,
>>tree destvar, srcvar;
>>location_t loc = gimple_location (stmt);
>>
>> +  if (tree valid_len = only_valid_value (len))
>> +{
>> +  /* LEN is in range whose only valid value is zero.  */
>> +  len = valid_len;
>> +}
>> +
>>/* If the LEN parameter is zero, return DEST.  */
>>if (integer_zerop (len))
>>
>> just enhance this check to
>>
>>   if (integer_zerop (len)
>>   || size_must_be_zero_p (len))
>>
>> ?  'only_valid_value' looks too generic for this.
> 
> Sure.
> 
> FWIW, the reason I did it this was because my original patch
> returned error_mark_node for entirely invalid ranges and had
> the caller replace the call with a trap.  I decided not to
> include that part in this fix to keep it contained.
Seems reasonable.  Though I would suggest going forward with trap
replacement for clearly invalid ranges as a follow-up.  Once you do trap
replacement, the input operands all become dead as does all the code
after the trap and the outgoing edges in the CFG.

That often exposes a fair amount of cleanup.  Furthermore on targets
that have conditional traps, the controlling condition often turns into
a conditional trap.  In all, you get a lot of nice cascading effects
when you turn something that is clearly bogus into a trap.

gimple-ssa-isolate-erroneous-paths probably has the infrastructure you
need, including a code you can crib to detect PHI arguments which would
cause bogus behavior and allow you to isolate that specific path.



> 
>>
>> +  if (!wi::fits_uhwi_p (min) || !wi::fits_uhwi_p (max))
>> +return NULL_TREE;
>> +
>>
>> why?
> 
> Only because I never remember what APIs are safe to use with
> what input.
> 
>> +  if (wi::eq_p (min, wone)
>> +  && wi::geu_p (max + 1, ssize_max))
>>
>>if (wi::eq_p (min, 1)
>>   && wi::gtu_p (max, wi::max_value (prec, SIGNED)))
>>
>> your ssize_max isn't signed size max, and max + 1 might overflow to zero.
> 
> You're right that the addition to max would be better done
> as subtraction from the result of (1 << N).  Thank you.
> 
> If (max + 1) overflowed then (max == TYPE_MAX) would have
> to hold which I thought could never be true for an anti
> range. (The patch includes tests for this case.)  Was I
> wrong?
Couldn't we have an anti range like ~[TYPE_MAX,TYPE_MAX]?  Or am I
misunderstanding something.

> 
> Attached is an updated version with the suggested changes
> plus an additional test to verify the absence of warnings.
The patch is OK.

I'll note this is another use of anti-ranges.  I'd really like to see
Aldy's work on the new range API move forward and get rid of anti-ranges.

THanks,
jeff

Re: [63/77] Simplifications after type switch

2017-08-24 Thread Jeff Law

On 07/13/2017 03:01 AM, Richard Sandiford wrote:
> This patch makes a few simplifications after the previous
> mechanical machine_mode->scalar_int_mode change.
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * expmed.c (extract_high_half): Use scalar_int_mode and remove
>   assertion.
>   (expmed_mult_highpart_optab): Likewise.
>   (expmed_mult_highpart): Likewise.
OK.
jeff

Re: [62/77] Big machine_mode to scalar_int_mode replacement

2017-08-24 Thread Jeff Law

On 07/13/2017 03:00 AM, Richard Sandiford wrote:
> This patch changes the types of various things from machine_mode
> to scalar_int_mode, in cases where (after previous patches)
> simply changing the type is enough on its own.  The patch does
> nothing other than that.
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * builtins.h (builtin_strncpy_read_str): Take a scalar_int_mode
>   instead of a machine_mode.
>   (builtin_memset_read_str): Likewise.
>   * builtins.c (c_readstr): Likewise.
>   (builtin_memcpy_read_str): Likewise.
>   (builtin_strncpy_read_str): Likewise.
>   (builtin_memset_read_str): Likewise.
>   (builtin_memset_gen_str): Likewise.
>   (expand_builtin_signbit): Use scalar_int_mode for local variables.
>   * cfgexpand.c (convert_debug_memory_address): Take a scalar_int_mode
>   instead of a machine_mode.
>   * combine.c (simplify_if_then_else): Use scalar_int_mode for local
>   variables.
>   (make_extraction): Likewise.
>   (try_widen_shift_mode): Take and return scalar_int_modes instead
>   of machine_modes.
>   * config/aarch64/aarch64.c (aarch64_libgcc_cmp_return_mode): Return
>   a scalar_int_mode instead of a machine_mode.
>   * config/avr/avr.c (avr_addr_space_address_mode): Likewise.
>   (avr_addr_space_pointer_mode): Likewise.
>   * config/cr16/cr16.c (cr16_unwind_word_mode): Likewise.
>   * config/msp430/msp430.c (msp430_addr_space_pointer_mode): Likewise.
>   (msp430_unwind_word_mode): Likewise.
>   * config/spu/spu.c (spu_unwind_word_mode): Likewise.
>   (spu_addr_space_pointer_mode): Likewise.
>   (spu_addr_space_address_mode): Likewise.
>   (spu_libgcc_cmp_return_mode): Likewise.
>   (spu_libgcc_shift_count_mode): Likewise.
>   * config/rl78/rl78.c (rl78_addr_space_address_mode): Likewise.
>   (rl78_addr_space_pointer_mode): Likewise.
>   (fl78_unwind_word_mode): Likewise.
>   (rl78_valid_pointer_mode): Take a scalar_int_mode instead of a
>   machine_mode.
>   * config/alpha/alpha.c (vms_valid_pointer_mode): Likewise.
>   * config/ia64/ia64.c (ia64_vms_valid_pointer_mode): Likewise.
>   * config/mips/mips.c (mips_mode_rep_extended): Likewise.
>   (mips_valid_pointer_mode): Likewise.
>   * config/tilegx/tilegx.c (tilegx_mode_rep_extended): Likewise.
>   * config/ft32/ft32.c (ft32_valid_pointer_mode): Likewise.
>   (ft32_addr_space_pointer_mode): Return a scalar_int_mode instead
>   of a machine_mode.
>   (ft32_addr_space_address_mode): Likewise.
>   * config/m32c/m32c.c (m32c_valid_pointer_mode): Take a
>   scalar_int_mode instead of a machine_mode.
>   (m32c_addr_space_pointer_mode): Return a scalar_int_mode instead
>   of a machine_mode.
>   (m32c_addr_space_address_mode): Likewise.
>   * config/powerpcspe/powerpcspe.c (rs6000_abi_word_mode): Likewise.
>   (rs6000_eh_return_filter_mode): Likewise.
>   * config/rs6000/rs6000.c (rs6000_abi_word_mode): Likewise.
>   (rs6000_eh_return_filter_mode): Likewise.
>   * config/s390/s390.c (s390_libgcc_cmp_return_mode): Likewise.
>   (s390_libgcc_shift_count_mode): Likewise.
>   (s390_unwind_word_mode): Likewise.
>   (s390_valid_pointer_mode): Take a scalar_int_mode rather than a
>   machine_mode.
>   * target.def (mode_rep_extended): Likewise.
>   (valid_pointer_mode): Likewise.
>   (addr_space.valid_pointer_mode): Likewise.
>   (eh_return_filter_mode): Return a scalar_int_mode rather than
>   a machine_mode.
>   (libgcc_cmp_return_mode): Likewise.
>   (libgcc_shift_count_mode): Likewise.
>   (unwind_word_mode): Likewise.
>   (addr_space.pointer_mode): Likewise.
>   (addr_space.address_mode): Likewise.
>   * doc/tm.texi: Regenerate.
>   * dojump.c (prefer_and_bit_test): Take a scalar_int_mode rather than
>   a machine_mode.
>   (do_jump): Use scalar_int_mode for local variables.
>   * dwarf2cfi.c (init_return_column_size): Take a scalar_int_mode
>   rather than a machine_mode.
>   * dwarf2out.c (convert_descriptor_to_mode): Likewise.
>   (scompare_loc_descriptor_wide): Likewise.
>   (scompare_loc_descriptor_narrow): Likewise.
>   * emit-rtl.c (adjust_address_1): Use scalar_int_mode for local
>   variables.
>   * except.c (sjlj_emit_dispatch_table): Likewise.
>   (expand_builtin_eh_copy_values): Likewise.
>   * explow.c (convert_memory_address_addr_space_1): Likewise.
>   Take a scalar_int_mode rather than a machine_mode.
>   (convert_memory_address_addr_space): Take a scalar_int_mode rather
>   than a machine_mode.
>   (memory_address_addr_space): Use scalar_int_mode for local variables.
>   * expmed.h (expand_mult_highpart_adjust): Take a scalar_int_mode
>

Re: [61/77] Use scalar_int_mode in the AArch64 port

2017-08-24 Thread Jeff Law

On 07/13/2017 03:00 AM, Richard Sandiford wrote:
> This patch makes the AArch64 port use scalar_int_mode in various places.
> Other ports won't need this kind of change; we only need it for AArch64
> because of the variable-sized SVE modes.
> 
> The only change in functionality is in the rtx_costs handling
> of CONST_INT.  If the caller doesn't supply a mode, we now pass
> word_mode rather than VOIDmode to aarch64_internal_mov_immediate.
> aarch64_movw_imm will therefore not now truncate large constants
> in this situation.
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * config/aarch64/aarch64-protos.h (aarch64_is_extend_from_extract):
>   Take a scalar_int_mode instead of a machine_mode.
>   (aarch64_mask_and_shift_for_ubfiz_p): Likewise.
>   (aarch64_move_imm): Likewise.
>   (aarch64_output_scalar_simd_mov_immediate): Likewise.
>   (aarch64_simd_scalar_immediate_valid_for_move): Likewise.
>   (aarch64_simd_attr_length_rglist): Delete.
>   * config/aarch64/aarch64.c (aarch64_is_extend_from_extract): Take
>   a scalar_int_mode instead of a machine_mode.
>   (aarch64_add_offset): Likewise.
>   (aarch64_internal_mov_immediate): Likewise
>   (aarch64_add_constant_internal): Likewise.
>   (aarch64_add_constant): Likewise.
>   (aarch64_movw_imm): Likewise.
>   (aarch64_move_imm): Likewise.
>   (aarch64_rtx_arith_op_extract_p): Likewise.
>   (aarch64_mask_and_shift_for_ubfiz_p): Likewise.
>   (aarch64_simd_scalar_immediate_valid_for_move): Likewise.
>   Remove assert that the mode isn't a vector.
>   (aarch64_output_scalar_simd_mov_immediate): Likewise.
>   (aarch64_expand_mov_immediate): Update calls after above changes.
>   (aarch64_output_casesi): Use as_a .
>   (aarch64_and_bitmask_imm): Check for scalar integer modes.
>   (aarch64_strip_extend): Likewise.
>   (aarch64_extr_rtx_p): Likewise.
>   (aarch64_rtx_costs): Likewise, using wode_mode as the mode of
>   a CONST_INT when the mode parameter is VOIDmode.
I'll let the aarch64 owners ack this one...  You might know one or two
of 'em :-)

jeff

Re: [60/77] Pass scalar_int_modes to do_jump_by_parts_*

2017-08-24 Thread Jeff Law

On 07/13/2017 02:59 AM, Richard Sandiford wrote:
> The callers of do_jump_by_parts_* had already established
> that the modes were MODE_INTs, so this patch passes the
> modes down as scalar_int_modes.
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * dojump.c (do_jump_by_parts_greater_rtx): Change the type of
>   the mode argument to scalar_int_mode.
>   (do_jump_by_parts_zero_rtx): Likewise.
>   (do_jump_by_parts_equality_rtx): Likewise.
>   (do_jump_by_parts_greater): Take a mode argument.
>   (do_jump_by_parts_equality): Likewise.
>   (do_jump_1): Update calls accordingly.
OK.
jeff

Re: [59/77] Add a rtx_jump_table_data::get_data_mode helper

2017-08-24 Thread Jeff Law

On 07/13/2017 02:59 AM, Richard Sandiford wrote:
> This patch adds a helper function to get the mode of the addresses
> or offsets in a jump table.  It also changes the final.c code to use
> rtx_jump_table_data over rtx or rtx_insn in cases where it needed
> to use the new helper.  This in turn meant adding a safe_dyn_cast
> equivalent of safe_as_a, to cope with null NEXT_INSNs.
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * is-a.h (safe_dyn_cast): New function.
>   * rtl.h (rtx_jump_table_data::get_data_mode): New function.
>   (jump_table_for_label): Likewise.
>   * final.c (final_addr_vec_align): Take an rtx_jump_table_data *
>   instead of an rtx_insn *.
>   (shorten_branches): Use dyn_cast instead of LABEL_P and
>   JUMP_TABLE_DATA_P.  Use jump_table_for_label and
>   rtx_jump_table_data::get_data_mode.
>   (final_scan_insn): Likewise.
OK.
jeff
>

Re: [58/77] Use scalar_int_mode in a try_combine optimisation

2017-08-24 Thread Jeff Law

On 07/13/2017 02:59 AM, Richard Sandiford wrote:
> This patch uses scalar_int_modes for:
> 
>   /* If I2 is setting a pseudo to a constant and I3 is setting some
>  sub-part of it to another constant, merge them by making a new
>  constant.  */
> 
> This was already implicit, but the danger with checking only
> CONST_SCALAR_INT_P is that it can include CC values too.
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * combine.c (try_combine): Use is_a  when
>   trying to combine a full-register integer set with a subreg
>   integer set.
OK.  I briefly mis-read the SET_DEST in the is_a as a SET_SRC (which in
this context could have VOIDmode).  But you're getting the mode from the
dest :-)

jeff

Re: [57/77] Use scalar_int_mode in expand_expr_addr_expr

2017-08-24 Thread Jeff Law

On 07/13/2017 02:58 AM, Richard Sandiford wrote:
> This patch rewrites the condition:
> 
>   if (tmode != address_mode && tmode != pointer_mode)
> tmode = address_mode;
> 
> to the equivalent:
> 
>   tmode == pointer_mode ? pointer_mode : address_mode
> 
> The latter has the advantage that the result is naturally
> a scalar_int_mode; a later mechanical patch makes it one.
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * expr.c (expand_expr_addr_expr): Add a new_tmode local variable
>   that is always either address_mode or pointer_mode.
OK.
jeff

Re: [55/77] Use scalar_int_mode in simplify_const_unary_operation

2017-08-24 Thread Jeff Law

On 07/13/2017 02:58 AM, Richard Sandiford wrote:
> The main scalar integer block in simplify_const_unary_operation
> had the condition:
> 
>   if (CONST_SCALAR_INT_P (op) && width > 0)
> 
> where "width > 0" was a roundabout way of testing != VOIDmode.
> This patch replaces it with a check for a scalar_int_mode instead.
> It also uses the number of bits in the input rather than the output
> mode to determine the result of a "count ... bits in zero" operation.
> (At the momemnt these modes have to be the same, but it still seems
> conceptually wrong to use the number of bits in the output mode.)
> 
> The handling of float->integer ops also checked "width > 0",
> but this was redundant with the earlier check for MODE_INT.
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * simplify-rtx.c (simplify_const_unary_operation): Use
>   is_a  instead of checking for a nonzero
>   precision.  Forcibly convert op_mode to a scalar_int_mode
>   in that case.  More clearly differentiate the operand and
>   result modes and use the former when deciding what the value
>   of a count-bits operation should be.  Use is_int_mode instead
>   of checking for a MODE_INT.  Remove redundant check for whether
>   this mode has a zero precision.
OK.
jeff

Re: [54/77] Add explicit int checks for alternative optab implementations

2017-08-24 Thread Jeff Law

On 07/13/2017 02:57 AM, Richard Sandiford wrote:
> expand_unop can expand narrow clz, clrsb, ctz, bswap, parity and
> ffs operations using optabs for wider modes.  These expansions
> apply only to scalar integer modes (and not for example to vectors),
> so the patch adds explicit checks for that.
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * optabs.c (widen_leading): Change the type of the mode argument
>   to scalar_int_mode.  Use opt_scalar_int_mode for the mode iterator.
>   (widen_bswap): Likewise.
>   (expand_parity): Likewise.
>   (expand_ctz): Change the type of the mode argument to scalar_int_mode.
>   (expand_ffs): Likewise.
>   (epand_unop): Check for scalar integer modes before calling the
>   above routines.
OK.
jeff

Re: [PATCH GCC][03/06]Dump dependence information

2017-08-24 Thread Jeff Law

On 08/14/2017 03:19 AM, Bin Cheng wrote:
> Hi,
> This simple patch adds code dumping struct dependence_info.
> Bootstrap and test in series.  Is it OK?
> 
> Thanks,
> bin
> 2017-08-10  Bin Cheng  
> 
>   * tree-pretty-print.c (dump_generic_node): Dump fixed length
>   tag in MEM_REF.  Dump dependence info in TARGET_MEM_REF.
> 
This is OK iff its dependencies are approved.

jeff

Re: [PATCH] PR81747, ICE in operator[]

2017-08-24 Thread Jeff Law

On 08/15/2017 09:27 PM, Alan Modra wrote:
> Ping?
> 
> On Wed, Aug 09, 2017 at 08:58:31PM +0930, Alan Modra wrote:
>>  PR rtl-optimization/81747
>>  * cse.c (cse_extended_basic_block): Don't attempt to record
>>  equivalences for degenerate conditional jumps that branch
>>  to their fall-through.
OK.  Sorry I didn't see this one in my queue.  Thanks for pinging it.

jeff

Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)

2017-08-24 Thread Jeff Law

On 08/17/2017 03:55 AM, Wilco Dijkstra wrote:
> Richard Biener wrote:
>> On Tue, Aug 15, 2017 at 4:11 PM, Wilco Dijkstra  
>> wrote:
>>> Richard Biener wrote:
> We also change the association of
>
>   x / (y * C) -> (x / C) / y
>
> If C is a constant.

 Why's that profitable?
>>>
>>> It enables (x * C1) / (y * C2) -> (x * C1/C2) / y for example.
>>> Also 1/y is now available to the reciprocal optimization, see
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026 for details.
>>
>> Sure, but on its own it's going to be slower.  So this isn't the
>> correct way to enable those followup transforms.
> 
> How can it be any slower? It's one division and one multiply in both cases.
x / (y * C) -> (x / C) / y

Goes from one division and one multiplication to two divisions.  I'm
guessing that's what Richi is (reasonably) concerned about.

So it may be the case that we need more complex pattern matching here
for when to perform the first transformation to ultimately enable the
second.


Jeff

[PATCH, rs6000] Fix PR81504 (vec_ld / vec_st bug)

2017-08-24 Thread Bill Schmidt

Hi,

https://gcc.gnu.org/PR81504 reports a problem with the use of vec_st to generate
the stvx instruction.  With swap optimization enabled, the stvx instruction uses
the wrong base address, causing data corruption.

The problem arises in the recombine_lvx_stvx pre-pass that runs prior to swap
optimization proper.  This pre-pass looks for RTL that can be combined into an
lvx or stvx pattern and makes that transformation before the swap optimizer
has a chance to disturb things beyond recognition.  Unfortunately it contains a
rather silly bug.

The base register for the memory operand of an lvx or stvx looks something like
((ptr + offset) & -16).  The optimization copies this expression into the new
lvx or stvx pattern.  However, Dorothy, we aren't in SSA form anymore, and we
can't assume this expression is fully available at the point of replacement.

The fix is to insert a copy immediately after the AND expression that computes
this result.  The AND contains a register with the value (ptr + offset) at that
location.  We copy that into a new pseudo and use that pseudo in the AND 
subexpression of the lvx/stvx, rather than the original register.  This solves
the problem reported in the PR.

The bug was introduced in GCC 7, so I would like to backport this after 
sufficient
burn-in time.  The backport will be the same, except that this code was in
rs6000.c rather than rs6000-p8swap.c in GCC 7.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.  Is this
okay for trunk and 7?

Thanks,
Bill


2017-08-24  Bill Schmidt  

PR target/81504
* config/rs6000/rs6000-p8swap.c (find_alignment_op): Add reference
parameter and_insn and return it.
(recombine_lvx_pattern): Insert a copy to ensure availability of
the base register of the copied masking operation at the point of
the instruction replacement.
(recombine_stvx_pattern): Likewise.


Index: gcc/config/rs6000/rs6000-p8swap.c
===
--- gcc/config/rs6000/rs6000-p8swap.c   (revision 251339)
+++ gcc/config/rs6000/rs6000-p8swap.c   (working copy)
@@ -1431,9 +1431,10 @@ alignment_mask (rtx_insn *insn)
 }
 
 /* Given INSN that's a load or store based at BASE_REG, look for a
-   feeding computation that aligns its address on a 16-byte boundary.  */
+   feeding computation that aligns its address on a 16-byte boundary.
+   Return the rtx and its containing AND_INSN.  */
 static rtx
-find_alignment_op (rtx_insn *insn, rtx base_reg)
+find_alignment_op (rtx_insn *insn, rtx base_reg, rtx_insn **and_insn)
 {
   df_ref base_use;
   struct df_insn_info *insn_info = DF_INSN_INFO_GET (insn);
@@ -1454,8 +1455,8 @@ static rtx
   if (DF_REF_IS_ARTIFICIAL (base_def_link->ref))
break;
 
-  rtx_insn *and_insn = DF_REF_INSN (base_def_link->ref);
-  and_operation = alignment_mask (and_insn);
+  *and_insn = DF_REF_INSN (base_def_link->ref);
+  and_operation = alignment_mask (*and_insn);
   if (and_operation != 0)
break;
 }
@@ -1477,7 +1478,8 @@ recombine_lvx_pattern (rtx_insn *insn, del_info *t
   rtx mem = XEXP (SET_SRC (body), 0);
   rtx base_reg = XEXP (mem, 0);
 
-  rtx and_operation = find_alignment_op (insn, base_reg);
+  rtx_insn *and_insn;
+  rtx and_operation = find_alignment_op (insn, base_reg, _insn);
 
   if (and_operation != 0)
 {
@@ -1501,7 +1503,21 @@ recombine_lvx_pattern (rtx_insn *insn, del_info *t
  to_delete[INSN_UID (swap_insn)].replace = true;
  to_delete[INSN_UID (swap_insn)].replace_insn = swap_insn;
 
- XEXP (mem, 0) = and_operation;
+ /* However, first we must be sure that we make the
+base register from the AND operation available
+in case the register has been overwritten.  Copy
+the base register to a new pseudo and use that
+as the base register of the AND operation in
+the new LVX instruction.  */
+ rtx and_base = XEXP (and_operation, 0);
+ rtx new_reg = gen_reg_rtx (GET_MODE (and_base));
+ rtx copy = gen_rtx_SET (new_reg, and_base);
+ rtx_insn *new_insn = emit_insn_after (copy, and_insn);
+ set_block_for_insn (new_insn, BLOCK_FOR_INSN (and_insn));
+ df_insn_rescan (new_insn);
+
+ XEXP (mem, 0) = gen_rtx_AND (GET_MODE (and_base), new_reg,
+  XEXP (and_operation, 1));
  SET_SRC (body) = mem;
  INSN_CODE (insn) = -1; /* Force re-recognition.  */
  df_insn_rescan (insn);
@@ -1524,7 +1540,8 @@ recombine_stvx_pattern (rtx_insn *insn, del_info *
   rtx mem = SET_DEST (body);
   rtx base_reg = XEXP (mem, 0);
 
-  rtx and_operation = find_alignment_op (insn, base_reg);
+  rtx_insn *and_insn;
+  rtx and_operation = find_alignment_op (insn, base_reg, _insn);
 
   if (and_operation != 0)
 {
@@ -1552,7 +1569,21 @@ recombine_stvx_pattern (rtx_insn *insn,

Re: Drop df_ from df_read_modify_subreg_p

2017-08-24 Thread Jeff Law

On 08/24/2017 12:21 PM, Richard Sandiford wrote:
> ...it's really a general RTL predicate, rather than something that depends
> on the DF state.  Thanks to Segher for the suggestion.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> 
> Richard
> 
> 
> 2017-08-24  Richard Sandiford  
> 
> gcc/
>   * df.h (df_read_modify_subreg_p): Remove in favor of...
>   * rtl.h (read_modify_subreg_p): ...this new function.  Take a
>   const_rtx instead of an rtx.
>   * cprop.c (local_cprop_find_used_regs): Update accordingly.
>   * df-problems.c (df_word_lr_mark_ref): Likewise.
>   * ira-lives.c (mark_pseudo_reg_live): Likewise.
>   (mark_pseudo_reg_dead): Likewise.
>   (mark_ref_dead): Likewise.
>   * reginfo.c (init_subregs_of_mode): Likewise.
>   * sched-deps.c (sched_analyze_1): Likewise.
>   * df-scan.c (df_def_record_1): Likewise.
>   (df_uses_record): Likewise.
>   (df_read_modify_subreg_p): Remove in favor of...
>   * rtlanal.c (read_modify_subreg_p): ...this new function.  Take a
>   const_rtx instead of an rtx.
OK.
jeff

Re: [PATCH 1/3] improve detection of attribute conflicts (PR 81544)

2017-08-24 Thread Martin Sebor


The bulk of this patch touches what I think is considered
the middle-end (attribs.c) so let me include its maintainers,
Ian, Jeff, and Richard:

  https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01087.html

The high-level description and rationale for the change are
here:

  https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00602.html

Thanks
Martin

On 08/17/2017 10:03 AM, Martin Sebor wrote:

Attached is an updated patch with just the minor editorial
tweaks for the issues pointed out by Marek (stray comments
and spaces), and with the C++ and libstdc++ bits removed
and posted separately.  I also added some text to the manual
to clarify the const/pure effects.

I thought quite a bit more about the const/pure attributes we
discussed and tried the approach of warning only on a const
declaration after one with attribute pure has been used, but
otherwise allowing and silently ignoring pure after const.
In the end I decided against it, for a few reasons (most of
which I already mentioned but just to summarize).

First, there is the risk that someone will write code based
on the pure declaration even if there's no intervening call
to the function between it and the const one.  Code tends to
be sloppy, and it's also not uncommon to declare the same
function more than once, for whatever reason.  (The ssa-ccp-2.c
test is an example of the latter.)

Second, there are cases of attribute conflicts that GCC already
points out that are much more benign in their effects (the ones
I know about are always_inline/noinline and cold/hot).  In light
of the risk above it seems only helpful to include const/pure in
the same set.

Third, I couldn't find another pair of attributes that GCC would
deliberately handle this way (silently accept both but prefer one
over the other), and introducing a special case just for these
two seemed like a wart.

Finally, compiling Binutils, GDB, Glkibc, and the Linux kernel
with the enhanced warning didn't turn up any code that does this
sort of thing, either intentionally or otherwise.

With that, I've left the handling unchanged.

I do still have the question whether diagnosing attribute
conflicts under -Wattributes is right.  The manual implies
that -Wattributes is for attributes in the wrong places or
on the wrong entities, and that -Wignored-attributes should
be expected instead when GCC decides to drop one for some
reason.

It is a little unfortunate that many -Wattributes warnings
print just "attribute ignored" (and have done so for years).
I think they should all be enhanced to also print why the
attribute is ignored (e.g., "'packed' attribute on function
declarations ignored/not valid/not supported" or something
like that).  Those that ignore attributes that would
otherwise be valid e.g., because they conflict with other
specifiers of the same attribute but with a different
operand might then be candidate for changing to
-Wignored-attributes.

Thanks
Martin

[PATCH] Add --enable-static-pie to build static PIE

2017-08-24 Thread H.J. Lu

--enable-static-pie configures glibc to compile libc.a as PIE and
creates static executables as PIE.  A static position independent
executable (static PIE) is similar to static executable, but can be
loaded at any address without help from a dynamic linker.  All linker
input files must be compiled with -fpie or -fPIE.  "-z text" is also
passed to linker to prevent dynamic relocations in read-only segments.

This patch requires:

1. Linker supports --no-dynamic-linker to remove PT_INTERP segment from
static PIE.
2. Linker can create working static PIE.  The x86-64 linker needs the
fix for

https://sourceware.org/bugzilla/show_bug.cgi?id=21782

Binutils 2.29 or above is OK.
3. GCC supports -static-pie with the patch:

https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01455.html

or -static -pie.  Since the bug fix for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81523

has been applied, -static with GCC 8 will override -pie.

Tested on i686 and x86-64.

OK for master?

H.J.
---
* Makeconfig (pic-default): Updated for --enable-static-pie.
(pie-default): New for --enable-static-pie.
(default-pie-ldflag): Likewise.
(+link-static-before-libc): Add $(default-pie-ldflag).
(+prectorT): Updated for --enable-static-pie.
(+postctorT): Likewise.
(CFLAGS-.o): Add $(pie-default).
(CFLAGS-.op): Likewise.
* config.h.in (ENABLE_STATIC_PIE): New.
* configure.ac (--enable-static-pie): New configure option.
(have-no-dynamic-linker): New LIBC_CONFIG_VAR.
(have-static-pie): Likewise.
Enable static PIE if linker supports --no-dynamic-linker and
GCC supports -static-pie or -static -pie.
(ENABLE_STATIC_PIE): New AC_DEFINE.
(enable-static-pie): New LIBC_CONFIG_VAR.
* configure: Regenerated.
* csu/libc-start.c (LIBC_START_MAIN): Call _dl_relocate_static_pie
in libc.a.
* csu/libc-tls.c (__libc_setup_tls): Add main_map->l_addr to
initimage.
* elf/dl-support.c: Include "dynamic-link.h" and don't include
"get-dynamic-info.h" for --enable-static-pie.
(_dl_relocate_static_pie): New function for --enable-static-pie.
(STATIC_PIE_BOOTSTRAP): New for --enable-static-pie.
(RESOLVE_MAP): Likewise.
* elf/dynamic-link.h (ELF_DURING_STARTUP): Also check
STATIC_PIE_BOOTSTRAP.
* elf/get-dynamic-info.h (elf_get_dynamic_info): Likewise.
* sysdeps/generic/ldsodefs.h (_dl_relocate_static_pie): New.
* sysdeps/x86_64/configure.ac: Check if linker supports static PIE.
* sysdeps/x86_64/configure: Regenerated.
---
 Makeconfig  |  20 +++-
 config.h.in |   3 ++
 configure   | 109 
 configure.ac|  48 +++
 csu/libc-start.c|   2 +
 csu/libc-tls.c  |   6 +--
 elf/dl-support.c|  39 +++-
 elf/dynamic-link.h  |   2 +-
 elf/get-dynamic-info.h  |   6 ++-
 sysdeps/generic/ldsodefs.h  |   7 +++
 sysdeps/x86_64/configure|  37 +++
 sysdeps/x86_64/configure.ac |  26 +++
 12 files changed, 295 insertions(+), 10 deletions(-)

diff --git a/Makeconfig b/Makeconfig
index b51904b797..61dc61680a 100644
--- a/Makeconfig
+++ b/Makeconfig
@@ -386,6 +386,16 @@ LDFLAGS.so += $(hashstyle-LDFLAGS)
 LDFLAGS-rtld += $(hashstyle-LDFLAGS)
 endif
 
+ifeq (yes,$(enable-static-pie))
+pic-default = -DPIC
+pie-default = $(pie-ccflag)
+ifeq (yes,$(have-static-pie))
+default-pie-ldflag = -static-pie
+else
+default-pie-ldflag = -pie -Wl,--no-dynamic-linker,--eh-frame-hdr,-z,text
+endif
+endif
+
 # If lazy relocations are disabled, add the -z now flag.  Use
 # LDFLAGS-lib.so instead of LDFLAGS.so, to avoid adding the flag to
 # test modules.
@@ -435,6 +445,7 @@ endif
 # Command for statically linking programs with the C library.
 ifndef +link-static
 +link-static-before-libc = $(CC) -nostdlib -nostartfiles -static -o $@ \
+ $(default-pie-ldflag) \
  $(sysdep-LDFLAGS) $(LDFLAGS) $(LDFLAGS-$(@F))  \
  $(addprefix $(csu-objpfx),$(static-start-installed-name)) \
  $(+preinit) $(+prectorT) \
@@ -651,8 +662,13 @@ endif
 +prectorS = `$(CC) $(sysdep-LDFLAGS) --print-file-name=crtbeginS.o`
 +postctorS = `$(CC) $(sysdep-LDFLAGS) --print-file-name=crtendS.o`
 # Variants of the two previous definitions for statically linking programs.
+ifeq (yes,$(enable-static-pie))
++prectorT = $(+prectorS)
++postctorT = $(+postctorS)
+else
 +prectorT = `$(CC) $(sysdep-LDFLAGS) --print-file-name=crtbeginT.o`
 +postctorT = `$(CC) $(sysdep-LDFLAGS) --print-file-name=crtend.o`
+endif
 csu-objpfx = $(common-objpfx)csu/
 elf-objpfx = $(common-objpfx)elf/
 
@@ -973,7 +989,7 @@ libtypes = $(foreach 
o,$(object-suffixes-for-libc),$(libtype$o))
 all-object-suffixes := .o .os .oS
 object-suffixes :=
 CPPFLAGS-.o = $(pic-default)

Re: [PATCH] Add -static-pie to GCC driver to create static PIE

2017-08-24 Thread H.J. Lu

On Wed, Aug 9, 2017 at 3:39 AM, H.J. Lu  wrote:
> On Tue, Aug 8, 2017 at 10:36 PM, Richard Biener
>  wrote:
>> On August 9, 2017 12:18:41 AM GMT+02:00, "H.J. Lu"  
>> wrote:
>>>This patch adds -static-pie to GCC driver to create static PIE.  A
>>>static
>>>position independent executable (PIE) is similar to static executable,
>>>but can be loaded at any address without a dynamic linker.  All linker
>>>input files must be compiled with -fpie or -fPIE and linker must
>>>support
>>>--no-dynamic-linker to avoid linking with dynamic linker.  "-z text" is
>>>also needed to prevent dynamic relocations in read-only segments.
>>>
>>>OK for trunk?
>>
>> What's wrong with -static -pie?
>>
>
> -static -pie behaved differently depending on if --enable-default-pie
> was used:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81523
>
> I just checked in a patch to make -static always  override -pie.
>

Hi Joseph,

Can you take a look at this patch?  I am enclosing it here.

Thanks.


-- 
H.J.
From d5b7ffdb48d18b65e071b9e064e7d73ac58573da Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Thu, 20 Jul 2017 14:08:18 -0700
Subject: [PATCH] Add -static-pie to GCC driver to create static PIE

This patch adds -static-pie to GCC driver to create static PIE.  A static
position independent executable (PIE) is similar to static executable,
but can be loaded at any address without a dynamic linker.  All linker
input files must be compiled with -fpie or -fPIE and linker must support
--no-dynamic-linker to avoid linking with dynamic linker.  "-z text" is
also needed to prevent dynamic relocations in read-only segments.

	PR driver/81498
	* common.opt (-static-pie): New alias.
	(shared): Negate static-pie.
	(static-pie): New option.
	* config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Add
	-static-pie support.
	(GNU_USER_TARGET_ENDFILE_SPEC): Likewise.
	(LINK_EH_SPEC): Likewise.
	(LINK_GCC_C_SEQUENCE_SPEC): Likewise.
	* config/i386/gnu-user.h (GNU_USER_TARGET_LINK_SPEC): Likewise.
	* config/i386/gnu-user64.h (GNU_USER_TARGET_LINK_SPEC): Likewise.
	* gcc.c (LINK_COMMAND_SPEC): Likewise.
	(init_gcc_specs): Likewise.
	(init_spec): Likewise.
	* doc/invoke.texi: Document -static-pie.
---
 gcc/common.opt   |  9 -
 gcc/config/gnu-user.h| 15 ---
 gcc/config/i386/gnu-user.h   |  7 ---
 gcc/config/i386/gnu-user64.h | 11 ++-
 gcc/doc/invoke.texi  | 11 ++-
 gcc/gcc.c| 17 ++---
 6 files changed, 46 insertions(+), 24 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 1cb1c83d306..246566168cc 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -353,6 +353,9 @@ Common Alias(pedantic-errors)
 -pie
 Driver Alias(pie)
 
+-static-pie
+Driver Alias(static-pie)
+
 -pipe
 Driver Alias(pipe)
 
@@ -3062,7 +3065,7 @@ x
 Driver Joined Separate
 
 shared
-Driver RejectNegative Negative(pie)
+Driver RejectNegative Negative(static-pie)
 Create a shared library.
 
 shared-libgcc
@@ -3114,6 +3117,10 @@ pie
 Driver RejectNegative Negative(no-pie)
 Create a position independent executable.
 
+static-pie
+Driver RejectNegative Negative(pie)
+Create a static position independent executable.
+
 z
 Driver Joined Separate
 
diff --git a/gcc/config/gnu-user.h b/gcc/config/gnu-user.h
index de605b0c466..a967b69a350 100644
--- a/gcc/config/gnu-user.h
+++ b/gcc/config/gnu-user.h
@@ -53,11 +53,11 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
   "%{shared:; \
  pg|p|profile:gcrt1.o%s; \
  static:crt1.o%s; \
- " PIE_SPEC ":Scrt1.o%s; \
+ static-pie|" PIE_SPEC ":Scrt1.o%s; \
  :crt1.o%s} \
crti.o%s \
%{static:crtbeginT.o%s; \
- shared|" PIE_SPEC ":crtbeginS.o%s; \
+ shared|static-pie|" PIE_SPEC ":crtbeginS.o%s; \
  :crtbegin.o%s} \
%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_start_preinit.o%s; \
@@ -70,7 +70,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
  :crt1.o%s} \
crti.o%s \
%{static:crtbeginT.o%s; \
- shared|pie:crtbeginS.o%s; \
+ shared|pie|static-pie:crtbeginS.o%s; \
  :crtbegin.o%s} \
%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_start_preinit.o%s; \
@@ -92,7 +92,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
  fvtable-verify=preinit:vtv_end_preinit.o%s; \
  fvtable-verify=std:vtv_end.o%s} \
%{static:crtend.o%s; \
- shared|" PIE_SPEC ":crtendS.o%s; \
+ shared|static-pie|" PIE_SPEC ":crtendS.o%s; \
  :crtend.o%s} \
crtn.o%s \
" CRTOFFLOADEND
@@ -102,7 +102,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
  fvtable-verify=preinit:vtv_end_preinit.o%s; \
  fvtable-verify=std:vtv_end.o%s} \
%{static:crtend.o%s; \
- shared|pie:crtendS.o%s; \
+ shared|pie|static-pie:crtendS.o%s; \
  :crtend.o%s} \
crtn.o%s \

Re: PR81635: Use chrecs to help find related data refs

2017-08-24 Thread Richard Sandiford

Richard Biener  writes:
> @@ -787,14 +821,14 @@ canonicalize_base_object_address (tree a
>
>  bool
>  dr_analyze_innermost (innermost_loop_behavior *drb, tree ref,
> - struct loop *loop)
> + gimple *stmt, struct loop *loop)
>  {
>
> please amend the function comment with what STMT is about
> (DR_STMT I suppose).

I'd done that with:

--
@@ -765,7 +778,28 @@ canonicalize_base_object_address (tree a
   return build_fold_addr_expr (TREE_OPERAND (addr, 0));
 }
 
-/* Analyze the behavior of memory reference REF.  There are two modes:
[...]
+/* Analyze the behavior of memory reference REF, which occurs in STMT.
+   There are two modes:
 
- BB analysis.  In this case we simply split the address into base,
  init and offset components, without reference to any containing loop.
--

but it wasn't obvious because of the elided stuff.

> @@ -893,14 +927,14 @@ dr_analyze_innermost (innermost_loop_beh
>init = size_binop (PLUS_EXPR, init, dinit);
>base_misalignment -= TREE_INT_CST_LOW (dinit);
>
> -  split_constant_offset (offset_iv.base, _iv.base, );
> -  init = size_binop (PLUS_EXPR, init, dinit);
> -
> -  step = size_binop (PLUS_EXPR,
> -fold_convert (ssizetype, base_iv.step),
> -fold_convert (ssizetype, offset_iv.step));
> -
>base = canonicalize_base_object_address (base_iv.base);
> +  tree offset = size_binop (PLUS_EXPR,
> +   fold_convert (ssizetype, offset_iv.base),
> +   init);
>
> so you remove the split_constant_offset handling on offset_iv.base.
> This may end up no longer walking and expanding def stmts of
> SSA names contained therein.  I suppose this is fully intended
> so that re-computing the ref address using DR_BASE/DR_OFFSET
> doesn't end up expanding that redundant code?

Yeah.  I think DR_OFFSET is only going to be used by code that
wants to construct a new address, so there doesn't seem much
point taking apart the offset only to regimplify it when building
a new reference.

> For analysis relying on this one now needs to resort to
> DR_VAR/CONST_OFFSET where SCEV analysis hopefully performs similar
> expansions?

We still apply split_constant_offset to the DR_VAR_OFFSET as well;
the scev analysis applies on top of that rather than replacing it.

> Just sth to watch at ...
>
> @@ -921,12 +955,12 @@ dr_analyze_innermost (innermost_loop_beh
>  }
>
>drb->base_address = base;
> -  drb->offset = fold_convert (ssizetype, offset_iv.base);
> -  drb->init = init;
> +  drb->offset = offset;
>drb->step = step;
> +  split_constant_offset (scev, >var_offset, >const_offset);
>
> so is the full-fledged split_constant_offset (with its SSA name handling)
> still needed here?  Sth to eventually address with a followup.

Yeah, I think we still need it.  The SCEV stuff is only really useful
if the starting offset has a simple evolution in a containing loop.
For other cases we punt and just use the original generic expression.
In particular, if we're doing SLP on a single-block function, we need
everything that split_constant_offset currently does.

> @@ -1490,6 +1482,7 @@ ref_at_iteration (data_reference_p dr, i
>tree ref_op2 = NULL_TREE;
>tree new_offset;
>
> +  split_constant_offset (DR_OFFSET (dr), , );
>if (iter != 0)
>  {
>new_offset = size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter));
>
> likewise here?  Why do you think ref_at_iteration cares?  Is that because
> of codegen quality?  I'd have done with coff == size_zero_node plus
> simplifications that arise from that.

Yeah, I assumed it was codegen quality.  But maybe it was just a way to
compensate for the fact that the MEM_REF is built directly (rather than
via a fold) and so this was the easiest way of getting a nonzero second
operand to the MEM_REF.

Thanks,
Richard

[C++ PATCH] put all conv-ops onto single overload

2017-08-24 Thread Nathan Sidwell

Currently the conversion operator member functions are kept on a set of 
METHOD_VEC slots at the start of the method_vec.  These are 
distinguished by return type.


However, usually we want to grab all the conversion operators to see if 
one permits a conversion to whatever we're converting to.  Usually if 
one has a conversion to T, the function is 'Foo::operator T () const', 
it's very rare we overload on the this-quals for a given T.


This patch changes things to have a single overload for all the 
conversion operators.  add_method already had sufficient smarts to deal 
with different target types on a single overload (because, templates). 
The lookup_conversions function similarly had to deal with templates, 
which are already on a single list.  So that gets simplified too (it's 
still rather horrible, for which I can only blame myself).


They are still on a magic slot though, and I'm now using a rather ugly 
idiom of:

  vec_safe_iterate (methods, CLASSTYPE_FIRST_CONVERSION_SLOT, );
  if (fns && !DECL_CONV_FN_P (OVL_FIRST (fns)))
fns = NULL_TREE;
  for (ovl_iterator iter (fns); iter; ++iter)
...
to get at it.  That's temporary, until they get a proper name to be 
looked up by.


Applied to trunk.

nathan
--
Nathan Sidwell
2017-08-24  Nathan Sidwell  

	Conversion operators kept on single overload set
	* class.c (add_method): Keep all conv-ops on one slot.
	* name-lookup.c (lookup_conversion_operator): Pull the desired
	conv op out of overload set.
	* search.c (lookup_conversions_r): Lose template/non-template
	distinction.
	(lookup_conversions): Likewise.

Index: class.c
===
--- class.c	(revision 251316)
+++ class.c	(working copy)
@@ -1014,30 +1014,16 @@ modify_vtable_entry (tree t,
 bool
 add_method (tree type, tree method, bool via_using)
 {
-  unsigned slot;
-  bool template_conv_p = false;
-  bool conv_p;
-  vec *method_vec;
-  bool complete_p;
-  bool insert_p = false;
-  tree current_fns;
-
   if (method == error_mark_node)
 return false;
 
-  complete_p = COMPLETE_TYPE_P (type);
-  conv_p = DECL_CONV_FN_P (method);
-  if (conv_p)
-template_conv_p = (TREE_CODE (method) == TEMPLATE_DECL
-		   && DECL_TEMPLATE_CONV_FN_P (method));
+  bool complete_p = COMPLETE_TYPE_P (type);
+  bool conv_p = DECL_CONV_FN_P (method);
 
-  method_vec = CLASSTYPE_METHOD_VEC (type);
+  vec *method_vec = CLASSTYPE_METHOD_VEC (type);
   if (!method_vec)
 {
-  /* Make a new method vector.  We start with 8 entries.  We must
-	 allocate at least two (for constructors and destructors), and
-	 we're going to end up with an assignment operator at some
-	 point as well.  */
+  /* Make a new method vector.  We start with 8 entries.  */
   vec_alloc (method_vec, 8);
   CLASSTYPE_METHOD_VEC (type) = method_vec;
 }
@@ -1045,24 +1031,22 @@ add_method (tree type, tree method, bool
   /* Maintain TYPE_HAS_USER_CONSTRUCTOR, etc.  */
   grok_special_member_properties (method);
 
+  bool insert_p = true;
+  unsigned slot;
   tree m;
 
-  insert_p = true;
   /* See if we already have an entry with this name.  */
   for (slot = CLASSTYPE_FIRST_CONVERSION_SLOT;
vec_safe_iterate (method_vec, slot, );
++slot)
 {
   m = OVL_FIRST (m);
-  if (template_conv_p)
+  if (conv_p)
 	{
-	  if (TREE_CODE (m) == TEMPLATE_DECL
-	  && DECL_TEMPLATE_CONV_FN_P (m))
+	  if (DECL_CONV_FN_P (m))
 	insert_p = false;
 	  break;
 	}
-  if (conv_p && !DECL_CONV_FN_P (m))
-	break;
   if (DECL_NAME (m) == DECL_NAME (method))
 	{
 	  insert_p = false;
@@ -1073,7 +1057,8 @@ add_method (tree type, tree method, bool
 	  && DECL_NAME (m) > DECL_NAME (method))
 	break;
 }
-  current_fns = insert_p ? NULL_TREE : (*method_vec)[slot];
+  tree current_fns = insert_p ? NULL_TREE : (*method_vec)[slot];
+  gcc_assert (!DECL_EXTERN_C_P (method));
 
   /* Check to see if we've already got this method.  */
   for (ovl_iterator iter (current_fns); iter; ++iter)
@@ -1216,8 +1201,7 @@ add_method (tree type, tree method, bool
 }
 
   /* A class should never have more than one destructor.  */
-  if (current_fns && DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (method))
-return false;
+  gcc_assert (!current_fns || !DECL_DESTRUCTOR_P (method));
 
   current_fns = ovl_insert (method, current_fns, via_using);
 
Index: name-lookup.c
===
--- name-lookup.c	(revision 251319)
+++ name-lookup.c	(working copy)
@@ -1096,34 +1096,31 @@ lookup_arg_dependent (tree name, tree fn
 static tree
 lookup_conversion_operator (tree class_type, tree type)
 {
-  tree tpls = NULL_TREE;
+  tree convs = NULL_TREE;
 
   if (TYPE_HAS_CONVERSION (class_type))
 {
-  tree fns;
+  tree fns = NULL_TREE;
+  tree tpls = NULL_TREE;
   vec *methods = CLASSTYPE_METHOD_VEC (class_type);
 
-  for (int i =

Re: Make more use of df_read_modify_subreg_p

2017-08-24 Thread Richard Sandiford

Segher Boessenkool  writes:
> On Wed, Aug 23, 2017 at 11:49:03AM +0100, Richard Sandiford wrote:
>> This patch uses df_read_modify_subreg_p to check whether writing
>> to a subreg would preserve some of the existing contents.
>
> combine does not keep the DF info up-to-date -- but that is no
> problem here, df_read_modify_subreg_p uses no DF info at all.  Maybe
> it should not have "df_" in the name?

Yeah, I guess that's a bit confusing.  I've just posted a patch
to rename it.

Here's a version of the patch that applies on top of that one.
Tested as before.  OK to install?

Thanks,
Richard


2017-08-24  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* caller-save.c (mark_referenced_regs):  Use read_modify_subreg_p.
* combine.c (find_single_use_1): Likewise.
(expand_field_assignment): Likewise.
(move_deaths): Likewise.
* lra-constraints.c (simplify_operand_subreg): Likewise.
(curr_insn_transform): Likewise.
* lra.c (collect_non_operand_hard_regs): Likewise.
(add_regs_to_insn_regno_info): Likewise.
* rtlanal.c (reg_referenced_p): Likewise.
(covers_regno_no_parallel_p): Likewise.

Index: gcc/caller-save.c
===
--- gcc/caller-save.c   2017-08-24 19:14:37.901164512 +0100
+++ gcc/caller-save.c   2017-08-24 19:22:45.217100872 +0100
@@ -1033,10 +1033,7 @@ mark_referenced_regs (rtx *loc, refmarke
  /* If we're setting only part of a multi-word register,
 we shall mark it as referenced, because the words
 that are not being set should be restored.  */
- && ((GET_MODE_SIZE (GET_MODE (*loc))
-  >= GET_MODE_SIZE (GET_MODE (SUBREG_REG (*loc
- || (GET_MODE_SIZE (GET_MODE (SUBREG_REG (*loc)))
- <= UNITS_PER_WORD
+ && !read_modify_subreg_p (*loc)))
return;
 }
   if (code == MEM || code == SUBREG)
Index: gcc/combine.c
===
--- gcc/combine.c   2017-08-24 19:22:26.163269637 +0100
+++ gcc/combine.c   2017-08-24 19:22:45.218100970 +0100
@@ -579,10 +579,7 @@ find_single_use_1 (rtx dest, rtx *loc)
  && !REG_P (SET_DEST (x))
  && ! (GET_CODE (SET_DEST (x)) == SUBREG
&& REG_P (SUBREG_REG (SET_DEST (x)))
-   && (((GET_MODE_SIZE (GET_MODE (SUBREG_REG (SET_DEST (x
- + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)
-   == ((GET_MODE_SIZE (GET_MODE (SET_DEST (x)))
-+ (UNITS_PER_WORD - 1)) / UNITS_PER_WORD
+   && !read_modify_subreg_p (SET_DEST (x
break;
 
   return find_single_use_1 (dest, _SRC (x));
@@ -7275,15 +7272,12 @@ expand_field_assignment (const_rtx x)
}
}
 
-  /* A SUBREG between two modes that occupy the same numbers of words
-can be done by moving the SUBREG to the source.  */
+  /* If the destination is a subreg that overwrites the whole of the inner
+register, we can move the subreg to the source.  */
   else if (GET_CODE (SET_DEST (x)) == SUBREG
   /* We need SUBREGs to compute nonzero_bits properly.  */
   && nonzero_sign_valid
-  && (((GET_MODE_SIZE (GET_MODE (SET_DEST (x)))
-+ (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)
-  == ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (SET_DEST (x
-   + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)))
+  && !read_modify_subreg_p (SET_DEST (x)))
{
  x = gen_rtx_SET (SUBREG_REG (SET_DEST (x)),
   gen_lowpart
@@ -13821,10 +13815,7 @@ move_deaths (rtx x, rtx maybe_kill_insn,
   if (GET_CODE (dest) == ZERO_EXTRACT
  || GET_CODE (dest) == STRICT_LOW_PART
  || (GET_CODE (dest) == SUBREG
- && (((GET_MODE_SIZE (GET_MODE (dest))
-   + UNITS_PER_WORD - 1) / UNITS_PER_WORD)
- == ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (dest)))
-  + UNITS_PER_WORD - 1) / UNITS_PER_WORD
+ && !read_modify_subreg_p (dest)))
{
  move_deaths (dest, maybe_kill_insn, from_luid, to_insn, pnotes);
  return;
Index: gcc/lra-constraints.c
===
--- gcc/lra-constraints.c   2017-08-24 19:14:37.909164128 +0100
+++ gcc/lra-constraints.c   2017-08-24 19:22:45.218100970 +0100
@@ -1679,7 +1679,7 @@ simplify_operand_subreg (int nop, machin
  bitmap_set_bit (_subreg_reload_pseudos, REGNO (new_reg));
 
  insert_before = (type != OP_OUT
-  || GET_MODE_SIZE (innermode) > GET_MODE_SIZE (mode));
+

Drop df_ from df_read_modify_subreg_p

2017-08-24 Thread Richard Sandiford

...it's really a general RTL predicate, rather than something that depends
on the DF state.  Thanks to Segher for the suggestion.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2017-08-24  Richard Sandiford  

gcc/
* df.h (df_read_modify_subreg_p): Remove in favor of...
* rtl.h (read_modify_subreg_p): ...this new function.  Take a
const_rtx instead of an rtx.
* cprop.c (local_cprop_find_used_regs): Update accordingly.
* df-problems.c (df_word_lr_mark_ref): Likewise.
* ira-lives.c (mark_pseudo_reg_live): Likewise.
(mark_pseudo_reg_dead): Likewise.
(mark_ref_dead): Likewise.
* reginfo.c (init_subregs_of_mode): Likewise.
* sched-deps.c (sched_analyze_1): Likewise.
* df-scan.c (df_def_record_1): Likewise.
(df_uses_record): Likewise.
(df_read_modify_subreg_p): Remove in favor of...
* rtlanal.c (read_modify_subreg_p): ...this new function.  Take a
const_rtx instead of an rtx.

Index: gcc/df.h
===
--- gcc/df.h2017-08-24 19:20:31.055856350 +0100
+++ gcc/df.h2017-08-24 19:20:31.265856538 +0100
@@ -1080,7 +1080,6 @@ extern unsigned int df_hard_reg_used_cou
 extern bool df_regs_ever_live_p (unsigned int);
 extern void df_set_regs_ever_live (unsigned int, bool);
 extern void df_compute_regs_ever_live (bool);
-extern bool df_read_modify_subreg_p (rtx);
 extern void df_scan_verify (void);
 
 
Index: gcc/rtl.h
===
--- gcc/rtl.h   2017-08-24 19:20:31.055856350 +0100
+++ gcc/rtl.h   2017-08-24 19:20:31.266856539 +0100
@@ -2176,6 +2176,7 @@ extern unsigned int subreg_lsb_1 (machin
  unsigned int);
 extern unsigned int subreg_size_offset_from_lsb (unsigned int, unsigned int,
 unsigned int);
+extern bool read_modify_subreg_p (const_rtx);
 
 /* Return the subreg byte offset for a subreg whose outer mode is
OUTER_MODE, whose inner mode is INNER_MODE, and where there are
@@ -2800,7 +2801,7 @@ partial_subreg_p (machine_mode outermode
 }
 
 /* Likewise return true if X is a subreg that is smaller than the inner
-   register.  Use df_read_modify_subreg_p to test whether writing to such
+   register.  Use read_modify_subreg_p to test whether writing to such
a subreg preserves any part of the inner register.  */
 
 inline bool
Index: gcc/cprop.c
===
--- gcc/cprop.c 2017-08-24 19:20:31.055856350 +0100
+++ gcc/cprop.c 2017-08-24 19:20:31.264856537 +0100
@@ -1161,7 +1161,7 @@ local_cprop_find_used_regs (rtx *xptr, v
   return;
 
 case SUBREG:
-  if (df_read_modify_subreg_p (x))
+  if (read_modify_subreg_p (x))
return;
   break;
 
Index: gcc/df-problems.c
===
--- gcc/df-problems.c   2017-08-24 19:20:31.055856350 +0100
+++ gcc/df-problems.c   2017-08-24 19:20:31.264856537 +0100
@@ -2819,7 +2819,7 @@ df_word_lr_mark_ref (df_ref ref, bool is
 return true;
 
   if (GET_CODE (orig_reg) == SUBREG
-  && df_read_modify_subreg_p (orig_reg))
+  && read_modify_subreg_p (orig_reg))
 {
   gcc_assert (DF_REF_FLAGS_IS_SET (ref, DF_REF_PARTIAL));
   if (subreg_lowpart_p (orig_reg))
Index: gcc/ira-lives.c
===
--- gcc/ira-lives.c 2017-08-24 19:20:31.055856350 +0100
+++ gcc/ira-lives.c 2017-08-24 19:20:31.265856538 +0100
@@ -364,7 +364,7 @@ mark_hard_reg_live (rtx reg)
 static void
 mark_pseudo_reg_live (rtx orig_reg, unsigned regno)
 {
-  if (df_read_modify_subreg_p (orig_reg))
+  if (read_modify_subreg_p (orig_reg))
 {
   mark_pseudo_regno_subword_live (regno,
  subreg_lowpart_p (orig_reg) ? 0 : 1);
@@ -489,7 +489,7 @@ mark_hard_reg_dead (rtx reg)
 static void
 mark_pseudo_reg_dead (rtx orig_reg, unsigned regno)
 {
-  if (df_read_modify_subreg_p (orig_reg))
+  if (read_modify_subreg_p (orig_reg))
 {
   mark_pseudo_regno_subword_dead (regno,
  subreg_lowpart_p (orig_reg) ? 0 : 1);
@@ -515,7 +515,7 @@ mark_ref_dead (df_ref def)
   if (DF_REF_FLAGS_IS_SET (def, DF_REF_PARTIAL)
   && (GET_CODE (orig_reg) != SUBREG
  || REGNO (reg) < FIRST_PSEUDO_REGISTER
- || !df_read_modify_subreg_p (orig_reg)))
+ || !read_modify_subreg_p (orig_reg)))
 return;
 
   if (REGNO (reg) >= FIRST_PSEUDO_REGISTER)
Index: gcc/reginfo.c
===
--- gcc/reginfo.c   2017-08-24 19:20:31.055856350 +0100
+++ gcc/reginfo.c   2017-08-24 19:20:31.265856538 +0100
@@ -1356,7 +1356,7 @@ init_subregs_of_mode (void)
  df_ref def;
  FOR_EACH_INSN_DEF

[PATCH] [MSP430] Pass -mcode/data-region to the linker and assembler

2017-08-24 Thread Jozef Lawrynowicz


The changes made in a series of binutils patches
(https://sourceware.org/ml/binutils/2017-08/msg00274.html)
to ld and gas require the -mcode/data-region options to be propagated
from gcc.

The attached patch adds that functionality.

If the patch is acceptable, I would appreciate if someone could commit
it for me, as I do not have write access.

Thanks,
Jozef
From a1313aeefa289cdf8c7c4b28b04648eed4708716 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Thu, 24 Aug 2017 16:39:29 +
Subject: [PATCH] MSP430: Pass -mcode/data-region to the linker and assembler

2017-08-XX  Jozef Lawrynowicz   

* gcc/config/msp430/msp430.h: Pass -mcode/data-region to the linker
and -mdata-region to the assembler.
---
 gcc/config/msp430/msp430.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/msp430/msp430.h b/gcc/config/msp430/msp430.h
index 2bff249..e95106d 100644
--- a/gcc/config/msp430/msp430.h
+++ b/gcc/config/msp430/msp430.h
@@ -58,12 +58,14 @@ extern bool msp430x;
   "%{!msim:-md} %{msim:%{mlarge:-md}} " /* Copy data from ROM to RAM if 
necessary.  */ \
   "%{msilicon-errata=*:-msilicon-errata=%*} " /* Pass on -msilicon-errata.  */ 
\
   "%{msilicon-errata-warn=*:-msilicon-errata-warn=%*} " /* Pass on 
-msilicon-errata-warn.  */ \
-  "%{ffunction-sections:-gdwarf-sections} " /* If function sections are being 
created then create DWARF line number sections as well.  */
+  "%{ffunction-sections:-gdwarf-sections} " /* If function sections are being 
created then create DWARF line number sections as well.  */ \
+  "%{mdata-region=*:-mdata-region=%*} " /* Pass on -mdata-region.  */
 
 /* Enable linker section garbage collection by default, unless we
are creating a relocatable binary (gc does not work) or debugging
is enabled  (the GDB testsuite relies upon unused entities not being 
deleted).  */
-#define LINK_SPEC "%{mrelax:--relax} %{mlarge:%{!r:%{!g:--gc-sections}}}"
+#define LINK_SPEC "%{mrelax:--relax} %{mlarge:%{!r:%{!g:--gc-sections}}} " \
+  "%{mcode-region=*:--code-region=%*} %{mdata-region=*:--data-region=%*}"
 
 extern const char * msp430_select_hwmult_lib (int, const char **);
 # define EXTRA_SPEC_FUNCTIONS  \
-- 
1.8.3.1

[PATCH, testsuite]: Group together target-dependant checks

2017-08-24 Thread Uros Bizjak

Some code moved to ease searching for subroutines.

2017-08-24  Uros Bizjak  

* lib/target-supports.exp: Group together target-dependant checks.

Tested on x86_64-linux-gnu {-m32}.

Committed to mainline SVN.

Uros.
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 547fe7a2ff23..3ddc92ee2737 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1571,30 +1571,6 @@ proc check_linker_plugin_available { } {
   } "-flto -fuse-linker-plugin"]
 }
 
-# Return 1 if the target supports executing 750CL paired-single instructions, 0
-# otherwise.  Cache the result.
-
-proc check_750cl_hw_available { } {
-return [check_cached_effective_target 750cl_hw_available {
-   # If this is not the right target then we can skip the test.
-   if { ![istarget powerpc-*paired*] } {
-   expr 0
-   } else {
-   check_runtime_nocache 750cl_hw_available {
-int main()
-{
-#ifdef __MACH__
-  asm volatile ("ps_mul v0,v0,v0");
-#else
-  asm volatile ("ps_mul 0,0,0");
-#endif
-  return 0;
-}
-   } "-mpaired"
-   }
-}]
-}
-
 # Return 1 if the target OS supports running SSE executables, 0
 # otherwise.  Cache the result.
 
@@ -1635,7 +1611,7 @@ proc check_avx_os_support_available { } {
  unsigned int eax, edx;
 
  asm ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (0));
- return (eax & 6) != 6;
+ return (eax & 0x06) != 0x06;
}
} ""
}
@@ -1679,92 +1655,16 @@ proc check_sse_hw_available { } {
int main ()
{
  unsigned int eax, ebx, ecx, edx;
- if (__get_cpuid (1, , , , ))
-   return !(edx & bit_SSE);
- return 1;
-   }
-   } ""
-   }
-}]
-}
-
-# Return 1 if the target supports executing MIPS Paired-Single instructions,
-# 0 otherwise.  Cache the result.
-
-proc check_mpaired_single_hw_available { } {
-return [check_cached_effective_target mpaired_single_hw_available {
-   # If this is not the right target then we can skip the test.
-   if { !([istarget mips*-*-*]) } {
-   expr 0
-   } else {
-   check_runtime_nocache mpaired_single_hw_available {
- int main()
- {
-   asm volatile ("pll.ps $f2,$f4,$f6");
-   return 0;
- }
-   } ""
-   }
-}]
-}
-
-# Return 1 if the target supports executing Loongson vector instructions,
-# 0 otherwise.  Cache the result.
+ if (!__get_cpuid (1, , , , ))
+   return 1;
 
-proc check_mips_loongson_hw_available { } {
-return [check_cached_effective_target mips_loongson_hw_available {
-   # If this is not the right target then we can skip the test.
-   if { !([istarget mips*-*-*]) } {
-   expr 0
-   } else {
-   check_runtime_nocache mips_loongson_hw_available {
- #include 
- int main()
- {
-   asm volatile ("paddw $f2,$f4,$f6");
-   return 0;
- }
+ return !(edx & bit_SSE);
+   }
} ""
}
 }]
 }
 
-# Return 1 if the target supports executing MIPS MSA instructions, 0
-# otherwise.  Cache the result.
-
-proc check_mips_msa_hw_available { } {
-  return [check_cached_effective_target mips_msa_hw_available {
-# If this is not the right target then we can skip the test.
-if { !([istarget mips*-*-*]) } {
-  expr 0
-} else {
-  check_runtime_nocache mips_msa_hw_available {
-   #if !defined(__mips_msa)
-   #error "MSA NOT AVAIL"
-   #else
-   #if !(((__mips == 64) || (__mips == 32)) && (__mips_isa_rev >= 2))
-   #error "MSA NOT AVAIL FOR ISA REV < 2"
-   #endif
-   #if !defined(__mips_hard_float)
-   #error "MSA HARD_FLOAT REQUIRED"
-   #endif
-   #if __mips_fpr != 64
-   #error "MSA 64-bit FPR REQUIRED"
-   #endif
-   #include 
-
-   int main()
-   {
- v8i16 v = __builtin_msa_ldi_h (0);
- v[0] = 0;
- return v[0];
-   }
-   #endif
-  } "-mmsa"
-}
-  }]
-}
-
 # Return 1 if the target supports executing SSE2 instructions, 0
 # otherwise.  Cache the result.
 
@@ -1779,9 +1679,10 @@ proc check_sse2_hw_available { } {
int main ()
{
  unsigned int eax, ebx, ecx, edx;
- if (__get_cpuid (1, , , , ))
-   return !(edx & bit_SSE2);
- return 1;
+ if (!__get_cpuid (1, , , , ))
+   return 1;
+
+ return !(edx & bit_SSE2);
}
} ""
}
@@ -1802,9

Re: [PATCH] [MSP430] Read mcu data from file instead of hardcoded data

2017-08-24 Thread DJ Delorie


Back when we first designed this, I asked about including devices.csv in
the gcc sources, and... no.  It's not GPL.  So please make sure to
thoroughly test the "no devices.csv found" case.  It's fine to use it to
override the internal data, but not fine to rely on it.

>   * gcc/config/msp430/msp430.h: Don't pass -mmcu to the assembler.

Does this break compatibility with older versions of binutils?

> +/* This structure is no longer being maintained.  Up-to-date data is read 
> from
> +   a devices.csv file on the user's system.

I think occasional updates to the table are still warranted from a
philosophical and GPL point of view, although I expect the average user
to use the latest devices.csv anyway.  If gcc stops working properly for
lack of a proprietary file, that's a problem.

[rest of review later]

Re: [52/77] Use scalar_int_mode in extract/store_bit_field

2017-08-24 Thread Jeff Law

On 07/13/2017 02:56 AM, Richard Sandiford wrote:
> After a certain point, extract_bit_field and store_bit_field
> ensure that they're dealing with integer modes or BLKmode MEMs.
> This patch uses scalar_int_mode and opt_scalar_int_mode for
> those parts.
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * expmed.c (store_bit_field_using_insv): Add op0_mode and
>   value_mode arguments.  Use scalar_int_mode internally.
>   (store_bit_field_1): Rename the new integer mode from imode
>   to op0_mode and use it instead of GET_MODE (op0).  Update calls
>   to store_split_bit_field, store_bit_field_using_insv and
>   store_fixed_bit_field.
>   (store_fixed_bit_field): Add op0_mode and value_mode arguments.
>   Use scalar_int_mode internally.  Use a bit count rather than a mode
>   when calculating the largest bit size for get_best_mode.
>   Update calls to store_split_bit_field and store_fixed_bit_field_1.
>   (store_fixed_bit_field_1): Add mode and value_mode arguments.
>   Remove assertion that OP0 has a scalar integer mode.
>   (store_split_bit_field): Add op0_mode and value_mode arguments.
>   Update calls to extract_fixed_bit_field.
>   (extract_bit_field_using_extv): Add an op0_mode argument.
>   Use scalar_int_mode internally.
>   (extract_bit_field_1): Rename the new integer mode from imode to
>   op0_mode and use it instead of GET_MODE (op0).  Update calls to
>   extract_split_bit_field, extract_bit_field_using_extv and
>   extract_fixed_bit_field.
>   (extract_fixed_bit_field): Add an op0_mode argument.  Update calls
>   to extract_split_bit_field and extract_fixed_bit_field_1.
>   (extract_fixed_bit_field_1): Add a mode argument.  Remove assertion
>   that OP0 has a scalar integer mode.  Use as_a 
>   on the target mode.
>   (extract_split_bit_field): Add an op0_mode argument.  Update call
>   to extract_fixed_bit_field.
> 
OK.
jeff

Re: [51/77] Use opt_scalar_int_mode when iterating over integer modes

2017-08-24 Thread Jeff Law

On 07/13/2017 02:56 AM, Richard Sandiford wrote:
> This patch uses opt_scalar_int_mode rather than machine_mode
> when iterating over scalar_int_modes, in cases where that helps
> with future patches.  (Using machine_mode is still OK in places
> that don't really care about the mode being a scalar integer.)
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * cse.c (cse_insn): Use opt_scalar_int_mode for the mode iterator.
>   * explow.c (hard_function_value): Likewise.
>   * expmed.c (init_expmed_one_mode): Likewise.
>   (extract_fixed_bit_field_1): Likewise.  Move the convert_to_mode
>   call outside the loop.
>   * expr.c (alignment_for_piecewise_move): Use opt_scalar_int_mode
>   for the mode iterator.  Require the mode specified by max_pieces
>   to exist.
>   (emit_block_move_via_movmem): Use opt_scalar_int_mode for the
>   mode iterator.
>   (copy_blkmode_to_reg): Likewise.
>   (set_storage_via_setmem): Likewise.
>   * optabs.c (prepare_cmp_insn): Likewise.
>   * rtlanal.c (init_num_sign_bit_copies_in_rep): Likewise.
>   * stor-layout.c (finish_bitfield_representative): Likewise.
> 
> gcc/fortran/
>   * trans-types.c (gfc_init_kinds): Use opt_scalar_int_mode for
>   the mode iterator.
>
OK.
jeff

Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.

2017-08-24 Thread Martin Liška

On 08/18/2017 12:25 PM, Martin Liška wrote:
> On 08/18/2017 11:30 AM, Richard Biener wrote:
>> On Tue, Aug 15, 2017 at 2:37 PM, Martin Liška  wrote:
>>> On 08/14/2017 10:32 AM, Richard Biener wrote:
 Hmm, but the existing "lowering" part is called from the
 switch-conversion pass.  So
 I'm not sure a new file is good.
>>>
>>> Good, I'm not against having that in a single file. So new version of the 
>>> patch
>>> does that.
>>>
>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>
>>> Ready to be installed?
>>
>> Hmm, I see you duplicate add_case_node for example.  Is that just temporary?
>> If not can you please factor out the data structure and common code?
>> (case.[Ch]?)
> 
> You are right. As we'll generate just jump table in stmt.c the proper fix is 
> to remove
> all usages of 'case_node' in the file because simple iteration of labels will 
> work fine.
> Let me do it incrementally to minimize fall out :)

Hello.

So lesson learned. I should follow your recommendation and make the clean-up in 
stmt.c. I didn't
so adding new variant of case_node with a different size caused bootstrap 
failure on aarch64 and
it was quite hard to debug. So sending updated version of the patch which has 
cleaned up stmt.c.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. Same 
of aarch64-linux-gnu.

Ready to be installed?
Martin
> 
> Martin
> 
>>
>> Thanks,
>> Richard.
>>
>>> Martin
> 

>From be0ef53761acf78108b26f71970cc44c7616b22e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 31 Jul 2017 14:01:53 +0200
Subject: [PATCH] Make expansion of balanced binary trees of switches on tree
 level.

gcc/ChangeLog:

2017-07-31  Martin Liska  

	* passes.def: Include pass_lower_switch.
	* stmt.c (dump_case_nodes): Remove and move to
	tree-switch-conversion.
	(case_values_threshold): Likewise.
	(expand_switch_as_decision_tree_p): Likewise.
	(emit_case_decision_tree): Likewise.
	(expand_case): Likewise.
	(balance_case_nodes): Likewise.
	(node_has_low_bound): Likewise.
	(node_has_high_bound): Likewise.
	(node_is_bounded): Likewise.
	(emit_case_nodes): Likewise.
	(struct simple_case_node): New struct.
	(add_case_node): Remove.
	(emit_case_dispatch_table): Use vector instead of case_list.
	(reset_out_edges_aux): Remove.
	(compute_cases_per_edge): Likewise.
	(expand_case): Build list of simple_case_node.
	(expand_sjlj_dispatch_table): Use it.
	* timevar.def: Add TV_TREE_SWITCH_LOWERING.
	* tree-pass.h: Add make_pass_lower_switch.

gcc/testsuite/ChangeLog:

2017-07-31  Martin Liska  

	* gcc.dg/tree-prof/update-loopch.c: Scan patterns in
	switchlower.
	* gcc.dg/tree-ssa/vrp104.c: Likewise.
---
 gcc/passes.def |1 +
 gcc/stmt.c | 1026 +
 gcc/testsuite/gcc.dg/tree-prof/update-loopch.c |   10 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp104.c |2 +-
 gcc/timevar.def|1 +
 gcc/tree-pass.h|1 +
 gcc/tree-switch-conversion.c   | 1178 
 7 files changed, 1219 insertions(+), 1000 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index 316e19d12e3..81b6e62f602 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -394,6 +394,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_lower_vaarg);
   NEXT_PASS (pass_lower_vector);
   NEXT_PASS (pass_lower_complex_O0);
+  NEXT_PASS (pass_lower_switch);
   NEXT_PASS (pass_sancov_O0);
   NEXT_PASS (pass_asan_O0);
   NEXT_PASS (pass_tsan_O0);
diff --git a/gcc/stmt.c b/gcc/stmt.c
index 73aa9b2b809..39d29ff3da9 100644
--- a/gcc/stmt.c
+++ b/gcc/stmt.c
@@ -32,7 +32,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple.h"
 #include "cfghooks.h"
 #include "predict.h"
-#include "alloc-pool.h"
 #include "memmodel.h"
 #include "tm_p.h"
 #include "optabs.h"
@@ -63,53 +62,30 @@ along with GCC; see the file COPYING3.  If not see
as in C, the high and low limits are the same.
 
We start with a vector of case nodes sorted in ascending order, and
-   the default label as the last element in the vector.  Before expanding
-   to RTL, we transform this vector into a list linked via the RIGHT
-   fields in the case_node struct.  Nodes with higher case values are
-   later in the list.
-
-   Switch statements can be output in three forms.  A branch table is
-   used if there are more than a few labels and the labels are dense
-   within the range between the smallest and largest case value.  If a
-   branch table is used, no further manipulations are done with the case
-   node chain.
-
-   The alternative to the use of a branch table is to generate a series
-   of compare and jump insns.  When that is done, we use the LEFT, RIGHT,
-   and PARENT fields to hold a binary tree.  Initially the tree is
-   totally unbalanced, with everything

[PATCH] [MSP430] Read mcu data from file instead of hardcoded data

2017-08-24 Thread Jozef Lawrynowicz


GCC for the MSP430 target uses hard-coded data to choose the correct CPU
ISA and hardware multiply version for a given MCU. Since the data is
hard-coded, this data can only be updated by updating the compiler
itself.

The following patch changes the mechanism for processing device data to
instead look for a "devices.csv" file (provided by TI) on the include
and linker search paths specified by the user.

If the file is found then it is parsed, and the device passed to the
mmcu option is searched for. If the file or device aren't found, then
GCC uses the old hard-coded data instead.

If the patch is acceptable, I would appreciate if someone could commit
it for me, as I do not have write access.

Thanks,
Jozef
From 4ea2f4393929b70eb6e56e1d761b584f65d39248 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Thu, 24 Aug 2017 13:44:11 +
Subject: [PATCH] MSP430: Read mcu data from file instead of hardcoded data

2017-08-XX  Jozef Lawrynowicz   

* gcc/config.gcc (msp430*-*-*): Add msp430-devices.o to extra objs.
* gcc/config/msp430/driver-msp430.c (parse_devices_csv_wrapper): New
function.
(msp430_check_path_for_devices): New function.
(msp430_select_cpu): New function.
(msp430_select_hwmult_lib): Check for devices.csv data before using
hardcoded data.
* gcc/config/msp430/msp430-devices.c (find_devices_csv): New function.
(parse_devices_csv): New function.
* gcc/config/msp430/msp430-devices.h: Move definition of 
msp430_mcu_data struct
here and rename to t_msp430_mcu_data.
* gcc/config/msp430/msp430.c (msp430_mcu_name): Look for devices.csv 
before
using hardcoded data.
(msp430_option_override): Likewise.
(msp430_use_f5_series_hwmult): Likewise.
(use_32bit_hwmult): Likewise.
(msp430_no_hwmult): Likewise.
* gcc/config/msp430/msp430.h: Don't pass -mmcu to the assembler.
Call msp430_check_path_for_devices on -I and -L paths.
Select multilib by transforming -mmcu to -mcpu.
* gcc/config/msp430/msp430.opt: Add -mdevices-csv-loc= and
-mdisable-device-warnings options.
* gcc/config/msp430/t-msp430: Add rule to build msp430-devices.o.
Remove MULTILIB_MATCHES for hardcoded mcu names.

gcc/testsuite
2017-08-XX  Jozef Lawrynowicz   

* gcc.target/msp430/devices/devices.csv: Add dummy devices.csv.
* gcc.target/msp430/devices/main.c: Add dummy source file.
* gcc.target/msp430/devices/msp430_00.c: New test.
* gcc.target/msp430/devices/msp430_01.c: New test.
* gcc.target/msp430/devices/msp430_02.c: New test.
* gcc.target/msp430/devices/msp430_04.c: New test.
* gcc.target/msp430/devices/msp430_08.c: New test.
* gcc.target/msp430/devices/msp430_10.c: New test.
* gcc.target/msp430/devices/msp430_11.c: New test.
* gcc.target/msp430/devices/msp430_12.c: New test.
* gcc.target/msp430/devices/msp430_14.c: New test.
* gcc.target/msp430/devices/msp430_18.c: New test.
* gcc.target/msp430/devices/msp430_20.c: New test.
* gcc.target/msp430/devices/msp430_21.c: New test.
* gcc.target/msp430/devices/msp430_22.c: New test.
* gcc.target/msp430/devices/msp430_24.c: New test.
* gcc.target/msp430/devices/msp430_28.c: New test.
* gcc.target/msp430/devices/msp430_20_bad_cpu.c: New test.
* gcc.target/msp430/devices/msp430_20_bad_hwmult.c: New test.
* gcc.target/msp430/devices/msp430_28_bad_hwmult.c: New test.
* gcc.target/msp430/interrupt_fn_placement.c: Skip if -mcpu=msp430.
* gcc.target/msp430/msp430.exp: Search in devices directory for tests.
* gcc.target/msp430/no_devices_warn_msp430.c: New test.
* gcc.target/msp430/no_devices_warn_msp430x.c: New test.
---
 gcc/config.gcc |   3 +-
 gcc/config/msp430/driver-msp430.c  | 114 +++---
 gcc/config/msp430/msp430-devices.c | 181 
 gcc/config/msp430/msp430-devices.h |  12 ++
 gcc/config/msp430/msp430.c | 146 -
 gcc/config/msp430/msp430.h |  14 +-
 gcc/config/msp430/msp430.opt   |   9 +
 gcc/config/msp430/t-msp430 | 235 +
 .../gcc.target/msp430/devices/devices.csv  |  16 ++
 gcc/testsuite/gcc.target/msp430/devices/main.c |   5 +
 .../gcc.target/msp430/devices/msp430_00.c  |   5 +
 .../gcc.target/msp430/devices/msp430_01.c  |   5 +
 .../gcc.target/msp430/devices/msp430_02.c  |   5 +
 .../gcc.target/msp430/devices/msp430_04.c  |   5 +
 .../gcc.target/msp430/devices/msp430_08.c  |   5 +
 .../gcc.target/msp430/devices/msp430_10.c  |   5 +

Re: [PATCH] handle pathological anti-ranges in gimple_fold_builtin_memory_op (PR 81908)

2017-08-24 Thread Martin Sebor


On 08/24/2017 08:03 AM, Richard Biener wrote:

On Wed, Aug 23, 2017 at 9:42 PM, Martin Sebor  wrote:

Bug 81908 is about a -Wstringop-overflow warning for a Fortran
test triggered by a recent VRP improvement.  A simple test case
that approximates the warning is:

  void f (char *d, const char *s, size_t n)
  {
if (n > 0 && n <= SIZE_MAX / 2)
  n = 0;

memcpy (d, s, n);   // n in ~[1, SIZE_MAX / 2]
  }

Since the only valid value of n is zero the call to memcpy can
be considered a no-op (a value of n > SIZE_MAX is in excess of
the maximum size of the largest object and would surely make
the call crash).

The important difference between the test case and Bug 81908
is that in the latter, the code is emitted by GCC itself from
what appears to be correct source (looks like it's the result
of the loop distribution pass).  I believe the warning for
the test case above and for other human-written code like it
is helpful, but warning for code emitted by GCC, even if it's
dead or unreachable, is obviously not (at least not to users).

The attached patch enhances the gimple_fold_builtin_memory_op
function to eliminate this patholohgical case by making use
of range information to fold into no-ops calls to memcpy whose
size argument is in a range where the only valid value is zero.
This gets rid of the warning and improves the emitted code.

Tested on x86_64-linux.


@@ -646,6 +677,12 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
   tree destvar, srcvar;
   location_t loc = gimple_location (stmt);

+  if (tree valid_len = only_valid_value (len))
+{
+  /* LEN is in range whose only valid value is zero.  */
+  len = valid_len;
+}
+
   /* If the LEN parameter is zero, return DEST.  */
   if (integer_zerop (len))

just enhance this check to

  if (integer_zerop (len)
  || size_must_be_zero_p (len))

?  'only_valid_value' looks too generic for this.


Sure.

FWIW, the reason I did it this was because my original patch
returned error_mark_node for entirely invalid ranges and had
the caller replace the call with a trap.  I decided not to
include that part in this fix to keep it contained.



+  if (!wi::fits_uhwi_p (min) || !wi::fits_uhwi_p (max))
+return NULL_TREE;
+

why?


Only because I never remember what APIs are safe to use with
what input.


+  if (wi::eq_p (min, wone)
+  && wi::geu_p (max + 1, ssize_max))

   if (wi::eq_p (min, 1)
  && wi::gtu_p (max, wi::max_value (prec, SIGNED)))

your ssize_max isn't signed size max, and max + 1 might overflow to zero.


You're right that the addition to max would be better done
as subtraction from the result of (1 << N).  Thank you.

If (max + 1) overflowed then (max == TYPE_MAX) would have
to hold which I thought could never be true for an anti
range. (The patch includes tests for this case.)  Was I
wrong?

Attached is an updated version with the suggested changes
plus an additional test to verify the absence of warnings.

Martin
PR middle-end/81908 - FAIL: gfortran.dg/alloc_comp_auto_array_2.f90 -O3 -g -m32

gcc/ChangeLog:

	PR middle-end/81908
	* gimple-fold.c (size_must_be_zero_p): New function.
	(gimple_fold_builtin_memory_op): Call it.

gcc/testsuite/ChangeLog:

	PR middle-end/81908
	* gcc.dg/tree-ssa/builtins-folding-gimple-2.c: New test.
	* gcc.dg/tree-ssa/builtins-folding-gimple-3.c: New test.
	* gcc.dg/tree-ssa/pr81908.c: New test.

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 251446c..c4184fa 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -628,6 +628,36 @@ var_decl_component_p (tree var)
   return SSA_VAR_P (inner);
 }
 
+/* If the SIZE argument representing the size of an object is in a range
+   of values of which exactly one is valid (and that is zero), return
+   true, otherwise false.  */
+
+static bool
+size_must_be_zero_p (tree size)
+{
+  if (integer_zerop (size))
+return true;
+
+  if (TREE_CODE (size) != SSA_NAME)
+return false;
+
+  wide_int min, max;
+  enum value_range_type rtype = get_range_info (size, , );
+  if (rtype != VR_ANTI_RANGE)
+return false;
+
+  tree type = TREE_TYPE (size);
+  int prec = TYPE_PRECISION (type);
+
+  wide_int wone = wi::one (prec);
+
+  /* Compute the value of SSIZE_MAX, the largest positive value that
+ can be stored in ssize_t, the signed counterpart of size_t .  */
+  wide_int ssize_max = wi::lshift (wi::one (prec), prec - 1) - 1;
+
+  return wi::eq_p (min, wone) && wi::geu_p (max, ssize_max);
+}
+
 /* Fold function call to builtin mem{{,p}cpy,move}.  Return
false if no simplification can be made.
If ENDP is 0, return DEST (like memcpy).
@@ -646,8 +676,9 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
   tree destvar, srcvar;
   location_t loc = gimple_location (stmt);
 
-  /* If the LEN parameter is zero, return DEST.  */
-  if (integer_zerop (len))
+  /* If the LEN parameter is a constant zero or in range where
+ the only valid value is zero, return DEST.  */
+  if

Re: [PATCH PR81913]Skip niter analysis if either IV in exit condition can wrap

2017-08-24 Thread Richard Biener

On Thu, Aug 24, 2017 at 12:58 PM, Bin Cheng  wrote:
> Hi,
> I added code handle exit condition like "IV1 le/lt IV2" by changing it into 
> "IV1' le/lt INV".
> Unfortunately, wrapping behavior has subtle impact on the transformation.  
> This patch for
> now skips niter analysis if either IV1 or IV2 can wrap.  We can still handle 
> pointer case
> as reported in PR81196, but unsigned type needs more work.  The patch also 
> includes two
> XFAIL tests showing what shall be improved here.
> Bootstrap and test on AArch64.  Is it OK?

Ok.

Richard.

>
> Thanks,
> bin
> 2017-08-24  Bin Cheng  
>
> PR tree-optimization/81913
> * tree-ssa-loop-niter.c (number_of_iterations_cond): Skip niter
> analysis when either IVs in condition can wrap.
>
> gcc/testsuite
> 2017-08-24  Bin Cheng  
>
> PR tree-optimization/81913
> * gcc.c-torture/execute/pr81913.c: New test.
> * gcc.dg/tree-ssa/loop-niter-1.c: New test.
> * gcc.dg/tree-ssa/loop-niter-2.c: New test.

Re: [PATCH] handle pathological anti-ranges in gimple_fold_builtin_memory_op (PR 81908)

2017-08-24 Thread Richard Biener

On Wed, Aug 23, 2017 at 9:42 PM, Martin Sebor  wrote:
> Bug 81908 is about a -Wstringop-overflow warning for a Fortran
> test triggered by a recent VRP improvement.  A simple test case
> that approximates the warning is:
>
>   void f (char *d, const char *s, size_t n)
>   {
> if (n > 0 && n <= SIZE_MAX / 2)
>   n = 0;
>
> memcpy (d, s, n);   // n in ~[1, SIZE_MAX / 2]
>   }
>
> Since the only valid value of n is zero the call to memcpy can
> be considered a no-op (a value of n > SIZE_MAX is in excess of
> the maximum size of the largest object and would surely make
> the call crash).
>
> The important difference between the test case and Bug 81908
> is that in the latter, the code is emitted by GCC itself from
> what appears to be correct source (looks like it's the result
> of the loop distribution pass).  I believe the warning for
> the test case above and for other human-written code like it
> is helpful, but warning for code emitted by GCC, even if it's
> dead or unreachable, is obviously not (at least not to users).
>
> The attached patch enhances the gimple_fold_builtin_memory_op
> function to eliminate this patholohgical case by making use
> of range information to fold into no-ops calls to memcpy whose
> size argument is in a range where the only valid value is zero.
> This gets rid of the warning and improves the emitted code.
>
> Tested on x86_64-linux.

@@ -646,6 +677,12 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
   tree destvar, srcvar;
   location_t loc = gimple_location (stmt);

+  if (tree valid_len = only_valid_value (len))
+{
+  /* LEN is in range whose only valid value is zero.  */
+  len = valid_len;
+}
+
   /* If the LEN parameter is zero, return DEST.  */
   if (integer_zerop (len))

just enhance this check to

  if (integer_zerop (len)
  || size_must_be_zero_p (len))

?  'only_valid_value' looks too generic for this.

+  if (!wi::fits_uhwi_p (min) || !wi::fits_uhwi_p (max))
+return NULL_TREE;
+

why?

+  if (wi::eq_p (min, wone)
+  && wi::geu_p (max + 1, ssize_max))

   if (wi::eq_p (min, 1)
  && wi::gtu_p (max, wi::max_value (prec, SIGNED)))

your ssize_max isn't signed size max, and max + 1 might overflow to zero.

Richard.



> Martin

[PATCH] [MSP430] [PR78554] Prevent SUBREG from referencing a SYMBOL_REF

2017-08-24 Thread Jozef Lawrynowicz


As reported in PR78554, attempting to store an __int20 address in memory
causes an ICE due to an invalid insn. This only occurs at optimisation
levels higher than -O0 because these optimisation levels pass
-ftree-ter, which causes the compiler to try and do the store in
one instruction.
The issue in the insn is that a SUBREG references a SYMBOL_REF.

I guess the compiler gets into this situation because it assumes that
it can execute a move instruction where both src and dst are in memory,
but this isn't possible with __int20.

The attached patch prevents a instance of SUBREG being created where the
subword is a SYMBOL_REF.

If the patch is acceptable, I would appreciate if someone could commit
it for me, as I do not have write access.

Thanks,
Jozef
From 72a419c1f470254f06c59c7824ae27818a18b555 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Thu, 24 Aug 2017 12:08:36 +
Subject: [PATCH] MSP430: Prevent SUBREG from referencing a SYMBOL_REF

2017-08-XX  Jozef Lawrynowicz   

PR target/78554
* gcc/emit-rtl.c (operand_subword): Dont pass SYMBOL_REF operand to
simplify_gen_subreg.

gcc/testsuite
2017-08-XX  Jozef Lawrynowicz   

PR target/78554
* gcc.target/msp430/pr78554.c: New test.
---
 gcc/emit-rtl.c|  5 +
 gcc/testsuite/gcc.target/msp430/pr78554.c | 20 
 2 files changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/msp430/pr78554.c

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index c1438d6..c655605 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1596,6 +1596,11 @@ operand_subword (rtx op, unsigned int offset, int 
validate_address, machine_mode
   && (offset + 1) * UNITS_PER_WORD > GET_MODE_SIZE (mode))
 return const0_rtx;
 
+  /* TER can cause SYMBOL_REFs to arrive here.  Don't pass them to
+ simplify_gen_subreg.  */
+  if (SYMBOL_REF_P (op))
+return 0;
+
   /* Form a new MEM at the requested address.  */
   if (MEM_P (op))
 {
diff --git a/gcc/testsuite/gcc.target/msp430/pr78554.c 
b/gcc/testsuite/gcc.target/msp430/pr78554.c
new file mode 100644
index 000..aca98cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/pr78554.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { "*-*-*" } { "-mcpu=msp430" "-msmall" } { "" } } */
+/* { dg-options "-O1 -mlarge -mcpu=msp430x" } */
+
+unsigned char test_val = 0;
+
+typedef __int20 unsigned reg_t;
+
+struct holds_reg
+{
+  reg_t r0;
+};
+
+struct holds_reg ex = { 0 };
+
+int main (void)
+{
+  ex.r0 = (reg_t)(_val);
+  return 0;
+}
-- 
1.8.3.1

Re: RFC: Explicit move preference hints

2017-08-24 Thread Wilco Dijkstra

Vladimir Makarov wrote:
>
> As I correctly understand, you just want an intuitive allocation. The 
> current allocation performance has the same quality as the intuitive one.

Performance is affected as well but I didn't want to go into details as that
distracts from the underlying issue. But if you're interested - on AArch64 
there is an optimization phase dedicated to renaming registers to fix various
issues with FMAC. It doesn't work as well as expected since it is hard to do
a global rename after register allocation.

Given the register allocator already supports move preferencing, using that is
far simpler. A few experiments showed that move preferencing does much
better.

> In this case it would be hard for me to approve such change because it 
> gives no performance improvement but complicates machine description 
> definition which is already too complicated.

I'm not suggesting to add extra complication, just to make existing syntax
work as expected. My experiment required 2 lines of code.

> I think if it is important for some cases may be it is possible to find 
> an alternative solution without introducing a new hint.  Segher proposed 
> some solutions.  They might work.  If they don't work  we could try to 
> change heuristics in LRA, for example, to favor copy destination with 
> the first operand to improve some regularity for human eyes.

Unfortunately there is no alternative today. The allocator always prefers
copies to the first dead operand - there is nothing you can do about it
(reordering operands isn't possible in most patterns).

Wilco

Re: [Patch] PR c++/80287 backport on GCC 6 branch

2017-08-24 Thread Nathan Sidwell


On 08/24/2017 08:33 AM, Yvan Roux wrote:

Hi,

As described in the PR, gcc-6-branch is impacted by this issue.  This
patch backports the original fix from Bernd and the recently added
testcase (which does not rely on c++17 features).

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80287
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00730.html

Bootstrapped and regression tested on x86, ARM and AArch64 targets.
Ok for gcc-6-branch ?


ok


--
Nathan Sidwell

Re: PR81635: Use chrecs to help find related data refs

2017-08-24 Thread Richard Biener

On Tue, Aug 22, 2017 at 4:19 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Fri, Aug 18, 2017 at 12:30 PM, Richard Biener
>>  wrote:
>>> On Thu, Aug 17, 2017 at 2:24 PM, Bin.Cheng  wrote:
 On Thu, Aug 17, 2017 at 12:35 PM, Richard Sandiford
  wrote:
> "Bin.Cheng"  writes:
>> On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford
>>  wrote:
>>> "Bin.Cheng"  writes:
 On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
  wrote:
> "Bin.Cheng"  writes:
>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
>>  wrote:
>>> The first loop in the testcase regressed after my recent changes to
>>> dr_analyze_innermost.  Previously we would treat "i" as an iv even
>>> for bb analysis and end up with:
>>>
>>>DR_BASE_ADDRESS: p or q
>>>DR_OFFSET: 0
>>>DR_INIT: 0 or 4
>>>DR_STEP: 16
>>>
>>> We now always keep the step as 0 instead, so for an int "i" we'd 
>>> have:
>>>
>>>DR_BASE_ADDRESS: p or q
>>>DR_OFFSET: (intptr_t) i
>>>DR_INIT: 0 or 4
>>>DR_STEP: 0
>>>
>>> This is also what we'd like to have for the unsigned "i", but the
>>> problem is that strip_constant_offset thinks that the "i + 1" in
>>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
>>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
>>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
>>> longer seem to be related to the [i] ones.
>>
>> Didn't read the change in detail, so sorry if I mis-understood
>> the issue.
>> I made changes in scev to better fold type conversion by
>> various sources
>> of information, for example, vrp, niters, undefined overflow
>> behavior etc.
>> In theory these information should be available for other
>> optimizers without
>> querying scev.  For the mentioned test, vrp should compute
>> accurate range
>> information for "i" so that we can fold (intptr_t) (i + 1) it without
>> worrying
>> overflow.  Note we don't do it in generic folding because
>> (intptr_t) (i) + 1
>> could be more expensive (especially in case of (T)(i + j)), or
>> because the
>> CST part is in bigger precision after conversion.
>> But such folding is wanted in several places, e.g, IVOPTs.  To
>> provide such
>> an interface, we changed tree-affine and made it do aggressive
>> fold.  I am
>> curious if it's possible to use aff_tree to implement
>> strip_constant_offset
>> here since aggressive folding is wanted.  After all, using
>> additional chrec
>> looks like a little heavy wrto the simple test.
>
> Yeah, using aff_tree does work here when the bounds are constant.
> It doesn't look like it works for things like:
>
> double p[1000];
> double q[1000];
>
> void
> f4 (unsigned int n)
> {
>   for (unsigned int i = 0; i < n; i += 4)
> {
>   double a = q[i] + p[i];
>   double b = q[i + 1] + p[i + 1];
>   q[i] = a;
>   q[i + 1] = b;
> }
> }
>
> though, where the bounds on the global arrays guarantee that [i + 1] 
> can't
> overflow, even though "n" is unconstrained.  The patch as posted 
> handles
> this case too.
 BTW is this a missed optimization in value range analysis?  The range
 information for i should flow in a way like: array boundary -> niters
 -> scev/vrp.
 I think that's what niters/scev do in analysis.
>>>
>>> Yeah, maybe :-)  It looks like the problem is that when SLP runs,
>>> the previous VRP pass came before loop header copying, so the (single)
>>> header has to cope with n == 0 case.  Thus we get:
>> Ah, there are several passes in between vrp and pass_ch, not sure if
>> any such pass depends on vrp intensively.  I would suggestion reorder
>> the two passes, or standalone VRP interface updating information for
>> loop region after header copied?   This is a non-trivial issue that
>> needs to be fixed.  Niters analyzer rely on
>> simplify_using_initial_conditions heavily to get the same information,
>> which in my opinion should be provided by VRP.  Though this won't be

[Patch] PR c++/80287 backport on GCC 6 branch

2017-08-24 Thread Yvan Roux

Hi,

As described in the PR, gcc-6-branch is impacted by this issue.  This
patch backports the original fix from Bernd and the recently added
testcase (which does not rely on c++17 features).

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80287
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00730.html

Bootstrapped and regression tested on x86, ARM and AArch64 targets.
Ok for gcc-6-branch ?

Thanks,
Yvan

2017-08-24  Yvan Roux  

PR c++/80287 C++ crash with __attribute((may_alias))

Backport from mainline
2017-04-17  Bernd Edlinger  

PR c++/80287
* class.c (fixup_may_alias): Fix all type variants.

gcc/testsuite
2017-08-22  Yvan Roux  

PR c++/80287
* g++.dg/pr8028.C: New test.
diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 35deb1e99cd..86e7e003fe4 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -1999,12 +1999,14 @@ fixup_type_variants (tree t)
 static void
 fixup_may_alias (tree klass)
 {
-  tree t;
+  tree t, v;
 
   for (t = TYPE_POINTER_TO (klass); t; t = TYPE_NEXT_PTR_TO (t))
-TYPE_REF_CAN_ALIAS_ALL (t) = true;
+for (v = TYPE_MAIN_VARIANT (t); v; v = TYPE_NEXT_VARIANT (v))
+  TYPE_REF_CAN_ALIAS_ALL (v) = true;
   for (t = TYPE_REFERENCE_TO (klass); t; t = TYPE_NEXT_REF_TO (t))
-TYPE_REF_CAN_ALIAS_ALL (t) = true;
+for (v = TYPE_MAIN_VARIANT (t); v; v = TYPE_NEXT_VARIANT (v))
+  TYPE_REF_CAN_ALIAS_ALL (v) = true;
 }
 
 /* Early variant fixups: we apply attributes at the beginning of the class
diff --git a/gcc/testsuite/g++.dg/pr80287.C b/gcc/testsuite/g++.dg/pr80287.C
new file mode 100644
index 000..da8d3fab150
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr80287.C
@@ -0,0 +1,13 @@
+// PR c++/80287
+// { dg-do compile { target c++11 } }
+// { dg-options "-g" }
+
+struct A {
+  operator long() {}
+} __attribute__((__may_alias__));
+
+struct {
+  A ino;
+} a;
+
+char b = a.ino;

Re: [PATCH, gcc-7-branch] Backport PR80038

2017-08-24 Thread Xi Ruoyao

On 2017-08-22 10:17 +0200, Richard Biener wrote:
> 
> Ok for the gcc 7 branch.
> 
Well, I think I should say I don't have SVN write access...
-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH] Fix PR81936

2017-08-24 Thread Richard Biener


The following fixes offloading fallout from the early LTO debug patches.

Bootstrap and regtest running on x86_64-unknown-linux-gnu, at least
the compile stage is verified to work for nvptx offloading now.

Richard.

2017-08-24  Richard Biener  

PR debug/81936
* dwarf2out.c (output_die): Handle flag_generate_offload like
flag_generate_lto.
(output_comp_unit): Likewise.
(gen_array_type_die): Likewise.
(dwarf2out_early_finish): Likewise.
(note_variable_value_in_expr): Likewise.
(dwarf2out_finish): Likewise.  Adjust assert.
* cgraphunit.c (symbol_table::compile): Move setting of
flag_generate_offload earlier ...
(symbol_table::finalize_compilation_unit): ... here, before
early debug finalization.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 251326)
+++ gcc/dwarf2out.c (working copy)
@@ -,7 +,7 @@ output_die (dw_die_ref die)
   if (! die->comdat_type_p && die->die_id.die_symbol
   /* Don't output the symbol twice.  For LTO we want the label
  on the section beginning, not on the actual DIE.  */
-  && (!flag_generate_lto
+  && ((!flag_generate_lto && !flag_generate_offload)
  || die->die_tag != DW_TAG_compile_unit))
 output_die_symbol (die);
 
@@ -10450,7 +10450,7 @@ output_comp_unit (dw_die_ref die, int ou
 
   /* For LTO cross unit DIE refs we want a symbol on the start of the
  debuginfo section, not on the CU DIE.  */
-  if (flag_generate_lto && oldsym)
+  if ((flag_generate_lto || flag_generate_offload) && oldsym)
 {
   /* ???  No way to get visibility assembled without a decl.  */
   tree decl = build_decl (UNKNOWN_LOCATION, VAR_DECL,
@@ -20843,7 +20843,7 @@ gen_array_type_die (tree type, dw_die_re
add_AT_unsigned (array_die, DW_AT_byte_size, size);
   /* ???  We can't annotate types late, but for LTO we may not
 generate a location early either (gfortran.dg/save_6.f90).  */
-  else if (! (early_dwarf && flag_generate_lto)
+  else if (! (early_dwarf && (flag_generate_lto || flag_generate_offload))
   && TYPE_DOMAIN (type) != NULL_TREE
   && TYPE_MAX_VALUE (TYPE_DOMAIN (type)) != NULL_TREE)
{
@@ -29740,9 +29740,9 @@ dwarf2out_finish (const char *)
 
   gen_remaining_tmpl_value_param_die_attribute ();
 
-  if (flag_generate_lto)
+  if (flag_generate_lto || flag_generate_offload)
 {
-  gcc_assert (flag_fat_lto_objects);
+  gcc_assert (flag_fat_lto_objects || flag_generate_offload);
 
   /* Prune stuff so that dwarf2out_finish runs successfully
 for the fat part of the object.  */
@@ -30318,7 +30318,7 @@ note_variable_value_in_expr (dw_die_ref
   {
tree decl = loc->dw_loc_oprnd1.v.val_decl_ref;
dw_die_ref ref = lookup_decl_die (decl);
-   if (! ref && flag_generate_lto)
+   if (! ref && (flag_generate_lto || flag_generate_offload))
  {
/* ???  This is somewhat a hack because we do not create DIEs
   for variables not in BLOCK trees early but when generating
@@ -30529,7 +30529,7 @@ dwarf2out_early_finish (const char *file
   early_dwarf_finished = true;
 
   /* Do not generate DWARF assembler now when not producing LTO bytecode.  */
-  if (!flag_generate_lto)
+  if (!flag_generate_lto && !flag_generate_offload)
 return;
 
   /* Now as we are going to output for LTO initialize sections and labels
Index: gcc/cgraphunit.c
===
--- gcc/cgraphunit.c(revision 251326)
+++ gcc/cgraphunit.c(working copy)
@@ -2464,10 +2464,6 @@ symbol_table::compile (void)
 fprintf (stderr, "Performing interprocedural optimizations\n");
   state = IPA;
 
-  /* Offloading requires LTO infrastructure.  */
-  if (!in_lto_p && g->have_offload)
-flag_generate_offload = 1;
-
   /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE.  */
   if (flag_generate_lto || flag_generate_offload)
 lto_streamer_hooks_init ();
@@ -2614,6 +2610,10 @@ symbol_table::finalize_compilation_unit
   /* Gimplify and lower thunks.  */
   analyze_functions (/*first_time=*/false);
 
+  /* Offloading requires LTO infrastructure.  */
+  if (!in_lto_p && g->have_offload)
+flag_generate_offload = 1;
+
   if (!seen_error ())
 {
   /* Emit early debug for reachable functions, and by consequence,

Re: ping [PATCH] [MSP430] Fix PR78849: ICE on initialization of global struct containing __int20 array

2017-08-24 Thread Jozef Lawrynowicz


On 22/08/2017 11:57, Jozef Lawrynowicz wrote:

On 02/08/2017 17:45, Joseph Myers wrote:

On Wed, 2 Aug 2017, Jeff Law wrote:


I think Joseph's suggestion for looking at partial float handling may be
useful, though ia64's RFmode may be more interesting as it's not a
multiple of 8 in bitsize.  IF/KF modes on the ppc port have similar
properties.


The key issue those floating-point modes engage is the meaning of
precision.  IFmode and KFmode have precision set to 106 and 113 to
distinguish them.  But that's precision in the sense of number of 
mantissa

bits; as normally understood in GCC, precision should be the number of
significant bits, so 128 for both those modes (but having multiple
different binary floating-point modes with the same precision would
require changing how we deal with laying out long double, so the target
specifies a mode for floating-point types rather than leaving it to be
deduced from values such as LONG_DOUBLE_TYPE_SIZE).



Thanks for the advice, I'm looking into how the ppc KFmode behaves in
this situation now.

I also looked through the front end code a bit more, and the behaviour
of stor-layout.c:layout_type for RECORD_TYPE looks likes a smoking gun
to me.
For BOOLEAN_TYPE, INTEGER_TYPE, ENUMERAL_TYPE, REAL_TYPE,
FIXED_POINT_TYPE etc. layout_type sets:

TYPE_SIZE (type) = bitsize_int (GET_MODE_BITSIZE (TYPE_MODE (type)));

But for the individual field types in RECORD_TYPE, UNION_TYPE or
QUAL_UNION_TYPE, this same setting of TYPE_SIZE and friends is not
performed.
So a field in a RECORD_TYPE might be an INTEGER_TYPE, but TYPE_SIZE for
this INTEGER_TYPE will not be set as it would have been had the type not
been a field in a RECORD_TYPE.

So the fix to me looks to be to ensure that the code for the base types
(BOOLEAN_TYPE, INTEGER_TYPE etc.) from layout_type is also executed for
each type that happens to be a field in a RECORD/UNION/QUAL_UNION_TYPE.


Actually, the issue turned out to be that TYPE_SIZE is specifically set
in the first place.
When the internal data structures to handle __intN types are initialised
in tree.c, the compiler is also setting TYPE_SIZE.

For the other "standard" types, layout_type sets TYPE_SIZE. So rather
than specifically setting TYPE_SIZE in tree.c, I've removed this code so
TYPE_SIZE will get set like it does for every other type.

If the attached patch is acceptable, I would appreciate if someone could
commit it for me, as I do not have write access.

Thanks,
Jozef
From 5437c7ffa48f974c6960a1e308c4cdf0ea0a2648 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Thu, 24 Aug 2017 11:40:04 +
Subject: [PATCH] MSP430: Dont specifically set TYPE_SIZE for __intN types

2017-08-XX  Jozef Lawrynowicz   

PR target/78849
* gcc/tree.c (build_common_tree_nodes): Dont set TYPE_SIZE for __intN
types.

gcc/testsuite
2017-08-XX  Jozef Lawrynowicz   

PR target/78849
* gcc.target/msp430/msp430.exp: Remove -pedantic-errors from
DEFAULT_CFLAGS.
* gcc.target/msp430/pr78849.c: New test.
---
 gcc/testsuite/gcc.target/msp430/msp430.exp | 13 +---
 gcc/testsuite/gcc.target/msp430/pr78849.c  | 50 ++
 gcc/tree.c |  2 --
 3 files changed, 59 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/msp430/pr78849.c

diff --git a/gcc/testsuite/gcc.target/msp430/msp430.exp 
b/gcc/testsuite/gcc.target/msp430/msp430.exp
index e54d531..3be8711 100644
--- a/gcc/testsuite/gcc.target/msp430/msp430.exp
+++ b/gcc/testsuite/gcc.target/msp430/msp430.exp
@@ -24,10 +24,15 @@ if { ![istarget msp430-*-*] } then {
 # Load support procs.
 load_lib gcc-dg.exp
 
-# If a testcase doesn't have special options, use these.
+# The '-pedantic-errors' option in the global variable DEFAULT_CFLAGS that is
+# set by other drivers causes an error when the __int20 type is used, so remove
+# this option from DEFAULT_CFLAGS for the msp430 tests.
 global DEFAULT_CFLAGS
-if ![info exists DEFAULT_CFLAGS] then {
-set DEFAULT_CFLAGS ""
+if [info exists DEFAULT_CFLAGS] then {
+set MSP430_DEFAULT_CFLAGS \
+  [ string map { "-pedantic-errors" "" } $DEFAULT_CFLAGS ]
+} else {
+   set MSP430_DEFAULT_CFLAGS ""
 }
 
 # Initialize `dg'.
@@ -35,7 +40,7 @@ dg-init
 
 # Main loop.
 dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \
-   "" $DEFAULT_CFLAGS
+   "" $MSP430_DEFAULT_CFLAGS
 
 # All done.
 dg-finish
diff --git a/gcc/testsuite/gcc.target/msp430/pr78849.c 
b/gcc/testsuite/gcc.target/msp430/pr78849.c
new file mode 100644
index 000..f70f0bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/pr78849.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-final { scan-assembler ".size.*instance.*52" } } */
+
+struct t_inner
+{
+  __int20 a;
+  char val1;
+  __int20 b[3];
+  char val2;
+};
+
+struct t_full
+{
+  __int20 array[2];
+  char val1;

Re: [PATCH] Allow fpmath differences in inlining when callee doesn't use FP math

2017-08-24 Thread Uros Bizjak

On Thu, Aug 24, 2017 at 1:34 PM, Richard Biener  wrote:
> On Thu, 24 Aug 2017, Uros Bizjak wrote:
>
>> On Thu, Aug 24, 2017 at 1:10 PM, Richard Biener  wrote:
>> >
>> > This adjusts the x86 backend to allow -mfpmath differences when
>> > deciding whether to allow inlining.  -mfpmath doesn't really
>> > matter for functions not containing FP operations.
>> >
>> > It appears that the can_inline_p target hook is called from the
>> > C++ FE for multi-versioning, thus the ! ipa_fn_summaries check.
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk?
>> >
>> > Incidentically this fixes the bootstrap issue I have when enabling
>> > free-lang-data without LTO.
>> >
>> > It doesn't fully solve the fpmath difference issue we have with
>> > intrinsics but is an optimization.
>> >
>> > Thanks,
>> > Richard.
>> >
>> > 2017-08-24  Richard Biener  
>> >
>> > * config/i386/i386.c: Include symbol-summary.h, ipa-prop.h
>> > and ipa-fnsummary.h.
>> > (ix86_can_inline_p): When ix86_fpmath flags do not match
>> > check whether the callee uses FP math at all.
>>
>> LGTM for branches, but for trunk, I'd still like to remove fpmath
>
> Hmm, it should be indeed safe but fp_expression is only available
> on the GCC 7 branch.  Might be the only change that doesn't have
> the chance to reject code that we previously accepted...
>
>> processing from outside ix86_option_override_internal. As mentioned in
>> the comment in i_o_o_i, blindly switching fpmath for TARGET_SSE can
>> violate ABI for some environments.
>
> Yeah.  The following has also passed bootstrap & regtest on
> x86_64-unknown-linux-gnu.
>
> Ok for trunk?

Yes, x86 part is OK.

Thanks,
Uros.

> Thanks,
> Richard.
>
> 2017-08-23  Richard Biener  
>
> PR target/81921
> * targhooks.c (default_target_can_inline_p): Properly
> use target_option_default_node when no DECL_FUNCTION_SPECIFIC_TARGET
> is present and always compare.
> * config/i386/i386.c (ix86_valid_target_attribute_tree): Do not
> imply -mfpmath=sse from TARGET_SSE_P.
> (ix86_can_inline_p): Properly use target_option_default_node when
> no DECL_FUNCTION_SPECIFIC_TARGET is present and always compare.
>
> * gcc.target/i386/pr81921.c: New testcase.
>
> Index: gcc/config/i386/i386.c
> ===
> --- gcc/config/i386/i386.c  (revision 251270)
> +++ gcc/config/i386/i386.c  (working copy)
> @@ -7398,16 +7398,6 @@ ix86_valid_target_attribute_tree (tree a
>/* If fpmath= is not set, and we now have sse2 on 32-bit, use it.  */
>if (enum_opts_set.x_ix86_fpmath)
> opts_set->x_ix86_fpmath = (enum fpmath_unit) 1;
> -  else if (!TARGET_64BIT_P (opts->x_ix86_isa_flags)
> -  && TARGET_SSE_P (opts->x_ix86_isa_flags))
> -   {
> - if (TARGET_80387_P (opts->x_target_flags))
> -   opts->x_ix86_fpmath = (enum fpmath_unit) (FPMATH_SSE
> - | FPMATH_387);
> - else
> -   opts->x_ix86_fpmath = (enum fpmath_unit) FPMATH_SSE;
> - opts_set->x_ix86_fpmath = (enum fpmath_unit) 1;
> -   }
>
>/* Do any overrides, such as arch=xxx, or tune=xxx support.  */
>bool r = ix86_option_override_internal (false, opts, opts_set);
> @@ -7502,53 +7492,47 @@ ix86_valid_target_attribute_p (tree fnde
>  static bool
>  ix86_can_inline_p (tree caller, tree callee)
>  {
> -  bool ret = false;
>tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
>tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
> -
> -  /* If callee has no option attributes, then it is ok to inline.  */
>if (!callee_tree)
> -ret = true;
> +callee_tree = target_option_default_node;
> +  if (!caller_tree)
> +caller_tree = target_option_default_node;
> +  if (callee_tree == caller_tree)
> +return true;
> +
> +  struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
> +  struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
> +  bool ret = false;
>
> -  /* If caller has no option attributes, but callee does then it is not ok to
> - inline.  */
> -  else if (!caller_tree)
> +  /* Callee's isa options should be a subset of the caller's, i.e. a SSE4
> + function can inline a SSE2 function but a SSE2 function can't inline
> + a SSE4 function.  */
> +  if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
> +   != callee_opts->x_ix86_isa_flags)
> +  || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
> + != callee_opts->x_ix86_isa_flags2))
>  ret = false;
>
> -  else
> -{
> -  struct cl_target_option *caller_opts = TREE_TARGET_OPTION 
> (caller_tree);
> -  struct cl_target_option *callee_opts = TREE_TARGET_OPTION 
> (callee_tree);
> +

Re: [PATCH] Allow fpmath differences in inlining when callee doesn't use FP math

2017-08-24 Thread Richard Biener

On Thu, 24 Aug 2017, Uros Bizjak wrote:

> On Thu, Aug 24, 2017 at 1:10 PM, Richard Biener  wrote:
> >
> > This adjusts the x86 backend to allow -mfpmath differences when
> > deciding whether to allow inlining.  -mfpmath doesn't really
> > matter for functions not containing FP operations.
> >
> > It appears that the can_inline_p target hook is called from the
> > C++ FE for multi-versioning, thus the ! ipa_fn_summaries check.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk?
> >
> > Incidentically this fixes the bootstrap issue I have when enabling
> > free-lang-data without LTO.
> >
> > It doesn't fully solve the fpmath difference issue we have with
> > intrinsics but is an optimization.
> >
> > Thanks,
> > Richard.
> >
> > 2017-08-24  Richard Biener  
> >
> > * config/i386/i386.c: Include symbol-summary.h, ipa-prop.h
> > and ipa-fnsummary.h.
> > (ix86_can_inline_p): When ix86_fpmath flags do not match
> > check whether the callee uses FP math at all.
> 
> LGTM for branches, but for trunk, I'd still like to remove fpmath

Hmm, it should be indeed safe but fp_expression is only available
on the GCC 7 branch.  Might be the only change that doesn't have
the chance to reject code that we previously accepted...

> processing from outside ix86_option_override_internal. As mentioned in
> the comment in i_o_o_i, blindly switching fpmath for TARGET_SSE can
> violate ABI for some environments.

Yeah.  The following has also passed bootstrap & regtest on 
x86_64-unknown-linux-gnu.

Ok for trunk?

Thanks,
Richard.

2017-08-23  Richard Biener  

PR target/81921
* targhooks.c (default_target_can_inline_p): Properly
use target_option_default_node when no DECL_FUNCTION_SPECIFIC_TARGET
is present and always compare.
* config/i386/i386.c (ix86_valid_target_attribute_tree): Do not
imply -mfpmath=sse from TARGET_SSE_P.
(ix86_can_inline_p): Properly use target_option_default_node when
no DECL_FUNCTION_SPECIFIC_TARGET is present and always compare.

* gcc.target/i386/pr81921.c: New testcase.

Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 251270)
+++ gcc/config/i386/i386.c  (working copy)
@@ -7398,16 +7398,6 @@ ix86_valid_target_attribute_tree (tree a
   /* If fpmath= is not set, and we now have sse2 on 32-bit, use it.  */
   if (enum_opts_set.x_ix86_fpmath)
opts_set->x_ix86_fpmath = (enum fpmath_unit) 1;
-  else if (!TARGET_64BIT_P (opts->x_ix86_isa_flags)
-  && TARGET_SSE_P (opts->x_ix86_isa_flags))
-   {
- if (TARGET_80387_P (opts->x_target_flags))
-   opts->x_ix86_fpmath = (enum fpmath_unit) (FPMATH_SSE
- | FPMATH_387);
- else
-   opts->x_ix86_fpmath = (enum fpmath_unit) FPMATH_SSE;
- opts_set->x_ix86_fpmath = (enum fpmath_unit) 1;
-   }
 
   /* Do any overrides, such as arch=xxx, or tune=xxx support.  */
   bool r = ix86_option_override_internal (false, opts, opts_set);
@@ -7502,53 +7492,47 @@ ix86_valid_target_attribute_p (tree fnde
 static bool
 ix86_can_inline_p (tree caller, tree callee)
 {
-  bool ret = false;
   tree caller_tree = DECL_FUNCTION_SPECIFIC_TARGET (caller);
   tree callee_tree = DECL_FUNCTION_SPECIFIC_TARGET (callee);
-
-  /* If callee has no option attributes, then it is ok to inline.  */
   if (!callee_tree)
-ret = true;
+callee_tree = target_option_default_node;
+  if (!caller_tree)
+caller_tree = target_option_default_node;
+  if (callee_tree == caller_tree)
+return true;
+
+  struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
+  struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
+  bool ret = false;
 
-  /* If caller has no option attributes, but callee does then it is not ok to
- inline.  */
-  else if (!caller_tree)
+  /* Callee's isa options should be a subset of the caller's, i.e. a SSE4
+ function can inline a SSE2 function but a SSE2 function can't inline
+ a SSE4 function.  */
+  if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
+   != callee_opts->x_ix86_isa_flags)
+  || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
+ != callee_opts->x_ix86_isa_flags2))
 ret = false;
 
-  else
-{
-  struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
-  struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
+  /* See if we have the same non-isa options.  */
+  else if (caller_opts->x_target_flags != callee_opts->x_target_flags)
+ret = false;
 
-  /* Callee's isa options should be a subset of the caller's, i.e. a SSE4
-function can inline a SSE2 function but a SSE2 function can't inline
-

Re: [PATCH] istream_iterator: unexpected read in ctor

2017-08-24 Thread Petr Ovtchenkov

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81964

On Thu, 24 Aug 2017 11:55:58 +0300
Petr Ovtchenkov  wrote:

> istream_iterator do unexpected read from stream
> when initialized by istream&.
> 
> It is not required from increment operators of istream_iterator
> that _M_ok will be true as precondition.
> ---
>  libstdc++-v3/include/bits/stream_iterator.h| 19 +-
>  .../24_iterators/istream_iterator/81964.cc | 42 
> ++
>  2 files changed, 50 insertions(+), 11 deletions(-)
>  create mode 100644 
> libstdc++-v3/testsuite/24_iterators/istream_iterator/81964.cc
> 
> diff --git a/libstdc++-v3/include/bits/stream_iterator.h
> b/libstdc++-v3/include/bits/stream_iterator.h index f9c6ba6..26959ce 100644
> --- a/libstdc++-v3/include/bits/stream_iterator.h
> +++ b/libstdc++-v3/include/bits/stream_iterator.h
> @@ -56,8 +56,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  
>  private:
>istream_type*  _M_stream;
> -  _Tp_M_value;
> -  bool   _M_ok;
> +  mutable _Tp_M_value;
> +  mutable bool   _M_ok;
>  
>  public:
>///  Construct end of input stream iterator.
> @@ -66,8 +66,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  
>///  Construct start of input stream iterator.
>istream_iterator(istream_type& __s)
> -  : _M_stream(&__s)
> -  { _M_read(); }
> +  : _M_stream(&__s), _M_value(), _M_ok(false)
> +  { }
>  
>istream_iterator(const istream_iterator& __obj)
>: _M_stream(__obj._M_stream), _M_value(__obj._M_value),
> @@ -77,6 +77,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>const _Tp&
>operator*() const
>{
> + if (!_M_ok) {
> +   _M_read();
> + }
>   __glibcxx_requires_cond(_M_ok,
>   _M_message(__gnu_debug::__msg_deref_istream)
>   ._M_iterator(*this));
> @@ -89,9 +92,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>istream_iterator&
>operator++()
>{
> - __glibcxx_requires_cond(_M_ok,
> - _M_message(__gnu_debug::__msg_inc_istream)
> - ._M_iterator(*this));
>   _M_read();
>   return *this;
>}
> @@ -99,9 +99,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>istream_iterator
>operator++(int)
>{
> - __glibcxx_requires_cond(_M_ok,
> - _M_message(__gnu_debug::__msg_inc_istream)
> - ._M_iterator(*this));
>   istream_iterator __tmp = *this;
>   _M_read();
>   return __tmp;
> @@ -113,7 +110,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  
>  private:
>void
> -  _M_read()
> +  _M_read() const
>{
>   _M_ok = (_M_stream && *_M_stream) ? true : false;
>   if (_M_ok)
> diff --git a/libstdc++-v3/testsuite/24_iterators/istream_iterator/81964.cc
> b/libstdc++-v3/testsuite/24_iterators/istream_iterator/81964.cc new file mode 
> 100644
> index 000..f58fc87
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/24_iterators/istream_iterator/81964.cc
> @@ -0,0 +1,42 @@
> +// { dg-options "-std=gnu++11" }
> +
> +// Copyright (C) 2017 Free Software Foundation, Inc.
> +//
> +// This file is part of the GNU ISO C++ Library.  This library is free
> +// software; you can redistribute it and/or modify it under the
> +// terms of the GNU General Public License as published by the
> +// Free Software Foundation; either version 3, or (at your option)
> +// any later version.
> +
> +// This library is distributed in the hope that it will be useful,
> +// but WITHOUT ANY WARRANTY; without even the implied warranty of
> +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +// GNU General Public License for more details.
> +
> +// You should have received a copy of the GNU General Public License along
> +// with this library; see the file COPYING3.  If not see
> +// .
> +
> +#include 
> +#include 
> +#include 
> +
> +// libstdc++/81964
> +void test0x()
> +{
> +  using namespace std;
> +  bool test __attribute__((unused)) = true;
> +
> +  std::istringstream s("1 2");
> +  std::istream_iterator ii1(s);
> +  std::istream_iterator ii2(s);
> +
> +  VERIFY( *ii2 == 1 );
> +  VERIFY( *ii1 == 2 );
> +}
> +
> +int main()
> +{
> +  test0x();
> +  return 0;
> +}
> -- 
> 2.10.1
>

[PATCH] streambuf_iterator: avoid debug-dependent behaviour

2017-08-24 Thread Petr Ovtchenkov

Explicit do sgetc from associated streambuf. Avoid debug-dependent
sgetc (within _M_at_eof()):

   __glibcxx_requires_cond(!_M_at_eof(),
   _M_message(__gnu_debug::__msg_inc_istreambuf)
   ._M_iterator(*this));

Increment operators not require not-eof precoditions.

Don't unlink associated streambuf if eof detected (in _M_get).

Clean logic in postfix increment operator.
---
 libstdc++-v3/include/bits/streambuf_iterator.h | 24 +---
 1 file changed, 5 insertions(+), 19 deletions(-)

diff --git a/libstdc++-v3/include/bits/streambuf_iterator.h 
b/libstdc++-v3/include/bits/streambuf_iterator.h
index 2230e94..cff3b69 100644
--- a/libstdc++-v3/include/bits/streambuf_iterator.h
+++ b/libstdc++-v3/include/bits/streambuf_iterator.h
@@ -136,9 +136,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   istreambuf_iterator&
   operator++()
   {
-   __glibcxx_requires_cond(!_M_at_eof(),
-   _M_message(__gnu_debug::__msg_inc_istreambuf)
-   ._M_iterator(*this));
if (_M_sbuf)
  {
_M_sbuf->sbumpc();
@@ -151,14 +148,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   istreambuf_iterator
   operator++(int)
   {
-   __glibcxx_requires_cond(!_M_at_eof(),
-   _M_message(__gnu_debug::__msg_inc_istreambuf)
-   ._M_iterator(*this));
+_M_get();
 
istreambuf_iterator __old = *this;
if (_M_sbuf)
  {
-   __old._M_c = _M_sbuf->sbumpc();
+   _M_sbuf->sbumpc();
_M_c = traits_type::eof();
  }
return __old;
@@ -177,18 +172,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _M_get() const
   {
const int_type __eof = traits_type::eof();
-   int_type __ret = __eof;
-   if (_M_sbuf)
- {
-   if (!traits_type::eq_int_type(_M_c, __eof))
- __ret = _M_c;
-   else if (!traits_type::eq_int_type((__ret = _M_sbuf->sgetc()),
-  __eof))
- _M_c = __ret;
-   else
- _M_sbuf = 0;
- }
-   return __ret;
+   if (_M_sbuf && traits_type::eq_int_type(_M_c, __eof))
+  _M_c = _M_sbuf->sgetc();
+   return _M_c;
   }
 
   bool
-- 
2.10.1

[PATCH] istream_iterator: unexpected read in ctor

2017-08-24 Thread Petr Ovtchenkov

istream_iterator do unexpected read from stream
when initialized by istream&.

It is not required from increment operators of istream_iterator
that _M_ok will be true as precondition.
---
 libstdc++-v3/include/bits/stream_iterator.h| 19 +-
 .../24_iterators/istream_iterator/81964.cc | 42 ++
 2 files changed, 50 insertions(+), 11 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/24_iterators/istream_iterator/81964.cc

diff --git a/libstdc++-v3/include/bits/stream_iterator.h 
b/libstdc++-v3/include/bits/stream_iterator.h
index f9c6ba6..26959ce 100644
--- a/libstdc++-v3/include/bits/stream_iterator.h
+++ b/libstdc++-v3/include/bits/stream_iterator.h
@@ -56,8 +56,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 private:
   istream_type*_M_stream;
-  _Tp  _M_value;
-  bool _M_ok;
+  mutable _Tp  _M_value;
+  mutable bool _M_ok;
 
 public:
   ///  Construct end of input stream iterator.
@@ -66,8 +66,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   ///  Construct start of input stream iterator.
   istream_iterator(istream_type& __s)
-  : _M_stream(&__s)
-  { _M_read(); }
+  : _M_stream(&__s), _M_value(), _M_ok(false)
+  { }
 
   istream_iterator(const istream_iterator& __obj)
   : _M_stream(__obj._M_stream), _M_value(__obj._M_value),
@@ -77,6 +77,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   const _Tp&
   operator*() const
   {
+   if (!_M_ok) {
+ _M_read();
+   }
__glibcxx_requires_cond(_M_ok,
_M_message(__gnu_debug::__msg_deref_istream)
._M_iterator(*this));
@@ -89,9 +92,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   istream_iterator&
   operator++()
   {
-   __glibcxx_requires_cond(_M_ok,
-   _M_message(__gnu_debug::__msg_inc_istream)
-   ._M_iterator(*this));
_M_read();
return *this;
   }
@@ -99,9 +99,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   istream_iterator
   operator++(int)
   {
-   __glibcxx_requires_cond(_M_ok,
-   _M_message(__gnu_debug::__msg_inc_istream)
-   ._M_iterator(*this));
istream_iterator __tmp = *this;
_M_read();
return __tmp;
@@ -113,7 +110,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 private:
   void
-  _M_read()
+  _M_read() const
   {
_M_ok = (_M_stream && *_M_stream) ? true : false;
if (_M_ok)
diff --git a/libstdc++-v3/testsuite/24_iterators/istream_iterator/81964.cc 
b/libstdc++-v3/testsuite/24_iterators/istream_iterator/81964.cc
new file mode 100644
index 000..f58fc87
--- /dev/null
+++ b/libstdc++-v3/testsuite/24_iterators/istream_iterator/81964.cc
@@ -0,0 +1,42 @@
+// { dg-options "-std=gnu++11" }
+
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+#include 
+#include 
+#include 
+
+// libstdc++/81964
+void test0x()
+{
+  using namespace std;
+  bool test __attribute__((unused)) = true;
+
+  std::istringstream s("1 2");
+  std::istream_iterator ii1(s);
+  std::istream_iterator ii2(s);
+
+  VERIFY( *ii2 == 1 );
+  VERIFY( *ii1 == 2 );
+}
+
+int main()
+{
+  test0x();
+  return 0;
+}
-- 
2.10.1

[PATCH] copy_n for input iterator, avoid in-loop jumps

2017-08-24 Thread Petr Ovtchenkov

Reword loop in copy_n specialization for input iterator.
Avoid condition check and jumps within loop.
Pay attention that input iterator incremented n - 1 times,
while output iterator incremented n times.

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50119 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81857
---
 libstdc++-v3/include/bits/stl_algo.h | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h
index c2ac031..bf043cd 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -757,17 +757,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _OutputIterator __result, input_iterator_tag)
 {
   if (__n > 0)
-   {
- while (true)
-   {
- *__result = *__first;
- ++__result;
- if (--__n > 0)
-   ++__first;
- else
-   break;
-   }
-   }
+{
+  *__result = *__first;
+  ++__result;
+  --__n;
+}
+  for (; __n > 0; --__n)
+{
+  ++__first;
+  *__result = *__first;
+  ++__result;
+}
   return __result;
 }
 
-- 
2.10.1

[PATCH] [MSP430] [PR80993] Prevent lto removing interrupt handlers

2017-08-24 Thread Jozef Lawrynowicz


As reported in PR80993, enabling lto causes interrupt handlers to be
removed. This patch marks interrupt handlers as used, preventing them
from being optimized out.

If the attached patch is acceptable, I would appreciate if someone could
commit it for me, as I do not have write access.

Thanks,
Jozef
From 44b14ffae0e8956f40cd5356a9f00f5d055709c7 Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Thu, 24 Aug 2017 11:10:24 +
Subject: [PATCH] MSP430: Prevent lto removing interrupt handlers

2017-08-XX  Jozef Lawrynowicz   

PR target/80993
* gcc/config/msp430/msp430.c (msp430_attr): Mark interrupt handlers as
used.

gcc/testsuite
2017-08-XX  Jozef Lawrynowicz   

PR target/80993
* gcc.target/msp430/pr80993.c: New test.
---
 gcc/config/msp430/msp430.c|  4 
 gcc/testsuite/gcc.target/msp430/pr80993.c | 13 +
 2 files changed, 17 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/msp430/pr80993.c

diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c
index 2b4427d..48fc12d 100644
--- a/gcc/config/msp430/msp430.c
+++ b/gcc/config/msp430/msp430.c
@@ -1904,6 +1904,10 @@ msp430_attr (tree * node,
 
   if (! TREE_PUBLIC (* node))
message = "interrupt handlers cannot be static";
+
+  /* Ensure interrupt handlers never get optimised out.  */
+  TREE_USED (* node) = 1;
+  DECL_PRESERVE_P (* node) = 1;
 }
   else if (TREE_NAME_EQ (name, ATTR_REENT))
 {
diff --git a/gcc/testsuite/gcc.target/msp430/pr80993.c 
b/gcc/testsuite/gcc.target/msp430/pr80993.c
new file mode 100644
index 000..4da5cf9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/pr80993.c
@@ -0,0 +1,13 @@
+/* { dg-do link } */
+/* { dg-options "--save-temps -msim -flto -Os" } */
+/* { dg-final { scan-file "pr80993.exe.ltrans0.s" no_ref_handler } } */
+
+void __attribute__((interrupt)) no_ref_handler (void)
+{
+  while (1);
+}
+
+int main (void)
+{
+  return 0;
+}
-- 
1.8.3.1

Re: RFC: Explicit move preference hints

2017-08-24 Thread Wilco Dijkstra

Segher Boessenkool wrote:
>
> "0,r" might work, or "0,?r", or similar (alternatives have commas
> between them).

No, it doesn't work at all. But that is no surprise if you look at 
ira_get_dup_out_num.
It iterates over the constraint string and if you have anything that matches 
after a "0",
the "0" constraint is simply ignored. So "0" works only if it is the only 
constraint,
and any combination with an unconstrained register makes no difference
irrespectively whether it is "0r" or "0, r" or "0, ?!^$r".

So that is why I'd like to fix this so it actually works like you'd expect.

Wilco

Re: [PATCH] Allow fpmath differences in inlining when callee doesn't use FP math

2017-08-24 Thread Uros Bizjak

On Thu, Aug 24, 2017 at 1:10 PM, Richard Biener  wrote:
>
> This adjusts the x86 backend to allow -mfpmath differences when
> deciding whether to allow inlining.  -mfpmath doesn't really
> matter for functions not containing FP operations.
>
> It appears that the can_inline_p target hook is called from the
> C++ FE for multi-versioning, thus the ! ipa_fn_summaries check.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk?
>
> Incidentically this fixes the bootstrap issue I have when enabling
> free-lang-data without LTO.
>
> It doesn't fully solve the fpmath difference issue we have with
> intrinsics but is an optimization.
>
> Thanks,
> Richard.
>
> 2017-08-24  Richard Biener  
>
> * config/i386/i386.c: Include symbol-summary.h, ipa-prop.h
> and ipa-fnsummary.h.
> (ix86_can_inline_p): When ix86_fpmath flags do not match
> check whether the callee uses FP math at all.

LGTM for branches, but for trunk, I'd still like to remove fpmath
processing from outside ix86_option_override_internal. As mentioned in
the comment in i_o_o_i, blindly switching fpmath for TARGET_SSE can
violate ABI for some environments.

Uros.

> Index: gcc/config/i386/i386.c
> ===
> --- gcc/config/i386/i386.c  (revision 251307)
> +++ gcc/config/i386/i386.c  (working copy)
> @@ -85,6 +85,9 @@ along with GCC; see the file COPYING3.
>  #include "print-rtl.h"
>  #include "intl.h"
>  #include "ifcvt.h"
> +#include "symbol-summary.h"
> +#include "ipa-prop.h"
> +#include "ipa-fnsummary.h"
>
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -7544,7 +7547,14 @@ ix86_can_inline_p (tree caller, tree cal
>else if (caller_opts->tune != callee_opts->tune)
> ret = false;
>
> -  else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath)
> +  else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
> +  /* If the calle doesn't use FP expressions differences in
> + ix86_fpmath can be ignored.  We are called from FEs
> + for multi-versioning call optimization, so beware of
> + ipa_fn_summaries not available.  */
> +  && (! ipa_fn_summaries
> +  || ipa_fn_summaries->get
> +   (cgraph_node::get (callee))->fp_expressions))
> ret = false;
>
>else if (caller_opts->branch_cost != callee_opts->branch_cost)

[PATCH] Allow fpmath differences in inlining when callee doesn't use FP math

2017-08-24 Thread Richard Biener


This adjusts the x86 backend to allow -mfpmath differences when
deciding whether to allow inlining.  -mfpmath doesn't really
matter for functions not containing FP operations.

It appears that the can_inline_p target hook is called from the
C++ FE for multi-versioning, thus the ! ipa_fn_summaries check.

Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk?

Incidentically this fixes the bootstrap issue I have when enabling
free-lang-data without LTO.

It doesn't fully solve the fpmath difference issue we have with
intrinsics but is an optimization.

Thanks,
Richard.

2017-08-24  Richard Biener  

* config/i386/i386.c: Include symbol-summary.h, ipa-prop.h
and ipa-fnsummary.h.
(ix86_can_inline_p): When ix86_fpmath flags do not match
check whether the callee uses FP math at all.

Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 251307)
+++ gcc/config/i386/i386.c  (working copy)
@@ -85,6 +85,9 @@ along with GCC; see the file COPYING3.
 #include "print-rtl.h"
 #include "intl.h"
 #include "ifcvt.h"
+#include "symbol-summary.h"
+#include "ipa-prop.h"
+#include "ipa-fnsummary.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -7544,7 +7547,14 @@ ix86_can_inline_p (tree caller, tree cal
   else if (caller_opts->tune != callee_opts->tune)
ret = false;
 
-  else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath)
+  else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
+  /* If the calle doesn't use FP expressions differences in
+ ix86_fpmath can be ignored.  We are called from FEs
+ for multi-versioning call optimization, so beware of
+ ipa_fn_summaries not available.  */
+  && (! ipa_fn_summaries
+  || ipa_fn_summaries->get
+   (cgraph_node::get (callee))->fp_expressions))
ret = false;
 
   else if (caller_opts->branch_cost != callee_opts->branch_cost)

Re: [PATCH] C: fix logic within c_expr::get_location

2017-08-24 Thread Marek Polacek

On Wed, Aug 23, 2017 at 02:08:47PM -0400, David Malcolm wrote:
> In r251239 I added a c_expr::get_location method for use by
> c_parser_expr_list for building the vec for
> an expression list, rather than using the location of the first token.
> 
> When determining whether to use the location within the tree node,
> or fall back to the range in the c_expr, I used EXPR_CAN_HAVE_LOCATION,
> rather than EXPR_HAS_LOCATION.  This meant that any tree nodes of kinds
> that *can* have a location but which erroneously had
>EXPR_LOCATION (value) == UNKNOWN_LOCATION
> had that value added to the vec, leading to missing
> location information when reporting on the issue
> (seen with gcc.dg/Wtraditional-conversion-2.c for m68k).
> 
> This patch addresses this in two ways:
> 
> (a) it fixes the specific issue in this failing test case, by
> setting up the location properly on the EXCESS_PRECISION_EXPR.
> 
> (b) updating c_expr::get_location by only using the EXPR_LOCATION
> if it's sane.  It could be argued that this could be papering over
> other "missing location" bugs, but if there are any, they are
> pre-existing ones exposed by r251239, and I'd rather have this fix
> in place than play whack-a-mole on any other such bugs that may
> be lurking in the codebase.
> 
> Successfully bootstrapped on x86_64-pc-linux-gnu;
> I've verified the fix with --target=m68k-elf.
> 
> OK for trunk?

Ok, thanks.

Marek

Re: [PR middle-end/81931] do not pass 0 precision in get_nonzero_bits

2017-08-24 Thread Richard Biener

On Thu, Aug 24, 2017 at 8:59 AM, Marc Glisse  wrote:
> On Thu, 24 Aug 2017, Aldy Hernandez wrote:
>
>> As discussed in the PR...
>>
>> CCP is passing a precision of 0 to get_nonzero_bits for complex
>> numbers.  With my last wide_int sign extension patch, the sign
>> extension of -1 with a precision of 0 returns 0, whereas it previously
>> was preturning all ones.  The attached patch calls element_precision,
>> to avoid getting a precision of 0.
>
>
> So if we call get_nonzero_bits on a complex number, we get -1 with the
> precision of the real part? Not sure that's very meaningful :-/ What do the
> callers (with complex_type) look like, do they check if this returns -1 and
> give up in that case?
>
> +  /* Use element_precision instead of TYPE_PRECISION so complex and
> + vector types get a non-zero precision.  */
>
> IIRC, on vectors, TYPE_PRECISION gives the number of elements. Still, better
> not rely on that indeed.

Yeah, I guess the fallback in

  if (!ri)
return wi::shwi (-1, precision);

should in the end use a precision that matches the size of the type
if it is not INTEGRAL_TYPE_P.  OTOH that may run into
MAX_BITSIZE_MODE_ANY_INT limitations... (AVX512 vector types).

I guess the cleanest approach would be to disallow get_nonzero_bits
on sth non-integral but it could in theory make sense on vectors or
floats.

Anyway, the proposed patch improves the situation (vector types
already get a non-zero precision as TYPE_PRECISION is mapped
to TYPE_VECTOR_SUBPARTS there).

So -- ok for trunk.

Thanks,
Richard.

> +  unsigned int precision = element_precision (TREE_TYPE (name));
>
> TREE_TYPE is not necessary there (it doesn't hurt though).
>
> --
> Marc Glisse

[PATCH PR81913]Skip niter analysis if either IV in exit condition can wrap

2017-08-24 Thread Bin Cheng

Hi,
I added code handle exit condition like "IV1 le/lt IV2" by changing it into 
"IV1' le/lt INV".
Unfortunately, wrapping behavior has subtle impact on the transformation.  This 
patch for
now skips niter analysis if either IV1 or IV2 can wrap.  We can still handle 
pointer case
as reported in PR81196, but unsigned type needs more work.  The patch also 
includes two
XFAIL tests showing what shall be improved here.
Bootstrap and test on AArch64.  Is it OK?

Thanks,
bin
2017-08-24  Bin Cheng  

PR tree-optimization/81913
* tree-ssa-loop-niter.c (number_of_iterations_cond): Skip niter
analysis when either IVs in condition can wrap.

gcc/testsuite
2017-08-24  Bin Cheng  

PR tree-optimization/81913
* gcc.c-torture/execute/pr81913.c: New test.
* gcc.dg/tree-ssa/loop-niter-1.c: New test.
* gcc.dg/tree-ssa/loop-niter-2.c: New test.From 58262ff795e2c2f4cff2982dc8c7aecc240d3227 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Wed, 23 Aug 2017 10:04:01 +0100
Subject: [PATCH] pr81913-20170817.txt

---
 gcc/testsuite/gcc.c-torture/execute/pr81913.c | 27 +++
 gcc/testsuite/gcc.dg/tree-ssa/loop-niter-1.c  | 31 +++
 gcc/testsuite/gcc.dg/tree-ssa/loop-niter-2.c  | 31 +++
 gcc/tree-ssa-loop-niter.c |  6 --
 4 files changed, 93 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr81913.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/loop-niter-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/loop-niter-2.c

diff --git a/gcc/testsuite/gcc.c-torture/execute/pr81913.c 
b/gcc/testsuite/gcc.c-torture/execute/pr81913.c
new file mode 100644
index 000..11eec4e
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr81913.c
@@ -0,0 +1,27 @@
+/* PR tree-optimization/81913 */
+
+typedef unsigned char u8;
+typedef unsigned int u32;
+
+static u32
+b (u8 d, u32 e, u32 g)
+{
+  do
+{
+  e += g + 1;
+  d--;
+}
+  while (d >= (u8) e);
+
+  return e;
+}
+
+int
+main (void)
+{
+  u32 x = b (1, -0x378704, ~0xba64fc);
+  if (x != 0xd93190d0)
+__builtin_abort ();
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-niter-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/loop-niter-1.c
new file mode 100644
index 000..16c76fe
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-niter-1.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-sccp-details" } */
+
+typedef unsigned char u8;
+typedef unsigned int u32;
+
+static u32
+b (u8 d, u32 e, u32 g)
+{
+  do
+{
+  e += g + 1;
+  d--;
+}
+  while (d >= (u8) e);
+
+  return e;
+}
+
+int
+main (void)
+{
+  u32 x = b (200, -0x378704, ~0xba64fc);
+  if (x != 0xe1ee4ca0)
+__builtin_abort ();
+
+  return 0;
+}
+
+/* Niter analyzer should be able to compute niters for the loop.  */
+/* { dg-final { scan-tree-dump "Replacing uses of: .* with: 3790490784" "sccp" 
{ xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/loop-niter-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/loop-niter-2.c
new file mode 100644
index 000..2377e6c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/loop-niter-2.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-sccp-details" } */
+
+typedef unsigned char u8;
+typedef unsigned int u32;
+
+static u32
+b (u8 d, u32 e, u32 g)
+{
+  do
+{
+  e += g + 1;
+  d--;
+}
+  while (d >= (u8) e);
+
+  return e;
+}
+
+int
+main (void)
+{
+  u32 x = b (1, -0x378704, ~0xba64fc);
+  if (x != 0xd93190d0)
+__builtin_abort ();
+  return 0;
+}
+
+/* Niter analyzer should be able to compute niters for the loop even though
+   IV:d wraps.  */
+/* { dg-final { scan-tree-dump "Replacing uses of: .* with: 3643904208" "sccp" 
{ xfail *-*-* } } } */
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 0d6d101..27244eb 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -1728,7 +1728,7 @@ number_of_iterations_cond (struct loop *loop,
  provided that either below condition is satisfied:
 
a) the test is NE_EXPR;
-   b) iv0.step - iv1.step is positive integer.
+   b) iv0.step - iv1.step is integer and iv0/iv1 don't overflow.
 
  This rarely occurs in practice, but it is simple enough to manage.  */
   if (!integer_zerop (iv0->step) && !integer_zerop (iv1->step))
@@ -1739,7 +1739,9 @@ number_of_iterations_cond (struct loop *loop,
 
   /* No need to check sign of the new step since below code takes care
 of this well.  */
-  if (code != NE_EXPR && TREE_CODE (step) != INTEGER_CST)
+  if (code != NE_EXPR
+ && (TREE_CODE (step) != INTEGER_CST
+ || !iv0->no_overflow || !iv1->no_overflow))
return false;
 
   iv0->step = step;
-- 
1.9.1

Re: [PATCH] PowerPC cleanup, remove -mpower9-dform{,-scalar,-vector} options

2017-08-24 Thread Segher Boessenkool

Hi Mike,

On Wed, Aug 23, 2017 at 01:34:43PM -0400, Michael Meissner wrote:
> This patch eliminates the undocumented debugging options -mpower9-dform,
> -mpower9-dform-scalar, and -mpower9-dform-vector.  These switches were added
> when I added the support for the ISA 3.0 (power9) d-form (register+offset)
> vector addressing.

>   * config/rs6000/rs6000.c (rs6000_setup_reg_addr_masks): Change
>   tests against -mpowwer9-dform* to -mpower9-vector.  Delete code

Typo (power!)

> @@ -4239,8 +4239,7 @@ rs6000_option_override_internal (bool gl
>  
>/* For the newer switches (vsx, dfp, etc.) set some of the older options,
>   unless the user explicitly used the -mno- to disable the code.  
> */
> -  if (TARGET_P9_VECTOR || TARGET_MODULO || TARGET_P9_DFORM_SCALAR
> -  || TARGET_P9_DFORM_VECTOR || TARGET_P9_DFORM_BOTH > 0)
> +  if (TARGET_P9_VECTOR || TARGET_MODULO || TARGET_P9_MISC)
>  rs6000_isa_flags |= (ISA_3_0_MASKS_SERVER & ~ignore_masks);
>else if (TARGET_P9_MINMAX)
>  {

Where did TARGET_P9_MISC come from?  It should be mentioned in the
changelog, if it is needed here.

Okay for trunk with those things dealt with.  Thanks!


Segher

Re: Heads-Up: early LTO debug to land, breaking Mach-O / [X]COFF

2017-08-24 Thread Richard Biener

On Wed, 23 Aug 2017, Rainer Orth wrote:

> Hi Richard,
> 
> > On Fri, 12 May 2017, Richard Biener wrote:
> >
> >> 
> >> This is a heads-up that I am in the process of implementing the last
> >> of Jasons review comments on the dwarf2out parts of early LTO debug
> >> support.  I hope to post final patches early next week after thoroughly
> >> re-testing everything.
> >> 
> >> Note that Mach-O and [X]COFF support in the simple-object machinery
> >> is still missing for the early LTO debug feature so I am going to
> >> break LTOing with DWARF debuginfo on Darwin and Windows (CCing
> >> maintainers).  Mach-O support has been worked on a bit by Iain
> >> and myself but the simple-object piece is still missing.
> >> A workaround is to use stabs on these targets with LTO.
> 
> unfortunately, the patch not only broke LTO on Darwin, but bootstrap
> completely (seen on x86_64-apple-darwin17.0.0):
> 
> /var/gcc/src/hg/trunk/local/gcc/dwarf2out.c: In function 'void 
> init_sections_and_labels(bool)':
> /var/gcc/src/hg/trunk/local/gcc/dwarf2out.c:27184:1: error: '%ld' directive 
> writing between 1 and 10 bytes into a region of size 9 
> [-Werror=format-overflow=]
>  init_sections_and_labels (bool early_lto_debug)
>  ^~~~
> /var/gcc/src/hg/trunk/local/gcc/dwarf2out.c:27184:1: note: directive argument 
> in the range [0, 4294967295]
> In file included from ./tm.h:21:0,
>  from /var/gcc/src/hg/trunk/local/gcc/target.h:52,
>  from /var/gcc/src/hg/trunk/local/gcc/dwarf2out.c:61:
> /var/gcc/src/hg/trunk/local/gcc/config/darwin.h:759:15: note: 'sprintf' 
> output between 23 and 32 bytes into a destination of size 30
>sprintf (LABEL, "*%s%ld", PREFIX, (long)(NUM)); \
>^~
> /var/gcc/src/hg/trunk/local/gcc/dwarf2out.c:27210:4: note: in expansion of 
> macro 'ASM_GENERATE_INTERNAL_LABEL'
> ASM_GENERATE_INTERNAL_LABEL (debug_skeleton_line_section_label,
> ^~~
> /var/gcc/src/hg/trunk/local/gcc/dwarf2out.c:27184:1: error: '%ld' directive 
> writing between 1 and 10 bytes into a region of size 7 
> [-Werror=format-overflow=]
>  init_sections_and_labels (bool early_lto_debug)
>  ^~~~
> /var/gcc/src/hg/trunk/local/gcc/dwarf2out.c:27184:1: note: directive argument 
> in the range [0, 4294967295]
> In file included from ./tm.h:21:0,
>  from /var/gcc/src/hg/trunk/local/gcc/target.h:52,
>  from /var/gcc/src/hg/trunk/local/gcc/dwarf2out.c:61:
> /var/gcc/src/hg/trunk/local/gcc/config/darwin.h:759:15: note: 'sprintf' 
> output between 25 and 34 bytes into a destination of size 30
>sprintf (LABEL, "*%s%ld", PREFIX, (long)(NUM)); \
>^~
> /var/gcc/src/hg/trunk/local/gcc/dwarf2out.c:27229:4: note: in expansion of 
> macro 'ASM_GENERATE_INTERNAL_LABEL'
> ASM_GENERATE_INTERNAL_LABEL (debug_skeleton_abbrev_section_label,
> ^~~
> /var/gcc/src/hg/trunk/local/gcc/dwarf2out.c:27184:1: error: '%ld' directive 
> writing between 1 and 10 bytes into a region of size 9 
> [-Werror=format-overflow=]
>  init_sections_and_labels (bool early_lto_debug)
>  ^~~~
> /var/gcc/src/hg/trunk/local/gcc/dwarf2out.c:27184:1: note: directive argument 
> in the range [0, 4294967295]
> In file included from ./tm.h:21:0,
>  from /var/gcc/src/hg/trunk/local/gcc/target.h:52,
>  from /var/gcc/src/hg/trunk/local/gcc/dwarf2out.c:61:
> /var/gcc/src/hg/trunk/local/gcc/config/darwin.h:759:15: note: 'sprintf' 
> output between 23 and 32 bytes into a destination of size 30
>sprintf (LABEL, "*%s%ld", PREFIX, (long)(NUM)); \
>^~
> /var/gcc/src/hg/trunk/local/gcc/dwarf2out.c:27238:4: note: in expansion of 
> macro 'ASM_GENERATE_INTERNAL_LABEL'
> ASM_GENERATE_INTERNAL_LABEL (debug_skeleton_line_section_label,
> ^~~
> /var/gcc/src/hg/trunk/local/gcc/dwarf2out.c:27184:1: error: '%ld' directive 
> writing between 1 and 10 bytes into a region of size 9 
> [-Werror=format-overflow=]
>  init_sections_and_labels (bool early_lto_debug)
>  ^~~~
> /var/gcc/src/hg/trunk/local/gcc/dwarf2out.c:27184:1: note: directive argument 
> in the range [0, 4294967295]
> In file included from ./tm.h:21:0,
>  from /var/gcc/src/hg/trunk/local/gcc/target.h:52,
>  from /var/gcc/src/hg/trunk/local/gcc/dwarf2out.c:61:
> /var/gcc/src/hg/trunk/local/gcc/config/darwin.h:759:15: note: 'sprintf' 
> output between 23 and 32 bytes into a destination of size 30
>sprintf (LABEL, "*%s%ld", PREFIX, (long)(NUM)); \
>^~
> /var/gcc/src/hg/trunk/local/gcc/dwarf2out.c:27245:4: note: in expansion of 
> macro 'ASM_GENERATE_INTERNAL_LABEL'
> ASM_GENERATE_INTERNAL_LABEL

RE: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers in GIMPLE.

2017-08-24 Thread Richard Biener

On Wed, 23 Aug 2017, Tamar Christina wrote:

> Hi Richard,
> 
> Thanks for the feedback,
> 
> > 
> > I think you should split up your work a bit.  The pieces from
> > fold_builtin_* you remove and move to lowering that require no control flow
> > like __builtin_isnan (x) -> x UNORD x should move to match.pd patterns (aka
> > foldings that will then be applied both on GENERIC and GIMPLE).
> > 
> 
> But emitting them as an UNORD wouldn't solve the performance or correctness 
> issues
> The patch is trying to address. For correctness, for instance an UNORD would 
> still signal
> When -fsignaling-nans is used (PR/66462).

Well, then address the correctness issues separately.  I think lumping
everything into a single patch just makes your live harder ;)

I was saying that if I write isunordered (x, y) and use 
-fno-signaling-nans
(the default) then I would expect the same code to be generated as if
I say isnan (x) || isnan (y).  And I'm not sure if isnan (x) || isnan (y)
is more efficient if both are implemented with integer ops compared
to using an unordered FP compare.

A canonical high-level representation on GIMPLE is equally important
as maybe getting some optimization on a lowered "optimized" form.

And a good canonical representation is short (x UNORD x or x UNORD y
counts as short here).

> > As fallback the expanders should simply emit a call (as done for isinf when
> > not folded for example).
> 
> Yes the patch currently already does this.

Ah, I probably looked at an old version then.

> My question had more to do if 
> the late expansion when you alias one of these functions in C++ was 
> really an issue, since It used to (sometimes, only when you got the type 
> of the argument correct) expand before whereas now it'll always just 
> emit a function call in this edge case.

The answer here is really that we should perform lowering at a point
where we do not omit possible optimizations.  For the particular case
this would mean lowering after IPA optimizations (also think of
accelerator offloading which uses the early optimization phase of
the host).  This also allows early optimization to more accurately
estimate code size (when optimizing for size).  And I still think
whether to expose fp classification as FP or integer ops should be
a target decision and done late.

This still means canonicalizations like isnan -> UNORD should happen
early and during folding.

Note __builtin_fpclassify is really a special case and lowering that
early is a good thing -- I'd simply keep the current code here at
the start.

> > I think fpclassify is special (double-check), you can't have an indirect 
> > call to
> > the classification family as they are macros (builtins where they exist 
> > could be
> > called indirectly though I guess we should simply disallow taking their
> > address).  These are appropriate for early lowering like you do.  You can 
> > leave
> > high-level ops from fpclassify lowering and rely on folding to turn them 
> > into
> > sth more optimal.
> 
> Fpclassify is also a function in C++, so in that regard it's not 
> different from the rest. For C code my patch will always do the right 
> thing as like you said they're macros so I would always be able to lower 
> them early.

Hum, but fpclassify is then not a builtin in C++ given it needs to know
the actual values of FP_NAN and friends.  That is, we are talking about
__builtin_fpclassify which does not match the standards fpclassify API.
The builtin is always a function and never a macro.

> > 
> > I still don't fully believe in lowering to integer ops like the isnan 
> > lowering you
> > are doing.  That hides what you are doing from (not yet existing) 
> > propagation
> > passes of FP classification.  I'd do those transforms only late.
> > You are for example missing to open-code isunordered with integer ops, no?
> > I'd have isnan become UNORDERED_EXPR and eventually expand that with
> > bitops.
> 
> I'm not sure how I would be able to distinguish between the 
> UNORDERED_EXPR of an isnan call and that or a general X UNORDERED_EXPR X 
> expression. The new isnan code that replaces the UNORD isn't a general 
> comparison, it can only really check for nan. In general don't how if I 
> rewrite these to TREE expressions early in match.pd how I would ever be 
> able to recognize the very specific cases they try to handle in order to 
> do bitops only when it makes sense.

I believe the reverse transform x UNORD x to isnan (x) is also valid
(modulo -fsignalling-nans).

> 
> I think C++ cases will still be problematic, since match.pd will match on
> 
> int (*xx)(...);
> xx = isnan;
> if (xx(a))
> 
> quite late, and only when xx is eventually replaced by isnan, so a lot 
> of optimizations would have already ignored the UNORD it would have 
> generated. Which means the bitops have to be quite late and would miss a 
> lot of other optimization phases.

Is that so?  Did you try whether vectorization likes x UNORD x or
the bit operations variant of

Re: [PR middle-end/81931] do not pass 0 precision in get_nonzero_bits

2017-08-24 Thread Marc Glisse


On Thu, 24 Aug 2017, Aldy Hernandez wrote:


As discussed in the PR...

CCP is passing a precision of 0 to get_nonzero_bits for complex
numbers.  With my last wide_int sign extension patch, the sign
extension of -1 with a precision of 0 returns 0, whereas it previously
was preturning all ones.  The attached patch calls element_precision,
to avoid getting a precision of 0.


So if we call get_nonzero_bits on a complex number, we get -1 with the 
precision of the real part? Not sure that's very meaningful :-/ What do 
the callers (with complex_type) look like, do they check if this returns 
-1 and give up in that case?


+  /* Use element_precision instead of TYPE_PRECISION so complex and
+ vector types get a non-zero precision.  */

IIRC, on vectors, TYPE_PRECISION gives the number of elements. Still, 
better not rely on that indeed.


+  unsigned int precision = element_precision (TREE_TYPE (name));

TREE_TYPE is not necessary there (it doesn't hurt though).

--
Marc Glisse

[PR middle-end/81931] do not pass 0 precision in get_nonzero_bits

2017-08-24 Thread Aldy Hernandez

As discussed in the PR...

CCP is passing a precision of 0 to get_nonzero_bits for complex
numbers.  With my last wide_int sign extension patch, the sign
extension of -1 with a precision of 0 returns 0, whereas it previously
was preturning all ones.  The attached patch calls element_precision,
to avoid getting a precision of 0.

Segher tested this patch on ppc64-linux {-m32,-m64}, and it fixes the
ppc64 bootstrap problems.

OK?
Aldy


curr
Description: Binary data

Re: [49/77] Simplify nonzero/num_sign_bits hooks

2017-08-24 Thread Jeff Law

On 07/13/2017 02:55 AM, Richard Sandiford wrote:
> The two implementations of the reg_nonzero_bits and reg_num_sign_bits
> hooks ignored the "known_x", "known_mode" and "known_ret" arguments,
> so this patch removes them.  It adds a new scalar_int_mode parameter
> that specifies the mode of "x".  (This mode might be different from
> "mode", which is the mode in which "x" is used.)
> 
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * rtl.h (rtl_hooks::reg_nonzero_bits): Add a scalar_int_mode
>   parameter for the mode of "x".  Remove the "known_x", "known_mode"
>   and "known_ret" arguments.  Change the type of the mode argument
>   to scalar_int_mode.
>   (rtl_hooks:reg_num_sign_bit_copies): Likewise.
>   * combine.c (reg_nonzero_bits_for_combine): Update accordingly.
>   (reg_num_sign_bit_copies_for_combine): Likewise.
>   * rtlanal.c (nonzero_bits1): Likewise.
>   (num_sign_bit_copies1): Likewise.
>   * rtlhooks-def.h (reg_nonzero_bits_general): Likewise.
>   (reg_num_sign_bit_copies_general): Likewise.
>   * rtlhooks.c (reg_num_sign_bit_copies_general): Likewise.
>   (reg_nonzero_bits_general): Likewise.
> 
OK.
jeff

Re: [48/77] Make subroutines of num_sign_bit_copies operate on scalar_int_mode

2017-08-24 Thread Jeff Law

On 07/13/2017 02:55 AM, Richard Sandiford wrote:
> Similarly to the nonzero_bits patch, this one moves the mode
> class check and VOIDmode handling from num_sign_bit_copies1
> to num_sign_bit_copies itself, then changes the subroutines
> to operate on scalar_int_modes.
> 
> gcc/
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
>   * rtlanal.c (num_sign_bit_copies): Handle VOIDmode here rather
>   than in subroutines.  Return 1 for non-integer modes.
>   (cached_num_sign_bit_copies): Change the type of the mode parameter
>   to scalar_int_mode.
>   (num_sign_bit_copies1): Likewise.  Remove early exit for other mode
>   classes.  Handle CONST_INT_P first and then check whether X also
>   has a scalar integer mode.  Check the same thing for inner registers
>   of a SUBREG and for values that are being extended or truncated.
OK.
jeff

Re: [47/77] Make subroutines of nonzero_bits operate on scalar_int_mode

2017-08-24 Thread Jeff Law

On 07/13/2017 02:55 AM, Richard Sandiford wrote:
> nonzero_bits1 assumed that all bits of a floating-point or vector mode
> were needed.  It seems likely that fixed-point modes should have been
> handled in the same way.  After excluding those, the only remaining
> modes that are likely to be useful are scalar integer ones.
> 
> This patch moves the mode class check to nonzero_bits itself, along
> with the handling of mode == VOIDmode.  The subroutines of nonzero_bits
> can then take scalar_int_modes.
> 
> gcc/
> 2017-07-13  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
>   * rtlanal.c (nonzero_bits): Handle VOIDmode here rather than
>   in subroutines.  Return the mode mask for non-integer modes.
>   (cached_nonzero_bits): Change the type of the mode parameter
>   to scalar_int_mode.
>   (nonzero_bits1): Likewise.  Remove early exit for other mode
>   classes.  Handle CONST_INT_P first and then check whether X
>   also has a scalar integer mode.
OK.  I suspect not handling fixed-point modes like floating point modes
was just an oversight.

jeff

Re: [PATCH 6/6] qsort comparator consistency checking

2017-08-24 Thread Jeff Law

On 08/10/2017 05:24 AM, Alexander Monakov wrote:
> On Wed, 9 Aug 2017, Jeff Law wrote:
 The _5th macro isn't that bad either, appart from using reserved namespace
 identifiers (it really should be something like qsort_5th and the arguments
 shouldn't start with underscores).
>>>
>>> I didn't understand what Jeff found "ugly" about it; I wonder what epithets
>>> would be applied then to more, ahem, unusual parts of GCC.
>> I doubt anyone would want to hear what I might say about other code.
>> I'm sure I wouldn't want my kids reading how I might refer to certain
>> parts of GCC.
> 
> Imho it's no good to just say "ugly" in patch review without any further
> elaboration, it only serves to provide a minor chilling effect, telling
> the contributor they haven't done good enough (for your personal taste)
> without informing them where/how they could have improved.
> 
> More importantly, am I correct in understanding that at this point all
> remaining changes are reviewed and approved, and I can go ahead with
> preparing vec::qsort -> vec::sort mass-renaming patch and rebasing the
> others on top of it?  Or is the original approach with argument-counting
> trick still under consideration?
I still don't like the argument-counting trick.  But avoiding it seems
even more painful.  So let's go with your original approach with the
argument-counting trick.  At least it's buried and folks likely won't
have to look at it with any regularity.

jeff

76 matches

Mail list logo