Re: [PATCH] Fix tree-loop-distribution.c ICE with -ftrapv (PR tree-optimization/89278)

2019-02-15 Thread Bin.Cheng
On Fri, Feb 15, 2019 at 3:48 PM Jakub Jelinek  wrote:
>
> On Fri, Feb 15, 2019 at 08:33:44AM +0100, Jakub Jelinek wrote:
> > On Fri, Feb 15, 2019 at 03:25:33PM +0800, Bin.Cheng wrote:
> > > So under what condition can we safely rewrite trapping operations into
> > > non-trapping ones?  Does the rewrite nullify -ftrapv, which requires
> > > trap behavior?
> >
> > For the particular expression?  Yes, otherwise no.
> >
> > -ftrapv should be either replaced with -fsanitize=signed-integer-overflow
> > -fsanitize-undefined-trap-on-error, or at least implemented that way in the
> > middle-end (perhaps with a separate ifn, so that we can pattern recognize it
> > during expansion and use library calls where the inline call is not small
> > enough).  We haven't done that yet though.
>
> To clarify, the current -ftrapv implementation doesn't guarantee you get
> traps on overflow, it will happily optimize computations away at any time
> during GIMPLE optimizations, or turn stuff into unsigned computations etc.
> (not just through this rewrite function, but many other ways).
> For -fsanitize=signed-integer-overflow -fsanitize-undefined-trap-on-error
> there are no guarantees either, but we try hard not to optimize those away,
> we have TYPE_OVERFLOW_SANITIZED checks that punt certain optimizations in
> fold-const.c/match.pd and early (right after going into ssa form) we turn
> the arithmetic into ifns, which are optimized away only if we can prove
> there will be no overflow.  On the other side, it can hinder other
> optimizations (a lot).  And possibly overflowing computations introduced
> during later optimizations are not sanitized.
> The question is what -ftrapv users want; right now they have a choice:
> catch perhaps less UB with more optimization opportunities (-ftrapv),
> or catch more but optimize less (UBSan).
Thanks very much for the explanation; that answers all the questions I had.

Thanks,
bin
>
> Jakub
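
For readers following the -ftrapv discussion above: a minimal sketch of explicit overflow checking that depends on neither -ftrapv nor UBSan. It assumes GCC or Clang, since __builtin_add_overflow is a compiler builtin; checked_add is a hypothetical helper name, not something from this thread.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: store a + b in *out and return true when the
   mathematically exact sum fits in int; return false on overflow.
   __builtin_add_overflow is a GCC/Clang builtin whose overflow flag is
   part of the defined result, so the check is ordinary program
   semantics rather than a trap that optimizations may remove.  */
static bool
checked_add (int a, int b, int *out)
{
  return !__builtin_add_overflow (a, b, out);
}
```

A caller would test the return value and decide itself whether to abort, saturate, or report an error, which is the part -ftrapv leaves to the trap.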


Re: Go patch committed: Harmonize types referenced by both C and Go

2019-02-15 Thread Andreas Schwab
This breaks non-split-stack builds.

../../../libgo/runtime/stack.c: In function 'doscanstack1':
../../../libgo/runtime/stack.c:113:18: error: passing argument 1 of 'scanstackblock' makes integer from pointer without a cast [-Werror=int-conversion]
  113 |   scanstackblock(bottom, (uintptr)(top - bottom), gcw);
      |                  ^~
      |                  |
      |                  byte * {aka unsigned char *}

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH][GCC][Arm] Add HF modes to ANY iterators

2019-02-15 Thread Christophe Lyon
On Thu, 14 Feb 2019 at 17:52, Tamar Christina  wrote:
>
> Hi Kyrill,
>
> I couldn't find a way to actually generate this case so I have instead removed
> the entry from ANY128.  New patch and changelog below.
>
> --
>
> The iterator ANY64 is used in various general split patterns and is supposed
> to contain all 64 bit modes.
>
> For some reason the pattern has HI but not HF.  This adds HF so that general
> 64 bit splits are generated for these modes as well.  These are required
> by various split patterns that expect them to be there.
>
> Bootstrapped Regtested on arm-none-gnueabihf and no issues.
>
> Ok for trunk?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 2019-02-14  Tamar Christina  
>
> PR target/88850
> * config/arm/iterators.md (ANY64): Add V4HF.
>
> gcc/testsuite/ChangeLog:
>
> 2019-02-14  Tamar Christina  
>
> PR target/88850
> * gcc.target/arm/pr88850-2.c: New test.
> * lib/target-supports.exp
> (check_effective_target_arm_neon_softfp_fp16_ok_nocache,
> check_effective_target_arm_neon_softfp_fp16_ok,
> add_options_for_arm_neon_softfp_fp16): New.
>
>

Hi,

I've noticed strange things with this new testcase.
I see it failing on target arm-none-linux-gnueabi  --with-mode arm
--with-cpu cortex-a9
and on target arm-none-eabi --with-mode arm --with-cpu cortex-a9


Looking at the logs, I see strange command lines when trying to
compile arm_neon_softfp_fp16_ok:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/xgcc
-B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/
-fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never -mfloat-abi=softfp -mfpu=neon-fp16
-mfloat-abi=softfp -c -o arm_neon_softfp_fp16_ok21466.o
arm_neon_softfp_fp16_ok21466.c
arm_neon_softfp_fp16_ok21466.c:3:3: error: unknown type name
'float16x4_t'; did you mean 'float32x4_t'?
arm_neon_softfp_fp16_ok21466.c: In function 'foo':
arm_neon_softfp_fp16_ok21466.c:6:26: warning: implicit declaration of
function 'vcvt_f16_f32'; did you mean 'vcvt_u32_f32'?
[-Wimplicit-function-declaration]
compiler exited with status 1

[I don't know where the first 'mfloat-abi=softfp' comes from]
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/xgcc
-B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/
-fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
-fdiagnostics-color=never -mfloat-abi=softfp -mfloat-abi=softfp
-mfp16-format=ieee -c -o arm_neon_softfp_fp16_ok21466.o
arm_neon_softfp_fp16_ok21466.c
[succeeds]

then, when compiling the testcase:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/xgcc
-B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/
/gcc/testsuite/gcc.target/arm/pr88850-2.c -fno-diagnostics-show-caret
-fno-diagnostics-show-line-numbers -fdiagnostics-color=never -ansi
-pedantic-errors -O2 -march=armv7-a -fdump-rtl-final
-mfloat-abi=softfp -mfloat-abi=softfp -mfp16-format=ieee
-ffat-lto-objects -S -o pr88850-2.s
In file included from /gcc/testsuite/gcc.target/arm/pr88850-2.c:7:
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/include/arm_neon.h:31:2:
error: #error "NEON intrinsics not available with the soft-float ABI.
Please use -mfloat-abi=softfp or -mfloat-abi=hard"
/gcc/testsuite/gcc.target/arm/pr88850-2.c:9:21: error: unknown type
name 'float16x4_t'
/gcc/testsuite/gcc.target/arm/pr88850-2.c:11:9: error: unknown type
name 'float16x4_t'

Why does the compiler think it's using -mfloat-abi=soft?


> The 02/13/2019 10:57, Kyrill Tkachov wrote:
> > Hi Tamar
> >
> > On 2/13/19 10:33 AM, Tamar Christina wrote:
> > > Hi All,
> > >
> > > The iterators ANY64 and ANY128 are used in various general split
> > > patterns and
> > > are supposed to contain any 64 bit and 128 bit modes respectively.
> > >
> > > For some reason these patterns had HI but not HF.  This adds HF so
> > > that general
> > > 64 and 128 bit splits are generated for these modes as well.  These
> > > are required
> > > by various split patterns that expect them to be there.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and  > > still running> issues.
> > >
> > Please do this on an arm-none-linux-gnueabihf target.
> >
> > Though I suspect this is just a placeholder from a boilerplate ;)
> >
> > > Ok for trunk?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 2019-02-13  Tamar Christina  
> > >
> > > PR target/88850
> > > * config/arm/iterators.md (ANY64): Add V4HF,
> > > (ANY128): Add V8HF.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 2019-02-13  Tamar Christina  
> > >
> > > PR target/88850
> > > * gcc.target/arm/pr88850-2.c: New test.
> > >
> > > --
> >
> > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> > index c33e572c3e89c3dc5848bd6b825d618481247558..4ac048a0c609273691c264c97ccf6cd47b43943b 100644
> > --- a/gcc/co

[patch] Fix LRA/reload issue with -fnon-call-exceptions

2019-02-15 Thread Eric Botcazou
Hi,

this is a regression present on all active branches since the controversial 
get_initial_register_offset stuff was added to rtlanal.c some time ago, and 
visible in the testsuite on PowerPC/Linux in the form of gnat.dg/opt73.adb 
timing out at run time.

The problem is that the compiler generates code that doesn't save the frame 
pointer before clobbering it, because rs6000_stack_info computes a wrong final 
(post-reload) stack layout.  The scenario is as follows: LRA decides to use 
the frame pointer, sets reload_completed to 1 at the end and then does:

  /* We've possibly turned single trapping insn into multiple ones.  */
  if (cfun->can_throw_non_call_exceptions)
{
  auto_sbitmap blocks (last_basic_block_for_fn (cfun));
  bitmap_ones (blocks);
  find_many_sub_basic_blocks (blocks);
}

But find_many_sub_basic_blocks calls control_flow_insn_p, which in turn can 
call rtx_addr_can_trap_p_1, which can call get_initial_register_offset, which 
uses INITIAL_ELIMINATION_OFFSET, IOW rs6000_initial_elimination_offset, which 
calls rs6000_stack_info.  But at this point the DF information hasn't been 
updated so the frame pointer isn't detected as live by df_regs_ever_live_p.

You may think that the fix is just to set reload_completed to 1 after the 
above code in lra, but that's not sufficient because the same issue can arise 
from the do_reload function:

  if (optimize)
cleanup_cfg (CLEANUP_EXPENSIVE);

when checking is enabled, because cleanup_cfg can call control_flow_insn_p 
and then eventually rtx_addr_can_trap_p_1.  In other words, we would need 
to set reload_completed to 1 only after the above code, which is very late.
As a matter of fact, that's not possible for old reload itself because of:

  /* We must set reload_completed now since the cleanup_subreg_operands call
 below will re-recognize each insn and reload may have generated insns
 which are only valid during and after reload.  */
  reload_completed = 1;

So, barring the removal of the get_initial_register_offset stuff, the only 
simple fix is probably to prevent it from calling into the back-end too early, 
for example with the attached fixlet.  Tested on x86-64 and PowerPC/Linux.

Thoughts?  Where do we want to fix this?


* rtlanal.c (get_initial_register_offset): Fall back to the raw estimate
as long as the epilogue isn't completed.

-- 
Eric Botcazou
Index: rtlanal.c
===
--- rtlanal.c	(revision 268849)
+++ rtlanal.c	(working copy)
@@ -359,10 +359,10 @@ get_initial_register_offset (int from, i
   if (to == from)
 return 0;
 
-  /* It is not safe to call INITIAL_ELIMINATION_OFFSET
- before the reload pass.  We need to give at least
- an estimation for the resulting frame size.  */
-  if (! reload_completed)
+  /* It is not safe to call INITIAL_ELIMINATION_OFFSET before the epilogue
+ is completed, but we need to give at least an estimate for the stack
+ pointer based on the frame size.  */
+  if (!epilogue_completed)
 {
   offset1 = crtl->outgoing_args_size + get_frame_size ();
 #if !STACK_GROWS_DOWNWARD


[visium] Adjust to recent assembler change

2019-02-15 Thread Eric Botcazou
This adjusts the compiler to the assembler change I recently installed:
  https://sourceware.org/ml/binutils/2019-02/msg00035.html

The final.c one-liner is trivial: it changes the test to the exact condition 
under which the fallthrough code won't segfault.

Tested on visium-elf, applied on the mainline.


2019-02-15  Eric Botcazou  

libgcc/
* config/visium/lib2funcs.c (__set_trampoline_parity): Replace
TRAMPOLINE_SIZE with __LIBGCC_TRAMPOLINE_SIZE__.
gcc/
* final.c (insn_current_reference_address): Replace test on JUMP_P
with test on jump_to_label_p.
* config/visium/visium-passes.def: New file.
* config/visium/t-visium (PASSES_EXTRA): Define.
* config/visium/visium-protos.h (make_pass_visium_reorg): Declare.
* config/visium/visium.h (TRAMPOLINE_SIZE): Adjust.
(TRAMPOLINE_ALIGNMENT): Define.
* config/visium/visium.c (visium_option_override): Do not register
the machine-specific reorg pass here.
(visium_trampoline_init): Align the BRA insn on a 64-bit boundary
for the GR6.
(output_branch): Adjust threshold for long branch instruction.
* config/visium/visium.md (cpu): Move around.
(length): Adjust for the GR6.

-- 
Eric Botcazou
Index: libgcc/config/visium/lib2funcs.c
===
--- libgcc/config/visium/lib2funcs.c	(revision 268849)
+++ libgcc/config/visium/lib2funcs.c	(working copy)
@@ -315,7 +315,9 @@ __set_trampoline_parity (UWtype *addr)
 {
   int i;
 
-  for (i = 0; i < (TRAMPOLINE_SIZE * __CHAR_BIT__) / W_TYPE_SIZE; i++)
+  for (i = 0;
+   i < (__LIBGCC_TRAMPOLINE_SIZE__ * __CHAR_BIT__) / W_TYPE_SIZE;
+   i++)
 addr[i] |= parity_bit (addr[i]);
 }
 #endif
Index: gcc/final.c
===
--- gcc/final.c	(revision 268849)
+++ gcc/final.c	(working copy)
@@ -606,7 +606,7 @@ insn_current_reference_address (rtx_insn
 
   rtx_insn *seq = NEXT_INSN (PREV_INSN (branch));
   seq_uid = INSN_UID (seq);
-  if (!JUMP_P (branch))
+  if (!jump_to_label_p (branch))
 /* This can happen for example on the PA; the objective is to know the
offset to address something in front of the start of the function.
Thus, we can treat it like a backward branch.
Index: gcc/config/visium/t-visium
===
--- gcc/config/visium/t-visium	(revision 268849)
+++ gcc/config/visium/t-visium	(working copy)
@@ -1,4 +1,5 @@
-# Multilibs for Visium.
+# General rules that all visium/ targets must have.
+
 # Copyright (C) 2012-2019 Free Software Foundation, Inc.
 #
 # This file is part of GCC.
@@ -17,6 +18,8 @@
 # along with GCC; see the file COPYING3.  If not see
 # .
 
+PASSES_EXTRA += $(srcdir)/config/visium/visium-passes.def
+
 # The compiler defaults to -mcpu=gr5 but this may be overridden via --with-cpu
 # at configure time so the -mcpu setting must be symmetrical.
 MULTILIB_OPTIONS = mcpu=gr5/mcpu=gr6 muser-mode
Index: gcc/config/visium/visium-passes.def
===
--- gcc/config/visium/visium-passes.def	(nonexistent)
+++ gcc/config/visium/visium-passes.def	(working copy)
@@ -0,0 +1,27 @@
+/* Description of target passes for Visium.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+/*
+   Macros that can be used in this file:
+   INSERT_PASS_AFTER (PASS, INSTANCE, TGT_PASS)
+   INSERT_PASS_BEFORE (PASS, INSTANCE, TGT_PASS)
+   REPLACE_PASS (PASS, INSTANCE, TGT_PASS)
+ */
+
+  INSERT_PASS_AFTER (pass_delay_slots, 1, pass_visium_reorg);
Index: gcc/config/visium/visium-protos.h
===
--- gcc/config/visium/visium-protos.h	(revision 268849)
+++ gcc/config/visium/visium-protos.h	(working copy)
@@ -61,4 +61,6 @@ extern int visium_expand_block_set (rtx
 extern unsigned int reg_or_subreg_regno (rtx);
 #endif /* RTX_CODE */
 
+extern rtl_opt_pass * make_pass_visium_reorg (gcc::context *);
+
 #endif
Index: gcc/config/visium/visium.c
===
--- gcc/config/visium/visium.c	(revision 268849)
+++ gcc/config/visium/visium.c	(working copy)
@@ -484,20 +484,6 @@ visium_option_override (void)
 

[testsuite] Small tweaks for Visium

2019-02-15 Thread Eric Botcazou
The only interesting one is gcc.dg/tree-ssa/pr84859.c: for it to pass, the 
undocumented -ftree-cselim must be enabled, which is done automatically only 
on targets with conditional moves, which the Visium is not.

Tested on visium-elf, applied on the mainline and 8 branch.
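
As a rough illustration of what -ftree-cselim (conditional store elimination) does, the sketch below shows the source-level shape of the transformation. The function names are hypothetical and this is not the pr84859.c testcase; the actual pass works on GIMPLE, not on C source.

```c
/* Before: the second store to *p is conditional.  Because *p was already
   written unconditionally just above, that store cannot newly trap.  */
void
store_cond (int *p, int c, int v)
{
  *p = 0;
  if (c)
    *p = v;
}

/* After (conceptually): the store becomes unconditional and the
   condition only selects the stored value, a shape that maps well onto
   a conditional move.  */
void
store_uncond (int *p, int c, int v)
{
  *p = 0;
  int tmp = c ? v : *p;
  *p = tmp;
}
```

Both functions leave the same value in *p for every input, which is why the rewrite is only profitable, and only enabled by default, where the select can be done without a branch.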


2019-02-15  Eric Botcazou  

* c-c++-common/patchable_function_entry-decl.c: Do not run on Visium.
* c-c++-common/patchable_function_entry-default.c: Likewise.
* c-c++-common/patchable_function_entry-definition.c: Likewise.
* gcc.dg/tree-ssa/pr84859.c: Add -ftree-cselim switch.

-- 
Eric Botcazou
Index: c-c++-common/patchable_function_entry-decl.c
===
--- c-c++-common/patchable_function_entry-decl.c	(revision 268849)
+++ c-c++-common/patchable_function_entry-decl.c	(working copy)
@@ -1,6 +1,5 @@
-/* { dg-do compile { target { ! nvptx*-*-* } } } */
+/* { dg-do compile { target { ! { nvptx*-*-* visium-*-* } } } } */
 /* { dg-options "-O2 -fpatchable-function-entry=3,1" } */
-/* { dg-additional-options "-mcpu=gr6" { target visium-*-* } }
 /* { dg-final { scan-assembler-times "nop|NOP" 2 { target { ! { alpha*-*-* } } } } } */
 /* { dg-final { scan-assembler-times "bis" 2 { target alpha*-*-* } } } */
 
Index: c-c++-common/patchable_function_entry-default.c
===
--- c-c++-common/patchable_function_entry-default.c	(revision 268849)
+++ c-c++-common/patchable_function_entry-default.c	(working copy)
@@ -1,6 +1,5 @@
-/* { dg-do compile { target { ! nvptx*-*-* } } } */
+/* { dg-do compile { target { ! { nvptx*-*-* visium-*-* } } } } */
 /* { dg-options "-O2 -fpatchable-function-entry=3,1" } */
-/* { dg-additional-options "-mcpu=gr6" { target visium-*-* } }
 /* { dg-final { scan-assembler-times "nop|NOP" 3 { target { ! { alpha*-*-* } } } } } */
 /* { dg-final { scan-assembler-times "bis" 3 { target alpha*-*-* } } } */
 
Index: c-c++-common/patchable_function_entry-definition.c
===
--- c-c++-common/patchable_function_entry-definition.c	(revision 268849)
+++ c-c++-common/patchable_function_entry-definition.c	(working copy)
@@ -1,6 +1,5 @@
-/* { dg-do compile { target { ! nvptx*-*-* } } } */
+/* { dg-do compile { target { ! { nvptx*-*-* visium-*-* } } } } */
 /* { dg-options "-O2 -fpatchable-function-entry=3,1" } */
-/* { dg-additional-options "-mcpu=gr6" { target visium-*-* } }
 /* { dg-final { scan-assembler-times "nop|NOP" 1 { target { ! { alpha*-*-* } } } } } */
 /* { dg-final { scan-assembler-times "bis" 1 { target alpha*-*-* } } } */
 
Index: gcc.dg/tree-ssa/pr84859.c
===
--- gcc.dg/tree-ssa/pr84859.c	(revision 268849)
+++ gcc.dg/tree-ssa/pr84859.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Warray-bounds -fdump-tree-phiopt2" } */
+/* { dg-options "-O2 -ftree-cselim -Warray-bounds -fdump-tree-phiopt2" } */
 
 void
 h (const void *p, unsigned n)


[testsuite] Tweak gcc.target/sparc/struct-ret-check-1.c

2019-02-15 Thread Eric Botcazou
It cannot pass in PIE mode.

Tested on SPARC64/Linux, applied on all active branches.


2019-02-15  Eric Botcazou  

* gcc.target/sparc/struct-ret-check-1.c: Add -fno-pie option.

-- 
Eric Botcazou
Index: gcc.target/sparc/struct-ret-check-1.c
===
--- gcc.target/sparc/struct-ret-check-1.c	(revision 268849)
+++ gcc.target/sparc/struct-ret-check-1.c	(working copy)
@@ -7,7 +7,7 @@
 
 /* Origin: Carlos O'Donell  */
 /* { dg-do run { target sparc*-*-solaris* sparc*-*-linux* sparc*-*-*bsd* } } */
-/* { dg-options "-mstd-struct-return" } */
+/* { dg-options "-mstd-struct-return -fno-pie" } */
 /* { dg-require-effective-target ilp32 } */
 #include 
 #include 


Re: GCC 7 backport

2019-02-15 Thread Martin Liška
On 8/16/18 12:18 PM, Martin Liška wrote:
> Hi.
> 
> I'm going to install one more patch.
> 
> Martin
> 

Hi.

I'm going to install another 2 patches.

Thanks,
Martin
From 37023f6a8e122d325cf3e3a054511425550cb6d6 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 15 Feb 2019 11:00:42 +0100
Subject: [PATCH 1/2] Backport r268762

gcc/ChangeLog:

2019-02-11  Martin Liska  

	PR ipa/89009
	* ipa-cp.c (build_toporder_info): Remove usage of a param.
	* ipa-inline.c (inline_small_functions): Likewise.
	* ipa-pure-const.c (propagate_pure_const): Likewise.
	(propagate_nothrow): Likewise.
	* ipa-reference.c (propagate): Likewise.
	* ipa-utils.c (struct searchc_env): Remove unused field.
	(searchc): Always search across AVAIL_INTERPOSABLE.
	(ipa_reduced_postorder): Always allow AVAIL_INTERPOSABLE as
	the only called IPA pure const can properly not propagate
	across interposable boundary.
	* ipa-utils.h (ipa_reduced_postorder): Remove param.

gcc/testsuite/ChangeLog:

2019-02-11  Martin Liska  

	PR ipa/89009
	* g++.dg/ipa/pr89009.C: New test.
---
 gcc/ipa-cp.c | 2 +-
 gcc/ipa-inline.c | 2 +-
 gcc/ipa-pure-const.c | 4 ++--
 gcc/ipa-reference.c  | 2 +-
 gcc/ipa-utils.c  | 9 +++--
 gcc/ipa-utils.h  | 2 +-
 6 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 3902d3a8a00..b42cfb0f6e0 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -810,7 +810,7 @@ build_toporder_info (struct ipa_topo_info *topo)
   topo->stack = XCNEWVEC (struct cgraph_node *, symtab->cgraph_count);
 
   gcc_checking_assert (topo->stack_top == 0);
-  topo->nnodes = ipa_reduced_postorder (topo->order, true, true, NULL);
+  topo->nnodes = ipa_reduced_postorder (topo->order, true, NULL);
 }
 
 /* Free information about strongly connected components and the arrays in
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 0c25635f4c8..b520c6393f4 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -1745,7 +1745,7 @@ inline_small_functions (void)
  metrics.  */
 
   max_count = 0;
-  ipa_reduced_postorder (order, true, true, NULL);
+  ipa_reduced_postorder (order, true, NULL);
   free (order);
 
   FOR_EACH_DEFINED_FUNCTION (node)
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index e457166ea39..5a11919dd5c 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -1233,7 +1233,7 @@ propagate_pure_const (void)
   bool remove_p = false;
   bool has_cdtor;
 
-  order_pos = ipa_reduced_postorder (order, true, false,
+  order_pos = ipa_reduced_postorder (order, true,
  ignore_edge_for_pure_const);
   if (dump_file)
 {
@@ -1566,7 +1566,7 @@ propagate_nothrow (void)
   int i;
   struct ipa_dfs_info * w_info;
 
-  order_pos = ipa_reduced_postorder (order, true, false,
+  order_pos = ipa_reduced_postorder (order, true,
  ignore_edge_for_nothrow);
   if (dump_file)
 {
diff --git a/gcc/ipa-reference.c b/gcc/ipa-reference.c
index f47d0cc51e1..ccbfa078deb 100644
--- a/gcc/ipa-reference.c
+++ b/gcc/ipa-reference.c
@@ -730,7 +730,7 @@ propagate (void)
  the global information.  All the nodes within a cycle will have
  the same info so we collapse cycles first.  Then we can do the
  propagation in one pass from the leaves to the roots.  */
-  order_pos = ipa_reduced_postorder (order, true, true, ignore_edge_p);
+  order_pos = ipa_reduced_postorder (order, true, ignore_edge_p);
   if (dump_file)
 ipa_print_order (dump_file, "reduced", order, order_pos);
 
diff --git a/gcc/ipa-utils.c b/gcc/ipa-utils.c
index f7dd29f925c..3fb02150904 100644
--- a/gcc/ipa-utils.c
+++ b/gcc/ipa-utils.c
@@ -63,7 +63,6 @@ struct searchc_env {
   int order_pos;
   splay_tree nodes_marked_new;
   bool reduce;
-  bool allow_overwritable;
   int count;
 };
 
@@ -105,7 +104,7 @@ searchc (struct searchc_env* env, struct cgraph_node *v,
 
   if (w->aux
 	  && (avail > AVAIL_INTERPOSABLE
-	  || (env->allow_overwritable && avail == AVAIL_INTERPOSABLE)))
+	  || avail == AVAIL_INTERPOSABLE))
 	{
 	  w_info = (struct ipa_dfs_info *) w->aux;
 	  if (w_info->new_node)
@@ -162,7 +161,7 @@ searchc (struct searchc_env* env, struct cgraph_node *v,
 
 int
 ipa_reduced_postorder (struct cgraph_node **order,
-		   bool reduce, bool allow_overwritable,
+		   bool reduce,
 		   bool (*ignore_edge) (struct cgraph_edge *))
 {
   struct cgraph_node *node;
@@ -175,15 +174,13 @@ ipa_reduced_postorder (struct cgraph_node **order,
   env.nodes_marked_new = splay_tree_new (splay_tree_compare_ints, 0, 0);
   env.count = 1;
   env.reduce = reduce;
-  env.allow_overwritable = allow_overwritable;
 
   FOR_EACH_DEFINED_FUNCTION (node)
 {
   enum availability avail = node->get_availability ();
 
   if (avail > AVAIL_INTERPOSABLE
-	  || (allow_overwritable
-	  && (avail == AVAIL_INTERPOSABLE)))
+	  || avail == AVAIL_INTERPOSABLE)
 	{
 	  /* Reuse the info if it is already there.  */
 	  struct ipa_dfs_info *info = (struct ipa_dfs_info *) node->aux;
diff --git a/gcc/i

RE: [Committed][PATCH][GCC][Arm] Fix test directive

2019-02-15 Thread Tamar Christina
Hi Christophe,

> 
> On Thu, 14 Feb 2019 at 19:27, Tamar Christina 
> wrote:
> >
> > Hi All,
> >
> > This patch fixes a failing testcase due to a use of dg-options instead
> > of dg-additional-options.
> >
> Makes sense.
> It doesn't fail in any of the configurations I test though, in what case do you
> see it failing?
> 

It's failing on a system with a very old dejagnu version because dg-options 
isn't working as documented there and the options were being overridden.

Regards,
Tamar

> > Committed under the GCC obvious
> >
> > Bootstrapped Regtested on arm-none-eabi and no issues.
> >
> > Ok for trunk?
> >
> > Thanks,
> > Tamar
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2019-02-14  Tamar Christina  
> >
> > * gcc.target/arm/pr88850.c: change options to additional option.
> >
> > --


Re: Fortran vector math header

2019-02-15 Thread Martin Liška
On 2/14/19 10:13 PM, Steve Ellcey wrote:
> On Wed, 2019-02-13 at 12:34 +0100, Martin Liška wrote:
>> May I please ping this so that we can reach mainline soon?
>>
>> Thanks,
>> Martin
> 
> Martin, I can't approve this patch but I can say that I have used it on
> Aarch64 and created a follow up patch for aarch64 to create a
> get_multilib_abi_name target function for that platform.  Everything
> seemed to work fine for me and I did not have any problems or see any
> regressions when using your patch.

Great, can you please send the patch to this email thread? 

> I hope it gets approved and checked
> in soon.

Me too :)

Martin

> 
> Steve Ellcey
> sell...@marvell.com
> 



Re: [PATCH, GCC] PR target/86487: fix the way 'uses_hard_regs_p' handles paradoxical subregs

2019-02-15 Thread Andre Vieira (lists)

Hi Vlad,

On 13/02/2019 16:46, Vladimir Makarov wrote:


On 2019-02-13 5:54 a.m., Andre Vieira (lists) wrote:

PING.

Since Jeff is away can another maintainer have a look at this please?



I see the following patch


Yeah, I uploaded the wrong patch... sorry. See attached, including a 
testcase, which currently only fails on GCC 8 and earlier, though.


diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index c061093ed699620afe2dfda60d58066d6967523a..736b084acc552b75ff4d369b6584bc9ab422e21b 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -1761,11 +1761,21 @@ uses_hard_regs_p (rtx x, HARD_REG_SET set)
  return false;
    code = GET_CODE (x);
    mode = GET_MODE (x);
+
    if (code == SUBREG)
  {
+  /* For all SUBREGs we want to check whether the full multi-register
+ overlaps the set.  For normal SUBREGs this means 'get_hard_regno' of
+ the inner register, for paradoxical SUBREGs this means the
+ 'get_hard_regno' of the full SUBREG and for complete SUBREGs either is
+ fine.  Use the wider mode for all cases.  */
+  rtx subreg = SUBREG_REG (x);
    mode = wider_subreg_mode (x);
-  x = SUBREG_REG (x);
-  code = GET_CODE (x);
+  if (mode == GET_MODE (subreg))
+    {
+  x = subreg;
+  code = GET_CODE (x);
+    }
  }

    if (REG_P (x))

In your case, x will be SUBREG and be processed recursively only as a 
register of subreg.


I think you need to change the last line to

   if (REG_P (x) || code == SUBREG)

then the subreg will be processed by get_hard_regno as a subreg.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 0ef13439b5055dd2c5d5049d7f62f6b3b1ddfe2a..3af41b6eed2dc113c3158e1bde1c65a896d8feb5 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -1761,14 +1761,24 @@ uses_hard_regs_p (rtx x, HARD_REG_SET set)
 return false;
   code = GET_CODE (x);
   mode = GET_MODE (x);
+
   if (code == SUBREG)
 {
+  /* For all SUBREGs we want to check whether the full multi-register
+	 overlaps the set.  For normal SUBREGs this means 'get_hard_regno' of
+	 the inner register, for paradoxical SUBREGs this means the
+	 'get_hard_regno' of the full SUBREG and for complete SUBREGs either is
+	 fine.  Use the wider mode for all cases.  */
+  rtx subreg = SUBREG_REG (x);
   mode = wider_subreg_mode (x);
-  x = SUBREG_REG (x);
-  code = GET_CODE (x);
+  if (mode == GET_MODE (subreg))
+	{
+	  x = subreg;
+	  code = GET_CODE (x);
+	}
 }
 
-  if (REG_P (x))
+  if (REG_P (x) || SUBREG_P (x))
 {
   x_hard_regno = get_hard_regno (x, true);
   return (x_hard_regno >= 0
diff --git a/gcc/testsuite/gcc.target/arm/pr86487.c b/gcc/testsuite/gcc.target/arm/pr86487.c
new file mode 100644
index ..1c1db7852d91a82a1d2b6eaa4f3d4c6dbef107f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr86487.c
@@ -0,0 +1,10 @@
+/* { dg-skip-if "" { *-*-* } { "-march=armv[0-6]*" "-mthumb" } { "" } } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-options "-O1 -mbig-endian" } */
+/* { dg-add-options arm_neon } */
+int a, b, c, d;
+long long fn1(long long p2) { return p2 == 0 ? -1 : -1 % p2; }
+void fn2(long long p1, short p2, long p3) {
+  b = fn1((d || 6) & a);
+  c = b | p3;
+}


[PATCH] Come up with fast {function,call}_summary classes (PR ipa/89306).

2019-02-15 Thread Martin Liška
Hi.

The patch comes up with new summaries that use a vector as the underlying
data structure. In order to make the code more readable I decided to
factor out some common code into base classes.

The patch bootstraps on x86_64-linux-gnu and survives regression tests.
I tested building Inkscape w/ LTO and Honza did the same testing
for Firefox libxul.so.

Ready to be installed?
Thanks,
Martin
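
To make the idea of a vector-backed summary concrete, here is a minimal sketch in plain C of summaries stored in a growable array indexed by a per-node summary id. All names here (struct node, struct summary_table, summary_get_create) are hypothetical and are not the classes from the patch; recycling of ids from freed nodes is left out for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a call graph node; only the id matters.  */
struct node
{
  int summary_id;   /* -1 until the node first needs a summary.  */
};

/* Summary storage: a growable array indexed directly by summary id.  */
struct summary_table
{
  int *data;        /* One int of summary payload per id, for brevity.  */
  size_t length;    /* Number of valid, zero-initialized slots.  */
  int next_id;      /* Next id to hand out.  */
};

/* Return the slot for N, assigning an id and growing the array on
   demand.  Returns NULL if memory allocation fails.  */
static int *
summary_get_create (struct summary_table *t, struct node *n)
{
  if (n->summary_id == -1)
    n->summary_id = t->next_id++;

  size_t id = (size_t) n->summary_id;
  if (id >= t->length)
    {
      size_t new_length = t->length ? t->length * 2 : 4;
      while (new_length <= id)
        new_length *= 2;
      int *grown = realloc (t->data, new_length * sizeof *grown);
      if (grown == NULL)
        return NULL;
      /* Zero the new tail so fresh slots start out empty.  */
      memset (grown + t->length, 0,
              (new_length - t->length) * sizeof *grown);
      t->data = grown;
      t->length = new_length;
    }
  return &t->data[id];
}
```

The point of the design is that a lookup becomes a single array index instead of a hash table probe, at the cost of keeping the ids dense.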

gcc/ChangeLog:

2019-02-13  Martin Liska  

PR ipa/89306
* cgraph.c (symbol_table::create_edge): Set m_summary_id to -1
by default.
(symbol_table::free_edge): Recycle m_summary_id.
* cgraph.h (get_summary_id): New.
(symbol_table::release_symbol): Set m_summary_id to -1
by default.
(symbol_table::allocate_cgraph_symbol): Recycle m_summary_id.
* ipa-fnsummary.c (ipa_fn_summary_t): Switch from
function_summary to fast_function_summary.
* ipa-fnsummary.h (ipa_fn_summary_t): Likewise.
* ipa-pure-const.c (class funct_state_summary_t):
Switch from function_summary to fast_function_summary.
* ipa-reference.c (class ipa_ref_var_info_summary_t): Likewise.
(class ipa_ref_opt_summary_t): Switch from function_summary
to fast_function_summary.
* symbol-summary.h (class function_summary_base): New class
that is created from base of former function_summary.
(function_summary_base::unregister_hooks): New.
(class function_summary): Inherit from function_summary_base.
(class call_summary_base): New class
that is created from base of former call_summary.
(class call_summary): Inherit from call_summary_base.
(struct is_same): New.
(class fast_function_summary): New summary class.
(class fast_call_summary): New summary class.
* vec.h (vec_safe_grow_cleared): New function.
---
 gcc/cgraph.c |   7 +-
 gcc/cgraph.h |  44 ++-
 gcc/ipa-fnsummary.c  |   6 +-
 gcc/ipa-fnsummary.h  |  20 +-
 gcc/ipa-pure-const.c |   5 +-
 gcc/ipa-reference.c  |  13 +-
 gcc/symbol-summary.h | 840 +--
 gcc/vec.h|  11 +
 8 files changed, 735 insertions(+), 211 deletions(-)


diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index c9788d0286a..de82316d4b1 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -852,7 +852,10 @@ symbol_table::create_edge (cgraph_node *caller, cgraph_node *callee,
   free_edges = NEXT_FREE_EDGE (edge);
 }
   else
-edge = ggc_alloc ();
+{
+  edge = ggc_alloc ();
+  edge->m_summary_id = -1;
+}
 
   edges_count++;
 
@@ -1014,7 +1017,9 @@ symbol_table::free_edge (cgraph_edge *e)
 ggc_free (e->indirect_info);
 
   /* Clear out the edge so we do not dangle pointers.  */
+  int summary_id = e->m_summary_id;
   memset (e, 0, sizeof (*e));
+  e->m_summary_id = summary_id;
   NEXT_FREE_EDGE (e) = free_edges;
   free_edges = e;
   edges_count--;
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 2f6daa75a24..03ae411acde 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1302,6 +1302,12 @@ public:
 return m_uid;
   }
 
+  /* Get summary id of the node.  */
+  inline int get_summary_id ()
+  {
+return m_summary_id;
+  }
+
   /* Record that DECL1 and DECL2 are semantically identical function
  versions.  */
   static void record_function_versions (tree decl1, tree decl2);
@@ -1470,6 +1476,9 @@ private:
   /* Unique id of the node.  */
   int m_uid;
 
+  /* Summary id that is recycled.  */
+  int m_summary_id;
+
   /* Worker for call_for_symbol_and_aliases.  */
   bool call_for_symbol_and_aliases_1 (bool (*callback) (cgraph_node *,
 		void *),
@@ -1728,6 +1737,12 @@ struct GTY((chain_next ("%h.next_caller"), chain_prev ("%h.prev_caller"),
 return m_uid;
   }
 
+  /* Get summary id of the edge.  */
+  inline int get_summary_id ()
+  {
+return m_summary_id;
+  }
+
   /* Rebuild cgraph edges for current function node.  This needs to be run after
  passes that don't update the cgraph.  */
   static unsigned int rebuild_edges (void);
@@ -1805,6 +1820,9 @@ private:
   /* Unique id of the edge.  */
   int m_uid;
 
+  /* Summary id that is recycled.  */
+  int m_summary_id;
+
   /* Remove the edge from the list of the callers of the callee.  */
   void remove_caller (void);
 
@@ -2051,7 +2069,8 @@ public:
   friend class cgraph_node;
   friend class cgraph_edge;
 
-  symbol_table (): cgraph_max_uid (1), edges_max_uid (1)
+  symbol_table (): cgraph_max_uid (1), cgraph_max_summary_id (0),
+  edges_max_uid (1), edges_max_summary_id (0)
   {
   }
 
@@ -2254,15 +2273,31 @@ public:
   /* Dump symbol table to stderr.  */
   void DEBUG_FUNCTION debug (void);
 
+  /* Allocate new callgraph node.  */
+  inline int assign_summary_id (cgraph_node *node)
+  {
+node->m_summary_id = cgraph_max_summary_id++;
+return node->m_summary_id;
+  }
+
+  /* Allocate new callgraph node.  */
+  inline int assign_summary_id (cgraph_edge *edge)
+  {
+edge->m_

Re: [omp] Move NE_EXPR handling to omp_adjust_for_condition

2019-02-15 Thread Martin Jambor
Ping please, the issue is now PR 89302.

Thanks,

Martin

On Fri, Feb 01 2019, Martin Jambor wrote:
> Hi,
>
> even after the two previous HSA fixes, there is still one remaining
> libgomp failure in the testsuite when run on an HSA-enabled APU.  The
> problem is that grid calculation does not work with NE_EXPR conditions
> in omp loop constructs which is now permitted in OpenMP 5.
>
> The patch below fixes it by simply moving the code that deals with it
> into the function shared between omp expansion and gridification, and a
> place which also feels more natural, to omp_adjust_for_condition.  For
> some reason, this function is also called twice in omp_extract_for_data
> but the second call cannot have any effect, so I removed one.
>
> I have tested this on an HSA APU system with hsa offloading enabled and
> also bootstrapped and tested on a bigger x86_64-linux system.  OK for
> trunk?
>
> Thanks,
>
> Martin
>
>
> 2019-02-01  Martin Jambor  
>
>   * omp-general.c (omp_extract_for_data): Removed a duplicate call
>   to omp_adjust_for_condition, moved NE_EXPR code_cond processing...
>   (omp_adjust_for_condition): ...here.  Added necessary parameters.
>   * omp-general.h (omp_adjust_for_condition): Updated declaration.
>   * omp-grid.c (grid_attempt_target_gridification): Adjust to pass
>   proper values to new parameters of omp_adjust_for_condition.
> ---
>  gcc/omp-general.c | 67 ---
>  gcc/omp-general.h |  2 +-
>  gcc/omp-grid.c|  9 ---
>  3 files changed, 40 insertions(+), 38 deletions(-)
>
> diff --git a/gcc/omp-general.c b/gcc/omp-general.c
> index 12210c556fc..0f66ba0c5d8 100644
> --- a/gcc/omp-general.c
> +++ b/gcc/omp-general.c
> @@ -56,18 +56,47 @@ omp_is_reference (tree decl)
>return lang_hooks.decls.omp_privatize_by_reference (decl);
>  }
>  
> -/* Adjust *COND_CODE and *N2 so that the former is either LT_EXPR or
> -   GT_EXPR.  */
> +/* Adjust *COND_CODE and *N2 so that the former is either LT_EXPR or GT_EXPR,
> +   given that V is the loop index variable and STEP is loop step. */
>  
>  void
> -omp_adjust_for_condition (location_t loc, enum tree_code *cond_code, tree *n2)
> +omp_adjust_for_condition (location_t loc, enum tree_code *cond_code, tree *n2,
> +   tree v, tree step)
>  {
>switch (*cond_code)
>  {
>  case LT_EXPR:
>  case GT_EXPR:
> +  break;
> +
>  case NE_EXPR:
> +  gcc_assert (TREE_CODE (step) == INTEGER_CST);
> +  if (TREE_CODE (TREE_TYPE (v)) == INTEGER_TYPE)
> + {
> +   if (integer_onep (step))
> + *cond_code = LT_EXPR;
> +   else
> + {
> +   gcc_assert (integer_minus_onep (step));
> +   *cond_code = GT_EXPR;
> + }
> + }
> +  else
> + {
> +   tree unit = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (v)));
> +   gcc_assert (TREE_CODE (unit) == INTEGER_CST);
> +   if (tree_int_cst_equal (unit, step))
> + *cond_code = LT_EXPR;
> +   else
> + {
> +   gcc_assert (wi::neg (wi::to_widest (unit))
> +   == wi::to_widest (step));
> +   *cond_code = GT_EXPR;
> + }
> + }
> +
>break;
> +
>  case LE_EXPR:
>if (POINTER_TYPE_P (TREE_TYPE (*n2)))
>   *n2 = fold_build_pointer_plus_hwi_loc (loc, *n2, 1);
> @@ -258,41 +287,13 @@ omp_extract_for_data (gomp_for *for_stmt, struct omp_for_data *fd,
>gcc_assert (loop->cond_code != NE_EXPR
> || (gimple_omp_for_kind (for_stmt)
> != GF_OMP_FOR_KIND_OACC_LOOP));
> -  omp_adjust_for_condition (loc, &loop->cond_code, &loop->n2);
>  
>t = gimple_omp_for_incr (for_stmt, i);
>gcc_assert (TREE_OPERAND (t, 0) == var);
>loop->step = omp_get_for_step_from_incr (loc, t);
>  
> -  if (loop->cond_code == NE_EXPR)
> - {
> -   gcc_assert (TREE_CODE (loop->step) == INTEGER_CST);
> -   if (TREE_CODE (TREE_TYPE (loop->v)) == INTEGER_TYPE)
> - {
> -   if (integer_onep (loop->step))
> - loop->cond_code = LT_EXPR;
> -   else
> - {
> -   gcc_assert (integer_minus_onep (loop->step));
> -   loop->cond_code = GT_EXPR;
> - }
> - }
> -   else
> - {
> -   tree unit = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (loop->v)));
> -   gcc_assert (TREE_CODE (unit) == INTEGER_CST);
> -   if (tree_int_cst_equal (unit, loop->step))
> - loop->cond_code = LT_EXPR;
> -   else
> - {
> -   gcc_assert (wi::neg (wi::to_widest (unit))
> -   == wi::to_widest (loop->step));
> -   loop->cond_code = GT_EXPR;
> - }
> - }
> - }
> -
> -  omp_adjust_for_condition (loc, &loop->cond_code, &loop->n2);
> +  omp_adjust_for_condition (loc, &loop->cond_code, &loop->n2, loop->v,
> + 
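
A minimal sketch in C of the NE_EXPR handling that the quoted patch moves
into omp_adjust_for_condition.  It is not GCC code: the enum, the function
name canonicalize_ne and the plain long arguments are hypothetical stand-ins
for tree nodes, with unit being 1 for an integer index or the pointed-to
element size for a pointer index.

```c
#include <assert.h>

/* Hypothetical stand-ins for LT_EXPR, GT_EXPR and NE_EXPR.  */
enum cond_code { COND_LT, COND_GT, COND_NE };

/* Rewrite a "!=" loop condition into "<" or ">" based on the step.
   UNIT is 1 for integer indices, or the element size for pointer
   indices; STEP must be exactly +UNIT or -UNIT, mirroring the
   gcc_assert calls in the quoted patch.  */
static enum cond_code
canonicalize_ne (enum cond_code code, long step, long unit)
{
  if (code != COND_NE)
    return code;            /* LT and GT are already canonical.  */
  if (step == unit)
    return COND_LT;         /* Counting up: i != n behaves like i < n.  */
  assert (step == -unit);
  return COND_GT;           /* Counting down: i != n behaves like i > n.  */
}
```

For example, an int index with step 1 maps to COND_LT, and a pointer to
4-byte elements with step -4 maps to COND_GT.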

Re: [PATCH][AArch64] Use implementation namespace consistently in arm_neon.h

2019-02-15 Thread Kyrill Tkachov

Ping.

https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00345.html

Thanks,

Kyrill

On 2/6/19 1:52 PM, Kyrill Tkachov wrote:

[resending with patch compressed]

Hi all,

We're somewhat inconsistent in arm_neon.h when it comes to using the
implementation namespace for local identifiers. This means things like:
#define hash_abcd 0
#define hash_e 1
#define wk 2

#include "arm_neon.h"

uint32x4_t
foo (uint32x4_t a, uint32_t b, uint32x4_t c)
{
   return vsha1cq_u32 (a, b, c);
}

don't compile.
This patch fixes these issues throughout the whole of arm_neon.h.
Bootstrapped and tested on aarch64-none-linux-gnu.
The advsimd-intrinsics.exp tests pass just fine.
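
To illustrate why the __ prefix matters, here is a small sketch in C.  It
does not use arm_neon.h; demo_mix is a hypothetical stand-in for a
header-defined wrapper.  Had its parameters been spelled hash_abcd and wk,
the user macros would expand inside the definition and break it, whereas
reserved __-prefixed names are off limits to user macros.

```c
#include <assert.h>
#include <stdint.h>

/* User code may legitimately define these before including a header;
   they are ordinary, non-reserved identifiers.  */
#define hash_abcd 0
#define wk 2

/* Hypothetical stand-in for a wrapper defined in a system header (not
   the real vsha1cq_u32).  The parameters live in the implementation
   namespace, so the macros above cannot rewrite them.  */
static inline uint32_t
demo_mix (uint32_t __hash_abcd, uint32_t __wk)
{
  return __hash_abcd ^ __wk;
}

/* User code calling the wrapper while the macros are still in effect.  */
static inline uint32_t
demo_user (void)
{
  return demo_mix (hash_abcd, wk);   /* Expands to demo_mix (0, 2).  */
}

/* Small self-check of the sketch.  */
static void
demo_selftest (void)
{
  assert (demo_user () == 2);
}
```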

Don't feel sorry for me having to write the ChangeLog. 
./contrib/mklog.pl automated the whole thing.


Ok for trunk?
Thanks,
Kyrill

2019-02-06  Kyrylo Tkachov  

 * config/aarch64/arm_neon.h (vaba_s8): Use __ in identifiers
 consistently.
 (vaba_s16): Likewise.
 (vaba_s32): Likewise.
 (vaba_u8): Likewise.
 (vaba_u16): Likewise.
 (vaba_u32): Likewise.
 (vabal_high_s8): Likewise.
 (vabal_high_s16): Likewise.
 (vabal_high_s32): Likewise.
 (vabal_high_u8): Likewise.
 (vabal_high_u16): Likewise.
 (vabal_high_u32): Likewise.
 (vabal_s8): Likewise.
 (vabal_s16): Likewise.
 (vabal_s32): Likewise.
 (vabal_u8): Likewise.
 (vabal_u16): Likewise.
 (vabal_u32): Likewise.
 (vabaq_s8): Likewise.
 (vabaq_s16): Likewise.
 (vabaq_s32): Likewise.
 (vabaq_u8): Likewise.
 (vabaq_u16): Likewise.
 (vabaq_u32): Likewise.
 (vabd_s8): Likewise.
 (vabd_s16): Likewise.
 (vabd_s32): Likewise.
 (vabd_u8): Likewise.
 (vabd_u16): Likewise.
 (vabd_u32): Likewise.
 (vabdl_high_s8): Likewise.
 (vabdl_high_s16): Likewise.
 (vabdl_high_s32): Likewise.
 (vabdl_high_u8): Likewise.
 (vabdl_high_u16): Likewise.
 (vabdl_high_u32): Likewise.
 (vabdl_s8): Likewise.
 (vabdl_s16): Likewise.
 (vabdl_s32): Likewise.
 (vabdl_u8): Likewise.
 (vabdl_u16): Likewise.
 (vabdl_u32): Likewise.
 (vabdq_s8): Likewise.
 (vabdq_s16): Likewise.
 (vabdq_s32): Likewise.
 (vabdq_u8): Likewise.
 (vabdq_u16): Likewise.
 (vabdq_u32): Likewise.
 (vaddlv_s8): Likewise.
 (vaddlv_s16): Likewise.
 (vaddlv_u8): Likewise.
 (vaddlv_u16): Likewise.
 (vaddlvq_s8): Likewise.
 (vaddlvq_s16): Likewise.
 (vaddlvq_s32): Likewise.
 (vaddlvq_u8): Likewise.
 (vaddlvq_u16): Likewise.
 (vaddlvq_u32): Likewise.
 (vcvtx_f32_f64): Likewise.
 (vcvtx_high_f32_f64): Likewise.
 (vcvtxd_f32_f64): Likewise.
 (vmla_n_f32): Likewise.
 (vmla_n_s16): Likewise.
 (vmla_n_s32): Likewise.
 (vmla_n_u16): Likewise.
 (vmla_n_u32): Likewise.
 (vmla_s8): Likewise.
 (vmla_s16): Likewise.
 (vmla_s32): Likewise.
 (vmla_u8): Likewise.
 (vmla_u16): Likewise.
 (vmla_u32): Likewise.
 (vmlal_high_n_s16): Likewise.
 (vmlal_high_n_s32): Likewise.
 (vmlal_high_n_u16): Likewise.
 (vmlal_high_n_u32): Likewise.
 (vmlal_high_s8): Likewise.
 (vmlal_high_s16): Likewise.
 (vmlal_high_s32): Likewise.
 (vmlal_high_u8): Likewise.
 (vmlal_high_u16): Likewise.
 (vmlal_high_u32): Likewise.
 (vmlal_n_s16): Likewise.
 (vmlal_n_s32): Likewise.
 (vmlal_n_u16): Likewise.
 (vmlal_n_u32): Likewise.
 (vmlal_s8): Likewise.
 (vmlal_s16): Likewise.
 (vmlal_s32): Likewise.
 (vmlal_u8): Likewise.
 (vmlal_u16): Likewise.
 (vmlal_u32): Likewise.
 (vmlaq_n_f32): Likewise.
 (vmlaq_n_s16): Likewise.
 (vmlaq_n_s32): Likewise.
 (vmlaq_n_u16): Likewise.
 (vmlaq_n_u32): Likewise.
 (vmlaq_s8): Likewise.
 (vmlaq_s16): Likewise.
 (vmlaq_s32): Likewise.
 (vmlaq_u8): Likewise.
 (vmlaq_u16): Likewise.
 (vmlaq_u32): Likewise.
 (vmls_n_f32): Likewise.
 (vmls_n_s16): Likewise.
 (vmls_n_s32): Likewise.
 (vmls_n_u16): Likewise.
 (vmls_n_u32): Likewise.
 (vmls_s8): Likewise.
 (vmls_s16): Likewise.
 (vmls_s32): Likewise.
 (vmls_u8): Likewise.
 (vmls_u16): Likewise.
 (vmls_u32): Likewise.
 (vmlsl_high_n_s16): Likewise.
 (vmlsl_high_n_s32): Likewise.
 (vmlsl_high_n_u16): Likewise.
 (vmlsl_high_n_u32): Likewise.
 (vmlsl_high_s8): Likewise.
 (vmlsl_high_s16): Likewise.
 (vmlsl_high_s32): Likewise.
 (vmlsl_high_u8): Likewise.
 (vmlsl_high_u16): Likewise.
 (vmlsl_high_u32): Likewise.
 (vmlsl_n_s16): Likewise.
 (vmlsl_n_s32): Likewise.
 (vmlsl_n_u16): Likewise.
 (vmlsl_n_u32): Likewise.
 (vmlsl_s8): Likewise.
 (vmlsl_s16): Likewise.
 (vmlsl_s32): Likewise.
 (vmlsl_u8): Likewise.
 (vmlsl_u16): Likewise.
 (vmlsl_u32): Likewise.
 (vmlsq_n_f32): Likewise.
 (vmlsq_n_s16): Likewise.
 (vmlsq_n_s32): Likewise.
 (vmlsq_n_u16): Likewise.
 (vmlsq_n_u32): Likewise.
 (vmlsq_s8): Likewise.

Re: Go patch committed: Harmonize types referenced by both C and Go

2019-02-15 Thread Rainer Orth
Andreas Schwab  writes:

> This breaks non-split-stack builds.
>
> ../../../libgo/runtime/stack.c: In function 'doscanstack1':
> ../../../libgo/runtime/stack.c:113:18: error: passing argument 1 of
> 'scanstackblock' makes integer from pointer without a cast
> [-Werror=int-conversion]
>   113 |   scanstackblock(bottom, (uintptr)(top - bottom), gcw);
>   |  ^~
>   |  |
>   |  byte * {aka unsigned char *}

I see the same on Solaris.  Even with that fixed by appropriate casts to
uintptr (plus a few more times), Solaris bootstrap is still broken by
that patch:

/vol/gcc/src/hg/trunk/local/libgo/runtime/go-varargs.c: In function 
'__go_syscall6':
/vol/gcc/src/hg/trunk/local/libgo/runtime/go-varargs.c:101:10: error: implicit 
declaration of function 'syscall' [-Werror=implicit-function-declaration]
  101 |   return syscall (flag, a1, a2, a3, a4, a5, a6);
  |  ^~~

This needs to include  for the syscall declaration, apart
from the fundamental problem that syscall isn't a stable interface on
Solaris.
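
For reference, the kind of cast meant for the first error looks roughly like
the sketch below.  The names are hypothetical (this is not the libgo source),
and it assumes only what the diagnostic implies: the callee takes integer
address and length arguments, while the caller holds byte pointers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callee taking an address and a length as integers,
   standing in for a function with uintptr parameters.  */
static uintptr_t
block_end (uintptr_t start, uintptr_t len)
{
  return start + len;
}

/* The caller holds pointers, so both arguments need explicit
   conversion; passing BOTTOM directly is what triggers
   -Werror=int-conversion.  */
static uintptr_t
scan_range (unsigned char *bottom, unsigned char *top)
{
  assert (top >= bottom);
  return block_end ((uintptr_t) bottom, (uintptr_t) (top - bottom));
}
```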

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH 37/40] i386: Allow MMX intrinsic emulation with SSE

2019-02-15 Thread Uros Bizjak
On Thu, Feb 14, 2019 at 1:33 PM H.J. Lu  wrote:
>
> Allow MMX intrinsic emulation with SSE/SSE2/SSSE3.  Don't enable MMX ISA
> by default with TARGET_MMX_WITH_SSE.
>
> For pr82483-1.c and pr82483-2.c, "-mssse3 -mno-mmx" compiles in 64-bit
> mode since MMX intrinsics can be emulated with SSE.
>
> gcc/
>
> PR target/89021
> * config/i386/i386-builtin.def: Enable MMX intrinsics with
> SSE/SSE2/SSSE3.
> * config/i386/i386.c (ix86_option_override_internal): Don't
> enable MMX ISA with TARGET_MMX_WITH_SSE by default.
> (ix86_init_mmx_sse_builtins): Enable MMX intrinsics with
> SSE/SSE2/SSSE3.
> (ix86_expand_builtin): Allow SSE/SSE2/SSSE3 to emulate MMX
> intrinsics with TARGET_MMX_WITH_SSE.
> * config/i386/mmintrin.h: Don't require MMX in 64-bit mode.
>
> gcc/testsuite/
>
> PR target/89021
> * gcc.target/i386/pr82483-1.c: Error only on ia32.
> * gcc.target/i386/pr82483-2.c: Likewise.
> ---
>  gcc/config/i386/i386-builtin.def  | 126 +++---
>  gcc/config/i386/i386.c|  46 ++--
>  gcc/config/i386/mmintrin.h|  10 +-
>  gcc/testsuite/gcc.target/i386/pr82483-1.c |   2 +-
>  gcc/testsuite/gcc.target/i386/pr82483-2.c |   2 +-
>  5 files changed, 110 insertions(+), 76 deletions(-)
>
> diff --git a/gcc/config/i386/i386-builtin.def 
> b/gcc/config/i386/i386-builtin.def
> index 88005f4687f..10a9d631f29 100644
> --- a/gcc/config/i386/i386-builtin.def
> +++ b/gcc/config/i386/i386-builtin.def
> @@ -100,7 +100,7 @@ BDESC (0, 0, CODE_FOR_fnstsw, "__builtin_ia32_fnstsw", 
> IX86_BUILTIN_FNSTSW, UNKN
>  BDESC (0, 0, CODE_FOR_fnclex, "__builtin_ia32_fnclex", IX86_BUILTIN_FNCLEX, 
> UNKNOWN, (int) VOID_FTYPE_VOID)
>
>  /* MMX */
> -BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_emms, "__builtin_ia32_emms", 
> IX86_BUILTIN_EMMS, UNKNOWN, (int) VOID_FTYPE_VOID)
> +BDESC (OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_mmx_emms, 
> "__builtin_ia32_emms", IX86_BUILTIN_EMMS, UNKNOWN, (int) VOID_FTYPE_VOID)
>
>  /* 3DNow! */
>  BDESC (OPTION_MASK_ISA_3DNOW, 0, CODE_FOR_mmx_femms, "__builtin_ia32_femms", 
> IX86_BUILTIN_FEMMS, UNKNOWN, (int) VOID_FTYPE_VOID)
> @@ -442,68 +442,68 @@ BDESC (0, 0, CODE_FOR_rotrqi3, "__builtin_ia32_rorqi", 
> IX86_BUILTIN_RORQI, UNKNO
>  BDESC (0, 0, CODE_FOR_rotrhi3, "__builtin_ia32_rorhi", IX86_BUILTIN_RORHI, 
> UNKNOWN, (int) UINT16_FTYPE_UINT16_INT)
>
>  /* MMX */
> -BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_addv8qi3, 
> "__builtin_ia32_paddb", IX86_BUILTIN_PADDB, UNKNOWN, (int) 
> V8QI_FTYPE_V8QI_V8QI)
> -BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_addv4hi3, 
> "__builtin_ia32_paddw", IX86_BUILTIN_PADDW, UNKNOWN, (int) 
> V4HI_FTYPE_V4HI_V4HI)
> -BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_addv2si3, 
> "__builtin_ia32_paddd", IX86_BUILTIN_PADDD, UNKNOWN, (int) 
> V2SI_FTYPE_V2SI_V2SI)
> -BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_subv8qi3, 
> "__builtin_ia32_psubb", IX86_BUILTIN_PSUBB, UNKNOWN, (int) 
> V8QI_FTYPE_V8QI_V8QI)
> -BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_subv4hi3, 
> "__builtin_ia32_psubw", IX86_BUILTIN_PSUBW, UNKNOWN, (int) 
> V4HI_FTYPE_V4HI_V4HI)
> -BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_subv2si3, 
> "__builtin_ia32_psubd", IX86_BUILTIN_PSUBD, UNKNOWN, (int) 
> V2SI_FTYPE_V2SI_V2SI)
> -
> -BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_ssaddv8qi3, 
> "__builtin_ia32_paddsb", IX86_BUILTIN_PADDSB, UNKNOWN, (int) 
> V8QI_FTYPE_V8QI_V8QI)
> -BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_ssaddv4hi3, 
> "__builtin_ia32_paddsw", IX86_BUILTIN_PADDSW, UNKNOWN, (int) 
> V4HI_FTYPE_V4HI_V4HI)
> -BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_sssubv8qi3, 
> "__builtin_ia32_psubsb", IX86_BUILTIN_PSUBSB, UNKNOWN, (int) 
> V8QI_FTYPE_V8QI_V8QI)
> -BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_sssubv4hi3, 
> "__builtin_ia32_psubsw", IX86_BUILTIN_PSUBSW, UNKNOWN, (int) 
> V4HI_FTYPE_V4HI_V4HI)
> -BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_usaddv8qi3, 
> "__builtin_ia32_paddusb", IX86_BUILTIN_PADDUSB, UNKNOWN, (int) 
> V8QI_FTYPE_V8QI_V8QI)
> -BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_usaddv4hi3, 
> "__builtin_ia32_paddusw", IX86_BUILTIN_PADDUSW, UNKNOWN, (int) 
> V4HI_FTYPE_V4HI_V4HI)
> -BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_ussubv8qi3, 
> "__builtin_ia32_psubusb", IX86_BUILTIN_PSUBUSB, UNKNOWN, (int) 
> V8QI_FTYPE_V8QI_V8QI)
> -BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_ussubv4hi3, 
> "__builtin_ia32_psubusw", IX86_BUILTIN_PSUBUSW, UNKNOWN, (int) 
> V4HI_FTYPE_V4HI_V4HI)
> -
> -BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_mulv4hi3, 
> "__builtin_ia32_pmullw", IX86_BUILTIN_PMULLW, UNKNOWN, (int) 
> V4HI_FTYPE_V4HI_V4HI)
> -BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_smulv4hi3_highpart, 
> "__builtin_ia32_pmulhw", IX86_BUILTIN_PMULHW, UNKNOWN, (int) 
> V4HI_FTYPE_V4HI_V4HI)
> -
> -BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_andv2si3, "__builtin_ia32_pand", 
> IX86_BUILTIN_PAND, UNKNOWN, (int) V2SI_FTYPE_V2SI_V2SI)
> -BDESC (OPTI

Re: [PATCH][DOC] Document new features for GCC 9.

2019-02-15 Thread Martin Liška
On 2/14/19 10:19 PM, Martin Sebor wrote:
> On 2/13/19 6:48 AM, Martin Liška wrote:
>> Hi.
>>
>> I'm sending a patch where I document changes I made during GCC 9
>> development. I would appreciate both language and factual comments
>> about the patch.
> 
> Nothing technical, just a few very minor language nits/suggestions.
> 
> Martin
> 
> diff --git a/htdocs/gcc-9/changes.html b/htdocs/gcc-9/changes.html
> index 13243c2..9fec9e2 100644
> --- a/htdocs/gcc-9/changes.html
> +++ b/htdocs/gcc-9/changes.html
> @@ -50,11 +50,64 @@ a work-in-progress.
>  General Improvements
>  
>    
> -    A new option -flive-patching=[inline-only-static|inline-clone] is
> +    A new option 
> -flive-patching=[inline-only-static|inline-clone] is
> 
> s/is/has been/ would be better (and either a comma after option or
> a definite article without the comma).
> 
>  introduced to provide a safe compilation for live-patching. At the same
>  time, provides multiple-level control on the enabled IPA optimizations.
>  See the user guide for further information about the option for more
> -    details.
> +    details.
> 
> It seems we should choose between "for further information" and "for
> more details" but we don't need both.
> 
> +  
> +  
> +  A new option --completion is added to provide more fine
> +  option completion in a shell.  It is intended for Bash-completion 
> project.
> 
> Missing article: for "a Bash-completion project" (or perhaps "to be
> used by Bash completion." not sure exactly what project it refers to).
> 
> +  
> +  
> +  Alignment-related options -falign-functions,
> 
> Since you're naming them use a definite article: "The alignment-related
> options..."
> 
> +  -falign-labels, -falign-loops
> +  and -falign-jumps received support for a secondary
> +  alignment (e.g. -falign-loops=n:m:n2:m2).
> +  
> +  
> +  A new built-in __builtin_expect_with_probability has been 
> added.
> 
> I'm really nit-picking now but again, since you are referring to
> a specific option a definite article would be more appropriate.
> Alternatively: "A new built-in function,
> __builtin_expect_with_probability, has been added."
> 
> +  
> +  
> +  Switch expansion has been improved by using a different strategy
> +  (jump table, bit test, decision tree) for a subset of switch cases.
> +  
> +  
> +  A linear function expression defined as switch statement with cases
> 
> Maybe a missing article?  "defined as a switch statement with cases"
> (if that's what you meant.)
> 
> +  can be transformed by -ftree-switch-conversion.  For 
> example:
> +    
> +int
> +foo (int how)
> +{
> +  switch (how) {
> +    case 2: how = 205; break;
> +    case 3: how = 305; break;
> +    case 4: how = 405; break;
> +    case 5: how = 505; break;
> +    case 6: how = 605; break;
> +  }
> +  return how;
> +}
> +
> +  can be transformed into 100 * how + 5 (for values defined
> +  in the switch statement).
> +  
> +  
> +  The gcov tool received a new option --use-hotness-colors
> +  (-q) that can provide perf-like coloring of hot functions.
> +  
> +  
> +  The gcov tool has changed intermediate format to a new JSON format.
> 
> Missing article: "has changed an (or "its?") intermediate format..."
> depending on how many intermediate formats it has.
> 
> +  
> +  
> +  New pair of profiling options (-fprofile-filter-files
> +  and -fprofile-exclude-files) has been added.
> +  The options help to filter which source files are instrumented.
> +  
> +  
> +  AddressSanitizer generates more compact red-zones for automatic 
> variables.
> +  That helps to reduce memory footprint of a sanitized binary.
>    
>  
> 
> @@ -137,7 +190,7 @@ a work-in-progress.
>  D
>  
>    Support for the D programming language has been added to GCC,
> -    implementing version 2.076 of the language and run-time library.
> +    implementing version 2.076 of the language and run-time library.
>    
>  
> 
> @@ -294,7 +347,11 @@ a work-in-progress.
> 
>  
> 
> -
> +IA-32/x86-64
> +
> +  Support of Intel MPX (Memory Protection Extensions) has been 
> removed.
> +
> +
> 
>  
> 
> 

Hi.

Thank you, Martin, for the language corrections. I'm sending an updated version.

Martin
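
As a sanity check on the -ftree-switch-conversion entry quoted above, the
sketch below places the switch next to the linear form 100 * how + 5.  The
names foo_switch, foo_linear and check_equivalence are made up for this
illustration; the range guard reflects the "for values defined in the switch
statement" caveat, since other values are returned unchanged.

```c
#include <assert.h>

/* The switch from the quoted changes.html text.  */
static int
foo_switch (int how)
{
  switch (how)
    {
    case 2: how = 205; break;
    case 3: how = 305; break;
    case 4: how = 405; break;
    case 5: how = 505; break;
    case 6: how = 605; break;
    }
  return how;
}

/* Hand-written equivalent: a range check plus a linear expression.
   This shows the idea only; it is not the code GCC generates.  */
static int
foo_linear (int how)
{
  if (how >= 2 && how <= 6)
    return 100 * how + 5;
  return how;
}

/* Compare both forms over a range that covers the cases and values
   outside them.  */
static void
check_equivalence (void)
{
  for (int i = -10; i <= 20; i++)
    assert (foo_switch (i) == foo_linear (i));
}
```
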
diff --git a/htdocs/gcc-9/changes.html b/htdocs/gcc-9/changes.html
index 13243c2..4d30ed4 100644
--- a/htdocs/gcc-9/changes.html
+++ b/htdocs/gcc-9/changes.html
@@ -50,11 +50,64 @@ a work-in-progress.
 General Improvements
 
   
-A new option -flive-patching=[inline-only-static|inline-clone] is 
+A new option, -flive-patching=[inline-only-static|inline-clone], has been
 introduced to provide a safe compilation for live-patching. At the same
 time, provides multiple-level control on the enabled IPA optimizations.
-See the user guide for further information about the option for more
-details. 
+See the user guide for more details about the option.
+  
+  
+  A new option, --completion, has been added

Re: [PATCH] Come up with fast {function,call}_summary classes (PR ipa/89306).

2019-02-15 Thread Martin Liška
Updated version where I fixed one function comment.

Martin
From fb1cf6f220d6af2c1676e58bd36b160fa8d9706b Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 11 Feb 2019 14:58:31 +0100
Subject: [PATCH] Come up with fast {function,call}_summary classes (PR
 ipa/89306).

gcc/ChangeLog:

2019-02-13  Martin Liska  

	PR ipa/89306
	* cgraph.c (symbol_table::create_edge): Set m_summary_id to -1
	by default.
	(symbol_table::free_edge): Recycle m_summary_id.
	* cgraph.h (get_summary_id): New.
	(symbol_table::release_symbol): Set m_summary_id to -1
	by default.
	(symbol_table::allocate_cgraph_symbol): Recycle m_summary_id.
	* ipa-fnsummary.c (ipa_fn_summary_t): Switch from
	function_summary to fast_function_summary.
	* ipa-fnsummary.h (ipa_fn_summary_t): Likewise.
	* ipa-pure-const.c (class funct_state_summary_t):
	Switch from function_summary to fast_function_summary.
	* ipa-reference.c (class ipa_ref_var_info_summary_t): Likewise.
	(class ipa_ref_opt_summary_t): Switch from function_summary
	to fast_function_summary.
	* symbol-summary.h (class function_summary_base): New class
	that is created from base of former function_summary.
	(function_summary_base::unregister_hooks): New.
	(class function_summary): Inherit from function_summary_base.
	(class call_summary_base): New class
	that is created from base of former call_summary.
	(class call_summary): Inherit from call_summary_base.
	(struct is_same): New.
	(class fast_function_summary): New summary class.
	(class fast_call_summary): New summary class.
	* vec.h (vec_safe_grow_cleared): New function.
---
 gcc/cgraph.c |   7 +-
 gcc/cgraph.h |  44 ++-
 gcc/ipa-fnsummary.c  |   6 +-
 gcc/ipa-fnsummary.h  |  20 +-
 gcc/ipa-pure-const.c |   5 +-
 gcc/ipa-reference.c  |  13 +-
 gcc/symbol-summary.h | 840 +--
 gcc/vec.h|  11 +
 8 files changed, 735 insertions(+), 211 deletions(-)

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index c9788d0286a..de82316d4b1 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -852,7 +852,10 @@ symbol_table::create_edge (cgraph_node *caller, cgraph_node *callee,
   free_edges = NEXT_FREE_EDGE (edge);
 }
   else
-edge = ggc_alloc ();
+{
+  edge = ggc_alloc ();
+  edge->m_summary_id = -1;
+}
 
   edges_count++;
 
@@ -1014,7 +1017,9 @@ symbol_table::free_edge (cgraph_edge *e)
 ggc_free (e->indirect_info);
 
   /* Clear out the edge so we do not dangle pointers.  */
+  int summary_id = e->m_summary_id;
   memset (e, 0, sizeof (*e));
+  e->m_summary_id = summary_id;
   NEXT_FREE_EDGE (e) = free_edges;
   free_edges = e;
   edges_count--;
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 2f6daa75a24..c294602d762 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1302,6 +1302,12 @@ public:
 return m_uid;
   }
 
+  /* Get summary id of the node.  */
+  inline int get_summary_id ()
+  {
+return m_summary_id;
+  }
+
   /* Record that DECL1 and DECL2 are semantically identical function
  versions.  */
   static void record_function_versions (tree decl1, tree decl2);
@@ -1470,6 +1476,9 @@ private:
   /* Unique id of the node.  */
   int m_uid;
 
+  /* Summary id that is recycled.  */
+  int m_summary_id;
+
   /* Worker for call_for_symbol_and_aliases.  */
   bool call_for_symbol_and_aliases_1 (bool (*callback) (cgraph_node *,
 		void *),
@@ -1728,6 +1737,12 @@ struct GTY((chain_next ("%h.next_caller"), chain_prev ("%h.prev_caller"),
 return m_uid;
   }
 
+  /* Get summary id of the edge.  */
+  inline int get_summary_id ()
+  {
+return m_summary_id;
+  }
+
   /* Rebuild cgraph edges for current function node.  This needs to be run after
  passes that don't update the cgraph.  */
   static unsigned int rebuild_edges (void);
@@ -1805,6 +1820,9 @@ private:
   /* Unique id of the edge.  */
   int m_uid;
 
+  /* Summary id that is recycled.  */
+  int m_summary_id;
+
   /* Remove the edge from the list of the callers of the callee.  */
   void remove_caller (void);
 
@@ -2051,7 +2069,8 @@ public:
   friend class cgraph_node;
   friend class cgraph_edge;
 
-  symbol_table (): cgraph_max_uid (1), edges_max_uid (1)
+  symbol_table (): cgraph_max_uid (1), cgraph_max_summary_id (0),
+  edges_max_uid (1), edges_max_summary_id (0)
   {
   }
 
@@ -2254,15 +2273,31 @@ public:
   /* Dump symbol table to stderr.  */
   void DEBUG_FUNCTION debug (void);
 
+  /* Assign a new summary ID for the callgraph NODE.  */
+  inline int assign_summary_id (cgraph_node *node)
+  {
+node->m_summary_id = cgraph_max_summary_id++;
+return node->m_summary_id;
+  }
+
+  /* Assign a new summary ID for the callgraph EDGE.  */
+  inline int assign_summary_id (cgraph_edge *edge)
+  {
+edge->m_summary_id = edges_max_summary_id++;
+return edge->m_summary_id;
+  }
+
   /* Return true if assembler names NAME1 and NAME2 leads to the same symbol
  name.  */
   static bool assembler_names_equal_p (const char *name1, const char *name2);
 
   int cgraph_count;
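
The summary-id recycling in create_edge, free_edge and assign_summary_id can
be modelled outside GCC roughly as follows.  This is a simplified sketch
under assumptions: struct edge, the free list and the global counter are
hypothetical stand-ins for cgraph_edge, NEXT_FREE_EDGE and
edges_max_summary_id, and there is no garbage collector involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for cgraph_edge.  */
struct edge
{
  int uid;
  int summary_id;          /* -1 until a summary asks for an id.  */
  struct edge *next_free;
};

static int edges_max_summary_id = 0;

/* A freshly allocated edge starts without a summary id.  */
static void
init_edge (struct edge *e)
{
  assert (e != NULL);
  memset (e, 0, sizeof (*e));
  e->summary_id = -1;
}

/* Hand out the next unused summary id.  */
static int
assign_summary_id (struct edge *e)
{
  e->summary_id = edges_max_summary_id++;
  return e->summary_id;
}

/* Clear the edge but keep its summary id, so the id is reused when
   the edge object is taken from the free list again.  */
static void
free_edge (struct edge *e, struct edge **free_list)
{
  int summary_id = e->summary_id;
  memset (e, 0, sizeof (*e));
  e->summary_id = summary_id;
  e->next_free = *free_list;
  *free_list = e;
}
```

Because ids are reused together with the recycled objects, they stay small
and dense, which is what makes them usable as indices.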
 

RE: [PATCH][GCC][Arm] Add HF modes to ANY iterators

2019-02-15 Thread Tamar Christina
Hi Christoph,

> 
> Looking at the logs, I see strange command lines when trying to compile
> arm_neon_softfp_fp16_ok:
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/xgcc
> -B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/
> -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
> -fdiagnostics-color=never -mfloat-abi=softfp -mfpu=neon-fp16 -mfloat-
> abi=softfp -c -o arm_neon_softfp_fp16_ok21466.o
> arm_neon_softfp_fp16_ok21466.c
> arm_neon_softfp_fp16_ok21466.c:3:3: error: unknown type name
> 'float16x4_t'; did you mean 'float32x4_t'?
> arm_neon_softfp_fp16_ok21466.c: In function 'foo':
> arm_neon_softfp_fp16_ok21466.c:6:26: warning: implicit declaration of
> function 'vcvt_f16_f32'; did you mean 'vcvt_u32_f32'?
> [-Wimplicit-function-declaration]
> compiler exited with status 1
> 
> [I don't know where the first 'mfloat-abi=softfp' comes from] 

It comes from check_effective_target_arm_neon_ok. The tests first try to
determine which options are required to get NEON to work, and in your case
the arm_neon check decided that -mfloat-abi=softfp alone was enough to pass.

> /aci-gcc-
> fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/xgcc
> -B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/
> -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
> -fdiagnostics-color=never -mfloat-abi=softfp -mfloat-abi=softfp -mfp16-
> format=ieee -c -o arm_neon_softfp_fp16_ok21466.o
> arm_neon_softfp_fp16_ok21466.c [succeeds]
> 
> then, when compiling the testcase:
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/xgcc
> -B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabi/gcc3/gcc/
> /gcc/testsuite/gcc.target/arm/pr88850-2.c -fno-diagnostics-show-caret -fno-
> diagnostics-show-line-numbers -fdiagnostics-color=never -ansi -pedantic-
> errors -O2 -march=armv7-a -fdump-rtl-final -mfloat-abi=softfp -mfloat-
> abi=softfp -mfp16-format=ieee -ffat-lto-objects -S -o pr88850-2.s In file
> included from /gcc/testsuite/gcc.target/arm/pr88850-2.c:7:
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-
> gnueabi/gcc3/gcc/include/arm_neon.h:31:2:
> error: #error "NEON intrinsics not available with the soft-float ABI.
> Please use -mfloat-abi=softfp or -mfloat-abi=hard"
> /gcc/testsuite/gcc.target/arm/pr88850-2.c:9:21: error: unknown type name
> 'float16x4_t'
> /gcc/testsuite/gcc.target/arm/pr88850-2.c:11:9: error: unknown type name
> 'float16x4_t'
> 
> Why does the compiler think it's using -mfloat-abi=soft?

The error is rather misleading; the likely reason is that the -mfpu in effect
isn't a NEON one. I guess the difference is that the -march=armv7-a being set
changes the FPU.

In my case, the feature test only passes on the last alternative which returns 
"-mfpu=neon-fp16 -mfloat-abi=softfp -mfp16-format=ieee " which is why it passes 
locally.

Forcing the -mfpu seems a bit wrong to me, so I guess the right solution is to 
remove the second alternative from the feature test and always explicitly test 
the fpu. I'll write up a patch.
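
To make the "first alternative that passes wins" behaviour concrete, here is
a rough model in C.  The real check is Tcl code in the testsuite; everything
below (function names, the two fake probes, and the exact flag strings) is
an illustrative assumption, not a copy of it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the index of the first flag set accepted by PROBE, or -1.  */
static int
first_passing_alternative (const char *const *alts, size_t n,
			   int (*probe) (const char *))
{
  assert (probe != NULL);
  for (size_t i = 0; i < n; i++)
    if (probe (alts[i]))
      return (int) i;
  return -1;
}

/* Illustrative alternatives, loosely based on the logs in this thread.  */
static const char *const fp16_alternatives[] = {
  "-mfpu=neon-fp16 -mfloat-abi=softfp",
  "-mfloat-abi=softfp -mfp16-format=ieee",
  "-mfpu=neon-fp16 -mfloat-abi=softfp -mfp16-format=ieee",
};

/* Fake probe: a toolchain where -mfp16-format=ieee alone is enough.  */
static int
probe_format_only (const char *flags)
{
  return strstr (flags, "-mfp16-format=ieee") != NULL;
}

/* Fake probe: a toolchain that needs both the FPU and the format.  */
static int
probe_fpu_and_format (const char *flags)
{
  return strstr (flags, "-mfpu=neon-fp16") != NULL
	 && strstr (flags, "-mfp16-format=ieee") != NULL;
}
```

With the first probe the second alternative is selected, so no -mfpu is
passed to the test itself; with the second probe only the last alternative
passes.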

Thanks,
Tamar

> 
> 
> > The 02/13/2019 10:57, Kyrill Tkachov wrote:
> > > Hi Tamar
> > >
> > > On 2/13/19 10:33 AM, Tamar Christina wrote:
> > > > Hi All,
> > > >
> > > > The iterators ANY64 and ANY128 are used in various general split
> > > > patterns and are supposed to contain any 64 bit and 128 bit modes
> > > > respectively.
> > > >
> > > > For some reason these patterns had HI but not HF.  This adds HF so
> > > > that general
> > > > 64 and 128 bit splits are generated for these modes as well.
> > > > These are required by various split patterns that expect them to
> > > > be there.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and  > > > regtest still running> issues.
> > > >
> > > Please do this on an arm-none-linux-gnueabihf target.
> > >
> > > Though I suspect this is just a placeholder from a boilerplate ;)
> > >
> > > > Ok for trunk?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 2019-02-13  Tamar Christina  
> > > >
> > > > PR target/88850
> > > > * config/arm/iterators.md (ANY64): Add V4HF,
> > > > (ANY128): Add V8HF.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > 2019-02-13  Tamar Christina  
> > > >
> > > > PR target/88850
> > > > * gcc.target/arm/pr88850-2.c: New test.
> > > >
> > > > --
> > >
> > > diff --git a/gcc/config/arm/iterators.md
> > > b/gcc/config/arm/iterators.md index
> > >
> c33e572c3e89c3dc5848bd6b825d618481247558..4ac048a0c609273691c264c97c
> > > cf6cd47b43943b 100644
> > > --- a/gcc/config/arm/iterators.md
> > > +++ b/gcc/config/arm/iterators.md
> > > @@ -24,11 +24,11 @@
> > >
> > > ;;--
> > > --
> > >
> > >   ;; A list of modes that are exactly 64 bits in size. This is used
> > > to expand -;; some splits that are the same for all modes

Re: [PATCH][DOC] Document new features for GCC 9.

2019-02-15 Thread Martin Liška
On 2/14/19 11:37 PM, David Malcolm wrote:
> On Thu, 2019-02-14 at 14:19 -0700, Martin Sebor wrote:
>> On 2/13/19 6:48 AM, Martin Liška wrote:
>>> Hi.
>>>
>>> I'm sending a patch where I document changes I made during GCC 9
>>> development. I would appreciate both language and factual comments
>>> about the patch.
>>
>> Nothing technical, just a few very minor language nits/suggestions.
>>
>> Martin
>>
>> diff --git a/htdocs/gcc-9/changes.html b/htdocs/gcc-9/changes.html
>> index 13243c2..9fec9e2 100644
>> --- a/htdocs/gcc-9/changes.html
>> +++ b/htdocs/gcc-9/changes.html
>> @@ -50,11 +50,64 @@ a work-in-progress.
>>   General Improvements
>>   
>> 
>> -A new option -flive-patching=[inline-only-static|inline-clone]
>> is
>> +A new option 
>> -flive-patching=[inline-only-static|inline-clone] is
>>
>> s/is/has been/ would be better (and either a comma after option or
>> a definite article without the comma).
>>
>>   introduced to provide a safe compilation for live-patching. At
>> the 
>> same
>>   time, provides multiple-level control on the enabled IPA 
>> optimizations.
>>   See the user guide for further information about the option for
>> more
>> -details.
>> +details.
> 
> Ideally we should add URLs any time we mention an option, linking to
> the docs for that option.  texinfo's HTML toolchain does give us per-
> option anchors.  They're not visible [1], but "View Source" shows us
> that they do exist; in the form:
> 
> https://gcc.gnu.org/onlinedocs/gcc/SOMETHING.html#indexOPTION
> 
> though annoyingly the SOMETHING varies depending on what kind of option
> it is.
> 
> The pertinent one here is:
> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-flive-patching

Hi.

Good idea, I'm attaching a patch that does that for all options.
There are 2 issues I see:
- https://gcc.gnu.org/onlinedocs/gcc/Invoking-Gcov.html page is missing links 
for sub-options
- an option like -mlra appears multiple times (once in each target documenting
the option).

Before doing the GCC 9.1 release we should adjust URLs in order to point to:
https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/...
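
For illustration, the anchor part of these URLs can be sketched in C.  This
is only inferred from the two kinds of URL that appear in this thread
(#index-flive-patching and the __builtin_expect_with_probability anchor): the
anchor looks like "index-" followed by the name without its leading dash,
with each '_' written as "_005f".  The helper name doc_anchor is hypothetical
and other characters may well need escaping that this sketch does not handle.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build the "index-..." anchor for an option or
   built-in name.  Returns 0 on success, -1 if OUT is too small.  */
int
doc_anchor (const char *name, char *out, size_t outsz)
{
  static const char prefix[] = "index-";
  size_t n;

  if (outsz < sizeof prefix)
    return -1;
  memcpy (out, prefix, sizeof prefix - 1);
  n = sizeof prefix - 1;

  /* Assumption: options are indexed without their leading dash.  */
  if (*name == '-')
    name++;

  for (; *name; name++)
    {
      /* Assumption: '_' is escaped as "_005f"; every other character
         is copied unchanged.  */
      const char *piece = (*name == '_') ? "_005f" : NULL;
      size_t len = piece ? 5 : 1;

      if (n + len + 1 > outsz)
        return -1;
      if (piece)
        memcpy (out + n, piece, len);
      else
        out[n] = *name;
      n += len;
    }
  out[n] = '\0';
  return 0;
}
```

The page name (Optimize-Options.html, Other-Builtins.html, ...) still has to
be chosen per option, as noted above.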

> 
> (FWIW, I have a patch for GCC 10 that emits terminal sequences to
> "linkify" the output when diagnostics mention option names, adding a
> URL to the docs for the pertinent option).

That sounds interesting!

Martin

> 
> [...snip...]
> 
> Dave
> 
> [1] I've emailed the texinfo project about this
> 

From 48f2df117e3339d44910fdca3136d29369aaeb69 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 15 Feb 2019 13:38:13 +0100
Subject: [PATCH] Provide URL links for options and built-ins.

---
 htdocs/gcc-9/changes.html | 33 +
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/htdocs/gcc-9/changes.html b/htdocs/gcc-9/changes.html
index 4d30ed4..fdb668b 100644
--- a/htdocs/gcc-9/changes.html
+++ b/htdocs/gcc-9/changes.html
@@ -50,7 +50,7 @@ a work-in-progress.
 General Improvements
 
   
-A new option, -flive-patching=[inline-only-static|inline-clone], has been
+A new option, https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-flive-patching";>-flive-patching=[inline-only-static|inline-clone], has been
 introduced to provide a safe compilation for live-patching. At the same
 time, provides multiple-level control on the enabled IPA optimizations.
 See the user guide for more details about the option.
@@ -60,13 +60,14 @@ a work-in-progress.
   option completion in a shell.  It is intended to be used by Bash-completion.
   
   
-  The alignment-related options -falign-functions,
-  -falign-labels, -falign-loops
-  and -falign-jumps received support for a secondary
+  The alignment-related options https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-falign-functions";>-falign-functions,
+  https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-falign-labels";>-falign-labels,
+  https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-falign-loops";>-falign-loops,
+  and https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-falign-jumps";>-falign-jumps received support for a secondary
   alignment (e.g. -falign-loops=n:m:n2:m2).
   
   
-  A new built-in function, __builtin_expect_with_probability,
+  A new built-in function, https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fexpect_005fwith_005fprobability";>__builtin_expect_with_probability,
   has been added.
   
   
@@ -75,7 +76,7 @@ a work-in-progress.
   
   
   A linear function expression defined as ia switch statement with cases
-  can be transformed by -ftree-switch-conversion.  For example:
+  can be transformed by https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-ftree-switch-conversion";>-ftree-switch-conversion.  For example:
 
 int
 foo (int how)
@@ -94,15 +95,15 @@ foo (int how)
   in the switch statement).
   
   
-  The gcov

Re: Go patch committed: Harmonize types referenced by both C and Go

2019-02-15 Thread Ian Lance Taylor
On Fri, Feb 15, 2019 at 12:15 AM Andreas Schwab  wrote:
>
> This breaks non-split-stack builds.
>
> ../../../libgo/runtime/stack.c: In function 'doscanstack1':
> ../../../libgo/runtime/stack.c:113:18: error: passing argument 1 of 
> 'scanstackblock' makes integer from pointer without a cast 
> [-Werror=int-conversion]
>   113 |   scanstackblock(bottom, (uintptr)(top - bottom), gcw);
>   |  ^~
>   |  |
>   |  byte * {aka unsigned char *}

Thanks, and sorry.  Fixed like so.  Committed to mainline.
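
As a reduced illustration of the kind of change involved: when a function
takes the start of a block as an integer address, a byte pointer has to be
cast explicitly.  The names scan_block and scan_between below are
hypothetical stand-ins, not the libgo functions.

```c
#include <stdint.h>

typedef uintptr_t uintptr;
typedef unsigned char byte;

/* Hypothetical stand-in for a scanner that takes the block start as an
   integer address rather than a pointer.  Returns the end address.  */
static uintptr
scan_block (uintptr start, uintptr size)
{
  return start + size;
}

/* Scan the region between two stack pointers, whichever order they
   come in.  Both arguments are cast explicitly to uintptr.  */
uintptr
scan_between (byte *bottom, byte *top)
{
  if (top > bottom)
    return scan_block ((uintptr) bottom, (uintptr) (top - bottom));
  return scan_block ((uintptr) top, (uintptr) (bottom - top));
}
```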

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 268923)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-03e28273a4fcb114f5204d52ed107591404002f4
+a9c1a76e14b66a356d3c3dfb50f1e6138e97733c
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/stack.c
===
--- libgo/runtime/stack.c   (revision 268923)
+++ libgo/runtime/stack.c   (working copy)
@@ -110,15 +110,15 @@ static bool doscanstack1(G *gp, void *gc
}
top = (byte*)(void*)(gp->gcinitialsp) + gp->gcstacksize;
if(top > bottom)
-   scanstackblock(bottom, (uintptr)(top - bottom), gcw);
+   scanstackblock((uintptr)(bottom), (uintptr)(top - bottom), gcw);
else
-   scanstackblock(top, (uintptr)(bottom - top), gcw);
+   scanstackblock((uintptr)(top), (uintptr)(bottom - top), gcw);
if (nextsp2 != nil) {
initialsp2 = (byte*)(void*)(gp->gcinitialsp2);
if(initialsp2 > nextsp2)
-   scanstackblock(nextsp2, (uintptr)(initialsp2 - 
nextsp2), gcw);
+   scanstackblock((uintptr)(nextsp2), (uintptr)(initialsp2 
- nextsp2), gcw);
else
-   scanstackblock(initialsp2, (uintptr)(nextsp2 - 
initialsp2), gcw);
+   scanstackblock((uintptr)(initialsp2), (uintptr)(nextsp2 
- initialsp2), gcw);
}
 #endif
return true;


[PATCH 03/42] i386: Emulate MMX packsswb/packssdw/packuswb with SSE2

2019-02-15 Thread H.J. Lu
Emulate MMX packsswb/packssdw/packuswb with SSE packsswb/packssdw/packuswb
plus moving bits 64:95 to bits 32:63 in SSE register.  Only SSE register
source operand is allowed.
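
A scalar model of the idea, for packsswb only, may help when reading the
patch.  It is a sketch in plain C, not the RTL: the function names are
made up, and a 128-bit register is modelled as an array whose low half
holds the 64-bit MMX value.

```c
#include <stdint.h>
#include <string.h>

/* Saturate a signed 16-bit value to the signed 8-bit range.  */
static int8_t
sat_s8 (int16_t v)
{
  if (v > INT8_MAX)
    return INT8_MAX;
  if (v < INT8_MIN)
    return INT8_MIN;
  return (int8_t) v;
}

/* Model of SSE2 packsswb on full 128-bit registers: A and B hold eight
   16-bit lanes each, OUT receives sixteen saturated bytes.  */
static void
sse_packsswb (const int16_t a[8], const int16_t b[8], int8_t out[16])
{
  for (int i = 0; i < 8; i++)
    {
      out[i] = sat_s8 (a[i]);
      out[8 + i] = sat_s8 (b[i]);
    }
}

/* Emulated MMX packsswb: only the low four lanes of A and B are
   meaningful; the 8-byte result ends up in the low half of OUT.  */
void
mmx_packsswb_emulated (const int16_t a[8], const int16_t b[8],
                       int8_t out[16])
{
  sse_packsswb (a, b, out);
  /* Move bits 64:95 (bytes 8..11) to bits 32:63 (bytes 4..7).  */
  memcpy (out + 4, out + 8, 4);
}
```

After the SSE pack, the packed low half of the second operand sits in bytes
8..11, which is why a single 32-bit lane move is enough.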

2019-02-08  H.J. Lu  
Uros Bizjak  

PR target/89021
* config/i386/i386-protos.h (ix86_move_vector_high_sse_to_mmx):
New prototype.
(ix86_split_mmx_pack): Likewise.
* config/i386/i386.c (ix86_move_vector_high_sse_to_mmx): New
function.
(ix86_split_mmx_pack): Likewise.
* config/i386/i386.md (mmx_isa): New.
(enabled): Also check mmx_isa.
* config/i386/mmx.md (any_s_truncate): New code iterator.
(s_trunsuffix): New code attr.
(mmx_packsswb): Removed.
(mmx_packssdw): Likewise.
(mmx_packuswb): Likewise.
(mmx_packswb): New define_insn_and_split to emulate
MMX packsswb/packuswb with SSE2.
(mmx_packssdw): Likewise.
---
 gcc/config/i386/i386-protos.h |  3 ++
 gcc/config/i386/i386.c| 54 
 gcc/config/i386/i386.md   | 13 +++
 gcc/config/i386/mmx.md| 67 +++
 4 files changed, 107 insertions(+), 30 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 27f5cc13abf..a53b48438ec 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -202,6 +202,9 @@ extern void ix86_expand_vecop_qihi (enum rtx_code, rtx, 
rtx, rtx);
 
 extern rtx ix86_split_stack_guard (void);
 
+extern void ix86_move_vector_high_sse_to_mmx (rtx);
+extern void ix86_split_mmx_pack (rtx[], enum rtx_code);
+
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
 #endif /* TREE_CODE  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7d7dd80930e..d31b69d9a82 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -20221,6 +20221,60 @@ ix86_expand_vector_move_misalign (machine_mode mode, 
rtx operands[])
 gcc_unreachable ();
 }
 
+/* Move bits 64:95 to bits 32:63.  */
+
+void
+ix86_move_vector_high_sse_to_mmx (rtx op)
+{
+  rtx mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (0), GEN_INT (2),
+ GEN_INT (0), GEN_INT (0)));
+  rtx dest = lowpart_subreg (V4SImode, op, GET_MODE (op));
+  op = gen_rtx_VEC_SELECT (V4SImode, dest, mask);
+  rtx insn = gen_rtx_SET (dest, op);
+  emit_insn (insn);
+}
+
+/* Split MMX pack with signed/unsigned saturation with SSE/SSE2.  */
+
+void
+ix86_split_mmx_pack (rtx operands[], enum rtx_code code)
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+
+  machine_mode dmode = GET_MODE (op0);
+  machine_mode smode = GET_MODE (op1);
+  machine_mode inner_dmode = GET_MODE_INNER (dmode);
+  machine_mode inner_smode = GET_MODE_INNER (smode);
+
+  /* Get the corresponding SSE mode for destination.  */
+  int nunits = 16 / GET_MODE_SIZE (inner_dmode);
+  machine_mode sse_dmode = mode_for_vector (GET_MODE_INNER (dmode),
+   nunits).require ();
+  machine_mode sse_half_dmode = mode_for_vector (GET_MODE_INNER (dmode),
+nunits / 2).require ();
+
+  /* Get the corresponding SSE mode for source.  */
+  nunits = 16 / GET_MODE_SIZE (inner_smode);
+  machine_mode sse_smode = mode_for_vector (GET_MODE_INNER (smode),
+   nunits).require ();
+
+  /* Generate SSE pack with signed/unsigned saturation.  */
+  rtx dest = lowpart_subreg (sse_dmode, op0, GET_MODE (op0));
+  op1 = lowpart_subreg (sse_smode, op1, GET_MODE (op1));
+  op2 = lowpart_subreg (sse_smode, op2, GET_MODE (op2));
+
+  op1 = gen_rtx_fmt_e (code, sse_half_dmode, op1);
+  op2 = gen_rtx_fmt_e (code, sse_half_dmode, op2);
+  rtx insn = gen_rtx_SET (dest, gen_rtx_VEC_CONCAT (sse_dmode,
+   op1, op2));
+  emit_insn (insn);
+
+  ix86_move_vector_high_sse_to_mmx (op0);
+}
+
 /* Helper function of ix86_fixup_binary_operands to canonicalize
operand order.  Returns true if the operands should be swapped.  */
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 40ed93dc804..e1727676deb 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -792,6 +792,10 @@
avx512vl,noavx512vl,x64_avx512dq,x64_avx512bw"
   (const_string "base"))
 
+;; Define instruction set of MMX instructions
+(define_attr "mmx_isa" "base,native,x64,x64_noavx,x64_avx"
+  (const_string "base"))
+
 (define_attr "enabled" ""
   (cond [(eq_attr "isa" "x64") (symbol_ref "TARGET_64BIT")
 (eq_attr "isa" "x64_sse2")
@@ -830,6 +834,15 @@
 (eq_attr "isa" "noavx512dq") (symbol_ref "!TARGET_AVX512DQ")
 (eq_attr "isa" "avx512vl") (symbol_ref "TARGET_AVX512VL")
 (eq_attr "isa" "noavx512vl") (symbol_ref "!TARGET_AVX512VL")
+
+(eq_a

[PATCH 01/42] i386: Allow MMX register modes in SSE registers

2019-02-15 Thread H.J. Lu
In 64-bit mode, SSE2 can be used to emulate MMX instructions without
3DNOW.  We can use SSE2 to support MMX register modes.
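
User code could test the new predefined macro along these lines.  This is a
minimal sketch that assumes only what the patch below shows, namely that
__MMX_WITH_SSE__ is defined when TARGET_MMX_WITH_SSE holds; the function
name is hypothetical.

```c
/* Returns 1 when the compiler reports that MMX register modes are
   carried in SSE registers, 0 otherwise.  */
int
mmx_uses_sse (void)
{
#ifdef __MMX_WITH_SSE__
  return 1;
#else
  return 0;
#endif
}
```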

PR target/89021
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__MMX_WITH_SSE__ for TARGET_MMX_WITH_SSE.
* config/i386/i386.c (ix86_set_reg_reg_cost): Add support for
TARGET_MMX_WITH_SSE with VALID_MMX_REG_MODE.
(ix86_vector_mode_supported_p): Likewise.
* config/i386/i386.h (TARGET_MMX_WITH_SSE): New.
---
 gcc/config/i386/i386-c.c | 2 ++
 gcc/config/i386/i386.c   | 5 +++--
 gcc/config/i386/i386.h   | 2 ++
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 5e7e46fcebe..213e1b56c6b 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -548,6 +548,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
 def_or_undef (parse_in, "__CLDEMOTE__");
   if (isa_flag2 & OPTION_MASK_ISA_PTWRITE)
 def_or_undef (parse_in, "__PTWRITE__");
+  if (TARGET_MMX_WITH_SSE)
+def_or_undef (parse_in, "__MMX_WITH_SSE__");
   if (TARGET_IAMCU)
 {
   def_or_undef (parse_in, "__iamcu");
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 3e5f52175d2..7d7dd80930e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -40490,7 +40490,8 @@ ix86_set_reg_reg_cost (machine_mode mode)
  || (TARGET_AVX && VALID_AVX256_REG_MODE (mode))
  || (TARGET_SSE2 && VALID_SSE2_REG_MODE (mode))
  || (TARGET_SSE && VALID_SSE_REG_MODE (mode))
- || (TARGET_MMX && VALID_MMX_REG_MODE (mode)))
+ || ((TARGET_MMX || TARGET_MMX_WITH_SSE)
+ && VALID_MMX_REG_MODE (mode)))
units = GET_MODE_SIZE (mode);
 }
 
@@ -44316,7 +44317,7 @@ ix86_vector_mode_supported_p (machine_mode mode)
 return true;
   if (TARGET_AVX512F && VALID_AVX512F_REG_MODE (mode))
 return true;
-  if (TARGET_MMX && VALID_MMX_REG_MODE (mode))
+  if ((TARGET_MMX || TARGET_MMX_WITH_SSE) && VALID_MMX_REG_MODE (mode))
 return true;
   if (TARGET_3DNOW && VALID_MMX_REG_MODE_3DNOW (mode))
 return true;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index d9039060997..226dc21709d 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -201,6 +201,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 #define TARGET_16BIT   TARGET_CODE16
 #define TARGET_16BIT_P(x)  TARGET_CODE16_P(x)
 
+#define TARGET_MMX_WITH_SSE (TARGET_64BIT && TARGET_SSE2)
+
 #include "config/vxworks-dummy.h"
 
 #include "config/i386/i386-opts.h"
-- 
2.20.1



[PATCH 02/42] i386: Add mmx_nonimmediate_operand

2019-02-15 Thread H.J. Lu
True if the operand is a register or a nonimmediate operand when
TARGET_MMX_WITH_SSE is false.
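
The logic of the predicate can be modelled in C as follows.  This is only a
sketch: the enum and function names are hypothetical, and the real
register_operand and nonimmediate_operand predicates check more than a
simple operand kind.

```c
#include <stdbool.h>

/* Hypothetical operand kinds, standing in for what the real
   predicates examine.  */
enum operand_kind { OP_REGISTER, OP_MEMORY, OP_IMMEDIATE };

/* A register is always accepted; a memory operand is accepted only
   when TARGET_MMX_WITH_SSE is false.  */
bool
mmx_nonimmediate_operand_model (enum operand_kind kind, bool mmx_with_sse)
{
  bool is_register = kind == OP_REGISTER;
  bool is_nonimmediate = kind == OP_REGISTER || kind == OP_MEMORY;

  return is_register || (!mmx_with_sse && is_nonimmediate);
}
```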

PR target/89021
* config/i386/predicates.md (mmx_nonimmediate_operand): New.
---
 gcc/config/i386/predicates.md | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 99226e86436..bd1f07a28fb 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -49,6 +49,13 @@
   (and (match_code "reg")
(match_test "MMX_REGNO_P (REGNO (op))")))
 
+;; True if the operand is a register or an nonimmediate operand when
+;; TARGET_MMX_WITH_SSE is false.
+(define_predicate "mmx_nonimmediate_operand"
+  (ior (match_operand 0 "register_operand")
+   (and (not (match_test "TARGET_MMX_WITH_SSE"))
+   (match_operand 0 "nonimmediate_operand"))))
+
 ;; True if the operand is an SSE register.
 (define_predicate "sse_reg_operand"
   (and (match_code "reg")
-- 
2.20.1



[PATCH 00/40] V6: Emulate MMX intrinsics with SSE

2019-02-15 Thread H.J. Lu
On x86-64, since __m64 is returned and passed in XMM registers, we can
emulate MMX intrinsics with SSE instructions. To support it, we added

 #define TARGET_MMX_WITH_SSE (TARGET_64BIT && TARGET_SSE2)

;; Define instruction set of MMX instructions
(define_attr "mmx_isa" "base,native,x64,x64_noavx,x64_avx"
  (const_string "base"))

 (eq_attr "mmx_isa" "native")
   (symbol_ref "!TARGET_MMX_WITH_SSE")
 (eq_attr "mmx_isa" "x64")
   (symbol_ref "TARGET_MMX_WITH_SSE")
 (eq_attr "mmx_isa" "x64_avx")
   (symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX")
 (eq_attr "mmx_isa" "x64_noavx")
   (symbol_ref "TARGET_MMX_WITH_SSE && !TARGET_AVX")

We added SSE emulation to MMX patterns and disabled MMX alternatives with
TARGET_MMX_WITH_SSE.
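
Read as plain C, the mmx_isa conditions above amount to the following.  The
enum and function are hypothetical and only model the four conditions
listed; treating "base" as always enabled is an assumption, and the real
"enabled" attribute also takes the existing "isa" attribute into account.

```c
#include <stdbool.h>

enum mmx_isa
{
  MMX_ISA_BASE,
  MMX_ISA_NATIVE,
  MMX_ISA_X64,
  MMX_ISA_X64_NOAVX,
  MMX_ISA_X64_AVX
};

/* Model of which alternative is enabled for a given mmx_isa value.  */
bool
mmx_alternative_enabled (enum mmx_isa isa, bool target_64bit,
                         bool target_sse2, bool target_avx)
{
  /* TARGET_MMX_WITH_SSE as defined in the cover letter.  */
  bool mmx_with_sse = target_64bit && target_sse2;

  switch (isa)
    {
    case MMX_ISA_NATIVE:
      return !mmx_with_sse;
    case MMX_ISA_X64:
      return mmx_with_sse;
    case MMX_ISA_X64_AVX:
      return mmx_with_sse && target_avx;
    case MMX_ISA_X64_NOAVX:
      return mmx_with_sse && !target_avx;
    case MMX_ISA_BASE:
    default:
      /* Assumption: "base" places no extra restriction.  */
      return true;
    }
}
```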

Most MMX instructions have equivalent SSE versions and results of some
SSE versions need to be reshuffled to the right order for MMX.  There are
a couple of tricky cases:

1. MMX maskmovq and SSE2 maskmovdqu aren't equivalent.  We emulate MMX
maskmovq with SSE2 maskmovdqu by zeroing out the upper 64 bits of the
mask operand and handle unmapped bits 64:127 at memory address by
adjusting source and mask operands together with memory address.

2. MMX movntq is emulated with SSE2 DImode movnti, which is available
in 64-bit mode.

3. MMX pshufb takes a 3-bit index while SSE pshufb takes a 4-bit index.
SSE emulation must clear bit 3 (the fourth index bit) of each byte in the
shuffle control mask.

4. To emulate MMX cvtpi2ps with SSE2 cvtdq2ps, we must properly preserve
the upper 64 bits of destination XMM register.
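
The pshufb case can be illustrated with a scalar model.  This is a sketch in
plain C with made-up function names; it assumes the usual pshufb rule that a
control byte with bit 7 set produces zero, and it clears the extra index bit
(value 0x08) before handing the control bytes to the 16-byte shuffle.

```c
#include <stdint.h>

/* Model of SSE pshufb on 16 bytes: a control byte with bit 7 set
   yields zero, otherwise its low four bits index into SRC.  */
static void
sse_pshufb (const uint8_t src[16], const uint8_t ctrl[16], uint8_t out[16])
{
  for (int i = 0; i < 16; i++)
    out[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0f];
}

/* Emulated MMX pshufb: the 64-bit value lives in the low eight bytes
   of SRC, so the index must never reach bytes 8..15.  */
void
mmx_pshufb_emulated (const uint8_t src[16], const uint8_t ctrl[16],
                     uint8_t out[16])
{
  uint8_t masked[16];

  for (int i = 0; i < 16; i++)
    masked[i] = ctrl[i] & 0xf7;   /* clear the extra index bit */
  sse_pshufb (src, masked, out);
}
```

Without the masking, a control byte such as 8 or 9 would select from the
upper half of the XMM register, which holds no MMX data.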

Tests are also added to check each SSE emulation of MMX intrinsics.

There are no regressions on i686 and x86-64.  For x86-64, GCC is also
tested with

--with-arch=native --with-cpu=native

on AVX2 and AVX512F machines.

H.J. Lu (41):
  i386: Allow MMX register modes in SSE registers
  i386: Add mmx_nonimmediate_operand
  i386: Emulate MMX packsswb/packssdw/packuswb with SSE2
  i386: Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX
  i386: Emulate MMX plusminus/sat_plusminus with SSE
  i386: Emulate MMX mulv4hi3 with SSE
  i386: Emulate MMX smulv4hi3_highpart with SSE
  i386: Emulate MMX mmx_pmaddwd with SSE
  i386: Emulate MMX ashr3/3 with SSE
  i386: Emulate MMX 3 with SSE
  i386: Emulate MMX mmx_andnot3 with SSE
  i386: Emulate MMX mmx_eq/mmx_gt3 with SSE
  i386: Emulate MMX vec_dupv2si with SSE
  i386: Emulate MMX pshufw with SSE
  i386: Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE
  i386: Emulate MMX sse_cvtpi2ps with SSE
  i386: Emulate MMX mmx_pextrw with SSE
  i386: Emulate MMX mmx_pinsrw with SSE
  i386: Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE
  i386: Emulate MMX mmx_pmovmskb with SSE
  i386: Emulate MMX mmx_umulv4hi3_highpart with SSE
  i386: Emulate MMX maskmovq with SSE2 maskmovdqu
  i386: Emulate MMX mmx_uavgv8qi3 with SSE
  i386: Emulate MMX mmx_uavgv4hi3 with SSE
  i386: Emulate MMX mmx_psadbw with SSE
  i386: Emulate MMX movntq with SSE2 movntidi
  i386: Emulate MMX umulv1siv1di3 with SSE2
  i386: Make _mm_empty () as NOP when MMX is disabled
  i386: Emulate MMX ssse3_phwv4hi3 with SSE
  i386: Emulate MMX ssse3_phdv2si3 with SSE
  i386: Emulate MMX ssse3_pmaddubsw with SSE
  i386: Emulate MMX ssse3_pmulhrswv4hi3 with SSE
  i386: Emulate MMX pshufb with SSE version
  i386: Emulate MMX ssse3_psign3 with SSE
  i386: Emulate MMX ssse3_palignrdi with SSE
  i386: Emulate MMX abs2 with SSE
  i386: Allow MMXMODE moves with TARGET_MMX_WITH_SSE
  i386: Allow MMX vector expanders with TARGET_MMX_WITH_SSE
  i386: Allow MMX intrinsic emulation with SSE
  i386: Enable TM MMX intrinsics with SSE2
  i386: Add tests for MMX intrinsic emulations with SSE

Uros Bizjak (1):
  Prevent allocation of MMX registers with TARGET_MMX_WITH_SSE

 gcc/config/i386/constraints.md|   6 +
 gcc/config/i386/i386-builtin.def  | 126 +--
 gcc/config/i386/i386-c.c  |   2 +
 gcc/config/i386/i386-protos.h |   4 +
 gcc/config/i386/i386.c| 189 +++-
 gcc/config/i386/i386.h|   2 +
 gcc/config/i386/i386.md   |  17 +
 gcc/config/i386/mmintrin.h|  12 +-
 gcc/config/i386/mmx.md| 903 --
 gcc/config/i386/predicates.md |   7 +
 gcc/config/i386/sse.md| 353 +--
 gcc/config/i386/xmmintrin.h   |  61 ++
 gcc/testsuite/gcc.target/i386/mmx-vals.h  |  77 ++
 gcc/testsuite/gcc.target/i386/pr82483-1.c |   2 +-
 gcc/testsuite/gcc.target/i386/pr82483-2.c |   2 +-
 gcc/testsuite/gcc.target/i386/sse2-mmx-10.c   |  44 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-11.c   |  39 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-12.c   |  43 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-13.c   |  40 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-14.c   |  32 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-15.c   |  37 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-16.c 

[PATCH 08/42] i386: Emulate MMX mmx_pmaddwd with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX pmaddwd with SSE.  Only SSE register source operand is
allowed.
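
For reference, the operation described by the RTL in this pattern can be
written as a scalar model in C.  The function name is made up; the lane
selection follows the (parallel [0 2]) and (parallel [1 3]) selectors in the
pattern below.

```c
#include <stdint.h>

/* Model of pmaddwd on a 64-bit value: multiply the four signed 16-bit
   lanes pairwise and add adjacent products into two 32-bit lanes.  */
void
pmaddwd_v4hi (const int16_t a[4], const int16_t b[4], int32_t out[2])
{
  for (int i = 0; i < 2; i++)
    {
      int32_t even = (int32_t) a[2 * i] * b[2 * i];
      int32_t odd = (int32_t) a[2 * i + 1] * b[2 * i + 1];

      /* The sum is formed in unsigned arithmetic so that the single
         overflowing input (all lanes -32768) wraps instead of being
         undefined; converting back is implementation-defined in C.  */
      out[i] = (int32_t) ((uint32_t) even + (uint32_t) odd);
    }
}
```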

PR target/89021
* config/i386/mmx.md (mmx_pmaddwd): Also allow TARGET_MMX_WITH_SSE.
(*mmx_pmaddwd): Also allow TARGET_MMX_WITH_SSE.  Add SSE support.
---
 gcc/config/i386/mmx.md | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 58054b7e0c7..23c10dffc38 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -823,20 +823,20 @@
(sign_extend:V2SI
  (vec_select:V2HI (match_dup 2)
(parallel [(const_int 1) (const_int 3)]))))))]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands);")
 
 (define_insn "*mmx_pmaddwd"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+  [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
 (plus:V2SI
  (mult:V2SI
(sign_extend:V2SI
  (vec_select:V2HI
-   (match_operand:V4HI 1 "nonimmediate_operand" "%0")
+   (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv")
(parallel [(const_int 0) (const_int 2)])))
(sign_extend:V2SI
  (vec_select:V2HI
-   (match_operand:V4HI 2 "nonimmediate_operand" "ym")
+   (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")
(parallel [(const_int 0) (const_int 2)]))))
  (mult:V2SI
(sign_extend:V2SI
@@ -845,10 +845,15 @@
(sign_extend:V2SI
  (vec_select:V2HI (match_dup 2)
(parallel [(const_int 1) (const_int 3)]))))))]
-  "TARGET_MMX && ix86_binary_operator_ok (MULT, V4HImode, operands)"
-  "pmaddwd\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxmul")
-   (set_attr "mode" "DI")])
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (MULT, V4HImode, operands)"
+  "@
+   pmaddwd\t{%2, %0|%0, %2}
+   pmaddwd\t{%2, %0|%0, %2}
+   vpmaddwd\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxmul,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_pmulhrwv4hi3"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 04/42] i386: Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX

2019-02-15 Thread H.J. Lu
Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX.  For MMX punpckhXX,
move bits 64:127 to bits 0:63 in SSE register.  Only SSE register source
operand is allowed.
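
A scalar model of the byte variant may make the high/low handling easier to
follow.  It is a sketch in plain C with hypothetical function names; a
128-bit register is modelled as a 16-byte array whose low eight bytes hold
the MMX value.

```c
#include <stdint.h>
#include <string.h>

/* Model of SSE punpcklbw: interleave the low eight bytes of A and B
   into sixteen result bytes.  */
static void
sse_punpcklbw (const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
  for (int i = 0; i < 8; i++)
    {
      out[2 * i] = a[i];
      out[2 * i + 1] = b[i];
    }
}

/* Emulated MMX punpcklbw (HIGH_P == 0) or punpckhbw (HIGH_P != 0).
   The 8-byte result ends up in the low half of OUT.  */
void
mmx_punpckbw_emulated (const uint8_t a[16], const uint8_t b[16],
                       int high_p, uint8_t out[16])
{
  sse_punpcklbw (a, b, out);
  if (high_p)
    /* Move bits 64:127 to bits 0:63.  */
    memcpy (out, out + 8, 8);
}
```

The upper half of the SSE interleave is exactly what MMX punpckhbw would
produce from bytes 4..7 of each operand, so no second SSE instruction form
is needed.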

PR target/89021
* config/i386/i386-protos.h (ix86_split_mmx_punpck): New
prototype.
* config/i386/i386.c (ix86_split_mmx_punpck): New function.
* config/i386/mmx.md (mmx_punpckhbw): Changed to
define_insn_and_split to support SSE emulation.
(mmx_punpcklbw): Likewise.
(mmx_punpckhwd): Likewise.
(mmx_punpcklwd): Likewise.
(mmx_punpckhdq): Likewise.
(mmx_punpckldq): Likewise.
---
 gcc/config/i386/i386-protos.h |   1 +
 gcc/config/i386/i386.c|  77 +++
 gcc/config/i386/mmx.md| 138 ++
 3 files changed, 168 insertions(+), 48 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index a53b48438ec..37581837a32 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -204,6 +204,7 @@ extern rtx ix86_split_stack_guard (void);
 
 extern void ix86_move_vector_high_sse_to_mmx (rtx);
 extern void ix86_split_mmx_pack (rtx[], enum rtx_code);
+extern void ix86_split_mmx_punpck (rtx[], bool);
 
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d31b69d9a82..a76c17beece 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -20275,6 +20275,83 @@ ix86_split_mmx_pack (rtx operands[], enum rtx_code 
code)
   ix86_move_vector_high_sse_to_mmx (op0);
 }
 
+/* Split MMX punpcklXX/punpckhXX with SSE punpcklXX.  */
+
+void
+ix86_split_mmx_punpck (rtx operands[], bool high_p)
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+  machine_mode mode = GET_MODE (op0);
+  rtx mask;
+  /* The corresponding SSE mode.  */
+  machine_mode sse_mode, double_sse_mode;
+
+  switch (mode)
+{
+case E_V8QImode:
+  sse_mode = V16QImode;
+  double_sse_mode = V32QImode;
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (16,
+ GEN_INT (0), GEN_INT (16),
+ GEN_INT (1), GEN_INT (17),
+ GEN_INT (2), GEN_INT (18),
+ GEN_INT (3), GEN_INT (19),
+ GEN_INT (4), GEN_INT (20),
+ GEN_INT (5), GEN_INT (21),
+ GEN_INT (6), GEN_INT (22),
+ GEN_INT (7), GEN_INT (23)));
+  break;
+
+case E_V4HImode:
+  sse_mode = V8HImode;
+  double_sse_mode = V16HImode;
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (8,
+ GEN_INT (0), GEN_INT (8),
+ GEN_INT (1), GEN_INT (9),
+ GEN_INT (2), GEN_INT (10),
+ GEN_INT (3), GEN_INT (11)));
+  break;
+
+case E_V2SImode:
+  sse_mode = V4SImode;
+  double_sse_mode = V8SImode;
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4,
+ GEN_INT (0), GEN_INT (4),
+ GEN_INT (1), GEN_INT (5)));
+  break;
+
+default:
+  gcc_unreachable ();
+}
+
+  /* Generate SSE punpcklXX.  */
+  rtx dest = lowpart_subreg (sse_mode, op0, GET_MODE (op0));
+  op1 = lowpart_subreg (sse_mode, op1, GET_MODE (op1));
+  op2 = lowpart_subreg (sse_mode, op2, GET_MODE (op2));
+
+  op1 = gen_rtx_VEC_CONCAT (double_sse_mode, op1, op2);
+  op2 = gen_rtx_VEC_SELECT (sse_mode, op1, mask);
+  rtx insn = gen_rtx_SET (dest, op2);
+  emit_insn (insn);
+
+  if (high_p)
+{
+  /* Move bits 64:127 to bits 0:63.  */
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (2), GEN_INT (3),
+ GEN_INT (0), GEN_INT (0)));
+  dest = lowpart_subreg (V4SImode, dest, GET_MODE (dest));
+  op1 = gen_rtx_VEC_SELECT (V4SImode, dest, mask);
+  insn = gen_rtx_SET (dest, op1);
+  emit_insn (insn);
+}
+}
+
 /* Helper function of ix86_fixup_binary_operands to canonicalize
operand order.  Returns true if the operands should be swapped.  */
 
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index ca9cf20f8e3..8ae24439e8d 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1064,87 +1064,129 @@
(set_attr "type" "mmxshft,sselog,sselog")
(set_attr "mode" "DI,TI,TI")])
 
-(define_insn "mmx_punpckhbw"
-  [(set (match_operand:V8QI 0 "register_operand" "=y")
+(define_insn_and_split "mmx_punpckhbw"
+  [(set (match_operand:V8QI 0 "register_opera

[PATCH 06/42] i386: Emulate MMX mulv4hi3 with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX mulv4hi3 with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (mulv4hi3): New.
(*mmx_mulv4hi3): Also allow TARGET_MMX_WITH_SSE.  Add SSE
support.
---
 gcc/config/i386/mmx.md | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index b6277789091..8ec7632912b 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -721,14 +721,26 @@
   "TARGET_MMX"
   "ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands);")
 
+(define_expand "mulv4hi3"
+  [(set (match_operand:V4HI 0 "register_operand")
+(mult:V4HI (match_operand:V4HI 1 "nonimmediate_operand")
+  (match_operand:V4HI 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
+  "ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands);")
+
 (define_insn "*mmx_mulv4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
-(mult:V4HI (match_operand:V4HI 1 "nonimmediate_operand" "%0")
-  (match_operand:V4HI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX && ix86_binary_operator_ok (MULT, V4HImode, operands)"
-  "pmullw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxmul")
-   (set_attr "mode" "DI")])
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
+(mult:V4HI (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv")
+  (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (MULT, V4HImode, operands)"
+  "@
+   pmullw\t{%2, %0|%0, %2}
+   pmullw\t{%2, %0|%0, %2}
+   vpmullw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxmul,ssemul,ssemul")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_smulv4hi3_highpart"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 07/42] i386: Emulate MMX smulv4hi3_highpart with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX smulv4hi3_highpart with SSE.  Only SSE register source operand is
allowed.
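
The RTL in this pattern (sign-extend, multiply, shift right by 16, truncate)
corresponds to the following scalar model.  This is a sketch with a made-up
function name; the result lanes are returned as raw 16-bit patterns.

```c
#include <stdint.h>

/* Model of pmulhw on four 16-bit lanes: keep the high 16 bits of each
   signed 32-bit product.  */
void
smulhw_v4hi (const int16_t a[4], const int16_t b[4], uint16_t out[4])
{
  for (int i = 0; i < 4; i++)
    {
      /* The product of two 16-bit values always fits in 32 bits.  */
      uint32_t prod = (uint32_t) ((int32_t) a[i] * b[i]);

      out[i] = (uint16_t) (prod >> 16);
    }
}
```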

PR target/89021
* config/i386/mmx.md (mmx_smulv4hi3_highpart): Also allow
TARGET_MMX_WITH_SSE.
(*mmx_smulv4hi3_highpart): Also allow TARGET_MMX_WITH_SSE. Add
SSE support.
---
 gcc/config/i386/mmx.md | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 8ec7632912b..58054b7e0c7 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -752,23 +752,28 @@
  (sign_extend:V4SI
(match_operand:V4HI 2 "nonimmediate_operand")))
(const_int 16))))]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands);")
 
 (define_insn "*mmx_smulv4hi3_highpart"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
(truncate:V4HI
  (lshiftrt:V4SI
(mult:V4SI
  (sign_extend:V4SI
-   (match_operand:V4HI 1 "nonimmediate_operand" "%0"))
+   (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv"))
  (sign_extend:V4SI
-   (match_operand:V4HI 2 "nonimmediate_operand" "ym")))
+   (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")))
(const_int 16))))]
-  "TARGET_MMX && ix86_binary_operator_ok (MULT, V4HImode, operands)"
-  "pmulhw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxmul")
-   (set_attr "mode" "DI")])
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (MULT, V4HImode, operands)"
+  "@
+   pmulhw\t{%2, %0|%0, %2}
+   pmulhw\t{%2, %0|%0, %2}
+   vpmulhw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxmul,ssemul,ssemul")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_umulv4hi3_highpart"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 09/42] i386: Emulate MMX ashr3/3 with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX ashr3/3 with SSE.  Only SSE register
source operand is allowed.
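
For the arithmetic right shift on 16-bit lanes (psraw), the behaviour that
both the MMX and SSE forms share can be modelled in C as below.  This is a
sketch with a hypothetical function name; it relies on the documented rule
that a count above 15 fills each lane with its sign bit.

```c
#include <stdint.h>

/* Model of psraw on four 16-bit lanes with a single shift count.  */
void
ashr_v4hi (const int16_t a[4], uint64_t count, int16_t out[4])
{
  /* Counts above 15 behave like a shift by 15.  */
  unsigned n = count > 15 ? 15 : (unsigned) count;

  for (int i = 0; i < 4; i++)
    {
      int16_t lane = a[i];

      /* Shift negative lanes through their complement so the result
         does not depend on how the C compiler shifts negative
         values.  */
      if (lane < 0)
        out[i] = (int16_t) ~(((uint16_t) ~lane) >> n);
      else
        out[i] = (int16_t) (lane >> n);
    }
}
```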

PR target/89021
* config/i386/mmx.md (mmx_ashr3): Also allow
TARGET_MMX_WITH_SSE.  Add SSE emulation.
(mmx_3): Likewise.
(ashr3): New.
(3): Likewise.
---
 gcc/config/i386/mmx.md | 50 ++
 1 file changed, 36 insertions(+), 14 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 23c10dffc38..eef17504616 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -959,32 +959,54 @@
(set_attr "mode" "DI")])
 
 (define_insn "mmx_ashr3"
-  [(set (match_operand:MMXMODE24 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODE24 0 "register_operand" "=y,x,Yv")
 (ashiftrt:MMXMODE24
- (match_operand:MMXMODE24 1 "register_operand" "0")
- (match_operand:DI 2 "nonmemory_operand" "yN")))]
-  "TARGET_MMX"
-  "psra\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxshft")
+ (match_operand:MMXMODE24 1 "register_operand" "0,0,Yv")
+ (match_operand:DI 2 "nonmemory_operand" "yN,xN,YvN")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   psra\t{%2, %0|%0, %2}
+   psra\t{%2, %0|%0, %2}
+   vpsra\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxshft,sseishft,sseishft")
(set (attr "length_immediate")
  (if_then_else (match_operand 2 "const_int_operand")
(const_string "1")
(const_string "0")))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
+
+(define_expand "ashr3"
+  [(set (match_operand:MMXMODE24 0 "register_operand")
+(ashiftrt:MMXMODE24
+ (match_operand:MMXMODE24 1 "register_operand")
+ (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_MMX_WITH_SSE")
 
 (define_insn "mmx_3"
-  [(set (match_operand:MMXMODE248 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODE248 0 "register_operand" "=y,x,Yv")
 (any_lshift:MMXMODE248
- (match_operand:MMXMODE248 1 "register_operand" "0")
- (match_operand:DI 2 "nonmemory_operand" "yN")))]
-  "TARGET_MMX"
-  "p\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxshft")
+ (match_operand:MMXMODE248 1 "register_operand" "0,0,Yv")
+ (match_operand:DI 2 "nonmemory_operand" "yN,xN,YvN")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   p\t{%2, %0|%0, %2}
+   p\t{%2, %0|%0, %2}
+   vp\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxshft,sseishft,sseishft")
(set (attr "length_immediate")
  (if_then_else (match_operand 2 "const_int_operand")
(const_string "1")
(const_string "0")))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
+
+(define_expand "3"
+  [(set (match_operand:MMXMODE248 0 "register_operand")
+(any_lshift:MMXMODE248
+ (match_operand:MMXMODE248 1 "register_operand")
+ (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_MMX_WITH_SSE")
 
 ;
 ;;
-- 
2.20.1



[PATCH 05/42] i386: Emulate MMX plusminus/sat_plusminus with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX plusminus/sat_plusminus with SSE.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/mmx.md (MMXMODEI8): Require TARGET_SSE2 for V1DI.
(plusminus:mmx_3): Check
TARGET_MMX_WITH_SSE.
(sat_plusminus:mmx_3): Likewise.
(3): New.
(*mmx_3): Add SSE emulation.
(*mmx_3): Likewise.
---
 gcc/config/i386/mmx.md | 51 --
 1 file changed, 34 insertions(+), 17 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 8ae24439e8d..b6277789091 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -45,7 +45,7 @@
 
 ;; 8 byte integral modes handled by MMX (and by extension, SSE)
 (define_mode_iterator MMXMODEI [V8QI V4HI V2SI])
-(define_mode_iterator MMXMODEI8 [V8QI V4HI V2SI V1DI])
+(define_mode_iterator MMXMODEI8 [V8QI V4HI V2SI (V1DI "TARGET_SSE2")])
 
 ;; All 8-byte vector modes handled by MMX
 (define_mode_iterator MMXMODE [V8QI V4HI V2SI V1DI V2SF])
@@ -665,37 +665,54 @@
(plusminus:MMXMODEI8
  (match_operand:MMXMODEI8 1 "nonimmediate_operand")
  (match_operand:MMXMODEI8 2 "nonimmediate_operand")))]
-  "TARGET_MMX || (TARGET_SSE2 && mode == V1DImode)"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "ix86_fixup_binary_operands_no_copy (, mode, operands);")
+
+(define_expand "3"
+  [(set (match_operand:MMXMODEI 0 "register_operand")
+   (plusminus:MMXMODEI
+ (match_operand:MMXMODEI 1 "nonimmediate_operand")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (, mode, operands);")
 
 (define_insn "*mmx_3"
-  [(set (match_operand:MMXMODEI8 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI8 0 "register_operand" "=y,x,Yv")
 (plusminus:MMXMODEI8
- (match_operand:MMXMODEI8 1 "nonimmediate_operand" "0")
- (match_operand:MMXMODEI8 2 "nonimmediate_operand" "ym")))]
-  "(TARGET_MMX || (TARGET_SSE2 && mode == V1DImode))
+ (match_operand:MMXMODEI8 1 "nonimmediate_operand" "0,0,Yv")
+ (match_operand:MMXMODEI8 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
&& ix86_binary_operator_ok (, mode, operands)"
-  "p\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+  "@
+   p\t{%2, %0|%0, %2}
+   p\t{%2, %0|%0, %2}
+   vp\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sseadd,sseadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_3"
   [(set (match_operand:MMXMODE12 0 "register_operand")
(sat_plusminus:MMXMODE12
  (match_operand:MMXMODE12 1 "nonimmediate_operand")
  (match_operand:MMXMODE12 2 "nonimmediate_operand")))]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (, mode, operands);")
 
 (define_insn "*mmx_3"
-  [(set (match_operand:MMXMODE12 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODE12 0 "register_operand" "=y,x,Yv")
 (sat_plusminus:MMXMODE12
- (match_operand:MMXMODE12 1 "nonimmediate_operand" "0")
- (match_operand:MMXMODE12 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX && ix86_binary_operator_ok (, mode, operands)"
-  "p\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+ (match_operand:MMXMODE12 1 "nonimmediate_operand" "0,0,Yv")
+ (match_operand:MMXMODE12 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (, mode, operands)"
+  "@
+   p\t{%2, %0|%0, %2}
+   p\t{%2, %0|%0, %2}
+   vp\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sseadd,sseadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_mulv4hi3"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 11/42] i386: Emulate MMX mmx_andnot3 with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX mmx_andnot3 with SSE.  Only SSE register source operand
is allowed.

PR target/89021
* config/i386/mmx.md (mmx_andnot3): Also allow
TARGET_MMX_WITH_SSE.  Add SSE support.
---
 gcc/config/i386/mmx.md | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 7a253005aba..c558aee79b6 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1049,14 +1049,18 @@
 ;
 
 (define_insn "mmx_andnot3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
(and:MMXMODEI
- (not:MMXMODEI (match_operand:MMXMODEI 1 "register_operand" "0"))
- (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX"
-  "pandn\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+ (not:MMXMODEI (match_operand:MMXMODEI 1 "register_operand" "0,0,Yv"))
+ (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   pandn\t{%2, %0|%0, %2}
+   pandn\t{%2, %0|%0, %2}
+   vpandn\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sselog,sselog")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_3"
   [(set (match_operand:MMXMODEI 0 "register_operand")
-- 
2.20.1



[PATCH 12/42] i386: Emulate MMX mmx_eq/mmx_gt3 with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX mmx_eq/mmx_gt3 with SSE.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/mmx.md (mmx_eq3): Also allow
TARGET_MMX_WITH_SSE.
(*mmx_eq3): Also allow TARGET_MMX_WITH_SSE.  Add SSE
support.
(mmx_gt3): Likewise.
---
 gcc/config/i386/mmx.md | 39 ---
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index c558aee79b6..54e16917115 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1019,28 +1019,37 @@
 (eq:MMXMODEI
  (match_operand:MMXMODEI 1 "nonimmediate_operand")
  (match_operand:MMXMODEI 2 "nonimmediate_operand")))]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (EQ, mode, operands);")
 
 (define_insn "*mmx_eq3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
 (eq:MMXMODEI
- (match_operand:MMXMODEI 1 "nonimmediate_operand" "%0")
- (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX && ix86_binary_operator_ok (EQ, mode, operands)"
-  "pcmpeq\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxcmp")
-   (set_attr "mode" "DI")])
+ (match_operand:MMXMODEI 1 "nonimmediate_operand" "%0,0,Yv")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (EQ, mode, operands)"
+  "@
+   pcmpeq\t{%2, %0|%0, %2}
+   pcmpeq\t{%2, %0|%0, %2}
+   vpcmpeq\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxcmp,ssecmp,ssecmp")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "mmx_gt3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
 (gt:MMXMODEI
- (match_operand:MMXMODEI 1 "register_operand" "0")
- (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX"
-  "pcmpgt\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxcmp")
-   (set_attr "mode" "DI")])
+ (match_operand:MMXMODEI 1 "register_operand" "0,0,Yv")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   pcmpgt\t{%2, %0|%0, %2}
+   pcmpgt\t{%2, %0|%0, %2}
+   vpcmpgt\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxcmp,ssecmp,ssecmp")
+   (set_attr "mode" "DI,TI,TI")])
 
 ;
 ;;
-- 
2.20.1



[PATCH 13/42] i386: Emulate MMX vec_dupv2si with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX vec_dupv2si with SSE.  Add the "Yw" constraint to allow
broadcast from integer register for AVX512BW with TARGET_AVX512VL.
Only SSE register source operand is allowed.

PR target/89021
* config/i386/constraints.md (Yw): New constraint.
* config/i386/mmx.md (*vec_dupv2si): Changed to
define_insn_and_split and also allow TARGET_MMX_WITH_SSE to
support SSE emulation.
---
 gcc/config/i386/constraints.md |  6 ++
 gcc/config/i386/mmx.md | 24 +---
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 16075b4acf3..c546b20d9dc 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -110,6 +110,8 @@
 ;;  v  any EVEX encodable SSE register for AVX512VL target,
 ;; otherwise any SSE register
 ;;  h  EVEX encodable SSE register with number factor of four
+;;  w  any EVEX encodable SSE register for AVX512BW with TARGET_AVX512VL
+;; target.
 
 (define_register_constraint "Yz" "TARGET_SSE ? SSE_FIRST_REG : NO_REGS"
  "First SSE register (@code{%xmm0}).")
@@ -146,6 +148,10 @@
  "TARGET_AVX512VL ? ALL_SSE_REGS : TARGET_SSE ? SSE_REGS : NO_REGS"
  "@internal For AVX512VL, any EVEX encodable SSE register (@code{%xmm0-%xmm31}), otherwise any SSE register.")
 
+(define_register_constraint "Yw"
+ "TARGET_AVX512BW && TARGET_AVX512VL ? ALL_SSE_REGS : NO_REGS"
+ "@internal Any EVEX encodable SSE register (@code{%xmm0-%xmm31}) for AVX512BW with TARGET_AVX512VL target.")
+
 ;; We use the B prefix to denote any number of internal operands:
 ;;  f  FLAGS_REG
 ;;  g  GOT memory operand.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 54e16917115..f07e1104ae8 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1381,14 +1381,24 @@
(set_attr "length_immediate" "1")
(set_attr "mode" "DI")])
 
-(define_insn "*vec_dupv2si"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+(define_insn_and_split "*vec_dupv2si"
+  [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv,Yw")
(vec_duplicate:V2SI
- (match_operand:SI 1 "register_operand" "0")))]
-  "TARGET_MMX"
-  "punpckldq\t%0, %0"
-  [(set_attr "type" "mmxcvt")
-   (set_attr "mode" "DI")])
+ (match_operand:SI 1 "register_operand" "0,0,Yv,r")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   punpckldq\t%0, %0
+   #
+   #
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(set (match_dup 0)
+   (vec_duplicate:V4SI (match_dup 1)))]
+  "operands[0] = lowpart_subreg (V4SImode, operands[0],
+GET_MODE (operands[0]));"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx,x64_avx")
+   (set_attr "type" "mmxcvt,ssemov,ssemov,ssemov")
+   (set_attr "mode" "DI,TI,TI,TI")])
 
 (define_insn "*mmx_concatv2si"
   [(set (match_operand:V2SI 0 "register_operand" "=y,y")
-- 
2.20.1



[PATCH 18/42] i386: Emulate MMX mmx_pinsrw with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX mmx_pinsrw with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (mmx_pinsrw): Also check TARGET_MMX and
TARGET_MMX_WITH_SSE.
(*mmx_pinsrw): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 678eaa713dc..7bd97c28f71 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1282,32 +1282,45 @@
 (match_operand:SI 2 "nonimmediate_operand"))
  (match_operand:V4HI 1 "register_operand")
   (match_operand:SI 3 "const_0_to_3_operand")))]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
 {
   operands[2] = gen_lowpart (HImode, operands[2]);
   operands[3] = GEN_INT (1 << INTVAL (operands[3]));
 })
 
 (define_insn "*mmx_pinsrw"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
 (vec_merge:V4HI
   (vec_duplicate:V4HI
-(match_operand:HI 2 "nonimmediate_operand" "rm"))
- (match_operand:V4HI 1 "register_operand" "0")
+(match_operand:HI 2 "nonimmediate_operand" "rm,rm,rm"))
+ (match_operand:V4HI 1 "register_operand" "0,0,Yv")
   (match_operand:SI 3 "const_int_operand")))]
-  "(TARGET_SSE || TARGET_3DNOW_A)
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)
&& ((unsigned) exact_log2 (INTVAL (operands[3]))
< GET_MODE_NUNITS (V4HImode))"
 {
   operands[3] = GEN_INT (exact_log2 (INTVAL (operands[3])));
-  if (MEM_P (operands[2]))
-return "pinsrw\t{%3, %2, %0|%0, %2, %3}";
+  if (TARGET_MMX_WITH_SSE && TARGET_AVX)
+{
+  if (MEM_P (operands[2]))
+   return "vpinsrw\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  else
+   return "vpinsrw\t{%3, %k2, %1, %0|%0, %1, %k2, %3}";
+}
   else
-return "pinsrw\t{%3, %k2, %0|%0, %k2, %3}";
+{
+  if (MEM_P (operands[2]))
+   return "pinsrw\t{%3, %2, %0|%0, %2, %3}";
+  else
+   return "pinsrw\t{%3, %k2, %0|%0, %k2, %3}";
+}
 }
-  [(set_attr "type" "mmxcvt")
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxcvt,sselog,sselog")
(set_attr "length_immediate" "1")
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "mmx_pextrw"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
-- 
2.20.1
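
As a reading aid for the mmx_pinsrw patch above: the expander rewrites the element index n into the one-hot vec_merge mask 1 << n, and the insn recovers n with exact_log2 and rejects masks that are not a single bit within the four V4HI lanes. The C sketch below models only that index/mask round trip. It is not GCC code; `index_to_merge_mask`, `exact_log2_model` and `merge_mask_is_valid` are hypothetical names, and `exact_log2_model` is a stand-in that assumes GCC's `exact_log2` returns -1 for values that are not a power of two.

```c
#include <assert.h>

/* Expander side: element index n (0..3) becomes the one-hot mask 1 << n.  */
static int index_to_merge_mask(int n)
{
    assert(n >= 0 && n <= 3);
    return 1 << n;
}

/* Hypothetical stand-in for exact_log2: log2 (x) when x is a power of
   two, otherwise -1.  */
static int exact_log2_model(unsigned x)
{
    if (x == 0 || (x & (x - 1)) != 0)
        return -1;
    int n = 0;
    while ((x >>= 1) != 0)
        n++;
    return n;
}

/* Insn condition: (unsigned) exact_log2 (mask) < number of V4HI lanes.
   The cast turns -1 into a huge value, so invalid masks fail the test.  */
static int merge_mask_is_valid(int mask)
{
    return (unsigned) exact_log2_model((unsigned) mask) < 4u;
}
```

With these helpers, index 2 maps to mask 4 and back to 2, while masks such as 3 (two bits set) or 16 (lane 4, out of range) are rejected.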



[PATCH 14/42] i386: Emulate MMX pshufw with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX pshufw with SSE.  Only SSE register source operand is allowed.

PR target/89021
* config/i386/mmx.md (mmx_pshufw): Also check TARGET_MMX and
TARGET_MMX_WITH_SSE.
(mmx_pshufw_1): Add SSE emulation.
(*vec_dupv4hi): Changed to define_insn_and_split and also allow
TARGET_MMX_WITH_SSE to support SSE emulation.
---
 gcc/config/i386/mmx.md | 79 ++
 1 file changed, 64 insertions(+), 15 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index f07e1104ae8..3ea64e9aabe 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1325,7 +1325,8 @@
   [(match_operand:V4HI 0 "register_operand")
(match_operand:V4HI 1 "nonimmediate_operand")
(match_operand:SI 2 "const_int_operand")]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
 {
   int mask = INTVAL (operands[2]);
   emit_insn (gen_mmx_pshufw_1 (operands[0], operands[1],
@@ -1337,14 +1338,15 @@
 })
 
 (define_insn "mmx_pshufw_1"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,Yv")
 (vec_select:V4HI
-  (match_operand:V4HI 1 "nonimmediate_operand" "ym")
+  (match_operand:V4HI 1 "nonimmediate_operand" "ym,Yv")
   (parallel [(match_operand 2 "const_0_to_3_operand")
  (match_operand 3 "const_0_to_3_operand")
  (match_operand 4 "const_0_to_3_operand")
  (match_operand 5 "const_0_to_3_operand")])))]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
 {
   int mask = 0;
   mask |= INTVAL (operands[2]) << 0;
@@ -1353,11 +1355,20 @@
   mask |= INTVAL (operands[5]) << 6;
   operands[2] = GEN_INT (mask);
 
-  return "pshufw\t{%2, %1, %0|%0, %1, %2}";
+  switch (which_alternative)
+{
+case 0:
+  return "pshufw\t{%2, %1, %0|%0, %1, %2}";
+case 1:
+  return "%vpshuflw\t{%2, %1, %0|%0, %1, %2}";
+default:
+  gcc_unreachable ();
+}
 }
-  [(set_attr "type" "mmxcvt")
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "mmxcvt,sselog")
(set_attr "length_immediate" "1")
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI")])
 
 (define_insn "mmx_pswapdv2si2"
   [(set (match_operand:V2SI 0 "register_operand" "=y")
@@ -1370,16 +1381,54 @@
(set_attr "prefix_extra" "1")
(set_attr "mode" "DI")])
 
-(define_insn "*vec_dupv4hi"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+(define_insn_and_split "*vec_dupv4hi"
+  [(set (match_operand:V4HI 0 "register_operand" "=y,Yv,Yw")
(vec_duplicate:V4HI
  (truncate:HI
-   (match_operand:SI 1 "register_operand" "0"))))]
-  "TARGET_SSE || TARGET_3DNOW_A"
-  "pshufw\t{$0, %0, %0|%0, %0, 0}"
-  [(set_attr "type" "mmxcvt")
-   (set_attr "length_immediate" "1")
-   (set_attr "mode" "DI")])
+   (match_operand:SI 1 "register_operand" "0,Yv,r"))))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "@
+   pshufw\t{$0, %0, %0|%0, %0, 0}
+   #
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(const_int 0)]
+{
+  rtx op;
+  operands[0] = lowpart_subreg (V8HImode, operands[0],
+   GET_MODE (operands[0]));
+  if (TARGET_AVX2)
+{
+  operands[1] = lowpart_subreg (HImode, operands[1],
+   GET_MODE (operands[1]));
+  op = gen_rtx_VEC_DUPLICATE (V8HImode, operands[1]);
+}
+  else
+{
+  operands[1] = lowpart_subreg (V8HImode, operands[1],
+   GET_MODE (operands[1]));
+  rtx mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (8,
+ GEN_INT (0),
+ GEN_INT (0),
+ GEN_INT (0),
+ GEN_INT (0),
+ GEN_INT (4),
+ GEN_INT (5),
+ GEN_INT (6),
+ GEN_INT (7)));
+
+  op = gen_rtx_VEC_SELECT (V8HImode, operands[1], mask);
+}
+  rtx insn = gen_rtx_SET (operands[0], op);
+  emit_insn (insn);
+  DONE;
+}
+  [(set_attr "mmx_isa" "native,x64,x64_avx")
+   (set_attr "type" "mmxcvt,sselog1,ssemov")
+   (set_attr "length_immediate" "1,1,0")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn_and_split "*vec_dupv2si"
   [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv,Yw")
-- 
2.20.1
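
As a reading aid for the pshufw patch above: mmx_pshufw_1 packs four 2-bit lane selectors into one 8-bit immediate, and the SSE alternative reuses the same immediate with pshuflw, which shuffles the low four words and passes the high four through. The C sketch below models that encoding on plain arrays. It is not GCC code; `pack_shuffle_imm`, `unpack_shuffle_imm` and `pshuflw_model` are hypothetical helpers, and the 2-bits-per-lane layout is assumed from the `<< 0` and `<< 6` shifts visible in the hunk.

```c
#include <assert.h>

/* Pack four 2-bit lane selectors into an 8-bit shuffle immediate,
   mirroring the mask computation in mmx_pshufw_1.  */
static int pack_shuffle_imm(const int sel[4])
{
    int mask = 0;
    for (int i = 0; i < 4; i++) {
        assert(sel[i] >= 0 && sel[i] <= 3);
        mask |= sel[i] << (2 * i);
    }
    return mask;
}

/* Inverse of pack_shuffle_imm: recover the four selectors.  */
static void unpack_shuffle_imm(int mask, int sel[4])
{
    for (int i = 0; i < 4; i++)
        sel[i] = (mask >> (2 * i)) & 3;
}

/* Model of pshuflw on eight 16-bit lanes: shuffle lanes 0..3 by the
   immediate, copy lanes 4..7 unchanged.  */
static void pshuflw_model(const unsigned short src[8], int imm,
                          unsigned short dst[8])
{
    int sel[4];
    unsigned short tmp[8];
    unpack_shuffle_imm(imm, sel);
    for (int i = 0; i < 4; i++)
        tmp[i] = src[sel[i]];
    for (int i = 4; i < 8; i++)
        tmp[i] = src[i];
    for (int i = 0; i < 8; i++)
        dst[i] = tmp[i];
}
```

In this model, immediate 0 broadcasts lane 0 across the low four words and leaves the rest alone, which is the same lane selection as the {0, 0, 0, 0, 4, 5, 6, 7} vec_select built in the non-AVX2 branch of the *vec_dupv4hi split.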



[PATCH 26/42] i386: Emulate MMX movntq with SSE2 movntidi

2019-02-15 Thread H.J. Lu
Emulate MMX movntq with SSE2 movntidi.  Only register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (sse_movntq): Add SSE2 emulation.
---
 gcc/config/i386/mmx.md | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 8ba8ca6ea45..427a037fa62 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -214,12 +214,16 @@
 })
 
 (define_insn "sse_movntq"
-  [(set (match_operand:DI 0 "memory_operand" "=m")
-   (unspec:DI [(match_operand:DI 1 "register_operand" "y")]
+  [(set (match_operand:DI 0 "memory_operand" "=m,m")
+   (unspec:DI [(match_operand:DI 1 "register_operand" "y,r")]
   UNSPEC_MOVNTQ))]
-  "TARGET_SSE || TARGET_3DNOW_A"
-  "movntq\t{%1, %0|%0, %1}"
-  [(set_attr "type" "mmxmov")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "@
+   movntq\t{%1, %0|%0, %1}
+   movnti\t{%1, %0|%0, %1}"
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "mmxmov,ssemov")
(set_attr "mode" "DI")])
 
 ;
-- 
2.20.1



[PATCH 15/42] i386: Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE.

PR target/89021
* config/i386/sse.md (sse_cvtps2pi): Add SSE emulation.
(sse_cvttps2pi): Likewise.
---
 gcc/config/i386/sse.md | 30 ++
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index c8e0133560a..083f9ef0f44 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4574,26 +4574,32 @@
(set_attr "mode" "V4SF")])
 
 (define_insn "sse_cvtps2pi"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+  [(set (match_operand:V2SI 0 "register_operand" "=y,Yv")
(vec_select:V2SI
- (unspec:V4SI [(match_operand:V4SF 1 "nonimmediate_operand" "xm")]
+ (unspec:V4SI [(match_operand:V4SF 1 "nonimmediate_operand" "xm,YvBm")]
   UNSPEC_FIX_NOTRUNC)
  (parallel [(const_int 0) (const_int 1)])))]
-  "TARGET_SSE"
-  "cvtps2pi\t{%1, %0|%0, %q1}"
-  [(set_attr "type" "ssecvt")
-   (set_attr "unit" "mmx")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE"
+  "@
+   cvtps2pi\t{%1, %0|%0, %q1}
+   %vcvtps2dq\t{%1, %0|%0, %1}"
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "ssecvt")
+   (set_attr "unit" "mmx,*")
(set_attr "mode" "DI")])
 
 (define_insn "sse_cvttps2pi"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+  [(set (match_operand:V2SI 0 "register_operand" "=y,Yv")
(vec_select:V2SI
- (fix:V4SI (match_operand:V4SF 1 "nonimmediate_operand" "xm"))
+ (fix:V4SI (match_operand:V4SF 1 "nonimmediate_operand" "xm,YvBm"))
  (parallel [(const_int 0) (const_int 1)])))]
-  "TARGET_SSE"
-  "cvttps2pi\t{%1, %0|%0, %q1}"
-  [(set_attr "type" "ssecvt")
-   (set_attr "unit" "mmx")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE"
+  "@
+   cvttps2pi\t{%1, %0|%0, %q1}
+   %vcvttps2dq\t{%1, %0|%0, %1}"
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "ssecvt")
+   (set_attr "unit" "mmx,*")
(set_attr "prefix_rep" "0")
(set_attr "mode" "SF")])
 
-- 
2.20.1



[PATCH 16/42] i386: Emulate MMX sse_cvtpi2ps with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX sse_cvtpi2ps with SSE2 cvtdq2ps, preserving upper 64 bits of
destination XMM register.  Only SSE register source operand is allowed.

PR target/89021
* config/i386/sse.md (sse_cvtpi2ps): Changed to
define_insn_and_split.  Also allow TARGET_MMX_WITH_SSE.  Add
SSE emulation.
---
 gcc/config/i386/sse.md | 64 --
 1 file changed, 56 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 083f9ef0f44..f37658630dd 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4561,16 +4561,64 @@
 ;;
 ;
 
-(define_insn "sse_cvtpi2ps"
-  [(set (match_operand:V4SF 0 "register_operand" "=x")
+(define_insn_and_split "sse_cvtpi2ps"
+  [(set (match_operand:V4SF 0 "register_operand" "=x,x,Yv")
(vec_merge:V4SF
  (vec_duplicate:V4SF
-   (float:V2SF (match_operand:V2SI 2 "nonimmediate_operand" "ym")))
- (match_operand:V4SF 1 "register_operand" "0")
- (const_int 3)))]
-  "TARGET_SSE"
-  "cvtpi2ps\t{%2, %0|%0, %2}"
-  [(set_attr "type" "ssecvt")
+   (float:V2SF (match_operand:V2SI 2 "mmx_nonimmediate_operand" "ym,x,Yv")))
+ (match_operand:V4SF 1 "register_operand" "0,0,Yv")
+ (const_int 3)))
+   (clobber (match_scratch:V4SF 3 "=X,x,Yv"))]
+  "TARGET_SSE || TARGET_MMX_WITH_SSE"
+  "@
+   cvtpi2ps\t{%2, %0|%0, %2}
+   #
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(const_int 0)]
+{
+  rtx op2 = lowpart_subreg (V4SImode, operands[2],
+   GET_MODE (operands[2]));
+  /* Generate SSE2 cvtdq2ps.  */
+  rtx insn = gen_floatv4siv4sf2 (operands[3], op2);
+  emit_insn (insn);
+
+  /* Merge operands[3] with operands[0].  */
+  rtx mask, op1;
+  if (TARGET_AVX)
+{
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (0), GEN_INT (1),
+ GEN_INT (6), GEN_INT (7)));
+  op1 = gen_rtx_VEC_CONCAT (V8SFmode, operands[3], operands[1]);
+  op2 = gen_rtx_VEC_SELECT (V4SFmode, op1, mask);
+  insn = gen_rtx_SET (operands[0], op2);
+}
+  else
+{
+  /* NB: SSE can only concatenate OP0 and OP3 to OP0.  */
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (2), GEN_INT (3),
+ GEN_INT (4), GEN_INT (5)));
+  op1 = gen_rtx_VEC_CONCAT (V8SFmode, operands[0], operands[3]);
+  op2 = gen_rtx_VEC_SELECT (V4SFmode, op1, mask);
+  insn = gen_rtx_SET (operands[0], op2);
+  emit_insn (insn);
+
+  /* Swap bits 0:63 with bits 64:127.  */
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (2), GEN_INT (3),
+ GEN_INT (0), GEN_INT (1)));
+  rtx dest = lowpart_subreg (V4SImode, operands[0],
+GET_MODE (operands[0]));
+  op1 = gen_rtx_VEC_SELECT (V4SImode, dest, mask);
+  insn = gen_rtx_SET (dest, op1);
+}
+  emit_insn (insn);
+  DONE;
+}
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "ssecvt")
(set_attr "mode" "V4SF")])
 
 (define_insn "sse_cvtps2pi"
-- 
2.20.1
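
As a reading aid for the sse_cvtpi2ps split above: the AVX path merges the converted value and the destination with one select, while the non-AVX path needs a select followed by a half swap. The C sketch below models both sequences on plain float arrays to show they produce the same lanes. It is not GCC code; `select_concat`, `merge_avx` and `merge_sse` are hypothetical helpers, the lane indices are copied from the `gen_rtvec` calls in the patch, and the sketch assumes operand 0 and operand 1 are the same register in the non-AVX alternative (the "0" constraint).

```c
#include <assert.h>
#include <string.h>

/* Model (vec_select (vec_concat lo hi) idx) on 4-lane float vectors.
   OUT may alias LO or HI because the inputs are copied first.  */
static void select_concat(const float lo[4], const float hi[4],
                          const int idx[4], float out[4])
{
    float cat[8], tmp[4];
    memcpy(cat, lo, 4 * sizeof(float));
    memcpy(cat + 4, hi, 4 * sizeof(float));
    for (int i = 0; i < 4; i++) {
        assert(idx[i] >= 0 && idx[i] < 8);
        tmp[i] = cat[idx[i]];
    }
    memcpy(out, tmp, sizeof(tmp));
}

/* AVX path: one select over concat (converted, dest), lanes 0,1,6,7.  */
static void merge_avx(const float converted[4], const float dest[4],
                      float out[4])
{
    static const int idx[4] = {0, 1, 6, 7};
    select_concat(converted, dest, idx, out);
}

/* Non-AVX path: select lanes 2,3,4,5 of concat (dest, converted) into
   dest, then swap the two 64-bit halves of dest (lanes 2,3,0,1).  */
static void merge_sse(const float converted[4], float dest[4])
{
    static const int pick[4] = {2, 3, 4, 5};
    static const int swap[4] = {2, 3, 0, 1};
    select_concat(dest, converted, pick, dest);
    select_concat(dest, dest, swap, dest);
}
```

Both paths leave the two converted values in lanes 0 and 1 and keep the original upper 64 bits of the destination in lanes 2 and 3; the upper two lanes of the converted temporary are discarded.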



[PATCH 31/42] i386: Emulate MMX ssse3_pmaddubsw with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX ssse3_pmaddubsw with SSE.  Only SSE register source operand
is allowed.

PR target/89021
* config/i386/sse.md (ssse3_pmaddubsw): Add SSE emulation.
---
 gcc/config/i386/sse.md | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index cb4a1c9fc59..f2dbb51c7fd 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15544,17 +15544,17 @@
(set_attr "mode" "TI")])
 
 (define_insn "ssse3_pmaddubsw"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
(ss_plus:V4HI
  (mult:V4HI
(zero_extend:V4HI
  (vec_select:V4QI
-   (match_operand:V8QI 1 "register_operand" "0")
+   (match_operand:V8QI 1 "register_operand" "0,0,Yv")
(parallel [(const_int 0) (const_int 2)
   (const_int 4) (const_int 6)])))
(sign_extend:V4HI
  (vec_select:V4QI
-   (match_operand:V8QI 2 "nonimmediate_operand" "ym")
+   (match_operand:V8QI 2 "nonimmediate_operand" "ym,x,Yv")
(parallel [(const_int 0) (const_int 2)
   (const_int 4) (const_int 6)]))))
  (mult:V4HI
@@ -15566,13 +15566,17 @@
  (vec_select:V4QI (match_dup 2)
(parallel [(const_int 1) (const_int 3)
   (const_int 5) (const_int 7)]))))))]
-  "TARGET_SSSE3"
-  "pmaddubsw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "sseiadd")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
+  "@
+   pmaddubsw\t{%2, %0|%0, %2}
+   pmaddubsw\t{%2, %0|%0, %2}
+   vpmaddubsw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "sseiadd")
(set_attr "atom_unit" "simul")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_mode_iterator PMULHRSW
   [V4HI V8HI (V16HI "TARGET_AVX2")])
-- 
2.20.1



[PATCH 34/42] i386: Emulate MMX ssse3_psign3 with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX ssse3_psign3 with SSE.  Only SSE register source operand
is allowed.

PR target/89021
* config/i386/sse.md (ssse3_psign3): Add SSE emulation.
---
 gcc/config/i386/sse.md | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 6fa9f383cd3..e8478e3f954 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15755,17 +15755,21 @@
(set_attr "mode" "")])
 
 (define_insn "ssse3_psign3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
(unspec:MMXMODEI
- [(match_operand:MMXMODEI 1 "register_operand" "0")
-  (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")]
+ [(match_operand:MMXMODEI 1 "register_operand" "0,0,Yv")
+  (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,x,Yv")]
  UNSPEC_PSIGN))]
-  "TARGET_SSSE3"
-  "psign\t{%2, %0|%0, %2}";
-  [(set_attr "type" "sselog1")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
+  "@
+   psign\t{%2, %0|%0, %2}
+   psign\t{%2, %0|%0, %2}
+   vpsign\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "sselog1")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "_palignr_mask"
   [(set (match_operand:VI1_AVX512 0 "register_operand" "=v")
-- 
2.20.1



[PATCH 39/42] i386: Allow MMX vector expanders with TARGET_MMX_WITH_SSE

2019-02-15 Thread H.J. Lu
PR target/89021
* config/i386/i386.c (ix86_expand_vector_init_duplicate): Set
mmx_ok to true if TARGET_MMX_WITH_SSE is true.
(ix86_expand_vector_init_one_nonzero): Likewise.
(ix86_expand_vector_init_one_var): Likewise.
(ix86_expand_vector_init_general): Likewise.
(ix86_expand_vector_init): Likewise.
(ix86_expand_vector_set): Likewise.
(ix86_expand_vector_extract): Likewise.
* config/i386/mmx.md (*vec_dupv2sf): Changed to
define_insn_and_split to support SSE emulation.
(*vec_extractv2sf_0): Likewise.
(*vec_extractv2sf_1): Likewise.
(*vec_extractv2si_0): Likewise.
(*vec_extractv2si_1): Likewise.
(*vec_extractv2si_zext_mem): Likewise.
(vec_setv2sf): Also allow TARGET_MMX_WITH_SSE.
(vec_extractv2sf_1 splitter): Likewise.
(vec_extractv2sfsf): Likewise.
(vec_setv2si): Likewise.
(vec_extractv2si_1 splitter): Likewise.
(vec_extractv2sisi): Likewise.
(vec_setv4hi): Likewise.
(vec_extractv4hihi): Likewise.
(vec_setv8qi): Likewise.
(vec_extractv8qiqi): Likewise.
---
 gcc/config/i386/i386.c |  8 +
 gcc/config/i386/mmx.md | 69 +++---
 2 files changed, 52 insertions(+), 25 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a76c17beece..25e0dc43a9e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -42620,6 +42620,7 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, machine_mode mode,
 {
   bool ok;
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2SImode:
@@ -42779,6 +42780,7 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode,
   bool use_vector_set = false;
   rtx (*gen_vec_set_0) (rtx, rtx, rtx) = NULL;
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2DImode:
@@ -42972,6 +42974,7 @@ ix86_expand_vector_init_one_var (bool mmx_ok, machine_mode mode,
   XVECEXP (const_vec, 0, one_var) = CONST0_RTX (GET_MODE_INNER (mode));
   const_vec = gen_rtx_CONST_VECTOR (mode, XVEC (const_vec, 0));
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2DFmode:
@@ -43357,6 +43360,7 @@ ix86_expand_vector_init_general (bool mmx_ok, machine_mode mode,
   machine_mode quarter_mode = VOIDmode;
   int n, i;
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2SFmode:
@@ -43556,6 +43560,8 @@ ix86_expand_vector_init (bool mmx_ok, rtx target, rtx vals)
   int i;
   rtx x;
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
+
   /* Handle first initialization from vector elts.  */
   if (n_elts != XVECLEN (vals, 0))
 {
@@ -43655,6 +43661,7 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx val, int elt)
   machine_mode mmode = VOIDmode;
   rtx (*gen_blendm) (rtx, rtx, rtx, rtx);
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2SFmode:
@@ -44010,6 +44017,7 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt)
   bool use_vec_extr = false;
   rtx tmp;
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2SImode:
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index c5c0c449aab..aaafae53469 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -555,14 +555,23 @@
(set_attr "prefix_extra" "1")
(set_attr "mode" "V2SF")])
 
-(define_insn "*vec_dupv2sf"
-  [(set (match_operand:V2SF 0 "register_operand" "=y")
+(define_insn_and_split "*vec_dupv2sf"
+  [(set (match_operand:V2SF 0 "register_operand" "=y,x,Yv")
(vec_duplicate:V2SF
- (match_operand:SF 1 "register_operand" "0")))]
-  "TARGET_MMX"
-  "punpckldq\t%0, %0"
-  [(set_attr "type" "mmxcvt")
-   (set_attr "mode" "DI")])
+ (match_operand:SF 1 "register_operand" "0,0,Yv")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   punpckldq\t%0, %0
+   #
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(set (match_dup 0)
+   (vec_duplicate:V4SF (match_dup 1)))]
+  "operands[0] = lowpart_subreg (V4SFmode, operands[0],
+GET_MODE (operands[0]));"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxcvt,ssemov,ssemov")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "*mmx_concatv2sf"
   [(set (match_operand:V2SF 0 "register_operand" "=y,y")
@@ -580,7 +589,7 @@
   [(match_operand:V2SF 0 "register_operand")
(match_operand:SF 1 "register_operand")
(match_operand 2 "const_int_operand")]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
 {
   ix86_expand_vector_set (false, operands[0], operands[1],
  INTVAL (operands[2]));
@@ -594,11 +603,13 @@
(vec_select:SF
  (match_operand:V2SF 1 "nonimmediate_operand" " xm,x,ym,y,m,m")
  (parallel [(const_int 0)])))]
-  "TARGET_MMX && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && !(MEM_

[PATCH 19/42] i386: Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/mmx.md (mmx_v4hi3): Also check TARGET_MMX
and TARGET_MMX_WITH_SSE.
(mmx_v8qi3): Likewise.
(smaxmin:v4hi3): New.
(umaxmin:v8qi3): Likewise.
(smaxmin:*mmx_v4hi3): Add SSE emulation.
(umaxmin:*mmx_v8qi3): Likewise.
---
 gcc/config/i386/mmx.md | 60 +++---
 1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 7bd97c28f71..8833c9f091b 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -925,38 +925,66 @@
 (smaxmin:V4HI
  (match_operand:V4HI 1 "nonimmediate_operand")
  (match_operand:V4HI 2 "nonimmediate_operand")))]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "ix86_fixup_binary_operands_no_copy (, V4HImode, operands);")
+
+(define_expand "v4hi3"
+  [(set (match_operand:V4HI 0 "register_operand")
+(smaxmin:V4HI
+ (match_operand:V4HI 1 "nonimmediate_operand")
+ (match_operand:V4HI 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (, V4HImode, operands);")
 
 (define_insn "*mmx_v4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
 (smaxmin:V4HI
- (match_operand:V4HI 1 "nonimmediate_operand" "%0")
- (match_operand:V4HI 2 "nonimmediate_operand" "ym")))]
-  "(TARGET_SSE || TARGET_3DNOW_A)
+ (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv")
+ (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)
&& ix86_binary_operator_ok (, V4HImode, operands)"
-  "pw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+  "@
+   pw\t{%2, %0|%0, %2}
+   pw\t{%2, %0|%0, %2}
+   vpw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_v8qi3"
   [(set (match_operand:V8QI 0 "register_operand")
 (umaxmin:V8QI
  (match_operand:V8QI 1 "nonimmediate_operand")
  (match_operand:V8QI 2 "nonimmediate_operand")))]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "ix86_fixup_binary_operands_no_copy (, V8QImode, operands);")
+
+(define_expand "v8qi3"
+  [(set (match_operand:V8QI 0 "register_operand")
+(umaxmin:V8QI
+ (match_operand:V8QI 1 "nonimmediate_operand")
+ (match_operand:V8QI 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (, V8QImode, operands);")
 
 (define_insn "*mmx_v8qi3"
-  [(set (match_operand:V8QI 0 "register_operand" "=y")
+  [(set (match_operand:V8QI 0 "register_operand" "=y,x,Yv")
 (umaxmin:V8QI
- (match_operand:V8QI 1 "nonimmediate_operand" "%0")
- (match_operand:V8QI 2 "nonimmediate_operand" "ym")))]
-  "(TARGET_SSE || TARGET_3DNOW_A)
+ (match_operand:V8QI 1 "nonimmediate_operand" "%0,0,Yv")
+ (match_operand:V8QI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)
&& ix86_binary_operator_ok (, V8QImode, operands)"
-  "pb\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+  "@
+   pb\t{%2, %0|%0, %2}
+   pb\t{%2, %0|%0, %2}
+   vpb\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "mmx_ashr3"
   [(set (match_operand:MMXMODE24 0 "register_operand" "=y,x,Yv")
-- 
2.20.1



[PATCH 22/42] i386: Emulate MMX maskmovq with SSE2 maskmovdqu

2019-02-15 Thread H.J. Lu
Emulate MMX maskmovq with SSE2 maskmovdqu for TARGET_MMX_WITH_SSE by
zero-extending source and mask operands to 128 bits.  Handle unmapped
bits 64:127 at memory address by adjusting source and mask operands
together with memory address.
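
The address adjustment described above can be modelled in portable C.
This is only a sketch, not the code in the patch: model_maskmovdqu and
model_maskmove_si64 are hypothetical names, 128-bit values are plain
byte arrays, and the left shift of the source and mask is expressed as
copying them to a byte offset inside a zeroed 16-byte buffer (byte
order is assumed to be little-endian, as on x86).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit byte-masked store: byte i is written
   only when the top bit of mask[i] is set.  */
static void
model_maskmovdqu (const uint8_t src[16], const uint8_t mask[16], uint8_t *p)
{
  for (int i = 0; i < 16; i++)
    if (mask[i] & 0x80)
      p[i] = src[i];
}

/* Hypothetical model of the emulated 64-bit masked store.  */
static void
model_maskmove_si64 (const uint8_t src[8], const uint8_t mask[8], uint8_t *p)
{
  /* Zero-extend source and mask to 128 bits.  */
  uint8_t src128[16] = { 0 };
  uint8_t mask128[16] = { 0 };

  /* Misalignment of P within a 16-byte block, capped at 8 bytes.  */
  uintptr_t offset = (uintptr_t) p & 0xf;
  if (offset > 8)
    offset = 8;
  assert (offset <= 8);

  /* Placing the 8 payload bytes at OFFSET models shifting the
     zero-extended values left by OFFSET bytes.  */
  memcpy (src128 + offset, src, 8);
  memcpy (mask128 + offset, mask, 8);

  /* Move the store address back by the same amount, so the payload
     still lands on the original 8 bytes.  */
  model_maskmovdqu (src128, mask128, p - offset);
}
```

With an adjustment of at most 8 bytes, the 16-byte window never extends
past the 16-byte block that holds the last byte the 64-bit store could
touch, and the mask bytes outside the payload are zero.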

PR target/89021
* config/i386/xmmintrin.h: Emulate MMX maskmovq with SSE2
maskmovdqu for __MMX_WITH_SSE__.
---
 gcc/config/i386/xmmintrin.h | 61 +
 1 file changed, 61 insertions(+)

diff --git a/gcc/config/i386/xmmintrin.h b/gcc/config/i386/xmmintrin.h
index 58284378514..a915f6c87d7 100644
--- a/gcc/config/i386/xmmintrin.h
+++ b/gcc/config/i386/xmmintrin.h
@@ -1165,7 +1165,68 @@ _m_pshufw (__m64 __A, int const __N)
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_maskmove_si64 (__m64 __A, __m64 __N, char *__P)
 {
+#ifdef __MMX_WITH_SSE__
+  /* Emulate MMX maskmovq with SSE2 maskmovdqu and handle unmapped bits
+ 64:127 at address __P.  */
+  typedef long long __v2di __attribute__ ((__vector_size__ (16)));
+  typedef char __v16qi __attribute__ ((__vector_size__ (16)));
+  /* Zero-extend __A and __N to 128 bits.  */
+  __v2di __A128 = __extension__ (__v2di) { ((__v1di) __A)[0], 0 };
+  __v2di __N128 = __extension__ (__v2di) { ((__v1di) __N)[0], 0 };
+
+  /* Check the alignment of __P.  */
+  __SIZE_TYPE__ offset = ((__SIZE_TYPE__) __P) & 0xf;
+  if (offset)
+{
+  /* If the misalignment of __P > 8, subtract __P by 8 bytes.
+Otherwise, subtract __P by the misalignment.  */
+  if (offset > 8)
+   offset = 8;
+  __P = (char *) (((__SIZE_TYPE__) __P) - offset);
+
+  /* Shift __A128 and __N128 to the left by the adjustment.  */
+  switch (offset)
+   {
+   case 1:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 8);
+ break;
+   case 2:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 2 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 2 * 8);
+ break;
+   case 3:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 3 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 3 * 8);
+ break;
+   case 4:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 4 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 4 * 8);
+ break;
+   case 5:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 5 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 5 * 8);
+ break;
+   case 6:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 6 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 6 * 8);
+ break;
+   case 7:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 7 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 7 * 8);
+ break;
+   case 8:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 8 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 8 * 8);
+ break;
+   default:
+ break;
+   }
+}
+  __builtin_ia32_maskmovdqu ((__v16qi)__A128, (__v16qi)__N128, __P);
+#else
   __builtin_ia32_maskmovq ((__v8qi)__A, (__v8qi)__N, __P);
+#endif
 }
 
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-- 
2.20.1



[PATCH 35/42] i386: Emulate MMX ssse3_palignrdi with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX version of palignrq with SSE version by concatenating 2
64-bit MMX operands into a single 128-bit SSE operand, followed by
SSE psrldq.  Only SSE register source operand is allowed.
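
As a reference for what the concatenate-then-shift sequence has to
compute, here is a small scalar model of the 64-bit palignr result.  It
is a sketch under assumptions: model_palignr64 is a hypothetical name,
the shift count is given in bytes (the pattern's operand 3 is a bit
count that is a multiple of 8), and the first source operand is taken
as the high half of the 128-bit concatenation.

```c
#include <stdint.h>

/* Hypothetical scalar model: concatenate HI:LO into 128 bits, shift
   right by NBYTES bytes and return the low 64 bits.  */
static uint64_t
model_palignr64 (uint64_t hi, uint64_t lo, unsigned nbytes)
{
  if (nbytes == 0)
    return lo;
  if (nbytes < 8)
    /* Low bytes come from LO, the vacated high bytes from HI.  */
    return (lo >> (8 * nbytes)) | (hi << (64 - 8 * nbytes));
  if (nbytes < 16)
    /* LO is shifted out completely; only HI contributes.  */
    return hi >> (8 * (nbytes - 8));
  /* Everything is shifted out.  */
  return 0;
}
```

Each branch keeps the shift count below 64, which avoids undefined
behaviour in C for the 0, 8 and 16-byte cases.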

PR target/89021
* config/i386/sse.md (ssse3_palignrdi): Changed to
define_insn_and_split to support SSE emulation.
---
 gcc/config/i386/sse.md | 58 ++
 1 file changed, 48 insertions(+), 10 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index e8478e3f954..e17f395688b 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15824,23 +15824,61 @@
(set_attr "prefix" "orig,vex,evex")
(set_attr "mode" "")])
 
-(define_insn "ssse3_palignrdi"
-  [(set (match_operand:DI 0 "register_operand" "=y")
-   (unspec:DI [(match_operand:DI 1 "register_operand" "0")
-   (match_operand:DI 2 "nonimmediate_operand" "ym")
-   (match_operand:SI 3 "const_0_to_255_mul_8_operand" "n")]
+(define_insn_and_split "ssse3_palignrdi"
+  [(set (match_operand:DI 0 "register_operand" "=y,x,Yv")
+   (unspec:DI [(match_operand:DI 1 "register_operand" "0,0,Yv")
+   (match_operand:DI 2 "nonimmediate_operand" "ym,x,Yv")
+   (match_operand:SI 3 "const_0_to_255_mul_8_operand" "n,n,n")]
   UNSPEC_PALIGNR))]
-  "TARGET_SSSE3"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
 {
-  operands[3] = GEN_INT (INTVAL (operands[3]) / 8);
-  return "palignr\t{%3, %2, %0|%0, %2, %3}";
+  switch (which_alternative)
+{
+case 0:
+  operands[3] = GEN_INT (INTVAL (operands[3]) / 8);
+  return "palignr\t{%3, %2, %0|%0, %2, %3}";
+case 1:
+case 2:
+  return "#";
+default:
+  gcc_unreachable ();
+}
 }
-  [(set_attr "type" "sseishft")
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(set (match_dup 0)
+   (lshiftrt:V1TI (match_dup 0) (match_dup 3)))]
+{
+  /* Emulate MMX palignrdi with SSE psrldq.  */
+  rtx op0 = lowpart_subreg (V2DImode, operands[0],
+   GET_MODE (operands[0]));
+  rtx insn;
+  if (TARGET_AVX)
+insn = gen_vec_concatv2di (op0, operands[2], operands[1]);
+  else
+{
+  /* NB: SSE can only concatenate OP0 and OP1 to OP0.  */
+  insn = gen_vec_concatv2di (op0, operands[1], operands[2]);
+  emit_insn (insn);
+  /* Swap bits 0:63 with bits 64:127.  */
+  rtx mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (2),
+ GEN_INT (3),
+ GEN_INT (0),
+ GEN_INT (1)));
+  rtx op1 = lowpart_subreg (V4SImode, op0, GET_MODE (op0));
+  rtx op2 = gen_rtx_VEC_SELECT (V4SImode, op1, mask);
+  insn = gen_rtx_SET (op1, op2);
+}
+  emit_insn (insn);
+  operands[0] = lowpart_subreg (V1TImode, op0, GET_MODE (op0));
+}
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "sseishft")
(set_attr "atom_unit" "sishuf")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 ;; Mode iterator to handle singularity w/ absence of V2DI and V4DI
 ;; modes for abs instruction on pre AVX-512 targets.
-- 
2.20.1



[PATCH 37/42] Prevent allocation of MMX registers with TARGET_MMX_WITH_SSE

2019-02-15 Thread H.J. Lu
From: Uros Bizjak 

2019-02-14  Uroš Bizjak  

PR target/89021
* config/i386/i386.md (*zero_extendsidi2): Add mmx_isa attribute.
* config/i386/sse.md (*vec_concatv2sf_sse4_1): Ditto.
(*vec_concatv2sf_sse): Ditto.
(*vec_concatv2si_sse4_1): Ditto.
(*vec_concatv2si): Ditto.
(*vec_concatv4si_0): Ditto.
(*vec_concatv2di_0): Ditto.
---
 gcc/config/i386/i386.md |  4 
 gcc/config/i386/sse.md  | 16 ++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e1727676deb..22172fd77a8 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3682,6 +3682,10 @@
  (const_string "avx512bw")
   ]
   (const_string "*")))
+   (set (attr "mmx_isa")
+ (if_then_else (eq_attr "alternative" "5,6")
+  (const_string "native")
+  (const_string "*")))
(set (attr "type")
  (cond [(eq_attr "alternative" "0,1,2,4")
  (const_string "multi")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 0174778833a..203111c4c9e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -7201,6 +7201,10 @@
  (const_string "mmxmov")
   ]
   (const_string "sselog")))
+   (set (attr "mmx_isa")
+ (if_then_else (eq_attr "alternative" "7,8")
+  (const_string "native")
+  (const_string "*")))
(set (attr "prefix_data16")
  (if_then_else (eq_attr "alternative" "3,4")
   (const_string "1")
@@ -7236,7 +7240,8 @@
movss\t{%1, %0|%0, %1}
punpckldq\t{%2, %0|%0, %2}
movd\t{%1, %0|%0, %1}"
-  [(set_attr "type" "sselog,ssemov,mmxcvt,mmxmov")
+  [(set_attr "mmx_isa" "*,*,native,native")
+   (set_attr "type" "sselog,ssemov,mmxcvt,mmxmov")
(set_attr "mode" "V4SF,SF,DI,DI")])
 
 (define_insn "*vec_concatv4sf"
@@ -14509,6 +14514,10 @@
punpckldq\t{%2, %0|%0, %2}
movd\t{%1, %0|%0, %1}"
   [(set_attr "isa" "noavx,noavx,avx,avx512dq,noavx,noavx,avx,*,*,*")
+   (set (attr "mmx_isa")
+ (if_then_else (eq_attr "alternative" "8,9")
+  (const_string "native")
+  (const_string "*")))
(set (attr "type")
  (cond [(eq_attr "alternative" "7")
  (const_string "ssemov")
@@ -14546,6 +14555,7 @@
punpckldq\t{%2, %0|%0, %2}
movd\t{%1, %0|%0, %1}"
   [(set_attr "isa" "sse2,sse2,*,*,*,*")
+   (set_attr "mmx_isa" "*,*,*,*,native,native")
(set_attr "type" "sselog,ssemov,sselog,ssemov,mmxcvt,mmxmov")
(set_attr "mode" "TI,TI,V4SF,SF,DI,DI")])
 
@@ -14575,7 +14585,8 @@
   "@
%vmovq\t{%1, %0|%0, %1}
movq2dq\t{%1, %0|%0, %1}"
-  [(set_attr "type" "ssemov")
+  [(set_attr "mmx_isa" "*,native")
+   (set_attr "type" "ssemov")
(set_attr "prefix" "maybe_vex,orig")
(set_attr "mode" "TI")])
 
@@ -14650,6 +14661,7 @@
%vmovq\t{%1, %0|%0, %1}
movq2dq\t{%1, %0|%0, %1}"
   [(set_attr "isa" "x64,*,*")
+   (set_attr "mmx_isa" "*,*,native")
(set_attr "type" "ssemov")
(set_attr "prefix_rex" "1,*,*")
(set_attr "prefix" "maybe_vex,maybe_vex,orig")
-- 
2.20.1



[PATCH 17/42] i386: Emulate MMX mmx_pextrw with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX mmx_pextrw with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (mmx_pextrw): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 3ea64e9aabe..678eaa713dc 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1310,16 +1310,18 @@
(set_attr "mode" "DI")])
 
 (define_insn "mmx_pextrw"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=r,r")
 (zero_extend:SI
  (vec_select:HI
-   (match_operand:V4HI 1 "register_operand" "y")
-   (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n")]]
-  "TARGET_SSE || TARGET_3DNOW_A"
-  "pextrw\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "mmxcvt")
+   (match_operand:V4HI 1 "register_operand" "y,Yv")
+   (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n")]]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "%vpextrw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "mmxcvt,sselog1")
(set_attr "length_immediate" "1")
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI")])
 
 (define_expand "mmx_pshufw"
   [(match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 21/42] i386: Emulate MMX mmx_umulv4hi3_highpart with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX mmx_umulv4hi3_highpart with SSE.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/mmx.md (mmx_umulv4hi3_highpart): Also check
TARGET_MMX and TARGET_MMX_WITH_SSE.
(*mmx_umulv4hi3_highpart): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 1adb50aa4b1..940f022464d 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -785,24 +785,30 @@
  (zero_extend:V4SI
(match_operand:V4HI 2 "nonimmediate_operand")))
(const_int 16]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
   "ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands);")
 
 (define_insn "*mmx_umulv4hi3_highpart"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
(truncate:V4HI
  (lshiftrt:V4SI
(mult:V4SI
  (zero_extend:V4SI
-   (match_operand:V4HI 1 "nonimmediate_operand" "%0"))
+   (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv"))
  (zero_extend:V4SI
-   (match_operand:V4HI 2 "nonimmediate_operand" "ym")))
+   (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")))
  (const_int 16]
-  "(TARGET_SSE || TARGET_3DNOW_A)
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)
&& ix86_binary_operator_ok (MULT, V4HImode, operands)"
-  "pmulhuw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxmul")
-   (set_attr "mode" "DI")])
+  "@
+   pmulhuw\t{%2, %0|%0, %2}
+   pmulhuw\t{%2, %0|%0, %2}
+   vpmulhuw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxmul,ssemul,ssemul")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_pmaddwd"
   [(set (match_operand:V2SI 0 "register_operand")
-- 
2.20.1



[PATCH 32/42] i386: Emulate MMX ssse3_pmulhrswv4hi3 with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX ssse3_pmulhrswv4hi3 with SSE.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/sse.md (*ssse3_pmulhrswv4hi3): Add SSE emulation.
---
 gcc/config/i386/sse.md | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index f2dbb51c7fd..2b91f8f5839 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15652,25 +15652,31 @@
(set_attr "mode" "")])
 
 (define_insn "*ssse3_pmulhrswv4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
(truncate:V4HI
  (lshiftrt:V4SI
(plus:V4SI
  (lshiftrt:V4SI
(mult:V4SI
  (sign_extend:V4SI
-   (match_operand:V4HI 1 "nonimmediate_operand" "%0"))
+   (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv"))
  (sign_extend:V4SI
-   (match_operand:V4HI 2 "nonimmediate_operand" "ym")))
+   (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")))
(const_int 14))
  (match_operand:V4HI 3 "const1_operand"))
(const_int 1]
-  "TARGET_SSSE3 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
-  "pmulhrsw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "sseimul")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && TARGET_SSSE3
+   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "@
+   pmulhrsw\t{%2, %0|%0, %2}
+   pmulhrsw\t{%2, %0|%0, %2}
+   vpmulhrsw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "sseimul")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "_pshufb3"
   [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,x,v")
-- 
2.20.1



[PATCH 36/42] i386: Emulate MMX abs2 with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX abs2 with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/sse.md (abs2): Add SSE emulation.
---
 gcc/config/i386/sse.md | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index e17f395688b..0174778833a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15942,16 +15942,19 @@
 })
 
 (define_insn "abs2"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,Yv")
(abs:MMXMODEI
- (match_operand:MMXMODEI 1 "nonimmediate_operand" "ym")))]
-  "TARGET_SSSE3"
-  "pabs\t{%1, %0|%0, %1}";
-  [(set_attr "type" "sselog1")
+ (match_operand:MMXMODEI 1 "nonimmediate_operand" "ym,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
+  "@
+   pabs\t{%1, %0|%0, %1}
+   %vpabs\t{%1, %0|%0, %1}"
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "sselog1")
(set_attr "prefix_rep" "0")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI")])
 
 ;
 ;;
-- 
2.20.1



[PATCH 41/42] i386: Enable TM MMX intrinsics with SSE2

2019-02-15 Thread H.J. Lu
This patch enables TM MMX intrinsics with SSE2 when MMX is disabled.

PR target/89021
* config/i386/i386.c (bdesc_tm): Enable MMX intrinsics with
SSE2.
---
 gcc/config/i386/i386.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 073a2534d1f..319a98f824a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -31065,13 +31065,13 @@ static const struct builtin_description 
bdesc_##kind[] =  \
we're lazy.  Add casts to make them fit.  */
 static const struct builtin_description bdesc_tm[] =
 {
-  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin__ITM_WM64", (enum 
ix86_builtins) BUILT_IN_TM_STORE_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
-  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin__ITM_WaRM64", (enum 
ix86_builtins) BUILT_IN_TM_STORE_WAR_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
-  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin__ITM_WaWM64", (enum 
ix86_builtins) BUILT_IN_TM_STORE_WAW_M64, UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
-  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin__ITM_RM64", (enum 
ix86_builtins) BUILT_IN_TM_LOAD_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
-  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin__ITM_RaRM64", (enum 
ix86_builtins) BUILT_IN_TM_LOAD_RAR_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
-  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin__ITM_RaWM64", (enum 
ix86_builtins) BUILT_IN_TM_LOAD_RAW_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
-  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin__ITM_RfWM64", (enum 
ix86_builtins) BUILT_IN_TM_LOAD_RFW_M64, UNKNOWN, V2SI_FTYPE_PCV2SI },
+  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
"__builtin__ITM_WM64", (enum ix86_builtins) BUILT_IN_TM_STORE_M64, UNKNOWN, 
VOID_FTYPE_PV2SI_V2SI },
+  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
"__builtin__ITM_WaRM64", (enum ix86_builtins) BUILT_IN_TM_STORE_WAR_M64, 
UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
+  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
"__builtin__ITM_WaWM64", (enum ix86_builtins) BUILT_IN_TM_STORE_WAW_M64, 
UNKNOWN, VOID_FTYPE_PV2SI_V2SI },
+  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
"__builtin__ITM_RM64", (enum ix86_builtins) BUILT_IN_TM_LOAD_M64, UNKNOWN, 
V2SI_FTYPE_PCV2SI },
+  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
"__builtin__ITM_RaRM64", (enum ix86_builtins) BUILT_IN_TM_LOAD_RAR_M64, 
UNKNOWN, V2SI_FTYPE_PCV2SI },
+  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
"__builtin__ITM_RaWM64", (enum ix86_builtins) BUILT_IN_TM_LOAD_RAW_M64, 
UNKNOWN, V2SI_FTYPE_PCV2SI },
+  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
"__builtin__ITM_RfWM64", (enum ix86_builtins) BUILT_IN_TM_LOAD_RFW_M64, 
UNKNOWN, V2SI_FTYPE_PCV2SI },
 
   { OPTION_MASK_ISA_SSE, 0, CODE_FOR_nothing, "__builtin__ITM_WM128", (enum 
ix86_builtins) BUILT_IN_TM_STORE_M128, UNKNOWN, VOID_FTYPE_PV4SF_V4SF },
   { OPTION_MASK_ISA_SSE, 0, CODE_FOR_nothing, "__builtin__ITM_WaRM128", (enum 
ix86_builtins) BUILT_IN_TM_STORE_WAR_M128, UNKNOWN, VOID_FTYPE_PV4SF_V4SF },
@@ -31089,7 +31089,7 @@ static const struct builtin_description bdesc_tm[] =
   { OPTION_MASK_ISA_AVX, 0, CODE_FOR_nothing, "__builtin__ITM_RaWM256", (enum 
ix86_builtins) BUILT_IN_TM_LOAD_RAW_M256, UNKNOWN, V8SF_FTYPE_PCV8SF },
   { OPTION_MASK_ISA_AVX, 0, CODE_FOR_nothing, "__builtin__ITM_RfWM256", (enum 
ix86_builtins) BUILT_IN_TM_LOAD_RFW_M256, UNKNOWN, V8SF_FTYPE_PCV8SF },
 
-  { OPTION_MASK_ISA_MMX, 0, CODE_FOR_nothing, "__builtin__ITM_LM64", (enum 
ix86_builtins) BUILT_IN_TM_LOG_M64, UNKNOWN, VOID_FTYPE_PCVOID },
+  { OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_nothing, 
"__builtin__ITM_LM64", (enum ix86_builtins) BUILT_IN_TM_LOG_M64, UNKNOWN, 
VOID_FTYPE_PCVOID },
   { OPTION_MASK_ISA_SSE, 0, CODE_FOR_nothing, "__builtin__ITM_LM128", (enum 
ix86_builtins) BUILT_IN_TM_LOG_M128, UNKNOWN, VOID_FTYPE_PCVOID },
   { OPTION_MASK_ISA_AVX, 0, CODE_FOR_nothing, "__builtin__ITM_LM256", (enum 
ix86_builtins) BUILT_IN_TM_LOG_M256, UNKNOWN, VOID_FTYPE_PCVOID },
 };
-- 
2.20.1



[PATCH 23/42] i386: Emulate MMX mmx_uavgv8qi3 with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX mmx_uavgv8qi3 with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (mmx_uavgv8qi3): Also check TARGET_MMX
and TARGET_MMX_WITH_SSE.
(*mmx_uavgv8qi3): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 940f022464d..0bd87ba79e8 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1684,42 +1684,47 @@
  (const_int 1) (const_int 1)
  (const_int 1) (const_int 1)]))
(const_int 1]
-  "TARGET_SSE || TARGET_3DNOW"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
   "ix86_fixup_binary_operands_no_copy (PLUS, V8QImode, operands);")
 
 (define_insn "*mmx_uavgv8qi3"
-  [(set (match_operand:V8QI 0 "register_operand" "=y")
+  [(set (match_operand:V8QI 0 "register_operand" "=y,x,Yv")
(truncate:V8QI
  (lshiftrt:V8HI
(plus:V8HI
  (plus:V8HI
(zero_extend:V8HI
- (match_operand:V8QI 1 "nonimmediate_operand" "%0"))
+ (match_operand:V8QI 1 "nonimmediate_operand" "%0,0,Yv"))
(zero_extend:V8HI
- (match_operand:V8QI 2 "nonimmediate_operand" "ym")))
+ (match_operand:V8QI 2 "nonimmediate_operand" "ym,x,Yv")))
  (const_vector:V8HI [(const_int 1) (const_int 1)
  (const_int 1) (const_int 1)
  (const_int 1) (const_int 1)
  (const_int 1) (const_int 1)]))
(const_int 1]
-  "(TARGET_SSE || TARGET_3DNOW)
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)
&& ix86_binary_operator_ok (PLUS, V8QImode, operands)"
 {
   /* These two instructions have the same operation, but their encoding
  is different.  Prefer the one that is de facto standard.  */
-  if (TARGET_SSE || TARGET_3DNOW_A)
+  if (TARGET_MMX_WITH_SSE && TARGET_AVX)
+return "vpavgb\t{%2, %1, %0|%0, %1, %2}";
+  else if (TARGET_SSE || TARGET_3DNOW_A)
 return "pavgb\t{%2, %0|%0, %2}";
   else
 return "pavgusb\t{%2, %0|%0, %2}";
 }
-  [(set_attr "type" "mmxshft")
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxshft,sseiadd,sseiadd")
(set (attr "prefix_extra")
  (if_then_else
(not (ior (match_test "TARGET_SSE")
 (match_test "TARGET_3DNOW_A")))
(const_string "1")
(const_string "*")))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_uavgv4hi3"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 27/42] i386: Emulate MMX umulv1siv1di3 with SSE2

2019-02-15 Thread H.J. Lu
Emulate MMX umulv1siv1di3 with SSE2.  Only SSE register source operand
is allowed.

PR target/89021
* config/i386/mmx.md (sse2_umulv1siv1di3): Add SSE emulation
support.
(*sse2_umulv1siv1di3): Add SSE2 emulation.
---
 gcc/config/i386/mmx.md | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 427a037fa62..d662663a445 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -911,24 +911,30 @@
(vec_select:V1SI
  (match_operand:V2SI 2 "nonimmediate_operand")
  (parallel [(const_int 0)])]
-  "TARGET_SSE2"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE2"
   "ix86_fixup_binary_operands_no_copy (MULT, V2SImode, operands);")
 
 (define_insn "*sse2_umulv1siv1di3"
-  [(set (match_operand:V1DI 0 "register_operand" "=y")
+  [(set (match_operand:V1DI 0 "register_operand" "=y,x,Yv")
 (mult:V1DI
  (zero_extend:V1DI
(vec_select:V1SI
- (match_operand:V2SI 1 "nonimmediate_operand" "%0")
+ (match_operand:V2SI 1 "nonimmediate_operand" "%0,0,Yv")
  (parallel [(const_int 0)])))
  (zero_extend:V1DI
(vec_select:V1SI
- (match_operand:V2SI 2 "nonimmediate_operand" "ym")
+ (match_operand:V2SI 2 "nonimmediate_operand" "ym,x,Yv")
  (parallel [(const_int 0)])]
-  "TARGET_SSE2 && ix86_binary_operator_ok (MULT, V2SImode, operands)"
-  "pmuludq\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxmul")
-   (set_attr "mode" "DI")])
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && TARGET_SSE2
+   && ix86_binary_operator_ok (MULT, V2SImode, operands)"
+  "@
+   pmuludq\t{%2, %0|%0, %2}
+   pmuludq\t{%2, %0|%0, %2}
+   vpmuludq\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxmul,ssemul,ssemul")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_v4hi3"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 25/42] i386: Emulate MMX mmx_psadbw with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX mmx_psadbw with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (mmx_psadbw): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 456d1a51c50..8ba8ca6ea45 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1768,14 +1768,19 @@
(set_attr "mode" "DI,TI,TI")])
 
 (define_insn "mmx_psadbw"
-  [(set (match_operand:V1DI 0 "register_operand" "=y")
-(unspec:V1DI [(match_operand:V8QI 1 "register_operand" "0")
- (match_operand:V8QI 2 "nonimmediate_operand" "ym")]
+  [(set (match_operand:V1DI 0 "register_operand" "=y,x,Yv")
+(unspec:V1DI [(match_operand:V8QI 1 "register_operand" "0,0,Yv")
+ (match_operand:V8QI 2 "nonimmediate_operand" "ym,x,Yv")]
 UNSPEC_PSADBW))]
-  "TARGET_SSE || TARGET_3DNOW_A"
-  "psadbw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxshft")
-   (set_attr "mode" "DI")])
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "@
+   psadbw\t{%2, %0|%0, %2}
+   psadbw\t{%2, %0|%0, %2}
+   vpsadbw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxshft,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn_and_split "mmx_pmovmskb"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
-- 
2.20.1



[PATCH 40/42] i386: Allow MMX intrinsic emulation with SSE

2019-02-15 Thread H.J. Lu
Allow MMX intrinsic emulation with SSE/SSE2/SSSE3.  Don't enable MMX ISA
by default with TARGET_MMX_WITH_SSE.

For pr82483-1.c and pr82483-2.c, "-mssse3 -mno-mmx" compiles in 64-bit
mode since MMX intrinsics can be emulated with SSE.

gcc/

PR target/89021
* config/i386/i386-builtin.def: Enable MMX intrinsics with
SSE/SSE2/SSSE3.
* config/i386/i386.c (ix86_init_mmx_sse_builtins): Likewise.
(ix86_expand_builtin): Allow SSE/SSE2/SSSE3 to emulate MMX
intrinsics with TARGET_MMX_WITH_SSE.
* config/i386/mmintrin.h: Only require SSE2 if __MMX_WITH_SSE__
is defined.

gcc/testsuite/

PR target/89021
* gcc.target/i386/pr82483-1.c: Error only on ia32.
* gcc.target/i386/pr82483-2.c: Likewise.
---
 gcc/config/i386/i386-builtin.def  | 126 +++---
 gcc/config/i386/i386.c|  29 -
 gcc/config/i386/mmintrin.h|  12 ++-
 gcc/testsuite/gcc.target/i386/pr82483-1.c |   2 +-
 gcc/testsuite/gcc.target/i386/pr82483-2.c |   2 +-
 5 files changed, 101 insertions(+), 70 deletions(-)

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 88005f4687f..10a9d631f29 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -100,7 +100,7 @@ BDESC (0, 0, CODE_FOR_fnstsw, "__builtin_ia32_fnstsw", 
IX86_BUILTIN_FNSTSW, UNKN
 BDESC (0, 0, CODE_FOR_fnclex, "__builtin_ia32_fnclex", IX86_BUILTIN_FNCLEX, 
UNKNOWN, (int) VOID_FTYPE_VOID)
 
 /* MMX */
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_emms, "__builtin_ia32_emms", 
IX86_BUILTIN_EMMS, UNKNOWN, (int) VOID_FTYPE_VOID)
+BDESC (OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_mmx_emms, 
"__builtin_ia32_emms", IX86_BUILTIN_EMMS, UNKNOWN, (int) VOID_FTYPE_VOID)
 
 /* 3DNow! */
 BDESC (OPTION_MASK_ISA_3DNOW, 0, CODE_FOR_mmx_femms, "__builtin_ia32_femms", 
IX86_BUILTIN_FEMMS, UNKNOWN, (int) VOID_FTYPE_VOID)
@@ -442,68 +442,68 @@ BDESC (0, 0, CODE_FOR_rotrqi3, "__builtin_ia32_rorqi", 
IX86_BUILTIN_RORQI, UNKNO
 BDESC (0, 0, CODE_FOR_rotrhi3, "__builtin_ia32_rorhi", IX86_BUILTIN_RORHI, 
UNKNOWN, (int) UINT16_FTYPE_UINT16_INT)
 
 /* MMX */
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_addv8qi3, "__builtin_ia32_paddb", 
IX86_BUILTIN_PADDB, UNKNOWN, (int) V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_addv4hi3, "__builtin_ia32_paddw", 
IX86_BUILTIN_PADDW, UNKNOWN, (int) V4HI_FTYPE_V4HI_V4HI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_addv2si3, "__builtin_ia32_paddd", 
IX86_BUILTIN_PADDD, UNKNOWN, (int) V2SI_FTYPE_V2SI_V2SI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_subv8qi3, "__builtin_ia32_psubb", 
IX86_BUILTIN_PSUBB, UNKNOWN, (int) V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_subv4hi3, "__builtin_ia32_psubw", 
IX86_BUILTIN_PSUBW, UNKNOWN, (int) V4HI_FTYPE_V4HI_V4HI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_subv2si3, "__builtin_ia32_psubd", 
IX86_BUILTIN_PSUBD, UNKNOWN, (int) V2SI_FTYPE_V2SI_V2SI)
-
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_ssaddv8qi3, 
"__builtin_ia32_paddsb", IX86_BUILTIN_PADDSB, UNKNOWN, (int) 
V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_ssaddv4hi3, 
"__builtin_ia32_paddsw", IX86_BUILTIN_PADDSW, UNKNOWN, (int) 
V4HI_FTYPE_V4HI_V4HI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_sssubv8qi3, 
"__builtin_ia32_psubsb", IX86_BUILTIN_PSUBSB, UNKNOWN, (int) 
V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_sssubv4hi3, 
"__builtin_ia32_psubsw", IX86_BUILTIN_PSUBSW, UNKNOWN, (int) 
V4HI_FTYPE_V4HI_V4HI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_usaddv8qi3, 
"__builtin_ia32_paddusb", IX86_BUILTIN_PADDUSB, UNKNOWN, (int) 
V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_usaddv4hi3, 
"__builtin_ia32_paddusw", IX86_BUILTIN_PADDUSW, UNKNOWN, (int) 
V4HI_FTYPE_V4HI_V4HI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_ussubv8qi3, 
"__builtin_ia32_psubusb", IX86_BUILTIN_PSUBUSB, UNKNOWN, (int) 
V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_ussubv4hi3, 
"__builtin_ia32_psubusw", IX86_BUILTIN_PSUBUSW, UNKNOWN, (int) 
V4HI_FTYPE_V4HI_V4HI)
-
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_mulv4hi3, "__builtin_ia32_pmullw", 
IX86_BUILTIN_PMULLW, UNKNOWN, (int) V4HI_FTYPE_V4HI_V4HI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_smulv4hi3_highpart, 
"__builtin_ia32_pmulhw", IX86_BUILTIN_PMULHW, UNKNOWN, (int) 
V4HI_FTYPE_V4HI_V4HI)
-
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_andv2si3, "__builtin_ia32_pand", 
IX86_BUILTIN_PAND, UNKNOWN, (int) V2SI_FTYPE_V2SI_V2SI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_andnotv2si3, 
"__builtin_ia32_pandn", IX86_BUILTIN_PANDN, UNKNOWN, (int) V2SI_FTYPE_V2SI_V2SI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_iorv2si3, "__builtin_ia32_por", 
IX86_BUILTIN_POR, UNKNOWN, (int) V2SI_FTYPE_V2SI_V2SI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_xorv2si3, "__builtin_ia32_pxor", 
IX86_BUILTIN_PXOR, UNKNOWN, (int) V2SI_FTYPE_V2SI_V2SI

[PATCH 24/42] i386: Emulate MMX mmx_uavgv4hi3 with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX mmx_uavgv4hi3 with SSE.  Only SSE register source operand is
allowed.
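
For readers unfamiliar with pavgw, the RTL in the pattern below (zero_extend,
plus, plus 1, lshiftrt 1, truncate) describes a rounding unsigned average.
The following is only a scalar sketch of that arithmetic, not code from the
patch; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one pavgw lane: widen both inputs, add them, add 1
   for rounding, shift right by one and truncate back to 16 bits.  */
static uint16_t avg_round_u16(uint16_t a, uint16_t b)
{
  uint32_t wide = (uint32_t) a + (uint32_t) b + 1u; /* fits in 32 bits */
  return (uint16_t) (wide >> 1);
}

/* Apply the lane model to four 16-bit elements (a V4HI-sized vector).  */
static void uavg_v4hi(uint16_t dst[4], const uint16_t a[4],
                      const uint16_t b[4])
{
  for (int i = 0; i < 4; i++)
    dst[i] = avg_round_u16(a[i], b[i]);
}

/* Small self-check of the rounding behaviour (hypothetical helper).  */
static void uavg_self_check(void)
{
  assert(avg_round_u16(0, 0) == 0);
  assert(avg_round_u16(0, 1) == 1);           /* rounds up */
  assert(avg_round_u16(0xffff, 0xffff) == 0xffff);
}
```

Each lane is independent, so the same instruction on the low 64 bits of an
SSE register should give the same four results as the MMX form.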

PR target/89021
* config/i386/mmx.md (mmx_uavgv4hi3): Also check TARGET_MMX and
TARGET_MMX_WITH_SSE.
(*mmx_uavgv4hi3): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 0bd87ba79e8..456d1a51c50 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1739,27 +1739,33 @@
  (const_vector:V4SI [(const_int 1) (const_int 1)
  (const_int 1) (const_int 1)]))
(const_int 1]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
   "ix86_fixup_binary_operands_no_copy (PLUS, V4HImode, operands);")
 
 (define_insn "*mmx_uavgv4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
(truncate:V4HI
  (lshiftrt:V4SI
(plus:V4SI
  (plus:V4SI
(zero_extend:V4SI
- (match_operand:V4HI 1 "nonimmediate_operand" "%0"))
+ (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv"))
(zero_extend:V4SI
- (match_operand:V4HI 2 "nonimmediate_operand" "ym")))
+ (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")))
  (const_vector:V4SI [(const_int 1) (const_int 1)
  (const_int 1) (const_int 1)]))
(const_int 1]
-  "(TARGET_SSE || TARGET_3DNOW_A)
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)
&& ix86_binary_operator_ok (PLUS, V4HImode, operands)"
-  "pavgw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxshft")
-   (set_attr "mode" "DI")])
+  "@
+   pavgw\t{%2, %0|%0, %2}
+   pavgw\t{%2, %0|%0, %2}
+   vpavgw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxshft,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "mmx_psadbw"
   [(set (match_operand:V1DI 0 "register_operand" "=y")
-- 
2.20.1



[PATCH 28/42] i386: Make _mm_empty () as NOP when MMX is disabled

2019-02-15 Thread H.J. Lu
With SSE emulation of MMX intrinsics, we should make _mm_empty () a NOP
when MMX is disabled.

PR target/89021
* config/i386/mmx.md (EMMS): Also allow TARGET_MMX_WITH_SSE.
(mmx_): Generate "" only when MMX is enabled.
---
 gcc/config/i386/mmx.md | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index d662663a445..eaca71d5750 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1839,7 +1839,7 @@
(set_attr "mode" "DI")])
 
 (define_int_iterator EMMS
-  [(UNSPECV_EMMS "TARGET_MMX")
+  [(UNSPECV_EMMS "TARGET_MMX || TARGET_MMX_WITH_SSE")
(UNSPECV_FEMMS "TARGET_3DNOW")])
 
 (define_int_attr emms
@@ -1865,7 +1865,9 @@
(clobber (reg:DI MM6_REG))
(clobber (reg:DI MM7_REG))]
   ""
-  ""
+{
+  return TARGET_MMX ? "" : "";
+}
   [(set_attr "type" "mmx")
(set_attr "modrm" "0")
(set_attr "memory" "none")])
-- 
2.20.1



[PATCH 33/42] i386: Emulate MMX pshufb with SSE version

2019-02-15 Thread H.J. Lu
Emulate MMX version of pshufb with SSE version by masking out the bit 3
of the shuffle control byte.  Only SSE register source operand is allowed.
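
Why masking bit 3 is enough can be seen from a scalar model of the two
instruction forms.  This is a sketch under the assumption that the 64-bit
form uses index bits 0..2 and the 128-bit form uses index bits 0..3, with
bit 7 zeroing the lane in both; the function names are hypothetical and are
not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 64-bit (MMX) pshufb model: bit 7 of a control byte zeroes the lane,
   otherwise bits 0..2 select a source byte.  */
static void pshufb_mmx(uint8_t dst[8], const uint8_t src[8],
                       const uint8_t ctl[8])
{
  uint8_t out[8];
  for (int i = 0; i < 8; i++)
    out[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x07];
  memcpy(dst, out, sizeof out);
}

/* 128-bit (SSE) pshufb model: same rule, but bits 0..3 select.  */
static void pshufb_sse(uint8_t dst[16], const uint8_t src[16],
                       const uint8_t ctl[16])
{
  uint8_t out[16];
  for (int i = 0; i < 16; i++)
    out[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0f];
  memcpy(dst, out, sizeof out);
}

/* Emulation: AND every control byte with 0xf7 (clears bit 3, keeps
   bit 7), run the 128-bit form, keep the low 8 result bytes.  */
static void pshufb_mmx_via_sse(uint8_t dst[8], const uint8_t src[16],
                               const uint8_t ctl[16])
{
  uint8_t masked[16], wide[16];
  for (int i = 0; i < 16; i++)
    masked[i] = ctl[i] & 0xf7;
  pshufb_sse(wide, src, masked);
  memcpy(dst, wide, 8);
}

/* Compare both forms for every possible control byte value.  The upper
   8 source bytes stand in for stale contents of the SSE register.  */
static void pshufb_self_check(void)
{
  uint8_t src[16], ctl[16], want[8], got[8];
  for (int i = 0; i < 16; i++)
    src[i] = (uint8_t) (0x10 + i);
  for (int c = 0; c < 256; c++)
    {
      memset(ctl, c, sizeof ctl);
      pshufb_mmx(want, src, ctl);
      pshufb_mmx_via_sse(got, src, ctl);
      assert(memcmp(want, got, sizeof want) == 0);
    }
}
```

Without the mask, a control byte such as 0x0f would select byte 15 of the
SSE register instead of byte 7 of the 64-bit operand.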

PR target/89021
* config/i386/sse.md (ssse3_pshufbv8qi3): Changed to
define_insn_and_split.  Also allow TARGET_MMX_WITH_SSE.  Add
SSE emulation.
---
 gcc/config/i386/sse.md | 46 +-
 1 file changed, 37 insertions(+), 9 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 2b91f8f5839..6fa9f383cd3 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15697,17 +15697,45 @@
(set_attr "btver2_decode" "vector")
(set_attr "mode" "")])
 
-(define_insn "ssse3_pshufbv8qi3"
-  [(set (match_operand:V8QI 0 "register_operand" "=y")
-   (unspec:V8QI [(match_operand:V8QI 1 "register_operand" "0")
- (match_operand:V8QI 2 "nonimmediate_operand" "ym")]
-UNSPEC_PSHUFB))]
-  "TARGET_SSSE3"
-  "pshufb\t{%2, %0|%0, %2}";
-  [(set_attr "type" "sselog1")
+(define_insn_and_split "ssse3_pshufbv8qi3"
+  [(set (match_operand:V8QI 0 "register_operand" "=y,x,Yv")
+   (unspec:V8QI [(match_operand:V8QI 1 "register_operand" "0,0,Yv")
+ (match_operand:V8QI 2 "mmx_nonimmediate_operand" 
"ym,x,Yv")]
+UNSPEC_PSHUFB))
+   (clobber (match_scratch:V4SI 3 "=X,x,Yv"))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
+  "@
+   pshufb\t{%2, %0|%0, %2}
+   #
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(set (match_dup 3) (match_dup 5))
+   (set (match_dup 3)
+   (and:V4SI (match_dup 3) (match_dup 2)))
+   (set (match_dup 0)
+   (unspec:V16QI [(match_dup 1) (match_dup 4)] UNSPEC_PSHUFB))]
+{
+  /* Emulate MMX version of pshufb with SSE version by masking out the
+ bit 3 of the shuffle control byte.  */
+  operands[0] = lowpart_subreg (V16QImode, operands[0],
+   GET_MODE (operands[0]));
+  operands[1] = lowpart_subreg (V16QImode, operands[1],
+   GET_MODE (operands[1]));
+  operands[2] = lowpart_subreg (V4SImode, operands[2],
+   GET_MODE (operands[2]));
+  operands[4] = lowpart_subreg (V16QImode, operands[3],
+   GET_MODE (operands[3]));
+  rtvec par = gen_rtvec (4, GEN_INT (0xf7f7f7f7),
+GEN_INT (0xf7f7f7f7),
+GEN_INT (0xf7f7f7f7),
+GEN_INT (0xf7f7f7f7));
+  rtx vec_const = gen_rtx_CONST_VECTOR (V4SImode, par);
+  operands[5] = force_const_mem (V4SImode, vec_const);
+}
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "_psign3"
   [(set (match_operand:VI124_AVX2 0 "register_operand" "=x,x")
-- 
2.20.1



[PATCH 38/42] i386: Allow MMXMODE moves with TARGET_MMX_WITH_SSE

2019-02-15 Thread H.J. Lu
PR target/89021
* config/i386/mmx.md (MMXMODE:mov): Also allow
TARGET_MMX_WITH_SSE.
(MMXMODE:*mov_internal): Likewise.
(MMXMODE:movmisalign): Likewise.
---
 gcc/config/i386/mmx.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index eaca71d5750..c5c0c449aab 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -70,7 +70,7 @@
 (define_expand "mov"
   [(set (match_operand:MMXMODE 0 "nonimmediate_operand")
(match_operand:MMXMODE 1 "nonimmediate_operand"))]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
 {
   ix86_expand_vector_move (mode, operands);
   DONE;
@@ -81,7 +81,7 @@
 "=r ,o ,r,r ,m ,?!y,!y,?!y,m  ,r  ,?!y,v,v,v,m,r,v,!y,*x")
(match_operand:MMXMODE 1 "nonimm_or_0_operand"
 "rCo,rC,C,rm,rC,C  ,!y,m  ,?!y,?!y,r  ,C,v,m,v,v,r,*x,!y"))]
-  "TARGET_MMX
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
&& !(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -207,7 +207,7 @@
 (define_expand "movmisalign"
   [(set (match_operand:MMXMODE 0 "nonimmediate_operand")
(match_operand:MMXMODE 1 "nonimmediate_operand"))]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
 {
   ix86_expand_vector_move (mode, operands);
   DONE;
-- 
2.20.1



[PATCH 20/42] i386: Emulate MMX mmx_pmovmskb with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX mmx_pmovmskb with SSE by zero-extending result of SSE pmovmskb
from QImode to SImode.  Only SSE register source operand is allowed.
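
The reason for the extra zero-extension can be illustrated with a scalar
model: the SSE instruction collects sign bits from all 16 bytes, so bits
8..15 of its result depend on the unused upper half of the register.  This
is only a sketch with hypothetical function names, not the patch's code.

```c
#include <assert.h>
#include <stdint.h>

/* SSE pmovmskb model: gather the top bit of each of the 16 bytes into
   a 16-bit mask, returned in a 32-bit value.  */
static uint32_t pmovmskb_sse(const uint8_t v[16])
{
  uint32_t mask = 0;
  for (int i = 0; i < 16; i++)
    mask |= (uint32_t) (v[i] >> 7) << i;
  return mask;
}

/* Emulated 64-bit form: take the low byte of the SSE result (the
   QImode lowpart) and zero-extend it to 32 bits.  */
static uint32_t pmovmskb_mmx_via_sse(const uint8_t v[16])
{
  uint8_t low = (uint8_t) pmovmskb_sse(v);
  return (uint32_t) low;
}

/* Self-check with garbage in the upper 8 bytes (hypothetical helper).  */
static void pmovmskb_self_check(void)
{
  const uint8_t v[16] = { 0x80, 0, 0x80, 0, 0, 0, 0, 0xff,
                          0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
  assert(pmovmskb_sse(v) == 0xff85u);
  assert(pmovmskb_mmx_via_sse(v) == 0x85u);
}
```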

PR target/89021
* config/i386/mmx.md (mmx_pmovmskb): Changed to
define_insn_and_split to support SSE emulation.
---
 gcc/config/i386/mmx.md | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 8833c9f091b..1adb50aa4b1 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1760,14 +1760,30 @@
   [(set_attr "type" "mmxshft")
(set_attr "mode" "DI")])
 
-(define_insn "mmx_pmovmskb"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (unspec:SI [(match_operand:V8QI 1 "register_operand" "y")]
+(define_insn_and_split "mmx_pmovmskb"
+  [(set (match_operand:SI 0 "register_operand" "=r,r")
+   (unspec:SI [(match_operand:V8QI 1 "register_operand" "y,x")]
   UNSPEC_MOVMSK))]
-  "TARGET_SSE || TARGET_3DNOW_A"
-  "pmovmskb\t{%1, %0|%0, %1}"
-  [(set_attr "type" "mmxcvt")
-   (set_attr "mode" "DI")])
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "@
+   pmovmskb\t{%1, %0|%0, %1}
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(set (match_dup 0)
+(unspec:SI [(match_dup 1)] UNSPEC_MOVMSK))
+   (set (match_dup 0)
+   (zero_extend:SI (match_dup 2)))]
+{
+  /* Generate SSE pmovmskb and zero-extend from QImode to SImode.  */
+  operands[1] = lowpart_subreg (V16QImode, operands[1],
+   GET_MODE (operands[1]));
+  operands[2] = lowpart_subreg (QImode, operands[0],
+   GET_MODE (operands[0]));
+}
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "mmxcvt,ssemov")
+   (set_attr "mode" "DI,TI")])
 
 (define_expand "mmx_maskmovq"
   [(set (match_operand:V8QI 0 "memory_operand")
-- 
2.20.1



[PATCH 10/42] i386: Emulate MMX 3 with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX 3 with SSE.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/mmx.md (any_logic:3): New.
(any_logic:*mmx_3): Also allow TARGET_MMX_WITH_SSE.
Add SSE support.
---
 gcc/config/i386/mmx.md | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index eef17504616..7a253005aba 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1066,15 +1066,28 @@
   "TARGET_MMX"
   "ix86_fixup_binary_operands_no_copy (, mode, operands);")
 
+(define_expand "3"
+  [(set (match_operand:MMXMODEI 0 "register_operand")
+   (any_logic:MMXMODEI
+ (match_operand:MMXMODEI 1 "nonimmediate_operand")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
+  "ix86_fixup_binary_operands_no_copy (, mode, operands);")
+
 (define_insn "*mmx_3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
 (any_logic:MMXMODEI
- (match_operand:MMXMODEI 1 "nonimmediate_operand" "%0")
- (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX && ix86_binary_operator_ok (, mode, operands)"
-  "p\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+ (match_operand:MMXMODEI 1 "nonimmediate_operand" "%0,0,Yv")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (, mode, operands)"
+  "@
+   p\t{%2, %0|%0, %2}
+   p\t{%2, %0|%0, %2}
+   vp\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sselog,sselog")
+   (set_attr "mode" "DI,TI,TI")])
 
 ;
 ;;
-- 
2.20.1



[PATCH 30/42] i386: Emulate MMX ssse3_phdv2si3 with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX ssse3_phdv2si3 with SSE by moving bits
64:95 to bits 32:63 in SSE register.  Only SSE register source operand
is allowed.
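
To see why bits 64:95 have to move, consider a scalar model of the add
variant.  The sketch below assumes the 128-bit horizontal add produces
{a0+a1, a2+a3, b0+b1, b2+b3}; the function names are hypothetical and only
the add case of plusminus is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit horizontal add model on four 32-bit lanes.  */
static void phaddd_v4si(uint32_t dst[4], const uint32_t a[4],
                        const uint32_t b[4])
{
  uint32_t out[4] = { a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3] };
  for (int i = 0; i < 4; i++)
    dst[i] = out[i];
}

/* Emulated 64-bit (V2SI) form: only lanes 0 and 1 of A and B are
   meaningful.  After the 128-bit operation the wanted b0+b1 sits in
   lane 2 (bits 64:95), so copy it down to lane 1 (bits 32:63).  */
static void phaddd_v2si_via_sse(uint32_t dst[4], const uint32_t a[4],
                                const uint32_t b[4])
{
  phaddd_v4si(dst, a, b);
  dst[1] = dst[2];
}

/* Self-check; lanes 2 and 3 of the inputs hold arbitrary values.  */
static void phaddd_self_check(void)
{
  const uint32_t a[4] = { 1, 2, 0xdead, 0xbeef };
  const uint32_t b[4] = { 10, 20, 0xdead, 0xbeef };
  uint32_t d[4];
  phaddd_v2si_via_sse(d, a, b);
  assert(d[0] == 3 && d[1] == 30);
}
```

Only the low two lanes of the result are meaningful afterwards, which
matches the V2SI destination of the pattern.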

PR target/89021
* config/i386/sse.md (ssse3_phdv2si3):
Changed to define_insn_and_split to support SSE emulation.
---
 gcc/config/i386/sse.md | 34 ++
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 1c31a1fbad0..cb4a1c9fc59 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15356,26 +15356,44 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "TI")])
 
-(define_insn "ssse3_phdv2si3"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+(define_insn_and_split "ssse3_phdv2si3"
+  [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
(vec_concat:V2SI
  (plusminus:SI
(vec_select:SI
- (match_operand:V2SI 1 "register_operand" "0")
+ (match_operand:V2SI 1 "register_operand" "0,0,Yv")
  (parallel [(const_int 0)]))
(vec_select:SI (match_dup 1) (parallel [(const_int 1)])))
  (plusminus:SI
(vec_select:SI
- (match_operand:V2SI 2 "nonimmediate_operand" "ym")
+ (match_operand:V2SI 2 "nonimmediate_operand" "ym,x,Yv")
  (parallel [(const_int 0)]))
(vec_select:SI (match_dup 2) (parallel [(const_int 1)])]
-  "TARGET_SSSE3"
-  "phd\t{%2, %0|%0, %2}"
-  [(set_attr "type" "sseiadd")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
+  "@
+   phd\t{%2, %0|%0, %2}
+   #
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(const_int 0)]
+{
+  /* Generate SSE version of the operation.  */
+  rtx op0 = lowpart_subreg (V4SImode, operands[0],
+   GET_MODE (operands[0]));
+  rtx op1 = lowpart_subreg (V4SImode, operands[1],
+   GET_MODE (operands[1]));
+  rtx op2 = lowpart_subreg (V4SImode, operands[2],
+   GET_MODE (operands[2]));
+  emit_insn (gen_ssse3_phdv4si3 (op0, op1, op2));
+  ix86_move_vector_high_sse_to_mmx (op0);
+  DONE;
+}
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "sseiadd")
(set_attr "atom_unit" "complex")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "avx2_pmaddubsw256"
   [(set (match_operand:V16HI 0 "register_operand" "=x,v")
-- 
2.20.1



[PATCH 29/42] i386: Emulate MMX ssse3_phwv4hi3 with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX ssse3_phwv4hi3 with SSE by moving bits
64:95 to bits 32:63 in SSE register.  Only SSE register source operand
is allowed.

PR target/89021
* config/i386/sse.md (ssse3_phwv4hi3):
Changed to define_insn_and_split to support SSE emulation.
---
 gcc/config/i386/sse.md | 34 ++
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index f37658630dd..1c31a1fbad0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15232,13 +15232,13 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "TI")])
 
-(define_insn "ssse3_phwv4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+(define_insn_and_split "ssse3_phwv4hi3"
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
(vec_concat:V4HI
  (vec_concat:V2HI
(ssse3_plusminus:HI
  (vec_select:HI
-   (match_operand:V4HI 1 "register_operand" "0")
+   (match_operand:V4HI 1 "register_operand" "0,0,Yv")
(parallel [(const_int 0)]))
  (vec_select:HI (match_dup 1) (parallel [(const_int 1)])))
(ssse3_plusminus:HI
@@ -15247,19 +15247,37 @@
  (vec_concat:V2HI
(ssse3_plusminus:HI
  (vec_select:HI
-   (match_operand:V4HI 2 "nonimmediate_operand" "ym")
+   (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")
(parallel [(const_int 0)]))
  (vec_select:HI (match_dup 2) (parallel [(const_int 1)])))
(ssse3_plusminus:HI
  (vec_select:HI (match_dup 2) (parallel [(const_int 2)]))
  (vec_select:HI (match_dup 2) (parallel [(const_int 3)]))]
-  "TARGET_SSSE3"
-  "phw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "sseiadd")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
+  "@
+   phw\t{%2, %0|%0, %2}
+   #
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(const_int 0)]
+{
+  /* Generate SSE version of the operation.  */
+  rtx op0 = lowpart_subreg (V8HImode, operands[0],
+   GET_MODE (operands[0]));
+  rtx op1 = lowpart_subreg (V8HImode, operands[1],
+   GET_MODE (operands[1]));
+  rtx op2 = lowpart_subreg (V8HImode, operands[2],
+   GET_MODE (operands[2]));
+  emit_insn (gen_ssse3_phwv8hi3 (op0, op1, op2));
+  ix86_move_vector_high_sse_to_mmx (op0);
+  DONE;
+}
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "sseiadd")
(set_attr "atom_unit" "complex")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "avx2_phdv8si3"
   [(set (match_operand:V8SI 0 "register_operand" "=x")
-- 
2.20.1



[PR 89330] Avoid adding dead speculative edges to inlining heap

2019-02-15 Thread Martin Jambor
Hi,

Martin discovered that the inliner was adding deleted call graph edges
to its heap when supposedly processing newly discovered direct edges.
The problem is that the speculation part of the indirect inlining
machinery created speculative edges that check_speculations() removed
immediately afterwards, once it figured out that the edges were not
speculation_useful_p().

The fix below avoids creating such non-speculation_useful_p edges in the
first place.  The edge is not useful because it cannot be inlined, since
the callee calls comdat local functions.  I had to split
can_inline_edge_p into two functions so that the caller and callee
checks can be performed before actually creating an edge.
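
As a rough illustration of the "check first, create later" shape (this is a
simplified sketch, not the GCC code; every type and function name in it is
hypothetical and only two of the real checks are mirrored):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the call graph node and edge types.  */
enum inline_failure
{
  FAIL_NONE,
  FAIL_BODY_NOT_AVAILABLE,
  FAIL_USES_COMDAT_LOCAL
};

struct fn_node
{
  bool has_definition;
  bool calls_comdat_local;
};

struct call_edge
{
  const struct fn_node *caller;
  const struct fn_node *callee;
  bool speculative;
};

/* Node-only check: needs just the caller and callee, so it can run
   before any edge exists.  */
static enum inline_failure
why_not_inlinable(const struct fn_node *caller, const struct fn_node *callee)
{
  (void) caller;
  if (!callee->has_definition)
    return FAIL_BODY_NOT_AVAILABLE;
  if (callee->calls_comdat_local)
    return FAIL_USES_COMDAT_LOCAL;
  return FAIL_NONE;
}

/* Fill *EDGE and return true only when the call could be inlined;
   otherwise leave *EDGE untouched so nothing needs removing later.  */
static bool
try_make_speculative_edge(struct call_edge *edge,
                          const struct fn_node *caller,
                          const struct fn_node *callee)
{
  assert(edge && caller && callee);
  if (why_not_inlinable(caller, callee) != FAIL_NONE)
    return false;
  edge->caller = caller;
  edge->callee = callee;
  edge->speculative = true;
  return true;
}
```

With this shape there is no window in which a useless edge exists and could
be picked up by another consumer such as the inliner's heap.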

I think this is safe and beneficial to commit now, maybe with the
exception of the newly added assert in add_new_edges_to_heap, since
inlining apparently can cope with such nonsensical edges in the heap.
But in that case I'd add the assert in the next stage1.

Bootstrapped and tested on x86_64-linux.  IIUC, Martin even
LTO-bootstrapped it.  OK for trunk?

Thanks,

Martin



2019-02-15  Martin Jambor  

PR ipa/89330
* ipa-inline.c (can_inline_edge_p): Move most of the checks...
(call_not_inlinable_p): ...to this new function.
(add_new_edges_to_heap): Assert a caller is known.
* ipa-inline.h (call_not_inlinable_p): Declare.
* ipa-prop.c: Include ipa-inline.h.
(try_make_edge_direct_virtual_call): Create speculative edges only
if there is any chance of inlining them.

testsuite/
* g++.dg/lto/pr89330_[01].C: New test.
---
 gcc/ipa-inline.c | 128 ---
 gcc/ipa-inline.h |   4 +-
 gcc/ipa-prop.c   |   8 +-
 gcc/testsuite/g++.dg/lto/pr89330_0.C |  50 +++
 gcc/testsuite/g++.dg/lto/pr89330_1.C |  36 
 5 files changed, 154 insertions(+), 72 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/lto/pr89330_0.C
 create mode 100644 gcc/testsuite/g++.dg/lto/pr89330_1.C

diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 360c3de3289..ae330943571 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -299,12 +299,60 @@ sanitize_attrs_match_for_inline_p (const_tree caller, 
const_tree callee)
   (opts_for_fn (caller->decl)->x_##flag\
!= opts_for_fn (callee->decl)->x_##flag)
 
+/* Return CIF_OK if a call from CALLER to CALLEE is or would be inlineable.
+   Otherwise, return the reason why it cannot.  EARLY should be set when
+   deciding about early inlining.  */
+
+enum cgraph_inline_failed_t
+call_not_inlinable_p (cgraph_node *caller, cgraph_node *callee,
+ bool early)
+{
+  enum availability avail;
+  caller = caller->global.inlined_to ? caller->global.inlined_to : caller;
+  callee = callee->ultimate_alias_target (&avail, caller);
+
+  if (!callee->definition)
+return CIF_BODY_NOT_AVAILABLE;
+  if (!early && (!opt_for_fn (callee->decl, optimize)
+|| !opt_for_fn (caller->decl, optimize)))
+return CIF_FUNCTION_NOT_OPTIMIZED;
+  else if (callee->calls_comdat_local)
+return CIF_USES_COMDAT_LOCAL;
+  else if (avail <= AVAIL_INTERPOSABLE)
+return CIF_OVERWRITABLE;
+  /* Don't inline if the functions have different EH personalities.  */
+  else if (DECL_FUNCTION_PERSONALITY (caller->decl)
+  && DECL_FUNCTION_PERSONALITY (callee->decl)
+  && (DECL_FUNCTION_PERSONALITY (caller->decl)
+  != DECL_FUNCTION_PERSONALITY (callee->decl)))
+return CIF_EH_PERSONALITY;
+  /* TM pure functions should not be inlined into non-TM_pure
+ functions.  */
+  else if (is_tm_pure (callee->decl) && !is_tm_pure (caller->decl))
+return CIF_UNSPECIFIED;
+  /* Check compatibility of target optimization options.  */
+  else if (!targetm.target_option.can_inline_p (caller->decl,
+   callee->decl))
+return CIF_TARGET_OPTION_MISMATCH;
+  else if (ipa_fn_summaries->get (callee) == NULL
+  || !ipa_fn_summaries->get (callee)->inlinable)
+return CIF_FUNCTION_NOT_INLINABLE;
+  /* Don't inline a function with mismatched sanitization attributes. */
+  else if (!sanitize_attrs_match_for_inline_p (caller->decl, callee->decl))
+return CIF_ATTRIBUTE_MISMATCH;
+  else if (callee->externally_visible
+  && flag_live_patching == LIVE_PATCHING_INLINE_ONLY_STATIC)
+return CIF_EXTERN_LIVE_ONLY_STATIC;
+  return CIF_OK;
+}
+
 /* Decide if we can inline the edge and possibly update
inline_failed reason.  
We check whether inlining is possible at all and whether
caller growth limits allow doing so.  
 
-   if REPORT is true, output reason to the dump file. */
+   If REPORT is true, output reason to the dump file.  EARLY should be set when
+   deciding about early inlining.  */
 
 static bool
 can_inline_edge_p (struct cgraph_edge *e, bool report,
@@ -319,81 +367,22 @@ can_inline_edge_p (struct cgraph_

Re: Go patch committed: Harmonize types referenced by both C and Go

2019-02-15 Thread Ian Lance Taylor
On Fri, Feb 15, 2019 at 4:03 AM Rainer Orth  
wrote:
>
> Andreas Schwab  writes:
>
> > This breaks non-split-stack builds.
> >
> > ../../../libgo/runtime/stack.c: In function 'doscanstack1':
> > ../../../libgo/runtime/stack.c:113:18: error: passing argument 1 of
> > 'scanstackblock' makes integer from pointer without a cast
> > [-Werror=int-conversion]
> >   113 |   scanstackblock(bottom, (uintptr)(top - bottom), gcw);
> >   |  ^~
> >   |  |
> >   |  byte * {aka unsigned char *}
>
> I see the same on Solaris.  Even with that fixed by appropriate casts to
> uintptr (plus a few more times), Solaris bootstrap is still broken by
> that patch:
>
> /vol/gcc/src/hg/trunk/local/libgo/runtime/go-varargs.c: In function 
> '__go_syscall6':
> /vol/gcc/src/hg/trunk/local/libgo/runtime/go-varargs.c:101:10: error: 
> implicit declaration of function 'syscall' 
> [-Werror=implicit-function-declaration]
>   101 |   return syscall (flag, a1, a2, a3, a4, a5, a6);
>   |  ^~~
>
> This needs to include  for the syscall declaration, apart
> from the fundamental problem that syscall isn't a stable interface on
> Solaris.

I committed this patch which should fix the Solaris build.

The code was already calling syscall, it was just doing it in a way
that the types didn't necessarily match the C declaration.  This is
the implementation of Go's syscall.Syscall function, so there isn't
really anything else we can do.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 268939)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-a9c1a76e14b66a356d3c3dfb50f1e6138e97733c
+6877c95a5f44c3ab4f492d2000ce07771341d7b7
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/go-varargs.c
===
--- libgo/runtime/go-varargs.c  (revision 268923)
+++ libgo/runtime/go-varargs.c  (working copy)
@@ -12,6 +12,12 @@
 #include 
 #include 
 #include 
+#ifdef HAVE_SYSCALL_H
+#include 
+#endif
+#ifdef HAVE_SYS_SYSCALL_H
+#include 
+#endif
 
 /* The syscall package calls C functions.  The Go compiler can not
represent a C varargs functions.  On some systems it's important


libgo patch committed: Add S/390 support to internal/cpu package

2019-02-15 Thread Ian Lance Taylor
This patch by Robin Dapp adds S/390 support to the internal/cpu
package.  This partially addresses PR 89123.  I bootstrapped it on
x86_64-pc-linux-gnu, which means little.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 268940)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-6877c95a5f44c3ab4f492d2000ce07771341d7b7
+0563f2d018cdb2cd685c254bac5ceb38396d0a27
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/internal/cpu/cpu_gccgo.c
===
--- libgo/go/internal/cpu/cpu_gccgo.c   (revision 268369)
+++ libgo/go/internal/cpu/cpu_gccgo.c   (working copy)
@@ -70,3 +70,118 @@ struct xgetbv_ret xgetbv(void) {
 #pragma GCC pop_options
 
 #endif /* defined(__i386__) || defined(__x86_64__)  */
+
+#ifdef __s390__
+
+struct facilityList {
+   uint64_t bits[4];
+};
+
+struct queryResult {
+   uint64_t bits[2];
+};
+
+struct facilityList stfle(void)
+  __asm__(GOSYM_PREFIX "internal..z2fcpu.stfle")
+  __attribute__((no_split_stack));
+
+struct facilityList stfle(void) {
+struct facilityList ret;
+__asm__ ("la%%r1, %[ret]\t\n"
+"lghi  %%r0, 3\t\n" // last doubleword index to store
+"xc0(32,%%r1), 0(%%r1)\t\n" // clear 4 doublewords (32 bytes)
+".long 0xb2b01000\t\n"  // store facility list extended (STFLE)
+:[ret] "=Q" (ret) : : "r0", "r1", "cc");
+return ret;
+}
+
+struct queryResult kmQuery(void)
+  __asm__(GOSYM_PREFIX "internal..z2fcpu.kmQuery")
+  __attribute__((no_split_stack));
+
+struct queryResult kmQuery() {
+struct queryResult ret;
+
+__asm__ ("lghi   %%r0, 0\t\n" // set function code to 0 (KM-Query)
+"la %%r1, %[ret]\t\n"
+".long  0xb92e0024\t\n" // cipher message (KM)
+:[ret] "=Q" (ret) : : "r0", "r1", "cc");
+return ret;
+}
+
+struct queryResult kmcQuery(void)
+  __asm__(GOSYM_PREFIX "internal..z2fcpu.kmcQuery")
+  __attribute__((no_split_stack));
+
+struct queryResult kmcQuery() {
+struct queryResult ret;
+
+__asm__ ("lghi   %%r0, 0\t\n" // set function code to 0 (KMC-Query)
+"la %%r1, %[ret]\t\n"
+".long  0xb92f0024\t\n"  // cipher message with chaining (KMC)
+:[ret] "=Q" (ret) : : "r0", "r1", "cc");
+
+return ret;
+}
+
+struct queryResult kmctrQuery(void)
+  __asm__(GOSYM_PREFIX "internal..z2fcpu.kmctrQuery")
+  __attribute__((no_split_stack));
+
+struct queryResult kmctrQuery() {
+struct queryResult ret;
+
+__asm__ ("lghi   %%r0, 0\t\n" // set function code to 0 (KMCTR-Query)
+"la %%r1, %[ret]\t\n"
+".long  0xb92d4024\t\n" // cipher message with counter (KMCTR)
+:[ret] "=Q" (ret) : : "r0", "r1", "cc");
+
+return ret;
+}
+
+struct queryResult kmaQuery(void)
+  __asm__(GOSYM_PREFIX "internal..z2fcpu.kmaQuery")
+  __attribute__((no_split_stack));
+
+struct queryResult kmaQuery() {
+struct queryResult ret;
+
+__asm__ ("lghi   %%r0, 0\t\n" // set function code to 0 (KMA-Query)
+"la %%r1, %[ret]\t\n"
+".long  0xb9296024\t\n" // cipher message with authentication (KMA)
+:[ret] "=Q" (ret) : : "r0", "r1", "cc");
+
+return ret;
+}
+
+struct queryResult kimdQuery(void)
+  __asm__(GOSYM_PREFIX "internal..z2fcpu.kimdQuery")
+  __attribute__((no_split_stack));
+
+struct queryResult kimdQuery() {
+struct queryResult ret;
+
+__asm__ ("lghi   %%r0, 0\t\n"  // set function code to 0 (KIMD-Query)
+"la %%r1, %[ret]\t\n"
+".long  0xb93e0024\t\n"  // compute intermediate message digest 
(KIMD)
+:[ret] "=Q" (ret) : : "r0", "r1", "cc");
+
+return ret;
+}
+
+struct queryResult klmdQuery(void)
+  __asm__(GOSYM_PREFIX "internal..z2fcpu.klmdQuery")
+  __attribute__((no_split_stack));
+
+struct queryResult klmdQuery() {
+struct queryResult ret;
+
+__asm__ ("lghi   %%r0, 0\t\n"  // set function code to 0 (KLMD-Query)
+"la %%r1, %[ret]\t\n"
+".long  0xb93f0024\t\n"  // compute last message digest (KLMD)
+:[ret] "=Q" (ret) : : "r0", "r1", "cc");
+
+return ret;
+}
+
+#endif /* defined(__s390__)  */
Index: libgo/go/internal/cpu/cpu_s390x.go
===
--- libgo/go/internal/cpu/cpu_s390x.go  (revision 268369)
+++ libgo/go/internal/cpu/cpu_s390x.go  (working copy)
@@ -98,13 +98,13 @@ func (s *facilityList) Has(fs ...facilit
 
 // The following feature detection functions are defined in cpu_s390x.s.
 // They are likely to be expensive to call so the results should be cached.
-func stfle() facilityList { panic("not implemented for gccgo") }
-func kmQuery() queryResult{ panic("not implemented for gccgo") }
-func kmcQuery() 

Bugs in extended C interop

2019-02-15 Thread Bader, Reinhold
Dear Paul,

I've started putting together my observations on the current status of the
F2018 C interop extensions in gfortran 9.0. See PRs 89363, 89364, 89365
and 89366:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89363
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89364
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89365
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89366

Regards
Reinhold




Re: [PATCH 17/42] i386: Emulate MMX mmx_pextrw with SSE

2019-02-15 Thread H.J. Lu
On Fri, Feb 15, 2019 at 6:03 AM H.J. Lu  wrote:
>
> Emulate MMX mmx_pextrw with SSE.  Only SSE register source operand is
> allowed.
>
> PR target/89021
> * config/i386/mmx.md (mmx_pextrw): Add SSE emulation.
> ---
>  gcc/config/i386/mmx.md | 16 +---
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index 3ea64e9aabe..678eaa713dc 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -1310,16 +1310,18 @@
> (set_attr "mode" "DI")])
>
>  (define_insn "mmx_pextrw"
> -  [(set (match_operand:SI 0 "register_operand" "=r")
> +  [(set (match_operand:SI 0 "register_operand" "=r,r")
>  (zero_extend:SI
>   (vec_select:HI
> -   (match_operand:V4HI 1 "register_operand" "y")
> -   (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n")]]
> -  "TARGET_SSE || TARGET_3DNOW_A"
> -  "pextrw\t{%2, %1, %0|%0, %1, %2}"
> -  [(set_attr "type" "mmxcvt")
> +   (match_operand:V4HI 1 "register_operand" "y,Yv")
> +   (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n")]]
> +  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
> +   && (TARGET_SSE || TARGET_3DNOW_A)"
> +  "%vpextrw\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "mmx_isa" "native,x64")
> +   (set_attr "type" "mmxcvt,sselog1")
> (set_attr "length_immediate" "1")
> -   (set_attr "mode" "DI")])
> +   (set_attr "mode" "DI,TI")])
>
>  (define_expand "mmx_pshufw"
>[(match_operand:V4HI 0 "register_operand")
> --
> 2.20.1
>

Here is the updated patch for mmx_pextrw.  It should be

(define_insn "mmx_pextrw"
  [(set (match_operand:SI 0 "register_operand" "=r,r")
(zero_extend:SI
  (vec_select:HI
(match_operand:V4HI 1 "register_operand" "y,Yv")
(parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n")]]
  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
   && (TARGET_SSE || TARGET_3DNOW_A)"
  "@
   pextrw\t{%2, %1, %0|%0, %1, %2}
   %vpextrw\t{%2, %1, %0|%0, %1, %2}"
  [(set_attr "mmx_isa" "native,x64")
   (set_attr "type" "mmxcvt,sselog1")
   (set_attr "length_immediate" "1")
   (set_attr "mode" "DI,TI")])
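
For reference, the operation this pattern describes (vec_select of one HI
element followed by zero_extend to SI) can be modelled in a few lines of C.
This is a sketch with a hypothetical function name, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* pextrw model for a V4HI source: pick one of four 16-bit elements
   and zero-extend it to 32 bits.  */
static uint32_t pextrw_v4hi(const uint16_t v[4], unsigned sel)
{
  assert(sel <= 3);   /* mirrors the const_0_to_3_operand predicate */
  return (uint32_t) v[sel];
}

/* Self-check: the result is zero-extended, never sign-extended.  */
static void pextrw_self_check(void)
{
  const uint16_t v[4] = { 0x1111, 0x8000, 0xffff, 0x0001 };
  assert(pextrw_v4hi(v, 1) == 0x8000u);
  assert(pextrw_v4hi(v, 2) == 0xffffu);
}
```

Keeping the selector restricted to 0..3 means the SSE alternative only ever
reads the low 64 bits of the register, where the V4HI value lives.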


-- 
H.J.
From 17bd9eb652aff70a72680f444fbb169344cf563b Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 25 Jan 2019 11:27:35 -0800
Subject: [PATCH 17/42] i386: Emulate MMX mmx_pextrw with SSE

Emulate MMX mmx_pextrw with SSE.  Only SSE register source operand is
allowed.

	PR target/89021
	* config/i386/mmx.md (mmx_pextrw): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 3ea64e9aabe..1818957f670 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1310,16 +1310,20 @@
(set_attr "mode" "DI")])
 
 (define_insn "mmx_pextrw"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=r,r")
 (zero_extend:SI
 	  (vec_select:HI
-	(match_operand:V4HI 1 "register_operand" "y")
-	(parallel [(match_operand:SI 2 "const_0_to_3_operand" "n")]]
-  "TARGET_SSE || TARGET_3DNOW_A"
-  "pextrw\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "mmxcvt")
+	(match_operand:V4HI 1 "register_operand" "y,Yv")
+	(parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n")]]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "@
+   pextrw\t{%2, %1, %0|%0, %1, %2}
+   %vpextrw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "mmxcvt,sselog1")
(set_attr "length_immediate" "1")
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI")])
 
 (define_expand "mmx_pshufw"
   [(match_operand:V4HI 0 "register_operand")
-- 
2.20.1



GCC 8.3 Status Report (2019-02-15)

2019-02-15 Thread Jakub Jelinek
Status
======

The GCC 8 branch is now frozen for blocking regressions and documentation
fixes only; all changes to the branch now require RM approval.


Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1                0
P2              193   -  11
P3               29   +   4
P4              163   -   2
P5               24
--------        ---   -----------------------
Total P1-P3     222   -   7
Total           409   -   9


Previous Report
===============

https://gcc.gnu.org/ml/gcc/2019-02/msg00034.html


Re: [PATCH 02/42] i386: Add mmx_nonimmediate_operand

2019-02-15 Thread Uros Bizjak
On Fri, Feb 15, 2019 at 2:58 PM H.J. Lu  wrote:
>
> True if the operand is a register or an nonimmediate operand when
> TARGET_MMX_WITH_SSE is false.
>
> PR target/89021
> * config/i386/predicates.md (mmx_nonimmediate_operand): New.
> ---
>  gcc/config/i386/predicates.md | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
> index 99226e86436..bd1f07a28fb 100644
> --- a/gcc/config/i386/predicates.md
> +++ b/gcc/config/i386/predicates.md
> @@ -49,6 +49,13 @@
>(and (match_code "reg")
> (match_test "MMX_REGNO_P (REGNO (op))")))
>
> +;; True if the operand is a register or an nonimmediate operand when
> +;; TARGET_MMX_WITH_SSE is false.
> +(define_predicate "mmx_nonimmediate_operand"
> +  (ior (match_operand 0 "register_operand")
> +   (and (not (match_test "TARGET_MMX_WITH_SSE"))
> +   (match_operand 0 "nonimmediate_operand"

Here you can use "memory_operand".

I'd expect you to use this new predicate universally throughout the
patchset, e.g. in

+  (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+  "@
+   ...
+   ...
+   v...
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")

When TARGET_MMX_WITH_SSE is true, only the last two constraint
alternatives are enabled, so we can be sure that only a register
operand is allowed.  While RA can fix up mem->reg by itself, it is
beneficial to pass this information to the compiler via the predicate,
and mmx_nonimmediate_operand fits there perfectly.
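
The interaction between the enabled alternatives and the predicate can be sketched in plain C.  This is a simplified model, not GCC code: struct target, the enum names and both functions are hypothetical stand-ins, and treating "base" as always enabled is an assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's target flags; not real GCC code.  */
struct target { bool mmx_with_sse; bool avx; };

enum mmx_isa { ISA_BASE, ISA_NATIVE, ISA_X64, ISA_X64_NOAVX, ISA_X64_AVX };
enum operand_kind { OP_REG, OP_MEM, OP_IMM };

/* Mirrors the "mmx_isa" enabling conditions from the patchset's cover
   letter.  */
bool
alternative_enabled (enum mmx_isa isa, const struct target *t)
{
  switch (isa)
    {
    case ISA_NATIVE:    return !t->mmx_with_sse;
    case ISA_X64:       return t->mmx_with_sse;
    case ISA_X64_NOAVX: return t->mmx_with_sse && !t->avx;
    case ISA_X64_AVX:   return t->mmx_with_sse && t->avx;
    default:            return true;  /* Assumption: "base" is always on.  */
    }
}

/* Model of the proposed predicate: a register always matches; memory
   matches only when MMX is not being emulated with SSE.  */
bool
mmx_nonimmediate_operand_model (enum operand_kind op, const struct target *t)
{
  return op == OP_REG || (!t->mmx_with_sse && op == OP_MEM);
}
```

With mmx_with_sse set, the "native" alternative (the only one with an "m" constraint) is disabled and the predicate rejects memory, so predicate and constraints agree.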

Uros.


Re: [PATCH][GCC][DOC] Remove obsolete arm and aarch64 CPU names from invoke.texi

2019-02-15 Thread Sam Tebbs
On 19/01/2019 23:37, Gerald Pfeifer wrote:

> On Thu, 10 Jan 2019, Sam Tebbs wrote:
>>> I believe this should also be covered in the GCC 9 release notes
>>> at https://gcc.gnu.org/gcc-9/changes.html ?
>> Sorry for the late reply. My email filters seem to have stumbled a bit
>> so I didn't pick this up until now. Would you suggest adding something
>> along the lines of "Removed obsolete Arm CPU names from the option
>> documentation" (perhaps with a full list as in my original email)?
> Yes, please.
>
> Gerald (now needing to look at his filters)

Hi Gerald,

I was looking into this and it seems that the CPU and architecture 
removals have already been documented in the Arm-specific section of the 
GCC 9 changes, so explicitly mentioning that the documentation has been 
removed as well is probably unnecessary.

Sam



Re: [PATCH][DOC] Document new features for GCC 9.

2019-02-15 Thread Eric Gallager
On 2/14/19, David Malcolm  wrote:
> On Thu, 2019-02-14 at 14:19 -0700, Martin Sebor wrote:
>> On 2/13/19 6:48 AM, Martin Liška wrote:
>> > Hi.
>> >
>> > I'm sending patch where I document changes I made during GCC 9
>> > development. I would appreciate both language and factual comments
>> > about the patch.
>>
>> Nothing technical, just a few very minor language nits/suggestions.
>>
>> Martin
>>
>> diff --git a/htdocs/gcc-9/changes.html b/htdocs/gcc-9/changes.html
>> index 13243c2..9fec9e2 100644
>> --- a/htdocs/gcc-9/changes.html
>> +++ b/htdocs/gcc-9/changes.html
>> @@ -50,11 +50,64 @@ a work-in-progress.
>>   General Improvements
>>   
>> 
>> -A new option -flive-patching=[inline-only-static|inline-clone]
>> is
>> +A new option
>> -flive-patching=[inline-only-static|inline-clone] is
>>
>> s/is/has been/ would be better (and either a comma after option or
>> a definite article without the comma).
>>
>>   introduced to provide a safe compilation for live-patching. At
>> the
>> same
>>   time, provides multiple-level control on the enabled IPA
>> optimizations.
>>   See the user guide for further information about the option for
>> more
>> -details.
>> +details.
>
> Ideally we should add URLs any time we mention an option, linking to
> the docs for that option.  texinfo's HTML toolchain does give us per-
> option anchors.  They're not visible [1], but "View Source" shows us
> that they do exist; in the form:
>
> https://gcc.gnu.org/onlinedocs/gcc/SOMETHING.html#indexOPTION
>
> though annoyingly the SOMETHING varies depending on what kind of option
> it is.
>
> The pertinent one here is:
> https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-flive-patching
>
> (FWIW, I have a patch for GCC 10 that emits terminal sequences to
> "linkify" the output when diagnostics mention option names, adding a
> URL to the docs for the pertinent option).
>
> [...snip...]
>
> Dave
>
> [1] I've emailed the texinfo project about this
>

The link for that thread is here, for reference:
https://lists.gnu.org/archive/html/help-texinfo/2019-02/msg0.html


[Committed][PATCH][GCC][Arm] Remove alternative from neon_softfp_fp16 directive.

2019-02-15 Thread Tamar Christina
Hi All,

There's a bit of a disconnect between the feature flags that don't test the fpu
and the ones that do when the test itself also forces an architecture.  Forcing
the architecture changes the defaults, and unless the correct fpu is explicitly
given again the test fails.

I don't see a good general way to solve this problem.  Ideally the feature tests
would also contain the extra options the test adds, but this specific case can
be solved by always testing the fpu explicitly.

Committed under the GCC obvious rule.

Thanks,
Tamar

gcc/testsuite/ChangeLog:

2019-02-15  Tamar Christina  

* lib/target-supports.exp
(check_effective_target_arm_neon_softfp_fp16_ok_nocache): Drop non-fpu
checking alternative.

-- 
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 1d237d4cd664924cc580cff67a563230b3fe9571..5d8ba4436ac1ad29da57802f2465d05712c8e8e7 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3797,7 +3797,6 @@ proc check_effective_target_arm_neon_softfp_fp16_ok_nocache { } {
 if { [check_effective_target_arm32]
 	 && [check_effective_target_arm_neon_ok] } {
 	foreach flags {"-mfpu=neon-fp16 -mfloat-abi=softfp"
-		   "-mfloat-abi=softfp -mfp16-format=ieee"
 		   "-mfpu=neon-fp16 -mfloat-abi=softfp -mfp16-format=ieee"} {
 	if { [check_no_compiler_messages_nocache arm_neon_softfp_fp16_ok object {
 		#include "arm_neon.h"



Re: [PATCH 28/42] i386: Make _mm_empty () as NOP when MMX is disabled

2019-02-15 Thread Uros Bizjak
On Fri, Feb 15, 2019 at 3:03 PM H.J. Lu  wrote:
>
> With SSE emulation of MMX intrinsics, we should make _mm_empty () as NOP
> when MMX is disabled.
>
> PR target/89021
> * config/i386/mmx.md (EMMS): Also allow TARGET_MMX_WITH_SSE.
> (mmx_<emms>): Generate "<emms>" only when MMX is enabled.

Better rename the pattern to "*mmx_<emms>" and introduce a new expander:

(define_expand "mmx_<emms>"
  [(parallel
    [(unspec_volatile [(const_int 0)] EMMS)
     (clobber (reg:XF ST0_REG))
     (clobber (reg:XF ST1_REG))
     (clobber (reg:XF ST2_REG))
     (clobber (reg:XF ST3_REG))
     (clobber (reg:XF ST4_REG))
     (clobber (reg:XF ST5_REG))
     (clobber (reg:XF ST6_REG))
     (clobber (reg:XF ST7_REG))
     (clobber (reg:DI MM0_REG))
     (clobber (reg:DI MM1_REG))
     (clobber (reg:DI MM2_REG))
     (clobber (reg:DI MM3_REG))
     (clobber (reg:DI MM4_REG))
     (clobber (reg:DI MM5_REG))
     (clobber (reg:DI MM6_REG))
     (clobber (reg:DI MM7_REG))])]
  "TARGET_MMX || TARGET_MMX_WITH_SSE"
{
  if (!TARGET_MMX)
    {
      emit_insn (gen_nop ());
      DONE;
    }
})

This way, the compiler won't bother with {f,}emms when there are no
MMX registers.

Uros.

> ---
>  gcc/config/i386/mmx.md | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index d662663a445..eaca71d5750 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -1839,7 +1839,7 @@
> (set_attr "mode" "DI")])
>
>  (define_int_iterator EMMS
> -  [(UNSPECV_EMMS "TARGET_MMX")
> +  [(UNSPECV_EMMS "TARGET_MMX || TARGET_MMX_WITH_SSE")
> (UNSPECV_FEMMS "TARGET_3DNOW")])
>
>  (define_int_attr emms
> @@ -1865,7 +1865,9 @@
> (clobber (reg:DI MM6_REG))
> (clobber (reg:DI MM7_REG))]
>""
> -  "<emms>"
> +{
> +  return TARGET_MMX ? "<emms>" : "";
> +}
>[(set_attr "type" "mmx")
> (set_attr "modrm" "0")
> (set_attr "memory" "none")])
> --
> 2.20.1
>


Re: [PATCH 00/40] V6: Emulate MMX intrinsics with SSE

2019-02-15 Thread Uros Bizjak
On Fri, Feb 15, 2019 at 2:58 PM H.J. Lu  wrote:
>
> On x86-64, since __m64 is returned and passed in XMM registers, we can
> emulate MMX intrinsics with SSE instructions. To support it, we added
>
>  #define TARGET_MMX_WITH_SSE (TARGET_64BIT && TARGET_SSE2)
>
> ;; Define instruction set of MMX instructions
> (define_attr "mmx_isa" "base,native,x64,x64_noavx,x64_avx"
>   (const_string "base"))
>
>  (eq_attr "mmx_isa" "native")
>(symbol_ref "!TARGET_MMX_WITH_SSE")
>  (eq_attr "mmx_isa" "x64")
>(symbol_ref "TARGET_MMX_WITH_SSE")
>  (eq_attr "mmx_isa" "x64_avx")
>(symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX")
>  (eq_attr "mmx_isa" "x64_noavx")
>(symbol_ref "TARGET_MMX_WITH_SSE && !TARGET_AVX")
>
> We added SSE emulation to MMX patterns and disabled MMX alternatives with
> TARGET_MMX_WITH_SSE.
>
> Most MMX instructions have equivalent SSE versions, and the results of some
> SSE versions need to be reshuffled to the right order for MMX.  There are
> a couple of tricky cases:
>
> 1. MMX maskmovq and SSE2 maskmovdqu aren't equivalent.  We emulate MMX
> maskmovq with SSE2 maskmovdqu by zeroing out the upper 64 bits of the
> mask operand and handle unmapped bits 64:127 at memory address by
> adjusting source and mask operands together with memory address.
>
> 2. MMX movntq is emulated with SSE2 DImode movnti, which is available
> in 64-bit mode.
>
> 3. MMX pshufb takes a 3-bit index while SSE pshufb takes a 4-bit index.
> SSE emulation must clear bit 3 (the fourth bit) in the shuffle control mask.
>
> 4. To emulate MMX cvtpi2ps with SSE2 cvtdq2ps, we must properly preserve
> the upper 64 bits of destination XMM register.
>
> Tests are also added to check each SSE emulation of MMX intrinsics.
>
> There are no regressions on i686 and x86-64.  For x86-64, GCC is also
> tested with
>
> --with-arch=native --with-cpu=native
>
> on AVX2 and AVX512F machines.
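
As an illustration of case 3 above, here is a scalar C sketch of why the extra index bit has to be cleared.  The two *_model functions follow the documented pshufb semantics; emulated_mmx_pshufb and its 0xf7 per-byte mask are an assumption about how the emulation could be done, not code taken from the patchset.

```c
#include <stdint.h>
#include <string.h>

/* Reference model of MMX pshufb: 8 bytes, 3-bit index, bit 7 zeroes.  */
void
mmx_pshufb_model (uint8_t dst[8], const uint8_t src[8], const uint8_t ctl[8])
{
  uint8_t out[8];
  for (int i = 0; i < 8; i++)
    out[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x07];
  memcpy (dst, out, 8);
}

/* Reference model of SSE pshufb: 16 bytes, 4-bit index, bit 7 zeroes.  */
void
sse_pshufb_model (uint8_t dst[16], const uint8_t src[16],
                  const uint8_t ctl[16])
{
  uint8_t out[16];
  for (int i = 0; i < 16; i++)
    out[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0f];
  memcpy (dst, out, 16);
}

/* Emulation sketch: place the 64-bit operands in the low half of a
   128-bit value and clear bit 3 of every control byte, so the 4-bit SSE
   index can never reach into the upper (unrelated) 8 source bytes.  */
void
emulated_mmx_pshufb (uint8_t dst[8], const uint8_t src[8],
                     const uint8_t ctl[8])
{
  uint8_t wide_src[16], wide_ctl[16], wide_dst[16];

  memset (wide_src, 0xaa, sizeof wide_src);  /* Upper half: don't care.  */
  memset (wide_ctl, 0, sizeof wide_ctl);
  memcpy (wide_src, src, 8);
  for (int i = 0; i < 8; i++)
    wide_ctl[i] = ctl[i] & 0xf7;  /* Clear bit 3, keep zeroing bit 7.  */

  sse_pshufb_model (wide_dst, wide_src, wide_ctl);
  memcpy (dst, wide_dst, 8);
}
```

Without the mask, a control byte such as 0x0b would select byte 11 of the wide source, which lies outside the original 64-bit operand.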

I went through the code again, and it looks OK in general, modulo the
mmx_nonimmediate_operand issue and a couple of minor issues.

Please substitute the nonimmediate_operand predicate with
mmx_nonimmediate_operand in expanders and insn patterns. Please note
that the proposed convention is to name the operand
register_mmxmem_operand (cf. register_ssemem_operand), so I suggest
we name the predicate in this way.

There is an issue with the change to the emms pattern.

And let's remove _mm_empty () calls from testcases; they complicate
things too much for no apparent benefit.

With those issues fixed, the patchset is OK for gcc-10 when it opens.

Uros.

> H.J. Lu (41):
>   i386: Allow MMX register modes in SSE registers
>   i386: Add mmx_nonimmediate_operand
>   i386: Emulate MMX packsswb/packssdw/packuswb with SSE2
>   i386: Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX
>   i386: Emulate MMX plusminus/sat_plusminus with SSE
>   i386: Emulate MMX mulv4hi3 with SSE
>   i386: Emulate MMX smulv4hi3_highpart with SSE
>   i386: Emulate MMX mmx_pmaddwd with SSE
>   i386: Emulate MMX ashr3/3 with SSE
>   i386: Emulate MMX 3 with SSE
>   i386: Emulate MMX mmx_andnot3 with SSE
>   i386: Emulate MMX mmx_eq/mmx_gt3 with SSE
>   i386: Emulate MMX vec_dupv2si with SSE
>   i386: Emulate MMX pshufw with SSE
>   i386: Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE
>   i386: Emulate MMX sse_cvtpi2ps with SSE
>   i386: Emulate MMX mmx_pextrw with SSE
>   i386: Emulate MMX mmx_pinsrw with SSE
>   i386: Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE
>   i386: Emulate MMX mmx_pmovmskb with SSE
>   i386: Emulate MMX mmx_umulv4hi3_highpart with SSE
>   i386: Emulate MMX maskmovq with SSE2 maskmovdqu
>   i386: Emulate MMX mmx_uavgv8qi3 with SSE
>   i386: Emulate MMX mmx_uavgv4hi3 with SSE
>   i386: Emulate MMX mmx_psadbw with SSE
>   i386: Emulate MMX movntq with SSE2 movntidi
>   i386: Emulate MMX umulv1siv1di3 with SSE2
>   i386: Make _mm_empty () as NOP when MMX is disabled
>   i386: Emulate MMX ssse3_phwv4hi3 with SSE
>   i386: Emulate MMX ssse3_phdv2si3 with SSE
>   i386: Emulate MMX ssse3_pmaddubsw with SSE
>   i386: Emulate MMX ssse3_pmulhrswv4hi3 with SSE
>   i386: Emulate MMX pshufb with SSE version
>   i386: Emulate MMX ssse3_psign3 with SSE
>   i386: Emulate MMX ssse3_palignrdi with SSE
>   i386: Emulate MMX abs2 with SSE
>   i386: Allow MMXMODE moves with TARGET_MMX_WITH_SSE
>   i386: Allow MMX vector expanders with TARGET_MMX_WITH_SSE
>   i386: Allow MMX intrinsic emulation with SSE
>   i386: Enable TM MMX intrinsics with SSE2
>   i386: Add tests for MMX intrinsic emulations with SSE
>
> Uros Bizjak (1):
>   Prevent allocation of MMX registers with TARGET_MMX_WITH_SSE
>
>  gcc/config/i386/constraints.md|   6 +
>  gcc/config/i386/i386-builtin.def  | 126 +--
>  gcc/config/i386/i386-c.c  |   2 +
>  gcc/config/i386/i386-protos.h |   4 +
>  gcc/config/i386/i386.c| 189 +++-
>  gcc/config/i386/i386.h|   2 +
>  gcc/config/i386/i38

[PATCH, i386]: Add missing TARGET_FPMATH_DEFAULT_P to darwin.h

2019-02-15 Thread Uros Bizjak
Darwin defines its own TARGET_FPMATH_DEFAULT, which should be
accompanied by a corresponding TARGET_FPMATH_DEFAULT_P.  The patch adds
the missing define.
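
A minimal sketch of the pairing, for illustration only: the plain macro consults the global option state, while the _P variant consults an option set passed in explicitly.  All names ending in _M or _model are hypothetical, and GCC's real TARGET_SSE_P takes a flags value rather than a struct pointer.

```c
/* Hypothetical, simplified stand-ins; GCC's real definitions differ.  */
enum fpmath_unit_model { FPMATH_387_M = 1, FPMATH_SSE_M = 2 };
struct options_model { int sse; };

struct options_model global_options_model = { 1 };

#define TARGET_SSE_M        (global_options_model.sse)
#define TARGET_SSE_P_M(x)   ((x)->sse)

/* Global form: consults the current global options.  */
#define FPMATH_DEFAULT_M \
  (TARGET_SSE_M ? FPMATH_SSE_M : FPMATH_387_M)

/* _P form: consults an explicitly supplied option set.  */
#define FPMATH_DEFAULT_P_M(x) \
  (TARGET_SSE_P_M (x) ? FPMATH_SSE_M : FPMATH_387_M)

int
default_for (const struct options_model *o)
{
  return FPMATH_DEFAULT_P_M (o);
}

int
default_global (void)
{
  return FPMATH_DEFAULT_M;
}
```

If a target overrides only the global form, code that evaluates the default for a different option set still sees the generic definition, so the two can disagree.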

While looking around, I also fixed various whitespace issues in the header.

BTW: The header file still defines TARGET_64BIT which is horribly out
of date. Someone should introduce correct multilib support to Darwin
to bring it in line with Linux and Solaris, so these defines could be
removed in favour of generic ones in i386.h.

2019-02-15  Uroš Bizjak  

* config/i386/darwin.h (TARGET_FPMATH_DEFAULT_P): New define.

Tested by building a crosscompiler to x86_64-apple-darwin18.

Committed to mainline SVN as obvious.

Uros.
diff --git a/gcc/config/i386/darwin.h b/gcc/config/i386/darwin.h
index a63841ca5554..d8e72ec69a57 100644
--- a/gcc/config/i386/darwin.h
+++ b/gcc/config/i386/darwin.h
@@ -25,10 +25,10 @@ along with GCC; see the file COPYING3.  If not see
 #undef DARWIN_X86
 #define DARWIN_X86 1
 
-#undef  TARGET_64BIT
-#undef TARGET_64BIT_P
+#undef TARGET_64BIT
 #define TARGET_64BIT TARGET_ISA_64BIT
-#defineTARGET_64BIT_P(x) TARGET_ISA_64BIT_P(x)
+#undef TARGET_64BIT_P
+#define TARGET_64BIT_P(x) TARGET_ISA_64BIT_P(x)
 
 #ifdef IN_LIBGCC2
 #undef TARGET_64BIT
@@ -70,14 +70,15 @@ along with GCC; see the file COPYING3.  If not see
 
 #undef TARGET_FPMATH_DEFAULT
 #define TARGET_FPMATH_DEFAULT (TARGET_SSE ? FPMATH_SSE : FPMATH_387)
+#undef TARGET_FPMATH_DEFAULT_P
+#define TARGET_FPMATH_DEFAULT_P(x) \
+  (TARGET_SSE_P(x) ? FPMATH_SSE : FPMATH_387)
 
 #define TARGET_OS_CPP_BUILTINS()\
-  do\
-{   \
-  builtin_define ("__LITTLE_ENDIAN__"); \
-  darwin_cpp_builtins (pfile); \
-}   \
-  while (0)
+  do { \
+builtin_define ("__LITTLE_ENDIAN__");  \
+darwin_cpp_builtins (pfile);   \
+  } while (0)
 
 #undef PTRDIFF_TYPE
 #define PTRDIFF_TYPE (TARGET_64BIT ? "long int" : "int")
@@ -121,7 +122,7 @@ extern int darwin_emit_branch_islands;
than 128 bits for Darwin, but it's easier to up the alignment if
it's below the minimum.  */
 #undef PREFERRED_STACK_BOUNDARY
-#define PREFERRED_STACK_BOUNDARY   \
+#define PREFERRED_STACK_BOUNDARY \
   MAX (128, ix86_preferred_stack_boundary)
 
 /* We want -fPIC by default, unless we're using -static to compile for
@@ -179,15 +180,15 @@ extern int darwin_emit_branch_islands;
and returns float values in the 387.  */
 
 #undef TARGET_SUBTARGET_DEFAULT
-#define TARGET_SUBTARGET_DEFAULT (MASK_80387 | MASK_IEEE_FP | MASK_FLOAT_RETURNS | MASK_128BIT_LONG_DOUBLE)
+#define TARGET_SUBTARGET_DEFAULT \
+  (MASK_80387 | MASK_IEEE_FP | MASK_FLOAT_RETURNS | MASK_128BIT_LONG_DOUBLE)
 
 /* For darwin we want to target specific processor features as a minimum,
but these unfortunately don't correspond to a specific processor.  */
 #undef TARGET_SUBTARGET32_ISA_DEFAULT
-#define TARGET_SUBTARGET32_ISA_DEFAULT (OPTION_MASK_ISA_MMX\
-   | OPTION_MASK_ISA_SSE   \
-   | OPTION_MASK_ISA_SSE2  \
-   | OPTION_MASK_ISA_SSE3)
+#define TARGET_SUBTARGET32_ISA_DEFAULT \
+  (OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE   \
+   | OPTION_MASK_ISA_SSE2 | OPTION_MASK_ISA_SSE3)
 
 #undef TARGET_SUBTARGET64_ISA_DEFAULT
 #define TARGET_SUBTARGET64_ISA_DEFAULT TARGET_SUBTARGET32_ISA_DEFAULT
@@ -209,15 +210,16 @@ extern int darwin_emit_branch_islands;
 #define SUBTARGET_ENCODE_SECTION_INFO  darwin_encode_section_info
 
 #undef ASM_OUTPUT_ALIGN
-#define ASM_OUTPUT_ALIGN(FILE,LOG) \
- do { if ((LOG) != 0)  \
-{  \
-  if (in_section == text_section) \
-fprintf (FILE, "\t%s %d,0x90\n", ALIGN_ASM_OP, (LOG)); \
-  else \
-fprintf (FILE, "\t%s %d\n", ALIGN_ASM_OP, (LOG)); \
-}  \
-} while (0)
+#define ASM_OUTPUT_ALIGN(FILE,LOG)\
+  do {\
+if ((LOG) != 0)   \
+  {   \
+   if (in_section == text_section)\
+ fprintf (FILE, "\t%s %d,0x90\n", ALIGN_ASM_OP, (LOG));   \
+   else   \
+ fprintf (FILE, "\t%s %d\n", ALIGN_ASM_OP, (LOG));\
+  }   \
+  } while (0)
 
 /* Darwin x86 assemblers support the .ident directive.  */
 
@@ -227,16 +229,16 @@ extern int darwin_emit_branch_islands;
 /* Darwin profiling -- call mcount.  */
 #un

Re: [PATCH 00/40] V6: Emulate MMX intrinsics with SSE

2019-02-15 Thread H.J. Lu
On Fri, Feb 15, 2019 at 9:50 AM Uros Bizjak  wrote:
>
> I went through the code again, and looks OK in general, modulo
> mmx_nonimmediate_operand issue and a couple of minor issues.
>
> Please substitute nonimmediate_operand predicate with
> mmx_nonimmediate_operand in expanders and insn patterns. Please note

Can we keep nonimmediate_operand in expanders, like

(define_expand "<plusminus_insn><mode>3"
  [(set (match_operand:MMXMODEI 0 "register_operand")
(plusminus:MMXMODEI
  (match_operand:MMXMODEI 1 "nonimmediate_operand")
  (match_operand:MMXMODEI 2 "nonimmediate_operand")))]
  "TARGET_MMX_WITH_SSE"
  "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")

(define_insn "*mmx_<plusminus_insn><mode>3"
  [(set (match_operand:MMXMODEI8 0 "register_operand" "=y,x,Yv")
(plusminus:MMXMODEI8
  (match_operand:MMXMODEI8 1 "register_mmxmem_operand" "0,0,Yv")
  (match_operand:MMXMODEI8 2 "register_mmxmem_operand" "ym,x,Yv")))]
  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
  "@
   p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
   p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
   vp<plusminus_mnemonic><mmxvecsize>\t{%2, %1, %0|%0, %1, %2}"
  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
   (set_attr "type" "mmxadd,sseadd,sseadd")
   (set_attr "mode" "DI,TI,TI")])

Can RA do the right thing?

> that the proposed convention is to name the operand
> register_mmxmem_operand (c.f. register_ssemem_operand), so I suggest
> we name the predicate in this way.

I will rename it to register_mmxmem_operand.

> There is an issue with a change to emms pattern.
>
> And let's remove _mm_empty () calls from testcases; they complicate
> things too much for no apparent benefit.

Will do.

> With those issues fixed, the patchset is OK for gcc-10 when it opens.
>
> Uros.

Re: [PATCH 00/40] V6: Emulate MMX intrinsics with SSE

2019-02-15 Thread Uros Bizjak
On Fri, Feb 15, 2019 at 7:20 PM H.J. Lu  wrote:
> > I went through the code again, and looks OK in general, modulo
> > mmx_nonimmediate_operand issue and a couple of minor issues.
> >
> > Please substitute nonimmediate_operand predicate with
> > mmx_nonimmediate_operand in expanders and insn patterns. Please note
>
> Can we keep nonimmediate_operand in expanders, like

No, the expander should also be changed. The way expanders work is:
if an operand doesn't satisfy the predicate, it is moved to a
register. So, for TARGET_MMX_WITH_SSE, the expander would let through a
memory operand that isn't allowed by the relevant insn pattern -> ICE.

There is nothing RA can do here. The operand produced by the expander
must match the predicate in the insn pattern for the insn to be
recognized. Otherwise, the compiler will ICE way before RA comes into
play. Also, in the insn pattern, the constraints must allow a subset of
what the operand predicate allows if we want RA to fix up the operand.
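
The failure mode can be sketched with a small C model.  This is a deliberate simplification of the explanation above: all names are hypothetical, and it ignores whatever an operand fixup helper called from the expander body might additionally do.

```c
#include <stdbool.h>

enum operand_kind { OP_REG, OP_MEM };
typedef bool (*predicate_fn) (enum operand_kind, bool mmx_with_sse);

/* Models of the two predicates discussed in the thread.  */
bool
nonimmediate_operand_model (enum operand_kind op, bool mmx_with_sse)
{
  (void) mmx_with_sse;
  return op == OP_REG || op == OP_MEM;
}

bool
register_mmxmem_operand_model (enum operand_kind op, bool mmx_with_sse)
{
  return op == OP_REG || (!mmx_with_sse && op == OP_MEM);
}

/* Expander step: an operand that fails the expander's predicate is
   copied into a register; one that satisfies it is passed through.  */
enum operand_kind
expand_operand (enum operand_kind op, predicate_fn expander_pred,
                bool mmx_with_sse)
{
  return expander_pred (op, mmx_with_sse) ? op : OP_REG;
}

/* Recognition step: the emitted operand must satisfy the insn
   pattern's predicate; a false result stands for the ICE.  */
bool
insn_recognized (enum operand_kind op, predicate_fn expander_pred,
                 predicate_fn insn_pred, bool mmx_with_sse)
{
  enum operand_kind emitted = expand_operand (op, expander_pred,
                                              mmx_with_sse);
  return insn_pred (emitted, mmx_with_sse);
}
```

With nonimmediate_operand in the expander and the stricter predicate in the insn, a memory operand under TARGET_MMX_WITH_SSE passes through the expander unchanged and then fails recognition; using the same predicate in both places forces it into a register first.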

Uros.


Re: libgo patch committed: Add S/390 support to internal/cpu package

2019-02-15 Thread Matthias Klose
On 15.02.19 15:52, Ian Lance Taylor wrote:
> This patch by Robin Dapp adds S/390 support to the internal/cpu
> package.  This partially addresses PR 89123.  I bootstrapped it on
> x86_64-pc-linux-gnu, which means little.  Committed to mainline.

This fails in the -m31 multilib variant with:

libtool: compile:  /<>/build/./gcc/xgcc
-B/<>/build/./gcc/ -B/usr/s390x-linux-gnu/bin/
-B/usr/s390x-linux-gnu/lib/ -isystem /usr/s390x-linux-gnu/include -isystem
/usr/s390x-linux-gnu/sys-include -isystem /<>/build/sys-include -m31
-DHAVE_CONFIG_H -I. -I../../../../src/libgo -I ../../../../src/libgo/runtime
-I../../../../src/libgo/../libffi/include -I../libffi/include -pthread
-L../libatomic/.libs -fexceptions -fnon-call-exceptions -fno-stack-protector
-fsplit-stack -Wall -Wextra -Wwrite-strings -Wcast-qual -D_GNU_SOURCE
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I ../../../../src/libgo/../libgcc
-I ../../../../src/libgo/../libbacktrace -I ../../../gcc/include -g -O2 -m31 -c
../../../../src/libgo/go/internal/cpu/cpu_gccgo.c  -fPIC -DPIC -o
internal/cpu/.libs/cpu_gccgo.o

../../../../src/libgo/go/internal/cpu/cpu_gccgo.c: Assembler messages:
../../../../src/libgo/go/internal/cpu/cpu_gccgo.c:91: Error: Unrecognized
opcode: `lghi'
../../../../src/libgo/go/internal/cpu/cpu_gccgo.c:105: Error: Unrecognized
opcode: `lghi'
../../../../src/libgo/go/internal/cpu/cpu_gccgo.c:119: Error: Unrecognized
opcode: `lghi'
../../../../src/libgo/go/internal/cpu/cpu_gccgo.c:134: Error: Unrecognized
opcode: `lghi'
../../../../src/libgo/go/internal/cpu/cpu_gccgo.c:149: Error: Unrecognized
opcode: `lghi'
../../../../src/libgo/go/internal/cpu/cpu_gccgo.c:164: Error: Unrecognized
opcode: `lghi'
../../../../src/libgo/go/internal/cpu/cpu_gccgo.c:179: Error: Unrecognized
opcode: `lghi'
make[10]: *** [Makefile:2899: internal/cpu/cpu_gccgo.lo] Error 1

make[10]: *** Waiting for unfinished jobs
make[10]: Leaving directory '/<>/build/s390x-linux-gnu/32/libgo'
make[9]: *** [Makefile:2242: all-recursive] Error 1
make[9]: Leaving directory '/<>/build/s390x-linux-gnu/32/libgo'
make[8]: *** [Makefile:1167: all] Error 2
make[8]: Leaving directory '/<>/build/s390x-linux-gnu/32/libgo'
make[7]: *** [Makefile:3062: multi-do] Error 1

using binutils 2.32


[PATCH] i386: Fix ')' in VALID_MMX_REG_MODE

2019-02-15 Thread H.J. Lu
Replace "(MODE == V1DImode)" with "(MODE) == V1DImode".

* config/i386/i386.h (VALID_MMX_REG_MODE): Correct the misplaced
')'.
---
 gcc/ChangeLog  | 5 +
 gcc/config/i386/i386.h | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d1083735e26..96f8679e8f9 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2019-02-15  H.J. Lu  
+
+   * config/i386/i386.h (VALID_MMX_REG_MODE): Correct the misplaced
+   ')'.
+
 2019-02-15  Uroš Bizjak  
 
* config/i386/darwin.h (TARGET_FPMATH_DEFAULT_P): New define.
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index d9039060997..4fd8bc40a34 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1158,7 +1158,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
   ((MODE) == V2SFmode || (MODE) == SFmode)
 
 #define VALID_MMX_REG_MODE(MODE)   \
-  ((MODE == V1DImode) || (MODE) == DImode  \
+  ((MODE) == V1DImode || (MODE) == DImode  \
|| (MODE) == V2SImode || (MODE) == SImode   \
|| (MODE) == V4HImode || (MODE) == V8QImode)
 
-- 
2.20.1
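
To illustrate why the macro argument itself needs the parentheses, here is a small standalone sketch.  The enum and macro names are hypothetical; whether any caller in GCC actually passes such an expression is not something this example claims.

```c
/* Hypothetical stand-ins for machine modes; V1DImode_m is 0 on purpose.  */
enum machine_mode_model { V1DImode_m = 0, DImode_m = 1 };

/* Argument not parenthesized, as before the fix.  */
#define IS_V1DI_UNSAFE(MODE) ((MODE == V1DImode_m))

/* Argument parenthesized, as after the fix.  */
#define IS_V1DI_SAFE(MODE)   ((MODE) == V1DImode_m)

int
unsafe_check (int cond)
{
  /* Expands to (cond ? V1DImode_m : DImode_m == V1DImode_m), so the
     comparison binds only to the last operand of the conditional.  */
  return IS_V1DI_UNSAFE (cond ? V1DImode_m : DImode_m);
}

int
safe_check (int cond)
{
  /* Expands to ((cond ? V1DImode_m : DImode_m) == V1DImode_m).  */
  return IS_V1DI_SAFE (cond ? V1DImode_m : DImode_m);
}
```

With a nonzero cond the argument evaluates to V1DImode_m, so the safe form yields 1, while the unsafe form yields the value of V1DImode_m itself, which is 0 here.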



[PR fortran/89077, patch, part 3] - ICE using * as len specifier for character parameter

2019-02-15 Thread Harald Anlauf
The attached patch is the third in a series for the above PR.
This one fixes erroneous padding with garbage characters in some
declaration and initialization expressions.

The issue here was that expr->representation is set when either
Hollerith strings are used or a TRANSFER statement is involved.
As a result, the original string could be used with trailing
garbage instead of the properly space-padded string.  The patch
simply clears expr->representation in that case.

Regtested on x86_64-pc-linux-gnu.

OK for trunk?

Thanks,
Harald

2019-02-15  Harald Anlauf  

PR fortran/89077
* decl.c (gfc_set_constant_character_len): Clear original string
representation after padding has been performed to target length.

2019-02-15  Harald Anlauf  

PR fortran/89077
* gfortran.dg/transfer_simplify_12.f90: New test.

Index: gcc/fortran/decl.c
===
--- gcc/fortran/decl.c  (revision 268946)
+++ gcc/fortran/decl.c  (working copy)
@@ -1754,6 +1754,14 @@
   free (expr->value.character.string);
   expr->value.character.string = s;
   expr->value.character.length = len;
+  /* If explicit representation was given, clear it
+as it is no longer needed after padding.  */
+  if (expr->representation.length)
+   {
+ expr->representation.length = 0;
+ free (expr->representation.string);
+ expr->representation.string = NULL;
+   }
 }
 }
 
Index: gcc/testsuite/gfortran.dg/transfer_simplify_12.f90
===
--- gcc/testsuite/gfortran.dg/transfer_simplify_12.f90  (nonexistent)
+++ gcc/testsuite/gfortran.dg/transfer_simplify_12.f90  (working copy)
@@ -0,0 +1,27 @@
+! { dg-do run }
+! { dg-options "-O -std=legacy" }
+!
+! Test fixes for some findings while resolving PR fortran/89077
+
+program test
+  implicit none
+  integer :: i
+  character(*)  ,parameter :: s =  'abcdef'   ! Length will be 6
+  character(*)  ,parameter :: h = 6Habcdef! Length will be 8 (Hollerith!)
+  character(10) ,parameter :: k = 6Habcdef
+  character(10) ,parameter :: t = transfer (s, s)
+  character(10) ,save  :: u = transfer (s, s)
+  character(10) ,parameter :: v = transfer (h, h)
+  character(10) ,save  :: w = transfer (h, h)
+  character(10) ,parameter :: x = transfer ([(s(i:i),i=len(s),1,-1)], s)
+  character(10) ,save  :: y = transfer ([(s(i:i),i=len(s),1,-1)], s)
+  if (len (h) /= 8) stop 1
+  if (h /= s) stop 2
+  if (k /= s) stop 3
+  if (t /= s) stop 4
+  if (u /= s) stop 5
+  if (v /= s) stop 6
+  if (w /= s) stop 7
+  if (x /= "fedcba") stop 8
+  if (y /= x) stop 9
+end program test


Go patch committed: Don't use a nil check for the write barrier

2019-02-15 Thread Ian Lance Taylor
This patch to the Go frontend by Than McIntosh tweaks the recipe for
generating writeBarrier loads to ensure that the dereference expr is
marked as not requiring a nil check.  This should fix gcc PR 89368.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 268941)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-0563f2d018cdb2cd685c254bac5ceb38396d0a27
+1a74b8a22b2ff7f430729aa87ecb8cea7b5cdd70
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/wb.cc
===
--- gcc/go/gofrontend/wb.cc (revision 268923)
+++ gcc/go/gofrontend/wb.cc (working copy)
@@ -904,7 +904,8 @@ Gogo::check_write_barrier(Block* enclosi
   ref = Expression::make_unary(OPERATOR_AND, ref, loc);
   ref = Expression::make_cast(unsafe_pointer_type, ref, loc);
   ref = Expression::make_cast(puint32_type, ref, loc);
-  ref = Expression::make_unary(OPERATOR_MULT, ref, loc);
+  ref = Expression::make_dereference(ref,
+ Expression::NIL_CHECK_NOT_NEEDED, loc);
   Expression* zero = Expression::make_integer_ul(0, ref->type(), loc);
   Expression* cond = Expression::make_binary(OPERATOR_EQEQ, ref, zero, loc);
 


Re: [PATCH] Avoid assuming valid_constant_size_p argument is a constant expression (PR 89294)

2019-02-15 Thread Martin Sebor

On 2/15/19 12:24 AM, Eric Botcazou wrote:
> > The attached patch removes the assumption introduced earlier today
> > in my fix for bug 87996 that the valid_constant_size_p argument is
> > a constant expression.  I couldn't come up with a C/C++ test case
> > where this isn't true but apparently it can happen in Ada which I
> > inadvertently didn't build.
>
> Can we do something here?  Our internal testers have been down for 3 days
> because of this blunder...

I'm ready to commit the patch once it's approved, and have been since
the day the problem was reported.


Martin




Re: [patch] Disable store merging in asan_expand_mark_ifn

2019-02-15 Thread Eric Botcazou
> > OK, revised patch attached.  I have manually verified that it yields the
> > expected result for an array of long doubles on 64-bit SPARC.
> > 
> > 
> > 2019-02-12  Eric Botcazou  
> > 
> > * asan.c (asan_expand_mark_ifn): Take into account the alignment of
> > the object to pick the size of stores on strict-alignment platforms.
> 
> Ok, thanks.

Glad you insisted in the end, because I have ASAN working on SPARC64/Linux, 
but only after fixing another bug on 64-bit strict-alignment platforms:

  /* Align base if target is STRICT_ALIGNMENT.  */
  if (STRICT_ALIGNMENT)
base = expand_binop (Pmode, and_optab, base,
 gen_int_mode (-((GET_MODE_ALIGNMENT (SImode)
  << ASAN_SHADOW_SHIFT)
 / BITS_PER_UNIT), Pmode), NULL_RTX,
 1, OPTAB_DIRECT);

GET_MODE_ALIGNMENT is unsigned int so this zero-extends to unsigned long...
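
The effect can be reproduced outside GCC.  The following is only a sketch: MODE_ALIGN_BITS, SHADOW_SHIFT and UNIT_BITS are hypothetical stand-ins for GET_MODE_ALIGNMENT (SImode), ASAN_SHADOW_SHIFT and BITS_PER_UNIT, with assumed values 32, 3 and 8, and it assumes a 32-bit unsigned int.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the GCC macros (values are assumptions).
   MODE_ALIGN_BITS is an unsigned int, like GET_MODE_ALIGNMENT.  */
#define MODE_ALIGN_BITS 32u
#define SHADOW_SHIFT 3
#define UNIT_BITS 8

/* Old computation: the negation is done in unsigned int, and only the
   32-bit result is widened, so the upper 32 bits of the mask are zero.  */
int64_t old_mask (void)
{
  return -((MODE_ALIGN_BITS << SHADOW_SHIFT) / UNIT_BITS);
}

/* New computation: widen to a 64-bit signed value first, then negate,
   which yields a mask with all upper bits set.  */
int64_t new_mask (void)
{
  const int64_t align = (MODE_ALIGN_BITS / UNIT_BITS) << SHADOW_SHIFT;
  return -align;
}

/* Apply a mask to a 64-bit base address.  */
uint64_t align_base (uint64_t base, int64_t mask)
{
  return base & (uint64_t) mask;
}
```

With the old mask, any base address above 4 GiB loses its upper 32 bits when it is aligned; with the new mask only the low five bits are cleared.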

Tested on 32-bit and 64-bit SPARC/Linux, applied on mainline as obvious.


2019-02-15  Eric Botcazou  

* asan.c (asan_emit_stack_protection): Use full-sized mask to align
the base address on 64-bit strict-alignment platforms.

-- 
Eric Botcazou
Index: asan.c
===
--- asan.c	(revision 268849)
+++ asan.c	(working copy)
@@ -1440,13 +1441,15 @@ asan_emit_stack_protection (rtx base, rt
 	base_align_bias = ((asan_frame_size + alignb - 1)
 			   & ~(alignb - HOST_WIDE_INT_1)) - asan_frame_size;
 }
+
   /* Align base if target is STRICT_ALIGNMENT.  */
   if (STRICT_ALIGNMENT)
-base = expand_binop (Pmode, and_optab, base,
-			 gen_int_mode (-((GET_MODE_ALIGNMENT (SImode)
-	  << ASAN_SHADOW_SHIFT)
-	 / BITS_PER_UNIT), Pmode), NULL_RTX,
-			 1, OPTAB_DIRECT);
+{
+  const HOST_WIDE_INT align
+	= (GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT) << ASAN_SHADOW_SHIFT;
+  base = expand_binop (Pmode, and_optab, base, gen_int_mode (-align, Pmode),
+			   NULL_RTX, 1, OPTAB_DIRECT);
+}
 
   if (use_after_return_class == -1 && pbase)
 emit_move_insn (pbase, base);
@@ -1534,7 +1548,7 @@ asan_emit_stack_protection (rtx base, rt
   shadow_mem = gen_rtx_MEM (SImode, shadow_base);
   set_mem_alias_set (shadow_mem, asan_shadow_set);
   if (STRICT_ALIGNMENT)
-set_mem_align (shadow_mem, (GET_MODE_ALIGNMENT (SImode)));
+set_mem_align (shadow_mem, GET_MODE_ALIGNMENT (SImode));
   prev_offset = base_offset;
 
   asan_redzone_buffer rz_buffer (shadow_mem, prev_offset);


[PATCH, og8] Don't rescan "attach" node for dereferenced struct member

2019-02-15 Thread Julian Brown
Hi,

The following (og8 branch) patch added support for
attaching/detaching from dereferenced struct members:

https://gcc.gnu.org/ml/gcc-patches/2019-01/msg01778.html

Unfortunately I made a mistake in the portion of that patch that
inserts new alloc and firstprivate_pointer nodes for the struct base,
meaning that the node rewritten to an attach operation would be
scanned again. This is unnecessary and can cause problems in some
circumstances.

Tested with offloading to nvptx, no regressions and the new test passes.
I will apply (to the og8 branch) shortly.

Thanks,

Julian

ChangeLog

gcc/
* gimplify.c (gimplify_scan_omp_clauses): Avoid scanning
'c' again after creating base-pointer nodes for
dereferenced struct.

gcc/testsuite/
* gfortran.dg/goacc/derived-types-2.f90: New.
commit e374d415801588435d62ac214e0313ffd3ef2198
Author: Julian Brown 
Date:   Thu Feb 14 16:40:21 2019 -0800

[og8] Don't rescan "attach" node for dereferenced struct member

gcc/
* gimplify.c (gimplify_scan_omp_clauses): Avoid scanning 'c' again
after creating base-pointer nodes for dereferenced struct.

gcc/testsuite/
* gfortran.dg/goacc/derived-types-2.f90: New.

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 8bf11eb659e..2ff5b68e0cc 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8289,8 +8289,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 		  *list_p = c2;
 		  OMP_CLAUSE_CHAIN (c2) = c3;
 		  OMP_CLAUSE_CHAIN (c3) = c;
-		  c = c3;
-		  list_p = &OMP_CLAUSE_CHAIN (c3);
 
 		  struct_deref_set->add (decl);
 		}
diff --git a/gcc/testsuite/gfortran.dg/goacc/derived-types-2.f90 b/gcc/testsuite/gfortran.dg/goacc/derived-types-2.f90
new file mode 100644
index 000..d01583fac89
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/derived-types-2.f90
@@ -0,0 +1,14 @@
+module bar
+  type :: type1
+ real(8), pointer, public :: p(:) => null()
+  end type
+  type :: type2
+ class(type1), pointer :: p => null()
+  end type
+end module
+
+subroutine foo (var)
+   use bar
+   type(type2), intent(inout) :: var
+   !$acc enter data create(var%p%p)
+end subroutine


Re: [PATCH] Avoid assuming valid_constant_size_p argument is a constant expression (PR 89294)

2019-02-15 Thread Martin Sebor

Ping: https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00857.html

Jason, since you approved the original patch, can you please also
review this one?  Due to the Ada test breakage there seems to be
some anxiety about getting the problem corrected soon.

Thanks
Martin

On 2/11/19 6:13 PM, Martin Sebor wrote:
> The attached patch removes the assumption introduced earlier today
> in my fix for bug 87996 that the valid_constant_size_p argument is
> a constant expression.  I couldn't come up with a C/C++ test case
> where this isn't true but apparently it can happen in Ada which I
> inadvertently didn't build.  I still haven't figured out what
> I have to do to build it on my Fedora 29 machine so I tested
> this change by hand (besides bootstrapping w/o Ada).
>
> The first set of instructions Google gives me don't seem to do
> it:
>
>    https://fedoraproject.org/wiki/Features/Ada_developer_tools
>
> and neither does dnf install gcc-gnat as explained on our Wiki:
>
>    https://gcc.gnu.org/wiki/GNAT
>
> If someone knows the magic chant I would be grateful (it might
> be helpful to also update the Wiki page -- the last change to
> it was made in 2012; I volunteer to do that).
>
> Martin




Re: [PATCH] Avoid assuming valid_constant_size_p argument is a constant expression (PR 89294)

2019-02-15 Thread Eric Botcazou
> I'm ready to commit the patch once it's approved, and have been since
> the day the problem was reported.

Maybe CCing whoever approved the previous patch would help?

-- 
Eric Botcazou


Re: [PATCH] Avoid assuming valid_constant_size_p argument is a constant expression (PR 89294)

2019-02-15 Thread Martin Sebor

On 2/15/19 3:46 PM, Eric Botcazou wrote:
> > I'm ready to commit the patch once it's approved, and have been since
> > the day the problem was reported.
>
> Maybe CCing whoever approved the previous patch would help?

I just pinged the patch a few minutes ago and CC'd Jason.  Sorry
about any trouble this has caused.

Martin


[SPARC] Small ASAN fixes

2019-02-15 Thread Eric Botcazou
This automatically passes -funwind-tables when ASAN is used on Linux, as done 
for other architectures, and also adjusts the shadow offset in 64-bit mode.
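
For reference, the shadow offset enters the usual ASAN address mapping, shadow = (addr >> shift) + offset.  The sketch below uses the two offsets from the patched sparc_asan_shadow_offset; the function names are hypothetical and the shift of 3 (eight application bytes per shadow byte) is an assumption matching the common ASAN configuration.

```c
#include <stdint.h>

/* Assumed shadow scale: 8 application bytes map to 1 shadow byte.  */
#define SHADOW_SHIFT 3

/* Offsets as returned by the patched sparc_asan_shadow_offset:
   1 << 43 in 64-bit mode, 1 << 29 in 32-bit mode.  */
uint64_t shadow_offset (int arch64)
{
  return arch64 ? (UINT64_C (1) << 43) : (UINT64_C (1) << 29);
}

/* Map an application address to its shadow byte address.  */
uint64_t shadow_address (uint64_t addr, int arch64)
{
  return (addr >> SHADOW_SHIFT) + shadow_offset (arch64);
}
```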

Tested on SPARC64/Linux, applied on the mainline.


2019-02-15  Eric Botcazou  

* config/sparc/linux.h (ASAN_CC1_SPEC): Define.
(CC1_SPEC): Use GNU_USER_TARGET_CC1_SPEC and ASAN_CC1_SPEC.
* config/sparc/linux64.h (ASAN_CC1_SPEC): Likewise.
(CC1_SPEC): Likewise.
* config/sparc/sparc.c (sparc_asan_shadow_offset): Adjust for 64-bit.

-- 
Eric Botcazou
Index: config/sparc/linux.h
===
--- config/sparc/linux.h	(revision 268849)
+++ config/sparc/linux.h	(working copy)
@@ -54,10 +54,11 @@ extern const char *host_detect_local_cpu
 
 #define DRIVER_SELF_SPECS MCPU_MTUNE_NATIVE_SPECS
 
-/* This is for -profile to use -lc_p instead of -lc.  */
-#undef	CC1_SPEC
-#define	CC1_SPEC "%{profile:-p} \
-"
+#undef  ASAN_CC1_SPEC
+#define ASAN_CC1_SPEC "%{%:sanitize(address):-funwind-tables}"
+
+#undef  CC1_SPEC
+#define CC1_SPEC GNU_USER_TARGET_CC1_SPEC ASAN_CC1_SPEC
 
 #undef SIZE_TYPE
 #define SIZE_TYPE "unsigned int"
Index: config/sparc/linux64.h
===
--- config/sparc/linux64.h	(revision 268849)
+++ config/sparc/linux64.h	(working copy)
@@ -143,24 +143,25 @@ extern const char *host_detect_local_cpu
 
 #define DRIVER_SELF_SPECS MCPU_MTUNE_NATIVE_SPECS
 
-#undef	CC1_SPEC
+#undef  ASAN_CC1_SPEC
+#define ASAN_CC1_SPEC "%{%:sanitize(address):-funwind-tables}"
+
+#undef  CC1_SPEC
 #if DEFAULT_ARCH32_P
-#define CC1_SPEC "%{profile:-p} \
-%{m32:%{m64:%emay not use both -m32 and -m64}} \
+#define CC1_SPEC GNU_USER_TARGET_CC1_SPEC ASAN_CC1_SPEC \
+"%{m32:%{m64:%emay not use both -m32 and -m64}} \
 %{m64:-mptr64 -mstack-bias -mlong-double-128 \
   %{!mcpu*:-mcpu=ultrasparc} \
-  %{!mno-vis:%{!mcpu=v9:-mvis}}} \
-"
+  %{!mno-vis:%{!mcpu=v9:-mvis}}}"
 #else
-#define CC1_SPEC "%{profile:-p} \
-%{m32:%{m64:%emay not use both -m32 and -m64}} \
+#define CC1_SPEC GNU_USER_TARGET_CC1_SPEC ASAN_CC1_SPEC \
+"%{m32:%{m64:%emay not use both -m32 and -m64}} \
 %{m32:-mptr32 -mno-stack-bias %{!mlong-double-128:-mlong-double-64} \
   %{!mcpu*:-mcpu=cypress}} \
 %{mv8plus:-mptr32 -mno-stack-bias %{!mlong-double-128:-mlong-double-64} \
   %{!mcpu*:-mcpu=v9}} \
 %{!m32:%{!mcpu*:-mcpu=ultrasparc}} \
-%{!mno-vis:%{!m32:%{!mcpu=v9:-mvis}}} \
-"
+%{!mno-vis:%{!m32:%{!mcpu=v9:-mvis}}}"
 #endif
 
 /* Support for a compile-time default CPU, et cetera.  The rules are:
Index: config/sparc/sparc.c
===
--- config/sparc/sparc.c	(revision 268849)
+++ config/sparc/sparc.c	(working copy)
@@ -12524,7 +12524,7 @@ sparc_init_machine_status (void)
 static unsigned HOST_WIDE_INT
 sparc_asan_shadow_offset (void)
 {
-  return TARGET_ARCH64 ? HOST_WIDE_INT_C (0x7fff8000) : (HOST_WIDE_INT_1 << 29);
+  return TARGET_ARCH64 ? (HOST_WIDE_INT_1 << 43) : (HOST_WIDE_INT_1 << 29);
 }
 
 /* This is called from dwarf2out.c via TARGET_ASM_OUTPUT_DWARF_DTPREL.


[testsuite] Couple of g++.dg/asan tweaks

2019-02-15 Thread Eric Botcazou
One of the tests in g++.dg/asan/asan_oob_test.cc uses unaligned memory
accesses and g++.dg/asan/function-argument-3.C assumes a specific
calling convention for vectors.

Tested on SPARC64/Linux, applied on the mainline.


2019-02-15  Eric Botcazou  

* g++.dg/asan/asan_oob_test.cc: Skip OOB_int on SPARC.
* g++.dg/asan/function-argument-3.C: Tweak for 32-bit SPARC.

-- 
Eric Botcazou
Index: g++.dg/asan/asan_oob_test.cc
===
--- g++.dg/asan/asan_oob_test.cc	(revision 268849)
+++ g++.dg/asan/asan_oob_test.cc	(working copy)
@@ -68,9 +68,13 @@ TEST(AddressSanitizer, OOB_char) {
   OOBTest();
 }
 
+// The following test uses unaligned memory accesses
+
+#if !defined(__sparc__)
 TEST(AddressSanitizer, OOB_int) {
   OOBTest();
 }
+#endif
 
 TEST(AddressSanitizer, OOBRightTest) {
   for (size_t access_size = 1; access_size <= 8; access_size *= 2) {
Index: g++.dg/asan/function-argument-3.C
===
--- g++.dg/asan/function-argument-3.C	(revision 268849)
+++ g++.dg/asan/function-argument-3.C	(working copy)
@@ -2,7 +2,16 @@
 // { dg-shouldfail "asan" }
 // { dg-additional-options "-Wno-psabi" }
 
+// On SPARC 32-bit, only vectors up to 8 bytes are passed in registers
+#if defined(__sparc__) && !defined(__sparcv9) && !defined(__arch64__)
+#define SMALL_VECTOR
+#endif
+
+#ifdef SMALL_VECTOR
+typedef int v4si __attribute__ ((vector_size (8)));
+#else
 typedef int v4si __attribute__ ((vector_size (16)));
+#endif
 
 static __attribute__ ((noinline)) int
 goo (v4si *a)
@@ -19,10 +28,14 @@ foo (v4si arg)
 int
 main ()
 {
+#ifdef SMALL_VECTOR
+  v4si v = {1,2};
+#else
   v4si v = {1,2,3,4};
+#endif
   return foo (v);
 }
 
 // { dg-output "ERROR: AddressSanitizer: stack-buffer-overflow on address.*(\n|\r\n|\r)" }
 // { dg-output "READ of size . at.*" }
-// { dg-output ".*'arg' \\(line 14\\) <== Memory access at offset \[0-9\]* overflows this variable.*" }
+// { dg-output ".*'arg' \\(line 23\\) <== Memory access at offset \[0-9\]* overflows this variable.*" }


Go patch committed: Use __builtin_dwarf_cfa for getcallersp

2019-02-15 Thread Ian Lance Taylor
This patch by Cherry Zhang changes the Go compiler and runtime to use
__builtin_dwarf_cfa for getcallersp.  Currently, the compiler lowers
runtime.getcallersp to __builtin_frame_address(1).  In the C side of
the runtime, getcallersp is defined as __builtin_frame_address(0).
They don't match.  Further, neither of them actually returns the
caller's SP.  On x86_64, __builtin_frame_address(0) just returns the
frame pointer.  __builtin_frame_address(1) returns the contents of the
memory that the frame pointer points to, which is typically the caller's
frame pointer but can also be garbage if the frame pointer is not
enabled.

This patch changes getcallersp to use __builtin_dwarf_cfa(), which
returns the caller's SP at the call site.  This matches the SP we get
from unwinding the stack.
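
The two builtins can be compared directly in C.  This is only a sketch with hypothetical helper names; it relies on nothing beyond the GCC builtins named above, and the exact distance between the two values depends on the target and on whether a frame pointer is in use.

```c
#include <stdint.h>

/* Hypothetical helpers; only the two builtins come from GCC.
   noinline keeps each helper in its own frame.  */

/* Frame address of this function, as the C side of the runtime
   currently computes getcallersp.  */
__attribute__ ((noinline)) uintptr_t frame_address_here (void)
{
  return (uintptr_t) __builtin_frame_address (0);
}

/* Canonical frame address of this function: the value the unwinder
   treats as the caller's SP at the call site.  */
__attribute__ ((noinline)) uintptr_t dwarf_cfa_here (void)
{
  return (uintptr_t) __builtin_dwarf_cfa ();
}
```

Printing both values from the same call site shows that they are different addresses, which is the mismatch described above.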

Currently getcallersp is not used for anything real.  It will be used
for precise stack scanning.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian


2019-02-15  Cherry Zhang  

* go-gcc.cc (Gcc_backend::Gcc_backend): Define __builtin_dwarf_cfa
instead of __builtin_frame_address.
Index: gcc/go/go-gcc.cc
===
--- gcc/go/go-gcc.cc(revision 268369)
+++ gcc/go/go-gcc.cc(working copy)
@@ -734,8 +734,9 @@ Gcc_backend::Gcc_backend()
   this->define_builtin(BUILT_IN_RETURN_ADDRESS, "__builtin_return_address",
   NULL, t, false, false);
 
-  // The runtime calls __builtin_frame_address for runtime.getcallersp.
-  this->define_builtin(BUILT_IN_FRAME_ADDRESS, "__builtin_frame_address",
+  // The runtime calls __builtin_dwarf_cfa for runtime.getcallersp.
+  t = build_function_type_list(ptr_type_node, NULL_TREE);
+  this->define_builtin(BUILT_IN_DWARF_CFA, "__builtin_dwarf_cfa",
   NULL, t, false, false);
 
   // The runtime calls __builtin_extract_return_addr when recording
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 268948)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-1a74b8a22b2ff7f430729aa87ecb8cea7b5cdd70
+9605c2efd99aa9c744652a9153e208e0653b8596
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 268923)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -9903,17 +9903,18 @@ Call_expression::do_lower(Gogo* gogo, Na
  && n == "getcallerpc")
{
  static Named_object* builtin_return_address;
+  int arg = 0;
  return this->lower_to_builtin(&builtin_return_address,
"__builtin_return_address",
-   0);
+   &arg);
}
  else if ((this->args_ == NULL || this->args_->size() == 0)
   && n == "getcallersp")
{
- static Named_object* builtin_frame_address;
- return this->lower_to_builtin(&builtin_frame_address,
-   "__builtin_frame_address",
-   1);
+ static Named_object* builtin_dwarf_cfa;
+ return this->lower_to_builtin(&builtin_dwarf_cfa,
+   "__builtin_dwarf_cfa",
+   NULL);
}
}
 }
@@ -10031,21 +10032,24 @@ Call_expression::lower_varargs(Gogo* gog
   this->varargs_are_lowered_ = true;
 }
 
-// Return a call to __builtin_return_address or __builtin_frame_address.
+// Return a call to __builtin_return_address or __builtin_dwarf_cfa.
 
 Expression*
 Call_expression::lower_to_builtin(Named_object** pno, const char* name,
- int arg)
+ int* arg)
 {
   if (*pno == NULL)
-*pno = Gogo::declare_builtin_rf_address(name);
+*pno = Gogo::declare_builtin_rf_address(name, arg != NULL);
 
   Location loc = this->location();
 
   Expression* fn = Expression::make_func_reference(*pno, NULL, loc);
-  Expression* a = Expression::make_integer_ul(arg, NULL, loc);
   Expression_list *args = new Expression_list();
-  args->push_back(a);
+  if (arg != NULL)
+{
+  Expression* a = Expression::make_integer_ul(*arg, NULL, loc);
+  args->push_back(a);
+}
   Expression* call = Expression::make_call(fn, args, false, loc);
 
   // The builtin functions return void*, but the Go functions return uintptr.
Index: gcc/go/gofrontend/expressions.h
===
--- gcc/go/gofrontend/expressions.h (revision 268369)
+++ gcc/go/gofrontend/expressions.h (working copy)
@@ -2356,7 +2356,7 @@ class Call_expression : pub

Re: Fortran vector math header

2019-02-15 Thread Steve Kargl
On Tue, Feb 05, 2019 at 01:47:57PM +0100, Martin Liška wrote:
> 
> gcc/fortran/ChangeLog:
> 
> 2019-01-24  Martin Liska  
> 
>   * decl.c (gfc_match_gcc_builtin): Add support for filtering
>   of builtin directive based on multilib ABI name.
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-01-24  Martin Liska  
> 
>   * gfortran.dg/simd-builtins-7.f90: New test.
>   * gfortran.dg/simd-builtins-7.h: New test.

The Fortran bits look ok to me.

-- 
steve


[PATCH 04/42] i386: Emulate MMX plusminus/sat_plusminus with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX plusminus/sat_plusminus with SSE.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/mmx.md (MMXMODEI8): Require TARGET_SSE2 for V1DI.
(plusminus:mmx_3): Check
TARGET_MMX_WITH_SSE.
(sat_plusminus:mmx_3): Likewise.
(3): New.
(*mmx_3): Add SSE emulation.
(*mmx_3): Likewise.
---
 gcc/config/i386/mmx.md | 59 +++---
 1 file changed, 38 insertions(+), 21 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 34fecd6a745..517c3283963 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -45,7 +45,7 @@
 
 ;; 8 byte integral modes handled by MMX (and by extension, SSE)
 (define_mode_iterator MMXMODEI [V8QI V4HI V2SI])
-(define_mode_iterator MMXMODEI8 [V8QI V4HI V2SI V1DI])
+(define_mode_iterator MMXMODEI8 [V8QI V4HI V2SI (V1DI "TARGET_SSE2")])
 
 ;; All 8-byte vector modes handled by MMX
 (define_mode_iterator MMXMODE [V8QI V4HI V2SI V1DI V2SF])
@@ -663,39 +663,56 @@
 (define_expand "mmx_3"
   [(set (match_operand:MMXMODEI8 0 "register_operand")
(plusminus:MMXMODEI8
- (match_operand:MMXMODEI8 1 "nonimmediate_operand")
- (match_operand:MMXMODEI8 2 "nonimmediate_operand")))]
-  "TARGET_MMX || (TARGET_SSE2 && mode == V1DImode)"
+ (match_operand:MMXMODEI8 1 "register_mmxmem_operand")
+ (match_operand:MMXMODEI8 2 "register_mmxmem_operand")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "ix86_fixup_binary_operands_no_copy (, mode, operands);")
+
+(define_expand "3"
+  [(set (match_operand:MMXMODEI 0 "register_operand")
+   (plusminus:MMXMODEI
+ (match_operand:MMXMODEI 1 "register_operand")
+ (match_operand:MMXMODEI 2 "register_operand")))]
+  "TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (, mode, operands);")
 
 (define_insn "*mmx_3"
-  [(set (match_operand:MMXMODEI8 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI8 0 "register_operand" "=y,x,Yv")
 (plusminus:MMXMODEI8
- (match_operand:MMXMODEI8 1 "nonimmediate_operand" "0")
- (match_operand:MMXMODEI8 2 "nonimmediate_operand" "ym")))]
-  "(TARGET_MMX || (TARGET_SSE2 && mode == V1DImode))
+ (match_operand:MMXMODEI8 1 "register_mmxmem_operand" "0,0,Yv")
+ (match_operand:MMXMODEI8 2 "register_mmxmem_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
&& ix86_binary_operator_ok (, mode, operands)"
-  "p\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+  "@
+   p\t{%2, %0|%0, %2}
+   p\t{%2, %0|%0, %2}
+   vp\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sseadd,sseadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_3"
   [(set (match_operand:MMXMODE12 0 "register_operand")
(sat_plusminus:MMXMODE12
- (match_operand:MMXMODE12 1 "nonimmediate_operand")
- (match_operand:MMXMODE12 2 "nonimmediate_operand")))]
-  "TARGET_MMX"
+ (match_operand:MMXMODE12 1 "register_mmxmem_operand")
+ (match_operand:MMXMODE12 2 "register_mmxmem_operand")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (, mode, operands);")
 
 (define_insn "*mmx_3"
-  [(set (match_operand:MMXMODE12 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODE12 0 "register_operand" "=y,x,Yv")
 (sat_plusminus:MMXMODE12
- (match_operand:MMXMODE12 1 "nonimmediate_operand" "0")
- (match_operand:MMXMODE12 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX && ix86_binary_operator_ok (, mode, operands)"
-  "p\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+ (match_operand:MMXMODE12 1 "register_mmxmem_operand" "0,0,Yv")
+ (match_operand:MMXMODE12 2 "register_mmxmem_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (, mode, operands)"
+  "@
+   p\t{%2, %0|%0, %2}
+   p\t{%2, %0|%0, %2}
+   vp\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sseadd,sseadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_mulv4hi3"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 01/42] i386: Allow MMX register modes in SSE registers

2019-02-15 Thread H.J. Lu
In 64-bit mode, SSE2 can be used to emulate MMX instructions without
3DNOW.  We can use SSE2 to support MMX register modes.

PR target/89021
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__MMX_WITH_SSE__ for TARGET_MMX_WITH_SSE.
* config/i386/i386.c (ix86_set_reg_reg_cost): Add support for
TARGET_MMX_WITH_SSE with VALID_MMX_REG_MODE.
(ix86_vector_mode_supported_p): Likewise.
* config/i386/i386.h (TARGET_MMX_WITH_SSE): New.
---
 gcc/config/i386/i386-c.c | 2 ++
 gcc/config/i386/i386.c   | 5 +++--
 gcc/config/i386/i386.h   | 2 ++
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c
index 5e7e46fcebe..213e1b56c6b 100644
--- a/gcc/config/i386/i386-c.c
+++ b/gcc/config/i386/i386-c.c
@@ -548,6 +548,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
 def_or_undef (parse_in, "__CLDEMOTE__");
   if (isa_flag2 & OPTION_MASK_ISA_PTWRITE)
 def_or_undef (parse_in, "__PTWRITE__");
+  if (TARGET_MMX_WITH_SSE)
+def_or_undef (parse_in, "__MMX_WITH_SSE__");
   if (TARGET_IAMCU)
 {
   def_or_undef (parse_in, "__iamcu");
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 3e5f52175d2..7d7dd80930e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -40490,7 +40490,8 @@ ix86_set_reg_reg_cost (machine_mode mode)
  || (TARGET_AVX && VALID_AVX256_REG_MODE (mode))
  || (TARGET_SSE2 && VALID_SSE2_REG_MODE (mode))
  || (TARGET_SSE && VALID_SSE_REG_MODE (mode))
- || (TARGET_MMX && VALID_MMX_REG_MODE (mode)))
+ || ((TARGET_MMX || TARGET_MMX_WITH_SSE)
+ && VALID_MMX_REG_MODE (mode)))
units = GET_MODE_SIZE (mode);
 }
 
@@ -44316,7 +44317,7 @@ ix86_vector_mode_supported_p (machine_mode mode)
 return true;
   if (TARGET_AVX512F && VALID_AVX512F_REG_MODE (mode))
 return true;
-  if (TARGET_MMX && VALID_MMX_REG_MODE (mode))
+  if ((TARGET_MMX ||TARGET_MMX_WITH_SSE) && VALID_MMX_REG_MODE (mode))
 return true;
   if (TARGET_3DNOW && VALID_MMX_REG_MODE_3DNOW (mode))
 return true;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 4fd8bc40a34..91b233022c2 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -201,6 +201,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define TARGET_16BIT   TARGET_CODE16
 #define TARGET_16BIT_P(x)  TARGET_CODE16_P(x)
 
+#define TARGET_MMX_WITH_SSE (TARGET_64BIT && TARGET_SSE2)
+
 #include "config/vxworks-dummy.h"
 
 #include "config/i386/i386-opts.h"
-- 
2.20.1



[PATCH 00/42] V7: Emulate MMX intrinsics with SSE

2019-02-15 Thread H.J. Lu
On x86-64, since __m64 is returned and passed in XMM registers, we can
emulate MMX intrinsics with SSE instructions. To support it, we added

 #define TARGET_MMX_WITH_SSE (TARGET_64BIT && TARGET_SSE2)

;; Define instruction set of MMX instructions
(define_attr "mmx_isa" "base,native,x64,x64_noavx,x64_avx"
  (const_string "base"))

 (eq_attr "mmx_isa" "native")
   (symbol_ref "!TARGET_MMX_WITH_SSE")
 (eq_attr "mmx_isa" "x64")
   (symbol_ref "TARGET_MMX_WITH_SSE")
 (eq_attr "mmx_isa" "x64_avx")
   (symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX")
 (eq_attr "mmx_isa" "x64_noavx")
   (symbol_ref "TARGET_MMX_WITH_SSE && !TARGET_AVX")

We added SSE emulation to MMX patterns and disabled MMX alternatives with
TARGET_MMX_WITH_SSE.

Most MMX instructions have equivalent SSE versions, and the results of
some SSE versions need to be reshuffled into the right order for MMX.
There are a couple of tricky cases:

1. MMX maskmovq and SSE2 maskmovdqu aren't equivalent.  We emulate MMX
maskmovq with SSE2 maskmovdqu by zeroing out the upper 64 bits of the
mask operand and handle unmapped bits 64:127 at memory address by
adjusting source and mask operands together with memory address.

2. MMX movntq is emulated with SSE2 DImode movnti, which is available
in 64-bit mode.

3. MMX pshufb takes a 3-bit index while SSE pshufb takes a 4-bit index.
SSE emulation must clear bit 3 (the fourth bit) of each byte in the
shuffle control mask.

4. To emulate MMX cvtpi2ps with SSE2 cvtdq2ps, we must properly preserve
the upper 64 bits of destination XMM register.
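
As an illustration of the pshufb case, here is a scalar model in plain C rather than intrinsics.  All function names are hypothetical; the sketch only models the documented instruction semantics and is not the code from the patch.  Clearing the fourth index bit (value 0x08, i.e. masking each control byte with 0xf7) keeps the 128-bit shuffle from ever selecting a byte from the upper half of the register.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of the 64-bit (MMX) pshufb: 3-bit index, bit 7 zeroes.  */
void pshufb64 (uint8_t dst[8], const uint8_t src[8], const uint8_t ctl[8])
{
  for (int i = 0; i < 8; i++)
    dst[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x07];
}

/* Scalar model of the 128-bit (SSE) pshufb: 4-bit index, bit 7 zeroes.  */
void pshufb128 (uint8_t dst[16], const uint8_t src[16], const uint8_t ctl[16])
{
  for (int i = 0; i < 16; i++)
    dst[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0f];
}

/* Emulate the 64-bit shuffle with the 128-bit one by clearing bit 3
   of every control byte before the wide shuffle.  */
void pshufb64_emulated (uint8_t dst[8], const uint8_t src[8],
                        const uint8_t ctl[8])
{
  uint8_t wsrc[16], wctl[16], wdst[16];
  memset (wsrc, 0xee, sizeof wsrc);  /* junk in the upper half */
  memset (wctl, 0, sizeof wctl);
  memcpy (wsrc, src, 8);
  for (int i = 0; i < 8; i++)
    wctl[i] = ctl[i] & 0xf7;
  pshufb128 (wdst, wsrc, wctl);
  memcpy (dst, wdst, 8);
}

/* Return 1 if both models agree when every control byte equals C.  */
int emulation_matches (uint8_t c)
{
  const uint8_t src[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };
  uint8_t ctl[8], a[8], b[8];
  memset (ctl, c, sizeof ctl);
  pshufb64 (a, src, ctl);
  pshufb64_emulated (b, src, ctl);
  return memcmp (a, b, 8) == 0;
}
```

Without the 0xf7 mask, a control byte such as 0x0f would select byte 15 of the wide source, which lies in the junk upper half.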

Tests are also added to check each SSE emulation of MMX intrinsics.

There are no regressions on i686 and x86-64.  For x86-64, GCC is also
tested with

--with-arch=native --with-cpu=native

on AVX2 and AVX512F machines.

H.J. Lu (41):
  i386: Allow MMX register modes in SSE registers
  i386: Emulate MMX packsswb/packssdw/packuswb with SSE2
  i386: Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX
  i386: Emulate MMX plusminus/sat_plusminus with SSE
  i386: Emulate MMX mulv4hi3 with SSE
  i386: Emulate MMX smulv4hi3_highpart with SSE
  i386: Emulate MMX mmx_pmaddwd with SSE
  i386: Emulate MMX ashr3/3 with SSE
  i386: Emulate MMX 3 with SSE
  i386: Emulate MMX mmx_andnot3 with SSE
  i386: Emulate MMX mmx_eq/mmx_gt3 with SSE
  i386: Emulate MMX vec_dupv2si with SSE
  i386: Emulate MMX pshufw with SSE
  i386: Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE
  i386: Emulate MMX sse_cvtpi2ps with SSE
  i386: Emulate MMX mmx_pextrw with SSE
  i386: Emulate MMX mmx_pinsrw with SSE
  i386: Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE
  i386: Emulate MMX mmx_pmovmskb with SSE
  i386: Emulate MMX mmx_umulv4hi3_highpart with SSE
  i386: Emulate MMX maskmovq with SSE2 maskmovdqu
  i386: Emulate MMX mmx_uavgv8qi3 with SSE
  i386: Emulate MMX mmx_uavgv4hi3 with SSE
  i386: Emulate MMX mmx_psadbw with SSE
  i386: Emulate MMX movntq with SSE2 movntidi
  i386: Emulate MMX umulv1siv1di3 with SSE2
  i386: Make _mm_empty () as NOP when MMX is disabled
  i386: Emulate MMX ssse3_phwv4hi3 with SSE
  i386: Emulate MMX ssse3_phdv2si3 with SSE
  i386: Emulate MMX ssse3_pmaddubsw with SSE
  i386: Emulate MMX ssse3_pmulhrswv4hi3 with SSE
  i386: Emulate MMX pshufb with SSE version
  i386: Emulate MMX ssse3_psign3 with SSE
  i386: Emulate MMX ssse3_palignrdi with SSE
  i386: Emulate MMX abs2 with SSE
  i386: Correct _pmulhrsw3[_mask]
  i386: Allow MMXMODE moves with TARGET_MMX_WITH_SSE
  i386: Allow MMX vector expanders with TARGET_MMX_WITH_SSE
  i386: Allow MMX intrinsic emulation with SSE
  i386: Enable TM MMX intrinsics with SSE2
  i386: Add tests for MMX intrinsic emulations with SSE

Uros Bizjak (1):
  Prevent allocation of MMX registers with TARGET_MMX_WITH_SSE

 gcc/config/i386/constraints.md|   6 +
 gcc/config/i386/i386-builtin.def  | 126 +--
 gcc/config/i386/i386-c.c  |   2 +
 gcc/config/i386/i386-protos.h |   4 +
 gcc/config/i386/i386.c| 189 +++-
 gcc/config/i386/i386.h|   2 +
 gcc/config/i386/i386.md   |  17 +
 gcc/config/i386/mmintrin.h|  12 +-
 gcc/config/i386/mmx.md| 986 --
 gcc/config/i386/predicates.md |  14 +
 gcc/config/i386/sse.md| 368 +--
 gcc/config/i386/xmmintrin.h   |  61 ++
 gcc/testsuite/gcc.target/i386/mmx-vals.h  |  77 ++
 gcc/testsuite/gcc.target/i386/pr82483-1.c |   2 +-
 gcc/testsuite/gcc.target/i386/pr82483-2.c |   2 +-
 gcc/testsuite/gcc.target/i386/sse2-mmx-10.c   |  43 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-11.c   |  39 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-12.c   |  42 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-13.c   |  40 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-14.c   |  31 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-15.c   |  36 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-16.c   |

[PATCH 03/42] i386: Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX

2019-02-15 Thread H.J. Lu
Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX.  For MMX punpckhXX,
move bits 64:127 to bits 0:63 in SSE register.  Only SSE register source
operand is allowed.
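
The lane arithmetic behind this emulation can be checked with a small model in portable C. This is only an illustrative sketch, not code from the patch: the function names (`sse_punpcklbw_model`, `mmx_punpckbw_emulated`, `punpck_emulation_matches`) are hypothetical, registers are modelled as byte arrays, and only the byte (bw) variant is shown. It assumes the 64-bit MMX value sits in the low half of a 128-bit SSE register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of SSE punpcklbw: interleave the low 8 bytes of A and B into a
   16-byte result (a0 b0 a1 b1 ... a7 b7).  */
void
sse_punpcklbw_model (uint8_t dst[16], const uint8_t a[16], const uint8_t b[16])
{
  uint8_t tmp[16];
  for (int i = 0; i < 8; i++)
    {
      tmp[2 * i] = a[i];
      tmp[2 * i + 1] = b[i];
    }
  memcpy (dst, tmp, 16);
}

/* Reference semantics of MMX punpcklbw (high_p == 0) and punpckhbw
   (high_p == 1) on 8-byte values.  */
void
mmx_punpckbw_ref (uint8_t dst[8], const uint8_t a[8], const uint8_t b[8],
                  int high_p)
{
  int base = high_p ? 4 : 0;
  uint8_t tmp[8];
  for (int i = 0; i < 4; i++)
    {
      tmp[2 * i] = a[base + i];
      tmp[2 * i + 1] = b[base + i];
    }
  memcpy (dst, tmp, 8);
}

/* Emulation: place each MMX operand in the low 8 bytes of a 16-byte
   register, run the SSE punpcklbw model, and for the high variant move
   bits 64:127 down to bits 0:63.  */
void
mmx_punpckbw_emulated (uint8_t dst[8], const uint8_t a[8], const uint8_t b[8],
                       int high_p)
{
  assert (high_p == 0 || high_p == 1);
  uint8_t xa[16] = { 0 }, xb[16] = { 0 }, xr[16];
  memcpy (xa, a, 8);
  memcpy (xb, b, 8);
  sse_punpcklbw_model (xr, xa, xb);
  if (high_p)
    memmove (xr, xr + 8, 8);
  memcpy (dst, xr, 8);
}

/* Return 1 if the emulation agrees with the reference for both variants
   on one sample input.  */
int
punpck_emulation_matches (void)
{
  const uint8_t a[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
  const uint8_t b[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };
  for (int high_p = 0; high_p < 2; high_p++)
    {
      uint8_t ref[8], emu[8];
      mmx_punpckbw_ref (ref, a, b, high_p);
      mmx_punpckbw_emulated (emu, a, b, high_p);
      if (memcmp (ref, emu, 8) != 0)
        return 0;
    }
  return 1;
}
```

The low variant needs no fixup because the low 64 bits of the SSE result already hold a0 b0 ... a3 b3; the high variant's a4 b4 ... a7 b7 land in the upper 64 bits, hence the extra move.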

PR target/89021
* config/i386/i386-protos.h (ix86_split_mmx_punpck): New
prototype.
* config/i386/i386.c (ix86_split_mmx_punpck): New function.
* config/i386/mmx.md (mmx_punpckhbw): Changed to
define_insn_and_split to support SSE emulation.
(mmx_punpcklbw): Likewise.
(mmx_punpckhwd): Likewise.
(mmx_punpcklwd): Likewise.
(mmx_punpckhdq): Likewise.
(mmx_punpckldq): Likewise.
---
 gcc/config/i386/i386-protos.h |   1 +
 gcc/config/i386/i386.c|  77 +++
 gcc/config/i386/mmx.md| 138 ++
 3 files changed, 168 insertions(+), 48 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index a53b48438ec..37581837a32 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -204,6 +204,7 @@ extern rtx ix86_split_stack_guard (void);
 
 extern void ix86_move_vector_high_sse_to_mmx (rtx);
 extern void ix86_split_mmx_pack (rtx[], enum rtx_code);
+extern void ix86_split_mmx_punpck (rtx[], bool);
 
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d31b69d9a82..a76c17beece 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -20275,6 +20275,83 @@ ix86_split_mmx_pack (rtx operands[], enum rtx_code code)
   ix86_move_vector_high_sse_to_mmx (op0);
 }
 
+/* Split MMX punpcklXX/punpckhXX with SSE punpcklXX.  */
+
+void
+ix86_split_mmx_punpck (rtx operands[], bool high_p)
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+  machine_mode mode = GET_MODE (op0);
+  rtx mask;
+  /* The corresponding SSE mode.  */
+  machine_mode sse_mode, double_sse_mode;
+
+  switch (mode)
+{
+case E_V8QImode:
+  sse_mode = V16QImode;
+  double_sse_mode = V32QImode;
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (16,
+ GEN_INT (0), GEN_INT (16),
+ GEN_INT (1), GEN_INT (17),
+ GEN_INT (2), GEN_INT (18),
+ GEN_INT (3), GEN_INT (19),
+ GEN_INT (4), GEN_INT (20),
+ GEN_INT (5), GEN_INT (21),
+ GEN_INT (6), GEN_INT (22),
+ GEN_INT (7), GEN_INT (23)));
+  break;
+
+case E_V4HImode:
+  sse_mode = V8HImode;
+  double_sse_mode = V16HImode;
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (8,
+ GEN_INT (0), GEN_INT (8),
+ GEN_INT (1), GEN_INT (9),
+ GEN_INT (2), GEN_INT (10),
+ GEN_INT (3), GEN_INT (11)));
+  break;
+
+case E_V2SImode:
+  sse_mode = V4SImode;
+  double_sse_mode = V8SImode;
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4,
+ GEN_INT (0), GEN_INT (4),
+ GEN_INT (1), GEN_INT (5)));
+  break;
+
+default:
+  gcc_unreachable ();
+}
+
+  /* Generate SSE punpcklXX.  */
+  rtx dest = lowpart_subreg (sse_mode, op0, GET_MODE (op0));
+  op1 = lowpart_subreg (sse_mode, op1, GET_MODE (op1));
+  op2 = lowpart_subreg (sse_mode, op2, GET_MODE (op2));
+
+  op1 = gen_rtx_VEC_CONCAT (double_sse_mode, op1, op2);
+  op2 = gen_rtx_VEC_SELECT (sse_mode, op1, mask);
+  rtx insn = gen_rtx_SET (dest, op2);
+  emit_insn (insn);
+
+  if (high_p)
+{
+  /* Move bits 64:127 to bits 0:63.  */
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (2), GEN_INT (3),
+ GEN_INT (0), GEN_INT (0)));
+  dest = lowpart_subreg (V4SImode, dest, GET_MODE (dest));
+  op1 = gen_rtx_VEC_SELECT (V4SImode, dest, mask);
+  insn = gen_rtx_SET (dest, op1);
+  emit_insn (insn);
+}
+}
+
 /* Helper function of ix86_fixup_binary_operands to canonicalize
operand order.  Returns true if the operands should be swapped.  */
 
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index dbb2baa74d7..34fecd6a745 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1064,87 +1064,129 @@
(set_attr "type" "mmxshft,sselog,sselog")
(set_attr "mode" "DI,TI,TI")])
 
-(define_insn "mmx_punpckhbw"
-  [(set (match_operand:V8QI 0 "register_operand" "=y")
+(define_insn_and_split "mmx_punpckhbw"
+  [(set (match_operand:V8QI 0 "register_opera

[PATCH 02/42] i386: Emulate MMX packsswb/packssdw/packuswb with SSE2

2019-02-15 Thread H.J. Lu
Emulate MMX packsswb/packssdw/packuswb with SSE packsswb/packssdw/packuswb
plus moving bits 64:95 to bits 32:63 in SSE register.  Only SSE register
source operand is allowed.
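
Why the extra move of bits 64:95 is needed can be seen in a small portable C model. This is a sketch for illustration only, not code from the patch: the names (`sat8`, `sse_packsswb_model`, `mmx_packsswb_emulated`) are hypothetical, registers are modelled as arrays, and only packsswb is shown. It assumes each 64-bit MMX operand occupies the low half of a 128-bit SSE register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Signed saturation of a 16-bit value to 8 bits.  */
int8_t
sat8 (int16_t v)
{
  if (v > 127)
    return 127;
  if (v < -128)
    return -128;
  return (int8_t) v;
}

/* Model of SSE2 packsswb: the 8 words of A go to bytes 0..7 and the
   8 words of B go to bytes 8..15, each with signed saturation.  */
void
sse_packsswb_model (int8_t dst[16], const int16_t a[8], const int16_t b[8])
{
  for (int i = 0; i < 8; i++)
    {
      dst[i] = sat8 (a[i]);
      dst[8 + i] = sat8 (b[i]);
    }
}

/* Emulated MMX packsswb: 4 words of A and 4 words of B produce 8 bytes.
   After the 128-bit pack, A's results sit in bytes 0..3 (bits 0:31) and
   B's results in bytes 8..11 (bits 64:95), so bits 64:95 are moved to
   bits 32:63 to form the 64-bit MMX result.  */
void
mmx_packsswb_emulated (int8_t dst[8], const int16_t a[4], const int16_t b[4])
{
  int16_t xa[8] = { 0 }, xb[8] = { 0 };
  int8_t xr[16];
  assert (dst != NULL && a != NULL && b != NULL);
  memcpy (xa, a, 4 * sizeof (int16_t));
  memcpy (xb, b, 4 * sizeof (int16_t));
  sse_packsswb_model (xr, xa, xb);
  memcpy (xr + 4, xr + 8, 4);
  memcpy (dst, xr, 8);
}
```

In the model, bytes 4..7 of the intermediate result come from the unused upper half of the first source and are simply overwritten by the move.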

2019-02-08  H.J. Lu  
Uros Bizjak  

PR target/89021
* config/i386/i386-protos.h (ix86_move_vector_high_sse_to_mmx):
New prototype.
(ix86_split_mmx_pack): Likewise.
* config/i386/i386.c (ix86_move_vector_high_sse_to_mmx): New
function.
(ix86_split_mmx_pack): Likewise.
* config/i386/i386.md (mmx_isa): New.
(enabled): Also check mmx_isa.
* config/i386/mmx.md (any_s_truncate): New code iterator.
(s_trunsuffix): New code attr.
(mmx_packsswb): Removed.
(mmx_packssdw): Likewise.
(mmx_packuswb): Likewise.
(mmx_packswb): New define_insn_and_split to emulate
MMX packsswb/packuswb with SSE2.
(mmx_packssdw): Likewise.
* config/i386/predicates.md (register_mmxmem_operand): New.
---
 gcc/config/i386/i386-protos.h |  3 ++
 gcc/config/i386/i386.c| 54 
 gcc/config/i386/i386.md   | 13 +++
 gcc/config/i386/mmx.md| 67 +++
 gcc/config/i386/predicates.md |  7 
 5 files changed, 114 insertions(+), 30 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 27f5cc13abf..a53b48438ec 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -202,6 +202,9 @@ extern void ix86_expand_vecop_qihi (enum rtx_code, rtx, rtx, rtx);
 
 extern rtx ix86_split_stack_guard (void);
 
+extern void ix86_move_vector_high_sse_to_mmx (rtx);
+extern void ix86_split_mmx_pack (rtx[], enum rtx_code);
+
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
 #endif /* TREE_CODE  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7d7dd80930e..d31b69d9a82 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -20221,6 +20221,60 @@ ix86_expand_vector_move_misalign (machine_mode mode, rtx operands[])
 gcc_unreachable ();
 }
 
+/* Move bits 64:95 to bits 32:63.  */
+
+void
+ix86_move_vector_high_sse_to_mmx (rtx op)
+{
+  rtx mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (0), GEN_INT (2),
+ GEN_INT (0), GEN_INT (0)));
+  rtx dest = lowpart_subreg (V4SImode, op, GET_MODE (op));
+  op = gen_rtx_VEC_SELECT (V4SImode, dest, mask);
+  rtx insn = gen_rtx_SET (dest, op);
+  emit_insn (insn);
+}
+
+/* Split MMX pack with signed/unsigned saturation with SSE/SSE2.  */
+
+void
+ix86_split_mmx_pack (rtx operands[], enum rtx_code code)
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+
+  machine_mode dmode = GET_MODE (op0);
+  machine_mode smode = GET_MODE (op1);
+  machine_mode inner_dmode = GET_MODE_INNER (dmode);
+  machine_mode inner_smode = GET_MODE_INNER (smode);
+
+  /* Get the corresponding SSE mode for destination.  */
+  int nunits = 16 / GET_MODE_SIZE (inner_dmode);
+  machine_mode sse_dmode = mode_for_vector (GET_MODE_INNER (dmode),
+   nunits).require ();
+  machine_mode sse_half_dmode = mode_for_vector (GET_MODE_INNER (dmode),
+nunits / 2).require ();
+
+  /* Get the corresponding SSE mode for source.  */
+  nunits = 16 / GET_MODE_SIZE (inner_smode);
+  machine_mode sse_smode = mode_for_vector (GET_MODE_INNER (smode),
+   nunits).require ();
+
+  /* Generate SSE pack with signed/unsigned saturation.  */
+  rtx dest = lowpart_subreg (sse_dmode, op0, GET_MODE (op0));
+  op1 = lowpart_subreg (sse_smode, op1, GET_MODE (op1));
+  op2 = lowpart_subreg (sse_smode, op2, GET_MODE (op2));
+
+  op1 = gen_rtx_fmt_e (code, sse_half_dmode, op1);
+  op2 = gen_rtx_fmt_e (code, sse_half_dmode, op2);
+  rtx insn = gen_rtx_SET (dest, gen_rtx_VEC_CONCAT (sse_dmode,
+   op1, op2));
+  emit_insn (insn);
+
+  ix86_move_vector_high_sse_to_mmx (op0);
+}
+
 /* Helper function of ix86_fixup_binary_operands to canonicalize
operand order.  Returns true if the operands should be swapped.  */
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 40ed93dc804..e1727676deb 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -792,6 +792,10 @@
avx512vl,noavx512vl,x64_avx512dq,x64_avx512bw"
   (const_string "base"))
 
+;; Define instruction set of MMX instructions
+(define_attr "mmx_isa" "base,native,x64,x64_noavx,x64_avx"
+  (const_string "base"))
+
 (define_attr "enabled" ""
   (cond [(eq_attr "isa" "x64") (symbol_ref "TARGET_64BIT")
 (eq_attr "isa" "x64_sse2")
@@ -830,6 +834,15 @@
 (eq_attr "isa" "noavx512dq") (symbol_ref "!TARGET_AVX512DQ")
 (eq_attr "isa" "avx512vl") (symbol_

[PATCH 08/42] i386: Emulate MMX ashr3/3 with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX ashr3/3 with SSE.  Only SSE register
source operand is allowed.

PR target/89021
* config/i386/mmx.md (mmx_ashr3): Also allow
TARGET_MMX_WITH_SSE.  Add SSE emulation.
(mmx_3): Likewise.
(ashr3): New.
(3): Likewise.
---
 gcc/config/i386/mmx.md | 50 ++
 1 file changed, 36 insertions(+), 14 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 9f0311badca..240e0188a78 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -959,32 +959,54 @@
(set_attr "mode" "DI")])
 
 (define_insn "mmx_ashr3"
-  [(set (match_operand:MMXMODE24 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODE24 0 "register_operand" "=y,x,Yv")
 (ashiftrt:MMXMODE24
- (match_operand:MMXMODE24 1 "register_operand" "0")
- (match_operand:DI 2 "nonmemory_operand" "yN")))]
-  "TARGET_MMX"
-  "psra\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxshft")
+ (match_operand:MMXMODE24 1 "register_operand" "0,0,Yv")
+ (match_operand:DI 2 "nonmemory_operand" "yN,xN,YvN")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   psra\t{%2, %0|%0, %2}
+   psra\t{%2, %0|%0, %2}
+   vpsra\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxshft,sseishft,sseishft")
(set (attr "length_immediate")
  (if_then_else (match_operand 2 "const_int_operand")
(const_string "1")
(const_string "0")))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
+
+(define_expand "ashr3"
+  [(set (match_operand:MMXMODE24 0 "register_operand")
+(ashiftrt:MMXMODE24
+ (match_operand:MMXMODE24 1 "register_operand")
+ (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_MMX_WITH_SSE")
 
 (define_insn "mmx_3"
-  [(set (match_operand:MMXMODE248 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODE248 0 "register_operand" "=y,x,Yv")
 (any_lshift:MMXMODE248
- (match_operand:MMXMODE248 1 "register_operand" "0")
- (match_operand:DI 2 "nonmemory_operand" "yN")))]
-  "TARGET_MMX"
-  "p\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxshft")
+ (match_operand:MMXMODE248 1 "register_operand" "0,0,Yv")
+ (match_operand:DI 2 "nonmemory_operand" "yN,xN,YvN")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   p\t{%2, %0|%0, %2}
+   p\t{%2, %0|%0, %2}
+   vp\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxshft,sseishft,sseishft")
(set (attr "length_immediate")
  (if_then_else (match_operand 2 "const_int_operand")
(const_string "1")
(const_string "0")))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
+
+(define_expand "3"
+  [(set (match_operand:MMXMODE248 0 "register_operand")
+(any_lshift:MMXMODE248
+ (match_operand:MMXMODE248 1 "register_operand")
+ (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_MMX_WITH_SSE")
 
 ;
 ;;
-- 
2.20.1



[PATCH 09/42] i386: Emulate MMX 3 with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX 3 with SSE.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/mmx.md (any_logic:mmx_3): Also allow
TARGET_MMX_WITH_SSE.
(any_logic:3): New.
(any_logic:*mmx_3): Also allow TARGET_MMX_WITH_SSE.
Add SSE support.
---
 gcc/config/i386/mmx.md | 33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 240e0188a78..7e2d40313c3 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1061,20 +1061,33 @@
 (define_expand "mmx_3"
   [(set (match_operand:MMXMODEI 0 "register_operand")
(any_logic:MMXMODEI
- (match_operand:MMXMODEI 1 "nonimmediate_operand")
- (match_operand:MMXMODEI 2 "nonimmediate_operand")))]
-  "TARGET_MMX"
+ (match_operand:MMXMODEI 1 "register_mmxmem_operand")
+ (match_operand:MMXMODEI 2 "register_mmxmem_operand")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "ix86_fixup_binary_operands_no_copy (, mode, operands);")
+
+(define_expand "3"
+  [(set (match_operand:MMXMODEI 0 "register_operand")
+   (any_logic:MMXMODEI
+ (match_operand:MMXMODEI 1 "register_operand")
+ (match_operand:MMXMODEI 2 "register_operand")))]
+  "TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (, mode, operands);")
 
 (define_insn "*mmx_3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
 (any_logic:MMXMODEI
- (match_operand:MMXMODEI 1 "nonimmediate_operand" "%0")
- (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX && ix86_binary_operator_ok (, mode, operands)"
-  "p\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+ (match_operand:MMXMODEI 1 "register_mmxmem_operand" "%0,0,Yv")
+ (match_operand:MMXMODEI 2 "register_mmxmem_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (, mode, operands)"
+  "@
+   p\t{%2, %0|%0, %2}
+   p\t{%2, %0|%0, %2}
+   vp\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sselog,sselog")
+   (set_attr "mode" "DI,TI,TI")])
 
 ;
 ;;
-- 
2.20.1



[PATCH 07/42] i386: Emulate MMX mmx_pmaddwd with SSE

2019-02-15 Thread H.J. Lu
Emulate MMX pmaddwd with SSE.  Only SSE register source operand is
allowed.
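
For readers unfamiliar with pmaddwd, the RTL pattern in this patch (two sign-extended V2HI selections multiplied and then added) corresponds to the scalar model below. It is a sketch under stated assumptions rather than code from the patch: the function name `pmaddwd_model` is hypothetical, and `nwords` is 4 for the 64-bit MMX form and 8 for the 128-bit SSE form.

```c
#include <assert.h>
#include <stdint.h>

/* Model of pmaddwd: multiply corresponding signed 16-bit elements and
   add each adjacent pair of 32-bit products:
     dst[i] = a[2*i] * b[2*i] + a[2*i+1] * b[2*i+1].
   The sum is formed in 64 bits and truncated to 32 bits, which mirrors
   the wraparound of the single overflowing input combination.  */
void
pmaddwd_model (int32_t *dst, const int16_t *a, const int16_t *b, int nwords)
{
  assert (nwords == 4 || nwords == 8);
  for (int i = 0; i < nwords / 2; i++)
    {
      int64_t even = (int64_t) a[2 * i] * b[2 * i];
      int64_t odd = (int64_t) a[2 * i + 1] * b[2 * i + 1];
      dst[i] = (int32_t) (uint32_t) (even + odd);
    }
}
```

Each result element depends only on the two input words directly beneath it, so the low 64 bits of the 128-bit form equal the 64-bit MMX result whenever the MMX operands occupy the low halves of the SSE registers. That is why no post-processing move appears in this patch, unlike the pack and punpckh cases.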

PR target/89021
* config/i386/mmx.md (mmx_pmaddwd): Also allow TARGET_MMX_WITH_SSE.
(*mmx_pmaddwd): Also allow TARGET_MMX_WITH_SSE.  Add SSE support.
---
 gcc/config/i386/mmx.md | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 3a7964d52bb..9f0311badca 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -810,11 +810,11 @@
  (mult:V2SI
(sign_extend:V2SI
  (vec_select:V2HI
-   (match_operand:V4HI 1 "nonimmediate_operand")
+   (match_operand:V4HI 1 "register_mmxmem_operand")
(parallel [(const_int 0) (const_int 2)])))
(sign_extend:V2SI
  (vec_select:V2HI
-   (match_operand:V4HI 2 "nonimmediate_operand")
+   (match_operand:V4HI 2 "register_mmxmem_operand")
(parallel [(const_int 0) (const_int 2)]
  (mult:V2SI
(sign_extend:V2SI
@@ -823,20 +823,20 @@
(sign_extend:V2SI
  (vec_select:V2HI (match_dup 2)
(parallel [(const_int 1) (const_int 3)]))]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands);")
 
 (define_insn "*mmx_pmaddwd"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+  [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
 (plus:V2SI
  (mult:V2SI
(sign_extend:V2SI
  (vec_select:V2HI
-   (match_operand:V4HI 1 "nonimmediate_operand" "%0")
+   (match_operand:V4HI 1 "register_mmxmem_operand" "%0,0,Yv")
(parallel [(const_int 0) (const_int 2)])))
(sign_extend:V2SI
  (vec_select:V2HI
-   (match_operand:V4HI 2 "nonimmediate_operand" "ym")
+   (match_operand:V4HI 2 "register_mmxmem_operand" "ym,x,Yv")
(parallel [(const_int 0) (const_int 2)]
  (mult:V2SI
(sign_extend:V2SI
@@ -845,10 +845,15 @@
(sign_extend:V2SI
  (vec_select:V2HI (match_dup 2)
(parallel [(const_int 1) (const_int 3)]))]
-  "TARGET_MMX && ix86_binary_operator_ok (MULT, V4HImode, operands)"
-  "pmaddwd\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxmul")
-   (set_attr "mode" "DI")])
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (MULT, V4HImode, operands)"
+  "@
+   pmaddwd\t{%2, %0|%0, %2}
+   pmaddwd\t{%2, %0|%0, %2}
+   vpmaddwd\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxmul,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_pmulhrwv4hi3"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1


