[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #20 from Richard Biener  ---
Fixed.

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread cvs-commit at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

--- Comment #19 from CVS Commits  ---
The master branch has been updated by Uros Bizjak :

https://gcc.gnu.org/g:7797f5ec58078523a452e5cf239596e13d77d885

commit r11-535-g7797f5ec58078523a452e5cf239596e13d77d885
Author: Uros Bizjak 
Date:   Thu May 21 01:53:09 2020 +0200

i386: Do not use commutative operands with (use) RTX [PR95238]

2020-05-21  Uroš Bizjak  

gcc/ChangeLog:
PR target/95218

* config/i386/mmx.md (*mmx_v2sf): Do not mark
operands 1 and 2 commutative.  Manually swap operands.
(*mmx_nabsv2sf2): Ditto.

Partially revert:

* config/i386/i386.md (*tf2_1):
Mark operands 1 and 2 commutative.
(*nabstf2_1): Ditto.
* config/i386/sse.md (*2): Mark operands 1 and 2
commutative.  Do not swap operands.
(*nabs2): Ditto.

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

--- Comment #18 from Uroš Bizjak  ---
Created attachment 48575
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48575=edit
Patch in testing.

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

--- Comment #17 from Uroš Bizjak  ---
The problem is with commutative operands, these somehow confuse postreload
pass.

I'll commit partial revert that basically puts back:

 (define_insn_and_split "*2"
-  [(set (match_operand:VF 0 "register_operand" "=x,v")
+  [(set (match_operand:VF 0 "register_operand" "=x,x,v,v")
(absneg:VF
- (match_operand:VF 1 "vector_operand" "%0,v")))
-   (use (match_operand:VF 2 "vector_operand" "xBm,vm"))]
+ (match_operand:VF 1 "vector_operand" "0,xBm,v,m")))
+   (use (match_operand:VF 2 "vector_operand" "xBm,0,vm,v"))]

with manual swapping of operands.

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at redhat dot com

--- Comment #16 from Jeffrey A. Law  ---
A "naked" use as an INSN should only occur inside the pattern of an insn (there
are exceptins that exist internally within reload and delay slot scheduling).


It's only supposed to be used to mark that a particular object is live and used
by the insn.  I'll note the following:

...
This means that @code{use} can @emph{only} be used to describe
that the register is live.  You should think twice before adding
@code{use} statements, more often you will want to use @code{unspec}
instead.  The @code{use} RTX is most commonly useful to describe that
a fixed register is implicitly used in an insn.  It is also safe to use
in patterns where the compiler knows for other reasons that the result
of the whole pattern is variable, such as @samp{cpymem@var{m}} or
@samp{call} patterns.

I think one could reasonably think that we should support  (use (mem)) with the
same semantics -- it means that the value in memory is used which is helpful in
dependency tracking.

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

--- Comment #15 from Uroš Bizjak  ---
(In reply to Jakub Jelinek from comment #13)
> So perhaps pre-reload splitter of that into the UNSPEC form?

Vector insns should be able to use pre-reload splitter, but scalar instructions
depend on post-reload splitter, because they are split in a different way,
depending on a register set of allocated register (FP, XMM or even integer).

So, it really needs to be a post-reload splitter.

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

Uroš Bizjak  changed:

   What|Removed |Added

 CC|uros at gcc dot gnu.org|

--- Comment #14 from Uroš Bizjak  ---
Its interesting to note, that only *some* of insns with memory uses get
removed.

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #13 from Jakub Jelinek  ---
So perhaps pre-reload splitter of that into the UNSPEC form?

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

--- Comment #12 from Uroš Bizjak  ---
(In reply to Richard Biener from comment #11)
> Note a 'use' is not something that needs to be preserved, so
> 
> (define_insn_and_split "*2"
>   [(set (match_operand:VF 0 "register_operand" "=x,v")
> (absneg:VF
>   (match_operand:VF 1 "vector_operand" "%0,v")))
>(use (match_operand:VF 2 "vector_operand" "xBm,vm"))]
>   "TARGET_SSE"
>   "#"
>   "&& reload_completed"
>   [(set (match_dup 0)
> (:VF (match_dup 1) (match_dup 2)))]
>   ""
>   [(set_attr "isa" "noavx,avx")])
> 
> doesn't make much sense (before reload).  To me, that is.  Why do
> we go that obfuscated way at all?  I think a clean solution is to
> use an UNSPEC here (well, "clean"...).

The reason for this approach was, that combine still processes the pattern as
abs/nop. Please see how *nabstf2_1 is defined.

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

--- Comment #11 from Richard Biener  ---
Note a 'use' is not something that needs to be preserved, so

(define_insn_and_split "*2"
  [(set (match_operand:VF 0 "register_operand" "=x,v")
(absneg:VF
  (match_operand:VF 1 "vector_operand" "%0,v")))
   (use (match_operand:VF 2 "vector_operand" "xBm,vm"))]
  "TARGET_SSE"
  "#"
  "&& reload_completed"
  [(set (match_dup 0)
(:VF (match_dup 1) (match_dup 2)))]
  ""
  [(set_attr "isa" "noavx,avx")])

doesn't make much sense (before reload).  To me, that is.  Why do
we go that obfuscated way at all?  I think a clean solution is to
use an UNSPEC here (well, "clean"...).

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread rguenther at suse dot de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

--- Comment #10 from rguenther at suse dot de  ---
On Wed, 20 May 2020, ubizjak at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218
> 
> Uroš Bizjak  changed:
> 
>What|Removed |Added
> 
>  CC||law at gcc dot gnu.org,
>||rsandifo at gcc dot gnu.org
> 
> --- Comment #9 from Uroš Bizjak  ---
> (In reply to Uroš Bizjak from comment #7)
> > Ooh, yes :(
> > 
> > '(use X)'
> >  Represents the use of the value of X.  It indicates that the value
> >  in X at this point in the program is needed, even though it may not
> >  be apparent why this is so.  Therefore, the compiler will not
> >  attempt to delete previous instructions whose only effect is to
> >  store a value in X.  X must be a 'reg' expression.
> > 
> > Partial revert is in works.
> 
> Actually, no. The above applies to single (use ...) RTX, not (use ...) as part
> of a parallel. There are plenty of uses of memory_operands in i386.md:
> 
> (define_insn "fix_truncdi_i387"
>   [(set (match_operand:DI 0 "nonimmediate_operand" "=m")
> (fix:DI (match_operand 1 "register_operand" "f")))
>(use (match_operand:HI 2 "memory_operand" "m"))
>(use (match_operand:HI 3 "memory_operand" "m"))
>(clobber (match_scratch:XF 4 "="))]
> 
> Let's ask experts.

The question is what should the (use ...) do?  Allow the splitter
to use its contents?  I guess that's reasonable interpretation
though I thought (use ...) apply only to register liveness computation
and not to memory.  But what do I know about RTL ;)

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

Uroš Bizjak  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org,
   ||rsandifo at gcc dot gnu.org

--- Comment #9 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #7)
> Ooh, yes :(
> 
> '(use X)'
>  Represents the use of the value of X.  It indicates that the value
>  in X at this point in the program is needed, even though it may not
>  be apparent why this is so.  Therefore, the compiler will not
>  attempt to delete previous instructions whose only effect is to
>  store a value in X.  X must be a 'reg' expression.
> 
> Partial revert is in works.

Actually, no. The above applies to single (use ...) RTX, not (use ...) as part
of a parallel. There are plenty of uses of memory_operands in i386.md:

(define_insn "fix_truncdi_i387"
  [(set (match_operand:DI 0 "nonimmediate_operand" "=m")
(fix:DI (match_operand 1 "register_operand" "f")))
   (use (match_operand:HI 2 "memory_operand" "m"))
   (use (match_operand:HI 3 "memory_operand" "m"))
   (clobber (match_scratch:XF 4 "="))]

Let's ask experts.

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

--- Comment #8 from Martin Liška  ---
There's partially reduced test-case:

$ cat fma.i

double res_test0101[] = {
-3,1, 17,51,109,   197,   321,   487,   701,   969,  1297,
1691,  2157,  2701,  3329,  4047,  4861,  5777,  6801,  7939,  9197, 10581,
12097, 13751, 15549, 17497, 19601, 21867, 24301, 26909, 29697, 32671};
double res_test0110[] = {3,  -1, -17,-51,-109,   -197,   -321,
 -487,   -701,   -969,   -1297,  -1691,  -2157,  -2701,
 -3329,  -4047,  -4861,  -5777,  -6801,  -7939,  -9197,
 -10581, -12097, -13751, -15549, -17497, -19601,
-21867,
 -24301, -26909, -29697, -32671};
extern void abort() __attribute__(()) __attribute__(());
static __inline int __get_cpuid(unsigned int __leaf, unsigned int *__eax,
unsigned int *__ebx, unsigned int *__ecx,
unsigned int *__edx) {
  __asm__("cpuid\n\t"
  : "=a"(*__eax), "=b"(*__ebx), "=c"(*__ecx), "=d"(*__edx)
  : "0"(__leaf));
}
static void fma_test();
int main() {
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid(1, , , , )) 0;
  if (ecx & (1 << 12)) fma_test();

  return 0;
}
double m1[] = {1,  2,  3,  4,  5,  6,  7,  8,  9,  10, 11, 12, 13, 14, 15, 16,
   17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32};
double m2[] = {2,  3,  4,  5,  6,  7,  8,  9,  10, 11, 12, 13, 14, 15, 16, 17,
   18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33};
double m3[] = {3,  4,  5,  6,  7,  8,  9,  10, 11, 12, 13, 14, 15, 16, 17, 18,
   19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34};
double m4[32];
int test_fails = 0;
void compare_result(double *res) {
  int i;
  int good = 1;
  i = 0;
  for (i; i < 32; i++)
if (m4[i] != res[i])
  if (good) good = 0;
  if (!good) test_fails = 1;
}
static void fma_test() {
  double __trans_tmp_3;
  double __trans_tmp_2;
  double __trans_tmp_1;
  int i;
  for (i = 0; i < 32; i++) m4[i] = 0;
  i = 0;
  for (i; i < 32; i++) {
double a = m1[i];
double b = m2[i];
double c = m3[i];
__trans_tmp_1 = ((a * b) - c) * a - b;

m4[i] = __trans_tmp_1;
  }
  compare_result(res_test0101);
  i = 0;
  for (i; i < 32; i++) {
{
  double a = m1[i];
  double b = m2[i];
  double c = m3[i];

  __trans_tmp_3 = -((a * b) - c) * a + b;
}

m4[i] = __trans_tmp_3;
  }
  compare_result(res_test0110);
  i = 0;
  for (i; i < 32; i++) {
double a = m1[i];
double b = m2[i];
double c = m3[i];
__trans_tmp_2 = -((a * b) - c) * a - b;

m4[i] = __trans_tmp_2;
  }
  if (test_fails) abort();
}

$ gcc -O3 -Wno-attributes -mfpmath=sse -mfma fma.i  && ./a.out 
Aborted (core dumped)

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

Uroš Bizjak  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |ubizjak at gmail dot com

--- Comment #7 from Uroš Bizjak  ---
Ooh, yes :(

'(use X)'
 Represents the use of the value of X.  It indicates that the value
 in X at this point in the program is needed, even though it may not
 be apparent why this is so.  Therefore, the compiler will not
 attempt to delete previous instructions whose only effect is to
 store a value in X.  X must be a 'reg' expression.

Partial revert is in works.

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

--- Comment #6 from Uroš Bizjak  ---
I think I found the issue.

Before the patch, we had:

(insn 375 373 2574 7 (parallel [
(set (reg:V4DF 21 xmm1 [orig:1681 vect__45.441 ] [1681])
(neg:V4DF (mem/c:V4DF (plus:DI (reg/f:DI 7 sp)
(const_int 160 [0xa0])) [3 %sfp+-1184 S32 A256])))
(use (reg:V4DF 20 xmm0 [3332]))
]) "fma_1.h":20:10 1487 {*negv4df2}
 (nil))

after the patch, reload is free to create:

(insn 375 3216 2578 7 (parallel [
(set (reg:V4DF 21 xmm1 [orig:1681 vect__45.441 ] [1681])
(neg:V4DF (reg:V4DF 20 xmm0 [3332])))
(use (mem/c:V4DF (plus:DI (reg/f:DI 7 sp)
(const_int 160 [0xa0])) [3 %sfp+-1184 S32 A256]))
]) "fma_1.h":20:10 1487 {*negv4df2}
 (nil))

which postreload pass does not like, and simply deletes it:

deleting insn with uid = 375.

Just like that. No substitution whatsoever.

So, is there some limitation with (use) RTX, so we can't have memory here?

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

--- Comment #5 from Martin Liška  ---
Sure, doing that.

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

--- Comment #4 from Uroš Bizjak  ---
(In reply to Martin Liška from comment #3)
> Started with r11-455-g94f687bd9ae37ece.

It is not obvious from the referred patch what is going wrong here.

Unfortunately, I have no FMA capable machine, can someone please isolate one
small test that fails?

[Bug target/95218] [11 Regression] FAIL: gcc.target/i386/fma_run_double_1.c execution test since r11-455-g94f687bd9ae37ece

2020-05-20 Thread marxin at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95218

Martin Liška  changed:

   What|Removed |Added

 CC||marxin at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org,
   ||uros at gcc dot gnu.org
Summary|[11 Regression] FAIL:   |[11 Regression] FAIL:
   |gcc.target/i386/fma_run_dou |gcc.target/i386/fma_run_dou
   |ble_1.c execution test  |ble_1.c execution test
   ||since
   ||r11-455-g94f687bd9ae37ece

--- Comment #3 from Martin Liška  ---
Started with r11-455-g94f687bd9ae37ece.