Re: [PATCH] Fix ssa coalescing with inline asm (PR middle-end/70593)

2016-04-12 Thread Andrew MacLeod

On 04/12/2016 09:06 AM, Jakub Jelinek wrote:

On Tue, Apr 12, 2016 at 09:01:58AM -0400, Andrew MacLeod wrote:

I poked around for a while and found a patch from late 2012 where I was
attempting to implement atomics as a gimple primitive (GIMPLE_ATOMIC).  That
work did indeed add the ability of the CAS primitive to return 2 values to
op_iter_init and friends.  I recall there were maybe a half dozen places in
the compiler that needed to be updated to deal with 2 results for a gimple
statement.

It never went anywhere because I ran into a bunch of other problems that
frustrated me and led me down the frontend/backend separation path :-P

Today, we still just let RTL patterns optimize the conditional result and
expected-value address parameter from a compare_exchange... so we simply
bypass the 2 results issue for a CAS and push it to the target :-).

For the __builtin_*_overflow builtins, we lower them into internal functions
that return complex integer that contains both the values we want (result
and overflow flag) and then just extract the real and imag part of that.
Ugly, but it works without major changes to all optimizers.

Jakub
I suppose it shouldn't be too hard to abstract it a bit so that we 
aren't special casing GIMPLE_ASM, but rather check a flag on the 
statement kind and use a call back to handle the multiple defs that need 
handling as an exception.. And do it in some way so it doesn't slow down 
the normal case.


I expect everything required is buried somewhere in my patch.  Maybe 
someday when I'm looking for a something smallish and different to do 
I'll have a look and see if there is anything worth pursuing.  It does 
seem to come up every year or two.


Andrew


Re: [PATCH] Fix ssa coalescing with inline asm (PR middle-end/70593)

2016-04-12 Thread Jakub Jelinek
On Tue, Apr 12, 2016 at 09:01:58AM -0400, Andrew MacLeod wrote:
> I poked around for a while and found a patch from late 2012 where I was
> attempting to implement atomics as a gimple primitive (GIMPLE_ATOMIC).  That
> work did indeed add the ability of the CAS primitive to return 2 values to
> op_iter_init and friends.  I recall there were maybe a half dozen places in
> the compiler that needed to be updated to deal with 2 results for a gimple
> statement.
> 
> It never went anywhere because I ran into a bunch of other problems that
> frustrated me and led me down the frontend/backend separation path :-P
> 
> Today, we still just let RTL patterns optimize the conditional result and
> expected-value address parameter from a compare_exchange... so we simply
> bypass the 2 results issue for a CAS and push it to the target :-).

For the __builtin_*_overflow builtins, we lower them into internal functions
that return complex integer that contains both the values we want (result
and overflow flag) and then just extract the real and imag part of that.
Ugly, but it works without major changes to all optimizers.

Jakub


Re: [PATCH] Fix ssa coalescing with inline asm (PR middle-end/70593)

2016-04-12 Thread Andrew MacLeod

On 04/12/2016 04:35 AM, Richard Biener wrote:

On Mon, 11 Apr 2016, Jeff Law wrote:


On 04/08/2016 10:10 AM, Jakub Jelinek wrote:

On Fri, Apr 08, 2016 at 06:04:38PM +0200, Richard Biener wrote:

Hmm, don't we simply want to do this for all stmts (OK, only asm have
multiple defs...)?

For all stmts that have multiple defs (which is only GIMPLE_ASM right now).

I thought the atomic support added support for statements with multiple
outputs?

Maybe on a branch.  Historically all gimples had the ability to have
multiple DEFs but this was removed as having excessive memory requirement.
Later DEFs handling was simplified and we no longer keep def "operands"
explicit but the iterators know how to get at all possible defs.  Which
makes it "currently only ASMs can have multiple (non-virtual) defs".

See ssa-iterators.h:op_iter_init and op_iter_next_def.

Richard.
I poked around for a while and found a patch from late 2012 where I was 
attempting to implement atomics as a gimple primitive (GIMPLE_ATOMIC).  
That work did indeed add the ability of the CAS primitive to return 2 
values to op_iter_init and friends.  I recall there were maybe a half 
dozen places in the compiler that needed to be updated to deal with 2 
results for a gimple statement.


It never went anywhere because I ran into a bunch of other problems that 
frustrated me and led me down the frontend/backend separation path :-P


Today, we still just let RTL patterns optimize the conditional result 
and expected-value address parameter from a compare_exchange... so we 
simply bypass the 2 results issue for a CAS and push it to the target :-).


Andrew


Re: [PATCH] Fix ssa coalescing with inline asm (PR middle-end/70593)

2016-04-12 Thread Richard Biener
On Mon, 11 Apr 2016, Jeff Law wrote:

> On 04/08/2016 10:10 AM, Jakub Jelinek wrote:
> > On Fri, Apr 08, 2016 at 06:04:38PM +0200, Richard Biener wrote:
> > > Hmm, don't we simply want to do this for all stmts (OK, only asm have
> > > multiple defs...)?
> > 
> > For all stmts that have multiple defs (which is only GIMPLE_ASM right now).
> I thought the atomic support added support for statements with multiple
> outputs?

Maybe on a branch.  Historically all gimples had the ability to have
multiple DEFs but this was removed as having excessive memory requirement.
Later DEFs handling was simplified and we no longer keep def "operands"
explicit but the iterators know how to get at all possible defs.  Which
makes it "currently only ASMs can have multiple (non-virtual) defs".

See ssa-iterators.h:op_iter_init and op_iter_next_def.

Richard.


Re: [PATCH] Fix ssa coalescing with inline asm (PR middle-end/70593)

2016-04-11 Thread Jeff Law

On 04/08/2016 10:10 AM, Jakub Jelinek wrote:

On Fri, Apr 08, 2016 at 06:04:38PM +0200, Richard Biener wrote:

Hmm, don't we simply want to do this for all stmts (OK, only asm have multiple 
defs...)?


For all stmts that have multiple defs (which is only GIMPLE_ASM right now).
I thought the atomic support added support for statements with multiple 
outputs?


Jeff



Re: [PATCH] Fix ssa coalescing with inline asm (PR middle-end/70593)

2016-04-08 Thread Richard Biener
On April 8, 2016 6:10:05 PM GMT+02:00, Jakub Jelinek  wrote:
>On Fri, Apr 08, 2016 at 06:04:38PM +0200, Richard Biener wrote:
>> Hmm, don't we simply want to do this for all stmts (OK, only asm have
>multiple defs...)?
>
>For all stmts that have multiple defs (which is only GIMPLE_ASM right
>now).
>Though, of course, if you want, unconditionally doing:
> bool first = true;
> FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_DEF)
>   if (first)
> first = false;
>   else
> live_track_process_use (live, var);
>would work too and would be prepared for future hypothetical stmts
>with multiple defs.

Yes, that's what I am suggesting.

  The reason why I'm not calling
>live_track_process_use
>on the first SSA_OP_DEF operand is that it is completely useless,
>because the first live_track_process_def will undo it immediately.
>For the above, I guess I'd just slightly adjust the comment, instead of
>   For GIMPLE_ASM as the only statement which can have
>   more than one SSA_NAME definition, ...
>say
>   For stmts with more than one SSA_NAME definition, ...

Sounds good.

Richard.

>   Jakub




Re: [PATCH] Fix ssa coalescing with inline asm (PR middle-end/70593)

2016-04-08 Thread Jakub Jelinek
On Fri, Apr 08, 2016 at 06:04:38PM +0200, Richard Biener wrote:
> Hmm, don't we simply want to do this for all stmts (OK, only asm have 
> multiple defs...)?

For all stmts that have multiple defs (which is only GIMPLE_ASM right now).
Though, of course, if you want, unconditionally doing:
  bool first = true;
  FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_DEF)
if (first)
  first = false;
else
  live_track_process_use (live, var);
would work too and would be prepared for future hypothetical stmts
with multiple defs.  The reason why I'm not calling live_track_process_use
on the first SSA_OP_DEF operand is that it is completely useless,
because the first live_track_process_def will undo it immediately.
For the above, I guess I'd just slightly adjust the comment, instead of
For GIMPLE_ASM as the only statement which can have
more than one SSA_NAME definition, ...
say
For stmts with more than one SSA_NAME definition, ...

Jakub


Re: [PATCH] Fix ssa coalescing with inline asm (PR middle-end/70593)

2016-04-08 Thread Richard Biener
On April 8, 2016 4:54:22 PM GMT+02:00, Jakub Jelinek  wrote:
>Hi!
>
>xen is miscompiled since Alex' SSA coalescing changes.
>The problem is that we have there inline asm that sets more than one
>SSA_NAME, one of them is dead though (has zero uses) and because of
>the zero uses coalescing doesn't see any conflicts and puts both
>the SSA_NAMEs in the two GIMPLE_ASM outputs into the same partition.
>During expansion we then emit the ASM_OPERANDS followed by copying of
>the
>two outputs into DECL_RTL of their SSA_NAMEs.  But as both are in the
>same
>partition, that is the same pseudo.  The first one is what we want,
>and the second one is for the zero uses SSA_NAME, which then overwrites
>the right one with unrelated value.
>(insn 9 6 7 2 (parallel [
>(set (reg:DI 89 [ c ])
>(asm_operands/v:DI ("xorl  %k1, %k1
>   movl$7, %k0") ("=c") 0 [
>(reg/v:DI 87 [  ])
>(reg/v:DI 87 [  ])
>]
> [
>(asm_input:DI ("0") pr70593.c:9)
>(asm_input:DI ("1") pr70593.c:9)
>]
> [] pr70593.c:9))
>(set (reg:DI 90 [ a ])
>(asm_operands/v:DI ("xorl  %k1, %k1
>   movl$7, %k0") ("=a") 1 [
>(reg/v:DI 87 [  ])
>(reg/v:DI 87 [  ])
>]
> [
>(asm_input:DI ("0") pr70593.c:9)
>(asm_input:DI ("1") pr70593.c:9)
>]
> [] pr70593.c:9))
>(clobber (mem:BLK (scratch) [0  A8]))
>(clobber (reg:CCFP 18 fpsr))
>(clobber (reg:CC 17 flags))
>]) pr70593.c:9 -1
> (nil))
>(insn 7 9 8 2 (set (reg/v:DI 87 [  ])
>(reg:DI 89 [ c ])) pr70593.c:9 -1
> (nil))
>(insn 8 7 13 2 (set (reg/v:DI 87 [  ])
>(reg:DI 90 [ a ])) pr70593.c:9 -1
> (nil))
>The following patch handles this by making sure we record conflicts
>between multiple SSA_NAME outputs of the GIMPLE_ASM, even when some of
>them
>are not live, by emulating what really happens during expansion - that
>they are live in the moves after the asm insn.  The
>live_track_process_def calls later on remove it again from the live
>partitions and thus undo the live_track_process_use effect, so all
>the patch changes is that during the processing of e.g. the first
>SSA_NAME output it sees the second (and perhaps third ...) SSA_NAME
>output
>as live and adds conflict, similarly when processing the second one
>if there are more than two, it again sees the third one and adds
>conflict
>etc.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Hmm, don't we simply want to do this for all stmts (OK, only asm have multiple 
defs...)?

Otherwise OK.thanks,
Richard.

>2016-04-08  Jakub Jelinek  
>
>   PR middle-end/70593
>   * tree-ssa-coalesce.c (build_ssa_conflict_graph): For GIMPLE_ASM
>   with multiple SSA_NAME defs, force the outputs other than first
>   to be live before calling live_track_process_def on each output.
>
>   * gcc.target/i386/pr70593.c: New test.
>
>--- gcc/tree-ssa-coalesce.c.jj 2016-03-30 16:00:17.0 +0200
>+++ gcc/tree-ssa-coalesce.c2016-04-08 12:36:26.409403204 +0200
>@@ -905,6 +905,27 @@ build_ssa_conflict_graph (tree_live_info
>   }
> else if (is_gimple_debug (stmt))
>   continue;
>+else if (gimple_code (stmt) == GIMPLE_ASM
>+ && gimple_asm_noutputs (as_a  (stmt)) > 1)
>+  {
>+/* For GIMPLE_ASM as the only statement which can have
>+   more than one SSA_NAME definition, pretend all the
>+   SSA_NAME outputs but the first one are live at this point,
>+   so that conflicts are added in between all those even
>+   when they are actually not really live after the asm,
>+   because expansion might copy those into pseudos after
>+   the asm and if multiple outputs share the same partition,
>+   it might overwrite those that should be live.  E.g.
>+   asm volatile (".." : "=r" (a) : "=r" (b) : "0" (a), "1" (a));
>+   return a;
>+   See PR70593.  */
>+bool first = true;
>+FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_DEF)
>+  if (first)
>+first = false;
>+  else
>+live_track_process_use (live, var);
>+  }
> 
> FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_DEF)
>   live_track_process_def (live, var, graph);
>--- gcc/testsuite/gcc.target/i386/pr70593.c.jj 2016-04-08
>12:59:27.352278471 +0200
>+++ gcc/testsuite/gcc.target/i386/pr70593.c2016-04-08
>12:48:27.0 +0200
>@@ -0,0 +1,19 @@
>+/* PR middle-end/70593 */
>+/* { dg-do run } 

[PATCH] Fix ssa coalescing with inline asm (PR middle-end/70593)

2016-04-08 Thread Jakub Jelinek
Hi!

xen is miscompiled since Alex' SSA coalescing changes.
The problem is that we have there inline asm that sets more than one
SSA_NAME, one of them is dead though (has zero uses) and because of
the zero uses coalescing doesn't see any conflicts and puts both
the SSA_NAMEs in the two GIMPLE_ASM outputs into the same partition.
During expansion we then emit the ASM_OPERANDS followed by copying of the
two outputs into DECL_RTL of their SSA_NAMEs.  But as both are in the same
partition, that is the same pseudo.  The first one is what we want,
and the second one is for the zero uses SSA_NAME, which then overwrites
the right one with unrelated value.
(insn 9 6 7 2 (parallel [
(set (reg:DI 89 [ c ])
(asm_operands/v:DI ("xorl   %k1, %k1
movl$7, %k0") ("=c") 0 [
(reg/v:DI 87 [  ])
(reg/v:DI 87 [  ])
]
 [
(asm_input:DI ("0") pr70593.c:9)
(asm_input:DI ("1") pr70593.c:9)
]
 [] pr70593.c:9))
(set (reg:DI 90 [ a ])
(asm_operands/v:DI ("xorl   %k1, %k1
movl$7, %k0") ("=a") 1 [
(reg/v:DI 87 [  ])
(reg/v:DI 87 [  ])
]
 [
(asm_input:DI ("0") pr70593.c:9)
(asm_input:DI ("1") pr70593.c:9)
]
 [] pr70593.c:9))
(clobber (mem:BLK (scratch) [0  A8]))
(clobber (reg:CCFP 18 fpsr))
(clobber (reg:CC 17 flags))
]) pr70593.c:9 -1
 (nil))
(insn 7 9 8 2 (set (reg/v:DI 87 [  ])
(reg:DI 89 [ c ])) pr70593.c:9 -1
 (nil))
(insn 8 7 13 2 (set (reg/v:DI 87 [  ])
(reg:DI 90 [ a ])) pr70593.c:9 -1
 (nil))
The following patch handles this by making sure we record conflicts
between multiple SSA_NAME outputs of the GIMPLE_ASM, even when some of them
are not live, by emulating what really happens during expansion - that
they are live in the moves after the asm insn.  The
live_track_process_def calls later on remove it again from the live
partitions and thus undo the live_track_process_use effect, so all
the patch changes is that during the processing of e.g. the first
SSA_NAME output it sees the second (and perhaps third ...) SSA_NAME output
as live and adds conflict, similarly when processing the second one
if there are more than two, it again sees the third one and adds conflict
etc.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-04-08  Jakub Jelinek  

PR middle-end/70593
* tree-ssa-coalesce.c (build_ssa_conflict_graph): For GIMPLE_ASM
with multiple SSA_NAME defs, force the outputs other than first
to be live before calling live_track_process_def on each output.

* gcc.target/i386/pr70593.c: New test.

--- gcc/tree-ssa-coalesce.c.jj  2016-03-30 16:00:17.0 +0200
+++ gcc/tree-ssa-coalesce.c 2016-04-08 12:36:26.409403204 +0200
@@ -905,6 +905,27 @@ build_ssa_conflict_graph (tree_live_info
}
  else if (is_gimple_debug (stmt))
continue;
+ else if (gimple_code (stmt) == GIMPLE_ASM
+  && gimple_asm_noutputs (as_a  (stmt)) > 1)
+   {
+ /* For GIMPLE_ASM as the only statement which can have
+more than one SSA_NAME definition, pretend all the
+SSA_NAME outputs but the first one are live at this point,
+so that conflicts are added in between all those even
+when they are actually not really live after the asm,
+because expansion might copy those into pseudos after
+the asm and if multiple outputs share the same partition,
+it might overwrite those that should be live.  E.g.
+asm volatile (".." : "=r" (a) : "=r" (b) : "0" (a), "1" (a));
+return a;
+See PR70593.  */
+ bool first = true;
+ FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_DEF)
+   if (first)
+ first = false;
+   else
+ live_track_process_use (live, var);
+   }
 
  FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_DEF)
live_track_process_def (live, var, graph);
--- gcc/testsuite/gcc.target/i386/pr70593.c.jj  2016-04-08 12:59:27.352278471 
+0200
+++ gcc/testsuite/gcc.target/i386/pr70593.c 2016-04-08 12:48:27.0 
+0200
@@ -0,0 +1,19 @@
+/* PR middle-end/70593 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+__attribute__((noinline, noclone)) unsigned long
+foo (unsigned x)
+{
+  unsigned long a, c = x;
+  asm volatile ("xorl\t%k1, %k1\n\tmovl\t$7, %k0" : "=c" (c), "=a" (a) : "0" 
(c), "1" (c) : "memory");
+  return c;
+}
+
+int
+main ()