Re: [PATCH] PR78255: Make postreload aware of NO_FUNCTION_CSE

2017-01-11 Thread Andre Vieira (lists)
On 06/01/17 15:47, Jeff Law wrote:
> On 01/06/2017 03:53 AM, Andre Vieira (lists) wrote:
>> On 09/12/16 16:31, Bernd Schmidt wrote:
>>> On 12/09/2016 05:16 PM, Andre Vieira (lists) wrote:
>>>
>>>> Regardless, 'reload_cse_simplify' would never perform the opposite
>>>> transformation.  It checks whether it can replace anything within the
>>>> first argument INSN, with the second argument TESTREG. As the name
>>>> implies this will always be a register. I double checked, the function
>>>> is only called in 'reload_cse_regs' and 'testreg' is created using
>>>> 'gen_rtx_REG'.
>>>
>>> Ok, let's go ahead with it.
>>>
>>>
>>> Bernd
>>>
>> Hello,
>>
>> Is it OK to backport this (including the testcase fix) to gcc-6-branch?
>>
>> Patches apply cleanly and full bootstrap and regression tests for
>> aarch64- and arm-none-linux-gnueabihf. Regression tested for
>> arm-none-eabi.
> Yes, that should be fine to backport to the active release branches.
> 
> jeff
OK, I have committed the backports to gcc-5 and gcc-6 branches.

Cheers,
Andre


Re: [PATCH] PR78255: Make postreload aware of NO_FUNCTION_CSE

2017-01-06 Thread Jeff Law

On 01/06/2017 03:53 AM, Andre Vieira (lists) wrote:
> On 09/12/16 16:31, Bernd Schmidt wrote:
>> On 12/09/2016 05:16 PM, Andre Vieira (lists) wrote:
>>
>>> Regardless, 'reload_cse_simplify' would never perform the opposite
>>> transformation.  It checks whether it can replace anything within the
>>> first argument INSN, with the second argument TESTREG. As the name
>>> implies this will always be a register. I double checked, the function
>>> is only called in 'reload_cse_regs' and 'testreg' is created using
>>> 'gen_rtx_REG'.
>>
>> Ok, let's go ahead with it.
>>
>>
>> Bernd
>>
> Hello,
>
> Is it OK to backport this (including the testcase fix) to gcc-6-branch?
>
> Patches apply cleanly and full bootstrap and regression tests for
> aarch64- and arm-none-linux-gnueabihf. Regression tested for arm-none-eabi.

Yes, that should be fine to backport to the active release branches.

jeff


Re: [PATCH] PR78255: Make postreload aware of NO_FUNCTION_CSE

2017-01-06 Thread Andre Vieira (lists)
On 09/12/16 16:31, Bernd Schmidt wrote:
> On 12/09/2016 05:16 PM, Andre Vieira (lists) wrote:
> 
>> Regardless, 'reload_cse_simplify' would never perform the opposite
>> transformation.  It checks whether it can replace anything within the
>> first argument INSN, with the second argument TESTREG. As the name
>> implies this will always be a register. I double checked, the function
>> is only called in 'reload_cse_regs' and 'testreg' is created using
>> 'gen_rtx_REG'.
> 
> Ok, let's go ahead with it.
> 
> 
> Bernd
> 
Hello,

Is it OK to backport this (including the testcase fix) to gcc-6-branch?

Patches apply cleanly and full bootstrap and regression tests for
aarch64- and arm-none-linux-gnueabihf. Regression tested for arm-none-eabi.

Cheers,
Andre


Re: [PATCH] PR78255: Make postreload aware of NO_FUNCTION_CSE

2016-12-12 Thread Christophe Lyon
Hi Andre,

On 9 December 2016 at 17:16, Andre Vieira (lists) wrote:
> On 09/12/16 16:02, Ramana Radhakrishnan wrote:
>> On Fri, Dec 9, 2016 at 3:58 PM, Bernd Schmidt  wrote:
>>> On 12/09/2016 04:34 PM, Andre Vieira (lists) wrote:
>>>
>>>> Regardless, the other testcases I add in this patch show a sub-optimal
>>>> transformation done by postreload, turning direct calls into indirect
>>>> calls, for targets which have specifically pointed out that no CSE
>>>> should be done on functions through 'NO_FUNCTION_CSE'.
>>>
>>>
>>> What I'm wondering about is whether the patch wouldn't also prevent the
>>> opposite transformation. Is there a reason not to do that one? Can the
>>> problem be modeled by tweaking costs?
>>
>> I really don't think we should have a solution that relies on costs
>> for correctness .
>>
>> regards
>> Ramana
>>
>
> Regardless, 'reload_cse_simplify' would never perform the opposite
> transformation.  It checks whether it can replace anything within the
> first argument INSN, with the second argument TESTREG. As the name
> implies this will always be a register. I double checked, the function
> is only called in 'reload_cse_regs' and 'testreg' is created using
> 'gen_rtx_REG'.
>

The new test (gcc.target/arm/pr78255-2.c scan-assembler b\\s+bar)
added at r243494 fails on old arm architectures, such as:
* arm-none-linux-gnueabi, forcing -march=armv5t in runtestflags
* arm-none-eabi with default cpu/fpu/mode

Christophe


> Cheers,
> Andre


Re: [PATCH] PR78255: Make postreload aware of NO_FUNCTION_CSE

2016-12-09 Thread Bernd Schmidt

On 12/09/2016 05:16 PM, Andre Vieira (lists) wrote:


> Regardless, 'reload_cse_simplify' would never perform the opposite
> transformation.  It checks whether it can replace anything within the
> first argument INSN, with the second argument TESTREG. As the name
> implies this will always be a register. I double checked, the function
> is only called in 'reload_cse_regs' and 'testreg' is created using
> 'gen_rtx_REG'.


Ok, let's go ahead with it.


Bernd



Re: [PATCH] PR78255: Make postreload aware of NO_FUNCTION_CSE

2016-12-09 Thread Andre Vieira (lists)
On 09/12/16 16:02, Ramana Radhakrishnan wrote:
> On Fri, Dec 9, 2016 at 3:58 PM, Bernd Schmidt  wrote:
>> On 12/09/2016 04:34 PM, Andre Vieira (lists) wrote:
>>
>>> Regardless, the other testcases I add in this patch show a sub-optimal
>>> transformation done by postreload, turning direct calls into indirect
>>> calls, for targets which have specifically pointed out that no CSE
>>> should be done on functions through 'NO_FUNCTION_CSE'.
>>
>>
>> What I'm wondering about is whether the patch wouldn't also prevent the
>> opposite transformation. Is there a reason not to do that one? Can the
>> problem be modeled by tweaking costs?
> 
> I really don't think we should have a solution that relies on costs
> for correctness .
> 
> regards
> Ramana
> 

Regardless, 'reload_cse_simplify' would never perform the opposite
transformation.  It checks whether it can replace anything within the
first argument INSN, with the second argument TESTREG. As the name
implies this will always be a register. I double checked, the function
is only called in 'reload_cse_regs' and 'testreg' is created using
'gen_rtx_REG'.

Cheers,
Andre


Re: [PATCH] PR78255: Make postreload aware of NO_FUNCTION_CSE

2016-12-09 Thread Ramana Radhakrishnan
On Fri, Dec 9, 2016 at 3:58 PM, Bernd Schmidt  wrote:
> On 12/09/2016 04:34 PM, Andre Vieira (lists) wrote:
>
>> Regardless, the other testcases I add in this patch show a sub-optimal
>> transformation done by postreload, turning direct calls into indirect
>> calls, for targets which have specifically pointed out that no CSE
>> should be done on functions through 'NO_FUNCTION_CSE'.
>
>
> What I'm wondering about is whether the patch wouldn't also prevent the
> opposite transformation. Is there a reason not to do that one? Can the
> problem be modeled by tweaking costs?

I really don't think we should have a solution that relies on costs
for correctness .

regards
Ramana


Re: [PATCH] PR78255: Make postreload aware of NO_FUNCTION_CSE

2016-12-09 Thread Jeff Law

On 12/09/2016 08:02 AM, Bernd Schmidt wrote:

> On 12/09/2016 03:03 PM, Andre Vieira (lists) wrote:
>> This patch fixes the issue reported in PR78255 by making postreload
>> aware it should not be performing CSE on functions if NO_FUNCTION_CSE is
>> defined to true.
>>
>> Bootstrap and full regression on arm-none-linux-gnueabihf and
>> aarch64-unknown-linux-gnu.
>>
>> Also checked this fixed the reported issue on arm-none-eabi.
>>
>> Is this OK for trunk?
>
> Hmm, it probably doesn't hurt, but looking at the PR I think the
> originally reported problem suggests you need a different fix: a
> separate register class to be used for indirect sibling calls. I
> remember seeing similar issues on other targets.
I think we actually split the call patterns into direct and indirect 
variants on the PA when we stumbled on this in cse.c.


Jeff


Re: [PATCH] PR78255: Make postreload aware of NO_FUNCTION_CSE

2016-12-09 Thread Bernd Schmidt

On 12/09/2016 04:34 PM, Andre Vieira (lists) wrote:


> Regardless, the other testcases I add in this patch show a sub-optimal
> transformation done by postreload, turning direct calls into indirect
> calls, for targets which have specifically pointed out that no CSE
> should be done on functions through 'NO_FUNCTION_CSE'.


What I'm wondering about is whether the patch wouldn't also prevent the 
opposite transformation. Is there a reason not to do that one? Can the 
problem be modeled by tweaking costs?



> Would you prefer I create a new PR for the problem this is actually
> fixing and refile this PATCH under that PR?


Well, as long as you're working on fixing it I see no reason to clutter 
the bug database for the function cse issue, but do keep the existing PR 
open if there also ought to be register class changes.



Bernd



Re: [PATCH] PR78255: Make postreload aware of NO_FUNCTION_CSE

2016-12-09 Thread Wilco Dijkstra
Bernd wrote:

> Hmm, it probably doesn't hurt, but looking at the PR I think the originally
> reported problem suggests you need a different fix: a separate register
> class to be used for indirect sibling calls. I remember seeing similar
> issues on other targets.

The only safe way to block any changes between direct and indirect calls is
to split them into separate instructions (rather than have 2 alternatives).
That's a good idea to do anyway as calls already do this, so tailcalls should
follow the same pattern.

However this patch fixes the postreload issue for all targets, similarly my
leaf_function patches fix any latent issues in prolog/epilog generation
across all targets.

Wilco


Re: [PATCH] PR78255: Make postreload aware of NO_FUNCTION_CSE

2016-12-09 Thread Andre Vieira (lists)
On 09/12/16 15:02, Bernd Schmidt wrote:
> On 12/09/2016 03:03 PM, Andre Vieira (lists) wrote:
>> This patch fixes the issue reported in PR78255 by making postreload
>> aware it should not be performing CSE on functions if NO_FUNCTION_CSE is
>> defined to true.
>>
>> Bootstrap and full regression on arm-none-linux-gnueabihf and
>> aarch64-unknown-linux-gnu.
>>
>> Also checked this fixed the reported issue on arm-none-eabi.
>>
>> Is this OK for trunk?
> 
> Hmm, it probably doesn't hurt, but looking at the PR I think the
> originally reported problem suggests you need a different fix: a
> separate register class to be used for indirect sibling calls. I
> remember seeing similar issues on other targets.
> 
> 
> Bernd

I agree that even though this "fixes" the PR issue, this change is
fixing more than just that.

As for your suggestion to use a separate register class for indirect
sibling calls: we already do, we use CALLER_SAVE_REGS. However, 'r3' is
also allowed by that scheme, as it should be: if we don't use 'r3' to
either pass an argument or align the stack, then it is perfectly valid
to use it for indirect sibling calls.

The problem is that at the time we decide whether it is safe to use
'r3', we expect the assigned registers not to change, and postreload
changes them when it shouldn't. That is why I am now telling it not to
do that. Now it could be that there are other cases in which the register
allocation would change after reload and before the pro and epilogue pass.
Maybe we shouldn't be making the decision quite so early. This is a bit of
a can of worms though...

Regardless, the other testcases I add in this patch show a sub-optimal
transformation done by postreload, turning direct calls into indirect
calls, for targets which have specifically pointed out that no CSE
should be done on functions through 'NO_FUNCTION_CSE'.  Maybe it would
make more sense to split this up into two PR's, though by fixing
postreload I wouldn't be able to reproduce the failure mentioned in PR78255.

Would you prefer I create a new PR for the problem this is actually
fixing and refile this PATCH under that PR?

Cheers,
Andre



Re: [PATCH] PR78255: Make postreload aware of NO_FUNCTION_CSE

2016-12-09 Thread Bernd Schmidt

On 12/09/2016 03:03 PM, Andre Vieira (lists) wrote:

> This patch fixes the issue reported in PR78255 by making postreload
> aware it should not be performing CSE on functions if NO_FUNCTION_CSE is
> defined to true.
>
> Bootstrap and full regression on arm-none-linux-gnueabihf and
> aarch64-unknown-linux-gnu.
>
> Also checked this fixed the reported issue on arm-none-eabi.
>
> Is this OK for trunk?


Hmm, it probably doesn't hurt, but looking at the PR I think the 
originally reported problem suggests you need a different fix: a 
separate register class to be used for indirect sibling calls. I 
remember seeing similar issues on other targets.



Bernd


[PATCH] PR78255: Make postreload aware of NO_FUNCTION_CSE

2016-12-09 Thread Andre Vieira (lists)
Hi,

This patch fixes the issue reported in PR78255 by making postreload
aware it should not be performing CSE on functions if NO_FUNCTION_CSE is
defined to true.

Bootstrap and full regression on arm-none-linux-gnueabihf and
aarch64-unknown-linux-gnu.

Also checked this fixed the reported issue on arm-none-eabi.

Is this OK for trunk?

Cheers,
Andre

gcc/ChangeLog
2016-12-09  Andre Vieira 

PR rtl-optimization/78255
* gcc/postreload.c (reload_cse_simplify): Do not CSE a function if
NO_FUNCTION_CSE is true.

gcc/testsuite/ChangeLog:
2016-12-09  Andre Vieira 

PR rtl-optimization/78255
* gcc.target/arm/pr78255-1.c: New.
* gcc.target/arm/pr78255-2.c: New.


gcc/testsuite/ChangeLog:
2016-12-09  Andre Vieira 

PR rtl-optimization/78255
* gcc.target/aarch64/pr78255.c: New.
diff --git a/gcc/postreload.c b/gcc/postreload.c
index 539ad33b6c3eb1b968677419a7420badc3a52f01..8325d121c403786fdb7804956724a81d134252a2 100644
--- a/gcc/postreload.c
+++ b/gcc/postreload.c
@@ -90,6 +90,11 @@ reload_cse_simplify (rtx_insn *insn, rtx testreg)
   basic_block insn_bb = BLOCK_FOR_INSN (insn);
   unsigned insn_bb_succs = EDGE_COUNT (insn_bb->succs);
 
+  /* If NO_FUNCTION_CSE has been set by the target, then we should not try
+     to cse function calls.  */
+  if (NO_FUNCTION_CSE && CALL_P (insn))
+    return false;
+
   if (GET_CODE (body) == SET)
     {
       int count = 0;
diff --git a/gcc/testsuite/gcc.target/aarch64/pr78255.c b/gcc/testsuite/gcc.target/aarch64/pr78255.c
new file mode 100644
index ..b078cf3e1c1c7717c9e227721a367f9846f0c7fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr78255.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcmodel=tiny" } */
+
+extern int bar (void *);
+
+int
+foo (void)
+{
+  return bar ((void *)bar);
+}
+
+/* { dg-final { scan-assembler "b\\s+bar" } } */
diff --git a/gcc/testsuite/gcc.target/arm/pr78255-1.c b/gcc/testsuite/gcc.target/arm/pr78255-1.c
new file mode 100644
index ..4901acea51466c0bac92d9cb90e52b00b450d88a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr78255-1.c
@@ -0,0 +1,57 @@
+/* { dg-do run } */
+/* { dg-options "-O2" }  */
+
+#include <string.h>
+
+struct table_s
+{
+void (*fun0)
+( void );
+void (*fun1)
+( void );
+void (*fun2)
+( void );
+void (*fun3)
+( void );
+void (*fun4)
+( void );
+void (*fun5)
+( void );
+void (*fun6)
+( void );
+void (*fun7)
+( void );
+} table;
+
+void callback0(){__asm("mov r0, r0 \n\t");}
+void callback1(){__asm("mov r0, r0 \n\t");}
+void callback2(){__asm("mov r0, r0 \n\t");}
+void callback3(){__asm("mov r0, r0 \n\t");}
+void callback4(){__asm("mov r0, r0 \n\t");}
+
+void test (void) {
+memset(&table, 0, sizeof table);
+
+asm volatile ("" : : : "r3");
+
+table.fun0 = callback0;
+table.fun1 = callback1;
+table.fun2 = callback2;
+table.fun3 = callback3;
+table.fun4 = callback4;
+table.fun0();
+}
+
+void foo (void)
+{
+  __builtin_abort ();
+}
+
+int main (void)
+{
+  unsigned long p = (unsigned long) &foo;
+  asm volatile ("mov r3, %0" : : "r" (p));
+  test ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/arm/pr78255-2.c b/gcc/testsuite/gcc.target/arm/pr78255-2.c
new file mode 100644
index ..9e64ef3939465b088e35a01d4bb23fd50d43006d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr78255-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" }  */
+
+extern int bar (void *);
+
+int
+foo (void)
+{
+  return bar ((void*)bar);
+}
+
+/* { dg-final { scan-assembler "b\\s+bar" } } */
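
For readers who do not have the GCC sources to hand, the shape of the guard in the postreload.c hunk above can be sketched as a small stand-alone C model. Everything prefixed with toy_ or TOY_ is a hypothetical stand-in invented for this illustration; it is not the real rtx_insn, CALL_P or NO_FUNCTION_CSE, and the "changed" return value is a convention of the toy only, not a statement about what reload_cse_simplify returns.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the target macro; assumed true here so the
   guard is active.  Not the real NO_FUNCTION_CSE.  */
#ifndef TOY_NO_FUNCTION_CSE
#define TOY_NO_FUNCTION_CSE true
#endif

/* Toy instruction kinds: an ordinary SET and a call.  */
enum toy_insn_kind { TOY_INSN_SET, TOY_INSN_CALL };

/* Toy instruction: OPERAND_IN_REG becomes true once the operand has been
   replaced by a register that already holds the same value.  */
struct toy_insn
{
  enum toy_insn_kind kind;
  bool operand_in_reg;
};

/* Try to replace the operand of INSN with a register.  Returns true if
   INSN was changed (toy convention).  */
bool
toy_simplify (struct toy_insn *insn)
{
  /* Same shape as the guard added by the patch: leave calls alone when
     the target has asked for no function CSE, so a direct call is never
     rewritten into a call through a register.  */
  if (TOY_NO_FUNCTION_CSE && insn->kind == TOY_INSN_CALL)
    return false;

  /* Nothing to do if the operand already is a register.  */
  if (insn->operand_in_reg)
    return false;

  insn->operand_in_reg = true;
  return true;
}
```

In this model a TOY_INSN_CALL passed to toy_simplify is left untouched, while a TOY_INSN_SET still has its operand replaced. The replacement only ever goes towards a register, which is the same one-directional behaviour described for 'testreg' earlier in the thread.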