Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-02-01 Thread Richard Biener
On Mon, Feb 1, 2016 at 1:51 PM, Bernd Schmidt  wrote:
> On 02/01/2016 01:49 PM, Richard Biener wrote:
>
>> What prevents motion of those across a stack adjustment (thus a place
>> they are _not_ valid?)
>
>
> If the address is SP-based, dependencies on the address register. If you're
> thinking prologue stack adjustments, ports where this could be an issue emit
> suitable barrier insns.

Ok.  Just I remember we can't mark non-trapping memory references in general
as they might be non-trapping only conditional, aka if (p != NULL) *p = 1;

I suppose stack accesses are special enough where issues like that can't pop up?
I can think of

foo (int i)
{
 char *p = alloca(10);
 if (i < 10)
   p[i] = 1;
}

or stuff like that.  But I guess your proposal is to handle the simple
non-variable-indexed
cases and leave the rest as always conservatively trapping.

Richard.

>
> Bernd
>


AW: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-02-01 Thread Bernd Edlinger
Hi,

On 02/01/2016 13:18, Bernd Schmidt wrote:
> On 01/29/2016 08:42 PM, Bernd Edlinger wrote:
> > Usually we have "if (x==1234) { read MEM[FP+x]; }", so wo don't know,
> > and then after reload: "if (x==1234) { read MEM[SP+x+sp_fp_offset]; }"
> > but wait, in the if statement we know, that x==1234, so everything
> > turns in one magic constant, and we have a totally new constant offset
> > from the SP register "if (x==1234) { read MEM[SP+1234+sp_fp_offset]; }".
> > Now if rtx_addr_can_trap_p(MEM[SP+1234+sp_fp_offset]) says it cannot
> > trap we think we do not need the if at all => BANG.
> 
> What are you trying to say here? As far as I can tell this isn't a
> problem with my proposed solution (set MEM_NOTRAP_P for valid SFP+x
> offsets).

First, I think there are cases when x is still a pseudo before reload, and after
reload, some constant propagation manages to replace x with a constant value.
Then it will not help to know what rtx_addr_can_trap_p was before reload.

Second, I do also think that it may be helpful to annotate the RTX during
reload.  MEM_NOTRAP_P is already used, so maybe something new
like a spare bit would be necessary.

But maybe a single bit will not be enough, what I would need, would be to
somehow annotate the stack_pointer_rtx, that it has a known constant offset
to the initial stack pointer value.  The problem with that is, that 
stack_pointer_rtx
is a shared pointer, and it is therefore not possible to modify it, and attach
some context to it.


Bernd.

Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-02-01 Thread Bernd Schmidt

On 01/29/2016 08:42 PM, Bernd Edlinger wrote:

On 29.01.2016 16:47 Bernd Schmidt wrote:


Yes. What is the problem with that? If we have (plus sfp const_int) at
any point before reload, we can check whether that offset is inside
frame_size. If it isn't or if the offset isn't known, it could trap.




Usually we have "if (x==1234) { read MEM[FP+x]; }", so wo don't know,
and then after reload: "if (x==1234) { read MEM[SP+x+sp_fp_offset]; }"
but wait, in the if statement we know, that x==1234, so everything
turns in one magic constant, and we have a totally new constant offset
from the SP register "if (x==1234) { read MEM[SP+1234+sp_fp_offset]; }".
Now if rtx_addr_can_trap_p(MEM[SP+1234+sp_fp_offset]) says it cannot
trap we think we do not need the if at all => BANG.


What are you trying to say here? As far as I can tell this isn't a 
problem with my proposed solution (set MEM_NOTRAP_P for valid SFP+x 
offsets).



Bernd


Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-02-01 Thread Richard Biener
On Mon, Feb 1, 2016 at 1:18 PM, Bernd Schmidt  wrote:
> On 01/29/2016 08:42 PM, Bernd Edlinger wrote:
>>
>> On 29.01.2016 16:47 Bernd Schmidt wrote:
>>>
>>>
>>> Yes. What is the problem with that? If we have (plus sfp const_int) at
>>> any point before reload, we can check whether that offset is inside
>>> frame_size. If it isn't or if the offset isn't known, it could trap.
>>>
>>>
>>
>> Usually we have "if (x==1234) { read MEM[FP+x]; }", so wo don't know,
>> and then after reload: "if (x==1234) { read MEM[SP+x+sp_fp_offset]; }"
>> but wait, in the if statement we know, that x==1234, so everything
>> turns in one magic constant, and we have a totally new constant offset
>> from the SP register "if (x==1234) { read MEM[SP+1234+sp_fp_offset]; }".
>> Now if rtx_addr_can_trap_p(MEM[SP+1234+sp_fp_offset]) says it cannot
>> trap we think we do not need the if at all => BANG.
>
>
> What are you trying to say here? As far as I can tell this isn't a problem
> with my proposed solution (set MEM_NOTRAP_P for valid SFP+x offsets).

What prevents motion of those across a stack adjustment (thus a place
they are _not_ valid?)

Richard.

>
> Bernd


Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-02-01 Thread Bernd Schmidt

On 02/01/2016 01:49 PM, Richard Biener wrote:


What prevents motion of those across a stack adjustment (thus a place
they are _not_ valid?)


If the address is SP-based, dependencies on the address register. If 
you're thinking prologue stack adjustments, ports where this could be an 
issue emit suitable barrier insns.



Bernd



Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-01-29 Thread Bernd Edlinger
On 29.01.2016 02:09, Bernd Schmidt wrote:
> On 01/28/2016 12:36 AM, Eric Botcazou wrote:
>>> [cc-ing Eric as RTL maintainer]
>>
>> Sorry for the delay, the message apparently bounced]
>>
>>> IMO the problem is that rtx_addr_can_trap_p_1 duplicates a large
>>> bit of LRA/reload logic:
>>>
>>> [...]
>>>
>>> Under the current interface macros like INITIAL_ELIMINATION_OFFSET
>>> are expected to trigger the calculation of the target's frame layout
>>> (since you need that information to answer the question).
>>> To me it seems wrong that we're attempting to call that sort of
>>> macro in a query routine like rtx_addr_can_trap_p_1.
>>
>> I'm a little uncomfortable stepping in here because, while I
>> essentially share
>> your objections and was opposed to the patch (I was almost sure that
>> it would
>> introduce maintainance issues for no practical benefit), I didn't
>> imagine that
>> such a failure mode would have been possible (computing an
>> approximation of
>> the frame layout after reload being problematic) so I didn't really
>> object to
>> being overruled after seeing Bernd's patch...
>>
>>> IMO we should cache the information we need@the start of each
>>> LRA/reload cycle.  This will be more robust, because there will
>>> be no accidental changes to global state either during or after
>>> LRA/reload.  It will also be more efficient because
>>> rtx_addr_can_trap_p_1 can read the cached variables rather
>>> than calling back into the target.
>>
>> That would be a better design for sure and would eliminate the
>> maintainance
>> issues I was originally afraid of.
>>
>> My recommendation would be to back out Bernd's patch for GCC 6.0
>> (again, it
>> doesn't fix any regression and, more importantly, any bug in real
>> software,
>> but only corner case bugs in dumb computer-generated testcases) but
>> with the
>> commitment to address the underlying issue for GCC 7.0 and backport
>> the fix to
>> GCC 6.x unless really impracticable.  That being said, it's ultimately
>> Jakub
>> and Richard's call.
>
> I'm on the fence; I do think the original problem is an issue we should
> fix, but I'm also not terribly happy with the implementation we have
> right now. Besides the issues already mentioned, doesn't it kind of
> assume these offsets are constant (which they definitely aren't,
> consider arg pushes for example)?
>

Yes that is right.  I saw it as a thing that could possibly happen more
often than we know, because it is difficult to spot a wrong code by
ordinary tests.  Even the reproducer from pr61047 does not crash when
it runs in the gcc testsuite, I had to tweak the example first to use
a larger offset to compensate the large number of environment values
that are passed from the test suite in each test execution.

Yes, rtx_addr_can_trap_p_1 does not know how many bytes are pushed on
the stack and I saw no easy way how to get to the REG_NOTES for
instance, because they are attached to the INSN and rtx_addr_can_trap_p
has only access to a MEM rtx.  It is no exact science, but the error is
on the safe side.

Nevertheless, with the data from the target hook the approximation is
good enough, to not pessimize any single bit of code that is generated
in stage2 vs. stage3, which would not have been the case without some
help from the target.

> I think a better approach might be to just mark accesses at known
> locations in the frame, or arg pushes, as MEM_NOTRAP_P, and consider
> accesses with non-constant or calculated offsets as potentially trapping.
>

Yes I think also that might be a next step.  It would probably be good
to somehow fixate the result of rtx_addr_can_trap_p immediately before
reload when the RTX's are still FRAMEP+x and ARGP+x, and annotate that
somehow to the reloaded RTX's, that way it would finally be superfluous
to call the target hook at all, because the actual addresses should not
change during or after reload.  It will probably have to be a spare bit
on the RTX that is currently unused, because MEM_NOTRAP_P is already
used for something different.

It will however not be simple to find a valid piece of C code where the
current implementation with all of its limitations generates different
code compared to an implementation that has access to the exact offsets.
As I said, I tried already, but could not find an example of a missed
optimization due to my patch.


Thanks
Bernd.

>
> Bernd


Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-01-29 Thread Bernd Schmidt

On 01/29/2016 04:41 PM, Jakub Jelinek wrote:

On Fri, Jan 29, 2016 at 02:09:25AM +0100, Bernd Schmidt wrote:



I think a better approach might be to just mark accesses at known locations
in the frame, or arg pushes, as MEM_NOTRAP_P, and consider accesses with
non-constant or calculated offsets as potentially trapping.


I don't see how that would work generally.  Sure, if there is e.g. a constant
offset array access, it could be checked easily, but if there is variable offset
array access, that is at some point later on changed into a constant offset
access, you'd need to be conservative, unless you can prove it is in range.


Yes. What is the problem with that? If we have (plus sfp const_int) at 
any point before reload, we can check whether that offset is inside 
frame_size. If it isn't or if the offset isn't known, it could trap.



Bernd


Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-01-29 Thread Jakub Jelinek
On Fri, Jan 29, 2016 at 02:09:25AM +0100, Bernd Schmidt wrote:
> I'm on the fence; I do think the original problem is an issue we should fix,
> but I'm also not terribly happy with the implementation we have right now.

The fact that it has been only reported from generated testcases only means
we are lucky nobody encountered it yet in real-world programs.
Plus, we need to be thankful to people working on those generators that keep
reporting GCC bugs, they significantly improve the compiler.

> Besides the issues already mentioned, doesn't it kind of assume these
> offsets are constant (which they definitely aren't, consider arg pushes for
> example)?

Sure, for some registers the offsets aren't constant.  In some cases we e.g.
have REG_ARGS_SIZE notes, but in other cases don't and don't have anything
else that would help us understand the difference between current sp value
and one at the end of the prologue or so.

> I think a better approach might be to just mark accesses at known locations
> in the frame, or arg pushes, as MEM_NOTRAP_P, and consider accesses with
> non-constant or calculated offsets as potentially trapping.

I don't see how that would work generally.  Sure, if there is e.g. a constant
offset array access, it could be checked easily, but if there is variable offset
array access, that is at some point later on changed into a constant offset
access, you'd need to be conservative, unless you can prove it is in range.
Or perhaps we could also use some other flag (or turn it into __builtin_trap
or __builtin_unreachable or whatever) to mark MEMs that are always invalid
if executed.

Jakub


Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-01-29 Thread Bernd Edlinger
On 28.01.2016 23:17, Richard Sandiford wrote:
> Bernd Edlinger  writes:
>> On 26.01.2016 22:18, Richard Sandiford wrote:
>>> [cc-ing Eric as RTL maintainer]
>>>
>>> Matthew Fortune  writes:
 Bernd Edlinger  writes:
> Matthew Fortune  writes:
>> Has the patch been tested beyond just building GCC? I can do a
>> test run for you if you don't have things set up to do one yourself.
>
> I built a cross-gcc with all languages and a cross-glibc, but I have
> not set up an emulation environment, so if you could give it a test
> that would be highly welcome.

 mipsel-linux-gnu test results are the same before and after this patch.

 Please go ahead and commit.
>>>
>>> I still object to this.  And it feels like the patch was posted
>>> as though it was a new one in order to avoid answering the objections
>>> that were raised when it was last posted:
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02218.html
>>>
>>
>> Richard, I am really sorry when you feel now like I did not take your
>> objections seriously.  Let me first explain what happened from my point
>> of view:
>>
>> When I posted this response to your objections here:
>>
>>https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00235.html
>>
>> I was waiting for your response, but nothing happened, so I kind of
>> forgot about the issue.  In the meantime Ubuntu and Debian began to
>> roll out GCC-6 and they got stuck at the same issue, but instead of
>> waiting for pr69012 to be eventually resolved they created a duplicate
>> pr69129, and then last Thursday Nick applied my initial patch without
>> my intervention, probably because of the pressure they put on us.
>
> Ah, I'd missed that, sorry.  It's not obvious from the email thread
> that the patch was actually approved.  I just see a message from Matthias
> saying that it worked for him and then a message from Nick saying that
> he'd applied it.
>
> Ah well.  I guess at some point I have to get over the fact that I'm no
> longer the MIPS maintainer :-)
>
>> That changed significantly how I looked at the issue after that point,
>> as there was no actual regression anymore, the revised patch still made
>> sense, but for another reason.  When you look at the 20+ targets in the
>> gcc tree you'll see that almost all of them have a frame-layout
>> computation function and all except mips have a shortcut
>> "if (reload_completed) return;" in that function.  And OTOH mips has
>> one of the most complicated frame layout functions of all targets.
>>
>> For all of these reasons I posted a new patch which tries to resolve
>> differences between mips and other targets inital_elimination_offset
>> functions.
>
> OK.  But the point still stands that the patch is only useful because
> we're now calling mips_compute_frame_info in cases where we wouldn't
> previously, because of the rtx_addr_can_trap_p changes.
>

Yes of course, but it cannot hurt to have all targets behave
identical in such a central point.  Even if at some point also the
implementation in rtx_addr_can_trap_p will eventually be improved in
one way or the other.

>> I still think that it is not OK for the mips target to do the frame
>> layout already in mips_frame_pointer_required because the frame layout
>> will change several times, until reload is completed, and that function
>> is only called right in the beginning.
>
> I don't think it's any better or worse than doing the frame layout in
> INITIAL_ELIMINATION_OFFSET (which is common practice and pretty much
> required).  They're both part of the initial setup phase --
> targetm.frame_layout_required determines frame_pointer_needed, which is
> a vital input to the code that decides which eliminations to make.
>

Hmm... That can also be a difference between LRA and traditional reload.
Reload calls targetm.frame_pointer_required in update_eliminables, and
can apparently handle 0=>1 and 1=>0 transitions here.

While LRA calls frame_pointer_required only once in
ira_setup_eliminable_regset and can only handle 0=>1 transitions of
frame_pointer_needed in setup_can_eliminate, but not based on
targetm.frame_pointer_required but
targetm.can_eliminate(FRAME_POINTER_REGNUM, STACK_POINTER_REGNUM).

At least it I read the source correctly, I could be wrong of course.

> And this is an example of us doing the kind of caching that I was
> suggesting.  Code that wants to know whether the frame pointer is needed
> for the current function should use frame_pointer_needed.  Only the code
> that sets up frame_pointer_needed should call frame_pointer_required
> directly.
>

But frame_pointer_needed adds at least some value to the raw
targetm.frame_pointer_required, while a cached result of
initial_elimination_offset would not, just conserve the current
interface exactly as it is.

>> And I think that it is not really well-designed to have a frame layout
>> 

Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-01-29 Thread Bernd Edlinger
On 29.01.2016 16:47 Bernd Schmidt wrote:
> On 01/29/2016 04:41 PM, Jakub Jelinek wrote:
>> On Fri, Jan 29, 2016 at 02:09:25AM +0100, Bernd Schmidt wrote:
>
>>> I think a better approach might be to just mark accesses at known
>>> locations
>>> in the frame, or arg pushes, as MEM_NOTRAP_P, and consider accesses with
>>> non-constant or calculated offsets as potentially trapping.
>>
>> I don't see how that would work generally.  Sure, if there is e.g. a
>> constant
>> offset array access, it could be checked easily, but if there is
>> variable offset
>> array access, that is at some point later on changed into a constant
>> offset
>> access, you'd need to be conservative, unless you can prove it is in
>> range.
>
> Yes. What is the problem with that? If we have (plus sfp const_int) at
> any point before reload, we can check whether that offset is inside
> frame_size. If it isn't or if the offset isn't known, it could trap.
>
>

Usually we have "if (x==1234) { read MEM[FP+x]; }", so wo don't know,
and then after reload: "if (x==1234) { read MEM[SP+x+sp_fp_offset]; }"
but wait, in the if statement we know, that x==1234, so everything
turns in one magic constant, and we have a totally new constant offset
from the SP register "if (x==1234) { read MEM[SP+1234+sp_fp_offset]; }".
Now if rtx_addr_can_trap_p(MEM[SP+1234+sp_fp_offset]) says it cannot
trap we think we do not need the if at all => BANG.


Bernd.


Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-01-28 Thread Richard Sandiford
Bernd Edlinger  writes:
> On 26.01.2016 22:18, Richard Sandiford wrote:
>> [cc-ing Eric as RTL maintainer]
>>
>> Matthew Fortune  writes:
>>> Bernd Edlinger  writes:
 Matthew Fortune  writes:
> Has the patch been tested beyond just building GCC? I can do a
> test run for you if you don't have things set up to do one yourself.

 I built a cross-gcc with all languages and a cross-glibc, but I have
 not set up an emulation environment, so if you could give it a test
 that would be highly welcome.
>>>
>>> mipsel-linux-gnu test results are the same before and after this patch.
>>>
>>> Please go ahead and commit.
>>
>> I still object to this.  And it feels like the patch was posted
>> as though it was a new one in order to avoid answering the objections
>> that were raised when it was last posted:
>>
>>https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02218.html
>>
>
> Richard, I am really sorry when you feel now like I did not take your
> objections seriously.  Let me first explain what happened from my point
> of view:
>
> When I posted this response to your objections here:
>
>   https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00235.html
>
> I was waiting for your response, but nothing happened, so I kind of
> forgot about the issue.  In the meantime Ubuntu and Debian began to
> roll out GCC-6 and they got stuck at the same issue, but instead of
> waiting for pr69012 to be eventually resolved they created a duplicate
> pr69129, and then last Thursday Nick applied my initial patch without
> my intervention, probably because of the pressure they put on us.

Ah, I'd missed that, sorry.  It's not obvious from the email thread
that the patch was actually approved.  I just see a message from Matthias
saying that it worked for him and then a message from Nick saying that
he'd applied it.

Ah well.  I guess at some point I have to get over the fact that I'm no
longer the MIPS maintainer :-)

> That changed significantly how I looked at the issue after that point,
> as there was no actual regression anymore, the revised patch still made
> sense, but for another reason.  When you look at the 20+ targets in the
> gcc tree you'll see that almost all of them have a frame-layout
> computation function and all except mips have a shortcut
> "if (reload_completed) return;" in that function.  And OTOH mips has
> one of the most complicated frame layout functions of all targets.
>
> For all of these reasons I posted a new patch which tries to resolve
> differences between mips and other targets inital_elimination_offset
> functions.

OK.  But the point still stands that the patch is only useful because
we're now calling mips_compute_frame_info in cases where we wouldn't
previously, because of the rtx_addr_can_trap_p changes.

> I still think that it is not OK for the mips target to do the frame
> layout already in mips_frame_pointer_required because the frame layout
> will change several times, until reload is completed, and that function
> is only called right in the beginning.

I don't think it's any better or worse than doing the frame layout in
INITIAL_ELIMINATION_OFFSET (which is common practice and pretty much
required).  They're both part of the initial setup phase --
targetm.frame_layout_required determines frame_pointer_needed, which is
a vital input to the code that decides which eliminations to make.

And this is an example of us doing the kind of caching that I was
suggesting.  Code that wants to know whether the frame pointer is needed
for the current function should use frame_pointer_needed.  Only the code
that sets up frame_pointer_needed should call frame_pointer_required
directly. 

> And I think that it is not really well-designed to have a frame layout
> function F(x,y)->z unless you assume F to be a trivial O(1) function
> that has no significant computation overhead.  When you assume F to be
> an expensive O(N) or higher complexity you would design the interface
> like compute_frame_layout_now() and
> get_cached_initial_elimination_offset(x,y)->z.
>
> Also note, that with the current design, of initial_elimination_offset
> the frame layout is already computed several times, because reload has
> to call the function with different parameters, and there is no way how
> to know when the cached value can be used and when not.

Agreed -- the current interface is pretty poor.  But that's even more
reason not to add uses of it after LRA/reload.  Caching would be both
simpler and more efficient.

> I do also think that having to cache the initial elimination offset
> values in the back-end that are already cached in the target does not
> improve anything.  Then I would prefer to have a set of new target
> callbacks that makes it easy to access the cached target values.

I don't agree.  Asking each backend to cache a particular value in
its frame_info structure and then providing a 

Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-01-28 Thread Bernd Schmidt

On 01/28/2016 12:36 AM, Eric Botcazou wrote:

[cc-ing Eric as RTL maintainer]


Sorry for the delay, the message apparently bounced]


IMO the problem is that rtx_addr_can_trap_p_1 duplicates a large
bit of LRA/reload logic:

[...]

Under the current interface macros like INITIAL_ELIMINATION_OFFSET
are expected to trigger the calculation of the target's frame layout
(since you need that information to answer the question).
To me it seems wrong that we're attempting to call that sort of
macro in a query routine like rtx_addr_can_trap_p_1.


I'm a little uncomfortable stepping in here because, while I essentially share
your objections and was opposed to the patch (I was almost sure that it would
introduce maintainance issues for no practical benefit), I didn't imagine that
such a failure mode would have been possible (computing an approximation of
the frame layout after reload being problematic) so I didn't really object to
being overruled after seeing Bernd's patch...


IMO we should cache the information we need@the start of each
LRA/reload cycle.  This will be more robust, because there will
be no accidental changes to global state either during or after
LRA/reload.  It will also be more efficient because
rtx_addr_can_trap_p_1 can read the cached variables rather
than calling back into the target.


That would be a better design for sure and would eliminate the maintainance
issues I was originally afraid of.

My recommendation would be to back out Bernd's patch for GCC 6.0 (again, it
doesn't fix any regression and, more importantly, any bug in real software,
but only corner case bugs in dumb computer-generated testcases) but with the
commitment to address the underlying issue for GCC 7.0 and backport the fix to
GCC 6.x unless really impracticable.  That being said, it's ultimately Jakub
and Richard's call.


I'm on the fence; I do think the original problem is an issue we should 
fix, but I'm also not terribly happy with the implementation we have 
right now. Besides the issues already mentioned, doesn't it kind of 
assume these offsets are constant (which they definitely aren't, 
consider arg pushes for example)?


I think a better approach might be to just mark accesses at known 
locations in the frame, or arg pushes, as MEM_NOTRAP_P, and consider 
accesses with non-constant or calculated offsets as potentially trapping.



Bernd


Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-01-27 Thread Eric Botcazou
> [cc-ing Eric as RTL maintainer]

Sorry for the delay, the message apparently bounced]

> IMO the problem is that rtx_addr_can_trap_p_1 duplicates a large
> bit of LRA/reload logic:
> 
> [...]
> 
> Under the current interface macros like INITIAL_ELIMINATION_OFFSET
> are expected to trigger the calculation of the target's frame layout
> (since you need that information to answer the question).
> To me it seems wrong that we're attempting to call that sort of
> macro in a query routine like rtx_addr_can_trap_p_1.

I'm a little uncomfortable stepping in here because, while I essentially share 
your objections and was opposed to the patch (I was almost sure that it would 
introduce maintainance issues for no practical benefit), I didn't imagine that 
such a failure mode would have been possible (computing an approximation of 
the frame layout after reload being problematic) so I didn't really object to 
being overruled after seeing Bernd's patch...

> IMO we should cache the information we need@the start of each
> LRA/reload cycle.  This will be more robust, because there will
> be no accidental changes to global state either during or after
> LRA/reload.  It will also be more efficient because
> rtx_addr_can_trap_p_1 can read the cached variables rather
> than calling back into the target.

That would be a better design for sure and would eliminate the maintainance 
issues I was originally afraid of.

My recommendation would be to back out Bernd's patch for GCC 6.0 (again, it 
doesn't fix any regression and, more importantly, any bug in real software, 
but only corner case bugs in dumb computer-generated testcases) but with the 
commitment to address the underlying issue for GCC 7.0 and backport the fix to 
GCC 6.x unless really impracticable.  That being said, it's ultimately Jakub 
and Richard's call.

-- 
Eric Botcazou


Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-01-27 Thread Bernd Edlinger
On 26.01.2016 22:18, Richard Sandiford wrote:
> [cc-ing Eric as RTL maintainer]
>
> Matthew Fortune  writes:
>> Bernd Edlinger  writes:
>>> Matthew Fortune  writes:
 Has the patch been tested beyond just building GCC? I can do a
 test run for you if you don't have things set up to do one yourself.
>>>
>>> I built a cross-gcc with all languages and a cross-glibc, but I have
>>> not set up an emulation environment, so if you could give it a test
>>> that would be highly welcome.
>>
>> mipsel-linux-gnu test results are the same before and after this patch.
>>
>> Please go ahead and commit.
>
> I still object to this.  And it feels like the patch was posted
> as though it was a new one in order to avoid answering the objections
> that were raised when it was last posted:
>
>https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02218.html
>

Richard, I am really sorry when you feel now like I did not take your
objections seriously.  Let me first explain what happened from my point
of view:

When I posted this response to your objections here:

  https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00235.html

I was waiting for your response, but nothing happened, so I kind of
forgot about the issue.  In the meantime Ubuntu and Debian began to
roll out GCC-6 and they got stuck at the same issue, but instead of
waiting for pr69012 to be eventually resolved they created a duplicate
pr69129, and then last Thursday Nick applied my initial patch without
my intervention, probably because of the pressure they put on us.

That changed significantly how I looked at the issue after that point,
as there was no actual regression anymore, the revised patch still made
sense, but for another reason.  When you look at the 20+ targets in the
gcc tree you'll see that almost all of them have a frame-layout
computation function and all except mips have a shortcut
"if (reload_completed) return;" in that function.  And OTOH mips has
one of the most complicated frame layout functions of all targets.

For all of these reasons I posted a new patch which tries to resolve
differences between mips and other targets inital_elimination_offset
functions.

I still think that it is not OK for the mips target to do the frame
layout already in mips_frame_pointer_required because the frame layout
will change several times, until reload is completed, and that function
is only called right in the beginning.

And I think that it is not really well-designed to have a frame layout
function F(x,y)->z unless you assume F to be a trivial O(1) function
that has no significant computation overhead.  When you assume F to be
an expensive O(N) or higher complexity you would design the interface
like compute_frame_layout_now() and
get_cached_initial_elimination_offset(x,y)->z.

Also note, that with the current design, of initial_elimination_offset
the frame layout is already computed several times, because reload has
to call the function with different parameters, and there is no way how
to know when the cached value can be used and when not.

I do also think that having to cache the initial elimination offset
values in the back-end that are already cached in the target does not
improve anything.  Then I would prefer to have a set of new target
callbacks that makes it easy to access the cached target values.

If you want to propose a patch for that I will not object, but I doubt
it will be suitable for Stage 4.


Thanks
Bernd.


> IMO the problem is that rtx_addr_can_trap_p_1 duplicates a large
> bit of LRA/reload logic:
>
> 
> /* Compute an approximation for the offset between the register
> FROM and TO for the current function, as it was at the start
> of the routine.  */
>
> static HOST_WIDE_INT
> get_initial_register_offset (int from, int to)
> {
> #ifdef ELIMINABLE_REGS
>static const struct elim_table_t
>{
>  const int from;
>  const int to;
>} table[] = ELIMINABLE_REGS;
>HOST_WIDE_INT offset1, offset2;
>unsigned int i, j;
>
>if (to == from)
>  return 0;
>
>/* It is not safe to call INITIAL_ELIMINATION_OFFSET
>   before the reload pass.  We need to give at least
>   an estimation for the resulting frame size.  */
>if (! reload_completed)
>  {
>offset1 = crtl->outgoing_args_size + get_frame_size ();
> #if !STACK_GROWS_DOWNWARD
>offset1 = - offset1;
> #endif
>if (to == STACK_POINTER_REGNUM)
>   return offset1;
>else if (from == STACK_POINTER_REGNUM)
>   return - offset1;
>else
>   return 0;
>   }
>
>for (i = 0; i < ARRAY_SIZE (table); i++)
>if (table[i].from == from)
>   {
> if (table[i].to == to)
>   {
> INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
> offset1);
> return offset1;
>  

Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-01-26 Thread Richard Sandiford
[cc-ing Eric as RTL maintainer]

Matthew Fortune  writes:
> Bernd Edlinger  writes:
>> Matthew Fortune  writes:
>> > Has the patch been tested beyond just building GCC? I can do a
>> > test run for you if you don't have things set up to do one yourself.
>> 
>> I built a cross-gcc with all languages and a cross-glibc, but I have
>> not set up an emulation environment, so if you could give it a test
>> that would be highly welcome.
>
> mipsel-linux-gnu test results are the same before and after this patch.
>
> Please go ahead and commit.

I still object to this.  And it feels like the patch was posted
as though it was a new one in order to avoid answering the objections
that were raised when it was last posted:

  https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02218.html

IMO the problem is that rtx_addr_can_trap_p_1 duplicates a large
bit of LRA/reload logic:


/* Compute an approximation for the offset between the register
   FROM and TO for the current function, as it was at the start
   of the routine.  */

static HOST_WIDE_INT
get_initial_register_offset (int from, int to)
{
#ifdef ELIMINABLE_REGS
  static const struct elim_table_t
  {
const int from;
const int to;
  } table[] = ELIMINABLE_REGS;
  HOST_WIDE_INT offset1, offset2;
  unsigned int i, j;

  if (to == from)
return 0;

  /* It is not safe to call INITIAL_ELIMINATION_OFFSET
 before the reload pass.  We need to give at least
 an estimation for the resulting frame size.  */
  if (! reload_completed)
{
  offset1 = crtl->outgoing_args_size + get_frame_size ();
#if !STACK_GROWS_DOWNWARD
  offset1 = - offset1;
#endif
  if (to == STACK_POINTER_REGNUM)
return offset1;
  else if (from == STACK_POINTER_REGNUM)
return - offset1;
  else
return 0;
 }

  for (i = 0; i < ARRAY_SIZE (table); i++)
  if (table[i].from == from)
{
  if (table[i].to == to)
{
  INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
  offset1);
  return offset1;
}
  for (j = 0; j < ARRAY_SIZE (table); j++)
{
  if (table[j].to == to
  && table[j].from == table[i].to)
{
  INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
  offset1);
  INITIAL_ELIMINATION_OFFSET (table[j].from, table[j].to,
  offset2);
  return offset1 + offset2;
}
  if (table[j].from == to
  && table[j].to == table[i].to)
{
  INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
  offset1);
  INITIAL_ELIMINATION_OFFSET (table[j].from, table[j].to,
  offset2);
  return offset1 - offset2;
}
}
}
  else if (table[i].to == from)
{
  if (table[i].from == to)
{
  INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
  offset1);
  return - offset1;
}
  for (j = 0; j < ARRAY_SIZE (table); j++)
{
  if (table[j].to == to
  && table[j].from == table[i].from)
{
  INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
  offset1);
  INITIAL_ELIMINATION_OFFSET (table[j].from, table[j].to,
  offset2);
  return - offset1 + offset2;
}
  if (table[j].from == to
  && table[j].to == table[i].from)
{
  INITIAL_ELIMINATION_OFFSET (table[i].from, table[i].to,
  offset1);
  INITIAL_ELIMINATION_OFFSET (table[j].from, table[j].to,
  offset2);
  return - offset1 - offset2;
}
}
}

  /* If the requested register combination was not found,
 try a different more simple combination.  */
  if (from == ARG_POINTER_REGNUM)
return get_initial_register_offset (HARD_FRAME_POINTER_REGNUM, to);
  else if (to == ARG_POINTER_REGNUM)
return get_initial_register_offset (from, HARD_FRAME_POINTER_REGNUM);
  else if (from == HARD_FRAME_POINTER_REGNUM)
return get_initial_register_offset (FRAME_POINTER_REGNUM, to);
  else if (to == HARD_FRAME_POINTER_REGNUM)
return get_initial_register_offset (from, FRAME_POINTER_REGNUM);
  else
return 0;

#else
  

RE: [PATCH] Skip re-computing the mips frame info after reload completed

2016-01-25 Thread Matthew Fortune
Bernd Edlinger  writes:
> Matthew Fortune  writes:
> > Has the patch been tested beyond just building GCC? I can do a
> > test run for you if you don't have things set up to do one yourself.
> 
> I built a cross-gcc with all languages and a cross-glibc, but I have
> not set up an emulation environment, so if you could give it a test
> that would be highly welcome.

mipsel-linux-gnu test results are the same before and after this patch.

Please go ahead and commit.

Thanks,
Matthew


RE: [PATCH] Skip re-computing the mips frame info after reload completed

2016-01-24 Thread Matthew Fortune
Bernd Edlinger  writes:
> this patch skips the redundant re-computing of the frame info after reload 
> completed.
> 
> I looked at all available targets initial_elimination_offset functions:
> 
> All of them currently use either a trivial function of 
> crtl->outgoing_args_size,
> get_frame_size ()
> and df_regs_ever_live_p (x), that can be expected to be evaluated quickly and 
> to be
> constant,
> or they use a cached value, if they are called when reload_completed is true.
> 

Hi Bernd,

For the sake of consistency with other targets then this looks fine. I'd
both prefer the MIPS backend to be as efficient as other targets and not
crash when something like initial elimination offset gets used post
reload. Whether it is right or wrong that we end up with these calls
post-reload is a different matter but I don't understand the detail
enough to comment.

> I believe this patch will both be a performance optimization and
> guarantee that the frame info can not unexpectedly change when it
> should not.

I am a bit dubious about this comment as simply not re-computing the
frame info is not quite the same as it being consistent with
register usage etc. However, we have no check for this right now
post-reload so there is no requirement to improve the situation.

> I have successfully built a cross-compiler with this patch.
> Is it OK for trunk?

Has the patch been tested beyond just building GCC? I can do a
test run for you if you don't have things set up to do one yourself.

Thanks,
Matthew