Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Siddhesh Poyarekar
Sorry, fixed a couple of typos that prevented the patch from actually working. Here's the updated version. I'll be building on ADDR_QUERY_STR for identifying and preventing pre/post incrementing addresses for stores for falkor. Siddhesh 2018-xx-xx Jim Wilson

Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Siddhesh Poyarekar
On Thursday 18 January 2018 01:11 AM, Jim Wilson wrote: > This is the only solution I found that worked. I tried a few things and ended up with pretty much the same fix you have except with the check in a slightly different place. That is, I used aarch64_classify_address to gate the change

Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Jim Wilson
On 01/17/2018 05:37 AM, Wilco Dijkstra wrote: In general I think the best way to achieve this would be to use the existing cost models which are there for exactly this purpose. If this doesn't work well enough then we should fix those. I tried using cost models, and this didn't work, because

Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Siddhesh Poyarekar
On Wednesday 17 January 2018 11:13 PM, Wilco Dijkstra wrote: > Are you saying the same issue exists for all stores with writeback? If so then > your patch would need to address that too. Yes, I'll be posting a separate patch for that because the condition set is slightly different for it. It

Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Wilco Dijkstra
Siddhesh Poyarekar wrote: On Wednesday 17 January 2018 08:31 PM, Wilco Dijkstra wrote: > Why is that a bad thing? With the patch as is, the testcase generates: > > .L4: >    ldr q0, [x2, x3] >    add x5, x1, x3 >    add x3, x3, 16 >    cmp x3, x4 >    str

Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Siddhesh Poyarekar
On Wednesday 17 January 2018 08:31 PM, Wilco Dijkstra wrote: > Why is that a bad thing? With the patch as is, the testcase generates: > > .L4: > ldr q0, [x2, x3] > add x5, x1, x3 > add x3, x3, 16 > cmp x3, x4 > str q0, [x5] > bne .L4 >

Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Wilco Dijkstra
Siddhesh Poyarekar wrote:   > The current cost model will disable reg offset for loads as well as > stores, which doesn't work well since loads with reg offset are faster > for falkor. Why is that a bad thing? With the patch as is, the testcase generates: .L4: ldr q0, [x2, x3]

Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Siddhesh Poyarekar
On Wednesday 17 January 2018 07:07 PM, Wilco Dijkstra wrote: > (finished version this time, somehow Outlook loves to send emails early...) > > Hi, > > In general I think the best way to achieve this would be to use the > existing cost models which are there for exactly this purpose. If > this

Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Wilco Dijkstra
(finished version this time, somehow Outlook loves to send emails early...) Hi, In general I think the best way to achieve this would be to use the existing cost models which are there for exactly this purpose. If this doesn't work well enough then we should fix those. As is, this patch disables

Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Wilco Dijkstra
Hi, In general I think the best way to achieve this would be to use the existing cost models which are there for exactly this purpose. If this doesn't work well enough then we should fix those. As is, this patch disables a whole class of instructions for a specific target rather than simply

[PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Siddhesh Poyarekar
From: Siddhesh Poyarekar Hi, Jim Wilson posted a patch for this in September[1] and it appears following discussions that the patch was an acceptable fix for falkor. Kugan followed up[2] with a test case since that was requested during initial review. Jim has moved on

Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.

2017-11-01 Thread Kugan Vivekanandarajah
Hi, On 1 November 2017 at 03:12, Jim Wilson wrote: > On Tue, 2017-10-31 at 14:35 +1100, Kugan Vivekanandarajah wrote: >> Ping ? >> >> I see that Jim has clarified the comments from Andrew. > > Andrew also suggested that we add a testcase to the testsuite. I > didn't do

Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.

2017-10-31 Thread Kugan Vivekanandarajah
Hi Jim, On 1 November 2017 at 03:12, Jim Wilson wrote: > On Tue, 2017-10-31 at 14:35 +1100, Kugan Vivekanandarajah wrote: >> Ping ? >> >> I see that Jim has clarified the comments from Andrew. > > Andrew also suggested that we add a testcase to the testsuite. I > didn't do

Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.

2017-10-31 Thread Jim Wilson
On Tue, 2017-10-31 at 14:35 +1100, Kugan Vivekanandarajah wrote: > Ping ? > > I see that Jim has clarified the comments from Andrew. Andrew also suggested that we add a testcase to the testsuite.  I didn't do that.  I did put a testcase to reproduce in the bug report.  See     

Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.

2017-10-30 Thread Kugan Vivekanandarajah
Ping ? I see that Jim has clarified the comments from Andrew. Thanks, Kugan On 13 October 2017 at 08:48, Jim Wilson wrote: > On Fri, 2017-09-22 at 14:11 -0700, Andrew Pinski wrote: >> On Fri, Sep 22, 2017 at 11:39 AM, Jim Wilson >> wrote: >> > >> >

Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.

2017-10-12 Thread Jim Wilson
On Fri, 2017-09-22 at 14:11 -0700, Andrew Pinski wrote: > On Fri, Sep 22, 2017 at 11:39 AM, Jim Wilson > wrote: > > > > On Fri, Sep 22, 2017 at 10:58 AM, Andrew Pinski > > wrote: > > > > > > Two overall comments: > > > * What about splitting

Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.

2017-09-22 Thread Andrew Pinski
On Fri, Sep 22, 2017 at 11:39 AM, Jim Wilson wrote: > On Fri, Sep 22, 2017 at 10:58 AM, Andrew Pinski wrote: >> Two overall comments: >> * What about splitting register_offset into two different elements, >> one for non 128bit modes and one for 128bit

Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.

2017-09-22 Thread Wilco Dijkstra
Hi Jim, This looks like a general issue with reg+reg addressing modes being generated in cases where it is not correct. I haven't looked at lmbench for a while, but it generated absolutely horrible code like: add x1, x0, #120 ldr v0, [x2, x1] add x1, x0, #128 ldr v1, [x2, x1] If this is still

Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.

2017-09-22 Thread Jim Wilson
On Fri, Sep 22, 2017 at 10:58 AM, Andrew Pinski wrote: > Two overall comments: > * What about splitting register_offset into two different elements, > one for non 128bit modes and one for 128bit (and more; OI, etc.) modes > so you get better address generation right away for

Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.

2017-09-22 Thread Andrew Pinski
On Fri, Sep 22, 2017 at 8:59 AM, Jim Wilson wrote: > On Fri, Sep 22, 2017 at 8:49 AM, Jim Wilson wrote: >> On Falkor, because of an idiosyncracy of how the pipelines are designed, a >> quad-word store using a reg+reg addressing mode is almost twice

Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.

2017-09-22 Thread Jim Wilson
On Fri, Sep 22, 2017 at 8:49 AM, Jim Wilson wrote: > On Falkor, because of an idiosyncracy of how the pipelines are designed, a > quad-word store using a reg+reg addressing mode is almost twice as slow as an > add followed by a quad-word store with a single reg addressing

[PATCH, AArch64] Disable reg offset in quad-word store for Falkor.

2017-09-22 Thread Jim Wilson
On Falkor, because of an idiosyncracy of how the pipelines are designed, a quad-word store using a reg+reg addressing mode is almost twice as slow as an add followed by a quad-word store with a single reg addressing mode. So we get better performance if we disallow addressing modes using register