Re: [PATCH] Fix computation of register limit for -fsched-pressure

2016-10-31 Thread Tamar Christina
Yeah, In the test it's reasonable, just used one extra register.

I'm currently running some benchmarks to make sure those are OK too.

Regards,
Tamar


From: Bin.Cheng <amker.ch...@gmail.com>
Sent: Friday, October 28, 2016 12:38:28 PM
To: Tamar Christina
Cc: Pat Haugen; Maxim Kuvyrkov; GCC Patches; nd
Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure

On Fri, Oct 28, 2016 at 12:27 PM, Tamar Christina
<tamar.christ...@arm.com> wrote:
> Looking at it again,
>
> it seems to be that the testcase should be adjusted.
> There's no actual spilling. It just uses more registers than before due to 
> the scheduling.
Sorry I didn't look into the test, but using more registers sounds
like a regression too?  At least we need to make sure it's reasonable.

Thanks,
bin
>
> I will update the testcase.
>
> Thanks.
>
> 
> From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org> on behalf 
> of Tamar Christina <tamar.christ...@arm.com>
> Sent: Friday, October 28, 2016 10:53:20 AM
> To: Pat Haugen; Maxim Kuvyrkov
> Cc: GCC Patches; nd
> Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure
>
> Forwarding to list as well.
> 
> From: Tamar Christina
> Sent: Friday, October 28, 2016 10:52:17 AM
> To: Pat Haugen; Maxim Kuvyrkov
> Cc: GCC Patches
> Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure
>
> Hi Pat,
>
> The commit seems to be causing some odd stack spills on aarch64.
>
> I've created a new ticket https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78142
>
> Thanks,
> Tamar
>
> 
> From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org> on behalf 
> of Pat Haugen <pthau...@linux.vnet.ibm.com>
> Sent: Tuesday, October 18, 2016 4:07:55 PM
> To: Maxim Kuvyrkov
> Cc: GCC Patches
> Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure
>
> On 10/18/2016 05:31 AM, Maxim Kuvyrkov wrote:
>>> > I see your point and agree that current code isn't optimal.  However, I 
>>> > don't think your patch is accurate either.  Consider 
>>> > https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's 
>>> > assume that FIXED_REGISTERS in class CL is set for a third of the 
>>> > registers, and CALL_USED_REGISTERS is set to "1" for another third of 
>>> > registers.  So we have a third available for zero-cost allocation 
>>> > (CALL_USED_REGISTERS-FIXED_REGISTERS), a third available for spill-cost 
>>> > allocation (ALL_REGISTERS-CALL_USED_REGISTERS) and a third non-available 
>>> > (FIXED_REGISTERS).
>>> >
>>> > For a non-loop-single-basic-block function we should be targeting only 
>>> > the third of register available at zero-cost -- correct?
> Yes.
>
>   This is what is done by the current code, but, apparently, by accident.  It 
> seems that the right register count can be obtained with:
>>> >
>>> >  for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
>>> > -  if (call_used_regs[ira_class_hard_regs[cl][i]])
>>> > -++call_used_regs_num[cl];
>>> > +  if (!call_used_regs[ira_class_hard_regs[cl][i]]
>>> > +   || fixed_regs[ira_class_hard_regs[cl][i]])
>>> > +++call_saved_regs_num[cl];
>>> >
>>> > Does this look correct to you?
>> Thinking some more, it seems like fixed_regs should not be available to the 
>> scheduler no matter what.  Therefore, the number of fixed registers should 
>> be subtracted from ira_class_hard_regs_num[cl] without any scaling 
>> (entry_freq / bb_freq).
>
> Ahh, yes, I forgot about FIXED_REGISTERS being included in 
> CALL_USED_REGISTERS. I agree they should be totally removed from the register 
> limit calculation. I'll rework the patch.
>
> Thanks,
> Pat
>



Re: [PATCH] Fix computation of register limit for -fsched-pressure

2016-10-28 Thread Pat Haugen
On 10/28/2016 06:38 AM, Bin.Cheng wrote:
> On Fri, Oct 28, 2016 at 12:27 PM, Tamar Christina
>  wrote:
>> > Looking at it again,
>> >
>> > it seems to be that the testcase should be adjusted.
>> > There's no actual spilling. It just uses more registers than before due to 
>> > the scheduling.
> Sorry I didn't look into the test, but using more registers sounds
> like a regression too?  At least we need to make sure it's reasonable.

Using more/less registers is not unexpected, it all depends on a target's 
number of call-used vs. call-saved registers. As Bin mentioned, as long as it's 
reasonable.

-Pat



Re: [PATCH] Fix computation of register limit for -fsched-pressure

2016-10-28 Thread Bin.Cheng
On Fri, Oct 28, 2016 at 12:27 PM, Tamar Christina
<tamar.christ...@arm.com> wrote:
> Looking at it again,
>
> it seems to be that the testcase should be adjusted.
> There's no actual spilling. It just uses more registers than before due to 
> the scheduling.
Sorry I didn't look into the test, but using more registers sounds
like a regression too?  At least we need to make sure it's reasonable.

Thanks,
bin
>
> I will update the testcase.
>
> Thanks.
>
> 
> From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org> on behalf 
> of Tamar Christina <tamar.christ...@arm.com>
> Sent: Friday, October 28, 2016 10:53:20 AM
> To: Pat Haugen; Maxim Kuvyrkov
> Cc: GCC Patches; nd
> Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure
>
> Forwarding to list as well.
> 
> From: Tamar Christina
> Sent: Friday, October 28, 2016 10:52:17 AM
> To: Pat Haugen; Maxim Kuvyrkov
> Cc: GCC Patches
> Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure
>
> Hi Pat,
>
> The commit seems to be causing some odd stack spills on aarch64.
>
> I've created a new ticket https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78142
>
> Thanks,
> Tamar
>
> 
> From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org> on behalf 
> of Pat Haugen <pthau...@linux.vnet.ibm.com>
> Sent: Tuesday, October 18, 2016 4:07:55 PM
> To: Maxim Kuvyrkov
> Cc: GCC Patches
> Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure
>
> On 10/18/2016 05:31 AM, Maxim Kuvyrkov wrote:
>>> > I see your point and agree that current code isn't optimal.  However, I 
>>> > don't think your patch is accurate either.  Consider 
>>> > https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's 
>>> > assume that FIXED_REGISTERS in class CL is set for a third of the 
>>> > registers, and CALL_USED_REGISTERS is set to "1" for another third of 
>>> > registers.  So we have a third available for zero-cost allocation 
>>> > (CALL_USED_REGISTERS-FIXED_REGISTERS), a third available for spill-cost 
>>> > allocation (ALL_REGISTERS-CALL_USED_REGISTERS) and a third non-available 
>>> > (FIXED_REGISTERS).
>>> >
>>> > For a non-loop-single-basic-block function we should be targeting only 
>>> > the third of register available at zero-cost -- correct?
> Yes.
>
>   This is what is done by the current code, but, apparently, by accident.  It 
> seems that the right register count can be obtained with:
>>> >
>>> >  for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
>>> > -  if (call_used_regs[ira_class_hard_regs[cl][i]])
>>> > -++call_used_regs_num[cl];
>>> > +  if (!call_used_regs[ira_class_hard_regs[cl][i]]
>>> > +   || fixed_regs[ira_class_hard_regs[cl][i]])
>>> > +++call_saved_regs_num[cl];
>>> >
>>> > Does this look correct to you?
>> Thinking some more, it seems like fixed_regs should not be available to the 
>> scheduler no matter what.  Therefore, the number of fixed registers should 
>> be subtracted from ira_class_hard_regs_num[cl] without any scaling 
>> (entry_freq / bb_freq).
>
> Ahh, yes, I forgot about FIXED_REGISTERS being included in 
> CALL_USED_REGISTERS. I agree they should be totally removed from the register 
> limit calculation. I'll rework the patch.
>
> Thanks,
> Pat
>


Re: [PATCH] Fix computation of register limit for -fsched-pressure

2016-10-28 Thread Tamar Christina
Looking at it again,

it seems to be that the testcase should be adjusted.
There's no actual spilling. It just uses more registers than before due to the 
scheduling.

I will update the testcase.

Thanks.


From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org> on behalf 
of Tamar Christina <tamar.christ...@arm.com>
Sent: Friday, October 28, 2016 10:53:20 AM
To: Pat Haugen; Maxim Kuvyrkov
Cc: GCC Patches; nd
Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure

Forwarding to list as well.

From: Tamar Christina
Sent: Friday, October 28, 2016 10:52:17 AM
To: Pat Haugen; Maxim Kuvyrkov
Cc: GCC Patches
Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure

Hi Pat,

The commit seems to be causing some odd stack spills on aarch64.

I've created a new ticket https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78142

Thanks,
Tamar


From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org> on behalf 
of Pat Haugen <pthau...@linux.vnet.ibm.com>
Sent: Tuesday, October 18, 2016 4:07:55 PM
To: Maxim Kuvyrkov
Cc: GCC Patches
Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure

On 10/18/2016 05:31 AM, Maxim Kuvyrkov wrote:
>> > I see your point and agree that current code isn't optimal.  However, I 
>> > don't think your patch is accurate either.  Consider 
>> > https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's 
>> > assume that FIXED_REGISTERS in class CL is set for a third of the 
>> > registers, and CALL_USED_REGISTERS is set to "1" for another third of 
>> > registers.  So we have a third available for zero-cost allocation 
>> > (CALL_USED_REGISTERS-FIXED_REGISTERS), a third available for spill-cost 
>> > allocation (ALL_REGISTERS-CALL_USED_REGISTERS) and a third non-available 
>> > (FIXED_REGISTERS).
>> >
>> > For a non-loop-single-basic-block function we should be targeting only the 
>> > third of register available at zero-cost -- correct?
Yes.

  This is what is done by the current code, but, apparently, by accident.  It 
seems that the right register count can be obtained with:
>> >
>> >  for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
>> > -  if (call_used_regs[ira_class_hard_regs[cl][i]])
>> > -++call_used_regs_num[cl];
>> > +  if (!call_used_regs[ira_class_hard_regs[cl][i]]
>> > +   || fixed_regs[ira_class_hard_regs[cl][i]])
>> > +++call_saved_regs_num[cl];
>> >
>> > Does this look correct to you?
> Thinking some more, it seems like fixed_regs should not be available to the 
> scheduler no matter what.  Therefore, the number of fixed registers should be 
> subtracted from ira_class_hard_regs_num[cl] without any scaling (entry_freq / 
> bb_freq).

Ahh, yes, I forgot about FIXED_REGISTERS being included in CALL_USED_REGISTERS. 
I agree they should be totally removed from the register limit calculation. 
I'll rework the patch.

Thanks,
Pat



Re: [PATCH] Fix computation of register limit for -fsched-pressure

2016-10-28 Thread Tamar Christina

Forwarding to list as well.

From: Tamar Christina
Sent: Friday, October 28, 2016 10:52:17 AM
To: Pat Haugen; Maxim Kuvyrkov
Cc: GCC Patches
Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure

Hi Pat,

The commit seems to be causing some odd stack spills on aarch64.

I've created a new ticket https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78142

Thanks,
Tamar


From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org> on behalf 
of Pat Haugen <pthau...@linux.vnet.ibm.com>
Sent: Tuesday, October 18, 2016 4:07:55 PM
To: Maxim Kuvyrkov
Cc: GCC Patches
Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure

On 10/18/2016 05:31 AM, Maxim Kuvyrkov wrote:
>> > I see your point and agree that current code isn't optimal.  However, I 
>> > don't think your patch is accurate either.  Consider 
>> > https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's 
>> > assume that FIXED_REGISTERS in class CL is set for a third of the 
>> > registers, and CALL_USED_REGISTERS is set to "1" for another third of 
>> > registers.  So we have a third available for zero-cost allocation 
>> > (CALL_USED_REGISTERS-FIXED_REGISTERS), a third available for spill-cost 
>> > allocation (ALL_REGISTERS-CALL_USED_REGISTERS) and a third non-available 
>> > (FIXED_REGISTERS).
>> >
>> > For a non-loop-single-basic-block function we should be targeting only the 
>> > third of register available at zero-cost -- correct?
Yes.

  This is what is done by the current code, but, apparently, by accident.  It 
seems that the right register count can be obtained with:
>> >
>> >  for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
>> > -  if (call_used_regs[ira_class_hard_regs[cl][i]])
>> > -++call_used_regs_num[cl];
>> > +  if (!call_used_regs[ira_class_hard_regs[cl][i]]
>> > +   || fixed_regs[ira_class_hard_regs[cl][i]])
>> > +++call_saved_regs_num[cl];
>> >
>> > Does this look correct to you?
> Thinking some more, it seems like fixed_regs should not be available to the 
> scheduler no matter what.  Therefore, the number of fixed registers should be 
> subtracted from ira_class_hard_regs_num[cl] without any scaling (entry_freq / 
> bb_freq).

Ahh, yes, I forgot about FIXED_REGISTERS being included in CALL_USED_REGISTERS. 
I agree they should be totally removed from the register limit calculation. 
I'll rework the patch.

Thanks,
Pat



Re: [PATCH] Fix computation of register limit for -fsched-pressure

2016-10-18 Thread Pat Haugen
On 10/18/2016 05:31 AM, Maxim Kuvyrkov wrote:
>> > I see your point and agree that current code isn't optimal.  However, I 
>> > don't think your patch is accurate either.  Consider 
>> > https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's 
>> > assume that FIXED_REGISTERS in class CL is set for a third of the 
>> > registers, and CALL_USED_REGISTERS is set to "1" for another third of 
>> > registers.  So we have a third available for zero-cost allocation 
>> > (CALL_USED_REGISTERS-FIXED_REGISTERS), a third available for spill-cost 
>> > allocation (ALL_REGISTERS-CALL_USED_REGISTERS) and a third non-available 
>> > (FIXED_REGISTERS).
>> > 
>> > For a non-loop-single-basic-block function we should be targeting only the 
>> > third of register available at zero-cost -- correct?
Yes.

  This is what is done by the current code, but, apparently, by accident.  It 
seems that the right register count can be obtained with:
>> > 
>> >  for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
>> > -  if (call_used_regs[ira_class_hard_regs[cl][i]])
>> > -++call_used_regs_num[cl];
>> > +  if (!call_used_regs[ira_class_hard_regs[cl][i]]
>> > +   || fixed_regs[ira_class_hard_regs[cl][i]])
>> > +++call_saved_regs_num[cl];
>> > 
>> > Does this look correct to you?
> Thinking some more, it seems like fixed_regs should not be available to the 
> scheduler no matter what.  Therefore, the number of fixed registers should be 
> subtracted from ira_class_hard_regs_num[cl] without any scaling (entry_freq / 
> bb_freq).

Ahh, yes, I forgot about FIXED_REGISTERS being included in CALL_USED_REGISTERS. 
I agree they should be totally removed from the register limit calculation. 
I'll rework the patch.

Thanks,
Pat



Re: [PATCH] Fix computation of register limit for -fsched-pressure

2016-10-18 Thread Maxim Kuvyrkov
> On Oct 18, 2016, at 1:27 PM, Maxim Kuvyrkov  wrote:
> 
>> 
>> On Oct 17, 2016, at 7:21 PM, Pat Haugen  wrote:
>> 
>> On 10/17/2016 08:17 AM, Maxim Kuvyrkov wrote:
 The patch here, https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01872.html, 
 attempted to scale down the register limit used by -fsched-pressure for 
 the case where the block in question executes as frequently as the entry 
 block to just the call_clobbered (i.e. call_used) regs. But the code is 
 actually scaling toward call_saved registers. The following patch corrects 
 that by computing call_saved regs per class and subtracting out some 
 scaled portion of that.
> 
> Bootstrap/regtest on powerpc64le with no new failures. Ok for trunk?
>>> Hi Pat,
>>> 
>>> I stared at your patch and current code for good 30 minutes, and I still 
>>> don't see what is wrong with the current code.
>>> 
>>> With your patch the number of registers from class CL that scheduler has at 
>>> its disposal for a single-basic-block function will be:
>>> 
>>> sched_call_regs_num[CL] = ira_class_hard_regs_num[CL] - 
>>> call_saved_regs_num[CL];
>>> 
>>> where call_saved_regs_num is number of registers in class CL that need to 
>>> be saved in the prologue (i.e., "free" registers).  I can see some logic in 
>>> setting
>>> 
>>> sched_call_regs_num[CL] = call_saved_regs_num[CL];
>>> 
>>> but not in subtracting number of such registers from the number of total 
>>> available hard registers.
>>> 
>>> Am I missing something?
>>> 
>> 
>> Your original patch gave the following reasoning:
>> 
>> "At the moment the scheduler does not account for spills in the prologues 
>> and restores in the epilogue, which occur from use of call-used registers.  
>> The current state is, essentially, optimized for case when there is a hot 
>> loop inside the function, and the loop executes significantly more often 
>> than the prologue/epilogue.  However, on the opposite end, we have a case 
>> when the function is just a single non-cyclic basic block, which executes 
>> just as often as prologue / epilogue, so spills in the prologue hurt 
>> performance as much as spills in the basic block itself.  In such a case the 
>> scheduler should throttle-down on the number of available registers and try 
>> to not go beyond call-clobbered registers."
>> 
>> But the misunderstanding is that call-used registers do NOT cause any 
>> save/restore. That is to say, call-used == call-clobbered. Your last 
>> sentence explains the goal for a single block function, to not go beyond 
>> call-clobbered (i.e. call-used) registers, which makes perfect sense. My 
>> patch implements that goal by subtracting out call_saved_regs_num (those 
>> that require prolog/epilog save/restore) from the total regs, and using that 
>> as the target # of registers to be used for the block.
> 
> I see your point and agree that current code isn't optimal.  However, I don't 
> think your patch is accurate either.  Consider 
> https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's assume 
> that FIXED_REGISTERS in class CL is set for a third of the registers, and 
> CALL_USED_REGISTERS is set to "1" for another third of registers.  So we have 
> a third available for zero-cost allocation 
> (CALL_USED_REGISTERS-FIXED_REGISTERS), a third available for spill-cost 
> allocation (ALL_REGISTERS-CALL_USED_REGISTERS) and a third non-available 
> (FIXED_REGISTERS).
> 
> For a non-loop-single-basic-block function we should be targeting only the 
> third of register available at zero-cost -- correct?  This is what is done by 
> the current code, but, apparently, by accident.  It seems that the right 
> register count can be obtained with:
> 
> for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
> - if (call_used_regs[ira_class_hard_regs[cl][i]])
> -   ++call_used_regs_num[cl];
> + if (!call_used_regs[ira_class_hard_regs[cl][i]]
> +   || fixed_regs[ira_class_hard_regs[cl][i]])
> +   ++call_saved_regs_num[cl];
> 
> Does this look correct to you?

Thinking some more, it seems like fixed_regs should not be available to the 
scheduler no matter what.  Therefore, the number of fixed registers should be 
subtracted from ira_class_hard_regs_num[cl] without any scaling (entry_freq / 
bb_freq).

--
Maxim Kuvyrkov
www.linaro.org



Re: [PATCH] Fix computation of register limit for -fsched-pressure

2016-10-18 Thread Maxim Kuvyrkov

> On Oct 17, 2016, at 7:21 PM, Pat Haugen  wrote:
> 
> On 10/17/2016 08:17 AM, Maxim Kuvyrkov wrote:
>>> The patch here, https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01872.html, 
>>> attempted to scale down the register limit used by -fsched-pressure for the 
>>> case where the block in question executes as frequently as the entry block 
>>> to just the call_clobbered (i.e. call_used) regs. But the code is actually 
>>> scaling toward call_saved registers. The following patch corrects that by 
>>> computing call_saved regs per class and subtracting out some scaled portion 
>>> of that.
 
 Bootstrap/regtest on powerpc64le with no new failures. Ok for trunk?
>> Hi Pat,
>> 
>> I stared at your patch and current code for good 30 minutes, and I still 
>> don't see what is wrong with the current code.
>> 
>> With your patch the number of registers from class CL that scheduler has at 
>> its disposal for a single-basic-block function will be:
>> 
>> sched_call_regs_num[CL] = ira_class_hard_regs_num[CL] - 
>> call_saved_regs_num[CL];
>> 
>> where call_saved_regs_num is number of registers in class CL that need to be 
>> saved in the prologue (i.e., "free" registers).  I can see some logic in 
>> setting
>> 
>> sched_call_regs_num[CL] = call_saved_regs_num[CL];
>> 
>> but not in subtracting number of such registers from the number of total 
>> available hard registers.
>> 
>> Am I missing something?
>> 
> 
> Your original patch gave the following reasoning:
> 
> "At the moment the scheduler does not account for spills in the prologues and 
> restores in the epilogue, which occur from use of call-used registers.  The 
> current state is, essentially, optimized for case when there is a hot loop 
> inside the function, and the loop executes significantly more often than the 
> prologue/epilogue.  However, on the opposite end, we have a case when the 
> function is just a single non-cyclic basic block, which executes just as 
> often as prologue / epilogue, so spills in the prologue hurt performance as 
> much as spills in the basic block itself.  In such a case the scheduler 
> should throttle-down on the number of available registers and try to not go 
> beyond call-clobbered registers."
> 
> But the misunderstanding is that call-used registers do NOT cause any 
> save/restore. That is to say, call-used == call-clobbered. Your last sentence 
> explains the goal for a single block function, to not go beyond 
> call-clobbered (i.e. call-used) registers, which makes perfect sense. My 
> patch implements that goal by subtracting out call_saved_regs_num (those that 
> require prolog/epilog save/restore) from the total regs, and using that as 
> the target # of registers to be used for the block.

I see your point and agree that current code isn't optimal.  However, I don't 
think your patch is accurate either.  Consider 
https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's assume 
that FIXED_REGISTERS in class CL is set for a third of the registers, and 
CALL_USED_REGISTERS is set to "1" for another third of registers.  So we have a 
third available for zero-cost allocation (CALL_USED_REGISTERS-FIXED_REGISTERS), 
a third available for spill-cost allocation (ALL_REGISTERS-CALL_USED_REGISTERS) 
and a third non-available (FIXED_REGISTERS).

For a non-loop-single-basic-block function we should be targeting only the 
third of register available at zero-cost -- correct?  This is what is done by 
the current code, but, apparently, by accident.  It seems that the right 
register count can be obtained with:

  for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
-   if (call_used_regs[ira_class_hard_regs[cl][i]])
- ++call_used_regs_num[cl];
+   if (!call_used_regs[ira_class_hard_regs[cl][i]]
+   || fixed_regs[ira_class_hard_regs[cl][i]])
+ ++call_saved_regs_num[cl];

Does this look correct to you?

--
Maxim Kuvyrkov
www.linaro.org




Re: [PATCH] Fix computation of register limit for -fsched-pressure

2016-10-17 Thread Pat Haugen
On 10/17/2016 08:17 AM, Maxim Kuvyrkov wrote:
>> The patch here, https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01872.html, 
>> attempted to scale down the register limit used by -fsched-pressure for the 
>> case where the block in question executes as frequently as the entry block 
>> to just the call_clobbered (i.e. call_used) regs. But the code is actually 
>> scaling toward call_saved registers. The following patch corrects that by 
>> computing call_saved regs per class and subtracting out some scaled portion 
>> of that.
>> > 
>> > Bootstrap/regtest on powerpc64le with no new failures. Ok for trunk?
> Hi Pat,
> 
> I stared at your patch and current code for good 30 minutes, and I still 
> don't see what is wrong with the current code.
> 
> With your patch the number of registers from class CL that scheduler has at 
> its disposal for a single-basic-block function will be:
> 
> sched_call_regs_num[CL] = ira_class_hard_regs_num[CL] - 
> call_saved_regs_num[CL];
> 
> where call_saved_regs_num is number of registers in class CL that need to be 
> saved in the prologue (i.e., "free" registers).  I can see some logic in 
> setting
> 
> sched_call_regs_num[CL] = call_saved_regs_num[CL];
> 
> but not in subtracting number of such registers from the number of total 
> available hard registers.
> 
> Am I missing something?
> 

Your original patch gave the following reasoning:

"At the moment the scheduler does not account for spills in the prologues and 
restores in the epilogue, which occur from use of call-used registers.  The 
current state is, essentially, optimized for case when there is a hot loop 
inside the function, and the loop executes significantly more often than the 
prologue/epilogue.  However, on the opposite end, we have a case when the 
function is just a single non-cyclic basic block, which executes just as often 
as prologue / epilogue, so spills in the prologue hurt performance as much as 
spills in the basic block itself.  In such a case the scheduler should 
throttle-down on the number of available registers and try to not go beyond 
call-clobbered registers."

But the misunderstanding is that call-used registers do NOT cause any 
save/restore. That is to say, call-used == call-clobbered. Your last sentence 
explains the goal for a single block function, to not go beyond call-clobbered 
(i.e. call-used) registers, which makes perfect sense. My patch implements that 
goal by subtracting out call_saved_regs_num (those that require prolog/epilog 
save/restore) from the total regs, and using that as the target # of registers 
to be used for the block.


> Also, could you share the testcase that you used to investigate the problem 
> with register-aware scheduling?  I wonder if there is a problem lurking.

I don't have a testcase. I'm currently trying to get -fsched-pressure to be 
beneficial for PowerPC and was familiarizing myself with the code when I 
spotted the issue.

Thanks,
Pat



Re: [PATCH] Fix computation of register limit for -fsched-pressure

2016-10-17 Thread Maxim Kuvyrkov
> On Oct 7, 2016, at 6:08 PM, Pat Haugen  wrote:
> 
> The patch here, https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01872.html, 
> attempted to scale down the register limit used by -fsched-pressure for the 
> case where the block in question executes as frequently as the entry block to 
> just the call_clobbered (i.e. call_used) regs. But the code is actually 
> scaling toward call_saved registers. The following patch corrects that by 
> computing call_saved regs per class and subtracting out some scaled portion 
> of that.
> 
> Bootstrap/regtest on powerpc64le with no new failures. Ok for trunk?

Hi Pat,

I stared at your patch and current code for good 30 minutes, and I still don't 
see what is wrong with the current code.

With your patch the number of registers from class CL that scheduler has at its 
disposal for a single-basic-block function will be:

sched_call_regs_num[CL] = ira_class_hard_regs_num[CL] - call_saved_regs_num[CL];

where call_saved_regs_num is number of registers in class CL that need to be 
saved in the prologue (i.e., "free" registers).  I can see some logic in setting

sched_call_regs_num[CL] = call_saved_regs_num[CL];

but not in subtracting number of such registers from the number of total 
available hard registers.

Am I missing something?

Also, could you share the testcase that you used to investigate the problem 
with register-aware scheduling?  I wonder if there is a problem lurking.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org


> 
> -Pat
> 
> 
> 2016-10-07  Pat Haugen  
> 
>   * haifa-sched.c call_used_regs_num: Rename to...
>   call_saved_regs_num: ...this.
>   (sched_pressure_start_bb): Scale call_saved regs not call_used.
>   (alloc_global_sched_pressure_data): Compute call_saved regs.
> 
> 
>