Re: [PATCH] Fix computation of register limit for -fsched-pressure
Yeah, In the test it's reasonable, just used one extra register. I'm currently running some benchmarks to make sure those are OK too. Regards, Tamar From: Bin.Cheng <amker.ch...@gmail.com> Sent: Friday, October 28, 2016 12:38:28 PM To: Tamar Christina Cc: Pat Haugen; Maxim Kuvyrkov; GCC Patches; nd Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure On Fri, Oct 28, 2016 at 12:27 PM, Tamar Christina <tamar.christ...@arm.com> wrote: > Looking at it again, > > it seems to be that the testcase should be adjusted. > There's no actual spilling. It just uses more registers than before due to > the scheduling. Sorry I didn't look into the test, but using more registers sounds like a regression too? At least we need to make sure it's reasonable. Thanks, bin > > I will update the testcase. > > Thanks. > > > From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org> on behalf > of Tamar Christina <tamar.christ...@arm.com> > Sent: Friday, October 28, 2016 10:53:20 AM > To: Pat Haugen; Maxim Kuvyrkov > Cc: GCC Patches; nd > Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure > > Forwarding to list as well. > > From: Tamar Christina > Sent: Friday, October 28, 2016 10:52:17 AM > To: Pat Haugen; Maxim Kuvyrkov > Cc: GCC Patches > Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure > > Hi Pat, > > The commit seems to be causing some odd stack spills on aarch64. > > I've created a new ticket https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78142 > > Thanks, > Tamar > > > From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org> on behalf > of Pat Haugen <pthau...@linux.vnet.ibm.com> > Sent: Tuesday, October 18, 2016 4:07:55 PM > To: Maxim Kuvyrkov > Cc: GCC Patches > Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure > > On 10/18/2016 05:31 AM, Maxim Kuvyrkov wrote: >>> > I see your point and agree that current code isn't optimal. However, I >>> > don't think your patch is accurate either. Consider >>> > https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's >>> > assume that FIXED_REGISTERS in class CL is set for a third of the >>> > registers, and CALL_USED_REGISTERS is set to "1" for another third of >>> > registers. So we have a third available for zero-cost allocation >>> > (CALL_USED_REGISTERS-FIXED_REGISTERS), a third available for spill-cost >>> > allocation (ALL_REGISTERS-CALL_USED_REGISTERS) and a third non-available >>> > (FIXED_REGISTERS). >>> > >>> > For a non-loop-single-basic-block function we should be targeting only >>> > the third of register available at zero-cost -- correct? > Yes. > > This is what is done by the current code, but, apparently, by accident. It > seems that the right register count can be obtained with: >>> > >>> > for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i) >>> > - if (call_used_regs[ira_class_hard_regs[cl][i]]) >>> > -++call_used_regs_num[cl]; >>> > + if (!call_used_regs[ira_class_hard_regs[cl][i]] >>> > + || fixed_regs[ira_class_hard_regs[cl][i]]) >>> > +++call_saved_regs_num[cl]; >>> > >>> > Does this look correct to you? >> Thinking some more, it seems like fixed_regs should not be available to the >> scheduler no matter what. Therefore, the number of fixed registers should >> be subtracted from ira_class_hard_regs_num[cl] without any scaling >> (entry_freq / bb_freq). > > Ahh, yes, I forgot about FIXED_REGISTERS being included in > CALL_USED_REGISTERS. I agree they should be totally removed from the register > limit calculation. I'll rework the patch. > > Thanks, > Pat >
Re: [PATCH] Fix computation of register limit for -fsched-pressure
On 10/28/2016 06:38 AM, Bin.Cheng wrote: > On Fri, Oct 28, 2016 at 12:27 PM, Tamar Christina >wrote: >> > Looking at it again, >> > >> > it seems to be that the testcase should be adjusted. >> > There's no actual spilling. It just uses more registers than before due to >> > the scheduling. > Sorry I didn't look into the test, but using more registers sounds > like a regression too? At least we need to make sure it's reasonable. Using more/less registers is not unexpected, it all depends on a target's number of call-used vs. call-saved registers. As Bin mentioned, as long as it's reasonable. -Pat
Re: [PATCH] Fix computation of register limit for -fsched-pressure
On Fri, Oct 28, 2016 at 12:27 PM, Tamar Christina <tamar.christ...@arm.com> wrote: > Looking at it again, > > it seems to be that the testcase should be adjusted. > There's no actual spilling. It just uses more registers than before due to > the scheduling. Sorry I didn't look into the test, but using more registers sounds like a regression too? At least we need to make sure it's reasonable. Thanks, bin > > I will update the testcase. > > Thanks. > > > From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org> on behalf > of Tamar Christina <tamar.christ...@arm.com> > Sent: Friday, October 28, 2016 10:53:20 AM > To: Pat Haugen; Maxim Kuvyrkov > Cc: GCC Patches; nd > Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure > > Forwarding to list as well. > > From: Tamar Christina > Sent: Friday, October 28, 2016 10:52:17 AM > To: Pat Haugen; Maxim Kuvyrkov > Cc: GCC Patches > Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure > > Hi Pat, > > The commit seems to be causing some odd stack spills on aarch64. > > I've created a new ticket https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78142 > > Thanks, > Tamar > > > From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org> on behalf > of Pat Haugen <pthau...@linux.vnet.ibm.com> > Sent: Tuesday, October 18, 2016 4:07:55 PM > To: Maxim Kuvyrkov > Cc: GCC Patches > Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure > > On 10/18/2016 05:31 AM, Maxim Kuvyrkov wrote: >>> > I see your point and agree that current code isn't optimal. However, I >>> > don't think your patch is accurate either. Consider >>> > https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's >>> > assume that FIXED_REGISTERS in class CL is set for a third of the >>> > registers, and CALL_USED_REGISTERS is set to "1" for another third of >>> > registers. So we have a third available for zero-cost allocation >>> > (CALL_USED_REGISTERS-FIXED_REGISTERS), a third available for spill-cost >>> > allocation (ALL_REGISTERS-CALL_USED_REGISTERS) and a third non-available >>> > (FIXED_REGISTERS). >>> > >>> > For a non-loop-single-basic-block function we should be targeting only >>> > the third of register available at zero-cost -- correct? > Yes. > > This is what is done by the current code, but, apparently, by accident. It > seems that the right register count can be obtained with: >>> > >>> > for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i) >>> > - if (call_used_regs[ira_class_hard_regs[cl][i]]) >>> > -++call_used_regs_num[cl]; >>> > + if (!call_used_regs[ira_class_hard_regs[cl][i]] >>> > + || fixed_regs[ira_class_hard_regs[cl][i]]) >>> > +++call_saved_regs_num[cl]; >>> > >>> > Does this look correct to you? >> Thinking some more, it seems like fixed_regs should not be available to the >> scheduler no matter what. Therefore, the number of fixed registers should >> be subtracted from ira_class_hard_regs_num[cl] without any scaling >> (entry_freq / bb_freq). > > Ahh, yes, I forgot about FIXED_REGISTERS being included in > CALL_USED_REGISTERS. I agree they should be totally removed from the register > limit calculation. I'll rework the patch. > > Thanks, > Pat >
Re: [PATCH] Fix computation of register limit for -fsched-pressure
Looking at it again, it seems to be that the testcase should be adjusted. There's no actual spilling. It just uses more registers than before due to the scheduling. I will update the testcase. Thanks. From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org> on behalf of Tamar Christina <tamar.christ...@arm.com> Sent: Friday, October 28, 2016 10:53:20 AM To: Pat Haugen; Maxim Kuvyrkov Cc: GCC Patches; nd Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure Forwarding to list as well. From: Tamar Christina Sent: Friday, October 28, 2016 10:52:17 AM To: Pat Haugen; Maxim Kuvyrkov Cc: GCC Patches Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure Hi Pat, The commit seems to be causing some odd stack spills on aarch64. I've created a new ticket https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78142 Thanks, Tamar From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org> on behalf of Pat Haugen <pthau...@linux.vnet.ibm.com> Sent: Tuesday, October 18, 2016 4:07:55 PM To: Maxim Kuvyrkov Cc: GCC Patches Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure On 10/18/2016 05:31 AM, Maxim Kuvyrkov wrote: >> > I see your point and agree that current code isn't optimal. However, I >> > don't think your patch is accurate either. Consider >> > https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's >> > assume that FIXED_REGISTERS in class CL is set for a third of the >> > registers, and CALL_USED_REGISTERS is set to "1" for another third of >> > registers. So we have a third available for zero-cost allocation >> > (CALL_USED_REGISTERS-FIXED_REGISTERS), a third available for spill-cost >> > allocation (ALL_REGISTERS-CALL_USED_REGISTERS) and a third non-available >> > (FIXED_REGISTERS). >> > >> > For a non-loop-single-basic-block function we should be targeting only the >> > third of register available at zero-cost -- correct? Yes. This is what is done by the current code, but, apparently, by accident. It seems that the right register count can be obtained with: >> > >> > for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i) >> > - if (call_used_regs[ira_class_hard_regs[cl][i]]) >> > -++call_used_regs_num[cl]; >> > + if (!call_used_regs[ira_class_hard_regs[cl][i]] >> > + || fixed_regs[ira_class_hard_regs[cl][i]]) >> > +++call_saved_regs_num[cl]; >> > >> > Does this look correct to you? > Thinking some more, it seems like fixed_regs should not be available to the > scheduler no matter what. Therefore, the number of fixed registers should be > subtracted from ira_class_hard_regs_num[cl] without any scaling (entry_freq / > bb_freq). Ahh, yes, I forgot about FIXED_REGISTERS being included in CALL_USED_REGISTERS. I agree they should be totally removed from the register limit calculation. I'll rework the patch. Thanks, Pat
Re: [PATCH] Fix computation of register limit for -fsched-pressure
Forwarding to list as well. From: Tamar Christina Sent: Friday, October 28, 2016 10:52:17 AM To: Pat Haugen; Maxim Kuvyrkov Cc: GCC Patches Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure Hi Pat, The commit seems to be causing some odd stack spills on aarch64. I've created a new ticket https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78142 Thanks, Tamar From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org> on behalf of Pat Haugen <pthau...@linux.vnet.ibm.com> Sent: Tuesday, October 18, 2016 4:07:55 PM To: Maxim Kuvyrkov Cc: GCC Patches Subject: Re: [PATCH] Fix computation of register limit for -fsched-pressure On 10/18/2016 05:31 AM, Maxim Kuvyrkov wrote: >> > I see your point and agree that current code isn't optimal. However, I >> > don't think your patch is accurate either. Consider >> > https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's >> > assume that FIXED_REGISTERS in class CL is set for a third of the >> > registers, and CALL_USED_REGISTERS is set to "1" for another third of >> > registers. So we have a third available for zero-cost allocation >> > (CALL_USED_REGISTERS-FIXED_REGISTERS), a third available for spill-cost >> > allocation (ALL_REGISTERS-CALL_USED_REGISTERS) and a third non-available >> > (FIXED_REGISTERS). >> > >> > For a non-loop-single-basic-block function we should be targeting only the >> > third of register available at zero-cost -- correct? Yes. This is what is done by the current code, but, apparently, by accident. It seems that the right register count can be obtained with: >> > >> > for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i) >> > - if (call_used_regs[ira_class_hard_regs[cl][i]]) >> > -++call_used_regs_num[cl]; >> > + if (!call_used_regs[ira_class_hard_regs[cl][i]] >> > + || fixed_regs[ira_class_hard_regs[cl][i]]) >> > +++call_saved_regs_num[cl]; >> > >> > Does this look correct to you? > Thinking some more, it seems like fixed_regs should not be available to the > scheduler no matter what. Therefore, the number of fixed registers should be > subtracted from ira_class_hard_regs_num[cl] without any scaling (entry_freq / > bb_freq). Ahh, yes, I forgot about FIXED_REGISTERS being included in CALL_USED_REGISTERS. I agree they should be totally removed from the register limit calculation. I'll rework the patch. Thanks, Pat
Re: [PATCH] Fix computation of register limit for -fsched-pressure
On 10/18/2016 05:31 AM, Maxim Kuvyrkov wrote: >> > I see your point and agree that current code isn't optimal. However, I >> > don't think your patch is accurate either. Consider >> > https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's >> > assume that FIXED_REGISTERS in class CL is set for a third of the >> > registers, and CALL_USED_REGISTERS is set to "1" for another third of >> > registers. So we have a third available for zero-cost allocation >> > (CALL_USED_REGISTERS-FIXED_REGISTERS), a third available for spill-cost >> > allocation (ALL_REGISTERS-CALL_USED_REGISTERS) and a third non-available >> > (FIXED_REGISTERS). >> > >> > For a non-loop-single-basic-block function we should be targeting only the >> > third of register available at zero-cost -- correct? Yes. This is what is done by the current code, but, apparently, by accident. It seems that the right register count can be obtained with: >> > >> > for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i) >> > - if (call_used_regs[ira_class_hard_regs[cl][i]]) >> > -++call_used_regs_num[cl]; >> > + if (!call_used_regs[ira_class_hard_regs[cl][i]] >> > + || fixed_regs[ira_class_hard_regs[cl][i]]) >> > +++call_saved_regs_num[cl]; >> > >> > Does this look correct to you? > Thinking some more, it seems like fixed_regs should not be available to the > scheduler no matter what. Therefore, the number of fixed registers should be > subtracted from ira_class_hard_regs_num[cl] without any scaling (entry_freq / > bb_freq). Ahh, yes, I forgot about FIXED_REGISTERS being included in CALL_USED_REGISTERS. I agree they should be totally removed from the register limit calculation. I'll rework the patch. Thanks, Pat
Re: [PATCH] Fix computation of register limit for -fsched-pressure
> On Oct 18, 2016, at 1:27 PM, Maxim Kuvyrkovwrote: > >> >> On Oct 17, 2016, at 7:21 PM, Pat Haugen wrote: >> >> On 10/17/2016 08:17 AM, Maxim Kuvyrkov wrote: The patch here, https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01872.html, attempted to scale down the register limit used by -fsched-pressure for the case where the block in question executes as frequently as the entry block to just the call_clobbered (i.e. call_used) regs. But the code is actually scaling toward call_saved registers. The following patch corrects that by computing call_saved regs per class and subtracting out some scaled portion of that. > > Bootstrap/regtest on powerpc64le with no new failures. Ok for trunk? >>> Hi Pat, >>> >>> I stared at your patch and current code for good 30 minutes, and I still >>> don't see what is wrong with the current code. >>> >>> With your patch the number of registers from class CL that scheduler has at >>> its disposal for a single-basic-block function will be: >>> >>> sched_call_regs_num[CL] = ira_class_hard_regs_num[CL] - >>> call_saved_regs_num[CL]; >>> >>> where call_saved_regs_num is number of registers in class CL that need to >>> be saved in the prologue (i.e., "free" registers). I can see some logic in >>> setting >>> >>> sched_call_regs_num[CL] = call_saved_regs_num[CL]; >>> >>> but not in subtracting number of such registers from the number of total >>> available hard registers. >>> >>> Am I missing something? >>> >> >> Your original patch gave the following reasoning: >> >> "At the moment the scheduler does not account for spills in the prologues >> and restores in the epilogue, which occur from use of call-used registers. >> The current state is, essentially, optimized for case when there is a hot >> loop inside the function, and the loop executes significantly more often >> than the prologue/epilogue. However, on the opposite end, we have a case >> when the function is just a single non-cyclic basic block, which executes >> just as often as prologue / epilogue, so spills in the prologue hurt >> performance as much as spills in the basic block itself. In such a case the >> scheduler should throttle-down on the number of available registers and try >> to not go beyond call-clobbered registers." >> >> But the misunderstanding is that call-used registers do NOT cause any >> save/restore. That is to say, call-used == call-clobbered. Your last >> sentence explains the goal for a single block function, to not go beyond >> call-clobbered (i.e. call-used) registers, which makes perfect sense. My >> patch implements that goal by subtracting out call_saved_regs_num (those >> that require prolog/epilog save/restore) from the total regs, and using that >> as the target # of registers to be used for the block. > > I see your point and agree that current code isn't optimal. However, I don't > think your patch is accurate either. Consider > https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's assume > that FIXED_REGISTERS in class CL is set for a third of the registers, and > CALL_USED_REGISTERS is set to "1" for another third of registers. So we have > a third available for zero-cost allocation > (CALL_USED_REGISTERS-FIXED_REGISTERS), a third available for spill-cost > allocation (ALL_REGISTERS-CALL_USED_REGISTERS) and a third non-available > (FIXED_REGISTERS). > > For a non-loop-single-basic-block function we should be targeting only the > third of register available at zero-cost -- correct? This is what is done by > the current code, but, apparently, by accident. It seems that the right > register count can be obtained with: > > for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i) > - if (call_used_regs[ira_class_hard_regs[cl][i]]) > - ++call_used_regs_num[cl]; > + if (!call_used_regs[ira_class_hard_regs[cl][i]] > + || fixed_regs[ira_class_hard_regs[cl][i]]) > + ++call_saved_regs_num[cl]; > > Does this look correct to you? Thinking some more, it seems like fixed_regs should not be available to the scheduler no matter what. Therefore, the number of fixed registers should be subtracted from ira_class_hard_regs_num[cl] without any scaling (entry_freq / bb_freq). -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH] Fix computation of register limit for -fsched-pressure
> On Oct 17, 2016, at 7:21 PM, Pat Haugenwrote: > > On 10/17/2016 08:17 AM, Maxim Kuvyrkov wrote: >>> The patch here, https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01872.html, >>> attempted to scale down the register limit used by -fsched-pressure for the >>> case where the block in question executes as frequently as the entry block >>> to just the call_clobbered (i.e. call_used) regs. But the code is actually >>> scaling toward call_saved registers. The following patch corrects that by >>> computing call_saved regs per class and subtracting out some scaled portion >>> of that. Bootstrap/regtest on powerpc64le with no new failures. Ok for trunk? >> Hi Pat, >> >> I stared at your patch and current code for good 30 minutes, and I still >> don't see what is wrong with the current code. >> >> With your patch the number of registers from class CL that scheduler has at >> its disposal for a single-basic-block function will be: >> >> sched_call_regs_num[CL] = ira_class_hard_regs_num[CL] - >> call_saved_regs_num[CL]; >> >> where call_saved_regs_num is number of registers in class CL that need to be >> saved in the prologue (i.e., "free" registers). I can see some logic in >> setting >> >> sched_call_regs_num[CL] = call_saved_regs_num[CL]; >> >> but not in subtracting number of such registers from the number of total >> available hard registers. >> >> Am I missing something? >> > > Your original patch gave the following reasoning: > > "At the moment the scheduler does not account for spills in the prologues and > restores in the epilogue, which occur from use of call-used registers. The > current state is, essentially, optimized for case when there is a hot loop > inside the function, and the loop executes significantly more often than the > prologue/epilogue. However, on the opposite end, we have a case when the > function is just a single non-cyclic basic block, which executes just as > often as prologue / epilogue, so spills in the prologue hurt performance as > much as spills in the basic block itself. In such a case the scheduler > should throttle-down on the number of available registers and try to not go > beyond call-clobbered registers." > > But the misunderstanding is that call-used registers do NOT cause any > save/restore. That is to say, call-used == call-clobbered. Your last sentence > explains the goal for a single block function, to not go beyond > call-clobbered (i.e. call-used) registers, which makes perfect sense. My > patch implements that goal by subtracting out call_saved_regs_num (those that > require prolog/epilog save/restore) from the total regs, and using that as > the target # of registers to be used for the block. I see your point and agree that current code isn't optimal. However, I don't think your patch is accurate either. Consider https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's assume that FIXED_REGISTERS in class CL is set for a third of the registers, and CALL_USED_REGISTERS is set to "1" for another third of registers. So we have a third available for zero-cost allocation (CALL_USED_REGISTERS-FIXED_REGISTERS), a third available for spill-cost allocation (ALL_REGISTERS-CALL_USED_REGISTERS) and a third non-available (FIXED_REGISTERS). For a non-loop-single-basic-block function we should be targeting only the third of register available at zero-cost -- correct? This is what is done by the current code, but, apparently, by accident. It seems that the right register count can be obtained with: for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i) - if (call_used_regs[ira_class_hard_regs[cl][i]]) - ++call_used_regs_num[cl]; + if (!call_used_regs[ira_class_hard_regs[cl][i]] + || fixed_regs[ira_class_hard_regs[cl][i]]) + ++call_saved_regs_num[cl]; Does this look correct to you? -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH] Fix computation of register limit for -fsched-pressure
On 10/17/2016 08:17 AM, Maxim Kuvyrkov wrote: >> The patch here, https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01872.html, >> attempted to scale down the register limit used by -fsched-pressure for the >> case where the block in question executes as frequently as the entry block >> to just the call_clobbered (i.e. call_used) regs. But the code is actually >> scaling toward call_saved registers. The following patch corrects that by >> computing call_saved regs per class and subtracting out some scaled portion >> of that. >> > >> > Bootstrap/regtest on powerpc64le with no new failures. Ok for trunk? > Hi Pat, > > I stared at your patch and current code for good 30 minutes, and I still > don't see what is wrong with the current code. > > With your patch the number of registers from class CL that scheduler has at > its disposal for a single-basic-block function will be: > > sched_call_regs_num[CL] = ira_class_hard_regs_num[CL] - > call_saved_regs_num[CL]; > > where call_saved_regs_num is number of registers in class CL that need to be > saved in the prologue (i.e., "free" registers). I can see some logic in > setting > > sched_call_regs_num[CL] = call_saved_regs_num[CL]; > > but not in subtracting number of such registers from the number of total > available hard registers. > > Am I missing something? > Your original patch gave the following reasoning: "At the moment the scheduler does not account for spills in the prologues and restores in the epilogue, which occur from use of call-used registers. The current state is, essentially, optimized for case when there is a hot loop inside the function, and the loop executes significantly more often than the prologue/epilogue. However, on the opposite end, we have a case when the function is just a single non-cyclic basic block, which executes just as often as prologue / epilogue, so spills in the prologue hurt performance as much as spills in the basic block itself. In such a case the scheduler should throttle-down on the number of available registers and try to not go beyond call-clobbered registers." But the misunderstanding is that call-used registers do NOT cause any save/restore. That is to say, call-used == call-clobbered. Your last sentence explains the goal for a single block function, to not go beyond call-clobbered (i.e. call-used) registers, which makes perfect sense. My patch implements that goal by subtracting out call_saved_regs_num (those that require prolog/epilog save/restore) from the total regs, and using that as the target # of registers to be used for the block. > Also, could you share the testcase that you used to investigate the problem > with register-aware scheduling? I wonder if there is a problem lurking. I don't have a testcase. I'm currently trying to get -fsched-pressure to be beneficial for PowerPC and was familiarizing myself with the code when I spotted the issue. Thanks, Pat
Re: [PATCH] Fix computation of register limit for -fsched-pressure
> On Oct 7, 2016, at 6:08 PM, Pat Haugenwrote: > > The patch here, https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01872.html, > attempted to scale down the register limit used by -fsched-pressure for the > case where the block in question executes as frequently as the entry block to > just the call_clobbered (i.e. call_used) regs. But the code is actually > scaling toward call_saved registers. The following patch corrects that by > computing call_saved regs per class and subtracting out some scaled portion > of that. > > Bootstrap/regtest on powerpc64le with no new failures. Ok for trunk? Hi Pat, I stared at your patch and current code for good 30 minutes, and I still don't see what is wrong with the current code. With your patch the number of registers from class CL that scheduler has at its disposal for a single-basic-block function will be: sched_call_regs_num[CL] = ira_class_hard_regs_num[CL] - call_saved_regs_num[CL]; where call_saved_regs_num is number of registers in class CL that need to be saved in the prologue (i.e., "free" registers). I can see some logic in setting sched_call_regs_num[CL] = call_saved_regs_num[CL]; but not in subtracting number of such registers from the number of total available hard registers. Am I missing something? Also, could you share the testcase that you used to investigate the problem with register-aware scheduling? I wonder if there is a problem lurking. Thank you, -- Maxim Kuvyrkov www.linaro.org > > -Pat > > > 2016-10-07 Pat Haugen > > * haifa-sched.c call_used_regs_num: Rename to... > call_saved_regs_num: ...this. > (sched_pressure_start_bb): Scale call_saved regs not call_used. > (alloc_global_sched_pressure_data): Compute call_saved regs. > > >