Re: RFC reminder: an alternative -fsched-pressure implementation

2012-04-26 Thread Richard Sandiford
Ulrich Weigand uweig...@de.ibm.com writes:
 Richard Sandiford wrote:
 Vladimir Makarov vmaka...@redhat.com writes:
  Taking your results for S390 and ARM with Neon into account, I guess it 
  should be included and probably made by default for these 2 targets (for 
  sure for s390).
 
 OK, thanks to both of you.
 
 Ulrich and Andreas: would you be happy for s390 to use this by default?
 I'll update the patch and install if so.

 I've talked to Andreas, and we agree that s390 should use this as default.
 If you install the base patch, we'll do the back-end change accordingly.
 (I'll also work with Ramana to enable it on ARM where it makes sense,
 probably when targeting Cortex-A cores.)

OK, installed after bootstrapping  regression-testing on
x86_64-linux-gnu, both as normal and with:

(a) the i386 -fschedule-insns override disabled
(b) -fsched-pressure on by default and
(c) -fsched-pressure-algorithm=model by default

The results from the first run were the same as the baseline ones.
The second run had an extra spill failure in a guality test for -m32,
but I think that sort of thing is expected.  It also had a debug quality
regression in pr43479.c.  We had a chain of debug_insns that were
logically independent but which (regardless of the patch) were
nevertheless constrained to be in sequence.  The patch moved some later
non-debug instructions ahead of where the first debug instruction could go.
It seemed that the RA was such that the values used by the later debug
instructions were no longer valid.

Richard


Re: RFC reminder: an alternative -fsched-pressure implementation

2012-04-25 Thread Ulrich Weigand
Richard Sandiford wrote:
 Vladimir Makarov vmaka...@redhat.com writes:
  Taking your results for S390 and ARM with Neon into account, I guess it 
  should be included and probably made by default for these 2 targets (for 
  sure for s390).
 
 OK, thanks to both of you.
 
 Ulrich and Andreas: would you be happy for s390 to use this by default?
 I'll update the patch and install if so.

I've talked to Andreas, and we agree that s390 should use this as default.
If you install the base patch, we'll do the back-end change accordingly.
(I'll also work with Ramana to enable it on ARM where it makes sense,
probably when targeting Cortex-A cores.)

Thanks again to you and Vlad for looking into this long-standing problem!

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com



Re: RFC reminder: an alternative -fsched-pressure implementation

2012-04-24 Thread Richard Sandiford
Vladimir Makarov vmaka...@redhat.com writes:
 On 04/23/2012 11:42 AM, Ulrich Weigand wrote:
 Vladimir Makarov wrote:

 I have a mixed feeling with the patch. I've tried it on SPEC2000 on
 x86/x86-64 and ARM. Model algorithm generates bigger code up to 3.5%
 (SPECFP on x86), 2% (SPECFP on 86-64), and 0.23% (SPECFP on ARM) in
 comparison with the current algorithm. It is slower too. Although
 the difference is quite insignificant on Corei7, compiler speed
 slowdown achieves 0.4% on SPECFP2000 on arm. The algorithm also
 generates slower code on x86 (1.5% on SPECINT and 5% on SPECFP200)
 and practically the same average code on x86-64 and ARM (I've tried
 only SPECINT on ARM).
 Was this using NEON on ARM?  The biggest benefits we saw involved
 vector code ...
 Sorry, No.
 On 04/17/2012 04:29 AM, Richard Sandiford wrote:
 Anyway, given those results and your mixed feelings, I think it would
 be best to drop the patch.  It's a lot of code to carry around, and its
 ad-hoc nature would make it hard to change in future.  There must be
 better ways of achieving the same thing.

 It is up to you, Richard.  I don't mind if you commit it into the trunk.

 There is something in your approach too.  If it would be possible to get
 the best of the two methods, we could see a big improvement.
 On s390, the patch is pretty much a straight improvement across the line:
 0.5% on SPECint, 1.7% on SPECfp overall, with the best single improvement
 in GemsFDTD (9.3%), and no regression beyond usual noise levels.
 That is a really big improvement.
 Given that, and the impressive improvements we saw on some ARM benchmarks
 involving vector code (up to 70%), it would really be a pity to see the
 patch going nowhere ...

 Is there anything we can do to help make the patch more acceptable?
 Any suggestions welcome ...

 I did not object the patch.  As I wrote Richard, it is up to him to 
 decide to commit the patch or not (it still is).  I had mixed results on 
 x86-64 -- some better and some worse (but with average the same) with 
 pretty big variations.  Those big variations mean that people could use 
 this for fine-tuning their programs.

 Taking your results for S390 and ARM with Neon into account, I guess it 
 should be included and probably made by default for these 2 targets (for 
 sure for s390).

OK, thanks to both of you.

Ulrich and Andreas: would you be happy for s390 to use this by default?
I'll update the patch and install if so.

Richard


Re: RFC reminder: an alternative -fsched-pressure implementation

2012-04-23 Thread Ulrich Weigand
Vladimir Makarov wrote:

 I have a mixed feeling with the patch. I've tried it on SPEC2000 on
 x86/x86-64 and ARM. Model algorithm generates bigger code up to 3.5%
 (SPECFP on x86), 2% (SPECFP on 86-64), and 0.23% (SPECFP on ARM) in
 comparison with the current algorithm. It is slower too. Although
 the difference is quite insignificant on Corei7, compiler speed
 slowdown achieves 0.4% on SPECFP2000 on arm. The algorithm also
 generates slower code on x86 (1.5% on SPECINT and 5% on SPECFP200)
 and practically the same average code on x86-64 and ARM (I've tried
 only SPECINT on ARM).

Was this using NEON on ARM?  The biggest benefits we saw involved
vector code ...

 On 04/17/2012 04:29 AM, Richard Sandiford wrote:
  Anyway, given those results and your mixed feelings, I think it would
  be best to drop the patch.  It's a lot of code to carry around, and its
  ad-hoc nature would make it hard to change in future.  There must be
  better ways of achieving the same thing.
 
 It is up to you, Richard.  I don't mind if you commit it into the trunk.
 
 There is something in your approach too.  If it would be possible to get 
 the best of the two methods, we could see a big improvement.

On s390, the patch is pretty much a straight improvement across the line:
0.5% on SPECint, 1.7% on SPECfp overall, with the best single improvement
in GemsFDTD (9.3%), and no regression beyond usual noise levels.

Given that, and the impressive improvements we saw on some ARM benchmarks
involving vector code (up to 70%), it would really be a pity to see the
patch going nowhere ...

Is there anything we can do to help make the patch more acceptable?
Any suggestions welcome ...

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com



Re: RFC reminder: an alternative -fsched-pressure implementation

2012-04-23 Thread Vladimir Makarov

On 04/23/2012 11:42 AM, Ulrich Weigand wrote:

Vladimir Makarov wrote:


I have a mixed feeling with the patch. I've tried it on SPEC2000 on
x86/x86-64 and ARM. Model algorithm generates bigger code up to 3.5%
(SPECFP on x86), 2% (SPECFP on 86-64), and 0.23% (SPECFP on ARM) in
comparison with the current algorithm. It is slower too. Although
the difference is quite insignificant on Corei7, compiler speed
slowdown achieves 0.4% on SPECFP2000 on arm. The algorithm also
generates slower code on x86 (1.5% on SPECINT and 5% on SPECFP200)
and practically the same average code on x86-64 and ARM (I've tried
only SPECINT on ARM).

Was this using NEON on ARM?  The biggest benefits we saw involved
vector code ...

Sorry, No.

On 04/17/2012 04:29 AM, Richard Sandiford wrote:

Anyway, given those results and your mixed feelings, I think it would
be best to drop the patch.  It's a lot of code to carry around, and its
ad-hoc nature would make it hard to change in future.  There must be
better ways of achieving the same thing.


It is up to you, Richard.  I don't mind if you commit it into the trunk.

There is something in your approach too.  If it would be possible to get
the best of the two methods, we could see a big improvement.

On s390, the patch is pretty much a straight improvement across the line:
0.5% on SPECint, 1.7% on SPECfp overall, with the best single improvement
in GemsFDTD (9.3%), and no regression beyond usual noise levels.

That is a really big improvement.

Given that, and the impressive improvements we saw on some ARM benchmarks
involving vector code (up to 70%), it would really be a pity to see the
patch going nowhere ...

Is there anything we can do to help make the patch more acceptable?
Any suggestions welcome ...

I did not object the patch.  As I wrote Richard, it is up to him to 
decide to commit the patch or not (it still is).  I had mixed results on 
x86-64 -- some better and some worse (but with average the same) with 
pretty big variations.  Those big variations mean that people could use 
this for fine-tuning their programs.


Taking your results for S390 and ARM with Neon into account, I guess it 
should be included and probably made by default for these 2 targets (for 
sure for s390).




Re: RFC reminder: an alternative -fsched-pressure implementation

2012-04-17 Thread Richard Sandiford
Vladimir Makarov vmaka...@redhat.com writes:
 On 04/10/2012 09:35 AM, Richard Sandiford wrote:
 Hi Vlad,

 Back in Decemember, when we were still very much in stage 3, I sent
 an RFC about an alternative implementation of -fsched-pressure.
 Just wanted to send a reminder now that we're in the proper stage:

 http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01684.html

 Ulrich has benchmarked it on ARM, S/390 and Power7 (thanks), and got
 reasonable results.  (I mentioned bad Power 7 results in that message,
 because of the way the VSX_REGS class is handled.  Ulrich's results
 are without -mvsx though.)

 The condition I orignally set myself was that this patch should only
 go in if it becomes the default on at least one architecture,
 specifically ARM.  Ulrich tells me that Linaro have now made it
 the default for ARM in their GCC 4.7 release, so hopefully Ramana
 would be OK with doing the same in upstream 4.8.

 I realise the whole thing is probably more complicated and ad-hoc
 than you'd like.  Saying it can't go in is a perfectly acceptable
 answer IMO.

 I have a mixed feeling with the patch.  I've tried it on SPEC2000 on 
 x86/x86-64 and ARM.  Model algorithm generates bigger code up to 3.5% 
 (SPECFP on x86), 2% (SPECFP on 86-64), and 0.23% (SPECFP on ARM) in 
 comparison with the current algorithm.

Yeah.  That's expected really, since one of the points of the patch is
to allow more spilling than the current algorithm does.

One of the main problems I was seeing with the current algorithm was
that it was too conservative, and prevented spilling even when it would
be benificial.  This included spilling outside loops.  E.g. if you have
14 available GPRs, as on ARM, and have 10 pseudos live across a loop
but not used within it, the current algorithm uses 10 registers as the
starting pressure when scheduling the loop.  So the current algorithm
tries hard to avoid spilling those 10 registers, even if that restricts
the amount of reordering within the loop.  The new algorithm was supposed
to allow such registers to be spilled, but those extra spills would
increase code size.

 It is slower too.  Although the difference is quite insignificant on
 Corei7, compiler speed slowdown achieves 0.4% on SPECFP2000 on arm.

Hmm, that's not good.

 The algorithm also generates slower code on x86 (1.5% on SPECINT and
 5% on SPECFP200) and practically the same average code on x86-64 and
 ARM (I've tried only SPECINT on ARM).

Yeah, that's underwhelming too.  It looks like there's a danger that
this would become an A8- and A9-specific pass, so we would really need
better ARM results than that.

 On the other hand, I don't think that 1st insn scheduling will be ever 
 used for x86.  And although the SPECFP2000 rate is the same on x86-64 I 
 saw that some SPECFP2000 tests benefit from your algorithm on x86-64 
 (one amazing difference is 70% improvement on swim on x86-64 although it 
 might be because of different reasons like alignment or cache 
 behaviour).  So I think the algorithm might work better on processors 
 with more registers.

Notwithstanding that this is a goemean, I assume there were some bad
results to cancel out the gain?

 As for the patch itself, I think you should document the option in 
 doc/invoke.texi.  It is missed.

I forgot to say, but that was deliberate.  I see this as a developer
option rather than a user option.  The important thing was to allow
backends to default to the new algorithm if they want to.  Providing
command-line control gives developers an easier way of seeing which
default makes sense, but I don't think the option should be advertised
to users.

 Another minor mistake I found is one line garbage (I guess from
 -fira-algorithm) in description of -fsched-pressure-algorithm in
 common.opt.

Oops, thanks :-)

Anyway, given those results and your mixed feelings, I think it would
be best to drop the patch.  It's a lot of code to carry around, and its
ad-hoc nature would make it hard to change in future.  There must be
better ways of achieving the same thing.

Richard


Re: RFC reminder: an alternative -fsched-pressure implementation

2012-04-17 Thread Vladimir Makarov

On 04/17/2012 04:29 AM, Richard Sandiford wrote:

Vladimir Makarovvmaka...@redhat.com  writes:

On the other hand, I don't think that 1st insn scheduling will be ever
used for x86.  And although the SPECFP2000 rate is the same on x86-64 I
saw that some SPECFP2000 tests benefit from your algorithm on x86-64
(one amazing difference is 70% improvement on swim on x86-64 although it
might be because of different reasons like alignment or cache
behaviour).  So I think the algorithm might work better on processors
with more registers.
Notwithstanding that this is a goemean, I assume there were some bad
results to cancel out the gain?
Yes, mgrid had 4% degradation, mesa had 2%, facerec and ammp had 2.5%, 
lucas had 15%.  On the other hand, galgel had 2% improvement and equake 
had 1%.  All other differences are not considerable.


Oops, thanks :-)

Anyway, given those results and your mixed feelings, I think it would
be best to drop the patch.  It's a lot of code to carry around, and its
ad-hoc nature would make it hard to change in future.  There must be
better ways of achieving the same thing.


It is up to you, Richard.  I don't mind if you commit it into the trunk.

There is something in your approach too.  If it would be possible to get 
the best of the two methods, we could see a big improvement.




Re: RFC reminder: an alternative -fsched-pressure implementation

2012-04-16 Thread Vladimir Makarov

On 04/10/2012 09:35 AM, Richard Sandiford wrote:

Hi Vlad,

Back in Decemember, when we were still very much in stage 3, I sent
an RFC about an alternative implementation of -fsched-pressure.
Just wanted to send a reminder now that we're in the proper stage:

http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01684.html

Ulrich has benchmarked it on ARM, S/390 and Power7 (thanks), and got
reasonable results.  (I mentioned bad Power 7 results in that message,
because of the way the VSX_REGS class is handled.  Ulrich's results
are without -mvsx though.)

The condition I orignally set myself was that this patch should only
go in if it becomes the default on at least one architecture,
specifically ARM.  Ulrich tells me that Linaro have now made it
the default for ARM in their GCC 4.7 release, so hopefully Ramana
would be OK with doing the same in upstream 4.8.

I realise the whole thing is probably more complicated and ad-hoc
than you'd like.  Saying it can't go in is a perfectly acceptable
answer IMO.

I have a mixed feeling with the patch.  I've tried it on SPEC2000 on 
x86/x86-64 and ARM.  Model algorithm generates bigger code up to 3.5% 
(SPECFP on x86), 2% (SPECFP on 86-64), and 0.23% (SPECFP on ARM) in 
comparison with the current algorithm. It is slower too.  Although the 
difference is quite insignificant on Corei7, compiler speed slowdown 
achieves 0.4% on SPECFP2000 on arm.  The algorithm also generates slower 
code on x86 (1.5% on SPECINT and 5% on SPECFP200) and practically the 
same average code on x86-64 and ARM (I've tried only SPECINT on ARM).


On the other hand, I don't think that 1st insn scheduling will be ever 
used for x86.  And although the SPECFP2000 rate is the same on x86-64 I 
saw that some SPECFP2000 tests benefit from your algorithm on x86-64 
(one amazing difference is 70% improvement on swim on x86-64 although it 
might be because of different reasons like alignment or cache 
behaviour).  So I think the algorithm might work better on processors 
with more registers.


I gues it is ok to sumbit this work to the trunk. It may be useful for 
achieving better performance for specific tests or to make it a default 
for some targets if it is proven to be better.


As for the patch itself, I think you should document the option in 
doc/invoke.texi.  It is missed.   Another minor mistake I found is one 
line garbage (I guess from -fira-algorithm) in description of 
-fsched-pressure-algorithm in common.opt.


Thanks, Richard.

By the way, the code in scheduler has been changed since you made a 
patch and you need to do some merging first.




Re: RFC reminder: an alternative -fsched-pressure implementation

2012-04-10 Thread Ramana Radhakrishnan
 The condition I orignally set myself was that this patch should only
 go in if it becomes the default on at least one architecture,
 specifically ARM.  Ulrich tells me that Linaro have now made it
 the default for ARM in their GCC 4.7 release, so hopefully Ramana
 would be OK with doing the same in upstream 4.8.

Before I approve that I'd like to hopefully see some numbers across
the M-class cores as well so that we make sure that side of the
architecture isn't missed out (I don't expect it to make a huge
difference there but it's still worth checking) - I'm quite happy for
it to be default on A8 and A9 based on the evidence I've seen so far.

regards,
Ramana


Re: RFC reminder: an alternative -fsched-pressure implementation

2012-04-10 Thread Vladimir Makarov

On 04/10/2012 09:35 AM, Richard Sandiford wrote:

Hi Vlad,

Back in Decemember, when we were still very much in stage 3, I sent
an RFC about an alternative implementation of -fsched-pressure.
Just wanted to send a reminder now that we're in the proper stage:

http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01684.html

Ulrich has benchmarked it on ARM, S/390 and Power7 (thanks), and got
reasonable results.  (I mentioned bad Power 7 results in that message,
because of the way the VSX_REGS class is handled.  Ulrich's results
are without -mvsx though.)

The condition I orignally set myself was that this patch should only
go in if it becomes the default on at least one architecture,
specifically ARM.  Ulrich tells me that Linaro have now made it
the default for ARM in their GCC 4.7 release, so hopefully Ramana
would be OK with doing the same in upstream 4.8.
Sorry for forgetting about this and thanks for the remainder.  I'll 
start looking at this patch thoroughly and reviewing it on this week 
(may be I'll do some benchmarks too).

I realise the whole thing is probably more complicated and ad-hoc
than you'd like.  Saying it can't go in is a perfectly acceptable
answer IMO.

The first impression I got for the patch when you submitted it, that it 
probably should go in.