Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-25 Thread Brett Cannon
On Tue, 25 Aug 2015 at 09:10 R. David Murray rdmur...@bitdance.com wrote:

> On Tue, 25 Aug 2015 15:59:23 -, Brett Cannon br...@python.org wrote:
> > On Mon, 24 Aug 2015 at 23:19 Nick Coghlan ncogh...@gmail.com wrote:
> >
> > > On 25 August 2015 at 05:52, Gregory P. Smith g...@krypto.org wrote:
> > > > What we tested and decided to use on our own builds after
> > > > benchmarking at work was to build with:
> > > >
> > > > make profile-opt PROFILE_TASK="-m test.regrtest -w -uall,-audio -x test_gdb test_multiprocessing"
> > > >
> > > > In general if a test is unreliable or takes an extremely long time,
> > > > exclude it for your sanity.  (i'd also kick out test_subprocess on
> > > > 2.7; we replaced subprocess with subprocess32 in our build so that
> > > > wasn't an issue)
> > >
> > > Having the "production ready" make target be "make profile-opt"
> > > doesn't strike me as the most intuitive thing in the world.
> > >
> > > I agree we want the "./configure && make" sequence to be oriented
> > > towards local development builds rather than highly optimised
> > > production ones, so perhaps we could provide a "make production"
> > > target that enables PGO with an appropriate training set from
> > > regrtest, and also complains if --with-pydebug is configured?
> >
> > That's an interesting idea for a make target. It might help get the
> > visibility of PGO builds higher as well.
>
> If we did want to make PGO the default, having a 'make develop' target
> would also be an option.  We already have a precedent for that in the
> 'setup.py develop' command.

With a `make develop` target we can also make sure not only that
--with-pydebug is used but that the installation target is /tmp so that new
contributors don't accidentally install a debug build.
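
As a rough illustration of the idea, here is a minimal sketch of the configure invocation such a target might wrap. The helper name develop_configure_cmd and the default prefix /tmp/python-dev are assumptions made up for this example; nothing like this exists in CPython's build system today.

```shell
# Hypothetical helper, not part of CPython's Makefile: print the configure
# command a "make develop" target could run. The default prefix is an
# illustrative location under /tmp, so a later "make install" cannot
# overwrite a system-wide Python with a debug build.
develop_configure_cmd() {
    prefix=${1:-/tmp/python-dev}
    echo "./configure --with-pydebug --prefix=$prefix"
}

# Show the command instead of running it.
develop_configure_cmd
```

Printing the command keeps the sketch side-effect free; a real target would run it and then invoke make.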
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-25 Thread Xavier Combelle
Pardon me if this is not the right place to ask the following naive
question (tell me if that is the case).

Are the performance improvements from Profile Guided Optimization specific
to the chip where the build is done, or do they carry over to a larger set
of chips?


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-25 Thread Gregory P. Smith
PGO is unrelated to the particular CPU the profiling is done on. (It is
conceivable that it'd make a small difference but I've never observed that
in practice)

On Tue, Aug 25, 2015, 9:28 AM Xavier Combelle xavier.combe...@gmail.com
wrote:

> Pardon me if this is not the right place to ask the following naive
> question (tell me if that is the case).
>
> Are the performance improvements from Profile Guided Optimization specific
> to the chip where the build is done, or do they carry over to a larger set
> of chips?


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-25 Thread Brett Cannon
On Mon, 24 Aug 2015 at 23:19 Nick Coghlan ncogh...@gmail.com wrote:

> On 25 August 2015 at 05:52, Gregory P. Smith g...@krypto.org wrote:
> > What we tested and decided to use on our own builds after benchmarking at
> > work was to build with:
> >
> > make profile-opt PROFILE_TASK="-m test.regrtest -w -uall,-audio -x test_gdb test_multiprocessing"
> >
> > In general if a test is unreliable or takes an extremely long time,
> > exclude it for your sanity.  (i'd also kick out test_subprocess on 2.7;
> > we replaced subprocess with subprocess32 in our build so that wasn't an
> > issue)
>
> Having the "production ready" make target be "make profile-opt"
> doesn't strike me as the most intuitive thing in the world.
>
> I agree we want the "./configure && make" sequence to be oriented
> towards local development builds rather than highly optimised
> production ones, so perhaps we could provide a "make production"
> target that enables PGO with an appropriate training set from
> regrtest, and also complains if --with-pydebug is configured?

That's an interesting idea for a make target. It might help get the
visibility of PGO builds higher as well.


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-25 Thread R. David Murray
On Tue, 25 Aug 2015 15:59:23 -, Brett Cannon br...@python.org wrote:
> On Mon, 24 Aug 2015 at 23:19 Nick Coghlan ncogh...@gmail.com wrote:
>
> > On 25 August 2015 at 05:52, Gregory P. Smith g...@krypto.org wrote:
> > > What we tested and decided to use on our own builds after benchmarking
> > > at work was to build with:
> > >
> > > make profile-opt PROFILE_TASK="-m test.regrtest -w -uall,-audio -x test_gdb test_multiprocessing"
> > >
> > > In general if a test is unreliable or takes an extremely long time,
> > > exclude it for your sanity.  (i'd also kick out test_subprocess on 2.7;
> > > we replaced subprocess with subprocess32 in our build so that wasn't
> > > an issue)
> >
> > Having the "production ready" make target be "make profile-opt"
> > doesn't strike me as the most intuitive thing in the world.
> >
> > I agree we want the "./configure && make" sequence to be oriented
> > towards local development builds rather than highly optimised
> > production ones, so perhaps we could provide a "make production"
> > target that enables PGO with an appropriate training set from
> > regrtest, and also complains if --with-pydebug is configured?
>
> That's an interesting idea for a make target. It might help get the
> visibility of PGO builds higher as well.

If we did want to make PGO the default, having a 'make develop' target
would also be an option.  We already have a precedent for that in the
'setup.py develop' command.

--David


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-25 Thread Gregory P. Smith
On Mon, Aug 24, 2015, 11:19 PM Nick Coghlan ncogh...@gmail.com wrote:

> On 25 August 2015 at 05:52, Gregory P. Smith g...@krypto.org wrote:
> > What we tested and decided to use on our own builds after benchmarking at
> > work was to build with:
> >
> > make profile-opt PROFILE_TASK="-m test.regrtest -w -uall,-audio -x test_gdb test_multiprocessing"
> >
> > In general if a test is unreliable or takes an extremely long time,
> > exclude it for your sanity.  (i'd also kick out test_subprocess on 2.7;
> > we replaced subprocess with subprocess32 in our build so that wasn't an
> > issue)
>
> Having the "production ready" make target be "make profile-opt"
> doesn't strike me as the most intuitive thing in the world.
>
> I agree we want the "./configure && make" sequence to be oriented
> towards local development builds rather than highly optimised
> production ones, so perhaps we could provide a "make production"
> target that enables PGO with an appropriate training set from
> regrtest, and also complains if --with-pydebug is configured?
>
> Regards,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Agreed. Also, printing a message out at the end of a default "make all" build
suggesting people use "make production" for additional performance instead
might help advertise it.

"make install" could possibly depend on "make production" as well?


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-25 Thread Patrascu, Alecsandru
Indeed, as Gregory rightly mentioned, PGO is not tied to the particular CPU
on which the profiling is done.

From: Python-Dev [mailto:python-dev-bounces+alecsandru.patrascu=intel@python.org] On Behalf Of Gregory P. Smith
Sent: Tuesday, August 25, 2015 7:44 PM
To: Xavier Combelle; python-dev@python.org
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default

PGO is unrelated to the particular CPU the profiling is done on. (It is
conceivable that it'd make a small difference but I've never observed that in
practice)

On Tue, Aug 25, 2015, 9:28 AM Xavier Combelle xavier.combe...@gmail.com wrote:

Pardon me if this is not the right place to ask the following naive question
(tell me if that is the case).

Are the performance improvements from Profile Guided Optimization specific to
the chip where the build is done, or do they carry over to a larger set of
chips?




Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-25 Thread Nick Coghlan
On 25 August 2015 at 05:52, Gregory P. Smith g...@krypto.org wrote:
> What we tested and decided to use on our own builds after benchmarking at
> work was to build with:
>
> make profile-opt PROFILE_TASK="-m test.regrtest -w -uall,-audio -x test_gdb test_multiprocessing"
>
> In general if a test is unreliable or takes an extremely long time, exclude
> it for your sanity.  (i'd also kick out test_subprocess on 2.7; we replaced
> subprocess with subprocess32 in our build so that wasn't an issue)

Having the "production ready" make target be "make profile-opt"
doesn't strike me as the most intuitive thing in the world.

I agree we want the "./configure && make" sequence to be oriented
towards local development builds rather than highly optimised
production ones, so perhaps we could provide a "make production"
target that enables PGO with an appropriate training set from
regrtest, and also complains if --with-pydebug is configured?
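
To make the proposal concrete, here is a minimal sketch of the check such a target could perform before starting a PGO build. The function names is_production_config and production_build_cmd are hypothetical, and the configure arguments are passed in as a plain string; how a real Makefile target would obtain them is left open.

```shell
# Hypothetical helpers sketching a "make production" guard; neither function
# exists in CPython's build system.

# Succeed only if the configure arguments do not request a debug build.
is_production_config() {
    case "$1" in
        *--with-pydebug*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print the PGO build command, or refuse when --with-pydebug was configured.
production_build_cmd() {
    if ! is_production_config "$1"; then
        echo "error: --with-pydebug is configured; refusing production build" >&2
        return 1
    fi
    echo "make profile-opt"
}

# Example with release-style configure arguments (assumed values).
production_build_cmd "--prefix=/usr/local"
```

The sketch only prints the command it would run, so it can be tried without a CPython checkout.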

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-25 Thread Skip Montanaro
On Tue, Aug 25, 2015 at 11:17 AM, Brett Cannon br...@python.org wrote:

> With a `make develop` target we also can make sure not only that
> --with-pydebug is used but that the installation target is /tmp so that new
> contributors don't accidentally install a debug build.


You need to be careful there. In my environment, I interface with a lot of
Boost.Python-wrapped code which would be quite impractical to compile with
--with-pydebug. I'd like to be able to throw in all the other development
bells and whistles though, without changing the size of the object header.
Maybe develop-lite?

whatever happened to wink?-ly, y'rs,

Skip


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-25 Thread Eric Snow
On Aug 24, 2015 3:51 PM, Stewart, David C david.c.stew...@intel.com wrote:

> (Sorry about the format here - I honestly just subscribed to Python-dev so
> be gentle ...)

:)

> > Date: Sat, 22 Aug 2015 11:25:59 -0600
> > From: Eric Snow ericsnowcurren...@gmail.com
> >
> > > On Aug 22, 2015 9:02 AM, Patrascu, Alecsandru alecsandru.patrascu at
> > > intel.com wrote:
> > > [snip] For instance, as shown from attached sample performance
> > > results from the Grand Unified Python Benchmark, 20% speed up was
> > > observed.
>
> Eric - I'm the manager of Intel's server scripting language optimization
> team, so I'll answer from that perspective.

Thanks, David!


> > Are you referring to the tests in the benchmarks repo? [1] How does the
> > real-world performance improvement compare with other languages you are
> > targeting for optimization?
>
> Yes, we're using [1].
>
> We're seeing up to 10% improvement on Swift (a project in OpenStack) on
> some architectures using the ssbench workload, which is as close to
> real-world as we can get.

Cool.

> Relative to other languages we target, this is
> quite good actually. For example, Java's Hotspot JIT is driven by
> profiling at its core so it's hard to distinguish the value profiling
> alone brings.

Interesting.  So pypy (with its profiling JIT) would be in a similar boat,
potentially.

> We have seen a nice boost on PHP running Wordpress using
> PGO, but not as impressive as Python and Swift.

Nice.  Presumably this reflects some of the choices we've made on the level
of complexity in the interpreter source.


> By the way, I think letting the compiler optimize the code is a good
> strategy. Not the only strategy we want to use, but it seems like one we
> could do more of.
>
> > And thanks for working on this!  I have several more questions: What
> > sorts of future changes in CPython's code might interfere with
> > your optimizations?
>
> We're also looking at other source-level optimizations, like the CGOTO
> patch Vamsi submitted in June. Some of these may reduce the value of PGO,
> but in general it's nice to let the compiler do some optimization for you.
>
> > What future additions might stand to benefit?
>
> It's a good question. Our intent is to continue to evaluate and measure
> different training workloads for improvement. In other words, as with any
> good open source project, this patch should improve things a lot and
> should be accepted upstream, but we will continue to make it better.
>
> > What changes in existing code might improve optimization opportunities?
>
> We intend to continue to work on source-level optimizations and measuring
> them against GUPB and Swift.

Thanks!  These sorts of contributions have far-reaching positive effects.


> > What is the added maintenance burden of the optimizations on CPython,
> > if any?
>
> I think the answer is none. Our goal was to introduce performance
> improvements without adding to maintenance effort.
>
> > What is the performance impact on non-Intel architectures?  What
> > about older Intel architectures?  ...and future ones?
>
> We should modify the patch to make it for Intel only, since we're not
> evaluating non-Intel architectures. Unfortunately for us, I suspect that
> older Intel CPUs might benefit more than current and future ones. Future
> architectures will benefit from other enabling work we're planning.

That's fine though.  At the least you're setting the stage for future work,
including building a relationship here.  :)


> > What is Intel's commitment to supporting these (or other) optimizations
> > in the future?  How is the practical EOL of the optimizations managed?
>
> As with any corporation's budgeting process, it's hard to know exactly
> what my managers will let me spend money on. :-) But we're definitely
> convinced of the value of dynamic languages for servers and the need to
> work on optimization. As far as I have visibility, it appears to be
> holding true.

Sounds good.

> > Finally, +1 on adding an opt-in Makefile target rather than enabling
> > the optimizations by default.
>
> Frankly since Ubuntu has been running this way for past two years, I think
> it's fine to make it opt-in, but eventually I hope it can be the default
> once we're happy with it.

Given the reaction here that sounds reasonable.

Thanks for answering these questions and to your team for getting involved!

-eric


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-25 Thread Maciej Fijalkowski

> Interesting.  So pypy (with it's profiling JIT) would be in a similar boat,
> potentially.

PGO and what pypy does have pretty much nothing to do with each other.
I'm not sure what you mean by "similar boat".


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-24 Thread Gregory P. Smith
On Sat, Aug 22, 2015 at 9:27 AM Brett Cannon br...@python.org wrote:

> On Sat, Aug 22, 2015, 09:17 Guido van Rossum gu...@python.org wrote:
>
> > How about we first add a new Makefile target that enables PGO, without
> > turning it on by default? Then later we can enable it by default.

There already is one and has been for many years.  make profile-opt.

I even set up a buildbot for it last year.

The problem with the existing profile-opt build in our default Makefile.in
is that it uses a horrible profiling workload (pybench, ugh) so it leaves a
lot of improvements behind.

What all Linux distros (Debian/Ubuntu and Redhat at least; nothing else
matters) do for their Python builds is to use profile-opt but they replace
the profiling workload with a stable set of the Python unittest suite
itself. Results are much better all around.  Generally a 20% speedup.

Anyone deploying Python who is *not* using a profile-opt build is wasting
CPU resources.

Whether it should be *the default* or not *is a different question*.  The
Makefile is optimized for CPython developers who certainly do not want to
run two separate builds and a profile-opt workload every time they type
make to test out their changes.

But all binary release builds should use it.

> I agree. Updating the Makefile so it's easier to use PGO is great, but we
> should do a release with it as opt-in and go from there.
>
> > Also, I have my doubts about regrtest. How sure are we that it represents
> > a typical Python load? Tests are often using a different mix of operations
> > than production code.
>
> That was also my question. You said that it provides the best performance
> improvement, but compared to what; what else was tried? And what
> difference does it make to e.g. a Django app that is trained on their own
> simulated workload compared to using regrtest? IOW is regrtest displaying
> the best across-the-board performance because it stresses the largest swath
> of Python and thus catches generic patterns in the code but individuals
> could get better performance with a simulated workload?


This isn't something to argue about.  Just use regrtest and compare the
before and after with the benchmark suite.  It really does exercise things
well.  People like to fear that it'll produce code optimized for the test
suite itself or something.  No.  Python as an interpreter is very
realistically exercised by running it as it is simply running a lot of code
and a good variety of code including the extension modules that benefit
most such as regexes, pickle, json, xml, etc.

Thomas tried the test suite and a variety of other workloads when looking
at what to use at work.  The testsuite works out generally the best.  Going
beyond that seems to be a wash.

What we tested and decided to use on our own builds after benchmarking at
work was to build with:

make profile-opt PROFILE_TASK="-m test.regrtest -w -uall,-audio -x test_gdb test_multiprocessing"

In general if a test is unreliable or takes an extremely long time, exclude
it for your sanity.  (i'd also kick out test_subprocess on 2.7; we replaced
subprocess with subprocess32 in our build so that wasn't an issue)
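
For anyone adapting the exclusion list, here is a minimal sketch of assembling that PROFILE_TASK value from a list of tests to skip. The helper name profile_task and the EXCLUDED_TESTS variable are made up for this example; only the regrtest options themselves come from the command above.

```shell
# Tests to leave out of the training run; mirrors the command above and is
# meant to be adjusted per platform.
EXCLUDED_TESTS="test_gdb test_multiprocessing"

# Hypothetical helper: build the regrtest argument string for PROFILE_TASK.
#   -w             re-run failed tests in verbose mode
#   -uall,-audio   enable every optional resource except audio
#   -x             treat the remaining arguments as tests to exclude
profile_task() {
    printf '%s' "-m test.regrtest -w -uall,-audio -x $1"
}

PROFILE_TASK=$(profile_task "$EXCLUDED_TESTS")

# Print the make invocation instead of running it. The quotes matter: the
# whole string has to reach make as a single PROFILE_TASK value.
echo "make profile-opt PROFILE_TASK=\"$PROFILE_TASK\""
```

On 2.7, test_subprocess could be appended to EXCLUDED_TESTS in the same way.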

-gps


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-24 Thread Matthias Klose
The current pgo target just uses a very specific task to train for the feedback.
For my Debian/Ubuntu builds I'm using the testsuite minus some problematic tests
to train. OTOH I don't know if this is the best way to do it; however, it gave
better results at some time in the past.  What I would like is a benchmark / a
mixture of benchmarks on which to enable pgo/pdo. Based on that you could enable
pgo based on some static decisions based on autofdo. For that you don't need any
profile runs during your build; it just needs shipping the autofdo outcome
together with a Python release. This doesn't give you the same performance as
a GCC pgo build, but it would be a first step. And defining the probe
for any pgo build would be welcome too.

  Matthias


On 08/22/2015 06:25 PM, Brett Cannon wrote:
> On Sat, Aug 22, 2015, 09:17 Guido van Rossum gu...@python.org wrote:
>
> > How about we first add a new Makefile target that enables PGO, without
> > turning it on by default? Then later we can enable it by default.
>
> I agree. Updating the Makefile so it's easier to use PGO is great, but we
> should do a release with it as opt-in and go from there.
>
> > Also, I have my doubts about regrtest. How sure are we that it represents a
> > typical Python load? Tests are often using a different mix of operations
> > than production code.
>
> That was also my question. You said that it provides the best performance
> improvement, but compared to what; what else was tried? And what
> difference does it make to e.g. a Django app that is trained on their own
> simulated workload compared to using regrtest? IOW is regrtest displaying
> the best across-the-board performance because it stresses the largest swath
> of Python and thus catches generic patterns in the code but individuals
> could get better performance with a simulated workload?
>
> -Brett
 
 
> > On Sat, Aug 22, 2015 at 7:46 AM, Patrascu, Alecsandru
> > alecsandru.patra...@intel.com wrote:
> >
> > > Hi All,
> > >
> > > This is Alecsandru from Server Scripting Languages Optimization team at
> > > Intel Corporation.
> > >
> > > I would like to submit a request to turn on Profile Guided Optimization
> > > (PGO) as the default build option for Python (both 2.7 and 3.6), given
> > > its performance benefits on a wide variety of workloads and hardware.
> > > For instance, as shown in the attached sample performance results from
> > > the Grand Unified Python Benchmark, a speedup of up to 20% was observed.
> > > In addition, we are seeing a 2-9% performance boost from OpenStack/Swift,
> > > where more than 60% of the code is in Python 2.7. Our analysis indicates
> > > the performance gain was mainly due to a reduction of icache misses and
> > > CPU front-end stalls.
> > >
> > > The attached Makefile patches modify the all build target and add a new
> > > one called disable-profile-opt. We built and tested this patch for
> > > Python 2.7 and 3.6 on our Linux machines (CentOS 7/Ubuntu Server 14.04,
> > > Intel Xeon Haswell/Broadwell with 18/8 cores).  We use the regrtest suite
> > > for training as it provides the best performance improvement.  Some of
> > > the test programs in the suite may fail, which causes the build to fail.
> > > One solution is to disable the specific failing test using the -x flag
> > > (as shown in the patch).
> > >
> > > Steps to apply the patch:
> > > 1.  hg clone https://hg.python.org/cpython cpython
> > > 2.  cd cpython
> > > 3.  hg update 2.7 (needed for 2.7 only)
> > > 4.  Copy *.patch to the current directory
> > > 5.  patch < python2.7-pgo.patch (or patch < python3.6-pgo.patch)
> > > 6.  ./configure
> > > 7.  make
> > >
> > > To disable PGO
> > > 7b. make disable-profile-opt
 
> > > In the following, please find our sample performance results from latest
> > > XEON machine, XEON Broadwell EP.
> > > Hardware (HW):      Intel XEON (Broadwell) 8 Cores
> > >
> > > BIOS settings:      Intel Turbo Boost Technology: false
> > >                     Hyper-Threading: false
> > >
> > > Operating System:   Ubuntu 14.04.3 LTS trusty
> > >
> > > OS configuration:   CPU freq set at fixed: 2.6GHz by
> > >                         echo 2600000 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
> > >                         echo 2600000 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
> > >                     Address Space Layout Randomization (ASLR) disabled
> > >                     (to reduce run to run variation) by
> > >                         echo 0 > /proc/sys/kernel/randomize_va_space
> > >
> > > GCC version:        gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)
> > >
> > > Benchmark:          Grand Unified Python Benchmark (GUPB)
> > >                     GUPB Source: https://hg.python.org/benchmarks/
> > >
> > > Python2.7 results:
> > >     Python source: hg clone https://hg.python.org/cpython cpython
> > >     Python Source: hg update 2.7
> > >     hg id: 0511b1165bb6 (2.7)
> > >     hg id -r 'ancestors(.) and tag()': 15c95b7d81dc (2.7) v2.7.10
> > >     hg --debug id -i: 0511b1165bb6cf40ada0768a7efc7ba89316f6a5
> > >
> > >         Benchmarks          Speedup(%)
> > >         simple_logging      20
> > >         raytrace            20
> > >         silent_logging      19
> > >         richards            19
> > >         chaos               16
> > >         formatted_logging   16
> > >         json_dump           15
 

Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-24 Thread Stewart, David C
(Sorry about the format here - I honestly just subscribed to Python-dev so
be gentle ...)

> Date: Sat, 22 Aug 2015 11:25:59 -0600
> From: Eric Snow ericsnowcurren...@gmail.com
>
> > On Aug 22, 2015 9:02 AM, Patrascu, Alecsandru alecsandru.patrascu at
> > intel.com wrote:
> > [snip] For instance, as shown from attached sample performance
> > results from the Grand Unified Python Benchmark, 20% speed up was
> > observed.

Eric - I'm the manager of Intel's server scripting language optimization
team, so I'll answer from that perspective.

> Are you referring to the tests in the benchmarks repo? [1] How does the
> real-world performance improvement compare with other languages you are
> targeting for optimization?

Yes, we're using [1].

We're seeing up to 10% improvement on Swift (a project in OpenStack) on
some architectures using the ssbench workload, which is as close to
real-world as we can get. Relative to other languages we target, this is
quite good actually. For example, Java's Hotspot JIT is driven by
profiling at its core so it's hard to distinguish the value profiling
alone brings. We have seen a nice boost on PHP running Wordpress using
PGO, but not as impressive as Python and Swift.

By the way, I think letting the compiler optimize the code is a good
strategy. Not the only strategy we want to use, but it seems like one we
could do more of.

> And thanks for working on this!  I have several more questions: What
> sorts of future changes in CPython's code might interfere with
> your optimizations?



We're also looking at other source-level optimizations, like the CGOTO
patch Vamsi submitted in June. Some of these may reduce the value of PGO,
but in general it's nice to let the compiler do some optimization for you.

> What future additions might stand to benefit?


It's a good question. Our intent is to continue to evaluate and measure
different training workloads for improvement. In other words, as with any
good open source project, this patch should improve things a lot and
should be accepted upstream, but we will continue to make it better.

> What changes in existing code might improve optimization opportunities?



We intend to continue to work on source-level optimizations and measuring
them against GUPB and Swift.

> What is the added maintenance burden of the optimizations on CPython,
> if any?



I think the answer is none. Our goal was to introduce performance
improvements without adding to maintenance effort.

> What is the performance impact on non-Intel architectures?  What
> about older Intel architectures?  ...and future ones?



We should modify the patch to make it for Intel only, since we're not
evaluating non-Intel architectures. Unfortunately for us, I suspect that
older Intel CPUs might benefit more than current and future ones. Future
architectures will benefit from other enabling work we're planning.

> What is Intel's commitment to supporting these (or other) optimizations
> in the future?  How is the practical EOL of the optimizations managed?



As with any corporation's budgeting process, it's hard to know exactly
what my managers will let me spend money on. :-) But we're definitely
convinced of the value of dynamic languages for servers and the need to
work on optimization. As far as I have visibility, it appears to be
holding true.

> Finally, +1 on adding an opt-in Makefile target rather than enabling
> the optimizations by default.



Frankly since Ubuntu has been running this way for past two years, I think
it's fine to make it opt-in, but eventually I hope it can be the default
once we're happy with it.

> Thanks again! -eric



Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-23 Thread Patrascu, Alecsandru
I removed the zip file and uploaded the patches individually. 

Alecsandru

From: Brett Cannon [mailto:br...@python.org] 
Sent: Sunday, August 23, 2015 4:47 AM
To: Patrascu, Alecsandru; python-dev@python.org
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default


On Sat, 22 Aug 2015 at 11:10 Patrascu, Alecsandru 
alecsandru.patra...@intel.com wrote:
I'm sorry, I forgot to mention this, I already opened an issue and the patches 
are uploaded [1].

[1] http://bugs.python.org/issue24915

Great, thanks Alecsandru. Do please follow Stefan's comment, though, and upload
the patch files directly and not as a zip file. That way we can use our code
review tool to do a proper review of the patches.

-Brett
 


From: Brett Cannon [mailto:br...@python.org]
Sent: Saturday, August 22, 2015 9:00 PM
To: Patrascu, Alecsandru; python-dev@python.org
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default

I just realized I didn't see anyone say it, but please upload the patches to 
bugs.Python.org for easier tracking and reviewing.

On Sat, Aug 22, 2015, 08:01 Patrascu, Alecsandru 
alecsandru.patra...@intel.com wrote:
Hi All,

This is Alecsandru from Server Scripting Languages Optimization team at Intel 
Corporation.

I would like to submit a request to turn on Profile Guided Optimization (PGO)
as the default build option for Python (both 2.7 and 3.6), given its
performance benefits on a wide variety of workloads and hardware.  For
instance, as shown in the attached sample performance results from the Grand
Unified Python Benchmark, a speedup of up to 20% was observed.  In addition, we
are seeing a 2-9% performance boost from OpenStack/Swift, where more than 60%
of the code is in Python 2.7. Our analysis indicates the performance gain was
mainly due to a reduction of icache misses and CPU front-end stalls.

The attached Makefile patches modify the all build target and add a new one
called disable-profile-opt. We built and tested this patch for Python 2.7 and
3.6 on our Linux machines (CentOS 7/Ubuntu Server 14.04, Intel Xeon
Haswell/Broadwell with 18/8 cores).  We use the regrtest suite for training as
it provides the best performance improvement.  Some of the test programs in
the suite may fail, which causes the build to fail.  One solution is to disable
the specific failing test using the -x flag (as shown in the patch).

Steps to apply the patch:
1.  hg clone https://hg.python.org/cpython cpython
2.  cd cpython
3.  hg update 2.7 (needed for 2.7 only)
4.  Copy *.patch to the current directory
5.  patch < python2.7-pgo.patch (or patch < python3.6-pgo.patch)
6.  ./configure
7.  make

To disable PGO
7b. make disable-profile-opt

In the following, please find our sample performance results from latest XEON 
machine, XEON Broadwell EP.
Hardware (HW):      Intel XEON (Broadwell) 8 Cores

BIOS settings:      Intel Turbo Boost Technology: false
                    Hyper-Threading: false

Operating System:   Ubuntu 14.04.3 LTS trusty

OS configuration:   CPU freq set at fixed: 2.6GHz by
                        echo 2600000 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
                        echo 2600000 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
                    Address Space Layout Randomization (ASLR) disabled (to
                    reduce run to run variation) by
                        echo 0 > /proc/sys/kernel/randomize_va_space

GCC version:        gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)

Benchmark:          Grand Unified Python Benchmark (GUPB)
                    GUPB Source: https://hg.python.org/benchmarks/
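
The OS configuration above could be scripted roughly as follows. This is only a sketch, assuming the standard Linux cpufreq sysfs files, which take frequencies in kHz (so 2.6 GHz is written as 2600000). The function names are invented for this example, and applying the values on a real machine requires root.

```python
import glob

ASLR_PATH = "/proc/sys/kernel/randomize_va_space"


def ghz_to_khz(ghz):
    """Convert GHz to the kHz integer that cpufreq sysfs files expect."""
    return int(round(ghz * 1_000_000))


def benchmark_settings(ghz, cpufreq_dirs):
    """Return (path, value) pairs for a fixed CPU frequency and disabled ASLR.

    Nothing is written here; the caller decides whether to apply the values.
    """
    khz = str(ghz_to_khz(ghz))
    settings = []
    for directory in cpufreq_dirs:
        settings.append((directory + "/scaling_min_freq", khz))
        settings.append((directory + "/scaling_max_freq", khz))
    settings.append((ASLR_PATH, "0"))
    return settings


def apply_settings(settings):
    """Write each value to its file (hypothetical helper; needs root)."""
    for path, value in settings:
        with open(path, "w") as handle:
            handle.write(value)


if __name__ == "__main__":
    # Dry run: print the equivalent shell commands instead of writing anything.
    dirs = sorted(glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq"))
    for path, value in benchmark_settings(2.6, dirs):
        print("echo {} > {}".format(value, path))
```

Keeping the computation of the settings separate from writing them makes it possible to review the planned changes before touching the machine.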

Python2.7 results:
    Python source: hg clone https://hg.python.org/cpython cpython
    Python Source: hg update 2.7
    hg id: 0511b1165bb6 (2.7)
    hg id -r 'ancestors(.) and tag()': 15c95b7d81dc (2.7) v2.7.10
    hg --debug id -i: 0511b1165bb6cf40ada0768a7efc7ba89316f6a5

        Benchmarks          Speedup(%)
        simple_logging      20
        raytrace            20
        silent_logging      19
        richards            19
        chaos               16
        formatted_logging   16
        json_dump           15
        hexiom2             13
        pidigits            12
        slowunpickle        12
        django_v2           12
        unpack_sequence     11
        float               11
        mako                11
        slowpickle          11
        fastpickle          11
        django              11
        go                  10
        json_dump_v2        10
        pathlib             10
        regex_compile       10
        pybench             9.9
        etree_process       9
        regex_v8            8
        bzr_startup         8
        2to3                8
        slowspitfire        8
        telco               8
        pickle_list         8
        fannkuch            8
        etree_iterparse     8
        nqueens             8
        mako_v2             8
        etree_generate      8
        call_method_slots   7

Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Brett Cannon
On Sat, 22 Aug 2015 at 11:10 Patrascu, Alecsandru 
alecsandru.patra...@intel.com wrote:

 I'm sorry, I forgot to mention this, I already opened an issue and the
 patches are uploaded [1].

 [1] http://bugs.python.org/issue24915


Great, thanks Alecsandru. Please do follow Stefan's comment, though, and
upload the patch files directly rather than as a zip file. That way we can use
our code review tool to do a proper review of the patches.

-Brett





Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Patrascu, Alecsandru
A trial period in which the provided patches are tested on numerous other
Python workloads is welcome, to make sure that they work as presented.

Yes, it is easy to switch to a different training set, or to a subset of
regrtest, by adding parameters to the line in the Makefile that runs it.
Currently, the attached patches run the full regrtest suite.

Alecsandru
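
As a rough illustration of how a different training set could be selected, the sketch below assembles a make invocation. It assumes the profile-opt target and the PROFILE_TASK variable used in Gregory P. Smith's command elsewhere in this thread, which is not necessarily the rule in the attached patches. The helper names are hypothetical.

```python
import shlex


def regrtest_task(excluded_tests=(), resources="all,-audio"):
    """Build the regrtest arguments to use as the PGO training task."""
    args = ["-m", "test.regrtest", "-w", "-u" + resources]
    if excluded_tests:
        # regrtest's -x treats the listed test names as tests to skip.
        args.append("-x")
        args.extend(excluded_tests)
    return " ".join(args)


def make_command(task):
    """Return the argument list for a PGO build with the given training task."""
    return ["make", "profile-opt", "PROFILE_TASK=" + task]


command = make_command(regrtest_task(["test_gdb", "test_multiprocessing"]))
# Print a copy-pasteable shell line; nothing is executed here.
print(" ".join(shlex.quote(arg) for arg in command))
```

Passing a different list of excluded tests, or replacing regrtest_task() with another task string, is all it takes to experiment with another training set under these assumptions.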

From: gvanros...@gmail.com [mailto:gvanros...@gmail.com] On Behalf Of Guido van 
Rossum
Sent: Saturday, August 22, 2015 7:56 PM
To: Patrascu, Alecsandru
Cc: python-dev@python.org
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default

I'm sorry, but we're just not going to turn this on by default without doing a 
trial period ourselves. Your (and Intel's) contribution is very welcome, but in 
order to establish trust in a feature like this, an optional trial period is 
absolutely required.

Regarding the training set, I agree that regrtest sounds to be better than 
pybench. If we make this an opt-in change, we can experiment with different 
training sets easily. (Also, I haven't seen the patch yet, but I presume it's 
easy to use a different training set? Experimentation should be encouraged.)

On Sat, Aug 22, 2015 at 9:40 AM, Patrascu, Alecsandru 
alecsandru.patra...@intel.com wrote:
Hello and thank you for your feedback.

We have also measured the PGO gain using other workloads. Our initial choice
of training set was pybench, but the speedup obtained was lower than with
regrtest and it didn't cover many Python scenarios. regrtest, by contrast,
spreads its tests more uniformly across Python's features, so the resulting
binary is much faster overall than the default build, or than one trained
using other workloads, and it covers a larger pool of Python workloads. This
optimization was also tested in production environments running OpenStack
Swift, where it gave improvements of up to 9%.

The reason we proposed making this target always on is that the resulting
optimized binary is better out of the box for the general case.

Alecsandru

From: gvanros...@gmail.com [mailto:gvanros...@gmail.com] On Behalf Of Guido van 
Rossum
Sent: Saturday, August 22, 2015 7:15 PM
To: Patrascu, Alecsandru
Cc: python-dev@python.org
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default

How about we first add a new Makefile target that enables PGO, without turning 
it on by default? Then later we can enable it by default.
Also, I have my doubts about regrtest. How sure are we that it represents a 
typical Python load? Tests are often using a different mix of operations than 
production code.


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Stefan Behnel
Guido van Rossum schrieb am 22.08.2015 um 18:55:
 Regarding the training set, I agree that regrtest sounds to be better than
 pybench. If we make this an opt-in change, we can experiment with different
 training sets easily. (Also, I haven't seen the patch yet, but I presume
 it's easy to use a different training set?

It's just one command in one line, yes.


 Experimentation should be encouraged.)

A well chosen training set can have a notable impact on PGO compiled code
in general, and switching from pybench to regrtests should make such a
difference. However, since CPython's overall performance is mostly
determined by the interpreter loop, general object operations (getattr!)
and the basic builtin types, of which the regression test suite makes
plenty of use, it is rather unlikely that other training sets would provide
substantially better performance for Python code execution.

Note also that Ubuntu has shipped PGO builds based on the regrtests for
years, and they seemed to be quite happy with it.

Stefan
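
For readers unfamiliar with how a training set enters a PGO build, here is a generic sketch of the three phases using GCC's -fprofile-generate and -fprofile-use options. It is not CPython's actual Makefile logic; the source file, binary name, and training arguments are placeholders.

```python
def pgo_phases(source, binary, training_args):
    """Return the three command lines of a basic GCC PGO cycle."""
    return [
        # 1. Instrumented build: the binary records execution counts when run.
        ["gcc", "-O2", "-fprofile-generate", source, "-o", binary],
        # 2. Training run: this workload decides what the profile contains.
        ["./" + binary] + list(training_args),
        # 3. Optimized rebuild: GCC reads the recorded profile (.gcda files).
        ["gcc", "-O2", "-fprofile-use", source, "-o", binary],
    ]


for step in pgo_phases("interp.c", "interp", ["training_workload.txt"]):
    print(" ".join(step))
```

Swapping pybench for regrtest only changes the second step; the instrumented build and the optimized rebuild stay the same.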




Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Eric Snow
On Aug 22, 2015 9:02 AM, Patrascu, Alecsandru 
alecsandru.patra...@intel.com wrote:
[snip]
 For instance, as shown from attached sample performance results from the
Grand Unified Python Benchmark, 20% speed up was observed.

Are you referring to the tests in the benchmarks repo? [1]

How does the real-world performance improvement compare with other
languages you are targeting for optimization?

And thanks for working on this!  I have several more questions:

What sorts of future changes in CPython's code might interfere with your
optimizations?

What future additions might stand to benefit?

What changes in existing code might improve optimization opportunities?

What is the added maintenance burden of the optimizations on CPython, if
any?

What is the performance impact on non-Intel architectures?  What about
older Intel architectures?  ...and future ones?

What is Intel's commitment to supporting these (or other) optimizations in
the future?  How is the practical EOL of the optimizations managed?

Finally, +1 on adding an opt-in Makefile target rather than enabling the
optimizations by default.

Thanks again!

-eric

[1] https://hg.python.org/benchmarks/


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Stefan Behnel
Stefan Behnel schrieb am 22.08.2015 um 19:25:
 Guido van Rossum schrieb am 22.08.2015 um 18:55:
 Regarding the training set, I agree that regrtest sounds to be better than
 pybench. If we make this an opt-in change, we can experiment with different
 training sets easily. (Also, I haven't seen the patch yet, but I presume
 it's easy to use a different training set?
 Experimentation should be encouraged.)
 
 A well chosen training set can have a notable impact on PGO compiled code
 in general, and switching from pybench to regrtests should make such a
 difference. However, since CPython's overall performance is mostly
 determined by the interpreter loop, general object operations (getattr!)
 and the basic builtin types, of which the regression test suite makes
 plenty of use, it is rather unlikely that other training sets would provide
 substantially better performance for Python code execution.

Note that this doesn't mean that it's a good workload for the C code in the
standard library (and I guess that's why Alecsandru initially excluded the
hashlib tests). Improvements on that front might still be possible. But
it's certainly a good workload for all the rest, i.e. for executing general
Python code.

Stefan




Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Patrascu, Alecsandru
Yes, the results are measured from running the benchmarks from the repo [1].

Furthermore, this optimization is generic and can handle any kind of change in
hardware or in the CPython 2/3 source code. We are not adding to or modifying
regrtest, and our rule will be applied to the latest tests in the CPython
repo. Since they are kept up to date and are easy to run, this proposal
ensures that users will always benefit from them.

[1] https://hg.python.org/benchmarks/

Alecsandru



Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Brett Cannon
On Sat, Aug 22, 2015, 09:58 Patrascu, Alecsandru 
alecsandru.patra...@intel.com wrote:


 This target replaces the existing one in the CPython Makefile, which currently
 uses a quick run of pybench, and the resulting binary does not perform well
 on general Python workloads. I don't think it is a good idea to add a
 by-default target that does PGO on dedicated workloads, like Django, because
 then it will perform better on that particular workload and poorly on others.


Sorry for not being clearer, but I was not suggesting that the default be
for Django, just asking whether it is worth making the Makefile easier to work
with when generating a PGO build for a custom workload. If we already have a
rule that uses pybench then it should definitely be changed to use regrtest (and
honestly pybench should not be used for benchmarking anything since it
doesn't reflect real-world usage in any way; it's just for quick checks
while doing development on the core of Python and otherwise shouldn't be
used to measure anything substantial).

 Of course, if any user has a dedicated workload for which he or she wants to
 get the best benefit from PGO, he or she will have to run that training
 separately from the proposed one. Our proposal targets the broader audience
 that uses Python in various scenarios, and they will see an overall
 improvement after compiling Python from source.

Right, but my question was whether there was any benefit to making the
Makefile rules generic to make building PGO binaries easier for people who
do want to do a custom profile, and it sounds like it isn't worth the effort.

So I'm with Guido where I'm happy to see the build rules added/updated to
use regrtest for a PGO build but have it be an opt-in flag and not on by
default (at least for now).

-Brett

Alecsandru

From: Brett Cannon [mailto:br...@python.org]
Sent: Saturday, August 22, 2015 7:25 PM
To: gu...@python.org; Patrascu, Alecsandru
Cc: python-dev@python.org
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default

On Sat, Aug 22, 2015, 09:17 Guido van Rossum gu...@python.org wrote:
 How about we first add a new Makefile target that enables PGO, without
 turning it on by default? Then later we can enable it by default.

I agree. Updating the Makefile so it's easier to use PGO is great, but we
should do a release with it as opt-in and go from there.

 Also, I have my doubts about regrtest. How sure are we that it represents a
 typical Python load? Tests are often using a different mix of operations
 than production code.

That was also my question. You said that it provides the best performance
improvement, but compared to what; what else was tried? And what
difference does it make to e.g. a Django app that is trained on their own
simulated workload compared to using regrtest? IOW is regrtest displaying
the best across-the-board performance because it stresses the largest swath
of Python and thus catches generic patterns in the code but individuals
could get better performance with a simulated workload?

-Brett


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Brett Cannon
I just realized I didn't see anyone say it, but please upload the patches
to bugs.Python.org for easier tracking and reviewing.

On Sat, Aug 22, 2015, 08:01 Patrascu, Alecsandru 
alecsandru.patra...@intel.com wrote:

 Hi All,

 This is Alecsandru from the Server Scripting Languages Optimization team at
 Intel Corporation.

 I would like to submit a request to turn on Profile Guided Optimization (PGO)
 as the default build option for Python (both 2.7 and 3.6), given its
 performance benefits on a wide variety of workloads and hardware.  For
 instance, as shown in the attached sample performance results from the Grand
 Unified Python Benchmark, a speedup of up to 20% was observed.  In addition, we
 are seeing a 2-9% performance boost from OpenStack/Swift, where more than 60% of
 the code is Python 2.7. Our analysis indicates the performance gain
 was mainly due to a reduction in icache misses and CPU front-end stalls.

 Attached are the Makefile patches that modify the all build target and add
 a new one called disable-profile-opt. We built and tested these patches for
 Python 2.7 and 3.6 on our Linux machines (CentOS 7/Ubuntu Server 14.04,
 Intel Xeon Haswell/Broadwell with 18/8 cores).  We use the regrtest suite for
 training as it provides the best performance improvement.  Some of the test
 programs in the suite may fail, which causes the build to fail.  One solution is
 to exclude the specific failing test using the -x flag (as shown in the
 patch).

 Steps to apply the patch:
 1.  hg clone https://hg.python.org/cpython cpython
 2.  cd cpython
 3.  hg update 2.7 (needed for 2.7 only)
 4.  Copy *.patch to the current directory
 5.  patch < python2.7-pgo.patch (or patch < python3.6-pgo.patch)
 6.  ./configure
 7.  make

 To disable PGO
 7b. make disable-profile-opt

 In the following, please find our sample performance results from latest
 XEON machine, XEON Broadwell EP.
 Hardware (HW):      Intel XEON (Broadwell) 8 Cores

 BIOS settings:      Intel Turbo Boost Technology: false
                     Hyper-Threading: false

 Operating System:   Ubuntu 14.04.3 LTS trusty

 OS configuration:   CPU freq set at fixed: 2.6GHz by
                         echo 2600000 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
                         echo 2600000 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
                     Address Space Layout Randomization (ASLR) disabled (to
                     reduce run-to-run variation) by
                         echo 0 > /proc/sys/kernel/randomize_va_space

 GCC version:        gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)

 Benchmark:          Grand Unified Python Benchmark (GUPB)
                     GUPB Source: https://hg.python.org/benchmarks/

 Python2.7 results:
     Python source: hg clone https://hg.python.org/cpython cpython
     Python Source: hg update 2.7
     hg id: 0511b1165bb6 (2.7)
     hg id -r 'ancestors(.) and tag()': 15c95b7d81dc (2.7) v2.7.10
     hg --debug id -i: 0511b1165bb6cf40ada0768a7efc7ba89316f6a5

         Benchmarks          Speedup(%)
         simple_logging      20
         raytrace            20
         silent_logging      19
         richards            19
         chaos               16
         formatted_logging   16
         json_dump           15
         hexiom2             13
         pidigits            12
         slowunpickle        12
         django_v2           12
         unpack_sequence     11
         float               11
         mako                11
         slowpickle          11
         fastpickle          11
         django              11
         go                  10
         json_dump_v2        10
         pathlib             10
         regex_compile       10
         pybench             9.9
         etree_process       9
         regex_v8            8
         bzr_startup         8
         2to3                8
         slowspitfire        8
         telco               8
         pickle_list         8
         fannkuch            8
         etree_iterparse     8
         nqueens             8
         mako_v2             8
         etree_generate      8
         call_method_slots   7
         html5lib_warmup     7
         html5lib            7
         nbody               7
         spectral_norm       7
         spambayes           7
         fastunpickle        6
         meteor_contest      6
         chameleon           6
         rietveld            6
         tornado_http        5
         unpickle_list       5
         pickle_dict         4
         regex_effbot        3
         normal_startup      3
         startup_nosite      3
         etree_parse         2
         call_method_unknown 2
         call_simple         1
         json_load           1
         call_method         1

 Python3.6 results
 Python source: hg clone https://hg.python.org/cpython cpython
 hg id: 96d016f78726 tip
 hg id -r 'ancestors(.) and tag()': 1a58b1227501 (3.5) v3.5.0rc1
 hg --debug id -i: 96d016f78726afbf66d396f084b291ea43792af1
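
To get a feel for the Python 2.7 table above as a whole, a few summary statistics can be computed as in the sketch below. The values are copied from the table as reported; the choice of statistics is ours, and an arithmetic mean of per-benchmark percentages is only a rough indicator.

```python
import statistics

# Speedup(%) values copied from the Python 2.7 table above, as name/value pairs.
TABLE = """
simple_logging 20 raytrace 20 silent_logging 19 richards 19
chaos 16 formatted_logging 16 json_dump 15 hexiom2 13
pidigits 12 slowunpickle 12 django_v2 12 unpack_sequence 11
float 11 mako 11 slowpickle 11 fastpickle 11
django 11 go 10 json_dump_v2 10 pathlib 10
regex_compile 10 pybench 9.9 etree_process 9 regex_v8 8
bzr_startup 8 2to3 8 slowspitfire 8 telco 8
pickle_list 8 fannkuch 8 etree_iterparse 8 nqueens 8
mako_v2 8 etree_generate 8 call_method_slots 7 html5lib_warmup 7
html5lib 7 nbody 7 spectral_norm 7 spambayes 7
fastunpickle 6 meteor_contest 6 chameleon 6 rietveld 6
tornado_http 5 unpickle_list 5 pickle_dict 4 regex_effbot 3
normal_startup 3 startup_nosite 3 etree_parse 2 call_method_unknown 2
call_simple 1 json_load 1 call_method 1
"""

# Tokens alternate between benchmark name and speedup value.
tokens = TABLE.split()
speedups = dict(zip(tokens[0::2], (float(v) for v in tokens[1::2])))

values = list(speedups.values())
print("benchmarks:", len(values))
print("min / median / max:", min(values), statistics.median(values), max(values))
print("mean:", round(statistics.mean(values), 2))
print("at least 10%:", sum(1 for v in values if v >= 10))
```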


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Patrascu, Alecsandru
Thank you, Stefan, for also pointing out the value of regrtest as a training
set for building Python. Indeed, Ubuntu already ships Python 2/3 binaries in
its repos that are optimized using PGO trained on regrtest.

Alecsandru 



Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Patrascu, Alecsandru
I'm sorry, I forgot to mention this, I already opened an issue and the patches 
are uploaded [1].

[1] http://bugs.python.org/issue24915

From: Brett Cannon [mailto:br...@python.org] 
Sent: Saturday, August 22, 2015 9:00 PM
To: Patrascu, Alecsandru; python-dev@python.org
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default

I just realized I didn't see anyone say it, but please upload the patches to 
bugs.Python.org for easier tracking and reviewing.

On Sat, Aug 22, 2015, 08:01 Patrascu, Alecsandru 
alecsandru.patra...@intel.com wrote:
Hi All,

This is Alecsandru from Server Scripting Languages Optimization team at Intel 
Corporation.

I would like to submit a request to turn-on Profile Guided Optimization or PGO 
as the default build option for Python (both 2.7 and 3.6), given its 
performance benefits on a wide variety of workloads and hardware.  For 
instance, as shown from attached sample performance results from the Grand 
Unified Python Benchmark, 20% speed up was observed.  In addition, we are 
seeing 2-9% performance boost from OpenStack/Swift where more than 60% of the 
codes are in Python 2.7. Our analysis indicates the performance gain was mainly 
due to reduction of icache misses and CPU front-end stalls.

Attached is the Makefile patches that modify the all build target and adds a 
new one called disable-profile-opt. We built and tested this patch for Python 
2.7 and 3.6 on our Linux machines (CentOS 7/Ubuntu Server 14.04, Intel Xeon 
Haswell/Broadwell with 18/8 cores).  We use regrtest suite for training as it 
provides the best performance improvement.  Some of the test programs in the 
suite may fail which leads to build fail.  One solution is to disable the 
specific failed test using the -x  flag (as shown in the patch)

Steps to apply the patch:
1.  hg clone https://hg.python.org/cpython cpython
2.  cd cpython
3.  hg update 2.7 (needed for 2.7 only)
4.  Copy *.patch to the current directory
5.  patch  python2.7-pgo.patch (or patch  python3.6-pgo.patch)
6.  ./configure
7.  make

To disable PGO
7b. make disable-profile-opt

In the following, please find our sample performance results from latest XEON 
machine, XEON Broadwell EP.
Hardware (HW):      Intel XEON (Broadwell) 8 Cores

BIOS settings:      Intel Turbo Boost Technology: false
                    Hyper-Threading: false

Operating System:   Ubuntu 14.04.3 LTS trusty

OS configuration:   CPU freq set at fixed: 2.6GHz by
                        echo 260  
/sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
                        echo 260  
/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
                    Address Space Layout Randomization (ASLR) disabled (to 
reduce run to run variation) by
                        echo 0  /proc/sys/kernel/randomize_va_space

GCC version:        gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)

Benchmark:          Grand Unified Python Benchmark (GUPB)
                    GUPB Source: https://hg.python.org/benchmarks/

Python2.7 results:
    Python source: hg clone https://hg.python.org/cpython cpython
    Python Source: hg update 2.7
    hg id: 0511b1165bb6 (2.7)
    hg id -r 'ancestors(.) and tag()': 15c95b7d81dc (2.7) v2.7.10
    hg --debug id -i: 0511b1165bb6cf40ada0768a7efc7ba89316f6a5

        Benchmarks          Speedup(%)
        simple_logging      20
        raytrace            20
        silent_logging      19
        richards            19
        chaos               16
        formatted_logging   16
        json_dump           15
        hexiom2             13
        pidigits            12
        slowunpickle        12
        django_v2           12
        unpack_sequence     11
        float               11
        mako                11
        slowpickle          11
        fastpickle          11
        django              11
        go                  10
        json_dump_v2        10
        pathlib             10
        regex_compile       10
        pybench             9.9
        etree_process       9
        regex_v8            8
        bzr_startup         8
        2to3                8
        slowspitfire        8
        telco               8
        pickle_list         8
        fannkuch            8
        etree_iterparse     8
        nqueens             8
        mako_v2             8
        etree_generate      8
        call_method_slots   7
        html5lib_warmup     7
        html5lib            7
        nbody               7
        spectral_norm       7
        spambayes           7
        fastunpickle        6
        meteor_contest      6
        chameleon           6
        rietveld            6
        tornado_http        5
        unpickle_list       5
        pickle_dict         4
        regex_effbot        3
        normal_startup      3
        startup_nosite      3
        etree_parse         2
        call_method_unknown 2
        call_simple         1
        json_load           1
        call_method         1
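
A table in this two-column layout can be summarized with a few lines of
Python. The sketch below is illustrative only: parse_speedups and summarize
are hypothetical helper names, SAMPLE copies just four rows from the table
above rather than the full result set, and a plain mean of percentages is a
rough descriptive figure, not a measure of overall performance.

```python
import statistics

# Four rows copied from the table above, plus its header line.
SAMPLE = """\
Benchmarks          Speedup(%)
simple_logging      20
raytrace            20
pybench             9.9
json_load           1
"""


def parse_speedups(text):
    """Return {benchmark: speedup_percent} for well-formed rows.

    Lines that do not consist of a name and a number (the header,
    blank lines, rows with a missing value) are skipped.
    """
    results = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        name, value = fields
        try:
            results[name] = float(value)
        except ValueError:
            continue  # e.g. the "Benchmarks  Speedup(%)" header
    return results


def summarize(speedups):
    """Basic descriptive statistics over the parsed speedups."""
    values = list(speedups.values())
    return {
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }
```

Calling summarize(parse_speedups(table_text)) on the full table text gives
the spread across all benchmarks instead of only the headline maximum.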

Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Guido van Rossum
How about we first add a new Makefile target that enables PGO, without
turning it on by default? Then later we can enable it by default.

Also, I have my doubts about regrtest. How sure are we that it represents a
typical Python load? Tests are often using a different mix of operations
than production code.

On Sat, Aug 22, 2015 at 7:46 AM, Patrascu, Alecsandru 
alecsandru.patra...@intel.com wrote:

 Hi All,

 This is Alecsandru from Server Scripting Languages Optimization team at
 Intel Corporation.

 I would like to submit a request to turn on Profile Guided Optimization
 (PGO) as the default build option for Python (both 2.7 and 3.6), given its
 performance benefits on a wide variety of workloads and hardware.  For
 instance, as shown in the attached sample performance results from the Grand
 Unified Python Benchmark, a speedup of up to 20% was observed.  In addition,
 we are seeing a 2-9% performance boost on OpenStack/Swift, where more than
 60% of the code is in Python 2.7. Our analysis indicates the performance
 gain was mainly due to a reduction in icache misses and CPU front-end stalls.


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Brett Cannon
On Sat, Aug 22, 2015, 09:17 Guido van Rossum gu...@python.org wrote:

> How about we first add a new Makefile target that enables PGO, without
> turning it on by default? Then later we can enable it by default.

I agree. Updating the Makefile so it's easier to use PGO is great, but we
should do a release with it as opt-in and go from there.

> Also, I have my doubts about regrtest. How sure are we that it represents a
> typical Python load? Tests are often using a different mix of operations
> than production code.

That was also my question. You said that it provides the best performance
improvement, but compared to what; what else was tried? And what
difference does it make to e.g. a Django app that is trained on their own
simulated workload compared to using regrtest? IOW is regrtest displaying
the best across-the-board performance because it stresses the largest swath
of Python and thus catches generic patterns in the code but individuals
could get better performance with a simulated workload?

-Brett



Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Patrascu, Alecsandru
Hello and thank you for your feedback.

We have also measured the PGO gain using other workloads. Our initial choice 
for this optimization was pybench, but the speedup obtained was lower than 
with regrtest, and pybench didn't cover a lot of Python scenarios. regrtest, 
in contrast, has a uniform distribution of tests, so it covers a larger pool 
of Python loads, and the resulting binary is overall much faster than the 
default build or one trained using other workloads. This optimization was 
also tested in a production environment running OpenStack Swift, where we got 
up to 9% improvement.

The reason we proposed making this target always on is that the resulting 
optimized binary is better out of the box for the general case.

Alecsandru 

From: gvanros...@gmail.com [mailto:gvanros...@gmail.com] On Behalf Of Guido van 
Rossum
Sent: Saturday, August 22, 2015 7:15 PM
To: Patrascu, Alecsandru
Cc: python-dev@python.org
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default

How about we first add a new Makefile target that enables PGO, without turning 
it on by default? Then later we can enable it by default.
Also, I have my doubts about regrtest. How sure are we that it represents a 
typical Python load? Tests are often using a different mix of operations than 
production code.


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Guido van Rossum
I'm sorry, but we're just not going to turn this on by default without
doing a trial period ourselves. Your (and Intel's) contribution is very
welcome, but in order to establish trust in a feature like this, an
optional trial period is absolutely required.

Regarding the training set, I agree that regrtest sounds better than
pybench. If we make this an opt-in change, we can experiment with different
training sets easily. (Also, I haven't seen the patch yet, but I presume
it's easy to use a different training set? Experimentation should be
encouraged.)
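
One way to experiment with training sets, assuming the existing profile-opt
target and its PROFILE_TASK variable (used elsewhere in this thread as
"make profile-opt PROFILE_TASK=-m test.regrtest ..."), is to generate the make
invocation from a list of interpreter arguments. The sketch below is
hypothetical and independent of the patch under discussion: the function name
profile_opt_command is made up for illustration, and it only builds the
argument list without running anything.

```python
import shlex


def profile_opt_command(training_args, excluded_tests=()):
    """Build an argv list for a PGO build with a chosen training workload.

    training_args:   arguments passed to the freshly built interpreter,
                     e.g. ["-m", "test.regrtest"].
    excluded_tests:  regrtest test names to skip via -x (only meaningful
                     when the training workload is regrtest).
    """
    task = list(training_args)
    if excluded_tests:
        task += ["-x", *excluded_tests]
    return ["make", "profile-opt", "PROFILE_TASK=" + " ".join(task)]


# Example: regrtest as the training set, skipping two unreliable tests.
command = profile_opt_command(
    ["-m", "test.regrtest"],
    excluded_tests=["test_gdb", "test_multiprocessing"],
)
print(shlex.join(command))  # shell-quoted form, for copy and paste
```

The resulting list can be handed to subprocess.run(command, check=True) from
the top of a configured source tree; swapping training_args for a different
script is then a one-line change.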

On Sat, Aug 22, 2015 at 9:40 AM, Patrascu, Alecsandru 
alecsandru.patra...@intel.com wrote:

 Hello and thank you for your feedback.

 We have also measured the PGO gain using other workloads. Our initial choice
 for this optimization was pybench, but the speedup obtained was lower than
 with regrtest, and pybench didn't cover a lot of Python scenarios. regrtest,
 in contrast, has a uniform distribution of tests, so it covers a larger pool
 of Python loads, and the resulting binary is overall much faster than the
 default build or one trained using other workloads. This optimization was
 also tested in a production environment running OpenStack Swift, where we
 got up to 9% improvement.

 The reason we proposed making this target always on is that the resulting
 optimized binary is better out of the box for the general case.

 Alecsandru


Re: [Python-Dev] Profile Guided Optimization active by-default

2015-08-22 Thread Patrascu, Alecsandru

This target replaces the existing one in the CPython Makefile, which currently 
uses a quick run of pybench; the resulting binary does not perform well on 
general Python loads. I don't think it is a good idea to add a by-default 
target that does PGO on dedicated workloads, like Django, because the binary 
will then perform better on that particular load and poorly on others. 

Of course, users who have a dedicated workload for which they want to get the 
best benefit from PGO will have to run that training separately from the 
proposed one. Our proposal targets the broader audience that uses Python in 
various scenarios, who will see an overall improvement after compiling Python 
from source.

Alecsandru

From: Brett Cannon [mailto:br...@python.org] 
Sent: Saturday, August 22, 2015 7:25 PM
To: gu...@python.org; Patrascu, Alecsandru
Cc: python-dev@python.org
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default


On Sat, Aug 22, 2015, 09:17 Guido van Rossum gu...@python.org wrote:
How about we first add a new Makefile target that enables PGO, without turning 
it on by default? Then later we can enable it by default.

I agree. Updating the Makefile so it's easier to use PGO is great, but we 
should do a release with it as opt-in and go from there.
Also, I have my doubts about regrtest. How sure are we that it represents a 
typical Python load? Tests are often using a different mix of operations than 
production code.
That was also my question. You said that it provides the best performance 
improvement, but compared to what; what else was tried? And what difference 
does it make to e.g. a Django app that is trained on their own simulated 
workload compared to using regrtest? IOW is regrtest displaying the best 
across-the-board performance because it stresses the largest swath of Python 
and thus catches generic patterns in the code but individuals could get better 
performance with a simulated workload?
-Brett
