A trial period in which the provided patches are tested on numerous other 
Python workloads is welcome, to confirm that they work as presented.

Yes, it is easy to switch to a different training set, or to subsets of 
regrtest, by adding parameters to the Makefile line that runs it. For now, the 
attached patches run the full regrtest suite.

Alecsandru

From: gvanros...@gmail.com [mailto:gvanros...@gmail.com] On Behalf Of Guido van 
Rossum
Sent: Saturday, August 22, 2015 7:56 PM
To: Patrascu, Alecsandru
Cc: python-dev@python.org
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default

I'm sorry, but we're just not going to turn this on by default without doing a 
trial period ourselves. Your (and Intel's) contribution is very welcome, but in 
order to establish trust in a feature like this, an optional trial period is 
absolutely required.

Regarding the training set, I agree that regrtest sounds better than 
pybench. If we make this an opt-in change, we can experiment with different 
training sets easily. (Also, I haven't seen the patch yet, but I presume it's 
easy to use a different training set? Experimentation should be encouraged.)

On Sat, Aug 22, 2015 at 9:40 AM, Patrascu, Alecsandru 
<alecsandru.patra...@intel.com> wrote:
Hello and thank you for your feedback.

We have also measured the PGO gain using other workloads. Our initial choice 
for this optimization was pybench, but the speedup obtained was lower than with 
regrtest and it didn't cover many Python scenarios. regrtest, by contrast, has 
a uniform distribution of tests, and the resulting binary is overall much 
faster than the default build or one trained on other workloads, so it covers a 
larger pool of Python workloads. This optimization was also tested in a 
production environment running OpenStack Swift, where it gave improvements of 
up to 9%.

We proposed making this target always on because the resulting optimized 
binary is better out of the box in the general case.

Alecsandru

From: gvanros...@gmail.com [mailto:gvanros...@gmail.com] On Behalf Of Guido van 
Rossum
Sent: Saturday, August 22, 2015 7:15 PM
To: Patrascu, Alecsandru
Cc: python-dev@python.org
Subject: Re: [Python-Dev] Profile Guided Optimization active by-default

How about we first add a new Makefile target that enables PGO, without turning 
it on by default? Then later we can enable it by default.
Also, I have my doubts about regrtest. How sure are we that it represents a 
typical Python load? Tests often use a different mix of operations than 
production code.

On Sat, Aug 22, 2015 at 7:46 AM, Patrascu, Alecsandru 
<alecsandru.patra...@intel.com> wrote:
Hi All,

This is Alecsandru from the Server Scripting Languages Optimization team at 
Intel Corporation.

I would like to submit a request to turn on Profile Guided Optimization (PGO) 
as the default build option for Python (both 2.7 and 3.6), given its 
performance benefits on a wide variety of workloads and hardware. For 
instance, as shown in the attached sample performance results from the Grand 
Unified Python Benchmark, speedups of more than 20% were observed. In addition, 
we are seeing a 2-9% performance boost on OpenStack Swift, where more than 60% 
of the code is Python 2.7. Our analysis indicates that the gain comes mainly 
from fewer icache misses and CPU front-end stalls.

Attached are the Makefile patches, which modify the "all" build target and add 
a new one called "disable-profile-opt". We built and tested these patches for 
Python 2.7 and 3.6 on our Linux machines (CentOS 7/Ubuntu Server 14.04, Intel 
Xeon Haswell/Broadwell with 18/8 cores). We use the "regrtest" suite for 
training because it provides the best performance improvement. Some of the test 
programs in the suite may fail, which causes the build to fail. One solution is 
to disable the specific failing test using the "-x" flag (as shown in the 
patch).
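
As an illustration of how a training run could exclude failing tests, here is a 
minimal Python sketch. It only builds the command line. The interpreter path 
"./python", the module name "test.regrtest" and the test names are assumptions 
made for illustration; they are not taken from the patch.

```python
def regrtest_command(python="./python", module="test.regrtest", exclude=()):
    """Build the argv for a regrtest training run, skipping `exclude` tests.

    `python` and `module` are assumed defaults, not values from the patch.
    """
    argv = [python, "-m", module]
    if exclude:
        # With -x, regrtest treats the positional arguments as tests to
        # exclude rather than tests to run.
        argv.append("-x")
        argv.extend(exclude)
    return argv


if __name__ == "__main__":
    # test_example_a and test_example_b are placeholder test names.
    print(" ".join(regrtest_command(exclude=["test_example_a", "test_example_b"])))
```

Keeping the excluded tests in a single list makes it easy to experiment with 
different training subsets without editing the rest of the build line.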

Steps to apply the patch:
1.  hg clone https://hg.python.org/cpython cpython
2.  cd cpython
3.  hg update 2.7 (needed for 2.7 only)
4.  Copy *.patch to the current directory
5.  patch < python2.7-pgo.patch (or patch < python3.6-pgo.patch)
6.  ./configure
7.  make

To disable PGO
7b. make disable-profile-opt
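
The steps above could also be scripted. The sketch below is one hedged way to 
do it, assuming the repository is already cloned into "cpython" and the patch 
file has been copied there (steps 1, 2 and 4). The helper names 
pgo_build_commands and run_all are hypothetical.

```python
import shlex
import subprocess


def pgo_build_commands(patch_file, branch=None, disable_pgo=False):
    """Return the shell commands for steps 3 and 5-7 above, in order."""
    commands = []
    if branch is not None:
        # Step 3 is only needed for the 2.7 branch.
        commands.append(f"hg update {shlex.quote(branch)}")
    # Step 5: apply the patch by feeding it to patch(1) on stdin.
    commands.append(f"patch < {shlex.quote(patch_file)}")
    commands.append("./configure")
    # Step 7, or 7b when PGO should be switched off.
    commands.append("make disable-profile-opt" if disable_pgo else "make")
    return commands


def run_all(commands, cwd="cpython"):
    """Run each command inside the checkout, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, shell=True, cwd=cwd, check=True)
```

For example, run_all(pgo_build_commands("python2.7-pgo.patch", branch="2.7")) 
would perform the 2.7 build, and passing disable_pgo=True swaps the final step 
for the "disable-profile-opt" target.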

Below are our sample performance results from the latest Xeon machine, a Xeon 
Broadwell EP.
Hardware (HW):      Intel XEON (Broadwell) 8 Cores

BIOS settings:      Intel Turbo Boost Technology: false
                    Hyper-Threading: false

Operating System:   Ubuntu 14.04.3 LTS trusty

OS configuration:   CPU frequency fixed at 2.6GHz by
                        echo 2600000 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
                        echo 2600000 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
                    Address Space Layout Randomization (ASLR) disabled (to reduce run to run variation) by
                        echo 0 > /proc/sys/kernel/randomize_va_space
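
The frequency pinning above can be sketched in Python as well. Because a shell 
redirect targets a single file, the sketch writes each per-CPU file separately. 
It assumes a Linux cpufreq sysfs layout and root privileges; pin_cpu_frequency 
and its cpu_root parameter are hypothetical names.

```python
import glob
import os


def pin_cpu_frequency(khz, cpu_root="/sys/devices/system/cpu"):
    """Write `khz` to scaling_min_freq and scaling_max_freq of every CPU.

    Returns the list of files that were written.
    """
    written = []
    for name in ("scaling_min_freq", "scaling_max_freq"):
        # cpu[0-9]* matches cpu0, cpu1, ... but not sibling directories
        # such as cpufreq or cpuidle.
        pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", name)
        for path in sorted(glob.glob(pattern)):
            with open(path, "w") as f:
                f.write(f"{khz}\n")
            written.append(path)
    return written
```

Calling pin_cpu_frequency(2600000) as root would correspond to the two echo 
commands above. The cpu_root parameter exists so the function can be tried 
against a scratch directory first.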

GCC version:        gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)

Benchmark:          Grand Unified Python Benchmark (GUPB)
                    GUPB Source: https://hg.python.org/benchmarks/

Python2.7 results:
    Python source: hg clone https://hg.python.org/cpython cpython
    Python Source: hg update 2.7
    hg id: 0511b1165bb6 (2.7)
    hg id -r 'ancestors(.) and tag()': 15c95b7d81dc (2.7) v2.7.10
    hg --debug id -i: 0511b1165bb6cf40ada0768a7efc7ba89316f6a5

        Benchmarks          Speedup(%)
        simple_logging      20
        raytrace            20
        silent_logging      19
        richards            19
        chaos               16
        formatted_logging   16
        json_dump           15
        hexiom2             13
        pidigits            12
        slowunpickle        12
        django_v2           12
        unpack_sequence     11
        float               11
        mako                11
        slowpickle          11
        fastpickle          11
        django              11
        go                  10
        json_dump_v2        10
        pathlib             10
        regex_compile       10
        pybench             9.9
        etree_process       9
        regex_v8            8
        bzr_startup         8
        2to3                8
        slowspitfire        8
        telco               8
        pickle_list         8
        fannkuch            8
        etree_iterparse     8
        nqueens             8
        mako_v2             8
        etree_generate      8
        call_method_slots   7
        html5lib_warmup     7
        html5lib            7
        nbody               7
        spectral_norm       7
        spambayes           7
        fastunpickle        6
        meteor_contest      6
        chameleon           6
        rietveld            6
        tornado_http        5
        unpickle_list       5
        pickle_dict         4
        regex_effbot        3
        normal_startup      3
        startup_nosite      3
        etree_parse         2
        call_method_unknown 2
        call_simple         1
        json_load           1
        call_method         1

Python3.6 results:
    Python source: hg clone https://hg.python.org/cpython cpython
    hg id: 96d016f78726 tip
    hg id -r 'ancestors(.) and tag()': 1a58b1227501 (3.5) v3.5.0rc1
    hg --debug id -i: 96d016f78726afbf66d396f084b291ea43792af1


        Benchmark           Speedup(%)
        fastunpickle        22.94
        fastpickle          21.67
        json_load           17.64
        simple_logging      17.49
        meteor_contest      16.67
        formatted_logging   15.33
        etree_process       14.61
        raytrace            13.57
        etree_generate      13.56
        chaos               12.09
        hexiom2             12
        nbody               11.88
        json_dump_v2        11.24
        richards            11.02
        nqueens             10.96
        fannkuch            10.79
        go                  10.77
        float               10.26
        regex_compile       9.8
        silent_logging      9.63
        pidigits            9.58
        etree_iterparse     9.48
        2to3                8.44
        regex_v8            8.09
        regex_effbot        7.88
        call_simple         7.63
        tornado_http        7.38
        etree_parse         4.92
        spectral_norm       4.72
        normal_startup      4.39
        telco               3.88
        startup_nosite      3.7
        call_method         3.63
        unpack_sequence     3.6
        call_method_slots   2.91
        call_method_unknown 2.59
        iterative_count     0.45
        threaded_count      -2.79
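
To summarize tables like the ones above, a small parser is enough. The 
following is a sketch only: the function names are hypothetical, and the 
excerpt reuses four rows from the Python 3.6 table rather than the full data.

```python
import statistics


def parse_speedups(table_text):
    """Parse 'name value' rows into a {benchmark: speedup_percent} dict."""
    results = {}
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        name, value = fields
        try:
            results[name] = float(value)
        except ValueError:
            continue  # skip the header row
    return results


def summarize(results):
    """Return count, min, median, max and the benchmarks that got slower."""
    values = list(results.values())
    return {
        "count": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
        "regressions": sorted(n for n, v in results.items() if v < 0),
    }


# Four rows copied from the Python 3.6 table above.
EXCERPT = """
Benchmark           Speedup(%)
fastunpickle        22.94
hexiom2             12
iterative_count     0.45
threaded_count      -2.79
"""

if __name__ == "__main__":
    print(summarize(parse_speedups(EXCERPT)))
```

Listing regressions separately keeps a slowdown such as threaded_count visible 
instead of letting it disappear into an average.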


Thank you,
Alecsandru

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/guido%40python.org



--
--Guido van Rossum (python.org/~guido)



