Hi Matthias and Gregory,

The results shown are from Python 2.7.10 built with gcc. Our team's goal is to
make long-term open source contributions, with an emphasis on performance
optimization and support for the larger community, so we did not use icc.

About a month ago we experimented with gcc profile-guided optimization (PGO)
and link-time optimization (LTO). Because PGO is an independent, orthogonal
optimization, it improves both the stock version (i.e. the current
switch-based dispatch) and the computed-goto version. We ran PGO-optimized
Python on the workloads available at the language benchmarks game
(http://benchmarksgame.alioth.debian.org/u64/python.php) and found that PGO
benefits the computed-goto version more than the stock version. I haven't yet
run PGO-optimized Python on the "grand unified python benchmarks" (GUPB)
suite. Please give me a day or two and I will get back to you with PGO (and
LTO) numbers as well. (LTO hasn't shown much benefit so far on the language
benchmarks game workloads.)
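
For anyone following the thread who hasn't looked at the two dispatch styles
side by side, here is a minimal sketch in C. It is a toy stack machine with
made-up opcodes (OP_PUSH, OP_ADD, OP_MUL, OP_HALT) and hypothetical function
names (run_switch, run_goto). It is not CPython's ceval.c and not the attached
patch. The computed-goto variant relies on the "labels as values" extension,
so it assumes gcc or a compatible compiler such as clang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes for illustration only (not CPython's). */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 16

/* Switch-based dispatch: every opcode goes through one shared
 * switch at the top of the loop. */
long run_switch(const int *code)
{
    long stack[STACK_MAX];
    size_t sp = 0;

    for (;;) {
        switch (*code++) {
        case OP_PUSH:
            assert(sp < STACK_MAX);
            stack[sp++] = *code++;      /* operand follows the opcode */
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return -1;
        }
    }
}

/* Computed-goto dispatch: each handler ends by jumping directly to
 * the next handler through a table of label addresses (&&label is a
 * GNU C extension). The table order must match the enum above. */
long run_goto(const int *code)
{
    static void *targets[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    long stack[STACK_MAX];
    size_t sp = 0;

#define DISPATCH() goto *targets[*code++]

    DISPATCH();

do_push:
    assert(sp < STACK_MAX);
    stack[sp++] = *code++;
    DISPATCH();
do_add:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();
do_mul:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] *= stack[sp];
    DISPATCH();
do_halt:
    assert(sp >= 1);
    return stack[sp - 1];

#undef DISPATCH
}
```

Both functions compute the same result for the same bytecode. For example,
the program { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT }
evaluates (2 + 3) * 4 in either one. The only difference is the branch
structure: one shared indirect jump in the switch version versus one indirect
jump per handler in the computed-goto version.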

Also, in our analysis using CPU performance counters, we found that Python
workloads in general have a high rate of CPU front-end issues (mainly I-cache
misses), and PGO is very helpful in mitigating them. We're also investigating
ways to further reduce those front-end issues and speed up Python workloads.

Thanks,
Vamsi

-----Original Message-----
From: Matthias Klose [mailto:d...@ubuntu.com] 
Sent: Thursday, May 28, 2015 5:01 AM
To: Parasa, Srinivas Vamsi; 'python-dev@python.org'
Subject: Re: [Python-Dev] Computed Goto dispatch for Python 2

On 05/28/2015 02:17 AM, Parasa, Srinivas Vamsi wrote:
> Hi All,
> 
> This is Vamsi from the Server Scripting Languages Optimization team at Intel
> Corporation.
> 
> We would like to submit a request to enable computed goto based dispatch in
> Python 2.x (it is already enabled by default in Python 3, given its
> performance benefits on a wide range of workloads). We talked about this
> patch with Guido and he encouraged us to submit a request on Python-dev
> (email conversation with Guido shown at the bottom of this email).
> 
> Attached is the computed goto patch (along with instructions to run) for
> Python 2.7.10 (based on the patch submitted by Jeffrey Yasskin at
> http://bugs.python.org/issue4753). We built and tested this patch for Python
> 2.7.10 on a Linux machine (Ubuntu 14.04 LTS server, Intel Xeon - Haswell EP
> CPU with 18 cores, hyper-threading off, turbo off).
> 
> Below is a summary of the performance we saw on the "grand unified python 
> benchmarks" suite (available at https://hg.python.org/benchmarks/). We made 3 
> rigorous runs of the following benchmarks. In each rigorous run, a benchmark 
> is run 100 times with and without the computed goto patch. Below we show the 
> average performance boost for the 3 rigorous runs.
> 
> Python 2.7.10 (original) vs Computed Goto performance Benchmark

-1

As Gregory pointed out, there are other options to build the interpreter, and
we are missing data on how these compare with your patch.

I assume you tested with the Intel compiler, so it would be good to see
results for other compilers as well (GCC, clang). Could you please provide the
data for LTO and profile-guided optimized builds (maybe combined too)? I'm
happy to work with you on setting up these builds, but currently don't have the
machine resources to do so myself.

If the benefits show up for these configurations too, then I'm +/-0 on this 
patch.

Matthias

