[issue41181] [macOS] Build macOS installer with LTO and PGO optimizations

2020-11-01 Thread STINNER Victor


STINNER Victor  added the comment:

> Link Time Optimization (LTO) and Profile-Guided Optimization (PGO) have a 
> major impact on Python performance: they make Python between 10% and 30% 
> faster (coarse estimation).
>
> Currently, macOS installers distributed on python.org are built with Clang 
> 6.0 without LTO or PGO. I propose to enable LTO and PGO to make these 
> binaries faster.

Oh, I forgot to mention that I discovered that macOS doesn't use LTO when I 
worked on the https://bugs.python.org/issue39542#msg373230 issue.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41181] [macOS] Build macOS installer with LTO and PGO optimizations

2020-07-01 Thread Ned Deily


Ned Deily  added the comment:

er, "macOS 11.0 Big Sur" :)

--




[issue41181] [macOS] Build macOS installer with LTO and PGO optimizations

2020-07-01 Thread Ned Deily


Ned Deily  added the comment:

I should have made it clearer that we expect to release a new installer variant 
for macOS 11.6 Big Sur that supports both Intel and Apple Silicon architectures 
later this year (i.e. in several months) when Big Sur releases. It will be much 
easier to support newer optimizations in that variant.  We are in the process 
right now of getting builds to work on the developer previews and on developer 
hardware. We will look at optimizations for that variant then.

Please drop the idea of trying to change how we build on 10.9 (and, yes, we are 
perfectly capable of finding newer compilers to run on 10.9 but that's not the 
point - we *only* support building installers with standard Apple Developer 
Tool chains and with good reason); hacking on 10.9 is not worth it at this 
point.

--




[issue41181] [macOS] Build macOS installer with LTO and PGO optimizations

2020-07-01 Thread STINNER Victor


STINNER Victor  added the comment:

If clang 6.0 is a dead end for LTO, another option is to build a recent clang 
version on macOS 10.9. If I manage to do that, would it sound like an 
acceptable solution? I don't expect any API/ABI issue just by changing the 
clang version. Upgrading clang should not change the semantics.

--




[issue41181] [macOS] Build macOS installer with LTO and PGO optimizations

2020-07-01 Thread STINNER Victor


STINNER Victor  added the comment:

>> Clang 6.0 doesn't support LTO and PGO?
> No, it appears not.

That's really surprising. I see LTO mentioned in LLVM 3.4 changelog for example:
https://releases.llvm.org/3.4/tools/clang/docs/ReleaseNotes.html#new-compiler-flags

Did you try to build Python with my PR? Which error message do you get? How can 
I try? I only own a macbook which runs a recent macOS version. Maybe I could 
try to get clang 6.0 on Linux.

If PGO is not available, just enabling LTO should already make Python 
significantly faster.

I understand why Python is built on macOS 10.9, and this issue and my PR 
don't change anything about that. I am not asking to require newer CPU 
features or newer macOS APIs or syscalls. LTO only changes how Python 
itself is built.

--




[issue41181] [macOS] Build macOS installer with LTO and PGO optimizations

2020-07-01 Thread Ned Deily

Ned Deily  added the comment:

> Clang 6.0 doesn't support LTO and PGO?

No, it appears not.  And it's not an oversight that we don't use these 
options.

As Łukasz points out, the current macOS installer variants we supply are 
designed to run on all Mac systems from macOS 10.9 on. To accomplish that 
safely, we build the Python binaries on a macOS 10.9 system to ensure they will 
be compatible; in other words, we build on the oldest system supported and rely 
on upward compatibility when running on newer systems. The other approach is to 
build on the newest systems available after adding runtime checks throughout 
the C code to test for the presence of newer features (i.e. runtime calls that 
have been added in an operating system release newer than the oldest one 
supported).  While this ("weaklinking") can be a viable option, it's a lot more 
work to implement initially and then keep updated over each o/s release to 
avoid segfaults and other failures when users on older systems try to use newer 
features.  Eventually we would like to fully support weaklinking so that we 
could provide one installer variant for all supported o/s versions that has all 
features available at each o/s version, but it's not a high priority item at 
the moment (for example, supporting the upcoming 11.0 Big Sur with Apple 
Silicon is) and the current practices have worked well for many years.
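
To illustrate the kind of runtime check that weaklinking implies, here is a 
minimal sketch in Python rather than in CPython's C code. It only shows the 
general idea of testing whether a symbol exists before calling it. The helper 
name has_symbol() is hypothetical, and the sketch assumes a POSIX system where 
ctypes.CDLL(None) exposes the symbols already loaded into the process.

```python
import ctypes


def has_symbol(name):
    """Return True if the running process exports a C symbol called `name`.

    Illustrative helper (not part of CPython): it mimics the
    "check before use" pattern that weak linking requires in C code.
    """
    # CDLL(None) opens the current process image on POSIX systems.
    process = ctypes.CDLL(None)
    # ctypes raises AttributeError when the symbol lookup fails,
    # so hasattr() turns that into a plain boolean.
    return hasattr(process, name)


if __name__ == "__main__":
    for symbol in ("getpid", "no_such_symbol_41181"):
        state = "available" if has_symbol(symbol) else "missing"
        print(f"{symbol}: {state}")
```

In C, the equivalent check has to guard every call to a newer API, which is 
why the approach costs more to implement and maintain.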

Keep in mind that the main goal of the python.org macOS installers is to 
provide a single installable binary that works correctly on a wide range of 
macOS releases and hardware.  What we provide today works on all Macs capable 
of running macOS 10.9 or later.  In particular, it is *not* a goal to provide 
the most optimized configuration for a particular system.  In general, 
considering the range of hardware and operating system releases, that's not 
easy to do. I 
believe that the intended users for the python.org macOS pythons are (1) 
beginners (like in a teaching environment where ease of deployment and 
uniformity is key) and (2) third-party Mac applications developers who want an 
embeddable Python that will allow their applications to work on multiple levels 
of macOS. If you are looking for the highest performance for a particular use, 
like benchmarking, you should look elsewhere - like one of the third-party 
distributors who specialize in numeric Pythons - or build it yourself on your 
own system.

So, thanks for the suggestion but we won't be using it now. Sometime in the 
future, if and when we support weaklinking and/or use newer toolchains across 
the board, we will look at adding these and other optimizations.

--
resolution:  -> not a bug
stage: patch review -> resolved
status: open -> closed




[issue41181] [macOS] Build macOS installer with LTO and PGO optimizations

2020-07-01 Thread STINNER Victor


STINNER Victor  added the comment:

> We cannot depend on PGO and LTO for it unless we start building the installer 
> on 10.15.

Clang 6.0 doesn't support LTO and PGO? Would you mind elaborating?

--




[issue41181] [macOS] Build macOS installer with LTO and PGO optimizations

2020-07-01 Thread Łukasz Langa

Łukasz Langa  added the comment:

The installer is built on Mac OS X 10.9 so that it is forward compatible with 
all OS X and macOS versions. We cannot depend on PGO and LTO for it unless we 
start building the installer on 10.15. We cannot do this currently as those 
installers would not work with older macOS and OS X versions.

Since the Mac is switching to Apple Silicon, the plan is to start building a 
separate macOS 11+ installer. *That* could use PGO and LTO.

--
nosy: +lukasz.langa




[issue41181] [macOS] Build macOS installer with LTO and PGO optimizations

2020-07-01 Thread STINNER Victor


Change by STINNER Victor :


--
components: +macOS
nosy: +inada.naoki, ned.deily, rhettinger, ronaldoussoren




[issue41181] [macOS] Build macOS installer with LTO and PGO optimizations

2020-07-01 Thread STINNER Victor


Change by STINNER Victor :


--
keywords: +patch
pull_requests: +20402
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/21256




[issue41181] [macOS] Build macOS installer with LTO and PGO optimizations

2020-07-01 Thread STINNER Victor


STINNER Victor  added the comment:

The performance issue was noticed by Raymond Hettinger, who ran a 
microbenchmark on tuplegetter_descr_get() comparing Python 3.8 with Python 3.9:

https://mail.python.org/archives/list/python-...@python.org/message/Q3YHYIKNUQH34FDEJRSLUP2MTYELFWY3/
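
For readers who want to measure something similar, here is a minimal sketch of 
such a microbenchmark. It is not Raymond's actual benchmark. It assumes that 
reading a namedtuple field exercises tuplegetter_descr_get(), and the names 
Point and time_field_access() are made up for the example.

```python
import timeit
from collections import namedtuple

# namedtuple fields are implemented with the C-level tuplegetter descriptor
# in CPython 3.8 and later, so reading `p.x` goes through its __get__.
Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)


def time_field_access(number=1_000_000, repeat=5):
    """Return the best observed time for one `p.x` read, in nanoseconds."""
    timer = timeit.Timer("p.x", globals={"p": p})
    # Take the minimum of several runs to reduce noise from other processes.
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9


if __name__ == "__main__":
    print(f"p.x: {time_field_access():.1f} ns per access")
```

Running the same script under two interpreters (for example a default build 
and an LTO+PGO build) gives a rough comparison; absolute numbers depend on the 
machine.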

INADA-san confirms that the performance regression was introduced by the commit 
45ec5b99aefa54552947049086e87ec01bc2fc9a (bpo-40170), which changes the 
PyType_HasFeature() implementation to always call PyType_GetFlags() as a 
function rather than reading the PyTypeObject.tp_flags member directly.

https://mail.python.org/archives/list/python-...@python.org/message/FOKJXG2SYMXCHYPGUZWVYMHLDR42BYFB/


On Fedora 32, there is no performance difference because binaries are built 
with GCC using LTO and PGO: the PyType_GetFlags() function call is inlined by 
GCC 10.


I built Python on macOS with clang 11.0.3 on macOS 10.15.4, and I confirm that 
LTO+PGO allows the PyType_GetFlags() function call to be inlined in 
tuplegetter_descr_get().

Using "./configure && make":
---
$ lldb ./python.exe
(lldb) disassemble --name tuplegetter_descr_get
(...)
python.exe[0x1001c46ad] <+29>:  callq  0x10009c720   ; PyType_GetFlags at typeobject.c:2338
python.exe[0x1001c46b2] <+34>:  testl  $0x4000000, %eax  ; imm = 0x4000000
(...)
---

Using "./configure --with-lto --enable-optimizations && make":
---
$ lldb ./python.exe
(lldb) disassemble --name tuplegetter_descr_get
(...)
python.exe[0x1002a9542] <+18>:  movq   0x10(%rbx), %rdx
python.exe[0x1002a9546] <+22>:  movq   0x8(%rsi), %rax
python.exe[0x1002a954a] <+26>:  testb  $0x4, 0xab(%rax)
python.exe[0x1002a9551] <+33>:  je 0x1002a956f   ; <+63>
(...)
---

--




[issue41181] [macOS] Build macOS installer with LTO and PGO optimizations

2020-07-01 Thread STINNER Victor


New submission from STINNER Victor :

Link Time Optimization (LTO) and Profile-Guided Optimization (PGO) have a major 
impact on Python performance: they make Python between 10% and 30% faster 
(coarse estimation).

Currently, macOS installers distributed on python.org are built with Clang 6.0 
without LTO or PGO. I propose to enable LTO and PGO to make these binaries 
faster.

IMO we should build all new Python macOS installers with these optimizations.

Attached PR adds the flags.


Python 3.9.0b3 binary:

$ python3.9
Python 3.9.0b3 (v3.9.0b3:b484871ba7, Jun  9 2020, 16:05:25) 
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

configure options:

>>> import sysconfig; print(sysconfig.get_config_var('CONFIG_ARGS'))
'-C' '--enable-framework' '--enable-universalsdk=/' 
'--with-universal-archs=intel-64' '--with-computed-gotos' '--without-ensurepip' 
'--with-tcltk-includes=-I/tmp/_py/libraries/usr/local/include' 
'--with-tcltk-libs=-ltcl8.6 -ltk8.6' 'LDFLAGS=-g' 'CFLAGS=-g' 'CC=gcc'

Compiler flags:

>>> sysconfig.get_config_var('PY_CFLAGS') + sysconfig.get_config_var('PY_CFLAGS_NODIST')
'-Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic 
-DNDEBUG -g -fwrapv -O3 -Wall -arch x86_64 -g-std=c99 -Wextra 
-Wno-unused-result -Wno-unused-parameter -Wno-missing-field-initializers 
-Wstrict-prototypes -Werror=implicit-function-declaration -fvisibility=hidden  
-I/Users/sysadmin/build/v3.9.0b3/Include/internal'

Linker flags:

>>> sysconfig.get_config_var('PY_LDFLAGS') + sysconfig.get_config_var('PY_LDFLAGS_NODIST')
'-arch x86_64 -g'
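
The same sysconfig data can be used to check whether a given build was 
configured with these optimizations. The following is a minimal sketch, 
assuming that the presence of --with-lto and --enable-optimizations in 
CONFIG_ARGS is a good enough indicator. The function name optimization_flags() 
is made up for the example.

```python
import sysconfig


def optimization_flags(config_args=None):
    """Report whether configure was invoked with the LTO and PGO options.

    `config_args` defaults to the CONFIG_ARGS of the running interpreter;
    it may be missing (None) on platforms that do not use configure.
    """
    if config_args is None:
        config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    return {
        "lto": "--with-lto" in config_args,
        "pgo": "--enable-optimizations" in config_args,
    }


if __name__ == "__main__":
    print(optimization_flags())
```

With the CONFIG_ARGS shown above for the 3.9.0b3 installer, both entries are 
False.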

--
components: Build
messages: 372743
nosy: vstinner
priority: normal
severity: normal
status: open
title: [macOS] Build macOS installer with LTO and PGO optimizations
type: performance
versions: Python 3.10, Python 3.8, Python 3.9
