On Wed, 2010-01-20 at 14:27 -0800, Collin Winter wrote: [snip]
> At a high level, the Unladen Swallow JIT compiler works by translating a
> function's CPython bytecode to platform-specific machine code, using data
> collected at runtime, as well as classical compiler optimizations, to
> improve the quality of the generated machine code. Because we only want to
> spend resources compiling Python code that will actually benefit the
> runtime of the program, an online heuristic is used to assess how hot a
> given function is. Once the hotness value for a function crosses a given
> threshold, it is selected for compilation and optimization. Until a
> function is judged hot, however, it runs in the standard CPython eval
> loop, which in Unladen Swallow has been instrumented to record interesting
> data about each bytecode executed. This runtime data is used to reduce the
> flexibility of the generated machine code, allowing us to optimize for the
> common case. For example, we collect data on
>
> - Whether a branch was taken/not taken. If a branch is never taken, we
>   will not compile it to machine code.
> - Types used by operators. If we find that ``a + b`` is only ever adding
>   integers, the generated machine code for that snippet will not support
>   adding floats.
> - Functions called at each callsite. If we find that a particular
>   ``foo()`` callsite is always calling the same ``foo`` function, we can
>   optimize the call or inline it away.
>
> Refer to [#us-llvm-notes]_ for a complete list of data points gathered and
> how they are used.

[snip]

To what extent would it be possible to (conditionally) use full ahead-of-time
compilation as well as JIT?

With my "downstream distributor of Python" hat on, I'm wondering if it would
be feasible to replace the current precompiled .pyc/.pyo files in marshal
format with .so/.dll files in platform-specific shared-library format, so
that the pre-compiled versions of the stdlib could be memory-mapped and
shared between all Python processes on a system. This ought to dramatically
reduce the whole-system memory load of the various Python processes, whilst
giving a reduction in CPU usage.

Distributors of Python could build these shared libraries as part of the
packaging process, so that e.g. all of the Fedora python3 rpm packages would
contain .so files for every .py (and this could apply to packaged add-ons as
well, so that every module you import would typically be pre-compiled);
startup of a python process would then involve shared-readonly mmap-ing
these files (which would typically be already paged in if you're doing a lot
of Python). There's a rough sketch of the mapping step further down.

Potentially part of the memory bloat you're seeing could be debug data; if
that's the case, then the debug information could be stripped from those .so
files and shipped in a debuginfo package, to be loaded on demand by the
debugger (we do something like this in Fedora with our RPMs for regular
shared libraries and binaries).

(I wonder if doing this well would require adding annotations to the code
with hints about the types to expect, since you'd have to lose the run-time
instrumentation, I think.)
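To be concrete about what would be lost, here's a toy Python sketch of the
kind of per-site feedback and specialization described in the quoted text
(purely illustrative: the class, the threshold and the int-only fast path
are all made up, and this is not how Unladen Swallow actually records or
uses the data):

    from collections import Counter

    class AddSiteFeedback:
        """Toy model of the data an instrumented eval loop might gather for
        one ``a + b`` site, and of the specialization it enables."""

        HOTNESS_THRESHOLD = 1000   # made-up value; the real heuristic differs

        def __init__(self):
            self.hotness = 0
            self.operand_types = Counter()   # (type(a), type(b)) -> count

        def generic_add(self, a, b):
            # While the code is still "cold": do the normal operation, but
            # remember which operand types were actually seen.
            self.hotness += 1
            self.operand_types[(type(a), type(b))] += 1
            return a + b

        def specialize(self):
            # Once hot: if only ints were ever observed, the generated code
            # only needs an int path, plus a guard that bails back to the
            # generic implementation.
            if self.hotness < self.HOTNESS_THRESHOLD:
                return None
            if set(self.operand_types) == {(int, int)}:
                def int_only_add(a, b):
                    if type(a) is int and type(b) is int:   # guard
                        return a + b                        # specialized path
                    raise TypeError("guard failed: fall back to generic add")
                return int_only_add
            return None

With an ahead-of-time .so there is no live ``operand_types`` counter to
consult, so the equivalent information would presumably have to come from
source annotations or from a separate profiling run.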
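And coming back to the shared-readonly mmap-ing mentioned above, this is
roughly the mapping step I have in mind at interpreter startup, sketched
from the Python level (the real loader would be dlopen()-style C code, and
the filename below is invented):

    import mmap

    def map_precompiled(path):
        """Map a pre-compiled module image read-only.  The pages stay backed
        by the file, so every process mapping the same file shares them."""
        with open(path, "rb") as f:
            return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    # Hypothetical pre-compiled stdlib module shipped by the distribution:
    # image = map_precompiled("/usr/lib64/python3.1/os.cpython-31.so")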
I did some research into the benefits of mmap-ing the data in .pyc files to
try to share the immutable data between them.  The executive summary is that
a (rather modest) saving of about 200K of heap usage per python process is
possible that way (with a rewrite of PyStringObject), with higher savings
depending on how many modules you import; see:
http://dmalcolm.livejournal.com/4183.html

I'd expect this approach to be more worthwhile when the in-memory sizes of
the modules get larger (hence this email).

[snip]

Hope this is helpful,
Dave