On Wed, 2010-01-20 at 14:27 -0800, Collin Winter wrote: [snip]
> At a high level, the Unladen Swallow JIT compiler works by translating a
> function's CPython bytecode to platform-specific machine code, using data
> collected at runtime, as well as classical compiler optimizations, to
> improve the quality of the generated machine code. Because we only want to
> spend resources compiling Python code that will actually benefit the
> runtime of the program, an online heuristic is used to assess how hot a
> given function is. Once the hotness value for a function crosses a given
> threshold, it is selected for compilation and optimization. Until a
> function is judged hot, however, it runs in the standard CPython eval
> loop, which in Unladen Swallow has been instrumented to record interesting
> data about each bytecode executed. This runtime data is used to reduce the
> flexibility of the generated machine code, allowing us to optimize for the
> common case. For example, we collect data on
>
> - Whether a branch was taken/not taken. If a branch is never taken, we
>   will not compile it to machine code.
> - Types used by operators. If we find that ``a + b`` is only ever adding
>   integers, the generated machine code for that snippet will not support
>   adding floats.
> - Functions called at each callsite. If we find that a particular
>   ``foo()`` callsite is always calling the same ``foo`` function, we can
>   optimize the call or inline it away.
>
> Refer to [#us-llvm-notes]_ for a complete list of data points gathered and
> how they are used.

[snip]

To what extent would it be possible to (conditionally) use full ahead-of-time
compilation as well as JIT?

With my "downstream distributor of Python" hat on, I'm wondering if it would
be feasible to replace the current precompiled .pyc/.pyo files in marshal
format with .so/.dll files in platform-specific shared-library format, so
that the pre-compiled versions of the stdlib could be memory-mapped and
shared between all Python processes on a system. This ought to dramatically
reduce the whole-system memory load of the various Python processes, whilst
giving a reduction in CPU usage.

Distributors of Python could build these shared libraries as part of the
packaging process, so that e.g. all of the Fedora python3 rpm packages would
contain .so files for every .py (and this could apply to packaged add-ons as
well, so that every module you import would typically be pre-compiled);
startup of a python process would then involve shared-readonly mmap-ing
these files (which would typically be already paged in if you're doing a lot
of Python). There's a rough sketch of the mapping step further down.

Potentially part of the memory bloat you're seeing could be debug data; if
that's the case, then the debug information could be stripped from those .so
files and shipped in a debuginfo package, to be loaded on demand by the
debugger (we do something like this in Fedora with our RPMs for regular
shared libraries and binaries).

(I wonder if doing this well would require adding annotations to the code
with hints about the types to expect, since you'd have to lose the run-time
instrumentation, I think.)
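To be concrete about what would be lost, here's a toy Python sketch of the
kind of per-site feedback and specialization described in the quoted text
(purely illustrative: the class, the threshold and the int-only fast path
are all made up, and this is not how Unladen Swallow actually records or
uses the data):

    from collections import Counter

    class AddSiteFeedback:
        """Toy model of the data an instrumented eval loop might gather for
        one ``a + b`` site, and of the specialization it enables."""

        HOTNESS_THRESHOLD = 1000   # made-up value; the real heuristic differs

        def __init__(self):
            self.hotness = 0
            self.operand_types = Counter()   # (type(a), type(b)) -> count

        def generic_add(self, a, b):
            # While the code is still "cold": do the normal operation, but
            # remember which operand types were actually seen.
            self.hotness += 1
            self.operand_types[(type(a), type(b))] += 1
            return a + b

        def specialize(self):
            # Once hot: if only ints were ever observed, the generated code
            # only needs an int path, plus a guard that bails back to the
            # generic implementation.
            if self.hotness < self.HOTNESS_THRESHOLD:
                return None
            if set(self.operand_types) == {(int, int)}:
                def int_only_add(a, b):
                    if type(a) is int and type(b) is int:   # guard
                        return a + b                        # specialized path
                    raise TypeError("guard failed: fall back to generic add")
                return int_only_add
            return None

With an ahead-of-time .so there is no live ``operand_types`` counter to
consult, so the equivalent information would presumably have to come from
source annotations or from a separate profiling run.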
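And coming back to the shared-readonly mmap-ing mentioned above, this is
roughly the mapping step I have in mind at interpreter startup, sketched
from the Python level (the real loader would be dlopen()-style C code, and
the filename below is invented):

    import mmap

    def map_precompiled(path):
        """Map a pre-compiled module image read-only.  The pages stay backed
        by the file, so every process mapping the same file shares them."""
        with open(path, "rb") as f:
            return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    # Hypothetical pre-compiled stdlib module shipped by the distribution:
    # image = map_precompiled("/usr/lib64/python3.1/os.cpython-31.so")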
I did some research into the benefits of mmap-ing the data in .pyc files to
try to share the immutable data between them.  The executive summary is that
a (rather modest) saving of about 200K of heap usage per python process is
possible that way (with a rewrite of PyStringObject), with higher savings
depending on how many modules you import; see:
http://dmalcolm.livejournal.com/4183.html

I'd expect this approach to be more worthwhile when the in-memory sizes of
the modules get larger (hence this email).

[snip]

Hope this is helpful,
Dave