On Sun, 19 Jun 2016 at 21:01 Mark Shannon <m...@hotpy.org> wrote:

> On 19/06/16 18:29, Brett Cannon wrote:
> >
> > On Sat, 18 Jun 2016 at 21:49 Guido van Rossum <gu...@python.org> wrote:
> >
> >     Hi Brett,
> >
> >     I've got a few questions about the specific design. Probably you
> >     know the answers, it would be nice to have them in the PEP.
> >
> > Once you're happy with my answers I'll update the PEP.
> >
> >     First, why not have a global hook? What does a hook per interpreter
> >     give you? Would even finer granularity buy anything?
> >
> > We initially considered a per-code object hook, but we figured it was
> > unnecessary to have that level of control, especially since people like
> > Numba have gotten away with not needing it for this long (although I
> > suspect that's because they are a decorator so they can just return an
> > object that overrides __call__()). We didn't think that a global one was
> > appropriate as different workloads may call for different
> > JITs/debuggers/etc. and there is no guarantee that you are executing
> > every interpreter with the same workload. Plus we figured people might
> > simply import their JIT of choice and as a side-effect set the hook, and
> > since imports are a per-interpreter thing that seemed to suggest the
> > granularity of interpreters.
> >
> > IOW it seemed to be more in line with sys.settrace() than some global
> > thing for the process.
> >
> >     Next, I'm a bit (but no more than a bit) concerned about the extra 8
> >     bytes per code object, especially since for most people this is just
> >     waste (assuming most people won't be using Pyjion or Numba). Could
> >     it be a compile-time feature (requiring recompilation of CPython but
> >     not extensions)?
> >
> > Probably. It does water down potential usage thanks to needing a special
> > build.
> > If the decision is "special build or not", I would simply pull
> > out this part of the proposal as I wouldn't want to add a flag that
> > influences what is or is not possible for an interpreter.
> >
> >     Could you figure out some other way to store per-code-object data?
> >     It seems you considered this but decided that the co_extra field was
> >     simpler and faster; I'm basically pushing a little harder on this.
> >     Of course most of the PEP would disappear without this feature; the
> >     extra interpreter field is fine.
> >
> > Dino and I thought of two potential alternatives, neither of which we
> > have taken the time to implement and benchmark. One is to simply have a
> > hash table of memory addresses to JIT data that is kept on the JIT side
> > of things. Obviously it would be nice to avoid the overhead of a hash
> > table lookup on every function call. This also doesn't help minimize
> > memory when the code object gets GC'ed.
>
> Hash lookups aren't that slow.
There's "slow" and there's "slower".

> If you combine it with the custom flags
> suggested by MRAB, then you would only suffer the lookup penalty when
> actually entering the special interpreter.

You actually will always need the lookup in the JIT case to increment
the execution count if you're not always immediately JIT-ing. That means
MRAB's flag won't necessarily be that useful in the JIT case (it could
in the debugging case, though, if you're really aiming for the fastest
debugger possible).

> You can use a weakref callback to ensure things get GC'd properly.

Yes, that was already the plan if we lost co_extra.

> Also, if there is a special extra field on code-object, then everyone
> will want to use it. How do you handle clashes?

As already explained in the PEP in
https://www.python.org/dev/peps/pep-0523/#expanding-pycodeobject, like
consenting adults. The expectation is that there will not be multiple
users of the object at the same time.

-Brett

> >
> > The other potential solution we came up with was to use weakrefs. I have
> > not looked into the details, but we were thinking that if we registered
> > the JIT data object as a weakref on the code object, couldn't we iterate
> > through the weakrefs attached to the code object to look for the JIT
> > data object, and then get the reference that way? It would let us avoid
> > a more expensive hash table lookup if we assume most code objects won't
> > have a weakref on it (assuming weakrefs are stored in a list), and it
> > gives us the proper cleanup semantics we want by getting the weakref
> > cleanup callback execution to make sure we decref the JIT data object
> > appropriately. But as I said, I have not looked into the feasibility of
> > this at all to know if I'm remembering the weakref implementation
> > details correctly.
> >
> >     Finally, there are some error messages from pep2html.py:
> >     https://www.python.org/dev/peps/pep-0523/#copyright
> >
> > All fixed in
> > https://github.com/python/peps/commit/6929f850a5af07e51d0163558a5fe8d6b85dccfe
> > .
> >
> > -Brett
> >
> >     --Guido
> >
> >     On Fri, Jun 17, 2016 at 7:58 PM, Brett Cannon <br...@python.org> wrote:
> >
> >         I have taken PEP 523 for this:
> >         https://github.com/python/peps/blob/master/pep-0523.txt .
> >
> >         I'm waiting until Guido gets back from vacation, at which point
> >         I'll ask for a pronouncement or assignment of a BDFL delegate.
> >
> >         On Fri, 3 Jun 2016 at 14:37 Brett Cannon <br...@python.org> wrote:
> >
> >             For those of you who follow python-ideas or were at the
> >             PyCon US 2016 language summit, you have already seen/heard
> >             about this PEP. For those of you who don't fall into either
> >             of those categories, this PEP proposes a frame evaluation
> >             API for CPython. The motivating example of this work has
> >             been Pyjion, the experimental CPython JIT Dino Viehland and
> >             I have been working on in our spare time at Microsoft. The
> >             API also works for debugging, though, as already
> >             demonstrated by Google having added a very similar API
> >             internally for debugging purposes.
> >
> >             The PEP is pasted in below and also available in rendered
> >             form at
> >             https://github.com/Microsoft/Pyjion/blob/master/pep.rst (I
> >             will assign myself a PEP # once discussion is finished as
> >             it's easier to work in git for this for the rich rendering
> >             of the in-progress PEP).
> >
> >             I should mention that the differences in the PEP from
> >             python-ideas and the language summit are the listed support
> >             from Google's use of a very similar API and a clarification
> >             that the co_extra field on code objects doesn't change their
> >             immutability (at least from the view of the PEP).
> >
> >             ----------
> >             PEP: NNN
> >             Title: Adding a frame evaluation API to CPython
> >             Version: $Revision$
> >             Last-Modified: $Date$
> >             Author: Brett Cannon <br...@python.org>,
> >                     Dino Viehland <di...@microsoft.com>
> >             Status: Draft
> >             Type: Standards Track
> >             Content-Type: text/x-rst
> >             Created: 16-May-2016
> >             Post-History: 16-May-2016
> >                           03-Jun-2016
> >
> >
> >             Abstract
> >             ========
> >
> >             This PEP proposes to expand CPython's C API [#c-api]_ to allow for
> >             the specification of a per-interpreter function pointer to handle the
> >             evaluation of frames [#pyeval_evalframeex]_. This proposal also
> >             suggests adding a new field to code objects [#pycodeobject]_ to store
> >             arbitrary data for use by the frame evaluation function.
> >
> >
> >             Rationale
> >             =========
> >
> >             One place where flexibility has been lacking in Python is in the direct
> >             execution of Python code. While CPython's C API [#c-api]_ allows for
> >             constructing the data going into a frame object and then evaluating it
> >             via ``PyEval_EvalFrameEx()`` [#pyeval_evalframeex]_, control over the
> >             execution of Python code comes down to individual objects instead of a
> >             holistic control of execution at the frame level.
> >
> >             While wanting to have influence over frame evaluation may seem a bit
> >             too low-level, it does open the possibility for things such as a
> >             method-level JIT to be introduced into CPython without CPython itself
> >             having to provide one. By allowing external C code to control frame
> >             evaluation, a JIT can participate in the execution of Python code at
> >             the key point where evaluation occurs. This then allows for a JIT to
> >             conditionally recompile Python bytecode to machine code as desired
> >             while still allowing for executing regular CPython bytecode when
> >             running the JIT is not desired.
> >             This can be accomplished by allowing
> >             interpreters to specify what function to call to evaluate a frame. And
> >             by placing the API at the frame evaluation level it allows for a
> >             complete view of the execution environment of the code for the JIT.
> >
> >             This ability to specify a frame evaluation function also allows for
> >             other use-cases beyond just opening CPython up to a JIT. For instance,
> >             it would not be difficult to implement a tracing or profiling function
> >             at the call level with this API. While CPython does provide the
> >             ability to set a tracing or profiling function at the Python level,
> >             this would be able to match the data collection of the profiler and
> >             quite possibly be faster for tracing by simply skipping per-line
> >             tracing support.
> >
> >             It also opens up the possibility of debugging where the frame
> >             evaluation function only performs special debugging work when it
> >             detects it is about to execute a specific code object. In that
> >             instance the bytecode could be theoretically rewritten in-place to
> >             inject a breakpoint function call at the proper point for help in
> >             debugging while not having to do a heavy-handed approach as
> >             required by ``sys.settrace()``.
> >
> >             To help facilitate these use-cases, we are also proposing the adding
> >             of a "scratch space" on code objects via a new field. This will allow
> >             per-code object data to be stored with the code object itself for easy
> >             retrieval by the frame evaluation function as necessary. The field
> >             itself will simply be a ``PyObject *`` type so that any data stored in
> >             the field will participate in normal object memory management.
> >
> >
> >             Proposal
> >             ========
> >
> >             All proposed C API changes below will not be part of the stable ABI.
> >
> >
> >             Expanding ``PyCodeObject``
> >             --------------------------
> >
> >             One field is to be added to the ``PyCodeObject`` struct
> >             [#pycodeobject]_::
> >
> >               typedef struct {
> >                  ...
> >                  PyObject *co_extra;  /* "Scratch space" for the code object. */
> >               } PyCodeObject;
> >
> >             The ``co_extra`` will be ``NULL`` by default and will not be used by
> >             CPython itself. Third-party code is free to use the field as desired.
> >             Values stored in the field are expected to not be required in order
> >             for the code object to function, allowing the loss of the data of the
> >             field to be acceptable (this keeps the code object as immutable from
> >             a functionality point-of-view; this is slightly contentious and so is
> >             listed as an open issue in `Is co_extra needed?`_). The field will be
> >             freed like all other fields on ``PyCodeObject`` during deallocation
> >             using ``Py_XDECREF()``.
> >
> >             It is not recommended that multiple users attempt to use the
> >             ``co_extra`` simultaneously. While a dictionary could theoretically be
> >             set to the field and various users could use a key specific to the
> >             project, there is still the issue of key collisions as well as
> >             performance degradation from using a dictionary lookup on every frame
> >             evaluation. Users are expected to do a type check to make sure that
> >             the field has not been previously set by someone else.
> >
> >
> >             Expanding ``PyInterpreterState``
> >             --------------------------------
> >
> >             The entrypoint for the frame evaluation function is per-interpreter::
> >
> >               // Same type signature as PyEval_EvalFrameEx().
> >               typedef PyObject* (__stdcall *PyFrameEvalFunction)(PyFrameObject*, int);
> >
> >               typedef struct {
> >                   ...
> >                   PyFrameEvalFunction eval_frame;
> >               } PyInterpreterState;
> >
> >             By default, the ``eval_frame`` field will be initialized to a function
> >             pointer that represents what ``PyEval_EvalFrameEx()`` currently is
> >             (called ``PyEval_EvalFrameDefault()``, discussed later in this PEP).
> >             Third-party code may then set their own frame evaluation function
> >             instead to control the execution of Python code. A pointer comparison
> >             can be used to detect if the field is set to
> >             ``PyEval_EvalFrameDefault()`` and thus has not been mutated yet.
> >
> >
> >             Changes to ``Python/ceval.c``
> >             -----------------------------
> >
> >             ``PyEval_EvalFrameEx()`` [#pyeval_evalframeex]_ as it currently stands
> >             will be renamed to ``PyEval_EvalFrameDefault()``. The new
> >             ``PyEval_EvalFrameEx()`` will then become::
> >
> >                 PyObject *
> >                 PyEval_EvalFrameEx(PyFrameObject *frame, int throwflag)
> >                 {
> >                     PyThreadState *tstate = PyThreadState_GET();
> >                     return tstate->interp->eval_frame(frame, throwflag);
> >                 }
> >
> >             This allows third-party code to place themselves directly in the path
> >             of Python code execution while being backwards-compatible with code
> >             already using the pre-existing C API.
> >
> >
> >             Updating ``python-gdb.py``
> >             --------------------------
> >
> >             The generated ``python-gdb.py`` file used for Python support in GDB
> >             makes some hard-coded assumptions about ``PyEval_EvalFrameEx()``, e.g.
> >             the names of local variables. It will need to be updated to work with
> >             the proposed changes.
> >
> >
> >             Performance impact
> >             ==================
> >
> >             As this PEP is proposing an API to add pluggability, performance
> >             impact is considered only in the case where no third-party code has
> >             made any changes.
> >
> >             Several runs of pybench [#pybench]_ consistently showed no performance
> >             cost from the API change alone.
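
For readers who prefer code to prose, the dispatch described under "Changes to ``Python/ceval.c``" can be modelled in a few lines of pure Python. This is only a sketch under loud assumptions: the real change is in C, a "frame" is modelled here as a zero-argument callable, and every name below (`Interpreter`, `eval_frame_ex`, `make_counting_hook`) is a hypothetical stand-in rather than part of any CPython API.

```python
def eval_frame_default(frame, throwflag):
    """Stand-in for PyEval_EvalFrameDefault(): just 'runs' the frame."""
    return frame()


class Interpreter:
    """Stand-in for PyInterpreterState; models only the eval_frame field."""

    def __init__(self):
        # Initialized to the default evaluator, never left unset.
        self.eval_frame = eval_frame_default


def eval_frame_ex(interp, frame, throwflag=0):
    """Stand-in for the new PyEval_EvalFrameEx(): a pure dispatch."""
    return interp.eval_frame(frame, throwflag)


def make_counting_hook(counts):
    """Hypothetical third-party hook: count evaluations, then delegate."""
    def hook(frame, throwflag):
        counts[frame] = counts.get(frame, 0) + 1
        return eval_frame_default(frame, throwflag)
    return hook


interp = Interpreter()
# The "pointer comparison" from the PEP: has anyone installed a hook yet?
hook_installed_before = interp.eval_frame is not eval_frame_default

counts = {}
interp.eval_frame = make_counting_hook(counts)
hook_installed_after = interp.eval_frame is not eval_frame_default
```

When no hook is installed, the only added work in this model is one attribute read and one indirect call per frame, which is the cost the benchmark runs above were measuring.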
> >
> >             A run of the Python benchmark suite [#py-benchmarks]_ showed no
> >             measurable cost in performance.
> >
> >             In terms of memory impact, since there are typically not many CPython
> >             interpreters executing in a single process that means the impact of
> >             ``co_extra`` being added to ``PyCodeObject`` is the only worry.
> >             According to [#code-object-count]_, a run of the Python test suite
> >             results in about 72,395 code objects being created. On a 64-bit
> >             CPU that would result in 579,160 bytes of extra memory being used if
> >             all code objects were alive at once and had nothing set in their
> >             ``co_extra`` fields.
> >
> >
> >             Example Usage
> >             =============
> >
> >             A JIT for CPython
> >             -----------------
> >
> >             Pyjion
> >             ''''''
> >
> >             The Pyjion project [#pyjion]_ has used this proposed API to implement
> >             a JIT for CPython using the CoreCLR's JIT [#coreclr]_. Each code
> >             object has its ``co_extra`` field set to a ``PyjionJittedCode`` object
> >             which stores four pieces of information:
> >
> >             1. Execution count
> >             2. A boolean representing whether a previous attempt to JIT failed
> >             3. A function pointer to a trampoline (which can be type tracing or not)
> >             4. A void pointer to any JIT-compiled machine code
> >
> >             The frame evaluation function has (roughly) the following algorithm::
> >
> >                 def eval_frame(frame, throw_flag):
> >                     pyjion_code = frame.code.co_extra
> >                     if not pyjion_code:
> >                         # First call: attach fresh bookkeeping data.
> >                         pyjion_code = frame.code.co_extra = PyjionJittedCode()
> >                     elif not pyjion_code.jit_failed:
> >                         if pyjion_code.jit_code:
> >                             # Already compiled: run the machine code.
> >                             return pyjion_code.eval(pyjion_code.jit_code, frame)
> >                         elif pyjion_code.exec_count > 20_000:
> >                             # Hot enough: try to compile now.
> >                             if jit_compile(frame):
> >                                 return pyjion_code.eval(pyjion_code.jit_code, frame)
> >                             else:
> >                                 pyjion_code.jit_failed = True
> >                     pyjion_code.exec_count += 1
> >                     return PyEval_EvalFrameDefault(frame, throw_flag)
> >
> >             The key point, though, is that all of this work and logic is separate
> >             from CPython and yet with the proposed API changes it is able to
> >             provide a JIT that is compliant with Python semantics (as of this
> >             writing, performance is almost equivalent to CPython without the new
> >             API). This means there's nothing technically preventing others from
> >             implementing their own JITs for CPython by utilizing the proposed API.
> >
> >
> >             Other JITs
> >             ''''''''''
> >
> >             It should be mentioned that the Pyston team was consulted on an
> >             earlier version of this PEP that was more JIT-specific and they were
> >             not interested in utilizing the changes proposed because they want
> >             control over memory layout and had no interest in directly supporting
> >             CPython itself. An informal discussion with a developer on the PyPy
> >             team led to a similar comment.
> >
> >             Numba [#numba]_, on the other hand, suggested that they would be
> >             interested in the proposed change in a post-1.0 future for
> >             themselves [#numba-interest]_.
> >
> >             The experimental Coconut JIT [#coconut]_ could have benefitted from
> >             this PEP.
> >             In private conversations with Coconut's creator we were told
> >             that our API was probably superior to the one they developed for
> >             Coconut to add JIT support to CPython.
> >
> >
> >             Debugging
> >             ---------
> >
> >             In conversations with the Python Tools for Visual Studio team (PTVS)
> >             [#ptvs]_, they thought they would find these API changes useful for
> >             implementing more performant debugging. As mentioned in the Rationale_
> >             section, this API would allow for switching on debugging functionality
> >             only in frames where it is needed. This could allow for either
> >             skipping information that ``sys.settrace()`` normally provides or
> >             even going as far as dynamically rewriting bytecode prior to execution
> >             to inject e.g. breakpoints in the bytecode.
> >
> >             It also turns out that Google has provided a very similar API
> >             internally for years. It has been used for performant debugging
> >             purposes.
> >
> >
> >             Implementation
> >             ==============
> >
> >             A set of patches implementing the proposed API is available through
> >             the Pyjion project [#pyjion]_. In its current form it has more
> >             changes to CPython than just this proposed API, but that is for ease
> >             of development instead of strict requirements to accomplish its goals.
> >
> >
> >             Open Issues
> >             ===========
> >
> >             Allow ``eval_frame`` to be ``NULL``
> >             -----------------------------------
> >
> >             Currently the frame evaluation function is expected to always be set.
> >             It could very easily simply default to ``NULL`` instead which would
> >             signal to use ``PyEval_EvalFrameDefault()``. The current proposal of
> >             not special-casing the field seemed the most straightforward, but it
> >             does require that the field not accidentally be cleared, else a crash
> >             may occur.
> >
> >
> >             Is co_extra needed?
> >             -------------------
> >
> >             While discussing this PEP at PyCon US 2016, some core developers
> >             expressed their worry of the ``co_extra`` field making code objects
> >             mutable. The thinking seemed to be that having a field that was
> >             mutated after the creation of the code object made the object seem
> >             mutable, even though no other aspect of code objects changed.
> >
> >             The view of this PEP is that the ``co_extra`` field doesn't change the
> >             fact that code objects are immutable. The field is specified in this
> >             PEP as to not contain information required to make the code object
> >             usable, making it more of a caching field. It could be viewed as
> >             similar to the UTF-8 cache that string objects have internally;
> >             strings are still considered immutable even though they have a field
> >             that is conditionally set.
> >
> >             The field is also not strictly necessary. While the field greatly
> >             simplifies attaching extra information to code objects, other options
> >             are possible, such as keeping a mapping of code object memory
> >             addresses to what would have been kept in ``co_extra``, or perhaps
> >             using a weak reference of the data on the code object and then
> >             iterating through the weak references until the attached data is
> >             found. But obviously all of these solutions are not as simple or
> >             performant as adding the ``co_extra`` field.
> >
> >
> >             Rejected Ideas
> >             ==============
> >
> >             A JIT-specific C API
> >             --------------------
> >
> >             Originally this PEP was going to propose a much larger API change
> >             which was more JIT-specific. After soliciting feedback from the Numba
> >             team [#numba]_, though, it became clear that the API was unnecessarily
> >             large.
> >             The realization was made that all that was truly needed was the
> >             opportunity to provide a trampoline function to handle execution of
> >             Python code that had been JIT-compiled and a way to attach that
> >             compiled machine code along with other critical data to the
> >             corresponding Python code object. Once it was shown that there was no
> >             loss in functionality or in performance while minimizing the API
> >             changes required, the proposal was changed to its current form.
> >
> >
> >             References
> >             ==========
> >
> >             .. [#pyjion] Pyjion project
> >                (https://github.com/microsoft/pyjion)
> >
> >             .. [#c-api] CPython's C API
> >                (https://docs.python.org/3/c-api/index.html)
> >
> >             .. [#pycodeobject] ``PyCodeObject``
> >                (https://docs.python.org/3/c-api/code.html#c.PyCodeObject)
> >
> >             .. [#coreclr] .NET Core Runtime (CoreCLR)
> >                (https://github.com/dotnet/coreclr)
> >
> >             .. [#pyeval_evalframeex] ``PyEval_EvalFrameEx()``
> >                (https://docs.python.org/3/c-api/veryhigh.html?highlight=pyframeobject#c.PyEval_EvalFrameEx)
> >
> >             .. [#numba] Numba
> >                (http://numba.pydata.org/)
> >
> >             .. [#numba-interest] numba-users mailing list:
> >                "Would the C API for a JIT entrypoint being proposed by Pyjion help out Numba?"
> >                (https://groups.google.com/a/continuum.io/forum/#!topic/numba-users/yRl_0t8-m1g)
> >
> >             .. [#code-object-count] [Python-Dev] Opcode cache in ceval loop
> >                (https://mail.python.org/pipermail/python-dev/2016-February/143025.html)
> >
> >             .. [#py-benchmarks] Python benchmark suite
> >                (https://hg.python.org/benchmarks)
> >
> >             .. [#pyston] Pyston
> >                (http://pyston.org)
> >
> >             .. [#pypy] PyPy
> >                (http://pypy.org/)
> >
> >             .. [#ptvs] Python Tools for Visual Studio
> >                (http://microsoft.github.io/PTVS/)
> >
> >             .. [#coconut] Coconut
> >                (https://github.com/davidmalcolm/coconut)
> >
> >
> >             Copyright
> >             =========
> >
> >             This document has been placed in the public domain.
> >
> >
> >
> >             ..
> >                Local Variables:
> >                mode: indented-text
> >                indent-tabs-mode: nil
> >                sentence-end-double-space: t
> >                fill-column: 70
> >                coding: utf-8
> >                End:
> >
> >
> >         _______________________________________________
> >         Python-Dev mailing list
> >         Python-Dev@python.org
> >         https://mail.python.org/mailman/listinfo/python-dev
> >         Unsubscribe:
> >         https://mail.python.org/mailman/options/python-dev/guido%40python.org
> >
> >
> >     --
> >     --Guido van Rossum (python.org/~guido)