Hi Fijal,
This is Ruochen Huang. I want to start writing my proposal, and I think there
is actually not much time left. I have tried to summarize what I have
understood so far, along with the questions I still have. Please feel free to
point out anything incorrect in my summary. As for the questions, if you think
one is meaningless, just skip it; if one would take too much time to explain, a
link to a relevant document or a source code path would be enough.
As far as I understand:
The ‘small function’ problem occurs when one trace tries to call another trace.
At the source level, this is the situation where, inside one loop, there is a
call to a function that contains another loop.
Let me take the small example we discussed before: function g() calls
function f(a,b,c,d) in a big loop, and there is another loop inside f(a,b,c,d).
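For reference, here is a minimal Python sketch of that shape. It is only my
reconstruction: the loop bodies and the arguments g() passes are made up for
illustration; what matters is the structure (a hot loop in g() that calls
f(a,b,c,d), which has its own loop).

```python
def f(a, b, c, d):
    # Inner loop: as I understand it, g()'s trace can only inline
    # the first iteration of this loop.
    total = 0
    i = 0
    while i < a:
        total += b * c + d
        i += 1
    return total


def g(n):
    # Outer hot loop: this is the loop whose trace I call T1.
    result = 0
    for j in range(n):
        result += f(10, j, 2, 3)
    return result


print(g(5))  # prints 350
```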
So in the current version of PyPy, two traces are generated:
The trace for the loop in g(); let me call it T1. g() tries to inline
f(a,b,c,d), but since there is a loop in f, T1 inlines only the first iteration
of that loop. Let’s say f is split into f1 (the first iteration) and f’ (the
remaining iterations). What T1 actually does is: start the loop in g -> do f1 ->
allocate a PyFrame (in order to call f’) -> call_assembler for f’.
The trace for the loop in f’; let me call it T2. T2 first unpacks the PyFrame
prepared by T1, then does the preamble work, which means f’ is once again split
into f2 (the 1st iteration of f’, which is also the 2nd iteration of the
original f) and f’’ (the 3rd through the last iterations). f2 and f’’ each start
with their own label. So, in the end, T2 consists of 3 parts: T2a (PyFrame
unpacking), T2b (label1, doing f2), and T2c (label2, doing f’’).
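To keep track of which iteration of f’s loop runs where, here is a toy
bookkeeping model of the layout as I understand it. It is not PyPy code; the
function names and the (trace, piece, iterations) triples are just my own way
of writing the description down.

```python
# Toy bookkeeping only, not PyPy code: which piece of which trace runs
# which iteration of f's loop, under my understanding of the current layout.
def current_layout(n):
    """Return (trace, piece, iterations) triples for a loop in f running n >= 3 times."""
    if n < 3:
        raise ValueError("this toy model assumes at least 3 iterations")
    return [
        ("T1", "f1", [1]),                                   # first iteration, inlined into g's trace
        ("T1", "allocate PyFrame", []),                      # frame built for call_assembler
        ("T2a", "unpack PyFrame", []),                       # T2 reads the frame back
        ("T2b", "f2 after label1", [2]),                     # preamble: second iteration
        ("T2c", "f'' after label2", list(range(3, n + 1))),  # the real loop body
    ]


def peeled_iterations(layout):
    """Iterations executed before reaching the final loop piece."""
    return [i for _, _, its in layout[:-1] for i in its]


def overhead_steps(layout):
    """Pieces that run no iteration of f at all (pure frame bookkeeping)."""
    return [piece for _, piece, its in layout if not its]


layout = current_layout(10)
print(peeled_iterations(layout))  # prints [1, 2]: the loop is peeled twice
print(overhead_steps(layout))     # prints ['allocate PyFrame', 'unpack PyFrame']
```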
As mentioned above, we have T1 -> T2a -> T2b -> T2c. From the viewpoint of the
loop in f, f is distributed over T1(f1) -> T2a -> T2b(f2) -> T2c(f’’), which
means the loop in f is peeled twice, so T2b might not be needed. Furthermore,
the PyFrame work before call_assembler in T1 and the unpacking work in T2a are
wasted. I don’t fully understand why this is a waste, but I guess it is because
T2c(f’’) actually does something similar to f1 in T1 (or, T2c is already
*inside* the loop). Anyway, since T2b is not needed, we want to have T1 -> T2c,
and since the PyFrame work in T2a is then eliminated, the PyFrame allocation in
T1 can be eliminated too. So ideally we want to have T1’ (T1 without the PyFrame
allocation) -> T2c.
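To convince myself that the frame round trip and the extra peeled iteration
add nothing semantically, I wrote a plain-Python analogy. Frame is a
hypothetical stand-in for PyFrame, and the functions only mimic the shape of
the traces; this is not how the JIT actually represents them.

```python
class Frame:
    """Hypothetical stand-in for the PyFrame that T1 builds for call_assembler."""

    def __init__(self, i, total, a, b, c, d):
        self.i, self.total = i, total
        self.a, self.b, self.c, self.d = a, b, c, d


def loop_from_label2(i, total, a, b, c, d):
    # Plays the role of T2c: the steady-state loop over f's remaining iterations.
    while i < a:
        total += b * c + d
        i += 1
    return total


def one_iteration(i, total, a, b, c, d):
    # One peeled copy of the loop body (f1 in T1, or f2 in T2b).
    if i < a:
        total += b * c + d
        i += 1
    return i, total


def current_path(a, b, c, d):
    i, total = one_iteration(0, 0, a, b, c, d)     # T1: f1
    frame = Frame(i, total, a, b, c, d)            # T1: allocate PyFrame
    i, total = frame.i, frame.total                # T2a: unpack PyFrame
    i, total = one_iteration(i, total, frame.a, frame.b, frame.c, frame.d)  # T2b: f2
    return loop_from_label2(i, total, frame.a, frame.b, frame.c, frame.d)   # T2c


def ideal_path(a, b, c, d):
    i, total = one_iteration(0, 0, a, b, c, d)     # T1': f1, no frame
    return loop_from_label2(i, total, a, b, c, d)  # jump straight into T2c


for a in range(5):
    assert current_path(a, 1, 2, 3) == ideal_path(a, 1, 2, 3) == a * (1 * 2 + 3)
```

Both paths compute the same result for every loop length, including 0 and 1,
so the frame allocation, the unpacking and the second peeled copy are pure
overhead in this analogy.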
Some questions so far:
What is the bridge you mentioned? To be honest, I have only a very slight
understanding of bridges. I know a bridge is executed when some guard fails,
but as far as I know, in a normal tracing JIT compiler only one path through a
loop is traced, and any guard failure makes execution leave the native code and
return to the VM. So I guess a bridge is a special kind of trace (native code);
is that right?
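To make my guess concrete, here is a toy model of how I imagine a bridge
works: after a guard has failed often enough, the failing path gets its own
compiled code attached to the guard, so later failures no longer return to the
VM. This is plain Python, not PyPy code; the threshold, the Guard class and the
lambda standing in for compiled code are all made up for illustration.

```python
BRIDGE_THRESHOLD = 3  # hypothetical: guard failures before a bridge is built


def interpret(x):
    # Slow path standing in for the VM/interpreter.
    return -x


class Guard:
    """Toy guard inside a compiled trace: checks that x is non-negative."""

    def __init__(self):
        self.failures = 0
        self.bridge = None  # "compiled" code for the failing path, once it exists

    def run(self, x):
        if x >= 0:
            return x, "trace"                # guard holds: stay in native code
        if self.bridge is not None:
            return self.bridge(x), "bridge"  # guard fails, but a bridge exists
        self.failures += 1
        if self.failures >= BRIDGE_THRESHOLD:
            self.bridge = lambda v: -v       # pretend we traced and compiled this path
        return interpret(x), "interpreter"   # fall back to the VM


guard = Guard()
print([guard.run(x)[1] for x in [1, -1, -2, -3, -4, 2]])
# prints ['trace', 'interpreter', 'interpreter', 'interpreter', 'bridge', 'trace']
```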
Could you please explain more about why T2b is not needed? I guess the answer
may be related to the “virtualizable” optimization for PyFrame. So what if
PyFrame were not virtualizable? In that situation, would the problem
disappear, or become easier to solve?
What are the difficulties in solving this problem? I’m sorry, I’m not very
familiar with the details of the RPython JIT, but in my opinion we just need to
make the JIT know that, when it tries to inline a function and encounters a
loop so that inlining has to stop, it is time to do an optimization O.
What O does is delete the PyFrame allocation instructions before
call_assembler, and then tell call_assembler to jump to the 2nd label of the
target trace (T2c in our example).
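In case it helps to check whether I have the idea right, here is what I mean
by O, written as a toy pass over a list of operations. The Op class, the op
names new_frame and setfield, and the target field on call_assembler are
simplified stand-ins I made up; they are not the real RPython JIT operations or
optimizer API.

```python
from dataclasses import dataclass

# Hypothetical names for the ops that allocate and fill in the PyFrame.
FRAME_SETUP = {"new_frame", "setfield"}


@dataclass(frozen=True)
class Op:
    name: str
    args: tuple = ()
    target: str = ""  # for call_assembler: which label of the target trace to enter


def optimize_o(ops):
    """Toy pass O: drop the PyFrame setup right before call_assembler and
    enter the target trace at label2, passing the live values directly."""
    out = []
    for op in ops:
        if op.name != "call_assembler":
            out.append(op)
            continue
        live_values = []
        # Remove the setup ops that immediately precede the call, keeping the
        # values that were being stored into the frame, in their original order.
        while out and out[-1].name in FRAME_SETUP:
            setup = out.pop()
            if setup.name == "setfield":
                live_values.insert(0, setup.args[1])
        out.append(Op("call_assembler", tuple(live_values), target="label2"))
    return out


t1 = [
    Op("int_add", ("i0", 1)),          # f1: the inlined first iteration
    Op("new_frame", ("p0",)),          # allocate PyFrame
    Op("setfield", ("p0", "i1")),      # store loop counter
    Op("setfield", ("p0", "t1")),      # store running total
    Op("call_assembler", ("p0",), target="entry"),
]
result = optimize_o(t1)
print([op.name for op in result])          # prints ['int_add', 'call_assembler']
print(result[-1].args, result[-1].target)  # prints ('i1', 't1') label2
```

The toy ignores everything else T2 might expect to find in the frame, which is
exactly the part I am not sure about.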
So it seems to me that it should not be so difficult to solve.
Best Regards,
Ruochen Huang
_______________________________________________
pypy-dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/pypy-dev