[JIT] bsr/ret in native code
hello there,

in one of my endless tours inside the JIT world, I came up with this idea which seems to give a major speed increase. basically, I'm substituting the Parrot method for subroutines (push the current address on the call stack and then jump) with a plain native x86 ASM call instruction. and of course, the ret instruction is just a plain native ret instruction.

that way I'm completely avoiding the call stack, just relying on the CPU's internal stack for this. to make it work, I had to JIT all the 2-parameter eq/ne instructions to perform a ret on successful comparison instead of a pop and goto.

this is of course a major change in the internal working of the interpreter when using the -j option, so I'm not sure it is a Good Thing. you would not be able, for example, to inspect the call stack from inside a Parrot program anymore.

anyway, this is a little sample of the implementation:

    Parrot_bsr_ic {
        emit_call_op2(jit_info, *INT_CONST[1]);
    }

    Parrot_ret {
        emitm_ret(NATIVECODE);
    }

    Parrot_eq_i_i {
        emitm_movl_m_r(NATIVECODE, emit_EAX, emit_None, emit_None, emit_None, INT_REG[1]);
        emitm_cmpl_r_m(NATIVECODE, emit_EAX, emit_None, emit_None, emit_None, INT_REG[2]);
        emitm_jxs(NATIVECODE, emitm_jne, +1);
        emitm_ret(NATIVECODE);
    }

there are of course a lot more eq_X_X and ne_X_X combinations, but they're all similar to this.
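As a rough illustration of the scheme being replaced, here is a minimal sketch of bsr/ret handled with an explicit call stack. The opcode names, the two-cell instruction layout and the ToyVM struct are all hypothetical and are not Parrot's actual encoding. The only point is that the return addresses live in a data structure the interpreter (and so the program) can look at, which a native call/ret pair would instead keep on the CPU stack.

```c
#include <stddef.h>

/* Hypothetical opcodes for illustration; not Parrot's real bytecode. */
enum { OP_BSR, OP_RET, OP_END };

#define MAX_DEPTH 16

typedef struct {
    size_t ret_addr[MAX_DEPTH]; /* explicit call stack, visible to the VM */
    size_t depth;               /* current number of pending returns */
    size_t max_depth;           /* deepest nesting seen during the run */
} ToyVM;

/* Each instruction is two cells: opcode, then a relative offset (BSR only).
 * Returns 0 when OP_END is reached, -1 on stack overflow/underflow or
 * when execution runs off the end of the program. */
static int toy_run(ToyVM *vm, const long *code, size_t len)
{
    size_t pc = 0;
    vm->depth = 0;
    vm->max_depth = 0;

    while (pc + 1 < len) {
        switch (code[pc]) {
        case OP_BSR:
            if (vm->depth == MAX_DEPTH)
                return -1;
            vm->ret_addr[vm->depth++] = pc + 2;     /* push address of next op */
            if (vm->depth > vm->max_depth)
                vm->max_depth = vm->depth;
            pc = (size_t)((long)pc + code[pc + 1]); /* then jump */
            break;
        case OP_RET:
            if (vm->depth == 0)
                return -1;
            pc = vm->ret_addr[--vm->depth];         /* pop and goto */
            break;
        case OP_END:
            return 0;
        default:
            return -1;
        }
    }
    return -1;
}
```

Because vm->depth and vm->ret_addr are ordinary data, anything running inside the VM could be given a way to read them. With native call/ret that information would have to be recovered from the machine stack instead.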
the emit_call_op2 in jit.h is just a slight variant of emit_call_op which only uses 32 bit displacement for backward calls (don't ask me why, but it seems to work like this):

    static void emit_call_op2(Parrot_jit_info *jit_info, opcode_t disp){
        long offset;
        opcode_t opcode;

        opcode = jit_info->op_i + disp;

        /* backward target: its native offset is already known */
        if(opcode <= jit_info->op_i) {
            offset = jit_info->op_map[opcode].offset -
                     (jit_info->native_ptr - jit_info->arena_start);
            emitm_calll(jit_info->native_ptr, offset - 5);
            return;
        }

        /* forward target: emit a placeholder and record a fixup */
        Parrot_jit_newfixup(jit_info);
        jit_info->fixups->type = JIT_X86JUMP;
        jit_info->fixups->param.opcode = opcode;
        emitm_calll(jit_info->native_ptr, 0xc0def00d);
    }

if anybody sees a problem with this approach, please let me know, otherwise I'll go on with the patch.

cheers,
Aldo

__END__
$_=q,just perl,,s, , another ,,s,$, hacker,,print;
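For readers wondering where the "offset - 5" comes from: the x86 near call with a 32-bit displacement (opcode 0xE8) is five bytes long, and the displacement is measured from the address of the instruction that follows it. The sketch below shows that arithmetic on a plain byte buffer, plus the placeholder-then-patch idea used for forward targets. All function names here are hypothetical, and Parrot's real fixup list is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

#define CALL_REL32_LEN 5u   /* opcode byte 0xE8 plus a 4-byte displacement */

/* Store a 32-bit value little-endian, as x86 expects. */
static void put_le32(uint8_t *buf, size_t pos, uint32_t v)
{
    buf[pos]     = (uint8_t)(v & 0xffu);
    buf[pos + 1] = (uint8_t)((v >> 8) & 0xffu);
    buf[pos + 2] = (uint8_t)((v >> 16) & 0xffu);
    buf[pos + 3] = (uint8_t)((v >> 24) & 0xffu);
}

/* Read back a little-endian 32-bit value as a signed displacement. */
static int32_t get_le32(const uint8_t *buf, size_t pos)
{
    int64_t v = (int64_t)buf[pos]
              | ((int64_t)buf[pos + 1] << 8)
              | ((int64_t)buf[pos + 2] << 16)
              | ((int64_t)buf[pos + 3] << 24);
    if (v >= INT64_C(0x80000000))
        v -= INT64_C(0x100000000);
    return (int32_t)v;
}

/* Fill in the displacement of the call at call_pos so it lands on target.
 * rel32 is measured from the end of the 5-byte instruction. */
static void patch_call(uint8_t *buf, size_t call_pos, size_t target)
{
    int64_t disp = (int64_t)target - (int64_t)(call_pos + CALL_REL32_LEN);
    put_le32(buf, call_pos + 1, (uint32_t)disp);
}

/* Backward call: target is known, so emit the final instruction at once. */
static size_t emit_call_backward(uint8_t *buf, size_t pos, size_t target)
{
    buf[pos] = 0xE8;
    patch_call(buf, pos, target);
    return pos + CALL_REL32_LEN;
}

/* Forward call: target unknown, emit a recognizable placeholder to patch. */
static size_t emit_call_placeholder(uint8_t *buf, size_t pos)
{
    buf[pos] = 0xE8;
    put_le32(buf, pos + 1, UINT32_C(0xc0def00d));
    return pos + CALL_REL32_LEN;
}
```

A call emitted at buffer offset 20 to a routine at offset 4 gets displacement 4 - (20 + 5) = -21, which matches "offset - 5" with offset = 4 - 20. A forward call first carries the 0xc0def00d marker and is rewritten with patch_call once the target's offset is known.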
Re: [JIT] bsr/ret in native code
At 9:54 AM +0200 6/14/02, Aldo Calpini wrote:
>you would not be able, for example, to inspect the call stack from inside a Parrot program anymore.

That, unfortunately, makes it untenable, since we need to be able to do this in the general case. Also, we'll fill up the thread stack pretty quickly. Not hugely fast, mind, but it's still an issue when we have a potentially small stack on hand. (20-40K won't be unusual, unfortunately)

Believe me, I'd love to get the speed this way, but it'll make some code untenable, and the lack of stack inspection may be a problem. (If it turns out later to not be a problem, well, we can do it then. I like the idea, I just think the limits'll be a problem. Hopefully I'm wrong :)

--
Dan

--it's like this---
Dan Sugalski        even samurai
[EMAIL PROTECTED]   have teddy bears and even
                    teddy bears get drunk
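To put a rough number on the stack concern, here is a back-of-the-envelope sketch. It assumes 32-bit x86, where each native call pushes a 4-byte return address, reads "20-40K" as KiB, and pretends nothing else is on the thread stack. That last assumption is optimistic, since the interpreter's own C frames also live there, so the real limit would be lower. The helper name is hypothetical.

```c
#include <stddef.h>

/* Upper bound on nested calls if each level costs bytes_per_call bytes
 * of stack and nothing else is using the stack. */
static size_t max_call_depth(size_t stack_bytes, size_t bytes_per_call)
{
    if (bytes_per_call == 0)
        return 0;
    return stack_bytes / bytes_per_call;
}
```

Under those assumptions, max_call_depth(20 * 1024, 4) gives 5120 nested bsr calls and max_call_depth(40 * 1024, 4) gives 10240, so deep recursion in Parrot code could plausibly exhaust a small thread stack.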
Re: [JIT] bsr/ret in native code
On Fri, 14 Jun 2002, Dan Sugalski wrote:
: At 9:54 AM +0200 6/14/02, Aldo Calpini wrote:
: >you would
: >not be able, for example, to inspect the call stack from inside a Parrot
: >program anymore.
:
: That, unfortunately, makes it untenable, since we need to be able to
: do this in the general case. Also, we'll fill up the thread stack
: pretty quickly. Not hugely fast, mind, but it's still an issue when
: we have a potentially small stack on hand. (20-40K won't be unusual,
: unfortunately)
:
: Believe me, I'd love to get the speed this way, but it'll make some
: code untenable, and the lack of stack inspection may be a problem.
: (If it turns out later to not be a problem, well, we can do it then.
: I like the idea, I just think the limits'll be a problem. Hopefully
: I'm wrong :)

Hmm. The routines called from tight loops tend to be leaf nodes. It might very well be useful to keep track of which routines don't inspect the stack. It might even be worthwhile to make a language rule saying that any routine that uses C<caller> or C<want> must so indicate in the declaration somehow, via a superpositional return type or a property.

Larry
Re: [JIT] bsr/ret in native code
On Fri, 14 Jun 2002, Nicholas Clark wrote:
: But surely a routine that calls another routine can potentially have its
: stack inspected by the caller?

Certainly.

: So it would only make sense for leaf nodes, and even then they might
: get inspected by overloaded values or methods on objects that were passed
: as parameters?

Yes.

: So is it possible to make it useful in a general case, or were you meaning
: that a subroutine can declare "I don't need to be on the stack", document
: itself as such, and then any indirect calls it makes don't get to see it
: (but at their own risk). It's still a form of action-at-a-distance, so
: is it that good?

Probably can't make the optimization unless we have the body and can tell either that there are no indirect calls or that any indirect calls made are known safe. I can see some routines that could use this optimization that couldn't use inlining (such as when we have no guarantee against redefinition, except in that case you still have to go indirect through the header).

: Or would the property of "I don't use caller or want" still be useful on a
: subroutine, because the run-time could determine that it would be
: inline-able (or whatever) inside a loop at run time, based on parameters
: passed to it? (and call it non-inline if the parameters were not base perl
: types)

Maybe. I'm not an expert on run-time optimizations. I just know that the more info you have, the easier it is to know when you can get away with a particular optimization. And that there are advantages and disadvantages to knowing anything at any particular stage. And I really like optional declarations, because then the programmer gets to make the tradeoff.

Larry
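One way to read the condition discussed above is as a conservative reachability check over a call graph: a routine may bypass the inspectable call stack only if neither it nor anything it can reach uses caller/want or makes an indirect call. The sketch below is only that reading, not anything Parrot or Perl 6 implements. The Routine struct, its flags and the function names are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLEES  4
#define MAX_ROUTINES 32

typedef struct {
    bool   uses_caller_or_want;  /* routine itself inspects the stack */
    bool   has_indirect_calls;   /* calls something not resolvable statically */
    size_t n_callees;
    size_t callees[MAX_CALLEES]; /* indices of directly called routines */
} Routine;

/* Depth-first walk: false as soon as any reachable routine is unsafe. */
static bool reach_ok(const Routine *rs, size_t n, size_t idx, bool *seen)
{
    size_t i;

    if (idx >= n)
        return false;            /* unknown callee: be conservative */
    if (seen[idx])
        return true;             /* already accepted or in progress (recursion) */
    seen[idx] = true;

    if (rs[idx].uses_caller_or_want || rs[idx].has_indirect_calls)
        return false;

    for (i = 0; i < rs[idx].n_callees; i++)
        if (!reach_ok(rs, n, rs[idx].callees[i], seen))
            return false;
    return true;
}

/* True if routine idx could, under this reading, use a native call/ret. */
static bool can_skip_call_stack(const Routine *rs, size_t n, size_t idx)
{
    bool seen[MAX_ROUTINES] = { false };

    if (n > MAX_ROUTINES)
        return false;
    return reach_ok(rs, n, idx, seen);
}
```

A clean leaf routine passes, a routine that calls something using caller fails, and a self-recursive routine with no other calls passes. Treating every indirect call as unsafe is the pessimistic choice; an optional declaration of the kind proposed in the thread would be one way to relax it.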
Re: [JIT] bsr/ret in native code
At 1:49 PM -0700 6/14/02, Larry Wall wrote:
>On Fri, 14 Jun 2002, Nicholas Clark wrote:
>: Or would the property of "I don't use caller or want" still be useful on a
>: subroutine, because the run-time could determine that it would be
>: inline-able (or whatever) inside a loop at run time, based on parameters
>: passed to it? (and call it non-inline if the parameters were not base perl
>: types)
>
>Maybe. I'm not an expert on run-time optimizations. I just know that the more info you have, the easier it is to know when you can get away with a particular optimization. And that there are advantages and disadvantages to knowing anything at any particular stage. And I really like optional declarations, because then the programmer gets to make the tradeoff.

There's also the problem of active data--do a variable's tie/overload functions have access to their calling stack? If so, it's doubly hard to figure out whether there's anything that may inspect the call stack.

--
Dan