Re: JIT and platforms warning
Leopold Toetsch [EMAIL PROTECTED] wrote: arm, mips, and sun4 JIT platforms need definitely some work to even keep up with the current state of the JIT interface. That's actually wrong - sorry. Sun4 JIT is fairly complete and up to date and shouldn't have been in the above sentence. I mixed it up with the arm platform, which doesn't use register mappings at all: all opcodes reload and store from/to Parrot registers. Mips only has 3 JITted opcodes. Sorry again Stéphane, leo
Re: [PATCH] Re: JIT and platforms warning
Jeff Clites [EMAIL PROTECTED] wrote: On Oct 23, 2004, at 4:20 AM, Leopold Toetsch wrote: Jeff Clites [EMAIL PROTECTED] wrote: See attached the patch, plus the new asm.s file. Doesn't run, segfaults on even mops.pasm - please check. I can't reproduce that here; parrot -j works for me Sorry, false alarm. I must have had indirect register access turned on. Runs fine and gets applied. leo
Re: [PATCH] Re: JIT and platforms warning
Jeff Clites [EMAIL PROTECTED] wrote: On Oct 23, 2004, at 3:42 AM, Leopold Toetsch wrote: We were allocating the volatile float registers first (or, only)--so Cset_s_sc was blowing away an N-register, even with only one in use. That's why I was surprised there weren't more failures. Yes. As said, i386 had the same problem and not one test file broke. I've now added a quick hack to the JIT compiler, which seems to do the right thing, or better, it's a step towards that: $extern = 1 if $asm =~ /call_func/; This marks the function as extern to JIT, so the register-preserving code is activated. It's not quite the best solution, because only used volatiles have to be preserved/restored. More below ... Anyway, I think we need a more general solution. ... Solution: The JIT compiler/optimizer already calculates the register mapping. We have to use this information for JIT pro- and epilogs. That makes sense--I hadn't initially realized we were tracking this, but since we are, we should use it. Things may be a bit tricky fixup-wise, since the size of the prolog will depend on how many float registers we need to preserve. Not really. When creating the prolog/epilog you know exactly the register mapping of each section. So we just have to find the maximum for each register kind, then decide whether we should use non-volatiles or volatiles or both, and emit the appropriate code. We probably need some platform settings and tweakables for which allocation policy should be used, e.g. PARROT_JIT_PREFER_VOLATILES, and maybe a threshold for when to change allocation policy. But anyway, in jit_info->optimizer, you'll get the count of used registers. With that information the platform code can calculate the used stack storage and emit the appropriate pro- and epilogs. E.g. from Mach-O Runtime Conventions for PowerPC, PowerPC Stack Structure: spaceToSave = linkageArea + params + localVars + 4 * nGPRS + 8 * nFPRS, rounded up to a multiple of 16. It's the same as what is currently a hard-coded define.
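The frame-size formula quoted above can be written as a small C helper. This is only a sketch under stated assumptions: the function name jit_frame_size and its parameters are hypothetical and do not come from the Parrot sources, and the sizes of the linkage, parameter, and local areas are supplied by the caller rather than taken from any particular ABI document.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Parrot: computes the stack space a
 * JIT prolog would reserve, following the formula from the message:
 *   linkageArea + params + localVars + 4 * nGPRS + 8 * nFPRS,
 * rounded up to a multiple of 16. */
static size_t
jit_frame_size(size_t linkage_area, size_t params, size_t local_vars,
               unsigned n_gprs, unsigned n_fprs)
{
    size_t space = linkage_area + params + local_vars
                 + 4u * (size_t)n_gprs    /* one 4-byte slot per saved GPR */
                 + 8u * (size_t)n_fprs;   /* one 8-byte slot per saved FPR */

    /* Round up to the next multiple of 16 (no-op if already aligned). */
    return (space + 15u) & ~(size_t)15u;
}
```

With the register counts taken from the optimizer's mapping instead of a fixed define, a section that maps no float registers reserves no FPR slots at all, which is the saving the message is after.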
2) JITed functions with calls into Parrot The jit2h JIT compiler needs a hint, that we call external code, e.g. CALL_FUNCTION(string_copy) So, back to external functions. With this syntax (and a needed jit_info argument), we can do: $extern = -1 if $asm =~ /CALL_FUNCTION/; This activates the saving and restoring of used volatiles. I don't think that we have to write back non-volatiles to Parrot registers, at least not if we define that functions called from JITted code like that will not see a correct Parrot register set. If a function needs the Parrot registers, just don't provide a JITted version for it. OTOH this could be problematic if the function throws an exception, and if the exception handler is able to provide a Parrot core dump. One other question: Your P_ARITH optimization in jit/ppc/core.jit. I can't come up with a case where this kicks in, ... It's almost only for examples/benchmarks/mops.pasm: +50% speed. Now PPC JIT code is two instructions for the loop instead of three ;) Thanks, JEff leo
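The point that only the volatile registers actually in use need saving around a CALL_FUNCTION can be sketched in C. This is a minimal illustration, not Parrot code: the bitmask representation of the register mapping and the names regs_to_preserve and count_regs are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit i set means hardware register i.
 * Neither function exists in Parrot; they only illustrate the idea. */

/* Registers that must be saved around an external call: those that are
 * both mapped by the JIT and volatile (caller-saved) under the ABI. */
static uint32_t
regs_to_preserve(uint32_t mapped, uint32_t volatile_mask)
{
    return mapped & volatile_mask;
}

/* Number of set bits, i.e. how many save/restore instructions are
 * needed when each register costs one instruction. */
static unsigned
count_regs(uint32_t mask)
{
    unsigned n = 0;
    while (mask) {
        mask &= mask - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

If the mapping uses no volatile registers, the mask is empty and the call needs no extra save/restore code, which is why the choice of allocation policy matters here.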
[PATCH] Re: JIT and platforms warning
On Oct 22, 2004, at 3:57 AM, Leopold Toetsch wrote: Jeff Clites wrote: On Oct 22, 2004, at 1:01 AM, Leopold Toetsch wrote: [JIT changes] I just finished tracking down the source of a couple of JIT test failures on PPC--due to recent changes but only indirectly related, and pointing out things which needed fixing anyway (float register preservation issues). I'll send it in tomorrow after I've had a chance to clean it up and add some comments. Please make sure to get a recent CVS copy and try to prepare the patch during my night ;) Of course! I've changed the PPC float register allocation yesterday, because it did look bogus - i.e. the non-volatile FPRs were allocated but not saved. That should be fixed. Yep, that was the core of the issue. There's no free lunch--if we use the nonvolatile registers, we need to preserve/restore them in begin/end, but if we use the volatile registers, we need to preserve them across function calls (incl. normal op calls). So I added code to do the appropriate save/restore, and use the non-volatile registers for mapping--that should be less asm than what we'd have to do to use the volatile registers. (The surprising thing was that we only got 2 failures when using the volatile registers--I'll look into creating some tests that would detect problems with register preservation.) The other tricky part was that saving/restoring the FP registers is one instruction per saved register, so saving all 18 was exceeding the asm size we allocate in src/jit.c (in some cases), since we emit Parrot_end for all restart ops. The fix for this was to pull this asm out into a utility routine, and just call that from the asm. (This is only done for restoring the registers so far--I should do it for the preservation step too, but right now that's just inline.) The attached patch also contains some other small improvements I'd been working on, and a few more jitted ops to demonstrate calling a C function from a jitted op. 
While you're investigating PPC JIT: JIT debugging via stabs doesn't work at all here. I get around the gdb message (missing data segment) by inserting .data\n.text\n in the stabs file, but that's all. After this change and regenerating file.o I can load it with add-symbol-file file.o without any complaint, but the access to the memory region the jitted code occupies is not allowed, disassemble doesn't work ... I'll take a look and see if I can figure it out. I remember gdb being uncooperative about disassembling, frustratingly, and I've moved to the habit of using something like x/300i jit_code as a workaround. Clearly it can access the memory region, so it seems like a gdb bug. See attached the patch, plus the new asm.s file. JEff Attachments: config/gen/platform/darwin/asm.s (application/applefile and application/text), ppc-jit-preserve-fp.patch (application/text)
Re: [PATCH] Re: JIT and platforms warning
Jeff Clites [EMAIL PROTECTED] wrote: See attached the patch, plus the new asm.s file. Doesn't run, segfaults on even mops.pasm - please check. JEff leo
Re: [PATCH] Re: JIT and platforms warning
Jeff Clites [EMAIL PROTECTED] wrote: Yep, that was the core of the issue. There's no free lunch--if we use the nonvolatile registers, we need to preserve/restore them in begin/end, but if we use the volatile registers, we need to preserve them across function calls (incl. normal op calls). Good point and JIT/i386 does it wrong in core.ops. But normal non-JITted code is not the problem - the framework preserves mapped registers or better - it has to copy all mapped registers to Parrot registers so that C code is able to see the actual values. And the framework also knows not to restore non-volatile registers, at least if the platform code defines PRESERVED_type_REGISTERS and arranges the registers correctly. ... So I added code to do the appropriate save/restore, and use the non-volatile registers for mapping--that should be less asm than what we'd have to do to use the volatile registers. (The surprising thing was that we only got 2 failures when using the volatile registers--I'll look into creating some tests that would detect problems with register preservation.) Well, allocation strategy on PPC (or I386) is to first use the non-volatile registers. PPC has 14 usable registers. To provoke a failure you'd need e.g. 15 different I-registers and then a JITted function call like Cset_s_sc. But in normal cases you have more string functions in that place and - as these are normally not JITted - registers are saved and restored around the external function. Oddly JIT/i386 has that problem too and there are now only 2 non-volatile registers. But albeit there are function calls, like Cstring_bool, the whole test suite passes. Anyway, I think we need a more general solution. We have basically two problems: 1) JIT startup and end code size and memory usage Given: a bunch of small overloaded vtable functions. All of these are called through runops_fromc_*(). So for every function PPC JIT now would move ~ 2 * 300 byte to and from the stack. 
That's too much and not needed in that case. Solution: The JIT compiler/optimizer already calculates the register mapping. We have to use this information for JIT pro- and epilogs. The allocation strategy should be adjusted and depend on code size and register usage: - big chunks of compiled code like whole modules should use the non-volatile registers first and then (if needed) the volatile registers. - small chunks of code should use the volatile registers to reduce function startup and end sizes. 2) JITed functions with calls into Parrot The jit2h JIT compiler needs a hint, that we call external code, e.g. CALL_FUNCTION(string_copy) This notion is needed anyway for the EXEC core, which has to generate fixup code for the executable. So with that information available, the JIT compiler can surround such code with: PRESERVE_REGS(); ... RESTORE_REGS(); The framework can now depending on the register mapping save and restore volatile registers if needed. The other tricky part was that saving/restoring the FP registers is one instruction per saved register, so saving all 18 was exceeding the asm size we allocate in src/jit.c (in some cases), since we emit Parrot_end for all restart ops. Yep, that's suboptimal. I've done that on i386 because it was just easy. But you are right, the Parrot_end() code should really be there only once. The attached patch also contains some other small improvements I'd been working on, and a few more jitted ops to demonstrate calling a C function from a jitted op. I'll apply it, because it's obviously correct albeit suboptimal ;) But we can improve things always later. ... , frustratingly, and I've moved to the habit of using something like x/300i jit_code as a workaround. Ah, yes - forgot that. Clearly it can access the memory region, so it seems like a gdb bug. Yep. JEff Thanks, leo
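The proposed allocation strategy (non-volatiles first for big chunks, volatiles first for small ones) can be sketched as follows. Everything here is illustrative: the enum, the struct, the function names, and the idea of a single code-size threshold are assumptions for the example, not existing Parrot interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types and names; none of them exist in Parrot. */
typedef enum {
    ALLOC_NONVOLATILES_FIRST,   /* big chunks: pay once in prolog/epilog */
    ALLOC_VOLATILES_FIRST       /* small chunks: keep prolog/epilog short */
} alloc_policy;

typedef struct {
    unsigned n_nonvolatile;     /* saved in the prolog, restored in the epilog */
    unsigned n_volatile;        /* saved around each external call */
} reg_split;

/* Pick a policy from the size of the code being compiled. */
static alloc_policy
choose_policy(size_t code_size, size_t threshold)
{
    return code_size >= threshold ? ALLOC_NONVOLATILES_FIRST
                                  : ALLOC_VOLATILES_FIRST;
}

static unsigned
min_u(unsigned a, unsigned b)
{
    return a < b ? a : b;
}

/* Distribute `used` mapped registers over the two register classes,
 * filling the preferred class first and spilling into the other. */
static reg_split
split_registers(alloc_policy policy, unsigned used,
                unsigned avail_nonvolatile, unsigned avail_volatile)
{
    reg_split s;
    if (policy == ALLOC_NONVOLATILES_FIRST) {
        s.n_nonvolatile = min_u(used, avail_nonvolatile);
        s.n_volatile    = min_u(used - s.n_nonvolatile, avail_volatile);
    } else {
        s.n_volatile    = min_u(used, avail_volatile);
        s.n_nonvolatile = min_u(used - s.n_volatile, avail_nonvolatile);
    }
    return s;
}
```

With 14 non-volatile registers available, as mentioned above for PPC, a chunk that maps 15 registers under the non-volatiles-first policy ends up with exactly one volatile register to protect around external calls.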
Re: [PATCH] Re: JIT and platforms warning
On Oct 23, 2004, at 4:20 AM, Leopold Toetsch wrote: Jeff Clites [EMAIL PROTECTED] wrote: See attached the patch, plus the new asm.s file. Doesn't run, segfaults on even mops.pasm - please check. I can't reproduce that here; parrot -j works for me with examples/{benchmarks,assembly}/mops.pasm, and all 'make testj' tests pass. (They were passing before, and I updated to pick up changes since I sent the patch, and still all passes.) I also tried building against a system ICU (was building against the parrot-supplied version), in case there were issues with calling into shared lib code (long shot), and no difference. Below is myconfig--I'm not doing any special configuration (just running 'perl Configure.pl', and not building optimized). Do you have any uncommitted local changes which might be involved? I'm up-to-date from CVS, and have no uncommitted changes except for those in the patch. JEff Summary of my parrot 0.1.1 configuration: configdate='Sat Oct 23 13:06:30 2004' Platform: osname=darwin, archname=darwin jitcapable=1, jitarchname=ppc-darwin, jitosname=DARWIN, jitcpuarch=ppc execcapable=1 perl=perl Compiler: cc='cc', ccflags='-g -pipe -pipe -fno-common -no-cpp-precomp -DHAS_TELLDIR_PROTOTYPE -pipe -fno-common -Wno-long-double -I/sw/include', Linker and Libraries: ld='c++', ldflags=' -flat_namespace -L/sw/lib', cc_ldflags='', libs='-lm -lgmp' Dynamic Linking: share_ext='.dylib', ld_share_flags='-dynamiclib', load_ext='.bundle', ld_load_flags='-bundle -undefined suppress' Types: iv=long, intvalsize=4, intsize=4, opcode_t=long, opcode_t_size=4, ptrsize=4, ptr_alignment=1 byteorder=4321, nv=double, numvalsize=8, doublesize=8 % gcc -v Reading specs from /usr/libexec/gcc/darwin/ppc/3.3/specs Thread model: posix gcc version 3.3 20030304 (Apple Computer, Inc. build 1495)
Re: [PATCH] Re: JIT and platforms warning
On Oct 23, 2004, at 3:42 AM, Leopold Toetsch wrote: Jeff Clites [EMAIL PROTECTED] wrote: Yep, that was the core of the issue. There's no free lunch--if we use the nonvolatile registers, we need to preserve/restore them in begin/end, but if we use the volatile registers, we need to preserve them across function calls (incl. normal op calls). Good point and JIT/i386 does it wrong in core.ops. But normal non-JITted code is not the problem - the framework preserves mapped registers or better - it has to copy all mapped registers to Parrot registers so that C code is able to see the actual values. Ah, good. The problem I was seeing was caused by Cset_s_sc, since my jit_emit_call_func was written with the assumption that we should be mapping only non-volatile registers (even though we were actually mapping volatiles, but at the end of the list, and even though we weren't yet preserving the appropriate float registers). But a recent check-in had us mapping the volatile float registers only, which exposed the problem. ... So I added code to do the appropriate save/restore, and use the non-volatile registers for mapping--that should be less asm than what we'd have to do to use the volatile registers. (The surprising thing was that we only got 2 failures when using the volatile registers--I'll look into creating some tests that would detect problems with register preservation.) Well, allocation strategy on PPC (or I386) is to first use the non-volatile registers. PPC has 14 usable registers. To provoke a failure you'd need e.g. 15 different I-registers and then a JITted function call like Cset_s_sc. We were allocating the volatile float registers first (or, only)--so Cset_s_sc was blowing away an N-register, even with only one in use. That's why I was surprised there weren't more failures. Anyway, I think we need a more general solution. ... Solution: The JIT compiler/optimizer already calculates the register mapping. We have to use this information for JIT pro- and epilogs. 
That makes sense--I hadn't initially realized we were tracking this, but since we are, we should use it. Things may be a bit tricky fixup-wise, since the size of the prolog will depend on how many float registers we need to preserve. (We can save a variable number of int registers with just a single instruction in the PPC case, as long as we are using registers in order, or we don't mind saving a few extra if we are not.) The allocation strategy should be adjusted and depend on code size and register usage: - big chunks of compiled code like whole modules should use the non-volatile registers first and then (if needed) the volatile registers. - small chunks of code should use the volatile registers to reduce function startup and end sizes. Yep, makes sense. And there may be a case for not mapping at all for very small chunks, but I'd have to experiment to know if that'd be a win. 2) JITed functions with calls into Parrot The jit2h JIT compiler needs a hint, that we call external code, e.g. CALL_FUNCTION(string_copy) This notion is needed anyway for the EXEC core, which has to generate fixup code for the executable. So with that information available, the JIT compiler can surround such code with: PRESERVE_REGS(); ... RESTORE_REGS(); The framework can now depending on the register mapping save and restore volatile registers if needed. Sounds good. The other tricky part was that saving/restoring the FP registers is one instruction per saved register, so saving all 18 was exceeding the asm size we allocate in src/jit.c (in some cases), since we emit Parrot_end for all restart ops. Yep, that's suboptimal. I've done that on i386 because it was just easy. But you are right, the Parrot_end() code should really be there only once. Yep, instead of emitting it multiple times and conditionally jumping over it, we can emit it just once and conditionally jump to it. ... , frustratingly, and I've moved to the habit of using something like x/300i jit_code as a workaround.
Ah, yes - forgot that. At least stepping does the right thing now, with your fix for the stabs file. Very nice to be able to do p I0 and such. One other question: Your P_ARITH optimization in jit/ppc/core.jit. I can't come up with a case where this kicks in, since in the tests I've tried, prev_op is always NULL when JITting if_i_ic or unless_i_ic. If I set things to not map any int registers, then I don't hit the cases in jit_emit.h which set prev_op to 0, but build_asm is doing it--I can't come up with a small test case which isn't being treated as multiple sections, it seems. There's something there w.r.t. sections which I don't understand, but anyway I just wanted to know how to set things up so that I can see your optimization in action. Thanks, JEff
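The idea of emitting the Parrot_end code once and conditionally jumping to it, rather than inlining it after every restart op, can be illustrated with a toy emitter that records branch fixups and patches them when the shared epilog is finally emitted. This is a sketch only: the emitter struct, the pseudo-opcodes, and all function names are invented for the example and do not mirror the real structures in src/jit.c.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INSNS  256
#define MAX_FIXUPS 32

/* Toy pseudo-opcodes; real JIT code would emit machine instructions. */
enum { OP_WORK = 1, OP_BRANCH_IF_RESTART = 2, OP_EPILOG = 3 };

typedef struct {
    int    op;
    size_t target;              /* branch destination, patched later */
} insn;

typedef struct {
    insn   code[MAX_INSNS];
    size_t len;
    size_t fixups[MAX_FIXUPS];  /* indices of branches awaiting a target */
    size_t n_fixups;
} emitter;

static void
emitter_init(emitter *e)
{
    e->len = 0;
    e->n_fixups = 0;
}

static void
emit(emitter *e, int op)
{
    assert(e->len < MAX_INSNS);
    e->code[e->len].op = op;
    e->code[e->len].target = 0;
    e->len++;
}

/* After a restart op: emit one conditional branch and remember it,
 * instead of emitting the whole epilog inline. */
static void
emit_restart_check(emitter *e)
{
    assert(e->n_fixups < MAX_FIXUPS);
    e->fixups[e->n_fixups++] = e->len;
    emit(e, OP_BRANCH_IF_RESTART);
}

/* Emit the epilog exactly once and point every recorded branch at it.
 * Returns the index where the epilog starts. */
static size_t
emit_shared_epilog(emitter *e, size_t epilog_len)
{
    size_t start = e->len;
    size_t i;
    for (i = 0; i < epilog_len; i++)
        emit(e, OP_EPILOG);
    for (i = 0; i < e->n_fixups; i++)
        e->code[e->fixups[i]].target = start;
    e->n_fixups = 0;
    return start;
}
```

In this model each restart op costs one branch, and the per-register restore instructions appear once no matter how many restart ops there are, so the per-op asm size no longer grows with the number of saved registers.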
Re: JIT and platforms warning
On Oct 22, 2004, at 1:01 AM, Leopold Toetsch wrote: [JIT changes] I just finished tracking down the source of a couple of JIT test failures on PPC--due to recent changes but only indirectly related, and pointing out things which needed fixing anyway (float register preservation issues). I'll send it in tomorrow after I've had a chance to clean it up and add some comments. JEff
Re: JIT and platforms warning
Jeff Clites wrote: On Oct 22, 2004, at 1:01 AM, Leopold Toetsch wrote: [JIT changes] I just finished tracking down the source of a couple of JIT test failures on PPC--due to recent changes but only indirectly related, and pointing out things which needed fixing anyway (float register preservation issues). I'll send it in tomorrow after I've had a chance to clean it up and add some comments. Please make sure to get a recent CVS copy and try to prepare the patch during my night ;) I've changed the PPC float register allocation yesterday, because it did look bogus - i.e. the non-volatile FPRs were allocated but not saved. That should be fixed. While you're investigating PPC JIT: JIT debugging via stabs doesn't work at all here. I get around the gdb message (missing data segment) by inserting .data\n.text\n in the stabs file, but that's all. After this change and regenerating file.o I can load it with add-symbol-file file.o without any complaint, but the access to the memory region the jitted code occupies is not allowed, disassemble doesn't work ... JEff leo