Re: [webkit-dev] ARM JIT for WinCE
Hi,

dateProtoFuncGetTimezoneOffset does not use the argList argument, while functionPrint does. Perhaps passing this argument is not yet WinCE compatible. ArgList contains a pointer to the arguments (JSValue pointers) and the number of arguments. This structure is 8 bytes on 32-bit machines (1 pointer, 1 int) and is allocated on the stack, because the function gets a reference (pointer) to it.

Could you try the following JS code:

    print(a, 1, true)

The length should be 3.

Zoltan

> Hi,
>
> I did some further investigation today. I did a quick hack in privateCompileCTIMachineTrampolines to get the same (maybe correct) register values as without OPTIMIZE_NATIVE_CALL.
>
>      move(callFrameRegister, regT0);
>     +move(ARMRegisters::r2, ARMRegisters::r3);
>     +move(ARMRegisters::r1, ARMRegisters::r2);
>     +move(ARMRegisters::r0, ARMRegisters::r1);
>     -move(stackPointerRegister, ARMRegisters::r3);
>     +move(stackPointerRegister, ARMRegisters::r0);
>     -call(Address(regT1, OBJECT_OFFSETOF(JSFunction, m_data)));
>     +call(Address(regT2, OBJECT_OFFSETOF(JSFunction, m_data)));
>      addPtr(Imm32(sizeof(ArgList)), stackPointerRegister);
>
> Now it produces the following code:
>
>     003E01B0 muls r0, r3, r0
>     003E01B4 subs r1, r1, r0
>     003E01B8 str r1, [sp]
>     003E01BC ldr r2, [r1, #-4]
>     003E01C0 ldr r1, [r4, #-8]
>     003E01C4 mov r0, r4
>     003E01C8 mov r3, r2
>     003E01CC mov r2, r1
>     003E01D0 mov r1, r0
>     003E01D4 mov r0, sp
>     003E01D8 mov lr, pc
>     003E01DC ldr pc, [r2, #0x1C]
>     003E01E0 adds sp, sp, #8
>     003E01E4 ldr r3, [pc, #0x80]
>     003E01E8 ldr r2, [r3]
>     003E01EC bics r3, r2, #0
>     003E01F0 bne 003E0204
>
> The arguments seem to be sane now in the call to dateProtoFuncGetTimezoneOffset, but it crashes afterwards. When I step through it with the debugger, I get the following registers after the function has finished, and it jumps to 0x000139d8 instead of 0x003e01e0 (lr = 0x003e01e0 when I enter the function!):
>
>     R0 = 0x182af984 R1 = 0x003f8054 R2 = 0x00601500 R3 = 0x0060
>     R4 = 0x003f8054 R5 = 0x0200 R6 = 0x182af984 R7 = 0x003f8054
>     R8 = 0x R9 = 0x182afbfc R10 = 0x R11 = 0x002b0370
>     R12 = 0x182af8f0 Sp = 0x182af95c Lr = 0x003e01e0
>     Pc = 0x000139d8 Psr = 0x201f
>
> I then tried to always return jsNaN(exec), so that R4 won't be used. The prolog/epilog changed from
>
>     00071600 mov r12, sp
>     00071604 stmdb sp!, {r0 - r3}
>     00071608 stmdb sp!, {r4, r12, lr}
>     0007160C sub sp, sp, #0x1C
>
>     00071700 ldr r0, [sp, #8]
>     00071704 add sp, sp, #0x1C
>     00071708 ldmia sp, {r4, sp, pc}
>
> to
>
>     000734EC mov r12, sp
>     000734F0 stmdb sp!, {r0 - r3}
>     000734F4 stmdb sp!, {r12, lr}
>     000734F8 sub sp, sp, #0x1C
>
>     000735A4 ldr r0, [sp, #8]
>     000735A8 add sp, sp, #0x1C
>     000735AC ldmia sp, {sp, pc}
>
> I now get the following registers and it jumps to the correct address (0x003e01e0), but then it crashes in functionPrint.
>
>     R0 = 0x182af984 R1 = 0x182af8f8 R2 = 0x R3 = 0x182af984
>     R4 = 0x003f8080 R5 = 0x0200 R6 = 0x0060 R7 = 0x003e07c8
>     R8 = 0x R9 = 0x182afbfc R10 = 0x R11 = 0x002b0370
>     R12 = 0x03fc2c50 Sp = 0x182af984 Lr = 0x0001bc18
>     Pc = 0x003e01e0 Psr = 0x601f
>
> I tried jsc.exe with the following JavaScript file:
>
>     print(getTimeZoneDiff());
>     function getTimeZoneDiff() { return (new Date(2000, 1, 1)).getTimezoneOffset(); }
>
> This doesn't make much sense to me at the moment.
>
> - Patrick

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
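The ArgList layout Zoltan describes (one pointer plus one int, 8 bytes on a 32-bit machine) can be sketched in C++ as follows. This is only an illustration: `ArgListSketch` and `makeArgList` are hypothetical names, not the real JSC::ArgList, and the pointer is modelled as a 32-bit integer so that the size check also holds when the sketch is compiled on a 64-bit host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the structure described in the message:
// a pointer to the first argument followed by the argument count.
struct ArgListSketch {
    std::uint32_t args;   // address of the first JSValue argument (32-bit target)
    std::int32_t length;  // number of arguments
};

static_assert(sizeof(ArgListSketch) == 8, "1 pointer + 1 int on a 32-bit machine");
static_assert(offsetof(ArgListSketch, args) == 0, "pointer comes first");
static_assert(offsetof(ArgListSketch, length) == 4, "count follows the pointer");

// Builds the structure the way a caller would before passing a reference to it.
inline ArgListSketch makeArgList(std::uint32_t firstArgumentAddress, std::int32_t count)
{
    assert(count >= 0);
    return ArgListSketch{firstArgumentAddress, count};
}
```

For `print(a, 1, true)` a callee reading this structure would be expected to see `length == 3`. The 8-byte size is also consistent with the `adds sp, sp, #8` that follows the native call in the listing, which corresponds to `addPtr(Imm32(sizeof(ArgList)), stackPointerRegister)`.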
Re: [webkit-dev] ARM JIT for WinCE
Hi Patrick,

hm, I feel I found something. Please have a look at JavaScriptCore/jit/JITOpcodes.cpp : privateCompileCTIMachineTrampolines (the second one, used when JSVALUE32_64 is disabled). If JIT_OPTIMIZE_NATIVE_CALL is enabled, specialized code is generated to call native builtin functions (like Date.toString). This code for ARM is around line 1733. Perhaps the WinCE ABI wants the arguments passed in a different way than GCC does. The faulting address according to your call stack is 0x003e01d4, which is the

    call(Address(regT1, OBJECT_OFFSETOF(JSFunction, m_data)));

macro assembler instruction in line 1768. (Thank you for sending the instruction dump.) Please try to fix this code according to the WinCE ABI, since I am not sure JIT_OPTIMIZE_NATIVE_CALL can be disabled.

Regards
Zoltan

> Hi Gabor,
>
> Thanks for your prompt reply.
>
> > Make sure your assembler does not break the ctiVMThrowTrampoline and ctiOpThrowNotCaught functions. This approach requires that ctiVMThrowTrampoline falls through to ctiOpThrowNotCaught after the 'bl cti_vm_throw' call. Or you can simply copy the body of ctiOpThrowNotCaught into ctiVMThrowTrampoline after the call.
>
> I've copied it, but I think it's unnecessary (see disassembly).
>
> > Did you do anything with the DEFINE_STUB_FUNCTION macro?
>
> I've done it like for the RVCT compiler (e.g. see cti_op_end in the disassembly).
>
> When I run jsc.exe tests\mozilla\ecma_2\shell.js it crashes with the following call stack:
>
>     0x
>     jsc.EXE!JSC::JSCell::inherits(JSC::ClassInfo* info = 0x00189818) Line: 335, Byte Offsets: 0x2c
>     jsc.EXE!JSC::JSValue::inherits(JSC::ClassInfo* classInfo = 0x00189818) Line: 345, Byte Offsets: 0x40
>     jsc.EXE!JSC::dateProtoFuncGetTimezoneOffset(JSC::ExecState* exec = 0x00601b60, JSC::JSObject* __formal = 0x00601b40, JSC::JSValue thisValue = {...}, JSC::ArgList __formal = {...}) Line: 764, Byte Offsets: 0x1c
>     0x003e01d4
>
> Is there a better JavaScript file to start with? When I enter a simple 1+2+3 into the interactive jsc.exe it prints the correct result.
>
> Here are some parts of the disassembly:
>
>     // Execute the code!
>     inline JSValue execute(RegisterFile* registerFile, CallFrame* callFrame, JSGlobalData* globalData, JSValue* exception)
>     {
>     000A7868 mov r12, sp
>     000A786C stmdb sp!, {r0 - r3}
>     000A7870 stmdb sp!, {r12, lr}
>     000A7874 sub sp, sp, #0x20
>     return JSValue::decode(ctiTrampoline(m_ref.m_code.executableAddress(), registerFile, callFrame, exception, Profiler::enabledProfilerReference(), globalData));
>     000A7878 bl |JSC::Profiler::enabledProfilerReference ( 1b2e0h )|
>     000A787C str r0, [sp, #0x14]
>     000A7880 ldr r0, this
>     000A7884 bl |WTF::RefPtr<JSC::Profile>::operator-> ( d2e3ch )|
>     000A7888 str r0, [sp, #0x18]
>     000A788C ldr r3, globalData
>     000A7890 str r3, [sp, #4]
>     000A7894 ldr r3, [sp, #0x14]
>     000A7898 str r3, [sp]
>     000A789C ldr r3, exception
>     000A78A0 ldr r2, callFrame
>     000A78A4 ldr r1, registerFile
>     000A78A8 ldr r0, [sp, #0x18]
>     000A78AC bl 0014A000
>     000A78B0 str r0, [sp, #0x1C]
>     000A78B4 ldr r1, [sp, #0x1C]
>     000A78B8 ldr r0, [sp, #0x2C]
>     000A78BC bl |JSC::JSValue::decode ( 1b94ch )|
>     000A78C0 ldr r3, [sp, #0x2C]
>     000A78C4 str r3, [sp, #0x10]
>     }
>     000A78C8 ldr r0, [sp, #0x10]
>     000A78CC add sp, sp, #0x20
>     000A78D0 ldmia sp, {sp, pc}
>
>     ctiTrampoline:
>     0014A000 stmdb sp!, {r1 - r3}
>     0014A004 stmdb sp!, {r4 - r8, lr}
>     0014A008 sub sp, sp, #0x24
>     0014A00C mov r4, r2
>     0014A010 mov r5, #2, 24
>     0014A014 mov lr, pc
>     0014A018 bx r0 // r0 = 0x003e0270
>     0014A01C add sp, sp, #0x24
>     0014A020 ldmia sp!, {r4 - r8, lr}
>     0014A024 add sp, sp, #0xC
>     0014A028 bx lr
>
>     ctiVMThrowTrampoline:
>     0014A02C mov r0, sp
>     0014A030 bl 0014A6D4
>     0014A034 add sp, sp, #0x24
>     0014A038 ldmia sp!, {r4 - r8, lr}
>     0014A03C add sp, sp, #0xC
>     0014A040 bx lr
>
>     ctiOpThrowNotCaught:
>     0014A044 add sp, sp, #0x24
>     0014A048 ldmia sp!, {r4 - r8, lr}
>     0014A04C add sp, sp, #0xC
>     0014A050 bx lr
>
>     cti_op_convert_this:
>     0014A054 str lr, [sp, #0x20]
>     0014A058 bl |JITStubThunked_op_convert_this ( ae718h )|
>     0014A05C ldr lr, [sp, #0x20]
>     0014A060 bx lr
>
>     cti_op_end:
>     0014A064 str lr, [sp, #0x20]
>     0014A068 bl |JITStubThunked_op_end ( ae878h )|
>     0014A06C ldr lr, [sp, #0x20]
>     0014A070 bx lr
>
>     003E017C mov pc, r0
>     003E0180 mov r0, lr
>     003E0184 str r0, [r4, #-0x14]
>     003E0188 ldr r1, [r4,
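The ctiTrampoline listing shows `mov r5, #2, 24` where the assembly source has `mov r5, #512`. In the classic ARM encoding a data-processing immediate is an 8-bit constant rotated right by an even amount, and the disassembler here appears to print the two parts separately. A minimal sketch of the decoding, with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value right by `amount` bits (amount taken modulo 32).
inline std::uint32_t rotateRight32(std::uint32_t value, unsigned amount)
{
    amount &= 31u;
    if (amount == 0)
        return value;
    return (value >> amount) | (value << (32u - amount));
}

// Value of an operand printed as "#imm8, rotation": the 8-bit constant
// rotated right by an even number of bits.
inline std::uint32_t decodeArmImmediate(std::uint32_t imm8, unsigned rotation)
{
    assert(imm8 <= 0xFFu);
    assert(rotation < 32u && rotation % 2u == 0u);
    return rotateRight32(imm8, rotation);
}
```

`decodeArmImmediate(2, 24)` evaluates to 512 (0x200), so both spellings load the same constant into r5.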
Re: [webkit-dev] ARM JIT for WinCE
Hi,

many thanks! It already works when I disable OPTIMIZE_NATIVE_CALL (the other 3 OPTIMIZE options are turned on). I think you're right about the ABI problem. Maybe you can help me with it too. Here are the instruction dumps with and without OPTIMIZE_NATIVE_CALL:

== #define OPTIMIZE_NATIVE_CALL = 1 ==

    003E0100 ldr r8, [r2, #8]
    003E0104 cmp r8, #0
    003E0108 bgt 003E012C
    003E010C mov r7, lr
    003E0110 mov r0, sp
    003E0114 str r4, [sp, #0x40]
    003E0118 mov lr, pc
    003E011C ldr pc, [pc, #0x128]
    003E0120 ldr r1, [sp, #0xC]
    003E0124 mov lr, r7
    003E0128 ldr r2, [r0, #0x18]
    003E012C ldr r8, [r2, #8]
    003E0130 cmp r8, r1
    003E0134 beq 003E0160
    003E0138 mov r7, lr
    003E013C str r7, [sp, #8]
    003E0140 mov r0, sp
    003E0144 str r4, [sp, #0x40]
    003E0148 mov lr, pc
    003E014C ldr pc, [pc, #0x100]
    003E0150 mov r4, r1
    003E0154 ldr r1, [sp, #0xC]
    003E0158 mov lr, r7
    003E015C ldr r2, [r0, #0x18]
    003E0160 str r1, [r4, #-0xC]
    003E0164 ldr r1, [r0, #0x1C]
    003E0168 ldr r8, [pc, #0xE8]
    003E016C str r8, [r4, #-4]
    003E0170 str r0, [r4, #-8]
    003E0174 str r1, [r4, #-0x1C]
    003E0178 ldr r0, [r2, #0xC]
    003E017C mov pc, r0
    003E0180 mov r0, lr
    003E0184 str r0, [r4, #-0x14]
    003E0188 ldr r1, [r4, #-0x18]
    003E018C ldr r1, [r1, #-0x1C]
    003E0190 str r1, [r4, #-0x1C]
    003E0194 ldr r0, [r4, #-0xC]
    003E0198 subs sp, sp, #8
    003E019C subs r0, r0, #1
    003E01A0 str r0, [sp, #4]
    003E01A4 mov r1, r4
    003E01A8 subs r1, r1, #0x20
    003E01AC mov r3, #4
    003E01B0 muls r0, r3, r0
    003E01B4 subs r1, r1, r0
    003E01B8 str r1, [sp]
    003E01BC ldr r2, [r1, #-4]
    003E01C0 ldr r1, [r4, #-8]
    003E01C4 mov r0, r4
    003E01C8 mov r3, sp
    003E01CC mov lr, pc
    003E01D0 ldr pc, [r1, #0x1C]
    // R0 = 0x003f8080 R1 = 0x00601780 R2 = 0x00601760 R3 = 0x182af984
    // R4 = 0x003f8080 R5 = 0x0200 R6 = 0x0060 R7 = 0x003e07b8
    // R8 = 0x R9 = 0x182afbfc R10 = 0x R11 = 0x002b0370
    // R12 = 0x182af8f0 Sp = 0x182af984 Lr = 0x003e01d4
    // Pc = 0x00073468 Psr = 0x201f
    003E01D4 adds sp, sp, #8
    003E01D8 ldr r3, [pc, #0x7C]
    003E01DC ldr r2, [r3]
    003E01E0 bics r3, r2, #0
    003E01E4 bne 003E01F8
    003E01E8 ldr r1, [r4, #-0x14]
    003E01EC ldr r4, [r4, #-0x18]
    003E01F0 mov lr, r1
    003E01F4 mov pc, lr
    003E01F8 ldr r1, [r4, #-0x14]
    003E01FC ldr r2, [pc, #0x60]
    003E0200 str r1, [r2]
    003E0204 ldr r2, [pc, #0x5C]
    003E0208 ldr r4, [r4, #-0x18]
    003E020C str r4, [sp, #0x40]
    003E0210 mov lr, r2
    003E0214 mov pc, lr

==

    JSValue JSC_HOST_CALL dateProtoFuncGetTimezoneOffset(ExecState* exec, JSObject*, JSValue thisValue, const ArgList)
    {
    00073468 mov r12, sp
    0007346C stmdb sp!, {r0 - r3}
    00073470 stmdb sp!, {r4, r12, lr}
    00073474 sub sp, sp, #0x1C
    if (!thisValue.inherits(DateInstance::info))
    00073478 ldr r1, [pc, #0x100]
    // R0 = 0x003f8080 R1 = 0x00601780 R2 = 0x00601760 R3 = 0x182af984
    // R4 = 0x003f8080 R5 = 0x0200 R6 = 0x0060 R7 = 0x003e07b8
    // R8 = 0x R9 = 0x182afbfc R10 = 0x R11 = 0x002b0370
    // R12 = 0x182af984 Sp = 0x182af94c Lr = 0x003e01d4
    // Pc = 0x00073478 Psr = 0x201f
    0007347C add r0, sp, #0x34
    00073480 bl |JSC::JSValue::inherits ( 6997ch )|
    00073484 strb r0, [sp, #0xC]
    00073488 ldrb r3, [sp, #0xC]
    0007348C cmp r3, #0
    00073490 bne |JSC::dateProtoFuncGetTimezoneOffset + 0x54 ( 734bch )|
    return throwError(exec, TypeError);
    00073494 mov r1, #5
    00073498 ldr r0, exec
    0007349C bl |JSC::throwError ( 5dd78h )|
    000734A0 str r0, [sp, #0x10]
    000734A4 ldr r1, [sp, #0x10]
    000734A8 ldr r0, [sp, #0x28]
    000734AC bl |WTF::OwnArrayPtr<JSC::Register>::OwnArrayPtr<JSC::Register> ( 110e8h )|
    000734B0 ldr r3, [sp, #0x28]
    000734B4 str r3, [sp, #8]
    000734B8 b |JSC::dateProtoFuncGetTimezoneOffset + 0x100 ( 73568h )|
    DateInstance* thisDateObj = asDateInstance(thisValue);
    000734BC ldr r0, thisValue
    000734C0 bl |JSC::asRegExpConstructor ( 697b8h )|
    000734C4 str r0, [sp,
Re: [webkit-dev] ARM JIT for WinCE
Hi,

I did some further investigation today. I did a quick hack in privateCompileCTIMachineTrampolines to get the same (maybe correct) register values as without OPTIMIZE_NATIVE_CALL.

     move(callFrameRegister, regT0);
    +move(ARMRegisters::r2, ARMRegisters::r3);
    +move(ARMRegisters::r1, ARMRegisters::r2);
    +move(ARMRegisters::r0, ARMRegisters::r1);
    -move(stackPointerRegister, ARMRegisters::r3);
    +move(stackPointerRegister, ARMRegisters::r0);
    -call(Address(regT1, OBJECT_OFFSETOF(JSFunction, m_data)));
    +call(Address(regT2, OBJECT_OFFSETOF(JSFunction, m_data)));
     addPtr(Imm32(sizeof(ArgList)), stackPointerRegister);

Now it produces the following code:

    003E01B0 muls r0, r3, r0
    003E01B4 subs r1, r1, r0
    003E01B8 str r1, [sp]
    003E01BC ldr r2, [r1, #-4]
    003E01C0 ldr r1, [r4, #-8]
    003E01C4 mov r0, r4
    003E01C8 mov r3, r2
    003E01CC mov r2, r1
    003E01D0 mov r1, r0
    003E01D4 mov r0, sp
    003E01D8 mov lr, pc
    003E01DC ldr pc, [r2, #0x1C]
    003E01E0 adds sp, sp, #8
    003E01E4 ldr r3, [pc, #0x80]
    003E01E8 ldr r2, [r3]
    003E01EC bics r3, r2, #0
    003E01F0 bne 003E0204

The arguments seem to be sane now in the call to dateProtoFuncGetTimezoneOffset, but it crashes afterwards. When I step through it with the debugger, I get the following registers after the function has finished, and it jumps to 0x000139d8 instead of 0x003e01e0 (lr = 0x003e01e0 when I enter the function!):

    R0 = 0x182af984 R1 = 0x003f8054 R2 = 0x00601500 R3 = 0x0060
    R4 = 0x003f8054 R5 = 0x0200 R6 = 0x182af984 R7 = 0x003f8054
    R8 = 0x R9 = 0x182afbfc R10 = 0x R11 = 0x002b0370
    R12 = 0x182af8f0 Sp = 0x182af95c Lr = 0x003e01e0
    Pc = 0x000139d8 Psr = 0x201f

I then tried to always return jsNaN(exec), so that R4 won't be used. The prolog/epilog changed from

    00071600 mov r12, sp
    00071604 stmdb sp!, {r0 - r3}
    00071608 stmdb sp!, {r4, r12, lr}
    0007160C sub sp, sp, #0x1C

    00071700 ldr r0, [sp, #8]
    00071704 add sp, sp, #0x1C
    00071708 ldmia sp, {r4, sp, pc}

to

    000734EC mov r12, sp
    000734F0 stmdb sp!, {r0 - r3}
    000734F4 stmdb sp!, {r12, lr}
    000734F8 sub sp, sp, #0x1C

    000735A4 ldr r0, [sp, #8]
    000735A8 add sp, sp, #0x1C
    000735AC ldmia sp, {sp, pc}

I now get the following registers and it jumps to the correct address (0x003e01e0), but then it crashes in functionPrint.

    R0 = 0x182af984 R1 = 0x182af8f8 R2 = 0x R3 = 0x182af984
    R4 = 0x003f8080 R5 = 0x0200 R6 = 0x0060 R7 = 0x003e07c8
    R8 = 0x R9 = 0x182afbfc R10 = 0x R11 = 0x002b0370
    R12 = 0x03fc2c50 Sp = 0x182af984 Lr = 0x0001bc18
    Pc = 0x003e01e0 Psr = 0x601f

I tried jsc.exe with the following JavaScript file:

    print(getTimeZoneDiff());
    function getTimeZoneDiff() { return (new Date(2000, 1, 1)).getTimezoneOffset(); }

This doesn't make much sense to me at the moment.

- Patrick
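The hack in this message shifts the three register arguments up by one (r2 to r3, r1 to r2, r0 to r1) before placing the stack pointer in r0. The order of those moves matters, since each destination must be written only after its old value has been copied onward. The following toy model illustrates this; the four-entry array is a hypothetical stand-in for r0..r3, not anything from the WebKit sources.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy register file: indices 0..3 stand for r0..r3.
using Registers = std::array<std::uint32_t, 4>;

// Same order as the move() calls in the hack: r2->r3, r1->r2, r0->r1,
// then the stack pointer into r0.
inline Registers shiftArgumentsUp(Registers regs, std::uint32_t stackPointer)
{
    regs[3] = regs[2];
    regs[2] = regs[1];
    regs[1] = regs[0];
    regs[0] = stackPointer;
    return regs;
}

// The same moves issued low-to-high, to show the failure mode:
// the old r0 is smeared into r1, r2 and r3.
inline Registers shiftArgumentsUpWrongOrder(Registers regs, std::uint32_t stackPointer)
{
    regs[1] = regs[0];
    regs[2] = regs[1];
    regs[3] = regs[2];
    regs[0] = stackPointer;
    return regs;
}

// Small self-check with made-up register contents.
inline void checkShiftArgumentsUp()
{
    const Registers before{0xA, 0xB, 0xC, 0xD};
    assert((shiftArgumentsUp(before, 0x100) == Registers{0x100, 0xA, 0xB, 0xC}));
}
```

After the shift the callee pointer that used to be in r1 sits in r2, which matches the change from `call(Address(regT1, ...))` to `call(Address(regT2, ...))` and the `ldr pc, [r2, #0x1C]` in the new listing.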
[webkit-dev] ARM JIT for WinCE
Hi,

I'm trying to enable the JIT for CPU(ARM_TRADITIONAL) OS(WINCE). It already passes the RegExp tests with ENABLE_YARR_JIT. If I set all ENABLE_JIT_OPTIMIZE_* to 0, it won't compile.

MSVC supports inline assembler only for x86, so I had to provide a separate asm file (I copied the code from the GCC #ifdef with !JSVALUE32_64):

    ctiTrampoline proc
        stmdb sp!, {r1-r3}
        stmdb sp!, {r4-r8, lr}
        sub sp, sp, #36
        mov r4, r2
        mov r5, #512
        mov lr, pc
        mov pc, r0
        add sp, sp, #36
        ldmia sp!, {r4-r8, lr}
        add sp, sp, #12
        mov pc, lr
        endp

    ctiVMThrowTrampoline proc
        mov r0, sp
        bl cti_vm_throw
        endp

    ctiOpThrowNotCaught proc
        add sp, sp, #36
        ldmia sp!, {r4-r8, lr}
        add sp, sp, #12
        mov pc, lr
        endp

I can compile and link it without problems, but it crashes with a null pointer at runtime and a strange call stack. When I use a debugger and step into ctiTrampoline, it stops at the second stmdb because it can't find the source code. :-/

I've put a #pragma pack(4) around the JITStackFrame.

Can somebody give me a hint where to search for the failure?

- Patrick
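As a quick sanity check on the assembly above, the stack adjustments of ctiTrampoline can be tallied: the prologue pushes three argument registers and six callee-saved registers and then reserves 36 bytes, and the epilogue undoes the same amounts. The sketch below only does that arithmetic. The constant and function names are hypothetical, and whether JITStackFrame is meant to overlay exactly this region is an assumption the thread does not confirm.

```cpp
#include <cassert>

// Byte counts read off the ctiTrampoline listing; every register is one
// 32-bit word on this target.
constexpr int kWordBytes = 4;
constexpr int kSavedArgumentRegisters = 3;  // stmdb sp!, {r1-r3}
constexpr int kSavedCalleeRegisters = 6;    // stmdb sp!, {r4-r8, lr}
constexpr int kScratchBytes = 36;           // sub sp, sp, #36

// Total bytes the prologue takes from the stack.
constexpr int prologueBytes()
{
    return kSavedArgumentRegisters * kWordBytes
        + kSavedCalleeRegisters * kWordBytes
        + kScratchBytes;
}

// Total bytes the epilogue gives back:
// add sp, sp, #36; ldmia sp!, {r4-r8, lr}; add sp, sp, #12.
constexpr int epilogueBytes()
{
    return 36 + kSavedCalleeRegisters * kWordBytes + 12;
}

static_assert(prologueBytes() == 72, "12 + 24 + 36 bytes");
static_assert(prologueBytes() == epilogueBytes(), "stack must be balanced on return");

// Runtime variant of the same check, e.g. for a debug build.
inline void checkTrampolineFrame()
{
    assert(prologueBytes() == epilogueBytes());
}
```

If a C++ structure is laid over this frame, its field offsets would have to agree with these numbers regardless of any packing pragma.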
Re: [webkit-dev] ARM JIT and related issues
Hi,

> 1) The armv7 port is separate from the armv6 work, and uses the thumb2 instruction set. Both ports are (I hope!) useful.

We hope so as well.

> 2) We would have liked to let the community know about the arm v7 port sooner. Unfortunately, we were not at liberty to disclose it until the iPhone 3G S announcement. We try to let the community know what we're up to and drop code into the public tree as soon as we can, but sometimes we are limited by confidentiality constraints.

Corporate secrets are corporate secrets; that is understandable.

> 3) We'd definitely like to have a port for pre-v7 ARM in the main WebKit tree. I think everyone made this clear.

That is great!

> 4) I think it would be good to see if more code and ideas can be shared between the two ARM ports. They were made independently, and originally in different ways, so let's see what exchange can happen.

Right now the MacroAssembler-based ARM port is an x86 emulator. It translates x86 instructions into ARM instruction sequences (usually 1-5 instructions are enough). The ARMv7 port mostly does the same, but Thumb2 is much closer to x86 than to a RISC architecture (Thumb2 looks like a CISC operation mode).

> 5) Gavin has been a strong proponent of using MacroAssembler as the primary CPU abstraction layer, and that approach has worked reasonably well so far. However, it seems at least to me that CPUs with very different instruction sets may want to do things differently at a higher level. x86 is a 2-operand instruction set with optional memory operands, and it seems to me a 3-operand load-store architecture might want to do things in a different way to get good performance. Making them go through a common assembler interface may not work. Ultimately, however, the proof is in the performance results. If doing things a different way delivers better performance, that is more important than maximizing code sharing or architectural purity. That has always been the WebKit way.

In the case of the ARM port we have a native implementation and a MacroAssembler-based implementation, and we have already posted comparisons between them to Bugzilla. Furthermore, we performed some tests on our XScale simulator, and the native JIT'ed code executes 5-40% fewer instructions. However, the gain in total runtime is smaller, since the JIT'ed code accounts for only a fraction of the total runtime. Although the native JIT is faster, we are happy with MacroAssembler as well.

> 6) It seems like the intent with the Szeged arm port and the plan for getting it in the tree wasn't clear to all parties involved. For me personally, it wasn't clear that there was an intent to contribute it, or perhaps even an expectation that we'd just pick it up from the external repository where it was developed. Things would have been more clear if patches were submitted for review earlier.

We know that such big patches require several refactoring phases before they go to mainline. That is why we thought it was a good idea to create a branch on Staikos where you could take a look at them before we flooded the WebKit Bugzilla with patches.

> 7) It seems like people said some intemperate things during the earlier discussion. It also seems like these remarks were based partly on misunderstanding. I hope everyone has gotten past that, and that we are all ready to work together productively.

True. I feel the communication between us has improved a lot. However, I am still thinking about how we can involve others as well. I am pretty sure we are not the only ones interested in the design decisions we discuss in Bugzilla. Perhaps squirrelfish-dev would be a good place for such discussions.

> 8) A number of patches from the folks working at University of Szeged have been landed. But it seems to me like there has also been a fair amount of abandoned work and working at cross purposes. I feel like the people working on JavaScript at U of Szeged are not entirely in sync with the main JavaScriptCore hackers. You guys have done a lot of great work, and I'd like to explore what we can do to get more in sync on design direction. Does anyone have suggestions on this front?

Again, this is true. We have no idea what the general direction of JavaScriptCore is. We can only see landed patches, and predict ongoing and future work based on them. However, landed patches are completed work, and by the time they land it is usually too late for any contributions. It would be good to discuss things before work starts, especially design changes that affect all ports. We feel that design discussions, such as the one about ifdefs, would greatly improve the cooperation between all parties, since everybody could feel part of the community.

Thanks,
Zoltan
[webkit-dev] ARM JIT and related issues
I'm not sure if there are any remaining disputes about the Nitro ports to armv6 and armv7. But just to make sure everyone is on the same page, I would like to clarify a few things:

1) The armv7 port is separate from the armv6 work, and uses the thumb2 instruction set. Both ports are (I hope!) useful.

2) We would have liked to let the community know about the arm v7 port sooner. Unfortunately, we were not at liberty to disclose it until the iPhone 3G S announcement. We try to let the community know what we're up to and drop code into the public tree as soon as we can, but sometimes we are limited by confidentiality constraints.

3) We'd definitely like to have a port for pre-v7 ARM in the main WebKit tree. I think everyone made this clear.

4) I think it would be good to see if more code and ideas can be shared between the two ARM ports. They were made independently, and originally in different ways, so let's see what exchange can happen.

5) Gavin has been a strong proponent of using MacroAssembler as the primary CPU abstraction layer, and that approach has worked reasonably well so far. However, it seems at least to me that CPUs with very different instruction sets may want to do things differently at a higher level. x86 is a 2-operand instruction set with optional memory operands, and it seems to me a 3-operand load-store architecture might want to do things in a different way to get good performance. Making them go through a common assembler interface may not work. Ultimately, however, the proof is in the performance results. If doing things a different way delivers better performance, that is more important than maximizing code sharing or architectural purity. That has always been the WebKit way.

6) It seems like the intent with the Szeged arm port and the plan for getting it in the tree wasn't clear to all parties involved. For me personally, it wasn't clear that there was an intent to contribute it, or perhaps even an expectation that we'd just pick it up from the external repository where it was developed. Things would have been more clear if patches were submitted for review earlier.

7) It seems like people said some intemperate things during the earlier discussion. It also seems like these remarks were based partly on misunderstanding. I hope everyone has gotten past that, and that we are all ready to work together productively.

8) A number of patches from the folks working at University of Szeged have been landed. But it seems to me like there has also been a fair amount of abandoned work and working at cross purposes. I feel like the people working on JavaScript at U of Szeged are not entirely in sync with the main JavaScriptCore hackers. You guys have done a lot of great work, and I'd like to explore what we can do to get more in sync on design direction. Does anyone have suggestions on this front?

I know that at least some non-Apple developers have managed to do major work on JavaScriptCore internals (for example Cameron Zwarich before he became an Apple employee), so I am confident we can make things work better.

Regards,
Maciej
Re: [webkit-dev] ARM JIT and related issues
--- On Wed, 6/17/09, Maciej Stachowiak m...@apple.com wrote:

> 5) Gavin has been a strong proponent of using MacroAssembler as the primary CPU abstraction layer, and that approach has worked reasonably well so far. However, it seems at least to me that CPUs with very different instruction sets may want to do things differently at a higher level. x86 is a 2-operand instruction set with optional memory operands, and it seems to me a 3-operand load-store architecture might want to do things in a different way to get good performance.

The porting problem IMHO isn't the number of operands. The problem is that the JIT design assumes a CISC processor with the following characteristics:

1) Call/return instructions that store the return address on the stack, as on the x86 processor. If the target processor doesn't do this, then this requires a huge amount of work.

2) The JIT performs relocations in a kludgy way. Some relocations are performed before the code is copied to the final location, and some relocations after. Also, it's not clear which relocations are within a single code block and which go across code blocks, so the generated code needs to assume the worst case if the target processor has a limited number of bits for relative branches.

3) The JIT assumes the call instruction does not need to have the call address loaded into a register, which is the cause of the current bug I'm debugging. The JIT generates code which calls another call instruction directly, instead of calling the previous two instructions, which load the call address into the register.

I could go on and on, but you get the idea. Basically, the current JIT design makes a large number of x86/CISC target architecture assumptions.

Toshi
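Point 2 above turns on how far a relative branch can reach. Taking the classic ARM B/BL encoding purely as an example (the message does not say which target Toshi is porting to), the instruction carries a signed 24-bit word offset relative to the branch address plus 8, which gives a reach of roughly plus or minus 32 MB. A code generator that cannot tell whether a target is inside that window has to emit the longer load-address-then-jump sequence. A minimal range check, with a hypothetical function name:

```cpp
#include <cassert>
#include <cstdint>

// True if a classic ARM B/BL placed at `branchAddress` can reach
// `targetAddress` directly. The encoded offset is a signed 24-bit count of
// words, measured from the branch address + 8 (the pipeline-adjusted PC).
inline bool fitsInArmBranch(std::int64_t branchAddress, std::int64_t targetAddress)
{
    assert(branchAddress % 4 == 0);  // ARM-mode instructions are word aligned

    const std::int64_t offset = targetAddress - (branchAddress + 8);
    if (offset % 4 != 0)
        return false;  // target is not word aligned relative to the branch

    const std::int64_t minOffset = -(std::int64_t{1} << 25);     // -2^23 words
    const std::int64_t maxOffset = (std::int64_t{1} << 25) - 4;  // 2^23 - 1 words
    return offset >= minOffset && offset <= maxOffset;
}
```

A JIT can use a check of this kind once the final code addresses are known; before that point it has to assume the worst case, as the message describes.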
Re: [webkit-dev] ARM JIT and related issues
On Jun 16, 2009, at 5:52 PM, Toshiyasu Morita wrote:

> --- On Wed, 6/17/09, Maciej Stachowiak m...@apple.com wrote:
>
> > 5) Gavin has been a strong proponent of using MacroAssembler as the primary CPU abstraction layer, and that approach has worked reasonably well so far. However, it seems at least to me that CPUs with very different instruction sets may want to do things differently at a higher level. x86 is a 2-operand instruction set with optional memory operands, and it seems to me a 3-operand load-store architecture might want to do things in a different way to get good performance.
>
> The porting problem IMHO isn't the number of operands. The problem is that the JIT design assumes a CISC processor with the following characteristics:
>
> 1) Call/return instructions that store the return address on the stack, as on the x86 processor. If the target processor doesn't do this, then this requires a huge amount of work.
>
> 2) The JIT performs relocations in a kludgy way. Some relocations are performed before the code is copied to the final location, and some relocations after. Also, it's not clear which relocations are within a single code block and which go across code blocks, so the generated code needs to assume the worst case if the target processor has a limited number of bits for relative branches.
>
> 3) The JIT assumes the call instruction does not need to have the call address loaded into a register, which is the cause of the current bug I'm debugging. The JIT generates code which calls another call instruction directly, instead of calling the previous two instructions, which load the call address into the register.
>
> I could go on and on, but you get the idea. Basically, the current JIT design makes a large number of x86/CISC target architecture assumptions.

The issues you raise don't seem to have stopped either of the ARM ports, so they do not seem like fundamental issues.

- Maciej
Re: [webkit-dev] arm jit
Hi guys at Apple,

it looks like we are standing in the way of the train. You have plans that we don't know about; you have commit rights and we don't, so the tide is against us. Hints on the mailing lists are scarce. Although a year ago one of you asked whether others were interested in design discussions and we said 'yes', nothing has really changed. Apart from some simple bug fixes, our new patches do not really get into the mainline (except for some of our best ideas, reimplemented by someone else). This is partly due to the lack of information and of a roadmap for JavaScriptCore. That is why I feel that real openness is missing.

And here I have to make a short comment on the non-acceptance of our ARM JIT implementation. In your mail you mention that you remain reluctant to accept a duplicate of the JIT into the tree, rather than a port of the existing JIT utilizing the MacroAssembler abstraction. Well, did you check our ARM port? It was rewritten to conform to the MacroAssembler interfaces more than a month ago and posted to Bugzilla (https://bugs.webkit.org/show_bug.cgi?id=24986). Anyway, we have updated our MacroAssembler-based ARM port, uploaded it to Bugzilla, and set the review flag on the patch.

Regards,
Zoltan

> On Jun 9, 2009, at 2:38 PM, Akos Kiss wrote:
>
> > Dear Community,
> >
> > Today, we realized that there is a new ARM JIT port for WebKit. (http://trac.webkit.org/changeset/44514 ) Congratulations on getting this working!, great job.
>
> Hi Akos,
>
> Thank you! Just to clarify, we have just landed an ARMv7 architecture (thumb2) JIT backend into ToT. I say ARMv7 to distinguish this port from the ARM application instruction set found in the ARMv6 architecture and earlier (as I believe would be a common understanding of the term ARM, and which I believe your port targets). Thumb2 in ARMv7 is, of course, a very different instruction set to the traditional 32-bit instructions found in ARM – with completely different machine encodings, and significantly different capabilities (e.g. two operand versus three operand instructions, sizes of immediate operands, and options for instruction predication).
>
> For the JIT to be able to run on both ARM and ARMv7 platforms, it needs to be ported to both architectures – in much the same way the JIT is ported to both the x86 and x86-64 platforms. Obviously there is a great deal of similarity on the surface between ARM and ARMv7, but in terms of the JIT implementation that may well only be true above the level of the MacroAssembler interface. There may be some limited opportunities to share code within the Assembler classes (register numbering enums, and possibly types describing immediate operands), but since the assembler is primarily concerned with formatting machine instructions, and since the instruction encodings are different, it seems likely the bulk of the code will have to remain separate. Again, the differences in instruction selection options available on the two architectures will likely make it hard to share code within the MacroAssembler (different numbers of operands to many common instructions, and the options when working with large immediate values particularly spring to mind). We would certainly want to share code and avoid any duplication wherever it makes sense to do so.
>
> > I cannot conceal how disappointed I am, as is the whole team at Szeged.
>
> I am very sorry to hear this. If you look at the patches that landed into ToT there were very few changes made outside of the new assembler classes which, for the reasons described above, I think are highly unlikely to have much in common on the two platforms. The changes that have been made to common code outside of the assemblers should only help in removing x86 dependencies and assumptions that had existed in the code. I strongly urge you to review the changes that have been made, as I hope and believe you will find that they will assist the team in integrating your ARM port.
>
> > Of course, we've felt that you were reluctant to accept our implementation.
>
> We were (and remain) reluctant to accept a duplicate of the JIT into the tree, rather than a port of the existing JIT utilizing the MacroAssembler abstraction. We are concerned that it would be extremely difficult to continue to maintain such a port as we move the JIT technology forwards. Beyond that, the key barrier to the ARM JIT being accepted into WebKit is that there simply haven't been any patches put forwards for us to review! (I'm sorry, I'm aware you have provided a link to an external git repository, but I'm afraid we really can't seek through version control systems to find changes to review – we do need contributors to attach patches to bugs, and we need a review flag setting to indicate when the contributor believes their patch is ready. If there is any uncertainty as to the procedure, please see http://webkit.org/coding/contributing.html .)
>
> > - Are you
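Gavin's remark about "the options when working with large immediate values" can be made concrete for the traditional ARM instruction set: a data-processing instruction can only carry a constant that is an 8-bit value rotated right by an even number of bits, so many 32-bit constants need a literal-pool load or a multi-instruction sequence instead. The check below is a minimal sketch of that rule with hypothetical function names; Thumb2 uses a different immediate scheme, which this sketch does not model.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value left by `amount` bits (amount taken modulo 32).
inline std::uint32_t rotateLeft32(std::uint32_t value, unsigned amount)
{
    amount &= 31u;
    if (amount == 0)
        return value;
    return (value << amount) | (value >> (32u - amount));
}

// True if `value` equals some 8-bit constant rotated right by an even amount,
// i.e. if undoing one of the 16 possible rotations leaves at most 8 bits set
// in the low byte.
inline bool isEncodableArmImmediate(std::uint32_t value)
{
    for (unsigned rotation = 0; rotation < 32; rotation += 2) {
        if (rotateLeft32(value, rotation) <= 0xFFu)
            return true;
    }
    return false;
}

// A couple of known cases as a self-check.
inline void checkKnownImmediates()
{
    assert(isEncodableArmImmediate(512u));     // 2 rotated right by 24
    assert(!isEncodableArmImmediate(0x101u));  // set bits span 9 positions
}
```

A MacroAssembler backend for traditional ARM would typically consult a test like this when deciding how to materialize a constant, which is one reason the instruction selection code is hard to share with a backend that has different immediate rules.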
Re: [webkit-dev] arm jit
--- On Wed, 6/10/09, Gavin Barraclough barraclo...@apple.com wrote:

> If you consider calling a JS function with too few arguments as being akin to invoking a C++ method with some defaulted parameters not-provided, then it is also the responsibility of code generated for the call to such a method to ensure that values for all declared parameters are passed.)

Thanks for the long explanation. Can the arity check be performed at compile time as in C++?

Toshi
___ webkit-dev mailing list webkit-dev@lists.webkit.org http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
Re: [webkit-dev] arm jit
> Can the arity check be performed at compile time as in C++?

C++ can perform arity checks at compile time because C++ uses early binding. JavaScript uses late binding.

Geoff
Re: [webkit-dev] arm jit
On Jun 11, 2009, at 9:59 AM, Toshiyasu Morita wrote:

> --- On Wed, 6/10/09, Gavin Barraclough barraclo...@apple.com wrote:
>
>> If you consider calling a JS function with too few arguments as being akin to invoking a C++ method with some defaulted parameters not-provided, then it is also the responsibility of code generated for the call to such a method to ensure that values for all declared parameters are passed.)
>
> Thanks for the long explanation. Can the arity check be performed at compile time as in C++?
>
> Toshi

Alas no, because you can't really guarantee exactly what function will be called (except in a few relatively uncommon cases), e.g.

    function g() {
        for (var i = 0; i < 100; i++)
            f(a*i);
    }
    g();

So if we look at the call to f(a*i) we need to ask "what is the arity of f?", so the issues we need to deal with to answer this question statically are:

* the object f may not be defined, or it may not be a function at compile time -- at runtime f may have become a function, or it may not
* any function call may result in f being changed, and function calls may occur during arithmetic if you cannot guarantee the input types

These two things together mean it's not reasonably possible to guarantee the same function will be called every time, let alone have the same arity.

--Oliver
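The second bullet above can be made concrete with a short script. This is only an illustrative sketch: `oneArg`, `twoArgs`, and the object `a` are hypothetical names invented here, not anything from the thread. It relies on the fact that multiplying an object calls its `valueOf`, which is free to rebind `f` in the middle of the loop.

```javascript
// Hypothetical names throughout; the point is only that f can change mid-loop.
function oneArg(x) { return "oneArg(" + x + ")"; }
function twoArgs(x, y) { return "twoArgs(" + x + ", " + y + ")"; }

var f = oneArg;

// Converting `a` to a number runs user code, and that code rebinds f.
var a = {
  valueOf: function () {
    f = twoArgs; // side effect hidden inside the arithmetic a * i
    return 2;
  }
};

function g() {
  var seen = [];
  for (var i = 0; i < 3; i++) {
    // The callee is looked up before the argument a * i is evaluated,
    // so the first iteration still calls oneArg.
    seen.push(f(a * i));
  }
  return seen;
}

var arityBefore = f.length; // declared parameter count of oneArg
var results = g();
var arityAfter = f.length;  // declared parameter count of twoArgs

console.log(arityBefore, arityAfter, results);
```

The same call site ends up calling two different functions with two different declared arities, and the second one receives `undefined` for its missing parameter, so nothing about the arity of `f` can be fixed when `g` is compiled.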
Re: [webkit-dev] arm jit
> It looks like we are in the way of the train. You have plans, we don't know about them, you have commit rights, we don't, so the tides are against us.

If you're interested in review or commit rights, they're granted based on a track record of good work, good judgement, and good collaboration. You can read more about the policy here: http://webkit.org/coding/commit-review-policy.html .

Please work on your collaboration skills. Right now, your tone stinks.

Geoff
Re: [webkit-dev] arm jit
> If you're interested in review or commit rights, they're granted based on a track record of good work, good judgement, and good collaboration. You can read more about the policy here: http://webkit.org/coding/commit-review-policy.html . Please work on your collaboration skills. Right now, your tone stinks.

I am sorry if I was not clear. I was talking about cooperation and openness, not about commit rights. Actually, I think it is unimportant who commits a patch; the important thing is that everyone should know what is happening. That is why I wrote those boring blog posts about technical details.

Zoltan
Re: [webkit-dev] arm jit
> And here, I have to make a short comment on the non-acceptance of our ARM JIT implementation. In your mail you mention that you would "remain reluctant to accept a duplicate of the JIT into the tree, rather than a port of the existing JIT utilizing the MacroAssembler abstraction". Well, did you check our ARM port? It was rewritten to conform to the MacroAssembler interfaces more than a month ago and posted to Bugzilla (https://bugs.webkit.org/show_bug.cgi?id=24986).

Hi Zoltan,

I'm sorry if I'm misinterpreting you here, but it sounds like over the last month you have been expecting your MacroAssembler based ARM port to have been reviewed. If so, then I can quite understand your frustration, and do sympathize greatly with you. There clearly has been a breakdown in communication, and I'm sorry about your disappointment.

I'm afraid that we had no way of knowing that you considered your port to be complete. Just last week, in an email on this list, I said, "when you have a patch ready for review, please attach it to the bug and set the review flag" – I was under the impression that you did not feel your changes were yet final (by the sound of it I was mistaken). I was assuming that when a final patch was ready you would attach it to the bug, and mark it for review.

We have a procedure for accepting contributions to avoid exactly this kind of miscommunication. The mechanism for communicating to us that you believe your patch is ready is very simple, and is absolutely critical if you want to get code into WebKit. Patches ready for review must be marked as such in bugzilla. Without this we cannot tell which patches attached to bugs are complete, and which represent work in progress. I urge you to review the instructions on contributing on the website, since following these will be the only way to avoid similar disappointment in the future.

Perhaps this is an area where we need to improve our communication – perhaps we need to make these instructions clearer, or more prominent on the website. The website is all stored in svn, so please do file bugs in bugzilla – or patches welcome – if you think these can be improved.

> Anyway, we have updated the MacroAssembler-based ARM port of ours, uploaded it to the bugzilla, and set the review flag on the patch.

I've had a brief chance to look at the patch, and it's looking really great. There are some bits to clean up a little to get it through review, and we will want to land a change of this magnitude incrementally. I'm afraid that I have a busy day today, I will try to comment on the bug tonight but it may have to wait until the morning. Hopefully we can get this landed into ToT fairly quickly.

cheers,
G.
Re: [webkit-dev] arm jit
Gavin Barraclough barraclo...@apple.com wrote:

> We were (and remain) reluctant to accept a duplicate of the JIT into the tree, rather than a port of the existing JIT utilizing the MacroAssembler abstraction. We are concerned that it would be extremely difficult to continue to maintain such a port as we move the JIT technology forwards.

Umm. IMHO the existing JIT is not well designed, with processor-specific constants everywhere and optimizations such as inlining huge blocks of weakly-optimized code instead of making a function call to properly-optimized code. These issues make it difficult to both port and maintain.

If the other JIT is better designed than the current one, IMHO it should be considered for inclusion.

Toshi
Re: [webkit-dev] arm jit
--- On Wed, 6/10/09, Geoffrey Garen gga...@apple.com wrote:

> I'm having a hard time understanding from your comment what optimization changes you think are appropriate, but if you can produce a patch that implements your idea, and shows a benefit on a benchmark, I'd be happy to review it.

Consider something like op_call. This expands out to 95 inline instructions on the MIPS for just the slow case alone, of which 3 are function calls to other functions. So this probably requires thousands of clock cycles to execute. IMHO it doesn't make sense to inline op_call because:

1. It's a huge amount of JIT code just to save three or four instructions at runtime (call, return, and maybe some register shuffling).

2. The code which is executed is thousands of instructions, and saving three or four instructions is a microscopic net win.

4. It makes the generated machine code MUCH larger because instead of having one copy of this function that is written in C/C++ and statically compiled, there are multiple copies of this code for every instance of op_call, which makes the instruction cache much less effective.

5. The generated machine code is weakly optimized, so instead of calling code which is well optimized by the C/C++ compiler for MIPS, it is executing weakly optimized dynamically generated code. Since the code is weakly optimized, it is also much larger than it should be, which also makes the instruction cache much less effective.

6. The JIT-generated code resides in the data cache, and must be flushed to main memory, then the instruction cache must be invalidated so the new code will load into the instruction cache. Because the WebKit JIT seems to do lazy compilation of functions at call time (instead of compiling all the functions in one pass), this requires the data cache to be flushed and the instruction cache to be invalidated every time a new function is generated, which further degrades performance. This type of code generation strategy is ok for processors with unified caches (or pseudo-unified on x86), but for RISC machines with separate instruction and data caches it's really awful.

This is just one of the problems with the JIT on MIPS (and other RISC processors). If you're interested, I can elaborate more.

If my client is willing to pay for optimization work, I will eventually submit patches.

Toshi
Re: [webkit-dev] arm jit
> This expands out to 95 inline instructions on the MIPS for just the slow case alone, of which 3 are function calls to other functions. So this probably requires thousands of clock cycles to execute. IMHO it doesn't make sense to inline op_call because:

You've made some interesting theoretical arguments against inlining op_call, but the empirical win from inlining op_call -- on x86 and arm, at least -- was tremendous. Maybe the situation is different on MIPS. You can experiment with the JIT_OPTIMIZE_CALL preprocessor setting to test your theory.

> 1. It's a huge amount of JIT code just to save three or four instructions at runtime (call, return, and maybe some register shuffling)

I don't understand your math here. Just the code to pass arguments to a call helper function would be more than three or four instructions.

> 2. The code which is executed is thousands of instructions and saving three or four instructions is a microscopic net win.

The generated code for a call slow case is pretty lengthy, but be careful not to confuse generated code with executed code. Slow case execution is relatively rare.

> 6. The JIT-generated code resides in the data cache, and must be flushed to main memory, then the instruction cache must be invalidated so the new code will load into the instruction cache. Because the WebKit JIT seems to do lazy compilation of functions at call time (instead of compiling all the functions in one pass), this requires the data cache to be flushed and the instruction cache to be invalidated every time a new function is generated, which further degrades performance. This type of code generation strategy is ok for processors with unified caches (or pseudo-unified on x86) but for RISC machines with separate instruction and data caches, it's really awful.

It would be an interesting experiment to compile functions at creation time instead of call time, and see if things got faster. I'd love to hear your results, if you try it.

I doubt that eager compilation would be a good strategy for the web, though, since web pages tend to load very large libraries of functions, while only calling a small percentage of those functions.

Geoff
Re: [webkit-dev] arm jit
> It would be an interesting experiment to compile functions at creation time instead of call time, and see if things got faster. I'd love to hear your results, if you try it. I doubt that eager compilation would be a good strategy for the web, though, since web pages tend to load very large libraries of functions, while only calling a small percentage of those functions.

It could be worth trying a stub function that triggers the compilation of the function should it not be present, but I'm not sure what that would really save as we still need the arity checks inline -- I suppose we could lazily generate trampolines for each arity as needed and just have many trampolines, but that could be complicated, and care would be needed to ensure that changing the code pointer didn't result in either incorrectly invalidating the caching or incorrectly caching the trampoline address instead of the real function code.

--Oliver

> Geoff
Re: [webkit-dev] arm jit
> It could be worth trying a stub function that triggers the compilation of the function should it not be present, but i'm not sure what that would really save as we still need the arity checks inline

A design that I like is a stub function that triggers compilation (so the caller can always just call), combined with an arity check in the callee, which linked calls can skip (by linking to a label past the end of the arity check). I think that could simplify the calling code, while reducing its footprint.

Geoff
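The design described above can be modelled in a few lines of plain JavaScript. This is a toy sketch under stated assumptions, not WebKit code: `makeLazyFunction`, `makeStubCallSite`, `checkedEntry`, and `fastEntry` are hypothetical names, "compiling" just builds closures, and only the too-few-arguments case is handled.

```javascript
// Toy model: closures stand in for generated machine code.
var compileCount = 0;

function makeLazyFunction(declaredArity, body) {
  var fn = { declaredArity: declaredArity, fastEntry: null };

  function compile() {
    compileCount++;
    // Entry "past the arity check": assumes the caller passes enough arguments.
    fn.fastEntry = function (args) {
      return body.apply(null, args);
    };
    // Entry that performs the arity fix-up, then continues at the fast entry.
    fn.checkedEntry = function (args) {
      var fixed = args.slice(0);
      while (fixed.length < declaredArity) fixed.push(undefined);
      return fn.fastEntry(fixed);
    };
  }

  // Before first use, the entry is a stub that compiles and re-dispatches,
  // so the caller can always "just call".
  fn.checkedEntry = function (args) {
    compile();
    return fn.checkedEntry(args);
  };
  return fn;
}

// A call site passes a fixed number of arguments, as a JIT call site would.
function makeStubCallSite(argCount) {
  var linkedCallee = null;
  var site = { fastCalls: 0 };
  site.call = function (fn, args) {
    if (fn === linkedCallee) {
      site.fastCalls++;
      return fn.fastEntry(args); // linked call: skips the arity check
    }
    var result = fn.checkedEntry(args); // unlinked call: stub or arity check
    if (argCount === fn.declaredArity) linkedCallee = fn; // safe to link
    return result;
  };
  return site;
}

var add = makeLazyFunction(2, function (x, y) { return x + y; });
var matching = makeStubCallSite(2);
var first = matching.call(add, [1, 2]);  // compiles, checks arity, links
var second = matching.call(add, [3, 4]); // takes the linked fast path
var padded = makeStubCallSite(1).call(add, [5]); // arity mismatch: never links

console.log(first, second, padded, compileCount, matching.fastCalls);
```

In this model the function is compiled once, on first call, and only call sites whose argument count matches the declared arity ever reach the entry point that skips the check.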
Re: [webkit-dev] arm jit
Why does the arity check need to be in the caller, and not the callee?

Consider: one function that is called from 10,000 places.

Arity check in the caller: 10,000 copies of the arity check.
Arity check in the callee: one copy of the arity check.

Toshi

--- On Wed, 6/10/09, Geoffrey Garen gga...@apple.com wrote:

> From: Geoffrey Garen gga...@apple.com
> Subject: Re: [webkit-dev] arm jit
> To: Oliver Hunt oli...@apple.com
> Cc: Toshiyasu Morita tm_web...@yahoo.com, WebKit Development webkit-dev@lists.webkit.org
> Date: Wednesday, June 10, 2009, 9:14 PM
>
>> It could be worth trying a stub function that triggers the compilation of the function should it not be present, but i'm not sure what that would really save as we still need the arity checks inline
>
> A design that I like is a stub function that triggers compilation (so the caller can always just call), combined with an arity check in the callee, which linked calls can skip (by linking to a label past the end of the arity check). I think that could simplify the calling code, while reducing its footprint.
>
> Geoff
Re: [webkit-dev] arm jit
--- On Wed, 6/10/09, Oliver Hunt oli...@apple.com wrote:

> I doubt that eager compilation would be a good strategy for the web, though, since web pages tend to load very large libraries of functions, while only calling a small percentage of those functions.

Turbo C compiled about 10,000 lines of source code per second on an ancient 12 MHz PC AT. It does register allocation, common subexpression elimination, and a bunch of other classical compiler optimizations.

Most modern processors are from about 200 MHz to about 3 GHz, which is significantly faster than a PC AT. If you use simple linear extrapolation, that's a compile speed of about 160k-2m lines per second.

That seems adequate to compile even fairly large libraries of functions.

Toshi
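The linear extrapolation above is simple enough to spell out. The sketch below only scales the quoted figure (10,000 lines per second at 12 MHz) by clock rate; it assumes nothing else changes between machines, which is the same simplification the message makes, and the variable names are chosen here for illustration.

```javascript
// Figures quoted in the message above.
var baselineLinesPerSecond = 10000;
var baselineMHz = 12;

// Pure clock-rate scaling; ignores memory, caches, and compiler differences.
function extrapolateLinesPerSecond(mhz) {
  return baselineLinesPerSecond * (mhz / baselineMHz);
}

var lowEstimate = Math.round(extrapolateLinesPerSecond(200));   // 200 MHz
var highEstimate = Math.round(extrapolateLinesPerSecond(3000)); // 3 GHz

console.log(lowEstimate, highEstimate); // 166667 2500000
```

That gives roughly 167k to 2.5M lines per second, the same order of magnitude as the "about 160k-2m" quoted.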
Re: [webkit-dev] arm jit
The issue is that compiling 5000 lines of libraries (possibly more) results in a significant amount of memory use; that's why we don't compile -- I don't believe there was a significant cpu time performance win (if any at all) from delaying function compilation. There was however a significant memory win for most pages the user visited.

--Oliver

On Jun 10, 2009, at 2:20 PM, Toshiyasu Morita wrote:

> --- On Wed, 6/10/09, Oliver Hunt oli...@apple.com wrote:
>
>> I doubt that eager compilation would be a good strategy for the web, though, since web pages tend to load very large libraries of functions, while only calling a small percentage of those functions.
>
> Turbo C compiled about 10,000 lines of source code per second on an ancient 12 MHz PC AT. It does register allocation, common subexpression elimination, and a bunch of other classical compiler optimizations.
>
> Most modern processors are from about 200 MHz to about 3 GHz, which is significantly faster than a PC AT. If you use simple linear extrapolation, that's a compile speed of about 160k-2m lines per second.
>
> That seems adequate to compile even fairly large libraries of functions.
>
> Toshi
Re: [webkit-dev] arm jit
On Jun 10, 2009, at 1:15 PM, Toshiyasu Morita wrote:

> --- On Wed, 6/10/09, Geoffrey Garen gga...@apple.com wrote:
>
>> I'm having a hard time understanding from your comment what optimization changes you think are appropriate, but if you can produce a patch that implements your idea, and shows a benefit on a benchmark, I'd be happy to review it.
>
> Consider something like op_call. This expands out to 95 inline instructions on the MIPS for just the slow case alone, of which 3 are function calls to other functions. So this probably requires thousands of clock cycles to execute. IMHO it doesn't make sense to inline op_call because:

[ I'm sorry, I've been away from a net connection, I may be replicating a couple of things ggaren and olliej have already said. ]

Okay! First up, have you tried turning off ENABLE_JIT_OPTIMIZE_CALL? If you do so, it should address the majority of your concerns below (specifically, reducing code size, and removing the need for op_call to patch generated code). Of course, we added the call optimizations because we measure them as a significant performance improvement, but feel free to test whether this is true on your platform, and once the MIPS JIT is in the tree we'd be happy to consider changes to the optimized mode that aid MIPS performance.

> 1. It's a huge amount of JIT code just to save three or four instructions at runtime (call, return, and maybe some register shuffling)
>
> 2. The code which is executed is thousands of instructions and saving three or four instructions is a microscopic net win.
>
> 4. It makes the generated machine code MUCH larger because instead of having one copy of this function that is written in C/C++ and statically compiled, there are multiple copies of this code for every instance of op_call, which makes the instruction cache much less effective.

I think it's worth making sure you understand the optimization here. The majority of calls can be optimized, and having been optimized only run the sequence of instructions planted in the main generation pass. This code path is only a handful of instructions long, and introducing an extra call and return onto this path would almost certainly degrade performance (feel free to try doing so, and please do submit any patches that provide a memory saving, without significantly degrading performance). For such a short and performance critical fragment of code it clearly could make sense to tweak the code for specific platforms, and it may well provide a significant performance benefit to do so. We should certainly consider such patches.

The slow case JIT code is much longer, and less frequently executed. Introducing a call and return here to share code between calls definitely makes sense. The way you know we think that is, the JIT already works this way! The slow cases call out to a set of shared trampolines generated in privateCompileCTIMachineTrampolines. This is, however, a work in progress, and we are currently still clearly generating far more code than we should be in the slow cases. More work should be done to unify the pre-linked and post-link slow case states, and to move work into the trampolines (this is something I may be looking at again fairly soon).

It is certainly valid to question whether the work performed by the machine trampolines is better in JIT generated code, or in C++ code that the compiler can optimize. In the early stages of its development the JIT was more a context threaded interpreter, calling out to C++ to perform almost all optimizations. We have migrated work into JIT generated code only where it has been a performance benefit to do so. Of course, that doesn't mean that we always got it right, or that the trade-offs haven't changed, or that the policy might not need to be tweaked on different platforms. Please feel free to experiment, and if you can produce patches that reduce the amount of work done in these JIT generated trampolines while improving performance then we'll be hugely appreciative (in fact, it needn't even be a performance win here – anything that doesn't degrade performance could be a nice simplification).

> 5. The generated machine code is weakly optimized, so instead of calling code which is well optimized by the C/C++ compiler for MIPS, it is executing weakly optimized dynamically generated code. Since the code is weakly optimized, it is also much larger than it should be, which also makes the instruction cache much less effective.
>
> 6. The JIT-generated code resides in the data cache, and must be flushed to main memory, then the instruction cache must be invalidated so the new code will load into the instruction cache. Because the WebKit JIT seems to do lazy compilation of functions at call time (instead of compiling all the functions in one pass), this requires the data cache to be flushed and the
Re: [webkit-dev] arm jit
Toshiyasu,

On Jun 10, 2009, at 2:24 PM, Toshiyasu Morita wrote:

> Why does the arity check need to be in the caller, and not the callee?

The majority of call sites always call to the same callee, and we can optimize these cases for calling that same function repeatedly. Within the optimized path of the op_call, where we are linked to a specific callee, we can statically determine at compile time that the caller and callee have the same arity, and omit the dynamic arity check altogether – which is a performance win. Within the callee (where you might have been called from multiple call sites with different numbers of arguments passed) it is not clear how you would implement such an optimization. But patches welcome as ever!

(btw, I was on your side on this one, the arity check in the callee seemed more natural to me at first, until the benefits of performing the check in the caller became clear. That said, from a design point of view, fixing up arity at the call site can also make sense in matching (or at least being analogous to) calling conventions of other languages. If you consider calling a JS function with too few arguments as being akin to invoking a C++ method with some defaulted parameters not-provided, then it is also the responsibility of code generated for the call to such a method to ensure that values for all declared parameters are passed.)

> Consider: one function that is called from 10,000 places.
>
> Arity check in the caller: 10,000 copies of the arity check.
> Arity check in the callee: one copy of the arity check

You're not taking into account that we don't generate the arity check inline, instead it is in a shared trampoline.

Consider: one hundred functions that are called from 10,000 places.

Arity check in the caller: 10,000 copies of the arity check.
Arity check in the callee: one hundred copies of the arity check.
Arity check in a set of 3 shared trampolines (which is how the JIT is currently implemented): 3 copies.

(3 due to the stages the call linking goes through, go see the code to find out why!)

cheers,
G.
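The caller-side scheme described above can be sketched as a toy model. Everything here is hypothetical illustration rather than WebKit code: `sharedArityCheck` stands in for a shared trampoline, `makeLinkingCallSite` for a call site that links itself to one callee, and `Function.prototype.length` stands in for the declared parameter count. The model uses a single shared routine rather than the three the message mentions.

```javascript
var dynamicArityChecks = 0;

// One shared routine, however many call sites exist: the stand-in for a
// shared trampoline that fixes up missing arguments.
function sharedArityCheck(callee, args) {
  dynamicArityChecks++;
  var fixed = args.slice(0);
  while (fixed.length < callee.length) fixed.push(undefined);
  return callee.apply(null, fixed);
}

// A call site always passes argCount arguments. Once it has seen a callee
// whose declared arity matches, it links to it and skips the check.
function makeLinkingCallSite(argCount) {
  var linkedCallee = null;
  return function (callee, args) {
    if (callee === linkedCallee) {
      return callee.apply(null, args); // linked: arity already known to match
    }
    var result = sharedArityCheck(callee, args);
    if (callee.length === argCount) linkedCallee = callee;
    return result;
  };
}

function square(x) { return x * x; }

var hotSite = makeLinkingCallSite(1);
var total = 0;
for (var n = 0; n < 10000; n++) {
  total += hotSite(square, [n % 10]);
}

console.log(total, dynamicArityChecks); // 285000 1
```

Ten thousand calls through the same site run the dynamic check once, at link time, and the check itself exists in one place no matter how many call sites or callees there are.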
[webkit-dev] arm jit
Dear Community,

Today, we realized that there is a new ARM JIT port for WebKit. (http://trac.webkit.org/changeset/44514) Congratulations on getting this working!, great job.

I cannot conceal how disappointed I am, as is the whole team at Szeged. It was months ago, when we presented you our first results in the Bugzilla (https://bugs.webkit.org/show_bug.cgi?id=24986). Since then, we exchanged ideas in several comments and even received feedback and suggestions from you. Now, I pose the question (possibly only to myself): For what end? I'm quite sure that the work on your side has been ongoing for some time now. Still, you did not mention anything. We did not even get a notification on the new port.

Of course, we've felt that you were reluctant to accept our implementation. At least, now we know why.

As WebKit currently is, it is not an open community. Not at all.

Let me ask two final questions:

- Are you still looking for patches, bug reports, feature requests, etc., or is it all in vain - you will get everything done in-house?
- Should we ask for credits in the new files, as it was done by you when we first published our JIT implementation? I'm quite sure that we can state the same argument: a number of the new files appear to have taken large chunks of logic from existing jit files.

Best regards,
Akos
Re: [webkit-dev] arm jit
Hi Akos.

> Today, we realized that there is a new ARM JIT port for WebKit. (http://trac.webkit.org/changeset/44514 ) Congratulations on getting this working!, great job.

Thanks.

> I cannot conceal how disappointed I am, as is the whole team at Szeged.

I'm sorry to hear that. I understand your disappointment, but I have to admit that I'm excited to have a close-to-enabled, fast ARM JIT in the WebKit repository, that builds on JavaScriptCore's existing infrastructure and design.

> It was months ago, when we presented you our first results in the Bugzilla (https://bugs.webkit.org/show_bug.cgi?id=24986). Since then, we exchanged ideas in several comments and even received feedbacks and suggestions from you. Now, I pose the question (possibly only to myself): For what end? I'm quite sure that the work on your side has been ongoing for some time now. Still, you did not mention anything. We did not even get a notification on the new port. Of course, we've felt that you were reluctant to accept our implementation. At least, now we know why. As WebKit currently is, it is not an open community. Not at all.

I don't think that's fair. Looking at https://bugs.webkit.org/show_bug.cgi?id=24986, I see three detailed comments by Oliver Hunt, and two by Gavin Barraclough, explaining our thinking about how an ARM JIT should be organized to best effect. I think you'll find those comments strongly reflected in http://trac.webkit.org/changeset/44514 .

> Let me ask two final questions:
> - Are you still looking for patches, bug reports, feature requests, etc.,

Yes.

> - Should we ask for credits in the new files, as it was done by you when we first published our JIT implementation? I'm quite sure that we can state the same argument: a number of the new files appear to have taken large chunks of logic from existing jit files.

Generally, a new file that copies substantial code from an old file should include the old file's copyrights, along with the new author's copyright. Usually, this happens naturally via svn cp, but sometimes people forget or use other processes. If you notice a patch that is remiss in this area, please do mention something, with specifics.

Cheers,
Geoff
Re: [webkit-dev] arm jit
On Tuesday 09 June 2009 23:38:43 Akos Kiss wrote:

> Dear Community, Today, we realized that there is a new ARM JIT port for WebKit. (http://trac.webkit.org/changeset/44514) Congratulations on getting this working!, great job. I cannot conceal how disappointed I am, as is the whole team at Szeged.

I can understand how bad you feel and I agree that the "Apple does not comment on future products" mantra can suck at times. When you look at the ARM JIT of Apple you will see they mostly target Cortex-A8 (thumb2, vfp) and IIRC your JIT is much wider (supporting many more existing devices). So my bottom line is something like "please don't give up", and contributing to the JIT should be easier now. E.g. pick a topic/theme for your work and try to get it in.

I cannot share the view that WebKit is not an open community. Of course there are vendor interests and Apple, Nokia and Torchmobile don't share some of them. E.g. Apple probably does not see the value of XHTML MP, ECMA MP and WML and still these changes go on...

z.
Re: [webkit-dev] arm jit
Hi there,

I would also say that it is pretty understandable that Apple does not share information about working on an ARM JIT targeting thumb2, especially as this can be used to foresee the hardware of future iPhone models. Something they are probably not interested in revealing.

I agree wholeheartedly with Holger that the chances for getting ARM JIT enhancements upstreamed now have been greatly improved; and with Apple working on the same code I would expect good reviews and lots of care for the ARM code base.

We really rely on you guys for bringing the ARM JIT code to other versions of ARM, and I would like to say that I consider your contributions important and very welcome, and would like to encourage you to keep contributing!

Please keep up the good work!

Kenneth

On Tue, Jun 9, 2009 at 9:20 PM, Holger Freyther ze...@selfish.org wrote:

> On Tuesday 09 June 2009 23:38:43 Akos Kiss wrote:
>
>> Dear Community, Today, we realized that there is a new ARM JIT port for WebKit. (http://trac.webkit.org/changeset/44514) Congratulations on getting this working!, great job. I cannot conceal how disappointed I am, as is the whole team at Szeged.
>
> I can understand how bad you feel and I agree that the "Apple does not comment on future products" mantra can suck at times. When you look at the ARM JIT of Apple you will see they mostly target Cortex-A8 (thumb2, vfp) and IIRC your JIT is much wider (supporting many more existing devices). So my bottom line is something like "please don't give up", and contributing to the JIT should be easier now. E.g. pick a topic/theme for your work and try to get it in.
>
> I cannot share the view that WebKit is not an open community. Of course there are vendor interests and Apple, Nokia and Torchmobile don't share some of them. E.g. Apple probably does not see the value of XHTML MP, ECMA MP and WML and still these changes go on...
>
> z.
Re: [webkit-dev] arm jit
On Jun 9, 2009, at 2:38 PM, Akos Kiss wrote: Dear Community, Today, we realized that there is a new ARM JIT port for WebKit. (http://trac.webkit.org/changeset/44514 ) Congratulations on getting this working!, great job. Hi Akos, Thank you! Just to clarify, we have just landed a ARMv7 architecture (thumb2) JIT backend into ToT. I say ARMv7 to distinguish this port from the ARM application instruction set found in the ARMv6 architecture and earlier (as I believe would be a common understanding of the term ARM, and and which I believe your port targets). Thumb2 in ARMv7 is, of course, a very different instruction set to the traditional 32-bit instructions found in ARM – with completely different machine encodings, and significantly different capabilities (e.g. two operand versus three operand instructions, sizes of immediate operands, and options for instruction predication). For the JIT to be able to run on both ARM and ARMv7 platforms, it needs to be ported to both architectures – in much the same way the the JIT is ported to both the x86 and x86-64 platforms. Obviously there is a great deal of similarity on the surface between ARM and ARMv7, but in terms of the JIT implementation that may well only be true above the level of the MacroAssembler interface. There may be some limited opportunities to share code within the Assembler classes (register numbering enums, and possibly types describing immediate operands), but since the assembler is primarily concerned with formatting machine instructions, and since the instruction encodings are different, it seems likely the bulk of the code will have to remain separate. Again, the differences in instruction selection options available on the two architectures will likely make it hard to share code within the MacroAssember (different numbers of operands to many common instructions, and the options when working with large immediate values particularly spring to mind). 
We would certainly want to share code and avoid any duplication wherever it makes sense to do so.

I cannot conceal how disappointed I am, as is the whole team at Szeged.

I am very sorry to hear this. If you look at the patches that landed into ToT, there were very few changes made outside of the new assembler classes, which, for the reasons described above, I think are highly unlikely to have much in common on the two platforms. The changes that have been made to common code outside of the assemblers should only help in removing x86 dependencies and assumptions that had existed in the code. I strongly urge you to review the changes that have been made, as I hope and believe you will find that they will assist the team in integrating your ARM port.

Of course, we've felt that you were reluctant to accept our implementation.

We were (and remain) reluctant to accept a duplicate of the JIT into the tree, rather than a port of the existing JIT utilizing the MacroAssembler abstraction. We are concerned that it would be extremely difficult to continue to maintain such a port as we move the JIT technology forwards. Beyond that, the key barrier to the ARM JIT being accepted into WebKit is that there simply haven't been any patches put forward for us to review! (I'm sorry, I'm aware you have provided a link to an external git repository, but I'm afraid we really can't search through version control systems to find changes to review; we do need contributors to attach patches to bugs, and we need a review flag set to indicate when the contributor believes their patch is ready. If there is any uncertainty as to the procedure, please see http://webkit.org/coding/contributing.html .)

- Are you still looking for patches, bug reports, feature requests, etc., or is it all in vain - you will get everything done in house?

Yes! Please do so, this is the only way your changes will get into the tree.
- Should we ask for credits in the new files, as it was done by you when we first published our JIT implementation? I'm quite sure that we can state the same argument: a number of the new files appear to have taken large chunks of logic from existing jit files.

The new files were derived from their x86 counterparts, with reference to the ARMv7 manuals. As such, the existing copyright notifications within the files from which they were derived have been retained.

(Apologies for the slow reply, we have a busy week with WWDC on!)

cheers,
G.

Best regards,
Akos