Re: [webkit-dev] ARM JIT for WinCE

2010-01-08 Thread Zoltan Herczeg
Hi,

dateProtoFuncGetTimezoneOffset does not use the argList argument,
while functionPrint does. Perhaps the way this argument is passed is not
yet WinCE compatible. ArgList contains a pointer to the arguments (JSValue
pointers) and the number of arguments. This structure is 8 bytes on
32-bit machines (1 pointer, 1 int), and it is allocated on the stack
because the function receives a reference (pointer) to it.

Could you try the following JS code: print(a, 1, true)
The length should be 3.
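
A minimal sketch of the layout described above, for readers who want to check the numbers. ArgListModel32, makeArgList and argumentBytes are hypothetical names for illustration only, not the real JavaScriptCore class; the pointer is modeled as a 32-bit integer so the size is the same on any host, and the 4-byte JSValue slot is an assumption for the non-JSVALUE32_64 build.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for ArgList on a 32-bit target (not the real class).
struct ArgListModel32 {
    std::uint32_t firstArgAddress; // address of the first JSValue argument
    std::int32_t argCount;         // number of arguments
};

// One pointer plus one int: 8 bytes, as described in the message.
static_assert(sizeof(ArgListModel32) == 8, "expected 8 bytes for pointer + int");

// Builds the descriptor that a native function would receive by reference.
constexpr ArgListModel32 makeArgList(std::uint32_t firstArgAddress, std::int32_t argCount)
{
    return ArgListModel32{firstArgAddress, argCount};
}

// Size of the argument array itself, assuming 4-byte JSValue slots.
constexpr std::size_t argumentBytes(const ArgListModel32& args)
{
    return static_cast<std::size_t>(args.argCount) * 4;
}
```

For a call with three arguments, such as the print call suggested above, makeArgList(address, 3) carries a count of 3 and describes 12 bytes of argument slots.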

Zoltan

 Hi,

 I did some further investigation today.

 I did a quick hack in privateCompileCTIMachineTrampolines to get the same
 (maybe correct) register values as without OPTIMIZE_NATIVE_CALL.

  move(callFrameRegister, regT0);

 +move(ARMRegisters::r2, ARMRegisters::r3);
 +move(ARMRegisters::r1, ARMRegisters::r2);
 +move(ARMRegisters::r0, ARMRegisters::r1);
 -move(stackPointerRegister, ARMRegisters::r3);
 +move(stackPointerRegister, ARMRegisters::r0);
 -call(Address(regT1, OBJECT_OFFSETOF(JSFunction, m_data)));
 +call(Address(regT2, OBJECT_OFFSETOF(JSFunction, m_data)));

  addPtr(Imm32(sizeof(ArgList)), stackPointerRegister);

 Now it produces the following code:

 003E01B0  muls r0, r3, r0
 003E01B4  subs r1, r1, r0
 003E01B8  str r1, [sp]
 003E01BC  ldr r2, [r1, #-4]
 003E01C0  ldr r1, [r4, #-8]
 003E01C4  mov r0, r4
 003E01C8  mov r3, r2
 003E01CC  mov r2, r1
 003E01D0  mov r1, r0
 003E01D4  mov r0, sp
 003E01D8  mov lr, pc
 003E01DC  ldr pc, [r2, #0x1C]
 003E01E0  adds sp, sp, #8
 003E01E4  ldr r3, [pc, #0x80]
 003E01E8  ldr r2, [r3]
 003E01EC  bics r3, r2, #0
 003E01F0  bne 003E0204

 The arguments seem to be sane now in the call to
 dateProtoFuncGetTimezoneOffset, but it crashes afterwards.
 When I step through it with the debugger, I get the following registers
 after the function has finished, and it jumps to 0x000139d8 instead of
 0x003e01e0 (lr = 0x003e01e0 when I enter the function!):

 R0 = 0x182af984 R1 = 0x003f8054 R2 = 0x00601500 R3 = 0x0060
 R4 = 0x003f8054 R5 = 0x0200 R6 = 0x182af984 R7 = 0x003f8054
 R8 = 0x R9 = 0x182afbfc R10 = 0x R11 = 0x002b0370
 R12 = 0x182af8f0 Sp = 0x182af95c Lr = 0x003e01e0
 Pc = 0x000139d8 Psr = 0x201f

 I then tried to return jsNaN(exec) always. So R4 won't be used and
 prolog/epilog changed:

 00071600  mov r12, sp
 00071604  stmdb   sp!, {r0 - r3}
 00071608  stmdb   sp!, {r4, r12, lr}
 0007160C  sub sp, sp, #0x1C
 
 00071700  ldr r0, [sp, #8]
 00071704  add sp, sp, #0x1C
 00071708  ldmia   sp, {r4, sp, pc}

 changed to

 000734EC  mov r12, sp
 000734F0  stmdb   sp!, {r0 - r3}
 000734F4  stmdb   sp!, {r12, lr}
 000734F8  sub sp, sp, #0x1C
 
 000735A4  ldr r0, [sp, #8]
 000735A8  add sp, sp, #0x1C
 000735AC  ldmia   sp, {sp, pc}

 I now get the following registers and it jumps to the correct address
 (0x003e01e0), but then it crashes in functionPrint.

 R0 = 0x182af984 R1 = 0x182af8f8 R2 = 0x R3 = 0x182af984
 R4 = 0x003f8080 R5 = 0x0200 R6 = 0x0060 R7 = 0x003e07c8
 R8 = 0x R9 = 0x182afbfc R10 = 0x R11 = 0x002b0370
 R12 = 0x03fc2c50 Sp = 0x182af984 Lr = 0x0001bc18
 Pc = 0x003e01e0 Psr = 0x601f

 I tried jsc.exe with the following javascript file:
 print(getTimeZoneDiff());
 function getTimeZoneDiff() {
 return (new Date(2000, 1, 1)).getTimezoneOffset();
 }

 This doesn't make much sense to me at the moment.

 - Patrick



___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


Re: [webkit-dev] ARM JIT for WinCE

2010-01-07 Thread Zoltan Herczeg
Hi Patrick,

hm, I think I found something. Please have a look at
JavaScriptCore/jit/JITOpcodes.cpp : privateCompileCTIMachineTrampolines
(the second one, used when JSVALUE32_64 is disabled). If JIT_OPTIMIZE_NATIVE_CALL
is enabled, specialized code is generated to call native builtin
functions (like Date.toString). The ARM version of this code is around line 1733.
Perhaps the WinCE ABI wants the arguments passed differently than GCC does. The
faulting address according to your call stack is 0x003e01d4, which is the
call(Address(regT1, OBJECT_OFFSETOF(JSFunction, m_data))); macro
assembler instruction in line 1768. (Thank you for sending the instruction
dump.) Please try to fix this code according to the WinCE ABI, since I am not
sure JIT_OPTIMIZE_NATIVE_CALL can be disabled.

Regards
Zoltan

 Hi Gabor,

 Thanks for your prompt reply.

 Make sure your assembler does not break ctiVMThrowTrampoline
 and ctiOpThrowNotCaught functions. This approach requires that the
 ctiVMThrowTrampoline fall-backs to ctiOpThrowNotCaught
 after 'bl cti_vm_throw' call. Or you can simply copy the body of
 ctiOpThrowNotCaught into ctiVMThrowTrampoline after the
 call.
 I've copied it, but I think it's unnecessary (see disassembly)

 Did you do anything with DEFINE_STUB_FUNCTION macro?
 I've done it like for the RVCT compiler. (e.g. see cti_op_end in
 disassembly)

 When I run jsc.exe tests\mozilla\ecma_2\shell.js it crashes with the
 following callstack:
 0x
 jsc.EXE!JSC::JSCell::inherits(JSC::ClassInfo* info = 0x00189818) Line: 335, Byte Offsets: 0x2c
 jsc.EXE!JSC::JSValue::inherits(JSC::ClassInfo* classInfo = 0x00189818) Line: 345, Byte Offsets: 0x40
 jsc.EXE!JSC::dateProtoFuncGetTimezoneOffset(JSC::ExecState* exec = 0x00601b60, JSC::JSObject* __formal = 0x00601b40, JSC::JSValue thisValue = {...}, JSC::ArgList __formal = {...}) Line: 764, Byte Offsets: 0x1c
 0x003e01d4

 Is there a better javascript file to start with? When I enter a simple
 1+2+3 into the interactive jsc.exe it prints the correct result.

 Here are some parts of the disassembly:

 // Execute the code!
 inline JSValue execute(RegisterFile* registerFile, CallFrame* callFrame, JSGlobalData* globalData, JSValue* exception)
 {
 000A7868  mov r12, sp
 000A786C  stmdb   sp!, {r0 - r3}
 000A7870  stmdb   sp!, {r12, lr}
 000A7874  sub sp, sp, #0x20
 return JSValue::decode(ctiTrampoline(m_ref.m_code.executableAddress(), registerFile, callFrame, exception, Profiler::enabledProfilerReference(), globalData));
 000A7878  bl  |JSC::Profiler::enabledProfilerReference ( 1b2e0h )|
 000A787C  str r0, [sp, #0x14]
 000A7880  ldr r0, this
 000A7884  bl  |WTF::RefPtrJSC::Profile::operator- ( d2e3ch )|
 000A7888  str r0, [sp, #0x18]
 000A788C  ldr r3, globalData
 000A7890  str r3, [sp, #4]
 000A7894  ldr r3, [sp, #0x14]
 000A7898  str r3, [sp]
 000A789C  ldr r3, exception
 000A78A0  ldr r2, callFrame
 000A78A4  ldr r1, registerFile
 000A78A8  ldr r0, [sp, #0x18]
 000A78AC  bl  0014A000
 000A78B0  str r0, [sp, #0x1C]
 000A78B4  ldr r1, [sp, #0x1C]
 000A78B8  ldr r0, [sp, #0x2C]
 000A78BC  bl  |JSC::JSValue::decode ( 1b94ch )|
 000A78C0  ldr r3, [sp, #0x2C]
 000A78C4  str r3, [sp, #0x10]
 }
 000A78C8  ldr r0, [sp, #0x10]
 000A78CC  add sp, sp, #0x20
 000A78D0  ldmia   sp, {sp, pc}

 

 ctiTrampoline:
 0014A000  stmdb   sp!, {r1 - r3}
 0014A004  stmdb   sp!, {r4 - r8, lr}
 0014A008  sub sp, sp, #0x24
 0014A00C  mov r4, r2
 0014A010  mov r5, #2, 24
 0014A014  mov lr, pc
 0014A018  bx  r0    // r0 = 0x003e0270
 0014A01C  add sp, sp, #0x24
 0014A020  ldmia   sp!, {r4 - r8, lr}
 0014A024  add sp, sp, #0xC
 0014A028  bx  lr
 ctiVMThrowTrampoline:
 0014A02C  mov r0, sp
 0014A030  bl  0014A6D4
 0014A034  add sp, sp, #0x24
 0014A038  ldmia   sp!, {r4 - r8, lr}
 0014A03C  add sp, sp, #0xC
 0014A040  bx  lr
 ctiOpThrowNotCaught:
 0014A044  add sp, sp, #0x24
 0014A048  ldmia   sp!, {r4 - r8, lr}
 0014A04C  add sp, sp, #0xC
 0014A050  bx  lr
 cti_op_convert_this:
 0014A054  str lr, [sp, #0x20]
 0014A058  bl  |JITStubThunked_op_convert_this ( ae718h )|
 0014A05C  ldr lr, [sp, #0x20]
 0014A060  bx  lr
 cti_op_end:
 0014A064  str lr, [sp, #0x20]
 0014A068  bl  |JITStubThunked_op_end ( ae878h )|
 0014A06C  ldr lr, [sp, #0x20]
 0014A070  bx  lr

 

 003E017C  mov pc, r0
 003E0180  mov r0, lr
 003E0184  str r0, [r4, #-0x14]
 003E0188  ldr r1, [r4, 

Re: [webkit-dev] ARM JIT for WinCE

2010-01-07 Thread Patrick Roland Gansterer
Hi,

many thanks! It works already when I disable OPTIMIZE_NATIVE_CALL (other 3 
OPTIMIZE are turned on). I think you're right with the ABI problem. Maybe you 
can help me with it too: Here are the instruction dumps with and without the 
OPTIMIZE_NATIVE_CALL:

==
== #define OPTIMIZE_NATIVE_CALL = 1 ==
==

003E0100  ldr r8, [r2, #8] 
003E0104  cmp r8, #0 
003E0108  bgt 003E012C 
003E010C  mov r7, lr 
003E0110  mov r0, sp 
003E0114  str r4, [sp, #0x40] 
003E0118  mov lr, pc 
003E011C  ldr pc, [pc, #0x128] 
003E0120  ldr r1, [sp, #0xC] 
003E0124  mov lr, r7 
003E0128  ldr r2, [r0, #0x18] 
003E012C  ldr r8, [r2, #8] 
003E0130  cmp r8, r1 
003E0134  beq 003E0160 
003E0138  mov r7, lr 
003E013C  str r7, [sp, #8] 
003E0140  mov r0, sp 
003E0144  str r4, [sp, #0x40] 
003E0148  mov lr, pc 
003E014C  ldr pc, [pc, #0x100] 
003E0150  mov r4, r1 
003E0154  ldr r1, [sp, #0xC] 
003E0158  mov lr, r7 
003E015C  ldr r2, [r0, #0x18] 
003E0160  str r1, [r4, #-0xC] 
003E0164  ldr r1, [r0, #0x1C] 
003E0168  ldr r8, [pc, #0xE8] 
003E016C  str r8, [r4, #-4] 
003E0170  str r0, [r4, #-8] 
003E0174  str r1, [r4, #-0x1C] 
003E0178  ldr r0, [r2, #0xC] 
003E017C  mov pc, r0 
003E0180  mov r0, lr 
003E0184  str r0, [r4, #-0x14] 
003E0188  ldr r1, [r4, #-0x18] 
003E018C  ldr r1, [r1, #-0x1C] 
003E0190  str r1, [r4, #-0x1C] 
003E0194  ldr r0, [r4, #-0xC] 
003E0198  subs sp, sp, #8
003E019C  subs r0, r0, #1
003E01A0  str r0, [sp, #4]
003E01A4  mov r1, r4
003E01A8  subs r1, r1, #0x20
003E01AC  mov r3, #4
003E01B0  muls r0, r3, r0
003E01B4  subs r1, r1, r0
003E01B8  str r1, [sp] 
003E01BC  ldr r2, [r1, #-4] 
003E01C0  ldr r1, [r4, #-8] 
003E01C4  mov r0, r4 
003E01C8  mov r3, sp 
003E01CC  mov lr, pc 
003E01D0  ldr pc, [r1, #0x1C] 
// R0 = 0x003f8080 R1 = 0x00601780 R2 = 0x00601760 R3 = 0x182af984
// R4 = 0x003f8080 R5 = 0x0200 R6 = 0x0060 R7 = 0x003e07b8
// R8 = 0x R9 = 0x182afbfc R10 = 0x R11 = 0x002b0370
// R12 = 0x182af8f0 Sp = 0x182af984 Lr = 0x003e01d4
// Pc = 0x00073468 Psr = 0x201f
003E01D4  adds sp, sp, #8
003E01D8  ldr r3, [pc, #0x7C] 
003E01DC  ldr r2, [r3] 
003E01E0  bics r3, r2, #0
003E01E4  bne 003E01F8 
003E01E8  ldr r1, [r4, #-0x14] 
003E01EC  ldr r4, [r4, #-0x18] 
003E01F0  mov lr, r1 
003E01F4  mov pc, lr 
003E01F8  ldr r1, [r4, #-0x14] 
003E01FC  ldr r2, [pc, #0x60] 
003E0200  str r1, [r2] 
003E0204  ldr r2, [pc, #0x5C] 
003E0208  ldr r4, [r4, #-0x18] 
003E020C  str r4, [sp, #0x40] 
003E0210  mov lr, r2 
003E0214  mov pc, lr 
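
The instructions from 003E0194 to 003E01B8 in the listing above build the 8-byte ArgList on the stack before the native call. A small sketch of that arithmetic follows; it assumes that r4 is the call frame register and that the word loaded from [r4, #-0xC] is an argument count that includes `this`. Both readings are inferred from the listing, and ArgListWords and buildArgList are hypothetical names.

```cpp
#include <cstdint>

// The two words the JIT code stores at [sp] and [sp, #4].
struct ArgListWords {
    std::uint32_t args;  // str r1, [sp]
    std::uint32_t count; // str r0, [sp, #4]
};

// Mirrors the listing:
//   ldr  r0, [r4, #-0xC]   ; slotValue
//   subs r0, r0, #1        ; count = slotValue - 1
//   mov  r1, r4
//   subs r1, r1, #0x20     ; r1 = callFrame - 0x20
//   mov  r3, #4
//   muls r0, r3, r0        ; r0 = 4 * count
//   subs r1, r1, r0        ; args = callFrame - 0x20 - 4 * count
constexpr ArgListWords buildArgList(std::uint32_t callFrame, std::uint32_t slotValue)
{
    return ArgListWords{callFrame - 0x20u - 4u * (slotValue - 1u), slotValue - 1u};
}
```

With the R4 value from the register dump (0x003f8080) and a hypothetical slot value of 1, this gives a count of 0 and an argument pointer of 0x003f8060.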

==

JSValue JSC_HOST_CALL dateProtoFuncGetTimezoneOffset(ExecState* exec, JSObject*, JSValue thisValue, const ArgList&)
{
00073468  mov r12, sp 
0007346C  stmdb   sp!, {r0 - r3} 
00073470  stmdb   sp!, {r4, r12, lr} 
00073474  sub sp, sp, #0x1C 
if (!thisValue.inherits(DateInstance::info))
00073478  ldr r1, [pc, #0x100] 
// R0 = 0x003f8080 R1 = 0x00601780 R2 = 0x00601760 R3 = 0x182af984
// R4 = 0x003f8080 R5 = 0x0200 R6 = 0x0060 R7 = 0x003e07b8
// R8 = 0x R9 = 0x182afbfc R10 = 0x R11 = 0x002b0370
// R12 = 0x182af984 Sp = 0x182af94c Lr = 0x003e01d4 
// Pc = 0x00073478 Psr = 0x201f 
0007347C  add r0, sp, #0x34 
00073480  bl  |JSC::JSValue::inherits ( 6997ch )| 
00073484  strb r0, [sp, #0xC]
00073488  ldrb r3, [sp, #0xC]
0007348C  cmp r3, #0 
00073490  bne |JSC::dateProtoFuncGetTimezoneOffset + 0x54 ( 734bch )| 
return throwError(exec, TypeError);
00073494  mov r1, #5 
00073498  ldr r0, exec 
0007349C  bl  |JSC::throwError ( 5dd78h )| 
000734A0  str r0, [sp, #0x10] 
000734A4  ldr r1, [sp, #0x10] 
000734A8  ldr r0, [sp, #0x28] 
000734AC  bl  |WTF::OwnArrayPtrJSC::Register::OwnArrayPtrJSC::Register ( 110e8h )|
000734B0  ldr r3, [sp, #0x28] 
000734B4  str r3, [sp, #8] 
000734B8  b   |JSC::dateProtoFuncGetTimezoneOffset + 0x100 ( 73568h )| 

DateInstance* thisDateObj = asDateInstance(thisValue); 
000734BC  ldr r0, thisValue 
000734C0  bl  |JSC::asRegExpConstructor ( 697b8h )| 
000734C4  str r0, [sp, 

Re: [webkit-dev] ARM JIT for WinCE

2010-01-07 Thread Patrick Roland Gansterer
Hi,

I did some further investigation today.

I did a quick hack in privateCompileCTIMachineTrampolines to get the same
(maybe correct) register values as without OPTIMIZE_NATIVE_CALL.

 move(callFrameRegister, regT0);

+move(ARMRegisters::r2, ARMRegisters::r3);
+move(ARMRegisters::r1, ARMRegisters::r2);
+move(ARMRegisters::r0, ARMRegisters::r1);
-move(stackPointerRegister, ARMRegisters::r3);
+move(stackPointerRegister, ARMRegisters::r0);
-call(Address(regT1, OBJECT_OFFSETOF(JSFunction, m_data)));
+call(Address(regT2, OBJECT_OFFSETOF(JSFunction, m_data)));
 
 addPtr(Imm32(sizeof(ArgList)), stackPointerRegister);

Now it produces the following code:

003E01B0  muls r0, r3, r0
003E01B4  subs r1, r1, r0
003E01B8  str r1, [sp] 
003E01BC  ldr r2, [r1, #-4] 
003E01C0  ldr r1, [r4, #-8] 
003E01C4  mov r0, r4 
003E01C8  mov r3, r2 
003E01CC  mov r2, r1 
003E01D0  mov r1, r0 
003E01D4  mov r0, sp 
003E01D8  mov lr, pc 
003E01DC  ldr pc, [r2, #0x1C] 
003E01E0  adds sp, sp, #8
003E01E4  ldr r3, [pc, #0x80] 
003E01E8  ldr r2, [r3] 
003E01EC  bics r3, r2, #0
003E01F0  bne 003E0204 
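
The four register moves at 003E01C8 to 003E01D4 shift r0-r2 up by one register and then place sp in r0. The order matters, because each move overwrites a register that a later move would otherwise still need. The sketch below simulates this with a hypothetical Regs struct (not JavaScriptCore code) and contrasts it with the same moves issued in ascending order.

```cpp
#include <cstdint>

// Tiny model of the registers involved in the shuffle.
struct Regs {
    std::uint32_t r[4]; // r0..r3
    std::uint32_t sp;
};

// Order used in the patch: the highest destination is written first, so every
// source is read before it is overwritten.
constexpr Regs shiftArgumentsUp(Regs in)
{
    in.r[3] = in.r[2]; // mov r3, r2
    in.r[2] = in.r[1]; // mov r2, r1
    in.r[1] = in.r[0]; // mov r1, r0
    in.r[0] = in.sp;   // mov r0, sp
    return in;
}

// The same moves in ascending order: each move reads a value that the
// previous move has already overwritten, so sp ends up in all four registers.
constexpr Regs shiftInAscendingOrder(Regs in)
{
    in.r[0] = in.sp;
    in.r[1] = in.r[0];
    in.r[2] = in.r[1];
    in.r[3] = in.r[2];
    return in;
}
```

Starting from r0 = 0x003f8080, r1 = 0x00601780, r2 = 0x00601760 and sp = 0x182af984 (values taken from the earlier register dump in this thread), shiftArgumentsUp yields r0 = 0x182af984, r1 = 0x003f8080, r2 = 0x00601780 and r3 = 0x00601760.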

The arguments seem to be sane now in the call to
dateProtoFuncGetTimezoneOffset, but it crashes afterwards.
When I step through it with the debugger, I get the following registers after
the function has finished, and it jumps to 0x000139d8 instead of 0x003e01e0
(lr = 0x003e01e0 when I enter the function!):

R0 = 0x182af984 R1 = 0x003f8054 R2 = 0x00601500 R3 = 0x0060
R4 = 0x003f8054 R5 = 0x0200 R6 = 0x182af984 R7 = 0x003f8054
R8 = 0x R9 = 0x182afbfc R10 = 0x R11 = 0x002b0370
R12 = 0x182af8f0 Sp = 0x182af95c Lr = 0x003e01e0 
Pc = 0x000139d8 Psr = 0x201f 
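
The expected return address 0x003e01e0 can be checked against the listing. In ARM (not Thumb) state, reading pc yields the address of the current instruction plus 8, so the `mov lr, pc` / `ldr pc, [...]` pair leaves lr pointing at the instruction after the load. A short sketch with hypothetical helper names:

```cpp
#include <cstdint>

// Value observed when an ARM-state instruction at instructionAddress reads pc.
constexpr std::uint32_t pcAsRead(std::uint32_t instructionAddress)
{
    return instructionAddress + 8;
}

// Return address set up by "mov lr, pc" when the next instruction performs the
// jump: it points just past that jump instruction.
constexpr std::uint32_t returnAddressAfterMovLrPc(std::uint32_t movAddress)
{
    return pcAsRead(movAddress);
}
```

For the `mov lr, pc` at 003E01D8 this gives 0x003E01E0, and for the one at 003E01CC in the unpatched listing it gives 0x003E01D4, which agrees with the Lr values in the register dumps.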

I then tried to return jsNaN(exec) always. So R4 won't be used and 
prolog/epilog changed:

00071600  mov r12, sp 
00071604  stmdb   sp!, {r0 - r3} 
00071608  stmdb   sp!, {r4, r12, lr} 
0007160C  sub sp, sp, #0x1C 

00071700  ldr r0, [sp, #8] 
00071704  add sp, sp, #0x1C 
00071708  ldmia   sp, {r4, sp, pc} 

changed to

000734EC  mov r12, sp 
000734F0  stmdb   sp!, {r0 - r3} 
000734F4  stmdb   sp!, {r12, lr} 
000734F8  sub sp, sp, #0x1C 

000735A4  ldr r0, [sp, #8] 
000735A8  add sp, sp, #0x1C 
000735AC  ldmia   sp, {sp, pc} 
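
The two prologues differ only in whether r4 is saved, which changes the frame size by one word. The arithmetic can be sketched as follows; frameBytes and spAfterPrologue are hypothetical helpers that only restate what the stmdb/sub instructions do.

```cpp
#include <cstdint>

constexpr std::uint32_t kWord = 4; // bytes per register

// Bytes pushed by:
//   stmdb sp!, {r0 - r3}          ; 4 argument registers
//   stmdb sp!, {<saved registers>}
//   sub   sp, sp, #localBytes
constexpr std::uint32_t frameBytes(std::uint32_t savedRegCount, std::uint32_t localBytes)
{
    return 4 * kWord + savedRegCount * kWord + localBytes;
}

// Stack pointer after the prologue, given the stack pointer at entry.
constexpr std::uint32_t spAfterPrologue(std::uint32_t spAtEntry,
                                        std::uint32_t savedRegCount,
                                        std::uint32_t localBytes)
{
    return spAtEntry - frameBytes(savedRegCount, localBytes);
}
```

With {r4, r12, lr} saved the frame is 56 bytes, and with {r12, lr} it is 52 bytes. For an entry sp of 0x182af984 the first variant gives 0x182af94c, which matches the Sp value in the register dump taken just after the prologue earlier in this thread. The epilogue then reloads sp from the saved r12 (the entry sp) and pc from the saved lr.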

I now get the following registers and it jumps to the correct address
(0x003e01e0), but then it crashes in functionPrint.

R0 = 0x182af984 R1 = 0x182af8f8 R2 = 0x R3 = 0x182af984
R4 = 0x003f8080 R5 = 0x0200 R6 = 0x0060 R7 = 0x003e07c8
R8 = 0x R9 = 0x182afbfc R10 = 0x R11 = 0x002b0370
R12 = 0x03fc2c50 Sp = 0x182af984 Lr = 0x0001bc18 
Pc = 0x003e01e0 Psr = 0x601f

I tried jsc.exe with the following javascript file:
print(getTimeZoneDiff());
function getTimeZoneDiff() { 
return (new Date(2000, 1, 1)).getTimezoneOffset();
}

This doesn't make much sense to me at the moment.

- Patrick


[webkit-dev] ARM JIT for WinCE

2010-01-06 Thread Patrick Roland Gansterer
Hi,

I'm trying to enable the JIT for CPU(ARM_TRADITIONAL) && OS(WINCE).
It already passes the RegExp tests with ENABLE_YARR_JIT. If I set all
ENABLE_JIT_OPTIMIZE_* to 0, it won't compile.
MSVC supports inline assembler only for x86, so I had to provide a separate
asm file (I copied the code from the GCC #ifdef with !JSVALUE32_64):
ctiTrampoline proc
stmdb sp!, {r1-r3}
stmdb sp!, {r4-r8, lr}
sub sp, sp, #36
mov r4, r2
mov r5, #512
mov lr, pc
mov pc, r0
add sp, sp, #36
ldmia sp!, {r4-r8, lr}
add sp, sp, #12
mov pc, lr
endp

ctiVMThrowTrampoline proc
mov r0, sp
bl cti_vm_throw
endp

ctiOpThrowNotCaught proc
add sp, sp, #36
ldmia sp!, {r4-r8, lr}
add sp, sp, #12
mov pc, lr
endp

I can compile and link it without problems, but it crashes with a null pointer
at runtime and a strange call stack. When I use a debugger and step into
ctiTrampoline, it stops at the second stmdb because it can't find the
source code. :-/
I've put a #pragma pack(4) around the JITStackFrame.
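
Since the C++ side has to agree with the frame that ctiTrampoline builds, it may help to write the offsets down. The sketch below derives them from the three stack-adjusting instructions of ctiTrampoline; the helper names are hypothetical and the real JITStackFrame fields are not reproduced here. It relies on stmdb storing the lowest-numbered register at the lowest address.

```cpp
#include <cstdint>

constexpr std::uint32_t kWordSize = 4;
constexpr std::uint32_t kScratchBytes = 36;    // sub   sp, sp, #36
constexpr std::uint32_t kCalleeSavedCount = 6; // stmdb sp!, {r4-r8, lr}
constexpr std::uint32_t kArgumentCount = 3;    // stmdb sp!, {r1-r3}

// Offset from the final sp of a callee-saved register.
// index: 0 = r4, 1 = r5, 2 = r6, 3 = r7, 4 = r8, 5 = lr
constexpr std::uint32_t calleeSavedOffset(std::uint32_t index)
{
    return kScratchBytes + index * kWordSize;
}

// Offset from the final sp of a saved argument register (reg = 1, 2 or 3).
constexpr std::uint32_t argumentOffset(std::uint32_t reg)
{
    return kScratchBytes + kCalleeSavedCount * kWordSize + (reg - 1) * kWordSize;
}

// Total number of bytes the trampoline pushes.
constexpr std::uint32_t frameSize()
{
    return kScratchBytes + (kCalleeSavedCount + kArgumentCount) * kWordSize;
}
```

Under these assumptions the frame is 72 bytes, the saved lr sits at offset 56, and the saved r2 sits at offset 0x40. If r2 carries the call frame on entry, as `mov r4, r2` suggests, that slot is the one JIT-generated code would address as [sp, #0x40].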

Can somebody give me a hint where to search for the failure?

- Patrick


Re: [webkit-dev] ARM JIT and related issues

2009-06-17 Thread Zoltan Herczeg
Hi,

 1) The armv7 port is separate from the armv6 work, and uses the thumb2
 instruction set. Both ports are (I hope!) useful.

We hope so as well.

 2) We would have liked to let the community know about the arm v7 port
 sooner. Unfortunately, we were not at liberty to disclose it until the
 iPhone 3G S announcement. We try to let the community know what we're
 up to and drop code into the public tree as soon as we can, but
 sometimes we are limited by confidentiality constraints.

Corporate secrets are corporate secrets; that is understandable.

 3) We'd definitely like to have a port for pre-v7 ARM in the main
 WebKit tree. I think everyone made this clear.

That is great!

 4) I think it would be good to see if more code and ideas can be
 shared between the two ARM ports. They were made independently, and
 originally in different ways, so let's see what exchange can happen.

Right now the macro-assembler based ARM port is an x86 emulator. It
translates x86 instructions into ARM instruction sequences (usually 1-5
instructions are enough). The ARMv7 port mostly does the same, but Thumb2 is
much closer to x86 than to a RISC architecture (Thumb2 looks like a CISC
operation mode).

 5) Gavin has been a strong proponent of using MacroAssembler as the
 primary CPU abstraction layer, and that approach has worked reasonably
 well so far. However, it seems at least to me that CPUs with very
 different instruction sets may want to do things differently at a
 higher level. x86 is a 2-operand instruction set with optional memory
 operands, and it seems to me a 3-operand load-store architecture might
 want to do things in a different way to get good performance. Making
 them go through a common assembler interface may not work. Ultimately,
 however, the proof is in the performance results. If doing things a
 different way delivers better performance, that is more important than
 maximizing code sharing or architectural purity. That has always been
 the WebKit way.

In the case of the ARM port we have a native implementation and a
MacroAssembler-based implementation, and we have already posted
comparisons between them to the bugzilla. Furthermore, we performed some
tests on our XScale simulator, and the native jit'ed code executes 5-40%
fewer instructions. However, the gain in total runtime is smaller,
since the jit'ed code accounts for only a fraction of the total runtime.

Although the native JIT is faster, we are happy with MacroAssembler as well.

 6) It seems like the intent with the Szeged arm port and the plan for
 getting it in the tree wasn't clear to all parties involved. For me
 personally, it wasn't clear that there was an intent to contribute it,
 or perhaps even an expectation that we'd just pick it up from the
 external repository where it was developed. Things would have been
 more clear if patches were submitted for review earlier.

We know that such big patches require several refactoring phases before
they go to mainline; that is why we thought it was a good idea to create a
branch on Staikos where you could take a look at them before we flooded the
WebKit bugzilla with patches.

 7) It seems like people said some intemperate things during the
 earlier discussion. It also seems like these remarks were based partly
 on misunderstanding. I hope everyone has gotten past that, and that we
 are all ready to work together productively.

True. I feel the communication between us has improved a lot. However, I am
still thinking about how we can involve others as well. I am pretty sure we
are not the only ones interested in the design decisions we discuss in bugzilla.
Perhaps squirrelfish-dev would be a good place for such discussions.

 8) A number of patches from the folks working at University of Szeged
 have been landed. But it seems to me like there has also been a fair
 amount of abandoned work and working at cross purposes. I feel like
 the people working on JavaScript at U of Szeged are not entirely in
 sync with the main JavaScriptCore hackers. You guys have done a lot of
 great work, and I'd like to explore what we can do to get more in sync
 on design direction. Does anyone have suggestions on this front?

Again, this is true. We have no idea what the general direction of
JavaScriptCore is. We can only see landed patches, and predict the ongoing
and future work based on them. However, landed patches are completed
work, and it is usually too late for any contribution once they have
landed. It would be good to discuss things before work starts,
especially design changes, which affect all ports.

We feel that design discussions, such as the one about ifdefs, would greatly
improve the cooperation between all parties, since everybody could feel like
part of the community.

Thanks,
Zoltan




[webkit-dev] ARM JIT and related issues

2009-06-16 Thread Maciej Stachowiak


I'm not sure if there are any remaining disputes about the Nitro ports  
to armv6 and armv7. But just to make sure everyone is on the same  
page, I would like to clarify a few things:


1) The armv7 port is separate from the armv6 work, and uses the thumb2  
instruction set. Both ports are (I hope!) useful.


2) We would have liked to let the community know about the arm v7 port  
sooner. Unfortunately, we were not at liberty to disclose it until the  
iPhone 3G S announcement. We try to let the community know what we're  
up to and drop code into the public tree as soon as we can, but  
sometimes we are limited by confidentiality constraints.


3) We'd definitely like to have a port for pre-v7 ARM in the main  
WebKit tree. I think everyone made this clear.


4) I think it would be good to see if more code and ideas can be  
shared between the two ARM ports. They were made independently, and  
originally in different ways, so let's see what exchange can happen.


5) Gavin has been a strong proponent of using MacroAssembler as the  
primary CPU abstraction layer, and that approach has worked reasonably  
well so far. However, it seems at least to me that CPUs with very  
different instruction sets may want to do things differently at a  
higher level. x86 is a 2-operand instruction set with optional memory  
operands, and it seems to me a 3-operand load-store architecture might  
want to do things in a different way to get good performance. Making  
them go through a common assembler interface may not work. Ultimately,  
however, the proof is in the performance results. If doing things a  
different way delivers better performance, that is more important than  
maximizing code sharing or architectural purity. That has always been  
the WebKit way.


6) It seems like the intent with the Szeged arm port and the plan for  
getting it in the tree wasn't clear to all parties involved. For me  
personally, it wasn't clear that there was an intent to contribute it,  
or perhaps even an expectation that we'd just pick it up from the  
external repository where it was developed. Things would have been  
more clear if patches were submitted for review earlier.


7) It seems like people said some intemperate things during the  
earlier discussion. It also seems like these remarks were based partly  
on misunderstanding. I hope everyone has gotten past that, and that we  
are all ready to work together productively.


8) A number of patches from the folks working at University of Szeged  
have been landed. But it seems to me like there has also been a fair  
amount of abandoned work and working at cross purposes. I feel like  
the people working on JavaScript at U of Szeged are not entirely in  
sync with the main JavaScriptCore hackers. You guys have done a lot of  
great work, and I'd like to explore what we can do to get more in sync  
on design direction. Does anyone have suggestions on this front?


I know that at least some non-Apple developers have managed to do  
major work on JavaScriptCore internals (for example Cameron Zwarich  
before he became an Apple employee), so I am confident we can make  
things work better.



Regards,
Maciej



Re: [webkit-dev] ARM JIT and related issues

2009-06-16 Thread Toshiyasu Morita
--- On Wed, 6/17/09, Maciej Stachowiak m...@apple.com wrote:
 5) Gavin has been a strong proponent of using MacroAssembler as the
 primary CPU abstraction layer, and that approach has worked reasonably
 well so far. However, it seems at least to me that CPUs with very
 different instruction sets may want to do things differently at a
 higher level. x86 is a 2-operand instruction set with optional memory
 operands, and it seems to me a 3-operand load-store architecture might
 want to do things in a different way to get good performance.

The porting problem IMHO isn't the number of operands. The problem is the JIT 
design assumes a CISC processor with the following characteristics:

1) call/return instructions which store the return address on the stack as on 
the x86 processor. If the target processor doesn't do this, then this requires 
a huge amount of work.

2) The JIT performs relocations in a kludgy way. Some relocations are performed 
before the code is copied to the final location, and some relocations after. 
Also, it's not clear which relocations are within a single code block, and 
which go across code blocks, so the generated code needs to assume the worst 
case if the target processor has a limited number of bits for relative branches.

3) JIT assumes the call instruction does not need to have the call address 
loaded into a register, which is the cause of the current bug I'm debugging. 
The JIT generates code which calls another call instruction directly instead of 
calling the previous two instructions which load the call address into the 
register.

I could go on and on, but you get the idea. Basically, the current JIT design 
makes a large number of x86/CISC target architecture assumptions.
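
To make point 2 concrete: whether a relative branch can be used depends on whether the displacement fits in the instruction's offset field, which is only known once the final code addresses are fixed. The sketch below uses the ARM B/BL encoding purely as an illustrative case (a signed 24-bit word offset relative to the branch address plus 8); it is not tied to any particular port discussed here, and the function names are hypothetical.

```cpp
#include <cstdint>

// True when value fits in a signed two's-complement field of the given width.
constexpr bool fitsSigned(std::int64_t value, unsigned bits)
{
    return value >= -(std::int64_t{1} << (bits - 1))
        && value <= (std::int64_t{1} << (bits - 1)) - 1;
}

// ARM-state B/BL: the offset field holds (target - (branch address + 8)) / 4
// in 24 signed bits, which gives a reach of roughly +/-32 MB.
// Both addresses are assumed to be 4-byte aligned.
constexpr bool armBranchReaches(std::uint32_t from, std::uint32_t to)
{
    return fitsSigned(
        (static_cast<std::int64_t>(to) - (static_cast<std::int64_t>(from) + 8)) / 4, 24);
}
```

A code generator that cannot tell whether a target lies inside the same code block has to assume the answer is false and emit the longer load-address-then-branch sequence.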

Toshi






Re: [webkit-dev] ARM JIT and related issues

2009-06-16 Thread Maciej Stachowiak


On Jun 16, 2009, at 5:52 PM, Toshiyasu Morita wrote:


--- On Wed, 6/17/09, Maciej Stachowiak m...@apple.com wrote:
 5) Gavin has been a strong proponent of using MacroAssembler as the
 primary CPU abstraction layer, and that approach has worked reasonably
 well so far. However, it seems at least to me that CPUs with very
 different instruction sets may want to do things differently at a
 higher level. x86 is a 2-operand instruction set with optional memory
 operands, and it seems to me a 3-operand load-store architecture might
 want to do things in a different way to get good performance.


The porting problem IMHO isn't the number of operands. The problem  
is the JIT design assumes a CISC processor with the following  
characteristics:


1) call/return instructions which store the return address on the  
stack as on the x86 processor. If the target processor doesn't do  
this, then this requires a huge amount of work.


2) The JIT performs relocations in a kludgy way. Some relocations  
are performed before the code is copied to the final location, and  
some relocations after. Also, it's not clear which relocations are  
within a single code block, and which go across code blocks, so the  
generated code needs to assume the worst case if the target  
processor has a limited number of bits for relative branches.


3) JIT assumes the call instruction does not need to have the call  
address loaded into a register, which is the cause of the current  
bug I'm debugging. The JIT generates code which calls another call  
instruction directly instead of calling the previous two  
instructions which load the call address into the register.


I could go on and on, but you get the idea. Basically, the current  
JIT design makes a large number of x86/CISC target architecture  
assumptions.


The issues you raise don't seem to have stopped either of the ARM  
ports so they do not seem like fundamental issues.


 - Maciej



Re: [webkit-dev] arm jit

2009-06-11 Thread Zoltan Herczeg
Hi guys at Apple,

it looks like we are standing in the way of the train. You have plans, we
don't know about them, you have commit rights, we don't, so the tide is
against us. Hints on the mailing lists are scarce. A year ago one of you
asked whether others were interested in design discussions; we said
'yes', but nothing really changed. Apart from some simple bugfixes, our new
patches do not really get into the mainline (except for some of our best
ideas, reimplemented by someone else). This is partly due to the lack of
information about JavaScriptCore and of a roadmap for it. That is why I feel
that real openness is missing.

And here, I have to make a short comment on the non-acceptance of our ARM
JIT implementation. In your mail you mention that you would remain
reluctant to accept a duplicate of the JIT into the tree, rather than a
port of the existing JIT utilizing the MacroAssembler abstraction. Well,
did you check our ARM port? It was rewritten to conform to the
MacroAssembler interfaces more than a month ago and posted to
bugzilla (https://bugs.webkit.org/show_bug.cgi?id=24986).

Anyway, we have updated the MacroAssembler-based ARM port of ours,
uploaded it to the bugzilla, and set the review flag on the patch.

Regards,
Zoltan

 On Jun 9, 2009, at 2:38 PM, Akos Kiss wrote:

 Dear Community,

 Today, we realized that there is a new ARM JIT port for WebKit.
 (http://trac.webkit.org/changeset/44514) Congratulations on getting this
 working, great job.

 Hi Akos,

 Thank you!  Just to clarify, we have just landed an ARMv7 architecture
 (thumb2) JIT backend into ToT.  I say ARMv7 to distinguish this port
 from the ARM application instruction set found in the ARMv6
 architecture and earlier (as I believe would be a common understanding
 of the term ARM, and which I believe your port targets).  Thumb2
 in ARMv7 is, of course, a very different instruction set to the
 traditional 32-bit instructions found in ARM – with completely
 different machine encodings, and significantly different capabilities
 (e.g. two operand versus three operand instructions, sizes of
 immediate operands, and options for instruction predication).  For the
 JIT to be able to run on both ARM and ARMv7 platforms, it needs to be
 ported to both architectures – in much the same way the JIT is
 ported to both the x86 and x86-64 platforms.

 Obviously there is a great deal of similarity on the surface between
 ARM and ARMv7, but in terms of the JIT implementation that may well
 only be true above the level of the MacroAssembler interface.  There
 may be some limited opportunities to share code within the Assembler
 classes (register numbering enums, and possibly types describing
 immediate operands), but since the assembler is primarily concerned
 with formatting machine instructions, and since the instruction
 encodings are different, it seems likely the bulk of the code will
 have to remain separate.  Again, the differences in instruction
 selection options available on the two architectures will likely make
 it hard to share code within the MacroAssembler (different numbers of
 operands to many common instructions, and the options when working
 with large immediate values particularly spring to mind).  We would
 certainly want to share code and avoid any duplication wherever it
 makes sense to do so.

 I cannot conceal how disappointed I am, as is the whole team at
 Szeged.

 I am very sorry to hear this.  If you look at the patches that landed
 into ToT there were very few changes made outside of the new assembler
 classes which, for the reasons described above, I think are highly
 unlikely to have much in common on the two platforms.  The changes
 that have been made to common code outside of the assemblers should
 only help in removing x86 dependencies and assumptions that had
 existed in the code. I strongly urge you to review the changes that
 have been made, as I hope and believe you will find that they will
 assist the team in integrating your ARM port.

 Of course, we've felt that you were reluctant to accept our
 implementation.

 We were (and remain) reluctant to accept a duplicate of the JIT into
 the tree, rather than a port of the existing JIT utilizing the
 MacroAssembler abstraction.  We are concerned that it would be
 extremely difficult to continue to maintain such a port as we move the
 JIT technology forwards.  Beyond that, the key barrier to the ARM JIT
 being accepted into WebKit is that there simply haven't been any
 patches put forwards for us to review!  (I'm sorry, I'm aware you have
 provided a link to an external git repository, but I'm afraid we
 really can't seek through version control systems to find changes to
 review – we do need contributors to attach patches to bugs, and we
 need a review flag setting to indicate when the contributor believes
 their patch is ready.  If there is any uncertainty as to the
 procedure, please see http://webkit.org/coding/contributing.html .)

 - Are you 

Re: [webkit-dev] arm jit

2009-06-11 Thread Toshiyasu Morita
--- On Wed, 6/10/09, Gavin Barraclough barraclo...@apple.com wrote:



 If you consider calling a JS function with too few arguments as being
 akin to invoking a C++ method with some defaulted parameters
 not-provided, then it is also the responsibility of code generated for
 the call to such a method to ensure that values for all declared
 parameters are passed.)



Thanks for the long explanation.



Can the arity check be performed at compile time as in C++?



Toshi




  ___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


Re: [webkit-dev] arm jit

2009-06-11 Thread Geoffrey Garen

Can the arity check be performed at compile time as in C++?


C++ can perform arity checks at compile time because C++ uses early  
binding. JavaScript uses late binding.
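
The difference can be seen in a few lines of JavaScript. This is only an
illustrative sketch: the functions one and two and the helper callSite are
made-up names, not anything from JavaScriptCore. The same call site ends up
invoking callees with different declared arities, which is why the count
cannot be fixed when the call site is compiled.

```javascript
// Two hypothetical callees with different declared arities.
function one(x) { return 'one:' + x; }
function two(x, y) { return 'two:' + x + ',' + y; }

// The call f(1) is written once, but f is looked up every time it runs.
let f = one;
function callSite() { return f(1); }

const first = callSite();   // f is `one` (declared arity 1)
f = two;                    // rebinding happens at run time
const second = callSite();  // same call site, f is now `two` (declared arity 2)

console.log(first, second, one.length, two.length);
```

In the second call the missing argument y is simply undefined; nothing
about the call site itself changed.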


Geoff


Re: [webkit-dev] arm jit

2009-06-11 Thread Oliver Hunt


On Jun 11, 2009, at 9:59 AM, Toshiyasu Morita wrote:


--- On Wed, 6/10/09, Gavin Barraclough barraclo...@apple.com wrote:

 If you consider calling a JS function with too few arguments as being
 akin to invoking a C++ method with some defaulted parameters
 not-provided, then it is also the responsibility of code generated for
 the call to such a method to ensure that values for all declared
 parameters are passed.)

Thanks for the long explanation.

Can the arity check be performed at compile time as in C++?

Toshi



Alas no because you can't really guarantee exactly what function will  
be called (except in a few relatively uncommon cases), eg.


function g() {
    for (var i = 0; i < 100; i++)
        f(a*i);
}
g();

So if we look at the call to f(a*i) we need to ask "what is the arity
of f?". The issues we need to deal with to answer this question
statically are:
* the object f may not be defined, or it may not be a function, at
compile time -- at runtime f may have become a function, or it may not
* any function call may result in f being changed, and function calls
may occur during arithmetic if you cannot guarantee the input types


These two things together mean it's not reasonably possible to  
guarantee the same function will be called every time, let alone have  
the same arity.
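
The second point can be made concrete with a small sketch. Everything here
is an assumption for illustration: a is given a valueOf method purely to
show that the multiplication a*i can run arbitrary code, and that code
happens to replace f with a function of a different arity.

```javascript
let f = function (x) { return 'original'; };

// Hypothetical object: converting it to a number runs arbitrary code.
const a = {
  valueOf() {
    f = function (x, y) { return 'replaced'; }; // rebinds f as a side effect
    return 2;
  }
};

function g() {
  const results = [];
  for (let i = 0; i < 3; i++) {
    // The value of f is fetched first, then a * i runs a.valueOf(),
    // which swaps f for every later iteration.
    results.push(f(a * i));
  }
  return results;
}

const results = g();
console.log(results.join(','));
```

Only the first iteration reaches the original function; the later ones
reach whatever the arithmetic installed in the meantime.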


--Oliver



Re: [webkit-dev] arm jit

2009-06-11 Thread Geoffrey Garen

it looks we are in the way of the train. You have plans, we don't know
about them, you have commit rights, we don't, so the tides are  
against us.


If you're interested in review or commit rights, they're granted based
on a track record of good work, good judgement, and good
collaboration. You can read more about the policy here:
http://webkit.org/coding/commit-review-policy.html .


Please work on your collaboration skills. Right now, your tone stinks.

Geoff


Re: [webkit-dev] arm jit

2009-06-11 Thread Zoltan Herczeg


 If you're interested in review or commit rights, they're granted based
 on a track record of good work, good judgement, and good
 collaboration. You can read more about the policy here:
 http://webkit.org/coding/commit-review-policy.html .

 Please work on your collaboration skills. Right now, your tone stinks.

I am sorry if I was not clear. I was talking about cooperation and
openness, not about commit rights. Actually, I think it is unimportant who
commits a patch, the important thing is that everyone should know what is
happening. That is why I wrote those boring blog posts about technical
details.

Zoltan




Re: [webkit-dev] arm jit

2009-06-11 Thread Gavin Barraclough
And here, I have to make a short comment on the non-acceptance of our
ARM JIT implementation. In your mail you mention that you would remain
reluctant to accept a duplicate of the JIT into the tree, rather than a
port of the existing JIT utilizing the MacroAssembler abstraction.
Well, did you check our ARM port? It has been rewritten to conform to
the MacroAssembler interfaces more than a month ago and posted it into
the bugzilla (https://bugs.webkit.org/show_bug.cgi?id=24986).


Hi Zoltan,

I'm sorry if I'm misinterpreting you here, but it sounds like over the  
last month you have been expecting your MacroAssembler based ARM port  
to have been reviewed.  If so -


Then I can quite understand your frustration, and do sympathize  
greatly with you.  There clearly has been a breakdown in  
communication, and I'm sorry about your disappointment.  I'm afraid  
that we had no way of knowing that you considered your port to be  
complete.  Just last week, in an email on this list, I said, "when you
have a patch ready for review, please attach it to the bug and set the
review flag".  I was under the impression that you did not feel your
changes were yet final (by the sound of it I was mistaken).  I was
assuming that when a final patch was ready you would attach it to the
bug, and mark it for review.


We have a procedure for accepting contributions to avoid exactly this  
kind of miscommunication.  The mechanism for communicating to us that  
you believe your patch is ready is very simple, and is absolutely  
critical if you want to get code into WebKit.  Patches ready for  
review must be marked as such in bugzilla.  Without this we cannot  
tell which patches attached to bugs are complete, and which represent  
work in progress.


I urge you to review the instructions on contributing on the website,  
since following these will be the only way to avoid similar  
disappointment in the future.  Perhaps this is an area where we need  
to improve our communication – perhaps we need to make these  
instructions clearer, or more prominent on the website.  The website  
is all stored in svn, so please do file bugs in bugzilla – or patches  
welcome – if you think these can be improved.



Anyway, we have updated the MacroAssembler-based ARM port of ours,
uploaded it to the bugzilla, and set the review flag on the patch.


I've had a brief chance to look at the patch, and it's looking really  
great.  There are some bits to clean up a little to get it through  
review, and we will want to land a change of this magnitude  
incrementally.  I'm afraid that I have a busy day today, I will try to  
comment on the bug tonight but it may have to wait until the morning.   
Hopefully we can get this landed into ToT fairly quickly.


cheers,
G.





Re: [webkit-dev] arm jit

2009-06-10 Thread Toshiyasu Morita
Gavin Barraclough barraclo...@apple.com wrote:



 We were (and remain) reluctant to accept a duplicate of the JIT into
 the tree, rather than a port of the existing JIT utilizing the
 MacroAssembler abstraction.  We are concerned that it would be
 extremely difficult to continue to maintain such a port as we move the
 JIT technology forwards.


Umm. IMHO The existing JIT is not well designed, with
processor-specific constants everywhere and optimizations such as
inlining huge blocks of weakly-optimized code instead of making a
function call to properly-optimized code.


These issues make it difficult to both port and maintain.


If the other JIT is better designed than the current one, IMHO it should be 
considered for inclusion.



Toshi








Re: [webkit-dev] arm jit

2009-06-10 Thread Toshiyasu Morita
--- On Wed, 6/10/09, Geoffrey Garen gga...@apple.com wrote:



 I'm having a hard time understanding from your comment what optimization
 changes you think are appropriate, but if you can produce a patch that
 implements your idea, and shows a benefit on a benchmark, I'd be happy
 to review it.



Consider something like op_call.



This expands out to 95 inline instructions on the MIPS for just the
slow case alone, of which 3 are function calls to other functions. So
this probably requires thousands of clock cycles to execute.



IMHO it doesn't make sense to inline op_call because:



1. It's a huge amount of JIT code just to save three or four
instructions at runtime (call, return, and maybe some register
shuffling)



2. The code which is executed is thousands of instructions and saving three or 
four instructions is a microscopic net win.



4. It makes the generated machine code MUCH larger because instead of
having one copy of this function that is written in C/C++ and
statically compiled, there are multiple copies of this code for every
instance of op_call, which makes the instruction cache much less
effective.



5. The generated machine code is weakly optimized, so instead of having
calling code which is well-optimized by the C/C++ compiler for MIPS, it
is executing weakly optimized dynamically generated code. Since the
code is weakly optimized, it is also much larger than it should be,
which also makes the instruction cache much less effective.



6. The JIT-generated code resides in the data cache, and must be
flushed to main memory, then the instruction cache must be invalidated
so the new code will load into the instruction cache. Because the
WebKit JIT seems to do lazy compilation of functions at call time
(instead of compiling all the functions in one pass), this requires the
data cache to be flushed and the instruction cache to be invalidated
every time a new function is generated, which further degrades
performance. This type of code generation strategy is ok for processors
with unified caches (or pseudo-unified on x86) but for RISC machines
with separate instruction and data caches, it's really awful.



This is just one of the problems with the JIT on MIPS (and other RISC 
processors). If you're interested, I can elaborate more.



If my client is willing to pay for optimization work, I will eventually submit 
patches.



Toshi




Re: [webkit-dev] arm jit

2009-06-10 Thread Geoffrey Garen
This expands out to 95 inline instructions on the MIPS for just the
slow case alone, of which 3 are function calls to other functions.
So this probably requires thousands of clock cycles to execute.


IMHO it doesn't make sense to inline op_call because:


You've made some interesting theoretical arguments against inlining  
op_call, but the empirical win from inlining op_call -- on x86 and  
arm, at least -- was tremendous.


Maybe the situation is different on MIPS. You can experiment with the  
JIT_OPTIMIZE_CALL preprocessor setting to test your theory.


1. It's a huge amount of JIT code just to save three or four
instructions at runtime (call, return, and maybe some register
shuffling)


I don't understand your math here. Just the code to pass arguments to
a call helper function would be more than three or four instructions.




2. The code which is executed is thousands of instructions and  
saving three or four instructions is a microscopic net win.


The generated code for a call slow case is pretty lengthy, but be  
careful not to confuse generated code with executed code. Slow case  
execution is relatively rare.




6. The JIT-generated code resides in the data cache, and must be
flushed to main memory, then the instruction cache must be
invalidated so the new code will load into the instruction cache.
Because the WebKit JIT seems to do lazy compilation of functions at
call time (instead of compiling all the functions in one pass), this
requires the data cache to be flushed and the instruction cache to
be invalidated every time a new function is generated, which further
degrades performance. This type of code generation strategy is ok
for processors with unified caches (or pseudo-unified on x86) but
for RISC machines with separate instruction and data caches, it's
really awful.


It would be an interesting experiment to compile functions at creation  
time instead of call time, and see if things got faster. I'd love to  
hear your results, if you try it.


I doubt that eager compilation would be a good strategy for the web,  
though, since web pages tend to load very large libraries of  
functions, while only calling a small percentage of those functions.


Geoff


Re: [webkit-dev] arm jit

2009-06-10 Thread Oliver Hunt
It would be an interesting experiment to compile functions at  
creation time instead of call time, and see if things got faster.  
I'd love to hear your results, if you try it.


I doubt that eager compilation would be a good strategy for the web,  
though, since web pages tend to load very large libraries of  
functions, while only calling a small percentage of those functions.


It could be worth trying a stub function that triggers the compilation  
of the function should it not be present, but i'm not sure what that  
would really save as we still need the arity checks inline -- i  
suppose we could lazily generate trampolines for each arity as needed
and just have many trampolines, but that could be complicated, and  
care would be needed to ensure that changing the code pointer didn't  
result in either incorrectly invalidating the caching or result in  
incorrectly caching the trampoline address instead of the real  
function code.
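
As a rough model of the stub idea, and not of how JavaScriptCore is
actually structured, the sketch below uses a made-up function record whose
code pointer starts out as a stub. The names compile, makeLazyFunction and
record.code are all hypothetical, and "compiling" is just building a
closure.

```javascript
// Hypothetical stand-in for the JIT: "compiling" just builds a closure.
let compileCount = 0;
function compile(source) {
  compileCount++;
  return new Function('x', 'return ' + source + ';');
}

// A function record whose code pointer starts out as a stub.
function makeLazyFunction(source) {
  const record = {
    code(x) {
      record.code = compile(source); // swap the stub for the real code
      return record.code(x);         // then complete the original call
    }
  };
  return record;
}

const square = makeLazyFunction('x * x');
const unused = makeLazyFunction('x + 1'); // never called, never compiled

const r1 = square.code(3); // first call goes through the stub
const r2 = square.code(4); // later calls reach the compiled code directly

console.log(r1, r2, compileCount);
```

The caching hazard mentioned above shows up here too: a caller that saved
square.code before the first call would be holding the stub rather than the
compiled code, so it would trigger compilation again on every call.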


--Oliver



Geoff


Re: [webkit-dev] arm jit

2009-06-10 Thread Geoffrey Garen
It could be worth trying a stub function that triggers the  
compilation of the function should it not be present, but i'm not  
sure what that would really save as we still need the arity checks  
inline


A design that I like is a stub function that triggers compilation (so  
the caller can always just call), combined with an arity check in  
the callee, which linked calls can skip (by linking to a label past  
the end of the arity check).


I think that could simplify the calling code, while reducing its  
footprint.
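
A toy version of that design, under stated assumptions: the callee record
below has two entry points standing in for "the start of the function" and
"a label past the arity check". makeCallee, linkCall, entryWithCheck and
entryPastCheck are invented names for illustration only.

```javascript
// Hypothetical callee record with two entry points, mimicking a label
// placed after the arity check in generated code.
function makeCallee(arity, body) {
  const callee = {
    arity,
    checks: 0, // counts how often the arity check runs
    // Entry point past the arity check: args.length === arity is assumed.
    entryPastCheck(args) { return body(...args); },
    // Normal entry point: fix up the argument count, then continue.
    entryWithCheck(args) {
      callee.checks++;
      const fixed = args.slice(0, arity);
      while (fixed.length < arity) fixed.push(undefined);
      return callee.entryPastCheck(fixed);
    }
  };
  return callee;
}

// A call site passing argCount arguments picks its entry point once, at link time.
function linkCall(callee, argCount) {
  return argCount === callee.arity ? callee.entryPastCheck : callee.entryWithCheck;
}

const add = makeCallee(2, (x, y) => x + (y === undefined ? 0 : y));
const linkedExact = linkCall(add, 2); // skips the check
const linkedShort = linkCall(add, 1); // must go through the check

const exact = linkedExact([1, 2]);
const short = linkedShort([5]);
console.log(exact, short, add.checks);
```

There is one copy of the check per callee, and a call site whose argument
count already matches never executes it.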


Geoff


Re: [webkit-dev] arm jit

2009-06-10 Thread Toshiyasu Morita
Why does the arity check need to be in the caller, and not the callee?

Consider: one function that is called from 10,000 places.

Arity check in the caller: 10,000 copies of the arity check.
Arity check in the callee: one copy of the arity check

Toshi

--- On Wed, 6/10/09, Geoffrey Garen gga...@apple.com wrote:

From: Geoffrey Garen gga...@apple.com
Subject: Re: [webkit-dev] arm jit
To: Oliver Hunt oli...@apple.com
Cc: Toshiyasu Morita tm_web...@yahoo.com, WebKit Development 
webkit-dev@lists.webkit.org
Date: Wednesday, June 10, 2009, 9:14 PM

 It could be worth trying a stub function that triggers the compilation of the 
 function should it not be present, but i'm not sure what that would really 
 save as we still need the arity checks inline

A design that I like is a stub function that triggers compilation (so the 
caller can always just call), combined with an arity check in the callee, 
which linked calls can skip (by linking to a label past the end of the arity 
check).

I think that could simplify the calling code, while reducing its footprint.

Geoff





Re: [webkit-dev] arm jit

2009-06-10 Thread Toshiyasu Morita
--- On Wed, 6/10/09, Oliver Hunt oli...@apple.com wrote:

 

 I doubt that eager compilation would be a good strategy
for the web, though,

 since web pages tend to load very large libraries
of functions, while only calling a

 small percentage of those functions.



Turbo C compiled about 10,000 lines of source code per second on an
ancient 12 MHz PC AT. It does register allocation, common subexpression
elimination, and a bunch of other classical compiler optimizations.



Most modern processors are from about 200 MHz to about 3 GHz, which is
significantly faster than a PC AT. If you use simple linear
extrapolation, that's a compile speed of about 160k-2m lines per
second. That seems adequate to compile even fairly large libraries of
functions.
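
For reference, the linear extrapolation can be worked through in a few
lines. The only inputs are the figures quoted above (10,000 lines per
second at 12 MHz); scaling by clock rate alone is the stated simplification
and ignores everything else that differs between the machines.

```javascript
// Figures quoted in the message above: 10,000 lines/s at 12 MHz.
const baseLinesPerSecond = 10000;
const baseClockMHz = 12;

// Linear scaling by clock rate only (a deliberately naive assumption).
function extrapolate(clockMHz) {
  return baseLinesPerSecond * clockMHz / baseClockMHz;
}

const low = extrapolate(200);   // 200 MHz
const high = extrapolate(3000); // 3 GHz
console.log(Math.round(low), Math.round(high));
```

This gives roughly 167k lines per second at 200 MHz and 2.5 million at
3 GHz, in line with the rounded 160k-2m range.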



Toshi







Re: [webkit-dev] arm jit

2009-06-10 Thread Oliver Hunt
The issue is that compiling 5000 lines of libraries (possibly more)
results in a significant amount of memory use, that's why we don't
compile up front -- i don't believe there was a significant cpu time
performance win (if any at all) from delaying function compilation.
There was however a significant memory win for most pages the user
visited.


--Oliver

On Jun 10, 2009, at 2:20 PM, Toshiyasu Morita wrote:


--- On Wed, 6/10/09, Oliver Hunt oli...@apple.com wrote:

 I doubt that eager compilation would be a good strategy for the  
web, though,
 since web pages tend to load very large libraries of functions,  
while only calling a

 small percentage of those functions.

Turbo C compiled about 10,000 lines of source code per second on an  
ancient 12 MHz PC AT. It does register allocation, common
subexpression elimination, and a bunch of other classical compiler  
optimizations.


Most modern processors are from about 200 MHz to about 3 GHz, which
is significantly faster than a PC AT. If you use simple linear  
extrapolation, that's a compile speed of about 160k-2m lines per  
second. That seems adequate to compile even fairly large libraries  
of functions.


Toshi






Re: [webkit-dev] arm jit

2009-06-10 Thread Gavin Barraclough


On Jun 10, 2009, at 1:15 PM, Toshiyasu Morita wrote:


--- On Wed, 6/10/09, Geoffrey Garen gga...@apple.com wrote:

I'm having a hard time understanding from your comment what  
optimization changes you think are appropriate, but if you can  
produce a patch that implements
 your idea, and shows a benefit on a benchmark, I'd be happy to  
review it.


Consider something like op_call.

This expands out to 95 inline instructions on the MIPS for just the
slow case alone, of which 3 are function calls to other functions.
So this probably requires thousands of clock cycles to execute.


IMHO it doesn't make sense to inline op_call because:


[ I'm sorry, I've been away from a net connection, I may be
replicating a couple of things ggaren and olliej have already said. ]


Okay!  First up, have you tried turning off ENABLE_JIT_OPTIMIZE_CALL?   
If you do so, it should address the majority of your concerns, below  
(specifically, reducing code size, and removing the need for op_call  
to patch generated code).


Of course, we added the call optimizations because we measure them as  
a significant performance improvement, but feel free to test whether  
this is true on your platform, and once the MIPS JIT is in the tree  
we'd be happy to consider changes to the optimized mode that aid MIPS  
performance.


1. It's a huge amount of JIT code just to save three or four
instructions at runtime (call, return, and maybe some register
shuffling)


2. The code which is executed is thousands of instructions and  
saving three or four instructions is a microscopic net win.


4. It makes the generated machine code MUCH larger because instead of
having one copy of this function that is written in C/C++ and
statically compiled, there are multiple copies of this code for
every instance of op_call, which makes the instruction cache much
less effective.


I think it's worth making sure you understand the optimization here.   
The majority of calls can be optimized, and having been optimized only  
run the sequence of instructions planted in the main generation pass.   
This code path is only a handful of instructions long, and introducing  
an extra call and return onto this path would almost certainly degrade  
performance (feel free to try doing so, and please do submit any
patches that provide a memory saving, without significantly degrading  
performance).  For such a short and performance critical fragment of  
code it clearly could make sense to tweak the code for specific  
platforms, and it may well provide a significant performance benefit  
to do so.  We should certainly consider such patches.


The slow case JIT code is much longer, and less frequently executed.   
Introducing a call and return here to share code between calls  
definitely makes sense.  The way you know we think that is that the JIT
already works this way!  The slow cases call out to a set of shared  
trampolines generated in privateCompileCTIMachineTrampolines.  This is  
however, a work in progress, and we are currently still clearly  
generating far more code than we should be in the slow cases.  More  
work should be done to unify the pre-linked and post-link slow case  
states, and to move work into the trampolines (this is something I may  
be looking at again fairly soon).


It is certainly valid to question whether the work performed by the  
machine trampolines is better in JIT generated code, or in C++ code  
that the compiler can optimize.  In the early stages of its  
development the JIT was more a context threaded interpreter, calling  
out to C++ to perform almost all optimizations.  We have migrated work  
into JIT generated code only where it has been a performance benefit  
to do so.  Of course, that doesn't mean that we always got it right,  
or that the trade-offs haven't changed, or that the policy might not  
need to be tweaked on different platforms.  Please feel free to  
experiment, and if you can produce patches that reduce the amount of  
work done in these JIT generated trampolines while improving  
performance then we'll be hugely appreciative (in fact, it needn't  
even be a performance win here – anything that doesn't degrade  
performance could be a nice simplification).


5. The generated machine code is weakly optimized, so instead of  
having calling code which is well-optimized by the C/C++ compiler  
for MIPS, it is executing weakly optimized dynamically generated  
code. Since the code is weakly optimized, it is also much larger  
than it should be, which also makes the instruction cache much less  
effective.


6. The JIT-generated code resides in the data cache, and must be  
flushed to main memory, then the instruction cache must be  
invalidated so the new code will load into the instruction cache.  
Because the WebKit JIT seems to do lazy compilation of functions at  
call time (instead of compiling all the functions in one pass), this  
requires the data cache to be flushed and the 

Re: [webkit-dev] arm jit

2009-06-10 Thread Gavin Barraclough

Toshiyasu,

On Jun 10, 2009, at 2:24 PM, Toshiyasu Morita wrote:


Why does the arity check need to be in the caller, and not the callee?


The majority of call sites always call to the same callee, and we can  
optimize these cases for calling that same function repeatedly.   
Within the optimized path of the op_call, where we are linked to a  
specific callee, we can statically determine at compile time that the  
caller and callee have the same arity, and omit the dynamic arity  
check altogether – which is a performance win.  Within the callee  
(where you might have been called from multiple call sites with  
different numbers of arguments passed) it is not clear how you would
implement such an optimization.  But patches welcome as ever!
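
A minimal sketch of that linked path, assuming a made-up call-site model
(makeCallSite, linkedCallee and the counters are hypothetical, not
JavaScriptCore names): the first call takes the slow path with a dynamic
arity check, and once the site is linked to a callee whose arity matches
the number of arguments it passes, later calls skip the check.

```javascript
// Hypothetical model of a call site that links to the callee it last saw.
function makeCallSite(argCount) {
  const site = { linkedCallee: null, slowCalls: 0, fastCalls: 0 };
  site.call = function (callee, args) {
    if (callee === site.linkedCallee) {
      site.fastCalls++;           // linked path: arity already known to match
      return callee.body(...args);
    }
    site.slowCalls++;             // unlinked path: dynamic arity check
    const fixed = args.slice(0, callee.arity);
    while (fixed.length < callee.arity) fixed.push(undefined);
    if (argCount === callee.arity) site.linkedCallee = callee; // link only on a match
    return callee.body(...fixed);
  };
  return site;
}

const inc = { arity: 1, body: (x) => x + 1 };
const site = makeCallSite(1);

let total = 0;
for (let i = 0; i < 5; i++) total += site.call(inc, [i]);

console.log(total, site.slowCalls, site.fastCalls);
```

The decision is made per call site, which is information a callee-side
check does not have.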


(btw, I was on your side on this one, the arity check in the callee  
seemed more natural to me at first, until the benefits of performing  
the check in the caller became clear.  That said, from a design point  
of view, fixing up arity at the call site can also make sense in
matching (or at least being analogous to) calling conventions of other  
languages.  If you consider calling a JS function with too few  
arguments as being akin to invoking a C++ method with some defaulted  
parameters not-provided, then it is also the responsibility of code  
generated for the call to such a method to ensure that values for all  
declared parameters are passed.)



Consider: one function that is called from 10,000 places.

Arity check in the caller: 10,000 copies of the arity check.
Arity check in the callee: one copy of the arity check


You're not taking into account that we don't generate the arity check  
inline, instead it is in a shared trampoline.


Consider: one hundred functions that are called from 10,000 places.

Arity check in the caller: 10,000 copies of the arity check.
Arity check in the callee: one hundred copies of the arity check

Arity check in a set of 3 shared trampolines (which is how the JIT is  
currently implemented): 3 copies.
(3 due to the stages the call linking goes through, go see the code to  
find out why!)


cheers,
G.







[webkit-dev] arm jit

2009-06-09 Thread Akos Kiss
Dear Community,

Today, we realized that there is a new ARM JIT port for WebKit. 
(http://trac.webkit.org/changeset/44514) Congratulations on getting this 
working!, great job.

I cannot conceal how disappointed I am, as is the whole team at Szeged. It was 
months ago, when we presented you our first results in the Bugzilla 
(https://bugs.webkit.org/show_bug.cgi?id=24986). Since then, we exchanged ideas 
in several comments and even received feedback and suggestions from you. Now,
I pose the question (possibly only to myself): For what end? I'm quite sure 
that the work on your side has been ongoing for some time now. Still, you did 
not mention anything. We did not even get a notification on the new port. Of 
course, we've felt that you were reluctant to accept our implementation. At 
least, now we know why.

As WebKit currently is, it is not an open community. Not at all.

Let me ask two final questions:
- Are you still looking for patches, bug reports, feature requests, etc., or is 
it all in vain - will you get everything done in-house?
- Should we ask for credits in the new files, as it was done by you when we 
first published our JIT implementation? I'm quite sure that we can state the 
same argument: a number of the new files appear to have taken large chunks of 
logic from existing jit files.

Best regards,
Akos




Re: [webkit-dev] arm jit

2009-06-09 Thread Geoffrey Garen

Hi Akos.

Today, we realized that there is a new ARM JIT port for WebKit.
(http://trac.webkit.org/changeset/44514) Congratulations on getting this
working!, great job.


Thanks.

I cannot conceal how disappointed I am, as is the whole team at  
Szeged.


I'm sorry to hear that. I understand your disappointment, but I have  
to admit that I'm excited to have a close-to-enabled, fast ARM JIT in  
the WebKit repository, that builds on JavaScriptCore's existing  
infrastructure and design.


It was months ago, when we presented you our first results in the  
Bugzilla (https://bugs.webkit.org/show_bug.cgi?id=24986). Since  
then, we exchanged ideas in several comments and even received  
feedback and suggestions from you. Now, I pose the question
(possibly only to myself): For what end? I'm quite sure that the  
work on your side has been ongoing for some time now. Still, you did  
not mention anything. We did not even get a notification on the new  
port.


Of course, we've felt that you were reluctant to accept our  
implementation. At least, now we know why.


As WebKit currently is, it is not an open community. Not at all.


I don't think that's fair.

Looking at https://bugs.webkit.org/show_bug.cgi?id=24986, I see three
detailed comments by Oliver Hunt, and two by Gavin Barraclough,
explaining our thinking about how an ARM JIT should be organized to
best effect. I think you'll find those comments strongly reflected in
http://trac.webkit.org/changeset/44514 .



Let me ask two final questions:
- Are you still looking for patches, bug reports, feature requests,  
etc.,


Yes.

- Should we ask for credits in the new files, as it was done by you  
when we first published our JIT implementation? I'm quite sure that  
we can state the same argument: a number of the new files appear to  
have taken large chunks of logic from existing jit files.


Generally, a new file that copies substantial code from an old file  
should include the old file's copyrights, along with the new author's  
copyright. Usually, this happens naturally via svn cp, but sometimes  
people forget or use other processes. If you notice a patch that is  
remiss in this area, please do mention something, with specifics.


Cheers,
Geoff


Re: [webkit-dev] arm jit

2009-06-09 Thread Holger Freyther
On Tuesday 09 June 2009 23:38:43 Akos Kiss wrote:
 Dear Community,

 Today, we realized that there is a new ARM JIT port for WebKit.
 (http://trac.webkit.org/changeset/44514) Congratulations on getting this
 working!, great job.

 I cannot conceal how disappointed I am, as is the whole team at Szeged.

I can understand how bad you feel, and I agree that the "Apple does not
comment on future products" mantra can suck at times. When you look at
the ARM JIT of Apple you will see they mostly target Cortex-A8 (thumb2,
vfp), and IIRC your JIT is much wider (supporting many more existing
devices).

So my bottom line is something like "please don't give up", and
contributing to the JIT should be easier now. E.g. pick a topic/theme
for your work and try to get it in.

I cannot share the view that WebKit is not an open community. Of course
there are vendor interests, and Apple, Nokia and Torchmobile don't share
some of them. E.g. Apple probably does not see the value of XHTML MP,
ECMA MP and WML, and still these changes go on...

z.
___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


Re: [webkit-dev] arm jit

2009-06-09 Thread Kenneth Christiansen
Hi there,

I would also say that it is pretty understandable that Apple does not
share information about working on an ARM JIT targeting thumb2,
especially as this can be used to foresee the hardware of future
iPhone models, something they are probably not interested in
revealing.

I agree wholeheartedly with Holger that the chances of getting ARM
JIT enhancements upstreamed have now greatly improved; and with
Apple working on the same code I would expect good reviews and lots of
care for the ARM code base.

We really rely on you guys for bringing the ARM JIT code to other
versions of ARM, and I would like to say that I consider your
contributions important and very welcome, and would like to encourage
you to keep contributing!

Please keep up the good work!

Kenneth



___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev


Re: [webkit-dev] arm jit

2009-06-09 Thread Gavin Barraclough

On Jun 9, 2009, at 2:38 PM, Akos Kiss wrote:


Dear Community,

Today, we realized that there is a new ARM JIT port for WebKit.
(http://trac.webkit.org/changeset/44514) Congratulations on getting this
working!, great job.


Hi Akos,

Thank you!  Just to clarify, we have just landed an ARMv7 architecture
(thumb2) JIT backend into ToT.  I say ARMv7 to distinguish this port
from the ARM application instruction set found in the ARMv6
architecture and earlier (as I believe would be a common understanding
of the term ARM, and which I believe your port targets).  Thumb2
in ARMv7 is, of course, a very different instruction set to the
traditional 32-bit instructions found in ARM – with completely
different machine encodings, and significantly different capabilities
(e.g. two operand versus three operand instructions, sizes of
immediate operands, and options for instruction predication).  For the
JIT to be able to run on both ARM and ARMv7 platforms, it needs to be
ported to both architectures – in much the same way the JIT is
ported to both the x86 and x86-64 platforms.
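
To make the encoding and operand-count difference concrete, here is a minimal sketch. It is not taken from the WebKit tree, and the function names are hypothetical; only the bit layouts follow the ARM architecture manuals. It encodes a register-register ADD once as a classic 32-bit ARM instruction (three operands) and once in the 16-bit Thumb form (two operands, where the destination doubles as a source). Thumb2 also has 32-bit encodings that are not shown here.

```cpp
#include <cstdint>

// Hypothetical helper names; not the WebKit assembler API.

// Classic 32-bit ARM: ADD Rd, Rn, Rm (three operands, condition = always).
constexpr std::uint32_t encodeArmAdd(unsigned rd, unsigned rn, unsigned rm)
{
    return 0xE0800000u | (rn << 16) | (rd << 12) | rm;
}

// 16-bit Thumb: ADD Rdn, Rm (two operands; Rdn is both source and destination).
// Bit 3 of Rdn is stored separately, in bit 7 of the instruction.
constexpr std::uint16_t encodeThumbAdd(unsigned rdn, unsigned rm)
{
    return static_cast<std::uint16_t>(
        0x4400u | ((rdn & 8u) << 4) | (rm << 3) | (rdn & 7u));
}

// The same logical operation yields unrelated bit patterns and sizes.
static_assert(encodeArmAdd(0, 1, 2) == 0xE0810002u, "ADD r0, r1, r2");
static_assert(encodeThumbAdd(0, 1) == 0x4408u, "ADD r0, r1");
```

Because neither the width nor the field layout matches, an assembler class for one instruction set cannot simply be reused for the other.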


Obviously there is a great deal of similarity on the surface between  
ARM and ARMv7, but in terms of the JIT implementation that may well  
only be true above the level of the MacroAssembler interface.  There  
may be some limited opportunities to share code within the Assembler  
classes (register numbering enums, and possibly types describing  
immediate operands), but since the assembler is primarily concerned  
with formatting machine instructions, and since the instruction  
encodings are different, it seems likely the bulk of the code will  
have to remain separate.  Again, the differences in instruction
selection options available on the two architectures will likely make
it hard to share code within the MacroAssembler (different numbers of
operands to many common instructions, and the options when working
with large immediate values particularly spring to mind).  We would
certainly want to share code and avoid any duplication wherever it
makes sense to do so.
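
As an illustration of the large-immediate point, the following sketch (hypothetical names, not the WebKit MacroAssembler API) checks whether a constant fits the classic ARM data-processing immediate form, which is an 8-bit value rotated right by an even amount. It then picks a lowering for a MacroAssembler-style "add immediate" operation. Thumb2 uses a different immediate scheme, so a Thumb2 backend would need its own predicate; that part is deliberately left out.

```cpp
#include <cstdint>

// Rotate a 32-bit value left by n bits (n in 0..31).
constexpr std::uint32_t rotateLeft(std::uint32_t v, unsigned n)
{
    return n == 0 ? v : (v << n) | (v >> (32 - n));
}

// True if v is an 8-bit value rotated right by an even amount, i.e. if
// undoing some even rotation leaves a value that fits in 8 bits.
constexpr bool isArmImmediate(std::uint32_t v, unsigned rot = 0)
{
    return rot >= 32 ? false
         : rotateLeft(v, rot) <= 0xFFu ? true
         : isArmImmediate(v, rot + 2);
}

// Hypothetical lowering choice for an "add32(imm, reg)" style operation.
enum class AddLowering { SingleInstruction, MaterializeThenAdd };

constexpr AddLowering lowerAdd32OnArm(std::uint32_t imm)
{
    return isArmImmediate(imm) ? AddLowering::SingleInstruction
                               : AddLowering::MaterializeThenAdd;
}

static_assert(isArmImmediate(0xFF000000u), "0xFF rotated right by 8");
static_assert(!isArmImmediate(0x101u), "set bits span nine positions");
```

Code above the MacroAssembler interface can call the same operation on every platform, while each backend decides privately whether the constant needs a scratch register.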


I cannot conceal how disappointed I am, as is the whole team at  
Szeged.


I am very sorry to hear this.  If you look at the patches that landed  
into ToT there were very few changes made outside of the new assembler  
classes which, for the reasons described above, I think are highly  
unlikely to have much in common on the two platforms.  The changes  
that have been made to common code outside of the assemblers should  
only help in removing x86 dependencies and assumptions that had  
existed in the code. I strongly urge you to review the changes that  
have been made, as I hope and believe you will find that they will  
assist the team in integrating your ARM port.


Of course, we've felt that you were reluctant to accept our  
implementation.


We were (and remain) reluctant to accept a duplicate of the JIT into  
the tree, rather than a port of the existing JIT utilizing the  
MacroAssembler abstraction.  We are concerned that it would be  
extremely difficult to continue to maintain such a port as we move the  
JIT technology forwards.  Beyond that, the key barrier to the ARM JIT
being accepted into WebKit is that there simply haven't been any
patches put forward for us to review!  (I'm sorry, I'm aware you have
provided a link to an external git repository, but I'm afraid we
really can't search through version control systems to find changes to
review – we do need contributors to attach patches to bugs, and we
need a review flag set to indicate when the contributor believes
their patch is ready.  If there is any uncertainty as to the
procedure, please see http://webkit.org/coding/contributing.html .)


- Are you still looking for patches, bug reports, feature requests,  
etc., or is it all in vain - you will get everything done in house?


Yes!  Please do so, this is the only way your changes will get into the
tree.


- Should we ask for credits in the new files, as it was done by you  
when we first published our JIT implementation? I'm quite sure that  
we can state the same argument: a number of the new files appear to  
have taken large chunks of logic from existing jit files.


The new files were derived from their x86 counterparts, with reference  
to the ARMv7 manuals.  As such the existing copyright notifications  
within the files from which they were derived have been retained.


(Apologies for the slow reply, we have a busy week with WWDC on!)

cheers,
G.




Best regards,
Akos


___
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev

