I should add that the operating system is Linux 4.4, running on a 64-bit
Intel(R) Xeon(R) CPU E5-1650.
On Sunday, September 23, 2018 at 5:49:47 PM UTC-4, Abhinav Jangda wrote:
>
> Hello Everyone,
>
> I have been studying the machine code generated by V8 for WebAssembly. I
> took the following function, *kernel_gemm*, as an example:
>
> # define NI 1000
> # define NJ 1100
> # define NK 1200
>
> void kernel_gemm(
>     int C[NI][NJ],
>     int A[NI][NK],
>     int B[NK][NJ])
> {
>     int i, j, k;
>
>     for (i = 0; i < NI; i++) {
>         for (k = 0; k < NK; k++) {
>             for (j = 0; j < NJ; j++) {
>                 C[i][j] += A[i][k] * B[k][j];
>             }
>         }
>     }
> }
>
> The above file is compiled to WASM using the latest emsdk, based on
> clang/llvm-6.0, and is executed by V8. After studying the machine code
> that V8 generates for the above function, I found that there are extra
> stack loads:
>
> movq %rax, -0x10(%rsp)
> movq %rdx, -0x18(%rsp)
> xorq %rsi,%rsi
> movq $0, %rdi
> nop
> L1:
> imull $0x12c0, %esi, %r8d
> addq %rdx, %r8
> imull $0x1130,%esi,%r9d
> addq %rax,%r9
> xorq %r11,%r11
> nop
> L2:
> imull $0x1130,%r11d,%r12d
> leaq (%r8,%r11,4),%r14
> addq %rcx,%r12
> xorq %rbx,%rbx
> movl %ebx,%r15d
> nop
> nop
> L3:
> leaq 0x1(%r15),%rax
> leaq (%r12,%r15,4),%rbx
> movl (%rdi,%r14,1),%edx
> leaq (%r9,%r15,4),%r15
> movl (%rdi,%rbx,1),%ebx
> imull %ebx,%edx
> movl (%rdi,%r15,1),%ebx
> addl %ebx,%edx
> movl %ebx,(%rdi,%r15,1)
>
> cmpl $0x44c,%eax
> jz L3END
> movl %eax,%r15d
> jmp L3
> L3END:
> addl $0x1,%r11d
> cmpl $0x4b0,%r11d
> jnz L2
> addl $0x1,%esi
> cmpl $0x3e8,%esi
> jz L1END
> movq -0x18(%rsp),%rdx
> movq -0x10(%rsp),%rax
> jmp L1
> L1END:
> addq $0x20, %rsp
>
> As you can see, there are extra stack loads for the *rdx* and *rax*
> registers in every iteration of the outermost loop (between *L3END* and
> *L1END*). In contrast, clang generates code that performs around 1.3x
> better than V8's and has no stack loads of operands. According to the
> calling convention of V8-generated code, the arguments are passed in
> registers *rax*, *rcx*, and *rdx*. Hence, *rdx* and *rax* are for
> variables B and C respectively.
> I have been trying to find out why there are extra loads. One reason
> could be that V8's register allocator is not as good as clang's (which I
> guess is fine, because V8 is a JIT, and JITs are expected to compile
> more quickly than AOT compilers). But I think there may be another
> reason, such as support for On-Stack Replacement or preemption of code.
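
When reading the V8 listing, it can help to check which immediate belongs to which array or loop. The following is a minimal sketch, assuming `sizeof(int)` is 4 (as on wasm32 and x86-64); the helper names `row_stride_bytes` and `check_v8_immediates` are hypothetical and not part of V8 or the original program.

```c
#include <assert.h>
#include <stdint.h>

#define NI 1000
#define NJ 1100
#define NK 1200

/* Row stride in bytes of an int matrix with `cols` columns
   (assumes 4-byte int, as on wasm32 and x86-64). */
static uint32_t row_stride_bytes(uint32_t cols) {
    return cols * (uint32_t)sizeof(int);
}

/* Asserts that each immediate in the V8 listing equals the row
   stride or loop bound it appears to encode. */
static void check_v8_immediates(void) {
    assert(row_stride_bytes(NK) == 0x12c0); /* imull in L1: one row of A */
    assert(row_stride_bytes(NJ) == 0x1130); /* imull in L1 and L2: one row of C or B */
    assert(NJ == 0x44c);                    /* cmpl in L3: bound on j */
    assert(NK == 0x4b0);                    /* cmpl after L3END: bound on k */
    assert(NI == 0x3e8);                    /* cmpl before L1END: bound on i */
}
```

Under this reading, the value in *rdx* is combined with the 0x12c0 stride (the row size of A), *rax* with 0x1130 in L1 (a row of C), and *rcx* with 0x1130 in L2 (a row of B). That is worth checking against the register-to-argument assignment stated above.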
>
> It would be really great if anyone could point me to the relevant part
> of the V8 source code. I have looked at wasm-compiler.cc but couldn't find
> anything.
>
> NOTE: The V8 code was generated using Node.js v8.11.2 and has been
> converted to a simpler format by replacing absolute addresses in the code
> with labels. The above code, when assembled using clang (after accounting
> for clang's calling conventions), performs exactly the same as the
> V8-generated code.
>
> For reference, the clang-generated assembly code is:
>
> xorl %r8d, %r8d
> .p2align 4, 0x90
> .LBB0_1: # %for.body
> # =>This Loop Header: Depth=1
> # Child Loop BB0_2 Depth 2
> # Child Loop BB0_3 Depth 3
> movq %rdx, %r10
> xorl %r9d, %r9d
> .p2align 4, 0x90
> .LBB0_2: # %for.body3
> # Parent Loop BB0_1 Depth=1
> # => This Loop Header: Depth=2
> # Child Loop BB0_3 Depth 3
> imulq $4800, %r8, %rax # imm = 0x12C0
> addq %rsi, %rax
> leaq (%rax,%r9,4), %r11
> movq $-1100, %rcx # imm = 0xFBB4
> .p2align 4, 0x90
> .LBB0_3: # %for.body6
> # Parent Loop BB0_1 Depth=1
> # Parent Loop BB0_2 Depth=2
> # => This Inner Loop Header: Depth=3
> movl (%r11), %eax
> movl 4400(%r10,%rcx,4), %ebx
> imull %eax, %ebx
> movl 4400(%rdi,%rcx,4), %eax
> addl %ebx, %eax
> movl %eax, 4400(%rdi,%rcx,4)
> addq $1, %rcx
> jne .LBB0_3
> # %bb.4: # %for.inc17
> # in Loop: Header=BB0_2 Depth=2
> addq $1, %r9
> addq $4400, %r10 # imm = 0x1130
> cmpq $1200, %r9 # imm = 0x4B0
> jne .LBB0_2
> # %bb.5: # %for.inc20
> # in Loop: Header=BB0_1 Depth=1
> addq $1, %r8
> addq $4400, %rdi # imm = 0x1130
> cmpq $1000, %r8 # imm = 0x3E8
> jne .LBB0_1
> # %bb.6: # %for.end22
> popq %rbx
> retq
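
As a reading aid, the innermost loop of the clang listing (.LBB0_3) can be written back in C. It appears to use a counter that starts at -1100 and runs up to zero, indexing from the end of each row (the `4400(...,%rcx,4)` operands), so the `addq` that advances the counter also sets the flags for `jne` and no separate compare is needed. The sketch below is an interpretation of the listing, not compiler output; the function names are hypothetical, and `inner_loop_plain` is included only for comparison.

```c
#include <stddef.h>

#define NJ 1100

/* Innermost loop in the shape of .LBB0_3: n runs from -NJ up to 0
   and indexes from one past the end of each row. */
static void inner_loop_count_up_to_zero(int *c_row, const int *a_elem,
                                        const int *b_row) {
    int *c_end = c_row + NJ;        /* corresponds to 4400(%rdi) */
    const int *b_end = b_row + NJ;  /* corresponds to 4400(%r10) */
    for (ptrdiff_t n = -NJ; n != 0; n++) {
        /* *a_elem is reloaded on every iteration, as in the listing */
        c_end[n] += *a_elem * b_end[n];
    }
}

/* The same loop as written in kernel_gemm, for comparison. */
static void inner_loop_plain(int *c_row, const int *a_elem,
                             const int *b_row) {
    for (int j = 0; j < NJ; j++) {
        c_row[j] += *a_elem * b_row[j];
    }
}
```

Both functions update one row of C for a fixed `A[i][k]`. Calling them on identical inputs should leave identical rows.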
>
>
>
>
> Thank You,
>
--
--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev