[go-nuts] Re: instruction gen in amd64, it iseems not the best choice?

2020-08-26 Thread moehrmann via golang-nuts
As far as I am aware:

LEA with a scale of 3 does not exist on amd64. The scale can only be 1, 2, 4, or 8.

A three-component LEA (base, scaled index, and displacement) such as 
LEAQ 4(AX)(AX*2) takes 3 cycles on many modern amd64-compatible machines, 
versus 2 cycles in total for two simpler LEAs.
The newest generation of Intel CPUs seems to have improved again and 
avoids the slow LEA.

https://github.com/golang/go/issues/21735
https://github.com/golang/go/issues/31900
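
To make the arithmetic concrete, here is a small sketch in Go that models what each LEA form computes. The helper leaScaled and the two wrapper functions are hypothetical names made up for illustration. They only mimic the address calculation base + index*scale + disp and say nothing about timing.

```go
package main

import "fmt"

// leaScaled is a hypothetical helper that mimics the value computed by
// LEAQ disp(base)(index*scale): base + index*scale + disp.
// It rejects scales that amd64 cannot encode.
func leaScaled(base, index, scale, disp int) int {
	switch scale {
	case 1, 2, 4, 8:
		return base + index*scale + disp
	default:
		panic("amd64 LEA scale must be 1, 2, 4, or 8")
	}
}

// test3TwoLEA mirrors the two-instruction sequence the compiler emits.
func test3TwoLEA(a int) int {
	t := leaScaled(a, a, 2, 0)   // LEAQ (AX)(AX*2), AX : a + a*2 = 3a
	return leaScaled(t, 0, 1, 4) // LEAQ 4(AX), AX      : 3a + 4 (no index register)
}

// test3OneLEA mirrors the single three-component form LEAQ 4(AX)(AX*2), AX.
func test3OneLEA(a int) int {
	return leaScaled(a, a, 2, 4)
}

func main() {
	for _, a := range []int{0, 1, 42, -7} {
		fmt.Println(a, test3TwoLEA(a), test3OneLEA(a), a*3+4)
	}
}
```

Both forms give the same result as a*3 + 4. A call such as leaScaled(0, a, 3, 4), the equivalent of the proposed LEAQ 4(AX*3), panics in this model because 3 is not an encodable scale.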

On Wednesday, August 26, 2020 at 5:04:15 AM UTC+2, xie cui wrote:
>
> function:
> func test3(a int) int {
>   return a * 3 + 4
> }
>
> go version go1.13.5 darwin/amd64
> generated instructions:
>   LEAQ (AX)(AX*2), AX
>   LEAQ 4(AX), AX
>
> As far as I know, there is a better choice:
>   LEAQ 4(AX*3), AX
>
> Can it be optimized?
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/0d638194-eaf0-4b99-b66e-c831d8491470o%40googlegroups.com.


[go-nuts] Re: instruction gen in amd64, it iseems not the best choice?

2020-08-25 Thread peterGo
Go has optimizing compilers and linkers, which are constantly being improved.

In the following example of your program, the compiler inlines the call to 
test3 and folds it down to a constant:

MOVQ $0x82, 0(SP)

where 0x82 = 130 = 42*3 + 4.

$ cat xiecui.go
package main

func test3(a int) int {
	return a*3 + 4
}

func main() {
	a := 42
	t := test3(a)
	println(t)
}

$ go version
go version devel +758ac371ab Tue Aug 25 21:15:43 2020 + linux/amd64

$ go build xiecui.go

$ ./xiecui
130

$ go tool compile -S xiecui.go > xiecui.compile

"".test3 STEXT nosplit size=19 args=0x10 locals=0x0 funcid=0x0
	0x0000 00000 (xiecui.go:3)	TEXT	"".test3(SB), NOSPLIT|ABIInternal, $0-16
	0x0000 00000 (xiecui.go:3)	FUNCDATA	$0, gclocals·33cdeebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (xiecui.go:3)	FUNCDATA	$1, gclocals·33cdeebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (xiecui.go:4)	MOVQ	"".a+8(SP), AX
	0x0005 00005 (xiecui.go:4)	LEAQ	(AX)(AX*2), AX
	0x0009 00009 (xiecui.go:4)	LEAQ	4(AX), AX
	0x000d 00013 (xiecui.go:4)	MOVQ	AX, "".~r1+16(SP)
	0x0012 00018 (xiecui.go:4)	RET
"".main STEXT size=77 args=0x0 locals=0x10 funcid=0x0
	0x0000 00000 (xiecui.go:7)	TEXT	"".main(SB), ABIInternal, $16-0
	0x0000 00000 (xiecui.go:7)	MOVQ	(TLS), CX
	0x0009 00009 (xiecui.go:7)	CMPQ	SP, 16(CX)
	0x000d 00013 (xiecui.go:7)	PCDATA	$0, $-2
	0x000d 00013 (xiecui.go:7)	JLS	70
	0x000f 00015 (xiecui.go:7)	PCDATA	$0, $-1
	0x000f 00015 (xiecui.go:7)	SUBQ	$16, SP
	0x0013 00019 (xiecui.go:7)	MOVQ	BP, 8(SP)
	0x0018 00024 (xiecui.go:7)	LEAQ	8(SP), BP
	0x001d 00029 (xiecui.go:7)	FUNCDATA	$0, gclocals·33cdeebe80329f1fdbee7f5874cb(SB)
	0x001d 00029 (xiecui.go:7)	FUNCDATA	$1, gclocals·33cdeebe80329f1fdbee7f5874cb(SB)
	0x001d 00029 (xiecui.go:10)	PCDATA	$1, $0
	0x001d 00029 (xiecui.go:10)	NOP
	0x0020 00032 (xiecui.go:10)	CALL	runtime.printlock(SB)
	0x0025 00037 (xiecui.go:10)	MOVQ	$130, (SP)
	0x002d 00045 (xiecui.go:10)	CALL	runtime.printint(SB)
	0x0032 00050 (xiecui.go:10)	CALL	runtime.printnl(SB)
	0x0037 00055 (xiecui.go:10)	CALL	runtime.printunlock(SB)
	0x003c 00060 (xiecui.go:11)	MOVQ	8(SP), BP
	0x0041 00065 (xiecui.go:11)	ADDQ	$16, SP
	0x0045 00069 (xiecui.go:11)	RET
	0x0046 00070 (xiecui.go:11)	NOP
	0x0046 00070 (xiecui.go:7)	PCDATA	$1, $-1
	0x0046 00070 (xiecui.go:7)	PCDATA	$0, $-2
	0x0046 00070 (xiecui.go:7)	CALL	runtime.morestack_noctxt(SB)
	0x004b 00075 (xiecui.go:7)	PCDATA	$0, $-1
	0x004b 00075 (xiecui.go:7)	JMP	0

$ go tool objdump xiecui > xiecui.objdump

TEXT main.main(SB) /home/peter/Sync/gopath/mod/nuts/xiecui.go
  xiecui.go:7	0x45cc80	64488b0c25f8ff	MOVQ FS:0xfff8, CX
  xiecui.go:7	0x45cc89	483b6110	CMPQ 0x10(CX), SP
  xiecui.go:7	0x45cc8d	7637	JBE 0x45ccc6
  xiecui.go:7	0x45cc8f	4883ec10	SUBQ $0x10, SP
  xiecui.go:7	0x45cc93	48896c2408	MOVQ BP, 0x8(SP)
  xiecui.go:7	0x45cc98	488d6c2408	LEAQ 0x8(SP), BP
  xiecui.go:10	0x45cc9d	0f1f00	NOPL 0(AX)
  xiecui.go:10	0x45cca0	e89b17fdff	CALL runtime.printlock(SB)
  xiecui.go:10	0x45cca5	48c704248200	MOVQ $0x82, 0(SP)
  xiecui.go:10	0x45ccad	e8ae1ffdff	CALL runtime.printint(SB)
  xiecui.go:10	0x45ccb2	e8491afdff	CALL runtime.printnl(SB)
  xiecui.go:10	0x45ccb7	e80418fdff	CALL runtime.printunlock(SB)
  xiecui.go:11	0x45ccbc	488b6c2408	MOVQ 0x8(SP), BP
  xiecui.go:11	0x45ccc1	4883c410	ADDQ $0x10, SP
  xiecui.go:11	0x45ccc5	c3	RET
  xiecui.go:7	0x45ccc6	e8d5b0	CALL runtime.morestack_noctxt(SB)
  xiecui.go:7	0x45cccb	ebb3	JMP main.main(SB)

$ 
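
Because main only ever sees the folded constant, the LEAQ pair never runs in this program. If you want the call to actually execute (for a benchmark, say), one option is the //go:noinline compiler directive. This is a minimal sketch, assuming the rest of the program stays as above. The listing can be inspected again with go tool compile -S or go build -gcflags=-S.

```go
package main

// test3 is kept out of line so that main performs a real call
// instead of receiving the constant-folded result.
//
//go:noinline
func test3(a int) int {
	return a*3 + 4
}

func main() {
	a := 42
	t := test3(a)
	println(t) // the builtin println writes to standard error
}
```

With the directive in place, main should contain a CALL to test3 rather than a MOVQ of the constant. The body of test3 itself is compiled the same way either way.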

Peter

On Tuesday, August 25, 2020 at 11:04:15 PM UTC-4 cuiw...@gmail.com wrote:

> function:
> func test3(a int) int {
>   return a * 3 + 4
> }
>
> go version go1.13.5 darwin/amd64
> generated instructions:
>   LEAQ (AX)(AX*2), AX
>   LEAQ 4(AX), AX
>
> As far as I know, there is a better choice:
>   LEAQ 4(AX*3), AX
>
> Can it be optimized?
>
>
