Re: [fpc-devel] Register allocation question

2011-04-10 Thread Sergei Gorelkin

On 10.04.2011 00:49, Florian Klämpfl wrote:
> On 09.04.2011 22:22, Sergei Gorelkin wrote:
>> On 09.04.2011 23:10, Florian Klämpfl wrote:
>>> Problem is, this might hurt non-leaf functions. Maybe the register
>>> allocators can be initialized differently for leaf and non-leaf
>>> functions?
>>
>> I understand the concern, but it should be handled somehow already. If
>> we consider a non-leaf function that is complex enough to consume all 14
>> registers, what difference does the order of allocation make?
>
> It is not necessary to use all 14, but it might be more beneficial to use
> those which are preserved across a function call.
>
>> When
>> making a call, it must know which registers will be destroyed and which
>> won't, otherwise the result will be wrong anyway.
>> What I see confirms what I think: non-leaf functions continue to use
>> rbx, rsi and rdi, not r8..r11.
>
> So the code for those does not change?


Some do not change (which is why I initially wrote that it doesn't work); others change, but 
never for the worse. For example, if a value was in rbx but its live range did not intersect 
a call, it is now placed in a volatile register such as r8. If its live range does intersect a call, 
it may move to another nonvolatile register such as rdi. Likewise, registers within the volatile 
group are interchanged. But I don't see a nonvolatile register being replaced with a volatile one 
where that would require adding spill instructions.
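
The pattern described here can be sketched as follows. This is a simplified Python model, not FPC's allocator: it assumes that a value whose live range crosses a call may only be placed in a nonvolatile register, and that any other value takes the first free register in preference order. The function `pick` and the register lists are hypothetical names used only for illustration.

```python
# Hypothetical model for illustration, not FPC source.
NONVOLATILE = ["rbx", "rsi", "rdi"]      # preserved across calls (Win64, per this thread)
VOLATILE = ["r8", "r9", "r10", "r11"]    # may be destroyed by a call

def pick(crosses_call, free, order):
    """Return the first free register in `order` that is safe for the value."""
    for reg in order:
        if reg not in free:
            continue
        if crosses_call and reg in VOLATILE:
            continue  # a call would clobber it, so keeping it there would need spilling
        return reg
    return None  # nothing suitable: the value would have to be spilled

new_order = VOLATILE + NONVOLATILE       # volatile registers tried first
free = set(new_order)
print(pick(False, free, new_order))      # not live across a call -> r8
print(pick(True, free, new_order))       # live across a call     -> rbx
```

Under these assumptions, changing the order only affects values that are free to live in a volatile register, which matches the observation that non-leaf code keeps using rbx, rsi and rdi where it has to.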


I have now run the test suite on x86_64-linux, without regressions.


Sergei

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel


Re: [fpc-devel] Register allocation question

2011-04-10 Thread Florian Klämpfl
On 10.04.2011 12:38, Sergei Gorelkin wrote:
>
> I have now run the test suite on x86_64-linux, without regressions.

Feel free to commit it then.


[fpc-devel] Register allocation question

2011-04-09 Thread Sergei Gorelkin

Hello,

I wonder whether it is possible to assign a priority (or order) to registers for FPC's register 
allocator. Currently registers are allocated in the order of the ordinals defined in cpubase.pas. On 
i386 it doesn't make any difference, but on x86_64 the 'nonvolatile' rbx (and on Win64 also rsi and 
rdi) is always used before the 'volatile' r8..r11. Reversing this order would help avoid stack 
frames in simple procedures, resulting in nicer code.


Could somebody share some clues on whether this is possible and where to 
start looking?

Regards,
Sergei


Re: [fpc-devel] Register allocation question

2011-04-09 Thread Jonas Maebe

On 09 Apr 2011, at 20:08, Sergei Gorelkin wrote:

> I wonder whether it is possible to assign a priority (or order) to registers 
> for FPC's register allocator. Currently registers are allocated in the order 
> of the ordinals defined in cpubase.pas. On i386 it doesn't make any difference, 
> but on x86_64 the 'nonvolatile' rbx (and on Win64 also rsi and rdi) is always 
> used before the 'volatile' r8..r11. Reversing this order would help avoid 
> stack frames in simple procedures, resulting in nicer code.
>
> Could somebody share some clues on whether this is possible and where to 
> start looking?

Simply changing the register order in the array passed to trgcpu.create in 
Tcgx86_64.init_register_allocators should do it.
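
To illustrate why the order of that array matters, here is a minimal sketch in Python (not FPC code). It assumes a simplified allocator that hands out the first free register in preference order; the register lists follow the Win64 convention discussed in this thread, and `assign`, `callee_saved_used` and the `taken` parameter are hypothetical names.

```python
# Hypothetical model for illustration, not FPC source.
NONVOLATILE = ["rbx", "rsi", "rdi"]        # callee-saved on Win64 (per this thread)
VOLATILE = ["r8", "r9", "r10", "r11"]      # caller-saved scratch registers

def assign(values, order, taken=()):
    """Give each value the first register in `order` that is still free."""
    free = [r for r in order if r not in taken]
    return {v: free.pop(0) for v in values}

def callee_saved_used(assignment):
    """Registers the prolog/epilog would have to save and restore."""
    return sorted(r for r in assignment.values() if r in NONVOLATILE)

# r8 is marked as taken because it already holds the third parameter.
old = assign(["psrc", "pend"], NONVOLATILE + VOLATILE, taken=("r8",))
new = assign(["psrc", "pend"], VOLATILE + NONVOLATILE, taken=("r8",))
print(old, callee_saved_used(old))
print(new, callee_saved_used(new))
```

In this toy model the old order puts both locals in callee-saved registers, while the reversed order leaves nothing for a leaf procedure to save.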


Jonas


Re: [fpc-devel] Register allocation question

2011-04-09 Thread Florian Klämpfl
On 09.04.2011 20:08, Sergei Gorelkin wrote:
> Hello,
>
> I wonder whether it is possible to assign a priority (or order) to
> registers for FPC's register allocator. Currently registers are
> allocated in the order of the ordinals defined in cpubase.pas. On i386 it
> doesn't make any difference, but on x86_64 the 'nonvolatile' rbx (and on
> Win64 also rsi and rdi) is always used before the 'volatile' r8..r11.
> Reversing this order would help avoid stack frames in simple
> procedures, resulting in nicer code.
>
> Could somebody share some clues on whether this is possible and
> where to start looking?


The registers are allocated in the order defined in
tcgx86_64.init_register_allocators. However, rax etc. come
before rbx etc. there. The reason why rbx etc. are used might be calls to
other procedures. Can you give an example that is affected by the
problem mentioned above?


Re: [fpc-devel] Register allocation question

2011-04-09 Thread Sergei Gorelkin

On 09.04.2011 22:26, Sergei Gorelkin wrote:
> On 09.04.2011 22:13, Jonas Maebe wrote:
>>
>> Simply changing the register order in the array passed to trgcpu.create in
>> Tcgx86_64.init_register_allocators should do it.
>
> Hmm, that was the first thing I tried, but it doesn't seem to make any 
> difference :(


No, it works, I simply looked at the wrong place. As usual :-/

The right place to look is a function that does not call other functions, not just any 
sufficiently simple function.
Attached are assembler listings of system.indexqword() compiled for win64 with -O2, with and without 
the change. Note that the prolog and epilog are (almost) gone.


This is of course a very quick test, and I'll run the testsuite to check more 
thoroughly.
If no issues pop up, is it OK to commit?

Sergei
SYSTEM_INDEXQWORD$formal$INT64$QWORD$$INT64:
; Temps allocated between rsp+32 and rsp+56
; [377] begin
sub rsp,104
; Var buf located in register rcx
; Var len located in register rdx
; Var b located in register r8
; Var $result located in register rax
; Var psrc located in register rbx
; Var pend located in register rsi
mov qword ptr [rsp+32],rbx
mov qword ptr [rsp+40],rdi
mov qword ptr [rsp+48],rsi
; [378] psrc:=@buf;
mov rbx,rcx
; [381] if (len < 0) or
mov rax,rdx
cmp rax,0
jl  @@j373
; [382] (len > high(PtrInt) div 4) or
mov rax,rdx
mov rsi,2305843009213693951
cmp rax,rsi
jg  @@j373
; [383] (psrc+len < psrc) then
mov rax,rdx
shl rax,3
add rax,rbx
cmp rax,rbx
jnb @@j374
@@j373:
; [384] pend:=pqword(high(PtrUInt)-sizeof(qword))
mov rsi,-9
jmp @@j383
@@j374:
; [386] pend:=psrc+len;
shl rdx,3
add rdx,rbx
mov rsi,rdx
; [400] while psrc<pend do
jmp @@j383
ALIGN 8
@@j382:
; [402] if psrc^=b then
mov rdx,qword ptr [rbx]
cmp rdx,r8
jne @@j386
; [404] result:=psrc-pqword(@buf);
mov rdx,rcx
mov rdi,rbx
sub rdi,rdx
mov rdx,rdi
mov rdi,rdx
sar rdi,63
and rdi,7
add rdx,rdi
sar rdx,3
mov rax,rdx
; [405] exit;
jmp @@j369
@@j386:
; [407] inc(psrc);
add rbx,8
@@j383:
mov rdx,rbx
cmp rdx,rsi
jb  @@j382
; [409] result:=-1;
mov rax,-1
@@j369:
; [410] end;
mov rbx,qword ptr [rsp+32]
mov rdi,qword ptr [rsp+40]
mov rsi,qword ptr [rsp+48]
add rsp,104
ret
_TEXT   ENDS
SYSTEM_INDEXQWORD$formal$INT64$QWORD$$INT64:
; Temps allocated between rsp+32 and rsp+32
; [377] begin
sub rsp,72
; Var buf located in register rcx
; Var len located in register rdx
; Var b located in register r8
; Var $result located in register rax
; Var psrc located in register r9
; Var pend located in register r10
; [378] psrc:=@buf;
mov r9,rcx
; [381] if (len < 0) or
mov rax,rdx
cmp rax,0
jl  @@j373
; [382] (len > high(PtrInt) div 4) or
mov rax,rdx
mov r10,2305843009213693951
cmp rax,r10
jg  @@j373
; [383] (psrc+len < psrc) then
mov rax,rdx
shl rax,3
add rax,r9
cmp rax,r9
jnb @@j374
@@j373:
; [384] pend:=pqword(high(PtrUInt)-sizeof(qword))
mov r10,-9
jmp @@j383
@@j374:
; [386] pend:=psrc+len;
shl rdx,3
add rdx,r9
mov r10,rdx
; [400] while psrc<pend do
jmp @@j383
ALIGN 8
@@j382:
; [402] if psrc^=b then
mov rdx,qword ptr [r9]
cmp rdx,r8
jne @@j386
; [404] result:=psrc-pqword(@buf);
mov rdx,rcx
mov r11,r9
sub r11,rdx
mov rdx,r11
mov r11,rdx
sar r11,63
and r11,7
add rdx,r11
sar rdx,3
mov rax,rdx
; [405] exit;
jmp @@j369
@@j386:
; [407] inc(psrc);
add r9,8
@@j383:
mov rdx,r9
cmp rdx,r10
jb  @@j382
; [409] result:=-1;
mov 

Re: [fpc-devel] Register allocation question

2011-04-09 Thread Sergei Gorelkin

On 09.04.2011 22:15, Florian Klämpfl wrote:
>
> The registers are allocated in the order defined in
> tcgx86_64.init_register_allocators. However, rax etc. come
> before rbx etc. there. The reason why rbx etc. are used might be calls to
> other procedures. Can you give an example that is affected by the
> problem mentioned above?


I attached an example to my reply to Jonas, in the adjacent branch of this thread.

Sergei



Re: [fpc-devel] Register allocation question

2011-04-09 Thread Florian Klämpfl
On 09.04.2011 21:04, Sergei Gorelkin wrote:
> On 09.04.2011 22:26, Sergei Gorelkin wrote:
>> On 09.04.2011 22:13, Jonas Maebe wrote:
>>>
>>> Simply changing the register order in the array passed to trgcpu.create in
>>> Tcgx86_64.init_register_allocators should do it.
>>
>> Hmm, that was the first thing I tried, but it doesn't seem to make
>> any difference :(
>
> No, it works, I simply looked at the wrong place. As usual :-/
>
> The right place to look is a function that does not call other functions, not
> just any sufficiently simple function.
> Attached are assembler listings of system.indexqword() compiled for
> win64 with -O2, with and without the change. Note that the prolog and epilog
> are (almost) gone.
>
> This is of course a very quick test, and I'll run the testsuite to check
> more thoroughly.
> If no issues pop up, is it OK to commit?

Problem is, this might hurt non-leaf functions. Maybe the register
allocators can be initialized differently for leaf and non-leaf functions?


Re: [fpc-devel] Register allocation question

2011-04-09 Thread Daniël Mantione



On Sat, 9 Apr 2011, Florian Klämpfl wrote:

> On 09.04.2011 21:04, Sergei Gorelkin wrote:
>> On 09.04.2011 22:26, Sergei Gorelkin wrote:
>>> On 09.04.2011 22:13, Jonas Maebe wrote:
>>>>
>>>> Simply changing the register order in the array passed to trgcpu.create in
>>>> Tcgx86_64.init_register_allocators should do it.
>>>
>>> Hmm, that was the first thing I tried, but it doesn't seem to make
>>> any difference :(
>>
>> No, it works, I simply looked at the wrong place. As usual :-/
>>
>> The right place to look is a function that does not call other functions, not
>> just any sufficiently simple function.
>> Attached are assembler listings of system.indexqword() compiled for
>> win64 with -O2, with and without the change. Note that the prolog and epilog
>> are (almost) gone.
>>
>> This is of course a very quick test, and I'll run the testsuite to check
>> more thoroughly.
>> If no issues pop up, is it OK to commit?
>
> Problem is, this might hurt non-leaf functions. Maybe the register
> allocators can be initialized differently for leaf and non-leaf functions?


This is a form of biasing: the register allocator is biased to put 
certain values in certain registers. It's a very old trick to get better 
register allocations, and the iterated coalescing we do gets much better 
results than old biased algorithms.


However, I have noticed that in many cases the iterated coalescing still 
leaves a lot of freedom during the actual allocation, and adding some 
biasing at this point may be helpful.


I think the challenge is to design some generic infrastructure to tell the 
register allocator what biasing it should do, and then to add some 
heuristics somewhere else (like leaf/non-leaf) to give the register 
allocator the proper instructions.
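
One possible shape for such a split, sketched in Python rather than in FPC's own code: a heuristic outside the allocator produces a preference order, and the allocator applies it only among the registers it still considers legal. The names `bias_for` and `select`, and the example set of allowed registers, are made up for illustration.

```python
# Hypothetical sketch of biased selection, not FPC source.
NONVOLATILE = ["rbx", "rsi", "rdi"]
VOLATILE = ["r8", "r9", "r10", "r11"]

def bias_for(is_leaf):
    """Heuristic living outside the allocator: preferred register order."""
    return VOLATILE + NONVOLATILE if is_leaf else NONVOLATILE + VOLATILE

def select(allowed, bias):
    """Among the registers the allocator still allows, take the most preferred."""
    return min(allowed, key=bias.index)

allowed = {"rsi", "r9", "r11"}            # freedom left after coalescing (made-up example)
print(select(allowed, bias_for(True)))    # leaf procedure     -> r9
print(select(allowed, bias_for(False)))   # non-leaf procedure -> rsi
```

The point of the separation is that the allocator only needs to understand "an ordered preference", while the knowledge of what makes a good preference stays in the heuristic.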


Daniël


Re: [fpc-devel] Register allocation question

2011-04-09 Thread Florian Klämpfl
On 09.04.2011 21:34, Daniël Mantione wrote:
> I think the challenge is to design some generic infrastructure to tell
> the register allocator what biasing it should do, and then to add some
> heuristics somewhere else (like leaf/non-leaf) to give the register
> allocator the proper instructions.

True, but we don't even find the time to extend the register allocator to
handle overlapping registers better, so starting with different register
allocator initializations is a good approach, IMO.


Re: [fpc-devel] Register allocation question

2011-04-09 Thread Sergei Gorelkin

On 09.04.2011 23:10, Florian Klämpfl wrote:
>
> Problem is, this might hurt non-leaf functions. Maybe the register
> allocators can be initialized differently for leaf and non-leaf functions?


I understand the concern, but it should be handled somehow already. If we consider a non-leaf 
function that is complex enough to consume all 14 registers, what difference does the order of 
allocation make? When making a call, the compiler must know which registers will be destroyed and 
which won't, otherwise the result will be wrong anyway.

What I see confirms what I think: non-leaf functions continue to use rbx, rsi 
and rdi, not r8..r11.
I must admit I don't understand how it happens: trgobj.preserved_by_proc is not read anywhere, and 
saved_standard_registers is only referenced in the prolog and epilog generation code.


Sergei


Re: [fpc-devel] Register allocation question

2011-04-09 Thread Florian Klämpfl
On 09.04.2011 22:22, Sergei Gorelkin wrote:
> On 09.04.2011 23:10, Florian Klämpfl wrote:
>>
>> Problem is, this might hurt non-leaf functions. Maybe the register
>> allocators can be initialized differently for leaf and non-leaf
>> functions?
>
> I understand the concern, but it should be handled somehow already. If
> we consider a non-leaf function that is complex enough to consume all 14
> registers, what difference does the order of allocation make?

It is not necessary to use all 14, but it might be more beneficial to use
those which are preserved across a function call.

> When
> making a call, it must know which registers will be destroyed and which
> won't, otherwise the result will be wrong anyway.
> What I see confirms what I think: non-leaf functions continue to use
> rbx, rsi and rdi, not r8..r11.

So the code for those does not change?