Re: LDC2 win64 calling convention

2018-12-01 Thread Johan Engelen via Digitalmars-d-learn

On Thursday, 29 November 2018 at 15:10:41 UTC, realhet wrote:


In conclusion: Maybe LDC2 generates a lot of extra code, but I 
always make longer asm routines, so it's not a problem for me 
at all while it helps me a lot.


An extra note: I recommend you look into using 
`ldc.llvmasm.__asm` to write inline assembly. Some advantages: no 
worrying about calling conventions (portability) and you'll have 
more instructions available.
If you care about performance, usually you should _not_ write 
assembly, but for the 1% of other cases: the compiler also 
understands your asm much better if you use __asm.


LDC's __asm syntax is very similar (if not the same) to what GDC 
uses for inline assembly.
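
As a minimal sketch of what an `__asm` call can look like (assuming LDC on an x86-64 target; the function name `bswap64` is made up for illustration), the snippet below byte-swaps a 64-bit value. The assembly string uses AT&T syntax and the second string holds LLVM-style constraints, so the compiler chooses the register and no calling-convention details appear in the code:

```d
import ldc.llvmasm; // LDC-only module that provides __asm

// Hypothetical helper: reverse the byte order of a 64-bit value.
// "=r" asks for the result in any general-purpose register;
// "0" ties the first input (x) to that same register.
ulong bswap64(ulong x)
{
    return __asm!ulong("bswapq $0", "=r,0", x);
}

void main()
{
    assert(bswap64(0x0102030405060708UL) == 0x0807060504030201UL);
}
```

This will not build with DMD or GDC, nor on non-x86-64 targets; guard it with `version (LDC)` if the code has to stay portable.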


-Johan



Re: LDC2 win64 calling convention

2018-11-29 Thread realhet via Digitalmars-d-learn

On Wednesday, 28 November 2018 at 21:58:16 UTC, kinke wrote:
You're not using naked asm; this entails a prologue (spilling 
the params to stack etc.). Additionally, LDC doesn't really 
like accessing params and locals in DMD-style inline asm, see 
https://github.com/ldc-developers/ldc/issues/2854.


You can check the final asm trivially online, e.g., 
https://run.dlang.io/is/e0c2Ly (click the ASM button). You'll 
see that your params are in R8, RDX and RCX (reversed order as 
mentioned earlier).


Hi again.

I just tried a new debugger: x64dbg. I really like it, it is not 
the bloatware I got used to nowadays.


It turns out that LDC2's parameter/register handling is really 
clever:


- Register saving/restoring: fully automatic. It analyzes my asm 
and saves/restores only those regs I overwrite.
- Parameters: Reversed Microsoft x64 calling convention, just as 
you said. Parameters in the registers will be 'spilled' onto the 
stack no matter if I'm using them by their names or by the 
register. Maybe this is not too clever but as I can use the 
params by their name from anywhere, it can make my code nicer.
- Must not use the "ret" instruction because it will take it 
literally and will skip the auto-generated exit code.


In conclusion: Maybe LDC2 generates a lot of extra code, but I 
always make longer asm routines, so it's not a problem for me at 
all while it helps me a lot.




Re: LDC2 win64 calling convention

2018-11-28 Thread kinke via Digitalmars-d-learn
You're not using naked asm; this entails a prologue (spilling the 
params to stack etc.). Additionally, LDC doesn't really like 
accessing params and locals in DMD-style inline asm, see 
https://github.com/ldc-developers/ldc/issues/2854.


You can check the final asm trivially online, e.g., 
https://run.dlang.io/is/e0c2Ly (click the ASM button). You'll see 
that your params are in R8, RDX and RCX (reversed order as 
mentioned earlier).


Re: LDC2 win64 calling convention

2018-11-28 Thread realhet via Digitalmars-d-learn

Thank You for the explanation!

But my tests have different results:

void* SSE_sobelRow(ubyte* src, ubyte* dst, size_t srcStride){ asm{
  push RDI;

  mov RAX, 0; mov RDX, 0; mov RCX, 0; //clear 'parameter' registers


  mov RAX, src;
  mov RDI, dst;

  //gen
  movups XMM0,[RAX];
  movaps XMM1,XMM0;
  pslldq XMM0,1;
  movaps XMM2,XMM1;
  psrldq XMM1,1;
  pavgb XMM1,XMM0;
  pavgb XMM1,XMM2;
  movups [RDI],XMM1;
  //gen end

  pop RDI;
}}

When I clear those volatile regs that are used for register 
calling, I'm still able to get good results.
However when I put "mov [RBP+8], 0" into the code it generates an 
access violation, so this is why I think parameters are on the 
stack.


What I'm really unsure about is which registers I HAVE TO save in my 
asm routine.
Currently I think I'm only allowed to trash the contents of RAX, RCX, 
RDX, XMM0..XMM5, based on the Microsoft calling convention. But I'm not 
sure what the actual case is with LDC2 Win64.


If my code is surrounded by the SSE optimizations of the LDC2 
compiler and I can't satisfy its requirements, I will get 
random errors in the future. I'd better avoid those.


On the 32-bit target the rule is simple: you can do what you want 
with all the XMM regs and a, c, d. Now at 64-bit I'm quite unsure. 
:S


Re: LDC2 win64 calling convention

2018-11-28 Thread kinke via Digitalmars-d-learn

On Wednesday, 28 November 2018 at 20:17:53 UTC, kinke wrote:

The stack isn't used at all


To prevent confusion: it's used of course, e.g., if there are 
more than 4 total parameters. Just not in the classical sense, 
i.e., a 16-bytes struct isn't pushed directly onto the stack, but 
the caller makes the copy and passes a pointer, either in a 
register or on the stack.
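
To make the size rule concrete, here is a small sketch (the struct and function names are made up for illustration). `Small` is 8 bytes, so per the description above it can travel in a register; `Big` is 16 bytes, so the caller makes a copy and passes a pointer to it. The D source looks the same in both cases; only the generated code differs:

```d
// Hypothetical 8-byte POD: a power-of-2 size <= 8 bytes.
struct Small { int a; int b; }

// Hypothetical 16-byte POD: larger than 8 bytes, so it is
// passed via a pointer to a caller-made copy.
struct Big { long a; long b; }

static assert(Small.sizeof == 8);
static assert(Big.sizeof == 16);

extern(C) int sumSmall(Small s) { return s.a + s.b; }
extern(C) long sumBig(Big b) { return b.a + b.b; }

void main()
{
    assert(sumSmall(Small(1, 2)) == 3);
    assert(sumBig(Big(3, 4)) == 7);
}
```

Compiling this and inspecting the assembly output should show the difference between the two calls.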


Re: LDC2 win64 calling convention

2018-11-28 Thread kinke via Digitalmars-d-learn

On Wednesday, 28 November 2018 at 18:56:14 UTC, realhet wrote:

1. Are there register parameters? (I think not)


Of course, e.g., POD structs of power-of-2 sizes <= 8 bytes and 
integral scalars as well as float/double/vectors. The stack isn't 
used at all, aggregates > 8 bytes are passed by ref (caller makes 
a copy on its stack and passes a pointer to it to the callee); 
that seems not to be mentioned at all in the Wiki article.



2. What are the volatile regs? RAX, RCX, RDX, XMM6..XMM15?


See Microsoft's docs.


3. Is the stack pointer aligned to 16?


It is IIRC.


4. Is there a 32 byte shadow area on the stack?


Yes, IIRC.

---

LDC conforms to the regular Win64 ABI (incl. __vectorcall 
extension for vectors). The biggest difference is that 
`extern(D)` (as opposed to `extern(C)` or `extern(C++)`) reverses 
the arguments - `foo(1, 2, 3, 4)` becomes `foo(4, 3, 2, 1)`, so 
it is not the first 4 args that are passed in registers if possible, 
but the last ones (incl. special cases wrt. struct-return + `this` 
pointers). Other than that, there are just very few special cases 
for delegates and dynamic arrays, which only apply to `extern(D)`.
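
A small sketch of the difference (function names are made up for illustration; the register assignments in the comments follow the description above for LDC on Win64 at the time of this thread and are an assumption for any other compiler version or target). Both functions behave identically at the source level; only an asm routine that reads the registers directly needs to care:

```d
// Regular Win64 order: a in RCX, b in RDX, c in R8.
extern(C) int firstC(int a, int b, int c) { return a; }

// extern(D) as described above: arguments reversed before
// register assignment, so c in RCX, b in RDX, a in R8.
extern(D) int firstD(int a, int b, int c) { return a; }

void main()
{
    // The language-level result is the same for both linkages.
    assert(firstC(1, 2, 3) == 1);
    assert(firstD(1, 2, 3) == 1);
}
```

The actual assignment can be confirmed by compiling with assembly output and looking at which register each function reads.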


LDC2 win64 calling convention

2018-11-28 Thread realhet via Digitalmars-d-learn

Hi,

Is there any documentation about the win64 calling convention used 
by the LDC2 compiler?


So far I've tried to use the Microsoft x64 calling convention, but I'm 
not sure that's the one I have to use. It doesn't seem quite accurate, 
because I think it uses the stack only.


https://en.wikipedia.org/wiki/X86_calling_conventions#Microsoft_x64_calling_convention

I'm asking for Your help in the following things:

1. Are there register parameters? (I think not)
2. What are the volatile regs? RAX, RCX, RDX, XMM6..XMM15?
3. Is the stack pointer aligned to 16?
4. Is there a 32 byte shadow area on the stack?

Thank you!