VLA in Assembler

2014-12-17 Thread Foo via Digitalmars-d-learn

Hi,
Could someone explain to me if and how it is possible to allocate a
variable-length array with inline assembly?

Something like

int[] arr;
int n = 42;
asm {
    // allocate n stack space for arr
}

I know it is dangerous and all that, but I just want to know. ;)


Re: VLA in Assembler

2014-12-17 Thread bearophile via Digitalmars-d-learn

Foo:




Doing it with alloca is simpler:


void main() @nogc {
    import core.stdc.stdlib: alloca, exit;

    alias T = int;
    enum n = 42;

    auto ptr = cast(T*)alloca(T.sizeof * n);
    if (ptr is null)
        exit(1); // Or throw a memory error.
    auto arr = ptr[0 .. n];
}
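The example above uses a compile-time `n`, while the point of a VLA is a run-time length. Below is a minimal sketch of the same alloca approach with a run-time `n`, assuming a compiler whose `core.stdc.stdlib` declares `alloca` (DMD, LDC and GDC do). `sumOfSquares` is a hypothetical name for illustration. The memory is released when the function that called `alloca` returns, so the slice must not escape it.

```d
import core.stdc.stdlib : alloca;
import std.stdio : writeln;

// Hypothetical example: sums 0^2 + 1^2 + ... + (n-1)^2 using a
// stack-allocated scratch array whose length is only known at run time.
long sumOfSquares(size_t n) @nogc nothrow
{
    if (n == 0)
        return 0;

    // alloca must be called in the function that uses the memory:
    // the block disappears when *this* function returns.
    auto ptr = cast(long*) alloca(long.sizeof * n);
    if (ptr is null)
        assert(0, "alloca failed");
    long[] arr = ptr[0 .. n];

    foreach (i, ref a; arr)
        a = cast(long)(i * i);

    long total = 0;
    foreach (a; arr)
        total += a;
    return total;
}

void main()
{
    writeln(sumOfSquares(4)); // 0 + 1 + 4 + 9 = 14
}
```

Because the slice points into this function's own stack frame, returning `arr` from `sumOfSquares` would leave the caller with a dangling slice.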


Bye,
bearophile


Re: VLA in Assembler

2014-12-17 Thread Foo via Digitalmars-d-learn

On Wednesday, 17 December 2014 at 10:59:09 UTC, bearophile wrote:

Yes I know, but I really want it in inline assembly. It's for
learning purposes. :)


Re: VLA in Assembler

2014-12-17 Thread uri via Digitalmars-d-learn

On Wednesday, 17 December 2014 at 11:39:43 UTC, Foo wrote:


You could look at the disassembly.


Re: VLA in Assembler

2014-12-17 Thread Foo via Digitalmars-d-learn

On Wednesday, 17 December 2014 at 12:15:23 UTC, uri wrote:



And how? I'm on Windows.


Re: VLA in Assembler

2014-12-17 Thread btdc via Digitalmars-d-learn

On Wednesday, 17 December 2014 at 10:35:39 UTC, Foo wrote:



It's probably something like this:

module runnable;

import std.stdio;
import core.stdc.stdlib : malloc;

// 32-bit x86 only. With the D calling convention the last argument
// (aLength) arrives in EAX, and a dynamic array is returned with
// .length in EAX and .ptr in EDX.
ubyte[] newArr(size_t aLength)
{
    asm
    {
        naked;

        push EBX;       // EBX is callee-saved, so preserve it
        mov EBX, EAX;   // keep aLength in EBX across the call

        push EAX;       // cdecl argument for malloc
        call malloc;    // EAX = malloc(aLength)
        add ESP, 4;     // caller removes the argument

        mov EDX, EAX;   // .ptr
        mov EAX, EBX;   // .length

        pop EBX;
        ret;
    }
}

try and see ;) Actually it may be wrong



Re: VLA in Assembler

2014-12-17 Thread btdc via Digitalmars-d-learn

On Wednesday, 17 December 2014 at 12:54:44 UTC, btdc wrote:



fuck...the comments are once again cut...


Re: VLA in Assembler

2014-12-17 Thread Foo via Digitalmars-d-learn

And it is using malloc... ;)
I wanted something that increases the stack pointer ESP.

e.g.

void main()
{
    int[] arr;
    int n = 42;

    writeln(arr.length);
    writeln(arr.ptr);

    asm {
        mov EAX, n;
        mov [arr + 8], ESP;
        sub [ESP], EAX;
        mov [arr + 0], EAX;
    }

    writeln(arr.length);
    //writeln(arr[0]);
}

but that does not work...


Re: VLA in Assembler

2014-12-17 Thread Adam D. Ruppe via Digitalmars-d-learn

On Wednesday, 17 December 2014 at 12:29:53 UTC, Foo wrote:

And how? I'm on Windows.


Digital Mars sells an obj2asm tool that will disassemble
dmd-generated code. I think it is in the $15 basic utility package.


But VLA/alloca is more complex than a regular function: the
compiler needs to know about it to adjust for the changed stack.
It'll take a while to write this up, so I'll do it in a separate
post.


Re: VLA in Assembler

2014-12-17 Thread btdc via Digitalmars-d-learn

On Wednesday, 17 December 2014 at 14:11:32 UTC, Foo wrote:

And it is using malloc... ;)
I wanted something that increases the stack pointer ESP.



You can't always get what you want. Try more, speak less.



Re: VLA in Assembler

2014-12-17 Thread Adam D. Ruppe via Digitalmars-d-learn

On Wednesday, 17 December 2014 at 14:11:32 UTC, Foo wrote:

asm {
mov EAX, n;
mov [arr + 8], ESP;
sub [ESP], EAX;
mov [arr + 0], EAX;
}
but that does not work...


That wouldn't work even with malloc: remember, an integer is more
than one byte long, so your subtraction is 1/4 the size it needs to
be! Also, since the stack grows downward, you're storing the
pointer to the end of the array instead of the beginning of it.
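Why the code further down uses `[arr]` and `[arr + size_t.sizeof]`: a D slice is stored as two machine words, length first and pointer second, so on 32-bit x86 `.ptr` lives at offset 4, not 8, and the byte count to subtract from ESP is `n * int.sizeof`. (Also, `sub [ESP], EAX` subtracts from the memory ESP points at, not from ESP itself.) The portable sketch below checks that layout without any asm; `lengthWord`, `ptrWord` and `bytesFor` are hypothetical helpers.

```d
import std.stdio : writeln;

// A slice is exactly two words: .length then .ptr.
static assert((int[]).sizeof == 2 * size_t.sizeof);

// Reinterpret the slice as the two words the compiler stores.
size_t lengthWord(int[] a)
{
    return (*cast(size_t[2]*) &a)[0]; // word 0 == .length
}

size_t ptrWord(int[] a)
{
    return (*cast(size_t[2]*) &a)[1]; // word 1 == .ptr
}

// Bytes to subtract from ESP for n ints; same as `shl EAX, 2`
// because int.sizeof == 4.
size_t bytesFor(size_t n)
{
    return n * int.sizeof;
}

void main()
{
    int[] arr = new int[42];
    writeln(lengthWord(arr) == arr.length);        // true
    writeln(ptrWord(arr) == cast(size_t) arr.ptr); // true
    writeln(bytesFor(42));                         // 168
}
```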



NOTE: I've never actually done this before, so I'm figuring it
out as I go too. This might be buggy or otherwise mistaken at
points. (Personally, I prefer to use a static array, sized to the
largest thing I'll probably need, and slice it instead of using alloca...)
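The static-array alternative mentioned in the note can be sketched like this, with no asm and no alloca. The bound `maxOnStack` and the function name are hypothetical; the fallback to a GC allocation for oversized requests is one possible design choice, not something the note prescribes.

```d
import std.stdio : writeln;

// Hypothetical upper bound for the common case.
enum maxOnStack = 64;

// Sums 0^2 + ... + (n-1)^2. Uses a slice of a fixed-size stack buffer
// when it is big enough, and falls back to the GC heap otherwise.
long sumOfSquaresBuffered(size_t n)
{
    long[maxOnStack] buffer = void; // uninitialized stack storage
    long[] arr = n <= buffer.length
        ? buffer[0 .. n]            // slice of the stack buffer
        : new long[n];              // rare large case

    foreach (i, ref a; arr)
        a = cast(long)(i * i);

    long total = 0;
    foreach (a; arr)
        total += a;
    return total;
}

void main()
{
    writeln(sumOfSquaresBuffered(4));   // 14, stack buffer
    writeln(sumOfSquaresBuffered(100)); // 328350, heap fallback
}
```

The stack pointer never moves beyond the normal frame, so none of the register-restoring problems discussed below arise.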



Here's some code that runs successfully (in 32 bit!):

void vla(int n) {
    int[] arr;

    asm {
        mov EAX, [n];
        // the first word in an array is the length, store that
        mov [arr], EAX;
        shl EAX, 2; // number of bytes == n * int.sizeof
        sub ESP, EAX; // allocate the bytes
        mov [arr + size_t.sizeof], ESP; // store the beginning of it in the arr.ptr
    }

    import std.stdio;
    writeln(arr.length);
    writeln(arr.ptr);

    // initialize the data...
    foreach(i, ref a; arr)
        a = i;

    writeln(arr); // and print it back out
}

void main() {
    vla(8);
}


This looks right but isn't: we changed the stack pointer and
didn't put it back. That's usually a no-no. If we disassemble the
function, we can look at the end and see something scary:


 8084ec6:   e8 9d 6a 00 00   call   808b968 _D3std5stdio15__T7writelnTAiZ7writelnFAiZv  // our final writeln call

 8084ecb:   5e               pop    esi  // uh oh
 8084ecc:   5b               pop    ebx
 8084ecd:   c9               leave
 8084ece:   c3               ret



Before the leave instruction, which puts the stack back how it was
at the beginning of the function (and so saves us from a random EIP
being restored by the ret instruction), the compiler put in a few
pop instructions.


main() will have different values in esi and ebx than it expects! 
Running it in the debugger shows these values changed too:


before

(gdb) info registers
[...]
ebx            0xd4f4        -11020
[...]
esi            0x80916e8     134813416


after

ebx            0x1           1
esi            0x0           0


It popped the values of our array. According to the ABI, EBX,
ESI, EDI and EBP must be preserved across function calls:
http://dlang.org/abi.html


They are pushed for a reason - the compiler assumes they remain 
the same.



In this little test program, nothing went wrong because no more
code was run after vla returned. But if we were, say, inside a
struct method, it'd probably fault when it tried to access `this`.
It'd probably mess up other local variables too. No good!



So, we'll need to store and restore the stack pointer... can we 
use the stack's push and pop instructions? Nope, we're changing 
the stack! Our own pop would grab the wrong data too.


We could save it in a local variable. How do we restore it 
though? scope(exit) won't work, it won't happen at the right time 
and will corrupt the stack even worse.


Gotta do it ourselves - which means we can't do the alloca even 
as a single mixin, since it needs code added before any return 
point too!


(There might be other, better ways to do this... and indeed,
there are, as we'll see later on. I peeked at the druntime source
code and it does it differently. Continue reading...)





Here's code that we can verify in the debugger leaves everything 
how it should be and doesn't crash:


void vla(int n) {
    int[] arr;
    void* saved_esp;

    asm {
        mov EAX, [n];
        mov [arr], EAX;
        shl EAX, 2; // number of bytes == n * int.sizeof

        // NEW LINE
        mov [saved_esp], ESP; // save it for later

        sub ESP, EAX;
        mov [arr + size_t.sizeof], ESP;
    }

    import std.stdio;
    writeln(arr.length);
    writeln(arr.ptr);

    foreach(i, ref a; arr)
        a = i;

    writeln(arr);

    // NEW LINE
    asm { mov ESP, [saved_esp]; } // restore it before we return
}




Note that this still isn't quite right: the allocated size
should be aligned too. It works for the simple case of 8 ints
since that's coincidentally aligned, but if we were allocating,
say, 3 bytes, it would mess things up. The size has to be rounded
up to a multiple of 4, or 16 on some systems.
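The rounding described above is the usual power-of-two trick: add `alignment - 1`, then clear the low bits. A minimal sketch in plain D (`alignUp` is a hypothetical helper); in the asm version the same step would sit right after `shl EAX, 2`, for example `add EAX, 15` followed by `and EAX, 0xFFFFFFF0` for 16-byte alignment.

```d
import std.stdio : writeln;

// Round a byte count up to a multiple of `alignment`, which must be a
// power of two, before subtracting it from the stack pointer.
size_t alignUp(size_t bytes, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0,
           "alignment must be a power of two");
    return (bytes + alignment - 1) & ~(alignment - 1);
}

void main()
{
    writeln(alignUp(3, 4));                // 4
    writeln(alignUp(3, 16));               // 16
    writeln(alignUp(8 * int.sizeof, 16));  // 32: 8 ints are already aligned
}
```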


Hmm, I'm looking at the alloca source and there's a touch of
guard page handling on Windows too. Check out the file
dmd2/src/druntime/src/rt/alloca.d; it is written in mostly inline
asm.
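Some background on the guard page, stated as general Windows behavior rather than as a reading of alloca.d: Windows commits a thread's stack lazily behind a guard page, so a large stack allocation has to touch each page on the way down, in order, instead of jumping straight past the guard page. The sketch below only computes which offsets below the old stack pointer such a probing loop would touch, assuming 4096-byte pages; `probeOffsets` is a hypothetical helper, not the druntime code.

```d
import std.stdio : writeln;

// Offsets (in bytes below the original stack pointer) that a probing
// allocator would touch, top-down, one page at a time, so that the
// guard page is always hit in order. Assumes 4 KiB pages by default.
size_t[] probeOffsets(size_t bytes, size_t pageSize = 4096)
{
    size_t[] offsets;
    for (size_t off = pageSize; off < bytes; off += pageSize)
        offsets ~= off;   // touch a byte one page further down
    offsets ~= bytes;     // finally, the new stack top itself
    return offsets;
}

void main()
{
    writeln(probeOffsets(10_000)); // [4096, 8192, 10000]
    writeln(probeOffsets(100));    // [100]: small sizes need no extra probes
}
```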


Note the comment though:

 * This is a 'magic' function that needs help from the compiler to

Re: VLA in Assembler

2014-12-17 Thread Namespaces via Digitalmars-d-learn

On Wednesday, 17 December 2014 at 15:20:28 UTC, btdc wrote:



You cant always get what you want. try more, speak less.

Very helpful. And soo friendly! ;)


Re: VLA in Assembler

2014-12-17 Thread Foo via Digitalmars-d-learn
On Wednesday, 17 December 2014 at 16:10:40 UTC, Adam D. Ruppe 
wrote:
