Re: What's the fastest way to check if a slice points to static data

2017-06-24 Thread via Digitalmars-d-learn

On Saturday, 24 June 2017 at 18:46:06 UTC, ketmar wrote:

Petar Kirov [ZombineDev] wrote:

Oh, I should have mentioned that I don't expect anything but 
ugly platform-specific hacks possibly involving the object 
file format ;)
Just enough of them to claim that the solution is somewhat 
cross-platform :D


i guess you can look at how TLS scanning is done in druntime
(at least on GNU/Linux). it does some parsing of internal
structures of loaded ELF. i guess you can parse section part of
the loaded ELF too, to find out r/o sections and their address
ranges.


that's the stuff :P


Re: What's the fastest way to check if a slice points to static data

2017-06-24 Thread ketmar via Digitalmars-d-learn

Petar Kirov [ZombineDev] wrote:

Oh, I should have mentioned that I don't expect anything but ugly 
platform-specific hacks possibly involving the object file format ;)
Just enough of them to claim that the solution is somewhat cross-platform 
:D


i guess you can look at how TLS scanning is done in druntime (at least on
GNU/Linux). it does some parsing of internal structures of loaded ELF. i
guess you can parse section part of the loaded ELF too, to find out r/o
sections and their address ranges.
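
A minimal sketch of that idea, assuming Linux with glibc and druntime's `core.sys.linux.link` / `core.sys.linux.elf` bindings. It walks the program headers (segments) of every loaded object rather than the section table, and treats any loadable segment without the write flag as "read-only image data". The names `Query`, `visit` and `pointsToReadOnlyImage` are made up for illustration. Data that the loader write-protects only after relocation (RELRO) is not detected, and other object formats need their own code.

```d
version (linux):

import core.sys.linux.elf : PF_W, PT_LOAD;
import core.sys.linux.link : dl_iterate_phdr, dl_phdr_info;

/// Hypothetical state shared with the callback: queried range and answer.
private struct Query
{
    size_t start, end; // [start, end) covered by the slice
    bool readOnly;
}

/// Called once per loaded object (the executable and each shared library).
private extern (C) int visit(dl_phdr_info* info, size_t, void* data) nothrow @nogc
{
    auto q = cast(Query*) data;
    foreach (ref ph; info.dlpi_phdr[0 .. info.dlpi_phnum])
    {
        // Only loadable segments that are not mapped writable are of interest.
        if (ph.p_type != PT_LOAD || (ph.p_flags & PF_W))
            continue;
        const segStart = cast(size_t) (info.dlpi_addr + ph.p_vaddr);
        const segEnd = segStart + cast(size_t) ph.p_memsz;
        if (q.start >= segStart && q.end <= segEnd)
        {
            q.readOnly = true;
            return 1; // a non-zero result stops the iteration
        }
    }
    return 0;
}

/// True if the whole slice lies inside a non-writable loaded segment.
bool pointsToReadOnlyImage(const(void)[] slice)
{
    if (slice.ptr is null)
        return false;
    auto q = Query(cast(size_t) slice.ptr, cast(size_t) slice.ptr + slice.length);
    dl_iterate_phdr(&visit, &q);
    return q.readOnly;
}
```

This walks all program headers on every call, so it is not O(1); caching the address ranges once at startup would be the obvious next step if speed matters.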


Re: What's the fastest way to check if a slice points to static data

2017-06-24 Thread via Digitalmars-d-learn

On Saturday, 24 June 2017 at 18:05:55 UTC, ketmar wrote:

Petar Kirov [ZombineDev] wrote:


***
But in any case, the null-terminated string was just an example
application.
I'm interested in a fast way to determine the "storage class" of
the memory a slice or a pointer points to. I'm expecting some magic
along the lines of checking the range of addresses that the rodata
section resides in memory.
Similar to how some allocators or the GC know if they own a range
of memory.

Any ideas on that?
***


the only query you can do is GC query (see `core.memory.GC`
namespace, `addrOf()` API, for example). it will tell you if 
something was allocated with D GC or not.


yet it is not guaranteed to be fast (although it is usually "fast
enough").


I'm not interested in asking the GC specifically,
but I have looked at its implementation and I know
that it keeps such information around:
https://github.com/dlang/druntime/blob/v2.074.1/src/gc/impl/conservative/gc.d#L843

i think this is all what you can get without resorting to ugly 
platform-specific hacks (that will inevitably break ;-).


Oh, I should have mentioned that I don't expect anything but ugly 
platform-specific hacks possibly involving the object file format 
;)
Just enough of them to claim that the solution is somewhat 
cross-platform :D


Re: What's the fastest way to check if a slice points to static data

2017-06-24 Thread ketmar via Digitalmars-d-learn

Petar Kirov [ZombineDev] wrote:


***
But in any case, the null-terminated string was just an example
application.
I'm interested in a fast way to determine the "storage class" of the
memory a slice or a pointer points to. I'm expecting some magic along the
lines of checking the range of addresses that the rodata section resides
in memory.
Similar to how some allocators or the GC know if they own a range of
memory.

Any ideas on that?
***


the only query you can do is GC query (see `core.memory.GC` namespace,
`addrOf()` API, for example). it will tell you if something was allocated with D 
GC or not.


yet it is not guaranteed to be fast (although it is usually "fast enough").

i think this is all what you can get without resorting to ugly 
platform-specific hacks (that will inevitably break ;-).
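
For reference, a small sketch of the GC query mentioned above. `GC.addrOf` from `core.memory` returns the base address of the GC block containing a pointer, or `null` if the GC does not own it. The wrapper name `isGcAllocated` is hypothetical. Note that a `false` result only means "not GC memory"; it does not distinguish static data from stack or malloc'ed memory.

```d
import core.memory : GC;

/// Hypothetical wrapper: true if the slice starts inside a GC-managed block.
bool isGcAllocated(const(void)[] slice) nothrow
{
    // addrOf yields the base address of the owning GC block,
    // or null when the pointer is not managed by the GC.
    return slice.ptr !is null && GC.addrOf(cast(void*) slice.ptr) !is null;
}
```

Called with an array from `new int[4]` it should report `true`, while a string literal or a stack buffer should report `false`.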


Re: What's the fastest way to check if a slice points to static data

2017-06-24 Thread via Digitalmars-d-learn

On Saturday, 24 June 2017 at 14:18:33 UTC, ketmar wrote:


with the edge case when something like the code i posted below 
managed to make `a` perfectly aligned with r/o area, and you 
got segfault by accessing out-of-bounds byte.


BTW, are you sure? AFAIU, it doesn't matter if the CTFE engine
returns a non-null-terminated string expression, since the backend
or the glue layer would write it to the object file as if it was a
null-terminated string.


immutable ubyte[2] a = [65,66];
enum string s = cast(string)a;
immutable ubyte[2] b = [67,68]; // just to show you that there is no zero


void main () {
  assert(s[$-1] == 0);
}


Thanks, I haven't considered immutable statically allocated 
fixed-size arrays of chars.
Specifically, while mutable fixed-size arrays of both character 
and non-character type
are common, I don't think immutable fixed-size char arrays are 
much used compared to string literals and ctfe-derived strings. 
I'm tempted to write in the documentation of my hypothetical 
fastStringZ function that passing anything, but something 
originating from a slice is UB, though I'm aware how 
under-specified and hand-wavy this sounds.


On Saturday, 24 June 2017 at 14:21:23 UTC, ketmar wrote:

ketmar wrote:

p.s.: btw, druntime tries to avoid that edge case by not 
checking for trailing out-of-bounds zero if string ends 
exactly on dword boundary. it will miss some strings this way, 
but otherwise it is perfectly safe.


oops. not druntime, phobos, in `std.string.toStringz()`.


Thanks, for some reason I assumed that toStringz always
conservatively copies the string, without even checking the code.
It looks like the more aggressive optimization was removed at some
point, which is visible in this revision:
http://www.dsource.org/projects/phobos/changeset/101#file15
Later Andrei reintroduced it with the more conservative heuristic:
https://github.com/dlang/phobos/commit/460c844b4fb9b96833871c111dd529d22129ab7c
but I didn't manage to find any discussion about it.

***
But in any case, the null-terminated string was just an example
application.
I'm interested in a fast way to determine the "storage class" of
the memory a slice or a pointer points to. I'm expecting some magic
along the lines of checking the range of addresses that the rodata
section resides in memory.
Similar to how some allocators or the GC know if they own a range
of memory.

Any ideas on that?
***


Re: What's the fastest way to check if a slice points to static data

2017-06-24 Thread ketmar via Digitalmars-d-learn

ketmar wrote:

p.s.: btw, druntime tries to avoid that edge case by not checking for 
trailing out-of-bounds zero if string ends exactly on dword boundary. it 
will miss some strings this way, but otherwise it is perfectly safe.


oops. not druntime, phobos, in `std.string.toStringz()`.
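
A rough sketch of the heuristic described above, written from the description rather than copied from Phobos. The helper name `peekTerminator` is hypothetical. The reasoning: if the byte just past the slice is not 4-byte aligned, it shares an aligned dword with the last byte of the slice, so reading it cannot cross into an unmapped page. If it is aligned, it may be the first byte of another block, so the check is skipped.

```d
/// Hypothetical helper: returns `s.ptr` if a zero byte can be observed right
/// after the slice without risking a fault, otherwise `null`.
const(char)* peekTerminator(const(char)[] s) @system pure nothrow @nogc
{
    if (s.length == 0)
        return null; // nothing to anchor the "same dword" argument on
    const(char)* p = s.ptr + s.length;
    // A dword-aligned address may start a new, possibly unreadable block:
    // conservatively give up and let the caller copy the string instead.
    if ((cast(size_t) p & 3) == 0)
        return null;
    // Otherwise p lies in the same aligned dword as s[$ - 1], so it is readable.
    return *p == 0 ? s.ptr : null;
}
```

As noted in the thread, this misses strings that end exactly on a dword boundary. A non-null result only says a zero byte follows the slice, not that the memory is statically allocated.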


Re: What's the fastest way to check if a slice points to static data

2017-06-24 Thread ketmar via Digitalmars-d-learn
p.s.: btw, druntime tries to avoid that edge case by not checking for 
trailing out-of-bounds zero if string ends exactly on dword boundary. it 
will miss some strings this way, but otherwise it is perfectly safe.


Re: What's the fastest way to check if a slice points to static data

2017-06-24 Thread ketmar via Digitalmars-d-learn

Petar Kirov [ZombineDev] wrote:

Please note that not all static immutable strings have to be null 
terminated.
It is possible to generate a string at ctfe which may appear the same as 
string literal, but does not have the \0 at the end.


But in that case, the check `s.ptr[s.length] == 0` in fastStringZ
would do the trick, right?


with the edge case when something like the code i posted below managed to 
make `a` perfectly aligned with r/o area, and you got segfault by accessing
out-of-bounds byte.



BTW, are you sure? AFAIU, it doesn't matter if the CTFE engine returns a
non-null-terminated string expression, since the backend or the glue layer
would write it to the object file as if it was a null-terminated string.


immutable ubyte[2] a = [65,66];
enum string s = cast(string)a;
immutable ubyte[2] b = [67,68]; // just to show you that there is no zero

void main () {
  assert(s[$-1] == 0);
}



Re: What's the fastest way to check if a slice points to static data

2017-06-24 Thread via Digitalmars-d-learn

On Saturday, 24 June 2017 at 13:11:02 UTC, Stefan Koch wrote:
On Saturday, 24 June 2017 at 12:22:54 UTC, Petar Kirov 
[ZombineDev] wrote:

[ ... ]

/**
 * Returns:
 * A pointer to a null-terminated string in O(1) time,
 * (with regards to the length of the string and the required
 * memory, if any) or `null` if the time constraint
 * can't be met.
 */
immutable(T)* fastStringZ(T)(return immutable(T)[] s) @trusted
if (isSomeChar!T)
{
    if (isStaticallyAllocated(s) && s.ptr[s.length] == 0)
        return s.ptr;
    else
        return null;
}
---

(Without `isStaticallyAllocated`, `fastStringZ` may *appear* to
work but if you pass the pointer to e.g. a C library and that
library keeps it after the call has completed, good luck tracking
memory corruption if the slice was pointing to automatic/dynamic
memory - e.g. static array buffer on the stack or GC / RC * heap
allocation.
* malloc or custom allocator + smart pointer wrapper)


Please note that not all static immutable strings have to be 
null terminated.
It is possible to generate a string at ctfe which may appear 
the same as string literal, but does not have the \0 at the end.


But in that case, the check `s.ptr[s.length] == 0` in fastStringZ
would do the trick, right?

BTW, are you sure? AFAIU, it doesn't matter if the CTFE engine
returns a non-null-terminated string expression, since the backend
or the glue layer would write it to the object file as if it was a
null-terminated string.
But you're right if you mean that this trick won't work in CTFE,
since the `s.ptr[s.length] == 0` trick rightfully is disallowed.

---
void main()
{
    static immutable str = generateString();
    pragma (msg, str, " is null-terminated at CT: ",
        str.isNullTerminated());

    import std.stdio;
    writeln(str, " is null-terminated at RT: ",
        str.isNullTerminated());
}

string generateString()
{
    string res;
    foreach (i; 0 .. 26) res ~= 'a' + i;
    return res;
}

import std.traits : isSomeChar;

bool isNullTerminated(T)(scope const T[] str)
if (isSomeChar!T)
{
    // Reading one past the end is not allowed during CTFE.
    if (!__ctfe)
        return str.ptr[str.length] == 0;
    else
        return false;
}
---

Compilation output:
abcdefghijklmnopqrstuvwxyz is null-terminated at CT: false

Application output:
abcdefghijklmnopqrstuvwxyz is null-terminated at RT: true




Re: What's the fastest way to check if a slice points to static data

2017-06-24 Thread Stefan Koch via Digitalmars-d-learn
On Saturday, 24 June 2017 at 12:22:54 UTC, Petar Kirov 
[ZombineDev] wrote:

[ ... ]

/**
 * Returns:
 * A pointer to a null-terminated string in O(1) time,
 * (with regards to the length of the string and the required
 * memory, if any) or `null` if the time constraint
 * can't be met.
 */
immutable(T)* fastStringZ(T)(return immutable(T)[] s) @trusted
if (isSomeChar!T)
{
    if (isStaticallyAllocated(s) && s.ptr[s.length] == 0)
        return s.ptr;
    else
        return null;
}
---

(Without `isStaticallyAllocated`, `fastStringZ` may *appear* to
work but if you pass the pointer to e.g. a C library and that
library keeps it after the call has completed, good luck tracking
memory corruption if the slice was pointing to automatic/dynamic
memory - e.g. static array buffer on the stack or GC / RC * heap
allocation.
* malloc or custom allocator + smart pointer wrapper)


Please note that not all static immutable strings have to be null 
terminated.
It is possible to generate a string at ctfe which may appear the 
same as string literal, but does not have the \0 at the end.


What's the fastest way to check if a slice points to static data

2017-06-24 Thread via Digitalmars-d-learn
I need a fast and hopefully relatively cross-platform (ELF, OMF, 
COFF and MachO) way of checking if a slice points to data in the 
read-only section of the binary, i.e. it's pointing to a 
statically-allocated piece of memory.




Of course a simple solution using meta programming would be:

---
enum isStaticallyAllocated(alias var) = __traits(compiles,
{
    // ensures that the value is known at compile-time
    enum value = var;

    // ensures that it's not a manifest constant and that it's
    // actually going to be part of the binary (modulo linker
    // optimizations like gc-sections).
    static immutable addr = &var;
});

enum x = 3;
static immutable y = 4;
immutable z = 5;
int w = 6;

void main()
{
    enum localX = 3;
    static immutable localY = 4;
    immutable localZ = 5;
    int localW = 6;

    pragma (msg, isStaticallyAllocated!x); // false
    pragma (msg, isStaticallyAllocated!y); // true
    pragma (msg, isStaticallyAllocated!z); // true
    pragma (msg, isStaticallyAllocated!w); // false
    pragma (msg, isStaticallyAllocated!localX); // false
    pragma (msg, isStaticallyAllocated!localY); // true
    pragma (msg, isStaticallyAllocated!localZ); // false
    pragma (msg, isStaticallyAllocated!localW); // false
}
---

However, that doesn't work when all you have is a slice as a
run-time argument to a function.



Additionally, if the slice was constructed from a string literal,
it should be possible to recover a pointer to the zero-terminated
string.


Or in pseudo-code:

---
void main()
{
    import core.stdc.stdio : printf;
    auto p = "test".fastStringZ;
    p || assert(0, "Something is terribly wrong!");
    printf("%s\n", p);
}

import std.traits : isSomeChar;

// Does the magic
bool isStaticallyAllocated(in scope void[] slice)
{
    // XXX_XXX Fix me
    return true;
}

/**
 * Returns:
 * A pointer to a null-terminated string in O(1) time,
 * (with regards to the length of the string and the required
 * memory, if any) or `null` if the time constraint
 * can't be met.
 */
immutable(T)* fastStringZ(T)(return immutable(T)[] s) @trusted
if (isSomeChar!T)
{
    if (isStaticallyAllocated(s) && s.ptr[s.length] == 0)
        return s.ptr;
    else
        return null;
}
---

(Without `isStaticallyAllocated`, `fastStringZ` may *appear* to
work but if you pass the pointer to e.g. a C library and that
library keeps it after the call has completed, good luck tracking
memory corruption if the slice was pointing to automatic/dynamic
memory - e.g. static array buffer on the stack or GC / RC * heap
allocation.
* malloc or custom allocator + smart pointer wrapper)