Re: K-Nearest Neighbor + pointger alignments

2014-06-10 Thread bearophile via Digitalmars-d-announce

Ali Cehreli:


I wonder what bearophile's response will be. ;)


Despite looking like a silly sequence of optimizations, I do have 
some general comments on that text. Thanks to Kenji 
(https://github.com/D-Programming-Language/dmd/pull/3650 ) this 
code is now valid:


void foo(size_t N)(ref int[N] b) if (N == 4) {}
void main() {
int[5] a;
foo(a[1 .. $]);
}


The D type system is able to understand that if you slice away 
the first item of an an array of 5 items, you produce a pointer 
to an array of 4 items.


But the D static typing is not very strong (precise), D is not 
yet using all the fruits given by static typing.


D throws away the compile-time knowledge about pointer 
alignments, so if you write code like this without the 7 shorts 
of padding, the program crashes at run-time, because of 
misalignment:



uint distance(immutable ref short[nCols - 1] p1,
  immutable ref short[nCols - 1] p2)
pure nothrow @nogc {
alias TV = short8;
enum size_t Vlen = TV.init.array.length;
assert(p1.length % Vlen == 0);
immutable v1 = cast(immutable TV*)p1.ptr;
immutable v2 = cast(immutable TV*)p2.ptr;

TV totV = 0;
foreach (immutable i; 0 .. p1.length / Vlen) {
TV d = v1[i] - v2[i];
totV += d * d;
}

uint tot = 0;
foreach (immutable t; totV.array)
tot += t;
return tot;
}

TLabel classify(immutable short[nCols][] trainingSet,
immutable ref short[nCols - 1] pixels)
pure nothrow @nogc {
auto closestDistance = uint.max;
auto closestLabel = TLabel.max;

foreach (immutable ref s; trainingSet) {
immutable dist = pixels.distance(s[1 .. $]);
if (dist  closestDistance) {
closestDistance = dist;
closestLabel = s[labelIndex];
}
}

return closestLabel;
}



In this program there is all the information necessary to compute 
simply at compile-time the alignment of v1 and v2 and generate a 
compile-time error if you try to perform SIMD operations using 
such unaligned pointers. I don't like D to throw away static 
information that can be used to avoid run-time crashes, this is 
the opposite of what is usually called a safe language.


D type system is able to keep the length of arrays at 
compile-time, allowing data types like ushort[N], but in a system 
language that allows such simple usage of SIMD with core.simd 
it's also useful to encode in the pointer/array type the 
alignment.



So this code should not compile:

uint distance(immutable ref short[nCols - 1] p1,
  immutable ref short[nCols - 1] p2)
pure nothrow @nogc {
alias TV = short8;
enum size_t Vlen = TV.init.array.length;
assert(p1.length % Vlen == 0);
immutable v1 = cast(immutable TV*)p1.ptr;
immutable v2 = cast(immutable TV*)p2.ptr;

TV totV = 0;
foreach (immutable i; 0 .. p1.length / Vlen) {
TV d = v1[i] - v2[i];
totV += d * d;


While this should compile:

uint distance(immutable ref align(16) short[nCols - 1] p1,
  immutable ref align(16) short[nCols - 1] p2)
pure nothrow @nogc {
alias TV = short8;
enum size_t Vlen = TV.init.array.length;
assert(p1.length % Vlen == 0);
immutable v1 = cast(immutable TV*)p1.ptr;
immutable v2 = cast(immutable TV*)p2.ptr;

TV totV = 0;
foreach (immutable i; 0 .. p1.length / Vlen) {
TV d = v1[i] - v2[i];
totV += d * d;


And now this function that calls distance should not compile:

TLabel classify(immutable short[nCols][] trainingSet,
immutable ref short[nCols - 1] pixels)
pure nothrow @nogc {
auto closestDistance = uint.max;
auto closestLabel = TLabel.max;

foreach (immutable ref s; trainingSet) {
immutable dist = pixels.distance(s[1 .. $]); // error


And now this should compile:

TLabel classify(immutable align(16) short[nCols][] trainingSet,
immutable ref align(16) short[nCols - 1] pixels)
pure nothrow @nogc {
auto closestDistance = uint.max;
auto closestLabel = TLabel.max;

foreach (immutable ref s; trainingSet) {
immutable dist = pixels.distance(s[8 .. $]);



And this should compile because std.file.read returns memory 
allocated by the GC that is align(16):


align(16) immutable(short[nCols])[] readData(size_t nCols)(in 
string fileName) {

return cast(typeof(return))std.file.read(fileName);
}


Where the alignment is not known at compile-time the D compiler 
could add run-time alignment asserts in debug builds, to give 
nice run-time error messages.


The simpler int[4] or int[] types are still valid and usable, 
they could be align(1). If you think of them as align(16) you are 
writing faith-based code when you use SIMD instructions.



Automatic variables (stack-allocated) too could allow alignment 
annotations (perhaps ldc2 is already supporting this syntax):


void main() {
align(16) ubyte[60] ubs;
}


I discussed this topic another time in past:

Re: K-Nearest Neighbor + pointger alignments

2014-06-10 Thread Walter Bright via Digitalmars-d-announce

On 6/10/2014 1:46 AM, bearophile wrote:

I don't like D to
throw away static information that can be used to avoid run-time crashes, this
is the opposite of what is usually called a safe language.


To be pedantic, D being a safe language means memory safe, not no seg 
faults of any sort.


Memory safety means that all memory accessed is valid memory, i.e. no memory 
corruption.


http://en.wikipedia.org/wiki/Memory_safety

Memory alignment seg faults are something else entirely.