Re: Is placing data with align(32) on the stack with 16-byte alignment an error?

2016-05-29 Thread Marco Leise via Digitalmars-d
On Sun, 29 May 2016 13:20:12 +, Johan Engelen wrote:

> On Sunday, 29 May 2016 at 12:07:02 UTC, Marco Leise wrote:
> > 
> > void main() {
> > import core.simd;
> > Matrix4x4 matrix;  // No warning
> > float8 vector; // No warning
> > }  
> 
> Did you do some LDC IR/asm testing?

No :)
 
> With LDC, the type `float8` has 32-byte alignment and so will be 
> placed with that alignment on the stack.

Ok, so practically all compilers honor the alignment attribute
and DMD should follow suit. If I'm not mistaken, this is also
a C interop ABI issue now.

> For your Matrix4x4 user 
> type (I'll assume you meant to write `align(64)`), that alignment 
> becomes part of the type and will be put on the stack with 
> 64-byte alignment. (Aliasing does not work: with
> `alias Byte8 = align(8) byte; Byte8 willBeUnaligned;`, the
> variable does not get 8-byte alignment.)

Actually align(64), yes. But for this example align(32) was
enough as I just wanted to focus on AVX types now.

> I believe LDC respects the type's alignment when selecting 
> instructions, so when you specified align(32) for your type it
> can use the aligned load instructions. If you did not specify
> that alignment, or specified a lower one, it will use unaligned loads.
> 
> A problem arises when you cast a pointer to a type with lower
> alignment to a pointer to a type with higher alignment; in that
> case, LDC currently assumes that the cast was valid in terms of
> alignment and !
> 
> -Johan

That sounds reasonable. Thanks for the insight.

-- 
Marco



2016-05-29 Thread Johan Engelen via Digitalmars-d

On Sunday, 29 May 2016 at 12:07:02 UTC, Marco Leise wrote:


> void main() {
>     import core.simd;
>     Matrix4x4 matrix;  // No warning
>     float8 vector;     // No warning
> }


Did you do some LDC IR/asm testing?

With LDC, the type `float8` has 32-byte alignment and so will be 
placed with that alignment on the stack. For your Matrix4x4 user 
type (I'll assume you meant to write `align(64)`), that alignment 
becomes part of the type and will be put on the stack with 
64-byte alignment. (Aliasing does not work: with
`alias Byte8 = align(8) byte; Byte8 willBeUnaligned;`, the
variable does not get 8-byte alignment.)

I believe LDC respects the type's alignment when selecting
instructions, so when you specified align(32) for your type it
can use the aligned load instructions. If you did not specify
that alignment, or specified a lower one, it will use unaligned loads.
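To illustrate "the alignment becomes part of the type", here is a compile-time sketch in D. It assumes a current compiler that implements `align(n)` on a struct declaration the way the D spec describes it; `Plain` and `Holder` are made-up names, and these checks say nothing about where a particular compiler actually places locals on the stack.

```d
// Alignment declared on the struct is a property of the type itself.
align(32) struct Matrix4x4
{
    float[4][4] m;
}

// Hypothetical comparison type: same layout, no attribute, so only
// float's natural alignment applies.
struct Plain
{
    float[4][4] m;
}

// Hypothetical container: embedding Matrix4x4 raises the container's own
// alignment and pads the field up to the next 32-byte boundary.
struct Holder
{
    int x;
    Matrix4x4 mat;
}

static assert(Matrix4x4.alignof == 32);
static assert(Plain.alignof == float.alignof);
static assert(Matrix4x4.sizeof == Plain.sizeof); // 64 bytes either way
static assert(Holder.alignof == 32);
static assert(Holder.mat.offsetof == 32);

void main()
{
    import std.stdio : writeln;
    writeln("Matrix4x4.alignof = ", Matrix4x4.alignof);
}
```

Because the requirement travels with the type, a backend that sees a `Matrix4x4` lvalue may pick aligned loads for it; whether the stack slot really honors it is exactly the open question in this thread.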


A problem arises when you cast a pointer to a type with lower
alignment to a pointer to a type with higher alignment; in that
case, LDC currently assumes that the cast was valid in terms of
alignment and !
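A sketch of the defensive alternative: check the address before reinterpreting it, instead of letting the optimizer take the higher alignment on faith. `isAligned` and `checkedAlignCast` are hypothetical helpers, not LDC or druntime APIs, and the check is an `assert`, so it is compiled out in `-release` builds.

```d
import std.stdio : writeln;

align(32) struct Matrix4x4
{
    float[4][4] m;
}

/// True if `p` is a multiple of `alignment`, which must be a power of two.
bool isAligned(const(void)* p, size_t alignment) pure nothrow @nogc
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (cast(size_t) p & (alignment - 1)) == 0;
}

/// Hypothetical helper: cast `p` to `T*` only after checking that the
/// address really satisfies `T.alignof` (checked via assert).
T* checkedAlignCast(T)(void* p)
{
    assert(isAligned(p, T.alignof), "under-aligned cast to " ~ T.stringof ~ "*");
    return cast(T*) p;
}

void main()
{
    // A byte buffer only guarantees 1-byte alignment; over-allocate so it
    // contains both a 32-byte aligned address and a misaligned one.
    ubyte[Matrix4x4.sizeof + 64] buffer;
    immutable size_t pad = (32 - (cast(size_t) buffer.ptr & 31)) & 31;

    void* good = buffer.ptr + pad;     // multiple of 32
    void* bad  = buffer.ptr + pad + 4; // 4 bytes past a 32-byte boundary

    writeln(isAligned(good, Matrix4x4.alignof)); // true
    writeln(isAligned(bad,  Matrix4x4.alignof)); // false

    Matrix4x4* ok = checkedAlignCast!Matrix4x4(good);
    ok.m[0][0] = 1.0f;
    // checkedAlignCast!Matrix4x4(bad) would trip the assert instead of
    // handing the optimizer a pointer it may use with aligned loads.
}
```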


-Johan


2016-05-29 Thread ZombineDev via Digitalmars-d

On Sunday, 29 May 2016 at 12:07:02 UTC, Marco Leise wrote:
> I'll try to be concise: the stack on x64 is 16-byte aligned,
> enough for SSE registers, but not for the 32-byte AVX registers.
> Any data structure containing AVX registers cannot be
> guaranteed to be correctly aligned on the stack, but we get no
> warning if we try anyway:
>
> align(32) struct Matrix4x4 {
>     float[4][4] m;
> }
>
> void main() {
>     import core.simd;
>     Matrix4x4 matrix;  // No warning
>     float8 vector;     // No warning
> }
>
> Now, some people use align(64) just as a performance hint, for
> example to have a 64-byte data structure fill exactly one cache
> line (and for all the other things like C interop, file
> alignment, etc.). On the other hand, AVX is the first
> instruction set that makes use of alignments above 16, so the
> game has changed and will continue to do so with future x86
> SIMD extensions.
>
> Perspective A:
>
> We now have "authoritative" alignments that must be honored,
> with explicit warnings/errors if they are not, and the status
> quo: alignment hints that should be honored, but are silently
> ignored on the stack. The language could express this with an
> imagined "forcealign(32)" attribute, which disallows placing
> such data structures on the 16-byte aligned stack.
> ("forcealign" naturally overrides any smaller "align"
> attribute.)
>
> Perspective B:
>
> AVX vectors should generally be assumed to be unaligned. Unlike
> SSE, all AVX instructions except the "aligned load" ones accept
> unaligned memory operands, at a potential speed penalty.
> Aligned loads could be replaced with unaligned loads and the
> code would work again. But as compiler intrinsics continue to
> emit aligned loads for SIMD, this only works for AVX code
> written in asm; intrinsics remain a heisenbug minefield.
>
> Thoughts?


Some platforms don't even support unaligned loads/stores, so
alignment should always be honored, IMO. Otherwise SIMD types would
be unusable, because you couldn't assume that they are placed on
the stack with correct alignment.
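If the compiler will not honor the alignment itself, one manual workaround is to reserve a slightly larger byte buffer on the stack and round the address up. A minimal sketch, assuming an x64-style stack that only guarantees 16 bytes; `AlignedLocal` is a hypothetical name, not a druntime or Phobos type, and its storage is left uninitialized, so the caller has to initialize the object.

```d
import std.stdio : writeln;

align(32) struct Matrix4x4
{
    float[4][4] m;
}

/// Hypothetical helper: reserves T.sizeof + T.alignof - 1 bytes and hands
/// out a pointer rounded up to T.alignof, so the object is aligned no
/// matter how the surrounding stack frame happens to be aligned.
struct AlignedLocal(T)
{
    // Left uninitialized on purpose; the caller initializes the object.
    private ubyte[T.sizeof + T.alignof - 1] storage = void;

    // A copy would live at another address where the rounded offset may
    // differ, so its bytes would no longer line up with get(); forbid copies.
    @disable this(this);

    T* get()
    {
        immutable size_t addr = cast(size_t) storage.ptr;
        immutable size_t rounded = (addr + T.alignof - 1) & ~(T.alignof - 1);
        return cast(T*) (storage.ptr + (rounded - addr));
    }
}

void main()
{
    AlignedLocal!Matrix4x4 slot;     // lives in main's stack frame
    Matrix4x4* matrix = slot.get();
    *matrix = Matrix4x4.init;        // storage started out uninitialized
    matrix.m[0][0] = 1.0f;

    writeln(cast(size_t) matrix % Matrix4x4.alignof); // 0
}
```

The cost is up to `T.alignof - 1` wasted bytes per object and some ceremony at every use site, which is the main argument for having the compiler do this instead.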


2016-05-29 Thread Marco Leise via Digitalmars-d
P.S.: From the following bug report, it looks like gcc and
icc honor stack alignments >= 16:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44948
That would be a good solution for dmd, too.
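For reference, the arithmetic behind that kind of stack realignment is small. As far as I understand it, gcc handles over-aligned locals by saving the incoming stack pointer (typically via the frame pointer) and rounding it down with an `and` instruction; this D sketch only reproduces the rounding step, uses made-up addresses, and is not taken from the bug report. `alignDown` is a hypothetical helper.

```d
import std.stdio : writefln;

/// Round `sp` down to a multiple of `alignment` (a power of two). The stack
/// grows downwards, so rounding down only moves into unused space; this is
/// the arithmetic behind a prologue instruction such as `and rsp, -32`.
size_t alignDown(size_t sp, size_t alignment) pure nothrow @nogc @safe
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0,
           "alignment must be a power of two");
    return sp & ~(alignment - 1);
}

void main()
{
    // Made-up stack pointer values: both honor the 16-byte ABI guarantee,
    // but only the second one is also 32-byte aligned.
    immutable size_t[2] examples = [0x7ffc_1230, 0x7ffc_1240];

    foreach (sp; examples)
    {
        immutable size_t realigned = alignDown(sp, 32);
        writefln("sp=0x%x -> 0x%x (%s bytes of padding)", sp, realigned, sp - realigned);
    }
    // sp=0x7ffc1230 -> 0x7ffc1220 (16 bytes of padding)
    // sp=0x7ffc1240 -> 0x7ffc1240 (0 bytes of padding)
}
```

Starting from a 16-byte aligned stack, realigning to 32 bytes costs at most 16 bytes per frame, plus keeping the original stack pointer around to restore it on return.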

-- 
Marco