Re: AVX for math code ... avx instructions later disappearing ?

2021-09-26 Thread james.p.leblanc via Digitalmars-d-learn

On Sunday, 26 September 2021 at 19:00:54 UTC, kinke wrote:
On Sunday, 26 September 2021 at 18:08:46 UTC, james.p.leblanc 
wrote:

or even moving the array declarations to before
the dot product function, and the avx instructions will 
disappear!


That's because the `@fastmath` UDA applies to the next 
declaration only, which is the `x` array in your 2nd example 
(where it obviously has no effect). Either use `@fastmath:` 
with the colon to apply it to the entire scope, or use 
`-ffast-math` in the LDC cmdline.


Similarly, when moving the function to another module and you 
don't include that module in the cmdline, it's only imported 
and not compiled and won't show up in the resulting assembly.


Wrt. stack alignment, there aren't any issues with LDC AFAIK 
(not limited to 16 or whatever like DMD).


Kinke,

Thanks very much for your response.  There were many issues that I
had been misunderstanding in my attempts.  The provided explanation
helped me understand the broader scope of what is happening.

(I never even thought about the @fastmath UDA aspect! ... a bit
embarrassing for me!)  Using -ffast-math on the LDC
cmdline seems to be the most elegant solution.

Much appreciated!
Regards,
James





Re: AVX for math code ... avx instructions later disappearing ?

2021-09-26 Thread kinke via Digitalmars-d-learn
On Sunday, 26 September 2021 at 18:08:46 UTC, james.p.leblanc 
wrote:

or even moving the array declarations to before
the dot product function, and the avx instructions will 
disappear!


That's because the `@fastmath` UDA applies to the next 
declaration only, which is the `x` array in your 2nd example 
(where it obviously has no effect). Either use `@fastmath:` with 
the colon to apply it to the entire scope, or use `-ffast-math` 
in the LDC cmdline.
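
A minimal sketch of the `@fastmath:` variant described above, assuming LDC (the `ldc.attributes` module is LDC-specific). It reuses the arrays and `dot` function from the original post, with the arrays declared first; the trailing colon is the only change.

```d
// Sketch only: requires LDC, because ldc.attributes is LDC-specific.
import ldc.attributes : fastmath;

// The trailing colon applies @fastmath to every declaration that follows
// in this scope, so the position of the arrays no longer matters.
@fastmath:

double[8] x = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];
double[8] y = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];

double dot(double[] a, double[] b)
{
    double s = 0.0;
    foreach (size_t i; 0 .. a.length)
        s += a[i] * b[i];
    return s;
}

void main()
{
    double z = dot(x, y);
}
```

Placing a plain `@fastmath` directly in front of `dot` should work as well, since the attribute then attaches to the function rather than to `x`.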


Similarly, when moving the function to another module and you 
don't include that module in the cmdline, it's only imported and 
not compiled and won't show up in the resulting assembly.


Wrt. stack alignment, there aren't any issues with LDC AFAIK (not 
limited to 16 or whatever like DMD).
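
For anyone who wants to check alignment directly rather than rely on it, here is a small sketch that is not from the original posts. It requests 32-byte alignment with D's `align` attribute and checks the resulting addresses at run time. The names `xs`, `local` and `isAligned` are hypothetical, and whether the stack-variable check holds depends on the compiler; the thread only states that LDC has no such limit.

```d
// Illustrative check only; names are hypothetical and not from the thread.

// Request 32-byte alignment (the width of a ymm register) for a global array.
align(32) double[8] xs = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];

// True if the address p is a multiple of n bytes.
bool isAligned(const(void)* p, size_t n)
{
    return (cast(size_t) p) % n == 0;
}

void main()
{
    // Same request for a stack variable.
    align(32) double[4] local = [1.0, 2.0, 3.0, 4.0];

    assert(isAligned(xs.ptr, 32));
    assert(isAligned(local.ptr, 32));
}
```

Build this without `-release`, since that switch strips the `assert` checks.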


AVX for math code ... avx instructions later disappearing ?

2021-09-26 Thread james.p.leblanc via Digitalmars-d-learn

Dear D-ers,

I enjoyed reading some details of incorporating AVX into math code
from Johan Engelen's programming blog post:

http://johanengelen.github.io/ldc/2016/10/11/Math-performance-LDC.html

Basically, one can use the ldc compiler to insert avx code, nice!

In playing with some variants of his example code, I realize
that there are issues I do not understand.  For example, the following
code successfully incorporates the avx instructions:

```d
// File here is called dotFirst.d
import ldc.attributes : fastmath;

@fastmath
double dot(double[] a, double[] b)
{
    double s = 0.0;
    foreach (size_t i; 0 .. a.length) {
        s += a[i] * b[i];
    }
    return s;
}

double[8] x = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];
double[8] y = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];

void main()
{
    double z = 0.0;
    z = dot(x, y);
}
```
If we run:

ldc2 -c -output-s -O3 -release dotFirst.d -mcpu=haswell
echo "Results of grep ymm dotFirst.s:"
grep ymm dotFirst.s

The "grep" shows a number of vector instructions, such as:

**vfmadd132pd 160(%rcx,%rdi,8), %ymm5, %ymm1**

However, with subtle changes in the code (such as moving the dot product
function to a module, or even moving the array declarations to before
the dot product function), the avx instructions will disappear!

```d
import ldc.attributes : fastmath;

@fastmath
double[8] x = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];
double[8] y = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];

double dot(double[] a, double[] b)
{
    double s = 0.0;
    foreach (size_t i; 0 .. a.length) {
...

```
Now a grep will not find a single **ymm**.

It is understood that ldc needs proper alignment to be able to do the
vector instructions...

**But my question is:**   how is proper alignment guaranteed?
(Most importantly, how is it guaranteed among code using modules?)
(There are related stack alignment issues -- 16?)

Best Regards,
James


PS I have come across scattered bits of (sometimes contradictory)
information on avx/simd for dlang.  Is there a canonical source for
vector info?