Hi Everyone,

Would it be a good idea to arrange the data along the fastest-varying
direction in the following manner, to make aligned loads and vector
operations easier?

Total grid points = 4n
0, n, 2n, 3n, 1, n+1, 2n+1, 3n+1 and so on
Ref: "Tuning a Finite Difference Computation for Parallel Vector Processors"
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6341495
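
To make the ordering above concrete, here is a minimal sketch of the index
mapping, assuming the 4n points are split into 4 contiguous chunks of n points
that are then interleaved. The function names `interleaved_position` and
`storage_order` are hypothetical and only for illustration; they are not taken
from the referenced paper or from PETSc.

```python
def interleaved_position(i, n, lanes=4):
    """Storage position of logical grid point i when lanes*n points are interleaved.

    Assumption: point i belongs to chunk (lane) i // n, at offset i % n within it.
    """
    lane, offset = divmod(i, n)
    return lanes * offset + lane


def storage_order(n, lanes=4):
    """List the logical grid indices in the order they would sit in memory."""
    order = [0] * (lanes * n)
    for i in range(lanes * n):
        order[interleaved_position(i, n, lanes)] = i
    return order


# For n = 3 (12 grid points) this reproduces the pattern 0, n, 2n, 3n, 1, n+1, ...
print(storage_order(3))  # [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]
```

With this layout, each group of 4 consecutive memory locations holds the same
offset from each of the 4 chunks, which is what would allow one aligned vector
load per group.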

This change in the global memory layout would mix up the ghost zones in
PETSc's DMDAs and, I guess, change the matrix structure by separating
adjacent points by a distance of 4. One could even make the distance 8 and
load one full cache line in one go. I was wondering whether this memory
layout can be used for computations with PETSc's DMDAs, and whether the
preconditioners would be OK with this kind of arrangement.
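
As a rough illustration of the point about matrix structure, the sketch below
computes how far apart logically adjacent grid points end up in memory under
the interleaved ordering. It assumes the same chunk-and-interleave mapping as
described above, uses hypothetical helper names, and does not call PETSc at all.

```python
def interleaved_position(i, n, lanes=4):
    """Storage position of logical point i (assumed mapping: chunk i // n, offset i % n)."""
    lane, offset = divmod(i, n)
    return lanes * offset + lane


def neighbor_strides(n, lanes=4):
    """Memory distance between logical neighbors i and i+1, for every i."""
    total = lanes * n
    return [
        interleaved_position(i + 1, n, lanes) - interleaved_position(i, n, lanes)
        for i in range(total - 1)
    ]


# n = 3, 4 lanes: neighbors inside a chunk are 4 apart; neighbors that
# straddle a chunk boundary jump back by 4n - 5 = 7 positions.
print(neighbor_strides(3))  # [4, 4, -7, 4, 4, -7, 4, 4, -7, 4, 4]
```

So a 1D three-point stencil would couple entries at distance 4 (or 8 with
`lanes=8`) in the interior, with a few long-range couplings at the chunk
boundaries.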

Thanks,
Mani
