[petsc-users] Optimal memory layout for finite differences

Mani Chandra Thu, 12 Dec 2013 19:24:41 -0800

Hi Everyone,

Would it be a good idea to arrange the data along the fastest-varying direction
in the following manner, to make aligned loads and vector operations easier?


Total grid points = 4n
0, n, 2n, 3n, 1, n+1, 2n+1, 3n+1 and so on
Ref: "Tuning a Finite Difference Computation for Parallel Vector Processors"
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6341495
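A minimal sketch of the index mapping above, assuming the 4n points of the
fastest direction are split into four contiguous chunks of length n (chunk b
holds points b*n .. b*n+n-1). Point b*n + j is then stored at position 4*j + b.
The names NCHUNK, interleaved_pos, natural_index and to_interleaved are
hypothetical helpers for illustration, not part of PETSc.

```c
#include <stddef.h>

/* Number of interleaved chunks (hypothetical). 4 matches the layout above;
   e.g. four doubles fill one 256-bit AVX register. */
#define NCHUNK 4

/* Natural index g in [0, NCHUNK*n) -> storage position.
   g = b*n + j (chunk b, offset j) is stored at NCHUNK*j + b, which gives
   the order 0, n, 2n, 3n, 1, n+1, 2n+1, 3n+1, ... */
static size_t interleaved_pos(size_t g, size_t n)
{
    size_t b = g / n;   /* which chunk */
    size_t j = g % n;   /* offset inside the chunk */
    return NCHUNK * j + b;
}

/* Inverse map: storage position p -> natural index. */
static size_t natural_index(size_t p, size_t n)
{
    size_t j = p / NCHUNK;
    size_t b = p % NCHUNK;
    return b * n + j;
}

/* Permute a naturally ordered array (length NCHUNK*n) into the
   interleaved layout. */
static void to_interleaved(const double *nat, double *il, size_t n)
{
    for (size_t g = 0; g < NCHUNK * n; ++g)
        il[interleaved_pos(g, n)] = nat[g];
}
```

The mapping is a pure permutation, so converting to and from the natural
ordering costs one pass over the data in each direction.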

This change in the global memory layout would mix up the ghost zones in
PETSc's DMDAs, and I guess it would also change the matrix structure, since
adjacent points would now be separated by a distance of 4 in memory. One could
even make the distance 8 and load one full cache line in one go. I was
wondering whether this memory layout can be used for computations with PETSc's
DMDAs, and whether the preconditioners would be OK with this kind of
arrangement.
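To illustrate the stride-4 neighbour distance and the ghost-zone mixing, here
is a hypothetical sketch in plain C (no PETSc calls) of a 1D three-point second
difference, taken as zero outside the domain, applied directly to interleaved
data. It assumes the same layout as above (point b*n + j at position
NCHUNK*j + b) and n >= 2. For interior offsets the neighbours sit exactly
NCHUNK slots away and the NCHUNK lanes are independent, so the inner loop maps
onto one SIMD operation; the first and last offset of each chunk need a value
from the adjacent lane at the other end of the array. Setting NCHUNK to 8 would
give the cache-line-sized variant (8 doubles = 64 bytes on common x86 hardware).

```c
#include <stddef.h>

/* Number of interleaved chunks (hypothetical; 8 would fill a 64-byte line). */
#define NCHUNK 4

/* out[g] = u[g-1] - 2*u[g] + u[g+1] with u = 0 outside [0, NCHUNK*n),
   where both u and out are stored in the interleaved layout. */
static void laplace_interleaved(const double *u, double *out, size_t n)
{
    /* Interior offsets j = 1 .. n-2: neighbours are NCHUNK slots away and
       the NCHUNK lanes at positions NCHUNK*j .. NCHUNK*j+NCHUNK-1 are
       independent, so this inner loop is vectorizable. */
    for (size_t j = 1; j + 1 < n; ++j)
        for (size_t b = 0; b < NCHUNK; ++b) {
            size_t p = NCHUNK * j + b;
            out[p] = u[p - NCHUNK] - 2.0 * u[p] + u[p + NCHUNK];
        }

    /* Chunk edges: the missing neighbour belongs to the adjacent chunk,
       stored in the neighbouring lane at the opposite end of the array. */
    for (size_t b = 0; b < NCHUNK; ++b) {
        size_t first = b;                     /* offset j = 0     */
        size_t last  = NCHUNK * (n - 1) + b;  /* offset j = n - 1 */
        double left  = (b > 0) ? u[NCHUNK * (n - 1) + (b - 1)] : 0.0;
        double right = (b + 1 < NCHUNK) ? u[b + 1] : 0.0;
        out[first] = left - 2.0 * u[first] + u[first + NCHUNK];
        out[last]  = u[last - NCHUNK] - 2.0 * u[last] + right;
    }
}
```

Only the two edge offsets per chunk break the regular stride, which is where a
DMDA-style ghost exchange would have to be reorganized.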

Thanks,
Mani
