Could you elaborate a bit on what you mean by packing aligned representations at some granularity? I thought that was exactly what the AoSoA configuration does: it packs the variables at the aligned SIMD width. Or do you mean loop blocking, with each block sized to fit in the L1 cache?
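
One possible reading of "packing at L1-tile granularity" can be sketched as follows. This is only an illustration under stated assumptions, not something taken from PETSc or from the thread: the tile width `W`, the `Node` and `Tile` types, the function name `update_nodes`, and the pointwise kernel are all hypothetical. The global array keeps the usual interleaved layout, and only a small local tile is repacked so that the inner loop runs over unit-stride arrays.

```c
#include <assert.h>
#include <stddef.h>

#define W 16 /* hypothetical tile width (e.g. 16 floats = 64 bytes) */

/* Usual interleaved (AoS) layout: one struct per grid node. */
typedef struct { float var1, var2, var3; } Node;

/* Packed tile (SoA): each field is contiguous within the tile. */
typedef struct { float var1[W], var2[W], var3[W]; } Tile;

/* Hypothetical pointwise kernel: var3 += var1 * var2 on n nodes.
 * The global array x is never reorganized; each chunk of up to W
 * nodes is packed into a local tile, updated, and unpacked. */
static void update_nodes(Node *x, size_t n)
{
  assert(x != NULL || n == 0);

  for (size_t start = 0; start < n; start += W) {
    /* The last chunk may be shorter than W. */
    size_t len = (n - start < W) ? (n - start) : W;
    Tile t;

    /* Pack: gather the interleaved fields into the local tile. */
    for (size_t i = 0; i < len; i++) {
      t.var1[i] = x[start + i].var1;
      t.var2[i] = x[start + i].var2;
      t.var3[i] = x[start + i].var3;
    }

    /* Compute on unit-stride arrays; this loop is easy to vectorize. */
    for (size_t i = 0; i < len; i++) {
      t.var3[i] += t.var1[i] * t.var2[i];
    }

    /* Unpack: scatter only the field that changed. */
    for (size_t i = 0; i < len; i++) {
      x[start + i].var3 = t.var3[i];
    }
  }
}
```

Calling `update_nodes(x, n)` on an ordinary array of `Node` leaves the storage layout untouched, so neighbor exchange would not need to know about the tiles. Whether the pack and unpack cost pays off depends on how much arithmetic is done per tile, which is something to measure rather than assume.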

On Sat, Nov 23, 2013 at 3:48 PM, Jed Brown <[email protected]> wrote:

> Mani Chandra <[email protected]> writes:
>
> > Hi,
> >
> > Is it possible to use an Arrays of Structs of Arrays (AOSOA) configuration
> > using DMDAs? Something like
> >
> > struct node {
> >     float var1[16], var2[16], var3[16];
> > }
>
> Yes, you can manually manage this dimension/chunking, and use
> DMDASetBlockFills() so that the resulting matrix retains proper
> sparsity. Neighbor exchange will not automatically understand the
> blocks, and you would have to use a different fringe layout if you want
> to organize data as AoSoA.
>
> > Instead of
> >
> > struct node {
> >     float var1, var2, var3;
> > }
> >
> > as is the usual way of using DMDAs.
> >
> > The global grid size of say a 2D grid would then decrease from NxN to
> > (N/16)xN
> >
> > I'm interested in doing this for ease of vectorization as described in
> > http://software.intel.com/en-us/articles/memory-layout-transformations
>
> Note that sparse iterative methods are overwhelmingly limited by memory
> bandwidth rather than vectorization, so you'll get no speedup here.
> Heavy optimization of stencil operations requires either unaligned loads
> or a "roll" operation, at which point the benefit over register
> transposition fades. So instead of trying to change the global memory
> alignment, I recommend packing aligned representations at whichever
> granularity makes sense (in registers, in L1-cache tiles, etc). Make
> sure to benchmark the real memory access patterns before leaping to
> conclusions about optimal memory layout.
