Re: [petsc-users] AOSOA configuration using DMDA

Mani Chandra Sun, 24 Nov 2013 15:25:54 -0800

Could you elaborate a bit on what you mean by packing aligned
representations at some granularity? I thought this was what the AOSOA
configuration does: packing in variables at the aligned SIMD width. Do you
mean loop blocking with each block fitting into the L1 cache?
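
To make the question concrete, here is a minimal sketch of the indexing I have in mind for the AOSOA layout, next to the usual interlaced layout. The names (NVAR, WIDTH, Chunk, aosoa_offset, aos_offset) are made up for illustration and are not part of PETSc; the chunk width of 16 and the three variables come from the struct quoted below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes: 3 variables per point, 16 points per chunk. */
enum { NVAR = 3, WIDTH = 16 };

/* One AoSoA chunk: WIDTH consecutive grid points, stored variable by variable. */
typedef struct {
  float var[NVAR][WIDTH];
} Chunk;

/* Offset (in floats) of variable v at grid point i in the AoSoA layout. */
static size_t aosoa_offset(size_t i, size_t v)
{
  assert(v < NVAR);
  return (i / WIDTH) * (NVAR * WIDTH) /* start of the chunk holding point i */
         + v * WIDTH                  /* start of variable v inside the chunk */
         + (i % WIDTH);               /* lane of point i inside the chunk */
}

/* Offset of the same entry in the usual interlaced (AoS) layout. */
static size_t aos_offset(size_t i, size_t v)
{
  assert(v < NVAR);
  return i * NVAR + v;
}
```

With this mapping, the 16 values of one variable in a chunk are contiguous, whereas in the interlaced layout consecutive values of the same variable are NVAR floats apart.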



On Sat, Nov 23, 2013 at 3:48 PM, Jed Brown <[email protected]> wrote:

> Mani Chandra <[email protected]> writes:
>
> > Hi,
> >
> > Is it possible to use an Array of Structs of Arrays (AOSOA) configuration
> > using DMDAs? Something like
> >
> > struct node {
> >   float var1[16], var2[16], var3[16];
> > };
>
> Yes, you can manually manage this dimension/chunking, and use
> DMDASetBlockFills() so that the resulting matrix retains proper
> sparsity.  Neighbor exchange will not automatically understand the
> blocks, and you would have to use a different fringe layout if you want
> to organize data as AoSoA.
>
> > Instead of
> >
> > struct node {
> >   float var1, var2, var3;
> > };
> >
> > as is the usual way of using DMDAs.
> >
> > The global grid size of, say, a 2D grid would then decrease from NxN to
> > (N/16)xN.
> >
> > I'm interested in doing this for ease of vectorization as described in
> > http://software.intel.com/en-us/articles/memory-layout-transformations
>
> Note that sparse iterative methods are overwhelmingly limited by memory
> bandwidth rather than vectorization, so you'll get no speedup here.
> Heavy optimization of stencil operations requires either unaligned loads
> or a "roll" operation, at which point the benefit over register
> transposition fades.  So instead of trying to change the global memory
> alignment, I recommend packing aligned representations at whichever
> granularity makes sense (in registers, in L1-cache tiles, etc).  Make
> sure to benchmark the real memory access patterns before leaping to
> conclusions about optimal memory layout.
>
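
One possible reading of "packing aligned representations" at tile granularity, as suggested in the quoted reply, is sketched below: the global array keeps the usual interlaced layout, and each tile is copied into a small per-variable buffer, updated there, and copied back. This is only an illustration under assumed names (NVAR, TILE, pack_tile, unpack_tile, update_tiled are hypothetical, and the update var1 += var2 * var3 is a placeholder kernel); it does not use any PETSc API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes: 3 interlaced variables, tiles of 16 points. */
enum { NVAR = 3, TILE = 16 };

/* Copy TILE interlaced points into per-variable contiguous arrays. */
static void pack_tile(const float *aos, float soa[NVAR][TILE])
{
  for (size_t i = 0; i < TILE; i++)
    for (size_t v = 0; v < NVAR; v++)
      soa[v][i] = aos[i * NVAR + v];
}

/* Copy the per-variable arrays back into the interlaced layout. */
static void unpack_tile(float soa[NVAR][TILE], float *aos)
{
  for (size_t i = 0; i < TILE; i++)
    for (size_t v = 0; v < NVAR; v++)
      aos[i * NVAR + v] = soa[v][i];
}

/* Placeholder kernel: var1 += var2 * var3 at every point, one tile at a time. */
static void update_tiled(float *aos, size_t npoints)
{
  float soa[NVAR][TILE]; /* small enough to stay in L1 cache */

  assert(npoints % TILE == 0); /* sketch only: no remainder handling */
  for (size_t start = 0; start < npoints; start += TILE) {
    float *tile = aos + start * NVAR;
    pack_tile(tile, soa);
    for (size_t i = 0; i < TILE; i++) /* unit-stride loop over one variable */
      soa[0][i] += soa[1][i] * soa[2][i];
    unpack_tile(soa, tile);
  }
}
```

The inner loop runs over contiguous data without changing the global layout. Whether the packing cost pays off depends on the real access pattern, which is the point of the benchmarking advice above.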
