Is there a "tuning" part of BuildSystem that could be added?

    We would pick out the settings that benefit from tuning and select a good 
value for each. For example, for memalign, configure would run some "benchmark" 
with the various sizes and select the smallest size that performs best.
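
    A minimal sketch of that selection rule, assuming the timings have already 
been measured somehow. The function name pick_alignment, the 2% tolerance, and the 
timing numbers are all hypothetical; nothing like this exists in BuildSystem today.

```python
def pick_alignment(timings, tolerance=0.02):
    """Return the smallest alignment whose time is within `tolerance`
    (as a fraction) of the fastest measured time.

    `timings` maps alignment in bytes -> benchmark time in seconds.
    """
    if not timings:
        raise ValueError("no timings supplied")
    best = min(timings.values())
    # Walk the candidates from smallest to largest alignment and stop
    # at the first one that is close enough to the best time.
    for align in sorted(timings):
        if timings[align] <= best * (1.0 + tolerance):
            return align


# Made-up timings, for illustration only.
example_timings = {16: 1.30, 32: 1.05, 64: 1.00, 128: 0.99}
chosen = pick_alignment(example_timings)
```

    The tolerance is there so that a marginally faster but much larger alignment 
does not win over a smaller one that is essentially as good.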

   If this is just way too much intellectual work and coding to be worthwhile, 
which I think it is, we can use the "predefined based on architecture" approach, 
where configure looks in a table based on the arch/sub-arch etc. to select a 
value (and of course the value is verified to work correctly). This approach 
may not always produce the optimal value, but it is simple and will usually 
select a pretty good one. It only requires the discipline of adding new arches 
over time (which often will not happen), but at least for some arches we may have 
good values.
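
   A rough sketch of the table approach. The table name, its keys, and the 
lookup function are hypothetical, and the values are placeholders taken from this 
thread (64 for KNL, 16 as the current fallback), not measured defaults.

```python
# Hypothetical table keyed on (arch, sub-arch); None means "any sub-arch".
# Values are illustrative placeholders, not tuned results.
MEMALIGN_BY_ARCH = {
    ("x86_64", "knl"): 64,  # matches the Cori KNL example in this thread
    ("x86_64", None): 32,   # placeholder for generic AVX-era x86_64
}


def is_power_of_two(n):
    """Cheap sanity check standing in for a real configure-time test."""
    return n > 0 and (n & (n - 1)) == 0


def lookup_memalign(arch, subarch=None, table=MEMALIGN_BY_ARCH, default=16):
    """Return the alignment for the most specific matching table entry,
    falling back to `default` for architectures nobody has added."""
    for key in ((arch, subarch), (arch, None)):
        if key in table:
            value = table[key]
            if not is_power_of_two(value):
                raise ValueError("bad table entry for %r: %r" % (key, value))
            return value
    return default
```

   An unknown arch just gets the fallback, so a missing table entry is never 
worse than what configure picks now.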

   Barry



> On Sep 5, 2017, at 12:55 PM, Jed Brown <[email protected]> wrote:
> 
> Richard Tran Mills <[email protected]> writes:
> 
>> Folks,
>> 
>> I am wondering how PETSc's BuildSystem currently chooses the memory
>> alignment to use. In the example file I provide in the repo for the Cori
>> KNL nodes, I specify '--with-memalign=64' to match the cache line width. If
>> I don't do this, then configure chooses a 16 byte alignment. Do people
>> think that we should try to make a better effort to choose a more
>> appropriate alignment?
>> 
>> I believe that all modern x86 CPUs use a 64 byte cache width, so should we
>> be defaulting to that? I don't know how much this matters on a Xeon
>> processor, but it can be important on Xeon Phi.
> 
> The preference for 16 was for SSE2, prior to AVX and AVX512.  With AVX,
> I believe there is no particular reason to prefer more than 32-byte
> alignment -- it just ensures that objects cannot share cache lines.
> Harmless with large arrays, but not ideal for smaller allocations like
> nodes in a linked list.  (PETSc doesn't use these very often, but it
> would be nice if control structures don't soak up more cache than they
> need.)
