On 2 Mar 2017, at 14:37, Erik Schnetter <[email protected]> wrote:

> I am currently redesigning the tiling infrastructure, also to allow 
> multithreading via Qthreads instead of OpenMP and to allow for aligning 
> arrays with cache line boundaries. The new approach (different from the 
> current LoopControl) is to choose a fixed tile size, either globally or per 
> loop, and then assign individual tiles to threads. This also works well with 
> DG derivatives, where the DG element size dictates a granularity for the tile 
> size, and with the new efficient tiled derivative operators. Most of this is still 
> in flux. I have seen large efficiency improvements in the RHS calculation, 
> but two puzzling items remain:
> 
> (1) It remains more efficient to use MPI than multi-threading for 
> parallelization, at least on regular CPUs. On KNL my results are still 
> somewhat random.

When using MPI rather than multi-threading on the same number of cores, each 
component will be smaller, meaning that more of it is likely to fit in the 
cache.  Would that explain this observation?
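
As a rough illustration, with made-up round numbers and ignoring ghost zones: 
with ~20 evolved grid functions in double precision,

   50^3 points x 8 bytes x 20 variables  ~ 20 MB  for one component,
   the same region over 8 MPI ranks      ~ 2.5 MB per rank,

and 2.5 MB is much closer to a single core's share of L2/L3 than 20 MB is.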

> (2) MoL_Add is quite expensive compared to the RHS evaluation.

That is indeed odd.

> The main thing that changed since our last round of thorough benchmarks is 
> that CPUs have become much more powerful while memory bandwidth hasn't. I'm 
> beginning to think that things such as vectorization or parallelization 
> basically don't matter any more if we ensure that we pull data from memory 
> into caches efficiently.
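
A quick back-of-envelope number that seems to support this (round figures 
only): a streaming update of the form

   u[i] = u[i] + dt * rhs[i];   /* 2 flops, ~24 bytes moved per point */

has an arithmetic intensity of roughly 0.08 flops/byte, while a CPU with 
~500 Gflop/s peak and ~50 GB/s memory bandwidth needs ~10 flops/byte to be 
compute-bound.  Loops that far below the machine balance are bandwidth-limited 
no matter how well they are vectorized or threaded (and a pure 
linear-combination loop like MoL_Add sits at the extreme low end of that 
scale).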
> 
> I have not yet collected PAPI statistics.
> 
> -erik
> 
> 
> On Thu, Mar 2, 2017 at 6:57 AM, Ian Hinder <[email protected]> wrote:
> 
> On 1 Mar 2017, at 22:10, David Radice <[email protected]> wrote:
> 
>> Hi Ian, Erik, Eloisa,
>> 
>>> I attach a very brief report of some results I obtained in 2015 after 
>>> attending a KNC workshop.
>>>> Conclusions: By using 244 threads, with the domain split into tiles of 
>>>> size 8 × 4 × 4 points, and OpenMP threads assigned one per tile as they 
>>>> become available, the MIC was able to outperform the single CPU by a 
>>>> factor of 1.5. The same tiling strategy was used on the CPU, as it has 
>>>> been found to give good performance there in the past. Since we have not 
>>>> yet optimised the code for the MIC architecture, we believe that further 
>>>> speed improvements will be possible, and that solving the Einstein 
>>>> equations on the MIC architecture should be feasible.
>>>> 
>>> Eloisa, are you using LoopControl?  There are tiling parameters which can 
>>> also help with performance on these devices.
>> 
>> How does tiling work with LoopControl? Is it documented somewhere? I naively 
>> thought that the point of tiling was to have chunks of data stored 
>> contiguously in memory...
> 
> Ideally yes, but this would need to be done in Carpet, not LoopControl, and I 
> think you would then require ghost zones around each tile.  Since we have 
> huge numbers of ghost zones, I'm not sure it is practical.
> 
> LoopControl has parameters such as tilesize and loopsize, but Erik would know 
> better how to use these. It was a long time ago, and I can't immediately find 
> my parameter files.
> 
>> BTW, at the moment I am using this macro for all of my loop needs:
>> 
>> #define UTILS_LOOP3(NAME,I,SI,EI,J,SJ,EJ,K,SK,EK)  \
>>    _Pragma("omp for collapse(3)")                   \
>>    for(int I = SI; I < EI; ++I)                     \
>>    for(int J = SJ; J < EJ; ++J)                     \
>>    for(int K = SK; K < EK; ++K)
>> 
>> How would I convert it to something equivalent using LoopControl?
>> 
>> Thanks,
>> 
>> David
>> 
>> PS. Seeing that Eloisa was able to compile bbox.cc with intel-17.0.0 and 
>> -no-vec, I made a patch to disable vectorization using pragmas inside 
>> bbox.cc (to avoid having to compile it manually):
>> 
>> https://bitbucket.org/eschnett/carpet/pull-requests/16/carpetlib-fix-compilation-with-intel-1700/diff
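
Coming back to the UTILS_LOOP3 question above: from memory, and untested, the 
LoopControl version of such a loop would look roughly like the sketch below.  
The exact macro signature should be checked against LoopControl's 
loopcontrol.h; here si/sj/sk, ei/ej/ek and ni/nj/nk just stand for the loop 
bounds and the local array extents, mirroring the macro arguments above.

   #include <loopcontrol.h>

   /* Sketch only; names and signature are from memory.  The first argument
      is a label LoopControl uses for its statistics; the last three are the
      local array extents, which it needs to set up its tiling.  The loop
      runs with i as the fastest-varying (innermost) index, matching Cactus
      storage order, and the upper bounds are exclusive. */
   #pragma omp parallel
   LC_LOOP3(utils_loop,
            i, j, k,
            si, sj, sk,      /* loop lower bounds             */
            ei, ej, ek,      /* loop upper bounds (exclusive) */
            ni, nj, nk)      /* local array extents           */
   {
     /* loop body, indexed by i, j, k */
   } LC_ENDLOOP3(utils_loop);

Note also that, unlike the "omp for" in UTILS_LOOP3, LoopControl is (as far as 
I remember) given a plain "omp parallel" region and distributes its tiles over 
the threads itself.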
> 
> -- 
> Ian Hinder
> http://members.aei.mpg.de/ianhin
> 
> 
> 
> 
> -- 
> Erik Schnetter <[email protected]>
> http://www.perimeterinstitute.ca/personal/eschnetter/

-- 
Ian Hinder
http://members.aei.mpg.de/ianhin

_______________________________________________
Users mailing list
[email protected]
http://lists.einsteintoolkit.org/mailman/listinfo/users
