On Thu, Mar 2, 2017 at 10:03 AM, Ian Hinder <ian.hin...@aei.mpg.de> wrote:
> On 2 Mar 2017, at 14:37, Erik Schnetter <schnet...@cct.lsu.edu> wrote:
>
>> I am currently redesigning the tiling infrastructure, also to allow
>> multithreading via Qthreads instead of OpenMP and to allow for aligning
>> arrays with cache line boundaries. The new approach (different from the
>> current LoopControl) is to choose a fixed tile size, either globally or
>> per loop, and then assign individual tiles to threads. This also works
>> well with DG derivatives, where the DG element size dictates a
>> granularity for the tile size, and with the new efficient tiled
>> derivative operators. Most of this is still in flux. I have seen large
>> efficiency improvements in the RHS calculation, but two puzzling items
>> remain:
>>
>> (1) It remains more efficient to use MPI than multi-threading for
>> parallelization, at least on regular CPUs. On KNL my results are still
>> somewhat random.
>
> When using MPI vs multi-threading on the same number of cores, the
> component will be smaller, meaning that more of it is likely to fit in
> the cache. Would that explain this observation?

My wild guess is that an explicit MPI parallelization exhibits more data
locality, leading to better performance.

-erik

>> (2) MoL_Add is quite expensive compared to the RHS evaluation.
>
> That is indeed odd.
>
>> The main thing that changed since our last round of thorough benchmarks
>> is that CPUs became much more powerful while memory bandwidth hasn't.
>> I'm beginning to think that things such as vectorization or
>> parallelization basically don't matter any more if we ensure that we
>> pull data from memory into caches efficiently.
>>
>> I have not yet collected PAPI statistics.
>>
>> -erik
>>
>> On Thu, Mar 2, 2017 at 6:57 AM, Ian Hinder <ian.hin...@aei.mpg.de> wrote:
>>
>>> On 1 Mar 2017, at 22:10, David Radice <drad...@astro.princeton.edu> wrote:
>>>
>>>> Hi Ian, Erik, Eloisa,
>>>>
>>>>> I attach a very brief report of some results I obtained in 2015 after
>>>>> attending a KNC workshop.
>>>>>
>>>>> Conclusions: By using 244 threads, with the domain split into tiles of
>>>>> size 8 × 4 × 4 points, and OpenMP threads assigned one per tile as
>>>>> they become available, the MIC was able to outperform the single CPU
>>>>> by a factor of 1.5. The same tiling strategy was used on the CPU, as
>>>>> it has been found to give good performance there in the past. Since
>>>>> we have not yet optimised the code for the MIC architecture, we
>>>>> believe that further speed improvements will be possible, and that
>>>>> solving the Einstein equations on the MIC architecture should be
>>>>> feasible.
>>>>>
>>>>> Eloisa, are you using LoopControl? There are tiling parameters which
>>>>> can also help with performance on these devices.
>>>>
>>>> how does tiling work with LoopControl? Is it documented somewhere? I
>>>> naively thought that the point of tiling was to have chunks of data
>>>> stored contiguously in memory...
>>>
>>> Ideally yes, but this would need to be done in Carpet not LoopControl,
>>> and I think you would then require ghost zones around each tile. Since
>>> we have huge numbers of ghost zones, I'm not sure it is practical.
>>>
>>> LoopControl has parameters such as tilesize and loopsize, but Erik
>>> would know better how to use these. It was a long time ago, and I
>>> can't immediately find my parameter files.
>>>> BTW, at the moment I am using this macro for all of my loop needs:
>>>>
>>>> #define UTILS_LOOP3(NAME,I,SI,EI,J,SJ,EJ,K,SK,EK) \
>>>>     _Pragma("omp for collapse(3)")                \
>>>>     for(int I = SI; I < EI; ++I)                  \
>>>>     for(int J = SJ; J < EJ; ++J)                  \
>>>>     for(int K = SK; K < EK; ++K)
>>>>
>>>> How would I convert it to something equivalent using LoopControl?
>>>>
>>>> Thanks,
>>>>
>>>> David
>>>>
>>>> PS. Seeing that Eloisa was able to compile bbox.cc with the
>>>> intel-17.0.0 compiler with -no-vec, I made a patch to disable
>>>> vectorization using pragmas inside bbox.cc (to avoid having to
>>>> compile it manually):
>>>>
>>>> https://bitbucket.org/eschnett/carpet/pull-requests/16/carpetlib-fix-compilation-with-intel-1700/diff
>>>
>>> --
>>> Ian Hinder
>>> http://members.aei.mpg.de/ianhin
>>
>> --
>> Erik Schnetter <schnet...@cct.lsu.edu>
>> http://www.perimeterinstitute.ca/personal/eschnetter/
>
> --
> Ian Hinder
> http://members.aei.mpg.de/ianhin

--
Erik Schnetter <schnet...@cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/
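The fixed-tile scheme Erik describes at the top of the thread (choose a tile
size, then hand whole tiles to threads) can be illustrated with a small,
self-contained sketch. This is not the actual Carpet/LoopControl/Qthreads
code: the 8x4x4 tile size, the function names, and the OpenMP dynamic
schedule are assumptions for illustration only, mirroring the
one-thread-per-tile strategy from the KNC report quoted above.

/* Sketch of the fixed-tile threading idea (illustration only, not the
   actual Carpet code). The grid is split into TI x TJ x TK tiles and a
   single OpenMP loop over tiles hands them to threads as they become
   available; each thread then works on a compact, cache-friendly block. */
#include <stddef.h>

#define TI 8   /* tile extent in i (fastest-varying) direction */
#define TJ 4   /* tile extent in j */
#define TK 4   /* tile extent in k */

static void update_tile(double *u, const double *rhs,
                        ptrdiff_t ni, ptrdiff_t nj, ptrdiff_t nk,
                        ptrdiff_t i0, ptrdiff_t j0, ptrdiff_t k0, double dt)
{
  /* Update one tile; bounds are clipped at the edge of the grid. */
  const ptrdiff_t i1 = i0 + TI < ni ? i0 + TI : ni;
  const ptrdiff_t j1 = j0 + TJ < nj ? j0 + TJ : nj;
  const ptrdiff_t k1 = k0 + TK < nk ? k0 + TK : nk;
  for (ptrdiff_t k = k0; k < k1; ++k)
    for (ptrdiff_t j = j0; j < j1; ++j)
      for (ptrdiff_t i = i0; i < i1; ++i) {
        const ptrdiff_t idx = i + ni * (j + nj * k);
        u[idx] += dt * rhs[idx];   /* MoL_Add-like update */
      }
}

void update(double *u, const double *rhs,
            ptrdiff_t ni, ptrdiff_t nj, ptrdiff_t nk, double dt)
{
  const ptrdiff_t nti = (ni + TI - 1) / TI;   /* tiles per direction */
  const ptrdiff_t ntj = (nj + TJ - 1) / TJ;
  const ptrdiff_t ntk = (nk + TK - 1) / TK;

  /* One parallel loop over tiles; schedule(dynamic) assigns each tile to
     whichever thread finishes its previous tile first. */
#pragma omp parallel for collapse(3) schedule(dynamic)
  for (ptrdiff_t tk = 0; tk < ntk; ++tk)
    for (ptrdiff_t tj = 0; tj < ntj; ++tj)
      for (ptrdiff_t ti = 0; ti < nti; ++ti)
        update_tile(u, rhs, ni, nj, nk, ti * TI, tj * TJ, tk * TK, dt);
}

The body here is deliberately a MoL_Add-style update, u += dt*rhs. Per grid
point it performs about 2 flops against roughly 24 bytes of memory traffic
(load u, load rhs, store u), so it is limited by memory bandwidth on any
recent CPU; tiling, vectorization, or more threads cannot change that, which
fits Erik's observation that MoL_Add looks expensive next to the much more
arithmetic-heavy RHS evaluation.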
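For David's question about converting UTILS_LOOP3 to LoopControl: if I
remember correctly, the usual pattern is to place LC_LOOP3 / LC_ENDLOOP3
from LoopControl's loopcontrol.h inside an enclosing "#pragma omp parallel"
region; LoopControl then splits the region into tiles and distributes them
over the existing threads, and the tile sizes can be set or auto-tuned
through its parameters (the tilesize/loopsize parameters Ian mentions)
without touching the loop body. A rough sketch follows; the macro argument
order assumed here should be checked against loopcontrol.h, and the
function name, loop name, and loop body are placeholders.

/* Hypothetical conversion of UTILS_LOOP3 to LoopControl's LC_LOOP3.
   The argument order assumed here is (name, loop variables, lower bounds,
   upper bounds, allocated array extents); please check loopcontrol.h. */
#include <cctk.h>
#include <cctk_Arguments.h>
#include <loopcontrol.h>

void example_loop(CCTK_ARGUMENTS)
{
  DECLARE_CCTK_ARGUMENTS;

  /* Like the "omp for" in UTILS_LOOP3, LC_LOOP3 is used inside an existing
     parallel region; LoopControl assigns tiles to the available threads. */
#pragma omp parallel
  LC_LOOP3(all_points, i, j, k,
           0, 0, 0,                                  /* lower bounds      */
           cctk_lsh[0], cctk_lsh[1], cctk_lsh[2],    /* upper bounds      */
           cctk_ash[0], cctk_ash[1], cctk_ash[2])    /* allocated extents */
  {
    const int idx = CCTK_GFINDEX3D(cctkGH, i, j, k);
    (void)idx;  /* ... loop body goes here, e.g. u_rhs[idx] = ...; */
  }
  LC_ENDLOOP3(all_points);
}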
_______________________________________________
Users mailing list
Users@einsteintoolkit.org
http://lists.einsteintoolkit.org/mailman/listinfo/users