On 10/10/2017 12:47 PM, Mark Adams wrote:
What are you comparing? Are you using, say, 32 MPI processes and
2 threads or 16 MPI processes and 4 threads? How are you
controlling the number of OpenMP threads, with an OpenMP environment
variable? What parts of the time in the code are you comparing?
You should just run with -log_view and compare the times for PCApply()
and PCSetUp() between, say, 64 MPI processes/1 thread and 32 MPI
processes/2 threads, and send us the output for those two cases.
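
A minimal sketch of the comparison suggested above, in Python. It assumes each event line in the -log_view summary starts with the event name, followed by the count, the count ratio, and then the maximum time in seconds; check that against your own output before trusting it. The helper names (event_times, compare) are hypothetical, and the sample text uses made-up numbers only to show the shape of the input, not real timings.

```python
def event_times(log_text, events=("PCSetUp", "PCApply")):
    """Map event name -> max time (seconds) read from -log_view text.

    Assumes whitespace-separated columns: name, count, count ratio,
    max time, ... If an event is logged in several stages, the last
    occurrence wins.
    """
    times = {}
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] in events:
            try:
                times[fields[0]] = float(fields[3])
            except ValueError:
                continue  # not a numeric event line, skip it
    return times


def compare(log_a, log_b, events=("PCSetUp", "PCApply")):
    """Return time(log_b) / time(log_a) for each event found in both logs."""
    a = event_times(log_a, events)
    b = event_times(log_b, events)
    return {
        name: b[name] / a[name]
        for name in events
        if name in a and name in b and a[name] > 0
    }


# Made-up sample lines, for illustration only (not measured data).
sample_mpi_only = "PCSetUp 2 1.0 4.0e+00 1.0\nPCApply 50 1.0 1.0e+01 1.0\n"
sample_hybrid = "PCSetUp 2 1.0 5.0e+00 1.0\nPCApply 50 1.0 1.0e+01 1.0\n"

print(compare(sample_mpi_only, sample_hybrid))
```

A ratio above 1 means the second run spent more time in that event than the first. In practice you would read the two saved -log_view outputs from files instead of the inline samples.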
These folks don't use many MPI processes. I'm not sure what the
optimal configuration is with Chombo-Crunch when using all of Cori.
Baky: how many MPI processes per socket are you aiming for on Cori-KNL?
Right now I am testing it on a single KNL node, going from flat 64+1 to
2+32 for comparison.
But as you can see from the plot in the previous mail, we have a sweet
spot at the 16+4 point, and we then scale that accordingly when running
with 8k nodes.
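
For reference, the sweep described above can be sketched in Python as follows. This reads "64+1" as 64 MPI ranks with 1 OpenMP thread each, which is an assumption, and it assumes 64 cores per node are used. The executable name ./app is a placeholder, and mpiexec stands in for whatever launcher the system actually uses; the function names are hypothetical.

```python
def hybrid_configs(cores=64, min_ranks=2):
    """Return (mpi_ranks, omp_threads) pairs with ranks * threads == cores.

    Starts from flat MPI (one thread per rank) and halves the rank count
    until min_ranks is reached.
    """
    configs = []
    ranks = cores
    while ranks >= min_ranks:
        if cores % ranks == 0:
            configs.append((ranks, cores // ranks))
        ranks //= 2
    return configs


def launch_line(ranks, threads, exe="./app"):
    """Build a launch command; exe is a placeholder executable name."""
    return f"OMP_NUM_THREADS={threads} mpiexec -n {ranks} {exe} -log_view"


for ranks, threads in hybrid_configs():
    print(launch_line(ranks, threads))
```

Running each printed line and saving its -log_view output gives one data point per configuration on a single node.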
>
> It seems that it made no difference, so perhaps I am doing
something wrong or my build is not configured right.
>
> Do you have any example that makes use of threads when running
hybrid and shows an advantage?
There is no reason to think that using threads on KNL is
faster than just using MPI processes, despite what the NERSC/LBL
web pages may say. Just because a website says something doesn't
make it true.
>
> I'd like to test it and make sure that my libs are configured
correctly before I start to investigate it further.
>
>
> Thanks,
>
> Baky
>
>