On 10/10/2017 12:47 PM, Mark Adams wrote:



       What are you comparing? Are you using, say, 32 MPI processes and
    2 threads, or 16 MPI processes and 4 threads? How are you
    controlling the number of OpenMP threads: with an OpenMP environment
    variable? What parts of the time in the code are you comparing?
    You should just run with -log_view and compare the times for
    PCApply() and PCSetUp() between, say, 64 MPI processes/1 thread and
    32 MPI processes/2 threads, and send us the output for those two cases.
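
A minimal sketch of the comparison suggested above, in Python. It assumes the usual -log_view event table layout, in which each row starts with the event name, followed by the call count, the count ratio, and the maximum time in seconds; check that against your own output before trusting the numbers. The helper names (event_times, compare, compare_files) are made up for this illustration and are not part of PETSc.

```python
def event_times(log_text, events=("PCSetUp", "PCApply")):
    """Return {event name: max time in seconds} parsed from -log_view text.

    Assumed row layout: name, count, count ratio, max time, ...
    If an event appears in several logging stages, its times are summed.
    """
    times = {}
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] in events:
            times[fields[0]] = times.get(fields[0], 0.0) + float(fields[3])
    return times


def compare(log_a, log_b):
    """Return {event name: time in run B / time in run A}."""
    a, b = event_times(log_a), event_times(log_b)
    return {name: b[name] / a[name] for name in a if name in b}


def compare_files(path_a, path_b):
    """Same as compare(), reading the two -log_view outputs from files."""
    with open(path_a) as fa, open(path_b) as fb:
        return compare(fa.read(), fb.read())
```

Calling compare_files("log_64x1.txt", "log_32x2.txt") (hypothetical file names holding the two -log_view outputs) returns one ratio per event; a ratio below 1 means the second run spent less time in that event.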


These folks don't use many MPI processes. I'm not sure what the optimal configuration is with Chombo-Crunch when using all of Cori.

Baky: how many MPI processes per socket are you aiming for on Cori-KNL?

Right now I am testing it on a single KNL node, going from flat MPI (64+1, i.e. 64 MPI processes + 1 thread each) to 2+32 for comparison. But as you can see from the plot in the previous mail, we have a sweet spot at 16+4, and we scale that accordingly when running on 8k nodes.
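
To make the sweep concrete, here is a small Python sketch that lists the MPI-process/thread splits of one 64-core node, from 64+1 down to 2+32, together with the OMP_NUM_THREADS value each one would use. The function names and the power-of-two stepping are assumptions for illustration; the launcher command itself is left out because it depends on the site setup.

```python
def decompositions(cores=64, min_ranks=2):
    """List (mpi_ranks, threads_per_rank) pairs with ranks * threads == cores.

    Threads double at each step (1, 2, 4, ...) until fewer than
    min_ranks MPI ranks would remain. Assumes cores is a power of two.
    """
    configs = []
    threads = 1
    while cores // threads >= min_ranks:
        configs.append((cores // threads, threads))
        threads *= 2
    return configs


def omp_env(threads):
    """Environment setting that fixes the OpenMP thread count per rank."""
    return {"OMP_NUM_THREADS": str(threads)}


for ranks, threads in decompositions():
    print(f"{ranks}+{threads}: {omp_env(threads)}")
```

The list includes the 16+4 point mentioned above, so the same loop can drive one -log_view run per configuration.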





    >
    > It seems that it made no difference, so perhaps I am doing
    > something wrong or my build is not configured correctly.
    >
    > Do you have any example that makes use of threads when running
    > hybrid and shows an advantage?

       There is no reason to think that using threads on KNL is
    faster than just using MPI processes. Despite what the NERSC/LBL
    web pages may say, just because a website says something doesn't
    make it true.


    >
    > I'd like to test it and make sure that my libs are configured
    > correctly before I start investigating it further.
    >
    >
    > Thanks,
    >
    > Baky
    >
    >


