Great... Yes, I didn't estimate the degrees of freedom for this; I was trying
to be too quick. I've uploaded the sd7003 case. I added the residual printing
and I can already see it produced a 2x speedup going from 1 to 2 nodes. I am
doing a full test now.
I have several related questions and comments:
1) What is [backend] rank-allocator = linear? Does this not conflict with MPI
options, e.g. -rank-by from OMPI or the binding policy of MVAPICH? This is
significant for me as I have two GPUs per socket and 64 hardware threads
per socket. I don't want 4 processes to run on the first socket alone.
I print my bindings in MVAPICH and they look OK, but I want to double check
that Python is not doing something else under the hood.
2) What is the rough DoF estimate for the strong scaling limit you observed
with PyFR?
3) At the moment I am setting 4 MPI processes per node as I've got 4 GPUs, but
I assume there's nothing to stop me from using more. Has anyone looked at
the optimal ratio of MPI processes to GPUs?
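
On question 1, a minimal sketch of one way to see from inside Python which
CPUs each rank is allowed to run on. It is not part of PyFR and says nothing
about what rank-allocator does. It assumes Linux (os.sched_getaffinity is not
available elsewhere) and that the launcher exports a node-local rank variable
(OMPI_COMM_WORLD_LOCAL_RANK for Open MPI, MV2_COMM_WORLD_LOCAL_RANK for
MVAPICH2). The helper names and the script name check_affinity.py are
hypothetical.

```python
import os


def local_rank(environ=os.environ):
    """Return the node-local MPI rank from launcher-set variables, or None."""
    # Assumption: Open MPI sets OMPI_COMM_WORLD_LOCAL_RANK and MVAPICH2 sets
    # MV2_COMM_WORLD_LOCAL_RANK; other launchers may use different names.
    for name in ("OMPI_COMM_WORLD_LOCAL_RANK", "MV2_COMM_WORLD_LOCAL_RANK"):
        value = environ.get(name)
        if value is not None:
            return int(value)
    return None


def allowed_cpus():
    """Return the sorted list of CPUs this process may run on (Linux only)."""
    return sorted(os.sched_getaffinity(0))


def format_cpus(cpus):
    """Collapse a sorted CPU list into ranges, e.g. [0, 1, 2, 8] -> '0-2,8'."""
    ranges = []
    start = prev = None
    for cpu in cpus:
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


if __name__ == "__main__":
    # One line per rank: process id, node-local rank and permitted CPUs.
    print(f"pid={os.getpid()} local_rank={local_rank()} "
          f"cpus={format_cpus(allowed_cpus())}")
```

Launched with the same MPI launcher and binding options as the real run (for
example 4 processes per node), each rank prints the CPU set it inherited,
which can be compared against the bindings MVAPICH reports. This only shows
the affinity at the point where it is called; it would not reveal a change
made later by another library.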
Thanks,
Robert
--
Dr Robert Sawko
Research Staff Member, IBM
Daresbury Laboratory
Keckwick Lane, Warrington
WA4 4AD
United Kingdom
--
Email (IBM): [email protected]
Email (STFC): [email protected]
Phone (office): +44 (0) 1925 60 3301
Phone (mobile): +44 778 830 8522
Profile page:
http://researcher.watson.ibm.com/researcher/view.php?person=uk-RSawko