Hi Carsten
I have also tried the tuned alltoalls and they are really great!! Only for very few message sizes in the case of 4 CPUs on a node one of my alltoalls performed better. Are these tuned collectives ready to be used for production runs?
We are actively testing them on larger systems to get better decision functions. Can you send me the list of message sizes for which they do better and worse? (That way I can alter the decision functions.) But the real question is: do they exhibit the strange performance behaviour that you see with the other alltoall versions? (Note that in my previous email to you I stated that one of the alltoalls is a sendrecv pair-based implementation.)
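For reference, the general idea behind a sendrecv pair-based alltoall can be sketched as below. This is only a pure-Python simulation of the exchange schedule, not the Open MPI code: the function name `pairwise_alltoall` and the shift-by-step choice of partner are assumptions made for illustration, and the real implementation may pair ranks differently.

```python
def pairwise_alltoall(send_bufs):
    """Simulate an all-to-all among p ranks in p pairwise exchange steps.

    send_bufs[r][d] is the block that rank r wants to deliver to rank d.
    Returns recv_bufs, where recv_bufs[r][s] is the block rank r got from rank s.
    """
    p = len(send_bufs)
    recv_bufs = [[None] * p for _ in range(p)]
    for step in range(p):
        # In a real MPI code, every rank would issue one combined
        # send/receive per step: send to (rank + step) % p and receive
        # from (rank - step) % p. Here we model only the send side;
        # the matching receive is the partner's slot being filled.
        for rank in range(p):
            send_to = (rank + step) % p
            recv_bufs[send_to][rank] = send_bufs[rank][send_to]
    return recv_bufs


# Hypothetical 4-rank example: each block is labelled "source->destination".
send = [[f"{r}->{d}" for d in range(4)] for r in range(4)]
result = pairwise_alltoall(send)
```

Each rank talks to exactly one partner per step (step 0 is the local copy), so the number of messages in flight per rank stays bounded, which is the usual reason for choosing this pattern for larger messages.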
Carsten
Thanks,
Graham.
----------------------------------------------------------------------
Dr Graham E. Fagg | Distributed, Parallel and Meta-Computing
Innovative Computing Lab. PVM3.4, HARNESS, FT-MPI, SNIPE & Open MPI
Computer Science Dept | Suite 203, 1122 Volunteer Blvd,
University of Tennessee | Knoxville, Tennessee, USA. TN 37996-3450
Email: f...@cs.utk.edu | Phone:+1(865)974-5790 | Fax:+1(865)974-8296
Broken complex systems are always derived from working simple systems
----------------------------------------------------------------------