Hi, suppose we run a parallel MPI code with 64 processes on a cluster of, say, 16 nodes. Each node has a multicore CPU with, say, 4 cores.
So each of the 64 cores on the cluster is running one process. The program is SPMD, and all processes have the same workload. Now, if we enabled auto-vectorization when compiling the code (for example, with the Intel compilers), will there be any benefit (an efficiency/scalability improvement) from it? Or will we get the same performance as without auto-vectorization in this example? In other words, if there are no free CPU cores in a PC or cluster (all cores are running MPI processes), is auto-vectorization still beneficial? Or is it beneficial only if some CPU cores are free locally? How can we actually get a performance improvement from auto-vectorization? Thank you.
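To make the question concrete, here is a minimal sketch of the kind of loop that is meant. It is only an illustration and is not taken from the actual code: the function name `scale_add` and its parameters are hypothetical. The assumption is that each of the 64 MPI processes would call something like this on its own part of the data, and that this loop is the kind of code the compiler would be allowed to auto-vectorize.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process kernel (illustration only, not from the real code):
 * computes y[i] = a * x[i] + y[i] for the n elements owned by this process.
 * `restrict` promises the compiler that x and y do not overlap, which is one
 * of the things an auto-vectorizer typically needs to know about a loop. */
void scale_add(size_t n, double a, const double *restrict x, double *restrict y)
{
    assert(x != NULL && y != NULL);

    /* Simple counted loop with independent iterations: a typical candidate
     * for auto-vectorization. */
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

In the scenario above, every MPI rank would run this same function on its local arrays, so the question is whether compiling it with and without auto-vectorization makes a difference when all 64 cores are already busy.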