Hi Nick! Thanks.
I can use both the serial and parallel versions on the supercomputer. After this post I figured out that using 4 nodes with ParallelOverK makes the run faster than the serial version, but using more than 4 nodes makes it slower again. I am still unsure how to choose the optimal number of nodes. Any suggestions would be appreciated. AP
