Dear all, in order to calculate speedup (Sp = T1/Tp) I need an accurate measurement of T1, the time to solve on 1 processor. I will be using the parallel algorithm for that, but there seems to be a hiccup.
At the cluster I am currently working on, each node consists of 12 PEs with shared memory. When I reserve just 1 PE for my job, the other 11 processors are given to other users, which puts a dynamic load on the memory system and leads to inaccurate timings: the solve times I get range between 1 and 5 minutes. That is not very scientific either.

The second idea was to reserve all 12 PEs on the node and let only 1 PE run the job. However, that way the single CPU gets all the memory bandwidth and never has to wait for it, giving very fast results; when I calculate speedup from this T1, the algorithm appears not to scale very well.

Another idea would be to spawn 12 identical jobs on the 12 PEs and take the average runtime (see the rough sketch below). Unfortunately, there is only one PETSC_COMM_WORLD, so I think this is impossible to do from within one program (one MPI_COMM_WORLD).

Do you fellow PETSc users have any ideas on the subject? It would be much appreciated.

regards,
Leo van Kampenhout
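P.S. To make the third idea concrete, this is roughly what I would like to do: an untested sketch, assuming it were possible to hand each rank its own size-1 communicator as its PETSc world before PetscInitialize(), run 12 independent 1-PE solves, and average the times.

#include <stdio.h>
#include <mpi.h>
#include <petscsys.h>

int main(int argc, char **argv)
{
  MPI_Comm selfcomm;
  int      rank, size;
  double   t0, t1, mytime, sum;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* give every rank its own communicator containing only itself */
  MPI_Comm_split(MPI_COMM_WORLD, rank, 0, &selfcomm);

  /* point PETSc at that communicator before PetscInitialize(),
     so each rank runs an independent 1-PE "job" */
  PETSC_COMM_WORLD = selfcomm;
  PetscInitialize(&argc, &argv, NULL, NULL);

  t0 = MPI_Wtime();
  /* ... assemble and solve the 1-PE problem on PETSC_COMM_WORLD ... */
  t1 = MPI_Wtime();

  /* average the 12 independent solve times on rank 0 of the real world */
  mytime = t1 - t0;
  MPI_Reduce(&mytime, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  if (rank == 0) printf("average T1 over %d copies: %g s\n", size, sum / size);

  PetscFinalize();
  MPI_Comm_free(&selfcomm);
  MPI_Finalize();
  return 0;
}

I do not know whether reassigning PETSC_COMM_WORLD like this is actually supported, which is exactly my question.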
