[petsc-users] On what condition is useful MPI-based solution?
On 14/01/2010, at 01:04, Takuya Sekikawa wrote:

> Dear SLEPc/PETSc team,
>
> On Wed, 13 Jan 2010 09:50:57 +0100 Jose E. Roman jroman at dsic.upv.es wrote:
>
>>> [Q2] Generally, is MPI only useful for very large matrices? I now have to solve an eigenvalue problem for a 1M x 1M matrix. Should I use an MPI-based system?
>>
>> For a matrix of dimension 1 million I would suggest running in parallel on an MPI cluster. However, a single computer might be enough if the matrix is very sparse, you need very few eigenvalues, and/or the system has enough memory (but in that case, be prepared for very long response times, depending on how your problem converges).
>>
>> Jose
>
> Do you have any example of how many PCs are needed to solve a problem of this size, and how much memory each PC should have? I would like to know how many resources I need (PCs, memory) and how long the solve takes. (A rough estimate is enough.)
>
> The problem is a 1M x 1M symmetric sparse matrix, and I need only a few eigenpairs (at least 1), so I currently plan to use the Lanczos or Krylov-Schur method with EPS_NEV=1.

For nev=1, the workspace used by the solver is moderate: maybe 20 vectors of length 1M (i.e. 160 Mbytes). If the matrix is really sparse, say 30 nonzero elements per row, then the matrix is not a problem either (roughly 364 Mbytes). So one PC may be enough. If the matrix is much denser, or you have convergence problems, or you need to do shift-and-invert, then things get worse.

The execution time basically depends on convergence. For instance, ex1 with n=1M will have very bad convergence, but your problem may not. Run with -eps_monitor or -eps_monitor_draw to see how the solver is progressing.

Jose

> Takuya
>
> ---
> Takuya Sekikawa
> Mathematical Systems, Inc
> sekikawa at msi.co.jp
> ---
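The memory figures in the reply above can be reproduced with a short back-of-the-envelope calculation. This is only a sketch under assumptions the thread does not state: 8-byte double-precision scalars, 4-byte integer indices, and a compressed sparse row layout (one value and one column index per nonzero, plus one row pointer per row). The function names are made up for illustration and are not part of SLEPc or PETSc.

```python
def vector_workspace_bytes(n, n_vectors=20, scalar_bytes=8):
    """Memory for n_vectors dense work vectors of length n (assumed 8-byte scalars)."""
    return n * n_vectors * scalar_bytes


def csr_matrix_bytes(n, nnz_per_row, scalar_bytes=8, index_bytes=4):
    """Memory for an n x n sparse matrix in an assumed CSR-like layout."""
    nnz = n * nnz_per_row
    values_and_columns = nnz * (scalar_bytes + index_bytes)
    row_pointers = (n + 1) * index_bytes
    return values_and_columns + row_pointers


n = 1_000_000
workspace = vector_workspace_bytes(n)          # 20 vectors of length 1M
matrix = csr_matrix_bytes(n, nnz_per_row=30)   # 30 nonzeros per row

print(f"solver workspace: {workspace / 1e6:.0f} Mbytes")
print(f"sparse matrix:    {matrix / 1e6:.0f} Mbytes")
```

Under these assumptions the numbers come out at about 160 and 364 Mbytes, matching the estimates quoted in the reply. Changing `nnz_per_row` shows how quickly a denser matrix dominates the total.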
[petsc-users] On what condition is useful MPI-based solution?
Dear SLEPc/PETSc team,

I tried to run SLEPc's sample program ex1 in an MPI-based multi-PC environment (1 PC vs 2 PCs).

[running ex1 on only 1 PC]
--
$ time -p mpiexec -n 1 ./ex1 -n 4000 -eps_max_it 1
...
real 76.2
--
So ex1 took 76.2 seconds.

Next, I ran the same sample in a 2-PC environment.

[running ex1 on 2 PCs]
--
$ time -p mpiexec -n 2 ./ex1 -n 4000 -eps_max_it 1
...
real 265.54
--
I got 265.54 seconds (slower than a single PC).

[Q1] Can the ex1 sample be sped up with MPI? If so, generally under what conditions?

[Q2] Generally, is MPI only useful for very large matrices? I now have to solve an eigenvalue problem for a 1M x 1M matrix. Should I use an MPI-based system?

Thanks,
Takuya

---
Takuya Sekikawa
Mathematical Systems, Inc
sekikawa at msi.co.jp
---
[petsc-users] On what condition is useful MPI-based solution?
Hi Jose,

Thank you for the very quick reply.

On Wed, 13 Jan 2010 09:50:57 +0100 Jose E. Roman jroman at dsic.upv.es wrote:

> On 13/01/2010, Takuya Sekikawa wrote:
>
>> Dear SLEPc/PETSc team,
>>
>> I tried to run SLEPc's sample program ex1 in an MPI-based multi-PC environment (1 PC vs 2 PCs).
>>
>> [running ex1 on only 1 PC]
>> --
>> $ time -p mpiexec -n 1 ./ex1 -n 4000 -eps_max_it 1
>> ...
>> real 76.2
>> --
>> So ex1 took 76.2 seconds.
>>
>> Next, I ran the same sample in a 2-PC environment.
>>
>> [running ex1 on 2 PCs]
>> --
>> $ time -p mpiexec -n 2 ./ex1 -n 4000 -eps_max_it 1
>> ...
>> real 265.54
>> --
>> I got 265.54 seconds (slower than a single PC).
>>
>> [Q1] Can the ex1 sample be sped up with MPI? If so, generally under what conditions?
>
> Yes. The same example on my desktop computer (Intel Core 2 Duo):
>
> With -n 1
> --
> real 33.99
>
> With -n 2
> --
> real 21.90
>
> If you simply have two PCs connected via a slow network, then you cannot expect good speedup. Try a cluster with a fast network.
>
> On the other hand, a better way to measure the parallel execution time is to edit the source file and put PetscGetTime around the Solve call.

Thank you for benchmarking in your environment. That result is quite useful to me. My current test network is certainly not fast, so that is probably the problem. Thank you.

>> [Q2] Generally, is MPI only useful for very large matrices? I now have to solve an eigenvalue problem for a 1M x 1M matrix. Should I use an MPI-based system?
>
> For a matrix of dimension 1 million I would suggest running in parallel on an MPI cluster. However, a single computer might be enough if the matrix is very sparse, you need very few eigenvalues, and/or the system has enough memory (but in that case, be prepared for very long response times, depending on how your problem converges).

This advice is also really meaningful to me. Thank you so much!

Takuya

> Jose
>
>> Thanks,
>> Takuya

---
Takuya Sekikawa
Mathematical Systems, Inc
sekikawa at msi.co.jp
---
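To put the two sets of timings quoted above side by side, one can compute the speedup (serial time divided by parallel time) and the parallel efficiency (speedup divided by the number of processes). The sketch below uses only the wall-clock numbers reported in the thread; the helper names and labels are illustrative.

```python
def speedup(t_serial, t_parallel):
    """Ratio of one-process time to multi-process time (>1 means faster)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Speedup per process; 1.0 would be ideal scaling."""
    return speedup(t_serial, t_parallel) / n_procs


# (time with -n 1, time with -n 2), in seconds, as reported in the thread
runs = {
    "two PCs, slow network": (76.2, 265.54),
    "Core 2 Duo desktop": (33.99, 21.90),
}

for label, (t1, t2) in runs.items():
    s = speedup(t1, t2)
    e = efficiency(t1, t2, n_procs=2)
    print(f"{label}: speedup {s:.2f}, efficiency {e:.2f}")
```

The desktop run gives a speedup of roughly 1.55 on two processes, while the two-PC run gives a value below 1, i.e. a slowdown, which is consistent with the explanation that the slow network dominates. Note that `time -p` also includes startup and matrix assembly, which is why timing only the solve was suggested.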
[petsc-users] On what condition is useful MPI-based solution?
I have a few more questions, just in case.

[Q1] In general, as with ex1, do we need no changes to the source code to run a program on an MPI-based system? In other words, to run a SLEPc-based program on an MPI-based system, is all I need to do change the ./configure options (--with-mpi=1, etc.) and recompile, without changing the source?

[Q2] Can I use the Intel compiler (icc/icpc) for an MPI-based SLEPc application? (The Intel compiler is really fast, so I want to use it on the MPI-based system too.)

Takuya

On Wed, 13 Jan 2010 09:50:57 +0100 Jose E. Roman jroman at dsic.upv.es wrote:

> On 13/01/2010, Takuya Sekikawa wrote:
>
>> Dear SLEPc/PETSc team,
>>
>> I tried to run SLEPc's sample program ex1 in an MPI-based multi-PC environment (1 PC vs 2 PCs).
>>
>> [running ex1 on only 1 PC]
>> --
>> $ time -p mpiexec -n 1 ./ex1 -n 4000 -eps_max_it 1
>> ...
>> real 76.2
>> --
>> So ex1 took 76.2 seconds.
>>
>> Next, I ran the same sample in a 2-PC environment.
>>
>> [running ex1 on 2 PCs]
>> --
>> $ time -p mpiexec -n 2 ./ex1 -n 4000 -eps_max_it 1
>> ...
>> real 265.54
>> --
>> I got 265.54 seconds (slower than a single PC).
>>
>> [Q1] Can the ex1 sample be sped up with MPI? If so, generally under what conditions?
>
> Yes. The same example on my desktop computer (Intel Core 2 Duo):
>
> With -n 1
> --
> real 33.99
>
> With -n 2
> --
> real 21.90
>
> If you simply have two PCs connected via a slow network, then you cannot expect good speedup. Try a cluster with a fast network.
>
> On the other hand, a better way to measure the parallel execution time is to edit the source file and put PetscGetTime around the Solve call.
>
>> [Q2] Generally, is MPI only useful for very large matrices? I now have to solve an eigenvalue problem for a 1M x 1M matrix. Should I use an MPI-based system?
>
> For a matrix of dimension 1 million I would suggest running in parallel on an MPI cluster. However, a single computer might be enough if the matrix is very sparse, you need very few eigenvalues, and/or the system has enough memory (but in that case, be prepared for very long response times, depending on how your problem converges).
>
> Jose
>
>> Thanks,
>> Takuya

---
Takuya Sekikawa
Mathematical Systems, Inc
sekikawa at msi.co.jp
---
[petsc-users] On what condition is useful MPI-based solution?
On 13/01/2010, Takuya Sekikawa wrote:

> I have a few more questions, just in case.
>
> [Q1] In general, as with ex1, do we need no changes to the source code to run a program on an MPI-based system? In other words, to run a SLEPc-based program on an MPI-based system, is all I need to do change the ./configure options (--with-mpi=1, etc.) and recompile, without changing the source?

No change in the program. Just enable MPI in the installation.

> [Q2] Can I use the Intel compiler (icc/icpc) for an MPI-based SLEPc application? (The Intel compiler is really fast, so I want to use it on the MPI-based system too.)
>
> Takuya

Yes, no problem.
[petsc-users] On what condition is useful MPI-based solution?
On Wed, 13 Jan 2010 09:50:57 +0100, Jose E. Roman jroman at dsic.upv.es wrote:

> On the other hand, a better way to measure the parallel execution time is to edit the source file and put PetscGetTime around the Solve call.

Or use -log_summary

Jed
[petsc-users] On what condition is useful MPI-based solution?
On Wed, 13 Jan 2010, Jose E. Roman wrote:

> On 13/01/2010, Takuya Sekikawa wrote:
>
>> I have a few more questions, just in case.
>>
>> [Q1] In general, as with ex1, do we need no changes to the source code to run a program on an MPI-based system? In other words, to run a SLEPc-based program on an MPI-based system, is all I need to do change the ./configure options (--with-mpi=1, etc.) and recompile, without changing the source?
>
> No change in the program. Just enable MPI in the installation.

Note that the examples are written to be parallel - and run with or without MPI. You cannot take any random sequential code, compile it with MPI - and expect it to run in parallel.

Even though SLEPc/PETSc hide most of the MPI-related stuff from the user - there is generally some user code that should be MPI aware. [In ex1 - the matrix assembly is written to be MPI aware - and the data is distributed and assembled in parallel.]

>> [Q2] Can I use the Intel compiler (icc/icpc) for an MPI-based SLEPc application? (The Intel compiler is really fast, so I want to use it on the MPI-based system too.)

Yes. Note: faster compilers speed up the sequential part of the code [hence the overall runtime for the parallel run as well]. But your bottleneck is MPI/communication cost - which won't change. So your parallel scalability will be skewed further.

To improve parallel scalability - you'll have to get the fastest network you can - between the nodes. And then install an MPI that can perform well on the given network setup.

[And as Jed mentioned, -log_summary is a better tool to compare performance.]

Satish
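The point about faster compilers and scalability can be illustrated with a deliberately crude model: parallel time is the compute time divided across processes plus a fixed communication cost that the compiler cannot reduce. The model and every number in it are hypothetical and chosen only for illustration; they are not measurements from this thread.

```python
def parallel_time(t_compute, t_comm, n_procs):
    """Toy model: compute work splits evenly; communication cost is fixed
    and only paid when more than one process is used."""
    comm = t_comm if n_procs > 1 else 0.0
    return t_compute / n_procs + comm


def model_speedup(t_compute, t_comm, n_procs):
    """Speedup of the toy model relative to a single process."""
    return parallel_time(t_compute, t_comm, 1) / parallel_time(t_compute, t_comm, n_procs)


t_comm = 10.0  # hypothetical communication cost in seconds, unchanged by the compiler

# Hypothetical compute times: a baseline build and a build that is twice as fast
for label, t_compute in (("baseline compiler", 60.0), ("faster compiler", 30.0)):
    t2 = parallel_time(t_compute, t_comm, 2)
    s2 = model_speedup(t_compute, t_comm, 2)
    print(f"{label}: 2-process time {t2:.1f} s, speedup {s2:.2f}")
```

In this toy model the faster build lowers the 2-process runtime, yet its speedup over one process is smaller, because the fixed communication cost is now a larger share of the total. Reducing `t_comm`, which stands in for a faster network, is what improves the speedup itself.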