I am using an explicit time stepper. The matrices are assembled only once, and I then use the linear operator, for example, to compute the least stable eigenmode(s). I have attached the -log_summary output for the same number of time steps using the linear and the nonlinear operators.
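For reference, a minimal sketch of how the preassembled linearized matrices can be wrapped in a MATSHELL so that one apply routine serves both the explicit stepper and an eigenvalue solver (the context structure, matrix names, and the omitted finite-difference step are placeholders, not my actual code):

#include <petscmat.h>

/* Sketch only: the preassembled linearized matrices are gathered in a
   context and applied through a MATSHELL. */
typedef struct {
  Mat Jflux;   /* linearization of the flux computation (step 2), MPIAIJ    */
  Mat Jbc;     /* linearization of the BCs on the flux derivatives (step 4) */
  Vec work;    /* scratch vector holding the linearized flux                */
} LinOpCtx;

/* y = L q : linearized right-hand side */
static PetscErrorCode LinOpMult(Mat L, Vec q, Vec y)
{
  LinOpCtx      *ctx;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatShellGetContext(L, (void**)&ctx);CHKERRQ(ierr);
  ierr = MatMult(ctx->Jflux, q, ctx->work);CHKERRQ(ierr);   /* linearized step 2 */
  /* the linear stages (BCs on q, finite-difference derivatives) go here */
  ierr = MatMult(ctx->Jbc, ctx->work, y);CHKERRQ(ierr);     /* linearized step 4 */
  PetscFunctionReturn(0);
}

/* Wrap the context in a shell matrix; nloc is the local state size (placeholder). */
PetscErrorCode CreateLinOp(MPI_Comm comm, PetscInt nloc, LinOpCtx *ctx, Mat *L)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatCreateShell(comm, nloc, nloc, PETSC_DETERMINE, PETSC_DETERMINE, ctx, L);CHKERRQ(ierr);
  ierr = MatShellSetOperation(*L, MATOP_MULT, (void(*)(void))LinOpMult);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The shell matrix can then be applied with MatMult() inside the explicit stepper, or handed to an eigensolver (e.g. SLEPc's EPS, assuming SLEPc is available) to look for the least stable eigenmode.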
On Sat, Mar 10, 2012 at 5:10 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:

> On Sat, Mar 10, 2012 at 09:59, Xavier Garnaud <xavier.garnaud at ladhyx.polytechnique.fr> wrote:
>
>> I am solving the compressible Navier--Stokes equations in compressible
>> form, so in order to apply the operator, I
>>
>> 1. apply BCs on the flow field
>> 2. compute the flux
>> 3. take the derivative using finite differences
>> 4. apply BCs on the derivatives of the flux
>>
>> In order to apply the linearized operator, I wish to linearize steps 2
>> and 4 (the others are linear). For this I assemble sparse matrices (MPIAIJ).
>> The matrices should be block diagonal -- with square or rectangular blocks
>> -- so I preallocate the whole diagonal blocks (but I only use MatSetValues
>> for nonzero entries). When I do this, the linearized code runs
>> approximately 50% slower (the computation of derivatives takes more than
>> 70% of the time in the nonlinear code), so steps 2 and 4 are much slower
>> for the linear operator although the number of operations is very similar.
>> Is this due to the poor preallocation? Is there a way to improve the
>> performance?
>>
> It's not clear to me from this description if you are even using an
> implicit method. Is the linearization for use in a Newton iteration? How
> often do you have to reassemble? Please always send -log_summary output
> with performance questions.
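Regarding the preallocation question above: to rule preallocation in or out, a sketch along the following lines could be used, with exact per-row counts instead of preallocating the full diagonal block, plus a check that no mallocs occurred during MatSetValues(). The helper name, the local sizes mloc/nloc, and the d_nnz/o_nnz arrays are hypothetical placeholders:

#include <petscmat.h>

/* Sketch only, not the poster's code: assemble one block-diagonal MPIAIJ
   matrix with exact per-row preallocation and report whether any mallocs
   happened during MatSetValues(). */
PetscErrorCode AssembleAndCheck(MPI_Comm comm, PetscInt mloc, PetscInt nloc,
                                const PetscInt d_nnz[], const PetscInt o_nnz[], Mat *Aout)
{
  Mat            A;
  MatInfo        info;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatCreate(comm, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, mloc, nloc, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
  ierr = MatSetType(A, MATMPIAIJ);CHKERRQ(ierr);
  /* exact per-row counts rather than the full dense diagonal block */
  ierr = MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz);CHKERRQ(ierr);
  /* fail loudly if an insertion falls outside the preallocated pattern */
  ierr = MatSetOption(A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE);CHKERRQ(ierr);

  /* ... MatSetValues() loop for the nonzero entries goes here ... */

  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* report how much of the preallocation was actually used */
  ierr = MatGetInfo(A, MAT_LOCAL, &info);CHKERRQ(ierr);
  ierr = PetscPrintf(comm, "mallocs %g  nz_allocated %g  nz_used %g\n",
                     (double)info.mallocs, (double)info.nz_allocated, (double)info.nz_used);CHKERRQ(ierr);
  *Aout = A;
  PetscFunctionReturn(0);
}

Running with -info also prints the number of mallocs during MatSetValues() at assembly time, which is the usual way to confirm that the preallocation was sufficient.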
-------------- next part --------------
nx = 256   ny = 128
JET MESH
  ix = 74   iy_in = 44   iy_out = 65
rank: 0, i0-i1xj0-j1 : 0 - 74 x 44 - 63
rank: 2, i0-i1xj0-j1 : 0 - 74 x 64 - 65
TIME STEPPING
  Tf = 1   CFL = 2
State loaded from q0.h5
32768 40960
32768 40960
32768 40960
32768 40960
Euler - x
Euler - y
LODI - x
LODIq - x
LODI - y
LODIq - y
Stress - x
Stress - x
dFv/dq - x
dFv/dtau - x
dFv/dq - y
dFv/dtau - y
|MatEulerx      | = 21.7871
|MatEulery      | = 10.4999
|MatLODIx       | = 13.3652
|MatLODIy       | = 15.0075
|MatLODIqx      | = 4.58531e+06
|MatLODIqy      | = 1.00002
|MatViscousx_q  | = 0.00122487
|MatViscousy_q  | = 0.00125045
|MatViscousx_tau| = 1.99893
|MatViscousy_tau| = 1.99893
dt = 0.00571429
|q(1.000000)|/|q(0)| = 1.84842
Elapsed time (CPU) = 27.2226 s

************************************************************************************************************************
***        WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./ns2d on a real_opt named muzo.polytechnique.fr with 4 processors, by garnaud Sat Mar 10 18:02:03 2012
Using Petsc Release Version 3.2.0, Patch 5, Sat Oct 29 13:45:54 CDT 2011

                         Max       Max/Min        Avg      Total
Time (sec):           2.762e+01      1.00000   2.762e+01
Objects:              1.900e+02      1.00000   1.900e+02
Flops:                1.068e+10      1.01222   1.065e+10  4.258e+10
Flops/sec:            3.869e+08      1.01222   3.855e+08  1.542e+09
MPI Messages:         3.260e+04      1.00000   3.260e+04  1.304e+05
MPI Message Lengths:  2.277e+08      1.00000   6.984e+03  9.108e+08
MPI Reductions:       4.280e+02      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 2.7615e+01 100.0%  4.2584e+10 100.0%  1.304e+05 100.0%  6.984e+03      100.0%  4.270e+02  99.8%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

DERIVATIVES        10508 1.0 1.4299e+01 1.0 8.21e+09 1.0 0.0e+00 0.0e+00 0.0e+00 51 77   0   0  0  51 77   0   0  0  2295
FILTERS              350 1.0 1.9905e-01 1.0 1.26e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  1   0   0  0   1  1   0   0  0  2535
VECMANIP           21716 1.0 2.8288e+00 1.2 0.00e+00 0.0 1.3e+05 7.0e+03 6.0e+00  9  0 100 100  1   9  0 100 100  1     0
VecView                6 1.0 3.7352e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0   0   0  0   1  0   0   0  0     0
VecNorm                2 1.0 5.9009e-04 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0   0   0  0   0  0   0   0  0     0
VecScale            1800 1.0 7.1079e-02 1.2 5.90e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  1   0   0  0   0  1   0   0  0  3323
VecCopy              414 1.0 3.7731e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
VecAXPY             7051 1.0 7.2879e-01 1.1 4.97e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3  5   0   0  0   3  5   0   0  0  2726
VecAXPBYCZ           350 1.0 6.3609e-02 1.0 4.01e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0  2524
VecLoad                1 1.0 1.8210e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
VecScatterBegin    31858 1.0 1.5961e+00 1.1 0.00e+00 0.0 1.3e+05 7.0e+03 0.0e+00  6  0 100 100  0   6  0 100 100  0     0
VecScatterEnd      31858 1.0 8.4421e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0   0   0  0   2  0   0   0  0     0
IMPOSEBC_VISC       5251 1.0 7.2675e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
IMPOSEBC_EULER      1945 1.0 8.8332e-03 4.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
FLUXES_VISC           22 1.0 4.4665e-03 1.1 4.33e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0  3874
FLUXES_EULER          14 1.0 2.4092e-03 1.3 2.75e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0  4570
STRESSES              12 1.0 1.9977e-03 1.1 1.67e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0  3346
MatMult            12250 1.0 6.6647e+00 1.0 1.34e+09 1.1 0.0e+00 0.0e+00 0.0e+00 24 12   0   0  0  24 12   0   0  0   784
MatMultAdd          8750 1.0 2.5075e+00 1.0 4.13e+08 1.1 0.0e+00 0.0e+00 0.0e+00  9  4   0   0  0   9  4   0   0  0   642
MatAssemblyBegin      12 1.0 7.1454e-04 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.4e+01  0  0   0   0  6   0  0   0   0  6     0
MatAssemblyEnd        12 1.0 2.1005e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.1e+02  0  0   0   0 25   0  0   0   0 25     0
TSStep               175 1.0 2.6759e+01 1.0 1.05e+10 1.0 1.3e+05 7.0e+03 6.0e+00 97 99  97  97  1  97 99  97  97  1  1570
TSFunctionEval      1750 1.0 2.6487e+01 1.0 1.04e+10 1.0 1.3e+05 7.0e+03 0.0e+00 96 97  97  97  0  96 97  97  97  0  1562
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage

    Distributed Mesh     5              5        905272     0
              Vector    59             59       6793272     0
      Vector Scatter    22             22         23320     0
           Index Set    49             49        220252     0
   IS L to G Mapping    10             10        703172     0
              Viewer     4              3          2064     0
              Matrix    39             36      33641936     0
                  TS     1              1          1088     0
                SNES     1              1          1200     0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 3.57628e-06
Average time for zero size MPI_Send(): 5.24521e-06
#PETSc Option Table entries:
-Tf 1.
-cfl 2
-lints
-log_summary
-nx 256
-ny 128
-save 1.
#End of PETSc Option Table entries
Compiled with FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Thu Feb 16 17:51:17 2012
Configure options: --with-mpi=yes --with-shared-libraries --with-scalar-type=real --with-fortran-interfaces=1 --FFLAGS=-I/usr/include --with-fortran --with-fortran-kernels=1 --with-clanguage=c COPTFLAGS=-O3 FOPTFLAGS=-O3 --download-mumps=MUMPS_4.10.0.tar.gz --download-scalapack=SCALAPACK-1.7.tar.gz --download-blacs=blacs-dev.tar.gz --download-parmetis=ParMetis-3.2.0-p1.tar.gz --download-superlu=superlu_4.2.tar.gz --download-superlu_dist=superlu_dist_2.5.tar.gz --download-spooles=spooles-2.2-dec-2008.tar.gz --download-umfpack=UMFPACK-5.5.1.tar.gz --with-debugging=0 --with-mpi-dir=/home/garnaud/local/openmpi-1.4.4 --download-hdf5 --download-f-blas-lapack
-----------------------------------------
Libraries compiled on Thu Feb 16 17:51:17 2012 on muzo.polytechnique.fr
Machine characteristics: Linux-2.6.39.4-4.2-desktop-x86_64-with-mandrake-2011.0-Official
Using PETSc directory: /home/garnaud/local/petsc/petsc-3.2-p5
Using PETSc arch: real_opt
-----------------------------------------
Using C compiler: /home/garnaud/local/openmpi-1.4.4/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/garnaud/local/openmpi-1.4.4/bin/mpif90 -I/usr/include -fPIC -O3 ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/include -I/home/garnaud/local/petsc/petsc-3.2-p5/include -I/home/garnaud/local/petsc/petsc-3.2-p5/include -I/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/include -I/home/garnaud/local/openmpi-1.4.4/include
-----------------------------------------
Using C linker: /home/garnaud/local/openmpi-1.4.4/bin/mpicc
Using Fortran linker: /home/garnaud/local/openmpi-1.4.4/bin/mpif90
Using libraries: -Wl,-rpath,/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/lib -L/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/lib -lpetsc -lX11 -lpthread -Wl,-rpath,/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/lib -L/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/lib -lsuperlu_dist_2.5 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lspooles -lscalapack -lblacs -lsuperlu_4.2 -lumfpack -lamd -lflapack -lfblas -lhdf5_fortran -lhdf5 -lz -Wl,-rpath,/home/garnaud/local/openmpi-1.4.4/lib -L/home/garnaud/local/openmpi-1.4.4/lib -Wl,-rpath,/usr/lib64/gcc/x86_64-mandriva-linux-gnu/4.6.1 -L/usr/lib64/gcc/x86_64-mandriva-linux-gnu/4.6.1 -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm -lgfortran -lm -lm -lquadmath -lm -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl
-----------------------------------------
-------------- next part --------------
nx = 256   ny = 128
JET MESH
  ix = 74   iy_in = 44   iy_out = 65
rank: 0, i0-i1xj0-j1 : 0 - 74 x 44 - 63
rank: 2, i0-i1xj0-j1 : 0 - 74 x 64 - 65
TIME STEPPING
  Tf = 1   CFL = 2
dt = 0.00571429
|q(1.000000)|/|q(0)| = 1.0005
Elapsed time (CPU) = 19.2814 s
Final state saved in q0.h5

************************************************************************************************************************
***        WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./ns2d on a real_opt named muzo.polytechnique.fr with 4 processors, by garnaud Sat Mar 10 18:03:09 2012
Using Petsc Release Version 3.2.0, Patch 5, Sat Oct 29 13:45:54 CDT 2011

                         Max       Max/Min        Avg      Total
Time (sec):           1.955e+01      1.00000   1.955e+01
Objects:              8.400e+01      1.00000   8.400e+01
Flops:                1.090e+10      1.00000   1.090e+10  4.358e+10
Flops/sec:            5.574e+08      1.00000   5.574e+08  2.229e+09
MPI Messages:         3.259e+04      1.00000   3.259e+04  1.303e+05
MPI Message Lengths:  2.276e+08      1.00000   6.984e+03  9.103e+08
MPI Reductions:       1.180e+02      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 1.9549e+01 100.0%  4.3584e+10 100.0%  1.303e+05 100.0%  6.984e+03      100.0%  1.170e+02  99.2%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

DERIVATIVES        10502 1.0 1.4136e+01 1.0 8.20e+09 1.0 0.0e+00 0.0e+00 0.0e+00 72 75   0   0  0  72 75   0   0  0  2321
FILTERS              350 1.0 1.9755e-01 1.0 1.26e+08 1.0 0.0e+00 0.0e+00 0.0e+00  1  1   0   0  0   1  1   0   0  0  2554
VECMANIP           21704 1.0 2.4231e+00 1.2 0.00e+00 0.0 1.3e+05 7.0e+03 6.0e+00 11  0 100 100  5  11  0 100 100  5     0
VecView                7 1.0 4.2857e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0   0   0  0   2  0   0   0  0     0
VecNorm                2 1.0 6.0606e-04 3.7 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0   0   0  2   0  0   0   0  2     0
VecScale            1750 1.0 6.4685e-02 1.1 5.73e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  1   0   0  0   0  1   0   0  0  3546
VecCopy              352 1.0 3.0717e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
VecAXPY             8750 1.0 8.0684e-01 1.1 6.08e+08 1.0 0.0e+00 0.0e+00 0.0e+00  4  6   0   0  0   4  6   0   0  0  3013
VecAXPBYCZ           350 1.0 6.5956e-02 1.0 4.01e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0  2434
VecScatterBegin    10852 1.0 1.4070e+00 1.1 0.00e+00 0.0 1.3e+05 7.0e+03 0.0e+00  7  0 100 100  0   7  0 100 100  0     0
VecScatterEnd      10852 1.0 7.2064e-01 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0   0   0  0   3  0   0   0  0     0
IMPOSEBC_VISC       5250 1.0 7.5293e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
IMPOSEBC_EULER      5425 1.0 5.2320e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0   0   0  0   0  0   0   0  0     0
FLUXES_VISC         3500 1.0 6.5287e-01 1.1 6.88e+08 1.0 0.0e+00 0.0e+00 0.0e+00  3  6   0   0  0   3  6   0   0  0  4216
FLUXES_EULER        3500 1.0 4.5190e-01 1.1 6.88e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  6   0   0  0   2  6   0   0  0  6091
STRESSES            3500 1.0 4.7757e-01 1.1 4.87e+08 1.0 0.0e+00 0.0e+00 0.0e+00  2  4   0   0  0   2  4   0   0  0  4083
TSStep               175 1.0 1.8815e+01 1.0 1.08e+10 1.0 1.3e+05 7.0e+03 1.0e+01 96 99  97  97  8  96 99  97  97  9  2289
TSFunctionEval      1750 1.0 1.8553e+01 1.0 1.06e+10 1.0 1.3e+05 7.0e+03 4.0e+00 95 97  97  97  3  95 97  97  97  3  2287
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions   Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

    Distributed Mesh     5              5        905272     0
              Vector    28             28       4715800     0
      Vector Scatter    10             10         10600     0
           Index Set    25             25        202300     0
   IS L to G Mapping    10             10        703172     0
              Viewer     4              3          2064     0
                  TS     1              1          1088     0
                SNES     1              1          1200     0
========================================================================================================================
Average time to get PetscTime(): 2.14577e-07
Average time for MPI_Barrier(): 3.57628e-06
Average time for zero size MPI_Send(): 5.00679e-06
#PETSc Option Table entries:
-Tf 1.
-cfl 2
-log_summary
-nx 256
-ny 128
-save 1.
-ts
#End of PETSc Option Table entries
Compiled with FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8
Configure run at: Thu Feb 16 17:51:17 2012
Configure options: --with-mpi=yes --with-shared-libraries --with-scalar-type=real --with-fortran-interfaces=1 --FFLAGS=-I/usr/include --with-fortran --with-fortran-kernels=1 --with-clanguage=c COPTFLAGS=-O3 FOPTFLAGS=-O3 --download-mumps=MUMPS_4.10.0.tar.gz --download-scalapack=SCALAPACK-1.7.tar.gz --download-blacs=blacs-dev.tar.gz --download-parmetis=ParMetis-3.2.0-p1.tar.gz --download-superlu=superlu_4.2.tar.gz --download-superlu_dist=superlu_dist_2.5.tar.gz --download-spooles=spooles-2.2-dec-2008.tar.gz --download-umfpack=UMFPACK-5.5.1.tar.gz --with-debugging=0 --with-mpi-dir=/home/garnaud/local/openmpi-1.4.4 --download-hdf5 --download-f-blas-lapack
-----------------------------------------
Libraries compiled on Thu Feb 16 17:51:17 2012 on muzo.polytechnique.fr
Machine characteristics: Linux-2.6.39.4-4.2-desktop-x86_64-with-mandrake-2011.0-Official
Using PETSc directory: /home/garnaud/local/petsc/petsc-3.2-p5
Using PETSc arch: real_opt
-----------------------------------------
Using C compiler: /home/garnaud/local/openmpi-1.4.4/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O3 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/garnaud/local/openmpi-1.4.4/bin/mpif90 -I/usr/include -fPIC -O3 ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/include -I/home/garnaud/local/petsc/petsc-3.2-p5/include -I/home/garnaud/local/petsc/petsc-3.2-p5/include -I/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/include -I/home/garnaud/local/openmpi-1.4.4/include
-----------------------------------------
Using C linker: /home/garnaud/local/openmpi-1.4.4/bin/mpicc
Using Fortran linker: /home/garnaud/local/openmpi-1.4.4/bin/mpif90
Using libraries: -Wl,-rpath,/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/lib -L/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/lib -lpetsc -lX11 -lpthread -Wl,-rpath,/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/lib -L/home/garnaud/local/petsc/petsc-3.2-p5/real_opt/lib -lsuperlu_dist_2.5 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lspooles -lscalapack -lblacs -lsuperlu_4.2 -lumfpack -lamd -lflapack -lfblas -lhdf5_fortran -lhdf5 -lz -Wl,-rpath,/home/garnaud/local/openmpi-1.4.4/lib -L/home/garnaud/local/openmpi-1.4.4/lib -Wl,-rpath,/usr/lib64/gcc/x86_64-mandriva-linux-gnu/4.6.1 -L/usr/lib64/gcc/x86_64-mandriva-linux-gnu/4.6.1 -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -lmpi_f90 -lmpi_f77 -lgfortran -lm -lgfortran -lm -lgfortran -lm -lm -lquadmath -lm -ldl -lmpi -lopen-rte -lopen-pal -lnsl -lutil -lgcc_s -lpthread -ldl
-----------------------------------------
