On Wed, Nov 13, 2013 at 9:24 AM, Roc Wang <[email protected]> wrote:
> Hi, I tried to use -ksp_type bicg, but there was an error. It was fine if I
> use gmres as the solver. Does it mean the matrix cannot be solved by BiCG?
> Thanks.
>

BiCG can break down. You can try -ksp_type bcgstab

   Matt

> [0]PETSC ERROR: --------------------- Error Message ------------------------------------
> [0]PETSC ERROR: Floating point exception!
> [0]PETSC ERROR: Infinite or not-a-number generated in norm!
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 6, Mon Feb 11 12:26:34 CST 2013
> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [0]PETSC ERROR: See docs/index.html for manual pages.
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: ./x.r on a arch-linu named node48.cocoa5 by pzw2 Wed Nov 13 10:09:22 2013
> [0]PETSC ERROR: Libraries linked from /home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/lib
> [0]PETSC ERROR: Configure run at Tue Nov 12 09:52:45 2013
> [0]PETSC ERROR: Configure options --download-f-blas-lapack --with-mpi-dir=/usr/local/OpenMPI-1.6.4_Intel --download-hypre=1 --download-hdf5=1 --download-superlu_dist --download-parmetis --download-metis --download-spai --with-debugging=no
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: VecNorm() line 169 in /home/pzw2/ZSoft/petsc-3.3-p6/src/vec/vec/interface/rvector.c
> [0]PETSC ERROR: KSPSolve_BiCG() line 107 in /home/pzw2/ZSoft/petsc-3.3-p6/src/ksp/ksp/impls/bicg/bicg.c
> [0]PETSC ERROR: KSPSolve() line 446 in /home/pzw2/ZSoft/petsc-3.3-p6/src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: LinearSolver() line 181 in "unknowndirectory/"src/solver.cpp
> [23]PETSC ERROR: ------------------------------------------------------------------------
> [23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> [23]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [23]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [23]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [23]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
> [23]PETSC ERROR: to get more information on the crash.
> [23]PETSC ERROR: --------------------- Error Message ------------------------------------
> [23]PETSC ERROR: Signal received!
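The NaN caught by VecNorm() inside KSPSolve_BiCG() is consistent with the breakdown mentioned above (division by a vanishing inner product), not necessarily a sign that the matrix cannot be solved. A minimal sketch of switching to a stabilized variant, assuming ksp is the KSP object used in LinearSolver(); note that the PETSc type name for BiCGStab is bcgs (and bcgsl for BiCGStab(L)), so the command-line form is -ksp_type bcgs:

    /* Minimal sketch: switch the Krylov method to stabilized BiCG.  Assumes ksp
       already has its operators set and ierr is a PetscErrorCode. */
    ierr = KSPSetType(ksp,KSPBCGS);CHKERRQ(ierr);    /* equivalent to -ksp_type bcgs */
    /* or leave the choice to run time: */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);     /* then pass -ksp_type bcgs or -ksp_type bcgsl */

Either way the rest of the solver setup is unchanged; running with -ksp_monitor_true_residual will show whether the stabilized variant converges or also stagnates.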
>
> ------------------------------
> Date: Tue, 12 Nov 2013 15:34:16 -0600
> Subject: Re: [petsc-users] approaches to reduce computing time
> From: [email protected]
> To: [email protected]
> CC: [email protected]; [email protected]
>
> On Tue, Nov 12, 2013 at 3:22 PM, Roc Wang <[email protected]> wrote:
>
> ------------------------------
> Date: Tue, 12 Nov 2013 14:59:30 -0600
> Subject: Re: [petsc-users] approaches to reduce computing time
> From: [email protected]
> To: [email protected]
> CC: [email protected]; [email protected]
>
> On Tue, Nov 12, 2013 at 2:48 PM, Roc Wang <[email protected]> wrote:
>
> ------------------------------
> Date: Tue, 12 Nov 2013 14:22:35 -0600
> Subject: Re: [petsc-users] approaches to reduce computing time
> From: [email protected]
> To: [email protected]
> CC: [email protected]; [email protected]
>
> On Tue, Nov 12, 2013 at 2:14 PM, Roc Wang <[email protected]> wrote:
>
> Thanks Jed,
>
> I have questions about load balance and PC type below.
>
> > From: [email protected]
> > To: [email protected]; [email protected]
> > Subject: Re: [petsc-users] approaches to reduce computing time
> > Date: Sun, 10 Nov 2013 12:20:18 -0700
> >
> > Roc Wang <[email protected]> writes:
> >
> > > Hi all,
> > >
> > > I am trying to minimize the computing time needed to solve a large sparse
> > > matrix. The matrix dimensions are m=321, n=321 and p=321. I am trying to
> > > reduce the computing time in two directions: 1) finding a preconditioner
> > > that reduces the number of iterations, and 2) requesting more cores.
> > >
> > > ----For the first method, I tried several options:
> > > 1 default KSP and PC,
> > > 2 -ksp_type fgmres -ksp_gmres_restart 30 -pc_type ksp -ksp_pc_type jacobi,
> > > 3 -ksp_type lgmres -ksp_gmres_restart 40 -ksp_lgmres_augment 10,
> > > 4 -ksp_type lgmres -ksp_gmres_restart 50 -ksp_lgmres_augment 10,
> > > 5 -ksp_type lgmres -ksp_gmres_restart 40 -ksp_lgmres_augment 10 -pc_type asm (PCASM)
> > >
> > > The iterations and timings with 128 cores requested are as follows:
> > >
> > >   case#   iter   timing (s)
> > >   1       1436     816
> > >   2          3   12658
> > >   3       1069     669.64
> > >   4        872     768.12
> > >   5        927     513.14
> > >
> > > It can be seen that changing -ksp_gmres_restart and -ksp_lgmres_augment
> > > helps to reduce the iterations but not the timing (comparing cases 3 and
> > > 4). Second, PCASM helps a lot. Although the second option is able to
> > > reduce iterations, the timing increases very much. Is it because more
> > > operations are needed in the PC?
> > >
> > > My questions here are: 1. Which direction should I take to select
> > > -ksp_gmres_restart and -ksp_lgmres_augment? For example, is a larger
> > > restart with a large augment better, or a larger restart with a smaller
> > > augment?
> >
> > Look at the -log_summary. By increasing the restart, the work in
> > KSPGMRESOrthog will increase linearly, but the number of iterations
> > might decrease enough to compensate. There is no general rule here
> > since it depends on the relative expense of operations for your problem
> > on your machine.
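For concreteness, a minimal sketch of how option set 5 from the table above maps onto the PETSc API, assuming ksp and pc are the KSP and PC objects used in solver.cpp and ierr is a PetscErrorCode; the same settings can equally be left entirely to the command line:

    /* Reproduce option set 5: LGMRES with restart 40 and an additive-Schwarz PC. */
    ierr = KSPSetType(ksp,KSPLGMRES);CHKERRQ(ierr);      /* -ksp_type lgmres      */
    ierr = KSPGMRESSetRestart(ksp,40);CHKERRQ(ierr);     /* -ksp_gmres_restart 40 */
    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
    ierr = PCSetType(pc,PCASM);CHKERRQ(ierr);            /* -pc_type asm          */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);         /* picks up -ksp_lgmres_augment 10
                                                            and any other -ksp_/-pc_ options */

Because KSPSetFromOptions() is called last, experimenting with different restart and augment values (while watching the KSPGMRESOrthog line in -log_summary, as suggested above) needs no recompilation.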
> http://www.mail-archive.com/[email protected]/msg19152.html: > > > > > > # of cores iter timing > > > 1 923 19541.83 > > > 4 929 5897.06 > > > 8 932 4854.72 > > > 16 924 1494.33 > > > 32 924 1480.88 > > > 64 928 686.89 > > > 128 927 627.33 > > > 256 926 552.93 > > > > The bandwidth issue has more to do with using multiple cores within a > > node rather than between nodes. Likely the above is a load balancing > > problem or bad communication. > > I use DM to manage the distributed data. The DM was created by calling > DMDACreate3d() and let PETSc decide the local number of nodes in each > direction. To my understand the load of each core is determined at this > stage. If the load balance is done when DMDACreate3d() is called and use > PETSC_DECIDE option? Or how should make the load balanced after DM is > created? > > > We do not have a way to do fine-grained load balancing for the DMDA since > it is intended for very simple topologies. You can see > if it is load imbalance from the division by running with a cube that is > evenly divisible with a cube number of processes. > > Matt > > So, I have nothing to do to make the load balanced if I use DMDA? Would > you please take a look at the attached log summary files and give me some > suggestions on how to improve the speedup ratio? Thanks. > > > Please try what I suggested above. And it looks like there is a little > load imbalance > > Roc----So if the domain is a cube, then the number of the processors is > better to be like 2^3=8, 3^3=9, 4^4 =16, and so on, right? > > > I want you to try this to eliminate load imbalance as a reason for poor > speedup. I don't think it is, but we will see. > > > I am also wondering whether the physical boundary type effects the load > balance? Since freed node, Dirichlet node and Neumann node has different > number of neighbors? > > VecAXPY 234 1.0 1.0124e+00 3.4 1.26e+08 1.1 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 15290 > > VecAXPY 234 1.0 4.2862e-01 3.6 6.37e+07 1.1 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 36115 > > > although it is not limiting the speedup. The time imbalance is really > strange. I am guessing other jobs are running on this machine. > > > Roc----The code was run a cluster. There should be other jobs were > running. Do you mean those jobs affect the load balance of my job or speed > of the cluster? I am just trying to improve the scalability of the code, > but really don't know what's the reason that the speedup ratio decreases > so quickly? Thanks. > > > Yes, other people running can definitely screw up speedup and cause > imbalance. Usually timing runs are made with dedicated time. > > Your VecAXPY and MatMult are speeding up just fine. It is reductions which > are killing your computation. > You should switch to a more effective preconditioner, so you can avoid all > those dot products. Also, you > might try something like BiCG with fewer dot products. > > Matt > > > Matt > > > > > > > My question here is: Is there any other PC can help on both reducing > iterations and increasing scalability? Thanks. > > > > Always send -log_summary with questions like this, but algebraic > multigrid is a good place to start. > > Please take a look at the attached log file, they are for 128 cores and > 256 cores, respectively. Based on the log files, what should be done to > increase the scalability? Thanks. > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
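As a footnote to the algebraic multigrid suggestion in the quoted thread: the build above was configured with --download-hypre=1, so BoomerAMG can be tried without recompiling, either from the command line (-pc_type hypre -pc_hypre_type boomeramg, or -pc_type gamg for PETSc's native AMG) or in code. A minimal sketch, assuming the same ksp/pc objects as in the earlier snippets:

    /* Select hypre BoomerAMG as the preconditioner; -pc_type gamg is an
       alternative if the hypre interface is not available in the build. */
    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
    ierr = PCSetType(pc,PCHYPRE);CHKERRQ(ierr);          /* -pc_type hypre           */
    ierr = PCHYPRESetType(pc,"boomeramg");CHKERRQ(ierr); /* -pc_hypre_type boomeramg */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);         /* allows tuning options such as
                                                            -pc_hypre_boomeramg_strong_threshold */

A multigrid preconditioner typically cuts the iteration count sharply, which also removes most of the global reductions (dot products and norms) identified above as the scaling bottleneck.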
