On Wed, Nov 13, 2013 at 9:24 AM, Roc Wang <[email protected]> wrote:
> Hi, I tried to use -ksp_type bicg, but there was an error. It was fine if I
> use gmres as the solver. Does it mean the matrix cannot be solved by BiCG?
> Thanks.
>

BiCG can break down. You can try -ksp_type bcgstab

   Matt

> [0]PETSC ERROR: --------------------- Error Message ------------------------------------
> [0]PETSC ERROR: Floating point exception!
> [0]PETSC ERROR: Infinite or not-a-number generated in norm!
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 6, Mon Feb 11 12:26:34 CST 2013
> [0]PETSC ERROR: See docs/changes/index.html for recent updates.
> [0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [0]PETSC ERROR: See docs/index.html for manual pages.
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: ./x.r on a arch-linu named node48.cocoa5 by pzw2 Wed Nov 13 10:09:22 2013
> [0]PETSC ERROR: Libraries linked from /home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/lib
> [0]PETSC ERROR: Configure run at Tue Nov 12 09:52:45 2013
> [0]PETSC ERROR: Configure options --download-f-blas-lapack --with-mpi-dir=/usr/local/OpenMPI-1.6.4_Intel --download-hypre=1 --download-hdf5=1 --download-superlu_dist --download-parmetis --download-metis --download-spai --with-debugging=no
> [0]PETSC ERROR: ------------------------------------------------------------------------
> [0]PETSC ERROR: VecNorm() line 169 in /home/pzw2/ZSoft/petsc-3.3-p6/src/vec/vec/interface/rvector.c
> [0]PETSC ERROR: KSPSolve_BiCG() line 107 in /home/pzw2/ZSoft/petsc-3.3-p6/src/ksp/ksp/impls/bicg/bicg.c
> [0]PETSC ERROR: KSPSolve() line 446 in /home/pzw2/ZSoft/petsc-3.3-p6/src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: LinearSolver() line 181 in "unknowndirectory/"src/solver.cpp
> [23]PETSC ERROR: ------------------------------------------------------------------------
> [23]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> [23]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [23]PETSC ERROR: or see http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [23]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> [23]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
> [23]PETSC ERROR: to get more information on the crash.
> [23]PETSC ERROR: --------------------- Error Message ------------------------------------
> [23]PETSC ERROR: Signal received!
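The NaN caught by VecNorm() inside KSPSolve_BiCG() is consistent with the breakdown mentioned above (division by a vanishing inner product), not necessarily a sign that the matrix cannot be solved. A minimal sketch of switching to a stabilized variant, assuming ksp is the KSP object used in LinearSolver(); note that the PETSc type name for BiCGStab is bcgs (and bcgsl for BiCGStab(L)), so the command-line form is -ksp_type bcgs:

    /* Minimal sketch: switch the Krylov method to stabilized BiCG.  Assumes ksp
       already has its operators set and ierr is a PetscErrorCode. */
    ierr = KSPSetType(ksp,KSPBCGS);CHKERRQ(ierr);    /* equivalent to -ksp_type bcgs */
    /* or leave the choice to run time: */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);     /* then pass -ksp_type bcgs or -ksp_type bcgsl */

Either way the rest of the solver setup is unchanged; running with -ksp_monitor_true_residual will show whether the stabilized variant converges or also stagnates.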
>
> ------------------------------
> Date: Tue, 12 Nov 2013 15:34:16 -0600
> Subject: Re: [petsc-users] approaches to reduce computing time
> From: [email protected]
> To: [email protected]
> CC: [email protected]; [email protected]
>
> On Tue, Nov 12, 2013 at 3:22 PM, Roc Wang <[email protected]> wrote:
>
> ------------------------------
> Date: Tue, 12 Nov 2013 14:59:30 -0600
> Subject: Re: [petsc-users] approaches to reduce computing time
> From: [email protected]
> To: [email protected]
> CC: [email protected]; [email protected]
>
> On Tue, Nov 12, 2013 at 2:48 PM, Roc Wang <[email protected]> wrote:
>
> ------------------------------
> Date: Tue, 12 Nov 2013 14:22:35 -0600
> Subject: Re: [petsc-users] approaches to reduce computing time
> From: [email protected]
> To: [email protected]
> CC: [email protected]; [email protected]
>
> On Tue, Nov 12, 2013 at 2:14 PM, Roc Wang <[email protected]> wrote:
>
> Thanks Jed,
>
> I have questions about load balance and PC type below.
>
> > From: [email protected]
> > To: [email protected]; [email protected]
> > Subject: Re: [petsc-users] approaches to reduce computing time
> > Date: Sun, 10 Nov 2013 12:20:18 -0700
> >
> > Roc Wang <[email protected]> writes:
> >
> > > Hi all,
> > >
> > > I am trying to minimize the computing time needed to solve a large sparse
> > > matrix. The matrix dimensions are m=321, n=321 and p=321. I am trying to
> > > reduce the computing time in two directions: 1) finding a preconditioner
> > > that reduces the number of iterations, and 2) requesting more cores.
> > >
> > > ----For the first method, I tried several options:
> > > 1 default KSP and PC,
> > > 2 -ksp_type fgmres -ksp_gmres_restart 30 -pc_type ksp -ksp_pc_type jacobi,
> > > 3 -ksp_type lgmres -ksp_gmres_restart 40 -ksp_lgmres_augment 10,
> > > 4 -ksp_type lgmres -ksp_gmres_restart 50 -ksp_lgmres_augment 10,
> > > 5 -ksp_type lgmres -ksp_gmres_restart 40 -ksp_lgmres_augment 10 -pc_type asm (PCASM)
> > >
> > > The iterations and timings with 128 cores requested are as follows:
> > >
> > >   case#   iter   timing (s)
> > >   1       1436     816
> > >   2          3   12658
> > >   3       1069     669.64
> > >   4        872     768.12
> > >   5        927     513.14
> > >
> > > It can be seen that changing -ksp_gmres_restart and -ksp_lgmres_augment
> > > helps to reduce the iterations but not the timing (comparing cases 3 and
> > > 4). Second, PCASM helps a lot. Although the second option is able to
> > > reduce iterations, the timing increases very much. Is it because more
> > > operations are needed in the PC?
> > >
> > > My questions here are: 1. Which direction should I take to select
> > > -ksp_gmres_restart and -ksp_lgmres_augment? For example, is a larger
> > > restart with a large augment better, or a larger restart with a smaller
> > > augment?
> >
> > Look at the -log_summary. By increasing the restart, the work in
> > KSPGMRESOrthog will increase linearly, but the number of iterations
> > might decrease enough to compensate. There is no general rule here
> > since it depends on the relative expense of operations for your problem
> > on your machine.
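For concreteness, a minimal sketch of how option set 5 from the table above maps onto the PETSc API, assuming ksp and pc are the KSP and PC objects used in solver.cpp and ierr is a PetscErrorCode; the same settings can equally be left entirely to the command line:

    /* Reproduce option set 5: LGMRES with restart 40 and an additive-Schwarz PC. */
    ierr = KSPSetType(ksp,KSPLGMRES);CHKERRQ(ierr);      /* -ksp_type lgmres      */
    ierr = KSPGMRESSetRestart(ksp,40);CHKERRQ(ierr);     /* -ksp_gmres_restart 40 */
    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
    ierr = PCSetType(pc,PCASM);CHKERRQ(ierr);            /* -pc_type asm          */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);         /* picks up -ksp_lgmres_augment 10
                                                            and any other -ksp_/-pc_ options */

Because KSPSetFromOptions() is called last, experimenting with different restart and augment values (while watching the KSPGMRESOrthog line in -log_summary, as suggested above) needs no recompilation.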
> http://www.mail-archive.com/[email protected]/msg19152.html: > > > > > > # of cores iter timing > > > 1 923 19541.83 > > > 4 929 5897.06 > > > 8 932 4854.72 > > > 16 924 1494.33 > > > 32 924 1480.88 > > > 64 928 686.89 > > > 128 927 627.33 > > > 256 926 552.93 > > > > The bandwidth issue has more to do with using multiple cores within a > > node rather than between nodes. Likely the above is a load balancing > > problem or bad communication. > > I use DM to manage the distributed data. The DM was created by calling > DMDACreate3d() and let PETSc decide the local number of nodes in each > direction. To my understand the load of each core is determined at this > stage. If the load balance is done when DMDACreate3d() is called and use > PETSC_DECIDE option? Or how should make the load balanced after DM is > created? > > > We do not have a way to do fine-grained load balancing for the DMDA since > it is intended for very simple topologies. You can see > if it is load imbalance from the division by running with a cube that is > evenly divisible with a cube number of processes. > > Matt > > So, I have nothing to do to make the load balanced if I use DMDA? Would > you please take a look at the attached log summary files and give me some > suggestions on how to improve the speedup ratio? Thanks. > > > Please try what I suggested above. And it looks like there is a little > load imbalance > > Roc----So if the domain is a cube, then the number of the processors is > better to be like 2^3=8, 3^3=9, 4^4 =16, and so on, right? > > > I want you to try this to eliminate load imbalance as a reason for poor > speedup. I don't think it is, but we will see. > > > I am also wondering whether the physical boundary type effects the load > balance? Since freed node, Dirichlet node and Neumann node has different > number of neighbors? > > VecAXPY 234 1.0 1.0124e+00 3.4 1.26e+08 1.1 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 15290 > > VecAXPY 234 1.0 4.2862e-01 3.6 6.37e+07 1.1 0.0e+00 0.0e+00 > 0.0e+00 0 0 0 0 0 0 0 0 0 0 36115 > > > although it is not limiting the speedup. The time imbalance is really > strange. I am guessing other jobs are running on this machine. > > > Roc----The code was run a cluster. There should be other jobs were > running. Do you mean those jobs affect the load balance of my job or speed > of the cluster? I am just trying to improve the scalability of the code, > but really don't know what's the reason that the speedup ratio decreases > so quickly? Thanks. > > > Yes, other people running can definitely screw up speedup and cause > imbalance. Usually timing runs are made with dedicated time. > > Your VecAXPY and MatMult are speeding up just fine. It is reductions which > are killing your computation. > You should switch to a more effective preconditioner, so you can avoid all > those dot products. Also, you > might try something like BiCG with fewer dot products. > > Matt > > > Matt > > > > > > > My question here is: Is there any other PC can help on both reducing > iterations and increasing scalability? Thanks. > > > > Always send -log_summary with questions like this, but algebraic > multigrid is a good place to start. > > Please take a look at the attached log file, they are for 128 cores and > 256 cores, respectively. Based on the log files, what should be done to > increase the scalability? Thanks. > > > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. 
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
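As a footnote to the algebraic multigrid suggestion in the quoted thread: the build above was configured with --download-hypre=1, so BoomerAMG can be tried without recompiling, either from the command line (-pc_type hypre -pc_hypre_type boomeramg, or -pc_type gamg for PETSc's native AMG) or in code. A minimal sketch, assuming the same ksp/pc objects as in the earlier snippets:

    /* Select hypre BoomerAMG as the preconditioner; -pc_type gamg is an
       alternative if the hypre interface is not available in the build. */
    ierr = KSPGetPC(ksp,&pc);CHKERRQ(ierr);
    ierr = PCSetType(pc,PCHYPRE);CHKERRQ(ierr);          /* -pc_type hypre           */
    ierr = PCHYPRESetType(pc,"boomeramg");CHKERRQ(ierr); /* -pc_hypre_type boomeramg */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);         /* allows tuning options such as
                                                            -pc_hypre_boomeramg_strong_threshold */

A multigrid preconditioner typically cuts the iteration count sharply, which also removes most of the global reductions (dot products and norms) identified above as the scaling bottleneck.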
