how to optimize my kspsolve

Matthew Knepley Sat, 07 Jan 2017 13:50:02 -0800

On Sat, Jan 7, 2017 at 3:32 PM, Manuel Valera <[email protected]> wrote:


> Hi Devs, hope you are having a great weekend,
>
> I could finally parallelize my linear solver and integrate it into the
> rest of the code so that only the linear system is solved in
> parallel. Great news for my team, but there is a catch: I don't
> see any speedup in the linear solve. I don't know if it's the MPI on the
> cluster we are using, and I'm not sure how to debug it.
>

We need to see -log_view output for any performance question.


> On the other hand, because of this issue I was trying to use
> -log_summary or -log_view, and I noticed that the program hangs
> when it is time to produce the log. If I debug this on 2 cores, process 0
> exits normally but process 1 hangs in VecScatterBegin() with
> SCATTER_REVERSE, way back in the code,
>

You are calling a collective routine from only 1 process.

  Matt
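
For reference, here is a minimal sketch of the pattern in Fortran, reusing the
names from the code quoted further down (bp0, bp2, ctr, nbdp, ind, Rhs). It is
a fragment rather than a complete program: it assumes bp2 is an already created
parallel Vec, that rank was obtained from MPI_Comm_rank, and that the file is
preprocessed so that CHKERRQ is available. Only the VecSetValues call is
restricted to rank 0; everything else is reached by every rank.

```fortran
! Sketch only: bp2, nbdp, ind, Rhs and rank are assumed to exist already.
! Creating the scatter is collective, so every rank calls it.
call VecScatterCreateToZero(bp2,ctr,bp0,ierr); CHKERRQ(ierr)

if (rank == 0) then
   ! bp0 has its full length only on rank 0, so only rank 0 sets values.
   call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr); CHKERRQ(ierr)
endif

! bp0 is a sequential Vec on each rank (length 0 off rank 0),
! so assembling it on every rank is safe.
call VecAssemblyBegin(bp0,ierr); CHKERRQ(ierr)
call VecAssemblyEnd(bp0,ierr); CHKERRQ(ierr)

! The scatter involves all ranks that share bp2: keep it outside the if.
call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr); CHKERRQ(ierr)
call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr); CHKERRQ(ierr)

! Clean up on every rank as well.
call VecScatterDestroy(ctr,ierr); CHKERRQ(ierr)
call VecDestroy(bp0,ierr); CHKERRQ(ierr)
```

If any rank skips one of the calls outside the if block, the remaining ranks
can wait forever inside it, which would look like the hang described above.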


> and this happens even after destroying all associated objects and calling
> PetscFinalize(), so I'm really clueless as to why, as it only happens
> for -log_* or -ksp_view options.
>
> my -ksp_view shows this:
>
>  KSP Object: 2 MPI processes
>
>   type: gcr
>
>     GCR: restart = 30
>
>     GCR: restarts performed = 20
>
>   maximum iterations=10000, initial guess is zero
>
>   tolerances:  relative=1e-14, absolute=1e-50, divergence=10000.
>
>   right preconditioning
>
>   using UNPRECONDITIONED norm type for convergence test
>
> PC Object: 2 MPI processes
>
>   type: bjacobi
>
>     block Jacobi: number of blocks = 2
>
>     Local solve is same for all blocks, in the following KSP and PC
> objects:
>
>   KSP Object:  (sub_)   1 MPI processes
>
>     type: preonly
>
>     maximum iterations=10000, initial guess is zero
>
>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>
>     left preconditioning
>
>     using NONE norm type for convergence test
>
>   PC Object:  (sub_)   1 MPI processes
>
>     type: ilu
>
>       ILU: out-of-place factorization
>
>       0 levels of fill
>
>       tolerance for zero pivot 2.22045e-14
>
>       matrix ordering: natural
>
>       factor fill ratio given 1., needed 1.
>
>         Factored matrix follows:
>
>           Mat Object:           1 MPI processes
>
>             type: seqaij
>
>             rows=100000, cols=100000
>
>             package used to perform factorization: petsc
>
>             total: nonzeros=1675180, allocated nonzeros=1675180
>
>             total number of mallocs used during MatSetValues calls =0
>
>               not using I-node routines
>
>     linear system matrix = precond matrix:
>
>     Mat Object:     1 MPI processes
>
>       type: seqaij
>
>       rows=100000, cols=100000
>
>       total: nonzeros=1675180, allocated nonzeros=1675180
>
>       total number of mallocs used during MatSetValues calls =0
>
>         not using I-node routines
>
>   linear system matrix = precond matrix:
>
>   Mat Object:   2 MPI processes
>
>     type: mpiaij
>
>     rows=200000, cols=200000
>
>     total: nonzeros=3373340, allocated nonzeros=3373340
>
>     total number of mallocs used during MatSetValues calls =0
>
>       not using I-node (on process 0) routines
>
>
>
> And I configured my PC object as:
>
>
>    call PCSetType(mg,PCHYPRE,ierr)
>
>    call PCHYPRESetType(mg,'boomeramg',ierr)
>
>
>     call PetscOptionsSetValue(PETSC_NULL_OBJECT,'pc_hypre_boomeramg_nodal_coarsen','1',ierr)
>
>     call PetscOptionsSetValue(PETSC_NULL_OBJECT,'pc_hypre_boomeramg_vec_interp_variant','1',ierr)
>
>
>
> What are your thoughts ?
>
> Thanks,
>
> Manuel
>
>
>
> On Fri, Jan 6, 2017 at 1:58 PM, Manuel Valera <[email protected]>
> wrote:
>
>> Awesome, that did it, thanks once again.
>>
>>
>> On Fri, Jan 6, 2017 at 1:53 PM, Barry Smith <[email protected]> wrote:
>>
>>>
>>>    Take the scatter out of the if () since everyone does it and get rid
>>> of the VecView().
>>>
>>>    Does this work? If not where is it hanging?
>>>
>>>
>>> > On Jan 6, 2017, at 3:29 PM, Manuel Valera <[email protected]>
>>> wrote:
>>> >
>>> > Thanks Dave,
>>> >
>>> > I think it is interesting that it never gave an error on this. After adding
>>> the VecAssembly calls it still shows the same behavior, without
>>> complaining. I did:
>>> >
>>> > if(rankl==0)then
>>> >
>>> >      call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>> >      call VecAssemblyBegin(bp0,ierr) ; call VecAssemblyEnd(bp0,ierr);
>>> >      CHKERRQ(ierr)
>>> >
>>> > endif
>>> >
>>> >
>>> >      call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>> >      call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>> >      print*,"done! "
>>> >      CHKERRQ(ierr)
>>> >
>>> >
>>> >        CHKERRQ(ierr)
>>> >
>>> >
>>> > Thanks.
>>> >
>>> > On Fri, Jan 6, 2017 at 12:44 PM, Dave May <[email protected]>
>>> wrote:
>>> >
>>> >
>>> > On 6 January 2017 at 20:24, Manuel Valera <[email protected]>
>>> wrote:
>>> > Great help Barry, I had totally overlooked that option (it is explicit
>>> in the VecScatterBegin help page but not in VecScatterCreateToZero, as
>>> I read later)
>>> >
>>> > So I used that and it works partially: it scatters the values assigned
>>> on root but not the rest. If I call VecScatterBegin from outside root it
>>> hangs. The code currently looks like this:
>>> >
>>> >   call VecScatterCreateToZero(bp2,ctr,bp0,ierr); CHKERRQ(ierr)
>>> >
>>> >   call PetscObjectSetName(bp0, 'bp0:',ierr)
>>> >
>>> > if(rankl==0)then
>>> >
>>> >      call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>> >
>>> >      call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>> >
>>> >
>>> > You need to call
>>> >
>>> >   VecAssemblyBegin(bp0);
>>> >   VecAssemblyEnd(bp0);
>>> > after your last call to VecSetValues() before you can do any
>>> operations with bp0.
>>> >
>>> > With your current code, the call to VecView should produce an error if
>>> you used the error checking macro CHKERRQ(ierr) (as should
>>> VecScatter{Begin,End}).
>>> >
>>> > Thanks,
>>> >   Dave
>>> >
>>> >
>>> >      call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>> >      call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>> >      print*,"done! "
>>> >      CHKERRQ(ierr)
>>> >
>>> > endif
>>> >
>>> >    ! call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>> >    !  call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr)
>>> >
>>> >   call VecView(bp2,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>> >
>>> >   call PetscBarrier(PETSC_NULL_OBJECT,ierr)
>>> >
>>> >   call exit()
>>> >
>>> >
>>> >
>>> > And the output is: (with bp the right answer)
>>> >
>>> > Vec Object:bp: 2 MPI processes
>>> >   type: mpi
>>> > Process [0]
>>> > 1.
>>> > 2.
>>> > Process [1]
>>> > 4.
>>> > 3.
>>> > Vec Object:bp2: 2 MPI processes  (before scatter)
>>> >   type: mpi
>>> > Process [0]
>>> > 0.
>>> > 0.
>>> > Process [1]
>>> > 0.
>>> > 0.
>>> > Vec Object:bp0: 1 MPI processes
>>> >   type: seq
>>> > 1.
>>> > 2.
>>> > 4.
>>> > 3.
>>> >  done!
>>> > Vec Object:bp2: 2 MPI processes  (after scatter)
>>> >   type: mpi
>>> > Process [0]
>>> > 1.
>>> > 2.
>>> > Process [1]
>>> > 0.
>>> > 0.
>>> >
>>> >
>>> >
>>> >
>>> > Thanks immensely for your help,
>>> >
>>> > Manuel
>>> >
>>> >
>>> > On Thu, Jan 5, 2017 at 4:39 PM, Barry Smith <[email protected]>
>>> wrote:
>>> >
>>> > > On Jan 5, 2017, at 6:21 PM, Manuel Valera <[email protected]>
>>> wrote:
>>> > >
>>> > > Hello Devs, it's me again,
>>> > >
>>> > > I'm trying to distribute a vector to all processes. The
>>> vector would originally be on root as a sequential vector and I would like
>>> to scatter it. What would be the best call to do this?
>>> > >
>>> > > I already know how to gather a distributed vector to root with
>>> VecScatterCreateToZero; this would be the inverse operation,
>>> >
>>> >    Use the same VecScatter object but with SCATTER_REVERSE; note that you
>>> need to reverse the two vector arguments as well.
>>> >
>>> >
>>> > > I'm currently trying with VecScatterCreate() and as of now I'm doing
>>> the following:
>>> > >
>>> > >
>>> > > if(rank==0)then
>>> > >
>>> > >
>>> > >      call VecCreate(PETSC_COMM_SELF,bp0,ierr); CHKERRQ(ierr) !if i use WORLD
>>> > >                                                                !freezes in SetSizes
>>> > >      call VecSetSizes(bp0,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr)
>>> > >      call VecSetType(bp0,VECSEQ,ierr)
>>> > >      call VecSetFromOptions(bp0,ierr); CHKERRQ(ierr)
>>> > >
>>> > >
>>> > >      call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>> > >
>>> > >      !call VecSet(bp0,5.0D0,ierr); CHKERRQ(ierr)
>>> > >
>>> > >
>>> > >      call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>> > >
>>> > >      call VecAssemblyBegin(bp0,ierr) ; call VecAssemblyEnd(bp0,ierr) !rhs
>>> > >
>>> > >      do i=0,nbdp-1,1
>>> > >         ind(i+1) = i
>>> > >      enddo
>>> > >
>>> > >      call ISCreateGeneral(PETSC_COMM_SELF,nbdp,ind,PETSC_COPY_VALUES,locis,ierr)
>>> > >
>>> > >     !call VecScatterCreate(bp0,PETSC_NULL_OBJECT,bp2,is,ctr,ierr) !if i use SELF
>>> > >                                                                   !freezes here.
>>> > >
>>> > >      call VecScatterCreate(bp0,locis,bp2,PETSC_NULL_OBJECT,ctr,ierr)
>>> > >
>>> > > endif
>>> > >
>>> > > bp2 being the receiving MPI vector to scatter to.
>>> > >
>>> > > But it freezes in VecScatterCreate when trying to use more than one
>>> processor. What would be a better approach?
>>> > >
>>> > >
>>> > > Thanks once again,
>>> > >
>>> > > Manuel
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > On Wed, Jan 4, 2017 at 3:30 PM, Manuel Valera <[email protected]>
>>> wrote:
>>> > > Thanks, I had no idea how to debug and read those logs. That solved
>>> this issue at least (I was sending a message from root to everyone else,
>>> but trying to receive on everyone else including root)
>>> > >
>>> > > Until next time, many thanks,
>>> > >
>>> > > Manuel
>>> > >
>>> > > On Wed, Jan 4, 2017 at 3:23 PM, Matthew Knepley <[email protected]>
>>> wrote:
>>> > > On Wed, Jan 4, 2017 at 5:21 PM, Manuel Valera <[email protected]>
>>> wrote:
>>> > > I did a PetscBarrier just before calling the VecCreate routine and
>>> I'm pretty sure I'm calling it from every processor. The code looks like this:
>>> > >
>>> > > From the gdb trace.
>>> > >
>>> > >   Proc 0: Is in some MPI routine you call yourself, line 113
>>> > >
>>> > >   Proc 1: Is in VecCreate(), line 130
>>> > >
>>> > > You need to fix your communication code.
>>> > >
>>> > >    Matt
>>> > >
>>> > > call PetscBarrier(PETSC_NULL_OBJECT,ierr)
>>> > >
>>> > > print*,'entering POInit from',rank
>>> > > !call exit()
>>> > >
>>> > > call PetscObjsInit()
>>> > >
>>> > >
>>> > > And output gives:
>>> > >
>>> > >  entering POInit from           0
>>> > >  entering POInit from           1
>>> > >  entering POInit from           2
>>> > >  entering POInit from           3
>>> > >
>>> > >
>>> > > Still hangs in the same way,
>>> > >
>>> > > Thanks,
>>> > >
>>> > > Manuel
>>> > >
>>> > >
>>> > >
>>> > > On Wed, Jan 4, 2017 at 2:55 PM, Manuel Valera <[email protected]>
>>> wrote:
>>> > > Thanks for the answers !
>>> > >
>>> > > Here's the screenshot of what I got from bt in gdb (great hint on how
>>> to debug in PETSc, I didn't know that)
>>> > >
>>> > > I don't really know what to look at here,
>>> > >
>>> > > Thanks,
>>> > >
>>> > > Manuel
>>> > >
>>> > > On Wed, Jan 4, 2017 at 2:39 PM, Dave May <[email protected]>
>>> wrote:
>>> > > Are you certain ALL ranks in PETSC_COMM_WORLD call these
>>> function(s)? These functions cannot be inside if statements like
>>> > > if (rank == 0){
>>> > >   VecCreateMPI(...)
>>> > > }
>>> > >
>>> > >
>>> > > On Wed, 4 Jan 2017 at 23:34, Manuel Valera <[email protected]>
>>> wrote:
>>> > > Thanks Dave for the quick answer, appreciate it,
>>> > >
>>> > > I just tried that and it didn't make a difference, any other
>>> suggestions ?
>>> > >
>>> > > Thanks,
>>> > > Manuel
>>> > >
>>> > > On Wed, Jan 4, 2017 at 2:29 PM, Dave May <[email protected]>
>>> wrote:
>>> > > You need to swap the order of your function calls.
>>> > > Call VecSetSizes() before VecSetType()
>>> > >
>>> > > Thanks,
>>> > >   Dave
>>> > >
>>> > >
>>> > > On Wed, 4 Jan 2017 at 23:21, Manuel Valera <[email protected]>
>>> wrote:
>>> > > Hello all, happy new year,
>>> > >
>>> > > I'm working on parallelizing my code. It worked and provided some
>>> results when I just ran on more than one processor, but it created artifacts
>>> because each processor ran its own image of the whole program, which I did
>>> not need, and the images conflicted with each other.
>>> > >
>>> > > Since the pressure solver is the main part I need in parallel, I'm
>>> using MPI to run everything on the root processor until it's time to solve for
>>> pressure. At this point I'm trying to create a distributed vector using
>>> either
>>> > >
>>> > >      call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nbdp,xp,ierr)
>>> > > or
>>> > >      call VecCreate(PETSC_COMM_WORLD,xp,ierr); CHKERRQ(ierr)
>>> > >      call VecSetType(xp,VECMPI,ierr)
>>> > >      call VecSetSizes(xp,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr)
>>> > >
>>> > >
>>> > > In both cases the program hangs at this point, something that never
>>> happened with the naive approach I described before. I've made sure the global
>>> size, nbdp, is the same on every processor. What can be wrong?
>>> > >
>>> > > Thanks for your kind help,
>>> > >
>>> > > Manuel.
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> > > -- Norbert Wiener
>>> > >
>>> > >
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

Re: [petsc-users] -log_view hangs unexpectedly // how to optimize my kspsolve
