how to optimize my kspsolve

Matthew Knepley Sat, 07 Jan 2017 14:25:38 -0800

On Sat, Jan 7, 2017 at 4:20 PM, Manuel Valera <[email protected]> wrote:


> Thank you Matthew,
>
> On Sat, Jan 7, 2017 at 1:49 PM, Matthew Knepley <[email protected]> wrote:
>
>> On Sat, Jan 7, 2017 at 3:32 PM, Manuel Valera <[email protected]>
>> wrote:
>>
>>> Hi Devs, hope you are having a great weekend,
>>>
>>> I could finally parallelize my linear solver and implement it into the
>>> rest of the code in a way that only the linear system is solved in
>>> parallel, great news for my team, but there is a catch and is that i don't
>>> see any speedup in the linear system, i don't know if its the MPI in the
>>> cluster we are using, but im not sure on how to debug it,
>>>
>>
>> We need to see -log_view output for any performance question.
>>
>>
>>> On the other hand and because of this issue i was trying to do
>>> -log_summary or -log_view and i noticed the program in this context hangs
>>> when is time of producing the log, if i debug this for 2 cores, process 0
>>> exits normally but process 1 hangs in the vectorscatterbegin() with
>>> scatter_reverse way back in the code,
>>>
>>
>> You are calling a collective routine from only 1 process.
>>
>>
>   Matt
>>
>
> I am pretty confident this is not the case,
>

This is still the simplest explanation. Can you send the stack trace for
the 2 process run?


> the callings to vecscattercreatetozero and vecscatterbegin are made in all
> processes, the program goes thru all of the iterations on the linear
> solver, writes output correctly and even closes all the petsc objects
> without complaining, the freeze occurs at the very end when the log is to
> be produced.
>

If you can send us a code to run, we can likely find the error.

  Thanks,

     Matt


> Thanks,
>
> Manuel
>
>
>
>>
>>
>>> and even after destroying all associated objects and calling
>>> petscfinalize(), so im really clueless on why is this, as it only happens
>>> for -log_* or -ksp_view options.
>>>
>>> my -ksp_view shows this:
>>>
>>>  KSP Object: 2 MPI processes
>>>
>>>   type: gcr
>>>
>>>     GCR: restart = 30
>>>
>>>     GCR: restarts performed = 20
>>>
>>>   maximum iterations=10000, initial guess is zero
>>>
>>>   tolerances:  relative=1e-14, absolute=1e-50, divergence=10000.
>>>
>>>   right preconditioning
>>>
>>>   using UNPRECONDITIONED norm type for convergence test
>>>
>>> PC Object: 2 MPI processes
>>>
>>>   type: bjacobi
>>>
>>>     block Jacobi: number of blocks = 2
>>>
>>>     Local solve is same for all blocks, in the following KSP and PC
>>> objects:
>>>
>>>   KSP Object:  (sub_)   1 MPI processes
>>>
>>>     type: preonly
>>>
>>>     maximum iterations=10000, initial guess is zero
>>>
>>>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>
>>>     left preconditioning
>>>
>>>     using NONE norm type for convergence test
>>>
>>>   PC Object:  (sub_)   1 MPI processes
>>>
>>>     type: ilu
>>>
>>>       ILU: out-of-place factorization
>>>
>>>       0 levels of fill
>>>
>>>       tolerance for zero pivot 2.22045e-14
>>>
>>>       matrix ordering: natural
>>>
>>>       factor fill ratio given 1., needed 1.
>>>
>>>         Factored matrix follows:
>>>
>>>           Mat Object:           1 MPI processes
>>>
>>>             type: seqaij
>>>
>>>             rows=100000, cols=100000
>>>
>>>             package used to perform factorization: petsc
>>>
>>>             total: nonzeros=1675180, allocated nonzeros=1675180
>>>
>>>             total number of mallocs used during MatSetValues calls =0
>>>
>>>               not using I-node routines
>>>
>>>     linear system matrix = precond matrix:
>>>
>>>     Mat Object:     1 MPI processes
>>>
>>>       type: seqaij
>>>
>>>       rows=100000, cols=100000
>>>
>>>       total: nonzeros=1675180, allocated nonzeros=1675180
>>>
>>>       total number of mallocs used during MatSetValues calls =0
>>>
>>>         not using I-node routines
>>>
>>>   linear system matrix = precond matrix:
>>>
>>>   Mat Object:   2 MPI processes
>>>
>>>     type: mpiaij
>>>
>>>     rows=200000, cols=200000
>>>
>>>     total: nonzeros=3373340, allocated nonzeros=3373340
>>>
>>>     total number of mallocs used during MatSetValues calls =0
>>>
>>>       not using I-node (on process 0) routines
>>>
>>>
>>>
>>> And i configured my PC object as:
>>>
>>>
>>>    call PCSetType(mg,PCHYPRE,ierr)
>>>
>>>    call PCHYPRESetType(mg,'boomeramg',ierr)
>>>
>>>
>>>     call PetscOptionsSetValue(PETSC_NULL_OBJECT,'pc_hypre_boomeramg_n
>>> odal_coarsen','1',ierr)
>>>
>>>     call PetscOptionsSetValue(PETSC_NULL_OBJECT,'pc_hypre_boomeramg_v
>>> ec_interp_variant','1',ierr)
>>>
>>>
>>>
>>> What are your thoughts ?
>>>
>>> Thanks,
>>>
>>> Manuel
>>>
>>>
>>>
>>> On Fri, Jan 6, 2017 at 1:58 PM, Manuel Valera <[email protected]>
>>> wrote:
>>>
>>>> Awesome, that did it, thanks once again.
>>>>
>>>>
>>>> On Fri, Jan 6, 2017 at 1:53 PM, Barry Smith <[email protected]> wrote:
>>>>
>>>>>
>>>>>    Take the scatter out of the if () since everyone does it and get
>>>>> rid of the VecView().
>>>>>
>>>>>    Does this work? If not where is it hanging?
>>>>>
>>>>>
>>>>> > On Jan 6, 2017, at 3:29 PM, Manuel Valera <[email protected]>
>>>>> wrote:
>>>>> >
>>>>> > Thanks Dave,
>>>>> >
>>>>> > I think is interesting it never gave an error on this, after adding
>>>>> the vecassembly calls it still shows the same behavior, without
>>>>> complaining, i did:
>>>>> >
>>>>> > if(rankl==0)then
>>>>> >
>>>>> >      call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>>>> >      call VecAssemblyBegin(bp0,ierr) ; call VecAssemblyEnd(bp0,ierr);
>>>>> >      CHKERRQ(ierr)
>>>>> >
>>>>> endif
>>>>> >
>>>>> >
>>>>> >      call VecScatterBegin(ctr,bp0,bp2,IN
>>>>> SERT_VALUES,SCATTER_REVERSE,ierr)
>>>>> >      call VecScatterEnd(ctr,bp0,bp2,INSE
>>>>> RT_VALUES,SCATTER_REVERSE,ierr)
>>>>> >      print*,"done! "
>>>>> >      CHKERRQ(ierr)
>>>>> >
>>>>> >
>>>>> >        CHKERRQ(ierr)
>>>>> >
>>>>> >
>>>>> > Thanks.
>>>>> >
>>>>> > On Fri, Jan 6, 2017 at 12:44 PM, Dave May <[email protected]>
>>>>> wrote:
>>>>> >
>>>>> >
>>>>> > On 6 January 2017 at 20:24, Manuel Valera <[email protected]>
>>>>> wrote:
>>>>> > Great help Barry, i totally had overlooked that option (it is
>>>>> explicit in the vecscatterbegin call help page but not in
>>>>> vecscattercreatetozero, as i read later)
>>>>> >
>>>>> > So i used that and it works partially, it scatters te values
>>>>> assigned in root but not the rest, if i call vecscatterbegin from outside
>>>>> root it hangs, the code currently look as this:
>>>>> >
>>>>> >   call VecScatterCreateToZero(bp2,ctr,bp0,ierr); CHKERRQ(ierr)
>>>>> >
>>>>> >   call PetscObjectSetName(bp0, 'bp0:',ierr)
>>>>> >
>>>>> > if(rankl==0)then
>>>>> >
>>>>> >      call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>>>> >
>>>>> >      call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>> >
>>>>> >
>>>>> > You need to call
>>>>> >
>>>>> >   VecAssemblyBegin(bp0);
>>>>> >   VecAssemblyEnd(bp0);
>>>>> > after your last call to VecSetValues() before you can do any
>>>>> operations with bp0.
>>>>> >
>>>>> > With your current code, the call to VecView should produce an error
>>>>> if you used the error checking macro CHKERRQ(ierr) (as should
>>>>> VecScatter{Begin,End}
>>>>> >
>>>>> > Thanks,
>>>>> >   Dave
>>>>> >
>>>>> >
>>>>> >      call VecScatterBegin(ctr,bp0,bp2,IN
>>>>> SERT_VALUES,SCATTER_REVERSE,ierr)
>>>>> >      call VecScatterEnd(ctr,bp0,bp2,INSE
>>>>> RT_VALUES,SCATTER_REVERSE,ierr)
>>>>> >      print*,"done! "
>>>>> >      CHKERRQ(ierr)
>>>>> >
>>>>> > endif
>>>>> >
>>>>> >    ! call VecScatterBegin(ctr,bp0,bp2,IN
>>>>> SERT_VALUES,SCATTER_REVERSE,ierr)
>>>>> >    !  call VecScatterEnd(ctr,bp0,bp2,INSE
>>>>> RT_VALUES,SCATTER_REVERSE,ierr)
>>>>> >
>>>>> >   call VecView(bp2,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>> >
>>>>> >   call PetscBarrier(PETSC_NULL_OBJECT,ierr)
>>>>> >
>>>>> >   call exit()
>>>>> >
>>>>> >
>>>>> >
>>>>> > And the output is: (with bp the right answer)
>>>>> >
>>>>> > Vec Object:bp: 2 MPI processes
>>>>> >   type: mpi
>>>>> > Process [0]
>>>>> > 1.
>>>>> > 2.
>>>>> > Process [1]
>>>>> > 4.
>>>>> > 3.
>>>>> > Vec Object:bp2: 2 MPI processes  (before scatter)
>>>>> >   type: mpi
>>>>> > Process [0]
>>>>> > 0.
>>>>> > 0.
>>>>> > Process [1]
>>>>> > 0.
>>>>> > 0.
>>>>> > Vec Object:bp0: 1 MPI processes
>>>>> >   type: seq
>>>>> > 1.
>>>>> > 2.
>>>>> > 4.
>>>>> > 3.
>>>>> >  done!
>>>>> > Vec Object:bp2: 2 MPI processes  (after scatter)
>>>>> >   type: mpi
>>>>> > Process [0]
>>>>> > 1.
>>>>> > 2.
>>>>> > Process [1]
>>>>> > 0.
>>>>> > 0.
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > Thanks inmensely for your help,
>>>>> >
>>>>> > Manuel
>>>>> >
>>>>> >
>>>>> > On Thu, Jan 5, 2017 at 4:39 PM, Barry Smith <[email protected]>
>>>>> wrote:
>>>>> >
>>>>> > > On Jan 5, 2017, at 6:21 PM, Manuel Valera <[email protected]>
>>>>> wrote:
>>>>> > >
>>>>> > > Hello Devs is me again,
>>>>> > >
>>>>> > > I'm trying to distribute a vector to all called processes, the
>>>>> vector would be originally in root as a sequential vector and i would like
>>>>> to scatter it, what would the best call to do this ?
>>>>> > >
>>>>> > > I already know how to gather a distributed vector to root with
>>>>> VecScatterCreateToZero, this would be the inverse operation,
>>>>> >
>>>>> >    Use the same VecScatter object but with SCATTER_REVERSE, not you
>>>>> need to reverse the two vector arguments as well.
>>>>> >
>>>>> >
>>>>> > > i'm currently trying with VecScatterCreate() and as of now im
>>>>> doing the following:
>>>>> > >
>>>>> > >
>>>>> > > if(rank==0)then
>>>>> > >
>>>>> > >
>>>>> > >      call VecCreate(PETSC_COMM_SELF,bp0,ierr); CHKERRQ(ierr) !if
>>>>> i use WORLD
>>>>> > >
>>>>> !freezes in SetSizes
>>>>> > >      call VecSetSizes(bp0,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr)
>>>>> > >      call VecSetType(bp0,VECSEQ,ierr)
>>>>> > >      call VecSetFromOptions(bp0,ierr); CHKERRQ(ierr)
>>>>> > >
>>>>> > >
>>>>> > >      call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr)
>>>>> > >
>>>>> > >      !call VecSet(bp0,5.0D0,ierr); CHKERRQ(ierr)
>>>>> > >
>>>>> > >
>>>>> > >      call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>>> > >
>>>>> > >      call VecAssemblyBegin(bp0,ierr) ; call
>>>>> VecAssemblyEnd(bp0,ierr) !rhs
>>>>> > >
>>>>> > >      do i=0,nbdp-1,1
>>>>> > >         ind(i+1) = i
>>>>> > >      enddo
>>>>> > >
>>>>> > >      call ISCreateGeneral(PETSC_COMM_SEL
>>>>> F,nbdp,ind,PETSC_COPY_VALUES,locis,ierr)
>>>>> > >
>>>>> > >     !call VecScatterCreate(bp0,PETSC_NULL_OBJECT,bp2,is,ctr,ierr)
>>>>> !if i use SELF
>>>>> > >
>>>>>  !freezes here.
>>>>> > >
>>>>> > >      call VecScatterCreate(bp0,locis,bp2
>>>>> ,PETSC_NULL_OBJECT,ctr,ierr)
>>>>> > >
>>>>> > > endif
>>>>> > >
>>>>> > > bp2 being the receptor MPI vector to scatter to
>>>>> > >
>>>>> > > But it freezes in VecScatterCreate when trying to use more than
>>>>> one processor, what would be a better approach ?
>>>>> > >
>>>>> > >
>>>>> > > Thanks once again,
>>>>> > >
>>>>> > > Manuel
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > > On Wed, Jan 4, 2017 at 3:30 PM, Manuel Valera <
>>>>> [email protected]> wrote:
>>>>> > > Thanks i had no idea how to debug and read those logs, that solved
>>>>> this issue at least (i was sending a message from root to everyone else,
>>>>> but trying to catch from everyone else including root)
>>>>> > >
>>>>> > > Until next time, many thanks,
>>>>> > >
>>>>> > > Manuel
>>>>> > >
>>>>> > > On Wed, Jan 4, 2017 at 3:23 PM, Matthew Knepley <[email protected]>
>>>>> wrote:
>>>>> > > On Wed, Jan 4, 2017 at 5:21 PM, Manuel Valera <
>>>>> [email protected]> wrote:
>>>>> > > I did a PetscBarrier just before calling the vicariate routine and
>>>>> im pretty sure im calling it from every processor, code looks like this:
>>>>> > >
>>>>> > > From the gdb trace.
>>>>> > >
>>>>> > >   Proc 0: Is in some MPI routine you call yourself, line 113
>>>>> > >
>>>>> > >   Proc 1: Is in VecCreate(), line 130
>>>>> > >
>>>>> > > You need to fix your communication code.
>>>>> > >
>>>>> > >    Matt
>>>>> > >
>>>>> > > call PetscBarrier(PETSC_NULL_OBJECT,ierr)
>>>>> > >
>>>>> > > print*,'entering POInit from',rank
>>>>> > > !call exit()
>>>>> > >
>>>>> > > call PetscObjsInit()
>>>>> > >
>>>>> > >
>>>>> > > And output gives:
>>>>> > >
>>>>> > >  entering POInit from           0
>>>>> > >  entering POInit from           1
>>>>> > >  entering POInit from           2
>>>>> > >  entering POInit from           3
>>>>> > >
>>>>> > >
>>>>> > > Still hangs in the same way,
>>>>> > >
>>>>> > > Thanks,
>>>>> > >
>>>>> > > Manuel
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > > On Wed, Jan 4, 2017 at 2:55 PM, Manuel Valera <
>>>>> [email protected]> wrote:
>>>>> > > Thanks for the answers !
>>>>> > >
>>>>> > > heres the screenshot of what i got from bt in gdb (great hint in
>>>>> how to debug in petsc, didn't know that)
>>>>> > >
>>>>> > > I don't really know what to look at here,
>>>>> > >
>>>>> > > Thanks,
>>>>> > >
>>>>> > > Manuel
>>>>> > >
>>>>> > > On Wed, Jan 4, 2017 at 2:39 PM, Dave May <[email protected]>
>>>>> wrote:
>>>>> > > Are you certain ALL ranks in PETSC_COMM_WORLD call these
>>>>> function(s). These functions cannot be inside if statements like
>>>>> > > if (rank == 0){
>>>>> > >   VecCreateMPI(...)
>>>>> > > }
>>>>> > >
>>>>> > >
>>>>> > > On Wed, 4 Jan 2017 at 23:34, Manuel Valera <[email protected]>
>>>>> wrote:
>>>>> > > Thanks Dave for the quick answer, appreciate it,
>>>>> > >
>>>>> > > I just tried that and it didn't make a difference, any other
>>>>> suggestions ?
>>>>> > >
>>>>> > > Thanks,
>>>>> > > Manuel
>>>>> > >
>>>>> > > On Wed, Jan 4, 2017 at 2:29 PM, Dave May <[email protected]>
>>>>> wrote:
>>>>> > > You need to swap the order of your function calls.
>>>>> > > Call VecSetSizes() before VecSetType()
>>>>> > >
>>>>> > > Thanks,
>>>>> > >   Dave
>>>>> > >
>>>>> > >
>>>>> > > On Wed, 4 Jan 2017 at 23:21, Manuel Valera <[email protected]>
>>>>> wrote:
>>>>> > > Hello all, happy new year,
>>>>> > >
>>>>> > > I'm working on parallelizing my code, it worked and provided some
>>>>> results when i just called more than one processor, but created artifacts
>>>>> because i didn't need one image of the whole program in each processor,
>>>>> conflicting with each other.
>>>>> > >
>>>>> > > Since the pressure solver is the main part i need in parallel im
>>>>> chosing mpi to run everything in root processor until its time to solve 
>>>>> for
>>>>> pressure, at this point im trying to create a distributed vector using
>>>>> either
>>>>> > >
>>>>> > >      call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nbdp,xp,ierr)
>>>>> > > or
>>>>> > >      call VecCreate(PETSC_COMM_WORLD,xp,ierr); CHKERRQ(ierr)
>>>>> > >      call VecSetType(xp,VECMPI,ierr)
>>>>> > >      call VecSetSizes(xp,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr)
>>>>> > >
>>>>> > >
>>>>> > > In both cases program hangs at this point, something it never
>>>>> happened on the naive way i described before. I've made sure the global
>>>>> size, nbdp, is the same in every processor. What can be wrong?
>>>>> > >
>>>>> > > Thanks for your kind help,
>>>>> > >
>>>>> > > Manuel.
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > >
>>>>> > > --
>>>>> > > What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> > > -- Norbert Wiener
>>>>> > >
>>>>> > >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

Re: [petsc-users] -log_view hangs unexpectedly // how to optimize my kspsolve

Reply via email to