On Sat, Jan 7, 2017 at 3:32 PM, Manuel Valera <[email protected]> wrote:
> Hi Devs, hope you are having a great weekend, > > I could finally parallelize my linear solver and implement it into the > rest of the code in a way that only the linear system is solved in > parallel, great news for my team, but there is a catch and is that i don't > see any speedup in the linear system, i don't know if its the MPI in the > cluster we are using, but im not sure on how to debug it, > We need to see -log_view output for any performance question. > On the other hand and because of this issue i was trying to do > -log_summary or -log_view and i noticed the program in this context hangs > when is time of producing the log, if i debug this for 2 cores, process 0 > exits normally but process 1 hangs in the vectorscatterbegin() with > scatter_reverse way back in the code, > You are calling a collective routine from only 1 process. Matt > and even after destroying all associated objects and calling > petscfinalize(), so im really clueless on why is this, as it only happens > for -log_* or -ksp_view options. > > my -ksp_view shows this: > > KSP Object: 2 MPI processes > > type: gcr > > GCR: restart = 30 > > GCR: restarts performed = 20 > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-14, absolute=1e-50, divergence=10000. > > right preconditioning > > using UNPRECONDITIONED norm type for convergence test > > PC Object: 2 MPI processes > > type: bjacobi > > block Jacobi: number of blocks = 2 > > Local solve is same for all blocks, in the following KSP and PC > objects: > > KSP Object: (sub_) 1 MPI processes > > type: preonly > > maximum iterations=10000, initial guess is zero > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > left preconditioning > > using NONE norm type for convergence test > > PC Object: (sub_) 1 MPI processes > > type: ilu > > ILU: out-of-place factorization > > 0 levels of fill > > tolerance for zero pivot 2.22045e-14 > > matrix ordering: natural > > factor fill ratio given 1., needed 1. > > Factored matrix follows: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=100000, cols=100000 > > package used to perform factorization: petsc > > total: nonzeros=1675180, allocated nonzeros=1675180 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 1 MPI processes > > type: seqaij > > rows=100000, cols=100000 > > total: nonzeros=1675180, allocated nonzeros=1675180 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node routines > > linear system matrix = precond matrix: > > Mat Object: 2 MPI processes > > type: mpiaij > > rows=200000, cols=200000 > > total: nonzeros=3373340, allocated nonzeros=3373340 > > total number of mallocs used during MatSetValues calls =0 > > not using I-node (on process 0) routines > > > > And i configured my PC object as: > > > call PCSetType(mg,PCHYPRE,ierr) > > call PCHYPRESetType(mg,'boomeramg',ierr) > > > call PetscOptionsSetValue(PETSC_NULL_OBJECT,'pc_hypre_ > boomeramg_nodal_coarsen','1',ierr) > > call PetscOptionsSetValue(PETSC_NULL_OBJECT,'pc_hypre_ > boomeramg_vec_interp_variant','1',ierr) > > > > What are your thoughts ? > > Thanks, > > Manuel > > > > On Fri, Jan 6, 2017 at 1:58 PM, Manuel Valera <[email protected]> > wrote: > >> Awesome, that did it, thanks once again. >> >> >> On Fri, Jan 6, 2017 at 1:53 PM, Barry Smith <[email protected]> wrote: >> >>> >>> Take the scatter out of the if () since everyone does it and get rid >>> of the VecView(). >>> >>> Does this work? If not where is it hanging? >>> >>> >>> > On Jan 6, 2017, at 3:29 PM, Manuel Valera <[email protected]> >>> wrote: >>> > >>> > Thanks Dave, >>> > >>> > I think is interesting it never gave an error on this, after adding >>> the vecassembly calls it still shows the same behavior, without >>> complaining, i did: >>> > >>> > if(rankl==0)then >>> > >>> > call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr) >>> > call VecAssemblyBegin(bp0,ierr) ; call VecAssemblyEnd(bp0,ierr); >>> > CHKERRQ(ierr) >>> > >>> endif >>> > >>> > >>> > call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ie >>> rr) >>> > call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr >>> ) >>> > print*,"done! " >>> > CHKERRQ(ierr) >>> > >>> > >>> > CHKERRQ(ierr) >>> > >>> > >>> > Thanks. >>> > >>> > On Fri, Jan 6, 2017 at 12:44 PM, Dave May <[email protected]> >>> wrote: >>> > >>> > >>> > On 6 January 2017 at 20:24, Manuel Valera <[email protected]> >>> wrote: >>> > Great help Barry, i totally had overlooked that option (it is explicit >>> in the vecscatterbegin call help page but not in vecscattercreatetozero, as >>> i read later) >>> > >>> > So i used that and it works partially, it scatters te values assigned >>> in root but not the rest, if i call vecscatterbegin from outside root it >>> hangs, the code currently look as this: >>> > >>> > call VecScatterCreateToZero(bp2,ctr,bp0,ierr); CHKERRQ(ierr) >>> > >>> > call PetscObjectSetName(bp0, 'bp0:',ierr) >>> > >>> > if(rankl==0)then >>> > >>> > call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr) >>> > >>> > call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr) >>> > >>> > >>> > You need to call >>> > >>> > VecAssemblyBegin(bp0); >>> > VecAssemblyEnd(bp0); >>> > after your last call to VecSetValues() before you can do any >>> operations with bp0. >>> > >>> > With your current code, the call to VecView should produce an error if >>> you used the error checking macro CHKERRQ(ierr) (as should >>> VecScatter{Begin,End} >>> > >>> > Thanks, >>> > Dave >>> > >>> > >>> > call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ie >>> rr) >>> > call VecScatterEnd(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ierr >>> ) >>> > print*,"done! " >>> > CHKERRQ(ierr) >>> > >>> > endif >>> > >>> > ! call VecScatterBegin(ctr,bp0,bp2,INSERT_VALUES,SCATTER_REVERSE,ie >>> rr) >>> > ! call VecScatterEnd(ctr,bp0,bp2,INSE >>> RT_VALUES,SCATTER_REVERSE,ierr) >>> > >>> > call VecView(bp2,PETSC_VIEWER_STDOUT_WORLD,ierr) >>> > >>> > call PetscBarrier(PETSC_NULL_OBJECT,ierr) >>> > >>> > call exit() >>> > >>> > >>> > >>> > And the output is: (with bp the right answer) >>> > >>> > Vec Object:bp: 2 MPI processes >>> > type: mpi >>> > Process [0] >>> > 1. >>> > 2. >>> > Process [1] >>> > 4. >>> > 3. >>> > Vec Object:bp2: 2 MPI processes (before scatter) >>> > type: mpi >>> > Process [0] >>> > 0. >>> > 0. >>> > Process [1] >>> > 0. >>> > 0. >>> > Vec Object:bp0: 1 MPI processes >>> > type: seq >>> > 1. >>> > 2. >>> > 4. >>> > 3. >>> > done! >>> > Vec Object:bp2: 2 MPI processes (after scatter) >>> > type: mpi >>> > Process [0] >>> > 1. >>> > 2. >>> > Process [1] >>> > 0. >>> > 0. >>> > >>> > >>> > >>> > >>> > Thanks inmensely for your help, >>> > >>> > Manuel >>> > >>> > >>> > On Thu, Jan 5, 2017 at 4:39 PM, Barry Smith <[email protected]> >>> wrote: >>> > >>> > > On Jan 5, 2017, at 6:21 PM, Manuel Valera <[email protected]> >>> wrote: >>> > > >>> > > Hello Devs is me again, >>> > > >>> > > I'm trying to distribute a vector to all called processes, the >>> vector would be originally in root as a sequential vector and i would like >>> to scatter it, what would the best call to do this ? >>> > > >>> > > I already know how to gather a distributed vector to root with >>> VecScatterCreateToZero, this would be the inverse operation, >>> > >>> > Use the same VecScatter object but with SCATTER_REVERSE, not you >>> need to reverse the two vector arguments as well. >>> > >>> > >>> > > i'm currently trying with VecScatterCreate() and as of now im doing >>> the following: >>> > > >>> > > >>> > > if(rank==0)then >>> > > >>> > > >>> > > call VecCreate(PETSC_COMM_SELF,bp0,ierr); CHKERRQ(ierr) !if i >>> use WORLD >>> > > >>> !freezes in SetSizes >>> > > call VecSetSizes(bp0,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr) >>> > > call VecSetType(bp0,VECSEQ,ierr) >>> > > call VecSetFromOptions(bp0,ierr); CHKERRQ(ierr) >>> > > >>> > > >>> > > call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr) >>> > > >>> > > !call VecSet(bp0,5.0D0,ierr); CHKERRQ(ierr) >>> > > >>> > > >>> > > call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr) >>> > > >>> > > call VecAssemblyBegin(bp0,ierr) ; call VecAssemblyEnd(bp0,ierr) >>> !rhs >>> > > >>> > > do i=0,nbdp-1,1 >>> > > ind(i+1) = i >>> > > enddo >>> > > >>> > > call ISCreateGeneral(PETSC_COMM_SEL >>> F,nbdp,ind,PETSC_COPY_VALUES,locis,ierr) >>> > > >>> > > !call VecScatterCreate(bp0,PETSC_NULL_OBJECT,bp2,is,ctr,ierr) >>> !if i use SELF >>> > > >>> !freezes here. >>> > > >>> > > call VecScatterCreate(bp0,locis,bp2,PETSC_NULL_OBJECT,ctr,ierr) >>> > > >>> > > endif >>> > > >>> > > bp2 being the receptor MPI vector to scatter to >>> > > >>> > > But it freezes in VecScatterCreate when trying to use more than one >>> processor, what would be a better approach ? >>> > > >>> > > >>> > > Thanks once again, >>> > > >>> > > Manuel >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > On Wed, Jan 4, 2017 at 3:30 PM, Manuel Valera <[email protected]> >>> wrote: >>> > > Thanks i had no idea how to debug and read those logs, that solved >>> this issue at least (i was sending a message from root to everyone else, >>> but trying to catch from everyone else including root) >>> > > >>> > > Until next time, many thanks, >>> > > >>> > > Manuel >>> > > >>> > > On Wed, Jan 4, 2017 at 3:23 PM, Matthew Knepley <[email protected]> >>> wrote: >>> > > On Wed, Jan 4, 2017 at 5:21 PM, Manuel Valera <[email protected]> >>> wrote: >>> > > I did a PetscBarrier just before calling the vicariate routine and >>> im pretty sure im calling it from every processor, code looks like this: >>> > > >>> > > From the gdb trace. >>> > > >>> > > Proc 0: Is in some MPI routine you call yourself, line 113 >>> > > >>> > > Proc 1: Is in VecCreate(), line 130 >>> > > >>> > > You need to fix your communication code. >>> > > >>> > > Matt >>> > > >>> > > call PetscBarrier(PETSC_NULL_OBJECT,ierr) >>> > > >>> > > print*,'entering POInit from',rank >>> > > !call exit() >>> > > >>> > > call PetscObjsInit() >>> > > >>> > > >>> > > And output gives: >>> > > >>> > > entering POInit from 0 >>> > > entering POInit from 1 >>> > > entering POInit from 2 >>> > > entering POInit from 3 >>> > > >>> > > >>> > > Still hangs in the same way, >>> > > >>> > > Thanks, >>> > > >>> > > Manuel >>> > > >>> > > >>> > > >>> > > On Wed, Jan 4, 2017 at 2:55 PM, Manuel Valera <[email protected]> >>> wrote: >>> > > Thanks for the answers ! >>> > > >>> > > heres the screenshot of what i got from bt in gdb (great hint in how >>> to debug in petsc, didn't know that) >>> > > >>> > > I don't really know what to look at here, >>> > > >>> > > Thanks, >>> > > >>> > > Manuel >>> > > >>> > > On Wed, Jan 4, 2017 at 2:39 PM, Dave May <[email protected]> >>> wrote: >>> > > Are you certain ALL ranks in PETSC_COMM_WORLD call these >>> function(s). These functions cannot be inside if statements like >>> > > if (rank == 0){ >>> > > VecCreateMPI(...) >>> > > } >>> > > >>> > > >>> > > On Wed, 4 Jan 2017 at 23:34, Manuel Valera <[email protected]> >>> wrote: >>> > > Thanks Dave for the quick answer, appreciate it, >>> > > >>> > > I just tried that and it didn't make a difference, any other >>> suggestions ? >>> > > >>> > > Thanks, >>> > > Manuel >>> > > >>> > > On Wed, Jan 4, 2017 at 2:29 PM, Dave May <[email protected]> >>> wrote: >>> > > You need to swap the order of your function calls. >>> > > Call VecSetSizes() before VecSetType() >>> > > >>> > > Thanks, >>> > > Dave >>> > > >>> > > >>> > > On Wed, 4 Jan 2017 at 23:21, Manuel Valera <[email protected]> >>> wrote: >>> > > Hello all, happy new year, >>> > > >>> > > I'm working on parallelizing my code, it worked and provided some >>> results when i just called more than one processor, but created artifacts >>> because i didn't need one image of the whole program in each processor, >>> conflicting with each other. >>> > > >>> > > Since the pressure solver is the main part i need in parallel im >>> chosing mpi to run everything in root processor until its time to solve for >>> pressure, at this point im trying to create a distributed vector using >>> either >>> > > >>> > > call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nbdp,xp,ierr) >>> > > or >>> > > call VecCreate(PETSC_COMM_WORLD,xp,ierr); CHKERRQ(ierr) >>> > > call VecSetType(xp,VECMPI,ierr) >>> > > call VecSetSizes(xp,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr) >>> > > >>> > > >>> > > In both cases program hangs at this point, something it never >>> happened on the naive way i described before. I've made sure the global >>> size, nbdp, is the same in every processor. What can be wrong? >>> > > >>> > > Thanks for your kind help, >>> > > >>> > > Manuel. >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > -- >>> > > What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> > > -- Norbert Wiener >>> > > >>> > > >>> > >>> > >>> > >>> > >>> >>> >> > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener
