On Sat, Jan 7, 2017 at 4:20 PM, Manuel Valera <[email protected]> wrote:
> Thank you Matthew, > > On Sat, Jan 7, 2017 at 1:49 PM, Matthew Knepley <[email protected]> wrote: > >> On Sat, Jan 7, 2017 at 3:32 PM, Manuel Valera <[email protected]> >> wrote: >> >>> Hi Devs, hope you are having a great weekend, >>> >>> I could finally parallelize my linear solver and implement it into the >>> rest of the code in a way that only the linear system is solved in >>> parallel, great news for my team, but there is a catch and is that i don't >>> see any speedup in the linear system, i don't know if its the MPI in the >>> cluster we are using, but im not sure on how to debug it, >>> >> >> We need to see -log_view output for any performance question. >> >> >>> On the other hand and because of this issue i was trying to do >>> -log_summary or -log_view and i noticed the program in this context hangs >>> when is time of producing the log, if i debug this for 2 cores, process 0 >>> exits normally but process 1 hangs in the vectorscatterbegin() with >>> scatter_reverse way back in the code, >>> >> >> You are calling a collective routine from only 1 process. >> >> > Matt >> > > I am pretty confident this is not the case, > This is still the simplest explanation. Can you send the stack trace for the 2 process run? > the callings to vecscattercreatetozero and vecscatterbegin are made in all > processes, the program goes thru all of the iterations on the linear > solver, writes output correctly and even closes all the petsc objects > without complaining, the freeze occurs at the very end when the log is to > be produced. > If you can send us a code to run, we can likely find the error. Thanks, Matt > Thanks, > > Manuel > > > >> >> >>> and even after destroying all associated objects and calling >>> petscfinalize(), so im really clueless on why is this, as it only happens >>> for -log_* or -ksp_view options. >>> >>> my -ksp_view shows this: >>> >>> KSP Object: 2 MPI processes >>> >>> type: gcr >>> >>> GCR: restart = 30 >>> >>> GCR: restarts performed = 20 >>> >>> maximum iterations=10000, initial guess is zero >>> >>> tolerances: relative=1e-14, absolute=1e-50, divergence=10000. >>> >>> right preconditioning >>> >>> using UNPRECONDITIONED norm type for convergence test >>> >>> PC Object: 2 MPI processes >>> >>> type: bjacobi >>> >>> block Jacobi: number of blocks = 2 >>> >>> Local solve is same for all blocks, in the following KSP and PC >>> objects: >>> >>> KSP Object: (sub_) 1 MPI processes >>> >>> type: preonly >>> >>> maximum iterations=10000, initial guess is zero >>> >>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >>> >>> left preconditioning >>> >>> using NONE norm type for convergence test >>> >>> PC Object: (sub_) 1 MPI processes >>> >>> type: ilu >>> >>> ILU: out-of-place factorization >>> >>> 0 levels of fill >>> >>> tolerance for zero pivot 2.22045e-14 >>> >>> matrix ordering: natural >>> >>> factor fill ratio given 1., needed 1. >>> >>> Factored matrix follows: >>> >>> Mat Object: 1 MPI processes >>> >>> type: seqaij >>> >>> rows=100000, cols=100000 >>> >>> package used to perform factorization: petsc >>> >>> total: nonzeros=1675180, allocated nonzeros=1675180 >>> >>> total number of mallocs used during MatSetValues calls =0 >>> >>> not using I-node routines >>> >>> linear system matrix = precond matrix: >>> >>> Mat Object: 1 MPI processes >>> >>> type: seqaij >>> >>> rows=100000, cols=100000 >>> >>> total: nonzeros=1675180, allocated nonzeros=1675180 >>> >>> total number of mallocs used during MatSetValues calls =0 >>> >>> not using I-node routines >>> >>> linear system matrix = precond matrix: >>> >>> Mat Object: 2 MPI processes >>> >>> type: mpiaij >>> >>> rows=200000, cols=200000 >>> >>> total: nonzeros=3373340, allocated nonzeros=3373340 >>> >>> total number of mallocs used during MatSetValues calls =0 >>> >>> not using I-node (on process 0) routines >>> >>> >>> >>> And i configured my PC object as: >>> >>> >>> call PCSetType(mg,PCHYPRE,ierr) >>> >>> call PCHYPRESetType(mg,'boomeramg',ierr) >>> >>> >>> call PetscOptionsSetValue(PETSC_NULL_OBJECT,'pc_hypre_boomeramg_n >>> odal_coarsen','1',ierr) >>> >>> call PetscOptionsSetValue(PETSC_NULL_OBJECT,'pc_hypre_boomeramg_v >>> ec_interp_variant','1',ierr) >>> >>> >>> >>> What are your thoughts ? >>> >>> Thanks, >>> >>> Manuel >>> >>> >>> >>> On Fri, Jan 6, 2017 at 1:58 PM, Manuel Valera <[email protected]> >>> wrote: >>> >>>> Awesome, that did it, thanks once again. >>>> >>>> >>>> On Fri, Jan 6, 2017 at 1:53 PM, Barry Smith <[email protected]> wrote: >>>> >>>>> >>>>> Take the scatter out of the if () since everyone does it and get >>>>> rid of the VecView(). >>>>> >>>>> Does this work? If not where is it hanging? >>>>> >>>>> >>>>> > On Jan 6, 2017, at 3:29 PM, Manuel Valera <[email protected]> >>>>> wrote: >>>>> > >>>>> > Thanks Dave, >>>>> > >>>>> > I think is interesting it never gave an error on this, after adding >>>>> the vecassembly calls it still shows the same behavior, without >>>>> complaining, i did: >>>>> > >>>>> > if(rankl==0)then >>>>> > >>>>> > call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr) >>>>> > call VecAssemblyBegin(bp0,ierr) ; call VecAssemblyEnd(bp0,ierr); >>>>> > CHKERRQ(ierr) >>>>> > >>>>> endif >>>>> > >>>>> > >>>>> > call VecScatterBegin(ctr,bp0,bp2,IN >>>>> SERT_VALUES,SCATTER_REVERSE,ierr) >>>>> > call VecScatterEnd(ctr,bp0,bp2,INSE >>>>> RT_VALUES,SCATTER_REVERSE,ierr) >>>>> > print*,"done! " >>>>> > CHKERRQ(ierr) >>>>> > >>>>> > >>>>> > CHKERRQ(ierr) >>>>> > >>>>> > >>>>> > Thanks. >>>>> > >>>>> > On Fri, Jan 6, 2017 at 12:44 PM, Dave May <[email protected]> >>>>> wrote: >>>>> > >>>>> > >>>>> > On 6 January 2017 at 20:24, Manuel Valera <[email protected]> >>>>> wrote: >>>>> > Great help Barry, i totally had overlooked that option (it is >>>>> explicit in the vecscatterbegin call help page but not in >>>>> vecscattercreatetozero, as i read later) >>>>> > >>>>> > So i used that and it works partially, it scatters te values >>>>> assigned in root but not the rest, if i call vecscatterbegin from outside >>>>> root it hangs, the code currently look as this: >>>>> > >>>>> > call VecScatterCreateToZero(bp2,ctr,bp0,ierr); CHKERRQ(ierr) >>>>> > >>>>> > call PetscObjectSetName(bp0, 'bp0:',ierr) >>>>> > >>>>> > if(rankl==0)then >>>>> > >>>>> > call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr) >>>>> > >>>>> > call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr) >>>>> > >>>>> > >>>>> > You need to call >>>>> > >>>>> > VecAssemblyBegin(bp0); >>>>> > VecAssemblyEnd(bp0); >>>>> > after your last call to VecSetValues() before you can do any >>>>> operations with bp0. >>>>> > >>>>> > With your current code, the call to VecView should produce an error >>>>> if you used the error checking macro CHKERRQ(ierr) (as should >>>>> VecScatter{Begin,End} >>>>> > >>>>> > Thanks, >>>>> > Dave >>>>> > >>>>> > >>>>> > call VecScatterBegin(ctr,bp0,bp2,IN >>>>> SERT_VALUES,SCATTER_REVERSE,ierr) >>>>> > call VecScatterEnd(ctr,bp0,bp2,INSE >>>>> RT_VALUES,SCATTER_REVERSE,ierr) >>>>> > print*,"done! " >>>>> > CHKERRQ(ierr) >>>>> > >>>>> > endif >>>>> > >>>>> > ! call VecScatterBegin(ctr,bp0,bp2,IN >>>>> SERT_VALUES,SCATTER_REVERSE,ierr) >>>>> > ! call VecScatterEnd(ctr,bp0,bp2,INSE >>>>> RT_VALUES,SCATTER_REVERSE,ierr) >>>>> > >>>>> > call VecView(bp2,PETSC_VIEWER_STDOUT_WORLD,ierr) >>>>> > >>>>> > call PetscBarrier(PETSC_NULL_OBJECT,ierr) >>>>> > >>>>> > call exit() >>>>> > >>>>> > >>>>> > >>>>> > And the output is: (with bp the right answer) >>>>> > >>>>> > Vec Object:bp: 2 MPI processes >>>>> > type: mpi >>>>> > Process [0] >>>>> > 1. >>>>> > 2. >>>>> > Process [1] >>>>> > 4. >>>>> > 3. >>>>> > Vec Object:bp2: 2 MPI processes (before scatter) >>>>> > type: mpi >>>>> > Process [0] >>>>> > 0. >>>>> > 0. >>>>> > Process [1] >>>>> > 0. >>>>> > 0. >>>>> > Vec Object:bp0: 1 MPI processes >>>>> > type: seq >>>>> > 1. >>>>> > 2. >>>>> > 4. >>>>> > 3. >>>>> > done! >>>>> > Vec Object:bp2: 2 MPI processes (after scatter) >>>>> > type: mpi >>>>> > Process [0] >>>>> > 1. >>>>> > 2. >>>>> > Process [1] >>>>> > 0. >>>>> > 0. >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > Thanks inmensely for your help, >>>>> > >>>>> > Manuel >>>>> > >>>>> > >>>>> > On Thu, Jan 5, 2017 at 4:39 PM, Barry Smith <[email protected]> >>>>> wrote: >>>>> > >>>>> > > On Jan 5, 2017, at 6:21 PM, Manuel Valera <[email protected]> >>>>> wrote: >>>>> > > >>>>> > > Hello Devs is me again, >>>>> > > >>>>> > > I'm trying to distribute a vector to all called processes, the >>>>> vector would be originally in root as a sequential vector and i would like >>>>> to scatter it, what would the best call to do this ? >>>>> > > >>>>> > > I already know how to gather a distributed vector to root with >>>>> VecScatterCreateToZero, this would be the inverse operation, >>>>> > >>>>> > Use the same VecScatter object but with SCATTER_REVERSE, not you >>>>> need to reverse the two vector arguments as well. >>>>> > >>>>> > >>>>> > > i'm currently trying with VecScatterCreate() and as of now im >>>>> doing the following: >>>>> > > >>>>> > > >>>>> > > if(rank==0)then >>>>> > > >>>>> > > >>>>> > > call VecCreate(PETSC_COMM_SELF,bp0,ierr); CHKERRQ(ierr) !if >>>>> i use WORLD >>>>> > > >>>>> !freezes in SetSizes >>>>> > > call VecSetSizes(bp0,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr) >>>>> > > call VecSetType(bp0,VECSEQ,ierr) >>>>> > > call VecSetFromOptions(bp0,ierr); CHKERRQ(ierr) >>>>> > > >>>>> > > >>>>> > > call VecSetValues(bp0,nbdp,ind,Rhs,INSERT_VALUES,ierr) >>>>> > > >>>>> > > !call VecSet(bp0,5.0D0,ierr); CHKERRQ(ierr) >>>>> > > >>>>> > > >>>>> > > call VecView(bp0,PETSC_VIEWER_STDOUT_WORLD,ierr) >>>>> > > >>>>> > > call VecAssemblyBegin(bp0,ierr) ; call >>>>> VecAssemblyEnd(bp0,ierr) !rhs >>>>> > > >>>>> > > do i=0,nbdp-1,1 >>>>> > > ind(i+1) = i >>>>> > > enddo >>>>> > > >>>>> > > call ISCreateGeneral(PETSC_COMM_SEL >>>>> F,nbdp,ind,PETSC_COPY_VALUES,locis,ierr) >>>>> > > >>>>> > > !call VecScatterCreate(bp0,PETSC_NULL_OBJECT,bp2,is,ctr,ierr) >>>>> !if i use SELF >>>>> > > >>>>> !freezes here. >>>>> > > >>>>> > > call VecScatterCreate(bp0,locis,bp2 >>>>> ,PETSC_NULL_OBJECT,ctr,ierr) >>>>> > > >>>>> > > endif >>>>> > > >>>>> > > bp2 being the receptor MPI vector to scatter to >>>>> > > >>>>> > > But it freezes in VecScatterCreate when trying to use more than >>>>> one processor, what would be a better approach ? >>>>> > > >>>>> > > >>>>> > > Thanks once again, >>>>> > > >>>>> > > Manuel >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > On Wed, Jan 4, 2017 at 3:30 PM, Manuel Valera < >>>>> [email protected]> wrote: >>>>> > > Thanks i had no idea how to debug and read those logs, that solved >>>>> this issue at least (i was sending a message from root to everyone else, >>>>> but trying to catch from everyone else including root) >>>>> > > >>>>> > > Until next time, many thanks, >>>>> > > >>>>> > > Manuel >>>>> > > >>>>> > > On Wed, Jan 4, 2017 at 3:23 PM, Matthew Knepley <[email protected]> >>>>> wrote: >>>>> > > On Wed, Jan 4, 2017 at 5:21 PM, Manuel Valera < >>>>> [email protected]> wrote: >>>>> > > I did a PetscBarrier just before calling the vicariate routine and >>>>> im pretty sure im calling it from every processor, code looks like this: >>>>> > > >>>>> > > From the gdb trace. >>>>> > > >>>>> > > Proc 0: Is in some MPI routine you call yourself, line 113 >>>>> > > >>>>> > > Proc 1: Is in VecCreate(), line 130 >>>>> > > >>>>> > > You need to fix your communication code. >>>>> > > >>>>> > > Matt >>>>> > > >>>>> > > call PetscBarrier(PETSC_NULL_OBJECT,ierr) >>>>> > > >>>>> > > print*,'entering POInit from',rank >>>>> > > !call exit() >>>>> > > >>>>> > > call PetscObjsInit() >>>>> > > >>>>> > > >>>>> > > And output gives: >>>>> > > >>>>> > > entering POInit from 0 >>>>> > > entering POInit from 1 >>>>> > > entering POInit from 2 >>>>> > > entering POInit from 3 >>>>> > > >>>>> > > >>>>> > > Still hangs in the same way, >>>>> > > >>>>> > > Thanks, >>>>> > > >>>>> > > Manuel >>>>> > > >>>>> > > >>>>> > > >>>>> > > On Wed, Jan 4, 2017 at 2:55 PM, Manuel Valera < >>>>> [email protected]> wrote: >>>>> > > Thanks for the answers ! >>>>> > > >>>>> > > heres the screenshot of what i got from bt in gdb (great hint in >>>>> how to debug in petsc, didn't know that) >>>>> > > >>>>> > > I don't really know what to look at here, >>>>> > > >>>>> > > Thanks, >>>>> > > >>>>> > > Manuel >>>>> > > >>>>> > > On Wed, Jan 4, 2017 at 2:39 PM, Dave May <[email protected]> >>>>> wrote: >>>>> > > Are you certain ALL ranks in PETSC_COMM_WORLD call these >>>>> function(s). These functions cannot be inside if statements like >>>>> > > if (rank == 0){ >>>>> > > VecCreateMPI(...) >>>>> > > } >>>>> > > >>>>> > > >>>>> > > On Wed, 4 Jan 2017 at 23:34, Manuel Valera <[email protected]> >>>>> wrote: >>>>> > > Thanks Dave for the quick answer, appreciate it, >>>>> > > >>>>> > > I just tried that and it didn't make a difference, any other >>>>> suggestions ? >>>>> > > >>>>> > > Thanks, >>>>> > > Manuel >>>>> > > >>>>> > > On Wed, Jan 4, 2017 at 2:29 PM, Dave May <[email protected]> >>>>> wrote: >>>>> > > You need to swap the order of your function calls. >>>>> > > Call VecSetSizes() before VecSetType() >>>>> > > >>>>> > > Thanks, >>>>> > > Dave >>>>> > > >>>>> > > >>>>> > > On Wed, 4 Jan 2017 at 23:21, Manuel Valera <[email protected]> >>>>> wrote: >>>>> > > Hello all, happy new year, >>>>> > > >>>>> > > I'm working on parallelizing my code, it worked and provided some >>>>> results when i just called more than one processor, but created artifacts >>>>> because i didn't need one image of the whole program in each processor, >>>>> conflicting with each other. >>>>> > > >>>>> > > Since the pressure solver is the main part i need in parallel im >>>>> chosing mpi to run everything in root processor until its time to solve >>>>> for >>>>> pressure, at this point im trying to create a distributed vector using >>>>> either >>>>> > > >>>>> > > call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nbdp,xp,ierr) >>>>> > > or >>>>> > > call VecCreate(PETSC_COMM_WORLD,xp,ierr); CHKERRQ(ierr) >>>>> > > call VecSetType(xp,VECMPI,ierr) >>>>> > > call VecSetSizes(xp,PETSC_DECIDE,nbdp,ierr); CHKERRQ(ierr) >>>>> > > >>>>> > > >>>>> > > In both cases program hangs at this point, something it never >>>>> happened on the naive way i described before. I've made sure the global >>>>> size, nbdp, is the same in every processor. What can be wrong? >>>>> > > >>>>> > > Thanks for your kind help, >>>>> > > >>>>> > > Manuel. >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > -- >>>>> > > What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> > > -- Norbert Wiener >>>>> > > >>>>> > > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> >>>>> >>>> >>> >> >> >> -- >> What most experimenters take for granted before they begin their >> experiments is infinitely more interesting than any results to which their >> experiments lead. >> -- Norbert Wiener >> > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener
