Re: [petsc-users] A number of questions about DMDA with SNES and Quasi-Newton methods
I worked on the assumptions in my previous email and I at least partially implemented the function to assign the couplings. For row 0, which is the redundant field, I set dnz[0] to end-start, and onz[0] to the size of the matrix minus dnz[0]. For all other rows, I just increment the existing values of dnz[i] and onz[i], since the coupling to the redundant field adds one extra element beyond what's allocated for the DMDA stencil. I see in the source that the FormCoupleLocations function is called once if the DM has PREALLOC_ONLY set to true, but twice otherwise. I assume that the second call is for setting the nonzero structure. Do I need to do this? In any case, something is still not right. Even with the extra elements preallocated, the first assembly of the matrix is very slow. I ran a test problem on a single process with -info, and got this: 0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 unneeded,1 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode routines [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 unneeded,629703 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using Inode routines [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 unneeded,1 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode routines [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 unneeded,629703 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using Inode routines [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 18018 unneeded,629704 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using Inode routines [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446774208 30818112 [0] DMGetDMSNES(): Creating new DMSNES [0] DMGetDMKSP(): Creating new DMKSP [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 0 SNES Function norm 2.302381528359e+00 [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064 [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 126132 unneeded,647722 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 9610 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines. [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using Inode routines [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 0 unneeded,647722 used [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0 [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010 [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines. The 9610 mallocs during MatSetValues seem
Re: [petsc-users] unsorted local columns in 3.8?
Doing some additional testing, the issue goes away when removing the gamg preconditioner line from the petsc.rc: -pc_type gamg On Wed, Nov 1, 2017 at 8:23 PM, Hongwrote: > Randy: > Thanks, I'll check it tomorrow. > Hong > > OK, this might not be completely satisfactory, because it doesn't show the >> partitioning or how the matrix is created, but this reproduces the problem. >> I wrote out my matrix, Amat, from the larger simulation, and load it in >> this script. This must be run with MPI rank greater than 1. This may be >> some combination of my petsc.rc, because when I use the PetscInitialize >> with it, it throws the error, but when using default (PETSC_NULL_CHARACTER) >> it runs fine. >> >> >> On Tue, Oct 31, 2017 at 9:58 AM, Hong wrote: >> >>> Randy: >>> It could be a bug or a missing feature in our new >>> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >>> It would be helpful if you can provide us a simple example that produces >>> this example. >>> Hong >>> >>> I'm running a Fortran code that was just changed over to using petsc 3.8 (previously petsc 3.7.6). An error was thrown during a KSPSetUp() call. The error is "unsorted iscol_local is not implemented yet" (see full error below). I tried to trace down the difference in the source files, but where the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) doesn't seem to have existed in v3.7.6, so I'm unsure how to compare. It seems the error is that the order of the columns locally are unsorted, though I don't think I specify a column order in the creation of the matrix: call MatCreate(this%comm,AA,ierr) call MatSetSizes(AA,npetscloc,npetscloc,nreal,nreal,ierr) call MatSetType(AA,MATAIJ,ierr) call MatSetup(AA,ierr) call MatGetOwnershipRange(AA,low,high,ierr) allocate(d_nnz(npetscloc),o_nnz(npetscloc)) call getNNZ(grid,npetscloc,low,high,d_nnz,o_nnz,this%xgc_petsc,nr eal,ierr) call MatSeqAIJSetPreallocation(AA,PETSC_NULL_INTEGER,d_nnz,ierr) call MatMPIAIJSetPreallocation(AA,PETSC_NULL_INTEGER,d_nnz,PETSC_ NULL_INTEGER,o_nnz,ierr) deallocate(d_nnz,o_nnz) call MatSetOption(AA,MAT_IGNORE_OFF_PROC_ENTRIES,PETSC_TRUE,ierr) call MatSetOption(AA,MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ierr) call MatSetup(AA,ierr) [62]PETSC ERROR: - Error Message -- [62]PETSC ERROR: No support for this operation for this object type [62]PETSC ERROR: unsorted iscol_local is not implemented yet [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/d ocumentation/faq.html for trouble shooting. [62]PETSC ERROR: Petsc Release Version 3.8.0, unknown[62]PETSC ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() line 3418 in /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c [62]PETSC ERROR: #6 PCSetUp() line 924 in /global/u1/r/rchurchi/petsc/3. 8.0/src/ksp/pc/interface/precon.c [62]PETSC ERROR: #7 KSPSetUp() line 378 in /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/ksp/interface/itfunc.c -- R. Michael Churchill >>> >>> >> >> >> -- >> R. Michael Churchill >> > > -- R. Michael Churchill
Re: [petsc-users] unsorted local columns in 3.8?
Randy: Thanks, I'll check it tomorrow. Hong > OK, this might not be completely satisfactory, because it doesn't show the > partitioning or how the matrix is created, but this reproduces the problem. > I wrote out my matrix, Amat, from the larger simulation, and load it in > this script. This must be run with MPI rank greater than 1. This may be > some combination of my petsc.rc, because when I use the PetscInitialize > with it, it throws the error, but when using default (PETSC_NULL_CHARACTER) > it runs fine. > > > On Tue, Oct 31, 2017 at 9:58 AM, Hongwrote: > >> Randy: >> It could be a bug or a missing feature in our new >> MatCreateSubMatrix_MPIAIJ_SameRowDist(). >> It would be helpful if you can provide us a simple example that produces >> this example. >> Hong >> >> I'm running a Fortran code that was just changed over to using petsc 3.8 >>> (previously petsc 3.7.6). An error was thrown during a KSPSetUp() call. The >>> error is "unsorted iscol_local is not implemented yet" (see full error >>> below). I tried to trace down the difference in the source files, but where >>> the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) doesn't seem >>> to have existed in v3.7.6, so I'm unsure how to compare. It seems the error >>> is that the order of the columns locally are unsorted, though I don't think >>> I specify a column order in the creation of the matrix: >>> call MatCreate(this%comm,AA,ierr) >>> call MatSetSizes(AA,npetscloc,npetscloc,nreal,nreal,ierr) >>> call MatSetType(AA,MATAIJ,ierr) >>> call MatSetup(AA,ierr) >>> call MatGetOwnershipRange(AA,low,high,ierr) >>> allocate(d_nnz(npetscloc),o_nnz(npetscloc)) >>> call getNNZ(grid,npetscloc,low,high,d_nnz,o_nnz,this%xgc_petsc,nr >>> eal,ierr) >>> call MatSeqAIJSetPreallocation(AA,PETSC_NULL_INTEGER,d_nnz,ierr) >>> call MatMPIAIJSetPreallocation(AA,PETSC_NULL_INTEGER,d_nnz,PETSC_ >>> NULL_INTEGER,o_nnz,ierr) >>> deallocate(d_nnz,o_nnz) >>> call MatSetOption(AA,MAT_IGNORE_OFF_PROC_ENTRIES,PETSC_TRUE,ierr) >>> call MatSetOption(AA,MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ierr) >>> call MatSetup(AA,ierr) >>> >>> >>> [62]PETSC ERROR: - Error Message >>> -- >>> [62]PETSC ERROR: No support for this operation for this object type >>> [62]PETSC ERROR: unsorted iscol_local is not implemented yet >>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html >>> for trouble shooting. >>> [62]PETSC ERROR: Petsc Release Version 3.8.0, unknown[62]PETSC ERROR: #1 >>> MatCreateSubMatrix_MPIAIJ_SameRowDist() line 3418 in >>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in >>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c >>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in >>> /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c >>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in >>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in >>> /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c >>> [62]PETSC ERROR: #6 PCSetUp() line 924 in /global/u1/r/rchurchi/petsc/3. >>> 8.0/src/ksp/pc/interface/precon.c >>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in /global/u1/r/rchurchi/petsc/3. >>> 8.0/src/ksp/ksp/interface/itfunc.c >>> >>> -- >>> R. Michael Churchill >>> >> >> > > > -- > R. Michael Churchill >
Re: [petsc-users] GAMG advice
Thanks Barry. By simply replacing chebychev by richardson I get similar performance with GAMG and ML (GAMG even slightly faster): -pc_type gamg -pc_gamg_type agg -pc_gamg_threshold 0.03 -pc_gamg_square_graph 10 -pc_gamg_sym_graph -mg_levels_ksp_type richardson -mg_levels_pc_type sor Is it still true that I need to set "-pc_gamg_sym_graph" if the matrix is asymmetric? For serial runs it doesn't seem to matter, but in parallel the PC setup hangs (after calls of PCGAMGFilterGraph()) if -pc_gamg_sym_graph is not set. David On 10/21/2017 12:10 AM, Barry Smith wrote: > David, > >GAMG picks the number of levels based on how the coarsening process etc > proceeds. You cannot hardwire it to a particular value. You can run with > -info to get more info potentially on the decisions GAMG is making. > > Barry > >> On Oct 20, 2017, at 2:06 PM, David Noltewrote: >> >> PS: I didn't realize at first, it looks as if the -pc_mg_levels 3 option >> was not taken into account: >> type: gamg >> MG: type is MULTIPLICATIVE, levels=1 cycles=v >> >> >> >> On 10/20/2017 03:32 PM, David Nolte wrote: >>> Dear all, >>> >>> I have some problems using GAMG as a preconditioner for (F)GMRES. >>> Background: I am solving the incompressible, unsteady Navier-Stokes >>> equations with a coupled mixed FEM approach, using P1/P1 elements for >>> velocity and pressure on an unstructured tetrahedron mesh with about >>> 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG, >>> hence, no zeros on the diagonal of the pressure block. Time >>> discretization with semi-implicit backward Euler. The flow is a >>> convection dominated flow through a nozzle. >>> >>> So far, for this setup, I have been quite happy with a simple FGMRES/ML >>> solver for the full system (rather bruteforce, I admit, but much faster >>> than any block/Schur preconditioners I tried): >>> >>> -ksp_converged_reason >>> -ksp_monitor_true_residual >>> -ksp_type fgmres >>> -ksp_rtol 1.0e-6 >>> -ksp_initial_guess_nonzero >>> >>> -pc_type ml >>> -pc_ml_Threshold 0.03 >>> -pc_ml_maxNlevels 3 >>> >>> This setup converges in ~100 iterations (see below the ksp_view output) >>> to rtol: >>> >>> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm >>> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06 >>> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm >>> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06 >>> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm >>> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06 >>> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm >>> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07 >>> >>> >>> Now I'd like to try GAMG instead of ML. However, I don't know how to set >>> it up to get similar performance. >>> The obvious/naive >>> >>> -pc_type gamg >>> -pc_gamg_type agg >>> >>> # with and without >>> -pc_gamg_threshold 0.03 >>> -pc_mg_levels 3 >>> >>> converges very slowly on 1 proc and much worse on 8 (~200k dofs per >>> proc), for instance: >>> np = 1: >>> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm >>> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04 >>> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm >>> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04 >>> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm >>> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04 >>> >>> np = 8: >>> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm >>> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03 >>> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 >>> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03 >>> >>> A very high threshold seems to improve the GAMG PC, for instance with >>> 0.75 I get convergence to rtol=1e-6 after 744 iterations. >>> What else should I try? >>> >>> I would very much appreciate any advice on configuring GAMG and >>> differences w.r.t ML to be taken into account (not a multigrid expert >>> though). >>> >>> Thanks, best wishes >>> David >>> >>> >>> -- >>> ksp_view for -pc_type gamg -pc_gamg_threshold 0.75
[petsc-users] Problem with encapsulation of PETSc/SLEPc in Fortran
Dear PETSc/SLEPc users, I am encountering a problem when I try to isolate my calls to PETSc/SLEPc routines in a module. When I have a single file everything works fine, but when I have, say: - modA.f90 (Independant modules) - modB.F90 (Contains all the calls to PETSc and SLEPc) - main.f90 or main.F90 When calling EPSSolve, I keep having the error "[1]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero" plus "User provided function() line 0 in unknown file." With gdb: "Thread 1 "main" received signal SIGFPE, Arithmetic exception. 0x71662cf6 in dlamch_ () from /home/linuxbrew/.linuxbrew/lib/libopenblas.so.0" The PETSc/SLEPc module looks like USE SlepcEps IMPLICIT NONE #include which is the only preprocessed directive used. I tried to add more (petscsys, petscmat, petscvec, slepseps), tried to change main.f90 to main.F90 and incorporate the preprocessed directives there as well without any effect. My code calls CHKERRQ(ierr) systematically. My makefile looks like include ${SLEPC_DIR}/lib/slepc/conf/slepc_common INCL = -I$(PETSC_DIR)/include/ -I$(SLEPC_DIR)/include/ %.o: %.f90 $(FC) $(FLAGS) $(INCL) -c $< -o $@ $(SLEPC_EPS_LIB) %.o: %.F90 $(FC) $(FLAGS) $(INCL) -c $< -o $@ $(SLEPC_EPS_LIB) $(EXEC): $(OBJS) $(FC) $(FLAGS) $(INCL) $(OBJS) -o $@ $(SLEPC_EPS_LIB) Could you spot anything I am doing wrong or dangerous? Furthermore, do you know how to avoid the warnings "Same actual argument associated with INTENT(IN) argument 'errorcode' and INTENT(OUT) argument 'ierror' at (1)" when calling CHKERRQ(ierr) when ierr is declared as INTEGER? Thanks in advance for your continued support, Thibaut
Re: [petsc-users] petsc4py sparse matrix construction time
Should add: To make sure your matrix assembly is efficient - you should run the code with the option -info, and make sure there are no mallocs during assembly. I'm not sure how petsc4py processes options. One way is to set such options with PETSC_OPTIONS env variable - and then run the code. Satish On Wed, 1 Nov 2017, Matthew Knepley wrote: > On Tue, Oct 31, 2017 at 10:00 PM, Cetinbas, Cankur Firat> wrote: > > > Hi, > > > > Thanks a lot. Based on both of your suggestions I modified the code using > > Mat.createAIJ() and csr option. The computation time decreased > > significantly after using this method. Still if there is a better option > > please let me know after seeing the modified code below. > > > > At first trial with 1000x1000 matrix with 96019 non-zeros in the matrix, > > the computation time did not scale with the number of cores : Single core > > python @ 0.0035s, single core petsc @ 0.0024s, 2 cores petsc @ 0.0036s, 4 > > cores petsc @ 0.0032, 8 cores petsc @ 0.0030s. > > > > Then I tried with larger matrix 181797x181797 with more non-zeros and I > > got the following results: Single core python @ 0.021, single core petsc @ > > 0.031, 2 cores petsc @ 0.024s, 4 cores petsc @ 0.014, 8 cores petsc @ > > 0.009s, 16 cores petsc @ 0.0087s. > > > > I think the optimum number of nodes is highly dependent on matrix size and > > the number of non-zeros. In the real code matrix size (and so the number of > > non-zero elements) will grow at every iteration starting with very small > > matrices growing to very big ones. Is it possible to set the number process > > from the code dynamically? > > > > I am not sure how you are interpreting these measurements. Normally, I > would say > > 1) Everything timed below is "parallel overhead". This is not intended to > scale with P, instead it will look like a constant, as you observe > > 2) The time to compute the matrix entires should far outstrip the time > below to figure the nonzero structure. Is this true? > > 3) Solve time is often larger than matrix calculation. Is it? > > Thus, we deciding on parallelism, you need to look at the largest costs, > and how they scale with P. > > Thanks, > > Matt > > Another question is about the data types; mpi4py only let me transfer float > > type data, and petsc4py only lets me use int32 type indices. Besides keep > > converting the data, is there any solution for this? > > > > The modified code for matrix creation part: > > > > comm = MPI.COMM_WORLD > > rank = comm.Get_rank() > > size = comm.Get_size() > > > > if rank==0: > > row = np.loadtxt('row1000.out').astype(dtype='int32') > > col = np.loadtxt('col1000.out').astype(dtype='int32') > > val = np.loadtxt('val1000.out').astype(dtype='int32') > > m = 1000# 1000 x 1000 matrix > > if size>1: > > rbc = np.bincount(row)*1.0 > > ieq = int(np.floor(m/size)) > > a = [ieq]*size > > ix = int(np.mod(m,size)) > > if ix>0: > > for i in range(0,ix): > > a[i]= a[i]+1 > > a = np.array([0]+a).cumsum() > > b = np.zeros(a.shape[0]-1) > > for i in range(0,a.shape[0]-1): > > b[i]=rbc[a[i]:a[i+1]].sum() # b is the send counts for > > Scatterv > > row = row.astype(dtype=float) > > col = col.astype(dtype=float) > > val = val.astype(dtype=float) > > else: > > row=None > > col=None > > val=None > > indpt=None > > b=None > > m=None > > > > if size>1: > > ml = comm.bcast(m,root=0) > > bl = comm.bcast(b,root=0) > > row_lcl = np.zeros(bl[rank]) > > col_lcl = row_lcl.copy() > > val_lcl = row_lcl.copy() > > > > comm.Scatterv([row,b],row_lcl) > > comm.Scatterv([col,b],col_lcl) > > comm.Scatterv([val,b],val_lcl) > > comm.Barrier() > > > > row_lcl = row_lcl.astype(dtype='int32') > > col_lcl = col_lcl.astype(dtype='int32') > > val_lcl = val_lcl.astype(dtype='int32') > > > > indptr = np.bincount(row_lcl) > > indptr = indptr[indptr>0] > > indptr = np.insert(indptr,0,0).cumsum() > > indptr = indptr.astype(dtype='int32') > > comm.Barrier() > > > > pA = PETSc.Mat().createAIJ(size=(ml,ml),csr=(indptr, col_lcl, > > val_lcl)) # Matrix generation > > > > else: > > indptr = np.bincount(row) > > indptr = np.insert(indptr,0,0).cumsum() > > indptr = indptr.astype(dtype='int32') > > st=time.time() > > pA = PETSc.Mat().createAIJ(size=(m,m),csr=(indptr, col, val)) > > print('dt:',time.time()-st) > > > > > > Regards, > > > > Firat > > > > > > -Original Message- > > From: Smith, Barry F. > > Sent: Tuesday, October 31, 2017 10:18 AM > > To: Matthew Knepley > > Cc: Cetinbas, Cankur Firat; petsc-users@mcs.anl.gov; Ahluwalia, Rajesh K. > > Subject: Re: [petsc-users] petsc4py sparse matrix construction time > > > > > >You also need to make sure that most matrix entries are generated on > > the
Re: [petsc-users] petsc4py sparse matrix construction time
On Tue, Oct 31, 2017 at 10:00 PM, Cetinbas, Cankur Firatwrote: > Hi, > > Thanks a lot. Based on both of your suggestions I modified the code using > Mat.createAIJ() and csr option. The computation time decreased > significantly after using this method. Still if there is a better option > please let me know after seeing the modified code below. > > At first trial with 1000x1000 matrix with 96019 non-zeros in the matrix, > the computation time did not scale with the number of cores : Single core > python @ 0.0035s, single core petsc @ 0.0024s, 2 cores petsc @ 0.0036s, 4 > cores petsc @ 0.0032, 8 cores petsc @ 0.0030s. > > Then I tried with larger matrix 181797x181797 with more non-zeros and I > got the following results: Single core python @ 0.021, single core petsc @ > 0.031, 2 cores petsc @ 0.024s, 4 cores petsc @ 0.014, 8 cores petsc @ > 0.009s, 16 cores petsc @ 0.0087s. > > I think the optimum number of nodes is highly dependent on matrix size and > the number of non-zeros. In the real code matrix size (and so the number of > non-zero elements) will grow at every iteration starting with very small > matrices growing to very big ones. Is it possible to set the number process > from the code dynamically? > I am not sure how you are interpreting these measurements. Normally, I would say 1) Everything timed below is "parallel overhead". This is not intended to scale with P, instead it will look like a constant, as you observe 2) The time to compute the matrix entires should far outstrip the time below to figure the nonzero structure. Is this true? 3) Solve time is often larger than matrix calculation. Is it? Thus, we deciding on parallelism, you need to look at the largest costs, and how they scale with P. Thanks, Matt Another question is about the data types; mpi4py only let me transfer float > type data, and petsc4py only lets me use int32 type indices. Besides keep > converting the data, is there any solution for this? > > The modified code for matrix creation part: > > comm = MPI.COMM_WORLD > rank = comm.Get_rank() > size = comm.Get_size() > > if rank==0: > row = np.loadtxt('row1000.out').astype(dtype='int32') > col = np.loadtxt('col1000.out').astype(dtype='int32') > val = np.loadtxt('val1000.out').astype(dtype='int32') > m = 1000# 1000 x 1000 matrix > if size>1: > rbc = np.bincount(row)*1.0 > ieq = int(np.floor(m/size)) > a = [ieq]*size > ix = int(np.mod(m,size)) > if ix>0: > for i in range(0,ix): > a[i]= a[i]+1 > a = np.array([0]+a).cumsum() > b = np.zeros(a.shape[0]-1) > for i in range(0,a.shape[0]-1): > b[i]=rbc[a[i]:a[i+1]].sum() # b is the send counts for > Scatterv > row = row.astype(dtype=float) > col = col.astype(dtype=float) > val = val.astype(dtype=float) > else: > row=None > col=None > val=None > indpt=None > b=None > m=None > > if size>1: > ml = comm.bcast(m,root=0) > bl = comm.bcast(b,root=0) > row_lcl = np.zeros(bl[rank]) > col_lcl = row_lcl.copy() > val_lcl = row_lcl.copy() > > comm.Scatterv([row,b],row_lcl) > comm.Scatterv([col,b],col_lcl) > comm.Scatterv([val,b],val_lcl) > comm.Barrier() > > row_lcl = row_lcl.astype(dtype='int32') > col_lcl = col_lcl.astype(dtype='int32') > val_lcl = val_lcl.astype(dtype='int32') > > indptr = np.bincount(row_lcl) > indptr = indptr[indptr>0] > indptr = np.insert(indptr,0,0).cumsum() > indptr = indptr.astype(dtype='int32') > comm.Barrier() > > pA = PETSc.Mat().createAIJ(size=(ml,ml),csr=(indptr, col_lcl, > val_lcl)) # Matrix generation > > else: > indptr = np.bincount(row) > indptr = np.insert(indptr,0,0).cumsum() > indptr = indptr.astype(dtype='int32') > st=time.time() > pA = PETSc.Mat().createAIJ(size=(m,m),csr=(indptr, col, val)) > print('dt:',time.time()-st) > > > Regards, > > Firat > > > -Original Message- > From: Smith, Barry F. > Sent: Tuesday, October 31, 2017 10:18 AM > To: Matthew Knepley > Cc: Cetinbas, Cankur Firat; petsc-users@mcs.anl.gov; Ahluwalia, Rajesh K. > Subject: Re: [petsc-users] petsc4py sparse matrix construction time > > >You also need to make sure that most matrix entries are generated on > the process that they will belong on. > > Barry > > > On Oct 30, 2017, at 8:01 PM, Matthew Knepley wrote: > > > > On Mon, Oct 30, 2017 at 8:06 PM, Cetinbas, Cankur Firat < > ccetin...@anl.gov> wrote: > > Hello, > > > > > > > > I am a beginner both in PETSc and mpi4py. I have been working on > parallelizing our water transport code (where we solve linear system of > equations) and I started with the toy code below. > > > > > > > > The toy code reads right hand size (rh), row, column, value vectors to > construct sparse coefficient matrix and then scatters them to construct > parallel PETSc