Re: [petsc-users] A number of questions about DMDA with SNES and Quasi-Newton methods

2017-11-01 Thread zakaryah .
I worked from the assumptions in my previous email and have at least partially
implemented the function that assigns the couplings.  For row 0, which is the
redundant field, I set dnz[0] to end-start, and onz[0] to the size of the
matrix minus dnz[0].  For all other rows, I just increment the existing
values of dnz[i] and onz[i], since the coupling to the redundant field adds
one extra element beyond what's allocated for the DMDA stencil.
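
For concreteness, here is a sketch of that counting scheme in C. It only
illustrates the logic described above; it is not the actual FormCoupleLocations
callback signature, which should be checked against the PETSc source:

    #include <petscsys.h>

    /* Sketch: adjust AIJ preallocation counts for the coupling between the
       redundant field (global row/column 0) and the DMDA field.  On entry,
       dnz/onz already hold the DMDA stencil counts; [start,end) is this
       process's row ownership range and N is the global matrix size. */
    static void CountRedundantCoupling(PetscInt start,PetscInt end,PetscInt N,
                                       PetscInt dnz[],PetscInt onz[])
    {
      PetscInt i;
      if (start == 0) {               /* this rank owns the redundant row   */
        dnz[0] = end - start;         /* couples to every local column      */
        onz[0] = N - (end - start);   /* and to every off-process column    */
      }
      for (i = 0; i < end - start; i++) {
        if (start + i == 0) continue; /* redundant row was handled above    */
        if (start == 0) dnz[i] += 1;  /* column 0 is in the diagonal block  */
        else            onz[i] += 1;  /* column 0 is owned by another rank  */
      }
    }

Note that for the non-redundant rows only one of dnz[i]/onz[i] should grow per
row, depending on whether global column 0 lies in that row's diagonal block.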

I see in the source that the FormCoupleLocations function is called once if
the DM has PREALLOC_ONLY set to true, but twice otherwise.  I assume that
the second call is for setting the nonzero structure.  Do I need to do this?

In any case, something is still not right.  Even with the extra elements
preallocated, the first assembly of the matrix is very slow.  I ran a test
problem on a single process with -info, and got this:

[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 unneeded,1 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines.
[0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode routines
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 unneeded,629703 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines.
[0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using Inode routines
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 unneeded,1 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines.
[0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode routines
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 unneeded,629703 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines.
[0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using Inode routines
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 18018 unneeded,629704 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.
[0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using Inode routines
[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446774208 30818112
[0] DMGetDMSNES(): Creating new DMSNES
[0] DMGetDMKSP(): Creating new DMKSP
[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064
[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064
[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064
[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064
  0 SNES Function norm 2.302381528359e+00
[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064
[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064
[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064
[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064
[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064
[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 126132 unneeded,647722 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 9610
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010
[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.
[0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using Inode routines
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 0 unneeded,647722 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010
[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.


The 9610 mallocs during MatSetValues seem 
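
The mallocs reported during MatSetValues mean those entries fell outside the
preallocated nonzero pattern.  As a debugging aid (not discussed in the thread
above), PETSc can turn each such allocation into an immediate error so the
first offending entry is reported; a minimal sketch:

    #include <petscmat.h>

    /* Debugging sketch: make any malloc during MatSetValues() an error, so the
       first entry that falls outside the preallocation is reported at once
       instead of silently slowing down the first assembly.  J is assumed to be
       the Jacobian being preallocated and assembled. */
    static PetscErrorCode FailOnPreallocationMiss(Mat J)
    {
      PetscErrorCode ierr;
      PetscFunctionBeginUser;
      ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_TRUE);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

Once assembly runs cleanly with this option set, the preallocation matches the
pattern that is actually inserted.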

Re: [petsc-users] unsorted local columns in 3.8?

2017-11-01 Thread Randy Michael Churchill
Doing some additional testing, the issue goes away when removing the gamg
preconditioner line from the petsc.rc:
-pc_type gamg
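
For reference, here is a minimal C sketch of the kind of standalone reproducer
described in the quoted message below: load the saved Amat with MatLoad and
trigger the GAMG setup through KSPSetUp.  The binary file name is hypothetical,
and the options are assumed to come from the same petsc.rc:

    #include <petscksp.h>

    int main(int argc,char **argv)
    {
      Mat            A;
      KSP            ksp;
      PetscViewer    viewer;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
      /* "Amat.bin" is a placeholder for the matrix written out of the larger
         simulation with MatView on a binary viewer. */
      ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat.bin",FILE_MODE_READ,&viewer);CHKERRQ(ierr);
      ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
      ierr = MatSetFromOptions(A);CHKERRQ(ierr);
      ierr = MatLoad(A,viewer);CHKERRQ(ierr);
      ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

      ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);  /* picks up -pc_type gamg etc. */
      ierr = KSPSetUp(ksp);CHKERRQ(ierr);           /* where the error was reported */

      ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
      ierr = MatDestroy(&A);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }

Run on more than one MPI rank, e.g. mpiexec -n 2 ./repro -pc_type gamg, to reach
the parallel MatCreateSubMatrix_MPIAIJ_SameRowDist() path shown in the trace
below.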


On Wed, Nov 1, 2017 at 8:23 PM, Hong  wrote:

> Randy:
> Thanks, I'll check it tomorrow.
> Hong
>
> OK, this might not be completely satisfactory, because it doesn't show the
>> partitioning or how the matrix is created, but this reproduces the problem.
>> I wrote out my matrix, Amat, from the larger simulation, and load it in
>> this script. It must be run with more than one MPI rank. The trigger may be
>> some combination of options in my petsc.rc, because when I pass that file to
>> PetscInitialize it throws the error, but when using the default (PETSC_NULL_CHARACTER)
>> it runs fine.
>>
>>
>> On Tue, Oct 31, 2017 at 9:58 AM, Hong  wrote:
>>
>>> Randy:
>>> It could be a bug or a missing feature in our new
>>> MatCreateSubMatrix_MPIAIJ_SameRowDist().
>>> It would be helpful if you can provide us a simple example that reproduces
>>> this error.
>>> Hong
>>>
>>> I'm running a Fortran code that was just changed over to using petsc 3.8
 (previously petsc 3.7.6). An error was thrown during a KSPSetUp() call. The
 error is "unsorted iscol_local is not implemented yet" (see full error
 below). I tried to trace down the difference in the source files, but where
 the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) doesn't
 seem to have existed in v3.7.6, so I'm unsure how to compare. It seems the
 error is that the order of the columns locally are unsorted, though I don't
 think I specify a column order in the creation of the matrix:
  call MatCreate(this%comm,AA,ierr)
  call MatSetSizes(AA,npetscloc,npetscloc,nreal,nreal,ierr)
  call MatSetType(AA,MATAIJ,ierr)
  call MatSetup(AA,ierr)
  call MatGetOwnershipRange(AA,low,high,ierr)
  allocate(d_nnz(npetscloc),o_nnz(npetscloc))
  call getNNZ(grid,npetscloc,low,high,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr)
  call MatSeqAIJSetPreallocation(AA,PETSC_NULL_INTEGER,d_nnz,ierr)
  call MatMPIAIJSetPreallocation(AA,PETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr)
  deallocate(d_nnz,o_nnz)
  call MatSetOption(AA,MAT_IGNORE_OFF_PROC_ENTRIES,PETSC_TRUE,ierr)
  call MatSetOption(AA,MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ierr)
  call MatSetup(AA,ierr)


 [62]PETSC ERROR: - Error Message
 --
 [62]PETSC ERROR: No support for this operation for this object type
 [62]PETSC ERROR: unsorted iscol_local is not implemented yet
 [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
 [62]PETSC ERROR: Petsc Release Version 3.8.0, unknown
 [62]PETSC ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() line 3418 in /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c
 [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c
 [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c
 [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c
 [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c
 [62]PETSC ERROR: #6 PCSetUp() line 924 in /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/interface/precon.c
 [62]PETSC ERROR: #7 KSPSetUp() line 378 in /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/ksp/interface/itfunc.c

 --
 R. Michael Churchill

>>>
>>>
>>
>>
>> --
>> R. Michael Churchill
>>
>
>


-- 
R. Michael Churchill


Re: [petsc-users] unsorted local columns in 3.8?

2017-11-01 Thread Hong
Randy:
Thanks, I'll check it tomorrow.
Hong

> OK, this might not be completely satisfactory, because it doesn't show the
> partitioning or how the matrix is created, but this reproduces the problem.
> I wrote out my matrix, Amat, from the larger simulation, and load it in
> this script. It must be run with more than one MPI rank. The trigger may be
> some combination of options in my petsc.rc, because when I pass that file to
> PetscInitialize it throws the error, but when using the default (PETSC_NULL_CHARACTER)
> it runs fine.
>
>
> On Tue, Oct 31, 2017 at 9:58 AM, Hong  wrote:
>
>> Randy:
>> It could be a bug or a missing feature in our new
>> MatCreateSubMatrix_MPIAIJ_SameRowDist().
>> It would be helpful if you can provide us a simple example that reproduces
>> this error.
>> Hong
>>
>> I'm running a Fortran code that was just changed over to using petsc 3.8
>>> (previously petsc 3.7.6). An error was thrown during a KSPSetUp() call. The
>>> error is "unsorted iscol_local is not implemented yet" (see full error
>>> below). I tried to trace down the difference in the source files, but where
>>> the error occurs (MatCreateSubMatrix_MPIAIJ_SameRowDist()) doesn't seem
>>> to have existed in v3.7.6, so I'm unsure how to compare. It seems the error
>>> is that the order of the columns locally are unsorted, though I don't think
>>> I specify a column order in the creation of the matrix:
>>>  call MatCreate(this%comm,AA,ierr)
>>>  call MatSetSizes(AA,npetscloc,npetscloc,nreal,nreal,ierr)
>>>  call MatSetType(AA,MATAIJ,ierr)
>>>  call MatSetup(AA,ierr)
>>>  call MatGetOwnershipRange(AA,low,high,ierr)
>>>  allocate(d_nnz(npetscloc),o_nnz(npetscloc))
>>>  call getNNZ(grid,npetscloc,low,high,d_nnz,o_nnz,this%xgc_petsc,nreal,ierr)
>>>  call MatSeqAIJSetPreallocation(AA,PETSC_NULL_INTEGER,d_nnz,ierr)
>>>  call MatMPIAIJSetPreallocation(AA,PETSC_NULL_INTEGER,d_nnz,PETSC_NULL_INTEGER,o_nnz,ierr)
>>>  deallocate(d_nnz,o_nnz)
>>>  call MatSetOption(AA,MAT_IGNORE_OFF_PROC_ENTRIES,PETSC_TRUE,ierr)
>>>  call MatSetOption(AA,MAT_KEEP_NONZERO_PATTERN,PETSC_TRUE,ierr)
>>>  call MatSetup(AA,ierr)
>>>
>>>
>>> [62]PETSC ERROR: - Error Message
>>> --
>>> [62]PETSC ERROR: No support for this operation for this object type
>>> [62]PETSC ERROR: unsorted iscol_local is not implemented yet
>>> [62]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>> [62]PETSC ERROR: Petsc Release Version 3.8.0, unknown
>>> [62]PETSC ERROR: #1 MatCreateSubMatrix_MPIAIJ_SameRowDist() line 3418 in /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c
>>> [62]PETSC ERROR: #2 MatCreateSubMatrix_MPIAIJ() line 3247 in /global/u1/r/rchurchi/petsc/3.8.0/src/mat/impls/aij/mpi/mpiaij.c
>>> [62]PETSC ERROR: #3 MatCreateSubMatrix() line 7872 in /global/u1/r/rchurchi/petsc/3.8.0/src/mat/interface/matrix.c
>>> [62]PETSC ERROR: #4 PCGAMGCreateLevel_GAMG() line 383 in /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c
>>> [62]PETSC ERROR: #5 PCSetUp_GAMG() line 561 in /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/impls/gamg/gamg.c
>>> [62]PETSC ERROR: #6 PCSetUp() line 924 in /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/pc/interface/precon.c
>>> [62]PETSC ERROR: #7 KSPSetUp() line 378 in /global/u1/r/rchurchi/petsc/3.8.0/src/ksp/ksp/interface/itfunc.c
>>>
>>> --
>>> R. Michael Churchill
>>>
>>
>>
>
>
> --
> R. Michael Churchill
>


Re: [petsc-users] GAMG advice

2017-11-01 Thread David Nolte
Thanks Barry.
By simply replacing chebyshev with richardson I get similar performance
with GAMG and ML (GAMG even slightly faster):

-pc_type gamg
-pc_gamg_type agg
-pc_gamg_threshold 0.03
-pc_gamg_square_graph 10
-pc_gamg_sym_graph
-mg_levels_ksp_type richardson
-mg_levels_pc_type sor
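
For reference, the same configuration can also be applied from code before
KSPSetFromOptions(); a sketch using the options database, assuming the solver
object is a KSP named ksp:

    #include <petscksp.h>

    /* Sketch: push the options listed above into the options database, then
       let the KSP pick them up; equivalent to putting them in petsc.rc. */
    static PetscErrorCode SetGAMGOptions(KSP ksp)
    {
      PetscErrorCode ierr;
      PetscFunctionBeginUser;
      ierr = PetscOptionsSetValue(NULL,"-pc_type","gamg");CHKERRQ(ierr);
      ierr = PetscOptionsSetValue(NULL,"-pc_gamg_type","agg");CHKERRQ(ierr);
      ierr = PetscOptionsSetValue(NULL,"-pc_gamg_threshold","0.03");CHKERRQ(ierr);
      ierr = PetscOptionsSetValue(NULL,"-pc_gamg_square_graph","10");CHKERRQ(ierr);
      ierr = PetscOptionsSetValue(NULL,"-pc_gamg_sym_graph","true");CHKERRQ(ierr);
      ierr = PetscOptionsSetValue(NULL,"-mg_levels_ksp_type","richardson");CHKERRQ(ierr);
      ierr = PetscOptionsSetValue(NULL,"-mg_levels_pc_type","sor");CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }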

Is it still true that I need to set "-pc_gamg_sym_graph" if the matrix
is asymmetric? For serial runs it doesn't seem to matter, but in
parallel the PC setup hangs (after calls to
PCGAMGFilterGraph()) if -pc_gamg_sym_graph is not set.

David


On 10/21/2017 12:10 AM, Barry Smith wrote:
>   David,
>
>GAMG picks the number of levels based on how the coarsening process etc 
> proceeds. You cannot hardwire it to a particular value. You can run with 
> -info to get more info potentially on the decisions GAMG is making.
>
>   Barry
>
>> On Oct 20, 2017, at 2:06 PM, David Nolte  wrote:
>>
>> PS: I didn't realize it at first, but it looks as if the -pc_mg_levels 3 option
>> was not taken into account:
>> type: gamg
>> MG: type is MULTIPLICATIVE, levels=1 cycles=v
>>
>>
>>
>> On 10/20/2017 03:32 PM, David Nolte wrote:
>>> Dear all,
>>>
>>> I have some problems using GAMG as a preconditioner for (F)GMRES.
>>> Background: I am solving the incompressible, unsteady Navier-Stokes
>>> equations with a coupled mixed FEM approach, using P1/P1 elements for
>>> velocity and pressure on an unstructured tetrahedron mesh with about
>>> 2 million DOFs (and up to 15 million). The method is stabilized with SUPG/PSPG,
>>> hence, no zeros on the diagonal of the pressure block. Time
>>> discretization with semi-implicit backward Euler. The flow is a
>>> convection dominated flow through a nozzle.
>>>
>>> So far, for this setup, I have been quite happy with a simple FGMRES/ML
>>> solver for the full system (rather bruteforce, I admit, but much faster
>>> than any block/Schur preconditioners I tried):
>>>
>>> -ksp_converged_reason
>>> -ksp_monitor_true_residual
>>> -ksp_type fgmres
>>> -ksp_rtol 1.0e-6
>>> -ksp_initial_guess_nonzero
>>>
>>> -pc_type ml
>>> -pc_ml_Threshold 0.03
>>> -pc_ml_maxNlevels 3
>>>
>>> This setup converges in ~100 iterations (see below the ksp_view output)
>>> to rtol:
>>>
>>> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm
>>> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
>>> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm
>>> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
>>> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm
>>> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
>>> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm
>>> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
>>>
>>>
>>> Now I'd like to try GAMG instead of ML. However, I don't know how to set
>>> it up to get similar performance.
>>> The obvious/naive
>>>
>>> -pc_type gamg
>>> -pc_gamg_type agg
>>>
>>> # with and without
>>> -pc_gamg_threshold 0.03
>>> -pc_mg_levels 3
>>>
>>> converges very slowly on 1 proc and much worse on 8 (~200k dofs per
>>> proc), for instance:
>>> np = 1:
>>> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
>>> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
>>> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
>>> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
>>> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
>>> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
>>>
>>> np = 8:
>>> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
>>> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
>>> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
>>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>>> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
>>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>>>
>>> A very high threshold seems to improve the GAMG PC, for instance with
>>> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
>>> What else should I try?
>>>
>>> I would very much appreciate any advice on configuring GAMG and
>>> differences w.r.t ML to be taken into account (not a multigrid expert
>>> though).
>>>
>>> Thanks, best wishes
>>> David
>>>
>>>
>>> --
>>> ksp_view for -pc_type gamg  -pc_gamg_threshold 0.75 

[petsc-users] Problem with encapsulation of PETSc/SLEPc in Fortran

2017-11-01 Thread Thibaut Appel

Dear PETSc/SLEPc users,

I am encountering a problem when I try to isolate my calls to 
PETSc/SLEPc routines in a module. When I have a single file everything 
works fine, but when I have, say:


   - modA.f90 (Independent modules)
   - modB.F90 (Contains all the calls to PETSc and SLEPc)
   - main.f90 or main.F90

When calling EPSSolve, I keep having the error "[1]PETSC ERROR: Caught 
signal number 8 FPE: Floating Point Exception,probably divide by zero" 
plus "User provided function() line 0 in  unknown file."
With gdb: "Thread 1 "main" received signal SIGFPE, Arithmetic exception. 
0x71662cf6 in dlamch_ () from 
/home/linuxbrew/.linuxbrew/lib/libopenblas.so.0"


The PETSc/SLEPc module looks like
  USE SlepcEps
  IMPLICIT NONE
#include 

which is the only preprocessor directive used. I tried to add more 
(petscsys, petscmat, petscvec, slepceps), tried to change main.f90 to 
main.F90 and incorporate the preprocessor directives there as well, 
without any effect. My code calls CHKERRQ(ierr) systematically. My 
makefile looks like


    include ${SLEPC_DIR}/lib/slepc/conf/slepc_common
    INCL = -I$(PETSC_DIR)/include/ -I$(SLEPC_DIR)/include/
    %.o: %.f90
        $(FC) $(FLAGS) $(INCL) -c $< -o $@ $(SLEPC_EPS_LIB)
    %.o: %.F90
        $(FC) $(FLAGS) $(INCL) -c $< -o $@ $(SLEPC_EPS_LIB)
    $(EXEC): $(OBJS)
        $(FC) $(FLAGS) $(INCL) $(OBJS) -o $@ $(SLEPC_EPS_LIB)

Could you spot anything I am doing wrong or dangerous? Furthermore, do 
you know how to avoid the warnings "Same actual argument associated with 
INTENT(IN) argument 'errorcode' and INTENT(OUT) argument 'ierror' at 
(1)" when calling CHKERRQ(ierr) when ierr is declared as INTEGER?


Thanks in advance for your continued support,

Thibaut


Re: [petsc-users] petsc4py sparse matrix construction time

2017-11-01 Thread Satish Balay
Should add:

To make sure your matrix assembly is efficient - you should run the
code with the option -info, and make sure there are no mallocs during
assembly.

I'm not sure how petsc4py processes options. One way is to set such
options with PETSC_OPTIONS env variable - and then run the code.

Satish

On Wed, 1 Nov 2017, Matthew Knepley wrote:

> On Tue, Oct 31, 2017 at 10:00 PM, Cetinbas, Cankur Firat 
> wrote:
> 
> > Hi,
> >
> > Thanks a lot. Based on both of your suggestions I modified the code using
> > Mat.createAIJ() and csr option. The computation time decreased
> > significantly after using this method. Still if there is a better option
> > please let me know after seeing the modified code below.
> >
> > At first trial with 1000x1000 matrix with 96019 non-zeros in the matrix,
> > the computation time did not scale with the number of cores : Single core
> > python @ 0.0035s, single core petsc @ 0.0024s, 2 cores petsc @ 0.0036s, 4
> > cores petsc @ 0.0032, 8 cores petsc @ 0.0030s.
> >
> > Then I tried with larger matrix 181797x181797 with more non-zeros and I
> > got the following results: Single core python @ 0.021, single core petsc @
> > 0.031, 2 cores petsc @ 0.024s, 4 cores petsc @ 0.014, 8 cores petsc @
> > 0.009s, 16 cores petsc @ 0.0087s.
> >
> > I think the optimum number of nodes is highly dependent on matrix size and
> > the number of non-zeros. In the real code matrix size (and so the number of
> > non-zero elements) will grow at every iteration starting with very small
> > matrices growing to very big ones. Is it possible to set the number process
> > from the code dynamically?
> >
> 
> I am not sure how you are interpreting these measurements. Normally, I
> would say
> 
>   1) Everything timed below is "parallel overhead". This is not intended to
> scale with P, instead it will look like a constant, as you observe
> 
>   2) The time to compute the matrix entries should far outstrip the time
> below to figure out the nonzero structure. Is this true?
> 
>   3) Solve time is often larger than matrix calculation. Is it?
> 
> Thus, when deciding on parallelism, you need to look at the largest costs,
> and how they scale with P.
> 
>   Thanks,
> 
>  Matt
> 
> Another question is about the data types; mpi4py only lets me transfer float
> > type data, and petsc4py only lets me use int32 type indices. Besides keep
> > converting the data, is there any solution for this?
> >
> > The modified code for matrix creation part:
> >
> > comm = MPI.COMM_WORLD
> > rank = comm.Get_rank()
> > size = comm.Get_size()
> >
> > if rank==0:
> >     row = np.loadtxt('row1000.out').astype(dtype='int32')
> >     col = np.loadtxt('col1000.out').astype(dtype='int32')
> >     val = np.loadtxt('val1000.out').astype(dtype='int32')
> >     m = 1000    # 1000 x 1000 matrix
> >     if size>1:
> >         rbc = np.bincount(row)*1.0
> >         ieq = int(np.floor(m/size))
> >         a = [ieq]*size
> >         ix = int(np.mod(m,size))
> >         if ix>0:
> >             for i in range(0,ix):
> >                 a[i]= a[i]+1
> >         a = np.array([0]+a).cumsum()
> >         b = np.zeros(a.shape[0]-1)
> >         for i in range(0,a.shape[0]-1):
> >             b[i]=rbc[a[i]:a[i+1]].sum()  # b is the send counts for Scatterv
> >         row = row.astype(dtype=float)
> >         col = col.astype(dtype=float)
> >         val = val.astype(dtype=float)
> > else:
> >     row=None
> >     col=None
> >     val=None
> >     indpt=None
> >     b=None
> >     m=None
> >
> > if size>1:
> >     ml = comm.bcast(m,root=0)
> >     bl = comm.bcast(b,root=0)
> >     row_lcl = np.zeros(bl[rank])
> >     col_lcl = row_lcl.copy()
> >     val_lcl = row_lcl.copy()
> >
> >     comm.Scatterv([row,b],row_lcl)
> >     comm.Scatterv([col,b],col_lcl)
> >     comm.Scatterv([val,b],val_lcl)
> >     comm.Barrier()
> >
> >     row_lcl = row_lcl.astype(dtype='int32')
> >     col_lcl = col_lcl.astype(dtype='int32')
> >     val_lcl = val_lcl.astype(dtype='int32')
> >
> >     indptr = np.bincount(row_lcl)
> >     indptr = indptr[indptr>0]
> >     indptr = np.insert(indptr,0,0).cumsum()
> >     indptr = indptr.astype(dtype='int32')
> >     comm.Barrier()
> >
> >     pA = PETSc.Mat().createAIJ(size=(ml,ml),csr=(indptr, col_lcl, val_lcl))  # Matrix generation
> >
> > else:
> >     indptr = np.bincount(row)
> >     indptr = np.insert(indptr,0,0).cumsum()
> >     indptr = indptr.astype(dtype='int32')
> >     st=time.time()
> >     pA = PETSc.Mat().createAIJ(size=(m,m),csr=(indptr, col, val))
> >     print('dt:',time.time()-st)
> >
> >
> > Regards,
> >
> > Firat
> >
> >
> > -Original Message-
> > From: Smith, Barry F.
> > Sent: Tuesday, October 31, 2017 10:18 AM
> > To: Matthew Knepley
> > Cc: Cetinbas, Cankur Firat; petsc-users@mcs.anl.gov; Ahluwalia, Rajesh K.
> > Subject: Re: [petsc-users] petsc4py sparse matrix construction time
> >
> >
> >You also need to make sure that most matrix entries are generated on
> > the process that they will belong on.

Re: [petsc-users] petsc4py sparse matrix construction time

2017-11-01 Thread Matthew Knepley
On Tue, Oct 31, 2017 at 10:00 PM, Cetinbas, Cankur Firat 
wrote:

> Hi,
>
> Thanks a lot. Based on both of your suggestions I modified the code using
> Mat.createAIJ() and csr option. The computation time decreased
> significantly after using this method. Still if there is a better option
> please let me know after seeing the modified code below.
>
> At first trial with 1000x1000 matrix with 96019 non-zeros in the matrix,
> the computation time did not scale with the number of cores : Single core
> python @ 0.0035s, single core petsc @ 0.0024s, 2 cores petsc @ 0.0036s, 4
> cores petsc @ 0.0032, 8 cores petsc @ 0.0030s.
>
> Then I tried with larger matrix 181797x181797 with more non-zeros and I
> got the following results: Single core python @ 0.021, single core petsc @
> 0.031, 2 cores petsc @ 0.024s, 4 cores petsc @ 0.014, 8 cores petsc @
> 0.009s, 16 cores petsc @ 0.0087s.
>
> I think the optimum number of nodes is highly dependent on matrix size and
> the number of non-zeros. In the real code matrix size (and so the number of
> non-zero elements) will grow at every iteration starting with very small
> matrices growing to very big ones. Is it possible to set the number process
> from the code dynamically?
>

I am not sure how you are interpreting these measurements. Normally, I
would say

  1) Everything timed below is "parallel overhead". This is not intended to
scale with P, instead it will look like a constant, as you observe

  2) The time to compute the matrix entries should far outstrip the time
below to figure out the nonzero structure. Is this true?

  3) Solve time is often larger than matrix calculation. Is it?

Thus, when deciding on parallelism, you need to look at the largest costs,
and how they scale with P.

  Thanks,

 Matt

Another question is about the data types; mpi4py only lets me transfer float
> type data, and petsc4py only lets me use int32 type indices. Besides keep
> converting the data, is there any solution for this?
>
> The modified code for matrix creation part:
>
> comm = MPI.COMM_WORLD
> rank = comm.Get_rank()
> size = comm.Get_size()
>
> if rank==0:
>     row = np.loadtxt('row1000.out').astype(dtype='int32')
>     col = np.loadtxt('col1000.out').astype(dtype='int32')
>     val = np.loadtxt('val1000.out').astype(dtype='int32')
>     m = 1000    # 1000 x 1000 matrix
>     if size>1:
>         rbc = np.bincount(row)*1.0
>         ieq = int(np.floor(m/size))
>         a = [ieq]*size
>         ix = int(np.mod(m,size))
>         if ix>0:
>             for i in range(0,ix):
>                 a[i]= a[i]+1
>         a = np.array([0]+a).cumsum()
>         b = np.zeros(a.shape[0]-1)
>         for i in range(0,a.shape[0]-1):
>             b[i]=rbc[a[i]:a[i+1]].sum()  # b is the send counts for Scatterv
>         row = row.astype(dtype=float)
>         col = col.astype(dtype=float)
>         val = val.astype(dtype=float)
> else:
>     row=None
>     col=None
>     val=None
>     indpt=None
>     b=None
>     m=None
>
> if size>1:
>     ml = comm.bcast(m,root=0)
>     bl = comm.bcast(b,root=0)
>     row_lcl = np.zeros(bl[rank])
>     col_lcl = row_lcl.copy()
>     val_lcl = row_lcl.copy()
>
>     comm.Scatterv([row,b],row_lcl)
>     comm.Scatterv([col,b],col_lcl)
>     comm.Scatterv([val,b],val_lcl)
>     comm.Barrier()
>
>     row_lcl = row_lcl.astype(dtype='int32')
>     col_lcl = col_lcl.astype(dtype='int32')
>     val_lcl = val_lcl.astype(dtype='int32')
>
>     indptr = np.bincount(row_lcl)
>     indptr = indptr[indptr>0]
>     indptr = np.insert(indptr,0,0).cumsum()
>     indptr = indptr.astype(dtype='int32')
>     comm.Barrier()
>
>     pA = PETSc.Mat().createAIJ(size=(ml,ml),csr=(indptr, col_lcl, val_lcl))  # Matrix generation
>
> else:
>     indptr = np.bincount(row)
>     indptr = np.insert(indptr,0,0).cumsum()
>     indptr = indptr.astype(dtype='int32')
>     st=time.time()
>     pA = PETSc.Mat().createAIJ(size=(m,m),csr=(indptr, col, val))
>     print('dt:',time.time()-st)
>
>
> Regards,
>
> Firat
>
>
> -Original Message-
> From: Smith, Barry F.
> Sent: Tuesday, October 31, 2017 10:18 AM
> To: Matthew Knepley
> Cc: Cetinbas, Cankur Firat; petsc-users@mcs.anl.gov; Ahluwalia, Rajesh K.
> Subject: Re: [petsc-users] petsc4py sparse matrix construction time
>
>
>You also need to make sure that most matrix entries are generated on
> the process that they will belong on.
>
>   Barry
>
> > On Oct 30, 2017, at 8:01 PM, Matthew Knepley  wrote:
> >
> > On Mon, Oct 30, 2017 at 8:06 PM, Cetinbas, Cankur Firat <
> ccetin...@anl.gov> wrote:
> > Hello,
> >
> >
> >
> > I am a beginner both in PETSc and mpi4py. I have been working on
> parallelizing our water transport code (where we solve linear system of
> equations) and I started with the toy code below.
> >
> >
> >
> > The toy code reads right hand side (rh), row, column, value vectors to
> construct sparse coefficient matrix and then scatters them to construct
> parallel PETSc