Hi Dave,

Sorry, I should have put in more comments to explain the code.
The number of processes in each dimension is the same, Px = Py = Pz = P, and so is the number of grid points in each dimension (the domain is a cube). So if you want to run the code on a 512^3 grid with 16^3 cores, you need to set "-N 512 -P 16" on the command line. I added more comments and also fixed an error in the attached code. (The error only affects the accuracy of the solution, not the memory usage.)
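
For example, the 1024^3 case on a 32^3 process mesh would be launched with something like the following (the exact mpirun/aprun syntax depends on the machine):

    mpirun -n 32768 ./test_ksp.exe -N 1024 -P 32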

Thank you.
Frank

On 9/14/2016 9:05 PM, Dave May wrote:


On Thursday, 15 September 2016, Dave May <dave.mayhe...@gmail.com> wrote:



    On Thursday, 15 September 2016, frank <hengj...@uci.edu> wrote:

        Hi,

        I wrote a simple code to reproduce the error. I hope this can
        help diagnose the problem.
        The code just solves a 3D Poisson equation.


    Why is the stencil width a runtime parameter?? And why is the
    default value 2? For 7-pnt FD Laplace, you only need a stencil
    width of 1.

    Was this choice made to mimic something in the real application code?


Please ignore - I misunderstood your usage of the param set by -P


        I ran the code on a 1024^3 mesh with a 32 * 32 * 32 process
        partition. That is when I reproduce the OOM error. Each core
        has about 2G of memory.
        I also ran the code on a 512^3 mesh with 16 * 16 * 16
        processes; the KSP solver works fine.
        I attached the code, the output of -ksp_view_pre, and my PETSc
        options file.

        Thank you.
        Frank

        On 09/09/2016 06:38 PM, Hengjie Wang wrote:
        Hi Barry,

        I checked. On the supercomputer, I had the option
        "-ksp_view_pre" but it is not in file I sent you. I am sorry
        for the confusion.

        Regards,
        Frank

        On Friday, September 9, 2016, Barry Smith
        <bsm...@mcs.anl.gov> wrote:


            > On Sep 9, 2016, at 3:11 PM, frank <hengj...@uci.edu> wrote:
            >
            > Hi Barry,
            >
            > I think the first KSP view output is from
            -ksp_view_pre. Before I submitted the test, I was not
            sure whether there would be OOM error or not. So I added
            both -ksp_view_pre and -ksp_view.

              But the options file you sent specifically does NOT
            list the -ksp_view_pre so how could it be from that?

               Sorry to be pedantic but I've spent too much time in
            the past trying to debug from incorrect information and
            want to make sure that the information I have is correct
            before thinking. Please recheck exactly what happened.
            Rerun with the exact input file you emailed if that is
            needed.

               Barry

            >
            > Frank
            >
            >
            > On 09/09/2016 12:38 PM, Barry Smith wrote:
            >>   Why does ksp_view2.txt have two KSP views in it
            while ksp_view1.txt has only one KSPView in it? Did you
            run two different solves in the 2 case but not the one?
            >>
            >>   Barry
            >>
            >>
            >>
            >>> On Sep 9, 2016, at 10:56 AM, frank <hengj...@uci.edu>
            wrote:
            >>>
            >>> Hi,
            >>>
            >>> I want to continue digging into the memory problem here.
            >>> I did find a work around in the past, which is to use
            less cores per node so that each core has 8G memory.
            However this is deficient and expensive. I hope to locate
            the place that uses the most memory.
            >>>
            >>> Here is a brief summary of the tests I did in past:
            >>>> Test1:   Mesh 1536*128*384  | Process Mesh 48*4*12
            >>> Maximum (over computational time) process memory: total 7.0727e+08
            >>> Current process memory: total 7.0727e+08
            >>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
            >>> Current space PetscMalloc()ed: total 1.8275e+09
            >>>
            >>>> Test2:    Mesh 1536*128*384  | Process Mesh 96*8*24
            >>> Maximum (over computational time) process memory: total 5.9431e+09
            >>> Current process memory: total 5.9431e+09
            >>> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12
            >>> Current space PetscMalloc()ed: total 5.4844e+09
            >>>
            >>>> Test3:    Mesh 3072*256*768  | Process Mesh 96*8*24
            >>>     OOM( Out Of Memory ) killer of the supercomputer
            terminated the job during "KSPSolve".
            >>>
            >>> I attached the output of ksp_view( the third test's
            output is from ksp_view_pre ), memory_view and also the
            petsc options.
            >>>
            >>> In all the tests, each core can access about 2G
            memory. In test3, there are 4223139840 non-zeros in the
            matrix. This will consume about 1.74M, using double
            precision. Considering some extra memory used to store
            integer index, 2G memory should still be way enough.
            >>>
            >>> Is there a way to find out which part of KSPSolve
            uses the most memory?
            >>> Thank you so much.
            >>>
            >>> BTW, there are 4 options remains unused and I don't
            understand why they are omitted:
            >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly
            >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
            >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1
            >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson
            >>>
            >>>
            >>> Regards,
            >>> Frank
            >>>
            >>> On 07/13/2016 05:47 PM, Dave May wrote:
            >>>>
            >>>> On 14 July 2016 at 01:07, frank <hengj...@uci.edu>
            wrote:
            >>>> Hi Dave,
            >>>>
            >>>> Sorry for the late reply.
            >>>> Thank you so much for your detailed reply.
            >>>>
            >>>> I have a question about the estimation of the memory
            usage. There are 4223139840 allocated non-zeros and 18432
            MPI processes. Double precision is used. So the memory
            per process is:
            >>>>   4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ?
            >>>> Did I do sth wrong here? Because this seems too small.
            >>>>
            >>>> No - I totally f***ed it up. You are correct.
            That'll teach me for fumbling around with my iphone
            calculator and not using my brain. (Note that to convert
            to MB just divide by 1e6, not 1024^2 - although I
            apparently cannot convert between units correctly....)
            >>>>
            >>>> From the PETSc objects associated with the solver,
            It looks like it _should_ run with 2GB per MPI rank.
            Sorry for my mistake. Possibilities are: somewhere in
            your usage of PETSc you've introduced a memory leak;
            PETSc is doing a huge over allocation (e.g. as per our
            discussion of MatPtAP); or in your application code there
            are other objects you have forgotten to log the memory for.
            >>>>
            >>>>
            >>>>
            >>>> I am running this job on Bluewater
            >>>> I am using the 7 points FD stencil in 3D.
            >>>>
            >>>> I thought so on both counts.
            >>>>
            >>>> I apologize that I made a stupid mistake in
            computing the memory per core. My settings render each
            core can access only 2G memory on average instead of 8G
            which I mentioned in previous email. I re-run the job
            with 8G memory per core on average and there is no "Out
            Of Memory" error. I would do more test to see if there is
            still some memory issue.
            >>>>
            >>>> Ok. I'd still like to know where the memory was
            being used since my estimates were off.
            >>>>
            >>>>
            >>>> Thanks,
            >>>>   Dave
            >>>>
            >>>> Regards,
            >>>> Frank
            >>>>
            >>>>
            >>>>
            >>>> On 07/11/2016 01:18 PM, Dave May wrote:
            >>>>> Hi Frank,
            >>>>>
            >>>>>
            >>>>> On 11 July 2016 at 19:14, frank <hengj...@uci.edu>
            wrote:
            >>>>> Hi Dave,
            >>>>>
            >>>>> I re-run the test using bjacobi as the
            preconditioner on the coarse mesh of telescope. The Grid
            is 3072*256*768 and process mesh is 96*8*24. The petsc
            option file is attached.
            >>>>> I still got the "Out Of Memory" error. The error
            occurred before the linear solver finished one step. So I
            don't have the full info from ksp_view. The info from
            ksp_view_pre is attached.
            >>>>>
            >>>>> Okay - that is essentially useless (sorry)
            >>>>>
            >>>>> It seems to me that the error occurred when the
            decomposition was going to be changed.
            >>>>>
            >>>>> Based on what information?
            >>>>> Running with -info would give us more clues, but
            will create a ton of output.
            >>>>> Please try running the case which failed with -info
            >>>>>  I had another test with a grid of 1536*128*384 and
            the same process mesh as above. There was no error. The
            ksp_view info is attached for comparison.
            >>>>> Thank you.
            >>>>>
            >>>>>
            >>>>> [3] Here is my crude estimate of your memory usage.
            >>>>> I'll target the biggest memory hogs only to get an
            order of magnitude estimate
            >>>>>
            >>>>> * The Fine grid operator contains 4223139840
            non-zeros --> 1.8 GB per MPI rank assuming double precision.
            >>>>> The indices for the AIJ could amount to another 0.3
            GB (assuming 32 bit integers)
            >>>>>
            >>>>> * You use 5 levels of coarsening, so the other
            operators should represent (collectively)
            >>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4  ~ 300 MB per
            MPI rank on the communicator with 18432 ranks.
            >>>>> The coarse grid should consume ~ 0.5 MB per MPI
            rank on the communicator with 18432 ranks.
            >>>>>
            >>>>> * You use a reduction factor of 64, making the new
            communicator with 288 MPI ranks.
            >>>>> PCTelescope will first gather a temporary matrix
            associated with your coarse level operator assuming a
            comm size of 288 living on the comm with size 18432.
            >>>>> This matrix will require approximately 0.5 * 64 =
            32 MB per core on the 288 ranks.
            >>>>> This matrix is then used to form a new MPIAIJ
            matrix on the subcomm, thus require another 32 MB per rank.
            >>>>> The temporary matrix is now destroyed.
            >>>>>
            >>>>> * Because a DMDA is detected, a permutation matrix
            is assembled.
            >>>>> This requires 2 doubles per point in the DMDA.
            >>>>> Your coarse DMDA contains 92 x 16 x 48 points.
            >>>>> Thus the permutation matrix will require < 1 MB per
            MPI rank on the sub-comm.
            >>>>>
            >>>>> * Lastly, the matrix is permuted. This uses
            MatPtAP(), but the resulting operator will have the same
            memory footprint as the unpermuted matrix (32 MB). At any
            stage in PCTelescope, only 2 operators of size 32 MB are
            held in memory when the DMDA is provided.
            >>>>>
            >>>>> From my rough estimates, the worst case memory foot
            print for any given core, given your options is approximately
            >>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB  = 2465 MB
            >>>>> This is way below 8 GB.
            >>>>>
            >>>>> Note this estimate completely ignores:
            >>>>> (1) the memory required for the restriction operator,
            >>>>> (2) the potential growth in the number of non-zeros
            per row due to Galerkin coarsening (I wished
            -ksp_view_pre reported the output from MatView so we
            could see the number of non-zeros required by the coarse
            level operators)
            >>>>> (3) all temporary vectors required by the CG
            solver, and those required by the smoothers.
            >>>>> (4) internal memory allocated by MatPtAP
            >>>>> (5) memory associated with IS's used within PCTelescope
            >>>>>
            >>>>> So either I am completely off in my estimates, or
            you have not carefully estimated the memory usage of your
            application code. Hopefully others might examine/correct
            my rough estimates
            >>>>>
            >>>>> Since I don't have your code I cannot access the
            latter.
            >>>>> Since I don't have access to the same machine you
            are running on, I think we need to take a step back.
            >>>>>
            >>>>> [1] What machine are you running on? Send me a URL
            if its available
            >>>>>
            >>>>> [2] What discretization are you using? (I am
            guessing a scalar 7 point FD stencil)
            >>>>> If it's a 7 point FD stencil, we should be able to
            examine the memory usage of your solver configuration
            using a standard, light weight existing PETSc example,
            run on your machine at the same scale.
            >>>>> This would hopefully enable us to correctly
            evaluate the actual memory usage required by the solver
            configuration you are using.
            >>>>>
            >>>>> Thanks,
            >>>>>   Dave
            >>>>>
            >>>>>
            >>>>> Frank
            >>>>>
            >>>>>
            >>>>>
            >>>>>
            >>>>> On 07/08/2016 10:38 PM, Dave May wrote:
            >>>>>>
            >>>>>> On Saturday, 9 July 2016, frank <hengj...@uci.edu>
            wrote:
            >>>>>> Hi Barry and Dave,
            >>>>>>
            >>>>>> Thank both of you for the advice.
            >>>>>>
            >>>>>> @Barry
            >>>>>> I made a mistake in the file names in last email.
            I attached the correct files this time.
            >>>>>> For all the three tests, 'Telescope' is used as
            the coarse preconditioner.
            >>>>>>
            >>>>>> == Test1:   Grid: 1536*128*384,   Process Mesh:
            48*4*12
            >>>>>> Part of the memory usage:  Vector   125   124   3971904   0.
            >>>>>>                            Matrix   101   101   9462372   0
            >>>>>>
            >>>>>> == Test2: Grid: 1536*128*384,   Process Mesh: 96*8*24
            >>>>>> Part of the memory usage:  Vector   125   124   681672    0.
            >>>>>>                            Matrix   101   101   1462180   0.
            >>>>>>
            >>>>>> In theory, the memory usage in Test1 should be 8
            times of Test2. In my case, it is about 6 times.
            >>>>>>
            >>>>>> == Test3: Grid: 3072*256*768,   Process Mesh:
            96*8*24. Sub-domain per process: 32*32*32
            >>>>>> Here I get the out of memory error.
            >>>>>>
            >>>>>> I tried to use -mg_coarse jacobi. In this way, I
            don't need to set -mg_coarse_ksp_type and
            -mg_coarse_pc_type explicitly, right?
            >>>>>> The linear solver didn't work in this case. Petsc
            output some errors.
            >>>>>>
            >>>>>> @Dave
            >>>>>> In test3, I use only one instance of 'Telescope'.
            On the coarse mesh of 'Telescope', I used LU as the
            preconditioner instead of SVD.
            >>>>>> If my set the levels correctly, then on the last
            coarse mesh of MG where it calls 'Telescope', the
            sub-domain per process is 2*2*2.
            >>>>>> On the last coarse mesh of 'Telescope', there is
            only one grid point per process.
            >>>>>> I still got the OOM error. The detailed petsc
            option file is attached.
            >>>>>>
            >>>>>> Do you understand the expected memory usage for
            the particular parallel LU implementation you are using?
            I don't (seriously). Replace LU with bjacobi and re-run
            this test. My point about solver debugging is still valid.
            >>>>>>
            >>>>>> And please send the result of KSPView so we can
            see what is actually used in the computations
            >>>>>>
            >>>>>> Thanks
            >>>>>>   Dave
            >>>>>>
            >>>>>>
            >>>>>> Thank you so much.
            >>>>>>
            >>>>>> Frank
            >>>>>>
            >>>>>>
            >>>>>>
            >>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote:
            >>>>>> On Jul 6, 2016, at 4:19 PM, frank
            <hengj...@uci.edu> wrote:
            >>>>>>
            >>>>>> Hi Barry,
            >>>>>>
            >>>>>> Thank you for you advice.
            >>>>>> I tried three test. In the 1st test, the grid is
            3072*256*768 and the process mesh is 96*8*24.
            >>>>>> The linear solver is 'cg' the preconditioner is
            'mg' and 'telescope' is used as the preconditioner at the
            coarse mesh.
            >>>>>> The system gives me the "Out of Memory" error
            before the linear system is completely solved.
            >>>>>> The info from '-ksp_view_pre' is attached. I seems
            to me that the error occurs when it reaches the coarse mesh.
            >>>>>>
            >>>>>> The 2nd test uses a grid of 1536*128*384 and
            process mesh is 96*8*24. The 3rd test uses the same grid but a different
            process mesh 48*4*12.
            >>>>>>     Are you sure this is right? The total matrix
            and vector memory usage goes from 2nd test
            >>>>>> Vector   384   383   8,193,712    0.
            >>>>>> Matrix   103   103   11,508,688   0.
            >>>>>> to 3rd test
            >>>>>> Vector   384   383   1,590,520   0.
            >>>>>> Matrix   103   103   3,508,664   0.
            >>>>>> that is the memory usage got smaller but if you
            have only 1/8th the processes and the same grid it should
            have gotten about 8 times bigger. Did you maybe cut the
            grid by a factor of 8 also? If so that still doesn't
            explain it because the memory usage changed by a factor
            of 5 something for the vectors and 3 something for the
            matrices.
            >>>>>>
            >>>>>>
            >>>>>> The linear solver and petsc options in 2nd and 3rd
            tests are the same in 1st test. The linear solver works
            fine in both test.
            >>>>>> I attached the memory usage of the 2nd and 3rd
            tests. The memory info is from the option '-log_summary'.
            I tried to use '-momery_info' as you suggested, but in my
            case petsc treated it as an unused option. It output
            nothing about the memory. Do I need to add sth to my code
            so I can use '-memory_info'?
            >>>>>>     Sorry, my mistake the option is -memory_view
            >>>>>>
            >>>>>>    Can you run the one case with -memory_view and
            -mg_coarse jacobi -ksp_max_it 1 (just so it doesn't
            iterate forever) to see how much memory is used without
            the telescope? Also run case 2 the same way.
            >>>>>>
            >>>>>>    Barry
            >>>>>>
            >>>>>>
            >>>>>>
            >>>>>> In both tests the memory usage is not large.
            >>>>>>
            >>>>>> It seems to me that it might be the 'telescope'
            preconditioner that allocated a lot of memory and caused
            the error in the 1st test.
            >>>>>> Is there is a way to show how much memory it
            allocated?
            >>>>>>
            >>>>>> Frank
            >>>>>>
            >>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote:
            >>>>>>    Frank,
            >>>>>>
            >>>>>>      You can run with -ksp_view_pre to have it
            "view" the KSP before the solve so hopefully it gets that
            far.
            >>>>>>
            >>>>>>       Please run the problem that does fit with
            -memory_info when the problem completes it will show the
            "high water mark" for PETSc allocated memory and total
            memory used. We first want to look at these numbers to
            see if it is using more memory than you expect. You could
            also run with say half the grid spacing to see how the
            memory usage scaled with the increase in grid points.
            Make the runs also with -log_view and send all the output
            from these options.
            >>>>>>
            >>>>>>     Barry
            >>>>>>
            >>>>>> On Jul 5, 2016, at 5:23 PM, frank
            <hengj...@uci.edu> wrote:
            >>>>>>
            >>>>>> Hi,
            >>>>>>
            >>>>>> I am using the CG ksp solver and Multigrid
            preconditioner  to solve a linear system in parallel.
            >>>>>> I chose to use the 'Telescope' as the
            preconditioner on the coarse mesh for its good performance.
            >>>>>> The petsc options file is attached.
            >>>>>>
            >>>>>> The domain is a 3d box.
            >>>>>> It works well when the grid is  1536*128*384 and
            the process mesh is 96*8*24. When I double the size of
            grid and                                keep the same
            process mesh and petsc options, I get an "out of memory"
            error from the super-cluster I am using.
            >>>>>> Each process has access to at least 8G memory,
            which should be more than enough for my application. I am
            sure that all the other parts of my code( except the
            linear solver ) do not use much memory. So I doubt if
            there is something wrong with the linear solver.
            >>>>>> The error occurs before the linear system is
            completely solved so I don't have the info from ksp view.
            I am not able to re-produce the error with a smaller
            problem either.
            >>>>>> In addition,  I tried to use the block jacobi as
            the preconditioner with the same grid and same
            decomposition. The linear solver runs extremely slow but
            there is no memory error.
            >>>>>>
            >>>>>> How can I diagnose what exactly cause the error?
            >>>>>> Thank you so much.
            >>>>>>
            >>>>>> Frank
            >>>>>> <petsc_options.txt>
            >>>>>>
            
            <ksp_view_pre.txt><memory_test2.txt><memory_test3.txt><petsc_options.txt>
            >>>>>>
            >>>>>
            >>>>
            >>>
            
            <ksp_view1.txt><ksp_view2.txt><ksp_view3.txt><memory1.txt><memory2.txt><petsc_options1.txt><petsc_options2.txt><petsc_options3.txt>
            >



! Purpose: Test the KSP solver with a 3D Poisson equation.
!
! Descrip: The domain is a periodic cube. The exact solution is sin(x)*sin(y)*sin(z).
!          The equation is discretized with a 2nd-order central difference.
!          The rhs of the equation is -3*sin(x)*sin(y)*sin(z) * (-h**2), where h is the space step.
!          In the matrix, the diagonal element is 6 and the other 6 non-zero elements in each row are -1.
!
! Usage:   The number of grid points in each dimension (N) is set by command line input.
!          The number of processes in each dimension (P) is the same in x, y and z, and is also set by command line input.
!          To run the code on 512^3 grid points with 16^3 cores:
!                 mpirun -n 4096 ./test_ksp.exe -N 512 -P 16
!          PETSc options are passed in a file "petsc_options.txt" in the working directory.
!
! Author:  Frank
!          hengj...@uci.edu
!
! Date:    09/14/2016
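!
! Note (added sketch of the discrete equation implied by the description above,
! with u(i,j,k) approximating sin(x)*sin(y)*sin(z)):
!
!   6*u(i,j,k) - u(i-1,j,k) - u(i+1,j,k) - u(i,j-1,k) - u(i,j+1,k)
!              - u(i,j,k-1) - u(i,j,k+1)  =  -h**2 * ( -3*sin(x)*sin(y)*sin(z) )
!
! i.e. the 2nd-order central-difference Poisson equation multiplied by -h**2,
! which matches the matrix entries (6 on the diagonal, -1 off-diagonal) and
! the rhs scaling used below.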

PROGRAM test_ksp



#include <petsc/finclude/petscdmdef.h> 
#include <petsc/finclude/petscmatdef.h>
#include <petsc/finclude/petscvecdef.h>
#include <petsc/finclude/petsckspdef.h>

  USE petscdmda
  USE petscsys

  IMPLICIT NONE

  INTEGER, PARAMETER :: pr = 8
  CHARACTER(*), PARAMETER :: petsc_options_file="petsc_options.txt"

  DM                :: decomp
  DMBoundaryType    :: BType(3)
  INTEGER           :: N                          ! length of cube 
  integer           :: P                          ! process # in each dimension
  INTEGER           :: is, js ,ks, nxl, nyl, nzl  ! subdomain index
  INTEGER           :: rank
  INTEGER           :: i, j, k
  REAL(pr)          :: h                          ! space step
  Vec               :: x,b                        ! solution and rhs
  REAL(pr), POINTER :: ptr_vec(:,:,:) => NULL()   ! pointer to petsc vector
  Mat               :: A
  MatNullSpace      :: nullspace
  MatStencil        :: row(4), col(4,7)
  PetscInt          :: i1 = 1, i7 = 7
  REAL(pr)          :: value(7)
  KSP               :: ksp
  PetscErrorCode    :: ierr
  REAL(pr)          :: q2 = 0.5_pr
  REAL(pr)          :: erri = 0, err1 = 0
  LOGICAL           :: is_converged, is_set
  

  CALL PetscInitialize( petsc_options_file, ierr ) 
  CALL MPI_Comm_rank( PETSC_COMM_WORLD, rank, ierr )

  ! default values
  N     = 64
  P     = 2
  BType = DM_BOUNDARY_PERIODIC

  ! read domain size and process# from command line
  CALL PetscOptionsGetInt( PETSC_NULL_OBJECT, PETSC_NULL_CHARACTER, '-N', N, is_set, ierr )
  CALL PetscOptionsGetInt( PETSC_NULL_OBJECT, PETSC_NULL_CHARACTER, '-P', P, is_set, ierr )

  CALL DMDACreate3d( PETSC_COMM_WORLD, BType(1), BType(2), BType(3), &
                   & DMDA_STENCIL_STAR, N, N, N, P, P, P, 1, 1, &
                   & PETSC_NULL_INTEGER, PETSC_NULL_INTEGER, PETSC_NULL_INTEGER, &
                   & decomp, ierr ) 
  CALL DMDAGetCorners( decomp, is, js, ks, nxl, nyl, nzl, ierr) 

  !WRITE(*,'(7I6)') rank, is, js, ks, nxl, nyl, nzl

  ! create vector for rhs and solution
  CALL DMCreateGlobalVector( decomp, b, ierr )
  CALL VecDuplicate( b, x, ierr )
  CALL VecSet( x, 0.0_pr, ierr )

  ! create matrix
  CALL DMSetMatType( decomp, MATAIJ, ierr )
  CALL DMCreateMatrix( decomp, A, ierr )

  ! create ksp solver
  CALL KSPCreate( PETSC_COMM_WORLD, ksp, ierr ) 
  CALL KSPSetDM( ksp, decomp, ierr )
  CALL KSPSetDMActive( ksp, PETSC_FALSE, ierr )
  CALL KSPSetFromOptions( ksp, ierr )

  ! create nullspace, periodic bc give the matrix a nullspace 
  CALL MatNullSpaceCreate( PETSC_COMM_WORLD, PETSC_TRUE, PETSC_NULL_INTEGER, &
                         & PETSC_NULL_INTEGER, nullspace, ierr )

  ! space step (must be set before it is used in the rhs below)
  h = 1.0_pr / N

  ! set rhs: f = -3*sin(x)*sin(y)*sin(z), scaled by -h**2
  CALL DMDAVecGetArrayF90( decomp, b, ptr_vec, ierr )
  DO i = is, is+nxl-1
    DO j = js, js+nyl-1
      DO k = ks, ks+nzl-1
        ptr_vec(i,j,k) = -3 * SIN((i+q2)*h) * SIN((j+q2)*h) * SIN((k+q2)*h)
      END DO
    END DO
  END DO
  ptr_vec = - h**2 * ptr_vec
 
  ! assembly rhs
  CALL VecAssemblyBegin( b, ierr )
  CALL VecAssemblyEnd( b, ierr )
  CALL DMDAVecRestoreArrayF90( decomp, b, ptr_vec, ierr )
  
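  ! Note: MatSetValuesStencil takes global (i,j,k) indices; since the DMDA was
  ! created with DM_BOUNDARY_PERIODIC, stencil indices such as i-1 = -1 or
  ! k+1 = N that fall outside [0, N-1] are wrapped across the periodic
  ! boundary automatically.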
  ! set matrix
  DO i = is, is+nxl-1
    DO j = js, js+nyl-1
      DO k = ks, ks+nzl-1

        ! row index of current point
        row(MatStencil_i) = i
        row(MatStencil_j) = j
        row(MatStencil_k) = k   

        ! column index of current point and its neighbors
        col(MatStencil_i,1) = i;    col(MatStencil_j,1) = j;    col(MatStencil_k,1) = k
        col(MatStencil_i,2) = i-1;  col(MatStencil_j,2) = j;    col(MatStencil_k,2) = k
        col(MatStencil_i,3) = i+1;  col(MatStencil_j,3) = j;    col(MatStencil_k,3) = k
        col(MatStencil_i,4) = i;    col(MatStencil_j,4) = j-1;  col(MatStencil_k,4) = k
        col(MatStencil_i,5) = i;    col(MatStencil_j,5) = j+1;  col(MatStencil_k,5) = k
        col(MatStencil_i,6) = i;    col(MatStencil_j,6) = j;    col(MatStencil_k,6) = k-1
        col(MatStencil_i,7) = i;    col(MatStencil_j,7) = j;    col(MatStencil_k,7) = k+1

        ! set values at current point and its neighbors
        value(1 ) =  6.0_pr
        value(2:) = -1.0_pr

        ! set the matrix's elements
        CALL MatSetValuesStencil( A, i1, row, i7, col, value, INSERT_VALUES, ierr )

      END DO
    END DO
  END DO

  ! assembly the matrix
  CALL MatAssemblyBegin( A, MAT_FINAL_ASSEMBLY, ierr )
  CALL MatAssemblyEnd( A, MAT_FINAL_ASSEMBLY, ierr )

  ! remove the nullspace
  CALL MatSetNullSpace( A, nullspace, ierr )
  CALL MatNullSpaceRemove( nullspace, b, PETSC_NULL_OBJECT, ierr )

  ! Solve system
  CALL KSPSetOperators( ksp, A, A, ierr )
  !CALL KSPSetReusePreconditioner( ksp, reuse_pc, ierr )
  CALL KSPSolve( ksp, b, x, ierr )
  !CALL KSPGetIterationNumber( ksp, niter, ierr ) 
  !CALL KSPGetConvergedReason( ksp, is_converged, ierr )

  ! get solution
  CALL VecAssemblyBegin( x, ierr )
  CALL VecAssemblyEnd( x, ierr )
  CALL DMDAVecGetArrayF90( decomp, x, ptr_vec, ierr )

  ! check the error
  DO i = is, is+nxl-1
    DO j = js, js+nyl-1
      DO k = ks, ks+nzl-1
        erri = MAX(erri, ABS(ptr_vec(i,j,k) - SIN((i+q2)*h)*SIN((j+q2)*h)*SIN((k+q2)*h)))
        err1 = err1 + ABS(ptr_vec(i,j,k) - SIN((i+q2)*h)*SIN((j+q2)*h)*SIN((k+q2)*h))
      END DO
    END DO
  END DO
  CALL DMDAVecRestoreArrayF90( decomp, x, ptr_vec, ierr )

  ! in-place reductions: MPI_IN_PLACE must be the send buffer
  CALL MPI_Allreduce( MPI_IN_PLACE, erri, 1, MPI_REAL8, MPI_MAX, PETSC_COMM_WORLD, ierr )
  CALL MPI_Allreduce( MPI_IN_PLACE, err1, 1, MPI_REAL8, MPI_SUM, PETSC_COMM_WORLD, ierr )

  IF( rank == 0 ) THEN 
    PRINT*, 'norm1 error: ', err1 / N**3
    PRINT*, 'norm inf error:', erri
  END IF

  ! clean up
  CALL VecDestroy( x, ierr )
  CALL VecDestroy( b, ierr )
  CALL MatDestroy( A, ierr )
  CALL MatNullSpaceDestroy( nullspace, ierr )
  CALL KSPDestroy( ksp, ierr )
  CALL DMDestroy( decomp, ierr )
  CALL PetscFinalize( ierr )

END PROGRAM test_ksp
