> On Jul 13, 2015, at 2:05 PM, Anthony Haas <[email protected]> wrote: > > Hi, > > I ran my program under Valgrind with 2 processors using the following: > > ${PETSC_DIR}/${PETSC_ARCH}/bin/mpiexec -n 2 valgrind --tool=memcheck > --leak-check=full --show-leak-kinds=all --num-callers=20 > --log-file=valgrind.log.%p ./BiGlobal_Spectral_Petsc.x -malloc off -ksp_type > preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist -eps_view > -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package > superlu_dist > > Note that in my program, I first use PETSc alone to solve a linear system of > equations (see module_baseflows.F90) and then later in the code I use SLEPc > to get the eigenvalues (see module_petsc.F90). My main program is called > BiGlobal_Spectral_Petsc.F90 > > I have attached the log files from Valgrind. It seems that quite a few errors > occur in pzgstrf. See valgrind.log.31371 and 31372. Do I have any control > over these errors?
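A note on that last question: if the reports really do originate inside the library rather than in your own code, Valgrind can at least be told to hide them so the rest of the log stays readable. A minimal sketch of a suppressions file, assuming the offending frames sit under pzgstrf; the entry name is a placeholder, and the actual error kind ("Cond" below is only an example) and frames should be copied from valgrind.log.31371/31372, e.g. by rerunning with --gen-suppressions=all:

   {
      suppress_pzgstrf_reports
      Memcheck:Cond
      ...
      fun:pzgstrf
   }

Saving such entries in a file (any name, e.g. superlu_dist.supp) and adding --suppressions=superlu_dist.supp to the valgrind command above filters those reports without affecting the checking of your own code.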
These are likely bugs in SuperLU_Dist. > Then starting from line 403 in valgrind.log.31371 and 458 in > valgrind.log.31372, I have a series of errors that seem to be related to a > VecCreateSeq and to a VecScatterCreateToZero (see in module_baseflows.F90). > Is there something I can do to avoid these errors? These are because you are not destroying all your vectors at the end. > > Thanks > > Anthony > > > > > > > > On 07/08/2015 01:49 PM, Barry Smith wrote: >> You should try running under valgrind, see: >> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >> >> You can also run in the debugger (yes this is tricky on a batch system >> but possible) and see exactly what triggers the floating point exception or >> when it "hangs" interrupt the program to see where it is "hanging". >> >> >> Barry >> >> >> >>> On Jul 8, 2015, at 12:34 PM, Anthony Paul Haas <[email protected]> >>> wrote: >>> >>> Hi, >>> >>> I have used the switch -mat_superlu_dist_parsymbfact in my pbs script. >>> However, although my program worked fine with sequential symbolic >>> factorization, I get one of the following 2 behaviors when I run with >>> parallel symbolic factorization (depending on the number of processors that >>> I use): >>> >>> 1) the program just hangs (it seems stuck in some subroutine ==> see >>> test.out-hangs) >>> 2) I get a floating point exception ==> see >>> test.out-floating-point-exception >>> >>> Note that as suggested in the Superlu manual, I use a power of 2 number of >>> procs. Are there any tunable parameters for the parallel symbolic >>> factorization? Note that when I build my sparse matrix, most elements I add >>> are nonzero of course but to simplify the programming, I also add a few >>> zero elements in the sparse matrix. I was thinking that maybe if the >>> parallel symbolic factorization proceed by block, there could be some >>> blocks where the pivot would be zero, hence creating the FPE?? >>> >>> Thanks, >>> >>> Anthony >>> >>> >>> >>> On Wed, Jul 8, 2015 at 6:46 AM, Xiaoye S. Li <[email protected]> wrote: >>> Did you find out how to change option to use parallel symbolic >>> factorization? Perhaps PETSc team can help. >>> >>> Sherry >>> >>> >>> On Tue, Jul 7, 2015 at 3:58 PM, Xiaoye S. Li <[email protected]> wrote: >>> Is there an inquiry function that tells you all the available options? >>> >>> Sherry >>> >>> On Tue, Jul 7, 2015 at 3:25 PM, Anthony Paul Haas <[email protected]> >>> wrote: >>> Hi Sherry, >>> >>> Thanks for your message. I have used superlu_dist default options. I did >>> not realize that I was doing serial symbolic factorization. That is >>> probably the cause of my problem. >>> Each node on Garnet has 60GB usable memory and I can run with 1,2,4,8,16 or >>> 32 core per node. >>> >>> So I should use: >>> >>> -mat_superlu_dist_r 20 >>> -mat_superlu_dist_c 32 >>> >>> How do you specify the parallel symbolic factorization option? is it >>> -mat_superlu_dist_matinput 1 >>> >>> Thanks, >>> >>> Anthony >>> >>> >>> On Tue, Jul 7, 2015 at 3:08 PM, Xiaoye S. Li <[email protected]> wrote: >>> For superlu_dist failure, this occurs during symbolic factorization. Since >>> you are using serial symbolic factorization, it requires the entire graph >>> of A to be available in the memory of one MPI task. How much memory do you >>> have for each MPI task? >>> >>> It won't help even if you use more processes. You should try to use >>> parallel symbolic factorization option. >>> >>> Another point. 
You set up process grid as: >>> Process grid nprow 32 x npcol 20 >>> For better performance, you should swap the grid dimensions. That is, it's >>> better to use 20 x 32; never give nprow larger than npcol. >>> >>> >>> Sherry >>> >>> >>> On Tue, Jul 7, 2015 at 1:27 PM, Barry Smith <[email protected]> wrote: >>> >>> I would suggest running a sequence of problems, 101 by 101, 111 by 111, >>> etc., and get the memory usage in each case (when you run out of memory you >>> can get NO useful information out about memory needs). You can then plot >>> memory usage as a function of problem size to get a handle on how much >>> memory it is using. You can also run on more and more processes (which >>> have a total of more memory) to see how large a problem you may be able to >>> reach. >>> >>> MUMPS also has an "out of core" version (which we have never used) that >>> could in theory let you get to large problems if you have lots of >>> disk space, but you are on your own figuring out how to use it. >>> >>> Barry >>> >>>> On Jul 7, 2015, at 2:37 PM, Anthony Paul Haas <[email protected]> >>>> wrote: >>>> >>>> Hi Jose, >>>> >>>> In my code, I first use PETSc to solve a linear system to get the baseflow >>>> (without using SLEPc) and then I use SLEPc to do the stability analysis of >>>> that baseflow. This is why there are some SLEPc options that are not used >>>> in test.out-superlu_dist-151x151 (when I am solving for the baseflow with >>>> PETSc only). I have attached a 101x101 case for which I get the >>>> eigenvalues. That case works fine. However, if I increase to 151x151, I get >>>> the error that you can see in test.out-superlu_dist-151x151 (similar error >>>> with mumps: see test.out-mumps-151x151, line 2918). If you look at the very >>>> end of the files test.out-superlu_dist-151x151 and test.out-mumps-151x151, >>>> you will see that the last info message printed is: >>>> >>>> On Processor (after EPSSetFromOptions) 0 memory: 0.65073152000E+08 >>>> =====> (see line 807 of module_petsc.F90) >>>> >>>> This means that the memory error probably occurs in the call to EPSSolve >>>> (see module_petsc.F90 line 810). I would like to evaluate how much memory >>>> is required by the most memory-intensive operation within EPSSolve. Since >>>> I am solving a generalized EVP, I would imagine that it would be the LU >>>> decomposition. But is there an accurate way of doing it? >>>> >>>> Before starting with iterative solvers, I would like to exploit direct >>>> solvers as much as I can. I tried GMRES with the default preconditioner at some >>>> point but I had convergence problems. What solver/preconditioner would you >>>> recommend for a generalized non-Hermitian (EPS_GNHEP) EVP? >>>> >>>> Thanks, >>>> >>>> Anthony >>>> >>>> On Tue, Jul 7, 2015 at 12:17 AM, Jose E. Roman <[email protected]> wrote: >>>> >>>> On 07/07/2015, at 02:33, Anthony Haas wrote: >>>> >>>>> Hi, >>>>> >>>>> I am computing eigenvalues using PETSc/SLEPc and superlu_dist for the LU >>>>> decomposition (my problem is a generalized eigenvalue problem). The code >>>>> runs fine for a grid with 101x101 but when I increase to 151x151, I get >>>>> the following error: >>>>> >>>>> Can't expand MemType 1: jcol 16104 (and then [NID 00037] 2015-07-06 >>>>> 19:19:17 Apid 31025976: OOM killer terminated this process.) >>>>> >>>>> It seems to be a memory problem. I monitor the memory usage as far as I >>>>> can and it seems that memory usage is pretty low.
The most memory >>>>> intensive part of the program is probably the LU decomposition in the >>>>> context of the generalized EVP. Is there a way to evaluate how much >>>>> memory will be required for that step? I am currently running the debug >>>>> version of the code which I would assume would use more memory? >>>>> >>>>> I have attached the output of the job. Note that the program uses twice >>>>> PETSc: 1) to solve a linear system for which no problem occurs, and, 2) >>>>> to solve the Generalized EVP with SLEPc, where I get the error. >>>>> >>>>> Thanks >>>>> >>>>> Anthony >>>>> <test.out-superlu_dist-151x151> >>>> In the output you are attaching there are no SLEPc objects in the report >>>> and SLEPc options are not used. It seems that SLEPc calls are skipped? >>>> >>>> Do you get the same error with MUMPS? Have you tried to solve linear >>>> systems with a preconditioned iterative solver? >>>> >>>> Jose >>>> >>>> >>>> <module_petsc.F90><test.out-mumps-151x151><test.out_superlu_dist-101x101><test.out-superlu_dist-151x151> >>> >>> >>> >>> >>> >>> <test.out-hangs><test.out-floating-point-exception> > > > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !! SPECTRAL TEMPORAL BIGLOBAL SOLVER ! > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > > PROGRAM Biglobal_Spectral > > USE PETSC > USE TYPEDEF > USE MAPPINGS > USE BASEFLOW > USE WRITE_PRINT > USE FOURIER_CHEBYSHEV > > IMPLICIT NONE > > include 'mpif.h' > > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -! > ! Variables Definition ! > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -! > > !...Testing and dummy variables > real(dp) :: dummy > > !...Problem parameters > integer(i4b),parameter :: nx=11,ny=11,nxy=nx*ny,N=4*nxy > > !Grid size for 3D Poiseuille baseflow > integer(i4b),parameter :: nnx=nx,nny=ny > > real(dp) :: Re,alpha,beta,Lx,xmin,xmax,ymin,ymax,AA,xll,yll > > !...Problem Flags > integer(i4b) :: fourier,testcase,formulation,algorithm > > !Mapping Flags > integer(i4b) :: map,mapx,mapy > > !...Indices > integer(i4b) :: i,j,iii,jjj,ij,skip,imin,imax,jmin,jmax > > !...Output files > integer(i4b) :: file_num > character(128) :: filetype,filename,file_name > > !...Boundary conditions variables > integer(i4b) :: ct1,ct2,ct3,ct4 > integer(i4b),dimension(:),allocatable :: fst_bc,wall_bc,outlet_bc,inlet_bc > integer(i4b),dimension(:),allocatable :: bcond_px,bcond_py,bcond_pz > integer(i4b),dimension(:),allocatable :: bcond_u,bcond_v,bcond_w,bcond_p > integer(i4b),dimension(:),allocatable :: bcond > > !...Chebyshev related > real(dp),dimension(:),allocatable :: xi,yi,x,y,xfourier > > !...Baseflow variables > real(dp),dimension(:,:),allocatable :: usol,uxsol,uysol > real(dp),dimension(:,:),allocatable :: vsol,vxsol,vysol > real(dp),dimension(:,:),allocatable :: wsol,wxsol,wysol > > !For Hiemenz Baseflow > integer(i4b) :: npts_hz > real(dp) :: ymax_hz,epstol,si > > real(dp),dimension(:),allocatable :: eta > real(dp),dimension(:),allocatable :: u0,v0,v1,v2,v3 > real(dp),dimension(:),allocatable :: w0,w1,w2 > real(dp),dimension(:),allocatable :: etarev,u0rev,v0rev,w0rev > > real(dp),dimension(:,:),allocatable :: Ubar,Vbar,Wbar > > !...Grid variables > real(dp),dimension(:,:),allocatable :: xx,yy,xx1,yy1 > > !...Vectorization array > integer(i4b),dimension(:),allocatable :: li > > !...MPI > integer(i4b) :: rank,size,ierr > > !...Spline interpolation > !real(dp),dimension(:),allocatable :: bsp,csp,dsp > !real(dp),dimension(:),allocatable :: interp_u0,interp_v0,interp_w0 > > real(dp) :: seval > > 
!...External Subroutines > external seval > > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -! > ! Beginning of program ! > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -! > > call MPI_INIT(ierr) > > call MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr) > call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr) > > if (rank==0) call cpu_time(start_total) > > !...Parameters for BiGlobal Computation > > testcase=2 ! 1 --> Plane Poiseuille Flow > ! 2 --> 3D Poiseuille Flow > ! 3 --> Attachment line flow (Hiemenz) > ! 4 --> Laminar Separation Bubble > ! 5 --> Artificially parallel boundary-layer (cf. > Rodriguez thesis) > > formulation = 1 ! 1 --> x,y inhomogeneous > ! 2 --> y,z inhomogeneous > > algorithm = 1 ! 1 --> QZ algorith > ! 2 --> Arnoldi algorithm > > > !...Arrays allocation > > allocate(li(nx)) > allocate(xi(nx),yi(ny)) > allocate(x(nx),y(ny)) > allocate(xfourier(nx)) > > allocate(uvec(nxy),uxvec(nxy),uyvec(nxy)) > allocate(vvec(nxy),vxvec(nxy),vyvec(nxy)) > allocate(wvec(nxy),wxvec(nxy),wyvec(nxy)) > > allocate(xx(nx,ny),yy(nx,ny)) > allocate(D1X(nx,nx),D2X(nx,nx)) > allocate(D1Y(ny,ny),D2Y(ny,ny)) > allocate(DMX(nx,nx,2),DMY(ny,ny,2)) > > !allocate(interp_u0(ny),interp_v0(ny),interp_w0(ny)) > > li = 0 > xi = 0.0d0 > yi = 0.0d0 > x = 0.0d0 > y = 0.0d0 > xfourier = 0.0d0 > uvec = 0.0d0 > uxvec = 0.0d0 > uyvec = 0.0d0 > vvec = 0.0d0 > vxvec = 0.0d0 > vyvec = 0.0d0 > wvec = 0.0d0 > wxvec = 0.0d0 > wyvec = 0.0d0 > xx = 0.0d0 > yy = 0.0d0 > D1X = 0.0d0 > D2X = 0.0d0 > D1Y = 0.0d0 > D2Y = 0.0d0 > DMX = 0.0d0 > DMY = 0.0d0 > > !...Test Cases Parameters > > if ( testcase == 2 ) then > > mapx=1 > mapy=0 > > Re=1000 > > if (formulation == 1)then > beta = pi > elseif (formulation == 2) then > alpha = pi > endif > > AA=1.0 > > xmin=-AA > xmax=AA > > ymin=-1.0 > ymax=1.0 > > !grid for baseflow > !nnx=nx > !nny=ny > > endif > > !...Working Array for Converting 2D Indices to 1D > > li=0 > > do i=1,nx > li(i)=(i-1)*ny > enddo > > !...Get Collocation Points and Chebyshev Differentiation Matrices > > call cheby(nx-1,xi,DMX) ! (cf. Spectral Methods in Matlab pages 53-55) > call cheby(ny-1,yi,DMY) > > xll=0 > > ! To get matrices D1X,D2X,D1Y,D2Y > call set_mappings(nx,ny,mapx,mapy,xmin,xmax,ymin,ymax,xi,yi,xll,yll,x,y) > > do i=1,nx > do j=1,ny > > xx(i,j)=x(i) ! not equivalent to matlab meshgrid!!! > yy(i,j)=y(j) ! not equivalent to matlab meshgrid!!! 
> > enddo > enddo > > > !...Compute Baseflow (3D Poiseuille) > > if ( testcase == 2 ) then > > allocate(xx1(nnx,nny),yy1(nnx,nny)) > > allocate(usol(nnx,nny),uxsol(nnx,nny),uysol(nnx,nny)) > allocate(vsol(nnx,nny),vxsol(nnx,nny),vysol(nnx,nny)) > allocate(wsol(nnx,nny),wxsol(nnx,nny),wysol(nnx,nny)) > > xx1 = 0.0d0 > yy1 = 0.0d0 > usol = 0.0d0 > uxsol = 0.0d0 > uysol = 0.0d0 > vsol = 0.0d0 > vxsol = 0.0d0 > vysol = 0.0d0 > wsol = 0.0d0 > wxsol = 0.0d0 > wysol = 0.0d0 > > if (rank == 0) then > WRITE(*,*) > WRITE(*,'(A,I3,A,I3)')'3D POISEUILLE BASEFLOW COMPUTED ON GRID WITH > NX =',nnx,' NY =',nny > WRITE(*,*) > call cpu_time(start) > endif > > !currently I am taking same grid for baseflow as for biglobal ==> no > interpolation > call Poiseuille_3D_PETSC(nnx,nny,AA,usol,xx1,yy1) > > if (rank==0) then > call cpu_time(finish) > print '("Time for baseflow computation = ",f15.4," > seconds.")',finish-start > > file_num=10 > file_name='check_baseflow_3D_poiseuille.dat' > > call > write_to_tecplot(file_num,file_name,nnx,nny,xx1,yy1,usol,usol,usol) > > endif > > vsol=0*usol > wsol=0*usol > > if (formulation==1) then > > wsol=usol > usol=0.*wsol > vsol=0.*wsol > > endif > > !Compute derivatives > uxsol = MATMUL(D1X,usol) > uysol = MATMUL(usol,TRANSPOSE(D1Y)) > > vxsol = MATMUL(D1X,vsol) > vysol = MATMUL(vsol,TRANSPOSE(D1Y)) > > wxsol = MATMUL(D1X,wsol) > wysol = MATMUL(wsol,TRANSPOSE(D1Y)) > > !Vectorize the matrices > do i=1,nx > do j=1,ny > > ij=li(i)+j > > uvec(ij) = usol(i,j) > vvec(ij) = vsol(i,j) > wvec(ij) = wsol(i,j) > > uxvec(ij) = uxsol(i,j) > vxvec(ij) = vxsol(i,j) > wxvec(ij) = wxsol(i,j) > > uyvec(ij) = uysol(i,j) > vyvec(ij) = vysol(i,j) > wyvec(ij) = wysol(i,j) > > enddo > enddo > > endif > > !call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr) > > !...Build Matrices, Set BCs and Solve Problem > > call SetupSolveGEVP(nx,ny,nxy,alpha,beta,Re) > > if (rank==0) then > call cpu_time(finish_total) > print '("Total time = ",f15.4," seconds.")',finish_total-start_total > endif > > deallocate(li) > deallocate(xi,yi) > deallocate(x,y) > deallocate(xfourier) > > deallocate(uvec,uxvec,uyvec) > deallocate(vvec,vxvec,vyvec) > deallocate(wvec,wxvec,wyvec) > > deallocate(xx,yy) > deallocate(D1X,D2X) > deallocate(D1Y,D2Y) > deallocate(DMX,DMY) > > deallocate(xx1,yy1) > > deallocate(usol,uxsol,uysol) > deallocate(vsol,vxsol,vysol) > deallocate(wsol,wxsol,wysol) > > call MPI_FINALIZE(ierr) > > > END PROGRAM Biglobal_Spectral > > MODULE BASEFLOW > > USE TYPEDEF > USE WRITE_PRINT > USE FOURIER_CHEBYSHEV > > IMPLICIT NONE > > CONTAINS > > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
> > SUBROUTINE Poiseuille_3D_PETSC(nx,ny,AA,usol,xx,yy) > > #include <petsc/finclude/petscsys.h> > #include <petsc/finclude/petscvec.h> > #include <petsc/finclude/petscmat.h> > #include <petsc/finclude/petscmat.h90> > #include <petsc/finclude/petscvec.h90> > #include <petsc/finclude/petscpc.h> > #include <petsc/finclude/petscksp.h> > #include <petsc/finclude/petscviewer.h> > #include <petsc/finclude/petscviewer.h90> > > #include <slepc/finclude/slepcsys.h> > #include <slepc/finclude/slepceps.h> > > !...Regular Variables > > integer(i4b),intent(in) :: nx,ny > > real(dp),intent(in) :: AA > real(dp),dimension(nx,ny),intent(inout) :: usol,xx,yy > > integer(i4b) :: i,j,ij > integer(i4b) :: rank,size,ierr,source > integer(i4b) :: nxy,nxm2,nym2,nxym2 > > integer(i4b),dimension(nx) :: li > integer(i4b),dimension(nx-2) :: lim2 > > integer(i4b),dimension(:),allocatable :: loc > > real(dp) :: ymin,ymax,xmin,xmax > real(dp) :: dxidx,d2xidx2,dyidy,d2yidy2 > > real(dp),dimension(nx) :: xi,x > real(dp),dimension(ny) :: yi,y > > real(dp),dimension(nx,nx) :: D2XB > real(dp),dimension(ny,ny) :: D2YB > real(dp),dimension(nx,nx,2) :: DMXB > real(dp),dimension(ny,ny,2) :: DMYB > > real(dp),dimension(nx*ny) :: usol_vec > real(dp),dimension( nx-2,ny-2 ) :: sol_mat > > complex(dpc) :: U00 > complex(dpc) :: atest > complex(dpc),dimension(nx-2,nx-2) :: D2XBs > complex(dpc),dimension(ny-2,ny-2) :: D2YBs > complex(dpc),dimension( nx-2,ny-2 ) :: sol_mat_cmplx > complex(dpc),dimension(:),allocatable :: xwork1 > > !!$ allocate(li(nx)) > !!$ allocate(lim2(nx-2)) > > !!! NEED TO TRANSFORM ALL THESE ARRAYS TO ALLOCATABLE!!!!!! > > !...Petsc Variables > PetscInt ii,jj,ione,ct > PetscInt jmin,jmax > PetscInt Istart,Iend > PetscInt IstartVec,IendVec > > PetscReal tol > PetscScalar aone > > PetscViewer viewer > > PC pc > KSP ksp > VecScatter ctx > Mat DXX,DYY > !Mat DLOAD > Vec frhs,frhs2 > Vec sol,sol_seq > > > !For VecGetArrayReadF90 > !PetscScalar,pointer :: xx_v(:) !Fortran90 pointer to the array > PetscOffset i_x > PetscScalar sol_array(1) > > !...MPI > !PetscMPIInt rank,size > > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -! > ! Beginning of program ! > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -! > > call MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr) > call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr) > > call PetscInitialize(PETSC_NULL_CHARACTER,ierr) > > !call PetscOptionsInsertString("-mat_superlu_dist_parsymbfact",ierr) > > !...Check if nz and ny are odd (necessary to set max velocity to 1) > > if ( mod(nx,2)==0 .or. mod(ny,2)==0 ) then > write(*,*) > STOP'STOPPING... IN 3D POISEUILLE BASEFLOW NZ,NY NEED TO BE ODD' > endif > > !...Set Domain Sizes and Initialize other variables > > xmin=-AA > xmax=AA > > ymin=-1 > ymax=1 > > ione=1 > > nxy=nx*ny > > !Variables to Build DXX and DYY > nxm2=nx-2 > nym2=ny-2 > nxym2 = nxm2*nym2 > > !...Mat to Vec and Vec to Mat conversion > > lim2=0 > do i=1,nxm2 > lim2(i) = (i-1)*nym2 > enddo > > !...Get collocation points and chebyshev differentiation matrices > > call cheby(nx-1,xi,DMXB) > call cheby(ny-1,yi,DMYB) > > ! x-mapping from [-1,1]->[-XMAX,XMAX] > x=xmax*xi > > ! 
y-mapping (from [-1,1]->[-1,1]) > y=yi > > !Compute the metric terms > dxidx = 1/xmax > d2xidx2 = 0 > > dyidy = 1 > d2yidy2 = 0 > > !Scale the differentiation matrix (due to the use of the mapping) > > !x-direction > D2XB = d2xidx2*DMXB(:,:,1) + dxidx**2*DMXB(:,:,2) > D2XBs = D2XB(2:nx-1,2:nx-1) > > !y-direction > D2YB = d2yidy2*DMYB(:,:,1) + dyidy**2*DMYB(:,:,2) > D2YBs = D2YB(2:ny-1,2:ny-1) > > xx=0.0 > yy=0.0 > > do i=1,nx > do j=1,ny > > xx(i,j)=x(i) > yy(i,j)=y(j) > > enddo > enddo > > !...Create Petsc right-hand-side vector and derivation matrices > call VecCreateSeq(PETSC_COMM_SELF,nxym2,sol_seq,ierr) > call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nxym2,frhs,ierr) ! right > hand side vector > call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nxym2,frhs2,ierr) ! right > hand side vector # 2 > > call > MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,nxym2,nxym2,nx,PETSC_NULL_INTEGER,nx,PETSC_NULL_INTEGER,DXX,ierr) > call > MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,nxym2,nxym2,ny,PETSC_NULL_INTEGER,ny,PETSC_NULL_INTEGER,DYY,ierr) > > > !...Determine which rows of the vector/matrix are locally owned. > call VecGetOwnershipRange(frhs,IstartVec,IendVec,ierr) > call MatGetOwnershipRange(DXX,Istart,Iend,ierr) > > if ( IstartVec /= Istart .or. IendVec /= Iend ) then > stop 'Problem in Baseflow Computation' > endif > > !...Fill in Sequential and Parallel Vectors > allocate(xwork1(Iend-Istart)) > allocate(loc(Iend-Istart)) > > ct=0 > do i=Istart,Iend-1 > ct=ct+1 > loc(ct)=i > xwork1(ct)=(-2.0d0,0.0d0) > enddo > > call VecSetValues(frhs,Iend-Istart,loc,xwork1,INSERT_VALUES,ierr) > call VecZeroEntries(sol_seq,ierr) > > !...Assemble Vectors > call VecAssemblyBegin(frhs,ierr) > call VecAssemblyEnd(frhs,ierr) > > call VecAssemblyBegin(sol_seq,ierr) > call VecAssemblyEnd(sol_seq,ierr) > > call VecDuplicate(frhs,sol,ierr) ! parallel solution vector > > !...Display vector > !call VecView(frhs,PETSC_VIEWER_STDOUT_WORLD,ierr) ! ==> seems very slow > > !...Build DXX in Parallel > > !!$ if (rank==0) then > !!$ call print_matrix('D2XB',nx,nx,D2XB) > !!$ call print_matrix('D2YB',ny,ny,D2YB) > !!$ > !!$ call print_matrix_cmplx('D2XBs',nx-2,nx-2,D2XBs) > !!$ call print_matrix_cmplx('D2YBs',ny-2,ny-2,D2YBs) > !!$ endif > > if (rank == 0) call cpu_time(start) > > do i=Istart,Iend-1 > > ii=1+floor((i+0.01)/nym2) > jj=1 > > jmin=mod(i,nym2) > jmax=jmin+(nxm2-1)*nym2 > > do j=jmin,jmax,nym2 > > call MatSetValues(DXX,ione,i,ione,j,D2XBs(ii,jj),INSERT_VALUES,ierr) > jj=jj+1 > > enddo > > enddo > > !...Build DYY in Parallel > > do i=Istart,Iend-1 > > ii=mod(i,nym2)+1 > jj=1 > > jmin=floor((i+0.01)/nym2)*nym2 > > do j=jmin,jmin+(nym2-1) > > call MatSetValues(DYY,ione,i,ione,j,D2YBs(ii,jj),INSERT_VALUES,ierr) > jj=jj+1 > > enddo > > enddo > > !...Assemble DXX and DYY > > call MatAssemblyBegin(DXX,MAT_FINAL_ASSEMBLY,ierr) > call MatAssemblyEnd(DXX,MAT_FINAL_ASSEMBLY,ierr) > > call MatAssemblyBegin(DYY,MAT_FINAL_ASSEMBLY,ierr) > call MatAssemblyEnd(DYY,MAT_FINAL_ASSEMBLY,ierr) > > if (rank==0) then > call cpu_time(finish) > print '("Time for build and assembly of baseflow operator = ",f15.4," > seconds.")',finish-start > endif > > !call MatView(DXX,PETSC_VIEWER_STDOUT_WORLD,ierr) > !call MatView(DYY,PETSC_VIEWER_STDOUT_WORLD,ierr) > > aone = (1.0d0,0.0d0) > > if (rank == 0) call cpu_time(start) > call MatAXPY(DXX,aone,DYY,DIFFERENT_NONZERO_PATTERN,ierr) ! 
on exit DXX = > a*DYY + DXX > if (rank==0) then > call cpu_time(finish) > print '("Time for Matrix summation = ",f15.4," seconds.")',finish-start > > endif > > > !...Write Matrix to ASCII and Binary Format > !!$ call PetscViewerASCIIOpen(PETSC_COMM_WORLD,"Amat.m",viewer,ierr) > !!$ call MatView(DXX,viewer,ierr) > !!$ call PetscViewerDestroy(viewer,ierr) > !!$ > !!$ call > PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_WRITE,viewer,ierr) > !!$ call MatView(DXX,viewer,ierr) > !!$ call PetscViewerDestroy(viewer,ierr) > > !...Load a Matrix in Binary Format > !!$ call > PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_READ,viewer,ierr) > !!$ call MatCreate(PETSC_COMM_WORLD,DLOAD,ierr) > !!$ call MatSetType(DLOAD,MATAIJ,ierr) > !!$ call MatLoad(DLOAD,viewer,ierr) > !!$ call PetscViewerDestroy(viewer,ierr) > > !call MatTranspose(DXX,MatReuse reuse,Mat *B) > !call MatView(DXX,PETSC_VIEWER_STDOUT_WORLD,ierr) > > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !! SOLVE POISSON EQUATION FOR THE FIRST TIME !! > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > > !...Create linear solver context > call KSPCreate(PETSC_COMM_WORLD,ksp,ierr) > > !...Set operators. Here the matrix that defines the linear system also serves > as the preconditioning matrix. > !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) !aha > commented and replaced by next line > call KSPSetOperators(ksp,DXX,DXX,ierr) ! remember: here DXX=DXX+DYY > > !call KSPSetOperators(ksp,DLOAD,DLOAD,ierr) ! remember: here DXX=DXX+DYY > > tol = 1.e-10 > !call > KSPSetTolerances(ksp,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr) > ! set relative tolerance and uses defualt for absolute and divergence tol > call > KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr) ! > set relative and absolute tolerances and uses defualt for divergence tol > > !...Set the Direct (LU) Solver > !call KSPSetType(ksp,KSPPREONLY,ierr) > !call KSPGetPC(ksp,pc,ierr) > !call PCSetType(pc,PCLU,ierr) > !call PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS,ierr) ! > MATSOLVERSUPERLU_DIST > > !...Set runtime options, e.g., -ksp_type <type> -pc_type <type> -ksp_monitor > -ksp_rtol <rtol> > !These options will override those specified above as long as > KSPSetFromOptions() is called _after_ > !any other customization routines. 
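> !   For reference, the concrete runtime options used earlier in this thread were
> !     -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist
> !   for this baseflow solve, and
> !     -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package superlu_dist
> !   (optionally with -mat_superlu_dist_parsymbfact) for the SLEPc solve later in the listing.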
> > call KSPSetFromOptions(ksp,ierr) > > !...Sets an ADDITIONAL function to be called at every iteration to monitor > the residual/error etc > call > KSPMonitorSet(ksp,KSPMonitorTrueResidualNorm,PETSC_NULL_OBJECT,PETSC_NULL_FUNCTION,ierr) > > > !...Solve the linear system > call KSPSolve(ksp,frhs,sol,ierr) > > !...View solver info (could use -ksp_view instead) > > call KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD,ierr) > > !...Extract Maximum Value (at center of duct) from Solution Vector > > call VecScatterCreateToZero(sol,ctx,sol_seq,ierr) > !scatter as many times as you need > call VecScatterBegin(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr) > call VecScatterEnd(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr) > !destroy scatter context and local vector when no longer needed > call VecScatterDestroy(ctx,ierr) > > !call VecView(sol_seq,PETSC_VIEWER_STDOUT_WORLD,ierr) > > call VecGetArray(sol_seq,sol_array,i_x,ierr) > > if (rank==0) then > > sol_mat_cmplx=0.0d0 > > do i=1,nxm2 > do j=1,nym2 > ij=lim2(i)+j > sol_mat_cmplx(i,j) = sol_array(i_x + ij) > enddo > enddo > > U00 = sol_mat_cmplx((nxm2+1)/2,(nym2+1)/2) > > endif > > source=0 > Call MPI_Bcast(U00,1,MPI_DOUBLE_COMPLEX,source,MPI_COMM_WORLD,ierr) > > !call VecGetArrayReadF90(sol,xx_v,ierr) ! requires #include > <petsc/finclude/petscvec.h90> > !atest = xx_v(3) > !call VecRestoreArrayReadF90(sol,xx_v,ierr) > !write(*,*)'atest',atest > > call VecRestoreArray(sol_seq,sol_array,i_x,ierr) > > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !! SOLVE POISSON EQUATION FOR THE SECOND TIME !! > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > > !...Fill in New Right-Hand-Side Vector (frhs2) > call VecSetValues(frhs2,Iend-Istart,loc,xwork1/U00,INSERT_VALUES,ierr) > > !...Assemble New RHS Vector > call VecAssemblyBegin(frhs2,ierr) > call VecAssemblyEnd(frhs2,ierr) > > !...Solve System > call KSPSolve(ksp,frhs2,sol,ierr) > > !...Get Final Solution > call VecScatterCreateToZero(sol,ctx,sol_seq,ierr) > call VecScatterBegin(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr) > call VecScatterEnd(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr) > call VecScatterDestroy(ctx,ierr) > > call VecGetArray(sol_seq,sol_array,i_x,ierr) > > if (rank==0) then > > usol=0.0d0 > sol_mat_cmplx=0.0d0 > > do i=1,nxm2 > do j=1,nym2 > ij=lim2(i)+j > sol_mat_cmplx(i,j) = sol_array(i_x + ij) > enddo > enddo > > usol(2:nx-1,2:ny-1) = dble(sol_mat_cmplx) > > endif > > call VecRestoreArray(sol_seq,sol_array,i_x,ierr) > > !...Send usol to all other processes (vectorize, broadcast and matrixize) > li=0 > do i=1,nx > li(i) = (i-1)*ny > enddo > > if (rank==0) then > > do i=1,nx > do j=1,ny > ij=li(i)+j > usol_vec(ij)=usol(i,j) > enddo > enddo > > endif > > call MPI_Bcast(usol_vec,nx*ny,MPI_DOUBLE,source,MPI_COMM_WORLD,ierr) > > if (rank/=0) then > > do i=1,nx > do j=1,ny > ij=li(i)+j > usol(i,j)=usol_vec(ij) > enddo > enddo > > endif > > !...Free work space. All PETSc objects should be destroyed when they are no > longer needed. > > deallocate(loc) > deallocate(xwork1) > > call KSPDestroy(ksp,ierr) > call VecDestroy(frhs,ierr) > call VecDestroy(frhs2,ierr) > call VecDestroy(sol,ierr) > call VecDestroy(sol_seq,ierr) > call MatDestroy(DXX,ierr) > call MatDestroy(DYY,ierr) > > call PetscFinalize(ierr) > > > END SUBROUTINE Poiseuille_3D_PETSC > > > > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > > > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !!!!!!! 
HIEMENZ FLOW SOLVER !!!!!!!! > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > > > !************************************************************************************************* > ! on Veltins: ifort hiemenz.f90 -convert big_endian -r8 -i4 -O3 -132 -o > hiemenz.x > ! > ************************************************************************************************ > !program hiemenz > > subroutine hiemenz(ymax,eps,n,si,eta,u0,v0,v1,v2,v3,w0,w1,w2) > > implicit none > > real(dp) eps,ymax,si > integer(i4b) n,i > > ! real(dp),allocatable,dimension(:) :: eta > ! real(dp),allocatable,dimension(:) :: u0,v0,v1,v2,v3 > ! real(dp),allocatable,dimension(:) :: w0,w1,w2 > > real(dp),dimension(n) :: eta > real(dp),dimension(n) :: u0,v0,v1,v2,v3 > real(dp),dimension(n) :: w0,w1,w2 > > write(*,*) > write(*,*) > '*******************************************************************' > write(*,*) ' HIEMENZ SOLVER > ' > write(*,*) > '*******************************************************************' > write(*,*) > > ! > !..... Numerical Parameters > ! > > !Domain height and number of points > > !ymax=100 > !n=500001 > > !Tolerance for Newton iteration > > !eps=1e-12 > > !Initial guess for Newton iteration: f''(0) > > !si = -1.232587656820134 > > ! > !..... Allocate arrays > ! > ! allocate(eta(n)) > ! allocate(u0(n),v0(n),v1(n),v2(n),v3(n)) > ! allocate(w0(n),w1(n),w2(n)) > > ! > !..... Compute solution > ! > > call hiemenz_fct(ymax,eps,n,si,eta,u0,v0,v1,v2,v3,w0,w1,w2) > > ! > !..... Write out solution in ASCII format > ! > > write(*,*) > write(*,*) 'Writing solution to hiemenz.dat' > write(*,*) > > open(unit=15,file='hiemenz.dat',form='formatted') > > do i=1,n > write(15,'(9E21.11)') > eta(i),u0(i),v0(i),v1(i),v2(i),v3(i),w0(i),w1(i),w2(i) > end do > > close(15) > > > > end subroutine hiemenz > > > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > > > subroutine hiemenz_fct(ymax,eps,n,si,eta,u0,v0,v1,v2,v3,w0,w1,w2) > > implicit none > > integer(i4b) i,iter,n,j > real(dp) ymax,res,p,eps,s,si > > real(dp),allocatable,dimension(:,:) :: v,w > > real(dp) eta(n) > real(dp) u0(n),v0(n),v1(n),v2(n),v3(n) > real(dp) w0(n),w1(n),w2(n) > > !external w_ode > > write(*,*) 'eps=',eps > write(*,*) 'ymax=',ymax > write(*,*) 'N=',n > write(*,*) > > ! > ! Allocate arrays > ! > allocate(v (6,n)) > allocate(w (5,n)) > > ! > ! Create grid > ! > do i=1,n > eta(i) = ymax * real(i-1)/real(n-1) > end do > ! > ! Newton-Raphson iteration for v > ! > iter = 0 > s = si > res = 1. > write(*,*) 'Newton-Raphson iteration for v: s=',s > > do while (res.gt.eps) > > iter=iter+1 > > call hiemenz_eval(s,eta,n,v) > > s = s - (v(2,n) + 1.0) / v(5,n) > res = abs(v(2,n) + 1.0) > > write(*,*) iter,s,res > > end do > > write(*,*) 'Number of iterations ',iter > write(*,*) > write(*,*) "v''(0)=",v(3,1) > > ! > ! Now compute w > ! > w(1,1) = 0. > w(2,1) = v(3,1) > > w(3,1) = 0. > w(4,1) = 0. > w(5,1) = v(3,1) > > do i=2,n > call rk4(w(1,i-1),w(1,i),eta(i)-eta(i-1),5,w_ode) > end do > > ! > ! Rescale w to match boundary conditions at infinity > ! > p = 1./w(1,n) > > do i=1,n > w(1,i) = w(1,i) * p > w(2,i) = w(2,i) * p > end do > > write(*,*) "w'(0)=",w(2,1) > > ! > ! Now compute v''', w'' from the RHS and copy all values to new arrays > ! 
> > do i=1,n > > u0(i) = -v(2,i) > v0(i) = v(1,i) > v1(i) = v(2,i) > v2(i) = v(3,i) > v3(i) = -v(2,i)*v(2,i) + v(1,i)*v(3,i) + 1.0 > w0(i) = w(1,i) > w1(i) = w(2,i) > w2(i) = w(2,i)*v(1,i) > > end do > > > deallocate(v,w) > > end subroutine hiemenz_fct > > > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > > > subroutine hiemenz_eval(s,eta,n,f) > > implicit none > > integer(i4b) n,i > real(dp) s > real(dp) eta(n),f(6,n) > > !external v_ode > > f(1,1) = 0. > f(2,1) = 0. > f(3,1) = s > > f(4,1) = 0. > f(5,1) = 0. > f(6,1) = 1. > > do i=2,n > call rk4(f(1,i-1),f(1,i),eta(i)-eta(i-1),6,v_ode) > end do > > end subroutine hiemenz_eval > > > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > > > subroutine v_ode(y,dy) > > implicit none > > real(dp) y(6),dy(6) > > dy(1) = y(2) > dy(2) = y(3) > dy(3) = -y(2)*y(2) + y(1)*y(3) + 1.0 > > dy(4) = y(5) > dy(5) = y(6) > dy(6) = y(3)*y(4) - 2.0*y(2)*y(5) + y(1)*y(6) > > end subroutine v_ode > > > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > > > subroutine w_ode(y,dy) > > implicit none > > real(dp) y(5),dy(5) > > dy(1) = y(2) > dy(2) = y(2)*y(3) > > dy(3) = y(4) > dy(4) = y(5) > dy(5) = -y(4)*y(4) + y(3)*y(5) + 1.0 > > end subroutine w_ode > > > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > > > subroutine rk4(y1,y2,h,nn,der) > ! > ! Perform one RK4 step with step size h of a n-vector > ! Input: y1(n) old step > ! h step size > ! nn vector size > ! der derivative function > ! Output: y2(nn) new step > ! > implicit none > > integer(i4b) nn > > real(dp) y1(nn),y2(nn),y(nn) > real(dp) k1(nn),k2(nn),k3(nn),k4(nn) > real(dp) h > > external der > ! > ! First RK substep > ! > > y(:) = y1(:) > call der(y,k1) > k1(:)=h*k1(:) > > ! > ! Second RK substep > ! > > y(:) = y1(:) + .5*k1(:) > call der(y,k2) > k2(:)=h*k2(:) > > ! > ! Third RK substep > ! > > y(:) = y1(:) + .5*k2(:) > call der(y,k3) > k3(:)=h*k3(:) > > ! > ! Fourth RK substep > ! > > y(:) = y1(:) + k3(:) > call der(y,k4) > k4(:)=h*k4(:) > > ! > ! Compose solution after full RK step > ! > y2(:) = y1(:) + ( k1(:) + 2.0*k2(:) + 2.0*k3(:) + k4(:) ) / 6.0 > > end subroutine rk4 > > > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > > > subroutine compute_baseflow_derivative(DXYZ,uvwvec,nx,ny,UVWxyz) > > implicit none > > integer(i4b) :: i,j > > integer(i4b),intent(in) :: nx,ny > > real(dp),dimension(nx*ny),intent(in) :: uvwvec > real(dp),dimension(nx*ny,nx*ny),intent(in) :: DXYZ > real(dp),dimension(nx*ny,nx*ny),intent(out) :: UVWxyz > > UVWxyz = 0.0 > > do i=1,nx*ny > do j=1,nx*ny > > UVWxyz(i,i) = UVWxyz(i,i) + DXYZ(i,j)*uvwvec(j) > > enddo > enddo > > end subroutine compute_baseflow_derivative > > > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > > > > subroutine compute_UVW_DXYZ(DXYZ,uvwvec,nx,ny,UVW_DXYZ) > > implicit none > > integer(i4b) :: i,j > > integer(i4b),intent(in) :: nx,ny > > real(dp),dimension(nx*ny),intent(in) :: uvwvec > real(dp),dimension(nx*ny,nx*ny),intent(in) :: DXYZ > real(dp),dimension(nx*ny,nx*ny),intent(out) :: UVW_DXYZ > > do i=1,nx*ny > do j=1,nx*ny > > UVW_DXYZ(i,j) = uvwvec(i)*DXYZ(i,j) > > enddo > enddo > > end subroutine compute_UVW_DXYZ > > > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > > > > > > END MODULE BASEFLOW > > > > > > > > > > > > > > > > > > > > > > > > > > > !!$ > !!$ > !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !!$ > !!$ SUBROUTINE Poiseuille_3D(nz,ny,AA,usol,zz,yy) > !!$ > !!$ character(128) :: file_name > !!$ integer(i4b) :: INFO,NRHS,ct,i,j,N,file_num,m1,n1,p1,q1 > !!$ > !!$ integer(i4b),intent(in) :: nz,ny > !!$ real(dp),intent(in) :: AA > !!$ > !!$ real(dp) :: ymin,ymax,zmin,zmax,dzidz,d2zidz2,dyidy,d2yidy2,U00 > !!$ real(dp) :: start,finish > !!$ > !!$ real(dp),dimension(nz,nz,2) :: DMZ > !!$ real(dp),dimension(ny,ny,2) :: DMY > !!$ > !!$ real(dp),dimension(nz,nz) :: D2Z > !!$ real(dp),dimension(ny,ny) :: D2Y > !!$ > !!$ real(dp),dimension(nz-2,nz-2) :: D2Zs > !!$ real(dp),dimension(ny-2,ny-2) :: D2Ys > !!$ > !!$ real(dp),dimension(nz-2,nz-2) :: IZ > !!$ real(dp),dimension(ny-2,ny-2) :: IY > !!$ > !!$ real(dp),dimension(nz,ny) :: usol,zz,yy > !!$ real(dp),dimension(nz-2,ny-2) :: utmp > !!$ > !!$ real(dp),dimension(nz) :: zi,z > !!$ real(dp),dimension(ny) :: yi,y > !!$ > !!$ real(dp),dimension( (nz-2)*(ny-2) ) :: frhs > !!$ real(dp),dimension( (nz-2)*(ny-2),(nz-2)*(ny-2) ) :: AMAT,DZZ,DYY > !!$ > !!$ integer(i4b),dimension((nz-2)*(ny-2)) :: IPIV > !!$ > !!$ frhs=0.0 > !!$ > !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !!$!!!!!!! 3D POISEUILLE FLOW !!!!!!!! > !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !!$ > !!$ zmin=-AA > !!$ zmax=AA > !!$ > !!$ ymin=-1 > !!$ ymax=1 > !!$ > !!$ ! > !!$ !! .....Create Grid > !!$ ! > !!$ > !!$ if ( mod(nz,2)==0 .or. mod(ny,2)==0 ) then > !!$ WRITE(*,*) > !!$ STOP'STOPPING... IN 3D POISEUILLE BASEFLOW NZ,NY NEED TO BE ODD' > !!$ endif > !!$ > !!$ ! > !!$ !! .....Get collocation points and chebyshev differentiation matrices > !!$ ! > !!$ > !!$ call cheby(nz-1,zi,DMZ) > !!$ call cheby(ny-1,yi,DMY) > !!$ > !!$ ! z-mapping from [-1,1]->[-ZMAX,ZMAX] > !!$ z=zmax*zi > !!$ > !!$ ! 
y-mapping (from [-1,1]->[-1,1]) > !!$ y=yi > !!$ > !!$ !Compute the metric terms > !!$ dzidz = 1/zmax > !!$ d2zidz2 = 0 > !!$ > !!$ dyidy = 1 > !!$ d2yidy2 = 0 > !!$ > !!$ !Scale the differentiation matrix (due to the use of the mapping) > !!$ > !!$ !z-direction > !!$ !D1Z = dzidz*DMZ(:,:,1) > !!$ D2Z = d2zidz2*DMZ(:,:,1) + dzidz**2*DMZ(:,:,2) > !!$ D2Zs = D2Z(2:nz-1,2:nz-1) > !!$ > !!$ !y-direction > !!$ !D1Y = dyidy*DMY(:,:,1) > !!$ D2Y = d2yidy2*DMY(:,:,1) + dyidy**2*DMY(:,:,2) > !!$ D2Ys = D2Y(2:ny-1,2:ny-1) > !!$ > !!$ IZ=0.0 > !!$ IY=0.0 > !!$ > !!$ do i=1,nz-2 > !!$ IZ(i,i)=1.0 > !!$ enddo > !!$ > !!$ do i=1,ny-2 > !!$ IY(i,i)=1.0 > !!$ enddo > !!$ > !!$ m1=size(IY,1) > !!$ n1=size(IY,2) > !!$ p1=size(D2Zs,1) > !!$ q1=size(D2Zs,2) > !!$ > !!$ call mykron(IY,D2Zs,m1,n1,p1,q1,DZZ) > !!$ > !!$ m1=size(D2Ys,1) > !!$ n1=size(D2Ys,2) > !!$ p1=size(IZ,1) > !!$ q1=size(IZ,2) > !!$ > !!$ call mykron(D2Ys,IZ,m1,n1,p1,q1,DYY) > !!$ > !!$ AMAT = DYY + DZZ > !!$ > !!$ !CALL PRINT_MATRIX( 'DZZ', (nz-2)*(ny-2), (nz-2)*(ny-2), DZZ ) > !!$ !CALL PRINT_MATRIX( 'DYY', (nz-2)*(ny-2), (nz-2)*(ny-2),DYY ) > !!$ > !!$ > !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !!$ !! SOLVE POISSON EQUATION FOR THE FIRST TIME !! > !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !!$ > !!$ > !!$ > !!$ do i=1,(ny-2)*(nz-2) > !!$ frhs(i)=-2.0 > !!$ enddo > !!$ > !!$ INFO=0 > !!$ NRHS=1 ! only 1 rhs here > !!$ N=(nz-2)*(ny-2) > !!$ > !!$ > !!$ ! > !!$ ! Compute the LU factorization of A. > !!$ ! > !!$ > !!$ ! Note: on exit of dgetrf, AMAT is not AMAT anymore but contains the > factors L and U > !!$ ! from the factorization A = P*L*U; the unit diagonal elements of L > are not stored. > !!$ > !!$ !call PRINT_MATRIX( 'AMAT', N, N, AMAT ) > !!$ > !!$ > !!$ call cpu_time(start) > !!$ CALL dgetrf( N, N, AMAT, N, IPIV, INFO ) > !!$ call cpu_time(finish) > !!$ > !!$ !write(*,*)'AMAT AFTER',AMAT > !!$ > !!$ print '("Time LU = ",f15.4," seconds.")',finish-start > !!$ > !!$ IF( INFO .NE. 0 ) THEN > !!$ WRITE(*,*)'INFO',INFO > !!$ STOP 'PROBLEM IN LAPACK (Poiseuille_3D)' > !!$ ENDIF > !!$ > !!$ > !!$ ! > !!$ ! Solve the system A*X = B, overwriting B with X. > !!$ ! > !!$ > !!$ IF( INFO.EQ.0 ) THEN > !!$ > !!$ call cpu_time(start) > !!$ CALL dgetrs( 'No transpose', n, nrhs, AMAT, N, IPIV, frhs, N, INFO ) > !!$ call cpu_time(finish) > !!$ > !!$ print '("Time Linear System Solve = ",f15.4," > seconds.")',finish-start > !!$ write(*,*) > !!$ > !!$ END IF > !!$ > !!$ ct=0 > !!$ do j=1,ny-2 > !!$ do i=1,nz-2 > !!$ > !!$ ct=ct+1 > !!$ utmp(i,j) = frhs(ct) > !!$ > !!$ enddo > !!$ enddo > !!$ > !!$ usol = 0.0 > !!$ usol(2:nz-1,2:ny-1) = utmp(:,:) > !!$ > !!$ zz=0.0 > !!$ yy=0.0 > !!$ > !!$ do i=1,nz > !!$ do j=1,ny > !!$ > !!$ zz(i,j)=z(i) > !!$ yy(i,j)=y(j) > !!$ > !!$ enddo > !!$ enddo > !!$ > !!$ U00 = usol( (nz+1)/2 , (ny+1)/2 ) > !!$ > !!$ > !!$ > !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !!$!! SOLVE POISSON EQUATION FOR THE SECOND TIME !! > !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !!$ > !!$ > !!$ utmp=0.0 > !!$ usol=0.0 > !!$ frhs=0.0 > !!$ > !!$ !New,scaled rhs > !!$ do i=1,(ny-2)*(nz-2) > !!$ frhs(i)=-2.0/U00 > !!$ enddo > !!$ > !!$ ! > !!$ ! Solve the system A*X = B, overwriting B with X. > !!$ ! > !!$ > !!$ ! Note: here we already have the LU factorization so that we only need > a call to dgetrs > !!$ > !!$ CALL dgetrs( 'No transpose', N, NRHS, AMAT, N, IPIV, frhs, N, INFO ) > !!$ > !!$ IF( INFO .NE. 
0 ) THEN > !!$ WRITE(*,*)'INFO',INFO > !!$ STOP 'PROBLEM IN LAPACK (Poiseuille_3D -- solve 2)' > !!$ ENDIF > !!$ > !!$ !write(*,*)'frhs',frhs > !!$ > !!$ ct=0 > !!$ do j=1,ny-2 > !!$ do i=1,nz-2 > !!$ > !!$ ct=ct+1 > !!$ utmp(i,j) = frhs(ct) > !!$ > !!$ enddo > !!$ enddo > !!$ > !!$ usol(2:nz-1,2:ny-1) = utmp(:,:) > !!$ > !!$ > !!$ END SUBROUTINE Poiseuille_3D > !!$ > !!$ > ! > ! Description: Build complex matrices A and B in the context of a generalized > ! eigenvalue problem (GEVP) > ! > ! > !/*T > ! Concepts: Build complex matrices > ! Processors: n > !T*/ > ! > ! ----------------------------------------------------------------------- > > MODULE PETSC > > USE TYPEDEF > USE WRITE_PRINT > > IMPLICIT NONE > > CONTAINS > > Subroutine SetupSolveGEVP(nx,ny,nxy,alpha,beta,Re) > > #include <petsc/finclude/petscsys.h> > #include <petsc/finclude/petscvec.h> > #include <petsc/finclude/petscmat.h> > #include <petsc/finclude/petscmat.h90> > #include <petsc/finclude/petscpc.h> > #include <petsc/finclude/petscksp.h> > > #include <slepc/finclude/slepcsys.h> > #include <slepc/finclude/slepceps.h> > > !...Non PETSc/SLEPc Variables > > real(dp),intent(in) :: alpha,beta,Re > integer(i4b),intent(in) :: nx,ny,nxy > > integer(i4b) :: ij > integer(i4b) :: sizem,rank,source > > integer(i4b),dimension(:),allocatable :: idxm1,idxn1 > integer(i4b),dimension(:),allocatable :: idxm2,idxn2 > integer(i4b),dimension(:),allocatable :: idxm3,idxn3 > > integer(i4b),dimension(:),allocatable :: li > integer(i4b),dimension(:),allocatable :: all_bc > integer(i4b),dimension(:),allocatable :: u_bc,v_bc,w_bc,p_bc > integer(i4b),dimension(:),allocatable :: fst_bc,wall_bc > integer(i4b),dimension(:),allocatable :: outlet_bc,inlet_bc > > > complex(dpc) :: ibeta,Rec,alphac,betac,small,sigma > complex(dpc),dimension(:,:),allocatable :: D1Xc,D2Xc,D1Yc,D2Yc > > complex(dpc),dimension(nxy) :: uvec_cmplx,uxvec_cmplx,uyvec_cmplx > complex(dpc),dimension(nxy) :: vvec_cmplx,vxvec_cmplx,vyvec_cmplx > complex(dpc),dimension(nxy) :: wvec_cmplx,wxvec_cmplx,wyvec_cmplx > > complex(dpc),dimension(3*nxy) :: wvec_cmplx3 > complex(dpc),dimension(ny,ny) :: VDIAG,VDY > > !...Temporary variables > integer(i4b),dimension(:),allocatable :: loc > complex(dpc),dimension(:),allocatable :: xwork1 > > > !...SLEPC VARIABLES > EPS eps > EPSType tname > > ST st > PC pc > > Vec xr,xi > > PetscReal error > PetscScalar kr,ki > > PetscLogDouble mem > > PetscInt maxit,nconv > PetscInt nev,ncv,mpd > PetscInt three > PetscInt ct,ct1,ct2,ct3,ct4 > > PetscInt icntl,ival,ival1 > > PetscReal tolslepc > PetscInt maxiter > > !...PETSC VARIABLES > > PetscReal norm,tol,nrm,nrm1 > PetscInt i,j,k,II,JJ,its > > PetscInt jmin,jmax > PetscInt ione,ntimes > PetscErrorCode ierr > PetscBool flg > PetscScalar v,one,zero,rhs > > KSP ksp > PetscRandom rctx > > !...MPI > !PetscMPIInt rank,size > > !...Matrices indices > > PetscInt IstartA,IendA,IstartB,IendB > > !...Vectors > > Vec i_vec,frhs,sol > > !...Matrices > > Mat A,B > > ! Note: Any user-defined Fortran routines (such as MyKSPMonitor) > ! MUST be declared as external. > > !external MyKSPMonitor,MyKSPConverged > > !#if !defined(PETSC_USE_COMPLEX) > !#error "this code requires PETSc --with-scalar-type=complex build" > !#endif > !#if !defined(PETSC_USE_REAL_DOUBLE) > !#error "this code requires PETSc --with-precision=real build" > !#endif > > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -! > ! Beginning of program ! > ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -! 
> > call SlepcInitialize(PETSC_NULL_CHARACTER,ierr)!;CHKERRQ(ierr) > > call PetscMemorySetGetMaximumUsage(ierr)!;CHKERRQ(ierr) > > call PetscOptionsInsertString("-mat_superlu_dist_parsymbfact",ierr) > > call MPI_COMM_SIZE(MPI_COMM_WORLD,sizem,ierr)!;CHKERRQ(ierr) > call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)!;CHKERRQ(ierr) > > call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr) > !call PetscMemoryGetMaximumUsage(mem,ierr) > call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr) > if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (entering > module)',rank,' memory:',mem > > !call PetscOptionsGetInt(PETSC_NULL_CHARACTER,'-nx',nx,flg,ierr) > !call PetscOptionsGetInt(PETSC_NULL_CHARACTER,'-ny',ny,flg,ierr) > > ione=1 ! integer > ibeta = imag_unit*beta > one = (1.0d0,0.0d0) > zero = (0.0d0,0.0d0) > Rec=Re > alphac=alpha > betac=beta > > source=0 > > ct=0 > > !...Convert Arrays to Complex > > uvec_cmplx=uvec > uxvec_cmplx=uxvec > uyvec_cmplx=uyvec > > vvec_cmplx=vvec > vxvec_cmplx=vxvec > vyvec_cmplx=vyvec > > wvec_cmplx=wvec > wxvec_cmplx=wxvec > wyvec_cmplx=wyvec > > wvec_cmplx3(1:nxy)=wvec_cmplx > wvec_cmplx3(nxy+1:2*nxy)=wvec_cmplx > wvec_cmplx3(2*nxy+1:3*nxy)=wvec_cmplx > > allocate(D1Xc(nx,nx),D2Xc(nx,nx),D1Yc(ny,ny),D2Yc(ny,ny)) > > D1Xc=D1X > D1Yc=D1Y > > D2Xc=D2X > D2Yc=D2Y > > !call > MPI_Bcast(wvec_cmplx3,3*nxy,MPI_DOUBLE_COMPLEX,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr) > > !...Parallel Matrices > > !Create A > call > MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4*nxy,4*nxy,2*(nx+ny),PETSC_NULL_INTEGER,max(nx,ny),PETSC_NULL_INTEGER,A,ierr)!;CHKERRQ(ierr) > > !Create B > > call > MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4*nxy,4*nxy,1,PETSC_NULL_INTEGER,0,PETSC_NULL_INTEGER,B,ierr)!;CHKERRQ(ierr) > > call MatGetOwnershipRange(A,IstartA,IendA,ierr)!;CHKERRQ(ierr) > call MatGetOwnershipRange(B,IstartB,IendB,ierr)!;CHKERRQ(ierr) > > !write(*,*)'IstartA,IendA',IstartA,IendA > !write(*,*)'IstartB,IendB',IstartB,IendB > > if (IstartA > 4*nxy .or. IstartB > 4*nxy .or. IendA > 4*nxy .or. IendB > > 4*nxy) then > stop 'indices problem in module_petsc' > endif > > !...NEED TO TEST WITHOUT THIS NEXT LINE TO UNDERSTAND > !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! > !...This line avoids error [0]PETSC ERROR: Argument out of range [0]PETSC > ERROR: New nonzero at (470,470)... > call > MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE,ierr)!;CHKERRQ(ierr) > > > !...Get Memory Usage > call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr) > !call PetscMemoryGetMaximumUsage(mem,ierr) > call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr) > if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (after > matcreate)',rank,' memory:',mem > > !...Build A in Parallel > > if (rank == 0) call cpu_time(start) > > do i=IstartA,IendA-1 > > if ( i < nxy ) then > > if (abs(uyvec_cmplx(i+1))>1e10) STOP '1' > call > MatSetValues(A,ione,i,ione,i+nxy,uyvec_cmplx(i+1),INSERT_VALUES,ierr)!;CHKERRQ(ierr) > ! Uy (block 1,2) > > !Build DX (block 1,4) > ii=1+floor((i+0.01)/ny) > jj=1 > > jmin=mod(i,ny) > jmax=jmin+(nx-1)*ny > > do j=jmin,jmax,ny > > if (abs(D1Xc(ii,jj))>1e10) STOP '2' > call > MatSetValues(A,ione,i,ione,j+3*nxy,D1Xc(ii,jj),INSERT_VALUES,ierr) ! 
Build DX > (block 1,4) > jj=jj+1 > > enddo > > !Build -DXX/Re (part of block 1,1) > ii=1+floor((i+0.01)/ny) > jj=1 > > jmin=mod(i,ny) > jmax=jmin+(nx-1)*ny > > do j=jmin,jmax,ny > > if (abs(-D2Xc(ii,jj)/Rec)>1e10) STOP '3' > call > MatSetValues(A,ione,i,ione,j,-D2Xc(ii,jj)/Rec,INSERT_VALUES,ierr) !Build > -DXX/Re (part of block 1,1) > jj=jj+1 > > enddo > > > elseif ( i > nxy-1 .and. i < 2*nxy ) then > > if (abs(vxvec_cmplx(i-nxy+1))>1e10) STOP '4' > call > MatSetValues(A,ione,i,ione,i-nxy,vxvec_cmplx(i-nxy+1),INSERT_VALUES,ierr)!;CHKERRQ(ierr) > ! Vx (block 2,1) > > !Build DY (block 2,4) > ii=mod(i,ny)+1 > jj=1 > > jmin=floor((i-nxy+0.01)/ny)*ny > > do j=jmin,jmin+(ny-1) > > if (abs(D1Yc(ii,jj))>1e10) STOP '5' > call > MatSetValues(A,ione,i,ione,j+3*nxy,D1Yc(ii,jj),INSERT_VALUES,ierr) ! Build DY > (block 2,4) > jj=jj+1 > > enddo > > > !Build -DXX/Re (part of block 2,2) > ii=1+floor((i-nxy+0.01)/ny) > jj=1 > > jmin=mod(i,ny) > jmax=jmin+(nx-1)*ny > > do j=jmin,jmax,ny > > if (abs(-D2Xc(ii,jj)/Rec)>1e10) STOP '6' > call > MatSetValues(A,ione,i,ione,j+nxy,-D2Xc(ii,jj)/Rec,INSERT_VALUES,ierr) !Build > -DXX/Re (part of block 2,2) > jj=jj+1 > > enddo > > > > > elseif ( i > 2*nxy-1 .and. i < 3*nxy ) then > > call MatSetValues(A,ione,i,ione,i+nxy,ibeta,INSERT_VALUES,ierr)! > iBeta (block 3,4) > > call > MatSetValues(A,ione,i,ione,i-2*nxy,wxvec_cmplx(i-2*nxy+1),INSERT_VALUES,ierr)! > Wx (block 3,1) > call MatSetValues(A,ione,i,ione,i > -nxy,wyvec_cmplx(i-2*nxy+1),INSERT_VALUES,ierr)! Wy (block 3,2) > > > !Build -DXX/Re (part of block 3,3) > ii=1+floor((i-2*nxy+0.01)/ny) > jj=1 > > jmin=mod(i,ny) > jmax=jmin+(nx-1)*ny > > do j=jmin,jmax,ny > > call > MatSetValues(A,ione,i,ione,j+2*nxy,-D2Xc(ii,jj)/Rec,INSERT_VALUES,ierr) > !Build -DXX/Re (part of block 3,3) > jj=jj+1 > > enddo > > > elseif ( i > 3*nxy-1 ) then > > call MatSetValues(A,ione,i,ione,i-nxy,ibeta,INSERT_VALUES,ierr)! > iBeta (block 4,3) > > !Build DX (block 4,1) > ii=1+floor(((i-3*nxy)+0.01)/ny) > jj=1 > > jmin=mod(i,ny) > jmax=jmin+(nx-1)*ny > > do j=jmin,jmax,ny > > call MatSetValues(A,ione,i,ione,j,D1Xc(ii,jj),INSERT_VALUES,ierr) > jj=jj+1 > > enddo > > !Build DY (block 4,2) > ii=mod(i,ny)+1 > jj=1 > > jmin=floor((i-3*nxy+0.01)/ny)*ny > > do j=jmin,jmin+(ny-1) > > call > MatSetValues(A,ione,i,ione,j+nxy,D1Yc(ii,jj),INSERT_VALUES,ierr) > jj=jj+1 > > enddo > > endif > > enddo > > if (rank==0) then > > call cpu_time(finish) > write(*,*) > print '("Time Partial Parallel Build A #1 = ",f21.3," > seconds.")',finish-start > > endif > > > call MatAssemblyBegin(A,MAT_FLUSH_ASSEMBLY,ierr)!;CHKERRQ(ierr) ! > necessary to switch between INSERT_VALUES and ADD_VALUES > call MatAssemblyEnd(A,MAT_FLUSH_ASSEMBLY,ierr)!;CHKERRQ(ierr) ! 
> necessary to switch between INSERT_VALUES and ADD_VALUES > > > > !...Finish Blocks A11, A22 and A33 > > if (rank == 0) call cpu_time(start) > > VDY=(0.0d0,0.0d0) > VDIAG=(0.0d0,0.0d0) > > do i=IstartA,IendA-1 > if (i < 3*nxy) then > call > MatSetValues(A,ione,i,ione,i,betac**2/Rec+imag_unit*betac*wvec_cmplx3(i+1),ADD_VALUES,ierr)!;CHKERRQ(ierr) > endif > enddo > > do i=IstartA,IendA-1 > > if ( i < nxy ) then > > !Build -DYY/Rec (part of block 1,1) > ii=mod(i,ny)+1 > jj=1 > > jmin=floor((i+0.01)/ny)*ny > > do j=jmin,jmin+(ny-1) > > call > MatSetValues(A,ione,i,ione,j,-D2Yc(ii,jj)/Rec,ADD_VALUES,ierr) > jj=jj+1 > > enddo > > > !Build U*DX (part of block 1,1) > ii=1+floor((i+0.01)/ny) > jj=1 > > jmin=mod(i,ny) > jmax=jmin+(nx-1)*ny > > do j=jmin,jmax,ny > > call > MatSetValues(A,ione,i,ione,j,uvec_cmplx(i+1)*D1Xc(ii,jj),ADD_VALUES,ierr) > jj=jj+1 > > enddo > > > !Build V*DY (part of block 1,1) > ii=mod(i,ny)+1 > jj=1 > > jmin=floor((i+0.01)/ny)*ny > > do j=jmin,jmin+(ny-1) > call > MatSetValues(A,ione,i,ione,j,vvec_cmplx(i+1)*D1Yc(ii,jj),ADD_VALUES,ierr) > jj=jj+1 > > enddo > > > call > MatSetValues(A,ione,i,ione,i,uxvec_cmplx(i+1),ADD_VALUES,ierr)!;CHKERRQ(ierr) > ! Ux (block 1,1) > > > > > elseif ( i > nxy-1 .and. i < 2*nxy ) then > > !Build -DYY/Rec (part of block 2,2) > ii=mod(i,ny)+1 > jj=1 > > jmin=floor((i-nxy+0.01)/ny)*ny > > do j=jmin,jmin+(ny-1) > > call > MatSetValues(A,ione,i,ione,j+nxy,-D2Yc(ii,jj)/Rec,ADD_VALUES,ierr) ! Build DY > (block 2,4) > jj=jj+1 > > enddo > > > !Build U*DX (part of block 2,2) > ii=1+floor((i-nxy+0.01)/ny) > jj=1 > > jmin=mod(i,ny) > jmax=jmin+(nx-1)*ny > > do j=jmin,jmax,ny > > call > MatSetValues(A,ione,i,ione,j+nxy,uvec_cmplx(i-nxy+1)*D1Xc(ii,jj),ADD_VALUES,ierr) > > jj=jj+1 > > enddo > > > !Build V*DY (part of block 2,2) > ii=mod(i,ny)+1 > jj=1 > > jmin=floor((i-nxy+0.01)/ny)*ny > > do j=jmin,jmin+(ny-1) > call > MatSetValues(A,ione,i,ione,j+nxy,vvec_cmplx(i-nxy+1)*D1Yc(ii,jj),ADD_VALUES,ierr) > jj=jj+1 > > enddo > > call > MatSetValues(A,ione,i,ione,i,vyvec_cmplx(i-nxy+1),ADD_VALUES,ierr)!;CHKERRQ(ierr) > ! Vy (block 2,2) > > > elseif ( i > 2*nxy-1 .and. i < 3*nxy ) then > > !Build -DYY/Rec (part of block 3,3) > ii=mod(i,ny)+1 > jj=1 > > jmin=floor((i-2*nxy+0.01)/ny)*ny > > do j=jmin,jmin+(ny-1) > > call > MatSetValues(A,ione,i,ione,j+2*nxy,-D2Yc(ii,jj)/Rec,ADD_VALUES,ierr) ! 
Build > DY (block 2,4) > jj=jj+1 > > enddo > > > !Build U*DX (part of block 3,3) > ii=1+floor((i-2*nxy+0.01)/ny) > jj=1 > > jmin=mod(i,ny) > jmax=jmin+(nx-1)*ny > > do j=jmin,jmax,ny > > call > MatSetValues(A,ione,i,ione,j+2*nxy,uvec_cmplx(i-2*nxy+1)*D1Xc(ii,jj),ADD_VALUES,ierr) > > jj=jj+1 > > enddo > > > !Build V*DY (part of block 3,3) > ii=mod(i,ny)+1 > jj=1 > > jmin=floor((i-2*nxy+0.01)/ny)*ny > > do j=jmin,jmin+(ny-1) > call > MatSetValues(A,ione,i,ione,j+2*nxy,vvec_cmplx(i-2*nxy+1)*D1Yc(ii,jj),ADD_VALUES,ierr) > jj=jj+1 > > enddo > > endif > > enddo > > > if (rank==0) then > > call cpu_time(finish) > write(*,*) > print '("Time Partial Parallel Build A #2 = ",f21.3," > seconds.")',finish-start > > endif > > !$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ > !$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ > > > !...Artificially create diagonal element for A ==> TO BE REMOVED > small = (0.000001d0,0.0d0) > do i=IstartA,IendA-1 > call MatSetValues(A,ione,i,ione,i,small,ADD_VALUES,ierr)!;CHKERRQ(ierr) > > enddo > > call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr) > !call PetscMemoryGetMaximumUsage(mem,ierr) > call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr) > if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (A just before > assembly)',rank,' memory:',mem > > > if (rank==0) then > call cpu_time(start) > endif > > !...Assemble A > call MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY,ierr)!;CHKERRQ(ierr) > call MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY,ierr)!;CHKERRQ(ierr) > !call MatView(A,PETSC_VIEWER_STDOUT_WORLD,ierr) > > if (rank==0) then > call cpu_time(finish) > write(*,*) > print '("Time to assemble A = ",f21.3," seconds.")',finish-start > endif > > > call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr) > !call PetscMemoryGetMaximumUsage(mem,ierr) > call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr) > if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (A build and > assembled)',rank,' memory:',mem > > > !...Build B in parallel > if (rank==0) then > call cpu_time(start) > endif > > do i=IstartB,IendB-1 > if (i < 3*nxy) then > call > MatSetValues(B,ione,i,ione,i,imag_unit,INSERT_VALUES,ierr)!;CHKERRQ(ierr) > endif > enddo > > !...Assemble B > call MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY,ierr)!;CHKERRQ(ierr) > call MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY,ierr)!;CHKERRQ(ierr) > !call MatView(B,PETSC_VIEWER_STDOUT_WORLD,ierr) > > call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr) > !call PetscMemoryGetMaximumUsage(mem,ierr) > call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr) > if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (B build and > assembled)',rank,' memory:',mem > > > call MatNorm(B,NORM_FROBENIUS,nrm1,ierr)!;CHKERRQ(ierr) > > !!$ if (rank==0) then > !!$ write(*,*)'norm B:',nrm1 > !!$ endif > > if (rank==0) then > call cpu_time(finish) > write(*,*) > print '("Time to build and assemble B = ",f21.3," > seconds.")',finish-start > endif > > > !...Set Boundary Conditions > > if (rank==0) then > > allocate(li(nx)) > > do i=1,nx > li(i)=(i-1)*ny > enddo > > ct1=0 > ct2=0 > ct3=0 > ct4=0 > > do i=1,nx > do j=1,ny > > if (j==1) then ! fst_bc > ct1=ct1+1 > elseif (j==ny) then ! wall_bc > ct2=ct2+1 > elseif (i==1) then ! outlet_bc > ct3=ct3+1 > elseif (i==nx) then ! inlet_bc > ct4=ct4+1 > endif > > enddo > enddo > > ct=ct1+ct2+ct3+ct4 > > endif ! 
>
>   call MPI_Bcast(ct,1,MPI_INT,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>   call MPI_Bcast(ct1,1,MPI_INT,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>   call MPI_Bcast(ct2,1,MPI_INT,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>   call MPI_Bcast(ct3,1,MPI_INT,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>   call MPI_Bcast(ct4,1,MPI_INT,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>
>   allocate(fst_bc(ct1))
>   allocate(wall_bc(ct2))
>   allocate(outlet_bc(ct3))
>   allocate(inlet_bc(ct4))
>
>   allocate(all_bc(3*ct))
>
>   allocate(u_bc(ct))
>   allocate(v_bc(ct))
>   allocate(w_bc(ct))
>   allocate(p_bc(ct))
>
>   if (rank == 0) then
>
>      ct1=0
>      ct2=0
>      ct3=0
>      ct4=0
>
>      do i=1,nx
>         do j=1,ny
>
>            ij=li(i)+j-1
>
>            if (j==1) then ! fst_bc
>               ct1=ct1+1
>               fst_bc(ct1)=ij
>            elseif (j==ny) then ! wall_bc
>               ct2=ct2+1
>               wall_bc(ct2)=ij
>            elseif (i==1) then ! outlet_bc
>               ct3=ct3+1
>               outlet_bc(ct3)=ij
>            elseif (i==nx) then ! inlet_bc
>               ct4=ct4+1
>               inlet_bc(ct4)=ij
>            endif
>
>         enddo
>      enddo
>
>      u_bc(1:ct1)=fst_bc
>      u_bc(ct1+1:ct1+ct2)=wall_bc
>      u_bc(ct1+ct2+1:ct1+ct2+ct3)=outlet_bc
>      u_bc(ct1+ct2+ct3+1:ct1+ct2+ct3+ct4)=inlet_bc
>
>      v_bc=u_bc+nxy
>      w_bc=v_bc+nxy
>      p_bc=w_bc+nxy
>
>      all_bc=[u_bc,v_bc,w_bc]
>
>   endif ! if (rank == 0)
>
>   call MPI_Bcast(all_bc,3*ct,MPI_INT,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>
>   !For parallel case, all processes that share matrix MUST call this routine, regardless of whether any rows being zeroed are owned by them.
>   call MatZeroRows(A,3*ct,all_bc, one,zero,zero,ierr);CHKERRQ(ierr)
>   call MatZeroRows(B,3*ct,all_bc,zero,zero,zero,ierr);CHKERRQ(ierr)
>
>   !call MatView(B,PETSC_VIEWER_STDOUT_WORLD,ierr)
>
>   ! cannot be inside an if (rank==0) statement!!!
>   call MatNorm(A,NORM_FROBENIUS,nrm,ierr)!;CHKERRQ(ierr)
>   call MatNorm(B,NORM_FROBENIUS,nrm1,ierr)!;CHKERRQ(ierr)
>
>   if (rank==0) then
>      write(*,*)'norm A:',nrm
>      write(*,*)'norm B:',nrm1
>   endif
>
> !!!!!! TESTING HOW BIG LU CAN BE !!!!!!!!!!!!!!!!
>
> !!$ sigma = (1.0d0,0.0d0)
> !!$ call MatAXPY(A,-sigma,B,DIFFERENT_NONZERO_PATTERN,ierr) ! on exit A = A-sigma*B
> !!$
> !!$ !...Create linear solver context
> !!$ call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
> !!$
> !!$!...Set operators. Here the matrix that defines the linear system also serves as the preconditioning matrix.
> !!$ !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) !aha commented and replaced by next line
> !!$ call KSPSetOperators(ksp,A,A,ierr) ! remember: here A = A-sigma*B
> !!$
> !!$ tol = 1.e-10
> !!$ call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr) ! set relative and absolute tolerances and uses default for divergence tol
> !!$
> !!$!...Set the Direct (LU) Solver
> !!$ call KSPSetType(ksp,KSPPREONLY,ierr)
> !!$ call KSPGetPC(ksp,pc,ierr)
> !!$ call PCSetType(pc,PCLU,ierr)
> !!$ call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr) ! MATSOLVERSUPERLU_DIST MATSOLVERMUMPS
> !!$
> !!$!...Create right-hand-side vector
> !!$ call MatCreateVecs(A,frhs,PETSC_NULL_OBJECT,ierr)
> !!$ call MatCreateVecs(A,sol,PETSC_NULL_OBJECT,ierr)
> !!$
> !!$ allocate(xwork1(IendA-IstartA))
> !!$ allocate(loc(IendA-IstartA))
> !!$
> !!$ ct=0
> !!$ do i=IstartA,IendA-1
> !!$    ct=ct+1
> !!$    loc(ct)=i
> !!$    xwork1(ct)=(1.0d0,0.0d0)
> !!$ enddo
> !!$
> !!$ call VecSetValues(frhs,IendA-IstartA,loc,xwork1,INSERT_VALUES,ierr)
> !!$ call VecZeroEntries(sol,ierr)
> !!$
> !!$ deallocate(xwork1,loc)
> !!$
> !!$!...Assemble Vectors
> !!$ call VecAssemblyBegin(frhs,ierr)
> !!$ call VecAssemblyEnd(frhs,ierr)
> !!$
> !!$!...Solve the linear system
> !!$ call KSPSolve(ksp,frhs,sol,ierr)
> !!$
> !!$ !call VecView(sol,PETSC_VIEWER_STDOUT_WORLD,ierr)
> !!$
> !!$ STOP
>
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> !!!!!!!!!!!!!!!!!!!!!!!           SLEPC            !!!!!!!!!!!!!!!!!!!!!!!
> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>
>   if (rank==0) call cpu_time(start)
>
>   call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>   call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>   if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (begin slepc)',rank,' memory:',mem
>
> !...Get vector(s) compatible with the matrix, i.e. with the same parallel layout
>   call MatCreateVecs(A,xr,xi,ierr)
> !!$ call MatCreateVecs(A,xr,PETSC_NULL_OBJECT,ierr)
> !!$ call MatCreateVecs(A,xi,PETSC_NULL_OBJECT,ierr)
>
> !...Create Eigensolver Context and Spectral Transformation context
>   call EPSCreate(PETSC_COMM_WORLD,eps,ierr)
>   !call STCreate(PETSC_COMM_WORLD,st,ierr)
>
> !...Set Operators and Problem Type - Generalized Eigenvalue Problem
>   call EPSSetOperators(eps,A,B,ierr)
>   !call EPSSetProblemType(eps,EPS_PGNHEP,ierr) ! THIS ONE LEADS TO AN ERROR --> CHECK WHY BECAUSE MY B SEEMS FITTING THAT CRITERION
>   call EPSSetProblemType(eps,EPS_GNHEP,ierr)
>
> !...Select Portion of Spectrum
>   call EPSSetTarget(eps,(3.0d0,0.d0),ierr)
>   call EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE,ierr)
>
> !...Select Algorithm for Eigenvalue Computation (default is Krylov-Schur)
>   call EPSSetType(eps,EPSKRYLOVSCHUR,ierr) !EPSARNOLDI,EPSKRYLOVSCHUR
>
>   call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>   call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>   if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (slepc after EPSSetType)',rank,' memory:',mem
>
> !...Set Tolerance and Maxiter for Solver
>   tolslepc=1e-12
>   maxiter=500
>   !call EPSSetTolerances(eps,PETSC_DEFAULT_REAL,maxiter,ierr)
>   call EPSSetTolerances(eps,tolslepc,maxiter,ierr)
>
> !...Sets number of eigenvalues to compute and dimension of subspace
>   nev=200
>   ncv=3*nev
>   mpd=10*nev
>   !call EPSSetDimensions(eps,nev,ncv,mpd,ierr) PETSC_DEFAULT
>   call EPSSetDimensions(eps,nev,PETSC_DEFAULT_INTEGER,PETSC_DEFAULT_INTEGER,ierr)
>
>   call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>   call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>   if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (before EPSGetST)',rank,' memory:',mem
>
> !...Set the Spectral Transformation --> Use Shift-And-Invert
>   call EPSGetST(eps,st,ierr)
>   call STSetType(st,STSINVERT,ierr)
>
> !...Set Linear Solver
>   call STGetKSP(st,ksp,ierr)
>   call KSPSetType(ksp,KSPPREONLY,ierr)
>   call KSPGetPC(ksp,pc,ierr)
>   call PCSetType(pc,PCLU,ierr) ! could try PCCHOLESKY
>   call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr) ! MATSOLVERSUPERLU_DIST,MATSOLVERMUMPS
>
> !...Set Solver Parameters at Runtime
>   call EPSSetFromOptions(eps,ierr)
>
>   call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>   call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>   if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (after EPSSetFromOptions)',rank,' memory:',mem
>
> !...Solve the Eigensystem
>   call EPSSolve(eps,ierr)
>
>   call EPSGetIterationNumber(eps,its,ierr)
>   if (rank .eq. 0) then
>      write(*,110) its
>   endif
>  110 format (/' Number of iterations of the method:',I4)
>
> !...Optional: Get some information from the solver and display it
>   call EPSGetType(eps,tname,ierr)
>   if (rank .eq. 0) then
>      write(*,120) tname
>   endif
>  120 format (' Solution method: ',A)
>
>   nev=10
>   call EPSGetDimensions(eps,nev,PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,ierr)
>   if (rank .eq. 0) then
>      write(*,130) nev
>   endif
>  130 format (' Number of requested eigenvalues:',I4)
>
>   call EPSGetTolerances(eps,tol,maxit,ierr)
>   if (rank .eq. 0) then
>      write(*,140) tol, maxit
>   endif
>  140 format (' Stopping condition: tol=',1P,E10.4,', maxit=',I4)
>
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> ! Display solution and clean up
> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>
> !...Get number of converged eigenpairs
>   call EPSGetConverged(eps,nconv,ierr)
>   if (rank .eq. 0) then
>      write(*,150) nconv
>   endif
>  150 format (' Number of converged eigenpairs:',I4/)
>
> !...Display eigenvalues and relative errors
>   if (nconv.gt.0) then
>      if (rank .eq. 0) then
>         write(*,*) '          kr                     ki             ||Ax-kx||/||kx||'
>         write(*,*) '   -----------------     -----------------     ------------------'
>      endif
>      do i=0,nconv-1
>
>         !...Get converged eigenpairs: i-th eigenvalue is stored in kr (real part) and ki (imaginary part)
>         call EPSGetEigenpair(eps,i,kr,ki,xr,xi,ierr)
>
>         !...Compute the relative error associated to each eigenpair
>         call EPSComputeError(eps,i,EPS_ERROR_RELATIVE,error,ierr)
>         if (rank .eq. 0) then
>            write(*,160) PetscRealPart(kr),PetscImaginaryPart(kr), error
>         endif
>  160    format (1P,' ',E21.11,' ',E21.11,' ',E21.11)
>
>      enddo
>   endif
>
> !...Free work space
>
>   if (rank==0) deallocate(li)
>
>   deallocate(D1Xc,D2Xc,D1Yc,D2Yc)
>
>   deallocate(fst_bc)
>   deallocate(wall_bc)
>   deallocate(outlet_bc)
>   deallocate(inlet_bc)
>
>   deallocate(all_bc)
>
>   deallocate(u_bc)
>   deallocate(v_bc)
>   deallocate(w_bc)
>   deallocate(p_bc)
>
>   !call STDestroy(st,ierr)
>   call EPSDestroy(eps,ierr)
>   call MatDestroy(A,ierr)
>   call MatDestroy(B,ierr)
>   !call VecDestroy(i_vec,ierr)
>   call VecDestroy(xr,ierr)
>   call VecDestroy(xi,ierr)
>
> !...Finalize
>   call SlepcFinalize(ierr)
>
>   if (rank==0) then
>      call cpu_time(finish)
>      write(*,*)
>      print '("Total time for SLEPC = ",f21.3," seconds.")',finish-start
>   endif
>
> END Subroutine SETUPSOLVEGEVP
>
> END MODULE PETSC
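
A note on the remark at the top of the listing about switching between INSERT_VALUES and ADD_VALUES: PETSc requires a flush assembly between the two modes on the same matrix. Below is a minimal sketch of that pattern, using a placeholder matrix Amat and placeholder index/value variables (irow, icol, val_ins, val_add) rather than anything from the code above, and assuming the same declarations and include files as the module.

   ! set some entries in insert mode
   call MatSetValues(Amat,ione,irow,ione,icol,val_ins,INSERT_VALUES,ierr)

   ! flush pending values before switching from INSERT_VALUES to ADD_VALUES
   call MatAssemblyBegin(Amat,MAT_FLUSH_ASSEMBLY,ierr)
   call MatAssemblyEnd(Amat,MAT_FLUSH_ASSEMBLY,ierr)

   ! now entries may be accumulated in add mode
   call MatSetValues(Amat,ione,irow,ione,icol,val_add,ADD_VALUES,ierr)

   ! a single final assembly once all values have been set
   call MatAssemblyBegin(Amat,MAT_FINAL_ASSEMBLY,ierr)
   call MatAssemblyEnd(Amat,MAT_FINAL_ASSEMBLY,ierr)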
>
> [... the remainder of the quoted file, after END MODULE PETSC, is commented-out scratch code and comment blocks carried over from the PETSc Fortran examples (debug write statements, MatCreate/MatSetSizes and VecCreateSeq snippets, KSP setup notes, and the MyKSPMonitor/MyKSPConverged sample routines) ...]
>
> <valgrind.log.31371><valgrind.log.31372>
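
On the MatZeroRows calls in the listing: the routine is collective, so every rank that shares the matrix has to call it even if it owns none of the rows being zeroed, which is why the boundary-row list is broadcast to all ranks first. A minimal sketch of that pattern follows, with placeholder names (nrows_bc, rows_bc, diag_one) and assuming the default 32-bit PetscInt so that MPI_INT is the matching MPI type, as in the code above.

   ! rank 0 assembled the global list of boundary rows; share it with all ranks
   call MPI_Bcast(rows_bc,nrows_bc,MPI_INT,0,MPI_COMM_WORLD,ierr)

   ! collective call: every rank passes the same global list, so each row is
   ! seen by the rank that owns it; no right-hand-side/solution vectors given
   call MatZeroRows(A,nrows_bc,rows_bc,diag_one,PETSC_NULL_OBJECT,PETSC_NULL_OBJECT,ierr)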
