Hi Barry,

If you look in module_baseflows.F90 line 467-473, I have:

    call KSPDestroy(ksp,ierr)
    call VecDestroy(frhs,ierr)
    call VecDestroy(frhs2,ierr)
    call VecDestroy(sol,ierr)
    call VecDestroy(sol_seq,ierr)
    call MatDestroy(DXX,ierr)
    call MatDestroy(DYY,ierr)

I checked again and all the vectors that I create (see module_baseflows.F90 lines 87-88) are destroyed there. Also, whenever I use VecScatterCreateToZero, I always have a matching VecScatterDestroy afterwards.
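
For reference, the pattern I use in module_baseflows.F90 looks like this (a minimal sketch, with ctx, sol and sol_seq named as in the file):

    call VecScatterCreateToZero(sol,ctx,sol_seq,ierr)
    call VecScatterBegin(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr)
    call VecScatterEnd(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr)
    call VecScatterDestroy(ctx,ierr)   ! scatter context destroyed right after use
    ...
    call VecDestroy(sol_seq,ierr)      ! sequential target vector destroyed with the others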

Thanks,

Anthony



On 07/13/2015 12:15 PM, Barry Smith wrote:
On Jul 13, 2015, at 2:05 PM, Anthony Haas <[email protected]> wrote:

Hi,

I ran my program under Valgrind with 2 processors using the following:

${PETSC_DIR}/${PETSC_ARCH}/bin/mpiexec -n 2 valgrind --tool=memcheck \
    --leak-check=full --show-leak-kinds=all --num-callers=20 \
    --log-file=valgrind.log.%p ./BiGlobal_Spectral_Petsc.x -malloc off \
    -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist \
    -eps_view -st_ksp_type preonly -st_pc_type lu \
    -st_pc_factor_mat_solver_package superlu_dist

Note that in my program, I use PETSc alone once to solve a linear system of equations (see module_baseflows.F90), and then later in the code I use SLEPc to get the eigenvalues (see module_petsc.F90). My main program is called BiGlobal_Spectral_Petsc.F90.

I have attached the log files from Valgrind. It seems that quite a few errors occur in pzgstrf. See valgrind.log.31371 and 31372. Do I have any control over these errors?
    These are likely bugs in SuperLU_Dist.

Then, starting from line 403 in valgrind.log.31371 and line 458 in valgrind.log.31372, I have a series of errors that seem to be related to a VecCreateSeq and to a VecScatterCreateToZero (see module_baseflows.F90). Is there something I can do to avoid these errors?
    These are because you are not destroying all your vectors at the end.

Thanks

Anthony







On 07/08/2015 01:49 PM, Barry Smith wrote:
    You should try running under valgrind, see:  
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind

     You can also run in the debugger (yes, this is tricky on a batch system, but possible) and see exactly what triggers the floating point exception, or, when it "hangs", interrupt the program to see where it is hanging.
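
     For example (a sketch; these are the standard PETSc debugging options, and on a batch
system you may also need -debugger_nodes and a display that the compute nodes can open):

     mpiexec -n 2 ./BiGlobal_Spectral_Petsc.x -start_in_debugger
     mpiexec -n 2 ./BiGlobal_Spectral_Petsc.x -on_error_attach_debugger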


   Barry



On Jul 8, 2015, at 12:34 PM, Anthony Paul Haas <[email protected]> wrote:

Hi,

I have used the switch -mat_superlu_dist_parsymbfact in my PBS script. However, 
although my program worked fine with sequential symbolic factorization, I get 
one of the following two behaviors when I run with parallel symbolic 
factorization (depending on the number of processors that I use):

1) the program just hangs (it seems stuck in some subroutine ==> see 
test.out-hangs)
2) I get a floating point exception ==> see test.out-floating-point-exception

Note that, as suggested in the SuperLU manual, I use a power-of-2 number of 
procs. Are there any tunable parameters for the parallel symbolic 
factorization? Note that when I build my sparse matrix, most elements I add are 
nonzero of course, but to simplify the programming I also add a few zero 
elements in the sparse matrix. I was thinking that maybe, if the parallel 
symbolic factorization proceeds by blocks, there could be some blocks where the 
pivot would be zero, hence creating the FPE?
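
(If the explicit zeros turn out to matter, I imagine I could drop them at insertion time with something like

    call MatSetOption(A,MAT_IGNORE_ZERO_ENTRIES,PETSC_TRUE,ierr)

before the MatSetValues calls; a sketch only, with A standing for my matrix, and it would change the sparsity pattern that the symbolic factorization sees.)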

Thanks,

Anthony



On Wed, Jul 8, 2015 at 6:46 AM, Xiaoye S. Li <[email protected]> wrote:
Did you find out how to change the option to use parallel symbolic factorization?  
Perhaps the PETSc team can help.

Sherry


On Tue, Jul 7, 2015 at 3:58 PM, Xiaoye S. Li <[email protected]> wrote:
Is there an inquiry function that tells you all the available options?

Sherry

On Tue, Jul 7, 2015 at 3:25 PM, Anthony Paul Haas <[email protected]> 
wrote:
Hi Sherry,

Thanks for your message. I have used the superlu_dist default options. I did not 
realize that I was doing serial symbolic factorization. That is probably the 
cause of my problem.
Each node on Garnet has 60GB usable memory and I can run with 1, 2, 4, 8, 16 or 32 
cores per node.

So I should use:

-mat_superlu_dist_r 20
-mat_superlu_dist_c 32

How do you specify the parallel symbolic factorization option? Is it 
-mat_superlu_dist_matinput 1?

Thanks,

Anthony


On Tue, Jul 7, 2015 at 3:08 PM, Xiaoye S. Li <[email protected]> wrote:
For superlu_dist failure, this occurs during symbolic factorization.  Since you 
are using serial symbolic factorization, it requires the entire graph of A to 
be available in the memory of one MPI task. How much memory do you have for 
each MPI task?

It won't help even if you use more processes.  You should try the parallel 
symbolic factorization option.

Another point.  You set up the process grid as:
        Process grid nprow 32 x npcol 20
For better performance, you should swap the grid dimensions. That is, it's better 
to use 20 x 32; never give nprow larger than npcol.


Sherry


On Tue, Jul 7, 2015 at 1:27 PM, Barry Smith <[email protected]> wrote:

    I would suggest running a sequence of problems, 101 by 101, 111 by 111, etc., 
and getting the memory usage in each case (when you run out of memory you get 
NO useful information about memory needs). You can then plot memory usage 
as a function of problem size to get a handle on how much memory it is using.  
You can also run on more and more processes (which have a total of more memory) 
to see how large a problem you may be able to reach.
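
    For instance, a minimal sketch of how to get each data point (PetscMemoryGetCurrentUsage() 
is the query routine; mem is a PetscLogDouble):

    PetscLogDouble mem
    ...
    call PetscMemoryGetCurrentUsage(mem,ierr)
    if (rank == 0) write(*,*) 'current PETSc memory usage (bytes):',mem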

    MUMPS also has an "out of core" version (which we have never used) that 
could, in theory anyway, let you get to large problems if you have lots of disk space, but 
you are on your own figuring out how to use it.

   Barry

On Jul 7, 2015, at 2:37 PM, Anthony Paul Haas <[email protected]> wrote:

Hi Jose,

In my code, I use PETSc once to solve a linear system to get the baseflow 
(without using SLEPc), and then I use SLEPc to do the stability analysis of that 
baseflow. This is why some SLEPc options are not used in 
test.out-superlu_dist-151x151 (when I am solving for the baseflow with PETSc 
only). I have attached a 101x101 case for which I get the eigenvalues. That 
case works fine. However, if I increase to 151x151, I get the error that you can 
see in test.out-superlu_dist-151x151 (similar error with MUMPS: see 
test.out-mumps-151x151 line 2918). If you look at the very end of the files 
test.out-superlu_dist-151x151 and test.out-mumps-151x151, you will see that the 
last info message printed is:

On Processor (after EPSSetFromOptions)  0    memory:    0.65073152000E+08  =====>  (see line 807 of module_petsc.F90)

This means that the memory error probably occurs in the call to EPSSolve (see 
module_petsc.F90 line 810). I would like to evaluate how much memory is 
required by the most memory-intensive operation within EPSSolve. Since I am 
solving a generalized EVP, I would imagine that it would be the LU 
decomposition. But is there an accurate way of measuring it?
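
(As a rough measure, I imagine I could bracket the solve with the PETSc memory routines, e.g.

    call PetscMemorySetGetMaximumUsage(ierr)   ! early, right after initialization
    ...
    call EPSSolve(eps,ierr)
    call PetscMemoryGetMaximumUsage(mem,ierr)
    if (rank==0) write(*,*) 'max memory after EPSSolve:',mem

a sketch only; I am not sure how well that captures what the external LU package allocates.)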

Before starting with iterative solvers, I would like to exploit direct solvers as 
much as I can. I tried GMRES with the default preconditioner at some point but 
I had convergence problems. What solver/preconditioner would you recommend for a 
generalized non-Hermitian (EPS_GNHEP) EVP?

Thanks,

Anthony

On Tue, Jul 7, 2015 at 12:17 AM, Jose E. Roman <[email protected]> wrote:

On 07/07/2015, at 02:33, Anthony Haas wrote:

Hi,

I am computing eigenvalues using PETSc/SLEPc and superlu_dist for the LU 
decomposition (my problem is a generalized eigenvalue problem). The code runs 
fine for a 101x101 grid, but when I increase to 151x151, I get the 
following error:

Can't expand MemType 1: jcol 16104   (and then [NID 00037] 2015-07-06 19:19:17 
Apid 31025976: OOM killer terminated this process.)

It seems to be a memory problem. I monitor the memory usage as far as I can, and 
it seems that memory usage is pretty low. The most memory-intensive part of the 
program is probably the LU decomposition in the context of the generalized EVP. 
Is there a way to evaluate how much memory will be required for that step? I am 
currently running the debug version of the code, which I assume would use 
more memory?

I have attached the output of the job. Note that the program uses PETSc twice: 
1) to solve a linear system, for which no problem occurs, and 2) to solve the 
generalized EVP with SLEPc, where I get the error.

Thanks

Anthony
<test.out-superlu_dist-151x151>
In the output you are attaching there are no SLEPc objects in the report, and 
SLEPc options are not used. It seems that the SLEPc calls are skipped?

Do you get the same error with MUMPS? Have you tried to solve linear systems 
with a preconditioned iterative solver?

Jose


<module_petsc.F90><test.out-mumps-151x151><test.out_superlu_dist-101x101><test.out-superlu_dist-151x151>




<test.out-hangs><test.out-floating-point-exception>

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! SPECTRAL TEMPORAL BIGLOBAL SOLVER !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

PROGRAM Biglobal_Spectral

  USE PETSC
  USE TYPEDEF
  USE MAPPINGS
  USE BASEFLOW
  USE WRITE_PRINT
  USE FOURIER_CHEBYSHEV

  IMPLICIT NONE

  include 'mpif.h'

! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!
!                           Variables Definition                         !
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!

!...Testing and dummy variables
    real(dp) :: dummy

!...Problem parameters
    integer(i4b),parameter :: nx=11,ny=11,nxy=nx*ny,N=4*nxy

    !Grid size for 3D Poiseuille baseflow
    integer(i4b),parameter :: nnx=nx,nny=ny

    real(dp) :: Re,alpha,beta,Lx,xmin,xmax,ymin,ymax,AA,xll,yll

!...Problem Flags
    integer(i4b) :: fourier,testcase,formulation,algorithm

    !Mapping Flags
    integer(i4b) :: map,mapx,mapy

!...Indices
    integer(i4b) :: i,j,iii,jjj,ij,skip,imin,imax,jmin,jmax

!...Output files
    integer(i4b)   :: file_num
    character(128) :: filetype,filename,file_name

!...Boundary conditions variables
    integer(i4b) :: ct1,ct2,ct3,ct4
    integer(i4b),dimension(:),allocatable :: fst_bc,wall_bc,outlet_bc,inlet_bc
    integer(i4b),dimension(:),allocatable :: bcond_px,bcond_py,bcond_pz
    integer(i4b),dimension(:),allocatable :: bcond_u,bcond_v,bcond_w,bcond_p
    integer(i4b),dimension(:),allocatable :: bcond

!...Chebyshev related
    real(dp),dimension(:),allocatable   :: xi,yi,x,y,xfourier

!...Baseflow variables
    real(dp),dimension(:,:),allocatable :: usol,uxsol,uysol
    real(dp),dimension(:,:),allocatable :: vsol,vxsol,vysol
    real(dp),dimension(:,:),allocatable :: wsol,wxsol,wysol

    !For Hiemenz Baseflow
    integer(i4b) :: npts_hz
    real(dp) :: ymax_hz,epstol,si

    real(dp),dimension(:),allocatable :: eta
    real(dp),dimension(:),allocatable :: u0,v0,v1,v2,v3
    real(dp),dimension(:),allocatable :: w0,w1,w2
    real(dp),dimension(:),allocatable :: etarev,u0rev,v0rev,w0rev

    real(dp),dimension(:,:),allocatable :: Ubar,Vbar,Wbar

!...Grid variables
    real(dp),dimension(:,:),allocatable :: xx,yy,xx1,yy1

!...Vectorization array
    integer(i4b),dimension(:),allocatable :: li

!...MPI
    integer(i4b) :: rank,size,ierr

!...Spline interpolation
    !real(dp),dimension(:),allocatable :: bsp,csp,dsp
    !real(dp),dimension(:),allocatable :: interp_u0,interp_v0,interp_w0

    real(dp) :: seval

!...External Subroutines
    external seval

! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!
!                           Beginning of program                         !
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!

    call MPI_INIT(ierr)

    call MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)

    if (rank==0) call cpu_time(start_total)

!...Parameters for BiGlobal Computation

    testcase=2      ! 1 --> Plane Poiseuille Flow
                    ! 2 --> 3D Poiseuille Flow
                    ! 3 --> Attachment line flow (Hiemenz)
                    ! 4 --> Laminar Separation Bubble
                    ! 5 --> Artificially parallel boundary-layer (cf. Rodriguez thesis)

    formulation = 1 ! 1 --> x,y inhomogeneous
                    ! 2 --> y,z inhomogeneous

    algorithm = 1   ! 1 --> QZ algorithm
                    ! 2 --> Arnoldi algorithm


!...Arrays allocation

    allocate(li(nx))
    allocate(xi(nx),yi(ny))
    allocate(x(nx),y(ny))
    allocate(xfourier(nx))

    allocate(uvec(nxy),uxvec(nxy),uyvec(nxy))
    allocate(vvec(nxy),vxvec(nxy),vyvec(nxy))
    allocate(wvec(nxy),wxvec(nxy),wyvec(nxy))

    allocate(xx(nx,ny),yy(nx,ny))
    allocate(D1X(nx,nx),D2X(nx,nx))
    allocate(D1Y(ny,ny),D2Y(ny,ny))
    allocate(DMX(nx,nx,2),DMY(ny,ny,2))

    !allocate(interp_u0(ny),interp_v0(ny),interp_w0(ny))

    li       = 0
    xi       = 0.0d0
    yi       = 0.0d0
    x        = 0.0d0
    y        = 0.0d0
    xfourier = 0.0d0
    uvec     = 0.0d0
    uxvec    = 0.0d0
    uyvec    = 0.0d0
    vvec     = 0.0d0
    vxvec    = 0.0d0
    vyvec    = 0.0d0
    wvec     = 0.0d0
    wxvec    = 0.0d0
    wyvec    = 0.0d0
    xx       = 0.0d0
    yy       = 0.0d0
    D1X      = 0.0d0
    D2X      = 0.0d0
    D1Y      = 0.0d0
    D2Y      = 0.0d0
    DMX      = 0.0d0
    DMY      = 0.0d0

!...Test Cases Parameters

    if ( testcase == 2 ) then

       mapx=1
       mapy=0

       Re=1000

       if (formulation == 1)then
          beta = pi
       elseif (formulation == 2) then
          alpha = pi
       endif

       AA=1.0

       xmin=-AA
       xmax=AA

       ymin=-1.0
       ymax=1.0

       !grid for baseflow
       !nnx=nx
       !nny=ny

    endif

!...Working Array for Converting 2D Indices to 1D

    li=0

    do i=1,nx
       li(i)=(i-1)*ny
    enddo

!...Get Collocation Points and Chebyshev Differentiation Matrices

    call cheby(nx-1,xi,DMX) ! (cf. Spectral Methods in Matlab pages 53-55)
    call cheby(ny-1,yi,DMY)

    xll=0

    ! To get matrices D1X,D2X,D1Y,D2Y
    call set_mappings(nx,ny,mapx,mapy,xmin,xmax,ymin,ymax,xi,yi,xll,yll,x,y)

    do i=1,nx
       do j=1,ny

          xx(i,j)=x(i) ! not equivalent to matlab meshgrid!!!
          yy(i,j)=y(j) ! not equivalent to matlab meshgrid!!!

       enddo
    enddo


!...Compute Baseflow (3D Poiseuille)

    if ( testcase == 2 ) then

       allocate(xx1(nnx,nny),yy1(nnx,nny))

       allocate(usol(nnx,nny),uxsol(nnx,nny),uysol(nnx,nny))
       allocate(vsol(nnx,nny),vxsol(nnx,nny),vysol(nnx,nny))
       allocate(wsol(nnx,nny),wxsol(nnx,nny),wysol(nnx,nny))

       xx1   = 0.0d0
       yy1   = 0.0d0
       usol  = 0.0d0
       uxsol = 0.0d0
       uysol = 0.0d0
       vsol  = 0.0d0
       vxsol = 0.0d0
       vysol = 0.0d0
       wsol  = 0.0d0
       wxsol = 0.0d0
       wysol = 0.0d0

       if (rank == 0) then
          WRITE(*,*)
          WRITE(*,'(A,I3,A,I3)')'3D POISEUILLE BASEFLOW COMPUTED ON GRID WITH NX =',nnx,' NY =',nny
          WRITE(*,*)
          call cpu_time(start)
       endif

       !currently I am taking same grid for baseflow as for biglobal ==> no interpolation
       call Poiseuille_3D_PETSC(nnx,nny,AA,usol,xx1,yy1)

       if (rank==0) then
          call cpu_time(finish)
          print '("Time for baseflow computation = ",f15.4," 
seconds.")',finish-start

          file_num=10
          file_name='check_baseflow_3D_poiseuille.dat'

          call write_to_tecplot(file_num,file_name,nnx,nny,xx1,yy1,usol,usol,usol)

       endif

       vsol=0*usol
       wsol=0*usol

       if (formulation==1) then

          wsol=usol
          usol=0.*wsol
          vsol=0.*wsol

       endif

       !Compute derivatives
       uxsol = MATMUL(D1X,usol)
       uysol = MATMUL(usol,TRANSPOSE(D1Y))

       vxsol = MATMUL(D1X,vsol)
       vysol = MATMUL(vsol,TRANSPOSE(D1Y))

       wxsol = MATMUL(D1X,wsol)
       wysol = MATMUL(wsol,TRANSPOSE(D1Y))

       !Vectorize the matrices
       do i=1,nx
          do j=1,ny

             ij=li(i)+j

             uvec(ij) = usol(i,j)
             vvec(ij) = vsol(i,j)
             wvec(ij) = wsol(i,j)

             uxvec(ij) = uxsol(i,j)
             vxvec(ij) = vxsol(i,j)
             wxvec(ij) = wxsol(i,j)

             uyvec(ij) = uysol(i,j)
             vyvec(ij) = vysol(i,j)
             wyvec(ij) = wysol(i,j)

          enddo
       enddo

    endif

    !call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)

!...Build Matrices, Set BCs and Solve Problem

    call SetupSolveGEVP(nx,ny,nxy,alpha,beta,Re)

    if (rank==0) then
       call cpu_time(finish_total)
       print '("Total time = ",f15.4," seconds.")',finish_total-start_total
    endif

    deallocate(li)
    deallocate(xi,yi)
    deallocate(x,y)
    deallocate(xfourier)

    deallocate(uvec,uxvec,uyvec)
    deallocate(vvec,vxvec,vyvec)
    deallocate(wvec,wxvec,wyvec)

    deallocate(xx,yy)
    deallocate(D1X,D2X)
    deallocate(D1Y,D2Y)
    deallocate(DMX,DMY)

    deallocate(xx1,yy1)

    deallocate(usol,uxsol,uysol)
    deallocate(vsol,vxsol,vysol)
    deallocate(wsol,wxsol,wysol)

    call MPI_FINALIZE(ierr)


  END PROGRAM Biglobal_Spectral

MODULE BASEFLOW

  USE TYPEDEF
  USE WRITE_PRINT
  USE FOURIER_CHEBYSHEV

  IMPLICIT NONE

CONTAINS

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

  SUBROUTINE Poiseuille_3D_PETSC(nx,ny,AA,usol,xx,yy)

#include <petsc/finclude/petscsys.h>
#include <petsc/finclude/petscvec.h>
#include <petsc/finclude/petscmat.h>
#include <petsc/finclude/petscmat.h90>
#include <petsc/finclude/petscvec.h90>
#include <petsc/finclude/petscpc.h>
#include <petsc/finclude/petscksp.h>
#include <petsc/finclude/petscviewer.h>
#include <petsc/finclude/petscviewer.h90>

#include <slepc/finclude/slepcsys.h>
#include <slepc/finclude/slepceps.h>

!...Regular Variables

    integer(i4b),intent(in) :: nx,ny

    real(dp),intent(in) :: AA
    real(dp),dimension(nx,ny),intent(inout) :: usol,xx,yy

    integer(i4b) :: i,j,ij
    integer(i4b) :: rank,size,ierr,source
    integer(i4b) :: nxy,nxm2,nym2,nxym2

    integer(i4b),dimension(nx) :: li
    integer(i4b),dimension(nx-2) :: lim2

    integer(i4b),dimension(:),allocatable  :: loc

    real(dp) :: ymin,ymax,xmin,xmax
    real(dp) :: dxidx,d2xidx2,dyidy,d2yidy2

    real(dp),dimension(nx) :: xi,x
    real(dp),dimension(ny) :: yi,y

    real(dp),dimension(nx,nx) :: D2XB
    real(dp),dimension(ny,ny) :: D2YB
    real(dp),dimension(nx,nx,2) :: DMXB
    real(dp),dimension(ny,ny,2) :: DMYB

    real(dp),dimension(nx*ny) :: usol_vec
    real(dp),dimension( nx-2,ny-2 ) :: sol_mat

    complex(dpc) :: U00
    complex(dpc) :: atest
    complex(dpc),dimension(nx-2,nx-2) :: D2XBs
    complex(dpc),dimension(ny-2,ny-2) :: D2YBs
    complex(dpc),dimension( nx-2,ny-2 ) :: sol_mat_cmplx
    complex(dpc),dimension(:),allocatable  :: xwork1

!!$    allocate(li(nx))
!!$    allocate(lim2(nx-2))

!!! NEED TO TRANSFORM ALL THESE ARRAYS TO ALLOCATABLE!!!!!!

!...Petsc Variables
    PetscInt ii,jj,ione,ct
    PetscInt jmin,jmax
    PetscInt Istart,Iend
    PetscInt IstartVec,IendVec

    PetscReal   tol
    PetscScalar aone

    PetscViewer viewer

    PC         pc
    KSP        ksp
    VecScatter ctx
    Mat        DXX,DYY
    !Mat        DLOAD
    Vec        frhs,frhs2
    Vec        sol,sol_seq


    !For VecGetArrayReadF90
    !PetscScalar,pointer :: xx_v(:) !Fortran90 pointer to the array
    PetscOffset i_x
    PetscScalar sol_array(1)

!...MPI
    !PetscMPIInt    rank,size

! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!
!                         Beginning of program                           !
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!

    call MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)

    call PetscInitialize(PETSC_NULL_CHARACTER,ierr)

    !call PetscOptionsInsertString("-mat_superlu_dist_parsymbfact",ierr)

!...Check that nx and ny are odd (necessary to set max velocity to 1)

    if ( mod(nx,2)==0 .or.  mod(ny,2)==0   ) then
       write(*,*)
       STOP'STOPPING... IN 3D POISEUILLE BASEFLOW NX,NY NEED TO BE ODD'
    endif

!...Set Domain Sizes and Initialize other variables

    xmin=-AA
    xmax=AA

    ymin=-1
    ymax=1

    ione=1

    nxy=nx*ny

    !Variables to Build DXX and DYY
    nxm2=nx-2
    nym2=ny-2
    nxym2 = nxm2*nym2

!...Mat to Vec and Vec to Mat conversion

    lim2=0
    do i=1,nxm2
       lim2(i) = (i-1)*nym2
    enddo

!...Get collocation points and chebyshev differentiation matrices

    call cheby(nx-1,xi,DMXB)
    call cheby(ny-1,yi,DMYB)

    ! x-mapping from [-1,1]->[-XMAX,XMAX]
    x=xmax*xi

    ! y-mapping (from [-1,1]->[-1,1])
    y=yi

    !Compute the metric terms
    dxidx = 1/xmax
    d2xidx2 = 0

    dyidy = 1
    d2yidy2 = 0

    !Scale the differentiation matrix (due to the use of the mapping)

    !x-direction
    D2XB  = d2xidx2*DMXB(:,:,1) + dxidx**2*DMXB(:,:,2)
    D2XBs = D2XB(2:nx-1,2:nx-1)

    !y-direction
    D2YB  = d2yidy2*DMYB(:,:,1) + dyidy**2*DMYB(:,:,2)
    D2YBs = D2YB(2:ny-1,2:ny-1)

    xx=0.0
    yy=0.0

    do i=1,nx
       do j=1,ny

          xx(i,j)=x(i)
          yy(i,j)=y(j)

       enddo
    enddo

!...Create Petsc right-hand-side vector and derivation matrices
    call VecCreateSeq(PETSC_COMM_SELF,nxym2,sol_seq,ierr)
    call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nxym2,frhs,ierr)  ! right hand side vector
    call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nxym2,frhs2,ierr) ! right hand side vector # 2

    call MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,nxym2,nxym2, &
         nx,PETSC_NULL_INTEGER,nx,PETSC_NULL_INTEGER,DXX,ierr)
    call MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,nxym2,nxym2, &
         ny,PETSC_NULL_INTEGER,ny,PETSC_NULL_INTEGER,DYY,ierr)

!...Determine which rows of the vector/matrix are locally owned.
    call VecGetOwnershipRange(frhs,IstartVec,IendVec,ierr)
    call MatGetOwnershipRange(DXX,Istart,Iend,ierr)

    if ( IstartVec /= Istart .or. IendVec /= Iend ) then
       stop 'Problem in Baseflow Computation'
    endif

!...Fill in Sequential and Parallel Vectors
    allocate(xwork1(Iend-Istart))
    allocate(loc(Iend-Istart))

    ct=0
    do i=Istart,Iend-1
       ct=ct+1
       loc(ct)=i
       xwork1(ct)=(-2.0d0,0.0d0)
    enddo

    call VecSetValues(frhs,Iend-Istart,loc,xwork1,INSERT_VALUES,ierr)
    call VecZeroEntries(sol_seq,ierr)

!...Assemble Vectors
    call VecAssemblyBegin(frhs,ierr)
    call VecAssemblyEnd(frhs,ierr)

    call VecAssemblyBegin(sol_seq,ierr)
    call VecAssemblyEnd(sol_seq,ierr)

    call VecDuplicate(frhs,sol,ierr)   ! parallel solution vector

!...Display vector
    !call VecView(frhs,PETSC_VIEWER_STDOUT_WORLD,ierr) ! ==> seems very slow

!...Build DXX in Parallel
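
    ! Indexing note (added for clarity): with the y-index running fastest, local row i
    ! corresponds to the interior point with x-index ii = 1 + floor(i/nym2) and y-index
    ! mod(i,nym2) + 1. The x-derivative couples all x-stations at a fixed y, so row ii of
    ! D2XBs is scattered into columns jmin, jmin+nym2, ..., jmax (stride nym2) below.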

!!$    if (rank==0) then
!!$       call print_matrix('D2XB',nx,nx,D2XB)
!!$       call print_matrix('D2YB',ny,ny,D2YB)
!!$
!!$       call print_matrix_cmplx('D2XBs',nx-2,nx-2,D2XBs)
!!$       call print_matrix_cmplx('D2YBs',ny-2,ny-2,D2YBs)
!!$    endif

    if (rank == 0) call cpu_time(start)

    do i=Istart,Iend-1

       ii=1+floor((i+0.01)/nym2)
       jj=1

       jmin=mod(i,nym2)
       jmax=jmin+(nxm2-1)*nym2

       do j=jmin,jmax,nym2

          call MatSetValues(DXX,ione,i,ione,j,D2XBs(ii,jj),INSERT_VALUES,ierr)
          jj=jj+1

       enddo

    enddo

!...Build DYY in Parallel

    do i=Istart,Iend-1

       ii=mod(i,nym2)+1
       jj=1

       jmin=floor((i+0.01)/nym2)*nym2

       do j=jmin,jmin+(nym2-1)

          call MatSetValues(DYY,ione,i,ione,j,D2YBs(ii,jj),INSERT_VALUES,ierr)
          jj=jj+1

       enddo

    enddo

!...Assemble DXX and DYY

    call MatAssemblyBegin(DXX,MAT_FINAL_ASSEMBLY,ierr)
    call MatAssemblyEnd(DXX,MAT_FINAL_ASSEMBLY,ierr)

    call MatAssemblyBegin(DYY,MAT_FINAL_ASSEMBLY,ierr)
    call MatAssemblyEnd(DYY,MAT_FINAL_ASSEMBLY,ierr)

    if (rank==0) then
       call cpu_time(finish)
       print '("Time for build and assembly of baseflow operator = ",f15.4," 
seconds.")',finish-start
    endif

    !call MatView(DXX,PETSC_VIEWER_STDOUT_WORLD,ierr)
    !call MatView(DYY,PETSC_VIEWER_STDOUT_WORLD,ierr)

    aone = (1.0d0,0.0d0)

    if (rank == 0) call cpu_time(start)
    call MatAXPY(DXX,aone,DYY,DIFFERENT_NONZERO_PATTERN,ierr) ! on exit DXX = a*DYY + DXX
    if (rank==0) then
       call cpu_time(finish)
       print '("Time for Matrix summation = ",f15.4," seconds.")',finish-start
    endif


!...Write Matrix to ASCII and Binary Format
!!$    call PetscViewerASCIIOpen(PETSC_COMM_WORLD,"Amat.m",viewer,ierr)
!!$    call MatView(DXX,viewer,ierr)
!!$    call PetscViewerDestroy(viewer,ierr)
!!$
!!$    call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_WRITE,viewer,ierr)
!!$    call MatView(DXX,viewer,ierr)
!!$    call PetscViewerDestroy(viewer,ierr)

!...Load a Matrix in Binary Format
!!$    call PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_READ,viewer,ierr)
!!$    call MatCreate(PETSC_COMM_WORLD,DLOAD,ierr)
!!$    call MatSetType(DLOAD,MATAIJ,ierr)
!!$    call MatLoad(DLOAD,viewer,ierr)
!!$    call PetscViewerDestroy(viewer,ierr)

    !call MatTranspose(DXX,MatReuse reuse,Mat *B)
    !call MatView(DXX,PETSC_VIEWER_STDOUT_WORLD,ierr)

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! SOLVE POISSON EQUATION FOR THE FIRST TIME !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

!...Create linear solver context
    call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)

!...Set operators. Here the matrix that defines the linear system also serves as the preconditioning matrix.
    !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) ! aha: commented and replaced by next line
    call KSPSetOperators(ksp,DXX,DXX,ierr) ! remember: here DXX=DXX+DYY

    !call KSPSetOperators(ksp,DLOAD,DLOAD,ierr) ! variant using the matrix loaded from file

    tol = 1.e-10
    !call KSPSetTolerances(ksp,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr) ! set relative tolerance and use defaults for absolute and divergence tol
    call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr) ! set relative and absolute tolerances and use default for divergence tol

!...Set the Direct (LU) Solver
    !call KSPSetType(ksp,KSPPREONLY,ierr)
    !call KSPGetPC(ksp,pc,ierr)
    !call PCSetType(pc,PCLU,ierr)
    !call PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS,ierr) ! or MATSOLVERSUPERLU_DIST

!...Set runtime options, e.g., -ksp_type <type> -pc_type <type> -ksp_monitor -ksp_rtol <rtol>
    !These options will override those specified above as long as KSPSetFromOptions() is
    !called _after_ any other customization routines.

    call KSPSetFromOptions(ksp,ierr)

!...Set an ADDITIONAL function to be called at every iteration to monitor the residual/error etc.
    call KSPMonitorSet(ksp,KSPMonitorTrueResidualNorm,PETSC_NULL_OBJECT,PETSC_NULL_FUNCTION,ierr)

!...Solve the linear system
    call KSPSolve(ksp,frhs,sol,ierr)

!...View solver info (could use -ksp_view instead)

    call KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD,ierr)

!...Extract Maximum Value (at center of duct) from Solution Vector

    call VecScatterCreateToZero(sol,ctx,sol_seq,ierr)
    !scatter as many times as you need
    call VecScatterBegin(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr)
    call VecScatterEnd(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr)
    !destroy scatter context and local vector when no longer needed
    call VecScatterDestroy(ctx,ierr)

    !call VecView(sol_seq,PETSC_VIEWER_STDOUT_WORLD,ierr)

    call VecGetArray(sol_seq,sol_array,i_x,ierr)

    if (rank==0) then

       sol_mat_cmplx=0.0d0

       do i=1,nxm2
          do j=1,nym2
             ij=lim2(i)+j
             sol_mat_cmplx(i,j) = sol_array(i_x + ij)
          enddo
       enddo

       U00 = sol_mat_cmplx((nxm2+1)/2,(nym2+1)/2)

    endif

    source=0
    Call MPI_Bcast(U00,1,MPI_DOUBLE_COMPLEX,source,MPI_COMM_WORLD,ierr)

    !call VecGetArrayReadF90(sol,xx_v,ierr) ! requires #include <petsc/finclude/petscvec.h90>
    !atest = xx_v(3)
    !call VecRestoreArrayReadF90(sol,xx_v,ierr)
    !write(*,*)'atest',atest

    call VecRestoreArray(sol_seq,sol_array,i_x,ierr)

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! SOLVE POISSON EQUATION FOR THE SECOND TIME !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

!...Fill in New Right-Hand-Side Vector (frhs2)
    call VecSetValues(frhs2,Iend-Istart,loc,xwork1/U00,INSERT_VALUES,ierr)
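    ! (frhs2 is the original RHS scaled by 1/U00, so this second solve returns the
    !  baseflow normalized by its duct-center value, i.e. maximum velocity 1)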

!...Assemble New RHS Vector
    call VecAssemblyBegin(frhs2,ierr)
    call VecAssemblyEnd(frhs2,ierr)

!...Solve System
    call KSPSolve(ksp,frhs2,sol,ierr)

!...Get Final Solution
    call VecScatterCreateToZero(sol,ctx,sol_seq,ierr)
    call VecScatterBegin(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr)
    call VecScatterEnd(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr)
    call VecScatterDestroy(ctx,ierr)

    call VecGetArray(sol_seq,sol_array,i_x,ierr)

    if (rank==0) then

       usol=0.0d0
       sol_mat_cmplx=0.0d0

       do i=1,nxm2
          do j=1,nym2
             ij=lim2(i)+j
             sol_mat_cmplx(i,j) = sol_array(i_x + ij)
          enddo
       enddo

       usol(2:nx-1,2:ny-1) = dble(sol_mat_cmplx)

    endif

    call VecRestoreArray(sol_seq,sol_array,i_x,ierr)

!...Send usol to all other processes (vectorize, broadcast and matrixize)
    li=0
    do i=1,nx
       li(i) = (i-1)*ny
    enddo

    if (rank==0) then

       do i=1,nx
          do j=1,ny
             ij=li(i)+j
             usol_vec(ij)=usol(i,j)
          enddo
       enddo

    endif

    call MPI_Bcast(usol_vec,nx*ny,MPI_DOUBLE_PRECISION,source,MPI_COMM_WORLD,ierr) ! usol_vec is real(dp): use MPI_DOUBLE_PRECISION (MPI_DOUBLE is the C-binding name)

    if (rank/=0) then

       do i=1,nx
          do j=1,ny
             ij=li(i)+j
             usol(i,j)=usol_vec(ij)
          enddo
       enddo

    endif

!...Free work space. All PETSc objects should be destroyed when they are no longer needed.

    deallocate(loc)
    deallocate(xwork1)

    call KSPDestroy(ksp,ierr)
    call VecDestroy(frhs,ierr)
    call VecDestroy(frhs2,ierr)
    call VecDestroy(sol,ierr)
    call VecDestroy(sol_seq,ierr)
    call MatDestroy(DXX,ierr)
    call MatDestroy(DYY,ierr)

    call PetscFinalize(ierr)


END SUBROUTINE Poiseuille_3D_PETSC



!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!! HIEMENZ FLOW SOLVER !!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


!*************************************************************************************************
! on Veltins: ifort hiemenz.f90 -convert big_endian -r8 -i4 -O3 -132 -o hiemenz.x
!*************************************************************************************************
      !program hiemenz

      subroutine hiemenz(ymax,eps,n,si,eta,u0,v0,v1,v2,v3,w0,w1,w2)

      implicit none

      real(dp) eps,ymax,si
      integer(i4b) n,i

!      real(dp),allocatable,dimension(:) :: eta
!      real(dp),allocatable,dimension(:) :: u0,v0,v1,v2,v3
!      real(dp),allocatable,dimension(:) :: w0,w1,w2

      real(dp),dimension(n) :: eta
      real(dp),dimension(n) :: u0,v0,v1,v2,v3
      real(dp),dimension(n) :: w0,w1,w2

      write(*,*)
      write(*,*) '*******************************************************************'
      write(*,*) '                       HIEMENZ SOLVER                              '
      write(*,*) '*******************************************************************'
      write(*,*)

!
!..... Numerical Parameters
!

!Domain height and number of points

      !ymax=100
      !n=500001

!Tolerance for Newton iteration

      !eps=1e-12

!Initial guess for Newton iteration: f''(0)

      !si = -1.232587656820134

!
!..... Allocate arrays
!
!      allocate(eta(n))
!      allocate(u0(n),v0(n),v1(n),v2(n),v3(n))
!      allocate(w0(n),w1(n),w2(n))

!
!..... Compute solution
!

      call hiemenz_fct(ymax,eps,n,si,eta,u0,v0,v1,v2,v3,w0,w1,w2)

!
!..... Write out solution in ASCII format
!

      write(*,*)
      write(*,*) 'Writing solution to hiemenz.dat'
      write(*,*)

      open(unit=15,file='hiemenz.dat',form='formatted')

      do i=1,n
         write(15,'(9E21.11)') eta(i),u0(i),v0(i),v1(i),v2(i),v3(i),w0(i),w1(i),w2(i)
      end do

      close(15)



    end subroutine hiemenz


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


      subroutine hiemenz_fct(ymax,eps,n,si,eta,u0,v0,v1,v2,v3,w0,w1,w2)

      implicit none

      integer(i4b) i,iter,n,j
      real(dp) ymax,res,p,eps,s,si

      real(dp),allocatable,dimension(:,:) :: v,w

      real(dp) eta(n)
      real(dp) u0(n),v0(n),v1(n),v2(n),v3(n)
      real(dp) w0(n),w1(n),w2(n)

      !external w_ode

      write(*,*) 'eps=',eps
      write(*,*) 'ymax=',ymax
      write(*,*) 'N=',n
      write(*,*)

!
!     Allocate arrays
!
      allocate(v (6,n))
      allocate(w (5,n))

!
!     Create grid
!
      do i=1,n
         eta(i) = ymax * real(i-1)/real(n-1)
      end do
!
!     Newton-Raphson iteration for v
!
      iter = 0
      s = si
      res = 1.
      write(*,*) 'Newton-Raphson iteration for v: s=',s

      do while (res.gt.eps)

         iter=iter+1

         call hiemenz_eval(s,eta,n,v)

         s = s - (v(2,n) + 1.0) / v(5,n)
         res = abs(v(2,n) + 1.0)

         write(*,*) iter,s,res

      end do

      write(*,*) 'Number of iterations ',iter
      write(*,*)
      write(*,*) "v''(0)=",v(3,1)

!
!     Now compute w
!
      w(1,1) = 0.
      w(2,1) = v(3,1)

      w(3,1) = 0.
      w(4,1) = 0.
      w(5,1) = v(3,1)

      do i=2,n
         call rk4(w(1,i-1),w(1,i),eta(i)-eta(i-1),5,w_ode)
      end do

!
!     Rescale w to match boundary conditions at infinity
!
      p = 1./w(1,n)

      do i=1,n
         w(1,i) = w(1,i) * p
         w(2,i) = w(2,i) * p
      end do

      write(*,*) "w'(0)=",w(2,1)

!
!     Now compute v''', w'' from the RHS and copy all values to new arrays
!

      do i=1,n

         u0(i) = -v(2,i)
         v0(i) = v(1,i)
         v1(i) = v(2,i)
         v2(i) = v(3,i)
         v3(i) = -v(2,i)*v(2,i) + v(1,i)*v(3,i) + 1.0
         w0(i) = w(1,i)
         w1(i) = w(2,i)
         w2(i) = w(2,i)*v(1,i)

      end do


      deallocate(v,w)

    end subroutine hiemenz_fct


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


      subroutine hiemenz_eval(s,eta,n,f)

      implicit none

      integer(i4b) n,i
      real(dp) s
      real(dp) eta(n),f(6,n)

      !external v_ode

      f(1,1) = 0.
      f(2,1) = 0.
      f(3,1) = s

      f(4,1) = 0.
      f(5,1) = 0.
      f(6,1) = 1.

      do i=2,n
         call rk4(f(1,i-1),f(1,i),eta(i)-eta(i-1),6,v_ode)
      end do

    end subroutine hiemenz_eval


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


      subroutine v_ode(y,dy)

      implicit none

      real(dp) y(6),dy(6)

      dy(1) = y(2)
      dy(2) = y(3)
      dy(3) = -y(2)*y(2) + y(1)*y(3) + 1.0

      dy(4) = y(5)
      dy(5) = y(6)
      dy(6) = y(3)*y(4) - 2.0*y(2)*y(5) + y(1)*y(6)

      end subroutine v_ode


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


      subroutine w_ode(y,dy)

      implicit none

      real(dp) y(5),dy(5)

      dy(1) = y(2)
      dy(2) = y(2)*y(3)

      dy(3) = y(4)
      dy(4) = y(5)
      dy(5) = -y(4)*y(4) + y(3)*y(5) + 1.0

      end subroutine w_ode


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


      subroutine rk4(y1,y2,h,nn,der)
!
!     Perform one RK4 step with step size h on an nn-vector
!     Input:  y1(nn) old step
!             h      step size
!             nn     vector size
!             der    derivative function
!     Output: y2(nn) new step
!
      implicit none

      integer(i4b) nn

      real(dp) y1(nn),y2(nn),y(nn)
      real(dp) k1(nn),k2(nn),k3(nn),k4(nn)
      real(dp) h

      external der
!
!     First RK substep
!

      y(:) = y1(:)
      call der(y,k1)
      k1(:)=h*k1(:)

!
!     Second RK substep
!

      y(:) = y1(:) + .5*k1(:)
      call der(y,k2)
      k2(:)=h*k2(:)

!
!     Third RK substep
!

      y(:) = y1(:) + .5*k2(:)
      call der(y,k3)
      k3(:)=h*k3(:)

!
!     Fourth RK substep
!

      y(:) = y1(:) + k3(:)
      call der(y,k4)
      k4(:)=h*k4(:)

!
!     Compose solution after full RK step
!
      y2(:) = y1(:) + ( k1(:) + 2.0*k2(:) + 2.0*k3(:) + k4(:) ) / 6.0

      end subroutine rk4


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


      subroutine compute_baseflow_derivative(DXYZ,uvwvec,nx,ny,UVWxyz)

        implicit none

        integer(i4b) :: i,j

        integer(i4b),intent(in) :: nx,ny

        real(dp),dimension(nx*ny),intent(in) :: uvwvec
        real(dp),dimension(nx*ny,nx*ny),intent(in) :: DXYZ
        real(dp),dimension(nx*ny,nx*ny),intent(out) :: UVWxyz

        UVWxyz = 0.0

        do i=1,nx*ny
           do j=1,nx*ny

              UVWxyz(i,i) = UVWxyz(i,i) + DXYZ(i,j)*uvwvec(j)

           enddo
        enddo

      end subroutine compute_baseflow_derivative


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


      subroutine compute_UVW_DXYZ(DXYZ,uvwvec,nx,ny,UVW_DXYZ)

        implicit none

        integer(i4b) :: i,j

        integer(i4b),intent(in) :: nx,ny

        real(dp),dimension(nx*ny),intent(in) :: uvwvec
        real(dp),dimension(nx*ny,nx*ny),intent(in) :: DXYZ
        real(dp),dimension(nx*ny,nx*ny),intent(out) :: UVW_DXYZ

        do i=1,nx*ny
           do j=1,nx*ny

              UVW_DXYZ(i,j) = uvwvec(i)*DXYZ(i,j)

           enddo
        enddo

      end subroutine compute_UVW_DXYZ


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!




END MODULE BASEFLOW



!!$
!!$
!!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!$
!!$  SUBROUTINE Poiseuille_3D(nz,ny,AA,usol,zz,yy)
!!$
!!$    character(128) :: file_name
!!$    integer(i4b) :: INFO,NRHS,ct,i,j,N,file_num,m1,n1,p1,q1
!!$
!!$    integer(i4b),intent(in) :: nz,ny
!!$    real(dp),intent(in) :: AA
!!$
!!$    real(dp) :: ymin,ymax,zmin,zmax,dzidz,d2zidz2,dyidy,d2yidy2,U00
!!$    real(dp) :: start,finish
!!$
!!$    real(dp),dimension(nz,nz,2) :: DMZ
!!$    real(dp),dimension(ny,ny,2) :: DMY
!!$
!!$    real(dp),dimension(nz,nz) :: D2Z
!!$    real(dp),dimension(ny,ny) :: D2Y
!!$
!!$    real(dp),dimension(nz-2,nz-2) :: D2Zs
!!$    real(dp),dimension(ny-2,ny-2) :: D2Ys
!!$
!!$    real(dp),dimension(nz-2,nz-2) :: IZ
!!$    real(dp),dimension(ny-2,ny-2) :: IY
!!$
!!$    real(dp),dimension(nz,ny) :: usol,zz,yy
!!$    real(dp),dimension(nz-2,ny-2) :: utmp
!!$
!!$    real(dp),dimension(nz) :: zi,z
!!$    real(dp),dimension(ny) :: yi,y
!!$
!!$    real(dp),dimension( (nz-2)*(ny-2) ) :: frhs
!!$    real(dp),dimension( (nz-2)*(ny-2),(nz-2)*(ny-2) ) :: AMAT,DZZ,DYY
!!$
!!$    integer(i4b),dimension((nz-2)*(ny-2)) :: IPIV
!!$
!!$    frhs=0.0
!!$
!!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!$!!!!!!! 3D POISEUILLE FLOW !!!!!!!!
!!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!$
!!$    zmin=-AA
!!$    zmax=AA
!!$
!!$    ymin=-1
!!$    ymax=1
!!$
!!$    !
!!$    !! .....Create Grid
!!$    !
!!$
!!$    if ( mod(nz,2)==0 .or.  mod(ny,2)==0   ) then
!!$       WRITE(*,*)
!!$       STOP'STOPPING... IN 3D POISEUILLE BASEFLOW NZ,NY NEED TO BE ODD'
!!$    endif
!!$
!!$    !
!!$    !! .....Get collocation points and chebyshev differentiation matrices
!!$    !
!!$
!!$    call cheby(nz-1,zi,DMZ)
!!$    call cheby(ny-1,yi,DMY)
!!$
!!$    ! z-mapping from [-1,1]->[-ZMAX,ZMAX]
!!$    z=zmax*zi
!!$
!!$    ! y-mapping (from [-1,1]->[-1,1])
!!$    y=yi
!!$
!!$    !Compute the metric terms
!!$    dzidz = 1/zmax
!!$    d2zidz2 = 0
!!$
!!$    dyidy = 1
!!$    d2yidy2 = 0
!!$
!!$    !Scale the differentiation matrix (due to the use of the mapping)
!!$
!!$    !z-direction
!!$    !D1Z = dzidz*DMZ(:,:,1)
!!$    D2Z  = d2zidz2*DMZ(:,:,1) + dzidz**2*DMZ(:,:,2)
!!$    D2Zs = D2Z(2:nz-1,2:nz-1)
!!$
!!$    !y-direction
!!$    !D1Y = dyidy*DMY(:,:,1)
!!$    D2Y  = d2yidy2*DMY(:,:,1) + dyidy**2*DMY(:,:,2)
!!$    D2Ys = D2Y(2:ny-1,2:ny-1)
!!$
!!$    IZ=0.0
!!$    IY=0.0
!!$
!!$    do i=1,nz-2
!!$       IZ(i,i)=1.0
!!$    enddo
!!$
!!$    do i=1,ny-2
!!$       IY(i,i)=1.0
!!$    enddo
!!$
!!$    m1=size(IY,1)
!!$    n1=size(IY,2)
!!$    p1=size(D2Zs,1)
!!$    q1=size(D2Zs,2)
!!$
!!$    call mykron(IY,D2Zs,m1,n1,p1,q1,DZZ)
!!$
!!$    m1=size(D2Ys,1)
!!$    n1=size(D2Ys,2)
!!$    p1=size(IZ,1)
!!$    q1=size(IZ,2)
!!$
!!$    call mykron(D2Ys,IZ,m1,n1,p1,q1,DYY)
!!$
!!$    AMAT = DYY + DZZ
!!$
!!$    !CALL PRINT_MATRIX( 'DZZ', (nz-2)*(ny-2), (nz-2)*(ny-2), DZZ )
!!$    !CALL PRINT_MATRIX( 'DYY', (nz-2)*(ny-2), (nz-2)*(ny-2),DYY )
!!$
!!$
!!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!$    !! SOLVE POISSON EQUATION FOR THE FIRST TIME !!
!!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!$
!!$
!!$
!!$    do i=1,(ny-2)*(nz-2)
!!$       frhs(i)=-2.0
!!$    enddo
!!$
!!$    INFO=0
!!$    NRHS=1 ! only 1 rhs here
!!$    N=(nz-2)*(ny-2)
!!$
!!$
!!$    !
!!$    ! Compute the LU factorization of A.
!!$    !
!!$
!!$    ! Note: on exit of dgetrf, AMAT is not AMAT anymore but contains the factors L and U
!!$    ! from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
!!$
!!$    !call PRINT_MATRIX( 'AMAT', N, N, AMAT )
!!$
!!$
!!$    call cpu_time(start)
!!$    CALL dgetrf( N, N, AMAT, N, IPIV, INFO )
!!$    call cpu_time(finish)
!!$
!!$    !write(*,*)'AMAT AFTER',AMAT
!!$
!!$    print '("Time LU = ",f15.4," seconds.")',finish-start
!!$
!!$    IF( INFO .NE. 0 ) THEN
!!$       WRITE(*,*)'INFO',INFO
!!$       STOP 'PROBLEM IN LAPACK (Poiseuille_3D)'
!!$    ENDIF
!!$
!!$
!!$    !
!!$    ! Solve the system A*X = B, overwriting B with X.
!!$    !
!!$
!!$    IF( INFO.EQ.0 ) THEN
!!$
!!$       call cpu_time(start)
!!$       CALL dgetrs( 'No transpose', n, nrhs, AMAT, N, IPIV, frhs, N, INFO )
!!$       call cpu_time(finish)
!!$
!!$       print '("Time Linear System Solve = ",f15.4," seconds.")',finish-start
!!$       write(*,*)
!!$
!!$    END IF
!!$
!!$    ct=0
!!$    do j=1,ny-2
!!$       do i=1,nz-2
!!$
!!$          ct=ct+1
!!$          utmp(i,j) = frhs(ct)
!!$
!!$       enddo
!!$    enddo
!!$
!!$    usol = 0.0
!!$    usol(2:nz-1,2:ny-1) = utmp(:,:)
!!$
!!$    zz=0.0
!!$    yy=0.0
!!$
!!$    do i=1,nz
!!$       do j=1,ny
!!$
!!$          zz(i,j)=z(i)
!!$          yy(i,j)=y(j)
!!$
!!$       enddo
!!$    enddo
!!$
!!$    U00 = usol( (nz+1)/2 , (ny+1)/2 )
!!$
!!$
!!$
!!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!$!! SOLVE POISSON EQUATION FOR THE SECOND TIME !!
!!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!$
!!$
!!$    utmp=0.0
!!$    usol=0.0
!!$    frhs=0.0
!!$
!!$    !New,scaled rhs
!!$    do i=1,(ny-2)*(nz-2)
!!$       frhs(i)=-2.0/U00
!!$    enddo
!!$
!!$    !
!!$    ! Solve the system A*X = B, overwriting B with X.
!!$    !
!!$
!!$    ! Note: here we already have the LU factorization so that we only need a call to dgetrs
!!$
!!$    CALL dgetrs( 'No transpose', N, NRHS, AMAT, N, IPIV, frhs, N, INFO )
!!$
!!$    IF( INFO .NE. 0 ) THEN
!!$       WRITE(*,*)'INFO',INFO
!!$       STOP 'PROBLEM IN LAPACK (Poiseuille_3D -- solve 2)'
!!$    ENDIF
!!$
!!$    !write(*,*)'frhs',frhs
!!$
!!$    ct=0
!!$    do j=1,ny-2
!!$       do i=1,nz-2
!!$
!!$          ct=ct+1
!!$          utmp(i,j) = frhs(ct)
!!$
!!$       enddo
!!$    enddo
!!$
!!$    usol(2:nz-1,2:ny-1) = utmp(:,:)
!!$
!!$
!!$  END SUBROUTINE Poiseuille_3D
!!$
!!$
!
!  Description: Build complex matrices A and B in the context of a generalized
!  eigenvalue problem (GEVP)
!
!
!/*T
!  Concepts: Build complex matrices
!  Processors: n
!T*/
!
! -----------------------------------------------------------------------

MODULE PETSC

  USE TYPEDEF
  USE WRITE_PRINT

  IMPLICIT NONE

CONTAINS

  Subroutine SetupSolveGEVP(nx,ny,nxy,alpha,beta,Re)

#include <petsc/finclude/petscsys.h>
#include <petsc/finclude/petscvec.h>
#include <petsc/finclude/petscmat.h>
#include <petsc/finclude/petscmat.h90>
#include <petsc/finclude/petscpc.h>
#include <petsc/finclude/petscksp.h>

#include <slepc/finclude/slepcsys.h>
#include <slepc/finclude/slepceps.h>

!...Non PETSc/SLEPc Variables

    real(dp),intent(in) :: alpha,beta,Re
    integer(i4b),intent(in) :: nx,ny,nxy

    integer(i4b) :: ij
    integer(i4b) :: sizem,rank,source

    integer(i4b),dimension(:),allocatable  :: idxm1,idxn1
    integer(i4b),dimension(:),allocatable  :: idxm2,idxn2
    integer(i4b),dimension(:),allocatable  :: idxm3,idxn3

    integer(i4b),dimension(:),allocatable  :: li
    integer(i4b),dimension(:),allocatable  :: all_bc
    integer(i4b),dimension(:),allocatable  :: u_bc,v_bc,w_bc,p_bc
    integer(i4b),dimension(:),allocatable  :: fst_bc,wall_bc
    integer(i4b),dimension(:),allocatable  :: outlet_bc,inlet_bc


    complex(dpc) :: ibeta,Rec,alphac,betac,small,sigma
    complex(dpc),dimension(:,:),allocatable :: D1Xc,D2Xc,D1Yc,D2Yc

    complex(dpc),dimension(nxy) :: uvec_cmplx,uxvec_cmplx,uyvec_cmplx
    complex(dpc),dimension(nxy) :: vvec_cmplx,vxvec_cmplx,vyvec_cmplx
    complex(dpc),dimension(nxy) :: wvec_cmplx,wxvec_cmplx,wyvec_cmplx

    complex(dpc),dimension(3*nxy) :: wvec_cmplx3
    complex(dpc),dimension(ny,ny) :: VDIAG,VDY

!...Temporary variables
    integer(i4b),dimension(:),allocatable  :: loc
    complex(dpc),dimension(:),allocatable  :: xwork1


!...SLEPC VARIABLES
    EPS eps
    EPSType tname

    ST st
    PC pc

    Vec xr,xi

    PetscReal      error
    PetscScalar    kr,ki

    PetscLogDouble mem

    PetscInt       maxit,nconv
    PetscInt       nev,ncv,mpd
    PetscInt       three
    PetscInt       ct,ct1,ct2,ct3,ct4

    PetscInt       icntl,ival,ival1

    PetscReal tolslepc
    PetscInt maxiter

!...PETSC VARIABLES

    PetscReal      norm,tol,nrm,nrm1
    PetscInt       i,j,k,II,JJ,its

    PetscInt       jmin,jmax
    PetscInt       ione,ntimes
    PetscErrorCode ierr
    PetscBool      flg
    PetscScalar    v,one,zero,rhs

    KSP            ksp
    PetscRandom    rctx

!...MPI
    !PetscMPIInt    rank,size

!...Matrices indices

    PetscInt       IstartA,IendA,IstartB,IendB

!...Vectors

    Vec            i_vec,frhs,sol

!...Matrices

    Mat            A,B

    !  Note: Any user-defined Fortran routines (such as MyKSPMonitor)
    !  MUST be declared as external.

    !external MyKSPMonitor,MyKSPConverged

!#if !defined(PETSC_USE_COMPLEX)
!#error "this code requires PETSc --with-scalar-type=complex build"
!#endif
!#if !defined(PETSC_USE_REAL_DOUBLE)
!#error "this code requires PETSc --with-precision=real build"
!#endif

! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!
!                           Beginning of program                         !
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!

    call SlepcInitialize(PETSC_NULL_CHARACTER,ierr)!;CHKERRQ(ierr)

    call PetscMemorySetGetMaximumUsage(ierr)!;CHKERRQ(ierr)

    call PetscOptionsInsertString("-mat_superlu_dist_parsymbfact",ierr)

    call MPI_COMM_SIZE(MPI_COMM_WORLD,sizem,ierr)!;CHKERRQ(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)!;CHKERRQ(ierr)

    call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
    !call PetscMemoryGetMaximumUsage(mem,ierr)
    call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
    if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (entering module)',rank,'    memory:',mem

    !call PetscOptionsGetInt(PETSC_NULL_CHARACTER,'-nx',nx,flg,ierr)
    !call PetscOptionsGetInt(PETSC_NULL_CHARACTER,'-ny',ny,flg,ierr)

    ione=1 ! integer
    ibeta = imag_unit*beta
    one = (1.0d0,0.0d0)
    zero = (0.0d0,0.0d0)
    Rec=Re
    alphac=alpha
    betac=beta

    source=0

    ct=0

!...Convert Arrays to Complex

    uvec_cmplx=uvec
    uxvec_cmplx=uxvec
    uyvec_cmplx=uyvec

    vvec_cmplx=vvec
    vxvec_cmplx=vxvec
    vyvec_cmplx=vyvec

    wvec_cmplx=wvec
    wxvec_cmplx=wxvec
    wyvec_cmplx=wyvec

    wvec_cmplx3(1:nxy)=wvec_cmplx
    wvec_cmplx3(nxy+1:2*nxy)=wvec_cmplx
    wvec_cmplx3(2*nxy+1:3*nxy)=wvec_cmplx

    allocate(D1Xc(nx,nx),D2Xc(nx,nx),D1Yc(ny,ny),D2Yc(ny,ny))

    D1Xc=D1X
    D1Yc=D1Y

    D2Xc=D2X
    D2Yc=D2Y

    !call MPI_Bcast(wvec_cmplx3,3*nxy,MPI_DOUBLE_COMPLEX,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)

!...Parallel Matrices
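
    ! Note on the MatCreateAIJ arguments below: the integers before each
    ! PETSC_NULL_INTEGER are d_nz and o_nz, the number of nonzeros preallocated per row
    ! in the diagonal and off-diagonal blocks of the locally owned rows
    ! (2*(nx+ny) and max(nx,ny) for A; 1 and 0 for the diagonal matrix B).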

    !Create A
    call MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4*nxy,4*nxy, &
         2*(nx+ny),PETSC_NULL_INTEGER,max(nx,ny),PETSC_NULL_INTEGER,A,ierr)!;CHKERRQ(ierr)

    !Create B
    call MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4*nxy,4*nxy, &
         1,PETSC_NULL_INTEGER,0,PETSC_NULL_INTEGER,B,ierr)!;CHKERRQ(ierr)

    call MatGetOwnershipRange(A,IstartA,IendA,ierr)!;CHKERRQ(ierr)
    call MatGetOwnershipRange(B,IstartB,IendB,ierr)!;CHKERRQ(ierr)

    !write(*,*)'IstartA,IendA',IstartA,IendA
    !write(*,*)'IstartB,IendB',IstartB,IendB

    if (IstartA > 4*nxy .or. IstartB > 4*nxy .or. IendA > 4*nxy .or. IendB > 4*nxy) then
       stop 'indices problem in module_petsc'
    endif

!...NEED TO TEST WITHOUT THIS NEXT LINE TO UNDERSTAND !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!...This line avoids the error: [0]PETSC ERROR: Argument out of range [0]PETSC ERROR: New nonzero at (470,470)...
    call MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE,ierr)!;CHKERRQ(ierr)

!...Get Memory Usage
    call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
    !call PetscMemoryGetMaximumUsage(mem,ierr)
    call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
    if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (after matcreate)',rank,'    memory:',mem

!...Build A in Parallel

    if (rank == 0) call cpu_time(start)

    do i=IstartA,IendA-1

       if     ( i < nxy ) then

          if (abs(uyvec_cmplx(i+1))>1e10) STOP '1'
          call MatSetValues(A,ione,i,ione,i+nxy,uyvec_cmplx(i+1),INSERT_VALUES,ierr)!;CHKERRQ(ierr) ! Uy (block 1,2)

          !Build DX (block 1,4)
          ii=1+floor((i+0.01)/ny)
          jj=1

          jmin=mod(i,ny)
          jmax=jmin+(nx-1)*ny

          do j=jmin,jmax,ny

             if (abs(D1Xc(ii,jj))>1e10) STOP '2'
             call MatSetValues(A,ione,i,ione,j+3*nxy,D1Xc(ii,jj),INSERT_VALUES,ierr) ! Build DX (block 1,4)
             jj=jj+1

          enddo

          !Build -DXX/Re (part of block 1,1)
          ii=1+floor((i+0.01)/ny)
          jj=1

          jmin=mod(i,ny)
          jmax=jmin+(nx-1)*ny

          do j=jmin,jmax,ny

             if (abs(-D2Xc(ii,jj)/Rec)>1e10) STOP '3'
             call MatSetValues(A,ione,i,ione,j,-D2Xc(ii,jj)/Rec,INSERT_VALUES,ierr) ! Build -DXX/Re (part of block 1,1)
             jj=jj+1

          enddo


       elseif ( i > nxy-1 .and. i < 2*nxy ) then

          if (abs(vxvec_cmplx(i-nxy+1))>1e10) STOP '4'
          call MatSetValues(A,ione,i,ione,i-nxy,vxvec_cmplx(i-nxy+1),INSERT_VALUES,ierr)!;CHKERRQ(ierr) ! Vx (block 2,1)

          !Build DY (block 2,4)
          ii=mod(i,ny)+1
          jj=1

          jmin=floor((i-nxy+0.01)/ny)*ny

          do j=jmin,jmin+(ny-1)

             if (abs(D1Yc(ii,jj))>1e10) STOP '5'
             call MatSetValues(A,ione,i,ione,j+3*nxy,D1Yc(ii,jj),INSERT_VALUES,ierr) ! Build DY (block 2,4)
             jj=jj+1

          enddo


          !Build -DXX/Re (part of block 2,2)
          ii=1+floor((i-nxy+0.01)/ny)
          jj=1

          jmin=mod(i,ny)
          jmax=jmin+(nx-1)*ny

          do j=jmin,jmax,ny

             if (abs(-D2Xc(ii,jj)/Rec)>1e10) STOP '6'
             call MatSetValues(A,ione,i,ione,j+nxy,-D2Xc(ii,jj)/Rec,INSERT_VALUES,ierr) ! Build -DXX/Re (part of block 2,2)
             jj=jj+1

          enddo




       elseif ( i > 2*nxy-1 .and. i < 3*nxy ) then

          call MatSetValues(A,ione,i,ione,i+nxy,ibeta,INSERT_VALUES,ierr) ! iBeta (block 3,4)

          call MatSetValues(A,ione,i,ione,i-2*nxy,wxvec_cmplx(i-2*nxy+1),INSERT_VALUES,ierr) ! Wx (block 3,1)
          call MatSetValues(A,ione,i,ione,i-nxy,wyvec_cmplx(i-2*nxy+1),INSERT_VALUES,ierr)   ! Wy (block 3,2)


          !Build -DXX/Re (part of block 3,3)
          ii=1+floor((i-2*nxy+0.01)/ny)
          jj=1

          jmin=mod(i,ny)
          jmax=jmin+(nx-1)*ny

          do j=jmin,jmax,ny

             call MatSetValues(A,ione,i,ione,j+2*nxy,-D2Xc(ii,jj)/Rec,INSERT_VALUES,ierr) ! Build -DXX/Re (part of block 3,3)
             jj=jj+1

          enddo


       elseif ( i > 3*nxy-1 ) then

          call MatSetValues(A,ione,i,ione,i-nxy,ibeta,INSERT_VALUES,ierr) ! iBeta (block 4,3)

          !Build DX (block 4,1)
          ii=1+floor(((i-3*nxy)+0.01)/ny)
          jj=1

          jmin=mod(i,ny)
          jmax=jmin+(nx-1)*ny

          do j=jmin,jmax,ny

             call MatSetValues(A,ione,i,ione,j,D1Xc(ii,jj),INSERT_VALUES,ierr)
             jj=jj+1

          enddo

          !Build DY (block 4,2)
          ii=mod(i,ny)+1
          jj=1

          jmin=floor((i-3*nxy+0.01)/ny)*ny

          do j=jmin,jmin+(ny-1)

             call MatSetValues(A,ione,i,ione,j+nxy,D1Yc(ii,jj),INSERT_VALUES,ierr)
             jj=jj+1

          enddo

       endif

    enddo

    if (rank==0) then

       call cpu_time(finish)
       write(*,*)
       print '("Time Partial Parallel Build A #1 = ",f21.3," 
seconds.")',finish-start

    endif


    call MatAssemblyBegin(A,MAT_FLUSH_ASSEMBLY,ierr)!;CHKERRQ(ierr) ! necessary to switch between INSERT_VALUES and ADD_VALUES
    call MatAssemblyEnd(A,MAT_FLUSH_ASSEMBLY,ierr)!;CHKERRQ(ierr)   ! necessary to switch between INSERT_VALUES and ADD_VALUES



!...Finish Blocks A11, A22 and A33

    if (rank == 0) call cpu_time(start)

    VDY=(0.0d0,0.0d0)
    VDIAG=(0.0d0,0.0d0)

    do i=IstartA,IendA-1
       if (i < 3*nxy) then
          call MatSetValues(A,ione,i,ione,i,betac**2/Rec+imag_unit*betac*wvec_cmplx3(i+1),ADD_VALUES,ierr)!;CHKERRQ(ierr)
       endif
    enddo

    do i=IstartA,IendA-1

       if     ( i < nxy ) then

          !Build -DYY/Rec (part of block 1,1)
          ii=mod(i,ny)+1
          jj=1

          jmin=floor((i+0.01)/ny)*ny

          do j=jmin,jmin+(ny-1)

             call MatSetValues(A,ione,i,ione,j,-D2Yc(ii,jj)/Rec,ADD_VALUES,ierr)
             jj=jj+1

          enddo


          !Build U*DX (part of block 1,1)
          ii=1+floor((i+0.01)/ny)
          jj=1

          jmin=mod(i,ny)
          jmax=jmin+(nx-1)*ny

          do j=jmin,jmax,ny

             call MatSetValues(A,ione,i,ione,j,uvec_cmplx(i+1)*D1Xc(ii,jj),ADD_VALUES,ierr)
             jj=jj+1

          enddo


          !Build V*DY (part of block 1,1)
          ii=mod(i,ny)+1
          jj=1

          jmin=floor((i+0.01)/ny)*ny

          do j=jmin,jmin+(ny-1)
             call MatSetValues(A,ione,i,ione,j,vvec_cmplx(i+1)*D1Yc(ii,jj),ADD_VALUES,ierr)
             jj=jj+1

          enddo


          call MatSetValues(A,ione,i,ione,i,uxvec_cmplx(i+1),ADD_VALUES,ierr)!;CHKERRQ(ierr) ! Ux (block 1,1)




       elseif ( i > nxy-1 .and. i < 2*nxy ) then

          !Build -DYY/Rec (part of block 2,2)
          ii=mod(i,ny)+1
          jj=1

          jmin=floor((i-nxy+0.01)/ny)*ny

          do j=jmin,jmin+(ny-1)

             call MatSetValues(A,ione,i,ione,j+nxy,-D2Yc(ii,jj)/Rec,ADD_VALUES,ierr) ! Build -DYY/Rec (part of block 2,2)
             jj=jj+1

          enddo


          !Build U*DX (part of block 2,2)
          ii=1+floor((i-nxy+0.01)/ny)
          jj=1

          jmin=mod(i,ny)
          jmax=jmin+(nx-1)*ny

          do j=jmin,jmax,ny

             call MatSetValues(A,ione,i,ione,j+nxy,uvec_cmplx(i-nxy+1)*D1Xc(ii,jj),ADD_VALUES,ierr)
             jj=jj+1

          enddo


          !Build V*DY (part of block 2,2)
          ii=mod(i,ny)+1
          jj=1

          jmin=floor((i-nxy+0.01)/ny)*ny

          do j=jmin,jmin+(ny-1)
             call MatSetValues(A,ione,i,ione,j+nxy,vvec_cmplx(i-nxy+1)*D1Yc(ii,jj),ADD_VALUES,ierr)
             jj=jj+1

          enddo

          call MatSetValues(A,ione,i,ione,i,vyvec_cmplx(i-nxy+1),ADD_VALUES,ierr)!;CHKERRQ(ierr) ! Vy (block 2,2)


       elseif ( i > 2*nxy-1 .and. i < 3*nxy ) then

          !Build -DYY/Rec (part of block 3,3)
          ii=mod(i,ny)+1
          jj=1

          jmin=floor((i-2*nxy+0.01)/ny)*ny

          do j=jmin,jmin+(ny-1)

             call MatSetValues(A,ione,i,ione,j+2*nxy,-D2Yc(ii,jj)/Rec,ADD_VALUES,ierr) ! -DYY/Rec (part of block 3,3)
             jj=jj+1

          enddo


          !Build U*DX (part of block 3,3)
          ii=1+floor((i-2*nxy+0.01)/ny)
          jj=1

          jmin=mod(i,ny)
          jmax=jmin+(nx-1)*ny

          do j=jmin,jmax,ny

             call MatSetValues(A,ione,i,ione,j+2*nxy,uvec_cmplx(i-2*nxy+1)*D1Xc(ii,jj),ADD_VALUES,ierr)
             jj=jj+1

          enddo


          !Build V*DY (part of block 3,3)
          ii=mod(i,ny)+1
          jj=1

          jmin=floor((i-2*nxy+0.01)/ny)*ny

          do j=jmin,jmin+(ny-1)
             call MatSetValues(A,ione,i,ione,j+2*nxy,vvec_cmplx(i-2*nxy+1)*D1Yc(ii,jj),ADD_VALUES,ierr)
             jj=jj+1

          enddo

       endif

    enddo


    if (rank==0) then

       call cpu_time(finish)
       write(*,*)
       print '("Time Partial Parallel Build A #2 = ",f21.3," 
seconds.")',finish-start

    endif

!$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
!$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$


    !...Artificially create diagonal element for A ==> TO BE REMOVED
    small = (0.000001d0,0.0d0)
    do i=IstartA,IendA-1
       call MatSetValues(A,ione,i,ione,i,small,ADD_VALUES,ierr)!;CHKERRQ(ierr)
    enddo

    call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
    !call PetscMemoryGetMaximumUsage(mem,ierr)
    call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
    if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (A just before assembly)',rank,'    memory:',mem


    if (rank==0) then
       call cpu_time(start)
    endif

!...Assemble A
    call MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY,ierr)!;CHKERRQ(ierr)
    call MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY,ierr)!;CHKERRQ(ierr)
    !call MatView(A,PETSC_VIEWER_STDOUT_WORLD,ierr)

    if (rank==0) then
       call cpu_time(finish)
       write(*,*)
       print '("Time to assemble A = ",f21.3," seconds.")',finish-start
    endif


    call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
    !call PetscMemoryGetMaximumUsage(mem,ierr)
    call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
    if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (A built and assembled)',rank,'    memory:',mem


!...Build B in parallel
    if (rank==0) then
       call cpu_time(start)
    endif

    do i=IstartB,IendB-1
       if (i < 3*nxy) then
          call MatSetValues(B,ione,i,ione,i,imag_unit,INSERT_VALUES,ierr)!;CHKERRQ(ierr)
       endif
    enddo

!...Assemble B
    call MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY,ierr)!;CHKERRQ(ierr)
    call MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY,ierr)!;CHKERRQ(ierr)
    !call MatView(B,PETSC_VIEWER_STDOUT_WORLD,ierr)

    call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
    !call PetscMemoryGetMaximumUsage(mem,ierr)
    call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
    if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (B built and assembled)',rank,'    memory:',mem


    call MatNorm(B,NORM_FROBENIUS,nrm1,ierr)!;CHKERRQ(ierr)

!!$    if (rank==0) then
!!$       write(*,*)'norm B:',nrm1
!!$    endif

    if (rank==0) then
       call cpu_time(finish)
       write(*,*)
       print '("Time to build and assemble B = ",f21.3," 
seconds.")',finish-start
    endif


!...Set Boundary Conditions

    if (rank==0) then

       allocate(li(nx))

       do i=1,nx
          li(i)=(i-1)*ny
       enddo

       ct1=0
       ct2=0
       ct3=0
       ct4=0

       do i=1,nx
          do j=1,ny

             if (j==1) then ! fst_bc
                ct1=ct1+1
             elseif (j==ny) then ! wall_bc
                ct2=ct2+1
             elseif (i==1) then ! outlet_bc
                ct3=ct3+1
             elseif (i==nx) then ! inlet_bc
                ct4=ct4+1
             endif

          enddo
       enddo

       ct=ct1+ct2+ct3+ct4

    endif ! if (rank == 0)

    ! MPI_INTEGER (not C's MPI_INT) is the datatype matching Fortran INTEGER buffers
    call MPI_Bcast(ct,1,MPI_INTEGER,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
    call MPI_Bcast(ct1,1,MPI_INTEGER,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
    call MPI_Bcast(ct2,1,MPI_INTEGER,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
    call MPI_Bcast(ct3,1,MPI_INTEGER,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
    call MPI_Bcast(ct4,1,MPI_INTEGER,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)

    allocate(fst_bc(ct1))
    allocate(wall_bc(ct2))
    allocate(outlet_bc(ct3))
    allocate(inlet_bc(ct4))

    allocate(all_bc(3*ct))

    allocate(u_bc(ct))
    allocate(v_bc(ct))
    allocate(w_bc(ct))
    allocate(p_bc(ct))

    if (rank == 0) then

       ct1=0
       ct2=0
       ct3=0
       ct4=0

       do i=1,nx
          do j=1,ny

             ij=li(i)+j-1

             if (j==1) then ! fst_bc
                ct1=ct1+1
                fst_bc(ct1)=ij
             elseif (j==ny) then ! wall_bc
                ct2=ct2+1
                wall_bc(ct2)=ij
             elseif (i==1) then ! outlet_bc
                ct3=ct3+1
                outlet_bc(ct3)=ij
             elseif (i==nx) then ! inlet_bc
                ct4=ct4+1
                inlet_bc(ct4)=ij
             endif

          enddo
       enddo

       u_bc(1:ct1)=fst_bc
       u_bc(ct1+1:ct1+ct2)=wall_bc
       u_bc(ct1+ct2+1:ct1+ct2+ct3)=outlet_bc
       u_bc(ct1+ct2+ct3+1:ct1+ct2+ct3+ct4)=inlet_bc

       v_bc=u_bc+nxy
       w_bc=v_bc+nxy
       p_bc=w_bc+nxy

       all_bc=[u_bc,v_bc,w_bc]

    endif ! if (rank == 0)

    call MPI_Bcast(all_bc,3*ct,MPI_INTEGER,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)

    ! For the parallel case, all processes that share the matrix MUST call this routine, regardless of whether any rows being zeroed are owned by them.
    call MatZeroRows(A,3*ct,all_bc,one,zero,zero,ierr);CHKERRQ(ierr)
    call MatZeroRows(B,3*ct,all_bc,zero,zero,zero,ierr);CHKERRQ(ierr)
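
    ! A sketch of the collective requirement noted above (nlocal_bc and
    ! my_bc are hypothetical names, not variables from this module): the
    ! rows may be split across ranks, but every rank must still make the
    ! call, even if it contributes zero rows.
!!$    call MatZeroRows(A,nlocal_bc,my_bc,one,zero,zero,ierr)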

    !call MatView(B,PETSC_VIEWER_STDOUT_WORLD,ierr)

    ! MatNorm is collective, so it cannot be inside an if (rank==0) statement!
    call MatNorm(A,NORM_FROBENIUS,nrm,ierr)!;CHKERRQ(ierr)
    call MatNorm(B,NORM_FROBENIUS,nrm1,ierr)!;CHKERRQ(ierr)

    if (rank==0) then
       write(*,*)'norm A:',nrm
       write(*,*)'norm B:',nrm1
    endif



!!!!!! TESTING HOW BIG LU CAN BE !!!!!!!!!!!!!!!!

!!$    sigma = (1.0d0,0.0d0)
!!$    call MatAXPY(A,-sigma,B,DIFFERENT_NONZERO_PATTERN,ierr) ! on exit A = A-sigma*B
!!$
!!$    !...Create linear solver context
!!$    call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
!!$
!!$!...Set operators. Here the matrix that defines the linear system also serves as the preconditioning matrix.
!!$    !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) !aha commented and replaced by next line
!!$    call KSPSetOperators(ksp,A,A,ierr) ! remember: here A = A-sigma*B
!!$
!!$    tol = 1.e-10
!!$    call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr) ! set relative and absolute tolerances and use defaults for divergence tol
!!$
!!$!...Set the Direct (LU) Solver
!!$    call KSPSetType(ksp,KSPPREONLY,ierr)
!!$    call KSPGetPC(ksp,pc,ierr)
!!$    call PCSetType(pc,PCLU,ierr)
!!$    call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr) ! MATSOLVERSUPERLU_DIST or MATSOLVERMUMPS
!!$
!!$!...Create right-hand-side vector
!!$    call MatCreateVecs(A,frhs,PETSC_NULL_OBJECT,ierr)
!!$    call MatCreateVecs(A,sol,PETSC_NULL_OBJECT,ierr)
!!$
!!$    allocate(xwork1(IendA-IstartA))
!!$    allocate(loc(IendA-IstartA))
!!$
!!$    ct=0
!!$    do i=IstartA,IendA-1
!!$       ct=ct+1
!!$       loc(ct)=i
!!$       xwork1(ct)=(1.0d0,0.0d0)
!!$    enddo
!!$
!!$    call VecSetValues(frhs,IendA-IstartA,loc,xwork1,INSERT_VALUES,ierr)
!!$    call VecZeroEntries(sol,ierr)
!!$
!!$    deallocate(xwork1,loc)
!!$
!!$!...Assemble Vectors
!!$    call VecAssemblyBegin(frhs,ierr)
!!$    call VecAssemblyEnd(frhs,ierr)
!!$
!!$!...Solve the linear system
!!$    call KSPSolve(ksp,frhs,sol,ierr)
!!$
!!$    !call VecView(sol,PETSC_VIEWER_STDOUT_WORLD,ierr)
!!$
!!$    STOP


!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!           SLEPC         !!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    if (rank==0) call cpu_time(start)


    call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
    call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
    if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (begin slepc)',rank,'    memory:',mem

!...Get vector(s) compatible with the matrix, i.e. with the same parallel layout
    call MatCreateVecs(A,xr,xi,ierr)
!!$    call MatCreateVecs(A,xr,PETSC_NULL_OBJECT,ierr)
!!$    call MatCreateVecs(A,xi,PETSC_NULL_OBJECT,ierr)


!...Create Eigensolver Context and Spectral Transformation context
    call EPSCreate(PETSC_COMM_WORLD,eps,ierr)
    !call STCreate(PETSC_COMM_WORLD,st,ierr)

!...Set Operators and Problem Type - Generalized Eigenvalue Problem
    call EPSSetOperators(eps,A,B,ierr)
    !call EPSSetProblemType(eps,EPS_PGNHEP,ierr) ! THIS ONE LEADS TO AN ERROR --> CHECK WHY, BECAUSE MY B SEEMS TO FIT THAT CRITERION
    call EPSSetProblemType(eps,EPS_GNHEP,ierr)

!...Select Portion of Spectrum
    call EPSSetTarget(eps,(3.0d0,0.d0),ierr)
    call EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE,ierr)

!...Select Algorithm for Eigenvalue Computation (default is Krylov-Schur)
    call EPSSetType(eps,EPSKRYLOVSCHUR,ierr) !EPSARNOLDI,EPSKRYLOVSCHUR

    call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
    call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
    if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (slepc after EPSSetType)',rank,'    memory:',mem

!...Set Tolerance and Maxiter for Solver
    tolslepc=1.0d-12 ! double-precision literal, matching the PetscReal argument
    maxiter=500
    !call EPSSetTolerances(eps,PETSC_DEFAULT_REAL,maxiter,ierr)
    call EPSSetTolerances(eps,tolslepc,maxiter,ierr)

!...Sets number of eigenvalues to compute and dimension of subspace
    nev=200
    ncv=3*nev
    mpd=10*nev
    !call EPSSetDimensions(eps,nev,ncv,mpd,ierr) PETSC_DEFAULT
    call EPSSetDimensions(eps,nev,PETSC_DEFAULT_INTEGER,PETSC_DEFAULT_INTEGER,ierr)

    call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
    call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
    if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (before EPSGetST)',rank,'    memory:',mem

!...Set the Spectral Transformation --> Use Shift-And-Invert
    call EPSGetST(eps,st,ierr)
    call STSetType(st,STSINVERT,ierr)

!...Set Linear Solver
    call STGetKSP(st,ksp,ierr)
    call KSPSetType(ksp,KSPPREONLY,ierr)
    call KSPGetPC(ksp,pc,ierr)
    call PCSetType(pc,PCLU,ierr) ! could try PCCHOLESKY
    call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr) ! MATSOLVERSUPERLU_DIST or MATSOLVERMUMPS
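
    ! A sketch of the equivalent runtime selection (assuming this
    ! PETSc/SLEPc generation's option names):
    !   -st_type sinvert -st_ksp_type preonly -st_pc_type lu
    !   -st_pc_factor_mat_solver_package superlu_dist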

!...Set Solver Parameters at Runtime
    call EPSSetFromOptions(eps,ierr)

    call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
    call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
    if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (after EPSSetFromOptions)',rank,'    memory:',mem

!...Solve the Eigensystem
    call EPSSolve(eps,ierr)

    call EPSGetIterationNumber(eps,its,ierr)
    if (rank .eq. 0) then
       write(*,110) its
    endif
110 format (/' Number of iterations of the method:',I4)

!...Optional: Get some information from the solver and display it
    call EPSGetType(eps,tname,ierr)
    if (rank .eq. 0) then
       write(*,120) tname
    endif
120 format (' Solution method: ',A)

    nev=10
    call EPSGetDimensions(eps,nev,PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,ierr)
    if (rank .eq. 0) then
       write(*,130) nev
    endif
130 format (' Number of requested eigenvalues:',I4)
    call EPSGetTolerances(eps,tol,maxit,ierr)
    if (rank .eq. 0) then
       write(*,140) tol, maxit
    endif
140 format (' Stopping condition: tol=',1P,E10.4,', maxit=',I4)

! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!     Display solution and clean up
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

!...Get number of converged eigenpairs
    call EPSGetConverged(eps,nconv,ierr)
    if (rank .eq. 0) then
       write(*,150) nconv
    endif
150 format (' Number of converged eigenpairs:',I4/)

!...Display eigenvalues and relative errors
    if (nconv.gt.0) then
       if (rank .eq. 0) then
          write(*,*) '         kr               ki          ||Ax-kx||/||kx||'
          write(*,*) ' ----------------- -----------------  ------------------'
       endif
       do i=0,nconv-1
          !...Get converged eigenpairs: the i-th eigenvalue is stored in kr (real part) and ki (imaginary part)
          call EPSGetEigenpair(eps,i,kr,ki,xr,xi,ierr)

          !...Compute the relative error associated with each eigenpair
          call EPSComputeError(eps,i,EPS_ERROR_RELATIVE,error,ierr)
          if (rank .eq. 0) then
             write(*,160) PetscRealPart(kr),PetscImaginaryPart(kr), error
          endif
160       format (1P,'   ',E21.11,'       ',E21.11,'      ',E21.11)

       enddo

    endif

!...Free work space

    if (rank==0) deallocate(li)

    deallocate(D1Xc,D2Xc,D1Yc,D2Yc)

    deallocate(fst_bc)
    deallocate(wall_bc)
    deallocate(outlet_bc)
    deallocate(inlet_bc)

    deallocate(all_bc)

    deallocate(u_bc)
    deallocate(v_bc)
    deallocate(w_bc)
    deallocate(p_bc)

    !call STDestroy(st,ierr)
    call EPSDestroy(eps,ierr)
    call MatDestroy(A,ierr)
    call MatDestroy(B,ierr)
    !call VecDestroy(i_vec,ierr)
    call VecDestroy(xr,ierr)
    call VecDestroy(xi,ierr)

!...Finalize
    call SlepcFinalize(ierr)

    if (rank==0) then
       call cpu_time(finish)
       write(*,*)
       print '("Total time for SLEPC = ",f21.3," seconds.")',finish-start
    endif

    END Subroutine SETUPSOLVEGEVP


    END MODULE PETSC





!!$       write(*,*)'fst_bc',fst_bc
!!$       write(*,*)'wall_bc',wall_bc
!!$       write(*,*)'outlet_bc',outlet_bc
!!$       write(*,*)'inlet_bc',inlet_bc
!!$
!!$       write(*,*)'all_bc',all_bc

!!$       write(*,*)'ct1+ct2+ct3+ct4',ct1+ct2+ct3+ct4



!!$    call MatCreate(PETSC_COMM_WORLD,A,ierr)
!!$    call MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,4*nxy,4*nxy,ierr)
!!$    call MatSetFromOptions(A,ierr)
!!$    call MatSetUp(A,ierr)



       !call VecCreateSeq(PETSC_COMM_SELF,nxy,i_vec,ierr) ! sequential vector
       !call VecSetValues(i_vec,nxy,loc,xwork1,INSERT_VALUES,ierr)

       !  Assemble vector
       !call VecAssemblyBegin(i_vec,ierr)
       !call VecAssemblyEnd(i_vec,ierr)

    !  Create parallel matrix, specifying only its global dimensions.
    !  When using MatCreate(), the matrix format can be specified at
    !  runtime. Also, the parallel partitioning of the matrix is
    !  determined by PETSc at runtime.



!!$    call MatCreate(PETSC_COMM_WORLD,B,ierr)
!!$    call MatSetSizes(B,PETSC_DECIDE,PETSC_DECIDE,4*nxy,4*nxy,ierr)
!!$    call MatSetFromOptions(B,ierr)
!!$    call MatSetUp(B,ierr)

    !  Currently, all PETSc parallel matrix formats are partitioned by
    !  contiguous chunks of rows across the processors.  Determine which
    !  rows of the matrix are locally owned.
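
! A minimal sketch of the ownership query described above (assuming the
! IstartA/IendA and IstartB/IendB bounds used earlier in this file are
! obtained this way):
!!$    call MatGetOwnershipRange(A,IstartA,IendA,ierr) ! this rank owns rows IstartA..IendA-1
!!$    call MatGetOwnershipRange(B,IstartB,IendB,ierr)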





! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!                    Include files
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!
!  This program uses CPP for preprocessing, as indicated by the use of
!  PETSc include files in the directory petsc/include/finclude.  This
!  convention enables use of the CPP preprocessor, which allows the use
!  of the #include statements that define PETSc objects and variables.
!
!  Use of the conventional Fortran include statements is also supported.
!  In this case, the PETSc include files are located in the directory
!  petsc/include/foldinclude.
!
!  Since one must be very careful to include each file no more than once
!  in a Fortran routine, application programmers must explicitly list
!  each file needed for the various PETSc components within their
!  program (unlike the C/C++ interface).
!
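
! A minimal sketch of the preprocessor form described above; the fuller
! set of headers this module would need appears (disabled) further below.
!!$#include <finclude/petscsys.h>
!!$#include <finclude/petscksp.h>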




    !write(*,*)'IstartA,IendA',IstartA,IendA
    !write(*,*)'IstartB,IendB',IstartB,IendB



!!$#include <finclude/petscsys.h>
!!$#include <finclude/petscvec.h>
!!$#include <finclude/petscmat.h>
!!$#include <finclude/petscmat.h90>
!!$#include <finclude/petscpc.h>
!!$#include <finclude/petscksp.h>
!!$
!!$#include <finclude/slepcsys.h>
!!$#include <finclude/slepceps.h>








!!$      do i=1,nx
!!$         do j=1,nx
!!$            D1X(i,j)=100+j+(i-1)*nx
!!$         enddo
!!$      enddo

!!$      do i=1,ny
!!$         do j=1,ny
!!$            D1Y(i,j)=999.d0
!!$         enddo
!!$      enddo










!  Create parallel vectors.
!   - Here, the parallel partitioning of the vector is determined by
!     PETSc at runtime.  We could also specify the local dimensions
!     if desired -- or use the more general routine VecCreate().
!   - When solving a linear system, the vectors and matrices MUST
!     be partitioned accordingly.  PETSc automatically generates
!     appropriately partitioned matrices and vectors when MatCreate()
!     and VecCreate() are used with the same communicator.
!   - Note: We form 1 vector from scratch and then duplicate as needed.
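
! A related sketch: the same layout guarantee can be obtained directly
! from an existing matrix, as the disabled solver block earlier in this
! file does:
!!$      call MatCreateVecs(A,frhs,PETSC_NULL_OBJECT,ierr) ! frhs matches A's row layout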

      !call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,m*n,u,ierr)
      !call VecSetFromOptions(u,ierr)
      !call VecDuplicate(u,b,ierr)
      !call VecDuplicate(b,x,ierr)

!  Set the exact solution; then compute the right-hand-side vector.
!  By default we use an exact solution vector with all elements equal
!  to 1.0.  Alternatively, the runtime option -random_exact_sol forms
!  a solution vector with random components.

!!$      call PetscOptionsHasName(PETSC_NULL_CHARACTER,'-random_exact_sol',flg,ierr)
!!$      if (flg) then
!!$         call PetscRandomCreate(PETSC_COMM_WORLD,rctx,ierr)
!!$         call PetscRandomSetFromOptions(rctx,ierr)
!!$         call VecSetRandom(u,rctx,ierr)
!!$         call PetscRandomDestroy(rctx,ierr)
!!$      else
!!$         call VecSet(u,one,ierr)
!!$      endif
!!$      call MatMult(A,u,b,ierr)


!  View the exact solution vector if desired

!!$      call PetscOptionsHasName(PETSC_NULL_CHARACTER,"-view_exact_sol",flg,ierr)
!!$      if (flg) then
!!$         call VecView(u,PETSC_VIEWER_STDOUT_WORLD,ierr)
!!$      endif

! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!         Create the linear solver and set various options
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

!  Create linear solver context

!!$      call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)


!  Set operators. Here the matrix that defines the linear system
!  also serves as the preconditioning matrix.

      !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) !aha commented and replaced by next line
!!$      call KSPSetOperators(ksp,A,A,ierr)


!  Set linear solver defaults for this problem (optional).
!   - By extracting the KSP and PC contexts from the KSP context,
!     we can then directly call any KSP and PC routines
!     to set various options.
!   - The following four statements are optional; all of these
!     parameters could alternatively be specified at runtime via
!     KSPSetFromOptions(). All of these defaults can be
!     overridden at runtime, as indicated below.

!     We comment out this section of code since the Jacobi
!     preconditioner is not a good general default.

!      call KSPGetPC(ksp,pc,ierr)
!      ptype = PCJACOBI
!      call PCSetType(pc,ptype,ierr)
!      tol = 1.e-7
!      call KSPSetTolerances(ksp,tol,PETSC_DEFAULT_DOUBLE_PRECISION,
!     &     PETSC_DEFAULT_DOUBLE_PRECISION,PETSC_DEFAULT_INTEGER,ierr)

!  Set user-defined monitoring routine if desired

      !call PetscOptionsHasName(PETSC_NULL_CHARACTER,'-my_ksp_monitor',flg,ierr) ! flg set to true if that option has been specified
      !if (flg) then
      !  call KSPMonitorSet(ksp,MyKSPMonitor,PETSC_NULL_OBJECT,PETSC_NULL_FUNCTION,ierr) ! call MyKSPMonitor
      !endif

      !tol = 1.e-10
      !call KSPSetTolerances(ksp,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr) ! set relative tolerance and use defaults for absolute and divergence tol
      !call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr) ! set relative and absolute tolerances and use defaults for divergence tol

!  Set runtime options, e.g.,
!      -ksp_type <type> -pc_type <type> -ksp_monitor -ksp_rtol <rtol>
!  These options will override those specified above as long as
!  KSPSetFromOptions() is called _after_ any other customization
!  routines.

      !call KSPSetFromOptions(ksp,ierr)


!  Set convergence test routine if desired

      !call PetscOptionsHasName(PETSC_NULL_CHARACTER,'-my_ksp_convergence',flg,ierr)
      !if (flg) then
      !  call KSPSetConvergenceTest(ksp,MyKSPConverged,PETSC_NULL_OBJECT,PETSC_NULL_FUNCTION,ierr)
      !endif



      !call KSPMonitorSet(ksp,KSPMonitorTrueResidualNorm,PETSC_NULL_OBJECT,PETSC_NULL_FUNCTION,ierr)


!
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!                      Solve the linear system
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

      !call KSPSolve(ksp,b,x,ierr)


!
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!                      View solver info (could use -ksp_view instead)
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

      !added by aha

!!$      write(*,*)''
!!$      write(*,*)'Start of KSPView'
!!$      write(*,*)'----------------'
!!$      write(*,*)''
!!$      call KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD,ierr)
!!$      write(*,*)''
!!$      write(*,*)'End of KSPView'
!!$      write(*,*)'--------------'
!!$      write(*,*)''


! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
!                     Check solution and clean up
! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

!  Check the error
!!$      call VecAXPY(x,neg_one,u,ierr)
!!$      call VecNorm(x,NORM_2,norm,ierr)
!!$      call KSPGetIterationNumber(ksp,its,ierr)
!!$      if (rank .eq. 0) then
!!$        if (norm .gt. 1.e-12) then
!!$           write(6,100) norm,its
!!$        else
!!$           write(6,110) its
!!$        endif
!!$      endif
!!$  100 format('Norm of error ',e11.4,' iterations ',i5)
!!$  110 format('Norm of error < 1.e-12,iterations ',i5)

!  Free work space.  All PETSc objects should be destroyed when they
!  are no longer needed.

!!$      call KSPDestroy(ksp,ierr)
!!$      call VecDestroy(u,ierr)
!!$      call VecDestroy(x,ierr)
!!$      call VecDestroy(b,ierr)
!!$      call MatDestroy(A,ierr)

!  Always call PetscFinalize() before exiting a program.  This routine
!    - finalizes the PETSc libraries as well as MPI
!    - provides summary and diagnostic information if certain runtime
!      options are chosen (e.g., -log_summary).  See PetscFinalize()
!      manpage for more information.
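
!  A minimal sketch of that final call (the active path in this file
!  calls SlepcFinalize instead, which also finalizes PETSc when SLEPc
!  initialized it):
!!$      call PetscFinalize(ierr)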


! --------------------------------------------------------------
!
!  MyKSPMonitor - This is a user-defined routine for monitoring
!  the KSP iterative solvers.
!
!  Input Parameters:
!    ksp   - iterative context
!    n     - iteration number
!    rnorm - 2-norm (preconditioned) residual value (may be estimated)
!    dummy - optional user-defined monitor context (unused here)
!
!!$      subroutine MyKSPMonitor(ksp,n,rnorm,dummy,ierr)
!!$
!!$      implicit none
!!$
!!$#include <finclude/petscsys.h>
!!$#include <finclude/petscvec.h>
!!$#include <finclude/petscksp.h>
!!$
!!$      KSP              ksp
!!$      Vec              x
!!$      PetscErrorCode ierr
!!$      PetscInt n,dummy
!!$      PetscMPIInt rank
!!$      double precision rnorm
!!$
!!$!  Build the solution vector
!!$
!!$      call KSPBuildSolution(ksp,PETSC_NULL_OBJECT,x,ierr)
!!$
!!$!  Write the solution vector and residual norm to stdout
!!$!   - Note that the parallel viewer PETSC_VIEWER_STDOUT_WORLD
!!$!     handles data from multiple processors so that the
!!$!     output is not jumbled.
!!$
!!$      call MPI_Comm_rank(PETSC_COMM_WORLD,rank,ierr)
!!$      if (rank .eq. 0) write(6,100) n
!!$      call VecView(x,PETSC_VIEWER_STDOUT_WORLD,ierr)
!!$      if (rank .eq. 0) write(6,200) n,rnorm
!!$
!!$ 100  format('iteration ',i5,' solution vector:')
!!$ 200  format('iteration ',i5,' residual norm ',e11.4)
!!$      ierr = 0
!!$
!!$      end subroutine MyKSPMonitor

! --------------------------------------------------------------
!
!  MyKSPConverged - This is a user-defined routine for testing
!  convergence of the KSP iterative solvers.
!
!  Input Parameters:
!    ksp   - iterative context
!    n     - iteration number
!    rnorm - 2-norm (preconditioned) residual value (may be estimated)
!    dummy - optional user-defined convergence context (unused here)
!
!!$      subroutine MyKSPConverged(ksp,n,rnorm,flag,dummy,ierr)
!!$
!!$      implicit none
!!$
!!$#include <finclude/petscsys.h>
!!$#include <finclude/petscvec.h>
!!$#include <finclude/petscksp.h>
!!$
!!$      KSP              ksp
!!$      PetscErrorCode ierr
!!$      PetscInt n,dummy
!!$      KSPConvergedReason flag
!!$      double precision rnorm
!!$
!!$      if (rnorm .le. .05) then
!!$        flag = 1
!!$      else
!!$        flag = 0
!!$      endif
!!$      ierr = 0
!!$
!!$      end subroutine MyKSPConverged


