It lists the locations in __baseflow_MOD_poiseuille_3d_petsc 
(module_baseflows.F90:182 and module_baseflows.F90:356) where apparently two 
vectors are created but never destroyed. Perhaps they get created twice and 
you only destroy them once? Or you increase their reference count?
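
For example, a pattern like the following would produce exactly these two 
leaks (a sketch, assuming sol_seq is the vector involved at those two lines; 
note that VecScatterCreateToZero() creates a NEW sequential vector in its 
output argument):

   call VecCreateSeq(PETSC_COMM_SELF,nxym2,sol_seq,ierr)  ! first create
   ...
   call VecScatterCreateToZero(sol,ctx,sol_seq,ierr)      ! second create:
                                   ! overwrites the handle, first Vec leaks
   ...
   call VecScatterCreateToZero(sol,ctx,sol_seq,ierr)      ! third create:
                                   ! second Vec leaks
   ...
   call VecDestroy(sol_seq,ierr)   ! destroys only the last one

If that is the case, either destroy sol_seq before each 
VecScatterCreateToZero() call, or create the scatter and its sequential 
vector once and reuse them. Running with -malloc_dump (and without 
-malloc off) will also make PETSc list unfreed memory at PetscFinalize().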

  Barry


==31371== 4 bytes in 1 blocks are indirectly lost in loss record 1 of 17
==31371==    at 0x4C2BD2C: memalign (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x4F56672: PetscStrallocpy (str.c:188)
==31371==    by 0x4F921F7: PetscObjectChangeTypeName (pname.c:153)
==31371==    by 0x51CA40C: VecCreate_Seq_Private (bvec2.c:888)
==31371==    by 0x51E0533: VecCreate_Seq (bvec3.c:39)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x51E0778: veccreateseq_ (vseqcrf.c:43)
==31371==    by 0x412632: __baseflow_MOD_poiseuille_3d_petsc 
(module_baseflows.F90:182)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371== 
==31371== 4 bytes in 1 blocks are indirectly lost in loss record 2 of 17
==31371==    at 0x4C2BD2C: memalign (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x4F56672: PetscStrallocpy (str.c:188)
==31371==    by 0x4F921F7: PetscObjectChangeTypeName (pname.c:153)
==31371==    by 0x51CA40C: VecCreate_Seq_Private (bvec2.c:888)
==31371==    by 0x51E0533: VecCreate_Seq (bvec3.c:39)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x528E06C: VecScatterCreateToZero (vecmpitoseq.c:140)
==31371==    by 0x529DD9B: vecscattercreatetozero_ (vecmpitoseqf.c:52)
==31371==    by 0x413459: __baseflow_MOD_poiseuille_3d_petsc 
(module_baseflows.F90:356)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371== 
==31371== 8 bytes in 1 blocks are indirectly lost in loss record 3 of 17
==31371==    at 0x4C2BD2C: memalign (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x5120B52: PetscLayoutSetUp (pmap.c:150)
==31371==    by 0x51CA3A9: VecCreate_Seq_Private (bvec2.c:887)
==31371==    by 0x51E0533: VecCreate_Seq (bvec3.c:39)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x51E0778: veccreateseq_ (vseqcrf.c:43)
==31371==    by 0x412632: __baseflow_MOD_poiseuille_3d_petsc 
(module_baseflows.F90:182)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371== 
==31371== 8 bytes in 1 blocks are indirectly lost in loss record 4 of 17
==31371==    at 0x4C2BD2C: memalign (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x5120B52: PetscLayoutSetUp (pmap.c:150)
==31371==    by 0x51CA3A9: VecCreate_Seq_Private (bvec2.c:887)
==31371==    by 0x51E0533: VecCreate_Seq (bvec3.c:39)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x528E06C: VecScatterCreateToZero (vecmpitoseq.c:140)
==31371==    by 0x529DD9B: vecscattercreatetozero_ (vecmpitoseqf.c:52)
==31371==    by 0x413459: __baseflow_MOD_poiseuille_3d_petsc 
(module_baseflows.F90:356)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371== 
==31371== 24 bytes in 1 blocks are indirectly lost in loss record 5 of 17
==31371==    at 0x4C2BD2C: memalign (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x51CA25C: VecCreate_Seq_Private (bvec2.c:879)
==31371==    by 0x51E0533: VecCreate_Seq (bvec3.c:39)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x51E0778: veccreateseq_ (vseqcrf.c:43)
==31371==    by 0x412632: __baseflow_MOD_poiseuille_3d_petsc 
(module_baseflows.F90:182)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371== 
==31371== 24 bytes in 1 blocks are indirectly lost in loss record 6 of 17
==31371==    at 0x4C2BD2C: memalign (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x51CA25C: VecCreate_Seq_Private (bvec2.c:879)
==31371==    by 0x51E0533: VecCreate_Seq (bvec3.c:39)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x528E06C: VecScatterCreateToZero (vecmpitoseq.c:140)
==31371==    by 0x529DD9B: vecscattercreatetozero_ (vecmpitoseqf.c:52)
==31371==    by 0x413459: __baseflow_MOD_poiseuille_3d_petsc 
(module_baseflows.F90:356)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371== 
==31371== 56 bytes in 1 blocks are indirectly lost in loss record 7 of 17
==31371==    at 0x4C2BD2C: memalign (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x511FE32: PetscLayoutCreate (pmap.c:53)
==31371==    by 0x51F743B: VecCreate (veccreate.c:39)
==31371==    by 0x51DFF42: VecCreateSeq (vseqcr.c:37)
==31371==    by 0x51E0778: veccreateseq_ (vseqcrf.c:43)
==31371==    by 0x412632: __baseflow_MOD_poiseuille_3d_petsc 
(module_baseflows.F90:182)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371== 
==31371== 56 bytes in 1 blocks are indirectly lost in loss record 8 of 17
==31371==    at 0x4C2BD2C: memalign (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x511FE32: PetscLayoutCreate (pmap.c:53)
==31371==    by 0x51F743B: VecCreate (veccreate.c:39)
==31371==    by 0x51DFF42: VecCreateSeq (vseqcr.c:37)
==31371==    by 0x528E06C: VecScatterCreateToZero (vecmpitoseq.c:140)
==31371==    by 0x529DD9B: vecscattercreatetozero_ (vecmpitoseqf.c:52)
==31371==    by 0x413459: __baseflow_MOD_poiseuille_3d_petsc 
(module_baseflows.F90:356)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371== 
==31371== 80 bytes in 1 blocks are indirectly lost in loss record 9 of 17
==31371==    at 0x4C2BD2C: memalign (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x4FCFDA6: PetscObjectComposedDataIncreaseReal (state.c:167)
==31371==    by 0x5203462: VecSet (rvector.c:590)
==31371==    by 0x51E05CA: VecCreate_Seq (bvec3.c:44)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x51E0778: veccreateseq_ (vseqcrf.c:43)
==31371==    by 0x412632: __baseflow_MOD_poiseuille_3d_petsc 
(module_baseflows.F90:182)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371== 
==31371== 80 bytes in 1 blocks are indirectly lost in loss record 10 of 17
==31371==    at 0x4C2BD2C: memalign (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x4FCFE5A: PetscObjectComposedDataIncreaseReal (state.c:168)
==31371==    by 0x5203462: VecSet (rvector.c:590)
==31371==    by 0x51E05CA: VecCreate_Seq (bvec3.c:44)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x51E0778: veccreateseq_ (vseqcrf.c:43)
==31371==    by 0x412632: __baseflow_MOD_poiseuille_3d_petsc 
(module_baseflows.F90:182)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371== 
==31371== 80 bytes in 1 blocks are indirectly lost in loss record 11 of 17
==31371==    at 0x4C2BD2C: memalign (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x4FCFDA6: PetscObjectComposedDataIncreaseReal (state.c:167)
==31371==    by 0x5203462: VecSet (rvector.c:590)
==31371==    by 0x51E05CA: VecCreate_Seq (bvec3.c:44)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x528E06C: VecScatterCreateToZero (vecmpitoseq.c:140)
==31371==    by 0x529DD9B: vecscattercreatetozero_ (vecmpitoseqf.c:52)
==31371==    by 0x413459: __baseflow_MOD_poiseuille_3d_petsc 
(module_baseflows.F90:356)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371== 
==31371== 80 bytes in 1 blocks are indirectly lost in loss record 12 of 17
==31371==    at 0x4C2BD2C: memalign (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x4FCFE5A: PetscObjectComposedDataIncreaseReal (state.c:168)
==31371==    by 0x5203462: VecSet (rvector.c:590)
==31371==    by 0x51E05CA: VecCreate_Seq (bvec3.c:44)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x528E06C: VecScatterCreateToZero (vecmpitoseq.c:140)
==31371==    by 0x529DD9B: vecscattercreatetozero_ (vecmpitoseqf.c:52)
==31371==    by 0x413459: __baseflow_MOD_poiseuille_3d_petsc 
(module_baseflows.F90:356)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371== 
==31371== 1,296 bytes in 1 blocks are indirectly lost in loss record 13 of 17
==31371==    at 0x4C2BD2C: memalign (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x51E044A: VecCreate_Seq (bvec3.c:37)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x51E0778: veccreateseq_ (vseqcrf.c:43)
==31371==    by 0x412632: __baseflow_MOD_poiseuille_3d_petsc 
(module_baseflows.F90:182)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371== 
==31371== 1,296 bytes in 1 blocks are indirectly lost in loss record 14 of 17
==31371==    at 0x4C2BD2C: memalign (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x51E044A: VecCreate_Seq (bvec3.c:37)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x528E06C: VecScatterCreateToZero (vecmpitoseq.c:140)
==31371==    by 0x529DD9B: vecscattercreatetozero_ (vecmpitoseqf.c:52)
==31371==    by 0x413459: __baseflow_MOD_poiseuille_3d_petsc 
(module_baseflows.F90:356)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371== 
==31371== 3,044 (1,496 direct, 1,548 indirect) bytes in 1 blocks are definitely 
lost in loss record 15 of 17
==31371==    at 0x4C2BD2C: memalign (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x51F7312: VecCreate (veccreate.c:37)
==31371==    by 0x51DFF42: VecCreateSeq (vseqcr.c:37)
==31371==    by 0x51E0778: veccreateseq_ (vseqcrf.c:43)
==31371==    by 0x412632: __baseflow_MOD_poiseuille_3d_petsc 
(module_baseflows.F90:182)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371== 
==31371== 3,044 (1,496 direct, 1,548 indirect) bytes in 1 blocks are definitely 
lost in loss record 16 of 17
==31371==    at 0x4C2BD2C: memalign (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x51F7312: VecCreate (veccreate.c:37)
==31371==    by 0x51DFF42: VecCreateSeq (vseqcr.c:37)
==31371==    by 0x528E06C: VecScatterCreateToZero (vecmpitoseq.c:140)
==31371==    by 0x529DD9B: vecscattercreatetozero_ (vecmpitoseqf.c:52)
==31371==    by 0x413459: __baseflow_MOD_poiseuille_3d_petsc 
(module_baseflows.F90:356)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371== 

> On Jul 13, 2015, at 2:52 PM, Anthony Haas <[email protected]> wrote:
> 
> Hi Barry,
> 
> If you look in module_baseflows.F90 line 467-473, I have:
> 
>    call KSPDestroy(ksp,ierr)
>    call VecDestroy(frhs,ierr)
>    call VecDestroy(frhs2,ierr)
>    call VecDestroy(sol,ierr)
>    call VecDestroy(sol_seq,ierr)
>    call MatDestroy(DXX,ierr)
>    call MatDestroy(DYY,ierr)
> 
> I checked again and all the vectors that I create (see module_baseflows.F90 
> lines 87-88) are destroyed there. Also, when I use VecScatterCreateToZero, I 
> always have a VecScatterDestroy afterwards.
> 
> Thanks,
> 
> Anthony
> 
> 
> 
> On 07/13/2015 12:15 PM, Barry Smith wrote:
>>> On Jul 13, 2015, at 2:05 PM, Anthony Haas <[email protected]> wrote:
>>> 
>>> Hi,
>>> 
>>> I ran my program under Valgrind with 2 processors using the following:
>>> 
>>> ${PETSC_DIR}/${PETSC_ARCH}/bin/mpiexec -n 2 valgrind --tool=memcheck 
>>> --leak-check=full --show-leak-kinds=all --num-callers=20 
>>> --log-file=valgrind.log.%p ./BiGlobal_Spectral_Petsc.x -malloc off 
>>> -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist 
>>> -eps_view -st_ksp_type preonly -st_pc_type lu 
>>> -st_pc_factor_mat_solver_package superlu_dist
>>> 
>>> Note that in my program, I first use PETSc alone to solve a linear system 
>>> of equations (see module_baseflows.F90), and then later in the code I use 
>>> SLEPc to get the eigenvalues (see module_petsc.F90). My main program is 
>>> called BiGlobal_Spectral_Petsc.F90
>>> 
>>> I have attached the log files from Valgrind. It seems that quite a few 
>>> errors occur in pzgstrf. See valgrind.log.31371 and 31372. Do I have any 
>>> control over these errors?
>>    These are likely bugs in SuperLU_Dist.
>> 
>>> Then starting from line 403 in valgrind.log.31371 and 458 in 
>>> valgrind.log.31372, I have a series of errors that seem to be related to a 
>>> VecCreateSeq and to a VecScatterCreateToZero (see module_baseflows.F90). 
>>> Is there something I can do to avoid these errors?
>>    These are because you are not destroying all your vectors at the end.
>> 
>>> Thanks
>>> 
>>> Anthony
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 07/08/2015 01:49 PM, Barry Smith wrote:
>>>>    You should try running under valgrind, see:  
>>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>> 
>>>>     You can also run in the debugger (yes, this is tricky on a batch 
>>>> system but possible) and see exactly what triggers the floating point 
>>>> exception, or, when it "hangs", interrupt the program to see where it is 
>>>> "hanging".
>>>> 
>>>> 
>>>>   Barry
>>>> 
>>>> 
>>>> 
>>>>> On Jul 8, 2015, at 12:34 PM, Anthony Paul Haas <[email protected]> 
>>>>> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I have used the switch -mat_superlu_dist_parsymbfact in my PBS script. 
>>>>> However, although my program worked fine with sequential symbolic 
>>>>> factorization, I get one of the following 2 behaviors when I run with 
>>>>> parallel symbolic factorization (depending on the number of processors 
>>>>> that I use):
>>>>> 
>>>>> 1) the program just hangs (it seems stuck in some subroutine ==> see 
>>>>> test.out-hangs)
>>>>> 2) I get a floating point exception ==> see 
>>>>> test.out-floating-point-exception
>>>>> 
>>>>> Note that as suggested in the Superlu manual, I use a power of 2 number 
>>>>> of procs. Are there any tunable parameters for the parallel symbolic 
>>>>> factorization? Note that when I build my sparse matrix, most elements I 
>>>>> add are nonzero, of course, but to simplify the programming, I also add a 
>>>>> few zero elements in the sparse matrix. I was thinking that maybe, if the 
>>>>> parallel symbolic factorization proceeds by blocks, there could be some 
>>>>> blocks where the pivot would be zero, hence creating the FPE??
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Anthony
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, Jul 8, 2015 at 6:46 AM, Xiaoye S. Li <[email protected]> wrote:
>>>>> Did you find out how to change the option to use parallel symbolic 
>>>>> factorization?  Perhaps the PETSc team can help.
>>>>> 
>>>>> Sherry
>>>>> 
>>>>> 
>>>>> On Tue, Jul 7, 2015 at 3:58 PM, Xiaoye S. Li <[email protected]> wrote:
>>>>> Is there an inquiry function that tells you all the available options?
>>>>> 
>>>>> Sherry
>>>>> 
>>>>> On Tue, Jul 7, 2015 at 3:25 PM, Anthony Paul Haas 
>>>>> <[email protected]> wrote:
>>>>> Hi Sherry,
>>>>> 
>>>>> Thanks for your message. I have used superlu_dist default options. I did 
>>>>> not realize that I was doing serial symbolic factorization. That is 
>>>>> probably the cause of my problem.
>>>>> Each node on Garnet has 60GB usable memory and I can run with 1, 2, 4, 8, 
>>>>> 16 or 32 cores per node.
>>>>> 
>>>>> So I should use:
>>>>> 
>>>>> -mat_superlu_dist_r 20
>>>>> -mat_superlu_dist_c 32
>>>>> 
>>>>> How do you specify the parallel symbolic factorization option? Is it 
>>>>> -mat_superlu_dist_matinput 1?
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Anthony
>>>>> 
>>>>> 
>>>>> On Tue, Jul 7, 2015 at 3:08 PM, Xiaoye S. Li <[email protected]> wrote:
>>>>> The superlu_dist failure occurs during symbolic factorization.  
>>>>> Since you are using serial symbolic factorization, it requires the entire 
>>>>> graph of A to be available in the memory of one MPI task. How much memory 
>>>>> do you have for each MPI task?
>>>>> 
>>>>> It won't help even if you use more processes.  You should try the 
>>>>> parallel symbolic factorization option.
>>>>> 
>>>>> Another point.  You set up process grid as:
>>>>>        Process grid nprow 32 x npcol 20
>>>>> For better performance, you should swap the grid dimensions. That is, it's 
>>>>> better to use 20 x 32; never make nprow larger than npcol.
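>>>>> 
>>>>> For your 640 processes that would be, for example (a sketch, using 
>>>>> PETSc's runtime options for SuperLU_DIST):
>>>>> 
>>>>>    -mat_superlu_dist_r 20 -mat_superlu_dist_c 32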
>>>>> 
>>>>> 
>>>>> Sherry
>>>>> 
>>>>> 
>>>>> On Tue, Jul 7, 2015 at 1:27 PM, Barry Smith <[email protected]> wrote:
>>>>> 
>>>>>    I would suggest running a sequence of problems, 101 by 101, 111 by 
>>>>> 111, etc., and getting the memory usage in each case (when you run out of 
>>>>> memory you can get NO useful information about memory needs). You can then plot 
>>>>> memory usage as a function of problem size to get a handle on how much 
>>>>> memory it is using.  You can also run on more and more processes (which 
>>>>> have a total of more memory) to see how large a problem you may be able 
>>>>> to reach.
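>>>>> 
>>>>>    A minimal sketch of one way to record the usage per run (assuming 
>>>>> PETSc's memory-query routine, and rank/ierr as in your code; call it 
>>>>> after the solve for each problem size):
>>>>> 
>>>>>       PetscLogDouble mem
>>>>>       call PetscMemoryGetCurrentUsage(mem,ierr)
>>>>>       if (rank==0) print '("Resident memory = ",e12.4," bytes")',mem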
>>>>> 
>>>>>    MUMPS also has an "out of core" version (which we have never used) 
>>>>> that could, in theory, let you get to large problems if you have lots of 
>>>>> disk space, but you are on your own figuring out how to use it.
>>>>> 
>>>>>   Barry
>>>>> 
>>>>>> On Jul 7, 2015, at 2:37 PM, Anthony Paul Haas <[email protected]> 
>>>>>> wrote:
>>>>>> 
>>>>>> Hi Jose,
>>>>>> 
>>>>>> In my code, I first use PETSc to solve a linear system to get the 
>>>>>> baseflow (without using SLEPc), and then I use SLEPc to do the stability 
>>>>>> analysis of that baseflow. This is why there are some SLEPc options 
>>>>>> that are not used in test.out-superlu_dist-151x151 (when I am solving 
>>>>>> for the baseflow with PETSc only). I have attached a 101x101 case for 
>>>>>> which I get the eigenvalues. That case works fine. However, if I increase 
>>>>>> to 151x151, I get the error that you can see in 
>>>>>> test.out-superlu_dist-151x151 (similar error with mumps: see 
>>>>>> test.out-mumps-151x151 line 2918). If you look at the very end of the 
>>>>>> files test.out-superlu_dist-151x151 and test.out-mumps-151x151, you will 
>>>>>> see that the last info message printed is:
>>>>>> 
>>>>>> On Processor (after EPSSetFromOptions)  0    memory:    
>>>>>> 0.65073152000E+08          =====>  (see line 807 of module_petsc.F90)
>>>>>> 
>>>>>> This means that the memory error probably occurs in the call to EPSSolve 
>>>>>> (see module_petsc.F90 line 810). I would like to evaluate how much 
>>>>>> memory is required by the most memory intensive operation within 
>>>>>> EPSSolve. Since I am solving a generalized EVP, I would imagine that it 
>>>>>> would be the LU decomposition. But is there an accurate way of doing it?
>>>>>> 
>>>>>> Before starting with iterative solvers, I would like to exploit direct 
>>>>>> solvers as much as I can. I tried GMRES with the default preconditioner 
>>>>>> at some point but had convergence problems. What solver/preconditioner 
>>>>>> would you recommend for a generalized non-Hermitian (EPS_GNHEP) EVP?
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Anthony
>>>>>> 
>>>>>> On Tue, Jul 7, 2015 at 12:17 AM, Jose E. Roman <[email protected]> 
>>>>>> wrote:
>>>>>> 
>>>>>> On 07/07/2015, at 02:33, Anthony Haas wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I am computing eigenvalues using PETSc/SLEPc and superlu_dist for the 
>>>>>>> LU decomposition (my problem is a generalized eigenvalue problem). The 
>>>>>>> code runs fine for a grid with 101x101 but when I increase to 151x151, 
>>>>>>> I get the following error:
>>>>>>> 
>>>>>>> Can't expand MemType 1: jcol 16104   (and then [NID 00037] 2015-07-06 
>>>>>>> 19:19:17 Apid 31025976: OOM killer terminated this process.)
>>>>>>> 
>>>>>>> It seems to be a memory problem. I monitor the memory usage as far as I 
>>>>>>> can and it seems that memory usage is pretty low. The most memory 
>>>>>>> intensive part of the program is probably the LU decomposition in the 
>>>>>>> context of the generalized EVP. Is there a way to evaluate how much 
>>>>>>> memory will be required for that step? I am currently running the debug 
>>>>>>> version of the code, which I would assume uses more memory?
>>>>>>> 
>>>>>>> I have attached the output of the job. Note that the program uses twice 
>>>>>>> PETSc: 1) to solve a linear system for which no problem occurs, and, 2) 
>>>>>>> to solve the Generalized EVP with SLEPc, where I get the error.
>>>>>>> 
>>>>>>> Thanks
>>>>>>> 
>>>>>>> Anthony
>>>>>>> <test.out-superlu_dist-151x151>
>>>>>> In the output you attached there are no SLEPc objects in the report 
>>>>>> and the SLEPc options are not used. It seems that the SLEPc calls are 
>>>>>> skipped?
>>>>>> 
>>>>>> Do you get the same error with MUMPS? Have you tried to solve linear 
>>>>>> systems with a preconditioned iterative solver?
>>>>>> 
>>>>>> Jose
>>>>>> 
>>>>>> 
>>>>>> <module_petsc.F90><test.out-mumps-151x151><test.out_superlu_dist-101x101><test.out-superlu_dist-151x151>
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> <test.out-hangs><test.out-floating-point-exception>
>>> 
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !! SPECTRAL TEMPORAL BIGLOBAL SOLVER !
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> 
>>> PROGRAM Biglobal_Spectral
>>> 
>>>  USE PETSC
>>>  USE TYPEDEF
>>>  USE MAPPINGS
>>>  USE BASEFLOW
>>>  USE WRITE_PRINT
>>>  USE FOURIER_CHEBYSHEV
>>> 
>>>  IMPLICIT NONE
>>> 
>>>  include 'mpif.h'
>>> 
>>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!
>>> !                           Variables Definition                         !
>>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!
>>> 
>>> !...Testing and dummy variables
>>>    real(dp) :: dummy
>>> 
>>> !...Problem parameters
>>>    integer(i4b),parameter :: nx=11,ny=11,nxy=nx*ny,N=4*nxy
>>> 
>>>    !Grid size for 3D Poiseuille baseflow
>>>    integer(i4b),parameter :: nnx=nx,nny=ny
>>> 
>>>    real(dp) :: Re,alpha,beta,Lx,xmin,xmax,ymin,ymax,AA,xll,yll
>>> 
>>> !...Problem Flags
>>>    integer(i4b) :: fourier,testcase,formulation,algorithm
>>> 
>>>    !Mapping Flags
>>>    integer(i4b) :: map,mapx,mapy
>>> 
>>> !...Indices
>>>    integer(i4b) :: i,j,iii,jjj,ij,skip,imin,imax,jmin,jmax
>>> 
>>> !...Output files
>>>    integer(i4b)   :: file_num
>>>    character(128) :: filetype,filename,file_name
>>> 
>>> !...Boundary conditions variables
>>>    integer(i4b) :: ct1,ct2,ct3,ct4
>>>    integer(i4b),dimension(:),allocatable :: 
>>> fst_bc,wall_bc,outlet_bc,inlet_bc
>>>    integer(i4b),dimension(:),allocatable :: bcond_px,bcond_py,bcond_pz
>>>    integer(i4b),dimension(:),allocatable :: bcond_u,bcond_v,bcond_w,bcond_p
>>>    integer(i4b),dimension(:),allocatable :: bcond
>>> 
>>> !...Chebyshev related
>>>    real(dp),dimension(:),allocatable   :: xi,yi,x,y,xfourier
>>> 
>>> !...Baseflow variables
>>>    real(dp),dimension(:,:),allocatable :: usol,uxsol,uysol
>>>    real(dp),dimension(:,:),allocatable :: vsol,vxsol,vysol
>>>    real(dp),dimension(:,:),allocatable :: wsol,wxsol,wysol
>>> 
>>>    !For Hiemenz Baseflow
>>>    integer(i4b) :: npts_hz
>>>    real(dp) :: ymax_hz,epstol,si
>>> 
>>>    real(dp),dimension(:),allocatable :: eta
>>>    real(dp),dimension(:),allocatable :: u0,v0,v1,v2,v3
>>>    real(dp),dimension(:),allocatable :: w0,w1,w2
>>>    real(dp),dimension(:),allocatable :: etarev,u0rev,v0rev,w0rev
>>> 
>>>    real(dp),dimension(:,:),allocatable :: Ubar,Vbar,Wbar
>>> 
>>> !...Grid variables
>>>    real(dp),dimension(:,:),allocatable :: xx,yy,xx1,yy1
>>> 
>>> !...Vectorization array
>>>    integer(i4b),dimension(:),allocatable :: li
>>> 
>>> !...MPI
>>>    integer(i4b) :: rank,size,ierr
>>> 
>>> !...Spline interpolation
>>>    !real(dp),dimension(:),allocatable :: bsp,csp,dsp
>>>    !real(dp),dimension(:),allocatable :: interp_u0,interp_v0,interp_w0
>>> 
>>>    real(dp) :: seval
>>> 
>>> !...External Subroutines
>>>    external seval
>>> 
>>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!
>>> !                           Beginning of program                         !
>>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!
>>> 
>>>    call MPI_INIT(ierr)
>>> 
>>>    call MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr)
>>>    call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
>>> 
>>>    if (rank==0) call cpu_time(start_total)
>>> 
>>> !...Parameters for BiGlobal Computation
>>> 
>>>    testcase=2      ! 1 --> Plane Poiseuille Flow
>>>                    ! 2 --> 3D Poiseuille Flow
>>>                    ! 3 --> Attachment line flow (Hiemenz)
>>>                    ! 4 --> Laminar Separation Bubble
>>>                    ! 5 --> Artificially parallel boundary-layer (cf. 
>>> Rodriguez thesis)
>>> 
>>>    formulation = 1 ! 1 --> x,y inhomogeneous
>>>                    ! 2 --> y,z inhomogeneous
>>> 
>>>    algorithm = 1   ! 1 --> QZ algorithm
>>>                    ! 2 --> Arnoldi algorithm
>>> 
>>> 
>>> !...Arrays allocation
>>> 
>>>    allocate(li(nx))
>>>    allocate(xi(nx),yi(ny))
>>>    allocate(x(nx),y(ny))
>>>    allocate(xfourier(nx))
>>> 
>>>    allocate(uvec(nxy),uxvec(nxy),uyvec(nxy))
>>>    allocate(vvec(nxy),vxvec(nxy),vyvec(nxy))
>>>    allocate(wvec(nxy),wxvec(nxy),wyvec(nxy))
>>> 
>>>    allocate(xx(nx,ny),yy(nx,ny))
>>>    allocate(D1X(nx,nx),D2X(nx,nx))
>>>    allocate(D1Y(ny,ny),D2Y(ny,ny))
>>>    allocate(DMX(nx,nx,2),DMY(ny,ny,2))
>>> 
>>>    !allocate(interp_u0(ny),interp_v0(ny),interp_w0(ny))
>>> 
>>>    li       = 0
>>>    xi       = 0.0d0
>>>    yi       = 0.0d0
>>>    x        = 0.0d0
>>>    y        = 0.0d0
>>>    xfourier = 0.0d0
>>>    uvec     = 0.0d0
>>>    uxvec    = 0.0d0
>>>    uyvec    = 0.0d0
>>>    vvec     = 0.0d0
>>>    vxvec    = 0.0d0
>>>    vyvec    = 0.0d0
>>>    wvec     = 0.0d0
>>>    wxvec    = 0.0d0
>>>    wyvec    = 0.0d0
>>>    xx       = 0.0d0
>>>    yy       = 0.0d0
>>>    D1X      = 0.0d0
>>>    D2X      = 0.0d0
>>>    D1Y      = 0.0d0
>>>    D2Y      = 0.0d0
>>>    DMX      = 0.0d0
>>>    DMY      = 0.0d0
>>> 
>>> !...Test Cases Parameters
>>> 
>>>    if ( testcase == 2 ) then
>>> 
>>>       mapx=1
>>>       mapy=0
>>> 
>>>       Re=1000
>>> 
>>>       if (formulation == 1)then
>>>          beta = pi
>>>       elseif (formulation == 2) then
>>>          alpha = pi
>>>       endif
>>> 
>>>       AA=1.0
>>> 
>>>       xmin=-AA
>>>       xmax=AA
>>> 
>>>       ymin=-1.0
>>>       ymax=1.0
>>> 
>>>       !grid for baseflow
>>>       !nnx=nx
>>>       !nny=ny
>>> 
>>>    endif
>>> 
>>> !...Working Array for Converting 2D Indices to 1D
>>> 
>>>    li=0
>>> 
>>>    do i=1,nx
>>>       li(i)=(i-1)*ny
>>>    enddo
>>> 
>>> !...Get Collocation Points and Chebyshev Differentiation Matrices
>>> 
>>>    call cheby(nx-1,xi,DMX) ! (cf. Spectral Methods in Matlab pages 53-55)
>>>    call cheby(ny-1,yi,DMY)
>>> 
>>>    xll=0
>>> 
>>>    ! To get matrices D1X,D2X,D1Y,D2Y
>>>    call set_mappings(nx,ny,mapx,mapy,xmin,xmax,ymin,ymax,xi,yi,xll,yll,x,y)
>>> 
>>>    do i=1,nx
>>>       do j=1,ny
>>> 
>>>          xx(i,j)=x(i) ! not equivalent to matlab meshgrid!!!
>>>          yy(i,j)=y(j) ! not equivalent to matlab meshgrid!!!
>>> 
>>>       enddo
>>>    enddo
>>> 
>>> 
>>> !...Compute Baseflow (3D Poiseuille)
>>> 
>>>    if ( testcase == 2 ) then
>>> 
>>>       allocate(xx1(nnx,nny),yy1(nnx,nny))
>>> 
>>>       allocate(usol(nnx,nny),uxsol(nnx,nny),uysol(nnx,nny))
>>>       allocate(vsol(nnx,nny),vxsol(nnx,nny),vysol(nnx,nny))
>>>       allocate(wsol(nnx,nny),wxsol(nnx,nny),wysol(nnx,nny))
>>> 
>>>       xx1   = 0.0d0
>>>       yy1   = 0.0d0
>>>       usol  = 0.0d0
>>>       uxsol = 0.0d0
>>>       uysol = 0.0d0
>>>       vsol  = 0.0d0
>>>       vxsol = 0.0d0
>>>       vysol = 0.0d0
>>>       wsol  = 0.0d0
>>>       wxsol = 0.0d0
>>>       wysol = 0.0d0
>>> 
>>>       if (rank == 0) then
>>>          WRITE(*,*)
>>>          WRITE(*,'(A,I3,A,I3)')'3D POISEUILLE BASEFLOW COMPUTED ON GRID 
>>> WITH NX =',nnx,' NY =',nny
>>>          WRITE(*,*)
>>>          call cpu_time(start)
>>>       endif
>>> 
>>>       !currently I am taking the same grid for the baseflow as for 
>>> biglobal ==> no interpolation
>>>       call Poiseuille_3D_PETSC(nnx,nny,AA,usol,xx1,yy1)
>>> 
>>>       if (rank==0) then
>>>          call cpu_time(finish)
>>>          print '("Time for baseflow computation = ",f15.4," 
>>> seconds.")',finish-start
>>> 
>>>          file_num=10
>>>          file_name='check_baseflow_3D_poiseuille.dat'
>>> 
>>>          call 
>>> write_to_tecplot(file_num,file_name,nnx,nny,xx1,yy1,usol,usol,usol)
>>> 
>>>       endif
>>> 
>>>       vsol=0*usol
>>>       wsol=0*usol
>>> 
>>>       if (formulation==1) then
>>> 
>>>          wsol=usol
>>>          usol=0.*wsol
>>>          vsol=0.*wsol
>>> 
>>>       endif
>>> 
>>>       !Compute derivatives
>>>       uxsol = MATMUL(D1X,usol)
>>>       uysol = MATMUL(usol,TRANSPOSE(D1Y))
>>> 
>>>       vxsol = MATMUL(D1X,vsol)
>>>       vysol = MATMUL(vsol,TRANSPOSE(D1Y))
>>> 
>>>       wxsol = MATMUL(D1X,wsol)
>>>       wysol = MATMUL(wsol,TRANSPOSE(D1Y))
>>> 
>>>       !Vectorize the matrices
>>>       do i=1,nx
>>>          do j=1,ny
>>> 
>>>             ij=li(i)+j
>>> 
>>>             uvec(ij) = usol(i,j)
>>>             vvec(ij) = vsol(i,j)
>>>             wvec(ij) = wsol(i,j)
>>> 
>>>             uxvec(ij) = uxsol(i,j)
>>>             vxvec(ij) = vxsol(i,j)
>>>             wxvec(ij) = wxsol(i,j)
>>> 
>>>             uyvec(ij) = uysol(i,j)
>>>             vyvec(ij) = vysol(i,j)
>>>             wyvec(ij) = wysol(i,j)
>>> 
>>>          enddo
>>>       enddo
>>> 
>>>    endif
>>> 
>>>    !call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>> 
>>> !...Build Matrices, Set BCs and Solve Problem
>>> 
>>>    call SetupSolveGEVP(nx,ny,nxy,alpha,beta,Re)
>>> 
>>>    if (rank==0) then
>>>       call cpu_time(finish_total)
>>>       print '("Total time = ",f15.4," seconds.")',finish_total-start_total
>>>    endif
>>> 
>>>    deallocate(li)
>>>    deallocate(xi,yi)
>>>    deallocate(x,y)
>>>    deallocate(xfourier)
>>> 
>>>    deallocate(uvec,uxvec,uyvec)
>>>    deallocate(vvec,vxvec,vyvec)
>>>    deallocate(wvec,wxvec,wyvec)
>>> 
>>>    deallocate(xx,yy)
>>>    deallocate(D1X,D2X)
>>>    deallocate(D1Y,D2Y)
>>>    deallocate(DMX,DMY)
>>> 
>>>    deallocate(xx1,yy1)
>>> 
>>>    deallocate(usol,uxsol,uysol)
>>>    deallocate(vsol,vxsol,vysol)
>>>    deallocate(wsol,wxsol,wysol)
>>> 
>>>    call MPI_FINALIZE(ierr)
>>> 
>>> 
>>>  END PROGRAM Biglobal_Spectral
>>> 
>>> MODULE BASEFLOW
>>> 
>>>  USE TYPEDEF
>>>  USE WRITE_PRINT
>>>  USE FOURIER_CHEBYSHEV
>>> 
>>>  IMPLICIT NONE
>>> 
>>> CONTAINS
>>> 
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> 
>>>  SUBROUTINE Poiseuille_3D_PETSC(nx,ny,AA,usol,xx,yy)
>>> 
>>> #include <petsc/finclude/petscsys.h>
>>> #include <petsc/finclude/petscvec.h>
>>> #include <petsc/finclude/petscmat.h>
>>> #include <petsc/finclude/petscmat.h90>
>>> #include <petsc/finclude/petscvec.h90>
>>> #include <petsc/finclude/petscpc.h>
>>> #include <petsc/finclude/petscksp.h>
>>> #include <petsc/finclude/petscviewer.h>
>>> #include <petsc/finclude/petscviewer.h90>
>>> 
>>> #include <slepc/finclude/slepcsys.h>
>>> #include <slepc/finclude/slepceps.h>
>>> 
>>> !...Regular Variables
>>> 
>>>    integer(i4b),intent(in) :: nx,ny
>>> 
>>>    real(dp),intent(in) :: AA
>>>    real(dp),dimension(nx,ny),intent(inout) :: usol,xx,yy
>>> 
>>>    integer(i4b) :: i,j,ij
>>>    integer(i4b) :: rank,size,ierr,source
>>>    integer(i4b) :: nxy,nxm2,nym2,nxym2
>>> 
>>>    integer(i4b),dimension(nx) :: li
>>>    integer(i4b),dimension(nx-2) :: lim2
>>> 
>>>    integer(i4b),dimension(:),allocatable  :: loc
>>> 
>>>    real(dp) :: ymin,ymax,xmin,xmax
>>>    real(dp) :: dxidx,d2xidx2,dyidy,d2yidy2
>>> 
>>>    real(dp),dimension(nx) :: xi,x
>>>    real(dp),dimension(ny) :: yi,y
>>> 
>>>    real(dp),dimension(nx,nx) :: D2XB
>>>    real(dp),dimension(ny,ny) :: D2YB
>>>    real(dp),dimension(nx,nx,2) :: DMXB
>>>    real(dp),dimension(ny,ny,2) :: DMYB
>>> 
>>>    real(dp),dimension(nx*ny) :: usol_vec
>>>    real(dp),dimension( nx-2,ny-2 ) :: sol_mat
>>> 
>>>    complex(dpc) :: U00
>>>    complex(dpc) :: atest
>>>    complex(dpc),dimension(nx-2,nx-2) :: D2XBs
>>>    complex(dpc),dimension(ny-2,ny-2) :: D2YBs
>>>    complex(dpc),dimension( nx-2,ny-2 ) :: sol_mat_cmplx
>>>    complex(dpc),dimension(:),allocatable  :: xwork1
>>> 
>>> !!$    allocate(li(nx))
>>> !!$    allocate(lim2(nx-2))
>>> 
>>> !!! NEED TO TRANSFORM ALL THESE ARRAYS TO ALLOCATABLE!!!!!!
>>> 
>>> !...Petsc Variables
>>>    PetscInt ii,jj,ione,ct
>>>    PetscInt jmin,jmax
>>>    PetscInt Istart,Iend
>>>    PetscInt IstartVec,IendVec
>>> 
>>>    PetscReal   tol
>>>    PetscScalar aone
>>> 
>>>    PetscViewer viewer
>>> 
>>>    PC         pc
>>>    KSP        ksp
>>>    VecScatter ctx
>>>    Mat        DXX,DYY
>>>    !Mat        DLOAD
>>>    Vec        frhs,frhs2
>>>    Vec        sol,sol_seq
>>> 
>>> 
>>>    !For VecGetArrayReadF90
>>>    !PetscScalar,pointer :: xx_v(:) !Fortran90 pointer to the array
>>>    PetscOffset i_x
>>>    PetscScalar sol_array(1)
>>> 
>>> !...MPI
>>>    !PetscMPIInt    rank,size
>>> 
>>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!
>>> !                         Beginning of program                           !
>>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!
>>> 
>>>    call MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr)
>>>    call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)
>>> 
>>>    call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
>>> 
>>>    !call PetscOptionsInsertString("-mat_superlu_dist_parsymbfact",ierr)
>>> 
>>> !...Check if nx and ny are odd (necessary to set max velocity to 1)
>>> 
>>>    if ( mod(nx,2)==0 .or.  mod(ny,2)==0   ) then
>>>       write(*,*)
>>>       STOP 'STOPPING... IN 3D POISEUILLE BASEFLOW NX,NY NEED TO BE ODD'
>>>    endif
>>> 
>>> !...Set Domain Sizes and Initialize other variables
>>> 
>>>    xmin=-AA
>>>    xmax=AA
>>> 
>>>    ymin=-1
>>>    ymax=1
>>> 
>>>    ione=1
>>> 
>>>    nxy=nx*ny
>>> 
>>>    !Variables to Build DXX and DYY
>>>    nxm2=nx-2
>>>    nym2=ny-2
>>>    nxym2 = nxm2*nym2
>>> 
>>> !...Mat to Vec and Vec to Mat conversion
>>> 
>>>    lim2=0
>>>    do i=1,nxm2
>>>       lim2(i) = (i-1)*nym2
>>>    enddo
>>> 
>>> !...Get collocation points and chebyshev differentiation matrices
>>> 
>>>    call cheby(nx-1,xi,DMXB)
>>>    call cheby(ny-1,yi,DMYB)
>>> 
>>>    ! x-mapping from [-1,1]->[-XMAX,XMAX]
>>>    x=xmax*xi
>>> 
>>>    ! y-mapping (from [-1,1]->[-1,1])
>>>    y=yi
>>> 
>>>    !Compute the metric terms
>>>    dxidx = 1/xmax
>>>    d2xidx2 = 0
>>> 
>>>    dyidy = 1
>>>    d2yidy2 = 0
>>> 
>>>    !Scale the differentiation matrix (due to the use of the mapping)
>>> 
>>>    !x-direction
>>>    D2XB  = d2xidx2*DMXB(:,:,1) + dxidx**2*DMXB(:,:,2)
>>>    D2XBs = D2XB(2:nx-1,2:nx-1)
>>> 
>>>    !y-direction
>>>    D2YB  = d2yidy2*DMYB(:,:,1) + dyidy**2*DMYB(:,:,2)
>>>    D2YBs = D2YB(2:ny-1,2:ny-1)
>>> 
>>>    xx=0.0
>>>    yy=0.0
>>> 
>>>    do i=1,nx
>>>       do j=1,ny
>>> 
>>>          xx(i,j)=x(i)
>>>          yy(i,j)=y(j)
>>> 
>>>       enddo
>>>    enddo
>>> 
>>> !...Create Petsc right-hand-side vector and derivation matrices
>>>    call VecCreateSeq(PETSC_COMM_SELF,nxym2,sol_seq,ierr)
>>>    call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nxym2,frhs,ierr) ! right 
>>> hand side vector
>>>    call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nxym2,frhs2,ierr) ! 
>>> right hand side vector # 2
>>> 
>>>    call 
>>> MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,nxym2,nxym2,nx,PETSC_NULL_INTEGER,nx,PETSC_NULL_INTEGER,DXX,ierr)
>>>    call 
>>> MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,nxym2,nxym2,ny,PETSC_NULL_INTEGER,ny,PETSC_NULL_INTEGER,DYY,ierr)
>>> 
>>> !...Determine which rows of the vector/matrix are locally owned.
>>>    call VecGetOwnershipRange(frhs,IstartVec,IendVec,ierr)
>>>    call MatGetOwnershipRange(DXX,Istart,Iend,ierr)
>>> 
>>>    if ( IstartVec /= Istart .or. IendVec /= Iend ) then
>>>       stop 'Problem in Baseflow Computation'
>>>    endif
>>> 
>>> !...Fill in Sequential and Parallel Vectors
>>>    allocate(xwork1(Iend-Istart))
>>>    allocate(loc(Iend-Istart))
>>> 
>>>    ct=0
>>>    do i=Istart,Iend-1
>>>       ct=ct+1
>>>       loc(ct)=i
>>>       xwork1(ct)=(-2.0d0,0.0d0)
>>>    enddo
>>> 
>>>    call VecSetValues(frhs,Iend-Istart,loc,xwork1,INSERT_VALUES,ierr)
>>>    call VecZeroEntries(sol_seq,ierr)
>>> 
>>> !...Assemble Vectors
>>>    call VecAssemblyBegin(frhs,ierr)
>>>    call VecAssemblyEnd(frhs,ierr)
>>> 
>>>    call VecAssemblyBegin(sol_seq,ierr)
>>>    call VecAssemblyEnd(sol_seq,ierr)
>>> 
>>>    call VecDuplicate(frhs,sol,ierr)   ! parallel solution vector
>>> 
>>> !...Display vector
>>>    !call VecView(frhs,PETSC_VIEWER_STDOUT_WORLD,ierr) ! ==> seems very slow
>>> 
>>> !...Build DXX in Parallel
>>> 
>>> !!$    if (rank==0) then
>>> !!$       call print_matrix('D2XB',nx,nx,D2XB)
>>> !!$       call print_matrix('D2YB',ny,ny,D2YB)
>>> !!$
>>> !!$       call print_matrix_cmplx('D2XBs',nx-2,nx-2,D2XBs)
>>> !!$       call print_matrix_cmplx('D2YBs',ny-2,ny-2,D2YBs)
>>> !!$    endif
>>> 
>>>    if (rank == 0) call cpu_time(start)
>>> 
>>>    do i=Istart,Iend-1
>>> 
>>>       ii=1+floor((i+0.01)/nym2)
>>>       jj=1
>>> 
>>>       jmin=mod(i,nym2)
>>>       jmax=jmin+(nxm2-1)*nym2
>>> 
>>>       do j=jmin,jmax,nym2
>>> 
>>>          call 
>>> MatSetValues(DXX,ione,i,ione,j,D2XBs(ii,jj),INSERT_VALUES,ierr)
>>>          jj=jj+1
>>> 
>>>       enddo
>>> 
>>>    enddo
>>> 
>>> !...Build DYY in Parallel
>>> 
>>>    do i=Istart,Iend-1
>>> 
>>>       ii=mod(i,nym2)+1
>>>       jj=1
>>> 
>>>       jmin=floor((i+0.01)/nym2)*nym2
>>> 
>>>       do j=jmin,jmin+(nym2-1)
>>> 
>>>          call 
>>> MatSetValues(DYY,ione,i,ione,j,D2YBs(ii,jj),INSERT_VALUES,ierr)
>>>          jj=jj+1
>>> 
>>>       enddo
>>> 
>>>    enddo
>>> 
>>> !...Assemble DXX and DYY
>>> 
>>>    call MatAssemblyBegin(DXX,MAT_FINAL_ASSEMBLY,ierr)
>>>    call MatAssemblyEnd(DXX,MAT_FINAL_ASSEMBLY,ierr)
>>> 
>>>    call MatAssemblyBegin(DYY,MAT_FINAL_ASSEMBLY,ierr)
>>>    call MatAssemblyEnd(DYY,MAT_FINAL_ASSEMBLY,ierr)
>>> 
>>>    if (rank==0) then
>>>       call cpu_time(finish)
>>>       print '("Time for build and assembly of baseflow operator = ",f15.4," 
>>> seconds.")',finish-start
>>>    endif
>>> 
>>>    !call MatView(DXX,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>    !call MatView(DYY,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>> 
>>>    aone = (1.0d0,0.0d0)
>>> 
>>>    if (rank == 0) call cpu_time(start)
>>>    call MatAXPY(DXX,aone,DYY,DIFFERENT_NONZERO_PATTERN,ierr) ! on exit DXX 
>>> = a*DYY + DXX
>>>    if (rank==0) then
>>>       call cpu_time(finish)
>>>       print '("Time for Matrix summation = ",f15.4," 
>>> seconds.")',finish-start
>>>    endif
>>> 
>>> 
>>> !...Write Matrix to ASCII and Binary Format
>>> !!$    call PetscViewerASCIIOpen(PETSC_COMM_WORLD,"Amat.m",viewer,ierr)
>>> !!$    call MatView(DXX,viewer,ierr)
>>> !!$    call PetscViewerDestroy(viewer,ierr)
>>> !!$
>>> !!$    call 
>>> PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_WRITE,viewer,ierr)
>>> !!$    call MatView(DXX,viewer,ierr)
>>> !!$    call PetscViewerDestroy(viewer,ierr)
>>> 
>>> !...Load a Matrix in Binary Format
>>> !!$    call 
>>> PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_READ,viewer,ierr)
>>> !!$    call MatCreate(PETSC_COMM_WORLD,DLOAD,ierr)
>>> !!$    call MatSetType(DLOAD,MATAIJ,ierr)
>>> !!$    call MatLoad(DLOAD,viewer,ierr)
>>> !!$    call PetscViewerDestroy(viewer,ierr)
>>> 
>>>    !call MatTranspose(DXX,MatReuse reuse,Mat *B)
>>>    !call MatView(DXX,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>> 
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !! SOLVE POISSON EQUATION FOR THE FIRST TIME !!
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> 
>>> !...Create linear solver context
>>>    call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
>>> 
>>> !...Set operators. Here the matrix that defines the linear system also 
>>> serves as the preconditioning matrix.
>>>    !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) !aha 
>>> commented and replaced by next line
>>>    call KSPSetOperators(ksp,DXX,DXX,ierr) ! remember: here DXX=DXX+DYY
>>> 
>>>    !call KSPSetOperators(ksp,DLOAD,DLOAD,ierr) ! remember: here DXX=DXX+DYY
>>> 
>>>    tol = 1.e-10
>>>    !call 
>>> KSPSetTolerances(ksp,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr)
>>>  ! set relative tolerance and uses default for absolute and divergence tol
>>>    call 
>>> KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr) 
>>> ! set relative and absolute tolerances and uses default for divergence tol
>>> 
>>> !...Set the Direct (LU) Solver
>>>    !call KSPSetType(ksp,KSPPREONLY,ierr)
>>>    !call KSPGetPC(ksp,pc,ierr)
>>>    !call PCSetType(pc,PCLU,ierr)
>>>    !call PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS,ierr) ! 
>>> MATSOLVERSUPERLU_DIST
>>> 
>>> !...Set runtime options, e.g., -ksp_type <type> -pc_type <type> 
>>> -ksp_monitor -ksp_rtol <rtol>
>>>    !These options will override those specified above as long as 
>>> KSPSetFromOptions() is called _after_
>>>    !any other customization routines.
>>> 
>>>    call KSPSetFromOptions(ksp,ierr)
>>> 
>>> !...Sets an ADDITIONAL function to be called at every iteration to monitor 
>>> the residual/error etc
>>>    call 
>>> KSPMonitorSet(ksp,KSPMonitorTrueResidualNorm,PETSC_NULL_OBJECT,PETSC_NULL_FUNCTION,ierr)
>>> 
>>> !...Solve the linear system
>>>    call KSPSolve(ksp,frhs,sol,ierr)
>>> 
>>> !...View solver info (could use -ksp_view instead)
>>> 
>>>    call KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>> 
>>> !...Extract Maximum Value (at center of duct) from Solution Vector
>>> 
>>>    call VecScatterCreateToZero(sol,ctx,sol_seq,ierr)
>>>    !scatter as many times as you need
>>>    call VecScatterBegin(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr)
>>>    call VecScatterEnd(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr)
>>>    !destroy scatter context and local vector when no longer needed
>>>    call VecScatterDestroy(ctx,ierr)
>>> 
>>>    !call VecView(sol_seq,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>> 
>>>    call VecGetArray(sol_seq,sol_array,i_x,ierr)
>>> 
>>>    if (rank==0) then
>>> 
>>>       sol_mat_cmplx=0.0d0
>>> 
>>>       do i=1,nxm2
>>>          do j=1,nym2
>>>             ij=lim2(i)+j
>>>             sol_mat_cmplx(i,j) = sol_array(i_x + ij)
>>>          enddo
>>>       enddo
>>> 
>>>       U00 = sol_mat_cmplx((nxm2+1)/2,(nym2+1)/2)
>>> 
>>>    endif
>>> 
>>>    source=0
>>>    Call MPI_Bcast(U00,1,MPI_DOUBLE_COMPLEX,source,MPI_COMM_WORLD,ierr)
>>> 
>>>    !call VecGetArrayReadF90(sol,xx_v,ierr) ! requires #include 
>>> <petsc/finclude/petscvec.h90>
>>>    !atest = xx_v(3)
>>>    !call VecRestoreArrayReadF90(sol,xx_v,ierr)
>>>    !write(*,*)'atest',atest
>>> 
>>>    call VecRestoreArray(sol_seq,sol_array,i_x,ierr)
>>> 
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !! SOLVE POISSON EQUATION FOR THE SECOND TIME !!
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> 
>>> !...Fill in New Right-Hand-Side Vector (frhs2)
>>>    call VecSetValues(frhs2,Iend-Istart,loc,xwork1/U00,INSERT_VALUES,ierr)
>>> 
>>> !...Assemble New RHS Vector
>>>    call VecAssemblyBegin(frhs2,ierr)
>>>    call VecAssemblyEnd(frhs2,ierr)
>>> 
>>> !...Solve System
>>>    call KSPSolve(ksp,frhs2,sol,ierr)
>>> 
>>> !...Get Final Solution
>>>    call VecScatterCreateToZero(sol,ctx,sol_seq,ierr)
>>>    call VecScatterBegin(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr)
>>>    call VecScatterEnd(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr)
>>>    call VecScatterDestroy(ctx,ierr)
>>> 
>>>    call VecGetArray(sol_seq,sol_array,i_x,ierr)
>>> 
>>>    if (rank==0) then
>>> 
>>>       usol=0.0d0
>>>       sol_mat_cmplx=0.0d0
>>> 
>>>       do i=1,nxm2
>>>          do j=1,nym2
>>>             ij=lim2(i)+j
>>>             sol_mat_cmplx(i,j) = sol_array(i_x + ij)
>>>          enddo
>>>       enddo
>>> 
>>>       usol(2:nx-1,2:ny-1) = dble(sol_mat_cmplx)
>>> 
>>>    endif
>>> 
>>>    call VecRestoreArray(sol_seq,sol_array,i_x,ierr)
>>> 
>>> !...Send usol to all other processes (vectorize, broadcast and matrixize)
>>>    li=0
>>>    do i=1,nx
>>>       li(i) = (i-1)*ny
>>>    enddo
>>> 
>>>    if (rank==0) then
>>> 
>>>       do i=1,nx
>>>          do j=1,ny
>>>             ij=li(i)+j
>>>             usol_vec(ij)=usol(i,j)
>>>          enddo
>>>       enddo
>>> 
>>>    endif
>>> 
>>>    call MPI_Bcast(usol_vec,nx*ny,MPI_DOUBLE,source,MPI_COMM_WORLD,ierr)
>>> 
>>>    if (rank/=0) then
>>> 
>>>       do i=1,nx
>>>          do j=1,ny
>>>             ij=li(i)+j
>>>             usol(i,j)=usol_vec(ij)
>>>          enddo
>>>       enddo
>>> 
>>>    endif
>>> 
>>> !...Free work space. All PETSc objects should be destroyed when they are no 
>>> longer needed.
>>> 
>>>    deallocate(loc)
>>>    deallocate(xwork1)
>>> 
>>>    call KSPDestroy(ksp,ierr)
>>>    call VecDestroy(frhs,ierr)
>>>    call VecDestroy(frhs2,ierr)
>>>    call VecDestroy(sol,ierr)
>>>    call VecDestroy(sol_seq,ierr)
>>>    call MatDestroy(DXX,ierr)
>>>    call MatDestroy(DYY,ierr)
>>> 
>>>    call PetscFinalize(ierr)
>>> 
>>> 
>>> END SUBROUTINE Poiseuille_3D_PETSC
>>> 
>>> 
>>> 
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> 
>>> 
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!!!!!! HIEMENZ FLOW SOLVER !!!!!!!!
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> 
>>> 
>>> !*************************************************************************************************
>>> ! on Veltins: ifort hiemenz.f90 -convert big_endian -r8 -i4 -O3 -132 -o 
>>> hiemenz.x
>>> ! 
>>> ************************************************************************************************
>>>      !program hiemenz
>>> 
>>>      subroutine hiemenz(ymax,eps,n,si,eta,u0,v0,v1,v2,v3,w0,w1,w2)
>>> 
>>>      implicit none
>>> 
>>>      real(dp) eps,ymax,si
>>>      integer(i4b) n,i
>>> 
>>> !      real(dp),allocatable,dimension(:) :: eta
>>> !      real(dp),allocatable,dimension(:) :: u0,v0,v1,v2,v3
>>> !      real(dp),allocatable,dimension(:) :: w0,w1,w2
>>> 
>>>      real(dp),dimension(n) :: eta
>>>      real(dp),dimension(n) :: u0,v0,v1,v2,v3
>>>      real(dp),dimension(n) :: w0,w1,w2
>>> 
>>>      write(*,*)
>>>      write(*,*) 
>>> '*******************************************************************'
>>>      write(*,*) '                       HIEMENZ SOLVER                      
>>>         '
>>>      write(*,*) 
>>> '*******************************************************************'
>>>      write(*,*)
>>> 
>>> !
>>> !..... Numerical Parameters
>>> !
>>> 
>>> !Domain height and number of points
>>> 
>>>      !ymax=100
>>>      !n=500001
>>> 
>>> !Tolerance for Newton iteration
>>> 
>>>      !eps=1e-12
>>> 
>>> !Initial guess for Newton iteration: f''(0)
>>> 
>>>      !si = -1.232587656820134
>>> 
>>> !
>>> !..... Allocate arrays
>>> !
>>> !      allocate(eta(n))
>>> !      allocate(u0(n),v0(n),v1(n),v2(n),v3(n))
>>> !      allocate(w0(n),w1(n),w2(n))
>>> 
>>> !
>>> !..... Compute solution
>>> !
>>> 
>>>      call hiemenz_fct(ymax,eps,n,si,eta,u0,v0,v1,v2,v3,w0,w1,w2)
>>> 
>>> !
>>> !..... Write out solution in ASCII format
>>> !
>>> 
>>>      write(*,*)
>>>      write(*,*) 'Writing solution to hiemenz.dat'
>>>      write(*,*)
>>> 
>>>      open(unit=15,file='hiemenz.dat',form='formatted')
>>> 
>>>      do i=1,n
>>>         write(15,'(9E21.11)') 
>>> eta(i),u0(i),v0(i),v1(i),v2(i),v3(i),w0(i),w1(i),w2(i)
>>>      end do
>>> 
>>>      close(15)
>>> 
>>> 
>>> 
>>>    end subroutine hiemenz
>>> 
>>> 
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> 
>>> 
>>>      subroutine hiemenz_fct(ymax,eps,n,si,eta,u0,v0,v1,v2,v3,w0,w1,w2)
>>> 
>>>      implicit none
>>> 
>>>      integer(i4b) i,iter,n,j
>>>      real(dp) ymax,res,p,eps,s,si
>>> 
>>>      real(dp),allocatable,dimension(:,:) :: v,w
>>> 
>>>      real(dp) eta(n)
>>>      real(dp) u0(n),v0(n),v1(n),v2(n),v3(n)
>>>      real(dp) w0(n),w1(n),w2(n)
>>> 
>>>      !external w_ode
>>> 
>>>      write(*,*) 'eps=',eps
>>>      write(*,*) 'ymax=',ymax
>>>      write(*,*) 'N=',n
>>>      write(*,*)
>>> 
>>> !
>>> !     Allocate arrays
>>> !
>>>      allocate(v (6,n))
>>>      allocate(w (5,n))
>>> 
>>> !
>>> !     Create grid
>>> !
>>>      do i=1,n
>>>         eta(i) = ymax * real(i-1)/real(n-1)
>>>      end do
>>> !
>>> !     Newton-Raphson iteration for v
>>> !
>>>      iter = 0
>>>      s = si
>>>      res = 1.
>>>      write(*,*) 'Newton-Raphson iteration for v: s=',s
>>> 
>>>      do while (res.gt.eps)
>>> 
>>>         iter=iter+1
>>> 
>>>         call hiemenz_eval(s,eta,n,v)
>>> 
>>>         s = s - (v(2,n) + 1.0) / v(5,n)
>>>         res = abs(v(2,n) + 1.0)
>>> 
>>>         write(*,*) iter,s,res
>>> 
>>>      end do
>>> 
>>>      write(*,*) 'Number of iterations ',iter
>>>      write(*,*)
>>>      write(*,*) "v''(0)=",v(3,1)
>>> 
>>> !
>>> !     Now compute w
>>> !
>>>      w(1,1) = 0.
>>>      w(2,1) = v(3,1)
>>> 
>>>      w(3,1) = 0.
>>>      w(4,1) = 0.
>>>      w(5,1) = v(3,1)
>>> 
>>>      do i=2,n
>>>         call rk4(w(1,i-1),w(1,i),eta(i)-eta(i-1),5,w_ode)
>>>      end do
>>> 
>>> !
>>> !     Rescale w to match boundary conditions at infinity
>>> !
>>>      p = 1./w(1,n)
>>> 
>>>      do i=1,n
>>>         w(1,i) = w(1,i) * p
>>>         w(2,i) = w(2,i) * p
>>>      end do
>>> 
>>>      write(*,*) "w'(0)=",w(2,1)
>>> 
>>> !
>>> !     Now compute v''', w'' from the RHS and copy all values to new arrays
>>> !
>>> 
>>>      do i=1,n
>>> 
>>>         u0(i) = -v(2,i)
>>>         v0(i) = v(1,i)
>>>         v1(i) = v(2,i)
>>>         v2(i) = v(3,i)
>>>         v3(i) = -v(2,i)*v(2,i) + v(1,i)*v(3,i) + 1.0
>>>         w0(i) = w(1,i)
>>>         w1(i) = w(2,i)
>>>         w2(i) = w(2,i)*v(1,i)
>>> 
>>>      end do
>>> 
>>> 
>>>      deallocate(v,w)
>>> 
>>>    end subroutine hiemenz_fct
>>> 
>>> 
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> 
>>> 
>>>      subroutine hiemenz_eval(s,eta,n,f)
>>> 
>>>      implicit none
>>> 
>>>      integer(i4b) n,i
>>>      real(dp) s
>>>      real(dp) eta(n),f(6,n)
>>> 
>>>      !external v_ode
>>> 
>>>      f(1,1) = 0.
>>>      f(2,1) = 0.
>>>      f(3,1) = s
>>> 
>>>      f(4,1) = 0.
>>>      f(5,1) = 0.
>>>      f(6,1) = 1.
>>> 
>>>      do i=2,n
>>>         call rk4(f(1,i-1),f(1,i),eta(i)-eta(i-1),6,v_ode)
>>>      end do
>>> 
>>>    end subroutine hiemenz_eval
>>> 
>>> 
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> 
>>> 
>>>      subroutine v_ode(y,dy)
>>> 
>>>      implicit none
>>> 
>>>      real(dp) y(6),dy(6)
>>> 
>>>      dy(1) = y(2)
>>>      dy(2) = y(3)
>>>      dy(3) = -y(2)*y(2) + y(1)*y(3) + 1.0
>>> 
>>>      dy(4) = y(5)
>>>      dy(5) = y(6)
>>>      dy(6) = y(3)*y(4) - 2.0*y(2)*y(5) + y(1)*y(6)
>>> 
>>>      end subroutine v_ode
>>> 
>>> 
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> 
>>> 
>>>      subroutine w_ode(y,dy)
>>> 
>>>      implicit none
>>> 
>>>      real(dp) y(5),dy(5)
>>> 
>>>      dy(1) = y(2)
>>>      dy(2) = y(2)*y(3)
>>> 
>>>      dy(3) = y(4)
>>>      dy(4) = y(5)
>>>      dy(5) = -y(4)*y(4) + y(3)*y(5) + 1.0
>>> 
>>>      end subroutine w_ode
>>> 
>>> 
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> 
>>> 
>>>      subroutine rk4(y1,y2,h,nn,der)
>>> !
>>> !     Perform one RK4 step with step size h of an nn-vector
>>> !     Input:  y1(nn) old step
>>> !             h      step size
>>> !             nn     vector size
>>> !             der    derivative function
>>> !     Output: y2(nn) new step
>>> !
>>>      implicit none
>>> 
>>>      integer(i4b) nn
>>> 
>>>      real(dp) y1(nn),y2(nn),y(nn)
>>>      real(dp) k1(nn),k2(nn),k3(nn),k4(nn)
>>>      real(dp) h
>>> 
>>>      external der
>>> !
>>> !     First RK substep
>>> !
>>> 
>>>      y(:) = y1(:)
>>>      call der(y,k1)
>>>      k1(:)=h*k1(:)
>>> 
>>> !
>>> !     Second RK substep
>>> !
>>> 
>>>      y(:) = y1(:) + .5*k1(:)
>>>      call der(y,k2)
>>>      k2(:)=h*k2(:)
>>> 
>>> !
>>> !     Third RK substep
>>> !
>>> 
>>>      y(:) = y1(:) + .5*k2(:)
>>>      call der(y,k3)
>>>      k3(:)=h*k3(:)
>>> 
>>> !
>>> !     Fourth RK substep
>>> !
>>> 
>>>      y(:) = y1(:) + k3(:)
>>>      call der(y,k4)
>>>      k4(:)=h*k4(:)
>>> 
>>> !
>>> !     Compose solution after full RK step
>>> !
>>>      y2(:) = y1(:) + ( k1(:) + 2.0*k2(:) + 2.0*k3(:) + k4(:) ) / 6.0
>>> 
>>>      end subroutine rk4
>>> 
>>> 
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> 
>>> 
>>>      subroutine compute_baseflow_derivative(DXYZ,uvwvec,nx,ny,UVWxyz)
>>> 
>>>        implicit none
>>> 
>>>        integer(i4b) :: i,j
>>> 
>>>        integer(i4b),intent(in) :: nx,ny
>>> 
>>>        real(dp),dimension(nx*ny),intent(in) :: uvwvec
>>>        real(dp),dimension(nx*ny,nx*ny),intent(in) :: DXYZ
>>>        real(dp),dimension(nx*ny,nx*ny),intent(out) :: UVWxyz
>>> 
>>>        UVWxyz = 0.0
>>> 
>>>        do i=1,nx*ny
>>>           do j=1,nx*ny
>>> 
>>>              UVWxyz(i,i) = UVWxyz(i,i) + DXYZ(i,j)*uvwvec(j)
>>> 
>>>           enddo
>>>        enddo
>>> 
>>>      end subroutine compute_baseflow_derivative
>>> 
>>> 
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> 
>>> 
>>>      subroutine compute_UVW_DXYZ(DXYZ,uvwvec,nx,ny,UVW_DXYZ)
>>> 
>>>        implicit none
>>> 
>>>        integer(i4b) :: i,j
>>> 
>>>        integer(i4b),intent(in) :: nx,ny
>>> 
>>>        real(dp),dimension(nx*ny),intent(in) :: uvwvec
>>>        real(dp),dimension(nx*ny,nx*ny),intent(in) :: DXYZ
>>>        real(dp),dimension(nx*ny,nx*ny),intent(out) :: UVW_DXYZ
>>> 
>>>        do i=1,nx*ny
>>>           do j=1,nx*ny
>>> 
>>>              UVW_DXYZ(i,j) = uvwvec(i)*DXYZ(i,j)
>>> 
>>>           enddo
>>>        enddo
>>> 
>>>      end subroutine compute_UVW_DXYZ
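
As an aside, compute_UVW_DXYZ forms diag(uvwvec)*DXYZ, i.e. row i of DXYZ
scaled by uvwvec(i). Because Fortran stores arrays column-major, an
equivalent sketch that walks each column contiguously is:

       do j = 1, nx*ny
          UVW_DXYZ(:,j) = uvwvec(:)*DXYZ(:,j)   ! scale row i by uvwvec(i)
       enddo
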
>>> 
>>> 
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> 
>>> 
>>> 
>>> 
>>> END MODULE BASEFLOW
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> !!$
>>> !!$
>>> !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!$
>>> !!$  SUBROUTINE Poiseuille_3D(nz,ny,AA,usol,zz,yy)
>>> !!$
>>> !!$    character(128) :: file_name
>>> !!$    integer(i4b) :: INFO,NRHS,ct,i,j,N,file_num,m1,n1,p1,q1
>>> !!$
>>> !!$    integer(i4b),intent(in) :: nz,ny
>>> !!$    real(dp),intent(in) :: AA
>>> !!$
>>> !!$    real(dp) :: ymin,ymax,zmin,zmax,dzidz,d2zidz2,dyidy,d2yidy2,U00
>>> !!$    real(dp) :: start,finish
>>> !!$
>>> !!$    real(dp),dimension(nz,nz,2) :: DMZ
>>> !!$    real(dp),dimension(ny,ny,2) :: DMY
>>> !!$
>>> !!$    real(dp),dimension(nz,nz) :: D2Z
>>> !!$    real(dp),dimension(ny,ny) :: D2Y
>>> !!$
>>> !!$    real(dp),dimension(nz-2,nz-2) :: D2Zs
>>> !!$    real(dp),dimension(ny-2,ny-2) :: D2Ys
>>> !!$
>>> !!$    real(dp),dimension(nz-2,nz-2) :: IZ
>>> !!$    real(dp),dimension(ny-2,ny-2) :: IY
>>> !!$
>>> !!$    real(dp),dimension(nz,ny) :: usol,zz,yy
>>> !!$    real(dp),dimension(nz-2,ny-2) :: utmp
>>> !!$
>>> !!$    real(dp),dimension(nz) :: zi,z
>>> !!$    real(dp),dimension(ny) :: yi,y
>>> !!$
>>> !!$    real(dp),dimension( (nz-2)*(ny-2) ) :: frhs
>>> !!$    real(dp),dimension( (nz-2)*(ny-2),(nz-2)*(ny-2) ) :: AMAT,DZZ,DYY
>>> !!$
>>> !!$    integer(i4b),dimension((nz-2)*(ny-2)) :: IPIV
>>> !!$
>>> !!$    frhs=0.0
>>> !!$
>>> !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!$!!!!!!! 3D POISEUILLE FLOW !!!!!!!!
>>> !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!$
>>> !!$    zmin=-AA
>>> !!$    zmax=AA
>>> !!$
>>> !!$    ymin=-1
>>> !!$    ymax=1
>>> !!$
>>> !!$    !
>>> !!$    !! .....Create Grid
>>> !!$    !
>>> !!$
>>> !!$    if ( mod(nz,2)==0 .or.  mod(ny,2)==0   ) then
>>> !!$       WRITE(*,*)
>>> !!$       STOP'STOPPING... IN 3D POISEUILLE BASEFLOW NZ,NY NEED TO BE ODD'
>>> !!$    endif
>>> !!$
>>> !!$    !
>>> !!$    !! .....Get collocation points and chebyshev differentiation matrices
>>> !!$    !
>>> !!$
>>> !!$    call cheby(nz-1,zi,DMZ)
>>> !!$    call cheby(ny-1,yi,DMY)
>>> !!$
>>> !!$    ! z-mapping from [-1,1]->[-ZMAX,ZMAX]
>>> !!$    z=zmax*zi
>>> !!$
>>> !!$    ! y-mapping (from [-1,1]->[-1,1])
>>> !!$    y=yi
>>> !!$
>>> !!$    !Compute the metric terms
>>> !!$    dzidz = 1/zmax
>>> !!$    d2zidz2 = 0
>>> !!$
>>> !!$    dyidy = 1
>>> !!$    d2yidy2 = 0
>>> !!$
>>> !!$    !Scale the differentiation matrix (due to the use of the mapping)
>>> !!$
>>> !!$    !z-direction
>>> !!$    !D1Z = dzidz*DMZ(:,:,1)
>>> !!$    D2Z  = d2zidz2*DMZ(:,:,1) + dzidz**2*DMZ(:,:,2)
>>> !!$    D2Zs = D2Z(2:nz-1,2:nz-1)
>>> !!$
>>> !!$    !y-direction
>>> !!$    !D1Y = dyidy*DMY(:,:,1)
>>> !!$    D2Y  = d2yidy2*DMY(:,:,1) + dyidy**2*DMY(:,:,2)
>>> !!$    D2Ys = D2Y(2:ny-1,2:ny-1)
>>> !!$
>>> !!$    IZ=0.0
>>> !!$    IY=0.0
>>> !!$
>>> !!$    do i=1,nz-2
>>> !!$       IZ(i,i)=1.0
>>> !!$    enddo
>>> !!$
>>> !!$    do i=1,ny-2
>>> !!$       IY(i,i)=1.0
>>> !!$    enddo
>>> !!$
>>> !!$    m1=size(IY,1)
>>> !!$    n1=size(IY,2)
>>> !!$    p1=size(D2Zs,1)
>>> !!$    q1=size(D2Zs,2)
>>> !!$
>>> !!$    call mykron(IY,D2Zs,m1,n1,p1,q1,DZZ)
>>> !!$
>>> !!$    m1=size(D2Ys,1)
>>> !!$    n1=size(D2Ys,2)
>>> !!$    p1=size(IZ,1)
>>> !!$    q1=size(IZ,2)
>>> !!$
>>> !!$    call mykron(D2Ys,IZ,m1,n1,p1,q1,DYY)
>>> !!$
>>> !!$    AMAT = DYY + DZZ
>>> !!$
>>> !!$    !CALL PRINT_MATRIX( 'DZZ', (nz-2)*(ny-2), (nz-2)*(ny-2), DZZ )
>>> !!$    !CALL PRINT_MATRIX( 'DYY', (nz-2)*(ny-2), (nz-2)*(ny-2),DYY )
>>> !!$
>>> !!$
>>> !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!$    !! SOLVE POISSON EQUATION FOR THE FIRST TIME !!
>>> !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!$
>>> !!$
>>> !!$
>>> !!$    do i=1,(ny-2)*(nz-2)
>>> !!$       frhs(i)=-2.0
>>> !!$    enddo
>>> !!$
>>> !!$    INFO=0
>>> !!$    NRHS=1 ! only 1 rhs here
>>> !!$    N=(nz-2)*(ny-2)
>>> !!$
>>> !!$
>>> !!$    !
>>> !!$    ! Compute the LU factorization of A.
>>> !!$    !
>>> !!$
>>> !!$    ! Note: on exit of dgetrf, AMAT is not AMAT anymore but contains the factors L and U
>>> !!$    ! from the factorization A = P*L*U; the unit diagonal elements of L are not stored.
>>> !!$
>>> !!$    !call PRINT_MATRIX( 'AMAT', N, N, AMAT )
>>> !!$
>>> !!$
>>> !!$    call cpu_time(start)
>>> !!$    CALL dgetrf( N, N, AMAT, N, IPIV, INFO )
>>> !!$    call cpu_time(finish)
>>> !!$
>>> !!$    !write(*,*)'AMAT AFTER',AMAT
>>> !!$
>>> !!$    print '("Time LU = ",f15.4," seconds.")',finish-start
>>> !!$
>>> !!$    IF( INFO .NE. 0 ) THEN
>>> !!$       WRITE(*,*)'INFO',INFO
>>> !!$       STOP 'PROBLEM IN LAPACK (Poiseuille_3D)'
>>> !!$    ENDIF
>>> !!$
>>> !!$
>>> !!$    !
>>> !!$    ! Solve the system A*X = B, overwriting B with X.
>>> !!$    !
>>> !!$
>>> !!$    IF( INFO.EQ.0 ) THEN
>>> !!$
>>> !!$       call cpu_time(start)
>>> !!$       CALL dgetrs( 'No transpose', N, NRHS, AMAT, N, IPIV, frhs, N, INFO )
>>> !!$       call cpu_time(finish)
>>> !!$
>>> !!$       print '("Time Linear System Solve = ",f15.4," seconds.")',finish-start
>>> !!$       write(*,*)
>>> !!$
>>> !!$    END IF
>>> !!$
>>> !!$    ct=0
>>> !!$    do j=1,ny-2
>>> !!$       do i=1,nz-2
>>> !!$
>>> !!$          ct=ct+1
>>> !!$          utmp(i,j) = frhs(ct)
>>> !!$
>>> !!$       enddo
>>> !!$    enddo
>>> !!$
>>> !!$    usol = 0.0
>>> !!$    usol(2:nz-1,2:ny-1) = utmp(:,:)
>>> !!$
>>> !!$    zz=0.0
>>> !!$    yy=0.0
>>> !!$
>>> !!$    do i=1,nz
>>> !!$       do j=1,ny
>>> !!$
>>> !!$          zz(i,j)=z(i)
>>> !!$          yy(i,j)=y(j)
>>> !!$
>>> !!$       enddo
>>> !!$    enddo
>>> !!$
>>> !!$    U00 = usol( (nz+1)/2 , (ny+1)/2 )
>>> !!$
>>> !!$
>>> !!$
>>> !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!$!! SOLVE POISSON EQUATION FOR THE SECOND TIME !!
>>> !!$!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!$
>>> !!$
>>> !!$    utmp=0.0
>>> !!$    usol=0.0
>>> !!$    frhs=0.0
>>> !!$
>>> !!$    !New,scaled rhs
>>> !!$    do i=1,(ny-2)*(nz-2)
>>> !!$       frhs(i)=-2.0/U00
>>> !!$    enddo
>>> !!$
>>> !!$    !
>>> !!$    ! Solve the system A*X = B, overwriting B with X.
>>> !!$    !
>>> !!$
>>> !!$    ! Note: here we already have the LU factorization so that we only need a call to dgetrs
>>> !!$
>>> !!$    CALL dgetrs( 'No transpose', N, NRHS, AMAT, N, IPIV, frhs, N, INFO )
>>> !!$
>>> !!$    IF( INFO .NE. 0 ) THEN
>>> !!$       WRITE(*,*)'INFO',INFO
>>> !!$       STOP 'PROBLEM IN LAPACK (Poiseuille_3D -- solve 2)'
>>> !!$    ENDIF
>>> !!$
>>> !!$    !write(*,*)'frhs',frhs
>>> !!$
>>> !!$    ct=0
>>> !!$    do j=1,ny-2
>>> !!$       do i=1,nz-2
>>> !!$
>>> !!$          ct=ct+1
>>> !!$          utmp(i,j) = frhs(ct)
>>> !!$
>>> !!$       enddo
>>> !!$    enddo
>>> !!$
>>> !!$    usol(2:nz-1,2:ny-1) = utmp(:,:)
>>> !!$
>>> !!$
>>> !!$  END SUBROUTINE Poiseuille_3D
>>> !!$
>>> !!$
>>> !
>>> !  Description: Build complex matrices A and B in the context of a generalized
>>> !  eigenvalue problem (GEVP)
>>> !
>>> !
>>> !/*T
>>> !  Concepts: Build complex matrices
>>> !  Processors: n
>>> !T*/
>>> !
>>> ! -----------------------------------------------------------------------
>>> 
>>> MODULE PETSC
>>> 
>>>  USE TYPEDEF
>>>  USE WRITE_PRINT
>>> 
>>>  IMPLICIT NONE
>>> 
>>> CONTAINS
>>> 
>>>  Subroutine SetupSolveGEVP(nx,ny,nxy,alpha,beta,Re)
>>> 
>>> #include <petsc/finclude/petscsys.h>
>>> #include <petsc/finclude/petscvec.h>
>>> #include <petsc/finclude/petscmat.h>
>>> #include <petsc/finclude/petscmat.h90>
>>> #include <petsc/finclude/petscpc.h>
>>> #include <petsc/finclude/petscksp.h>
>>> 
>>> #include <slepc/finclude/slepcsys.h>
>>> #include <slepc/finclude/slepceps.h>
>>> 
>>> !...Non PETSc/SLEPc Variables
>>> 
>>>    real(dp),intent(in) :: alpha,beta,Re
>>>    integer(i4b),intent(in) :: nx,ny,nxy
>>> 
>>>    integer(i4b) :: ij
>>>    integer(i4b) :: sizem,rank,source
>>> 
>>>    integer(i4b),dimension(:),allocatable  :: idxm1,idxn1
>>>    integer(i4b),dimension(:),allocatable  :: idxm2,idxn2
>>>    integer(i4b),dimension(:),allocatable  :: idxm3,idxn3
>>> 
>>>    integer(i4b),dimension(:),allocatable  :: li
>>>    integer(i4b),dimension(:),allocatable  :: all_bc
>>>    integer(i4b),dimension(:),allocatable  :: u_bc,v_bc,w_bc,p_bc
>>>    integer(i4b),dimension(:),allocatable  :: fst_bc,wall_bc
>>>    integer(i4b),dimension(:),allocatable  :: outlet_bc,inlet_bc
>>> 
>>> 
>>>    complex(dpc) :: ibeta,Rec,alphac,betac,small,sigma
>>>    complex(dpc),dimension(:,:),allocatable :: D1Xc,D2Xc,D1Yc,D2Yc
>>> 
>>>    complex(dpc),dimension(nxy) :: uvec_cmplx,uxvec_cmplx,uyvec_cmplx
>>>    complex(dpc),dimension(nxy) :: vvec_cmplx,vxvec_cmplx,vyvec_cmplx
>>>    complex(dpc),dimension(nxy) :: wvec_cmplx,wxvec_cmplx,wyvec_cmplx
>>> 
>>>    complex(dpc),dimension(3*nxy) :: wvec_cmplx3
>>>    complex(dpc),dimension(ny,ny) :: VDIAG,VDY
>>> 
>>> !...Temporary variables
>>>    integer(i4b),dimension(:),allocatable  :: loc
>>>    complex(dpc),dimension(:),allocatable  :: xwork1
>>> 
>>> 
>>> !...SLEPC VARIABLES
>>>    EPS eps
>>>    EPSType tname
>>> 
>>>    ST st
>>>    PC pc
>>> 
>>>    Vec xr,xi
>>> 
>>>    PetscReal      error
>>>    PetscScalar    kr,ki
>>> 
>>>    PetscLogDouble mem
>>> 
>>>    PetscInt       maxit,nconv
>>>    PetscInt       nev,ncv,mpd
>>>    PetscInt       three
>>>    PetscInt       ct,ct1,ct2,ct3,ct4
>>> 
>>>    PetscInt       icntl,ival,ival1
>>> 
>>>    PetscReal tolslepc
>>>    PetscInt maxiter
>>> 
>>> !...PETSC VARIABLES
>>> 
>>>    PetscReal      norm,tol,nrm,nrm1
>>>    PetscInt       i,j,k,II,JJ,its
>>> 
>>>    PetscInt       jmin,jmax
>>>    PetscInt       ione,ntimes
>>>    PetscErrorCode ierr
>>>    PetscBool      flg
>>>    PetscScalar    v,one,zero,rhs
>>> 
>>>    KSP            ksp
>>>    PetscRandom    rctx
>>> 
>>> !...MPI
>>>    !PetscMPIInt    rank,size
>>> 
>>> !...Matrices indices
>>> 
>>>    PetscInt       IstartA,IendA,IstartB,IendB
>>> 
>>> !...Vectors
>>> 
>>>    Vec            i_vec,frhs,sol
>>> 
>>> !...Matrices
>>> 
>>>    Mat            A,B
>>> 
>>>    !  Note: Any user-defined Fortran routines (such as MyKSPMonitor)
>>>    !  MUST be declared as external.
>>> 
>>>    !external MyKSPMonitor,MyKSPConverged
>>> 
>>> !#if !defined(PETSC_USE_COMPLEX)
>>> !#error "this code requires PETSc --with-scalar-type=complex build"
>>> !#endif
>>> !#if !defined(PETSC_USE_REAL_DOUBLE)
>>> !#error "this code requires PETSc --with-precision=real build"
>>> !#endif
>>> 
>>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!
>>> !                           Beginning of program                         !
>>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -!
>>> 
>>>    call SlepcInitialize(PETSC_NULL_CHARACTER,ierr)!;CHKERRQ(ierr)
>>> 
>>>    call PetscMemorySetGetMaximumUsage(ierr)!;CHKERRQ(ierr)
>>> 
>>>    call PetscOptionsInsertString("-mat_superlu_dist_parsymbfact",ierr)
>>> 
>>>    call MPI_COMM_SIZE(MPI_COMM_WORLD,sizem,ierr)!;CHKERRQ(ierr)
>>>    call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)!;CHKERRQ(ierr)
>>> 
>>>    call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>    !call PetscMemoryGetMaximumUsage(mem,ierr)
>>>    call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>>    if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (entering module)',rank,'    memory:',mem
>>> 
>>>    !call PetscOptionsGetInt(PETSC_NULL_CHARACTER,'-nx',nx,flg,ierr)
>>>    !call PetscOptionsGetInt(PETSC_NULL_CHARACTER,'-ny',ny,flg,ierr)
>>> 
>>>    ione=1 ! integer
>>>    ibeta = imag_unit*beta
>>>    one = (1.0d0,0.0d0)
>>>    zero = (0.0d0,0.0d0)
>>>    Rec=Re
>>>    alphac=alpha
>>>    betac=beta
>>> 
>>>    source=0
>>> 
>>>    ct=0
>>> 
>>> !...Convert Arrays to Complex
>>> 
>>>    uvec_cmplx=uvec
>>>    uxvec_cmplx=uxvec
>>>    uyvec_cmplx=uyvec
>>> 
>>>    vvec_cmplx=vvec
>>>    vxvec_cmplx=vxvec
>>>    vyvec_cmplx=vyvec
>>> 
>>>    wvec_cmplx=wvec
>>>    wxvec_cmplx=wxvec
>>>    wyvec_cmplx=wyvec
>>> 
>>>    wvec_cmplx3(1:nxy)=wvec_cmplx
>>>    wvec_cmplx3(nxy+1:2*nxy)=wvec_cmplx
>>>    wvec_cmplx3(2*nxy+1:3*nxy)=wvec_cmplx
>>> 
>>>    allocate(D1Xc(nx,nx),D2Xc(nx,nx),D1Yc(ny,ny),D2Yc(ny,ny))
>>> 
>>>    D1Xc=D1X
>>>    D1Yc=D1Y
>>> 
>>>    D2Xc=D2X
>>>    D2Yc=D2Y
>>> 
>>>    !call MPI_Bcast(wvec_cmplx3,3*nxy,MPI_DOUBLE_COMPLEX,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>> 
>>> !...Parallel Matrices
>>> 
>>>    !Create A
>>>    call MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4*nxy,4*nxy, &
>>>         2*(nx+ny),PETSC_NULL_INTEGER,max(nx,ny),PETSC_NULL_INTEGER,A,ierr)!;CHKERRQ(ierr)
>>> 
>>>    !Create B
>>> 
>>>    call MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4*nxy,4*nxy, &
>>>         1,PETSC_NULL_INTEGER,0,PETSC_NULL_INTEGER,B,ierr)!;CHKERRQ(ierr)
>>> 
>>>    call MatGetOwnershipRange(A,IstartA,IendA,ierr)!;CHKERRQ(ierr)
>>>    call MatGetOwnershipRange(B,IstartB,IendB,ierr)!;CHKERRQ(ierr)
>>> 
>>>    !write(*,*)'IstartA,IendA',IstartA,IendA
>>>    !write(*,*)'IstartB,IendB',IstartB,IendB
>>> 
>>>    if (IstartA > 4*nxy .or. IstartB > 4*nxy .or. &
>>>        IendA > 4*nxy .or. IendB > 4*nxy) then
>>>       stop 'indices problem in module_petsc'
>>>    endif
>>> 
>>> !...NEED TO TEST WITHOUT THIS NEXT LINE TO UNDERSTAND !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !...This line avoids error [0]PETSC ERROR: Argument out of range [0]PETSC ERROR: New nonzero at (470,470)...
>>>    call MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE,ierr)!;CHKERRQ(ierr)
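
On the workaround above: instead of disabling the new-nonzero-allocation
error, the usual fix is exact per-row preallocation. A hedged sketch only --
d_nnz and o_nnz are hypothetical integer arrays, one entry per locally owned
row, counting the nonzeros that fall in the diagonal and off-diagonal blocks
(they could be filled by walking the same index logic as the assembly loops
below):

    ! allocate(d_nnz(IendA-IstartA),o_nnz(IendA-IstartA)), fill them, then:
    call MatMPIAIJSetPreallocation(A,PETSC_DEFAULT_INTEGER,d_nnz, &
                                     PETSC_DEFAULT_INTEGER,o_nnz,ierr)

With exact counts the "new nonzero at (470,470)" error should not trigger,
and MAT_NEW_NONZERO_ALLOCATION_ERR could stay enabled as a safety net.
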
>>> 
>>> !...Get Memory Usage
>>>    call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>    !call PetscMemoryGetMaximumUsage(mem,ierr)
>>>    call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>>    if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (after matcreate)',rank,'    memory:',mem
>>> 
>>> !...Build A in Parallel
>>> 
>>>    if (rank == 0) call cpu_time(start)
>>> 
>>>    do i=IstartA,IendA-1
>>> 
>>>       if     ( i < nxy ) then
>>> 
>>>          if (abs(uyvec_cmplx(i+1))>1e10) STOP '1'
>>>          call MatSetValues(A,ione,i,ione,i+nxy,uyvec_cmplx(i+1),INSERT_VALUES,ierr)!;CHKERRQ(ierr) ! Uy (block 1,2)
>>> 
>>>          !Build DX (block 1,4)
>>>          ii=1+floor((i+0.01)/ny)
>>>          jj=1
>>> 
>>>          jmin=mod(i,ny)
>>>          jmax=jmin+(nx-1)*ny
>>> 
>>>          do j=jmin,jmax,ny
>>> 
>>>             if (abs(D1Xc(ii,jj))>1e10) STOP '2'
>>>             call MatSetValues(A,ione,i,ione,j+3*nxy,D1Xc(ii,jj),INSERT_VALUES,ierr) ! Build DX (block 1,4)
>>>             jj=jj+1
>>> 
>>>          enddo
>>> 
>>>          !Build -DXX/Re (part of block 1,1)
>>>          ii=1+floor((i+0.01)/ny)
>>>          jj=1
>>> 
>>>          jmin=mod(i,ny)
>>>          jmax=jmin+(nx-1)*ny
>>> 
>>>          do j=jmin,jmax,ny
>>> 
>>>             if (abs(-D2Xc(ii,jj)/Rec)>1e10) STOP '3'
>>>             call MatSetValues(A,ione,i,ione,j,-D2Xc(ii,jj)/Rec,INSERT_VALUES,ierr) ! Build -DXX/Re (part of block 1,1)
>>>             jj=jj+1
>>> 
>>>          enddo
>>> 
>>> 
>>>       elseif ( i > nxy-1 .and. i < 2*nxy ) then
>>> 
>>>          if (abs(vxvec_cmplx(i-nxy+1))>1e10) STOP '4'
>>>          call MatSetValues(A,ione,i,ione,i-nxy,vxvec_cmplx(i-nxy+1),INSERT_VALUES,ierr)!;CHKERRQ(ierr) ! Vx (block 2,1)
>>> 
>>>          !Build DY (block 2,4)
>>>          ii=mod(i,ny)+1
>>>          jj=1
>>> 
>>>          jmin=floor((i-nxy+0.01)/ny)*ny
>>> 
>>>          do j=jmin,jmin+(ny-1)
>>> 
>>>             if (abs(D1Yc(ii,jj))>1e10) STOP '5'
>>>             call MatSetValues(A,ione,i,ione,j+3*nxy,D1Yc(ii,jj),INSERT_VALUES,ierr) ! Build DY (block 2,4)
>>>             jj=jj+1
>>> 
>>>          enddo
>>> 
>>> 
>>>          !Build -DXX/Re (part of block 2,2)
>>>          ii=1+floor((i-nxy+0.01)/ny)
>>>          jj=1
>>> 
>>>          jmin=mod(i,ny)
>>>          jmax=jmin+(nx-1)*ny
>>> 
>>>          do j=jmin,jmax,ny
>>> 
>>>             if (abs(-D2Xc(ii,jj)/Rec)>1e10) STOP '6'
>>>             call MatSetValues(A,ione,i,ione,j+nxy,-D2Xc(ii,jj)/Rec,INSERT_VALUES,ierr) ! Build -DXX/Re (part of block 2,2)
>>>             jj=jj+1
>>> 
>>>          enddo
>>> 
>>> 
>>> 
>>> 
>>>       elseif ( i > 2*nxy-1 .and. i < 3*nxy ) then
>>> 
>>>          call MatSetValues(A,ione,i,ione,i+nxy,ibeta,INSERT_VALUES,ierr) ! iBeta (block 3,4)
>>> 
>>>          call MatSetValues(A,ione,i,ione,i-2*nxy,wxvec_cmplx(i-2*nxy+1),INSERT_VALUES,ierr) ! Wx (block 3,1)
>>>          call MatSetValues(A,ione,i,ione,i-nxy,wyvec_cmplx(i-2*nxy+1),INSERT_VALUES,ierr) ! Wy (block 3,2)
>>> 
>>> 
>>>          !Build -DXX/Re (part of block 3,3)
>>>          ii=1+floor((i-2*nxy+0.01)/ny)
>>>          jj=1
>>> 
>>>          jmin=mod(i,ny)
>>>          jmax=jmin+(nx-1)*ny
>>> 
>>>          do j=jmin,jmax,ny
>>> 
>>>             call MatSetValues(A,ione,i,ione,j+2*nxy,-D2Xc(ii,jj)/Rec,INSERT_VALUES,ierr) ! Build -DXX/Re (part of block 3,3)
>>>             jj=jj+1
>>> 
>>>          enddo
>>> 
>>> 
>>>       elseif ( i > 3*nxy-1 ) then
>>> 
>>>          call MatSetValues(A,ione,i,ione,i-nxy,ibeta,INSERT_VALUES,ierr) ! iBeta (block 4,3)
>>> 
>>>          !Build DX (block 4,1)
>>>          ii=1+floor(((i-3*nxy)+0.01)/ny)
>>>          jj=1
>>> 
>>>          jmin=mod(i,ny)
>>>          jmax=jmin+(nx-1)*ny
>>> 
>>>          do j=jmin,jmax,ny
>>> 
>>>             call MatSetValues(A,ione,i,ione,j,D1Xc(ii,jj),INSERT_VALUES,ierr)
>>>             jj=jj+1
>>> 
>>>          enddo
>>> 
>>>          !Build DY (block 4,2)
>>>          ii=mod(i,ny)+1
>>>          jj=1
>>> 
>>>          jmin=floor((i-3*nxy+0.01)/ny)*ny
>>> 
>>>          do j=jmin,jmin+(ny-1)
>>> 
>>>             call MatSetValues(A,ione,i,ione,j+nxy,D1Yc(ii,jj),INSERT_VALUES,ierr)
>>>             jj=jj+1
>>> 
>>>          enddo
>>> 
>>>       endif
>>> 
>>>    enddo
>>> 
>>>    if (rank==0) then
>>> 
>>>       call cpu_time(finish)
>>>       write(*,*)
>>>       print '("Time Partial Parallel Build A #1 = ",f21.3," seconds.")',finish-start
>>> 
>>>    endif
>>> 
>>> 
>>>    call MatAssemblyBegin(A,MAT_FLUSH_ASSEMBLY,ierr)!;CHKERRQ(ierr) ! necessary to switch between INSERT_VALUES and ADD_VALUES
>>>    call MatAssemblyEnd(A,MAT_FLUSH_ASSEMBLY,ierr)!;CHKERRQ(ierr)   ! necessary to switch between INSERT_VALUES and ADD_VALUES
>>> 
>>> 
>>> 
>>> !...Finish Blocks A11, A22 and A33
>>> 
>>>    if (rank == 0) call cpu_time(start)
>>> 
>>>    VDY=(0.0d0,0.0d0)
>>>    VDIAG=(0.0d0,0.0d0)
>>> 
>>>    do i=IstartA,IendA-1
>>>       if (i < 3*nxy) then
>>>          call MatSetValues(A,ione,i,ione,i,betac**2/Rec+imag_unit*betac*wvec_cmplx3(i+1),ADD_VALUES,ierr)!;CHKERRQ(ierr)
>>>       endif
>>>    enddo
>>> 
>>>    do i=IstartA,IendA-1
>>> 
>>>       if     ( i < nxy ) then
>>> 
>>>          !Build -DYY/Rec (part of block 1,1)
>>>          ii=mod(i,ny)+1
>>>          jj=1
>>> 
>>>          jmin=floor((i+0.01)/ny)*ny
>>> 
>>>          do j=jmin,jmin+(ny-1)
>>> 
>>>             call MatSetValues(A,ione,i,ione,j,-D2Yc(ii,jj)/Rec,ADD_VALUES,ierr)
>>>             jj=jj+1
>>> 
>>>          enddo
>>> 
>>> 
>>>          !Build U*DX (part of block 1,1)
>>>          ii=1+floor((i+0.01)/ny)
>>>          jj=1
>>> 
>>>          jmin=mod(i,ny)
>>>          jmax=jmin+(nx-1)*ny
>>> 
>>>          do j=jmin,jmax,ny
>>> 
>>>             call MatSetValues(A,ione,i,ione,j,uvec_cmplx(i+1)*D1Xc(ii,jj),ADD_VALUES,ierr)
>>>             jj=jj+1
>>> 
>>>          enddo
>>> 
>>> 
>>>          !Build V*DY (part of block 1,1)
>>>          ii=mod(i,ny)+1
>>>          jj=1
>>> 
>>>          jmin=floor((i+0.01)/ny)*ny
>>> 
>>>          do j=jmin,jmin+(ny-1)
>>>             call MatSetValues(A,ione,i,ione,j,vvec_cmplx(i+1)*D1Yc(ii,jj),ADD_VALUES,ierr)
>>>             jj=jj+1
>>> 
>>>          enddo
>>> 
>>> 
>>>          call MatSetValues(A,ione,i,ione,i,uxvec_cmplx(i+1),ADD_VALUES,ierr)!;CHKERRQ(ierr) ! Ux (block 1,1)
>>> 
>>> 
>>> 
>>> 
>>>       elseif ( i > nxy-1 .and. i < 2*nxy ) then
>>> 
>>>          !Build -DYY/Rec (part of block 2,2)
>>>          ii=mod(i,ny)+1
>>>          jj=1
>>> 
>>>          jmin=floor((i-nxy+0.01)/ny)*ny
>>> 
>>>          do j=jmin,jmin+(ny-1)
>>> 
>>>             call MatSetValues(A,ione,i,ione,j+nxy,-D2Yc(ii,jj)/Rec,ADD_VALUES,ierr) ! Build -DYY/Rec (part of block 2,2)
>>>             jj=jj+1
>>> 
>>>          enddo
>>> 
>>> 
>>>          !Build U*DX (part of block 2,2)
>>>          ii=1+floor((i-nxy+0.01)/ny)
>>>          jj=1
>>> 
>>>          jmin=mod(i,ny)
>>>          jmax=jmin+(nx-1)*ny
>>> 
>>>          do j=jmin,jmax,ny
>>> 
>>>             call MatSetValues(A,ione,i,ione,j+nxy,uvec_cmplx(i-nxy+1)*D1Xc(ii,jj),ADD_VALUES,ierr)
>>>             jj=jj+1
>>> 
>>>          enddo
>>> 
>>> 
>>>          !Build V*DY (part of block 2,2)
>>>          ii=mod(i,ny)+1
>>>          jj=1
>>> 
>>>          jmin=floor((i-nxy+0.01)/ny)*ny
>>> 
>>>          do j=jmin,jmin+(ny-1)
>>>             call MatSetValues(A,ione,i,ione,j+nxy,vvec_cmplx(i-nxy+1)*D1Yc(ii,jj),ADD_VALUES,ierr)
>>>             jj=jj+1
>>> 
>>>          enddo
>>> 
>>>          call MatSetValues(A,ione,i,ione,i,vyvec_cmplx(i-nxy+1),ADD_VALUES,ierr)!;CHKERRQ(ierr) ! Vy (block 2,2)
>>> 
>>> 
>>>       elseif ( i > 2*nxy-1 .and. i < 3*nxy ) then
>>> 
>>>          !Build -DYY/Rec (part of block 3,3)
>>>          ii=mod(i,ny)+1
>>>          jj=1
>>> 
>>>          jmin=floor((i-2*nxy+0.01)/ny)*ny
>>> 
>>>          do j=jmin,jmin+(ny-1)
>>> 
>>>             call MatSetValues(A,ione,i,ione,j+2*nxy,-D2Yc(ii,jj)/Rec,ADD_VALUES,ierr) ! Build -DYY/Rec (part of block 3,3)
>>>             jj=jj+1
>>> 
>>>          enddo
>>> 
>>> 
>>>          !Build U*DX (part of block 3,3)
>>>          ii=1+floor((i-2*nxy+0.01)/ny)
>>>          jj=1
>>> 
>>>          jmin=mod(i,ny)
>>>          jmax=jmin+(nx-1)*ny
>>> 
>>>          do j=jmin,jmax,ny
>>> 
>>>             call MatSetValues(A,ione,i,ione,j+2*nxy,uvec_cmplx(i-2*nxy+1)*D1Xc(ii,jj),ADD_VALUES,ierr)
>>>             jj=jj+1
>>> 
>>>          enddo
>>> 
>>> 
>>>          !Build V*DY (part of block 3,3)
>>>          ii=mod(i,ny)+1
>>>          jj=1
>>> 
>>>          jmin=floor((i-2*nxy+0.01)/ny)*ny
>>> 
>>>          do j=jmin,jmin+(ny-1)
>>>             call MatSetValues(A,ione,i,ione,j+2*nxy,vvec_cmplx(i-2*nxy+1)*D1Yc(ii,jj),ADD_VALUES,ierr)
>>>             jj=jj+1
>>> 
>>>          enddo
>>> 
>>>       endif
>>> 
>>>    enddo
>>> 
>>> 
>>>    if (rank==0) then
>>> 
>>>       call cpu_time(finish)
>>>       write(*,*)
>>>       print '("Time Partial Parallel Build A #2 = ",f21.3," seconds.")',finish-start
>>> 
>>>    endif
>>> 
>>> !$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
>>> !$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
>>> 
>>> 
>>>    !...Artificially create diagonal element for A ==> TO BE REMOVED
>>>    small = (0.000001d0,0.0d0)
>>>    do i=IstartA,IendA-1
>>>       call MatSetValues(A,ione,i,ione,i,small,ADD_VALUES,ierr)!;CHKERRQ(ierr)
>>>    enddo
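
If the loop above exists only to guarantee nonzero diagonal entries for the
LU factorization, a hedged alternative is MatShift, which adds a scalar to
every diagonal entry in one collective call; it would have to run after the
matrix is assembled rather than during the ADD_VALUES phase, e.g.

    ! after MatAssemblyBegin/End(A,MAT_FINAL_ASSEMBLY,...):
    ! call MatShift(A,small,ierr)   ! A <- A + small*I
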
>>> 
>>>    call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>    !call PetscMemoryGetMaximumUsage(mem,ierr)
>>>    call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>>    if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (A just before assembly)',rank,'    memory:',mem
>>> 
>>> 
>>>    if (rank==0) then
>>>       call cpu_time(start)
>>>    endif
>>> 
>>> !...Assemble A
>>>    call MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY,ierr)!;CHKERRQ(ierr)
>>>    call MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY,ierr)!;CHKERRQ(ierr)
>>>    !call MatView(A,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>> 
>>>    if (rank==0) then
>>>       call cpu_time(finish)
>>>       write(*,*)
>>>       print '("Time to assemble A = ",f21.3," seconds.")',finish-start
>>>    endif
>>> 
>>> 
>>>    call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>    !call PetscMemoryGetMaximumUsage(mem,ierr)
>>>    call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>>    if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (A built and assembled)',rank,'    memory:',mem
>>> 
>>> 
>>> !...Build B in parallel
>>>    if (rank==0) then
>>>       call cpu_time(start)
>>>    endif
>>> 
>>>    do i=IstartB,IendB-1
>>>       if (i < 3*nxy) then
>>>          call MatSetValues(B,ione,i,ione,i,imag_unit,INSERT_VALUES,ierr)!;CHKERRQ(ierr)
>>>       endif
>>>    enddo
>>> 
>>> !...Assemble B
>>>    call MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY,ierr)!;CHKERRQ(ierr)
>>>    call MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY,ierr)!;CHKERRQ(ierr)
>>>    !call MatView(B,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>> 
>>>    call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>    !call PetscMemoryGetMaximumUsage(mem,ierr)
>>>    call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>>    if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (B built and assembled)',rank,'    memory:',mem
>>> 
>>> 
>>>    call MatNorm(B,NORM_FROBENIUS,nrm1,ierr)!;CHKERRQ(ierr)
>>> 
>>> !!$    if (rank==0) then
>>> !!$       write(*,*)'norm B:',nrm1
>>> !!$    endif
>>> 
>>>    if (rank==0) then
>>>       call cpu_time(finish)
>>>       write(*,*)
>>>       print '("Time to build and assemble B = ",f21.3," seconds.")',finish-start
>>>    endif
>>> 
>>> 
>>> !...Set Boundary Conditions
>>> 
>>>    if (rank==0) then
>>> 
>>>       allocate(li(nx))
>>> 
>>>       do i=1,nx
>>>          li(i)=(i-1)*ny
>>>       enddo
>>> 
>>>       ct1=0
>>>       ct2=0
>>>       ct3=0
>>>       ct4=0
>>> 
>>>       do i=1,nx
>>>          do j=1,ny
>>> 
>>>             if (j==1) then ! fst_bc
>>>                ct1=ct1+1
>>>             elseif (j==ny) then ! wall_bc
>>>                ct2=ct2+1
>>>             elseif (i==1) then ! outlet_bc
>>>                ct3=ct3+1
>>>             elseif (i==nx) then ! inlet_bc
>>>                ct4=ct4+1
>>>             endif
>>> 
>>>          enddo
>>>       enddo
>>> 
>>>       ct=ct1+ct2+ct3+ct4
>>> 
>>>    endif ! if (rank == 0)
>>> 
>>>    call MPI_Bcast(ct,1,MPI_INT,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>    call MPI_Bcast(ct1,1,MPI_INT,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>    call MPI_Bcast(ct2,1,MPI_INT,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>    call MPI_Bcast(ct3,1,MPI_INT,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>    call MPI_Bcast(ct4,1,MPI_INT,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
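
A hedged portability remark on these broadcasts (and the all_bc one below):
MPI_INT is the C-side datatype name, and the Fortran bindings normally pair
integer buffers with MPI_INTEGER, i.e.

    call MPI_Bcast(ct,1,MPI_INTEGER,source,MPI_COMM_WORLD,ierr)

Many MPI implementations also accept MPI_INT from Fortran, so the code may
work as-is; this is a portability note rather than a certain bug.
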
>>> 
>>>    allocate(fst_bc(ct1))
>>>    allocate(wall_bc(ct2))
>>>    allocate(outlet_bc(ct3))
>>>    allocate(inlet_bc(ct4))
>>> 
>>>    allocate(all_bc(3*ct))
>>> 
>>>    allocate(u_bc(ct))
>>>    allocate(v_bc(ct))
>>>    allocate(w_bc(ct))
>>>    allocate(p_bc(ct))
>>> 
>>>    if (rank == 0) then
>>> 
>>>       ct1=0
>>>       ct2=0
>>>       ct3=0
>>>       ct4=0
>>> 
>>>       do i=1,nx
>>>          do j=1,ny
>>> 
>>>             ij=li(i)+j-1
>>> 
>>>             if (j==1) then ! fst_bc
>>>                ct1=ct1+1
>>>                fst_bc(ct1)=ij
>>>             elseif (j==ny) then ! wall_bc
>>>                ct2=ct2+1
>>>                wall_bc(ct2)=ij
>>>             elseif (i==1) then ! outlet_bc
>>>                ct3=ct3+1
>>>                outlet_bc(ct3)=ij
>>>             elseif (i==nx) then ! inlet_bc
>>>                ct4=ct4+1
>>>                inlet_bc(ct4)=ij
>>>             endif
>>> 
>>>          enddo
>>>       enddo
>>> 
>>>       u_bc(1:ct1)=fst_bc
>>>       u_bc(ct1+1:ct1+ct2)=wall_bc
>>>       u_bc(ct1+ct2+1:ct1+ct2+ct3)=outlet_bc
>>>       u_bc(ct1+ct2+ct3+1:ct1+ct2+ct3+ct4)=inlet_bc
>>> 
>>>       v_bc=u_bc+nxy
>>>       w_bc=v_bc+nxy
>>>       p_bc=w_bc+nxy
>>> 
>>>       all_bc=[u_bc,v_bc,w_bc]
>>> 
>>>    endif ! if (rank == 0)
>>> 
>>>    call MPI_Bcast(all_bc,3*ct,MPI_INT,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>> 
>>>    !For the parallel case, all processes that share the matrix MUST call this routine, regardless of whether any rows being zeroed are owned by them.
>>>    call MatZeroRows(A,3*ct,all_bc, one,zero,zero,ierr);CHKERRQ(ierr)
>>>    call MatZeroRows(B,3*ct,all_bc,zero,zero,zero,ierr);CHKERRQ(ierr)
>>> 
>>>    !call MatView(B,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>> 
>>>    ! cannot be inside an if (rank==0) statement!!!
>>>    call MatNorm(A,NORM_FROBENIUS,nrm,ierr)!;CHKERRQ(ierr)
>>>    call MatNorm(B,NORM_FROBENIUS,nrm1,ierr)!;CHKERRQ(ierr)
>>> 
>>>    if (rank==0) then
>>>       write(*,*)'norm A:',nrm
>>>       write(*,*)'norm B:',nrm1
>>>    endif
>>> 
>>> 
>>> 
>>> !!!!!! TESTING HOW BIG LU CAN BE !!!!!!!!!!!!!!!!
>>> 
>>> !!$    sigma = (1.0d0,0.0d0)
>>> !!$    call MatAXPY(A,-sigma,B,DIFFERENT_NONZERO_PATTERN,ierr) ! on exit A = A-sigma*B
>>> !!$
>>> !!$    !...Create linear solver context
>>> !!$    call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
>>> !!$
>>> !!$!...Set operators. Here the matrix that defines the linear system also serves as the preconditioning matrix.
>>> !!$    !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) !aha commented and replaced by next line
>>> !!$    call KSPSetOperators(ksp,A,A,ierr) ! remember: here A = A-sigma*B
>>> !!$
>>> !!$    tol = 1.e-10
>>> !!$    call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr) ! set relative and absolute tolerances and use the default for divergence tol
>>> !!$
>>> !!$!...Set the Direct (LU) Solver
>>> !!$    call KSPSetType(ksp,KSPPREONLY,ierr)
>>> !!$    call KSPGetPC(ksp,pc,ierr)
>>> !!$    call PCSetType(pc,PCLU,ierr)
>>> !!$    call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr) ! MATSOLVERSUPERLU_DIST MATSOLVERMUMPS
>>> !!$
>>> !!$!...Create right-hand-side vector
>>> !!$    call MatCreateVecs(A,frhs,PETSC_NULL_OBJECT,ierr)
>>> !!$    call MatCreateVecs(A,sol,PETSC_NULL_OBJECT,ierr)
>>> !!$
>>> !!$    allocate(xwork1(IendA-IstartA))
>>> !!$    allocate(loc(IendA-IstartA))
>>> !!$
>>> !!$    ct=0
>>> !!$    do i=IstartA,IendA-1
>>> !!$       ct=ct+1
>>> !!$       loc(ct)=i
>>> !!$       xwork1(ct)=(1.0d0,0.0d0)
>>> !!$    enddo
>>> !!$
>>> !!$    call VecSetValues(frhs,IendA-IstartA,loc,xwork1,INSERT_VALUES,ierr)
>>> !!$    call VecZeroEntries(sol,ierr)
>>> !!$
>>> !!$    deallocate(xwork1,loc)
>>> !!$
>>> !!$!...Assemble Vectors
>>> !!$    call VecAssemblyBegin(frhs,ierr)
>>> !!$    call VecAssemblyEnd(frhs,ierr)
>>> !!$
>>> !!$!...Solve the linear system
>>> !!$    call KSPSolve(ksp,frhs,sol,ierr)
>>> !!$
>>> !!$    !call VecView(sol,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>> !!$
>>> !!$    STOP
>>> 
>>> 
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!!!!!!!!!!!!!!!!!!!!!!           SLEPC         !!!!!!!!!!!!!!!!!!!!!!!!!!
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>> 
>>>    if (rank==0) call cpu_time(start)
>>> 
>>> 
>>>    call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>    call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>>    if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (begin slepc)',rank,'    memory:',mem
>>> 
>>> !...Get vector(s) compatible with the matrix, i.e. with the same parallel layout
>>>    call MatCreateVecs(A,xr,xi,ierr)
>>> !!$    call MatCreateVecs(A,xr,PETSC_NULL_OBJECT,ierr)
>>> !!$    call MatCreateVecs(A,xi,PETSC_NULL_OBJECT,ierr)
>>> 
>>> 
>>> !...Create Eigensolver Context and Spectral Transformation context
>>>    call EPSCreate(PETSC_COMM_WORLD,eps,ierr)
>>>    !call STCreate(PETSC_COMM_WORLD,st,ierr)
>>> 
>>> !...Set Operators and Problem Type - Generalized Eigenvalue Problem
>>>    call EPSSetOperators(eps,A,B,ierr)
>>>    !call EPSSetProblemType(eps,EPS_PGNHEP,ierr) ! THIS ONE LEADS TO AN ERROR --> CHECK WHY, BECAUSE MY B SEEMS TO FIT THAT CRITERION
>>>    call EPSSetProblemType(eps,EPS_GNHEP,ierr)
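
On the EPS_PGNHEP comment above, a hedged reading: that problem type assumes
B is Hermitian and positive (semi-)definite. The B assembled here carries
imag_unit on its diagonal, and i*I is anti-Hermitian rather than Hermitian,
so B does not actually fit that criterion -- which would explain the error.
EPS_GNHEP, used here, makes no such assumption.
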
>>> 
>>> !...Select Portion of Spectrum
>>>    call EPSSetTarget(eps,(3.0d0,0.d0),ierr)
>>>    call EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE,ierr)
>>> 
>>> !...Select Algorithm for Eigenvalue Computation (default is Krylov-Schur)
>>>    call EPSSetType(eps,EPSKRYLOVSCHUR,ierr) !EPSARNOLDI,EPSKRYLOVSCHUR
>>> 
>>>    call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>    call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>>    if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (slepc after EPSSetType)',rank,'    memory:',mem
>>> 
>>> !...Set Tolerance and Maxiter for Solver
>>>    tolslepc=1.0d-12
>>>    maxiter=500
>>>    !call EPSSetTolerances(eps,PETSC_DEFAULT_REAL,maxiter,ierr)
>>>    call EPSSetTolerances(eps,tolslepc,maxiter,ierr)
>>> 
>>> !...Sets number of eigenvalues to compute and dimension of subspace
>>>    nev=200
>>>    ncv=3*nev
>>>    mpd=10*nev
>>>    !call EPSSetDimensions(eps,nev,ncv,mpd,ierr) PETSC_DEFAULT
>>>    call EPSSetDimensions(eps,nev,PETSC_DEFAULT_INTEGER,PETSC_DEFAULT_INTEGER,ierr)
>>> 
>>>    call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>    call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>>    if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (before EPSGetST)',rank,'    memory:',mem
>>> 
>>> !...Set the Spectral Transformation --> Use Shift-And-Invert
>>>    call EPSGetST(eps,st,ierr)
>>>    call STSetType(st,STSINVERT,ierr)
>>> 
>>> !...Set Linear Solver
>>>    call STGetKSP(st,ksp,ierr)
>>>    call KSPSetType(ksp,KSPPREONLY,ierr)
>>>    call KSPGetPC(ksp,pc,ierr)
>>>    call PCSetType(pc,PCLU,ierr) ! could try PCCHOLESKY
>>>    call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr) ! MATSOLVERSUPERLU_DIST,MATSOLVERMUMPS
>>> 
>>> !...Set Solver Parameters at Runtime
>>>    call EPSSetFromOptions(eps,ierr)
>>> 
>>>    call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>    call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>>    if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (after EPSSetFromOptions)',rank,'    memory:',mem
>>> 
>>> !...Solve the Eigensystem
>>>    call EPSSolve(eps,ierr)
>>> 
>>>    call EPSGetIterationNumber(eps,its,ierr)
>>>    if (rank .eq. 0) then
>>>       write(*,110) its
>>>    endif
>>> 110 format (/' Number of iterations of the method:',I4)
>>> 
>>> !...Optional: Get some information from the solver and display it
>>>    call EPSGetType(eps,tname,ierr)
>>>    if (rank .eq. 0) then
>>>       write(*,120) tname
>>>    endif
>>> 120 format (' Solution method: ',A)
>>> 
>>>    nev=10
>>>    call EPSGetDimensions(eps,nev,PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,ierr)
>>>    if (rank .eq. 0) then
>>>       write(*,130) nev
>>>    endif
>>> 130 format (' Number of requested eigenvalues:',I4)
>>>    call EPSGetTolerances(eps,tol,maxit,ierr)
>>>    if (rank .eq. 0) then
>>>       write(*,140) tol, maxit
>>>    endif
>>> 140 format (' Stopping condition: tol=',1P,E10.4,', maxit=',I4)
>>> 
>>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>> !     Display solution and clean up
>>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>> 
>>> !...Get number of converged eigenpairs
>>>    call EPSGetConverged(eps,nconv,ierr)
>>>    if (rank .eq. 0) then
>>>       write(*,150) nconv
>>>    endif
>>> 150 format (' Number of converged eigenpairs:',I4/)
>>> 
>>> !...Display eigenvalues and relative errors
>>>    if (nconv.gt.0) then
>>>       if (rank .eq. 0) then
>>>          write(*,*) '         kr               ki          ||Ax-kx||/||kx||'
>>>          write(*,*) ' ----------------- -----------------  ------------------'
>>>       endif
>>>       do i=0,nconv-1
>>>          !...Get converged eigenpairs: i-th eigenvalue is stored in kr (real part) and ki (imaginary part)
>>>          call EPSGetEigenpair(eps,i,kr,ki,xr,xi,ierr)
>>> 
>>>          !...Compute the relative error associated to each eigenpair
>>>          call EPSComputeError(eps,i,EPS_ERROR_RELATIVE,error,ierr)
>>>          if (rank .eq. 0) then
>>>             write(*,160) PetscRealPart(kr),PetscImaginaryPart(kr), error
>>>          endif
>>> 160       format (1P,'   ',E21.11,'       ',E21.11,'      ',E21.11)
>>> 
>>>       enddo
>>> 
>>>    endif
>>> 
>>> !...Free work space
>>> 
>>>    if (rank==0) deallocate(li)
>>> 
>>>    deallocate(D1Xc,D2Xc,D1Yc,D2Yc)
>>> 
>>>    deallocate(fst_bc)
>>>    deallocate(wall_bc)
>>>    deallocate(outlet_bc)
>>>    deallocate(inlet_bc)
>>> 
>>>    deallocate(all_bc)
>>> 
>>>    deallocate(u_bc)
>>>    deallocate(v_bc)
>>>    deallocate(w_bc)
>>>    deallocate(p_bc)
>>> 
>>>    !call STDestroy(st,ierr)
>>>    call EPSDestroy(eps,ierr)
>>>    call MatDestroy(A,ierr)
>>>    call MatDestroy(B,ierr)
>>>    !call VecDestroy(i_vec,ierr)
>>>    call VecDestroy(xr,ierr)
>>>    call VecDestroy(xi,ierr)
>>> 
>>> !...Finalize
>>>    call SlepcFinalize(ierr)
>>> 
>>>    if (rank==0) then
>>>       call cpu_time(finish)
>>>       write(*,*)
>>>       print '("Total time for SLEPC = ",f21.3," seconds.")',finish-start
>>>    endif
>>> 
>>>    END Subroutine SETUPSOLVEGEVP
>>> 
>>> 
>>>    END MODULE PETSC
>>> 
>>> 
>>> 
>>> 
>>> 
>>> !!$       write(*,*)'fst_bc',fst_bc
>>> !!$       write(*,*)'wall_bc',wall_bc
>>> !!$       write(*,*)'outlet_bc',outlet_bc
>>> !!$       write(*,*)'inlet_bc',inlet_bc
>>> !!$
>>> !!$       write(*,*)'all_bc',all_bc
>>> 
>>> !!$       write(*,*)'ct1+ct2+ct3+ct4',ct1+ct2+ct3+ct4
>>> 
>>> 
>>> 
>>> !!$    call MatCreate(PETSC_COMM_WORLD,A,ierr)
>>> !!$    call MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,4*nxy,4*nxy,ierr)
>>> !!$    call MatSetFromOptions(A,ierr)
>>> !!$    call MatSetUp(A,ierr)
>>> 
>>> 
>>> 
>>>       !call VecCreateSeq(PETSC_COMM_SELF,nxy,i_vec,ierr) ! sequential vector
>>>       !call VecSetValues(i_vec,nxy,loc,xwork1,INSERT_VALUES,ierr)
>>> 
>>>       !  Assemble vector
>>>       !call VecAssemblyBegin(i_vec,ierr)
>>>       !call VecAssemblyEnd(i_vec,ierr)
>>> 
>>>    !  Create parallel matrix, specifying only its global dimensions.
>>>    !  When using MatCreate(), the matrix format can be specified at
>>>    !  runtime. Also, the parallel partitioning of the matrix is
>>>    !  determined by PETSc at runtime.
>>> 
>>> 
>>> 
>>> !!$    call MatCreate(PETSC_COMM_WORLD,B,ierr)
>>> !!$    call MatSetSizes(B,PETSC_DECIDE,PETSC_DECIDE,4*nxy,4*nxy,ierr)
>>> !!$    call MatSetFromOptions(B,ierr)
>>> !!$    call MatSetUp(B,ierr)
>>> 
>>>    !  Currently, all PETSc parallel matrix formats are partitioned by
>>>    !  contiguous chunks of rows across the processors.  Determine which
>>>    !  rows of the matrix are locally owned.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>> !                    Include files
>>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>> !
>>> !  This program uses CPP for preprocessing, as indicated by the use of
>>> !  PETSc include files in the directory petsc/include/finclude.  This
>>> !  convention enables use of the CPP preprocessor, which allows the use
>>> !  of the #include statements that define PETSc objects and variables.
>>> !
>>> !  Use of the conventional Fortran include statements is also supported.
>>> !  In this case, the PETSc include files are located in the directory
>>> !  petsc/include/foldinclude.
>>> !
>>> !  Since one must be very careful to include each file no more than once
>>> !  in a Fortran routine, application programmers must explicitly list
>>> !  each file needed for the various PETSc components within their
>>> !  program (unlike the C/C++ interface).
>>> !
>>> 
>>> 
>>> 
>>> 
>>>    !write(*,*)'IstartA,IendA',IstartA,IendA
>>>    !write(*,*)'IstartB,IendB',IstartB,IendB
>>> 
>>> 
>>> 
>>> !!$#include <finclude/petscsys.h>
>>> !!$#include <finclude/petscvec.h>
>>> !!$#include <finclude/petscmat.h>
>>> !!$#include <finclude/petscmat.h90>
>>> !!$#include <finclude/petscpc.h>
>>> !!$#include <finclude/petscksp.h>
>>> !!$
>>> !!$#include <finclude/slepcsys.h>
>>> !!$#include <finclude/slepceps.h>
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> !!$      do i=1,nx
>>> !!$         do j=1,nx
>>> !!$            D1X(i,j)=100+j+(i-1)*nx
>>> !!$         enddo
>>> !!$      enddo
>>> 
>>> !!$      do i=1,ny
>>> !!$         do j=1,ny
>>> !!$            D1Y(i,j)=999.d0
>>> !!$         enddo
>>> !!$      enddo
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> !  Create parallel vectors.
>>> !   - Here, the parallel partitioning of the vector is determined by
>>> !     PETSc at runtime.  We could also specify the local dimensions
>>> !     if desired -- or use the more general routine VecCreate().
>>> !   - When solving a linear system, the vectors and matrices MUST
>>> !     be partitioned accordingly.  PETSc automatically generates
>>> !     appropriately partitioned matrices and vectors when MatCreate()
>>> !     and VecCreate() are used with the same communicator.
>>> !   - Note: We form 1 vector from scratch and then duplicate as needed.
>>> 
>>>      !call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,m*n,u,ierr)
>>>      !call VecSetFromOptions(u,ierr)
>>>      !call VecDuplicate(u,b,ierr)
>>>      !call VecDuplicate(b,x,ierr)
>>> 
>>> !  Set exact solution; then compute right-hand-side vector.
>>> !  By default we use an exact solution of a vector with all
>>> !  elements of 1.0;  Alternatively, using the runtime option
>>> !  -random_sol forms a solution vector with random components.
>>> 
>>> !!$      call PetscOptionsHasName(PETSC_NULL_CHARACTER,'-random_exact_sol',flg,ierr)
>>> !!$      if (flg) then
>>> !!$         call PetscRandomCreate(PETSC_COMM_WORLD,rctx,ierr)
>>> !!$         call PetscRandomSetFromOptions(rctx,ierr)
>>> !!$         call VecSetRandom(u,rctx,ierr)
>>> !!$         call PetscRandomDestroy(rctx,ierr)
>>> !!$      else
>>> !!$         call VecSet(u,one,ierr)
>>> !!$      endif
>>> !!$      call MatMult(A,u,b,ierr)
>>> 
>>> 
>>> !  View the exact solution vector if desired
>>> 
>>> !!$      call PetscOptionsHasName(PETSC_NULL_CHARACTER,"-view_exact_sol",flg,ierr)
>>> !!$      if (flg) then
>>> !!$         call VecView(u,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>> !!$      endif
>>> 
>>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>> !         Create the linear solver and set various options
>>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>> 
>>> !  Create linear solver context
>>> 
>>> !!$      call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
>>> 
>>> 
>>> !  Set operators. Here the matrix that defines the linear system
>>> !  also serves as the preconditioning matrix.
>>> 
>>>      !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) !aha commented and replaced by next line
>>> !!$      call KSPSetOperators(ksp,A,A,ierr)
>>> 
>>> 
>>> !  Set linear solver defaults for this problem (optional).
>>> !   - By extracting the KSP and PC contexts from the KSP context,
>>> !     we can then directly call any KSP and PC routines
>>> !     to set various options.
>>> !   - The following four statements are optional; all of these
>>> !     parameters could alternatively be specified at runtime via
>>> !     KSPSetFromOptions(). All of these defaults can be
>>> !     overridden at runtime, as indicated below.
>>> 
>>> !     We comment out this section of code since the Jacobi
>>> !     preconditioner is not a good general default.
>>> 
>>> !      call KSPGetPC(ksp,pc,ierr)
>>> !      ptype = PCJACOBI
>>> !      call PCSetType(pc,ptype,ierr)
>>> !      tol = 1.e-7
>>> !      call KSPSetTolerances(ksp,tol,PETSC_DEFAULT_DOUBLE_PRECISION,
>>> !     &     PETSC_DEFAULT_DOUBLE_PRECISION,PETSC_DEFAULT_INTEGER,ierr)
>>> 
>>> !  Set user-defined monitoring routine if desired
>>> 
>>>      !call PetscOptionsHasName(PETSC_NULL_CHARACTER,'-my_ksp_monitor',flg,ierr) ! flg set to true if that option has been specified
>>>      !if (flg) then
>>>      !  call KSPMonitorSet(ksp,MyKSPMonitor,PETSC_NULL_OBJECT,PETSC_NULL_FUNCTION,ierr) ! call MyKSPMonitor
>>>      !endif
>>> 
>>>      !tol = 1.e-10
>>>      !call KSPSetTolerances(ksp,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr) ! set relative tolerance and use the default for absolute and divergence tol
>>>      !call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr) ! set relative and absolute tolerances and use the default for divergence tol
>>> 
>>> !  Set runtime options, e.g.,
>>> !      -ksp_type <type> -pc_type <type> -ksp_monitor -ksp_rtol <rtol>
>>> !  These options will override those specified above as long as
>>> !  KSPSetFromOptions() is called _after_ any other customization
>>> !  routines.
>>> 
>>>      !call KSPSetFromOptions(ksp,ierr)
>>> 
>>> 
>>> !  Set convergence test routine if desired
>>> 
>>>      !call PetscOptionsHasName(PETSC_NULL_CHARACTER,'-my_ksp_convergence',flg,ierr)
>>>      !if (flg) then
>>>      !  call KSPSetConvergenceTest(ksp,MyKSPConverged,PETSC_NULL_OBJECT,PETSC_NULL_FUNCTION,ierr)
>>>      !endif
>>> 
>>> 
>>> 
>>>      !call KSPMonitorSet(ksp,KSPMonitorTrueResidualNorm,PETSC_NULL_OBJECT,PETSC_NULL_FUNCTION,ierr);
>>> 
>>> 
>>> !
>>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>> !                      Solve the linear system
>>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>> 
>>>      !call KSPSolve(ksp,b,x,ierr)
>>> 
>>> 
>>> !
>>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>> !                      View solver info (could use -ksp_view instead)
>>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>> 
>>>      !added by aha
>>> 
>>> !!$      write(*,*)''
>>> !!$      write(*,*)'Start of KSPView'
>>> !!$      write(*,*)'----------------'
>>> !!$      write(*,*)''
>>> !!$      call KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>> !!$      write(*,*)''
>>> !!$      write(*,*)'End of KSPView'
>>> !!$      write(*,*)'--------------'
>>> !!$      write(*,*)''
>>> 
>>> 
>>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>> !                     Check solution and clean up
>>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>> 
>>> !  Check the error
>>> !!$      call VecAXPY(x,neg_one,u,ierr)
>>> !!$      call VecNorm(x,NORM_2,norm,ierr)
>>> !!$      call KSPGetIterationNumber(ksp,its,ierr)
>>> !!$      if (rank .eq. 0) then
>>> !!$        if (norm .gt. 1.e-12) then
>>> !!$           write(6,100) norm,its
>>> !!$        else
>>> !!$           write(6,110) its
>>> !!$        endif
>>> !!$      endif
>>> !!$  100 format('Norm of error ',e11.4,' iterations ',i5)
>>> !!$  110 format('Norm of error < 1.e-12,iterations ',i5)
>>> 
>>> !  Free work space.  All PETSc objects should be destroyed when they
>>> !  are no longer needed.
>>> 
>>> !!$      call KSPDestroy(ksp,ierr)
>>> !!$      call VecDestroy(u,ierr)
>>> !!$      call VecDestroy(x,ierr)
>>> !!$      call VecDestroy(b,ierr)
>>> !!$      call MatDestroy(A,ierr)
>>> 
>>> !  Always call PetscFinalize() before exiting a program.  This routine
>>> !    - finalizes the PETSc libraries as well as MPI
>>> !    - provides summary and diagnostic information if certain runtime
>>> !      options are chosen (e.g., -log_summary).  See PetscFinalize()
>>> !      manpage for more information.
>>> 
>>> 
>>> ! --------------------------------------------------------------
>>> !
>>> !  MyKSPMonitor - This is a user-defined routine for monitoring
>>> !  the KSP iterative solvers.
>>> !
>>> !  Input Parameters:
>>> !    ksp   - iterative context
>>> !    n     - iteration number
>>> !    rnorm - 2-norm (preconditioned) residual value (may be estimated)
>>> !    dummy - optional user-defined monitor context (unused here)
>>> !
>>> !!$      subroutine MyKSPMonitor(ksp,n,rnorm,dummy,ierr)
>>> !!$
>>> !!$      implicit none
>>> !!$
>>> !!$#include <finclude/petscsys.h>
>>> !!$#include <finclude/petscvec.h>
>>> !!$#include <finclude/petscksp.h>
>>> !!$
>>> !!$      KSP              ksp
>>> !!$      Vec              x
>>> !!$      PetscErrorCode ierr
>>> !!$      PetscInt n,dummy
>>> !!$      PetscMPIInt rank
>>> !!$      double precision rnorm
>>> !!$
>>> !!$!  Build the solution vector
>>> !!$
>>> !!$      call KSPBuildSolution(ksp,PETSC_NULL_OBJECT,x,ierr)
>>> !!$
>>> !!$!  Write the solution vector and residual norm to stdout
>>> !!$!   - Note that the parallel viewer PETSC_VIEWER_STDOUT_WORLD
>>> !!$!     handles data from multiple processors so that the
>>> !!$!     output is not jumbled.
>>> !!$
>>> !!$      call MPI_Comm_rank(PETSC_COMM_WORLD,rank,ierr)
>>> !!$      if (rank .eq. 0) write(6,100) n
>>> !!$      call VecView(x,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>> !!$      if (rank .eq. 0) write(6,200) n,rnorm
>>> !!$
>>> !!$ 100  format('iteration ',i5,' solution vector:')
>>> !!$ 200  format('iteration ',i5,' residual norm ',e11.4)
>>> !!$      ierr = 0
>>> !!$
>>> !!$      end subroutine MyKSPMonitor
>>> 
>>> ! --------------------------------------------------------------
>>> !
>>> !  MyKSPConverged - This is a user-defined routine for testing
>>> !  convergence of the KSP iterative solvers.
>>> !
>>> !  Input Parameters:
>>> !    ksp   - iterative context
>>> !    n     - iteration number
>>> !    rnorm - 2-norm (preconditioned) residual value (may be estimated)
>>> !    dummy - optional user-defined monitor context (unused here)
>>> !
>>> !!$      subroutine MyKSPConverged(ksp,n,rnorm,flag,dummy,ierr)
>>> !!$
>>> !!$      implicit none
>>> !!$
>>> !!$#include <finclude/petscsys.h>
>>> !!$#include <finclude/petscvec.h>
>>> !!$#include <finclude/petscksp.h>
>>> !!$
>>> !!$      KSP              ksp
>>> !!$      PetscErrorCode ierr
>>> !!$      PetscInt n,dummy
>>> !!$      KSPConvergedReason flag
>>> !!$      double precision rnorm
>>> !!$
>>> !!$      if (rnorm .le. .05) then
>>> !!$        flag = 1
>>> !!$      else
>>> !!$        flag = 0
>>> !!$      endif
>>> !!$      ierr = 0
>>> !!$
>>> !!$      end subroutine MyKSPConverged
>>> 
>>> 
>>> 