It lists the line numbers in __baseflow_MOD_poiseuille_3d_petsc (module_baseflows.F90:182 and module_baseflows.F90:356) where apparently two vectors are created but never destroyed. Perhaps they get created twice and you only destroy them once? Or do you increase their reference count?

  Barry
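The traces below point at two sequential vectors: one created by the VecCreateSeq call at module_baseflows.F90:182, and one that VecScatterCreateToZero allocates internally at module_baseflows.F90:356. A minimal sketch of the kind of cleanup this suggests, assuming the sol_seq handle from the posted Poiseuille_3D_PETSC routine is the one being reused for both (a hypothetical rearrangement, not the code as posted):

    ! Let VecScatterCreateToZero allocate the sequential vector itself
    ! (it creates a new Vec on every call), and destroy the old one before
    ! the handle is overwritten, so no earlier Vec is orphaned.

    ! first solve
    call VecScatterCreateToZero(sol,ctx,sol_seq,ierr)   ! creates sol_seq
    call VecScatterBegin(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr)
    call VecScatterEnd(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr)
    call VecScatterDestroy(ctx,ierr)
    ! ... extract U00 from sol_seq ...
    call VecDestroy(sol_seq,ierr)   ! destroy before the handle is reused

    ! second solve
    call VecScatterCreateToZero(sol,ctx,sol_seq,ierr)   ! creates a fresh sol_seq
    call VecScatterBegin(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr)
    call VecScatterEnd(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr)
    call VecScatterDestroy(ctx,ierr)
    ! ... copy the final solution out of sol_seq ...
    call VecDestroy(sol_seq,ierr)

With that ordering the initial call VecCreateSeq(PETSC_COMM_SELF,nxym2,sol_seq,ierr) (and the VecZeroEntries/VecAssembly calls on it) would no longer be needed; as written, the single VecDestroy(sol_seq,ierr) in the cleanup block only ever sees the last vector stored in the handle, which is consistent with exactly two allocations showing up as definitely lost below.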
==31371== 4 bytes in 1 blocks are indirectly lost in loss record 1 of 17
==31371==    at 0x4C2BD2C: memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x4F56672: PetscStrallocpy (str.c:188)
==31371==    by 0x4F921F7: PetscObjectChangeTypeName (pname.c:153)
==31371==    by 0x51CA40C: VecCreate_Seq_Private (bvec2.c:888)
==31371==    by 0x51E0533: VecCreate_Seq (bvec3.c:39)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x51E0778: veccreateseq_ (vseqcrf.c:43)
==31371==    by 0x412632: __baseflow_MOD_poiseuille_3d_petsc (module_baseflows.F90:182)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371==
==31371== 4 bytes in 1 blocks are indirectly lost in loss record 2 of 17
==31371==    at 0x4C2BD2C: memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x4F56672: PetscStrallocpy (str.c:188)
==31371==    by 0x4F921F7: PetscObjectChangeTypeName (pname.c:153)
==31371==    by 0x51CA40C: VecCreate_Seq_Private (bvec2.c:888)
==31371==    by 0x51E0533: VecCreate_Seq (bvec3.c:39)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x528E06C: VecScatterCreateToZero (vecmpitoseq.c:140)
==31371==    by 0x529DD9B: vecscattercreatetozero_ (vecmpitoseqf.c:52)
==31371==    by 0x413459: __baseflow_MOD_poiseuille_3d_petsc (module_baseflows.F90:356)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371==
==31371== 8 bytes in 1 blocks are indirectly lost in loss record 3 of 17
==31371==    at 0x4C2BD2C: memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x5120B52: PetscLayoutSetUp (pmap.c:150)
==31371==    by 0x51CA3A9: VecCreate_Seq_Private (bvec2.c:887)
==31371==    by 0x51E0533: VecCreate_Seq (bvec3.c:39)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x51E0778: veccreateseq_ (vseqcrf.c:43)
==31371==    by 0x412632: __baseflow_MOD_poiseuille_3d_petsc (module_baseflows.F90:182)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371==
==31371== 8 bytes in 1 blocks are indirectly lost in loss record 4 of 17
==31371==    at 0x4C2BD2C: memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x5120B52: PetscLayoutSetUp (pmap.c:150)
==31371==    by 0x51CA3A9: VecCreate_Seq_Private (bvec2.c:887)
==31371==    by 0x51E0533: VecCreate_Seq (bvec3.c:39)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x528E06C: VecScatterCreateToZero (vecmpitoseq.c:140)
==31371==    by 0x529DD9B: vecscattercreatetozero_ (vecmpitoseqf.c:52)
==31371==    by 0x413459: __baseflow_MOD_poiseuille_3d_petsc (module_baseflows.F90:356)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371==
==31371== 24 bytes in 1 blocks are indirectly lost in loss record 5 of 17
==31371==    at 0x4C2BD2C: memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x51CA25C: VecCreate_Seq_Private (bvec2.c:879)
==31371==    by 0x51E0533: VecCreate_Seq (bvec3.c:39)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x51E0778: veccreateseq_ (vseqcrf.c:43)
==31371==    by 0x412632: __baseflow_MOD_poiseuille_3d_petsc (module_baseflows.F90:182)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371==
==31371== 24 bytes in 1 blocks are indirectly lost in loss record 6 of 17
==31371==    at 0x4C2BD2C: memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x51CA25C: VecCreate_Seq_Private (bvec2.c:879)
==31371==    by 0x51E0533: VecCreate_Seq (bvec3.c:39)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x528E06C: VecScatterCreateToZero (vecmpitoseq.c:140)
==31371==    by 0x529DD9B: vecscattercreatetozero_ (vecmpitoseqf.c:52)
==31371==    by 0x413459: __baseflow_MOD_poiseuille_3d_petsc (module_baseflows.F90:356)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371==
==31371== 56 bytes in 1 blocks are indirectly lost in loss record 7 of 17
==31371==    at 0x4C2BD2C: memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x511FE32: PetscLayoutCreate (pmap.c:53)
==31371==    by 0x51F743B: VecCreate (veccreate.c:39)
==31371==    by 0x51DFF42: VecCreateSeq (vseqcr.c:37)
==31371==    by 0x51E0778: veccreateseq_ (vseqcrf.c:43)
==31371==    by 0x412632: __baseflow_MOD_poiseuille_3d_petsc (module_baseflows.F90:182)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371==
==31371== 56 bytes in 1 blocks are indirectly lost in loss record 8 of 17
==31371==    at 0x4C2BD2C: memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x511FE32: PetscLayoutCreate (pmap.c:53)
==31371==    by 0x51F743B: VecCreate (veccreate.c:39)
==31371==    by 0x51DFF42: VecCreateSeq (vseqcr.c:37)
==31371==    by 0x528E06C: VecScatterCreateToZero (vecmpitoseq.c:140)
==31371==    by 0x529DD9B: vecscattercreatetozero_ (vecmpitoseqf.c:52)
==31371==    by 0x413459: __baseflow_MOD_poiseuille_3d_petsc (module_baseflows.F90:356)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371==
==31371== 80 bytes in 1 blocks are indirectly lost in loss record 9 of 17
==31371==    at 0x4C2BD2C: memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x4FCFDA6: PetscObjectComposedDataIncreaseReal (state.c:167)
==31371==    by 0x5203462: VecSet (rvector.c:590)
==31371==    by 0x51E05CA: VecCreate_Seq (bvec3.c:44)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x51E0778: veccreateseq_ (vseqcrf.c:43)
==31371==    by 0x412632: __baseflow_MOD_poiseuille_3d_petsc (module_baseflows.F90:182)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371==
==31371== 80 bytes in 1 blocks are indirectly lost in loss record 10 of 17
==31371==    at 0x4C2BD2C: memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x4FCFE5A: PetscObjectComposedDataIncreaseReal (state.c:168)
==31371==    by 0x5203462: VecSet (rvector.c:590)
==31371==    by 0x51E05CA: VecCreate_Seq (bvec3.c:44)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x51E0778: veccreateseq_ (vseqcrf.c:43)
==31371==    by 0x412632: __baseflow_MOD_poiseuille_3d_petsc (module_baseflows.F90:182)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371==
==31371== 80 bytes in 1 blocks are indirectly lost in loss record 11 of 17
==31371==    at 0x4C2BD2C: memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x4FCFDA6: PetscObjectComposedDataIncreaseReal (state.c:167)
==31371==    by 0x5203462: VecSet (rvector.c:590)
==31371==    by 0x51E05CA: VecCreate_Seq (bvec3.c:44)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x528E06C: VecScatterCreateToZero (vecmpitoseq.c:140)
==31371==    by 0x529DD9B: vecscattercreatetozero_ (vecmpitoseqf.c:52)
==31371==    by 0x413459: __baseflow_MOD_poiseuille_3d_petsc (module_baseflows.F90:356)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371==
==31371== 80 bytes in 1 blocks are indirectly lost in loss record 12 of 17
==31371==    at 0x4C2BD2C: memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x4FCFE5A: PetscObjectComposedDataIncreaseReal (state.c:168)
==31371==    by 0x5203462: VecSet (rvector.c:590)
==31371==    by 0x51E05CA: VecCreate_Seq (bvec3.c:44)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x528E06C: VecScatterCreateToZero (vecmpitoseq.c:140)
==31371==    by 0x529DD9B: vecscattercreatetozero_ (vecmpitoseqf.c:52)
==31371==    by 0x413459: __baseflow_MOD_poiseuille_3d_petsc (module_baseflows.F90:356)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371==
==31371== 1,296 bytes in 1 blocks are indirectly lost in loss record 13 of 17
==31371==    at 0x4C2BD2C: memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x51E044A: VecCreate_Seq (bvec3.c:37)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x51E0778: veccreateseq_ (vseqcrf.c:43)
==31371==    by 0x412632: __baseflow_MOD_poiseuille_3d_petsc (module_baseflows.F90:182)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371==
==31371== 1,296 bytes in 1 blocks are indirectly lost in loss record 14 of 17
==31371==    at 0x4C2BD2C: memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x51E044A: VecCreate_Seq (bvec3.c:37)
==31371==    by 0x51F7C25: VecSetType (vecreg.c:53)
==31371==    by 0x51E000F: VecCreateSeq (vseqcr.c:39)
==31371==    by 0x528E06C: VecScatterCreateToZero (vecmpitoseq.c:140)
==31371==    by 0x529DD9B: vecscattercreatetozero_ (vecmpitoseqf.c:52)
==31371==    by 0x413459: __baseflow_MOD_poiseuille_3d_petsc (module_baseflows.F90:356)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371==
==31371== 3,044 (1,496 direct, 1,548 indirect) bytes in 1 blocks are definitely lost in loss record 15 of 17
==31371==    at 0x4C2BD2C: memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x51F7312: VecCreate (veccreate.c:37)
==31371==    by 0x51DFF42: VecCreateSeq (vseqcr.c:37)
==31371==    by 0x51E0778: veccreateseq_ (vseqcrf.c:43)
==31371==    by 0x412632: __baseflow_MOD_poiseuille_3d_petsc (module_baseflows.F90:182)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371==
==31371== 3,044 (1,496 direct, 1,548 indirect) bytes in 1 blocks are definitely lost in loss record 16 of 17
==31371==    at 0x4C2BD2C: memalign (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31371==    by 0x4FE5EDB: PetscMallocAlign (mal.c:28)
==31371==    by 0x51F7312: VecCreate (veccreate.c:37)
==31371==    by 0x51DFF42: VecCreateSeq (vseqcr.c:37)
==31371==    by 0x528E06C: VecScatterCreateToZero (vecmpitoseq.c:140)
==31371==    by 0x529DD9B: vecscattercreatetozero_ (vecmpitoseqf.c:52)
==31371==    by 0x413459: __baseflow_MOD_poiseuille_3d_petsc (module_baseflows.F90:356)
==31371==    by 0x41953A: MAIN__ (BiGlobal_Spectral_Petsc.F90:246)
==31371==    by 0x41C3CC: main (BiGlobal_Spectral_Petsc.F90:8)
==31371==

> On Jul 13, 2015, at 2:52 PM, Anthony Haas <[email protected]> wrote:
>
> Hi Barry,
>
> If you look in module_baseflows.F90 lines 467-473, I have:
>
> call KSPDestroy(ksp,ierr)
> call VecDestroy(frhs,ierr)
> call VecDestroy(frhs2,ierr)
> call VecDestroy(sol,ierr)
> call VecDestroy(sol_seq,ierr)
> call MatDestroy(DXX,ierr)
> call MatDestroy(DYY,ierr)
>
> I checked again and all the vectors that I create (see module_baseflows.F90 lines 87-88) are destroyed there. Also when I use VecScatterCreateToZero, I always have a VecScatterDestroy after.
>
> Thanks,
>
> Anthony
>
>
> On 07/13/2015 12:15 PM, Barry Smith wrote:
>>> On Jul 13, 2015, at 2:05 PM, Anthony Haas <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I ran my program under Valgrind with 2 processors using the following:
>>>
>>> ${PETSC_DIR}/${PETSC_ARCH}/bin/mpiexec -n 2 valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all --num-callers=20 --log-file=valgrind.log.%p ./BiGlobal_Spectral_Petsc.x -malloc off -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu_dist -eps_view -st_ksp_type preonly -st_pc_type lu -st_pc_factor_mat_solver_package superlu_dist
>>>
>>> Note that in my program, I use PETSc once on its own to solve a linear system of equations (see module_baseflows.F90) and then later in the code I use SLEPc to get the eigenvalues (see module_petsc.F90). My main program is called BiGlobal_Spectral_Petsc.F90
>>>
>>> I have attached the log files from Valgrind. It seems that quite a few errors occur in pzgstrf. See valgrind.log.31371 and 31372. Do I have any control over these errors?
>> These are likely bugs in SuperLU_Dist.
>>
>>> Then starting from line 403 in valgrind.log.31371 and 458 in valgrind.log.31372, I have a series of errors that seem to be related to a VecCreateSeq and to a VecScatterCreateToZero (see in module_baseflows.F90). Is there something I can do to avoid these errors?
>> These are because you are not destroying all your vectors at the end.
>> >>> Thanks >>> >>> Anthony >>> >>> >>> >>> >>> >>> >>> >>> On 07/08/2015 01:49 PM, Barry Smith wrote: >>>> You should try running under valgrind, see: >>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind >>>> >>>> You can also run in the debugger (yes this is tricky on a batch system >>>> but possible) and see exactly what triggers the floating point exception >>>> or when it "hangs" interrupt the program to see where it is "hanging". >>>> >>>> >>>> Barry >>>> >>>> >>>> >>>>> On Jul 8, 2015, at 12:34 PM, Anthony Paul Haas <[email protected]> >>>>> wrote: >>>>> >>>>> Hi, >>>>> >>>>> I have used the switch -mat_superlu_dist_parsymbfact in my pbs script. >>>>> However, although my program worked fine with sequential symbolic >>>>> factorization, I get one of the following 2 behaviors when I run with >>>>> parallel symbolic factorization (depending on the number of processors >>>>> that I use): >>>>> >>>>> 1) the program just hangs (it seems stuck in some subroutine ==> see >>>>> test.out-hangs) >>>>> 2) I get a floating point exception ==> see >>>>> test.out-floating-point-exception >>>>> >>>>> Note that as suggested in the Superlu manual, I use a power of 2 number >>>>> of procs. Are there any tunable parameters for the parallel symbolic >>>>> factorization? Note that when I build my sparse matrix, most elements I >>>>> add are nonzero of course but to simplify the programming, I also add a >>>>> few zero elements in the sparse matrix. I was thinking that maybe if the >>>>> parallel symbolic factorization proceed by block, there could be some >>>>> blocks where the pivot would be zero, hence creating the FPE?? >>>>> >>>>> Thanks, >>>>> >>>>> Anthony >>>>> >>>>> >>>>> >>>>> On Wed, Jul 8, 2015 at 6:46 AM, Xiaoye S. Li <[email protected]> wrote: >>>>> Did you find out how to change option to use parallel symbolic >>>>> factorization? Perhaps PETSc team can help. >>>>> >>>>> Sherry >>>>> >>>>> >>>>> On Tue, Jul 7, 2015 at 3:58 PM, Xiaoye S. Li <[email protected]> wrote: >>>>> Is there an inquiry function that tells you all the available options? >>>>> >>>>> Sherry >>>>> >>>>> On Tue, Jul 7, 2015 at 3:25 PM, Anthony Paul Haas >>>>> <[email protected]> wrote: >>>>> Hi Sherry, >>>>> >>>>> Thanks for your message. I have used superlu_dist default options. I did >>>>> not realize that I was doing serial symbolic factorization. That is >>>>> probably the cause of my problem. >>>>> Each node on Garnet has 60GB usable memory and I can run with 1,2,4,8,16 >>>>> or 32 core per node. >>>>> >>>>> So I should use: >>>>> >>>>> -mat_superlu_dist_r 20 >>>>> -mat_superlu_dist_c 32 >>>>> >>>>> How do you specify the parallel symbolic factorization option? is it >>>>> -mat_superlu_dist_matinput 1 >>>>> >>>>> Thanks, >>>>> >>>>> Anthony >>>>> >>>>> >>>>> On Tue, Jul 7, 2015 at 3:08 PM, Xiaoye S. Li <[email protected]> wrote: >>>>> For superlu_dist failure, this occurs during symbolic factorization. >>>>> Since you are using serial symbolic factorization, it requires the entire >>>>> graph of A to be available in the memory of one MPI task. How much memory >>>>> do you have for each MPI task? >>>>> >>>>> It won't help even if you use more processes. You should try to use >>>>> parallel symbolic factorization option. >>>>> >>>>> Another point. You set up process grid as: >>>>> Process grid nprow 32 x npcol 20 >>>>> For better performance, you show swap the grid dimension. That is, it's >>>>> better to use 20 x 32, never gives nprow larger than npcol. 
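Taken together, the advice above amounts to requesting the parallel symbolic factorization and an nprow x npcol process grid with npcol the larger dimension. An illustrative invocation only, using the option names already quoted in this thread and assuming a 20 x 32 = 640 process run:

    mpiexec -n 640 ./BiGlobal_Spectral_Petsc.x \
        -ksp_type preonly -pc_type lu \
        -pc_factor_mat_solver_package superlu_dist \
        -mat_superlu_dist_r 20 -mat_superlu_dist_c 32 \
        -mat_superlu_dist_parsymbfact

so that nprow (20) never exceeds npcol (32) and the symbolic factorization no longer needs the entire graph of A in the memory of one MPI task.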
>>>>> >>>>> >>>>> Sherry >>>>> >>>>> >>>>> On Tue, Jul 7, 2015 at 1:27 PM, Barry Smith <[email protected]> wrote: >>>>> >>>>> I would suggest running a sequence of problems, 101 by 101 111 by 111 >>>>> etc and get the memory usage in each case (when you run out of memory you >>>>> can get NO useful information out about memory needs). You can then plot >>>>> memory usage as a function of problem size to get a handle on how much >>>>> memory it is using. You can also run on more and more processes (which >>>>> have a total of more memory) to see how large a problem you may be able >>>>> to reach. >>>>> >>>>> MUMPS also has an "out of core" version (which we have never used) >>>>> that could in theory anyways let you get to large problems if you have >>>>> lots of disk space, but you are on your own figuring out how to use it. >>>>> >>>>> Barry >>>>> >>>>>> On Jul 7, 2015, at 2:37 PM, Anthony Paul Haas <[email protected]> >>>>>> wrote: >>>>>> >>>>>> Hi Jose, >>>>>> >>>>>> In my code, I use once PETSc to solve a linear system to get the >>>>>> baseflow (without using SLEPc) and then I use SLEPc to do the stability >>>>>> analysis of that baseflow. This is why, there are some SLEPc options >>>>>> that are not used in test.out-superlu_dist-151x151 (when I am solving >>>>>> for the baseflow with PETSc only). I have attached a 101x101 case for >>>>>> which I get the eigenvalues. That case works fine. However If i increase >>>>>> to 151x151, I get the error that you can see in >>>>>> test.out-superlu_dist-151x151 (similar error with mumps: see >>>>>> test.out-mumps-151x151 line 2918 ). If you look a the very end of the >>>>>> files test.out-superlu_dist-151x151 and test.out-mumps-151x151, you will >>>>>> see that the last info message printed is: >>>>>> >>>>>> On Processor (after EPSSetFromOptions) 0 memory: >>>>>> 0.65073152000E+08 =====> (see line 807 of module_petsc.F90) >>>>>> >>>>>> This means that the memory error probably occurs in the call to EPSSolve >>>>>> (see module_petsc.F90 line 810). I would like to evaluate how much >>>>>> memory is required by the most memory intensive operation within >>>>>> EPSSolve. Since I am solving a generalized EVP, I would imagine that it >>>>>> would be the LU decomposition. But is there an accurate way of doing it? >>>>>> >>>>>> Before starting with iterative solvers, I would like to exploit as much >>>>>> as I can direct solvers. I tried GMRES with default preconditioner at >>>>>> some point but I had convergence problem. What solver/preconditioner >>>>>> would you recommend for a generalized non-Hermitian (EPS_GNHEP) EVP? >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Anthony >>>>>> >>>>>> On Tue, Jul 7, 2015 at 12:17 AM, Jose E. Roman <[email protected]> >>>>>> wrote: >>>>>> >>>>>> El 07/07/2015, a las 02:33, Anthony Haas escribió: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I am computing eigenvalues using PETSc/SLEPc and superlu_dist for the >>>>>>> LU decomposition (my problem is a generalized eigenvalue problem). The >>>>>>> code runs fine for a grid with 101x101 but when I increase to 151x151, >>>>>>> I get the following error: >>>>>>> >>>>>>> Can't expand MemType 1: jcol 16104 (and then [NID 00037] 2015-07-06 >>>>>>> 19:19:17 Apid 31025976: OOM killer terminated this process.) >>>>>>> >>>>>>> It seems to be a memory problem. I monitor the memory usage as far as I >>>>>>> can and it seems that memory usage is pretty low. The most memory >>>>>>> intensive part of the program is probably the LU decomposition in the >>>>>>> context of the generalized EVP. 
Is there a way to evaluate how much >>>>>>> memory will be required for that step? I am currently running the debug >>>>>>> version of the code which I would assume would use more memory? >>>>>>> >>>>>>> I have attached the output of the job. Note that the program uses twice >>>>>>> PETSc: 1) to solve a linear system for which no problem occurs, and, 2) >>>>>>> to solve the Generalized EVP with SLEPc, where I get the error. >>>>>>> >>>>>>> Thanks >>>>>>> >>>>>>> Anthony >>>>>>> <test.out-superlu_dist-151x151> >>>>>> In the output you are attaching there are no SLEPc objects in the report >>>>>> and SLEPc options are not used. It seems that SLEPc calls are skipped? >>>>>> >>>>>> Do you get the same error with MUMPS? Have you tried to solve linear >>>>>> systems with a preconditioned iterative solver? >>>>>> >>>>>> Jose >>>>>> >>>>>> >>>>>> <module_petsc.F90><test.out-mumps-151x151><test.out_superlu_dist-101x101><test.out-superlu_dist-151x151> >>>>> >>>>> >>>>> >>>>> >>>>> <test.out-hangs><test.out-floating-point-exception> >>> >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> !! SPECTRAL TEMPORAL BIGLOBAL SOLVER ! >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> >>> PROGRAM Biglobal_Spectral >>> >>> USE PETSC >>> USE TYPEDEF >>> USE MAPPINGS >>> USE BASEFLOW >>> USE WRITE_PRINT >>> USE FOURIER_CHEBYSHEV >>> >>> IMPLICIT NONE >>> >>> include 'mpif.h' >>> >>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -! >>> ! Variables Definition ! >>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -! >>> >>> !...Testing and dummy variables >>> real(dp) :: dummy >>> >>> !...Problem parameters >>> integer(i4b),parameter :: nx=11,ny=11,nxy=nx*ny,N=4*nxy >>> >>> !Grid size for 3D Poiseuille baseflow >>> integer(i4b),parameter :: nnx=nx,nny=ny >>> >>> real(dp) :: Re,alpha,beta,Lx,xmin,xmax,ymin,ymax,AA,xll,yll >>> >>> !...Problem Flags >>> integer(i4b) :: fourier,testcase,formulation,algorithm >>> >>> !Mapping Flags >>> integer(i4b) :: map,mapx,mapy >>> >>> !...Indices >>> integer(i4b) :: i,j,iii,jjj,ij,skip,imin,imax,jmin,jmax >>> >>> !...Output files >>> integer(i4b) :: file_num >>> character(128) :: filetype,filename,file_name >>> >>> !...Boundary conditions variables >>> integer(i4b) :: ct1,ct2,ct3,ct4 >>> integer(i4b),dimension(:),allocatable :: >>> fst_bc,wall_bc,outlet_bc,inlet_bc >>> integer(i4b),dimension(:),allocatable :: bcond_px,bcond_py,bcond_pz >>> integer(i4b),dimension(:),allocatable :: bcond_u,bcond_v,bcond_w,bcond_p >>> integer(i4b),dimension(:),allocatable :: bcond >>> >>> !...Chebyshev related >>> real(dp),dimension(:),allocatable :: xi,yi,x,y,xfourier >>> >>> !...Baseflow variables >>> real(dp),dimension(:,:),allocatable :: usol,uxsol,uysol >>> real(dp),dimension(:,:),allocatable :: vsol,vxsol,vysol >>> real(dp),dimension(:,:),allocatable :: wsol,wxsol,wysol >>> >>> !For Hiemenz Baseflow >>> integer(i4b) :: npts_hz >>> real(dp) :: ymax_hz,epstol,si >>> >>> real(dp),dimension(:),allocatable :: eta >>> real(dp),dimension(:),allocatable :: u0,v0,v1,v2,v3 >>> real(dp),dimension(:),allocatable :: w0,w1,w2 >>> real(dp),dimension(:),allocatable :: etarev,u0rev,v0rev,w0rev >>> >>> real(dp),dimension(:,:),allocatable :: Ubar,Vbar,Wbar >>> >>> !...Grid variables >>> real(dp),dimension(:,:),allocatable :: xx,yy,xx1,yy1 >>> >>> !...Vectorization array >>> integer(i4b),dimension(:),allocatable :: li >>> >>> !...MPI >>> integer(i4b) :: rank,size,ierr >>> >>> !...Spline interpolation >>> !real(dp),dimension(:),allocatable :: bsp,csp,dsp >>> 
!real(dp),dimension(:),allocatable :: interp_u0,interp_v0,interp_w0 >>> >>> real(dp) :: seval >>> >>> !...External Subroutines >>> external seval >>> >>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -! >>> ! Beginning of program ! >>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -! >>> >>> call MPI_INIT(ierr) >>> >>> call MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr) >>> call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr) >>> >>> if (rank==0) call cpu_time(start_total) >>> >>> !...Parameters for BiGlobal Computation >>> >>> testcase=2 ! 1 --> Plane Poiseuille Flow >>> ! 2 --> 3D Poiseuille Flow >>> ! 3 --> Attachment line flow (Hiemenz) >>> ! 4 --> Laminar Separation Bubble >>> ! 5 --> Artificially parallel boundary-layer (cf. >>> Rodriguez thesis) >>> >>> formulation = 1 ! 1 --> x,y inhomogeneous >>> ! 2 --> y,z inhomogeneous >>> >>> algorithm = 1 ! 1 --> QZ algorith >>> ! 2 --> Arnoldi algorithm >>> >>> >>> !...Arrays allocation >>> >>> allocate(li(nx)) >>> allocate(xi(nx),yi(ny)) >>> allocate(x(nx),y(ny)) >>> allocate(xfourier(nx)) >>> >>> allocate(uvec(nxy),uxvec(nxy),uyvec(nxy)) >>> allocate(vvec(nxy),vxvec(nxy),vyvec(nxy)) >>> allocate(wvec(nxy),wxvec(nxy),wyvec(nxy)) >>> >>> allocate(xx(nx,ny),yy(nx,ny)) >>> allocate(D1X(nx,nx),D2X(nx,nx)) >>> allocate(D1Y(ny,ny),D2Y(ny,ny)) >>> allocate(DMX(nx,nx,2),DMY(ny,ny,2)) >>> >>> !allocate(interp_u0(ny),interp_v0(ny),interp_w0(ny)) >>> >>> li = 0 >>> xi = 0.0d0 >>> yi = 0.0d0 >>> x = 0.0d0 >>> y = 0.0d0 >>> xfourier = 0.0d0 >>> uvec = 0.0d0 >>> uxvec = 0.0d0 >>> uyvec = 0.0d0 >>> vvec = 0.0d0 >>> vxvec = 0.0d0 >>> vyvec = 0.0d0 >>> wvec = 0.0d0 >>> wxvec = 0.0d0 >>> wyvec = 0.0d0 >>> xx = 0.0d0 >>> yy = 0.0d0 >>> D1X = 0.0d0 >>> D2X = 0.0d0 >>> D1Y = 0.0d0 >>> D2Y = 0.0d0 >>> DMX = 0.0d0 >>> DMY = 0.0d0 >>> >>> !...Test Cases Parameters >>> >>> if ( testcase == 2 ) then >>> >>> mapx=1 >>> mapy=0 >>> >>> Re=1000 >>> >>> if (formulation == 1)then >>> beta = pi >>> elseif (formulation == 2) then >>> alpha = pi >>> endif >>> >>> AA=1.0 >>> >>> xmin=-AA >>> xmax=AA >>> >>> ymin=-1.0 >>> ymax=1.0 >>> >>> !grid for baseflow >>> !nnx=nx >>> !nny=ny >>> >>> endif >>> >>> !...Working Array for Converting 2D Indices to 1D >>> >>> li=0 >>> >>> do i=1,nx >>> li(i)=(i-1)*ny >>> enddo >>> >>> !...Get Collocation Points and Chebyshev Differentiation Matrices >>> >>> call cheby(nx-1,xi,DMX) ! (cf. Spectral Methods in Matlab pages 53-55) >>> call cheby(ny-1,yi,DMY) >>> >>> xll=0 >>> >>> ! To get matrices D1X,D2X,D1Y,D2Y >>> call set_mappings(nx,ny,mapx,mapy,xmin,xmax,ymin,ymax,xi,yi,xll,yll,x,y) >>> >>> do i=1,nx >>> do j=1,ny >>> >>> xx(i,j)=x(i) ! not equivalent to matlab meshgrid!!! >>> yy(i,j)=y(j) ! not equivalent to matlab meshgrid!!! 
>>> >>> enddo >>> enddo >>> >>> >>> !...Compute Baseflow (3D Poiseuille) >>> >>> if ( testcase == 2 ) then >>> >>> allocate(xx1(nnx,nny),yy1(nnx,nny)) >>> >>> allocate(usol(nnx,nny),uxsol(nnx,nny),uysol(nnx,nny)) >>> allocate(vsol(nnx,nny),vxsol(nnx,nny),vysol(nnx,nny)) >>> allocate(wsol(nnx,nny),wxsol(nnx,nny),wysol(nnx,nny)) >>> >>> xx1 = 0.0d0 >>> yy1 = 0.0d0 >>> usol = 0.0d0 >>> uxsol = 0.0d0 >>> uysol = 0.0d0 >>> vsol = 0.0d0 >>> vxsol = 0.0d0 >>> vysol = 0.0d0 >>> wsol = 0.0d0 >>> wxsol = 0.0d0 >>> wysol = 0.0d0 >>> >>> if (rank == 0) then >>> WRITE(*,*) >>> WRITE(*,'(A,I3,A,I3)')'3D POISEUILLE BASEFLOW COMPUTED ON GRID >>> WITH NX =',nnx,' NY =',nny >>> WRITE(*,*) >>> call cpu_time(start) >>> endif >>> >>> !currently I am taking same grid for baseflow as for biglobal ==> no >>> interpolation >>> call Poiseuille_3D_PETSC(nnx,nny,AA,usol,xx1,yy1) >>> >>> if (rank==0) then >>> call cpu_time(finish) >>> print '("Time for baseflow computation = ",f15.4," >>> seconds.")',finish-start >>> >>> file_num=10 >>> file_name='check_baseflow_3D_poiseuille.dat' >>> >>> call >>> write_to_tecplot(file_num,file_name,nnx,nny,xx1,yy1,usol,usol,usol) >>> >>> endif >>> >>> vsol=0*usol >>> wsol=0*usol >>> >>> if (formulation==1) then >>> >>> wsol=usol >>> usol=0.*wsol >>> vsol=0.*wsol >>> >>> endif >>> >>> !Compute derivatives >>> uxsol = MATMUL(D1X,usol) >>> uysol = MATMUL(usol,TRANSPOSE(D1Y)) >>> >>> vxsol = MATMUL(D1X,vsol) >>> vysol = MATMUL(vsol,TRANSPOSE(D1Y)) >>> >>> wxsol = MATMUL(D1X,wsol) >>> wysol = MATMUL(wsol,TRANSPOSE(D1Y)) >>> >>> !Vectorize the matrices >>> do i=1,nx >>> do j=1,ny >>> >>> ij=li(i)+j >>> >>> uvec(ij) = usol(i,j) >>> vvec(ij) = vsol(i,j) >>> wvec(ij) = wsol(i,j) >>> >>> uxvec(ij) = uxsol(i,j) >>> vxvec(ij) = vxsol(i,j) >>> wxvec(ij) = wxsol(i,j) >>> >>> uyvec(ij) = uysol(i,j) >>> vyvec(ij) = vysol(i,j) >>> wyvec(ij) = wysol(i,j) >>> >>> enddo >>> enddo >>> >>> endif >>> >>> !call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr) >>> >>> !...Build Matrices, Set BCs and Solve Problem >>> >>> call SetupSolveGEVP(nx,ny,nxy,alpha,beta,Re) >>> >>> if (rank==0) then >>> call cpu_time(finish_total) >>> print '("Total time = ",f15.4," seconds.")',finish_total-start_total >>> endif >>> >>> deallocate(li) >>> deallocate(xi,yi) >>> deallocate(x,y) >>> deallocate(xfourier) >>> >>> deallocate(uvec,uxvec,uyvec) >>> deallocate(vvec,vxvec,vyvec) >>> deallocate(wvec,wxvec,wyvec) >>> >>> deallocate(xx,yy) >>> deallocate(D1X,D2X) >>> deallocate(D1Y,D2Y) >>> deallocate(DMX,DMY) >>> >>> deallocate(xx1,yy1) >>> >>> deallocate(usol,uxsol,uysol) >>> deallocate(vsol,vxsol,vysol) >>> deallocate(wsol,wxsol,wysol) >>> >>> call MPI_FINALIZE(ierr) >>> >>> >>> END PROGRAM Biglobal_Spectral >>> >>> MODULE BASEFLOW >>> >>> USE TYPEDEF >>> USE WRITE_PRINT >>> USE FOURIER_CHEBYSHEV >>> >>> IMPLICIT NONE >>> >>> CONTAINS >>> >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
>>> >>> SUBROUTINE Poiseuille_3D_PETSC(nx,ny,AA,usol,xx,yy) >>> >>> #include <petsc/finclude/petscsys.h> >>> #include <petsc/finclude/petscvec.h> >>> #include <petsc/finclude/petscmat.h> >>> #include <petsc/finclude/petscmat.h90> >>> #include <petsc/finclude/petscvec.h90> >>> #include <petsc/finclude/petscpc.h> >>> #include <petsc/finclude/petscksp.h> >>> #include <petsc/finclude/petscviewer.h> >>> #include <petsc/finclude/petscviewer.h90> >>> >>> #include <slepc/finclude/slepcsys.h> >>> #include <slepc/finclude/slepceps.h> >>> >>> !...Regular Variables >>> >>> integer(i4b),intent(in) :: nx,ny >>> >>> real(dp),intent(in) :: AA >>> real(dp),dimension(nx,ny),intent(inout) :: usol,xx,yy >>> >>> integer(i4b) :: i,j,ij >>> integer(i4b) :: rank,size,ierr,source >>> integer(i4b) :: nxy,nxm2,nym2,nxym2 >>> >>> integer(i4b),dimension(nx) :: li >>> integer(i4b),dimension(nx-2) :: lim2 >>> >>> integer(i4b),dimension(:),allocatable :: loc >>> >>> real(dp) :: ymin,ymax,xmin,xmax >>> real(dp) :: dxidx,d2xidx2,dyidy,d2yidy2 >>> >>> real(dp),dimension(nx) :: xi,x >>> real(dp),dimension(ny) :: yi,y >>> >>> real(dp),dimension(nx,nx) :: D2XB >>> real(dp),dimension(ny,ny) :: D2YB >>> real(dp),dimension(nx,nx,2) :: DMXB >>> real(dp),dimension(ny,ny,2) :: DMYB >>> >>> real(dp),dimension(nx*ny) :: usol_vec >>> real(dp),dimension( nx-2,ny-2 ) :: sol_mat >>> >>> complex(dpc) :: U00 >>> complex(dpc) :: atest >>> complex(dpc),dimension(nx-2,nx-2) :: D2XBs >>> complex(dpc),dimension(ny-2,ny-2) :: D2YBs >>> complex(dpc),dimension( nx-2,ny-2 ) :: sol_mat_cmplx >>> complex(dpc),dimension(:),allocatable :: xwork1 >>> >>> !!$ allocate(li(nx)) >>> !!$ allocate(lim2(nx-2)) >>> >>> !!! NEED TO TRANSFORM ALL THESE ARRAYS TO ALLOCATABLE!!!!!! >>> >>> !...Petsc Variables >>> PetscInt ii,jj,ione,ct >>> PetscInt jmin,jmax >>> PetscInt Istart,Iend >>> PetscInt IstartVec,IendVec >>> >>> PetscReal tol >>> PetscScalar aone >>> >>> PetscViewer viewer >>> >>> PC pc >>> KSP ksp >>> VecScatter ctx >>> Mat DXX,DYY >>> !Mat DLOAD >>> Vec frhs,frhs2 >>> Vec sol,sol_seq >>> >>> >>> !For VecGetArrayReadF90 >>> !PetscScalar,pointer :: xx_v(:) !Fortran90 pointer to the array >>> PetscOffset i_x >>> PetscScalar sol_array(1) >>> >>> !...MPI >>> !PetscMPIInt rank,size >>> >>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -! >>> ! Beginning of program ! >>> ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -! >>> >>> call MPI_COMM_SIZE(MPI_COMM_WORLD,size,ierr) >>> call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr) >>> >>> call PetscInitialize(PETSC_NULL_CHARACTER,ierr) >>> >>> !call PetscOptionsInsertString("-mat_superlu_dist_parsymbfact",ierr) >>> >>> !...Check if nz and ny are odd (necessary to set max velocity to 1) >>> >>> if ( mod(nx,2)==0 .or. mod(ny,2)==0 ) then >>> write(*,*) >>> STOP'STOPPING... IN 3D POISEUILLE BASEFLOW NZ,NY NEED TO BE ODD' >>> endif >>> >>> !...Set Domain Sizes and Initialize other variables >>> >>> xmin=-AA >>> xmax=AA >>> >>> ymin=-1 >>> ymax=1 >>> >>> ione=1 >>> >>> nxy=nx*ny >>> >>> !Variables to Build DXX and DYY >>> nxm2=nx-2 >>> nym2=ny-2 >>> nxym2 = nxm2*nym2 >>> >>> !...Mat to Vec and Vec to Mat conversion >>> >>> lim2=0 >>> do i=1,nxm2 >>> lim2(i) = (i-1)*nym2 >>> enddo >>> >>> !...Get collocation points and chebyshev differentiation matrices >>> >>> call cheby(nx-1,xi,DMXB) >>> call cheby(ny-1,yi,DMYB) >>> >>> ! x-mapping from [-1,1]->[-XMAX,XMAX] >>> x=xmax*xi >>> >>> ! 
y-mapping (from [-1,1]->[-1,1]) >>> y=yi >>> >>> !Compute the metric terms >>> dxidx = 1/xmax >>> d2xidx2 = 0 >>> >>> dyidy = 1 >>> d2yidy2 = 0 >>> >>> !Scale the differentiation matrix (due to the use of the mapping) >>> >>> !x-direction >>> D2XB = d2xidx2*DMXB(:,:,1) + dxidx**2*DMXB(:,:,2) >>> D2XBs = D2XB(2:nx-1,2:nx-1) >>> >>> !y-direction >>> D2YB = d2yidy2*DMYB(:,:,1) + dyidy**2*DMYB(:,:,2) >>> D2YBs = D2YB(2:ny-1,2:ny-1) >>> >>> xx=0.0 >>> yy=0.0 >>> >>> do i=1,nx >>> do j=1,ny >>> >>> xx(i,j)=x(i) >>> yy(i,j)=y(j) >>> >>> enddo >>> enddo >>> >>> !...Create Petsc right-hand-side vector and derivation matrices >>> call VecCreateSeq(PETSC_COMM_SELF,nxym2,sol_seq,ierr) >>> call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nxym2,frhs,ierr) ! right >>> hand side vector >>> call VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,nxym2,frhs2,ierr) ! >>> right hand side vector # 2 >>> >>> call >>> MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,nxym2,nxym2,nx,PETSC_NULL_INTEGER,nx,PETSC_NULL_INTEGER,DXX,ierr) >>> call >>> MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,nxym2,nxym2,ny,PETSC_NULL_INTEGER,ny,PETSC_NULL_INTEGER,DYY,ierr) >>> >>> !...Determine which rows of the vector/matrix are locally owned. >>> call VecGetOwnershipRange(frhs,IstartVec,IendVec,ierr) >>> call MatGetOwnershipRange(DXX,Istart,Iend,ierr) >>> >>> if ( IstartVec /= Istart .or. IendVec /= Iend ) then >>> stop 'Problem in Baseflow Computation' >>> endif >>> >>> !...Fill in Sequential and Parallel Vectors >>> allocate(xwork1(Iend-Istart)) >>> allocate(loc(Iend-Istart)) >>> >>> ct=0 >>> do i=Istart,Iend-1 >>> ct=ct+1 >>> loc(ct)=i >>> xwork1(ct)=(-2.0d0,0.0d0) >>> enddo >>> >>> call VecSetValues(frhs,Iend-Istart,loc,xwork1,INSERT_VALUES,ierr) >>> call VecZeroEntries(sol_seq,ierr) >>> >>> !...Assemble Vectors >>> call VecAssemblyBegin(frhs,ierr) >>> call VecAssemblyEnd(frhs,ierr) >>> >>> call VecAssemblyBegin(sol_seq,ierr) >>> call VecAssemblyEnd(sol_seq,ierr) >>> >>> call VecDuplicate(frhs,sol,ierr) ! parallel solution vector >>> >>> !...Display vector >>> !call VecView(frhs,PETSC_VIEWER_STDOUT_WORLD,ierr) ! 
==> seems very slow >>> >>> !...Build DXX in Parallel >>> >>> !!$ if (rank==0) then >>> !!$ call print_matrix('D2XB',nx,nx,D2XB) >>> !!$ call print_matrix('D2YB',ny,ny,D2YB) >>> !!$ >>> !!$ call print_matrix_cmplx('D2XBs',nx-2,nx-2,D2XBs) >>> !!$ call print_matrix_cmplx('D2YBs',ny-2,ny-2,D2YBs) >>> !!$ endif >>> >>> if (rank == 0) call cpu_time(start) >>> >>> do i=Istart,Iend-1 >>> >>> ii=1+floor((i+0.01)/nym2) >>> jj=1 >>> >>> jmin=mod(i,nym2) >>> jmax=jmin+(nxm2-1)*nym2 >>> >>> do j=jmin,jmax,nym2 >>> >>> call >>> MatSetValues(DXX,ione,i,ione,j,D2XBs(ii,jj),INSERT_VALUES,ierr) >>> jj=jj+1 >>> >>> enddo >>> >>> enddo >>> >>> !...Build DYY in Parallel >>> >>> do i=Istart,Iend-1 >>> >>> ii=mod(i,nym2)+1 >>> jj=1 >>> >>> jmin=floor((i+0.01)/nym2)*nym2 >>> >>> do j=jmin,jmin+(nym2-1) >>> >>> call >>> MatSetValues(DYY,ione,i,ione,j,D2YBs(ii,jj),INSERT_VALUES,ierr) >>> jj=jj+1 >>> >>> enddo >>> >>> enddo >>> >>> !...Assemble DXX and DYY >>> >>> call MatAssemblyBegin(DXX,MAT_FINAL_ASSEMBLY,ierr) >>> call MatAssemblyEnd(DXX,MAT_FINAL_ASSEMBLY,ierr) >>> >>> call MatAssemblyBegin(DYY,MAT_FINAL_ASSEMBLY,ierr) >>> call MatAssemblyEnd(DYY,MAT_FINAL_ASSEMBLY,ierr) >>> >>> if (rank==0) then >>> call cpu_time(finish) >>> print '("Time for build and assembly of baseflow operator = ",f15.4," >>> seconds.")',finish-start >>> endif >>> >>> !call MatView(DXX,PETSC_VIEWER_STDOUT_WORLD,ierr) >>> !call MatView(DYY,PETSC_VIEWER_STDOUT_WORLD,ierr) >>> >>> aone = (1.0d0,0.0d0) >>> >>> if (rank == 0) call cpu_time(start) >>> call MatAXPY(DXX,aone,DYY,DIFFERENT_NONZERO_PATTERN,ierr) ! on exit DXX >>> = a*DYY + DXX >>> if (rank==0) then >>> call cpu_time(finish) >>> print '("Time for Matrix summation = ",f15.4," >>> seconds.")',finish-start >>> endif >>> >>> >>> !...Write Matrix to ASCII and Binary Format >>> !!$ call PetscViewerASCIIOpen(PETSC_COMM_WORLD,"Amat.m",viewer,ierr) >>> !!$ call MatView(DXX,viewer,ierr) >>> !!$ call PetscViewerDestroy(viewer,ierr) >>> !!$ >>> !!$ call >>> PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_WRITE,viewer,ierr) >>> !!$ call MatView(DXX,viewer,ierr) >>> !!$ call PetscViewerDestroy(viewer,ierr) >>> >>> !...Load a Matrix in Binary Format >>> !!$ call >>> PetscViewerBinaryOpen(PETSC_COMM_WORLD,"Amat_binary.m",FILE_MODE_READ,viewer,ierr) >>> !!$ call MatCreate(PETSC_COMM_WORLD,DLOAD,ierr) >>> !!$ call MatSetType(DLOAD,MATAIJ,ierr) >>> !!$ call MatLoad(DLOAD,viewer,ierr) >>> !!$ call PetscViewerDestroy(viewer,ierr) >>> >>> !call MatTranspose(DXX,MatReuse reuse,Mat *B) >>> !call MatView(DXX,PETSC_VIEWER_STDOUT_WORLD,ierr) >>> >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> !! SOLVE POISSON EQUATION FOR THE FIRST TIME !! >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> >>> !...Create linear solver context >>> call KSPCreate(PETSC_COMM_WORLD,ksp,ierr) >>> >>> !...Set operators. Here the matrix that defines the linear system also >>> serves as the preconditioning matrix. >>> !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) !aha >>> commented and replaced by next line >>> call KSPSetOperators(ksp,DXX,DXX,ierr) ! remember: here DXX=DXX+DYY >>> >>> !call KSPSetOperators(ksp,DLOAD,DLOAD,ierr) ! remember: here DXX=DXX+DYY >>> >>> tol = 1.e-10 >>> !call >>> KSPSetTolerances(ksp,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr) >>> ! set relative tolerance and uses defualt for absolute and divergence tol >>> call >>> KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr) >>> ! 
set relative and absolute tolerances and uses defualt for divergence tol >>> >>> !...Set the Direct (LU) Solver >>> !call KSPSetType(ksp,KSPPREONLY,ierr) >>> !call KSPGetPC(ksp,pc,ierr) >>> !call PCSetType(pc,PCLU,ierr) >>> !call PCFactorSetMatSolverPackage(pc,MATSOLVERMUMPS,ierr) ! >>> MATSOLVERSUPERLU_DIST >>> >>> !...Set runtime options, e.g., -ksp_type <type> -pc_type <type> >>> -ksp_monitor -ksp_rtol <rtol> >>> !These options will override those specified above as long as >>> KSPSetFromOptions() is called _after_ >>> !any other customization routines. >>> >>> call KSPSetFromOptions(ksp,ierr) >>> >>> !...Sets an ADDITIONAL function to be called at every iteration to monitor >>> the residual/error etc >>> call >>> KSPMonitorSet(ksp,KSPMonitorTrueResidualNorm,PETSC_NULL_OBJECT,PETSC_NULL_FUNCTION,ierr) >>> >>> !...Solve the linear system >>> call KSPSolve(ksp,frhs,sol,ierr) >>> >>> !...View solver info (could use -ksp_view instead) >>> >>> call KSPView(ksp,PETSC_VIEWER_STDOUT_WORLD,ierr) >>> >>> !...Extract Maximum Value (at center of duct) from Solution Vector >>> >>> call VecScatterCreateToZero(sol,ctx,sol_seq,ierr) >>> !scatter as many times as you need >>> call VecScatterBegin(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr) >>> call VecScatterEnd(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr) >>> !destroy scatter context and local vector when no longer needed >>> call VecScatterDestroy(ctx,ierr) >>> >>> !call VecView(sol_seq,PETSC_VIEWER_STDOUT_WORLD,ierr) >>> >>> call VecGetArray(sol_seq,sol_array,i_x,ierr) >>> >>> if (rank==0) then >>> >>> sol_mat_cmplx=0.0d0 >>> >>> do i=1,nxm2 >>> do j=1,nym2 >>> ij=lim2(i)+j >>> sol_mat_cmplx(i,j) = sol_array(i_x + ij) >>> enddo >>> enddo >>> >>> U00 = sol_mat_cmplx((nxm2+1)/2,(nym2+1)/2) >>> >>> endif >>> >>> source=0 >>> Call MPI_Bcast(U00,1,MPI_DOUBLE_COMPLEX,source,MPI_COMM_WORLD,ierr) >>> >>> !call VecGetArrayReadF90(sol,xx_v,ierr) ! requires #include >>> <petsc/finclude/petscvec.h90> >>> !atest = xx_v(3) >>> !call VecRestoreArrayReadF90(sol,xx_v,ierr) >>> !write(*,*)'atest',atest >>> >>> call VecRestoreArray(sol_seq,sol_array,i_x,ierr) >>> >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> !! SOLVE POISSON EQUATION FOR THE SECOND TIME !! >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
>>> >>> !...Fill in New Right-Hand-Side Vector (frhs2) >>> call VecSetValues(frhs2,Iend-Istart,loc,xwork1/U00,INSERT_VALUES,ierr) >>> >>> !...Assemble New RHS Vector >>> call VecAssemblyBegin(frhs2,ierr) >>> call VecAssemblyEnd(frhs2,ierr) >>> >>> !...Solve System >>> call KSPSolve(ksp,frhs2,sol,ierr) >>> >>> !...Get Final Solution >>> call VecScatterCreateToZero(sol,ctx,sol_seq,ierr) >>> call VecScatterBegin(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr) >>> call VecScatterEnd(ctx,sol,sol_seq,INSERT_VALUES,SCATTER_FORWARD,ierr) >>> call VecScatterDestroy(ctx,ierr) >>> >>> call VecGetArray(sol_seq,sol_array,i_x,ierr) >>> >>> if (rank==0) then >>> >>> usol=0.0d0 >>> sol_mat_cmplx=0.0d0 >>> >>> do i=1,nxm2 >>> do j=1,nym2 >>> ij=lim2(i)+j >>> sol_mat_cmplx(i,j) = sol_array(i_x + ij) >>> enddo >>> enddo >>> >>> usol(2:nx-1,2:ny-1) = dble(sol_mat_cmplx) >>> >>> endif >>> >>> call VecRestoreArray(sol_seq,sol_array,i_x,ierr) >>> >>> !...Send usol to all other processes (vectorize, broadcast and matrixize) >>> li=0 >>> do i=1,nx >>> li(i) = (i-1)*ny >>> enddo >>> >>> if (rank==0) then >>> >>> do i=1,nx >>> do j=1,ny >>> ij=li(i)+j >>> usol_vec(ij)=usol(i,j) >>> enddo >>> enddo >>> >>> endif >>> >>> call MPI_Bcast(usol_vec,nx*ny,MPI_DOUBLE,source,MPI_COMM_WORLD,ierr) >>> >>> if (rank/=0) then >>> >>> do i=1,nx >>> do j=1,ny >>> ij=li(i)+j >>> usol(i,j)=usol_vec(ij) >>> enddo >>> enddo >>> >>> endif >>> >>> !...Free work space. All PETSc objects should be destroyed when they are no >>> longer needed. >>> >>> deallocate(loc) >>> deallocate(xwork1) >>> >>> call KSPDestroy(ksp,ierr) >>> call VecDestroy(frhs,ierr) >>> call VecDestroy(frhs2,ierr) >>> call VecDestroy(sol,ierr) >>> call VecDestroy(sol_seq,ierr) >>> call MatDestroy(DXX,ierr) >>> call MatDestroy(DYY,ierr) >>> >>> call PetscFinalize(ierr) >>> >>> >>> END SUBROUTINE Poiseuille_3D_PETSC >>> >>> >>> >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> >>> >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> !!!!!!! HIEMENZ FLOW SOLVER !!!!!!!! >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> >>> >>> !************************************************************************************************* >>> ! on Veltins: ifort hiemenz.f90 -convert big_endian -r8 -i4 -O3 -132 -o >>> hiemenz.x >>> ! >>> ************************************************************************************************ >>> !program hiemenz >>> >>> subroutine hiemenz(ymax,eps,n,si,eta,u0,v0,v1,v2,v3,w0,w1,w2) >>> >>> implicit none >>> >>> real(dp) eps,ymax,si >>> integer(i4b) n,i >>> >>> ! real(dp),allocatable,dimension(:) :: eta >>> ! real(dp),allocatable,dimension(:) :: u0,v0,v1,v2,v3 >>> ! real(dp),allocatable,dimension(:) :: w0,w1,w2 >>> >>> real(dp),dimension(n) :: eta >>> real(dp),dimension(n) :: u0,v0,v1,v2,v3 >>> real(dp),dimension(n) :: w0,w1,w2 >>> >>> write(*,*) >>> write(*,*) >>> '*******************************************************************' >>> write(*,*) ' HIEMENZ SOLVER >>> ' >>> write(*,*) >>> '*******************************************************************' >>> write(*,*) >>> >>> ! >>> !..... Numerical Parameters >>> ! >>> >>> !Domain height and number of points >>> >>> !ymax=100 >>> !n=500001 >>> >>> !Tolerance for Newton iteration >>> >>> !eps=1e-12 >>> >>> !Initial guess for Newton iteration: f''(0) >>> >>> !si = -1.232587656820134 >>> >>> ! >>> !..... Allocate arrays >>> ! >>> ! allocate(eta(n)) >>> ! 
allocate(u0(n),v0(n),v1(n),v2(n),v3(n)) >>> ! allocate(w0(n),w1(n),w2(n)) >>> >>> ! >>> !..... Compute solution >>> ! >>> >>> call hiemenz_fct(ymax,eps,n,si,eta,u0,v0,v1,v2,v3,w0,w1,w2) >>> >>> ! >>> !..... Write out solution in ASCII format >>> ! >>> >>> write(*,*) >>> write(*,*) 'Writing solution to hiemenz.dat' >>> write(*,*) >>> >>> open(unit=15,file='hiemenz.dat',form='formatted') >>> >>> do i=1,n >>> write(15,'(9E21.11)') >>> eta(i),u0(i),v0(i),v1(i),v2(i),v3(i),w0(i),w1(i),w2(i) >>> end do >>> >>> close(15) >>> >>> >>> >>> end subroutine hiemenz >>> >>> >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> >>> >>> subroutine hiemenz_fct(ymax,eps,n,si,eta,u0,v0,v1,v2,v3,w0,w1,w2) >>> >>> implicit none >>> >>> integer(i4b) i,iter,n,j >>> real(dp) ymax,res,p,eps,s,si >>> >>> real(dp),allocatable,dimension(:,:) :: v,w >>> >>> real(dp) eta(n) >>> real(dp) u0(n),v0(n),v1(n),v2(n),v3(n) >>> real(dp) w0(n),w1(n),w2(n) >>> >>> !external w_ode >>> >>> write(*,*) 'eps=',eps >>> write(*,*) 'ymax=',ymax >>> write(*,*) 'N=',n >>> write(*,*) >>> >>> ! >>> ! Allocate arrays >>> ! >>> allocate(v (6,n)) >>> allocate(w (5,n)) >>> >>> ! >>> ! Create grid >>> ! >>> do i=1,n >>> eta(i) = ymax * real(i-1)/real(n-1) >>> end do >>> ! >>> ! Newton-Raphson iteration for v >>> ! >>> iter = 0 >>> s = si >>> res = 1. >>> write(*,*) 'Newton-Raphson iteration for v: s=',s >>> >>> do while (res.gt.eps) >>> >>> iter=iter+1 >>> >>> call hiemenz_eval(s,eta,n,v) >>> >>> s = s - (v(2,n) + 1.0) / v(5,n) >>> res = abs(v(2,n) + 1.0) >>> >>> write(*,*) iter,s,res >>> >>> end do >>> >>> write(*,*) 'Number of iterations ',iter >>> write(*,*) >>> write(*,*) "v''(0)=",v(3,1) >>> >>> ! >>> ! Now compute w >>> ! >>> w(1,1) = 0. >>> w(2,1) = v(3,1) >>> >>> w(3,1) = 0. >>> w(4,1) = 0. >>> w(5,1) = v(3,1) >>> >>> do i=2,n >>> call rk4(w(1,i-1),w(1,i),eta(i)-eta(i-1),5,w_ode) >>> end do >>> >>> ! >>> ! Rescale w to match boundary conditions at infinity >>> ! >>> p = 1./w(1,n) >>> >>> do i=1,n >>> w(1,i) = w(1,i) * p >>> w(2,i) = w(2,i) * p >>> end do >>> >>> write(*,*) "w'(0)=",w(2,1) >>> >>> ! >>> ! Now compute v''', w'' from the RHS and copy all values to new arrays >>> ! >>> >>> do i=1,n >>> >>> u0(i) = -v(2,i) >>> v0(i) = v(1,i) >>> v1(i) = v(2,i) >>> v2(i) = v(3,i) >>> v3(i) = -v(2,i)*v(2,i) + v(1,i)*v(3,i) + 1.0 >>> w0(i) = w(1,i) >>> w1(i) = w(2,i) >>> w2(i) = w(2,i)*v(1,i) >>> >>> end do >>> >>> >>> deallocate(v,w) >>> >>> end subroutine hiemenz_fct >>> >>> >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> >>> >>> subroutine hiemenz_eval(s,eta,n,f) >>> >>> implicit none >>> >>> integer(i4b) n,i >>> real(dp) s >>> real(dp) eta(n),f(6,n) >>> >>> !external v_ode >>> >>> f(1,1) = 0. >>> f(2,1) = 0. >>> f(3,1) = s >>> >>> f(4,1) = 0. >>> f(5,1) = 0. >>> f(6,1) = 1. >>> >>> do i=2,n >>> call rk4(f(1,i-1),f(1,i),eta(i)-eta(i-1),6,v_ode) >>> end do >>> >>> end subroutine hiemenz_eval >>> >>> >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
>>> >>> >>> subroutine v_ode(y,dy) >>> >>> implicit none >>> >>> real(dp) y(6),dy(6) >>> >>> dy(1) = y(2) >>> dy(2) = y(3) >>> dy(3) = -y(2)*y(2) + y(1)*y(3) + 1.0 >>> >>> dy(4) = y(5) >>> dy(5) = y(6) >>> dy(6) = y(3)*y(4) - 2.0*y(2)*y(5) + y(1)*y(6) >>> >>> end subroutine v_ode >>> >>> >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> >>> >>> subroutine w_ode(y,dy) >>> >>> implicit none >>> >>> real(dp) y(5),dy(5) >>> >>> dy(1) = y(2) >>> dy(2) = y(2)*y(3) >>> >>> dy(3) = y(4) >>> dy(4) = y(5) >>> dy(5) = -y(4)*y(4) + y(3)*y(5) + 1.0 >>> >>> end subroutine w_ode >>> >>> >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> >>> >>> subroutine rk4(y1,y2,h,nn,der) >>> ! >>> ! Perform one RK4 step with step size h of a n-vector >>> ! Input: y1(n) old step >>> ! h step size >>> ! nn vector size >>> ! der derivative function >>> ! Output: y2(nn) new step >>> ! >>> implicit none >>> >>> integer(i4b) nn >>> >>> real(dp) y1(nn),y2(nn),y(nn) >>> real(dp) k1(nn),k2(nn),k3(nn),k4(nn) >>> real(dp) h >>> >>> external der >>> ! >>> ! First RK substep >>> ! >>> >>> y(:) = y1(:) >>> call der(y,k1) >>> k1(:)=h*k1(:) >>> >>> ! >>> ! Second RK substep >>> ! >>> >>> y(:) = y1(:) + .5*k1(:) >>> call der(y,k2) >>> k2(:)=h*k2(:) >>> >>> ! >>> ! Third RK substep >>> ! >>> >>> y(:) = y1(:) + .5*k2(:) >>> call der(y,k3) >>> k3(:)=h*k3(:) >>> >>> ! >>> ! Fourth RK substep >>> ! >>> >>> y(:) = y1(:) + k3(:) >>> call der(y,k4) >>> k4(:)=h*k4(:) >>> >>> ! >>> ! Compose solution after full RK step >>> ! >>> y2(:) = y1(:) + ( k1(:) + 2.0*k2(:) + 2.0*k3(:) + k4(:) ) / 6.0 >>> >>> end subroutine rk4 >>> >>> >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> >>> >>> subroutine compute_baseflow_derivative(DXYZ,uvwvec,nx,ny,UVWxyz) >>> >>> implicit none >>> >>> integer(i4b) :: i,j >>> >>> integer(i4b),intent(in) :: nx,ny >>> >>> real(dp),dimension(nx*ny),intent(in) :: uvwvec >>> real(dp),dimension(nx*ny,nx*ny),intent(in) :: DXYZ >>> real(dp),dimension(nx*ny,nx*ny),intent(out) :: UVWxyz >>> >>> UVWxyz = 0.0 >>> >>> do i=1,nx*ny >>> do j=1,nx*ny >>> >>> UVWxyz(i,i) = UVWxyz(i,i) + DXYZ(i,j)*uvwvec(j) >>> >>> enddo >>> enddo >>> >>> end subroutine compute_baseflow_derivative >>> >>> >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! >>> >>> >>> subroutine compute_UVW_DXYZ(DXYZ,uvwvec,nx,ny,UVW_DXYZ) >>> >>> implicit none >>> >>> integer(i4b) :: i,j >>> >>> integer(i4b),intent(in) :: nx,ny >>> >>> real(dp),dimension(nx*ny),intent(in) :: uvwvec >>> real(dp),dimension(nx*ny,nx*ny),intent(in) :: DXYZ >>> real(dp),dimension(nx*ny,nx*ny),intent(out) :: UVW_DXYZ >>> >>> do i=1,nx*ny >>> do j=1,nx*ny >>> >>> UVW_DXYZ(i,j) = uvwvec(i)*DXYZ(i,j) >>> >>> enddo >>> enddo >>> >>> end subroutine compute_UVW_DXYZ >>> >>> >>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
>>> !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>>
>>> END MODULE BASEFLOW
>>>
>>> !!$ SUBROUTINE Poiseuille_3D(nz,ny,AA,usol,zz,yy)
>>> !!$
>>> !!$   ! Commented-out serial (LAPACK) version of the base-flow solver:
>>> !!$   ! it builds the 2D Laplacian from the Chebyshev differentiation
>>> !!$   ! matrices (cheby + mykron), LU-factors it once with dgetrf,
>>> !!$   ! solves the Poisson problem with dgetrs, rescales the right-hand
>>> !!$   ! side by the centreline value U00, solves a second time, and
>>> !!$   ! returns the normalized velocity usol on the (zz,yy) grid.
>>> !!$   ! nz and ny must be odd so that the centreline point exists.
>>> !!$
>>> !!$   ...
>>> !!$
>>> !!$ END SUBROUTINE Poiseuille_3D
>>>
>>> !
>>> ! Description: Build complex matrices A and B in the context of a
>>> ! generalized eigenvalue problem (GEVP)
>>> !
>>> !/*T
>>> !   Concepts: Build complex matrices
>>> !   Processors: n
>>> !T*/
>>> !
>>> ! -----------------------------------------------------------------------
>>>
>>> MODULE PETSC
>>>
>>>   USE TYPEDEF
>>>   USE WRITE_PRINT
>>>
>>>   IMPLICIT NONE
>>>
>>> CONTAINS
>>>
>>>   Subroutine SetupSolveGEVP(nx,ny,nxy,alpha,beta,Re)
>>>
>>> #include <petsc/finclude/petscsys.h>
>>> #include <petsc/finclude/petscvec.h>
>>> #include <petsc/finclude/petscmat.h>
>>> #include <petsc/finclude/petscmat.h90>
>>> #include <petsc/finclude/petscpc.h>
>>> #include <petsc/finclude/petscksp.h>
>>>
>>> #include <slepc/finclude/slepcsys.h>
>>> #include <slepc/finclude/slepceps.h>
>>>
>>>     !...Non PETSc/SLEPc variables
>>>
>>>     real(dp),intent(in)     :: alpha,beta,Re
>>>     integer(i4b),intent(in) :: nx,ny,nxy
>>>
>>>     integer(i4b) :: ij
>>>     integer(i4b) :: sizem,rank,source
>>>
>>>     integer(i4b),dimension(:),allocatable :: idxm1,idxn1
>>>     integer(i4b),dimension(:),allocatable :: idxm2,idxn2
>>>     integer(i4b),dimension(:),allocatable :: idxm3,idxn3
>>>
>>>     integer(i4b),dimension(:),allocatable :: li
>>>     integer(i4b),dimension(:),allocatable :: all_bc
>>>     integer(i4b),dimension(:),allocatable :: u_bc,v_bc,w_bc,p_bc
>>>     integer(i4b),dimension(:),allocatable :: fst_bc,wall_bc
>>>     integer(i4b),dimension(:),allocatable :: outlet_bc,inlet_bc
>>>
>>>     complex(dpc) :: ibeta,Rec,alphac,betac,small,sigma
>>>     complex(dpc),dimension(:,:),allocatable :: D1Xc,D2Xc,D1Yc,D2Yc
>>>
>>>     complex(dpc),dimension(nxy) :: uvec_cmplx,uxvec_cmplx,uyvec_cmplx
>>>     complex(dpc),dimension(nxy) :: vvec_cmplx,vxvec_cmplx,vyvec_cmplx
>>>     complex(dpc),dimension(nxy) :: wvec_cmplx,wxvec_cmplx,wyvec_cmplx
>>>
>>>     complex(dpc),dimension(3*nxy) :: wvec_cmplx3
>>>     complex(dpc),dimension(ny,ny) :: VDIAG,VDY
>>>
>>>     !...Temporary variables
>>>     integer(i4b),dimension(:),allocatable :: loc
>>>     complex(dpc),dimension(:),allocatable :: xwork1
>>>
>>>     !...SLEPc variables
>>>     EPS            eps
>>>     EPSType        tname
>>>     ST             st
>>>     PC             pc
>>>     Vec            xr,xi
>>>     PetscReal      error
>>>     PetscScalar    kr,ki
>>>     PetscLogDouble mem
>>>     PetscInt       maxit,nconv
>>>     PetscInt       nev,ncv,mpd
>>>     PetscInt       three
>>>     PetscInt       ct,ct1,ct2,ct3,ct4
>>>     PetscInt       icntl,ival,ival1
>>>     PetscReal      tolslepc
>>>     PetscInt       maxiter
>>>
>>>     !...PETSc variables
>>>     PetscReal      norm,tol,nrm,nrm1
>>>     PetscInt       i,j,k,II,JJ,its
>>>     PetscInt       jmin,jmax
>>>     PetscInt       ione,ntimes
>>>     PetscErrorCode ierr
>>>     PetscBool      flg
>>>     PetscScalar    v,one,zero,rhs
>>>     KSP            ksp
>>>     PetscRandom    rctx
>>>
>>>     !...MPI
>>>     !PetscMPIInt rank,size
>>>
>>>     !...Matrix ownership ranges
>>>     PetscInt IstartA,IendA,IstartB,IendB
>>>
>>>     !...Vectors
>>>     Vec i_vec,frhs,sol
>>>
>>>     !...Matrices
>>>     Mat A,B
>>>     ! Note: Any user-defined Fortran routines (such as MyKSPMonitor)
>>>     ! MUST be declared as external.
>>>
>>>     !external MyKSPMonitor,MyKSPConverged
>>>
>>>     !#if !defined(PETSC_USE_COMPLEX)
>>>     !#error "this code requires PETSc --with-scalar-type=complex build"
>>>     !#endif
>>>     !#if !defined(PETSC_USE_REAL_DOUBLE)
>>>     !#error "this code requires PETSc --with-precision=real build"
>>>     !#endif
>>>
>>>     ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - !
>>>     !                      Beginning of program                       !
>>>     ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - !
>>>
>>>     call SlepcInitialize(PETSC_NULL_CHARACTER,ierr)!;CHKERRQ(ierr)
>>>
>>>     call PetscMemorySetGetMaximumUsage(ierr)!;CHKERRQ(ierr)
>>>
>>>     call PetscOptionsInsertString("-mat_superlu_dist_parsymbfact",ierr)
>>>
>>>     call MPI_COMM_SIZE(MPI_COMM_WORLD,sizem,ierr)!;CHKERRQ(ierr)
>>>     call MPI_COMM_RANK(MPI_COMM_WORLD,rank,ierr)!;CHKERRQ(ierr)
>>>
>>>     call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>     !call PetscMemoryGetMaximumUsage(mem,ierr)
>>>     call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>>     if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (entering module)',rank,' memory:',mem
>>>
>>>     !call PetscOptionsGetInt(PETSC_NULL_CHARACTER,'-nx',nx,flg,ierr)
>>>     !call PetscOptionsGetInt(PETSC_NULL_CHARACTER,'-ny',ny,flg,ierr)
>>>
>>>     ione   = 1          ! integer
>>>     ibeta  = imag_unit*beta
>>>     one    = (1.0d0,0.0d0)
>>>     zero   = (0.0d0,0.0d0)
>>>     Rec    = Re
>>>     alphac = alpha
>>>     betac  = beta
>>>
>>>     source = 0
>>>     ct     = 0
>>>
>>>     !...Convert arrays to complex
>>>
>>>     uvec_cmplx =uvec
>>>     uxvec_cmplx=uxvec
>>>     uyvec_cmplx=uyvec
>>>
>>>     vvec_cmplx =vvec
>>>     vxvec_cmplx=vxvec
>>>     vyvec_cmplx=vyvec
>>>
>>>     wvec_cmplx =wvec
>>>     wxvec_cmplx=wxvec
>>>     wyvec_cmplx=wyvec
>>>
>>>     wvec_cmplx3(1:nxy)        =wvec_cmplx
>>>     wvec_cmplx3(nxy+1:2*nxy)  =wvec_cmplx
>>>     wvec_cmplx3(2*nxy+1:3*nxy)=wvec_cmplx
>>>
>>>     allocate(D1Xc(nx,nx),D2Xc(nx,nx),D1Yc(ny,ny),D2Yc(ny,ny))
>>>
>>>     D1Xc=D1X
>>>     D1Yc=D1Y
>>>
>>>     D2Xc=D2X
>>>     D2Yc=D2Y
>>>
>>>     !call MPI_Bcast(wvec_cmplx3,3*nxy,MPI_DOUBLE_COMPLEX,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>
>>>     !...Parallel matrices
>>>
>>>     !Create A
>>>     call MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4*nxy,4*nxy, &
>>>          2*(nx+ny),PETSC_NULL_INTEGER,max(nx,ny),PETSC_NULL_INTEGER,A,ierr)!;CHKERRQ(ierr)
>>>
>>>     !Create B
>>>     call MatCreateAIJ(PETSC_COMM_WORLD,PETSC_DECIDE,PETSC_DECIDE,4*nxy,4*nxy, &
>>>          1,PETSC_NULL_INTEGER,0,PETSC_NULL_INTEGER,B,ierr)!;CHKERRQ(ierr)
>>>
>>>     call MatGetOwnershipRange(A,IstartA,IendA,ierr)!;CHKERRQ(ierr)
>>>     call MatGetOwnershipRange(B,IstartB,IendB,ierr)!;CHKERRQ(ierr)
>>>
>>>     !write(*,*)'IstartA,IendA',IstartA,IendA
>>>     !write(*,*)'IstartB,IendB',IstartB,IendB
>>>
>>>     if (IstartA > 4*nxy .or. IstartB > 4*nxy .or. IendA > 4*nxy .or. IendB > 4*nxy) then
>>>        stop 'indices problem in module_petsc'
>>>     endif
>>>
>>>     !...NEED TO TEST WITHOUT THIS NEXT LINE TO UNDERSTAND !!!
>>>     !...This line avoids the error "[0]PETSC ERROR: Argument out of range
>>>     !   [0]PETSC ERROR: New nonzero at (470,470)..."
>>>     call MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE,ierr)!;CHKERRQ(ierr)
>>>
>>>     !...Get memory usage
>>>     call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>     call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>>     if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (after matcreate)',rank,' memory:',mem
>>>
>>>     !...Build A in parallel: each process fills only its own rows [IstartA,IendA).
>>>     !   Row block 1 is the x-momentum equation, block 2 the y-momentum,
>>>     !   block 3 the z-momentum and block 4 the continuity equation;
>>>     !   the column blocks correspond to u, v, w and p.
>>>
>>>     if (rank == 0) call cpu_time(start)
>>>
>>>     do i=IstartA,IendA-1
>>>
>>>        if ( i < nxy ) then
>>>
>>>           ! Uy (block 1,2)
>>>           if (abs(uyvec_cmplx(i+1))>1e10) STOP '1'
>>>           call MatSetValues(A,ione,i,ione,i+nxy,uyvec_cmplx(i+1),INSERT_VALUES,ierr)!;CHKERRQ(ierr)
>>>
>>>           ! Build DX (block 1,4)
>>>           ii=1+floor((i+0.01)/ny)
>>>           jj=1
>>>           jmin=mod(i,ny)
>>>           jmax=jmin+(nx-1)*ny
>>>           do j=jmin,jmax,ny
>>>              if (abs(D1Xc(ii,jj))>1e10) STOP '2'
>>>              call MatSetValues(A,ione,i,ione,j+3*nxy,D1Xc(ii,jj),INSERT_VALUES,ierr)
>>>              jj=jj+1
>>>           enddo
>>>
>>>           ! Build -DXX/Re (part of block 1,1)
>>>           ii=1+floor((i+0.01)/ny)
>>>           jj=1
>>>           jmin=mod(i,ny)
>>>           jmax=jmin+(nx-1)*ny
>>>           do j=jmin,jmax,ny
>>>              if (abs(-D2Xc(ii,jj)/Rec)>1e10) STOP '3'
>>>              call MatSetValues(A,ione,i,ione,j,-D2Xc(ii,jj)/Rec,INSERT_VALUES,ierr)
>>>              jj=jj+1
>>>           enddo
>>>
>>>        elseif ( i > nxy-1 .and. i < 2*nxy ) then
>>>
>>>           ! y-momentum rows, built with the same index pattern as above:
>>>           ! Vx into block (2,1), D1Yc into block (2,4), -D2Xc/Rec into block (2,2).
>>>
>>>        elseif ( i > 2*nxy-1 .and. i < 3*nxy ) then
>>>
>>>           ! z-momentum rows: ibeta into block (3,4), Wx into block (3,1),
>>>           ! Wy into block (3,2), -D2Xc/Rec into block (3,3).
>>>
>>>        elseif ( i > 3*nxy-1 ) then
>>>
>>>           ! continuity rows: ibeta into block (4,3), D1Xc into block (4,1),
>>>           ! D1Yc into block (4,2).
>>>
>>>        endif
>>>
>>>     enddo
>>>
>>>     if (rank==0) then
>>>        call cpu_time(finish)
>>>        write(*,*)
>>>        print '("Time Partial Parallel Build A #1 = ",f21.3," seconds.")',finish-start
>>>     endif
>>>
>>>     ! Flush assembly is necessary to switch between INSERT_VALUES and ADD_VALUES
>>>     call MatAssemblyBegin(A,MAT_FLUSH_ASSEMBLY,ierr)!;CHKERRQ(ierr)
>>>     call MatAssemblyEnd(A,MAT_FLUSH_ASSEMBLY,ierr)!;CHKERRQ(ierr)
>>>
>>>     !...Finish blocks A11, A22 and A33 (now with ADD_VALUES)
>>>
>>>     if (rank == 0) call cpu_time(start)
>>>
>>>     VDY  =(0.0d0,0.0d0)
>>>     VDIAG=(0.0d0,0.0d0)
>>>
>>>     ! beta**2/Re + i*beta*W on the diagonal of the three momentum blocks
>>>     do i=IstartA,IendA-1
>>>        if (i < 3*nxy) then
>>>           call MatSetValues(A,ione,i,ione,i,betac**2/Rec+imag_unit*betac*wvec_cmplx3(i+1),ADD_VALUES,ierr)!;CHKERRQ(ierr)
>>>        endif
>>>     enddo
>>>
>>>     do i=IstartA,IendA-1
>>>
>>>        if ( i < nxy ) then
>>>
>>>           ! Build -DYY/Re (part of block 1,1)
>>>           ii=mod(i,ny)+1
>>>           jj=1
>>>           jmin=floor((i+0.01)/ny)*ny
>>>           do j=jmin,jmin+(ny-1)
>>>              call MatSetValues(A,ione,i,ione,j,-D2Yc(ii,jj)/Rec,ADD_VALUES,ierr)
>>>              jj=jj+1
>>>           enddo
>>>
>>>           ! Build U*DX (part of block 1,1)
>>>           ii=1+floor((i+0.01)/ny)
>>>           jj=1
>>>           jmin=mod(i,ny)
>>>           jmax=jmin+(nx-1)*ny
>>>           do j=jmin,jmax,ny
>>>              call MatSetValues(A,ione,i,ione,j,uvec_cmplx(i+1)*D1Xc(ii,jj),ADD_VALUES,ierr)
>>>              jj=jj+1
>>>           enddo
>>>
>>>           ! Build V*DY (part of block 1,1)
>>>           ii=mod(i,ny)+1
>>>           jj=1
>>>           jmin=floor((i+0.01)/ny)*ny
>>>           do j=jmin,jmin+(ny-1)
>>>              call MatSetValues(A,ione,i,ione,j,vvec_cmplx(i+1)*D1Yc(ii,jj),ADD_VALUES,ierr)
>>>              jj=jj+1
>>>           enddo
>>>
>>>           ! Ux (block 1,1)
>>>           call MatSetValues(A,ione,i,ione,i,uxvec_cmplx(i+1),ADD_VALUES,ierr)!;CHKERRQ(ierr)
>>>
>>>        elseif ( i > nxy-1 .and. i < 2*nxy ) then
>>>
>>>           ! Same pattern for block (2,2): -DYY/Re, U*DX, V*DY and Vy,
>>>           ! with the column indices shifted by nxy.
>>>
>>>        elseif ( i > 2*nxy-1 .and. i < 3*nxy ) then
>>>
>>>           ! Same pattern for block (3,3): -DYY/Re, U*DX and V*DY,
>>>           ! with the column indices shifted by 2*nxy.
>>>
>>>        endif
>>>
>>>     enddo
>>>
>>>     if (rank==0) then
>>>        call cpu_time(finish)
>>>        write(*,*)
>>>        print '("Time Partial Parallel Build A #2 = ",f21.3," seconds.")',finish-start
>>>     endif
>>>
>>>     !...Artificially create diagonal elements for A ==> TO BE REMOVED
>>>     small = (0.000001d0,0.0d0)
>>>     do i=IstartA,IendA-1
>>>        call MatSetValues(A,ione,i,ione,i,small,ADD_VALUES,ierr)!;CHKERRQ(ierr)
>>>     enddo
>>>
>>>     call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>     call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>>     if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (A just before assembly)',rank,' memory:',mem
>>>
>>>     !...Assemble A
>>>     if (rank==0) call cpu_time(start)
>>>     call MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY,ierr)!;CHKERRQ(ierr)
>>>     call MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY,ierr)!;CHKERRQ(ierr)
>>>     !call MatView(A,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>     if (rank==0) then
>>>        call cpu_time(finish)
>>>        write(*,*)
>>>        print '("Time to assemble A = ",f21.3," seconds.")',finish-start
>>>     endif
>>>
>>>     call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>     call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>>     if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (A built and assembled)',rank,' memory:',mem
>>>
>>>     !...Build B in parallel: i*I on the three momentum blocks
>>>     if (rank==0) call cpu_time(start)
>>>     do i=IstartB,IendB-1
>>>        if (i < 3*nxy) then
>>>           call MatSetValues(B,ione,i,ione,i,imag_unit,INSERT_VALUES,ierr)!;CHKERRQ(ierr)
>>>        endif
>>>     enddo
>>>
>>>     !...Assemble B
>>>     call MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY,ierr)!;CHKERRQ(ierr)
>>>     call MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY,ierr)!;CHKERRQ(ierr)
>>>     !call MatView(B,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>
>>>     call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>     call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>>     if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (B built and assembled)',rank,' memory:',mem
>>>
>>>     call MatNorm(B,NORM_FROBENIUS,nrm1,ierr)!;CHKERRQ(ierr)
>>>
>>> !!$ if (rank==0) then
>>> !!$    write(*,*)'norm B:',nrm1
>>> !!$ endif
>>>
>>>     if (rank==0) then
>>>        call cpu_time(finish)
>>>        write(*,*)
>>>        print '("Time to build and assemble B = ",f21.3," seconds.")',finish-start
>>>     endif
>>>
>>>     !...Set boundary conditions: count the boundary points on rank 0
>>>
>>>     if (rank==0) then
>>>
>>>        allocate(li(nx))
>>>        do i=1,nx
>>>           li(i)=(i-1)*ny
>>>        enddo
>>>
>>>        ct1=0
>>>        ct2=0
>>>        ct3=0
>>>        ct4=0
>>>
>>>        do i=1,nx
>>>           do j=1,ny
>>>              if (j==1) then          ! fst_bc
>>>                 ct1=ct1+1
>>>              elseif (j==ny) then     ! wall_bc
>>>                 ct2=ct2+1
>>>              elseif (i==1) then      ! outlet_bc
>>>                 ct3=ct3+1
>>>              elseif (i==nx) then     ! inlet_bc
>>>                 ct4=ct4+1
>>>              endif
>>>           enddo
>>>        enddo
>>>
>>>        ct=ct1+ct2+ct3+ct4
>>>
>>>     endif ! if (rank == 0)
>>>     !...Broadcast the boundary-point counts to all processes
>>>     call MPI_Bcast(ct ,1,MPI_INT,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>     call MPI_Bcast(ct1,1,MPI_INT,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>     call MPI_Bcast(ct2,1,MPI_INT,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>     call MPI_Bcast(ct3,1,MPI_INT,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>     call MPI_Bcast(ct4,1,MPI_INT,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>
>>>     allocate(fst_bc(ct1))
>>>     allocate(wall_bc(ct2))
>>>     allocate(outlet_bc(ct3))
>>>     allocate(inlet_bc(ct4))
>>>
>>>     allocate(all_bc(3*ct))
>>>
>>>     allocate(u_bc(ct))
>>>     allocate(v_bc(ct))
>>>     allocate(w_bc(ct))
>>>     allocate(p_bc(ct))
>>>
>>>     if (rank == 0) then
>>>
>>>        ct1=0
>>>        ct2=0
>>>        ct3=0
>>>        ct4=0
>>>
>>>        do i=1,nx
>>>           do j=1,ny
>>>
>>>              ij=li(i)+j-1
>>>
>>>              if (j==1) then          ! fst_bc
>>>                 ct1=ct1+1
>>>                 fst_bc(ct1)=ij
>>>              elseif (j==ny) then     ! wall_bc
>>>                 ct2=ct2+1
>>>                 wall_bc(ct2)=ij
>>>              elseif (i==1) then      ! outlet_bc
>>>                 ct3=ct3+1
>>>                 outlet_bc(ct3)=ij
>>>              elseif (i==nx) then     ! inlet_bc
>>>                 ct4=ct4+1
>>>                 inlet_bc(ct4)=ij
>>>              endif
>>>
>>>           enddo
>>>        enddo
>>>
>>>        u_bc(1:ct1)                        =fst_bc
>>>        u_bc(ct1+1:ct1+ct2)                =wall_bc
>>>        u_bc(ct1+ct2+1:ct1+ct2+ct3)        =outlet_bc
>>>        u_bc(ct1+ct2+ct3+1:ct1+ct2+ct3+ct4)=inlet_bc
>>>
>>>        v_bc=u_bc+nxy
>>>        w_bc=v_bc+nxy
>>>        p_bc=w_bc+nxy
>>>
>>>        all_bc=[u_bc,v_bc,w_bc]
>>>
>>>     endif ! if (rank == 0)
>>>
>>>     call MPI_Bcast(all_bc,3*ct,MPI_INT,source,MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>
>>>     ! For the parallel case, all processes that share the matrix MUST call this
>>>     ! routine, regardless of whether any rows being zeroed are owned by them.
>>>     call MatZeroRows(A,3*ct,all_bc, one,zero,zero,ierr);CHKERRQ(ierr)
>>>     call MatZeroRows(B,3*ct,all_bc,zero,zero,zero,ierr);CHKERRQ(ierr)
>>>
>>>     !call MatView(B,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>>
>>>     ! MatNorm is collective: it cannot be called inside an if (rank==0) block!
>>>     call MatNorm(A,NORM_FROBENIUS,nrm ,ierr)!;CHKERRQ(ierr)
>>>     call MatNorm(B,NORM_FROBENIUS,nrm1,ierr)!;CHKERRQ(ierr)
>>>
>>>     if (rank==0) then
>>>        write(*,*)'norm A:',nrm
>>>        write(*,*)'norm B:',nrm1
>>>     endif
>>>
>>>     !!!!!! TESTING HOW BIG LU CAN BE !!!!!!!!!!!!!!!!
>>>
>>> !!$ ! Commented-out experiment: shift A by sigma*B, then factor and solve
>>> !!$ ! (A - sigma*B) x = rhs once with a direct solver to measure the cost
>>> !!$ ! and memory footprint of the LU factorization.
>>> !!$ sigma = (1.0d0,0.0d0)
>>> !!$ call MatAXPY(A,-sigma,B,DIFFERENT_NONZERO_PATTERN,ierr) ! on exit A = A-sigma*B
>>> !!$
>>> !!$ !...Create linear solver context
>>> !!$ call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
>>> !!$
>>> !!$ !...Set operators. Here the matrix that defines the linear system also
>>> !!$ !   serves as the preconditioning matrix.
>>> !!$ !call KSPSetOperators(ksp,A,A,DIFFERENT_NONZERO_PATTERN,ierr) !aha commented and replaced by next line
>>> !!$ call KSPSetOperators(ksp,A,A,ierr)   ! remember: here A = A-sigma*B
>>> !!$
>>> !!$ tol = 1.e-10
>>> !!$ ! set relative and absolute tolerances and use the default divergence tolerance
>>> !!$ call KSPSetTolerances(ksp,tol,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr)
>>> !!$
>>> !!$ !...Set the direct (LU) solver
>>> !!$ call KSPSetType(ksp,KSPPREONLY,ierr)
>>> !!$ call KSPGetPC(ksp,pc,ierr)
>>> !!$ call PCSetType(pc,PCLU,ierr)
>>> !!$ call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr) ! MATSOLVERSUPERLU_DIST or MATSOLVERMUMPS
>>> !!$
>>> !!$ !...Create right-hand-side and solution vectors
>>> !!$ call MatCreateVecs(A,frhs,PETSC_NULL_OBJECT,ierr)
>>> !!$ call MatCreateVecs(A,sol ,PETSC_NULL_OBJECT,ierr)
>>> !!$
>>> !!$ allocate(xwork1(IendA-IstartA))
>>> !!$ allocate(loc(IendA-IstartA))
>>> !!$
>>> !!$ ct=0
>>> !!$ do i=IstartA,IendA-1
>>> !!$    ct=ct+1
>>> !!$    loc(ct)=i
>>> !!$    xwork1(ct)=(1.0d0,0.0d0)
>>> !!$ enddo
>>> !!$
>>> !!$ call VecSetValues(frhs,IendA-IstartA,loc,xwork1,INSERT_VALUES,ierr)
>>> !!$ call VecZeroEntries(sol,ierr)
>>> !!$
>>> !!$ deallocate(xwork1,loc)
>>> !!$
>>> !!$ !...Assemble vectors
>>> !!$ call VecAssemblyBegin(frhs,ierr)
>>> !!$ call VecAssemblyEnd(frhs,ierr)
>>> !!$
>>> !!$ !...Solve the linear system
>>> !!$ call KSPSolve(ksp,frhs,sol,ierr)
>>> !!$
>>> !!$ !call VecView(sol,PETSC_VIEWER_STDOUT_WORLD,ierr)
>>> !!$
>>> !!$ STOP
>>>
>>>     !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>>     !!!!!!!!!!!!!!!!!!!!!!!           SLEPC           !!!!!!!!!!!!!!!!!!!!
>>>     !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>>
>>>     if (rank==0) call cpu_time(start)
>>>
>>>     call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>     call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>>     if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (begin slepc)',rank,' memory:',mem
>>>
>>>     !...Get vectors compatible with the matrix, i.e. with the same parallel layout
>>>     call MatCreateVecs(A,xr,xi,ierr)
>>> !!$ call MatCreateVecs(A,xr,PETSC_NULL_OBJECT,ierr)
>>> !!$ call MatCreateVecs(A,xi,PETSC_NULL_OBJECT,ierr)
>>>
>>>     !...Create the eigensolver context and spectral transformation context
>>>     call EPSCreate(PETSC_COMM_WORLD,eps,ierr)
>>>     !call STCreate(PETSC_COMM_WORLD,st,ierr)
>>>
>>>     !...Set operators and problem type - generalized eigenvalue problem
>>>     call EPSSetOperators(eps,A,B,ierr)
>>>     !call EPSSetProblemType(eps,EPS_PGNHEP,ierr) ! this one leads to an error --> check why, because B seems to fit that criterion
>>>     call EPSSetProblemType(eps,EPS_GNHEP,ierr)
>>>
>>>     !...Select portion of the spectrum
>>>     call EPSSetTarget(eps,(3.0d0,0.d0),ierr)
>>>     call EPSSetWhichEigenpairs(eps,EPS_TARGET_MAGNITUDE,ierr)
>>>
>>>     !...Select the algorithm for the eigenvalue computation (default is Krylov-Schur)
>>>     call EPSSetType(eps,EPSKRYLOVSCHUR,ierr) !EPSARNOLDI,EPSKRYLOVSCHUR
>>>
>>>     call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>     call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>>     if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (slepc after EPSSetType)',rank,' memory:',mem
>>>
>>>     !...Set tolerance and maxiter for the solver
>>>     tolslepc=1e-12
>>>     maxiter=500
>>>     !call EPSSetTolerances(eps,PETSC_DEFAULT_REAL,maxiter,ierr)
>>>     call EPSSetTolerances(eps,tolslepc,maxiter,ierr)
>>>
>>>     !...Set the number of eigenvalues to compute and the subspace dimensions
>>>     nev=200
>>>     ncv=3*nev
>>>     mpd=10*nev
>>>     !call EPSSetDimensions(eps,nev,ncv,mpd,ierr)
>>>     call EPSSetDimensions(eps,nev,PETSC_DEFAULT_INTEGER,PETSC_DEFAULT_INTEGER,ierr)
>>>
>>>     call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>     call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>>     if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (before EPSGetST)',rank,' memory:',mem
>>>
>>>     !...Set the spectral transformation --> use shift-and-invert
>>>     call EPSGetST(eps,st,ierr)
>>>     call STSetType(st,STSINVERT,ierr)
>>>
>>>     !...Set the linear solver used by the spectral transformation
>>>     call STGetKSP(st,ksp,ierr)
>>>     call KSPSetType(ksp,KSPPREONLY,ierr)
>>>     call KSPGetPC(ksp,pc,ierr)
>>>     call PCSetType(pc,PCLU,ierr) ! could try PCCHOLESKY
>>>     call PCFactorSetMatSolverPackage(pc,MATSOLVERSUPERLU_DIST,ierr) ! MATSOLVERSUPERLU_DIST,MATSOLVERMUMPS
>>>
>>>     !...Set solver parameters at runtime
>>>     call EPSSetFromOptions(eps,ierr)
>>>
>>>     call MPI_Barrier(MPI_COMM_WORLD,ierr)!;CHKERRQ(ierr)
>>>     call PetscMemoryGetCurrentUsage(mem,ierr)!;CHKERRQ(ierr)
>>>     if (rank==0) write(*,'(A,I3,A,E21.11)')'On Processor (after EPSSetFromOptions)',rank,' memory:',mem
>>>
>>>     !...Solve the eigensystem
>>>     call EPSSolve(eps,ierr)
>>>
>>>     call EPSGetIterationNumber(eps,its,ierr)
>>>     if (rank .eq. 0) then
>>>        write(*,110) its
>>>     endif
>>> 110 format (/' Number of iterations of the method:',I4)
>>>
>>>     !...Optional: get some information from the solver and display it
>>>     call EPSGetType(eps,tname,ierr)
>>>     if (rank .eq. 0) then
>>>        write(*,120) tname
>>>     endif
>>> 120 format (' Solution method: ',A)
>>>
>>>     nev=10
>>>     call EPSGetDimensions(eps,nev,PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,ierr)
>>>     if (rank .eq. 0) then
>>>        write(*,130) nev
>>>     endif
>>> 130 format (' Number of requested eigenvalues:',I4)
>>>
>>>     call EPSGetTolerances(eps,tol,maxit,ierr)
>>>     if (rank .eq. 0) then
>>>        write(*,140) tol, maxit
>>>     endif
>>> 140 format (' Stopping condition: tol=',1P,E10.4,', maxit=',I4)
>>>
>>>     ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>>     ! Display solution and clean up
>>>     ! - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>>
>>>     !...Get the number of converged eigenpairs
>>>     call EPSGetConverged(eps,nconv,ierr)
>>>     if (rank .eq. 0) then
>>>        write(*,150) nconv
>>>     endif
>>> 150 format (' Number of converged eigenpairs:',I4/)
>>>
>>>     !...Display eigenvalues and relative errors
>>>     if (nconv.gt.0) then
>>>        if (rank .eq. 0) then
>>>           write(*,*) '         kr                    ki            ||Ax-kx||/||kx||'
>>>           write(*,*) ' ----------------- ----------------- ------------------'
>>>        endif
>>>        do i=0,nconv-1
>>>           !...Get converged eigenpairs: the i-th eigenvalue is stored in kr (real part) and ki (imaginary part)
>>>           call EPSGetEigenpair(eps,i,kr,ki,xr,xi,ierr)
>>>
>>>           !...Compute the relative error associated with each eigenpair
>>>           call EPSComputeError(eps,i,EPS_ERROR_RELATIVE,error,ierr)
>>>           if (rank .eq. 0) then
>>>              write(*,160) PetscRealPart(kr),PetscImaginaryPart(kr), error
>>>           endif
>>> 160        format (1P,'   ',E21.11,'   ',E21.11,'   ',E21.11)
>>>
>>>        enddo
>>>     endif
>>>
>>>     !...Free work space
>>>
>>>     if (rank==0) deallocate(li)
>>>
>>>     deallocate(D1Xc,D2Xc,D1Yc,D2Yc)
>>>
>>>     deallocate(fst_bc)
>>>     deallocate(wall_bc)
>>>     deallocate(outlet_bc)
>>>     deallocate(inlet_bc)
>>>
>>>     deallocate(all_bc)
>>>
>>>     deallocate(u_bc)
>>>     deallocate(v_bc)
>>>     deallocate(w_bc)
>>>     deallocate(p_bc)
>>>
>>>     !call STDestroy(st,ierr)
>>>     call EPSDestroy(eps,ierr)
>>>     call MatDestroy(A,ierr)
>>>     call MatDestroy(B,ierr)
>>>     !call VecDestroy(i_vec,ierr)
>>>     call VecDestroy(xr,ierr)
>>>     call VecDestroy(xi,ierr)
>>>
>>>     !...Finalize
>>>     call SlepcFinalize(ierr)
>>>
>>>     if (rank==0) then
>>>        call cpu_time(finish)
>>>        write(*,*)
>>>        print '("Total time for SLEPC = ",f21.3," seconds.")',finish-start
>>>     endif
>>>
>>>   END Subroutine SETUPSOLVEGEVP
>>>
>>> END MODULE PETSC
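For reference, here is a minimal, hedged sketch of the create/destroy pairing that the clean-up block of the quoted routine relies on, applied to a locally created sequential vector and scatter context. The routine and variable names (leak_free_gather_example, vec_seq, vec_zero, scatter_ctx) are placeholders and do not appear in the quoted code; the point is only that every object returned by a PETSc Create routine keeps its memory until the matching Destroy call, so a Create inside a subroutine that is called repeatedly needs a Destroy on every pass.

subroutine leak_free_gather_example(vin,n,ierr)
#include <petsc/finclude/petscsys.h>
#include <petsc/finclude/petscvec.h>
  Vec            vin           ! distributed vector owned by the caller
  PetscInt       n
  Vec            vec_seq       ! scratch sequential vector
  Vec            vec_zero      ! sequential copy of vin gathered on rank 0
  VecScatter     scatter_ctx
  PetscErrorCode ierr

  ! Each Create call allocates memory that is released only by the
  ! matching Destroy call, even if the Fortran handle goes out of scope.
  call VecCreateSeq(PETSC_COMM_SELF,n,vec_seq,ierr)
  call VecScatterCreateToZero(vin,scatter_ctx,vec_zero,ierr)

  call VecScatterBegin(scatter_ctx,vin,vec_zero,INSERT_VALUES,SCATTER_FORWARD,ierr)
  call VecScatterEnd  (scatter_ctx,vin,vec_zero,INSERT_VALUES,SCATTER_FORWARD,ierr)

  ! ... use vec_seq / vec_zero here ...

  ! Destroy everything created in this routine before returning; skipping
  ! these calls leaks one scatter context and two vectors per invocation.
  call VecScatterDestroy(scatter_ctx,ierr)
  call VecDestroy(vec_zero,ierr)
  call VecDestroy(vec_seq,ierr)

end subroutine leak_free_gather_example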
>>> ! Scratch material left over from the PETSc/SLEPc example codes this module
>>> ! was derived from (everything below the module is commented out and not
>>> ! compiled):
>>>
>>> !!$ write(*,*)'fst_bc',fst_bc
>>> !!$ write(*,*)'wall_bc',wall_bc
>>> !!$ write(*,*)'outlet_bc',outlet_bc
>>> !!$ write(*,*)'inlet_bc',inlet_bc
>>> !!$ write(*,*)'all_bc',all_bc
>>> !!$ write(*,*)'ct1+ct2+ct3+ct4',ct1+ct2+ct3+ct4
>>>
>>> !!$ call MatCreate(PETSC_COMM_WORLD,A,ierr)
>>> !!$ call MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,4*nxy,4*nxy,ierr)
>>> !!$ call MatSetFromOptions(A,ierr)
>>> !!$ call MatSetUp(A,ierr)
>>>
>>> !call VecCreateSeq(PETSC_COMM_SELF,nxy,i_vec,ierr) ! sequential vector
>>> !call VecSetValues(i_vec,nxy,loc,xwork1,INSERT_VALUES,ierr)
>>> !call VecAssemblyBegin(i_vec,ierr)
>>> !call VecAssemblyEnd(i_vec,ierr)
>>>
>>> !!$ call MatCreate(PETSC_COMM_WORLD,B,ierr)
>>> !!$ call MatSetSizes(B,PETSC_DECIDE,PETSC_DECIDE,4*nxy,4*nxy,ierr)
>>> !!$ call MatSetFromOptions(B,ierr)
>>> !!$ call MatSetUp(B,ierr)
>>>
>>> ! Plus assorted comments and fragments carried over from the tutorial codes:
>>> ! notes on the Fortran #include / CPP preprocessing conventions, the old
>>> ! <finclude/...> include paths, commented KSP creation/tolerance/solve and
>>> ! KSPView fragments with their exact-solution checks, and the commented
>>> ! template routines MyKSPMonitor (user-defined KSP monitoring) and
>>> ! MyKSPConverged (user-defined convergence test).
>>>
>>> <valgrind.log.31371><valgrind.log.31372>
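One more reference sketch, related to the "New nonzero at (470,470)" workaround in the quoted routine (the MatSetOption call that disables MAT_NEW_NONZERO_ALLOCATION_ERR): an alternative is to preallocate with exact per-row counts, so that no new allocation can happen during MatSetValues. The names below (preallocate_rows_example, nloc, d_nnz, o_nnz) are placeholders and not identifiers from the quoted code; the matrix is assumed to have been created with MatCreate/MatSetSizes/MatSetType(A,MATMPIAIJ) but not yet preallocated.

subroutine preallocate_rows_example(A,nloc,d_nnz,o_nnz,ierr)
#include <petsc/finclude/petscsys.h>
#include <petsc/finclude/petscmat.h>
  Mat            A
  PetscInt       nloc          ! number of locally owned rows
  PetscInt       d_nnz(nloc)   ! per-row nonzeros in the diagonal block
  PetscInt       o_nnz(nloc)   ! per-row nonzeros in the off-diagonal block
  PetscInt       izero
  PetscErrorCode ierr

  izero = 0

  ! With exact per-row counts no mallocs occur inside MatSetValues, so
  ! MAT_NEW_NONZERO_ALLOCATION_ERR can stay at its default (error on any
  ! unexpected new nonzero), which also makes assembly noticeably faster.
  call MatMPIAIJSetPreallocation(A,izero,d_nnz,izero,o_nnz,ierr)

end subroutine preallocate_rows_example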
