So, I tried MUMPS instead of SuperLU. I still have some problems with running out of memory, but I think that Sherry and Hong have pointed me in an important direction:
I may have also underestimated the need for matrix reordering when computing the LU factors. I have to look into this…

One thing that puzzles me now: when I followed Hong's suggestion to try an iterative solver, it solved my simple test problem after some experimenting with different settings. I am solving Ax = b with a sparse, indefinite, symmetric, complex matrix. Can anything be said about the chances of success with an iterative method?

/Mahir

From: Matthew Knepley [mailto:[email protected]]
Sent: 22 July 2015 19:17
To: Ülker-Kaustell, Mahir
Cc: Barry Smith; petsc-users
Subject: Re: [petsc-users] SuperLU MPI-problem

On Wed, Jul 22, 2015 at 11:11 AM, [email protected] <[email protected]> wrote:

Thank you for your reply.

As you have probably figured out already, I am not a computational scientist. I am a researcher in civil engineering (railways for high-speed traffic), trying to produce some, from my perspective, fairly large parametric studies based on finite element discretizations.

I am working in a Windows environment and have installed PETSc through Cygwin. Apparently, there is no support for Valgrind in this OS.

It is really worth any amount of time and effort to get away from Windows if you are doing computational science.

If I have understood you correctly, the memory issues are related to SuperLU and, given my background, there is not much I can do. Is this correct?

The next step is to run the problem using MUMPS (--download-mumps --download-scalapack).

Thanks,

Matt

Best regards,
Mahir

______________________________________________
Mahir Ülker-Kaustell, Kompetenssamordnare, Brokonstruktör, Tekn. Dr, Tyréns AB
010 452 30 82, [email protected]
______________________________________________

-----Original Message-----
From: Barry Smith [mailto:[email protected]]
Sent: 22 July 2015 02:57
To: Ülker-Kaustell, Mahir
Cc: Xiaoye S. Li; petsc-users
Subject: Re: [petsc-users] SuperLU MPI-problem

Run the program under valgrind
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind .
When I use the option -mat_superlu_dist_parsymbfact I get many scary memory problems, some involving, for example, ddist_psymbtonum (pdsymbfact_distdata.c:1332).

Note that I consider it unacceptable for running programs to EVER use uninitialized values; until these are all cleaned up I won't trust any runs like this.

Barry

==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x10274C436: MPI_Allgatherv (allgatherv.c:1053)
==42050==    by 0x101557F60: get_perm_c_parmetis (get_perm_c_parmetis.c:285)
==42050==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10155751B: get_perm_c_parmetis (get_perm_c_parmetis.c:96)
==42050==
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x102851C61: MPIR_Allgatherv_intra (allgatherv.c:651)
==42050==    by 0x102853EC7: MPIR_Allgatherv (allgatherv.c:903)
==42050==    by 0x102853F84: MPIR_Allgatherv_impl (allgatherv.c:944)
==42050==    by 0x10274CA41: MPI_Allgatherv (allgatherv.c:1107)
==42050==    by 0x101557F60: get_perm_c_parmetis (get_perm_c_parmetis.c:285)
==42050==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10155751B: get_perm_c_parmetis (get_perm_c_parmetis.c:96)
==42050==
==42049== Syscall param writev(vector[...]) points to uninitialised byte(s)
==42049==    at 0x102DA1C3A: writev (in /usr/lib/system/libsystem_kernel.dylib)
==42049==    by 0x10296A0DC: MPL_large_writev (mplsock.c:32)
==42049==    by 0x10295F6AD: MPIDU_Sock_writev (sock_immed.i:610)
==42049==    by 0x102943FCA: MPIDI_CH3_iSendv (ch3_isendv.c:84)
==42049==    by 0x102934361: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:556)
==42049==    by 0x102939531: MPID_Isend (mpid_isend.c:138)
==42049==    by 0x10277656E: MPI_Isend (isend.c:125)
==42049==    by 0x102088B66: libparmetis__gkMPI_Isend (gkmpi.c:63)
==42049==    by 0x10208140F: libparmetis__CommInterfaceData (comm.c:298)
==42049==    by 0x1020A8758: libparmetis__CompactGraph (ometis.c:553)
==42049==    by 0x1020A77BB: libparmetis__MultilevelOrder (ometis.c:225)
==42049==    by 0x1020A7493: ParMETIS_V32_NodeND (ometis.c:151)
==42049==    by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34)
==42049==    by 0x101557CFC: get_perm_c_parmetis (get_perm_c_parmetis.c:241)
==42049==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048== Syscall param writev(vector[...]) points to uninitialised byte(s)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42049==  Address 0x105edff70 is 1,424 bytes inside a block of size 752,720 alloc'd
==42049==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42049==    by 0x1020EB90C: gk_malloc (memory.c:147)
==42049==    by 0x1020EAA28: gk_mcoreCreate (mcore.c:28)
==42048==    at 0x102DA1C3A: writev (in /usr/lib/system/libsystem_kernel.dylib)
==42048==    by 0x10296A0DC: MPL_large_writev (mplsock.c:32)
==42049==    by 0x1020BA5CF: libparmetis__AllocateWSpace (wspace.c:23)
==42049==    by 0x1020A6E84: ParMETIS_V32_NodeND (ometis.c:98)
==42048==    by 0x10295F6AD: MPIDU_Sock_writev (sock_immed.i:610)
==42048==    by 0x102943FCA: MPIDI_CH3_iSendv (ch3_isendv.c:84)
==42048==    by 0x102934361: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:556)
==42049==    by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34)
==42049==    by 0x101557CFC: get_perm_c_parmetis (get_perm_c_parmetis.c:241)
==42049==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42048==    by 0x102939531: MPID_Isend (mpid_isend.c:138)
==42048==    by 0x10277656E: MPI_Isend (isend.c:125)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x102088B66: libparmetis__gkMPI_Isend (gkmpi.c:63)
==42048==    by 0x10208140F: libparmetis__CommInterfaceData (comm.c:298)
==42049==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42049==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1020A8758: libparmetis__CompactGraph (ometis.c:553)
==42048==    by 0x1020A77BB: libparmetis__MultilevelOrder (ometis.c:225)
==42048==    by 0x1020A7493: ParMETIS_V32_NodeND (ometis.c:151)
==42049==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42049==    by 0x100001B3C: main (in ./ex19)
==42049==  Uninitialised value was created by a heap allocation
==42049==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42049==    by 0x1020EB90C: gk_malloc (memory.c:147)
==42048==    by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34)
==42048==    by 0x101557CFC: get_perm_c_parmetis (get_perm_c_parmetis.c:241)
==42048==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42049==    by 0x10211C50B: libmetis__imalloc (gklib.c:24)
==42049==    by 0x1020A8566: libparmetis__CompactGraph (ometis.c:519)
==42049==    by 0x1020A77BB: libparmetis__MultilevelOrder (ometis.c:225)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x1020A7493: ParMETIS_V32_NodeND (ometis.c:151)
==42049==    by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34)
==42049==    by 0x101557CFC: get_perm_c_parmetis (get_perm_c_parmetis.c:241)
==42049==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42049==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42049==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==  Address 0x10597a860 is 1,408 bytes inside a block of size 752,720 alloc'd
==42049==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42049==    by 0x100001B3C: main (in ./ex19)
==42049==
==42048==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42048==    by 0x1020EB90C: gk_malloc (memory.c:147)
==42048==    by 0x1020EAA28: gk_mcoreCreate (mcore.c:28)
==42048==    by 0x1020BA5CF: libparmetis__AllocateWSpace (wspace.c:23)
==42048==    by 0x1020A6E84: ParMETIS_V32_NodeND (ometis.c:98)
==42048==    by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34)
==42048==    by 0x101557CFC: get_perm_c_parmetis (get_perm_c_parmetis.c:241)
==42048==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048==  Uninitialised value was created by a heap allocation
==42048==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42048==    by 0x1020EB90C: gk_malloc (memory.c:147)
==42048==    by 0x10211C50B: libmetis__imalloc (gklib.c:24)
==42048==    by 0x1020A8566: libparmetis__CompactGraph (ometis.c:519)
==42048==    by 0x1020A77BB: libparmetis__MultilevelOrder (ometis.c:225)
==42048==    by 0x1020A7493: ParMETIS_V32_NodeND (ometis.c:151)
==42048==    by 0x1020A6AFB: ParMETIS_V3_NodeND (ometis.c:34)
==42048==    by 0x101557CFC: get_perm_c_parmetis (get_perm_c_parmetis.c:241)
==42048==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048==
==42048== Syscall param write(buf) points to uninitialised byte(s)
==42048==    at 0x102DA1C22: write (in /usr/lib/system/libsystem_kernel.dylib)
==42048==    by 0x10295F5BD: MPIDU_Sock_write (sock_immed.i:525)
==42048==    by 0x102944839: MPIDI_CH3_iStartMsg (ch3_istartmsg.c:86)
==42048==    by 0x102933B80: MPIDI_CH3_EagerContigShortSend (ch3u_eager.c:257)
==42048==    by 0x10293ADBA: MPID_Send (mpid_send.c:130)
==42048==    by 0x10277A1FA: MPI_Send (send.c:127)
==42048==    by 0x10155802F: get_perm_c_parmetis (get_perm_c_parmetis.c:299)
==42048==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048==  Address 0x104810704 is on thread 1's stack
==42048==  in frame #3, created by MPIDI_CH3_EagerContigShortSend (ch3u_eager.c:218)
==42048==  Uninitialised value was created by a heap allocation
==42048==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42048==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42048==    by 0x101557AB9: get_perm_c_parmetis (get_perm_c_parmetis.c:185)
==42048==    by 0x101501192: pdgssvx (pdgssvx.c:934)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048==
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x102744CB8: MPI_Alltoallv (alltoallv.c:480)
==42050==    by 0x101510B3E: dist_symbLU (pdsymbfact_distdata.c:539)
==42050==    by 0x10150A5C6: ddist_psymbtonum (pdsymbfact_distdata.c:1275)
==42050==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10150E4C4: dist_symbLU (pdsymbfact_distdata.c:96)
==42050==
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x102744E43: MPI_Alltoallv (alltoallv.c:490)
==42050==    by 0x101510B3E: dist_symbLU (pdsymbfact_distdata.c:539)
==42050==    by 0x10150A5C6: ddist_psymbtonum (pdsymbfact_distdata.c:1275)
==42050==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10150E4C4: dist_symbLU (pdsymbfact_distdata.c:96)
==42050==
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x102744EBF: MPI_Alltoallv (alltoallv.c:497)
==42050==    by 0x101510B3E: dist_symbLU (pdsymbfact_distdata.c:539)
==42050==    by 0x10150A5C6: ddist_psymbtonum (pdsymbfact_distdata.c:1275)
==42050==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10150E4C4: dist_symbLU (pdsymbfact_distdata.c:96)
==42050==
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x1027450B1: MPI_Alltoallv (alltoallv.c:512)
==42050==    by 0x101510B3E: dist_symbLU (pdsymbfact_distdata.c:539)
==42050==    by 0x10150A5C6: ddist_psymbtonum (pdsymbfact_distdata.c:1275)
==42050==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10150E4C4: dist_symbLU (pdsymbfact_distdata.c:96)
==42050==
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x10283FB06: MPIR_Alltoallv_intra (alltoallv.c:92)
==42050==    by 0x1028407B6: MPIR_Alltoallv (alltoallv.c:343)
==42050==    by 0x102840884: MPIR_Alltoallv_impl (alltoallv.c:380)
==42050==    by 0x10274541B: MPI_Alltoallv (alltoallv.c:531)
==42050==    by 0x101510B3E: dist_symbLU (pdsymbfact_distdata.c:539)
==42050==    by 0x10150A5C6: ddist_psymbtonum (pdsymbfact_distdata.c:1275)
==42050==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a stack allocation
==42050==    at 0x10150E4C4: dist_symbLU (pdsymbfact_distdata.c:96)
==42050==
==42050== Syscall param writev(vector[...]) points to uninitialised byte(s)
==42050==    at 0x102DA1C3A: writev (in /usr/lib/system/libsystem_kernel.dylib)
==42050==    by 0x10296A0DC: MPL_large_writev (mplsock.c:32)
==42050==    by 0x10295F6AD: MPIDU_Sock_writev (sock_immed.i:610)
==42050==    by 0x102943FCA: MPIDI_CH3_iSendv (ch3_isendv.c:84)
==42050==    by 0x102934361: MPIDI_CH3_EagerContigIsend (ch3u_eager.c:556)
==42050==    by 0x102939531: MPID_Isend (mpid_isend.c:138)
==42050==    by 0x10277656E: MPI_Isend (isend.c:125)
==42050==    by 0x101524C41: pdgstrf2_trsm (pdgstrf2.c:201)
==42050==    by 0x10151ECBF: pdgstrf (pdgstrf.c:1082)
==42050==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Address 0x1060144d0 is 1,168 bytes inside a block of size 131,072 alloc'd
==42050==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42050==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42050==    by 0x1014FD7AD: doubleMalloc_dist (dmemory.c:145)
==42050==    by 0x10151DA7D: pdgstrf (pdgstrf.c:735)
==42050==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a heap allocation
==42050==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42050==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42050==    by 0x1014FD7AD: doubleMalloc_dist (dmemory.c:145)
==42050==    by 0x10151DA7D: pdgstrf (pdgstrf.c:735)
==42050==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==
==42048== Conditional jump or move depends on uninitialised value(s)
==42048==    at 0x10151F141: pdgstrf (pdgstrf.c:1139)
==42048==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048==  Uninitialised value was created by a heap allocation
==42048==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42048==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42048==    by 0x10150ABE2: ddist_psymbtonum (pdsymbfact_distdata.c:1332)
==42048==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048==
==42049== Conditional jump or move depends on uninitialised value(s)
==42049==    at 0x10151F141: pdgstrf (pdgstrf.c:1139)
==42049==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42049==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42049==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42049==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42049==    by 0x100001B3C: main (in ./ex19)
==42049==  Uninitialised value was created by a heap allocation
==42049==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42049==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42049==    by 0x10150ABE2: ddist_psymbtonum (pdsymbfact_distdata.c:1332)
==42049==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42049==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42049==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42049==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42049==    by 0x100001B3C: main (in ./ex19)
==42049==
==42048== Conditional jump or move depends on uninitialised value(s)
==42048==    at 0x101520054: pdgstrf (pdgstrf.c:1429)
==42048==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42049== Conditional jump or move depends on uninitialised value(s)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048==  Uninitialised value was created by a heap allocation
==42049==    at 0x101520054: pdgstrf (pdgstrf.c:1429)
==42048==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42048==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42049==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42048==    by 0x10150ABE2: ddist_psymbtonum (pdsymbfact_distdata.c:1332)
==42048==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42048==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42048==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42048==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42049==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42048==    by 0x100FF9036: PCSetUp (precon.c:982)
==42048==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42048==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42049==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42049==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42048==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42048==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42049==    by 0x100001B3C: main (in ./ex19)
==42049==  Uninitialised value was created by a heap allocation
==42049==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42048==    by 0x100001B3C: main (in ./ex19)
==42048==
==42049==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42049==    by 0x10150ABE2: ddist_psymbtonum (pdsymbfact_distdata.c:1332)
==42049==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42049==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42049==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42049==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42049==    by 0x100FF9036: PCSetUp (precon.c:982)
==42049==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42049==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42049==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42049==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42049==    by 0x100001B3C: main (in ./ex19)
==42049==
==42050== Conditional jump or move depends on uninitialised value(s)
==42050==    at 0x10151FDE6: pdgstrf (pdgstrf.c:1382)
==42050==    by 0x1015019A5: pdgssvx (pdgssvx.c:1069)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==  Uninitialised value was created by a heap allocation
==42050==    at 0x1000183B1: malloc (vg_replace_malloc.c:303)
==42050==    by 0x10153B704: superlu_malloc_dist (memory.c:108)
==42050==    by 0x10150B241: ddist_psymbtonum (pdsymbfact_distdata.c:1389)
==42050==    by 0x1015018C2: pdgssvx (pdgssvx.c:1057)
==42050==    by 0x1009CFE7A: MatLUFactorNumeric_SuperLU_DIST (superlu_dist.c:414)
==42050==    by 0x10046CC5C: MatLUFactorNumeric (matrix.c:2946)
==42050==    by 0x100F09F2C: PCSetUp_LU (lu.c:152)
==42050==    by 0x100FF9036: PCSetUp (precon.c:982)
==42050==    by 0x1010F54EB: KSPSetUp (itfunc.c:332)
==42050==    by 0x1010F7985: KSPSolve (itfunc.c:546)
==42050==    by 0x10125541E: SNESSolve_NEWTONLS (ls.c:233)
==42050==    by 0x1011C49B7: SNESSolve (snes.c:3906)
==42050==    by 0x100001B3C: main (in ./ex19)
==42050==

> On Jul 20, 2015, at 12:03 PM, [email protected] wrote:
>
> Ok. So I have been creating the full factorization on each process. That
> gives me some hope!
>
> I followed your suggestion and tried to use the runtime option
> '-mat_superlu_dist_parsymbfact'.
> However, now the program crashes with:
>
> Invalid ISPEC at line 484 in file get_perm_c.c
>
> And so on…
>
> According to the SuperLU manual, I should give the option as either YES or
> NO; however, -mat_superlu_dist_parsymbfact YES makes the program crash in the
> same way as above.
> Also, I can't find any reference to -mat_superlu_dist_parsymbfact in the PETSc
> documentation.
>
> Mahir
>
> Mahir Ülker-Kaustell, Kompetenssamordnare, Brokonstruktör, Tekn. Dr, Tyréns AB
> 010 452 30 82, [email protected]
>
> From: Xiaoye S. Li [mailto:[email protected]]
> Sent: 20 July 2015 18:12
> To: Ülker-Kaustell, Mahir
> Cc: Hong; petsc-users
> Subject: Re: [petsc-users] SuperLU MPI-problem
>
> The default SuperLU_DIST setting is to use serial symbolic factorization.
> Therefore, what matters is how much memory you have per MPI task.
>
> The code failed to malloc memory during redistribution of matrix A to the
> {L\U} data structure (using the result of the serial symbolic factorization).
>
> You can use parallel symbolic factorization with the runtime option
> '-mat_superlu_dist_parsymbfact'
>
> Sherry Li
>
>
> On Mon, Jul 20, 2015 at 8:59 AM, [email protected] <[email protected]> wrote:
> Hong:
>
> Previous experience with this equation has shown that it is very difficult
> to solve iteratively. Hence the use of a direct solver.
>
> The large test problem I am trying to solve has slightly less than 10^6
> degrees of freedom. The matrices are derived from finite elements, so they
> are sparse.
> The machine I am working on has 128 GB of RAM. I have estimated the memory
> needed to be less than 20 GB, so if the solver needs twice or even three
> times as much, it should still work well. Or have I completely misunderstood
> something here?
>
> Mahir
>
>
>
> From: Hong [mailto:[email protected]]
> Sent: 20 July 2015 17:39
> To: Ülker-Kaustell, Mahir
> Cc: petsc-users
> Subject: Re: [petsc-users] SuperLU MPI-problem
>
> Mahir:
> Direct solvers consume a large amount of memory. I suggest trying the
> following:
>
> 1. A sparse iterative solver, if [-omega^2 M + K] is not too ill-conditioned.
> You may test it using the small matrix.
>
> 2. Incrementally increase your matrix sizes. Try different matrix orderings.
> Do you get the memory crash in the first symbolic factorization?
> In your case, the matrix data structure stays the same when omega changes, so
> you only need to do one symbolic factorization and reuse it.
>
> 3. Use a machine with more memory.
>
> Hong
>
> Dear Petsc-Users,
>
> I am trying to use PETSc to solve a set of linear equations arising from
> Navier's equation (elastodynamics) in the frequency domain.
> The frequency dependency of the problem requires that the system
>
> [-omega^2 M + K]u = F
>
> where M and K are constant, square, positive definite matrices (mass and
> stiffness respectively) is solved for each frequency omega of interest.
> K is a complex matrix, including material damping.
>
> I have written a PETSc program which solves this problem for a small (1000
> degrees of freedom) test problem on one or several processors, but it keeps
> crashing when I try it on my full-scale (on the order of 10^6 degrees of
> freedom) problem.
>
> The program crashes at KSPSetUp() and, from what I can see in the error
> messages, it appears to consume too much memory.
>
> I would guess that similar problems have come up on this mailing list, so I
> am hoping that someone can push me in the right direction…
>
> Mahir

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
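
A note on the matrix reordering point raised in this thread (Hong's "try different matrix orderings" and Mahir's remark about underestimating reordering): the memory a direct solver needs is driven by fill-in, the new nonzeros created during elimination, and fill-in can depend strongly on the elimination order. The following is only a minimal sketch in plain Python, assuming a symmetric sparsity pattern and no pivoting; it is not how SuperLU_DIST, MUMPS or PETSc implement ordering, and the helper names fill_in and arrow_graph are hypothetical, introduced just for this illustration. It counts fill edges for an "arrow" pattern (one unknown coupled to all others) under two orderings.

```python
def fill_in(adjacency, order):
    """Count fill edges created by symmetric elimination in the given order.

    adjacency maps each vertex to the set of its neighbours (the off-diagonal
    sparsity pattern). Eliminating a vertex connects all of its remaining
    neighbours to each other; every newly created edge is one fill entry.
    """
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}  # work on a copy
    fill = 0
    for v in order:
        nbrs = sorted(adj.pop(v))
        for u in nbrs:
            adj[u].discard(v)
        for i, a in enumerate(nbrs):
            for b in nbrs[i + 1:]:
                if b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill += 1
    return fill


def arrow_graph(n):
    """Pattern of an n-by-n arrow matrix: vertex 0 is coupled to all others."""
    adj = {i: set() for i in range(n)}
    for i in range(1, n):
        adj[0].add(i)
        adj[i].add(0)
    return adj


n = 6
# Eliminate the densely coupled unknown first: its neighbours all get coupled.
hub_first = fill_in(arrow_graph(n), list(range(n)))
# Eliminate it last: no new couplings are created.
hub_last = fill_in(arrow_graph(n), list(range(n - 1, -1, -1)))
print("fill with hub first:", hub_first)
print("fill with hub last: ", hub_last)
```

With the hub eliminated first, every pair of the remaining unknowns becomes coupled; with the hub eliminated last, the pattern stays as sparse as it started. Real finite element matrices are far less extreme, but the same mechanism is why the choice of ordering can change the memory footprint of an LU factorization considerably.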
