Re: [petsc-users] Question about TSComputeRHSJacobianConstant
> On May 16, 2019, at 8:04 PM, Sajid Ali wrote:
>
> While there is a ~3.5X speedup, deleting the aforementioned 20 lines also
> leads the new version of petsc to give the wrong solution (off by orders of
> magnitude for the same program).

Ok, sorry about this. Unfortunately this stuff has been giving us headaches for years and we are struggling to get it right.

> I tried switching over to the IFunction/IJacobian interface as per the
> manual (page 146), which states that the following lines:

It is probably better not to switch to the IFunction/IJacobian; we are more likely to get the TS version working properly.

> ```
> TSSetProblemType(ts,TSLINEAR);
> TSSetRHSFunction(ts,NULL,TSComputeRHSFunctionLinear,NULL);
> TSSetRHSJacobian(ts,A,A,TSComputeRHSJacobianConstant,NULL);
> ```
>
> are equivalent to:
>
> ```
> TSSetProblemType(ts,TSLINEAR);
> TSSetIFunction(ts,NULL,TSComputeIFunctionLinear,NULL);
> TSSetIJacobian(ts,A,A,TSComputeIJacobianConstant,NULL);
> ```
>
> But the example at src/ts/examples/tutorials/ex3.c employs a strategy of
> setting a shift flag to prevent re-computation for time-independent problems.
> Moreover, the docs say "using this function (TSComputeIFunctionLinear) is NOT
> equivalent to using TSComputeRHSFunctionLinear()", and now I'm even more
> confused.
>
> PS: Doing the simple switch is as slow as the original code, and the answer
> is wrong as well.
>
> Thank You,
> Sajid Ali
> Applied Physics
> Northwestern University
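The constant-Jacobian reuse under discussion can be sketched without PETSc at all. For the linear ODE du/dt = A*u, backward Euler solves (I/dt - A) u_new = u_old/dt each step; when both A and dt are fixed, the shifted matrix is built once and reused for every step. A minimal plain-C sketch (hypothetical helper names, not PETSc code):

```c
#include <assert.h>
#include <math.h>

/* Plain-C illustration (hypothetical names, NOT PETSc code): backward Euler
 * for the linear ODE du/dt = A*u.  Each step solves (I/dt - A) u_new = u_old/dt.
 * Because A is constant in time, the shifted matrix is built once, outside
 * the stepping loop, and reused for every step -- the behavior
 * TSComputeRHSJacobianConstant is supposed to deliver. */

typedef struct { double a[2][2]; } Mat2;

/* S = (1/dt)*I - A: the only place dt enters the linear system. */
static Mat2 shifted(Mat2 A, double dt) {
  Mat2 S;
  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 2; j++)
      S.a[i][j] = (i == j ? 1.0 / dt : 0.0) - A.a[i][j];
  return S;
}

/* Solve the 2x2 system S*x = b by Cramer's rule (adequate for a sketch). */
static void solve2(Mat2 S, const double b[2], double x[2]) {
  double det = S.a[0][0] * S.a[1][1] - S.a[0][1] * S.a[1][0];
  x[0] = (b[0] * S.a[1][1] - S.a[0][1] * b[1]) / det;
  x[1] = (S.a[0][0] * b[1] - b[0] * S.a[1][0]) / det;
}

/* March n backward-Euler steps; note S is computed exactly once. */
static void be_march(Mat2 A, double dt, int n, double u[2]) {
  Mat2 S = shifted(A, dt);
  for (int k = 0; k < n; k++) {
    double rhs[2] = { u[0] / dt, u[1] / dt };
    solve2(S, rhs, u);
  }
}
```

Whatever the step count n, the shift is performed once; if TSJacobianEval time grows with n, something is rebuilding per step that should not be.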
Re: [petsc-users] Question about TSComputeRHSJacobianConstant
Hi Sajid,

Can you please quickly try the branch hongzh/fix-computejacobian and see if it makes a difference?

Thanks,
Hong (Mr.)

> On May 16, 2019, at 8:04 PM, Sajid Ali via petsc-users <petsc-users@mcs.anl.gov> wrote:
>
> While there is a ~3.5X speedup, deleting the aforementioned 20 lines also
> leads the new version of petsc to give the wrong solution (off by orders of
> magnitude for the same program).
>
> I tried switching over to the IFunction/IJacobian interface as per the
> manual (page 146), which states that the following lines:
>
> ```
> TSSetProblemType(ts,TSLINEAR);
> TSSetRHSFunction(ts,NULL,TSComputeRHSFunctionLinear,NULL);
> TSSetRHSJacobian(ts,A,A,TSComputeRHSJacobianConstant,NULL);
> ```
>
> are equivalent to:
>
> ```
> TSSetProblemType(ts,TSLINEAR);
> TSSetIFunction(ts,NULL,TSComputeIFunctionLinear,NULL);
> TSSetIJacobian(ts,A,A,TSComputeIJacobianConstant,NULL);
> ```
>
> But the example at src/ts/examples/tutorials/ex3.c employs a strategy of
> setting a shift flag to prevent re-computation for time-independent problems.
> Moreover, the docs say "using this function (TSComputeIFunctionLinear) is NOT
> equivalent to using TSComputeRHSFunctionLinear()", and now I'm even more
> confused.
>
> PS: Doing the simple switch is as slow as the original code, and the answer
> is wrong as well.
>
> Thank You,
> Sajid Ali
> Applied Physics
> Northwestern University
Re: [petsc-users] Question about TSComputeRHSJacobianConstant
While there is a ~3.5X speedup, deleting the aforementioned 20 lines also leads the new version of petsc to give the wrong solution (off by orders of magnitude for the same program).

I tried switching over to the IFunction/IJacobian interface as per the manual (page 146), which states that the following lines:

```
TSSetProblemType(ts,TSLINEAR);
TSSetRHSFunction(ts,NULL,TSComputeRHSFunctionLinear,NULL);
TSSetRHSJacobian(ts,A,A,TSComputeRHSJacobianConstant,NULL);
```

are equivalent to:

```
TSSetProblemType(ts,TSLINEAR);
TSSetIFunction(ts,NULL,TSComputeIFunctionLinear,NULL);
TSSetIJacobian(ts,A,A,TSComputeIJacobianConstant,NULL);
```

But the example at src/ts/examples/tutorials/ex3.c employs a strategy of setting a shift flag to prevent re-computation for time-independent problems. Moreover, the docs say "using this function (TSComputeIFunctionLinear) is NOT equivalent to using TSComputeRHSFunctionLinear()", and now I'm even more confused.

PS: Doing the simple switch is as slow as the original code, and the answer is wrong as well.

Thank You,
Sajid Ali
Applied Physics
Northwestern University
Re: [petsc-users] Question about TSComputeRHSJacobianConstant
Hi Barry,

Thanks a lot for pointing this out. I'm seeing a ~3X speedup in time! Attached are the new log files. Does everything look right?

Thank You,
Sajid Ali
Applied Physics
Northwestern University

Attachments: out_50, out_100 (binary data)
Re: [petsc-users] MatCreateBAIJ, SNES, Preallocation...
Ok, got it. My misinterpretation was how to fill the d_nnz and o_nnz arrays. Thank you for your help!

Might I make a suggestion related to the documentation? Perhaps I have not fully read the page on MatMPIBAIJSetPreallocation, so you can simply disregard, and I'm ok with that! The documentation says for d_nnz:

    d_nnz - array containing the number of block nonzeros in the various block
    rows in the diagonal portion of the local submatrix (possibly different for
    each block row) or NULL. If you plan to factor the matrix you must leave
    room for the diagonal entry and set it even if it is zero.

Am I correct in that this array should be of size numRows, where numRows is found from calling MatGetOwnershipRange(J,&iLow,&iHigh), so numRows=iHigh-iLow? I think my error was allocating this only to be numRows/bs, since I thought it's a block-size thing. When I fixed this, things worked really fast! Maybe it's obvious it should be that size. :-)

Regardless, thanks for your help. PETSc is great! I'm a believer and user.

From: Smith, Barry F. [bsm...@mcs.anl.gov]
Sent: Thursday, May 16, 2019 4:07 PM
To: William Coirier
Cc: petsc-users@mcs.anl.gov; Michael Robinson; Andrew Holm
Subject: Re: [petsc-users] MatCreateBAIJ, SNES, Preallocation...

> On May 16, 2019, at 3:44 PM, William Coirier via petsc-users wrote:
>
> Folks:
>
> I'm developing an application using the SNES, and overall it's working great,
> as with many of our other PETSc-based projects. But I'm having a problem
> related to (presumably) pre-allocation, block matrices and SNES.
>
> Without going into details about the actual problem we are solving, here are
> the symptoms/characteristics/behavior.
>
> • For the SNES Jacobian, I'm using MatCreateBAIJ for a block size bs=3, and
>   letting "PETSC_DECIDE" the partitioning.
>   Actual call is:
>
>   ierr = MatCreateBAIJ(PETSC_COMM_WORLD, bs, PETSC_DECIDE, PETSC_DECIDE,
>                        (int)3 * numNodesSAM, (int)3 * numNodesSAM,
>                        PETSC_DEFAULT, NULL, PETSC_DEFAULT, NULL, &J);
>
> • When registering the SNES Jacobian function, I set the B and J matrices to
>   be the same:
>
>   ierr = SNESSetJacobian(snes, J, J, SAMformSNESJ, (void *)this); CHKERRQ(ierr);
>
> • I can either let PETSc figure out the allocation structure:
>
>   ierr = MatMPIBAIJSetPreallocation(J, bs, PETSC_DEFAULT, NULL, PETSC_DEFAULT, NULL);
>
>   or do it myself, since I know the fill pattern:
>
>   ierr = MatMPIBAIJSetPreallocation(J, bs, d_nz_dum, &d_nnz[0], o_nz_dum, &o_nnz[0]);
>
> The symptoms/problems are as follows:
>
> • Whether I do preallocation or not, the "setup" time is pretty long. It
>   might take 2 minutes before SNES starts doing its thing. After this setup,
>   convergence and speed is great, but this first phase takes a long time. I'm
>   assuming this has to be related to some poor preallocation setup, so it's
>   doing tons of mallocs where it's not needed.

You should definitely get much better performance with proper preallocation than with none (unless the default is enough for your matrix). Run with -info and grep for "malloc"; this will tell you exactly how many mallocs, if any, are taking place inside MatSetValues() due to improper preallocation.

> • If I don't call my Jacobian formulation before calling SNESSolve, I get a
>   segmentation violation in a PETSc routine.

Not sure what you mean by Jacobian formulation, but I'm guessing filling up the Jacobian with numerical values? Something is wrong, because you should not need to fill up the values before calling SNESSolve, and regardless it should never ever crash with a segmentation violation.
You can run with valgrind (https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind) to make sure that it is not a memory corruption issue. You can also run in the debugger (perhaps with the PETSc command line option -start_in_debugger) to get more details on why it is crashing. When you have it running satisfactorily, you can send us the output from running with -log_view and we can let you know how it seems to be performing efficiency-wise.

   Barry

> (If I DO call my Jacobian first, things work great, although slow for the
> setup phase.) Here's a snippet of the traceback:
>
> #0 0x009649fc in MatMultAdd_SeqBAIJ_3 (A=, xx=0x3a525b0, yy=0x3a531b0, zz=0x3a531b0)
>     at /home/jstutts/Downloads/petsc-3.11.1/src/mat/impls/baij/seq/baij2.c:1424
> #1 0x006444cb in MatMult_MPIBAIJ (A=0x15da340, xx=0x3a542a0, yy=0x3a531b0)
>     at /home/jstutts/Downloads/petsc-3.11.1/src/mat/impls/baij/mpi/mpibaij.c:1380
> #2 0x005b2c0f in MatMult (mat=0x15da340, x=x@entry=0x3a542a0,
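One recurring subtlety in this exchange is point rows versus block rows: MatGetOwnershipRange() reports point-row indices, while BAIJ preallocation is described per block row, so the two lengths differ by a factor of bs. A tiny sketch of the bookkeeping (hypothetical helper names, not PETSc API):

```c
#include <assert.h>

/* Sketch (hypothetical helpers, NOT PETSc API) of the row bookkeeping in the
 * thread above.  MatGetOwnershipRange() reports *point* rows; with block size
 * bs, the locally owned block-row count is (iHigh - iLow)/bs.  Mixing the two
 * up is exactly the kind of off-by-bs mistake that silently defeats
 * preallocation and triggers mallocs inside MatSetValues(). */

static int local_point_rows(int iLow, int iHigh) {
  return iHigh - iLow;
}

static int local_block_rows(int iLow, int iHigh, int bs) {
  int n = iHigh - iLow;
  assert(n % bs == 0);  /* ownership ranges fall on block boundaries */
  return n / bs;
}
```

For the 453195-row, bs=3 matrix discussed below, a rank owning the whole matrix would have 453195 point rows but only 151065 block rows.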
Re: [petsc-users] MatCreateBAIJ, SNES, Preallocation...
Barry:

Thanks for the quick response! Running with -info gives nearly the same number of mallocs whether I "prealloc" or not. I'll bet I'm doing something wrong with the preallocation. I must know the matrix structure, since convergence is really good with SNES.

I should have 9232128 total nonzeros, and when I do -info -mat_view ::ascii_info I see that in the diagnostic output, but I also see a lot of allocated nonzeros:

Mat Object: SNES_Jacobian 8 MPI processes
  type: mpibaij
  rows=453195, cols=453195, bs=3
  total: nonzeros=9232128, allocated nonzeros=203660352
  total number of mallocs used during MatSetValues calls =146300
      block size is 3

Grepping for malloc in the output shows this initially (8 processors) and then zeros afterwards. Makes sense.

[0] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 18884
[3] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 18883
[7] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 14122
[4] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 18883
[5] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 18881
[2] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 18882
[1] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 18882
[6] MatAssemblyEnd_SeqBAIJ(): Number of mallocs during MatSetValues is 18883

From: Smith, Barry F. [bsm...@mcs.anl.gov]
Sent: Thursday, May 16, 2019 4:07 PM
To: William Coirier
Cc: petsc-users@mcs.anl.gov; Michael Robinson; Andrew Holm
Subject: Re: [petsc-users] MatCreateBAIJ, SNES, Preallocation...

> On May 16, 2019, at 3:44 PM, William Coirier via petsc-users wrote:
>
> Folks:
>
> I'm developing an application using the SNES, and overall it's working great,
> as with many of our other PETSc-based projects. But I'm having a problem
> related to (presumably) pre-allocation, block matrices and SNES.
> Without going into details about the actual problem we are solving, here are
> the symptoms/characteristics/behavior.
>
> • For the SNES Jacobian, I'm using MatCreateBAIJ for a block size bs=3, and
>   letting "PETSC_DECIDE" the partitioning. Actual call is:
>
>   ierr = MatCreateBAIJ(PETSC_COMM_WORLD, bs, PETSC_DECIDE, PETSC_DECIDE,
>                        (int)3 * numNodesSAM, (int)3 * numNodesSAM,
>                        PETSC_DEFAULT, NULL, PETSC_DEFAULT, NULL, &J);
>
> • When registering the SNES Jacobian function, I set the B and J matrices to
>   be the same:
>
>   ierr = SNESSetJacobian(snes, J, J, SAMformSNESJ, (void *)this); CHKERRQ(ierr);
>
> • I can either let PETSc figure out the allocation structure:
>
>   ierr = MatMPIBAIJSetPreallocation(J, bs, PETSC_DEFAULT, NULL, PETSC_DEFAULT, NULL);
>
>   or do it myself, since I know the fill pattern:
>
>   ierr = MatMPIBAIJSetPreallocation(J, bs, d_nz_dum, &d_nnz[0], o_nz_dum, &o_nnz[0]);
>
> The symptoms/problems are as follows:
>
> • Whether I do preallocation or not, the "setup" time is pretty long. It
>   might take 2 minutes before SNES starts doing its thing. After this setup,
>   convergence and speed is great, but this first phase takes a long time. I'm
>   assuming this has to be related to some poor preallocation setup, so it's
>   doing tons of mallocs where it's not needed.

You should definitely get much better performance with proper preallocation than with none (unless the default is enough for your matrix). Run with -info and grep for "malloc"; this will tell you exactly how many mallocs, if any, are taking place inside MatSetValues() due to improper preallocation.

> • If I don't call my Jacobian formulation before calling SNESSolve, I get a
>   segmentation violation in a PETSc routine.

Not sure what you mean by Jacobian formulation, but I'm guessing filling up the Jacobian with numerical values? Something is wrong, because you should not need to fill up the values before calling SNESSolve, and regardless it should never ever crash with a segmentation violation.
You can run with valgrind (https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind) to make sure that it is not a memory corruption issue. You can also run in the debugger (perhaps with the PETSc command line option -start_in_debugger) to get more details on why it is crashing. When you have it running satisfactorily, you can send us the output from running with -log_view and we can let you know how it seems to be performing efficiency-wise.

   Barry

> (If I DO call my Jacobian first, things work great, although slow for the
> setup phase.) Here's a snippet of the traceback:
>
> #0 0x009649fc in MatMultAdd_SeqBAIJ_3 (A=,
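The ascii_info numbers quoted above can be sanity-checked by hand: with bs=3 every stored block holds 9 scalars, so 9232128 nonzeros correspond to 9232128/9 = 1025792 blocks, while the allocated count of 203660352 shows the default preallocation overshot by more than 20x. A throwaway helper for that arithmetic (assumed names, not PETSc API):

```c
#include <assert.h>

/* Hypothetical diagnostics helpers (NOT PETSc API) for interpreting
 * -mat_view ::ascii_info output from a BAIJ matrix. */

/* Number of bs x bs blocks implied by a scalar nonzero count. */
static long nonzero_blocks(long total_nnz, int bs) {
  assert(total_nnz % ((long)bs * bs) == 0);  /* BAIJ stores whole blocks */
  return total_nnz / ((long)bs * bs);
}

/* How badly the preallocation overshot: allocated / actually used. */
static double prealloc_overshoot(long allocated_nnz, long used_nnz) {
  return (double)allocated_nnz / (double)used_nnz;
}
```

A large overshoot together with a large malloc count is the signature of per-row hints that do not match the true fill pattern.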
Re: [petsc-users] MatCreateBAIJ, SNES, Preallocation...
> On May 16, 2019, at 3:44 PM, William Coirier via petsc-users wrote:
>
> Folks:
>
> I'm developing an application using the SNES, and overall it's working great,
> as with many of our other PETSc-based projects. But I'm having a problem
> related to (presumably) pre-allocation, block matrices and SNES.
>
> Without going into details about the actual problem we are solving, here are
> the symptoms/characteristics/behavior.
>
> • For the SNES Jacobian, I'm using MatCreateBAIJ for a block size bs=3, and
>   letting "PETSC_DECIDE" the partitioning. Actual call is:
>
>   ierr = MatCreateBAIJ(PETSC_COMM_WORLD, bs, PETSC_DECIDE, PETSC_DECIDE,
>                        (int)3 * numNodesSAM, (int)3 * numNodesSAM,
>                        PETSC_DEFAULT, NULL, PETSC_DEFAULT, NULL, &J);
>
> • When registering the SNES Jacobian function, I set the B and J matrices to
>   be the same:
>
>   ierr = SNESSetJacobian(snes, J, J, SAMformSNESJ, (void *)this); CHKERRQ(ierr);
>
> • I can either let PETSc figure out the allocation structure:
>
>   ierr = MatMPIBAIJSetPreallocation(J, bs, PETSC_DEFAULT, NULL, PETSC_DEFAULT, NULL);
>
>   or do it myself, since I know the fill pattern:
>
>   ierr = MatMPIBAIJSetPreallocation(J, bs, d_nz_dum, &d_nnz[0], o_nz_dum, &o_nnz[0]);
>
> The symptoms/problems are as follows:
>
> • Whether I do preallocation or not, the "setup" time is pretty long. It
>   might take 2 minutes before SNES starts doing its thing. After this setup,
>   convergence and speed is great, but this first phase takes a long time. I'm
>   assuming this has to be related to some poor preallocation setup, so it's
>   doing tons of mallocs where it's not needed.

You should definitely get much better performance with proper preallocation than with none (unless the default is enough for your matrix). Run with -info and grep for "malloc"; this will tell you exactly how many mallocs, if any, are taking place inside MatSetValues() due to improper preallocation.
> • If I don't call my Jacobian formulation before calling SNESSolve, I get a
>   segmentation violation in a PETSc routine.

Not sure what you mean by Jacobian formulation, but I'm guessing filling up the Jacobian with numerical values? Something is wrong, because you should not need to fill up the values before calling SNESSolve, and regardless it should never ever crash with a segmentation violation.

You can run with valgrind (https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind) to make sure that it is not a memory corruption issue. You can also run in the debugger (perhaps with the PETSc command line option -start_in_debugger) to get more details on why it is crashing. When you have it running satisfactorily, you can send us the output from running with -log_view and we can let you know how it seems to be performing efficiency-wise.

   Barry

> (If I DO call my Jacobian first, things work great, although slow for the
> setup phase.) Here's a snippet of the traceback:
>
> #0 0x009649fc in MatMultAdd_SeqBAIJ_3 (A=, xx=0x3a525b0, yy=0x3a531b0, zz=0x3a531b0)
>     at /home/jstutts/Downloads/petsc-3.11.1/src/mat/impls/baij/seq/baij2.c:1424
> #1 0x006444cb in MatMult_MPIBAIJ (A=0x15da340, xx=0x3a542a0, yy=0x3a531b0)
>     at /home/jstutts/Downloads/petsc-3.11.1/src/mat/impls/baij/mpi/mpibaij.c:1380
> #2 0x005b2c0f in MatMult (mat=0x15da340, x=x@entry=0x3a542a0, y=y@entry=0x3a531b0)
>     at /home/jstutts/Downloads/petsc-3.11.1/src/mat/interface/matrix.c:2396
> #3 0x00c61f2e in PCApplyBAorAB (pc=0x1ce78c0, side=PC_LEFT, x=0x3a542a0, y=y@entry=0x3a548a0, work=0x3a531b0)
>     at /home/jstutts/Downloads/petsc-3.11.1/src/ksp/pc/interface/precon.c:690
> #4 0x00ccb36b in KSP_PCApplyBAorAB (w=, y=0x3a548a0, x=, ksp=0x1d44d50)
>     at /home/jstutts/Downloads/petsc-3.11.1/include/petsc/private/kspimpl.h:309
> #5 KSPGMRESCycle (itcount=itcount@entry=0x7fffc02c, ksp=ksp@entry=0x1d44d50)
>     at /home/jstutts/Downloads/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c:152
> #6 0x00ccbf6f in KSPSolve_GMRES (ksp=0x1d44d50)
>     at /home/jstutts/Downloads/petsc-3.11.1/src/ksp/ksp/impls/gmres/gmres.c:237
> #7 0x007dc193 in KSPSolve (ksp=0x1d44d50, b=b@entry=0x1d41c70, x=x@entry=0x1cebf40)
>
> I apologize if I've missed something in the documentation or examples, but I
> can't seem to figure this one out. The "setup" seems to take too long, and
> from my previous experiences with PETSc, this is due to a poor preallocation
> strategy.
>
> Any and all help is appreciated!
>
> ---
> William J. Coirier, Ph.D.
> Director, Aerosciences and Engineering Analysis Branch
> Advanced Concepts Development and Test Division
> Kratos Defense and Rocket Support Services
> 4904 Research Drive
> Huntsville, AL 35805
> 256-327-8170
Re: [petsc-users] Question about TSComputeRHSJacobianConstant
Sajid,

This is a huge, embarrassing performance bug in PETSc: https://bitbucket.org/petsc/petsc/issues/293/refactoring-of-ts-handling-of-reuse-of

It is using 74 percent of the time to perform MatAXPY() on two large sparse matrices, not knowing that they have identical nonzero patterns and that one of them has all zeros off the diagonal. This despite the fact that a few lines higher in the code there is special-purpose code for exactly the case you have, which only stores one matrix and only ever shifts the diagonal of that matrix.

Please edit TSSetUp() and remove the lines

  if (ts->rhsjacobian.reuse && rhsjac == TSComputeRHSJacobianConstant) {
    Mat Amat,Pmat;
    SNES snes;
    ierr = TSGetSNES(ts,&snes);CHKERRQ(ierr);
    ierr = SNESGetJacobian(snes,&Amat,&Pmat,NULL,NULL);CHKERRQ(ierr);
    /* Matching matrices implies that an IJacobian is NOT set, because if it had been set,
     * the IJacobian's matrix would have displaced the RHS matrix */
    if (Amat && Amat == ts->Arhs) {
      /* we need to copy the values of the matrix because for the constant Jacobian case
       * the user will never set the numerical values in this new location */
      ierr = MatDuplicate(ts->Arhs,MAT_COPY_VALUES,&Amat);CHKERRQ(ierr);
      ierr = SNESSetJacobian(snes,Amat,NULL,NULL,NULL);CHKERRQ(ierr);
      ierr = MatDestroy(&Amat);CHKERRQ(ierr);
    }
    if (Pmat && Pmat == ts->Brhs) {
      ierr = MatDuplicate(ts->Brhs,MAT_COPY_VALUES,&Pmat);CHKERRQ(ierr);
      ierr = SNESSetJacobian(snes,NULL,Pmat,NULL,NULL);CHKERRQ(ierr);
      ierr = MatDestroy(&Pmat);CHKERRQ(ierr);
    }
  }

You will be stunned by the improvement in time.

> On May 16, 2019, at 3:06 PM, Sajid Ali via petsc-users wrote:
>
> Hi PETSc developers,
>
> I have a question about TSComputeRHSJacobianConstant. If I create a TS (of
> type linear) for a problem where the jacobian does not change with time (set
> with the aforementioned option) and run it for different numbers of time
> steps, why does the time it takes to evaluate the jacobian change (as
> indicated by TSJacobianEval)?
> To clarify, I run the example with different TSSetTimeStep but the same
> jacobian matrix. I see that the time spent in KSPSolve increases with
> increasing number of steps (which is as expected, as this is a KSPOnly SNES
> solver). But surprisingly, the time spent in TSJacobianEval also increases
> with decreasing time-step (or increasing number of steps).
>
> For reference, I attach the log files for two cases which were run with
> different time steps, and the source code.
>
> Thank You,
> Sajid Ali
> Applied Physics
> Northwestern University
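The asymmetry Barry points out can be made concrete outside PETSc: forming (1/dt)I - A with a general sparse AXPY rewrites every stored entry, O(nnz) work, whereas if the shifted matrix already holds -A in the same nonzero pattern, a new dt only requires rewriting the n diagonal entries. A plain-C CSR sketch of the cheap path (illustrative only, not the PETSc source):

```c
#include <assert.h>

/* Illustration (plain C, NOT the PETSc source) of the special-purpose path:
 * given a CSR matrix B that already stores -A, changing dt only requires
 * overwriting the stored diagonal entries with 1/dt - diag(A).  A real
 * implementation would cache the diagonal positions to make this O(n);
 * here we scan each row for clarity. */

typedef struct {
  int n;             /* square dimension           */
  const int *rowptr; /* length n+1                 */
  const int *col;    /* length nnz                 */
  double *val;       /* length nnz, mutable values */
} Csr;

/* Overwrite only the stored diagonal entries; returns how many were touched. */
static int shift_diagonal(Csr *B, const double *adiag, double dt) {
  int touched = 0;
  for (int i = 0; i < B->n; i++) {
    for (int k = B->rowptr[i]; k < B->rowptr[i + 1]; k++) {
      if (B->col[k] == i) {
        B->val[k] = 1.0 / dt - adiag[i];
        touched++;
      }
    }
  }
  return touched;
}
```

Only n entries change per shift, which is why the single-matrix code path Barry describes is so much cheaper than a full MatAXPY over two large sparse matrices.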
[petsc-users] Question about TSComputeRHSJacobianConstant
Hi PETSc developers,

I have a question about TSComputeRHSJacobianConstant. If I create a TS (of type linear) for a problem where the jacobian does not change with time (set with the aforementioned option) and run it for different numbers of time steps, why does the time it takes to evaluate the jacobian change (as indicated by TSJacobianEval)?

To clarify, I run the example with different TSSetTimeStep but the same jacobian matrix. I see that the time spent in KSPSolve increases with increasing number of steps (which is as expected, as this is a KSPOnly SNES solver). But surprisingly, the time spent in TSJacobianEval also increases with decreasing time-step (or increasing number of steps).

For reference, I attach the log files for two cases which were run with different time steps, and the source code.

Thank You,
Sajid Ali
Applied Physics
Northwestern University

Attachments: ex_dmda.c, out_50, out_100 (binary data)