Hong, How can I reproduce this? What should happen is KSP gets a Nan which passes a Nan to SNES which passes it up to TS and TS should stop with an error saying it cannot time-step.
One can run with -ksp_error_if_not_converged to get a crash as soon as the the linear solver detects a problem to reproduce some of the old behavior. Barry The logic behind these changes is that previously KSP would stop the program with a error if there was a zero pivot in LU, this means the time step algorithms could not recover from a zero pivot. This was very bad, who wants a program that has been running for 8 hours to crash because at timestep 1,003,415 it got a zero pivot when a simple cut in the tilmestep size would allow the program to keep running. Obviously there are some issues in the new model that have to be worked out. > On Jun 10, 2015, at 8:49 PM, Hong <[email protected]> wrote: > > Barry: > > Why the code does not crash in the case of 'nan': > > Integrating characteristic equations ... > TS: 0.000000 > 0 SNES Function norm 1.490033584870e+02 > 0 KSP Residual norm nan > TS: 0.100000 > 0 SNES Function norm 1.490033584870e+02 > 0 KSP Residual norm nan > ... > > Previously, when ilu/lu encounters a zero pivot, the code crashes, which asks > user to do something about his model or the algorithm; > Now, it gives 'nan', but continues running. > We cannot assume new users know '-ksp_monitor -snes_monitor', and set > '-pc_factor_shift'. The frustration we encountered is that the code seems > working well, until I checked the solutions at each time steps and found them > never change values. Then it took us hours to find out it was caused by shift. > > At least for LU, when zero-pivot occurs, we should throw an error. > > Hong > > Hong, > > This is a question for the entire PETSc development team, not just me. > > The reason I made this change was I was sick of > runs with ILU and nearly zero pivots where the preconditioned residual normed > ended up initially being like 10^20 and then decreased in a few iterations to > 10^10 and so the iteration stopped while, in fact, no progress on the true > residual norm at all, meanwhile the user thinks the linear solver is working > fine. I figured was better to just "stop" the entire process so the user can > do something smarter for preconditioning than using ILU(0). At worse the user > can now add the pivot option themselves. > > Was this a good choice, perhaps not? Should we just go back to the old > default? Probably we need to at least have better help in terms of telling > people to turn on that option? > > > Barry > > > > > On Jun 10, 2015, at 4:47 PM, Hong <[email protected]> wrote: > > > > Daniel, > > > > Try to use it with the flag: '-pc_type svd'. Now it works. > > Great, you found the problem! > > > > This is very strange > > OK, this must be the change made on the default pc_factor_shift used for > > ilu/lu. > > In v3.5.4, it is > > -pc_factor_shift_type INBLOCKS > > > > now, > > -pc_factor_shift_type NONE > > > > Adding the option '-pc_factor_shift_type INBLOCKS' > > or '-pc_factor_shift_type NONZERO' > > the code runs well :-) > > > > Barry: > > For ilu, a shift should be used to avoid zero pivot; > > for lu, '-pc_factor_shift_type NONE' should be the default. > > Why do we make such change? > > It seems you made such change: > > https://bitbucket.org/petsc/petsc/commits/422a814ef4a731b8529cf3a6428db526d183e312 > > > > Hong > > > > > > > > On Wed, Jun 10, 2015 at 3:11 PM, Adrian Maldonado <[email protected]> > > wrote: > > I have been able to replicate the error with the new petsc-dev. > > > > KSP does not converge. However, the Jacobian matrix is the same!!! > > > > This jacobian matrix is not very well conditioned (condition number > > 1.4814e+03 for a 12x12 matrix) because of the valve... but it didn't give > > me any problem before. > > > > In fact, you can see that I have implemented the same in matlab (matlab > > folder) and solved the problem with a vanilla newton-rhapson iteration (no > > preconditioner etc...) and I didn't have any problem at all. > > > > > > > > On Wed, Jun 10, 2015 at 2:37 PM, Adrian Maldonado <[email protected]> > > wrote: > > I am upgrading to the new petsc-dev, I will let you know what I find. Maybe > > is an issue of the finite difference jacobian (how is generated)? > > > > I would love to go to to the conference! How can I register? > > > > On Wed, Jun 10, 2015 at 12:55 PM, Hong <[email protected]> wrote: > > I found which change made difference: > > When I change Makefile > > $ diff old.Makefile new.Makefile > > 1,2c1,2 > > < include ${PETSC_DIR}/conf/variables > > < include ${PETSC_DIR}/conf/rules > > --- > > > include ${PETSC_DIR}/lib/petsc/conf/variables > > > include ${PETSC_DIR}/lib/petsc/conf/rules > > > > to build tubtrans with petsc-dev, then I get > > > > Loaded case file > > 0 KSP Residual norm nan > > SYSTEM SUMMARY > > > > Number of variables: 12 > > Number of nodes: 2 > > Number of pipes: 1 > > Integrating characteristic equations ... > > TS: 0.000000 > > 0 KSP Residual norm nan > > > > Some petsc solvers have changed, but it should not behave like this. > > We have to use a debugger to figure out which step causes this problem :-( > > > > Hong > > > > On Wed, Jun 10, 2015 at 10:57 AM, Adrian Maldonado <[email protected]> > > wrote: > > Sure. > > > > This one has the object files, etc.. > > > > I will check the new petsc-dev this afternoon. > > > > On Wed, Jun 10, 2015 at 10:42 AM, Hong <[email protected]> wrote: > > How about make a tarball of your current tubtrans and send to me > > (or use google drive/dropbox)? > > > > Hong > > > > > > On Wed, Jun 10, 2015 at 10:28 AM, Adrian Maldonado <[email protected]> > > wrote: > > Strange. > > > > I am using the master branch from the bitbucket repo > > (https://bitbucket.org/damalbel/tubtrans.git). > > > > The jacobian is the finite difference jacobian (-snes_fd) as I didn't write > > one (I did write one in the matlab code prototype, though). > > > > What I can think now is to check this singular jacobian and see what > > equation is causing the problem. > > > > Let me clone the new petsc release and try to replicate the problem. I have > > a couple of things to do this morning but I will get back to you this > > afternoon. > > > > On Wed, Jun 10, 2015 at 9:54 AM, Hong <[email protected]> wrote: > > Daniel, > > > > It seems Jacobian is singular: > > ./tubtrans -snes_monitor -ksp_monitor |more > > > > Loaded case file > > 0 SNES Function norm 1.500947612239e+02 > > 0 KSP Residual norm -nan > > SYSTEM SUMMARY > > > > Number of variables: 24 > > Number of nodes: 4 > > Number of pipes: 2 > > Integrating characteristic equations ... > > TS: 0.000000 > > 0 SNES Function norm 1.490033613070e+02 > > 0 KSP Residual norm -nan > > > > Do you use the same version in > > git clone https://bitbucket.org/damalbel/tubtrans.git? > > > > Can you apply my patch to this version and test it with new petsc-3.5 (we > > are working on its release)? > > The best way to get petsc-dev is using > > • git clone https://bitbucket.org/petsc/petsc > > We will have a PETSc conference next week > > http://www.mcs.anl.gov/petsc/petsc-20.pdf > > > > Would you like join us? > > Hong > > > > > > On Wed, Jun 10, 2015 at 8:17 AM, Adrian Maldonado <[email protected]> > > wrote: > > P.S: I have applied your patch, and using PETSC 3.5.2 the program works all > > right. I have attached the logs of running test.xml and test2.xml. > > > > Again, let me check out the new petsc release. > > > > On Wed, Jun 10, 2015 at 8:08 AM, Adrian Maldonado <[email protected]> > > wrote: > > I don't see why it should not work. I can see two things that we can check: > > > > - Is SNES iterating? Can you check for -snes_monitor and > > -snes_converged_reason > > - In the program, I initialize the the solutions vector 'x'to ones as an > > initial guess (line 102), maybe somehow petsc is unable to write on it? > > > > Let me clone the new petsc-dev version and I will try to replicate the > > error. > > > > On Tue, Jun 9, 2015 at 4:56 PM, Hong <[email protected]> wrote: > > Danniel, > > Thanks for detailed instruction. I'm using petsc-dev, which will be > > released today or tomorrow. I'm building a release version to see if that > > causes the problem. > > > > However, your code should generate same solution with my patch (attached) > > because it only replace > > VecGetArray() with VecGetArrayRead(), and fix a memory leak problem. > > > > Here is what I did: > > 1. install petsc-dev (see > > http://www.mcs.anl.gov/petsc/developers/index.html) > > git clone https://bitbucket.org/petsc/petsc > > > > 2. build petsc-dev > > > > 3. git clone https://bitbucket.org/damalbel/tubtrans.git > > patch it: > > cd tubtrans > > patch -Np1 < patch_tubtrans > > make tubtrans > > > > 4. run tubtrans > > ./tubtrans > > > > Loaded case file > > SYSTEM SUMMARY > > > > Number of variables: 24 > > Number of nodes: 4 > > Number of pipes: 2 > > Integrating characteristic equations ... > > TS: 0.000000 > > Printing solution vector... > > Vec Object: 1 MPI processes > > type: seq > > 1 > > 1 > > 1 > > 1 > > 1 > > ... > > > > Changing to different input file: > > ./tubtrans -f input/test.xml |more > > Loaded case file > > SYSTEM SUMMARY > > > > Number of variables: 12 > > Number of nodes: 2 > > Number of pipes: 1 > > Integrating characteristic equations ... > > TS: 0.000000 > > Printing solution vector... > > Vec Object: 1 MPI processes > > type: seq > > 1 > > 1 > > 1 > > 1 > > 1 > > 1 > > 1 > > 1 > > 1 > > 1 > > 1 > > 1 > > TS: 0.100000 > > Printing solution vector... > > Vec Object: 1 MPI processes > > type: seq > > 1 > > 1 > > 1 > > > > Do I do something wrong? > > > > Hong > > > > > > On Tue, Jun 9, 2015 at 12:15 AM, Adrian Maldonado <[email protected]> > > wrote: > > Professor Hong, > > > > I am glad to hear from you! I am happy that you are finding the code > > useful. As for the solution vector being constant ones - that doesn't seem > > right. > > > > The master branch of the bitbucket repository with the Petsc Release > > Version 3.5.2 seems to work fine in my laptop. > > > > I have written a small walk through below. > > > > Please, let me know if you want me to take a look at the code or I can help > > you in any other way. > > > > Walk through: > > > > - The program uses XML (libxml2) to parse data for different case > > scenarios. The cases are stored in the directory 'input/' and you can > > select the case to run in the petscopt file (e.g -f input/test.xml) > > > > - The file test.xml, for example, includes a simple one-pipe test scenario > > from the textbook 'Fluid Transients' by Wyle. The content is shown below: > > > > <?xml version="1.0"?> > > <case> > > <nnode>2</nnode> > > <pipe id="1"> > > <fr>1</fr> > > <to>2</to> > > <fric>0.018</fric> > > <diam>0.5</diam> > > <a>1200</a> > > <len>600</len> > > </pipe> > > <reservoir id="1"> > > <node>1</node> > > <head>150</head> > > </reservoir> > > > > <valve id="1"> > > <node>2</node> > > <lossc>0.009</lossc> > > </valve> > > </case> > > > > In this case we have a conduct of two nodes (boundary equations). The only > > conduct (or pipe) goes from node 1 to node 2. The entries 'fric', 'diam', > > 'a' and 'len' are the parameters of the pipeline. Then, the file includes a > > reservoir and a valve as boundary equations. > > > > The reservoir defines a constant pressure (water head) in the node 1 > > whereas the valve defines a relationship between the pressure and the flow > > with friction losses. > > > > The program, after creating the necessary data structures and > > discretization (according to Courant number) will proceed to the > > initialization. The initialization is solving the system state for the > > steady-state conditions with SNES. > > > > Running the program in gdb for 'test.xml' and printing the 'x' vector will > > produce this: > > > > Breakpoint 1, main (argc=1, argv=0x7fffffffde68) at main.c:115 > > 115 ierr = SNESSolve(snes, NULL, x);CHKERRQ(ierr); > > (gdb) n > > 120 ierr = MatDenseGetArray(SOL, &ss);CHKERRQ(ierr); > > (gdb) p VecView(x, 0) > > Vec Object: 1 MPI processes > > type: seq > > 0.477432 > > 150 > > 0.477432 > > 148.698 > > 0.477432 > > 147.395 > > 0.477432 > > 146.093 > > 0.477432 > > 144.791 > > 0.477432 > > 143.488 > > > > The ordering here is, according to the discretization x = [Q0 H0 Q1 H1 ... > > Qn Hn]' . Where Q is the water flow and H is the water head or pressure. > > You can see how the water flow is constant in steady state along the > > pipeline and the pressure in the first node is 150, as the reservoir sets > > this as boundary equation. > > > > For the transient we simulate the sudden closure of the valve (dirac delta) > > although we can implement different closure dynamics. > > > > I have compiled the debug mode of the program (make debug) which prints the > > 'x' vector and each time step. I am attaching the results in the file > > log.txt so you can compare it. Each time step solution is stored in the > > matrix SOL and finally written to a binary file 'output.out'. > > > > The first integration time-step looks like: > > > > 0.477432 > > 150 > > 0.477432 > > 148.698 > > 0.477432 > > 147.395 > > 0.477432 > > 146.093 > > 0.477432 > > 144.791 > > -1.11838e-12 > > 441.046 > > > > The pen-ultimate entry of the x vector is the flow at the valve, which is 0 > > after the valve closes. At the end of the valve an steep rise in the > > pressure happens, as expected. This pressure wave will oscillate along the > > pipe according to the method of characteristics. > > > > More complex scenarios are included in '/input' and a script with some > > python macros 'intPlot.py' is included to produce plots of the pressure and > > flow profiles within the system. > > > > I hope this helped you and please, let me know in what can I help. > > > > Yours, > > > > > > On Mon, Jun 8, 2015 at 10:06 PM, Hong <[email protected]> wrote: > > Danniel, > > > > Hello, this is Hong, your professor in CS595 :-) > > > > We are working on a project for simulating water cycle. The math model to > > be started is a shallow water river, very similar to your project. I start > > looking at your codes. After small update with latest petsc-dev, I'm able > > to run it. > > > > Look at the solutions at each time step, I found that solution vector x > > generated in main.c are alway a constant vector, x=1. > > Am I doing the right thing? > > > > Hong > > > > > > > > -- > > D. Adrian Maldonado, PhD Candidate > > Electrical & Computer Engineering Dept. > > Illinois Institute of Technology > > 3301 S. Dearborn Street, Chicago, IL 60616 > > > > > > > > > > -- > > D. Adrian Maldonado, PhD Candidate > > Electrical & Computer Engineering Dept. > > Illinois Institute of Technology > > 3301 S. Dearborn Street, Chicago, IL 60616 > > > > > > > > -- > > D. Adrian Maldonado, PhD Candidate > > Electrical & Computer Engineering Dept. > > Illinois Institute of Technology > > 3301 S. Dearborn Street, Chicago, IL 60616 > > > > > > > > > > -- > > D. Adrian Maldonado, PhD Candidate > > Electrical & Computer Engineering Dept. > > Illinois Institute of Technology > > 3301 S. Dearborn Street, Chicago, IL 60616 > > > > > > > > > > -- > > D. Adrian Maldonado, PhD Candidate > > Electrical & Computer Engineering Dept. > > Illinois Institute of Technology > > 3301 S. Dearborn Street, Chicago, IL 60616 > > > > > > > > > > -- > > D. Adrian Maldonado, PhD Candidate > > Electrical & Computer Engineering Dept. > > Illinois Institute of Technology > > 3301 S. Dearborn Street, Chicago, IL 60616 > > > > > > > > -- > > D. Adrian Maldonado, PhD Candidate > > Electrical & Computer Engineering Dept. > > Illinois Institute of Technology > > 3301 S. Dearborn Street, Chicago, IL 60616 > > > > > > > > -- > > D. Adrian Maldonado, PhD Candidate > > Electrical & Computer Engineering Dept. > > Illinois Institute of Technology > > 3301 S. Dearborn Street, Chicago, IL 60616 > > > >
