Hong,

   How can I reproduce this? What should happen is KSP gets a Nan which passes 
a Nan to SNES which passes it up to TS and TS should stop with an error saying 
it cannot time-step.  

   One can run with -ksp_error_if_not_converged to get a crash as soon as the 
the linear solver detects a problem to reproduce some of the old behavior.

   Barry

  The logic behind these changes is that previously KSP would stop the program 
with a error if there was a zero pivot in LU, this means the time step 
algorithms could not recover from a zero pivot. This was very bad, who wants a 
program that has been running for 8 hours to crash because at timestep 
1,003,415 it got a zero pivot when a simple cut in the tilmestep size would 
allow the program to keep running. 

 Obviously there are some issues in the new model that have to be worked out.


> On Jun 10, 2015, at 8:49 PM, Hong <[email protected]> wrote:
> 
> Barry:
> 
> Why the code does not crash in the case of 'nan':
> 
> Integrating characteristic equations ...
> TS: 0.000000
>   0 SNES Function norm 1.490033584870e+02
>     0 KSP Residual norm            nan
> TS: 0.100000
>   0 SNES Function norm 1.490033584870e+02
>     0 KSP Residual norm            nan
> ...
> 
> Previously, when ilu/lu encounters a zero pivot, the code crashes, which asks 
> user to do something about his model or the algorithm; 
> Now, it gives 'nan', but continues running.
> We cannot assume new users know '-ksp_monitor -snes_monitor', and set 
> '-pc_factor_shift'. The frustration we encountered is that the code seems 
> working well, until I checked the solutions at each time steps and found them 
> never change values. Then it took us hours to find out it was caused by shift.
> 
> At least for LU, when zero-pivot occurs, we should throw an error.
> 
> Hong
> 
>   Hong,
> 
>     This is a question for the entire PETSc development team, not just me.
> 
> The reason I made this change was I was sick of
> runs with ILU and nearly zero pivots where the preconditioned residual normed 
> ended up initially being like 10^20 and then decreased in a few iterations to 
> 10^10 and so the iteration stopped while, in fact, no progress on the true 
> residual norm at all, meanwhile the user thinks the linear solver is working 
> fine. I figured was better to just "stop" the entire process so the user can 
> do something smarter for preconditioning than using ILU(0). At worse the user 
> can now add the pivot option themselves.
> 
>    Was this a good choice, perhaps not? Should we just go back to the old 
> default? Probably we need to at least have better help in terms of telling 
> people to turn on that option?
> 
> 
>   Barry
> 
> 
> 
> > On Jun 10, 2015, at 4:47 PM, Hong <[email protected]> wrote:
> >
> > Daniel,
> >
> > Try to use it with the flag: '-pc_type svd'. Now it works.
> > Great, you found the problem!
> >
> > This is very strange
> > OK, this must be the change made on the default pc_factor_shift used for 
> > ilu/lu.
> > In v3.5.4, it is
> > -pc_factor_shift_type INBLOCKS
> >
> > now,
> > -pc_factor_shift_type NONE
> >
> > Adding the option '-pc_factor_shift_type INBLOCKS'
> > or '-pc_factor_shift_type NONZERO'
> > the code runs well :-)
> >
> > Barry:
> > For ilu, a shift should be used to avoid zero pivot;
> > for lu, '-pc_factor_shift_type NONE' should be the default.
> > Why do we make such change?
> > It seems you made such change:
> > https://bitbucket.org/petsc/petsc/commits/422a814ef4a731b8529cf3a6428db526d183e312
> >
> > Hong
> >
> >
> >
> > On Wed, Jun 10, 2015 at 3:11 PM, Adrian Maldonado <[email protected]> 
> > wrote:
> > I have been able to replicate the error with the new petsc-dev.
> >
> > KSP does not converge. However, the Jacobian matrix is the same!!!
> >
> > This jacobian matrix is not very well conditioned (condition number 
> > 1.4814e+03 for a 12x12 matrix) because of the valve... but it didn't give 
> > me any problem before.
> >
> > In fact, you can see that I have implemented the same in matlab (matlab 
> > folder) and solved the problem with a vanilla newton-rhapson iteration (no 
> > preconditioner etc...) and I didn't have any problem at all.
> >
> >
> >
> > On Wed, Jun 10, 2015 at 2:37 PM, Adrian Maldonado <[email protected]> 
> > wrote:
> > I am upgrading to the new petsc-dev, I will let you know what I find. Maybe 
> > is an issue of the finite difference jacobian (how is generated)?
> >
> > I would love to go to to the conference! How can I register?
> >
> > On Wed, Jun 10, 2015 at 12:55 PM, Hong <[email protected]> wrote:
> > I found which change made difference:
> > When I change Makefile
> > $ diff old.Makefile new.Makefile
> > 1,2c1,2
> > < include ${PETSC_DIR}/conf/variables
> > < include ${PETSC_DIR}/conf/rules
> > ---
> > > include ${PETSC_DIR}/lib/petsc/conf/variables
> > > include ${PETSC_DIR}/lib/petsc/conf/rules
> >
> > to build tubtrans with petsc-dev, then I get
> >
> > Loaded case file
> >     0 KSP Residual norm            nan
> > SYSTEM SUMMARY
> >
> > Number of variables: 12
> > Number of nodes: 2
> > Number of pipes: 1
> > Integrating characteristic equations ...
> > TS: 0.000000
> >     0 KSP Residual norm            nan
> >
> > Some petsc solvers have changed, but it should not behave like this.
> > We have to use a debugger to figure out which step causes this problem :-(
> >
> > Hong
> >
> > On Wed, Jun 10, 2015 at 10:57 AM, Adrian Maldonado <[email protected]> 
> > wrote:
> > Sure.
> >
> > This one has the object files, etc..
> >
> > I will check the new petsc-dev this afternoon.
> >
> > On Wed, Jun 10, 2015 at 10:42 AM, Hong <[email protected]> wrote:
> > How about make a tarball of your current tubtrans and send to me
> > (or use google drive/dropbox)?
> >
> > Hong
> >
> >
> > On Wed, Jun 10, 2015 at 10:28 AM, Adrian Maldonado <[email protected]> 
> > wrote:
> > Strange.
> >
> > I am using the master branch from the bitbucket repo 
> > (https://bitbucket.org/damalbel/tubtrans.git).
> >
> > The jacobian is the finite difference jacobian (-snes_fd) as I didn't write 
> > one (I did write one in the matlab code prototype, though).
> >
> > What I can think now is to check this singular jacobian and see what 
> > equation is causing the problem.
> >
> > Let me clone the new petsc release and try to replicate the problem. I have 
> > a couple of things to do this morning but I will get back to you this 
> > afternoon.
> >
> > On Wed, Jun 10, 2015 at 9:54 AM, Hong <[email protected]> wrote:
> > Daniel,
> >
> > It seems Jacobian is singular:
> > ./tubtrans -snes_monitor -ksp_monitor |more
> >
> > Loaded case file
> >   0 SNES Function norm 1.500947612239e+02
> >     0 KSP Residual norm           -nan
> > SYSTEM SUMMARY
> >
> > Number of variables: 24
> > Number of nodes: 4
> > Number of pipes: 2
> > Integrating characteristic equations ...
> > TS: 0.000000
> >   0 SNES Function norm 1.490033613070e+02
> >     0 KSP Residual norm           -nan
> >
> > Do you use the same version in
> > git clone https://bitbucket.org/damalbel/tubtrans.git?
> >
> > Can you apply my patch to this version and test it with new petsc-3.5 (we 
> > are working on its release)?
> > The best way to get petsc-dev is using
> >       • git clone https://bitbucket.org/petsc/petsc
> > We will have a PETSc conference next week
> > http://www.mcs.anl.gov/petsc/petsc-20.pdf
> >
> > Would you like join us?
> > Hong
> >
> >
> > On Wed, Jun 10, 2015 at 8:17 AM, Adrian Maldonado <[email protected]> 
> > wrote:
> > P.S: I have applied your patch, and using PETSC 3.5.2 the program works all 
> > right. I have attached the logs of running test.xml and test2.xml.
> >
> > Again, let me check out the new petsc release.
> >
> > On Wed, Jun 10, 2015 at 8:08 AM, Adrian Maldonado <[email protected]> 
> > wrote:
> > I don't see why it should not work. I can see two things that we can check:
> >
> > - Is SNES iterating? Can you check for -snes_monitor and 
> > -snes_converged_reason
> > - In the program, I initialize the the solutions vector 'x'to ones as an 
> > initial guess (line 102), maybe somehow petsc is unable to write on it?
> >
> > Let me clone the new petsc-dev version and I will try to replicate the 
> > error.
> >
> > On Tue, Jun 9, 2015 at 4:56 PM, Hong <[email protected]> wrote:
> > Danniel,
> >  Thanks for detailed instruction. I'm using petsc-dev, which will be 
> > released today or tomorrow. I'm building a release version to see if that 
> > causes the problem.
> >
> > However, your code should generate same solution with my patch (attached) 
> > because it only replace
> > VecGetArray() with VecGetArrayRead(), and fix a memory leak problem.
> >
> > Here is what I did:
> > 1. install petsc-dev (see 
> > http://www.mcs.anl.gov/petsc/developers/index.html)
> > git clone https://bitbucket.org/petsc/petsc
> >
> > 2. build petsc-dev
> >
> > 3. git clone https://bitbucket.org/damalbel/tubtrans.git
> > patch it:
> > cd tubtrans
> > patch -Np1 < patch_tubtrans
> > make tubtrans
> >
> > 4. run tubtrans
> > ./tubtrans
> >
> > Loaded case file
> > SYSTEM SUMMARY
> >
> > Number of variables: 24
> > Number of nodes: 4
> > Number of pipes: 2
> > Integrating characteristic equations ...
> > TS: 0.000000
> > Printing solution vector...
> > Vec Object: 1 MPI processes
> >   type: seq
> > 1
> > 1
> > 1
> > 1
> > 1
> > ...
> >
> > Changing to different input file:
> > ./tubtrans -f input/test.xml |more
> > Loaded case file
> > SYSTEM SUMMARY
> >
> > Number of variables: 12
> > Number of nodes: 2
> > Number of pipes: 1
> > Integrating characteristic equations ...
> > TS: 0.000000
> > Printing solution vector...
> > Vec Object: 1 MPI processes
> >   type: seq
> > 1
> > 1
> > 1
> > 1
> > 1
> > 1
> > 1
> > 1
> > 1
> > 1
> > 1
> > 1
> > TS: 0.100000
> > Printing solution vector...
> > Vec Object: 1 MPI processes
> >   type: seq
> > 1
> > 1
> > 1
> >
> > Do I do something wrong?
> >
> > Hong
> >
> >
> > On Tue, Jun 9, 2015 at 12:15 AM, Adrian Maldonado <[email protected]> 
> > wrote:
> > Professor Hong,
> >
> > I am glad to hear from you! I am happy that you are finding the code 
> > useful. As for the solution vector being constant ones - that doesn't seem 
> > right.
> >
> > The master branch of the bitbucket repository with the Petsc Release 
> > Version 3.5.2 seems to work fine in my laptop.
> >
> > I have written a small walk through below.
> >
> > Please, let me know if you want me to take a look at the code or I can help 
> > you in any other way.
> >
> > Walk through:
> >
> > - The program uses XML (libxml2) to parse data for different case 
> > scenarios. The cases are stored in the directory 'input/' and you can 
> > select the case to run in the petscopt file (e.g -f input/test.xml)
> >
> > - The file test.xml, for example, includes a simple one-pipe test scenario 
> > from the textbook 'Fluid Transients' by Wyle. The content is shown below:
> >
> > <?xml version="1.0"?>
> > <case>
> >       <nnode>2</nnode>
> >       <pipe id="1">
> >               <fr>1</fr>
> >               <to>2</to>
> >         <fric>0.018</fric>
> >         <diam>0.5</diam>
> >         <a>1200</a>
> >         <len>600</len>
> >       </pipe>
> >       <reservoir id="1">
> >               <node>1</node>
> >               <head>150</head>
> >       </reservoir>
> >
> >       <valve id="1">
> >               <node>2</node>
> >               <lossc>0.009</lossc>
> >       </valve>
> > </case>
> >
> > In this case we have a conduct of two nodes (boundary equations). The only 
> > conduct (or pipe) goes from node 1 to node 2. The entries 'fric', 'diam', 
> > 'a' and 'len' are the parameters of the pipeline. Then, the file includes a 
> > reservoir and a valve as boundary equations.
> >
> > The reservoir defines a constant pressure (water head) in the node 1 
> > whereas the valve defines a relationship between the pressure and the flow 
> > with friction losses.
> >
> > The program, after creating the necessary data structures and 
> > discretization (according to Courant number) will proceed to the 
> > initialization. The initialization is solving the system state for the 
> > steady-state conditions with SNES.
> >
> > Running the program in gdb for 'test.xml' and printing the 'x' vector will 
> > produce this:
> >
> > Breakpoint 1, main (argc=1, argv=0x7fffffffde68) at main.c:115
> > 115       ierr = SNESSolve(snes, NULL, x);CHKERRQ(ierr);
> > (gdb) n
> > 120       ierr = MatDenseGetArray(SOL, &ss);CHKERRQ(ierr);
> > (gdb) p VecView(x, 0)
> > Vec Object: 1 MPI processes
> >   type: seq
> > 0.477432
> > 150
> > 0.477432
> > 148.698
> > 0.477432
> > 147.395
> > 0.477432
> > 146.093
> > 0.477432
> > 144.791
> > 0.477432
> > 143.488
> >
> > The ordering here is, according to the discretization x = [Q0 H0 Q1 H1 ... 
> > Qn Hn]' . Where Q is the water flow and H is the water head or pressure. 
> > You can see how the water flow is constant in steady state along the 
> > pipeline and the pressure in the first node is 150, as the reservoir sets 
> > this as boundary equation.
> >
> > For the transient we simulate the sudden closure of the valve (dirac delta) 
> > although we can implement different closure dynamics.
> >
> > I have compiled the debug mode of the program (make debug) which prints the 
> > 'x' vector and each time step. I am attaching the results in the file 
> > log.txt so you can compare it. Each time step solution is stored in the 
> > matrix SOL and finally written to a binary file 'output.out'.
> >
> > The first integration time-step looks like:
> >
> > 0.477432
> > 150
> > 0.477432
> > 148.698
> > 0.477432
> > 147.395
> > 0.477432
> > 146.093
> > 0.477432
> > 144.791
> > -1.11838e-12
> > 441.046
> >
> > The pen-ultimate entry of the x vector is the flow at the valve, which is 0 
> > after the valve closes. At the end of the valve an steep rise in the 
> > pressure happens, as expected. This pressure wave will oscillate along the 
> > pipe according to the method of characteristics.
> >
> > More complex scenarios are included in '/input' and a script with some 
> > python macros 'intPlot.py' is included to produce plots of the pressure and 
> > flow profiles within the system.
> >
> > I hope this helped you and please, let me know in what can I help.
> >
> > Yours,
> >
> >
> > On Mon, Jun 8, 2015 at 10:06 PM, Hong <[email protected]> wrote:
> > Danniel,
> >
> > Hello, this is Hong, your professor in CS595 :-)
> >
> > We are working on a project for simulating water cycle. The math model to 
> > be started is a shallow water river, very similar to your project. I start 
> > looking at your codes. After small update with latest petsc-dev, I'm able 
> > to run it.
> >
> > Look at the solutions at each time step, I found that solution vector x 
> > generated in main.c are alway a constant vector, x=1.
> > Am I doing the right thing?
> >
> > Hong
> >
> >
> >
> > --
> > D. Adrian Maldonado, PhD Candidate
> > Electrical & Computer Engineering Dept.
> > Illinois Institute of Technology
> > 3301 S. Dearborn Street, Chicago, IL 60616
> >
> >
> >
> >
> > --
> > D. Adrian Maldonado, PhD Candidate
> > Electrical & Computer Engineering Dept.
> > Illinois Institute of Technology
> > 3301 S. Dearborn Street, Chicago, IL 60616
> >
> >
> >
> > --
> > D. Adrian Maldonado, PhD Candidate
> > Electrical & Computer Engineering Dept.
> > Illinois Institute of Technology
> > 3301 S. Dearborn Street, Chicago, IL 60616
> >
> >
> >
> >
> > --
> > D. Adrian Maldonado, PhD Candidate
> > Electrical & Computer Engineering Dept.
> > Illinois Institute of Technology
> > 3301 S. Dearborn Street, Chicago, IL 60616
> >
> >
> >
> >
> > --
> > D. Adrian Maldonado, PhD Candidate
> > Electrical & Computer Engineering Dept.
> > Illinois Institute of Technology
> > 3301 S. Dearborn Street, Chicago, IL 60616
> >
> >
> >
> >
> > --
> > D. Adrian Maldonado, PhD Candidate
> > Electrical & Computer Engineering Dept.
> > Illinois Institute of Technology
> > 3301 S. Dearborn Street, Chicago, IL 60616
> >
> >
> >
> > --
> > D. Adrian Maldonado, PhD Candidate
> > Electrical & Computer Engineering Dept.
> > Illinois Institute of Technology
> > 3301 S. Dearborn Street, Chicago, IL 60616
> >
> >
> >
> > --
> > D. Adrian Maldonado, PhD Candidate
> > Electrical & Computer Engineering Dept.
> > Illinois Institute of Technology
> > 3301 S. Dearborn Street, Chicago, IL 60616
> >
> 
> 

Reply via email to