> On Jun 10, 2015, at 11:04 PM, Hong <[email protected]> wrote:
>
> Barry:
>
> How can I reproduce this? What should happen is KSP gets a Nan which
> passes a Nan to SNES which passes it up to TS and TS should stop with an
> error saying it cannot time-step.
>
> I've sent you an invitation for repository 'wash'.
> The example is wash/pipe/main.c
> make pipe
> $ ./pipe -pc_factor_shift_type NONE -snes_monitor -ksp_monitor
> Loaded case file
> nTimeSteps 100 * deltatim 0.1 = endTime 10
> Pipe 0. nlen: 5. hlen: 120.000000
> 0 SNES Function norm 1.495500487667e+02
> 0 KSP Residual norm nan
>
> Solve with xda...
> 0 SNES Function norm 1.495500487667e+02
> 0 KSP Residual norm nan
>
> i.e., 'nan' does not cause crash or give any error, simply moves to next line
> of code ... We do our own TS.
Why? Shouldn't one always use TS?
> Maybe we should use
> -snes_converged_reason -ksp_converged_reason to stop.
No, you need to check the converged reason of SNES each time.
SNESGetConvergedReason(snes,&reason);
if (reason < 0) {
nonlinear solve failed so do something
}
Note that TS is suppose to handle all this for you, checking the converged
reason of SNES, cutting the time-step if the nonlinear solve failed ....... so
you don't need to write all the code yourself.
Barry
>
>
> One can run with -ksp_error_if_not_converged to get a crash as soon as the
> the linear solver detects a problem to reproduce some of the old behavior.
>
> This works, thanks. Why not put it as default?
>
> The logic behind these changes is that previously KSP would stop the
> program with a error if there was a zero pivot in LU, this means the time
> step algorithms could not recover from a zero pivot. This was very bad, who
> wants a program that has been running for 8 hours to crash because at
> timestep 1,003,415 it got a zero pivot when a simple cut in the tilmestep
> size would allow the program to keep running.
>
> Obviously there are some issues in the new model that have to be worked out.
>
> Running 1,003,415 time steps with 'nan', and still continue ... ?
>
> Hong
>
>
> > On Jun 10, 2015, at 8:49 PM, Hong <[email protected]> wrote:
> >
> > Barry:
> >
> > Why the code does not crash in the case of 'nan':
> >
> > Integrating characteristic equations ...
> > TS: 0.000000
> > 0 SNES Function norm 1.490033584870e+02
> > 0 KSP Residual norm nan
> > TS: 0.100000
> > 0 SNES Function norm 1.490033584870e+02
> > 0 KSP Residual norm nan
> > ...
> >
> > Previously, when ilu/lu encounters a zero pivot, the code crashes, which
> > asks user to do something about his model or the algorithm;
> > Now, it gives 'nan', but continues running.
> > We cannot assume new users know '-ksp_monitor -snes_monitor', and set
> > '-pc_factor_shift'. The frustration we encountered is that the code seems
> > working well, until I checked the solutions at each time steps and found
> > them never change values. Then it took us hours to find out it was caused
> > by shift.
> >
> > At least for LU, when zero-pivot occurs, we should throw an error.
> >
> > Hong
> >
> > Hong,
> >
> > This is a question for the entire PETSc development team, not just me.
> >
> > The reason I made this change was I was sick of
> > runs with ILU and nearly zero pivots where the preconditioned residual
> > normed ended up initially being like 10^20 and then decreased in a few
> > iterations to 10^10 and so the iteration stopped while, in fact, no
> > progress on the true residual norm at all, meanwhile the user thinks the
> > linear solver is working fine. I figured was better to just "stop" the
> > entire process so the user can do something smarter for preconditioning
> > than using ILU(0). At worse the user can now add the pivot option
> > themselves.
> >
> > Was this a good choice, perhaps not? Should we just go back to the old
> > default? Probably we need to at least have better help in terms of telling
> > people to turn on that option?
> >
> >
> > Barry
> >
> >
> >
> > > On Jun 10, 2015, at 4:47 PM, Hong <[email protected]> wrote:
> > >
> > > Daniel,
> > >
> > > Try to use it with the flag: '-pc_type svd'. Now it works.
> > > Great, you found the problem!
> > >
> > > This is very strange
> > > OK, this must be the change made on the default pc_factor_shift used for
> > > ilu/lu.
> > > In v3.5.4, it is
> > > -pc_factor_shift_type INBLOCKS
> > >
> > > now,
> > > -pc_factor_shift_type NONE
> > >
> > > Adding the option '-pc_factor_shift_type INBLOCKS'
> > > or '-pc_factor_shift_type NONZERO'
> > > the code runs well :-)
> > >
> > > Barry:
> > > For ilu, a shift should be used to avoid zero pivot;
> > > for lu, '-pc_factor_shift_type NONE' should be the default.
> > > Why do we make such change?
> > > It seems you made such change:
> > > https://bitbucket.org/petsc/petsc/commits/422a814ef4a731b8529cf3a6428db526d183e312
> > >
> > > Hong
> > >
> > >
> > >
> > > On Wed, Jun 10, 2015 at 3:11 PM, Adrian Maldonado <[email protected]>
> > > wrote:
> > > I have been able to replicate the error with the new petsc-dev.
> > >
> > > KSP does not converge. However, the Jacobian matrix is the same!!!
> > >
> > > This jacobian matrix is not very well conditioned (condition number
> > > 1.4814e+03 for a 12x12 matrix) because of the valve... but it didn't give
> > > me any problem before.
> > >
> > > In fact, you can see that I have implemented the same in matlab (matlab
> > > folder) and solved the problem with a vanilla newton-rhapson iteration
> > > (no preconditioner etc...) and I didn't have any problem at all.
> > >
> > >
> > >
> > > On Wed, Jun 10, 2015 at 2:37 PM, Adrian Maldonado <[email protected]>
> > > wrote:
> > > I am upgrading to the new petsc-dev, I will let you know what I find.
> > > Maybe is an issue of the finite difference jacobian (how is generated)?
> > >
> > > I would love to go to to the conference! How can I register?
> > >
> > > On Wed, Jun 10, 2015 at 12:55 PM, Hong <[email protected]> wrote:
> > > I found which change made difference:
> > > When I change Makefile
> > > $ diff old.Makefile new.Makefile
> > > 1,2c1,2
> > > < include ${PETSC_DIR}/conf/variables
> > > < include ${PETSC_DIR}/conf/rules
> > > ---
> > > > include ${PETSC_DIR}/lib/petsc/conf/variables
> > > > include ${PETSC_DIR}/lib/petsc/conf/rules
> > >
> > > to build tubtrans with petsc-dev, then I get
> > >
> > > Loaded case file
> > > 0 KSP Residual norm nan
> > > SYSTEM SUMMARY
> > >
> > > Number of variables: 12
> > > Number of nodes: 2
> > > Number of pipes: 1
> > > Integrating characteristic equations ...
> > > TS: 0.000000
> > > 0 KSP Residual norm nan
> > >
> > > Some petsc solvers have changed, but it should not behave like this.
> > > We have to use a debugger to figure out which step causes this problem :-(
> > >
> > > Hong
> > >
> > > On Wed, Jun 10, 2015 at 10:57 AM, Adrian Maldonado
> > > <[email protected]> wrote:
> > > Sure.
> > >
> > > This one has the object files, etc..
> > >
> > > I will check the new petsc-dev this afternoon.
> > >
> > > On Wed, Jun 10, 2015 at 10:42 AM, Hong <[email protected]> wrote:
> > > How about make a tarball of your current tubtrans and send to me
> > > (or use google drive/dropbox)?
> > >
> > > Hong
> > >
> > >
> > > On Wed, Jun 10, 2015 at 10:28 AM, Adrian Maldonado
> > > <[email protected]> wrote:
> > > Strange.
> > >
> > > I am using the master branch from the bitbucket repo
> > > (https://bitbucket.org/damalbel/tubtrans.git).
> > >
> > > The jacobian is the finite difference jacobian (-snes_fd) as I didn't
> > > write one (I did write one in the matlab code prototype, though).
> > >
> > > What I can think now is to check this singular jacobian and see what
> > > equation is causing the problem.
> > >
> > > Let me clone the new petsc release and try to replicate the problem. I
> > > have a couple of things to do this morning but I will get back to you
> > > this afternoon.
> > >
> > > On Wed, Jun 10, 2015 at 9:54 AM, Hong <[email protected]> wrote:
> > > Daniel,
> > >
> > > It seems Jacobian is singular:
> > > ./tubtrans -snes_monitor -ksp_monitor |more
> > >
> > > Loaded case file
> > > 0 SNES Function norm 1.500947612239e+02
> > > 0 KSP Residual norm -nan
> > > SYSTEM SUMMARY
> > >
> > > Number of variables: 24
> > > Number of nodes: 4
> > > Number of pipes: 2
> > > Integrating characteristic equations ...
> > > TS: 0.000000
> > > 0 SNES Function norm 1.490033613070e+02
> > > 0 KSP Residual norm -nan
> > >
> > > Do you use the same version in
> > > git clone https://bitbucket.org/damalbel/tubtrans.git?
> > >
> > > Can you apply my patch to this version and test it with new petsc-3.5 (we
> > > are working on its release)?
> > > The best way to get petsc-dev is using
> > > • git clone https://bitbucket.org/petsc/petsc
> > > We will have a PETSc conference next week
> > > http://www.mcs.anl.gov/petsc/petsc-20.pdf
> > >
> > > Would you like join us?
> > > Hong
> > >
> > >
> > > On Wed, Jun 10, 2015 at 8:17 AM, Adrian Maldonado <[email protected]>
> > > wrote:
> > > P.S: I have applied your patch, and using PETSC 3.5.2 the program works
> > > all right. I have attached the logs of running test.xml and test2.xml.
> > >
> > > Again, let me check out the new petsc release.
> > >
> > > On Wed, Jun 10, 2015 at 8:08 AM, Adrian Maldonado <[email protected]>
> > > wrote:
> > > I don't see why it should not work. I can see two things that we can
> > > check:
> > >
> > > - Is SNES iterating? Can you check for -snes_monitor and
> > > -snes_converged_reason
> > > - In the program, I initialize the the solutions vector 'x'to ones as an
> > > initial guess (line 102), maybe somehow petsc is unable to write on it?
> > >
> > > Let me clone the new petsc-dev version and I will try to replicate the
> > > error.
> > >
> > > On Tue, Jun 9, 2015 at 4:56 PM, Hong <[email protected]> wrote:
> > > Danniel,
> > > Thanks for detailed instruction. I'm using petsc-dev, which will be
> > > released today or tomorrow. I'm building a release version to see if that
> > > causes the problem.
> > >
> > > However, your code should generate same solution with my patch (attached)
> > > because it only replace
> > > VecGetArray() with VecGetArrayRead(), and fix a memory leak problem.
> > >
> > > Here is what I did:
> > > 1. install petsc-dev (see
> > > http://www.mcs.anl.gov/petsc/developers/index.html)
> > > git clone https://bitbucket.org/petsc/petsc
> > >
> > > 2. build petsc-dev
> > >
> > > 3. git clone https://bitbucket.org/damalbel/tubtrans.git
> > > patch it:
> > > cd tubtrans
> > > patch -Np1 < patch_tubtrans
> > > make tubtrans
> > >
> > > 4. run tubtrans
> > > ./tubtrans
> > >
> > > Loaded case file
> > > SYSTEM SUMMARY
> > >
> > > Number of variables: 24
> > > Number of nodes: 4
> > > Number of pipes: 2
> > > Integrating characteristic equations ...
> > > TS: 0.000000
> > > Printing solution vector...
> > > Vec Object: 1 MPI processes
> > > type: seq
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > ...
> > >
> > > Changing to different input file:
> > > ./tubtrans -f input/test.xml |more
> > > Loaded case file
> > > SYSTEM SUMMARY
> > >
> > > Number of variables: 12
> > > Number of nodes: 2
> > > Number of pipes: 1
> > > Integrating characteristic equations ...
> > > TS: 0.000000
> > > Printing solution vector...
> > > Vec Object: 1 MPI processes
> > > type: seq
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > TS: 0.100000
> > > Printing solution vector...
> > > Vec Object: 1 MPI processes
> > > type: seq
> > > 1
> > > 1
> > > 1
> > >
> > > Do I do something wrong?
> > >
> > > Hong
> > >
> > >
> > > On Tue, Jun 9, 2015 at 12:15 AM, Adrian Maldonado <[email protected]>
> > > wrote:
> > > Professor Hong,
> > >
> > > I am glad to hear from you! I am happy that you are finding the code
> > > useful. As for the solution vector being constant ones - that doesn't
> > > seem right.
> > >
> > > The master branch of the bitbucket repository with the Petsc Release
> > > Version 3.5.2 seems to work fine in my laptop.
> > >
> > > I have written a small walk through below.
> > >
> > > Please, let me know if you want me to take a look at the code or I can
> > > help you in any other way.
> > >
> > > Walk through:
> > >
> > > - The program uses XML (libxml2) to parse data for different case
> > > scenarios. The cases are stored in the directory 'input/' and you can
> > > select the case to run in the petscopt file (e.g -f input/test.xml)
> > >
> > > - The file test.xml, for example, includes a simple one-pipe test
> > > scenario from the textbook 'Fluid Transients' by Wyle. The content is
> > > shown below:
> > >
> > > <?xml version="1.0"?>
> > > <case>
> > > <nnode>2</nnode>
> > > <pipe id="1">
> > > <fr>1</fr>
> > > <to>2</to>
> > > <fric>0.018</fric>
> > > <diam>0.5</diam>
> > > <a>1200</a>
> > > <len>600</len>
> > > </pipe>
> > > <reservoir id="1">
> > > <node>1</node>
> > > <head>150</head>
> > > </reservoir>
> > >
> > > <valve id="1">
> > > <node>2</node>
> > > <lossc>0.009</lossc>
> > > </valve>
> > > </case>
> > >
> > > In this case we have a conduct of two nodes (boundary equations). The
> > > only conduct (or pipe) goes from node 1 to node 2. The entries 'fric',
> > > 'diam', 'a' and 'len' are the parameters of the pipeline. Then, the file
> > > includes a reservoir and a valve as boundary equations.
> > >
> > > The reservoir defines a constant pressure (water head) in the node 1
> > > whereas the valve defines a relationship between the pressure and the
> > > flow with friction losses.
> > >
> > > The program, after creating the necessary data structures and
> > > discretization (according to Courant number) will proceed to the
> > > initialization. The initialization is solving the system state for the
> > > steady-state conditions with SNES.
> > >
> > > Running the program in gdb for 'test.xml' and printing the 'x' vector
> > > will produce this:
> > >
> > > Breakpoint 1, main (argc=1, argv=0x7fffffffde68) at main.c:115
> > > 115 ierr = SNESSolve(snes, NULL, x);CHKERRQ(ierr);
> > > (gdb) n
> > > 120 ierr = MatDenseGetArray(SOL, &ss);CHKERRQ(ierr);
> > > (gdb) p VecView(x, 0)
> > > Vec Object: 1 MPI processes
> > > type: seq
> > > 0.477432
> > > 150
> > > 0.477432
> > > 148.698
> > > 0.477432
> > > 147.395
> > > 0.477432
> > > 146.093
> > > 0.477432
> > > 144.791
> > > 0.477432
> > > 143.488
> > >
> > > The ordering here is, according to the discretization x = [Q0 H0 Q1 H1
> > > ... Qn Hn]' . Where Q is the water flow and H is the water head or
> > > pressure. You can see how the water flow is constant in steady state
> > > along the pipeline and the pressure in the first node is 150, as the
> > > reservoir sets this as boundary equation.
> > >
> > > For the transient we simulate the sudden closure of the valve (dirac
> > > delta) although we can implement different closure dynamics.
> > >
> > > I have compiled the debug mode of the program (make debug) which prints
> > > the 'x' vector and each time step. I am attaching the results in the file
> > > log.txt so you can compare it. Each time step solution is stored in the
> > > matrix SOL and finally written to a binary file 'output.out'.
> > >
> > > The first integration time-step looks like:
> > >
> > > 0.477432
> > > 150
> > > 0.477432
> > > 148.698
> > > 0.477432
> > > 147.395
> > > 0.477432
> > > 146.093
> > > 0.477432
> > > 144.791
> > > -1.11838e-12
> > > 441.046
> > >
> > > The pen-ultimate entry of the x vector is the flow at the valve, which is
> > > 0 after the valve closes. At the end of the valve an steep rise in the
> > > pressure happens, as expected. This pressure wave will oscillate along
> > > the pipe according to the method of characteristics.
> > >
> > > More complex scenarios are included in '/input' and a script with some
> > > python macros 'intPlot.py' is included to produce plots of the pressure
> > > and flow profiles within the system.
> > >
> > > I hope this helped you and please, let me know in what can I help.
> > >
> > > Yours,
> > >
> > >
> > > On Mon, Jun 8, 2015 at 10:06 PM, Hong <[email protected]> wrote:
> > > Danniel,
> > >
> > > Hello, this is Hong, your professor in CS595 :-)
> > >
> > > We are working on a project for simulating water cycle. The math model to
> > > be started is a shallow water river, very similar to your project. I
> > > start looking at your codes. After small update with latest petsc-dev,
> > > I'm able to run it.
> > >
> > > Look at the solutions at each time step, I found that solution vector x
> > > generated in main.c are alway a constant vector, x=1.
> > > Am I doing the right thing?
> > >
> > > Hong
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> >
> >
>
>