Re: [petsc-dev] tubtrans

Barry Smith Thu, 25 Jun 2015 11:14:17 -0700

> On Jun 10, 2015, at 11:04 PM, Hong <[email protected]> wrote:
> 
> Barry:
> 
>    How can I reproduce this? What should happen is KSP gets a Nan which 
> passes a Nan to SNES which passes it up to TS and TS should stop with an 
> error saying it cannot time-step.
>  
> I've sent you an invitation for repository 'wash'.
> The example is wash/pipe/main.c 
> make pipe
> $ ./pipe -pc_factor_shift_type NONE -snes_monitor -ksp_monitor
> Loaded case file
>   nTimeSteps 100 * deltatim 0.1 = endTime 10
> Pipe 0. nlen: 5. hlen: 120.000000
>   0 SNES Function norm 1.495500487667e+02
>     0 KSP Residual norm            nan
> 
>  Solve with xda...
>   0 SNES Function norm 1.495500487667e+02
>     0 KSP Residual norm            nan
> 
> i.e., 'nan' does not cause crash or give any error, simply moves to next line 
> of code ... We do our own TS.


  Why? Shouldn't one always use TS? 

> Maybe we should use
> -snes_converged_reason -ksp_converged_reason to stop.

  No, you need to check the converged reason of SNES each time.

   SNESGetConvergedReason(snes,&reason);
  if (reason < 0) {
     nonlinear solve failed so do something
  }

  Note that TS is suppose to handle all this for you, checking the converged 
reason of SNES, cutting the time-step if the nonlinear solve failed ....... so 
you don't need to write all the code yourself.

  Barry


> 
> 
>    One can run with -ksp_error_if_not_converged to get a crash as soon as the 
> the linear solver detects a problem to reproduce some of the old behavior.
>  
> This works, thanks. Why not put it as default?
> 
>   The logic behind these changes is that previously KSP would stop the 
> program with a error if there was a zero pivot in LU, this means the time 
> step algorithms could not recover from a zero pivot. This was very bad, who 
> wants a program that has been running for 8 hours to crash because at 
> timestep 1,003,415 it got a zero pivot when a simple cut in the tilmestep 
> size would allow the program to keep running.
> 
>  Obviously there are some issues in the new model that have to be worked out.
> 
> Running 1,003,415 time steps with 'nan', and still continue ... ?
> 
> Hong
> 
> 
> > On Jun 10, 2015, at 8:49 PM, Hong <[email protected]> wrote:
> >
> > Barry:
> >
> > Why the code does not crash in the case of 'nan':
> >
> > Integrating characteristic equations ...
> > TS: 0.000000
> >   0 SNES Function norm 1.490033584870e+02
> >     0 KSP Residual norm            nan
> > TS: 0.100000
> >   0 SNES Function norm 1.490033584870e+02
> >     0 KSP Residual norm            nan
> > ...
> >
> > Previously, when ilu/lu encounters a zero pivot, the code crashes, which 
> > asks user to do something about his model or the algorithm;
> > Now, it gives 'nan', but continues running.
> > We cannot assume new users know '-ksp_monitor -snes_monitor', and set 
> > '-pc_factor_shift'. The frustration we encountered is that the code seems 
> > working well, until I checked the solutions at each time steps and found 
> > them never change values. Then it took us hours to find out it was caused 
> > by shift.
> >
> > At least for LU, when zero-pivot occurs, we should throw an error.
> >
> > Hong
> >
> >   Hong,
> >
> >     This is a question for the entire PETSc development team, not just me.
> >
> > The reason I made this change was I was sick of
> > runs with ILU and nearly zero pivots where the preconditioned residual 
> > normed ended up initially being like 10^20 and then decreased in a few 
> > iterations to 10^10 and so the iteration stopped while, in fact, no 
> > progress on the true residual norm at all, meanwhile the user thinks the 
> > linear solver is working fine. I figured was better to just "stop" the 
> > entire process so the user can do something smarter for preconditioning 
> > than using ILU(0). At worse the user can now add the pivot option 
> > themselves.
> >
> >    Was this a good choice, perhaps not? Should we just go back to the old 
> > default? Probably we need to at least have better help in terms of telling 
> > people to turn on that option?
> >
> >
> >   Barry
> >
> >
> >
> > > On Jun 10, 2015, at 4:47 PM, Hong <[email protected]> wrote:
> > >
> > > Daniel,
> > >
> > > Try to use it with the flag: '-pc_type svd'. Now it works.
> > > Great, you found the problem!
> > >
> > > This is very strange
> > > OK, this must be the change made on the default pc_factor_shift used for 
> > > ilu/lu.
> > > In v3.5.4, it is
> > > -pc_factor_shift_type INBLOCKS
> > >
> > > now,
> > > -pc_factor_shift_type NONE
> > >
> > > Adding the option '-pc_factor_shift_type INBLOCKS'
> > > or '-pc_factor_shift_type NONZERO'
> > > the code runs well :-)
> > >
> > > Barry:
> > > For ilu, a shift should be used to avoid zero pivot;
> > > for lu, '-pc_factor_shift_type NONE' should be the default.
> > > Why do we make such change?
> > > It seems you made such change:
> > > https://bitbucket.org/petsc/petsc/commits/422a814ef4a731b8529cf3a6428db526d183e312
> > >
> > > Hong
> > >
> > >
> > >
> > > On Wed, Jun 10, 2015 at 3:11 PM, Adrian Maldonado <[email protected]> 
> > > wrote:
> > > I have been able to replicate the error with the new petsc-dev.
> > >
> > > KSP does not converge. However, the Jacobian matrix is the same!!!
> > >
> > > This jacobian matrix is not very well conditioned (condition number 
> > > 1.4814e+03 for a 12x12 matrix) because of the valve... but it didn't give 
> > > me any problem before.
> > >
> > > In fact, you can see that I have implemented the same in matlab (matlab 
> > > folder) and solved the problem with a vanilla newton-rhapson iteration 
> > > (no preconditioner etc...) and I didn't have any problem at all.
> > >
> > >
> > >
> > > On Wed, Jun 10, 2015 at 2:37 PM, Adrian Maldonado <[email protected]> 
> > > wrote:
> > > I am upgrading to the new petsc-dev, I will let you know what I find. 
> > > Maybe is an issue of the finite difference jacobian (how is generated)?
> > >
> > > I would love to go to to the conference! How can I register?
> > >
> > > On Wed, Jun 10, 2015 at 12:55 PM, Hong <[email protected]> wrote:
> > > I found which change made difference:
> > > When I change Makefile
> > > $ diff old.Makefile new.Makefile
> > > 1,2c1,2
> > > < include ${PETSC_DIR}/conf/variables
> > > < include ${PETSC_DIR}/conf/rules
> > > ---
> > > > include ${PETSC_DIR}/lib/petsc/conf/variables
> > > > include ${PETSC_DIR}/lib/petsc/conf/rules
> > >
> > > to build tubtrans with petsc-dev, then I get
> > >
> > > Loaded case file
> > >     0 KSP Residual norm            nan
> > > SYSTEM SUMMARY
> > >
> > > Number of variables: 12
> > > Number of nodes: 2
> > > Number of pipes: 1
> > > Integrating characteristic equations ...
> > > TS: 0.000000
> > >     0 KSP Residual norm            nan
> > >
> > > Some petsc solvers have changed, but it should not behave like this.
> > > We have to use a debugger to figure out which step causes this problem :-(
> > >
> > > Hong
> > >
> > > On Wed, Jun 10, 2015 at 10:57 AM, Adrian Maldonado 
> > > <[email protected]> wrote:
> > > Sure.
> > >
> > > This one has the object files, etc..
> > >
> > > I will check the new petsc-dev this afternoon.
> > >
> > > On Wed, Jun 10, 2015 at 10:42 AM, Hong <[email protected]> wrote:
> > > How about make a tarball of your current tubtrans and send to me
> > > (or use google drive/dropbox)?
> > >
> > > Hong
> > >
> > >
> > > On Wed, Jun 10, 2015 at 10:28 AM, Adrian Maldonado 
> > > <[email protected]> wrote:
> > > Strange.
> > >
> > > I am using the master branch from the bitbucket repo 
> > > (https://bitbucket.org/damalbel/tubtrans.git).
> > >
> > > The jacobian is the finite difference jacobian (-snes_fd) as I didn't 
> > > write one (I did write one in the matlab code prototype, though).
> > >
> > > What I can think now is to check this singular jacobian and see what 
> > > equation is causing the problem.
> > >
> > > Let me clone the new petsc release and try to replicate the problem. I 
> > > have a couple of things to do this morning but I will get back to you 
> > > this afternoon.
> > >
> > > On Wed, Jun 10, 2015 at 9:54 AM, Hong <[email protected]> wrote:
> > > Daniel,
> > >
> > > It seems Jacobian is singular:
> > > ./tubtrans -snes_monitor -ksp_monitor |more
> > >
> > > Loaded case file
> > >   0 SNES Function norm 1.500947612239e+02
> > >     0 KSP Residual norm           -nan
> > > SYSTEM SUMMARY
> > >
> > > Number of variables: 24
> > > Number of nodes: 4
> > > Number of pipes: 2
> > > Integrating characteristic equations ...
> > > TS: 0.000000
> > >   0 SNES Function norm 1.490033613070e+02
> > >     0 KSP Residual norm           -nan
> > >
> > > Do you use the same version in
> > > git clone https://bitbucket.org/damalbel/tubtrans.git?
> > >
> > > Can you apply my patch to this version and test it with new petsc-3.5 (we 
> > > are working on its release)?
> > > The best way to get petsc-dev is using
> > >       • git clone https://bitbucket.org/petsc/petsc
> > > We will have a PETSc conference next week
> > > http://www.mcs.anl.gov/petsc/petsc-20.pdf
> > >
> > > Would you like join us?
> > > Hong
> > >
> > >
> > > On Wed, Jun 10, 2015 at 8:17 AM, Adrian Maldonado <[email protected]> 
> > > wrote:
> > > P.S: I have applied your patch, and using PETSC 3.5.2 the program works 
> > > all right. I have attached the logs of running test.xml and test2.xml.
> > >
> > > Again, let me check out the new petsc release.
> > >
> > > On Wed, Jun 10, 2015 at 8:08 AM, Adrian Maldonado <[email protected]> 
> > > wrote:
> > > I don't see why it should not work. I can see two things that we can 
> > > check:
> > >
> > > - Is SNES iterating? Can you check for -snes_monitor and 
> > > -snes_converged_reason
> > > - In the program, I initialize the the solutions vector 'x'to ones as an 
> > > initial guess (line 102), maybe somehow petsc is unable to write on it?
> > >
> > > Let me clone the new petsc-dev version and I will try to replicate the 
> > > error.
> > >
> > > On Tue, Jun 9, 2015 at 4:56 PM, Hong <[email protected]> wrote:
> > > Danniel,
> > >  Thanks for detailed instruction. I'm using petsc-dev, which will be 
> > > released today or tomorrow. I'm building a release version to see if that 
> > > causes the problem.
> > >
> > > However, your code should generate same solution with my patch (attached) 
> > > because it only replace
> > > VecGetArray() with VecGetArrayRead(), and fix a memory leak problem.
> > >
> > > Here is what I did:
> > > 1. install petsc-dev (see 
> > > http://www.mcs.anl.gov/petsc/developers/index.html)
> > > git clone https://bitbucket.org/petsc/petsc
> > >
> > > 2. build petsc-dev
> > >
> > > 3. git clone https://bitbucket.org/damalbel/tubtrans.git
> > > patch it:
> > > cd tubtrans
> > > patch -Np1 < patch_tubtrans
> > > make tubtrans
> > >
> > > 4. run tubtrans
> > > ./tubtrans
> > >
> > > Loaded case file
> > > SYSTEM SUMMARY
> > >
> > > Number of variables: 24
> > > Number of nodes: 4
> > > Number of pipes: 2
> > > Integrating characteristic equations ...
> > > TS: 0.000000
> > > Printing solution vector...
> > > Vec Object: 1 MPI processes
> > >   type: seq
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > ...
> > >
> > > Changing to different input file:
> > > ./tubtrans -f input/test.xml |more
> > > Loaded case file
> > > SYSTEM SUMMARY
> > >
> > > Number of variables: 12
> > > Number of nodes: 2
> > > Number of pipes: 1
> > > Integrating characteristic equations ...
> > > TS: 0.000000
> > > Printing solution vector...
> > > Vec Object: 1 MPI processes
> > >   type: seq
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > 1
> > > TS: 0.100000
> > > Printing solution vector...
> > > Vec Object: 1 MPI processes
> > >   type: seq
> > > 1
> > > 1
> > > 1
> > >
> > > Do I do something wrong?
> > >
> > > Hong
> > >
> > >
> > > On Tue, Jun 9, 2015 at 12:15 AM, Adrian Maldonado <[email protected]> 
> > > wrote:
> > > Professor Hong,
> > >
> > > I am glad to hear from you! I am happy that you are finding the code 
> > > useful. As for the solution vector being constant ones - that doesn't 
> > > seem right.
> > >
> > > The master branch of the bitbucket repository with the Petsc Release 
> > > Version 3.5.2 seems to work fine in my laptop.
> > >
> > > I have written a small walk through below.
> > >
> > > Please, let me know if you want me to take a look at the code or I can 
> > > help you in any other way.
> > >
> > > Walk through:
> > >
> > > - The program uses XML (libxml2) to parse data for different case 
> > > scenarios. The cases are stored in the directory 'input/' and you can 
> > > select the case to run in the petscopt file (e.g -f input/test.xml)
> > >
> > > - The file test.xml, for example, includes a simple one-pipe test 
> > > scenario from the textbook 'Fluid Transients' by Wyle. The content is 
> > > shown below:
> > >
> > > <?xml version="1.0"?>
> > > <case>
> > >       <nnode>2</nnode>
> > >       <pipe id="1">
> > >               <fr>1</fr>
> > >               <to>2</to>
> > >         <fric>0.018</fric>
> > >         <diam>0.5</diam>
> > >         <a>1200</a>
> > >         <len>600</len>
> > >       </pipe>
> > >       <reservoir id="1">
> > >               <node>1</node>
> > >               <head>150</head>
> > >       </reservoir>
> > >
> > >       <valve id="1">
> > >               <node>2</node>
> > >               <lossc>0.009</lossc>
> > >       </valve>
> > > </case>
> > >
> > > In this case we have a conduct of two nodes (boundary equations). The 
> > > only conduct (or pipe) goes from node 1 to node 2. The entries 'fric', 
> > > 'diam', 'a' and 'len' are the parameters of the pipeline. Then, the file 
> > > includes a reservoir and a valve as boundary equations.
> > >
> > > The reservoir defines a constant pressure (water head) in the node 1 
> > > whereas the valve defines a relationship between the pressure and the 
> > > flow with friction losses.
> > >
> > > The program, after creating the necessary data structures and 
> > > discretization (according to Courant number) will proceed to the 
> > > initialization. The initialization is solving the system state for the 
> > > steady-state conditions with SNES.
> > >
> > > Running the program in gdb for 'test.xml' and printing the 'x' vector 
> > > will produce this:
> > >
> > > Breakpoint 1, main (argc=1, argv=0x7fffffffde68) at main.c:115
> > > 115       ierr = SNESSolve(snes, NULL, x);CHKERRQ(ierr);
> > > (gdb) n
> > > 120       ierr = MatDenseGetArray(SOL, &ss);CHKERRQ(ierr);
> > > (gdb) p VecView(x, 0)
> > > Vec Object: 1 MPI processes
> > >   type: seq
> > > 0.477432
> > > 150
> > > 0.477432
> > > 148.698
> > > 0.477432
> > > 147.395
> > > 0.477432
> > > 146.093
> > > 0.477432
> > > 144.791
> > > 0.477432
> > > 143.488
> > >
> > > The ordering here is, according to the discretization x = [Q0 H0 Q1 H1 
> > > ... Qn Hn]' . Where Q is the water flow and H is the water head or 
> > > pressure. You can see how the water flow is constant in steady state 
> > > along the pipeline and the pressure in the first node is 150, as the 
> > > reservoir sets this as boundary equation.
> > >
> > > For the transient we simulate the sudden closure of the valve (dirac 
> > > delta) although we can implement different closure dynamics.
> > >
> > > I have compiled the debug mode of the program (make debug) which prints 
> > > the 'x' vector and each time step. I am attaching the results in the file 
> > > log.txt so you can compare it. Each time step solution is stored in the 
> > > matrix SOL and finally written to a binary file 'output.out'.
> > >
> > > The first integration time-step looks like:
> > >
> > > 0.477432
> > > 150
> > > 0.477432
> > > 148.698
> > > 0.477432
> > > 147.395
> > > 0.477432
> > > 146.093
> > > 0.477432
> > > 144.791
> > > -1.11838e-12
> > > 441.046
> > >
> > > The pen-ultimate entry of the x vector is the flow at the valve, which is 
> > > 0 after the valve closes. At the end of the valve an steep rise in the 
> > > pressure happens, as expected. This pressure wave will oscillate along 
> > > the pipe according to the method of characteristics.
> > >
> > > More complex scenarios are included in '/input' and a script with some 
> > > python macros 'intPlot.py' is included to produce plots of the pressure 
> > > and flow profiles within the system.
> > >
> > > I hope this helped you and please, let me know in what can I help.
> > >
> > > Yours,
> > >
> > >
> > > On Mon, Jun 8, 2015 at 10:06 PM, Hong <[email protected]> wrote:
> > > Danniel,
> > >
> > > Hello, this is Hong, your professor in CS595 :-)
> > >
> > > We are working on a project for simulating water cycle. The math model to 
> > > be started is a shallow water river, very similar to your project. I 
> > > start looking at your codes. After small update with latest petsc-dev, 
> > > I'm able to run it.
> > >
> > > Look at the solutions at each time step, I found that solution vector x 
> > > generated in main.c are alway a constant vector, x=1.
> > > Am I doing the right thing?
> > >
> > > Hong
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> > >
> > >
> > > --
> > > D. Adrian Maldonado, PhD Candidate
> > > Electrical & Computer Engineering Dept.
> > > Illinois Institute of Technology
> > > 3301 S. Dearborn Street, Chicago, IL 60616
> > >
> >
> >
> 
>

Re: [petsc-dev] tubtrans

Reply via email to