Thank you for your answers. Barry's answer allowed me to get up to 50x faster code, so it was a huge help. I decided not to use ksp context altogether for this small operation.

Thank you again,


On 11-03-2018 18:53, Matthew Knepley wrote:

On Sun, Mar 11, 2018 at 1:14 AM, Smith, Barry F. < <>> wrote:

      1) Run the problem with -ksp_view_mat and -ksp_view_rhs and mail <>  the
    resulting file produced called binaryoutput

       2) By default PCLU does a reordering to reduce fill that could
    introduce a zero pivoit, PCILU does not do a reordering by
    default. You can use -pc_factor_mat_ordering_type none to force no
    reordering (PCLU does not do numerical pivoting for stability so
    can fail with zero pivots).

       3) If you need to solve these tiny 7 by 7 systems many times
    (presumably you are solving these to set up a large algebraic
    system solved afterwards) then you probably don't want to use KSP
    to solve them. You can use the low level kernel
    PetscKernel_A_gets_inverse_A_7() that does do pivoting followed by
    a multiply like

    s1 = v[0]*x1 + v[6]*x2  + v[12]*x3 + v[18]*x4 + v[24]*x5 + v[30]*x6;
          s2 = v[1]*x1 + v[7]*x2  + v[13]*x3 + v[19]*x4 + v[25]*x5 +
          s3 = v[2]*x1 + v[8]*x2  + v[14]*x3 + v[20]*x4 + v[26]*x5 +
          s4 = v[3]*x1 + v[9]*x2  + v[15]*x3 + v[21]*x4 + v[27]*x5 +
          s5 = v[4]*x1 + v[10]*x2 + v[16]*x3 + v[22]*x4 + v[28]*x5 +
          s6 = v[5]*x1 + v[11]*x2 + v[17]*x3 + v[23]*x4 + v[29]*x5 +

    where v is the dense 7 by 7 matrix (stored column oriented like
    Fortran) an the x are the seven values of the right hand side.

Note that PCPBJACOBI will do this automatically.



    > On Mar 10, 2018, at 5:22 AM, Ali Berk Kahraman
    < <>> wrote:
    > Hello All,
    > I am trying to get the finite difference coefficients for a
    given irregular grid. For this, I follow the following webpage,
    which tells me to solve a linear system.
    > I solve a 7 unknown linear system with a 7x7 dense matrix to get
    the finite difference coefficients. Since I will call this code
    many many many times in my overall project, I need it to be as
    fast, yet as exact as possible. So I use PCLU. I make sure that
    there are no zero diagonals on the matrix, I swap required rows
    for it. However, PCLU still diverges with the output at the end of
    this e-mail. It indicates "FACTOR_NUMERIC_ZEROPIVOT" , but as I
    have written above I make sure there are no zero main diagonal
    entries on the matrix. When I use PCILU instead, it converges
    pretty well.
    > So my question is, is PCILU the same thing mathematically as
    PCLU when applied on a small dense matrix? I need to know if I get
    the exact solution with PCILU, because my whole project will
    depend on the accuracy of the finite differences.
    > Best Regards,
    > Ali Berk Kahraman
    > M.Sc. Student, Mechanical Engineering Dept.
    > Boğaziçi Uni., Istanbul, Turkey
    > Linear solve did not converge due to DIVERGED_PCSETUP_FAILED
    iterations 0
    > KSP Object: 1 MPI processes
    >   type: gmres
    >     restart=30, using Classical (unmodified) Gram-Schmidt
    Orthogonalization with no iterative refinement
    >     happy breakdown tolerance 1e-30
    >   maximum iterations=10000, initial guess is zero
    >   tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
    >   left preconditioning
    >   using PRECONDITIONED norm type for convergence test
    > PC Object: 1 MPI processes
    >   type: lu
    >     out-of-place factorization
    >     tolerance for zero pivot 2.22045e-14
    >     matrix ordering: nd
    >     factor fill ratio given 5., needed 1.
    >       Factored matrix follows:
    >         Mat Object: 1 MPI processes
    >           type: seqaij
    >           rows=7, cols=7
    >           package used to perform factorization: petsc
    >           total: nonzeros=49, allocated nonzeros=49
    >           total number of mallocs used during MatSetValues calls =0
    >             using I-node routines: found 2 nodes, limit used is 5
    >   linear system matrix = precond matrix:
    >   Mat Object: 1 MPI processes
    >     type: seqaij
    >     rows=7, cols=7
    >     total: nonzeros=49, allocated nonzeros=49
    >     total number of mallocs used during MatSetValues calls =0
    >       using I-node routines: found 2 nodes, limit used is 5

What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener <>

Reply via email to