I pulled the most recent version of petsc-dev and added superlu_dist. The only case that fails is spooles on 1 processor with an mpiaij matrix created with the MatCreate/MatSetSizes/... paradigm. I suspect something in the spooles wrapper code.
code allocating mat:

    MatCreate(Py_PETSC_COMM_WORLD,&self->m);
    MatSetSizes(self->m,n,n,N,N);
    MatSetType(self->m,MATMPIAIJ);

results with -pc_type lu -pc_factor_mat_solver_package spooles -ksp_monitor_true_residual (spooles on 1 proc):

  0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
  1 KSP preconditioned resid norm 3.936838941525e-01 true resid norm 1.050257318153e+01 ||Ae||/||Ax|| 9.881970495395e-01
  2 KSP preconditioned resid norm 7.332823321496e-14 true resid norm 2.408053447453e-13 ||Ae||/||Ax|| 2.265760276816e-14

results with -pc_type lu -pc_factor_mat_solver_package spooles -ksp_monitor_true_residual (spooles on 8 procs):

  0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
  1 KSP preconditioned resid norm 2.898588418444e-14 true resid norm 1.045214942604e-13 ||Ae||/||Ax|| 9.834526306678e-15

results with -pc_type lu -pc_factor_mat_solver_package superlu_dist -ksp_monitor_true_residual:

  0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
  1 KSP preconditioned resid norm 2.917052183641e-14 true resid norm 1.299023464819e-13 ||Ae||/||Ax|| 1.222263471084e-14

code allocating mat:

    MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N,1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);

results with -pc_type lu -pc_factor_mat_solver_package spooles -ksp_monitor_true_residual (spooles on 1 proc):

  0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
  1 KSP preconditioned resid norm 3.093811333624e-14 true resid norm 1.046870732638e-13 ||Ae||/||Ax|| 9.850105791801e-15

On Oct 29, 2009, at 1:50 PM, Matthew Knepley wrote:

> On Thu, Oct 29, 2009 at 1:46 PM, Chris Kees <christopher.e.kees at usace.army.mil> wrote:
> Mark, Barry,
>
> Thanks for the help. For now I'm sticking with the approach of copying the CSR matrix and using the CSR data structures to do the preallocation. I'll eventually get around to writing the code for assembling directly into the PETSc matrix. I have two more questions.
>
> 1) On 1 processor, the linear solver is not converging in 1 iteration using -pc_type lu -pc_factor_mat_solver_package spooles. It converges in one iteration in parallel, or on a single processor if I set -mat_type seqaij. Also, if I go back to the old way of constructing the matrix, it DOES converge in 1 iteration on 1 processor. That is, if I replace the MatCreate/MatSetSizes/... calls from the earlier email with what I had originally:
>
> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N,1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);
>
> then the solver converges in 1 iteration as expected. This wouldn't be a big deal except that we're trying to verify that our quadratic finite elements are working, and the behavior for p1 and p2 elements is different. The above observations are for p2, while all the cases work as expected for p1. Should we just not be using mpiaij at all on 1 processor?
>
> No, you have a problem.
>
> 2) In the code snippet I took from the PETSc example there are two preallocation calls (see below). Is that correct, or should I be checking the matrix type and calling only the preallocation function for the actual matrix type? i.e. something like
>
> MatCreate(...
> MatSetSizes(...
> MatSetFromOptions(...
> MatGetType(...
> if type==mpiaij: MatMPIAIJSetPreallocationCSR(...
>
> It does not hurt to call them all.
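Matt's answer in practice: below is a minimal sketch (not from the thread) of type-agnostic preallocation, calling both CSR preallocation routines unconditionally; each one does nothing unless the matrix has the matching type, so no MatGetType() branch is needed. The names n, N, rowptr, colind, and vals stand in for the application's local CSR data.

    #include <petscmat.h>

    /* Sketch: create a matrix with the MatCreate/MatSetSizes paradigm and
       preallocate from CSR data without checking the runtime type. Each
       *SetPreallocationCSR call is a no-op unless the matrix type matches,
       so both can be called, as Matt notes above. */
    PetscErrorCode CreatePreallocated(MPI_Comm comm,PetscInt n,PetscInt N,
                                      const PetscInt *rowptr,const PetscInt *colind,
                                      const PetscScalar *vals,Mat *A)
    {
      PetscErrorCode ierr;
      PetscFunctionBegin;
      ierr = MatCreate(comm,A);CHKERRQ(ierr);
      ierr = MatSetSizes(*A,n,n,N,N);CHKERRQ(ierr);
      ierr = MatSetFromOptions(*A);CHKERRQ(ierr);   /* honors -mat_type */
      ierr = MatSeqAIJSetPreallocationCSR(*A,rowptr,colind,vals);CHKERRQ(ierr);
      ierr = MatMPIAIJSetPreallocationCSR(*A,rowptr,colind,vals);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }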
> Matt
>
> Chris
>
> On Oct 21, 2009, at 11:50 AM, Mark F. Adams wrote:
>
> Chris, just a note,
>
> Perhaps I missed something here, but I do something similar to you, e.g. overlapping partitions, and PETSc is set up very well to be intrusive in your code (I sometimes write a little 'addvalue' wrapper) and assemble your FE matrices directly into a global matrix. You use the (global) MatSetValues(...) and set the matrix option MAT_IGNORE_OFF_PROC_ENTRIES (or something like that) so that these off-processor matrix entries, which are computed redundantly, are thrown away and you get the correct matrix.
>
> As Barry said, you don't need exact non-zero counts to allocate storage in PETSc matrices, just don't underestimate.
>
> Mark
>
> On Oct 21, 2009, at 9:16 AM, Barry Smith wrote:
>
> On Oct 20, 2009, at 10:13 PM, Chris Kees wrote:
>
> Thanks. Just to make sure I'm following, when I create the matrix I do:
>
> MatCreate(PETSC_COMM_WORLD,&self->m);
> MatSetSizes(self->m,n,n,N,N);
> MatSetType(self->m,MATMPIAIJ);
> MatSetFromOptions(self->m);
> MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
> MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
>
> and then I copy the matrix at each Newton iteration using the code in the first message below. You said "...PreallocationCSR() does the copy very efficiently", but I don't think I'm supposed to call it repeatedly, right? Looking at the output some more, it does seem like building the preconditioner initially is taking much more time than the initial pass through MatSetValues.
>
> If the matrix nonzero structure stays the same, then yes, you need to call MatMPIAIJSetPreallocationCSR() only once and use the MatSetValues() calls each Newton step to update the nonzeros.
>
> Barry
>
> Chris
>
> On Oct 20, 2009, at 8:32 PM, Barry Smith wrote:
>
> Here's the deal. In parallel, PETSc does not use a single CSR matrix on each process to hold the MPIAIJ matrix. Hence if you store the matrix on a process as a single CSR matrix, then it has to be copied into the PETSc data structure. MatMPIAIJSetPreallocationCSR() does the copy very efficiently. But you are right: until you free YOUR rowptr[], colind[], and a[], there are two copies of the matrix.
>
> One could argue that this is OK, because any preconditioner worth anything will take up at least that much memory later anyway. For example, if you use the PETSc default preconditioner, block Jacobi with ILU(0) on each block, it will take pretty much the same amount of memory.
>
> For people who use SOR or some other preconditioner that requires very little memory, this is not satisfactory because they feel they could run larger problems without the copy.
>
> I strongly recommend you use MatMPIAIJSetPreallocationCSR() and live with any memory limitation. The reason is that doing more than that brings orders of magnitude more pain with generally little payoff. If you like pain, you can
> 1) change your code to not build directly into CSR, and instead call MatSetValuesLocal() as you compute the entries. With this you need to figure out the preallocation, which can be painful code.
> 2) write an entire matrix class that stores the matrix the way you store it, not the way PETSc stores it. This is a huge project, since you have to write your own matrix-vector product, ILU factorization, etc.
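A short sketch of the update cycle Barry recommends: preallocate once with MatMPIAIJSetPreallocationCSR(), then refill the unchanged nonzero pattern each Newton step. The loop mirrors Chris's assembly code below; UpdateJacobian is a hypothetical name, and it assumes a local-to-global mapping was set on the matrix (MatSetLocalToGlobalMapping()). Mark's MAT_IGNORE_OFF_PROC_ENTRIES option (which does exist under that name) would be set with MatSetOption() before assembly.

    #include <petscmat.h>

    /* Sketch: refill an already-preallocated matrix each Newton step.
       ldim, rowptr, colind, and a are the local CSR data from the thread. */
    PetscErrorCode UpdateJacobian(Mat A,PetscInt ldim,const PetscInt *rowptr,
                                  const PetscInt *colind,const PetscScalar *a)
    {
      PetscErrorCode ierr;
      PetscInt       i;
      PetscFunctionBegin;
      for (i=0; i<ldim; i++) {
        ierr = MatSetValuesLocal(A,1,&i,rowptr[i+1]-rowptr[i],
                                 &colind[rowptr[i]],&a[rowptr[i]],
                                 INSERT_VALUES);CHKERRQ(ierr);
      }
      ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }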
> Barry
>
> On Oct 20, 2009, at 8:04 PM, Chris Kees wrote:
>
> Hi,
>
> Our unstructured finite element code does a fairly standard overlapping decomposition that allows it to calculate all the non-zero column entries for the rows owned by the processor (rows 0...ldim-1 in the local numbering). We assemble them into a local CSR matrix and then copy them into a PETSc matrix like this:
>
> for (int i=0;i<ldim;i++)
> {
>   irow[0] = i;
>   MatSetValuesLocal(PETSCMAT(par_L),1,irow,rowptr[i+1]-rowptr[i],&colind[rowptr[i]],&a[rowptr[i]],INSERT_VALUES);
> }
>
> where rowptr and colind are the CSR indexing arrays in the local numbering.
>
> I would like to eliminate the duplicate memory (and the copy above). Is there a recommended way to let PETSc reference array 'a' directly, or a way to get rid of 'a' and translate our CSR assembly routines to use low-level PETSc data structures?
>
> So far I've added MatMPIAIJSetPreallocationCSR to try to speed up the first solve, but the documentation is clear that 'a' is not a pointer to the AIJ storage, so I still need the copy. I tried using MatCreateMPIAIJWithSplitArrays with the 'o' arrays set to zero, but I got indexing errors, and the documentation really seems to imply that the local matrix is not a standard CSR anyway. If that's true, then it seems like the options I have are to write a shell matrix for parallel CSR storage or to write some new assembly routines that use MatSetValuesLocal. I'm more concerned about the memory than the overhead of MatSetValuesLocal, but it would certainly be easier on me to use the CSR-based assembly routines we already have.
>
> Thanks,
> Chris
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
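For completeness, a hedged sketch of Barry's option 1 from the last message: compute element matrices and add them straight into the PETSc matrix with MatSetValuesLocal(), skipping the intermediate CSR copy entirely. Everything here (nen, lm, ComputeElementMatrix) is hypothetical FE bookkeeping, not code from the thread, and it assumes preallocation and a local-to-global mapping are already in place.

    #include <petscmat.h>

    /* Hypothetical element loop: 'nen' nodes per element, lm[e] maps element
       nodes to local matrix indices, ke holds the dense element matrix. */
    PetscErrorCode AssembleDirect(Mat A,PetscInt nelem,PetscInt nen,PetscInt **lm,
                                  PetscErrorCode (*ComputeElementMatrix)(PetscInt,PetscScalar*))
    {
      PetscErrorCode ierr;
      PetscInt       e;
      PetscScalar   *ke;
      PetscFunctionBegin;
      ierr = PetscMalloc(nen*nen*sizeof(PetscScalar),&ke);CHKERRQ(ierr);
      ierr = MatZeroEntries(A);CHKERRQ(ierr);
      for (e=0; e<nelem; e++) {
        ierr = (*ComputeElementMatrix)(e,ke);CHKERRQ(ierr);  /* hypothetical */
        /* Redundantly computed off-process rows could be dropped here via
           MAT_IGNORE_OFF_PROC_ENTRIES, as Mark suggests above. */
        ierr = MatSetValuesLocal(A,nen,lm[e],nen,lm[e],ke,ADD_VALUES);CHKERRQ(ierr);
      }
      ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = PetscFree(ke);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }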