On Tue, Oct 22, 2013 at 3:12 PM, paul zhang <[email protected]> wrote:
> One more question, can I solve the system in parallel?
>

Yes, or you would be using ETSC :)

   Matt

>
> On Tue, Oct 22, 2013 at 4:08 PM, Matthew Knepley <[email protected]> wrote:
>
>> On Tue, Oct 22, 2013 at 3:04 PM, huaibao zhang <[email protected]> wrote:
>>
>>> Thanks for the answer. It makes sense.
>>>
>>> However, in my case, matrix A is huge and rather sparse, and it also has
>>> a pretty good diagonal structure, although some other elements are
>>> nonzero. I have to look for a better way to solve the system more
>>> efficiently; in parallel would be even better.
>>>
>>> Attached is an example of A's structure. Each pink block is a 10x10
>>> matrix. The number of rows or columns in my case can be in the millions.
>>
>> The analytic character of the operator is usually more important than the
>> sparsity structure for scalable solvers.
>> The pattern matters a lot for direct solvers, and you should definitely
>> try them (SuperLU_dist or MUMPS in PETSc).
>> If they use too much memory or are too slow, then you need to investigate
>> good preconditioners for iterative methods.
>>
>>    Matt
>>
>>> Thanks again.
>>> Paul
>>>
>>> --
>>> Huaibao (Paul) Zhang
>>> *Gas Surface Interactions Lab*
>>> Department of Mechanical Engineering
>>> University of Kentucky,
>>> Lexington, KY, 40506-0503
>>> *Office*: 216 Ralph G. Anderson Building
>>> *Web*: gsil.engineering.uky.edu
>>>
>>> On Oct 21, 2013, at 12:53 PM, Matthew Knepley <[email protected]> wrote:
>>>
>>> On Mon, Oct 21, 2013 at 11:23 AM, paul zhang <[email protected]> wrote:
>>>
>>>> Hi Jed,
>>>>
>>>> Thanks a lot for your answer. It really helps. I built parts of the
>>>> matrix on each processor, then collected them into a global one
>>>> according to their global positions. Actually, I used two functions,
>>>> VecCreateMPI and MatCreateMPIAIJ, instead of the ones in the example;
>>>> with these, the local size as well as the global size is given. That
>>>> does not really matter, right?
>>>>
>>>> My follow-up question: since the iteration for the system is global,
>>>> would it be more efficient to solve locally instead, i.e. solve a part
>>>> on each processor rather than solving globally?
>>>
>>> No, because this ignores the coupling between domains.
>>>
>>>    Matt
>>>
>>>> Thanks again,
>>>>
>>>> Paul
>>>>
>>>> On Mon, Oct 21, 2013 at 11:42 AM, Jed Brown <[email protected]> wrote:
>>>>
>>>>> paul zhang <[email protected]> writes:
>>>>>
>>>>> > I am using KSP, more specifically the FGMRES method, with MPI to
>>>>> > solve an Ax=b system. Here is what I am doing. I cut my
>>>>> > computational domain into many pieces, and in each of them I
>>>>> > compute independently by solving the fluid equations. This has
>>>>> > nothing to do with PETSc. Finally, I collect all of the information
>>>>> > and load it into a single global A matrix.
>>>>>
>>>>> I hope you build parts of this matrix on each processor, as is done
>>>>> in the examples. Note the range Istart to Iend here:
>>>>>
>>>>> http://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex2.c.html
>>>>>
>>>>> > My question is how the PETSc functions work in parallel in my case.
>>>>> > I have two guesses. First, PETSc solves its own matrix for each
>>>>> > domain on the local processor, although A is global. In that case,
>>>>> > for values like the number of iterations and the solution vector,
>>>>> > I should get as many of them as the number of processors I used,
>>>>> > but I get only one of each. The reason would be that the processors
>>>>> > must talk to each other once all of their work is done, which is
>>>>> > why I receive the "all-reduced" value. This seems more logical to
>>>>> > me than my second guess.
>>>>>
>>>>> It does not work because the solution operators are global, so to
>>>>> solve the problem, the iteration must be global.
>>>>>
>>>>> > In the second guess, the system is solved in parallel too, but the
>>>>> > PETSc functions redistribute the global sparse matrix A across the
>>>>> > processors after its loading is complete. That is to say, each
>>>>> > processor may then not be solving its own partition of the matrix.
>>>>>
>>>>> Hopefully you build the matrix already-distributed. The default
>>>>> _preconditioner_ is local, but the iteration is global. PETSc does
>>>>> not "redistribute" the matrix automatically, though if you call
>>>>> MatSetSizes() and pass PETSC_DECIDE for the local sizes, PETSc will
>>>>> choose them.
>>>>
>>>> --
>>>> Huaibao (Paul) Zhang
>>>> *Gas Surface Interactions Lab*
>>>> Department of Mechanical Engineering
>>>> University of Kentucky,
>>>> Lexington, KY, 40506-0503
>>>> *Office*: 216 Ralph G. Anderson Building
>>>> *Web*: gsil.engineering.uky.edu
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which
>>> their experiments lead.
>>> -- Norbert Wiener
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which
>> their experiments lead.
>> -- Norbert Wiener
>
> --
> Huaibao (Paul) Zhang
> *Gas Surface Interactions Lab*
> Department of Mechanical Engineering
> University of Kentucky,
> Lexington, KY, 40506-0503
> *Office*: 216 Ralph G. Anderson Building
> *Web*: gsil.engineering.uky.edu

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
<<PastedGraphic-1.png>>
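A minimal sketch of the owned-rows assembly pattern Jed points to in ex2.c. It
is a sketch only: the global size N, the preallocation counts, and the dummy
diagonal fill below are placeholders for the actual per-domain fluid coupling,
and error checking (CHKERRQ) is omitted for brevity.

    /* A sketch only: build A already-distributed, each rank filling just the
       rows it owns (the Istart/Iend pattern from ex2.c). N, the preallocation
       counts, and the diagonal fill are placeholders. */
    #include <petscksp.h>

    int main(int argc, char **argv)
    {
      Mat         A;
      PetscInt    N = 1000000, Istart, Iend, i;
      PetscScalar v = 1.0;

      PetscInitialize(&argc, &argv, NULL, NULL);

      MatCreate(PETSC_COMM_WORLD, &A);
      MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);  /* PETSc chooses the local sizes */
      MatSetFromOptions(A);
      MatMPIAIJSetPreallocation(A, 10, NULL, 2, NULL);   /* guess: ~10 entries per row in the
                                                            diagonal block, a couple outside */
      MatSeqAIJSetPreallocation(A, 12, NULL);
      MatSetUp(A);

      MatGetOwnershipRange(A, &Istart, &Iend);           /* this rank owns rows Istart..Iend-1 */
      for (i = Istart; i < Iend; i++) {
        MatSetValues(A, 1, &i, 1, &i, &v, INSERT_VALUES); /* replace with the real couplings */
      }
      MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
      MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

      /* ... hand A to a KSP here (see the solve sketch below) ... */

      MatDestroy(&A);
      PetscFinalize();
      return 0;
    }

Building the matrix this way is equivalent in spirit to the VecCreateMPI and
MatCreateMPIAIJ calls Paul describes: passing PETSC_DECIDE for the local sizes
lets PETSc choose the ownership ranges, as Jed notes.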

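A companion sketch of the solve itself, again hedged: solve_system is a
hypothetical helper that assumes the matrix A and global size N from the
assembly sketch above and a placeholder right-hand side. It illustrates why
only one iteration count and one solution vector appear regardless of the
number of ranks: the KSP lives on PETSC_COMM_WORLD, so the FGMRES iteration is
global, while the default parallel preconditioner (block Jacobi with ILU(0) on
each block) is the local part. The options shown for switching to the direct
solvers Matt mentions follow the releases of that era; newer releases spell
the factor option differently, as noted in the comment.

    /* A sketch only: assumes a matrix A assembled as above; error checking omitted. */
    static PetscErrorCode solve_system(Mat A, PetscInt N)
    {
      Vec      x, b;
      KSP      ksp;
      PetscInt its;

      VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, N, &x); /* same row layout as the matrix */
      VecDuplicate(x, &b);
      VecSet(b, 1.0);                      /* placeholder right-hand side */

      KSPCreate(PETSC_COMM_WORLD, &ksp);
      KSPSetOperators(ksp, A, A);          /* 2013-era releases take a fourth MatStructure argument */
      KSPSetType(ksp, KSPFGMRES);          /* global Krylov iteration; the default parallel
                                              preconditioner (block Jacobi + ILU(0)) is the local part */
      KSPSetFromOptions(ksp);              /* allows run-time switches, e.g. to a parallel direct solver:
                                              -ksp_type preonly -pc_type lu
                                              -pc_factor_mat_solver_package mumps   (or superlu_dist;
                                              newer releases: -pc_factor_mat_solver_type) */
      KSPSolve(ksp, b, x);

      KSPGetIterationNumber(ksp, &its);    /* one global count, identical on every rank */
      PetscPrintf(PETSC_COMM_WORLD, "iterations %D\n", its);

      KSPDestroy(&ksp);
      VecDestroy(&x);
      VecDestroy(&b);
      return 0;
    }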