Re: [petsc-users] MPI linear solver reproducibility question
> In the typical FD implementation, you only set local rows, but with FE and
> sometimes FV, you also create values that need to be communicated and
> summed on other processors.

Makes sense. Anyway, in this case, I am certain that I am giving the solver bitwise identical matrices from each process. I am not using a preconditioner, I am using BCGS, and I am on PETSc version 3.13.3.

So then, how can I make sure that I am "using an MPI that follows the suggestion for implementers about determinism"? I am using MPICH version 3.3a2 and didn't do anything special when installing it. Does that sound OK? If so, I could upgrade to the latest PETSc, try again, and, if the behavior persists, provide a reproduction scenario.

On Sat, Apr 1, 2023 at 9:53 PM Jed Brown wrote:

> Mark McClure writes:
>
> > Thank you, I will try BCGSL.
> >
> > And good to know that this is worth pursuing, and that it is possible.
> > Step 1, I guess I should upgrade to the latest release of PETSc.
> >
> > How can I make sure that I am "using an MPI that follows the suggestion
> > for implementers about determinism"? I am using MPICH version 3.3a2.
> >
> > I am pretty sure that I'm assembling the same matrix every time, but I'm
> > not sure how it would depend on 'how you do the communication'. Each
> > process is doing a series of MatSetValues with INSERT_VALUES,
> > assembling the matrix by rows. My understanding of this process is that
> > it'd be deterministic.
>
> In the typical FD implementation, you only set local rows, but with FE and
> sometimes FV, you also create values that need to be communicated and
> summed on other processors.
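[Editor's note: one way to check the "bitwise identical matrices" claim between runs is to dump the assembled matrix with PETSc's binary viewer and compare bytes. A sketch; `./myapp` and the process count are placeholders, and `-mat_view binary:<file>` is the standard PETSc viewer option syntax.]

```shell
# Run the application twice, dumping the assembled matrix each time.
# -mat_view binary:<file> writes the assembled matrix with the PETSc
# binary viewer after MatAssemblyEnd.
mpiexec -n 4 ./myapp -mat_view binary:mat_run1.bin
mpiexec -n 4 ./myapp -mat_view binary:mat_run2.bin

# cmp exits 0 only if the two files are byte-for-byte identical.
cmp mat_run1.bin mat_run2.bin && echo "matrices are bitwise identical"
```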
Re: [petsc-users] MPI linear solver reproducibility question
Mark McClure writes:

> Thank you, I will try BCGSL.
>
> And good to know that this is worth pursuing, and that it is possible.
> Step 1, I guess I should upgrade to the latest release of PETSc.
>
> How can I make sure that I am "using an MPI that follows the suggestion
> for implementers about determinism"? I am using MPICH version 3.3a2.
>
> I am pretty sure that I'm assembling the same matrix every time, but I'm
> not sure how it would depend on 'how you do the communication'. Each
> process is doing a series of MatSetValues with INSERT_VALUES,
> assembling the matrix by rows. My understanding of this process is that
> it'd be deterministic.

In the typical FD implementation, you only set local rows, but with FE and sometimes FV, you also create values that need to be communicated and summed on other processors.
Re: [petsc-users] MPI linear solver reproducibility question
Thank you, I will try BCGSL.

And good to know that this is worth pursuing, and that it is possible. Step 1, I guess I should upgrade to the latest release of PETSc.

How can I make sure that I am "using an MPI that follows the suggestion for implementers about determinism"? I am using MPICH version 3.3a2.

I am pretty sure that I'm assembling the same matrix every time, but I'm not sure how it would depend on 'how you do the communication'. Each process is doing a series of MatSetValues with INSERT_VALUES, assembling the matrix by rows. My understanding of this process is that it'd be deterministic.

On Sat, Apr 1, 2023 at 9:05 PM Jed Brown wrote:

> If you use unpreconditioned BCGS and ensure that you assemble the same
> matrix (depends how you do the communication for that), I think you'll get
> bitwise reproducible results when using an MPI that follows the suggestion
> for implementers about determinism. Beyond that, it'll depend somewhat on
> the preconditioner.
>
> If you like BCGS, you may want to try BCGSL, which has a longer memory and
> tends to be more robust. But preconditioning is usually critical and the
> place to devote most effort.
>
> Mark McClure writes:
>
> > Hello,
> >
> > I have been a user of PETSc for quite a few years, though I haven't
> > updated my version in a few years, so it's possible that my comments
> > below could be 'out of date'.
> >
> > Several years ago, I'd asked you guys about reproducibility. I observed
> > that if I gave an identical matrix to the PETSc linear solver, I would
> > get a bit-wise identical result back if running on one processor, but if
> > I ran with MPI, I would see differences in the final sig figs, below the
> > convergence criterion, even if rerunning the same exact calculation on
> > the same exact machine.
> >
> > I.e., with repeated tests, it was always converging to the same answer
> > 'within convergence tolerance', but not consistent in the sig figs
> > beyond the convergence tolerance.
> >
> > At the time, the response was that this was unavoidable, and related to
> > the issue that machine arithmetic is not associative, and so the timing
> > of when processors were recombining information (which was random,
> > effectively a race condition) was causing these differences.
> >
> > Am I remembering correctly? And, if so, is this still a property of the
> > PETSc linear solver with MPI, and is there now any option available to
> > resolve it? I would be willing to accept a performance hit in order to
> > get guaranteed bitwise consistency, even when running with MPI.
> >
> > I am using the solver KSPBCGS, without a preconditioner. This is the
> > selection because several years ago, I did testing and found that on the
> > particular linear systems that I am usually working with, this solver
> > (with no preconditioner) was the most robust, in terms of consistently
> > converging and in terms of performance. Actually, I also tested a
> > variety of other linear solvers besides PETSc (including other
> > implementations of BiCGStab), and found that the PETSc BCGS was the best
> > performer. Though, I'm curious, have there been updates to that
> > algorithm in recent years, such that I should consider updating to a
> > newer PETSc build and comparing?
> >
> > Best regards,
> > Mark McClure
Re: [petsc-users] MPI linear solver reproducibility question
If you use unpreconditioned BCGS and ensure that you assemble the same matrix (depends how you do the communication for that), I think you'll get bitwise reproducible results when using an MPI that follows the suggestion for implementers about determinism. Beyond that, it'll depend somewhat on the preconditioner.

If you like BCGS, you may want to try BCGSL, which has a longer memory and tends to be more robust. But preconditioning is usually critical and the place to devote most effort.

Mark McClure writes:

> Hello,
>
> I have been a user of PETSc for quite a few years, though I haven't
> updated my version in a few years, so it's possible that my comments
> below could be 'out of date'.
>
> Several years ago, I'd asked you guys about reproducibility. I observed
> that if I gave an identical matrix to the PETSc linear solver, I would
> get a bit-wise identical result back if running on one processor, but if
> I ran with MPI, I would see differences in the final sig figs, below the
> convergence criterion, even if rerunning the same exact calculation on
> the same exact machine.
>
> I.e., with repeated tests, it was always converging to the same answer
> 'within convergence tolerance', but not consistent in the sig figs beyond
> the convergence tolerance.
>
> At the time, the response was that this was unavoidable, and related to
> the issue that machine arithmetic is not associative, and so the timing
> of when processors were recombining information (which was random,
> effectively a race condition) was causing these differences.
>
> Am I remembering correctly? And, if so, is this still a property of the
> PETSc linear solver with MPI, and is there now any option available to
> resolve it? I would be willing to accept a performance hit in order to
> get guaranteed bitwise consistency, even when running with MPI.
>
> I am using the solver KSPBCGS, without a preconditioner. This is the
> selection because several years ago, I did testing and found that on the
> particular linear systems that I am usually working with, this solver
> (with no preconditioner) was the most robust, in terms of consistently
> converging and in terms of performance. Actually, I also tested a variety
> of other linear solvers besides PETSc (including other implementations
> of BiCGStab), and found that the PETSc BCGS was the best performer.
> Though, I'm curious, have there been updates to that algorithm in recent
> years, such that I should consider updating to a newer PETSc build and
> comparing?
>
> Best regards,
> Mark McClure
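[Editor's note: the solver configuration discussed in this exchange can be selected at runtime with PETSc's standard command-line options; `./myapp` is a placeholder for the actual executable.]

```shell
# Unpreconditioned BiCGStab (KSPBCGS with PCNONE), as discussed above:
mpiexec -n 4 ./myapp -ksp_type bcgs -pc_type none

# Jed's suggested alternative: BiCGStab(L), which keeps a longer
# recurrence and tends to be more robust:
mpiexec -n 4 ./myapp -ksp_type bcgsl -pc_type none
```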
[petsc-users] MPI linear solver reproducibility question
Hello,

I have been a user of PETSc for quite a few years, though I haven't updated my version in a few years, so it's possible that my comments below could be 'out of date'.

Several years ago, I'd asked you guys about reproducibility. I observed that if I gave an identical matrix to the PETSc linear solver, I would get a bit-wise identical result back if running on one processor, but if I ran with MPI, I would see differences in the final sig figs, below the convergence criterion, even if rerunning the same exact calculation on the same exact machine.

I.e., with repeated tests, it was always converging to the same answer 'within convergence tolerance', but not consistent in the sig figs beyond the convergence tolerance.

At the time, the response was that this was unavoidable, and related to the issue that machine arithmetic is not associative, and so the timing of when processors were recombining information (which was random, effectively a race condition) was causing these differences.

Am I remembering correctly? And, if so, is this still a property of the PETSc linear solver with MPI, and is there now any option available to resolve it? I would be willing to accept a performance hit in order to get guaranteed bitwise consistency, even when running with MPI.

I am using the solver KSPBCGS, without a preconditioner. This is the selection because several years ago, I did testing and found that on the particular linear systems that I am usually working with, this solver (with no preconditioner) was the most robust, in terms of consistently converging and in terms of performance. Actually, I also tested a variety of other linear solvers besides PETSc (including other implementations of BiCGStab), and found that the PETSc BCGS was the best performer. Though, I'm curious, have there been updates to that algorithm in recent years, such that I should consider updating to a newer PETSc build and comparing?

Best regards,
Mark McClure
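[Editor's note: the effect described above comes from floating-point addition not being associative, so the order in which partial sums from different ranks are combined changes the low-order bits. A minimal single-process Python illustration:]

```python
# Floating-point addition is not associative: the same operands grouped
# differently give results that differ in the last bits. Reordered MPI
# reductions do exactly this to parallel dot products and norms.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6

print(left == right)  # False: same operands, different grouping
```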
Re: [petsc-users] Augmented Linear System
Dear Mark,

I thought that all bets would be off with my attempt. So, just to illustrate what I meant by my second approach: I can start from the second line of the block system and rewrite it to

D dp = f2 - C du

Then I tried a pseudoinverse (for lack of alternatives I did this with an SVD) and get (denoting the pseudoinverse with D^+)

dp = D^+ (f2 - C du)

Plugging that into the first line, I get

(K - B D^+ C) du = f1 - B D^+ f2

The left-hand side I put into a MATSHELL, where I store D^+ and the 8 vectors for B and C as a user ctx and update them on the fly during Newton. However, as you said, using this matrix as Amat and pure K as Pmat in a KSP fails more often than it succeeds, so I guess this version of the Schur complement is not really well-behaved.

And thanks for the advice with Walker et al. What I meant here was that in the outer Newton I wanted to do an inexact Newton satisfying

|| F(x) + F'(x) dx || <= eta_k || F(x) ||

with F(x) being my residual comprised of the two vectors and F'(x) my block system. Now I can select eta_k according to some rules, but I haven't found anything on how to choose the tolerances in my KSP to finally arrive at this inequality. The reason for doing that in the first place was that I wanted to save iteration counts on my total solves, as K is non-symmetric and I have to use GMRES (preferred) or BiCGSTAB (not so preferred).

Cheers,
Elias

On 31.03.2023 at 15:25, Mark Adams wrote:

> On Fri, Mar 31, 2023 at 4:58 AM Karabelas, Elias (elias.karabe...@uni-graz.at) wrote:
>
> > Hi Mark,
> >
> > thanks for the input, however I didn't quite get what you mean. Maybe I
> > should be a bit more precise about what I want to achieve and why:
> >
> > So this specific form of block system arises in some biomedical
> > application that colleagues and I published in
> > https://www.sciencedirect.com/science/article/pii/S0045782521004230
> > (the interesting part is Appendix B.3).
> >
> > It boils down to a Newton method for solving nonlinear mechanics
> > describing the motion of the human heart, that is coupled on some
> > Neumann surfaces (the surfaces of the inner cavities of each blood pool
> > in the heart) with a pressure that comes from a complicated 0D ODE
> > model that describes cardiovascular physiology. This comes to look like
> >
> > |F_1(u,p)|   |0|
> > |        | = | |
> > |F_2(u,p)|   |0|
> >
> > with F_1 the residual of nonlinear mechanics plus a nonlinear boundary
> > coupling term and F_2 a coupling term to the ODE system. In this case u
> > is displacement and p is the pressure calculated from the ODE model
> > (one for each cavity in the heart, which gives four). After
> > linearization, we arrive exactly at the aforementioned block system,
> > which we solve at the moment by a Schur complement approach based on K.
> > So using this we can get around without doing a MATSHELL for the Schur
> > complement, as we can just apply the KSP for K five times in order to
> > approximate the solution of the Schur complement system.
>
> So you compute an explicit Schur complement (4 solves) and then the real
> solve uses 1 more K solve. I think this is pretty good as is. You are
> lucky with only 4 of these pressure equations. I've actually done this on
> a problem with 100s of extra equations (surface averaging equations), but
> the problem was linear and would run 1000s of time steps, or more, and
> this huge setup cost was amortized.
>
> > However, here it gets tricky: the outer Newton loop is working based on
> > an inexact Newton method with a forcing term from Walker et al. So
> > basically the atol and rtol of the KSP are not constant but vary, so I
> > guess this will influence how well we actually resolve the solution to
> > the Schur complement system (I tried to find some works that explain
> > how to choose forcing terms in this case but found none).
>
> Honestly, Walker is a great guy, but I would not get too hung up on this.
> I've done a lot of plasticity work long ago and gave up on Walker et al.
> Others have had the same experience. What is new with your problem is how
> accurately you want the Schur complement (4) solves.
>
> > This brought me to think if we can do this the other way around and do
> > a pseudoinverse of D, because it's 4x4 and there is no need for a KSP
> > there. I did a test implementation with a MATSHELL that realizes
> > (K - B D^+ C) and use just K for building a GAMG prec; however, this
> > fails spectacularly, because D^+ can behave very badly, whereas the
> > other way around I have (C K^-1 B - D), which behaves more like a
> > singular perturbation of the matrix C K^-1 B, which behaves nicer. So
> > here I stopped investigating, because my PETSc expertise is not bad but
> > certainly not good enough to judge which approach would pay off more in
> > terms of runtime (my gut feeling was that building a MATSHELL requires
> > then only one KSP solve vs the other 5). However, I'm happy to hear
> > some alternatives that I could pursue.
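[Editor's note: the elimination discussed in this thread — pseudoinverse of the small 4x4 block D, then a Schur complement in K — can be sketched on a tiny dense system with NumPy. The dimensions and random data are made up for illustration; in the real application K is only available through a KSP solve, not as a dense matrix.]

```python
import numpy as np

# Tiny dense sketch of the elimination described above:
#   [K B; C D] [du; dp] = [f1; f2]
# Eliminate dp with the SVD-based pseudoinverse D^+, giving
#   (K - B D^+ C) du = f1 - B D^+ f2,   dp = D^+ (f2 - C du).
rng = np.random.default_rng(0)
n, m = 8, 4                     # m = 4 pressure unknowns, as in the heart model
K = np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = 0.1 * rng.standard_normal((n, m))
C = 0.1 * rng.standard_normal((m, n))
D = np.eye(m) + 0.1 * rng.standard_normal((m, m))
f1 = rng.standard_normal(n)
f2 = rng.standard_normal(m)

Dp = np.linalg.pinv(D)                        # D^+ via SVD
du = np.linalg.solve(K - B @ Dp @ C, f1 - B @ Dp @ f2)
dp = Dp @ (f2 - C @ du)

# When D is invertible, this reproduces the monolithic solve up to roundoff.
A = np.block([[K, B], [C, D]])
x = np.linalg.solve(A, np.concatenate([f1, f2]))
print(np.allclose(du, x[:n]), np.allclose(dp, x[n:]))
```

The ill-conditioning complaint in the message shows up here whenever D is nearly singular: `pinv` then produces huge entries in D^+, and K - B D^+ C stops resembling K, which is consistent with GAMG-on-K failing as a preconditioner.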