On 8/18/11 12:49 PM, Kevin Green wrote:
> On that note, do PETSc routines take into account ideas such as those in
> that paper for matrix-matrix, matrix-vector, etc. operations? I assume
> that by default they don't (since they take more time), but are there
> flags/functions that do?
One source of this lack of reproducibility is parallel accumulation, e.g.,
calling VecSetValues() with ADD_VALUES with input values coming from
different processors. If efficiency is not an issue, it is not too hard to
modify VecAssemblyBegin/End so that the final accumulation is done in
extended-precision arithmetic (e.g., with the QD library) or with a more
accurate summation algorithm. (A small illustration of why the accumulation
order matters, and a sketch of a compensated summation, appear at the end of
this message, after the quoted thread.)

-- Boyce

>
> Kevin
>
> On Thu, Aug 18, 2011 at 4:18 AM, Henning Sauerland <uerland at gmail.com> wrote:
>
> Here is an interesting paper on the topic of reproducibility of
> parallel summation:
>
> Y. He and C. H. Q. Ding. Using accurate arithmetics to improve
> numerical reproducibility and stability in parallel applications.
> J. Supercomput., 18:259–277, 2001.
>
>
> Henning
>
>
> On 18.08.2011, at 09:49, Harald Pfeiffer wrote:
>
>> Hello,
>>
>> we use PETSc to solve the nonlinear system arising from
>> pseudo-spectral discretization of certain elliptic PDEs in
>> Einstein's equations. When running the same job multiple times on
>> the same number of processors on the same workstation, we find
>> roundoff differences. Is this expected, e.g. because MPI
>> reduction calls may behave differently depending on the load of
>> the machine? Or should we be concerned and investigate further?
>>
>> Thanks,
>> Harald
>>
>>
>> -------- Original Message --------
>> Subject: Re: Quick question about derivatives in SpEC
>> Date: Tue, 16 Aug 2011 09:45:27 -0400
>> From: Gregory B. Cook <cookgb at wfu.edu>
>> To: Harald Pfeiffer <pfeiffer at cita.utoronto.ca>
>> CC: Larry Kidder <kidder at astro.cornell.edu>, Mark Scheel <scheel at tapir.caltech.edu>
>>
>>
>> Hi Harald,
>>
>> All of the tests I was doing were on the same 8 cores on my office
>> workstation. It is running Ubuntu 11 and uses the default OpenMPI
>> communication approach. To make sure it wasn't something I was doing, I
>> ran two elliptic solves of the ExtendedConformalThinSandwich() volume
>> terms. Here are the outputs of snes.dat for the different levels:
>>
>>     Run 1                          Run 2
>> Six0/snes.dat
>> 0 7.3385297958166698           0 7.3385297958166698
>> 1 5.1229060531500723           1 5.1229060531500723
>> 2 0.32616852761238285          2 0.32616852761238285
>> 3 0.012351417186533147         3 0.012351417186800266  <*****
>> 4 9.7478354935351385e-06       4 9.7478351511500114e-06
>> Six1/snes.dat
>> 0 0.13405558402489681          0 0.13405558402540407
>> 1 0.00068002100028642610       1 0.00068002089609322440
>> 2 6.8764357250058596e-08       2 6.3738394418031232e-08
>> Six2/snes.dat
>> 0 0.0063028244769771681        0 0.0063028058475922306
>> 1 1.4538921141731714e-06       1 1.4545032695605256e-06
>> Six3/snes.dat
>> 0 0.00061476105672438877       0 0.00061476093499534406
>> 1 6.0267672358059814e-08       1 5.4897793428123648e-08
>> Six4/snes.dat
>> 0 0.00053059501859595651       0 0.00053059591479892143
>> 1 4.8003269489205705e-08       1 4.8079799390886591e-08
>> Six5/snes.dat
>> 0 3.6402372419546429e-05       0 3.6402169997838670e-05
>> 1 5.3117360561476420e-09       1 5.2732089856727503e-09
>>
>> The differences are clearly at the level of roundoff, but it is
>> "strange" that you cannot reproduce identical results.
>>
>> I've attached all of the .input files for this run in case you want to
>> try to reproduce my findings.
>>
>> Greg
>>
>> On 08/16/2011 06:21 AM, Harald Pfeiffer wrote:
>> > Hi Greg,
>> >
>> > some thoughts:
>> >
>> > PETSc is using standard MPI reduction calls, which may give results that
>> > differ by roundoff. We have definitely noticed this happening for
>> > different numbers of processes, but perhaps it also happens depending
>> > on where in a cluster your jobs run (different network topology
>> > depending on whether all processors are on the same rack vs. split
>> > among racks; dynamic load-balancing of network communication).
>> >
>> > You might want to try reserving a few nodes interactively and then
>> > running the elliptic solver multiple times on that same set of nodes.
>> >
>> > The Mover does indeed do load-balanced interpolation, but when doing so,
>> > the MPI communication should not prevent identical results.
>> >
>> > Once there are roundoff differences, they are typically amplified during
>> > a PETSc linear solve. The iterative algorithm takes different paths
>> > toward the solution, and a difference of 1e-10 doesn't seem excessive.
>> >
>> > Harald
>> >
>> > ps. Preconditioning is done differently on-processor and off-processor,
>> > and therefore depends strongly on the processor count. So if you were to
>> > change the number of processors, the iterative solve would proceed very
>> > differently.
>> >
>> > On 8/15/11 10:27 PM, Gregory B. Cook wrote:
>> >> Hi Larry,
>> >>
>> >> I ran a check using the default ExtendedConformalThinSandwich() volume
>> >> terms, and this also produced roundoff-level differences between
>> >> identical runs, so I feel better about that. I am using the same
>> >> number of processors, but if there is any kind of dynamic load
>> >> balancing for interpolation/communication/etc., then I can see that
>> >> different runs might end up using different boundary communications.
>> >> Maybe that's all there is to it?
>> >>
>> >> Greg
>> >>
>> >> On 08/15/2011 04:16 PM, Larry Kidder wrote:
>> >>> Hi Greg,
>> >>>
>> >>> Harald is traveling, so I am not sure when he will answer.
>> >>> My vague recollection is that there is something about how PETSc does
>> >>> preconditioning in parallel that leads to not producing the same result,
>> >>> but I don't recall whether this happens in general or only if you change
>> >>> the distribution of processes.
>> >>>
>> >>> Larry
>> >>>
>> >>> Gregory B. Cook wrote:
>> >>>> Hi Guys,
>> >>>>
>> >>>> I have a follow-up question that may be tangentially related to my
>> >>>> original question about derivatives. This one is targeted at Harald.
>> >>>>
>> >>>> When I run a version of my code where the very small errors in the
>> >>>> derivative of the metric are not present (I code them in differently),
>> >>>> I find that running the exact same input files successively does not
>> >>>> produce exactly the same results. This is a multi-level elliptic solve
>> >>>> on a complex domain for binary black holes. On Level-0, the
>> >>>> results returned in snes.dat are identical. On Level-1, the initial
>> >>>> and second snes norms are identical, but the third differs. After
>> >>>> this, all snes norms differ.
>> >>>>
>> >>>> Is this to be expected? Does PETSc not produce identical results on
>> >>>> consecutive solves with the same starting point? Is there something in
>> >>>> the MPI communication that means that the results should differ? The
>> >>>> differences start at the order of 10^-13 but grow by the 6th level to
>> >>>> be of order 10^-10.
>> >>>>
>> >>>> Greg
>> >>>>
>> >>>> On 08/15/2011 01:02 PM, Larry Kidder wrote:
>> >>>>> Hi Greg,
>> >>>>>
>> >>>>> Did you compute the norm of the metric itself?
>> >>>>> What domain did you use?
>> >>>>>
>> >>>>> Larry
>> >>>>>
>> >>>>> Gregory B. Cook wrote:
>> >>>>>> Hi Guys,
>> >>>>>>
>> >>>>>> I was doing a simple test as part of debugging some code I'm writing.
>> >>>>>> I ended up placing the following relevant lines of code into the
>> >>>>>> EllipticItems.input and EllipticObservers.input files:
>> >>>>>>
>> >>>>>> ---EllipticItems.input---
>> >>>>>> EvaluateMatrixFormula(Output=InvConformalMetric; Dim=3; Symm=11;
>> >>>>>>                       M[0,0]=1; M[1,1]=1; M[2,2]=1),
>> >>>>>> FirstDeriv(Input=InvConformalMetric; Output=dInvConformalMetric),
>> >>>>>> SecondDeriv(Input=InvConformalMetric; Output=ddInvConformalMetric),
>> >>>>>>
>> >>>>>> FlattenDeriv(Input=dInvConformalMetric;
>> >>>>>>              Output=fdInvConformalMetric; DerivPosition=Last),
>> >>>>>> FlattenDeriv(Input=ddInvConformalMetric;
>> >>>>>>              Output=fddInvConformalMetric; DerivPosition=Last),
>> >>>>>>
>> >>>>>> ---EllipticObservers.input---
>> >>>>>> NormOfTensor(Input=fdInvConformalMetric, fddInvConformalMetric;
>> >>>>>>              Filename=dInvCM_L2.dat; Op=L2; MetricForTensors=None),
>> >>>>>> NormOfTensor(Input=fdInvConformalMetric, fddInvConformalMetric;
>> >>>>>>              Filename=dInvCM_Linf.dat; Op=Linf; MetricForTensors=None),
>> >>>>>>
>> >>>>>> The odd thing is that the norms that I get out are not exactly zero.
>> >>>>>> They are very small, but I'm taking the first and second derivatives
>> >>>>>> of the identity matrix, so I would expect them to evaluate to exactly
>> >>>>>> zero. The fact that they don't leads me to think that there is
>> >>>>>> something wrong either in my code or in how I have written the input
>> >>>>>> files.
>> >>>>>>
>> >>>>>> Should these derivatives evaluate to exactly zero?
>> >>>>>>
>> >>>>>> Greg
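
P.S. A tiny, self-contained illustration of why the accumulation order matters
at all (plain C, nothing PETSc-specific; the numbers are chosen only to make
the effect visible): floating-point addition is not associative, so whenever
partial sums are combined in a different order (for example, because messages
arrive in a different order or processes are placed differently), the
low-order bits of the result can change.

#include <stdio.h>

int main(void)
{
  /* Floating-point addition is not associative: grouping the same three
     numbers differently gives different results.  This is why a parallel
     reduction can depend on the order in which contributions are combined. */
  double a = 1.0e16, b = -1.0e16, c = 1.0;

  double left  = (a + b) + c;  /* a and b cancel exactly, then add c: 1 */
  double right = a + (b + c);  /* c is lost to rounding in (b + c): typically 0 */

  printf("(a+b)+c = %.17g\n", left);
  printf("a+(b+c) = %.17g\n", right);
  return 0;
}

Once such a last-bit difference is introduced, the nonlinear/linear iterations
tend to amplify it, which is consistent with the growth from roughly 1e-13 to
1e-10 reported in the quoted thread.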
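
P.P.S. And to make the "more accurate summation algorithm" suggestion above a
bit more concrete, here is a minimal sketch of compensated summation in the
Kahan-Babuska-Neumaier style (again plain C, not PETSc code; the function name
compensated_sum is made up for the example). The idea is to carry a running
correction term that holds the low-order bits an ordinary addition would
discard. Something like this, or accumulation in double-double precision via
the QD library, is the kind of routine one could substitute for the final
accumulation of off-process contributions in a modified VecAssemblyEnd().

#include <math.h>
#include <stdio.h>

/* Compensated summation (Kahan-Babuska-Neumaier variant): alongside the
   running sum, keep a correction term with the bits that plain addition
   would discard.  Illustration only. */
static double compensated_sum(const double *x, int n)
{
  double sum = 0.0, c = 0.0;
  for (int i = 0; i < n; i++) {
    double t = sum + x[i];
    if (fabs(sum) >= fabs(x[i]))
      c += (sum - t) + x[i];   /* low-order bits of x[i] were lost */
    else
      c += (x[i] - t) + sum;   /* low-order bits of sum were lost */
    sum = t;
  }
  return sum + c;              /* apply the accumulated correction once */
}

int main(void)
{
  /* Contributions of very different magnitude, as can happen when
     off-process values are accumulated with ADD_VALUES.  Exact sum: 2.0. */
  double x[] = { 1.0e16, 1.0, -1.0e16, 1.0 };
  double naive = 0.0;
  for (int i = 0; i < 4; i++) naive += x[i];
  printf("naive sum:       %.17g\n", naive);                 /* typically 1 */
  printf("compensated sum: %.17g\n", compensated_sum(x, 4)); /* 2 */
  return 0;
}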
