On 8/18/11 12:49 PM, Kevin Green wrote:
> On that note, do PETSc routines take into account ideas such as those in
> that paper for matrix-matrix, matrix-vector, etc. operations? I assume
> that by default they don't (since they take more time), but are there
> flags/functions that do?
One source of this lack of reproducibility is parallel accumulation, e.g.,
calling VecSetValues() with ADD_VALUES with input values coming from
different processors. If efficiency is not an issue, it is not too hard to
modify VecAssemblyBegin/End so that the final accumulation is done in
extended-precision arithmetic (e.g., with the QD library) or with a more
accurate summation algorithm. (A small illustration of why the accumulation
order matters, and a sketch of a compensated summation, appear at the end of
this message, after the quoted thread.)

-- Boyce

>
> Kevin
>
> On Thu, Aug 18, 2011 at 4:18 AM, Henning Sauerland <uerland at gmail.com> wrote:
>
> Here is an interesting paper on the topic of reproducibility of
> parallel summation:
>
> Y. He and C. H. Q. Ding. Using accurate arithmetics to improve
> numerical reproducibility and stability in parallel applications.
> J. Supercomput., 18:259–277, 2001.
>
>
> Henning
>
>
> On 18.08.2011, at 09:49, Harald Pfeiffer wrote:
>
>> Hello,
>>
>> we use PETSc to solve the nonlinear system arising from
>> pseudo-spectral discretization of certain elliptic PDEs in
>> Einstein's equations. When running the same job multiple times on
>> the same number of processors on the same workstation, we find
>> roundoff differences. Is this expected, e.g. because MPI
>> reduction calls may behave differently depending on the load of
>> the machine? Or should we be concerned and investigate further?
>>
>> Thanks,
>> Harald
>>
>>
>> -------- Original Message --------
>> Subject: Re: Quick question about derivatives in SpEC
>> Date: Tue, 16 Aug 2011 09:45:27 -0400
>> From: Gregory B. Cook <cookgb at wfu.edu>
>> To: Harald Pfeiffer <pfeiffer at cita.utoronto.ca>
>> CC: Larry Kidder <kidder at astro.cornell.edu>, Mark Scheel <scheel at tapir.caltech.edu>
>>
>>
>> Hi Harald,
>>
>> All of the tests I was doing were on the same 8 cores on my office
>> workstation. It is running Ubuntu 11 and uses the default OpenMPI
>> communication approach. To make sure it wasn't something I was doing, I
>> ran two elliptic solves of the ExtendedConformalThinSandwich() volume
>> terms. Here are the outputs of snes.dat for the different levels:
>>
>>     Run 1                          Run 2
>> Six0/snes.dat
>> 0 7.3385297958166698           0 7.3385297958166698
>> 1 5.1229060531500723           1 5.1229060531500723
>> 2 0.32616852761238285          2 0.32616852761238285
>> 3 0.012351417186533147         3 0.012351417186800266  <*****
>> 4 9.7478354935351385e-06       4 9.7478351511500114e-06
>> Six1/snes.dat
>> 0 0.13405558402489681          0 0.13405558402540407
>> 1 0.00068002100028642610       1 0.00068002089609322440
>> 2 6.8764357250058596e-08       2 6.3738394418031232e-08
>> Six2/snes.dat
>> 0 0.0063028244769771681        0 0.0063028058475922306
>> 1 1.4538921141731714e-06       1 1.4545032695605256e-06
>> Six3/snes.dat
>> 0 0.00061476105672438877       0 0.00061476093499534406
>> 1 6.0267672358059814e-08       1 5.4897793428123648e-08
>> Six4/snes.dat
>> 0 0.00053059501859595651       0 0.00053059591479892143
>> 1 4.8003269489205705e-08       1 4.8079799390886591e-08
>> Six5/snes.dat
>> 0 3.6402372419546429e-05       0 3.6402169997838670e-05
>> 1 5.3117360561476420e-09       1 5.2732089856727503e-09
>>
>> The differences are clearly at the level of roundoff, but it is
>> "strange" that you cannot reproduce identical results.
>>
>> I've attached all of the .input files for this run in case you want to
>> try to reproduce my findings.
>>
>> Greg
>>
>> On 08/16/2011 06:21 AM, Harald Pfeiffer wrote:
>> > Hi Greg,
>> >
>> > some thoughts:
>> >
>> > PETSc is using standard MPI reduction calls, which may give results that
>> > differ by roundoff. We have definitely noticed this happening for
>> > different numbers of processes, but perhaps it also happens depending
>> > on where in a cluster your jobs run (different network topology
>> > depending on whether all processors are on the same rack vs. split
>> > among racks; dynamic load-balancing of network communication).
>> >
>> > You might want to try reserving a few nodes interactively and then
>> > running the elliptic solver multiple times on that same set of nodes.
>> >
>> > The Mover does indeed do load-balanced interpolation, but when doing so,
>> > the MPI communication should not prevent identical results.
>> >
>> > Once there are roundoff differences, they are typically amplified during
>> > a PETSc linear solve. The iterative algorithm takes different paths
>> > toward the solution, and a difference of 1e-10 doesn't seem excessive.
>> >
>> > Harald
>> >
>> > ps. Preconditioning is done differently on-processor and off-processor,
>> > and therefore depends strongly on the processor count. So if you were to
>> > change the number of processors, the iterative solve would proceed very
>> > differently.
>> >
>> > On 8/15/11 10:27 PM, Gregory B. Cook wrote:
>> >> Hi Larry,
>> >>
>> >> I ran a check using the default ExtendedConformalThinSandwich() volume
>> >> terms, and this also produced roundoff-level differences between
>> >> identical runs, so I feel better about that. I am using the same
>> >> number of processors, but if there is any kind of dynamic load
>> >> balancing for interpolation/communication/etc., then I can see that
>> >> different runs might end up using different boundary communications.
>> >> Maybe that's all there is to it?
>> >>
>> >> Greg
>> >>
>> >> On 08/15/2011 04:16 PM, Larry Kidder wrote:
>> >>> Hi Greg,
>> >>>
>> >>> Harald is traveling, so I am not sure when he will answer.
>> >>> My vague recollection is that there is something about how PETSc does
>> >>> preconditioning in parallel that leads to not producing the same result,
>> >>> but I don't recall whether this happens in general or only if you change
>> >>> the distribution of processes.
>> >>>
>> >>> Larry
>> >>>
>> >>> Gregory B. Cook wrote:
>> >>>> Hi Guys,
>> >>>>
>> >>>> I have a follow-up question that may be tangentially related to my
>> >>>> original question about derivatives. This one is targeted at Harald.
>> >>>>
>> >>>> When I run a version of my code where the very small errors in the
>> >>>> derivative of the metric are not present (I code them in differently),
>> >>>> I find that running the exact same input files successively does not
>> >>>> produce exactly the same results. This is a multi-level elliptic solve
>> >>>> on a complex domain for binary black holes. On Level-0, the
>> >>>> results returned in snes.dat are identical. On Level-1, the initial
>> >>>> and second snes norms are identical, but the third differs. After
>> >>>> this, all snes norms differ.
>> >>>>
>> >>>> Is this to be expected? Does PETSc not produce identical results on
>> >>>> consecutive solves with the same starting point? Is there something in
>> >>>> the MPI communication that means that the results should differ? The
>> >>>> differences start at the order of 10^-13 but grow by the 6th level to
>> >>>> be of order 10^-10.
>> >>>>
>> >>>> Greg
>> >>>>
>> >>>> On 08/15/2011 01:02 PM, Larry Kidder wrote:
>> >>>>> Hi Greg,
>> >>>>>
>> >>>>> Did you compute the norm of the metric itself?
>> >>>>> What domain did you use?
>> >>>>>
>> >>>>> Larry
>> >>>>>
>> >>>>> Gregory B. Cook wrote:
>> >>>>>> Hi Guys,
>> >>>>>>
>> >>>>>> I was doing a simple test as part of debugging some code I'm writing.
>> >>>>>> I ended up placing the following relevant lines of code into the
>> >>>>>> EllipticItems.input and EllipticObservers.input files:
>> >>>>>>
>> >>>>>> ---EllipticItems.input---
>> >>>>>> EvaluateMatrixFormula(Output=InvConformalMetric; Dim=3; Symm=11;
>> >>>>>>                       M[0,0]=1; M[1,1]=1; M[2,2]=1),
>> >>>>>> FirstDeriv(Input=InvConformalMetric; Output=dInvConformalMetric),
>> >>>>>> SecondDeriv(Input=InvConformalMetric; Output=ddInvConformalMetric),
>> >>>>>>
>> >>>>>> FlattenDeriv(Input=dInvConformalMetric;
>> >>>>>>              Output=fdInvConformalMetric; DerivPosition=Last),
>> >>>>>> FlattenDeriv(Input=ddInvConformalMetric;
>> >>>>>>              Output=fddInvConformalMetric; DerivPosition=Last),
>> >>>>>>
>> >>>>>> ---EllipticObservers.input---
>> >>>>>> NormOfTensor(Input=fdInvConformalMetric, fddInvConformalMetric;
>> >>>>>>              Filename=dInvCM_L2.dat; Op=L2; MetricForTensors=None),
>> >>>>>> NormOfTensor(Input=fdInvConformalMetric, fddInvConformalMetric;
>> >>>>>>              Filename=dInvCM_Linf.dat; Op=Linf; MetricForTensors=None),
>> >>>>>>
>> >>>>>> The odd thing is that the norms that I get out are not exactly zero.
>> >>>>>> They are very small, but I'm taking the first and second derivatives
>> >>>>>> of the identity matrix, so I would expect them to evaluate to exactly
>> >>>>>> zero. The fact that they don't leads me to think that there is
>> >>>>>> something wrong either in my code or in how I have written the input
>> >>>>>> files.
>> >>>>>>
>> >>>>>> Should these derivatives evaluate to exactly zero?
>> >>>>>>
>> >>>>>> Greg
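
P.S. A tiny, self-contained illustration of why the accumulation order matters
at all (plain C, nothing PETSc-specific; the numbers are chosen only to make
the effect visible): floating-point addition is not associative, so whenever
partial sums are combined in a different order (for example, because messages
arrive in a different order or processes are placed differently), the
low-order bits of the result can change.

#include <stdio.h>

int main(void)
{
  /* Floating-point addition is not associative: grouping the same three
     numbers differently gives different results.  This is why a parallel
     reduction can depend on the order in which contributions are combined. */
  double a = 1.0e16, b = -1.0e16, c = 1.0;

  double left  = (a + b) + c;  /* a and b cancel exactly, then add c: 1 */
  double right = a + (b + c);  /* c is lost to rounding in (b + c): typically 0 */

  printf("(a+b)+c = %.17g\n", left);
  printf("a+(b+c) = %.17g\n", right);
  return 0;
}

Once such a last-bit difference is introduced, the nonlinear/linear iterations
tend to amplify it, which is consistent with the growth from roughly 1e-13 to
1e-10 reported in the quoted thread.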
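
P.P.S. And to make the "more accurate summation algorithm" suggestion above a
bit more concrete, here is a minimal sketch of compensated summation in the
Kahan-Babuska-Neumaier style (again plain C, not PETSc code; the function name
compensated_sum is made up for the example). The idea is to carry a running
correction term that holds the low-order bits an ordinary addition would
discard. Something like this, or accumulation in double-double precision via
the QD library, is the kind of routine one could substitute for the final
accumulation of off-process contributions in a modified VecAssemblyEnd().

#include <math.h>
#include <stdio.h>

/* Compensated summation (Kahan-Babuska-Neumaier variant): alongside the
   running sum, keep a correction term with the bits that plain addition
   would discard.  Illustration only. */
static double compensated_sum(const double *x, int n)
{
  double sum = 0.0, c = 0.0;
  for (int i = 0; i < n; i++) {
    double t = sum + x[i];
    if (fabs(sum) >= fabs(x[i]))
      c += (sum - t) + x[i];   /* low-order bits of x[i] were lost */
    else
      c += (x[i] - t) + sum;   /* low-order bits of sum were lost */
    sum = t;
  }
  return sum + c;              /* apply the accumulated correction once */
}

int main(void)
{
  /* Contributions of very different magnitude, as can happen when
     off-process values are accumulated with ADD_VALUES.  Exact sum: 2.0. */
  double x[] = { 1.0e16, 1.0, -1.0e16, 1.0 };
  double naive = 0.0;
  for (int i = 0; i < 4; i++) naive += x[i];
  printf("naive sum:       %.17g\n", naive);                 /* typically 1 */
  printf("compensated sum: %.17g\n", compensated_sum(x, 4)); /* 2 */
  return 0;
}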
