Hi, Folks --

I recently noticed a reproducibility issue with some parallel code that uses PETSc. Basically, I can run the same parallel simulation on the same number of processors and get different answers. The discrepancies only become apparent to the eye after very long integration times (approx. 20k timesteps), but in fact the computations start to diverge after only 30--40 timesteps. (Running the same code in serial yields reproducible results, but changing the MPI library or running the code on a different system made no difference.)
I believe that I have tracked this issue down to VecSetValues()/VecSetValuesBlocked(). Part of the computation uses VecSetValues() to accumulate the values of forces at the nodes of a mesh, and at some nodes the force contributions come from multiple processors. There appears to be some "randomness" in the way this accumulation is performed in parallel, presumably related to the order in which values are communicated: since floating-point addition is not associative, summing the same contributions in a different order can give slightly different results.

I have found that I can make modified versions of VecSetValues_MPI(), VecSetValuesBlocked_MPI(), and VecAssemblyEnd_MPI() that ensure reproducible results. I make the following modifications:

1. VecSetValues_MPI() and VecSetValuesBlocked_MPI() place all values into the stash instead of immediately adding the local components.

2. VecAssemblyEnd_MPI() "extracts" all of the values provided by the stash and bstash, placing them into a list for each local entry of the vector. Once all values have been extracted, each list is sorted (e.g., by descending or ascending magnitude) before the final summation is performed.

Making these changes appears to yield exactly reproducible results, although I am still performing tests to shake out any other problems.

Another approach that seems to work is to perform the final summation in higher precision (e.g., using "double-double precision" arithmetic). Double-double precision allows me to skip the sorting step, although since the number of values to sum per entry is relatively small, sorting may actually be cheaper. Note that merely using a more accurate summation method (e.g., compensated summation) does not appear to fix the reproducibility problem, presumably because the result still depends on the order of the summands.

I was wondering if anyone has run into similar issues with PETSc and has a better solution. Thanks!

-- Boyce

PS: These tests were done using PETSc 2.3.3. I have just finished upgrading the code to PETSc 3.0.0 and will re-run them. However, VecSetValues_MPI(), VecSetValuesBlocked_MPI(), etc. look essentially the same in PETSc 2.3.3 and 3.0.0.
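PPS: For concreteness, here is roughly what the force accumulation looks like (a simplified sketch with made-up names, not our actual code). Each processor adds its contributions with ADD_VALUES, including contributions to nodes owned by other processors; the off-process values are stashed, communicated, and summed during assembly, which is where the ordering dependence enters:

#include "petscvec.h"

/* Each rank adds force contributions at the mesh nodes it touches.  Nodes
   shared between ranks receive contributions from several processes; with
   ADD_VALUES these are combined during assembly, and the order in which
   the stashed (off-process) values are added is not fixed. */
PetscErrorCode AccumulateForces(Vec F, PetscInt n, const PetscInt nodes[],
                                const PetscScalar forces[])
{
  PetscErrorCode ierr;

  ierr = VecSetValues(F, n, nodes, forces, ADD_VALUES); CHKERRQ(ierr);
  ierr = VecAssemblyBegin(F); CHKERRQ(ierr);
  ierr = VecAssemblyEnd(F); CHKERRQ(ierr);
  return 0;
}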
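Here is the idea behind the sorted summation in the modified VecAssemblyEnd_MPI() (again just a sketch, assuming real scalars; the actual changes operate on the stash data structures). Once every contribution to a given local entry has been collected into a list, sorting fixes the order of the additions, so the result no longer depends on the order in which messages arrived. Note that ties in magnitude need to be broken consistently (e.g., by signed value), since otherwise entries with equal magnitude but different sign could still be summed in different orders on different runs:

#include <math.h>
#include <stdlib.h>

/* Order contributions by descending magnitude; break magnitude ties by
   signed value so that the ordering (and hence the floating-point sum)
   depends only on the set of values, not on their arrival order. */
static int CompareContrib(const void *a, const void *b)
{
  const double x = *(const double *)a, y = *(const double *)b;
  const double mx = fabs(x), my = fabs(y);

  if (mx > my) return -1;
  if (mx < my) return 1;
  return (x > y) ? -1 : ((x < y) ? 1 : 0);
}

/* Deterministically sum all contributions to a single vector entry. */
double DeterministicSum(double *vals, int n)
{
  double sum = 0.0;
  int i;

  qsort(vals, n, sizeof(double), CompareContrib);
  for (i = 0; i < n; i++) sum += vals[i];
  return sum;
}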
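And here is a sketch of the double-double accumulation (a simplified version based on the standard error-free TwoSum transformation; actual double-double libraries differ in their renormalization details). Carrying roughly 106 bits of precision through the accumulation makes the final double-precision result insensitive to the summation order in practice, which is why the sorting step can be skipped:

/* Double-double accumulator: the represented value is approximately hi + lo. */
typedef struct {
  double hi, lo;
} DDouble;

/* Add b to acc using Knuth's TwoSum, which recovers the exact rounding
   error of hi + b; the error is folded into the low-order word and the
   pair is then renormalized. */
static DDouble DDAdd(DDouble acc, double b)
{
  double s   = acc.hi + b;
  double bv  = s - acc.hi;
  double err = (acc.hi - (s - bv)) + (b - bv); /* exact error of acc.hi + b */
  double lo  = acc.lo + err;
  double hi  = s + lo;                         /* renormalize (FastTwoSum) */
  DDouble r;

  r.hi = hi;
  r.lo = lo - (hi - s);
  return r;
}

/* Sum the contributions in double-double, then round once back to double. */
double DDSum(const double *vals, int n)
{
  DDouble acc = {0.0, 0.0};
  int i;

  for (i = 0; i < n; i++) acc = DDAdd(acc, vals[i]);
  return acc.hi + acc.lo;
}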
