Congratulations, you have found a ginormous bug in PETSc! Thanks for the detailed information on the problem.
I will post a fix shortly.

  Barry

> On Nov 16, 2023, at 6:19 PM, Sreeram R Venkat <[email protected]> wrote:
>
> I have a program which reads a vector from file into an array, and then uses that array to create a PETSc Vec object. The Vec is defined on the global communicator, but not all processes actually contain entries of it. For example, suppose we have 4 processors, and the vector is of size 10. Rank 0 will contain entries 0-4 and Rank 1 will contain entries 5-9. Ranks 2 and 3 will not have any entries of the Vec.
>
> This Vec is then used as an input to other parts of the code, and those work fine. However, if I try to take the norm of the Vec with VecNorm(), I get the error
>
> `MPI_Allreduce() called in different locations (code lines) on different processors`
>
> The stack trace shows that ranks 0 and 1 (from the above example) are still in the VecNorm() function while ranks 2 and 3 have moved on to a later part of the code. If I add a PetscBarrier() after the VecNorm(), I find that the program hangs.
>
> The funny thing is that part of the code duplicates the Vec with VecDuplicate() and assigns to the duplicated vector the result of some computations. The duplicated Vec has the same layout as the original Vec, but taking VecNorm() on the duplicated Vec works fine. If I use VecCopy(), however, the copied Vec also causes VecNorm() to hang. I've printed out the original Vec, and there are no corrupted/NaN entries.
>
> I have a temporary workaround where I perturb the original Vec slightly before copying it to another Vec. This causes the program to successfully terminate.
>
> Any advice on how to get VecNorm() working with the original Vec?
>
> Thanks,
> Sreeram
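For reference, here is a minimal sketch of the layout being described: a Vec on PETSC_COMM_WORLD where only ranks 0 and 1 own entries, followed by a VecNorm() call. It is not the original file-reading code; the values and the use of VecCreate/VecSetSizes instead of an array-backed constructor are assumptions for illustration.

```c
#include <petscvec.h>

int main(int argc, char **argv)
{
  Vec         v;
  PetscMPIInt rank;
  PetscInt    nlocal, N = 10;
  PetscReal   nrm;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));

  /* Ranks 0 and 1 own 5 entries each; ranks 2 and 3 own none (as in the report) */
  nlocal = (rank < 2) ? 5 : 0;

  PetscCall(VecCreate(PETSC_COMM_WORLD, &v));
  PetscCall(VecSetSizes(v, nlocal, N));
  PetscCall(VecSetFromOptions(v));
  PetscCall(VecSet(v, 1.0));            /* stand-in for the values read from file */
  PetscCall(VecAssemblyBegin(v));
  PetscCall(VecAssemblyEnd(v));

  PetscCall(VecNorm(v, NORM_2, &nrm));  /* the call reported to diverge/hang */
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "norm = %g\n", (double)nrm));

  PetscCall(VecDestroy(&v));
  PetscCall(PetscFinalize());
  return 0;
}
```

Run with 4 MPI ranks (e.g. mpiexec -n 4 ./test) to match the 4-processor example above.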
