Yea, I implemented it as suggested by Matt and Barry with the BCast of the element that sits only on process 0, and that seems to work fine.
-gideon > On Feb 25, 2017, at 5:57 PM, Barry Smith <[email protected]> wrote: > > >> On Feb 25, 2017, at 11:00 AM, Matthew Knepley <[email protected]> wrote: >> >> On Sat, Feb 25, 2017 at 10:12 AM, Gideon Simpson <[email protected]> >> wrote: >> >> -gideon >> >>> On Feb 25, 2017, at 10:44 AM, Jed Brown <[email protected]> wrote: >>> >>> Gideon Simpson <[email protected]> writes: >>> >>>> I’ve been continuing working on implementing a projection method problem, >>>> which, loosely, looks like the following. Vec y contains the state vector >>>> for my system, y’ = f(y) which is solved with a TS, using, for now, rk4. >>>> >>>> I have added to this a TSPostStep which does root finding on the nonlinear >>>> problem >>>> >>>> f(lambda) = f(lambda; y, w) = g(y + lambda * w) - g0 >>> >>> You might want to treat this as a DAE. >> >> Yea, we discussed DAE previously. For the sake of a reviewer, I’m trying to >> implement a classical method (explicit time step + projection), rather than >> DAE. >> >>> >>>> y and w are vectors that are the size of the system (numbers of mesh >>>> points), and lambda is a scalar perturbation. >>>> >>>> Right now, the snes function for solving this amounts to: >>>> >>>> PetscErrorCode projector(SNES snes, Vec lam, Vec f, void *ctx){ >>>> >>>> AppCtx * user = (AppCtx *) ctx; >>>> const PetscScalar *lam_value; >>>> PetscScalar *f_value; >>>> >>>> VecGetArrayRead(lam, &lam_value); >>>> VecGetArray(f, &f_value); >>>> VecCopy(user->y, user->w); > > If you do make SNES run on PETSC_COMM_WORLD where lam is now a parallel > vector with 1 entry on process 0 and zero entries on all the others (and > similar for f) > You will need to broadcast the lam_value[0] to all processes before calling > the VecAXPY() > >>>> VecAXPY(user->w, lam_value[0], user->gradMy); >>>> >>>> f_value[0] = g(user->w) -user->g0; >>>> VecRestoreArrayRead(lam, &lam_value); >>>> VecRestoreArray(f, &f_value); >>>> >>>> return 0; >>>> } >>>> >>>> To get this all to work, I constructed the SNES and Vec lam with >>>> PETSC_COMM_SELF, effectively putting a copy of the nonlinear problem on >>>> each process. Basically, the nonlinear problem is low dimensional, but it >>>> parametrically depends on the high dimensional, distributed, vectors y and >>>> w. >>>> >>>> The one thing that bothers me about this is: >>>> >>>> 1. It would seem that each process is is doing all of these vector >>>> operations, which is entirely redundant, as the only thing that really >>>> needs to be shared amongst the processes is the value of lambda. Is that >>>> correct? >>> >>> Only the SNES part is redundant, but it's super cheap because it's a >>> scalar problem. The Vec operations are using global (parallel, >>> distributed) vectors, so they are not redundant. Of course it is >>> critical for the SNES on every process to be configured identically and >>> deterministic so that the processes don't diverge. And g(user->w) must >>> return the same value on each process (it probably finishes with an >>> MPI_Allreduce or similar). >> >> So within the root finding problem, when it calls, for instance, VecCopy >> (which is acting on distributed vectors), it’s not doing that once on >> process 0, once on process 1, once on process 2, etc. Or it is, but you’re >> saying it’s too cheap to matter? > > Each process is copying ITs part of the vector; which is exactly what you > need. >> >> No they are collective. That is why Jed said its crucial that the same >> branches are taken on all processes. >> >> Two things I’m mildly concerned about is that, since user->w changes at each >> step of the solver, by virtue of the lambda scalar changing, if the SNES’s >> are running asynchronously, by virtue of them being PETSC_COMM_SELF >> constructs, couldn’t there be a collision? >> >> Yes, as I say above. I think its much safer to run a COMM_WORLD SNES that >> just has nothing on most procs than to do the former >> since you are already synchronizing everywhere. >> >> The operation g(user->w) does indeed conclude with an MPI_Allreduce, >> followed by DMDAVecRestoreArray and DMRestoreLocalVector. >> >> Lastly, in the TSPostStep, it runs as: >> >> SNESSolve(user->snes, NULL,user->lam); >> VecGetArray(user->lam, &lam_value); > > Again here you would need to broadcast this user->lam value to all processes > before calling the VecAXPY() since it is only stored on process zero. > >> VecAXPY(y, lam_value[0], user->gradMy); >> VecRestoreArray(user->lam, &lam_value); >> >> Is there any communication issue with ensuring that all the SNES’s have >> finished on each process, before proceeding to do the vector operations? >> >> No. >> >> I keep wondering if I should just write this as a global, with the data >> structures stored on process 0, just to avoid this kind of headache. >> >> Yes. >> >> Matt >> >>> >>>> 2. Is there an obvious way around this redundancy? >>>> >>>> -gideon >> >> >> >> >> -- >> What most experimenters take for granted before they begin their experiments >> is infinitely more interesting than any results to which their experiments >> lead. >> -- Norbert Wiener >
