Great, thanks for letting us know. Barry
> On Feb 25, 2017, at 6:24 PM, Gideon Simpson <[email protected]> wrote: > > Yea, I implemented it as suggested by Matt and Barry with the BCast of the > element that sits only on process 0, and that seems to work fine. > > -gideon > >> On Feb 25, 2017, at 5:57 PM, Barry Smith <[email protected]> wrote: >> >> >>> On Feb 25, 2017, at 11:00 AM, Matthew Knepley <[email protected]> wrote: >>> >>> On Sat, Feb 25, 2017 at 10:12 AM, Gideon Simpson <[email protected]> >>> wrote: >>> >>> -gideon >>> >>>> On Feb 25, 2017, at 10:44 AM, Jed Brown <[email protected]> wrote: >>>> >>>> Gideon Simpson <[email protected]> writes: >>>> >>>>> I’ve been continuing working on implementing a projection method problem, >>>>> which, loosely, looks like the following. Vec y contains the state >>>>> vector for my system, y’ = f(y) which is solved with a TS, using, for >>>>> now, rk4. >>>>> >>>>> I have added to this a TSPostStep which does root finding on the >>>>> nonlinear problem >>>>> >>>>> f(lambda) = f(lambda; y, w) = g(y + lambda * w) - g0 >>>> >>>> You might want to treat this as a DAE. >>> >>> Yea, we discussed DAE previously. For the sake of a reviewer, I’m trying >>> to implement a classical method (explicit time step + projection), rather >>> than DAE. >>> >>>> >>>>> y and w are vectors that are the size of the system (numbers of mesh >>>>> points), and lambda is a scalar perturbation. >>>>> >>>>> Right now, the snes function for solving this amounts to: >>>>> >>>>> PetscErrorCode projector(SNES snes, Vec lam, Vec f, void *ctx){ >>>>> >>>>> AppCtx * user = (AppCtx *) ctx; >>>>> const PetscScalar *lam_value; >>>>> PetscScalar *f_value; >>>>> >>>>> VecGetArrayRead(lam, &lam_value); >>>>> VecGetArray(f, &f_value); >>>>> VecCopy(user->y, user->w); >> >> If you do make SNES run on PETSC_COMM_WORLD where lam is now a parallel >> vector with 1 entry on process 0 and zero entries on all the others (and >> similar for f) >> You will need to broadcast the lam_value[0] to all processes before calling >> the VecAXPY() >> >>>>> VecAXPY(user->w, lam_value[0], user->gradMy); >>>>> >>>>> f_value[0] = g(user->w) -user->g0; >>>>> VecRestoreArrayRead(lam, &lam_value); >>>>> VecRestoreArray(f, &f_value); >>>>> >>>>> return 0; >>>>> } >>>>> >>>>> To get this all to work, I constructed the SNES and Vec lam with >>>>> PETSC_COMM_SELF, effectively putting a copy of the nonlinear problem on >>>>> each process. Basically, the nonlinear problem is low dimensional, but >>>>> it parametrically depends on the high dimensional, distributed, vectors y >>>>> and w. >>>>> >>>>> The one thing that bothers me about this is: >>>>> >>>>> 1. It would seem that each process is is doing all of these vector >>>>> operations, which is entirely redundant, as the only thing that really >>>>> needs to be shared amongst the processes is the value of lambda. Is that >>>>> correct? >>>> >>>> Only the SNES part is redundant, but it's super cheap because it's a >>>> scalar problem. The Vec operations are using global (parallel, >>>> distributed) vectors, so they are not redundant. Of course it is >>>> critical for the SNES on every process to be configured identically and >>>> deterministic so that the processes don't diverge. And g(user->w) must >>>> return the same value on each process (it probably finishes with an >>>> MPI_Allreduce or similar). >>> >>> So within the root finding problem, when it calls, for instance, VecCopy >>> (which is acting on distributed vectors), it’s not doing that once on >>> process 0, once on process 1, once on process 2, etc. Or it is, but you’re >>> saying it’s too cheap to matter? >> >> Each process is copying ITs part of the vector; which is exactly what you >> need. >>> >>> No they are collective. That is why Jed said its crucial that the same >>> branches are taken on all processes. >>> >>> Two things I’m mildly concerned about is that, since user->w changes at >>> each step of the solver, by virtue of the lambda scalar changing, if the >>> SNES’s are running asynchronously, by virtue of them being PETSC_COMM_SELF >>> constructs, couldn’t there be a collision? >>> >>> Yes, as I say above. I think its much safer to run a COMM_WORLD SNES that >>> just has nothing on most procs than to do the former >>> since you are already synchronizing everywhere. >>> >>> The operation g(user->w) does indeed conclude with an MPI_Allreduce, >>> followed by DMDAVecRestoreArray and DMRestoreLocalVector. >>> >>> Lastly, in the TSPostStep, it runs as: >>> >>> SNESSolve(user->snes, NULL,user->lam); >>> VecGetArray(user->lam, &lam_value); >> >> Again here you would need to broadcast this user->lam value to all processes >> before calling the VecAXPY() since it is only stored on process zero. >> >>> VecAXPY(y, lam_value[0], user->gradMy); >>> VecRestoreArray(user->lam, &lam_value); >>> >>> Is there any communication issue with ensuring that all the SNES’s have >>> finished on each process, before proceeding to do the vector operations? >>> >>> No. >>> >>> I keep wondering if I should just write this as a global, with the data >>> structures stored on process 0, just to avoid this kind of headache. >>> >>> Yes. >>> >>> Matt >>> >>>> >>>>> 2. Is there an obvious way around this redundancy? >>>>> >>>>> -gideon >>> >>> >>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>
