Yea, I implemented it as suggested by Matt and Barry with the BCast of the 
element that sits only on process 0, and that seems to work fine.

-gideon

> On Feb 25, 2017, at 5:57 PM, Barry Smith <[email protected]> wrote:
> 
> 
>> On Feb 25, 2017, at 11:00 AM, Matthew Knepley <[email protected]> wrote:
>> 
>> On Sat, Feb 25, 2017 at 10:12 AM, Gideon Simpson <[email protected]> 
>> wrote:
>> 
>> -gideon
>> 
>>> On Feb 25, 2017, at 10:44 AM, Jed Brown <[email protected]> wrote:
>>> 
>>> Gideon Simpson <[email protected]> writes:
>>> 
>>>> I’ve been continuing working on implementing a projection method problem, 
>>>> which, loosely, looks like the following.  Vec y contains the state vector 
>>>> for my system, y’ = f(y) which is solved with a TS, using, for now, rk4.
>>>> 
>>>> I have added to this a TSPostStep which does root finding on the nonlinear 
>>>> problem
>>>> 
>>>> f(lambda) = f(lambda; y, w) =  g(y + lambda * w) - g0
>>> 
>>> You might want to treat this as a DAE.
>> 
>> Yea, we discussed DAE previously.  For the sake of a reviewer, I’m trying to 
>> implement a classical method (explicit time step + projection), rather than 
>> DAE.
>> 
>>> 
>>>> y and w are vectors that are the size of the system (numbers of mesh 
>>>> points), and lambda is a scalar perturbation.  
>>>> 
>>>> Right now, the snes function for solving this amounts to:
>>>> 
>>>> PetscErrorCode projector(SNES snes, Vec lam, Vec f, void *ctx){
>>>> 
>>>> AppCtx * user = (AppCtx *) ctx;
>>>> const PetscScalar *lam_value;
>>>> PetscScalar *f_value;
>>>> 
>>>> VecGetArrayRead(lam, &lam_value);
>>>> VecGetArray(f, &f_value);
>>>> VecCopy(user->y, user->w);
> 
>    If you do make SNES run on PETSC_COMM_WORLD where lam is now a parallel 
> vector with 1 entry on process 0 and zero entries on all the others (and 
> similar for f)
> You will need to broadcast the lam_value[0] to all processes before calling 
> the VecAXPY()
> 
>>>> VecAXPY(user->w, lam_value[0], user->gradMy);
>>>> 
>>>> f_value[0] = g(user->w) -user->g0;
>>>> VecRestoreArrayRead(lam, &lam_value);
>>>> VecRestoreArray(f, &f_value);
>>>> 
>>>> return 0;
>>>> }
>>>> 
>>>> To get this all to work, I constructed the SNES and Vec lam with 
>>>> PETSC_COMM_SELF, effectively putting a copy of the nonlinear problem on 
>>>> each process.  Basically, the nonlinear problem is low dimensional, but it 
>>>> parametrically depends on the high dimensional, distributed, vectors y and 
>>>> w.
>>>> 
>>>> The one thing that bothers me about this is:  
>>>> 
>>>> 1.  It would seem that each process is is doing all of these vector 
>>>> operations, which is entirely redundant, as the only thing that really 
>>>> needs to be shared amongst the processes is the value of lambda.  Is that 
>>>> correct? 
>>> 
>>> Only the SNES part is redundant, but it's super cheap because it's a
>>> scalar problem.  The Vec operations are using global (parallel,
>>> distributed) vectors, so they are not redundant.  Of course it is
>>> critical for the SNES on every process to be configured identically and
>>> deterministic so that the processes don't diverge.  And g(user->w) must
>>> return the same value on each process (it probably finishes with an
>>> MPI_Allreduce or similar).
>> 
>> So within the root finding problem, when it calls, for instance, VecCopy 
>> (which is acting on distributed vectors), it’s not doing that once on 
>> process 0, once on process 1, once on process 2, etc.  Or it is, but you’re 
>> saying it’s too cheap to matter? 
> 
>   Each process is copying ITs part of the vector; which is exactly what you 
> need.
>> 
>> No they are collective. That is why Jed said its crucial that the same 
>> branches are taken on all processes.
>> 
>> Two things I’m mildly concerned about is that, since user->w changes at each 
>> step of the solver, by virtue of the lambda scalar changing,  if the SNES’s 
>> are running asynchronously, by virtue of them being PETSC_COMM_SELF 
>> constructs, couldn’t there be a collision?  
>> 
>> Yes, as I say above. I think its much safer to run a COMM_WORLD SNES that 
>> just has nothing on most procs than to do the former
>> since you are already synchronizing everywhere.
>> 
>> The operation g(user->w) does indeed conclude with an MPI_Allreduce, 
>> followed by DMDAVecRestoreArray and DMRestoreLocalVector.
>> 
>> Lastly, in the TSPostStep, it runs as:
>> 
>> SNESSolve(user->snes, NULL,user->lam);
>> VecGetArray(user->lam, &lam_value);
> 
> Again here you would need to broadcast this user->lam value to all processes 
> before calling the VecAXPY() since it is only stored on process zero.
> 
>> VecAXPY(y, lam_value[0], user->gradMy);
>> VecRestoreArray(user->lam, &lam_value);
>> 
>> Is there any communication issue with ensuring that all the SNES’s have 
>> finished on each process, before proceeding to do the vector operations?
>> 
>> No.
>> 
>> I keep wondering if I should just write this as a global, with the data 
>> structures stored on process 0, just to avoid this kind of headache. 
>> 
>> Yes.
>> 
>>   Matt
>> 
>>> 
>>>> 2.  Is there an obvious way around this redundancy?
>>>> 
>>>> -gideon
>> 
>> 
>> 
>> 
>> -- 
>> What most experimenters take for granted before they begin their experiments 
>> is infinitely more interesting than any results to which their experiments 
>> lead.
>> -- Norbert Wiener
> 

Reply via email to