Great, thanks for letting us know.

  Barry

> On Feb 25, 2017, at 6:24 PM, Gideon Simpson <[email protected]> wrote:
> 
> Yea, I implemented it as suggested by Matt and Barry with the BCast of the 
> element that sits only on process 0, and that seems to work fine.
> 
> -gideon
> 
>> On Feb 25, 2017, at 5:57 PM, Barry Smith <[email protected]> wrote:
>> 
>> 
>>> On Feb 25, 2017, at 11:00 AM, Matthew Knepley <[email protected]> wrote:
>>> 
>>> On Sat, Feb 25, 2017 at 10:12 AM, Gideon Simpson <[email protected]> 
>>> wrote:
>>> 
>>> -gideon
>>> 
>>>> On Feb 25, 2017, at 10:44 AM, Jed Brown <[email protected]> wrote:
>>>> 
>>>> Gideon Simpson <[email protected]> writes:
>>>> 
>>>>> I’ve been continuing working on implementing a projection method problem, 
>>>>> which, loosely, looks like the following.  Vec y contains the state 
>>>>> vector for my system, y’ = f(y) which is solved with a TS, using, for 
>>>>> now, rk4.
>>>>> 
>>>>> I have added to this a TSPostStep which does root finding on the 
>>>>> nonlinear problem
>>>>> 
>>>>> f(lambda) = f(lambda; y, w) =  g(y + lambda * w) - g0
>>>> 
>>>> You might want to treat this as a DAE.
>>> 
>>> Yea, we discussed DAE previously.  For the sake of a reviewer, I’m trying 
>>> to implement a classical method (explicit time step + projection), rather 
>>> than DAE.
>>> 
>>>> 
>>>>> y and w are vectors that are the size of the system (numbers of mesh 
>>>>> points), and lambda is a scalar perturbation.  
>>>>> 
>>>>> Right now, the snes function for solving this amounts to:
>>>>> 
>>>>> PetscErrorCode projector(SNES snes, Vec lam, Vec f, void *ctx){
>>>>> 
>>>>> AppCtx * user = (AppCtx *) ctx;
>>>>> const PetscScalar *lam_value;
>>>>> PetscScalar *f_value;
>>>>> 
>>>>> VecGetArrayRead(lam, &lam_value);
>>>>> VecGetArray(f, &f_value);
>>>>> VecCopy(user->y, user->w);
>> 
>>   If you do make SNES run on PETSC_COMM_WORLD where lam is now a parallel 
>> vector with 1 entry on process 0 and zero entries on all the others (and 
>> similar for f)
>> You will need to broadcast the lam_value[0] to all processes before calling 
>> the VecAXPY()
>> 
>>>>> VecAXPY(user->w, lam_value[0], user->gradMy);
>>>>> 
>>>>> f_value[0] = g(user->w) -user->g0;
>>>>> VecRestoreArrayRead(lam, &lam_value);
>>>>> VecRestoreArray(f, &f_value);
>>>>> 
>>>>> return 0;
>>>>> }
>>>>> 
>>>>> To get this all to work, I constructed the SNES and Vec lam with 
>>>>> PETSC_COMM_SELF, effectively putting a copy of the nonlinear problem on 
>>>>> each process.  Basically, the nonlinear problem is low dimensional, but 
>>>>> it parametrically depends on the high dimensional, distributed, vectors y 
>>>>> and w.
>>>>> 
>>>>> The one thing that bothers me about this is:  
>>>>> 
>>>>> 1.  It would seem that each process is is doing all of these vector 
>>>>> operations, which is entirely redundant, as the only thing that really 
>>>>> needs to be shared amongst the processes is the value of lambda.  Is that 
>>>>> correct? 
>>>> 
>>>> Only the SNES part is redundant, but it's super cheap because it's a
>>>> scalar problem.  The Vec operations are using global (parallel,
>>>> distributed) vectors, so they are not redundant.  Of course it is
>>>> critical for the SNES on every process to be configured identically and
>>>> deterministic so that the processes don't diverge.  And g(user->w) must
>>>> return the same value on each process (it probably finishes with an
>>>> MPI_Allreduce or similar).
>>> 
>>> So within the root finding problem, when it calls, for instance, VecCopy 
>>> (which is acting on distributed vectors), it’s not doing that once on 
>>> process 0, once on process 1, once on process 2, etc.  Or it is, but you’re 
>>> saying it’s too cheap to matter? 
>> 
>>  Each process is copying ITs part of the vector; which is exactly what you 
>> need.
>>> 
>>> No they are collective. That is why Jed said its crucial that the same 
>>> branches are taken on all processes.
>>> 
>>> Two things I’m mildly concerned about is that, since user->w changes at 
>>> each step of the solver, by virtue of the lambda scalar changing,  if the 
>>> SNES’s are running asynchronously, by virtue of them being PETSC_COMM_SELF 
>>> constructs, couldn’t there be a collision?  
>>> 
>>> Yes, as I say above. I think its much safer to run a COMM_WORLD SNES that 
>>> just has nothing on most procs than to do the former
>>> since you are already synchronizing everywhere.
>>> 
>>> The operation g(user->w) does indeed conclude with an MPI_Allreduce, 
>>> followed by DMDAVecRestoreArray and DMRestoreLocalVector.
>>> 
>>> Lastly, in the TSPostStep, it runs as:
>>> 
>>> SNESSolve(user->snes, NULL,user->lam);
>>> VecGetArray(user->lam, &lam_value);
>> 
>> Again here you would need to broadcast this user->lam value to all processes 
>> before calling the VecAXPY() since it is only stored on process zero.
>> 
>>> VecAXPY(y, lam_value[0], user->gradMy);
>>> VecRestoreArray(user->lam, &lam_value);
>>> 
>>> Is there any communication issue with ensuring that all the SNES’s have 
>>> finished on each process, before proceeding to do the vector operations?
>>> 
>>> No.
>>> 
>>> I keep wondering if I should just write this as a global, with the data 
>>> structures stored on process 0, just to avoid this kind of headache. 
>>> 
>>> Yes.
>>> 
>>>  Matt
>>> 
>>>> 
>>>>> 2.  Is there an obvious way around this redundancy?
>>>>> 
>>>>> -gideon
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> What most experimenters take for granted before they begin their 
>>> experiments is infinitely more interesting than any results to which their 
>>> experiments lead.
>>> -- Norbert Wiener
>> 

Reply via email to