Re: [petsc-users] Parallel TS for ODE

Francesco Brarda Tue, 20 Apr 2021 10:17:24 -0700

Thank you for the advices, I would just like to convert the code I already have 
to see what might happen once parallelized.
Do you think it is better to put the 3 equations into a 1d Distributed Array 
with 3 dofs and run the job with multiple procs regardless of how many 
equations I have? Is it possible?


Thank you,
Francesco

> Il giorno 20 apr 2021, alle ore 17:57, Stefano Zampini 
> <[email protected]> ha scritto:
> 
> It does not make sense to parallelize to 1 equation per process, unless that 
> single equation per process is super super super costly.
> Is this work you are doing used to understand PETSc parallelization strategy? 
> if so, there are multiple examples in the sourcetree that you can look at to 
> populate matrices and vectors in parallel
> 
> Il giorno mar 20 apr 2021 alle ore 17:52 Francesco Brarda 
> <[email protected] <mailto:[email protected]>> ha scritto:
> In principle the entire code was for 1 proc only. The functions were built 
> with VecGetArray(). While adapting the code for multiple procs I thought 
> using VecGetOwnershipRange was a possible way to allocate the equations in 
> the vector using multiple procs. What do you think, please? 
> 
> Thank you,
> Francesco
> 
>> Il giorno 20 apr 2021, alle ore 16:43, Matthew Knepley <[email protected] 
>> <mailto:[email protected]>> ha scritto:
>> 
>> On Tue, Apr 20, 2021 at 10:41 AM Francesco Brarda <[email protected] 
>> <mailto:[email protected]>> wrote:
>> I was trying to follow Barry's advice some time ago, but I guess that's not 
>> the way he meant it. How should I refer to the values contained in x? With 
>> Distributed Arrays?
>> 
>> That is how you get values from x. However, I cannot understand at all what 
>> you are doing with "mybase".
>> 
>>    Matt
>>  
>> Thanks 
>> Francesco
>> 
>>>>  Even though it will not scale and will deliver slower performance it is 
>>>> completely possible for you to solve the 3 variable problem using 3 MPI 
>>>> ranks. Or 10 mpi ranks. You would just create vectors/matrices with 1 
>>>> degree of freedom for the first three ranks and no degrees of freedom for 
>>>> the later ranks. During your function evaluation (and Jacobian evaluation) 
>>>> for TS you will need to set up the appropriate communication to get the 
>>>> values you need on each rank to evaluate the parts of the function 
>>>> evaluation needed by that rank. This is true for parallelizing any 
>>>> computation.
>>>> 
>>>>  Barry
>> 
>> 
>> 
>>> Il giorno 20 apr 2021, alle ore 15:40, Matthew Knepley <[email protected] 
>>> <mailto:[email protected]>> ha scritto:
>>> 
>>> On Tue, Apr 20, 2021 at 9:36 AM Francesco Brarda <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> Hi!
>>> I tried to implement the SIR model taking into account the fact that I will 
>>> only use 3 MPI ranks at this moment.
>>> I built vectors and matrices following the examples already available. In 
>>> particular, I defined the functions required similarly (RHSFunction, 
>>> IFunction, IJacobian), as follows:
>>> 
>>> I don't think this makes sense. You use "mybase" to distinguish between 3 
>>> procs, which would indicate that each procs has only
>>> 1 degree of freedom. However, you use x[1] on each proc, indicating it has 
>>> at least 2 dofs.
>>> 
>>>   Thanks,
>>> 
>>>      Matt
>>>  
>>> static PetscErrorCode RHSFunction(TS ts,PetscReal t,Vec X,Vec F,void *ctx)
>>> { 
>>>   PetscErrorCode    ierr;
>>>   AppCtx            *appctx = (AppCtx*) ctx;
>>>   PetscScalar       f;//, *x_localptr; 
>>>   const PetscScalar *x;
>>>   PetscInt          mybase;
>>>   
>>>   PetscFunctionBeginUser;
>>>   ierr = VecGetOwnershipRange(X,&mybase,NULL);CHKERRQ(ierr);
>>>   ierr = VecGetArrayRead(X,&x);CHKERRQ(ierr);
>>>   if (mybase == 0) {
>>>     f    = (PetscScalar) (-appctx->p1*x[0]*x[1]/appctx->N);
>>>     ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES);
>>>   }
>>>   if (mybase == 1) {
>>>     f    = (PetscScalar) (appctx->p1*x[0]*x[1]/appctx->N-appctx->p2*x[1]);
>>>     ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES);
>>>   }
>>>   if (mybase == 2) {
>>>     f    = (PetscScalar) (appctx->p2*x[1]);
>>>     ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES);
>>>   }
>>>   ierr = VecRestoreArrayRead(X,&x);CHKERRQ(ierr);
>>>   ierr = VecAssemblyBegin(F);CHKERRQ(ierr);
>>>   ierr = VecAssemblyEnd(F);CHKERRQ(ierr);
>>>   PetscFunctionReturn(0);
>>> }
>>> 
>>> 
>>> Whilst for the Jacobian I did:
>>> 
>>> 
>>> static PetscErrorCode IJacobian(TS ts,PetscReal t,Vec X,Vec Xdot,PetscReal 
>>> a,Mat A,Mat B,void *ctx)
>>> { 
>>>   PetscErrorCode    ierr;
>>>   AppCtx            *appctx = (AppCtx*) ctx;
>>>   PetscInt          mybase, rowcol[] = {0,1,2};
>>>   const PetscScalar *x;
>>>   
>>>   PetscFunctionBeginUser;
>>>   ierr = MatGetOwnershipRange(B,&mybase,NULL);CHKERRQ(ierr);
>>>   ierr = VecGetArrayRead(X,&x);CHKERRQ(ierr);
>>>   if (mybase == 0) {
>>>     const PetscScalar J[] = {a + appctx->p1*x[1]/appctx->N, 
>>> appctx->p1*x[0]/appctx->N, 0};
>>>     ierr = MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr);
>>>   }
>>>   if (mybase == 1) {
>>>     const PetscScalar J[] = {- appctx->p1*x[1]/appctx->N, a - 
>>> appctx->p1*x[0]/appctx->N + appctx->p2, 0};
>>>     ierr = MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr);
>>>   }
>>>   if (mybase == 2) {
>>>     const PetscScalar J[] = {0, - appctx->p2, a};
>>>     ierr = MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr);
>>>   }
>>>   ierr    = VecRestoreArrayRead(X,&x);CHKERRQ(ierr);
>>>   
>>>   ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>   ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>   if (A != B) {
>>>     ierr = MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>     ierr = MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>   }
>>>   PetscFunctionReturn(0);
>>> }
>>> 
>>> This code does not provide the correct result, that is, the solution is the 
>>> initial condition, either using implicit or explicit methods. Is the way I 
>>> defined these objects wrong? How can I fix it? 
>>> I also tried to print the Jacobian with the following commands but it does 
>>> not work (blank rows and error message). How should I print the Jacobian?
>>> 
>>> ierr = TSGetIJacobian(ts,NULL,&K, NULL, NULL); CHKERRQ(ierr);
>>> ierr = MatAssemblyBegin(K,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>> ierr = MatAssemblyEnd(K,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>> ierr = MatView(K,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);                  
>>>           
>>> 
>>> I would very much appreciate any kind of help or advice.
>>> Best,
>>> Francesco
>>> 
>>>> Il giorno 2 apr 2021, alle ore 04:45, Barry Smith <[email protected] 
>>>> <mailto:[email protected]>> ha scritto:
>>>> 
>>>> 
>>>> 
>>>>> On Apr 1, 2021, at 9:17 PM, Zhang, Hong via petsc-users 
>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Mar 31, 2021, at 2:53 AM, Francesco Brarda <[email protected] 
>>>>>> <mailto:[email protected]>> wrote:
>>>>>> 
>>>>>> Hi everyone!
>>>>>> 
>>>>>> I am trying to solve a system of 3 ODEs (a basic SIR model) with TS. 
>>>>>> Sequentially works pretty well, but I need to switch it into a parallel 
>>>>>> version. 
>>>>>> I started working with TS not very long time ago, there are few 
>>>>>> questions I’d like to share with you and if you have any advices I’d be 
>>>>>> happy to hear.
>>>>>> First of all, do I need to use a DM object even if the model is only 
>>>>>> time dependent? All the examples I found were using that object for the 
>>>>>> other variable when solving PDEs.
>>>>> 
>>>>> Are you considering SIR on a spatial domain? If so, you can parallelize 
>>>>> your model in the spatial domain using DM. Splitting the three variables 
>>>>> in the ODE among processors would not scale.
>>>> 
>>>>  Even though it will not scale and will deliver slower performance it is 
>>>> completely possible for you to solve the 3 variable problem using 3 MPI 
>>>> ranks. Or 10 mpi ranks. You would just create vectors/matrices with 1 
>>>> degree of freedom for the first three ranks and no degrees of freedom for 
>>>> the later ranks. During your function evaluation (and Jacobian evaluation) 
>>>> for TS you will need to set up the appropriate communication to get the 
>>>> values you need on each rank to evaluate the parts of the function 
>>>> evaluation needed by that rank. This is true for parallelizing any 
>>>> computation.
>>>> 
>>>>  Barry
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> 
>>>>> Hong (Mr.)  
>>>>> 
>>>>>> When I preallocate the space for the Jacobian matrix, is it better to 
>>>>>> decide the local or global space?
>>>>>> 
>>>>>> Best,
>>>>>> Francesco
>>> 
>>> 
>>> 
>>> -- 
>>> What most experimenters take for granted before they begin their 
>>> experiments is infinitely more interesting than any results to which their 
>>> experiments lead.
>>> -- Norbert Wiener
>>> 
>>> https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
>> 
>> 
>> -- 
>> What most experimenters take for granted before they begin their experiments 
>> is infinitely more interesting than any results to which their experiments 
>> lead.
>> -- Norbert Wiener
>> 
>> https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>
> 
> 
> -- 
> Stefano

Re: [petsc-users] Parallel TS for ODE

Reply via email to