Re: [petsc-users] Parallel TS for ODE

Francesco Brarda Wed, 05 May 2021 02:12:27 -0700

Many thanks, this helped a lot. Now it finally works.

Best,
Francesco


> Il giorno 4 mag 2021, alle ore 17:40, Barry Smith <[email protected]> ha 
> scritto:
> 
> 
> 
>> On May 4, 2021, at 2:53 AM, Francesco Brarda <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> Thank you very much everyone. I do have one more question. 
>>> For what you want to do you can use 3,1,2.  This says three "spatial" 
>>> points, 1 dof at each "spatial" point and 2 ghost points. In your case 
>>> "spatial" does not mean spatial in space it is just three abstract points. 
>> In this case, since I have 2 ghost points, should I change the 
>> DMBoundaryType? 
>> For one process this works, but with 2 or 3 procs it doesn’t. The error I 
>> have is the following:
>> 
>> Solving a non-linear TS problem, number of processors = 2
>> [1]PETSC ERROR: --------------------- Error Message 
>> --------------------------------------------------------------
>> [1]PETSC ERROR: Argument out of range
>> [1]PETSC ERROR: Local x-width of domain x 1 is smaller than stencil width s 2
> 
>      I had forgotten this. It is a limitation of the DMDA implementation that 
> one cannot require data from two ranks away from the current rank. 
> 
>       But yes perhaps you can "cheat" by using DM_BOUNDARY_PERIODIC and a 
> stencil width of 1.  For each rank it will bring the value from the previous 
> and next rank and due to the periodicity it will thus bring the 3rd value to 
> the first rank and the first value to the third rank.
> 
>    Barry
> 
> 
> 
>> [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html 
>> <https://www.mcs.anl.gov/petsc/documentation/faq.html> for trouble shooting.
>> [1]PETSC ERROR: Petsc Release Version 3.14.4, unknown 
>> [1]PETSC ERROR: ./test_ic on a arch-debug named srvulx13 by fbrarda Mon May  
>> 3 17:46:37 2021
>> [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ 
>> --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc 
>> --download-mpich PETSC_ARCH=arch-debug
>> [1]PETSC ERROR: #1 DMSetUp_DA_1D() line 199 in 
>> /home/fbrarda/petsc/src/dm/impls/da/da1.c
>> [1]PETSC ERROR: #2 DMSetUp_DA() line 20 in 
>> /home/fbrarda/petsc/src/dm/impls/da/dareg.c
>> [1]PETSC ERROR: #3 DMSetUp() line 787 in 
>> /home/fbrarda/petsc/src/dm/interface/dm.c
>> [1]PETSC ERROR: #4 main() line 232 in test_ic.c
>> [1]PETSC ERROR: No PETSc Option Table entries
>> [1]PETSC ERROR: ----------------End of Error Message -------send entire 
>> error message to [email protected] 
>> <mailto:[email protected]>----------
>> 
>>> Il giorno 2 mag 2021, alle ore 18:54, Barry Smith <[email protected] 
>>> <mailto:[email protected]>> ha scritto:
>>> 
>>> 
>>> [0]PETSC ERROR: Wrong subtype object:Parameter # 1 must have implementation 
>>> da it is shell
>>> 
>>> Are you calling TSSetDM() to supply your created DMDA to the TS? Based on 
>>> the error message you are not, it is using a default shell DM, which is 
>>> what TS does if you do not provide them with a DM. You need to call 
>>> TSSetDM() after you create the TS and the DMDA.
>>> 
>>> 
>>>> On Apr 29, 2021, at 2:57 AM, Francesco Brarda <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> I defined the DM as follows
>>>> ierr = 
>>>> DMDACreate1d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,3,3,3,NULL,&da);CHKERRQ(ierr);
>>> 
>>> If you truly want one "spatial" point then you would want to use 1,3,0. 
>>> This says 1 location in space, three degrees of freedom at that point and 0 
>>> ghost values (since there is only one spatial point there can be no ghost 
>>> spatial values).  
>>> 
>>> BUT DMDA ALWAYS puts all degree of freedom at a point on the same process 
>>> so this will not give you parallelism. All 3 dof will be on the same MPI 
>>> rank.
>>> 
>>> For what you want to do you can use 3,1,2.  This says three "spatial" 
>>> points, 1 dof at each "spatial" point and 2 ghost points. In your case 
>>> "spatial" does not mean spatial in space it is just three abstract points. 
>>> 
>>> The global vectors (from DMCreate or Get GlobalVector) will have 1 value on 
>>> each rank. The local vector from DMCreate or Get LocalVector) will have 
>>> three values on each rank. Your initial conditions code would be something 
>>> like 
>>> 
>>> if (rank == 0) {
>>>> x[0] = appctx->N - appctx->p[2];
>>> } else if (rank == 1) {
>>>>   x[1] = appctx->p[2];
>>> } else {
>>>>   x[2] = 0.0;
>>> }
>>> 
>>> 
>>> Your TSSetRHSFunction() would make a call to DMGetLocalVector(...&localX), 
>>> do a DMGlobalToLocalBegin/End() from the input global X to localX, you 
>>> would call DMDAVecGetArray(...,&xarray) on the localX and access all three 
>>> values in xarray. The resulting computation of f the output vector would be 
>>> something like 
>>> 
>>> if (rank == 0) {
>>>> farray[0] = your code that can use xarray[0], xarray[1], xarray[2]
>>> } else if (rank == 1) {
>>>> farray[1] = your code that can use xarray[0], xarray[1], xarray[2]
>>> } else {
>>>> farray[2] = your code that can use xarray[0], xarray[1], xarray[2]
>>> }
>>> 
>>> There are many examples of this pattern in the example tutorials.
>>> 
>>> When you implement a code with a spatial distribution you would use a dof 
>>> of 3 at each point and not parallelize over the dof at each point. Likely 
>>> you want to use DMNETWORK to manage the spatial distribution since it has a 
>>> simple API and allows any number of different number of neighbors for each 
>>> point. DMDA would not make sense  for true spatial distribution except in 
>>> some truly trivial neighbor configurations. 
>>> 
>>> Barry
>>> 
>>> 
>>> 
>>>> 
>>>> I am not sure whether I correctly understood this command properly. The 
>>>> vector should have 3 components (S, I, R) and 3 DOF as it is defined only 
>>>> when the three coordinates have been set.
>>>> Then I create a global vector X. When I set the initial conditions as 
>>>> below 
>>>> 
>>>> static PetscErrorCode InitialConditions(TS ts,Vec X, void *ctx)
>>>> { 
>>>>   PetscErrorCode    ierr;
>>>>   AppCtx            *appctx = (AppCtx*) ctx;
>>>>   PetscScalar       *x;
>>>>   DM                da;
>>>>   
>>>>   PetscFunctionBeginUser;
>>>>   ierr = TSGetDM(ts,&da);CHKERRQ(ierr);
>>>>   
>>>>   /* Get pointers to vector data */
>>>>   ierr = DMDAVecGetArray(da,X,(void*)&x);CHKERRQ(ierr);
>>>>   
>>>>   x[0] = appctx->N - appctx->p[2];
>>>>   x[1] = appctx->p[2];
>>>>   x[2] = 0.0;
>>>>   
>>>>   ierr = DMDAVecRestoreArray(da,X,(void*)&x);CHKERRQ(ierr);
>>>>   PetscFunctionReturn(0);
>>>> }
>>>> 
>>>> I have the error:
>>>> 
>>>> [0]PETSC ERROR: Invalid argument
>>>> [0]PETSC ERROR: Wrong subtype object:Parameter # 1 must have 
>>>> implementation da it is shell
>>>> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html 
>>>> <https://www.mcs.anl.gov/petsc/documentation/faq.html> for trouble 
>>>> shooting.
>>>> [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown 
>>>> [0]PETSC ERROR: ./par_sir_model on a arch-debug named srvulx13 by fbrarda 
>>>> Thu Apr 29 09:36:17 2021
>>>> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ 
>>>> --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc 
>>>> --download-mpich PETSC_ARCH=arch-debug
>>>> [0]PETSC ERROR: #1 DMDAVecGetArray() line 48 in 
>>>> /home/fbrarda/petsc/src/dm/impls/da/dagetarray.c
>>>> [0]PETSC ERROR: #2 InitialConditions() line 175 in par_sir_model.c
>>>> [0]PETSC ERROR: #3 main() line 295 in par_sir_model.c
>>>> [0]PETSC ERROR: No PETSc Option Table entries
>>>> 
>>>> I would be very happy to receive any advices to fix the code.
>>>> Best,
>>>> Francesco
>>>> 
>>>>> Il giorno 20 apr 2021, alle ore 21:35, Matthew Knepley <[email protected] 
>>>>> <mailto:[email protected]>> ha scritto:
>>>>> 
>>>>> On Tue, Apr 20, 2021 at 1:17 PM Francesco Brarda 
>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>> Thank you for the advices, I would just like to convert the code I 
>>>>> already have to see what might happen once parallelized.
>>>>> Do you think it is better to put the 3 equations into a 1d Distributed 
>>>>> Array with 3 dofs and run the job with multiple procs regardless of how 
>>>>> many equations I have? Is it possible? 
>>>>> 
>>>>> If you plan in the end to use a structured grid, this is a great plan. If 
>>>>> not, this is not a good plan.
>>>>> 
>>>>>   Thanks,
>>>>> 
>>>>>      Matt
>>>>>  
>>>>> Thank you,
>>>>> Francesco
>>>>> 
>>>>>> Il giorno 20 apr 2021, alle ore 17:57, Stefano Zampini 
>>>>>> <[email protected] <mailto:[email protected]>> ha 
>>>>>> scritto:
>>>>>> 
>>>>>> It does not make sense to parallelize to 1 equation per process, unless 
>>>>>> that single equation per process is super super super costly.
>>>>>> Is this work you are doing used to understand PETSc parallelization 
>>>>>> strategy? if so, there are multiple examples in the sourcetree that you 
>>>>>> can look at to populate matrices and vectors in parallel
>>>>>> 
>>>>>> Il giorno mar 20 apr 2021 alle ore 17:52 Francesco Brarda 
>>>>>> <[email protected] <mailto:[email protected]>> ha 
>>>>>> scritto:
>>>>>> In principle the entire code was for 1 proc only. The functions were 
>>>>>> built with VecGetArray(). While adapting the code for multiple procs I 
>>>>>> thought using VecGetOwnershipRange was a possible way to allocate the 
>>>>>> equations in the vector using multiple procs. What do you think, please? 
>>>>>> 
>>>>>> Thank you,
>>>>>> Francesco
>>>>>> 
>>>>>>> Il giorno 20 apr 2021, alle ore 16:43, Matthew Knepley 
>>>>>>> <[email protected] <mailto:[email protected]>> ha scritto:
>>>>>>> 
>>>>>>> On Tue, Apr 20, 2021 at 10:41 AM Francesco Brarda 
>>>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>>>> I was trying to follow Barry's advice some time ago, but I guess that's 
>>>>>>> not the way he meant it. How should I refer to the values contained in 
>>>>>>> x? With Distributed Arrays?
>>>>>>> 
>>>>>>> That is how you get values from x. However, I cannot understand at all 
>>>>>>> what you are doing with "mybase".
>>>>>>> 
>>>>>>>    Matt
>>>>>>>  
>>>>>>> Thanks 
>>>>>>> Francesco
>>>>>>> 
>>>>>>>>>  Even though it will not scale and will deliver slower performance it 
>>>>>>>>> is completely possible for you to solve the 3 variable problem using 
>>>>>>>>> 3 MPI ranks. Or 10 mpi ranks. You would just create vectors/matrices 
>>>>>>>>> with 1 degree of freedom for the first three ranks and no degrees of 
>>>>>>>>> freedom for the later ranks. During your function evaluation (and 
>>>>>>>>> Jacobian evaluation) for TS you will need to set up the appropriate 
>>>>>>>>> communication to get the values you need on each rank to evaluate the 
>>>>>>>>> parts of the function evaluation needed by that rank. This is true 
>>>>>>>>> for parallelizing any computation.
>>>>>>>>> 
>>>>>>>>>  Barry
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> Il giorno 20 apr 2021, alle ore 15:40, Matthew Knepley 
>>>>>>>> <[email protected] <mailto:[email protected]>> ha scritto:
>>>>>>>> 
>>>>>>>> On Tue, Apr 20, 2021 at 9:36 AM Francesco Brarda 
>>>>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>>>>> Hi!
>>>>>>>> I tried to implement the SIR model taking into account the fact that I 
>>>>>>>> will only use 3 MPI ranks at this moment.
>>>>>>>> I built vectors and matrices following the examples already available. 
>>>>>>>> In particular, I defined the functions required similarly 
>>>>>>>> (RHSFunction, IFunction, IJacobian), as follows:
>>>>>>>> 
>>>>>>>> I don't think this makes sense. You use "mybase" to distinguish 
>>>>>>>> between 3 procs, which would indicate that each procs has only
>>>>>>>> 1 degree of freedom. However, you use x[1] on each proc, indicating it 
>>>>>>>> has at least 2 dofs.
>>>>>>>> 
>>>>>>>>   Thanks,
>>>>>>>> 
>>>>>>>>      Matt
>>>>>>>>  
>>>>>>>> static PetscErrorCode RHSFunction(TS ts,PetscReal t,Vec X,Vec F,void 
>>>>>>>> *ctx)
>>>>>>>> { 
>>>>>>>>   PetscErrorCode    ierr;
>>>>>>>>   AppCtx            *appctx = (AppCtx*) ctx;
>>>>>>>>   PetscScalar       f;//, *x_localptr; 
>>>>>>>>   const PetscScalar *x;
>>>>>>>>   PetscInt          mybase;
>>>>>>>>   
>>>>>>>>   PetscFunctionBeginUser;
>>>>>>>>   ierr = VecGetOwnershipRange(X,&mybase,NULL);CHKERRQ(ierr);
>>>>>>>>   ierr = VecGetArrayRead(X,&x);CHKERRQ(ierr);
>>>>>>>>   if (mybase == 0) {
>>>>>>>>     f    = (PetscScalar) (-appctx->p1*x[0]*x[1]/appctx->N);
>>>>>>>>     ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES);
>>>>>>>>   }
>>>>>>>>   if (mybase == 1) {
>>>>>>>>     f    = (PetscScalar) 
>>>>>>>> (appctx->p1*x[0]*x[1]/appctx->N-appctx->p2*x[1]);
>>>>>>>>     ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES);
>>>>>>>>   }
>>>>>>>>   if (mybase == 2) {
>>>>>>>>     f    = (PetscScalar) (appctx->p2*x[1]);
>>>>>>>>     ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES);
>>>>>>>>   }
>>>>>>>>   ierr = VecRestoreArrayRead(X,&x);CHKERRQ(ierr);
>>>>>>>>   ierr = VecAssemblyBegin(F);CHKERRQ(ierr);
>>>>>>>>   ierr = VecAssemblyEnd(F);CHKERRQ(ierr);
>>>>>>>>   PetscFunctionReturn(0);
>>>>>>>> }
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Whilst for the Jacobian I did:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> static PetscErrorCode IJacobian(TS ts,PetscReal t,Vec X,Vec 
>>>>>>>> Xdot,PetscReal a,Mat A,Mat B,void *ctx)
>>>>>>>> { 
>>>>>>>>   PetscErrorCode    ierr;
>>>>>>>>   AppCtx            *appctx = (AppCtx*) ctx;
>>>>>>>>   PetscInt          mybase, rowcol[] = {0,1,2};
>>>>>>>>   const PetscScalar *x;
>>>>>>>>   
>>>>>>>>   PetscFunctionBeginUser;
>>>>>>>>   ierr = MatGetOwnershipRange(B,&mybase,NULL);CHKERRQ(ierr);
>>>>>>>>   ierr = VecGetArrayRead(X,&x);CHKERRQ(ierr);
>>>>>>>>   if (mybase == 0) {
>>>>>>>>     const PetscScalar J[] = {a + appctx->p1*x[1]/appctx->N, 
>>>>>>>> appctx->p1*x[0]/appctx->N, 0};
>>>>>>>>     ierr = 
>>>>>>>> MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr);
>>>>>>>>   }
>>>>>>>>   if (mybase == 1) {
>>>>>>>>     const PetscScalar J[] = {- appctx->p1*x[1]/appctx->N, a - 
>>>>>>>> appctx->p1*x[0]/appctx->N + appctx->p2, 0};
>>>>>>>>     ierr = 
>>>>>>>> MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr);
>>>>>>>>   }
>>>>>>>>   if (mybase == 2) {
>>>>>>>>     const PetscScalar J[] = {0, - appctx->p2, a};
>>>>>>>>     ierr = 
>>>>>>>> MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr);
>>>>>>>>   }
>>>>>>>>   ierr    = VecRestoreArrayRead(X,&x);CHKERRQ(ierr);
>>>>>>>>   
>>>>>>>>   ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>>>>>>   ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>>>>>>   if (A != B) {
>>>>>>>>     ierr = MatAssemblyBegin(B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>>>>>>     ierr = MatAssemblyEnd(B,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>>>>>>   }
>>>>>>>>   PetscFunctionReturn(0);
>>>>>>>> }
>>>>>>>> 
>>>>>>>> This code does not provide the correct result, that is, the solution 
>>>>>>>> is the initial condition, either using implicit or explicit methods. 
>>>>>>>> Is the way I defined these objects wrong? How can I fix it? 
>>>>>>>> I also tried to print the Jacobian with the following commands but it 
>>>>>>>> does not work (blank rows and error message). How should I print the 
>>>>>>>> Jacobian?
>>>>>>>> 
>>>>>>>> ierr = TSGetIJacobian(ts,NULL,&K, NULL, NULL); CHKERRQ(ierr);
>>>>>>>> ierr = MatAssemblyBegin(K,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>>>>>> ierr = MatAssemblyEnd(K,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>>>>>> ierr = MatView(K,PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);             
>>>>>>>>                
>>>>>>>> 
>>>>>>>> I would very much appreciate any kind of help or advice.
>>>>>>>> Best,
>>>>>>>> Francesco
>>>>>>>> 
>>>>>>>>> Il giorno 2 apr 2021, alle ore 04:45, Barry Smith <[email protected] 
>>>>>>>>> <mailto:[email protected]>> ha scritto:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Apr 1, 2021, at 9:17 PM, Zhang, Hong via petsc-users 
>>>>>>>>>> <[email protected] <mailto:[email protected]>> wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Mar 31, 2021, at 2:53 AM, Francesco Brarda 
>>>>>>>>>>> <[email protected] <mailto:[email protected]>> 
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi everyone!
>>>>>>>>>>> 
>>>>>>>>>>> I am trying to solve a system of 3 ODEs (a basic SIR model) with 
>>>>>>>>>>> TS. Sequentially works pretty well, but I need to switch it into a 
>>>>>>>>>>> parallel version. 
>>>>>>>>>>> I started working with TS not very long time ago, there are few 
>>>>>>>>>>> questions I’d like to share with you and if you have any advices 
>>>>>>>>>>> I’d be happy to hear.
>>>>>>>>>>> First of all, do I need to use a DM object even if the model is 
>>>>>>>>>>> only time dependent? All the examples I found were using that 
>>>>>>>>>>> object for the other variable when solving PDEs.
>>>>>>>>>> 
>>>>>>>>>> Are you considering SIR on a spatial domain? If so, you can 
>>>>>>>>>> parallelize your model in the spatial domain using DM. Splitting the 
>>>>>>>>>> three variables in the ODE among processors would not scale.
>>>>>>>>> 
>>>>>>>>>  Even though it will not scale and will deliver slower performance it 
>>>>>>>>> is completely possible for you to solve the 3 variable problem using 
>>>>>>>>> 3 MPI ranks. Or 10 mpi ranks. You would just create vectors/matrices 
>>>>>>>>> with 1 degree of freedom for the first three ranks and no degrees of 
>>>>>>>>> freedom for the later ranks. During your function evaluation (and 
>>>>>>>>> Jacobian evaluation) for TS you will need to set up the appropriate 
>>>>>>>>> communication to get the values you need on each rank to evaluate the 
>>>>>>>>> parts of the function evaluation needed by that rank. This is true 
>>>>>>>>> for parallelizing any computation.
>>>>>>>>> 
>>>>>>>>>  Barry
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Hong (Mr.)  
>>>>>>>>>> 
>>>>>>>>>>> When I preallocate the space for the Jacobian matrix, is it better 
>>>>>>>>>>> to decide the local or global space?
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Francesco
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -- 
>>>>>>>> What most experimenters take for granted before they begin their 
>>>>>>>> experiments is infinitely more interesting than any results to which 
>>>>>>>> their experiments lead.
>>>>>>>> -- Norbert Wiener
>>>>>>>> 
>>>>>>>> https://www.cse.buffalo.edu/~knepley/ 
>>>>>>>> <https://www.cse.buffalo.edu/~knepley/>
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -- 
>>>>>>> What most experimenters take for granted before they begin their 
>>>>>>> experiments is infinitely more interesting than any results to which 
>>>>>>> their experiments lead.
>>>>>>> -- Norbert Wiener
>>>>>>> 
>>>>>>> https://www.cse.buffalo.edu/~knepley/ 
>>>>>>> <https://www.cse.buffalo.edu/~knepley/>
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> Stefano
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> What most experimenters take for granted before they begin their 
>>>>> experiments is infinitely more interesting than any results to which 
>>>>> their experiments lead.
>>>>> -- Norbert Wiener
>>>>> 
>>>>> https://www.cse.buffalo.edu/~knepley/ 
>>>>> <https://www.cse.buffalo.edu/~knepley/>

Re: [petsc-users] Parallel TS for ODE

Reply via email to