Re: [petsc-users] Diagnosing Convergence Issue in Fieldsplit Problem

2024-05-23 Thread Stefano Zampini
It is true if A is nonsingular. If it is singular, I never did the algebra;
you would need to use generalized Schur complements with pseudo-inverses.
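
Assuming that nonsingular case, a minimal petsc4py sketch of the "restrict
and renormalize" idea quoted below (the names are assumptions, not code from
this thread: nsp_vec spans the null space of the full matrix, is1 is the IS
of the second block, and S is the Schur complement Mat, e.g. obtained in C
with PCFieldSplitSchurGetS after the first PCSetUp):

from petsc4py import PETSc

sub = nsp_vec.getSubVector(is1)    # restriction of the null space to the 11 block
work = sub.copy()                  # keep the original vector untouched
nsp_vec.restoreSubVector(is1, sub)
work.normalize()                   # MatNullSpace wants unit 2-norm vectors
ns = PETSc.NullSpace().create(constant=False, vectors=[work], comm=work.getComm())
S.setNullSpace(ns)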

Il giorno ven 24 mag 2024 alle ore 00:04 Barry Smith  ha
scritto:

>
>
> On May 23, 2024, at 3:48 PM, Stefano Zampini 
> wrote:
>
> the null space of the Schur complement is the restriction of the original
> null space. I guess if fieldsplit is Schur type then we could in principle
> extract the sub vectors and renormalize them
>
>
>Is this true if A is singular?   Or are you assuming the Schur
> complement form is only used if A is nonsingular? Would the user need to
> somehow indicate A is nonsingular?
>
>
>
>
> On Thu, May 23, 2024, 22:13 Jed Brown  wrote:
>
>>
>> Barry Smith  writes:
>>
>> >    Unfortunately it cannot automatically because
>> > -pc_fieldsplit_detect_saddle_point just grabs part of the matrix (having
>> > no concept of "what part") so it doesn't know to grab the null space
>> > information.
>> >
>> >It would be possible for PCFIELDSPLIT to access the null space of the 
>> > larger matrix directly as vectors and check if they are all zero in the 00 
>> > block, then it would know that the null space only applied to the second 
>> > block and could use it for the Schur complement.
>> >
>> >Matt, Jed, Stefano, Pierre does this make sense?
>>
>> I think that would work (also need to check that the has_cnst flag is 
>> false), though if you've gone to the effort of filling in that Vec, you 
>> might as well provide the IS.
>>
>> I also wonder if the RHS is consistent.
>>
>>
>

-- 
Stefano


Re: [petsc-users] Diagnosing Convergence Issue in Fieldsplit Problem

2024-05-23 Thread Stefano Zampini
the null space of the Schur complement is the restriction of the original
null space. I guess if fieldsplit is Schur type then we could in principle
extract the sub vectors and renormalize them


On Thu, May 23, 2024, 22:13 Jed Brown  wrote:

>
> Barry Smith  writes:
>
> >    Unfortunately it cannot automatically because
> > -pc_fieldsplit_detect_saddle_point just grabs part of the matrix (having no
> > concept of "what part") so it doesn't know to grab the null space information.
> >
> >It would be possible for PCFIELDSPLIT to access the null space of the 
> > larger matrix directly as vectors and check if they are all zero in the 00 
> > block, then it would know that the null space only applied to the second 
> > block and could use it for the Schur complement.
> >
> >Matt, Jed, Stefano, Pierre does this make sense?
>
> I think that would work (also need to check that the has_cnst flag is false), 
> though if you've gone to the effort of filling in that Vec, you might as well 
> provide the IS.
>
> I also wonder if the RHS is consistent.
>
>


Re: [petsc-users] Question about petsc4py createWithArray function

2024-05-06 Thread Stefano Zampini
Samar

After a second look, I believe the petsc4py code is correct. You can test
it using the script below.
After destroy is called (or del), the reference count of the numpy array is
back to its initial state.
Maybe you are not calling del or destroy? A MWE would help us understand
your use case.


kl-18448:~ szampini$ cat t.py

import sys

import numpy as np

from petsc4py import PETSc


x = np.zeros(4, dtype=PETSc.ScalarType)

print('initial ref count',sys.getrefcount(x))


v = PETSc.Vec().createWithArray(x)

print('after create',sys.getrefcount(x))


# check if they share memory

v.view()

x[1] = 2

v.view()


# free

v.destroy()

# you can also call del

# del v

print('after destroy',sys.getrefcount(x))


kl-18448:~ szampini$ python t.py

initial ref count 2

after create 3

Vec Object: 1 MPI process

  type: seq

0.

0.

0.

0.

Vec Object: 1 MPI process

  type: seq

0.

2.

0.

0.

after destroy 2
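
For the parallel case asked about further down in the thread, a sketch along
the same lines (sizes and values are illustrative): each rank passes only its
local block, and since PETSc only references the memory it never frees it, so
the caller still owns the array after destroy().

import numpy as np
from petsc4py import PETSc

comm = PETSc.COMM_WORLD
nloc = 4                                       # local size on this rank
a = np.full(nloc, comm.getRank(), dtype=PETSc.ScalarType)
v = PETSc.Vec().createWithArray(a, comm=comm)
print(comm.getRank(), v.getOwnershipRange(), v.getSize())
v.destroy()
print(a)                                       # still valid; freeing it stays our job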



Il giorno lun 6 mag 2024 alle ore 08:49 Samar Khatiwala <
samar.khatiw...@earth.ox.ac.uk> ha scritto:

> Hi Stefano,
>
> Thanks for looking into this. Since createWithArray calls
> VecCreateMPIWithArray which, as Matt noted and is documented (
> https://petsc.org/main/manualpages/Vec/VecCreateMPIWithArray/) doesn’t
> free the memory, then there’s a memory leak (and, furthermore, calling del
> on the original array will have no effect).
>
> Lisandro: would be great if you can provide some guidance.
>
> Thanks,
>
> Samar
>
> On May 3, 2024, at 12:45 PM, Stefano Zampini 
> wrote:
>
> While waiting for our Python wizard to shed light on this, I note that,
> from the documentation of PyArray_FROM_OTF
> https://numpy.org/devdocs/user/c-info.how-to-extend.html#converting-an-arbitrary-sequence-object,
> we have
>
> The object can be any Python object convertible to an ndarray. If the
> object is already (a subclass of) the ndarray that satisfies the
> requirements then a new reference is returned.
>
> I guess we should call "del" on the ndarray returned by iarray_s after
> having called  self.set_attr('__array__', array) in this case, but let's
> wait for Lisandro to confirm
>
>
>
>
> Il giorno ven 3 mag 2024 alle ore 11:42 Samar Khatiwala <
> samar.khatiw...@earth.ox.ac.uk> ha scritto:
>
>> Hi Matt,
>>
>> Thanks so much for the quick reply!
>>
>> Regarding #2, I put some debug statements in my code and what I find is
>> that when I use createWithArray on my Cython-allocated numpy array, the
>> destructor I set for it is no longer called when I delete the array. (If I
>> don’t use createWithArray then the destructor is triggered.) I interpret
>> that to suggest that the petsc4py Vec is somehow ’taking over’ management
>> of the numpy array. But I don’t understand where that could be
>> happening. (I don’t think it has to do with the actual freeing of memory by
>> PETSc's VecDestroy.)
>>
>> createWithArray calls iarray_s which in turn calls PyArray_FROM_OTF.
>> Could it be there’s something going on there? The numpy documentation is
>> unclear.
>>
>> Lisandro: do you have any thoughts on this?
>>
>> Thanks,
>>
>> Samar
>>
>> On May 2, 2024, at 11:56 PM, Matthew Knepley  wrote:
>>
>> On Thu, May 2, 2024 at 12:53 PM Samar Khatiwala <
>> samar.khatiw...@earth.ox.ac.uk> wrote:
>>
>>>
>>> Hello,
>>>
>>> I have a couple of questions about createWithArray in petsc4py:
>>>
>>> 1) What is the correct usage for creating a standard MPI Vec with it? 
>>> Something like this seems to work but is it right?:
>>>
>>> On each rank do:
>>> a = np.zeros(localSize)
>>> v = PETSc.Vec().createWithArray(a, comm=PETSc.COMM_WORLD)
>>>
>>> Is that all it takes?
>>>
>>>
>> That looks right to me.
>>
>>> 2) Who 

Re: [petsc-users] Question about petsc4py createWithArray function

2024-05-03 Thread Stefano Zampini
While waiting for our Python wizard to shed light on this, I note that,
from the documentation of PyArray_FROM_OTF
https://numpy.org/devdocs/user/c-info.how-to-extend.html#converting-an-arbitrary-sequence-object,
we have

The object can be any Python object convertible to an ndarray. If the
object is already (a subclass of) the ndarray that satisfies the
requirements then a new reference is returned.

I guess we should call "del" on the ndarray returned by iarray_s after
having called  self.set_attr('__array__', array) in this case, but let's
wait for Lisandro to confirm




Il giorno ven 3 mag 2024 alle ore 11:42 Samar Khatiwala <
samar.khatiw...@earth.ox.ac.uk> ha scritto:

> Hi Matt,
>
> Thanks so much for the quick reply!
>
> Regarding #2, I put some debug statements in my code and what I find is
> that when I use createWithArray on my Cython-allocated numpy array, the
> destructor I set for it is no longer called when I delete the array. (If I
> don’t use createWithArray then the destructor is triggered.) I interpret
> that to suggest that the petsc4py Vec is somehow ’taking over’ management
> of the numpy array. But I don’t understand where that could be
> happening. (I don’t think it has to do with the actual freeing of memory by
> PETSc's VecDestroy.)
>
> createWithArray calls iarray_s which in turn calls PyArray_FROM_OTF.
> Could it be there’s something going on there? The numpy documentation is
> unclear.
>
> Lisandro: do you have any thoughts on this?
>
> Thanks,
>
> Samar
>
> On May 2, 2024, at 11:56 PM, Matthew Knepley  wrote:
>
> On Thu, May 2, 2024 at 12:53 PM Samar Khatiwala <
> samar.khatiw...@earth.ox.ac.uk> wrote:
>
>>
>> Hello,
>>
>> I have a couple of questions about createWithArray in petsc4py:
>>
>> 1) What is the correct usage for creating a standard MPI Vec with it? 
>> Something like this seems to work but is it right?:
>>
>> On each rank do:
>> a = np.zeros(localSize)
>> v = PETSc.Vec().createWithArray(a, comm=PETSc.COMM_WORLD)
>>
>> Is that all it takes?
>>
>>
> That looks right to me.
>
>> 2) Who ‘owns’ the underlying memory for a Vec created with the 
>> createWithArray method, i.e., who is responsible for managing it and doing 
>> garbage collection? In my problem, the numpy array is created in a Cython 
>> module where memory is allocated, and a pointer to it is associated with a 
>> numpy ndarray via PyArray_SimpleNewFromData and PyArray_SetBaseObject. I 
>> have a deallocator method of my own that is called when the numpy array is 
>> deleted/goes out of scope/whenever python does garbage collection. All of 
>> that works fine. But if I use this array to create a Vec with 
>> createWithArray what happens when the Vec is, e.g., destroyed? Will my 
>> deallocator be called?
>>
>> No. The PETSc struct will be deallocated, but the storage will not be
> touched.
>
>   Thanks,
>
>  Matt
>
>> Or does petsc4py know that it doesn’t own the memory and won’t attempt to 
>> free it? I can’t quite figure out from the petsc4py code what is going on. 
>> And help would be appreciated.
>>
>> Thanks very much.
>>
>> Samar
>>
>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> 
>
>
>

-- 
Stefano


Re: [petsc-users] Question about how to use DS to do the discretization

2024-03-23 Thread Stefano Zampini
Take a look at
https://gitlab.com/petsc/petsc/-/blob/main/src/snes/tutorials/ex11.c?ref_type=heads
and the discussion at the beginning (including the reference to the
original paper)
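
For what it is worth, the standard isotropic expressions (mine, not copied
from the examples) are: with \sigma_{ic} = \lambda \delta_{ic}
\varepsilon_{kk}(u) + 2 \mu \varepsilon_{ic}(u) and \varepsilon(u) = (\nabla u
+ \nabla u^T)/2, the residual contribution is \int_\Omega \sigma_{ic}
\partial_c \phi_i, i.e. a full contraction of the stress with the gradient of
the test function, which gives one scalar per test function. The g3 term
requested by PetscDSSetJacobian is the derivative of \sigma with respect to
the trial-function gradient,

  g3_{icjd} = \partial \sigma_{ic} / \partial(\partial_d u_j)
            = \lambda \delta_{ic} \delta_{jd}
              + \mu (\delta_{ij} \delta_{cd} + \delta_{id} \delta_{cj}),

a dim x dim x dim x dim tensor whose three terms correspond to the three
g3[...] updates in the Jacobian callback of ex17.c (up to the index ordering
used there).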

On Sat, Mar 23, 2024, 15:03 Gong Yujie  wrote:

> Dear PETSc group,
>
> I'm reading the DS part of the discretization, starting from SNES ex17.c,
> which is a demo for solving a linear elasticity problem. I have two questions
> about the details.
>
> The first question is about the residual function. Is the residual
> calculated like this? The dot product is a little weird because of the
> dimension of the result.
> Here \sigma is the stress tensor, \phi_i is the test function for the i-th
> function (Linear elasticity in 3D contains three equations).
>
> The second question is how to derive the Jacobian of the system (line 330
> in ex17.c). As shown in the PetscDSSetJacobian, we need to provide function
> g3() which I think is a 4th-order tensor with size 3*3*3*3 in this linear
> elasticity case. I'm not sure how to get it. Are there any references on
> how to get this Jacobian?
>
> I've checked about the comment before this Jacobian function (line 330 in
> ex17.c) but don't know how to get this.
>
> Thanks in advance!
>
> Best Regards,
> Yujie
>


Re: [petsc-users] Difference between DMGetLocalVector and DMCreateLocalVector

2024-02-22 Thread Stefano Zampini
I guess we are missing a call to clear the internal scratch vectors; Matt can
confirm.

Can you please add this call AFTER you set the local section to confirm?

  call DMClearLocalVectors(dm_mesh, ierr);


Il giorno gio 22 feb 2024 alle ore 11:36 袁煕  ha
scritto:

>  Hello,
>
> I found DMGetLocalVector and DMCreateLocalVector generate Vec of
> different size in my following codes
>
> 
> call PetscSectionSetup(section,ierr)
> call DMSetLocalSection(dm_mesh,section,ierr)
> ..do something
>
> call PetscSectionSetup(section,ierr)
> call DMSetLocalSection(dm_mesh,section,ierr)
> ..
> call DMCreateLocalVector(dm_mesh,lvec,ierr);
> call DMGetLocalVector(dm_mesh,llvec,ierr);
> call VecGetSize(lvec,off,ierr)
> call VecGetSize(llvec,offl,ierr)
> call VecDestroy(lvec,ierr)
> call DMRestoreLocalVector(dm_mesh,llvec,ierr);
> 
> The point in the above program is that DMSetLocalSection is called two
> times. As the PETSc manual indicates, "Any existing Section will be
> destroyed" after calling this function, so to my understanding a new
> section should be set. Therefore, the above program should obtain the same
> values for off and offl. However, I found that they are different, with
> off equal to the size defined by the new section and offl equal to the
> size defined by the first one.
>
> Any problem in the above code? And how can I obtain a vector of the same size
> from DMGetLocalVector and DMCreateLocalVector?
>
> Much thanks for your help.
>
> X. Yuan, Ph.D. in Solid Mechanics
>


-- 
Stefano


Re: [petsc-users] Bug in VecNorm, 3.20.3

2024-01-23 Thread Stefano Zampini
PETSc main in debug mode has some additional checks for these cases. Can you
run with the main branch and configure PETSc using --with-debugging=1?

Il giorno mar 23 gen 2024 alle ore 22:35 Barry Smith  ha
scritto:

>
>    This could happen if the values in the vector get changed but the
> PetscObjectState does not get updated. Normally this is impossible: any
> action that changes a vector's values changes its state (so, for example,
> calling VecGetArray()/VecRestoreArray() updates the state).
>
>Are you accessing the vector values in any non-standard way?
>
>Barry
>
>
> > On Jan 23, 2024, at 11:39 AM, mich...@paraffinalia.co.uk wrote:
> >
> > Hello,
> >
> > I have used the GMRES solver in PETSc successfully up to now, but on
> installing the most recent release, 3.20.3, the solver fails by exiting
> early. Output from the code is:
> >
> > lt-nbi-solve-laplace: starting PETSc solver [23.0537]
> >  0 KSP Residual norm < 1.e-11
> > Linear solve converged due to CONVERGED_ATOL iterations 0
> > lt-nbi-solve-laplace: 0 iterations [23.0542] (22.9678)
> >
> > and tracing execution shows the norm returned by VecNorm to be 0.
> >
> > If I modify the function by commenting out line 217 of
> >
> >  src/vec/vec/interface/rvector.c
> >
> >  /* if (flg) PetscFunctionReturn(PETSC_SUCCESS); */
> >
> > the code executes correctly:
> >
> > lt-nbi-solve-laplace: starting PETSc solver [22.9392]
> >  0 KSP Residual norm 1.10836
> >  1 KSP Residual norm 0.0778301
> >  2 KSP Residual norm 0.0125121
> >  3 KSP Residual norm 0.00165836
> >  4 KSP Residual norm 0.000164066
> >  5 KSP Residual norm 2.12824e-05
> >  6 KSP Residual norm 4.50696e-06
> >  7 KSP Residual norm 5.85082e-07
> > Linear solve converged due to CONVERGED_RTOL iterations 7
> >
> > My compile options are:
> >
> > PETSC_ARCH=linux-gnu-real ./configure --with-mpi=0
> --with-scalar-type=real --with-threadsafety --with-debugging=0 --with-log=0
> --with-openmp
> >
> > uname -a returns:
> >
> > 5.15.80 #1 SMP PREEMPT Sun Nov 27 13:28:05 CST 2022 x86_64 Intel(R)
> Core(TM) i5-6200U CPU @ 2.30GHz GenuineIntel GNU/Linux
> >
>
>

-- 
Stefano


Re: [petsc-users] Parallel processes run significantly slower

2024-01-11 Thread Stefano Zampini
You are creating the matrix on the wrong communicator if you want it to be
parallel. You are using PETSc.COMM_SELF.
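
A minimal sketch of the change (size and fill-in omitted, names illustrative):

from petsc4py import PETSc

comm = PETSc.COMM_WORLD
N = 1000000                                    # global size
A = PETSc.Mat().createAIJ((N, N), comm=comm)   # rows distributed over the ranks
A.setUp()
rstart, rend = A.getOwnershipRange()
# ... each rank sets values only for its rows rstart..rend ...
A.assemble()
x = A.createVecRight()                         # vectors built from A share its layout
y = A.createVecLeft()
A.mult(x, y)                                   # the mat-vec now runs distributed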

On Thu, Jan 11, 2024, 19:28 Steffen Wilksen | Universitaet Bremen <
swilk...@itp.uni-bremen.de> wrote:

> Hi all,
>
> I'm trying to do repeated matrix-vector-multiplication of large sparse
> matrices in Python using petsc4py. Even the simplest method of
> parallelization, dividing up the calculation to run on multiple processes
> independently, does not seem to give a significant speed up for large
> matrices. I constructed a minimal working example, which I run using
>
> mpiexec -n N python parallel_example.py,
>
> where N is the number of processes. Instead of taking approximately the
> same time irrespective of the number of processes used, the calculation is
> much slower when starting more MPI processes. This translates to little to
> no speed up when splitting up a fixed number of calculations over N
> processes. As an example, running with N=1 takes 9s, while running with N=4
> takes 34s. When running with smaller matrices, the problem is not as severe
> (only slower by a factor of 1.5 when setting MATSIZE=1e+5 instead of
> MATSIZE=1e+6). I get the same problems when just starting the script four
> times manually without using MPI.
> I attached both the script and the log file for running the script with
> N=4. Any help would be greatly appreciated. Calculations are done on my
> laptop, arch linux version 6.6.8 and PETSc version 3.20.2.
>
> Kind Regards
> Steffen
>


Re: [petsc-users] Problem with matrix and vector using GPU

2023-09-20 Thread Stefano Zampini
You are missing a call to DMSetFromOptions

On Wed, Sep 20, 2023, 19:20 Ramoni Z. Sedano Azevedo <
ramoni.zsed...@gmail.com> wrote:

> Thanks for the tip. Using dm_mat_type and dm_vec_type the code runs.
> ./${executable} \
>  -A_dm_mat_type aijcusparse \
>  -P_dm_mat_type aijcusparse \
>  -dm_vec_type cuda \
>  -use_gpu_aware_mpi 0 \
>  -em_ksp_monitor_true_residual \
>  -em_ksp_type bcgs \
>  -em_pc_type bjacobi \
>  -em_sub_pc_type ilu \
>  -em_sub_pc_factor_levels 3 \
>  -em_sub_pc_factor_fill 6 \
>  < ./Parameters.inp \
>
> But at the end the message appears:
> WARNING! There are options you set that were not used!
> WARNING! could be spelling mistake, etc!
> There are 3 unused database options. They are:
> Option left: name:-A_dm_mat_type value: aijcusparse
> Option left: name:-dm_vec_type value: cuda
> Option left: name:-P_dm_mat_type value: aijcusparse
>
> Profiling with nvprof does not show any kernels, only API use.
>
> Ramoni Z. S. Azevedo
>
> Em qua., 20 de set. de 2023 às 12:31, Junchao Zhang <
> junchao.zh...@gmail.com> escreveu:
>
>> Try to also add  *-dm_mat_type aijcusparse -dm_vec_type cuda*
>>
>> --Junchao Zhang
>>
>>
>> On Wed, Sep 20, 2023 at 10:21 AM Ramoni Z. Sedano Azevedo <
>> ramoni.zsed...@gmail.com> wrote:
>>
>>>
>>> Hey!
>>>
>>> I am using PETSc in a Fortran code and we use MPI parallelization. We
>>> would like to use GPU parallelization, but we are encountering an error.
>>>
>>> PETSc is configured as follows:
>>> #!/bin/bash
>>> ./configure \
>>>  --prefix=${PWD}/installdir \
>>>  --with-fortran \
>>>  --with-fortran-kernels=true \
>>>  --with-cuda \
>>>  --download-fblaslapack \
>>>  --with-scalar-type=complex \
>>>  --with-precision=double \
>>>  --with-debugging=yes \
>>>  --with-x=0 \
>>>  --with-gnu-compilers=1 \
>>>  --with-cc=mpicc \
>>>  --with-cxx=mpicxx \
>>>  --with-fc=mpif90 \
>>>  --with-make-exec=make
>>>
>>> Within my code, matrices and vectors are allocated with the following
>>> commands:
>>> PetscCallA( DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE,
>>> DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DMDA_STENCIL_BOX, l+1, m+1, nzn,
>>> PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, i3, i1, PETSC_NULL_INTEGER,
>>> PETSC_NULL_INTEGER, PETSC_NULL_INTEGER, da, ierr) )
>>>
>>> PetscCallA( DMSetUp(da,ierr) )
>>>
>>> PetscCallA( DMCreateGlobalVector(da,b,ierr) )
>>> PetscCallA( VecDuplicate(b,xsol,ierr) )
>>> PetscCallA( VecDuplicate(b,src,ierr) )
>>> PetscCallA( VecDuplicate(b,rhoxyz,ierr) )
>>>
>>> PetscCallA( DMCreateLocalVector(da,localx,ierr) )
>>> PetscCallA( VecDuplicate(localx,localb,ierr) )
>>> PetscCallA( VecDuplicate(localx,localsrc,ierr) )
>>> PetscCallA( VecDuplicate(localx,lrhoxyz,ierr) )
>>>
>>> PetscCallA( VecGetLocalSize(xsol,mloc,ierr) )
>>>
>>> ngrow=3*(l+1)*(m+1)*nzn
>>>
>>> PetscCallA( MatCreate(PETSC_COMM_WORLD,A,ierr) )
>>> PetscCallA( MatSetOptionsPrefix(A,'A_',ierr) )
>>> PetscCallA( MatSetSizes(A,mloc,mloc,ngrow,ngrow,ierr) )
>>> PetscCallA( MatSetFromOptions(A,ierr) )
>>> PetscCallA( MatSeqAIJSetPreallocation(A,i15,PETSC_NULL_INTEGER,ierr) )
>>> PetscCallA( MatSeqBAIJSetPreallocation(A, i3, i15, PETSC_NULL_INTEGER,
>>> ierr) )
>>>
>>> PetscCallA( MatMPIAIJSetPreallocation(A, i15, PETSC_NULL_INTEGER, i15,
>>> PETSC_NULL_INTEGER, ierr) )
>>>
>>> PetscCallA( MatMPIBAIJSetPreallocation(A, i3, i15, PETSC_NULL_INTEGER,
>>> i15, PETSC_NULL_INTEGER, ierr) )
>>>
>>> PetscCallA( MatCreate(PETSC_COMM_WORLD, P, ierr) )
>>> PetscCallA( MatSetOptionsPrefix(P, 'P_', ierr) )
>>> PetscCallA( MatSetSizes(P, mloc, mloc, ngrow, ngrow, ierr) )
>>> PetscCallA( MatSetFromOptions(P, ierr) )
>>> PetscCallA( MatSeqAIJSetPreallocation(P, i15, PETSC_NULL_INTEGER, ierr) )
>>> PetscCallA( MatSeqBAIJSetPreallocation(P, i3, i15, PETSC_NULL_INTEGER,
>>> ierr) )
>>>
>>> PetscCallA( MatMPIAIJSetPreallocation(P, i15, PETSC_NULL_INTEGER, i15,
>>> PETSC_NULL_INTEGER, ierr) )
>>> PetscCallA( MatMPIBAIJSetPreallocation(P, i3, i15, PETSC_NULL_INTEGER,
>>> i15, PETSC_NULL_INTEGER, ierr) )
>>>
>>> PetscCallA( DMDAGetInfo(da, PETSC_NULL_INTEGER, mx, my, mz,
>>> PETSC_NULL_INTEGER, PETSC_NULL_INTEGER, PETSC_NULL_INTEGER,
>>> PETSC_NULL_INTEGER, PETSC_NULL_INTEGER,
>>> PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,ierr)
>>> )
>>> PetscCallA( DMDAGetCorners(da,xs,ys,zs,xm,ym,zm,ierr) )
>>> PetscCallA( DMDAGetGhostCorners(da,gxs,gys,gzs,gxm,gym,gzm,ierr) )
>>>
>>> PetscCallA( DMLocalToGlobal(da,localsrc,INSERT_VALUES,src,ierr) )
>>> PetscCallA( DMGlobalToLocalBegin(da, src, INSERT_VALUES, localsrc, ierr)
>>> )
>>> PetscCallA( DMGlobalToLocalEnd(da,src,INSERT_VALUES,localsrc,ierr) )
>>> PetscCallA( DMLocalToGlobal(da,localb,INSERT_VALUES,b,ierr) )
>>>
>>> When calling the solver function
>>> PetscCallA( KSPSolve(ksp,b,xsol,ierr) )
>>> the following error occurs:
>>>
>>> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------
>>> [0]PETSC ERROR: Invalid argument
>>> [0]PETSC ERROR: Object (seq) is not seqcuda or 
>>> 

Re: [petsc-users] Non-linear solve: DIVERGED_LINE_SEARCH

2023-09-07 Thread Stefano Zampini
The solver did not diverge.
It was the line search that was not able to make further progress in
minimizing the 2-norm of the residual.
This is common in nonlinear solvers. It would help if you told us what you
are trying to solve.
Note that at the first step, your residual norm is already 1.e-6. What kind
of accuracy do you want?
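
If the accuracy you already reach is enough, you can simply tell SNES so via
the tolerances instead of letting the line search fight for the last digits.
A one-line sketch with illustrative values (snes being your SNES object; the
command-line equivalents are -snes_atol and -snes_rtol):

snes.setTolerances(atol=1.0e-5, rtol=1.0e-6)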


Il giorno gio 7 set 2023 alle ore 13:08 Karthikeyan Chockalingam - STFC
UKRI via petsc-users  ha scritto:

> Hello,
>
>
>
> The non-linear solution diverged. The final solution seems right and I
> believe the Jacobian is correct (not 100% certain).
>
>
>
> I am not sure if I am doing something wrong in the solver settings.
>
>
>
>
>
>  0 SNES Function norm 3.890991210938e-03
>
> 0 KSP Residual norm 9.037762538598e+00
>
> 1 KSP Residual norm 2.120375403775e-01
>
> 2 KSP Residual norm 5.155439334511e-03
>
> 3 KSP Residual norm 1.394364169369e-04
>
> 4 KSP Residual norm 9.233543407204e-06
>
>   Linear solve converged due to CONVERGED_RTOL iterations 4
>
>
>
>   Line search: Using full step: fnorm 3.890991210938e-03 gnorm
> 7.701511565083e-06
>
>   1 SNES Function norm 7.701511565083e-06
>
> 0 KSP Residual norm 5.630229687829e-03
>
> 1 KSP Residual norm 1.030475601271e-04
>
> 2 KSP Residual norm 2.576454714319e-06
>
> 3 KSP Residual norm 6.669316846898e-08
>
> 4 KSP Residual norm 3.215810984829e-09
>
>   Linear solve converged due to CONVERGED_RTOL iterations 4
>
>
>
>   Line search: gnorm after quadratic fit 1.805500533481e-05
>
>   Line search: Cubic step no good, shrinking lambda, current gnorm
> 2.563759884284e-05 lambda=3.0804668685096816e-02
>
>   Line search: Cubic step no good, shrinking lambda, current gnorm
> 3.332721829751e-05 lambda=3.0804668685096817e-03
>
>   Line search: Cubic step no good, shrinking lambda, current gnorm
> 4.102754045833e-05 lambda=3.0804668685096822e-04
>
>   Line search: Cubic step no good, shrinking lambda, current gnorm
> 4.872893294880e-05 lambda=3.0804668685096822e-05
>
>   Line search: Cubic step no good, shrinking lambda, current gnorm
> 5.643043250787e-05 lambda=3.0804668685096822e-06
>
>   Line search: Cubic step no good, shrinking lambda, current gnorm
> 6.413194279696e-05 lambda=3.0804668685096827e-07
>
>   Line search: Cubic step no good, shrinking lambda, current gnorm
> 7.183345417492e-05 lambda=3.0804668685096828e-08
>
>   Line search: Cubic step no good, shrinking lambda, current gnorm
> 7.953496567312e-05 lambda=3.0804668685096829e-09
>
>   Line search: Cubic step no good, shrinking lambda, current gnorm
> 8.723647719173e-05 lambda=3.0804668685096831e-10
>
>   Line search: Cubic step no good, shrinking lambda, current gnorm
> 9.493798871875e-05 lambda=3.0804668685096832e-11
>
>   Line search: Cubic step no good, shrinking lambda, current gnorm
> 1.026395002516e-04 lambda=3.0804668685096835e-12
>
>   Line search: Cubic step no good, shrinking lambda, current gnorm
> 1.103410117889e-04 lambda=3.0804668685096835e-13
>
>   Line search: unable to find good step length! After 12 tries
>
>   Line search: fnorm=7.7015115650831560e-06,
> gnorm=1.1034101178892401e-04, ynorm=6.2052357872955976e-03,
> minlambda=9.9998e-13, lambda=3.0804668685096835e-13, initial
> slope=-5.9313318983096354e-11
>
> Nonlinear solve did not converge due to DIVERGED_LINE_SEARCH iterations 1
>
>
>
>
>
> Thank you for your help.
>
>
>
> Kind regards,
>
> Karthik.
>
>
>
> --
>
> *Dr. Karthik Chockalingam*
>
> Senior Research Software Engineer
>
> High Performance Systems Engineering Group
>
> Hartree Centre | Science and Technology Facilities Council
>
> karthikeyan.chockalin...@stfc.ac.uk
>
>
>
>
>
>


-- 
Stefano


Re: [petsc-users] Python PETSc performance vs scipy ZVODE

2023-08-10 Thread Stefano Zampini
Then just do the multiplications you need. My proposal was for the example
function you were showing.
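
For the Gaussian pulses mentioned below, the same structure can be kept with
a small cut-off (a sketch; l, pump and tmp_vec are the objects from the
snippets below, the pulse parameters are made up):

import numpy as np

def rhsfunc6(ts, t, u, F):
    l.mult(u, F)
    scale = 0.5 * np.exp(-((t - 7.5) / 1.0) ** 2)   # illustrative Gaussian envelope
    if scale > 1e-14:                               # skip the second mult when negligible
        pump.mult(u, tmp_vec)
        F.axpy(scale, tmp_vec)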

On Thu, Aug 10, 2023, 12:25 Niclas Götting 
wrote:

> You are absolutely right for this specific case (I get about 2400it/s
> instead of 2100it/s). However, the single square function will be replaced
> by a series of Gaussian pulses in the future, which will never be zero.
> Maybe one could do an approximation and skip the second mult if the
> Gaussians are close to zero.
> On 10.08.23 12:16, Stefano Zampini wrote:
>
> If you do the mult of "pump" inside an if it should be faster
>
> On Thu, Aug 10, 2023, 12:12 Niclas Götting 
> wrote:
>
>> If I understood you right, this should be the resulting RHS:
>>
>> def rhsfunc5(ts, t, u, F):
>> l.mult(u, F)
>> pump.mult(u, tmp_vec)
>> scale = 0.5 * (5 < t < 10)
>> F.axpy(scale, tmp_vec)
>>
>> It is a little bit slower than option 3, but with about 2100it/s
>> consistently ~10% faster than option 4.
>>
>> Thank you very much for the suggestion!
>> On 10.08.23 11:47, Stefano Zampini wrote:
>>
>> I would use option 3. Keep a work vector and do a vector summation
>> instead of the multiple multiplication by scale and 1/scale.
>>
>> I agree with you the docs are a little misleading here.
>>
>> On Thu, Aug 10, 2023, 11:40 Niclas Götting 
>> wrote:
>>
>>> Thank you both for the very quick answer!
>>>
>>> So far, I compiled PETSc with debugging turned on, but I think it should
>>> still be faster than standard scipy in both cases. Actually, Stefano's
>>> answer has got me very far already; now I only define the RHS of the ODE
>>> and no Jacobian (I wonder, why the documentation suggests otherwise,
>>> though). I had the following four tries at implementing the RHS:
>>>
>>>1. def rhsfunc1(ts, t, u, F):
>>>scale = 0.5 * (5 < t < 10)
>>>(l + scale * pump).mult(u, F)
>>>2. def rhsfunc2(ts, t, u, F):
>>>l.mult(u, F)
>>>scale = 0.5 * (5 < t < 10)
>>>(scale * pump).multAdd(u, F, F)
>>>3. def rhsfunc3(ts, t, u, F):
>>>l.mult(u, F)
>>>scale = 0.5 * (5 < t < 10)
>>>if scale != 0:
>>>pump.scale(scale)
>>>pump.multAdd(u, F, F)
>>>pump.scale(1/scale)
>>>4. def rhsfunc4(ts, t, u, F):
>>>tmp_pump.zeroEntries() # tmp_pump is pump.duplicate()
>>>l.mult(u, F)
>>>scale = 0.5 * (5 < t < 10)
>>>tmp_pump.axpy(scale, pump,
>>>structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)
>>>tmp_pump.multAdd(u, F, F)
>>>
>>> They all yield the same results, but with 50it/s, 800it/s, 2300it/s and
>>> 1900it/s, respectively, which is a huge performance boost (almost 7 times
>>> as fast as scipy, with PETSc debugging still turned on). As the scale
>>> function will most likely be a Gaussian in the future, I think that option
>>> 3 will become numerically unstable and I'll have to go with option 4,
>>> which is already faster than I expected. If you think it is possible to
>>> speed up the RHS calculation even more, I'd be happy to hear your
>>> suggestions; the -log_view is attached to this message.
>>>
>>> One last point: If I didn't misunderstand the documentation at
>>> https://petsc.org/release/manual/ts/#special-cases, should this maybe
>>> be changed?
>>>
>>> Best regards
>>> Niclas
>>> On 09.08.23 17:51, Stefano Zampini wrote:
>>>
>>> TSRK is an explicit solver. Unless you are changing the TS type from the
>>> command line, the explicit Jacobian should not be needed. On top of
>>> Barry's suggestion, I would suggest you write the explicit RHS instead
>>> of assembling a throw-away matrix every time that function needs to be
>>> sampled.
>>>
>>> On Wed, Aug 9, 2023, 17:09 Niclas Götting 
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm currently trying to convert a quantum simulation from scipy to
>>>> PETSc. The problem itself is extremely simple and of the form
>>>> \dot{u}(t)
>>>> = (A_const + f(t)*B_const)*u(t), where f(t) in this simple test case is
>>>> a square function. The matrices A_const and B_const are extremely
>>>> sparse
>>>> and therefore I thought, the problem will be well suited for PETSc.
>

Re: [petsc-users] Python PETSc performance vs scipy ZVODE

2023-08-10 Thread Stefano Zampini
If you do the mult of "pump" inside an if it should be faster

On Thu, Aug 10, 2023, 12:12 Niclas Götting 
wrote:

> If I understood you right, this should be the resulting RHS:
>
> def rhsfunc5(ts, t, u, F):
> l.mult(u, F)
> pump.mult(u, tmp_vec)
> scale = 0.5 * (5 < t < 10)
> F.axpy(scale, tmp_vec)
>
> It is a little bit slower than option 3, but with about 2100it/s
> consistently ~10% faster than option 4.
>
> Thank you very much for the suggestion!
> On 10.08.23 11:47, Stefano Zampini wrote:
>
> I would use option 3. Keep a work vector and do a vector summation instead
> of the multiple multiplication by scale and 1/scale.
>
> I agree with you the docs are a little misleading here.
>
> On Thu, Aug 10, 2023, 11:40 Niclas Götting 
> wrote:
>
>> Thank you both for the very quick answer!
>>
>> So far, I compiled PETSc with debugging turned on, but I think it should
>> still be faster than standard scipy in both cases. Actually, Stefano's
>> answer has got me very far already; now I only define the RHS of the ODE
>> and no Jacobian (I wonder, why the documentation suggests otherwise,
>> though). I had the following four tries at implementing the RHS:
>>
>>1. def rhsfunc1(ts, t, u, F):
>>scale = 0.5 * (5 < t < 10)
>>(l + scale * pump).mult(u, F)
>>2. def rhsfunc2(ts, t, u, F):
>>l.mult(u, F)
>>scale = 0.5 * (5 < t < 10)
>>(scale * pump).multAdd(u, F, F)
>>3. def rhsfunc3(ts, t, u, F):
>>l.mult(u, F)
>>scale = 0.5 * (5 < t < 10)
>>if scale != 0:
>>pump.scale(scale)
>>pump.multAdd(u, F, F)
>>pump.scale(1/scale)
>>4. def rhsfunc4(ts, t, u, F):
>>tmp_pump.zeroEntries() # tmp_pump is pump.duplicate()
>>l.mult(u, F)
>>scale = 0.5 * (5 < t < 10)
>>tmp_pump.axpy(scale, pump,
>>structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)
>>tmp_pump.multAdd(u, F, F)
>>
>> They all yield the same results, but with 50it/s, 800it/s, 2300it/s and
>> 1900it/s, respectively, which is a huge performance boost (almost 7 times
>> as fast as scipy, with PETSc debugging still turned on). As the scale
>> function will most likely be a Gaussian in the future, I think that option
>> 3 will become numerically unstable and I'll have to go with option 4,
>> which is already faster than I expected. If you think it is possible to
>> speed up the RHS calculation even more, I'd be happy to hear your
>> suggestions; the -log_view is attached to this message.
>>
>> One last point: If I didn't misunderstand the documentation at
>> https://petsc.org/release/manual/ts/#special-cases, should this maybe be
>> changed?
>>
>> Best regards
>> Niclas
>> On 09.08.23 17:51, Stefano Zampini wrote:
>>
>> TSRK is an explicit solver. Unless you are changing the TS type from the
>> command line, the explicit Jacobian should not be needed. On top of
>> Barry's suggestion, I would suggest you write the explicit RHS instead
>> of assembling a throw-away matrix every time that function needs to be
>> sampled.
>>
>> On Wed, Aug 9, 2023, 17:09 Niclas Götting 
>> wrote:
>>
>>> Hi all,
>>>
>>> I'm currently trying to convert a quantum simulation from scipy to
>>> PETSc. The problem itself is extremely simple and of the form \dot{u}(t)
>>> = (A_const + f(t)*B_const)*u(t), where f(t) in this simple test case is
>>> a square function. The matrices A_const and B_const are extremely sparse
>>> and therefore I thought, the problem will be well suited for PETSc.
>>> Currently, I solve the ODE with the following procedure in scipy (I can
>>> provide the necessary data files, if needed, but they are just some
>>> trace-preserving, very sparse matrices):
>>>
>>> import numpy as np
>>> import scipy.sparse
>>> import scipy.integrate
>>>
>>> from tqdm import tqdm
>>>
>>>
>>> l = np.load("../liouvillian.npy")
>>> pump = np.load("../pump_operator.npy")
>>> state = np.load("../initial_state.npy")
>>>
>>> l = scipy.sparse.csr_array(l)
>>> pump = scipy.sparse.csr_array(pump)
>>>
>>> def f(t, y, *args):
>>>  return (l + 0.5 * (5 < t < 10) * pump) @ y
>>>  #return l @ y # Uncomment for f(t) = 0
>>>
>>> dt = 0.1
>>> NUM_STEPS = 200
>>> res = np.empty((NU

Re: [petsc-users] Python PETSc performance vs scipy ZVODE

2023-08-10 Thread Stefano Zampini
I would use option 3. Keep a work vector and do a vector summation instead
of the multiple multiplication by scale and 1/scale.

I agree with you the docs are a little misleading here.

On Thu, Aug 10, 2023, 11:40 Niclas Götting 
wrote:

> Thank you both for the very quick answer!
>
> So far, I compiled PETSc with debugging turned on, but I think it should
> still be faster than standard scipy in both cases. Actually, Stefano's
> answer has got me very far already; now I only define the RHS of the ODE
> and no Jacobian (I wonder, why the documentation suggests otherwise,
> though). I had the following four tries at implementing the RHS:
>
>1. def rhsfunc1(ts, t, u, F):
>scale = 0.5 * (5 < t < 10)
>(l + scale * pump).mult(u, F)
>2. def rhsfunc2(ts, t, u, F):
>l.mult(u, F)
>scale = 0.5 * (5 < t < 10)
>(scale * pump).multAdd(u, F, F)
>3. def rhsfunc3(ts, t, u, F):
>l.mult(u, F)
>scale = 0.5 * (5 < t < 10)
>if scale != 0:
>pump.scale(scale)
>pump.multAdd(u, F, F)
>pump.scale(1/scale)
>4. def rhsfunc4(ts, t, u, F):
>tmp_pump.zeroEntries() # tmp_pump is pump.duplicate()
>l.mult(u, F)
>scale = 0.5 * (5 < t < 10)
>tmp_pump.axpy(scale, pump,
>structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)
>tmp_pump.multAdd(u, F, F)
>
> They all yield the same results, but with 50it/s, 800it/s, 2300it/s and
> 1900it/s, respectively, which is a huge performance boost (almost 7 times
> as fast as scipy, with PETSc debugging still turned on). As the scale
> function will most likely be a Gaussian in the future, I think that option
> 3 will become numerically unstable and I'll have to go with option 4,
> which is already faster than I expected. If you think it is possible to
> speed up the RHS calculation even more, I'd be happy to hear your
> suggestions; the -log_view is attached to this message.
>
> One last point: If I didn't misunderstand the documentation at
> https://petsc.org/release/manual/ts/#special-cases, should this maybe be
> changed?
>
> Best regards
> Niclas
> On 09.08.23 17:51, Stefano Zampini wrote:
>
> TSRK is an explicit solver. Unless you are changing the TS type from the
> command line, the explicit Jacobian should not be needed. On top of
> Barry's suggestion, I would suggest you write the explicit RHS instead
> of assembling a throw-away matrix every time that function needs to be
> sampled.
>
> On Wed, Aug 9, 2023, 17:09 Niclas Götting 
> wrote:
>
>> Hi all,
>>
>> I'm currently trying to convert a quantum simulation from scipy to
>> PETSc. The problem itself is extremely simple and of the form \dot{u}(t)
>> = (A_const + f(t)*B_const)*u(t), where f(t) in this simple test case is
>> a square function. The matrices A_const and B_const are extremely sparse
>> and therefore I thought, the problem will be well suited for PETSc.
>> Currently, I solve the ODE with the following procedure in scipy (I can
>> provide the necessary data files, if needed, but they are just some
>> trace-preserving, very sparse matrices):
>>
>> import numpy as np
>> import scipy.sparse
>> import scipy.integrate
>>
>> from tqdm import tqdm
>>
>>
>> l = np.load("../liouvillian.npy")
>> pump = np.load("../pump_operator.npy")
>> state = np.load("../initial_state.npy")
>>
>> l = scipy.sparse.csr_array(l)
>> pump = scipy.sparse.csr_array(pump)
>>
>> def f(t, y, *args):
>>  return (l + 0.5 * (5 < t < 10) * pump) @ y
>>  #return l @ y # Uncomment for f(t) = 0
>>
>> dt = 0.1
>> NUM_STEPS = 200
>> res = np.empty((NUM_STEPS, 4096), dtype=np.complex128)
>> solver =
>> scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state)
>> times = []
>> for i in tqdm(range(NUM_STEPS)):
>>  res[i, :] = solver.integrate(solver.t + dt)
>>  times.append(solver.t)
>>
>> Here, A_const = l, B_const = pump and f(t) = 5 < t < 10. tqdm reports
>> about 330it/s on my machine. When converting the code to PETSc, I came
>> to the following result (according to the chapter
>> https://petsc.org/main/manual/ts/#special-cases)
>>
>> import sys
>> import petsc4py
>> petsc4py.init(args=sys.argv)
>> import numpy as np
>> import scipy.sparse
>>
>> from tqdm import tqdm
>> from petsc4py import PETSc
>>
>> comm = PETSc.COMM_WORLD
>>
>>
>> def mat_to_real(arr):
>>  return np.block([[arr.real, -arr.imag], [a

Re: [petsc-users] Python PETSc performance vs scipy ZVODE

2023-08-09 Thread Stefano Zampini
TSRK is an explicit solver. Unless you are changing the TS type from the
command line, the explicit Jacobian should not be needed. On top of
Barry's suggestion, I would suggest you write the explicit RHS instead
of assembling a throw-away matrix every time that function needs to be
sampled.
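
Concretely, "write the explicit RHS" can look like the following sketch
(l and pump are the matrices built in the script below; tmp is an assumed
work vector created once, e.g. tmp = l.createVecRight()):

def rhsfunc(ts, t, u, F):
    l.mult(u, F)                  # F = A_const * u
    scale = 0.5 * (5 < t < 10)
    if scale:
        pump.mult(u, tmp)         # add f(t) * B_const * u without building a matrix
        F.axpy(scale, tmp)

ts.setRHSFunction(rhsfunc)        # no RHS Jacobian needed for an explicit RK scheme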

On Wed, Aug 9, 2023, 17:09 Niclas Götting 
wrote:

> Hi all,
>
> I'm currently trying to convert a quantum simulation from scipy to
> PETSc. The problem itself is extremely simple and of the form \dot{u}(t)
> = (A_const + f(t)*B_const)*u(t), where f(t) in this simple test case is
> a square function. The matrices A_const and B_const are extremely sparse
> and therefore I thought, the problem will be well suited for PETSc.
> Currently, I solve the ODE with the following procedure in scipy (I can
> provide the necessary data files, if needed, but they are just some
> trace-preserving, very sparse matrices):
>
> import numpy as np
> import scipy.sparse
> import scipy.integrate
>
> from tqdm import tqdm
>
>
> l = np.load("../liouvillian.npy")
> pump = np.load("../pump_operator.npy")
> state = np.load("../initial_state.npy")
>
> l = scipy.sparse.csr_array(l)
> pump = scipy.sparse.csr_array(pump)
>
> def f(t, y, *args):
>  return (l + 0.5 * (5 < t < 10) * pump) @ y
>  #return l @ y # Uncomment for f(t) = 0
>
> dt = 0.1
> NUM_STEPS = 200
> res = np.empty((NUM_STEPS, 4096), dtype=np.complex128)
> solver =
> scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state)
> times = []
> for i in tqdm(range(NUM_STEPS)):
>  res[i, :] = solver.integrate(solver.t + dt)
>  times.append(solver.t)
>
> Here, A_const = l, B_const = pump and f(t) = 5 < t < 10. tqdm reports
> about 330it/s on my machine. When converting the code to PETSc, I came
> to the following result (according to the chapter
> https://petsc.org/main/manual/ts/#special-cases)
>
> import sys
> import petsc4py
> petsc4py.init(args=sys.argv)
> import numpy as np
> import scipy.sparse
>
> from tqdm import tqdm
> from petsc4py import PETSc
>
> comm = PETSc.COMM_WORLD
>
>
> def mat_to_real(arr):
>  return np.block([[arr.real, -arr.imag], [arr.imag,
> arr.real]]).astype(np.float64)
>
> def mat_to_petsc_aij(arr):
>  arr_sc_sp = scipy.sparse.csr_array(arr)
>  mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm)
>  rstart, rend = mat.getOwnershipRange()
>  print(rstart, rend)
>  print(arr.shape[0])
>  print(mat.sizes)
>  I = arr_sc_sp.indptr[rstart : rend + 1] - arr_sc_sp.indptr[rstart]
>  J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] :
> arr_sc_sp.indptr[rend]]
>  V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]]
>
>  print(I.shape, J.shape, V.shape)
>  mat.setValuesCSR(I, J, V)
>  mat.assemble()
>  return mat
>
>
> l = np.load("../liouvillian.npy")
> l = mat_to_real(l)
> pump = np.load("../pump_operator.npy")
> pump = mat_to_real(pump)
> state = np.load("../initial_state.npy")
> state = np.hstack([state.real, state.imag]).astype(np.float64)
>
> l = mat_to_petsc_aij(l)
> pump = mat_to_petsc_aij(pump)
>
>
> jac = l.duplicate()
> for i in range(8192):
>  jac.setValue(i, i, 0)
> jac.assemble()
> jac += l
>
> vec = l.createVecRight()
> vec.setValues(np.arange(state.shape[0], dtype=np.int32), state)
> vec.assemble()
>
>
> dt = 0.1
>
> ts = PETSc.TS().create(comm=comm)
> ts.setFromOptions()
> ts.setProblemType(ts.ProblemType.LINEAR)
> ts.setEquationType(ts.EquationType.ODE_EXPLICIT)
> ts.setType(ts.Type.RK)
> ts.setRKType(ts.RKType.RK3BS)
> ts.setTime(0)
> print("KSP:", ts.getKSP().getType())
> print("KSP PC:",ts.getKSP().getPC().getType())
> print("SNES :", ts.getSNES().getType())
>
> def jacobian(ts, t, u, Amat, Pmat):
>  Amat.zeroEntries()
>  Amat.aypx(1, l, structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)
>  Amat.axpy(0.5 * (5 < t < 10), pump,
> structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)
>
> ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear)
> #ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l) #
> Uncomment for f(t) = 0
> ts.setRHSJacobian(jacobian, jac)
>
> NUM_STEPS = 200
> res = np.empty((NUM_STEPS, 8192), dtype=np.float64)
> times = []
> rstart, rend = vec.getOwnershipRange()
> for i in tqdm(range(NUM_STEPS)):
>  time = ts.getTime()
>  ts.setMaxTime(time + dt)
>  ts.solve(vec)
>  res[i, rstart:rend] = vec.getArray()[:]
>  times.append(time)
>
> I decomposed the complex ODE into a larger real ODE, so that I can more
> easily switch to GPU computation later on. The solutions of
> both scripts are essentially identical, but PETSc runs about 3 times
> slower at 120it/s on my machine. I don't use MPI for PETSc yet.
>
> I strongly suspect that the problem lies within the Jacobian definition,
> as PETSc is about 3 times *faster* than scipy with f(t) = 0 and
> therefore a constant Jacobian.
>
> Thank you in advance.
>
> All the best,
> Niclas
>
>
>


Re: [petsc-users] Near null space for a fieldsplit in petsc4py

2023-07-13 Thread Stefano Zampini
clearly, I meant optimized mode

Il giorno gio 13 lug 2023 alle ore 19:19 Stefano Zampini <
stefano.zamp...@gmail.com> ha scritto:

> In any case, we need to activate PetscCheck in debug mode too. This could
> have been avoided.
>
> Il giorno gio 13 lug 2023 alle ore 18:23 Karin 
> ha scritto:
>
>> Thank you very much for the information Matt.
>> Unfortunately, I do not use DM  :-(
>>
>> Le jeu. 13 juil. 2023 à 13:44, Matthew Knepley  a
>> écrit :
>>
>>> On Thu, Jul 13, 2023 at 5:33 AM Pierre Jolivet 
>>> wrote:
>>>
>>>> Dear Nicolas,
>>>>
>>>> On 13 Jul 2023, at 10:17 AM, TARDIEU Nicolas 
>>>> wrote:
>>>>
>>>> Dear Pierre,
>>>>
>>>> You are absolutely right. I was using a --with-debugging=0 (aka
>>>> release) install and this is definitely an error.
>>>> Once I used my debug install, I found the way to fix my problem. The
>>>> solution is in the attached script: I first need to extract the correct
>>>> block from the PC operator's MatNest and then append the null space to it.
>>>> Anyway this is a bit tricky...
>>>>
>>>>
>>>> Yep, it’s the same with all “nested” solvers, fieldsplit, ASM, MG, you
>>>> name it.
>>>> You first need the initial PCSetUp() so that the bare minimum is put in
>>>> place, then you have to fetch things yourself and adapt it to your needs.
>>>> We had a similar discussion with the MEF++ people last week, there is
>>>> currently no way around this, AFAIK.
>>>>
>>>
>>> Actually, I hated this as well, so I built a way around it _if_ you are
>>> using a DM to define the problem. Then
>>> you can set a "nullspace constructor" to make it if the field you are
>>> talking about is ever extracted. You use DMSetNullSpaceConstructor(). I do
>>> this in SNES ex62 and ex69, and other examples.
>>>
>>>   Thanks,
>>>
>>>  Matt
>>>
>>>
>>>> Thanks,
>>>> Pierre
>>>>
>>>> Regards,
>>>> Nicolas
>>>>
>>>> --
>>>> *De :* pierre.joli...@lip6.fr 
>>>> *Envoyé :* mercredi 12 juillet 2023 19:52
>>>> *À :* TARDIEU Nicolas 
>>>> *Cc :* petsc-users@mcs.anl.gov 
>>>> *Objet :* Re: [petsc-users] Near null space for a fieldsplit in
>>>> petsc4py
>>>>
>>>>
>>>> > On 12 Jul 2023, at 6:04 PM, TARDIEU Nicolas via petsc-users <
>>>> petsc-users@mcs.anl.gov> wrote:
>>>> >
>>>> > Dear PETSc team,
>>>> >
>>>> > In the attached example, I set up a block pc for a saddle-point
>>>> problem in petsc4py. The IS define the unknowns, namely some physical
>>>> quantity (phys) and a Lagrange multiplier (lags).
>>>> > I would like to attach a near null space to the physical block, in
>>>> order to get the best performance from an AMG pc.
>>>> > I have been trying hard, attaching it to the initial block, to the IS
>>>> but no matter what I am doing, when it comes to "ksp_view", no near null
>>>> space is attached to the matrix.
>>>> >
>>>> > Could you please help me figure out what I am doing wrong ?
>>>>
>>>> Are you using a double-precision 32-bit integers real build of PETSc?
>>>> Is it --with-debugging=0?
>>>> Because with my debug build, I get the following error (thus explaining
>>>> why it’s not attached to the KSP).
>>>> Traceback (most recent call last):
>>>>   File "/Volumes/Data/Downloads/test/test_NullSpace.py", line 35, in <module>
>>>> ns = NullSpace().create(True, [v], comm=comm)
>>>>  
>>>>   File "petsc4py/PETSc/Mat.pyx", line 5611, in
>>>> petsc4py.PETSc.NullSpace.create
>>>> petsc4py.PETSc.Error: error code 62
>>>> [0] MatNullSpaceCreate() at
>>>> /Volumes/Data/repositories/petsc/src/mat/interface/matnull.c:249
>>>> [0] Invalid argument
>>>> [0] Vector 0 must have 2-norm of 1.0, it is 22.3159
>>>>
>>>> Furthermore, if you set yourself the constant vector in the near
>>>> null-space, then the first argument of create() must be False, otherwise,
>>>> you’ll have twice the same vector, and you’ll end up wit

Re: [petsc-users] Near null space for a fieldsplit in petsc4py

2023-07-13 Thread Stefano Zampini
In any case, we need to activate PetscCheck in debug mode too. This could
have been avoided.
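
For reference, a petsc4py sketch of the workaround Nicolas describes below
(object names are assumptions; rigid_body_modes is a list of orthonormal Vecs
you build yourself): after the first PCSetUp(), pull the physical block out of
the MatNest used as the preconditioning operator and attach the near null
space to it.

from petsc4py import PETSc

pc.setUp()                                  # so the fieldsplit sub-objects exist
_, P = pc.getOperators()                    # P is the MatNest preconditioning matrix
A00 = P.getNestSubMatrix(0, 0)              # the "phys" block
rbm = PETSc.NullSpace().create(constant=False, vectors=rigid_body_modes)
A00.setNearNullSpace(rbm)                   # picked up by GAMG on that block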

Il giorno gio 13 lug 2023 alle ore 18:23 Karin 
ha scritto:

> Thank you very much for the information Matt.
> Unfortunately, I do not use DM  :-(
>
> Le jeu. 13 juil. 2023 à 13:44, Matthew Knepley  a
> écrit :
>
>> On Thu, Jul 13, 2023 at 5:33 AM Pierre Jolivet 
>> wrote:
>>
>>> Dear Nicolas,
>>>
>>> On 13 Jul 2023, at 10:17 AM, TARDIEU Nicolas 
>>> wrote:
>>>
>>> Dear Pierre,
>>>
>>> You are absolutely right. I was using a --with-debugging=0 (aka release)
>>> install and this is definitely an error.
>>> Once I used my debug install, I found the way to fix my problem. The
>>> solution is in the attached script: I first need to extract the correct
>>> block from the PC operator's MatNest and then append the null space to it.
>>> Anyway this is a bit tricky...
>>>
>>>
>>> Yep, it’s the same with all “nested” solvers, fieldsplit, ASM, MG, you
>>> name it.
>>> You first need the initial PCSetUp() so that the bare minimum is put in
>>> place, then you have to fetch things yourself and adapt it to your needs.
>>> We had a similar discussion with the MEF++ people last week, there is
>>> currently no way around this, AFAIK.
>>>
>>
>> Actually, I hated this as well, so I built a way around it _if_ you are
>> using a DM to define the problem. Then
>> you can set a "nullspace constructor" to make it if the field you are
>> talking about is ever extracted. You use DMSetNullSpaceConstructor(). I do
>> this in SNES ex62 and ex69, and other examples.
>>
>>   Thanks,
>>
>>  Matt
>>
>>
>>> Thanks,
>>> Pierre
>>>
>>> Regards,
>>> Nicolas
>>>
>>> --
>>> *De :* pierre.joli...@lip6.fr 
>>> *Envoyé :* mercredi 12 juillet 2023 19:52
>>> *À :* TARDIEU Nicolas 
>>> *Cc :* petsc-users@mcs.anl.gov 
>>> *Objet :* Re: [petsc-users] Near null space for a fieldsplit in petsc4py
>>>
>>>
>>> > On 12 Jul 2023, at 6:04 PM, TARDIEU Nicolas via petsc-users <
>>> petsc-users@mcs.anl.gov> wrote:
>>> >
>>> > Dear PETSc team,
>>> >
>>> > In the attached example, I set up a block pc for a saddle-point
>>> problem in petsc4py. The IS define the unknowns, namely some physical
>>> quantity (phys) and a Lagrange multiplier (lags).
>>> > I would like to attach a near null space to the physical block, in
>>> order to get the best performance from an AMG pc.
>>> > I have been trying hard, attaching it to the initial block, to the IS
>>> but no matter what I am doing, when it comes to "ksp_view", no near null
>>> space is attached to the matrix.
>>> >
>>> > Could you please help me figure out what I am doing wrong ?
>>>
>>> Are you using a double-precision 32-bit integers real build of PETSc?
>>> Is it --with-debugging=0?
>>> Because with my debug build, I get the following error (thus explaining
>>> why it’s not attached to the KSP).
>>> Traceback (most recent call last):
>>>   File "/Volumes/Data/Downloads/test/test_NullSpace.py", line 35, in <module>
>>> ns = NullSpace().create(True, [v], comm=comm)
>>>  
>>>   File "petsc4py/PETSc/Mat.pyx", line 5611, in
>>> petsc4py.PETSc.NullSpace.create
>>> petsc4py.PETSc.Error: error code 62
>>> [0] MatNullSpaceCreate() at
>>> /Volumes/Data/repositories/petsc/src/mat/interface/matnull.c:249
>>> [0] Invalid argument
>>> [0] Vector 0 must have 2-norm of 1.0, it is 22.3159
>>>
>>> Furthermore, if you set yourself the constant vector in the near
>>> null-space, then the first argument of create() must be False, otherwise,
>>> you’ll have twice the same vector, and you’ll end up with another error
>>> (the vectors in the near null-space must be orthonormal).
>>> If things still don’t work after those couple of fixes, please feel free
>>> to send an up-to-date reproducer.
>>>
>>> Thanks,
>>> Pierre
>>>
>>> > Thanks,
>>> > Nicolas
>>> >
>>> >
>>> >
>>> >

Re: [petsc-users] How to efficiently fill in, in parallel, a PETSc matrix from a COO sparse matrix?

2023-06-20 Thread Stefano Zampini
The loop should iterate on the number of entries of the array, not the
number of local rows
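
Independently of the indexing fix, here is a sketch of the COO interface Matt
points to below (petsc4py for brevity; the C calls are MatSetPreallocationCOO()
and MatSetValuesCOO()). Each rank may pass any subset of the global triplets,
and repeated (i, j) pairs are summed:

import numpy as np
from petsc4py import PETSc

comm = PETSc.COMM_WORLD
m = 5
A = PETSc.Mat().create(comm=comm)
A.setSizes(((PETSc.DECIDE, m), (PETSc.DECIDE, m)))
A.setType(PETSc.Mat.Type.AIJ)
if comm.getRank() == 0:                      # e.g. only rank 0 holds the triplets
    I = np.array([0, 0, 1, 2, 3, 4], dtype=PETSc.IntType)
    J = np.array([0, 0, 1, 2, 3, 4], dtype=PETSc.IntType)
    V = np.array([2, -1, 2, 3, 4, 5], dtype=PETSc.ScalarType)
else:
    I = J = np.empty(0, dtype=PETSc.IntType)
    V = np.empty(0, dtype=PETSc.ScalarType)
A.setPreallocationCOO(I, J)
A.setValuesCOO(V, PETSc.InsertMode.ADD_VALUES)
A.view()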

On Tue, Jun 20, 2023, 17:07 Matthew Knepley  wrote:

> On Tue, Jun 20, 2023 at 10:55 AM Diego Magela Lemos via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
>> Considering, for instance, the following COO sparse matrix format, with
>> repeated indices:
>>
>> std::vector<PetscInt> rows{0, 0, 1, 2, 3, 4};
>> std::vector<PetscInt> cols{0, 0, 1, 2, 3, 4};
>> std::vector<PetscScalar> values{2, -1, 2, 3, 4, 5};
>>
>> that represents a 5x5 diagonal matrix A.
>>
>> So far, the code that I have is:
>>
>> // fill_in_matrix.cc
>> static char help[] = "Fill in a parallel COO format sparse matrix.";
>> #include <petsc.h>
>> #include <vector>
>> int main(int argc, char **args){
>> Mat A;
>> PetscInt m = 5, i, Istart, Iend;
>>
>> PetscCall(PetscInitialize(&argc, &args, NULL, help));
>>
>> PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
>> PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, m, m));
>> PetscCall(MatSetFromOptions(A));
>> PetscCall(MatSetUp(A));
>> PetscCall(MatGetOwnershipRange(A, &Istart, &Iend));
>>
>> std::vector<PetscInt> II{0, 0, 1, 2, 3, 4};
>> std::vector<PetscInt> JJ{0, 0, 1, 2, 3, 4};
>> std::vector<PetscScalar> XX{2, -1, 2, 3, 4, 5};
>>
>> for (i = Istart; i < Iend; i++)
>> PetscCall(MatSetValues(A, 1, &II[i], 1, &JJ[i], &XX[i], ADD_VALUES));
>>
>> PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
>> PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
>> PetscCall(MatView(A, PETSC_VIEWER_STDERR_WORLD));
>>
>> PetscCall(MatDestroy(&A));
>> PetscCall(PetscFinalize());
>> return 0;
>> }
>>
>> When running it with
>>
>> petscmpiexec -n 4 ./fill_in_matrix
>>
>>
>> I get
>>
>>
>>  Mat Object: 4 MPI processes
>>
>>   type: mpiaij
>> row 0: (0, 1.)
>> row 1: (1, 2.)
>> row 2: (2, 3.)
>> row 3: (3, 4.)
>> row 4:
>>
>>
>> Which is missing the entry of the last row.
>>
>> What am I missing? Even better, which would be the best way to fill in this 
>> matrix?
>>
> We have a new interface for this:
>
>   https://petsc.org/main/manualpages/Mat/MatSetValuesCOO/
>
>   Thanks,
>
>  Matt
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> 
>
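A minimal C sketch of the COO interface mentioned above; FillFromCOO and its arguments are illustrative names, each rank is assumed to pass the triples it computed, and repeated indices are summed:

#include <petscmat.h>

PetscErrorCode FillFromCOO(MPI_Comm comm, PetscInt m, PetscCount n,
                           PetscInt *coo_i, PetscInt *coo_j, PetscScalar *coo_v, Mat *A)
{
  PetscFunctionBeginUser;
  PetscCall(MatCreate(comm, A));
  PetscCall(MatSetSizes(*A, PETSC_DECIDE, PETSC_DECIDE, m, m));
  PetscCall(MatSetFromOptions(*A));
  PetscCall(MatSetPreallocationCOO(*A, n, coo_i, coo_j)); /* analyzes the pattern, repeated indices allowed */
  PetscCall(MatSetValuesCOO(*A, coo_v, ADD_VALUES));      /* values at repeated (i,j) locations are summed */
  PetscFunctionReturn(0);
}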


Re: [petsc-users] MatSetSizes: C vs python

2023-06-06 Thread Stefano Zampini
The right call in petsc4py is


P.setSizes(((nrowsLoc,PETSc.DECIDE),(ncolsLoc,PETSc.DECIDE)),1)


https://petsc.org/main/petsc4py/reference/petsc4py.PETSc.Mat.html#petsc4py.PETSc.Mat.setSizes

Il giorno mar 6 giu 2023 alle ore 13:05 Blaise Bourdin 
ha scritto:

> Hi,
>
> Does anybody understand why MatSetSizes seem to behave differently in C
> and python?
>
> I would expect the attached examples to be strictly equivalent but the
> python version fails in parallel. It may be that the python interface is
> different, but I don’t see any mention of this in the python docs.
>
> Regards,
> Blaise
>
>
> SiBookPro:test (master)$ mpirun -np 2 python3 testL2G2.py  nrowsLoc: 10
> ncolsLoc: 20
> Traceback (most recent call last):
>   File "/Users/blaise/Development/ccG_CR/test/testL2G2.py", line 20, in
> 
> nrowsLoc: 11 ncolsLoc: 21
> Traceback (most recent call last):
>   File "/Users/blaise/Development/ccG_CR/test/testL2G2.py", line 20, in
> 
> sys.exit(main())
> sys.exit(main())
>  ^^
>   File "/Users/blaise/Development/ccG_CR/test/testL2G2.py", line 12, in
> main
>  ^^
>   File "/Users/blaise/Development/ccG_CR/test/testL2G2.py", line 12, in
> main
> P.setSizes([nrowsLoc,ncolsLoc],1)
> P.setSizes([nrowsLoc,ncolsLoc],1)
>   File "petsc4py/PETSc/Mat.pyx", line 323, in petsc4py.PETSc.Mat.setSizes
> petsc4py.PETSc.Error: error code 62
> [1] MatSetSizes() at /opt/HPC/petsc-release/src/mat/utils/gcreate.c:161
> [1] Invalid argument
> [1] Int value must be same on all processes, argument # 4
>   File "petsc4py/PETSc/Mat.pyx", line 323, in petsc4py.PETSc.Mat.setSizes
> petsc4py.PETSc.Error: error code 62
> [0] MatSetSizes() at /opt/HPC/petsc-release/src/mat/utils/gcreate.c:161
> [0] Invalid argument
> [0] Int value must be same on all processes, argument # 4
>
>
>
>
> —
> Canada Research Chair in Mathematical and Computational Aspects of Solid
> Mechanics (Tier 1)
> Professor, Department of Mathematics & Statistics
> Hamilton Hall room 409A, McMaster University
> 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada
> https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243
>
>

-- 
Stefano


Re: [petsc-users] Petsc using VecDuplicate in solution process

2023-06-06 Thread Stefano Zampini
Il giorno mar 6 giu 2023 alle ore 09:24 Pichler, Franz <
franz.pich...@v2c2.at> ha scritto:

> Hello,
>
> I was just investigating my KSP_Solve_BCGS routine with valgrind/callgrind,
>
> I see there that PETSc is using a VecDuplicate (involving a malloc and a copy)
> every time it is called.
>

Do you mean KSPSolve_BCGS?

There's only one VecDuplicate in there and it is called only once. An
example code showing the problem would help



>
> I call it several thousand times (time evolution problem with rather small
> matrices)
>
>
>
> I am not quite sure which vector is copied there, but I guess it is the
> initial guess or the rhs,
>
> Is there a tool in petsc to avoid any vecduplication by providing a fixed
> memory for this vector?
>
> Some corner facts of my routine:
>
> I assemble the matrices(crs,serial) and vectors myself and then use
>
> MatCreateSeqAIJWithArrays and VecCreateSeqWithArray
>
> To wrap petsc around it,
>
>
>
> I use an ILU preconditioner and the sparsity patterns do not change between
> the calls; the values do,
>
>
>
> Thank you for any hint how to avoid the vecduplicate,
>
>
>
> Best regards
>
>
>
> Franz
>
>
>
>
>
> *Dr. Franz Pichler*
>
> Lead Researcher Area E
>
>
>
>
>
> *Virtual Vehicle Research GmbH*
>
>
>
> Inffeldgasse 21a, 8010 Graz, Austria
>
> Phone: +43 316 873 9818
>
> franz.pich...@v2c2.at
>
> www.v2c2.at
>
>
>
> Firmenname: Virtual Vehicle Research GmbH
>
> Rechtsform: Gesellschaft mit beschränkter Haftung
>
> Firmensitz: Austria, 8010 Graz, Inffeldgasse 21/A
>
> Firmenbuchnummer: 224755y
>
> Firmenbuchgericht: Landesgericht für ZRS Graz
>
> UID: ATU54713500
>
>
>
>
>


-- 
Stefano


Re: [petsc-users] MPI_Iprobe Error with MUMPS Solver on Multi-Nodes

2023-05-22 Thread Stefano Zampini
If I may add to the discussion, it may be that you are going OOM since you
are trying to factorize a 3 million dofs problem, this problem goes
undetected and then fails at a later stage

Il giorno lun 22 mag 2023 alle ore 20:03 Zongze Yang 
ha scritto:

> Thanks!
>
> Zongze
>
> Matthew Knepley 于2023年5月23日 周二00:09写道:
>
>> On Mon, May 22, 2023 at 11:07 AM Zongze Yang 
>> wrote:
>>
>>> Hi,
>>>
>>> I hope this letter finds you well. I am writing to seek guidance
>>> regarding an error I encountered while solving a matrix using MUMPS on
>>> multiple nodes:
>>>
>>
>> Iprobe is buggy on several MPI implementations. PETSc has an option for
>> shutting it off for this reason.
>> I do not know how to shut it off inside MUMPS however. I would mail their
>> mailing list to see.
>>
>>   Thanks,
>>
>>  Matt
>>
>>
>>> ```bash
>>> Abort(1681039) on node 60 (rank 60 in comm 240): Fatal error in
>>> PMPI_Iprobe: Other MPI error, error stack:
>>> PMPI_Iprobe(124)..: MPI_Iprobe(src=MPI_ANY_SOURCE,
>>> tag=MPI_ANY_TAG, comm=0xc426, flag=0x7ffc130f9c4c,
>>> status=0x7ffc130f9e80) failed
>>> MPID_Iprobe(240)..:
>>> MPIDI_iprobe_safe(108):
>>> MPIDI_iprobe_unsafe(35)...:
>>> MPIDI_OFI_do_iprobe(69)...:
>>> MPIDI_OFI_handle_cq_error(949): OFI poll failed
>>> (ofi_events.c:951:MPIDI_OFI_handle_cq_error:Input/output error)
>>> Assertion failed in file src/mpid/ch4/netmod/ofi/ofi_events.c at line
>>> 125: 0
>>> ```
>>>
>>> The matrix in question has a degree of freedom (dof) of 3.86e+06.
>>> Interestingly, when solving smaller-scale problems, everything functions
>>> perfectly without any issues. However, when attempting to solve the larger
>>> matrix on multiple nodes, I encounter the aforementioned error.
>>>
>>> The complete error message I received is as follows:
>>> ```bash
>>> Abort(1681039) on node 60 (rank 60 in comm 240): Fatal error in
>>> PMPI_Iprobe: Other MPI error, error stack:
>>> PMPI_Iprobe(124)..: MPI_Iprobe(src=MPI_ANY_SOURCE,
>>> tag=MPI_ANY_TAG, comm=0xc426, flag=0x7ffc130f9c4c,
>>> status=0x7ffc130f9e80) failed
>>> MPID_Iprobe(240)..:
>>> MPIDI_iprobe_safe(108):
>>> MPIDI_iprobe_unsafe(35)...:
>>> MPIDI_OFI_do_iprobe(69)...:
>>> MPIDI_OFI_handle_cq_error(949): OFI poll failed
>>> (ofi_events.c:951:MPIDI_OFI_handle_cq_error:Input/output error)
>>> Assertion failed in file src/mpid/ch4/netmod/ofi/ofi_events.c at line
>>> 125: 0
>>> /nfs/opt/cascadelake/linux-centos7-cascadelake/gcc-9.4.0/mpich-3.4.2-qgtz76gekvjzuacy7wq5a26rqlewoxfc/lib/libmpi.so.12(MPL_backtrace_show+0x26)
>>> [0x7f6076063f2c]
>>> /nfs/opt/cascadelake/linux-centos7-cascadelake/gcc-9.4.0/mpich-3.4.2-qgtz76gekvjzuacy7wq5a26rqlewoxfc/lib/libmpi.so.12(+0x41dc24)
>>> [0x7f6075fc5c24]
>>> /nfs/opt/cascadelake/linux-centos7-cascadelake/gcc-9.4.0/mpich-3.4.2-qgtz76gekvjzuacy7wq5a26rqlewoxfc/lib/libmpi.so.12(+0x49cc51)
>>> [0x7f6076044c51]
>>> /nfs/opt/cascadelake/linux-centos7-cascadelake/gcc-9.4.0/mpich-3.4.2-qgtz76gekvjzuacy7wq5a26rqlewoxfc/lib/libmpi.so.12(+0x49f799)
>>> [0x7f6076047799]
>>> /nfs/opt/cascadelake/linux-centos7-cascadelake/gcc-9.4.0/mpich-3.4.2-qgtz76gekvjzuacy7wq5a26rqlewoxfc/lib/libmpi.so.12(+0x451e18)
>>> [0x7f6075ff9e18]
>>> /nfs/opt/cascadelake/linux-centos7-cascadelake/gcc-9.4.0/mpich-3.4.2-qgtz76gekvjzuacy7wq5a26rqlewoxfc/lib/libmpi.so.12(+0x452272)
>>> [0x7f6075ffa272]
>>> /nfs/opt/cascadelake/linux-centos7-cascadelake/gcc-9.4.0/mpich-3.4.2-qgtz76gekvjzuacy7wq5a26rqlewoxfc/lib/libmpi.so.12(+0x2ce836)
>>> [0x7f6075e76836]
>>> /nfs/opt/cascadelake/linux-centos7-cascadelake/gcc-9.4.0/mpich-3.4.2-qgtz76gekvjzuacy7wq5a26rqlewoxfc/lib/libmpi.so.12(+0x2ce90d)
>>> [0x7f6075e7690d]
>>> /nfs/opt/cascadelake/linux-centos7-cascadelake/gcc-9.4.0/mpich-3.4.2-qgtz76gekvjzuacy7wq5a26rqlewoxfc/lib/libmpi.so.12(+0x48137b)
>>> [0x7f607602937b]
>>> /nfs/opt/cascadelake/linux-centos7-cascadelake/gcc-9.4.0/mpich-3.4.2-qgtz76gekvjzuacy7wq5a26rqlewoxfc/lib/libmpi.so.12(+0x44d471)
>>> [0x7f6075ff5471]
>>> /nfs/opt/cascadelake/linux-centos7-cascadelake/gcc-9.4.0/mpich-3.4.2-qgtz76gekvjzuacy7wq5a26rqlewoxfc/lib/libmpi.so.12(+0x407acd)
>>> [0x7f6075fafacd]
>>> /nfs/opt/cascadelake/linux-centos7-cascadelake/gcc-9.4.0/mpich-3.4.2-qgtz76gekvjzuacy7wq5a26rqlewoxfc/lib/libmpi.so.12(MPIR_Err_return_comm+0x10a)
>>> [0x7f6075fafbea]
>>> /nfs/opt/cascadelake/linux-centos7-cascadelake/gcc-9.4.0/mpich-3.4.2-qgtz76gekvjzuacy7wq5a26rqlewoxfc/lib/libmpi.so.12(MPI_Iprobe+0x312)
>>> [0x7f6075ddd542]
>>> /nfs/opt/cascadelake/linux-centos7-cascadelake/gcc-9.4.0/mpich-3.4.2-qgtz76gekvjzuacy7wq5a26rqlewoxfc/lib/libmpifort.so.12(pmpi_iprobe+0x2f)
>>> [0x7f606e08f19f]
>>> /nfs/home/zzyang/opt/software/linux-centos7-cascadelake/gcc-9.4.0/mumps-5.5.1-gb7wlwxwbalf5rw5vkp6gtkhfkdqpntz/lib/libzmumps.so(__zmumps_load_MOD_zmumps_load_recv_msgs+0x142)
>>> [0x7f60737b194d]
>>> 

Re: [petsc-users] problem for using PCBDDC

2023-05-15 Thread Stefano Zampini
BDDC is a domain decomposition solver of the non-overlapping type and
cannot be used on assembled operators.
If you want to use it, you need to restructure your code a bit.

I presume from your message that your current approach is

1) generate_assembled_csr
2) decompose_csr? or decompose_mesh?
3) get_subdomain_relevant_entries
4) set in local matrix

this is wrong since you are summing up redundant matrix values in the final
MATIS format (you can check that MatMultEqual returns false if you compare
the assembled operator and the MATIS operator)

You should restructure your code as

1) decompose mesh
2) generate_csr_only_for_local_subdomain
3) set values in local ordering into the MATIS object

you can start with a simple 2 cells problem, each assigned to a different
process to understand how to move forward.

You can play with src/ksp/ksp/tutorials/ex71.c which uses a structured grid
to understand how to set up a MATIS for a PDE solve.

Hope this helps
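A bare-bones sketch of steps 2) and 3) with the C API; every name here (BuildMATIS, edofs, Ke, ...) is illustrative, and it assumes one subdomain per MPI rank with element matrices already computed in local numbering:

#include <petscmat.h>

PetscErrorCode BuildMATIS(MPI_Comm comm, PetscInt N, PetscInt nlocal, const PetscInt *global_idx,
                          PetscInt nelem, PetscInt nen, PetscInt **edofs, PetscScalar **Ke, Mat *A)
{
  ISLocalToGlobalMapping map;

  PetscFunctionBeginUser;
  PetscCall(ISLocalToGlobalMappingCreate(comm, 1, nlocal, global_idx, PETSC_COPY_VALUES, &map));
  PetscCall(MatCreateIS(comm, 1, PETSC_DECIDE, PETSC_DECIDE, N, N, map, map, A));
  PetscCall(ISLocalToGlobalMappingDestroy(&map));
  /* loop only over the elements of THIS subdomain; interface entries remain
     partial sums on each rank, MATIS never sums them into a global matrix */
  for (PetscInt e = 0; e < nelem; e++)
    PetscCall(MatSetValuesLocal(*A, nen, edofs[e], nen, edofs[e], Ke[e], ADD_VALUES));
  PetscCall(MatAssemblyBegin(*A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(*A, MAT_FINAL_ASSEMBLY));
  PetscFunctionReturn(0);
}

The resulting Mat can then be passed to KSPSetOperators() and solved with -pc_type bddc.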


Il giorno lun 15 mag 2023 alle ore 17:08 ziming xiong <
xiongziming2...@gmail.com> ha scritto:

> Hello sir,
> I am a PhD student and am trying to use the PCBDDC method in petsc to
> solve the matrix, but the final result is wrong. So I would like to ask you
> a few questions.
> First I will describe the flow of my code, I first used the finite element
> method to build the total matrix in CSR format (total boundary conditions
> have been imposed), where I did not build the total matrix, but only the
> parameters ia, ja,value in CSR format, through which the parameters of the
> metis (xadj, adjncy) are derived. The matrix is successfully divided into 2
> subdomains using metis. After getting the global index of the points of
> each subdomain by the part parameter of metis. I apply
> ISLocalToGlobalMappingCreate to case mapping and use
> ISGlobalToLocalMappingApply to convert the global index of points within
> each process to local index and use MatSetValueLocal to populate the
> corresponding subdomain matrix for each process. Here I am missing the
> relationship of the boundary points between subdomains, and by using
> ISGlobalToLocalMappingApply (I use IS_GTOLM_MASK to get the points outside
> the subdomains converted to -1) I can get the index of the missing
> relationship in the global matrix as well as the value. After creating the
> global MATIS use MatISSetLocalMat to synchronize the subdomain matrix to
> the global MATIS. After using MatSetValues to add the relationship of the
> boundary points between subdomains into the global MATIS. The final
> calculation is performed, but the final result is not correct.
> My question is:
> 1. in PetscCall(MatAssemblyBegin(matIS, MAT_FINAL_ASSEMBLY)).
> PetscCall(MatAssemblyEnd(matIS, MAT_FINAL_ASSEMBLY)).
> After that, when viewing the matrix by
> PetscCall(MatView(matIS,PETSC_VIEWER_STDOUT_WORLD));, each process will
> output the non-zero items of the matrix separately, but this index is the
> local index is this normal?
> 2. I found that after using MatSetValues to add the relationship of
> boundary points between subdomains into the global MATIS, the calculation
> result does not change. Why is this? Can I interpolate directly into the
> global MATIS if I know the global matrix index of the missing relations?
>
>
> Best regards,
> Ziming XIONG
>
>


-- 
Stefano


Re: [petsc-users] Help with KSPSetConvergenceTest

2023-05-13 Thread Stefano Zampini
Run make allfortranstubs to generate the Fortran interfaces, then make

On Sat, May 13, 2023, 13:35 Edoardo alinovi 
wrote:

> Hello Barry,
>
> I have seen you guys merged in main the minimum tolerance stuff.
>
> After compiling that branch, I have tried to
> call KSPSetMinimumIterations(this%ksp, this%minIter, ierr), but the
> compiler cannot find the function.
>
> I have included this module as standard practice:
> #include "petsc/finclude/petscksp.h"
> use petscksp
>
> Maybe I am missing something else?
>
> Thank you!
>
>
>
>
>


Re: [petsc-users] Petsc ObjectStateIncrease without proivate header

2023-05-08 Thread Stefano Zampini
You can achieve the same effect by calling MatAssemblyBegin/End
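A short sketch of that suggestion (the names petsc_A, ksp, b, x follow the description in the quoted message; the helper name is illustrative):

#include <petscksp.h>

PetscErrorCode ResolveAfterExternalUpdate(Mat petsc_A, KSP ksp, Vec b, Vec x)
{
  PetscFunctionBeginUser;
  /* the values of the wrapped CSR arrays were modified outside of PETSc here */
  PetscCall(MatAssemblyBegin(petsc_A, MAT_FINAL_ASSEMBLY)); /* marks the Mat as changed ... */
  PetscCall(MatAssemblyEnd(petsc_A, MAT_FINAL_ASSEMBLY));   /* ... so the preconditioner is rebuilt */
  PetscCall(KSPSetOperators(ksp, petsc_A, petsc_A));
  PetscCall(KSPSolve(ksp, b, x));
  PetscFunctionReturn(0);
}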

Il giorno lun 8 mag 2023 alle ore 15:54 Pichler, Franz <
franz.pich...@v2c2.at> ha scritto:

> Hello,
> I am using PETSc in a single-CPU setup where I have preassembled CSR
> matrices that I wrap via PETSc's
>
> MatCreateSeqAIJWithArrays functionality.
>
>
>
> Now I manipulate the values of these matrices (without changing the
> sparsity) without using PETSc,
>
>
>
> When I now want to solve again I have to call
>
> PetscObjectStateIncrease((PetscObject)petsc_A);
>
>
>
> so that PETSc actually solves again (otherwise it thinks nothing has
> changed),
>
> This means I have to include the private header
>
> #include 
>
>
>
> which makes a seamless integration of PETSc into a CMake process
> more complicated (this header has to be stated explicitly in the CMake
> process at the moment).
>
>
>
> I would like to resolve that by “going” around the private header,
>
> My first intuition was to increase the state by hand
>
> ((PetscObject)petsc_A_aux[the_sys])->state++;
>
> This is the definition of PetscObjectStateIncrease in the header. This throws me
> an
>
> error: invalid use of incomplete type ‘struct _p_PetscObject’
>
>
>
> compilation error.
>
>
>
> Is there any elegeant way around this?
>
> This is the first time I use the petsc mailing list so apologies for any
> beginners mistake I did in formatting or anything else.
>
>
>
> Best regards
>
>
>
> Franz Pichler
>
>
>


-- 
Stefano


Re: [petsc-users] Installation issue of 3.18.* and 3.19.0 on Apple systems

2023-04-05 Thread Stefano Zampini
It seems there's some typo/error in the configure command that is being
executed. Can you post it here?

Il giorno mer 5 apr 2023 alle ore 23:18 Kaus, Boris  ha
scritto:

> Hi everyone,
>
> I’m trying to install precompiled binaries for PETSc 3.18.5 & 3.19.0 using
> the BinaryBuilder cross-compilation:
> https://github.com/JuliaPackaging/Yggdrasil/pull/6533, which mostly works
> fine: https://buildkite.com/julialang/yggdrasil/builds/2093).
>
> Yet, on apple systems I receive a somewhat weird bug during the configure
> step:
>
> [22:08:49]
> ***
> [22:08:49] TypeError or ValueError possibly related to ERROR in
> COMMAND LINE ARGUMENT while running ./configure
> [22:08:49]
> ---
> [22:08:49] invalid literal for int() with base 10: ''
> [22:08:49]
> ***
> [22:08:49]
> [22:08:49]
> [22:08:49] /workspace/srcdir/petsc-3.18.0/lib/petsc/conf/rules:860:
> /workspace/srcdir/petsc-3.18.0//lib/petsc/conf/petscrules: No such file or
> directory
> [22:08:49] make[1]: *** No rule to make target
> '/workspace/srcdir/petsc-3.18.0//lib/petsc/conf/petscrules'.  Stop.
> [22:08:49] /workspace/srcdir/petsc-3.18.0/lib/petsc/conf/rules:860:
> /workspace/srcdir/petsc-3.18.0//lib/petsc/conf/petscrules: No such file or
> directory
> [22:08:49] make[1]: *** No rule to make target
> '/workspace/srcdir/petsc-3.18.0//lib/petsc/conf/petscrules'.  Stop.
> [22:08:49] make: *** [GNUmakefile:17:
> /workspace/srcdir/petsc-3.18.0//lib/petsc/conf/petscvariables] Error 2
> [22:08:49] make: *** Waiting for unfinished jobs
> [22:08:49] make: *** [GNUmakefile:17: lib/petsc/conf/petscvariables] Error
> 2
>
> The log file is rather brief:
>
> sandbox:${WORKSPACE}/srcdir/petsc-3.18.0 # more configure.log
> Executing: uname -s
> stdout: Darwin
>
> It works fine for PETSc 3.16.5/3.17.5, and this first occurs in 3.18.0.
> Is there something that changed between 3.17 & 3.18 that could cause this?
>
> The build system seems to use python3.9 (3.4+ as required)
>
> Thanks!
> Boris
>
>
>
>
>

-- 
Stefano


Re: [petsc-users] Does petsc4py support matrix-free iterative solvers?

2023-03-14 Thread Stefano Zampini
You can find other  examples at
https://gitlab.com/stefanozampini/petscexamples

On Tue, Mar 14, 2023, 19:50 Jose E. Roman  wrote:

> Have a look at ex100.c ex100.py:
>
> https://gitlab.com/petsc/petsc/-/blob/c28a890633c5a91613f1645670105409b4ba3c14/src/ksp/ksp/tutorials/ex100.c
>
> https://gitlab.com/petsc/petsc/-/blob/c28a890633c5a91613f1645670105409b4ba3c14/src/ksp/ksp/tutorials/ex100.py
>
> Jose
>
>
> > El 14 mar 2023, a las 17:45, Eric Hester 
> escribió:
> >
> > Is there a similar example of how to create shell preconditioners using
> petsc4py?
> >
> > Thanks,
> > Eric
> >
> >> On Mar 13, 2023, at 09:37, Eric Hester 
> wrote:
> >>
> >> Ah ok. I see how the poisson2d example works. Thanks for the quick
> reply.
> >>
> >> Eric
> >>
> >>> On Mar 13, 2023, at 08:10, Jose E. Roman  wrote:
> >>>
> >>> Both ode/vanderpol.py and poisson2d/poisson2d.py use shell matrices
> via a mult(self,mat,X,Y) function defined in the python side. Another
> example is ex3.py in slepc4py.
> >>>
> >>> Jose
> >>>
> >>>
> >>>
>  El 13 mar 2023, a las 15:58, Eric Hester via petsc-users <
> petsc-users@mcs.anl.gov> escribió:
> 
>  Hello everyone,
> 
>  Does petsc4py support matrix-free iterative solvers (as for
> Matrix-Free matrices in petsc)?
> 
>  For context, I have a distributed matrix problem to solve. It comes
> from a Fourier-Chebyshev Galerkin discretisation. The corresponding matrix
> is dense, but it is fast to evaluate using fftw. It is also distributed in
> memory.
> 
>  While I’ve found some petsc4py tutorial examples in
> "/petsc/src/binding/petsc4py/demo/“, they don’t seem to show a matrix free
> example. And I don’t see a reference to a matrix shell create method in the
> petsc4py api.
> 
>  If petsc4py does support matrix free iterative solvers, it would be
> really helpful if someone could provide even a toy example of that. Serial
> would work, though a parallelised one would be better.
> 
>  Thanks,
>  Eric
> 
> 
> >>>
> >>
> >
>
>


Re: [petsc-users] Question about rank of matrix

2023-02-16 Thread Stefano Zampini
On Fri, Feb 17, 2023, 10:43 user_gong Kim  wrote:

> Hello,
>
> I have a question about rank of matrix.
> At the problem
> Au = b,
>
> In my case, sometimes global matrix A is not full rank.
> In this case, the global matrix A is more likely to be singular, and if it
> becomes singular, the problem cannot be solved even in the case of the
> direct solver.
> I haven't solved the problem with an iterative solver yet, but I would
> like to ask someone who has experienced this kind of problem.
>
> 1. If it is not full rank, is there a numerical technique to solve it by
> catching rows and columns with empty ranks in advance?
>
> 2.If anyone has solved it in a different way than the above numerical
> analysis method, please tell me your experience.
>
> Thanks,
> Hyung Kim
>

My experience with this is usually associated to reading a book and find
the solution I'm looking for.

>
>
>


Re: [petsc-users] Question for Petsc

2023-02-16 Thread Stefano Zampini
For bddc, you can also take a look at
https://gitlab.com/petsc/petsc/-/blob/main/src/ksp/ksp/tutorials/ex71.c

On Thu, Feb 16, 2023, 19:41 Matthew Knepley  wrote:

> On Thu, Feb 16, 2023 at 9:14 AM ziming xiong 
> wrote:
>
>> Hello,
>> I want to use Petsc to implement high performance computing, and I mainly
>> want to apply DDM methods to parallel computing. I have implemented some of
>> the DDM methods (such as ASM, Bjacobi, etc.), but I don't understand the
>> PCBDDC method. The official example (src/ksp/ksp/tutorials/ex59.c.html) is
>> too complicated, so I have not been able to figure out the setting process.
>> I would like to ask you if you have other simple and clearer examples for
>> reference.
>>
>
> You could look at the paper:
> https://epubs.siam.org/doi/abs/10.1137/15M1025785
>
>
>> Secondly, I would like to apply MKL Pardiso to PETSc, but it does not work; can you
>> help me figure out the problem? I use oneAPI for MKL Pardiso, and when I
>> configure, I give the BLAS/LAPACK lib.
>>
>
> You should reconfigure with --download-mkl_pardiso
>
>   Thanks,
>
>  Matt
>
>
>> there are the errors:
>>
>> [0]PETSC ERROR: See
>> https://petsc.org/release/overview/linear_solve_table/ for possible LU
>> and Cholesky solvers
>> [0]PETSC ERROR: Could not locate solver type mkl_pardiso for
>> factorization type LU and matrix type seqaij. Perhaps you must ./configure
>> with --download-mkl_pardiso
>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>> [0]PETSC ERROR: Petsc Release Version 3.18.3, Dec 28, 2022
>> [0]PETSC ERROR:
>> C:\Users\XiongZiming\Desktop\test_petsc_FEM\test_petsc_fem\x64\Release\test_petsc_fem.exe
>> on a arch-mswin-c-debug named lmeep-329 by XiongZiming Thu Feb 16 15:05:14
>> 2023
>> [0]PETSC ERROR: Configure options --with-cc="win32fe cl" --with-fc=0
>> --with-cxx="win32fe cl" --with-shared-libraries=0
>> --with-mpi-include="[/cygdrive/c/PROGRA~2/Intel/MPI/Include,/cygdrive/c/PROGRA~2/Intel/MPI/Include/x64]"
>> --with-mpi-lib="-L/cygdrive/c/PROGRA~2/Intel/MPI/lib/x64 msmpifec.lib
>> msmpi.lib" --with-mpiexec=/cygdrive/c/PROGRA~1/Microsoft_MPI/Bin/mpiexec
>> --with-blaslapack-lib="-L/cygdrive/c/PROGRA~2/Intel/oneAPI/mkl/2023.0.0/lib/intel64
>> mkl_intel_lp64_dll.lib mkl_sequential_dll.lib mkl_core_dll.lib"
>>
>> Thanks,
>> Ziming XIONG
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> 
>


Re: [petsc-users] GPU implementation of serial smoothers

2023-01-10 Thread Stefano Zampini
DILU in OpenFOAM is our block Jacobi with ILU subdomain solvers

On Tue, Jan 10, 2023, 23:45 Barry Smith  wrote:

>
>   The default is some kind of Jacobi plus Chebyshev; for a certain class
> of problems, it is quite good.
>
>
>
> On Jan 10, 2023, at 3:31 PM, Mark Lohry  wrote:
>
> So what are people using for GAMG configs on GPU? I was hoping petsc today
> would be performance competitive with AMGx but it sounds like that's not
> the case?
>
> On Tue, Jan 10, 2023 at 3:03 PM Jed Brown  wrote:
>
>> Mark Lohry  writes:
>>
>> > I definitely need multigrid. I was under the impression that GAMG was
>> > relatively cuda-complete, is that not the case? What functionality works
>> > fully on GPU and what doesn't, without any host transfers (aside from
>> > what's needed for MPI)?
>> >
>> > If I use -ksp-pc_type gamg -mg_levels_pc_type pbjacobi
>> -mg_levels_ksp_type
>> > richardson is that fully on device, but -mg_levels_pc_type ilu or
>> > -mg_levels_pc_type sor require transfers?
>>
>> You can do `-mg_levels_pc_type ilu`, but it'll be extremely slow (like
>> 20x slower than an operator apply). One can use Krylov smoothers, though
>> that's more synchronization. Automatic construction of operator-dependent
>> multistage smoothers for linear multigrid (because Chebyshev only works for
>> problems that have eigenvalues near the real axis) is something I've wanted
>> to develop for at least a decade, but time is always short. I might put
>> some effort into p-MG with such smoothers this year as we add DDES to our
>> scale-resolving compressible solver.
>>
>
>


Re: [petsc-users] Question about eigenvalue and eigenvectors

2022-12-24 Thread Stefano Zampini
For 3x3 matrices you can use explicit formulas

On Sat, Dec 24, 2022, 11:20 김성익  wrote:

> Hello,
>
>
> I tried to calculate the eigenvalues and eigenvectors in 3 by 3 matrix
> (real and nonsymmetric).
> I already checked the kspcomputeeigenvalues and kspcomputeritz.
>
> However, the target matrix is just 3 by 3 matrix.
> So I need another way to calculate the values and vectors.
> Can anyone recommend other methods that are efficient for such small size
> problems??
>
> Thanks,
> Hyung Kim
>


Re: [petsc-users] Get solution and rhs in the ts monitor

2022-11-10 Thread Stefano Zampini
-ksp_view_mat
-ksp_view_rhs
-ksp_view_solution

> On Nov 10, 2022, at 5:00 PM, Tang, Qi  wrote:
> 
> Yes, but I need to get A, x and b out, so that I can test them in pyamg for 
> other preconditioner options. I can get A, x, and b through what I described, 
> but I do not think x or b is the original one in the linear system. 
> 
> Is there a simple way to get x and b (I just need once) of TS? Thanks.
> 
> Qi 
> From: Matthew Knepley 
> Sent: Thursday, November 10, 2022 6:15 AM
> To: Tang, Qi 
> Cc: petsc-users@mcs.anl.gov 
> Subject: Re: [petsc-users] Get solution and rhs in the ts monitor
>  
> On Thu, Nov 10, 2022 at 3:18 AM Tang, Qi  > wrote:
> Hi,
> 
> How could I get rhs and solution in a ksp solve of ts?
> 
> I am testing a linear problem (TS_Linear) using a bdf integrator. I tried to 
> get the operator, rhs, and solution in the ts monitor through TSGetKSP and 
> KSPGet***. But r = Ax-b is much larger than the ksp norm. I know the solver 
> works fine. 
> 
> Ax - b is the _unpreconditioned_ norm. By default we are printing the 
> preconditioned norm. You can see the difference by running with
> 
>   -ksp_monitor_true_residual
> 
>   Thanks,
> 
>  Matt
>  
> Did I misunderstand something about how TS works here? Perhaps one of the 
> vectors is changed after the ksp solve? If so, is there a simple way to get 
> rhs and solution that ksp of ts solved?
> 
> Thanks,
> Qi
> 
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/ 
> 


Re: [petsc-users] KSP on GPU

2022-11-01 Thread Stefano Zampini
Are you calling VecRestoreArray when you are done inserting the values?
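The pattern being asked about, as a small sketch (using the dfdx/df names from the quoted message below; FillGradient is an illustrative name). With GPU vector types, the write access must be closed with VecRestoreArray() so the host copy is marked valid and synchronized back to the device:

#include <petscvec.h>

PetscErrorCode FillGradient(Vec dfdx)
{
  PetscScalar *df;
  PetscInt     n;

  PetscFunctionBeginUser;
  PetscCall(VecGetLocalSize(dfdx, &n));
  PetscCall(VecGetArray(dfdx, &df));            /* host pointer; device data is pulled up if needed */
  for (PetscInt i = 0; i < n; i++) df[i] = 0.0; /* ... fill the local entries here ... */
  PetscCall(VecRestoreArray(dfdx, &df));        /* without this, the GPU copy may stay stale */
  PetscFunctionReturn(0);
}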

On Tue, Nov 1, 2022, 18:42 Carl-Johan Thore via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Thanks for the tips!
>
>
>
> The suggested settings for GAMG did not yield better results,
>
> but hypre worked well right away, giving very good convergence!
>
>
>
> A follow-up question then (I hope that’s ok; and it could be related to
> GAMG
>
> not working, I’ll check that). Once everything was running I discovered
> that my gradient vector
>
> dfdx which I populate via an array df obtained from VecGetArray(dfdx, &df)
> doesn’t get
>
> filled properly; it always contains only zeros. This is not the case when
> I run on the CPU,
>
> and df gets filled as it should even on the GPU, suggesting that either
> I’m not using
>
> VecGetArray properly, or I shouldn’t use it at all for GPU computations?
>
>
>
> Kind regards,
>
> Carl-Johan
>
>
>
> *From:* Mark Adams 
> *Sent:* den 31 oktober 2022 13:30
> *To:* Carl-Johan Thore 
> *Cc:* Matthew Knepley ; Barry Smith ;
> petsc-users@mcs.anl.gov
> *Subject:* Re: [petsc-users] KSP on GPU
>
>
>
> * You could try hypre or another preconditioner that you can afford,
> like LU or ASM, that works.
>
> * If this matrix is SPD, you want to use
> -fieldsplit_0_pc_gamg_esteig_ksp_type cg
> -fieldsplit_0_pc_gamg_esteig_ksp_max_it 10
>
>  These will give better eigen estimates, and that is important.
>
> The differences between these estimates are not too bad.
>
> There is a safety factor (1.05 is the default) that you could increase
> with: -fieldsplit_0_mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.1
>
> * Finally you could try -fieldsplit_0_pc_gamg_reuse_interpolation 1, if
> GAMG is still not working.
>
>
>
> Use -fieldsplit_0_ksp_converged_reason and check the iteration count.
>
> And it is a good idea to check with hypre to make sure something is not
> going badly in terms of performance anyway. AMG is hard and hypre is a good
> solver.
>
>
>
> Mark
>
>
>
> On Mon, Oct 31, 2022 at 1:56 AM Carl-Johan Thore via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
> The GPU supports double precision and I didn’t explicitly tell PETSc to
> use float when compiling, so
>
> I guess it uses double? What’s the easiest way to check?
>
>
>
> Barry, running -ksp_view shows that the solver options are the same for
> CPU and GPU. The only
>
> difference is the coarse grid solver for gamg (“the package used to
> perform factorization:”) which
>
> is petsc for CPU and cusparse for GPU. I tried forcing the GPU to use
> petsc via
>
> -fieldsplit_0_mg_coarse_sub_pc_factor_mat_solver_type, but then ksp failed
> to converge
>
> even on the first topology optimization iteration.
>
>
>
> -ksp_view also shows differences in the eigenvalues from the Chebyshev
> smoother. For example,
>
>
>
> GPU:
>
>Down solver (pre-smoother) on level 2 ---
>
>   KSP Object: (fieldsplit_0_mg_levels_2_) 1 MPI process
>
> type: chebyshev
>
>   eigenvalue targets used: min 0.109245, max 1.2017
>
>   eigenvalues provided (min 0.889134, max 1.09245) with
>
>
>
> CPU:
>
>   eigenvalue targets used: min 0.112623, max 1.23886
>
>   eigenvalues provided (min 0.879582, max 1.12623)
>
>
>
> But I guess such differences are expected?
>
>
>
> /Carl-Johan
>
>
>
> *From:* Matthew Knepley 
> *Sent:* den 30 oktober 2022 22:00
> *To:* Barry Smith 
> *Cc:* Carl-Johan Thore ; petsc-users@mcs.anl.gov
> *Subject:* Re: [petsc-users] KSP on GPU
>
>
>
> On Sun, Oct 30, 2022 at 3:52 PM Barry Smith  wrote:
>
>
>
>In general you should expect similar but not identical convergence
> behavior.
>
>
>
> I suggest running with all the monitoring you can.
> -ksp_monitor_true_residual
> -fieldsplit_0_monitor_true_residual -fieldsplit_1_monitor_true_residual and
> compare the various convergence between the CPU and GPU. Also run with
> -ksp_view and check that the various solver options are the same (they
> should be).
>
>
>
> Is the GPU using float or double?
>
>
>
>Matt
>
>
>
>   Barry
>
>
>
>
>
> On Oct 30, 2022, at 11:02 AM, Carl-Johan Thore via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
>
>
> Hi,
>
>
>
> I'm solving a topology optimization problem with Stokes flow discretized
> by a stabilized Q1-Q0 finite element method
>
> and using BiCGStab with the fieldsplit preconditioner to solve the linear
> systems. The implementation
>
> is based on DMStag, runs on Ubuntu via WSL2, and works fine with
> PETSc-3.18.1 on multiple CPU cores and the following
>
> options for the preconditioner:
>
>
>
> -fieldsplit_0_ksp_type preonly \
>
> -fieldsplit_0_pc_type gamg \
>
> -fieldsplit_0_pc_gamg_reuse_interpolation 0 \
>
> -fieldsplit_1_ksp_type preonly \
>
> -fieldsplit_1_pc_type jacobi
>
>
>
> However, when I enable GPU computations by adding two options -
>
>
>
> ...
>
> -dm_vec_type cuda \
>
> -dm_mat_type aijcusparse \
>
> -fieldsplit_0_ksp_type preonly \
>
> 

Re: [petsc-users] Issue with single precision complex numbers in petsc4py

2022-10-14 Thread Stefano Zampini
On Fri, Oct 14, 2022, 19:46 Peng Sun  wrote:

> Hi Stefano,
>
> No I used pip to install petsc4py after I installed PETSc.  I did not see
> the binding folder under /src
>


Not sure which was the first version of PETSc that shipped with petsc4py;
for sure 3.18 has it.


Best regards,
> Peng Sun
> ------
> *From:* Stefano Zampini 
> *Sent:* Friday, October 14, 2022 4:36 AM
> *To:* Peng Sun 
> *Cc:* Zhang, Hong ; petsc-users@mcs.anl.gov <
> petsc-users@mcs.anl.gov>
> *Subject:* Re: [petsc-users] Issue with single precision complex numbers
> in petsc4py
>
>
>
> On Oct 14, 2022, at 3:53 AM, Peng Sun  wrote:
>
> Hi Hong,
>
> Thanks for the advice.  I could not install petsc4py with the
> --with-petsc4py=1 option, which gave me an "No rule to make target
> 'petsc4py-install'" error when I ran "make install".   That was why I
> needed to install petsc4py separately after the PETSc was installed.
>
>
>
> After you installed PETSc, go to src/binding/petsc4py and run make install
> there. It will install in .local and it will be visible to python.
> Is this how you installed it?
>
>
> Best regards,
> Peng Sun
> --
> *From:* Zhang, Hong 
> *Sent:* Thursday, October 13, 2022 4:30 PM
> *To:* Peng Sun 
> *Cc:* petsc-users@mcs.anl.gov 
> *Subject:* Re: [petsc-users] Issue with single precision complex numbers
> in petsc4py
>
> It seems that you installed petsc4py separately. I would suggest to add
> the configure option --with-petsc4py=1 and follow the instructions to set
> PYTHONPATH before using petsc4py.
>
> Hong (Mr.)
>
> > On Oct 13, 2022, at 10:42 AM, Peng Sun  wrote:
> >
> > Hi Matt,
> >
> > Sure, please see the attached configure.log file.  Thanks!
> >
> > Best regards,
> > Peng Sun
> >
> >
> > From: Matthew Knepley 
> > Sent: Thursday, October 13, 2022 6:34 AM
> > To: Peng Sun 
> > Cc: petsc-users@mcs.anl.gov 
> > Subject: Re: [petsc-users] Issue with single precision complex numbers
> in petsc4py
> >
> > First send configure.log so we can see the setup.
> >
> >   Thanks,
> >
> >   Matt
> >
> > On Thu, Oct 13, 2022 at 12:53 AM Peng Sun  wrote:
> > Dear PETSc community,
> >
> > I have a question regarding the single precision complex numbers of
> petsc4py.  I configured PETSc with the “--with-scalar-type=complex
> --with-precision=single" option before compiling, but all the DA structures
> I created with petsc4py had double precision.
> >
> > Here is a minimum test code on Python 3.8/PETSc 3.12/petsc4py 3.12: both
> print commands show data type of complex128.  Could anybody please help
> me?  Thanks!
> >
> > import petsc4py
> > import sys
> > petsc4py.init(sys.argv)
> > from petsc4py import PETSc
> >
> > da = PETSc.DA().create(sizes=[2,2,2], dof=1, stencil_type=0, stencil_width=1, boundary_type=1)
> > da_1 = da.createGlobalVec()
> >
> > print(petsc4py.PETSc.ComplexType)
> > print(da_1.getArray().dtype)
> >
> >
> >
> > Best regards,
> > Peng Sun
> >
> >
> > --
> > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> > -- Norbert Wiener
> >
> > https://www.cse.buffalo.edu/~knepley/
> > 
>
>
>


Re: [petsc-users] Issue with single precision complex numbers in petsc4py

2022-10-14 Thread Stefano Zampini


> On Oct 14, 2022, at 3:53 AM, Peng Sun  wrote:
> 
> Hi Hong,
> 
> Thanks for the advice.  I could not install petsc4py with the 
> --with-petsc4py=1 option, which gave me an "No rule to make target 
> 'petsc4py-install'" error when I ran "make install".   That was why I needed 
> to install petsc4py separately after the PETSc was installed.


After you installed PETSc, go to src/binding/petsc4py and run make install 
there. It will install in .local and it will be visible to python.
Is this how you installed it? 
> 
> Best regards,
> Peng Sun
> From: Zhang, Hong 
> Sent: Thursday, October 13, 2022 4:30 PM
> To: Peng Sun 
> Cc: petsc-users@mcs.anl.gov 
> Subject: Re: [petsc-users] Issue with single precision complex numbers in 
> petsc4py
>  
> It seems that you installed petsc4py separately. I would suggest to add the 
> configure option --with-petsc4py=1 and follow the instructions to set 
> PYTHONPATH before using petsc4py.
> 
> Hong (Mr.)
> 
> > On Oct 13, 2022, at 10:42 AM, Peng Sun  wrote:
> > 
> > Hi Matt,
> > 
> > Sure, please see the attached configure.log file.  Thanks!
> > 
> > Best regards,
> > Peng Sun
> > 
> > 
> > From: Matthew Knepley 
> > Sent: Thursday, October 13, 2022 6:34 AM
> > To: Peng Sun 
> > Cc: petsc-users@mcs.anl.gov 
> > Subject: Re: [petsc-users] Issue with single precision complex numbers in 
> > petsc4py
> >  
> > First send configure.log so we can see the setup.
> > 
> >   Thanks,
> > 
> >   Matt
> > 
> > On Thu, Oct 13, 2022 at 12:53 AM Peng Sun  wrote:
> > Dear PETSc community, 
> > 
> > I have a question regarding the single precision complex numbers of 
> > petsc4py.  I configured PETSc with the “--with-scalar-type=complex 
> > --with-precision=single" option before compiling, but all the DA structures 
> > I created with petsc4py had double precision.  
> > 
> > Here is a minimum test code on Python 3.8/PETSc 3.12/petsc4py 3.12: both 
> > print commands show data type of complex128.  Could anybody please help me? 
> >  Thanks!
> > 
> > import petsc4py
> > import sys
> > petsc4py.init(sys.argv)
> > from petsc4py import PETSc
> >
> > da = PETSc.DA().create(sizes=[2,2,2], dof=1, stencil_type=0, stencil_width=1, boundary_type=1)
> > da_1 = da.createGlobalVec()
> >
> > print(petsc4py.PETSc.ComplexType)
> > print(da_1.getArray().dtype)
> > 
> > 
> > 
> > Best regards,
> > Peng Sun
> > 
> > 
> > -- 
> > What most experimenters take for granted before they begin their 
> > experiments is infinitely more interesting than any results to which their 
> > experiments lead.
> > -- Norbert Wiener
> > 
> > https://www.cse.buffalo.edu/~knepley/ 
> > 
> > 



Re: [petsc-users] Issue with single precision complex numbers in petsc4py

2022-10-13 Thread Stefano Zampini
Matt

Yes, petsc4py does the right thing. This is probably picking up the wrong
PETSc arch.

Peng, can you please run this?

import petsc4py
petsc4py.init()
print(petsc4py.get_config())

> On Oct 13, 2022, at 11:23 PM, Matthew Knepley  wrote:
> 
> Lisandro,
> 
> PETSc is compiled for single. Does petsc4py respect this, or does it always 
> use double for getArray() and friends?
> 
>   Thanks,
> 
>  Matt
> 
> On Thu, Oct 13, 2022 at 11:42 AM Peng Sun  > wrote:
> Hi Matt,
> 
> Sure, please see the attached configure.log file.  Thanks!
> 
> Best regards,
> Peng Sun
> 
> 
> From: Matthew Knepley mailto:knep...@gmail.com>>
> Sent: Thursday, October 13, 2022 6:34 AM
> To: Peng Sun mailto:p...@outlook.com>>
> Cc: petsc-users@mcs.anl.gov  
> mailto:petsc-users@mcs.anl.gov>>
> Subject: Re: [petsc-users] Issue with single precision complex numbers in 
> petsc4py
>  
> First send configure.log so we can see the setup.
> 
>   Thanks,
> 
>   Matt
> 
> On Thu, Oct 13, 2022 at 12:53 AM Peng Sun  > wrote:
> Dear PETSc community, 
> 
> 
> 
> I have a question regarding the single precision complex numbers of petsc4py. 
>  I configured PETSc with the “--with-scalar-type=complex 
> --with-precision=single" option before compiling, but all the DA structures I 
> created with petsc4py had double precision.  
> 
> 
> 
> Here is a minimum test code on Python 3.8/PETSc 3.12/petsc4py 3.12: both 
> print commands show data type of complex128.  Could anybody please help me?  
> Thanks!
> 
> 
> 
> import petsc4py
> import sys
> petsc4py.init(sys.argv)
> from petsc4py import PETSc
> 
> da=PETSc.DA().create(sizes=[2,2,2],dof=1,stencil_type=0,stencil_width=1,boundary_type=1)
> da_1 = da.createGlobalVec()
> print(petsc4py.PETSc.ComplexType)
> print(da_1.getArray().dtype)
> 
> 
> 
> 
> 
> Best regards,
> 
> Peng Sun
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/ 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] Using VecGetType() and VecNest

2022-08-20 Thread Stefano Zampini
The idea of the POD code was to create local vectors to perform local
products and use MPI non-blocking collectives. Not sure how to fix it for the
nest case. I'm on vacation now; I can take a look in a couple of weeks or so.

On Sat, Aug 20, 2022, 05:30 Barry Smith  wrote:

>
>   I am not sure why the code works this way creating these sequential work
> vectors. Since VecNest needs some information about the subvectors I don't
> think just setting these work vectors to nest vectors will work properly.
>
>   I am cc:ing Stefano who wrote the code and can likely say immediately
> what the solution is.
>
>   Barry
>
>
> On Aug 19, 2022, at 5:21 PM, Wells, David  wrote:
>
> Hello PETSc experts,
>
> I am using VecNest to solve a Stokes problem and I ran into an issue using
> the POD KSPGuess routines:
>
> [0]PETSC ERROR: - Error Message
> --
> [0]PETSC ERROR: Unknown type. Check for miss-spelling or missing package:
> https://petsc.org/release/install/install/#external-packages
> [0]PETSC ERROR: Unknown vector type: nest
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.17.0, Mar 30, 2022
> [0]PETSC ERROR: /[...]/step-32/step-32 on a  named mitral by drwells Fri
> Aug 19 16:33:08 2022
> [0]PETSC ERROR: Configure options [...]
> [0]PETSC ERROR: #1 VecSetType() at /afs/
> cas.unc.edu/users/d/r/drwells/Documents/Code/C/petsc-3.17.0/src/vec/vec/interface/vecreg.c:80
> [0]PETSC ERROR: #2 KSPGuessSetUp_POD() at /afs/
> cas.unc.edu/users/d/r/drwells/Documents/Code/C/petsc-3.17.0/src/ksp/ksp/guess/impls/pod/pod.c:118
> [0]PETSC ERROR: #3 KSPGuessSetUp() at /afs/
> cas.unc.edu/users/d/r/drwells/Documents/Code/C/petsc-3.17.0/src/ksp/ksp/interface/iguess.c:352
> [0]PETSC ERROR: #4 KSPSolve_Private() at /afs/
> cas.unc.edu/users/d/r/drwells/Documents/Code/C/petsc-3.17.0/src/ksp/ksp/interface/itfunc.c:830
> [0]PETSC ERROR: #5 KSPSolve() at /afs/
> cas.unc.edu/users/d/r/drwells/Documents/Code/C/petsc-3.17.0/src/ksp/ksp/interface/itfunc.c:1078
>
>
> It looks like VecGetType() works with VecNest while VecSetType() does not.
> I also noticed that VecNest isn't listed in VecRegisterAll().
>
> I think I can write a patch to make the POD KSPGuess work with VecNest,
> but: should VecSetType() work with VecNest? It seems like this is an
> oversight but I'd like to know if there is some fundamental reason why the
> sequence used there
>
> PetscCall(KSPCreateVecs(guess->ksp,1,,0,NULL));
> PetscCall(VecCreate(PETSC_COMM_SELF,));
> PetscCall(VecGetLocalSize(v[0],));
> PetscCall(VecSetSizes(vseq,n,n));
> PetscCall(VecGetType(v[0],));
> PetscCall(VecSetType(vseq,type));
> PetscCall(VecDestroyVecs(1,));
> PetscCall(VecDuplicateVecs(vseq,pod->maxn,>xsnap));
> PetscCall(VecDestroy());
> PetscCall(PetscLogObjectParents(guess,pod->maxn,pod->xsnap));
>
> can't work.
>
> Best,
> David Wells
>
>
>


Re: [petsc-users] Issue to set values to a matrix in parallel in python

2022-08-09 Thread Stefano Zampini
PETSc distributes matrices and vectors in parallel. Take a look at 
https://petsc.org/release/docs/manualpages/Vec/VecGetOwnershipRange.html

> On Aug 9, 2022, at 11:21 AM, Thomas Saigre  wrote:
> 
> Hi,
> 
> I've been trying for a few weeks to construct a matrix from a list of 
> vectors, unsuccessfully.
> Here is my problem: I have a list l of petsc4py.Vec; each vector has size
> n, and I have d vectors. I want to "cast" these vectors to a petsc4py.Mat Z
> of shape (n,d), where Z[:, i] = l[i] (using NumPy notation).
> 
> Here is the code I'm using :
> 
> import sys
> from petsc4py import PETSc
> n = 5
> d = 10
> 
> l = []   # creation of the list of vectors
> for i in range(d):
> v = PETSc.Vec().create()
> v.setSizes(n)
> v.setFromOptions()
> v.set(i)
> l.append(v)
> 
> Z = PETSc.Mat().create()
> Z.setSizes([n, d])
> Z.setFromOptions()
> Z.setUp()
> for i, v in enumerate(l):
> Z.setValues(range(n), i, v)
> Z.assemble()
> Z.view()# to display the matrix in the terminal
> 
> In sequential, the result is correct :
> 
> Mat Object: 1 MPI processes
>   type: seqaij
> row 0: (0, 0.)  (1, 1.)  (2, 2.)  (3, 3.)  (4, 4.)  (5, 5.)  (6, 6.)  (7, 7.) 
>  (8, 8.)  (9, 9.)
> row 1: (0, 0.)  (1, 1.)  (2, 2.)  (3, 3.)  (4, 4.)  (5, 5.)  (6, 6.)  (7, 7.) 
>  (8, 8.)  (9, 9.)
> row 2: (0, 0.)  (1, 1.)  (2, 2.)  (3, 3.)  (4, 4.)  (5, 5.)  (6, 6.)  (7, 7.) 
>  (8, 8.)  (9, 9.)
> row 3: (0, 0.)  (1, 1.)  (2, 2.)  (3, 3.)  (4, 4.)  (5, 5.)  (6, 6.)  (7, 7.) 
>  (8, 8.)  (9, 9.)
> row 4: (0, 0.)  (1, 1.)  (2, 2.)  (3, 3.)  (4, 4.)  (5, 5.)  (6, 6.)  (7, 7.) 
>  (8, 8.)  (9, 9.)
> 
>  but when I run it using the command mpirun -np 2 python3 file.py, I get the 
> following error, about incompatible array sizes (I did not manage to 
> understand what ni, nj and nv correspond to...)
> 
> Traceback (most recent call last):
>   File "/home/Documents/code/tests/file.py", line 31, in 
> Z.setValues(list(range(n)), i, v)
>   File "PETSc/Mat.pyx", line 888, in petsc4py.PETSc.Mat.setValues
>   File "PETSc/petscmat.pxi", line 828, in petsc4py.PETSc.matsetvalues
> ValueError: incompatible array sizes: ni=5, nj=1, nv=3
> Traceback (most recent call last):
>   File "/home/saigre/Documents/code/tests/t2.py", line 31, in 
> Z.setValues(list(range(n)), i, v)
>   File "PETSc/Mat.pyx", line 888, in petsc4py.PETSc.Mat.setValues
>   File "PETSc/petscmat.pxi", line 828, in petsc4py.PETSc.matsetvalues
> ValueError: incompatible array sizes: ni=5, nj=1, nv=2
> 
> Two weeks ago, I made a post on stack overflow 
> (https://stackoverflow.com/questions/73124230/convert-a-list-of-vector-to-a-matrix-with-petsc4py).
>  I tried using the apt packages, and I also compiled from the sources, but I 
> get the same error.
> 
> I someone has an idea how to succeed in it, I'm all ears !
> 
> Thanks,
> 
> Thomas



Re: [petsc-users] Error running src/snes/tutorials/ex19 on Nvidia Tesla K40m : CUDA ERROR (code = 101, invalid device ordinal)

2022-07-14 Thread Stefano Zampini
You don't need unified memory for boomeramg to work.

On Thu, Jul 14, 2022, 18:55 Barry Smith  wrote:

>
>   So the PETSc test all run, including the test that uses a GPU.
>
>   The hypre test is failing. It is impossible to tell from the output why.
>
>   You can run it manually, cd src/snes/tutorials
>
> make ex19
> mpiexec -n 1 ./ex19 -dm_vec_type cuda -dm_mat_type aijcusparse -da_refine
> 3 -snes_monitor_short -ksp_norm_type unpreconditioned -pc_type hypre -info
> > somefile
>
> then take a look at the output in somefile and send it to us.
>
>   Barry
>
>
>
> On Jul 14, 2022, at 12:32 PM, Juan Pablo de Lima Costa Salazar via
> petsc-users  wrote:
>
> Hello,
>
> I was hoping to get help regarding a runtime error I am encountering on a
> cluster node with 4 Tesla K40m GPUs after configuring PETSc with the
> following command:
>
> $./configure --force \
>   --with-precision=double  \
>   --with-debugging=0 \
>   --COPTFLAGS=-O3 \
>   --CXXOPTFLAGS=-O3 \
>   --FOPTFLAGS=-O3 \
>   PETSC_ARCH=linux64GccDPInt32-spack \
>   --download-fblaslapack \
>   --download-openblas \
>   --download-hypre \
>
> --download-hypre-configure-arguments=--enable-unified-memory \
>   --with-mpi-dir=/opt/ohpc/pub/mpi/openmpi4-gnu9/4.0.4 \
>   --with-cuda=1 \
>   --download-suitesparse \
>   --download-dir=downloads \
>
> --with-cudac=/opt/ohpc/admin/spack/0.15.0/opt/spack/linux-centos8-ivybridge/gcc-9.3.0/cuda-11.7.0-hel25vgwc7fixnvfl5ipvnh34fnskw3m/bin/nvcc
> \
>   --with-packages-download-dir=downloads \
>   --download-sowing=downloads/v1.1.26-p4.tar.gz \
>   --with-cuda-arch=35
>
> When I run
>
> $ make PETSC_DIR=/home/juan/OpenFOAM/juan-v2206/petsc-cuda
> PETSC_ARCH=linux64GccDPInt32-spack check
> Running check examples to verify correct installation
> Using PETSC_DIR=/home/juan/OpenFOAM/juan-v2206/petsc-cuda and
> PETSC_ARCH=linux64GccDPInt32-spack
> C/C++ example src/snes/tutorials/ex19 run successfully with 1 MPI process
> C/C++ example src/snes/tutorials/ex19 run successfully with 2 MPI processes
> 3,5c3,15
> <   1 SNES Function norm 4.12227e-06
> <   2 SNES Function norm 6.098e-11
> < Number of SNES iterations = 2
> ---
> > CUDA ERROR (code = 101, invalid device ordinal) at memory.c:139
> > CUDA ERROR (code = 101, invalid device ordinal) at memory.c:139
> >
> --
> > Primary job  terminated normally, but 1 process returned
> > a non-zero exit code. Per user-direction, the job has been aborted.
> >
> --
> >
> --
> > mpiexec detected that one or more processes exited with non-zero status,
> thus causing
> > the job to be terminated. The first process to do so was:
> >
> >   Process name: [[52712,1],0]
> >   Exit code:1
> >
> --
> /home/juan/OpenFOAM/juan-v2206/petsc-cuda/src/snes/tutorials
> Possible problem with ex19 running with hypre, diffs above
> =
> C/C++ example src/snes/tutorials/ex19 run successfully with cuda
> C/C++ example src/snes/tutorials/ex19 run successfully with suitesparse
> Fortran example src/snes/tutorials/ex5f run successfully with 1 MPI process
> Completed test examples
>
> I have compiled the code on the head node (without GPUs) and on the
> compute node where there are 4 GPUs.
>
> $nvidia-debugdump -l
> Found 4 NVIDIA devices
> Device ID:  0
> Device name:Tesla K40m
> GPU internal ID:0320717032250
>
> Device ID:  1
> Device name:Tesla K40m
> GPU internal ID:0320717031968
>
> Device ID:  2
> Device name:Tesla K40m
> GPU internal ID:0320717032246
>
> Device ID:  3
> Device name:Tesla K40m
> GPU internal ID:0320717032235
>
> Attached are the log files form configure and make.
>
> Any pointers are highly appreciated. My intention is to use PETSc as a
> linear solver for OpenFOAM, leveraging the availability of GPUs at the same
> time. Currently I can run PETSc without GPU support.
>
> Cheers,
> Juan S.
>
>
>
>
>
> 
>
>
>


Re: [petsc-users] Strange strong scaling result

2022-07-11 Thread Stefano Zampini
It depends on the solver used. What solver are you using?

> On Jul 11, 2022, at 5:33 PM, Ce Qin  wrote:
> 
> Dear all,
> 
> I want to analyze the strong scaling of our in-house FEM code.
> The test problem has about 20M DoFs. I ran the problem using
> various settings. The speedups for the assembly and solving
> procedures are as follows:
>   NProcessors  NNodes  CoresPerNode   Assembly    Solving
>   1            1       1               1.0         1.0
>   2            1       2               1.995246    1.898756
>   2            2       1               2.121401    2.436149
>   4            1       4               4.658187    6.004539
>   4            2       2               4.67        5.942085
>   4            4       1               4.65272     6.101214
>   8            2       4               9.380985   16.581135
>   8            4       2               9.308575   17.258891
>   8            8       1               9.314449   17.380612
>   16           2       8              18.575953   34.483058
>   16           4       4              18.745129   34.854409
>   16           8       2              18.828393   36.45509
>   32           4       8              37.140626   70.175879
>   32           8       4              37.166421   71.533865
> 
> I don't quite understand this result. Why can we achieve a speedup of
> about 70+ using 32 processors? Could you please help me explain this?
> 
> Thank you in advance.
> 
> Best,
> Ce
> 
> 



Re: [petsc-users] Solving a linear system with sparse matrices

2022-05-21 Thread Stefano Zampini
MatMatSolve with a sparse RHS works with MUMPS.
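For the generic route, a hedged sketch of the factor-and-solve pattern for X = inv(A)*B; SolveAXeqB is an illustrative name, and the explicit MatConvert to dense is a conservative choice (as noted above, MUMPS can exploit a sparse right-hand side directly; see the MatMatSolve and MATSOLVERMUMPS documentation). X itself is always dense:

#include <petscksp.h>

PetscErrorCode SolveAXeqB(Mat A, Mat B, Mat *X)
{
  KSP ksp;
  PC  pc;
  Mat F, Bdense;

  PetscFunctionBeginUser;
  PetscCall(KSPCreate(PetscObjectComm((PetscObject)A), &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetType(ksp, KSPPREONLY));
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCLU));
  PetscCall(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS));
  PetscCall(KSPSetUp(ksp));                                 /* triggers the factorization  */
  PetscCall(PCFactorGetMatrix(pc, &F));                     /* the factored matrix         */
  PetscCall(MatConvert(B, MATDENSE, MAT_INITIAL_MATRIX, &Bdense));
  PetscCall(MatDuplicate(Bdense, MAT_DO_NOT_COPY_VALUES, X));
  PetscCall(MatMatSolve(F, Bdense, *X));                    /* X = inv(A) * B              */
  PetscCall(MatDestroy(&Bdense));
  PetscCall(KSPDestroy(&ksp));
  PetscFunctionReturn(0);
}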

On Sat, May 21, 2022, 19:41 Mateo José Moinelo 
wrote:

> Hi. I am working on a Petsc program, and I have a problem. I have two
> sparse matrices A and B, and I want to compute inv(A)*B. I understand that
> computing the inverse of a matrix directly is not effective, and that in
> this case, the best way to do the operation is instead solving the system
> A*X = B, being X the result. I reviewed the documentation, and found some
> interesting options, like the function MatMatSolve, or using a KSP.
>
> The problem I found is that these options seem to work only for dense
> matrices, and I do not know how to convert my sparse matrices to dense.
> What can I do in this case?.
>
> Thanks in advance.
>


Re: [petsc-users] Update of the buffer

2022-02-10 Thread Stefano Zampini


> On Feb 10, 2022, at 6:00 PM, Matthew Knepley  wrote:
> 
> On Thu, Feb 10, 2022 at 9:17 AM Medane TCHAKOROM 
> mailto:medane.tchako...@univ-fcomte.fr>> 
> wrote:
> Hello ,
> 
> Sorry if this question does not belong to this mailling list, i'am using 
> Petsc , but with some
> 
> MPI parts code, when dealing with communication.
> 
> If I make two consecutive MPI_Isend requests, and the destination
> process has not yet received the message in between the two calls, will the
> buffer be updated? I mean, if I send message "1" with the first request, then
> send "0" as the second message, will the receiver receive "0" as the message? If
> not, what can I do to update the message?
> 
> I believe that MPI guarantees in-order message delivery from a source to a 
> target, so if you send 1 before 0, the receiver
> should get them in that order. However, someone here should know for sure.

I don’t think so. You should use tags to flag the proper operation and wait for 
it if you need the value to arrive.


> 
>   Thanks,
> 
> Matt
>  
> Thanks
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/ 



Re: [petsc-users] PETSc GPU MatMatMult performance question

2022-02-03 Thread Stefano Zampini
1) It uses MatMPIDenseScatter() to move to the other ranks their needed
> rows of the C matrix. That function has the call MatDenseGetArrayRead()
> normally would trigger a copy of C up to the CPU each time. But since C is
> not changing in your test run I guess it only triggers one copy.
>
> 2) If uses
> MatMatMultNumericAdd_SeqAIJ_SeqDense(aij->B,workB,cdense->A,PETSC_TRUE);CHKERRQ(ierr);
> to do the off diagonal part of the product but this triggers for each
> multiply a copy of the result matrix from the CPU to the GPU (hugely
> expensive)
>
> For performance there needs to be a new routine 
> MatMatMultNumeric_MPIAIJCUSPRSE_MPICUDADense()
> that is smarter about the needed MPI communication so it only moves exactly
> what it needs to the other ranks and it does the off-diagonal part of the
> product on the GPU so it does not need to copy the result up to the CPU.
>
>
MPIAIJCUSPARSE uses MatProductSetFromOptions_MPIAIJBACKEND

Rohan
I would suggest to add PetscLogStage around your performance loop (do a
warmup outside of it) and send the relevant portion of the log
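A small sketch of that suggestion (TimedMatMatLoop and nits are illustrative names; the warm-up product is done outside the stage so -log_view reports the timed loop separately):

#include <petscmat.h>

PetscErrorCode TimedMatMatLoop(Mat A, Mat C, PetscInt nits)
{
  PetscLogStage stage;
  Mat           D = NULL;

  PetscFunctionBeginUser;
  PetscCall(MatMatMult(A, C, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &D)); /* warm-up, outside the stage */
  PetscCall(PetscLogStageRegister("MatMat loop", &stage));
  PetscCall(PetscLogStagePush(stage));
  for (PetscInt it = 0; it < nits; it++)
    PetscCall(MatMatMult(A, C, MAT_REUSE_MATRIX, PETSC_DEFAULT, &D)); /* only this loop is timed */
  PetscCall(PetscLogStagePop());
  PetscCall(MatDestroy(&D));
  PetscFunctionReturn(0);
}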


> Barry
>
>
>
>
>
>
> -- PETSc Performance Summary: 
> --
>
> /g/g15/yadav2/taco/petsc/bin/benchmark on a  named lassen457 with 2 
> processors, by yadav2 Wed Feb  2 17:23:19 2022
> Using Petsc Release Version 3.16.3, unknown
>
>  Max   Max/Min Avg   Total
> Time (sec):   1.163e+02 1.000   1.163e+02
> Objects:  4.800e+01 1.000   4.800e+01
> Flop: 6.338e+11 1.065   6.144e+11  1.229e+12
> Flop/sec: 5.451e+09 1.065   5.284e+09  1.057e+10
> MPI Messages: 3.500e+01 1.000   3.500e+01  7.000e+01
> MPI Message Lengths:  2.544e+09 1.000   7.267e+07  5.087e+09
> MPI Reductions:   8.100e+01 1.000
>
> Flop counting convention: 1 flop = 1 real number operation of type 
> (multiply/divide/add/subtract)
> e.g., VecAXPY() for real vectors of length N --> 
> 2N flop
> and VecAXPY() for complex vectors of length N --> 
> 8N flop
>
> Summary of Stages:   - Time --  - Flop --  --- Messages ---  
> -- Message Lengths --  -- Reductions --
> Avg %Total Avg %TotalCount   %Total   
>   Avg %TotalCount   %Total
>  0:  Main Stage: 1.1628e+02 100.0%  1.2288e+12 100.0%  7.000e+01 100.0%  
> 7.267e+07  100.0%  6.300e+01  77.8%
>
> 
> See the 'Profiling' chapter of the users' manual for details on interpreting 
> output.
> Phase summary info:
>Count: number of times phase was executed
>Time and Flop: Max - maximum over all processors
>   Ratio - ratio of maximum to minimum over all processors
>Mess: number of messages sent
>AvgLen: average message length (bytes)
>Reduct: number of global reductions
>Global: entire computation
>Stage: stages of a computation. Set stages with PetscLogStagePush() and 
> PetscLogStagePop().
>   %T - percent time in this phase %F - percent flop in this phase
>   %M - percent messages in this phase %L - percent message lengths in 
> this phase
>   %R - percent reductions in this phase
>Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over 
> all processors)
>GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU 
> time over all processors)
>CpuToGpu Count: total number of CPU to GPU copies per processor
>CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per 
> processor)
>GpuToCpu Count: total number of GPU to CPU copies per processor
>GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per 
> processor)
>GPU %F: percent flops on GPU in this event
> 
> EventCount  Time (sec) Flop   
>--- Global ---  --- Stage   Total   GPU- CpuToGpu -   - GpuToCpu - 
> GPU
>Max Ratio  Max Ratio   Max  Ratio  Mess   AvgLen  
> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   
> Size  %F
> ---
>
> --- Event Stage 0: Main Stage
>
> BuildTwoSided  2 1.0 4.4400e-01567.5 0.00e+00 0.0 2.0e+00 4.0e+00 
> 2.0e+00  0  0  3  0  2   0  0  3  0  3 0   0  0 0.00e+000 
> 0.00e+00  0
> BuildTwoSidedF 1 1.0 4.4395e-0115659.1 0.00e+00 0.0 0.0e+00 0.0e+00 
> 1.0e+00  0  0  0  0  1   0  0  0  0  2 0   0  0 0.00e+000 
> 0.00e+00  0
> 

Re: [petsc-users] [petsc-dev] MatPreallocatorPreallocate segfault with PETSC 3.16

2022-02-01 Thread Stefano Zampini
Il giorno mar 1 feb 2022 alle ore 18:34 Jed Brown  ha
scritto:

> Patrick Sanan  writes:
>
> > Am Di., 1. Feb. 2022 um 16:20 Uhr schrieb Jed Brown :
> >
> >> Patrick Sanan  writes:
> >>
> >> > Sorry about the delay on this. I can reproduce.
> >> >
> >> > This regression appears to be a result of this optimization:
> >> > https://gitlab.com/petsc/petsc/-/merge_requests/4273
> >>
> >> Thanks for tracking this down. Is there a reason to prefer preallocating
> >> twice
> >>
> >>ierr =
> >> MatPreallocatorPreallocate(preallocator,PETSC_TRUE,A);CHKERRQ(ierr);
> >>ierr =
> >>
> MatPreallocatorPreallocate(preallocator,PETSC_TRUE,A_duplicate);CHKERRQ(ierr);
> >>
> >> versus using MatDuplicate() or MatConvert()?
> >>
>

Jed

This is not the point. Suppose you pass around only a preallocator, but do
not pass around the matrices. Reusing the preallocator should be allowed.
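
For readers who have not used MATPREALLOCATOR, the pattern under discussion is
roughly the following (a sketch only; the sizes, the insertion loop, and the
variable names are illustrative):

  Mat preall, A, B;

  /* record the nonzero pattern once */
  ierr = MatCreate(PETSC_COMM_WORLD, &preall);CHKERRQ(ierr);
  ierr = MatSetSizes(preall, m, n, M, N);CHKERRQ(ierr);
  ierr = MatSetType(preall, MATPREALLOCATOR);CHKERRQ(ierr);
  ierr = MatSetUp(preall);CHKERRQ(ierr);
  /* ... MatSetValues(preall, ...) with the same stencil as the real assembly ... */
  ierr = MatAssemblyBegin(preall, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(preall, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* reuse the same preallocator for two independently created matrices */
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, m, n, M, N);CHKERRQ(ierr);
  ierr = MatSetType(A, MATAIJ);CHKERRQ(ierr);
  ierr = MatPreallocatorPreallocate(preall, PETSC_TRUE, A);CHKERRQ(ierr);

  ierr = MatCreate(PETSC_COMM_WORLD, &B);CHKERRQ(ierr);
  ierr = MatSetSizes(B, m, n, M, N);CHKERRQ(ierr);
  ierr = MatSetType(B, MATAIJ);CHKERRQ(ierr);
  ierr = MatPreallocatorPreallocate(preall, PETSC_TRUE, B);CHKERRQ(ierr);

  ierr = MatDestroy(&preall);CHKERRQ(ierr);

The second MatPreallocatorPreallocate() call is exactly the reuse being argued
for here.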

>
> > Maybe if your preallocation is an overestimate for each of two different
> > post-assembly non-zero structures in A and A_duplicate?
>
> Even then, why not preallocate A and duplicate immediately, before
> compressing out zeros?
>


-- 
Stefano


Re: [petsc-users] Error when configuring the PETSC environment

2022-01-25 Thread Stefano Zampini
You should attach configure.log if you want us to take a look at the failure.
In any case, you should use gmake, which you can install via brew.

> On Jan 26, 2022, at 12:53 AM, Peng, Kang  wrote:
> 
> Hi PETSc,
> 
> I am trying to configure the PETSc environment on macOS (Apple M1 Pro chip, 
> macOS 12.1), but something went wrong when I executed the commands below. I 
> tried many methods but failed to solve it. Could you help me solve it? 
> 
> I’ve been following this instruction to install PETSc and configure the 
> environment, but I can’t do it after changing to the new chip.
> https://www.pflotran.org/documentation/user_guide/how_to/installation/linux.html#linux-install
>  
> 
> 
> Error is as follows:
> von@MacBook-Pro-VON petsc % ./configure --CFLAGS='-O3' --CXXFLAGS='-O3' 
> --FFLAGS='-O3' --with-debugging=no --download-mpich=yes --download-hdf5=yes 
> --download-fblaslapack=yes --download-cmake=yes --download-metis=yes 
> --download-parmetis=yes --download-hdf5-fortran-bindings=yes 
> --download-hdf5-configure-arguments="--with-zlib=yes"
> =
>   Configuring PETSc to compile on your system 
>
> =
> =
>   * WARNING: You have a version of GNU make older than 4.0. It will work,
> but may not support all the parallel testing options. You can install the
> latest GNU make with your package manager, such as brew or macports, or
> use the --download-make option to get the latest GNU make *
> =
>
> =
>   
>Running configure on CMAKE; this may take several minutes  
>   
>  
> =
>   
>  
> ***
>  UNABLE to CONFIGURE with GIVEN OPTIONS(see configure.log for 
> details):
> ---
> Error running configure on CMAKE
> ***
>  
> ***
>  UNABLE to CONFIGURE with GIVEN OPTIONS(see configure.log for 
> details):
> ---
> Error running configure on CMAKE
> ***
>   File "/Users/von/petsc/config/configure.py", line 465, in petsc_configure
> framework.configure(out = sys.stdout)
>   File "/Users/von/petsc/config/BuildSystem/config/framework.py", line 1385, 
> in configure
> self.processChildren()
>   File "/Users/von/petsc/config/BuildSystem/config/framework.py", line 1373, 
> in processChildren
> self.serialEvaluation(self.childGraph)
>   File "/Users/von/petsc/config/BuildSystem/config/framework.py", line 1348, 
> in serialEvaluation
> child.configure()
>   File "/Users/von/petsc/config/BuildSystem/config/packages/cmake.py", line 
> 75, in configure
> config.package.GNUPackage.configure(self)
>   File "/Users/von/petsc/config/BuildSystem/config/package.py", line 1189, in 
> configure
> self.executeTest(self.configureLibrary)
>   File "/Users/von/petsc/config/BuildSystem/config/base.py", line 138, in 
> executeTest
> ret = test(*args,**kargs)
>   File "/Users/von/petsc/config/BuildSystem/config/package.py", line 935, in 
> configureLibrary
> for location, directory, lib, incl in self.generateGuesses():
>   File "/Users/von/petsc/config/BuildSystem/config/package.py", line 509, in 
> generateGuesses
> d 

Re: [petsc-users] hypre / hip usage

2022-01-22 Thread Stefano Zampini
Mark

the two options are only there to test the code in CI, and are not needed
in general

   '--download-hypre-configure-arguments=--enable-unified-memory',
This is only here to test the unified memory code path

'--with-hypre-gpuarch=gfx90a',
This is not needed if rocminfo is in PATH

Our interface code with HYPRE GPU works fine for HIP, it is tested in CI.
The -mat_type hypre assembling for ex19 does not work because ex19 uses
FDColoring. Just assemble in mpiaij format (look at  runex19_hypre_hip in
src/snes/tutorials/makefile); the interface code will copy the matrix to
the GPU
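
In code, that corresponds to something like the following fragment (a sketch,
not taken from the makefile target; it assumes a KSP ksp and an assembled Mat A):

  PC pc;

  /* assemble the operator in standard (MPI)AIJ format instead of -mat_type hypre */
  ierr = MatSetType(A, MATAIJ);CHKERRQ(ierr);
  /* ... usual MatSetValues()/MatAssemblyBegin()/MatAssemblyEnd() ... */

  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCHYPRE);CHKERRQ(ierr);
  /* the PETSc-HYPRE interface then copies the assembled matrix to the GPU */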

Il giorno ven 21 gen 2022 alle ore 19:24 Mark Adams  ha
scritto:

>
>
> On Fri, Jan 21, 2022 at 11:14 AM Jed Brown  wrote:
>
>> "Paul T. Bauman"  writes:
>>
>> > On Fri, Jan 21, 2022 at 8:52 AM Paul T. Bauman 
>> wrote:
>> >> Yes. The way HYPRE's memory model is setup is that ALL GPU allocations
>> are
>> >> "native" (i.e. [cuda,hip]Malloc) or, if unified memory is enabled,
>> then ALL
>> >> GPU allocations are unified memory (i.e. [cuda,hip]MallocManaged).
>> >> Regarding HIP, there is an HMM implementation of hipMallocManaged
>> planned,
>> >> but is it not yet delivered AFAIK (and it will *not* support gfx906,
>> e.g.
>> >> RVII, FYI), so, today, under the covers, hipMallocManaged is calling
>> >> hipHostMalloc. So, today, all your unified memory allocations in HYPRE
>> on
>> >> HIP are doing CPU-pinned memory accesses. And performance is just truly
>> >> terrible (as you might expect).
>>
>> Thanks for this important bit of information.
>>
>> And it sounds like when we add support to hand off Kokkos matrices and
>> vectors (our current support for matrices on ROCm devices uses Kokkos) or
>> add direct support for hipSparse, we'll avoid touching host memory in
>> assembly-to-solve with hypre.
>>
>
> It does not look like anyone has made Hypre work with HIP. Stefano added a
> runex19_hypre_hip target 4 months ago and hypre.py has some HIP things.
>
> I have a user that would like to try this, no hurry but, can I get an idea
> of a plan for this?
>
> Thanks,
> Mark
>
>


-- 
Stefano


Re: [petsc-users] Nullspaces

2022-01-04 Thread Stefano Zampini
Il Mar 4 Gen 2022, 16:56 Marco Cisternino  ha
scritto:

> Hello Mark,
>
> I analyzed the codes with valgrind, both the real code and the tiny one.
>
> I obviously used memcheck tool but with full leak check compiling the
> codes with debug info.
>
> Not considering OpenMPI events (I have no wrappers on the machine I used
> for the analysis), the real code gave zero errors and the tiny one gave this
>
> ==17911== 905,536 (1,552 direct, 903,984 indirect) bytes in 1 blocks are
> definitely lost in loss record 33 of 33
>
> ==17911==at 0x483E340: memalign (in
> /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
>
> ==17911==by 0x49CB672: PetscMallocAlign (in
> /usr/lib/x86_64-linux-gnu/libpetsc_real.so.3.12.4)
>
> ==17911==by 0x49CBE1D: PetscMallocA (in
> /usr/lib/x86_64-linux-gnu/libpetsc_real.so.3.12.4)
>
> ==17911==by 0x4B26187: VecCreate (in
> /usr/lib/x86_64-linux-gnu/libpetsc_real.so.3.12.4)
>
> ==17911==by 0x10940D: main (testNullSpace.cpp:30)
>
>
>
> due to the fact that I forgot to destroy the solution Vec (adding
> VecDestroy() at the end of the main, the error disappear).
>
> For both the codes, I analyzed the two ways of passing the constant to the
> null space of the operator, no memory errors but still the same results
> from MatNullSpaceTest, i.e.
>
>
>
> MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, nullptr, );
>
>
>
> passes the test while
>
>
>
> Vec* nsp;
>
> VecDuplicateVecs(solution, 1, );
>
> VecSet(nsp[0],1.0);
>
> VecNormalize(nsp[0], nullptr);
>
> MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_FALSE, 1, nsp, );
>
> VecDestroyVecs(1,);
>
> PetscFree(nsp);
>
>
>
> does not.
>
>
>
> I hope this can satisfy your doubt about the memory behavior, but please
> do not hesitate to ask for more analysis if it cannot.
>
>
>
> As Matthew said some weeks ago, something should be wrong in the code, I
> would say in the matrix, that’s why I provided the matrix and the way I
> test it.
>
> Unfortunately, it is hard (read impossible) for me to share the code
> producing the matrix. I hope the minimal code I provided is enough to
> understand something.
>

Try running SLEPc to find the smallest eigenvalues. There should be two
zero eigenvalues; then inspect the eigenvectors.
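
A minimal SLEPc sketch of that check (it assumes the assembled Mat A and the
usual PetscErrorCode ierr; the solver settings are only illustrative):

  EPS         eps;
  PetscInt    i, nconv;
  PetscScalar kr, ki;
  Vec         xr, xi;

  ierr = MatCreateVecs(A, NULL, &xr);CHKERRQ(ierr);
  ierr = MatCreateVecs(A, NULL, &xi);CHKERRQ(ierr);
  ierr = EPSCreate(PETSC_COMM_WORLD, &eps);CHKERRQ(ierr);
  ierr = EPSSetOperators(eps, A, NULL);CHKERRQ(ierr);
  ierr = EPSSetWhichEigenpairs(eps, EPS_SMALLEST_MAGNITUDE);CHKERRQ(ierr);
  ierr = EPSSetDimensions(eps, 4, PETSC_DEFAULT, PETSC_DEFAULT);CHKERRQ(ierr);
  ierr = EPSSetFromOptions(eps);CHKERRQ(ierr);
  ierr = EPSSolve(eps);CHKERRQ(ierr);
  ierr = EPSGetConverged(eps, &nconv);CHKERRQ(ierr);
  for (i = 0; i < nconv; i++) {
    ierr = EPSGetEigenpair(eps, i, &kr, &ki, xr, xi);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD, "eig %d = %g\n", (int)i, (double)PetscRealPart(kr));CHKERRQ(ierr);
    /* VecView(xr, PETSC_VIEWER_STDOUT_WORLD) for the (near-)zero eigenvalues shows the null space */
  }
  ierr = EPSDestroy(&eps);CHKERRQ(ierr);
  ierr = VecDestroy(&xr);CHKERRQ(ierr);
  ierr = VecDestroy(&xi);CHKERRQ(ierr);

If the operator really has a two-dimensional null space, two eigenvalues should
come out at (numerical) zero, and the corresponding eigenvectors are the ones to
compare with what is passed to MatNullSpaceCreate().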

>
>
> Thank you all.
>
>
>
> Marco Cisternino
>
>
>
> *From:* Marco Cisternino
> *Sent:* lunedì 3 gennaio 2022 16:08
> *To:* Mark Adams 
> *Cc:* Matthew Knepley ; petsc-users <
> petsc-users@mcs.anl.gov>
> *Subject:* RE: [petsc-users] Nullspaces
>
>
>
> We usually analyze the code with valgrind, when important changes are
> implemented.
>
> I have to admit that this analysis is still not automatic and the case we
> are talking about is not a test case for our workload.
>
> The test cases we have give no errors in valgrind analysis.
>
>
>
> However, I will analyze both the real code and the tiny one for this case
> with valgrind and report the results.
>
>
>
> Thank you,
>
>
>
>
>
> Marco Cisternino
>
>
>
> *From:* Mark Adams 
> *Sent:* lunedì 3 gennaio 2022 15:50
> *To:* Marco Cisternino 
> *Cc:* Matthew Knepley ; petsc-users <
> petsc-users@mcs.anl.gov>
> *Subject:* Re: [petsc-users] Nullspaces
>
>
>
> I have not looked at your code, but as a general observation you want to
> have some sort of memory checker, like valgrid for CPUs, in your workflow.
>
> It is the fastest way to find some classes of bugs.
>
>
>
> On Mon, Jan 3, 2022 at 8:47 AM Marco Cisternino <
> marco.cistern...@optimad.it> wrote:
>
> Are you talking about the code that produce the linear system or about the
> tiny code that test the null space?
> In the first case, it is absolutely possible, but I would expect no
> problem in the tiny code, do you agree?
> It is important to remark that the real code and the tiny one behave in
> the same way when testing the null space of the operator. I can analyze
> with valgrind and I will, but I would not expect great insights.
>
>
>
> Thanks,
>
>
>
> Marco Cisternino, PhD
> marco.cistern...@optimad.it
>
> __
>
> Optimad Engineering Srl
>
> Via Bligny 5, Torino, Italia.
> +3901119719782
> www.optimad.it
>
>
>
> *From:* Mark Adams 
> *Sent:* lunedì 3 gennaio 2022 14:42
> *To:* Marco Cisternino 
> *Cc:* Matthew Knepley ; petsc-users <
> petsc-users@mcs.anl.gov>
> *Subject:* Re: [petsc-users] Nullspaces
>
>
>
> There could be a memory bug that does not cause a noticeable problem until
> it hits some vital data and valgrind might find it on a small problem.
>
>
>
> However you might have a bug like a hardwired buffer size that
> overflows that is in fact not a bug until you get to this large size and in
> that case valgrid would need to be run on the large case and would have a
> good chance of finding it.
>
>
>
>
>
> On Mon, Jan 3, 2022 at 4:42 AM Marco Cisternino <
> marco.cistern...@optimad.it> wrote:
>
> My comments are between the Mark’s lines and they starts with “#”
>
>
>
> Marco Cisternino
>
>
>
> *From:* Mark Adams 
> *Sent:* sabato 25 dicembre 2021 14:59
> 

Re: [petsc-users] PCs and MATIS

2021-12-14 Thread Stefano Zampini
Eric

What Pierre and Barry suggested is OK.
If you want to take a look at how to use MATIS with overlapped meshes, see 
https://gitlab.com/petsc/petsc/-/blob/main/src/dm/impls/plex/plexhpddm.c 

This code assembles the local Neumann problem in the overlapped mesh as needed 
by the GenEO preconditioner.

> On Dec 15, 2021, at 6:42 AM, Eric Chamberland 
>  wrote:
> 
> Hi Barry,
> 
> yes the overlapping meshes match.  They will be generated by PETSc 
> DMPlexDistributeOverlap but transposed in our in-house code and we are using 
> Matthew's branch (https://gitlab.com/petsc/petsc/-/merge_requests/4547 
> ) to have it work all 
> right.
> 
> 
> On 2021-12-14 4:00 p.m., Barry Smith wrote:
>>Do the "overlapping meshes" "match" in the overlap regions or are you 
>> connecting completely different mesh discretizations by boundary 
>> conditions along the edges of all the sub meshes?  In other words, will your 
>> global linear system be defined by your overlapping meshes?
>> 
>>If they match but you want to use more general boundary conditions for 
>> the subproblems than PCASM supports by default you might be able to use 
>> PCSetModifySubMatrices() to allow you to modify the sub matrices before they 
>> are used in the preconditioner; you can for example modify the entries along 
>> the boundary of the domain to represent Robin's conditions. Or you can put 
>> whatever you want into the entire submatrix if modifying them is too tedious.
> 
> Ok, that is exactly what we spotted!
> 
> But since we will need the unassembled (MATIS) matrices, we have hit the same 
> problem Pierre got: the call to MatCreateSubMatrices is not directly 
> avoidable, so we will have to use the same trick you gave him some time 
> ago... ;)
> 
> 
>> 
>>But if the meshes don't match then you don't really need a new PC you 
>> need to even define what your nonlinear system is and you have a very big 
>> project to write a PDE solver for non-matching overlapping grids using PETSc.
> 
> ok, no, that was not our goal right now...
> 
> Thanks again Pierre and Barry for your fast answers! :)
> 
> Eric
> 
>>   Barry
>> 
>> 
>>> On Dec 14, 2021, at 3:37 PM, Eric Chamberland 
>>>  wrote:
>>> 
>>> Hi,
>>> 
>>> We want to use an Additive Schwarz preconditioner (like PCASM) combined 
>>> with overlapping meshes *and* specific boundary conditions to local (MATIS) 
>>> matrices.
>>> 
>>> At first sight, MATIS is only supported for BDDC and FETI-DP and is not 
>>> working with PCASM.
>>> 
>>> Do we have to write a new PC from scratch to combine the use of mesh 
>>> overlap, MATIS and customized local matrices?
>>> 
>>> ...or is there any working example we should look at to start from? :)
>>> 
>>> Thanks a lot!
>>> 
>>> Eric
>>> 
>>> -- 
>>> Eric Chamberland, ing., M. Ing
>>> Professionnel de recherche
>>> GIREF/Université Laval
>>> (418) 656-2131 poste 41 22 42
>>> 
> -- 
> Eric Chamberland, ing., M. Ing
> Professionnel de recherche
> GIREF/Université Laval
> (418) 656-2131 poste 41 22 42



Re: [petsc-users] hypre on gpus

2021-11-22 Thread Stefano Zampini
You don't need to specify the HYPRE commit. Remove
--download-hypre-commit=origin/hypre_petsc
from the configuration options

Il giorno lun 22 nov 2021 alle ore 17:29 Matthew Knepley 
ha scritto:

> On Mon, Nov 22, 2021 at 8:50 AM Karthikeyan Chockalingam - STFC UKRI <
> karthikeyan.chockalin...@stfc.ac.uk> wrote:
>
>> Hi Matt,
>>
>>
>>
>> Below is the entire error message:
>>
>
> I cannot reproduce this:
>
> main $:/PETSc3/petsc/petsc-dev/src/ksp/ksp/tutorials$ ./ex4 -ksp_view
> -ksp_type cg -mat_type hypre -pc_type hypre
> KSP Object: 1 MPI processes
>   type: cg
>   maximum iterations=1, initial guess is zero
>   tolerances:  relative=0.000138889, absolute=1e-50, divergence=1.
>   left preconditioning
>   using PRECONDITIONED norm type for convergence test
> PC Object: 1 MPI processes
>   type: hypre
> HYPRE BoomerAMG preconditioning
>   Cycle type V
>   Maximum number of levels 25
>   Maximum number of iterations PER hypre call 1
>   Convergence tolerance PER hypre call 0.
>   Threshold for strong coupling 0.25
>   Interpolation truncation factor 0.
>   Interpolation: max elements per row 0
>   Number of levels of aggressive coarsening 0
>   Number of paths for aggressive coarsening 1
>   Maximum row sums 0.9
>   Sweeps down 1
>   Sweeps up   1
>   Sweeps on coarse1
>   Relax down  symmetric-SOR/Jacobi
>   Relax upsymmetric-SOR/Jacobi
>   Relax on coarse Gaussian-elimination
>   Relax weight  (all)  1.
>   Outer relax weight (all) 1.
>   Using CF-relaxation
>   Not using more complex smoothers.
>   Measure typelocal
>   Coarsen typeFalgout
>   Interpolation type  classical
>   linear system matrix = precond matrix:
>   Mat Object: 1 MPI processes
> type: hypre
> rows=56, cols=56
> Norm of error 8.69801e-05 iterations 2
>
> This is on the 'main' branch.  So either there is some bug in release, or
> something is strange on your end. Since we run Hypre tests for the CI,
> I am leaning toward the latter. Can you try the 'main' branch? We will
> have to use this anyway if we want any fixes.
>
>   Thanks,
>
> Matt
>
>
>>  *[0]PETSC ERROR: - Error Message
>> --*
>>
>> [0]PETSC ERROR: Object is in wrong state
>>
>> [0]PETSC ERROR: Must call MatXXXSetPreallocation(), MatSetUp() or the
>> matrix has not yet been factored on argument 1 "mat" before
>> MatGetOwnershipRange()
>>
>> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>>
>> [0]PETSC ERROR: Petsc Release Version 3.16.0, Sep 29, 2021
>>
>> [0]PETSC ERROR: ./ex4 on a  named sqg2b4.bullx by kxc07-lxm25 Mon Nov 22
>> 11:33:41 2021
>>
>> [0]PETSC ERROR: Configure options
>> --prefix=/lustre/scafellpike/local/apps/gcc7/petsc/3.16.0-cuda11.2
>> --with-debugging=yes
>> --with-blaslapack-dir=/lustre/scafellpike/local/apps/intel/intel_cs/2018.0.128/mkl
>> --with-cuda=1 --with-cuda-arch=70 --download-hypre=yes
>> --download-hypre-configure-arguments=HYPRE_CUDA_SM=70
>> --download-hypre-commit=origin/hypre_petsc --with-shared-libraries=1
>> --known-mpi-shared-libraries=1 --with-cc=mpicc --with-cxx=mpicxx
>> -with-fc=mpif90
>>
>> [0]PETSC ERROR: #1 MatGetOwnershipRange() at
>> /netfs/smain01/scafellpike/local/package_build/build/rja87-build/petsc-cuda-3.16.0/src/mat/interface/matrix.c:6784
>>
>> [0]PETSC ERROR: #2 main() at ex4.c:40
>>
>> [0]PETSC ERROR: PETSc Option Table entries:
>>
>> [0]PETSC ERROR: -ksp_type cg
>>
>> [0]PETSC ERROR: -ksp_view
>>
>> [0]PETSC ERROR: -mat_type hypre
>>
>> [0]PETSC ERROR: -pc_type hypre
>>
>> [0]PETSC ERROR: -use_gpu_aware_mpi 0
>>
>> *[0]PETSC ERROR: End of Error Message ---send entire
>> error message to petsc-ma...@mcs.anl.gov--*
>>
>> --
>>
>>
>>
>> I have also attached the make.log. Thank you for having a look.
>>
>>
>>
>> Best,
>>
>> Karthik.
>>
>>
>>
>> *From: *Matthew Knepley 
>> *Date: *Monday, 22 November 2021 at 13:41
>> *To: *"Chockalingam, Karthikeyan (STFC,DL,HC)" <
>> karthikeyan.chockalin...@stfc.ac.uk>
>> *Cc: *Mark Adams , "petsc-users@mcs.anl.gov" <
>> petsc-users@mcs.anl.gov>
>> *Subject: *Re: [petsc-users] hypre on gpus
>>
>>
>>
>> On Mon, Nov 22, 2021 at 6:47 AM Karthikeyan Chockalingam - STFC UKRI <
>> karthikeyan.chockalin...@stfc.ac.uk> wrote:
>>
>> Thank you for your response. I tried to run the same example
>>
>>
>>
>> petsc/src/ksp/ksp/tutorials$  *./ex4 -ksp_type cg -mat_type hypre
>> -ksp_view -pc_type hypre*
>>
>>
>>
>> but it crashed with the below error
>>
>>
>>
>> *[0]PETSC ERROR: - Error Message
>> --*
>>
>> [0]PETSC ERROR: Object is in wrong state
>>
>> [0]PETSC ERROR: Must call MatXXXSetPreallocation(), MatSetUp() or the
>> matrix has not 

Re: [petsc-users] PCSHELL does not support getting factor matrix

2021-11-19 Thread Stefano Zampini
Oh, I see the stack trace now. This requires computing the inertia?

> On Nov 19, 2021, at 3:45 PM, Stefano Zampini  
> wrote:
> 
> Jose
> 
> Now that we have the PCMatApply interface, you could switch to using that 
> inside SLEPc. I guess you are using MatSolve, right?
> If not, the alternative is to have a PCFactorGetMatrix which creates on the 
> fly an object that behaves like it. The problem is that we do not have a 
> matching restore, and the new object created will be leaked.
> What do you think?
> 
> 
>> On Nov 19, 2021, at 12:41 PM, Jose E. Roman  wrote:
>> 
>> It is trying to call PCFactorGetMatrix() on your PC, but this operation is 
>> not supported by PCSHELL and it is not possible to set it via 
>> PetscObjectComposeFunction(). PCSHELL uses the PCShellSet* interface, that 
>> is restricted to a limited number of operations.
>> 
>> Jose
>> 
>> 
>>> El 19 nov 2021, a las 6:30, Sam Guo  escribió:
>>> 
>>> Dear PETSc dev team,
>>>  I am implementing SLEPc interval option using shell matrix as follows:
>>> 
>>> EPSGetST(eps, );
>>> STSetType(st, STSINVERT);
>>> STGetKSP(st, );
>>> KSPSetOperators(ksp, A, A);
>>> KSPSetType(ksp, KSPPREONLY);
>>> KSPGetPC(ksp, );
>>> KSPGetPC(ksp, )
>>> MatSetOption(A, MAT_SPD, PETSC_TRUE);
>>> PCSetType(pc, PCSHELL);
>>> PCShellSetContext(pc, );
>>> PCShellSetApply(pc, applyPreconditioner);
>>> PetscObjectComposeFunction((PetscObject)pc,"PCFactorGetZeroPivot_C",PCFactorGetZeroPivot_C);
>>> 
>>> When I run it, I get the following error.  Any idea what I did wrong? 
>>> Thanks a lot for your help.
>>> 
>>> [0]PETSC ERROR: No support for this operation for this object type
>>> [0]PETSC ERROR: PC type does not support getting factor matrix
>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
>>> trouble shooting.
>>> [0]PETSC ERROR: Petsc Release Version 3.11.3, Jun, 26, 2019 
>>> [0]PETSC ERROR: Unknown Name on a arch-starccmplus_serial_real named 
>>> pl2usbvu0037pc.net.plm.eds.com by cd4hhv Thu Nov 18 21:11:45 2021
>>> Number of iterations of the method: 0
>>> [0]PETSC ERROR: Configure options --with-x=0 --with-fc=0 --with-debugging=1 
>>> --with-blaslapack-dir=/u/cd4hhv/dev2/mkl/2017.2-cda-001/linux/lib/intel64/../..
>>>  --with-mpi=0 -CFLAGS=-g -CXXFLAGS=-g --with-clean=1 --force 
>>> --with-scalar-type=real
>>> [0]PETSC ERROR: #1 PCFactorGetMatrix() line 1332 in 
>>> ../../../petsc/src/ksp/pc/interface/precon.c
>>> [0]PETSC ERROR: #2 EPSSliceGetInertia() line 340 in 
>>> ../../../slepc/src/eps/impls/krylov/krylovschur/ks-slice.c
>>> [0]PETSC ERROR: #3 EPSSetUp_KrylovSchur_Slice() line 467 in 
>>> ../../../slepc/src/eps/impls/krylov/krylovschur/ks-slice.c
>>> [0]PETSC ERROR: #4 EPSSetUp_KrylovSchur() line 146 in 
>>> ../../../slepc/src/eps/impls/krylov/krylovschur/krylovschur.c
>>> [0]PETSC ERROR: #5 EPSSetUp() line 173 in 
>>> ../../../slepc/src/eps/interface/epssetup.c
>>> Solution method: krylovschur
>>> [0]PETSC ERROR: #6 EPSSliceGetEPS() line 306 in 
>>> ../../../slepc/src/eps/impls/krylov/krylovschur/ks-slice.c
>>> [0]PETSC ERROR: #7 EPSSetUp_KrylovSchur_Slice() line 416 in 
>>> ../../../slepc/src/eps/impls/krylov/krylovschur/ks-slice.c
>>> [0]PETSC ERROR: #8 EPSSetUp_KrylovSchur() line 146 in 
>>> ../../../slepc/src/eps/impls/krylov/krylovschur/krylovschur.c
>>> [0]PETSC ERROR: #9 EPSSetUp() line 173 in 
>>> ../../../slepc/src/eps/interface/epssetup.c
>> 
> 



Re: [petsc-users] PCSHELL does not support getting factor matrix

2021-11-19 Thread Stefano Zampini
Jose

Now that we have the PCMatApply interface, you could switch to using that inside 
SLEPc. I guess you are using MatSolve, right?
If not, the alternative is to have a PCFactorGetMatrix which creates on the fly 
an object that behaves like it. The problem is that we do not have a matching 
restore, and the new object created will be leaked.
What do you think?


> On Nov 19, 2021, at 12:41 PM, Jose E. Roman  wrote:
> 
> It is trying to call PCFactorGetMatrix() on your PC, but this operation is 
> not supported by PCSHELL and it is not possible to set it via 
> PetscObjectComposeFunction(). PCSHELL uses the PCShellSet* interface, that is 
> restricted to a limited number of operations.
> 
> Jose
> 
> 
>> El 19 nov 2021, a las 6:30, Sam Guo  escribió:
>> 
>> Dear PETSc dev team,
>>   I am implementing SLEPc interval option using shell matrix as follows:
>> 
>> EPSGetST(eps, );
>> STSetType(st, STSINVERT);
>> STGetKSP(st, );
>> KSPSetOperators(ksp, A, A);
>> KSPSetType(ksp, KSPPREONLY);
>> KSPGetPC(ksp, );
>> KSPGetPC(ksp, )
>> MatSetOption(A, MAT_SPD, PETSC_TRUE);
>> PCSetType(pc, PCSHELL);
>> PCShellSetContext(pc, );
>> PCShellSetApply(pc, applyPreconditioner);
>> PetscObjectComposeFunction((PetscObject)pc,"PCFactorGetZeroPivot_C",PCFactorGetZeroPivot_C);
>> 
>> When I run it, I get the following error.  Any idea what I did wrong? Thanks 
>> a lot for your help.
>> 
>> [0]PETSC ERROR: No support for this operation for this object type
>> [0]PETSC ERROR: PC type does not support getting factor matrix
>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
>> trouble shooting.
>> [0]PETSC ERROR: Petsc Release Version 3.11.3, Jun, 26, 2019 
>> [0]PETSC ERROR: Unknown Name on a arch-starccmplus_serial_real named 
>> pl2usbvu0037pc.net.plm.eds.com by cd4hhv Thu Nov 18 21:11:45 2021
>> Number of iterations of the method: 0
>> [0]PETSC ERROR: Configure options --with-x=0 --with-fc=0 --with-debugging=1 
>> --with-blaslapack-dir=/u/cd4hhv/dev2/mkl/2017.2-cda-001/linux/lib/intel64/../..
>>  --with-mpi=0 -CFLAGS=-g -CXXFLAGS=-g --with-clean=1 --force 
>> --with-scalar-type=real
>> [0]PETSC ERROR: #1 PCFactorGetMatrix() line 1332 in 
>> ../../../petsc/src/ksp/pc/interface/precon.c
>> [0]PETSC ERROR: #2 EPSSliceGetInertia() line 340 in 
>> ../../../slepc/src/eps/impls/krylov/krylovschur/ks-slice.c
>> [0]PETSC ERROR: #3 EPSSetUp_KrylovSchur_Slice() line 467 in 
>> ../../../slepc/src/eps/impls/krylov/krylovschur/ks-slice.c
>> [0]PETSC ERROR: #4 EPSSetUp_KrylovSchur() line 146 in 
>> ../../../slepc/src/eps/impls/krylov/krylovschur/krylovschur.c
>> [0]PETSC ERROR: #5 EPSSetUp() line 173 in 
>> ../../../slepc/src/eps/interface/epssetup.c
>> Solution method: krylovschur
>> [0]PETSC ERROR: #6 EPSSliceGetEPS() line 306 in 
>> ../../../slepc/src/eps/impls/krylov/krylovschur/ks-slice.c
>> [0]PETSC ERROR: #7 EPSSetUp_KrylovSchur_Slice() line 416 in 
>> ../../../slepc/src/eps/impls/krylov/krylovschur/ks-slice.c
>> [0]PETSC ERROR: #8 EPSSetUp_KrylovSchur() line 146 in 
>> ../../../slepc/src/eps/impls/krylov/krylovschur/krylovschur.c
>> [0]PETSC ERROR: #9 EPSSetUp() line 173 in 
>> ../../../slepc/src/eps/interface/epssetup.c
> 



Re: [petsc-users] Installation on NEC SX-Aurora TSUBASA

2021-10-28 Thread Stefano Zampini


> On Oct 28, 2021, at 10:52 PM, Rafael Monteiro da Silva 
>  wrote:
> 
> Thank you Satish and Stefano for pointing me out how to do this.
> 
> Stefano, if I'm interpreting correctly, I could try to add build options I 
> need to this script. Is that correct?

The script configures PETSc with default options for NEC. I don’t recommend 
changing compilation flags


> 
> First, I'll try to install (based on arch-necve.py script) and then, as 
> Satish suggested, include additional build options.
> 

Good luck with building and running these external packages

> 
> Rafael.
> 
> Em qui., 28 de out. de 2021 às 16:38, Stefano Zampini 
> mailto:stefano.zamp...@gmail.com>> escreveu:
> Rafael
> 
> PETSc can be built for NEC vector engines. Here is a sample configure script 
> https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-necve.py 
> <https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-necve.py>
> NEC blas lapack should be automatically used.
> 
> I don’t know if the packages you need will compile and run smoothly. Their 
> C/C++ compiler is very buggy, and I had to resort to compiling with -O1 to get 
> almost all PETSc tests to pass.
> PETSc automatically uses this optimization flag if you compile using 
> with-debugging=0. Do not use higher optimizations, unless you are willing to 
> file bug reports to them
> 
> 
> Stefano
> 
>> On Oct 28, 2021, at 10:12 PM, Rafael Monteiro da Silva 
>> mailto:rafael.m.si...@alumni.usp.br>> wrote:
>> 
>> Hello.
>> 
>> On my machine, for initial tests, I use the following options to install 
>> petsc:
>> 
>> PETSC_DIR=/home/rafael/petsc PETSC_ARCH=optimized-v3.15.5 --with-debugging=0 
>> COPTFLAGS="-O3 -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native 
>> -mtune=native" FOPTFLAGS="-O3 -march=native -mtune=native" --with-cc=gcc 
>> --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich 
>> --download-superlu_dist --download-metis --download-parmetis 
>> --download-mumps --download-scalapack --download-hdf5
>> 
>> I need to test our software in an environment with NEC SX-Aurora TSUBASA 
>> Vector Engine.
>> Is there any resource where I can set up petsc to use Vector Engine?
>> 
>> Thank you!
>> Regards,
>> Rafael.
> 



Re: [petsc-users] Installation on NEC SX-Aurora TSUBASA

2021-10-28 Thread Stefano Zampini
Rafael

PETSc can be built for NEC vector engines. Here is a sample configure script 
https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-necve.py 

NEC blas lapack should be automatically used.

I don’t know if the packages you need will compile and run smoothly. Their 
C/C++ compiler is very buggy, and I had to resort to compiling with -O1 to get 
almost all PETSc tests to pass.
PETSc automatically uses this optimization flag if you compile using 
with-debugging=0. Do not use higher optimizations, unless you are willing to 
file bug reports to them


Stefano

> On Oct 28, 2021, at 10:12 PM, Rafael Monteiro da Silva 
>  wrote:
> 
> Hello.
> 
> On my machine, for initial tests, I use the following options to install 
> petsc:
> 
> PETSC_DIR=/home/rafael/petsc PETSC_ARCH=optimized-v3.15.5 --with-debugging=0 
> COPTFLAGS="-O3 -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native 
> -mtune=native" FOPTFLAGS="-O3 -march=native -mtune=native" --with-cc=gcc 
> --with-cxx=g++ --with-fc=gfortran --download-fblaslapack --download-mpich 
> --download-superlu_dist --download-metis --download-parmetis --download-mumps 
> --download-scalapack --download-hdf5
> 
> I need to test our software in an environment with NEC SX-Aurora TSUBASA 
> Vector Engine.
> Is there any resource where I can set up petsc to use Vector Engine?
> 
> Thank you!
> Regards,
> Rafael.



Re: [petsc-users] Why PetscDestroy global collective semantics?

2021-10-23 Thread Stefano Zampini
Non-deterministic garbage collection is an issue for Python too, and the
Firedrake folks are also working on that.

We may consider deferring all calls to MPI_Comm_free done on communicators
with 1 as ref count (i.e., the call will actually wipe out some internal
MPI data) in a collective call that can be either run by the user (on
PETSC_COMM_WORLD), or at PetscFinalize() stage.
I.e., something like this:

#define MPI_Comm_free(comm) PutCommInAList(comm)

Comm creation is collective by definition, and thus collectiveness of the
order of the destruction can be easily enforced.
I don't see problems with 3rd party libraries using comms, since we always
duplicate the comm we passed them
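
A rough sketch of the mechanism being proposed (this is not PETSc code; the
names, the fixed-size list, and the flush routine are all made up for
illustration):

  #include <mpi.h>

  /* rank-local: record communicators whose reference count dropped to zero */
  static MPI_Comm deferred_comms[1024];
  static int      deferred_count = 0;

  static int DeferCommFree(MPI_Comm *comm)
  {
    deferred_comms[deferred_count++] = *comm;  /* no MPI call, safe to do independently per rank */
    *comm = MPI_COMM_NULL;
    return MPI_SUCCESS;
  }

  /* collective (e.g. on PETSC_COMM_WORLD): called explicitly by the user or from PetscFinalize();
     since communicator creation is collective, all ranks hold the list in the same order */
  static int FlushDeferredCommFrees(void)
  {
    for (int i = 0; i < deferred_count; i++) MPI_Comm_free(&deferred_comms[i]);
    deferred_count = 0;
    return MPI_SUCCESS;
  }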

Lawrence, do you think this may help you?

Thanks
Stefano

Il giorno dom 24 ott 2021 alle ore 05:58 Barry Smith  ha
scritto:

>
>   Ahh, this makes perfect sense.
>
>   The code for PetscObjectRegisterDestroy() and the actual destruction
> (called in PetscFinalize()) is very simply and can be found in
> src/sys/objects/destroy.c PetscObjectRegisterDestroy(), 
> PetscObjectRegisterDestroyAll().
>
>   You could easily maintain a new array
> like PetscObjectRegisterGCDestroy_Objects[] and add objects
> with PetscObjectRegisterGCDestroy() and then destroy them
> with PetscObjectRegisterDestroyGCAll(). The only tricky part is that you
> have to have, in the context of your Julia MPI, make sure
> that PetscObjectRegisterDestroyGCAll() is called collectively over all the
> MPI ranks (that is it has to be called where all the ranks have made the
> same progress on MPI communication) that have registered objects to
> destroy, generally PETSC_COMM_ALL.  We would be happy to incorporate such a
> system into the PETSc source with a merge request.
>
>   Barry
>
> On Oct 23, 2021, at 10:40 PM, Alberto F. Martín 
> wrote:
>
> Thanks all for your very insightful answers.
>
> We are leveraging PETSc from Julia in a parallel distributed memory
> context (several MPI tasks running the Julia REPL each).
>
> Julia uses Garbage Collection (GC), and we would like to destroy the PETSc
> objects automatically when the GC decides so along the simulation.
>
> In this context, we cannot guarantee deterministic destruction on all MPI
> tasks as the GC decisions are local to each task, no global semantics
> guaranteed.
>
> As far as I understand from your answers, there seems to be the
> possibility to defer the destruction of objects till points in the parallel
> program in which you can guarantee collective semantics, correct? If yes I
> guess that this may occur at any point in the simulation, not necessarily
> at shut down via PetscFinalize(), right?
>
> Best regards,
>
>  Alberto.
>
>
> On 24/10/21 1:10 am, Jacob Faibussowitsch wrote:
>
> Depending on the use-case you may also find PetscObjectRegisterDestroy()
> useful. If you can’t guarantee your PetscObjectDestroy() calls are
> collective, but have some other collective section you may call it then to
> punt the destruction of your object to PetscFinalize() which is guaranteed
> to be collective.
>
> https://petsc.org/main/docs/manualpages/Sys/PetscObjectRegisterDestroy.html
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
>
> On Oct 22, 2021, at 23:33, Jed Brown  wrote:
>
> Junchao Zhang  writes:
>
> On Fri, Oct 22, 2021 at 9:13 PM Barry Smith  wrote:
>
>
>  One technical reason is that PetscHeaderDestroy_Private() may call
> PetscCommDestroy() which may call MPI_Comm_free() which is defined by the
> standard to be collective. Though PETSc tries to limit its use of new MPI
> communicators (for example generally many objects shared the same
> communicator) if we did not free those we no longer need when destroying
> objects we could run out.
>
> PetscCommDestroy() might call MPI_Comm_free() , but it is very unlikely.
> Petsc uses reference counting on communicators, so in PetscCommDestroy(),
> it likely just decreases the count. In other words, PetscCommDestroy() is
> cheap and in effect not collective.
>
>
> Unless it's the last reference to a given communicator, which is a
> risky/difficult thing for a user to guarantee and the consequences are
> potentially dire (deadlock being way worse than a crash) when the user's
> intent is to relax ordering for destruction.
>
> Alberto, what is the use case in which deterministic destruction is
> problematic? If you relax it for individual objects, is there a place you
> can be collective to collect any stale communicators?
>
>
> --
> Alberto F. Martín-Huertas
> Senior Researcher, PhD. Computational Science
> Centre Internacional de Mètodes Numèrics a l'Enginyeria (CIMNE)
> Parc Mediterrani de la Tecnologia, UPC
> Esteve Terradas 5, Building C3, Office 215,
> 08860 Castelldefels (Barcelona, Spain)
> Tel.: (+34) 9341 34223e-mail:amar...@cimne.upc.edu
>
> FEMPAR project co-founder
> web: http://www.fempar.org
>

Re: [petsc-users] Questions on Petsc4py with PyCUDA

2021-10-23 Thread Stefano Zampini
Use v.setType('veccuda')? Or v.setType(PETSc.Vec.Type.VECCUDA)

Il Sab 23 Ott 2021, 11:46 Guangpu Zhu  ha scritto:

> Dear Sir/Madam,
>
> I am using the Petsc4py with PyCUDA. According to the following
> link
>
>
> https://www.mcs.anl.gov/petsc/petsc4py-current/docs/apiref/petsc4py.PETSc.Vec.Type-class.html
>
> I set the vector type as 'cuda', the simple code is as follows:
>
> import sys
> import petsc4py
> from petsc4py import PETSc
> petsc4py.init(sys.argv)
> from pycuda import autoinit
> import pycuda.driver as drv
> import pycuda.compiler as compiler
> import pycuda.gpuarray as gpuarray
>
> a = PETSc.Vec().create()
> a.setType('cuda')
> a.setSizes(8)
>
> But when I run this code, it always shows that "Unknown vector type: cuda
> ".
>
> I have tried:
>(a) petsc4py 3.15.0 with PyCUDA 2020.1
>(b) petsc4py 3.15.1 with PyCUDA 2021.1
>(c) petsc4py 3.16.0 with PyCUDA 2021.1
>
> but it always shows the same message: Unknown vector type: cuda
>
> The CUDA version on my computer is CUDA 11.3.
>
> So I am writing this e-mail to ask for your help and advice. Thank you in
> advance.
>
>
> Best,
>
> Guangpu Zhu
>
>
> ---
> Guangpu Zhu
>
> Research Associate,  Department of Mechanical Engineering
>
> National University of Singapore
>
> Personal E-mail: zhugp...@gmail.com
>
> Phone: (+65) 87581879
>


Re: [petsc-users] Still reachable memory in valgrind

2021-10-12 Thread Stefano Zampini
You are using two different mallocs in PETSc. For your 3.14 test,
PetscMallocAlign is used, while for 3.16, PetscTrMallocDefault is called,
which uses much more memory to trace memory corruption in previously allocated
PETSc data.

Il giorno mar 12 ott 2021 alle ore 18:07 Pierre Seize 
ha scritto:

> With 3.14 : both malloc and PetscMalloc1 are definitely lost, which is
> what I want:
>
> ==5463== Memcheck, a memory error detector
> ==5463== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
> ==5463== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
> ==5463== Command: ./build/bin/yanss data/box.yaml
> ==5463==
> ==5463==
> ==5463== HEAP SUMMARY:
> ==5463== in use at exit: 48 bytes in 3 blocks
> ==5463==   total heap usage: 2,092 allocs, 2,089 frees, 9,139,664 bytes
> allocated
> ==5463==
> ==5463== 8 bytes in 1 blocks are definitely lost in loss record 1 of 3
> ==5463==at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
> ==5463==by 0x4191A1: main (main.c:62)
> ==5463==
> ==5463== 8 bytes in 1 blocks are definitely lost in loss record 2 of 3
> ==5463==at 0x4C2BE2D: memalign (vg_replace_malloc.c:858)
> ==5463==by 0x5655AEF: PetscMallocAlign (mal.c:52)
> ==5463==by 0x5657465: PetscMallocA (mal.c:425)
> ==5463==by 0x4191D3: main (main.c:63)
> ==5463==
> ==5463== LEAK SUMMARY:
> ==5463==definitely lost: 16 bytes in 2 blocks
> ==5463==indirectly lost: 0 bytes in 0 blocks
> ==5463==  possibly lost: 0 bytes in 0 blocks
> ==5463==still reachable: 32 bytes in 1 blocks
> ==5463== suppressed: 0 bytes in 0 blocks
> ==5463== Reachable blocks (those to which a pointer was found) are not
> shown.
> ==5463== To see them, rerun with: --leak-check=full --show-leak-kinds=all
> ==5463==
> ==5463== For counts of detected and suppressed errors, rerun with: -v
> ==5463== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
> but on a more recent version, the lost memory from PetscMalloc1 is marked
> as reachable. It bothers me as I use valgrind to make sure I free
> everything. Usually the lost memory would be reported right away, but now
> it isn't.
> If I understand Barry's answer, this is because the memory block is large
> ("1,636 bytes in 1 blocks") and valgrind gives up on tracing this block?
> Then out of curiosity, why is this block 8 bytes in 3.14 and 1636 bytes
> today?
>
> Thank you for your time
> Pierre
>
> On 12/10/21 16:51, Barry Smith wrote:
>
>
>   Do you have the valgrind output from 3.14 ?
>
> 1,636 bytes in 1 blocks are still reachable in loss record 4
>> > of 4
>> > ==2036==at 0x4C2BE2D: memalign (vg_replace_malloc.c:858)
>> > ==2036==by 0x54AC0CB: PetscMallocAlign (mal.c:54)
>> > ==2036==by 0x54AFBA9: PetscTrMallocDefault (mtr.c:183)
>> > ==2036==by 0x54ADDD2: PetscMallocA (mal.c:423)
>> > ==2036==by 0x41A52F: main (main.c:9)
>> > ==2036==
>
>
> Given the large amount of memory in the block I think tracing of PETSc's
> memory allocation is turned on with this run, this may mean the memory is
> reachable but with your 3.14 run I would guess the memory size is 8 bytes
> and tracing is not turned on so the memory is listed as "lost". But I do
> not understand the subtleties of reachable.
>
> Barry
>
>
>
> On Oct 12, 2021, at 10:38 AM, Pierre Seize  wrote:
>
> The "bug" is that memory from PetscMalloc1 that is not freed is reported
> as "definitely lost" in v3.14 (OK) but as "still reachable" in today's
> release (not OK).
>
> Here I forget to free the memory on purpose, I would like valgrind to
> report it's lost and not still reachable.
>
>
> Pierre
>
> On 12/10/21 16:24, Matthew Knepley wrote:
>
> On Tue, Oct 12, 2021 at 10:16 AM Pierre Seize 
> wrote:
>
>> Sorry, I should have tried this before:
>>
>> I checked out to v3.14, and now both malloc and PetscMalloc1 are
>> reported as definitely lost, so I would say it's a bug.
>>
>
> I am not sure what would be the bug. This is correctly reporting that you
> did not free the memory.
>
>   Thanks,
>
> Matt
>
>
>> Pierre
>>
>>
>> On 12/10/21 15:58, Pierre Seize wrote:
>> > Hello petsc-users
>> >
>> > I am using Valgrind with my PETSc application, and I noticed something:
>> >
>> >  1 #include 
>> >  2
>> >  3 int main(int argc, char **argv){
>> >  4   PetscErrorCode ierr = 0;
>> >  5
>> >  6   ierr = PetscInitialize(, , NULL, ""); if (ierr) return
>> > ierr;
>> >  7   PetscReal *foo;
>> >  8   malloc(sizeof(PetscReal));
>> >  9   ierr = PetscMalloc1(1, ); CHKERRQ(ierr);
>> > 10   ierr = PetscFinalize();
>> > 11   return ierr;
>> > 12 }
>> >
>> > With this example, with today's release branch, I've got this Valgrind
>> > result (--leak-check=full --show-leak-kinds=all):
>> >
>> > ==2036== Memcheck, a memory error detector
>> > ==2036== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
>> > ==2036== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright
>> > info
>> > ==2036== Command: ./build/bin/yanss data/box.yaml
>> > ==2036==

Re: [petsc-users] On QN + Fieldsplit

2021-10-12 Thread Stefano Zampini
Il giorno mar 12 ott 2021 alle ore 13:56 Nicolás Barnafi 
ha scritto:

> Hello PETSc users,
>
> first email sent!
> I am creating a SNES solver using fenics, my example runs smoothly with
> 'newtonls', but gives a strange missing function error (error 83):
>
>
Dolfin swallows any useful error information returned from PETSc. You can
try using the code snippet below at the beginning of your script:

from petsc4py import PETSc
from dolfin import *
# Remove the dolfin error handler
PETSc.Sys.pushErrorHandler('python')



>
> these are the relevant lines of code where I setup the solver:
>
> > problem = SNESProblem(Res, sol, bcs)
> > b = PETScVector()  # same as b = PETSc.Vec()
> > J_mat = PETScMatrix()
> > snes = PETSc.SNES().create(MPI.COMM_WORLD)
> > snes.setFunction(problem.F, b.vec())
> > snes.setJacobian(problem.J, J_mat.mat())
> > # Set up fieldsplit
> > ksp = snes.ksp
> > ksp.setOperators(J_mat.mat())
> > pc = ksp.pc
> > pc.setType('fieldsplit')
> > dofmap_s = V.sub(0).dofmap().dofs()
> > dofmap_p = V.sub(1).dofmap().dofs()
> > is_s = PETSc.IS().createGeneral(dofmap_s)
> > is_p = PETSc.IS().createGeneral(dofmap_p)
> > pc.setFieldSplitIS((None, is_s), (None, is_p))
> > pc.setFromOptions()
> > snes.setFromOptions()
> > snes.setUp()
>
>
If it can be useful, this are the outputs of snes.view(), ksp.view() and
> pc.view():
>
> >   type: qn
> >   SNES has not been set up so information may be incomplete
> > type is BROYDEN, restart type is DEFAULT, scale type is JACOBIAN
> > Stored subspace size: 10
> > Using the single reduction variant.
> >   maximum iterations=1, maximum function evaluations=3
> >   tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
> >   total number of function evaluations=0
> >   norm schedule ALWAYS
> >   SNESLineSearch Object: 4 MPI processes
> > type: basic
> > maxstep=1.00e+08, minlambda=1.00e-12
> > tolerances: relative=1.00e-08, absolute=1.00e-15,
> lambda=1.00e-08
> > maximum iterations=1
> > KSP Object: 4 MPI processes
> >   type: gmres
> > restart=1000, using Modified Gram-Schmidt Orthogonalization
> > happy breakdown tolerance 1e-30
> >   maximum iterations=1000, initial guess is zero
> >   tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> >   left preconditioning
> >   using UNPRECONDITIONED norm type for convergence test
> > PC Object: 4 MPI processes
> >   type: fieldsplit
> >   PC has not been set up so information may be incomplete
> > FieldSplit with Schur preconditioner, factorization FULL
>
> I know that PC is not set up, but if I do it before setting up the SNES,
> the error persists. Thanks in advance for your help.
>
> Best,
> Nicolas
> --
> Nicolás Alejandro Barnafi Wittwer
>


-- 
Stefano


Re: [petsc-users] Error "Attempting to use an MPI routine before initializing MPICH" after compiling PETSc with Intel MPI and GCC

2021-10-11 Thread Stefano Zampini
Can you try with a simple call that only calls PetscInitialize/Finalize?
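
Something along these lines (a minimal standalone test, not taken from the
original report):

  #include <petscsys.h>

  int main(int argc, char **argv)
  {
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
    ierr = PetscPrintf(PETSC_COMM_WORLD, "PETSc initialized fine\n");CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }

Compile it with the same mpicc/mpicxx used to configure PETSc and run it both
directly and through the matching mpiexec; if even this fails, the problem is in
the MPI runtime mix rather than in the application code.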


> On Oct 11, 2021, at 3:30 PM, Roland Richter  wrote:
> 
> At least according to configure.log mpiexec was defined as
> 
> Checking for program /opt/intel/oneapi/mpi/2021.4.0//bin/mpiexec...found
>   Defined make macro "MPIEXECEXECUTABLE" to 
> "/opt/intel/oneapi/mpi/2021.4.0/bin/mpiexec"
> 
> When running ex19 with this mpiexec it fails with the usual error, even 
> though all configuration steps worked fine. I attached the configuration log.
> 
> Regards,
> 
> Roland
> 
> Am 11.10.21 um 14:24 schrieb Stefano Zampini:
>> You are most probably using a different mpiexec than the one used to compile 
>> petsc.
>> 
>> 
>> 
>>> On Oct 11, 2021, at 3:23 PM, Roland Richter >> <mailto:roland.rich...@ntnu.no>> wrote:
>>> 
>>> I tried either ./ex19 (SNES-example), mpirun ./ex19 or mpirun -n 1 ./ex19, 
>>> all with the same result.
>>> 
>>> Regards,
>>> 
>>> Roland
>>> 
>>> Am 11.10.21 um 14:22 schrieb Matthew Knepley:
>>>> On Mon, Oct 11, 2021 at 8:07 AM Roland Richter >>> <mailto:roland.rich...@ntnu.no>> wrote:
>>>> Hei,
>>>> 
>>>> at least in gdb it fails with
>>>> 
>>>> Attempting to use an MPI routine before initializing MPICH 
>>>> [Inferior 1 (process 7854) exited with code 01] 
>>>> (gdb) backtrace 
>>>> No stack.
>>>> 
>>>> 
>>>> What were you running? If it never makes it into PETSc code, I am not sure 
>>>> what we are
>>>> doing to cause this.
>>>> 
>>>>   Thanks,
>>>> 
>>>>  Matt
>>>>  
>>>> Regards,
>>>> 
>>>> Roland
>>>> 
>>>> Am 11.10.21 um 13:57 schrieb Matthew Knepley:
>>>>> On Mon, Oct 11, 2021 at 5:24 AM Roland Richter >>>> <mailto:roland.rich...@ntnu.no>> wrote:
>>>>> Hei,
>>>>> 
>>>>> I compiled PETSc with Intel MPI (MPICH) and GCC as compiler (i.e. using
>>>>> Intel OneAPI together with the supplied mpicxx-compiler). Compilation
>>>>> and installation worked fine, but running the tests resulted in the
>>>>> error "Attempting to use an MPI routine before initializing MPICH". A
>>>>> simple test program (attached) worked fine with the same combination.
>>>>> 
>>>>> What could be the reason for that?
>>>>> 
>>>>> Hi Roland,
>>>>> 
>>>>> Can you get a stack trace for this error using the debugger?
>>>>> 
>>>>>   Thanks,
>>>>> 
>>>>>  Matt
>>>>>  
>>>>> Thanks!
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> Roland Richter
>>>>> 
>>>>> 
>>>>> -- 
>>>>> What most experimenters take for granted before they begin their 
>>>>> experiments is infinitely more interesting than any results to which 
>>>>> their experiments lead.
>>>>> -- Norbert Wiener
>>>>> 
>>>>> https://www.cse.buffalo.edu/~knepley/ 
>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>> 
>>>> 
>>>> -- 
>>>> What most experimenters take for granted before they begin their 
>>>> experiments is infinitely more interesting than any results to which their 
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>> 
>>>> https://www.cse.buffalo.edu/~knepley/ 
>>>> <http://www.cse.buffalo.edu/~knepley/>
>> 
> 



Re: [petsc-users] Error "Attempting to use an MPI routine before initializing MPICH" after compiling PETSc with Intel MPI and GCC

2021-10-11 Thread Stefano Zampini
You are most probably using a different mpiexec than the one used to compile 
petsc.



> On Oct 11, 2021, at 3:23 PM, Roland Richter  wrote:
> 
> I tried either ./ex19 (SNES-example), mpirun ./ex19 or mpirun -n 1 ./ex19, 
> all with the same result.
> 
> Regards,
> 
> Roland
> 
> Am 11.10.21 um 14:22 schrieb Matthew Knepley:
>> On Mon, Oct 11, 2021 at 8:07 AM Roland Richter > > wrote:
>> Hei,
>> 
>> at least in gdb it fails with
>> 
>> Attempting to use an MPI routine before initializing MPICH 
>> [Inferior 1 (process 7854) exited with code 01] 
>> (gdb) backtrace 
>> No stack.
>> 
>> 
>> What were you running? If it never makes it into PETSc code, I am not sure 
>> what we are
>> doing to cause this.
>> 
>>   Thanks,
>> 
>>  Matt
>>  
>> Regards,
>> 
>> Roland
>> 
>> Am 11.10.21 um 13:57 schrieb Matthew Knepley:
>>> On Mon, Oct 11, 2021 at 5:24 AM Roland Richter >> > wrote:
>>> Hei,
>>> 
>>> I compiled PETSc with Intel MPI (MPICH) and GCC as compiler (i.e. using
>>> Intel OneAPI together with the supplied mpicxx-compiler). Compilation
>>> and installation worked fine, but running the tests resulted in the
>>> error "Attempting to use an MPI routine before initializing MPICH". A
>>> simple test program (attached) worked fine with the same combination.
>>> 
>>> What could be the reason for that?
>>> 
>>> Hi Roland,
>>> 
>>> Can you get a stack trace for this error using the debugger?
>>> 
>>>   Thanks,
>>> 
>>>  Matt
>>>  
>>> Thanks!
>>> 
>>> Regards,
>>> 
>>> Roland Richter
>>> 
>>> 
>>> -- 
>>> What most experimenters take for granted before they begin their 
>>> experiments is infinitely more interesting than any results to which their 
>>> experiments lead.
>>> -- Norbert Wiener
>>> 
>>> https://www.cse.buffalo.edu/~knepley/ 
>> 
>> 
>> -- 
>> What most experimenters take for granted before they begin their experiments 
>> is infinitely more interesting than any results to which their experiments 
>> lead.
>> -- Norbert Wiener
>> 
>> https://www.cse.buffalo.edu/~knepley/ 



Re: [petsc-users] Error "Attempting to use an MPI routine before initializing MPICH" after compiling PETSc with Intel MPI and GCC

2021-10-11 Thread Stefano Zampini
Try removing line 15

boost_procs = boost::thread::physical_concurrency();

Usually these errors are caused by  destructors called when objects go out of 
scope

> On Oct 11, 2021, at 12:23 PM, Roland Richter  wrote:
> 
> 



Re: [petsc-users] Hypre runtime switch CPU/GPU

2021-10-07 Thread Stefano Zampini
We discussed a full runtime switch in HYPRE with Ruipeng a few weeks ago;
I'm not sure what the status is. cc'ing him

Il giorno gio 7 ott 2021 alle ore 14:10 Mark Adams  ha
scritto:

> I'm not sure, but I suspect that Hypre does not support runtime switching
> and our model is that you can switch at runtime. This leads to an
> inconsistency.
>
> If we remove -mat_type hypre then your issue would go away but 1) we would
> have to add it back if hypre supports runtime switching in the future, and
> break everyone's input decks, and 2) it would be inconsistent with the
> PETSc model.
>
> I could see throwing an error if you do not use -mat_type hypre and are
> configured for GPUs.
>
> Mark
>
> On Wed, Oct 6, 2021 at 3:31 PM Milan Pelletier via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
>> Dear PETSc users,
>>
>> Is there a way to switch a runtime setting for PETSc+Hypre to run on CPU,
>> even when it has been compiled to allow for GPU support?
>> It looks like setting the matrix and vector types to respectively "seqaij"
>> and "seq" results in GPU computation when Hypre is used as a
>> preconditioner. I thought GPU would be used only when mat_type is set to
>> "hypre", following the examples provided with the last release.
>>
>> Thanks for the help,
>> Best regards,
>>
>> Milan
>>
>>

-- 
Stefano


Re: [petsc-users] Spock link error

2021-09-19 Thread Stefano Zampini
Are you following the user advice here
https://docs.olcf.ornl.gov/systems/spock_quick_start_guide.html#compiling-with-the-cray-compiler-wrappers-cc-or-cc
?

Il giorno dom 19 set 2021 alle ore 16:30 Mark Adams  ha
scritto:

> I am now getting this error. It seems to be suggesting that I turn
> --no-allow-shlib-undefined off.
> Any ideas?
> Thanks,
> Mark
>
> 09:09 main= /gpfs/alpine/csc314/scratch/adams/petsc$ make
> PETSC_DIR=/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new
> PETSC_ARCH="" check
> Running check examples to verify correct installation
> Using
> PETSC_DIR=/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new
> and PETSC_ARCH=
> gmake[3]:
> [/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib/petsc/conf/rules:301:
> ex19.PETSc] Error 2 (ignored)
> ***Error detected during compile or
> link!***
> See http://www.mcs.anl.gov/petsc/documentation/faq.html
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials ex19
>
> *
> cc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
> -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O2  -fPIC
> -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas
> -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O2
>  
> -I/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/include
> -I/opt/rocm-4.2.0/include ex19.c
>  
> -Wl,-rpath,/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib
> -L/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib
> -Wl,-rpath,/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib
> -L/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib
> -Wl,-rpath,/opt/rocm-4.2.0/lib -L/opt/rocm-4.2.0/lib
> -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64
> -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/
> 21.06.1.1/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/
> 21.06.1.1/CRAY/9.0/x86_64/lib
> -Wl,-rpath,/opt/cray/pe/mpich/8.1.7/ofi/cray/10.0/lib
> -L/opt/cray/pe/mpich/8.1.7/ofi/cray/10.0/lib
> -Wl,-rpath,/opt/cray/pe/mpich/default/gtl/lib
> -L/opt/cray/pe/mpich/default/gtl/lib
> -Wl,-rpath,/opt/cray/pe/dsmml/0.1.5/dsmml/lib
> -L/opt/cray/pe/dsmml/0.1.5/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.12/lib
> -L/opt/cray/pe/pmi/6.0.12/lib
> -Wl,-rpath,/opt/cray/pe/cce/12.0.1/cce/x86_64/lib
> -L/opt/cray/pe/cce/12.0.1/cce/x86_64/lib
> -Wl,-rpath,/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64
> -L/opt/cray/xpmem/2.2.40-2.1_2.44__g3cf3325.shasta/lib64
> -Wl,-rpath,/opt/cray/pe/cce/12.0.1/cce-clang/x86_64/lib/clang/12.0.0/lib/linux
> -L/opt/cray/pe/cce/12.0.1/cce-clang/x86_64/lib/clang/12.0.0/lib/linux
> -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0
> -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0
> -Wl,-rpath,/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-unknown-linux-gnu/lib
> -L/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-unknown-linux-gnu/lib
> -Wl,-rpath,/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/..//x86_64-unknown-linux-gnu/lib
> -L/opt/cray/pe/cce/12.0.1/binutils/x86_64/x86_64-pc-linux-gnu/..//x86_64-unknown-linux-gnu/lib
> -lpetsc -lparmetis -lmetis -lhipsparse -lhipblas -lrocsparse -lrocsolver
> -lrocblas -lrocrand -lamdhip64 -lstdc++ -ldl -lmpifort_cray -lmpi_cray
> -lmpi_gtl_hsa -ldsmml -lpmi -lxpmem -lpgas-shmem -lquadmath
> -lcrayacc_amdgpu -lopenacc -lmodules -lfi -lcraymath -lf -lu -lcsup
> -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64
> -lclang_rt.builtins-x86_64 -lquadmath -lstdc++ -ldl -o ex19
>
>
> *ld.lld: error:
> /gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib/libpetsc.so:
> undefined reference to .omp_offloading.img_start.cray_amdgcn-amd-amdhsa
> [--no-allow-shlib-undefined]ld.lld: error:
> /gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib/libpetsc.so:
> undefined reference to .omp_offloading.img_size.cray_amdgcn-amd-amdhsa
> [--no-allow-shlib-undefined]ld.lld: error:
> /gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/lib/libpetsc.so:
> undefined reference to .omp_offloading.img_cache.cray_amdgcn-amd-amdhsa
> [--no-allow-shlib-undefined]*
> clang-12: error: linker command failed with exit code 1 (use -v to see
> invocation)
> gmake[4]: *** [: ex19] Error 1
> ***Error detected during compile or
> link!***
> See http://www.mcs.anl.gov/petsc/documentation/faq.html
> /gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tutorials ex5f
> *
> ftn -fPIC -g -O2   -fPIC -g -O2
>  
> -I/gpfs/alpine/phy122/proj-shared/spock/petsc/current/arch-opt-cray-new/include
> -I/opt/rocm-4.2.0/include ex5f.F90
>  
> 

Re: [petsc-users] PETSC installation on Cray

2021-09-13 Thread Stefano Zampini
Enrico

I accidentally ran into the same issue. You may want to check if it
works with this branch:
https://gitlab.com/petsc/petsc/-/tree/stefanozampini/cray-arm

Il giorno mar 2 mar 2021 alle ore 23:03 Barry Smith  ha
scritto:

>
>   Please try the following. Make four files as below, then compile each
> with "cc -c -o test.o test1.c", and again for test2.c, etc.
>
>   Send all the output.
>
>
>
>   test1.c
> #include <complex.h>
>
>   test2.c
> #define _BSD_SOURCE
> #include <complex.h>
>
>   test3.c
> #define _DEFAULT_SOURCE
> #include <complex.h>
>
>   test4.c
> #define _GNU_SOURCE
> #include <complex.h>
>
> > On Mar 2, 2021, at 7:33 AM, Enrico  wrote:
> >
> > Hi,
> >
> > attached is the configuration and make log files.
> >
> > Enrico
> >
> > On 02/03/2021 14:13, Matthew Knepley wrote:
> >> On Tue, Mar 2, 2021 at 7:49 AM Enrico <degreg...@dkrz.de> wrote:
> >>Hi,
> >>I'm having some problems installing PETSC with Cray compiler.
> >>I use this configuration:
> >>./configure --with-cc=cc --with-cxx=CC --with-fc=0 --with-debugging=1
> >>--with-shared-libraries=1 COPTFLAGS=-O0 CXXOPTFLAGS=-O0
> >>and when I do
> >>make all
> >>I get the following error because of cmathcalls.h:
> >>CC-1043 craycc: ERROR File = /usr/include/bits/cmathcalls.h, Line = 55
> >>_Complex can only be used with floating-point types.
> >>__MATHCALL (cacos, (_Mdouble_complex_ __z));
> >>^
> >>Am I doing something wrong?
> >> This was expanded from somewhere. Can you show the entire error log?
> >>   Thanks,
> >>  Matt
> >>Regards,
> >>Enrico Degregori
> >> --
> >> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> >> -- Norbert Wiener
> >> https://www.cse.buffalo.edu/~knepley/ <
> http://www.cse.buffalo.edu/~knepley/>
> > 
>
>

-- 
Stefano


Re: [petsc-users] KSPSolve with MPIAIJ with non-square 'diagonal parts'

2021-08-30 Thread Stefano Zampini
What is the error you are getting from the KSP? The default solver in parallel is 
Block Jacobi + ILU, which does not work for non-square matrices. You do not need to 
call PCSetFromOptions on the pc; just call KSPSetFromOptions and run with 
-pc_type none.
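
For reference, a minimal sketch of the same fix done programmatically (assuming the
KSP has already been created and given the non-square MPIAIJ matrix as its operator):

  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCNONE);    /* same effect as running with -pc_type none */
  KSPSetFromOptions(ksp);   /* no separate PCSetFromOptions call is needed */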

> On Aug 30, 2021, at 9:17 PM, Olivier Jamond  wrote:
> 
> Hello,
> 
> I am sorry because I surely miss something, but I cannot manage to solve a 
> problem with a MPIAIJ matrix which has non-square 'diagonal parts'.
> 
> I copy/paste at the bottom of this message a very simple piece of code which 
> causes me trouble. In this code I try to do 'x=KSP(A)*b' (with gmres/jacobi), 
> but this fails whereas a matrix multiplication 'b=A*x' seems to work. Is ksp 
> with such a matrix supposed to work (I can't find anything in the 
> documentation about that, so I guess that it is...) ?
> 
> Many thanks,
> Olivier
> 
> NB: this code should be launched with exactly 3 procs
> 
> #include "petscsys.h" /* framework routines */
> #include "petscvec.h" /* vectors */
> #include "petscmat.h" /* matrices */
> #include "petscksp.h"
> 
> #include <vector>
> #include <numeric>
> #include <iostream>
> #include <cstdlib>
> 
> static char help[] = "Trying to solve a linear system on a sub-block of a 
> matrix\n\n";
> int main(int argc, char **argv)
> {
>   MPI_Init(NULL, NULL);
>   PetscErrorCode ierr;
>   ierr = PetscInitialize(&argc, &argv, NULL, help);
>   CHKERRQ(ierr);
> 
>   // clang-format off
>   std::vector<std::vector<double>> AA = {
>   { 1,  2,  0, /**/ 0,  3, /**/ 0,  0,  4},
>   { 0,  5,  6, /**/ 7,  0, /**/ 0,  8,  0},
>   { 9,  0, 10, /**/11,  0, /**/ 0, 12,  0},
>   //---
>   {13,  0, 14, /**/15, 16, /**/17,  0,  0},
>   { 0, 18,  0, /**/19, 20, /**/21,  0,  0},
>   { 0,  0,  0, /**/22, 23, /**/ 1, 24,  0},
>   //--
>   {25, 26, 27, /**/ 0,  0, /**/28, 29,  0},
>   {30,  0,  0, /**/31, 32, /**/33,  0, 34},
>   };
> 
> 
>   std::vector<double> bb = {1.,
> 1.,
> 1.,
> //
> 1.,
> 1.,
> 1.,
> //
> 1.,
> 1.};
> 
> 
>   std::vector<int> nDofsRow = {3, 3, 2};
>   std::vector<int> nDofsCol = {3, 2, 3};
>   // clang-format on
> 
>   int NDofs = std::accumulate(nDofsRow.begin(), nDofsRow.end(), 0);
> 
>   int pRank, nProc;
>   MPI_Comm_rank(PETSC_COMM_WORLD, &pRank);
>   MPI_Comm_size(PETSC_COMM_WORLD, &nProc);
> 
>   if (nProc != 3) {
> std::cerr << "THIS TEST MUST BE LAUNCHED WITH EXACTLY 3 PROCS\n";
> abort();
>   }
> 
>   Mat A;
>   MatCreate(PETSC_COMM_WORLD, &A);
>   MatSetType(A, MATMPIAIJ);
>   MatSetSizes(A, nDofsRow[pRank], nDofsCol[pRank], PETSC_DETERMINE, 
> PETSC_DETERMINE);
>   MatMPIAIJSetPreallocation(A, NDofs, NULL, NDofs, NULL);
> 
>   Vec b;
>   VecCreate(PETSC_COMM_WORLD, &b);
>   VecSetType(b, VECMPI);
>   VecSetSizes(b, nDofsRow[pRank], PETSC_DECIDE);
> 
>   if (pRank == 0) {
> for (int i = 0; i < NDofs; ++i) {
>   for (int j = 0; j < NDofs; ++j) {
> if (AA[i][j] != 0.) {
>   MatSetValue(A, i, j, AA[i][j], ADD_VALUES);
> }
>   }
>   VecSetValue(b, i, bb[i], ADD_VALUES);
> }
>   }
> 
>   MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
>   MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
>   VecAssemblyBegin(b);
>   VecAssemblyEnd(b);
> 
>   PetscViewerPushFormat(PETSC_VIEWER_STDOUT_WORLD, PETSC_VIEWER_ASCII_DENSE);
>   MatView(A, PETSC_VIEWER_STDOUT_WORLD);
>   VecView(b, PETSC_VIEWER_STDOUT_WORLD);
> 
>   KSP ksp;
>   KSPCreate(PETSC_COMM_WORLD, &ksp);
>   KSPSetOperators(ksp, A, A);
>   KSPSetFromOptions(ksp);
> 
>   PC pc;
>   KSPGetPC(ksp, &pc);
>   PCSetFromOptions(pc);
> 
>   Vec x;
>   MatCreateVecs(A, &x, NULL);
>   ierr = KSPSolve(ksp, b, x);  // this fails
>   MatMult(A, x, b);            // whereas this seems to be ok...
> 
>   VecView(x, PETSC_VIEWER_STDOUT_WORLD);
> 
>   MPI_Finalize();
> 
>   return 0;
> }
> 



Re: [petsc-users] MatZeroRows changes my sparsity pattern

2021-07-15 Thread Stefano Zampini
Alexander

Do you have a small code to reproduce the issue?

Below is the output using a PETSc example (src/mat/tests/ex11). The pattern is 
kept. 

kl-18448:tests szampini$ ./ex11 
Mat Object: 1 MPI processes
  type: seqaij
row 0: (0, 5.) 
row 1: (0, -1.)  (1, 4.)  (2, -1.)  (6, -1.) 
row 2: (2, 5.) 
row 3: (2, -1.)  (3, 4.)  (4, -1.)  (8, -1.) 
row 4: (4, 5.) 
row 5: (0, -1.)  (5, 4.)  (6, -1.)  (10, -1.) 
row 6: (6, 5.) 
row 7: (2, -1.)  (6, -1.)  (7, 4.)  (8, -1.)  (12, -1.) 
row 8: (8, 5.) 
row 9: (4, -1.)  (8, -1.)  (9, 4.)  (14, -1.) 
row 10: (10, 5.) 
row 11: (6, -1.)  (10, -1.)  (11, 4.)  (12, -1.)  (16, -1.) 
row 12: (12, 5.) 
row 13: (8, -1.)  (12, -1.)  (13, 4.)  (14, -1.)  (18, -1.) 
row 14: (14, 5.) 
row 15: (10, -1.)  (15, 4.)  (16, -1.)  (20, -1.) 
row 16: (16, 5.) 
row 17: (12, -1.)  (16, -1.)  (17, 4.)  (18, -1.)  (22, -1.) 
row 18: (18, 5.) 
row 19: (14, -1.)  (18, -1.)  (19, 4.)  (24, -1.) 
row 20: (20, 5.) 
row 21: (16, -1.)  (20, -1.)  (21, 4.)  (22, -1.) 
row 22: (22, 5.) 
row 23: (18, -1.)  (22, -1.)  (23, 4.)  (24, -1.) 
row 24: (19, -1.)  (23, -1.)  (24, 4.) 
kl-18448:tests szampini$ ./ex11 -keep_nonzero_pattern
Mat Object: 1 MPI processes
  type: seqaij
row 0: (0, 5.)  (1, 0.)  (5, 0.) 
row 1: (0, -1.)  (1, 4.)  (2, -1.)  (6, -1.) 
row 2: (1, 0.)  (2, 5.)  (3, 0.)  (7, 0.) 
row 3: (2, -1.)  (3, 4.)  (4, -1.)  (8, -1.) 
row 4: (3, 0.)  (4, 5.)  (9, 0.) 
row 5: (0, -1.)  (5, 4.)  (6, -1.)  (10, -1.) 
row 6: (1, 0.)  (5, 0.)  (6, 5.)  (7, 0.)  (11, 0.) 
row 7: (2, -1.)  (6, -1.)  (7, 4.)  (8, -1.)  (12, -1.) 
row 8: (3, 0.)  (7, 0.)  (8, 5.)  (9, 0.)  (13, 0.) 
row 9: (4, -1.)  (8, -1.)  (9, 4.)  (14, -1.) 
row 10: (5, 0.)  (10, 5.)  (11, 0.)  (15, 0.) 
row 11: (6, -1.)  (10, -1.)  (11, 4.)  (12, -1.)  (16, -1.) 
row 12: (7, 0.)  (11, 0.)  (12, 5.)  (13, 0.)  (17, 0.) 
row 13: (8, -1.)  (12, -1.)  (13, 4.)  (14, -1.)  (18, -1.) 
row 14: (9, 0.)  (13, 0.)  (14, 5.)  (19, 0.) 
row 15: (10, -1.)  (15, 4.)  (16, -1.)  (20, -1.) 
row 16: (11, 0.)  (15, 0.)  (16, 5.)  (17, 0.)  (21, 0.) 
row 17: (12, -1.)  (16, -1.)  (17, 4.)  (18, -1.)  (22, -1.) 
row 18: (13, 0.)  (17, 0.)  (18, 5.)  (19, 0.)  (23, 0.) 
row 19: (14, -1.)  (18, -1.)  (19, 4.)  (24, -1.) 
row 20: (15, 0.)  (20, 5.)  (21, 0.) 
row 21: (16, -1.)  (20, -1.)  (21, 4.)  (22, -1.) 
row 22: (17, 0.)  (21, 0.)  (22, 5.)  (23, 0.) 
row 23: (18, -1.)  (22, -1.)  (23, 4.)  (24, -1.) 
row 24: (19, -1.)  (23, -1.)  (24, 4.)
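
For reference, a minimal sketch of the calls involved (hypothetical matrix A and row
index array rows[]; the option must be set before MatZeroRows is called):

  MatSetOption(A, MAT_KEEP_NONZERO_PATTERN, PETSC_TRUE);
  MatZeroRows(A, nrows, rows, 5.0, NULL, NULL);  /* zeroed off-diagonal entries stay as explicit zeros */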

> On Jul 15, 2021, at 4:41 PM, Alexander Lindsay  
> wrote:
> 
> My interpretation of the documentation page of MatZeroRows is that if I've 
> set MAT_KEEP_NONZERO_PATTERN to true, then my sparsity pattern shouldn't be 
> changed by a call to it, e.g. a->imax should not change. However, at least 
> for sequential matrices, MatAssemblyEnd is called with MAT_FINAL_ASSEMBLY at 
> the end of MatZeroRows_SeqAIJ and that does indeed change my sparsity 
> pattern. Is my interpretation of the documentation page wrong?
> 
> Alex



Re: [petsc-users] [EXTERNAL] Re: Problem with PCFIELDSPLIT

2021-07-14 Thread Stefano Zampini


> On Jul 14, 2021, at 5:01 PM, Tang, Qi  wrote:
> 
> Thanks a lot for the explanation, Matt and Stefano. That helps a lot.
> 
> Just to confirm, the comment in src/ts/impls/implicit/theta/theta.c seems to 
> indicate TS solves U_{n+1} in its SNES/KSP solve, but it actually solves 
> the update dU_n in U_{n+1} = U_n - lambda*dU_n in the solve. Right?

The SNES object solves the nonlinear equations as written in the comment of 
TSTHETA.

F[t0+Theta*dt, U, (U-U0)*shift] = 0


In case SNES is of type SNESLS (Newton), then the linearized equations are 
solved. The linear system matrix is the one provided by the IJacobian function:

J = dF/dU + shift dF/dUdot

If it is SNESKSPONLY (as it should be for TS_LINEAR), then only one step is 
taken and lambda = 1.
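
To make the convention concrete, here is a hedged sketch (not code from the thread) of
an IJacobian callback for the linear case U_t = A U discussed above, where
F(t, U, Udot) = Udot - A U and therefore J = dF/dU + shift*dF/dUdot = shift*I - A.
The constant matrix A being passed through the user context, and J being preallocated
with the same pattern as A, are assumptions made for illustration:

  static PetscErrorCode MyIJacobian(TS ts, PetscReal t, Vec U, Vec Udot,
                                    PetscReal shift, Mat J, Mat P, void *ctx)
  {
    Mat A = (Mat)ctx;                           /* the constant operator A             */
    MatZeroEntries(J);
    MatAXPY(J, -1.0, A, SAME_NONZERO_PATTERN);  /* J = -A           (dF/dU)            */
    MatShift(J, shift);                         /* J = shift*I - A  (+ shift*dF/dUdot) */
    /* in a real code, fill P the same way when P != J */
    return 0;
  }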

> 
> It actually makes a lot of sense, because KSPSolve in TSSolve reports it uses 
> zero initial guess. So if what I said is true, that effectively means it uses 
> U0 as the initial guess.
> 
> Qi
> 
>> On Jul 14, 2021, at 2:56 AM, Matthew Knepley > <mailto:knep...@gmail.com>> wrote:
>> 
>> On Wed, Jul 14, 2021 at 4:43 AM Stefano Zampini > <mailto:stefano.zamp...@gmail.com>> wrote:
>> Qi
>> 
>> Backward Euler is a special case of Theta methods in PETSc (Theta=1). In 
>> src/ts/impls/implicit/theta/theta.c on top of SNESTSFormFunction_Theta you 
>> have some explanation of what is solved for at each time step (see below). 
>> SNES then solves for the Newton update dy_n  and the next Newton iterate is 
>> computed as x_{n+1} = x_{n} - lambda * dy_n. Hope this helps.
>> 
>> In other words, you should be able to match the initial residual to
>> 
>>   F(t + dt, 0, -Un / dt)
>> 
>> for your IFunction. However, it is really not normal to use U = 0. The 
>> default is to use U = U0
>> as the initial guess I think.
>> 
>>   Thanks,
>> 
>>  Matt
>>  
>> /*
>>   This defines the nonlinear equation that is to be solved with SNES
>>   G(U) = F[t0+Theta*dt, U, (U-U0)*shift] = 0
>> 
>>   Note that U here is the stage argument. This means that U = U_{n+1} only 
>> if endpoint = true,
>>   otherwise U = theta U_{n+1} + (1 - theta) U0, which for the case of 
>> implicit midpoint is
>>   U = (U_{n+1} + U0)/2
>> */
>> static PetscErrorCode SNESTSFormFunction_Theta(SNES snes,Vec x,Vec y,TS ts)
>> 
>> 
>>> On Jul 14, 2021, at 6:12 AM, Tang, Qi >> <mailto:tan...@msu.edu>> wrote:
>>> 
>>> Hi,
>>> 
>>> During the process to experiment the suggestion Matt made, we ran into some 
>>> questions regarding to TSSolve vs KSPSolve. We got different initial 
>>> unpreconditioned residual using two solvers. Let’s say we solve the problem 
>>> with backward Euler and there is no rhs. We guess TSSolve solves
>>> (U^{n+1}-U^n)/dt = A U^{n+1}.
>>> (We only provides IJacobian in this case and turn on TS_LINEAR.)
>>> So we guess the initial unpreconditioned residual would be ||U^n/dt||_2, 
>>> which seems different from the residual we got from a backward Euler 
>>> stepping we implemented by ourself through KSPSolve.
>>> 
>>> Do we have some misunderstanding on TSSolve? 
>>> 
>>> Thanks,
>>> Qi
>>> T5@LANL
>>> 
>>> 
>>> 
>>>> On Jul 7, 2021, at 3:54 PM, Matthew Knepley >>> <mailto:knep...@gmail.com>> wrote:
>>>> 
>>>> On Wed, Jul 7, 2021 at 2:33 PM Jorti, Zakariae >>> <mailto:zjo...@lanl.gov>> wrote:
>>>> Hi Matt,
>>>> 
>>>> 
>>>> 
>>>> Thanks for your quick reply. 
>>>> 
>>>> I have not completely understood your suggestion, could you please 
>>>> elaborate a bit more? 
>>>> 
>>>> For your convenience, here is how I am proceeding for the moment in my 
>>>> code: 
>>>> 
>>>> 
>>>> 
>>>> TSGetKSP(ts,);
>>>> 
>>>> KSPGetPC(ksp,);  
>>>> 
>>>> PCSetType(pc,PCFIELDSPLIT);
>>>> 
>>>> PCFieldSplitSetDetectSaddlePoint(pc,PETSC_TRUE);
>>>> 
>>>> PCSetUp(pc);
>>>> 
>>>> PCFieldSplitGetSubKSP(pc, , );
>>>> 
>>>> KSPGetPC(subksp[1], &(subpc[1]));
>>>> 
>>>> I do not like the two lines above. We should not have to do this. 
>>>> KSPSetOperators(subksp[1],T,T);
>>>> 
>>>>  In the above line, I want you to use a separate preconditioning matrix M, 
>>>>

Re: [petsc-users] [EXTERNAL] Re: Problem with PCFIELDSPLIT

2021-07-14 Thread Stefano Zampini
Qi

Backward Euler is a special case of Theta methods in PETSc (Theta=1). In 
src/ts/impls/implicit/theta/theta.c on top of SNESTSFormFunction_Theta you have 
some explanation of what is solved for at each time step (see below). SNES then 
solves for the Newton update dy_n  and the next Newton iterate is computed as 
x_{n+1} = x_{n} - lambda * dy_n. Hope this helps.

/*
  This defines the nonlinear equation that is to be solved with SNES
  G(U) = F[t0+Theta*dt, U, (U-U0)*shift] = 0

  Note that U here is the stage argument. This means that U = U_{n+1} only if 
endpoint = true,
  otherwise U = theta U_{n+1} + (1 - theta) U0, which for the case of implicit 
midpoint is
  U = (U_{n+1} + U0)/2
*/
static PetscErrorCode SNESTSFormFunction_Theta(SNES snes,Vec x,Vec y,TS ts)


> On Jul 14, 2021, at 6:12 AM, Tang, Qi  wrote:
> 
> Hi,
> 
> During the process to experiment the suggestion Matt made, we ran into some 
> questions regarding to TSSolve vs KSPSolve. We got different initial 
> unpreconditioned residual using two solvers. Let’s say we solve the problem 
> with backward Euler and there is no rhs. We guess TSSolve solves
> (U^{n+1}-U^n)/dt = A U^{n+1}.
> (We only provides IJacobian in this case and turn on TS_LINEAR.)
> So we guess the initial unpreconditioned residual would be ||U^n/dt||_2, 
> which seems different from the residual we got from a backward Euler stepping 
> we implemented by ourself through KSPSolve.
> 
> Do we have some misunderstanding on TSSolve? 
> 
> Thanks,
> Qi
> T5@LANL
> 
> 
> 
>> On Jul 7, 2021, at 3:54 PM, Matthew Knepley > > wrote:
>> 
>> On Wed, Jul 7, 2021 at 2:33 PM Jorti, Zakariae > > wrote:
>> Hi Matt,
>> 
>> 
>> 
>> Thanks for your quick reply. 
>> 
>> I have not completely understood your suggestion, could you please elaborate 
>> a bit more? 
>> 
>> For your convenience, here is how I am proceeding for the moment in my code: 
>> 
>> 
>> 
>> TSGetKSP(ts,);
>> 
>> KSPGetPC(ksp,);  
>> 
>> PCSetType(pc,PCFIELDSPLIT);
>> 
>> PCFieldSplitSetDetectSaddlePoint(pc,PETSC_TRUE);
>> 
>> PCSetUp(pc);
>> 
>> PCFieldSplitGetSubKSP(pc, , );
>> 
>> KSPGetPC(subksp[1], &(subpc[1]));
>> 
>> I do not like the two lines above. We should not have to do this. 
>> KSPSetOperators(subksp[1],T,T);
>> 
>>  In the above line, I want you to use a separate preconditioning matrix M, 
>> instead of T. That way, it will provide
>> the preconditioning matrix for your Schur complement problem.
>> 
>>   Thanks,
>> 
>>   Matt
>> KSPSetUp(subksp[1]);
>> 
>> PetscFree(subksp);
>> 
>> TSSolve(ts,X);
>> 
>> 
>> 
>> Thank you.
>> 
>> Best,
>> 
>> 
>> 
>> Zakariae
>> 
>> From: Matthew Knepley mailto:knep...@gmail.com>>
>> Sent: Wednesday, July 7, 2021 12:11:10 PM
>> To: Jorti, Zakariae
>> Cc: petsc-users@mcs.anl.gov ; Tang, Qi; 
>> Tang, Xianzhu
>> Subject: [EXTERNAL] Re: [petsc-users] Problem with PCFIELDSPLIT
>>  
>> On Wed, Jul 7, 2021 at 1:51 PM Jorti, Zakariae via petsc-users 
>> mailto:petsc-users@mcs.anl.gov>> wrote:
>> Hi,
>> 
>> 
>> 
>> I am trying to build a PCFIELDSPLIT preconditioner for a matrix 
>> 
>> J = [A00  A01]
>>     [A10  A11]
>> 
>> that has the following shape: 
>> 
>> M_{user}^{-1} = [I  -ksp(A00) A01] [ksp(A00)    0   ] [      I          0]
>>                 [0        I      ] [   0      ksp(T)] [-A10 ksp(A00)    I]
>> 
>> 
>> 
>> where T is a user-defined Schur complement approximation that replaces the 
>> true Schur complement S:= A11 - A10 ksp(A00) A01.
>> 
>> 
>> 
>> I am trying to do something similar to this example (lines 41--45 and 
>> 116--121): 
>> https://www.mcs.anl.gov/petsc/petsc-current/src/snes/tutorials/ex70.c.html 
>> 
>> 
>> The problem I have is that I manage to replace S with T on a separate single 
>> linear system but not for the linear systems generated by my time-dependent 
>> PDE. Even if I set the preconditioner M_{user}^{-1} correctly, the T matrix 
>> gets replaced by S in the preconditioner once I call TSSolve. 
>> 
>> Do you have any suggestions how to fix this knowing that the matrix J does 
>> not change over time?
>> 
>> 
>> I don't like how it is done in that example for this very reason.
>> 
>> When I want to use a custom preconditioning matrix for the Schur complement, 
>> I always give a preconditioning matrix M to the outer solve.
>> Then PCFIELDSPLIT automatically pulls the correct block from M, (1,1) for 
>> the Schur complement, for that preconditioning matrix without
>> extra code. Can you do this?
>> 
>>   Thanks,
>> 
>> Matt
>> Many thanks.
>> 
>> 
>> 
>> Best regards,
>> 
>> 
>> 
>> Zakariae   
>> 
>> 
>> 
>> 
>> -- 
>> What most experimenters take for granted before they begin their experiments 
>> is 
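
As a footnote to the suggestion quoted above (give the outer solve a separate
preconditioning matrix and let PCFIELDSPLIT pull its (1,1) block for the Schur
complement), a hedged sketch with hypothetical matrices J (system matrix) and M
(preconditioning matrix, same block layout as J but with T in its (1,1) block) and a
hypothetical IJacobian callback and context:

  TSSetIJacobian(ts, J, M, FormIJacobian, &ctx);
  TSGetKSP(ts, &ksp);
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCFIELDSPLIT);
  PCFieldSplitSetDetectSaddlePoint(pc, PETSC_TRUE);
  PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_A11, NULL);  /* use M's (1,1) block (the default) */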

Re: [petsc-users] CUDA running out of memory in PtAP

2021-07-09 Thread Stefano Zampini
Mark

Can you test with https://gitlab.com/petsc/petsc/-/merge_requests/4158?
It is off the release branch.

> On Jul 7, 2021, at 4:24 PM, Mark Adams  wrote:
> 
> I think that is a good idea. I am trying to do it myself but it is getting 
> messy.
> Thanks,
> 
> On Wed, Jul 7, 2021 at 9:50 AM Stefano Zampini  <mailto:stefano.zamp...@gmail.com>> wrote:
> Do you want me to open an MR to handle the sequential case?
> 
>> On Jul 7, 2021, at 3:39 PM, Mark Adams > <mailto:mfad...@lbl.gov>> wrote:
>> 
>> OK, I found where its not protected in sequential.
>> 
>> On Wed, Jul 7, 2021 at 9:25 AM Mark Adams > <mailto:mfad...@lbl.gov>> wrote:
>> Thanks, but that did not work. 
>> 
>> It looks like this is just in MPIAIJ, but I am using SeqAIJ. ex2 (below) 
>> uses PETSC_COMM_SELF everywhere.
>> 
>> + srun -G 1 -n 16 -c 1 --cpu-bind=cores --ntasks-per-core=2 
>> /global/homes/m/madams/mps-wrapper.sh ../ex2 -dm_landau_device_type cuda 
>> -dm_mat_type aijcusparse -dm_vec_type cuda -log_view -pc_type gamg -ksp_type 
>> gmres -pc_gamg_reuse_interpolation -matmatmult_backend_cpu 
>> -matptap_backend_cpu -dm_landau_ion_masses .0005,1,1,1,1,1,1,1,1 
>> -dm_landau_ion_charges 1,2,3,4,5,6,7,8,9 -dm_landau_thermal_temps 
>> 1,1,1,1,1,1,1,1,1,1 -dm_landau_n 
>> 1.03,.5,1e-7,1e-7,1e-7,1e-7,1e-7,1e-7,1e-7,1e-7
>> 0 starting nvidia-cuda-mps-control on cgpu17
>> mps ready: 2021-07-07T06:17:36-07:00
>> masses:e= 9.109e-31; ions in proton mass units:5.000e-04  
>> 1.000e+00 ...
>> charges:   e=-1.602e-19; charges in elementary units:  1.000e+00  
>> 2.000e+00
>> thermal T (K): e= 1.160e+07 i= 1.160e+07 imp= 1.160e+07. v_0= 1.326e+07 n_0= 
>> 1.000e+20 t_0= 5.787e-06 domain= 5.000e+00
>> CalculateE j0=0. Ec = 0.050991
>> 0 TS dt 1. time 0.
>>   0) species-0: charge density= -1.6054532569865e+01 z-momentum= 
>> -1.9059929215360e-19 energy=  2.4178543516210e+04
>>   0) species-1: charge density=  8.0258396545108e+00 z-momentum=  
>> 7.0660527288120e-20 energy=  1.2082380663859e+04
>>   0) species-2: charge density=  6.3912608577597e-05 z-momentum= 
>> -1.1513901010709e-24 energy=  3.5799558195524e-01
>>   0) species-3: charge density=  9.5868912866395e-05 z-momentum= 
>> -1.1513901010709e-24 energy=  3.5799558195524e-01
>>   0) species-4: charge density=  1.2782521715519e-04 z-momentum= 
>> -1.1513901010709e-24 energy=  3.5799558195524e-01
>> [7]PETSC ERROR: - Error Message 
>> --
>> [7]PETSC ERROR: GPU resources unavailable 
>> [7]PETSC ERROR: CUDA error 2 (cudaErrorMemoryAllocation) : out of memory. 
>> Reports alloc failed; this indicates the GPU has run out resources
>> [7]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>> [7]PETSC ERROR: Petsc Development GIT revision: v3.15.1-569-g270a066c1e  GIT 
>> Date: 2021-07-06 03:22:54 -0700
>> [7]PETSC ERROR: ../ex2 on a arch-cori-gpu-opt-gcc named cgpu17 by madams Wed 
>> Jul  7 06:17:38 2021
>> [7]PETSC ERROR: Configure options 
>> --with-mpi-dir=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc 
>> --with-cuda-dir=/usr/common/software/sles15_cgpu/cuda/11.1.1 --CFLAGS="   -g 
>> -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CXXFLAGS=" -g 
>> -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CUDAFLAGS="-g 
>> -Xcompiler -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 
>> -DLANDAU_MAX_Q=4" --FFLAGS="   -g " --COPTFLAGS="   -O3" --CXXOPTFLAGS=" 
>> -O3" --FOPTFLAGS="   -O3" --download-fblaslapack=1 --with-debugging=0 
>> --with-mpiexec="srun -G 1" --with-cuda-gencodearch=70 --with-batch=0 
>> --with-cuda=1 --download-p4est=1 --download-hypre=1 --with-zlib=1 
>> PETSC_ARCH=arch-cori-gpu-opt-gcc
>> [7]PETSC ERROR: #1 MatProductSymbolic_SeqAIJCUSPARSE_SeqAIJCUSPARSE() at 
>> /global/u2/m/madams/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2622
>> [7]PETSC ERROR: #2 MatProductSymbolic_ABC_Basic() at 
>> /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:1146
>> [7]PETSC ERROR: #3 MatProductSymbolic() at 
>> /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:799
>> [7]PETSC ERROR: #4 MatPtAP() at 
>> /global/u2/m/madams/petsc/src/mat/interface/matrix.c:9626
>> [7]PETSC ERROR: #5 PCGAMGCreateLevel_GAMG() at

Re: [petsc-users] download zlib error

2021-07-07 Thread Stefano Zampini
There's an extra comma
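
The offending line is the one visible in the traceback quoted below,
'--LDFLAGS=-L'+os.environ['ROCM_PATH'],+'lib -lhsa-runtime64', where the comma after
os.environ['ROCM_PATH'] splits the argument and leaves a unary + in front of the next
string. A sketch of the likely intended line (the '/lib' separator is an assumption):

  '--LDFLAGS=-L' + os.environ['ROCM_PATH'] + '/lib -lhsa-runtime64',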

Il Mer 7 Lug 2021, 18:08 Mark Adams  ha scritto:

> Humm, I get this error (I just copied your whole file into here):
>
> 12:06 jczhang/fix-kokkos-includes=
> /gpfs/alpine/csc314/scratch/adams/petsc$ ~/arch-spock-dbg-cray-kokkos.py
> Traceback (most recent call last):
>   File "/ccs/home/adams/arch-spock-dbg-cray-kokkos.py", line 27, in
> 
> '--LDFLAGS=-L'+os.environ['ROCM_PATH'],+'lib -lhsa-runtime64',
> TypeError: bad operand type for unary +: 'str'
>
> On Wed, Jul 7, 2021 at 11:08 AM Stefano Zampini 
> wrote:
>
>> Mark
>>
>> On Spock, you can use
>> https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-olcf-spock.py
>>  as
>> a template for your configuration. You need to add libraries as LDFLAGS to
>> resolve the hsa symbols
>>
>> On Jul 7, 2021, at 5:04 PM, Mark Adams  wrote:
>>
>> Thanks,
>>
>> 08:30 jczhang/fix-kokkos-includes=
>> /gpfs/alpine/csc314/scratch/adams/petsc$ cd
>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-spock-opt-cray-kokkos/externalpackages/zlib-1.2.11
>> && CC="cc" CFLAGS="-fPIC -fstack-protector -Qunused-arguments -g -O0
>> -I${ROCM_PATH}/include"
>> prefix="/gpfs/alpine/csc314/scratch/adams/petsc/arch-spock-opt-cray-kokkos"
>> ./configure  && /usr/bin/gmake -j8 -l307.2 &&  /usr/bin/gmake install
>> Checking for shared library support...
>> Building shared library libz.so.1.2.11 with cc.
>> Checking for size_t... Yes.
>> Checking for off64_t... Yes.
>> Checking for fseeko... Yes.
>> Checking for strerror... No.
>> Checking for unistd.h... Yes.
>> Checking for stdarg.h... Yes.
>> Checking whether to use vs[n]printf() or s[n]printf()... using
>> vs[n]printf().
>> Checking for vsnprintf() in stdio.h... No.
>>   WARNING: vsnprintf() not found, falling back to vsprintf(). zlib
>>   can build but will be open to possible buffer-overflow security
>>   vulnerabilities.
>> Checking for return value of vsprintf()... Yes.
>> Checking for attribute(visibility) support... Yes.
>> cc -fPIC -fstack-protector -Qunused-arguments -g -O0
>> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1
>> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN -I. -c -o example.o
>> test/example.c
>> cc -fPIC -fstack-protector -Qunused-arguments -g -O0
>> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1
>> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o adler32.o adler32.c
>> cc -fPIC -fstack-protector -Qunused-arguments -g -O0
>> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1
>> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o crc32.o crc32.c
>> cc -fPIC -fstack-protector -Qunused-arguments -g -O0
>> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1
>> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o deflate.o deflate.c
>> cc -fPIC -fstack-protector -Qunused-arguments -g -O0
>> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1
>> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o infback.o infback.c
>> cc -fPIC -fstack-protector -Qunused-arguments -g -O0
>> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1
>> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o inffast.o inffast.c
>> cc -fPIC -fstack-protector -Qunused-arguments -g -O0
>> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1
>> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o inflate.o inflate.c
>> cc -fPIC -fstack-protector -Qunused-arguments -g -O0
>> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1
>> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o inftrees.o inftrees.c
>> cc -fPIC -fstack-protector -Qunused-arguments -g -O0
>> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1
>> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o trees.o trees.c
>> cc -fPIC -fstack-protector -Qunused-arguments -g -O0
>> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1
>> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o zutil.o zutil.c
>> cc -fPIC -fstack-protector -Qunused-arguments -g -O0
>> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1
>> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o compress.o compress.c
>> cc -fPIC -fstack-protector -Qunused-arguments -g -O0
>> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1
>> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o uncompr.o uncompr.c
>> cc -fPIC -fstack-protector -Qunused-argu

Re: [petsc-users] download zlib error

2021-07-07 Thread Stefano Zampini
Mark

On Spock, you can use 
https://gitlab.com/petsc/petsc/-/blob/main/config/examples/arch-olcf-spock.py 
 
as a template for your configuration. You need to add libraries as LDFLAGS to 
resolve the hsa symbols

> On Jul 7, 2021, at 5:04 PM, Mark Adams  wrote:
> 
> Thanks,
> 
> 08:30 jczhang/fix-kokkos-includes= /gpfs/alpine/csc314/scratch/adams/petsc$ 
> cd 
> /gpfs/alpine/csc314/scratch/adams/petsc/arch-spock-opt-cray-kokkos/externalpackages/zlib-1.2.11
>  && CC="cc" CFLAGS="-fPIC -fstack-protector -Qunused-arguments -g -O0 
> -I${ROCM_PATH}/include" 
> prefix="/gpfs/alpine/csc314/scratch/adams/petsc/arch-spock-opt-cray-kokkos" 
> ./configure  && /usr/bin/gmake -j8 -l307.2 &&  /usr/bin/gmake install
> Checking for shared library support...
> Building shared library libz.so.1.2.11 with cc.
> Checking for size_t... Yes.
> Checking for off64_t... Yes.
> Checking for fseeko... Yes.
> Checking for strerror... No.
> Checking for unistd.h... Yes.
> Checking for stdarg.h... Yes.
> Checking whether to use vs[n]printf() or s[n]printf()... using vs[n]printf().
> Checking for vsnprintf() in stdio.h... No.
>   WARNING: vsnprintf() not found, falling back to vsprintf(). zlib
>   can build but will be open to possible buffer-overflow security
>   vulnerabilities.
> Checking for return value of vsprintf()... Yes.
> Checking for attribute(visibility) support... Yes.
> cc -fPIC -fstack-protector -Qunused-arguments -g -O0 
> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1 
> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN -I. -c -o example.o test/example.c
> cc -fPIC -fstack-protector -Qunused-arguments -g -O0 
> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1 
> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o adler32.o adler32.c
> cc -fPIC -fstack-protector -Qunused-arguments -g -O0 
> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1 
> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o crc32.o crc32.c
> cc -fPIC -fstack-protector -Qunused-arguments -g -O0 
> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1 
> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o deflate.o deflate.c
> cc -fPIC -fstack-protector -Qunused-arguments -g -O0 
> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1 
> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o infback.o infback.c
> cc -fPIC -fstack-protector -Qunused-arguments -g -O0 
> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1 
> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o inffast.o inffast.c
> cc -fPIC -fstack-protector -Qunused-arguments -g -O0 
> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1 
> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o inflate.o inflate.c
> cc -fPIC -fstack-protector -Qunused-arguments -g -O0 
> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1 
> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o inftrees.o inftrees.c
> cc -fPIC -fstack-protector -Qunused-arguments -g -O0 
> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1 
> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o trees.o trees.c
> cc -fPIC -fstack-protector -Qunused-arguments -g -O0 
> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1 
> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o zutil.o zutil.c
> cc -fPIC -fstack-protector -Qunused-arguments -g -O0 
> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1 
> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o compress.o compress.c
> cc -fPIC -fstack-protector -Qunused-arguments -g -O0 
> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1 
> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o uncompr.o uncompr.c
> cc -fPIC -fstack-protector -Qunused-arguments -g -O0 
> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1 
> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o gzclose.o gzclose.c
> cc -fPIC -fstack-protector -Qunused-arguments -g -O0 
> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1 
> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o gzlib.o gzlib.c
> cc -fPIC -fstack-protector -Qunused-arguments -g -O0 
> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1 
> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o gzread.o gzread.c
> cc -fPIC -fstack-protector -Qunused-arguments -g -O0 
> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1 
> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN  -c -o gzwrite.o gzwrite.c
> cc -fPIC -fstack-protector -Qunused-arguments -g -O0 
> -I/sw/spock/spack-envs/views/rocm-4.1.0/include -D_LARGEFILE64_SOURCE=1 
> -DNO_STRERROR -DNO_vsnprintf -DHAVE_HIDDEN -I. -c -o minigzip.o 
> test/minigzip.c
> cc -fPIC -fstack-protector -Qunused-arguments -g -O0 
> 

Re: [petsc-users] CUDA running out of memory in PtAP

2021-07-07 Thread Stefano Zampini
etsc/src/snes/interface/snes.c:4769
> [7]PETSC ERROR: #13 TSTheta_SNESSolve() at 
> /global/u2/m/madams/petsc/src/ts/impls/implicit/theta/theta.c:185
> [7]PETSC ERROR: #14 TSStep_Theta() at 
> /global/u2/m/madams/petsc/src/ts/impls/implicit/theta/theta.c:223
> [7]PETSC ERROR: #15 TSStep() at 
> /global/u2/m/madams/petsc/src/ts/interface/ts.c:3571
> [7]PETSC ERROR: #16 TSSolve() at 
> /global/u2/m/madams/petsc/src/ts/interface/ts.c:3968
> [7]PETSC ERROR: #17 main() at ex2.c:699
> [7]PETSC ERROR: PETSc Option Table entries:
> [7]PETSC ERROR: -dm_landau_amr_levels_max 0
> [7]PETSC ERROR: -dm_landau_amr_post_refine 5
> [7]PETSC ERROR: -dm_landau_device_type cuda
> [7]PETSC ERROR: -dm_landau_domain_radius 5
> [7]PETSC ERROR: -dm_landau_Ez 0
> [7]PETSC ERROR: -dm_landau_ion_charges 1,2,3,4,5,6,7,8,9
> [7]PETSC ERROR: -dm_landau_ion_masses .0005,1,1,1,1,1,1,1,1
> [7]PETSC ERROR: -dm_landau_n 
> 1.03,.5,1e-7,1e-7,1e-7,1e-7,1e-7,1e-7,1e-7,1e-7
> [7]PETSC ERROR: -dm_landau_thermal_temps 1,1,1,1,1,1,1,1,1,1
> [7]PETSC ERROR: -dm_landau_type p4est
> [7]PETSC ERROR: -dm_mat_type aijcusparse
> [7]PETSC ERROR: -dm_preallocate_only
> [7]PETSC ERROR: -dm_vec_type cuda
> [7]PETSC ERROR: -ex2_connor_e_field_units
> [7]PETSC ERROR: -ex2_impurity_index 1
> [7]PETSC ERROR: -ex2_plot_dt 200
> [7]PETSC ERROR: -ex2_test_type none
> [7]PETSC ERROR: -ksp_type gmres
> [7]PETSC ERROR: -log_view
> [7]PETSC ERROR: -matmatmult_backend_cpu
> [7]PETSC ERROR: -matptap_backend_cpu
> [7]PETSC ERROR: -pc_gamg_reuse_interpolation
> [7]PETSC ERROR: -pc_type gamg
> [7]PETSC ERROR: -petscspace_degree 1
> [7]PETSC ERROR: -snes_max_it 15
> [7]PETSC ERROR: -snes_rtol 1.e-6
> [7]PETSC ERROR: -snes_stol 1.e-6
> [7]PETSC ERROR: -ts_adapt_scale_solve_failed 0.5
> [7]PETSC ERROR: -ts_adapt_time_step_increase_delay 5
> [7]PETSC ERROR: -ts_dt 1
> [7]PETSC ERROR: -ts_exact_final_time stepover
> [7]PETSC ERROR: -ts_max_snes_failures -1
> [7]PETSC ERROR: -ts_max_steps 10
> [7]PETSC ERROR: -ts_max_time 300
> [7]PETSC ERROR: -ts_rtol 1e-2
> [7]PETSC ERROR: -ts_type beuler
> 
> On Wed, Jul 7, 2021 at 4:07 AM Stefano Zampini  <mailto:stefano.zamp...@gmail.com>> wrote:
> This will select the CPU path
> 
> -matmatmult_backend_cpu -matptap_backend_cpu
> 
>> On Jul 7, 2021, at 2:43 AM, Mark Adams > <mailto:mfad...@lbl.gov>> wrote:
>> 
>> Can I turn off using cuSprarse for RAP?
>> 
>> On Tue, Jul 6, 2021 at 6:25 PM Barry Smith > <mailto:bsm...@petsc.dev>> wrote:
>> 
>>   Stefano has mentioned this before. He reported cuSparse matrix-matrix 
> vector products use a very large amount of memory.
>> 
>>> On Jul 6, 2021, at 4:33 PM, Mark Adams >> <mailto:mfad...@lbl.gov>> wrote:
>>> 
>>> I am running out of memory in GAMG. It looks like this is from the new 
>>> cuSparse RAP.
>>> I was able to run Hypre with twice as much work on the GPU as this run.
>>> Are there parameters to tweek for this perhaps or can I disable it?
>>> 
>>> Thanks,
>>> Mark 
>>> 
>>>0 SNES Function norm 5.442539952302e-04 
>>> [2]PETSC ERROR: - Error Message 
>>> --
>>> [2]PETSC ERROR: GPU resources unavailable 
>>> [2]PETSC ERROR: CUDA error 2 (cudaErrorMemoryAllocation) : out of memory. 
>>> Reports alloc failed; this indicates the GPU has run out resources
>>> [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>> [2]PETSC ERROR: Petsc Development GIT revision: v3.15.1-569-g270a066c1e  
>>> GIT Date: 2021-07-06 03:22:54 -0700
>>> [2]PETSC ERROR: ../ex2 on a arch-cori-gpu-opt-gcc named cgpu11 by madams 
>>> Tue Jul  6 13:37:43 2021
>>> [2]PETSC ERROR: Configure options 
>>> --with-mpi-dir=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc 
>>> --with-cuda-dir=/usr/common/software/sles15_cgpu/cuda/11.1.1 --CFLAGS="   
>>> -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECI
>>> ES=10 -DLANDAU_MAX_Q=4" --CXXFLAGS=" -g -DLANDAU_DIM=2 
>>> -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CUDAFLAGS="-g -Xcompiler 
>>> -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" 
>>> --FFLAGS="   -g " -
>>> -COPTFLAGS="   -O3" --CXXOPTFLAGS=" -O3" --FOPTFLAGS="   -O3" 
>>> --download-fblaslapack=1 --with-debugging=0 --with-mpiexec="srun -G 1" 
>>> --with-cuda-gencodearch=70 --with-batch=0 --with-cuda=1 -

Re: [petsc-users] CUDA running out of memory in PtAP

2021-07-07 Thread Stefano Zampini
This will select the CPU path

-matmatmult_backend_cpu -matptap_backend_cpu

> On Jul 7, 2021, at 2:43 AM, Mark Adams  wrote:
> 
> Can I turn off using cuSprarse for RAP?
> 
> On Tue, Jul 6, 2021 at 6:25 PM Barry Smith  > wrote:
> 
>   Stefano has mentioned this before. He reported cuSparse matrix-matrix 
> vector products use a very large amount of memory.
> 
>> On Jul 6, 2021, at 4:33 PM, Mark Adams > > wrote:
>> 
>> I am running out of memory in GAMG. It looks like this is from the new 
>> cuSparse RAP.
>> I was able to run Hypre with twice as much work on the GPU as this run.
>> Are there parameters to tweek for this perhaps or can I disable it?
>> 
>> Thanks,
>> Mark 
>> 
>>0 SNES Function norm 5.442539952302e-04 
>> [2]PETSC ERROR: - Error Message 
>> --
>> [2]PETSC ERROR: GPU resources unavailable 
>> [2]PETSC ERROR: CUDA error 2 (cudaErrorMemoryAllocation) : out of memory. 
>> Reports alloc failed; this indicates the GPU has run out resources
>> [2]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html 
>>  for trouble shooting.
>> [2]PETSC ERROR: Petsc Development GIT revision: v3.15.1-569-g270a066c1e  GIT 
>> Date: 2021-07-06 03:22:54 -0700
>> [2]PETSC ERROR: ../ex2 on a arch-cori-gpu-opt-gcc named cgpu11 by madams Tue 
>> Jul  6 13:37:43 2021
>> [2]PETSC ERROR: Configure options 
>> --with-mpi-dir=/usr/common/software/sles15_cgpu/openmpi/4.0.3/gcc 
>> --with-cuda-dir=/usr/common/software/sles15_cgpu/cuda/11.1.1 --CFLAGS="   -g 
>> -DLANDAU_DIM=2 -DLANDAU_MAX_SPECI
>> ES=10 -DLANDAU_MAX_Q=4" --CXXFLAGS=" -g -DLANDAU_DIM=2 
>> -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CUDAFLAGS="-g -Xcompiler 
>> -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" 
>> --FFLAGS="   -g " -
>> -COPTFLAGS="   -O3" --CXXOPTFLAGS=" -O3" --FOPTFLAGS="   -O3" 
>> --download-fblaslapack=1 --with-debugging=0 --with-mpiexec="srun -G 1" 
>> --with-cuda-gencodearch=70 --with-batch=0 --with-cuda=1 --download-p4est=1 --
>> download-hypre=1 --with-zlib=1 PETSC_ARCH=arch-cori-gpu-opt-gcc
>> [2]PETSC ERROR: #1 MatProductSymbolic_SeqAIJCUSPARSE_SeqAIJCUSPARSE() at 
>> /global/u2/m/madams/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu:2622
>>  
>> [2]PETSC ERROR: #2 MatProductSymbolic_ABC_Basic() at 
>> /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:1159
>> [2]PETSC ERROR: #3 MatProductSymbolic() at 
>> /global/u2/m/madams/petsc/src/mat/interface/matproduct.c:799
>> [2]PETSC ERROR: #4 MatPtAP() at 
>> /global/u2/m/madams/petsc/src/mat/interface/matrix.c:9626
>> [2]PETSC ERROR: #5 PCGAMGCreateLevel_GAMG() at 
>> /global/u2/m/madams/petsc/src/ksp/pc/impls/gamg/gamg.c:87
>> [2]PETSC ERROR: #6 PCSetUp_GAMG() at 
>> /global/u2/m/madams/petsc/src/ksp/pc/impls/gamg/gamg.c:663
>> [2]PETSC ERROR: #7 PCSetUp() at 
>> /global/u2/m/madams/petsc/src/ksp/pc/interface/precon.c:1014
>> [2]PETSC ERROR: #8 KSPSetUp() at 
>> /global/u2/m/madams/petsc/src/ksp/ksp/interface/itfunc.c:406
>> [2]PETSC ERROR: #9 KSPSolve_Private() at 
>> /global/u2/m/madams/petsc/src/ksp/ksp/interface/itfunc.c:850
>> [2]PETSC ERROR: #10 KSPSolve() at 
>> /global/u2/m/madams/petsc/src/ksp/ksp/interface/itfunc.c:1084
>> [2]PETSC ERROR: #11 SNESSolve_NEWTONLS() at 
>> /global/u2/m/madams/petsc/src/snes/impls/ls/ls.c:225
>> [2]PETSC ERROR: #12 SNESSolve() at 
>> /global/u2/m/madams/petsc/src/snes/interface/snes.c:4769
>> [2]PETSC ERROR: #13 TSTheta_SNESSolve() at 
>> /global/u2/m/madams/petsc/src/ts/impls/implicit/theta/theta.c:185
>> [2]PETSC ERROR: #14 TSStep_Theta() at 
>> /global/u2/m/madams/petsc/src/ts/impls/implicit/theta/theta.c:223
>> [2]PETSC ERROR: #15 TSStep() at 
>> /global/u2/m/madams/petsc/src/ts/interface/ts.c:3571
>> [2]PETSC ERROR: #16 TSSolve() at 
>> /global/u2/m/madams/petsc/src/ts/interface/ts.c:3968
>> [2]PETSC ERROR: #17 main() at ex2.c:699
> 



Re: [petsc-users] PETSc with Julia Binary Builder

2021-07-02 Thread Stefano Zampini
Patrick

Should this be fixed in the PETSc build system? 
https://github.com/JuliaPackaging/Yggdrasil/blob/master/P/PETSc/bundled/patches/petsc_name_mangle.patch

> On Jul 2, 2021, at 9:05 AM, Patrick Sanan  wrote:
> 
> As you mention in [4], the proximate cause of the configure failure is this 
> link error [8]:
> 
> Naively, that looks like a problem to be resolved at the level of the C++ 
> compiler and MPI.
> 
> Unless there are wrinkles of this build process that I don't understand 
> (likely), this [6] looks non-standard to me:
> 
>   includedir="${prefix}/include"
>   ...
>   ./configure --prefix=${prefix} \
>   ...
>   -with-mpi-include="${includedir}" \
>   ...
> 
> 
> Is it possible to configure using  --with-mpi-dir, instead of the separate 
> --with-mpi-include and --with-mpi-lib commands? 
> 
> 
> As an aside, maybe Satish can say more, but I'm not sure if it's advisable to 
> override variables in the make command [7].
> 
> [8]   
> https://gist.github.com/jkozdon/c161fb15f2df23c3fbc0a5a095887ef8#file-configure-log-L7795
> [6]   
> https://gist.github.com/jkozdon/c161fb15f2df23c3fbc0a5a095887ef8#file-build_tarballs-jl-L45
> [7]   
> https://gist.github.com/jkozdon/c161fb15f2df23c3fbc0a5a095887ef8#file-build_tarballs-jl-L55
> 
> 
>> Am 02.07.2021 um 06:25 schrieb Kozdon, Jeremy (CIV) :
>> 
>> I have been talking with Boris Kaus and Patrick Sanan about trying to revive 
>> the Julia PETSc interface wrappers. One of the first things to get going is 
>> to use Julia's binary builder [1] to wrap more scalar, real, and int type 
>> builds of the PETSc library; the current distribution is just Real, double, 
>> Int32. I've been working on a PR for this [2] but have been running into 
>> some build issues on some architectures [3].
>> 
>> I doubt that anyone here is an expert with Julia's binary builder system, 
>> but I was wondering if anyone who is better with the PETSc build system can 
>> see anything obvious from the configure.log [4] that might help me sort out 
>> what's going on.
>> 
>> This exact script worked on 2020-08-20 [5] to build the libraries, se 
>> something has obviously changed with either the Julia build system and/or 
>> one (or more!) of the dependency binaries.
>> 
>> For those that don't know, Julia's binary builder system essentially allows 
>> users to download binaries directly from the web for any system that the 
>> Julia Programing language distributes binaries for. So a (desktop) user can 
>> get MPI, PETSc, etc. without the headache of having to build anything from 
>> scratch; obviously on clusters you would still want to use system MPIs and 
>> what not.
>> 
>> 
>> 
>> [1] https://github.com/JuliaPackaging/BinaryBuilder.jl
>> [2] https://github.com/JuliaPackaging/Yggdrasil/pull/3249
>> [3] 
>> https://github.com/JuliaPackaging/Yggdrasil/pull/3249#issuecomment-872698681
>> [4] 
>> https://gist.github.com/jkozdon/c161fb15f2df23c3fbc0a5a095887ef8#file-configure-log
>> [5] 
>> https://github.com/JuliaBinaryWrappers/PETSc_jll.jl/releases/tag/PETSc-v3.13.4%2B0
> 



Re: [petsc-users] reproducibility

2021-06-14 Thread Stefano Zampini
Mark

I presume in your first message you report the SHA1 as listed by log_view.
That string is populated at configure time, not at runtime.
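
For reference, a minimal sketch of querying the version/revision information from a
running program; like the -log_view header, this reports strings baked in at configure
time rather than the state of the source tree at run time:

  char version[256];
  PetscGetVersion(version, sizeof(version));
  PetscPrintf(PETSC_COMM_WORLD, "%s\n", version);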


Re: [petsc-users] SLEPc: non-real singular vectors from SVD of real matrix

2021-06-02 Thread Stefano Zampini
Peder 

We have a fix for the hdf5 complex reader here if you want to give it a try 
https://gitlab.com/petsc/petsc/-/merge_requests/4044

Sorry it took so long and thank you for reporting the bug

> On Apr 29, 2021, at 2:55 PM, Peder Jørgensgaard Olesen via petsc-users 
>  wrote:
> 
> Thank you for your advice, Jose.
> 
> I tried using a single column of the data matrix as a basis for left singular 
> vector space, and a row for the right one - and lo and behold, all singular 
> vectors become real.
> 
> Somehow it did not occur to me that what I was looking at was simply a 
> complex phase on U and V which would cancel upon computing UΣV*. In other 
> words, rather than an incorrect result it was a slightly inconvenient 
> representation of a correct result.
> 
> Best regards
> Peder
> Fra: Jose E. Roman 
> Sendt: 29. april 2021 13:00:03
> Til: Peder Jørgensgaard Olesen
> Cc: petsc-users@mcs.anl.gov
> Emne: Re: [petsc-users] SLEPc: non-real singular vectors from SVD of real 
> matrix
>  
> In complex scalars there is no way of knowing if the user-provided matrix is 
> real or not. If there was an option of MatSetOption() we could use it.
> 
> If you set a real initial vector with SVDSetInitialSpaces(), then most 
> probably the computed singular vectors will be real. But still rounding 
> errors could introduce a nonzero imaginary part.
> 
> In any case, you could normalize the computed singular vectors, as is done 
> for instance in FixSign() in this example: 
> https://slepc.upv.es/documentation/current/src/nep/tutorials/ex20.c.html
>  
> 
> Jose
> 
> 
> > El 29 abr 2021, a las 12:42, Peder Jørgensgaard Olesen via petsc-users 
> >  escribió:
> > 
> > Hello
> > 
> > I've noticed that doing a singular value decomposition of a real matrix 
> > appears to result in non-real singular vectors. This should not be the case 
> > - singular vectors of a real matrix must be real-valued.
> > 
> > In the example attached I read a matrix from a binary file (also attached), 
> > perform the SVD, and write singular vectors to an HDF5 file which I 
> > subsequently inspect using h5dump, revealing  non-zero imaginary parts of 
> > vector elements as highlighted below:
> > [pjool@svol mwes]$ h5dump -d U0_sample_svd -c "5,2" sample_svecs.h5 
> > HDF5 "sample_svecs.h5" {
> > DATASET "U0_sample_svd" {
> >DATATYPE  H5T_IEEE_F64LE
> >DATASPACE  SIMPLE { ( 882, 2 ) / ( 882, 2 ) }
> >SUBSET {
> >   START ( 0, 0 );
> >   STRIDE ( 1, 1 );
> >   COUNT ( 5, 2 );
> >   BLOCK ( 1, 1 );
> >   DATA {
> >   (0,0): 0.0226108, 0.0299595,
> >   (1,0): 0.035414, 0.0469237,
> >   (2,0): 0.0276317, 0.0366122,
> >   (3,0): 0.0145344, 0.0192581,
> >   (4,0): 0.0110376, 0.0146249
> >   }
> >}
> >ATTRIBUTE "complex" {
> >   DATATYPE  H5T_STD_I32LE
> >   DATASPACE  SCALAR
> >   DATA {
> >   (0): 1
> >   }
> >}
> > }
> > }
> > I also extract the imaginary part of the input matrix and print its norm to 
> > ensure that the matrix is indeed real.
> > 
> > I'm running v3.14, but I don't believe that alone should cause the issue, 
> > since it what I'm trying to do appears like a rather common and basic task.
> > 
> > What might be the reason behind this behavior, and what can be done to 
> > resolve it?
> > 
> > 
> > Med venlig hilsen / Best Regards
> > 
> > Peder Jørgensgaard Olesen
> > PhD Student, Turbulence Research Lab
> > Dept. of Mechanical Engineering
> > Technical University of Denmark
> > Niels Koppels Allé
> > Bygning 403, Rum 105
> > DK-2800 Kgs. Lyngby
> > 



Re: [petsc-users] [petsc-maint] Performing a coordinate system rotation for the stiffness matrix

2021-05-31 Thread Stefano Zampini
Mike

As long as P is a sparse matrix with compatible rows and cols (i.e.
rows(P) = cols(A) = rows(A)), MatPtAP will compute the result.
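
For reference, a minimal sketch of the triple product being discussed (hypothetical
names: K is the assembled stiffness matrix, T the sparse transformation matrix that is
the identity except for the 3x3 rotation blocks on the slanted-boundary dofs, and F/Ft
the original and transformed load vectors):

  Mat Kt;
  MatPtAP(K, T, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &Kt);  /* Kt = T^t * K * T */
  MatMultTranspose(T, F, Ft);                             /* Ft = T^t * F     */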

Il giorno lun 31 mag 2021 alle ore 16:52 Mark Adams  ha
scritto:

>
>
> On Mon, May 31, 2021 at 9:20 AM Michael Wick 
> wrote:
>
>> Hi PETSc team:
>>
>> I am considering implementing a skew roller boundary condition for my
>> elasticity problem. The method is based on this journal paper:
>> http://inside.mines.edu/~vgriffit/pubs/All_J_Pubs/18.pdf
>>
>> Or you may find the method in the attached Bathe's slides, pages 9 -10.
>>
>> Roughly speaking, a (very) sparse matrix T will be created which takes
>> the shape [ I, O; O, R], where R is a 3x3 rotation matrix. And the original
>> linear problem K U = F will be modified into (T^t K T) (T^t U) = T^t F. In
>> doing so, one can enforce a roller boundary condition on a slanted surface.
>>
>> I think it can be an easy option if I can generate the T matrix and do
>> two matrix multiplications to get T^t K T. I noticed that there is a
>> MatPtAP function. Yet, after reading a previous discussion, it seems that
>> this function is not designed for this purposes (
>> https://lists.mcs.anl.gov/pipermail/petsc-users/2018-June/035477.html).
>>
>
> Yes, and no. It is motivated and optimized for a Galerkin coarse grid
> operator for AMG solvers, but it is a projection and it should be fine. If
> not, we will fix it.
>
> We try to test our methods with "empty" operators, but I don't know
> if MatPtAP has ever been tested for super sparse P. Give it a shot and see
> what happens.
>
> Mark
>
>
>>
>> I assume I can only call MatMatMult & MatTransposeMatMult to do this job,
>> correct? Is there any existingly PETSc function to do T^t K T in one call?
>>
>> Thanks,
>>
>> Mike
>>
>>

-- 
Stefano


Re: [petsc-users] CUDA MatSetValues test

2021-05-28 Thread Stefano Zampini
I can take a quick look at it tomorrow, what are the main changes you made 
since then?

> On May 28, 2021, at 9:51 PM, Mark Adams  wrote:
> 
> I am getting messed up in trying to resolve conflicts in rebasing over main.
> Is there a better way of doing this?
> Can I just tell git to use Barry's version and then test it?
> Or should I just try it again?
> 
> On Fri, May 28, 2021 at 2:15 PM Mark Adams  <mailto:mfad...@lbl.gov>> wrote:
> I am rebasing over main and its a bit of a mess. I must have missed 
> something. I get this. I think the _n_SplitCSRMat must be wrong.
> 
> 
> In file included from 
> /autofs/nccs-svm1_home1/adams/petsc/src/vec/is/sf/impls/basic/sfbasic.c:128:0:
> /ccs/home/adams/petsc/include/petscmat.h:1976:32: error: conflicting types 
> for 'PetscSplitCSRDataStructure'
>  typedef struct _n_SplitCSRMat *PetscSplitCSRDataStructure;
> ^~
> /ccs/home/adams/petsc/include/petscmat.h:1922:31: note: previous declaration 
> of 'PetscSplitCSRDataStructure' was here
>  typedef struct _p_SplitCSRMat PetscSplitCSRDataStructure;
>^~
>   CC arch-summit-opt-gnu-cuda/obj/vec/vec/impls/seq/dvec2.o
> 
> On Fri, May 28, 2021 at 1:50 PM Stefano Zampini  <mailto:stefano.zamp...@gmail.com>> wrote:
> OpenMPI.py depends on cuda.py in that, if cuda is present, configures using 
> cuda. MPI.py or MPICH.py do not depend on cuda.py (MPICH, only weakly, it 
> adds a print if cuda is present)
> Since eventually the MPI distro will only need a hint to be configured with 
> CUDA, why not removing the dependency at all and add only a flag 
> —download-openmpi-use-cuda?
> 
>> On May 28, 2021, at 8:44 PM, Barry Smith > <mailto:bsm...@petsc.dev>> wrote:
>> 
>> 
>>  Stefano, who has a far better memory than me, wrote
>> 
>> > Or probably remove —download-openmpi ? Or, just for the moment, why can’t 
>> > we just tell configure that mpi is a weak dependence of cuda.py, so that 
>> > it will be forced to be configured later?
>> 
>>   MPI.py depends on cuda.py so we cannot also have cuda.py depend on MPI.py 
>> using the generic dependencies of configure/packages  
>> 
>>   but perhaps we can just hardwire the rerunning of cuda.py when the MPI 
>> compilers are reset. I will try that now and if I can get it to work we 
>> should be able to move those old fix branches along as MR.
>> 
>>   Barry
>> 
>> 
>> 
>>> On May 28, 2021, at 12:41 PM, Mark Adams >> <mailto:mfad...@lbl.gov>> wrote:
>>> 
>>> OK, I will try to rebase and test Barry's branch.
>>> 
>>> On Fri, May 28, 2021 at 1:26 PM Stefano Zampini >> <mailto:stefano.zamp...@gmail.com>> wrote:
>>> Yes, it is the branch I was using before force pushing to Barry’s 
>>> barry/2020-11-11/cleanup-matsetvaluesdevice
>>> You can use both I guess
>>> 
>>>> On May 28, 2021, at 8:25 PM, Mark Adams >>> <mailto:mfad...@lbl.gov>> wrote:
>>>> 
>>>> Is this the correct branch? It conflicted with ex5cu so I assume it is.
>>>> 
>>>> 
>>>> stefanozampini/simplify-setvalues-device 
>>>> <https://gitlab.com/petsc/petsc/-/tree/stefanozampini/simplify-setvalues-device>
>>>> 
>>>> On Fri, May 28, 2021 at 1:24 PM Mark Adams >>> <mailto:mfad...@lbl.gov>> wrote:
>>>> I am fixing rebasing this branch over main.
>>>> 
>>>> On Fri, May 28, 2021 at 1:16 PM Stefano Zampini >>> <mailto:stefano.zamp...@gmail.com>> wrote:
>>>> Or probably remove —download-openmpi ? Or, just for the moment, why can’t 
>>>> we just tell configure that mpi is a weak dependence of cuda.py, so that 
>>>> it will be forced to be configured later?
>>>> 
>>>>> On May 28, 2021, at 8:12 PM, Stefano Zampini >>>> <mailto:stefano.zamp...@gmail.com>> wrote:
>>>>> 
>>>>> That branch provides a fix for MatSetValuesDevice but it never got merged 
>>>>> because of the CI issues with the —download-openmpi. We can probably try 
>>>>> to skip the test in that specific configuration?
>>>>> 
>>>>>> On May 28, 2021, at 7:45 PM, Barry Smith >>>>> <mailto:bsm...@petsc.dev>> wrote:
>>>>>> 
>>>>>> 
>>>>>> ~/petsc/src/mat/tutorials 
>>>>>> (barry/2021-05-28/robustify-cuda-gencodearch-check=) 
>>&g

Re: [petsc-users] CUDA MatSetValues test

2021-05-28 Thread Stefano Zampini
OpenMPI.py depends on cuda.py in that, if cuda is present, it configures OpenMPI with 
cuda. MPI.py and MPICH.py do not depend on cuda.py (MPICH only weakly: it adds 
a print if cuda is present).
Since eventually the MPI distro will only need a hint to be configured with 
CUDA, why not remove the dependency altogether and add only a flag 
--download-openmpi-use-cuda?

> On May 28, 2021, at 8:44 PM, Barry Smith  wrote:
> 
> 
>  Stefano, who has a far better memory than me, wrote
> 
> > Or probably remove —download-openmpi ? Or, just for the moment, why can’t 
> > we just tell configure that mpi is a weak dependence of cuda.py, so that it 
> > will be forced to be configured later?
> 
>   MPI.py depends on cuda.py so we cannot also have cuda.py depend on MPI.py 
> using the generic dependencies of configure/packages  
> 
>   but perhaps we can just hardwire the rerunning of cuda.py when the MPI 
> compilers are reset. I will try that now and if I can get it to work we 
> should be able to move those old fix branches along as MR.
> 
>   Barry
> 
> 
> 
>> On May 28, 2021, at 12:41 PM, Mark Adams > <mailto:mfad...@lbl.gov>> wrote:
>> 
>> OK, I will try to rebase and test Barry's branch.
>> 
>> On Fri, May 28, 2021 at 1:26 PM Stefano Zampini > <mailto:stefano.zamp...@gmail.com>> wrote:
>> Yes, it is the branch I was using before force pushing to Barry’s 
>> barry/2020-11-11/cleanup-matsetvaluesdevice
>> You can use both I guess
>> 
>>> On May 28, 2021, at 8:25 PM, Mark Adams >> <mailto:mfad...@lbl.gov>> wrote:
>>> 
>>> Is this the correct branch? It conflicted with ex5cu so I assume it is.
>>> 
>>> 
>>> stefanozampini/simplify-setvalues-device 
>>> <https://gitlab.com/petsc/petsc/-/tree/stefanozampini/simplify-setvalues-device>
>>> 
>>> On Fri, May 28, 2021 at 1:24 PM Mark Adams >> <mailto:mfad...@lbl.gov>> wrote:
>>> I am fixing rebasing this branch over main.
>>> 
>>> On Fri, May 28, 2021 at 1:16 PM Stefano Zampini >> <mailto:stefano.zamp...@gmail.com>> wrote:
>>> Or probably remove —download-openmpi ? Or, just for the moment, why can’t 
>>> we just tell configure that mpi is a weak dependence of cuda.py, so that it 
>>> will be forced to be configured later?
>>> 
>>>> On May 28, 2021, at 8:12 PM, Stefano Zampini >>> <mailto:stefano.zamp...@gmail.com>> wrote:
>>>> 
>>>> That branch provides a fix for MatSetValuesDevice but it never got merged 
>>>> because of the CI issues with the —download-openmpi. We can probably try 
>>>> to skip the test in that specific configuration?
>>>> 
>>>>> On May 28, 2021, at 7:45 PM, Barry Smith >>>> <mailto:bsm...@petsc.dev>> wrote:
>>>>> 
>>>>> 
>>>>> ~/petsc/src/mat/tutorials 
>>>>> (barry/2021-05-28/robustify-cuda-gencodearch-check=) 
>>>>> arch-robustify-cuda-gencodearch-check
>>>>> $ ./ex5cu
>>>>> terminate called after throwing an instance of 
>>>>> 'thrust::system::system_error'
>>>>>   what():  fill_n: failed to synchronize: cudaErrorIllegalAddress: an 
>>>>> illegal memory access was encountered
>>>>> Aborted (core dumped)
>>>>> 
>>>>> requires: cuda !define(PETSC_USE_CTABLE)
>>>>> 
>>>>>   CI does not test with CUDA and no ctable.  The code is still broken as 
>>>>> it was six months ago in the discussion Stefano pointed to. It is clear 
>>>>> why just no one has had the time to clean things up.
>>>>> 
>>>>>   Barry
>>>>> 
>>>>> 
>>>>>> On May 28, 2021, at 11:13 AM, Mark Adams >>>>> <mailto:mfad...@lbl.gov>> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Fri, May 28, 2021 at 11:57 AM Stefano Zampini 
>>>>>> mailto:stefano.zamp...@gmail.com>> wrote:
>>>>>> If you are referring to your device set values, I guess it is not 
>>>>>> currently tested
>>>>>> 
>>>>>> No. There is a test for that (ex5cu).
>>>>>> I have a user that is getting a segv in MatSetValues with aijcusparse. I 
>>>>>> suspect there is memory corruption but I'm trying to cover all the bases.
>>>>>> I have added a cuda test to ksp/ex56 that works. I can do an MR for it 
>>>>>> if such a test does not exist.
>>>>>>  
>>>>>> See the discussions here 
>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/3411 
>>>>>> <https://gitlab.com/petsc/petsc/-/merge_requests/3411>
>>>>>> I started cleaning up the code to prepare for testing but we never 
>>>>>> finished it 
>>>>>> https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/
>>>>>>  
>>>>>> <https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/>
>>>>>> 
>>>>>> 
>>>>>>> On May 28, 2021, at 6:53 PM, Mark Adams >>>>>> <mailto:mfad...@lbl.gov>> wrote:
>>>>>>> 
>>>>>>> Is there a test with MatSetValues and CUDA? 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 



Re: [petsc-users] CUDA MatSetValues test

2021-05-28 Thread Stefano Zampini
Or probably remove --download-openmpi? Or, just for the moment, why can’t we 
just tell configure that mpi is a weak dependency of cuda.py, so that it will 
be forced to be configured later?

> On May 28, 2021, at 8:12 PM, Stefano Zampini  
> wrote:
> 
> That branch provides a fix for MatSetValuesDevice but it never got merged 
> because of the CI issues with the --download-openmpi. We can probably try to 
> skip the test in that specific configuration?
> 
>> On May 28, 2021, at 7:45 PM, Barry Smith > <mailto:bsm...@petsc.dev>> wrote:
>> 
>> 
>> ~/petsc/src/mat/tutorials 
>> (barry/2021-05-28/robustify-cuda-gencodearch-check=) 
>> arch-robustify-cuda-gencodearch-check
>> $ ./ex5cu
>> terminate called after throwing an instance of 'thrust::system::system_error'
>>   what():  fill_n: failed to synchronize: cudaErrorIllegalAddress: an 
>> illegal memory access was encountered
>> Aborted (core dumped)
>> 
>> requires: cuda !define(PETSC_USE_CTABLE)
>> 
>>   CI does not test with CUDA and no ctable.  The code is still broken as it 
>> was six months ago in the discussion Stefano pointed to. It is clear why: 
>> simply no one has had the time to clean things up.
>> 
>>   Barry
>> 
>> 
>>> On May 28, 2021, at 11:13 AM, Mark Adams >> <mailto:mfad...@lbl.gov>> wrote:
>>> 
>>> 
>>> 
>>> On Fri, May 28, 2021 at 11:57 AM Stefano Zampini >> <mailto:stefano.zamp...@gmail.com>> wrote:
>>> If you are referring to your device set values, I guess it is not currently 
>>> tested
>>> 
>>> No. There is a test for that (ex5cu).
>>> I have a user that is getting a segv in MatSetValues with aijcusparse. I 
>>> suspect there is memory corruption but I'm trying to cover all the bases.
>>> I have added a cuda test to ksp/ex56 that works. I can do an MR for it if 
>>> such a test does not exist.
>>>  
>>> See the discussions here 
>>> https://gitlab.com/petsc/petsc/-/merge_requests/3411 
>>> <https://gitlab.com/petsc/petsc/-/merge_requests/3411>
>>> I started cleaning up the code to prepare for testing but we never finished 
>>> it 
>>> https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/
>>>  
>>> <https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/>
>>> 
>>> 
>>>> On May 28, 2021, at 6:53 PM, Mark Adams >>> <mailto:mfad...@lbl.gov>> wrote:
>>>> 
>>>> Is there a test with MatSetValues and CUDA? 
>>> 
>> 
> 



Re: [petsc-users] CUDA MatSetValues test

2021-05-28 Thread Stefano Zampini
That branch provides a fix for MatSetValuesDevice but it never got merged 
because of the CI issues with the --download-openmpi. We can probably try to 
skip the test in that specific configuration?

> On May 28, 2021, at 7:45 PM, Barry Smith  wrote:
> 
> 
> ~/petsc/src/mat/tutorials 
> (barry/2021-05-28/robustify-cuda-gencodearch-check=) 
> arch-robustify-cuda-gencodearch-check
> $ ./ex5cu
> terminate called after throwing an instance of 'thrust::system::system_error'
>   what():  fill_n: failed to synchronize: cudaErrorIllegalAddress: an illegal 
> memory access was encountered
> Aborted (core dumped)
> 
> requires: cuda !define(PETSC_USE_CTABLE)
> 
>   CI does not test with CUDA and no ctable.  The code is still broken as it 
> was six months ago in the discussion Stefano pointed to. It is clear why: 
> simply no one has had the time to clean things up.
> 
>   Barry
> 
> 
>> On May 28, 2021, at 11:13 AM, Mark Adams > <mailto:mfad...@lbl.gov>> wrote:
>> 
>> 
>> 
>> On Fri, May 28, 2021 at 11:57 AM Stefano Zampini > <mailto:stefano.zamp...@gmail.com>> wrote:
>> If you are referring to your device set values, I guess it is not currently 
>> tested
>> 
>> No. There is a test for that (ex5cu).
>> I have a user that is getting a segv in MatSetValues with aijcusparse. I 
>> suspect there is memory corruption but I'm trying to cover all the bases.
>> I have added a cuda test to ksp/ex56 that works. I can do an MR for it if 
>> such a test does not exist.
>>  
>> See the discussions here 
>> https://gitlab.com/petsc/petsc/-/merge_requests/3411 
>> <https://gitlab.com/petsc/petsc/-/merge_requests/3411>
>> I started cleaning up the code to prepare for testing but we never finished 
>> it 
>> https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/
>>  
>> <https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/>
>> 
>> 
>>> On May 28, 2021, at 6:53 PM, Mark Adams >> <mailto:mfad...@lbl.gov>> wrote:
>>> 
>>> Is there a test with MatSetValues and CUDA? 
>> 
> 



Re: [petsc-users] CUDA MatSetValues test

2021-05-28 Thread Stefano Zampini
That test is not run  in the testsuite

Il Ven 28 Mag 2021, 19:13 Mark Adams  ha scritto:

>
>
> On Fri, May 28, 2021 at 11:57 AM Stefano Zampini <
> stefano.zamp...@gmail.com> wrote:
>
>> If you are referring to your device set values, I guess it is not
>> currently tested
>>
>
> No. There is a test for that (ex5cu).
> I have a user that is getting a segv in MatSetValues with aijcusparse. I
> suspect there is memory corruption but I'm trying to cover all the bases.
> I have added a cuda test to ksp/ex56 that works. I can do an MR for it if
> such a test does not exist.
>
>
>> See the discussions here
>> https://gitlab.com/petsc/petsc/-/merge_requests/3411
>> I started cleaning up the code to prepare for testing but we never
>> finished it
>> https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/
>>
>>
>> On May 28, 2021, at 6:53 PM, Mark Adams  wrote:
>>
>> Is there a test with MatSetValues and CUDA?
>>
>>
>>


Re: [petsc-users] CUDA MatSetValues test

2021-05-28 Thread Stefano Zampini
If you are referring to your device set values, I guess it is not currently 
tested
See the discussions here https://gitlab.com/petsc/petsc/-/merge_requests/3411
I started cleaning up the code to prepare for testing but we never finished it 
https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/
 



> On May 28, 2021, at 6:53 PM, Mark Adams  wrote:
> 
> Is there a test with MatSetValues and CUDA? 
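
For reference, the usual tested path for MatSetValues with CUDA is ordinary host-side insertion into a MATAIJCUSPARSE matrix; the device-side MatSetValuesDevice path exercised by ex5cu is what the discussion above is about. A minimal host-side sketch (sizes and stencil are made up, and it assumes a PETSc build configured with CUDA):

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, y;
  PetscInt       r, rstart, rend, n = 10, cols[3];
  PetscScalar    vals[3];
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
  ierr = MatSetType(A, MATAIJCUSPARSE);CHKERRQ(ierr);   /* or -mat_type aijcusparse at run time */
  ierr = MatSetUp(A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
  for (r = rstart; r < rend; ++r) {                      /* host-side insertion of a 1D Laplacian stencil */
    PetscInt nc = 0;
    if (r > 0)     { cols[nc] = r - 1; vals[nc++] = -1.0; }
    cols[nc] = r; vals[nc++] = 2.0;
    if (r < n - 1) { cols[nc] = r + 1; vals[nc++] = -1.0; }
    ierr = MatSetValues(A, 1, &r, nc, cols, vals, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatCreateVecs(A, &x, &y);CHKERRQ(ierr);
  ierr = VecSet(x, 1.0);CHKERRQ(ierr);
  ierr = MatMult(A, x, y);CHKERRQ(ierr);                 /* the multiply runs on the GPU for aijcusparse */
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&y);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}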



Re: [petsc-users] reproducibility

2021-05-28 Thread Stefano Zampini
Mark

That line is obtained via

git describe --match "v*"

at configure time. The number after the g indicates the commit.
As Matt says, you can do git checkout <commit> to go back to the point where 
you configured PETSc.

> On May 28, 2021, at 4:33 PM, Matthew Knepley  wrote:
> 
> 1397235
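
For reference, the same information can be checked from a running application; a minimal sketch (the buffer size is arbitrary) that prints the PETSc version banner, which for git builds includes the GIT revision discussed above:

#include <petscsys.h>

int main(int argc, char **argv)
{
  char           version[256];
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = PetscGetVersion(version, sizeof(version));CHKERRQ(ierr);  /* same banner as in PETSc error messages */
  ierr = PetscPrintf(PETSC_COMM_WORLD, "%s\n", version);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}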



Re: [petsc-users] adding calls before and after each iteration of snes

2021-05-25 Thread Stefano Zampini
I use
https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetUpdate.html
and
https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESLineSearchSetPostCheck.html

> On May 25, 2021, at 10:24 PM, Barry Smith  wrote:
> 
> 
>There is also SNESMonitorSet() and SNESSetConvergenceTest(). 
> 
>> On May 25, 2021, at 9:25 AM, Matthew Knepley > > wrote:
>> 
>> On Tue, May 25, 2021 at 8:41 AM hg > > wrote:
>> Hello
>> 
>> I would like to ask if it is possible to add a function call before and after 
>> each iteration of SNES solve, e.g. InitializeNonLinearIteration and 
>> FinalizeNonLinearIteration. It is particularly useful for debugging the 
>> constitutive law or for post-processing to post the intermediate results.
>> 
>> There is this: 
>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/SNES/SNESSetUpdate.html
>>  
>> 
>> 
>>   Thanks,
>> 
>> Matt
>>  
>> Best
>> Giang
>> -- 
>> What most experimenters take for granted before they begin their experiments 
>> is infinitely more interesting than any results to which their experiments 
>> lead.
>> -- Norbert Wiener
>> 
>> https://www.cse.buffalo.edu/~knepley/ 
> 
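
For reference, a minimal sketch of wiring up the two kinds of hooks mentioned above: SNESSetUpdate() for a call at the beginning of every nonlinear iteration and SNESMonitorSet() for a call after every iteration (callback names and the printed message are illustrative):

#include <petscsnes.h>

/* called at the beginning of nonlinear iteration 'step', before the Newton update */
static PetscErrorCode MyUpdate(SNES snes, PetscInt step)
{
  Vec            X;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = SNESGetSolution(snes, &X);CHKERRQ(ierr);
  /* e.g. update history variables of the constitutive law from X here */
  PetscFunctionReturn(0);
}

/* called after each iteration with the current residual norm */
static PetscErrorCode MyMonitor(SNES snes, PetscInt it, PetscReal fnorm, void *ctx)
{
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PetscPrintf(PETSC_COMM_WORLD, "iteration %D: ||F|| = %g\n", it, (double)fnorm);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* registration, somewhere after SNESCreate():
   ierr = SNESSetUpdate(snes, MyUpdate);CHKERRQ(ierr);
   ierr = SNESMonitorSet(snes, MyMonitor, NULL, NULL);CHKERRQ(ierr);             */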



Re: [petsc-users] MatChop

2021-05-21 Thread Stefano Zampini



> On 21 May 2021, at 8:10 PM, Lawrence Mitchell  wrote:
> 
> 
> 
>> On 21 May 2021, at 17:53, Stefano Zampini  wrote:
>> 
>>> I see; anyway, you do not need the check if the loop range is [rStart,rEnd). So 
>>> now I don’t understand why the loop must be [rStart,rStart+maxRows], Matt?
>>> 
>>> It is terrible, but I could not see a way around it. We want to use 
>>> MatGetRow() for each row, but that requires an assembled matrix.
>> 
>> What is the use case for calling MatChop on an unassembled matrix ?
> 
> It's rather that the matrix is modified "through the front door" by just 
> calling MatSetValues repeatedly to replace the small values with zero. This 
> is because an in-place modification of the matrix would require a method for 
> each matrix type.
> 

If the matrix is assembled, this procedure will not insert new values, nor 
replace old ones if we set MatSetOption(MAT_IGNORE_ZERO_ENTRIES,PETSC_FALSE). 
This should never fail, or am I wrong?


> So the process is:
> 
> for each row:
>rowvals = MatGetRow(row)
>rowvals[abs(rowvals) < tol] = 0
>MatSetValues(rowvals, ..., INSERT)
>MatRestoreRow(row)
>MatAssemblyBegin/End <- so that the next MatGetRow does not error.
> 
> Now, one "knows" that this assembly will not need to communicate (because you 
> only set local values), but the automatic state tracking can't know this.
> 
> A disgusting hack that is tremendously fragile would be to do:
> 
> for each row:
>...
>mat->assembled = PETSC_TRUE
> mat->assembled = PETSC_FALSE
> MatAssemblyBegin/End
> 
> But I would probably refuse to accept that code :)
> 
> Lawrence
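
For reference, a minimal sketch of the process Lawrence describes above (this is not the actual MatChop() source; the tolerance handling and names are illustrative). Every rank loops the same maxRows times so that the collective assembly calls match up, and the column indices are copied because they are only valid until MatRestoreRow():

#include <petscmat.h>

/* zero all entries of A with |a_ij| < tol */
static PetscErrorCode ChopSketch(Mat A, PetscReal tol)
{
  PetscInt       rStart, rEnd, nlocal, maxRows, r;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatGetOwnershipRange(A, &rStart, &rEnd);CHKERRQ(ierr);
  nlocal = rEnd - rStart;
  ierr = MPIU_Allreduce(&nlocal, &maxRows, 1, MPIU_INT, MPI_MAX, PetscObjectComm((PetscObject)A));CHKERRQ(ierr);
  for (r = 0; r < maxRows; ++r) {
    if (r < nlocal) {                                    /* this rank still owns row rStart+r */
      const PetscInt    *cols;
      const PetscScalar *vals;
      PetscInt          *newCols, ncols, nc, c, row = rStart + r;
      PetscScalar       *newVals;

      ierr = MatGetRow(A, row, &ncols, &cols, &vals);CHKERRQ(ierr);
      nc   = ncols;
      ierr = PetscMalloc2(nc, &newCols, nc, &newVals);CHKERRQ(ierr);
      for (c = 0; c < nc; ++c) {                         /* copy, since cols/vals die with MatRestoreRow() */
        newCols[c] = cols[c];
        newVals[c] = PetscAbsScalar(vals[c]) < tol ? 0.0 : vals[c];
      }
      ierr = MatRestoreRow(A, row, &ncols, &cols, &vals);CHKERRQ(ierr);
      ierr = MatSetValues(A, 1, &row, nc, newCols, newVals, INSERT_VALUES);CHKERRQ(ierr);
      ierr = PetscFree2(newCols, newVals);CHKERRQ(ierr);
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);  /* collective on every iteration */
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  }
  PetscFunctionReturn(0);
}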



Re: [petsc-users] MatChop

2021-05-21 Thread Stefano Zampini


> On 21 May 2021, at 7:49 PM, Matthew Knepley  wrote:
> 
> On Fri, May 21, 2021 at 12:33 PM Stefano Zampini  <mailto:stefano.zamp...@gmail.com>> wrote:
> 
> 
>> On 21 May 2021, at 7:17 PM, Pierre Jolivet > <mailto:pie...@joliv.et>> wrote:
>> 
>> 
>> 
>>> On 21 May 2021, at 6:03 PM, Stefano Zampini >> <mailto:stefano.zamp...@gmail.com>> wrote:
>>> 
>>> Emmanuel
>>> 
>>> thanks for reporting this.
>>> I believe we have a regression in MatChop from 
>>> https://gitlab.com/petsc/petsc/-/commit/038df967165af8ac6c3de46a36f650566a7db07c
>>>  
>>> <https://gitlab.com/petsc/petsc/-/commit/038df967165af8ac6c3de46a36f650566a7db07c>
>>>  (cc'ing Pierre)
>>> We call MatAssemblyBegin/End within the row loop. Also, I don't understand 
>>> why we need to check for r < rend here 
>>> https://gitlab.com/petsc/petsc/-/blob/038df967165af8ac6c3de46a36f650566a7db07c/src/mat/utils/axpy.c#L513
>>>  
>>> <https://gitlab.com/petsc/petsc/-/blob/038df967165af8ac6c3de46a36f650566a7db07c/src/mat/utils/axpy.c#L513>.
>>>  nor why we need to allocate newCols (can use cols)
>> 
>> That part is from the initial 8-year old implementation from Matt 
>> (https://gitlab.com/petsc/petsc/-/commit/4325cce7191c5c61f4f090c59eaf6773fdee7b48#9d78409dea8190bffda8b68fee5aef233dc1c677
>>  
>> <https://gitlab.com/petsc/petsc/-/commit/4325cce7191c5c61f4f090c59eaf6773fdee7b48#9d78409dea8190bffda8b68fee5aef233dc1c677>).
>> You need the check otherwise this error is raised: 
>> https://www.mcs.anl.gov/petsc/petsc-current/src/mat/interface/matrix.c.html#line566
>>  
>> <https://www.mcs.anl.gov/petsc/petsc-current/src/mat/interface/matrix.c.html#line566>.
> 
> I see; anyway, you do not need the check if the loop range is [rStart,rEnd). So 
> now I don’t understand why the loop must be [rStart,rStart+maxRows], Matt?
> 
> It is terrible, but I could not see a way around it. We want to use 
> MatGetRow() for each row, but that requires an assembled matrix.

What is the use case for calling MatChop on an unassembled matrix ?

> We want to use
> MatSetValues() to change things, but that unassembles the matrix, so we need 
> an assembly at each iteration, but assembly is collective, so everyone has
> to take the same number of iterations. Thus, maxRows.


> 
>Matt 
>> Thanks,
>> Pierre
>> 
>>> Pierre, can you take a look?
>>> 
>>> Il giorno ven 21 mag 2021 alle ore 18:49 Emmanuel Ayala >> <mailto:juan...@gmail.com>> ha scritto:
>>> Hi everybody,
>>> 
>>> I just updated petsc from version 13 to 15. Before the update everything 
>>> worked well; now my code gives me an error:
>>> 
>>> [9]PETSC ERROR: - Error Message 
>>> --
>>> [9]PETSC ERROR: Invalid argument
>>> [9]PETSC ERROR: Setting off process row 53484 even though 
>>> MatSetOption(,MAT_NO_OFF_PROC_ENTRIES,PETSC_TRUE) was set
>>> [9]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html 
>>> <https://www.mcs.anl.gov/petsc/documentation/faq.html> for trouble shooting.
>>> [9]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 
>>> [9]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-superlud_mumps_hyp named 
>>> eayala by ayala Fri May 21 10:40:36 2021
>>> [9]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 
>>> -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" 
>>> FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich 
>>> --download-hypre --download-mumps --download-scalapack --download-parmetis 
>>> --download-metis --download-superlu_dist --download-cmake 
>>> --download-fblaslapack=1 --with-cxx-dialect=C++11
>>> 
>>> The error appears after a matrix assembly, the matrix was created with 
>>> DMCreateMatrix and updated with MatSetValuesLocal.
>>> 
>>> As a little workaround I found a solution: avoid using MatChop on this 
>>> matrix. But I still need to use MatChop. Is there any reason to have this 
>>> problem?
>>> 
>>> Thanks in advance.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Stefano
>> 
> 
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/ <http://www.cse.buffalo.edu/~knepley/>


Re: [petsc-users] MatChop

2021-05-21 Thread Stefano Zampini


> On 21 May 2021, at 7:17 PM, Pierre Jolivet  wrote:
> 
> 
> 
>> On 21 May 2021, at 6:03 PM, Stefano Zampini > <mailto:stefano.zamp...@gmail.com>> wrote:
>> 
>> Emmanuel
>> 
>> thanks for reporting this.
>> I believe we have a regression in MatChop from 
>> https://gitlab.com/petsc/petsc/-/commit/038df967165af8ac6c3de46a36f650566a7db07c
>>  
>> <https://gitlab.com/petsc/petsc/-/commit/038df967165af8ac6c3de46a36f650566a7db07c>
>>  (cc'ing Pierre)
>> We call MatAssemblyBegin/End within the row loop. Also, I don't understand 
>> why we need to check for r < rend here 
>> https://gitlab.com/petsc/petsc/-/blob/038df967165af8ac6c3de46a36f650566a7db07c/src/mat/utils/axpy.c#L513
>>  
>> <https://gitlab.com/petsc/petsc/-/blob/038df967165af8ac6c3de46a36f650566a7db07c/src/mat/utils/axpy.c#L513>.
>>  nor why we need to allocate newCols (can use cols)
> 
> That part is from the initial 8-year old implementation from Matt 
> (https://gitlab.com/petsc/petsc/-/commit/4325cce7191c5c61f4f090c59eaf6773fdee7b48#9d78409dea8190bffda8b68fee5aef233dc1c677
>  
> <https://gitlab.com/petsc/petsc/-/commit/4325cce7191c5c61f4f090c59eaf6773fdee7b48#9d78409dea8190bffda8b68fee5aef233dc1c677>).
> You need the check otherwise this error is raised: 
> https://www.mcs.anl.gov/petsc/petsc-current/src/mat/interface/matrix.c.html#line566
>  
> <https://www.mcs.anl.gov/petsc/petsc-current/src/mat/interface/matrix.c.html#line566>.

I see; anyway, you do not need the check if the loop range is [rStart,rEnd). So now 
I don’t understand why the loop must be [rStart,rStart+maxRows], Matt?

> 
> Thanks,
> Pierre
> 
>> Pierre, can you take a look?
>> 
>> Il giorno ven 21 mag 2021 alle ore 18:49 Emmanuel Ayala > <mailto:juan...@gmail.com>> ha scritto:
>> Hi everybody,
>> 
>> I just updated petsc from version 13 to 15. Before the update everything 
>> worked well; now my code gives me an error:
>> 
>> [9]PETSC ERROR: - Error Message 
>> --
>> [9]PETSC ERROR: Invalid argument
>> [9]PETSC ERROR: Setting off process row 53484 even though 
>> MatSetOption(,MAT_NO_OFF_PROC_ENTRIES,PETSC_TRUE) was set
>> [9]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html 
>> <https://www.mcs.anl.gov/petsc/documentation/faq.html> for trouble shooting.
>> [9]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021 
>> [9]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-superlud_mumps_hyp named 
>> eayala by ayala Fri May 21 10:40:36 2021
>> [9]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2 
>> -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native" 
>> FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich 
>> --download-hypre --download-mumps --download-scalapack --download-parmetis 
>> --download-metis --download-superlu_dist --download-cmake 
>> --download-fblaslapack=1 --with-cxx-dialect=C++11
>> 
>> The error appears after a matrix assembly, the matrix was created with 
>> DMCreateMatrix and updated with MatSetValuesLocal.
>> 
>> As a little workaround I found a solution: avoid using MatChop on this 
>> matrix. But I still need to use MatChop. Is there any reason to have this 
>> problem?
>> 
>> Thanks in advance.
>> 
>> 
>> 
>> 
>> 
>> 
>> -- 
>> Stefano
> 



Re: [petsc-users] MatChop

2021-05-21 Thread Stefano Zampini
Emmanuel

thanks for reporting this.
I believe we have a regression in MatChop from
https://gitlab.com/petsc/petsc/-/commit/038df967165af8ac6c3de46a36f650566a7db07c
(cc'ing Pierre)
We call MatAssemblyBegin/End within the row loop. Also, I don't understand
why we need to check for r < rend here
https://gitlab.com/petsc/petsc/-/blob/038df967165af8ac6c3de46a36f650566a7db07c/src/mat/utils/axpy.c#L513,
nor why we need to allocate newCols (we can use cols).

Pierre, can you take a look?

Il giorno ven 21 mag 2021 alle ore 18:49 Emmanuel Ayala 
ha scritto:

> Hi everybody,
>
> I just updated petsc from version 13 to 15. Before the update everything
> worked well; now my code gives me an error:
>
> [9]PETSC ERROR: - Error Message
> --
> [9]PETSC ERROR: Invalid argument
> [9]PETSC ERROR: Setting off process row 53484 even though
> MatSetOption(,MAT_NO_OFF_PROC_ENTRIES,PETSC_TRUE) was set
> [9]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [9]PETSC ERROR: Petsc Release Version 3.15.0, Mar 30, 2021
> [9]PETSC ERROR: ./comp on a arch-linux-c-opt-O2-superlud_mumps_hyp named
> eayala by ayala Fri May 21 10:40:36 2021
> [9]PETSC ERROR: Configure options --with-debugging=0 COPTFLAGS="-O2
> -march=native -mtune=native" CXXOPTFLAGS="-O2 -march=native -mtune=native"
> FOPTFLAGS="-O2 -march=native -mtune=native" --download-mpich
> --download-hypre --download-mumps --download-scalapack --download-parmetis
> --download-metis --download-superlu_dist --download-cmake
> --download-fblaslapack=1 --with-cxx-dialect=C++11
>
> The error appears after a matrix assembly, the matrix was created with
> DMCreateMatrix and updated with MatSetValuesLocal.
>
> As a little workaround I found a solution: avoid using MatChop on this
> matrix. But I still need to use MatChop. Is there any reason to have this
> problem?
>
> Thanks in advance.
>
>
>
>
>

-- 
Stefano


Re: [petsc-users] Parallel TS for ODE

2021-05-04 Thread Stefano Zampini
 at each point and not parallelize over the dof at each point. Likely
> you want to use DMNETWORK to manage the spatial distribution since it has a
> simple API and allows any number of different number of neighbors for each
> point. DMDA would not make sense  for true spatial distribution except in
> some truly trivial neighbor configurations.
>
> Barry
>
>
>
>
> I am not sure whether I understood this command properly. The
> vector should have 3 components (S, I, R) and 3 DOF as it is defined only
> when the three coordinates have been set.
> Then I create a global vector X. When I set the initial conditions as
> below
>
> static PetscErrorCode InitialConditions(TS ts,Vec X, void *ctx)
> {
>   PetscErrorCodeierr;
>   AppCtx*appctx = (AppCtx*) ctx;
>   PetscScalar   *x;
>   DMda;
>
>   PetscFunctionBeginUser;
>   ierr = TSGetDM(ts,&da);CHKERRQ(ierr);
>
>   /* Get pointers to vector data */
>   ierr = DMDAVecGetArray(da,X,(void*)&x);CHKERRQ(ierr);
>
>   x[0] = appctx->N - appctx->p[2];
>   x[1] = appctx->p[2];
>   x[2] = 0.0;
>
>   ierr = DMDAVecRestoreArray(da,X,(void*)&x);CHKERRQ(ierr);
>   PetscFunctionReturn(0);
> }
>
> I have the error:
>
> [0]PETSC ERROR: Invalid argument
> [0]PETSC ERROR: Wrong subtype object:Parameter # 1 must have
> implementation da it is shell
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown
> [0]PETSC ERROR: ./par_sir_model on a arch-debug named srvulx13 by fbrarda
> Thu Apr 29 09:36:17 2021
> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++
> --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc
> --download-mpich PETSC_ARCH=arch-debug
> [0]PETSC ERROR: #1 DMDAVecGetArray() line 48 in
> /home/fbrarda/petsc/src/dm/impls/da/dagetarray.c
> [0]PETSC ERROR: #2 InitialConditions() line 175 in par_sir_model.c
> [0]PETSC ERROR: #3 main() line 295 in par_sir_model.c
> [0]PETSC ERROR: No PETSc Option Table entries
>
> I would be very happy to receive any advice to fix the code.
> Best,
> Francesco
>
> Il giorno 20 apr 2021, alle ore 21:35, Matthew Knepley 
> ha scritto:
>
> On Tue, Apr 20, 2021 at 1:17 PM Francesco Brarda <
> brardafrance...@gmail.com> wrote:
> Thank you for the advice; I would just like to convert the code I already
> have to see what might happen once parallelized.
> Do you think it is better to put the 3 equations into a 1d Distributed
> Array with 3 dofs and run the job with multiple procs regardless of how
> many equations I have? Is it possible?
>
> If you plan in the end to use a structured grid, this is a great plan. If
> not, this is not a good plan.
>
>   Thanks,
>
>  Matt
>
> Thank you,
> Francesco
>
> Il giorno 20 apr 2021, alle ore 17:57, Stefano Zampini <
> stefano.zamp...@gmail.com> ha scritto:
>
> It does not make sense to parallelize to 1 equation per process, unless
> that single equation per process is super super super costly.
> Is this work you are doing used to understand PETSc parallelization
> strategy? if so, there are multiple examples in the sourcetree that you can
> look at to populate matrices and vectors in parallel
>
> Il giorno mar 20 apr 2021 alle ore 17:52 Francesco Brarda <
> brardafrance...@gmail.com> ha scritto:
> In principle the entire code was for 1 proc only. The functions were built
> with VecGetArray(). While adapting the code for multiple procs I thought
> using VecGetOwnershipRange was a possible way to allocate the equations in
> the vector using multiple procs. What do you think, please?
>
> Thank you,
> Francesco
>
> Il giorno 20 apr 2021, alle ore 16:43, Matthew Knepley 
> ha scritto:
>
> On Tue, Apr 20, 2021 at 10:41 AM Francesco Brarda <
> brardafrance...@gmail.com> wrote:
> I was trying to follow Barry's advice some time ago, but I guess that's
> not the way he meant it. How should I refer to the values contained in x?
> With Distributed Arrays?
>
> That is how you get values from x. However, I cannot understand at all
> what you are doing with "mybase".
>
>Matt
>
> Thanks
> Francesco
>
>  Even though it will not scale and will deliver slower performance it is
> completely possible for you to solve the 3 variable problem using 3 MPI
> ranks. Or 10 mpi ranks. You would just create vectors/matrices with 1
> degree of freedom for the first three ranks and no degrees of freedom for
> the later ranks. During your function evaluation (and Jacobian evaluation)
> for TS you will need to set up the appropriate communication to get the
> v

Re: [petsc-users] Parallel TS for ODE

2021-04-29 Thread Stefano Zampini
o gio 29 apr 2021 alle ore 10:57 Francesco Brarda <
brardafrance...@gmail.com> ha scritto:

> Hi,
> The plan is actually to move to a SIR model also with the space.
> I understand that doing a SIR model in parallel will not bring any
> benefits, but I have been asked to do it as part of a project I am involved
> in.
>
> I defined the DM as follows
> ierr =
> DMDACreate1d(PETSC_COMM_WORLD,DM_BOUNDARY_NONE,3,3,3,NULL,&da);CHKERRQ(ierr);
>
> I am not sure whether I understood this command properly. The
> vector should have 3 components (S, I, R) and 3 DOF as it is defined
> only when the three coordinates have been set.
> Then I create a global vector X. When I set the initial conditions
> as below
>
> static PetscErrorCode InitialConditions(TS ts,Vec X, void *ctx)
> {
>   PetscErrorCodeierr;
>   AppCtx*appctx = (AppCtx*) ctx;
>   PetscScalar   *x;
>   DMda;
>
>   PetscFunctionBeginUser;
>   ierr = TSGetDM(ts,&da);CHKERRQ(ierr);
>
>   /* Get pointers to vector data */
>   ierr = DMDAVecGetArray(da,X,(void*)&x);CHKERRQ(ierr);
>
>   x[0] = appctx->N - appctx->p[2];
>   x[1] = appctx->p[2];
>   x[2] = 0.0;
>
>   ierr = DMDAVecRestoreArray(da,X,(void*)&x);CHKERRQ(ierr);
>   PetscFunctionReturn(0);
> }
>
> I have the error:
>
> [0]PETSC ERROR: Invalid argument
> [0]PETSC ERROR: Wrong subtype object:Parameter # 1 must have
> implementation da it is shell
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown
> [0]PETSC ERROR: ./par_sir_model on a arch-debug named srvulx13 by fbrarda
> Thu Apr 29 09:36:17 2021
> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++
> --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc
> --download-mpich PETSC_ARCH=arch-debug
> [0]PETSC ERROR: #1 DMDAVecGetArray() line 48 in
> /home/fbrarda/petsc/src/dm/impls/da/dagetarray.c
> [0]PETSC ERROR: #2 InitialConditions() line 175 in par_sir_model.c
> [0]PETSC ERROR: #3 main() line 295 in par_sir_model.c
> [0]PETSC ERROR: No PETSc Option Table entries
>
> I would be very happy to receive any advice to fix the code.
> Best,
> Francesco
>
> Il giorno 20 apr 2021, alle ore 21:35, Matthew Knepley 
> ha scritto:
>
> On Tue, Apr 20, 2021 at 1:17 PM Francesco Brarda <
> brardafrance...@gmail.com> wrote:
> Thank you for the advice; I would just like to convert the code I already
> have to see what might happen once parallelized.
> Do you think it is better to put the 3 equations into a 1d Distributed
> Array with 3 dofs and run the job with multiple procs regardless of
> how many equations I have? Is it possible?
>
> If you plan in the end to use a structured grid, this is a great plan. If
> not, this is not a good plan.
>
>   Thanks,
>
>  Matt
>
> Thank you,
> Francesco
>
> Il giorno 20 apr 2021, alle ore 17:57, Stefano Zampini <
> stefano.zamp...@gmail.com> ha scritto:
>
> It does not make sense to parallelize to 1 equation per process, unless
> that single equation per process is super super super costly.
> Is this work you are doing used to understand PETSc parallelization
> strategy? if so, there are multiple examples in the sourcetree that you can
> look at to populate matrices and vectors in parallel
>
> Il giorno mar 20 apr 2021 alle ore 17:52 Francesco Brarda <
> brardafrance...@gmail.com> ha scritto:
> In principle the entire code was for 1 proc only. The functions were built
> with VecGetArray(). While adapting the code for multiple procs I thought
> using VecGetOwnershipRange was a possible way to allocate the equations in
> the vector using multiple procs. What do you think, please?
>
> Thank you,
> Francesco
>
> Il giorno 20 apr 2021, alle ore 16:43, Matthew Knepley 
> ha scritto:
>
> On Tue, Apr 20, 2021 at 10:41 AM Francesco Brarda <
> brardafrance...@gmail.com> wrote:
> I was trying to follow Barry's advice some time ago, but I guess that's
> not the way he meant it. How should I refer to the values contained in x?
> With Distributed Arrays?
>
> That is how you get values from x. However, I cannot understand at all
> what you are doing with "mybase".
>
>Matt
>
> Thanks
> Francesco
>
>  Even though it will not scale and will deliver slower performance it is
> completely possible for you to solve the 3 variable problem using 3 MPI
> ranks. Or 10 mpi ranks. You would just create vectors/matrices with 1
> degree of freedom for the first three ranks and no degrees of freedom for
> the later ranks. During your function evaluation (and Jacobian eval
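
For reference, the "must have implementation da it is shell" error above means TSGetDM() returned a DM that is not a DMDA; one likely cause is that the DMDA was never attached to the TS with TSSetDM(), so TS hands back a default shell DM. A minimal sketch of the intended pattern (the grid size and the values written are illustrative, not the poster's actual model):

#include <petscts.h>
#include <petscdmda.h>

/* initial conditions written only into the locally owned points */
static PetscErrorCode InitialConditionsSketch(TS ts, Vec X, void *ctx)
{
  DM             da;
  PetscScalar    **x;          /* x[i][c] with c = 0,1,2 for the three components */
  PetscInt       i, xs, xm;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = TSGetDM(ts, &da);CHKERRQ(ierr);            /* a DMDA only if TSSetDM() was called */
  ierr = DMDAGetCorners(da, &xs, NULL, NULL, &xm, NULL, NULL);CHKERRQ(ierr);
  ierr = DMDAVecGetArrayDOF(da, X, &x);CHKERRQ(ierr);
  for (i = xs; i < xs + xm; ++i) { x[i][0] = 1.0; x[i][1] = 0.0; x[i][2] = 0.0; }  /* placeholder values */
  ierr = DMDAVecRestoreArrayDOF(da, X, &x);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* setup: attach the DMDA to the TS before using TSGetDM() anywhere
   ierr = DMDACreate1d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, 8, 3, 1, NULL, &da);CHKERRQ(ierr);
   ierr = DMSetFromOptions(da);CHKERRQ(ierr);
   ierr = DMSetUp(da);CHKERRQ(ierr);
   ierr = TSSetDM(ts, da);CHKERRQ(ierr);
   ierr = DMCreateGlobalVector(da, &X);CHKERRQ(ierr);                              */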

Re: [petsc-users] Rather different matrix product results on multiple processes

2021-04-21 Thread Stefano Zampini
Here you have, https://gitlab.com/petsc/petsc/-/merge_requests/3903. We can
discuss the issue on gitlab.

Thanks
Stefano

Il giorno mer 21 apr 2021 alle ore 13:39 Stefano Zampini <
stefano.zamp...@gmail.com> ha scritto:

> Peder
>
> I have slightly modified your code and I confirm the bug.
> The bug is not with the MatMatTranspose operation; it is within the HDF5
> reader. I will soon open an MR with the code and discussing the issues.
>
> Thanks for reporting the issue
> Stefano
>
> Il giorno mer 21 apr 2021 alle ore 12:22 Peder Jørgensgaard Olesen via
> petsc-users  ha scritto:
>
>> Dear Hong
>>
>>
>> Thank your for your reply.
>>
>>
>> I have a hunch that the issue goes beyond the minor differences that
>> might arise from floating-point computation order, however.
>>
>>
>> Writing the product matrix to a binary file using MatView() and
>> inspecting the output shows very different entries depending on the number
>> of processes. Here are the first three rows and columns of the product
>> matrix obtained in a sequential run:
>>
>> 2.58348   1.68202   1.66302
>>
>> 1.68202   4.27506   1.91897
>>
>> 1.66302   1.91897   2.70028
>>
>>
>> - and the corresponding part of the product matrix obtained on one node
>> (40 processes):
>>
>> 4.43536   2.17261   0.16430
>>
>> 2.17261   4.53224   2.53210
>>
>> 0.16430   2.53210   4.73234
>>
>>
>> The parallel result is not even close to the sequential one. Trying
>> different numbers of processes produces yet different results.
>>
>>
>> Also, the eigenvectors that I subsequently determine using a SLEPC
>> solver do not form a proper basis for the column space of the data
>> matrix as they must, which is hardly a surprise given the variability of
>> results indicated above - except when the code is run on just a single
>> process. Forming such a basis is central to the intended application, and given 
>> that it would need to work on rather large data sets, running on a single
>> process is hardly a viable solution.
>>
>>
>> Best regards
>>
>> Peder
>> --
>> *Fra:* Zhang, Hong 
>> *Sendt:* 19. april 2021 18:34:31
>> *Til:* petsc-users@mcs.anl.gov; Peder Jørgensgaard Olesen
>> *Emne:* Re: Rather different matrix product results on multiple processes
>>
>> Peder,
>> I tested your code on a linux machine. I got
>> $ ./acorr_mwe
>> Data matrix norm: 5.0538e+01
>> Autocorrelation matrix norm: 1.0473e+03
>>
>> mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via
>> allgatherv (default)
>> Data matrix norm: 5.0538e+01
>> Autocorrelation matrix norm: 1.0363e+03
>>
>> mpiexec -n 20 ./acorr_mwe
>> Data matrix norm: 5.0538e+01
>> Autocorrelation matrix norm: 1.0897e+03
>>
>> mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via cyclic
>> Data matrix norm: 5.0538e+01
>> Autocorrelation matrix norm: 1.0363e+03
>>
>> I use petsc 'main' branch (same as the latest release). You can remove
>> MatAssemblyBegin/End calls after MatMatTransposeMult():
>> MatMatTransposeMult(data_mat, data_mat, MAT_INITIAL_MATRIX,
>> PETSC_DEFAULT, &corr_mat);
>> //ierr = MatAssemblyBegin(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>> //ierr = MatAssemblyEnd(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>>
>> The communication patterns of parallel implementation led to different
>> order of floating-point computation, thus slightly different matrix norm of
>> R.
>> Hong
>>
>> --
>> *From:* petsc-users  on behalf of Peder
>> Jørgensgaard Olesen via petsc-users 
>> *Sent:* Monday, April 19, 2021 7:57 AM
>> *To:* petsc-users@mcs.anl.gov 
>> *Subject:* [petsc-users] Rather different matrix product results on
>> multiple processes
>>
>>
>> Hello,
>>
>>
>> When computing a matrix product of the type R = D.DT using
>> MatMatTransposeMult() I find I get rather different results depending on
>> the number of processes. In one example using a data set that is
>> small compared to the application I get Frobenius norms |R| = 1.047e3 on a
>> single process, 1.0363e3 on a single HPC node (40 cores), and 9.7307e2 on
>> two nodes.
>>
>>
>> I have ascertained that the single process result is indeed the correct
>> one (i.e., eigenvectors of R form a proper basis for the columns of D), so
>> naturally I'd love to be able to reproduce this result across different
>> parallel setups. How might I achieve this?
>>
>>
>> I'm attaching MWE code and the data set used for the example.
>>
>>
>> Thanks in advance!
>>
>>
>> Best Regards
>>
>>
>> Peder Jørgensgaard Olesen
>>
>> PhD Student, Turbulence Research Lab
>>
>> Dept. of Mechanical Engineering
>>
>> Technical University of Denmark
>>
>> Niels Koppels Allé
>>
>> Bygning 403, Rum 105
>>
>> DK-2800 Kgs. Lyngby
>>
>
>
> --
> Stefano
>


-- 
Stefano
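
For reference, a minimal sketch of the operation under discussion, forming R = D.DT with MatMatTransposeMult() and printing its Frobenius norm so that runs on different process counts can be compared (D is assumed to be a dense MPI matrix as in the MWE):

#include <petscmat.h>

/* compute R = D.D^T and print its Frobenius norm */
static PetscErrorCode AutocorrelationNorm(Mat D)
{
  Mat            R;
  PetscReal      nrm;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatMatTransposeMult(D, D, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &R);CHKERRQ(ierr);
  ierr = MatNorm(R, NORM_FROBENIUS, &nrm);CHKERRQ(ierr);
  ierr = PetscPrintf(PetscObjectComm((PetscObject)D), "Autocorrelation matrix norm: %g\n", (double)nrm);CHKERRQ(ierr);
  ierr = MatDestroy(&R);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

As in Hong's reply, the parallel algorithm can be switched at run time with -matmattransmult_mpidense_mpidense_via allgatherv (the default) or cyclic, which is useful when comparing results across process counts.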


Re: [petsc-users] Rather different matrix product results on multiple processes

2021-04-21 Thread Stefano Zampini
Peder

I have slightly modified your code and I confirm the bug.
The bug is not with the MatMatTranspose operation; it is within the HDF5
reader. I will soon open an MR with the code and discussing the issues.

Thanks for reporting the issue
Stefano

Il giorno mer 21 apr 2021 alle ore 12:22 Peder Jørgensgaard Olesen via
petsc-users  ha scritto:

> Dear Hong
>
>
> Thank your for your reply.
>
>
> I have a hunch that the issue goes beyond the minor differences that
> might arise from floating-point computation order, however.
>
>
> Writing the product matrix to a binary file using MatView() and inspecting
> the output shows very different entries depending on the number of
> processes. Here are the first three rows and columns of the product matrix
> obtained in a sequential run:
>
> 2.58348   1.68202   1.66302
>
> 1.68202   4.27506   1.91897
>
> 1.66302   1.91897   2.70028
>
>
> - and the corresponding part of the product matrix obtained on one node
> (40 processes):
>
> 4.43536   2.17261   0.16430
>
> 2.17261   4.53224   2.53210
>
> 0.16430   2.53210   4.73234
>
>
> The parallel result is not even close to the sequential one. Trying
> different numbers of processes produces yet different results.
>
>
> Also, the eigenvectors that I subsequently determine using a SLEPC solver
> do not form a proper basis for the column space of the data matrix as
> they must, which is hardly a surprise given the variability of
> results indicated above - except when the code is run on just a single
> process. Forming such a basis is central to the intended application, and given
> that it would need to work on rather large data sets, running on a single
> process is hardly a viable solution.
>
>
> Best regards
>
> Peder
> --
> *Fra:* Zhang, Hong 
> *Sendt:* 19. april 2021 18:34:31
> *Til:* petsc-users@mcs.anl.gov; Peder Jørgensgaard Olesen
> *Emne:* Re: Rather different matrix product results on multiple processes
>
> Peder,
> I tested your code on a linux machine. I got
> $ ./acorr_mwe
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0473e+03
>
> mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via
> allgatherv (default)
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0363e+03
>
> mpiexec -n 20 ./acorr_mwe
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0897e+03
>
> mpiexec -n 40 ./acorr_mwe -matmattransmult_mpidense_mpidense_via cyclic
> Data matrix norm: 5.0538e+01
> Autocorrelation matrix norm: 1.0363e+03
>
> I use petsc 'main' branch (same as the latest release). You can remove
> MatAssemblyBegin/End calls after MatMatTransposeMult():
> MatMatTransposeMult(data_mat, data_mat, MAT_INITIAL_MATRIX, PETSC_DEFAULT,
> &corr_mat);
> //ierr = MatAssemblyBegin(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
> //ierr = MatAssemblyEnd(corr_mat, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
>
> The communication patterns of parallel implementation led to different
> order of floating-point computation, thus slightly different matrix norm of
> R.
> Hong
>
> --
> *From:* petsc-users  on behalf of Peder
> Jørgensgaard Olesen via petsc-users 
> *Sent:* Monday, April 19, 2021 7:57 AM
> *To:* petsc-users@mcs.anl.gov 
> *Subject:* [petsc-users] Rather different matrix product results on
> multiple processes
>
>
> Hello,
>
>
> When computing a matrix product of the type R = D.DT using
> MatMatTransposeMult() I find I get rather different results depending on
> the number of processes. In one example using a data set that is
> small compared to the application I get Frobenius norms |R| = 1.047e3 on a
> single process, 1.0363e3 on a single HPC node (40 cores), and 9.7307e2 on
> two nodes.
>
>
> I have ascertained that the single process result is indeed the correct
> one (i.e., eigenvectors of R form a proper basis for the columns of D), so
> naturally I'd love to be able to reproduce this result across different
> parallel setups. How might I achieve this?
>
>
> I'm attaching MWE code and the data set used for the example.
>
>
> Thanks in advance!
>
>
> Best Regards
>
>
> Peder Jørgensgaard Olesen
>
> PhD Student, Turbulence Research Lab
>
> Dept. of Mechanical Engineering
>
> Technical University of Denmark
>
> Niels Koppels Allé
>
> Bygning 403, Rum 105
>
> DK-2800 Kgs. Lyngby
>


-- 
Stefano


Re: [petsc-users] Parallel TS for ODE

2021-04-20 Thread Stefano Zampini

> 
> 
> Thank you for the advice; I would just like to convert the code I already 
> have to see what might happen once parallelized.

You are not really listening to our advice. I can tell you what happens to 3 
coupled ODEs split on 3 processes. The solver will be slower, by far.

> Do you think it is better to put the 3 equations into a 1d Distributed Array 
> with 3 dofs and run the job with multiple procs regardless of how many 
> equations I have? Is it possible? 
> 
> Thank you,
> Francesco
> 
>> Il giorno 20 apr 2021, alle ore 17:57, Stefano Zampini 
>> mailto:stefano.zamp...@gmail.com>> ha scritto:
>> 
>> It does not make sense to parallelize to 1 equation per process, unless that 
>> single equation per process is super super super costly.
>> Is this work you are doing used to understand PETSc parallelization 
>> strategy? if so, there are multiple examples in the sourcetree that you can 
>> look at to populate matrices and vectors in parallel
>> 
>> Il giorno mar 20 apr 2021 alle ore 17:52 Francesco Brarda 
>> mailto:brardafrance...@gmail.com>> ha scritto:
>> In principle the entire code was for 1 proc only. The functions were built 
>> with VecGetArray(). While adapting the code for multiple procs I thought 
>> using VecGetOwnershipRange was a possible way to allocate the equations in 
>> the vector using multiple procs. What do you think, please? 
>> 
>> Thank you,
>> Francesco
>> 
>>> Il giorno 20 apr 2021, alle ore 16:43, Matthew Knepley >> <mailto:knep...@gmail.com>> ha scritto:
>>> 
>>> On Tue, Apr 20, 2021 at 10:41 AM Francesco Brarda 
>>> mailto:brardafrance...@gmail.com>> wrote:
>>> I was trying to follow Barry's advice some time ago, but I guess that's not 
>>> the way he meant it. How should I refer to the values contained in x? With 
>>> Distributed Arrays?
>>> 
>>> That is how you get values from x. However, I cannot understand at all what 
>>> you are doing with "mybase".
>>> 
>>>Matt
>>>  
>>> Thanks 
>>> Francesco
>>> 
>>>>>  Even though it will not scale and will deliver slower performance it is 
>>>>> completely possible for you to solve the 3 variable problem using 3 MPI 
>>>>> ranks. Or 10 mpi ranks. You would just create vectors/matrices with 1 
>>>>> degree of freedom for the first three ranks and no degrees of freedom for 
>>>>> the later ranks. During your function evaluation (and Jacobian 
>>>>> evaluation) for TS you will need to set up the appropriate communication 
>>>>> to get the values you need on each rank to evaluate the parts of the 
>>>>> function evaluation needed by that rank. This is true for parallelizing 
>>>>> any computation.
>>>>> 
>>>>>  Barry
>>> 
>>> 
>>> 
>>>> Il giorno 20 apr 2021, alle ore 15:40, Matthew Knepley >>> <mailto:knep...@gmail.com>> ha scritto:
>>>> 
>>>> On Tue, Apr 20, 2021 at 9:36 AM Francesco Brarda 
>>>> mailto:brardafrance...@gmail.com>> wrote:
>>>> Hi!
>>>> I tried to implement the SIR model taking into account the fact that I 
>>>> will only use 3 MPI ranks at this moment.
>>>> I built vectors and matrices following the examples already available. In 
>>>> particular, I defined the functions required similarly (RHSFunction, 
>>>> IFunction, IJacobian), as follows:
>>>> 
>>>> I don't think this makes sense. You use "mybase" to distinguish between 3 
>>>> procs, which would indicate that each procs has only
>>>> 1 degree of freedom. However, you use x[1] on each proc, indicating it has 
>>>> at least 2 dofs.
>>>> 
>>>>   Thanks,
>>>> 
>>>>  Matt
>>>>  
>>>> static PetscErrorCode RHSFunction(TS ts,PetscReal t,Vec X,Vec F,void *ctx)
>>>> { 
>>>>   PetscErrorCodeierr;
>>>>   AppCtx*appctx = (AppCtx*) ctx;
>>>>   PetscScalar   f;//, *x_localptr; 
>>>>   const PetscScalar *x;
>>>>   PetscInt  mybase;
>>>>   
>>>>   PetscFunctionBeginUser;
>>>>   ierr = VecGetOwnershipRange(X,&mybase,NULL);CHKERRQ(ierr);
>>>>   ierr = VecGetArrayRead(X,&x);CHKERRQ(ierr);
>>>>   if (mybase == 0) {
>>>> f= (PetscScalar) (-appctx->p1*x[0]*x[1]/appctx->N);
>>>>  

Re: [petsc-users] Parallel TS for ODE

2021-04-20 Thread Stefano Zampini
It does not make sense to parallelize to 1 equation per process, unless
that single equation per process is super super super costly.
Is this work you are doing used to understand PETSc parallelization
strategy? if so, there are multiple examples in the sourcetree that you can
look at to populate matrices and vectors in parallel

Il giorno mar 20 apr 2021 alle ore 17:52 Francesco Brarda <
brardafrance...@gmail.com> ha scritto:

> In principle the entire code was for 1 proc only. The functions were built
> with VecGetArray(). While adapting the code for multiple procs I thought
> using VecGetOwnershipRange was a possible way to allocate the equations
> in the vector using multiple procs. What do you think, please?
>
> Thank you,
> Francesco
>
> Il giorno 20 apr 2021, alle ore 16:43, Matthew Knepley 
> ha scritto:
>
> On Tue, Apr 20, 2021 at 10:41 AM Francesco Brarda <
> brardafrance...@gmail.com> wrote:
>
>> I was trying to follow Barry's advice some time ago, but I guess that's
>> not the way he meant it. How should I refer to the values contained in x?
>> With Distributed Arrays?
>>
>
> That is how you get values from x. However, I cannot understand at all
> what you are doing with "mybase".
>
>Matt
>
>
>> Thanks
>> Francesco
>>
>>  Even though it will not scale and will deliver slower performance it is
>>> completely possible for you to solve the 3 variable problem using 3 MPI
>>> ranks. Or 10 mpi ranks. You would just create vectors/matrices with 1
>>> degree of freedom for the first three ranks and no degrees of freedom for
>>> the later ranks. During your function evaluation (and Jacobian evaluation)
>>> for TS you will need to set up the appropriate communication to get the
>>> values you need on each rank to evaluate the parts of the function
>>> evaluation needed by that rank. This is true for parallelizing any
>>> computation.
>>>
>>>  Barry
>>>
>>>
>>
>>
>> Il giorno 20 apr 2021, alle ore 15:40, Matthew Knepley 
>> ha scritto:
>>
>> On Tue, Apr 20, 2021 at 9:36 AM Francesco Brarda <
>> brardafrance...@gmail.com> wrote:
>>
>>> Hi!
>>> I tried to implement the SIR model taking into account the fact that I
>>> will only use 3 MPI ranks at this moment.
>>> I built vectors and matrices following the examples already available.
>>> In particular, I defined the functions required similarly (RHSFunction,
>>> IFunction, IJacobian), as follows:
>>>
>>
>> I don't think this makes sense. You use "mybase" to distinguish between 3
>> procs, which would indicate that each proc has only
>> 1 degree of freedom. However, you use x[1] on each proc, indicating it
>> has at least 2 dofs.
>>
>>   Thanks,
>>
>>  Matt
>>
>>
>>> static PetscErrorCode RHSFunction(TS ts,PetscReal t,Vec X,Vec
>>> F,void *ctx)
>>> {
>>>   PetscErrorCodeierr;
>>>   AppCtx*appctx = (AppCtx*) ctx;
>>>   PetscScalar   f;//, *x_localptr;
>>>   const PetscScalar *x;
>>>   PetscInt  mybase;
>>>
>>>   PetscFunctionBeginUser;
>>>   ierr = VecGetOwnershipRange(X,&mybase,NULL);CHKERRQ(ierr);
>>>   ierr = VecGetArrayRead(X,&x);CHKERRQ(ierr);
>>>   if (mybase == 0) {
>>> f= (PetscScalar) (-appctx->p1*x[0]*x[1]/appctx->N);
>>> ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES);
>>>   }
>>>   if (mybase == 1) {
>>> f= (PetscScalar)
>>> (appctx->p1*x[0]*x[1]/appctx->N-appctx->p2*x[1]);
>>> ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES);
>>>   }
>>>   if (mybase == 2) {
>>> f= (PetscScalar) (appctx->p2*x[1]);
>>> ierr = VecSetValues(F,1,&mybase,&f,INSERT_VALUES);
>>>   }
>>>   ierr = VecRestoreArrayRead(X,&x);CHKERRQ(ierr);
>>>   ierr = VecAssemblyBegin(F);CHKERRQ(ierr);
>>>   ierr = VecAssemblyEnd(F);CHKERRQ(ierr);
>>>   PetscFunctionReturn(0);
>>> }
>>>
>>>
>>> Whilst for the Jacobian I did:
>>>
>>>
>>> static PetscErrorCode IJacobian(TS ts,PetscReal t,Vec X,Vec
>>> Xdot,PetscReal a,Mat A,Mat B,void *ctx)
>>> {
>>>   PetscErrorCodeierr;
>>>   AppCtx*appctx = (AppCtx*) ctx;
>>>   PetscInt  mybase, rowcol[] = {0,1,2};
>>>   const PetscScalar *x;
>>>
>>>   PetscFunctionBeginUser;
>>>   ierr = MatGetOwnershipRange(B,&mybase,NULL);CHKERRQ(ierr);
>>>   ierr = VecGetArrayRead(X,&x);CHKERRQ(ierr);
>>>   if (mybase == 0) {
>>> const PetscScalar J[] = {a + appctx->p1*x[1]/appctx->N,
>>> appctx->p1*x[0]/appctx->N, 0};
>>> ierr =
>>> MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr);
>>>   }
>>>   if (mybase == 1) {
>>> const PetscScalar J[] = {- appctx->p1*x[1]/appctx->N, a -
>>> appctx->p1*x[0]/appctx->N + appctx->p2, 0};
>>> ierr =
>>> MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr);
>>>   }
>>>   if (mybase == 2) {
>>> const PetscScalar J[] = {0, - appctx->p2, a};
>>> ierr =
>>> MatSetValues(B,1,&mybase,3,rowcol,J,INSERT_VALUES);CHKERRQ(ierr);
>>>   }
>>>   ierr = VecRestoreArrayRead(X,&x);CHKERRQ(ierr);
>>>
>>>   ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>   ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
>>>   if (A != B) {
>>> ierr = 

Re: [petsc-users] Newbie question: Strange failure when calling PetscIntView from slepc application

2021-04-09 Thread Stefano Zampini
==841883== Invalid write of size 4
==841883==at 0x503E784: petscintview_ 
(/data/work/slepc/PETSC/petsc-3.14.5/src/sys/error/ftn-custom/zerrf.c:109)
==841883==by 0x40262C: all_stab_routines_mp_write_rows_to_petsc_matrix_ 
(/data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp3/tryme.F:17)
==841883==by 0x402465: MAIN__ 
(/data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp3/tryme.F:40)
==841883==by 0x402221: main (in 
/data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp3/trashy.exe)
==841883==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==841883== 

This points exactly to the error and suggests you take a look at 
/data/work/slepc/PETSC/petsc-3.14.5/src/sys/error/ftn-custom/zerrf.c, line 109.
You are trying to write 4 bytes (most probably an int) where it is not allowed.

> On Apr 9, 2021, at 12:32 PM, dazza simplythebest  wrote:
> 
> ==841883== Invalid write of size 4
> ==841883==at 0x503E784: petscintview_ 
> (/data/work/slepc/PETSC/petsc-3.14.5/src/sys/error/ftn-custom/zerrf.c:109)
> ==841883==by 0x40262C: all_stab_routines_mp_write_rows_to_petsc_matrix_ 
> (/data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp3/tryme.F:17)
> ==841883==by 0x402465: MAIN__ 
> (/data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp3/tryme.F:40)
> ==841883==by 0x402221: main (in 
> /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp3/trashy.exe)
> ==841883==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==841883== 



Re: [petsc-users] Newbie question: Strange failure when calling PetscIntView from slepc application

2021-04-09 Thread Stefano Zampini
This way you are running mpiexec.hydra under valgrind. You want to run instead

mpiexec.hydra -n 1 valgrind --track-origins=yes --leak-check=full  
--fullpath-after= ./trashy.exe

> On Apr 9, 2021, at 12:11 PM, dazza simplythebest  wrote:
> 
> valgrind --track-origins=yes --leak-check=full mpiexec.hydra -n 1 
> ./trashy.exe



Re: [petsc-users] Newbie question: Strange failure when calling PetscIntView from slepc application

2021-04-09 Thread Stefano Zampini
As the error message says, use valgrind https://www.valgrind.org/ 
 to catch these kinds of issues

> On Apr 9, 2021, at 10:43 AM, dazza simplythebest  wrote:
> 
> Dear All,
>   I am getting a puzzling 'Segmentation Violation' error when I 
> try to
> write out an integer array using PetscIntView in a Fortran code. I have 
> written the small 
> code below which reproduces the problem. All this code does is create
>  a PetscInt array, initialise this array, then try to write it out to screen.
> Interestingly PetscIntView does seem to correctly write out all the values to
> the screen (which agree with a direct write), but then fails before it can 
> return to
> the main program (see output pasted in below).
> 
> I think I must be doing something quite silly, but just 
> can't quite see what it is! Any suggestions will be very welcome.
>   Many thanks,
>Dan
>   
> Code:
> 
>   MODULE ALL_STAB_ROUTINES
>   IMPLICIT NONE
>   CONTAINS
> 
>   SUBROUTINE  WRITE_ROWS_TO_PETSC_MATRIX( ISIZE, JALOC)
#include <slepc/finclude/slepceps.h>
>   use slepceps
>   IMPLICIT NONE
>   PetscInt, INTENT (IN) ::  ISIZE
>   PetscInt, INTENT(INOUT), DIMENSION(0:ISIZE-1)  :: JALOC
>   PetscErrorCode   :: ierr
> 
>   write(*,*)'check 02: ',shape(jaloc),lbound(jaloc),ubound(jaloc)
>   write(*,*)jaloc
> 
>   write(*,*)'now for PetscIntView ...'
>   call PetscIntView(ISIZE,JALOC, PETSC_VIEWER_STDOUT_WORLD)
>   CHKERRA(ierr)
> 
>   END SUBROUTINE WRITE_ROWS_TO_PETSC_MATRIX
> 
>   END MODULE ALL_STAB_ROUTINES
>  
>   program  stabbo
>   USE  MPI
#include <slepc/finclude/slepceps.h>
>   use slepceps
>   USE ALL_STAB_ROUTINES
>   IMPLICIT NONE
>   PetscInt, ALLOCATABLE, DIMENSION(:) :: JALOC
>   PetscInt, PARAMETER ::  ISIZE = 10
>   PetscInt, parameter ::  FOUR=4
>   PetscErrorCode   :: ierr_pets
>   call SlepcInitialize(PETSC_NULL_CHARACTER,ierr_pets)
>
>   ALLOCATE(JALOC(0:ISIZE-1))
>   JALOC = FOUR
>   write(*,*)'check 01: ',shape(jaloc),lbound(jaloc),ubound(jaloc)
>   CALL WRITE_ROWS_TO_PETSC_MATRIX(ISIZE, JALOC)
>   CALL SlepcFinalize(ierr_pets)
>   END PROGRAM STABBO   
> 
> Output:
> 
> dan@super01 /data/work/rotplane/omega_to_zero/stability/test/tmp10/tmp3 $ 
> mpiexec.hydra -n 1 ./trashy.exe
>  check 01:   10   0   9
>  check 02:   10   0   9
>  4 4 4
>  4 4 4
>  4 4 4
>  4
>  now for PetscIntView ...
> 0: 4 4 4 4 4 4 4 4 4 4
> [0]PETSC ERROR: 
> 
> [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, 
> probably memory access out of range
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see 
> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind 
> 
> [0]PETSC ERROR: or try http://valgrind.org  on 
> GNU/linux and Apple Mac OS X to find memory corruption errors
> [0]PETSC ERROR: likely location of problem given in stack below
> [0]PETSC ERROR: -  Stack Frames 
> 
> [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> [0]PETSC ERROR:   INSTEAD the line number of the start of the function
> [0]PETSC ERROR:   is given.
> [0]PETSC ERROR: - Error Message 
> --
> [0]PETSC ERROR: Signal received
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html 
>  for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.14.5, Mar 03, 2021
> [0]PETSC ERROR: ./trashy.exe on a  named super01 by darren Fri Apr  9 
> 16:28:25 2021
> [0]PETSC ERROR: Configure options 
> --package-prefix-hash=/home/darren/petsc-hash-pkgs --with-cc=mpiicc 
> --with-cxx=mpiicpc --with-fc=mpiifort --with-mpiexec=mpiexec.hydra 
> COPTFLAGS="-g -O" FOPTFLAGS="-g -O" CXXOPTFLAGS="-g -O" 
> --with-64-bit-indices=1 --with-scalar-type=complex --with-precision=double 
> --with-debugging=1 
> --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl 
> --with-mkl_pardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>  
> --with-mkl_cpardiso-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl
>  --download-mumps --download-scalapack --download-cmake 
> PETSC_ARCH=arch-ci-linux-intel-mkl-cmplx-ilp64-dbg-ftn-with-external
> [0]PETSC ERROR: #1 User provided function() line 0 in  unknown file
> [0]PETSC ERROR: Checking the memory for corruption.
> Abort(50176059) 
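
For comparison, the C calling sequence of PetscIntView() returns the error code, while the Fortran wrapper in zerrf.c writes it into a trailing ierr argument; a Fortran call that omits that final argument can produce exactly this kind of invalid write at address 0x0 (a likely explanation, not something confirmed in the thread). A minimal C sketch of the call:

#include <petscsys.h>

int main(int argc, char **argv)
{
  PetscInt       vals[4] = {4, 4, 4, 4};
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = PetscIntView(4, vals, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr);  /* error code is the return value in C */
  ierr = PetscFinalize();
  return ierr;
}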

Re: [petsc-users] Parallel TS for ODE

2021-03-31 Thread Stefano Zampini
Are you trying to parallelize a 3-equation system? Or are you just using your SIR 
code to experiment with TS?


> On Mar 31, 2021, at 5:18 PM, Matthew Knepley  wrote:
> 
> On Wed, Mar 31, 2021 at 10:15 AM Francesco Brarda  > wrote:
> Thank you for your advice. 
> I wrote what seems to me a very basic code, but I got this error when I run 
> it with more than 1 processor:
> Clearly the result 299. is wrong but I do not understand what I am doing wrong. 
> With 1 processor it works fine.
> 
> My guess is that you do VecGetArray() and index the array using global 
> indices rather than local indices, because
> there is memory corruption with a Vec array.
> 
>   Thanks,
> 
>  Matt
>  
> steps 150, ftime 15.
> Vec Object: 2 MPI processes
>   type: mpi
> Process [0]
> 16.5613
> 2.91405
> Process [1]
> 299.
> [0]PETSC ERROR: PetscTrFreeDefault() called from VecDestroy_MPI() line 21 in 
> /home/fbrarda/petsc/src/vec/vec/impls/mpi/pdvec.c
> [0]PETSC ERROR: Block [id=0(16)] at address 0x15812a0 is corrupted (probably 
> write past end of array)
> [0]PETSC ERROR: Block allocated in VecCreate_MPI_Private() line 514 in 
> /home/fbrarda/petsc/src/vec/vec/impls/mpi/pbvec.c
> [0]PETSC ERROR: - Error Message 
> --
> [0]PETSC ERROR: Memory corruption: 
> https://www.mcs.anl.gov/petsc/documentation/installation.html#valgrind 
> 
> [0]PETSC ERROR: Corrupted memory
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html 
>  for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.14.4, unknown 
> [0]PETSC ERROR: ./par_sir_model on a arch-debug named srvulx13 by fbrarda Wed 
> Mar 31 16:05:22 2021
> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ 
> --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc 
> --download-mpich PETSC_ARCH=arch-debug
> [0]PETSC ERROR: #1 PetscTrFreeDefault() line 310 in 
> /home/fbrarda/petsc/src/sys/memory/mtr.c
> [0]PETSC ERROR: #2 VecDestroy_MPI() line 21 in 
> /home/fbrarda/petsc/src/vec/vec/impls/mpi/pdvec.c
> [0]PETSC ERROR: #3 VecDestroy() line 396 in 
> /home/fbrarda/petsc/src/vec/vec/interface/vector.c
> [0]PETSC ERROR: #4 SNESLineSearchReset() line 284 in 
> /home/fbrarda/petsc/src/snes/linesearch/interface/linesearch.c
> [0]PETSC ERROR: #5 SNESReset() line 3229 in 
> /home/fbrarda/petsc/src/snes/interface/snes.c
> [0]PETSC ERROR: #6 TSReset() line 2800 in 
> /home/fbrarda/petsc/src/ts/interface/ts.c
> [0]PETSC ERROR: #7 TSDestroy() line 2856 in 
> /home/fbrarda/petsc/src/ts/interface/ts.c
> [0]PETSC ERROR: #8 main() line 256 in par_sir_model.c
> [0]PETSC ERROR: No PETSc Option Table entries
> [0]PETSC ERROR: End of Error Message ---send entire error 
> message to petsc-ma...@mcs.anl.gov--
> application called MPI_Abort(MPI_COMM_SELF, 256001) - process 0
> [1]PETSC ERROR: PetscTrFreeDefault() called from VecDestroy_MPI() line 21 in 
> /home/fbrarda/petsc/src/vec/vec/impls/mpi/pdvec.c
> [1]PETSC ERROR: Block [id=0(16)] at address 0xbd9520 is corrupted (probably 
> write past end of array)
> [1]PETSC ERROR: Block allocated in VecCreate_MPI_Private() line 514 in 
> /home/fbrarda/petsc/src/vec/vec/impls/mpi/pbvec.c
> [1]PETSC ERROR: - Error Message 
> --
> [1]PETSC ERROR: Memory corruption: 
> https://www.mcs.anl.gov/petsc/documentation/installation.html#valgrind 
> 
> [1]PETSC ERROR: Corrupted memory
> [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html 
>  for trouble shooting.
> [1]PETSC ERROR: Petsc Release Version 3.14.4, unknown 
> [1]PETSC ERROR: ./par_sir_model on a arch-debug named srvulx13 by fbrarda Wed 
> Mar 31 16:05:22 2021
> [1]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ 
> --with-fc=gfortran --with-openblas-dir=/opt/packages/openblas/0.2.13-gcc 
> --download-mpich PETSC_ARCH=arch-debug
> [1]PETSC ERROR: #1 PetscTrFreeDefault() line 310 in 
> /home/fbrarda/petsc/src/sys/memory/mtr.c
> [1]PETSC ERROR: #2 VecDestroy_MPI() line 21 in 
> /home/fbrarda/petsc/src/vec/vec/impls/mpi/pdvec.c
> [1]PETSC ERROR: #3 VecDestroy() line 396 in 
> /home/fbrarda/petsc/src/vec/vec/interface/vector.c
> [1]PETSC ERROR: #4 TSReset() line 2806 in 
> /home/fbrarda/petsc/src/ts/interface/ts.c
> [1]PETSC ERROR: #5 TSDestroy() line 2856 in 
> /home/fbrarda/petsc/src/ts/interface/ts.c
> [1]PETSC ERROR: #6 main() line 256 in par_sir_model.c
> [1]PETSC ERROR: No PETSc Option Table entries
> [1]PETSC ERROR: End of Error Message ---send entire error 
> message to petsc-ma...@mcs.anl.gov--
> application called 
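
A minimal sketch of the indexing fix for the guess above, assuming the problem
really is VecGetArray() being indexed with global indices: the returned array
covers only the locally owned block, so it must be indexed with local indices
0..nlocal-1, and VecGetOwnershipRange() gives the offset. The vector size is
made up for illustration, and the PetscCall() macro assumes a PETSc newer than
the 3.14 shown in the trace above.

#include <petscvec.h>

int main(int argc, char **argv)
{
  Vec          x;
  PetscScalar *a;
  PetscInt     lo, hi;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(VecCreate(PETSC_COMM_WORLD, &x));
  PetscCall(VecSetSizes(x, PETSC_DECIDE, 16));
  PetscCall(VecSetFromOptions(x));

  PetscCall(VecGetOwnershipRange(x, &lo, &hi)); /* this rank owns global rows [lo,hi) */
  PetscCall(VecGetArray(x, &a));
  for (PetscInt g = lo; g < hi; g++) {
    /* WRONG on more than one rank: a[g] = ...  (writes past the local block) */
    a[g - lo] = (PetscScalar)g;                 /* correct: local index g - lo */
  }
  PetscCall(VecRestoreArray(x, &a));

  PetscCall(VecView(x, PETSC_VIEWER_STDOUT_WORLD));
  PetscCall(VecDestroy(&x));
  PetscCall(PetscFinalize());
  return 0;
}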

Re: [petsc-users] funny link error

2021-03-21 Thread Stefano Zampini
This looks like a CMake issue. Good luck.

On Sun, Mar 21, 2021 at 15:26 Mark Adams  wrote:

> We are having problems with linking; we use static linking.
> We get this error and have seen others like it (e.g., lpetsc_lib_gcc_s):
>
> /usr/bin/ld: cannot find -lpetsc_lib_wlm_detect-NOTFOUND
>
> wlm_detect is some sort of system library, but I have no idea where this
> petsc string comes from.
> This is on Cori and the application uses cmake.
> I can run PETSc tests fine.
>
> Any ideas?
>
> Thanks,
> Mark
>


-- 
Stefano


Re: [petsc-users] DMPlex in Firedrake: scaling of mesh distribution

2021-03-07 Thread Stefano Zampini
> I want to understand why calling CreateEmbeddedRootSF() would be an abuse.
>

It was just sarcasm to emphasize the number of new SFs created. Being a
very general code, DMPlex does the right thing and uses the proper calls.


> On Mar 7, 2021, at 10:01 PM, Barry Smith  wrote:
>>
>>
>>Mark,
>>
>>Thanks for the numbers.
>>
>>Extremely problematic. DMPlexDistribute takes 88 percent of the total
>> run time, SFBcastOpEnd takes 80 percent.
>>
>>Probably Matt is right, PetscSF is flooding the network which it
>> cannot handle. IMHO fixing PetscSF would be a far better route than writing
>> all kinds of fancy DMPLEX hierarchical distributors.   PetscSF needs to
>> detect that it  is sending too many messages together and do the messaging
>> in appropriate waves; at the moment PetscSF is as dumb as stone it just
>> shoves everything out as fast as it can. Junchao needs access to this
>> machine. If everything in PETSc will depend on PetscSF then it simply has
>> to scale on systems where you cannot just flood the network with MPI.
>>
>>   Barry
>>
>>
>> Mesh Partition 1 1.0 5.0133e+02 1.0 0.00e+00 0.0 1.3e+05 2.7e+02
>> 6.0e+00 15  0  0  0  0  15  0  0  0  1 0
>> Mesh Migration 1 1.0 1.5494e+03 1.0 0.00e+00 0.0 7.3e+05 1.9e+02
>> 2.4e+01 45  0  0  0  1  46  0  0  0  2 0
>> DMPlexPartStrtSF   1 1.0 4.9474e+023520.8 0.00e+00 0.0 3.3e+04
>> 4.3e+00.0e+00 14  0  0  0  0  15  0  0  0  0 0
>> DMPlexPointSF  1 1.0 9.8750e+021264.8 0.00e+00 0.0 6.6e+04
>> 5.4e+00.0e+00 28  0  0  0  0  29  0  0  0  0 0
>> DMPlexDistribute   1 1.0 3.e+03 1.5 0.00e+00 0.0 9.3e+05 2.3e+02
>> 3.0e+01 88  0  0  0  2  90  0  0  0  3 0
>> DMPlexDistCones1 1.0 1.0688e+03 2.6 0.00e+00 0.0 1.8e+05 3.1e+02
>> 1.0e+00 31  0  0  0  0  31  0  0  0  0 0
>> DMPlexDistLabels   1 1.0 2.9172e+02 1.0 0.00e+00 0.0 3.1e+05 1.9e+02
>> 2.1e+01  9  0  0  0  1   9  0  0  0  2 0
>> DMPlexDistField1 1.0 1.8688e+02 1.2 0.00e+00 0.0 2.1e+05 9.3e+01
>> 1.0e+00  5  0  0  0  0   5  0  0  0  0 0
>> SFSetUp   62 1.0 7.3283e+0213.6 0.00e+00 0.0 2.0e+07 2.7e+04
>> 0.0e+00  5  0  1  3  0   5  0  6  9  0 0
>> SFBcastOpBegin   107 1.0 1.5770e+00452.5 0.00e+00 0.0 2.1e+07 1.8e+04
>> 0.0e+00 0  0  1  2  0   0  0  6  6  0 0
>> SFBcastOpEnd 107 1.0 2.9430e+03 4.8 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 80  0  0  0  0  82  0  0  0  0 0
>> SFDistSection  9 1.0 4.4325e+02 1.5 0.00e+00 0.0 2.8e+06 1.1e+04
>> 9.0e+00 11  0  0  0  0  11  0  1  1  1 0
>> SFSectionSF   11 1.0 2.3898e+02 4.7 0.00e+00 0.0 9.2e+05 1.7e+05
>> 0.0e+00  5  0  0  1  0   5  0  0  2  0 0
>>
>> On Mar 7, 2021, at 7:35 AM, Mark Adams  wrote:
>>
>> And this data puts one cell per process, distributes, and then refines 5
>> (or 2,3,4 in plot) times.
>>
>> On Sun, Mar 7, 2021 at 8:27 AM Mark Adams  wrote:
>>
>>> FWIW, Here is the output from ex13 on 32K processes (8K Fugaku
>>> nodes/sockets, 4 MPI/node, which seems recommended) with 128^3 vertex mesh
>>> (64^3 Q2 3D Laplacian).
>>> Almost an hour.
>>> Attached is solver scaling.
>>>
>>>
>>>   0 SNES Function norm 3.658334849208e+00
>>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>>   1 SNES Function norm 1.609000373074e-12
>>> Nonlinear solve converged due to CONVERGED_ITS iterations 1
>>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>>
>>> 
>>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r
>>> -fCourier9' to print this document***
>>>
>>> 
>>>
>>> -- PETSc Performance
>>> Summary: --
>>>
>>> ../ex13 on a  named i07-4008c with 32768 processors, by a04199 Fri Feb
>>> 12 23:27:13 2021
>>> Using Petsc Development GIT revision: v3.14.4-579-g4cb72fa  GIT Date:
>>> 2021-02-05 15:19:40 +
>>>
>>>  Max   Max/Min Avg   Total
>>> Time (sec):   3.373e+03 1.000   3.373e+03
>>> Objects:  1.055e+0514.797   7.144e+03
>>> Flop: 5.376e+10 1.176   4.885e+10  1.601e+15
>>> 
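
As an aside on Barry's remark above about sending messages "in appropriate
waves": the program below is only a generic illustration of that idea, i.e.
bounding the number of sends in flight at any time. It is not how PetscSF works
or should work; the wave size and message length are assumptions made up for
the example.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  int rank, size, wave = 8, msglen = 4;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  double      *sendbuf = malloc((size_t)size * msglen * sizeof(double));
  double      *recvbuf = malloc((size_t)size * msglen * sizeof(double));
  MPI_Request *rreq    = malloc((size_t)size * sizeof(MPI_Request));
  MPI_Request *sreq    = malloc((size_t)wave * sizeof(MPI_Request));
  for (int i = 0; i < size * msglen; i++) sendbuf[i] = (double)rank;

  /* Receives are posted up front (they are cheap)... */
  for (int r = 0; r < size; r++)
    MPI_Irecv(recvbuf + (size_t)r * msglen, msglen, MPI_DOUBLE, r, 0,
              MPI_COMM_WORLD, &rreq[r]);

  /* ...but the sends go out in waves of bounded size instead of all at once */
  for (int start = 0; start < size; start += wave) {
    int batch = (size - start < wave) ? size - start : wave;
    for (int i = 0; i < batch; i++)
      MPI_Isend(sendbuf + (size_t)(start + i) * msglen, msglen, MPI_DOUBLE,
                start + i, 0, MPI_COMM_WORLD, &sreq[i]);
    MPI_Waitall(batch, sreq, MPI_STATUSES_IGNORE); /* drain this wave first */
  }
  MPI_Waitall(size, rreq, MPI_STATUSES_IGNORE);

  if (!rank) printf("exchanged %d doubles per pair in waves of %d\n", msglen, wave);
  free(sendbuf); free(recvbuf); free(rreq); free(sreq);
  MPI_Finalize();
  return 0;
}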

Re: [petsc-users] DMPlex in Firedrake: scaling of mesh distribution

2021-03-07 Thread Stefano Zampini
Mark

Since this is an MPI issue, you should run with -log_sync.
From your log, the problem seems to be SFSetUp, which is called many times (62),
with the timings mostly associated with the SF rank-revealing phase.
DMPlex abuses the embedded SF, and that can be optimized further, I presume. It
should run a cheaper operation (someone has to write the code), since the
communication graph of the embedded SF is a subgraph of the original one.
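
Below is a small, hedged sketch of the operation being discussed, restricting a
star forest to a subset of its roots. The two-root graph is invented purely for
illustration, and the PetscCall()/PetscSFCreateEmbeddedRootSF() names assume a
recent PETSc; nothing here reflects how DMPlex itself drives these calls.

#include <petscsf.h>

int main(int argc, char **argv)
{
  PetscSF     sf, esf;
  PetscSFNode remote[2];
  PetscMPIInt rank, size;
  PetscInt    selected[1] = {0}; /* keep only root 0 on each rank */

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
  PetscCallMPI(MPI_Comm_size(PETSC_COMM_WORLD, &size));

  /* Each rank owns 2 roots; its 2 leaves point at the next rank's roots */
  for (PetscInt i = 0; i < 2; i++) {
    remote[i].rank  = (rank + 1) % size;
    remote[i].index = i;
  }
  PetscCall(PetscSFCreate(PETSC_COMM_WORLD, &sf));
  PetscCall(PetscSFSetGraph(sf, 2, 2, NULL, PETSC_COPY_VALUES, remote, PETSC_COPY_VALUES));
  PetscCall(PetscSFSetUp(sf));

  /* The embedded SF keeps only the selected roots, so its communication
     graph is a subgraph of the original star forest */
  PetscCall(PetscSFCreateEmbeddedRootSF(sf, 1, selected, &esf));
  PetscCall(PetscSFView(esf, PETSC_VIEWER_STDOUT_WORLD));

  PetscCall(PetscSFDestroy(&esf));
  PetscCall(PetscSFDestroy(&sf));
  PetscCall(PetscFinalize());
  return 0;
}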



> On Mar 7, 2021, at 10:01 PM, Barry Smith  wrote:
> 
> 
>Mark,
> 
>Thanks for the numbers.
> 
>Extremely problematic. DMPlexDistribute takes 88 percent of the total run 
> time, SFBcastOpEnd takes 80 percent. 
> 
>Probably Matt is right, PetscSF is flooding the network which it cannot 
> handle. IMHO fixing PetscSF would be a far better route than writing all 
> kinds of fancy DMPLEX hierarchical distributors.   PetscSF needs to detect 
> that it  is sending too many messages together and do the messaging in 
> appropriate waves; at the moment PetscSF is as dumb as stone it just shoves 
> everything out as fast as it can. Junchao needs access to this machine. If 
> everything in PETSc will depend on PetscSF then it simply has to scale on 
> systems where you cannot just flood the network with MPI.
> 
>   Barry
> 
> 
> Mesh Partition 1 1.0 5.0133e+02 1.0 0.00e+00 0.0 1.3e+05 2.7e+02 
> 6.0e+00 15  0  0  0  0  15  0  0  0  1 0
> Mesh Migration 1 1.0 1.5494e+03 1.0 0.00e+00 0.0 7.3e+05 1.9e+02 
> 2.4e+01 45  0  0  0  1  46  0  0  0  2 0
> DMPlexPartStrtSF   1 1.0 4.9474e+023520.8 0.00e+00 0.0 3.3e+04 
> 4.3e+00.0e+00 14  0  0  0  0  15  0  0  0  0 0
> DMPlexPointSF  1 1.0 9.8750e+021264.8 0.00e+00 0.0 6.6e+04 
> 5.4e+00.0e+00 28  0  0  0  0  29  0  0  0  0 0
> DMPlexDistribute   1 1.0 3.e+03 1.5 0.00e+00 0.0 9.3e+05 2.3e+02 
> 3.0e+01 88  0  0  0  2  90  0  0  0  3 0
> DMPlexDistCones1 1.0 1.0688e+03 2.6 0.00e+00 0.0 1.8e+05 3.1e+02 
> 1.0e+00 31  0  0  0  0  31  0  0  0  0 0
> DMPlexDistLabels   1 1.0 2.9172e+02 1.0 0.00e+00 0.0 3.1e+05 1.9e+02 
> 2.1e+01  9  0  0  0  1   9  0  0  0  2 0
> DMPlexDistField1 1.0 1.8688e+02 1.2 0.00e+00 0.0 2.1e+05 9.3e+01 
> 1.0e+00  5  0  0  0  0   5  0  0  0  0 0
> SFSetUp   62 1.0 7.3283e+0213.6 0.00e+00 0.0 2.0e+07 2.7e+04 
> 0.0e+00  5  0  1  3  0   5  0  6  9  0 0
> SFBcastOpBegin   107 1.0 1.5770e+00452.5 0.00e+00 0.0 2.1e+07 1.8e+04 
> 0.0e+00 0  0  1  2  0   0  0  6  6  0 0
> SFBcastOpEnd 107 1.0 2.9430e+03 4.8 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00 80  0  0  0  0  82  0  0  0  0 0
> SFDistSection  9 1.0 4.4325e+02 1.5 0.00e+00 0.0 2.8e+06 1.1e+04 
> 9.0e+00 11  0  0  0  0  11  0  1  1  1 0
> SFSectionSF   11 1.0 2.3898e+02 4.7 0.00e+00 0.0 9.2e+05 1.7e+05 
> 0.0e+00  5  0  0  1  0   5  0  0  2  0 0
> 
>> On Mar 7, 2021, at 7:35 AM, Mark Adams  wrote:
>> 
>> And this data puts one cell per process, distributes, and then refines 5 (or 
>> 2,3,4 in plot) times.
>> 
>> On Sun, Mar 7, 2021 at 8:27 AM Mark Adams  wrote:
>> FWIW, Here is the output from ex13 on 32K processes (8K Fugaku 
>> nodes/sockets, 4 MPI/node, which seems recommended) with 128^3 vertex mesh 
>> (64^3 Q2 3D Laplacian).
>> Almost an hour.
>> Attached is solver scaling.
>> 
>> 
>>   0 SNES Function norm 3.658334849208e+00 
>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>   1 SNES Function norm 1.609000373074e-12 
>> Nonlinear solve converged due to CONVERGED_ITS iterations 1
>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>>   Linear solve converged due to CONVERGED_RTOL iterations 22
>> 
>> *** WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r 
>> -fCourier9' to print this document***
>> 
>> 
>> -- PETSc Performance Summary: 
>> --
>> 
>> ../ex13 on a  named i07-4008c with 32768 processors, by a04199 Fri Feb 12 
>> 23:27:13 2021
>> Using Petsc Development GIT revision: v3.14.4-579-g4cb72fa  GIT Date: 
>> 2021-02-05 15:19:40 +
>> 
>>  Max   Max/Min  

Re: [petsc-users] DMPlex in Firedrake: scaling of mesh distribution

2021-03-07 Thread Stefano Zampini
128^3 is the entire mesh. The blue line (1 phase) is with DMPlexDistribute;
the red line is with the two-stage approach.
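
For reference, a minimal sketch (not the code behind the red curve) of the
communicator setup such a two-stage split can start from: MPI_COMM_TYPE_SHARED
gives the per-node communicator for the second stage, and the node leaders form
the communicator over which rank 0 distributes in the first stage. Only the
communicator splitting is shown; the per-stage mesh migration itself is what is
described further down in this thread.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  MPI_Comm nodecomm, leadercomm = MPI_COMM_NULL;
  int      worldrank, noderank;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &worldrank);

  /* Stage 2 communicator: all ranks sharing a node (shared-memory domain) */
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL,
                      &nodecomm);
  MPI_Comm_rank(nodecomm, &noderank);

  /* Stage 1 communicator: one "leader" rank per node; rank 0 of
     MPI_COMM_WORLD would distribute the coarse mesh over this communicator */
  MPI_Comm_split(MPI_COMM_WORLD, noderank == 0 ? 0 : MPI_UNDEFINED, worldrank,
                 &leadercomm);

  if (leadercomm != MPI_COMM_NULL) {
    int nleaders;
    MPI_Comm_size(leadercomm, &nleaders);
    if (worldrank == 0) printf("stage 1 runs over %d node leaders\n", nleaders);
    MPI_Comm_free(&leadercomm);
  }
  MPI_Comm_free(&nodecomm);
  MPI_Finalize();
  return 0;
}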

On Sun, Mar 7, 2021, 16:20 Mark Adams  wrote:

> Is phase 1 the old method and 2 the new?
> Is this 128^3 mesh per process?
>
> On Sun, Mar 7, 2021 at 7:27 AM Stefano Zampini 
> wrote:
>
>>
>>
>> [2] On the robustness and performance of entropy stable discontinuous
>>> collocation methods for the compressible Navier-Stokes equations, Rojas
>>> et al.
>>>   https://arxiv.org/abs/1911.10966
>>>
>>
>> This is not the proper reference; here is the correct one:
>> https://www.sciencedirect.com/science/article/pii/S0021999120306185?dgcid=rss_sd_all
>> However, there the algorithm is only outlined, and performance figures related
>> to the mesh distribution are not really reported.
>> We observed a large gain for large core counts and one-to-all
>> distributions (from minutes to seconds) by splitting the several
>> communication rounds needed by DMPlex into stages: from rank 0 to 1 rank
>> per node, and then decomposing independently within the node.
>> Attached is the total time for one-to-all DMPlexDistribute for a 128^3 mesh.
>>
>>
>>>
>>>
>>>> ?
>>>>
>>>> The attached plots suggest (A), (B), and (C) is happening for
>>>> Cahn-Hilliard problem (from firedrake-bench repo) on a 2D 8Kx8K
>>>> unit-square mesh. The implementation is here [1]. Versions are
>>>> Firedrake, PyOp2: 20200204.0; PETSc 3.13.1; ParMETIS 4.0.3.
>>>>
>>>> Two questions, one on (A) and the other on (B)+(C):
>>>>
>>>> 1. Is result (A) expected? Given (A), any effort to improve the quality
>>>> of the compiled assembly kernels (or anything else other than mesh
>>>> distribution) appears futile since it takes 1% of end-to-end execution
>>>> time, or am I missing something?
>>>>
>>>> 1a. Is mesh distribution fundamentally necessary for any FEM framework,
>>>> or is it only needed by Firedrake? If the latter, then how do other
>>>> frameworks partition the mesh and execute in parallel with MPI but avoid
>>>> the non-scalable mesh distribution step?
>>>>
>>>> 2. Results (B) and (C) suggest that the mesh distribution step does
>>>> not scale. Is it a fundamental property of the mesh distribution problem
>>>> that it has a central bottleneck in the master process, or is it
>>>> a limitation of the current implementation in PETSc-DMPlex?
>>>>
>>>> 2a. Our (B) result seems to agree with Figure 4(left) of [2]. Fig 6 of
>>>> [2]
>>>> suggests a way to reduce the time spent on the sequential bottleneck by
>>>> "parallel mesh refinement" that creates high-resolution meshes from an
>>>> initial coarse mesh. Is this approach implemented in DMPlex?  If so, any
>>>> pointers on how to try it out with Firedrake? If not, any other
>>>> directions for reducing this bottleneck?
>>>>
>>>> 2b. Fig 6 in [3] shows plots for Assembly and Solve steps that scale
>>>> well up
>>>> to 96 cores -- is mesh distribution included in those times?  Is anyone
>>>> reading this aware of any other publications with evaluations of
>>>> Firedrake that measure mesh distribution (or explain how to avoid or
>>>> exclude it)?
>>>>
>>>> Thank you for your time and any info or tips.
>>>>
>>>>
>>>> [1]
>>>> https://github.com/ISI-apex/firedrake-bench/blob/master/cahn_hilliard/firedrake_cahn_hilliard_problem.py
>>>>
>>>> [2] Unstructured Overlapping Mesh Distribution in Parallel, Matthew G.
>>>> Knepley, Michael Lange, Gerard J. Gorman, 2015.
>>>> https://arxiv.org/pdf/1506.06194.pdf
>>>>
>>>> [3] Efficient mesh management in Firedrake using PETSc-DMPlex, Michael
>>>> Lange, Lawrence Mitchell, Matthew G. Knepley and Gerard J. Gorman, SISC,
>>>> 38(5), S143-S155, 2016. http://arxiv.org/abs/1506.07749
>>>>
>>>
>>
>> --
>> Stefano
>>
>
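
On question 2a above (distribute a coarse mesh once, then refine in parallel),
here is a hedged sketch of that flow with plain DMPlex calls. It assumes a
recent PETSc in which DMSetFromOptions() on a DMPLEX builds the coarse box mesh
from -dm_plex_* options; the refinement count and the option names in the
comments are illustrative, and this is not the Firedrake code path.

#include <petscdmplex.h>

int main(int argc, char **argv)
{
  DM       dm, dmdist, dmref;
  PetscInt nrefine = 3;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(DMCreate(PETSC_COMM_WORLD, &dm));
  PetscCall(DMSetType(dm, DMPLEX));
  PetscCall(DMSetFromOptions(dm)); /* coarse mesh, e.g. -dm_plex_dim 3 -dm_plex_box_faces 4,4,4 */

  /* Distribute the *coarse* mesh; this is the only migration step */
  PetscCall(DMPlexDistribute(dm, 0, NULL, &dmdist));
  if (dmdist) {
    PetscCall(DMDestroy(&dm));
    dm = dmdist;
  }

  /* Refine in parallel; no further redistribution is done here */
  for (PetscInt r = 0; r < nrefine; r++) {
    PetscCall(DMRefine(dm, PetscObjectComm((PetscObject)dm), &dmref));
    PetscCall(DMDestroy(&dm));
    dm = dmref;
  }

  PetscCall(DMViewFromOptions(dm, NULL, "-refined_dm_view"));
  PetscCall(DMDestroy(&dm));
  PetscCall(PetscFinalize());
  return 0;
}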

