Re: [petsc-users] Including Implementations in my code

2019-11-14 Thread Jed Brown via petsc-users
Ritwik Saha via petsc-users  writes:

> Hi All,
>
> PETSc provides various implementations of functions like VecAXPY() in CUDA.
> I am talking specifically about VecAXPY_SeqCUDA() in
> src/vec/vec/impls/seq/seqcuda/veccuda2.cu . How do I include these
> functions in my C code?

I'm not sure I follow.  If you want to call those functions, set the
VecType to VECCUDA.  For many examples, this is done via the run-time
option

  -dm_vec_type cuda

(see many examples in the PETSc source tree).  If you're trying to copy
the implementation into your code without using PETSc, you're on your
own.
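
From code, the equivalent of the run-time option is to set the Vec type
before use.  A minimal sketch (assuming PETSc was configured with
--with-cuda; sizes and values here are arbitrary):

  #include <petscvec.h>

  int main(int argc, char **argv)
  {
    Vec            x, y;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
    ierr = VecCreate(PETSC_COMM_WORLD, &x);CHKERRQ(ierr);
    ierr = VecSetSizes(x, PETSC_DECIDE, 100);CHKERRQ(ierr);
    ierr = VecSetType(x, VECCUDA);CHKERRQ(ierr); /* VecAXPY() now dispatches to the CUDA implementation */
    ierr = VecDuplicate(x, &y);CHKERRQ(ierr);
    ierr = VecSet(x, 1.0);CHKERRQ(ierr);
    ierr = VecSet(y, 2.0);CHKERRQ(ierr);
    ierr = VecAXPY(y, 3.0, x);CHKERRQ(ierr);     /* y <- y + 3*x, executed on the GPU */
    ierr = VecDestroy(&x);CHKERRQ(ierr);
    ierr = VecDestroy(&y);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }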


Re: [petsc-users] quick question on MatXAIJSetPreallocation

2019-10-30 Thread Jed Brown via petsc-users
Xiangdong via petsc-users  writes:

> Hello everyone,
>
> It seems that to use MatXAIJSetPreallocation, one has to pass the array of
> the number of nonzero blocks per row, even if this number is the same across
> all the local rows.
>
> For the other preallocation functions (say, MatMPIBAIJSetPreallocation),
> they do support both a constant and an array for the number of nonzeros.
>
> I am curious about the reason MatAIJSetPreallocation does not support the
> constant number of nonzeros case.

It was in the interest of limiting the number of arguments and
recognizing that it's easy to call all relevant preallocation functions
if you have constant row lengths, but somewhat involved to convert if
you don't.
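
For reference, a sketch of the expansion for constant row lengths (d_nz and
o_nz are placeholders for the application's diagonal/off-diagonal counts per
block row; the upper-triangular arguments are left NULL):

  #include <petscmat.h>

  /* Expand constant per-block-row counts into the arrays MatXAIJSetPreallocation() expects. */
  PetscErrorCode PreallocateConstant(Mat A, PetscInt bs, PetscInt nblockrows_local, PetscInt d_nz, PetscInt o_nz)
  {
    PetscErrorCode ierr;
    PetscInt       i, *dnnz, *onnz;

    PetscFunctionBeginUser;
    ierr = PetscMalloc2(nblockrows_local, &dnnz, nblockrows_local, &onnz);CHKERRQ(ierr);
    for (i = 0; i < nblockrows_local; i++) { dnnz[i] = d_nz; onnz[i] = o_nz; }
    ierr = MatXAIJSetPreallocation(A, bs, dnnz, onnz, NULL, NULL);CHKERRQ(ierr);
    ierr = PetscFree2(dnnz, onnz);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }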


Re: [petsc-users] DMCompositeSetCoupling -- simple example?

2019-10-28 Thread Jed Brown via petsc-users
Its purpose is to preallocate off-diagonal blocks, but unless you're
hard up against your memory capacity, I would skip that clumsy code and
use MatPreallocator.
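
A sketch of the MatPreallocator workflow (InsertMyEntries() stands in for
whatever assembly loop the application already has, and nlocal for its local
size; the values inserted into the preallocator are ignored, only the
nonzero pattern is recorded):

  Mat preallocator, A;

  ierr = MatCreate(PETSC_COMM_WORLD, &preallocator);CHKERRQ(ierr);
  ierr = MatSetSizes(preallocator, nlocal, nlocal, PETSC_DECIDE, PETSC_DECIDE);CHKERRQ(ierr);
  ierr = MatSetType(preallocator, MATPREALLOCATOR);CHKERRQ(ierr);
  ierr = MatSetUp(preallocator);CHKERRQ(ierr);
  ierr = InsertMyEntries(preallocator);CHKERRQ(ierr);               /* first pass: pattern only */
  ierr = MatAssemblyBegin(preallocator, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(preallocator, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, nlocal, nlocal, PETSC_DECIDE, PETSC_DECIDE);CHKERRQ(ierr);
  ierr = MatSetType(A, MATAIJ);CHKERRQ(ierr);
  ierr = MatPreallocatorPreallocate(preallocator, PETSC_TRUE, A);CHKERRQ(ierr);
  ierr = MatDestroy(&preallocator);CHKERRQ(ierr);
  ierr = InsertMyEntries(A);CHKERRQ(ierr);                          /* second pass: real values */
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);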

"Ellen M. Price via petsc-users"  writes:

> Hello again!
>
> For my multiphysics problem, I think a DMComposite might make the most
> sense, since some variables are only defined in 1d (so any 2d
> information is just wasting memory and CPU), while others have a full 2d
> distribution. To tell PETSc how the two DMs should be coupled, it looks
> like I need DMCompositeSetCoupling, but there are no examples on how to
> use it, or even what the parameters of the provided function mean. Does
> anyone have a simple example to share? Doesn't have to be full code, a
> snippet would be just fine.
>
> Thanks,
> Ellen Price


Re: [petsc-users] R: AMG Variable block preconditioner

2019-10-28 Thread Jed Brown via petsc-users
Smoothed aggregation mostly just cares about the near-null space
(MatSetNearNullSpace), which is a global property.  Classical AMG uses
block size directly (number of dofs per C-point), but I'm not aware of
any implementation that supports variable block size.  This would be a
research topic.
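
For reference, attaching the near-null space is a couple of calls.  A sketch
for 3D elasticity, where the rigid-body modes are built from a coordinate
Vec with block size 3 (for a scalar elliptic operator the default constant
near-null space is usually adequate):

  MatNullSpace nearnull;

  ierr = MatNullSpaceCreateRigidBody(coords, &nearnull);CHKERRQ(ierr);
  ierr = MatSetNearNullSpace(A, nearnull);CHKERRQ(ierr);
  ierr = MatNullSpaceDestroy(&nearnull);CHKERRQ(ierr);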

Marco Cisternino via petsc-users  writes:

> Hi Stefano,
> thanks for your interest.
> Mainly, elliptic operators both in the context of methods in NS solvers and 
> in the context of mesh deformation.
> I didn't think about AMG on each block. What I was thinking about is an AMG 
> preconditioner aware of the variable size blocks.
> I hope it is clearer than before.
> Thanks,
>
>
> Marco Cisternino, PhD
> marco.cistern...@optimad.it
> ___
> OPTIMAD Engineering srl
> Via Giacinto Collegno 18, Torino, Italia.
> +3901119719782
> www.optimad.it
>
>
> 
> From: Stefano Zampini 
> Sent: Monday, 28 October 2019 17:09
> To: Marco Cisternino
> Cc: petsc-users
> Subject: Re: [petsc-users] AMG Variable block preconditioner
>
> Marco,
>
> you should define what you mean by "AMG block preconditioning"; separate 
> AMG on each block? What operator do you plan to precondition?
> AFAIK, MatSetVariableBlockSizes is just exploited by the Jacobi solver.
>
> Thanks
> Stefano
>
> On Mon, 28 Oct 2019 at 19:05, Marco Cisternino via petsc-users 
> <petsc-users@mcs.anl.gov> wrote:
> Hello everybody.
> We would like to setup an algebraic multi-grid block preconditioner and I 
> would like to be able to define custom blocks of variable sizes. Reading the 
> documentation, it seems that this can be achieved using the "PCGAMG" 
> preconditioner and defining the blocks with the function 
> "MatSetVariableBlockSizes". Is this correct?
> Are the variable blocks exploited by PCGAMG algorithms? We suppose this PC 
> structure is already exploited in PCVPBJACOBI.
> Are the different options like PCHYPRE (setting "-pc_hypre_type" to 
> "boomeramg") suitable for AMG preconditioner with variable size blocks?
> If none of the above solutions are available, could anyone suggest any other 
> strategy in order to get a preconditioner like this using PETSc?
> Any help is much appreciated.
> Thanks.
>
> Best regards,
>
> Marco Cisternino
>
>
> --
> Stefano


Re: [petsc-users] TS_SSP implementation for co-dependent variables

2019-10-23 Thread Jed Brown via petsc-users
Manuel Valera  writes:

> Yes, all of that sounds correct to me,
>
> No I haven't tried embedding the column integral into the RHS, right now I
> am unable to think how to do this without the solution of the previous
> intermediate stage. Any ideas are welcome,

Do you have some technical notes on your present formulation?  I think
it just amounts to performing the integration and then evaluating the
differential operator using results of that integral.
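
Purely as a hypothetical sketch (ComputeColumnIntegral() and
EvaluateSpatialOperator() are placeholders for your existing routines, and
the context struct is abbreviated), an unsplit RHS would look like:

  typedef struct {
    Vec pressure;  /* work vector for the column-integrated hydrostatic pressure */
  } AppCtx;

  extern PetscErrorCode ComputeColumnIntegral(AppCtx*, Vec, Vec);
  extern PetscErrorCode EvaluateSpatialOperator(AppCtx*, PetscReal, Vec, Vec, Vec);

  static PetscErrorCode RHSFunction(TS ts, PetscReal t, Vec U, Vec F, void *ctx)
  {
    AppCtx        *user = (AppCtx*)ctx;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = ComputeColumnIntegral(user, U, user->pressure);CHKERRQ(ierr);         /* vertical (dense) coupling */
    ierr = EvaluateSpatialOperator(user, t, U, user->pressure, F);CHKERRQ(ierr); /* tendencies for all fields */
    PetscFunctionReturn(0);
  }

Registered once with TSSetRHSFunction(), every stage of the SSP method then
sees a consistent pressure correction.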

> Thanks,
>
> On Wed, Oct 9, 2019 at 4:18 PM Jed Brown  wrote:
>
>> Manuel Valera  writes:
>>
>> > Sorry, I don't follow this last email, my spatial discretization is
>> fixed,
>> > the problem is caused by the choice of vertical coordinate, in this case
>> > sigma, that calls for an integration of the hydrostatic pressure to
>> correct
>> > for the right velocities.
>>
>> Ah, fine.  To phrase this differently, you are currently solving an
>> integro-differential equation.  With an explicit integrator, you should
>> be able to embed that in the RHS function.  With an implicit integrator,
>> that causes the Jacobian to lose sparsity (the column integral is dense
>> coupling) so it's sometimes preferable to add pressure as an explicit
>> variable (or transform your existing variable set as part of a
>> preconditioner), in which case you get a differential algebraic equation
>> (the incompressible limit).
>>
>> Have you tried embedding the column integral into the RHS function to
>> make a single unsplit formulation?
>>
>> >  I had RK3 working before and SSP is much more stable, i can use way
>> bigger
>> > DTs but then i get this asynchronous time integration. With RK3 I can
>> > operate in the intermediate states and thus I can advance everything in
>> > synchronization, but bigger DTs are not viable, it turns unstable
>> quickly.
>> >
>> > On Wed, Oct 9, 2019 at 3:58 PM Jed Brown  wrote:
>> >
>> >> Is it a problem with the spatial discretization or with the time
>> >> discretization that you've been using thus far?  (This sort of problem
>> >> can occur for either reason.)
>> >>
>> >> Note that an SSP method is merely "preserving" -- the spatial
>> >> discretization needs to be strongly stable for an SSP method to preserve
>> >> it.  It sounds like yours is not, so maybe there is no particular
>> >> benefit to using SSP over any other method (but likely tighter time step
>> >> restriction).
>> >>
>> >> Manuel Valera  writes:
>> >>
>> >> > To correct for the deformation of the sigma coordinate grid... without
>> >> this
>> >> > correction the velocity become unphysical in the zones of high slope
>> of
>> >> the
>> >> > grid. This is very specific of our model and probably will be solved
>> by
>> >> > updating the equations transformation, but that's not nearly close to
>> >> > happening right now.
>> >> >
>> >> > On Wed, Oct 9, 2019 at 3:47 PM Jed Brown  wrote:
>> >> >
>> >> >> Manuel Valera  writes:
>> >> >>
>> >> >> > Thanks,
>> >> >> >
>> >> >> > My time integration schemes are all explicit, sorry if this a very
>> >> >> atypical
>> >> >> > setup. This is similar to the barotropic splitting but not
>> exactly, we
>> >> >> > don't have free surface in the model, this is only to correct for
>> >> sigma
>> >> >> > coordinates deformations in the velocity field.
>> >> >> >
>> >> >> > From how i see it this could be solved by obtaining the
>> intermediate
>> >> >> stages
>> >> >> > and then updating them accordingly, is this not possible to do ?
>> >> >>
>> >> >> Why are you splitting if all components are explicit and not
>> subcycled?
>> >> >>
>> >>
>>


Re: [petsc-users] Block preconditioning for 3d problem

2019-10-14 Thread Jed Brown via petsc-users
Dave Lee  writes:

> Thanks Jed,
>
> I will reconfigure my PETSc with MUMPS or SuperLU and see if that helps.
> (my code is configured to only run in parallel on 6*n^2 processors (n^2
> procs on each face of a cubed sphere), which is a little annoying for
> situations like these where a serial LU solver would be handy for testing).
>
> I've tried setting a tight tolerance on the slab-wise block solve, and
> preconditioning based on the pressure, but these haven't helped.
>
> Unlike other atmosphere codes I'm using slab (layer)-minor indexing, as my
> horizontal discretisation is arbitrary order, while I'm only piecewise
> constant/linear in the vertical, so I've already got the slab-wise dofs
> close together in memory.

Being high order does not necessarily mean more tightly coupled.  A
useful test is to compute a column of the Jacobian inverse (e.g., by
solving with a right-hand side that is a column of the identity) and
plot it to see how it extends in each spatial dimension.
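
A sketch of that test (j is a global column index of interest; J and ksp
are assumed to be the assembled Jacobian and a solver configured with a
strong or direct preconditioner):

  Vec e, col;

  ierr = MatCreateVecs(J, &e, &col);CHKERRQ(ierr);
  ierr = VecSet(e, 0.0);CHKERRQ(ierr);
  ierr = VecSetValue(e, j, 1.0, INSERT_VALUES);CHKERRQ(ierr);   /* column j of the identity */
  ierr = VecAssemblyBegin(e);CHKERRQ(ierr);
  ierr = VecAssemblyEnd(e);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, e, col);CHKERRQ(ierr);                   /* col = J^{-1} e_j */
  ierr = VecView(col, PETSC_VIEWER_STDOUT_WORLD);CHKERRQ(ierr); /* or dump to a file and plot */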

> I feel like maybe the construction of the Hessenberg during the Arnoldi
> iteration and/or the least squares minimisation for the GMRES solve is
> leading to additional coupling between the slabs, which is maybe degrading
> the convergence, just a thought...

GMRES does not care how you order degrees of freedom.

My guess based on what I've seen in this thread is that there is a bug
in your discretization.


Re: [petsc-users] Block preconditioning for 3d problem

2019-10-11 Thread Jed Brown via petsc-users
Dave Lee  writes:

> Hey Jed,
>
> Right now they are uncoupled purely for testing purposes. I just want to
> get the uncoupled problem right before moving on to the coupled problem.

Okay, well you may not want to change your dof ordering to put slabs
together since you'll likely have some parts of your code that do
vertical processing (column integrals/physics or PDE stiffness in the
vertical).  It's common for climate/weather codes to use some horizontal
index (within a blocking scheme or an entire subdomain array index) as
the innermost to facilitate vectorization of the column processing
(which has high enough arithmetic intensity that vectorization matters).

> When I add the vertical dynamics back in both the outer nonlinear
> convergence and the inner linear convergence are roughly the same as for
> when I solve for the 2D slabs only as a 3D system.

Which you've said is terrible compared to separate 2D slabs, so we have
to get to the bottom of that.  We outlined a number of tests you can do
to understand why it isn't converging as expected, but haven't heard
back on the results so it'll be hard to advise further.

> I like your scaling idea. I tried using the pressure multiplied by the mass
> matrix as a preconditioner for the inner linear problem, but this didn't
> help. Perhaps some sort of scaling is the way to go though.
>
> Cheers, Dave.
>
> On Fri, Oct 11, 2019 at 2:45 PM Jed Brown  wrote:
>
>> Why are your slabs decoupled at present?  (Have you done a transform in
>> the vertical?)  Is the linear convergence significantly different when
>> you include the multiple layers?
>>
>> Dave Lee  writes:
>>
>> > Hi Jed and Mark,
>> >
>> > thanks for your helpful comments. Yes the nonlinear outer problem is
>> > uncoupled between the slabs, it is only the linear inner problem where
>> they
>> > are coupled.
>> >
>> > I've tried to make the slab DOFs close in memory, and also tried using a
>> > tight tolerance on the outer KSP (1.0e-20), but without success.
>> >
>> > You make some really good points about scaling issues Jed and Mark. This
>> is
>> > a solve for a global atmosphere, and each 2D slab is a horizontal layer
>> of
>> > the atmosphere, so the pressure (which the linear solve is for) will vary
>> > dramatically between slabs. Perhaps I can additionally precondition the
>> > linear problem to normalise the pressure in each slab so that they stay
>> > close to unity.
>> >
>> > Cheers, Dave.
>> >
>> > On Fri, Oct 11, 2019 at 2:52 AM Mark Adams  wrote:
>> >
>> >> I can think of a few sources of coupling in the solver: 1) line search
>> and
>> >> 2) Krylov, and 3) the residual test (scaling issues). You could turn
>> >> linesearch off and use Richardson (with a fixed number of iterations) or
>> >> exact solves as Jed suggested. As far as scaling can you use the same NL
>> >> problem on each slab? This should fix all the problems anyway. Or, on
>> the
>> >> good decouple solves, if the true residual is of the same scale and
>> *all*
>> >> of the slabs converge well then you should be OK on scaling. If this
>> works
>> >> then start adding stuff back in and see what breaks it.
>> >>
>> >> On Thu, Oct 10, 2019 at 11:01 AM Jed Brown via petsc-users <
>> >> petsc-users@mcs.anl.gov> wrote:
>> >>
>> >>> Dave Lee via petsc-users  writes:
>> >>>
>> >>> > Hi PETSc,
>> >>> >
>> >>> > I have a nonlinear 3D problem for a set of uncoupled 2D slabs.
>> (Which I
>> >>> > ultimately want to couple once this problem is solved).
>> >>> >
>> >>> > When I solve the inner linear problem for each of these 2D slabs
>> >>> > individually (using KSPGMRES) the convergence of the outer nonlinear
>> >>> > problem is good. However when I solve the inner linear problem as a
>> >>> single
>> >>> > 3D problem (with no coupling between the 2D slabs, so the matrix is
>> >>> > effectively a set of uncoupled blocks, one for each 2D slab) the
>> outer
>> >>> > nonlinear convergence degrades dramatically.
>> >>>
>> >>> Is the nonlinear problem also decoupled between slabs?
>> >>>
>> >>> If you solve the linear problem accurately (tight tolerances on the
>> >>> outer KSP, or global direct solve), is the outer nonlinear convergence
>> >>> good 

Re: [petsc-users] Block preconditioning for 3d problem

2019-10-10 Thread Jed Brown via petsc-users
Why are your slabs decoupled at present?  (Have you done a transform in
the vertical?)  Is the linear convergence significantly different when
you include the multiple layers?

Dave Lee  writes:

> Hi Jed and Mark,
>
> thanks for your helpful comments. Yes the nonlinear outer problem is
> uncoupled between the slabs, it is only the linear inner problem where they
> are coupled.
>
> I've tried to make the slab DOFs close in memory, and also tried using a
> tight tolerance on the outer KSP (1.0e-20), but without success.
>
> You make some really good points about scaling issues Jed and Mark. This is
> a solve for a global atmosphere, and each 2D slab is a horizontal layer of
> the atmosphere, so the pressure (which the linear solve is for) will vary
> dramatically between slabs. Perhaps I can additionally precondition the
> linear problem to normalise the pressure in each slab so that they stay
> close to unity.
>
> Cheers, Dave.
>
> On Fri, Oct 11, 2019 at 2:52 AM Mark Adams  wrote:
>
>> I can think of a few sources of coupling in the solver: 1) line search and
>> 2) Krylov, and 3) the residual test (scaling issues). You could turn
>> linesearch off and use Richardson (with a fixed number of iterations) or
>> exact solves as Jed suggested. As far as scaling can you use the same NL
>> problem on each slab? This should fix all the problems anyway. Or, on the
>> good decouple solves, if the true residual is of the same scale and *all*
>> of the slabs converge well then you should be OK on scaling. If this works
>> then start adding stuff back in and see what breaks it.
>>
>> On Thu, Oct 10, 2019 at 11:01 AM Jed Brown via petsc-users <
>> petsc-users@mcs.anl.gov> wrote:
>>
>>> Dave Lee via petsc-users  writes:
>>>
>>> > Hi PETSc,
>>> >
>>> > I have a nonlinear 3D problem for a set of uncoupled 2D slabs. (Which I
>>> > ultimately want to couple once this problem is solved).
>>> >
>>> > When I solve the inner linear problem for each of these 2D slabs
>>> > individually (using KSPGMRES) the convergence of the outer nonlinear
>>> > problem is good. However when I solve the inner linear problem as a
>>> single
>>> > 3D problem (with no coupling between the 2D slabs, so the matrix is
>>> > effectively a set of uncoupled blocks, one for each 2D slab) the outer
>>> > nonlinear convergence degrades dramatically.
>>>
>>> Is the nonlinear problem also decoupled between slabs?
>>>
>>> If you solve the linear problem accurately (tight tolerances on the
>>> outer KSP, or global direct solve), is the outer nonlinear convergence
>>> good again?  If not, test that your Jacobian is correct (it likely isn't
>>> or you have inconsistent scaling leading to ill conditioning).  SNES has
>>> automatic tests for that, but you aren't using SNES so you'd have to
>>> write some code.
>>>
>>> What happens if you run the 2D problem (where convergence is currently
>>> good) with much smaller subdomains (or -pc_type pbjacobi)?
>>>
>>> > Note that I am not using SNES, just my own quasi-Newton approach for the
>>> > outer nonlinear problem.
>>> >
>>> > I suspect that the way to recover the convergence for the 3D coupled
>>> > problem is to use some sort of PCBJACOBI or PCFIELDSPLIT preconditioner,
>>> > but I'm not entirely sure. I've tried following this example:
>>> >
>>> https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex7.c.html
>>> > but without any improvement in the convergence.
>>> >
>>> > On each processor the slab index is the major index and the internal
>>> slab
>>> > DOF index is the minor index. The parallel decomposition is within the
>>> 2D
>>> > slab dimensions only, not between slabs.
>>>
>>> For convergence, you usually want the direction of tight coupling
>>> (sounds like that is within slabs) to be close in memory.
>>>
>>> In general, use -ksp_monitor_true_residual -ksp_converged_reason.
>>>
>>> > I'm configuring the 3D inner linear solver as
>>> >
>>> > KSPGetPC(ksp, &pc);
>>> > PCSetType(pc, PCBJACOBI);
>>> > PCBJacobiSetLocalBlocks(pc, num_slabs, NULL);
>>> > KSPSetUp(ksp);
>>> > PCBJacobiGetSubKSP(pc, &nlocal, &first_local, &subksp);
>>> > for(int ii = 0; ii < nlocal; ii++) {
>>> > KSPGetPC(subksp[ii], &subpc);
>>> > PCSetType(subpc, PCJACOBI);
>>> > KSPSetType(subksp[ii], KSPGMRES);
>>> > KSPSetTolerances(subksp[ii], 1.e-12, PETSC_DEFAULT, PETSC_DEFAULT,
>>> > PETSC_DEFAULT);
>>> > }
>>> >
>>> > However this does not seem to change the outer nonlinear convergence.
>>> When
>>> > I run with ksp_view the local block sizes look correct - number of
>>> local 2D
>>> > slab DOFs for each block on each processor.
>>> >
>>> > Any ideas on what sort of KSP/PC configuration I should be using to
>>> recover
>>> > the convergence of the uncoupled 2D slab linear solve for this 3D linear
>>> > solve? Should I be rethinking my node indexing to help with this?
>>> >
>>> > Cheers, Dave.
>>>
>>


Re: [petsc-users] Block preconditioning for 3d problem

2019-10-10 Thread Jed Brown via petsc-users
Dave Lee via petsc-users  writes:

> Hi PETSc,
>
> I have a nonlinear 3D problem for a set of uncoupled 2D slabs. (Which I
> ultimately want to couple once this problem is solved).
>
> When I solve the inner linear problem for each of these 2D slabs
> individually (using KSPGMRES) the convergence of the outer nonlinear
> problem is good. However when I solve the inner linear problem as a single
> 3D problem (with no coupling between the 2D slabs, so the matrix is
> effectively a set of uncoupled blocks, one for each 2D slab) the outer
> nonlinear convergence degrades dramatically.

Is the nonlinear problem also decoupled between slabs?

If you solve the linear problem accurately (tight tolerances on the
outer KSP, or global direct solve), is the outer nonlinear convergence
good again?  If not, test that your Jacobian is correct (it likely isn't
or you have inconsistent scaling leading to ill conditioning).  SNES has
automatic tests for that, but you aren't using SNES so you'd have to
write some code.

What happens if you run the 2D problem (where convergence is currently
good) with much smaller subdomains (or -pc_type pbjacobi)?

> Note that I am not using SNES, just my own quasi-Newton approach for the
> outer nonlinear problem.
>
> I suspect that the way to recover the convergence for the 3D coupled
> problem is to use some sort of PCBJACOBI or PCFIELDSPLIT preconditioner,
> but I'm not entirely sure. I've tried following this example:
> https://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex7.c.html
> but without any improvement in the convergence.
>
> On each processor the slab index is the major index and the internal slab
> DOF index is the minor index. The parallel decomposition is within the 2D
> slab dimensions only, not between slabs.

For convergence, you usually want the direction of tight coupling
(sounds like that is within slabs) to be close in memory.

In general, use -ksp_monitor_true_residual -ksp_converged_reason.

> I'm configuring the 3D inner linear solver as
>
> KSPGetPC(ksp, &pc);
> PCSetType(pc, PCBJACOBI);
> PCBJacobiSetLocalBlocks(pc, num_slabs, NULL);
> KSPSetUp(ksp);
> PCBJacobiGetSubKSP(pc, &nlocal, &first_local, &subksp);
> for(int ii = 0; ii < nlocal; ii++) {
> KSPGetPC(subksp[ii], &subpc);
> PCSetType(subpc, PCJACOBI);
> KSPSetType(subksp[ii], KSPGMRES);
> KSPSetTolerances(subksp[ii], 1.e-12, PETSC_DEFAULT, PETSC_DEFAULT,
> PETSC_DEFAULT);
> }
>
> However this does not seem to change the outer nonlinear convergence. When
> I run with ksp_view the local block sizes look correct - number of local 2D
> slab DOFs for each block on each processor.
>
> Any ideas on what sort of KSP/PC configuration I should be using to recover
> the convergence of the uncoupled 2D slab linear solve for this 3D linear
> solve? Should I be rethinking my node indexing to help with this?
>
> Cheers, Dave.


Re: [petsc-users] TS_SSP implementation for co-dependent variables

2019-10-09 Thread Jed Brown via petsc-users
Manuel Valera  writes:

> Sorry, I don't follow this last email, my spatial discretization is fixed,
> the problem is caused by the choice of vertical coordinate, in this case
> sigma, that calls for an integration of the hydrostatic pressure to correct
> for the right velocities.

Ah, fine.  To phrase this differently, you are currently solving an
integro-differential equation.  With an explicit integrator, you should
be able to embed that in the RHS function.  With an implicit integrator,
that causes the Jacobian to lose sparsity (the column integral is dense
coupling) so it's sometimes preferable to add pressure as an explicit
variable (or transform your existing variable set as part of a
preconditioner), in which case you get a differential algebraic equation
(the incompressible limit).

Have you tried embedding the column integral into the RHS function to
make a single unsplit formulation?

>  I had RK3 working before and SSP is much more stable, i can use way bigger
> DTs but then i get this asynchronous time integration. With RK3 I can
> operate in the intermediate states and thus I can advance everything in
> synchronization, but bigger DTs are not viable, it turns unstable quickly.
>
> On Wed, Oct 9, 2019 at 3:58 PM Jed Brown  wrote:
>
>> Is it a problem with the spatial discretization or with the time
>> discretization that you've been using thus far?  (This sort of problem
>> can occur for either reason.)
>>
>> Note that an SSP method is merely "preserving" -- the spatial
>> discretization needs to be strongly stable for an SSP method to preserve
>> it.  It sounds like yours is not, so maybe there is no particular
>> benefit to using SSP over any other method (but likely tighter time step
>> restriction).
>>
>> Manuel Valera  writes:
>>
>> > To correct for the deformation of the sigma coordinate grid... without
>> this
>> > correction the velocity become unphysical in the zones of high slope of
>> the
>> > grid. This is very specific of our model and probably will be solved by
>> > updating the equations transformation, but that's not nearly close to
>> > happening right now.
>> >
>> > On Wed, Oct 9, 2019 at 3:47 PM Jed Brown  wrote:
>> >
>> >> Manuel Valera  writes:
>> >>
>> >> > Thanks,
>> >> >
>> >> > My time integration schemes are all explicit, sorry if this a very
>> >> atypical
>> >> > setup. This is similar to the barotropic splitting but not exactly, we
>> >> > don't have free surface in the model, this is only to correct for
>> sigma
>> >> > coordinates deformations in the velocity field.
>> >> >
>> >> > From how i see it this could be solved by obtaining the intermediate
>> >> stages
>> >> > and then updating them accordingly, is this not possible to do ?
>> >>
>> >> Why are you splitting if all components are explicit and not subcycled?
>> >>
>>


Re: [petsc-users] TS_SSP implementation for co-dependent variables

2019-10-09 Thread Jed Brown via petsc-users
Is it a problem with the spatial discretization or with the time
discretization that you've been using thus far?  (This sort of problem
can occur for either reason.)

Note that an SSP method is merely "preserving" -- the spatial
discretization needs to be strongly stable for an SSP method to preserve
it.  It sounds like yours is not, so maybe there is no particular
benefit to using SSP over any other method (but likely tighter time step
restriction).

Manuel Valera  writes:

> To correct for the deformation of the sigma coordinate grid... without this
> correction the velocity become unphysical in the zones of high slope of the
> grid. This is very specific of our model and probably will be solved by
> updating the equations transformation, but that's not nearly close to
> happening right now.
>
> On Wed, Oct 9, 2019 at 3:47 PM Jed Brown  wrote:
>
>> Manuel Valera  writes:
>>
>> > Thanks,
>> >
>> > My time integration schemes are all explicit, sorry if this a very
>> atypical
>> > setup. This is similar to the barotropic splitting but not exactly, we
>> > don't have free surface in the model, this is only to correct for sigma
>> > coordinates deformations in the velocity field.
>> >
>> > From how i see it this could be solved by obtaining the intermediate
>> stages
>> > and then updating them accordingly, is this not possible to do ?
>>
>> Why are you splitting if all components are explicit and not subcycled?
>>


Re: [petsc-users] TS_SSP implementation for co-dependent variables

2019-10-09 Thread Jed Brown via petsc-users
Manuel Valera  writes:

> Thanks,
>
> My time integration schemes are all explicit, sorry if this is a very atypical
> setup. This is similar to the barotropic splitting but not exactly, we
> don't have free surface in the model, this is only to correct for sigma
> coordinates deformations in the velocity field.
>
> From how i see it this could be solved by obtaining the intermediate stages
> and then updating them accordingly, is this not possible to do ?

Why are you splitting if all components are explicit and not subcycled?


Re: [petsc-users] TS_SSP implementation for co-dependent variables

2019-10-09 Thread Jed Brown via petsc-users
Manuel Valera  writes:

> Thanks for the answer, I will read the mentioned example, but to clarify
> for Barry I will schematize the process:
>
> At time n, the program need to do all of these at once:
>
>1. Solve T as a function of u,v,w
>2. Solve S as a function of u,v,w
>3. Solve rho density as a function of T,S
>4. Derivate a correction of the velocity fields from the density
>5. Solve u,v,w being corrected by the density field
>
> What I have implemented so far:
>
>1. Advance TS1 to solve for T
>2. Advance TS2 to solve for S
>3. Solve rho and calculate correction
>4. Advance TS3 to solve for u,v,w
>
> Or, altenatively:
>
>1. Advance TS to solve for T,S,u,v,w at the same time.
>2. Solve rho and calculate correction
>
> Both implementation are lacking the feedback from the T,S <-> rho <->
> Velocities interaction, and is creating problems when using a bigger DT.
>
> All the systems from the first numeration are different algorithms, and
> each TS in the 2nd numeration generate a different RHS.
>
> What Jed is suggesting is to create an overarching routine that does all
> that is the first list under one single step?

TSSSP isn't SSP or high order with ad-hoc coupling procedures such as
the above.  If you're in a parameter regime that is stiff (e.g., typical
regime in which barotropic splitting is popular), I would suggest
putting the semi-implicit component into a preconditioner or
Rosenbrock-W Jacobian approximation so that you can preserve the
desirable accuracy and stability properties of the method.


Re: [petsc-users] TS_SSP implementation for co-dependent variables

2019-10-09 Thread Jed Brown via petsc-users
Manuel Valera via petsc-users  writes:

> Hello,
>
> I have a set of equations which are co-dependent when integrating in time,
> this means the velocities u,v,w need a component from the Temperature and
> Salinity integration at the same intermediate step. Same for Temperature
> and Salinity, which need the current velocities (at the intermediate time
> stages) to be computed accurately.

There's nothing special about having different fields.  Just evaluate
the RHS function of the coupled system.  There are examples (e.g.,
ts/examples/tutorials/ex9.c) of doing this for systems like shallow
water (momentum coupled with thickness).

> My question is, how to feed the information of the updated temperature
> inside the velocity, and vice-versa? this would be equivalent, I think, to
> obtaining the intermediate stages of the time integration so they can be
> the input for the next intermediate stage.
>
> Thanks,


Re: [petsc-users] question about small matrices

2019-09-25 Thread Jed Brown via petsc-users
"Povolotskyi, Mykhailo via petsc-users"  writes:

> Hi Matthew,
>
> is it possible to do in principle what I would like to do?

SNES isn't meant to solve tiny independent systems.  (It's just high
overhead for that purpose.)  You can solve many such instances together
by creating a residual function that evaluates them all.  The linear
algebra will be more efficient with that granularity, though all
sub-problems will take the same number of Newton iterations.
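
A hypothetical sketch of that granularity (nsystems independent systems of
size m stacked into one Vec; EvaluateSmallResidual() is a placeholder for
the per-instance residual):

  typedef struct {
    PetscInt nsystems, m;   /* number of small systems and the size of each */
  } AppCtx;

  extern PetscErrorCode EvaluateSmallResidual(AppCtx*, PetscInt, const PetscScalar*, PetscScalar*);

  static PetscErrorCode FormFunction(SNES snes, Vec X, Vec F, void *ctx)
  {
    AppCtx            *user = (AppCtx*)ctx;
    const PetscScalar *x;
    PetscScalar       *f;
    PetscInt           i;
    PetscErrorCode     ierr;

    PetscFunctionBeginUser;
    ierr = VecGetArrayRead(X, &x);CHKERRQ(ierr);
    ierr = VecGetArray(F, &f);CHKERRQ(ierr);
    for (i = 0; i < user->nsystems; i++) {
      ierr = EvaluateSmallResidual(user, i, &x[i*user->m], &f[i*user->m]);CHKERRQ(ierr);
    }
    ierr = VecRestoreArrayRead(X, &x);CHKERRQ(ierr);
    ierr = VecRestoreArray(F, &f);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

The Jacobian is then block diagonal, so block Jacobi with a direct
sub-solver reproduces the per-instance Newton updates.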


Re: [petsc-users] Undefined symbols for architecture x86_64: "_dmviewfromoptions_",

2019-09-20 Thread Jed Brown via petsc-users
"Smith, Barry F. via petsc-users"  writes:

>Currently none of the XXXViewFromOptions() have manual pages or Fortran 
> stubs/interfaces. It is probably easier to remove them as inline functions 
> and instead write them as full functions which just call 
> PetscObjectViewFromOptions(), with manual pages; then the Fortran 
> stubs/interfaces will be built automatically.

PetscObjectViewFromOptions has a custom interface because it takes a string.

Fortran users could call that today, rather than wait for stubs to be
written.


Re: [petsc-users] Reading in the full matrix in one process and then trying to solve in parallel with PETSc

2019-09-20 Thread Jed Brown via petsc-users
Matthew Knepley via petsc-users  writes:

> On Fri, Sep 20, 2019 at 7:54 AM Bao Kai via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
>> Hi,
>>
>> I understand that PETSc is not designed to be used this way, while I
>> am wondering if someone have done something similar to this.
>>
>> We have the full matrix from a simulation and rhs vector. We would
>> like to read in through PETSc in one process, then we use some
>> partition functions to partition the matrix.
>>
>> Based on the partition information, we redistribute the matrix among
>> the processes. Then we solve it in parallel.  It is for testing the
>> performance of some parallel linear solvers and preconditioners.
>>
>> We are not in the position to develop a full parallel implementation
>> of the simulator yet.

An alternative is to assemble a Mat living on a parallel communicator,
but with all entries on rank 0 (so just call your serial code to build
the matrix).  You can do the same for your vector, then KSPSolve.  To
make the solver parallel, just use run-time options:

 -ksp_type preonly -pc_type redistribute

will redistribute automatically inside the solver and return the
solution vector to you on rank 0.  You can control the inner solver via
prefix

 -redistribute_ksp_type gmres -redistribute_pc_type gamg 
-redistribute_ksp_monitor
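
A sketch of the rank-0 assembly pattern (N is the global size taken from the
serial code; preallocation details omitted):

  PetscMPIInt rank;
  PetscInt    nlocal;
  Mat         A;
  Vec         b, x;
  KSP         ksp;

  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank);CHKERRQ(ierr);
  nlocal = rank ? 0 : N;                         /* rank 0 owns every row */
  ierr = MatCreateAIJ(PETSC_COMM_WORLD, nlocal, nlocal, N, N, 0, NULL, 0, NULL, &A);CHKERRQ(ierr);
  ierr = VecCreateMPI(PETSC_COMM_WORLD, nlocal, N, &b);CHKERRQ(ierr);
  ierr = VecDuplicate(b, &x);CHKERRQ(ierr);
  /* ... rank 0 calls MatSetValues()/VecSetValues() exactly as the serial code does ... */
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = VecAssemblyBegin(b);CHKERRQ(ierr);
  ierr = VecAssemblyEnd(b);CHKERRQ(ierr);
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);   /* pass -ksp_type preonly -pc_type redistribute */
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);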


Re: [petsc-users] Block Tridiagonal Solver

2019-09-06 Thread Jed Brown via petsc-users
Where do your tridiagonal systems come from?  Do you need to solve one
at a time, or batches of tridiagonal problems?

Although it is not in PETSc, we have some work on solving the sort of
tridiagonal systems that arise in compact discretizations, which it
turns out can be solved much faster than generic tridiagonal problems.

  https://tridiaglu.github.io/index.html

"John L. Papp via petsc-users"  writes:

> Hello,
>
> I need a parallel block tridiagonal solver and thought PETSc would be 
> perfect.  However, there seems to be no specific example showing exactly 
> which VecCreate and MatCreate functions to use.  I searched the archive 
> and the web and there are no explicit block tridiagonal examples 
> (although the ex23.c example solves a tridiagonal matrix) and the manual is 
> vague on the subject.  So a couple of questions:
>
>  1. Is it better to create a monolithic matrix (MatCreateAIJ) and vector
> (VecCreate)?
>  2. Is it better to create a block matrix (MatCreateBAIJ) and vector
> (VecCreate and then VecSetBlockSize or is there an equivalent block
> vector create)?
>  3. What is the best parallel solver(s) to invert the Dx=b when D is a
> block tridiagonal matrix?
>
> If this helps, each row will be owned by the same process.  In other 
> words, the data used to fill the [A] [B] [C] block matrices in a row of 
> the D block tridiagonal matrix will reside on the same process.  Hence, 
> I don't need to store the individual [A], [B], and [C] block matrices in 
> parallel, just the over all block tridiagonal matrix on a row by row basis.
>
> Thanks in advance,
>
> John
>
> -- 
> **
> Dr. John Papp
> Senior Research Scientist
> CRAFT Tech.
> 6210 Kellers Church Road
> Pipersville, PA 18947
>
> Email:  jp...@craft-tech.com
> Phone:  (215) 766-1520
> Fax  :  (215) 766-1524
> Web  :  http://www.craft-tech.com
>
> **


Re: [petsc-users] is TS_EQ_DAE_SEMI_EXPLICIT_INDEX functional

2019-09-02 Thread Jed Brown via petsc-users
I believe this is intended to work with most any implicit solver,
*provided* the initial conditions are compatible.  It was added by Emil,
but I don't see it explicitly tested in PETSc.

"Huck, Moritz via petsc-users"  writes:

> Hi,
> TS_EQ_DAE_SEMI_EXPLICIT_INDEX(?) are defined in TSEquationType but not 
> mentioned in the manual.
> Is this feature functional ?
> If yes how do I have to define the RHSFunction? 
> (I am asking since the ODE variant has it defined as G= M^-1 g, which cannot 
> work for a DAE)
>
> Best Regards,
> Moritz


Re: [petsc-users] question about CISS

2019-08-29 Thread Jed Brown via petsc-users
Elemental also has distributed-memory eigensolvers that should be at
least as good as ScaLAPACK's.  There is support for Elemental in PETSc,
but not yet in SLEPc.

"Povolotskyi, Mykhailo via petsc-users"  writes:

> Thank you for suggestion.
>
> Is it interfaced to SLEPC?
>
>
> On 08/29/2019 04:14 PM, Jose E. Roman wrote:
>> I am not an expert in contour integral eigensolvers. I think difficulties 
>> come with corners, so ellipses are the best choice. I don't think ring 
>> regions are relevant here.
>>
>> Have you considered using ScaLAPACK. Some time ago we were able to address 
>> problems of size up to 400k   https://doi.org/10.1017/jfm.2016.208
>>
>> Jose
>>
>>
>>> On 29 Aug 2019, at 21:55, Povolotskyi, Mykhailo  
>>> wrote:
>>>
>>> Thank you, Jose,
>>>
>>> what about rings? Are they better than rectangles?
>>>
>>> Michael.
>>>
>>>
>>> On 08/29/2019 03:44 PM, Jose E. Roman wrote:
 The CISS solver is supposed to estimate the number of eigenvalues 
 contained in the contour. My impression is that the estimation is less 
 accurate in case of rectangular contours, compared to elliptic ones. But 
 of course, with ellipses it is not possible to fully cover the complex 
 plane unless there is some overlap.

 Jose


> On 29 Aug 2019, at 20:56, Povolotskyi, Mykhailo via petsc-users 
>  wrote:
>
> Hello everyone,
>
> this is a question about  SLEPc.
>
> The problem that I need to solve is as follows.
>
> I have a matrix and I need a full spectrum of it (both eigenvalues and
> eigenvectors).
>
> The regular way is to use Lapack, but it is slow. I decided to try the
> following:
>
> a) compute the bounds of the spectrum using Krylov Schur approach.
>
> b) divide the complex eigenvalue plane into rectangular areas, then
> apply CISS to each area in parallel.
>
> However, I found that the solver is missing some eigenvalues, even if my
> rectangles cover the whole spectral area.
>
> My question: can this approach work in principle? If yes, how one can
> set-up the CISS solver to not lose the eigenvalues?
>
> Thank you,
>
> Michael.
>


Re: [petsc-users] FVM using dmplex for linear advection

2019-08-28 Thread Jed Brown via petsc-users
Try to reduce the problem.  You may also compare with
src/ts/examples/tutorials/ex11.c, which includes a finite-volume
advection solver (with or without slope reconstruction/limiting).

Praveen C via petsc-users  writes:

> Dear all
>
> I am trying to write a simple first order upwind FVM to solve linear 
> advection in 2d on triangular meshes using dmplex.
>
> Based on some suggestions in recent discussions, I cooked up this simple 
> example
>
> https://github.com/cpraveen/cfdlab/tree/master/petsc/dmplex_convect2d
>
> A Gaussian profile must rotate around the origin, but in my code there are some 
> errors developing in just one or two cells, see picture.
>
> I am looking for any suggestions/hints/tips on how to identify the problem. 
> Since I have never written a dmplex code before, I am not able to ask a 
> specific question.
>
> Thanks
> praveen


Re: [petsc-users] Getting the connectivity from DMPlex

2019-08-21 Thread Jed Brown via petsc-users
Jian Zhang - 3ME  writes:

> Hi Jed,
>
> Thank you very much. I tried to use DMPlexGetCone, but the output is
> the edge ids, not the vertex ids. 

This means you have an interpolated mesh (edges represented explicitly
in the data structure).

> For the function DMPlexGetClosureIndices, I can not find it in
> petsc4py. Do you know any alternative ways to solve this?

Add it to petsc4py or ask for it to be added.  I'm sorry I can't do it
at the moment.


Re: [petsc-users] Getting the connectivity from DMPlex

2019-08-21 Thread Jed Brown via petsc-users
Jian Zhang - 3ME via petsc-users  writes:

> Hi guys,
>
> I am trying to get the element connectivity from DMPlex. The input is the 
> element id, and the output should be the vertex ids. Which function should I 
> use to achieve this? Thanks in advance.

See DMPlexGetCone or DMPlexGetClosureIndices.


Re: [petsc-users] MINRES accuracy for a SPD matrix

2019-07-30 Thread Jed Brown via petsc-users
Are you saying that the MINRES error is larger than CG error?  In which
norm?  And which norm are you using for CG?  (Output from
-ksp_monitor_true_residual -ksp_view would be useful.)

CG does find a solution that is optimal in a different inner product,
though this is usually pretty benign unless you have a poor
preconditioner.

Mohammad Gohardoust via petsc-users  writes:

> Hi,
>
> I am trying to evaluate different solver/preconditioner combinations for
> solving Richards' equation in porous media. The coefficient matrix is
> symmetric positive definite and is usually solved by CG solver. I tried
> MINRES with various preconditioners. The solution is acceptable but with
> somehow considerable errors (compared to other solvers) which requires me
> to decrease the tolerance threshold in my Picard iteration leading to
> larger computation time. I would like to know if this is common for MINRES
> and how to possibly overcome this?
>
> Thanks,
> Mohammad


Re: [petsc-users] Zero entries of a sparse matrix in a block-wise style

2019-07-26 Thread Jed Brown via petsc-users
Mingchang Ding  writes:

>> It appears that you are calling MatZeroRows with rows = {0, info.mx-1},
>> which will only affect the first and last rows.  The other entries are
>> how you have assembled them (with the matrix-matrix product).
>
> Yes. I think so. Is there anyway I can fill in the rows array with all the 
> first and last DOF rows index efficiently instead of only filling in one row 
> index at one time?

for (dof=0; dof
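
(The loop above is truncated in the archive.)  The idea is a sketch like the
following, assuming DOF unknowns per grid point and global row indices:

  for (dof = 0; dof < DOF; dof++) {
    rows[nrows++] = dof;                      /* all dofs of the first grid point */
    rows[nrows++] = (info.mx - 1)*DOF + dof;  /* all dofs of the last grid point  */
  }
  ierr = MatZeroRows(appctx->J_dif, nrows, rows, 1.0, NULL, NULL);CHKERRQ(ierr);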

Re: [petsc-users] Zero entries of a sparse matrix in a block-wise style

2019-07-26 Thread Jed Brown via petsc-users
Please always use "reply-all" so that your messages go to the list.
This is standard mailing list etiquette.  It is important to preserve
threading for people who find this discussion later and so that we do
not waste our time re-answering the same questions that have already
been answered in private side-conversations.  You'll likely get an
answer faster that way too.

Mingchang Ding  writes:

> Thank you for your quick reply. The code I have now is attached. The sparse 
> matrix obtained from LDG discretization to u_xx is assembled in SetupMat 
> routine of setup.h. When I choose my polynomial degree nk =1 (DOF = 2), only 
> the first and last row of the output sparse matrix is as what I want. Again, 
> I want to impose essential (Dirichlet) boundary condition to the problem.
>
> Let me know if any explanation of the question is needed.

It appears that you are calling MatZeroRows with rows = {0, info.mx-1},
which will only affect the first and last rows.  The other entries are
how you have assembled them (with the matrix-matrix product).


  PetscInt    nrows = 0;
  PetscInt    *rows;

  ierr = DMDAGetLocalInfo(da, &info);CHKERRQ(ierr);

  PetscMalloc1(info.mx, &rows);

  for (j=info.xs; jMass_inv, 1, , 1, , 
[0][0], INSERT_VALUES);CHKERRQ(ierr);

  }


[...]


  MatZeroRows(appctx->J_dif, nrows, rows, 1.0, NULL, NULL);


Re: [petsc-users] Zero entries of a sparse matrix in a block-wise style

2019-07-26 Thread Jed Brown via petsc-users
Can you give an example of what you are trying?  These functions should be 
capable of handling any set of rows.

Mingchang Ding via petsc-users  writes:

> Hi, all
>
> I am trying to apply LDG to discretize the 1D diffusion term u_{xx} with 
> a Dirichlet boundary condition. The difficulty I have is that I want to zero 
> all entries (except the diagonal, which needs to be set to 1.0) of the beginning 
> and ending DOF rows of the sparse matrix.
>
> I have tried the MatZeroRows and MatZeroRowsStencil. However, I can only 
> change the very first and last row of the matrix with them. This isn’t right 
> when DOF > 1 because I need to zero the first DOF rows (the diagonal should 
> be set as identity).
>
> I was wondering if there is any way to zero the first and last DOF 
> rows (with identity matrix diagonal) in a block-wise style.
>
> Let me know if any further explanation is needed.
>
> Thank you in advance!
>
> Mingchang Ding
> Graduate Student
> EWG 111
> Department of Mathematical Sciences
> University of Delaware


Re: [petsc-users] Low floating precision numerical solver?

2019-07-23 Thread Jed Brown via petsc-users
You can use reduced-precision preconditioning if you're writing your
own, but there isn't out-of-the-box support.  Note that the benefit is
limited when working with sparse matrices because a lot of the cost
comes from memory access (including column indices) and vectorization
for some operations is difficult.

Manuel Valera via petsc-users  writes:

> Hello,
>
> I was wondering if PETSc had some form of a low precision linear solver
> algorithm like seen in:
>
> https://www.dropbox.com/s/rv5quc3k72qdpmp/iciam_lowprec19.pdf?dl=0
>
> I understand this treatment is coming from one of the NAG library
> developers,
>
> Thanks,


[petsc-users] AGU Session: T003: Advances in Computational Geosciences

2019-07-22 Thread Jed Brown via petsc-users
If you are thinking about attending the American Geophysical Union Fall
Meeting (Dec 9-13 in San Francisco), please consider submitting an
abstract to this interdisciplinary session.  Abstracts are due July 31.

T003: Advances in Computational Geosciences

This session highlights advances in the theory and practice of
computational geoscience, from improvements in numerical methods to
their application to outstanding problems in the Earth sciences. Common
issues include robust and efficient solvers, multiscale discretizations,
design of benchmark problems and standards for comparison. Increasing
data and computational power necessitates open source scientific
libraries and workflow automation for model setup, 3D feature
connectivity, and data assimilation, and automation in uncertainty
representation and propagation, optimal design of field studies, risk
quantification, and testing the predictive power of numerical
simulations. By bringing these crosscutting computational activities
together in one session, we hope to sharpen our collective understanding
of fundamental challenges, level of rigor, and opportunities for
reusable implementations. Contributions from all areas are welcome,
including, but not limited to, fault modeling, tectonics, subduction,
seismology, magma dynamics, mantle convection, the core, as well as
surface processes, hydrology, and cryosphere.

Confirmed invited presenters: Talea Mayo, Andreas Fichtner

https://agu.confex.com/agu/fm19/prelim.cgi/Session/83797

Conveners
  Jed Brown
  University of Colorado at Boulder
  Alice-Agnes Gabriel
  Ludwig-Maximilians-Universität
  Georg S Reuber
  Johannes Gutenberg University of Mainz
  Nathan Collier
  Oak Ridge National Laboratory


Re: [petsc-users] Block matrix vector products?

2019-07-15 Thread Jed Brown via petsc-users
You can use MatCreateMAIJ(A,2,) and a single MatMult(A,xy) where xy
contains the vectors x and y interlaced [x_0, y_0, x_1, y_1, ...].

There is also MatMatMult(A,X,...,) where X is a MATDENSE with two
columns, but I would prefer the MAIJ variant above in most cases.
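
A sketch of the MAIJ variant (x, y and the outputs Ax, Ay are assumed to be
existing Vecs compatible with A):

  Mat maij;
  Vec xy, Axy;

  ierr = MatCreateMAIJ(A, 2, &maij);CHKERRQ(ierr);
  ierr = MatCreateVecs(maij, &xy, &Axy);CHKERRQ(ierr);
  ierr = VecStrideScatter(x, 0, xy, INSERT_VALUES);CHKERRQ(ierr);  /* interlace x into component 0 */
  ierr = VecStrideScatter(y, 1, xy, INSERT_VALUES);CHKERRQ(ierr);  /* interlace y into component 1 */
  ierr = MatMult(maij, xy, Axy);CHKERRQ(ierr);                     /* Axy holds Ax and Ay interlaced */
  ierr = VecStrideGather(Axy, 0, Ax, INSERT_VALUES);CHKERRQ(ierr); /* optional: pull the results apart */
  ierr = VecStrideGather(Axy, 1, Ay, INSERT_VALUES);CHKERRQ(ierr);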

Tyler Chen via petsc-users  writes:

> Hello,
>
> I am looking for a way to compute two matrix vector products
> simultaneously, where the matrix is the same for both products (i.e. Ax and
> Ay). I know some numerical libraries have the concept of "block vectors" or
> "basis vectors" and was wondering if there is anything similar in PETSc.
> Right now I am just using two MatMults sequentially, but I'm hoping there
> are probably better ways to do this. My matrices are MPIAIJ if that makes
> any difference.
>
> Best,
> Tyler


Re: [petsc-users] Questions about AMG and Jacobian Contruction

2019-07-14 Thread Jed Brown via petsc-users
"Smith, Barry F. via petsc-users"  writes:

>We've found that the additive Schwarz methods (PCASM) or even block
>Jacobi PCBJACOBI often work well for network problems, better than
>GAMG. GAMG doesn't know anything about the structure of networks,
>it is for PDEs on meshes, so there is no reason to think it would
>work particularly well for network problems.

The number of iterations for one-level methods (like Jacobi/block
Jacobi) is roughly proportional to the diameter of your network.  If
your network has small diameter, those methods may be fast already.  PDE
meshes generally have large diameter with bounded degree; the AMG
methods in PETSc are good for such problems.  If you have large diameter
with unbounded degree, then more specialized AMG variants such as
https://arxiv.org/abs/1705.06266 can be better.


Re: [petsc-users] Questions about AMG and Jacobian Contruction

2019-07-14 Thread Jed Brown via petsc-users
Yingjie Wu via petsc-users  writes:

> Respected PETSc developers:
> Hi,
> I have some questions about some functions of AMG and the construction time
> of Jacobian matrix in the process of using. Please help me to answer them.
>
> 1. I see some functions about AMG in the list of PETSc functions. I wonder
> if DMDA is necessary to construct vectors when using AMG precondition.
> (Because the geometry described in my PDE is irregular,it contains a lot of
> pipes and connections, instead of using DMDA to manage vectors, I define
> discrete solution vectors by myself.)

It is not used by AMG.  Geometric MG either uses a DM (of which DMDA is
one, but there are unstructured choices) or requires the user to call
functions like PCMGSetInterpolation.  AMG is probably a better choice
for your problem (unless the pipe networks are such that a direct solver
is practical; note that Approximate Minimum Degree ordering may be
better than Nested Dissection).

> 2. As far as I know, besides V-cycle, AMG method also needs auxiliary
> matrix B and transfer matrix P (transfer matrix from fine mesh to coarse
> mesh). Are these transformation matrices built-in or can they be provided
> by users in PETSc?

I'm not sure what your B represents, but see PCMGSetInterpolation to specify P.

> 3. At present, in my program, I use -snes_mf_operator to solve the
> non-linear PDE problem. Although I do not directly construct Jacobian
> matrix, it is also expensive to construct preconditioning matrix (provided
> by users). Can I control the update and decomposition time of the
> preconditioning matrix, such as updating the preconditioning matrix every
> two Newton steps and then ILU factorization? Can these controls be
> implemented in SNESMonitor?

See SNESSetLagJacobian (-snes_lag_jacobian) and SNESSetLagPreconditioner
(-snes_lag_preconditioner).
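
For example, a sketch of lagging both by two Newton steps (equivalent to the
options -snes_lag_jacobian 2 -snes_lag_preconditioner 2):

  ierr = SNESSetLagJacobian(snes, 2);CHKERRQ(ierr);        /* rebuild the preconditioning matrix every 2nd step */
  ierr = SNESSetLagPreconditioner(snes, 2);CHKERRQ(ierr);  /* re-factor (e.g. ILU) every 2nd step */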


Re: [petsc-users] What is the best way to do domain decomposition with petsc?

2019-07-11 Thread Jed Brown via petsc-users
Matthew Knepley via petsc-users  writes:

>> I am just wondering which way is better, or do you have any other
>> suggestion?
>>
> If you plan on doing a lot of mesh manipulation by hand and want to control
> everything, the first option might be better.
> On the other hand, if you use Plex, you could potentially take advantage of
> parallel loading and output, redistribution
> and load balancing, adaptive refinement, and interfacing with the
> multilevel and block solvers in PETSc.

There's also a workflow factor.  If you rely on partitioning via Gmsh as
a preprocessing step, then you can't easily run the same mesh on a
different number of processes.  With DMPlex doing the partitioning and
distribution automatically on load (it's fast), you can much more easily
conduct scaling studies, debug configurations, move to different
machines, etc.


Re: [petsc-users] Making the convergence faster

2019-07-10 Thread Jed Brown via petsc-users
This is typical for weak preconditioners.  Have you tried -pc_type gamg
or -pc_type mg (algebraic and geometric multigrid, respectively)?  On a
structured grid with smooth coefficients, geometric multigrid is
possible with low setup cost and convergence in a few iterations
independent of problem size.
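
If the grid comes from a DMDA, a sketch of wiring the solver so that either
choice is selectable from the command line (ComputeMatrix/ComputeRHS are
placeholders for your 9-point assembly routines):

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetDM(ksp, da);CHKERRQ(ierr);
  ierr = KSPSetComputeOperators(ksp, ComputeMatrix, &user);CHKERRQ(ierr);
  ierr = KSPSetComputeRHS(ksp, ComputeRHS, &user);CHKERRQ(ierr);
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);   /* then try -pc_type mg -pc_mg_levels 4, or -pc_type gamg */
  ierr = KSPSolve(ksp, NULL, NULL);CHKERRQ(ierr);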

Maahi Talukder via petsc-users  writes:

> Dear All,
>
>
> I am solving Poisson grid generation equation  using 9-point Finite
> Difference. The type of convergence plot I am getting has been attached
> herewith. As  the plot shows, at the beginning, the convergence is very
> fast, but later the convergence is slowing down. I am using GMRES with left
> preconditioning?
>
> My question is, is it normal? Any ideas how  to make the convergence faster?
>
>
>
> Thanks,
> Maahi Talukder


Re: [petsc-users] LU Performance

2019-07-05 Thread Jed Brown via petsc-users
Stefano Zampini via petsc-users  writes:

> Jared,
>
> The petsc output shows
>
> package used to perform factorization: petsc
>
> You are not using umfpack, but the PETSc native LU. You can run with 
> -options_left  to see the options that are not processed from the PETSc 
> options database.

-pc_factor_mat_solver_package was used until PETSc-3.9.  You're using a
rather old (~2016) PETSc with new options.

https://www.mcs.anl.gov/petsc/documentation/changes/39.html

>
>
>> On Jul 5, 2019, at 10:26 AM, Jared Crean via petsc-users 
>>  wrote:
>> 
>> This is in reply to both David and Barry's emails.
>> 
>> I am using the Umfpack that Petsc built (--download-suitesparse=yes was 
>> passed to configure), so all the compiler flags and Blas/Lapack libraries 
>> are the same.  I used OpenBlas for Blas and Lapack, with multi-threading 
>> disabled.  When calling Umfpack directly, the factorization takes about 4 
>> second, compared to 135 seconds spent in MatLUFactorNum when using Umfpack 
>> via Petsc.
>> 
>> I added a call to umfpack_di_report_control()  (which prints  the 
>> Umfpack parameters) to my code, and also added -mat_umfpack_prl 2 to the 
>> Petsc options, which should cause Petsc to call the same function just 
>> before doing the symbolic factorization (umfpack.c line 245 in Petsc 3.7.6). 
>> The output is attached (also with the -ksp_view option).  My code did print 
>> the parameters, but Petsc did not, which makes me think 
>> MatLUFactorSymbolic_UMFPACK never got called.  For reference, here is how I 
>> am invoking the program:
>> 
>> ./test -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type umfpack 
>> -log_view -ksp_view -mat_umfpack_prl 2 > fout_umfpacklu
>> 
>> Jared Crean
>> 
>> On 7/5/19 4:02 AM, Smith, Barry F. wrote:
>>>When you use Umfpack standalone do you use OpenMP threads? When you use 
>>> umfpack alone do you us thread enabled BLAS/LAPACK? Perhaps OpenBLAS or MKL?
>>> 
>>>You can run both cases with -ksp_view and it will print more details 
>>> indicating indicating the solver used.
>>> 
>>> Do you use the same compiler and same options when compiling PETSc and 
>>> Umfpack standalone. Is the Umfpack standalone time in the numerical 
>>> factorization much smaller? Perhaps umfpack is using a much better ordering 
>>> then when used with PETSc (perhaps the default orderings are different).
>>> 
>>>    Does Umfpack have a routine that triggers output of the parameters etc. it 
>>> is using? If you can trigger it you might see differences between 
>>> standalone and not.
>>> 
>>>Barry
>>> 
>>> 
 On Jul 4, 2019, at 4:05 PM, Jared Crean via petsc-users 
  wrote:
 
 Hello,
 
 I am getting very bad performance from the Umfpack LU solver when I 
 use it via Petsc compared to calling Umfpack directly. It takes about 5.5 
 seconds to factor and solve the matrix with Umfpack, but 140 seconds when 
 I use Petsc with -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type 
 umfpack.
 
 I have attached a minimal example (test.c) that reads a matrix from a 
 file, solves with Umfpack, and then solves with Petsc.  The matrix data 
 files are not included because they are about 250 megabytes.  I also 
 attached the output of the program with -log_view for 
 -pc_factor_mat_solver_type umfpack (fout_umfpacklu) and 
 -pc_factor_mat_solver_type petsc (fout_petsclu).  Both results show nearly 
 all of the time is spent in MatLuFactorNum.  The times are very similar, 
 so I am wondering if Petsc is really calling Umfpack or if the Petsc LU 
 solver is getting called in both cases.
 
 
 Jared Crean
 
 
>> 
>> 
>> 


Re: [petsc-users] Computing residual norm in KSPFGMRESCycle()

2019-07-05 Thread Jed Brown via petsc-users
Dave, have you considered using GCR instead of FGMRES?  It's a flexible
method that is equivalent in many circumstances, but provides the
residual and solution at each iteration without needing to "build" it.
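
For reference, the switch is one call or option, and a custom convergence
test (MyConvergedTest is a placeholder) can then obtain the residual cheaply
via KSPBuildResidual():

  ierr = KSPSetType(ksp, KSPGCR);CHKERRQ(ierr);   /* or -ksp_type gcr */
  ierr = KSPSetConvergenceTest(ksp, MyConvergedTest, &user, NULL);CHKERRQ(ierr);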

Dave Lee via petsc-users  writes:

> Hi Matt and Barry,
>
> thanks for the good ideas.
>
> I wasn't aware of the function KSPSetConvergenceTest(), thanks for alerting
> me to this Matt. Unfortunately I really do need to exclude a couple of
> specific degrees of freedom from the norm, and as such I want to make sure
> I can replicate the norm as computed within UpdateHessenberg() before I
> replace this.
>
> Apologies Barry, there was a stupid bug in that code, it should have read:
> KSPFGMRESUpdateHessenberg(ksp,loc_it,hapend,&res_norm);
> KSPFGMRESBuildSoln(fgmres->nrs,ksp->vec_sol,tmp_sol,ksp,loc_it+1);
> VecAXPY(tmp_sol,-1.0,ksp->vec_rhs);
> VecNorm(tmp_sol,NORM_2,&res_tmp);
>
> Which does not give the same result as the norm for UpdateHessenberg().
> Unfortunately I want to avoid using
> KSPBuildResidual(ksp,NULL,NULL,&resid);
> as this calls KSP_MatMult(), which will in turn call my snes residual
> assembly routine. This routine in turn calls a Navier Stokes solver to run
> for a fixed (long) time period, and so is expensive.
>
> Replacing VecNorm() with my own version is indeed what I'm doing elsewhere,
> I'm just not sure of how to replace the UpdateHessenberg() norm right now.
>
> Cheers, Dave.
>
> On Fri, Jul 5, 2019 at 12:59 PM Smith, Barry F.  wrote:
>
>>
>>
>> > On Jul 4, 2019, at 7:29 PM, Dave Lee via petsc-users <
>> petsc-users@mcs.anl.gov> wrote:
>> >
>> > Hi PETSc,
>> >
>> > I have a problem for which I need to exclude certain degrees of freedom
>> from the evaluation of the norm as used to monitor the residual in FGMRES
>> (don't ask!). Therefore I would like to replace the existing norm
>> evaluation with my own version.
>> >
>> > Currently this is done within:
>> > KSPFGMRESUpdateHessenberg(ksp,loc_it,hapend,&res_norm)
>> > as:
>> > *res = PetscAbsScalar(*RS(it+1));
>> >
>> > Firstly, I would like to make sure I can replicate the existing residual
>> norm. I tried to do this as:
>> > KSPFGMRESUpdateHessenberg(ksp,loc_it,hapend,&res_norm);
>> > KSPFGMRESBuildSoln(fgmres->nrs,ksp->vec_sol,tmp_sol,ksp,loc_it);
>>
>>I don't understand the lines of code below. I don't see how they could
>> compute the norm of the residual.
>>
>> From KSPMonitorTrueResidualNorm() you can compute the unpreconditioned
>> residual with
>>
>>   ierr = KSPBuildResidual(ksp,NULL,NULL,&resid);CHKERRQ(ierr);
>>   ierr = VecNorm(resid,NORM_2,&truenorm);CHKERRQ(ierr);
>>   ierr = VecDestroy(&resid);CHKERRQ(ierr);
>>
>> since FGMRES always uses right preconditioning (and hence the
>> unpreconditioned residual) this should match the value computed internally
>> by FGMRES (subject to roundoff differences)
>>
>> To exclude certain values in the residual from the norm calculation you
>> could write a custom "VecNorm" norm that skipped those entries.
>>
>>   Barry
>>
>>
>>
>>
>>
>> > VecNorm(tmp_sol,NORM_2,&res_tmp);
>> > VecNorm(ksp->vec_rhs,NORM_2,&res_rhs);
>> > res_norm = fabs(res_tmp-res_rhs);
>> >
>> > But this doesn't match the norm that comes out of UpdateHessenberg.
>> >
>> > Any ideas on how I can re-create the norm derived from UpdateHessenberg
>> as an expansion over the Kyrlov vectors in FGMRES?
>> >
>> > Cheers, Dave.
>>
>>


Re: [petsc-users] Communication during MatAssemblyEnd

2019-06-21 Thread Jed Brown via petsc-users
What is the partition like?  Suppose you randomly assigned nodes to
processes; then in the typical case, all neighbors would be on different
processors.  Then the "diagonal block" would be nearly diagonal and the
off-diagonal block would be huge, requiring communication with many
other processes.
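
If you want to check this from the code rather than from the log, one
option is to compare the local "diagonal" and "off-diagonal" blocks of
the matrix; a small sketch (assuming A is a MATMPIAIJ on
PETSC_COMM_WORLD; names are placeholders):

  Mat         Ad,Ao;        /* local diagonal and off-diagonal blocks */
  MatInfo     infod,infoo;
  PetscMPIInt rank;

  ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
  ierr = MatMPIAIJGetSeqAIJ(A,&Ad,&Ao,NULL);CHKERRQ(ierr);
  ierr = MatGetInfo(Ad,MAT_LOCAL,&infod);CHKERRQ(ierr);
  ierr = MatGetInfo(Ao,MAT_LOCAL,&infoo);CHKERRQ(ierr);
  ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD,"[%d] diag nz %g  offdiag nz %g\n",
                                 rank,infod.nz_used,infoo.nz_used);CHKERRQ(ierr);
  ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD,PETSC_STDOUT);CHKERRQ(ierr);

A large off-diagonal count relative to the diagonal one is a sign that
the partition cuts many neighbor connections.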

"Smith, Barry F. via petsc-users"  writes:

>The load balance is definitely out of whack. 
>
>
>
> BuildTwoSidedF 1 1.0 1.6722e-0241.0 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0
> MatMult  138 1.0 2.6604e+02 7.4 3.19e+10 2.1 8.2e+07 7.8e+06 
> 0.0e+00  2  4 13 13  0  15 25100100  0 2935476
> MatAssemblyBegin   1 1.0 1.6807e-0236.1 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0
> MatAssemblyEnd 1 1.0 3.5680e-01 3.9 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0
> VecNorm2 1.0 4.4252e+0174.8 1.73e+07 1.0 0.0e+00 0.0e+00 
> 2.0e+00  1  0  0  0  0   5  0  0  0  1 12780
> VecCopy6 1.0 6.5655e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0
> VecAXPY2 1.0 1.3793e-02 2.7 1.73e+07 1.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 41000838
> VecScatterBegin  138 1.0 1.1653e+0285.8 0.00e+00 0.0 8.2e+07 7.8e+06 
> 0.0e+00  1  0 13 13  0   4  0100100  0 0
> VecScatterEnd138 1.0 1.3653e+0222.4 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  1  0  0  0  0   4  0  0  0  0 0
> VecSetRandom   1 1.0 9.6668e-01 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0
>
> Note that VecCopy/AXPY/SetRandom which are all embarrassingly parallel have a 
> balance ratio above 2 which means some processes have more than twice the 
> work of others. Meanwhile the ratio for anything with communication is 
> extremely in balanced, some processes get to the synchronization point well 
> before other processes. 
>
> The first thing I would do is worry about the load imbalance, what is its 
> cause? is it one process with much less work than others (not great but not 
> terrible) or is it one process with much more work then the others (terrible) 
> or something in between. I think once you get a handle on the load balance 
> the rest may fall into place, otherwise we still have some exploring to do. 
> This is not expected behavior for a good machine with a good network and a 
> well balanced job. After you understand the load balancing you may need to 
> use one of the parallel performance visualization tools to see why the 
> synchronization is out of whack.
>
>Good luck
>
>   Barry
>
>
>> On Jun 21, 2019, at 9:27 AM, Ale Foggia  wrote:
>> 
>> I'm sending one with a bit less time.
>> I'm timing the functions also with std::chronos and for the case of 180 
>> seconds the program runs out of memory (and crushes) before the PETSc log 
>> gets to be printed, so I know the time only from my function. Anyway, in 
>> every case, the times between std::chronos and the PETSc log match.
>> 
>> (The large times are in part "4b- Building offdiagonal part" or "Event Stage 
>> 5: Offdiag").
>> 
>> El vie., 21 jun. 2019 a las 16:09, Zhang, Junchao () 
>> escribió:
>> 
>> 
>> On Fri, Jun 21, 2019 at 8:07 AM Ale Foggia  wrote:
>> Thanks both of you for your answers,
>> 
>> El jue., 20 jun. 2019 a las 22:20, Smith, Barry F. () 
>> escribió:
>> 
>>   Note that this is a one time cost if the nonzero structure of the matrix 
>> stays the same. It will not happen in future MatAssemblies.
>> 
>> > On Jun 20, 2019, at 3:16 PM, Zhang, Junchao via petsc-users 
>> >  wrote:
>> > 
>> > Those messages were used to build MatMult communication pattern for the 
>> > matrix. They were not part of the matrix entries-passing you imagined, but 
>> > indeed happened in MatAssemblyEnd. If you want to make sure processors do 
>> > not set remote entries, you can use 
>> > MatSetOption(A,MAT_NO_OFF_PROC_ENTRIES,PETSC_TRUE), which will generate an 
>> > error when an off-proc entry is set.
>> 
>> I started being concerned about this when I saw that the assembly was taking 
>> a few hundreds of seconds in my code, like 180 seconds, which for me it's a 
>> considerable time. Do you think (or maybe you need more information to 
>> answer this) that this time is "reasonable" for communicating the pattern 
>> for the matrix? I already checked that I'm not setting any remote entries. 
>> It is not reasonable. Could you send log view of that test with 180 seconds 
>> MatAssembly?
>>  
>> Also I see (in my code) that even if there are no messages being passed 
>> during MatAssemblyBegin, it is taking time and the "ratio" is very big.
>> 
>> > 
>> > 
>> > --Junchao Zhang
>> > 
>> > 
>> > On Thu, Jun 20, 2019 at 4:13 AM Ale Foggia via petsc-users 
>> >  wrote:
>> > Hello all!
>> > 
>> > During the conference I showed you a problem happening during 
>> > MatAssemblyEnd in a particular code that I have. Now, I tried the 

Re: [petsc-users] Local sizes when creating PETSC MatIS

2019-06-15 Thread Jed Brown via petsc-users
José Lorenzo via petsc-users  writes:

> I'm using PETSC 3.10 with 64 bits indices.
>
> When I run valgrind I get the following message at the end of the report,
> which I don't know how to interpret:
>
> ==399672== Invalid read of size 8
> ==399672==at 0x5627D05: MatZeroEntries_SeqAIJ (aij.c:1077)
> ==399672==by 0x5394A61: MatZeroEntries (matrix.c:5664)
> ==399672==by 0x53DEF3F: MatZeroEntries_IS (matis.c:3064)
> ==399672==by 0x5394A61: MatZeroEntries (matrix.c:5664)
> ==399672==by 0x53B682C: matzeroentries_ (matrixf.c:1024)
> ==399672==by 0x415C0B: parallmodule_mp_elementarymatrices_
> (parallModule.F90:101)
> ==399672==by 0x424C40: MAIN__ (main.F90:414)
> ==399672==by 0x40303D: main (in /myproject/)
> ==399672==  Address 0x4c30 is not stack'd, malloc'd or (recently) free'd

This "address" looks an awful lot like a typical integer.  Is this really the 
*first* Valgrind warning?


Re: [petsc-users] Using MATAIJ and MATSBAIJ

2019-06-11 Thread Jed Brown via petsc-users
Chih-Chuen Lin via petsc-users  writes:

> Dear PETSc users,
>
> I am Ian. I trying to implement a solver which involves a sparse symmetric 
> matrix A multiplied by a dense matrix X. And because of the nature of the 
> problem, the bandwidth of the matrix A would be kind of large.For A*X, I am 
> thinking using reverse Cuthill-Mckee algorithm to reduce the bandwidth.
>
> Are the following approach reasonable, or do you have a better advice?
>
> 1. Use MatGetOrdering to get a MATORDERINGRCM ordering, and MatPermute to 
> create a new with it.

You can do this.  It may not make much difference.  Note that if you
permute columns of A, you'll also need to permute rows of X.
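
If you do want to try it, the permutation step is short; a sketch
(error checking trimmed):

  IS  rperm,cperm;
  Mat Aperm;

  ierr = MatGetOrdering(A,MATORDERINGRCM,&rperm,&cperm);CHKERRQ(ierr);
  ierr = MatPermute(A,rperm,cperm,&Aperm);CHKERRQ(ierr);
  /* rows of the dense X must then be permuted with cperm to stay consistent */
  ierr = ISDestroy(&rperm);CHKERRQ(ierr);
  ierr = ISDestroy(&cperm);CHKERRQ(ierr);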

> 2. What’s the difference by using MATAIJ and MATBAIJ in terms of the entry 
> insertion and computation and MPI efficiency for a sparse-dense matrix 
> multiplication? Would it be better to use MATSBAIJ in terms of the 
> computational efficiency?

The sparse-dense variant isn't implemented for all cases and may not
provide benefit anyway.  Using a parallel symmetric format for SpMV
requires more communication of the "vector", which in your case is a
dense matrix.


Re: [petsc-users] Is it possible to update SNES tolerances?

2019-06-10 Thread Jed Brown via petsc-users
David Knezevic via petsc-users  writes:

> I'm doing load stepping with SNES, where I do a SNES solve for each load
> step. Ideally the convergence tolerances for each load step would be set
> based on the norm of the load in the current step. As a result I would like
> to be able to update the absolute and relative tolerances on a SNES.
>
> I tried to call SNESSetTolerances at the beginning of each load step, but
> it seems (based on the -snes_view output) that the tolerances do not get
> updated. Is there a way to update the tolerances?

Did you also have SNESSetFromOptions and specify tolerances on the
command line?  SNESSetTolerances should be used unless it gets
overwritten later by a process such as SNESSetFromOptions.
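
The pattern I would expect to work is to call SNESSetFromOptions once up
front and then reset the tolerances inside the load-stepping loop; a
sketch (tol_scale, load_norm, nsteps, and x are placeholders):

  ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);    /* once, before the loop */
  for (step = 0; step < nsteps; step++) {
    PetscReal abstol = tol_scale*load_norm[step];   /* your step-dependent rule */
    ierr = SNESSetTolerances(snes,abstol,PETSC_DEFAULT,PETSC_DEFAULT,
                             PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr);
    ierr = SNESSolve(snes,NULL,x);CHKERRQ(ierr);
  }

If -snes_atol or -snes_rtol is given on the command line and
SNESSetFromOptions runs again after SNESSetTolerances, the option values
win, which would explain tolerances that look unchanged in -snes_view.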


Re: [petsc-users] updating parallel matrix

2019-06-05 Thread Jed Brown via petsc-users
Just MatSetValues and MatAssembly* again.  If you use ADD_VALUES, then
you might MatZeroEntries before adding values.
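
A sketch of the reuse pattern (the index and value arrays are whatever
your assembly loop already produces):

  for (k = 0; k < nsystems; k++) {
    ierr = MatZeroEntries(A);CHKERRQ(ierr);   /* only needed with ADD_VALUES */
    /* same nonzero pattern every time, so no new preallocation is required */
    ierr = MatSetValues(A,m,idxm,n,idxn,vals,ADD_VALUES);CHKERRQ(ierr);
    ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr);
    ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  }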

Evan Um via petsc-users  writes:

> Hi PETSC users,
>
> I want to solve a number of matrix-vector equations. In each equation, the
> matrix has different values, but the sparsity pattern is unchanged. I can
> destroy a current matrix and create a new matrix. However, is there a more
> efficient way to do this task? Is there any code example for this kind of
> problem? In advance, thank you very much for your comments.
>
> Regards,
> Evan


Re: [petsc-users] Merging Matrix Market files

2019-06-04 Thread Jed Brown via petsc-users
For most purposes, it's easiest to read it with SciPy (or Matlab, etc.),
merge, and use PetscBinaryIO to write.  Then it'll be fast to access
from PETSc, even in parallel.
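
Once the merged matrix is written in PETSc binary format (for instance
with the PetscBinaryIO module shipped in $PETSC_DIR/lib/petsc/bin),
loading it in parallel is just the usual MatLoad; a sketch (the file
name is a placeholder):

  Mat         A;
  PetscViewer viewer;

  ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,"merged.petsc",FILE_MODE_READ,&viewer);CHKERRQ(ierr);
  ierr = MatLoad(A,viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);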

Afrah Najib via petsc-users  writes:

> Hi,
>
> I have a set of files generated synthetically in matrix market file formats
> and I want to merge them in order to get a larger matrix. Do you know any
> useful tool or library for merging MTX files?
>
> The same question applies for Harwell-beoing files?
>
> Thank you,


Re: [petsc-users] Nonzero I-j locations

2019-05-29 Thread Jed Brown via petsc-users
"Smith, Barry F."  writes:

>   Sorry, my mistake. I assumed that the naming would follow PETSc convention 
> and there would be MatGetLocalSubMatrix_something() as there is 
> MatGetLocalSubMatrix_IS() and MatGetLocalSubMatrix_Nest(). Instead 
> MatGetLocalSubMatrix() is hardwired to call MatCreateLocalRef() if the 
> method is not provided for the original matrix. 
>
>   Now interestingly MatCreateLocalRef() has its own manual page which states: 
> Most will use MatGetLocalSubMatrix(). I am not sure why MatCreateLocalRef() 
> is a public function (that is, why it would ever be called directly). Perhaps 
> a note could be added to its manual page indicating why it is public. My 
> inclination would be to make it private and call it 
> MatGetLocalSubMatrix_Basic(). There is harm in having multiple similar public 
> functions unless there is a true need for them.

I think the motivation was that it's actually a Mat implementation and
thus made sense as Developer level interface rather than a strictly
internal interface.  I don't know if we had in mind any use cases where
it could be useful to a general caller.  I don't have a strong opinion
at the moment about whether it makes sense to keep like this or make
internal.


Re: [petsc-users] Nonzero I-j locations

2019-05-29 Thread Jed Brown via petsc-users
"Smith, Barry F. via petsc-users"  writes:

>This is an interesting idea, but unfortunately not directly compatible 
> with libMesh filling up the finite element part of the matrix. Plus it 
> appears MatGetLocalSubMatrix() is only implemented for IS and Nest matrices 
> :-(

Maybe I'm missing something, but MatGetLocalSubMatrix *is* implemented
for arbitrary Mats; it returns a view that allows you to set entries
using local submatrix indexing.  That was a key feature of the MatNest
work from so many years ago and a paper on which you're coauthor.  ;-)
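
For reference, the usage pattern looks roughly like this (isrow/iscol
and the index/value variables are placeholders, and A needs a
local-to-global mapping, e.g. from MatSetLocalToGlobalMapping or a DM):

  Mat Asub;

  ierr = MatGetLocalSubMatrix(A,isrow,iscol,&Asub);CHKERRQ(ierr);
  /* indices here are local to the submatrix view */
  ierr = MatSetValuesLocal(Asub,1,&i,1,&j,&v,ADD_VALUES);CHKERRQ(ierr);
  ierr = MatRestoreLocalSubMatrix(A,isrow,iscol,&Asub);CHKERRQ(ierr);
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);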


Re: [petsc-users] How do I supply the compiler PIC flag via CFLAGS, CXXXFLAGS, and FCFLAGS

2019-05-29 Thread Jed Brown via petsc-users
Lisandro Dalcin  writes:

> On Tue, 28 May 2019 at 22:05, Jed Brown  wrote:
>
>>
>> Note that all of these compilers (including Sun C, which doesn't define
>> the macro) recognize -fPIC.  (Blue Gene xlc requires -qpic.)  Do we
>> still need to test the other alternatives?
>>
>>
> Well, worst case, if the configure test always fails with and without all
> the possible variants of  the PIC flag (-fPIC, -kPIC, -qpic, etc.) because
> they do not define the __PIC__ macro, then you are free to abort configure
> and ask users to pass the pic flag in CFLAGS and remove --with-pic=1 from
> the configure line, as we do today for all compilers.

Yeah, this would also need to apply to shared libraries, where need for
PIC is implied and we should proceed to confirm that linking works
(i.e., the old test) using user-provided CFLAGS.

> BTW, my trick seems to works with the Cray compiler.
>
> $ cc
> CC-2107 craycc: ERROR in command line
>   No valid filenames are specified on the command line.
>
> $ cc -c check-pic.c -fPIC
>
> $ cc -c check-pic.c
> CC-35 craycc: ERROR File = check-pic.c, Line = 2
>   #error directive: "no-PIC"
>   #error "no-PIC"
>^
>
>
> -- 
> Lisandro Dalcin
> 
> Research Scientist
> Extreme Computing Research Center (ECRC)
> King Abdullah University of Science and Technology (KAUST)
> http://ecrc.kaust.edu.sa/


Re: [petsc-users] How do I supply the compiler PIC flag via CFLAGS, CXXXFLAGS, and FCFLAGS

2019-05-28 Thread Jed Brown via petsc-users
No love with:

cc: Sun C 5.12 Linux_i386 2011/11/16


Note that all of these compilers (including Sun C, which doesn't define
the macro) recognize -fPIC.  (Blue Gene xlc requires -qpic.)  Do we
still need to test the other alternatives?

"Smith, Barry F."  writes:

>   Works for Intel and PGI compiles (the version I checked)
>
> bsmith@es:~$ pgcc check-pic.c  -PIC
> pgcc-Error-Unknown switch: -PIC
> bsmith@es:~$ pgcc check-pic.c  -fPIC
> bsmith@es:~$ pgcc check-pic.c 
> PGC-F-0249-#error --  "no-PIC" (check-pic.c: 2)
> PGC/x86-64 Linux 19.3-0: compilation aborted
> bsmith@es:~$ icc check-pic.c 
> check-pic.c(2): error: #error directive: "no-PIC"
>   #error "no-PIC"
>^
>
> compilation aborted for check-pic.c (code 2)
> bsmith@es:~$ icc check-pic.c -PIC
> icc: command line warning #10006: ignoring unknown option '-PIC'
> check-pic.c(2): error: #error directive: "no-PIC"
>   #error "no-PIC"
>^
>
> compilation aborted for check-pic.c (code 2)
> bsmith@es:~$ icc check-pic.c -fPIC
> bsmith@es:~$ 
>
>
> You are the man!
>
>
>> On May 28, 2019, at 12:29 PM, Lisandro Dalcin via petsc-users 
>>  wrote:
>> 
>> 
>> 
>> On Tue, 28 May 2019 at 18:19, Jed Brown  wrote:
>> Lisandro Dalcin via petsc-users  writes:
>> 
>> > On Tue, 28 May 2019 at 17:31, Balay, Satish via petsc-users <
>> > petsc-users@mcs.anl.gov> wrote:
>> >
>> >> Configure.log shows '--with-pic=1' - hence this error.
>> >>
>> >> Remove '--with-pic=1' and retry.
>> >>
>> >>
>> > Nonsense. Why this behavior? Building a static library with PIC code is a
>> > perfectly valid use case.
>> 
>> And that's what will happen because Inge passed -fPIC in CFLAGS et al.
>> 
>> Do you know how we could confirm that PIC code is generated without
>> attempting to use shared libraries?
>> 
>> 
>> I know how to do it with the `readelf` command for ELF objects. I even know 
>> how to do it compile-time for GCC and clang. Maybe Intel also works this 
>> way. I do not know about a general solution, though.
>> 
>> $ cat check-pic.c 
>> #ifndef __PIC__
>> #error "no-PIC"
>> #endif
>> 
>> $ gcc -c check-pic.c -fPIC
>> 
>> $ clang -c check-pic.c -fPIC
>> 
>> $ gcc -c check-pic.c
>> check-pic.c:2:2: error: #error "no-PIC"
>> 2 | #error "no-PIC"
>>   |  ^
>> 
>> $ clang -c check-pic.c
>> check-pic.c:2:2: error: "no-PIC"
>> #error "no-PIC"
>>  ^
>> 1 error generated.
>>  
>> -- 
>> Lisandro Dalcin
>> 
>> Research Scientist
>> Extreme Computing Research Center (ECRC)
>> King Abdullah University of Science and Technology (KAUST)
>> http://ecrc.kaust.edu.sa/


Re: [petsc-users] How do I supply the compiler PIC flag via CFLAGS, CXXXFLAGS, and FCFLAGS

2019-05-28 Thread Jed Brown via petsc-users
Lisandro Dalcin via petsc-users  writes:

> On Tue, 28 May 2019 at 17:31, Balay, Satish via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
>> Configure.log shows '--with-pic=1' - hence this error.
>>
>> Remove '--with-pic=1' and retry.
>>
>>
> Nonsense. Why this behavior? Building a static library with PIC code is a
> perfectly valid use case.

And that's what will happen because Inge passed -fPIC in CFLAGS et al.

Do you know how we could confirm that PIC code is generated without
attempting to use shared libraries?

if self.argDB['with-pic'] and not useSharedLibraries:
  # this is a flaw in configure; it is a legitimate use case where PETSc is 
built with PIC flags but not shared libraries
  # to fix it the capability to build shared libraries must be enabled in 
configure if --with-pic=true even if shared libraries are off and this
  # test must use that capability instead of using the default shared 
library build in that case which is static libraries
  raise RuntimeError("Cannot determine compiler PIC flags if shared 
libraries is turned off\nEither run using --with-shared-libraries or 
--with-pic=0 and supply the compiler PIC flag via CFLAGS, CXXXFLAGS, and 
FCFLAGS\n")


Re: [petsc-users] Calling LAPACK routines from PETSc

2019-05-20 Thread Jed Brown via petsc-users
Dave Lee via petsc-users  writes:

> Hi Petsc,
>
> I'm attempting to implement a "hookstep" for the SNES trust region solver.
> Essentially what I'm trying to do is replace the solution of the least
> squares problem at the end of each GMRES solve with a modified solution
> with a norm that is constrained to be within the size of the trust region.
>
> In order to do this I need to perform an SVD on the Hessenberg matrix,
> which copying the function KSPComputeExtremeSingularValues(), I'm trying to
> do by accessing the LAPACK function dgesvd() via the PetscStackCallBLAS()
> machinery. One thing I'm confused about however is the ordering of the 2D
> arrays into and out of this function, given that that C and FORTRAN arrays
> use reverse indexing, ie: C[j+1][i+1] = F[i,j].
>
> Given that the Hessenberg matrix has k+1 rows and k columns, should I be
> still be initializing this as H[row][col] and passing this into
> PetscStackCallBLAS("LAPACKgesvd",LAPACKgrsvd_(...))
> or should I be transposing this before passing it in?

LAPACK terminology is with respect to Fortran ordering.  There is a
"leading dimension" parameter so that you can operate on non-contiguous
blocks.  See KSPComputeExtremeSingularValues_GMRES for an example.
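
As a concrete illustration for the real-scalar case (complex builds pass
an extra rwork argument, as KSPComputeExtremeSingularValues_GMRES does),
element (i,j) of the Hessenberg copy lives at R[i + j*lda] in Fortran
(column-major) order; here k, R, S, U, VT, and work are assumed to be
set up by the caller, and the sizes are shown without the
PetscBLASIntCast you would normally use:

  /* R: column-major (k+1) x k copy of the Hessenberg matrix, lda = k+1 */
  PetscBLASInt m = k+1, n = k, lda = k+1, lwork = 5*(k+1), lierr;

  PetscStackCallBLAS("LAPACKgesvd",
    LAPACKgesvd_("A","A",&m,&n,R,&lda,S,U,&m,VT,&n,work,&lwork,&lierr));
  if (lierr) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_LIB,"gesvd failed");
  /* U (m x m) and VT (n x n) also come back column-major: U[i + j*m],
     VT[i + j*n]; no extra transpose is needed beyond remembering that
     VT is already the transpose of V */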

> Also for the left and right singular vector matrices that are returned by
> this function, should I be transposing these before I interpret them as C
> arrays?
>
> I've attached my modified version of gmres.c in case this is helpful. If
> you grep for DRL (my initials) then you'll see my changes to the code.
>
> Cheers, Dave.
>
> /*
> This file implements GMRES (a Generalized Minimal Residual) method.
> Reference:  Saad and Schultz, 1986.
>
>
> Some comments on left vs. right preconditioning, and restarts.
> Left and right preconditioning.
> If right preconditioning is chosen, then the problem being solved
> by gmres is actually
>My =  AB^-1 y = f
> so the initial residual is
>   r = f - Mx
> Note that B^-1 y = x or y = B x, and if x is non-zero, the initial
> residual is
>   r = f - A x
> The final solution is then
>   x = B^-1 y
>
> If left preconditioning is chosen, then the problem being solved is
>My = B^-1 A x = B^-1 f,
> and the initial residual is
>r  = B^-1(f - Ax)
>
> Restarts:  Restarts are basically solves with x0 not equal to zero.
> Note that we can eliminate an extra application of B^-1 between
> restarts as long as we don't require that the solution at the end
> of an unsuccessful gmres iteration always be the solution x.
>  */
>
> #include <../src/ksp/ksp/impls/gmres/gmresimpl.h>   /*I  "petscksp.h"  I*/
> #include  // DRL
> #define GMRES_DELTA_DIRECTIONS 10
> #define GMRES_DEFAULT_MAXK 30
> static PetscErrorCode 
> KSPGMRESUpdateHessenberg(KSP,PetscInt,PetscBool,PetscReal*);
> static PetscErrorCode KSPGMRESBuildSoln(PetscScalar*,Vec,Vec,KSP,PetscInt);
>
> PetscErrorCode KSPSetUp_GMRES(KSP ksp)
> {
>   PetscInt   hh,hes,rs,cc;
>   PetscErrorCode ierr;
>   PetscInt   max_k,k;
>   KSP_GMRES  *gmres = (KSP_GMRES*)ksp->data;
>
>   PetscFunctionBegin;
>   max_k = gmres->max_k;  /* restart size */
>   hh= (max_k + 2) * (max_k + 1);
>   hes   = (max_k + 1) * (max_k + 1);
>   rs= (max_k + 2);
>   cc= (max_k + 1);
>
>   ierr = 
> PetscCalloc5(hh,&gmres->hh_origin,hes,&gmres->hes_origin,rs,&gmres->rs_origin,cc,&gmres->cc_origin,cc,&gmres->ss_origin);CHKERRQ(ierr);
>   ierr = PetscLogObjectMemory((PetscObject)ksp,(hh + hes + rs + 
> 2*cc)*sizeof(PetscScalar));CHKERRQ(ierr);
>
>   if (ksp->calc_sings) {
> /* Allocate workspace to hold Hessenberg matrix needed by lapack */
> ierr = PetscMalloc1((max_k + 3)*(max_k + 9),&gmres->Rsvd);CHKERRQ(ierr);
> ierr = PetscLogObjectMemory((PetscObject)ksp,(max_k + 3)*(max_k + 
> 9)*sizeof(PetscScalar));CHKERRQ(ierr);
> ierr = PetscMalloc1(6*(max_k+2),&gmres->Dsvd);CHKERRQ(ierr);
> ierr = 
> PetscLogObjectMemory((PetscObject)ksp,6*(max_k+2)*sizeof(PetscReal));CHKERRQ(ierr);
>   }
>
>   /* Allocate array to hold pointers to user vectors.  Note that we need
>4 + max_k + 1 (since we need it+1 vectors, and it <= max_k) */
>   gmres->vecs_allocated = VEC_OFFSET + 2 + max_k + gmres->nextra_vecs;
>
>   ierr = PetscMalloc1(gmres->vecs_allocated,&gmres->vecs);CHKERRQ(ierr);
>   ierr = PetscMalloc1(VEC_OFFSET+2+max_k,&gmres->user_work);CHKERRQ(ierr);
>   ierr = PetscMalloc1(VEC_OFFSET+2+max_k,&gmres->mwork_alloc);CHKERRQ(ierr);
>   ierr = 
> PetscLogObjectMemory((PetscObject)ksp,(VEC_OFFSET+2+max_k)*(sizeof(Vec*)+sizeof(PetscInt))
>  + gmres->vecs_allocated*sizeof(Vec));CHKERRQ(ierr);
>
>   if (gmres->q_preallocate) {
> gmres->vv_allocated = VEC_OFFSET + 2 + max_k;
>
> ierr = 
> KSPCreateVecs(ksp,gmres->vv_allocated,&gmres->user_work[0],0,NULL);CHKERRQ(ierr);
> ierr = 
> PetscLogObjectParents(ksp,gmres->vv_allocated,gmres->user_work[0]);CHKERRQ(ierr);
>
> gmres->mwork_alloc[0] = gmres->vv_allocated;
> gmres->nwork_alloc= 1;
> 

Re: [petsc-users] Question about parallel Vectors and communicators

2019-05-09 Thread Jed Brown via petsc-users
GIRET Jean-Christophe via petsc-users  writes:

> Hello,
>
> Thanks Mark and Jed for your quick answers.
>
> So the idea is to define all the Vecs on the world communicator, and perform 
> the communications using traditional scatter objects? The data would still be 
> accessible on the two sub-communicators as they are both subsets of the 
> MPI_COMM_WORLD communicator, but they would be used while creating the Vecs 
> or the IS for the scatter. Is that right?

I would use two global Vecs (on COMM_WORLD) to perform the scatter.
Those global Vec memory can be aliased to Vecs on subcomms with the same
local sizes.
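
A sketch of the aliasing (only on ranks that belong to subcomm, and
assuming the local layout of xworld matches what the subcomm Vec should
have; names are placeholders):

  PetscScalar *a;
  PetscInt    nlocal;
  Vec         xsub;

  ierr = VecGetLocalSize(xworld,&nlocal);CHKERRQ(ierr);
  ierr = VecGetArray(xworld,&a);CHKERRQ(ierr);
  ierr = VecCreateMPIWithArray(subcomm,1,nlocal,PETSC_DECIDE,a,&xsub);CHKERRQ(ierr);
  /* ... use xsub with the solvers living on subcomm ... */
  ierr = VecDestroy(&xsub);CHKERRQ(ierr);
  ierr = VecRestoreArray(xworld,&a);CHKERRQ(ierr);

The VecScatter between the two COMM_WORLD Vecs then does the actual
data exchange between the parts.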

> I’m currently trying, without success, to perform a Scatter from a MPI Vec 
> defined on a subcomm to another Vec defined on the world comm, and 
> vice-versa. But I don’t know if it’s possible.
>
> I can imagine that trying doing that seems a bit strange. However, I’m 
> dealing with code coupling (and linear algebra for the main part of the 
> code), and my idea was trying to use the Vec data structures to perform data 
> exchange between some parts of the software which would have their own 
> communicator. It would eliminate the need to re-implement an ad-hoc solution.
>
> An option would be to stick on the world communicator for all the PETSc part, 
> but I could face some situations where my Vecs could be small while I would 
> have to run the whole simulation on an important number of core for the 
> coupled part. I imagine that It may not really serve the linear system 
> solving part in terms of performance. Another one would be perform all the 
> PETSc operations on a sub-communicator and use “raw” MPI communications 
> between the communicators to perform the data exchange for the coupling part.
>
> Thanks again for your support,
> Best regards,
> Jean-Christophe
>
> De : Mark Adams [mailto:mfad...@lbl.gov]
> Envoyé : mardi 7 mai 2019 21:39
> À : GIRET Jean-Christophe
> Cc : petsc-users@mcs.anl.gov
> Objet : Re: [petsc-users] Question about parallel Vectors and communicators
>
>
>
> On Tue, May 7, 2019 at 11:38 AM GIRET Jean-Christophe via petsc-users 
> mailto:petsc-users@mcs.anl.gov>> wrote:
> Dear PETSc users,
>
> I would like to use Petsc4Py for a project extension, which consists mainly 
> of:
>
> -  Storing data and matrices on several rank/nodes which could not 
> fit on a single node.
>
> -  Performing some linear algebra in a parallel fashion (solving 
> sparse linear system for instance)
>
> -  Exchanging those data structures (parallel vectors) between 
> non-overlapping MPI communicators, created for instance by splitting 
> MPI_COMM_WORLD.
>
> While the two first items seems to be well addressed by PETSc, I am wondering 
> about the last one.
>
> Is it possible to access the data of a vector, defined on a communicator from 
> another, non-overlapping communicator? From what I have seen from the 
> documentation and the several threads on the user mailing-list, I would say 
> no. But maybe I am missing something? If not, is it possible to transfer a 
> vector defined on a given communicator on a communicator which is a subset of 
> the previous one?
>
> If you are sending to a subset of processes then VecGetSubVec + Jed's tricks 
> might work.
>
> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Vec/VecGetSubVector.html
>
>
> Best regards,
> Jean-Christophe


Re: [petsc-users] Question about parallel Vectors and communicators

2019-05-07 Thread Jed Brown via petsc-users
The standard approach would be to communicate via the parent comm.  So
you split comm world into part0 and part1 and use a VecScatter with vecs
on world (which can have zero entries on part1 and part0 respectively)
to exchange your data.  You can use VecPlaceArray or VecCreate*WithArray
to avoid an extra copy.

GIRET Jean-Christophe via petsc-users  writes:

> Dear PETSc users,
>
> I would like to use Petsc4Py for a project extension, which consists mainly 
> of:
>
> -  Storing data and matrices on several rank/nodes which could not 
> fit on a single node.
>
> -  Performing some linear algebra in a parallel fashion (solving 
> sparse linear system for instance)
>
> -  Exchanging those data structures (parallel vectors) between 
> non-overlapping MPI communicators, created for instance by splitting 
> MPI_COMM_WORLD.
>
> While the two first items seems to be well addressed by PETSc, I am wondering 
> about the last one.
>
> Is it possible to access the data of a vector, defined on a communicator from 
> another, non-overlapping communicator? From what I have seen from the 
> documentation and the several threads on the user mailing-list, I would say 
> no. But maybe I am missing something? If not, is it possible to transfer a 
> vector defined on a given communicator on a communicator which is a subset of 
> the previous one?
>
> Best regards,
> Jean-Christophe


Re: [petsc-users] ``--with-clanguage=c++" turns on "PETSC_HAVE_COMPLEX"?

2019-05-03 Thread Jed Brown via petsc-users
Fande Kong via petsc-users  writes:

> It looks like mpicxx from openmpi does not handle this correctly. I
> switched to mpich, and it works now.
>
> However there is till some warnings:
>
> *clang-6.0: warning: treating 'c' input as 'c++' when in C++ mode, this
> behavior is deprecated [-Wdeprecated]*
> * CXX arch-linux2-c-opt-memory/obj/dm/impls/plex/glexg.o*
> * CXX arch-linux2-c-opt-memory/obj/dm/impls/plex/petscpartmatpart.o*
> *clang-6.0: warning: treating 'c' input as 'c++' when in C++ mode, this
> behavior is dep*

Clang has always done this.  I don't know a clean way to work around the
warning, but we don't recommend with-clanguage=c++ unless you're on a
platform where a sane C compiler doesn't exist.


Re: [petsc-users] DMDASetBlockFillsSparse format

2019-04-21 Thread Jed Brown via petsc-users
I'm pretty confident all the tests are sorted.  It wouldn't be any great
hardship for us to allow unsorted input.  If you submit an unsorted
test, we can make sure it works (it might already, but we should
probably add a call to in-place sort).

Oleksandr Koshkarov via petsc-users  writes:

> Dear All,
> Does anyone know if the format of sparse matrices
> in DMDASetBlockFillsSparse require column indicies for chosen row to be in
> increasing order?
> Thank you,
> Oleksandr.


Re: [petsc-users] VecView to hdf5 broken for large (complex) vectors

2019-04-17 Thread Jed Brown via petsc-users
"Smith, Barry F."  writes:

>   This is fine for "hacking" on PETSc but worthless for any other package. 
> Here is my concern, when someone 
> realizes there is a problem with a package they are using through a package 
> manager they think, crud I have to
>
> 1) find the git repository for this package
> 2) git clone the package 
> 3) figure out how to build the package from source, is it ./configure, cmake, 
> what are the needed arguments,... 
> 4) wait for the entire thing to build 
>
> then I can go in and investigate the problem and provide and test the fix via 
> a pull request. Heck I'm not going to bother.

Thanks for explaining.  Yes, this could certainly be improved.  Spack
could run a git diff after a successful build and offer it as a patch.
That might help inform us about hacks that people need even if not
intended for acceptance as-is.


Re: [petsc-users] VecView to hdf5 broken for large (complex) vectors

2019-04-16 Thread Jed Brown via petsc-users
"Smith, Barry F. via petsc-users"  writes:

>   So it sounds like spack is still mostly a "package manager" where people 
> use "static" packages and don't hack the package's code. This is not 
> unreasonable, no other package manager supports hacking a package's code 
> easily, presumably. The problem is that in the HPC world a "packaged"
> code is always incomplete and may need hacking or application of newly 
> generated patches and this is painful with static package managers so people 
> want to use the git repository directly and mange the build themselves which 
> negates the advantages of using a package manager.

I don't think people "want" to hack the packages that they don't
contribute to.  Spack provides pretty rapid distribution of patches.

What if PETSc had

  ./configure --with-mumps=spack

or some alternative that would check with spack to find a suitable
MUMPS, installing it (with Spack's dependency resolution) if not
available?  Then you could hack on PETSc with multiple PETSC_ARCH,
branches, incremental rebuilds, and testing, but not need to deal with
PETSc's crude package installation and upgrade mechanism.

Upon completion, the build could offer a yaml snippet for packages.yaml
in case the user wanted other Spack packages to use that PETSc.


Re: [petsc-users] Bad memory scaling with PETSc 3.10

2019-04-10 Thread Jed Brown via petsc-users
"Zhang, Hong"  writes:

> Jed:
>>> Myriam,
>>> Thanks for the plot. '-mat_freeintermediatedatastructures' should not 
>>> affect solution. It releases almost half of memory in C=PtAP if C is not 
>>> reused.
>
>> And yet if turning it on causes divergence, that would imply a bug.
>> Hong, are you able to reproduce the experiment to see the memory
>> scaling?
>
> I like to test his code using an alcf machine, but my hands are full now. 
> I'll try it as soon as I find time, hopefully next week.

I have now compiled and run her code locally.

Myriam, thanks for your last mail adding configuration and removing the
MemManager.h dependency.  I ran with and without
-mat_freeintermediatedatastructures and don't see a difference in
convergence.  What commands did you run to observe that difference?


Re: [petsc-users] Bad memory scaling with PETSc 3.10

2019-04-10 Thread Jed Brown via petsc-users
"Zhang, Hong via petsc-users"  writes:

> Myriam,
> Thanks for the plot. '-mat_freeintermediatedatastructures' should not affect 
> solution. It releases almost half of memory in C=PtAP if C is not reused.

And yet if turning it on causes divergence, that would imply a bug.
Hong, are you able to reproduce the experiment to see the memory
scaling?


Re: [petsc-users] Required Help on Calculation of All-to-All broadcast in a Balanced Binary Tree -Please

2019-04-09 Thread Jed Brown via petsc-users
This sounds an awful lot like a homework question.  In any case, it does
not relate directly to PETSc and is thus off-topic for this list.

Zulfi Khan via petsc-users  writes:

> Calculation of Cost of all-to-broadcast for a Balanced Binary Tree
>
> Hi,
>
> I have a question is:
>
> Given a balanced binary tree as shown in Figure 4.7 (attached), describe a 
> procedure to perform all-to-all broadcast that takes time (ts + tw m p/2) log p 
> for m-word messages on p nodes (assuming th is ignored). Assume that only the 
> leaves of the tree contain nodes, and that an exchange of two m-word messages 
> between any two nodes connected by bidirectional channels takes time ts + tw m k 
> if the communication channel (or a part of it) is shared by k simultaneous 
> messages.
>
>  
>
> Equations:
>
> comm cost = ts + tw* m* k (as given in the question)
>
> I am trying to calculate the cost of a balanced binary tree for all-to-all 
> broadcast. I have created a procedure for this broadcast:
>
>  P0 P1 P2 P3  (left sub tree): P4, P5, P6, P7(right sub tree)
>
> 1st Step: broadcast from left subtree to right subtree(for 8 processors four 
> on left subtree and four on right subtree), the cost is:
>
>
>
>
> P0 exchanges with P7 (i.e each P0 & P7 contains 2 messages)
>
> P1 exchanges with P6 (i.e each P1 & P6 contains 2 messages) 
>
> P2 exchanges with P5 (i.e. each P2 & P5 contains 2 messages)
>
> P3 exchanges with P4 (i.e each P3 and P4 contains 2 messages)
>
> 2nd Step: Broadcast Across the Subtrees Within the Left and Right Subtree
>
> P0 exchanges with P3 (i.e each P0 & P3 contains 4 messages)
>
> P1 exchanges with P2 (i.e each P1 & P2 contains 4 messages)
>
> P4 exchanges with P7 (i.e. each P4 & P7 contains 4 messages)
>
> P5 exchanges with P6 (i.e. each P5 & P6 contains 4 messages)  
>
> 3rd Step: Broadcast Within the Subtrees of Left and Right Subtree
>
> P0 exchanges with P1 (i.e each P0 & P1 contains 8 messages)
>
> P2 exchanges with P3 (i.e each P2 & P3 contains 8 messages)
>
> P4 exchanges with P5 ((i.e each P4 & P5 contains 8 messages)
>
> P6 exchanges with P7 (i.e. each P6 & P7 contains 8 messages)
>
> Step 1: t_s + t_w m * 2  // Let k = 2. Is k correct?
>
> Step 2: t_s + t_w m * 1  // Let k = 1. Is k correct?
>
> Step 3: t_s + t_w m * 1  // Let k = 1. Is k correct?
>
> logp(t_s + 2 * t_w m + 1 * t_w m +1 * t_w m)
>
>  t_comm = logp(t_s+4t_w m)
>
> Now p/2 = 4
>
> There t_comm = logp(t_s + t_w mp/2)
>
>  
>
> My answer is correct as given in the question, but I don’t have any 
> justification for choosing the value of ‘k’?
>
> Can somebody please guide me: is my solution correct? What is the 
> justification of choosing the value of k?
>
>  
>
> Stack Exchange link is;
> https://cs.stackexchange.com/questions/106631/all-to-all-broadcast-on-a-balanced-binary-tree
>
> Kindly help me, I would be very much thankful to you guys.
>
> Zulfi.


Re: [petsc-users] PetscSortIntWithDataArray

2019-04-07 Thread Jed Brown via petsc-users
Fande Kong via petsc-users  writes:

> Hi All,
>
> *Input Parameters*
>
> *n - number of values*
> *i - array of integers*
> *Ii - second array of data*
> *size - sizeof elements in the data array in bytes*
> *work - workspace of "size" bytes used when sorting*
>
>
> size is the size of one element or all elements? It looks like it is the
> size of one element when looked into the code.

Each element.
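
A small sketch to make that concrete (the Point struct is made up for
the example):

  typedef struct { PetscReal x,y; } Point;   /* made-up payload type */
  PetscInt keys[4] = {3,1,2,0};
  Point    vals[4] = {{3.,3.},{1.,1.},{2.,2.},{0.,0.}};
  Point    work;                             /* scratch: one element of "size" bytes */

  ierr = PetscSortIntWithDataArray(4,keys,vals,sizeof(Point),&work);CHKERRQ(ierr);
  /* keys is now {0,1,2,3} and vals has been permuted the same way */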


Re: [petsc-users] Estimate memory needs for large grids

2019-04-05 Thread Jed Brown via petsc-users
Memory use will depend on the preconditioner.  This will converge very
slowly (i.e., never) without multigrid unless time steps are small.
Depending on how rough the coefficients are, you may be able to use
geometric multigrid, which has pretty low setup costs and memory
requirements.

To estimate memory with an arbitrary preconditioner, I would run a
smaller problem using the desired preconditioner and check its memory
use using -log_view.  From that you can estimate total memory
requirements for the target job.
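
For example, something along these lines on a reduced grid prints
per-process memory summaries that you can scale up (the executable name
and sizes are placeholders for your own):

  mpiexec -n 8 ./myapp -da_grid_x 6912 -da_grid_y 6912 \
      -ts_type cn -pc_type gamg -log_view -memory_view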

Sajid Ali via petsc-users  writes:

> Hi,
>
> I've solving a simple linear equation [ u_t = A*u_xx + A*u_yy + F_t*u ] on
> a grid size of 55296x55296. I'm reading a vector of that size from an hdf5
> file and have the jacobian matrix as a modified 5-point stencil which is
> preallocated with the following
> ```
>   ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
>   ierr = MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,M,M);CHKERRQ(ierr);
>   ierr = MatSetType(A,MATMPIAIJ);CHKERRQ(ierr);
>   ierr = MatSetFromOptions(A);CHKERRQ(ierr);
>   ierr = MatMPIAIJSetPreallocation(A,5,NULL,5,NULL);CHKERRQ(ierr);
>   ierr = MatSeqAIJSetPreallocation(A,5,NULL);CHKERRQ(ierr);
> ```
> Total number of elements is ~3e9 and the matrix size is ~9e9 (but only 5
> diagonals are non zeros). I'm reading F_t which has ~3e9 elements. I'm
> using double complex numbers and I've compiled with int64 indices.
>
> Thus, for the vector I need, 55296x55296x2x8 bytes ~ 50Gb and for the F
> vector, another 50 Gb. For the matrix I need ~250 Gb and some overhead for
> the solver.
>
> How do I estimate this overhead (and estimate how many nodes I would need
> to run this given the maximum memory per node (as specified by slurm's
> --mem option)) ?
>
> Thanks in advance for the help!
>
> -- 
> Sajid Ali
> Applied Physics
> Northwestern University


Re: [petsc-users] PetscSFReduceBegin can not handle MPI_CHAR?

2019-04-05 Thread Jed Brown via petsc-users
Junchao's PR has been merged to 'master'.

https://bitbucket.org/petsc/petsc/pull-requests/1511/add-signed-char-unsigned-char-and-char

Fande Kong via petsc-users  writes:

> Thanks for the reply.  It is not necessary for me to use MPI_SUM.  I think 
> the better choice is MPIU_REPLACE. Doesn’t MPIU_REPLACE work for any 
> mpi_datatype?
>
> Fande 
>
>
>> On Apr 3, 2019, at 9:15 PM, Zhang, Junchao  wrote:
>> 
>> 
>>> On Wed, Apr 3, 2019 at 3:41 AM Lisandro Dalcin via petsc-users 
>>>  wrote:
>>> IIRC, MPI_CHAR is for ASCII text data. Also, remember that in C the 
>>> signedness of plain `char` is implementation (or platform?) dependent.
>>>  I'm not sure MPI_Reduce() is supposed to / should  handle MPI_CHAR, you 
>>> should use MPI_{SIGNED|UNSIGNED}_CHAR for that. Note however that 
>>> MPI_SIGNED_CHAR is from MPI 2.0.
>> 
>> MPI standard chapter 5.9.3, says "MPI_CHAR, MPI_WCHAR, and MPI_CHARACTER 
>> (which represent printable characters) cannot be used in reduction 
>> operations"
>> So Fande's code and Jed's branch have problems. To fix that, we have to add 
>> support for signed char, unsigned char, and char in PetscSF.  The first two 
>> types support add, mult, logical and bitwise operations. The last is a dumb 
>> type, only supports pack/unpack. With this fix, PetscSF/MPI would raise 
>> error on Fande's code. I can come up with a fix tomorrow.
>>  
>>> 
 On Wed, 3 Apr 2019 at 07:01, Fande Kong via petsc-users 
  wrote:
 Hi All,
 
 There were some error messages when using PetscSFReduceBegin with 
 MPI_CHAR. 
 
 ierr = 
 PetscSFReduceBegin(ptap->sf,MPI_CHAR,rmtspace,space,MPI_SUM);CHKERRQ(ierr);
 
 
 My question would be: Does PetscSFReduceBegin suppose work with MPI_CHAR? 
 If not, should we document somewhere?
 
 Thanks
 
 Fande,
 
 
 [0]PETSC ERROR: - Error Message 
 --
 [0]PETSC ERROR: No support for this operation for this object type
 [0]PETSC ERROR: No support for type size not divisible by 4
 [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html 
 for trouble shooting.
 [0]PETSC ERROR: Petsc Development GIT revision: v3.10.4-1989-gd816d1587e  
 GIT Date: 2019-04-02 17:37:18 -0600
 [0]PETSC ERROR: [1]PETSC ERROR: - Error Message 
 --
 [1]PETSC ERROR: No support for this operation for this object type
 [1]PETSC ERROR: No support for type size not divisible by 4
 [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html 
 for trouble shooting.
 [1]PETSC ERROR: Petsc Development GIT revision: v3.10.4-1989-gd816d1587e  
 GIT Date: 2019-04-02 17:37:18 -0600
 [1]PETSC ERROR: ./ex90 on a arch-linux2-c-dbg-feature-ptap-all-at-once 
 named fn605731.local by kongf Tue Apr  2 21:48:41 2019
 [1]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=yes 
 --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 
 --download-parmetis=1 --download-superlu_dist=1 
 PETSC_ARCH=arch-linux2-c-dbg-feature-ptap-all-at-once --download-ptscotch 
 --download-party --download-chaco --with-cxx-dialect=C++11
 [1]PETSC ERROR: #1 PetscSFBasicPackTypeSetup() line 678 in 
 /Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c
 [1]PETSC ERROR: #2 PetscSFBasicGetPack() line 804 in 
 /Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c
 [1]PETSC ERROR: #3 PetscSFReduceBegin_Basic() line 1024 in 
 /Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c
 ./ex90 on a arch-linux2-c-dbg-feature-ptap-all-at-once named 
 fn605731.local by kongf Tue Apr  2 21:48:41 2019
 [0]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=yes 
 --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1 
 --download-parmetis=1 --download-superlu_dist=1 
 PETSC_ARCH=arch-linux2-c-dbg-feature-ptap-all-at-once --download-ptscotch 
 --download-party --download-chaco --with-cxx-dialect=C++11
 [0]PETSC ERROR: #1 PetscSFBasicPackTypeSetup() line 678 in 
 /Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c
 [0]PETSC ERROR: #2 PetscSFBasicGetPack() line 804 in 
 /Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c
 [0]PETSC ERROR: #3 PetscSFReduceBegin_Basic() line 1024 in 
 /Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c
 [0]PETSC ERROR: #4 PetscSFReduceBegin() line 1208 in 
 /Users/kongf/projects/petsc/src/vec/is/sf/interface/sf.c
 [0]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIAIJ_allatonce() line 850 in 
 /Users/kongf/projects/petsc/src/mat/impls/aij/mpi/mpiptap.c
 [0]PETSC ERROR: #6 MatPtAP_MPIAIJ_MPIAIJ() line 202 in 
 /Users/kongf/projects/petsc/src/mat/impls/aij/mpi/mpiptap.c
 

Re: [petsc-users] PetscSFReduceBegin can not handle MPI_CHAR?

2019-04-04 Thread Jed Brown via petsc-users
Fande Kong via petsc-users  writes:

> Hi Jed,
>
> One more question. Is it fine to use the same SF to exchange two groups of
> data at the same time? What is the better way to do this

This should work due to the non-overtaking property defined by MPI.

> Fande Kong,
>
>  ierr =
> PetscSFReduceBegin(ptap->sf,MPIU_INT,rmtspace,space,MPIU_REPLACE);CHKERRQ(ierr);
>  ierr =
> PetscSFReduceBegin(ptap->sf,MPI_CHAR,rmtspace2,space2,MPIU_REPLACE);CHKERRQ(ierr);
>  Doing some calculations
>  ierr =
> PetscSFReduceEnd(ptap->sf,MPIU_INT,rmtspace,space,MPIU_REPLACE);CHKERRQ(ierr);
>  ierr =
> PetscSFReduceEnd(ptap->sf,MPI_CHAR,rmtspace2,space2,MPIU_REPLACE);CHKERRQ(ierr);


Re: [petsc-users] PetscSFReduceBegin can not handle MPI_CHAR?

2019-04-04 Thread Jed Brown via petsc-users
"Zhang, Junchao via petsc-users"  writes:

> MPI standard chapter 5.9.3, says "MPI_CHAR, MPI_WCHAR, and MPI_CHARACTER 
> (which represent printable characters) cannot be used in reduction operations"
> So Fande's code and Jed's branch have problems. To fix that, we have to add 
> support for signed char, unsigned char, and char in PetscSF.  The first two 
> types support add, mult, logical and bitwise operations. The last is a dumb 
> type, only supports pack/unpack. With this fix, PetscSF/MPI would raise error 
> on Fande's code. I can come up with a fix tomorrow.

FWIW, my patch did not define arithmetic or comparison operations.  It
only added moving bytes for sizes smaller than MPI_INT.  Perhaps it
should, but that would require identifying more types, as you say.


Re: [petsc-users] testing for and removing a null space using JFNK

2019-04-04 Thread Jed Brown via petsc-users
Mark Adams via petsc-users  writes:

> On Thu, Apr 4, 2019 at 7:35 AM Dave Lee  wrote:
>
>> I already have the Navier Stokes solver. My issue is wrapping it in a JFNK
>> solver to find the periodic solutions. I will keep reading up on SVD
>> approaches, there may be some capability for something like this in SLEPc.
>>
>
> Yes, SLEPc will give you parallel eigen solvers, etc.

Even so, computing a null space will be *much* more expensive than solving 
linear systems.


Re: [petsc-users] Bad memory scaling with PETSc 3.10

2019-04-03 Thread Jed Brown via petsc-users
Myriam Peyrounette via petsc-users  writes:

> Hi all,
>
> for your information, you'll find attached the comparison of the weak
> memory scalings when using :
>
> - PETSc 3.6.4 (reference)
> - PETSc 3.10.4 without specific options
> - PETSc 3.10.4 with the three scalability options you mentionned
>
> Using the scalability options does improve the memory scaling. However,
> the 3.6 version still has a better one...

Yes, this still looks significant.  Is this an effect we can still
reproduce with a PETSc example and/or using a memory profiler (such as
massif or gperftools)?  I think it's important for us to narrow down
what causes this difference (looks like almost 2x on your 1e8 problem
size) so we can fix.


Re: [petsc-users] PetscSFReduceBegin can not handle MPI_CHAR?

2019-04-02 Thread Jed Brown via petsc-users
You can try branch 'jed/feature-sf-char'; not tested yet.

Fande Kong  writes:

> Thanks, Jed,
>
> Please let me know when the patch is in master.
>
> Fande 
>
>
>> On Apr 2, 2019, at 10:47 PM, Jed Brown  wrote:
>> 
>> Fande Kong  writes:
>> 
>>> I am working on petsc master. So it should be fine to have it in 3.11
>> 
>> Cool, I'd rather just do it in 'master'.


Re: [petsc-users] PetscSFReduceBegin can not handle MPI_CHAR?

2019-04-02 Thread Jed Brown via petsc-users
Fande Kong  writes:

> I am working on petsc master. So it should be fine to have it in 3.11

Cool, I'd rather just do it in 'master'.


Re: [petsc-users] PetscSFReduceBegin can not handle MPI_CHAR?

2019-04-02 Thread Jed Brown via petsc-users
We can add it easily.  Would it be enough to add it to petsc-3.11.*?
(I'd rather not backport to an earlier version, for which we presumably
won't have any more maintenance releases.)

Fande Kong via petsc-users  writes:

> Hi All,
>
> There were some error messages when using PetscSFReduceBegin with MPI_CHAR.
>
> ierr =
> PetscSFReduceBegin(ptap->sf,MPI_CHAR,rmtspace,space,MPI_SUM);CHKERRQ(ierr);
>
>
> My question would be: Does PetscSFReduceBegin suppose work with MPI_CHAR?
> If not, should we document somewhere?
>
> Thanks
>
> Fande,
>
>
> [*0]PETSC ERROR: - Error Message
> --*
> *[0]PETSC ERROR: No support for this operation for this object type*
> *[0]PETSC ERROR: No support for type size not divisible by 4*
> *[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
>  for trouble shooting.*
> *[0]PETSC ERROR: Petsc Development GIT revision: v3.10.4-1989-gd816d1587e
>  GIT Date: 2019-04-02 17:37:18 -0600*
> *[0]PETSC ERROR: [1]PETSC ERROR: - Error Message
> --*
> *[1]PETSC ERROR: No support for this operation for this object type*
> *[1]PETSC ERROR: No support for type size not divisible by 4*
> *[1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
>  for trouble shooting.*
> *[1]PETSC ERROR: Petsc Development GIT revision: v3.10.4-1989-gd816d1587e
>  GIT Date: 2019-04-02 17:37:18 -0600*
> *[1]PETSC ERROR: ./ex90 on a arch-linux2-c-dbg-feature-ptap-all-at-once
> named fn605731.local by kongf Tue Apr  2 21:48:41 2019*
> *[1]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=yes
> --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1
> --download-parmetis=1 --download-superlu_dist=1
> PETSC_ARCH=arch-linux2-c-dbg-feature-ptap-all-at-once --download-ptscotch
> --download-party --download-chaco --with-cxx-dialect=C++11*
> *[1]PETSC ERROR: #1 PetscSFBasicPackTypeSetup() line 678 in
> /Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c*
> *[1]PETSC ERROR: #2 PetscSFBasicGetPack() line 804 in
> /Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c*
> *[1]PETSC ERROR: #3 PetscSFReduceBegin_Basic() line 1024 in
> /Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c*
> *./ex90 on a arch-linux2-c-dbg-feature-ptap-all-at-once named
> fn605731.local by kongf Tue Apr  2 21:48:41 2019*
> *[0]PETSC ERROR: Configure options --download-hypre=1 --with-debugging=yes
> --with-shared-libraries=1 --download-fblaslapack=1 --download-metis=1
> --download-parmetis=1 --download-superlu_dist=1
> PETSC_ARCH=arch-linux2-c-dbg-feature-ptap-all-at-once --download-ptscotch
> --download-party --download-chaco --with-cxx-dialect=C++11*
> *[0]PETSC ERROR: #1 PetscSFBasicPackTypeSetup() line 678 in
> /Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c*
> *[0]PETSC ERROR: #2 PetscSFBasicGetPack() line 804 in
> /Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c*
> *[0]PETSC ERROR: #3 PetscSFReduceBegin_Basic() line 1024 in
> /Users/kongf/projects/petsc/src/vec/is/sf/impls/basic/sfbasic.c*
> *[0]PETSC ERROR: #4 PetscSFReduceBegin() line 1208 in
> /Users/kongf/projects/petsc/src/vec/is/sf/interface/sf.c*
> *[0]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIAIJ_allatonce() line 850 in
> /Users/kongf/projects/petsc/src/mat/impls/aij/mpi/mpiptap.c*
> *[0]PETSC ERROR: #6 MatPtAP_MPIAIJ_MPIAIJ() line 202 in
> /Users/kongf/projects/petsc/src/mat/impls/aij/mpi/mpiptap.c*
> *[0]PETSC ERROR: #7 MatPtAP() line 9429 in
> /Users/kongf/projects/petsc/src/mat/interface/matrix.c*
> *[0]PETSC ERROR: #8 main() line 58 in
> /Users/kongf/projects/petsc/src/mat/examples/tests/ex90.c*
> *[0]PETSC ERROR: PETSc Option Table entries:*
> *[0]PETSC ERROR: -matptap_via allatonce*
> *[0]PETSC ERROR: End of Error Message ---send entire
> error message to petsc-ma...@mcs.anl.gov--*
> *[1]PETSC ERROR: #4 PetscSFReduceBegin() line 1208 in
> /Users/kongf/projects/petsc/src/vec/is/sf/interface/sf.c*
> *[1]PETSC ERROR: #5 MatPtAPNumeric_MPIAIJ_MPIAIJ_allatonce() line 850 in
> /Users/kongf/projects/petsc/src/mat/impls/aij/mpi/mpiptap.c*
> *[1]PETSC ERROR: #6 MatPtAP_MPIAIJ_MPIAIJ() line 202 in
> /Users/kongf/projects/petsc/src/mat/impls/aij/mpi/mpiptap.c*
> *[1]PETSC ERROR: #7 MatPtAP() line 9429 in
> /Users/kongf/projects/petsc/src/mat/interface/matrix.c*
> *[1]PETSC ERROR: #8 main() line 58 in
> /Users/kongf/projects/petsc/src/mat/examples/tests/ex90.c*
> *[1]PETSC ERROR: PETSc Option Table entries:*
> *[1]PETSC ERROR: -matptap_via allatonce*
> *[1]PETSC ERROR: End of Error Message ---send entire
> error message to petsc-ma...@mcs.anl.gov--*
> *--*
> *MPI_ABORT was 

Re: [petsc-users] Converting complex PDE to real for KNL performance ?

2019-03-27 Thread Jed Brown via petsc-users
When you roll your own equivalent real formulation, PETSc has no way of
knowing what conjugate transpose might mean, thus symmetry is lost.  I
would suggest just using the AVX2 implementation for now and putting in
a request (or contributing a patch) for AVX-512 complex optimizations.

Sajid Ali via petsc-users  writes:

>  Hi,
>
> I'm able to solve the following equation using complex numbers (with
> ts_type cn and pc_type gamg) :
>   u_t = A*u'' + F_t*u;
> (where A = -1j/(2k) and u'' refers to u_xx+u_yy implemented with the
> familiar 5-point stencil)
>
> Now, I want to solve the same problem using real numbers. The equivalent
> equations are:
> u_t_real   =  1/(2k) * u''_imag + F_real*u_real   - F_imag*u_imag
> u_t_imag = -1/(2k) * u''_real   + F_imag*u_real - F_real*u_imag
>
> Thus, if we now take our new u vector to have twice the length of the
> problem we're solving, keeping the first half as real and the second half
> as imaginary, we'd get a matrix that had matrices computing the laplacian
> via the 5-point stencil in the top-right and bottom-left corners and a
> diagonal [F_real+F_imag, F_real-F_imag] term.
>
> I tried doing this and the gamg preconditioner complains about an
> unsymmetric matrix. If i use the default preconditioner, I get
> DIVERGED_NONLINEAR_SOLVE.
>
> Is there a way to better organize the matrix ?
>
> PS: I'm trying to do this using only real numbers because I realized that
> the optimized avx-512 kernels for KNL are not implemented for complex
> numbers. Would that be implemented soon ?
>
> Thank You,
> Sajid Ali
> Applied Physics
> Northwestern University


Re: [petsc-users] DMCreateSubDM() not available in petsc4py

2019-03-21 Thread Jed Brown via petsc-users
Justin Chang via petsc-users  writes:

> Hi all,
>
> I'm writing a petsc4py routine to manually create nested fieldsplits using
> index sets, and it looks like whenever I move onto the next level of splits
> I need to rescale the IS's.
>
> From the PCFieldSplitSetDefault() routine, it looks like DMCreateSubDM()
> does exactly this here
> .
> However, I see no such Python wrapper for this function. While I could
> write my own, I kind of have my hands tied behind my back by using the
> petsc4py that's attached to FEniCS

What do you mean "attached to FEniCS"?

> - I wonder if there's a general strategy or workaround for writing
> your own recursive fieldsplitsetIS.
>
> Thanks,
> Justin


Re: [petsc-users] Saving Vecs/Mats in HDF5 and visualizing in Matlab

2019-03-19 Thread Jed Brown via petsc-users
Yuyun Yang via petsc-users  writes:

> It's simply for visualization purposes. I wasn't sure if HDF5 would perform 
> better than binary, and what specific functions are needed to load the PETSc 
> vectors/matrices, so wanted to ask for some advice here. Since Matt mentioned 
> it's not likely to be much faster than binary, I guess there is no need to 
> make the change?

There is no speed benefit.  The advantage of HDF5 is that it supports
better metadata, including the data types and sizes.  The PETSc data
format is quick, dirty, and fast.
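
For the MATLAB side, the plain binary route is usually the least effort:
write with a binary viewer and read with
$PETSC_DIR/share/petsc/matlab/PetscBinaryRead.m.  A sketch (the file
name is a placeholder):

  PetscViewer viewer;

  ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD,"u.dat",FILE_MODE_WRITE,&viewer);CHKERRQ(ierr);
  ierr = VecView(u,viewer);CHKERRQ(ierr);
  ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);

and then u = PetscBinaryRead('u.dat'); in MATLAB.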


Re: [petsc-users] Using PETSc with GPU

2019-03-15 Thread Jed Brown via petsc-users
Yuyun Yang via petsc-users  writes:

> Currently we are forming the sparse matrices explicitly, but I think the goal 
> is to move towards matrix-free methods and use a stencil, which I suppose is 
> good to use GPUs for and more efficient. On the other hand, I've also read 
> about matrix-free operations in the manual just on the CPUs. Would there be 
> any benefit then to switching to GPU (looks like matrix-free in PETSc is 
> rather straightforward to use, whereas writing the kernel function for GPU 
> stencil would require quite a lot of work)?

It all depends what kind of computation happens in there and how well
you can implement it for the GPU.  It's important to have a clear idea
of what you expect to achieve.  For example, if you write an excellent
GPU implementation of your SNES residual/matrix-free Jacobian, it might
be 2-3x faster than a good CPU implementation on hardware of similar
cost ($ or Watt).  But you still need preconditioning, which is usually
at least half the work, and perhaps a preconditioner runs the same speed
on GPU and CPU (CPU version often converges a bit faster;
preconditioning operations are often less amenable to GPUs).  So after
all that effort, and now with code that is likely harder to maintain,
you go from 4 seconds per solve to 3 seconds per solve on hardware of
the same cost.  Is that worth it?

Maybe, but you probably want that to be in the critical path for your
research and/or customers.


Re: [petsc-users] GAMG parallel convergence sensitivity

2019-03-14 Thread Jed Brown via petsc-users
Mark Lohry  writes:

> It seems to me with these semi-implicit methods the CFL limit is still so
> close to the explicit limit (that paper stops at 30), I don't really see
> the purpose unless you're running purely incompressible? That's just my
> ignorance speaking though. I'm currently running fully implicit for
> everything, with CFLs around 1e3 - 1e5 or so.

It depends what you're trying to resolve.  Sounds like maybe you're
stepping toward steady state.  The paper is wishing to resolve vortex
and baroclinic dynamics while stepping over acoustics and barotropic
waves.


Re: [petsc-users] GAMG parallel convergence sensitivity

2019-03-13 Thread Jed Brown via petsc-users
Mark Lohry via petsc-users  writes:

> For what it's worth, I'm regularly solving much larger problems (1M-100M
> unknowns, unsteady) with this discretization and AMG setup on 500+ cores
> with impressively great convergence, dramatically better than ILU/ASM. This
> just happens to be the first time I've experimented with this extremely low
> Mach number, which is known to have a whole host of issues and generally
> needs low-mach preconditioners, I was just a bit surprised by this specific
> failure mechanism.

A common technique for low-Mach preconditioning is to convert to
primitive variables (much better conditioned for the solve) and use a
Schur fieldsplit into the pressure space.  For modest time step, you can
use SIMPLE-like method ("selfp" in PCFieldSplit lingo) to approximate
that Schur complement.  You can also rediscretize to form that
approximation.  This paper has a bunch of examples of choices for the
state variables and derivation of the continuous pressure preconditioner
each case.  (They present it as a classical semi-implicit method, but
that would be the Schur complement preconditioner if using FieldSplit
with a fully implicit or IMEX method.)

https://doi.org/10.1137/090775889
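
If the split fields are already defined (via PCFieldSplitSetIS or a DM), the
SIMPLE-like variant mentioned above can be selected entirely from the command
line; a sketch of the relevant options (the exact choices depend on your
formulation):

  -pc_type fieldsplit -pc_fieldsplit_type schur \
  -pc_fieldsplit_schur_fact_type lower \
  -pc_fieldsplit_schur_precondition selfp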


Re: [petsc-users] PCFieldSplit with MatNest

2019-03-13 Thread Jed Brown via petsc-users
Is there any output if you run with -malloc_dump?

Manuel Colera Rico via petsc-users  writes:

> Hi, Junchao,
>
> I have installed the newest version of PETSc and it works fine. I just 
> get the following memory leak warning:
>
> Direct leak of 28608 byte(s) in 12 object(s) allocated from:
>      #0 0x7f1ddd5caa38 in __interceptor_memalign 
> ../../../../gcc-8.1.0/libsanitizer/asan/asan_malloc_linux.cc:111
>      #1 0x7f1ddbef1213 in PetscMallocAlign 
> (/opt/PETSc_library/petsc-3.10.4/mcr_20190313/lib/libpetsc.so.3.10+0x150213)
>
> Thank you,
>
> Manuel
>
> ---
>
> On 3/12/19 7:08 PM, Zhang, Junchao wrote:
>> Hi, Manuel,
>>   I recently fixed a problem in VecRestoreArrayRead. Basically, I 
>> added VecRestoreArrayRead_Nest. Could you try the master branch of 
>> PETSc to see if it fixes your problem?
>>   Thanks.
>>
>> --Junchao Zhang
>>
>>
>> On Mon, Mar 11, 2019 at 6:56 AM Manuel Colera Rico via petsc-users 
>> mailto:petsc-users@mcs.anl.gov>> wrote:
>>
>> Hello,
>>
>> I need to solve a 2*2 block linear system. The matrices A_00, A_01,
>> A_10, A_11 are constructed separately via
>> MatCreateSeqAIJWithArrays and
>> MatCreateSeqSBAIJWithArrays. Then, I construct the full system matrix
>> with MatCreateNest, and use MatNestGetISs and PCFieldSplitSetIS to
>> set
>> up the PC, trying to follow the procedure described here:
>> 
>> https://www.mcs.anl.gov/petsc/petsc-current/src/snes/examples/tutorials/ex70.c.html.
>>
>> However, when I run the code with Leak Sanitizer, I get the
>> following error:
>>
>> =
>> ==54927==ERROR: AddressSanitizer: attempting free on address which
>> was
>> not malloc()-ed: 0x62751ab8 in thread T0
>>  #0 0x7fbd95c08f30 in __interceptor_free
>> ../../../../gcc-8.1.0/libsanitizer/asan/asan_malloc_linux.cc:66
>>  #1 0x7fbd92b99dcd in PetscFreeAlign
>> 
>> (/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0x146dcd)
>>  #2 0x7fbd92ce0178 in VecRestoreArray_Nest
>> 
>> (/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0x28d178)
>>  #3 0x7fbd92cd627d in VecRestoreArrayRead
>> 
>> (/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0x28327d)
>>  #4 0x7fbd92d1189e in VecScatterBegin_SSToSS
>> 
>> (/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0x2be89e)
>>  #5 0x7fbd92d1a414 in VecScatterBegin
>> 
>> (/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0x2c7414)
>>  #6 0x7fbd934a999c in PCApply_FieldSplit
>> 
>> (/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0xa5699c)
>>  #7 0x7fbd93369071 in PCApply
>> 
>> (/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0x916071)
>>  #8 0x7fbd934efe77 in KSPInitialResidual
>> 
>> (/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0xa9ce77)
>>  #9 0x7fbd9350272c in KSPSolve_GMRES
>> 
>> (/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0xaaf72c)
>>  #10 0x7fbd934e3c01 in KSPSolve
>> 
>> (/opt/PETSc_library/petsc/manuel_OpenBLAS_petsc/lib/libpetsc.so.3.8+0xa90c01)
>>
>> Disabling Leak Sanitizer also outputs an "invalid pointer" error.
>>
>> Did I forget something when writing the code?
>>
>> Thank you,
>>
>> Manuel
>>
>> ---
>>


[petsc-users] Invite to Fluid Dynamics Software Infrastructure Workshop: April 12-13 at CU Boulder

2019-03-12 Thread Jed Brown via petsc-users
For PETSc users/developers interested in software and data
infrastructure for fluid dynamics:

We are excited to invite you to attend the second workshop to aid in the
conceptualization of FDSI, a potential NSF-sponsored Institute dedicated
to Fluid Dynamics Software Infrastructure. The workshop will be in
Boulder, CO and will start at 1:00 PM on Friday April 12th and continue
until 5:00 PM on April 13th.  Registration and further info about FDSI
is available in the links below.
 
Registration (free; travel support available)

  https://www.colorado.edu/events/cfdsi/fdsi-full-community-workshop
 
GitHub repo of May 2018 Kickoff Workshop:

  https://github.com/FDSI/Kickoff_Workshop
 
Community Needs Assessment based on 2018 Kickoff Workshop:

  https://www.colorado.edu/events/cfdsi/2018-kick-work/Summary
 
FDSI Discussion Forum:

  https://gitter.im/FDSI/community#
 
We also ask for your help in growing the community by either forwarding
this email to any of your colleagues with an interest in fluid dynamics
software or by nominating them for personal invites to this and future
FDSI workshops:

  https://www.colorado.edu/events/cfdsi/content/grow-fdsi
 
We hope to see you in Boulder and/or in online virtual discussions of
fluid dynamics software infrastructure.
 
Thanks,
 
FDSI Conceptualization Management Team


Re: [petsc-users] PetscScatterCreate type mismatch after update.

2019-03-12 Thread Jed Brown via petsc-users
Manuel Valera  writes:

> Ok i'll try that and let you know, for the time being i reverted to 3.9 to
> finish a paper, will update after that :)

3.10 will also work.


Re: [petsc-users] PetscScatterCreate type mismatch after update.

2019-03-12 Thread Jed Brown via petsc-users
Did you just update to 'master'?  See VecScatter changes:

https://www.mcs.anl.gov/petsc/documentation/changes/dev.html

Manuel Valera via petsc-users  writes:

> Hello,
>
> I just updated petsc from the repo to the latest master branch version, and
> a compilation problem popped up, it seems like the variable types are not
> being acknowledged properly, what i have in a minimum working example
> fashion is:
>
> #include <petsc/finclude/petscvec.h>
>> #include <petsc/finclude/petscdmda.h>
>> #include <petsc/finclude/petscdm.h>
>> #include <petsc/finclude/petscis.h>
>> #include <petsc/finclude/petscksp.h>
>> USE petscvec
>> USE petscdmda
>> USE petscdm
>> USE petscis
>> USE petscksp
>> IS :: ScalarIS
>> IS :: DummyIS
>> VecScatter :: LargerToSmaller,to0,from0
>> VecScatter :: SmallerToLarger
>> PetscInt, ALLOCATABLE  :: pScalarDA(:), pDummyDA(:)
>> PetscScalar:: rtol
>> Vec:: Vec1
>> Vec:: Vec2
>> ! Create index sets
>> allocate( pScalarDA(0:(gridx-1)*(gridy-1)*(gridz-1)-1) ,
>> pDummyDA(0:(gridx-1)*(gridy-1)*(gridz-1)-1) )
>> iter=0
>> do k=0,gridz-2
>> kplane = k*gridx*gridy
>> do j=0,gridy-2
>> do i=0,gridx-2
>> pScalarDA(iter) = kplane + j*(gridx) + i
>> iter = iter+1
>> enddo
>> enddo
>> enddo
>> pDummyDA = (/ (ind, ind=0,((gridx-1)*(gridy-1)*(gridz-1))-1) /)
>> call
>> ISCreateGeneral(PETSC_COMM_WORLD,(gridx-1)*(gridy-1)*(gridz-1), &
>>
>>  pScalarDA,PETSC_COPY_VALUES,ScalarIS,ierr)
>> call
>> ISCreateGeneral(PETSC_COMM_WORLD,(gridx-1)*(gridy-1)*(gridz-1), &
>>
>>  pDummyDA,PETSC_COPY_VALUES,DummyIS,ierr)
>> deallocate(pScalarDA,pDummyDA, STAT=ierr)
>> ! Create VecScatter contexts: LargerToSmaller & SmallerToLarger
>> call DMDACreateNaturalVector(daScalars,Vec1,ierr)
>> call DMDACreateNaturalVector(daDummy,Vec2,ierr)
>> call
>> VecScatterCreate(Vec1,ScalarIS,Vec2,DummyIS,LargerToSmaller,ierr)
>> call
>> VecScatterCreate(Vec2,DummyIS,Vec1,ScalarIS,SmallerToLarger,ierr)
>> call VecDestroy(Vec1,ierr)
>> call VecDestroy(Vec2,ierr)
>
>
> And the error i get is the part i cannot really understand:
>
> matrixobjs.f90:99.34:
>> call
>> VecScatterCreate(Vec1,ScalarIS,Vec2,DummyIS,LargerToSmaller,ie
>>  1
>> Error: Type mismatch in argument 'a' at (1); passed TYPE(tvec) to
>> INTEGER(4)
>> matrixobjs.f90:100.34:
>> call
>> VecScatterCreate(Vec2,DummyIS,Vec1,ScalarIS,SmallerToLarger,ie
>>  1
>> Error: Type mismatch in argument 'a' at (1); passed TYPE(tvec) to
>> INTEGER(4)
>> make[1]: *** [matrixobjs.o] Error 1
>> make[1]: Leaving directory `/usr/scratch/valera/ParGCCOM-Master/Src'
>> make: *** [gcmSeamount] Error 2
>
>
> What i find hard to understand is why/where my code is finding an integer
> type? as you can see from the MWE header the variables types look correct,
>
> Any help is appreaciated,
>
> Thanks,


Re: [petsc-users] Bad memory scaling with PETSc 3.10

2019-03-08 Thread Jed Brown via petsc-users
It may not address the memory issue, but can you build 3.10 with the
same options you used for 3.6?  It is currently a debugging build:

  ##
  ##
  #   WARNING!!!   #
  ##
  #   This code was compiled with a debugging option.  #
  #   To get timing results run ./configure#
  #   using --with-debugging=no, the performance will  #
  #   be generally two or three times faster.  #
  ##
  ##



Re: [petsc-users] MatCreate performance

2019-03-08 Thread Jed Brown via petsc-users
This is very unusual.  MatCreate() does no work, merely dup'ing a
communicator (or referencing an inner communicator if this is not the
first PetscObject on the provided communicator).  What size matrices are
you working with?  Can you send some performance data and (if feasible)
a reproducer?

Ale Foggia via petsc-users  writes:

> Hello all,
>
> I have a problem with the scaling of the MatCreate() function. I wrote a
> code to diagonalize sparse matrices and I'm running it in parallel. I've
> observed very bad speedup of the code, and it comes from the MatCreate
> part of it: for a fixed matrix size, when I increase the number of
> processes the time taken by the function also increases. I wanted to know
> if you expect this behavior or if maybe there's something wrong with my
> code. When I go to (what I consider) very big matrix sizes, and depending
> on the number of mpi processes, in some cases, MatCreate takes more time
> than the time the solver takes to solve the system for one eigenvalue or
> the time it takes to set up the values.
>
> Ale


Re: [petsc-users] Bad memory scaling with PETSc 3.10

2019-03-05 Thread Jed Brown via petsc-users
Myriam, in your first message, there was a significant (about 50%)
increase in memory consumption already on 4 cores.  Before attacking
scaling, it may be useful to trace memory usage for that base case.
Even better if you can reduce to one process.  Anyway, I would start by
running both cases with -log_view and looking at the memory summary.  I
would then use Massif (the memory profiler/tracer component in Valgrind)
to obtain stack traces for the large allocations.  Comparing those
traces should help narrow down which part of the code has significantly
different memory allocation behavior.  It might also point to the
unacceptable memory consumption under weak scaling, but it's something
we should try to fix.

If I had to guess, it may be in intermediate data structures for the
different PtAP algorithms in GAMG.  The option "-matptap_via scalable"
may be helpful.
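
As a rough sketch of the Massif workflow (the executable name and options are
placeholders; use whatever you normally run):

  valgrind --tool=massif ./ex42 <your usual options> -log_view -matptap_via scalable
  ms_print massif.out.<pid> | less

The ms_print output lists the dominant allocation stacks at the peak, which is
usually enough to see whether the growth comes from the PtAP intermediates or
from somewhere else.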

"Smith, Barry F. via petsc-users"  writes:

>Myriam,
>
> Sorry we have not been able to resolve this problem with memory scaling 
> yet.
>
> The best tool to determine the change in a code that results in large 
> differences in a program's run is git bisect. Basically you tell git bisect 
> the git commit of the code that is "good" and the git commit of the code that 
> is "bad" and it gives you additional git commits for you to check your code 
> on  each time telling git if it is "good" or "bad", eventually git bisect 
> tells you exactly the git commit that "broke" the code. No guess work, no 
> endless speculation. 
>
> The draw back is that you have to ./configure && make PETSc for each 
> "test" commit and then compile and run your code for that commit. I can 
> understand if you have to run your code on 10,000 processes to check if it is 
> "good" or "bad" that can be very daunting. But all I can suggest is to find a 
> problem size that is manageable and do the git bisect process (yeah it may 
> take several hours but that beats days of head banging).
>
>Good luck,
>
>Barry
>
>
>> On Mar 5, 2019, at 12:42 PM, Matthew Knepley via petsc-users 
>>  wrote:
>> 
>> On Tue, Mar 5, 2019 at 11:53 AM Myriam Peyrounette 
>>  wrote:
>> I used PCView to display the size of the linear system in each level of the 
>> MG. You'll find the outputs attached to this mail (zip file) for both the 
>> default threshold value and a value of 0.1, and for both 3.6 and 3.10 PETSc 
>> versions. 
>> 
>> For convenience, I summarized the information in a graph, also attached (png 
>> file).
>> 
>> 
>> Great! Can you draw lines for the different runs you did? My interpretation 
>> was that memory was increasing
>> as you did larger runs, and that you though that was coming from GAMG. That 
>> means the curves should
>> be pushed up for larger runs. Do you see that?
>> 
>>   Thanks,
>> 
>> Matt 
>> As you can see, there are slight differences between the two versions but 
>> none is critical, in my opinion. Do you see anything suspicious in the 
>> outputs?
>> 
>> + I can't find the default threshold value. Do you know where I can find it?
>> 
>> Thanks for the follow-up
>> 
>> Myriam
>> 
>> 
>> Le 03/05/19 à 14:06, Matthew Knepley a écrit :
>>> On Tue, Mar 5, 2019 at 7:14 AM Myriam Peyrounette 
>>>  wrote:
>>> Hi Matt,
>>> 
>>> I plotted the memory scalings using different threshold values. The two 
>>> scalings are slightly translated (from -22 to -88 mB) but this gain is 
>>> neglectable. The 3.6-scaling keeps being robust while the 3.10-scaling 
>>> deteriorates.
>>> 
>>> Do you have any other suggestion?
>>> 
>>> Mark, what is the option she can give to output all the GAMG data?
>>> 
>>> Also, run using -ksp_view. GAMG will report all the sizes of its grids, so 
>>> it should be easy to see
>>> if the coarse grid sizes are increasing, and also what the effect of the 
>>> threshold value is.
>>> 
>>>   Thanks,
>>> 
>>>  Matt 
>>> Thanks
>>> 
>>> Myriam 
>>> 
>>> Le 03/02/19 à 02:27, Matthew Knepley a écrit :
 On Fri, Mar 1, 2019 at 10:53 AM Myriam Peyrounette via petsc-users 
  wrote:
 Hi,
 
 I used to run my code with PETSc 3.6. Since I upgraded the PETSc version
 to 3.10, this code has a bad memory scaling.
 
 To report this issue, I took the PETSc script ex42.c and slightly
 modified it so that the KSP and PC configurations are the same as in my
 code. In particular, I use a "personalized" multi-grid method. The
 modifications are indicated by the keyword "TopBridge" in the attached
 scripts.
 
 To plot the memory (weak) scaling, I ran four calculations for each
 script with increasing problem sizes and computations cores:
 
 1. 100,000 elts on 4 cores
 2. 1 million elts on 40 cores
 3. 10 millions elts on 400 cores
 4. 100 millions elts on 4,000 cores
 
 The resulting graph is also attached. The scaling using PETSc 3.10
 clearly deteriorates for large cases, while the one using PETSc 3.6 is
 robust.
 
 After a few tests, I 

Re: [petsc-users] streams test on hpc

2019-03-05 Thread Jed Brown via petsc-users
Of course, just as you would run any other MPI application.
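
A minimal sketch of such a batch script (the PBS resource lines and the NPMAX
value are site-specific placeholders; it assumes PETSC_DIR and PETSC_ARCH are
set in the job environment):

  #!/bin/bash
  #PBS -N streams
  #PBS -l nodes=2:ppn=16
  #PBS -l walltime=00:10:00
  cd $PBS_O_WORKDIR
  # builds and runs the MPI streams benchmark shipped with PETSc
  make -C $PETSC_DIR streams NPMAX=32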

GangLu via petsc-users  writes:

> Hi all,
>
> When installing petsc, there is a stream test that is quite useful.
>
> Is it possible to run such test in batch mode, e.g. using pbs script?
>
> Thanks.
>
> cheers,
>
> Gang


Re: [petsc-users] Compute the sum of the absolute values of the off-block diagonal entries of each row

2019-03-04 Thread Jed Brown via petsc-users
"Zhang, Junchao via petsc-users"  writes:

> Perhaps PETSc should have a MatGetRemoteRow (or
> MatGetRowOffDiagonalBlock) (A, r, &ncols, &cols, &vals).  MatGetRow()
> internally has to allocate memory and sort indices and values from the
> local diagonal block and the off-diagonal block. That is a complete waste in
> this case -- users do not care about the column indices or the local block.
> With MatGetRemoteRow(A, r, &ncols, NULL, &vals), PETSc just needs to
> set an integer and a pointer.

I'm not wild about the resulting programming model, which is so
intimately tied to PETSc *AIJ storage conventions yet also likely not
efficient for operations like SOR.  Perhaps PETSc MatSOR should be
taught about a supplemental diagonal, such as produced by the "l^1"
scheme?


Re: [petsc-users] AddressSanitizer: attempting free on address which was not malloc()-ed

2019-03-03 Thread Jed Brown via petsc-users
The compiler doesn't know anything special about PetscFinalize.
Destructors are called after all executable statements in their scope.
So you need the extra scoping if the destructor should be called
earlier.

Yuyun Yang  writes:

> Oh interesting, so I need to add those extra braces around my class object 
> and function calls. I thought the destructor is automatically called at Finalize.
>
> Thanks!
> Yuyun
>
> -Original Message-
> From: Jed Brown  
> Sent: Sunday, March 3, 2019 2:19 PM
> To: Yuyun Yang ; Matthew Knepley 
> Cc: petsc-users@mcs.anl.gov
> Subject: Re: [petsc-users] AddressSanitizer: attempting free on address which 
> was not malloc()-ed
>
> If you run this with MPICH, it prints
>
>   Attempting to use an MPI routine after finalizing MPICH
>
> You need to ensure that the C++ class destructor is called before 
> PetscFinalize.  For example, like this:
>
> diff --git i/test_domain.cpp w/test_domain.cpp index 0cfe22f..23545f2 100644
> --- i/test_domain.cpp
> +++ w/test_domain.cpp
> @@ -8,11 +8,12 @@ int main(int argc, char **argv) {
>PetscErrorCode ierr = 0;
>PetscInitialize(, , NULL, NULL);
>  
> -  Domain d;
> +  {
> +Domain d;
>  
> -  ierr = d.setFields(); CHKERRQ(ierr);
> -  ierr = d.setScatters(); CHKERRQ(ierr);
> -  
> +ierr = d.setFields(); CHKERRQ(ierr);
> +ierr = d.setScatters(); CHKERRQ(ierr);
> +  }
>PetscFinalize();
>return ierr;
>  }
>
>
> Yuyun Yang via petsc-users  writes:
>
>> Yes, please see the attached files for a minimal example. Thanks a lot!
>>
>> Best regards,
>> Yuyun
>>
>> From: Matthew Knepley 
>> Sent: Sunday, March 3, 2019 12:46 PM
>> To: Yuyun Yang 
>> Cc: Zhang, Junchao ; petsc-users@mcs.anl.gov
>> Subject: Re: [petsc-users] AddressSanitizer: attempting free on 
>> address which was not malloc()-ed
>>
>> On Sun, Mar 3, 2019 at 3:05 PM Yuyun Yang 
>> mailto:yyan...@stanford.edu>> wrote:
>> Actually, I tried just creating a domain object (since the address sanitizer 
>> was complaining about that code to start with). Simply creating that object 
>> gave me a core dump, so I suppose the issue must be there. I got the 
>> following message when running the code with -objects_dump flag on the 
>> command line, but couldn’t find a problem with the code (I’ve attached it 
>> here with only the relevant functions).
>>
>> I think what we are going to need from you is a minimal example that 
>> has the error. I am guessing you have a logic bug in the C++, which we 
>> cannot debug by looking.
>>
>>   Thanks,
>>
>>  Matt
>>
>> Thanks a lot for your help!
>>
>> The following objects were never freed
>> -
>> [0] Vec seq y
>>   [0]  VecCreate() in 
>> /home/yyy910805/petsc/src/vec/vec/interface/veccreate.c
>> [0] Vec seq Vec_0x8400_0
>>   [0]  VecCreate() in 
>> /home/yyy910805/petsc/src/vec/vec/interface/veccreate.c
>> [0] Vec seq Vec_0x8400_1
>>   [0]  VecCreate() in 
>> /home/yyy910805/petsc/src/vec/vec/interface/veccreate.c
>> [0] VecScatter seq VecScatter_0x8400_2
>>   [0]  VecScatterCreate() in 
>> /home/yyy910805/petsc/src/vec/vscat/interface/vscreate.c
>> [0] VecScatter seq VecScatter_0x8400_3
>>   [0]  VecScatterCreate() in 
>> /home/yyy910805/petsc/src/vec/vscat/interface/vscreate.c
>> [0] VecScatter seq VecScatter_0x8400_4
>>   [0]  VecScatterCreate() in 
>> /home/yyy910805/petsc/src/vec/vscat/interface/vscreate.c
>> [0] VecScatter seq VecScatter_0x8400_5
>>   [0]  VecScatterCreate() in 
>> /home/yyy910805/petsc/src/vec/vscat/interface/vscreate.c
>> Attempting to use an MPI routine after finalizing MPICH
>>
>> --
>> --
>> -
>> From: Matthew Knepley mailto:knep...@gmail.com>>
>> Sent: Sunday, March 3, 2019 11:28 AM
>> To: Yuyun Yang mailto:yyan...@stanford.edu>>
>> Cc: Zhang, Junchao mailto:jczh...@mcs.anl.gov>>; 
>> petsc-users@mcs.anl.gov
>> Subject: Re: [petsc-users] AddressSanitizer: attempting free on 
>> address which was not malloc()-ed
>>
>> On Sun, Mar 3, 2019 at 1:03 PM Yuyun Yang via petsc-users 
>> mailto:petsc-users@mcs.anl.gov>> wrote:
>> I tried compiling without the sanitizer and running on valgrind. Got a bunch 
>> of errors “Uninitialised value was created by a stack allocation at 
>> 0x41B280: ComputeVel_qd::computeVel(double*, double, int&, int)”.
>>
>> There is no memory management code here, so other parts of the code must be 
>> relevant.
>>
>>   Thanks,
>>
>> Matt
>>
>>
>> HEAP SUMMARY:
>> ==74== in use at exit: 96,637 bytes in 91 blocks
>> ==74==   total heap usage: 47,774 allocs, 47,522 frees, 308,253,653 bytes 
>> allocated
>> LEAK SUMMARY:
>> ==74==definitely lost: 0 bytes in 0 blocks
>> ==74==indirectly lost: 0 bytes in 0 blocks
>> ==74==  possibly lost: 0 bytes in 0 blocks
>> ==74==still reachable: 96,637 bytes in 91 

Re: [petsc-users] AddressSanitizer: attempting free on address which was not malloc()-ed

2019-03-03 Thread Jed Brown via petsc-users
If you run this with MPICH, it prints

  Attempting to use an MPI routine after finalizing MPICH

You need to ensure that the C++ class destructor is called before
PetscFinalize.  For example, like this:

diff --git i/test_domain.cpp w/test_domain.cpp
index 0cfe22f..23545f2 100644
--- i/test_domain.cpp
+++ w/test_domain.cpp
@@ -8,11 +8,12 @@ int main(int argc, char **argv) {
   PetscErrorCode ierr = 0;
   PetscInitialize(, , NULL, NULL);
 
-  Domain d;
+  {
+Domain d;
 
-  ierr = d.setFields(); CHKERRQ(ierr);
-  ierr = d.setScatters(); CHKERRQ(ierr);
-  
+ierr = d.setFields(); CHKERRQ(ierr);
+ierr = d.setScatters(); CHKERRQ(ierr);
+  }
   PetscFinalize();
   return ierr;
 }


Yuyun Yang via petsc-users  writes:

> Yes, please see the attached files for a minimal example. Thanks a lot!
>
> Best regards,
> Yuyun
>
> From: Matthew Knepley 
> Sent: Sunday, March 3, 2019 12:46 PM
> To: Yuyun Yang 
> Cc: Zhang, Junchao ; petsc-users@mcs.anl.gov
> Subject: Re: [petsc-users] AddressSanitizer: attempting free on address which 
> was not malloc()-ed
>
> On Sun, Mar 3, 2019 at 3:05 PM Yuyun Yang 
> mailto:yyan...@stanford.edu>> wrote:
> Actually, I tried just creating a domain object (since the address sanitizer 
> was complaining about that code to start with). Simply creating that object 
> gave me a core dump, so I suppose the issue must be there. I got the 
> following message when running the code with -objects_dump flag on the 
> command line, but couldn’t find a problem with the code (I’ve attached it 
> here with only the relevant functions).
>
> I think what we are going to need from you is a minimal example that has the 
> error. I am guessing you have a logic
> bug in the C++, which we cannot debug by looking.
>
>   Thanks,
>
>  Matt
>
> Thanks a lot for your help!
>
> The following objects were never freed
> -
> [0] Vec seq y
>   [0]  VecCreate() in 
> /home/yyy910805/petsc/src/vec/vec/interface/veccreate.c
> [0] Vec seq Vec_0x8400_0
>   [0]  VecCreate() in 
> /home/yyy910805/petsc/src/vec/vec/interface/veccreate.c
> [0] Vec seq Vec_0x8400_1
>   [0]  VecCreate() in 
> /home/yyy910805/petsc/src/vec/vec/interface/veccreate.c
> [0] VecScatter seq VecScatter_0x8400_2
>   [0]  VecScatterCreate() in 
> /home/yyy910805/petsc/src/vec/vscat/interface/vscreate.c
> [0] VecScatter seq VecScatter_0x8400_3
>   [0]  VecScatterCreate() in 
> /home/yyy910805/petsc/src/vec/vscat/interface/vscreate.c
> [0] VecScatter seq VecScatter_0x8400_4
>   [0]  VecScatterCreate() in 
> /home/yyy910805/petsc/src/vec/vscat/interface/vscreate.c
> [0] VecScatter seq VecScatter_0x8400_5
>   [0]  VecScatterCreate() in 
> /home/yyy910805/petsc/src/vec/vscat/interface/vscreate.c
> Attempting to use an MPI routine after finalizing MPICH
>
> -
> From: Matthew Knepley mailto:knep...@gmail.com>>
> Sent: Sunday, March 3, 2019 11:28 AM
> To: Yuyun Yang mailto:yyan...@stanford.edu>>
> Cc: Zhang, Junchao mailto:jczh...@mcs.anl.gov>>; 
> petsc-users@mcs.anl.gov
> Subject: Re: [petsc-users] AddressSanitizer: attempting free on address which 
> was not malloc()-ed
>
> On Sun, Mar 3, 2019 at 1:03 PM Yuyun Yang via petsc-users 
> mailto:petsc-users@mcs.anl.gov>> wrote:
> I tried compiling without the sanitizer and running on valgrind. Got a bunch 
> of errors “Uninitialised value was created by a stack allocation at 0x41B280: 
> ComputeVel_qd::computeVel(double*, double, int&, int)”.
>
> There is no memory management code here, so other parts of the code must be 
> relevant.
>
>   Thanks,
>
> Matt
>
>
> HEAP SUMMARY:
> ==74== in use at exit: 96,637 bytes in 91 blocks
> ==74==   total heap usage: 47,774 allocs, 47,522 frees, 308,253,653 bytes 
> allocated
> LEAK SUMMARY:
> ==74==definitely lost: 0 bytes in 0 blocks
> ==74==indirectly lost: 0 bytes in 0 blocks
> ==74==  possibly lost: 0 bytes in 0 blocks
> ==74==still reachable: 96,637 bytes in 91 blocks
> ==74== suppressed: 0 bytes in 0 blocks
>
> The error is located in the attached code (I’ve extracted only the relevant 
> functions), but I couldn’t figure out what is wrong. Is this causing the 
> memory corruption/double free error that happens when I execute the code?
>
> Thanks a lot for your help.
>
> Best regards,
> Yuyun
>
> From: Zhang, Junchao mailto:jczh...@mcs.anl.gov>>
> Sent: Friday, March 1, 2019 7:36 AM
> To: Yuyun Yang mailto:yyan...@stanford.edu>>
> Subject: Re: [petsc-users] AddressSanitizer: attempting free on address which 
> was not malloc()-ed
>
>
> On Fri, Mar 1, 2019 at 1:02 AM Yuyun Yang 
> mailto:yyan...@stanford.edu>> wrote:
> Actually, I also saw a line at the beginning of valgrind saying "shadow 
> memory range interleaves with an existing memory mapping. ASan cannot 

Re: [petsc-users] Kronecker product

2019-02-25 Thread Jed Brown via petsc-users
MatCreateMAIJ does that (implicitly).

https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatCreateMAIJ.html

If you want a Kronecker product with a non-identity matrix, this PR may
be of interest.

https://bitbucket.org/petsc/petsc/pull-requests/1334/rmills-mat-kaij/diff
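
A minimal sketch of the identity case (A is any assembled AIJ matrix; dof, x,
and y are illustrative names; x and y must have dof times the length of A's
vectors, with components interleaved):

  Mat K;  /* applies A to each of the dof interleaved components, i.e. a
             Kronecker product with the identity (up to ordering convention) */
  ierr = MatCreateMAIJ(A,dof,&K);CHKERRQ(ierr);
  ierr = MatMult(K,x,y);CHKERRQ(ierr);
  ierr = MatDestroy(&K);CHKERRQ(ierr);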

Yuyun Yang via petsc-users  writes:

> Hello team,
>
> I'd like to ask whether PETSc has a function to compute the Kronecker product 
> of a sparse matrix with an identity matrix? A Google search didn't lead me to 
> a manual page (like most of the other PETSc functions), so I'm wondering if 
> this has been implemented yet.
>
> Thanks very much!
>
> Best,
> Yuyun


Re: [petsc-users] Using DMCOMPOSITE with TS

2019-02-23 Thread Jed Brown via petsc-users
It shouldn't have any affect.  This will need to be debugged.  There's
no chance I'll have time for at least a week; hopefully one of the other
TS contributors can look sooner.

"Ellen M. Price via petsc-users"  writes:

> Quick update: I found that changing TS_EXACTFINALTIME_INTERPOLATE to
> TS_EXACTFINALTIME_MATCHSTEP fixes the problem; is there a reason I can't
> use the interpolation when using a DMCOMPOSITE?
>
> Ellen Price
>
>
> On 2/20/19 1:42 PM, Ellen Price wrote:
>> Hi fellow PETSc users!
>> 
>> I am attempting to use a DMCOMPOSITE alongside TS and have run into some
>> trouble. I'm attaching a MWE demonstrating the problem. The goal is to
>> combine a DMDA3d for spatial data and a DMREDUNDANT for non-spatial,
>> time-dependent fields. In the attached example, this additional field is
>> temperature.
>> 
>> The DMDA data is currently just temperature-dependent, and the
>> temperature is supposed to increase linearly. Unfortunately, no matter
>> how I integrate (explicitly, using Euler or RK, or implicitly, using
>> backward Euler or BDF), I get the wrong final temperature. The
>> integration proceeds for 100 timesteps and then stops (verified by
>> -ts_monitor). At a heating rate of 1, and an initial temperature of 150,
>> I should get a final temperature of 250 very easily. However, I get
>> something closer to 200 (not exact, which makes me think this isn't a
>> simple missing-a-factor-of-2 error).
>> 
>> I'm currently using TSSetDM with the composite DM to let PETSc compute
>> the Jacobian. Having checked all the inputs and outputs as well as I
>> can, this is the only step that seems like it may be causing a problem;
>> yet even explicit integration doesn't work, and that shouldn't need the
>> Jacobian at all. I'm at a loss.
>> 
>> Run using:
>> ./mwe -implicit yes -ts_type beuler -ts_monitor OR ./mwe -implicit no
>> -ts_type euler -ts_monitor
>> 
>> Thanks in advance for any help,
>> Ellen Price


Re: [petsc-users] PCMGSetGalerkin() new inputs

2019-02-20 Thread Jed Brown via petsc-users
This wasn't explained well in the commit message.  The old code used the
Galerkin procedure on the "Pmat" (preconditioning matrix; which may or
may not be the same as the Amat) and set the result as both Amat and
Pmat of the coarse grid.  The new code allows you to specify.  If your
Amat and Pmat are the same on the finest level, then BOTH will offer the
same behavior as before.
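
So, assuming your Amat and Pmat coincide on the finest level, the 3.6-style
call PCMGSetGalerkin(pc,PETSC_TRUE) translates to:

  ierr = PCMGSetGalerkin(pc,PC_MG_GALERKIN_BOTH);CHKERRQ(ierr);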

Myriam Peyrounette via petsc-users  writes:

> Hi,
>
> I am currently comparing two codes based on PETSc. The first one uses
> PETSc 3.6.4 and the other one PETSc 3.10.2.
>
> I am having a look at the use of the function PCMGSetGalerkin(). With
> PETSc 3.6, the input is a boolean, while it is either
> PC_MG_GALERKIN_MAT, PC_MG_GALERKIN_PMAT or PC_MG_GALERKIN_BOTH with PETSc 
> 3.10.
>
> Which one of these new inputs should I use to have the same
> configuration as with PETSc 3.6?
>
> Thanks in advance,
>
> -- 
> Myriam Peyrounette
> CNRS/IDRIS - HLST
> --


Re: [petsc-users] saving results

2019-02-18 Thread Jed Brown via petsc-users
What kind of problems are you solving?  Running for a Krylov method for
tens of thousands of iterations is very rarely recommended.

Regarding storage, it's significantly more expensive to store the Krylov
basis (even when it's a recurrence) than the current approximation.
Some methods require some work to compute the current approximation.
Anyway, you can use a KSPMonitor to write checkpoints at whatever
interval you like.  In case of a crash, use VecLoad() to read in the
checkpoint file and use KSPSetInitialGuessNonzero() to make it be used
as an initial guess.
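
A minimal sketch of such a monitor (the interval and file name are arbitrary;
register it with KSPMonitorSet(ksp,CheckpointMonitor,NULL,NULL)):

  static PetscErrorCode CheckpointMonitor(KSP ksp,PetscInt it,PetscReal rnorm,void *ctx)
  {
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    if (it && it % 1000 == 0) {   /* write a checkpoint every 1000 iterations */
      Vec         x;
      PetscViewer viewer;
      ierr = KSPBuildSolution(ksp,NULL,&x);CHKERRQ(ierr);  /* current approximation */
      ierr = PetscViewerBinaryOpen(PetscObjectComm((PetscObject)ksp),"checkpoint.bin",FILE_MODE_WRITE,&viewer);CHKERRQ(ierr);
      ierr = VecView(x,viewer);CHKERRQ(ierr);
      ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
    }
    PetscFunctionReturn(0);
  }

To restart, VecLoad() the checkpoint into the solution vector, call
KSPSetInitialGuessNonzero(ksp,PETSC_TRUE), and call KSPSolve() again.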

Sal Am  writes:

> This is how I run it:
>
> -ksp_type bcgs -pc_type gamg -mattransposematmult_via scalable
> -build_twosided allreduce -ksp_monitor_true_residual
> -ksp_monitor_if_not_converged -log_view -ksp_max_it 10 -ksp_rtol 1.0e-7
>
> so BCGS with GAMG as the preconditioner. I am guessing writing at every
> timestep would be expensive, maybe every hour? I am not sure what would be
> a good number here if the simulation lasts more than a day.
>
>
>
> On Mon, Feb 18, 2019 at 4:10 PM Jed Brown  wrote:
>
>> What kind of solver are you using and how often do you want to write?
>>
>> Sal Am via petsc-users  writes:
>>
>> > Is there a function/command line option to save the solution as it is
>> > solving (and read in the file from where it crashed and keep iterating
>> from
>> > there perhaps)?
>> > Had a seg fault and all the results until that point seems to have been
>> > lost.
>>


Re: [petsc-users] saving results

2019-02-18 Thread Jed Brown via petsc-users
What kind of solver are you using and how often do you want to write?

Sal Am via petsc-users  writes:

> Is there a function/command line option to save the solution as it is
> solving (and read in the file from where it crashed and keep iterating from
> there perhaps)?
> Had a seg fault and all the results until that point seems to have been
> lost.


Re: [petsc-users] Problem in loading Matrix Market format

2019-02-12 Thread Jed Brown via petsc-users
We should make the (two line) functionality a command-line feature of
PetscBinaryIO.py.  Then a user could do

  python -m PetscBinaryIO matrix.mm matrix.petsc
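
In the meantime, the conversion itself is short enough to sketch in C; this
assumes a real-valued, general (non-symmetric) coordinate-format .mtx file
with 1-based indices, and does only crude preallocation:

  #include <petscmat.h>

  int main(int argc,char **argv)
  {
    Mat            A;
    PetscViewer    viewer;
    FILE           *fp;
    char           line[512];
    int            m,n,nz,i,j,k;
    double         v;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
    if (argc != 3) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_ARG_WRONG,"Usage: mm2petsc input.mtx output.petsc");
    fp = fopen(argv[1],"r");
    if (!fp) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_FILE_OPEN,"Cannot open %s",argv[1]);
    do {                                  /* skip the banner and '%' comment lines */
      if (!fgets(line,sizeof(line),fp)) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_FILE_READ,"Premature end of file");
    } while (line[0] == '%');
    sscanf(line,"%d %d %d",&m,&n,&nz);    /* size line: rows cols nonzeros */

    ierr = MatCreateSeqAIJ(PETSC_COMM_SELF,m,n,nz/m+1,NULL,&A);CHKERRQ(ierr);        /* average-row preallocation */
    ierr = MatSetOption(A,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr); /* allow extra mallocs for uneven rows */
    for (k=0; k<nz; k++) {                /* coordinate entries, 1-based indices */
      if (fscanf(fp,"%d %d %lf",&i,&j,&v) != 3) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_FILE_READ,"Badly formatted entry");
      ierr = MatSetValue(A,i-1,j-1,(PetscScalar)v,INSERT_VALUES);CHKERRQ(ierr);
    }
    fclose(fp);
    ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

    ierr = PetscViewerBinaryOpen(PETSC_COMM_SELF,argv[2],FILE_MODE_WRITE,&viewer);CHKERRQ(ierr);
    ierr = MatView(A,viewer);CHKERRQ(ierr);
    ierr = PetscViewerDestroy(&viewer);CHKERRQ(ierr);
    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }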

Matthew Knepley via petsc-users  writes:

> It definitely should not be there under 'datafiles'. We should put it in an
> example, as Junchao graciously agreed to do.
>
>   Thanks,
>
> Matt
>
> On Tue, Feb 12, 2019 at 11:15 AM Zhang, Hong  wrote:
>
>> We have /home/petsc/datafiles/matrices/MtxMarket/mm2petsc.c
>> Hong
>>
>> On Tue, Feb 12, 2019 at 9:52 AM Zhang, Junchao via petsc-users <
>> petsc-users@mcs.anl.gov> wrote:
>>
>>> Sure.
>>> --Junchao Zhang
>>>
>>>
>>> On Tue, Feb 12, 2019 at 9:47 AM Matthew Knepley 
>>> wrote:
>>>
 Hi Junchao,

 Could you fix the MM example in PETSc to have this full support? That
 way we will always have it.

  Thanks,

 Matt

 On Tue, Feb 12, 2019 at 10:27 AM Zhang, Junchao via petsc-users <
 petsc-users@mcs.anl.gov> wrote:

> Eda,
>   I have a code that can read in Matrix Market and write out PETSc
> binary files.  Usage:  mpirun -n 1 ./mm2petsc -fin <matrix market file> -fout
> <petsc binary file>.  You can have a try.
> --Junchao Zhang
>
>
> On Tue, Feb 12, 2019 at 1:50 AM Eda Oktay via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
>> Hello,
>>
>> I am trying to load a matrix in Matrix Market format. I found an example
>> for Mat files (ex78) which can be tested using a .dat file. Since .dat files
>> and .mtx files are similar in structure (especially afiro_A.dat, which is
>> similar to amesos2_test_mat0.mtx since they both have 3 columns and the
>> columns represent the same properties), I tried to run ex78 using
>> amesos2_test_mat0.mtx instead of afiro_A.dat. However, I got the error
>> "Badly formatted input file". Here is the full error message:
>>
>> [0]PETSC ERROR: - Error Message
>> --
>> [0]PETSC ERROR: Badly formatted input file
>>
>> [0]PETSC ERROR: See
>> http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble
>> shooting.
>> [0]PETSC ERROR: Petsc Release Version 3.10.3, Dec, 18, 2018
>> [0]PETSC ERROR: ./ex78 on a arch-linux2-c-debug named
>> 7330.wls.metu.edu.tr by edaoktay Tue Feb 12 10:47:58 2019
>> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++
>> --with-fc=gfortran --with-cxx-dialect=C++11 --download-openblas
>> --download-metis --download-parmetis --download-superlu_dist
>> --download-slepc --download-mpich
>> [0]PETSC ERROR: #1 main() line 73 in
>> /home/edaoktay/petsc-3.10.3/src/mat/examples/tests/ex78.c
>> [0]PETSC ERROR: PETSc Option Table entries:
>> [0]PETSC ERROR: -Ain
>> /home/edaoktay/petsc-3.10.3/share/petsc/datafiles/matrices/amesos2_test_mat0.mtx
>> [0]PETSC ERROR: End of Error Message ---send
>> entire error message to petsc-ma...@mcs.anl.gov--
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>> [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=1
>> :
>> system msg for write_line failure : Bad file descriptor
>>
>> I know there is also an example (ex72) for the Matrix Market format, but its
>> description says it only handles symmetric, lower-triangular input, so I
>> decided to use ex78.
>>
>> Best regards,
>>
>> Eda
>>
>

 --
 What most experimenters take for granted before they begin their
 experiments is infinitely more interesting than any results to which their
 experiments lead.
 -- Norbert Wiener

 https://www.cse.buffalo.edu/~knepley/
 

>>>
>
> -- 
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] Preconditioning systems of equations with complex numbers

2019-02-11 Thread Jed Brown via petsc-users
Justin Chang via petsc-users  writes:

> So I used -mat_view draw -draw_pause -1 on my medium sized matrix and got
> this output:
>
> [image: 1MPI.png]
>
> So it seems there are lots of off-diagonal terms, and that a decomposition
> of the problem via MatLoad would give a terribly unbalanced problem.
>
> Given the initial A and b Mat/Vec, I experimented with MatPartioning and
> inserted the following lines into my code:
>
> Mat   Apart;
> Vec   bpart;
> MatPartitioning   part;
> IS    is,isrows;
> ierr = MatPartitioningCreate(PETSC_COMM_WORLD, &part);CHKERRQ(ierr);
> ierr = MatPartitioningSetAdjacency(part, A);CHKERRQ(ierr);
> ierr = MatPartitioningSetFromOptions(part);CHKERRQ(ierr);
> ierr = MatPartitioningApply(part, &is);CHKERRQ(ierr);
> ierr = ISBuildTwoSided(is,NULL,&isrows);CHKERRQ(ierr);
> ierr = MatCreateSubMatrix(A, isrows,isrows, MAT_INITIAL_MATRIX,
> &Apart);CHKERRQ(ierr);
> ierr = MatSetOptionsPrefix(Apart, "part_");CHKERRQ(ierr);
> ierr = MatSetFromOptions(Apart);CHKERRQ(ierr);
> ierr = VecGetSubVector(b,isrows,&bpart);CHKERRQ(ierr);
>
> /* Set Apart and bpart in the KSPSolve */
> ...
>
> And here are the mat_draw figures from 2 and 4 MPI processes respectively:
>
> [image: 2MPI.png][image: 4MPI.png]
>
> Is this "right"? It feels like I'm just duplicating the nnz structure
> among all the MPI processes. And it didn't really improve the performance
> of ASM.

ASM might not be an appropriate preconditioner (or it might need a
special sort of overlap for stability of the local problems).  The edge
cuts look relatively small, so it doesn't look to me like power law or
social network problems that don't admit vertex partitions with low edge
cut.

We really have to understand the spectrum to comment further on fast
solvers.


Re: [petsc-users] About the value of the PETSC_SMALL

2019-02-11 Thread Jed Brown via petsc-users
You're probably looking for PETSC_MACHINE_EPSILON.
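
A quick way to see both values on your build (a complete toy program; the
output depends on the precision PETSc was configured with):

  #include <petscsys.h>

  int main(int argc,char **argv)
  {
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
    ierr = PetscPrintf(PETSC_COMM_WORLD,"PETSC_SMALL           = %g\n",(double)PETSC_SMALL);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_WORLD,"PETSC_MACHINE_EPSILON = %g\n",(double)PETSC_MACHINE_EPSILON);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }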

ztdepyahoo via petsc-users  writes:

> Dear sir:
> I output the value of "PETSC_SMALL" and it is 1E-10.  But I think it
> should be much smaller than this for a double-precision floating-point number.
> Regards


Re: [petsc-users] Meaning of PETSc error code 77

2019-02-08 Thread Jed Brown via petsc-users
Can you run in a debugger to get a stack trace?  I believe Dolfin
swallows the stack trace to obstruct efforts to remove bugs; probably
paid off by the bug lobby.
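
If PETSc's own error handler is still reached underneath DOLFIN, its debugger
hooks may help; a sketch (your_script.py is a placeholder, and the option can
also go on the command line if arguments are forwarded to PetscInitialize):

  export PETSC_OPTIONS="-on_error_attach_debugger"
  mpirun -n 4 python your_script.py

Otherwise, attaching gdb manually to one of the ranks and setting a breakpoint
on PetscError gives the same information.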

aditya kumar via petsc-users  writes:

> Hello,
>
> I am using PETSc with FEniCS project libraries to solve a nonlinear
> problem. I am using PETSc Krylov solver with the following configuration
>
> pc = PETScPreconditioner("petsc_amg")
> PETScOptions.set("mg_levels_ksp_type", "chebyshev")
> PETScOptions.set("mg_levels_pc_type", "jacobi")
> PETScOptions.set("mg_levels_esteig_ksp_type", "cg")
> PETScOptions.set("mg_levels_ksp_chebyshev_esteig_steps", 50) solver_u =
> PETScKrylovSolver("cg", pc)
>
> I am encountering this error sometimes during the solution of the linear
> system:
> Error: Unable to successfully call PETSc function 'KSPSolve'. *** Reason:
> PETSc error code is: 77 (Petsc has generated inconsistent data). *** Where:
> This error was encountered inside
> /tmp/dolfin/dolfin/la/PETScKrylovSolver.cpp.
>
> *** Process: 130
>
> Can anybody help me in understanding the possible reasons for this error
> code? Or refer me to any available documentation? It will be really
> helpful.
>
> Also, I will like to know if there is a way to prevent the code from
> stopping due to such PETSc errors, so that I can tweak some parameters to
> hopefully make it progress.
>
> Thanks and Regards,
> Aditya


Re: [petsc-users] [TimeStepping] Eventhandler

2019-02-05 Thread Jed Brown via petsc-users
Dolfin will need this PR to work with any PETSc 3.10.

https://bitbucket.org/fenics-project/dolfin/pull-requests/508/jed-petsc-310/diff

It's been chillin' there for a couple months; it appears that much of
the Dolfin development effort has moved to DolfinX and Firedrake.

"Huck, Moritz"  writes:

> I am using the FEniCS library for my PDE discretization.
> I cannot compile FEniCS with PETSc 3.10.3.
> I will locate the exact error tomorrow.
>
> 
> Von: Jed Brown 
> Gesendet: Dienstag, 5. Februar 2019 15:30:38
> An: Huck, Moritz; Smith, Barry F.; Abhyankar, Shrirang G
> Cc: PETSc users list
> Betreff: Re: [petsc-users] [TimeStepping] Eventhandler
>
> "Huck, Moritz via petsc-users"  writes:
>
>> @Shri
>> The system is very stiff, but the stiffness is handled well by ARKIMEX.
>>
>> I'am using PETSc 3.10. (I cannot use 3.10.3 at the moment due to
>> compatibilty with a third library),
>
> What compatibility problem is this?  3.10.3 should be (binary and
> source) backward compatible with other 3.10 releases, so if there is a
> case where it is not, we need to understand it.
>
>
> Regarding the event, it seems there are scenarios where something
> important changes in the model and a conservative time step is
> desirable, and other scenarios where the dynamics after the event are on
> the same scale as before so it makes sense to continue with the same
> step sizes.


Re: [petsc-users] Store type (Eigen::Vector2d) in a petsc vec

2019-02-05 Thread Jed Brown via petsc-users
Andrew Parker  writes:

> Thanks, so would you suggest a flat vector storing u, v, w (or indeed x, y,
> z), or interleaved storage with Eigen types constructed on the fly?  

Interleaved if you want to use Eigen types in the same memory, or if
your code (like most applications) benefits more from memory locality
than from vectorization.  You can cast because Matrix<double,2,1> has
the same storage representation as

struct {
  double values[2];
};

> Can I ask, is that because Vec cannot store user-defined types (as in
> it's not templatable?)

PETSc does not use templates, but you wouldn't want it to in this
circumstance.  Either your types have the same storage representation
and you can cast or they don't and templating would ruin what are now
contiguous operations.


Re: [petsc-users] Installing PETSc

2019-02-05 Thread Jed Brown via petsc-users
Fazlul Huq via petsc-users  writes:

> Hello PETSc Developers,
>
> Maybe this is a trivial question!
>
> I usually run PETSc code from the Home/petsc-3.10.2 directory. The other day I
> tried to run the code from the Documents/petsc directory but I couldn't. As far
> as I can recall, I installed PETSc in the Home directory. Is that the reason why
> I can't run PETSc code from another directory? Should I install PETSc in the
> root directory?

What do you mean by "run PETSc"?  How did you link your executable?

> Again, if I run command "which petsc" I don't get any echo on the terminal.

That is searching for an executable named "petsc" in your PATH.

  echo $PATH

to see the value of that variable.


Re: [petsc-users] [TimeStepping] Eventhandler

2019-02-05 Thread Jed Brown via petsc-users
"Huck, Moritz via petsc-users"  writes:

> @Shri 
> The system is very stiff, but the stiffness is handled well by ARKIMEX.
>
> I'm using PETSc 3.10. (I cannot use 3.10.3 at the moment due to
> compatibility with a third-party library), 

What compatibility problem is this?  3.10.3 should be (binary and
source) backward compatible with other 3.10 releases, so if there is a
case where it is not, we need to understand it.


Regarding the event, it seems there are scenarios where something
important changes in the model and a conservative time step is
desirable, and other scenarios where the dynamics after the event are on
the same scale as before so it makes sense to continue with the same
step sizes.


Re: [petsc-users] Store type (Eigen::Vector2d) in a petsc vec

2019-02-05 Thread Jed Brown via petsc-users
My suggestion is to use PETSc like usual and inside your
residual/Jacobian evaluation, for each cell or batch of cells, create
Eigen objects.  For size 2d or 3d, it won't matter much whether you make
them share memory with the PETSc Vec -- the Eigen types should mostly
exist in registers.
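
A minimal sketch of that pattern, assuming a real (PetscScalar == double)
build and interleaved storage u0 v0 u1 v1 ...; the names X, ncells, and
FormSomething are illustrative:

  #include <Eigen/Dense>
  #include <petscvec.h>

  PetscErrorCode FormSomething(Vec X,PetscInt ncells)
  {
    const PetscScalar *x;
    PetscErrorCode    ierr;

    PetscFunctionBeginUser;
    ierr = VecGetArrayRead(X,&x);CHKERRQ(ierr);
    for (PetscInt c=0; c<ncells; c++) {
      Eigen::Map<const Eigen::Vector2d> u(&x[2*c]); /* a view into the Vec storage, no copy */
      /* ... small dense algebra with u, e.g. u.norm(), goes here ... */
    }
    ierr = VecRestoreArrayRead(X,&x);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }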

Andrew Parker via petsc-users  writes:

> Hi,
>
> I have quite a bit of working code that uses eigen vector2d/3d (and some
> corresponding matrix3d types) to allow for local linAlg.  I would like to
> store these within a petsc vector. So for example to have a vector for all
> cells of velocity-vectors in 2d held for each cell within an eigen
> vector2d, within a Petsc Vec for all cells.  What would be the best way to
> achieve this within petsc? I would like to make full use of petsc
> capabilities including parallel (ghost updating, partitioning etc),
> likewise those of eigen's for local linAlg on small fixed sized vectors, so
> how best to achieve this to allow for both eco-systems?
> Thanks,
> Andy


Re: [petsc-users] Preconditioning systems of equations with complex numbers

2019-02-04 Thread Jed Brown via petsc-users
Matthew Knepley via petsc-users  writes:

>> 2) I tried all the suggestions mentioned before: setting
>> -pc_gamg_agg_nsmooths 0  -pc_gamg_square_graph 10 did not improve my
>> convergence. Neither did explicitly setting -mg_coarse_pc_type lu or more
>> iterations of richardson/sor.
>>
>
> 1) Can we see some convergence for these? LU HAS to be better than the
> alternative, but you are saying that convergence still
> stalls above that I think. Okay, but I do not understand how SOR was
> working on the smaller problem, and cannot solve the coarse
> problem. However, it could be if the interpolation is crap, which it sounds
> like is true here.
>
> Jed, does that makes sense? If so, I would think about doing some bootstrap
> stuff.

Justin, can you compute and plot the spectrum of a small problem (<1000
dofs) as well as the eigenvectors of the extremal eigenvalues?


Re: [petsc-users] Preconditioning systems of equations with complex numbers

2019-01-31 Thread Jed Brown via petsc-users
Justin Chang via petsc-users  writes:

> Here's IMHO the simplest explanation of the equations I'm trying to solve:
>
> http://home.eng.iastate.edu/~jdm/ee458_2011/PowerFlowEquations.pdf
>
> Right now we're just trying to solve eq(5) (in section 1), inverting the
> linear Y-bus matrix. Eventually we have to be able to solve equations like
> those in the next section.

It's always useful to pick a smallish problem and compute the spectrum
of the operator and plot the eigenvectors associated with interesting
eigenvalues (especially those near zero and those with largest
magnitude).  Eq 5 looks like a graph Laplacian.
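
For a problem that small, PETSc can compute the full spectrum of the
(preconditioned) operator directly; a sketch, assuming the ksp already has its
operators set and KSPSetUp() has been called, with n the global size (re and
im are illustrative names):

  PetscReal *re,*im;
  PetscInt  n,i;
  ierr = MatGetSize(A,&n,NULL);CHKERRQ(ierr);
  ierr = PetscMalloc2(n,&re,n,&im);CHKERRQ(ierr);
  ierr = KSPComputeEigenvaluesExplicitly(ksp,n,re,im);CHKERRQ(ierr); /* dense LAPACK eigensolve; only for small n */
  for (i=0; i<n; i++) {
    ierr = PetscPrintf(PETSC_COMM_WORLD,"%g %g\n",(double)re[i],(double)im[i]);CHKERRQ(ierr);
  }
  ierr = PetscFree2(re,im);CHKERRQ(ierr);

Scatter-plotting the two columns (and looking at eigenvectors of the extremal
eigenvalues with a dense eigensolver of your choice) is usually enough to see
what the preconditioner is struggling with.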

