Re: [petsc-users] Correct way to set/track global numberings in DMPlex?

2024-04-03 Thread Jed Brown




Matthew Knepley  writes:

>> I'm developing routines that will read/write CGNS files to DMPlex and vice
>> versa.
>> One of the recurring challenges is the bookkeeping of global numbering for
>> vertices and cells.
>> Currently, I am restricting my support to single Zone CGNS files, in which
>> the file provides global numbers for vertices and cells.
>>
>
> I thought Jed had put in parallel CGNS loading. If so, maybe you can
> transition to that. If not, we should get your stuff integrated.

We did implement this, but it's still in a branch. It's been on my plate for a while and I thought I would get to clean up that branch for merging this week, but that will have to wait for next week. (The current hang-up is mediating globally consistent edge orientation for isoperiodic meshes, as needed for cubic elements.)

How are you handling boundary conditions so far? Do you support periodicity?



Re: [petsc-users] using custom matrix vector multiplication

2024-03-28 Thread Jed Brown




Interfaces like KSPSetOperators (https://petsc.org/main/manualpages/KSP/KSPSetOperators/) have Amat and Pmat arguments. The Amat is used as the Krylov operator and the Pmat is used for preconditioning.

In a general sense, Pmat is just whatever data is to be used by the preconditioner, though for algebraic preconditioners, it's usually regarded as an assembled matrix. If you're not using PCShell, then you need to think about what the preconditioner needs (conceptually or check the code).
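As a concrete illustration (a minimal sketch, not code from this thread; MyMatMult and the sizes are placeholders), the pattern for Raju's case is a shell matrix for the Krylov operator and an assembled matrix used only to build the preconditioner:

#include <petscksp.h>

/* Hypothetical user routine: applies y = A*x matrix-free, e.g. by finite differencing */
extern PetscErrorCode MyMatMult(Mat A, Vec x, Vec y);

PetscErrorCode SetShellOperator(KSP ksp, Mat Pmat, PetscInt nlocal, PetscInt nglobal, void *userctx)
{
  Mat Amat;

  PetscFunctionBeginUser;
  /* Amat: shell matrix whose action is the custom matrix-vector product */
  PetscCall(MatCreateShell(PetscObjectComm((PetscObject)ksp), nlocal, nlocal, nglobal, nglobal, userctx, &Amat));
  PetscCall(MatShellSetOperation(Amat, MATOP_MULT, (void (*)(void))MyMatMult));
  /* Pmat: the assembled matrix, used only by the preconditioner */
  PetscCall(KSPSetOperators(ksp, Amat, Pmat));
  PetscCall(MatDestroy(&Amat)); /* the KSP keeps its own reference */
  PetscFunctionReturn(PETSC_SUCCESS);
}

The Krylov iterations then call MyMatMult while the PC is assembled from Pmat.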

Raju Mandhapati  writes:

> Hello,
>
> I want to use my own custom matrix vector multiplication routine (which
> will use finite difference method to calculate it). I will supply a matrix
> but that matrix should be used only for preconditioner and not for matrix
> vector multiplication. Is there a way to do it?
>
> thanks
> Raju.



Re: [petsc-users] Fortran interfaces: Google Summer of Code project?

2024-03-23 Thread Jed Brown




Looks good to me. Thanks for taking the initiative.

Martin Diehl  writes:

> Dear All,
>
> pls. find attached the proposal for
> https://github.com/fortran-lang/webpage/wiki/GSoC-2024-Project-ideas.
>
> I tried to keep it general such that we keep all options open.
>
> Let me know if you want to change/add/remove anything and/or if you
> want to be listed as mentor.
>
> Since I've mixed up the deadline, the most urgent task is the finding
> of suitable candidates. Once it's online, I'll post on linkedin but
> ideally we can motivate someone who is already known.
>
> best regards,
> Martin
>
> On Thu, 2024-03-21 at 23:13 -0600, Jed Brown wrote:
>> Barry Smith  writes:
>> 
>> > > We already have the generated ftn-auto-interfaces/*.h90. The
>> > > INTERFACE keyword could be replaced with CONTAINS (making these
>> > > definitions instead of just interfaces), and then the bodies
>> > > could use iso_c_binding to call the C functions. That would
>> > > reduce repetition and be the standards-compliant way to do this.
>> > 
>> >    Sure, the interface and the stub go in the same file instead of
>> > two files. This is slightly nicer but not significantly simpler,
>> > and alone, it is not reason enough to write an entire new stub
>> > generator.
>> 
>> I agree, but if one *is* writing a new stub generator for good
>> reasons (like better automation/completeness), there's a case for
>> doing it this way unless users really need an environment in which
>> that feature can't be used.
>> 
>> > > What we do now with detecting the Fortran mangling scheme and
>> > > calling conventions "works" but doesn't conform to any standard
>> > > and there's nothing stopping Fortran implementations from
>> > > creating yet another variant that we have to deal with manually.
>> > 
>> >    From practical experience, calling C/Fortran using non-standards
>> > has only gotten easier over the last thirty-five years (fewer
>> > variants on how char* is handled); it has not gotten more
>> > complicated, so I submit the likelihood of "nothing stopping
>> > Fortran implementations from creating yet another variant that we
>> > have to deal with manually" is (though possible) rather unlikely.
>> > As far as I am concerned, much of iso_c_binding stuff just solved a
>> > problem that never really existed (except in some people's minds)
>> > since calling C/Fortran has always been easy, modulo knowing a tiny
>> > bit of information..
>> 
>> An example for concreteness:
>> 
>> https://fortranwiki.org/fortran/show/Generating+C+Interfaces
>> 
>> And discussion:
>> 
>> https://fortran-lang.discourse.group/t/iso-c-binding-looking-for-practical-example-of-how-it-helps-with-mangling/3393/8
>> 
>> With this approach, one could even use method syntax like
>> ksp%SetOperators(J, Jpre), as in the nlopt-f project linked in the
>> top of this question. I don't know if we want that (it would be a
>> huge change for users, albeit it "easy"), but generating stubs in
>> Fortran using iso_c_binding opens a variety of possibilities for more
>> idiomatic bindings.
>
> -- 
> KU Leuven
> Department of Computer Science
> Department of Materials Engineering
> Celestijnenlaan 200a
> 3001 Leuven, Belgium
> # Improved generation of Fortran interfaces for [PETSc](https://petsc.org)
>
> PETSc, the Portable, Extensible Toolkit for Scientific Computation, pronounced PET-see, is for the scalable (parall

Re: [petsc-users] Fortran interfaces: Google Summer of Code project?

2024-03-21 Thread Jed Brown




Barry Smith  writes:

>> We already have the generated ftn-auto-interfaces/*.h90. The INTERFACE keyword could be replaced with CONTAINS (making these definitions instead of just interfaces), and then the bodies could use iso_c_binding to call the C functions. That would reduce repetition and be the standards-compliant way to do this.
>
>Sure, the interface and the stub go in the same file instead of two files. This is slightly nicer but not significantly simpler, and alone, it is not reason enough to write an entire new stub generator.

I agree, but if one *is* writing a new stub generator for good reasons (like better automation/completeness), there's a case for doing it this way unless users really need an environment in which that feature can't be used.

>> What we do now with detecting the Fortran mangling scheme and calling conventions "works" but doesn't conform to any standard and there's nothing stopping Fortran implementations from creating yet another variant that we have to deal with manually.
>
>From practical experience, calling C/Fortran using non-standards has only gotten easier over the last thirty-five years (fewer variants on how char* is handled); it has not gotten more complicated, so I submit the likelihood of "nothing stopping Fortran implementations from creating yet another variant that we have to deal with manually" is (though possible) rather unlikely. As far as I am concerned, much of iso_c_binding stuff just solved a problem that never really existed (except in some people's minds) since calling C/Fortran has always been easy, modulo knowing a tiny bit of information..

An example for concreteness:

https://fortranwiki.org/fortran/show/Generating+C+Interfaces

And discussion:

https://fortran-lang.discourse.group/t/iso-c-binding-looking-for-practical-example-of-how-it-helps-with-mangling/3393/8

With this approach, one could even use method syntax like ksp%SetOperators(J, Jpre), as in the nlopt-f project linked in the top of this question. I don't know if we want that (it would be a huge change for users, albeit it "easy"), but generating stubs in Fortran using iso_c_binding opens a variety of possibilities for more idiomatic bindings.



Re: [petsc-users] Fortran interfaces: Google Summer of Code project?

2024-03-21 Thread Jed Brown




Barry Smith  writes:

> In my limited understanding of the Fortran iso_c_binding, if we do not provide an equivalent Fortran stub (the user calls) that uses the iso_c_binding to call PETSc C code, then when the user calls PETSc C code directly via the iso_c_binding they have to pass iso_c_binding type arguments to the call. This I consider unacceptable. So my conclusion was there is the same number of stubs, just in a different language, so there is no reason to consider changing since we cannot "delete lots of stubs", but I could be wrong.

I don't want users to deal with iso_c_binding manually.

We already have the generated ftn-auto-interfaces/*.h90. The INTERFACE keyword could be replaced with CONTAINS (making these definitions instead of just interfaces), and then the bodies could use iso_c_binding to call the C functions. That would reduce repetition and be the standards-compliant way to do this. What we do now with detecting the Fortran mangling scheme and calling conventions "works" but doesn't conform to any standard and there's nothing stopping Fortran implementations from creating yet another variant that we have to deal with manually.

I don't know if this change would enable inlining without LTO, though I think the indirection through our C sourcef.c is rarely a performance factor for Fortran users.



Re: [petsc-users] Fortran interfaces: Google Summer of Code project?

2024-03-21 Thread Jed Brown




Barry Smith  writes:

>  We've always had some tension between adding new features to bfort vs developing an entirely new tool (for example in Python (maybe calling a little LLVM to help parse the C function), for maybe there is already a tool out there) to replace bfort.

Note that depending on LLVM (presumably libclang) is a nontrivial dependency if the users don't already have it installed on their systems. I'm all for making it easier to extend the stub generator, but an equally-hacky pybfort wouldn't make much difference. If some better tools have emerged or we have a clear idea for a better design, let's discuss that.

>  Both approaches have their advantages and disadvantages instead we've relied on the quick and dirty of providing the interfaces as needed). We have not needed the Fortran standard C interface stuff and I would prefer not to use it unless it offers some huge advantage).

Mainly that lots of C stubs could be deleted in favor of iso_c_binding.



Re: [petsc-users] 'Preconditioning' with lower-order method

2024-03-03 Thread Jed Brown




If you're having PETSc use coloring and have confirmed that the stencil is sufficient, then it would be nonsmoothness (again, consider the limiter you've chosen) preventing quadratic convergence (assuming it doesn't kick in eventually). Note that assembling the Jacobian of a second-order TVD operator requires at least second neighbors, while the first-order operator needs only first neighbors and is thus much sparser and needs fewer colors to compute. I expect you're either not exploiting that in the timings or something else is amiss. You can run with `-log_view -snes_view -ksp_converged_reason` to get a bit more information about what's happening.

"Zou, Ling via petsc-users"  writes:

> Barry, thank you.
> I am not sure if I exactly follow you on this:
> “Are you forming the Jacobian for the first and second order cases inside of Newton?”
>
> The problem that we deal with, heat/mass transfer in heterogeneous systems (reactor system), is generally small in terms of size, i.e., # of DOFs (several k to maybe 100k level), so for now, I completely rely on PETSc to compute Jacobian, i.e., finite-differencing.
>
> That’s a good suggestion to see the time spent during various events.
> What motivated me to try the options are the following observations.
>
> 2nd order FVM:
>
> Time Step 149, time = 13229.7, dt = 100
>
> NL Step =  0, fnorm =  7.80968E-03
>
> NL Step =  1, fnorm =  7.65731E-03
>
> NL Step =  2, fnorm =  6.85034E-03
>
> NL Step =  3, fnorm =  6.11873E-03
>
> NL Step =  4, fnorm =  1.57347E-03
>
> NL Step =  5, fnorm =  9.03536E-04
>
>  Solve Converged!
>
> 1st order FVM:
>
> Time Step 149, time = 13229.7, dt = 100
>
> NL Step =  0, fnorm =  7.90072E-03
>
> NL Step =  1, fnorm =  2.01919E-04
>
> NL Step =  2, fnorm =  1.06960E-05
>
> NL Step =  3, fnorm =  2.41683E-09
>
>  Solve Converged!
>
> Notice the obvious ‘stagnant’ in residual for the 2nd order method while not in the 1st order.
> For the same problem, the wall time is 10 sec vs 6 sec. I would be happy if I can reduce 2 sec for the 2nd order method.
>
> -Ling
>
> From: Barry Smith 
> Date: Sunday, March 3, 2024 at 12:06 PM
> To: Zou, Ling 
> Cc: petsc-users@mcs.anl.gov 
> Subject: Re: [petsc-users] 'Preconditioning' with lower-order method
>
>Are you forming the Jacobian for the first and second order cases inside of Newton?
>
>You can run both with -log_view to see how much time is spent in the various events (compute function, compute Jacobian, linear solve, ...) for the two cases and compare them.
>
>
>
>
> On Mar 3, 2024, at 11:42 AM, Zou, Ling via petsc-users  wrote:
>
> Original email may have been sent to the incorrect place.
> See below.
>
> -Ling
>
> From: Zou, Ling >
> Date: Sunday, March 3, 2024 at 10:34 AM
> To: petsc-users >
> Subject: 'Preconditioning' with lower-order method
> Hi all,
>
> I am solving a PDE system over a spatial domain. Numerical methods are:
>
>   *   Finite volume method (both 1st and 2nd order implemented)
>   *   BDF1 and BDF2 for time integration.
> What I have noticed is that 1st order FVM converges much faster than 2nd order FVM, regardless the time integration scheme. Well, not surprising since 2nd order FVM introduces additional non-linearity.
>
> I’m thinking about two possible ways to speed up 2nd order FVM, and would like to get some thoughts or community knowledge before jumping into code implementation.
>
> Say, let the 2nd order FVM residual function be F2(x) = 0; and the 1st order FVM residual function be F1(x) = 0.
>
>   1.  Option – 1, multi-step for each time step
> Step 1: solving F1(x) = 0 to obtain a temporary solution x1
> Step 2: feed x1 as an initial guess to solve F2(x) = 0 to obtain the final solution.
> [Not sure if gain any saving at all]
>
>
>   1.  Option -2, dynamically changing residual function F(x)
> In pseudo code, would be something like.
>
> snesFormFunction(SNES snes, Vec u, Vec f, void *)
> {
>   if (snes.nl_it_no < 4) // 4 being arbitrary here
> f = F1(u);
>   else
> f = F2(u);
> }
>
> I know this might be a bit crazy since it may crash after switching residual function, still, any thoughts?
>

Re: [petsc-users] FW: 'Preconditioning' with lower-order method

2024-03-03 Thread Jed Brown




One option is to form the preconditioner using the FV1 method, which is sparser and satisfies h-ellipticity, while using FV2 for the residual and (optionally) for matrix-free operator application.

FV1 is a highly diffusive method so in a sense, it's much less faithful to the physics and (say, in the case of fluids) similar to a much lower-Reynolds number (if you use a modified equation analysis to work out the effective Reynolds number in the presence of the numerical diffusion).

It's good to put some thought into your choice of limiter. Note that the intersection of second-order and TVD methods leads to mandatory nonsmoothness (discontinuous derivatives).
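A minimal sketch of the first option above (nothing here is from Ling's code; FormResidualFV2 and FormJacobianFV1 are hypothetical user routines): the SNES residual is the FV2 discretization, the assembled matrix is FV1, and running with -snes_mf_operator makes the Krylov operator a finite-difference action of the FV2 residual while the preconditioner is still built from the sparser FV1 matrix.

#include <petscsnes.h>

extern PetscErrorCode FormResidualFV2(SNES snes, Vec u, Vec f, void *ctx);        /* hypothetical: 2nd-order residual */
extern PetscErrorCode FormJacobianFV1(SNES snes, Vec u, Mat J, Mat P, void *ctx); /* hypothetical: assembles the 1st-order Jacobian into P */

PetscErrorCode SetupFV2WithFV1Preconditioning(SNES snes, Mat Pfv1, void *ctx)
{
  PetscFunctionBeginUser;
  PetscCall(SNESSetFunction(snes, NULL, FormResidualFV2, ctx));
  /* Same matrix in both slots; with -snes_mf_operator the Amat is replaced by
     matrix-free differencing of the FV2 residual and Pfv1 only feeds the PC. */
  PetscCall(SNESSetJacobian(snes, Pfv1, Pfv1, FormJacobianFV1, ctx));
  PetscCall(SNESSetFromOptions(snes));
  PetscFunctionReturn(PETSC_SUCCESS);
}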

"Zou, Ling via petsc-users"  writes:

> Original email may have been sent to the incorrect place.
> See below.
>
> -Ling
>
> From: Zou, Ling 
> Date: Sunday, March 3, 2024 at 10:34 AM
> To: petsc-users 
> Subject: 'Preconditioning' with lower-order method
> Hi all,
>
> I am solving a PDE system over a spatial domain. Numerical methods are:
>
>   *   Finite volume method (both 1st and 2nd order implemented)
>   *   BDF1 and BDF2 for time integration.
> What I have noticed is that 1st order FVM converges much faster than 2nd order FVM, regardless the time integration scheme. Well, not surprising since 2nd order FVM introduces additional non-linearity.
>
> I’m thinking about two possible ways to speed up 2nd order FVM, and would like to get some thoughts or community knowledge before jumping into code implementation.
>
> Say, let the 2nd order FVM residual function be F2(x) = 0; and the 1st order FVM residual function be F1(x) = 0.
>
>   1.  Option – 1, multi-step for each time step
> Step 1: solving F1(x) = 0 to obtain a temporary solution x1
> Step 2: feed x1 as an initial guess to solve F2(x) = 0 to obtain the final solution.
> [Not sure if gain any saving at all]
>
>
>   1.  Option -2, dynamically changing residual function F(x)
>
> In pseudo code, would be something like.
>
>
>
> snesFormFunction(SNES snes, Vec u, Vec f, void *)
>
> {
>
>   if (snes.nl_it_no < 4) // 4 being arbitrary here
>
> f = F1(u);
>
>   else
>
> f = F2(u);
>
> }
>
>
>
> I know this might be a bit crazy since it may crash after switching residual function, still, any thoughts?
>
> Best,
>
> -Ling



Re: [petsc-users] Parallel vector layout for TAO optimization with separable state/design structure

2024-01-30 Thread Jed Brown
For a bit of assistance, you can use DMComposite and DMRedundantCreate; see 
src/snes/tutorials/ex21.c and ex22.c.
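A minimal sketch of that layout (sizes and the DMDA for the state are placeholders following the pattern of those examples): design variables live in a DMRedundant owned by rank 0, the state in its own distributed DM, and a DMComposite packs them into one optimization vector.

#include <petscdmda.h>
#include <petscdmredundant.h>
#include <petscdmcomposite.h>

PetscErrorCode CreateDesignStateDM(MPI_Comm comm, PetscInt ndesign, PetscInt nstate, DM *packer)
{
  DM dm_design, dm_state;

  PetscFunctionBeginUser;
  /* Design variables: stored on rank 0, logically available everywhere */
  PetscCall(DMRedundantCreate(comm, 0, ndesign, &dm_design));
  /* Distributed state unknowns (a 1D DMDA only as a placeholder) */
  PetscCall(DMDACreate1d(comm, DM_BOUNDARY_NONE, nstate, 1, 1, NULL, &dm_state));
  PetscCall(DMSetFromOptions(dm_state));
  PetscCall(DMSetUp(dm_state));
  /* Pack them so TAO/SNES sees one vector [design; state] */
  PetscCall(DMCompositeCreate(comm, packer));
  PetscCall(DMCompositeAddDM(*packer, dm_design));
  PetscCall(DMCompositeAddDM(*packer, dm_state));
  PetscCall(DMDestroy(&dm_design));
  PetscCall(DMDestroy(&dm_state));
  PetscFunctionReturn(PETSC_SUCCESS);
}

DMCompositeGetAccess or DMCompositeScatter then gives each objective/gradient evaluation its local state piece plus a full copy of the design vector.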

Note that when computing redundantly, it's critical that the computation be 
deterministic (i.e., not using atomics or randomness without matching seeds) so 
the logic stays collective.

This merge request may also be relevant and comments related to your needs 
would be welcome in the discussion. 

https://gitlab.com/petsc/petsc/-/merge_requests/6531

Barry Smith  writes:

>This is a problem with MPI programming and optimization; I am unaware of a 
> perfect solution.
>
>Put the design variables into the solution vector on MPI rank 0, and when 
> doing your objective/gradient, send the values to all the MPI processes where 
> you use them. You can use a VecScatter to handle the communication you need 
> or MPI_Scatter() etc whatever makes the most sense in your code. 
>
>Barry
>
>
>> On Jan 30, 2024, at 10:53 AM, Guenther, Stefanie via petsc-users 
>>  wrote:
>> 
>> Hi Petsc team, 
>>  
>> I have a question regarding parallel layout of a Petsc vector to be used in 
>> TAO optimizers for cases where the optimization variables split into 
>> ‘design’ and ‘state’ variables (e.g. such as in PDE-constrained optimization 
>> as in tao_lcl). In our case, the state variable naturally parallelizes 
>> evenly amongst multiple processors and this distribution is fixed. The 
>> ‘design’ vector however does not, it is very short compared to the state 
>> vector and it is required on all state-processors when evaluating the 
>> objective function and gradient. My question would be how the TAO 
>> optimization vector x = [design,state] should be created in such a way that 
>> the ‘state’ part is distributed as needed in our solver, while the design 
>> part is not.
>>  
>> My only idea so far was to copy the design variables to all processors and 
>> augment / interleave the optimization vector as x = [state_proc1,design, 
>> state_proc2, design, … ] . When creating this vector in parallel on 
>> PETSC_COMM_WORLD, each processor would then own the same number of variables 
>> ( [state_proc, design] ), as long as the numbers match up, and I would 
>> only need to be careful when gathering the gradient wrt the design parts 
>> from all processors.
>>  
>> This seems cumbersome however, and I would be worried whether the 
>> optimization problem is harder to solve this way. Is there any other way to 
>> achieve this splitting, that I am missing here? Note that the distribution 
>> of the state itself is given and can not be changed, and that the state vs 
>> design vectors have very different (and independent) dimensions.
>>  
>> Thanks for your help and thoughts!
>> Best,
>> Stefanie


Re: [petsc-users] fortran interface to snes matrix-free jacobian

2023-12-20 Thread Jed Brown
Then just use MatShell. I see the docs need some work to clarify this, but 
MatCreateSNESMF is for specifying matrix-free finite differencing from code 
(perhaps when one wants to customize parameters).
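For the analytic case, the setup looks roughly like this C sketch (the Fortran calls mirror it, much as in ex22f_mf.F90; MyJacobianMult and the context handling are hypothetical):

#include <petscsnes.h>

/* Hypothetical: applies the analytic Jacobian action y = J(u)*x using data stored in the shell context */
extern PetscErrorCode MyJacobianMult(Mat J, Vec x, Vec y);

/* Hypothetical Jacobian callback: refresh what MyJacobianMult needs at the current state u; no assembly */
static PetscErrorCode MyJacobianUpdate(SNES snes, Vec u, Mat J, Mat P, void *ctx)
{
  PetscFunctionBeginUser;
  PetscCall(MatShellSetContext(J, ctx)); /* e.g. the context could hold (a copy of) u */
  PetscFunctionReturn(PETSC_SUCCESS);
}

PetscErrorCode SetupShellJacobian(SNES snes, PetscInt nlocal, PetscInt nglobal, void *ctx)
{
  Mat J;

  PetscFunctionBeginUser;
  PetscCall(MatCreateShell(PetscObjectComm((PetscObject)snes), nlocal, nlocal, nglobal, nglobal, ctx, &J));
  PetscCall(MatShellSetOperation(J, MATOP_MULT, (void (*)(void))MyJacobianMult));
  PetscCall(SNESSetJacobian(snes, J, J, MyJacobianUpdate, ctx));
  PetscCall(MatDestroy(&J));
  PetscFunctionReturn(PETSC_SUCCESS);
}

Since the shell is used for both Amat and Pmat here, pair it with a preconditioner that needs no assembled matrix (e.g. -pc_type none or a PCShell).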

Yi Hu  writes:

> Dear Jed,
>
> Thanks for your reply. I have an analytical one to implement.
>
> Best, Yi
>
> -Original Message-----
> From: Jed Brown  
> Sent: Wednesday, December 20, 2023 5:40 PM
> To: Yi Hu ; petsc-users@mcs.anl.gov
> Subject: Re: [petsc-users] fortran interface to snes matrix-free jacobian
>
> Are you wanting an analytic matrix-free operator or one created for you based 
> on finite differencing? If the latter, just use -snes_mf or -snes_mf_operator.
>
> https://petsc.org/release/manual/snes/#jacobian-evaluation
>
> Yi Hu  writes:
>
>> Dear PETSc team,
>>  
>> My  solution scheme relies on a matrix-free jacobian in the SNES solver. I 
>> saw the useful C interface like MatCreateSNESMF(), DMSNESCreateJacobianMF(). 
>> I am wondering if you have the fortran equivalence?
>>  
>> I think for my problem in the main program I need to do 
>> DMDASNESsetJacobianLocal(DM, INSERT_VALUES, myJacobian, ctx, err_petsc). 
>> Then in myJacobian() subroutine I have to create the operator from 
>> DMSNESCreateJacobianMF(), and register my own MATOP_MULT from 
>> MatShellSetOperation(). Am I correct? 
>>  
>> Are these fortran subroutines available? I saw an example in ts module 
>> as ex22f_mf.F90 which behaves similar as what I would like to do. Because I 
>> would like to use ngmres, I then need to stay in the SNES.
>>  
>> Thanks for your help.
>>  
>> Best wishes,
>> Yi
>>
>> -
>> Stay up to date and follow us on LinkedIn, Twitter and YouTube.
>>
>> Max-Planck-Institut für Eisenforschung GmbH Max-Planck-Straße 1
>> D-40237 Düsseldorf
>>  
>> Handelsregister B 2533
>> Amtsgericht Düsseldorf
>>  
>> Geschäftsführung
>> Prof. Dr. Gerhard Dehm
>> Prof. Dr. Jörg Neugebauer
>> Prof. Dr. Dierk Raabe
>> Dr. Kai de Weldige
>>  
>> Ust.-Id.-Nr.: DE 11 93 58 514
>> Steuernummer: 105 5891 1000
>>
>>
>> Please consider that invitations and e-mails of our institute are only 
>> valid if they end with …@mpie.de.
>> If you are not sure of the validity please contact r...@mpie.de
>>
>> Bitte beachten Sie, dass Einladungen zu Veranstaltungen und E-Mails 
>> aus unserem Haus nur mit der Endung …@mpie.de gültig sind.
>> In Zweifelsfällen wenden Sie sich bitte an r...@mpie.de
>> -
>
>
> -
> Stay up to date and follow us on LinkedIn, Twitter and YouTube.
>
> Max-Planck-Institut für Eisenforschung GmbH
> Max-Planck-Straße 1
> D-40237 Düsseldorf
>  
> Handelsregister B 2533 
> Amtsgericht Düsseldorf
>  
> Geschäftsführung
> Prof. Dr. Gerhard Dehm
> Prof. Dr. Jörg Neugebauer
> Prof. Dr. Dierk Raabe
> Dr. Kai de Weldige
>  
> Ust.-Id.-Nr.: DE 11 93 58 514 
> Steuernummer: 105 5891 1000
>
>
> Please consider that invitations and e-mails of our institute are 
> only valid if they end with …@mpie.de. 
> If you are not sure of the validity please contact r...@mpie.de
>
> Bitte beachten Sie, dass Einladungen zu Veranstaltungen und E-Mails
> aus unserem Haus nur mit der Endung …@mpie.de gültig sind. 
> In Zweifelsfällen wenden Sie sich bitte an r...@mpie.de
> -


Re: [petsc-users] fortran interface to snes matrix-free jacobian

2023-12-20 Thread Jed Brown
Are you wanting an analytic matrix-free operator or one created for you based 
on finite differencing? If the latter, just use -snes_mf or -snes_mf_operator.

https://petsc.org/release/manual/snes/#jacobian-evaluation

Yi Hu  writes:

> Dear PETSc team,
>  
> My  solution scheme relies on a matrix-free jacobian in the SNES solver. I 
> saw the useful C interface like MatCreateSNESMF(), DMSNESCreateJacobianMF(). 
> I am wondering if you have the fortran equivalence?
>  
> I think for my problem in the main program I need to do 
> DMDASNESsetJacobianLocal(DM, INSERT_VALUES, myJacobian, ctx, err_petsc). Then 
> in myJacobian() subroutine I have to create the operator from 
> DMSNESCreateJacobianMF(), and register my own MATOP_MULT from 
> MatShellSetOperation(). Am I correct? 
>  
> Are these fortran subroutines available? I saw an example in ts module as 
> ex22f_mf.F90 which behaves similar as what I would like to do. Because I 
> would like to use ngmres, I then need to stay in the SNES.  
>  
> Thanks for your help.
>  
> Best wishes,
> Yi
>
> -
> Stay up to date and follow us on LinkedIn, Twitter and YouTube.
>
> Max-Planck-Institut für Eisenforschung GmbH
> Max-Planck-Straße 1
> D-40237 Düsseldorf
>  
> Handelsregister B 2533 
> Amtsgericht Düsseldorf
>  
> Geschäftsführung
> Prof. Dr. Gerhard Dehm
> Prof. Dr. Jörg Neugebauer
> Prof. Dr. Dierk Raabe
> Dr. Kai de Weldige
>  
> Ust.-Id.-Nr.: DE 11 93 58 514 
> Steuernummer: 105 5891 1000
>
>
> Please consider that invitations and e-mails of our institute are 
> only valid if they end with …@mpie.de. 
> If you are not sure of the validity please contact r...@mpie.de
>
> Bitte beachten Sie, dass Einladungen zu Veranstaltungen und E-Mails
> aus unserem Haus nur mit der Endung …@mpie.de gültig sind. 
> In Zweifelsfällen wenden Sie sich bitte an r...@mpie.de
> -


Re: [petsc-users] [EXTERNAL] Re: Call to DMSetMatrixPreallocateSkip not changing allocation behavior

2023-12-18 Thread Jed Brown
Great, thanks for letting us know. It'll merge to release shortly and thus be 
in petsc >= 3.20.3.

"Fackler, Philip"  writes:

> Jed,
>
> That seems to have worked (ridiculously well). It's now 55MB, and it's 
> happening in the call to MatSetPreallocationCOO.
>
> Thank you,
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
> 
> From: Jed Brown 
> Sent: Thursday, December 14, 2023 16:27
> To: Fackler, Philip ; petsc-users@mcs.anl.gov 
> ; xolotl-psi-developm...@lists.sourceforge.net 
> 
> Subject: [EXTERNAL] Re: [petsc-users] Call to DMSetMatrixPreallocateSkip not 
> changing allocation behavior
>
> I had a one-character typo in the diff above. This MR to release should work 
> now.
>
> https://gitlab.com/petsc/petsc/-/merge_requests/7120
>
> Jed Brown  writes:
>
>> 17 GB for a 1D DMDA, wow. :-)
>>
>> Could you try applying this diff to make it work for DMDA (it's currently 
>> handled by DMPlex)?
>>
>> diff --git i/src/dm/impls/da/fdda.c w/src/dm/impls/da/fdda.c
>> index cad4d926504..bd2a3bda635 100644
>> --- i/src/dm/impls/da/fdda.c
>> +++ w/src/dm/impls/da/fdda.c
>> @@ -675,19 +675,21 @@ PetscErrorCode DMCreateMatrix_DA(DM da, Mat *J)
>>      specialized setting routines depend only on the particular preallocation
>>      details of the matrix, not the type itself.
>>   */
>> -  PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPIAIJSetPreallocation_C", &aij));
>> -  if (!aij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqAIJSetPreallocation_C", &aij));
>> -  if (!aij) {
>> -    PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPIBAIJSetPreallocation_C", &baij));
>> -    if (!baij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqBAIJSetPreallocation_C", &baij));
>> -    if (!baij) {
>> -      PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPISBAIJSetPreallocation_C", &sbaij));
>> -      if (!sbaij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqSBAIJSetPreallocation_C", &sbaij));
>> -      if (!sbaij) {
>> -        PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPISELLSetPreallocation_C", &sell));
>> -        if (!sell) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqSELLSetPreallocation_C", &sell));
>> +  if (!dm->prealloc_skip) { // Flag is likely set when user intends to use MatSetPreallocationCOO()
>> +    PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPIAIJSetPreallocation_C", &aij));
>> +    if (!aij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqAIJSetPreallocation_C", &aij));
>> +    if (!aij) {
>> +      PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPIBAIJSetPreallocation_C", &baij));
>> +      if (!baij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqBAIJSetPreallocation_C", &baij));
>> +      if (!baij) {
>> +        PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPISBAIJSetPreallocation_C", &sbaij));
>> +        if (!sbaij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqSBAIJSetPreallocation_C", &sbaij));
>> +        if (!sbaij) {
>> +          PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPISELLSetPreallocation_C", &sell));
>> +          if (!sell) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqSELLSetPreallocation_C", &sell));
>> +        }
>> +        if (!sell) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatISSetPreallocation_C", &is));
>>        }
>> -      if (!sell) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatISSetPreallocation_C", &is));
>>      }
>>    }
>>    if (aij) {
>>
>>
>> "Fackler, Philip via petsc-users"  writes:
>>
>>> I'm using the following sequence of functions related to the Jacobian 
>>> matrix:
>>>
>>> DMDACreate1d(..., &da);
>>> DMSetFromOptions(da);
>>> DMSetUp(da);
>>> DMSetMatType(da, MATAIJKOKKOS);
>>> DMSetMatrixPreallocateSkip(da, P

Re: [petsc-users] Call to DMSetMatrixPreallocateSkip not changing allocation behavior

2023-12-14 Thread Jed Brown
I had a one-character typo in the diff above. This MR to release should work 
now.

https://gitlab.com/petsc/petsc/-/merge_requests/7120

Jed Brown  writes:

> 17 GB for a 1D DMDA, wow. :-)
>
> Could you try applying this diff to make it work for DMDA (it's currently 
> handled by DMPlex)?
>
> diff --git i/src/dm/impls/da/fdda.c w/src/dm/impls/da/fdda.c
> index cad4d926504..bd2a3bda635 100644
> --- i/src/dm/impls/da/fdda.c
> +++ w/src/dm/impls/da/fdda.c
> @@ -675,19 +675,21 @@ PetscErrorCode DMCreateMatrix_DA(DM da, Mat *J)
>      specialized setting routines depend only on the particular preallocation
>      details of the matrix, not the type itself.
>   */
> -  PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPIAIJSetPreallocation_C", &aij));
> -  if (!aij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqAIJSetPreallocation_C", &aij));
> -  if (!aij) {
> -    PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPIBAIJSetPreallocation_C", &baij));
> -    if (!baij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqBAIJSetPreallocation_C", &baij));
> -    if (!baij) {
> -      PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPISBAIJSetPreallocation_C", &sbaij));
> -      if (!sbaij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqSBAIJSetPreallocation_C", &sbaij));
> -      if (!sbaij) {
> -        PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPISELLSetPreallocation_C", &sell));
> -        if (!sell) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqSELLSetPreallocation_C", &sell));
> +  if (!dm->prealloc_skip) { // Flag is likely set when user intends to use MatSetPreallocationCOO()
> +    PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPIAIJSetPreallocation_C", &aij));
> +    if (!aij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqAIJSetPreallocation_C", &aij));
> +    if (!aij) {
> +      PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPIBAIJSetPreallocation_C", &baij));
> +      if (!baij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqBAIJSetPreallocation_C", &baij));
> +      if (!baij) {
> +        PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPISBAIJSetPreallocation_C", &sbaij));
> +        if (!sbaij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqSBAIJSetPreallocation_C", &sbaij));
> +        if (!sbaij) {
> +          PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPISELLSetPreallocation_C", &sell));
> +          if (!sell) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqSELLSetPreallocation_C", &sell));
> +        }
> +        if (!sell) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatISSetPreallocation_C", &is));
>        }
> -      if (!sell) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatISSetPreallocation_C", &is));
>      }
>    }
>    if (aij) {
>
> "Fackler, Philip via petsc-users"  writes:
>
>> I'm using the following sequence of functions related to the Jacobian matrix:
>>
>> DMDACreate1d(..., &da);
>> DMSetFromOptions(da);
>> DMSetUp(da);
>> DMSetMatType(da, MATAIJKOKKOS);
>> DMSetMatrixPreallocateSkip(da, PETSC_TRUE);
>> Mat J;
>> DMCreateMatrix(da, &J);
>> MatSetPreallocationCOO(J, ...);
>>
>> I recently added the call to DMSetMatrixPreallocateSkip, hoping the 
>> allocation would be delayed to MatSetPreallocationCOO, and that it would 
>> require less memory. The 
>> documentation<https://petsc.org/release/manualpages/DM/DMSetMatrixPreallocateSkip/>
>>  says that the data structures will not be preallocated. The following data 
>> from heaptrack shows that the allocation is still happening in the call to 
>> DMCreateMatrix.
>>
>> [cid:bda9ef12-a46f-47b2-9b9b-a4b2808b6b13]
>>
>> Can someone help me understand this?
>>
>> Thanks,
>>
>> Philip Fackler
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> Oak Ridge National Laboratory


Re: [petsc-users] Call to DMSetMatrixPreallocateSkip not changing allocation behavior

2023-12-14 Thread Jed Brown
17 GB for a 1D DMDA, wow. :-)

Could you try applying this diff to make it work for DMDA (it's currently 
handled by DMPlex)?

diff --git i/src/dm/impls/da/fdda.c w/src/dm/impls/da/fdda.c
index cad4d926504..bd2a3bda635 100644
--- i/src/dm/impls/da/fdda.c
+++ w/src/dm/impls/da/fdda.c
@@ -675,19 +675,21 @@ PetscErrorCode DMCreateMatrix_DA(DM da, Mat *J)
     specialized setting routines depend only on the particular preallocation
     details of the matrix, not the type itself.
  */
-  PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPIAIJSetPreallocation_C", &aij));
-  if (!aij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqAIJSetPreallocation_C", &aij));
-  if (!aij) {
-    PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPIBAIJSetPreallocation_C", &baij));
-    if (!baij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqBAIJSetPreallocation_C", &baij));
-    if (!baij) {
-      PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPISBAIJSetPreallocation_C", &sbaij));
-      if (!sbaij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqSBAIJSetPreallocation_C", &sbaij));
-      if (!sbaij) {
-        PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPISELLSetPreallocation_C", &sell));
-        if (!sell) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqSELLSetPreallocation_C", &sell));
+  if (!dm->prealloc_skip) { // Flag is likely set when user intends to use MatSetPreallocationCOO()
+    PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPIAIJSetPreallocation_C", &aij));
+    if (!aij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqAIJSetPreallocation_C", &aij));
+    if (!aij) {
+      PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPIBAIJSetPreallocation_C", &baij));
+      if (!baij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqBAIJSetPreallocation_C", &baij));
+      if (!baij) {
+        PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPISBAIJSetPreallocation_C", &sbaij));
+        if (!sbaij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqSBAIJSetPreallocation_C", &sbaij));
+        if (!sbaij) {
+          PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPISELLSetPreallocation_C", &sell));
+          if (!sell) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqSELLSetPreallocation_C", &sell));
+        }
+        if (!sell) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatISSetPreallocation_C", &is));
       }
-      if (!sell) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatISSetPreallocation_C", &is));
     }
   }
   if (aij) {


"Fackler, Philip via petsc-users"  writes:

> I'm using the following sequence of functions related to the Jacobian matrix:
>
> DMDACreate1d(..., &da);
> DMSetFromOptions(da);
> DMSetUp(da);
> DMSetMatType(da, MATAIJKOKKOS);
> DMSetMatrixPreallocateSkip(da, PETSC_TRUE);
> Mat J;
> DMCreateMatrix(da, &J);
> MatSetPreallocationCOO(J, ...);
>
> I recently added the call to DMSetMatrixPreallocateSkip, hoping the 
> allocation would be delayed to MatSetPreallocationCOO, and that it would 
> require less memory. The 
> documentation<https://petsc.org/release/manualpages/DM/DMSetMatrixPreallocateSkip/>
>  says that the data structures will not be preallocated. The following data 
> from heaptrack shows that the allocation is still happening in the call to 
> DMCreateMatrix.
>
> [cid:bda9ef12-a46f-47b2-9b9b-a4b2808b6b13]
>
> Can someone help me understand this?
>
> Thanks,
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory


Re: [petsc-users] Bug report VecNorm

2023-12-10 Thread Jed Brown
Pierre Jolivet  writes:

>> On 10 Dec 2023, at 8:40 AM, Stephan Köhler 
>>  wrote:
>> 
>> Dear PETSc/Tao team, 
>> 
>> there is a bug in the voector interface:  In the function 
>> VecNorm, see, eg. 
>> https://petsc.org/release/src/vec/vec/interface/rvector.c.html#VecNorm line 
>> 197 the check for consistency in line 214 is done on the wrong communicator. 
>>  The  communicator should be PETSC_COMM_SELF.
>> Otherwise the program may hang when PetscCheck is executed.
>
> I think the communicator should not be changed, but instead, the 
> check/conditional should be changed, à la PetscValidLogicalCollectiveBool().

I agree -- it's no extra cost to discover collectively whether all, none, or 
some have the norm. In this case, it could be an MPI_SUM, so the error message 
could report how many processes took each path.
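A sketch of that kind of check (not the actual patch): sum a 0/1 flag collectively so every rank agrees on whether to error, and let the message report the count.

#include <petscvec.h>

static PetscErrorCode CheckNormAvailableCollective(Vec x, PetscBool has_norm)
{
  MPI_Comm    comm = PetscObjectComm((PetscObject)x);
  PetscMPIInt size;
  PetscInt    flg = has_norm ? 1 : 0, count = 0;

  PetscFunctionBeginUser;
  PetscCallMPI(MPI_Comm_size(comm, &size));
  PetscCallMPI(MPI_Allreduce(&flg, &count, 1, MPIU_INT, MPI_SUM, comm));
  PetscCheck(count == 0 || count == size, comm, PETSC_ERR_ARG_WRONGSTATE,
             "Norm available on %" PetscInt_FMT " of %d processes; the flag must be collective", count, size);
  PetscFunctionReturn(PETSC_SUCCESS);
}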


Re: [petsc-users] PETSc and MPI-3/RMA

2023-12-09 Thread Jed Brown
The communication of x goes through PetscSF (which backs VecScatter). It uses 
nonblocking point-to-point by default since that tends to perform better and is 
less prone to MPI implementation bugs, but you can select `-sf_type window` to 
try MPI-3 RMA, or use the other strategies here depending on the sort of problem 
you're working with.

#define PETSCSFBASIC  "basic"
#define PETSCSFNEIGHBOR   "neighbor"
#define PETSCSFALLGATHERV "allgatherv"
#define PETSCSFALLGATHER  "allgather"
#define PETSCSFGATHERV"gatherv"
#define PETSCSFGATHER "gather"
#define PETSCSFALLTOALL   "alltoall"
#define PETSCSFWINDOW "window"
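The same selection can be made programmatically on a particular star forest rather than globally through the option; a small sketch:

#include <petscsf.h>

PetscErrorCode UseWindowSF(PetscSF sf)
{
  PetscFunctionBeginUser;
  PetscCall(PetscSFSetType(sf, PETSCSFWINDOW)); /* MPI-3 one-sided (RMA) communication */
  PetscCall(PetscSFSetFromOptions(sf));         /* still lets -sf_type override at run time */
  PetscFunctionReturn(PETSC_SUCCESS);
}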

PETSc does try to use GPU-aware MPI, though implementation bugs are present on 
many machines and it often requires a delicate environment arrangement.

"Maeder  Alexander"  writes:

> I am a new user of PETSc
>
> and want to know more about the underlying implementation for matrix-vector 
> multiplication (Ax=y).
>
> PETSc utilizes a 1D distribution and communicates only parts of the vector x 
> utilized depending on the sparsity pattern of A.
>
> Is the communication of x done with MPI-3 RMA and utilizes cuda-aware mpi for 
> RMA?
>
>
> Best regards,
>
>
> Alexander Maeder


Re: [petsc-users] Reading VTK files in PETSc

2023-11-30 Thread Jed Brown
I assume you're working with a DA, in which case you can write in HDF5 format 
and add an Xdmf header so Paraview can read it. The utility 
lib/petsc/bin/petsc_gen_xdmf.py should be able to handle it.
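A minimal sketch of that HDF5 path (assuming a PETSc build configured with HDF5; the file name is a placeholder):

#include <petscdmda.h>
#include <petscviewerhdf5.h>

PetscErrorCode WriteDAVecHDF5(Vec globalVec, const char *filename)
{
  PetscViewer viewer;

  PetscFunctionBeginUser;
  PetscCall(PetscViewerHDF5Open(PetscObjectComm((PetscObject)globalVec), filename, FILE_MODE_WRITE, &viewer));
  PetscCall(VecView(globalVec, viewer));
  PetscCall(PetscViewerDestroy(&viewer));
  PetscFunctionReturn(PETSC_SUCCESS);
}

Running lib/petsc/bin/petsc_gen_xdmf.py on the resulting .h5 file should produce the .xmf header Paraview needs, and the same file can be read back in parallel with VecLoad on an HDF5 viewer opened with FILE_MODE_READ.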

I haven't written support for it (my problems are unstructured so I've focused 
on DMPlex), but the CGNS format supports structured meshes and that would be an 
efficient parallel format that doesn't need a header.

"Kevin G. Wang"  writes:

> Hi Jed,
>
> Thanks for your help! It does not have to be VTK (.vtr in my case). But I
> am trying to have a "seamless" workflow between reading, writing, and
> visualization, without the need of format conversions. I opted for VTK
> simply because it is easy to write, and can be directly visualized (using
> Paraview).
>
> Could you please advise as to what is the best practice in my scenario? My
> meshes are Cartesian, but non-uniform.
>
> Thanks,
> Kevin
>
> On Thu, Nov 30, 2023 at 1:02 AM Jed Brown  wrote:
>
>> Is it necessary that it be VTK format or can it be PETSc's binary format
>> or a different mesh format? VTK (be it legacy .vtk or the XML-based .vtu,
>> etc.) is a bad format for parallel reading, no matter how much effort might
>> go into an implementation.
>>
>> "Kevin G. Wang"  writes:
>>
>> > Good morning everyone.
>> >
>> > I use the following functions to output parallel vectors --- "globalVec"
>> in
>> > this example --- to VTK files. It works well, and is quite convenient.
>> >
>> > ~
>> >   PetscViewer viewer;
>> >   PetscViewerVTKOpen(PetscObjectComm((PetscObject)*dm), filename, FILE_MODE_WRITE, &viewer);
>> >   VecView(globalVec, viewer);
>> >   PetscViewerDestroy(&viewer);
>> > ~
>> >
>> > Now, I am trying to do the opposite. I would like to read the VTK files
>> > generated by PETSc back into memory, and assign each one to a Vec. Could
>> > someone let me know how this can be done?
>> >
>> > Thanks!
>> > Kevin
>> >
>> >
>> > --
>> > Kevin G. Wang, Ph.D.
>> > Associate Professor
>> > Kevin T. Crofton Department of Aerospace and Ocean Engineering
>> > Virginia Tech
>> > 1600 Innovation Dr., VTSS Rm 224H, Blacksburg, VA 24061
>> > Office: (540) 231-7547  |  Mobile: (650) 862-2663
>> > URL: https://www.aoe.vt.edu/people/faculty/wang.html
>> > Codes: https://github.com/kevinwgy
>>
>
>
> -- 
> Kevin G. Wang, Ph.D.
> Associate Professor
> Kevin T. Crofton Department of Aerospace and Ocean Engineering
> Virginia Tech
> 1600 Innovation Dr., VTSS Rm 224H, Blacksburg, VA 24061
> Office: (540) 231-7547  |  Mobile: (650) 862-2663
> URL: https://www.aoe.vt.edu/people/faculty/wang.html
> Codes: https://github.com/kevinwgy


Re: [petsc-users] Reading VTK files in PETSc

2023-11-29 Thread Jed Brown
Is it necessary that it be VTK format or can it be PETSc's binary format or a 
different mesh format? VTK (be it legacy .vtk or the XML-based .vtu, etc.) is a 
bad format for parallel reading, no matter how much effort might go into an 
implementation.
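If PETSc's binary format is acceptable, the round trip is short; a sketch (file name is a placeholder), with VecLoad doing the parallel read:

#include <petscvec.h>

PetscErrorCode RoundTripBinary(Vec out, Vec in, const char *filename)
{
  PetscViewer viewer;

  PetscFunctionBeginUser;
  PetscCall(PetscViewerBinaryOpen(PetscObjectComm((PetscObject)out), filename, FILE_MODE_WRITE, &viewer));
  PetscCall(VecView(out, viewer));
  PetscCall(PetscViewerDestroy(&viewer));

  PetscCall(PetscViewerBinaryOpen(PetscObjectComm((PetscObject)in), filename, FILE_MODE_READ, &viewer));
  PetscCall(VecLoad(in, viewer));
  PetscCall(PetscViewerDestroy(&viewer));
  PetscFunctionReturn(PETSC_SUCCESS);
}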

"Kevin G. Wang"  writes:

> Good morning everyone.
>
> I use the following functions to output parallel vectors --- "globalVec" in
> this example --- to VTK files. It works well, and is quite convenient.
>
> ~
>   PetscViewer viewer;
>   PetscViewerVTKOpen(PetscObjectComm((PetscObject)*dm), filename, FILE_MODE_WRITE, &viewer);
>   VecView(globalVec, viewer);
>   PetscViewerDestroy(&viewer);
> ~
>
> Now, I am trying to do the opposite. I would like to read the VTK files
> generated by PETSc back into memory, and assign each one to a Vec. Could
> someone let me know how this can be done?
>
> Thanks!
> Kevin
>
>
> -- 
> Kevin G. Wang, Ph.D.
> Associate Professor
> Kevin T. Crofton Department of Aerospace and Ocean Engineering
> Virginia Tech
> 1600 Innovation Dr., VTSS Rm 224H, Blacksburg, VA 24061
> Office: (540) 231-7547  |  Mobile: (650) 862-2663
> URL: https://www.aoe.vt.edu/people/faculty/wang.html
> Codes: https://github.com/kevinwgy


Re: [petsc-users] [Xolotl-psi-development] [EXTERNAL] Re: Unexpected performance losses switching to COO interface

2023-11-29 Thread Jed Brown
"Blondel, Sophie"  writes:

> Hi Jed,
>
> I'm not sure I'm going to reply to your question correctly because I don't 
> really understand how the split is done. Is it related to on diagonal and off 
> diagonal? If so, the off-diagonal part is usually pretty small (less than 20 
> DOFs) and related to diffusion, the diagonal part involves thousands of DOFs 
> for the reaction term.

From the run-time option, it'll be a default (additive) split and we're 
interested in the two diagonal blocks. One currently has a cheap solver that 
would only be efficient with a well-conditioned positive definite matrix and 
the other is using a direct solver ('redundant'). If you were to run with 
-ksp_view and share the output, it would be informative.

Either way, I'd like to understand what physics are behind the equation 
currently being solved with 'redundant'. If it's diffusive, then algebraic 
multigrid would be a good place to start.
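If that block does turn out to be diffusive, the change is mostly an options change; a sketch of setting it in code rather than on the command line (the split name assumes the default fieldsplit numbering used above):

#include <petscsys.h>

PetscErrorCode PreferGAMGForSplit1(void)
{
  PetscFunctionBeginUser;
  /* Equivalent to passing the option on the command line, provided this runs before the solver reads options */
  PetscCall(PetscOptionsSetValue(NULL, "-fieldsplit_1_pc_type", "gamg"));
  PetscFunctionReturn(PETSC_SUCCESS);
}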

> Let us know what we can do to answer this question more accurately.
>
> Cheers,
>
> Sophie
> 
> From: Jed Brown 
> Sent: Tuesday, November 28, 2023 19:07
> To: Fackler, Philip ; Junchao Zhang 
> 
> Cc: petsc-users@mcs.anl.gov ; 
> xolotl-psi-developm...@lists.sourceforge.net 
> 
> Subject: Re: [Xolotl-psi-development] [petsc-users] [EXTERNAL] Re: Unexpected 
> performance losses switching to COO interface
>
>
> "Fackler, Philip via petsc-users"  writes:
>
>> That makes sense. Here are the arguments that I think are relevant:
>>
>> -fieldsplit_1_pc_type redundant -fieldsplit_0_pc_type sor -pc_type 
>> fieldsplit -pc_fieldsplit_detect_coupling​
>
> What sort of physics are in splits 0 and 1?
>
> SOR is not a good GPU algorithm, so we'll want to change that one way or 
> another. Are the splits of similar size or very different?
>
>> What would you suggest to make this better?
>>
>> Also, note that the cases marked "serial" are running on CPU only, that is, 
>> using only the SERIAL backend for kokkos.
>>
>> Philip Fackler
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> Oak Ridge National Laboratory
>> 
>> From: Junchao Zhang 
>> Sent: Tuesday, November 28, 2023 15:51
>> To: Fackler, Philip 
>> Cc: petsc-users@mcs.anl.gov ; 
>> xolotl-psi-developm...@lists.sourceforge.net 
>> 
>> Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
>> switching to COO interface
>>
>> Hi, Philip,
>>I opened hpcdb-PSI_9-serial and it seems you used PCLU.  Since Kokkos 
>> does not have a GPU LU implementation, we do it on CPU via 
>> MatLUFactorNumeric_SeqAIJ(). Perhaps you can try other PC types?
>>
>> [Screenshot 2023-11-28 at 2.43.03 PM.png]
>> --Junchao Zhang
>>
>>
>> On Wed, Nov 22, 2023 at 10:43 AM Fackler, Philip 
>> mailto:fackle...@ornl.gov>> wrote:
>> I definitely dropped the ball on this. I'm sorry for that. I have new 
>> profiling data using the latest (as of yesterday) of petsc/main. I've put 
>> them in a single google drive folder linked here:
>>
>> https://drive.google.com/drive/folders/14ScvyfxOzc4OzXs9HZVeQDO-g6FdIVAI?usp=drive_link
>>
>> Have a happy holiday weekend!
>>
>> Thanks,
>>
>> Philip Fackler
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> Oak Ridge National Laboratory
>> 
>> From: Junchao Zhang mailto:junchao.zh...@gmail.com>>
>> Sent: Monday, October 16, 2023 15:24
>> To: Fackler, Philip mailto:fackle...@ornl.gov>>
>> Cc: petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov> 
>> mailto:petsc-users@mcs.anl.gov>>; 
>> xolotl-psi-developm...@lists.sourceforge.net<mailto:xolotl-psi-developm...@lists.sourceforge.net>
>>  
>> mailto:xolotl-psi-developm...@lists.sourceforge.net>>
>> Subject: Re: [

Re: [petsc-users] [EXTERNAL] Re: Unexpected performance losses switching to COO interface

2023-11-28 Thread Jed Brown
"Fackler, Philip via petsc-users"  writes:

> That makes sense. Here are the arguments that I think are relevant:
>
> -fieldsplit_1_pc_type redundant -fieldsplit_0_pc_type sor -pc_type fieldsplit 
> -pc_fieldsplit_detect_coupling​

What sort of physics are in splits 0 and 1?

SOR is not a good GPU algorithm, so we'll want to change that one way or 
another. Are the splits of similar size or very different?

> What would you suggest to make this better?
>
> Also, note that the cases marked "serial" are running on CPU only, that is, 
> using only the SERIAL backend for kokkos.
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
> 
> From: Junchao Zhang 
> Sent: Tuesday, November 28, 2023 15:51
> To: Fackler, Philip 
> Cc: petsc-users@mcs.anl.gov ; 
> xolotl-psi-developm...@lists.sourceforge.net 
> 
> Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
> switching to COO interface
>
> Hi, Philip,
>I opened hpcdb-PSI_9-serial and it seems you used PCLU.  Since Kokkos does 
> not have a GPU LU implementation, we do it on CPU via 
> MatLUFactorNumeric_SeqAIJ(). Perhaps you can try other PC types?
>
> [Screenshot 2023-11-28 at 2.43.03 PM.png]
> --Junchao Zhang
>
>
> On Wed, Nov 22, 2023 at 10:43 AM Fackler, Philip 
> mailto:fackle...@ornl.gov>> wrote:
> I definitely dropped the ball on this. I'm sorry for that. I have new 
> profiling data using the latest (as of yesterday) of petsc/main. I've put 
> them in a single google drive folder linked here:
>
> https://drive.google.com/drive/folders/14ScvyfxOzc4OzXs9HZVeQDO-g6FdIVAI?usp=drive_link
>
> Have a happy holiday weekend!
>
> Thanks,
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
> 
> From: Junchao Zhang mailto:junchao.zh...@gmail.com>>
> Sent: Monday, October 16, 2023 15:24
> To: Fackler, Philip mailto:fackle...@ornl.gov>>
> Cc: petsc-users@mcs.anl.gov 
> mailto:petsc-users@mcs.anl.gov>>; 
> xolotl-psi-developm...@lists.sourceforge.net
>  
> mailto:xolotl-psi-developm...@lists.sourceforge.net>>
> Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
> switching to COO interface
>
> Hi, Philip,
>That branch was merged to petsc/main today. Let me know once you have new 
> profiling results.
>
>Thanks.
> --Junchao Zhang
>
>
> On Mon, Oct 16, 2023 at 9:33 AM Fackler, Philip 
> mailto:fackle...@ornl.gov>> wrote:
> Junchao,
>
> I've attached updated timing plots (red and blue are swapped from before; 
> yellow is the new one). There is an improvement for the NE_3 case only with 
> CUDA. Serial stays the same, and the PSI cases stay the same. In the PSI 
> cases, MatShift doesn't show up (I assume because we're using different 
> preconditioner arguments). So, there must be some other primary culprit. I'll 
> try to get updated profiling data to you soon.
>
> Thanks,
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
> 
> From: Fackler, Philip via Xolotl-psi-development 
> mailto:xolotl-psi-developm...@lists.sourceforge.net>>
> Sent: Wednesday, October 11, 2023 11:31
> To: Junchao Zhang mailto:junchao.zh...@gmail.com>>
> Cc: petsc-users@mcs.anl.gov 
> mailto:petsc-users@mcs.anl.gov>>; 
> xolotl-psi-developm...@lists.sourceforge.net
>  
> mailto:xolotl-psi-developm...@lists.sourceforge.net>>
> Subject: Re: [Xolotl-psi-development] [EXTERNAL] Re: [petsc-users] Unexpected 
> performance losses switching to COO interface
>
> I'm on it.
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
> 
> From: Junchao Zhang mailto:junchao.zh...@gmail.com>>
> Sent: Wednesday, October 11, 2023 10:14
> To: Fackler, Philip mailto:fackle...@ornl.gov>>
> Cc: petsc-users@mcs.anl.gov 
> mailto:petsc-users@mcs.anl.gov>>; 
> 

Re: [petsc-users] Storing Values using a Triplet for using later

2023-11-08 Thread Jed Brown
I don't think you want to hash floating point values, but I've had a number of 
reasons to want spatial hashing for near-neighbor queries in PETSc and that 
would be a great contribution. (Spatial hashes have a length scale and compute 
integer bins.)
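A sketch of the idea in plain C (no particular PETSc hash table implied): bins are integer triples computed from coordinates with a chosen length scale h, and the packed bin key, not the floating-point centroid, is what gets hashed.

#include <petscsys.h>

/* Map a point to integer bin indices using a characteristic length h, then pack them
   into a single 64-bit key for a hash table. The 21-bit-per-axis packing is an
   illustrative choice, not a PETSc API. */
static inline PetscInt64 SpatialHashKey(PetscReal x, PetscReal y, PetscReal z, PetscReal h)
{
  const PetscInt64 ix   = (PetscInt64)PetscFloorReal(x / h);
  const PetscInt64 iy   = (PetscInt64)PetscFloorReal(y / h);
  const PetscInt64 iz   = (PetscInt64)PetscFloorReal(z / h);
  const PetscInt64 mask = ((PetscInt64)1 << 21) - 1; /* assumes bins fit in 21 bits per axis */

  return ((ix & mask) << 42) | ((iy & mask) << 21) | (iz & mask);
}

Near-neighbor queries then look in the 27 surrounding bins, so centroids that differ only by roundoff land in the same or an adjacent bin instead of producing distinct keys.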

Brandon Denton via petsc-users  writes:

> Good Afternoon,
>
> Is there a structure within PETSc that allows storage of a value using a 
> triple similar to PetscHMapIJSet with the key using a struct{PetscScalar i, 
> j, k;}?
>
> I'm trying to access mesh information (the shape function coefficients I will 
> calculate prior to their use) who's values I want to store in the auxiliary 
> array available in the Residual Functions of PETSc's FEM infrastructure. 
> After some trial and error work, I've come to the realization that the 
> coordinates (x[]) available in the auxiliary functions is the centroid of the 
> cell/element currently being evaluated. This triplet is unique for each 
> cell/element for a valid mesh so I think it's reasonable to use this triplet 
> as a key for looking up stored values unique to each cell/element. My plan is 
> to attached the map to the Application Context, also available to Auxiliary 
> Functions, to enable these calculations.
>
> Does such a map infrastructure exist within PETSc? If so, could you point me 
> to a reference for it? If not, does anyone have any suggestions on how to 
> solve this problem?
>
> Thank you in advance for your time.
> Brandon Denton


Re: [petsc-users] Better solver and preconditioner to use multiple GPU

2023-11-08 Thread Jed Brown
What sort of problem are you solving? Algebraic multigrid like gamg or hypre 
are good choices for elliptic problems. Sparse triangular solves have horrific 
efficiency even on one GPU so you generally want to do your best to stay away 
from them.
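
For concreteness -- and only if the operator turns out to be elliptic-dominated -- a first thing to try on a CUDA build of PETSc would be along the lines of

  -mat_type aijcusparse -vec_type cuda -ksp_type bcgs -pc_type gamg

or, with a hypre build that has GPU support enabled,

  -mat_type aijcusparse -vec_type cuda -ksp_type bcgs -pc_type hypre -pc_hypre_type boomeramg

These option names are standard, but whether they help at all depends on the answer to the question above.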

"Ramoni Z. Sedano Azevedo"  writes:

> Hey!
>
> I am using PETSC in Fortran code and we apply the MPI process to
> parallelize the code.
>
> At the moment, the options that have been used are
> -ksp_monitor_true_residual
> -ksp_type bcgs
> -pc_type bjacobi
> -sub_pc_type ilu
> -sub_pc_factor_levels 3
> -sub_pc_factor_fill 6
>
> Now, we want to use multiple GPUs and I would like to know if there is a
> better solver and preconditioner pair to apply in this case.
>
> Yours sincerely,
> Ramoni Z. S . Azevedo


Re: [petsc-users] Status of PETScSF failures with GPU-aware MPI on Perlmutter

2023-11-02 Thread Jed Brown
What modules do you have loaded? I don't know if it currently works with 
cuda-11.7. I assume you're following these instructions carefully.

https://docs.nersc.gov/development/programming-models/mpi/cray-mpich/#cuda-aware-mpi

In our experience, GPU-aware MPI continues to be brittle on these machines. 
Maybe you can inquire with NERSC exactly which CUDA versions are tested with 
GPU-aware MPI.

Sajid Ali  writes:

> Hi PETSc-developers,
>
> I had posted about crashes within PETScSF when using GPU-aware MPI on
> Perlmutter a while ago (
> https://lists.mcs.anl.gov/mailman/htdig/petsc-users/2022-February/045585.html).
> Now that the software stacks have stabilized, I was wondering if there was
> a fix for the same as I am still observing similar crashes.
>
> I am attaching the trace of the latest crash (with PETSc-3.20.0) for
> reference.
>
> Thank You,
> Sajid Ali (he/him) | Research Associate
> Data Science, Simulation, and Learning Division
> Fermi National Accelerator Laboratory
> s-sajid-ali.github.io


Re: [petsc-users] Advices on creating DMPlex from custom input format

2023-10-30 Thread Jed Brown
It's probably easier to apply boundary conditions when you have the serial 
mesh. You may consider contributing the reader if it's a format that others use.
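
A minimal sketch of that flow (untested; the arrays and the reader itself stand in for your custom format):

  DM         dm, dmDist;
  PetscInt   dim = 3, numCells = 0, numVertices = 0, numCorners = 0;
  PetscInt  *cells  = NULL;  /* numCells*numCorners vertex indices, filled on rank 0 */
  PetscReal *coords = NULL;  /* numVertices*dim coordinates, filled on rank 0 */

  /* rank 0 reads the custom file and fills the arrays; the other ranks keep the sizes at 0 */
  PetscCall(DMPlexCreateFromCellListPetsc(PETSC_COMM_WORLD, dim, numCells, numVertices, numCorners,
                                          PETSC_TRUE /* interpolate */, cells, dim, coords, &dm));
  /* attach boundary labels here, while the numbering still matches the file */
  PetscCall(DMPlexDistribute(dm, 0, NULL, &dmDist));
  if (dmDist) {
    PetscCall(DMDestroy(&dm));
    dm = dmDist;
  }

Labels set on the serial DM are migrated by DMPlexDistribute, so the boundary bookkeeping does not need to know anything about the final partition.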

"onur.notonur via petsc-users"  writes:

> Hi,
>
> I hope this message finds you all in good health and high spirits.
>
> I wanted to discuss an approach problem input file reading/processing in a 
> solver which is using PETSc DMPlex. In our team we have a range of solvers, 
> they are not built on PETSc except this one, but they all share a common 
> problem input format. This format includes essential data such as node 
> coordinates, element connectivity, boundary conditions based on elements, and 
> specific metadata related to the problem. I create an array for boundary 
> points on each rank and utilize them in our computations, I am doing it 
> hardcoded currently but I need to start reading those input files, But I am 
> not sure about the approach.
>
> Here's what I have in mind:
>
> - - Begin by reading the node coordinates and connectivity on a single core.
> - Utilize the DMPlexCreateFromCellListPetsc() function to construct the 
> DMPlex.
> - Distribute the mesh across processors.
> - Proceed to read and process the boundary conditions on each processor. If 
> the global index of the boundary element corresponds to that processor, we 
> process it; otherwise, we pass.
>
> Additionally, maybe I need to reorder the mesh. In that case I think I can 
> use the point permutation IS obtained from the DMPlexGetOrdering() function 
> while processing boundary conditions.
>
> Also I have another approach in my mind but I don't know if it's possible: 
> Read/construct DMPlex on single core including boundary conditions. Store BC 
> related data in Vec or another appropriate data structure. Then distribute 
> this BC holding data structure too as well as DMPlex.
>
> I would greatly appreciate your thoughts and any suggestions you might have 
> regarding this approach. Looking forward to hearing your insights.
>
> Best regards,
>
> Onur


Re: [petsc-users] Copying PETSc Objects Across MPI Communicators

2023-10-24 Thread Jed Brown
You can place it in a parallel Mat (that has rows or columns on only one rank 
or a subset of ranks) and then MatCreateSubMatrix with all new rows/columns on 
a different rank or subset of ranks.

That said, you usually have a function that assembles the matrix and you can 
just call that on the new communicator.
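
For example, a minimal sketch (untested; "target" and the everything-on-one-rank layout are assumptions for illustration) that moves a Mat A living entirely on one rank of comm so the copy lives entirely on rank target of the same comm:

  PetscMPIInt rank;
  PetscInt    M, N;
  IS          isrow, iscol;
  Mat         B;

  PetscCallMPI(MPI_Comm_rank(comm, &rank));
  PetscCall(MatGetSize(A, &M, &N));
  /* each rank lists the rows/columns it should own in the result: everything on target, nothing elsewhere */
  PetscCall(ISCreateStride(comm, rank == target ? M : 0, 0, 1, &isrow));
  PetscCall(ISCreateStride(comm, rank == target ? N : 0, 0, 1, &iscol));
  PetscCall(MatCreateSubMatrix(A, isrow, iscol, MAT_INITIAL_MATRIX, &B));
  PetscCall(ISDestroy(&isrow));
  PetscCall(ISDestroy(&iscol));

Note that B still lives on comm; the point is only that its entries and ownership now sit on the ranks you want.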

Damyn Chipman  writes:

> Hi PETSc developers,
>
> In short, my question is this: Does PETSc provide a way to move or copy an 
> object (say a Mat) from one communicator to another?
>
> The more detailed scenario is this: I’m working on a linear algebra solver on 
> quadtree meshes (i.e., p4est). I use communicator subsets in order to 
> facilitate communication between siblings or nearby neighbors. When 
> performing linear algebra across siblings (a group of 4), I need to copy a 
> node’s data (i.e., a Mat object) from a sibling’s communicator to the 
> communicator that includes the four siblings. From what I can tell, I can 
> only copy a PETSc object onto the same communicator.
>
> My current approach will be to copy the raw data from the Mat on one 
> communicator to a new Mat on the new communicator, but I wanted to see if 
> there is a more “elegant” approach within PETSc.
>
> Thanks in advance,
>
> Damyn Chipman
> Boise State University
> PhD Candidate
> Computational Sciences and Engineering
> damynchip...@u.boisestate.edu


Re: [petsc-users] FEM Implementation of NS with SUPG Stabilization

2023-10-11 Thread Jed Brown
Matthew Knepley  writes:

> On Wed, Oct 11, 2023 at 1:03 PM Jed Brown  wrote:
>
>> I don't see an attachment, but his thesis used conservative variables and
>> defined an effective length scale in a way that seemed to assume constant
>> shape function gradients. I'm not aware of systematic literature comparing
>> the covariant and contravariant length measures on anisotropic meshes, but
>> I believe most people working in the Shakib/Hughes approach use the
>> covariant measure. Our docs have a brief discussion of this choice.
>>
>> https://libceed.org/en/latest/examples/fluids/#equation-eq-peclet
>>
>> Matt, I don't understand how the second derivative comes into play as a
>> length measure on anisotropic meshes -- the second derivatives can be
>> uniformly zero and yet you still need a length measure.
>>
>
> I was talking about the usual SUPG where we just penalize the true residual.

I think you're focused on computing the strong diffusive flux (which can be 
done using second derivatives or by a projection; the latter produces somewhat 
better results). But you still need a length scale and that's most naturally 
computed using the derivative of reference coordinates with respect to physical 
(or equivalently, the associated metric tensor).
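
For reference, by the covariant measure I mean (one standard form; constants and the precise tau definition vary between formulations):

  g_{ij} = \sum_k (\partial \xi_k / \partial x_i)(\partial \xi_k / \partial x_j),   h_u = 2 / \sqrt{u^T g u},

with \xi the reference coordinates, x the physical coordinates, and u the velocity. That directional length stays well defined even when every second derivative of the discrete solution is zero.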


Re: [petsc-users] FEM Implementation of NS with SUPG Stabilization

2023-10-11 Thread Jed Brown
I don't see an attachment, but his thesis used conservative variables and 
defined an effective length scale in a way that seemed to assume constant shape 
function gradients. I'm not aware of systematic literature comparing the 
covariant and contravariant length measures on anisotropic meshes, but I 
believe most people working in the Shakib/Hughes approach use the covariant 
measure. Our docs have a brief discussion of this choice.

https://libceed.org/en/latest/examples/fluids/#equation-eq-peclet

Matt, I don't understand how the second derivative comes into play as a length 
measure on anisotropic meshes -- the second derivatives can be uniformly zero 
and yet you still need a length measure.

Brandon Denton via petsc-users  writes:

> I was thinking about trying to implement Ben Kirk's approach to Navier-Stokes 
> (see attached paper; Section 5). His approach uses these quantities to align 
> the orientation of the unstructured element/cell with the fluid velocity to 
> apply the stabilization/upwinding and to detect shocks.
>
> If you have an example of the approach you mentioned, could you please send 
> it over so I can review it?
>
> On Oct 11, 2023 6:02 AM, Matthew Knepley  wrote:
> On Tue, Oct 10, 2023 at 9:34 PM Brandon Denton via petsc-users <petsc-users@mcs.anl.gov> wrote:
> Good Evening,
>
> I am looking to implement a form of Navier-Stokes with SUPG Stabilization and 
> shock capturing using PETSc's FEM infrastructure. In this implementation, I 
> need access to the cell's shape function gradients and natural coordinate 
> gradients for calculations within the point-wise residual calculations. How 
> do I get these quantities at the quadrature points? The signatures for fo and 
> f1 don't seem to contain this information.
>
> Are you sure you need those? Darsh and I implemented SUPG without that. You 
> would need local second derivative information, which you can get using 
> -dm_ds_jet_degree 2. If you check in an example, I can go over it.
>
>   Thanks,
>
>  Matt
>
> Thank you in advance for your time.
> Brandon
>
>
> --
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/


Re: [petsc-users] FEM Implementation of NS with SUPG Stabilization

2023-10-10 Thread Jed Brown
Do you want to write a new code using only PETSc or would you be up for 
collaborating on ceed-fluids, which is a high-performance compressible SUPG 
solver based on DMPlex with good GPU support? It uses the metric to compute 
covariant length for stabilization. We have YZβ shock capturing, though it 
hasn't been tested much beyond shock tube experiments. (Most of our work has 
been low Mach.)

https://libceed.org/en/latest/examples/fluids/
https://github.com/CEED/libCEED/blob/main/examples/fluids/qfunctions/stabilization.h#L76


On Tue, Oct 10, 2023, at 7:34 PM, Brandon Denton via petsc-users wrote:
> Good Evening,
> 
> I am looking to implement a form of Navier-Stokes with SUPG Stabilization and 
> shock capturing using PETSc's FEM infrastructure. In this implementation, I 
> need access to the cell's shape function gradients and natural coordinate 
> gradients for calculations within the point-wise residual calculations. How 
> do I get these quantities at the quadrature points? The signatures for fo and 
> f1 don't seem to contain this information.
> 
> Thank you in advance for your time.
> Brandon


Re: [petsc-users] Orthogonalization of a (sparse) PETSc matrix

2023-08-29 Thread Jed Brown
Suitesparse includes a sparse QR algorithm. The main issue is that (even with 
pivoting) the R factor has the same nonzero structure as a Cholesky factor of 
A^T A, which is generally much denser than a factor of A, and this degraded 
sparsity impacts Q as well.

I wonder if someone would like to contribute a sparse QR to PETSc. It could 
have a default implementation via Cholesky QR and the ability to call SPQR from 
Suitesparse.
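
For reference, the Cholesky QR fallback is just (one standard formulation; pivoting and conditioning safeguards omitted):

  A^T A = R^T R   (Cholesky, R upper triangular),   Q = A R^{-1},

so R has exactly the A^T A sparsity discussed above and Q is recovered from A by triangular solves.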

Barry Smith  writes:

>   Are the nonzero structures of all the rows related? If they are, one could 
> devise a routine to take advantage of this relationship, but if the nonzero 
> structures of each row are "randomly" different from all the other rows, then 
> it is difficult to see how one can take advantage of the sparsity.
>
>
>
>> On Aug 29, 2023, at 12:50 PM, Thanasis Boutsikakis 
>>  wrote:
>> 
>> Hi all, I have the following code that orthogonalizes a PETSc matrix. The 
>> problem is that this implementation requires that the PETSc matrix is dense, 
>> otherwise, it fails at bv.SetFromOptions(). Hence the assert in 
>> orthogonality().
>> 
>> What could I do in order to be able to orthogonalize sparse matrices as 
>> well? Could I convert it efficiently? (I tried to no avail)
>> 
>> Thanks!
>> 
>> """Experimenting with matrix orthogonalization"""
>> 
>> import contextlib
>> import sys
>> import time
>> import numpy as np
>> from firedrake import COMM_WORLD
>> from firedrake.petsc import PETSc
>> 
>> import slepc4py
>> 
>> slepc4py.init(sys.argv)
>> from slepc4py import SLEPc
>> 
>> from numpy.testing import assert_array_almost_equal
>> 
>> EPSILON_USER = 1e-4
>> EPS = sys.float_info.epsilon
>> 
>> 
>> def Print(message: str):
>> """Print function that prints only on rank 0 with color
>> 
>> Args:
>> message (str): message to be printed
>> """
>> PETSc.Sys.Print(message)
>> 
>> 
>> def create_petsc_matrix(input_array, sparse=True):
>> """Create a PETSc matrix from an input_array
>> 
>> Args:
>> input_array (np array): Input array
>> partition_like (PETSc mat, optional): Petsc matrix. Defaults to None.
>> sparse (bool, optional): Toggle for sparese or dense. Defaults to 
>> True.
>> 
>> Returns:
>> PETSc mat: PETSc matrix
>> """
>> # Check if input_array is 1D and reshape if necessary
>> assert len(input_array.shape) == 2, "Input array should be 2-dimensional"
>> global_rows, global_cols = input_array.shape
>> 
>> size = ((None, global_rows), (global_cols, global_cols))
>> 
>> # Create a sparse or dense matrix based on the 'sparse' argument
>> if sparse:
>> matrix = PETSc.Mat().createAIJ(size=size, comm=COMM_WORLD)
>> else:
>> matrix = PETSc.Mat().createDense(size=size, comm=COMM_WORLD)
>> matrix.setUp()
>> 
>> local_rows_start, local_rows_end = matrix.getOwnershipRange()
>> 
>> for counter, i in enumerate(range(local_rows_start, local_rows_end)):
>> # Calculate the correct row in the array for the current process
>> row_in_array = counter + local_rows_start
>> matrix.setValues(
>> i, range(global_cols), input_array[row_in_array, :], addv=False
>> )
>> 
>> # Assembly the matrix to compute the final structure
>> matrix.assemblyBegin()
>> matrix.assemblyEnd()
>> 
>> return matrix
>> 
>> 
>> def orthogonality(A):  # sourcery skip: avoid-builtin-shadow
>> """Checking and correcting orthogonality
>> 
>> Args:
>> A (PETSc.Mat): Matrix of size [m x k].
>> 
>> Returns:
>> PETSc.Mat: Matrix of size [m x k].
>> """
>> # Check if the matrix is dense
>> mat_type = A.getType()
>> assert mat_type in (
>> "seqdense",
>> "mpidense",
>> ), "A must be a dense matrix. SLEPc.BV().createFromMat() requires a 
>> dense matrix."
>> 
>> m, k = A.getSize()
>> 
>> Phi1 = A.getColumnVector(0)
>> Phi2 = A.getColumnVector(k - 1)
>> 
>> # Compute dot product using PETSc function
>> dot_product = Phi1.dot(Phi2)
>> 
>> if abs(dot_product) > min(EPSILON_USER, EPS * m):
>> Print("Matrix is not orthogonal")
>> 
>> # Type can be CHOL, GS, mro(), SVQB, TSQR, TSQRCHOL
>> _type = SLEPc.BV().OrthogBlockType.GS
>> 
>> bv = SLEPc.BV().createFromMat(A)
>> bv.setFromOptions()
>> bv.setOrthogonalization(_type)
>> bv.orthogonalize()
>> 
>> A = bv.createMat()
>> 
>> Print("Matrix successfully orthogonalized")
>> 
>> # # Assembly the matrix to compute the final structure
>> if not A.assembled:
>> A.assemblyBegin()
>> A.assemblyEnd()
>> else:
>> Print("Matrix is orthogonal")
>> 
>> return A
>> 
>> 
>> # 
>> # EXP: Orthogonalization of an mpi PETSc matrix
>> # 
>> 
>> m, k = 11, 7
>> # Generate the random numpy matrices
>> 

Re: [petsc-users] 32-bit vs 64-bit GPU support

2023-08-11 Thread Jed Brown
Jacob Faibussowitsch  writes:

> More generally, it would be interesting to know the breakdown of installed 
> CUDA versions for users. Unlike compilers etc, I suspect that cluster admins 
> (and those running on local machines) are much more likely to be updating 
> their CUDA toolkits to the latest versions as they often contain critical 
> performance improvements.

One difference is that some sites (not looking at you at all, ALCF) still run 
pretty ancient drivers and/or have broken GPU-aware MPI with all but a specific 
ancient version of CUDA (OLCF, LLNL). With a normal compiler, you can choose to 
use the latest version, but with CUDA, people are firmly stuck on old versions.


Re: [petsc-users] 32-bit vs 64-bit GPU support

2023-08-11 Thread Jed Brown
Rohan Yadav  writes:

> With modern GPU sizes, for example A100's with 80GB of memory, a vector of
> length 2^31 is not that much memory -- one could conceivably run a CG solve
> with local vectors > 2^31.

Yeah, each vector would be 8 GB (single precision) or 16 GB (double). You can't 
store a matrix of this size, and probably not a "mesh", but it's possible to 
create such a problem if everything is matrix-free (possibly with matrix-free 
geometric multigrid). This is more likely to show up in a benchmark than any 
real science or engineering problem. We should support it, but it still seems 
hypothetical and not urgent.

> Thanks Junchao, I might look into that. However, I currently am not trying
> to solve such a large problem -- these questions just came from wondering
> why the cuSPARSE kernel PETSc was calling was running faster than mine.

Hah, bandwidth doesn't lie. ;-)


Re: [petsc-users] Setting a custom predictor in the generalized-alpha time stepper

2023-08-04 Thread Jed Brown
Yeah, we'd like the implementation to stay in alpha2.c. There could be a new 
interface TSAlpha2SetPredictorType (with -ts_alpha2_predictor_type 
[none,same_velocity,...]) or TSAlpha2SetPredictorFunction.
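
Purely as a sketch of the second option (nothing like this exists in PETSc today; the name and arguments are assumptions meant to make the proposal concrete), the callback could look like

  typedef PetscErrorCode (*TSAlpha2PredictorFn)(TS ts, Vec X0, Vec V0, Vec A0, Vec X1, void *ctx);
  PetscErrorCode TSAlpha2SetPredictorFunction(TS ts, TSAlpha2PredictorFn predictor, void *ctx);

where the user fills X1 (the initial guess for the new step) from the converged X0/V0/A0 -- which is exactly the data currently locked inside TS_Alpha.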

David Kamensky  writes:

> Hi Jed,
>
> The current workaround I'm using is very minimal and basically just moves
> the definition of `TS_Alpha` from `alpha2.c` up to `petsc/private/tsimpl.h`
> (and renames it to avoid a conflict with `TS_Alpha` in `alpha1.c`), but I
> gather that we're really still not "supposed to" include that header in
> applications.  So, I don't know whether something like that would be
> welcomed upstream.  The actual computation of the predictor is done on the
> application side.
>
> Having options like `-ts_alpha_same_velocity` and
> `-ts_alpha_same_acceleration` could probably be implemented by analogy to
> `-ts_theta_initial_guess_extrapolate`, although they wouldn't quite cover
> my specific use-case, where I'm only setting the predictor on part of the
> solution vector.  So, maybe something more general, like providing a
> generalized-$\alpha$-specific option for a custom predictor callback that
> takes `X0`, `V0`, and `A0` arguments would be the cleanest solution (and
> some convenient shortcuts for full-solution same-velocity and
> same-acceleration predictors could subsequently make use of that
> infrastructure).  I've been working quickly over the past week, but I might
> be able to take some time to implement a more sustainable solution soon.
>
> Thanks again,
> David
>
> On Fri, Aug 4, 2023 at 9:23 AM Jed Brown  wrote:
>
>> Some other TS implementations have a concept of extrapolation as an
>> initial guess. Such method-specific initial guesses sound like they fit
>> that pattern and would be welcome to be included in alpha2.c. Would you be
>> willing to make a merge request to bring your work upstream?
>>
>> David Kamensky  writes:
>>
>> > Hi Jed,
>> >
>> > What I'm trying to compute is basically a standard same-velocity or
>> > same-acceleration predictor (although slightly more complicated, since
>> I'm
>> > restricting it to a sub-system).  I hadn't looked into
>> > `SNESSetComputeInitialGuess` yet, although one difficulty is that it
>> would
>> > need access to the `X0`, `V0`, and `A0` members of the `TS_Alpha` struct,
>> > which is only defined in `alpha2.c`, and thus not available through the
>> > API.
>> >
>> > For now, we just worked around this by patching PETSc to move the
>> > definition of `TS_Alpha` up into a header to make it accessible.
>> > (Modifying the library obviously introduces a maintenance headache; I
>> also
>> > considered just casting the `ts->data` pointer to `(char*)`, calculating
>> > memory offsets based on `sizeof` the struct members, and casting back to
>> > `Vec`, but that relies on compiler-specific assumptions, and could also
>> > break if the PETSc source code was updated.)  We also shuffled the order
>> of
>> > some calls to `VecCopy` and `TSPreStage` in the routine
>> `TSAlpha_Restart`,
>> > so that `TSPreStage` can set the initial guess, although that sounds like
>> > it would be unnecessary if we instead used a callback in
>> > `SNESSetComputeInitialGuess` that had access to the internals of
>> > `TS_Alpha`.
>> >
>> > Thanks, David
>> >
>> > On Thu, Aug 3, 2023 at 11:28 PM Jed Brown  wrote:
>> >
>> >> I think you can use TSGetSNES() and SNESSetComputeInitialGuess() to
>> modify
>> >> the initial guess for SNES. Would that serve your needs? Is there
>> anything
>> >> else you can say about how you'd like to compute this initial guess? Is
>> >> there a paper or something?
>> >>
>> >> David Kamensky  writes:
>> >>
>> >> > Hi,
>> >> >
>> >> > My understanding is that the second-order generalized-alpha time
>> stepper
>> >> in
>> >> > PETSc uses a same-displacement predictor as the initial guess for the
>> >> > nonlinear solver that executes in each time step.  I'd like to be
>> able to
>> >> > set this to something else, to improve convergence.  However, my
>> >> > (possibly-naive) attempts to use `TSSetPreStep` and `TSSetPreStage`
>> >> haven't
>> >> > worked out.  Is there any way to set a custom predictor?
>> >> >
>> >> > Thanks,
>> >> > David Kamensky
>> >>
>>


Re: [petsc-users] Setting a custom predictor in the generalized-alpha time stepper

2023-08-04 Thread Jed Brown
Some other TS implementations have a concept of extrapolation as an initial 
guess. Such method-specific initial guesses sound like they fit that pattern 
and would be welcome to be included in alpha2.c. Would you be willing to make a 
merge request to bring your work upstream? 

David Kamensky  writes:

> Hi Jed,
>
> What I'm trying to compute is basically a standard same-velocity or
> same-acceleration predictor (although slightly more complicated, since I'm
> restricting it to a sub-system).  I hadn't looked into
> `SNESSetComputeInitialGuess` yet, although one difficulty is that it would
> need access to the `X0`, `V0`, and `A0` members of the `TS_Alpha` struct,
> which is only defined in `alpha2.c`, and thus not available through the
> API.
>
> For now, we just worked around this by patching PETSc to move the
> definition of `TS_Alpha` up into a header to make it accessible.
> (Modifying the library obviously introduces a maintenance headache; I also
> considered just casting the `ts->data` pointer to `(char*)`, calculating
> memory offsets based on `sizeof` the struct members, and casting back to
> `Vec`, but that relies on compiler-specific assumptions, and could also
> break if the PETSc source code was updated.)  We also shuffled the order of
> some calls to `VecCopy` and `TSPreStage` in the routine `TSAlpha_Restart`,
> so that `TSPreStage` can set the initial guess, although that sounds like
> it would be unnecessary if we instead used a callback in
> `SNESSetComputeInitialGuess` that had access to the internals of
> `TS_Alpha`.
>
> Thanks, David
>
> On Thu, Aug 3, 2023 at 11:28 PM Jed Brown  wrote:
>
>> I think you can use TSGetSNES() and SNESSetComputeInitialGuess() to modify
>> the initial guess for SNES. Would that serve your needs? Is there anything
>> else you can say about how you'd like to compute this initial guess? Is
>> there a paper or something?
>>
>> David Kamensky  writes:
>>
>> > Hi,
>> >
>> > My understanding is that the second-order generalized-alpha time stepper
>> in
>> > PETSc uses a same-displacement predictor as the initial guess for the
>> > nonlinear solver that executes in each time step.  I'd like to be able to
>> > set this to something else, to improve convergence.  However, my
>> > (possibly-naive) attempts to use `TSSetPreStep` and `TSSetPreStage`
>> haven't
>> > worked out.  Is there any way to set a custom predictor?
>> >
>> > Thanks,
>> > David Kamensky
>>


Re: [petsc-users] Setting a custom predictor in the generalized-alpha time stepper

2023-08-04 Thread Jed Brown
I think you can use TSGetSNES() and SNESSetComputeInitialGuess() to modify the 
initial guess for SNES. Would that serve your needs? Is there anything else you 
can say about how you'd like to compute this initial guess? Is there a paper or 
something?

David Kamensky  writes:

> Hi,
>
> My understanding is that the second-order generalized-alpha time stepper in
> PETSc uses a same-displacement predictor as the initial guess for the
> nonlinear solver that executes in each time step.  I'd like to be able to
> set this to something else, to improve convergence.  However, my
> (possibly-naive) attempts to use `TSSetPreStep` and `TSSetPreStage` haven't
> worked out.  Is there any way to set a custom predictor?
>
> Thanks,
> David Kamensky


Re: [petsc-users] Scalable Solver for Incompressible Flow

2023-07-31 Thread Jed Brown
I would email the author. He's been helpful in the past and that newer paper 
may have been extensions that didn't make it into the upstream example.

Alexander Lindsay  writes:

> Maybe that example was a jumping point for some of those studies, but it
> looks to me like that example has been around since ~2012 and initially
> only touched on SIMPLE, as opposed to addition of SIMPLE into an
> augmented lagrange scheme.
>
> But it does seem that at some point Carola Kruse added Golub-Kahan
> bidiagonalization tests to ex70. I don't know very much about that although
> it seems to be related to AL methods ... but requires that the matrix be
> symmetric?
>
> On Fri, Jul 28, 2023 at 7:04 PM Jed Brown  wrote:
>
>> See src/snes/tutorials/ex70.c for the code that I think was used for that
>> paper.
>>
>> Alexander Lindsay  writes:
>>
>> > Sorry for the spam. Looks like these authors have published multiple
>> papers on the subject
>> >
>> > cover.jpg
>> > Combining the Augmented Lagrangian Preconditioner with the Simple Schur
>> Complement Approximation | SIAM Journal on
>> > Scientific Computingdoi.org
>> > cover.jpg
>> >
>> >  On Jul 28, 2023, at 12:59 PM, Alexander Lindsay <
>> alexlindsay...@gmail.com> wrote:
>> >
>> >  Do you know of anyone who has applied the augmented Lagrange
>> methodology to a finite volume discretization?
>> >
>> >  On Jul 6, 2023, at 6:25 PM, Matthew Knepley  wrote:
>> >
>> >  On Thu, Jul 6, 2023 at 8:30 PM Alexander Lindsay <
>> alexlindsay...@gmail.com> wrote:
>> >
>> >  This is an interesting article that compares a multi-level ILU
>> algorithm to approximate commutator and augmented
>> >  Lagrange methods: https://doi.org/10.1002/fld.5039
>> >
>> >  That is for incompressible NS. The results are not better than
>> https://arxiv.org/abs/1810.03315, and that PC is considerably
>> >  simpler and already implemented in PETSc. There is an update in to this
>> >
>> >
>> >
>> https://epubs.siam.org/doi/abs/10.1137/21M1430698?casa_token=Fp_XhuZStZ0A:YDhnkW9XvAom_b8KocWz-hBEI7FAt46aw3ICa0FvCrOVCtYr9bwvtqJ4aBOTkDSvANKh6YTQEw
>> >
>> >
>> >  which removes the need for complicated elements.
>> >
>> >  You might need stuff like ILU for compressible flow, but I think
>> incompressible is solved.
>> >
>> >Thanks,
>> >
>> >   Matt
>> >
>> >  On Wed, Jun 28, 2023 at 11:37 AM Alexander Lindsay <
>> alexlindsay...@gmail.com> wrote:
>> >
>> >  I do believe that based off the results in
>> https://doi.org/10.1137/040608817 we should be able to make LSC, with
>> >  proper scaling, compare very favorably with PCD
>> >
>> >  On Tue, Jun 27, 2023 at 10:41 AM Alexander Lindsay <
>> alexlindsay...@gmail.com> wrote:
>> >
>> >  I've opened https://gitlab.com/petsc/petsc/-/merge_requests/6642 which
>> adds a couple more scaling
>> >  applications of the inverse of the diagonal of A
>> >
>> >  On Mon, Jun 26, 2023 at 6:06 PM Alexander Lindsay <
>> alexlindsay...@gmail.com> wrote:
>> >
>> >  I guess that similar to the discussions about selfp, the approximation
>> of the velocity mass matrix by the
>> >  diagonal of the velocity sub-matrix will improve when running a
>> transient as opposed to a steady
>> >  calculation, especially if the time derivative is lumped Just
>> thinking while typing
>> >
>> >  On Mon, Jun 26, 2023 at 6:03 PM Alexander Lindsay <
>> alexlindsay...@gmail.com> wrote:
>> >
>> >  Returning to Sebastian's question about the correctness of the current
>> LSC implementation: in the
>> >  taxonomy paper that Jed linked to (which talks about SIMPLE, PCD, and
>> LSC), equation 21 shows four
>> >  applications of the inverse of the velocity mass matrix. In the PETSc
>> implementation there are at
>> >  most two applications of the reciprocal of the diagonal of A (an
>> approximation to the velocity mass
>> >  matrix without more plumbing, as already pointed out). It seems like
>> for code implementations in
>> >  which there are possible scaling differences between the velocity and
>> pressure equations, that this
>> >  difference in the number of inverse applications could be significant?
>> I know Jed said that these
>> >  scalings wouldn't really matter if you have a uniform grid, but I'm not
>&

Re: [petsc-users] Scalable Solver for Incompressible Flow

2023-07-28 Thread Jed Brown
d tell me their solve is
>  converging slowly, asking me how to fix it. So I generally assume they have
>  built an appropriate mesh and problem size for the problem they want to solve
>  and added appropriate turbulence modeling (although my general assumption
>  is often violated).
>
>  > And to confirm, are you doing a nonlinearly implicit velocity-pressure 
> solve?
>
>  Yes, this is our default.
>
>  A general question: it seems that it is well known that the quality of selfp
>  degrades with increasing advection. Why is that?
>
>  On Wed, Jun 7, 2023 at 8:01 PM Jed Brown  wrote:
>
>  Alexander Lindsay  writes:
>
>  > This has been a great discussion to follow. Regarding
>  >
>  >> when time stepping, you have enough mass matrix that cheaper
>  preconditioners are good enough
>  >
>  > I'm curious what some algebraic recommendations might be for high Re
>  in
>  > transients. 
>
>  What mesh aspect ratio and streamline CFL number? Assuming your model
>  is turbulent, can you say anything about momentum thickness Reynolds
>  number Re_θ? What is your wall normal spacing in plus units? (Wall
>  resolved or wall modeled?)
>
>  And to confirm, are you doing a nonlinearly implicit velocity-pressure
>  solve?
>
>  > I've found one-level DD to be ineffective when applied monolithically or
>  to the momentum block of a split, as it scales with the mesh size. 
>
>  I wouldn't put too much weight on "scaling with mesh size" per se. You
>  want an efficient solver for the coarsest mesh that delivers sufficient
>  accuracy in your flow regime. Constants matter.
>
>  Refining the mesh while holding time steps constant changes the advective
>  CFL number as well as cell Peclet/cell Reynolds numbers. A meaningful
>  scaling study is to increase Reynolds number (e.g., by growing the domain)
>  while keeping mesh size matched in terms of plus units in the viscous
>  sublayer and Kolmogorov length in the outer boundary layer. That turns
>  out to not be a very automatic study to do, but it's what matters and you
>  can spend a lot of time chasing ghosts with naive scaling studies.
>
>  -- 
>  What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any
>  results to which their experiments lead.
>  -- Norbert Wiener
>
>  https://www.cse.buffalo.edu/~knepley/


Re: [petsc-users] [petsc-maint] Monolithic AMG with fieldsplit as smoother

2023-07-26 Thread Jed Brown
AMG is subtle here. With AMG for systems, you typically feed it elements of the 
near null space. In the case of (smoothed) aggregation, the coarse space will 
have a regular block structure with block sizes equal to the number of 
near-null vectors. You can use pc_fieldsplit options to select which fields you 
want in each split.

However, AMG also needs a strength of connection and if your system is so weird 
you need to fieldsplit the smoothers (e.g., a saddle point problem or a 
hyperbolic system) then it's likely that you'll also need a custom strength of 
connection to obtain reasonable coarsening.

Barry Smith  writes:

>   See the very end of the section 
> https://petsc.org/release/manual/ksp/#multigrid-preconditioners on how to 
> control the smoothers (and coarse grid solve) for multigrid in PETSc 
> including for algebraic multigrid.
>
>So, for example, -mg_levels_pc_type fieldsplit would be the starting 
> point. Depending on the block size of the matrices it may automatically do 
> simple splits, you can control the details  of the fieldsplit preconditioner 
> with -mg_levels_pc_fieldsplit_...  and the details for each split with 
> -mg_levels_fieldsplit_ 
>
>See src/ksp/ksp/tutorials/ex42.c for example, usage 
>
>Feel free to ask more specific questions once you get started.
>
>> On Jul 26, 2023, at 9:47 PM, Michael Wick  
>> wrote:
>> 
>> Hello PETSc team:
>> 
>> I wonder if the current PETSc implementation supports using AMG 
>> monolithically for a multi-field problem and using fieldsplit in the 
>> smoother.
>> 
>> Thank you very much,
>> 
>> Mike


Re: [petsc-users] Question about using PETSc to generate matrix preconditioners

2023-07-25 Thread Jed Brown
I think random matrices will produce misleading results. The chance of randomly 
generating a matrix that resembles an application is effectively zero. I think 
you'd be better off with some model problems varying parameters that control 
the physical regime (e.g., shifts to a Laplacian, advection with/without 
upwinding at varying cell Peclet numbers, varying nondimensional parameters, 
time step, and boundary layers for fluids, varying Poisson ratio and material 
contrast in elasticity).
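
As a minimal sketch of what a sweep over one of those knobs could look like (assembly of the base Laplacian A and the list of shifts are assumed to exist already):

  for (PetscInt s = 0; s < nshift; s++) {
    Mat As;
    PetscCall(MatDuplicate(A, MAT_COPY_VALUES, &As));
    PetscCall(MatShift(As, shifts[s]));  /* A + sigma*I, with sigma spanning the regimes of interest */
    /* build and record the preconditioner for As here */
    PetscCall(MatDestroy(&As));
  }

The same pattern applies to the other parameters above: generate families by varying a physical knob rather than by sampling random entries.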

"Sun, Yixuan via petsc-users"  writes:

> Hello PETSc team,
>
> My name is Yixuan, and I am a postdoc at MCS. We are working on a project 
> where we want to predict the preconditioners from the corresponding matrices. 
> For preliminary exploration purposes, we generated random matrices and their 
> preconditioners using PETSc. The current deep learning models look promising 
> with the generated data. However, we are not sure about the correctness of 
> the data generation code and wanted to ask if you could help check the 
> correctness of our script (< 100 lines) and let us know if there were any 
> issues. Here is the 
> link
>  to the script.
>
> Please let me know if you have any questions or concerns. Thank you in 
> advance!
>
>
> Warm regards,
> Yixuan
> 
> Yixuan Sun
> Postdoctoral Researcher
> Mathematics and Computer Science
> Argonne National Laboratory


Re: [petsc-users] GAMG and Hypre preconditioner

2023-06-27 Thread Jed Brown
Zisheng Ye via petsc-users  writes:

> Dear PETSc Team
>
> We are testing the GPU support in PETSc's KSPSolve, especially for the GAMG 
> and Hypre preconditioners. We have encountered several issues that we would 
> like to ask for your suggestions.
>
> First, we have couple of questions when working with a single MPI rank:
>
>   1.  We have tested two backends, CUDA and Kokkos. One commonly encountered 
> error is related to SpGEMM in CUDA when the mat is large as listed below:
>
> cudaMalloc((void **), bufferSize2) error( cudaErrorMemoryAllocation): 
> out of memory
>
> For CUDA backend, one can use "-matmatmult_backend_cpu -matptap_backend_cpu" 
> to avoid these problems. However, there seems no equivalent options in Kokkos 
> backend. Is there any good practice to avoid this error for both backends and 
> if we can avoid this error in Kokkos backend?

Junchao will know more about KK tuning, but the faster GPU matrix-matrix 
algorithms use extra memory. We should be able to make the host option 
available with kokkos.

>   2.  We have tested the combination of Hypre and Kokkos as backend. It looks 
> like this combination is not compatible with each other, as we observed that 
> KSPSolve takes a greater number of iterations to exit, and the residual norm 
> in the post-checking is much larger than the one obtained when working with 
> CUDA backend. This happens for matrices with block size larger than 1. Is 
> there any explanation to the error?
>
> Second, we have couple more questions when working with multiple MPI ranks:
>
>   1.  We are currently using OpenMPI as we couldn't get Intel MPI to work as a 
> GPU-aware MPI. Is this a known issue with Intel MPI?

As far as I know, Intel's MPI is only for SYCL/Intel GPUs. In general, 
GPU-aware MPI has been incredibly flaky on all HPC systems despite being 
introduced ten years ago.

>   2.  With OpenMPI we currently see a slow down when increasing the MPI count 
> as shown in the figure below, is this normal?

Could you share -log_view output from a couple representative runs? You could 
send those here or to petsc-ma...@mcs.anl.gov. We need to see what kind of work 
is not scaling to attribute what may be causing it.


Re: [petsc-users] hypre-ILU vs hypre Euclid

2023-06-22 Thread Jed Brown
It looks like Victor is working on hypre-ILU so it is active. PETSc used to 
have PILUT support, but it was so buggy/leaky that we removed the interface.

Alexander Lindsay  writes:

> Haha no I am not sure. There are a few other preconditioning options I will 
> explore before knocking on this door some more. 
>
>  On Jun 22, 2023, at 6:49 PM, Matthew Knepley  wrote:
>
>  On Thu, Jun 22, 2023 at 8:37 PM Alexander Lindsay 
>  wrote:
>
>  I know that PETSc has hooks for Euclid but I discovered today that it does 
> not support 64 bit indices, which many MOOSE
>  applications need. This would probably be more appropriate for a hypre 
> support forum (does anyone know if such a forum
>  exists other than opening GitHub issues?), but does anyone here know what 
> the difference between hypre-ILU and
>  hypre-Euclid are? From the docs it seems they are both supposed to be 
> parallel ILU solvers.
>
>  If hypre-ILU worked with 64 bit indices (I can probably check this sifting 
> through the sources), then I would probably add
>  hooks for it in PETSc (AFAICT those don't exist at present).
>
>  My understanding was that two different people were working on them. I do 
> not know if either is still actively supported. We
>  would of course like a binding to whatever is supported.
>
>  Are you sure you want to run ILU?
>
>THanks,
>
>   Matt
>  -- 
>  What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to
>  which their experiments lead.
>  -- Norbert Wiener
>
>  https://www.cse.buffalo.edu/~knepley/


Re: [petsc-users] How to efficiently fill in, in parallel, a PETSc matrix from a COO sparse matrix?

2023-06-20 Thread Jed Brown
Matthew Knepley  writes:

>> The matrix entries are multiplied by 2, that is, the number of processes
>> used to execute the code.
>>
>
> No. This was mostly intended for GPUs, where there is 1 process. If you
> want to use multiple MPI processes, then each process can only introduce
> some disjoint subset of the values. This is also how MatSetValues() works,
> but it might not be as obvious.

They need not be disjoint, just sum to the expected values. This interface is 
very convenient for FE and FV methods. MatSetValues with ADD_VALUES has similar 
semantics without the intermediate storage, but it forces you to submit one 
element matrix at a time. Classic parallelism granularity versus memory use 
tradeoff with MatSetValuesCOO being a clear win on GPUs and more nuanced for 
CPUs.


Re: [petsc-users] How to efficiently fill in, in parallel, a PETSc matrix from a COO sparse matrix?

2023-06-20 Thread Jed Brown
You should partition the entries so each entry is submitted by only one 
process. Note that duplicate entries (on the same or different proceses) are 
summed as you've seen. For example, in finite elements, it's typical to 
partition the elements and each process submits entries from its elements. 

Diego Magela Lemos via petsc-users  writes:

> Using all recommended approaches it worked!
> Thank you all in advance.
>
> Now, I'm facing problems when solving a linear system using each approach.
>
> *COO approach*
>
> Using MatSetPreallocationCOO and then MatSetValuesCOO, I'm able to fill in
> the matrix when running with 1 MPI process.
> But, if I run with more than one MPI process, the values entries are
> multiplied by the number of MPI processes being used.
> Is this behavior correct?
>
> Consider the following code:
>
> // fill_in_matrix.cc
>
> static char help[] = "Fill in a parallel COO format sparse matrix.";
>
> #include <petsc.h>
> #include <vector>
>
> int main(int argc, char **args)
> {
> std::vector<PetscInt> coo_i{0, 0, 1, 2, 3, 4};
> std::vector<PetscInt> coo_j{0, 0, 1, 2, 3, 4};
> std::vector<PetscScalar> coo_v{2, -1, 2, 3, 4, 5};
>
> Mat A;
>
> PetscInt size = 5;
>
> PetscCall(PetscInitialize(&argc, &args, NULL, help));
>
> // Create matrix
> PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
> PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, size, size));
> PetscCall(MatSetFromOptions(A));
>
> // Populate matrix
> PetscCall(MatSetPreallocationCOO(A, coo_v.size(), coo_i.data(),
> coo_j.data()));
> PetscCall(MatSetValuesCOO(A, coo_v.data(), ADD_VALUES));
>
> // View matrix
> PetscCall(MatView(A, PETSC_VIEWER_STDERR_WORLD));
>
> PetscCall(MatDestroy(&A));
>
> PetscCall(PetscFinalize());
> return 0;
> }
>
> When running with petscmpiexec -n 1 ./fill_in_matrix, I got
>
> Mat Object: 1 MPI process
>>   type: seqaij
>> row 0: (0, 1.)
>> row 1: (1, 2.)
>> row 2: (2, 3.)
>> row 3: (3, 4.)
>> row 4: (4, 5.)
>
>
> Which is a correct result. But, when running it with petscmpiexec -n 2
> ./fill_in_matrix, I get
>
> Mat Object: 2 MPI process
>>   type: mpiaij
>> row 0: (0, 2.)
>> row 1: (1, 4.)
>> row 2: (2, 6.)
>> row 3: (3, 8.)
>> row 4: (4, 10.)
>
>
> The matrix entries are multiplied by 2, that is, the number of processes
> used to execute the code.
>
> *MatSetValues approach*
>
> I obtain the same behavior when filling in the matrix by using MatSetValues
>
> static char help[] = "Fill in a parallel COO format sparse matrix.";
>
> // fill_in_matrix.cc
>
> #include <petsc.h>
> #include <vector>
>
> int main(int argc, char **args)
> {
> std::vector<PetscInt> coo_i{0, 0, 1, 2, 3, 4};
> std::vector<PetscInt> coo_j{0, 0, 1, 2, 3, 4};
> std::vector<PetscScalar> coo_v{2, -1, 2, 3, 4, 5};
>
> Mat A;
> PetscInt size = 5;
>
> PetscCall(PetscInitialize(&argc, &args, NULL, help));
>
> // Create matrix
> PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
> PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, size, size));
> PetscCall(MatSetFromOptions(A));
>
> // Populate matrix
> for (size_t i = 0; i < coo_v.size(); i++)
> PetscCall(MatSetValues(A, 1, &coo_i.at(i), 1, &coo_j.at(i),
> &coo_v.at(i), ADD_VALUES));
>
> PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
> PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
>
> // View matrix
> PetscCall(MatView(A, PETSC_VIEWER_STDERR_WORLD));
>
> PetscCall(MatDestroy(&A));
>
> PetscCall(PetscFinalize());
> return 0;
> }
>
> When solving a linear system, I get the correct answer no matter the number
> of MPI processes when using MatSetValues approach.
> The same is not true when using COO approach, whose result is only correct
> when using 1 MPI process.
>
> static char help[] = "Fill in a parallel COO format sparse matrix and solve
> a linear system.";
>
> #include <petsc.h>
> #include <vector>
>
> int main(int argc, char **args)
> {
> std::vector<PetscInt> coo_i{0, 0, 1, 2, 3, 4};
> std::vector<PetscInt> coo_j{0, 0, 1, 2, 3, 4};
> std::vector<PetscScalar> coo_v{2, -1, 2, 3, 4, 5};
>
> Mat A;
> Vec B, X, U;
> KSP ksp;
> PC pc;
> PetscReal norm; // norm of solution error
> PetscInt its;
>
> PetscInt size = 5;
>
> PetscCall(PetscInitialize(&argc, &args, NULL, help));
>
> // Create matrix
>
> PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
> PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, size, size));
> PetscCall(MatSetFromOptions(A));
>
>
> // Populate matrix
>
> // COO
> PetscCall(MatSetPreallocationCOO(A, coo_v.size(), coo_i.data(),
> coo_j.data()));
> PetscCall(MatSetValuesCOO(A, coo_v.data(), ADD_VALUES));
>
> // MatSetValues for-loop
> // for (size_t i = 0; i < coo_v.size(); i++)
> // PetscCall(MatSetValues(A, 1, &coo_i.at(i), 1, &coo_j.at(i),
> // &coo_v.at(i), ADD_VALUES));
>
> // PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
> // PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
>
> // View matrix
> PetscCall(MatView(A, PETSC_VIEWER_STDERR_WORLD));
>
> // Create vector B
> 

Re: [petsc-users] dm_view of high-order geometry/solution

2023-06-12 Thread Jed Brown
CGNS supports fourth order and it's coded in PETSc, but Paraview hasn't 
implemented reading it yet. I think it would not be much work for someone 
(maybe you) to add it to Paraview. I have lots of applications on cubics, but 
not much beyond that so it hasn't risen to top priority for me.

There is an accepted extension, but the CGNS implementation is still in a 
branch.

https://cgns.github.io/ProposedExtensions/CPEX0045_HighOrder_v2.pdf
https://github.com/CGNS/CGNS/tree/CPEX45_high_order

There have been recent (announced in a blog post, yet still undocumented) 
extensions to the VTU format that would support this, but the format is so bad 
for parallel IO and time series that I haven't been motivated to extend the 
PETSc writer. Of course we would welcome contributions.

Duan Junming  writes:

> Dear Jed,
>
>
> Thank you for your help!
>
> Now I moved the line using "DMViewFromOptions" after the function 
> "PetscDSSetObjective",
>
> and it works for "-dm_coord_petscspace_degree 3 -petscspace_degree 3".
>
>
> But when I tried degree 4:
>
> ./ex33 -dm_plex_simplex 0 -dm_plex_box_faces 1,1 -mesh_transform annulus 
> -dm_coord_space 0 -dm_coord_petscspace_degree 4 -petscspace_degree 4 
> -dm_refine 1 -dm_view cgns:test.cgns
>
> Paraview gives an empty render.
>
>
> Using degree 5:
>
> ./ex33 -dm_plex_simplex 0 -dm_plex_box_faces 1,1 -mesh_transform annulus 
> -dm_coord_space 0 -dm_coord_petscspace_degree 5 -petscspace_degree 5 
> -dm_refine 1 -dm_view cgns:test.cgns
>
> it reports:
>
>
> [0]PETSC ERROR: - Error Message 
> --
> [0]PETSC ERROR: No support for this operation for this object type
> [0]PETSC ERROR: Cell type quadrilateral with closure size 36
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.19.2, unknown
> [0]PETSC ERROR: ./ex33 on a arch-darwin-c-debug named 
> JunmingMacBook-Pro.local by Junming Mon Jun 12 20:23:04 2023
> [0]PETSC ERROR: Configure options --download-cgns --download-hdf5 
> --download-openmpi --download-triangle --with-fc=0 
> PETSC_ARCH=arch-darwin-c-debug --download-cgns
> [0]PETSC ERROR: #1 DMPlexCGNSGetPermutation_Internal() at 
> /Users/Junming/Packages/petsc/src/dm/impls/plex/cgns/plexcgns2.c:533
> [0]PETSC ERROR: #2 DMView_PlexCGNS() at 
> /Users/Junming/Packages/petsc/src/dm/impls/plex/cgns/plexcgns2.c:769
> [0]PETSC ERROR: #3 DMView_Plex() at 
> /Users/Junming/Packages/petsc/src/dm/impls/plex/plex.c:1801
> [0]PETSC ERROR: #4 DMView() at 
> /Users/Junming/Packages/petsc/src/dm/interface/dm.c:996
> [0]PETSC ERROR: #5 PetscObjectView() at 
> /Users/Junming/Packages/petsc/src/sys/objects/destroy.c:78
> [0]PETSC ERROR: #6 PetscObjectViewFromOptions() at 
> /Users/Junming/Packages/petsc/src/sys/objects/destroy.c:128
> [0]PETSC ERROR: #7 DMViewFromOptions() at 
> /Users/Junming/Packages/petsc/src/dm/interface/dm.c:940
> [0]PETSC ERROR: #8 CreateDiscretization() at ex33.c:232
> [0]PETSC ERROR: #9 main() at ex33.c:263
> [0]PETSC ERROR: PETSc Option Table entries:
> [0]PETSC ERROR: -dm_coord_petscspace_degree 5 (source: command line)
> [0]PETSC ERROR: -dm_coord_space 0 (source: command line)
> [0]PETSC ERROR: -dm_plex_box_faces 1,1 (source: command line)
> [0]PETSC ERROR: -dm_plex_simplex 0 (source: command line)
> [0]PETSC ERROR: -dm_refine 1 (source: command line)
> [0]PETSC ERROR: -dm_view cgns:test.cgns (source: command line)
> [0]PETSC ERROR: -mesh_transform annulus (source: command line)
> [0]PETSC ERROR: -petscspace_degree 5 (source: command line)
> [0]PETSC ERROR: End of Error Message ---send entire error 
> message to petsc-ma...@mcs.anl.gov--
> ------
> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_SELF
> with errorcode 56.
>
>
> Does cgns work for degree >= 4?
>
>
> Junming
>
>
> 
> From: Jed Brown 
> Sent: Monday, June 12, 2023 19:07
> To: Duan Junming; Matthew Knepley
> Cc: petsc-users@mcs.anl.gov
> Subject: Re: [petsc-users] dm_view of high-order geometry/solution
>
> And here's an MR to do what you want without any code/arg changes.
>
> https://gitlab.com/petsc/petsc/-/merge_requests/6588
>
> Jed Brown  writes:
>
>> Duan Junming  writes:
>>
>>> Dear Jed,
>>>
>>>
>>> Thank you for the suggestion.
>>>
>>> When I run tests/ex33.c with
>>>
>>> ./ex33 -dm_plex_simplex 0 -dm_plex_box_faces 1,1 -mesh_transform annulus 
>>> -dm_coord_space 0 -dm_coord_petscspace_degree 3 -dm_refine 1 -dm_

Re: [petsc-users] dm_view of high-order geometry/solution

2023-06-12 Thread Jed Brown
And here's an MR to do what you want without any code/arg changes.

https://gitlab.com/petsc/petsc/-/merge_requests/6588

Jed Brown  writes:

> Duan Junming  writes:
>
>> Dear Jed,
>>
>>
>> Thank you for the suggestion.
>>
>> When I run tests/ex33.c with
>>
>> ./ex33 -dm_plex_simplex 0 -dm_plex_box_faces 1,1 -mesh_transform annulus 
>> -dm_coord_space 0 -dm_coord_petscspace_degree 3 -dm_refine 1 -dm_view 
>> cgns:test.cgns
>>
>> and load it using Paraview,
>>
>> the mesh is still with straight lines.
>
> Ah, the viewer is keyed on the field (since the CGNS as supported by Paraview 
> specifies coordinates and fields in the same space). That doesn't exist in 
> your case. If you apply this patch and add `-petscspace_degree 3` to your 
> command, you'll see that high order information is present. Paraview doesn't 
> render as curves in all views, but it has the data.
>
> diff --git i/src/dm/impls/plex/tests/ex33.c w/src/dm/impls/plex/tests/ex33.c
> index 803095bc082..590facfa4f4 100644
> --- i/src/dm/impls/plex/tests/ex33.c
> +++ w/src/dm/impls/plex/tests/ex33.c
> @@ -198,7 +198,6 @@ PetscErrorCode CreateMesh(MPI_Comm comm, AppCtx *ctx, DM 
> *dm)
>default:
>  SETERRQ(comm, PETSC_ERR_ARG_OUTOFRANGE, "Unknown mesh transform %d", 
> ctx->meshTransform);
>}
> -  PetscCall(DMViewFromOptions(*dm, NULL, "-dm_view"));
>PetscFunctionReturn(PETSC_SUCCESS);
>  }
>  
> @@ -227,6 +226,7 @@ static PetscErrorCode CreateDiscretization(DM dm, AppCtx 
> *ctx)
>PetscCall(DMCreateDS(dm));
>PetscCall(DMGetDS(dm, ));
>PetscCall(PetscDSSetObjective(ds, 0, volume));
> +  PetscCall(DMViewFromOptions(dm, NULL, "-dm_view"));
>PetscFunctionReturn(PETSC_SUCCESS);
>  }
>  
>
> I can update the viewer to handle the degenerate case of no field (all my 
> models have fields).


Re: [petsc-users] dm_view of high-order geometry/solution

2023-06-12 Thread Jed Brown
Duan Junming  writes:

> Dear Jed,
>
>
> Thank you for the suggestion.
>
> When I run tests/ex33.c with
>
> ./ex33 -dm_plex_simplex 0 -dm_plex_box_faces 1,1 -mesh_transform annulus 
> -dm_coord_space 0 -dm_coord_petscspace_degree 3 -dm_refine 1 -dm_view 
> cgns:test.cgns
>
> and load it using Paraview,
>
> the mesh is still with straight lines.

Ah, the viewer is keyed on the field (since the CGNS as supported by Paraview 
specifies coordinates and fields in the same space). That doesn't exist in your 
case. If you apply this patch and add `-petscspace_degre 3` to your command, 
you'll see that high order information is present. Paraview doesn't render as 
curves in all views, but it has the data.

diff --git i/src/dm/impls/plex/tests/ex33.c w/src/dm/impls/plex/tests/ex33.c
index 803095bc082..590facfa4f4 100644
--- i/src/dm/impls/plex/tests/ex33.c
+++ w/src/dm/impls/plex/tests/ex33.c
@@ -198,7 +198,6 @@ PetscErrorCode CreateMesh(MPI_Comm comm, AppCtx *ctx, DM 
*dm)
   default:
 SETERRQ(comm, PETSC_ERR_ARG_OUTOFRANGE, "Unknown mesh transform %d", 
ctx->meshTransform);
   }
-  PetscCall(DMViewFromOptions(*dm, NULL, "-dm_view"));
   PetscFunctionReturn(PETSC_SUCCESS);
 }
 
@@ -227,6 +226,7 @@ static PetscErrorCode CreateDiscretization(DM dm, AppCtx 
*ctx)
   PetscCall(DMCreateDS(dm));
   PetscCall(DMGetDS(dm, ));
   PetscCall(PetscDSSetObjective(ds, 0, volume));
+  PetscCall(DMViewFromOptions(dm, NULL, "-dm_view"));
   PetscFunctionReturn(PETSC_SUCCESS);
 }
 

I can update the viewer to handle the degenerate case of no field (all my 
models have fields).


Re: [petsc-users] dm_view of high-order geometry/solution

2023-06-12 Thread Jed Brown
Matthew Knepley  writes:

> On Mon, Jun 12, 2023 at 6:01 AM Duan Junming  wrote:
>
>> Dear Matt,
>>
>> Thank you for the reply. I have a more specific question about the
>> spectral element example. Do you have any suggestions that how to write
>> all the nodes in each cell to .vtu?
>>
> It is the same procedure. VTU is not a great format for this. It wants
> everything at first order.

I would recommend configuring with --download-cgns and running with -dm_view 
cgns:output.cgns. This format has efficient parallel IO and curved elements 
work for moderate order in Paraview.


Re: [petsc-users] Scalable Solver for Incompressible Flow

2023-06-07 Thread Jed Brown
Alexander Lindsay  writes:

> This has been a great discussion to follow. Regarding
>
>> when time stepping, you have enough mass matrix that cheaper preconditioners 
>> are good enough
>
> I'm curious what some algebraic recommendations might be for high Re in
> transients. 

What mesh aspect ratio and streamline CFL number? Assuming your model is 
turbulent, can you say anything about momentum thickness Reynolds number Re_θ? 
What is your wall normal spacing in plus units? (Wall resolved or wall modeled?)

And to confirm, are you doing a nonlinearly implicit velocity-pressure solve?

> I've found one-level DD to be ineffective when applied monolithically or to 
> the momentum block of a split, as it scales with the mesh size. 

I wouldn't put too much weight on "scaling with mesh size" per se. You want an 
efficient solver for the coarsest mesh that delivers sufficient accuracy in 
your flow regime. Constants matter.

Refining the mesh while holding time steps constant changes the advective CFL 
number as well as cell Peclet/cell Reynolds numbers. A meaningful scaling study 
is to increase Reynolds number (e.g., by growing the domain) while keeping mesh 
size matched in terms of plus units in the viscous sublayer and Kolmogorov 
length in the outer boundary layer. That turns out to not be a very automatic 
study to do, but it's what matters and you can spend a lot of time chasing 
ghosts with naive scaling studies.


Re: [petsc-users] Issues creating DMPlex from higher order mesh generated by gmsh

2023-05-15 Thread Jed Brown
Matthew Knepley  writes:

> On Fri, May 5, 2023 at 10:55 AM Vilmer Dahlberg via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
>> Hi.
>>
>>
>> I'm trying to read a mesh of higher element order, in this example a mesh
>> consisting of 10-node tetrahedral elements, from gmsh, into PETSC. But It
>> looks like the mesh is not properly being loaded and converted into a
>> DMPlex. gmsh tells me it has generated a mesh with 7087 nodes, but when I
>> view my dm object it tells me it has 1081 0-cells. This is the printout I
>> get
>>
>
> Hi Vilmer,
>
> Plex makes a distinction between topological entities, like vertices, edges
> and cells, and the function spaces used to represent fields, like velocity
> or coordinates. When formats use "nodes", they mix the two concepts
> together.
>
> You see that if you add the number of vertices and edges, you get 7087,
> since for P2 there is a "node" on every edge. Is anything else wrong?

Note that quadratic (and higher order) tets are broken with the Gmsh reader. 
It's been on my todo list for a while.

As an example, this works when using linear elements (the projection makes them 
quadratic and visualization is correct), but is tangled when holes.msh is 
quadratic.

$ $PETSC_ARCH/tests/dm/impls/plex/tutorials/ex1 -dm_plex_filename 
~/meshes/holes.msh -dm_view cgns:s.cgns -dm_coord_petscspace_degree 2

SetFactory("OpenCASCADE");

radius = 0.3;
DefineConstant[
  nx = {1, Min 1, Max 30, Step 1, Name "Parameters/nx"}
  ny = {1, Min 1, Max 30, Step 1, Name "Parameters/ny"}
  extrude_length = {1, Min .1, Max 10, Step .1, Name "Parameters/extrusion 
length"}
  extrude_layers = {10, Min 1, Max 100, Step 1, Name "Parameters/extrusion 
layers"}
];
N = nx * ny;
Rectangle(1) = {0, 0, 0, 1, 1, 0};
Disk(10) = {0.5, 0.5, 0, radius};
BooleanDifference(100) = {Surface{1}; Delete;}{Surface{10}; Delete;};

For i In {0:nx-1}
  For j In {0:ny-1}
If (i + j > 0)
   Translate {i, j, 0} { Duplicata { Surface{100}; } }
EndIf
  EndFor
EndFor

Coherence;

// All the straight edges should have 8 elements
Transfinite Curve {:} = 8+1;

// Select the circles
circles = {};
For i In {0:nx-1}
  For j In {0:ny-1}
lo = .5 - radius - 1e-4;
hi = .5 + radius + 1e-4;
circles() += Curve In BoundingBox {
  i + lo, j + lo, -1e-4,
  i + hi, j + hi, +1e-4
};
  EndFor
EndFor

// Circles need 16 elements
Transfinite Curve {circles()} = 16+1;

Mesh.Algorithm = 8;

Extrude {0, 0, extrude_length} { Surface{100:100+N-1}; Layers{extrude_layers}; }

e = 1e-4;
extrude_start() = Surface In BoundingBox {-e, -e, -e, nx+e, ny+e, e};
extrude_end() = Surface In BoundingBox {
  -e, -e, extrude_length-e,
  nx+e, ny+e, extrude_length+e};

Physical Surface("start") = {extrude_start()};
Physical Surface("end") = {extrude_end()};
Physical Volume("solid") = {1:N};


Re: [petsc-users] How to find the map between the high order coordinates of DMPlex and vertex numbering?

2023-05-14 Thread Jed Brown
Good to hear this works for you. I believe there is still a problem with high 
order tetrahedral elements (we've been coping with it for months and someone 
asked last week) and plan to look at it as soon as possible now that my 
semester finished.

Zongze Yang  writes:

> Hi, Matt,
>
> The issue has been resolved while testing on the latest version of PETSc.
> It seems that the problem has been fixed in the following merge request:
> https://gitlab.com/petsc/petsc/-/merge_requests/5970
>
> I sincerely apologize for any inconvenience caused by my previous message.
> However, I would like to provide you with additional information regarding
> the test files. Attached to this email, you will find two Gmsh files:
> "square_2rd.msh" and "square_3rd.msh." These files contain high-order
> triangulated mesh data for the unit square.
>
> ```
> $ ./ex33 -coord_space 0 -dm_plex_filename square_2rd.msh
> -dm_plex_gmsh_project
> -dm_plex_gmsh_project_petscdualspace_lagrange_continuity true
> -dm_plex_gmsh_project_fe_view -volume 1
> PetscFE Object: P2 1 MPI process
>   type: basic
>   Basic Finite Element in 2 dimensions with 2 components
>   PetscSpace Object: P2 1 MPI process
> type: sum
> Space in 2 variables with 2 components, size 12
> Sum space of 2 concatenated subspaces (all identical)
>   PetscSpace Object: sum component (sumcomp_) 1 MPI process
> type: poly
> Space in 2 variables with 1 components, size 6
> Polynomial space of degree 2
>   PetscDualSpace Object: P2 1 MPI process
> type: lagrange
> Dual space with 2 components, size 12
> Continuous Lagrange dual space
> Quadrature on a triangle of order 5 on 9 points (dim 2)
> Volume: 1.
> $ ./ex33 -coord_space 0 -dm_plex_filename square_3rd.msh
> -dm_plex_gmsh_project
> -dm_plex_gmsh_project_petscdualspace_lagrange_continuity true
> -dm_plex_gmsh_project_fe_view -volume 1
> PetscFE Object: P3 1 MPI process
>   type: basic
>   Basic Finite Element in 2 dimensions with 2 components
>   PetscSpace Object: P3 1 MPI process
> type: sum
> Space in 2 variables with 2 components, size 20
> Sum space of 2 concatenated subspaces (all identical)
>   PetscSpace Object: sum component (sumcomp_) 1 MPI process
> type: poly
> Space in 2 variables with 1 components, size 10
> Polynomial space of degree 3
>   PetscDualSpace Object: P3 1 MPI process
> type: lagrange
> Dual space with 2 components, size 20
> Continuous Lagrange dual space
> Quadrature on a triangle of order 7 on 16 points (dim 2)
> Volume: 1.
> ```
>
> Thank you for your attention and understanding. I apologize once again for
> my previous oversight.
>
> Best wishes,
> Zongze
>
>
> On Sun, 14 May 2023 at 16:44, Matthew Knepley  wrote:
>
>> On Sat, May 13, 2023 at 6:08 AM Zongze Yang  wrote:
>>
>>> Hi, Matt,
>>>
>>> There seem to be ongoing issues with projecting high-order coordinates
>>> from a gmsh file to other spaces. I would like to inquire whether there are
>>> any plans to resolve this problem.
>>>
>>> Thank you for your attention to this matter.
>>>
>>
>> Yes, I will look at it. The important thing is to have a good test. Here
>> are the higher order geometry tests
>>
>>
>> https://gitlab.com/petsc/petsc/-/blob/main/src/dm/impls/plex/tests/ex33.c
>>
>> I take shapes with known volume, mesh them with higher order geometry, and
>> look at the convergence to the true volume. Could you add a GMsh test,
>> meaning the .msh file and known volume, and I will fix it?
>>
>>   Thanks,
>>
>>  Matt
>>
>>
>>> Best wishes,
>>> Zongze
>>>
>>>
>>> On Sat, 18 Jun 2022 at 20:31, Zongze Yang  wrote:
>>>
 Thank you for your reply. May I ask for some references on the order of
 the dofs on PETSc's FE Space (especially high order elements)?

 Thanks,

  Zongze

 Matthew Knepley  于2022年6月18日周六 20:02写道:

> On Sat, Jun 18, 2022 at 2:16 AM Zongze Yang 
> wrote:
>
>> In order to check if I made mistakes in the python code, I try to use
>> c code to show the issue on DMProjectCoordinates. The code and mesh file 
>> is
>> attached.
>> If the code is correct, there must be something wrong with
>> `DMProjectCoordinates` or `DMPlexCreateGmshFromFile` for high-order mesh.
>>
>
> Something is definitely wrong with high order, periodic simplices from
> Gmsh. We had not tested that case. I am at a conference and cannot look at
> it for a week.
> My suspicion is that the space we make when reading in the Gmsh
> coordinates does not match the values (wrong order).
>
>   Thanks,
>
> Matt
>
>
>> The command and the output are listed below: (Obviously the bounding
>> box is changed.)
>> ```
>> $ ./test_gmsh_load_2rd -filename cube-p2.msh -old_fe_view -new_fe_view
>> Old Bounding Box:
>>   0: lo = 0. hi = 1.
>>   1: lo = 0. hi = 1.
>>   2: lo = 0. hi = 1.
>> PetscFE 

Re: [petsc-users] DMPlex, is there an API to get a list of Boundary points?

2023-05-11 Thread Jed Brown
Boundary faces are often labeled already on a mesh, but you can use this to set 
a label for all boundary faces.

https://petsc.org/main/manualpages/DMPlex/DMPlexMarkBoundaryFaces/
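A minimal sketch of that (assuming dm is an interpolated DMPlex and petscdmplex.h is
included; the label name "boundary" and the value 1 are arbitrary choices here):

  DMLabel label;
  IS      faceIS;
  PetscCall(DMCreateLabel(dm, "boundary"));
  PetscCall(DMGetLabel(dm, "boundary", &label));
  PetscCall(DMPlexMarkBoundaryFaces(dm, 1, label));  /* marks height-1 points whose support has size 1 */
  PetscCall(DMLabelGetStratumIS(label, 1, &faceIS)); /* the marked boundary faces as an IS */
  /* ... use faceIS ... */
  PetscCall(ISDestroy(&faceIS));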

"Ferrand, Jesus A."  writes:

> Greetings.
>
> In terms of DMPlex terminology, I need a list of points corresponding to the 
> boundary (i.e., height-1 points whose support is of size 1).
> Sorry if this is trivial, but I've been looking at the list of APIs for 
> DM-Plex and couldn't find/discern something that addresses this need.
>
> I guess I could loop through all height-1 points and call 
> DMPlexGetSupportSize() and assemble said list manually.
> The assumption for the plex I'm working with is that it has had 
> DMPlexInterpolate() called on it.
>
>
> Sincerely:
>
> J.A. Ferrand
>
> Embry-Riddle Aeronautical University - Daytona Beach - FL
> Ph.D. Candidate, Aerospace Engineering
>
> M.Sc. Aerospace Engineering
>
> B.Sc. Aerospace Engineering
>
> B.Sc. Computational Mathematics
>
>
> Phone: (386)-843-1829
>
> Email(s): ferra...@my.erau.edu
>
> jesus.ferr...@gmail.com


Re: [petsc-users] issues with VecSetValues in petsc 3.19

2023-05-10 Thread Jed Brown
Edoardo alinovi  writes:

> Hello Barry,
>
> Welcome to the party! Thank you guys for your precious suggestions, they
> are really helpful!
>
> It's been a while since I am messing around and I have tested many
> combinations. Schur + selfp is the best preconditioner, it converges within
> 5 iters using gmres for inner solvers but it is not very fast and sometimes
> multiplicative is a better option as the inner iterations look lighter.
> If I tune the relative tolerance I get a huge speed up, but I am a bit less
> confident about the results. The funny thing is that the default relative
> tolerance of commercial CFD solvers is huge, very often 0.1 -.-

This depends on time step size and tolerances for quantities of interest.

> If I may borrow your brain guys for a while, I would like to ask your
> opinion about the multigrid they use in this paper:
> https://www.aub.edu.lb/msfea/research/Documents/CFD-P18.pdf.
> At some point they say: " The algorithm used in this work is a combination
> of the ILU(0) [28] algorithm with an additive corrective multigrid method
> [29] ", 29: https://www.tandfonline.com/doi/abs/10.1080/10407788608913491

It looks like this is a coupled solver with collocated dofs and thus I don't 
know how they stabilize in the incompressible limit (an inf-sup requirement is 
needed for stability). (This is based on a cursory read.)


Re: [petsc-users] Scalable Solver for Incompressible Flow

2023-05-08 Thread Jed Brown
Sebastian Blauth  writes:

> Hello everyone,
>
> I wanted to briefly follow up on my question (see my last reply).
> Does anyone know / have an idea why the LSC preconditioner in PETSc does 
> not seem to scale well with the problem size (the outer fgmres solver I 
> am using scales nearly linearly with the problem size in my example).

The implementation was tested on heterogeneous Stokes problems from 
geodynamics, and perhaps not on NS (or not with the discretization you're 
using).

https://doi.org/10.1016/j.pepi.2008.07.036

There is a comment about not having plumbing to provide a mass matrix. A few 
lines earlier there is code using PetscObjectQuery, and that same pattern could 
be applied for the mass matrix. If you're on a roughly uniform mesh, including 
the mass scaling will probably have little effect, but it could have a big 
impact in the presence of highly anisotropic elements or a broad range of scales.
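A rough sketch of that compose/query pattern, assuming Sp is the Schur complement Mat seen 
by PCLSC and Mp is a pressure mass matrix; the key name "LSC_M" is hypothetical (the 
existing code queries "LSC_L"/"LSC_Lp", and the corresponding plumbing inside PCLSC would 
still need to be written):

  /* user code: attach the mass matrix to the operator PCLSC will see */
  PetscCall(PetscObjectCompose((PetscObject)Sp, "LSC_M", (PetscObject)Mp));

  /* inside the preconditioner: pick it up if present */
  Mat Mp_found = NULL;
  PetscCall(PetscObjectQuery((PetscObject)Sp, "LSC_M", (PetscObject *)&Mp_found));
  if (Mp_found) { /* include the mass scaling */ }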

I don't think LSC has gotten a lot of use since everyone I know who tried it 
has been sort of disappointed relative to other methods (e.g., inverse 
viscosity scaled mass matrix for heterogeneous Stokes, PCD for moderate Re 
Navier-Stokes). Of course there are no steady solutions to high Re so you 
either have a turbulence model or are time stepping. I'm not aware of work with 
LSC with turbulence models, and when time stepping, you have enough mass matrix 
that cheaper preconditioners are good enough. That said, it would be a great 
contribution to support this scaling.

> I have also already tried using -ksp_diagonal_scale but the results are 
> identical.

That's expected, and likely to mess up some MG implementations so I wouldn't 
recommend it.


Re: [petsc-users] Scalable Solver for Incompressible Flow

2023-05-02 Thread Jed Brown
Sebastian Blauth  writes:

> I agree with your comment for the Stokes equations - for these, I have 
> already tried and used the pressure mass matrix as part of a (additive) 
> block preconditioner and it gave mesh independent results.
>
> However, for the Navier Stokes equations, is the Schur complement really 
> spectrally equivalent to the pressure mass matrix? 

No, it's not. You'd want something like PCD (better, but not algebraic) or LSC.

> And even if it is, the convergence is only good for small Reynolds numbers, 
> for moderately high ones the convergence really deteriorates. This is why I 
> am trying to make fieldsplit_schur_precondition selfp work better (this is, 
> if I understand it correctly, a SIMPLE type preconditioner).

SIMPLE is for short time steps (not too far from resolving CFL) and bad for 
steady. This taxonomy is useful, though the problems are super academic and 
they don't use high aspect ratio.

https://doi.org/10.1016/j.jcp.2007.09.026


Re: [petsc-users] PETSc testing recipes

2023-04-14 Thread Jed Brown
Look at config/examples/arch-ci-*.py for the configurations. They're driven 
from .gitlab-ci.yml

Alexander Lindsay  writes:

> Hi, is there a place I can look to understand the testing recipes used in
> PETSc CI, e.g. what external packages are included (if any), what C++
> dialect is used for any external packages built with C++, etc.?
>
> Alex


Re: [petsc-users] MPI linear solver reproducibility question

2023-04-02 Thread Jed Brown
Vector communication used a different code path in 3.13. If you have a 
reproducer with current PETSc, I'll have a look. Here's a demo that the 
solution is bitwise identical (the sha256sum is the same every time you run it, 
though it might be different on your computer from mine due to compiler version 
and flags).

$ mpiexec -n 8 ompi/tests/snes/tutorials/ex5 -da_refine 3 -snes_monitor 
-snes_view_solution binary && sha256sum binaryoutput
  0 SNES Function norm 1.265943996096e+00
  1 SNES Function norm 2.831564838232e-02
  2 SNES Function norm 4.456686729809e-04
  3 SNES Function norm 1.206531765776e-07
  4 SNES Function norm 1.740255643596e-12
5410f84e91a9db3a74a2ac336031fb48e7eaf739614192cfd53344517986  binaryoutput

Mark McClure  writes:

> In the typical FD implementation, you only set local rows, but with FE and
> sometimes FV, you also create values that need to be communicated and
> summed on other processors.
> Makes sense.
>
> Anyway, in this case, I am certain that I am giving the solver bitwise
> identical matrices from each process. I am not using a preconditioner,
> using BCGS, with Petsc version 3.13.3.
>
> So then, how can I make sure that I am "using an MPI that follows the
> suggestion for implementers about determinism"? I am using MPICH version
> 3.3a2, didn't do anything special when installing it. Does that sound OK?
> If so, I could upgrade to the latest Petsc, try again, and if confirmed
> that it persists, could provide a reproduction scenario.
>
>
>
> On Sat, Apr 1, 2023 at 9:53 PM Jed Brown  wrote:
>
>> Mark McClure  writes:
>>
>> > Thank you, I will try BCGSL.
>> >
>> > And good to know that this is worth pursuing, and that it is possible.
>> Step
>> > 1, I guess I should upgrade to the latest release on Petsc.
>> >
>> > How can I make sure that I am "using an MPI that follows the suggestion
>> for
>> > implementers about determinism"? I am using MPICH version 3.3a2.
>> >
>> > I am pretty sure that I'm assembling the same matrix every time, but I'm
>> > not sure how it would depend on 'how you do the communication'. Each
>> > process is doing a series of MatSetValues with INSERT_VALUES,
>> > assembling the matrix by rows. My understanding of this process is that
>> > it'd be deterministic.
>>
>> In the typical FD implementation, you only set local rows, but with FE and
>> sometimes FV, you also create values that need to be communicated and
>> summed on other processors.
>>


Re: [petsc-users] MPI linear solver reproducibility question

2023-04-01 Thread Jed Brown
Mark McClure  writes:

> Thank you, I will try BCGSL.
>
> And good to know that this is worth pursuing, and that it is possible. Step
> 1, I guess I should upgrade to the latest release on Petsc.
>
> How can I make sure that I am "using an MPI that follows the suggestion for
> implementers about determinism"? I am using MPICH version 3.3a2.
>
> I am pretty sure that I'm assembling the same matrix every time, but I'm
> not sure how it would depend on 'how you do the communication'. Each
> process is doing a series of MatSetValues with INSERT_VALUES,
> assembling the matrix by rows. My understanding of this process is that
> it'd be deterministic.

In the typical FD implementation, you only set local rows, but with FE and 
sometimes FV, you also create values that need to be communicated and summed on 
other processors.


Re: [petsc-users] MPI linear solver reproducibility question

2023-04-01 Thread Jed Brown
If you use unpreconditioned BCGS and ensure that you assemble the same matrix 
(depends how you do the communication for that), I think you'll get bitwise 
reproducible results when using an MPI that follows the suggestion for 
implementers about determinism. Beyond that, it'll depend somewhat on the 
preconditioner.

If you like BCGS, you may want to try BCGSL, which has a longer memory and 
tends to be more robust. But preconditioning is usually critical and the place 
to devote most effort.
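For instance (a sketch; "./app" is a placeholder for your executable), an unpreconditioned 
BiCGStab(L) run would look like

$ ./app -ksp_type bcgsl -pc_type none -ksp_monitor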

Mark McClure  writes:

> Hello,
>
> I have been a user of Petsc for quite a few years, though I haven't updated
> my version in a few years, so it's possible that my comments below could be
> 'out of date'.
>
> Several years ago, I'd asked you guys about reproducibility. I observed
> that if I gave an identical matrix to the Petsc linear solver, I would get
> a bit-wise identical result back if running on one processor, but if I ran
> with MPI, I would see differences at the final sig figs, below the
> convergence criterion. Even if rerunning the same exact calculation on the
> same exact machine.
>
> Ie, with repeated tests, it was always converging to the same answer
> 'within convergence tolerance', but not consistent in the sig figs beyond
> the convergence tolerance.
>
> At the time, the response that this was unavoidable, and related to the
> issue that machine arithmetic is not commutative, and so the timing of when
> processors were recombining information (which was random, effectively a
> race condition) was causing these differences.
>
> Am I remembering correctly? And, if so, is this still a property of the
> Petsc linear solver with MPI, and is there now any option available to
> resolve it? I would be willing to accept a performance hit in order to get
> guaranteed bitwise consistency, even when running with MPI.
>
> I am using the solver KSPBCGS, without a preconditioner. This is the
> selection because several years ago, I did testing, and found that on the
> particular linear systems that I am usually working with, this solver (with
> no preconditioner) was the most robust, in terms of consistently
> converging, and in terms of performance. Actually, I also tested a variety
> of other linear solvers other than Petsc (including other implementations
> of BiCGStab), and found that the Petsc BCGS was the best performer. Though,
> I'm curious, have there been updates to that algorithm in recent years,
> where I should consider updating to a newer Petsc build and comparing?
>
> Best regards,
> Mark McClure


Re: [petsc-users] Using PETSc Testing System

2023-03-28 Thread Jed Brown
Great that you got it working. We would accept a merge request that made our 
infrastructure less PETSc-specific so long as it doesn't push more complexity 
on the end user. That would likely make it easier for you to pull updates in 
the future. 

Daniele Prada  writes:

> Dear Matthew, dear Jacob,
>
> Thank you very much for your useful remarks. I managed to use the PETSc
> Testing System by doing as follows:
>
> 1. Redefined TESTDIR when running make
> 2. Used a project tree similar to that of PETSc. For examples, tests for
> 'package1' are in $MYLIB/src/package1/tests/
> 3. cp $PETSC_DIR/gmakefile.test $MYLIB/gmakefile.test
>
> Inside gmakefile.test:
>
> 4. Right AFTER "-include petscdir.mk" added "-include mylib.mk" to have
> $MYLIB exported (note: this affects TESTSRCDIR)
> 5. Redefined variable pkgs as "pkgs := package1"
> 6. Introduced a few variables to make PETSC_COMPILE.c work:
>
> CFLAGS := -I$(MYLIB)/include
> LDFLAGS = -L$(MYLIB)/lib
> LDLIBS = -lmylib
>
> 7. Changed the call to gmakegentest.py as follows
>
> $(PYTHON) $(CONFIGDIR)/gmakegentest.py --petsc-dir=$(PETSC_DIR)
> --petsc-arch=$(PETSC_ARCH) --testdir=$(TESTDIR) --srcdir=$(MYLIB)/src
> --pkg-pkgs=$(pkgs)
>
> 8. Changed the rule $(testexe.c) as follows:
>
> $(call quiet,CLINKER) $(EXEFLAGS) $(LDFLAGS) -o $@ $^ $(PETSC_TEST_LIB)
> $(LDLIBS)
>
> 9. Added the option --srcdir=$(TESTSRCDIR) and set --petsc-dir=$(MYLIB)
> when calling query_tests.py, for example:
>
> TESTTARGETS := $(shell $(PYTHON) $(CONFIGDIR)/query_tests.py
> --srcdir=$(TESTSRCDIR) --testdir=$(TESTDIR) --petsc-dir=$(MYLIB)
> --petsc-arch=$(PETSC_ARCH) --searchin=$(searchin) 'name' '$(search)')
>
>
>
> What do you think?
>
> Best,
> Daniele
>
> On Mon, Mar 27, 2023 at 4:38 PM Matthew Knepley  wrote:
>
>> On Mon, Mar 27, 2023 at 10:19 AM Jacob Faibussowitsch 
>> wrote:
>>
>>> Our testing framework was pretty much tailor-made for the PETSc src tree
>>> and as such has many hard-coded paths and decisions. I’m going to go out on
>>> a limb and say you probably won’t get this to work...
>>>
>>
>> I think we can help you get this to work. I have wanted to generalize the
>> test framework for a long time. Everything is build by
>>
>>   confg/gmakegentest.py
>>
>> and I think we can get away with just changing paths here and everything
>> will work.
>>
>>   Thanks!
>>
>>  Matt
>>
>>
>>> That being said, one of the “base” paths that the testing harness uses to
>>> initially find tests is the `TESTSRCDIR` variable in
>>> `${PETSC_DIR}/gmakefile.test`. It is currently defined as
>>> ```
>>> # TESTSRCDIR is always relative to gmakefile.test
>>> #  This must be before includes
>>> mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST)))
>>> TESTSRCDIR := $(dir $(mkfile_path))src
>>> ```
>>> You should start by changing this to
>>> ```
>>> # TESTSRCDIR is always relative to gmakefile.test
>>> #  This must be before includes
>>> mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST)))
>>> TESTSRCDIR ?= $(dir $(mkfile_path))src
>>> ```
>>> That way you could run your tests via
>>> ```
>>> $ make test TESTSRCDIR=/path/to/your/src/dir
>>> ```
>>> I am sure there are many other modifications you will need to make.
>>>
>>> Best regards,
>>>
>>> Jacob Faibussowitsch
>>> (Jacob Fai - booss - oh - vitch)
>>>
>>> > On Mar 27, 2023, at 06:14, Daniele Prada 
>>> wrote:
>>> >
>>> > Hello everyone,
>>> >
>>> > I would like to use the PETSc Testing System for testing a package that
>>> I am developing.
>>> >
>>> > I have read the PETSc developer documentation and have written some
>>> tests using the PETSc Test Description Language. I am going through the
>>> files in ${PETSC_DIR}/config but I am not able to make the testing system
>>> look into the directory tree of my project.
>>> >
>>> > Any suggestions?
>>> >
>>> > Thanks in advance
>>> > Daniele
>>>
>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> 
>>


Re: [petsc-users] GAMG failure

2023-03-28 Thread Jed Brown
This suite has been good for my solid mechanics solvers. (It's written here as 
a coarse grid solver because we do matrix-free p-MG first, but you can use it 
directly.)

https://github.com/hypre-space/hypre/issues/601#issuecomment-1069426997

Blaise Bourdin  writes:

>  On Mar 27, 2023, at 9:11 PM, Mark Adams  wrote:
>
>  Yes, the eigen estimates are converging slowly. 
>
>  BTW, have you tried hypre? It is a good solver (lots lots more woman years)
>  These eigen estimates are conceptually simple, but they can lead to problems 
> like this (hypre and an eigen estimate free
>  smoother).
>
> I just moved from petsc 3.3 to main, so my experience with an old version of 
> hypre has not been very convincing. Strangely
> enough, ML has always been the most efficient PC for me. Maybe it’s time to 
> revisit.
> That said, I would really like to get decent performances out of gamg. One 
> day, I’d like to be able to account for the special structure
> of phase-field fracture in the construction of the coarse space.
>
>  But try this (good to have options anyway):
>
>  -pc_gamg_esteig_ksp_max_it 20
>
>  Cheby will scale the estimate that we give by, I think, 5% by default. Maybe 
> 10.
>  You can set that with:
>
>  -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05
>
>  0.2 is the scaling of the high eigen estimate for the low eigen value in 
> Chebyshev.
>
> Jed’s suggestion of using -pc_gamg_reuse_interpolation 0 worked. I am testing 
> your options at the moment.
>
> Thanks a lot,
>
> Blaise
>
> — 
> Canada Research Chair in Mathematical and Computational Aspects of Solid 
> Mechanics (Tier 1)
> Professor, Department of Mathematics & Statistics
> Hamilton Hall room 409A, McMaster University
> 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada 
> https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243


Re: [petsc-users] GAMG failure

2023-03-27 Thread Jed Brown
Try -pc_gamg_reuse_interpolation 0. I thought this was disabled by default, but 
I see pc_gamg->reuse_prol = PETSC_TRUE in the code.

Blaise Bourdin  writes:

>  On Mar 24, 2023, at 3:21 PM, Mark Adams  wrote:
>
>  * Do you set: 
>
>  PetscCall(MatSetOption(Amat, MAT_SPD, PETSC_TRUE));
>
>  PetscCall(MatSetOption(Amat, MAT_SPD_ETERNAL, PETSC_TRUE));
>
> Yes
>
>  Do that to get CG Eigen estimates. Outright failure is usually caused by a 
> bad Eigen estimate.
>  -pc_gamg_esteig_ksp_monitor_singular_value
>  Will print out the estimates as its iterating. You can look at that to check 
> that the max has converged.
>
> I just did, and something is off:
> I do multiple calls to SNESSolve (staggered scheme for phase-field fracture), 
> but only get information on the first solve (which is
> not the one failing, of course)
> Here is what I get:
> Residual norms for Displacement_pc_gamg_esteig_ solve.
>   0 KSP Residual norm 7.636421712860e+01 % max 1.e+00 min 
> 1.e+00 max/min
> 1.e+00
>   1 KSP Residual norm 3.402024867977e+01 % max 1.114319928921e+00 min 
> 1.114319928921e+00 max/min
> 1.e+00
>   2 KSP Residual norm 2.124815079671e+01 % max 1.501143586520e+00 min 
> 5.739351119078e-01 max/min
> 2.615528402732e+00
>   3 KSP Residual norm 1.581785698912e+01 % max 1.644351137983e+00 min 
> 3.263683482596e-01 max/min
> 5.038329074347e+00
>   4 KSP Residual norm 1.254871990315e+01 % max 1.714668863819e+00 min 
> 2.044075812142e-01 max/min
> 8.388479789416e+00
>   5 KSP Residual norm 1.051198229090e+01 % max 1.760078533063e+00 min 
> 1.409327403114e-01 max/min
> 1.248878386367e+01
>   6 KSP Residual norm 9.061658306086e+00 % max 1.792995287686e+00 min 
> 1.023484740555e-01 max/min
> 1.751853463603e+01
>   7 KSP Residual norm 8.015529297567e+00 % max 1.821497535985e+00 min 
> 7.818018001928e-02 max/min
> 2.329871248104e+01
>   8 KSP Residual norm 7.201063258957e+00 % max 1.855140071935e+00 min 
> 6.178572472468e-02 max/min
> 3.002538337458e+01
>   9 KSP Residual norm 6.548491711695e+00 % max 1.903578294573e+00 min 
> 5.008612895206e-02 max/min
> 3.800609738466e+01
>  10 KSP Residual norm 6.002109992255e+00 % max 1.961356890125e+00 min 
> 4.130572033722e-02 max/min
> 4.748390475004e+01
>   Residual norms for Displacement_pc_gamg_esteig_ solve.
>   0 KSP Residual norm 2.373573910237e+02 % max 1.e+00 min 
> 1.e+00 max/min
> 1.e+00
>   1 KSP Residual norm 8.845061415709e+01 % max 1.081192207576e+00 min 
> 1.081192207576e+00 max/min
> 1.e+00
>   2 KSP Residual norm 5.607525485152e+01 % max 1.345947059840e+00 min 
> 5.768825326129e-01 max/min
> 2.333138869267e+00
>   3 KSP Residual norm 4.123522550864e+01 % max 1.481153523075e+00 min 
> 3.070603564913e-01 max/min
> 4.823655974348e+00
>   4 KSP Residual norm 3.345765664017e+01 % max 1.551374710727e+00 min 
> 1.953487694959e-01 max/min
> 7.941563771968e+00
>   5 KSP Residual norm 2.859712984893e+01 % max 1.604588395452e+00 min 
> 1.313871480574e-01 max/min
> 1.221267391199e+01
>   6 KSP Residual norm 2.525636054248e+01 % max 1.650487481750e+00 min 
> 9.322735730688e-02 max/min
> 1.770389646804e+01
>   7 KSP Residual norm 2.270711391451e+01 % max 1.697243639599e+00 min 
> 6.945419058256e-02 max/min
> 2.443687883140e+01
>   8 KSP Residual norm 2.074739485241e+01 % max 1.737293728907e+00 min 
> 5.319942519758e-02 max/min
> 3.265624999621e+01
>   9 KSP Residual norm 1.912808268870e+01 % max 1.771708608618e+00 min 
> 4.229776586667e-02 max/min
> 4.188657656771e+01
>  10 KSP Residual norm 1.787394414641e+01 % max 1.802834420843e+00 min 
> 3.460455235448e-02 max/min
> 5.209818645753e+01
>   Residual norms for Displacement_pc_gamg_esteig_ solve.
>   0 KSP Residual norm 1.361990679391e+03 % max 1.e+00 min 
> 1.e+00 max/min
> 1.e+00
>   1 KSP Residual norm 5.377188333825e+02 % max 1.086812916769e+00 min 
> 1.086812916769e+00 max/min
> 1.e+00
>   2 KSP Residual norm 2.819790765047e+02 % max 1.474233179517e+00 min 
> 6.475176340551e-01 max/min
> 2.276745994212e+00
>   3 KSP Residual norm 1.856720658591e+02 % max 1.646049713883e+00 min 
> 4.391851040105e-01 max/min
> 3.747963441500e+00
>   4 KSP Residual norm 1.446507859917e+02 % max 1.760403013135e+00 min 
> 2.972886103795e-01 max/min
> 5.921528614526e+00
>   5 KSP Residual norm 1.212491636433e+02 % max 1.839250080524e+00 min 
> 1.921591413785e-01 max/min
> 9.571494061277e+00
>   6 KSP Residual norm 1.052783637696e+02 % max 1.887062042760e+00 min 
> 1.275920366984e-01 max/min
> 1.478981048966e+01
>   7 KSP Residual norm 9.230292625762e+01 % max 1.917891358356e+00 min 
> 8.853577120467e-02 max/min
> 2.166233300122e+01
>   8 KSP Residual norm 8.262607594297e+01 % max 1.935857204308e+00 min 
> 6.706949937710e-02 max/min
> 2.886345093206e+01
>   9 KSP Residual norm 7.616474911000e+01 % max 1.946323901431e+00 min 
> 5.354310733090e-02 max/min
> 3.635059671458e+01
>  10 KSP Residual norm 

Re: [petsc-users] GAMG failure

2023-03-24 Thread Jed Brown
You can use -pc_gamg_threshold .02 to slow the coarsening, and either use a stronger 
smoother or increase the number of iterations used for estimation (or increase the 
tolerance). I assume your system is SPD and you've set the near-null space.
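For elasticity, setting the near-null space usually means attaching the rigid-body modes. 
A minimal sketch, assuming a DM that carries the mesh coordinates (block size equal to the 
spatial dimension) and a displacement-only operator A:

  Vec          coords;
  MatNullSpace nns;
  PetscCall(DMGetCoordinates(dm, &coords));
  PetscCall(MatNullSpaceCreateRigidBody(coords, &nns)); /* translations + rotations */
  PetscCall(MatSetNearNullSpace(A, nns));
  PetscCall(MatNullSpaceDestroy(&nns));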

Blaise Bourdin  writes:

> Hi,
>
> I am having issues with GAMG for some very ill-conditioned 2D linearized 
> elasticity problems (sharp variation of elastic moduli with thin regions of 
> nearly incompressible material). I use snes_type newtonls, linesearch_type 
> cp, and pc_type gamg without any further options. pc_type Jacobi converges 
> fine (although slowly of course).
>
>
> I am not really surprised that gamg would not converge out of the box, but 
> don’t know where to start to investigate the convergence failure. Can anybody 
> help?
>
> Blaise
>
> — 
> Canada Research Chair in Mathematical and Computational Aspects of Solid 
> Mechanics (Tier 1)
> Professor, Department of Mathematics & Statistics
> Hamilton Hall room 409A, McMaster University
> 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada 
> https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243


Re: [petsc-users] O3 versus O2

2023-03-08 Thread Jed Brown
You can test a benchmark problem with both. It probably doesn't make a lot of 
difference with the solver configuration you've selected (most of those 
operations are memory bandwidth limited).

If your residual and Jacobian assembly code is written to vectorize, you may 
get significant benefit from architecture-specific optimizations like 
-march=skylake.
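One way to experiment (a sketch; adjust -march to your target hardware and add your usual 
options) is to pass the flags at configure time:

$ ./configure --with-debugging=0 COPTFLAGS='-O3 -march=skylake' \
    CXXOPTFLAGS='-O3 -march=skylake' FOPTFLAGS='-O3 -march=skylake'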

Alfredo Jaramillo  writes:

> Dear community,
>
> We are in the middle of testing a simulator where the main computational
> bottleneck is solving a linear problem. We do this by calling
> GMRES+BoomerAMG through PETSc.
>
> This is a commercial code, intended to serve clients with workstations or
> with access to clusters.
>
> Would you recommend O3 versus O2 optimizations? Maybe just to compile the
> linear algebra libraries?
>
> Some years ago, I worked on another project where going back to O2 solved a
> weird runtime error that I was never able to solve. This is the source of my
> distrust.
>
> Thank you for your time!
> Alfredo


Re: [petsc-users] PetscViewer with 64bit

2023-02-16 Thread Jed Brown
Okay, this works now. I'm pretty sure I tested this long ago with connectivity 
using Int64 and found that didn't work. That may have been ancient history, but 
I'm hesitant to revamp to match PetscInt. If doing that, it would require 
changing the signature of DMPlexGetVTKConnectivity to use PetscInt instead of 
PetscVTKInt. I'm already underwater and don't have the stamina to test it, but 
this MR will get you going for problems in which individual parts don't have 
more than 2B dofs.

https://gitlab.com/petsc/petsc/-/merge_requests/6081/diffs?commit_id=27ba695b8b62ee2bef0e5776c33883276a2a1735

Mike Michell  writes:

> Jed,
>
> It does not work for me even with the reproducer case with the small 2D
> square mesh. Can you run the case that I sent and open the created
> "sol.vtu" file with paraview?
>
> Thanks,
>
>
>> Thanks, Dave.
>>
>> Mike, can you test that this branch works with your large problems? I
>> tested that .vtu works in parallel for small problems, where works = loads
>> correctly in Paraview and VisIt.
>>
>> https://gitlab.com/petsc/petsc/-/merge_requests/6081
>>
>> Dave May  writes:
>>
>> > On Tue 14. Feb 2023 at 21:27, Jed Brown  wrote:
>> >
>> >> Dave May  writes:
>> >>
>> >> > On Tue 14. Feb 2023 at 17:17, Jed Brown  wrote:
>> >> >
>> >> >> Can you share a reproducer? I think I recall the format requiring
>> >> certain
>> >> >> things to be Int32.
>> >> >
>> >> >
>> >> > By default, the byte offset used with the appended data format is
>> >> UInt32. I
>> >> > believe that’s where the sizeof(int) is coming from. This default is
>> >> > annoying as it limits the total size of your appended data to be < 3
>> GB.
>> >> > That said, in the opening of the paraview file you can add this
>> attribute
>> >> >
>> >> > header_type="UInt64"
>> >>
>> >> You mean in the header of the .vtu?
>> >
>> >
>> > Yeah, within the open VTKFile tag.
>> > Like this
>> > < VTKFile type=“xxx”,  byte_order="LittleEndian" header_type="UInt64" >
>> >
>> > Do you happen to have an example or pointers to docs describing this
>> >> feature?
>> >
>> >
>> > Example yes - will send it to you tomorrow. Docs… not really. Only stuff
>> > like this
>> >
>> >
>> https://kitware.github.io/paraview-docs/latest/python/paraview.simple.XMLPStructuredGridWriter.html
>> >
>> >
>> >
>> https://kitware.github.io/paraview-docs/v5.8.0/python/paraview.simple.XMLMultiBlockDataWriter.html
>> >
>> > All the writers seem to support it.
>> >
>> >
>> > Can we always do this?
>> >
>> >
>> > Yep!
>> >
>> >
>> > It isn't mentioned in these:
>> >>
>> >> https://vtk.org/wp-content/uploads/2015/04/file-formats.pdf   (PDF was
>> >> created in 2003)
>> >>
>> >>
>> https://kitware.github.io/vtk-examples/site/VTKFileFormats/#xml-file-formats
>> >>
>> >
>> > Yes I know. I’ve tied myself in knots for years because the of the
>> > assumption that the offset had to be an int.
>> >
>> > Credit for the discovery goes to Carsten Uphoff. He showed me this.
>> >
>> > Cheers,
>> > Dave
>> >
>> >
>> >
>> >> > then the size of the offset is now UInt64 and now large files can be
>> >> > finally written.
>> >> >
>> >> >
>> >> > Cheers,
>> >> > Dave
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >>
>> >> >> Mike Michell  writes:
>> >> >>
>> >> >> > Thanks for the note.
>> >> >> > I understood that PETSc calculates the offsets for me through
>> >> "boffset"
>> >> >> > variable in plexvtu.c file. Please correct me if it is wrong.
>> >> >> >
>> >> >> > If plexvtu.c has a bug, it could be around "write file header"
>> part in
>> >> >> > which the boffset is also computed. Is this correct? I am not using
>> >> >> complex
>> >> >> > number.
>> >> >> > There are several mixed parts among "I

Re: [petsc-users] PetscViewer with 64bit

2023-02-16 Thread Jed Brown
Thanks, Dave.

Mike, can you test that this branch works with your large problems? I tested 
that .vtu works in parallel for small problems, where works = loads correctly 
in Paraview and VisIt.

https://gitlab.com/petsc/petsc/-/merge_requests/6081

Dave May  writes:

> On Tue 14. Feb 2023 at 21:27, Jed Brown  wrote:
>
>> Dave May  writes:
>>
>> > On Tue 14. Feb 2023 at 17:17, Jed Brown  wrote:
>> >
>> >> Can you share a reproducer? I think I recall the format requiring
>> certain
>> >> things to be Int32.
>> >
>> >
>> > By default, the byte offset used with the appended data format is
>> UInt32. I
>> > believe that’s where the sizeof(int) is coming from. This default is
>> > annoying as it limits the total size of your appended data to be < 3 GB.
>> > That said, in the opening of the paraview file you can add this attribute
>> >
>> > header_type="UInt64"
>>
>> You mean in the header of the .vtu?
>
>
> Yeah, within the open VTKFile tag.
> Like this
> < VTKFile type=“xxx”,  byte_order="LittleEndian" header_type="UInt64" >
>
> Do you happen to have an example or pointers to docs describing this
>> feature?
>
>
> Example yes - will send it to you tomorrow. Docs… not really. Only stuff
> like this
>
> https://kitware.github.io/paraview-docs/latest/python/paraview.simple.XMLPStructuredGridWriter.html
>
>
> https://kitware.github.io/paraview-docs/v5.8.0/python/paraview.simple.XMLMultiBlockDataWriter.html
>
> All the writers seem to support it.
>
>
> Can we always do this?
>
>
> Yep!
>
>
> It isn't mentioned in these:
>>
>> https://vtk.org/wp-content/uploads/2015/04/file-formats.pdf   (PDF was
>> created in 2003)
>>
>> https://kitware.github.io/vtk-examples/site/VTKFileFormats/#xml-file-formats
>>
>
> Yes I know. I’ve tied myself in knots for years because the of the
> assumption that the offset had to be an int.
>
> Credit for the discovery goes to Carsten Uphoff. He showed me this.
>
> Cheers,
> Dave
>
>
>
>> > then the size of the offset is now UInt64 and now large files can be
>> > finally written.
>> >
>> >
>> > Cheers,
>> > Dave
>> >
>> >
>> >
>> >
>> >>
>> >> Mike Michell  writes:
>> >>
>> >> > Thanks for the note.
>> >> > I understood that PETSc calculates the offsets for me through
>> "boffset"
>> >> > variable in plexvtu.c file. Please correct me if it is wrong.
>> >> >
>> >> > If plexvtu.c has a bug, it could be around "write file header" part in
>> >> > which the boffset is also computed. Is this correct? I am not using
>> >> complex
>> >> > number.
>> >> > There are several mixed parts among "Int32, UInt8, PetscInt_FMT,
>> >> > PetscInt64_FMT" in writing the header.
>> >> >
>> >> > Which combination of those flags is correct for 64bit indices? I am
>> gonna
>> >> > modify plexvtu.c file with "#if defined(PETSC_USE_64BIT_INDICES)"
>> >> > statement, but I do not know what is the correct form of the header
>> flag
>> >> > for 64bit indices.
>> >> >
>> >> > It is also confusing to me:
>> >> > boffset += gpiece[r].ncells * sizeof(PetscInt) + sizeof(int);
>> >> > How is sizeof(PetscInt) different from sizeof(int)?
>> >> >
>> >> > Thanks,
>> >> > Mike
>> >> >
>> >> >
>> >> >> On Tue, Feb 14, 2023 at 11:45 AM Mike Michell > >
>> >> >> wrote:
>> >> >>
>> >> >>> I was trying to modify the header flags from "Int32" to "Int64", but
>> >> the
>> >> >>> problem was not resolved. Could I get any additional comments?
>> >> >>>
>> >> >>
>> >> >> The calculated offsets are not correct I think.
>> >> >>
>> >> >>   Matt
>> >> >>
>> >> >>
>> >> >>> Thanks,
>> >> >>> Mike
>> >> >>>
>> >> >>>
>> >> >>>> Thanks for the comments.
>> >> >>>> To be precise on the question, the entire part of the header of the
>> >> .vtu
>> >> >>>&

Re: [petsc-users] PetscViewer with 64bit

2023-02-14 Thread Jed Brown
Dave May  writes:

> On Tue 14. Feb 2023 at 17:17, Jed Brown  wrote:
>
>> Can you share a reproducer? I think I recall the format requiring certain
>> things to be Int32.
>
>
> By default, the byte offset used with the appended data format is UInt32. I
> believe that’s where the sizeof(int) is coming from. This default is
> annoying as it limits the total size of your appended data to be < 3 GB.
> That said, in the opening of the paraview file you can add this attribute
>
> header_type="UInt64"

You mean in the header of the .vtu? Do you happen to have an example or 
pointers to docs describing this feature? Can we always do this? It isn't 
mentioned in these: 

https://vtk.org/wp-content/uploads/2015/04/file-formats.pdf   (PDF was created 
in 2003)
https://kitware.github.io/vtk-examples/site/VTKFileFormats/#xml-file-formats

> then the size of the offset is now UInt64 and now large files can be
> finally written.
>
>
> Cheers,
> Dave
>
>
>
>
>>
>> Mike Michell  writes:
>>
>> > Thanks for the note.
>> > I understood that PETSc calculates the offsets for me through "boffset"
>> > variable in plexvtu.c file. Please correct me if it is wrong.
>> >
>> > If plexvtu.c has a bug, it could be around "write file header" part in
>> > which the boffset is also computed. Is this correct? I am not using
>> complex
>> > number.
>> > There are several mixed parts among "Int32, UInt8, PetscInt_FMT,
>> > PetscInt64_FMT" in writing the header.
>> >
>> > Which combination of those flags is correct for 64bit indices? I am gonna
>> > modify plexvtu.c file with "#if defined(PETSC_USE_64BIT_INDICES)"
>> > statement, but I do not know what is the correct form of the header flag
>> > for 64bit indices.
>> >
>> > It is also confusing to me:
>> > boffset += gpiece[r].ncells * sizeof(PetscInt) + sizeof(int);
>> > How is sizeof(PetscInt) different from sizeof(int)?
>> >
>> > Thanks,
>> > Mike
>> >
>> >
>> >> On Tue, Feb 14, 2023 at 11:45 AM Mike Michell 
>> >> wrote:
>> >>
>> >>> I was trying to modify the header flags from "Int32" to "Int64", but
>> the
>> >>> problem was not resolved. Could I get any additional comments?
>> >>>
>> >>
>> >> The calculated offsets are not correct I think.
>> >>
>> >>   Matt
>> >>
>> >>
>> >>> Thanks,
>> >>> Mike
>> >>>
>> >>>
>> >>>> Thanks for the comments.
>> >>>> To be precise on the question, the entire part of the header of the
>> .vtu
>> >>>> file is attached:
>> >>>>
>> >>>> 
>> >>>> > byte_order="LittleEndian">
>> >>>>   
>> >>>> 
>> >>>>   
>> >>>> > NumberOfComponents="3"
>> >>>> format="appended" offset="0" />
>> >>>>   
>> >>>>   
>> >>>> > >>>> NumberOfComponents="1" format="appended" offset="116932" />
>> >>>> > >>>>  NumberOfComponents="1" format="appended" offset="372936" />
>> >>>> > >>>>  NumberOfComponents="1" format="appended" offset="404940" />
>> >>>>   
>> >>>>   
>> >>>> > >>>> format="appended" offset="408944" />
>> >>>>   
>> >>>>   
>> >>>> > >>>> NumberOfComponents="1" format="appended" offset="424948" />
>> >>>>   
>> >>>> 
>> >>>> 
>> >>>>   
>> >>>> > NumberOfComponents="3"
>> >>>> format="appended" offset="463928" />
>> >>>>   
>> >>>>   
>> >>>> > >>>> NumberOfComponents="1" format="appended" offset="580860" />
>> >>>> > >>>>  NumberOfComponents="1" format="appended" offset="836864" /

Re: [petsc-users] PetscViewer with 64bit

2023-02-14 Thread Jed Brown
Can you share a reproducer? I think I recall the format requiring certain 
things to be Int32.

Mike Michell  writes:

> Thanks for the note.
> I understood that PETSc calculates the offsets for me through "boffset"
> variable in plexvtu.c file. Please correct me if it is wrong.
>
> If plexvtu.c has a bug, it could be around "write file header" part in
> which the boffset is also computed. Is this correct? I am not using complex
> number.
> There are several mixed parts among "Int32, UInt8, PetscInt_FMT,
> PetscInt64_FMT" in writing the header.
>
> Which combination of those flags is correct for 64bit indices? I am gonna
> modify plexvtu.c file with "#if defined(PETSC_USE_64BIT_INDICES)"
> statement, but I do not know what is the correct form of the header flag
> for 64bit indices.
>
> It is also confusing to me:
> boffset += gpiece[r].ncells * sizeof(PetscInt) + sizeof(int);
> How is sizeof(PetscInt) different from sizeof(int)?
>
> Thanks,
> Mike
>
>
>> On Tue, Feb 14, 2023 at 11:45 AM Mike Michell 
>> wrote:
>>
>>> I was trying to modify the header flags from "Int32" to "Int64", but the
>>> problem was not resolved. Could I get any additional comments?
>>>
>>
>> The calculated offsets are not correct I think.
>>
>>   Matt
>>
>>
>>> Thanks,
>>> Mike
>>>
>>>
 Thanks for the comments.
 To be precise on the question, the entire part of the header of the .vtu
 file is attached:

 
 
   
 
   
 >>> format="appended" offset="0" />
   
   
 >>> NumberOfComponents="1" format="appended" offset="116932" />
 >>>  NumberOfComponents="1" format="appended" offset="372936" />
 >>>  NumberOfComponents="1" format="appended" offset="404940" />
   
   
 >>> format="appended" offset="408944" />
   
   
 >>> NumberOfComponents="1" format="appended" offset="424948" />
   
 
 
   
 >>> format="appended" offset="463928" />
   
   
 >>> NumberOfComponents="1" format="appended" offset="580860" />
 >>>  NumberOfComponents="1" format="appended" offset="836864" />
 >>>  NumberOfComponents="1" format="appended" offset="868868" />
   
   
 >>> format="appended" offset="872872" />
   
   
 >>> NumberOfComponents="1" format="appended" offset="76" />
   
 
   
   


 Thanks,
 Mike


> On Sun, Feb 12, 2023 at 6:15 PM Mike Michell 
> wrote:
>
>> Dear PETSc team,
>>
>> I am a user of PETSc with Fortran. My code uses DMPlex to handle dm
>> object. To print out output variable and mesh connectivity, I use 
>> VecView()
>> by defining PetscSection on that dm and borrow a vector. The type of the
>> viewer is set to PETSCVIEWERVTK.
>>
>> With 32bit indices, the above work flow has no issue. However, if
>> PETSc is configured with 64bit indices, my output .vtu file has an error 
>> if
>> I open the file with visualization tools, such as Paraview or Tecplot,
>> saying that:
>> "Cannot read cell connectivity from Cells in piece 0 because the
>> "offsets" array is not monotonically increasing or starts with a value
>> other than 0."
>>
>> If I open the .vtu file from terminal, I can see such a line:
>> ...
>> > format="appended" offset="580860" />
>> ...
>>
>> I expected "DataArray type="Int64", since the PETSc has 64bit indices.
>> Could I get recommendations that I need to check to resolve the issue?
>>
>
> This is probably a bug. We will look at it.
>
> Jed, I saw that Int32 is hardcoded in plexvtu.c, but sizeof(PetscInt)
> is used to calculate the offset, which looks inconsistent. Can you take a
> look?
>
>   Thanks,
>
>  Matt
>
>
>> Thanks,
>> Mike
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> 
>

>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> 
>>


Re: [petsc-users] GPUs and the float-double dilemma

2023-02-10 Thread Jed Brown
Ces VLC  writes:

> El El vie, 10 feb 2023 a las 21:44, Barry Smith  escribió:
>
>>
>>What is the use case you are looking for that cannot be achieved by
>> just distributing a single precision application? If the user is happy when
>> they happen to have GPUs to use single precision everywhere, then why would
>> they need double precision if they happen not to have a GPU?
>>
>>Are you just using KSP or also SNES, TS etc?
>
>
> Thanks for your replies. The use case is structural analysis (so, sparse
> symmetrical matrix, and minimum degree reorder tends to work fine in CPU
> (for GPU I’ll need to check the best performing scenarios).

Sparse direct solvers are rather lacking on GPUs. You may be able to use an 
algebraic multigrid on GPUs. If you're using an iterative solver, you'll be 
limited by memory bandwidth, not flops, so double -> float at best cuts memory traffic 
to 8/12 of double (with 32-bit column indices, a CSR nonzero is 8+4 bytes in double 
versus 4+4 bytes in single), i.e., at most a 1.5x speedup.

You may be interested in this work on structural mechanics for GPUs.

https://arxiv.org/abs/2204.01722

> Obviously, this use case requires double precision. But single precision
> might be fine enough for faster low quality runs if the user happens to
> have a GPU that accelerates float and not double (I have a 12GB Pascal
> Titan, it accelerates float, not double).
>
> Kind regards,
>
> César


Re: [petsc-users] Question about handling matrix

2023-02-01 Thread Jed Brown
Is the small matrix dense? Then you can use MatSetValues. If the small matrix 
is sparse, you can assemble it with larger dimension (empty rows and columns) 
and use MatAXPY.
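A minimal sketch of the dense case (the helper name and the row-major layout of vals are 
my choices here, not an existing PETSc API):

  #include <petscmat.h>

  /* Add an m-by-n dense block vals (row-major) into A starting at global (row0, col0). */
  static PetscErrorCode AddDenseBlock(Mat A, PetscInt row0, PetscInt col0, PetscInt m, PetscInt n, const PetscScalar *vals)
  {
    PetscInt i, *rows, *cols;

    PetscFunctionBeginUser;
    PetscCall(PetscMalloc2(m, &rows, n, &cols));
    for (i = 0; i < m; i++) rows[i] = row0 + i;
    for (i = 0; i < n; i++) cols[i] = col0 + i;
    PetscCall(MatSetValues(A, m, rows, n, cols, vals, ADD_VALUES)); /* or INSERT_VALUES */
    PetscCall(PetscFree2(rows, cols));
    PetscFunctionReturn(0);
  }

Call MatAssemblyBegin/MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY) once after all blocks have 
been added.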

김성익  writes:

> Hello,
>
>
> I want to put a small matrix into a large matrix.
> The schematic of operation is as below.
> [image: image.png]
> Is there any function to put a small matrix into a large matrix at once?
>
> Thanks,
> Hyung Kim


Re: [petsc-users] locally deploy PETSc

2023-01-19 Thread Jed Brown
You're probably looking for ./configure --prefix=/opt/petsc. It's documented in 
./configure --help. 
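Roughly (a sketch; configure prints the exact make commands for your PETSC_ARCH, and the 
arch name below is just an example):

$ ./configure --prefix=/opt/petsc
$ make PETSC_DIR=/home/me/opt/petsc PETSC_ARCH=arch-linux-c-debug all
$ make PETSC_DIR=/home/me/opt/petsc PETSC_ARCH=arch-linux-c-debug install

After "make install", applications use PETSC_DIR=/opt/petsc with no PETSC_ARCH.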

Tim Meehan  writes:

> Hi - I am trying to set up a local workstation for a few other developers who 
> need PETSc installed from the latest release. I figured that it would be 
> easiest for me to just clone the repository, as mentioned in the Quick Start.
>
> So, in /home/me/opt, I issued:
> git clone -b release https://gitlab.com/petsc/petsc.git petsc
> cd petsc
> ./configure
> make all check
>
> Things work fine, but I would like to install it in /opt/petsc, minus all of 
> the build debris.
>
> Is there some way to have './configure' do this?
> (I was actually thinking that the configure script was from GNU autotools or 
> something - but obviously not)
>
> Cheers,
> Tim


Re: [petsc-users] DMPlex and CGNS

2023-01-17 Thread Jed Brown
Copying my private reply that appeared off-list. If you have one base with 
different element types, that's in scope for what I plan to develop soon.

Congrats, you crashed cgnsview.

$ cgnsview dl/HybridGrid.cgns
Error in startup script: file was not found
while executing
"CGNSfile $ProgData(file)"
(procedure "file_stats" line 4)
invoked from within
"file_stats"
(procedure "file_load" line 53)
invoked from within
"file_load $fname"
invoked from within
"if {$argc} {
  set fname [lindex $argv [expr $argc - 1]]
  if {[file isfile $fname] && [file readable $fname]} {
file_load $fname
  }
}"
(file "/usr/share/cgnstools/cgnsview.tcl" line 3013)

This file looks okay in cgnscheck and paraview, but I don't have a need for 
multi-block and I'm stretched really thin so probably won't make it work any 
time soon. But if
you make a single block with HexElements alongside PyramidElements and 
TetElements, I should be able to read it. If you don't mind prepping such a 
file (this size or
smaller), it would help me test.


"Engblom, William A."  writes:

> Jesus,
>
> The CGNS files we get from Pointwise have only one base, so that should not 
> be an issue.  However, sections are needed to contain each cell type, the 
> BCs, and zonal boundaries. So, there are always several sections.  The grid 
> that Spencer made for you must have multiple sections.  We have to be able to 
> deal with grids like Spencer's example or else it's not useful.
>
> B.
>
>
>
>
>
>
> 
> From: Ferrand, Jesus A. 
> Sent: Monday, January 16, 2023 5:41 PM
> To: petsc-users@mcs.anl.gov 
> Subject: DMPlex and CGNS
>
> Dear PETSc team:
>
> I would like to use DMPlex to partition a mesh stored as a CGNS file. I 
> configured my installation with --download_cgns = 1, got me a .cgns file and 
> called DMPlexCreateCGNSFromFile() on it. Doing so got me this error:
>
> [0]PETSC ERROR: - Error Message 
> --
> [0]PETSC ERROR: Error in external library
> [0]PETSC ERROR: CGNS file must have a single section, not 4
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.18.3, unknown
> [0]PETSC ERROR: ./program.exe on a arch-linux-c-debug named F86 by jesus Mon 
> Jan 16 17:25:11 2023
> [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes 
> --download-cgns=yes --download-metis=yes --download-parmetis=yes 
> --download-ptscotch=yes --download-chaco=yes --with-32bits-pci-domain=1
> [0]PETSC ERROR: #1 DMPlexCreateCGNS_Internal() at 
> /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/cgns/plexcgns2.c:104
> [0]PETSC ERROR: #2 DMPlexCreateCGNS() at 
> /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/plexcgns.c:60
> [0]PETSC ERROR: #3 DMPlexCreateCGNSFromFile_Internal() at 
> /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/cgns/plexcgns2.c:27
> [0]PETSC ERROR: #4 DMPlexCreateCGNSFromFile() at 
> /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/plexcgns.c:29
>
> I looked around mail archives for clues and found this one 
> (https://lists.mcs.anl.gov/pipermail/petsc-users/2018-June/035544.html). 
> There, Matt provides a link to the source code for DMPlexCreateCGNSFromFile() 
> and another (seemingly broken) link to CGNS files that can be opened with the 
> former.  After reading the source code I now understand that it is hardcoded 
> for CGNS files that feature a single "base" and a single "section", whatever 
> those are.
>
> After navigating the CGNS documentation, I can sympathize with the comments 
> in the source code.
>
> Anyhow, I wanted to ask if I could be furnished with one such CGNS file that 
> is compatible with DMPlexCreateCGNSFromFile() to see if I can modify my CGNS 
> files to conform to it. If not, I will look into building the DAG myself 
> using DMPlex APIs.
>
>
> Sincerely:
>
> J.A. Ferrand
>
> Embry-Riddle Aeronautical University - Daytona Beach FL
>
> M.Sc. Aerospace Engineering
>
> B.Sc. Aerospace Engineering
>
> B.Sc. Computational Mathematics
>
>
> Phone: (386)-843-1829
>
> Email(s): ferra...@my.erau.edu
>
> jesus.ferr...@gmail.com


Re: [petsc-users] DMPlex and CGNS

2023-01-16 Thread Jed Brown
Matthew Knepley  writes:

> On Mon, Jan 16, 2023 at 6:15 PM Jed Brown  wrote:
>
>> How soon do you need this? I understand the grumbling about CGNS, but it's
>> easy to build, uses HDF5 parallel IO in a friendly way, supports high order
>> elements, and is generally pretty expressive. I wrote a parallel writer
>> (with some limitations that I'll remove) and plan to replace the current
>> reader with a parallel reader because I have need to read big CGNS meshes.
>> Alas, my semester is starting so it's hard to make promises, but this is a
>> high priority for my research group and if you share your mesh file, I hope
>> to be able to make the parallel reader work in the next couple weeks.
>>
>
> The biggest hurdle for me is understanding the CGNS format. If you can tell
> me what is going on, I can help write the Plex code that
> interprets it.

I wrote the parallel writer and I'm pretty familiar with the data model. Let's 
connect in chat.


Re: [petsc-users] DMPlex and CGNS

2023-01-16 Thread Jed Brown
How soon do you need this? I understand the grumbling about CGNS, but it's easy 
to build, uses HDF5 parallel IO in a friendly way, supports high order 
elements, and is generally pretty expressive. I wrote a parallel writer (with 
some limitations that I'll remove) and plan to replace the current reader with 
a parallel reader because I have need to read big CGNS meshes. Alas, my 
semester is starting so it's hard to make promises, but this is a high priority 
for my research group and if you share your mesh file, I hope to be able to 
make the parallel reader work in the next couple weeks.

"Ferrand, Jesus A."  writes:

> Dear PETSc team:
>
> I would like to use DMPlex to partition a mesh stored as a CGNS file. I 
> configured my installation with --download_cgns = 1, got me a .cgns file and 
> called DMPlexCreateCGNSFromFile() on it. Doing so got me this error:
>
> [0]PETSC ERROR: - Error Message 
> --
> [0]PETSC ERROR: Error in external library
> [0]PETSC ERROR: CGNS file must have a single section, not 4
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.18.3, unknown
> [0]PETSC ERROR: ./program.exe on a arch-linux-c-debug named F86 by jesus Mon 
> Jan 16 17:25:11 2023
> [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes 
> --download-cgns=yes --download-metis=yes --download-parmetis=yes 
> --download-ptscotch=yes --download-chaco=yes --with-32bits-pci-domain=1
> [0]PETSC ERROR: #1 DMPlexCreateCGNS_Internal() at 
> /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/cgns/plexcgns2.c:104
> [0]PETSC ERROR: #2 DMPlexCreateCGNS() at 
> /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/plexcgns.c:60
> [0]PETSC ERROR: #3 DMPlexCreateCGNSFromFile_Internal() at 
> /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/cgns/plexcgns2.c:27
> [0]PETSC ERROR: #4 DMPlexCreateCGNSFromFile() at 
> /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/plexcgns.c:29
>
> I looked around mail archives for clues and found this one 
> (https://lists.mcs.anl.gov/pipermail/petsc-users/2018-June/035544.html). 
> There, Matt provides a link to the source code for DMPlexCreateCGNSFromFile() 
> and another (seemingly broken) link to CGNS files that can be opened with the 
> former.  After reading the source code I now understand that it is hardcoded 
> for CGNS files that feature a single "base" and a single "section", whatever 
> those are.
>
> After navigating the CGNS documentation, I can sympathize with the comments 
> in the source code.
>
> Anyhow, I wanted to ask if I could be furnished with one such CGNS file that 
> is compatible with DMPlexCreateCGNSFromFile() to see if I can modify my CGNS 
> files to conform to it. If not, I will look into building the DAG myself 
> using DMPlex APIs.
>
>
> Sincerely:
>
> J.A. Ferrand
>
> Embry-Riddle Aeronautical University - Daytona Beach FL
>
> M.Sc. Aerospace Engineering
>
> B.Sc. Aerospace Engineering
>
> B.Sc. Computational Mathematics
>
>
> Phone: (386)-843-1829
>
> Email(s): ferra...@my.erau.edu
>
> jesus.ferr...@gmail.com


Re: [petsc-users] coordinate degrees of freedom for 2nd-order gmsh mesh

2023-01-12 Thread Jed Brown
Dave May  writes:

> On Thu 12. Jan 2023 at 17:58, Blaise Bourdin  wrote:
>
>> Out of curiosity, what is the rationale for _reading_ high order gmsh
>> meshes?
>>
>
> GMSH can use a CAD engine like OpenCascade. This provides geometric
> representations via things like BSplines. Such geometric representation are
> not exposed to the users application code, nor are they embedded in any
> mesh format GMSH emits. The next best thing is to use a high order
> representation of the mesh geometry and project the CAD geometry (say a
> BSpline) into this higher order function space. The projection of the
> geometry is a quantity that can be described with the .msh format.

Adding to this, efficient methods for volumes with concave surfaces *must* use 
at least quadratic geometry. See Figure 5, where "p-refinement with linear 
geometry" causes anti-convergence (due the spurious stress singularities from 
the linear geometry, visible in Figure 4) while p-refinement with quadratic 
geometry is vastly more efficient despite the physical stress singularities 
that prevent exponential convergence. 

https://arxiv.org/pdf/2204.01722.pdf


Re: [petsc-users] coordinate degrees of freedom for 2nd-order gmsh mesh

2023-01-12 Thread Jed Brown
It's confusing, but this line makes high order simplices always read as 
discontinuous coordinate spaces. I would love if someone would revisit that, 
perhaps also using DMPlexSetIsoperiodicFaceSF(), which should simplify the code 
and avoid the confusing cell coordinates pattern. Sadly, I don't have time to 
dive in.

https://gitlab.com/petsc/petsc/-/commit/066ea43f7f75752f012be6cd06b6107ebe84cc6d#3616cad8148970af5b97293c49492ff893e25b59_1552_1724

"Daniel R. Shapero"  writes:

> Sorry either your mail system or mine prevented me from attaching the file,
> so I put it on pastebin:
> https://pastebin.com/awFpc1Js
>
> On Wed, Jan 11, 2023 at 4:54 PM Matthew Knepley  wrote:
>
>> Can you send the .msh file? I still have not installed Gmsh :)
>>
>>   Thanks,
>>
>>  Matt
>>
>> On Wed, Jan 11, 2023 at 2:43 PM Daniel R. Shapero  wrote:
>>
>>> Hi all -- I'm trying to read in 2nd-order / piecewise quadratic meshes
>>> that are generated by gmsh and I don't understand how the coordinates are
>>> stored in the plex. I've been discussing this with Matt Knepley here
>>> 
>>> as it pertains to Firedrake but I think this is more an issue at the PETSc
>>> level.
>>>
>>> This code
>>> 
>>> uses gmsh to generate a 2nd-order mesh of the unit disk, read it into a
>>> DMPlex, print out the number of cells in each depth stratum, and finally
>>> print a view of the coordinate DM's section. The resulting mesh has 64
>>> triangles, 104 edges, and 41 vertices. For 2nd-order meshes, I'd expected
>>> there to be 2 degrees of freedom at each node and 2 at each edge. The
>>> output is:
>>>
>>> ```
>>> Depth strata: [(64, 105), (105, 209), (0, 64)]
>>>
>>> PetscSection Object: 1 MPI process
>>>   type not yet set
>>> 1 fields
>>>   field 0 with 2 components
>>> Process 0:
>>>   (   0) dim 12 offset   0
>>>   (   1) dim 12 offset  12
>>>   (   2) dim 12 offset  24
>>> ...
>>>   (  62) dim 12 offset 744
>>>   (  63) dim 12 offset 756
>>>   (  64) dim  0 offset 768
>>>   (  65) dim  0 offset 768
>>> ...
>>>   ( 207) dim  0 offset 768
>>>   ( 208) dim  0 offset 768
>>>   PetscSectionSym Object: 1 MPI process
>>> type: label
>>> Label 'depth'
>>> Symmetry for stratum value 0 (0 dofs per point): no symmetries
>>> Symmetry for stratum value 1 (0 dofs per point): no symmetries
>>> Symmetry for stratum value 2 (12 dofs per point):
>>>   Orientation range: [-3, 3)
>>> Symmetry for stratum value -1 (0 dofs per point): no symmetries
>>> ```
>>>
>>> The output suggests that there are 12 degrees of freedom in each
>>> triangle. That would mean the coordinate field is discontinuous across cell
>>> boundaries. Can someone explain what's going on? I tried reading the .msh
>>> file but it's totally inscrutable to me. I'm happy to RTFSC if someone
>>> points me in the right direction. Matt tells me that the coordinate field
>>> should only be discontinuous if the mesh is periodic, but this mesh
>>> shouldn't be periodic.
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> 
>>


Re: [petsc-users] GPU implementation of serial smoothers

2023-01-10 Thread Jed Brown
Mark Lohry  writes:

> I definitely need multigrid. I was under the impression that GAMG was
> relatively cuda-complete, is that not the case? What functionality works
> fully on GPU and what doesn't, without any host transfers (aside from
> what's needed for MPI)?
>
> If I use -ksp-pc_type gamg -mg_levels_pc_type pbjacobi -mg_levels_ksp_type
> richardson is that fully on device, but -mg_levels_pc_type ilu or
> -mg_levels_pc_type sor require transfers?

You can do `-mg_levels_pc_type ilu`, but it'll be extremely slow (like 20x 
slower than an operator apply). One can use Krylov smoothers, though that's 
more synchronization. Automatic construction of operator-dependent multistage 
smoothers for linear multigrid (because Chebyshev only works for problems that 
have eigenvalues near the real axis) is something I've wanted to develop for at 
least a decade, but time is always short. I might put some effort into p-MG 
with such smoothers this year as we add DDES to our scale-resolving 
compressible solver.


Re: [petsc-users] GPU implementation of serial smoothers

2023-01-10 Thread Jed Brown
The joy of GPUs. You can use sparse triangular kernels like ILU (provided by 
cuSPARSE), but they are so mindbogglingly slow that you'll go back to the drawing 
board and try to use a multigrid method of some sort with 
polynomial/point-block smoothing.

BTW, on unstructured grids, coloring requires a lot of colors and thus many 
times more bandwidth (due to multiple passes) than the operator itself.

Mark Lohry  writes:

> Well that's suboptimal. What are my options for 100% GPU solves with no
> host transfers?
>
> On Tue, Jan 10, 2023, 2:23 PM Barry Smith  wrote:
>
>>
>>
>> On Jan 10, 2023, at 2:19 PM, Mark Lohry  wrote:
>>
>> Is DILU a point-block method? We have -pc_type pbjacobi (and vpbjacobi if
>>> the node size is not uniform). They are good choices for scale-resolving CFD
>>> on GPUs.
>>>
>>
>> I was hoping you'd know :)  pbjacobi is underperforming ilu by a pretty
>> wide margin on some of the systems i'm looking at.
>>
>> We don't have colored smoothers currently in PETSc.
>>>
>>
>> So what happens under the hood when I run -mg_levels_pc_type sor on GPU?
>> Are you actually decomposing the matrix into lower and computing updates
>> with matrix multiplications? Or is it just the standard serial algorithm
>> with thread safety ignored?
>>
>>
>>   It is running the regular SOR on the CPU and needs to copy up the vector
>> and copy down the result.
>>
>>
>> On Tue, Jan 10, 2023 at 1:52 PM Barry Smith  wrote:
>>
>>>
>>>   We don't have colored smoothers currently in PETSc.
>>>
>>> > On Jan 10, 2023, at 12:56 PM, Jed Brown  wrote:
>>> >
>>> > Is DILU a point-block method? We have -pc_type pbjacobi (and vpbjacobi
>>> if the node size is not uniform). They are good choices for scale-resolving
>>> CFD on GPUs.
>>> >
>>> > Mark Lohry  writes:
>>> >
>>> >> I'm running GAMG with CUDA, and I'm wondering how the nominally serial
>>> >> smoother algorithms are implemented on GPU? Specifically SOR/GS and
>>> ILU(0)
>>> >> -- in e.g. AMGx these are applied by first creating a coloring, and the
>>> >> smoother passes are done color by color. Is this how it's done in
>>> petsc AMG?
>>> >>
>>> >> Tangential, AMGx and OpenFOAM offer something called "DILU", diagonal
>>> ILU.
>>> >> Is there an equivalent in petsc?
>>> >>
>>> >> Thanks,
>>> >> Mark
>>>
>>>
>>


Re: [petsc-users] GPU implementation of serial smoothers

2023-01-10 Thread Jed Brown
Is DILU a point-block method? We have -pc_type pbjacobi (and vpbjacobi if the 
node size is not uniform). They are good choices for scale-resolving CFD on GPUs.

Mark Lohry  writes:

> I'm running GAMG with CUDA, and I'm wondering how the nominally serial
> smoother algorithms are implemented on GPU? Specifically SOR/GS and ILU(0)
> -- in e.g. AMGx these are applied by first creating a coloring, and the
> smoother passes are done color by color. Is this how it's done in petsc AMG?
>
> Tangential, AMGx and OpenFOAM offer something called "DILU", diagonal ILU.
> Is there an equivalent in petsc?
>
> Thanks,
> Mark


Re: [petsc-users] How to install in /usr/lib64 instead of /usr/lib?

2023-01-06 Thread Jed Brown
The make convention would be to respond to `libdir`, which is probably the 
simplest if we can defer that choice until install time. It probably needs to 
be known at build time, thus should go in configure.

https://www.gnu.org/software/make/manual/html_node/Directory-Variables.html

Satish Balay via petsc-users  writes:

> For now - perhaps the following patch...
>
> Satish
>
> ---
>
> diff --git a/config/install.py b/config/install.py
> index 017bb736542..00f857f939e 100755
> --- a/config/install.py
> +++ b/config/install.py
> @@ -76,9 +76,9 @@ class Installer(script.Script):
>  self.archBinDir= os.path.join(self.rootDir, self.arch, 'bin')
>  self.archLibDir= os.path.join(self.rootDir, self.arch, 'lib')
>  self.destIncludeDir= os.path.join(self.destDir, 'include')
> -self.destConfDir   = os.path.join(self.destDir, 'lib','petsc','conf')
> -self.destLibDir= os.path.join(self.destDir, 'lib')
> -self.destBinDir= os.path.join(self.destDir, 'lib','petsc','bin')
> +self.destConfDir   = os.path.join(self.destDir, 'lib64','petsc','conf')
> +self.destLibDir= os.path.join(self.destDir, 'lib64')
> +self.destBinDir= os.path.join(self.destDir, 'lib64','petsc','bin')
>  self.installIncludeDir = os.path.join(self.installDir, 'include')
>  self.installBinDir = os.path.join(self.installDir, 'lib','petsc','bin')
>  self.rootShareDir  = os.path.join(self.rootDir, 'share')
>
> On Thu, 5 Jan 2023, Fellype via petsc-users wrote:
>
>> Hi,
>> I'm building petsc from sources on a 64-bit Slackware Linux and I would like 
>> to know how to install the libraries in /usr/lib64 instead of /usr/lib. Is 
>> it possible? I've not found an option like --libdir=DIR to pass to 
>> ./configure.
>> 
>> Regards,
>> Fellype


Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Jed Brown
Junchao Zhang  writes:

>> I don't think it's remotely crazy. libCEED supports both together and it's
>> very convenient when testing on a development machine that has one of each
>> brand GPU and simplifies binary distribution for us and every package that
>> uses us. Every day I wish PETSc could build with both simultaneously, but
>> everyone tells me it's silly.
>>
>
> So an executable supports both GPUs,  but a running instance supports one
> or both at the same time?

I personally only have reason to instantiate one at a time within a given 
executable, though libCEED supports both instantiated at the same time.


Re: [petsc-users] MatCreateSeqAIJWithArrays for GPU / cusparse

2023-01-05 Thread Jed Brown
Mark Adams  writes:

> Support of HIP and CUDA hardware together would be crazy, 

I don't think it's remotely crazy. libCEED supports both together and it's very 
convenient when testing on a development machine that has one of each brand GPU 
and simplifies binary distribution for us and every package that uses us. Every 
day I wish PETSc could build with both simultaneously, but everyone tells me 
it's silly.


Re: [petsc-users] puzzling arkimex logic

2023-01-04 Thread Jed Brown
This default probably shouldn't be zero, and probably lengthening steps should 
be more gentle after a recent failure. But Mark, please let us know if what's 
there works for you.

"Zhang, Hong via petsc-users"  writes:

> Hi Mark,
>
> You might want to try -ts_adapt_time_step_increase_delay to delay increasing 
> the time step after it has been decreased due to a failed solve.
>
> Hong (Mr.)
>
>> On Jan 2, 2023, at 12:17 PM, Mark Adams  wrote:
>> 
>> I am using arkimex and the logic with a failed KSP solve is puzzling. This 
>> step starts with a dt of ~.005, the linear solver fails and cuts the time 
>> step by 1/4. So far, so good. The step then works but the next time step the 
>> time step goes to ~0.006.
>> TS seems to have forgotten that it had to cut the time step back.
>> Perhaps that logic is missing or my parameters need work?
>> 
>> Thanks,
>> Mark
>> 
>> -ts_adapt_dt_max 0.01 # (source: command line)
>> -ts_adapt_monitor # (source: file)
>> -ts_arkimex_type 1bee # (source: file)
>> -ts_dt .001 # (source: command line)
>> -ts_max_reject 10 # (source: file)
>> -ts_max_snes_failures -1 # (source: file)
>> -ts_max_steps 8000 # (source: command line)
>> -ts_max_time 14 # (source: command line)
>> -ts_monitor # (source: file)
>> -ts_rtol 1e-6 # (source: command line)
>> -ts_type arkimex # (source: file)
>> 
>>   Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 2
>>   TSAdapt basic arkimex 0:1bee step   1 accepted t=0.001  + 
>> 2.497e-03 dt=5.404e-03  wlte=0.173  wltea=   -1 wlter=  
>>  -1
>> 2 TS dt 0.00540401 time 0.00349731
>> 0 SNES Function norm 1.358886930084e-05 
>> Linear solve did not converge due to DIVERGED_ITS iterations 100
>>   Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
>>   TSAdapt basic step   2 stage rejected (DIVERGED_LINEAR_SOLVE) 
>> t=0.00349731 + 5.404e-03 retrying with dt=1.351e-03 
>> 0 SNES Function norm 1.358886930084e-05 
>> Linear solve converged due to CONVERGED_RTOL iterations 19
>> 1 SNES Function norm 4.412110425362e-10 
>> Linear solve converged due to CONVERGED_RTOL iterations 6
>> 2 SNES Function norm 4.978968053066e-13 
>>   Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 2
>> 0 SNES Function norm 8.549322067920e-06 
>> Linear solve converged due to CONVERGED_RTOL iterations 14
>> 1 SNES Function norm 8.357075378456e-11 
>> Linear solve converged due to CONVERGED_RTOL iterations 4
>> 2 SNES Function norm 4.983138402512e-13 
>>   Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 2
>> 0 SNES Function norm 1.044832467924e-05 
>> Linear solve converged due to CONVERGED_RTOL iterations 13
>> 1 SNES Function norm 1.036101875301e-10 
>> Linear solve converged due to CONVERGED_RTOL iterations 4
>> 2 SNES Function norm 4.984888077288e-13 
>>   Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 2
>>   TSAdapt basic arkimex 0:1bee step   2 accepted t=0.00349731 + 
>> 1.351e-03 dt=6.305e-03  wlte=0.0372  wltea=   -1 wlter= 
>>   -1
>> 3 TS dt 0.00630456 time 0.00484832
>> 0 SNES Function norm 8.116559104264e-06 
>> Linear solve did not converge due to DIVERGED_ITS iterations 100


Re: [petsc-users] Question-Memory of matsetvalue

2022-12-30 Thread Jed Brown
This is what I'd expect to observe if you didn't preallocate correctly for the 
second matrix, which has more nonzeros per row.

https://petsc.org/release/docs/manual/mat/#sec-matsparse
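
For reference, a minimal sketch of exact per-row preallocation (not taken from your code; nlocal and the two counting helpers are hypothetical stand-ins for whatever your connectivity provides):

  #include <petscmat.h>
  /* Sketch: preallocate an MPIAIJ matrix from per-row counts so that
   * MatSetValue()/MatSetValues() never has to allocate during assembly.
   * nlocal is the number of locally owned rows; CountDiagCouplings(i) and
   * CountOffDiagCouplings(i) are hypothetical helpers returning the number
   * of couplings of row i inside and outside the local diagonal block.
   * Overestimates are fine; underestimates are what cause the extra
   * allocation during assembly. */
  Mat       A;
  PetscInt *d_nnz, *o_nnz;
  PetscMalloc2(nlocal, &d_nnz, nlocal, &o_nnz);
  for (PetscInt i = 0; i < nlocal; i++) {
    d_nnz[i] = CountDiagCouplings(i);
    o_nnz[i] = CountOffDiagCouplings(i);
  }
  MatCreateAIJ(PETSC_COMM_WORLD, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE,
               0, d_nnz, 0, o_nnz, &A);
  PetscFree2(d_nnz, o_nnz);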

김성익  writes:

> Hello,
>
>
>
> I have a question about memory of matsetvalue.
>
> When I assembly the local matrix to global matrix, I’m just using
> matsetvalue.
> Because the connectivity is so complex I can’t use matsetvalues.
>
> I asked this question because I was curious about how RAM memory is
> allocated differently for the two situations below.
>
>
>
> First situation.
>
> The local matrix size is 24 by 24. And the number of local matrix is
> 125,000.
>
> When assembly procedure by using matsetvalue, memory allocation does not
> increase.
> So I just put Matassemblybegin and matassemblyend after all matsetvalue.
>
>
>
> Second situation.
>
> The local matrix size is 60 by 60. And the number of local matrix is 27,000.
>
> When assembly procedure by using matsetvalue, memory allocation does
> increase.
>
> So I just put Matassemblybegin and matassemblyend after each local matrix
> assembly.
> This did not increase the memory further..
>
>
>
> Why this situation is happen?
>
> And is there any way to prevent the memory allocation from increasing?
>
>
>
>
>
>
>
> Thanks,
>
> Hyung Kim


Re: [petsc-users] SNES and TS for nonlinear quasi-static problems

2022-12-19 Thread Jed Brown
Indeed, this is exactly how we do quasistatic analysis for solid mechanics in 
Ratel (https://gitlab.com/micromorph/ratel) -- make sure to choose an L-stable 
integrator (backward Euler being the most natural choice). Implicit dynamics 
can be done by choosing a suitable integrator, like TSALPHA2, with almost no 
code change to the residual (only adding the mass term in DMTSSetI2Function()).

Matthew Knepley  writes:

> On Mon, Dec 19, 2022 at 6:12 AM TARDIEU Nicolas via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
>> Dear PETSc users,
>>
>> I plan to solve nonlinear quasi-static problems with PETSc. More
>> precisely, these are solid mechanics problems with elasto-plasticity.
>> So they do not involve  "physical time", rather  "pseudo time", which is
>> mandatory to describe the stepping of the loading application.
>> In general, the loading vector F(x, t) is expressed as the following
>> product  F(x, t)=F0(x)*g(t), where g is a scalar function of the
>> pseudo-time.
>>
>> I see how to use a SNES in order to solve a certain step of the loading
>> history but I wonder if a TS can be used to deal with the loading history
>> through the definition of this g(t) function ?
>>
>
> I believe so. We would want you to formulate it as a differential equation,
>
>   F(x, \dot x, t) = G(x)
>
> which is your case is easy
>
>   F(x, t) = 0
>
> so you would just put your function completely into the IFunction. Since
> there is no \dot x term, this is a DAE,
> and you need to use one of the solvers for that.
>
>
>> Furthermore, since too large load steps can lead to non-convergence, a
>> stepping strategy is almost always required to restart a load step that
>> failed. Does TS offer such a feature ?
>>
>
> I think you can use PETSc adaptation. There is a PID option, which might be
> able to capture your behavior.
>
>   Thanks,
>
>   Matt
>
>
>> Thank you for your answers.
>> Regards,
>> Nicolas
>> --
>> *Nicolas Tardieu*
>> *Ing PhD Computational Mechanics*
>> EDF - R Dpt ERMES
>> PARIS-SACLAY, FRANCE
>>
>>
>> Ce message et toutes les pièces jointes (ci-après le 'Message') sont
>> établis à l'intention exclusive des destinataires et les informations qui y
>> figurent sont strictement confidentielles. Toute utilisation de ce Message
>> non conforme à sa destination, toute diffusion ou toute publication totale
>> ou partielle, est interdite sauf autorisation expresse.
>>
>> Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de
>> le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou
>> partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de
>> votre système, ainsi que toutes ses copies, et de n'en garder aucune trace
>> sur quelque support que ce soit. Nous vous remercions également d'en
>> avertir immédiatement l'expéditeur par retour du message.
>>
>> Il est impossible de garantir que les communications par messagerie
>> électronique arrivent en temps utile, sont sécurisées ou dénuées de toute
>> erreur ou virus.
>> 
>>
>> This message and any attachments (the 'Message') are intended solely for
>> the addressees. The information contained in this Message is confidential.
>> Any use of information contained in this Message not in accord with its
>> purpose, any dissemination or disclosure, either whole or partial, is
>> prohibited except formal approval.
>>
>> If you are not the addressee, you may not copy, forward, disclose or use
>> any part of it. If you have received this message in error, please delete
>> it and all copies from your system and notify the sender immediately by
>> return message.
>>
>> E-mail communication cannot be guaranteed to be timely secure, error or
>> virus-free.
>>
>
>
> -- 
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] dmplex normal vector incorrect for periodic gmsh grids

2022-12-18 Thread Jed Brown
Matthew Knepley  writes:

> On Fri, Dec 16, 2022 at 12:22 AM Praveen C  wrote:
>
>> Thank you very much. I do see correct normals now.
>>
>> Is there a way to set the option
>>
>> -dm_localize_height 1
>>>
>>
>> within the code ?
>>
>
> The problem is that the localization happens within the Gmsh construction,
> so it is difficult to insert an API.
> We could use a callback, but that rapidly becomes unwieldy. If you want to
> do it programmatically, I would use
> PetscOptionsSetValue().

Can it not be created explicitly later?

This kind of thing isn't really a run-time choice, but rather a statement about 
the way the calling code has been written.
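
For completeness, the programmatic route mentioned above would look roughly like this (a sketch; the option has to be in the options database before the Gmsh file is read):

  /* Sketch: set the option in the global options database (NULL) before
   * the mesh file is loaded. */
  PetscOptionsSetValue(NULL, "-dm_localize_height", "1");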


Re: [petsc-users] Help with input construction hang on 2-GPU CG Solve

2022-12-17 Thread Jed Brown
I ran your code successfully with and without GPU-aware MPI. I see a bit of 
time in MatSetValue -- you can make it a bit faster using one MatSetValues call 
per row, but it's typical that assembling a matrix like this (sequentially on 
the host) will be more expensive than some unpreconditioned CG iterations (that 
don't come close to solving the problem -- use multigrid if you want to 
actually solve this problem).
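
For what it's worth, the one-call-per-row pattern looks roughly like this (a sketch reusing the rowStart/rowEnd, nx, ny names from the thread; the stencil values are illustrative, not taken from the benchmark code):

  /* Sketch: insert one full row of a 5-point stencil per MatSetValues call,
   * assuming the usual lexicographic numbering row = j*nx + i. */
  for (PetscInt row = rowStart; row < rowEnd; row++) {
    PetscInt    i = row % nx, j = row / nx;
    PetscInt    cols[5], n = 0;
    PetscScalar vals[5];
    if (j > 0)      { cols[n] = row - nx; vals[n++] = -1.0; }
    if (i > 0)      { cols[n] = row - 1;  vals[n++] = -1.0; }
    cols[n] = row; vals[n++] = 4.0;
    if (i < nx - 1) { cols[n] = row + 1;  vals[n++] = -1.0; }
    if (j < ny - 1) { cols[n] = row + nx; vals[n++] = -1.0; }
    MatSetValues(A, 1, &row, n, cols, vals, INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);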

Rohan Yadav  writes:

> Hi,
>
> I'm developing a microbenchmark that runs a CG solve using PETSc on a mesh
> using a 5-point stencil matrix. My code (linked here:
> https://github.com/rohany/petsc-pde-benchmark/blob/main/main.cpp, only 120
> lines) works on 1 GPU and has great performance. When I move to 2 GPUs, the
> program appears to get stuck in the input generation. I've literred the
> code with print statements and have found out the following clues:
>
> * The first rank progresses through this loop:
> https://github.com/rohany/petsc-pde-benchmark/blob/main/main.cpp#L44, but
> then does not exit (it seems to get stuck right before rowStart == rowEnd)
> * The second rank makes very few iterations through the loop for its
> allotted rows.
>
> Therefore, neither rank makes it to the call to MatAssemblyBegin.
>
> I'm running the code using the following command line on the Summit
> supercomputer:
> ```
> jsrun -n 2 -g 1 -c 1 -b rs -r 2
> /gpfs/alpine/scratch/rohany/csc335/petsc-pde-benchmark/main -ksp_max_it 200
> -ksp_type cg -pc_type none -ksp_atol 1e-10 -ksp_rtol 1e-10 -vec_type cuda
> -mat_type aijcusparse -use_gpu_aware_mpi 0 -nx 8485 -ny 8485
> ```
>
> Any suggestions will be appreciated! I feel that I have applied many of the
> common petsc optimizations of preallocating my matrix row counts, so I'm
> not sure what's going on with this input generation.
>
> Thanks,
>
> Rohan Yadav


Re: [petsc-users] GAMG and linearized elasticity

2022-12-13 Thread Jed Brown
Do you have slip/symmetry boundary conditions, where some components are 
constrained? In that case, there is no uniform block size and I think you'll 
need DMPlexCreateRigidBody() and MatSetNearNullSpace().

The PCSetCoordinates() code won't work for non-constant block size.

-pc_type gamg should work okay out of the box for elasticity. For hypre, I've 
had good luck with this options suite, which also runs on GPU.

-pc_type hypre -pc_hypre_boomeramg_coarsen_type pmis 
-pc_hypre_boomeramg_interp_type ext+i -pc_hypre_boomeramg_no_CF 
-pc_hypre_boomeramg_P_max 6 -pc_hypre_boomeramg_relax_type_down Chebyshev 
-pc_hypre_boomeramg_relax_type_up Chebyshev 
-pc_hypre_boomeramg_strong_threshold 0.5
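
A sketch of the near-nullspace setup mentioned above (check DMPlexCreateRigidBody() against your PETSc version; recent versions take a field argument, and field 0 is assumed to be the displacement field here):

  /* Sketch: attach rigid-body modes as the near null space that GAMG uses
   * when building its coarse spaces. dm is the displacement DMPlex and A is
   * the assembled Jacobian. */
  MatNullSpace rigid;
  DMPlexCreateRigidBody(dm, 0, &rigid);
  MatSetNearNullSpace(A, rigid);
  MatNullSpaceDestroy(&rigid);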

Blaise Bourdin  writes:

> Hi,
>
> I am getting close to finish porting a code from petsc 3.3 / sieve to main / 
> dmplex, but am
> now encountering difficulties 
> I am reasonably sure that the Jacobian and residual are correct. The codes 
> handle boundary
> conditions differently (MatZeroRowCols vs dmplex constraints) so it is not 
> trivial to compare
> them. Running with snes_type ksponly pc_type Jacobi or hyper gives me the 
> same results in
> roughly the same number of iterations.
>
> In my old code, gamg would work out of the box. When using petsc-main, 
> -pc_type gamg -
> pc_gamg_type agg works for _some_ problems using P1-Lagrange elements, but 
> never for
> P2-Lagrange. The typical error message is in gamg_agg.txt
>
> When using -pc_type classical, a problem where the KSP would converge in 47 
> iteration in
> 3.3 now takes 1400.  ksp_view_3.3.txt and ksp_view_main.txt show the output 
> of -ksp_view
> for both versions. I don’t notice anything obvious.
>
> Strangely, removing the call to PCSetCoordinates does not have any impact on 
> the
> convergence.
>
> I am sure that I am missing something, or not passing the right options. 
> What’s a good
> starting point for 3D elasticity?
> Regards,
> Blaise
>
> — 
> Canada Research Chair in Mathematical and Computational Aspects of Solid 
> Mechanics
> (Tier 1)
> Professor, Department of Mathematics & Statistics
> Hamilton Hall room 409A, McMaster University
> 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada 
> https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243
> [0]PETSC ERROR: - Error Message 
> --
> [0]PETSC ERROR: Petsc has generated inconsistent data
> [0]PETSC ERROR: Computed maximum singular value as zero
> [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be 
> the program crashed before they were used or a spelling mistake, etc!
> [0]PETSC ERROR: Option left: name:-displacement_ksp_converged_reason value: 
> ascii source: file
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.18.2-341-g16200351da0  GIT 
> Date: 2022-12-12 23:42:20 +
> [0]PETSC ERROR: 
> /home/bourdinb/Development/mef90/mef90-dmplex/bbserv-gcc11.2.1-mvapich2-2.3.7-O/bin/ThermoElasticity
>  on a bbserv-gcc11.2.1-mvapich2-2.3.7-O named bb01 by bourdinb Tue Dec 13 
> 17:02:19 2022
> [0]PETSC ERROR: Configure options --CFLAGS=-Wunused 
> --FFLAGS="-ffree-line-length-none -fallow-argument-mismatch -Wunused" 
> --COPTFLAGS="-O2 -march=znver2" --CXXOPTFLAGS="-O2 -march=znver2" 
> --FOPTFLAGS="-O2 -march=znver2" --download-chaco=1 --download-exodusii=1 
> --download-fblaslapack=1 --download-hdf5=1 --download-hypre=1 
> --download-metis=1 --download-ml=1 --download-mumps=1 --download-netcdf=1 
> --download-p4est=1 --download-parmetis=1 --download-pnetcdf=1 
> --download-scalapack=1 --download-sowing=1 
> --download-sowing-cc=/opt/rh/devtoolset-9/root/usr/bin/gcc 
> --download-sowing-cxx=/opt/rh/devtoolset-9/root/usr/bin/g++ 
> --download-sowing-cpp=/opt/rh/devtoolset-9/root/usr/bin/cpp 
> --download-sowing-cxxcpp=/opt/rh/devtoolset-9/root/usr/bin/cpp 
> --download-superlu=1 --download-triangle=1 --download-yaml=1 
> --download-zlib=1 --with-debugging=0 
> --with-mpi-dir=/opt/HPC/mvapich2/2.3.7-gcc11.2.1 --with-pic 
> --with-shared-libraries=1 --with-mpiexec=srun --with-x11=0
> [0]PETSC ERROR: #1 PCGAMGOptProlongator_AGG() at 
> /1/HPC/petsc/main/src/ksp/pc/impls/gamg/agg.c:779
> [0]PETSC ERROR: #2 PCSetUp_GAMG() at 
> /1/HPC/petsc/main/src/ksp/pc/impls/gamg/gamg.c:639
> [0]PETSC ERROR: #3 PCSetUp() at 
> /1/HPC/petsc/main/src/ksp/pc/interface/precon.c:994
> [0]PETSC ERROR: #4 KSPSetUp() at 
> /1/HPC/petsc/main/src/ksp/ksp/interface/itfunc.c:405
> [0]PETSC ERROR: #5 KSPSolve_Private() at 
> /1/HPC/petsc/main/src/ksp/ksp/interface/itfunc.c:824
> [0]PETSC ERROR: #6 KSPSolve() at 
> /1/HPC/petsc/main/src/ksp/ksp/interface/itfunc.c:1070
> [0]PETSC ERROR: #7 SNESSolve_KSPONLY() at 
> /1/HPC/petsc/main/src/snes/impls/ksponly/ksponly.c:48
> [0]PETSC ERROR: #8 SNESSolve() at 
> /1/HPC/petsc/main/src/snes/interface/snes.c:4693
> [0]PETSC ERROR: #9 
> 

Re: [petsc-users] Insert one sparse matrix as a block in another

2022-12-12 Thread Jed Brown
The description matches MATNEST (MATCOMPOSITE is for a sum or product of 
matrices) or parallel decompositions. Also consider the assembly style of 
src/snes/tutorials/ex28.c, which can create either a monolithic or block 
(MATNEST) matrix without extra storage or conversion costs.
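
A minimal sketch of the MATNEST route (the submatrix names are placeholders; pass NULL for blocks that are identically zero):

  /* Sketch: wrap existing submatrices A00, A01, A10, A11 (row-major order)
   * in a 2x2 nested matrix without copying any entries. */
  Mat blocks[4] = {A00, A01, A10, A11};
  Mat B;
  MatCreateNest(PETSC_COMM_WORLD, 2, NULL, 2, NULL, blocks, &B);
  /* If a particular solver insists on a monolithic format, convert explicitly: */
  Mat Bmono;
  MatConvert(B, MATAIJ, MAT_INITIAL_MATRIX, &Bmono);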

Mark Adams  writes:

> Do you know what kind of solver works well for this problem?
>
> You probably want to figure that out first and not worry about efficiency.
>
> MATCOMPOSITE does what you want but not all solvers will work with it.
>
> Where does this problem come from? We have a lot of experience and might
> know something.
>
> Mark
>
> On Mon, Dec 12, 2022 at 1:33 PM Peder Jørgensgaard Olesen via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
>> Hello
>>
>>
>> I have a set of sparse matrices (A1, A2, ...) , and need to generate a
>> larger matrix B with these as submatrices. I do not know the precise sparse
>> layouts of the A's (only that each row has one or two non-zero values),
>> and extracting *all* values to copy into B seems incredibly wasteful. How
>> can I make use of the sparsity to solve this efficiently?
>>
>>
>> Thanks,
>>
>> Peder
>>
>>
>>
>> Peder Jørgensgaard Olesen
>> PhD student
>> Department of Civil and Mechanical Engineering
>>
>> pj...@mek.dtu.dk
>> Koppels Allé
>> Building 403, room 105
>> 2800 Kgs. Lyngby
>> www.dtu.dk/english
>>
>>


Re: [petsc-users] prevent linking to multithreaded BLAS?

2022-12-08 Thread Jed Brown
Barry Smith  writes:

>> We could test at runtime whether child threads exist/are created when 
>> calling BLAS and deliver a warning. 
>
>   How does one test for this? Some standard Unix API for checking this?

I'm not sure, the ids of child threads are in /proc/$pid/task/ and (when opened 
by a process) /proc/self/task/. See man procfs for details.
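
A sketch of the kind of runtime check being discussed (Linux-only; the warning/threshold logic is left out):

  #include <dirent.h>
  #include <stdio.h>

  /* Sketch (Linux-only): count this process's threads by listing
   * /proc/self/task. Calling it before and after a BLAS call would reveal
   * whether the BLAS library spawned worker threads. */
  static int count_threads(void)
  {
    DIR *dir = opendir("/proc/self/task");
    int  n   = 0;
    if (!dir) return -1;
    for (struct dirent *e; (e = readdir(dir)); ) {
      if (e->d_name[0] != '.') n++; /* skip "." and ".." */
    }
    closedir(dir);
    return n;
  }

  int main(void)
  {
    printf("threads before BLAS call: %d\n", count_threads());
    /* ... call a reasonably large dgemm/dgemv here ... */
    printf("threads after BLAS call:  %d\n", count_threads());
    return 0;
  }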


Re: [petsc-users] prevent linking to multithreaded BLAS?

2022-12-07 Thread Jed Brown
It isn't always wrong to link threaded BLAS. For example, a user might need to 
call threaded BLAS on the side (but the application can only link one) or a 
sparse direct solver might want threading for the supernode. We could test at 
runtime whether child threads exist/are created when calling BLAS and deliver a 
warning. 

Barry Smith  writes:

>There would need to be, for example, some symbol in all the threaded BLAS 
> libraries that is not in the unthreaded libraries. Of at least in some of the 
> threaded libraries but never in the unthreaded. 
>
>BlasLapack.py could check for the special symbol(s) to determine.
>
>   Barry
>
>
>> On Dec 7, 2022, at 4:47 PM, Mark Lohry  wrote:
>> 
>> Thanks, yes, I figured out the OMP_NUM_THREADS=1 way while triaging it, and 
>> the --download-fblaslapack way occurred to me. 
>> 
>> I was hoping for something that "just worked" (refuse to build in this case) 
>> but I don't know if it's programmatically possible for petsc to tell whether 
>> or not it's linking to a threaded BLAS?
>> 
>> On Wed, Dec 7, 2022 at 4:35 PM Satish Balay > > wrote:
>>> If you don't specify a blas to use - petsc configure will guess and use 
>>> what it can find.
>>> 
>>> So only way to force it use a particular blas is to specify one [one way is 
>>> --download-fblaslapack]
>>> 
>>> Wrt multi-thread openblas -  you can force it run single threaded [by one 
>>> of these 2 env variables]
>>> 
>>> # Use single thread openblas
>>> export OPENBLAS_NUM_THREADS=1
>>> export OMP_NUM_THREADS=1
>>> 
>>> Satish
>>> 
>>> 
>>> On Wed, 7 Dec 2022, Mark Lohry wrote:
>>> 
>>> > I ran into an unexpected issue -- on an NP-core machine, each MPI rank of
>>> > my application was launching NP threads, such that when running with
>>> > multiple ranks the machine was quickly oversubscribed and performance
>>> > tanked.
>>> > 
>>> > The root cause of this was petsc linking against the system-provided
>>> > library (libopenblas0-pthread in this case) set by the update-alternatives
>>> > in ubuntu. At some point this machine got updated to using the threaded
>>> > blas implementation instead of serial; not sure how, and I wouldn't have
>>> > noticed if I weren't running interactively.
>>> > 
>>> > Is there any mechanism in petsc or its build system to prevent linking
>>> > against an inappropriate BLAS, or do I need to be diligent about manually
>>> > setting the BLAS library in the configuration stage?
>>> > 
>>> > Thanks,
>>> > Mark
>>> > 
>>> 


Re: [petsc-users] AMD vs Intel mobile CPU performance

2022-11-16 Thread Jed Brown
If you're using iterative solvers, compare memory bandwidth first, then cache. 
Flops aren't very important unless you use sparse direct solvers or have SNES 
residual/Jacobian evaluation that is expensive and has been written for 
vectorization.

If you can get the 6650U with LPDDR5-6400, it'll probably be faster. My laptop 
is the previous generation, 5900HS.
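
If it helps, a crude triad-style sweep gives a quick bandwidth number to compare the two machines (a rough sketch only, not a substitute for the STREAM benchmark; compile with optimization):

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  /* Rough sketch of a STREAM-triad-like estimate: a[i] = b[i] + s*c[i]
   * streams three arrays of N doubles, so roughly 3*N*8 bytes move per
   * sweep (write-allocate traffic ignored). */
  int main(void)
  {
    const size_t N = 50 * 1000 * 1000;
    double *a = malloc(N * sizeof *a), *b = malloc(N * sizeof *b), *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;
    for (size_t i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < N; i++) a[i] = b[i] + 3.0 * c[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double sec = (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
    printf("~%.1f GB/s (check value %g)\n", 3.0 * N * sizeof(double) / sec / 1e9, a[N / 2]);
    free(a); free(b); free(c);
    return 0;
  }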

"D.J. Nolte"  writes:

> Hi all,
> I'm looking for a small laptop which I'll be using (also) for small scale
> PETSc (KSP & SNES) simulations. For this setting performance is not that
> important, but still, I wonder if the community has any experience with AMD
> Ryzen CPUs (specifically 5 Pro 6650U) CPUs compared to Intel i7 12th gen.
> Do I have to expect significant performance differences?
>
> Thanks!
>
> David


Re: [petsc-users] On PCFIELDSPLIT and its implementation

2022-11-15 Thread Jed Brown
You do if preconditioners (like AMG) will use it or if using functions like 
MatSetValuesBlocked(). If you have uniform block structure, it doesn't hurt.
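
If you do set it, it's a single call made before preallocation (a sketch, with block size 3 assumed for a 2D u-v-p system):

  /* Sketch: declare a uniform block size on the matrix; harmless if nothing
   * exploits it, required if MatSetValuesBlocked() or AMG will rely on it. */
  MatSetBlockSize(A, 3);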

Edoardo alinovi  writes:

> Hi Guys,
>
> Very quick one. Do I need to set the block size with MPIAIJ?


Re: [petsc-users] TSBEULER vs TSPSEUDO

2022-11-08 Thread Jed Brown
Francesc Levrero-Florencio  writes:

> Hi Jed,
>
> Thanks for the answer.
>
> We do have a monolithic arc-length implementation based on the TS/SNES logic, 
> but we are also exploring having a custom SNESSHELL because the arc-length 
> logic is substantially more complex than that of traditional load-controlled 
> continuation methods. It works quite well, the only "issue" is its 
> initiation; we are currently performing load-control (or displacement loading 
> as you mentioned) in the first time increment. Besides load-control and 
> arc-length control, what other continuation methods would you suggest 
> exploring?

Those are the main ones, and each amounts to a simple expression for the constraint 
condition in an arc-length method, so the implementation can be made extensible. 
Wriggers' book has a nice discussion and table. I imagine we'll get 
some more experience with the tradeoffs after I add it to SNES.
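
For concreteness, the spherical (Crisfield-type) constraint those references use can be written as below; prescribing \Delta\lambda directly recovers load control, and \psi = 0 gives the cylindrical variant (textbook notation, not anything from the PETSc branch):

  \begin{aligned}
    & r(u_n + \Delta u) - (\lambda_n + \Delta\lambda)\, F_{\mathrm{ext}} = 0, \\
    & g(\Delta u, \Delta\lambda) \equiv \Delta u^{\mathsf T} \Delta u
        + \psi^{2}\, \Delta\lambda^{2}\, F_{\mathrm{ext}}^{\mathsf T} F_{\mathrm{ext}}
        - \Delta s^{2} = 0.
  \end{aligned}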

> The test problem we are dealing with assumes plasticity but with small 
> strains so we will not see any snap-throughs, snap-backs or similar. TSBEULER 
> works quite well for this specific case and converges in a few time steps 
> within around 5-10 SNES iterations per time step. What PETSc functions do you 
> suggest exploring for implementing the TS time step extension control you 
> mentioned?

Check out src/ts/adapt/impls/ for the current implementations.

> Since you mentioned -ts_theta_initial_guess_extrapolate, is it worth using it 
> in highly nonlinear mechanical problems (such as plasticity)? It sounds quite 
> useful if it consistently reduces SNES iterations by one per time step, as 
> each linear solve is quite expensive for large problems.

I found sometimes it overshoots and thus causes problems, so effectiveness was 
problem-dependent. It's just a run-time flag so check it out.

I'm curious if you have experience using BFGS with Jacobian scaling (either a 
V-cycle or a sparse direct solve) instead of Newton. You can try it using 
-snes_type qn -snes_qn_scale_type jacobian. This can greatly reduce the number 
of assemblies and preconditioner setups, and we find it also reduces total 
number of V-cycles so is effective even with our matrix-free p-MG (which are 
very fast and have much lower setup costs, https://arxiv.org/abs/2204.01722).


Re: [petsc-users] TSBEULER vs TSPSEUDO

2022-11-08 Thread Jed Brown
First, I believe arc-length continuation is the right approach in this problem 
domain. I have a branch starting an implementation, but need to revisit it in 
light of some feedback (and time has been too short lately).

My group's nonlinear mechanics solver uses TSBEULER because it's convenient to 
parametrize loading on T=[0,1]. Unlike arc-length continuation, this can't 
handle snap-through effects. TSPSEUDO is the usual recommendation if you don't 
care about time accuracy, though you could register a custom controller for 
normal TS methods that implements any logic you'd like around automatically 
extending the time step without using a truncation error estimate.

Note that displacement loading (as usually implemented) is really bad 
(especially for models with plasticity) because increments that are large 
relative to the mesh size can invert elements or initiate plastic yielding when 
that would not happen if using smaller increments. Arc-length continuation also 
helps fix that problem.

Note that you can use extrapolation (-ts_theta_initial_guess_extrapolate), 
though I've found this to be somewhat brittle and only reduce SNES iteration 
count by about 1 per time step.

Francesc Levrero-Florencio  writes:

> Hi PETSc people,
>
> We are running highly nonlinear quasi-static (steady-state) mechanical finite 
> element problems with PETSc, currently using TSBEULER and the basic time 
> adapt scheme.
>
> What we do in order to tackle these nonlinear problems is to parametrize the 
> applied loads with the time in the TS and apply them incrementally. While 
> this usually works well, we have seen instances in which the adaptor would 
> reject the time step according to the calculated truncation errors, even if 
> the SNES converges in a small number of iterations. Another issue that we 
> have recently observed is that in a sequence of converged time steps the 
> adaptor decides to start cutting the time step to smaller and smaller values 
> using the low clip default value of TSAdaptGetClip (again because the 
> truncation errors are high enough). What can we do in order to avoid these 
> issues? The first one is avoided by using TSAdaptSetAlwaysAccept, but the 
> latter remains. We have tried setting the low clip value to its maximum 
> accepted value of 1, but then the time increment does not increase even if 
> the SNES always converges in 3 or 4 iterations. Maybe a solution is to 
> increase the tolerances of the TSAdapt?
>
> Another potential solution we have recently tried in order to tackle these 
> issues is using TSPSEUDO (and deparametrizing the applied loads), but 
> generally find that it takes a much longer time to reach an acceptable 
> solution compared with TSBEULER. We have mostly used the default KSPONLY 
> option, but we'd like to explore TSPSEUDO with NEWTONLS. A first question 
> would be: what happens if the SNES fails to converge, does the solution get 
> updated somehow in the corresponding time step? We have performed a few tests 
> with TSPSEUDO and NEWTONLS, setting the maximum number of SNES iterations to 
> a relatively low number (e.g. 5), and then always setting the SNES as 
> converged in the poststage function, and found that it performs reasonably 
> well, at least better than with the default KSPONLY (does this make any 
> sense?).
>
> Thanks a lot!
>
> Regards,
> Francesc.


Re: [petsc-users] Field split degree of freedom ordering

2022-11-02 Thread Jed Brown
Yes, the normal approach is to partition your mesh once, then for each field, 
resolve ownership of any interface dofs with respect to the element partition 
(so shared vertex velocity can land on any process that owns an adjacent 
element, though even this isn't strictly necessary).

Alexander Lindsay  writes:

> So, in the latter case, IIUC we can maintain how we distribute data among
> the processes (partitioning of elements) such that with respect to a
> `-ksp_view_pmat` nothing changes and our velocity and pressure dofs are
> interlaced on a global scale (e.g. each process has some velocity and
> pressure dofs) ... but in order to leverage field split we need those index
> sets in order to avoid the equal size constraint?
>
> On Tue, Nov 1, 2022 at 11:57 PM Jed Brown  wrote:
>
>> In most circumstances, you can and should interlace in some form such that
>> each block in fieldsplit is distributed across all ranks. If you interlace
>> at scalar granularity as described, then each block needs to be able to do
>> that. So for the Stokes equations with equal order elements (like P1-P1
>> stabilized), you can interlace (u,v,w,p), but for mixed elements (like
>> Q2-P1^discontinuous) you can't interlace in that way. You can still
>> distribute pressure and velocity over all processes, but will need index
>> sets to identify the velocity-pressure splits.
>>
>> Alexander Lindsay  writes:
>>
>> > In the block matrices documentation, it's stated: "Note that for
>> interlaced
>> > storage the number of rows/columns of each block must be the same size"
>> Is
>> > interlacing defined in a global sense, or a process-local sense? So
>> > explicitly, if I don't want the same size restriction, do I need to
>> ensure
>> > that globally all of my block 1 dofs are numbered after my block 0 dofs?
>> Or
>> > do I need to follow that on a process-local level? Essentially in libMesh
>> > we always follow rank-major ordering. I'm asking whether for unequal row
>> > sizes, in order to split, would we need to strictly follow variable-major
>> > ordering (splitting here meaning splitting by variable)?
>> >
>> > Alex
>>


Re: [petsc-users] Field split degree of freedom ordering

2022-11-01 Thread Jed Brown
In most circumstances, you can and should interlace in some form such that each 
block in fieldsplit is distributed across all ranks. If you interlace at scalar 
granularity as described, then each block needs to be able to do that. So for 
the Stokes equations with equal order elements (like P1-P1 stabilized), you can 
interlace (u,v,w,p), but for mixed elements (like Q2-P1^discontinuous) you 
can't interlace in that way. You can still distribute pressure and velocity 
over all processes, but will need index sets to identify the velocity-pressure 
splits.
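
A sketch of the index-set route (the KSP and the construction of is_u and is_p from the application's dof map are assumed, not shown):

  /* Sketch: name the velocity and pressure splits explicitly with index sets
   * holding the locally owned global dof indices of each field. */
  PC pc;
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCFIELDSPLIT);
  PCFieldSplitSetIS(pc, "u", is_u);
  PCFieldSplitSetIS(pc, "p", is_p);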

Alexander Lindsay  writes:

> In the block matrices documentation, it's stated: "Note that for interlaced
> storage the number of rows/columns of each block must be the same size" Is
> interlacing defined in a global sense, or a process-local sense? So
> explicitly, if I don't want the same size restriction, do I need to ensure
> that globally all of my block 1 dofs are numbered after my block 0 dofs? Or
> do I need to follow that on a process-local level? Essentially in libMesh
> we always follow rank-major ordering. I'm asking whether for unequal row
> sizes, in order to split, would we need to strictly follow variable-major
> ordering (splitting here meaning splitting by variable)?
>
> Alex


Re: [petsc-users] Clarification on MatMPIBAIJSetPreallocation (d_nnz and o_nnz)

2022-10-24 Thread Jed Brown
This looks like one block row per process? (BAIJ formats store explicit zeros 
that appear within nonzero blocks.) You'd use d_nnz[] = {1}, o_nnz[] = {1} on 
each process.

If each of the dummy numbers there was replaced by a nonzero block (so the 
diagram would be sketching nonzero 3x3 blocks of an 18x18 matrix), then you'd 
have bs=3 with:

rank 0:
d_nnz[] = {2,2,1}, o_nnz[] = {1,1,2};
rank 1:
d_nnz[] = {1,1,1}, o_nnz[] = {2,1,1};
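
In code, the 18x18 interpretation above corresponds roughly to the following (rank is the MPI rank from MPI_Comm_rank; the counts are per block row, in blocks):

  /* Sketch: block size 3, three block rows per rank, d_nnz/o_nnz counting
   * nonzero blocks per block row as listed above. */
  const PetscInt bs = 3;
  const PetscInt d0[] = {2, 2, 1}, o0[] = {1, 1, 2};  /* rank 0 */
  const PetscInt d1[] = {1, 1, 1}, o1[] = {2, 1, 1};  /* rank 1 */
  MatMPIBAIJSetPreallocation(A, bs, 0, rank ? d1 : d0, 0, rank ? o1 : o0);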

Edoardo alinovi  writes:

> Thank you Jed for the hint.
>
> So just to understand it with an example. Say I have this matrix here,
> which has 4 3x3 blocks
>
> 1 2 0 | 0 5 0 |
> 0 2 3 | 0 0 1 |   < Proc 1
> 0 0 1 | 0 2 2 |
> ------|-------|
> 1 2 0 | 0 5 0 |
> 0 2 0 | 0 0 1 |   < Proc 2
> 0 0 1 | 0 0 2 |
> ------|-------|
>
> This can be represented as a collection of submatrices like:
> A  B
> C  D
>
> A and D are the diagonal blocks, while B and C are the off-diagonal ones.
> How should I set d_nnz and o_nnz in this case?


Re: [petsc-users] A question about solving a saddle point system with a direct solver

2022-10-24 Thread Jed Brown
You can get lucky with null spaces even with factorization preconditioners, 
especially if the right hand side is orthogonal to the null space. But it's 
fragile and you shouldn't rely on that being true as you change the problem. 
You can either remove the null space in your problem formulation (maybe) or use 
iterative solvers/fieldsplit preconditioning (which can use a direct solver on 
nonsingular blocks).
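
If the null space is known (say, a constant-pressure mode), attaching it to the operator lets Krylov methods project it out; a sketch, with the actual null vector of this MHD system left as an assumption:

  /* Sketch: attach a one-dimensional null space spanned by nullvec so that
   * iterative solvers remove that component from the solution and residual. */
  MatNullSpace nsp;
  VecNormalize(nullvec, NULL);
  MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_FALSE, 1, &nullvec, &nsp);
  MatSetNullSpace(A, nsp);
  MatNullSpaceDestroy(&nsp);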

Jau-Uei Chen  writes:

> To whom it may concern,
>
> I am writing to ask about using PETSc with a direct solver to solve a
> linear system where a single zero-value eigenvalue exists.
>
> Currently, I am working on developing a finite-element solver for a
> linearized incompressible MHD equation. The code is based on an open-source
> library called MFEM which has its own wrapper for PETSc and is used in my
> code. From analysis, I already know that the linear system (Ax=b) to be
> solved is a saddle point system. By using the flags "solver_pc_type svd"
> and "solver_pc_svd_monitor", I indeed observe it. Here is an example of an
> output:
>
> SVD: condition number 3.271390119581e+18, 1 of 66 singular values are
> (nearly) zero
> SVD: smallest singular values: 3.236925932523e-17 3.108788619412e-04
> 3.840514506502e-04 4.599292003910e-04 4.909419974671e-04
> SVD: largest singular values : 4.007319935079e+00 4.027759008411e+00
> 4.817755760754e+00 4.176127583956e+01 1.058924751347e+02
>
>
> However, What surprises me is that the numerical solutions are still
> relatively accurate by comparing to the exact ones (i.e. manufactured
> solutions) when I perform convergence tests even if I am using a direct
> solver (i.e. -solver_ksp_type preonly -solver_pc_type lu
> -solver_pc_factor_mat_solver_type
> mumps). My question is: Why the direct solver won't break down in this
> context? I understand that it won't be an issue for iterative solvers such
> as GMRES [1][2] but not really sure why it won't cause trouble in direct
> solvers.
>
> Any comments or suggestions are greatly appreciated.
>
> Best Regards,
> Jau-Uei Chen
>
> Reference:
> [1] Benzi, Michele, et al. “Numerical Solution of Saddle Point Problems.”
> Acta Numerica, vol. 14, May 2005, pp. 1–137. DOI.org (Crossref),
> https://doi.org/10.1017/S0962492904000212.
> [2] Elman, Howard C., et al. Finite Elements and Fast Iterative Solvers:
> With Applications in Incompressible Fluid Dynamics. Second edition, Oxford
> University Press, 2014.


Re: [petsc-users] Clarification on MatMPIBAIJSetPreallocation (d_nnz and o_nnz)

2022-10-24 Thread Jed Brown
I recommend calling this one preallocation function, which will preallocate 
scalar and block formats. It takes one value per block row, counting in blocks.

https://petsc.org/release/docs/manualpages/Mat/MatXAIJSetPreallocation/
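
A sketch of the call (the upper-triangular arguments can be NULL unless you use SBAIJ; bs, dnnz, and onnz come from your connectivity):

  /* Sketch: one preallocation call covering AIJ, BAIJ, and SBAIJ. bs is the
   * block size (3 in 2D, 4 in 3D here); dnnz/onnz hold one count per block
   * row, in blocks, for the diagonal and off-diagonal parts. */
  MatXAIJSetPreallocation(A, bs, dnnz, onnz, NULL, NULL);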

Edoardo alinovi  writes:

> Hello Barry,
>
> I am doing some preliminary work to add a coupled solver in my code.  I am
> reading some documentation on MPIBAIJ as the matrix will be composed of
> blocks of size 3x3 in 2D and 4x4 in 3D for each cell in the domain.
>
> I can't quite understand how to set d_nnz and o_nnz. What is their size?
> Should I provide a number of non-zero entries for each block or should I do
> it line by line as in MATAIJ?
>
> Would you be able to provide me with a silly example?
>
> Thank you!


Re: [petsc-users] suppress CUDA warning & choose MCA parameter for mpirun during make PETSC_ARCH=arch-linux-c-debug check

2022-10-08 Thread Jed Brown
Barry Smith  writes:

>I hate these kinds of make rules that hide what the compiler is doing (in 
> the name of having less output, I guess) it makes it difficult to figure out 
> what is going wrong.

You can make VERBOSE=1 with CMake-generated makefiles.


Re: [petsc-users] Solve Linear System with Field Split Preconditioner

2022-09-26 Thread Jed Brown
Xiaofeng, is your saddle point due to incompressibility or other constraints 
(like Lagrange multipliers for contact or multi-point constraints)? If 
incompressibility, are you working on structured or unstructured/non-nested 
meshes?

Matthew Knepley  writes:

> Another option are the PCPATCH solvers for multigrid, as shown in this
> paper: https://arxiv.org/abs/1912.08516
> which I believe solves incompressible elasticity. There is an example in
> PETSc for Stokes I believe.
>
>   Thanks,
>
>  Matt
>
> On Mon, Sep 26, 2022 at 5:20 AM 晓峰 何  wrote:
>
>> Are there other approaches to solve this kind of systems in PETSc except
>> for field-split methods?
>>
>> Thanks,
>> Xiaofeng
>>
>> On Sep 26, 2022, at 14:13, Jed Brown  wrote:
>>
>> This is the joy of factorization field-split methods. The actual Schur
>> complement is dense, so we represent it implicitly. A common strategy is to
>> assemble the mass matrix and drop it in the 11 block of the Pmat. You can
>> check out some examples in the repository for incompressible flow (Stokes
>> problems). The LSC (least squares commutator) is another option. You'll
>> likely find that lumping diag(A00)^{-1} works poorly because the resulting
>> operator behaves like a Laplacian rather than like a mass matrix.
>>
>> 晓峰 何  writes:
>>
>> If assigned a preconditioner to A11 with this cmd options:
>>
>>   -fieldsplit_0_ksp_type gmres -fieldsplit_0_pc_type ilu
>> -fieldsplit_1_ksp_type gmres -fieldsplit_1_pc_type ilu
>>
>> Then I got this error:
>>
>> "Could not locate a solver type for factorization type ILU and matrix type
>> schurcomplement"
>>
>> How could I specify a preconditioner for A11?
>>
>> BR,
>> Xiaofeng
>>
>>
>> On Sep 26, 2022, at 11:02, 晓峰 何  wrote:
>>
>> -fieldsplit_0_ksp_type gmres -fieldsplit_0_pc_type ilu
>> -fieldsplit_1_ksp_type gmres -fieldsplit_1_pc_type none
>>
>>
>>
>
> -- 
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/ 

