Re: [petsc-users] Problem in some macro when using VS+intel cl

2023-06-27 Thread 冯上玮
This is EXACTLY the CRUX of the matter: with this precompile command, there are
no more errors! Thanks for your patience with my numerous and continuous
questions.


Thank you very much!!


------ Original ------
From: "Barry Smith"

[...] I've googled this error message and it seems that it's induced by the
difference of compilers, c.f.
https://stackoverflow.com/questions/42136395/identifier-builtin-expect-is-undefined-during-ros-on-win-tutorial-talker-ex.
But Intel says that they also provide such a thing in icl, and I actually use
this compiler instead of the Visual Studio cl...




The IDE is not showing the actual error message. Are you sure that your IDE 
build has the right includes and libraries? You can
get these using


 cd $PETSC_DIR
 make getincludedirs
 make getlinklibs


 Thanks,


  Matt

Anyway, the project can be built if I delete these error-checking macros.


Installation feedback (or as a test result):
When configuring on Windows, only icl + impi works, and in this case both the
--with-cc and --with-cxx options need to specify the language standard, like:
--with-cc-std-c99 and --with-cxx-std-c++'ver'. Other combinations such as cl +
impi, icl + msmpi, and cl + msmpi never worked. My tutor told me that an older
version of msmpi may work but I never tried it.


FENG.







-- 
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener


https://www.cse.buffalo.edu/~knepley/

Re: [petsc-users] Problem in some macro when using VS+intel cl

2023-06-27 Thread Barry Smith

   The macros expand differently depending on the compiler being used. In this 
case

#if defined(PETSC_HAVE_BUILTIN_EXPECT)
  #define PetscUnlikely(cond) __builtin_expect(!!(cond), 0)
  #define PetscLikely(cond)   __builtin_expect(!!(cond), 1)
#else
  #define PetscUnlikely(cond) (cond)
  #define PetscLikely(cond)   (cond)
#endif

So with the Microsoft Windows compilers, if they do not support __builtin_expect,
the compiler only sees the #else branch of the macro and therefore never sees
__builtin_expect at all.
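
For context, a simplified sketch of where PetscUnlikely() enters the
error-checking macros (this is not the exact PETSc definition, which does more
bookkeeping):

  /* Simplified sketch, not the actual PETSc macro: */
  #define MY_CHKERRQ(ierr) \
    do { \
      PetscErrorCode ierr_ = (ierr); \
      if (PetscUnlikely(ierr_)) return ierr_; \
    } while (0)

When PETSC_HAVE_BUILTIN_EXPECT is defined, the condition above expands to
__builtin_expect(!!(ierr_), 0), which is exactly the identifier the Visual
Studio front end reports as undefined.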

You can check $PETSC_DIR/$PETSC_ARCH/include/petscconf.h to see whether
PETSC_HAVE_BUILTIN_EXPECT is defined. ./configure determines whether this feature
(and many others) is supported by the compiler. It is conceivable that configure
somehow determined incorrectly that this is supported.
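
A quick way to check both halves of that is sketched below (run from the
Cygwin/bash shell with PETSC_DIR and PETSC_ARCH set; the compile step assumes
icl is on your PATH):

  # Does configure think the compiler supports the builtin?
  grep PETSC_HAVE_BUILTIN_EXPECT $PETSC_DIR/$PETSC_ARCH/include/petscconf.h

  # Does the compiler the IDE actually invokes accept it?
  echo 'int main(void){int x=1;return __builtin_expect(!!(x),1)?0:1;}' > t.c
  icl t.c

If the first command shows the macro defined but the second fails to compile,
configure's detection does not match the compiler that Visual Studio is really
using.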




> On Jun 27, 2023, at 10:09 PM, 冯上玮  wrote:
> 
> I've followed your advice and include the header's file and libraries in 
> Visual Studio. Such "error" still shows but I can build the project! It's 
> strange!
> I expand the CHKERRQ macro and find the error actually locates at
> 
> <2f077...@673dae5f.6c969b64.bmp>
>  
> What I know from google is that the "__builtin_expect__" is defined in GCC, 
> so is it unsolvable in Windows with visual studio C compiler or Inter C 
> compiler?
> -- Original --
> From:  "Matthew Knepley";
> Date:  Wed, Jun 28, 2023 01:59 AM
> To:  "冯上玮";
> Cc:  "petsc-users";
> Subject:  Re: [petsc-users] Problem in some macro when using VS+intel cl
>  
> On Tue, Jun 27, 2023 at 11:32 AM 冯上玮  > wrote:
>> Hi, 
>> 
>> After failure with MS-MPI once and once again, I tried icl+oneAPI and 
>> succeeded in installing and testing PESTc in Cygwin!
>> 
>> However, (always however) when I copied the example code on Getting Started 
>> page on visual studio, there are tons of error like:
>> 
>> I just wonder where the problem locates, I've googled this error message and 
>> it seems that it's induced by the difference of compilers, c.f. 
>> https://stackoverflow.com/questions/42136395/identifier-builtin-expect-is-undefined-during-ros-on-win-tutorial-talker-ex.
>>  But Intel says that they also provide such thing on icl, and I actually use 
>> this compiler instead of visual studio cl... 
> 
> The IDE is not showing the actual error message. Are you sure that your IDE 
> build has the right includes and libraries? You can
> get these using
> 
>   cd $PETSC_DIR
>   make getincludedirs
>   make getlinklibs
> 
>   Thanks,
> 
>  Matt
>  
>> Anyway, the project could be built if I delete these error-checking macro.
>> 
>> Installing feedback (or as a test result):
>> When configure on windows, only icl + impi works, and in this case, both 
>> --with-cc and --with-cxx options need to point out the version like: 
>> --with-cc-std-c99 and --with-cxx-std-c++'ver'. Other combinations such as cl 
>> + impi, icl + msmpi, cl + msmpi never work. My tutor told me that older 
>> version of msmpi may work but I never try this.
>> 
>> FENG.
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/ 



Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution

2023-06-27 Thread Matthew Knepley
On Tue, Jun 27, 2023 at 2:56 PM Vanella, Marcos (Fed) <
marcos.vane...@nist.gov> wrote:

> Thank you Matt. I'll try the flags you recommend for monitoring. Correct,
> I'm trying to see if GPU would provide an advantage for this particular
> Poisson solution we do in our code.
>
> Our grids are staggered with the Poisson unknown in cell centers. All my
> tests for single mesh runs with 100K to 200K meshes show MKL PARDISO as the
> faster option for these meshes considering the mesh as unstructured (an
> implementation separate from the PETSc option). We have the option of
> Fishpack (fast trigonometric solvers), but that is not as general (requires
> solution on the whole mesh + a special treatment of immersed geometry). The
> single mesh solver is used as a black box within a fixed point domain
> decomposition iteration in multi-mesh cases. The approximation error in
> this method is confined to the mesh boundaries.
>
> The other option I have tried with MKL is to build the global matrix
> across all meshes and use the MKL cluster sparse solver. The problem
> becomes a memory one for meshes that go over a couple million unknowns due
> to the exact Cholesky factorization matrix storage. I'm thinking the other
> possibility using PETSc is to build in parallel the global matrix (as done
> for the MKL global solver) and try the GPU accelerated Krylov + multigrid
> preconditioner. If this can bring down the time to solution to what we get
> for the previous scheme and keep memory use undrr control it would be a
> good option for CPU+GPU systems. Thing is we need to bring the residual of
> the equation to ~10^-10 or less to avoid instability so it might still be
> costly.
>

Yes,  this is definitely the option I would try. First, I would just use
AMG (GAMG, Hypre, ML). If those work,
you can speed up the setup time and bring down memory somewhat with GMG.
Since your grid is Cartesian, you could use DMDA to do this easily.
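
As a concrete starting point, a sketch of the runtime options this corresponds
to (the executable name is a placeholder and the tolerance is illustrative):

  # Algebraic multigrid (GAMG) with a CG outer solve:
  ./poisson_test -ksp_type cg -ksp_rtol 1e-10 -pc_type gamg -ksp_monitor

  # Hypre BoomerAMG (needs --download-hypre at configure time):
  ./poisson_test -ksp_type cg -ksp_rtol 1e-10 -pc_type hypre -pc_hypre_type boomeramg

  # Geometric multigrid, if the problem is set up on a DMDA:
  ./poisson_test -ksp_type cg -ksp_rtol 1e-10 -pc_type mg -da_refine 4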

  Thanks,

 Matt


> I'll keep you updated. Thanks,
> Marcos
> --
> *From:* Matthew Knepley 
> *Sent:* Tuesday, June 27, 2023 2:08 PM
> *To:* Vanella, Marcos (Fed) 
> *Cc:* Mark Adams ; petsc-users@mcs.anl.gov <
> petsc-users@mcs.anl.gov>
> *Subject:* Re: [petsc-users] SOLVE + PC combination for 7 point stencil
> (unstructured) poisson solution
>
> On Tue, Jun 27, 2023 at 11:23 AM Vanella, Marcos (Fed) <
> marcos.vane...@nist.gov> wrote:
>
> Hi Mark and Matt, I tried swapping the preconditioner to cholmod and also
> the hypre Boomer AMG. They work just fine for my case. I also got my hands
> on a machine with NVIDIA gpus in one of our AI clusters. I compiled PETSc
> to make use of cuda and cuda-enabled openmpi (with gcc).
> I'm running the previous tests and want to also check some of the cuda
> enabled solvers. I was able to submit a case for the default Krylov solver
> with these runtime flags: -vec_type seqcuda -mat_type seqaijcusparse
> -pc_type cholesky -pc_factor_mat_solver_type cusparse. The case run to
> completion.
>
> I guess my question now is how do I monitor (if there is a way) that the
> GPU is being used in the calculation, and any other stats?
>
>
> You should get that automatically with
>
>   -log_view
>
> If you want finer-grained profiling of the kernels, you can use
>
>   -log_view_gpu_time
>
> but it can slows things down.
>
>
> Also, which other solver combination using GPU would you recommend for me
> to try? Can we compile PETSc with the cuda enabled version for CHOLMOD and
> HYPRE?
>
>
> Hypre has GPU support but not CHOLMOD. There are no rules of thumb right
> now for GPUs. It depends on what card you have, what version of the driver,
> what version of the libraries, etc. It is very fragile. Hopefully this
> period ends soon, but I am not optimistic. Unless you are very confident
> that GPUs will help,
> I would not recommend spending the time.
>
>   Thanks,
>
>  Matt
>
>
> Thank you for your help!
> Marcos
>
> --
> *From:* Matthew Knepley 
> *Sent:* Monday, June 26, 2023 12:11 PM
> *To:* Vanella, Marcos (Fed) 
> *Cc:* Mark Adams ; petsc-users@mcs.anl.gov <
> petsc-users@mcs.anl.gov>
> *Subject:* Re: [petsc-users] SOLVE + PC combination for 7 point stencil
> (unstructured) poisson solution
>
> On Mon, Jun 26, 2023 at 12:08 PM Vanella, Marcos (Fed) via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
> Than you Matt and Mark, I'll try your suggestions. To configure with hypre
> can I just use the --download-hypre configure line?
>
>
> Yes,
>
>   Thanks,
>
> Matt
>
>
> That is what I did with suitesparse, very nice.
> --
> *From:* Mark Adams 
> *Sent:* Monday, June 26, 2023 12:05 PM
> *To:* Vanella, Marcos (Fed) 
> *Cc:* petsc-users@mcs.anl.gov 
> *Subject:* Re: [petsc-users] SOLVE + PC combination for 7 point stencil
> (unstructured) poisson solution
>
> I'm not sure what MG is doing with an "unstructured" problem. I assume you
> are not using DMDA.
-pc_type gamg should work

Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution

2023-06-27 Thread Vanella, Marcos (Fed) via petsc-users
Sorry, meant 100K to 200K cells.

Also, check the release page of SuiteSparse. The multi-GPU version of cholmod
might be coming soon:

https://people.engr.tamu.edu/davis/SuiteSparse/index.html

From: Vanella, Marcos (Fed) 
Sent: Tuesday, June 27, 2023 2:56 PM
To: Matthew Knepley 
Cc: Mark Adams ; petsc-users@mcs.anl.gov 

Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil 
(unstructured) poisson solution

Thank you Matt. I'll try the flags you recommend for monitoring. Correct, I'm 
trying to see if GPU would provide an advantage for this particular Poisson 
solution we do in our code.

Our grids are staggered with the Poisson unknown in cell centers. All my tests 
for single mesh runs with 100K to 200K meshes show MKL PARDISO as the faster 
option for these meshes considering the mesh as unstructured (an implementation 
separate from the PETSc option). We have the option of Fishpack (fast 
trigonometric solvers), but that is not as general (requires solution on the 
whole mesh + a special treatment of immersed geometry). The single mesh solver 
is used as a black box within a fixed point domain decomposition iteration in 
multi-mesh cases. The approximation error in this method is confined to the 
mesh boundaries.

The other option I have tried with MKL is to build the global matrix across all 
meshes and use the MKL cluster sparse solver. The problem becomes a memory one 
for meshes that go over a couple million unknowns due to the exact Cholesky 
factorization matrix storage. I'm thinking the other possibility using PETSc is 
to build in parallel the global matrix (as done for the MKL global solver) and 
try the GPU accelerated Krylov + multigrid preconditioner. If this can bring 
down the time to solution to what we get for the previous scheme and keep 
memory use under control it would be a good option for CPU+GPU systems. The thing
is we need to bring the residual of the equation to ~10^-10 or less to avoid
instability, so it might still be costly.

I'll keep you updated. Thanks,
Marcos

From: Matthew Knepley 
Sent: Tuesday, June 27, 2023 2:08 PM
To: Vanella, Marcos (Fed) 
Cc: Mark Adams ; petsc-users@mcs.anl.gov 

Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil 
(unstructured) poisson solution

On Tue, Jun 27, 2023 at 11:23 AM Vanella, Marcos (Fed) 
mailto:marcos.vane...@nist.gov>> wrote:
Hi Mark and Matt, I tried swapping the preconditioner to cholmod and also the 
hypre Boomer AMG. They work just fine for my case. I also got my hands on a 
machine with NVIDIA gpus in one of our AI clusters. I compiled PETSc to make 
use of cuda and cuda-enabled openmpi (with gcc).
I'm running the previous tests and want to also check some of the cuda enabled 
solvers. I was able to submit a case for the default Krylov solver with these 
runtime flags: -vec_type seqcuda -mat_type seqaijcusparse -pc_type cholesky 
-pc_factor_mat_solver_type cusparse. The case ran to completion.

I guess my question now is how do I monitor (if there is a way) that the GPU is 
being used in the calculation, and any other stats?

You should get that automatically with

  -log_view

If you want finer-grained profiling of the kernels, you can use

  -log_view_gpu_time

but it can slow things down.

Also, which other solver combination using GPU would you recommend for me to 
try? Can we compile PETSc with the cuda enabled version for CHOLMOD and HYPRE?

Hypre has GPU support but not CHOLMOD. There are no rules of thumb right now 
for GPUs. It depends on what card you have, what version of the driver, what 
version of the libraries, etc. It is very fragile. Hopefully this period ends 
soon, but I am not optimistic. Unless you are very confident that GPUs will 
help,
I would not recommend spending the time.

  Thanks,

 Matt

Thank you for your help!
Marcos


From: Matthew Knepley mailto:knep...@gmail.com>>
Sent: Monday, June 26, 2023 12:11 PM
To: Vanella, Marcos (Fed) 
mailto:marcos.vane...@nist.gov>>
Cc: Mark Adams mailto:mfad...@lbl.gov>>; 
petsc-users@mcs.anl.gov 
mailto:petsc-users@mcs.anl.gov>>
Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil 
(unstructured) poisson solution

On Mon, Jun 26, 2023 at 12:08 PM Vanella, Marcos (Fed) via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Thank you Matt and Mark, I'll try your suggestions. To configure with hypre can
I just use the --download-hypre configure line?

Yes,

  Thanks,

Matt

That is what I did with suitesparse, very nice.

From: Mark Adams mailto:mfad...@lbl.gov>>
Sent: Monday, June 26, 2023 12:05 PM
To: Vanella, Marcos (Fed) 
mailto:marcos.vane...@nist.gov>>
Cc: petsc-users@mcs.anl.gov 
mailto:petsc-users@mcs.anl.gov>>
Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil 
(unstructured) poisson solution

Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution

2023-06-27 Thread Vanella, Marcos (Fed) via petsc-users
Thank you Matt. I'll try the flags you recommend for monitoring. Correct, I'm 
trying to see if GPU would provide an advantage for this particular Poisson 
solution we do in our code.

Our grids are staggered with the Poisson unknown in cell centers. All my tests 
for single mesh runs with 100K to 200K meshes show MKL PARDISO as the faster 
option for these meshes considering the mesh as unstructured (an implementation 
separate from the PETSc option). We have the option of Fishpack (fast 
trigonometric solvers), but that is not as general (requires solution on the 
whole mesh + a special treatment of immersed geometry). The single mesh solver 
is used as a black box within a fixed point domain decomposition iteration in 
multi-mesh cases. The approximation error in this method is confined to the 
mesh boundaries.

The other option I have tried with MKL is to build the global matrix across all 
meshes and use the MKL cluster sparse solver. The problem becomes a memory one 
for meshes that go over a couple million unknowns due to the exact Cholesky 
factorization matrix storage. I'm thinking the other possibility using PETSc is 
to build in parallel the global matrix (as done for the MKL global solver) and 
try the GPU accelerated Krylov + multigrid preconditioner. If this can bring 
down the time to solution to what we get for the previous scheme and keep 
memory use under control it would be a good option for CPU+GPU systems. The thing
is we need to bring the residual of the equation to ~10^-10 or less to avoid
instability, so it might still be costly.

I'll keep you updated. Thanks,
Marcos

From: Matthew Knepley 
Sent: Tuesday, June 27, 2023 2:08 PM
To: Vanella, Marcos (Fed) 
Cc: Mark Adams ; petsc-users@mcs.anl.gov 

Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil 
(unstructured) poisson solution

On Tue, Jun 27, 2023 at 11:23 AM Vanella, Marcos (Fed) 
mailto:marcos.vane...@nist.gov>> wrote:
Hi Mark and Matt, I tried swapping the preconditioner to cholmod and also the 
hypre Boomer AMG. They work just fine for my case. I also got my hands on a 
machine with NVIDIA gpus in one of our AI clusters. I compiled PETSc to make 
use of cuda and cuda-enabled openmpi (with gcc).
I'm running the previous tests and want to also check some of the cuda enabled 
solvers. I was able to submit a case for the default Krylov solver with these 
runtime flags: -vec_type seqcuda -mat_type seqaijcusparse -pc_type cholesky 
-pc_factor_mat_solver_type cusparse. The case ran to completion.

I guess my question now is how do I monitor (if there is a way) that the GPU is 
being used in the calculation, and any other stats?

You should get that automatically with

  -log_view

If you want finer-grained profiling of the kernels, you can use

  -log_view_gpu_time

but it can slow things down.

Also, which other solver combination using GPU would you recommend for me to 
try? Can we compile PETSc with the cuda enabled version for CHOLMOD and HYPRE?

Hypre has GPU support but not CHOLMOD. There are no rules of thumb right now 
for GPUs. It depends on what card you have, what version of the driver, what 
version of the libraries, etc. It is very fragile. Hopefully this period ends 
soon, but I am not optimistic. Unless you are very confident that GPUs will 
help,
I would not recommend spending the time.

  Thanks,

 Matt

Thank you for your help!
Marcos


From: Matthew Knepley mailto:knep...@gmail.com>>
Sent: Monday, June 26, 2023 12:11 PM
To: Vanella, Marcos (Fed) 
mailto:marcos.vane...@nist.gov>>
Cc: Mark Adams mailto:mfad...@lbl.gov>>; 
petsc-users@mcs.anl.gov 
mailto:petsc-users@mcs.anl.gov>>
Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil 
(unstructured) poisson solution

On Mon, Jun 26, 2023 at 12:08 PM Vanella, Marcos (Fed) via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Thank you Matt and Mark, I'll try your suggestions. To configure with hypre can
I just use the --download-hypre configure line?

Yes,

  Thanks,

Matt

That is what I did with suitesparse, very nice.

From: Mark Adams mailto:mfad...@lbl.gov>>
Sent: Monday, June 26, 2023 12:05 PM
To: Vanella, Marcos (Fed) 
mailto:marcos.vane...@nist.gov>>
Cc: petsc-users@mcs.anl.gov 
mailto:petsc-users@mcs.anl.gov>>
Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil 
(unstructured) poisson solution

I'm not sure what MG is doing with an "unstructured" problem. I assume you are 
not using DMDA.
-pc_type gamg should work
I would configure with hypre and try that also: -pc_type hypre

As Matt said MG should be faster. How many iterations was it taking?
Try a 100^3 and check that the iteration count does not change much, if at all.

Mark


On Mon, Jun 26, 2023 at 11:35 AM Vanella, Marcos (Fed) via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:

Re: [petsc-users] How to build compatible MPI matrix for dmplex

2023-06-27 Thread Matthew Knepley
On Tue, Jun 27, 2023 at 2:20 PM Duan Junming via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Dear all,
>
>
> I try to create a compatible sparse MPI matrix A with dmplex global vector
> x, so I can do matrix-vector multiplication y = A*x.
>
> I think I can first get the local and global sizes of x on comm, say n and
> N, also sizes of y, m, M,
>
> then create A by using MatCreate(comm, &A), set the sizes using
> MatSetSizes(A, m, n, M, N), set the type using MatSetType(A, MATMPIAIJ). Is
> this process correct?
>

Yes.


> Another question is: Do the entries not filled automatically compressed
> out?
>

Yes.
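
Putting the confirmed steps together, a minimal sketch (x is the DMPlex global
input vector and y the output vector; variable names are illustrative):

  PetscInt m, n, M, N;
  Mat      A;

  PetscCall(VecGetLocalSize(x, &n));   /* local  columns = local  size of x */
  PetscCall(VecGetSize(x, &N));        /* global columns = global size of x */
  PetscCall(VecGetLocalSize(y, &m));   /* local  rows    = local  size of y */
  PetscCall(VecGetSize(y, &M));        /* global rows    = global size of y */

  PetscCall(MatCreate(PetscObjectComm((PetscObject)x), &A));
  PetscCall(MatSetSizes(A, m, n, M, N));
  PetscCall(MatSetType(A, MATMPIAIJ));
  PetscCall(MatSetUp(A));              /* or preallocate, then MatSetValues(...) */
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatMult(A, x, y));         /* y = A*x; unset entries contribute nothing */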

  Thanks,

Matt


> Thanks!
>
> Junming
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


[petsc-users] How to build compatible MPI matrix for dmplex

2023-06-27 Thread Duan Junming via petsc-users
Dear all,


I am trying to create a sparse MPI matrix A compatible with a DMPlex global vector x,
so I can do the matrix-vector multiplication y = A*x.

I think I can first get the local and global sizes of x on comm, say n and N,
and also the sizes of y, m and M,

then create A using MatCreate(comm, &A), set the sizes using MatSetSizes(A,
m, n, M, N), and set the type using MatSetType(A, MATMPIAIJ). Is this process
correct?


Another question: are the entries that are not filled automatically compressed out?


Thanks!

Junming


Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution

2023-06-27 Thread Matthew Knepley
On Tue, Jun 27, 2023 at 11:23 AM Vanella, Marcos (Fed) <
marcos.vane...@nist.gov> wrote:

> Hi Mark and Matt, I tried swapping the preconditioner to cholmod and also
> the hypre Boomer AMG. They work just fine for my case. I also got my hands
> on a machine with NVIDIA gpus in one of our AI clusters. I compiled PETSc
> to make use of cuda and cuda-enabled openmpi (with gcc).
> I'm running the previous tests and want to also check some of the cuda
> enabled solvers. I was able to submit a case for the default Krylov solver
> with these runtime flags: -vec_type seqcuda -mat_type seqaijcusparse
> -pc_type cholesky -pc_factor_mat_solver_type cusparse. The case run to
> completion.
>
> I guess my question now is how do I monitor (if there is a way) that the
> GPU is being used in the calculation, and any other stats?
>

You should get that automatically with

  -log_view

If you want finer-grained profiling of the kernels, you can use

  -log_view_gpu_time

but it can slow things down.
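
For example, a sketch using the runtime flags quoted above (the executable name
is a placeholder):

  mpirun -n 1 ./poisson_test -vec_type seqcuda -mat_type seqaijcusparse \
    -pc_type cholesky -pc_factor_mat_solver_type cusparse -log_view

  # finer-grained GPU kernel timings (slower):
  mpirun -n 1 ./poisson_test -vec_type seqcuda -mat_type seqaijcusparse \
    -pc_type cholesky -pc_factor_mat_solver_type cusparse -log_view -log_view_gpu_time

The GPU-related columns of the -log_view summary (e.g. GPU Mflop/s and the
CpuToGpu/GpuToCpu transfer counts) show how much of the work actually ran on
the device.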


> Also, which other solver combination using GPU would you recommend for me
> to try? Can we compile PETSc with the cuda enabled version for CHOLMOD and
> HYPRE?
>

Hypre has GPU support but not CHOLMOD. There are no rules of thumb right
now for GPUs. It depends on what card you have, what version of the driver,
what version of the libraries, etc. It is very fragile. Hopefully this
period ends soon, but I am not optimistic. Unless you are very confident
that GPUs will help,
I would not recommend spending the time.

  Thanks,

 Matt


> Thank you for your help!
> Marcos
>
> --
> *From:* Matthew Knepley 
> *Sent:* Monday, June 26, 2023 12:11 PM
> *To:* Vanella, Marcos (Fed) 
> *Cc:* Mark Adams ; petsc-users@mcs.anl.gov <
> petsc-users@mcs.anl.gov>
> *Subject:* Re: [petsc-users] SOLVE + PC combination for 7 point stencil
> (unstructured) poisson solution
>
> On Mon, Jun 26, 2023 at 12:08 PM Vanella, Marcos (Fed) via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
> Than you Matt and Mark, I'll try your suggestions. To configure with hypre
> can I just use the --download-hypre configure line?
>
>
> Yes,
>
>   Thanks,
>
> Matt
>
>
> That is what I did with suitesparse, very nice.
> --
> *From:* Mark Adams 
> *Sent:* Monday, June 26, 2023 12:05 PM
> *To:* Vanella, Marcos (Fed) 
> *Cc:* petsc-users@mcs.anl.gov 
> *Subject:* Re: [petsc-users] SOLVE + PC combination for 7 point stencil
> (unstructured) poisson solution
>
> I'm not sure what MG is doing with an "unstructured" problem. I assume you
> are not using DMDA.
> -pc_type gamg should work
> I would configure with hypre and try that also: -pc_type hypre
>
> As Matt said MG should be faster. How many iterations was it taking?
> Try a 100^3 and check that the iteration count does not change much, if at
> all.
>
> Mark
>
>
> On Mon, Jun 26, 2023 at 11:35 AM Vanella, Marcos (Fed) via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
> Hi, I was wondering if anyone has experience on what combinations are more
> efficient to solve a Poisson problem derived from a 7 point stencil on a
> single mesh (serial).
> I've been doing some tests of multigrid and cholesky on a 50^3 mesh. *-pc_type
> mg* takes about 75% more time than *-pc_type cholesky
> -pc_factor_mat_solver_type cholmod* for the case I'm testing.
> I'm new to PETSc so any suggestions are most welcome and appreciated,
> Marcos
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> 
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] GAMG and Hypre preconditioner

2023-06-27 Thread Zisheng Ye via petsc-users
Hi Jed

Thanks for your reply. I have sent the log files to petsc-ma...@mcs.anl.gov.

Zisheng

From: Jed Brown 
Sent: Tuesday, June 27, 2023 1:02 PM
To: Zisheng Ye ; petsc-users@mcs.anl.gov 

Subject: Re: [petsc-users] GAMG and Hypre preconditioner

[External Sender]

Zisheng Ye via petsc-users  writes:

> Dear PETSc Team
>
> We are testing the GPU support in PETSc's KSPSolve, especially for the GAMG 
> and Hypre preconditioners. We have encountered several issues that we would 
> like to ask for your suggestions.
>
> First, we have couple of questions when working with a single MPI rank:
>
>   1.  We have tested two backends, CUDA and Kokkos. One commonly encountered 
> error is related to SpGEMM in CUDA when the mat is large as listed below:
>
> cudaMalloc((void **), bufferSize2) error( cudaErrorMemoryAllocation): 
> out of memory
>
> For CUDA backend, one can use "-matmatmult_backend_cpu -matptap_backend_cpu" 
> to avoid these problems. However, there seems no equivalent options in Kokkos 
> backend. Is there any good practice to avoid this error for both backends and 
> if we can avoid this error in Kokkos backend?

Junchao will know more about KK tuning, but the faster GPU matrix-matrix 
algorithms use extra memory. We should be able to make the host option 
available with kokkos.

>   2.  We have tested the combination of Hypre and Kokkos as backend. It looks 
> like this combination is not compatible with each other, as we observed that 
> KSPSolve takes a greater number of iterations to exit, and the residual norm 
> in the post-checking is much larger than the one obtained when working with 
> CUDA backend. This happens for matrices with block size larger than 1. Is 
> there any explanation to the error?
>
> Second, we have couple more questions when working with multiple MPI ranks:
>
>   1.  We are currently using OpenMPI as we couldnt get Intel MPI to work as a 
> GPU-aware MPI, is this a known issue with Intel MPI?

As far as I know, Intel's MPI is only for SYCL/Intel GPUs. In general, 
GPU-aware MPI has been incredibly flaky on all HPC systems despite being 
introduced ten years ago.

>   2.  With OpenMPI we currently see a slow down when increasing the MPI count 
> as shown in the figure below, is this normal?

Could you share -log_view output from a couple representative runs? You could 
send those here or to petsc-ma...@mcs.anl.gov. We need to see what kind of work 
is not scaling to attribute what may be causing it.


Re: [petsc-users] Problem in some macro when using VS+intel cl

2023-06-27 Thread Barry Smith

  Regarding PetscCall(): it sounds like you are working with two different
versions of PETSc with different compilers? This isn't practical since things
do change (improve, we hope) with newer versions of PETSc. You should just build
the latest version of PETSc with all the compiler suites you are interested in.

  Barry


> On Jun 27, 2023, at 11:32 AM, 冯上玮  wrote:
> 
> Hi, 
> 
> After failure with MS-MPI once and once again, I tried icl+oneAPI and 
> succeeded in installing and testing PESTc in Cygwin!
> 
> However, (always however) when I copied the example code on Getting Started 
> page on visual studio, there are tons of error like:
> <724ce...@d9b26517.fa009b64.jpg>
> I just wonder where the problem locates, I've googled this error message and 
> it seems that it's induced by the difference of compilers, c.f. 
> https://stackoverflow.com/questions/42136395/identifier-builtin-expect-is-undefined-during-ros-on-win-tutorial-talker-ex.
>  But Intel says that they also provide such thing on icl, and I actually use 
> this compiler instead of visual studio cl... 
> 
> Anyway, the project could be built if I delete these error-checking macro.
> 
> Installing feedback (or as a test result):
> When configure on windows, only icl + impi works, and in this case, both 
> --with-cc and --with-cxx options need to point out the version like: 
> --with-cc-std-c99 and --with-cxx-std-c++'ver'. Other combinations such as cl 
> + impi, icl + msmpi, cl + msmpi never work. My tutor told me that older 
> version of msmpi may work but I never try this.
> 
> FENG.



Re: [petsc-users] Scalable Solver for Incompressible Flow

2023-06-27 Thread Alexander Lindsay
I've opened https://gitlab.com/petsc/petsc/-/merge_requests/6642 which adds
a couple more scaling applications of the inverse of the diagonal of A
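
For reference, a sketch of the Schur-complement/LSC option set under discussion,
as it might appear in a PETSc options file (the field numbering and the outer
solver settings are illustrative, not taken from MOOSE):

  -ksp_type fgmres
  -pc_type fieldsplit
  -pc_fieldsplit_type schur
  -pc_fieldsplit_schur_fact_type full
  -pc_fieldsplit_schur_precondition self
  -fieldsplit_0_pc_type hypre              # momentum (velocity) block
  -fieldsplit_1_ksp_type gmres             # pressure Schur-complement solve
  -fieldsplit_1_pc_type lsc
  -mat_schur_complement_ainv_type full     # the 'full' option mentioned below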

On Mon, Jun 26, 2023 at 6:06 PM Alexander Lindsay 
wrote:

> I guess that similar to the discussions about selfp, the approximation of
> the velocity mass matrix by the diagonal of the velocity sub-matrix will
> improve when running a transient as opposed to a steady calculation,
> especially if the time derivative is lumped Just thinking while typing
>
> On Mon, Jun 26, 2023 at 6:03 PM Alexander Lindsay <
> alexlindsay...@gmail.com> wrote:
>
>> Returning to Sebastian's question about the correctness of the current
>> LSC implementation: in the taxonomy paper that Jed linked to (which talks
>> about SIMPLE, PCD, and LSC), equation 21 shows four applications of the
>> inverse of the velocity mass matrix. In the PETSc implementation there are
>> at most two applications of the reciprocal of the diagonal of A (an
>> approximation to the velocity mass matrix without more plumbing, as already
>> pointed out). It seems like for code implementations in which there are
>> possible scaling differences between the velocity and pressure equations,
>> that this difference in the number of inverse applications could be
>> significant? I know Jed said that these scalings wouldn't really matter if
>> you have a uniform grid, but I'm not 100% convinced yet.
>>
>> I might try fiddling around with adding two more reciprocal applications.
>>
>> On Fri, Jun 23, 2023 at 1:09 PM Pierre Jolivet 
>> wrote:
>>
>>>
>>> On 23 Jun 2023, at 10:06 PM, Pierre Jolivet 
>>> wrote:
>>>
>>>
>>> On 23 Jun 2023, at 9:39 PM, Alexander Lindsay 
>>> wrote:
>>>
>>> Ah, I see that if I use Pierre's new 'full' option for
>>> -mat_schur_complement_ainv_type
>>>
>>>
>>> That was not initially done by me
>>>
>>>
>>> Oops, sorry for the noise, looks like it was done by me indeed
>>> in 9399e4fd88c6621aad8fe9558ce84df37bd6fada…
>>>
>>> Thanks,
>>> Pierre
>>>
>>> (though I recently tweaked MatSchurComplementComputeExplicitOperator() a
>>> bit to use KSPMatSolve(), so that if you have a small Schur complement —
>>> which is not really the case for NS — this could be a viable option, it was
>>> previously painfully slow).
>>>
>>> Thanks,
>>> Pierre
>>>
>>> that I get a single iteration for the Schur complement solve with LU.
>>> That's a nice testing option
>>>
>>> On Fri, Jun 23, 2023 at 12:02 PM Alexander Lindsay <
>>> alexlindsay...@gmail.com> wrote:
>>>
 I guess it is because the inverse of the diagonal form of A00 becomes a
 poor representation of the inverse of A00? I guess naively I would have
 thought that the blockdiag form of A00 is A00

 On Fri, Jun 23, 2023 at 10:18 AM Alexander Lindsay <
 alexlindsay...@gmail.com> wrote:

> Hi Jed, I will come back with answers to all of your questions at some
> point. I mostly just deal with MOOSE users who come to me and tell me 
> their
> solve is converging slowly, asking me how to fix it. So I generally assume
> they have built an appropriate mesh and problem size for the problem they
> want to solve and added appropriate turbulence modeling (although my
> general assumption is often violated).
>
> > And to confirm, are you doing a nonlinearly implicit
> velocity-pressure solve?
>
> Yes, this is our default.
>
> A general question: it seems that it is well known that the quality of
> selfp degrades with increasing advection. Why is that?
>
> On Wed, Jun 7, 2023 at 8:01 PM Jed Brown  wrote:
>
>> Alexander Lindsay  writes:
>>
>> > This has been a great discussion to follow. Regarding
>> >
>> >> when time stepping, you have enough mass matrix that cheaper
>> preconditioners are good enough
>> >
>> > I'm curious what some algebraic recommendations might be for high
>> Re in
>> > transients.
>>
>> What mesh aspect ratio and streamline CFL number? Assuming your model
>> is turbulent, can you say anything about momentum thickness Reynolds 
>> number
>> Re_θ? What is your wall normal spacing in plus units? (Wall resolved or
>> wall modeled?)
>>
>> And to confirm, are you doing a nonlinearly implicit
>> velocity-pressure solve?
>>
>> > I've found one-level DD to be ineffective when applied
>> monolithically or to the momentum block of a split, as it scales with the
>> mesh size.
>>
>> I wouldn't put too much weight on "scaling with mesh size" per se.
>> You want an efficient solver for the coarsest mesh that delivers 
>> sufficient
>> accuracy in your flow regime. Constants matter.
>>
>> Refining the mesh while holding time steps constant changes the
>> advective CFL number as well as cell Peclet/cell Reynolds numbers. A
>> meaningful scaling study is to increase Reynolds number (e.g., by growing
>> the domain) while keeping mesh size 

Re: [petsc-users] GAMG and Hypre preconditioner

2023-06-27 Thread Jed Brown
Zisheng Ye via petsc-users  writes:

> Dear PETSc Team
>
> We are testing the GPU support in PETSc's KSPSolve, especially for the GAMG 
> and Hypre preconditioners. We have encountered several issues that we would 
> like to ask for your suggestions.
>
> First, we have couple of questions when working with a single MPI rank:
>
>   1.  We have tested two backends, CUDA and Kokkos. One commonly encountered 
> error is related to SpGEMM in CUDA when the mat is large as listed below:
>
> cudaMalloc((void **), bufferSize2) error( cudaErrorMemoryAllocation): 
> out of memory
>
> For CUDA backend, one can use "-matmatmult_backend_cpu -matptap_backend_cpu" 
> to avoid these problems. However, there seems no equivalent options in Kokkos 
> backend. Is there any good practice to avoid this error for both backends and 
> if we can avoid this error in Kokkos backend?

Junchao will know more about KK tuning, but the faster GPU matrix-matrix 
algorithms use extra memory. We should be able to make the host option 
available with kokkos.

>   2.  We have tested the combination of Hypre and Kokkos as backend. It looks 
> like this combination is not compatible with each other, as we observed that 
> KSPSolve takes a greater number of iterations to exit, and the residual norm 
> in the post-checking is much larger than the one obtained when working with 
> CUDA backend. This happens for matrices with block size larger than 1. Is 
> there any explanation to the error?
>
> Second, we have couple more questions when working with multiple MPI ranks:
>
>   1.  We are currently using OpenMPI as we couldnt get Intel MPI to work as a 
> GPU-aware MPI, is this a known issue with Intel MPI?

As far as I know, Intel's MPI is only for SYCL/Intel GPUs. In general, 
GPU-aware MPI has been incredibly flaky on all HPC systems despite being 
introduced ten years ago.

>   2.  With OpenMPI we currently see a slow down when increasing the MPI count 
> as shown in the figure below, is this normal?

Could you share -log_view output from a couple representative runs? You could 
send those here or to petsc-ma...@mcs.anl.gov. We need to see what kind of work 
is not scaling to attribute what may be causing it.
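
For instance, a sketch of capturing that output to files for two rank counts
(the executable name and solver options are illustrative):

  mpirun -n 2  ./my_app -ksp_type cg -pc_type gamg -log_view :log_2ranks.txt
  mpirun -n 16 ./my_app -ksp_type cg -pc_type gamg -log_view :log_16ranks.txt

Comparing the time and MPI message columns of the two summaries usually makes
it clear which stage stops scaling.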


[petsc-users] GAMG and Hypre preconditioner

2023-06-27 Thread Zisheng Ye via petsc-users
Dear PETSc Team

We are testing the GPU support in PETSc's KSPSolve, especially for the GAMG and
Hypre preconditioners. We have encountered several issues on which we would like
to ask for your suggestions.

First, we have a couple of questions when working with a single MPI rank:

  1.  We have tested two backends, CUDA and Kokkos. One commonly encountered 
error is related to SpGEMM in CUDA when the mat is large as listed below:

cudaMalloc((void **), bufferSize2) error( cudaErrorMemoryAllocation): 
out of memory

For the CUDA backend, one can use "-matmatmult_backend_cpu -matptap_backend_cpu" to
avoid these problems. However, there seem to be no equivalent options for the Kokkos
backend. Is there any good practice to avoid this error for both backends, and can
we avoid this error with the Kokkos backend?

  2.  We have tested the combination of Hypre with the Kokkos backend. It looks
like the two are not compatible with each other, as we observed that
KSPSolve takes a greater number of iterations to exit, and the residual norm in
the post-check is much larger than the one obtained when working with the CUDA
backend. This happens for matrices with block size larger than 1. Is there any
explanation for this behavior?

Second, we have a couple more questions when working with multiple MPI ranks:

  1.  We are currently using OpenMPI, as we couldn't get Intel MPI to work as a
GPU-aware MPI. Is this a known issue with Intel MPI?
  2.  With OpenMPI we currently see a slowdown when increasing the MPI count,
as shown in the figure below. Is this normal?

[attached figure: solve time slowdown with increasing MPI rank count]

Zisheng


Re: [petsc-users] [EXTERNAL] Re: Initializing kokkos before petsc causes a problem

2023-06-27 Thread Fackler, Philip via petsc-users
OK, great! I'll try it out soon.

Thank you,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

From: Junchao Zhang 
Sent: Tuesday, June 27, 2023 10:58
To: Fackler, Philip 
Cc: petsc-users@mcs.anl.gov ; Blondel, Sophie 
; xolotl-psi-developm...@lists.sourceforge.net 

Subject: Re: [EXTERNAL] Re: [petsc-users] Initializing kokkos before petsc 
causes a problem

Hi, Philip,
   It's my fault; I should have followed up earlier that this problem was fixed by
https://gitlab.com/petsc/petsc/-/merge_requests/6586.
   Could you try petsc/main?

   Thanks.
--Junchao Zhang


On Tue, Jun 27, 2023 at 9:30 AM Fackler, Philip 
mailto:fackle...@ornl.gov>> wrote:
Good morning Junchao! I'm following up here to see if there is any update to 
petsc to resolve this issue, or if we need to come up with a work-around.

Thank you,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

From: Junchao Zhang mailto:junchao.zh...@gmail.com>>
Sent: Wednesday, June 7, 2023 22:45
To: Fackler, Philip mailto:fackle...@ornl.gov>>
Cc: petsc-users@mcs.anl.gov 
mailto:petsc-users@mcs.anl.gov>>; Blondel, Sophie 
mailto:sblon...@utk.edu>>; 
xolotl-psi-developm...@lists.sourceforge.net
 
mailto:xolotl-psi-developm...@lists.sourceforge.net>>
Subject: [EXTERNAL] Re: [petsc-users] Initializing kokkos before petsc causes a 
problem

Hi, Philip,
  Thanks for reporting. I will have a look at the issue.
--Junchao Zhang


On Wed, Jun 7, 2023 at 9:30 AM Fackler, Philip via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
I'm encountering a problem in xolotl. We initialize kokkos before initializing 
petsc. Therefore...

The pointer referenced here:
https://gitlab.com/petsc/petsc/-/blob/main/src/vec/is/sf/impls/basic/kokkos/sfkok.kokkos.cxx#L363


from here:
https://gitlab.com/petsc/petsc/-/blob/main/include/petsc_kokkos.hpp

remains null because the code to initialize it is skipped here:
https://gitlab.com/petsc/petsc/-/blob/main/src/sys/objects/kokkos/kinit.kokkos.cxx#L28
See line 71.

Can this be modified to allow for kokkos to have been initialized by the 
application before initializing petsc?

Thank you for your help,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory


Re: [petsc-users] SOLVE + PC combination for 7 point stencil (unstructured) poisson solution

2023-06-27 Thread Vanella, Marcos (Fed) via petsc-users
Hi Mark and Matt, I tried swapping the preconditioner to cholmod and also the 
hypre Boomer AMG. They work just fine for my case. I also got my hands on a 
machine with NVIDIA gpus in one of our AI clusters. I compiled PETSc to make 
use of cuda and cuda-enabled openmpi (with gcc).
I'm running the previous tests and want to also check some of the cuda enabled 
solvers. I was able to submit a case for the default Krylov solver with these 
runtime flags: -vec_type seqcuda -mat_type seqaijcusparse -pc_type cholesky 
-pc_factor_mat_solver_type cusparse. The case ran to completion.

I guess my question now is how do I monitor (if there is a way) that the GPU is 
being used in the calculation, and any other stats? Also, which other solver 
combination using GPU would you recommend for me to try? Can we compile PETSc 
with the cuda enabled version for CHOLMOD and HYPRE?

Thank you for your help!
Marcos


From: Matthew Knepley 
Sent: Monday, June 26, 2023 12:11 PM
To: Vanella, Marcos (Fed) 
Cc: Mark Adams ; petsc-users@mcs.anl.gov 

Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil 
(unstructured) poisson solution

On Mon, Jun 26, 2023 at 12:08 PM Vanella, Marcos (Fed) via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Thank you Matt and Mark, I'll try your suggestions. To configure with hypre can
I just use the --download-hypre configure line?

Yes,

  Thanks,

Matt

That is what I did with suitesparse, very nice.

From: Mark Adams mailto:mfad...@lbl.gov>>
Sent: Monday, June 26, 2023 12:05 PM
To: Vanella, Marcos (Fed) 
mailto:marcos.vane...@nist.gov>>
Cc: petsc-users@mcs.anl.gov 
mailto:petsc-users@mcs.anl.gov>>
Subject: Re: [petsc-users] SOLVE + PC combination for 7 point stencil 
(unstructured) poisson solution

I'm not sure what MG is doing with an "unstructured" problem. I assume you are 
not using DMDA.
-pc_type gamg should work
I would configure with hypre and try that also: -pc_type hypre

As Matt said MG should be faster. How many iterations was it taking?
Try a 100^3 and check that the iteration count does not change much, if at all.

Mark


On Mon, Jun 26, 2023 at 11:35 AM Vanella, Marcos (Fed) via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
Hi, I was wondering if anyone has experience on what combinations are more 
efficient to solve a Poisson problem derived from a 7 point stencil on a single 
mesh (serial).
I've been doing some tests of multigrid and cholesky on a 50^3 mesh. -pc_type 
mg takes about 75% more time than -pc_type cholesky -pc_factor_mat_solver_type 
cholmod for the case I'm testing.
I'm new to PETSc so any suggestions are most welcome and appreciated,
Marcos


--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/


Re: [petsc-users] [EXTERNAL] Re: Initializing kokkos before petsc causes a problem

2023-06-27 Thread Junchao Zhang
Hi, Philip,
   It's my fault; I should have followed up earlier that this problem was fixed by
https://gitlab.com/petsc/petsc/-/merge_requests/6586.
   Could you try petsc/main?

   Thanks.
--Junchao Zhang
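
For anyone following along, a minimal sketch of the initialization order the fix
is meant to support (the application owns Kokkos, and PETSc must detect that it
is already initialized):

  #include <Kokkos_Core.hpp>
  #include <petscsys.h>

  int main(int argc, char **argv)
  {
    Kokkos::initialize(argc, argv);                 /* application initializes Kokkos first */
    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    /* ... PETSc objects using the Kokkos back end ... */
    PetscCall(PetscFinalize());
    Kokkos::finalize();                             /* application also finalizes Kokkos */
    return 0;
  }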


On Tue, Jun 27, 2023 at 9:30 AM Fackler, Philip  wrote:

> Good morning Junchao! I'm following up here to see if there is any update
> to petsc to resolve this issue, or if we need to come up with a work-around.
>
> Thank you,
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
> --
> *From:* Junchao Zhang 
> *Sent:* Wednesday, June 7, 2023 22:45
> *To:* Fackler, Philip 
> *Cc:* petsc-users@mcs.anl.gov ; Blondel, Sophie <
> sblon...@utk.edu>; xolotl-psi-developm...@lists.sourceforge.net <
> xolotl-psi-developm...@lists.sourceforge.net>
> *Subject:* [EXTERNAL] Re: [petsc-users] Initializing kokkos before petsc
> causes a problem
>
> Hi, Philip,
>   Thanks for reporting. I will have a look at the issue.
> --Junchao Zhang
>
>
> On Wed, Jun 7, 2023 at 9:30 AM Fackler, Philip via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
> I'm encountering a problem in xolotl. We initialize kokkos before
> initializing petsc. Therefore...
>
> The pointer referenced here:
>
> https://gitlab.com/petsc/petsc/-/blob/main/src/vec/is/sf/impls/basic/kokkos/sfkok.kokkos.cxx#L363
> 
>
> 
>
> from here:
> https://gitlab.com/petsc/petsc/-/blob/main/include/petsc_kokkos.hpp
> 
>
> remains null because the code to initialize it is skipped here:
>
> https://gitlab.com/petsc/petsc/-/blob/main/src/sys/objects/kokkos/kinit.kokkos.cxx#L28
> 
> See line 71.
>
> Can this be modified to allow for kokkos to have been initialized by the
> application before initializing petsc?
>
> Thank you for your help,
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
>
>


Re: [petsc-users] [EXTERNAL] Re: Initializing kokkos before petsc causes a problem

2023-06-27 Thread Fackler, Philip via petsc-users
Good morning Junchao! I'm following up here to see if there is any update to 
petsc to resolve this issue, or if we need to come up with a work-around.

Thank you,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

From: Junchao Zhang 
Sent: Wednesday, June 7, 2023 22:45
To: Fackler, Philip 
Cc: petsc-users@mcs.anl.gov ; Blondel, Sophie 
; xolotl-psi-developm...@lists.sourceforge.net 

Subject: [EXTERNAL] Re: [petsc-users] Initializing kokkos before petsc causes a 
problem

Hi, Philip,
  Thanks for reporting. I will have a look at the issue.
--Junchao Zhang


On Wed, Jun 7, 2023 at 9:30 AM Fackler, Philip via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:
I'm encountering a problem in xolotl. We initialize kokkos before initializing 
petsc. Therefore...

The pointer referenced here:
https://gitlab.com/petsc/petsc/-/blob/main/src/vec/is/sf/impls/basic/kokkos/sfkok.kokkos.cxx#L363


from here:
https://gitlab.com/petsc/petsc/-/blob/main/include/petsc_kokkos.hpp

remains null because the code to initialize it is skipped here:
https://gitlab.com/petsc/petsc/-/blob/main/src/sys/objects/kokkos/kinit.kokkos.cxx#L28
See line 71.

Can this be modified to allow for kokkos to have been initialized by the 
application before initializing petsc?

Thank you for your help,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory