[petsc-users] Question about DMPlex for parallel computing

2021-10-17 Thread Gong Yujie
Hi,

I'm learning to use DMPlex to write a parallel program. I wrote a
sequential code successfully earlier, but writing a parallel code is
different in many ways, and I have a few questions.


  1.  Do functions such as DMPlexCreateGmshFromFile() and the other
read-from-file functions read the mesh in parallel, or does only the root
rank read it?
  2.  Are there examples of distributing the mesh and creating the
corresponding local-to-global node (or position) mapping? (A sketch of
what I have in mind follows below.)
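
For context, this is the sequence I am trying, as a minimal sketch (the
file name, the zero overlap, and the P1 field are just placeholders so the
program is complete; it assumes a simplicial Gmsh mesh):

#include <petscdmplex.h>
#include <petscfe.h>

int main(int argc, char **argv)
{
  DM             dm, dmDist = NULL;
  PetscFE        fe;
  PetscInt       dim;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  /* Question 1: does every rank read the file here, or only rank 0? */
  ierr = DMPlexCreateGmshFromFile(PETSC_COMM_WORLD, "mesh.msh", PETSC_TRUE, &dm);CHKERRQ(ierr);
  /* Distribute with overlap 0; dmDist is NULL if nothing was distributed */
  ierr = DMPlexDistribute(dm, 0, NULL, &dmDist);CHKERRQ(ierr);
  if (dmDist) {ierr = DMDestroy(&dm);CHKERRQ(ierr); dm = dmDist;}
  /* Attach one P1 field so the DM has dofs */
  ierr = DMGetDimension(dm, &dim);CHKERRQ(ierr);
  ierr = PetscFECreateLagrange(PETSC_COMM_WORLD, dim, 1, PETSC_TRUE, 1, PETSC_DETERMINE, &fe);CHKERRQ(ierr);
  ierr = DMSetField(dm, 0, NULL, (PetscObject)fe);CHKERRQ(ierr);
  ierr = DMCreateDS(dm);CHKERRQ(ierr);
  ierr = PetscFEDestroy(&fe);CHKERRQ(ierr);
  /* Question 2: is this the right way to get the local-to-global mapping? */
  {
    ISLocalToGlobalMapping ltog;
    ierr = DMGetLocalToGlobalMapping(dm, &ltog);CHKERRQ(ierr);
  }
  ierr = DMDestroy(&dm);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}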

I'm grateful for your kind help!

Best Regards,
Gong


Re: [petsc-users] MatVec on GPUs

2021-10-17 Thread Swarnava Ghosh
Thanks Matt and Junchao.

Sincerely,
Swarnava

On Sun, Oct 17, 2021 at 7:50 PM Matthew Knepley wrote:

> On Sun, Oct 17, 2021 at 7:12 PM Swarnava Ghosh wrote:
>
>> Do I need to convert the MATSEQBAIJ to a CUDA matrix in code?
>>
>
> You would need a call to MatSetFromOptions() to take that type from the
> command line, and not have the type hard-coded in your application. It is
> generally a bad idea to hard-code the implementation type.
>
>
>> If I do it from the command line, are the other MatVec calls then ported
>> onto CUDA? I have many MatVec calls in my code, but I specifically want to
>> port just one call.
>>
>
> You can give that one matrix an options prefix to isolate it.
>
>   Thanks,
>
>  Matt
>
>
>> Sincerely,
>> Swarnava
>>
>> On Sun, Oct 17, 2021 at 7:07 PM Junchao Zhang wrote:
>>
>>> You can do that with command line options -mat_type aijcusparse
>>> -vec_type cuda
>>>
>>> On Sun, Oct 17, 2021, 5:32 PM Swarnava Ghosh wrote:
>>>
>>>> Dear Petsc team,
>>>>
>>>> I had a query regarding using CUDA to accelerate a matrix vector
>>>> product.
>>>> I have a sequential sparse matrix (MATSEQBAIJ type). I want to port a
>>>> MatVec call onto GPUs. Is there any code/example I can look at?
>>>>
>>>> Sincerely,
>>>> SG

>>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>


Re: [petsc-users] MatVec on GPUs

2021-10-17 Thread Matthew Knepley
On Sun, Oct 17, 2021 at 7:12 PM Swarnava Ghosh wrote:

> Do I need to convert the MATSEQBAIJ to a CUDA matrix in code?
>

You would need a call to MatSetFromOptions() to take that type from the
command line, and not have the type hard-coded in your application. It is
generally a bad idea to hard-code the implementation type.


> If I do it from the command line, are the other MatVec calls then ported
> onto CUDA? I have many MatVec calls in my code, but I specifically want to
> port just one call.
>

You can give that one matrix an options prefix to isolate it.
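
For example, a sketch (assuming A is the matrix in question; the "gpu_"
prefix is just an illustration):

  ierr = MatSetOptionsPrefix(A, "gpu_");CHKERRQ(ierr); /* set before MatSetFromOptions() */
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);           /* type now read from -gpu_mat_type */

Then -gpu_mat_type aijcusparse on the command line changes only this
matrix; its work vectors can get a matching prefix with
VecSetOptionsPrefix().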

  Thanks,

 Matt


> Sincerely,
> Swarnava
>
> On Sun, Oct 17, 2021 at 7:07 PM Junchao Zhang wrote:
>
>> You can do that with command line options -mat_type aijcusparse -vec_type
>> cuda
>>
>> On Sun, Oct 17, 2021, 5:32 PM Swarnava Ghosh wrote:
>>
>>> Dear Petsc team,
>>>
>>> I had a query regarding using CUDA to accelerate a matrix vector
>>> product.
>>> I have a sequential sparse matrix (MATSEQBAIJ type). I want to port a
>>> MatVec call onto GPUs. Is there any code/example I can look at?
>>>
>>> Sincerely,
>>> SG
>>>
>>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/


Re: [petsc-users] MatVec on GPUs

2021-10-17 Thread Swarnava Ghosh
Do I need to convert the MATSEQBAIJ to a CUDA matrix in code?
If I do it from the command line, are the other MatVec calls then ported
onto CUDA? I have many MatVec calls in my code, but I specifically want to
port just one call.

Sincerely,
Swarnava

On Sun, Oct 17, 2021 at 7:07 PM Junchao Zhang wrote:

> You can do that with command line options -mat_type aijcusparse -vec_type
> cuda
>
> On Sun, Oct 17, 2021, 5:32 PM Swarnava Ghosh wrote:
>
>> Dear Petsc team,
>>
>> I had a query regarding using CUDA to accelerate a matrix vector product.
>> I have a sequential sparse matrix (MATSEQBAIJ type). I want to port a
>> MatVec call onto GPUs. Is there any code/example I can look at?
>>
>> Sincerely,
>> SG
>>
>


Re: [petsc-users] MatVec on GPUs

2021-10-17 Thread Junchao Zhang
You can do that with command line options -mat_type aijcusparse -vec_type
cuda
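
A minimal sketch of what that looks like (sizes and values here are just
placeholders):

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, y;
  PetscInt       i, n = 10;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr); /* -mat_type aijcusparse applies here */
  ierr = MatSetUp(A);CHKERRQ(ierr);
  for (i = 0; i < n; i++) {ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);}
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatCreateVecs(A, &x, &y);CHKERRQ(ierr); /* vectors of a compatible type */
  ierr = VecSet(x, 1.0);CHKERRQ(ierr);
  ierr = MatMult(A, x, y);CHKERRQ(ierr); /* with aijcusparse this runs on the GPU */
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&y);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Run it with, e.g.,

  ./app -mat_type aijcusparse -vec_type cuda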

On Sun, Oct 17, 2021, 5:32 PM Swarnava Ghosh wrote:

> Dear Petsc team,
>
> I had a query regarding using CUDA to accelerate a matrix vector product.
> I have a sequential sparse matrix (MATSEQBAIJ type). I want to port a
> MatVec call onto GPUs. Is there any code/example I can look at?
>
> Sincerely,
> SG
>


[petsc-users] MatVec on GPUs

2021-10-17 Thread Swarnava Ghosh
Dear Petsc team,

I had a query regarding using CUDA to accelerate a matrix vector product.
I have a sequential sparse matrix (MATSEQBAIJ type). I want to port a
MatVec call onto GPUs. Is there any code/example I can look at?

Sincerely,
SG


Re: [petsc-users] gamg student questions

2021-10-17 Thread Matthew Knepley
On Sun, Oct 17, 2021 at 9:04 AM Mark Adams wrote:

> Hi Daniel, [this is a PETSc users list question so let me move it there]
>
> The behavior that you are seeing is a bit odd but not surprising.
>
> First, you should start with simple problems and get AMG working (you
> might want to try this exercise with hypre as well: --download-hypre and
> use -pc_type hypre, or BDDC, see below).
>

We have two examples that do this:

  1) SNES ex56: This shows good performance of GAMG on Q1 and Q2 elasticity

  2) SNES ex17: This sets up a lot of finite element elasticity problems
where you can experiment with GAMG, ML, Hypre, BDDC, and other
preconditioners

As a rule of thumb, if my solver is taking more than 100 iterations
(usually for 1e-8 tolerance), something is very wrong. Either the problem
is set up incorrectly, the solver is configured incorrectly, or I need to
switch solvers.
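
A typical starting set of options for this kind of experiment (a sketch;
the threshold and tolerances are problem-dependent) is

  -ksp_type cg -ksp_rtol 1e-8 -ksp_monitor_true_residual -ksp_converged_reason
  -pc_type gamg -pc_gamg_threshold 0.01
  -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi
  -log_view

which also prints the convergence history and profiling data needed to
diagnose the kind of stall described below.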

  Thanks,

 Matt


> There are, alas, a lot of tuning parameters in AMG/DD and I recommend a
> homotopy process: you can start with issues that deal with your
> discretization on a simple cube, linear elasticity, cube elements, modest
> Poisson ratio, etc., and first get "textbook multigrid efficiency" (TME),
> which for elasticity and a V(2,2) cycle in GAMG is about one digit of error
> reduction per iteration and perfectly monotonic until it hits floating
> point precision.
>
> I would set this problem up and I would hope it runs OK, but the
> problems that you want to do are probably pretty hard (high order FE,
> plasticity, incompressibility) so there will be more work to do.
>
> That said, PETSc has nice domain decomposition solvers that are more
> optimized and maintained for elasticity. Now that I think about it, you
> should probably look at these (
> https://petsc.org/release/docs/manualpages/PC/PCBDDC.html
> https://petsc.org/release/docs/manual/ksp/#balancing-domain-decomposition-by-constraints).
> I think they prefer, but do not require, that you do not assemble your
> element matrices, but let them do it. The docs will make that clear.
>
> BDDC is great but it is not magic, and it is no less complex, so I would
> still recommend the same process of getting TME and then moving to the
> problems that you want to solve.
>
> Good luck,
> Mark
>
>
>
> On Sat, Oct 16, 2021 at 10:50 PM Daniel N Pickard wrote:
>
>> Hi Dr Adams,
>>
>>
>> I am using the gamg in petsc to solve some elasticity problems for
>> modeling bones. I am new to profiling with petsc, but I am observing that
>> around a thousand iterations my norm has gone down 3 orders of magnitude,
>> but the solver slows down and progress sort of stalls. The norm also
>> doesn't decrease monotonically, but jumps around a bit. I also notice
>> that if I request only 1 multigrid level, the preconditioner is much
>> cheaper and not as powerful, so the code takes more iterations but runs
>> 2-3x faster. Is it expected that large models require lots of iterations
>> and that convergence slows down as we get more accurate? What exactly
>> should I be looking for when profiling to try to understand how to run
>> faster? I see that a lot of my ratios are 2.7, but I think that is because
>> my mesh partitioner is not doing a great job making equal domains. What
>> are the giveaways in the log_view that tell you that petsc could be
>> optimized more?
>>
>>
>> Also, when I look at the solution with just 4 orders of magnitude of
>> convergence, I can see that the solver has not made much progress in the
>> interior of the domain, but seems to have smoothed out the boundary where
>> forces were applied very well. Does this mean I should use a larger
>> threshold to get coarser grids that can fix the low-frequency error?
>>
>>
>> Thanks,
>>
>> Daniel Pickard
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/


Re: [petsc-users] gamg student questions

2021-10-17 Thread Mark Adams
Hi Daniel, [this is a PETSc users list question so let me move it there]

The behavior that you are seeing is a bit odd but not surprising.

First, you should start with simple problems and get AMG working (you might
want to try this exercise with hypre as well: --download-hypre and use
-pc_type hypre, or BDDC, see below).

There are, alas, a lot of tuning parameters in AMG/DD and I recommend a
homotopy process: you can start with issues that deal with your
discretization on a simple cube, linear elasticity, cube elements, modest
Poisson ratio, etc., and first get "textbook multigrid efficiency" (TME),
which for elasticity and a V(2,2) cycle in GAMG is about one digit of error
reduction per iteration and perfectly monotonic until it hits floating
point precision.
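
One concrete piece of that setup for elasticity: GAMG generally wants the
rigid body modes as the near null space of the operator. A sketch, assuming
A is your assembled matrix and coords is a Vec of nodal coordinates blocked
like the solution:

  MatNullSpace   nearnull;
  PetscErrorCode ierr;

  ierr = MatNullSpaceCreateRigidBody(coords, &nearnull);CHKERRQ(ierr);
  ierr = MatSetNearNullSpace(A, nearnull);CHKERRQ(ierr);
  ierr = MatNullSpaceDestroy(&nearnull);CHKERRQ(ierr);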

I would set this problem up and I would hope it runs OK, but the
problems that you want to do are probably pretty hard (high order FE,
plasticity, incompressibility) so there will be more work to do.

That said, PETSc has nice domain decomposition solvers that are more
optimized and maintained for elasticity. Now that I think about it, you
should probably look at these (
https://petsc.org/release/docs/manualpages/PC/PCBDDC.html
https://petsc.org/release/docs/manual/ksp/#balancing-domain-decomposition-by-constraints).
I think they prefer, but do not require, that you do not assemble your
element matrices, but let them do it. The docs will make that clear.

BDDC is great but it is not magic, and it is no less complex, so I would
still recommend the same process of getting TME and then moving to the
problems that you want to solve.

Good luck,
Mark



On Sat, Oct 16, 2021 at 10:50 PM Daniel N Pickard wrote:

> Hi Dr Adams,
>
>
> I am using the gamg in petsc to solve some elasticity problems for
> modeling bones. I am new to profiling with petsc, but I am observing that
> around a thousand iterations my norm has gone down 3 orders of magnitude,
> but the solver slows down and progress sort of stalls. The norm also
> doesn't decrease monotonically, but jumps around a bit. I also notice
> that if I request only 1 multigrid level, the preconditioner is much
> cheaper and not as powerful, so the code takes more iterations but runs
> 2-3x faster. Is it expected that large models require lots of iterations
> and that convergence slows down as we get more accurate? What exactly
> should I be looking for when profiling to try to understand how to run
> faster? I see that a lot of my ratios are 2.7, but I think that is because
> my mesh partitioner is not doing a great job making equal domains. What
> are the giveaways in the log_view that tell you that petsc could be
> optimized more?
>
>
> Also, when I look at the solution with just 4 orders of magnitude of
> convergence, I can see that the solver has not made much progress in the
> interior of the domain, but seems to have smoothed out the boundary where
> forces were applied very well. Does this mean I should use a larger
> threshold to get coarser grids that can fix the low-frequency error?
>
>
> Thanks,
>
> Daniel Pickard
>