Re: [petsc-users] Some questions about TS/examples/tutorials/ex13

2018-12-22 Thread Sajid Ali via petsc-users
>   No, A(t) has five diagonals as indicated by the calls to MatSetValues()
with five entries per row; they correspond to the usual 5 pt stencil in 2d.
This might not be the right place to ask this, but I'm new to PDEs and I
expected the matrix used here to be something like the one derived in
slides 10-12 of this presentation:
https://www.math.ust.hk/~mawang/teaching/math532/mgtut.pdf

Thanks Matt for the advice on the book. I'll have a look at it.


Re: [petsc-users] Some questions about TS/examples/tutorials/ex13

2018-12-22 Thread Smith, Barry F. via petsc-users



> On Dec 22, 2018, at 12:29 PM, Matthew Knepley via petsc-users 
>  wrote:
> 
> On Sat, Dec 22, 2018 at 1:18 PM Sajid Ali via petsc-users 
>  wrote:
> Hi, 
> 
> I have a few questions about ex13 from TS explaining how to solve the
> following PDE in 2D:
> u_t = u_xx + u_yy
> 1) Can I get a reference/explanation for how the RHSJacobian function is
> derived?
> 
> It's finite differences, so maybe the Strikwerda book?
>  
> 2) If the problem is linear, why is TSSetProblemType never set to TS_LINEAR?
> 
> That is the default.
>  
> 3) Since this is just a 2D version of ex3, can't this be set up by only
> providing a Jacobian evaluation function to TSSetRHSJacobian, with the
> TSSetRHSFunction set to TSComputeRHSFunctionLinear?
> 
> It could. This way, we could do a nonlinear perturbation if we wanted.
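
   For concreteness, a minimal, self-contained sketch of that linear setup (a
   scalar toy problem du/dt = -u rather than the heat equation, written against
   a recent PETSc; for ex13 one would assemble the 5-point Laplacian into A
   instead):

  #include <petscts.h>

  int main(int argc, char **argv)
  {
    TS  ts;
    Mat A;
    Vec u;

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    /* A = [-1], so the exact solution of du/dt = A u is u(t) = exp(-t) u(0). */
    PetscCall(MatCreateSeqAIJ(PETSC_COMM_SELF, 1, 1, 1, NULL, &A));
    PetscCall(MatSetValue(A, 0, 0, -1.0, INSERT_VALUES));
    PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatCreateVecs(A, &u, NULL));
    PetscCall(VecSet(u, 1.0));

    PetscCall(TSCreate(PETSC_COMM_SELF, &ts));
    PetscCall(TSSetProblemType(ts, TS_LINEAR));
    /* The linear idiom: PETSc evaluates the RHS from the Jacobian matrix. */
    PetscCall(TSSetRHSFunction(ts, NULL, TSComputeRHSFunctionLinear, NULL));
    PetscCall(TSSetRHSJacobian(ts, A, A, TSComputeRHSJacobianConstant, NULL));
    PetscCall(TSSetMaxTime(ts, 1.0));
    PetscCall(TSSetExactFinalTime(ts, TS_EXACTFINALTIME_STEPOVER));
    PetscCall(TSSetFromOptions(ts));
    PetscCall(TSSolve(ts, u));
    PetscCall(VecView(u, PETSC_VIEWER_STDOUT_SELF));

    PetscCall(TSDestroy(&ts));
    PetscCall(VecDestroy(&u));
    PetscCall(MatDestroy(&A));
    PetscCall(PetscFinalize());
    return 0;
  }

   Because A does not depend on t here, TSComputeRHSJacobianConstant can stand
   in for a user Jacobian routine; a time-dependent A(t) would need a user
   routine passed to TSSetRHSJacobian instead.
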
>  
> If this is the case and a matrix could be written that equivalently states
> the problem as u_t = A(t)*u, then how would one build such a matrix A(t)? (In
> essence, what I'm asking is how PETSc lays out 2D vectors. Would A(t) be
> the familiar block tridiagonal matrix?)

   No, A(t) has five diagonals as indicated by the calls to MatSetValues() with 
five entries per row; they correspond to the usual 5 pt stencil in 2d.
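
   For illustration, a small self-contained program (not the exact code from
   ex13) that assembles such a matrix on a DMDA with MatSetValuesStencil; the
   grid size and the identity rows on the boundary are just placeholders:

  #include <petscdmda.h>

  int main(int argc, char **argv)
  {
    DM          da;
    Mat         A;
    PetscInt    i, j, xs, ys, xm, ym, Mx = 8, My = 8;
    PetscReal   hx, hy;
    MatStencil  row, col[5];
    PetscScalar v[5];

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    PetscCall(DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                           DMDA_STENCIL_STAR, Mx, My, PETSC_DECIDE, PETSC_DECIDE,
                           1, 1, NULL, NULL, &da));
    PetscCall(DMSetUp(da));
    PetscCall(DMCreateMatrix(da, &A));
    hx = 1.0 / (PetscReal)(Mx - 1);
    hy = 1.0 / (PetscReal)(My - 1);
    PetscCall(DMDAGetCorners(da, &xs, &ys, NULL, &xm, &ym, NULL));
    for (j = ys; j < ys + ym; j++) {
      for (i = xs; i < xs + xm; i++) {
        row.i = i; row.j = j;
        if (i == 0 || j == 0 || i == Mx - 1 || j == My - 1) {
          /* boundary row: identity (a Dirichlet-style placeholder) */
          v[0] = 1.0;
          PetscCall(MatSetValuesStencil(A, 1, &row, 1, &row, v, INSERT_VALUES));
        } else {
          /* interior row: the five entries per row of the 5-point stencil */
          col[0].i = i;     col[0].j = j - 1; v[0] = 1.0 / (hy * hy);
          col[1].i = i - 1; col[1].j = j;     v[1] = 1.0 / (hx * hx);
          col[2].i = i;     col[2].j = j;     v[2] = -2.0 / (hx * hx) - 2.0 / (hy * hy);
          col[3].i = i + 1; col[3].j = j;     v[3] = 1.0 / (hx * hx);
          col[4].i = i;     col[4].j = j + 1; v[4] = 1.0 / (hy * hy);
          PetscCall(MatSetValuesStencil(A, 1, &row, 5, col, v, INSERT_VALUES));
        }
      }
    }
    PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatDestroy(&A));
    PetscCall(DMDestroy(&da));
    PetscCall(PetscFinalize());
    return 0;
  }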

> 
> This uses a DMDA, so it has that numbering.
>  
> 4) In lines 153/154, why does the outer loop iterate over columns and the
> inner loop iterate over rows? Does this again have something to do with how
> PETSc stores 2D vectors?
> 
> PETSc does not prescribe how you store them. We choose to do it this way 
> because it matched BLAS.
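
   To make the loop-order point concrete, here is a small illustrative DMDA
   program (not the exact loop at lines 153/154 of ex13). With DMDAVecGetArray
   the local array is addressed u[j][i], and the i (x) index is the one that is
   contiguous in memory, so the inner loop runs over it for stride-1 access;
   whether one calls that index "rows" or "columns" is only a convention.

  #include <petscdmda.h>

  int main(int argc, char **argv)
  {
    DM            da;
    Vec           U;
    PetscScalar **u;
    PetscInt      i, j, xs, ys, xm, ym;

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    PetscCall(DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                           DMDA_STENCIL_STAR, 8, 8, PETSC_DECIDE, PETSC_DECIDE,
                           1, 1, NULL, NULL, &da));
    PetscCall(DMSetUp(da));
    PetscCall(DMCreateGlobalVector(da, &U));

    PetscCall(DMDAVecGetArray(da, U, &u));
    PetscCall(DMDAGetCorners(da, &xs, &ys, NULL, &xm, &ym, NULL));
    for (j = ys; j < ys + ym; j++) {   /* outer loop: y index                 */
      for (i = xs; i < xs + xm; i++) { /* inner loop: x index, contiguous     */
        u[j][i] = (PetscScalar)(i + j);
      }
    }
    PetscCall(DMDAVecRestoreArray(da, U, &u));

    PetscCall(VecDestroy(&U));
    PetscCall(DMDestroy(&da));
    PetscCall(PetscFinalize());
    return 0;
  }
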
> 
>   Thanks,
> 
> Matt
>  
> Thank You,
> Sajid Ali
> Applied Physics
> Northwestern University
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/



[petsc-users] Some questions about TS/examples/tutorials/ex13

2018-12-22 Thread Sajid Ali via petsc-users
Hi,

I have a few questions about ex13 from TS explaining how to solve the
following PDE in 2D:

u_t = u_xx + u_yy

1) Can I get a reference/explanation for how the RHSJacobian function is
derived?

2) If the problem is linear, why is TSSetProblemType never set to TS_LINEAR?

3) Since this is just a 2D version of ex3, can't this be set up by only
providing a Jacobian evaluation function to TSSetRHSJacobian, with the
TSSetRHSFunction set to TSComputeRHSFunctionLinear?

If this is the case and a matrix could be written that equivalently states
the problem as u_t = A(t)*u, then how would one build such a matrix A(t)?
(In essence, what I'm asking is how PETSc lays out 2D vectors. Would
A(t) be the familiar block tridiagonal matrix?)

4) In lines 153/154, why does the outer loop iterate over columns and the
inner loop iterate over rows? Does this again have something to do with how
PETSc stores 2D vectors?

Thank You,
Sajid Ali
Applied Physics
Northwestern University


Re: [petsc-users] GAMG scaling

2018-12-22 Thread Mark Adams via petsc-users
Wow, this is an old thread.

Sorry if I sound like an old fart talking about the good old days, but I
originally did RAP in Prometheus in a non-work-optimal way that might be of
interest. It is not hard to implement. I bring this up because we continue to
struggle with this damn thing. I think this approach is perfectly scalable,
pretty low overhead, and simple.

Note, I talked to the hypre people about this in like 97 when they were
implementing RAP and perhaps they are doing it this way ... the 4x slower
way.

Anyway, my data for this is in my SC 2004 paper (MakeNextMat_private in the
attachment; NB, this is code that I wrote in grad school). It is memory
efficient and simple: just four nested loops i,j,I,J accumulating C(I,J) +=
P(i,I)*A(i,j)*P(j,J). Eyeballing the numbers, and from new data that I am
getting from my bone modeling colleagues, who use this old code on Stampede
now, the times look reasonable compared to GAMG. This is optimized for
elasticity, where I unroll loops (so it is really six nested loops).

Thinking about this now, I think you want to make a local copy of P with the
rows (j) for every column in A that you have locally, then transpose this
local copy for the P(j,J) term. That is a sparse AXPY on j. (My code uses a
special tree data structure, but a matrix is simpler.)
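
For what it's worth, a tiny dense-matrix sketch of those four nested loops
(illustrative only; Prometheus and MatPtAP of course exploit sparsity, and the
sizes below are just placeholders):

  #include <stdio.h>

  /* Dense sketch of the Galerkin product C = P^T A P via the four nested
     loops i, j, I, J: C(I,J) += P(i,I) * A(i,j) * P(j,J).  n is the fine
     size, m the coarse size; matrices are stored row-major in flat arrays,
     and C must be zeroed on entry. */
  static void galerkin_rap(int n, int m, const double *A, const double *P, double *C)
  {
    for (int i = 0; i < n; i++)
      for (int j = 0; j < n; j++)
        for (int I = 0; I < m; I++)
          for (int J = 0; J < m; J++)
            C[I * m + J] += P[i * m + I] * A[i * n + j] * P[j * m + J];
  }

  int main(void)
  {
    /* Two fine points aggregated into one coarse point, just to run the loop:
       A is the 1D Laplacian [2 -1; -1 2] and P = [1; 1]. */
    const double A[4] = {2.0, -1.0, -1.0, 2.0};
    const double P[2] = {1.0, 1.0};
    double       C[1] = {0.0};

    galerkin_rap(2, 1, A, P, C);
    printf("coarse entry C(0,0) = %g\n", C[0]); /* prints 2 */
    return 0;
  }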


On Sat, Dec 22, 2018 at 3:39 AM Mark Adams  wrote:

> OK, so this thread has drifted, see title :)
>
> On Fri, Dec 21, 2018 at 10:01 PM Fande Kong  wrote:
>
>> Sorry, hit the wrong button.
>>
>>
>>
>> On Fri, Dec 21, 2018 at 7:56 PM Fande Kong  wrote:
>>
>>>
>>>
>>> On Fri, Dec 21, 2018 at 9:44 AM Mark Adams  wrote:
>>>
 Also, you mentioned that you are using 10 levels. This is very strange
 with GAMG. You can run with -info and grep on GAMG to see the sizes and the
 number of non-zeros per level. You should coarsen at a rate of about 2^D to
 3^D with GAMG (with 10 levels this would imply a very large fine grid
 problem so I suspect there is something strange going on with coarsening).
 Mark

>>>
>>> Hi Mark,
>>>
>>>
>> Thanks for your email. We did not try GAMG much for our problems since we
>> still have trouble figuring out how to use GAMG effectively. Instead, we are
>> building our own customized AMG that needs to use PtAP to construct coarse
>> matrices. The customized AMG works pretty well for our specific simulations.
>> The bottleneck right now is that PtAP might take too much memory, and the
>> code crashes within the function "PtAP". I definitely need a memory profiler
>> to confirm my statement here.
>>
>> Thanks,
>>
>> Fande Kong,
>>
>>
>>
>>>
>>>
>>>

 On Fri, Dec 21, 2018 at 11:36 AM Zhang, Hong via petsc-users <
 petsc-users@mcs.anl.gov> wrote:

> Fande:
> I will explore it and get back to you.
> Does anyone know how to profile memory usage?
> Hong
>
> Thanks, Hong,
>>
>> I just briefly went through the code. I was wondering if it is
>> possible to destroy "c->ptap" (which caches a lot of intermediate data) to
>> release the memory after the coarse matrix is assembled. I understand you
>> may still want to reuse these data structures by default, but for my
>> simulation the preconditioner is fixed and there is no reason to keep the
>> "c->ptap".
>>
>
>> It would be great if we could have this optional functionality.
>>
>> Fande Kong,
>>
>> On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong 
>> wrote:
>>
>>> We use the nonscalable implementation by default, and switch to the
>>> scalable one for matrices over finer grids. You may use the option
>>> '-matptap_via scalable' to force the scalable PtAP implementation for all
>>> PtAP. Let me know if it works.
>>> Hong
>>>
>>> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. 
>>> wrote:
>>>

   See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable
 automatically for "large" problems, which is determined by some 
 heuristic.

Barry


 > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <
 petsc-users@mcs.anl.gov> wrote:
 >
 >
 >
 > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong 
 wrote:
 > Fande:
 > Hong,
 > Thanks for your improvements to PtAP, which is critical for MG-type
 algorithms.
 >
 > On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
 > Mark,
 > Below is the copy of my email sent to you on Feb 27:
 >
 > I implemented scalable MatPtAP and did comparisons of three
 implementations using ex56.c on the ALCF Cetus machine (this machine has
 small memory, 1 GB/core):
 > - nonscalable PtAP: use an array of length PN to do dense axpy
 > - scalable PtAP:   do sparse axpy without use of PN array
 >
 > What PN means here?
 > Global number of columns of P.
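
To make the difference concrete, a small illustrative C sketch (not the actual
MatPtAP_MPIAIJ_MPIAIJ code) of the two accumulation styles: the nonscalable
variant scatters into a dense work array of length PN (the global number of
columns of P), while the scalable variant merges only nonzeros, so its memory
scales with the number of nonzeros in the row. The '-matptap_via scalable'
option mentioned above selects the latter.

  #include <stdio.h>
  #include <stdlib.h>

  /* Nonscalable-style accumulation: scatter a scaled sparse row into a dense
     work array of length PN, so memory grows with the global number of
     columns of P. */
  static void dense_axpy(double *work, double alpha,
                         const int *cols, const double *vals, int nz)
  {
    for (int k = 0; k < nz; k++) work[cols[k]] += alpha * vals[k];
  }

  /* Scalable-style accumulation: merge the scaled sparse row into a sparse
     accumulator (naive linear search here; real code uses sorted merges), so
     memory scales with the number of nonzeros, not with PN. */
  static int sparse_axpy(int *acc_cols, double *acc_vals, int acc_nz,
                         double alpha, const int *cols, const double *vals, int nz)
  {
    for (int k = 0; k < nz; k++) {
      int m, found = 0;
      for (m = 0; m < acc_nz; m++) {
        if (acc_cols[m] == cols[k]) { acc_vals[m] += alpha * vals[k]; found = 1; break; }
      }
      if (!found) { acc_cols[acc_nz] = cols[k]; acc_vals[acc_nz] = alpha * vals[k]; acc_nz++; }
    }
    return acc_nz;
  }

  int main(void)
  {
    const int    PN      = 1000000;          /* pretend P has 10^6 columns */
    const int    cols[3] = {7, 42, 999999};  /* one sparse row             */
    const double vals[3] = {1.0, 2.0, 3.0};
    double      *work    = calloc(PN, sizeof(double)); /* dense: O(PN) memory */
    int          acc_cols[16];
    double       acc_vals[16];
    int          acc_nz  = 0;                          /* sparse: O(nnz)      */

    if (!work) return 1;
    dense_axpy(work, 0.5, cols, vals, 3);
    acc_nz = sparse_axpy(acc_cols, acc_vals, acc_nz, 0.5, cols, vals, 3);
    printf("dense work[42] = %g, sparse accumulator nnz = %d\n", work[42], acc_nz);
    free(work);
    return 0;
  }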

Re: [petsc-users] GAMG scaling

2018-12-22 Thread Mark Adams via petsc-users
OK, so this thread has drifted, see title :)

On Fri, Dec 21, 2018 at 10:01 PM Fande Kong  wrote:

> Sorry, hit the wrong button.
>
>
>
> On Fri, Dec 21, 2018 at 7:56 PM Fande Kong  wrote:
>
>>
>>
>> On Fri, Dec 21, 2018 at 9:44 AM Mark Adams  wrote:
>>
>>> Also, you mentioned that you are using 10 levels. This is very strange
>>> with GAMG. You can run with -info and grep on GAMG to see the sizes and the
>>> number of non-zeros per level. You should coarsen at a rate of about 2^D to
>>> 3^D with GAMG (with 10 levels this would imply a very large fine grid
>>> problem so I suspect there is something strange going on with coarsening).
>>> Mark
>>>
>>
>> Hi Mark,
>>
>>
> Thanks for your email. We did not try GAMG much for our problems since we
> still have trouble figuring out how to use GAMG effectively. Instead, we are
> building our own customized AMG that needs to use PtAP to construct coarse
> matrices. The customized AMG works pretty well for our specific simulations.
> The bottleneck right now is that PtAP might take too much memory, and the
> code crashes within the function "PtAP". I definitely need a memory profiler
> to confirm my statement here.
>
> Thanks,
>
> Fande Kong,
>
>
>
>>
>>
>>
>>>
>>> On Fri, Dec 21, 2018 at 11:36 AM Zhang, Hong via petsc-users <
>>> petsc-users@mcs.anl.gov> wrote:
>>>
 Fande:
 I will explore it and get back to you.
 Does anyone know how to profile memory usage?
 Hong

 Thanks, Hong,
>
> I just briefly went through the code. I was wondering if it is
> possible to destroy "c->ptap" (which caches a lot of intermediate data) to
> release the memory after the coarse matrix is assembled. I understand you
> may still want to reuse these data structures by default, but for my
> simulation the preconditioner is fixed and there is no reason to keep the
> "c->ptap".
>

> It would be great if we could have this optional functionality.
>
> Fande Kong,
>
> On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong 
> wrote:
>
>> We use the nonscalable implementation by default, and switch to the scalable
>> one for matrices over finer grids. You may use the option '-matptap_via scalable'
>> to force the scalable PtAP implementation for all PtAP. Let me know if it
>> works.
>> Hong
>>
>> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. 
>> wrote:
>>
>>>
>>>   See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically
>>> for "large" problems, which is determined by some heuristic.
>>>
>>>Barry
>>>
>>>
>>> > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <
>>> petsc-users@mcs.anl.gov> wrote:
>>> >
>>> >
>>> >
>>> > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong 
>>> wrote:
>>> > Fande:
>>> > Hong,
>>> > Thanks for your improvements to PtAP, which is critical for MG-type
>>> algorithms.
>>> >
>>> > On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
>>> > Mark,
>>> > Below is the copy of my email sent to you on Feb 27:
>>> >
>>> > I implemented scalable MatPtAP and did comparisons of three
>>> implementations using ex56.c on the ALCF Cetus machine (this machine has
>>> small memory, 1 GB/core):
>>> > - nonscalable PtAP: use an array of length PN to do dense axpy
>>> > - scalable PtAP:   do sparse axpy without use of PN array
>>> >
>>> > What PN means here?
>>> > Global number of columns of P.
>>> >
>>> > - hypre PtAP.
>>> >
>>> > The results are attached. Summary:
>>> > - nonscalable PtAP is 2x faster than scalable, 8x faster than
>>> hypre PtAP
>>> > - scalable PtAP is 4x faster than hypre PtAP
>>> > - hypre uses less memory (see job.ne399.n63.np1000.sh)
>>> >
>>> > I was wondering how much more memory PETSc PtAP uses than hypre? I
>>> am implementing an AMG algorithm based on PETSc right now, and it is
>>> working well. But we found a bottleneck with PtAP. For the same P and
>>> A, PETSc PtAP fails to generate a coarse matrix because it runs out of
>>> memory, while hypre can still generate the coarse matrix.
>>> >
>>> > I do not want to just use the HYPRE one because we would have to
>>> duplicate matrices if we used HYPRE PtAP.
>>> >
>>> > It would be nice if you guys have already done some comparisons of the
>>> memory usage of these implementations.
>>> > Do you encounter memory issues with the scalable PtAP?
>>> >
>>> > By default, do we use the scalable PtAP? Do we have to specify
>>> some options to use the scalable version of PtAP? If so, it would be nice
>>> to use the scalable version by default. I am totally missing something
>>> here.
>>> >
>>> > Thanks,
>>> >
>>> > Fande
>>> >
>>> >
>>> > Karl had a student in the summer who improved MatPtAP(). Do you
>>> use the latest version of