Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-09 Thread Hengjie Wang
Hi Barry,

I checked. On the supercomputer, I had the option "-ksp_view_pre", but it is
not in the file I sent you. I am sorry for the confusion.

Regards,
Frank

On Friday, September 9, 2016, Barry Smith  wrote:

>
> > On Sep 9, 2016, at 3:11 PM, frank wrote:
> >
> > Hi Barry,
> >
> > I think the first KSP view output is from -ksp_view_pre. Before I
> submitted the test, I was not sure whether there would be an OOM error or not.
> So I added both -ksp_view_pre and -ksp_view.
>
>   But the options file you sent specifically does NOT list the
> -ksp_view_pre so how could it be from that?
>
>Sorry to be pedantic but I've spent too much time in the past trying to
> debug from incorrect information and want to make sure that the information
> I have is correct before thinking. Please recheck exactly what happened.
> Rerun with the exact input file you emailed if that is needed.
>
>Barry
>
> >
> > Frank
> >
> >
> > On 09/09/2016 12:38 PM, Barry Smith wrote:
> >>   Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt
> has only one KSPView in it? Did you run two different solves in the second
> case but not the first?
> >>
> >>   Barry
> >>
> >>
> >>
> >>> On Sep 9, 2016, at 10:56 AM, frank wrote:
> >>>
> >>> Hi,
> >>>
> >>> I want to continue digging into the memory problem here.
> >>> I did find a work around in the past, which is to use fewer cores per
> node so that each core has 8G of memory. However, this is inefficient and
> expensive. I hope to locate the place that uses the most memory.
> >>>
> >>> Here is a brief summary of the tests I did in past:
> >>> Test1:  Mesh 1536*128*384  |  Process Mesh 48*4*12
> >>>   Maximum (over computational time) process memory:         total 7.0727e+08
> >>>   Current process memory:                                   total 7.0727e+08
> >>>   Maximum (over computational time) space PetscMalloc()ed:  total 6.3908e+11
> >>>   Current space PetscMalloc()ed:                            total 1.8275e+09
> >>>
> >>> Test2:  Mesh 1536*128*384  |  Process Mesh 96*8*24
> >>>   Maximum (over computational time) process memory:         total 5.9431e+09
> >>>   Current process memory:                                   total 5.9431e+09
> >>>   Maximum (over computational time) space PetscMalloc()ed:  total 5.3202e+12
> >>>   Current space PetscMalloc()ed:                            total 5.4844e+09
> >>>
> >>> Test3:  Mesh 3072*256*768  |  Process Mesh 96*8*24
> >>>   OOM (Out Of Memory) killer of the supercomputer terminated the job during "KSPSolve".
> >>>
> >>> I attached the output of ksp_view( the third test's output is from
> ksp_view_pre ), memory_view and also the petsc options.
> >>>
> >>> In all the tests, each core can access about 2G of memory. In test3,
> there are 4223139840 non-zeros in the matrix. These will consume about
> 1.74 MB per core, using double precision. Considering some extra memory used
> to store the integer indices, 2G of memory should still be more than enough.
> >>>
> >>> Is there a way to find out which part of KSPSolve uses the most memory?
> >>> Thank you so much.
> >>>
> >>> BTW, there are 4 options that remain unused and I don't understand why
> they are omitted:
> >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly
> >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
> >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1
> >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson
> >>>
> >>>
> >>> Regards,
> >>> Frank
> >>>
> >>> On 07/13/2016 05:47 PM, Dave May wrote:
> 
>  On 14 July 2016 at 01:07, frank wrote:
>  Hi Dave,
> 
>  Sorry for the late reply.
>  Thank you so much for your detailed reply.
> 
>  I have a question about the estimation of the memory usage. There are
> 4223139840 allocated non-zeros and 18432 MPI processes. Double precision is
> used. So the memory per process is:
>    4223139840 * 8 bytes / 18432 / 1024 / 1024 = 1.74M ?
>  Did I do something wrong here? Because this seems too small.
> 
>  No - I totally f***ed it up. You are correct. That'll teach me for
> fumbling around with my iphone calculator and not using my brain. (Note
> that to convert to MB just divide by 1e6, not 1024^2 - although I
> apparently cannot convert between units correctly)
> 
>  From the PETSc objects associated with the solver, it looks like it
> _should_ run with 2GB per MPI rank. Sorry for my mistake. Possibilities
> are: somewhere in your usage of PETSc you've introduced a memory leak;
> PETSc is doing a huge over allocation (e.g. as per our discussion of
> MatPtAP); or in your application code there are other objects you have
> forgotten to log the memory for.
> 
> 
> 
>  I am running this job on Bluewater
>  I am using the 7-point FD stencil in 3D.
> 
>  I thought so on both counts.
> 
>  I apologize that I made a stupid mistake in computing the memory per
> core. My settings meant each core could access 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-09 Thread Barry Smith

> On Sep 9, 2016, at 3:11 PM, frank  wrote:
> 
> Hi Barry,
> 
> I think the first KSP view output is from -ksp_view_pre. Before I submitted 
> the test, I was not sure whether there would be an OOM error or not. So I added 
> both -ksp_view_pre and -ksp_view.

  But the options file you sent specifically does NOT list the -ksp_view_pre so 
how could it be from that?

   Sorry to be pedantic but I've spent too much time in the past trying to 
debug from incorrect information and want to make sure that the information I 
have is correct before thinking. Please recheck exactly what happened. Rerun 
with the exact input file you emailed if that is needed.

   Barry

> 
> Frank
> 
> 
> On 09/09/2016 12:38 PM, Barry Smith wrote:
>>   Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt has 
>> only one KSPView in it? Did you run two different solves in the second case 
>> but not the first?
>> 
>>   Barry
>> 
>> 
>> 
>>> On Sep 9, 2016, at 10:56 AM, frank  wrote:
>>> 
>>> Hi,
>>> 
>>> I want to continue digging into the memory problem here.
>>> I did find a work around in the past, which is to use fewer cores per node 
>>> so that each core has 8G of memory. However, this is inefficient and 
>>> expensive. I hope to locate the place that uses the most memory.
>>> 
>>> Here is a brief summary of the tests I did in past:
>>> Test1:  Mesh 1536*128*384  |  Process Mesh 48*4*12
>>>   Maximum (over computational time) process memory:         total 7.0727e+08
>>>   Current process memory:                                   total 7.0727e+08
>>>   Maximum (over computational time) space PetscMalloc()ed:  total 6.3908e+11
>>>   Current space PetscMalloc()ed:                            total 1.8275e+09
>>>
>>> Test2:  Mesh 1536*128*384  |  Process Mesh 96*8*24
>>>   Maximum (over computational time) process memory:         total 5.9431e+09
>>>   Current process memory:                                   total 5.9431e+09
>>>   Maximum (over computational time) space PetscMalloc()ed:  total 5.3202e+12
>>>   Current space PetscMalloc()ed:                            total 5.4844e+09
>>>
>>> Test3:  Mesh 3072*256*768  |  Process Mesh 96*8*24
>>>   OOM (Out Of Memory) killer of the supercomputer terminated the job during "KSPSolve".
>>> 
>>> I attached the output of ksp_view( the third test's output is from 
>>> ksp_view_pre ), memory_view and also the petsc options.
>>> 
>>> In all the tests, each core can access about 2G of memory. In test3, there are 
>>> 4223139840 non-zeros in the matrix. These will consume about 1.74 MB per core, 
>>> using double precision. Considering some extra memory used to store the integer 
>>> indices, 2G of memory should still be more than enough.
>>> 
>>> Is there a way to find out which part of KSPSolve uses the most memory?
>>> Thank you so much.
>>> 
>>> BTW, there are 4 options that remain unused and I don't understand why they 
>>> are omitted:
>>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly
>>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
>>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1
>>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson
>>> 
>>> 
>>> Regards,
>>> Frank
>>> 
>>> On 07/13/2016 05:47 PM, Dave May wrote:
 
 On 14 July 2016 at 01:07, frank  wrote:
 Hi Dave,
 
 Sorry for the late reply.
 Thank you so much for your detailed reply.
 
 I have a question about the estimation of the memory usage. There are 
 4223139840 allocated non-zeros and 18432 MPI processes. Double precision 
 is used. So the memory per process is:
   4223139840 * 8 bytes / 18432 / 1024 / 1024 = 1.74M ?
 Did I do something wrong here? Because this seems too small.
 
 No - I totally f***ed it up. You are correct. That'll teach me for 
 fumbling around with my iphone calculator and not using my brain. (Note 
 that to convert to MB just divide by 1e6, not 1024^2 - although I 
 apparently cannot convert between units correctly)
 
 From the PETSc objects associated with the solver, it looks like it 
 _should_ run with 2GB per MPI rank. Sorry for my mistake. Possibilities 
 are: somewhere in your usage of PETSc you've introduced a memory leak; 
 PETSc is doing a huge over allocation (e.g. as per our discussion of 
 MatPtAP); or in your application code there are other objects you have 
 forgotten to log the memory for.
 
 
 
 I am running this job on Bluewater
 I am using the 7-point FD stencil in 3D.
 
 I thought so on both counts.
  
 I apologize that I made a stupid mistake in computing the memory per core. 
 My settings meant each core could access only 2G of memory on average instead 
 of the 8G I mentioned in my previous email. I re-ran the job with 8G of memory 
 per core on average and there is no "Out Of Memory" 

Re: [petsc-users] Sorted CSR Matrix and Multigrid PC.

2016-09-09 Thread Manuel Valera
Thank you SO much for helping me out on this. Dumb error on my part not
to notice.

This means the common /mypcs/ elements are preconfigured internally in PETSc?

Regards and happy weekend,

On Fri, Sep 9, 2016 at 5:19 PM, Barry Smith  wrote:

>
>   The missing third argument to PCApply means that you haven't allocated
> the vector work?
>
>
> > On Sep 9, 2016, at 3:34 PM, Manuel Valera  wrote:
> >
> > Hello everyone,
> >
> > I'm having an error with my program that I cannot understand, the weird
> part is that the same implementation in my main model does not show errors
> and they are virtually identical.
> >
> > The problematic part of the code is:
> >
> >
> >   call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
> >   call KSPSetOperators(ksp,Ap,Ap,ierr)
> >   call KSPGetPC(ksp,pc,ierr)
> >   tol = 1.e-5
> >   call KSPSetTolerances(ksp,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr)
> >
> >   call PCGetOperators(pc,PETSC_NULL_OBJECT,pmat,ierr)
> >   call PCCreate(PETSC_COMM_WORLD,mg,ierr)
> >   call PCSetType(mg,PCJACOBI,ierr)
> >   call PCSetOperators(mg,pmat,pmat,ierr)
> >   call PCSetUp(mg,ierr)
> >
> >   call PCApply(mg,xp,work,ierr)
> >
> >
> > And the errors I get are:
> >
> > [0]PETSC ERROR: - Error Message
> --
> > [0]PETSC ERROR: Null argument, when expecting valid pointer
> > [0]PETSC ERROR: Null Object: Parameter # 3
> > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> > [0]PETSC ERROR: Petsc Release Version 3.7.3, Jul, 24, 2016
> > [0]PETSC ERROR: ./solvelinearmgPETSc on a arch-linux2-c-debug named
> valera-HP-xw4600-Workstation by valera Fri Sep  9 12:46:02 2016
> > [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++
> --with-fc=gfortran --download-fblaslapack=1 --download-mpich=1
> --download-ml=1
> > [0]PETSC ERROR: #1 PCApply() line 467 in /home/valera/sergcemv4/bitbucket/serucoamv4/petsc-3.7.3/src/ksp/pc/interface/precon.c
> > [0]PETSC ERROR: - Error Message
> --
> > [0]PETSC ERROR: Null argument, when expecting valid pointer
> > [0]PETSC ERROR: Null Object: Parameter # 3
> > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> > [0]PETSC ERROR: Petsc Release Version 3.7.3, Jul, 24, 2016
> > [0]PETSC ERROR: ./solvelinearmgPETSc
> >
> >
> > As I said before, the exact same code in a bigger model does not print
> errors. I'm trying to solve this before moving into multigrid
> implementation in my prototype,
> >
> > Thanks for your time,
> >
> >
> >
> > On Wed, Sep 7, 2016 at 8:27 PM, Barry Smith  wrote:
> >
> > > On Sep 7, 2016, at 10:24 PM, Manuel Valera wrote:
> > >
> > > Thank you I will try this. What would be the call if I wanted to use
> other Multigrid option?
> >
> >There really isn't any other choice.
> >
> > > Don't worry this is a standalone prototype, it should be fine on the
> main model. Anyway, any hints would be appreciated. Thanks a lot for your
> time.
> > >
> > >
> > > On Sep 7, 2016 8:22 PM, "Barry Smith"  wrote:
> > >
> > >   Sorry, this was due to our bug, the fortran function for
> PCGAMGSetType() was wrong. I have fixed this in the maint and master branch
> of PETSc in the git repository. But you can simply remove the call to
> PCGAMGSetType() from your code since what you are setting is the default
> type.
> > >
> > >   BTW: there are other problems with your code after that call that
> you will have to work through.
> > >
> > >   Barry
> > >
> > > > On Sep 7, 2016, at 8:46 PM, Manuel Valera wrote:
> > > >
> > > >
> > > > -- Forwarded message --
> > > > From: Manuel Valera 
> > > > Date: Wed, Sep 7, 2016 at 6:40 PM
> > > > Subject: Re: [petsc-users] Sorted CSR Matrix and Multigrid PC.
> > > > To:
> > > > Cc: PETSc users list 
> > > >
> > > >
> > > > Hello,
> > > >
> > > > I was able to sort the data but the PCGAMG does not seem  to be
> working.
> > > >
> > > > I reconfigured everything from scratch as suggested and updated to
> the latest PETSc version, same results,
> > > >
> > > > I get the following error:
> > > >
> > > > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
> Violation, probably memory access out of range
> > > > [...]
> > > > [0]PETSC ERROR: -  Stack Frames
> 
> > > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not
> available,
> > > > [0]PETSC ERROR:   INSTEAD the line number of the start of the
> function
> > > > [0]PETSC ERROR:   is given.
> > > > [0]PETSC ERROR: [0] PetscStrcmp line 524 

Re: [petsc-users] Sorted CSR Matrix and Multigrid PC.

2016-09-09 Thread Barry Smith
  
  The missing third argument to PCApply means that you haven't allocated the 
vector work?
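
A minimal sketch of the fix in C (the helper name is made up; in the Fortran code
below the equivalent one-liner is "call VecDuplicate(xp,work,ierr)" placed before
the PCApply call):

    #include <petscksp.h>

    /* Sketch: give PCApply() a valid third argument by duplicating the input
       vector.  Assumes the PC has already been set up and xp has the layout
       of the linear system. */
    static PetscErrorCode ApplyWithWorkVector(PC pc, Vec xp)
    {
      Vec            work;
      PetscErrorCode ierr;

      ierr = VecDuplicate(xp, &work);CHKERRQ(ierr);  /* same size/layout as xp    */
      ierr = PCApply(pc, xp, work);CHKERRQ(ierr);    /* parameter #3 is now valid */
      /* ... use work ... */
      ierr = VecDestroy(&work);CHKERRQ(ierr);
      return 0;
    }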


> On Sep 9, 2016, at 3:34 PM, Manuel Valera  wrote:
> 
> Hello everyone,
> 
> I'm having an error with my program that I cannot understand, the weird part 
> is that the same implementation in my main model does not show errors and 
> they are virtually identical.
> 
> The problematic part of the code is:
> 
> 
>   call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
>   call KSPSetOperators(ksp,Ap,Ap,ierr)   
>   call KSPGetPC(ksp,pc,ierr)
>   tol = 1.e-5
>   call KSPSetTolerances(ksp,tol,PETSC_DEFAULT_REAL,PETSC_DEFAULT_REAL,PETSC_DEFAULT_INTEGER,ierr)
> 
>   call PCGetOperators(pc,PETSC_NULL_OBJECT,pmat,ierr)
>   call PCCreate(PETSC_COMM_WORLD,mg,ierr)
>   call PCSetType(mg,PCJACOBI,ierr)
>   call PCSetOperators(mg,pmat,pmat,ierr)
>   call PCSetUp(mg,ierr)
> 
>   call PCApply(mg,xp,work,ierr)
> 
> 
> And the errors I get are:
> 
> [0]PETSC ERROR: - Error Message 
> --
> [0]PETSC ERROR: Null argument, when expecting valid pointer
> [0]PETSC ERROR: Null Object: Parameter # 3
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
> trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.7.3, Jul, 24, 2016 
> [0]PETSC ERROR: ./solvelinearmgPETSc on a arch-linux2-c-debug 
> named valera-HP-xw4600-Workstation by valera Fri Sep  9 12:46:02 2016
> [0]PETSC ERROR: Configure options --with-cc=gcc --with-cxx=g++ 
> --with-fc=gfortran --download-fblaslapack=1 --download-mpich=1 
> --download-ml=1
> [0]PETSC ERROR: #1 PCApply() line 467 in 
> /home/valera/sergcemv4/bitbucket/serucoamv4/petsc-3.7.3/src/ksp/pc/interface/precon.c
> [0]PETSC ERROR: - Error Message 
> --
> [0]PETSC ERROR: Null argument, when expecting valid pointer
> [0]PETSC ERROR: Null Object: Parameter # 3
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
> trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.7.3, Jul, 24, 2016 
> [0]PETSC ERROR: ./solvelinearmgPETSc  
>
> 
> 
> As I said before, the exact same code in a bigger model does not print 
> errors. I'm trying to solve this before moving into multigrid implementation 
> in my prototype, 
> 
> Thanks for your time,
> 
> 
> 
> On Wed, Sep 7, 2016 at 8:27 PM, Barry Smith  wrote:
> 
> > On Sep 7, 2016, at 10:24 PM, Manuel Valera  wrote:
> >
> > Thank you I will try this. What would be the call if I wanted to use other 
> > Multigrid option?
> 
>There really isn't any other choice.
> 
> > Don't worry this is a standalone prototype, it should be fine on the main 
> > model. Anyway, any hints would be appreciated. Thanks a lot for your time.
> >
> >
> > On Sep 7, 2016 8:22 PM, "Barry Smith"  wrote:
> >
> >   Sorry, this was due to our bug, the fortran function for PCGAMGSetType() 
> > was wrong. I have fixed this in the maint and master branch of PETSc in the 
> > git repository. But you can simply remove the call to PCGAMGSetType() from 
> > your code since what you are setting is the default type.
> >
> >   BTW: there are other problems with your code after that call that you 
> > will have to work through.
> >
> >   Barry
> >
> > > On Sep 7, 2016, at 8:46 PM, Manuel Valera  wrote:
> > >
> > >
> > > -- Forwarded message --
> > > From: Manuel Valera 
> > > Date: Wed, Sep 7, 2016 at 6:40 PM
> > > Subject: Re: [petsc-users] Sorted CSR Matrix and Multigrid PC.
> > > To:
> > > Cc: PETSc users list 
> > >
> > >
> > > Hello,
> > >
> > > I was able to sort the data but the PCGAMG does not seem  to be working.
> > >
> > > I reconfigured everything from scratch as suggested and updated to the 
> > > latest PETSc version, same results,
> > >
> > > I get the following error:
> > >
> > > [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, 
> > > probably memory access out of range
> > > [...]
> > > [0]PETSC ERROR: -  Stack Frames 
> > > 
> > > [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not 
> > > available,
> > > [0]PETSC ERROR:   INSTEAD the line number of the start of the function
> > > [0]PETSC ERROR:   is given.
> > > [0]PETSC ERROR: [0] PetscStrcmp line 524 
> > > /home/valera/sergcemv4/bitbucket/serucoamv4/petsc-3.7.3/src/sys/utils/str.c
> > > [0]PETSC ERROR: [0] 

Re: [petsc-users] DMPlex problem

2016-09-09 Thread Matthew Knepley
On Fri, Sep 9, 2016 at 9:49 AM, Mark Lohry  wrote:

> Regarding DMPlex, I'm unclear on the separation between Index Sets and
> Plex for unstructured solvers. I've got an existing unstructured serial
> solver using petsc's newton-krylov
>
 Index Sets are lists of integers. They can mean whatever you want, but
they are just lists.

Plex stores mesh topology and has topological query functions.
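
To make the first point concrete, a minimal sketch (the helper name and the index
values are made up for illustration):

    #include <petscis.h>

    /* Sketch: an IS really is just a list of integers; what the entries mean
       (cells, vertices, dofs, a partition, ...) is entirely up to the caller. */
    static PetscErrorCode MakeExampleIS(MPI_Comm comm, IS *is)
    {
      const PetscInt idx[3] = {0, 7, 42};   /* arbitrary example indices */
      PetscErrorCode ierr;

      ierr = ISCreateGeneral(comm, 3, idx, PETSC_COPY_VALUES, is);CHKERRQ(ierr);
      ierr = ISView(*is, PETSC_VIEWER_STDOUT_(comm));CHKERRQ(ierr);
      return 0;
    }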

> solvers. Should I be looking to parallelize this via IS or plex? Is there
> an interface for either that allows me to specify the partitioning (e.g. by
> metis' output)? Last but not least, is
>
There is DMPlexDistribute() which partitions and distributes, or you can
use DMPlexMigrate() if you want to calculate
your own partition for some reason.
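
A rough sketch of the DMPlexDistribute() path (assumptions: a PETSc 3.7-era build
with ParMETIS; the helper name is made up, and the partitioner can equally be
chosen at run time with -petscpartitioner_type parmetis):

    #include <petscdmplex.h>

    /* Sketch: distribute a serially-created DMPlex.  The DM's partitioner is
       told to use ParMETIS; DMPlexDistribute() then builds the parallel mesh
       (*dmDist comes back NULL when run on a single rank). */
    static PetscErrorCode DistributeMesh(DM dm, DM *dmDist)
    {
      PetscPartitioner part;
      PetscErrorCode   ierr;

      ierr = DMPlexGetPartitioner(dm, &part);CHKERRQ(ierr);
      ierr = PetscPartitionerSetType(part, PETSCPARTITIONERPARMETIS);CHKERRQ(ierr);
      ierr = DMPlexDistribute(dm, 0, NULL, dmDist);CHKERRQ(ierr); /* 0 = no cell overlap */
      return 0;
    }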

> there a working example of dmplex in the docs? The only unstructured code
> example I see in the docs is SNES ex10, which uses IS.
>
SNES ex12, ex62, ex77 and TS ex11.

  Thanks,

Matt

> Thanks,
>
> Mark
>
> On 09/09/2016 04:21 AM, Matthew Knepley wrote:
>
> On Fri, Sep 9, 2016 at 4:04 AM, Morten Nobel-Jørgensen wrote:
>
>> Dear PETSc developers and users,
>>
>> Last week we posted a question regarding an error with DMPlex and
>> multiple dofs and have not gotten any feedback yet. This is uncharted
>> waters for us, since we have gotten used to an extremely fast feedback from
>> the PETSc crew. So - with the chance of sounding impatient and ungrateful -
>> we would like to hear if anybody has any ideas that could point us in the
>> right direction?
>>
>
> This is my fault. You have not gotten a response because everyone else was
> waiting for me, and I have been
> slow because I just moved houses at the same time as term started here.
> Sorry about that.
>
> The example ran for me and I saw your problem. The local-to-global map is
> missing for some reason.
> I am tracking it down now. It should be made by DMCreateMatrix(), so this
> is mysterious. I hope to have
> this fixed by early next week.
>
>   Thanks,
>
> Matt
>
>
>> We have created a small example problem that demonstrates the error in
>> the matrix assembly.
>>
>> Thanks,
>> Morten
>>
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener


Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-09 Thread frank

Hi Barry,

I think the first KSP view output is from -ksp_view_pre. Before I 
submitted the test, I was not sure whether there would be an OOM error or 
not. So I added both -ksp_view_pre and -ksp_view.


Frank


On 09/09/2016 12:38 PM, Barry Smith wrote:

   Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt has only 
one KSPView in it? Did you run two different solves in the second case but not 
the first?

   Barry




On Sep 9, 2016, at 10:56 AM, frank  wrote:

Hi,

I want to continue digging into the memory problem here.
I did find a work around in the past, which is to use fewer cores per node so 
that each core has 8G of memory. However, this is inefficient and expensive. I 
hope to locate the place that uses the most memory.

Here is a brief summary of the tests I did in past:

Test1:  Mesh 1536*128*384  |  Process Mesh 48*4*12
  Maximum (over computational time) process memory:         total 7.0727e+08
  Current process memory:                                   total 7.0727e+08
  Maximum (over computational time) space PetscMalloc()ed:  total 6.3908e+11
  Current space PetscMalloc()ed:                            total 1.8275e+09

Test2:  Mesh 1536*128*384  |  Process Mesh 96*8*24
  Maximum (over computational time) process memory:         total 5.9431e+09
  Current process memory:                                   total 5.9431e+09
  Maximum (over computational time) space PetscMalloc()ed:  total 5.3202e+12
  Current space PetscMalloc()ed:                            total 5.4844e+09

Test3:  Mesh 3072*256*768  |  Process Mesh 96*8*24
  OOM (Out Of Memory) killer of the supercomputer terminated the job during "KSPSolve".

I attached the output of ksp_view( the third test's output is from ksp_view_pre 
), memory_view and also the petsc options.

In all the tests, each core can access about 2G of memory. In test3, there are 
4223139840 non-zeros in the matrix. These will consume about 1.74 MB per core, 
using double precision. Considering some extra memory used to store the integer 
indices, 2G of memory should still be more than enough.

Is there a way to find out which part of KSPSolve uses the most memory?
Thank you so much.

BTW, there are 4 options that remain unused and I don't understand why they are 
omitted:
-mg_coarse_telescope_mg_coarse_ksp_type value: preonly
-mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
-mg_coarse_telescope_mg_levels_ksp_max_it value: 1
-mg_coarse_telescope_mg_levels_ksp_type value: richardson


Regards,
Frank

On 07/13/2016 05:47 PM, Dave May wrote:


On 14 July 2016 at 01:07, frank  wrote:
Hi Dave,

Sorry for the late reply.
Thank you so much for your detailed reply.

I have a question about the estimation of the memory usage. There are 
4223139840 allocated non-zeros and 18432 MPI processes. Double precision is 
used. So the memory per process is:
   4223139840 * 8 bytes / 18432 / 1024 / 1024 = 1.74M ?
Did I do something wrong here? Because this seems too small.

No - I totally f***ed it up. You are correct. That'll teach me for fumbling 
around with my iphone calculator and not using my brain. (Note that to convert 
to MB just divide by 1e6, not 1024^2 - although I apparently cannot convert 
between units correctly)

 From the PETSc objects associated with the solver, it looks like it _should_ 
run with 2GB per MPI rank. Sorry for my mistake. Possibilities are: somewhere 
in your usage of PETSc you've introduced a memory leak; PETSc is doing a huge 
over allocation (e.g. as per our discussion of MatPtAP); or in your application 
code there are other objects you have forgotten to log the memory for.



I am running this job on Bluewater
I am using the 7-point FD stencil in 3D.

I thought so on both counts.
  


I apologize that I made a stupid mistake in computing the memory per core. My 
settings meant each core could access only 2G of memory on average instead of 
the 8G I mentioned in my previous email. I re-ran the job with 8G of memory per 
core on average and there is no "Out Of Memory" error. I will do more tests to 
see if there is still some memory issue.

Ok. I'd still like to know where the memory was being used since my estimates 
were off.


Thanks,
   Dave
  


Regards,
Frank



On 07/11/2016 01:18 PM, Dave May wrote:

Hi Frank,


On 11 July 2016 at 19:14, frank  wrote:
Hi Dave,

I re-run the test using bjacobi as the preconditioner on the coarse mesh of 
telescope. The Grid is 3072*256*768 and process mesh is 96*8*24. The petsc 
option file is attached.
I still got the "Out Of Memory" error. The error occurred before the linear 
solver finished one step. So I don't have the full info from ksp_view. The info from 
ksp_view_pre is attached.

Okay - that is essentially useless (sorry)
  


It seems to me that the error occurred when the decomposition was going to be 
changed.

Based on what information?
Running with -info would give us more clues, 

Re: [petsc-users] Diagnosing a difference between "unpreconditioned" and "true" residual norms

2016-09-09 Thread Barry Smith

   Patrick,

 I have only seen this when the "linear" operator turned out to not 
actually be linear or at least not linear in double precision. 
Are you using differencing or anything in your MatShell that might make it not 
be a linear operator in full precision?

Since your problem is so small you can compute the Jacobian explicitly via 
finite differencing and then use that matrix plus your shell preconditioner. I 
bet if you do this you will see the true and non-true residual norms remain 
the same; this would likely mean something is wonky with your shell matrix.

  Barry
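
A cheap complementary check along the same lines -- a sketch only, with a made-up
helper name, assuming the caller fills x and y with arbitrary data -- is to test
the shell operator for linearity directly:

    #include <petscksp.h>

    /* Sketch: if A is linear in double precision, ||A(x+y) - Ax - Ay|| should
       be at machine level relative to ||Ax||. */
    static PetscErrorCode CheckShellLinearity(Mat A, Vec x, Vec y)
    {
      Vec            xy, Ax, Ay, Axy;
      PetscReal      defect, ref;
      PetscErrorCode ierr;

      ierr = VecDuplicate(x, &xy);CHKERRQ(ierr);
      ierr = VecDuplicate(x, &Ax);CHKERRQ(ierr);
      ierr = VecDuplicate(x, &Ay);CHKERRQ(ierr);
      ierr = VecDuplicate(x, &Axy);CHKERRQ(ierr);
      ierr = VecWAXPY(xy, 1.0, x, y);CHKERRQ(ierr);   /* xy = x + y */
      ierr = MatMult(A, x, Ax);CHKERRQ(ierr);
      ierr = MatMult(A, y, Ay);CHKERRQ(ierr);
      ierr = MatMult(A, xy, Axy);CHKERRQ(ierr);
      ierr = VecAXPY(Axy, -1.0, Ax);CHKERRQ(ierr);    /* Axy -= Ax  */
      ierr = VecAXPY(Axy, -1.0, Ay);CHKERRQ(ierr);    /* Axy -= Ay  */
      ierr = VecNorm(Axy, NORM_2, &defect);CHKERRQ(ierr);
      ierr = VecNorm(Ax, NORM_2, &ref);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_WORLD, "linearity defect %g (relative %g)\n",
                         (double)defect, (double)(defect/ref));CHKERRQ(ierr);
      ierr = VecDestroy(&xy);CHKERRQ(ierr);
      ierr = VecDestroy(&Ax);CHKERRQ(ierr);
      ierr = VecDestroy(&Ay);CHKERRQ(ierr);
      ierr = VecDestroy(&Axy);CHKERRQ(ierr);
      return 0;
    }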

> On Sep 9, 2016, at 9:32 AM, Patrick Sanan  wrote:
> 
> I am debugging a linear solver which uses a custom operator and
> preconditioner, via MATSHELL and PCSHELL. Convergence seems to be
> fine, except that I unexpectedly see a difference between the
> "unpreconditioned" and "true" residual norms when I use
> -ksp_monitor_true_residual with a right-preconditioned Krylov method
> (FGMRES or right-preconditioned GMRES).
> 
>  0 KSP unpreconditioned resid norm 9.266794204683e+08 true resid norm
> 9.266794204683e+08 ||r(i)||/||b|| 1.e+00
>  1 KSP unpreconditioned resid norm 2.317801431974e+04 true resid norm
> 2.317826550333e+04 ||r(i)||/||b|| 2.501217248530e-05
>  2 KSP unpreconditioned resid norm 4.453270507534e+00 true resid norm
> 2.699824780158e+01 ||r(i)||/||b|| 2.913439880638e-08
>  3 KSP unpreconditioned resid norm 1.015490793887e-03 true resid norm
> 2.658635801018e+01 ||r(i)||/||b|| 2.868991953738e-08
>  4 KSP unpreconditioned resid norm 4.710220776105e-07 true resid norm
> 2.658631616810e+01 ||r(i)||/||b|| 2.868987438467e-08
> KSP Object:(mgk_) 1 MPI processes
>  type: fgmres
>GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
>GMRES: happy breakdown tolerance 1e-30
>  maximum iterations=1, initial guess is zero
>  tolerances:  relative=1e-13, absolute=1e-50, divergence=1.
>  right preconditioning
>  using UNPRECONDITIONED norm type for convergence test
> PC Object:(mgk_) 1 MPI processes
>  type: shell
>Shell: Custom PC
>  linear system matrix = precond matrix:
>  Mat Object:  Custom Operator   1 MPI processes
>type: shell
>rows=256, cols=256
>  has attached null space
> 
> I have dumped the explicit operator and preconditioned operator, and I
> can see that the operator and the preconditioned operator each have a
> 1-dimensional nullspace (a constant-pressure nullspace) which I have
> accounted for by constructing a normalized, constant-pressure vector
> and supplying it to the operator via a MatNullSpace.
> 
> If I disregard the (numerically) zero singular value, the operator has
> a condition number of 1.5669e+05 and the preconditioned operator has a
> condition number of 1.01 (strong preconditioner).
> 
> Has anyone seen this sort of behavior before and if so, is there a
> common culprit that I am overlooking? Any ideas of what to test next
> to try to isolate the issue?
> 
> As I understand it, the unpreconditioned and true residual norms
> should be identical in exact arithmetic, so I would suspect that
> somehow I've ended up with a "bad Hessenberg matrix" in some way as I
> perform this solve (or maybe I have a more subtle bug).



Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-09 Thread Barry Smith

  Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt has only 
one KSPView in it? Did you run two different solves in the second case but not 
the first?

  Barry



> On Sep 9, 2016, at 10:56 AM, frank  wrote:
> 
> Hi,
> 
> I want to continue digging into the memory problem here.  
> I did find a work around in the past, which is to use fewer cores per node so 
> that each core has 8G of memory. However, this is inefficient and expensive. I 
> hope to locate the place that uses the most memory.
> 
> Here is a brief summary of the tests I did in past:   
> Test1:  Mesh 1536*128*384  |  Process Mesh 48*4*12
>   Maximum (over computational time) process memory:         total 7.0727e+08
>   Current process memory:                                   total 7.0727e+08
>   Maximum (over computational time) space PetscMalloc()ed:  total 6.3908e+11
>   Current space PetscMalloc()ed:                            total 1.8275e+09
>
> Test2:  Mesh 1536*128*384  |  Process Mesh 96*8*24
>   Maximum (over computational time) process memory:         total 5.9431e+09
>   Current process memory:                                   total 5.9431e+09
>   Maximum (over computational time) space PetscMalloc()ed:  total 5.3202e+12
>   Current space PetscMalloc()ed:                            total 5.4844e+09
>
> Test3:  Mesh 3072*256*768  |  Process Mesh 96*8*24
>   OOM (Out Of Memory) killer of the supercomputer terminated the job during "KSPSolve".
> 
> I attached the output of ksp_view( the third test's output is from 
> ksp_view_pre ), memory_view and also the petsc options.
> 
> In all the tests, each core can access about 2G of memory. In test3, there are 
> 4223139840 non-zeros in the matrix. These will consume about 1.74 MB per core, 
> using double precision. Considering some extra memory used to store the integer 
> indices, 2G of memory should still be more than enough.
> 
> Is there a way to find out which part of KSPSolve uses the most memory? 
> Thank you so much.
> 
> BTW, there are 4 options that remain unused and I don't understand why they are 
> omitted:
> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly
> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1
> -mg_coarse_telescope_mg_levels_ksp_type value: richardson
> 
> 
> Regards,
> Frank
> 
> On 07/13/2016 05:47 PM, Dave May wrote:
>> 
>> 
>> On 14 July 2016 at 01:07, frank  wrote:
>> Hi Dave,
>> 
>> Sorry for the late reply.
>> Thank you so much for your detailed reply.
>> 
>> I have a question about the estimation of the memory usage. There are 
>> 4223139840 allocated non-zeros and 18432 MPI processes. Double precision is 
>> used. So the memory per process is:
>>   4223139840 * 8 bytes / 18432 / 1024 / 1024 = 1.74M ?
>> Did I do something wrong here? Because this seems too small.
>> 
>> No - I totally f***ed it up. You are correct. That'll teach me for fumbling 
>> around with my iphone calculator and not using my brain. (Note that to 
>> convert to MB just divide by 1e6, not 1024^2 - although I apparently cannot 
>> convert between units correctly)
>> 
>> From the PETSc objects associated with the solver, it looks like it _should_ 
>> run with 2GB per MPI rank. Sorry for my mistake. Possibilities are: 
>> somewhere in your usage of PETSc you've introduced a memory leak; PETSc is 
>> doing a huge over allocation (e.g. as per our discussion of MatPtAP); or in 
>> your application code there are other objects you have forgotten to log the 
>> memory for.
>> 
>> 
>> 
>> I am running this job on Bluewater 
>> I am using the 7-point FD stencil in 3D. 
>> 
>> I thought so on both counts.
>>  
>> 
>> I apologize that I made a stupid mistake in computing the memory per core. 
>> My settings meant each core could access only 2G of memory on average instead 
>> of the 8G I mentioned in my previous email. I re-ran the job with 8G of memory 
>> per core on average and there is no "Out Of Memory" error. I will do more tests 
>> to see if there is still some memory issue.
>> 
>> Ok. I'd still like to know where the memory was being used since my 
>> estimates were off.
>> 
>> 
>> Thanks,
>>   Dave
>>  
>> 
>> Regards,
>> Frank
>> 
>> 
>> 
>> On 07/11/2016 01:18 PM, Dave May wrote:
>>> Hi Frank,
>>> 
>>> 
>>> On 11 July 2016 at 19:14, frank  wrote:
>>> Hi Dave,
>>> 
>>> I re-run the test using bjacobi as the preconditioner on the coarse mesh of 
>>> telescope. The Grid is 3072*256*768 and process mesh is 96*8*24. The petsc 
>>> option file is attached.
>>> I still got the "Out Of Memory" error. The error occurred before the linear 
>>> solver finished one step. So I don't have the full info from ksp_view. The 
>>> info from ksp_view_pre is attached.
>>> 
>>> Okay - that is essentially useless (sorry)
>>>  
>>> 
>>> It seems to me that the error occurred when the 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-09 Thread frank

Hi,

I want to continue digging into the memory problem here.
I did find a work around in the past, which is to use fewer cores per 
node so that each core has 8G of memory. However, this is inefficient and 
expensive. I hope to locate the place that uses the most memory.


Here is a brief summary of the tests I did in past:
Test1:  Mesh 1536*128*384  |  Process Mesh 48*4*12
  Maximum (over computational time) process memory:         total 7.0727e+08
  Current process memory:                                   total 7.0727e+08
  Maximum (over computational time) space PetscMalloc()ed:  total 6.3908e+11
  Current space PetscMalloc()ed:                            total 1.8275e+09

Test2:  Mesh 1536*128*384  |  Process Mesh 96*8*24
  Maximum (over computational time) process memory:         total 5.9431e+09
  Current process memory:                                   total 5.9431e+09
  Maximum (over computational time) space PetscMalloc()ed:  total 5.3202e+12
  Current space PetscMalloc()ed:                            total 5.4844e+09

Test3:  Mesh 3072*256*768  |  Process Mesh 96*8*24
  OOM (Out Of Memory) killer of the supercomputer terminated the job during "KSPSolve".


I attached the output of ksp_view( the third test's output is from 
ksp_view_pre ), memory_view and also the petsc options.


In all the tests, each core can access about 2G of memory. In test3, there 
are 4223139840 non-zeros in the matrix. These will consume about 1.74 MB per 
core, using double precision. Considering some extra memory used to store the 
integer indices, 2G of memory should still be more than enough.


Is there a way to find out which part of KSPSolve uses the most memory?
Thank you so much.
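
One way to instrument this from the application side is to bracket the solve with
PETSc's memory queries; a minimal sketch, assuming a C driver (the helper name and
the reduction are illustrative, and PetscMemorySetGetMaximumUsage() can be called
after PetscInitialize() if the "maximum" variants are wanted as well):

    #include <petscksp.h>

    /* Sketch: report resident-set and PetscMalloc'd memory before and after
       KSPSolve, with the post-solve RSS reduced to a global maximum. */
    static PetscErrorCode SolveWithMemoryReport(KSP ksp, Vec b, Vec x)
    {
      PetscLogDouble rss0, rss1, mal0, mal1, rssmax;
      PetscErrorCode ierr;

      ierr = PetscMemoryGetCurrentUsage(&rss0);CHKERRQ(ierr);
      ierr = PetscMallocGetCurrentUsage(&mal0);CHKERRQ(ierr);
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
      ierr = PetscMemoryGetCurrentUsage(&rss1);CHKERRQ(ierr);
      ierr = PetscMallocGetCurrentUsage(&mal1);CHKERRQ(ierr);
      ierr = MPI_Allreduce(&rss1, &rssmax, 1, MPI_DOUBLE, MPI_MAX, PETSC_COMM_WORLD);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_WORLD,
                         "KSPSolve: rank-0 rss %g -> %g bytes (max over ranks %g), malloc'd %g -> %g bytes\n",
                         rss0, rss1, rssmax, mal0, mal1);CHKERRQ(ierr);
      return 0;
    }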

BTW, there are 4 options that remain unused and I don't understand why they 
are omitted:

-mg_coarse_telescope_mg_coarse_ksp_type value: preonly
-mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
-mg_coarse_telescope_mg_levels_ksp_max_it value: 1
-mg_coarse_telescope_mg_levels_ksp_type value: richardson


Regards,
Frank

On 07/13/2016 05:47 PM, Dave May wrote:



On 14 July 2016 at 01:07, frank wrote:


Hi Dave,

Sorry for the late reply.
Thank you so much for your detailed reply.

I have a question about the estimation of the memory usage. There
are 4223139840 allocated non-zeros and 18432 MPI processes. Double
precision is used. So the memory per process is:
  4223139840 * 8 bytes / 18432 / 1024 / 1024 = 1.74M ?
Did I do something wrong here? Because this seems too small.


No - I totally f***ed it up. You are correct. That'll teach me for 
fumbling around with my iphone calculator and not using my brain. 
(Note that to convert to MB just divide by 1e6, not 1024^2 - although 
I apparently cannot convert between units correctly)
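
For reference, the arithmetic under both conventions (reusing the nonzero and 
rank counts quoted above):

  4223139840 nonzeros * 8 bytes  = 3.3785e+10 bytes in total
  3.3785e+10 bytes / 18432 ranks = 1.833e+06 bytes per rank,
  which is about 1.8 MB per rank (dividing by 1e6) or about 1.7 MiB 
  (dividing by 1024^2, as above).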


From the PETSc objects associated with the solver, it looks like it 
_should_ run with 2GB per MPI rank. Sorry for my mistake. 
Possibilities are: somewhere in your usage of PETSc you've introduced 
a memory leak; PETSc is doing a huge over allocation (e.g. as per our 
discussion of MatPtAP); or in your application code there are other 
objects you have forgotten to log the memory for.




I am running this job on Bluewater


I am using the 7-point FD stencil in 3D.


I thought so on both counts.


I apologize that I made a stupid mistake in computing the memory
per core. My settings meant each core could access only 2G of memory
on average instead of the 8G I mentioned in my previous email. I
re-ran the job with 8G of memory per core on average and there is no
"Out Of Memory" error. I will do more tests to see if there is
still some memory issue.


Ok. I'd still like to know where the memory was being used since my 
estimates were off.



Thanks,
  Dave


Regards,
Frank



On 07/11/2016 01:18 PM, Dave May wrote:

Hi Frank,


On 11 July 2016 at 19:14, frank wrote:

Hi Dave,

I re-run the test using bjacobi as the preconditioner on the
coarse mesh of telescope. The Grid is 3072*256*768 and
process mesh is 96*8*24. The petsc option file is attached.
I still got the "Out Of Memory" error. The error occurred
before the linear solver finished one step. So I don't have
the full info from ksp_view. The info from ksp_view_pre is
attached.


Okay - that is essentially useless (sorry)


It seems to me that the error occurred when the decomposition
was going to be changed.


Based on what information?
Running with -info would give us more clues, but will create a
ton of output.
Please try running the case which failed with -info

I had another test with a grid of 1536*128*384 and the same
process mesh as above. There was no error. The ksp_view info
is attached for comparison.
Thank you.



[3] Here is my crude estimate of your memory usage.
I'll target the biggest memory hogs only to get 

Re: [petsc-users] DMPlex problem

2016-09-09 Thread Mark Lohry
Regarding DMPlex, I'm unclear on the separation between Index Sets and 
Plex for unstructured solvers. I've got an existing unstructured serial 
solver using petsc's newton-krylov solvers. Should I be looking to 
parallelize this via IS or plex? Is there an interface for either that 
allows me to specify the partitioning (e.g. by metis' output)? Last but 
not least, is there a working example of dmplex in the docs? The only 
unstructured code example I see in the docs is SNES ex10, which uses IS.


Thanks,

Mark


On 09/09/2016 04:21 AM, Matthew Knepley wrote:
On Fri, Sep 9, 2016 at 4:04 AM, Morten Nobel-Jørgensen wrote:


Dear PETSc developers and users,

Last week we posted a question regarding an error with DMPlex and
multiple dofs and have not gotten any feedback yet. This is
uncharted waters for us, since we have gotten used to an extremely
fast feedback from the PETSc crew. So - with the chance of
sounding impatient and ungrateful - we would like to hear if
anybody has any ideas that could point us in the right direction?


This is my fault. You have not gotten a response because everyone else 
was waiting for me, and I have been
slow because I just moved houses at the same time as term started 
here. Sorry about that.


The example ran for me and I saw your problem. The local-to-global map 
is missing for some reason.
I am tracking it down now. It should be made by DMCreateMatrix(), so 
this is mysterious. I hope to have

this fixed by early next week.

  Thanks,

Matt

We have created a small example problem that demonstrates the
error in the matrix assembly.

Thanks,
Morten





--
What most experimenters take for granted before they begin their 
experiments is infinitely more interesting than any results to which 
their experiments lead.

-- Norbert Wiener




[petsc-users] Diagnosing a difference between "unpreconditioned" and "true" residual norms

2016-09-09 Thread Patrick Sanan
I am debugging a linear solver which uses a custom operator and
preconditioner, via MATSHELL and PCSHELL. Convergence seems to be
fine, except that I unexpectedly see a difference between the
"unpreconditioned" and "true" residual norms when I use
-ksp_monitor_true_residual with a right-preconditioned Krylov method
(FGMRES or right-preconditioned GMRES).

  0 KSP unpreconditioned resid norm 9.266794204683e+08 true resid norm
9.266794204683e+08 ||r(i)||/||b|| 1.e+00
  1 KSP unpreconditioned resid norm 2.317801431974e+04 true resid norm
2.317826550333e+04 ||r(i)||/||b|| 2.501217248530e-05
  2 KSP unpreconditioned resid norm 4.453270507534e+00 true resid norm
2.699824780158e+01 ||r(i)||/||b|| 2.913439880638e-08
  3 KSP unpreconditioned resid norm 1.015490793887e-03 true resid norm
2.658635801018e+01 ||r(i)||/||b|| 2.868991953738e-08
  4 KSP unpreconditioned resid norm 4.710220776105e-07 true resid norm
2.658631616810e+01 ||r(i)||/||b|| 2.868987438467e-08
KSP Object:(mgk_) 1 MPI processes
  type: fgmres
GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1, initial guess is zero
  tolerances:  relative=1e-13, absolute=1e-50, divergence=1.
  right preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object:(mgk_) 1 MPI processes
  type: shell
Shell: Custom PC
  linear system matrix = precond matrix:
  Mat Object:  Custom Operator   1 MPI processes
type: shell
rows=256, cols=256
  has attached null space

I have dumped the explicit operator and preconditioned operator, and I
can see that the operator and the preconditioned operator each have a
1-dimensional nullspace (a constant-pressure nullspace) which I have
accounted for by constructing a normalized, constant-pressure vector
and supplying it to the operator via a MatNullSpace.
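
For reference, attaching such a null space looks roughly like this (a sketch only;
it assumes nullvec already holds the normalized constant-pressure mode, and the
helper name is made up):

    #include <petscmat.h>

    /* Sketch: attach a one-dimensional (constant-pressure) null space to the
       operator so the solver can project it out. */
    static PetscErrorCode AttachPressureNullSpace(Mat A, Vec nullvec)
    {
      MatNullSpace   nsp;
      PetscErrorCode ierr;

      ierr = MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_FALSE, 1, &nullvec, &nsp);CHKERRQ(ierr);
      ierr = MatSetNullSpace(A, nsp);CHKERRQ(ierr);
      ierr = MatNullSpaceDestroy(&nsp);CHKERRQ(ierr);
      return 0;
    }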

If I disregard the (numerically) zero singular value, the operator has
a condition number of 1.5669e+05 and the preconditioned operator has a
condition number of 1.01 (strong preconditioner).

Has anyone seen this sort of behavior before and if so, is there a
common culprit that I am overlooking? Any ideas of what to test next
to try to isolate the issue?

As I understand it, the unpreconditioned and true residual norms
should be identical in exact arithmetic, so I would suspect that
somehow I've ended up with a "bad Hessenberg matrix" in some way as I
perform this solve (or maybe I have a more subtle bug).


Re: [petsc-users] DMPlex problem

2016-09-09 Thread Matthew Knepley
On Fri, Sep 9, 2016 at 4:04 AM, Morten Nobel-Jørgensen  wrote:

> Dear PETSc developers and users,
>
> Last week we posted a question regarding an error with DMPlex and multiple
> dofs and have not gotten any feedback yet. This is uncharted waters for us,
> since we have gotten used to an extremely fast feedback from the PETSc
> crew. So - with the chance of sounding impatient and ungrateful - we would
> like to hear if anybody has any ideas that could point us in the right
> direction?
>

This is my fault. You have not gotten a response because everyone else was
waiting for me, and I have been
slow because I just moved houses at the same time as term started here.
Sorry about that.

The example ran for me and I saw your problem. The local-to-global map is
missing for some reason.
I am tracking it down now. It should be made by DMCreateMatrix(), so this
is mysterious. I hope to have
this fixed by early next week.

  Thanks,

Matt


> We have created a small example problem that demonstrates the error in the
> matrix assembly.
>
> Thanks,
> Morten
>
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener


[petsc-users] DMPlex problem

2016-09-09 Thread Morten Nobel-Jørgensen
Dear PETSc developers and users,

Last week we posted a question regarding an error with DMPlex and multiple dofs 
and have not gotten any feedback yet. This is uncharted waters for us, since we 
have gotten used to an extremely fast feedback from the PETSc crew. So - with 
the chance of sounding impatient and ungrateful - we would like to hear if 
anybody has any ideas that could point us in the right direction?

We have created a small example problem that demonstrates the error in the 
matrix assembly.

Thanks,
Morten




ex18k.cc
Description: ex18k.cc