Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-16 Thread Jed Brown
Barry Smith  writes:

>> On Sep 15, 2016, at 1:10 PM, Dave May  wrote:
>> 
>> 
>> 
>> On Thursday, 15 September 2016, Barry Smith  wrote:
>> 
>> Should we have some simple selection of default algorithms based on 
>> problem size/number of processes? For example, if using more than 1000 
>> processes, then use the scalable version, etc.? How would we decide on the 
>> parameter values?
>> 
>> I don't like the idea of having "smart" selection by default as it's 
>> terribly annoying for the user when they try to understand the performance 
>> characteristics of a given method in a strong/weak scaling test. 
>> If such a smart selection strategy were adopted, the details of it should be 
>> made abundantly clear to the user.
>> 
>> These algs depend on many factors, which makes a smart 
>> selection for all use cases hard / impossible.
>> 
>> I would be happy with unifying the three implementations under three different 
>> options AND having these implementation options documented in the man page. 
>> Maybe the man page should even advise users which to use in particular 
>> circumstances (I think there is something similar on the VecScatter page).
>> 
>> I have these as suggestions for unifying the options names using bools 
>> 
>> -matptap_explicit_transpose
>> -matptap_symbolic_transpose_dense
>> -matptap_symbolic_transpose
>> 
>> Or maybe enums are clearer
>> -matptap_impl {explicit_pt,symbolic_pt_dense,symbolic_pt}
>> 
>> which are equivalent to these options
>> 1) the current default
>> 2) -matrap 0
>> 3) -matrap 0 -matptap_scalable
>> 
>> Maybe there could be a fourth option
>> -matptap_dynamic_selection
>> which chooses the most appropriate alg given machine info, problem size, 
>> partition size, etc. At least if the user explicitly chooses the 
>> dynamic_selection mode, they wouldn't be surprised if there were any bumps 
>> appearing in any scaling study they conducted.
>
>I like the idea of enum types with the final enum type being "dynamically 
> select one for me".

I also like enums and "-matptap_impl auto" (which could be the default).
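
To make the proposal concrete, a minimal C sketch of how such an enum option
could be registered through PETSc's options machinery. The enum type, its
value names, and the wrapper function below are hypothetical (nothing like
this existed in PETSc at the time); PetscOptionsBegin/PetscOptionsEnum/
PetscOptionsEnd are real PETSc calls, though their exact signatures vary a
little between PETSc versions:

  #include <petscsys.h>

  /* hypothetical enum mirroring the three variants plus "auto" */
  typedef enum {MATPTAP_EXPLICIT_PT, MATPTAP_SYMBOLIC_PT_DENSE,
                MATPTAP_SYMBOLIC_PT, MATPTAP_AUTO} MatPtAPImplType;

  /* PETSc convention: value names, then enum type name, then prefix, then NULL */
  static const char *const MatPtAPImplTypes[] =
    {"explicit_pt", "symbolic_pt_dense", "symbolic_pt", "auto",
     "MatPtAPImplType", "MATPTAP_", NULL};

  static PetscErrorCode MatPtAPGetImplFromOptions(MatPtAPImplType *impl)
  {
    PetscErrorCode ierr;
    PetscBool      set;

    *impl = MATPTAP_AUTO; /* "dynamically select one for me" as the default */
    ierr = PetscOptionsBegin(PETSC_COMM_WORLD, NULL, "MatPtAP options", "Mat");CHKERRQ(ierr);
    ierr = PetscOptionsEnum("-matptap_impl", "PtAP implementation", "MatPtAP",
                            MatPtAPImplTypes, (PetscEnum)*impl,
                            (PetscEnum *)impl, &set);CHKERRQ(ierr);
    ierr = PetscOptionsEnd();CHKERRQ(ierr);
    /* the caller dispatches on *impl; MATPTAP_AUTO would pick a variant from
       machine info, problem size, and partition, as discussed above */
    return 0;
  }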




Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-16 Thread Hengjie Wang

Hi Dave,

I added both options and tested them by solving the Poisson eqn on a 1024^3 
grid with 32^3 cores. This test used to give the OOM error. Now it runs 
well.

I attached the ksp_view and log_view output in case you want to take a look.
I also tested my original code with those petsc options by simulating 
decaying turbulence on a 1024^3 grid. It also works. I am going to test 
the code on a larger scale. If there is any problem, I will let you 
know.

This really helps a lot. Thank you so much.

Regards,
Frank


On 9/15/2016 3:35 AM, Dave May wrote:

Hi all,

The only unexpected memory usage I can see is associated with the 
call to MatPtAP().

Here is something you can try immediately.
Run your code with the additional options
  -matrap 0 -matptap_scalable

I didn't realize this before, but the default behaviour of MatPtAP in 
parallel is actually to explicitly form the transpose of P (i.e. 
assemble R = P^T) and then compute R.A.P.

You don't want to do this. The option -matrap 0 resolves this issue.

The implementation of P^T.A.P has two variants.
The scalable implementation (with respect to memory usage) is selected 
via the second option -matptap_scalable.


Try it out - I see a significant memory reduction using these options 
for particular mesh sizes / partitions.
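
As a concrete sketch of the call these options steer (MatPtAP() is the real
PETSc routine that PCMG uses to form Galerkin coarse-level operators; the
wrapper function and the typical fill estimate of 2.0 are illustrative only):

  #include <petscmat.h>

  /* Form C = P^T A P from already-assembled A and P. With -matrap 0 PETSc
     avoids explicitly assembling R = P^T; with -matptap_scalable it also
     selects the memory-scalable variant of the symbolic P^T.A.P product. */
  PetscErrorCode FormGalerkinOperator(Mat A, Mat P, Mat *C)
  {
    PetscErrorCode ierr;
    ierr = MatPtAP(A, P, MAT_INITIAL_MATRIX, 2.0, C);CHKERRQ(ierr);
    return 0;
  }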


I've attached a cleaned up version of the code you sent me.
There were a number of memory leaks and other issues.
The main points being
  * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End}
  * You should call PetscFinalize(), otherwise the option -log_summary 
(-log_view) will not display anything once the program has completed.
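
As a minimal C analogue of those two points (Frank's code is Fortran and
uses DMDAVecGetArrayF90(), but the ordering is the same; the grid size,
boundary settings, and DMDA creation details below are illustrative only
and differ slightly across PETSc versions):

  #include <petscdmda.h>

  int main(int argc, char **argv)
  {
    DM             da;
    Vec            b;
    PetscScalar ***f;
    PetscInt       i, j, k, xs, ys, zs, xm, ym, zm;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
    /* dof = 1, stencil width = 1: 7-point FD Laplace needs only width 1.
       On newer PETSc, DMSetFromOptions(da) and DMSetUp(da) are also
       required after creation. */
    ierr = DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                        DM_BOUNDARY_NONE, DMDA_STENCIL_STAR, 64, 64, 64,
                        PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE,
                        1, 1, NULL, NULL, NULL, &da);CHKERRQ(ierr);
    ierr = DMCreateGlobalVector(da, &b);CHKERRQ(ierr);

    /* get the array, fill the locally owned part, restore it ... */
    ierr = DMDAVecGetArray(da, b, &f);CHKERRQ(ierr);
    ierr = DMDAGetCorners(da, &xs, &ys, &zs, &xm, &ym, &zm);CHKERRQ(ierr);
    for (k = zs; k < zs + zm; k++)
      for (j = ys; j < ys + ym; j++)
        for (i = xs; i < xs + xm; i++) f[k][j][i] = 1.0;
    ierr = DMDAVecRestoreArray(da, b, &f);CHKERRQ(ierr);

    /* ... and only then assemble */
    ierr = VecAssemblyBegin(b);CHKERRQ(ierr);
    ierr = VecAssemblyEnd(b);CHKERRQ(ierr);

    ierr = VecDestroy(&b);CHKERRQ(ierr);
    ierr = DMDestroy(&da);CHKERRQ(ierr);
    /* without PetscFinalize(), -log_summary / -log_view print nothing */
    ierr = PetscFinalize();
    return ierr;
  }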



Thanks,
  Dave


On 15 September 2016 at 08:03, Hengjie Wang wrote:


Hi Dave,

Sorry, I should have put more comments in to explain the code.
The number of processes in each dimension is the same: Px = Py = Pz = P.
So is the domain size.
So if you want to run the code for 512^3 grid points on
16^3 cores, you need to set "-N 512 -P 16" on the command line.
I added more comments and also fixed an error in the attached code. (The
error only affects the accuracy of the solution, not the memory
usage.)

Thank you.
Frank


On 9/14/2016 9:05 PM, Dave May wrote:



On Thursday, 15 September 2016, Dave May wrote:



On Thursday, 15 September 2016, frank  wrote:

Hi,

I wrote a simple code to reproduce the error. I hope
this can help to diagnose the problem.
The code just solves a 3D Poisson equation.


Why is the stencil width a runtime parameter?? And why is the
default value 2? For 7-pnt FD Laplace, you only need
a stencil width of 1.

Was this choice made to mimic something in the
real application code?


Please ignore - I misunderstood your usage of the param set by -P


I ran the code on a 1024^3 mesh. The process partition is
32 * 32 * 32. That's when I reproduce the OOM error.
Each core has about 2G of memory.
I also ran the code on a 512^3 mesh with 16 * 16 * 16
processes. The ksp solver works fine.
I attached the code, ksp_view_pre's output and my petsc
option file.

Thank you.
Frank

On 09/09/2016 06:38 PM, Hengjie Wang wrote:

Hi Barry,

I checked. On the supercomputer, I had the option
"-ksp_view_pre" but it is not in the file I sent you. I am
sorry for the confusion.

Regards,
Frank

On Friday, September 9, 2016, Barry Smith
 wrote:


> On Sep 9, 2016, at 3:11 PM, frank
 wrote:
>
> Hi Barry,
>
> I think the first KSP view output is from
-ksp_view_pre. Before I submitted the test, I was
not sure whether there would be an OOM error or not. So
I added both -ksp_view_pre and -ksp_view.

  But the options file you sent specifically does
NOT list the -ksp_view_pre so how could it be from that?

   Sorry to be pedantic but I've spent too much time
in the past trying to debug from incorrect
information and want to make sure that the
information I have is correct before thinking.
Please recheck exactly what happened. Rerun with the
exact input file you emailed if that is needed.

   Barry

>
> Frank
>
>
> On 09/09/2016 12:38 PM, Barry Smith wrote:
>>   Why does ksp_view2.txt have two KSP views in it
  

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-15 Thread Barry Smith

> On Sep 15, 2016, at 1:10 PM, Dave May  wrote:
> 
> 
> 
> On Thursday, 15 September 2016, Barry Smith  wrote:
> 
> Should we have some simple selection of default algorithms based on 
> problem size/number of processes? For example, if using more than 1000 
> processes, then use the scalable version, etc.? How would we decide on the 
> parameter values?
> 
> I don't like the idea of having "smart" selection by default as it's terribly 
> annoying for the user when they try to understand the performance 
> characteristics of a given method in a strong/weak scaling test. If 
> such a smart selection strategy were adopted, the details of it should be made 
> abundantly clear to the user.
> 
> These algs depend on many factors, which makes a smart 
> selection for all use cases hard / impossible.
> 
> I would be happy with unifying the three implementations under three different 
> options AND having these implementation options documented in the man page. 
> Maybe the man page should even advise users which to use in particular 
> circumstances (I think there is something similar on the VecScatter page).
> 
> I have these as suggestions for unifying the options names using bools 
> 
> -matptap_explicit_transpose
> -matptap_symbolic_transpose_dense
> -matptap_symbolic_transpose
> 
> Or maybe enums are clearer
> -matptap_impl {explicit_pt,symbolic_pt_dense,symbolic_pt}
> 
> which are equivalent to these options
> 1) the current default
> 2) -matrap 0
> 3) -matrap 0 -matptap_scalable
> 
> Maybe there could be a fourth option
> -matptap_dynamic_selection
> which chooses the most appropriate alg given machine info, problem size, 
> partition size, etc. At least if the user explicitly chooses the 
> dynamic_selection mode, they wouldn't be surprised if there were any bumps 
> appearing in any scaling study they conducted.

   I like the idea of enum types with the final enum type being "dynamically 
select one for me".

   Barry

> 
> Cheers
>   Dave
> 
>  
> 
>Barry
> 
> > On Sep 15, 2016, at 5:35 AM, Dave May  wrote:
> >
> > Hi all,
> >
> > The only unexpected memory usage I can see is associated with the call to 
> > MatPtAP().
> > Here is something you can try immediately.
> > Run your code with the additional options
> >   -matrap 0 -matptap_scalable
> >
> > I didn't realize this before, but the default behaviour of MatPtAP in 
> > parallel is actually to explicitly form the transpose of P (i.e. 
> > assemble R = P^T) and then compute R.A.P.
> > You don't want to do this. The option -matrap 0 resolves this issue.
> >
> > The implementation of P^T.A.P has two variants.
> > The scalable implementation (with respect to memory usage) is selected via 
> > the second option -matptap_scalable.
> >
> > Try it out - I see a significant memory reduction using these options for 
> > particular mesh sizes / partitions.
> >
> > I've attached a cleaned up version of the code you sent me.
> > There were a number of memory leaks and other issues.
> > The main points being
> >   * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End}
> >   * You should call PetscFinalize(), otherwise the option -log_summary 
> > (-log_view) will not display anything once the program has completed.
> >
> >
> > Thanks,
> >   Dave
> >
> >
> > On 15 September 2016 at 08:03, Hengjie Wang  wrote:
> > Hi Dave,
> >
> > Sorry, I should have put more comments in to explain the code.
> > The number of processes in each dimension is the same: Px = Py = Pz = P. So is 
> > the domain size.
> > So if you want to run the code for 512^3 grid points on 16^3 cores, 
> > you need to set "-N 512 -P 16" on the command line.
> > I added more comments and also fixed an error in the attached code. (The error 
> > only affects the accuracy of the solution, not the memory usage.)
> >
> > Thank you.
> > Frank
> >
> >
> > On 9/14/2016 9:05 PM, Dave May wrote:
> >>
> >>
> >> On Thursday, 15 September 2016, Dave May  wrote:
> >>
> >>
> >> On Thursday, 15 September 2016, frank  wrote:
> >> Hi,
> >>
> >> I wrote a simple code to reproduce the error. I hope this can help to 
> >> diagnose the problem.
> >> The code just solves a 3D Poisson equation.
> >>
> >> Why is the stencil width a runtime parameter?? And why is the default 
> >> value 2? For 7-pnt FD Laplace, you only need a stencil width of 1.
> >>
> >> Was this choice made to mimic something in the real application code?
> >>
> >> Please ignore - I misunderstood your usage of the param set by -P
> >>
> >>
> >>
> >> I ran the code on a 1024^3 mesh. The process partition is 32 * 32 * 32. 
> >> That's when I reproduce the OOM error. Each core has about 2G of memory.
> >> I also ran the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp 
> >> solver works fine.
> >> I attached the code, ksp_view_pre's output and my petsc option file.
> >>
> >> Thank you.

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-15 Thread Dave May
On Thursday, 15 September 2016, Barry Smith  wrote:

>
> Should we have some simple selection of default algorithms based on
> problem size/number of processes? For example, if using more than 1000
> processes, then use the scalable version, etc.? How would we decide on the
> parameter values?


I don't like the idea of having "smart" selection by default as it's
terribly annoying for the user when they try to understand the performance
characteristics of a given method in a strong/weak scaling test.
If such a smart selection strategy were adopted, the details of it should be
made abundantly clear to the user.

These algs depend on many factors, which makes a smart
selection for all use cases hard / impossible.

I would be happy with unifying the three implementations under three different
options AND having these implementation options documented in the man page.
Maybe the man page should even advise users which to use in particular
circumstances (I think there is something similar on the VecScatter page).

I have these as suggestions for unifying the options names using bools

-matptap_explicit_transpose
-matptap_symbolic_transpose_dense
-matptap_symbolic_transpose

Or maybe enums are clearer
-matptap_impl {explicit_pt,symbolic_pt_dense,symbolic_pt}

which are equivalent to these options
1) the current default
2) -matrap 0
3) -matrap 0 -matptap_scalable

Maybe there could be a fourth option
-matptap_dynamic_selection
which chooses the most appropriate alg given machine info, problem size,
partition size, etc. At least if the user explicitly chooses the
dynamic_selection mode, they wouldn't be surprised if there were any
bumps appearing in any scaling study they conducted.

Cheers
  Dave



>
>Barry
>
> > On Sep 15, 2016, at 5:35 AM, Dave May wrote:
> >
> > Hi all,
> >
> > The only unexpected memory usage I can see is associated with the call
> to MatPtAP().
> > Here is something you can try immediately.
> > Run your code with the additional options
> >   -matrap 0 -matptap_scalable
> >
> > I didn't realize this before, but the default behaviour of MatPtAP in
> parallel is actually to explicitly form the transpose of P (i.e.
> assemble R = P^T) and then compute R.A.P.
> > You don't want to do this. The option -matrap 0 resolves this issue.
> >
> > The implementation of P^T.A.P has two variants.
> > The scalable implementation (with respect to memory usage) is selected
> via the second option -matptap_scalable.
> >
> > Try it out - I see a significant memory reduction using these options
> for particular mesh sizes / partitions.
> >
> > I've attached a cleaned up version of the code you sent me.
> > There were a number of memory leaks and other issues.
> > The main points being
> >   * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End}
> >   * You should call PetscFinalize(), otherwise the option -log_summary
> (-log_view) will not display anything once the program has completed.
> >
> >
> > Thanks,
> >   Dave
> >
> >
> > On 15 September 2016 at 08:03, Hengjie Wang wrote:
> > Hi Dave,
> >
> > Sorry, I should have put more comments in to explain the code.
> > The number of processes in each dimension is the same: Px = Py = Pz = P. So is
> the domain size.
> > So if you want to run the code for 512^3 grid points on 16^3
> cores, you need to set "-N 512 -P 16" on the command line.
> > I added more comments and also fixed an error in the attached code. (The
> error only affects the accuracy of the solution, not the memory usage.)
> >
> > Thank you.
> > Frank
> >
> >
> > On 9/14/2016 9:05 PM, Dave May wrote:
> >>
> >>
> >> On Thursday, 15 September 2016, Dave May wrote:
> >>
> >>
> >> On Thursday, 15 September 2016, frank wrote:
> >> Hi,
> >>
> >> I wrote a simple code to reproduce the error. I hope this can help to
> diagnose the problem.
> >> The code just solves a 3D Poisson equation.
> >>
> >> Why is the stencil width a runtime parameter?? And why is the default
> value 2? For 7-pnt FD Laplace, you only need a stencil width of 1.
> >>
> >> Was this choice made to mimic something in the real application code?
> >>
> >> Please ignore - I misunderstood your usage of the param set by -P
> >>
> >>
> >>
> >> I ran the code on a 1024^3 mesh. The process partition is 32 * 32 * 32.
> That's when I reproduce the OOM error. Each core has about 2G of memory.
> >> I also ran the code on a 512^3 mesh with 16 * 16 * 16 processes. The
> ksp solver works fine.
> >> I attached the code, ksp_view_pre's output and my petsc option file.
> >>
> >> Thank you.
> >> Frank
> >>
> >> On 09/09/2016 06:38 PM, Hengjie Wang wrote:
> >>> Hi Barry,
> >>>
> >>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but
> it is not in the file I sent you. I am sorry for the confusion.
> >>>
> >>> Regards,
> >>> Frank
> >>>
> 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-15 Thread Barry Smith

   Should we have some simple selection of default algorithms based on problem 
size/number of processes? For example, if using more than 1000 processes, then 
use the scalable version, etc.? How would we decide on the parameter values?

   Barry

> On Sep 15, 2016, at 5:35 AM, Dave May  wrote:
> 
> Hi all,
> 
> The only unexpected memory usage I can see is associated with the call to 
> MatPtAP().
> Here is something you can try immediately.
> Run your code with the additional options
>   -matrap 0 -matptap_scalable
> 
> I didn't realize this before, but the default behaviour of MatPtAP in 
> parallel is actually to explicitly form the transpose of P (i.e. assemble 
> R = P^T) and then compute R.A.P. 
> You don't want to do this. The option -matrap 0 resolves this issue.
> 
> The implementation of P^T.A.P has two variants. 
> The scalable implementation (with respect to memory usage) is selected via 
> the second option -matptap_scalable.
> 
> Try it out - I see a significant memory reduction using these options for 
> particular mesh sizes / partitions.
> 
> I've attached a cleaned up version of the code you sent me.
> There were a number of memory leaks and other issues.
> The main points being
>   * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End}
>   * You should call PetscFinalize(), otherwise the option -log_summary 
> (-log_view) will not display anything once the program has completed.
> 
> 
> Thanks,
>   Dave
> 
> 
> On 15 September 2016 at 08:03, Hengjie Wang  wrote:
> Hi Dave,
> 
> Sorry, I should have put more comments in to explain the code.  
> The number of processes in each dimension is the same: Px = Py = Pz = P. So is the 
> domain size.
> So if you want to run the code for 512^3 grid points on 16^3 cores, 
> you need to set "-N 512 -P 16" on the command line.
> I added more comments and also fixed an error in the attached code. (The error 
> only affects the accuracy of the solution, not the memory usage.) 
> 
> Thank you.
> Frank
> 
> 
> On 9/14/2016 9:05 PM, Dave May wrote:
>> 
>> 
>> On Thursday, 15 September 2016, Dave May  wrote:
>> 
>> 
>> On Thursday, 15 September 2016, frank  wrote:
>> Hi, 
>> 
>> I wrote a simple code to reproduce the error. I hope this can help to 
>> diagnose the problem.
>> The code just solves a 3D Poisson equation. 
>> 
>> Why is the stencil width a runtime parameter?? And why is the default value 
>> 2? For 7-pnt FD Laplace, you only need a stencil width of 1. 
>> 
>> Was this choice made to mimic something in the real application code?
>> 
>> Please ignore - I misunderstood your usage of the param set by -P
>>  
>>  
>> 
>> I ran the code on a 1024^3 mesh. The process partition is 32 * 32 * 32. 
>> That's when I reproduce the OOM error. Each core has about 2G of memory.
>> I also ran the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp 
>> solver works fine. 
>> I attached the code, ksp_view_pre's output and my petsc option file.
>> 
>> Thank you.
>> Frank
>> 
>> On 09/09/2016 06:38 PM, Hengjie Wang wrote:
>>> Hi Barry, 
>>> 
>>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but it is 
>>> not in the file I sent you. I am sorry for the confusion.
>>> 
>>> Regards,
>>> Frank
>>> 
>>> On Friday, September 9, 2016, Barry Smith  wrote:
>>> 
>>> > On Sep 9, 2016, at 3:11 PM, frank  wrote:
>>> >
>>> > Hi Barry,
>>> >
>>> > I think the first KSP view output is from -ksp_view_pre. Before I 
>>> > submitted the test, I was not sure whether there would be OOM error or 
>>> > not. So I added both -ksp_view_pre and -ksp_view.
>>> 
>>>   But the options file you sent specifically does NOT list the 
>>> -ksp_view_pre so how could it be from that?
>>> 
>>>Sorry to be pedantic but I've spent too much time in the past trying to 
>>> debug from incorrect information and want to make sure that the information 
>>> I have is correct before thinking. Please recheck exactly what happened. 
>>> Rerun with the exact input file you emailed if that is needed.
>>> 
>>>Barry
>>> 
>>> >
>>> > Frank
>>> >
>>> >
>>> > On 09/09/2016 12:38 PM, Barry Smith wrote:
>>> >>   Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt 
>>> >> has only one KSPView in it? Did you run two different solves in the second 
>>> >> case but not the first?
>>> >>
>>> >>   Barry
>>> >>
>>> >>
>>> >>
>>> >>> On Sep 9, 2016, at 10:56 AM, frank  wrote:
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> I want to continue digging into the memory problem here.
>>> >>> I did find a workaround in the past, which is to use fewer cores per 
>>> >>> node so that each core has 8G of memory. However this is inefficient and 
>>> >>> expensive. I hope to locate the place that uses the most memory.
>>> >>>
>>> >>> Here is a brief summary of the tests I did in the past:
>>>  Test1:   Mesh 1536*128*384  |  Process Mesh 48*4*12
>>> >>> Maximum (over 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-15 Thread Dave May
Hi all,

The only unexpected memory usage I can see is associated with the call to
MatPtAP().
Here is something you can try immediately.
Run your code with the additional options
  -matrap 0 -matptap_scalable

I didn't realize this before, but the default behaviour of MatPtAP in
parallel is actually to explicitly form the transpose of P (i.e.
assemble R = P^T) and then compute R.A.P.
You don't want to do this. The option -matrap 0 resolves this issue.

The implementation of P^T.A.P has two variants.
The scalable implementation (with respect to memory usage) is selected via
the second option -matptap_scalable.

Try it out - I see a significant memory reduction using these options for
particular mesh sizes / partitions.

I've attached a cleaned up version of the code you sent me.
There were a number of memory leaks and other issues.
The main points being
  * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End}
  * You should call PetscFinalize(), otherwise the option -log_summary
(-log_view) will not display anything once the program has completed.


Thanks,
  Dave


On 15 September 2016 at 08:03, Hengjie Wang  wrote:

> Hi Dave,
>
> Sorry, I should have put more comments in to explain the code.
> The number of processes in each dimension is the same: Px = Py = Pz = P. So is
> the domain size.
> So if you want to run the code for 512^3 grid points on 16^3 cores,
> you need to set "-N 512 -P 16" on the command line.
> I added more comments and also fixed an error in the attached code. (The
> error only affects the accuracy of the solution, not the memory usage.)
>
> Thank you.
> Frank
>
>
> On 9/14/2016 9:05 PM, Dave May wrote:
>
>
>
> On Thursday, 15 September 2016, Dave May  wrote:
>
>>
>>
>> On Thursday, 15 September 2016, frank  wrote:
>>
>>> Hi,
>>>
>>> I wrote a simple code to reproduce the error. I hope this can help to
>>> diagnose the problem.
>>> The code just solves a 3D Poisson equation.
>>>
>>
>> Why is the stencil width a runtime parameter?? And why is the default
>> value 2? For 7-pnt FD Laplace, you only need a stencil width of 1.
>>
>> Was this choice made to mimic something in the real application code?
>>
>
> Please ignore - I misunderstood your usage of the param set by -P
>
>
>>
>>
>>>
>>> I ran the code on a 1024^3 mesh. The process partition is 32 * 32 * 32.
>>> That's when I reproduce the OOM error. Each core has about 2G of memory.
>>> I also ran the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp
>>> solver works fine.
>>> I attached the code, ksp_view_pre's output and my petsc option file.
>>>
>>> Thank you.
>>> Frank
>>>
>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote:
>>>
>>> Hi Barry,
>>>
>>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but it
>>> is not in the file I sent you. I am sorry for the confusion.
>>>
>>> Regards,
>>> Frank
>>>
>>> On Friday, September 9, 2016, Barry Smith  wrote:
>>>

 > On Sep 9, 2016, at 3:11 PM, frank  wrote:
 >
 > Hi Barry,
 >
 > I think the first KSP view output is from -ksp_view_pre. Before I
 submitted the test, I was not sure whether there would be OOM error or not.
 So I added both -ksp_view_pre and -ksp_view.

   But the options file you sent specifically does NOT list the
 -ksp_view_pre so how could it be from that?

Sorry to be pedantic but I've spent too much time in the past trying
 to debug from incorrect information and want to make sure that the
 information I have is correct before thinking. Please recheck exactly what
 happened. Rerun with the exact input file you emailed if that is needed.

Barry

 >
 > Frank
 >
 >
 > On 09/09/2016 12:38 PM, Barry Smith wrote:
 >>   Why does ksp_view2.txt have two KSP views in it while
 ksp_view1.txt has only one KSPView in it? Did you run two different solves
 in the second case but not the first?
 >>
 >>   Barry
 >>
 >>
 >>
 >>> On Sep 9, 2016, at 10:56 AM, frank  wrote:
 >>>
 >>> Hi,
 >>>
 >>> I want to continue digging into the memory problem here.
 >>> I did find a workaround in the past, which is to use fewer cores
 per node so that each core has 8G of memory. However this is inefficient and
 expensive. I hope to locate the place that uses the most memory.
 >>>
 >>> Here is a brief summary of the tests I did in the past:
  Test1:   Mesh 1536*128*384  |  Process Mesh 48*4*12
 >>> Maximum (over computational time) process memory:        total 7.0727e+08
 >>> Current process memory:                                  total 7.0727e+08
 >>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
 >>> Current space PetscMalloc()ed:                           total 1.8275e+09
 >>>
  Test2:   Mesh 1536*128*384  |  Process Mesh 96*8*24
 >>> Maximum 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-15 Thread Dave May
On Thursday, 15 September 2016, Hengjie Wang  wrote:

> Hi Dave,
>
> Sorry, I should have put more comments in to explain the code.
>

No problem. I was looking at the code after only 3 hrs of sleep.


>
> The number of processes in each dimension is the same: Px = Py = Pz = P. So is
> the domain size.
> So if you want to run the code for 512^3 grid points on 16^3 cores,
> you need to set "-N 512 -P 16" on the command line.
> I added more comments and also fixed an error in the attached code. (The
> error only affects the accuracy of the solution, not the memory usage.)
>

Yep thanks, I see that now.

I know this is only a test, but this is kinda clunky. The dmda can
automatically choose the partition, and if the user wants control over it,
they can use the command line options -da_processors_{x,y,z} (as in your
options file).

For my testing purposes I'll have to tweak your code, as I don't want to
always have to change two options when changing the partition size or mesh
size (as I'll certainly get it wrong every second time, leading to a loss of
my time due to queue wait times).
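
For example (hypothetical executable name; -N and -P are options of Frank's
test code, while -da_processors_{x,y,z} are standard DMDA options):

  mpiexec -n 4096 ./test -N 512 -P 16        # partition via the code's -P
  mpiexec -n 4096 ./test -N 512 \
    -da_processors_x 16 -da_processors_y 16 -da_processors_z 16
  mpiexec -n 4096 ./test -N 512              # let the DMDA pick the partition

In the last form only the core count (-n) needs to change in a scaling study,
assuming the code defers the partition choice to the DMDA.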

Thanks,
  Dave



>
>
> Thank you.
> Frank
>
> On 9/14/2016 9:05 PM, Dave May wrote:
>
>
>
> On Thursday, 15 September 2016, Dave May wrote:
>
>>
>>
>> On Thursday, 15 September 2016, frank  wrote:
>>
>>> Hi,
>>>
>>> I wrote a simple code to reproduce the error. I hope this can help to
>>> diagnose the problem.
>>> The code just solves a 3D Poisson equation.
>>>
>>
>> Why is the stencil width a runtime parameter?? And why is the default
>> value 2? For 7-pnt FD Laplace, you only need a stencil width of 1.
>>
>> Was this choice made to mimic something in the real application code?
>>
>
> Please ignore - I misunderstood your usage of the param set by -P
>
>
>>
>>
>>>
>>> I ran the code on a 1024^3 mesh. The process partition is 32 * 32 * 32.
>>> That's when I reproduce the OOM error. Each core has about 2G of memory.
>>> I also ran the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp
>>> solver works fine.
>>> I attached the code, ksp_view_pre's output and my petsc option file.
>>>
>>> Thank you.
>>> Frank
>>>
>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote:
>>>
>>> Hi Barry,
>>>
>>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but it
>>> is not in the file I sent you. I am sorry for the confusion.
>>>
>>> Regards,
>>> Frank
>>>
>>> On Friday, September 9, 2016, Barry Smith  wrote:
>>>

 > On Sep 9, 2016, at 3:11 PM, frank  wrote:
 >
 > Hi Barry,
 >
 > I think the first KSP view output is from -ksp_view_pre. Before I
 submitted the test, I was not sure whether there would be OOM error or not.
 So I added both -ksp_view_pre and -ksp_view.

   But the options file you sent specifically does NOT list the
 -ksp_view_pre so how could it be from that?

Sorry to be pedantic but I've spent too much time in the past trying
 to debug from incorrect information and want to make sure that the
 information I have is correct before thinking. Please recheck exactly what
 happened. Rerun with the exact input file you emailed if that is needed.

Barry

 >
 > Frank
 >
 >
 > On 09/09/2016 12:38 PM, Barry Smith wrote:
 >>   Why does ksp_view2.txt have two KSP views in it while
 ksp_view1.txt has only one KSPView in it? Did you run two different solves
 in the second case but not the first?
 >>
 >>   Barry
 >>
 >>
 >>
 >>> On Sep 9, 2016, at 10:56 AM, frank  wrote:
 >>>
 >>> Hi,
 >>>
 >>> I want to continue digging into the memory problem here.
 >>> I did find a workaround in the past, which is to use fewer cores
 per node so that each core has 8G of memory. However this is inefficient and
 expensive. I hope to locate the place that uses the most memory.
 >>>
 >>> Here is a brief summary of the tests I did in the past:
  Test1:   Mesh 1536*128*384  |  Process Mesh 48*4*12
 >>> Maximum (over computational time) process memory:        total 7.0727e+08
 >>> Current process memory:                                  total 7.0727e+08
 >>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
 >>> Current space PetscMalloc()ed:                           total 1.8275e+09
 >>>
  Test2:   Mesh 1536*128*384  |  Process Mesh 96*8*24
 >>> Maximum (over computational time) process memory:        total 5.9431e+09
 >>> Current process memory:                                  total 5.9431e+09
 >>> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12
 >>> Current space PetscMalloc()ed:                           total 5.4844e+09
 >>>
  Test3:   Mesh 3072*256*768  |  Process Mesh 96*8*24
 >>> OOM( Out Of Memory ) killer of the 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-15 Thread Hengjie Wang

Hi Dave,

Sorry, I should have put more comments in to explain the code.
The number of processes in each dimension is the same: Px = Py = Pz = P. So is 
the domain size.
So if you want to run the code for 512^3 grid points on 16^3 
cores, you need to set "-N 512 -P 16" on the command line.
I added more comments and also fixed an error in the attached code. (The 
error only affects the accuracy of the solution, not the memory usage.)


Thank you.
Frank

On 9/14/2016 9:05 PM, Dave May wrote:



On Thursday, 15 September 2016, Dave May wrote:




On Thursday, 15 September 2016, frank wrote:

Hi,

I wrote a simple code to reproduce the error. I hope this can
help to diagnose the problem.
The code just solves a 3D Poisson equation.


Why is the stencil width a runtime parameter?? And why is the
default value 2? For 7-pnt FD Laplace, you only need a stencil
width of 1.

Was this choice made to mimic something in the real application code?


Please ignore - I misunderstood your usage of the param set by -P


I ran the code on a 1024^3 mesh. The process partition is 32 *
32 * 32. That's when I reproduce the OOM error. Each core has
about 2G of memory.
I also ran the code on a 512^3 mesh with 16 * 16 * 16
processes. The ksp solver works fine.
I attached the code, ksp_view_pre's output and my petsc option
file.

Thank you.
Frank

On 09/09/2016 06:38 PM, Hengjie Wang wrote:

Hi Barry,

I checked. On the supercomputer, I had the option
"-ksp_view_pre" but it is not in the file I sent you. I am sorry
for the confusion.

Regards,
Frank

On Friday, September 9, 2016, Barry Smith
 wrote:


> On Sep 9, 2016, at 3:11 PM, frank  wrote:
>
> Hi Barry,
>
> I think the first KSP view output is from
-ksp_view_pre. Before I submitted the test, I was not
sure whether there would be OOM error or not. So I added
both -ksp_view_pre and -ksp_view.

  But the options file you sent specifically does NOT
list the -ksp_view_pre so how could it be from that?

   Sorry to be pedantic but I've spent too much time in
the past trying to debug from incorrect information and
want to make sure that the information I have is correct
before thinking. Please recheck exactly what happened.
Rerun with the exact input file you emailed if that is
needed.

   Barry

>
> Frank
>
>
> On 09/09/2016 12:38 PM, Barry Smith wrote:
>>   Why does ksp_view2.txt have two KSP views in it
while ksp_view1.txt has only one KSPView in it? Did you
run two different solves in the second case but not the first?
>>
>>   Barry
>>
>>
>>
>>> On Sep 9, 2016, at 10:56 AM, frank 
wrote:
>>>
>>> Hi,
>>>
>>> I want to continue digging into the memory problem here.
>>> I did find a workaround in the past, which is to use
fewer cores per node so that each core has 8G of memory.
However this is inefficient and expensive. I hope to locate
the place that uses the most memory.
>>>
>>> Here is a brief summary of the tests I did in the past:
 Test1:   Mesh 1536*128*384  |  Process Mesh 48*4*12
>>> Maximum (over computational time) process memory:        total 7.0727e+08
>>> Current process memory:                                  total 7.0727e+08
>>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
>>> Current space PetscMalloc()ed:                           total 1.8275e+09
>>>
 Test2:   Mesh 1536*128*384  |  Process Mesh 96*8*24
>>> Maximum (over computational time) process memory:        total 5.9431e+09
>>> Current process memory:                                  total 5.9431e+09
>>> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12
>>> Current space PetscMalloc()ed:                           total 5.4844e+09
>>>
 Test3:   Mesh 3072*256*768  |  Process Mesh 96*8*24
>>> OOM( Out Of Memory ) killer of the supercomputer
terminated the job during 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-14 Thread Dave May
On Thursday, 15 September 2016, Dave May  wrote:

>
>
> On Thursday, 15 September 2016, frank wrote:
>
>> Hi,
>>
>> I wrote a simple code to reproduce the error. I hope this can help to
>> diagnose the problem.
>> The code just solves a 3D Poisson equation.
>>
>
> Why is the stencil width a runtime parameter?? And why is the default
> value 2? For 7-pnt FD Laplace, you only need a stencil width of 1.
>
> Was this choice made to mimic something in the real application code?
>

Please ignore - I misunderstood your usage of the param set by -P


>
>
>>
>> I ran the code on a 1024^3 mesh. The process partition is 32 * 32 * 32.
>> That's when I reproduce the OOM error. Each core has about 2G of memory.
>> I also ran the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp
>> solver works fine.
>> I attached the code, ksp_view_pre's output and my petsc option file.
>>
>> Thank you.
>> Frank
>>
>> On 09/09/2016 06:38 PM, Hengjie Wang wrote:
>>
>> Hi Barry,
>>
>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but it
>> is not in the file I sent you. I am sorry for the confusion.
>>
>> Regards,
>> Frank
>>
>> On Friday, September 9, 2016, Barry Smith  wrote:
>>
>>>
>>> > On Sep 9, 2016, at 3:11 PM, frank  wrote:
>>> >
>>> > Hi Barry,
>>> >
>>> > I think the first KSP view output is from -ksp_view_pre. Before I
>>> submitted the test, I was not sure whether there would be OOM error or not.
>>> So I added both -ksp_view_pre and -ksp_view.
>>>
>>>   But the options file you sent specifically does NOT list the
>>> -ksp_view_pre so how could it be from that?
>>>
>>>Sorry to be pedantic but I've spent too much time in the past trying
>>> to debug from incorrect information and want to make sure that the
>>> information I have is correct before thinking. Please recheck exactly what
>>> happened. Rerun with the exact input file you emailed if that is needed.
>>>
>>>Barry
>>>
>>> >
>>> > Frank
>>> >
>>> >
>>> > On 09/09/2016 12:38 PM, Barry Smith wrote:
>>> >>   Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt
>>> has only one KSPView in it? Did you run two different solves in the second case
>>> but not the first?
>>> >>
>>> >>   Barry
>>> >>
>>> >>
>>> >>
>>> >>> On Sep 9, 2016, at 10:56 AM, frank  wrote:
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> I want to continue digging into the memory problem here.
>>> >>> I did find a workaround in the past, which is to use fewer cores per
>>> node so that each core has 8G of memory. However this is inefficient and
>>> expensive. I hope to locate the place that uses the most memory.
>>> >>>
>>> >>> Here is a brief summary of the tests I did in the past:
>>>  Test1:   Mesh 1536*128*384  |  Process Mesh 48*4*12
>>> >>> Maximum (over computational time) process memory:        total 7.0727e+08
>>> >>> Current process memory:                                  total 7.0727e+08
>>> >>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
>>> >>> Current space PetscMalloc()ed:                           total 1.8275e+09
>>> >>>
>>>  Test2:   Mesh 1536*128*384  |  Process Mesh 96*8*24
>>> >>> Maximum (over computational time) process memory:        total 5.9431e+09
>>> >>> Current process memory:                                  total 5.9431e+09
>>> >>> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12
>>> >>> Current space PetscMalloc()ed:                           total 5.4844e+09
>>> >>>
>>>  Test3:   Mesh 3072*256*768  |  Process Mesh 96*8*24
>>> >>> OOM( Out Of Memory ) killer of the supercomputer terminated the
>>> job during "KSPSolve".
>>> >>>
>>> >>> I attached the output of ksp_view( the third test's output is from
>>> ksp_view_pre ), memory_view and also the petsc options.
>>> >>>
>>> >>> In all the tests, each core can access about 2G of memory. In test3,
>>> there are 4223139840 non-zeros in the matrix. This will consume about
>>> 1.74M per process, using double precision. Considering some extra memory used to
>>> store integer indices, 2G of memory should still be more than enough.
>>> >>>
>>> >>> Is there a way to find out which part of KSPSolve uses the most
>>> memory?
>>> >>> Thank you so much.
>>> >>>
>>> >>> BTW, there are 4 options that remain unused and I don't understand why
>>> they are omitted:
>>> >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly
>>> >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
>>> >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1
>>> >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson
>>> >>>
>>> >>>
>>> >>> Regards,
>>> >>> Frank
>>> >>>
>>> >>> On 07/13/2016 05:47 PM, Dave May wrote:
>>> 
>>>  On 14 July 2016 at 01:07, frank  wrote:
>>>  Hi Dave,
>>> 
>>>  Sorry for the late reply.
>>>  Thank you so much for your detailed reply.
>>> 
>>>  I have a question about the estimation of the memory 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-14 Thread Dave May
On Thursday, 15 September 2016, frank  wrote:

> Hi,
>
> I wrote a simple code to reproduce the error. I hope this can help to
> diagnose the problem.
> The code just solves a 3D Poisson equation.
>

Why is the stencil width a runtime parameter?? And why is the default value
2? For 7-pnt FD Laplace, you only need a stencil width of 1.

Was this choice made to mimic something in the real application code?


>
> I ran the code on a 1024^3 mesh. The process partition is 32 * 32 * 32.
> That's when I reproduce the OOM error. Each core has about 2G of memory.
> I also ran the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp
> solver works fine.
> I attached the code, ksp_view_pre's output and my petsc option file.
>
> Thank you.
> Frank
>
> On 09/09/2016 06:38 PM, Hengjie Wang wrote:
>
> Hi Barry,
>
> I checked. On the supercomputer, I had the option "-ksp_view_pre" but it
> is not in the file I sent you. I am sorry for the confusion.
>
> Regards,
> Frank
>
> On Friday, September 9, 2016, Barry Smith wrote:
>
>>
>> > On Sep 9, 2016, at 3:11 PM, frank  wrote:
>> >
>> > Hi Barry,
>> >
>> > I think the first KSP view output is from -ksp_view_pre. Before I
>> submitted the test, I was not sure whether there would be OOM error or not.
>> So I added both -ksp_view_pre and -ksp_view.
>>
>>   But the options file you sent specifically does NOT list the
>> -ksp_view_pre so how could it be from that?
>>
>>Sorry to be pedantic but I've spent too much time in the past trying
>> to debug from incorrect information and want to make sure that the
>> information I have is correct before thinking. Please recheck exactly what
>> happened. Rerun with the exact input file you emailed if that is needed.
>>
>>Barry
>>
>> >
>> > Frank
>> >
>> >
>> > On 09/09/2016 12:38 PM, Barry Smith wrote:
>> >>   Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt
>> has only one KSPView in it? Did you run two different solves in the second case
>> but not the first?
>> >>
>> >>   Barry
>> >>
>> >>
>> >>
>> >>> On Sep 9, 2016, at 10:56 AM, frank  wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I want to continue digging into the memory problem here.
>> >>> I did find a workaround in the past, which is to use fewer cores per
>> node so that each core has 8G of memory. However this is inefficient and
>> expensive. I hope to locate the place that uses the most memory.
>> >>>
>> >>> Here is a brief summary of the tests I did in the past:
>>  Test1:   Mesh 1536*128*384  |  Process Mesh 48*4*12
>> >>> Maximum (over computational time) process memory:        total 7.0727e+08
>> >>> Current process memory:                                  total 7.0727e+08
>> >>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
>> >>> Current space PetscMalloc()ed:                           total 1.8275e+09
>> >>>
>>  Test2:   Mesh 1536*128*384  |  Process Mesh 96*8*24
>> >>> Maximum (over computational time) process memory:        total 5.9431e+09
>> >>> Current process memory:                                  total 5.9431e+09
>> >>> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12
>> >>> Current space PetscMalloc()ed:                           total 5.4844e+09
>> >>>
>>  Test3:   Mesh 3072*256*768  |  Process Mesh 96*8*24
>> >>> OOM( Out Of Memory ) killer of the supercomputer terminated the
>> job during "KSPSolve".
>> >>>
>> >>> I attached the output of ksp_view( the third test's output is from
>> ksp_view_pre ), memory_view and also the petsc options.
>> >>>
>> >>> In all the tests, each core can access about 2G of memory. In test3,
>> there are 4223139840 non-zeros in the matrix. This will consume about
>> 1.74M per process, using double precision. Considering some extra memory used to
>> store integer indices, 2G of memory should still be more than enough.
>> >>>
>> >>> Is there a way to find out which part of KSPSolve uses the most
>> memory?
>> >>> Thank you so much.
>> >>>
>> >>> BTW, there are 4 options that remain unused and I don't understand why
>> they are omitted:
>> >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly
>> >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
>> >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1
>> >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson
>> >>>
>> >>>
>> >>> Regards,
>> >>> Frank
>> >>>
>> >>> On 07/13/2016 05:47 PM, Dave May wrote:
>> 
>>  On 14 July 2016 at 01:07, frank  wrote:
>>  Hi Dave,
>> 
>>  Sorry for the late reply.
>>  Thank you so much for your detailed reply.
>> 
>>  I have a question about the estimation of the memory usage. There
>> are 4223139840 allocated non-zeros and 18432 MPI processes. Double
>> precision is used. So the memory per process is:
>>    4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ?
>>  Did I do sth wrong here? Because this seems too small.
>> 
>>  No - I totally 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-14 Thread Dave May
Hi Frank,

On Thursday, 15 September 2016, frank  wrote:

> Hi,
>
> I wrote a simple code to reproduce the error. I hope this can help to
> diagnose the problem.
> The code just solves a 3D Poisson equation.
> I ran the code on a 1024^3 mesh. The process partition is 32 * 32 * 32.
> That's when I reproduce the OOM error. Each core has about 2G of memory.
> I also ran the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp
> solver works fine.
>

Perfect! That's very helpful, I can use this to track down where the issue
is coming from. Give me some time to figure this out.

Thanks,
  Dave



>
> I attached the code, ksp_view_pre's output and my petsc option file.
>
> Thank you.
> Frank
>
> On 09/09/2016 06:38 PM, Hengjie Wang wrote:
>
> Hi Barry,
>
> I checked. On the supercomputer, I had the option "-ksp_view_pre" but it
> is not in the file I sent you. I am sorry for the confusion.
>
> Regards,
> Frank
>
> On Friday, September 9, 2016, Barry Smith wrote:
>
>>
>> > On Sep 9, 2016, at 3:11 PM, frank  wrote:
>> >
>> > Hi Barry,
>> >
>> > I think the first KSP view output is from -ksp_view_pre. Before I
>> submitted the test, I was not sure whether there would be OOM error or not.
>> So I added both -ksp_view_pre and -ksp_view.
>>
>>   But the options file you sent specifically does NOT list the
>> -ksp_view_pre so how could it be from that?
>>
>>Sorry to be pedantic but I've spent too much time in the past trying
>> to debug from incorrect information and want to make sure that the
>> information I have is correct before thinking. Please recheck exactly what
>> happened. Rerun with the exact input file you emailed if that is needed.
>>
>>Barry
>>
>> >
>> > Frank
>> >
>> >
>> > On 09/09/2016 12:38 PM, Barry Smith wrote:
>> >>   Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt
>> has only one KSPView in it? Did you run two different solves in the second case
>> but not the first?
>> >>
>> >>   Barry
>> >>
>> >>
>> >>
>> >>> On Sep 9, 2016, at 10:56 AM, frank  wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I want to continue digging into the memory problem here.
>> >>> I did find a workaround in the past, which is to use fewer cores per
>> node so that each core has 8G of memory. However this is inefficient and
>> expensive. I hope to locate the place that uses the most memory.
>> >>>
>> >>> Here is a brief summary of the tests I did in the past:
>>  Test1:   Mesh 1536*128*384  |  Process Mesh 48*4*12
>> >>> Maximum (over computational time) process memory:        total 7.0727e+08
>> >>> Current process memory:                                  total 7.0727e+08
>> >>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
>> >>> Current space PetscMalloc()ed:                           total 1.8275e+09
>> >>>
>>  Test2:   Mesh 1536*128*384  |  Process Mesh 96*8*24
>> >>> Maximum (over computational time) process memory:        total 5.9431e+09
>> >>> Current process memory:                                  total 5.9431e+09
>> >>> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12
>> >>> Current space PetscMalloc()ed:                           total 5.4844e+09
>> >>>
>>  Test3:   Mesh 3072*256*768  |  Process Mesh 96*8*24
>> >>> OOM( Out Of Memory ) killer of the supercomputer terminated the
>> job during "KSPSolve".
>> >>>
>> >>> I attached the output of ksp_view( the third test's output is from
>> ksp_view_pre ), memory_view and also the petsc options.
>> >>>
>> >>> In all the tests, each core can access about 2G of memory. In test3,
>> there are 4223139840 non-zeros in the matrix. This will consume about
>> 1.74M per process, using double precision. Considering some extra memory used to
>> store integer indices, 2G of memory should still be more than enough.
>> >>>
>> >>> Is there a way to find out which part of KSPSolve uses the most
>> memory?
>> >>> Thank you so much.
>> >>>
>> >>> BTW, there are 4 options that remain unused and I don't understand why
>> they are omitted:
>> >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly
>> >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
>> >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1
>> >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson
>> >>>
>> >>>
>> >>> Regards,
>> >>> Frank
>> >>>
>> >>> On 07/13/2016 05:47 PM, Dave May wrote:
>> 
>>  On 14 July 2016 at 01:07, frank  wrote:
>>  Hi Dave,
>> 
>>  Sorry for the late reply.
>>  Thank you so much for your detailed reply.
>> 
>>  I have a question about the estimation of the memory usage. There
>> are 4223139840 allocated non-zeros and 18432 MPI processes. Double
>> precision is used. So the memory per process is:
>>    4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ?
>>  Did I do sth wrong here? Because this seems too small.
>> 
>>  No - I totally f***ed it up. You are correct. That'll teach me for

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-14 Thread frank

Hi,

I wrote a simple code to reproduce the error. I hope this can help to 
diagnose the problem.

The code just solves a 3D Poisson equation.
I ran the code on a 1024^3 mesh. The process partition is 32 * 32 * 32. 
That's when I reproduce the OOM error. Each core has about 2G of memory.
I also ran the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp 
solver works fine.

I attached the code, ksp_view_pre's output and my petsc option file.

Thank you.
Frank

On 09/09/2016 06:38 PM, Hengjie Wang wrote:

Hi Barry,

I checked. On the supercomputer, I had the option "-ksp_view_pre" but 
it is not in the file I sent you. I am sorry for the confusion.


Regards,
Frank

On Friday, September 9, 2016, Barry Smith wrote:



> On Sep 9, 2016, at 3:11 PM, frank wrote:
>
> Hi Barry,
>
> I think the first KSP view output is from -ksp_view_pre. Before
I submitted the test, I was not sure whether there would be an OOM
error or not. So I added both -ksp_view_pre and -ksp_view.

  But the options file you sent specifically does NOT list the
-ksp_view_pre so how could it be from that?

   Sorry to be pedantic but I've spent too much time in the past
trying to debug from incorrect information and want to make sure
that the information I have is correct before thinking. Please
recheck exactly what happened. Rerun with the exact input file you
emailed if that is needed.

   Barry

>
> Frank
>
>
> On 09/09/2016 12:38 PM, Barry Smith wrote:
>>   Why does ksp_view2.txt have two KSP views in it while
ksp_view1.txt has only one KSPView in it? Did you run two
different solves in the second case but not the first?
>>
>>   Barry
>>
>>
>>
>>> On Sep 9, 2016, at 10:56 AM, frank wrote:
>>>
>>> Hi,
>>>
>>> I want to continue digging into the memory problem here.
>>> I did find a workaround in the past, which is to use
fewer cores per node so that each core has 8G of memory.
However this is inefficient and expensive. I hope to locate
the place that uses the most memory.
>>>
>>> Here is a brief summary of the tests I did in the past:
 Test1:   Mesh 1536*128*384  |  Process Mesh 48*4*12
>>> Maximum (over computational time) process memory:        total 7.0727e+08
>>> Current process memory:                                  total 7.0727e+08
>>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
>>> Current space PetscMalloc()ed:                           total 1.8275e+09
>>>
 Test2:   Mesh 1536*128*384  |  Process Mesh 96*8*24
>>> Maximum (over computational time) process memory:        total 5.9431e+09
>>> Current process memory:                                  total 5.9431e+09
>>> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12
>>> Current space PetscMalloc()ed:                           total 5.4844e+09
>>>
 Test3:   Mesh 3072*256*768  |  Process Mesh 96*8*24
>>> OOM( Out Of Memory ) killer of the supercomputer
terminated the job during "KSPSolve".
>>>
>>> I attached the output of ksp_view( the third test's output is
from ksp_view_pre ), memory_view and also the petsc options.
>>>
>>> In all the tests, each core can access about 2G of memory. In
test3, there are 4223139840 non-zeros in the matrix. This will
consume about 1.74M per process, using double precision. Considering
some extra memory used to store integer indices, 2G of memory should
still be more than enough.
>>>
>>> Is there a way to find out which part of KSPSolve uses the
most memory?
>>> Thank you so much.
>>>
>>> BTW, there are 4 options that remain unused and I don't understand
why they are omitted:
>>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly
>>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
>>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1
>>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson
>>>
>>>
>>> Regards,
>>> Frank
>>>
>>> On 07/13/2016 05:47 PM, Dave May wrote:

 On 14 July 2016 at 01:07, frank wrote:
 Hi Dave,

 Sorry for the late reply.
 Thank you so much for your detailed reply.

 I have a question about the estimation of the memory usage.
There are 4223139840 allocated non-zeros and 18432 MPI processes.
Double precision is used. So the memory per process is:
   4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ?
 Did I do sth wrong here? Because this seems too small.

 No - I totally f***ed it up. You are correct. That'll teach
me for fumbling around with my iphone calculator 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-09 Thread Hengjie Wang
Hi Barry,

I checked. On the supercomputer, I had the option "-ksp_view_pre" but it is
not in the file I sent you. I am sorry for the confusion.

Regards,
Frank

On Friday, September 9, 2016, Barry Smith  wrote:

>
> > On Sep 9, 2016, at 3:11 PM, frank wrote:
> >
> > Hi Barry,
> >
> > I think the first KSP view output is from -ksp_view_pre. Before I
> submitted the test, I was not sure whether there would be OOM error or not.
> So I added both -ksp_view_pre and -ksp_view.
>
>   But the options file you sent specifically does NOT list the
> -ksp_view_pre so how could it be from that?
>
>Sorry to be pedantic but I've spent too much time in the past trying to
> debug from incorrect information and want to make sure that the information
> I have is correct before thinking. Please recheck exactly what happened.
> Rerun with the exact input file you emailed if that is needed.
>
>Barry
>
> >
> > Frank
> >
> >
> > On 09/09/2016 12:38 PM, Barry Smith wrote:
> >>   Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt
> has only one KSPView in it? Did you run two different solves in the second case
> but not the first?
> >>
> >>   Barry
> >>
> >>
> >>
> >>> On Sep 9, 2016, at 10:56 AM, frank wrote:
> >>>
> >>> Hi,
> >>>
> >>> I want to continue digging into the memory problem here.
> >>> I did find a workaround in the past, which is to use fewer cores per
> node so that each core has 8G of memory. However this is inefficient and
> expensive. I hope to locate the place that uses the most memory.
> >>>
> >>> Here is a brief summary of the tests I did in the past:
>  Test1:   Mesh 1536*128*384  |  Process Mesh 48*4*12
> >>> Maximum (over computational time) process memory:        total 7.0727e+08
> >>> Current process memory:                                  total 7.0727e+08
> >>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
> >>> Current space PetscMalloc()ed:                           total 1.8275e+09
> >>>
>  Test2:   Mesh 1536*128*384  |  Process Mesh 96*8*24
> >>> Maximum (over computational time) process memory:        total 5.9431e+09
> >>> Current process memory:                                  total 5.9431e+09
> >>> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12
> >>> Current space PetscMalloc()ed:                           total 5.4844e+09
> >>>
>  Test3:   Mesh 3072*256*768  |  Process Mesh 96*8*24
> >>> OOM( Out Of Memory ) killer of the supercomputer terminated the
> job during "KSPSolve".
> >>>
> >>> I attached the output of ksp_view( the third test's output is from
> ksp_view_pre ), memory_view and also the petsc options.
> >>>
> >>> In all the tests, each core can access about 2G of memory. In test3,
> there are 4223139840 non-zeros in the matrix. This will consume about
> 1.74M per process, using double precision. Considering some extra memory used to
> store integer indices, 2G of memory should still be more than enough.
> >>>
> >>> Is there a way to find out which part of KSPSolve uses the most memory?
> >>> Thank you so much.
> >>>
> >>> BTW, there are 4 options that remain unused and I don't understand why
> they are omitted:
> >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly
> >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
> >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1
> >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson
> >>>
> >>>
> >>> Regards,
> >>> Frank
> >>>
> >>> On 07/13/2016 05:47 PM, Dave May wrote:
> 
>  On 14 July 2016 at 01:07, frank wrote:
>  Hi Dave,
> 
>  Sorry for the late reply.
>  Thank you so much for your detailed reply.
> 
>  I have a question about the estimation of the memory usage. There are
> 4223139840 allocated non-zeros and 18432 MPI processes. Double precision is
> used. So the memory per process is:
>    4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ?
>  Did I do something wrong here? Because this seems too small.
> 
>  No - I totally f***ed it up. You are correct. That'll teach me for
> fumbling around with my iphone calculator and not using my brain. (Note
> that to convert to MB just divide by 1e6, not 1024^2 - although I
> apparently cannot convert between units correctly)
> 
>  From the PETSc objects associated with the solver, it looks like it
> _should_ run with 2GB per MPI rank. Sorry for my mistake. Possibilities
> are: somewhere in your usage of PETSc you've introduced a memory leak;
> PETSc is doing a huge over allocation (e.g. as per our discussion of
> MatPtAP); or in your application code there are other objects you have
> forgotten to log the memory for.
> 
> 
> 
>  I am running this job on Bluewater
>  I am using the 7 points FD stencil in 3D.
> 
>  I thought so on both counts.
> 
>  I apologize that I made a stupid mistake in computing the memory per
> core. With my settings, each core can access 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-09 Thread Barry Smith

> On Sep 9, 2016, at 3:11 PM, frank  wrote:
> 
> Hi Barry,
> 
> I think the first KSP view output is from -ksp_view_pre. Before I submitted 
> the test, I was not sure whether there would be OOM error or not. So I added 
> both -ksp_view_pre and -ksp_view.

  But the options file you sent specifically does NOT list the -ksp_view_pre so 
how could it be from that?

   Sorry to be pedantic but I've spent too much time in the past trying to 
debug from incorrect information and want to make sure that the information I 
have is correct before thinking. Please recheck exactly what happened. Rerun 
with the exact input file you emailed if that is needed.

   Barry

> 
> Frank
> 
> 
> On 09/09/2016 12:38 PM, Barry Smith wrote:
>>   Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt has 
>> only one KSPView in it? Did you run two different solves in the second case but 
>> not the first?
>> 
>>   Barry
>> 
>> 
>> 
>>> On Sep 9, 2016, at 10:56 AM, frank  wrote:
>>> 
>>> Hi,
>>> 
>>> I want to continue digging into the memory problem here.
>>> I did find a workaround in the past, which is to use fewer cores per node 
>>> so that each core has 8G of memory. However, this is inefficient and expensive. I 
>>> hope to locate the place that uses the most memory.
>>> 
>>> Here is a brief summary of the tests I did in the past:
 Test1:   Mesh 1536*128*384  |  Process Mesh 48*4*12
>>> Maximum (over computational time) process memory:        total 7.0727e+08
>>> Current process memory:                                  total 7.0727e+08
>>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
>>> Current space PetscMalloc()ed:                            total 1.8275e+09
>>> 
 Test2:Mesh 1536*128*384  |  Process Mesh 96*8*24
>>> Maximum (over computational time) process memory:        total 5.9431e+09
>>> Current process memory:                                  total 5.9431e+09
>>> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12
>>> Current space PetscMalloc()ed:                            total 5.4844e+09
>>> 
 Test3:Mesh 3072*256*768  |  Process Mesh 96*8*24
>>> OOM( Out Of Memory ) killer of the supercomputer terminated the job 
>>> during "KSPSolve".
>>> 
>>> I attached the output of ksp_view (the third test's output is from 
>>> ksp_view_pre), memory_view, and also the petsc options.
>>> 
>>> In all the tests, each core can access about 2G of memory. In test3, there are 
>>> 4223139840 non-zeros in the matrix. This will consume about 1.74 MB per core, 
>>> using double precision. Even considering some extra memory used to store the 
>>> integer indices, 2G of memory should still be more than enough.
>>> 
>>> Is there a way to find out which part of KSPSolve uses the most memory?
>>> Thank you so much.
>>> 
>>> BTW, there are 4 options that remain unused and I don't understand why they are 
>>> omitted:
>>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly
>>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
>>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1
>>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson
>>> 
>>> 
>>> Regards,
>>> Frank
>>> 
>>> On 07/13/2016 05:47 PM, Dave May wrote:
 
 On 14 July 2016 at 01:07, frank  wrote:
 Hi Dave,
 
 Sorry for the late reply.
 Thank you so much for your detailed reply.
 
 I have a question about the estimation of the memory usage. There are 
 4223139840 allocated non-zeros and 18432 MPI processes. Double precision 
 is used. So the memory per process is:
   4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ?
 Did I do something wrong here? Because this seems too small.
 
 No - I totally f***ed it up. You are correct. That'll teach me for 
 fumbling around with my iphone calculator and not using my brain. (Note 
 that to convert to MB just divide by 1e6, not 1024^2 - although I 
 apparently cannot convert between units correctly)
 
 From the PETSc objects associated with the solver, it looks like it 
 _should_ run with 2GB per MPI rank. Sorry for my mistake. Possibilities 
 are: somewhere in your usage of PETSc you've introduced a memory leak; 
 PETSc is doing a huge over allocation (e.g. as per our discussion of 
 MatPtAP); or in your application code there are other objects you have 
 forgotten to log the memory for.
 
 
 
 I am running this job on Bluewater
 I am using the 7 points FD stencil in 3D.
 
 I thought so on both counts.
  
 I apologize that I made a stupid mistake in computing the memory per core. 
 With my settings, each core can access only 2G of memory on average instead 
 of the 8G which I mentioned in the previous email. I re-ran the job with 8G of memory 
 per core on average and there is no "Out Of Memory" 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-09 Thread frank

Hi Barry,

I think the first KSP view output is from -ksp_view_pre. Before I 
submitted the test, I was not sure whether there would be OOM error or 
not. So I added both -ksp_view_pre and -ksp_view.


Frank


On 09/09/2016 12:38 PM, Barry Smith wrote:

   Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt has only 
one KSPView in it? Did you run two different solves in the second case but not the 
first?

   Barry




On Sep 9, 2016, at 10:56 AM, frank  wrote:

Hi,

I want to continue digging into the memory problem here.
I did find a workaround in the past, which is to use fewer cores per node so 
that each core has 8G of memory. However, this is inefficient and expensive. I hope 
to locate the place that uses the most memory.

Here is a brief summary of the tests I did in the past:

Test1:   Mesh 1536*128*384  |  Process Mesh 48*4*12

Maximum (over computational time) process memory:        total 7.0727e+08
Current process memory:                                  total 7.0727e+08
Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
Current space PetscMalloc()ed:                           total 1.8275e+09


Test2:Mesh 1536*128*384  |  Process Mesh 96*8*24

Maximum (over computational time) process memory:        total 5.9431e+09
Current process memory:                                  total 5.9431e+09
Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12
Current space PetscMalloc()ed:                           total 5.4844e+09


Test3:Mesh 3072*256*768  |  Process Mesh 96*8*24

 OOM( Out Of Memory ) killer of the supercomputer terminated the job during 
"KSPSolve".

I attached the output of ksp_view (the third test's output is from ksp_view_pre), 
memory_view, and also the petsc options.

In all the tests, each core can access about 2G of memory. In test3, there are 
4223139840 non-zeros in the matrix. This will consume about 1.74 MB per core, using 
double precision. Even considering some extra memory used to store the integer 
indices, 2G of memory should still be more than enough.

Is there a way to find out which part of KSPSolve uses the most memory?
Thank you so much.

BTW, there are 4 options that remain unused and I don't understand why they are 
omitted:
-mg_coarse_telescope_mg_coarse_ksp_type value: preonly
-mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
-mg_coarse_telescope_mg_levels_ksp_max_it value: 1
-mg_coarse_telescope_mg_levels_ksp_type value: richardson


Regards,
Frank

On 07/13/2016 05:47 PM, Dave May wrote:


On 14 July 2016 at 01:07, frank  wrote:
Hi Dave,

Sorry for the late reply.
Thank you so much for your detailed reply.

I have a question about the estimation of the memory usage. There are 
4223139840 allocated non-zeros and 18432 MPI processes. Double precision is 
used. So the memory per process is:
   4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ?
Did I do something wrong here? Because this seems too small.

No - I totally f***ed it up. You are correct. That'll teach me for fumbling 
around with my iphone calculator and not using my brain. (Note that to convert 
to MB just divide by 1e6, not 1024^2 - although I apparently cannot convert 
between units correctly)

 From the PETSc objects associated with the solver, it looks like it _should_ 
run with 2GB per MPI rank. Sorry for my mistake. Possibilities are: somewhere 
in your usage of PETSc you've introduced a memory leak; PETSc is doing a huge 
over allocation (e.g. as per our discussion of MatPtAP); or in your application 
code there are other objects you have forgotten to log the memory for.



I am running this job on Bluewater
I am using the 7 points FD stencil in 3D.

I thought so on both counts.
  


I apologize that I made a stupid mistake in computing the memory per core. With my 
settings, each core can access only 2G of memory on average instead of the 8G I mentioned 
in the previous email. I re-ran the job with 8G of memory per core on average and there is no 
"Out Of Memory" error. I will do more tests to see if there is still some 
memory issue.

Ok. I'd still like to know where the memory was being used since my estimates 
were off.


Thanks,
   Dave
  


Regards,
Frank



On 07/11/2016 01:18 PM, Dave May wrote:

Hi Frank,


On 11 July 2016 at 19:14, frank  wrote:
Hi Dave,

I re-run the test using bjacobi as the preconditioner on the coarse mesh of 
telescope. The Grid is 3072*256*768 and process mesh is 96*8*24. The petsc 
option file is attached.
I still got the "Out Of Memory" error. The error occurred before the linear 
solver finished one step. So I don't have the full info from ksp_view. The info from 
ksp_view_pre is attached.

Okay - that is essentially useless (sorry)
  


It seems to me that the error occurred when the decomposition was going to be 
changed.

Based on what information?
Running with -info would give us more clues, 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-09 Thread Barry Smith

  Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt has only 
one KSPView in it? Did you run two different solves in the second case but not the 
first? 

  Barry



> On Sep 9, 2016, at 10:56 AM, frank  wrote:
> 
> Hi,
> 
> I want to continue digging into the memory problem here.  
> I did find a workaround in the past, which is to use fewer cores per node so 
> that each core has 8G of memory. However, this is inefficient and expensive. I hope 
> to locate the place that uses the most memory.
> 
> Here is a brief summary of the tests I did in the past:
> > Test1:   Mesh 1536*128*384  |  Process Mesh 48*4*12 
> Maximum (over computational time) process memory:        total 7.0727e+08
> Current process memory:                                  total 7.0727e+08
> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
> Current space PetscMalloc()ed:                           total 1.8275e+09
> 
> > Test2:Mesh 1536*128*384  |  Process Mesh 96*8*24 
> Maximum (over computational time) process memory:        total 5.9431e+09
> Current process memory:                                  total 5.9431e+09
> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12
> Current space PetscMalloc()ed:                           total 5.4844e+09
> 
> > Test3:Mesh 3072*256*768  |  Process Mesh 96*8*24
> OOM( Out Of Memory ) killer of the supercomputer terminated the job 
> during "KSPSolve". 
> 
> I attached the output of ksp_view (the third test's output is from 
> ksp_view_pre), memory_view, and also the petsc options.
> 
> In all the tests, each core can access about 2G of memory. In test3, there are 
> 4223139840 non-zeros in the matrix. This will consume about 1.74 MB per core, 
> using double precision. Even considering some extra memory used to store the 
> integer indices, 2G of memory should still be more than enough.
> 
> Is there a way to find out which part of KSPSolve uses the most memory? 
> Thank you so much.
> 
> BTW, there are 4 options that remain unused and I don't understand why they are 
> omitted:
> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly
> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1
> -mg_coarse_telescope_mg_levels_ksp_type value: richardson
> 
> 
> Regards,
> Frank
> 
> On 07/13/2016 05:47 PM, Dave May wrote:
>> 
>> 
>> On 14 July 2016 at 01:07, frank  wrote:
>> Hi Dave,
>> 
>> Sorry for the late reply.
>> Thank you so much for your detailed reply.
>> 
>> I have a question about the estimation of the memory usage. There are 
>> 4223139840 allocated non-zeros and 18432 MPI processes. Double precision is 
>> used. So the memory per process is:
>>   4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ? 
>> Did I do something wrong here? Because this seems too small.
>> 
>> No - I totally f***ed it up. You are correct. That'll teach me for fumbling 
>> around with my iphone calculator and not using my brain. (Note that to 
>> convert to MB just divide by 1e6, not 1024^2 - although I apparently cannot 
>> convert between units correctly)
>> 
>> From the PETSc objects associated with the solver, it looks like it _should_ 
>> run with 2GB per MPI rank. Sorry for my mistake. Possibilities are: 
>> somewhere in your usage of PETSc you've introduced a memory leak; PETSc is 
>> doing a huge over allocation (e.g. as per our discussion of MatPtAP); or in 
>> your application code there are other objects you have forgotten to log the 
>> memory for.
>> 
>> 
>> 
>> I am running this job on Bluewater 
>> I am using the 7 points FD stencil in 3D. 
>> 
>> I thought so on both counts.
>>  
>> 
>> I apologize that I made a stupid mistake in computing the memory per core. 
>> With my settings, each core can access only 2G of memory on average instead of 
>> the 8G I mentioned in the previous email. I re-ran the job with 8G of memory per 
>> core on average and there is no "Out Of Memory" error. I will do more tests 
>> to see if there is still some memory issue.
>> 
>> Ok. I'd still like to know where the memory was being used since my 
>> estimates were off.
>> 
>> 
>> Thanks,
>>   Dave
>>  
>> 
>> Regards,
>> Frank
>> 
>> 
>> 
>> On 07/11/2016 01:18 PM, Dave May wrote:
>>> Hi Frank,
>>> 
>>> 
>>> On 11 July 2016 at 19:14, frank  wrote:
>>> Hi Dave,
>>> 
>>> I re-run the test using bjacobi as the preconditioner on the coarse mesh of 
>>> telescope. The Grid is 3072*256*768 and process mesh is 96*8*24. The petsc 
>>> option file is attached.
>>> I still got the "Out Of Memory" error. The error occurred before the linear 
>>> solver finished one step. So I don't have the full info from ksp_view. The 
>>> info from ksp_view_pre is attached.
>>> 
>>> Okay - that is essentially useless (sorry)
>>>  
>>> 
>>> It seems to me that the error occurred when the 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-09-09 Thread frank

Hi,

I want to continue digging into the memory problem here.
I did find a workaround in the past, which is to use fewer cores per 
node so that each core has 8G of memory. However, this is inefficient and 
expensive. I hope to locate the place that uses the most memory.


Here is a brief summary of the tests I did in the past:
> Test1:   Mesh 1536*128*384  |  Process Mesh 48*4*12
Maximum (over computational time) process memory:        total 7.0727e+08
Current process memory:                                  total 7.0727e+08
Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
Current space PetscMalloc()ed:                           total 1.8275e+09

> Test2:Mesh 1536*128*384  |  Process Mesh 96*8*24
Maximum (over computational time) process memory:        total 5.9431e+09
Current process memory:                                  total 5.9431e+09
Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12
Current space PetscMalloc()ed:                           total 5.4844e+09

> Test3:Mesh 3072*256*768  |  Process Mesh 96*8*24
OOM( Out Of Memory ) killer of the supercomputer terminated the job 
during "KSPSolve".


I attached the output of ksp_view (the third test's output is from 
ksp_view_pre), memory_view, and also the petsc options.


In all the tests, each core can access about 2G of memory. In test3, there 
are 4223139840 non-zeros in the matrix. This will consume about 1.74 MB 
per core, using double precision. Even considering some extra memory used 
to store the integer indices, 2G of memory should still be more than enough.


Is there a way to find out which part of KSPSolve uses the most memory?
Thank you so much.

BTW, there are 4 options that remain unused and I don't understand why they 
are omitted:

-mg_coarse_telescope_mg_coarse_ksp_type value: preonly
-mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
-mg_coarse_telescope_mg_levels_ksp_max_it value: 1
-mg_coarse_telescope_mg_levels_ksp_type value: richardson


Regards,
Frank

On 07/13/2016 05:47 PM, Dave May wrote:



On 14 July 2016 at 01:07, frank wrote:


Hi Dave,

Sorry for the late reply.
Thank you so much for your detailed reply.

I have a question about the estimation of the memory usage. There
are 4223139840 allocated non-zeros and 18432 MPI processes. Double
precision is used. So the memory per process is:
  4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ?
Did I do something wrong here? Because this seems too small.


No - I totally f***ed it up. You are correct. That'll teach me for 
fumbling around with my iphone calculator and not using my brain. 
(Note that to convert to MB just divide by 1e6, not 1024^2 - although 
I apparently cannot convert between units correctly)


From the PETSc objects associated with the solver, it looks like it 
_should_ run with 2GB per MPI rank. Sorry for my mistake. 
Possibilities are: somewhere in your usage of PETSc you've introduced 
a memory leak; PETSc is doing a huge over allocation (e.g. as per our 
discussion of MatPtAP); or in your application code there are other 
objects you have forgotten to log the memory for.




I am running this job on Bluewater


I am using the 7 points FD stencil in 3D.


I thought so on both counts.


I apologize that I made a stupid mistake in computing the memory
per core. With my settings, each core can access only 2G of memory
on average instead of the 8G I mentioned in the previous email. I
re-ran the job with 8G of memory per core on average and there is no
"Out Of Memory" error. I will do more tests to see if there is
still some memory issue.


Ok. I'd still like to know where the memory was being used since my 
estimates were off.



Thanks,
  Dave


Regards,
Frank



On 07/11/2016 01:18 PM, Dave May wrote:

Hi Frank,


On 11 July 2016 at 19:14, frank wrote:

Hi Dave,

I re-run the test using bjacobi as the preconditioner on the
coarse mesh of telescope. The Grid is 3072*256*768 and
process mesh is 96*8*24. The petsc option file is attached.
I still got the "Out Of Memory" error. The error occurred
before the linear solver finished one step. So I don't have
the full info from ksp_view. The info from ksp_view_pre is
attached.


Okay - that is essentially useless (sorry)


It seems to me that the error occurred when the decomposition
was going to be changed.


Based on what information?
Running with -info would give us more clues, but will create a
ton of output.
Please try running the case which failed with -info

I had another test with a grid of 1536*128*384 and the same
process mesh as above. There was no error. The ksp_view info
is attached for comparison.
Thank you.



[3] Here is my crude estimate of your memory usage.
I'll target the biggest memory hogs only to get 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-13 Thread Dave May
On 14 July 2016 at 01:07, frank  wrote:

> Hi Dave,
>
> Sorry for the late reply.
> Thank you so much for your detailed reply.
>
> I have a question about the estimation of the memory usage. There are
> 4223139840 allocated non-zeros and 18432 MPI processes. Double precision is
> used. So the memory per process is:
>   4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ?
> Did I do something wrong here? Because this seems too small.
>

No - I totally f***ed it up. You are correct. That'll teach me for fumbling
around with my iphone calculator and not using my brain. (Note that to
convert to MB just divide by 1e6, not 1024^2 - although I apparently cannot
convert between units correctly)
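
Spelling the numbers out (values only, ignoring index storage):

  4223139840 nnz * 8 bytes    = 3.38e10 bytes ~ 33.8 GB total
  3.38e10 bytes / 18432 ranks ~ 1.83e6 bytes  ~ 1.8 MB per rank

which is the same order of magnitude as your 1.74M figure (you divided by
1024^2 rather than 1e6).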

From the PETSc objects associated with the solver, it looks like it
_should_ run with 2GB per MPI rank. Sorry for my mistake. Possibilities
are: somewhere in your usage of PETSc you've introduced a memory leak;
PETSc is doing a huge over allocation (e.g. as per our discussion of
MatPtAP); or in your application code there are other objects you have
forgotten to log the memory for.



> I am running this job on Bluewater
> 
>
I am using the 7 points FD stencil in 3D.
>

I thought so on both counts.


>
> I apologize that I made a stupid mistake in computing the memory per core.
> With my settings, each core can access only 2G of memory on average instead
> of the 8G I mentioned in the previous email. I re-ran the job with 8G of memory
> per core on average and there is no "Out Of Memory" error. I will do more
> tests to see if there is still some memory issue.
>

Ok. I'd still like to know where the memory was being used since my
estimates were off.


Thanks,
  Dave


>
> Regards,
> Frank
>
>
>
> On 07/11/2016 01:18 PM, Dave May wrote:
>
> Hi Frank,
>
>
> On 11 July 2016 at 19:14, frank  wrote:
>
>> Hi Dave,
>>
>> I re-run the test using bjacobi as the preconditioner on the coarse mesh
>> of telescope. The Grid is 3072*256*768 and process mesh is 96*8*24. The
>> petsc option file is attached.
>> I still got the "Out Of Memory" error. The error occurred before the
>> linear solver finished one step. So I don't have the full info from
>> ksp_view. The info from ksp_view_pre is attached.
>>
>
> Okay - that is essentially useless (sorry)
>
>
>>
>> It seems to me that the error occurred when the decomposition was going
>> to be changed.
>>
>
> Based on what information?
> Running with -info would give us more clues, but will create a ton of
> output.
> Please try running the case which failed with -info
>
>
>> I had another test with a grid of 1536*128*384 and the same process mesh
>> as above. There was no error. The ksp_view info is attached for comparison.
>> Thank you.
>>
>
>
> [3] Here is my crude estimate of your memory usage.
> I'll target the biggest memory hogs only to get an order of magnitude
> estimate
>
> * The Fine grid operator contains 4223139840 non-zeros --> 1.8 GB per MPI
> rank assuming double precision.
> The indices for the AIJ could amount to another 0.3 GB (assuming 32 bit
> integers)
>
> * You use 5 levels of coarsening, so the other operators should represent
> (collectively)
> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4  ~ 300 MB per MPI rank on the
> communicator with 18432 ranks.
> The coarse grid should consume ~ 0.5 MB per MPI rank on the communicator
> with 18432 ranks.
>
> * You use a reduction factor of 64, making the new communicator with 288
> MPI ranks.
> PCTelescope will first gather a temporary matrix associated with your
> coarse level operator assuming a comm size of 288 living on the comm with
> size 18432.
> This matrix will require approximately 0.5 * 64 = 32 MB per core on the
> 288 ranks.
> This matrix is then used to form a new MPIAIJ matrix on the subcomm, thus
> require another 32 MB per rank.
> The temporary matrix is now destroyed.
>
> * Because a DMDA is detected, a permutation matrix is assembled.
> This requires 2 doubles per point in the DMDA.
> Your coarse DMDA contains 92 x 16 x 48 points.
> Thus the permutation matrix will require < 1 MB per MPI rank on the
> sub-comm.
>
> * Lastly, the matrix is permuted. This uses MatPtAP(), but the resulting
> operator will have the same memory footprint as the unpermuted matrix (32
> MB). At any stage in PCTelescope, only 2 operators of size 32 MB are held
> in memory when the DMDA is provided.
>
> From my rough estimates, the worst case memory foot print for any given
> core, given your options is approximately
> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB  = 2465 MB
> This is way below 8 GB.
>
> Note this estimate completely ignores:
> (1) the memory required for the restriction operator,
> (2) the potential growth in the number of non-zeros per row due to
> Galerkin coarsening (I wished -ksp_view_pre reported the output from
> MatView so we could see the number of non-zeros required by the coarse
> level operators)
> (3) all temporary vectors required by the CG 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-13 Thread Barry Smith

> On Jul 13, 2016, at 6:07 PM, frank  wrote:
> 
> Hi Dave,
> 
> Sorry for the late reply.
> Thank you so much for your detailed reply.
> 
> I have a question about the estimation of the memory usage. There are 
> 4223139840 allocated non-zeros and 18432 MPI processes. Double precision is 
> used. So the memory per process is:
>   4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ? 
> Did I do something wrong here? Because this seems too small.

   In addition to storing the non-zero values, there are several integer arrays 
that need to be stored. For each nonzero it stores the column index, so if 
integers are 4 bytes that is roughly another 1.74M/2 = 0.87M. If PetscInt is 64 bit, then the 
column indices take the same amount of space as the numerical values, 1.74M. In 
addition there are at least 7 PetscInt arrays of size mlocal, where 
mlocal is the number of rows local to the process. 
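
   As a back-of-envelope sketch (illustrative names only, ignoring PETSc's 
internal bookkeeping and any over-allocation), the per-process AIJ storage 
is roughly:

    #include <stddef.h>

    /* rough per-process AIJ footprint: values + column indices + row arrays */
    size_t aij_bytes(size_t nnz_local, size_t mlocal)
    {
      size_t values = nnz_local * sizeof(double); /* numerical entries               */
      size_t cols   = nnz_local * sizeof(int);    /* one column index per nonzero    */
      size_t rows   = 7 * mlocal * sizeof(int);   /* ~7 PetscInt arrays, size mlocal */
      return values + cols + rows;                /* use 8-byte ints for 64-bit PetscInt */
    }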
> 
> I am running this job on Bluewater
> I am using the 7 points FD stencil in 3D. 
> 
> I apologize that I made a stupid mistake in computing the memory per core. With 
> my settings, each core can access only 2G of memory on average instead of the 8G 
> I mentioned in the previous email. I re-ran the job with 8G of memory per core 
> on average and there is no "Out Of Memory" error. I will do more tests to see 
> if there is still some memory issue.
> 
> Regards,
> Frank
> 
> 
> On 07/11/2016 01:18 PM, Dave May wrote:
>> Hi Frank,
>> 
>> 
>> On 11 July 2016 at 19:14, frank  wrote:
>> Hi Dave,
>> 
>> I re-run the test using bjacobi as the preconditioner on the coarse mesh of 
>> telescope. The Grid is 3072*256*768 and process mesh is 96*8*24. The petsc 
>> option file is attached.
>> I still got the "Out Of Memory" error. The error occurred before the linear 
>> solver finished one step. So I don't have the full info from ksp_view. The 
>> info from ksp_view_pre is attached.
>> 
>> Okay - that is essentially useless (sorry)
>>  
>> 
>> It seems to me that the error occurred when the decomposition was going to 
>> be changed.
>> 
>> Based on what information?
>> Running with -info would give us more clues, but will create a ton of output.
>> Please try running the case which failed with -info
>>  
>> I had another test with a grid of 1536*128*384 and the same process mesh as 
>> above. There was no error. The ksp_view info is attached for comparison.
>> Thank you.
>> 
>> 
>> [3] Here is my crude estimate of your memory usage. 
>> I'll target the biggest memory hogs only to get an order of magnitude 
>> estimate
>> 
>> * The Fine grid operator contains 4223139840 non-zeros --> 1.8 GB per MPI 
>> rank assuming double precision.
>> The indices for the AIJ could amount to another 0.3 GB (assuming 32 bit 
>> integers)
>> 
>> * You use 5 levels of coarsening, so the other operators should represent 
>> (collectively)  
>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4  ~ 300 MB per MPI rank on the 
>> communicator with 18432 ranks.
>> The coarse grid should consume ~ 0.5 MB per MPI rank on the communicator 
>> with 18432 ranks.
>> 
>> * You use a reduction factor of 64, making the new communicator with 288 MPI 
>> ranks. 
>> PCTelescope will first gather a temporary matrix associated with your coarse 
>> level operator assuming a comm size of 288 living on the comm with size 
>> 18432. 
>> This matrix will require approximately 0.5 * 64 = 32 MB per core on the 288 
>> ranks. 
>> This matrix is then used to form a new MPIAIJ matrix on the subcomm, thus 
>> require another 32 MB per rank. 
>> The temporary matrix is now destroyed.
>> 
>> * Because a DMDA is detected, a permutation matrix is assembled. 
>> This requires 2 doubles per point in the DMDA. 
>> Your coarse DMDA contains 92 x 16 x 48 points. 
>> Thus the permutation matrix will require < 1 MB per MPI rank on the sub-comm.
>> 
>> * Lastly, the matrix is permuted. This uses MatPtAP(), but the resulting 
>> operator will have the same memory footprint as the unpermuted matrix (32 
>> MB). At any stage in PCTelescope, only 2 operators of size 32 MB are held in 
>> memory when the DMDA is provided.
>> 
>> From my rough estimates, the worst case memory foot print for any given 
>> core, given your options is approximately 
>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB  = 2465 MB
>> This is way below 8 GB.
>> 
>> Note this estimate completely ignores:
>> (1) the memory required for the restriction operator, 
>> (2) the potential growth in the number of non-zeros per row due to Galerkin 
>> coarsening (I wished -ksp_view_pre reported the output from MatView so we 
>> could see the number of non-zeros required by the coarse level operators)
>> (3) all temporary vectors required by the CG solver, and those required by 
>> the smoothers.
>> (4) internal memory allocated by MatPtAP
>> (5) memory associated with IS's used within PCTelescope
>> 
>> So either I am completely off in my estimates, or you have not carefully 
>> estimated the memory usage of your application code. 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-13 Thread frank

Hi Dave,

Sorry for the late reply.
Thank you so much for your detailed reply.

I have a question about the estimation of the memory usage. There are 
4223139840 allocated non-zeros and 18432 MPI processes. Double precision 
is used. So the memory per process is:

  4223139840 * 8bytes / 18432 / 1024 / 1024 = 1.74M ?
Did I do something wrong here? Because this seems too small.

I am running this job on Bluewater 


I am using the 7 points FD stencil in 3D.

I apologize that I made a stupid mistake in computing the memory per 
core. With my settings, each core can access only 2G of memory on average 
instead of the 8G I mentioned in the previous email. I re-ran the job with 
8G of memory per core on average and there is no "Out Of Memory" error. I 
will do more tests to see if there is still some memory issue.


Regards,
Frank


On 07/11/2016 01:18 PM, Dave May wrote:

Hi Frank,


On 11 July 2016 at 19:14, frank wrote:


Hi Dave,

I re-run the test using bjacobi as the preconditioner on the
coarse mesh of telescope. The Grid is 3072*256*768 and process
mesh is 96*8*24. The petsc option file is attached.
I still got the "Out Of Memory" error. The error occurred before
the linear solver finished one step. So I don't have the full info
from ksp_view. The info from ksp_view_pre is attached.


Okay - that is essentially useless (sorry)


It seems to me that the error occurred when the decomposition was
going to be changed.


Based on what information?
Running with -info would give us more clues, but will create a ton of 
output.

Please try running the case which failed with -info

I had another test with a grid of 1536*128*384 and the same
process mesh as above. There was no error. The ksp_view info is
attached for comparison.
Thank you.



[3] Here is my crude estimate of your memory usage.
I'll target the biggest memory hogs only to get an order of magnitude 
estimate


* The Fine grid operator contains 4223139840 non-zeros --> 1.8 GB per 
MPI rank assuming double precision.
The indices for the AIJ could amount to another 0.3 GB (assuming 32 
bit integers)


* You use 5 levels of coarsening, so the other operators should 
represent (collectively)
2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4  ~ 300 MB per MPI rank on the 
communicator with 18432 ranks.
The coarse grid should consume ~ 0.5 MB per MPI rank on the 
communicator with 18432 ranks.


* You use a reduction factor of 64, making the new communicator with 
288 MPI ranks.
PCTelescope will first gather a temporary matrix associated with your 
coarse level operator assuming a comm size of 288 living on the comm 
with size 18432.
This matrix will require approximately 0.5 * 64 = 32 MB per core on 
the 288 ranks.
This matrix is then used to form a new MPIAIJ matrix on the subcomm, 
thus require another 32 MB per rank.

The temporary matrix is now destroyed.

* Because a DMDA is detected, a permutation matrix is assembled.
This requires 2 doubles per point in the DMDA.
Your coarse DMDA contains 92 x 16 x 48 points.
Thus the permutation matrix will require < 1 MB per MPI rank on the 
sub-comm.


* Lastly, the matrix is permuted. This uses MatPtAP(), but the 
resulting operator will have the same memory footprint as the 
unpermuted matrix (32 MB). At any stage in PCTelescope, only 2 
operators of size 32 MB are held in memory when the DMDA is provided.


From my rough estimates, the worst case memory foot print for any 
given core, given your options is approximately

2100 MB + 300 MB + 32 MB + 32 MB + 1 MB  = 2465 MB
This is way below 8 GB.

Note this estimate completely ignores:
(1) the memory required for the restriction operator,
(2) the potential growth in the number of non-zeros per row due to 
Galerkin coarsening (I wished -ksp_view_pre reported the output from 
MatView so we could see the number of non-zeros required by the coarse 
level operators)
(3) all temporary vectors required by the CG solver, and those 
required by the smoothers.

(4) internal memory allocated by MatPtAP
(5) memory associated with IS's used within PCTelescope

So either I am completely off in my estimates, or you have not 
carefully estimated the memory usage of your application code. 
Hopefully others might examine/correct my rough estimates


Since I don't have your code I cannot access the latter.
Since I don't have access to the same machine you are running on, I 
think we need to take a step back.


[1] What machine are you running on? Send me a URL if its available

[2] What discretization are you using? (I am guessing a scalar 7 point 
FD stencil)
If it's a 7 point FD stencil, we should be able to examine the memory 
usage of your solver configuration using a standard, light weight 
existing PETSc example, run on your machine at the same scale.
This would hopefully enable us to correctly evaluate the actual memory 
usage required by the solver 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-12 Thread Barry Smith

> On Jul 11, 2016, at 3:18 PM, Dave May  wrote:
> 
> Hi Frank,
> 
> 
> On 11 July 2016 at 19:14, frank  wrote:
> Hi Dave,
> 
> I re-run the test using bjacobi as the preconditioner on the coarse mesh of 
> telescope. The Grid is 3072*256*768 and process mesh is 96*8*24. The petsc 
> option file is attached.
> I still got the "Out Of Memory" error. The error occurred before the linear 
> solver finished one step. So I don't have the full info from ksp_view. The 
> info from ksp_view_pre is attached.
> 
> Okay - that is essentially useless (sorry)
>  
> 
> It seems to me that the error occurred when the decomposition was going to be 
> changed.
> 
> Based on what information?
> Running with -info would give us more clues, but will create a ton of output.
> Please try running the case which failed with -info
>  
> I had another test with a grid of 1536*128*384 and the same process mesh as 
> above. There was no error. The ksp_view info is attached for comparison.
> Thank you.
> 
> 
> [3] Here is my crude estimate of your memory usage. 
> I'll target the biggest memory hogs only to get an order of magnitude estimate
> 
> * The Fine grid operator contains 4223139840 non-zeros --> 1.8 GB per MPI 
> rank assuming double precision.
> The indices for the AIJ could amount to another 0.3 GB (assuming 32 bit 
> integers)
> 
> * You use 5 levels of coarsening, so the other operators should represent 
> (collectively)  
> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4  ~ 300 MB per MPI rank on the 
> communicator with 18432 ranks.
> The coarse grid should consume ~ 0.5 MB per MPI rank on the communicator with 
> 18432 ranks.
> 
> * You use a reduction factor of 64, making the new communicator with 288 MPI 
> ranks. 
> PCTelescope will first gather a temporary matrix associated with your coarse 
> level operator assuming a comm size of 288 living on the comm with size 
> 18432. 
> This matrix will require approximately 0.5 * 64 = 32 MB per core on the 288 
> ranks. 
> This matrix is then used to form a new MPIAIJ matrix on the subcomm, thus 
> require another 32 MB per rank. 
> The temporary matrix is now destroyed.
> 
> * Because a DMDA is detected, a permutation matrix is assembled. 
> This requires 2 doubles per point in the DMDA. 
> Your coarse DMDA contains 92 x 16 x 48 points. 
> Thus the permutation matrix will require < 1 MB per MPI rank on the sub-comm.
> 
> * Lastly, the matrix is permuted. This uses MatPtAP(), but the resulting 
> operator will have the same memory footprint as the unpermuted matrix (32 MB).

  Dave,

   MatPtAP has to generate some work space. Is it possible the "guess" it uses 
for needed work space is so absurdly (and unnecessarily) large that it triggers 
a memory issue? Is it possible that other places that require "guesses" for 
work space produce a problem? Also, are all the "guesses" properly -info logged 
so that we can detect them before the program is killed?
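
   For context, that work-space "guess" enters through the fill argument of 
MatPtAP; a minimal sketch of the call site (A and P standing in for the fine 
operator and the interpolation):

    Mat C;
    /* fill is the guessed ratio nnz(C)/(nnz(A)+nnz(P)); it sizes the
       symbolic work space, so a wildly large guess inflates memory use */
    ierr = MatPtAP(A,P,MAT_INITIAL_MATRIX,2.0,&C);CHKERRQ(ierr);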


  Barry


> At any stage in PCTelescope, only 2 operators of size 32 MB are held in 
> memory when the DMDA is provided.
> 
> From my rough estimates, the worst case memory foot print for any given core, 
> given your options is approximately 
> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB  = 2465 MB
> This is way below 8 GB.
> 
> Note this estimate completely ignores:
> (1) the memory required for the restriction operator, 
> (2) the potential growth in the number of non-zeros per row due to Galerkin 
> coarsening (I wished -ksp_view_pre reported the output from MatView so we 
> could see the number of non-zeros required by the coarse level operators)
> (3) all temporary vectors required by the CG solver, and those required by 
> the smoothers.
> (4) internal memory allocated by MatPtAP
> (5) memory associated with IS's used within PCTelescope
> 
> So either I am completely off in my estimates, or you have not carefully 
> estimated the memory usage of your application code. Hopefully others might 
> examine/correct my rough estimates
> 
> Since I don't have your code I cannot access the latter.
> Since I don't have access to the same machine you are running on, I think we 
> need to take a step back.
> 
> [1] What machine are you running on? Send me a URL if its available
> 
> [2] What discretization are you using? (I am guessing a scalar 7 point FD 
> stencil)
> If it's a 7 point FD stencil, we should be able to examine the memory usage 
> of your solver configuration using a standard, light weight existing PETSc 
> example, run on your machine at the same scale. 
> This would hopefully enable us to correctly evaluate the actual memory usage 
> required by the solver configuration you are using.
> 
> Thanks,
>   Dave
>  
> 
> 
> Frank
> 
> 
> 
> 
> On 07/08/2016 10:38 PM, Dave May wrote:
>> 
>> 
>> On Saturday, 9 July 2016, frank  wrote:
>> Hi Barry and Dave,
>> 
>> Thank both of you for the advice.
>> 
>> @Barry
>> I made a mistake in the file names in last email. I 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-11 Thread Dave May
Hi Frank,


On 11 July 2016 at 19:14, frank  wrote:

> Hi Dave,
>
> I re-run the test using bjacobi as the preconditioner on the coarse mesh
> of telescope. The Grid is 3072*256*768 and process mesh is 96*8*24. The
> petsc option file is attached.
> I still got the "Out Of Memory" error. The error occurred before the
> linear solver finished one step. So I don't have the full info from
> ksp_view. The info from ksp_view_pre is attached.
>

Okay - that is essentially useless (sorry)


>
> It seems to me that the error occurred when the decomposition was going to
> be changed.
>

Based on what information?
Running with -info would give us more clues, but will create a ton of
output.
Please try running the case which failed with -info


> I had another test with a grid of 1536*128*384 and the same process mesh
> as above. There was no error. The ksp_view info is attached for comparison.
> Thank you.
>


[3] Here is my crude estimate of your memory usage.
I'll target the biggest memory hogs only to get an order of magnitude
estimate

* The Fine grid operator contains 4223139840 non-zeros --> 1.8 GB per MPI
rank assuming double precision.
The indices for the AIJ could amount to another 0.3 GB (assuming 32 bit
integers)

* You use 5 levels of coarsening, so the other operators should represent
(collectively)
2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4  ~ 300 MB per MPI rank on the
communicator with 18432 ranks.
The coarse grid should consume ~ 0.5 MB per MPI rank on the communicator
with 18432 ranks.

* You use a reduction factor of 64, making the new communicator with 288
MPI ranks.
PCTelescope will first gather a temporary matrix associated with your
coarse level operator assuming a comm size of 288 living on the comm with
size 18432.
This matrix will require approximately 0.5 * 64 = 32 MB per core on the 288
ranks.
This matrix is then used to form a new MPIAIJ matrix on the subcomm, thus
requiring another 32 MB per rank.
The temporary matrix is now destroyed.

* Because a DMDA is detected, a permutation matrix is assembled.
This requires 2 doubles per point in the DMDA.
Your coarse DMDA contains 92 x 16 x 48 points.
Thus the permutation matrix will require < 1 MB per MPI rank on the
sub-comm.

* Lastly, the matrix is permuted. This uses MatPtAP(), but the resulting
operator will have the same memory footprint as the unpermuted matrix (32
MB). At any stage in PCTelescope, only 2 operators of size 32 MB are held
in memory when the DMDA is provided.

From my rough estimates, the worst case memory footprint for any given
core, given your options is approximately
2100 MB + 300 MB + 32 MB + 32 MB + 1 MB  = 2465 MB
This is way below 8 GB.

Note this estimate completely ignores:
(1) the memory required for the restriction operator,
(2) the potential growth in the number of non-zeros per row due to Galerkin
coarsening (I wished -ksp_view_pre reported the output from MatView so we
could see the number of non-zeros required by the coarse level operators)
(3) all temporary vectors required by the CG solver, and those required by
the smoothers.
(4) internal memory allocated by MatPtAP
(5) memory associated with IS's used within PCTelescope
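
Regarding (2): if you want the actual non-zero counts and memory of each
level operator, something along these lines should work (a sketch only; it
assumes you can grab the PCMG object, and note MatGetInfo's memory field is
not filled in by every matrix type):

  PetscInt nlevels, l;
  ierr = PCMGGetLevels(pc,&nlevels);CHKERRQ(ierr);
  for (l = 0; l < nlevels; l++) {
    KSP smoother; Mat A; MatInfo info;
    ierr = PCMGGetSmoother(pc,l,&smoother);CHKERRQ(ierr);
    ierr = KSPGetOperators(smoother,&A,NULL);CHKERRQ(ierr);
    ierr = MatGetInfo(A,MAT_GLOBAL_SUM,&info);CHKERRQ(ierr); /* summed over ranks */
    ierr = PetscPrintf(PETSC_COMM_WORLD,"level %D: nz_used %g, memory %g bytes\n",
                       l,(double)info.nz_used,(double)info.memory);CHKERRQ(ierr);
  }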

So either I am completely off in my estimates, or you have not carefully
estimated the memory usage of your application code. Hopefully others might
examine/correct my rough estimates

Since I don't have your code I cannot access the latter.
Since I don't have access to the same machine you are running on, I think
we need to take a step back.

[1] What machine are you running on? Send me a URL if its available

[2] What discretization are you using? (I am guessing a scalar 7 point FD
stencil)
If it's a 7 point FD stencil, we should be able to examine the memory usage
of your solver configuration using a standard, light weight existing PETSc
example, run on your machine at the same scale.
This would hopefully enable us to correctly evaluate the actual memory
usage required by the solver configuration you are using.
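
For instance, something along these lines (a sketch, assuming the KSP
tutorial ex45, which is a 3D FD Laplacian on a DMDA, plus your existing
options file):

  mpiexec -n 18432 ./ex45 -da_grid_x 3072 -da_grid_y 256 -da_grid_z 768 \
      -options_file petsc_options.txt -memory_view -log_view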

Thanks,
  Dave


>
>
> Frank
>
>
>
>
> On 07/08/2016 10:38 PM, Dave May wrote:
>
>
>
> On Saturday, 9 July 2016, frank  wrote:
>
>> Hi Barry and Dave,
>>
>> Thank both of you for the advice.
>>
>> @Barry
>> I made a mistake in the file names in last email. I attached the correct
>> files this time.
>> For all the three tests, 'Telescope' is used as the coarse preconditioner.
>>
>> == Test1:   Grid: 1536*128*384,   Process Mesh: 48*4*12
>> Part of the memory usage:  Vector   125  124  3971904  0.
>>                             Matrix   101  101  9462372  0
>>
>> == Test2: Grid: 1536*128*384,   Process Mesh: 96*8*24
>> Part of the memory usage:  Vector   125  124  681672   0.
>>                             Matrix   101  101  1462180  0.
>>
>> In theory, the memory usage in Test1 should be 8 times that of Test2. In my
>> case, it is about 6 times.
>>
>> == Test3: Grid: 3072*256*768,   

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-08 Thread Barry Smith

  Frank,

I don't think we yet have enough information to figure out what is going on.

 Can you please run test1 but on the larger number of processes? Our 
goal is to determine how the memory usage scales as you increase the mesh size 
with a fixed number of processes (from test 2 to test 3), so it is better to see 
the memory usage in test 1 with the same number of processes as test 2.



> On Jul 8, 2016, at 8:05 PM, frank  wrote:
> 
> Hi Barry and Dave,
> 
> Thank both of you for the advice.
> 
> @Barry
> I made a mistake in the file names in last email. I attached the correct 
> files this time.
> For all the three tests, 'Telescope' is used as the coarse preconditioner.
> 
> == Test1:   Grid: 1536*128*384,   Process Mesh: 48*4*12
> Part of the memory usage:  Vector   125  124  3971904  0.
>                             Matrix   101  101  9462372  0
> 
> == Test2: Grid: 1536*128*384,   Process Mesh: 96*8*24
> Part of the memory usage:  Vector   125  124  681672   0.
>                             Matrix   101  101  1462180  0.
> 
> In theory, the memory usage in Test1 should be 8 times that of Test2. In my case, 
> it is about 6 times.
> 
> == Test3: Grid: 3072*256*768,   Process Mesh: 96*8*24. Sub-domain per 
> process: 32*32*32
> Here I get the out of memory error.

   Please re-send us all the output from this failed case.

> 
> I tried to use -mg_coarse jacobi. In this way, I don't need to set 
> -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right?
> The linear solver didn't work in this case. Petsc output some errors.

  You had better set the options you want because the default options may not be 
what you want. 

   But it is possible that using jacobi on the coarse grid will result in 
failed convergence, so I don't recommend it; better to use the defaults.

  The one thing I noted is that PETSc requests allocations much larger than are 
actually used (compare the maximum process memory to the maximum petscmalloc 
memory) in the test 1 and test 2 cases (likely because in the Galerkin RAR' 
process it doesn't know how much memory it will actually need). Normally these 
large requested allocations do no harm because it never actually needs to 
allocate all the memory pages for the full request. 

  Barry


> 
> @Dave
> In test3, I use only one instance of 'Telescope'. On the coarse mesh of 
> 'Telescope', I used LU as the preconditioner instead of SVD.
> If I set the levels correctly, then on the last coarse mesh of MG where it 
> calls 'Telescope', the sub-domain per process is 2*2*2.
> On the last coarse mesh of 'Telescope', there is only one grid point per 
> process.
> I still got the OOM error. The detailed petsc option file is attached.
> 
> 
> Thank you so much.
> 
> Frank
> 
> 
> 
> On 07/06/2016 02:51 PM, Barry Smith wrote:
>>> On Jul 6, 2016, at 4:19 PM, frank  wrote:
>>> 
>>> Hi Barry,
>>> 
>>> Thank you for your advice.
>>> I tried three tests. In the 1st test, the grid is 3072*256*768 and the 
>>> process mesh is 96*8*24.
>>> The linear solver is 'cg' the preconditioner is 'mg' and 'telescope' is 
>>> used as the preconditioner at the coarse mesh.
>>> The system gives me the "Out of Memory" error before the linear system is 
>>> completely solved.
>>> The info from '-ksp_view_pre' is attached. It seems to me that the error 
>>> occurs when it reaches the coarse mesh.
>>> 
>>> The 2nd test uses a grid of 1536*128*384 and process mesh is 96*8*24. The 
>>> 3rd test uses the same grid but a different process mesh 48*4*12.
>>Are you sure this is right? The total matrix and vector memory usage goes 
>> from 2nd test
>>   Vector   384  383  8,193,712 0.
>>   Matrix   103  103 11,508,688 0.
>> to 3rd test
>>   Vector   384  383  1,590,520 0.
>>   Matrix   103  103  3,508,664 0.
>> that is the memory usage got smaller but if you have only 1/8th the 
>> processes and the same grid it should have gotten about 8 times bigger. Did 
>> you maybe cut the grid by a factor of 8 also? If so that still doesn't 
>> explain it because the memory usage changed by a factor of 5 something for 
>> the vectors and 3 something for the matrices.
>> 
>> 
>>> The linear solver and petsc options in 2nd and 3rd tests are the same in 
>>> 1st test. The linear solver works fine in both tests.
>>> I attached the memory usage of the 2nd and 3rd tests. The memory info is 
>>> from the option '-log_summary'. I tried to use '-momery_info' as you 
>>> suggested, but in my case petsc treated it as an unused option. It output 
>>> nothing about the memory. Do I need to add something to my code so I can use 
>>> '-memory_info'?
>>Sorry, my mistake the option is -memory_view
>> 
>>   Can you run the one case with -memory_view and -mg_coarse jacobi 
>> -ksp_max_it 1 (just so it doesn't 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-08 Thread frank

Hi Barry and Dave,

Thank both of you for the advice.

@Barry
I made a mistake in the file names in last email. I attached the correct 
files this time.

For all the three tests, 'Telescope' is used as the coarse preconditioner.

== Test1:   Grid: 1536*128*384,   Process Mesh: 48*4*12
Part of the memory usage:  Vector   125  124  3971904  0.
                           Matrix   101  101  9462372  0


== Test2: Grid: 1536*128*384,   Process Mesh: 96*8*24
Part of the memory usage:  Vector   125  124  681672   0.
                           Matrix   101  101  1462180  0.


In theory, the memory usage in Test1 should be 8 times that of Test2. In my 
case, it is about 6 times.


== Test3: Grid: 3072*256*768,   Process Mesh: 96*8*24. Sub-domain per 
process: 32*32*32

Here I get the out of memory error.

I tried to use -mg_coarse jacobi. In this way, I don't need to set 
-mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right?

The linear solver didn't work in this case. Petsc output some errors.

@Dave
In test3, I use only one instance of 'Telescope'. On the coarse mesh of 
'Telescope', I used LU as the preconditioner instead of SVD.
If I set the levels correctly, then on the last coarse mesh of MG where 
it calls 'Telescope', the sub-domain per process is 2*2*2.
On the last coarse mesh of 'Telescope', there is only one grid point per 
process.

I still got the OOM error. The detailed petsc option file is attached.


Thank you so much.

Frank



On 07/06/2016 02:51 PM, Barry Smith wrote:

On Jul 6, 2016, at 4:19 PM, frank  wrote:

Hi Barry,

Thank you for your advice.
I tried three tests. In the 1st test, the grid is 3072*256*768 and the process 
mesh is 96*8*24.
The linear solver is 'cg' the preconditioner is 'mg' and 'telescope' is used as 
the preconditioner at the coarse mesh.
The system gives me the "Out of Memory" error before the linear system is 
completely solved.
The info from '-ksp_view_pre' is attached. It seems to me that the error occurs 
when it reaches the coarse mesh.

The 2nd test uses a grid of 1536*128*384 and process mesh is 96*8*24. The 3rd 
test uses the same grid but a different process mesh 48*4*12.

Are you sure this is right? The total matrix and vector memory usage goes 
from 2nd test
   Vector   384  383  8,193,712 0.
   Matrix   103  103 11,508,688 0.
to 3rd test
   Vector   384  383  1,590,520 0.
   Matrix   103  103  3,508,664 0.
that is the memory usage got smaller but if you have only 1/8th the processes 
and the same grid it should have gotten about 8 times bigger. Did you maybe cut 
the grid by a factor of 8 also? If so that still doesn't explain it because the 
memory usage changed by a factor of 5 something for the vectors and 3 something 
for the matrices.



The linear solver and petsc options in 2nd and 3rd tests are the same in 1st 
test. The linear solver works fine in both tests.
I attached the memory usage of the 2nd and 3rd tests. The memory info is from 
the option '-log_summary'. I tried to use '-momery_info' as you suggested, but 
in my case petsc treated it as an unused option. It output nothing about the 
memory. Do I need to add something to my code so I can use '-memory_info'?

Sorry, my mistake the option is -memory_view

   Can you run the one case with -memory_view and -mg_coarse jacobi -ksp_max_it 
1 (just so it doesn't iterate forever) to see how much memory is used without 
the telescope? Also run case 2 the same way.

   Barry




In both tests the memory usage is not large.

It seems to me that it might be the 'telescope'  preconditioner that allocated 
a lot of memory and caused the error in the 1st test.
Is there is a way to show how much memory it allocated?

Frank

On 07/05/2016 03:37 PM, Barry Smith wrote:

   Frank,

 You can run with -ksp_view_pre to have it "view" the KSP before the solve 
so hopefully it gets that far.

  Please run the problem that does fit with -memory_info; when the problem completes 
it will show the "high water mark" for PETSc allocated memory and total memory 
used. We first want to look at these numbers to see if it is using more memory than you 
expect. You could also run with, say, half the grid spacing to see how the memory usage 
scales with the increase in grid points. Make the runs also with -log_view and send all 
the output from these options.

Barry


On Jul 5, 2016, at 5:23 PM, frank  wrote:

Hi,

I am using the CG ksp solver and Multigrid preconditioner  to solve a linear 
system in parallel.
I chose to use the 'Telescope' as the preconditioner on the coarse mesh for its 
good performance.
The petsc options file is attached.

The domain is a 3d box.
It works well when the grid is  1536*128*384 and the process mesh is 96*8*24. When I 
double the size of grid and keep the same 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-07 Thread Dave May
Hi Frank,

On 6 July 2016 at 00:23, frank  wrote:

> Hi,
>
> I am using the CG ksp solver and Multigrid preconditioner  to solve a
> linear system in parallel.
> I chose to use the 'Telescope' as the preconditioner on the coarse mesh
> for its good performance.
> The petsc options file is attached.
>
> The domain is a 3d box.
> It works well when the grid is  1536*128*384 and the process mesh is
> 96*8*24. When I double the size of grid and keep the same process mesh and
> petsc options, I get an "out of memory" error from the super-cluster I am
> using.
>

When you increase the mesh resolution, did you also increase the number
of effective MG levels?
If the number of levels was held constant, then your coarse grid is
increasing in size.
I notice that your coarsest grid solver is PCSVD.
This can become expensive as PCSVD will convert your coarse level
operator into a dense matrix and could be the cause of your OOM error.

Telescope does have to store a couple of temporary matrices, but generally
when used in the context of multigrid coarse level solves these operators
represent a very small fraction of the fine level operator.

We need to isolate if it's these temporary matrices from telescope causing
the OOM error, or if they are caused by something else (e.g. PCSVD).
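
As a rough rule of thumb: a coarse problem with N unknowns costs PCSVD
about 8*N^2 bytes for the dense matrix alone (plus SVD work space), e.g.

  N = 10^4  -->  8e8  bytes ~ 0.8 GB
  N = 10^5  -->  8e10 bytes ~ 80 GB

so a coarse grid that looks small by fine-grid standards can still blow
the per-rank memory budget once it is densified.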



> Each process has access to at least 8G memory, which should be more than
> enough for my application. I am sure that all the other parts of my code(
> except the linear solver ) do not use much memory. So I doubt if there is
> something wrong with the linear solver.
> The error occurs before the linear system is completely solved so I don't
> have the info from ksp view. I am not able to re-produce the error with a
> smaller problem either.
> In addition,  I tried to use the block jacobi as the preconditioner with
> the same grid and same decomposition. The linear solver runs extremely slow
> but there is no memory error.
>
> How can I diagnose what exactly cause the error?
>

This is going to be kinda hard as I notice your configuration uses nested
calls to telescope.
You need to debug the solver configuration.

The only way I know to do this is by invoking telescope one step at a time.
By this I mean: use telescope once, check the configuration is what you
want, then add the next instance of telescope.
For solver debugging purposes, get rid of PCSVD.
The constant null space is propagated with telescope, so you can just use an
iterative method.
Furthermore, for debugging purposes you don't care about the solve time or
even convergence, so set -ksp_max_it 1 everywhere in your solver stack
(e.g. the outermost KSP and on the coarsest level).

If one instance of telescope works, e.g. no OOM error occurs, add the next
instance of telescope.
If two instances of telescope also work (no OOM), revert back to PCSVD.
If you now get an OOM error, you should consider adding more levels, or
getting rid of PCSVD as your coarse grid solver.
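For illustration, here is a minimal sketch of an options file for the first
debugging step. The option names are assumptions pieced together from this
thread (outer CG/MG with one telescope instance on the coarse level); verify
the actual prefixes in your own stack with -ksp_view and -options_left 1:

  # Step 1: one telescope instance, no PCSVD, convergence not required
  -ksp_type cg
  -ksp_max_it 1
  -pc_type mg
  -pc_mg_levels 4
  -mg_coarse_pc_type telescope
  -mg_coarse_telescope_ksp_type richardson
  -mg_coarse_telescope_ksp_max_it 1
  -mg_coarse_telescope_pc_type jacobi

If this runs without an OOM error, add the next telescope instance as
described above and repeat.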

Lastly, the option

-repart_da_processors_x 24

has been deprecated.
It now inherits the prefix from the solver running on the sub-communicator.
For your use case, it should be something like
  -mg_coarse_telescope_repart_da_processors_x 24
Use -options_left 1 to verify the option is getting picked up (another
useful tool for solver config debugging).
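For example (with a hypothetical executable name):

  mpiexec -n 18432 ./my_app -mg_coarse_telescope_repart_da_processors_x 24 -options_left 1

If the option is misspelled or not inherited the way you expect, -options_left
will list it as unused at the end of the run.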


Cheers
  Dave



> Thank you so much.
>
> Frank
>


Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-06 Thread Barry Smith

> On Jul 6, 2016, at 4:19 PM, frank  wrote:
> 
> Hi Barry,
> 
> Thank you for your advice.
> I tried three tests. In the 1st test, the grid is 3072*256*768 and the process 
> mesh is 96*8*24.
> The linear solver is 'cg', the preconditioner is 'mg', and 'telescope' is used 
> as the preconditioner at the coarse mesh.
> The system gives me the "Out of Memory" error before the linear system is 
> completely solved.
> The info from '-ksp_view_pre' is attached. It seems to me that the error 
> occurs when it reaches the coarse mesh.
> 
> The 2nd test uses a grid of 1536*128*384 and the process mesh is 96*8*24. The 3rd 
> test uses the same grid but a different process mesh, 48*4*12.

   Are you sure this is right? The total matrix and vector memory usage goes 
from 2nd test 
  Vector   384            383      8,193,712     0.
  Matrix   103            103     11,508,688     0.
to 3rd test
  Vector   384            383      1,590,520     0.
  Matrix   103            103      3,508,664     0.
that is, the memory usage got smaller; but if you have only 1/8th the processes 
and the same grid, it should have gotten about 8 times bigger. Did you maybe cut 
the grid by a factor of 8 also? If so, that still doesn't explain it, because the 
memory usage changed by a factor of 5 something for the vectors and 3 something 
for the matrices.
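(In numbers: the reported vector memory dropped from 8,193,712 to 1,590,520 
bytes, a factor of about 5.2, and the matrix memory from 11,508,688 to 
3,508,664 bytes, a factor of about 3.3, whereas with 1/8th the processes and 
the same grid one would expect roughly an 8x increase, and with the grid also 
cut by a factor of 8 roughly no change.)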


> The linear solver and petsc options in the 2nd and 3rd tests are the same as in the 1st 
> test. The linear solver works fine in both tests.
> I attached the memory usage of the 2nd and 3rd tests. The memory info is from 
> the option '-log_summary'. I tried to use '-memory_info' as you suggested, 
> but in my case petsc treated it as an unused option. It output nothing about 
> the memory. Do I need to add something to my code so I can use '-memory_info'?

   Sorry, my mistake: the option is -memory_view 

  Can you run the one case with -memory_view and -mg_coarse_pc_type jacobi -ksp_max_it 
1 (just so it doesn't iterate forever) to see how much memory is used without 
the telescope? Also run case 2 the same way.
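For instance (an illustrative launch line; substitute your own MPI launcher, 
process count, and executable name):

  mpiexec -n 18432 ./my_app -memory_view -mg_coarse_pc_type jacobi -ksp_max_it 1 -log_view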

  Barry



> In both tests the memory usage is not large.
> 
> It seems to me that it might be the 'telescope' preconditioner that 
> allocated a lot of memory and caused the error in the 1st test.
> Is there a way to show how much memory it allocated?
> 
> Frank
> 
> On 07/05/2016 03:37 PM, Barry Smith wrote:
>>   Frank,
>> 
>> You can run with -ksp_view_pre to have it "view" the KSP before the 
>> solve so hopefully it gets that far.
>> 
>>  Please run the problem that does fit with -memory_info; when the problem 
>> completes it will show the "high water mark" for PETSc allocated memory and 
>> total memory used. We first want to look at these numbers to see if it is 
>> using more memory than you expect. You could also run with, say, half the grid 
>> spacing to see how the memory usage scales with the increase in grid points. 
>> Make the runs also with -log_view and send all the output from these options.
>> 
>>Barry
>> 
>>> On Jul 5, 2016, at 5:23 PM, frank  wrote:
>>> 
>>> Hi,
>>> 
>>> I am using the CG ksp solver and Multigrid preconditioner  to solve a 
>>> linear system in parallel.
>>> I chose to use the 'Telescope' as the preconditioner on the coarse mesh for 
>>> its good performance.
>>> The petsc options file is attached.
>>> 
>>> The domain is a 3d box.
>>> It works well when the grid is 1536*128*384 and the process mesh is 
>>> 96*8*24. When I double the size of the grid and keep the same process mesh and 
>>> petsc options, I get an "out of memory" error from the super-cluster I am 
>>> using.
>>> Each process has access to at least 8G memory, which should be more than 
>>> enough for my application. I am sure that all the other parts of my code 
>>> (except the linear solver) do not use much memory. So I suspect there is 
>>> something wrong with the linear solver.
>>> The error occurs before the linear system is completely solved so I don't 
>>> have the info from ksp view. I am not able to reproduce the error with a 
>>> smaller problem either.
>>> In addition, I tried to use block jacobi as the preconditioner with 
>>> the same grid and same decomposition. The linear solver runs extremely slow 
>>> but there is no memory error.
>>> 
>>> How can I diagnose what exactly causes the error?
>>> Thank you so much.
>>> 
>>> Frank
>>> 
> 
> 



Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-06 Thread frank

Hi Barry,

Thank you for your advice.
I tried three tests. In the 1st test, the grid is 3072*256*768 and the 
process mesh is 96*8*24.
The linear solver is 'cg', the preconditioner is 'mg', and 'telescope' is 
used as the preconditioner at the coarse mesh.
The system gives me the "Out of Memory" error before the linear system 
is completely solved.
The info from '-ksp_view_pre' is attached. It seems to me that the error 
occurs when it reaches the coarse mesh.


The 2nd test uses a grid of 1536*128*384 and the process mesh is 96*8*24. 
The 3rd test uses the same grid but a different process mesh, 48*4*12.
The linear solver and petsc options in the 2nd and 3rd tests are the same as 
in the 1st test. The linear solver works fine in both tests.
I attached the memory usage of the 2nd and 3rd tests. The memory info is 
from the option '-log_summary'. I tried to use '-memory_info' as you 
suggested, but in my case petsc treated it as an unused option. It 
output nothing about the memory. Do I need to add something to my code so I 
can use '-memory_info'?

In both tests the memory usage is not large.

It seems to me that it might be the 'telescope' preconditioner that 
allocated a lot of memory and caused the error in the 1st test.

Is there a way to show how much memory it allocated?

Frank

On 07/05/2016 03:37 PM, Barry Smith wrote:

   Frank,

 You can run with -ksp_view_pre to have it "view" the KSP before the solve 
so hopefully it gets that far.

  Please run the problem that does fit with -memory_info; when the problem completes 
it will show the "high water mark" for PETSc allocated memory and total memory 
used. We first want to look at these numbers to see if it is using more memory than you 
expect. You could also run with, say, half the grid spacing to see how the memory usage 
scales with the increase in grid points. Make the runs also with -log_view and send all 
the output from these options.

Barry


On Jul 5, 2016, at 5:23 PM, frank  wrote:

Hi,

I am using the CG ksp solver and Multigrid preconditioner  to solve a linear 
system in parallel.
I chose to use the 'Telescope' as the preconditioner on the coarse mesh for its 
good performance.
The petsc options file is attached.

The domain is a 3d box.
It works well when the grid is 1536*128*384 and the process mesh is 96*8*24. When I 
double the size of the grid and keep the same process mesh and petsc options, I get an 
"out of memory" error from the super-cluster I am using.
Each process has access to at least 8G memory, which should be more than enough 
for my application. I am sure that all the other parts of my code (except the 
linear solver) do not use much memory. So I suspect there is something wrong 
with the linear solver.
The error occurs before the linear system is completely solved so I don't have 
the info from ksp view. I am not able to reproduce the error with a smaller 
problem either.
In addition, I tried to use block jacobi as the preconditioner with the 
same grid and same decomposition. The linear solver runs extremely slow but 
there is no memory error.

How can I diagnose what exactly causes the error?
Thank you so much.

Frank



KSP Object: 18432 MPI processes
  type: cg
  maximum iterations=1
  tolerances:  relative=1e-07, absolute=1e-50, divergence=1.
  left preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 18432 MPI processes
  type: mg
  PC has not been set up so information may be incomplete
MG: type is MULTIPLICATIVE, levels=4 cycles=v
  Cycles per PCApply=1
  Using Galerkin computed coarse grid matrices
  Coarse grid solver -- level ---
KSP Object:(mg_coarse_) 18432 MPI processes
  type: preonly
  maximum iterations=1, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using DEFAULT norm type for convergence test
PC Object:(mg_coarse_) 18432 MPI processes
  type: redundant
  PC has not been set up so information may be incomplete
Redundant preconditioner: Not yet setup
  Down solver (pre-smoother) on level 1 ---
KSP Object:(mg_levels_1_) 18432 MPI processes
  type: chebyshev
Chebyshev: eigenvalue estimates:  min = 0., max = 0.
  maximum iterations=2, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using NONE norm type for convergence test
PC Object:(mg_levels_1_) 18432 MPI processes
  type: sor
  PC has not been set up so information may be incomplete
SOR: type = local_symmetric, iterations = 1, local iterations = 1, 
omega = 1.
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 ---
KSP Object:(mg_levels_2_) 18432 MPI processes
 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-05 Thread Barry Smith

  Frank,

You can run with -ksp_view_pre to have it "view" the KSP before the solve 
so hopefully it gets that far.

 Please run the problem that does fit with -memory_info; when the problem 
completes it will show the "high water mark" for PETSc allocated memory and 
total memory used. We first want to look at these numbers to see if it is using 
more memory than you expect. You could also run with, say, half the grid spacing 
to see how the memory usage scales with the increase in grid points. Make the 
runs also with -log_view and send all the output from these options.
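For instance (an illustrative launch line with a hypothetical executable name; 
note that, as Barry corrects elsewhere in this thread, the working option name 
is -memory_view rather than -memory_info):

  mpiexec -n 18432 ./my_app -ksp_view_pre -memory_view -log_view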

   Barry

> On Jul 5, 2016, at 5:23 PM, frank  wrote:
> 
> Hi,
> 
> I am using the CG ksp solver and Multigrid preconditioner  to solve a linear 
> system in parallel.
> I chose to use the 'Telescope' as the preconditioner on the coarse mesh for 
> its good performance.
> The petsc options file is attached.
> 
> The domain is a 3d box.
> It works well when the grid is 1536*128*384 and the process mesh is 96*8*24. 
> When I double the size of the grid and keep the same process mesh and petsc 
> options, I get an "out of memory" error from the super-cluster I am using.
> Each process has access to at least 8G memory, which should be more than 
> enough for my application. I am sure that all the other parts of my code 
> (except the linear solver) do not use much memory. So I suspect there is 
> something wrong with the linear solver.
> The error occurs before the linear system is completely solved so I don't 
> have the info from ksp view. I am not able to reproduce the error with a 
> smaller problem either.
> In addition, I tried to use block jacobi as the preconditioner with the 
> same grid and same decomposition. The linear solver runs extremely slow but 
> there is no memory error.
> 
> How can I diagnose what exactly causes the error?
> Thank you so much.
> 
> Frank
>