[petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-08 Thread Dave May
On Saturday, 9 July 2016, frank wrote:

> Hi Barry and Dave,
>
> Thank both of you for the advice.
>
> @Barry
> I made a mistake in the file names in the last email. I attached the correct
> files this time.
> For all the three tests, 'Telescope' is used as the coarse preconditioner.
>
> == Test1:   Grid: 1536*128*384,   Process Mesh: 48*4*12
> Part of the memory usage:  Vector   125   124   3971904   0.
>                            Matrix   101   101   9462372   0.
>
> == Test2: Grid: 1536*128*384,   Process Mesh: 96*8*24
> Part of the memory usage:  Vector   125   124   681672    0.
>                            Matrix   101   101   1462180   0.
>
> In theory, the memory usage in Test1 should be 8 times that of Test2. In my
> case, it is about 6 times.
>
> == Test3: Grid: 3072*256*768,   Process Mesh: 96*8*24. Sub-domain per
> process: 32*32*32
> Here I get the out of memory error.
>
> I tried to use -mg_coarse jacobi. In this way, I don't need to set
> -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right?
> The linear solver didn't work in this case. PETSc output some errors.
>
> @Dave
> In test3, I use only one instance of 'Telescope'. On the coarse mesh of
> 'Telescope', I used LU as the preconditioner instead of SVD.
> If I set the levels correctly, then on the last coarse mesh of MG where
> it calls 'Telescope', the sub-domain per process is 2*2*2.
> On the last coarse mesh of 'Telescope', there is only one grid point per
> process.
> I still got the OOM error. The detailed PETSc options file is attached.


Do you understand the expected memory usage for the particular parallel
LU implementation you are using? I don't (seriously). Replace LU with
bjacobi and re-run this test. My point about solver debugging is still
valid.

And please send the result of KSPView so we can see what is actually used
in the computations.
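
As an illustrative sketch only (Frank's attached options file is not reproduced here, and the exact prefixes depend on how Telescope is nested in it), assuming Telescope sits on the MG coarse level and runs multigrid itself, the swap suggested above plus the requested view might look like:

    # hypothetical options-file excerpt; prefixes depend on the actual setup
    -mg_coarse_pc_type telescope
    # replace the parallel LU on Telescope's own coarse level with block Jacobi
    -mg_coarse_telescope_mg_coarse_pc_type bjacobi
    # report the solver configuration actually used
    -ksp_view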

Thanks
  Dave


>
>
> Thank you so much.
>
> Frank
>
>
>
> On 07/06/2016 02:51 PM, Barry Smith wrote:
>
>> On Jul 6, 2016, at 4:19 PM, frank  wrote:
>>>
>>> Hi Barry,
>>>
>>> Thank you for your advice.
>>> I tried three tests. In the 1st test, the grid is 3072*256*768 and the
>>> process mesh is 96*8*24.
>>> The linear solver is 'cg', the preconditioner is 'mg', and 'telescope' is
>>> used as the preconditioner on the coarse mesh.
>>> The system gives me the "Out of Memory" error before the linear system
>>> is completely solved.
>>> The info from '-ksp_view_pre' is attached. It seems to me that the error
>>> occurs when it reaches the coarse mesh.
>>>
>>> The 2nd test uses a grid of 1536*128*384 and process mesh is 96*8*24.
>>> The 3rd test uses the same grid but a different process mesh 48*4*12.
>>>
>> Are you sure this is right? The total matrix and vector memory usage
>> goes from 2nd test
>>Vector   384383  8,193,712 0.
>>Matrix   103103 11,508,688 0.
>> to 3rd test
>>   Vector   384383  1,590,520 0.
>>Matrix   103103  3,508,664 0.
>> that is, the memory usage got smaller, but if you have only 1/8th the
>> processes and the same grid it should have gotten about 8 times bigger. Did
>> you maybe cut the grid by a factor of 8 also? If so that still doesn't
>> explain it because the memory usage changed by a factor of 5 something for
>> the vectors and 3 something for the matrices.
>>
>>
>>> The linear solver and PETSc options in the 2nd and 3rd tests are the same as in
>>> the 1st test. The linear solver works fine in both tests.
>>> I attached the memory usage of the 2nd and 3rd tests. The memory info is
>>> from the option '-log_summary'. I tried to use '-memory_info' as you
>>> suggested, but in my case PETSc treated it as an unused option. It output
>>> nothing about the memory. Do I need to add something to my code so I can use
>>> '-memory_info'?
>>>
>> Sorry, my mistake; the option is -memory_view.
>>
>>Can you run the one case with -memory_view and -mg_coarse jacobi
>> -ksp_max_it 1 (just so it doesn't iterate forever) to see how much memory
>> is used without the telescope? Also run case 2 the same way.
>>
>>Barry
>>
>>
>>
>>> In both tests the memory usage is not large.
>>>
>>> It seems to me that it might be the 'telescope' preconditioner that
>>> allocated a lot of memory and caused the error in the 1st test.
>>> Is there a way to show how much memory it allocated?
>>>
>>> Frank
>>>
>>> On 07/05/2016 03:37 PM, Barry Smith wrote:
>>>
Frank,

  You can run with -ksp_view_pre to have it "view" the KSP before
 the solve so hopefully it gets that far.

   Please run the problem that does fit with -memory_info; when the
 problem completes it will show the "high water mark" for PETSc allocated
 memory and total memory used. We first want to look at these numbers to see

Re: [petsc-users] Need help: Poisson's equation with complex number

2016-07-08 Thread Barry Smith

  I would start with -pc_type gamg and -ksp_type gmres and see how many iterations 
it requires and how the number of iterations grows when you refine the mesh 
(if life is good then the iterations will grow only moderately as you refine 
the mesh). If these options result in very bad convergence then send us the 
output with -ksp_monitor_true_residual and we'll have to consider other options.
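
For illustration only, a minimal launch along these lines (the executable name and process count are placeholders, and PETSc must be configured with complex scalars, e.g. --with-scalar-type=complex, to handle a complex coefficient matrix):

    mpiexec -n 4 ./my_solver -ksp_type gmres -pc_type gamg \
        -ksp_monitor_true_residual -ksp_converged_reason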

  Barry



> On Jul 8, 2016, at 6:59 PM, Yaoyu Hu  wrote:
> 
> Hi everyone,
> 
> I am now trying to solve a partial differential equation which is
> similar to the three-dimensional Poisson’s equation but with complex
> numbers. The equation is the result of the transformation of a set of
> fluid dynamic equations from the time domain to the frequency domain. I have
> Dirichlet boundary conditions all over the boundaries. The coefficient
> matrix obtained by the finite volume method (with a collocated grid) is
> made of complex numbers. I would like to know, for my discretized
> equation, which solver and PC are the most suitable to work with. By the
> way, the solution should be done in parallel with about 10^4 - 10^6
> unknowns.
> 
> It is the first time for me to solve equations with complex numbers,
> and I am not so good at mathematics involving complex numbers. I
> would like to know what I should bear in mind throughout the whole
> process. Any suggestions or comments are appreciated.
> 
> Thanks!
> 
> HU Yaoyu



Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-08 Thread Barry Smith

  Frank,

I don't think we yet have enough information to figure out what is going on.

Can you please run test 1 but on the larger number of processes? Our 
goal is to determine how the memory usage scales as you increase the mesh size 
with a fixed number of processes (from test 2 to test 3), so it is better to see 
the memory usage in test 1 with the same number of processes as test 2.
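
Purely as a sketch of what such a re-run might look like (the executable and options-file names are placeholders; 96*8*24 = 18432 ranks):

    mpiexec -n 18432 ./my_solver -options_file petsc_options.txt \
        -memory_view -log_view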



> On Jul 8, 2016, at 8:05 PM, frank  wrote:
> 
> Hi Barry and Dave,
> 
> Thank both of you for the advice.
> 
> @Barry
> I made a mistake in the file names in the last email. I attached the correct 
> files this time.
> For all the three tests, 'Telescope' is used as the coarse preconditioner.
> 
> == Test1:   Grid: 1536*128*384,   Process Mesh: 48*4*12
> Part of the memory usage:  Vector   125   124   3971904   0.
>                            Matrix   101   101   9462372   0.
> 
> == Test2: Grid: 1536*128*384,   Process Mesh: 96*8*24
> Part of the memory usage:  Vector   125   124   681672    0.
>                            Matrix   101   101   1462180   0.
> 
> In theory, the memory usage in Test1 should be 8 times that of Test2. In my case, 
> it is about 6 times.
> 
> == Test3: Grid: 3072*256*768,   Process Mesh: 96*8*24. Sub-domain per 
> process: 32*32*32
> Here I get the out of memory error.

   Please re-send us all the output from this failed case.

> 
> I tried to use -mg_coarse jacobi. In this way, I don't need to set 
> -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right?
> The linear solver didn't work in this case. PETSc output some errors.

  You had better set the options you want because the default options may not be 
what you want. 

   But it is possible that using jacobi on the coarse grid will result in 
failed convergence, so I don't recommend it; better to use the defaults.
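
To spell that out, a hedged sketch of setting the coarse-level solver explicitly using the standard PCMG coarse prefix, rather than relying on an abbreviated '-mg_coarse jacobi':

    -mg_coarse_ksp_type preonly
    -mg_coarse_pc_type jacobi
    # or omit both lines to keep the PCMG defaults recommended above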

  The one thing I noted is that PETSc requests allocations much larger than are 
actually used (compare the maximum process memory to the maximum petscmalloc 
memory) in the test 1 and test 2 cases (likely because in the Galerkin RAR' 
process it doesn't know how much memory it will actually need). Normally these 
large requested allocations do no harm because it never actually needs to 
allocate all the memory pages for the full request. 

  Barry


> 
> @Dave
> In test3, I use only one instance of 'Telescope'. On the coarse mesh of 
> 'Telescope', I used LU as the preconditioner instead of SVD.
> If I set the levels correctly, then on the last coarse mesh of MG where it 
> calls 'Telescope', the sub-domain per process is 2*2*2.
> On the last coarse mesh of 'Telescope', there is only one grid point per 
> process.
> I still got the OOM error. The detailed PETSc options file is attached.
> 
> 
> Thank you so much.
> 
> Frank
> 
> 
> 
> On 07/06/2016 02:51 PM, Barry Smith wrote:
>>> On Jul 6, 2016, at 4:19 PM, frank  wrote:
>>> 
>>> Hi Barry,
>>> 
>>> Thank you for your advice.
>>> I tried three tests. In the 1st test, the grid is 3072*256*768 and the 
>>> process mesh is 96*8*24.
>>> The linear solver is 'cg', the preconditioner is 'mg', and 'telescope' is 
>>> used as the preconditioner on the coarse mesh.
>>> The system gives me the "Out of Memory" error before the linear system is 
>>> completely solved.
>>> The info from '-ksp_view_pre' is attached. It seems to me that the error 
>>> occurs when it reaches the coarse mesh.
>>> 
>>> The 2nd test uses a grid of 1536*128*384 and process mesh is 96*8*24. The 
>>> 3rd test uses the same grid but a different process mesh 48*4*12.
>>Are you sure this is right? The total matrix and vector memory usage goes 
>> from 2nd test
>>   Vector   384383  8,193,712 0.
>>   Matrix   103103 11,508,688 0.
>> to 3rd test
>>  Vector   384383  1,590,520 0.
>>   Matrix   103103  3,508,664 0.
>> that is, the memory usage got smaller, but if you have only 1/8th the 
>> processes and the same grid it should have gotten about 8 times bigger. Did 
>> you maybe cut the grid by a factor of 8 also? If so that still doesn't 
>> explain it because the memory usage changed by a factor of 5 something for 
>> the vectors and 3 something for the matrices.
>> 
>> 
>>> The linear solver and PETSc options in the 2nd and 3rd tests are the same as in 
>>> the 1st test. The linear solver works fine in both tests.
>>> I attached the memory usage of the 2nd and 3rd tests. The memory info is 
>>> from the option '-log_summary'. I tried to use '-memory_info' as you 
>>> suggested, but in my case PETSc treated it as an unused option. It output 
>>> nothing about the memory. Do I need to add something to my code so I can use 
>>> '-memory_info'?
>> Sorry, my mistake; the option is -memory_view.
>> 
>>   Can you run the one case with -memory_view and -mg_coarse jacobi 
>> -ksp_max_it 1 (just so it doesn't 

Re: [petsc-users] Question about memory usage in Multigrid preconditioner

2016-07-08 Thread frank

Hi Barry and Dave,

Thank both of you for the advice.

@Barry
I made a mistake in the file names in the last email. I attached the correct 
files this time.

For all the three tests, 'Telescope' is used as the coarse preconditioner.

== Test1:   Grid: 1536*128*384,   Process Mesh: 48*4*12
Part of the memory usage:  Vector   125   124   3971904   0.
                           Matrix   101   101   9462372   0.


== Test2: Grid: 1536*128*384,   Process Mesh: 96*8*24
Part of the memory usage:  Vector   125   124   681672    0.
                           Matrix   101   101   1462180   0.


In theory, the memory usage in Test1 should be 8 times that of Test2. In my 
case, it is about 6 times.


== Test3: Grid: 3072*256*768,   Process Mesh: 96*8*24. Sub-domain per 
process: 32*32*32

Here I get the out of memory error.

I tried to use -mg_coarse jacobi. In this way, I don't need to set 
-mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right?

The linear solver didn't work in this case. PETSc output some errors.

@Dave
In test3, I use only one instance of 'Telescope'. On the coarse mesh of 
'Telescope', I used LU as the preconditioner instead of SVD.
If I set the levels correctly, then on the last coarse mesh of MG where 
it calls 'Telescope', the sub-domain per process is 2*2*2.
On the last coarse mesh of 'Telescope', there is only one grid point per 
process.

I still got the OOM error. The detailed PETSc options file is attached.


Thank you so much.

Frank



On 07/06/2016 02:51 PM, Barry Smith wrote:

On Jul 6, 2016, at 4:19 PM, frank  wrote:

Hi Barry,

Thank you for your advice.
I tried three tests. In the 1st test, the grid is 3072*256*768 and the process 
mesh is 96*8*24.
The linear solver is 'cg', the preconditioner is 'mg', and 'telescope' is used as 
the preconditioner on the coarse mesh.
The system gives me the "Out of Memory" error before the linear system is 
completely solved.
The info from '-ksp_view_pre' is attached. It seems to me that the error occurs 
when it reaches the coarse mesh.

The 2nd test uses a grid of 1536*128*384 and process mesh is 96*8*24. The 3rd 
test uses the same grid but a different process mesh 48*4*12.

Are you sure this is right? The total matrix and vector memory usage goes 
from 2nd test
   Vector   384383  8,193,712 0.
   Matrix   103103 11,508,688 0.
to 3rd test
  Vector   384383  1,590,520 0.
   Matrix   103103  3,508,664 0.
that is, the memory usage got smaller, but if you have only 1/8th the processes 
and the same grid it should have gotten about 8 times bigger. Did you maybe cut 
the grid by a factor of 8 also? If so that still doesn't explain it because the 
memory usage changed by a factor of 5 something for the vectors and 3 something 
for the matrices.



The linear solver and PETSc options in the 2nd and 3rd tests are the same as in the 1st 
test. The linear solver works fine in both tests.
I attached the memory usage of the 2nd and 3rd tests. The memory info is from 
the option '-log_summary'. I tried to use '-memory_info' as you suggested, but 
in my case PETSc treated it as an unused option. It output nothing about the 
memory. Do I need to add something to my code so I can use '-memory_info'?

Sorry, my mistake; the option is -memory_view.

   Can you run the one case with -memory_view and -mg_coarse jacobi -ksp_max_it 
1 (just so it doesn't iterate forever) to see how much memory is used without 
the telescope? Also run case 2 the same way.

   Barry




In both tests the memory usage is not large.

It seems to me that it might be the 'telescope' preconditioner that allocated 
a lot of memory and caused the error in the 1st test.
Is there a way to show how much memory it allocated?

Frank

On 07/05/2016 03:37 PM, Barry Smith wrote:

   Frank,

 You can run with -ksp_view_pre to have it "view" the KSP before the solve 
so hopefully it gets that far.

  Please run the problem that does fit with -memory_info; when the problem completes 
it will show the "high water mark" for PETSc allocated memory and total memory 
used. We first want to look at these numbers to see if it is using more memory than you 
expect. You could also run with, say, half the grid spacing to see how the memory usage 
scales with the increase in grid points. Make the runs also with -log_view and send all 
the output from these options.

Barry


On Jul 5, 2016, at 5:23 PM, frank  wrote:

Hi,

I am using the CG KSP solver and a multigrid preconditioner to solve a linear 
system in parallel.
I chose to use 'Telescope' as the preconditioner on the coarse mesh because of its 
good performance.
The PETSc options file is attached.

The domain is a 3d box.
It works well when the grid is 1536*128*384 and the process mesh is 96*8*24. When I 
double the size of the grid and keep the same 

[petsc-users] Need help: Poisson's equation with complex number

2016-07-08 Thread Yaoyu Hu
Hi everyone,

I am now trying to solve a partial differential equation which is
similar to the three-dimensional Poisson’s equation but with complex
numbers. The equation is the result of the transformation of a set of
fluid dynamic equations from the time domain to the frequency domain. I have
Dirichlet boundary conditions all over the boundaries. The coefficient
matrix obtained by the finite volume method (with a collocated grid) is
made of complex numbers. I would like to know, for my discretized
equation, which solver and PC are the most suitable to work with. By the
way, the solution should be done in parallel with about 10^4 - 10^6
unknowns.

It is the first time for me to solve equations with complex numbers,
and I am not so good at mathematics involving complex numbers. I
would like to know what I should bear in mind throughout the whole
process. Any suggestions or comments are appreciated.

Thanks!

HU Yaoyu


Re: [petsc-users] Are performance benchmarks available?

2016-07-08 Thread Mark Adams
This would be a good idea.
Please use SNES ex56 and send me the '-info | grep GAMG' result and the
-log_view output, so that I can check that it looks OK.
Thanks,
Mark
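
A hypothetical run recipe along these lines (assuming a PETSc tree of that era where ex56 lives under src/snes/examples/tutorials; the process count and problem-size options are placeholders):

    cd $PETSC_DIR/src/snes/examples/tutorials
    make ex56
    # add the example's usual problem-size/GAMG options to the line below
    mpiexec -n 8 ./ex56 -log_view -info 2>&1 | tee ex56.log
    grep GAMG ex56.log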

On Thu, Jul 7, 2016 at 7:25 PM, Barry Smith  wrote:

>
>While I agree that having this type of information available would be
> very useful, it is surprisingly difficult to do this and keep it up to date,
> plus we have little time to do it, so unfortunately we don't have anything
> like this.
>
>We should do this! Perhaps pick one or two problems and run them with,
> say, a simple preconditioner like ASM and then GAMG on a large problem with
> a couple of different numbers of processes, say 1, 32 and 256, then run them
> once a month to confirm they remain the same performance-wise and make the
> performance numbers available on the web. Maybe using Mark's ex56.c case.
>
>I'll try to set something up
>
>Barry
>
>
> Always a big pain to try to automate the running on those damn batch
> systems!
>
>
>
>
>
> > On Jun 30, 2016, at 9:20 AM, Faraz Hussain 
> wrote:
> >
> > I am wondering if there are benchmarks available that I can solve on my
> cluster to compare performance? I want to compare how scaling up to 240
> cores compares to large models already solved on an optimized
> configuration and hardware.
> >
>
>
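
Sketching the recurring ASM-vs-GAMG comparison described above as a simple shell loop (the executable, options, and process counts are placeholders, not an actual PETSc benchmark harness):

    for np in 1 32 256; do
      mpiexec -n $np ./ex56 -pc_type asm  -log_view > asm_np${np}.log
      mpiexec -n $np ./ex56 -pc_type gamg -log_view > gamg_np${np}.log
    done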


Re: [petsc-users] Reordering rows of parallel matrix across processors

2016-07-08 Thread Cyrill Vonplanta
While trying to make a small example to reproduce the problem, I figured out my mistake in 
the code (totally unrelated to the question). MatPermute(...) works just fine.

My apologies.
Cyrill

From: Matthew Knepley
Date: Thursday, 7 July 2016 at 16:48
To: von Planta Cyrill
Cc: "petsc-users@mcs.anl.gov"
Subject: Re: [petsc-users] Reordering rows of parallel matrix across processors

On Thu, Jul 7, 2016 at 3:37 AM, Cyrill Vonplanta wrote:
Dear all,

I would like to reorder the rows of a matrix across processors. Is this 
possible with MatPermute(…)?

Yes, this works with MatPermute().

Could you send this small example so I can reproduce it?

To illustrate, here is how an index set would look for a matrix with M=35 
on 2 CPUs. Amongst other things I intend to swap the first and last row here.

[0] Number of indices in set 24
[0] 0 34
[0] 1 1
[0] 2 2
[0] 3 3
[0] 4 4
[0] 5 5
[0] 6 6
[0] 7 7
[0] 8 15
[0] 9 16
[0] 10 11
[0] 11 8
[0] 12 10
[0] 13 21
[0] 14 9
[0] 15 12
[0] 16 13
[0] 17 14
[0] 18 17
[0] 19 18
[0] 20 19
[0] 21 20
[0] 22 22
[0] 23 23
[1] Number of indices in set 11
[1] 0 24
[1] 1 25
[1] 2 26
[1] 3 27
[1] 4 28
[1] 5 29
[1] 6 30
[1] 7 31
[1] 8 32
[1] 9 33
[1] 10 0

Instead of exchanging the first and last row, it seems to replace them with 
zeros only.
If this can’t be done with MatPermute, how could it be done?

You could also use MatGetSubMatrix().
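
Not Cyrill's actual code, but a minimal, self-contained sketch of the pattern discussed here: build a small parallel AIJ matrix, have each rank supply its owned portion of a permutation index set that swaps the first and last global rows, and call MatPermute(). The matrix and its size are made up for illustration, and error checking is omitted for brevity.

    /* sketch: swap global rows/columns 0 and M-1 of a parallel AIJ matrix */
    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Mat      A, B;
      IS       perm;
      PetscInt M = 35, rstart, rend, n, i, *idx;

      PetscInitialize(&argc, &argv, NULL, NULL);

      /* simple M x M tridiagonal test matrix, distributed over all ranks */
      MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, M, M, 3, NULL, 2, NULL, &A);
      MatGetOwnershipRange(A, &rstart, &rend);
      for (i = rstart; i < rend; i++) {
        MatSetValue(A, i, i, 2.0, INSERT_VALUES);
        if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
        if (i < M - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
      }
      MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
      MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

      /* each rank lists, for its owned rows, which old global row goes there:
         row 0 takes old row M-1, row M-1 takes old row 0, the rest stay put */
      n = rend - rstart;
      PetscMalloc1(n, &idx);
      for (i = 0; i < n; i++) {
        PetscInt row = rstart + i;
        if (row == 0)          idx[i] = M - 1;
        else if (row == M - 1) idx[i] = 0;
        else                   idx[i] = row;
      }
      ISCreateGeneral(PETSC_COMM_WORLD, n, idx, PETSC_OWN_POINTER, &perm);
      ISSetPermutation(perm);

      MatPermute(A, perm, perm, &B);   /* same permutation for rows and columns */
      MatView(B, PETSC_VIEWER_STDOUT_WORLD);

      ISDestroy(&perm);
      MatDestroy(&A);
      MatDestroy(&B);
      PetscFinalize();
      return 0;
    }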

  Thanks,

Matt

Thanks
Cyrill




--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener