Should we have some simple selection of default algorithms based on problem size and number of processes? For example, if more than 1000 processes are used, switch to the scalable version, etc.? How would we decide on the parameter values?

Barry
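One possible shape for such a heuristic, purely as a sketch: the 1000-rank threshold is an arbitrary placeholder (choosing it is exactly the open question above), the two options are the ones Dave suggests below, and nothing here is an existing PETSc interface.

    #include <petscsys.h>

    /* Sketch only: switch to the memory-scalable PtAP variant on large communicators. */
    static PetscErrorCode SetPtAPDefaultsByCommSize(MPI_Comm comm)
    {
      PetscMPIInt    size;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = MPI_Comm_size(comm,&size);CHKERRQ(ierr);
      if (size > 1000) {  /* placeholder threshold */
        /* do not explicitly assemble R = P^T */
        ierr = PetscOptionsSetValue(NULL,"-matrap","0");CHKERRQ(ierr);
        /* select the memory-scalable P^T.A.P implementation */
        ierr = PetscOptionsSetValue(NULL,"-matptap_scalable",NULL);CHKERRQ(ierr);
      }
      PetscFunctionReturn(0);
    }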
> On Sep 15, 2016, at 5:35 AM, Dave May <dave.mayhe...@gmail.com> wrote:
>
> Hi all,
>
> I think the only unexpected memory usage I can see is associated with the call to MatPtAP().
> Here is something you can try immediately. Run your code with the additional options -matrap 0 -matptap_scalable
>
> I didn't realize this before, but the default behaviour of MatPtAP in parallel is actually to explicitly form the transpose of P (i.e. assemble R = P^T) and then compute R.A.P. You don't want to do this. The option -matrap 0 resolves this issue.
>
> The implementation of P^T.A.P has two variants. The scalable implementation (with respect to memory usage) is selected via the second option -matptap_scalable.
>
> Try it out - I see a significant memory reduction using these options for particular mesh sizes / partitions.
>
> I've attached a cleaned up version of the code you sent me. There were a number of memory leaks and other issues. The main points being
> * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End}
> * You should call PetscFinalize(), otherwise the option -log_summary (-log_view) will not display anything once the program has completed.
>
> Thanks,
> Dave
>
> On 15 September 2016 at 08:03, Hengjie Wang <hengj...@uci.edu> wrote:
> Hi Dave,
>
> Sorry, I should have put more comments in to explain the code. The number of processes in each dimension is the same: Px = Py = Pz = P. So is the domain size. So if you want to run the code on a 512^3 grid with 16^3 cores, you need to set "-N 512 -P 16" on the command line. I added more comments and also fixed an error in the attached code. (The error only affects the accuracy of the solution, not the memory usage.)
>
> Thank you.
> Frank
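For concreteness, the reproduction run Frank describes, combined with the options Dave suggests above, would be launched roughly like this. The executable name is only a guess based on the attached test_ksp.F90, and the launcher (mpiexec here) will differ by machine; 4096 = 16^3 ranks.

    mpiexec -n 4096 ./test_ksp -N 512 -P 16 -matrap 0 -matptap_scalable -log_view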
> On 9/14/2016 9:05 PM, Dave May wrote:
>> On Thursday, 15 September 2016, Dave May <dave.mayhe...@gmail.com> wrote:
>> On Thursday, 15 September 2016, frank <hengj...@uci.edu> wrote:
>> Hi,
>>
>> I wrote a simple code to reproduce the error. I hope this can help to diagnose the problem. The code just solves a 3D Poisson equation.
>>
>> Why is the stencil width a runtime parameter?? And why is the default value 2? For 7-pnt FD Laplace, you only need a stencil width of 1.
>>
>> Was this choice made to mimic something in the real application code?
>>
>> Please ignore - I misunderstood your usage of the param set by -P
>>
>> I ran the code on a 1024^3 mesh. The process partition is 32 * 32 * 32. That's when I reproduced the OOM error. Each core has about 2G memory.
>> I also ran the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp solver works fine.
>> I attached the code, ksp_view_pre's output and my petsc option file.
>>
>> Thank you.
>> Frank
>>
>> On 09/09/2016 06:38 PM, Hengjie Wang wrote:
>>> Hi Barry,
>>>
>>> I checked. On the supercomputer, I had the option "-ksp_view_pre" but it is not in the file I sent you. I am sorry for the confusion.
>>>
>>> Regards,
>>> Frank
>>>
>>> On Friday, September 9, 2016, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>> > On Sep 9, 2016, at 3:11 PM, frank <hengj...@uci.edu> wrote:
>>> >
>>> > Hi Barry,
>>> >
>>> > I think the first KSP view output is from -ksp_view_pre. Before I submitted the test, I was not sure whether there would be an OOM error or not. So I added both -ksp_view_pre and -ksp_view.
>>>
>>> But the options file you sent specifically does NOT list the -ksp_view_pre so how could it be from that?
>>>
>>> Sorry to be pedantic, but I've spent too much time in the past trying to debug from incorrect information and want to make sure that the information I have is correct before thinking. Please recheck exactly what happened. Rerun with the exact input file you emailed if that is needed.
>>>
>>> Barry
>>>
>>> > Frank
>>> >
>>> > On 09/09/2016 12:38 PM, Barry Smith wrote:
>>> >> Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt has only one KSPView in it? Did you run two different solves in the second case but not in the first?
>>> >>
>>> >> Barry
>>> >>
>>> >>> On Sep 9, 2016, at 10:56 AM, frank <hengj...@uci.edu> wrote:
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> I want to continue digging into the memory problem here. I did find a workaround in the past, which is to use fewer cores per node so that each core has 8G memory. However, this is inefficient and expensive. I hope to locate the place that uses the most memory.
>>> >>>
>>> >>> Here is a brief summary of the tests I did in the past:
>>> >>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12
>>> >>> Maximum (over computational time) process memory: total 7.0727e+08
>>> >>> Current process memory: total 7.0727e+08
>>> >>> Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
>>> >>> Current space PetscMalloc()ed: total 1.8275e+09
>>> >>>
>>> >>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24
>>> >>> Maximum (over computational time) process memory: total 5.9431e+09
>>> >>> Current process memory: total 5.9431e+09
>>> >>> Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12
>>> >>> Current space PetscMalloc()ed: total 5.4844e+09
>>> >>>
>>> >>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24
>>> >>> The OOM (Out Of Memory) killer of the supercomputer terminated the job during "KSPSolve".
>>> >>>
>>> >>> I attached the output of ksp_view (the third test's output is from ksp_view_pre), memory_view and also the petsc options.
>>> >>>
>>> >>> In all the tests, each core can access about 2G memory. In test3, there are 4223139840 non-zeros in the matrix. They will consume about 1.74 MB per process, using double precision. Considering some extra memory used to store the integer indices, 2G memory should still be more than enough.
>>> >>>
>>> >>> Is there a way to find out which part of KSPSolve uses the most memory? Thank you so much.
>>> >>>
>>> >>> BTW, there are 4 options that remain unused and I don't understand why they are omitted:
>>> >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly
>>> >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
>>> >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1
>>> >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson
>>> >>>
>>> >>> Regards,
>>> >>> Frank
>>> >>>
>>> >>> On 07/13/2016 05:47 PM, Dave May wrote:
>>> >>>> On 14 July 2016 at 01:07, frank <hengj...@uci.edu> wrote:
>>> >>>> Hi Dave,
>>> >>>>
>>> >>>> Sorry for the late reply. Thank you so much for your detailed reply.
>>> >>>>
>>> >>>> I have a question about the estimation of the memory usage. There are 4223139840 allocated non-zeros and 18432 MPI processes. Double precision is used. So the memory per process is:
>>> >>>> 4223139840 * 8 bytes / 18432 / 1024 / 1024 = 1.74M ?
>>> >>>> Did I do something wrong here? Because this seems too small.
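Spelling that arithmetic out with the thread's own numbers (assuming a perfectly even distribution of non-zeros over the ranks):

    4223139840 nonzeros / 18432 ranks           = 229,120 nonzeros per rank
    229,120 * 8 bytes (double-precision values) ~ 1.8 MB per rank (the 1.74M figure above, in MiB)
    229,120 * 4 bytes (32-bit column indices)   ~ 0.9 MB per rank

So the fine-grid matrix itself accounts for only a few MB per rank.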
>>> >>>> No - I totally f***ed it up. You are correct. That'll teach me for fumbling around with my iPhone calculator and not using my brain. (Note that to convert to MB just divide by 1e6, not 1024^2 - although I apparently cannot convert between units correctly....)
>>> >>>>
>>> >>>> From the PETSc objects associated with the solver, it looks like it _should_ run with 2 GB per MPI rank. Sorry for my mistake. Possibilities are: somewhere in your usage of PETSc you've introduced a memory leak; PETSc is doing a huge over-allocation (e.g. as per our discussion of MatPtAP); or in your application code there are other objects you have forgotten to log the memory for.
>>> >>>>
>>> >>>> I am running this job on Blue Waters. I am using the 7-point FD stencil in 3D.
>>> >>>>
>>> >>>> I thought so on both counts.
>>> >>>>
>>> >>>> I apologize that I made a stupid mistake in computing the memory per core. With my settings, each core can access only 2G of memory on average, instead of the 8G I mentioned in my previous email. I re-ran the job with 8G of memory per core on average and there is no "Out Of Memory" error. I will do more tests to see if there is still some memory issue.
>>> >>>>
>>> >>>> Ok. I'd still like to know where the memory was being used since my estimates were off.
>>> >>>>
>>> >>>> Thanks,
>>> >>>> Dave
>>> >>>>
>>> >>>> Regards,
>>> >>>> Frank
>>> >>>>
>>> >>>> On 07/11/2016 01:18 PM, Dave May wrote:
>>> >>>>> Hi Frank,
>>> >>>>>
>>> >>>>> On 11 July 2016 at 19:14, frank <hengj...@uci.edu> wrote:
>>> >>>>> Hi Dave,
>>> >>>>>
>>> >>>>> I re-ran the test using bjacobi as the preconditioner on the coarse mesh of telescope. The grid is 3072*256*768 and the process mesh is 96*8*24. The petsc option file is attached.
>>> >>>>> I still got the "Out Of Memory" error. The error occurred before the linear solver finished one step, so I don't have the full info from ksp_view. The info from ksp_view_pre is attached.
>>> >>>>>
>>> >>>>> Okay - that is essentially useless (sorry)
>>> >>>>>
>>> >>>>> It seems to me that the error occurred when the decomposition was going to be changed.
>>> >>>>>
>>> >>>>> Based on what information?
>>> >>>>> Running with -info would give us more clues, but will create a ton of output. Please try running the case which failed with -info.
>>> >>>>>
>>> >>>>> I had another test with a grid of 1536*128*384 and the same process mesh as above. There was no error. The ksp_view info is attached for comparison.
>>> >>>>> Thank you.
>>> >>>>>
>>> >>>>> [3] Here is my crude estimate of your memory usage. I'll target the biggest memory hogs only, to get an order of magnitude estimate.
>>> >>>>>
>>> >>>>> * The fine grid operator contains 4223139840 non-zeros --> 1.8 GB per MPI rank assuming double precision. The indices for the AIJ could amount to another 0.3 GB (assuming 32-bit integers).
>>> >>>>>
>>> >>>>> * You use 5 levels of coarsening, so the other operators should represent (collectively)
>>> >>>>> 2.1/8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB per MPI rank on the communicator with 18432 ranks.
>>> >>>>> The coarse grid should consume ~ 0.5 MB per MPI rank on the communicator with 18432 ranks.
>>> >>>>> * You use a reduction factor of 64, making the new communicator with 288 MPI ranks.
>>> >>>>> PCTelescope will first gather a temporary matrix associated with your coarse level operator, assuming a comm size of 288, living on the comm with size 18432. This matrix will require approximately 0.5 * 64 = 32 MB per core on the 288 ranks. This matrix is then used to form a new MPIAIJ matrix on the subcomm, thus requiring another 32 MB per rank. The temporary matrix is then destroyed.
>>> >>>>>
>>> >>>>> * Because a DMDA is detected, a permutation matrix is assembled. This requires 2 doubles per point in the DMDA. Your coarse DMDA contains 92 x 16 x 48 points. Thus the permutation matrix will require < 1 MB per MPI rank on the sub-comm.
>>> >>>>>
>>> >>>>> * Lastly, the matrix is permuted. This uses MatPtAP(), but the resulting operator will have the same memory footprint as the unpermuted matrix (32 MB). At any stage in PCTelescope, only 2 operators of size 32 MB are held in memory when the DMDA is provided.
>>> >>>>>
>>> >>>>> From my rough estimates, the worst-case memory footprint for any given core, given your options, is approximately
>>> >>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB
>>> >>>>> This is way below 8 GB.
>>> >>>>>
>>> >>>>> Note this estimate completely ignores:
>>> >>>>> (1) the memory required for the restriction operator,
>>> >>>>> (2) the potential growth in the number of non-zeros per row due to Galerkin coarsening (I wish -ksp_view_pre reported the output from MatView so we could see the number of non-zeros required by the coarse level operators),
>>> >>>>> (3) all temporary vectors required by the CG solver, and those required by the smoothers,
>>> >>>>> (4) internal memory allocated by MatPtAP,
>>> >>>>> (5) memory associated with the IS's used within PCTelescope.
>>> >>>>>
>>> >>>>> So either I am completely off in my estimates, or you have not carefully estimated the memory usage of your application code. Hopefully others might examine/correct my rough estimates.
>>> >>>>>
>>> >>>>> Since I don't have your code I cannot assess the latter. Since I don't have access to the same machine you are running on, I think we need to take a step back.
>>> >>>>>
>>> >>>>> [1] What machine are you running on? Send me a URL if it's available.
>>> >>>>>
>>> >>>>> [2] What discretization are you using? (I am guessing a scalar 7-point FD stencil.)
>>> >>>>> If it's a 7-point FD stencil, we should be able to examine the memory usage of your solver configuration using a standard, lightweight existing PETSc example, run on your machine at the same scale. This would hopefully enable us to correctly evaluate the actual memory usage required by the solver configuration you are using.
>>> >>>>>
>>> >>>>> Thanks,
>>> >>>>> Dave
>>> >>>>>
>>> >>>>> Frank
>>> >>>>>
>>> >>>>> On 07/08/2016 10:38 PM, Dave May wrote:
>>> >>>>>> On Saturday, 9 July 2016, frank <hengj...@uci.edu> wrote:
>>> >>>>>> Hi Barry and Dave,
>>> >>>>>>
>>> >>>>>> Thank both of you for the advice.
>>> >>>>>>
>>> >>>>>> @Barry
>>> >>>>>> I made a mistake in the file names in the last email. I attached the correct files this time.
>>> >>>>>> For all three tests, 'Telescope' is used as the coarse preconditioner.
>>> >>>>>>
>>> >>>>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12
>>> >>>>>> Part of the memory usage: Vector 125 124 3971904 0. Matrix 101 101 9462372 0.
>>> >>>>>>
>>> >>>>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24
>>> >>>>>> Part of the memory usage: Vector 125 124 681672 0. Matrix 101 101 1462180 0.
>>> >>>>>>
>>> >>>>>> In theory, the memory usage in Test1 should be 8 times that of Test2. In my case, it is about 6 times.
>>> >>>>>>
>>> >>>>>> == Test3: Grid: 3072*256*768, Process Mesh: 96*8*24. Sub-domain per process: 32*32*32
>>> >>>>>> Here I get the out of memory error.
>>> >>>>>>
>>> >>>>>> I tried to use -mg_coarse jacobi. In this way, I don't need to set -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right? The linear solver didn't work in this case. PETSc output some errors.
>>> >>>>>>
>>> >>>>>> @Dave
>>> >>>>>> In test3, I use only one instance of 'Telescope'. On the coarse mesh of 'Telescope', I used LU as the preconditioner instead of SVD. If I set the levels correctly, then on the last coarse mesh of MG, where it calls 'Telescope', the sub-domain per process is 2*2*2. On the last coarse mesh of 'Telescope', there is only one grid point per process. I still got the OOM error. The detailed petsc option file is attached.
>>> >>>>>>
>>> >>>>>> Do you understand the expected memory usage for the particular parallel LU implementation you are using? I don't (seriously). Replace LU with bjacobi and re-run this test. My point about solver debugging is still valid.
>>> >>>>>>
>>> >>>>>> And please send the result of KSPView so we can see what is actually used in the computations.
>>> >>>>>>
>>> >>>>>> Thanks
>>> >>>>>> Dave
>>> >>>>>>
>>> >>>>>> Thank you so much.
>>> >>>>>>
>>> >>>>>> Frank
>>> >>>>>>
>>> >>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote:
>>> >>>>>> On Jul 6, 2016, at 4:19 PM, frank <hengj...@uci.edu> wrote:
>>> >>>>>>
>>> >>>>>> Hi Barry,
>>> >>>>>>
>>> >>>>>> Thank you for your advice.
>>> >>>>>> I tried three tests. In the 1st test, the grid is 3072*256*768 and the process mesh is 96*8*24. The linear solver is 'cg', the preconditioner is 'mg', and 'telescope' is used as the preconditioner on the coarse mesh. The system gives me the "Out of Memory" error before the linear system is completely solved. The info from '-ksp_view_pre' is attached. It seems to me that the error occurs when it reaches the coarse mesh.
>>> >>>>>>
>>> >>>>>> The 2nd test uses a grid of 1536*128*384 and the process mesh is 96*8*24. The 3rd test uses the same grid but a different process mesh, 48*4*12.
>>> >>>>>> Are you sure this is right? The total matrix and vector memory usage goes from the 2nd test
>>> >>>>>> Vector 384 383 8,193,712 0.
>>> >>>>>> Matrix 103 103 11,508,688 0.
>>> >>>>>> to the 3rd test
>>> >>>>>> Vector 384 383 1,590,520 0.
>>> >>>>>> Matrix 103 103 3,508,664 0.
>>> >>>>>> that is, the memory usage got smaller, but if you have only 1/8th the processes and the same grid it should have gotten about 8 times bigger. Did you maybe cut the grid by a factor of 8 also? If so, that still doesn't explain it, because the memory usage changed by a factor of 5-something for the vectors and 3-something for the matrices.
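For reference, the two factors Barry mentions work out, from the numbers quoted above, to:

    vectors:   8,193,712 / 1,590,520 ~ 5.2
    matrices: 11,508,688 / 3,508,664 ~ 3.3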
>>> >>>>>> The linear solver and petsc options in the 2nd and 3rd tests are the same as in the 1st test. The linear solver works fine in both tests.
>>> >>>>>> I attached the memory usage of the 2nd and 3rd tests. The memory info is from the option '-log_summary'. I tried to use '-memory_info' as you suggested, but in my case PETSc treated it as an unused option. It output nothing about the memory. Do I need to add something to my code so I can use '-memory_info'?
>>> >>>>>>
>>> >>>>>> Sorry, my mistake: the option is -memory_view
>>> >>>>>>
>>> >>>>>> Can you run the one case with -memory_view and -mg_coarse jacobi -ksp_max_it 1 (just so it doesn't iterate forever) to see how much memory is used without the telescope? Also run case 2 the same way.
>>> >>>>>>
>>> >>>>>> Barry
>>> >>>>>>
>>> >>>>>> In both tests the memory usage is not large.
>>> >>>>>>
>>> >>>>>> It seems to me that it might be the 'telescope' preconditioner that allocated a lot of memory and caused the error in the 1st test. Is there a way to show how much memory it allocated?
>>> >>>>>>
>>> >>>>>> Frank
>>> >>>>>>
>>> >>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote:
>>> >>>>>> Frank,
>>> >>>>>>
>>> >>>>>> You can run with -ksp_view_pre to have it "view" the KSP before the solve, so hopefully it gets that far.
>>> >>>>>>
>>> >>>>>> Please run the problem that does fit with -memory_info; when the problem completes it will show the "high water mark" for PETSc allocated memory and total memory used. We first want to look at these numbers to see if it is using more memory than you expect. You could also run with, say, half the grid spacing to see how the memory usage scales with the increase in grid points. Make the runs also with -log_view and send all the output from these options.
>>> >>>>>>
>>> >>>>>> Barry
>>> >>>>>>
>>> >>>>>> On Jul 5, 2016, at 5:23 PM, frank <hengj...@uci.edu> wrote:
>>> >>>>>>
>>> >>>>>> Hi,
>>> >>>>>>
>>> >>>>>> I am using the CG ksp solver and multigrid preconditioner to solve a linear system in parallel. I chose to use 'Telescope' as the preconditioner on the coarse mesh for its good performance. The petsc options file is attached.
>>> >>>>>>
>>> >>>>>> The domain is a 3D box. It works well when the grid is 1536*128*384 and the process mesh is 96*8*24. When I double the size of the grid and keep the same process mesh and petsc options, I get an "out of memory" error from the super-cluster I am using. Each process has access to at least 8G memory, which should be more than enough for my application. I am sure that all the other parts of my code (except the linear solver) do not use much memory. So I suspect there is something wrong with the linear solver. The error occurs before the linear system is completely solved, so I don't have the info from ksp_view. I am not able to reproduce the error with a smaller problem either. In addition, I tried to use block jacobi as the preconditioner with the same grid and the same decomposition.
>>> >>>>>> The linear solver runs extremely slowly, but there is no memory error.
>>> >>>>>>
>>> >>>>>> How can I diagnose what exactly causes the error?
>>> >>>>>> Thank you so much.
>>> >>>>>>
>>> >>>>>> Frank
>>> >>>>>> <petsc_options.txt>
>>> >>>>>> <ksp_view_pre.txt><memory_test2.txt><memory_test3.txt><petsc_options.txt>
>>> >>> <ksp_view1.txt><ksp_view2.txt><ksp_view3.txt><memory1.txt><memory2.txt><petsc_options1.txt><petsc_options2.txt><petsc_options3.txt>
> <test_ksp.F90>
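For reference, an options file with the overall structure discussed in this thread (CG + geometric MG, with PCTelescope wrapping the coarse level) might look roughly like the sketch below. This is not the file Frank attached: the reduction factor comes from Dave's estimate, the level count is illustrative, the -mg_coarse_telescope_* entries are the four options reported as unused above, and the remaining options are the ones suggested by Barry and Dave.

    # outer solve: CG + geometric multigrid
    -ksp_type cg
    -pc_type mg
    -pc_mg_levels 5                               # illustrative; not confirmed in the thread
    # coarse level: gather onto a smaller communicator with PCTelescope
    -mg_coarse_pc_type telescope
    -mg_coarse_pc_telescope_reduction_factor 64
    # solver run by telescope on the reduced communicator
    -mg_coarse_telescope_pc_type mg
    -mg_coarse_telescope_mg_coarse_ksp_type preonly
    -mg_coarse_telescope_mg_coarse_pc_type bjacobi
    -mg_coarse_telescope_mg_levels_ksp_type richardson
    -mg_coarse_telescope_mg_levels_ksp_max_it 1
    # memory-scalable PtAP (Dave's suggestion) and diagnostics
    -matrap 0
    -matptap_scalable
    -memory_view
    -log_view
    -ksp_view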