-ksp_view in both cases?
> On Oct 4, 2016, at 1:13 PM, frank <hengj...@uci.edu> wrote:
>
> Hi,
>
> This question is a follow-up to the thread "Question about memory usage in
> Multigrid preconditioner".
> I used to have the "Out of Memory" (OOM) problem when using the CG+Telescope
> MG solver with 32768 cores. Adding the options "-matrap 0 -matptap_scalable"
> solved that problem.
>
> Then I tested the scalability by solving a 3D Poisson equation for one step.
> I used one sub-communicator in all the tests. The petsc options differ
> between the tests only in: 1) the pc_telescope_reduction_factor; 2) the
> number of multigrid levels in the up/down solver (see the options sketch
> after the tables). The function "KSPSolve" is timed. It is rather slow and
> doesn't scale at all.
>
> Test1: 512^3 grid points
>   Cores   telescope_reduction_factor   MG levels (up/down)   KSPSolve time (s)
>   512     8                            4 / 3                 6.2466
>   4096    64                           5 / 3                 0.9361
>   32768   64                           4 / 3                 4.8914
>
> Test2: 1024^3 grid points
>   Cores   telescope_reduction_factor   MG levels (up/down)   KSPSolve time (s)
>   4096    64                           5 / 4                 3.4139
>   8192    128                          5 / 4                 2.4196
>   16384   32                           5 / 3                 5.4150
>   32768   64                           5 / 3                 5.6067
>   65536   128                          5 / 3                 6.5219
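>
> For reference, a sketch of how these two knobs would appear in the options
> file, assuming (as elsewhere in this thread) that telescope sits on the
> coarse level of the outer MG and that "up/down" refers to the outer and
> inner MG hierarchies; the option names follow the prefix scheme visible
> later in this thread, and the values shown are illustrative only:
>
>   -pc_type mg
>   -pc_mg_levels 5                                # "up" solver levels
>   -mg_coarse_pc_type telescope
>   -mg_coarse_pc_telescope_reduction_factor 64
>   -mg_coarse_telescope_pc_type mg
>   -mg_coarse_telescope_pc_mg_levels 3            # "down" solver levels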
>
> I guess I didn't set the MG levels properly. What would be an efficient way
> to arrange the MG levels?
> Also, which preconditioner should I use on the coarse mesh of the 2nd
> communicator to improve the performance?
>
> I attached the test code and the petsc options file for the 1024^3 cube with
> 32768 cores.
>
> Thank you.
>
> Regards,
> Frank
>
>
>
>
>
>
> On 09/15/2016 03:35 AM, Dave May wrote:
>> Hi all,
>>
>> The only unexpected memory usage I can see is associated with the call to
>> MatPtAP().
>> Here is something you can try immediately.
>> Run your code with the additional options
>> -matrap 0 -matptap_scalable
>>
>> I didn't realize this before, but the default behaviour of MatPtAP in
>> parallel is actually to explicitly form the transpose of P (i.e. assemble
>> R = P^T) and then compute R.A.P.
>> You don't want to do this. The option -matrap 0 resolves this issue.
>>
>> The implementation of P^T.A.P has two variants.
>> The scalable implementation (with respect to memory usage) is selected via
>> the second option -matptap_scalable.
>>
>> Try it out - I see a significant memory reduction using these options for
>> particular mesh sizes / partitions.
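>>
>> For example, appended to the existing options file (a sketch; all other
>> options unchanged):
>>
>>   -matrap 0            # use P^T.A.P instead of assembling R = P^T
>>   -matptap_scalable    # select the memory-scalable P^T.A.P variant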
>>
>> I've attached a cleaned up version of the code you sent me.
>> There were a number of memory leaks and other issues.
>> The main points being
>> * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End}
>> * You should call PetscFinalize(), otherwise the option -log_summary
>> (-log_view) will not display anything once the program has completed.
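>>
>> For example, a minimal sketch of the required program structure (modern
>> PETSc Fortran style; adapt the include/use lines to your PETSc version):
>>
>> program test_poisson
>> #include <petsc/finclude/petscsys.h>
>>       use petscsys
>>       implicit none
>>       PetscErrorCode ierr
>>
>>       call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
>>       ! ... create the DMDA, assemble, call KSPSolve ...
>>       call PetscFinalize(ierr)  ! needed for -log_summary (-log_view) output
>> end program test_poisson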
>>
>>
>> Thanks,
>> Dave
>>
>>
>> On 15 September 2016 at 08:03, Hengjie Wang <hengj...@uci.edu> wrote:
>> Hi Dave,
>>
>> Sorry, I should have put more comments in to explain the code.
>> The number of processes in each dimension is the same: Px = Py = Pz = P,
>> and likewise for the domain size.
>> So if you want to run the code on a 512^3 grid with 16^3 cores, you need to
>> set "-N 512 -P 16" on the command line.
>> I added more comments and also fixed an error in the attached code. (The
>> error only affects the accuracy of the solution, not the memory usage.)
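>>
>> For example (a sketch, assuming the attached test_ksp.f90 is built as an
>> executable named "test_ksp"; the launcher depends on the machine, e.g.
>> aprun on a Cray):
>>
>>   mpiexec -n 4096 ./test_ksp -N 512 -P 16    # 16^3 = 4096 ranks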
>>
>> Thank you.
>> Frank
>>
>>
>> On 9/14/2016 9:05 PM, Dave May wrote:
>>>
>>>
>>> On Thursday, 15 September 2016, Dave May <dave.mayhe...@gmail.com> wrote:
>>>
>>>
>>> On Thursday, 15 September 2016, frank <hengj...@uci.edu> wrote:
>>> Hi,
>>>
>>> I wrote a simple code to reproduce the error. I hope this can help to
>>> diagnose the problem.
>>> The code just solves a 3D Poisson equation.
>>>
>>> Why is the stencil width a runtime parameter?? And why is the default value
>>> 2? For 7-pnt FD Laplace, you only need a stencil width of 1.
>>>
>>> Was this choice made to mimic something in the real application code?
>>>
>>> Please ignore - I misunderstood your usage of the param set by -P
>>>
>>>
>>>
>>> I ran the code on a 1024^3 mesh. The process partition is 32 * 32 * 32.
>>> That's where I reproduce the OOM error. Each core has about 2G memory.
>>> I also ran the code on a 512^3 mesh with 16 * 16 * 16 processes. The ksp
>>> solver works fine.
>>> I attached the code, ksp_view_pre's output and my petsc option file.
>>>
>>> Thank you.
>>> Frank
>>>
>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote:
>>>> Hi Barry,
>>>>
>>>> I checked. On the supercomputer, I had the option "-ksp_view_pre", but it
>>>> is not in the file I sent you. I am sorry for the confusion.
>>>>
>>>> Regards,
>>>> Frank
>>>>
>>>> On Friday, September 9, 2016, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>>>
>>>> > On Sep 9, 2016, at 3:11 PM, frank <hengj...@uci.edu> wrote:
>>>> >
>>>> > Hi Barry,
>>>> >
>>>> > I think the first KSP view output is from -ksp_view_pre. Before I
>>>> > submitted the test, I was not sure whether there would be an OOM error
>>>> > or not. So I added both -ksp_view_pre and -ksp_view.
>>>>
>>>> But the options file you sent specifically does NOT list
>>>> -ksp_view_pre, so how could it be from that?
>>>>
>>>> Sorry to be pedantic but I've spent too much time in the past trying to
>>>> debug from incorrect information and want to make sure that the
>>>> information I have is correct before thinking. Please recheck exactly what
>>>> happened. Rerun with the exact input file you emailed if that is needed.
>>>>
>>>> Barry
>>>>
>>>> >
>>>> > Frank
>>>> >
>>>> >
>>>> > On 09/09/2016 12:38 PM, Barry Smith wrote:
>>>> >> Why does ksp_view2.txt have two KSP views in it while ksp_view1.txt
>>>> >> has only one KSP view in it? Did you run two different solves in the
>>>> >> 2nd case but not the 1st?
>>>> >>
>>>> >> Barry
>>>> >>
>>>> >>
>>>> >>
>>>> >>> On Sep 9, 2016, at 10:56 AM, frank <hengj...@uci.edu> wrote:
>>>> >>>
>>>> >>> Hi,
>>>> >>>
>>>> >>> I want to continue digging into the memory problem here.
>>>> >>> I did find a workaround in the past, which is to use fewer cores per
>>>> >>> node so that each core has 8G memory. However this is inefficient and
>>>> >>> expensive. I hope to locate the place that uses the most memory.
>>>> >>>
>>>> >>> Here is a brief summary of the tests I did in the past:
>>>> >>>
>>>> >>>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12
>>>> >>> Maximum (over computational time) process memory:         total 7.0727e+08
>>>> >>> Current process memory:                                   total 7.0727e+08
>>>> >>> Maximum (over computational time) space PetscMalloc()ed:  total 6.3908e+11
>>>> >>> Current space PetscMalloc()ed:                            total 1.8275e+09
>>>> >>>
>>>> >>>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24
>>>> >>> Maximum (over computational time) process memory:         total 5.9431e+09
>>>> >>> Current process memory:                                   total 5.9431e+09
>>>> >>> Maximum (over computational time) space PetscMalloc()ed:  total 5.3202e+12
>>>> >>> Current space PetscMalloc()ed:                            total 5.4844e+09
>>>> >>>
>>>> >>>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24
>>>> >>> The OOM (Out Of Memory) killer of the supercomputer terminated the
>>>> >>> job during "KSPSolve".
>>>> >>>
>>>> >>> I attached the output of ksp_view( the third test's output is from
>>>> >>> ksp_view_pre ), memory_view and also the petsc options.
>>>> >>>
>>>> >>> In all the tests, each core can access about 2G memory. In test3,
>>>> >>> there are 4223139840 non-zeros in the matrix, which will consume about
>>>> >>> 1.74 MB per core using double precision. Even considering some extra
>>>> >>> memory used to store the integer indices, 2G of memory should still be
>>>> >>> more than enough.
>>>> >>>
>>>> >>> Is there a way to find out which part of KSPSolve uses the most memory?
>>>> >>> Thank you so much.
>>>> >>>
>>>> >>> BTW, there are 4 options that remain unused, and I don't understand
>>>> >>> why they are omitted:
>>>> >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly
>>>> >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
>>>> >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1
>>>> >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson
>>>> >>>
>>>> >>>
>>>> >>> Regards,
>>>> >>> Frank
>>>> >>>
>>>> >>> On 07/13/2016 05:47 PM, Dave May wrote:
>>>> >>>>
>>>> >>>> On 14 July 2016 at 01:07, frank <hengj...@uci.edu> wrote:
>>>> >>>> Hi Dave,
>>>> >>>>
>>>> >>>> Sorry for the late reply.
>>>> >>>> Thank you so much for your detailed reply.
>>>> >>>>
>>>> >>>> I have a question about the estimation of the memory usage. There are
>>>> >>>> 4223139840 allocated non-zeros and 18432 MPI processes. Double
>>>> >>>> precision is used. So the memory per process is:
>>>> >>>> 4223139840 * 8 bytes / 18432 / 1024 / 1024 = 1.74 MB?
>>>> >>>> Did I do something wrong here? Because this seems too small.
>>>> >>>>
>>>> >>>> No - I totally f***ed it up. You are correct. That'll teach me for
>>>> >>>> fumbling around with my iphone calculator and not using my brain.
>>>> >>>> (Note that to convert to MB just divide by 1e6, not 1024^2 - although
>>>> >>>> I apparently cannot convert between units correctly....)
>>>> >>>>
>>>> >>>> From the PETSc objects associated with the solver, it looks like it
>>>> >>>> _should_ run with 2GB per MPI rank. Sorry for my mistake.
>>>> >>>> Possibilities are: somewhere in your usage of PETSc you've introduced
>>>> >>>> a memory leak; PETSc is doing a huge over allocation (e.g. as per our
>>>> >>>> discussion of MatPtAP); or in your application code there are other
>>>> >>>> objects you have forgotten to log the memory for.
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>> I am running this job on Blue Waters.
>>>> >>>> I am using the 7-point FD stencil in 3D.
>>>> >>>>
>>>> >>>> I thought so on both counts.
>>>> >>>>
>>>> >>>> I apologize that I made a stupid mistake in computing the memory per
>>>> >>>> core. My settings let each core access only 2G memory on average,
>>>> >>>> instead of the 8G I mentioned in a previous email. I re-ran the job
>>>> >>>> with 8G memory per core on average and there was no "Out Of Memory"
>>>> >>>> error. I will do more tests to see if there is still some memory
>>>> >>>> issue.
>>>> >>>>
>>>> >>>> Ok. I'd still like to know where the memory was being used since my
>>>> >>>> estimates were off.
>>>> >>>>
>>>> >>>>
>>>> >>>> Thanks,
>>>> >>>> Dave
>>>> >>>>
>>>> >>>> Regards,
>>>> >>>> Frank
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>> On 07/11/2016 01:18 PM, Dave May wrote:
>>>> >>>>> Hi Frank,
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> On 11 July 2016 at 19:14, frank <hengj...@uci.edu> wrote:
>>>> >>>>> Hi Dave,
>>>> >>>>>
>>>> >>>>> I re-ran the test using bjacobi as the preconditioner on the coarse
>>>> >>>>> mesh of telescope. The grid is 3072*256*768 and the process mesh is
>>>> >>>>> 96*8*24. The petsc options file is attached.
>>>> >>>>> I still got the "Out Of Memory" error. The error occurred before the
>>>> >>>>> linear solver finished one step, so I don't have the full info from
>>>> >>>>> ksp_view. The info from ksp_view_pre is attached.
>>>> >>>>>
>>>> >>>>> Okay - that is essentially useless (sorry)
>>>> >>>>>
>>>> >>>>> It seems to me that the error occurred when the decomposition was
>>>> >>>>> going to be changed.
>>>> >>>>>
>>>> >>>>> Based on what information?
>>>> >>>>> Running with -info would give us more clues, but will create a ton
>>>> >>>>> of output.
>>>> >>>>> Please try running the case which failed with -info.
>>>> >>>>>
>>>> >>>>> I had another test with a grid of 1536*128*384 and the same process
>>>> >>>>> mesh as above. There was no error. The ksp_view info is attached for
>>>> >>>>> comparison.
>>>> >>>>> Thank you.
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> [3] Here is my crude estimate of your memory usage.
>>>> >>>>> I'll target only the biggest memory hogs to get an order of
>>>> >>>>> magnitude estimate.
>>>> >>>>>
>>>> >>>>> * The Fine grid operator contains 4223139840 non-zeros --> 1.8 GB
>>>> >>>>> per MPI rank assuming double precision.
>>>> >>>>> The indices for the AIJ could amount to another 0.3 GB (assuming 32
>>>> >>>>> bit integers)
>>>> >>>>>
>>>> >>>>> * You use 5 levels of coarsening, so the other operators should
>>>> >>>>> represent (collectively)
>>>> >>>>> 2.1 / 8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB per MPI rank on the
>>>> >>>>> communicator with 18432 ranks.
>>>> >>>>> The coarse grid should consume ~ 0.5 MB per MPI rank on the
>>>> >>>>> communicator with 18432 ranks.
>>>> >>>>>
>>>> >>>>> * You use a reduction factor of 64, making the new communicator with
>>>> >>>>> 288 MPI ranks.
>>>> >>>>> PCTelescope will first gather a temporary matrix associated with
>>>> >>>>> your coarse level operator assuming a comm size of 288 living on the
>>>> >>>>> comm with size 18432.
>>>> >>>>> This matrix will require approximately 0.5 * 64 = 32 MB per core on
>>>> >>>>> the 288 ranks.
>>>> >>>>> This matrix is then used to form a new MPIAIJ matrix on the subcomm,
>>>> >>>>> thus require another 32 MB per rank.
>>>> >>>>> The temporary matrix is now destroyed.
>>>> >>>>>
>>>> >>>>> * Because a DMDA is detected, a permutation matrix is assembled.
>>>> >>>>> This requires 2 doubles per point in the DMDA.
>>>> >>>>> Your coarse DMDA contains 92 x 16 x 48 points.
>>>> >>>>> Thus the permutation matrix will require < 1 MB per MPI rank on the
>>>> >>>>> sub-comm.
>>>> >>>>>
>>>> >>>>> * Lastly, the matrix is permuted. This uses MatPtAP(), but the
>>>> >>>>> resulting operator will have the same memory footprint as the
>>>> >>>>> unpermuted matrix (32 MB). At any stage in PCTelescope, only 2
>>>> >>>>> operators of size 32 MB are held in memory when the DMDA is provided.
>>>> >>>>>
>>>> >>>>> From my rough estimates, the worst case memory footprint for any
>>>> >>>>> given core, given your options, is approximately
>>>> >>>>> 2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB
>>>> >>>>> This is way below 8 GB.
>>>> >>>>>
>>>> >>>>> Note this estimate completely ignores:
>>>> >>>>> (1) the memory required for the restriction operator,
>>>> >>>>> (2) the potential growth in the number of non-zeros per row due to
>>>> >>>>> Galerkin coarsening (I wish -ksp_view_pre reported the output from
>>>> >>>>> MatView so we could see the number of non-zeros required by the
>>>> >>>>> coarse level operators)
>>>> >>>>> (3) all temporary vectors required by the CG solver, and those
>>>> >>>>> required by the smoothers.
>>>> >>>>> (4) internal memory allocated by MatPtAP
>>>> >>>>> (5) memory associated with IS's used within PCTelescope
>>>> >>>>>
>>>> >>>>> So either I am completely off in my estimates, or you have not
>>>> >>>>> carefully estimated the memory usage of your application code.
>>>> >>>>> Hopefully others might examine/correct my rough estimates.
>>>> >>>>>
>>>> >>>>> Since I don't have your code I cannot assess the latter.
>>>> >>>>> Since I don't have access to the same machine you are running on, I
>>>> >>>>> think we need to take a step back.
>>>> >>>>>
>>>> >>>>> [1] What machine are you running on? Send me a URL if it's available.
>>>> >>>>>
>>>> >>>>> [2] What discretization are you using? (I am guessing a scalar
>>>> >>>>> 7-point FD stencil.)
>>>> >>>>> If it's a 7-point FD stencil, we should be able to examine the
>>>> >>>>> memory usage of your solver configuration using a standard,
>>>> >>>>> lightweight, existing PETSc example run on your machine at the same
>>>> >>>>> scale.
>>>> >>>>> This would hopefully enable us to correctly evaluate the actual
>>>> >>>>> memory usage required by the solver configuration you are using.
>>>> >>>>>
>>>> >>>>> Thanks,
>>>> >>>>> Dave
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> Frank
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> On 07/08/2016 10:38 PM, Dave May wrote:
>>>> >>>>>>
>>>> >>>>>> On Saturday, 9 July 2016, frank <hengj...@uci.edu> wrote:
>>>> >>>>>> Hi Barry and Dave,
>>>> >>>>>>
>>>> >>>>>> Thank both of you for the advice.
>>>> >>>>>>
>>>> >>>>>> @Barry
>>>> >>>>>> I made a mistake in the file names in the last email. I attached
>>>> >>>>>> the correct files this time.
>>>> >>>>>> For all three tests, 'Telescope' is used as the coarse
>>>> >>>>>> preconditioner.
>>>> >>>>>>
>>>> >>>>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12
>>>> >>>>>> Part of the memory usage:
>>>> >>>>>>   Vector   125   124   3971904   0.
>>>> >>>>>>   Matrix   101   101   9462372   0.
>>>> >>>>>>
>>>> >>>>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24
>>>> >>>>>> Part of the memory usage:
>>>> >>>>>>   Vector   125   124    681672   0.
>>>> >>>>>>   Matrix   101   101   1462180   0.
>>>> >>>>>>
>>>> >>>>>> In theory, the memory usage in Test1 should be 8 times that of
>>>> >>>>>> Test2. In my case, it is about 6 times.
>>>> >>>>>>
>>>> >>>>>> == Test3: Grid: 3072*256*768, Process Mesh: 96*8*24. Sub-domain
>>>> >>>>>> per process: 32*32*32
>>>> >>>>>> Here I get the out-of-memory error.
>>>> >>>>>>
>>>> >>>>>> I tried to use -mg_coarse jacobi. In this way, I don't need to set
>>>> >>>>>> -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly, right?
>>>> >>>>>> The linear solver didn't work in this case. PETSc output some
>>>> >>>>>> errors.
>>>> >>>>>>
>>>> >>>>>> @Dave
>>>> >>>>>> In test3, I use only one instance of 'Telescope'. On the coarse
>>>> >>>>>> mesh of 'Telescope', I used LU as the preconditioner instead of SVD.
>>>> >>>>>> If I set the levels correctly, then on the last coarse mesh of MG,
>>>> >>>>>> where it calls 'Telescope', the sub-domain per process is 2*2*2.
>>>> >>>>>> On the last coarse mesh of 'Telescope', there is only one grid
>>>> >>>>>> point per process.
>>>> >>>>>> I still got the OOM error. The detailed petsc options file is
>>>> >>>>>> attached.
>>>> >>>>>>
>>>> >>>>>> Do you understand the expected memory usage for the particular
>>>> >>>>>> parallel LU implementation you are using? I don't (seriously).
>>>> >>>>>> Replace LU with bjacobi and re-run this test. My point about solver
>>>> >>>>>> debugging is still valid.
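>>>> >>>>>>
>>>> >>>>>> (A sketch of that swap, assuming the nested option prefixes that
>>>> >>>>>> appear elsewhere in this thread; the exact prefix depends on where
>>>> >>>>>> the LU actually sits in the attached options file:)
>>>> >>>>>>
>>>> >>>>>>   -mg_coarse_telescope_mg_coarse_pc_type bjacobi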
>>>> >>>>>>
>>>> >>>>>> And please send the result of KSPView so we can see what is
>>>> >>>>>> actually used in the computations
>>>> >>>>>>
>>>> >>>>>> Thanks
>>>> >>>>>> Dave
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> Thank you so much.
>>>> >>>>>>
>>>> >>>>>> Frank
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote:
>>>> >>>>>> On Jul 6, 2016, at 4:19 PM, frank <hengj...@uci.edu> wrote:
>>>> >>>>>>
>>>> >>>>>> Hi Barry,
>>>> >>>>>>
>>>> >>>>>> Thank you for your advice.
>>>> >>>>>> I tried three tests. In the 1st test, the grid is 3072*256*768 and
>>>> >>>>>> the process mesh is 96*8*24.
>>>> >>>>>> The linear solver is 'cg', the preconditioner is 'mg', and
>>>> >>>>>> 'telescope' is used as the preconditioner at the coarse mesh.
>>>> >>>>>> The system gives me the "Out of Memory" error before the linear
>>>> >>>>>> system is completely solved.
>>>> >>>>>> The info from '-ksp_view_pre' is attached. It seems to me that the
>>>> >>>>>> error occurs when it reaches the coarse mesh.
>>>> >>>>>>
>>>> >>>>>> The 2nd test uses a grid of 1536*128*384 and the process mesh is
>>>> >>>>>> 96*8*24. The 3rd test uses the same grid but a different process
>>>> >>>>>> mesh, 48*4*12.
>>>> >>>>>>
>>>> >>>>>> Are you sure this is right? The total matrix and vector memory
>>>> >>>>>> usage goes from the 2nd test
>>>> >>>>>>   Vector   384   383    8,193,712   0.
>>>> >>>>>>   Matrix   103   103   11,508,688   0.
>>>> >>>>>> to the 3rd test
>>>> >>>>>>   Vector   384   383    1,590,520   0.
>>>> >>>>>>   Matrix   103   103    3,508,664   0.
>>>> >>>>>> That is, the memory usage got smaller, but if you have only 1/8th
>>>> >>>>>> the processes and the same grid it should have gotten about 8 times
>>>> >>>>>> bigger. Did you maybe cut the grid by a factor of 8 also? If so,
>>>> >>>>>> that still doesn't explain it, because the memory usage changed by
>>>> >>>>>> a factor of about 5 for the vectors and about 3 for the matrices.
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> The linear solver and petsc options in the 2nd and 3rd tests are
>>>> >>>>>> the same as in the 1st test. The linear solver works fine in both
>>>> >>>>>> tests.
>>>> >>>>>> I attached the memory usage of the 2nd and 3rd tests. The memory
>>>> >>>>>> info is from the option '-log_summary'. I tried to use
>>>> >>>>>> '-memory_info' as you suggested, but in my case petsc treated it as
>>>> >>>>>> an unused option. It output nothing about the memory. Do I need to
>>>> >>>>>> add something to my code so I can use '-memory_info'?
>>>> >>>>>>
>>>> >>>>>> Sorry, my mistake: the option is -memory_view.
>>>> >>>>>>
>>>> >>>>>> Can you run the one case with -memory_view and -mg_coarse jacobi
>>>> >>>>>> -ksp_max_it 1 (just so it doesn't iterate forever) to see how much
>>>> >>>>>> memory is used without the telescope? Also run case 2 the same way.
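>>>> >>>>>>
>>>> >>>>>> (A sketch of the added options for that run; "-mg_coarse jacobi"
>>>> >>>>>> is presumably shorthand for setting the coarse-level PC type:)
>>>> >>>>>>
>>>> >>>>>>   -memory_view -mg_coarse_pc_type jacobi -ksp_max_it 1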
>>>> >>>>>>
>>>> >>>>>> Barry
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> In both tests the memory usage is not large.
>>>> >>>>>>
>>>> >>>>>> It seems to me that it might be the 'telescope' preconditioner
>>>> >>>>>> that allocated a lot of memory and caused the error in the 1st
>>>> >>>>>> test. Is there a way to show how much memory it allocated?
>>>> >>>>>>
>>>> >>>>>> Frank
>>>> >>>>>>
>>>> >>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote:
>>>> >>>>>> Frank,
>>>> >>>>>>
>>>> >>>>>> You can run with -ksp_view_pre to have it "view" the KSP
>>>> >>>>>> before the solve so hopefully it gets that far.
>>>> >>>>>>
>>>> >>>>>> Please run the problem that does fit with -memory_info; when
>>>> >>>>>> the problem completes it will show the "high water mark" for PETSc
>>>> >>>>>> allocated memory and total memory used. We first want to look at
>>>> >>>>>> these numbers to see if it is using more memory than you expect.
>>>> >>>>>> You could also run with, say, half the grid spacing to see how the
>>>> >>>>>> memory usage scales with the increase in grid points. Make the runs
>>>> >>>>>> also with -log_view and send all the output from these options.
>>>> >>>>>>
>>>> >>>>>> Barry
>>>> >>>>>>
>>>> >>>>>> On Jul 5, 2016, at 5:23 PM, frank <hengj...@uci.edu> wrote:
>>>> >>>>>>
>>>> >>>>>> Hi,
>>>> >>>>>>
>>>> >>>>>> I am using the CG ksp solver and Multigrid preconditioner to solve
>>>> >>>>>> a linear system in parallel.
>>>> >>>>>> I chose to use the 'Telescope' as the preconditioner on the coarse
>>>> >>>>>> mesh for its good performance.
>>>> >>>>>> The petsc options file is attached.
>>>> >>>>>>
>>>> >>>>>> The domain is a 3d box.
>>>> >>>>>> It works well when the grid is 1536*128*384 and the process mesh
>>>> >>>>>> is 96*8*24. When I double the size of the grid and keep the same
>>>> >>>>>> process mesh and petsc options, I get an "out of memory" error from
>>>> >>>>>> the super-cluster I am using.
>>>> >>>>>> Each process has access to at least 8G memory, which should be more
>>>> >>>>>> than enough for my application. I am sure that all the other parts
>>>> >>>>>> of my code (except the linear solver) do not use much memory. So I
>>>> >>>>>> suspect there is something wrong with the linear solver.
>>>> >>>>>> The error occurs before the linear system is completely solved, so
>>>> >>>>>> I don't have the info from ksp_view. I am not able to reproduce the
>>>> >>>>>> error with a smaller problem either.
>>>> >>>>>> In addition, I tried to use the block jacobi as the preconditioner
>>>> >>>>>> with the same grid and same decomposition. The linear solver runs
>>>> >>>>>> extremely slow but there is no memory error.
>>>> >>>>>>
>>>> >>>>>> How can I diagnose what exactly causes the error?
>>>> >>>>>> Thank you so much.
>>>> >>>>>>
>>>> >>>>>> Frank
>>>> >>>>>> <petsc_options.txt>
>>>> >>>>>> <ksp_view_pre.txt><memory_test2.txt><memory_test3.txt><petsc_options.txt>
>>>> >>>>>>
>>>> >>>>>
>>>> >>>>
>>>> >>> <ksp_view1.txt><ksp_view2.txt><ksp_view3.txt><memory1.txt><memory2.txt><petsc_options1.txt><petsc_options2.txt><petsc_options3.txt>
>>>> >
>>>>
>>>
>>>
>>
>>
>
> <petsc_options_32768.txt><test_ksp.f90>