On Tue, Oct 4, 2016 at 3:09 PM, frank <hengj...@uci.edu> wrote:

> Hi Dave,
>
> Thank you for the reply.
> What do you mean by the "nested calls to KSPSolve"?
KSPSolve is called again after redistributing the computation.

> I tried to call KSPSolve twice, but the second solve converged in 0
> iterations. KSPSolve seems to remember the solution. How can I force both
> solves to start from the same initial guess?

Did you zero the solution vector between solves?

  VecSet(x, 0.0);

   Matt

> Thank you.
>
> Frank
>
> On 10/04/2016 12:56 PM, Dave May wrote:
>
> On Tuesday, 4 October 2016, frank <hengj...@uci.edu> wrote:
>
>> Hi,
>> This question is a follow-up to the thread "Question about memory usage
>> in Multigrid preconditioner".
>> I used to hit an "Out of Memory" (OOM) error when using the CG +
>> Telescope MG solver with 32768 cores. Adding the options
>> "-matrap 0 -matptap_scalable" did solve that problem.
>>
>> Then I tested the scalability by solving a 3d Poisson eqn for 1 step. I
>> used one sub-communicator in all the tests. The differences between the
>> petsc options in those tests are: (1) the pc_telescope_reduction_factor;
>> (2) the number of multigrid levels in the up/down solver. The function
>> KSPSolve is timed. It is kind of slow and doesn't scale at all.
>>
>> Test1: 512^3 grid points
>> Core#    telescope_reduction_factor    MG levels (up/down)    Time for KSPSolve (s)
>> 512      8                             4 / 3                  6.2466
>> 4096     64                            5 / 3                  0.9361
>> 32768    64                            4 / 3                  4.8914
>>
>> Test2: 1024^3 grid points
>> Core#    telescope_reduction_factor    MG levels (up/down)    Time for KSPSolve (s)
>> 4096     64                            5 / 4                  3.4139
>> 8192     128                           5 / 4                  2.4196
>> 16384    32                            5 / 3                  5.4150
>> 32768    64                            5 / 3                  5.6067
>> 65536    128                           5 / 3                  6.5219
>
> > You have to be very careful how you interpret these numbers. Your solver
> > contains nested calls to KSPSolve, and unfortunately as a result the
> > numbers you report include setup time. This will remain true even if you
> > call KSPSetUp on the outermost KSP.
> >
> > Your email concerns the scalability of the solver application, so let's
> > focus on that issue.
> >
> > The only way to cleanly separate setup time from solve time is to
> > perform two identical solves. The second solve will not require any
> > setup. You should monitor the second solve via a new PetscLogStage.
> >
> > This is what I did in the telescope paper. It was the only way to
> > understand the setup cost (and its scaling) versus the solve time (and
> > its scaling).
> >
> > Thanks
> > Dave
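Concretely, the pattern Matt and Dave describe might look like the
following in a C driver (a minimal sketch; ksp, b and x are assumed to
have been created and configured elsewhere):

  /* Sketch: two identical solves; only the second, setup-free solve is
     timed, inside its own log stage so -log_view reports it separately. */
  PetscLogStage  stage;
  PetscErrorCode ierr;

  ierr = VecSet(x, 0.0);CHKERRQ(ierr);        /* zero initial guess        */
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);   /* 1st solve: pays all setup */

  ierr = VecSet(x, 0.0);CHKERRQ(ierr);        /* re-zero, so the 2nd solve
                                                 repeats identical work    */
  ierr = PetscLogStageRegister("SecondSolve", &stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);   /* 2nd solve: no setup cost  */
  ierr = PetscLogStagePop();CHKERRQ(ierr);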
>> I guess I didn't set the MG levels properly. What would be an efficient
>> way to arrange the MG levels?
>> Also, which preconditioner should I use at the coarse mesh of the 2nd
>> communicator to improve the performance?
>>
>> I attached the test code and the petsc options file for the 1024^3 cube
>> with 32768 cores.
>>
>> Thank you.
>>
>> Regards,
>> Frank
>>
>> On 09/15/2016 03:35 AM, Dave May wrote:
>>
>> Hi all,
>>
>> The only unexpected memory usage I can see is associated with the call
>> to MatPtAP().
>> Here is something you can try immediately.
>> Run your code with the additional options
>>   -matrap 0 -matptap_scalable
>>
>> I didn't realize this before, but the default behaviour of MatPtAP in
>> parallel is actually to explicitly form the transpose of P (e.g.
>> assemble R = P^T) and then compute R.A.P.
>> You don't want to do this. The option -matrap 0 resolves this issue.
>>
>> The implementation of P^T.A.P has two variants.
>> The scalable implementation (with respect to memory usage) is selected
>> via the second option -matptap_scalable.
>>
>> Try it out - I see a significant memory reduction using these options
>> for particular mesh sizes / partitions.
>>
>> I've attached a cleaned up version of the code you sent me.
>> There were a number of memory leaks and other issues.
>> The main points being
>>   * You should call DMDAVecGetArrayF90() before VecAssembly{Begin,End}
>>   * You should call PetscFinalize(), otherwise the option -log_summary
>>     (-log_view) will not display anything once the program has completed.
>>
>> Thanks,
>> Dave
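For reference, the overall program structure Dave's second point implies,
as a minimal sketch (in C; the test code under discussion is Fortran):

  #include <petsc.h>

  int main(int argc, char **argv)
  {
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);
    if (ierr) return ierr;
    /* ... create the DMDA, assemble the system, call KSPSolve ... */
    ierr = PetscFinalize();   /* -log_summary / -log_view print their
                                 report here; without this call there
                                 is no output */
    return ierr;
  }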
>> On 15 September 2016 at 08:03, Hengjie Wang <hengj...@uci.edu> wrote:
>>
>>> Hi Dave,
>>>
>>> Sorry, I should have put in more comments to explain the code.
>>> The number of processes in each dimension is the same: Px = Py = Pz = P.
>>> So is the domain size.
>>> So if you want to run the code for a 512^3 grid on 16^3 cores, you need
>>> to set "-N 512 -P 16" on the command line.
>>> I added more comments and also fixed an error in the attached code. (The
>>> error only affects the accuracy of the solution, not the memory usage.)
>>>
>>> Thank you.
>>> Frank
>>>
>>> On 9/14/2016 9:05 PM, Dave May wrote:
>>>
>>> On Thursday, 15 September 2016, Dave May <dave.mayhe...@gmail.com> wrote:
>>>
>>>> On Thursday, 15 September 2016, frank <hengj...@uci.edu> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I wrote a simple code to reproduce the error. I hope this can help to
>>>>> diagnose the problem.
>>>>> The code just solves a 3d Poisson equation.
>>>>
>>>> Why is the stencil width a runtime parameter? And why is the default
>>>> value 2? For a 7-point FD Laplacian, you only need a stencil width of 1.
>>>>
>>>> Was this choice made to mimic something in the real application code?
>>>
>>> Please ignore - I misunderstood your usage of the param set by -P
>>>
>>>>> I ran the code on a 1024^3 mesh. The process partition is 32 * 32 * 32.
>>>>> That's when I reproduce the OOM error. Each core has about 2G memory.
>>>>> I also ran the code on a 512^3 mesh with 16 * 16 * 16 processes. The
>>>>> ksp solver works fine.
>>>>> I attached the code, ksp_view_pre's output and my petsc options file.
>>>>>
>>>>> Thank you.
>>>>> Frank
>>>>>
>>>>> On 09/09/2016 06:38 PM, Hengjie Wang wrote:
>>>>>
>>>>> Hi Barry,
>>>>>
>>>>> I checked. On the supercomputer, I had the option "-ksp_view_pre", but
>>>>> it is not in the file I sent you. I am sorry for the confusion.
>>>>>
>>>>> Regards,
>>>>> Frank
>>>>>
>>>>> On Friday, September 9, 2016, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>>>>>
>>>>>> > On Sep 9, 2016, at 3:11 PM, frank <hengj...@uci.edu> wrote:
>>>>>> >
>>>>>> > Hi Barry,
>>>>>> >
>>>>>> > I think the first KSP view output is from -ksp_view_pre. Before I
>>>>>> > submitted the test, I was not sure whether there would be an OOM
>>>>>> > error or not, so I added both -ksp_view_pre and -ksp_view.
>>>>>>
>>>>>> But the options file you sent specifically does NOT list
>>>>>> -ksp_view_pre, so how could it be from that?
>>>>>>
>>>>>> Sorry to be pedantic, but I've spent too much time in the past trying
>>>>>> to debug from incorrect information and want to make sure that the
>>>>>> information I have is correct before thinking. Please recheck exactly
>>>>>> what happened. Rerun with the exact input file you emailed if that is
>>>>>> needed.
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>> > Frank
>>>>>> >
>>>>>> > On 09/09/2016 12:38 PM, Barry Smith wrote:
>>>>>> >> Why does ksp_view2.txt have two KSP views in it while
>>>>>> >> ksp_view1.txt has only one KSPView in it? Did you run two
>>>>>> >> different solves in the 2nd case but not the 1st?
>>>>>> >>
>>>>>> >> Barry
>>>>>> >>
>>>>>> >>> On Sep 9, 2016, at 10:56 AM, frank <hengj...@uci.edu> wrote:
>>>>>> >>>
>>>>>> >>> Hi,
>>>>>> >>>
>>>>>> >>> I want to continue digging into the memory problem here.
>>>>>> >>> I did find a workaround in the past, which is to use fewer cores
>>>>>> >>> per node so that each core has 8G memory. However this is
>>>>>> >>> deficient and expensive. I hope to locate the place that uses
>>>>>> >>> the most memory.
>>>>>> >>>
>>>>>> >>> Here is a brief summary of the tests I did in the past:
>>>>>> >>>
>>>>>> >>> Test1: Mesh 1536*128*384 | Process Mesh 48*4*12
>>>>>> >>>   Maximum (over computational time) process memory:        total 7.0727e+08
>>>>>> >>>   Current process memory:                                  total 7.0727e+08
>>>>>> >>>   Maximum (over computational time) space PetscMalloc()ed: total 6.3908e+11
>>>>>> >>>   Current space PetscMalloc()ed:                           total 1.8275e+09
>>>>>> >>>
>>>>>> >>> Test2: Mesh 1536*128*384 | Process Mesh 96*8*24
>>>>>> >>>   Maximum (over computational time) process memory:        total 5.9431e+09
>>>>>> >>>   Current process memory:                                  total 5.9431e+09
>>>>>> >>>   Maximum (over computational time) space PetscMalloc()ed: total 5.3202e+12
>>>>>> >>>   Current space PetscMalloc()ed:                           total 5.4844e+09
>>>>>> >>>
>>>>>> >>> Test3: Mesh 3072*256*768 | Process Mesh 96*8*24
>>>>>> >>>   The OOM (Out Of Memory) killer of the supercomputer terminated
>>>>>> >>>   the job during "KSPSolve".
>>>>>> >>>
>>>>>> >>> I attached the output of ksp_view (the third test's output is
>>>>>> >>> from ksp_view_pre), memory_view and also the petsc options.
>>>>>> >>>
>>>>>> >>> In all the tests, each core can access about 2G memory. In test3,
>>>>>> >>> there are 4223139840 non-zeros in the matrix. Using double
>>>>>> >>> precision, these consume about 1.74 MB per process. Even
>>>>>> >>> considering some extra memory used to store the integer indices,
>>>>>> >>> 2G memory should still be more than enough.
>>>>>> >>>
>>>>>> >>> Is there a way to find out which part of KSPSolve uses the most
>>>>>> >>> memory?
>>>>>> >>> Thank you so much.
>>>>>> >>>
>>>>>> >>> BTW, there are 4 options that remain unused and I don't
>>>>>> >>> understand why they are omitted:
>>>>>> >>> -mg_coarse_telescope_mg_coarse_ksp_type value: preonly
>>>>>> >>> -mg_coarse_telescope_mg_coarse_pc_type value: bjacobi
>>>>>> >>> -mg_coarse_telescope_mg_levels_ksp_max_it value: 1
>>>>>> >>> -mg_coarse_telescope_mg_levels_ksp_type value: richardson
>>>>>> >>>
>>>>>> >>> Regards,
>>>>>> >>> Frank
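On the question of where KSPSolve uses the most memory: one crude approach
is to bracket the call with PETSc's memory queries. A hedged sketch (the
variable names are illustrative; ksp, b, x exist elsewhere):

  /* Sketch: report how much resident memory one KSPSolve call adds.
     If maximum-usage queries are also wanted, PetscMemorySetGetMaximumUsage()
     must be called early, e.g. right after PetscInitialize. */
  PetscLogDouble mem_before, mem_after;

  ierr = PetscMemoryGetCurrentUsage(&mem_before);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = PetscMemoryGetCurrentUsage(&mem_after);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD,
                     "KSPSolve grew resident memory by %g bytes\n",
                     mem_after - mem_before);CHKERRQ(ierr);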
>>>>>> >>> On 07/13/2016 05:47 PM, Dave May wrote:
>>>>>> >>>>
>>>>>> >>>> On 14 July 2016 at 01:07, frank <hengj...@uci.edu> wrote:
>>>>>> >>>> Hi Dave,
>>>>>> >>>>
>>>>>> >>>> Sorry for the late reply.
>>>>>> >>>> Thank you so much for your detailed reply.
>>>>>> >>>>
>>>>>> >>>> I have a question about the estimation of the memory usage.
>>>>>> >>>> There are 4223139840 allocated non-zeros and 18432 MPI
>>>>>> >>>> processes. Double precision is used. So the memory per process
>>>>>> >>>> is:
>>>>>> >>>>   4223139840 * 8 bytes / 18432 / 1024 / 1024 = 1.74 MB ?
>>>>>> >>>> Did I do something wrong here? Because this seems too small.
>>>>>> >>>>
>>>>>> >>>> No - I totally f***ed it up. You are correct. That'll teach me
>>>>>> >>>> for fumbling around with my iphone calculator and not using my
>>>>>> >>>> brain. (Note that to convert to MB just divide by 1e6, not
>>>>>> >>>> 1024^2 - although I apparently cannot convert between units
>>>>>> >>>> correctly....)
>>>>>> >>>>
>>>>>> >>>> From the PETSc objects associated with the solver, it looks like
>>>>>> >>>> it _should_ run with 2GB per MPI rank. Sorry for my mistake.
>>>>>> >>>> Possibilities are: somewhere in your usage of PETSc you've
>>>>>> >>>> introduced a memory leak; PETSc is doing a huge over-allocation
>>>>>> >>>> (e.g. as per our discussion of MatPtAP); or in your application
>>>>>> >>>> code there are other objects you have forgotten to log the
>>>>>> >>>> memory for.
>>>>>> >>>>
>>>>>> >>>> I am running this job on Bluewater.
>>>>>> >>>> I am using the 7-point FD stencil in 3D.
>>>>>> >>>>
>>>>>> >>>> I thought so on both counts.
>>>>>> >>>>
>>>>>> >>>> I apologize that I made a stupid mistake in computing the memory
>>>>>> >>>> per core. My settings meant each core could access only 2G
>>>>>> >>>> memory on average, instead of the 8G which I mentioned in a
>>>>>> >>>> previous email. I re-ran the job with 8G memory per core on
>>>>>> >>>> average and there was no "Out Of Memory" error. I will do more
>>>>>> >>>> tests to see if there is still some memory issue.
>>>>>> >>>>
>>>>>> >>>> Ok. I'd still like to know where the memory was being used since
>>>>>> >>>> my estimates were off.
>>>>>> >>>>
>>>>>> >>>> Thanks,
>>>>>> >>>> Dave
>>>>>> >>>>
>>>>>> >>>> Regards,
>>>>>> >>>> Frank
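For the three possibilities Dave lists, PETSc's built-in accounting
options are the natural first step; a sketch of a relevant set (all are
standard PETSc command-line options):

  -malloc_dump    (at PetscFinalize, lists PETSc allocations never freed: a leak check)
  -memory_view    (prints the per-process memory high-water mark at the end of the run)
  -log_view       (per-stage/per-event summary, including PETSc object counts and memory)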
>>>>>> >>>> On 07/11/2016 01:18 PM, Dave May wrote:
>>>>>> >>>>> Hi Frank,
>>>>>> >>>>>
>>>>>> >>>>> On 11 July 2016 at 19:14, frank <hengj...@uci.edu> wrote:
>>>>>> >>>>> Hi Dave,
>>>>>> >>>>>
>>>>>> >>>>> I re-ran the test using bjacobi as the preconditioner on the
>>>>>> >>>>> coarse mesh of telescope. The grid is 3072*256*768 and the
>>>>>> >>>>> process mesh is 96*8*24. The petsc options file is attached.
>>>>>> >>>>> I still got the "Out Of Memory" error. The error occurred
>>>>>> >>>>> before the linear solver finished one step. So I don't have the
>>>>>> >>>>> full info from ksp_view. The info from ksp_view_pre is attached.
>>>>>> >>>>>
>>>>>> >>>>> Okay - that is essentially useless (sorry)
>>>>>> >>>>>
>>>>>> >>>>> It seems to me that the error occurred when the decomposition
>>>>>> >>>>> was going to be changed.
>>>>>> >>>>>
>>>>>> >>>>> Based on what information?
>>>>>> >>>>> Running with -info would give us more clues, but will create a
>>>>>> >>>>> ton of output.
>>>>>> >>>>> Please try running the case which failed with -info.
>>>>>> >>>>>
>>>>>> >>>>> I had another test with a grid of 1536*128*384 and the same
>>>>>> >>>>> process mesh as above. There was no error. The ksp_view info
>>>>>> >>>>> is attached for comparison.
>>>>>> >>>>> Thank you.
>>>>>> >>>>>
>>>>>> >>>>> [3] Here is my crude estimate of your memory usage.
>>>>>> >>>>> I'll target the biggest memory hogs only, to get an order of
>>>>>> >>>>> magnitude estimate.
>>>>>> >>>>>
>>>>>> >>>>> * The fine grid operator contains 4223139840 non-zeros --> 1.8
>>>>>> >>>>> GB per MPI rank, assuming double precision.
>>>>>> >>>>> The indices for the AIJ could amount to another 0.3 GB
>>>>>> >>>>> (assuming 32-bit integers).
>>>>>> >>>>>
>>>>>> >>>>> * You use 5 levels of coarsening, so the other operators should
>>>>>> >>>>> represent (collectively)
>>>>>> >>>>>   2.1/8 + 2.1/8^2 + 2.1/8^3 + 2.1/8^4 ~ 300 MB
>>>>>> >>>>> per MPI rank on the communicator with 18432 ranks.
>>>>>> >>>>> The coarse grid should consume ~ 0.5 MB per MPI rank on the
>>>>>> >>>>> communicator with 18432 ranks.
>>>>>> >>>>>
>>>>>> >>>>> * You use a reduction factor of 64, making the new communicator
>>>>>> >>>>> one with 288 MPI ranks.
>>>>>> >>>>> PCTelescope will first gather a temporary matrix associated
>>>>>> >>>>> with your coarse level operator, assuming a comm size of 288,
>>>>>> >>>>> living on the comm with size 18432.
>>>>>> >>>>> This matrix will require approximately 0.5 * 64 = 32 MB per
>>>>>> >>>>> core on the 288 ranks.
>>>>>> >>>>> This matrix is then used to form a new MPIAIJ matrix on the
>>>>>> >>>>> subcomm, thus requiring another 32 MB per rank.
>>>>>> >>>>> The temporary matrix is then destroyed.
>>>>>> >>>>>
>>>>>> >>>>> * Because a DMDA is detected, a permutation matrix is
>>>>>> >>>>> assembled. This requires 2 doubles per point in the DMDA.
>>>>>> >>>>> Your coarse DMDA contains 92 x 16 x 48 points.
>>>>>> >>>>> Thus the permutation matrix will require < 1 MB per MPI rank
>>>>>> >>>>> on the sub-comm.
>>>>>> >>>>>
>>>>>> >>>>> * Lastly, the matrix is permuted. This uses MatPtAP(), but the
>>>>>> >>>>> resulting operator will have the same memory footprint as the
>>>>>> >>>>> unpermuted matrix (32 MB). At any stage in PCTelescope, only 2
>>>>>> >>>>> operators of size 32 MB are held in memory when the DMDA is
>>>>>> >>>>> provided.
>>>>>> >>>>>
>>>>>> >>>>> From my rough estimates, the worst case memory footprint for
>>>>>> >>>>> any given core, given your options, is approximately
>>>>>> >>>>>   2100 MB + 300 MB + 32 MB + 32 MB + 1 MB = 2465 MB
>>>>>> >>>>> This is way below 8 GB.
>>>>>> >>>>>
>>>>>> >>>>> Note this estimate completely ignores:
>>>>>> >>>>> (1) the memory required for the restriction operator,
>>>>>> >>>>> (2) the potential growth in the number of non-zeros per row due
>>>>>> >>>>> to Galerkin coarsening (I wish -ksp_view_pre reported the
>>>>>> >>>>> output from MatView so we could see the number of non-zeros
>>>>>> >>>>> required by the coarse level operators),
>>>>>> >>>>> (3) all temporary vectors required by the CG solver, and those
>>>>>> >>>>> required by the smoothers,
>>>>>> >>>>> (4) internal memory allocated by MatPtAP,
>>>>>> >>>>> (5) memory associated with ISs used within PCTelescope.
>>>>>> >>>>>
>>>>>> >>>>> So either I am completely off in my estimates, or you have not
>>>>>> >>>>> carefully estimated the memory usage of your application code.
>>>>>> >>>>> Hopefully others might examine/correct my rough estimates.
>>>>>> >>>>>
>>>>>> >>>>> Since I don't have your code I cannot assess the latter.
>>>>>> >>>>> Since I don't have access to the same machine you are running
>>>>>> >>>>> on, I think we need to take a step back.
>>>>>> >>>>>
>>>>>> >>>>> [1] What machine are you running on? Send me a URL if it's
>>>>>> >>>>> available.
>>>>>> >>>>>
>>>>>> >>>>> [2] What discretization are you using? (I am guessing a scalar
>>>>>> >>>>> 7-point FD stencil.)
>>>>>> >>>>> If it's a 7-point FD stencil, we should be able to examine the
>>>>>> >>>>> memory usage of your solver configuration using a standard,
>>>>>> >>>>> lightweight existing PETSc example, run on your machine at the
>>>>>> >>>>> same scale.
>>>>>> >>>>> This would hopefully enable us to correctly evaluate the actual
>>>>>> >>>>> memory usage required by the solver configuration you are
>>>>>> >>>>> using.
>>>>>> >>>>>
>>>>>> >>>>> Thanks,
>>>>>> >>>>> Dave
>>>>>> >>>>>
>>>>>> >>>>> Frank
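On Dave's point [2]: PETSc ships a lightweight 3D FD Laplacian solver at
src/ksp/ksp/examples/tutorials/ex45.c that can be configured from the
command line. A hedged sketch of such a run (the grid-size flags are the
usual DMDA options and are assumed to apply to ex45; launcher and options
file per your installation):

  mpiexec -n 18432 ./ex45 -da_grid_x 3072 -da_grid_y 256 -da_grid_z 768 \
      -options_file petsc_options.txt -memory_view -log_view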
>>>>>> >>>>> On 07/08/2016 10:38 PM, Dave May wrote:
>>>>>> >>>>>>
>>>>>> >>>>>> On Saturday, 9 July 2016, frank <hengj...@uci.edu> wrote:
>>>>>> >>>>>> Hi Barry and Dave,
>>>>>> >>>>>>
>>>>>> >>>>>> Thank you both for the advice.
>>>>>> >>>>>>
>>>>>> >>>>>> @Barry
>>>>>> >>>>>> I made a mistake in the file names in the last email. I
>>>>>> >>>>>> attached the correct files this time.
>>>>>> >>>>>> For all three tests, 'Telescope' is used as the coarse
>>>>>> >>>>>> preconditioner.
>>>>>> >>>>>>
>>>>>> >>>>>> == Test1: Grid: 1536*128*384, Process Mesh: 48*4*12
>>>>>> >>>>>> Part of the memory usage:  Vector   125   124   3971904   0.
>>>>>> >>>>>>                            Matrix   101   101   9462372   0.
>>>>>> >>>>>>
>>>>>> >>>>>> == Test2: Grid: 1536*128*384, Process Mesh: 96*8*24
>>>>>> >>>>>> Part of the memory usage:  Vector   125   124    681672   0.
>>>>>> >>>>>>                            Matrix   101   101   1462180   0.
>>>>>> >>>>>>
>>>>>> >>>>>> In theory, the memory usage in Test1 should be 8 times that of
>>>>>> >>>>>> Test2. In my case, it is about 6 times.
>>>>>> >>>>>>
>>>>>> >>>>>> == Test3: Grid: 3072*256*768, Process Mesh: 96*8*24.
>>>>>> >>>>>> Sub-domain per process: 32*32*32
>>>>>> >>>>>> Here I get the out of memory error.
>>>>>> >>>>>>
>>>>>> >>>>>> I tried to use -mg_coarse jacobi. In this way, I don't need to
>>>>>> >>>>>> set -mg_coarse_ksp_type and -mg_coarse_pc_type explicitly,
>>>>>> >>>>>> right?
>>>>>> >>>>>> The linear solver didn't work in this case. PETSc output some
>>>>>> >>>>>> errors.
>>>>>> >>>>>>
>>>>>> >>>>>> @Dave
>>>>>> >>>>>> In test3, I use only one instance of 'Telescope'. On the
>>>>>> >>>>>> coarse mesh of 'Telescope', I used LU as the preconditioner
>>>>>> >>>>>> instead of SVD.
>>>>>> >>>>>> If I set the levels correctly, then on the last coarse mesh of
>>>>>> >>>>>> MG where it calls 'Telescope', the sub-domain per process is
>>>>>> >>>>>> 2*2*2.
>>>>>> >>>>>> On the last coarse mesh of 'Telescope', there is only one grid
>>>>>> >>>>>> point per process.
>>>>>> >>>>>> I still got the OOM error. The detailed petsc options file is
>>>>>> >>>>>> attached.
>>>>>> >>>>>>
>>>>>> >>>>>> Do you understand the expected memory usage for the particular
>>>>>> >>>>>> parallel LU implementation you are using? I don't (seriously).
>>>>>> >>>>>> Replace LU with bjacobi and re-run this test. My point about
>>>>>> >>>>>> solver debugging is still valid.
>>>>>> >>>>>>
>>>>>> >>>>>> And please send the result of KSPView so we can see what is
>>>>>> >>>>>> actually used in the computations.
>>>>>> >>>>>>
>>>>>> >>>>>> Thanks
>>>>>> >>>>>> Dave
>>>>>> >>>>>>
>>>>>> >>>>>> Thank you so much.
>>>>>> >>>>>>
>>>>>> >>>>>> Frank
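For concreteness, a sketch of the coarse-level options under discussion,
with Dave's substitution applied. The prefixes follow the
-mg_coarse_telescope_* names quoted earlier in this thread; this is an
illustrative fragment, not Frank's actual options file:

  -mg_coarse_pc_type telescope
  -mg_coarse_pc_telescope_reduction_factor 64
  -mg_coarse_telescope_pc_type mg
  -mg_coarse_telescope_mg_coarse_ksp_type preonly
  -mg_coarse_telescope_mg_coarse_pc_type bjacobi   (in place of lu)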
>>>>>> >>>>>> On 07/06/2016 02:51 PM, Barry Smith wrote:
>>>>>> >>>>>> On Jul 6, 2016, at 4:19 PM, frank <hengj...@uci.edu> wrote:
>>>>>> >>>>>>
>>>>>> >>>>>> Hi Barry,
>>>>>> >>>>>>
>>>>>> >>>>>> Thank you for your advice.
>>>>>> >>>>>> I tried three tests. In the 1st test, the grid is 3072*256*768
>>>>>> >>>>>> and the process mesh is 96*8*24.
>>>>>> >>>>>> The linear solver is 'cg', the preconditioner is 'mg' and
>>>>>> >>>>>> 'telescope' is used as the preconditioner at the coarse mesh.
>>>>>> >>>>>> The system gives me the "Out of Memory" error before the
>>>>>> >>>>>> linear system is completely solved.
>>>>>> >>>>>> The info from '-ksp_view_pre' is attached. It seems to me that
>>>>>> >>>>>> the error occurs when it reaches the coarse mesh.
>>>>>> >>>>>>
>>>>>> >>>>>> The 2nd test uses a grid of 1536*128*384 and a process mesh of
>>>>>> >>>>>> 96*8*24. The 3rd test uses the same grid but a different
>>>>>> >>>>>> process mesh, 48*4*12.
>>>>>> >>>>>>
>>>>>> >>>>>> Are you sure this is right? The total matrix and vector memory
>>>>>> >>>>>> usage goes from the 2nd test
>>>>>> >>>>>>   Vector   384   383    8,193,712   0.
>>>>>> >>>>>>   Matrix   103   103   11,508,688   0.
>>>>>> >>>>>> to the 3rd test
>>>>>> >>>>>>   Vector   384   383    1,590,520   0.
>>>>>> >>>>>>   Matrix   103   103    3,508,664   0.
>>>>>> >>>>>> that is, the memory usage got smaller, but if you have only
>>>>>> >>>>>> 1/8th the processes and the same grid it should have gotten
>>>>>> >>>>>> about 8 times bigger. Did you maybe cut the grid by a factor
>>>>>> >>>>>> of 8 also? If so, that still doesn't explain it, because the
>>>>>> >>>>>> memory usage changed by a factor of 5-something for the
>>>>>> >>>>>> vectors and 3-something for the matrices.
>>>>>> >>>>>>
>>>>>> >>>>>> The linear solver and petsc options in the 2nd and 3rd tests
>>>>>> >>>>>> are the same as in the 1st test. The linear solver works fine
>>>>>> >>>>>> in both tests.
>>>>>> >>>>>> I attached the memory usage of the 2nd and 3rd tests. The
>>>>>> >>>>>> memory info is from the option '-log_summary'. I tried to use
>>>>>> >>>>>> '-memory_info' as you suggested, but in my case petsc treated
>>>>>> >>>>>> it as an unused option. It output nothing about the memory. Do
>>>>>> >>>>>> I need to add something to my code so I can use
>>>>>> >>>>>> '-memory_info'?
>>>>>> >>>>>>
>>>>>> >>>>>> Sorry, my mistake, the option is -memory_view
>>>>>> >>>>>>
>>>>>> >>>>>> Can you run the one case with -memory_view and -mg_coarse
>>>>>> >>>>>> jacobi -ksp_max_it 1 (just so it doesn't iterate forever) to
>>>>>> >>>>>> see how much memory is used without the telescope? Also run
>>>>>> >>>>>> case 2 the same way.
>>>>>> >>>>>>
>>>>>> >>>>>> Barry
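Spelled out with full option names, Barry's suggested diagnostic run might
look like this (launcher and executable name are placeholders; his
"-mg_coarse jacobi" shorthand is read as the coarse-level PC type):

  mpiexec -n 18432 ./your_solver -memory_view -mg_coarse_ksp_type preonly \
      -mg_coarse_pc_type jacobi -ksp_max_it 1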
>>>>>> >>>>>> In both tests the memory usage is not large.
>>>>>> >>>>>>
>>>>>> >>>>>> It seems to me that it might be the 'telescope' preconditioner
>>>>>> >>>>>> that allocated a lot of memory and caused the error in the 1st
>>>>>> >>>>>> test.
>>>>>> >>>>>> Is there a way to show how much memory it allocated?
>>>>>> >>>>>>
>>>>>> >>>>>> Frank
>>>>>> >>>>>>
>>>>>> >>>>>> On 07/05/2016 03:37 PM, Barry Smith wrote:
>>>>>> >>>>>> Frank,
>>>>>> >>>>>>
>>>>>> >>>>>> You can run with -ksp_view_pre to have it "view" the KSP
>>>>>> >>>>>> before the solve, so hopefully it gets that far.
>>>>>> >>>>>>
>>>>>> >>>>>> Please run the problem that does fit with -memory_info; when
>>>>>> >>>>>> the problem completes it will show the "high water mark" for
>>>>>> >>>>>> PETSc allocated memory and total memory used. We first want to
>>>>>> >>>>>> look at these numbers to see if it is using more memory than
>>>>>> >>>>>> you expect. You could also run with say half the grid spacing
>>>>>> >>>>>> to see how the memory usage scales with the increase in grid
>>>>>> >>>>>> points. Make the runs also with -log_view and send all the
>>>>>> >>>>>> output from these options.
>>>>>> >>>>>>
>>>>>> >>>>>> Barry
>>>>>> >>>>>>
>>>>>> >>>>>> On Jul 5, 2016, at 5:23 PM, frank <hengj...@uci.edu> wrote:
>>>>>> >>>>>>
>>>>>> >>>>>> Hi,
>>>>>> >>>>>>
>>>>>> >>>>>> I am using the CG ksp solver and the Multigrid preconditioner
>>>>>> >>>>>> to solve a linear system in parallel.
>>>>>> >>>>>> I chose to use 'Telescope' as the preconditioner on the coarse
>>>>>> >>>>>> mesh for its good performance.
>>>>>> >>>>>> The petsc options file is attached.
>>>>>> >>>>>>
>>>>>> >>>>>> The domain is a 3d box.
>>>>>> >>>>>> It works well when the grid is 1536*128*384 and the process
>>>>>> >>>>>> mesh is 96*8*24. When I double the size of the grid and keep
>>>>>> >>>>>> the same process mesh and petsc options, I get an "out of
>>>>>> >>>>>> memory" error from the super-cluster I am using.
>>>>>> >>>>>> Each process has access to at least 8G memory, which should be
>>>>>> >>>>>> more than enough for my application. I am sure that all the
>>>>>> >>>>>> other parts of my code (except the linear solver) do not use
>>>>>> >>>>>> much memory. So I suspect there is something wrong with the
>>>>>> >>>>>> linear solver.
>>>>>> >>>>>> The error occurs before the linear system is completely
>>>>>> >>>>>> solved, so I don't have the info from ksp view. I am not able
>>>>>> >>>>>> to reproduce the error with a smaller problem either.
>>>>>> >>>>>> In addition, I tried to use block jacobi as the preconditioner
>>>>>> >>>>>> with the same grid and the same decomposition. The linear
>>>>>> >>>>>> solver runs extremely slowly, but there is no memory error.
>>>>>> >>>>>>
>>>>>> >>>>>> How can I diagnose what exactly causes the error?
>>>>>> >>>>>> Thank you so much.
>>>>>> >>>>>>
>>>>>> >>>>>> Frank
>>>>>> >>>>>> <petsc_options.txt>
>>>>>> >>>>>> <ksp_view_pre.txt><memory_test2.txt><memory_test3.txt><petsc_options.txt>
>>>>>> >>> <ksp_view1.txt><ksp_view2.txt><ksp_view3.txt><memory1.txt><memory2.txt><petsc_options1.txt><petsc_options2.txt><petsc_options3.txt>

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener