   There are lots of optimizations in VecScatter where it matters (i.e., in parallel) but not in the sequential case :-). We really should fix this :-(
   Barry

On Oct 21, 2013, at 3:46 PM, Matthew Knepley <[email protected]> wrote:

> On Mon, Oct 21, 2013 at 3:23 PM, Barry Smith <[email protected]> wrote:
>
>   Matt,
>
>     I think you are running on 1 process, where the DMDA doesn't have an
> optimized path; when I run on 2 processes the numbers indicate nothing
> proportional to dof * (number of local points).
>
> Yes, I figured if it was not doing the right thing on 1, why go to more? :)
>
>    Matt
>
> dof = 12
> ~/Src/petsc/test master $ petscmpiexec -n 2 ./ex1 -malloc_log | grep VecScatter
> [0] 7 21344 VecScatterCreate()
> [0] 2 32 VecScatterCreateCommon_PtoS()
> [0] 39 182480 VecScatterCreate_PtoS()
>
> dof = 8
> ~/Src/petsc/test master $ petscmpiexec -n 2 ./ex1 -malloc_log | grep VecScatter
> [0] 7 21344 VecScatterCreate()
> [0] 2 32 VecScatterCreateCommon_PtoS()
> [0] 39 176080 VecScatterCreate_PtoS()
>
> dof = 4
> ~/Src/petsc/test master $ petscmpiexec -n 2 ./ex1 -malloc_log | grep VecScatter
> [0] 7 21344 VecScatterCreate()
> [0] 2 32 VecScatterCreateCommon_PtoS()
> [0] 39 169680 VecScatterCreate_PtoS()
>
> dof = 2
> ~/Src/petsc/test master $ petscmpiexec -n 2 ./ex1 -malloc_log | grep VecScatter
> [0] 7 21344 VecScatterCreate()
> [0] 2 32 VecScatterCreateCommon_PtoS()
> [0] 39 166480 VecScatterCreate_PtoS()
>
> dof = 2, grid is 50 by 50 instead of 100 by 100
> ~/Src/petsc/test master $ petscmpiexec -n 2 ./ex1 -malloc_log | grep VecScatter
> [0] 7 6352 VecScatterCreate()
> [0] 2 32 VecScatterCreateCommon_PtoS()
> [0] 39 43952 VecScatterCreate_PtoS()
>
> The IS creation in the DMDA is far more troubling:
>
> ~/Src/petsc/test master $ petscmpiexec -n 2 ./ex1 -malloc_log | grep IS
>
> dof = 2
> [0] 1 20400 ISBlockSetIndices_Block()
> [0] 15 3760 ISCreate()
> [0] 4 128 ISCreate_Block()
> [0] 1 16 ISCreate_Stride()
> [0] 2 81600 ISGetIndices_Block()
> [0] 1 20400 ISLocalToGlobalMappingBlock()
> [0] 7 42016 ISLocalToGlobalMappingCreate()
>
> dof = 4
> ~/Src/petsc/test master $ petscmpiexec -n 2 ./ex1 -malloc_log | grep IS
> [0] 1 20400 ISBlockSetIndices_Block()
> [0] 15 3760 ISCreate()
> [0] 4 128 ISCreate_Block()
> [0] 1 16 ISCreate_Stride()
> [0] 2 163200 ISGetIndices_Block()
> [0] 1 20400 ISLocalToGlobalMappingBlock()
> [0] 7 82816 ISLocalToGlobalMappingCreate()
>
> dof = 8
> ~/Src/petsc/test master $ petscmpiexec -n 2 ./ex1 -malloc_log | grep IS
> [0] 1 20400 ISBlockSetIndices_Block()
> [0] 15 3760 ISCreate()
> [0] 4 128 ISCreate_Block()
> [0] 1 16 ISCreate_Stride()
> [0] 2 326400 ISGetIndices_Block()
> [0] 1 20400 ISLocalToGlobalMappingBlock()
> [0] 7 164416 ISLocalToGlobalMappingCreate()
>
> dof = 12
> ~/Src/petsc/test master $ petscmpiexec -n 2 ./ex1 -malloc_log | grep IS
> [0] 1 20400 ISBlockSetIndices_Block()
> [0] 15 3760 ISCreate()
> [0] 4 128 ISCreate_Block()
> [0] 1 16 ISCreate_Stride()
> [0] 2 489600 ISGetIndices_Block()
> [0] 1 20400 ISLocalToGlobalMappingBlock()
> [0] 7 246016 ISLocalToGlobalMappingCreate()
>
> Here the accessing of indices is done at the point level (as well as the block level),
> and hence the memory usage is proportional to dof * (local number of grid points). Of
> course it is still only proportional to the vector size. There is some improvement
> we could make: with a lot of refactoring we could remove the dof factor completely;
> with a little refactoring we could bring it down to a single
> dof * (local number of grid points).
>
> I cannot understand why you are seeing memory usage 7 times more than a
> vector. That seems like a lot.
>
>   Barry
>
> On Oct 21, 2013, at 11:32 AM, Barry Smith <[email protected]> wrote:
>
> >   The PETSc DMDA object greedily allocates several arrays of data used to
> > set up the communication and other things like local-to-global mappings,
> > even before you create any vectors. This is why you see this big bump in
> > memory usage.
> >
> >   BUT I don't think it should be any worse in 3.4 than in 3.3 or earlier;
> > at least we did not intend to make it worse. Are you sure it is using more
> > memory than in 3.3?
> >
> >   In order for us to decrease the memory usage of the DMDA setup it would
> > be helpful if we knew which objects created within it used the most memory.
> > There is some sloppiness in that routine about not reusing memory as well as
> > it could; I am not sure how much difference that would make.
> >
> >   Barry
> >
> > On Oct 21, 2013, at 7:02 AM, Juha Jäykkä <[email protected]> wrote:
> >
> >> Dear list members,
> >>
> >> I have noticed strange memory consumption after upgrading to the 3.4 series. I
> >> never had time to properly investigate, but here is what happens [yes, this
> >> might be a petsc4py issue, but I doubt it]:
> >>
> >> # helpers contains a _ProcessMemoryInfoProc routine which just digs the
> >> # memory usage data from /proc
> >> import helpers
> >> procdata=helpers._ProcessMemoryInfoProc()
> >> print procdata.rss/2**20, "MiB /", procdata.os_specific[3][1]
> >> from petsc4py import PETSc
> >> procdata=helpers._ProcessMemoryInfoProc()
> >> print procdata.rss/2**20, "MiB /", procdata.os_specific[3][1]
> >> da = PETSc.DA().create(sizes=[100,100,100],
> >>                        proc_sizes=[PETSc.DECIDE,PETSc.DECIDE,PETSc.DECIDE],
> >>                        boundary_type=[3,0,0],
> >>                        stencil_type=PETSc.DA.StencilType.BOX,
> >>                        dof=7, stencil_width=1, comm=PETSc.COMM_WORLD)
> >> procdata=helpers._ProcessMemoryInfoProc()
> >> print procdata.rss/2**20, "MiB /", procdata.os_specific[3][1]
> >> vec=da.createGlobalVec()
> >> procdata=helpers._ProcessMemoryInfoProc()
> >> print procdata.rss/2**20, "MiB /", procdata.os_specific[3][1]
> >>
> >> outputs
> >>
> >> 48 MiB / 49348 kB
> >> 48 MiB / 49360 kB
> >> 381 MiB / 446228 kB
> >> 435 MiB / 446228 kB
> >>
> >> Which is odd: the size of the actual data to be stored in the da is just about
> >> 56 megabytes, so why does creating the da consume 7 times that? And why does
> >> the DA reserve the memory in the first place? I thought memory only gets
> >> allocated once an associated vector is created, and it indeed looks like the
> >> createGlobalVec call does allocate the right amount of data. But what
> >> is that 330 MiB that DA().create() consumes? [It's actually the .setUp()
> >> method that does the consuming, but that's not of much use, as it needs to
> >> be called before a vector can be created.]
> >>
> >> Cheers,
> >> Juha
>
> --
> What most experimenters take for granted before they begin their experiments
> is infinitely more interesting than any results to which their experiments
> lead.
>    -- Norbert Wiener
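A quick back-of-the-envelope check of Barry's ISGetIndices_Block() numbers above. This is only a sketch built on assumptions not stated in the thread: a default 4-byte PetscInt, and each of the two ranks holding a 50x100 slab of the 100x100 grid plus one ghost row from the stencil width of 1, i.e. 51 x 100 = 5100 ghosted local points, with two index arrays allocated per rank.

# Hypothetical sanity check of the ISGetIndices_Block() sizes reported by -malloc_log.
# Assumptions (not from the thread): PetscInt is 4 bytes; 5100 ghosted local points
# per rank (51 x 100); two index allocations, matching the "[0] 2 ..." count.
SIZEOF_PETSCINT = 4            # bytes, default 32-bit PetscInt
GHOSTED_LOCAL_POINTS = 51 * 100

reported = {2: 81600, 4: 163200, 8: 326400, 12: 489600}  # dof -> bytes from the log

for dof in sorted(reported):
    estimate = 2 * GHOSTED_LOCAL_POINTS * dof * SIZEOF_PETSCINT
    print dof, reported[dof], estimate   # the two byte counts agree for every dof

Under those assumptions the estimate reproduces the logged sizes exactly, which is consistent with Barry's statement that this memory scales as dof * (local number of grid points).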
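The helpers module Juha mentions is not shown in the thread. Below is a minimal sketch of what such a /proc-based probe might look like on Linux; the function name and the choice of VmRSS/VmHWM fields are assumptions for illustration, not Juha's actual _ProcessMemoryInfoProc code.

# Minimal sketch (not Juha's actual helpers module): read the current and peak
# resident set size of this process from /proc/self/status on Linux.
def _process_memory_info_proc():
    """Return (rss_bytes, peak_string) parsed from /proc/self/status."""
    rss_kb = 0
    peak = "unknown"
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                rss_kb = int(line.split()[1])      # value is reported in kB
            elif line.startswith("VmHWM:"):
                peak = " ".join(line.split()[1:])  # e.g. "49348 kB"
    return rss_kb * 1024, peak

rss_bytes, peak = _process_memory_info_proc()
print rss_bytes / 2**20, "MiB /", peak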
