Sorry, my mistake. The double precision work arrays needed inside the
VecScatter are dof * number of ghost points, while the space for the
indices should be the number of local grid points (not times dof). Is
that what you see? If the space is dof * number of local grid points,
then something is wrong somewhere along the processing.

  Barry
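
A minimal back-of-the-envelope sketch of the sizes described above, in
the spirit of the petsc4py snippet further down the thread. The helper
below is hypothetical (not PETSc API) and assumes 8-byte scalars,
4-byte indices, and a box stencil with a one-point ghost layer on every
side of the local patch.

# Hypothetical sketch of the expected VecScatter storage, not PETSc API.
def scatter_sizes(local_shape, dof, stencil_width=1,
                  scalar_bytes=8, int_bytes=4):
    local = 1
    ghosted = 1
    for n in local_shape:
        local *= n
        ghosted *= n + 2 * stencil_width
    ghost = ghosted - local                # ghost points only
    work = dof * ghost * scalar_bytes      # double precision work arrays
    indices = local * int_bytes            # block indices: no dof factor
    return work, indices

# Juha's case further down: a 100x100x100 patch with dof = 7
work, idx = scatter_sizes([100, 100, 100], 7)
print("work arrays: %.1f MiB, indices: %.1f MiB"
      % (work / 2.0**20, idx / 2.0**20))
# -> roughly 3.3 MiB of work space and 3.8 MiB of indices; a dof-scaled
#    index array would instead be about 26.7 MiB.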

On Oct 21, 2013, at 1:04 PM, Matthew Knepley <[email protected]> wrote:

> On Mon, Oct 21, 2013 at 1:00 PM, Barry Smith <[email protected]> wrote:
>
>    Matt,
>
>    The scatters should always use block indices (I think they do), so
>    the memory usage for the scatters should not have a dof* in front
>    of this. Are you sure that the dof* is there? If it is, is it
>    because it is a block size that we don't support directly? We
>    currently have special support for BS of 1,2,3,4,5,6,7,8,12. We
>    should at least fill in 9,10,11.
>
>    Do we somewhere inside the VecScatter create business mistakenly
>    create an array that depends on dof*?
>
> I am sure of this dependence. It's very easy to see by just creating
> the DA and ending, using -malloc_test. If it is intended to use block
> indices, this is not happening.
>
>    Matt
>
>    Barry
>
> On Oct 21, 2013, at 11:52 AM, Matthew Knepley <[email protected]> wrote:
>
> > On Mon, Oct 21, 2013 at 11:32 AM, Barry Smith <[email protected]> wrote:
> >
> >    The PETSc DMDA object greedily allocates several arrays of data
> >    used to set up the communication and other things like local to
> >    global mappings even before you create any vectors. This is why
> >    you see this big bump in memory usage.
> >
> >    BUT I don't think it should be any worse in 3.4 than in 3.3 or
> >    earlier; at least we did not intend to make it worse. Are you
> >    sure it is using more memory than in 3.3?
> >
> >    In order for us to decrease the memory usage of the DMDA setup,
> >    it would be helpful if we knew which objects created within it
> >    use the most memory. There is some sloppiness in that routine of
> >    not reusing memory as well as it could be; not sure how much
> >    difference that would make.
> >
> > I am adding a DMDA example to look at this in detail. Here is what I
> > have up front. Suppose that there are G grid vertices, e.g. 10^6 in
> > your example, so that a vector takes up dof*8G bytes. Then the 2D
> > DMDA allocates
> >
> >   Create ltog scatter   dof*8G
> >   Create gtol scatter   dof*8G
> >   Raw indices           dof*4G
> >   Create ltogmap        dof*4G
> >   Create ltogmapb       4G
> >   --------------------------------------------
> >                         dof*24G + 4G  <  4 vectors
> >
> > It also allocates 2 temporary vectors which are freed, but which
> > your test may pick up since the OS might not have garbage collected
> > them. I will get the precise numbers for 3D, but they should be
> > similar.
> >
> > I don't really see the point of using a DMDA without the scatters.
> > You could save 1 vector of storage by making the creation of the l2g
> > maps for the global vector lazy (and possibly those indices we use
> > to remap arrays).
> >
> >    Matt
> >
> >    Barry
> >
> > On Oct 21, 2013, at 7:02 AM, Juha Jäykkä <[email protected]> wrote:
> >
> > > Dear list members,
> > >
> > > I have noticed strange memory consumption after upgrading to the
> > > 3.4 series. I never had time to properly investigate, but here is
> > > what happens [yes, this might be a petsc4py issue, but I doubt it]:
> > >
> > > # helpers contains _ProcessMemoryInfoProc routine which just digs
> > > # the memory usage data from /proc
> > > import helpers
> > > procdata=helpers._ProcessMemoryInfoProc()
> > > print procdata.rss/2**20, "MiB /", procdata.os_specific[3][1]
> > > from petsc4py import PETSc
> > > procdata=helpers._ProcessMemoryInfoProc()
> > > print procdata.rss/2**20, "MiB /", procdata.os_specific[3][1]
> > > da = PETSc.DA().create(sizes=[100,100,100],
> > >                        proc_sizes=[PETSc.DECIDE,PETSc.DECIDE,PETSc.DECIDE],
> > >                        boundary_type=[3,0,0],
> > >                        stencil_type=PETSc.DA.StencilType.BOX,
> > >                        dof=7, stencil_width=1, comm=PETSc.COMM_WORLD)
> > > procdata=helpers._ProcessMemoryInfoProc()
> > > print procdata.rss/2**20, "MiB /", procdata.os_specific[3][1]
> > > vec=da.createGlobalVec()
> > > procdata=helpers._ProcessMemoryInfoProc()
> > > print procdata.rss/2**20, "MiB /", procdata.os_specific[3][1]
> > >
> > > outputs
> > >
> > > 48 MiB / 49348 kB
> > > 48 MiB / 49360 kB
> > > 381 MiB / 446228 kB
> > > 435 MiB / 446228 kB
> > >
> > > Which is odd: the size of the actual data to be stored in the da
> > > is just about 56 megabytes, so why does creating the da consume 7
> > > times that? And why does the DA reserve the memory in the first
> > > place? I thought memory only gets allocated once an associated
> > > vector is created, and it indeed looks like the createGlobalVec
> > > call allocates the right amount of data. But what is that 330 MiB
> > > that DA().create() consumes? [It's actually the .setUp() method
> > > that does the consuming, but that's not of much use as it needs to
> > > be called before a vector can be created.]
> > >
> > > Cheers,
> > > Juha
> >
> > --
> > What most experimenters take for granted before they begin their
> > experiments is infinitely more interesting than any results to which
> > their experiments lead.
> > -- Norbert Wiener
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which
> their experiments lead.
> -- Norbert Wiener
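
As a rough cross-check of the accounting above, the same 2D estimate
applied to Juha's 100x100x100, dof = 7 example works out as follows.
This is pure arithmetic (no PETSc calls), assumes 8-byte scalars and
4-byte integers, and the 3D numbers Matt mentions may well be somewhat
larger.

# Back-of-the-envelope application of the 2D DMDA accounting above.
G, dof = 100 * 100 * 100, 7     # grid vertices and dof in Juha's example
MiB = 2.0 ** 20

vector       = dof * 8 * G      # one global vector (8-byte scalars)
ltog_scatter = dof * 8 * G
gtol_scatter = dof * 8 * G
raw_indices  = dof * 4 * G      # 4-byte integers
ltogmap      = dof * 4 * G
ltogmapb     = 4 * G
setup = ltog_scatter + gtol_scatter + raw_indices + ltogmap + ltogmapb
temp  = 2 * vector              # temporary vectors, freed but perhaps
                                # not yet returned to the OS

print("one global vector     : %6.1f MiB" % (vector / MiB))          # ~ 53 MiB
print("setup (dof*24G + 4G)  : %6.1f MiB" % (setup / MiB))           # ~164 MiB
print("setup + temp vectors  : %6.1f MiB" % ((setup + temp) / MiB))  # ~271 MiB

The jump observed at DA().create() in the listing above is about 333
MiB (48 MiB to 381 MiB), so the 3D case lands in the same ballpark,
somewhat above this 2D-based estimate.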
