Matt,
I think you are running on 1 process, where the DMDA doesn't have an
optimized path. When I run on 2 processes, the numbers show nothing
proportional to dof * number of local points:
dof = 12
~/Src/petsc/test master $ petscmpiexec -n 2 ./ex1 -malloc_log | grep VecScatter
[0] 7 21344 VecScatterCreate()
[0] 2 32 VecScatterCreateCommon_PtoS()
[0] 39 182480 VecScatterCreate_PtoS()
dof = 8
~/Src/petsc/test master $ petscmpiexec -n 2 ./ex1 -malloc_log | grep VecScatter
[0] 7 21344 VecScatterCreate()
[0] 2 32 VecScatterCreateCommon_PtoS()
[0] 39 176080 VecScatterCreate_PtoS()
dof = 4
~/Src/petsc/test master $ petscmpiexec -n 2 ./ex1 -malloc_log | grep VecScatter
[0] 7 21344 VecScatterCreate()
[0] 2 32 VecScatterCreateCommon_PtoS()
[0] 39 169680 VecScatterCreate_PtoS()
dof = 2
~/Src/petsc/test master $ petscmpiexec -n 2 ./ex1 -malloc_log | grep VecScatter
[0] 7 21344 VecScatterCreate()
[0] 2 32 VecScatterCreateCommon_PtoS()
[0] 39 166480 VecScatterCreate_PtoS()
dof = 2, grid is 50 by 50 instead of 100 by 100
~/Src/petsc/test master $ petscmpiexec -n 2 ./ex1 -malloc_log | grep VecScatter
[0] 7 6352 VecScatterCreate()
[0] 2 32 VecScatterCreateCommon_PtoS()
[0] 39 43952 VecScatterCreate_PtoS()
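As a quick check of the VecScatter numbers above, here is a small Python sketch that computes how many bytes VecScatterCreate_PtoS() grows per extra dof on the 100 by 100 grid. The byte counts are copied from the logs above; the even split of the grid over 2 processes and the 8-byte scalar size are assumptions, not something the log states.

```python
# Bytes logged for VecScatterCreate_PtoS() on the 100 by 100 grid, keyed by dof.
ptos_bytes = {2: 166480, 4: 169680, 8: 176080, 12: 182480}

dofs = sorted(ptos_bytes)
# Growth in bytes per additional dof between consecutive runs.
slopes = [
    (ptos_bytes[b] - ptos_bytes[a]) / (b - a)
    for a, b in zip(dofs, dofs[1:])
]

# Assumption: 100 x 100 grid split evenly over 2 processes, 8-byte scalars.
owned_points = 100 * 100 // 2
bytes_per_dof_if_pointwise = owned_points * 8

print("bytes per extra dof:", slopes)
print("8 bytes per owned point per dof would be:", bytes_per_dof_if_pointwise)
```

The growth is 1600 bytes per dof in every interval, far below the 40000 bytes per dof that one scalar per owned point would imply.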
The IS creation in the DMDA is far more troubling:
dof = 2
~/Src/petsc/test master $ petscmpiexec -n 2 ./ex1 -malloc_log | grep IS
[0] 1 20400 ISBlockSetIndices_Block()
[0] 15 3760 ISCreate()
[0] 4 128 ISCreate_Block()
[0] 1 16 ISCreate_Stride()
[0] 2 81600 ISGetIndices_Block()
[0] 1 20400 ISLocalToGlobalMappingBlock()
[0] 7 42016 ISLocalToGlobalMappingCreate()
dof = 4
~/Src/petsc/test master $ petscmpiexec -n 2 ./ex1 -malloc_log | grep IS
[0] 1 20400 ISBlockSetIndices_Block()
[0] 15 3760 ISCreate()
[0] 4 128 ISCreate_Block()
[0] 1 16 ISCreate_Stride()
[0] 2 163200 ISGetIndices_Block()
[0] 1 20400 ISLocalToGlobalMappingBlock()
[0] 7 82816 ISLocalToGlobalMappingCreate()
dof = 8
~/Src/petsc/test master $ petscmpiexec -n 2 ./ex1 -malloc_log | grep IS
[0] 1 20400 ISBlockSetIndices_Block()
[0] 15 3760 ISCreate()
[0] 4 128 ISCreate_Block()
[0] 1 16 ISCreate_Stride()
[0] 2 326400 ISGetIndices_Block()
[0] 1 20400 ISLocalToGlobalMappingBlock()
[0] 7 164416 ISLocalToGlobalMappingCreate()
dof = 12
~/Src/petsc/test master $ petscmpiexec -n 2 ./ex1 -malloc_log | grep IS
[0] 1 20400 ISBlockSetIndices_Block()
[0] 15 3760 ISCreate()
[0] 4 128 ISCreate_Block()
[0] 1 16 ISCreate_Stride()
[0] 2 489600 ISGetIndices_Block()
[0] 1 20400 ISLocalToGlobalMappingBlock()
[0] 7 246016 ISLocalToGlobalMappingCreate()
Here the indices are accessed at the point level (as well as the block level),
and hence memory usage is proportional to dof * local number of grid points. Of
course it is still only proportional to the vector size. There is some
improvement we could make: with a lot of refactoring we can remove the dof *
factor completely; with a little refactoring we can bring it down to a single
dof * local number of grid points.
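The dof scaling can be read straight off the IS log above. The following Python sketch uses only the byte counts from that log and expresses the growth per dof as a multiple of the block-level index array (the 20400 bytes reported for ISBlockSetIndices_Block()). It is an illustration of the arithmetic, not a statement about how the routines are implemented.

```python
# Bytes from the -malloc_log output above, keyed by dof.
block_bytes = 20400  # ISBlockSetIndices_Block(), the same for every dof
get_indices = {2: 81600, 4: 163200, 8: 326400, 12: 489600}
l2g_create = {2: 42016, 4: 82816, 8: 164416, 12: 246016}


def bytes_per_dof(table):
    """Return the growth in bytes per extra dof between consecutive runs."""
    dofs = sorted(table)
    return [(table[b] - table[a]) // (b - a) for a, b in zip(dofs, dofs[1:])]


# Express each slope as a multiple of the block-level index array.
get_indices_copies = [s // block_bytes for s in bytes_per_dof(get_indices)]
l2g_copies = [s // block_bytes for s in bytes_per_dof(l2g_create)]

print("ISGetIndices_Block copies per dof:", get_indices_copies)
print("ISLocalToGlobalMappingCreate copies per dof:", l2g_copies)
```

In these runs ISGetIndices_Block() grows by two block-sized arrays per dof and ISLocalToGlobalMappingCreate() by one, so three point-level copies in total scale with dof.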
I cannot understand why you are seeing memory usage 7 times more than a
vector. That seems like a lot.
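For reference, the figures in the quoted report below can be checked with a few lines of Python. This is a rough sketch: 8-byte scalars are assumed, and the RSS values are the rounded MiB readings from that report.

```python
MIB = 2**20

# Assumption: 8-byte (double precision, real) scalars.
vec_bytes = 100 * 100 * 100 * 7 * 8
vec_mib = vec_bytes / MIB

# RSS readings in MiB from the quoted output, in order.
rss_after_import, rss_after_da, rss_after_vec = 48, 381, 435

da_growth = rss_after_da - rss_after_import  # growth when the DA is created
vec_growth = rss_after_vec - rss_after_da    # growth when the vector is created

print(f"expected vector size: {vec_mib:.1f} MiB")
print(f"DA growth: {da_growth} MiB, vector growth: {vec_growth} MiB")
print(f"DA growth / expected vector size: {da_growth / vec_mib:.1f}")
```

Under these assumptions the vector accounts for about 54 MiB, matching its expected size, while the DA accounts for about 333 MiB, roughly 6 times the vector.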
Barry
On Oct 21, 2013, at 11:32 AM, Barry Smith <[email protected]> wrote:
>
> The PETSc DMDA object greedily allocates several arrays of data used to set
> up the communication and other things like local to global mappings even
> before you create any vectors. This is why you see this big bump in memory
> usage.
>
> BUT I don't think it should be any worse in 3.4 than in 3.3 or earlier; at
> least we did not intend to make it worse. Are you sure it is using more
> memory than in 3.3?
>
> In order for us to decrease the memory usage of the DMDA setup, it would be
> helpful to know which objects created within it use the most memory.
> There is some sloppiness in that routine in not reusing memory as well as
> it could; I am not sure how much difference that would make.
>
>
> Barry
>
>
>
> On Oct 21, 2013, at 7:02 AM, Juha Jäykkä <[email protected]> wrote:
>
>> Dear list members,
>>
>> I have noticed strange memory consumption after upgrading to the 3.4 series. I
>> never had time to investigate properly, but here is what happens [yes, this
>> might be a petsc4py issue, but I doubt it]:
>>
>> # helpers contains _ProcessMemoryInfoProc routine which just digs the memory
>> # usage data from /proc
>> import helpers
>> procdata=helpers._ProcessMemoryInfoProc()
>> print procdata.rss/2**20, "MiB /", procdata.os_specific[3][1]
>> from petsc4py import PETSc
>> procdata=helpers._ProcessMemoryInfoProc()
>> print procdata.rss/2**20, "MiB /", procdata.os_specific[3][1]
>> da = PETSc.DA().create(sizes=[100,100,100],
>>                        proc_sizes=[PETSc.DECIDE,PETSc.DECIDE,PETSc.DECIDE],
>>                        boundary_type=[3,0,0],
>>                        stencil_type=PETSc.DA.StencilType.BOX,
>>                        dof=7, stencil_width=1, comm=PETSc.COMM_WORLD)
>> procdata=helpers._ProcessMemoryInfoProc()
>> print procdata.rss/2**20, "MiB /", procdata.os_specific[3][1]
>> vec=da.createGlobalVec()
>> procdata=helpers._ProcessMemoryInfoProc()
>> print procdata.rss/2**20, "MiB /", procdata.os_specific[3][1]
>>
>> outputs
>>
>> 48 MiB / 49348 kB
>> 48 MiB / 49360 kB
>> 381 MiB / 446228 kB
>> 435 MiB / 446228 kB
>>
>> Which is odd: the size of the actual data to be stored in the da is just about
>> 56 megabytes, so why does creating the da consume 7 times that? And why does
>> the DA reserve the memory in the first place? I thought memory only gets
>> allocated once an associated vector is created, and it does look like the
>> createGlobalVec call allocates the right amount of data. But what
>> is that 330 MiB that DA().create() consumes? [It's actually the .setUp()
>> method that does the consuming, but that's not of much use as it needs to be
>> called before a vector can be created.]
>>
>> Cheers,
>> Juha
>>
>
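
The helpers module used in the quoted script was not posted. A minimal stand-in for reading the resident set size, assuming a Linux system where /proc/self/status contains a VmRSS line, might look like the sketch below. The names parse_vm_rss_kb and current_rss_mib are hypothetical and are not part of petsc4py or of the original helpers module.

```python
def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     49348 kB".
            return int(line.split()[1])
    return None


def current_rss_mib():
    """Read this process's resident set size in MiB (Linux only)."""
    with open("/proc/self/status") as f:
        rss_kb = parse_vm_rss_kb(f.read())
    return None if rss_kb is None else rss_kb / 1024
```

Calling current_rss_mib() before and after each PETSc call would give readings comparable to the RSS column in the quoted output.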