On Mon, Oct 21, 2013 at 1:15 PM, Barry Smith <[email protected]> wrote:

>
>    Sorry, my mistake. The double precision work arrays needed inside the
> VecScatter are dof * (number of ghost points), while the space for the
> indices should be just the number of local grid points (not times dof). Is
> that what you see? If the space is dof * (number of local grid points) then
> something is wrong somewhere along the processing.


No, it looks like dof * (local grid points), unless I am making a mistake. Run
this with -malloc_test and change dof from 4 to 2. You will see the memory
usage easily.
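
For reference, here is the arithmetic I would expect from that description
(assuming 8-byte scalars and 4-byte PetscInt; nghost/nlocal are just
placeholder names for the ghosted and local point counts):

  scatter work space  ~ 8 * dof * nghost   bytes
  scatter indices     ~ 4 * nlocal         bytes   (no dof factor)

whereas the index storage I actually see scales like dof * nlocal. To
reproduce with the attached test (the executable name here is made up):

  ./dmda_memtest -malloc_test

then change the dof argument of DMDACreate2d() from 4 to 2, rerun, and
compare the memory -malloc_test reports at the end of the run.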

  Matt


>
>    Barry
>
>
> On Oct 21, 2013, at 1:04 PM, Matthew Knepley <[email protected]> wrote:
>
> > On Mon, Oct 21, 2013 at 1:00 PM, Barry Smith <[email protected]> wrote:
> >
> >   Matt,
> >
> >    The scatters should always use block indices (I think they do), so the
> memory usage for the scatters should not have a dof* in front of it. Are
> you sure that the dof* is there? If it is there, is it because it is a block
> size that we don't support directly? We currently have special support for
> BS of 1,2,3,4,5,6,7,8,12. We should at least fill in 9,10,11.
> >
> >     Do we somewhere inside the VecScatter creation mistakenly create an
> array whose size depends on dof?
> >
> > I am sure of this dependence. It's very easy to see by just creating the
> DA, then ending, and running with -malloc_test. If it is intended to use
> > block indices, this is not happening.
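
To illustrate what using block indices would mean for the index storage, here
is a minimal sketch (not the actual VecScatter/DMDA code; npoints, dof, and
pointidx are hypothetical):

#include <petscis.h>

/* Build a scatter index set as a blocked IS: the stored index array has one
   PetscInt per grid point, independent of dof, because the block size
   (bs = dof) carries the per-component stride. A plain ISCreateGeneral()
   would instead need dof entries per point. */
PetscErrorCode BuildBlockIndices(MPI_Comm comm,PetscInt npoints,PetscInt dof,const PetscInt pointidx[],IS *is)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = ISCreateBlock(comm,dof,npoints,pointidx,PETSC_COPY_VALUES,is);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}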
> >
> >    Matt
> >
> >
> >    Barry
> >
> >
> >
> >
> > On Oct 21, 2013, at 11:52 AM, Matthew Knepley <[email protected]> wrote:
> >
> > > On Mon, Oct 21, 2013 at 11:32 AM, Barry Smith <[email protected]>
> wrote:
> > >
> > >    The PETSc DMDA object greedily allocates several arrays of data
> used to set up the communication and other things like local to global
> mappings even before you create any vectors. This is why you see this big
> bump in memory usage.
> > >
> > >    BUT I don't think it should be any worse in 3.4 than in 3.3 or
> earlier; at least we did not intend to make it worse. Are you sure it is
> using more memory than in 3.3?
> > >
> > >    In order for us to decrease the memory usage of the DMDA setup it
> would be helpful if we knew which objects created within it use the most
> memory.  There is some sloppiness in that routine in not reusing memory as
> well as it could; I am not sure how much difference that would make.
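
One way to get those per-object numbers (a sketch only, not part of the
attached test) is to bracket the setup with PetscMallocGetCurrentUsage(), e.g.
dropped into main():

  PetscLogDouble before,after;

  ierr = PetscMallocGetCurrentUsage(&before);CHKERRQ(ierr);
  /* ... DMDACreate2d()/DMSetUp() exactly as in the attached test ... */
  ierr = PetscMallocGetCurrentUsage(&after);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD,"DMDA setup malloc'd %g bytes\n",(double)(after-before));CHKERRQ(ierr);

Doing the same around the individual pieces inside DMSetUp_DMDA() would give
the per-object breakdown directly.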
> > >
> > > I am adding a DMDA example to look at this in detail. Here is what I
> have up front. Suppose that there are G grid vertices, e.g., 10^6 in
> > > your example, so that a vector takes up dof*8G bytes. Then the 2D DMDA
> allocates:
> > >
> > >   Create ltog scatter     dof*8G
> > >   Create gtol scatter     dof*8G
> > >   Raw indices             dof*4G
> > >   Create ltogmap          dof*4G
> > >   Create ltogmapb             4G
> > >   ------------------------------------------
> > >   Total                   dof*24G + 4G  (< 4 vectors)
> > >
> > > It also allocates 2 temporary vectors which are freed, but your test
> may pick them up since the OS might not have reclaimed that memory yet. I
> > > will get the precise numbers for 3D, but they should be similar.
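
For scale, plugging the numbers from the original report into the 2D figures
above (G = 10^6 and dof = 7; the 3D constants may well be larger):

  dof*24G + 4G = 7*24*10^6 + 4*10^6 bytes ~ 172 MB

which is already the same order of magnitude as the bump reported below.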
> > >
> > > I don't really see the point of using a DMDA without the scatters. You
> could save 1 vector of storage by making the creation of the l2g maps
> > > for the global vector lazy (and possibly those indices we use to remap
> arrays).
> > >
> > >    Matt
> > >
> > >
> > >    Barry
> > >
> > >
> > >
> > > On Oct 21, 2013, at 7:02 AM, Juha Jäykkä <[email protected]> wrote:
> > >
> > > > Dear list members,
> > > >
> > > > I have noticed strange memory consumption after upgrading to the 3.4
> series. I
> > > > never had time to properly investigate, but here is what happens
> [yes, this
> > > > might be a petsc4py issue, but I doubt it]:
> > > >
> > > > # helpers contains _ProcessMemoryInfoProc routine which just digs
> the memory
> > > > # usage data from /proc
> > > > import helpers
> > > > procdata=helpers._ProcessMemoryInfoProc()
> > > > print procdata.rss/2**20, "MiB /", procdata.os_specific[3][1]
> > > > from petsc4py import PETSc
> > > > procdata=helpers._ProcessMemoryInfoProc()
> > > > print procdata.rss/2**20, "MiB /", procdata.os_specific[3][1]
> > > > da = PETSc.DA().create(sizes=[100,100,100],
> > > >                       proc_sizes=[PETSc.DECIDE,PETSc.DECIDE,PETSc.DECIDE],
> > > >                       boundary_type=[3,0,0],
> > > >                       stencil_type=PETSc.DA.StencilType.BOX,
> > > >                       dof=7, stencil_width=1, comm=PETSc.COMM_WORLD)
> > > > procdata=helpers._ProcessMemoryInfoProc()
> > > > print procdata.rss/2**20, "MiB /", procdata.os_specific[3][1]
> > > > vec=da.createGlobalVec()
> > > > procdata=helpers._ProcessMemoryInfoProc()
> > > > print procdata.rss/2**20, "MiB /", procdata.os_specific[3][1]
> > > >
> > > > outputs
> > > >
> > > > 48 MiB / 49348 kB
> > > > 48 MiB / 49360 kB
> > > > 381 MiB / 446228 kB
> > > > 435 MiB / 446228 kB
> > > >
> > > > Which is odd: the size of the actual data to be stored in the da is
> just about 56
> > > > megabytes, so why does creating the da consume 7 times that? And why
> does the
> > > > DA reserve the memory in the first place? I thought memory only gets
> allocated
> > > > once an associated vector is created, and it indeed looks like the
> > > > createGlobalVec call allocates the right amount of data.
> But what
> > > > is that 330 MiB that DA().create() consumes? [It's actually the
> .setUp()
> > > > method that does the consuming, but that's not of much use as it
> needs to be
> > > > called before a vector can be created.]
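
(For reference, that figure is 100*100*100 grid points * 7 dof * 8 bytes =
5.6e7 bytes, i.e. about 53 MiB, assuming double precision scalars.)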
> > > >
> > > > Cheers,
> > > > Juha
> > > >
> > >
> > >
> > >
> > >
> > > --
> > > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> > > -- Norbert Wiener
> >
> >
> >
> >
> > --
> > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> > -- Norbert Wiener
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
static char help[] = "Tests memory usage of DMDA\n\n";

#include <petscdmda.h>

#undef __FUNCT__
#define __FUNCT__ "main"
int main(int argc, char **argv)
{
  DM             dm;
  PetscErrorCode ierr;

#if 0
  VecScatter            gtol,ltog,ltol;        /* scatters, see below for details */
  AO                    ao;                    /* application ordering context */
  Vec                   natural;            /* global vector for storing items in natural order */
  VecScatter            gton;               /* vector scatter from global to natural */
  ISColoring            localcoloring;       /* set by DMCreateColoring() */
  ISColoring            ghostedcoloring;
  void                  *arrayin[DMDA_MAX_WORK_ARRAYS],*arrayout[DMDA_MAX_WORK_ARRAYS];
  void                  *arrayghostedin[DMDA_MAX_WORK_ARRAYS],*arrayghostedout[DMDA_MAX_WORK_ARRAYS];
  void                  *startin[DMDA_MAX_WORK_ARRAYS],*startout[DMDA_MAX_WORK_ARRAYS];
  void                  *startghostedin[DMDA_MAX_WORK_ARRAYS],*startghostedout[DMDA_MAX_WORK_ARRAYS];

  1 Grid ==  10000 bytes
  1  Vec == 16G

  In DMSetUp_DMDA(),
  1) Create global, local which are subsequently destroyed
  2) Create ltog scatter (dof*8G)
  3) Create gtol scatter (dof*8G)
  4) Raw indices (dof*4G)
  5) Create ltogmap (dof*4G)
  6) Create ltogmapb (4G)
#endif

  ierr = PetscInitialize(&argc, &argv, NULL, help);CHKERRQ(ierr);
  ierr = DMDACreate2d(PETSC_COMM_WORLD, DMDA_BOUNDARY_NONE, DMDA_BOUNDARY_NONE, DMDA_STENCIL_STAR, 100, 100, PETSC_DETERMINE, PETSC_DETERMINE, 4, 1, NULL, NULL, &dm);CHKERRQ(ierr);
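  /* DMDestroy() is deliberately left commented out below: with the DM still
     allocated at PetscFinalize(), options such as -malloc_test report all of
     its memory. */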
  //ierr = DMDestroy(&dm);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}
