Dear all,

I'm trying to understand how to use the recent natural-to-global mappings that 
have made it into DMPlex.  The goal is to do checkpoint/restart (possibly onto 
a different number of processes) with DMs that have a non-zero overlap.  I can 
see that the overlapped case is currently unsupported, so let's start with the 
zero-overlap case.

I'm running into a number of problems that may be misunderstandings on my 
part, but are possibly bugs.

Let's try the simplest thing first: build a DMPlex on two processes, dump it 
to disk, and then try to reload it.

#include <petsc.h>


int main(int argc, char **argv)
{
    PetscErrorCode ierr;
    MPI_Comm       comm;
    DM             dm, newdm, pardm;
    PetscViewer    vwr;
    const char    *name = "dump.h5";

    PetscInitialize(&argc, &argv, NULL, NULL);

    comm = PETSC_COMM_WORLD;
    ierr = DMPlexCreateBoxMesh(comm, 2, PETSC_TRUE, &dm); CHKERRQ(ierr);
    ierr = DMPlexDistribute(dm, 0, NULL, &pardm); CHKERRQ(ierr);
    if (pardm) {
        ierr = DMDestroy(&dm); CHKERRQ(ierr);
        dm = pardm;
    }

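    /* Write the distributed DM to disk in HDF5 format. */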
    ierr = PetscViewerCreate(comm, &vwr); CHKERRQ(ierr);
    ierr = PetscViewerSetType(vwr, PETSCVIEWERHDF5); CHKERRQ(ierr);
    ierr = PetscViewerFileSetMode(vwr, FILE_MODE_WRITE); CHKERRQ(ierr);
    ierr = PetscViewerSetFormat(vwr, PETSC_VIEWER_NATIVE); CHKERRQ(ierr);
    ierr = PetscViewerFileSetName(vwr, name); CHKERRQ(ierr);
    ierr = DMView(dm, vwr); CHKERRQ(ierr);

    ierr = PetscViewerDestroy(&vwr); CHKERRQ(ierr);

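    /* Reload the mesh from the dump into a fresh DM. */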
    ierr = DMCreate(comm, &newdm); CHKERRQ(ierr);
    ierr = DMSetType(newdm, DMPLEX); CHKERRQ(ierr);

    ierr = PetscViewerCreate(comm, &vwr); CHKERRQ(ierr);
    ierr = PetscViewerSetType(vwr, PETSCVIEWERHDF5); CHKERRQ(ierr);
    ierr = PetscViewerFileSetMode(vwr, FILE_MODE_READ); CHKERRQ(ierr);
    ierr = PetscViewerSetFormat(vwr, PETSC_VIEWER_NATIVE); CHKERRQ(ierr);
    ierr = PetscViewerFileSetName(vwr, name); CHKERRQ(ierr);

    ierr = DMLoad(newdm, vwr); CHKERRQ(ierr);

    ierr = PetscViewerDestroy(&vwr); CHKERRQ(ierr);

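    /* Redistribute the reloaded DM. */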
    ierr = DMPlexDistribute(newdm, 0, NULL, &pardm); CHKERRQ(ierr);

    if (pardm) {
        ierr = DMDestroy(&newdm); CHKERRQ(ierr);
        newdm = pardm;
    }

    ierr = DMDestroy(&dm); CHKERRQ(ierr);
    ierr = DMDestroy(&newdm); CHKERRQ(ierr);
    PetscFinalize();

    return 0;
}


This fails inside DMPlexDistributeCoordinates, with the following error from 
VecSetBlockSize:

[0]PETSC ERROR: Invalid argument
[0]PETSC ERROR: Int value must be same on all processes, argument # 2

Digging a bit, I think this is because in DMCreateLocalVector the block size is 
determined by checking the block size of the section.  But on rank 1 the 
section is empty, so the created Vec does not have a block size set.  We then 
go on to copy this block size over to the distributed coordinates, and get an 
error because on rank 0 the block size is 2 while on rank 1 it defaults to 1.
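
A quick way to see the mismatch (a diagnostic sketch only, reusing the variable 
names from the example above) is to print the block size of the local 
coordinates on each rank right after DMLoad:

    {
        /* Diagnostic only: report the local-coordinate block size per rank.
           On the rank whose coordinate section is empty, the block size is
           not the spatial dimension but the unset/default value. */
        Vec         coords;
        PetscInt    bs = -1;
        PetscMPIInt rank;

        ierr = MPI_Comm_rank(comm, &rank); CHKERRQ(ierr);
        ierr = DMGetCoordinatesLocal(newdm, &coords); CHKERRQ(ierr);
        if (coords) {
            ierr = VecGetBlockSize(coords, &bs); CHKERRQ(ierr);
        }
        ierr = PetscSynchronizedPrintf(comm, "[%d] local coordinate block size: %d\n", rank, (int) bs); CHKERRQ(ierr);
        ierr = PetscSynchronizedFlush(comm, PETSC_STDOUT); CHKERRQ(ierr);
    }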

I fix this in the following way:

commit aef309e0b49bf6e55e6f970bc0bdabf021b346f6
Author: Lawrence Mitchell <lawrence.mitch...@imperial.ac.uk>
Date:   Fri Oct 23 09:32:36 2015 +0100

    plexdistribute: Use better guess for local coordinate block size

    If we create local coordinates on a process with an empty section, we end
    up setting a block size of 1 on the newly distributed local
    coordinates (since the local block size is unset).  In this case, check
    if we have global coordinates in place and try using the block size
    they provide instead.

diff --git a/src/dm/impls/plex/plexdistribute.c b/src/dm/impls/plex/plexdistribute.c
index 24d39b8..8049019 100644
--- a/src/dm/impls/plex/plexdistribute.c
+++ b/src/dm/impls/plex/plexdistribute.c
@@ -1040,7 +1040,7 @@ PetscErrorCode DMPlexDistributeCoordinates(DM dm, PetscSF migrationSF, DM dmParallel)
 {
   MPI_Comm         comm;
   PetscSection     originalCoordSection, newCoordSection;
-  Vec              originalCoordinates, newCoordinates;
+  Vec              originalCoordinates, newCoordinates, globalCoordinates;
   PetscInt         bs;
   const char      *name;
   const PetscReal *maxCell, *L;
@@ -1055,6 +1055,7 @@ PetscErrorCode DMPlexDistributeCoordinates(DM dm, PetscSF migrationSF, DM dmParallel)
   ierr = DMGetCoordinateSection(dm, &originalCoordSection);CHKERRQ(ierr);
   ierr = DMGetCoordinateSection(dmParallel, &newCoordSection);CHKERRQ(ierr);
   ierr = DMGetCoordinatesLocal(dm, &originalCoordinates);CHKERRQ(ierr);
+  ierr = DMGetCoordinates(dm, &globalCoordinates);CHKERRQ(ierr);
   if (originalCoordinates) {
     ierr = VecCreate(comm, &newCoordinates);CHKERRQ(ierr);
     ierr = PetscObjectGetName((PetscObject) originalCoordinates, &name);CHKERRQ(ierr);
@@ -1063,6 +1064,15 @@ PetscErrorCode DMPlexDistributeCoordinates(DM dm, PetscSF
     ierr = DMPlexDistributeField(dm, migrationSF, originalCoordSection, originalCoordinates, newCoordSection, newCoordinates);CHKERRQ(ierr);
     ierr = DMSetCoordinatesLocal(dmParallel, newCoordinates);CHKERRQ(ierr);
     ierr = VecGetBlockSize(originalCoordinates, &bs);CHKERRQ(ierr);
+    if (globalCoordinates) {
+      PetscLayout map, gmap;
+      ierr = VecGetLayout(originalCoordinates, &map); CHKERRQ(ierr);
+      ierr = VecGetLayout(globalCoordinates, &gmap); CHKERRQ(ierr);
+      /* If local block size not set but global size is, pick global size */
+      if (map->bs < 0 && gmap->bs > 0) {
+        ierr = PetscLayoutGetBlockSize(gmap, &bs); CHKERRQ(ierr);
+      }
+    }
     ierr = VecSetBlockSize(newCoordinates, bs);CHKERRQ(ierr);
     ierr = VecDestroy(&newCoordinates);CHKERRQ(ierr);
   }

Now at least I can dump and reload my DM.

OK, now let's think about the natural-to-global stuff.  If I understand this 
correctly, I need to set a section on the serial DM (before distribution), then 
call DMSetUseNatural, and the natural-to-global SF will be created during 
distribution.
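
For concreteness, here is a sketch of that workflow as I understand it.  The 
hand-built one-dof-per-vertex section and the use of DMSetDefaultSection are 
only illustrative; the point is just that some section has to be in place on 
the serial DM before DMSetUseNatural and DMPlexDistribute are called:

    DM           dm, pardm;
    PetscSection sec;
    PetscInt     pStart, pEnd, vStart, vEnd, p;

    ierr = DMPlexCreateBoxMesh(comm, 2, PETSC_TRUE, &dm); CHKERRQ(ierr);

    /* Illustrative section: one dof per vertex, set up on the serial DM. */
    ierr = PetscSectionCreate(comm, &sec); CHKERRQ(ierr);
    ierr = DMPlexGetChart(dm, &pStart, &pEnd); CHKERRQ(ierr);
    ierr = PetscSectionSetChart(sec, pStart, pEnd); CHKERRQ(ierr);
    ierr = DMPlexGetDepthStratum(dm, 0, &vStart, &vEnd); CHKERRQ(ierr);
    for (p = vStart; p < vEnd; p++) {
        ierr = PetscSectionSetDof(sec, p, 1); CHKERRQ(ierr);
    }
    ierr = PetscSectionSetUp(sec); CHKERRQ(ierr);
    ierr = DMSetDefaultSection(dm, sec); CHKERRQ(ierr);
    ierr = PetscSectionDestroy(&sec); CHKERRQ(ierr);

    /* Ask for the natural numbering before distributing, so the
       natural-to-global SF is built as part of DMPlexDistribute. */
    ierr = DMSetUseNatural(dm, PETSC_TRUE); CHKERRQ(ierr);
    ierr = DMPlexDistribute(dm, 0, NULL, &pardm); CHKERRQ(ierr);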

This is pretty awkward for me to deal with, since I don't have a section at 
all before distributing, and would rather not build one at that point.  Is 
this a hard limitation?  Naively, it seems like I should be able to derive a 
natural numbering from the distributed DM by using the point migration SF.  
Thoughts?


Cheers,

Lawrence
