Dear all, I'm trying to understand how to use the recent natural-global mappings that have made it into dmplex. The goal is to do checkpoint restart (onto, perhaps, differing numbers of processes) with DMs with a non-zero overlap (I can see that this is currently unsupported, but let's try the zero overlap case first).
I'm running into a number of problems that may be misunderstandings, but are possibly bugs. Let's try the simplest thing first, build a DMPlex on two processes, dump it to disk and then try to reload. #include <petsc.h> int main(int argc, char **argv) { PetscErrorCode ierr; MPI_Comm comm; DM dm, newdm, pardm; PetscViewer vwr; const char *name = "dump.h5"; PetscInitialize(&argc, &argv, NULL, NULL); comm = PETSC_COMM_WORLD; ierr = DMPlexCreateBoxMesh(comm, 2, PETSC_TRUE, &dm); CHKERRQ(ierr); ierr = DMPlexDistribute(dm, 0, NULL, &pardm); CHKERRQ(ierr); if (pardm) { ierr = DMDestroy(&dm); CHKERRQ(ierr); dm = pardm; } ierr = PetscViewerCreate(comm, &vwr); CHKERRQ(ierr); ierr = PetscViewerSetType(vwr, PETSCVIEWERHDF5); CHKERRQ(ierr); ierr = PetscViewerFileSetMode(vwr, FILE_MODE_WRITE); CHKERRQ(ierr); ierr = PetscViewerSetFormat(vwr, PETSC_VIEWER_NATIVE); CHKERRQ(ierr); ierr = PetscViewerFileSetName(vwr, name); CHKERRQ(ierr); ierr = DMView(dm, vwr); CHKERRQ(ierr); ierr = PetscViewerDestroy(&vwr); CHKERRQ(ierr); ierr = DMCreate(comm, &newdm); CHKERRQ(ierr); ierr = DMSetType(newdm, DMPLEX); CHKERRQ(ierr); ierr = PetscViewerCreate(comm, &vwr); ierr = PetscViewerSetType(vwr, PETSCVIEWERHDF5); CHKERRQ(ierr); ierr = PetscViewerFileSetMode(vwr, FILE_MODE_READ); CHKERRQ(ierr); ierr = PetscViewerSetFormat(vwr, PETSC_VIEWER_NATIVE); CHKERRQ(ierr); ierr = PetscViewerFileSetName(vwr, name); CHKERRQ(ierr); ierr = DMLoad(newdm, vwr); CHKERRQ(ierr); ierr = PetscViewerDestroy(&vwr); CHKERRQ(ierr); ierr = DMPlexDistribute(newdm, 0, NULL, &pardm); CHKERRQ(ierr); if (pardm) { ierr = DMDestroy(&newdm); CHKERRQ(ierr); newdm = pardm; } ierr = DMDestroy(&dm); CHKERRQ(ierr); ierr = DMDestroy(&newdm); CHKERRQ(ierr); PetscFinalize(); return 0; } This fails with the following error in DMPlexDistributeCoordinates: VecSetBlockSize: [0]PETSC ERROR: Invalid argument [0]PETSC ERROR: Int value must be same on all processes, argument # 2 Digging a bit, I think this is because in DMCreateLocalVector the block size is determined by checking the block size of the section. But on rank 1, the section is empty (and so the created Vec does not have a block size set). We then go on to copy this block size over to the distributed coordinates, and get an error because on rank-0 the block size is 2, not 1. I fix this in the following way: commit aef309e0b49bf6e55e6f970bc0bdabf021b346f6 Author: Lawrence Mitchell <lawrence.mitch...@imperial.ac.uk> Date: Fri Oct 23 09:32:36 2015 +0100 plexdistribute: Use better guess for local coordinate block size If we create local coordinates on a proces with an empty section, we end up setting a block size of 1 on the newly distributed local coordinates (since the local block size is unset). In this case, check if we have global coordinates in place and try using the block size they provide instead. diff --git a/src/dm/impls/plex/plexdistribute.c b/src/dm/impls/plex/plexdistribu index 24d39b8..8049019 100644 --- a/src/dm/impls/plex/plexdistribute.c +++ b/src/dm/impls/plex/plexdistribute.c @@ -1040,7 +1040,7 @@ PetscErrorCode DMPlexDistributeCoordinates(DM dm, PetscSF { MPI_Comm comm; PetscSection originalCoordSection, newCoordSection; - Vec originalCoordinates, newCoordinates; + Vec originalCoordinates, newCoordinates, globalCoordinates; PetscInt bs; const char *name; const PetscReal *maxCell, *L; @@ -1055,6 +1055,7 @@ PetscErrorCode DMPlexDistributeCoordinates(DM dm, PetscSF ierr = DMGetCoordinateSection(dm, &originalCoordSection);CHKERRQ(ierr); ierr = DMGetCoordinateSection(dmParallel, &newCoordSection);CHKERRQ(ierr); ierr = DMGetCoordinatesLocal(dm, &originalCoordinates);CHKERRQ(ierr); + ierr = DMGetCoordinates(dm, &globalCoordinates);CHKERRQ(ierr); if (originalCoordinates) { ierr = VecCreate(comm, &newCoordinates);CHKERRQ(ierr); ierr = PetscObjectGetName((PetscObject) originalCoordinates, &name);CHKERRQ @@ -1063,6 +1064,15 @@ PetscErrorCode DMPlexDistributeCoordinates(DM dm, PetscSF ierr = DMPlexDistributeField(dm, migrationSF, originalCoordSection, origina ierr = DMSetCoordinatesLocal(dmParallel, newCoordinates);CHKERRQ(ierr); ierr = VecGetBlockSize(originalCoordinates, &bs);CHKERRQ(ierr); + if (globalCoordinates) { + PetscLayout map, gmap; + ierr = VecGetLayout(originalCoordinates, &map); CHKERRQ(ierr); + ierr = VecGetLayout(globalCoordinates, &gmap); CHKERRQ(ierr); + /* If local block size not set but global size is, pick global size */ + if (map->bs < 0 && gmap->bs > 0) { + ierr = PetscLayoutGetBlockSize(gmap, &bs); CHKERRQ(ierr); + } + } ierr = VecSetBlockSize(newCoordinates, bs);CHKERRQ(ierr); ierr = VecDestroy(&newCoordinates);CHKERRQ(ierr); } Now at least I can dump and reload my DM. OK, now let's think about the natural-to-global stuff. If I understand this correctly, I need to set a section on the serial DM (before distribution) then use DMSetUseNatural and now the natural-to-global SF will be created during distribution. This is pretty awkward for me to deal with, since I don't have a section at all before distributing, and would rather not build it then. Is this a hard limitation? Naively it seems like I should be able to derive a natural numbering from the distributed DM by using the point migration SF. Thoughts? Cheers, Lawrence
signature.asc
Description: Message signed with OpenPGP using GPGMail