On Wed, Aug 28, 2013 at 3:04 PM, Garnet Vaz <[email protected]> wrote:
> Hi Matt,
>
> I just built the 3.4.2 release in the hope that it will work. It was
> working fine for the 'next' branch until a recent update last night. I
> updated my laptop/desktop with a 1/2-hour gap, which caused crashes in one
> build but not in the other. Hence, I moved to the 3.4.2 release.
>
> I will rebuild using the current 'next' and let you know if there are any
> problems.

Can you send configure.log? I built against OpenMPI and it looks like I get
a similar error which is not there with MPICH. Trying to confirm now.

   Matt

> Thanks.
>
> -
> Garnet
>
> On Wed, Aug 28, 2013 at 12:51 PM, Matthew Knepley <[email protected]> wrote:
>
>> On Wed, Aug 28, 2013 at 1:58 PM, Garnet Vaz <[email protected]> wrote:
>>
>>> Hi Matt,
>>>
>>> Attached is a folder containing the code and a sample mesh.
>>>
>>
>> I have built and run it here with the 'next' branch from today, and it
>> does not crash. What branch are you using?
>>
>>    Matt
>>
>>> Thanks for the help.
>>>
>>> -
>>> Garnet
>>>
>>> On Wed, Aug 28, 2013 at 11:43 AM, Matthew Knepley <[email protected]> wrote:
>>>
>>>> On Wed, Aug 28, 2013 at 12:52 PM, Garnet Vaz <[email protected]> wrote:
>>>>
>>>>> Thanks, Jed. I did as you suggested and the code finally crashes on
>>>>> both builds. I have installed the 3.4.2 release now.
>>>>>
>>>>> The problem now seems to come from DMPlexDistribute(). I have two
>>>>> versions to load the mesh: one creates a mesh using Triangle from
>>>>> PETSc, and the other loads a mesh using DMPlexCreateFromCellList().
>>>>>
>>>>> Is the following piece of code for creating a mesh using Triangle
>>>>> right?
>>>>
>>>> Okay, something is really very wrong here. It is calling
>>>> EnlargePartition(), but for that path to be taken, you have to trip an
>>>> earlier exception. It should not be possible to call it, so I think you
>>>> have memory corruption somewhere.
>>>>
>>>> Can you send a sample code we can run?
>>>>
>>>> Thanks,
>>>>
>>>>    Matt
>>>>
>>>>> ierr = DMPlexCreateBoxMesh(comm,2,interpolate,&user->dm);CHKERRQ(ierr);
>>>>> if (user->dm) {
>>>>>   DM refinedMesh     = NULL;
>>>>>   DM distributedMesh = NULL;
>>>>>   ierr = DMPlexSetRefinementLimit(user->dm,refinementLimit);CHKERRQ(ierr);
>>>>>   ierr = DMRefine(user->dm,PETSC_COMM_WORLD,&refinedMesh);CHKERRQ(ierr);
>>>>>   if (refinedMesh) {
>>>>>     ierr = DMDestroy(&user->dm);CHKERRQ(ierr);
>>>>>     user->dm = refinedMesh;
>>>>>   }
>>>>>   ierr = DMPlexDistribute(user->dm,"chaco",1,&distributedMesh);CHKERRQ(ierr);
>>>>>   if (distributedMesh) {
>>>>>     ierr = DMDestroy(&user->dm);CHKERRQ(ierr);
>>>>>     user->dm = distributedMesh;
>>>>>   }
>>>>> }
>>>>>
>>>>> Using gdb, the code gives a SEGV during distribution. The backtrace
>>>>> when the fault occurs points to an invalid pointer for ISGetIndices().
>>>>> Attached is a screenshot of the gdb backtrace. Do I need to set up
>>>>> some index set here?
>>>>>
>>>>> The same error occurs when trying to distribute a mesh loaded using
>>>>> DMPlexCreateFromCellList().
>>>>>
>>>>> Thanks for the help.
>>>>>
>>>>> -
>>>>> Garnet
>>>>>
>>>>> On Wed, Aug 28, 2013 at 6:38 AM, Jed Brown <[email protected]> wrote:
>>>>>
>>>>>> Garnet Vaz <[email protected]> writes:
>>>>>>
>>>>>> > Hi,
>>>>>> >
>>>>>> > I just rebuilt PETSc on both my laptop and my desktop.
>>>>>> > On both machines, the output of 'grep GIT configure.log' is:
>>>>>> > Defined "VERSION_GIT" to ""d8f7425765acda418e23a679c25fd616d9da8153""
>>>>>> > Defined "VERSION_DATE_GIT" to ""2013-08-27 10:05:35 -0500""
>>>>>>
>>>>>> Thanks for the report. Matt just merged a bunch of DMPlex-related
>>>>>> branches (about 60 commits in total). Can you 'git pull && make' to
>>>>>> let us know if the problem is still there? (It may not fix the issue,
>>>>>> but at least we'll be debugging current code.)
>>>>>>
>>>>>> When dealing with debug vs.
>>>>>> optimized issues, it's useful to configure with --with-debugging=0
>>>>>> COPTFLAGS='-O2 -g'. This allows valgrind to include line numbers, but
>>>>>> it (usually!) does not affect whether the error occurs.
>>>>>>
>>>>>> > My code runs on both machines in the debug build without causing
>>>>>> > any problems. When I try to run the optimized build, the code
>>>>>> > crashes with a SEGV fault on my laptop but not on the desktop. I
>>>>>> > have built PETSc using the same configure options.
>>>>>> >
>>>>>> > I have attached the valgrind output from both my laptop and desktop
>>>>>> > for both the debug and opt builds. How can I figure out what
>>>>>> > differences are causing the errors in one case and not the other?
>>>>>>
>>>>>> It looks like an uninitialized variable. Debug mode often ends up
>>>>>> initializing local variables, whereas optimized mode leaves junk in
>>>>>> them. Stack allocation alignment/padding is also often different.
>>>>>> Unfortunately, valgrind is less powerful for debugging stack
>>>>>> corruption, so the uninitialized warning is usually the best you get.
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Garnet
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which
>>>> their experiments lead.
>>>> -- Norbert Wiener

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
   -- Norbert Wiener
