Can you run the mvapich code with -vecscatter_alltoall and see if it goes through?
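For reference, PETSc runtime options like this one are appended to the launch command — a sketch, assuming a launcher of `mpirun` and an executable named `./app` (both are placeholders; use your site's job launcher and your own binary):

```shell
# Hypothetical launch line; binary name, launcher, and process count
# are placeholders for your actual job setup.
mpirun -np 64 ./app -vecscatter_alltoall
```

With this option PETSc performs the scatter using MPI_Alltoallv rather than individual persistent point-to-point sends/receives, which can sidestep problems in an MPI implementation's Isend/Irecv path.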
Satish

On Thu, 14 Apr 2011, Ethan Coon wrote:

> I'm a bit grasping at straws here, because I'm completely stymied, so
> please bear with me.
>
> I'm running a program in two locations -- on local workstations with
> mpich2 and on a supercomputer with mvapich.
>
> On the workstation, the program runs in all cases I've tested,
> including 8 processes (the number of cores) and up to 64 processes
> (multiple procs per core).
>
> On the supercomputer, it runs on 16 cores (one full node). With 64
> cores, it seg-faults and core dumps many timesteps into the
> simulation.
>
> Using a debugger and a debug-enabled petsc-dev, but with no access to
> debugging symbols in the mvapich installation, I've looked at the core.
> It appears to dump during VecScatterBegin_1 (within a DMDALocalToLocal()
> with xin = xout). The Vec I pass in as both input and output appears
> normal.
>
> The stack looks something like:
>
>   MPIR_HBT_lookup, FP=7fff1010f740
>   PMPI_Attr_get, FP=7fff1010f780
>   PetscCommDuplicate, FP=7fff1010f7d0
>   PetscViewerASCIIGetStdout, FP=7fff1010f800
>   PETSC_VIEWER_STDOUT_, FP=7fff1010f820
>   PetscDefaultSignalHandler, FP=7fff1010fa70
>   PetscSignalHandler_Private, FP=7fff1010fa90
>   **** Signal Stack Frame ******************
>   MPID_IsendContig, FP=7fff1010ff20
>   MPID_IsendDatatype, FP=7fff1010ffa0
>   PMPI_Start, FP=7fff1010fff0
>   VecScatterBegin_1, FP=7fff10110080
>   VecScatterBegin, FP=7fff101100e0
>   DMDALocalToLocalBegin, FP=7fff10110120
>   dmdalocaltolocalbegin_, FP=7fff10110160
>
> Has anyone run into anything like this before? I have no clue even how
> to proceed, and I doubt this is a PETSc problem, but I figured you guys
> might have enough experience in these types of issues to know where to
> look from here...
>
> Thanks,
>
> Ethan
