On Fri, Jul 24, 2015 at 11:58 AM, Aaron Kitzmiller <[email protected]> wrote:

> I'm using 1.8.3 which is Sept 2014. I'll try some others.
>
> Do you happen to know what the bug is (or a good Google term for finding
> it)?
>

It was fixed in 1.8.4

https://github.com/open-mpi/ompi-release/commit/6dcd42be28ac1a8dac887bf5e6c9ffb9b99f9511

   Matt

> ajk
>
> Aaron Kitzmiller
> Informatics and Scientific Applications
> [email protected]
>
> On Jul 24, 2015, at 12:42 PM, Matthew Knepley <[email protected]> wrote:
>
> On Fri, Jul 24, 2015 at 11:36 AM, Aaron Kitzmiller <[email protected]> wrote:
>
>> futex is a Linux system call used for locking shared resources.
>>
>> It could be indicative of an MPI problem. I wouldn't be surprised. If
>> anyone has any idea how to get around it that would be great. We have
>> dozens of applications on our compute cluster that use MPI, this version
>> being our default. I'm wondering if there is something specific to the mix
>> of MPI flavor / compiler, etc. that could be going on here.
>>
>
> Yes, this is a bug in OpenMPI that has been open for years.
>
> Can you please switch to MPICH and try another test? I thought the newest
> version of OpenMPI had fixed this, but maybe you are using an older release.
>
>   Thanks,
>
>     Matt
>
>> This is the gdb stack trace:
>>
>> #0  0x00000039c6a0e264 in __lll_lock_wait () from /lib64/libpthread.so.0
>> #1  0x00000039c6a09508 in _L_lock_854 () from /lib64/libpthread.so.0
>> #2  0x00000039c6a093d7 in pthread_mutex_lock ()
>>     from /lib64/libpthread.so.0
>> #3  0x00002aaaaf13ddd4 in opal_mutex_lock (attr_hash=0x2aaaaf651c70,
>>     key=128, attribute=0x7fffffffc200, flag=0xffffffffffffffff)
>>     at ../opal/threads/mutex_unix.h:104
>> #4  ompi_attr_get_c (attr_hash=0x2aaaaf651c70,
>>     key=128, attribute=0x7fffffffc200, flag=0xffffffffffffffff)
>>     at attribute/attribute.c:758
>> #5  0x00002aaaaf17080e in PMPI_Attr_get (comm=0x2aaaaf651c70, keyval=128,
>>     attribute_val=0x7fffffffc200, flag=0xffffffffffffffff)
>>     at pattr_get.c:61
>> #6  0x00002aaaaacad0b3 in Petsc_DelComm_Outer (comm=0x2aaaaf6d4140,
>>     keyval=13, attr_val=0x7af160, extra_state=0x0)
>>     at /n/home08/lchristakis/petsc/petsc-3.5.4/src/sys/objects/pinit.c:409
>> #7  0x00002aaaaf13f1a4 in ompi_attr_delete_impl
>>     (type=2942639216, object=0x80, attr_hash=0x7fffffffc200, key=-1,
>>     predefined=112 'p')
>>     at attribute/attribute.c:970
>> #8  0x00002aaaaf13ee02 in ompi_attr_delete (type=2942639216, object=0x80,
>>     attr_hash=0x7fffffffc200, key=-1, predefined=112 'p')
>>     at attribute/attribute.c:1019
>> #9  0x00002aaaaf170710 in PMPI_Attr_delete
>>     (comm=0x2aaaaf651c70, keyval=128) at pattr_delete.c:59
>> #10 0x00002aaaaac61848 in PetscCommDestroy (comm=0x888cf0)
>>     at /n/home08/lchristakis/petsc/petsc-3.5.4/src/sys/objects/tagm.c:256
>> #11 0x00002aaaaac6a273 in PetscHeaderDestroy_Private (h=0x888ce0)
>>     at /n/home08/lchristakis/petsc/petsc-3.5.4/src/sys/objects/inherit.c:121
>> #12 0x00002aaaaaf51512 in VecDestroy (v=0x7fffffffcbd0)
>>     at /n/home08/lchristakis/petsc/petsc-3.5.4/src/vec/vec/interface/vector.c:434
>> #13 0x00002aaaab9c5c7f in DMSetUp_DA_2D (da=0x87b1b0)
>>     at /n/home08/lchristakis/petsc/petsc-3.5.4/src/dm/impls/da/da2.c:776
>> #14 0x00002aaaaba73bfd in DMSetUp_DA (da=0x87b1b0)
>>     at /n/home08/lchristakis/petsc/petsc-3.5.4/src/dm/impls/da/dareg.c:25
>> #15 0x00002aaaab93399a in DMSetUp (dm=0x87b1b0)
>>     at /n/home08/lchristakis/petsc/petsc-3.5.4/src/dm/interface/dm.c:560
>> #16 0x00002aaaab9c6941 in DMDACreate2d
>>     (comm=0x2aaaaf6d45c0, bx=DM_BOUNDARY_NONE, by=DM_BOUNDARY_NONE,
>>     stencil_type=DMDA_STENCIL_STAR, M=-4, N=-4, m=-1, n=-1, dof=1, s=1,
>>     lx=0x0, ly=0x0, da=0x7fffffffd668)
>>     at /n/home08/lchristakis/petsc/petsc-3.5.4/src/dm/impls/da/da2.c:862
>> #17 0x00000000004023d0 in main (argc=1, argv=0x7fffffffd8c8)
>>     at /n/home08/lchristakis/petsc/petsc-3.5.4/src/snes/examples/tutorials/ex5.c:116
>>
>> Aaron Kitzmiller
>> Informatics and Scientific Applications
>> [email protected]
>>
>> On Jul 24, 2015, at 12:18 PM, Matthew Knepley <[email protected]> wrote:
>>
>> On Fri, Jul 24, 2015 at 11:17 AM, Matthew Knepley <[email protected]>
>> wrote:
>>
>>> On Fri, Jul 24, 2015 at 11:09 AM, Aaron Kitzmiller <[email protected]> wrote:
>>>
>>>> Doesn't run. Hangs just like the tests do.
>>>>
>>>> I doubt it's helpful, but when I run it under strace, it hangs on a
>>>> "futex". The last thing vaguely informative was an attempt to read the
>>>> non-existent .petscrc.
>>>>
>>>
>>> Run in the debugger and get a stack trace.
>>>
>>
>> Also futex does not appear in the PETSc source:
>>
>> knepley/feature-snes-deflation *+$|MERGING:/PETSc3/petsc/petsc-dev$
>> find src -name "*.c" | xargs grep futex
>> find src -name "*.c" | xargs grep futex
>>
>> You have an MPI problem.
>>
>>    Matt
>>
>>>   Matt
>>>
>>>> ajk
>>>>
>>>> Aaron Kitzmiller
>>>> Informatics and Scientific Applications
>>>> [email protected]
>>>>
>>>> On Jul 24, 2015, at 11:21 AM, Matthew Knepley <[email protected]>
>>>> wrote:
>>>>
>>>> ./ex5 -snes_monitor
>>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
