Depending on the alignment of the different types, there might be small holes (padding bytes that are never written) in the low-level headers we exchange between processes. It should not be a concern for users.
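To illustrate where such holes come from, here is a minimal sketch in C. The struct `demo_hdr` and both helper functions are hypothetical names made up for this example; this is not Open MPI's actual header layout. It only shows how alignment makes the compiler insert bytes that no member assignment ever writes, which is what a memory checker reports when the whole struct is handed to send().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header for illustration only; NOT Open MPI's real layout. */
struct demo_hdr {
    uint8_t  type;    /* usually followed by padding so 'length' is aligned */
    uint32_t length;
    uint16_t tag;     /* usually followed by trailing padding */
};

/* Number of bytes in the struct that belong to no member (padding). */
static size_t demo_hdr_padding(void)
{
    return sizeof(struct demo_hdr)
         - (sizeof(uint8_t) + sizeof(uint32_t) + sizeof(uint16_t));
}

/* Zero the whole struct before filling it, so that in practice the padding
 * bytes are initialised too. Assigning the members alone leaves them unset. */
static void demo_hdr_init(struct demo_hdr *h, uint8_t type,
                          uint32_t length, uint16_t tag)
{
    assert(h != NULL);
    memset(h, 0, sizeof *h);
    h->type = type;
    h->length = length;
    h->tag = tag;
}
```

On common x86-64 ABIs this struct typically occupies 12 bytes for 7 bytes of payload, so `demo_hdr_padding()` would return 5 there; the exact figure depends on the compiler and platform.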
valgrind should not stop on the first detected issue except if --exit-on-first-error has been provided (the default value should be no), so the SIGTERM might be generated for some other reason. What is at jackhmmer.c:1597?

George.

On Tue, Apr 30, 2019 at 2:27 PM David Mathog via users <users@lists.open-mpi.org> wrote:

> Attempting to debug a complex program (99.9% of which is others' code)
> which stops running when run in valgrind as follows:
>
> mpirun -np 10 \
> --hostfile /usr/common/etc/openmpi.machines.LINUX_INTEL_newsaf_rev2 \
> --mca plm_rsh_agent rsh \
> /usr/bin/valgrind \
> --leak-check=full \
> --leak-resolution=high \
> --show-reachable=yes \
> --log-file=nc.vg.%p \
> --suppressions=/opt/ompi401/share/openmpi/openmpi-valgrind.supp \
> /usr/common/tmp/jackhmmer \
> --tformat ncbi \
> -T 150 \
> --chkhmm jackhmmer_test \
> --mpi \
> ~safrun/a1hu.pfa \
> /usr/common/tmp/testing/nr_lcl \
> >jackhmmer_test_mpi.out 2>jackhmmer_test_mpi.stderr &
>
> Every one of the nodes has a variant of this in the log file (followed
> by a long list
> of memory allocation errors, since it exits without being able to clean
> anything up):
>
> ==5135== Memcheck, a memory error detector
> ==5135== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> ==5135== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright
> info
> ==5135== Command: /usr/common/tmp/jackhmmer --tformat ncbi -T 150
> --chkhmm jackhmmer_test --mpi /ulhhmi/safrun
> /a1hu.pfa /usr/common/tmp/testing/nr_lcl
> ==5135== Parent PID: 5119
> ==5135==
> ==5135== Syscall param socketcall.sendto(msg) points to uninitialised
> byte(s)
> ==5135== at 0x5459BFB: send (in /usr/lib64/libpthread-2.17.so)
> ==5135== by 0xF84A282: mca_btl_tcp_send_blocking (in
> /opt/ompi401/lib/openmpi/mca_btl_tcp.so)
> ==5135== by 0xF84E414: mca_btl_tcp_endpoint_send_handler (in
> /opt/ompi401/lib/openmpi/mca_btl_tcp.so)
> ==5135== by 0x5D6E4EF: event_persist_closure (event.c:1321)
> ==5135== by 0x5D6E4EF: event_process_active_single_queue
> (event.c:1365)
> ==5135== by 0x5D6E4EF: event_process_active (event.c:1440)
> ==5135== by 0x5D6E4EF: opal_libevent2022_event_base_loop
> (event.c:1644)
> ==5135== by 0x5D2465F: opal_progress (in
> /opt/ompi401/lib/libopen-pal.so.40.20.1)
> ==5135== by 0xF36A9CC: ompi_request_wait_completion (in
> /opt/ompi401/lib/openmpi/mca_pml_ob1.so)
> ==5135== by 0xF36C30E: mca_pml_ob1_send (in
> /opt/ompi401/lib/openmpi/mca_pml_ob1.so)
> ==5135== by 0x51BC581: PMPI_Send (in
> /opt/ompi401/lib/libmpi.so.40.20.1)
> ==5135== by 0x40B46E: mpi_worker (jackhmmer.c:1560)
> ==5135== by 0x406726: main (jackhmmer.c:413)
> ==5135== Address 0x1ffefff8d5 is on thread 1's stack
> ==5135== in frame #2, created by mca_btl_tcp_endpoint_send_handler
> (???:)
> ==5135==
> ==5135==
> ==5135== Process terminating with default action of signal 15 (SIGTERM)
> ==5135== at 0x5459EFD: ??? (in /usr/lib64/libpthread-2.17.so)
> ==5135== by 0x408817: mpi_failure (jackhmmer.c:887)
> ==5135== by 0x40B708: mpi_worker (jackhmmer.c:1597)
> ==5135== by 0x406726: main (jackhmmer.c:413)
>
> jackhmmer line 1560 is just this:
>
>
> MPI_Send(&status, 1, MPI_INT, 0, HMMER_SETUP_READY_TAG,
> MPI_COMM_WORLD);
>
> preceded at varying distances by:
>
> int status = eslOK;
> status = 0;
>
> I can see why MPI might have some uninitialized bytes in that send, for
> instance, if it has a minimum buffer size it will send or something like
> that. The problem is that it completely breaks valgrind in this
> application because valgrind exits immediately when it sees this error.
> The suppression file supplied with the release does not prevent that.
>
> How do I work around this?
>
> Thank you,
>
> David Mathog
> mat...@caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users
>