Hi Ralph,
We've had a number of user complaints about this. Since on the face of
it this looks like a debugger issue, the reports may not have made their
way back here. Is your objection that the patch basically aborts if it
gets a bad value? I could understand that being a concern. Of
course, it aborts under TotalView now if we attempt to move forward without
this patch.
I've passed your comment back to the engineer, along with my suspicion that
the abort is the concern, but if you have other objections, let me know.
Cheers,
PeterT
Ralph Castain wrote:
That would be a problem, I fear. We need to push those envars into the
environment.
Is there some particular problem causing what you see? We have no other reports
of this issue, and orterun has had that code forever.
On May 11, 2011, at 2:05 PM, Peter Thompson <peter.thomp...@roguewave.com>
wrote:
We've gotten a few reports of problems with memory debugging when using OpenMPI
under TotalView. Usually, TotalView will attach to the processes started after
an MPI_Init. However in the case where memory debugging is enabled, things
seemed to run away or fail. My analysis showed that we had a number of core
files left over from the attempt, and all were mpirun (or orterun) cores. It
seemed to be a regression on our part, since testing seemed to indicate this
worked okay before TotalView 8.9.0-0, so I filed an internal bug and passed it
to engineering. After giving our engineer a brief tutorial on how to build a
debug version of OpenMPI, he found what appears to be a problem in the code for
orterun.c. He's made a slight change that fixes the issue in 1.4.2, 1.4.3,
1.4.4rc2 and 1.5.3, those being the versions he's tested with so far. He
doesn't subscribe to this list that I know of, so I offered to pass this by the
group. Of course, I'm not sure if this is exactly the right place to submit
patches, but I'm sure you'd tell me where to put it if I'm in the wrong here.
It's a short patch, so I'll cut and paste it, and attach as well, since cut and
paste can do weird things to formatting.
Credit goes to Ariel Burton for this patch. Of course he used TotalView to
find this ;-) It shows up if you do 'mpirun -tv -np 4 ./foo' or 'totalview
mpirun -a -np 4 ./foo'
Cheers,
PeterT
more ~/patches/anbs-patch
*** orte/tools/orterun/orterun.c 2010-04-13 13:30:34.000000000 -0400
--- /home/anb/packages/openmpi-1.4.2/linux-x8664-iwashi/installation/bin/../../../src/openmpi-1.4.2/orte/tools/orterun/orterun.c 2011-05-09 20:28:16.588183000 -0400
***************
*** 1578,1588 ****
      }
      if (NULL != env) {
          size1 = opal_argv_count(env);
          for (j = 0; j < size1; ++j) {
!             putenv(env[j]);
          }
      }
      /* All done */
--- 1578,1600 ----
      }
      if (NULL != env) {
          size1 = opal_argv_count(env);
          for (j = 0; j < size1; ++j) {
!             /* Use-after-Free error possible here. putenv does not copy
!                the string passed to it, and instead stores only the pointer.
!                env[j] may be freed later, in which case the pointer
!                in environ will now be left dangling into a deallocated
!                region.
!                So we make a copy of the variable.
!              */
!             char *s = strdup(env[j]);
!
!             if (NULL == s) {
!                 return OPAL_ERR_OUT_OF_RESOURCE;
!             }
!             putenv(s);
          }
      }
      /* All done */