Re: [OMPI devel] about btl/scif thread cancellation (#4616 / r31738)

2014-05-14 Thread Ralph Castain
As I said, this isn't the only thread that faces this issue, and we have resolved it elsewhere - surely we can resolve it here as well in an acceptable manner. Nathan? On May 13, 2014, at 7:33 PM, Gilles Gouaillardet wrote: > Ralph, > > scif_poll(...) is

Re: [OMPI devel] about btl/scif thread cancellation (#4616 / r31738)

2014-05-14 Thread Ralph Castain
+1 - seen it before, and you'll find warnings across many software sites about this problem. Easy to have the main program segfault by touching the wrong thing after a cancel unless all the stars are properly aligned in the various libraries. On May 13, 2014, at 7:56 PM, Paul Hargrove

[OMPI devel] Please provide the pshmem_finalize symbol

2014-05-14 Thread Bert Wesarg
Dear all, the Score-P community is currently in the process of supporting the OpenSHMEM API in its performance measurement infrastructure Score-P [1]. We are also close to releasing a new major version of it. Now that Open MPI also provides an OpenSHMEM implementation, we extended our testing

Re: [OMPI devel] Please provide the pshmem_finalize symbol

2014-05-14 Thread Mike Dubman
here it goes, https://svn.open-mpi.org/trac/ompi/changeset/31751 On Wed, May 14, 2014 at 9:19 AM, Bert Wesarg wrote: > Dear all, > > the Score-P community is currently in the process to support the OpenSHMEM > API in its performance measurement infrastructure Score-P

Re: [OMPI devel] Non-uniform BTL problems in: openib, tcp, sctp, portals4, vader, scif

2014-05-14 Thread George Bosilca
Good catch. I fixed the TCP BTL (r31753). It is the only BTL I can test so that's the most I can do here. However, I never get OPAL_ERR_DATA_VALUE_NOT_FOUND out of the modex call when the key doesn't exist. I looked in dstore and the correct value one should look for is OPAL_ERR_NOT_FOUND. I

Re: [OMPI devel] about btl/scif thread cancellation (#4616 / r31738)

2014-05-14 Thread Nathan Hjelm
Looks like this is a scif bug. From the documentation: scif_poll() waits for one of a set of endpoints to become ready to perform an I/O operation; it is syntactically and semantically very similar to poll(). The SCIF functions on which scif_poll() waits are scif_accept(), scif_send(), and

Re: [OMPI devel] about btl/scif thread cancellation (#4616 / r31738)

2014-05-14 Thread Ralph Castain
Couple of suggestions: * detect that this is an older scif lib and just don't build or enable the scif btl * have a flag that indicates "you should exit", and then tickle the fd so scif_poll exits Ralph On May 14, 2014, at 7:45 AM, Nathan Hjelm wrote: > Looks like this is
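
The second suggestion above (an exit flag plus a wake-up on a polled descriptor) can be sketched in C as follows. This is only an illustration under stated assumptions: it uses plain poll() on a pipe as a stand-in for scif_poll(), and every name in it (listener_t, listener_start, listener_stop) is hypothetical and not taken from the Open MPI btl/scif code. Whether the same wake-up works with scif_poll() and SCIF endpoints is not verified here.

```c
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical listener state; names are illustrative only. */
typedef struct {
    int wake_pipe[2];         /* [0] is polled by the thread, [1] is written to wake it */
    atomic_bool should_exit;  /* the "you should exit" flag */
    pthread_t thread;
} listener_t;

static void *listener_main(void *arg)
{
    listener_t *l = arg;
    struct pollfd pfd = { .fd = l->wake_pipe[0], .events = POLLIN };

    while (!atomic_load(&l->should_exit)) {
        /* Block with no timeout; a real listener would also poll its endpoints. */
        int rc = poll(&pfd, 1, -1);
        if (rc < 0 && errno != EINTR) {
            break;  /* unrecoverable poll error */
        }
        /* Woken up (or interrupted): the loop condition re-checks the flag. */
    }
    return NULL;
}

static int listener_start(listener_t *l)
{
    if (pipe(l->wake_pipe) != 0) {
        return -1;
    }
    atomic_init(&l->should_exit, false);
    if (pthread_create(&l->thread, NULL, listener_main, l) != 0) {
        close(l->wake_pipe[0]);
        close(l->wake_pipe[1]);
        return -1;
    }
    return 0;
}

static int listener_stop(listener_t *l)
{
    char byte = 1;

    /* Set the flag first, then "tickle" the fd so poll() returns. */
    atomic_store(&l->should_exit, true);
    while (write(l->wake_pipe[1], &byte, 1) < 0) {
        if (errno != EINTR) {
            return -1;
        }
    }
    if (pthread_join(l->thread, NULL) != 0) {
        return -1;
    }
    /* Close the descriptors only after the thread has left poll(). */
    close(l->wake_pipe[0]);
    close(l->wake_pipe[1]);
    return 0;
}
```

A caller would pair listener_start(&l) with listener_stop(&l) and link with -pthread. The thread is woken by data arriving on the pipe and the descriptors are closed only after pthread_join() returns, so the sketch relies on neither pthread_cancel() nor on closing a descriptor that is still being polled.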

Re: [OMPI devel] about btl/scif thread cancellation (#4616 / r31738)

2014-05-14 Thread Gilles Gouaillardet
Nathan, > Looks like this is a scif bug. From the documentation: and from the source code, scif_poll(...) simply calls poll(...) at least in MPSS 2.1 > Since that is not the case I will look through the documentation and see if there is a way other than pthread_cancel. what about : - use a

Re: [OMPI devel] about btl/scif thread cancellation (#4616 / r31738)

2014-05-14 Thread George Bosilca
It sounds more like a suboptimal usage of the pthread cancellation helpers than a real issue with pthread_cancel itself. I do agree the usage is not necessarily straightforward even for a veteran coder, but the related issues belong to the realm of implementation, not at the conceptual

Re: [OMPI devel] about btl/scif thread cancellation (#4616 / r31738)

2014-05-14 Thread Nathan Hjelm
On Wed, May 14, 2014 at 07:55:54AM -0700, Ralph Castain wrote: > Couple of suggestions: > > * detect that this is an older scif lib and just don't build or enable the > scif btl > > * have a flag that indicates "you should exit", and then tickle the fd so > scif_poll exits Thinking along

Re: [OMPI devel] about btl/scif thread cancellation (#4616 / r31738)

2014-05-14 Thread George Bosilca
There seems to be a consensus on the fact that closing an fd should trigger the return from poll. Unfortunately, this assumption is wrong and is not supported by any documentation available online. To be clearer, all the documentation I know of tends to point in the opposite direction: it is unwise to

Re: [OMPI devel] about btl/scif thread cancellation (#4616 / r31738)

2014-05-14 Thread Nathan Hjelm
That is exactly how I decided to fix it. It looks like it is working. Please try r31755 when you get a chance. -Nathan On Thu, May 15, 2014 at 12:03:53AM +0900, Gilles Gouaillardet wrote: >Nathan, > >> Looks like this is a scif bug. From the documentation: > >and from the source

Re: [OMPI devel] Please provide the pshmem_finalize symbol

2014-05-14 Thread Bert Wesarg
On 05/14/2014 03:15 PM, Mike Dubman wrote: here it goes, https://svn.open-mpi.org/trac/ompi/changeset/31751 Thank you very much. I will test against the latest nightly builds for trunk and v1.8 and report back. Regards, Bert On Wed, May 14, 2014 at 9:19 AM, Bert Wesarg

[hwloc-devel] Create success (hwloc git dev-158-g9737520)

2014-05-14 Thread MPI Team
Creating nightly hwloc snapshot git tarball was a success. Snapshot: hwloc dev-158-g9737520 Start time: Wed May 14 21:01:02 EDT 2014 End time: Wed May 14 21:03:09 EDT 2014 Your friendly daemon, Cyrador