Re: [OMPI devel] race condition in oob/tcp

2014-09-26 Thread Gilles Gouaillardet
Ralph,

i just committed r32799 in order to fix this issue.
i cmr'ed it (#4923) and set the target to 1.8.4

Cheers,

Gilles

On 2014/09/23 22:55, Ralph Castain wrote:
> Thanks! I won't have time to work on it this week, but appreciate your 
> effort. Also, thanks for clarifying the race condition vis 1.8 - I agree it 
> is not a blocker for that release.
>
> Ralph
>
> On Sep 22, 2014, at 4:49 PM, Gilles Gouaillardet 
> <gilles.gouaillar...@gmail.com> wrote:
>
>> Ralph,
>>
>> here is the patch i am using so far.
>> i will resume working on this from Wednesday (there is at least one 
>> remaining race condition yet) unless you have the time to take care of it 
>> today.
>>
>> so far, the race condition has only been observed in real life with the 
>> grpcomm/rcd module, and this is not the default in v1.8, so imho this is not 
>> a blocker for v1.8.3
>>
>> Cheers,
>>
>> Gilles
>>
>> On Tue, Sep 23, 2014 at 7:46 AM, Ralph Castain <r...@open-mpi.org> wrote:
>> Gilles - please let me know if/when you think you'll do this. I'm debating 
>> about adding it to 1.8.3, but don't want to delay that release too long. 
>> Alternatively, I can take care of it if you don't have time (I'm asking if 
>> you can do it solely because you have the reproducer).
>>
>>
>> On Sep 21, 2014, at 6:54 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Sounds fine with me - please go ahead, and thanks
>>>
>>> On Sep 20, 2014, at 10:26 PM, Gilles Gouaillardet 
>>> <gilles.gouaillar...@gmail.com> wrote:
>>>
>>>> Thanks for the pointer George !
>>>>
>>>> On Sat, Sep 20, 2014 at 5:46 AM, George Bosilca <bosi...@icl.utk.edu> 
>>>> wrote:
>>>> Or copy the handshake protocol design of the TCP BTL...
>>>>
>>>>
>>>> the main difference between oob/tcp and btl/tcp is the way we resolve the 
>>>> situation in which two processes send their first message to each other at 
>>>> the same time.
>>>>
>>>> in oob/tcp, all (i.e. one or two) sockets are closed and the higher vpid 
>>>> is directed to retry establishing a connection.
>>>>
>>>> in btl/tcp, the useless socket is closed (i.e. the one that was connect-ed 
>>>> on the lower vpid and the one that was accept-ed on the higher vpid).
>>>>
>>>>
>>>> my first impression is that oob/tcp is unnecessarily complex and it should 
>>>> use the simpler and more efficient protocol of btl/tcp.
>>>> that being said, this conclusion could be too naive and, for some good 
>>>> reasons i am not aware of, the btl/tcp handshake protocol might not be a good fit 
>>>> for oob/tcp.
>>>>
>>>> any thoughts ?
>>>>
>>>> i will revamp oob/tcp in order to use the same btl/tcp handshake protocol 
>>>> from tomorrow unless indicated otherwise
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post: 
>>>> http://www.open-mpi.org/community/lists/devel/2014/09/15885.php
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/09/15895.php
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/09/15897.php
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15900.php



Re: [OMPI devel] Conversion to GitHub: POSTPONED

2014-09-24 Thread Gilles Gouaillardet
my 0.02 US$ ...

Bitbucket's pricing model is per user (but public and private
repositories are free for up to 5 users),
whereas GitHub's pricing is per *private* repository (public
repositories are free, with unlimited users).

from an Open MPI point of view, this means :
- with GitHub, only the private ompi-tests repository requires a fee
- with Bitbucket, the ompi repository requires a fee (there are 119
users in https://github.com/open-mpi/authors/blob/master/authors.txt; in
the Bitbucket pricing model, that means the unlimited-users plan, which
is 200 US$ per month)

per-branch ACLs are a feature that was requested a long time ago on
Bitbucket, and now that they have implemented it, i would not expect it
to take too much time before GitHub implements it too.

from the documentation, GerritHub also has some interesting features :
- force the use of a workflow (assuming the workflow is a good match
with how we want to work ...)
- prevent developers from committing a huge mess to GitHub

Gilles

On 2014/09/24 10:36, Jeff Squyres (jsquyres) wrote:
> On Sep 23, 2014, at 7:52 PM, Jed Brown  wrote:
>
>> I don't have experience with GerritHub, but Bitbucket supports this
>> feature (permissions on branch names/globs) and we use it in PETSc.
> Thanks for the info.  Paul Hargrove said pretty much the same thing to me, 
> off-list.
>
> I'll check it out.
>



Re: [OMPI devel] RFC: "v1.9.0" (vs. "v1.9")

2014-09-22 Thread Gilles Gouaillardet
Folks,

if i read between the lines, it looks like the next stable branch will be
v2.0 and not v1.10.
is there a strong reason for that (such as ABI compatibility being broken, or
a major but internal refactoring) ?
/* other than v1.10 being less than v1.8 when comparing strings :-) */

Cheers,

Gilles

On Tue, Sep 23, 2014 at 8:25 AM, Andreas Schäfer  wrote:

> On 13:50 Mon 22 Sep , Aurélien Bouteiller wrote:
> > Could also start at 1.9.1 instead of 1.9.0. That gives a free number
> > for the “trunk” nightly builds.
>
> Could also start at 2.1 instead of 2.0. That gives a free boost in
> maturity.
>
> More seriously, I'd suggest to use Semantic Versioning[1], which is
> actually straight forward for both, developers and users. I'm quoting
> the linked website:
>
> > Given a version number MAJOR.MINOR.PATCH, increment the:
> >
> > 1. MAJOR version when you make incompatible API changes,
> > 2. MINOR version when you add functionality in a backwards-compatible
> >manner, and
> > 3. PATCH version when you make backwards-compatible bug fixes.
> >
> > Additional labels for pre-release and build metadata are available as
> > extensions to the MAJOR.MINOR.PATCH format.
>
> [1] http://semver.org/
>
> HTH
> -Andreas
>
>
> --
> ==
> Andreas Schäfer
> HPC and Grid Computing
> Chair of Computer Science 3
> Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
> +49 9131 85-27910
> PGP/GPG key via keyserver
> http://www.libgeodecomp.org
> ==
>
> (\___/)
> (+'.'+)
> (")_(")
> This is Bunny. Copy and paste Bunny into your
> signature to help him gain world domination!
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15896.php
>


Re: [OMPI devel] race condition in oob/tcp

2014-09-22 Thread Gilles Gouaillardet
Ralph,

here is the patch i am using so far.
i will resume working on this from Wednesday (there is still at least one
remaining race condition) unless you have the time to take care of it
today.

so far, the race condition has only been observed in real life with the
grpcomm/rcd module, and this is not the default in v1.8, so imho this is
not a blocker for v1.8.3

Cheers,

Gilles

On Tue, Sep 23, 2014 at 7:46 AM, Ralph Castain <r...@open-mpi.org> wrote:

> Gilles - please let me know if/when you think you'll do this. I'm debating
> about adding it to 1.8.3, but don't want to delay that release too long.
> Alternatively, I can take care of it if you don't have time (I'm asking if
> you can do it solely because you have the reproducer).
>
>
> On Sep 21, 2014, at 6:54 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
> Sounds fine with me - please go ahead, and thanks
>
> On Sep 20, 2014, at 10:26 PM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
> Thanks for the pointer George !
>
> On Sat, Sep 20, 2014 at 5:46 AM, George Bosilca <bosi...@icl.utk.edu>
> wrote:
>
>> Or copy the handshake protocol design of the TCP BTL...
>>
>>
> the main difference between oob/tcp and btl/tcp is the way we resolve the
> situation in which two processes send their first message to each other at
> the same time.
>
> in oob/tcp, all (i.e. one or two) sockets are closed and the higher vpid
> is directed to retry establishing a connection.
>
> in btl/tcp, the useless socket is closed (i.e. the one that was connect-ed
> on the lower vpid and the one that was accept-ed on the higher vpid).
>
>
> my first impression is that oob/tcp is unnecessarily complex and it should
> use the simpler and more efficient protocol of btl/tcp.
> that being said, this conclusion could be too naive and, for some good
> reasons i am not aware of, the btl/tcp handshake protocol might not be a
> good fit for oob/tcp.
>
> any thoughts ?
>
> i will revamp oob/tcp in order to use the same btl/tcp handshake protocol
> from tomorrow unless indicated otherwise
>
> Cheers,
>
> Gilles
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15885.php
>
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15895.php
>


oobtcp2.patch
Description: Binary data


Re: [OMPI devel] race condition in oob/tcp

2014-09-21 Thread Gilles Gouaillardet
Thanks for the pointer George !

On Sat, Sep 20, 2014 at 5:46 AM, George Bosilca  wrote:

> Or copy the handshake protocol design of the TCP BTL...
>
>
the main difference between oob/tcp and btl/tcp is the way we resolve the
situation in which two processes send their first message to each other at
the same time.

in oob/tcp, all (i.e. one or two) sockets are closed and the higher vpid is
directed to retry establishing a connection.

in btl/tcp, the useless socket is closed (i.e. the one that was connect-ed
on the lower vpid and the one that was accept-ed on the higher vpid).


my first impression is that oob/tcp is unnecessarily complex and it should
use the simpler and more efficient protocol of btl/tcp.
that being said, this conclusion could be too naive and, for some good
reasons i am not aware of, the btl/tcp handshake protocol might not be a good fit
for oob/tcp.
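
for illustration, here is a minimal, generic sketch of the btl/tcp-style
tie-break described above; it is not the actual btl/tcp code, and the
function and parameter names (my_vpid, peer_vpid, connected_fd, accepted_fd)
are made up for this example :

#include <stdint.h>
#include <unistd.h>

/* each side ends up with two sockets after a simultaneous connect:
 * the one it connect()ed and the one it accept()ed.
 * keep exactly one of them, chosen by comparing vpids, so both sides
 * agree on the surviving socket without closing everything and retrying. */
static int resolve_simultaneous_connect(uint32_t my_vpid, uint32_t peer_vpid,
                                        int connected_fd, int accepted_fd)
{
    if (my_vpid < peer_vpid) {
        /* the lower vpid keeps the socket it accept()ed */
        close(connected_fd);
        return accepted_fd;
    }
    /* the higher vpid keeps the socket it connect()ed */
    close(accepted_fd);
    return connected_fd;
}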

any thoughts ?

i will revamp oob/tcp in order to use the same btl/tcp handshake protocol
from tomorrow unless indicated otherwise

Cheers,

Gilles


Re: [OMPI devel] race condition in oob/tcp

2014-09-19 Thread Gilles Gouaillardet
Ralph,

i found another race condition.
in a very specific scenario, vpid 3 is in the MCA_OOB_TCP_CLOSED state,
and processes data from the socket received from vpid 2.
vpid 3 is in the MCA_OOB_TCP_CLOSED state because vpid 2 called retry()
and closed both of its sockets to vpid 3.

vpid 3 reads the ack data that was sent to the socket (ok) and then ends
up calling tcp_peer_send_blocking :

Function
main (orted.c:62)
  orte_daemon (orted_main.c:828)
opal_libevent2021_event_base_loop (event.c:1645)
  event_process_active (event.c:1437)
event_process_active_single_queue (event.c:1367)
  recv_handler (oob_tcp.c:599)
mca_oob_tcp_peer_accept (oob_tcp_connection.c:1071)
  tcp_peer_send_connect_ack (oob_tcp_connection.c:384)
tcp_peer_send_blocking (oob_tcp_connection.c:525)


though the socket (fd 17) in my case has been closed by the peer, and is
hence reported in the CLOSE_WAIT state by lsof,
send(17, ...) succeeds (!!!)

i thought the root cause was that we had previously set the O_NONBLOCK flag
on this socket,
so i explicitly cleared this flag (which was not set anyway...) before
invoking mca_oob_tcp_peer_accept,
but i got the very same behaviour :-(

could you please advise :
- should the send fail because the socket is in the CLOSE_WAIT state ?
- if success is not a bad behaviour, does this mean another step
should be added to the oob/tcp "handshake" ?
- or could this mean that when the peer state was moved from
MCA_OOB_TCP_CONNECT_ACK to MCA_OOB_TCP_CLOSED,
retry() should have been invoked ?
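
regarding the first question, here is a minimal, generic POSIX sketch (not
OMPI code) of why send() on a CLOSE_WAIT socket can still succeed, and how
the half-close can be detected instead; peer_has_closed() is a hypothetical
helper written for this example :

#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* CLOSE_WAIT means the peer sent a FIN (half-close).  send() only queues
 * bytes in the local kernel buffer, so the first send() after the FIN
 * still returns success; only a later send(), after the peer answers with
 * an RST, fails.  The orderly shutdown is visible on the read side. */
static int peer_has_closed(int fd)
{
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
    if (0 == n) {
        return 1;   /* zero-byte read: the peer closed its end */
    }
    if (n < 0 && (EAGAIN == errno || EWOULDBLOCK == errno)) {
        return 0;   /* nothing received yet, connection still open */
    }
    return (n < 0) ? -1 : 0;   /* error, or regular data is pending */
}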

Cheers,

Gilles

On 2014/09/18 17:02, Ralph Castain wrote:
> The patch looks fine to me - please go ahead and apply it. Thanks!
>
> On Sep 17, 2014, at 11:35 PM, Gilles Gouaillardet 
> <gilles.gouaillar...@iferc.org> wrote:
>
>> Ralph,
>>
>> yes and no ...
>>
>> mpi hello world with four nodes can be used to reproduce the issue,
>>
>>
>> you can increase the likelihood of producing the race condition by hacking
>> ./opal/mca/event/libevent2021/libevent/poll.c
>> and replacing
>>     i = random() % nfds;
>> with
>>     if (nfds < 2) {
>>         i = 0;
>>     } else {
>>         i = nfds - 2;
>>     }
>>
>> but since this is really a race condition, all i could do is show you
>> how to use a debugger in order to force it
>>
>>
>> here is what really happens :
>> - thanks to your patch, when vpid 2 cannot read the acknowledgment, this is
>> no longer a fatal error.
>> - that being said, peer->recv_event is not removed from libevent
>> - later, send_event will be added to libevent
>> - and then peer->recv_event will be added to libevent again
>> /* this is clearly not supported, and the interesting behaviour is that
>> peer->send_event gets kicked out of libevent (!) */
>>
>> The attached patch fixes this race condition, could you please review it ?
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/09/17 22:17, Ralph Castain wrote:
>>> Do you have a reproducer you can share for testing this? I'm unable to get 
>>> it to happen on my machine, but maybe you have a test code that triggers it 
>>> so I can continue debugging
>>>
>>> Ralph
>>>
>>> On Sep 17, 2014, at 4:07 AM, Gilles Gouaillardet 
>>> <gilles.gouaillar...@iferc.org> wrote:
>>>
>>>> Thanks Ralph,
>>>>
>>>> this is much better but there is still a bug :
>>>> with the very same scenario i described earlier, vpid 2 does not send
>>>> its message to vpid 3 once the connection has been established.
>>>>
>>>> i tried to debug it but i have been pretty unsuccessful so far ..
>>>>
>>>> vpid 2 calls tcp_peer_connected and executes the following snippet
>>>>
>>>>     if (NULL != peer->send_msg && !peer->send_ev_active) {
>>>>         opal_event_add(&peer->send_event, 0);
>>>>         peer->send_ev_active = true;
>>>>     }
>>>>
>>>> but when evmap_io_active is invoked later, the following part :
>>>>
>>>>     TAILQ_FOREACH(ev, &ctx->events, ev_io_next) {
>>>>         if (ev->ev_events & events)
>>>>             event_active_nolock(ev, ev->ev_events & events, 1);
>>>>     }
>>>>
>>>> finds only one ev (mca_oob_tcp_recv_handler and *no*
>>>> mca_oob_tcp_send_handler)
>>>>
>>>> i will resume my investigations tomorrow
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>

[OMPI devel] v1.8 does not compile any more

2014-09-19 Thread Gilles Gouaillardet
Folks,

r32716 broke v1.8 :-(

the root cause is that it included MCA_BASE_VAR_TYPE_VERSION_STRING, which has
not yet landed in v1.8

the attached trivial patch fixes this issue

Can the RM/GK please review it and apply it ?

Cheers,

Gilles
Index: opal/mca/base/mca_base_var.c
===
--- opal/mca/base/mca_base_var.c(revision 32763)
+++ opal/mca/base/mca_base_var.c(working copy)
@@ -14,6 +14,8 @@
  * Copyright (c) 2012-2014 Los Alamos National Security, LLC. All rights
  * reserved.
  * Copyright (c) 2014  Intel, Inc. All rights reserved.
+ * Copyright (c) 2014  Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -1164,7 +1166,6 @@
 case MCA_BASE_VAR_TYPE_DOUBLE:
 align = OPAL_ALIGNMENT_DOUBLE;
 break;
-case MCA_BASE_VAR_TYPE_VERSION_STRING:
 case MCA_BASE_VAR_TYPE_STRING:
 default:
 align = 0;


[OMPI devel] RFC: remove the --with-threads configure option

2014-09-18 Thread Gilles Gouaillardet
Folks,

for both trunk and v1.8 branch, configure takes the --with-threads option.
valid usages are
--with-threads, --with-threads=yes, --with-threads=posix and
--with-threads=no

/* v1.6 used to support the --with-threads=solaris */

if we try to configure with --with-threads=no, this will result in a
fatal error :

checking for working POSIX threads package... yes
checking for type of thread support... none
configure: error: User requested MPI threads, but no threading model
supported

bottom line, the --with-threads configure option is useless in both v1.8
and trunk.

is there any plan to support --with-threads=no in the future ?
if not, i'd like to simply remove the --with-threads option

Thanks in advance for your feedback

Gilles

FYI
there is still some dead code / bug related to solaris threads, and this
will be removed / fixed
see https://svn.open-mpi.org/trac/ompi/ticket/4911



Re: [OMPI devel] race condition in oob/tcp

2014-09-18 Thread Gilles Gouaillardet
Ralph,

yes and no ...

mpi hello world with four nodes can be used to reproduce the issue,


you can increase the likelihood of producing the race condition by hacking
./opal/mca/event/libevent2021/libevent/poll.c
and replacing
    i = random() % nfds;
with
    if (nfds < 2) {
        i = 0;
    } else {
        i = nfds - 2;
    }

but since this is really a race condition, all i could do is show you
how to use a debugger in order to force it


here is what really happens :
- thanks to your patch, when vpid 2 cannot read the acknowledgment, this is
no longer a fatal error.
- that being said, peer->recv_event is not removed from libevent
- later, send_event will be added to libevent
- and then peer->recv_event will be added to libevent again
/* this is clearly not supported, and the interesting behaviour is that
peer->send_event gets kicked out of libevent (!) */
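
as an illustration of the kind of guard the attached patch introduces (this
is a generic libevent-style sketch, not the patch itself; oob/tcp tracks the
same information with its send_ev_active/recv_ev_active flags), the idea is
to make add/del idempotent so an event is never re-added while it is still
registered :

#include <stdbool.h>
#include <event2/event.h>

struct tracked_event {
    struct event *ev;   /* the underlying libevent event */
    bool active;        /* true while the event is registered */
};

static void tracked_add(struct tracked_event *t)
{
    if (!t->active) {
        event_add(t->ev, NULL);   /* no timeout */
        t->active = true;
    }
}

static void tracked_del(struct tracked_event *t)
{
    if (t->active) {
        event_del(t->ev);
        t->active = false;
    }
}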

The attached patch fixes this race condition, could you please review it ?

Cheers,

Gilles

On 2014/09/17 22:17, Ralph Castain wrote:
> Do you have a reproducer you can share for testing this? I'm unable to get it 
> to happen on my machine, but maybe you have a test code that triggers it so I 
> can continue debugging
>
> Ralph
>
> On Sep 17, 2014, at 4:07 AM, Gilles Gouaillardet 
> <gilles.gouaillar...@iferc.org> wrote:
>
>> Thanks Ralph,
>>
>> this is much better but there is still a bug :
>> with the very same scenario i described earlier, vpid 2 does not send
>> its message to vpid 3 once the connection has been established.
>>
>> i tried to debug it but i have been pretty unsuccessful so far ..
>>
>> vpid 2 calls tcp_peer_connected and executes the following snippet
>>
>>     if (NULL != peer->send_msg && !peer->send_ev_active) {
>>         opal_event_add(&peer->send_event, 0);
>>         peer->send_ev_active = true;
>>     }
>>
>> but when evmap_io_active is invoked later, the following part :
>>
>>     TAILQ_FOREACH(ev, &ctx->events, ev_io_next) {
>>         if (ev->ev_events & events)
>>             event_active_nolock(ev, ev->ev_events & events, 1);
>>     }
>>
>> finds only one ev (mca_oob_tcp_recv_handler and *no*
>> mca_oob_tcp_send_handler)
>>
>> i will resume my investigations tomorrow
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/09/17 4:01, Ralph Castain wrote:
>>> Hi Gilles
>>>
>>> I took a crack at solving this in r32744 - CMRd it for 1.8.3 and assigned 
>>> it to you for review. Give it a try and let me know if I (hopefully) got it.
>>>
>>> The approach we have used in the past is to have both sides close their 
>>> connections, and then have the higher vpid retry while the lower one waits. 
>>> The logic for that was still in place, but it looks like you are hitting a 
>>> different code path, and I found another potential one as well. So I think 
>>> I plugged the holes, but will wait to hear if you confirm.
>>>
>>> Thanks
>>> Ralph
>>>
>>> On Sep 16, 2014, at 6:27 AM, Gilles Gouaillardet 
>>> <gilles.gouaillar...@gmail.com> wrote:
>>>
>>>> Ralph,
>>>>
>>>> here is the full description of a race condition in oob/tcp i very briefly 
>>>> mentioned in a previous post :
>>>>
>>>> the race condition can occur when two orted that are not yet connected try to send a 
>>>> message to each other for the first time and at the same time.
>>>>
>>>> that can occur when running mpi helloworld on 4 nodes with the grpcomm/rcd 
>>>> module.
>>>>
>>>> here is a scenario in which the race condition occurs :
>>>>
>>>> orted vpid 2 and 3 enter the allgather
>>>> /* they are not yet oob/tcp connected */
>>>> and they call orte.send_buffer_nb to each other.
>>>> from a libevent point of view, vpid 2 and 3 will call 
>>>> mca_oob_tcp_peer_try_connect
>>>>
>>>> vpid 2 calls mca_oob_tcp_send_handler
>>>>
>>>> vpid 3 calls connection_event_handler
>>>>
>>>> depending on the value returned by random() in libevent, vpid 3 will
>>>> either call mca_oob_tcp_send_handler (likely) or recv_handler (unlikely)
>>>> if vpid 3 calls recv_handler, it will close the two sockets to vpid 2
>>>>
>>>> then vpid 2 will call mca_oob_tcp_recv_handler
>>>> (peer->state is MCA_OOB_TCP_CONNECT_ACK)
>>>> that will invoke mca_oob_tcp_recv_connect_ack
>>>> tcp_peer_recv_blocking will fail 

Re: [OMPI devel] race condition in oob/tcp

2014-09-17 Thread Gilles Gouaillardet
Thanks Ralph,

this is much better but there is still a bug :
with the very same scenario i described earlier, vpid 2 does not send
its message to vpid 3 once the connection has been established.

i tried to debug it, but i have been pretty unsuccessful so far ...

vpid 2 calls tcp_peer_connected and executes the following snippet

    if (NULL != peer->send_msg && !peer->send_ev_active) {
        opal_event_add(&peer->send_event, 0);
        peer->send_ev_active = true;
    }

but when evmap_io_active is invoked later, the following part :

    TAILQ_FOREACH(ev, &ctx->events, ev_io_next) {
        if (ev->ev_events & events)
            event_active_nolock(ev, ev->ev_events & events, 1);
    }

finds only one ev (mca_oob_tcp_recv_handler and *no*
mca_oob_tcp_send_handler)

i will resume my investigations tomorrow

Cheers,

Gilles

On 2014/09/17 4:01, Ralph Castain wrote:
> Hi Gilles
>
> I took a crack at solving this in r32744 - CMRd it for 1.8.3 and assigned it 
> to you for review. Give it a try and let me know if I (hopefully) got it.
>
> The approach we have used in the past is to have both sides close their 
> connections, and then have the higher vpid retry while the lower one waits. 
> The logic for that was still in place, but it looks like you are hitting a 
> different code path, and I found another potential one as well. So I think I 
> plugged the holes, but will wait to hear if you confirm.
>
> Thanks
> Ralph
>
> On Sep 16, 2014, at 6:27 AM, Gilles Gouaillardet 
> <gilles.gouaillar...@gmail.com> wrote:
>
>> Ralph,
>>
>> here is the full description of a race condition in oob/tcp i very briefly 
>> mentioned in a previous post :
>>
>> the race condition can occur when two orted that are not yet connected try to send a 
>> message to each other for the first time and at the same time.
>>
>> that can occur when running mpi helloworld on 4 nodes with the grpcomm/rcd 
>> module.
>>
>> here is a scenario in which the race condition occurs :
>>
>> orted vpid 2 and 3 enter the allgather
>> /* they are not yet oob/tcp connected */
>> and they call orte.send_buffer_nb to each other.
>> from a libevent point of view, vpid 2 and 3 will call 
>> mca_oob_tcp_peer_try_connect
>>
>> vpid 2 calls mca_oob_tcp_send_handler
>>
>> vpid 3 calls connection_event_handler
>>
>> depending on the value returned by random() in libevent, vpid 3 will
>> either call mca_oob_tcp_send_handler (likely) or recv_handler (unlikely)
>> if vpid 3 calls recv_handler, it will close the two sockets to vpid 2
>>
>> then vpid 2 will call mca_oob_tcp_recv_handler
>> (peer->state is MCA_OOB_TCP_CONNECT_ACK)
>> that will invoke mca_oob_tcp_recv_connect_ack
>> tcp_peer_recv_blocking will fail 
>> /* zero bytes are recv'ed since vpid 3 previously closed the socket before 
>> writing a header */
>> and this is handled by mca_oob_tcp_recv_handler as a fatal error
>> /* ORTE_FORCED_TERMINATE(1) */
>>
>> could you please have a look at it ?
>>
>> if you are too busy, could you please advise where this scenario should be 
>> handled differently ?
>> - should vpid 3 keep one socket instead of closing both and retrying ?
>> - should vpid 2 handle the failure as a non fatal error ?
>>
>> Cheers,
>>
>> Gilles
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/09/15836.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15844.php



[OMPI devel] race condition in oob/tcp

2014-09-16 Thread Gilles Gouaillardet
Ralph,

here is the full description of a race condition in oob/tcp i very briefly
mentioned in a previous post :

the race condition can occur when two orted that are not yet connected try to
send a message to each other for the first time and at the same time.

that can occur when running mpi helloworld on 4 nodes with the grpcomm/rcd
module.

here is a scenario in which the race condition occurs :

orted vpid 2 and 3 enter the allgather
/* they are not yet oob/tcp connected */
and they call orte.send_buffer_nb to each other.
from a libevent point of view, vpid 2 and 3 will call
mca_oob_tcp_peer_try_connect

vpid 2 calls mca_oob_tcp_send_handler

vpid 3 calls connection_event_handler

depending on the value returned by random() in libevent, vpid 3 will
either call mca_oob_tcp_send_handler (likely) or recv_handler (unlikely)
if vpid 3 calls recv_handler, it will close the two sockets to vpid 2

then vpid 2 will call mca_oob_tcp_recv_handler
(peer->state is MCA_OOB_TCP_CONNECT_ACK)
that will invoke mca_oob_tcp_recv_connect_ack
tcp_peer_recv_blocking will fail
/* zero bytes are recv'ed since vpid 3 previously closed the socket before
writing a header */
and this is handled by mca_oob_tcp_recv_handler as a fatal error
/* ORTE_FORCED_TERMINATE(1) */

could you please have a look at it ?

if you are too busy, could you please advise where this scenario should be
handled differently ?
- should vpid 3 keep one socket instead of closing both and retrying ?
- should vpid 2 handle the failure as a non fatal error ?
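
to illustrate the second option, here is a minimal, generic sketch (not the
actual oob/tcp code; read_connect_ack() and the ack_result values are made-up
names for this example) of treating a zero-byte read of the connect ack as
"peer closed, retry" instead of a fatal error :

#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

enum ack_result { ACK_OK, ACK_RETRY, ACK_ERROR };

static enum ack_result read_connect_ack(int fd, void *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(fd, (char *)buf + got, len - got, 0);
        if (0 == n) {
            /* the peer closed the socket before sending a full header:
             * it hit the simultaneous-connect case and dropped this
             * connection, so close ours and let the higher vpid retry
             * establishing the connection */
            close(fd);
            return ACK_RETRY;
        }
        if (n < 0) {
            return ACK_ERROR;   /* a genuine socket error stays fatal */
        }
        got += (size_t)n;
    }
    return ACK_OK;
}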

Cheers,

Gilles


Re: [OMPI devel] coll ml error with some nonblocking collectives

2014-09-15 Thread Gilles Gouaillardet
Howard, and Rolf,

i initially reported the issue at
http://www.open-mpi.org/community/lists/devel/2014/09/15767.php

r32659 is neither a fix nor a regression, it simply aborts instead of
OBJ_RELEASE(mpi_comm_world).
/* my point here is we should focus on the root cause and not the
consequence */

first, this is a race condition, so one run is not enough to conclude the
problem is fixed.
second, if you do not configure with --enable-debug, there might be a
silent data corruption with undefined results if the bug is hit. an undefined
result can mean the test succeeds.

bottom line and imho :
- if your test succeeds without r32659, it just means you were lucky ...
- an abort with an understandable error message is better than a silent
corruption

last but not least, r32659 was acked for v1.8 (#4888).
coll/ml priority is now zero in v1.8 and this is likely the only reason why
you do not see any errors in this branch.

Cheers,

Gilles

On Tue, Sep 16, 2014 at 8:33 AM, Pritchard Jr., Howard 
wrote:

>  HI Rolf,
>
>
>
> Okay.  I’ll work with ORNL folks to see how to really fix this.
>
>
>
> Howard
>
>
>
>
>
> *From:* devel [mailto:devel-boun...@open-mpi.org] *On Behalf Of *Rolf
> vandeVaart
> *Sent:* Monday, September 15, 2014 3:10 PM
>
> *To:* Open MPI Developers
> *Subject:* Re: [OMPI devel] coll ml error with some nonblocking
> collectives
>
>
>
> Confirmed that trunk version r32658 does pass the test.
>
>
>
> *From:* devel [mailto:devel-boun...@open-mpi.org
> ] *On Behalf Of *Pritchard Jr., Howard
> *Sent:* Monday, September 15, 2014 4:16 PM
> *To:* Open MPI Developers
> *Subject:* Re: [OMPI devel] coll ml error with some nonblocking
> collectives
>
>
>
> Hi Rolf,
>
>
>
> This may be related to change set 32659.
>
>
>
> If you back this change out, do the tests pass?
>
>
>
>
>
> Howard
>
>
>
>
>
>
>
>
>
> *From:* devel [mailto:devel-boun...@open-mpi.org
> ] *On Behalf Of *Rolf vandeVaart
> *Sent:* Monday, September 15, 2014 8:55 AM
> *To:* de...@open-mpi.org
> *Subject:* [OMPI devel] coll ml error with some nonblocking collectives
>
>
>
> I wonder if anyone else is seeing this failure. Not sure when this started
> but it is only on the trunk. Here is a link to my failures as well as an
> example below that. There are a variety of nonblocking collectives failing
> like this.
>
>
>
> http://mtt.open-mpi.org/index.php?do_redir=2208
>
>
>
> [rvandevaart@drossetti-ivy0 collective]$ mpirun --mca btl self,sm,tcp
> -host drossetti-ivy0,drossetti-ivy0,drossetti-ivy1,drossetti-ivy1 iallreduce
> --
>
> ML detected an unrecoverable error on intrinsic communicator MPI_COMM_WORLD
>
> The program will now abort
> --
> [drossetti-ivy0.nvidia.com:04664] 3 more processes have sent help message
> help-mpi-coll-ml.txt / coll-ml-check-fatal-error
> [rvandevaart@drossetti-ivy0 collective]$
>   --
>
> This email message is for the sole use of the intended recipient(s) and
> may contain confidential information.  Any unauthorized review, use,
> disclosure or distribution is prohibited.  If you are not the intended
> recipient, please contact the sender by reply email and destroy all copies
> of the original message.
>   --
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/09/15834.php
>


Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-12 Thread Gilles Gouaillardet
Ralph,

On Fri, Sep 12, 2014 at 10:54 AM, Ralph Castain <r...@open-mpi.org> wrote:

> The design is supposed to be that each node knows precisely how many
> daemons are involved in each collective, and who is going to talk to them.


ok, but the design does not ensure that things will happen in the right
order :
- enter the allgather
- receive data from the daemon at distance 1
- receive data from the daemon at distance 2
- and so on

with the current implementation, when 2 daemons are involved, if a daemon enters
the allgather after it has received data from its peer, then the mpi processes
local to this daemon will hang

with 4 nodes, it gets trickier :
0 enters the allgather and sends a message to 1
1 receives the message and sends to 2, but with data from 0 only
/* 1 did not enter the allgather, so its data cannot be sent to 2 */

this issue did not occur before the persistent receive :
no receive was posted if the daemon did not enter the allgather


> The signature contains the info required to ensure the receiver knows which
> collective this message relates to, and just happens to also allow them to
> lookup the number of daemons involved (the base function takes care of that
> for them).
>
>
ok too, this issue was solved with the persistent receive

So there is no need for a "pending" list - if you receive a message about a
> collective you don't yet know about, you just put it on the ongoing
> collective list. You should only receive it if you are going to be involved
> - i.e., you have local procs that are going to participate. So you wait
> until your local procs participate, and then pass your collected bucket
> along.
>
> ok, i did something similar
(e.g. pass all the available data)
some data might be passed twice, but that might not be an issue


> I suspect the link to the local procs isn't being correctly dealt with,
> else you couldn't be hanging. Or the rcd isn't correctly passing incoming
> messages to the base functions to register the collective.
>
> I'll look at it over the weekend and can resolve it then.
>
>
 the attached patch is an illustration of what i was trying to explain.
coll->nreported is used by rcd as a bitmask of the received messages
(bit 0 is for the local daemon, bit n for the daemon at distance n)
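
a tiny illustration of this bookkeeping (hypothetical helper names, not the
actual patch) : bit 0 is set when the local daemon contributes, bit n when
the message from the daemon at distance n has been received, and the
collective can be released once every expected bit is set :

#include <stdbool.h>
#include <stdint.h>

static inline void mark_local(uint64_t *reported)
{
    *reported |= UINT64_C(1) << 0;
}

static inline void mark_distance(uint64_t *reported, int n)
{
    *reported |= UINT64_C(1) << n;
}

static inline bool rcd_complete(uint64_t reported, int ndistances)
{
    /* expect bit 0 (local) plus bits 1..ndistances (one per exchange) */
    uint64_t expected = (UINT64_C(1) << (ndistances + 1)) - 1;
    return (reported & expected) == expected;
}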

i was still debugging a race condition :
if daemons 2 and 3 enter the allgather at the same time, they will send a
message to each other at the same time and rml fails to establish the
connection.  i could not figure out whether this is linked to my changes...

Cheers,

Gilles

>
> On Sep 11, 2014, at 5:23 PM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>
> > Ralph,
> >
> > you are right, this was definetly not the right fix (at least with 4
> > nodes or more)
> >
> > i finally understood what is going wrong here :
> > to make it simple, the allgather recursive doubling algo is not
> > implemented with
> > MPI_Recv(...,peer,...) like functions but with
> > MPI_Recv(...,MPI_ANY_SOURCE,...) like functions
> > and that makes things slightly more complicated :
> > right now :
> > - with two nodes : if node 1 is late, it gets stuck in the allgather
> > - with four nodes : if node 0 is first, then node 2 and 3 while node 1
> > is still late, then node 0
> > will likely leave the allgather even though it did not receive anything
> > from node 1
> > - and so on
> >
> > i think i can fix that from now
> >
> > Cheers,
> >
> > Gilles
> >
> > On 2014/09/11 23:47, Ralph Castain wrote:
> >> Yeah, that's not the right fix, I'm afraid. I've made the direct
> component the default again until I have time to dig into this deeper.
> >>
> >> On Sep 11, 2014, at 4:02 AM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
> >>
> >>> Ralph,
> >>>
> >>> the root cause is when the second orted/mpirun runs rcd_finalize_coll,
> >>> it does not invoke pmix_server_release
> >>> because allgather_stub was not previously invoked since the fence
> >>> was not yet entered.
> >>> /* in rcd_finalize_coll, coll->cbfunc is NULL */
> >>>
> >>> the attached patch is likely not the right fix, it was very lightly
> >>> tested, but so far, it works for me ...
> >>>
> >>> Cheers,
> >>>
> >>> Gilles
> >>>
> >>> On 2014/09/11 16:11, Gilles Gouaillardet wrote:
> >>>> Ralph,
> >>>>
> >>>> things got worse indeed :-(
> >>>>
> >>>> now a simple hello world involving two hosts hangs in mpi_init.
> >>>> there is still a race 

Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-11 Thread Gilles Gouaillardet
Ralph,

you are right, this was definitely not the right fix (at least with 4
nodes or more)

i finally understood what is going wrong here :
to make it simple, the allgather recursive doubling algo is not
implemented with
MPI_Recv(...,peer,...) like functions but with
MPI_Recv(...,MPI_ANY_SOURCE,...) like functions
and that makes things slightly more complicated :
right now :
- with two nodes : if node 1 is late, it gets stuck in the allgather
- with four nodes : if node 0 is first, then nodes 2 and 3, while node 1
is still late, then node 0
will likely leave the allgather even though it did not receive anything
from node 1
- and so on
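
for reference, here is a generic sketch of recursive-doubling peering (not
the rcd code itself) : at round k, the partner of a rank is rank XOR 2^k, so
with a directed receive the expected sender of each round is known in
advance; the complication described above comes from accepting messages from
any source instead of matching them per round :

#include <stdint.h>

static inline uint32_t rcd_partner(uint32_t rank, uint32_t round)
{
    return rank ^ (UINT32_C(1) << round);
}

/* example with 4 daemons : rank 0 exchanges with 1 in round 0, then with 2
 * in round 1; rank 1 exchanges with 0, then with 3; after log2(4) = 2 rounds
 * everyone holds all contributions, provided each round only consumes that
 * round's message. */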

i think i can fix that from now

Cheers,

Gilles

On 2014/09/11 23:47, Ralph Castain wrote:
> Yeah, that's not the right fix, I'm afraid. I've made the direct component 
> the default again until I have time to dig into this deeper.
>
> On Sep 11, 2014, at 4:02 AM, Gilles Gouaillardet 
> <gilles.gouaillar...@iferc.org> wrote:
>
>> Ralph,
>>
>> the root cause is when the second orted/mpirun runs rcd_finalize_coll,
>> it does not invoke pmix_server_release
>> because allgather_stub was not previously invoked since the fence
>> was not yet entered.
>> /* in rcd_finalize_coll, coll->cbfunc is NULL */
>>
>> the attached patch is likely not the right fix, it was very lightly
>> tested, but so far, it works for me ...
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/09/11 16:11, Gilles Gouaillardet wrote:
>>> Ralph,
>>>
> >>> things got worse indeed :-(
> >>>
> >>> now a simple hello world involving two hosts hangs in mpi_init.
> >>> there is still a race condition : if task a calls fence long after task b,
> >>> then task b will never leave the fence
>>>
>>> i ll try to debug this ...
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On 2014/09/11 2:36, Ralph Castain wrote:
>>>> I think I now have this fixed - let me know what you see.
>>>>
>>>>
>>>> On Sep 9, 2014, at 6:15 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>>> Yeah, that's not the correct fix. The right way to fix it is for all 
>>>>> three components to have their own RML tag, and for each of them to 
>>>>> establish a persistent receive. They then can use the signature to tell 
>>>>> which collective the incoming message belongs to.
>>>>>
>>>>> I'll fix it, but it won't be until tomorrow I'm afraid as today is shot.
>>>>>
>>>>>
>>>>> On Sep 9, 2014, at 3:10 AM, Gilles Gouaillardet 
>>>>> <gilles.gouaillar...@iferc.org> wrote:
>>>>>
>>>>>> Folks,
>>>>>>
>>>>>> Since r32672 (trunk), grpcomm/rcd is the default module.
>>>>>> the attached spawn.c test program is a trimmed version of the
>>>>>> spawn_with_env_vars.c test case
>>>>>> from the ibm test suite.
>>>>>>
>>>>>> when invoked on two nodes :
>>>>>> - the program hangs with -np 2
>>>>>> - the program can crash with np > 2
>>>>>> error message is
>>>>>> [node0:30701] [[42913,0],0] TWO RECEIVES WITH SAME PEER [[42913,0],1]
>>>>>> AND TAG -33 - ABORTING
>>>>>>
>>>>>> here is my full command line (from node0) :
>>>>>>
>>>>>> mpirun -host node0,node1 -np 2 --oversubscribe --mca btl tcp,self --mca
>>>>>> coll ^ml ./spawn
>>>>>>
>>>>>> a simple workaround is to add the following extra parameter to the
>>>>>> mpirun command line :
>>>>>> --mca grpcomm_rcd_priority 0
>>>>>>
>>>>>> my understanding is that the race condition occurs when all the
>>>>>> processes call MPI_Finalize()
>>>>>> internally, the pmix module will have mpirun/orted issue two ALLGATHER
>>>>>> involving mpirun and orted
>>>>>> (one job 1 aka the parent, and one for job 2 aka the spawned tasks)
>>>>>> the error message is very explicit : this is not (currently) supported
>>>>>>
>>>>>> i wrote the attached rml.patch which is really a workaround and not a 
>>>>>> fix :
>>>>>> in this case, each job will invoke an ALLGATHER but with a different tag
>>>>>> /* that works for a limited number of jobs only */
>>>>>>

Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-11 Thread Gilles Gouaillardet
Ralph,

the root cause is when the second orted/mpirun runs rcd_finalize_coll,
it does not invoke pmix_server_release
because allgather_stub was not previously invoked since the fence
was not yet entered.
/* in rcd_finalize_coll, coll->cbfunc is NULL */

the attached patch is likely not the right fix, it was very lightly
tested, but so far, it works for me ...

Cheers,

Gilles

On 2014/09/11 16:11, Gilles Gouaillardet wrote:
> Ralph,
>
> things got worse indeed :-(
>
> now a simple hello world involving two hosts hangs in mpi_init.
> there is still a race condition : if task a calls fence long after task b,
> then task b will never leave the fence
>
> i ll try to debug this ...
>
> Cheers,
>
> Gilles
>
> On 2014/09/11 2:36, Ralph Castain wrote:
>> I think I now have this fixed - let me know what you see.
>>
>>
>> On Sep 9, 2014, at 6:15 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Yeah, that's not the correct fix. The right way to fix it is for all three 
>>> components to have their own RML tag, and for each of them to establish a 
>>> persistent receive. They then can use the signature to tell which 
>>> collective the incoming message belongs to.
>>>
>>> I'll fix it, but it won't be until tomorrow I'm afraid as today is shot.
>>>
>>>
>>> On Sep 9, 2014, at 3:10 AM, Gilles Gouaillardet 
>>> <gilles.gouaillar...@iferc.org> wrote:
>>>
>>>> Folks,
>>>>
>>>> Since r32672 (trunk), grpcomm/rcd is the default module.
>>>> the attached spawn.c test program is a trimmed version of the
>>>> spawn_with_env_vars.c test case
>>>> from the ibm test suite.
>>>>
>>>> when invoked on two nodes :
>>>> - the program hangs with -np 2
>>>> - the program can crash with np > 2
>>>> error message is
>>>> [node0:30701] [[42913,0],0] TWO RECEIVES WITH SAME PEER [[42913,0],1]
>>>> AND TAG -33 - ABORTING
>>>>
>>>> here is my full command line (from node0) :
>>>>
>>>> mpirun -host node0,node1 -np 2 --oversubscribe --mca btl tcp,self --mca
>>>> coll ^ml ./spawn
>>>>
>>>> a simple workaround is to add the following extra parameter to the
>>>> mpirun command line :
>>>> --mca grpcomm_rcd_priority 0
>>>>
>>>> my understanding is that the race condition occurs when all the
>>>> processes call MPI_Finalize()
>>>> internally, the pmix module will have mpirun/orted issue two ALLGATHER
>>>> involving mpirun and orted
>>>> (one job 1 aka the parent, and one for job 2 aka the spawned tasks)
>>>> the error message is very explicit : this is not (currently) supported
>>>>
>>>> i wrote the attached rml.patch which is really a workaround and not a fix :
>>>> in this case, each job will invoke an ALLGATHER but with a different tag
>>>> /* that works for a limited number of jobs only */
>>>>
>>>> i did not commit this patch since this is not a fix, could someone
>>>> (Ralph ?) please review the issue and comment ?
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Gilles
>>>>
>>>> ___
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>> Link to this post: 
>>>> http://www.open-mpi.org/community/lists/devel/2014/09/15780.php
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/09/15794.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15804.php

Index: orte/mca/grpcomm/rcd/grpcomm_rcd.c
===
--- orte/mca/grpcomm/rcd/grpcomm_rcd.c  (revision 32706)
+++ orte/mca/grpcomm/rcd/grpcomm_rcd.c  (working copy)
@@ -6,6 +6,8 @@
  * Copyright (c) 2011-2013 Los Alamos National Security, LLC. All
  * rights reserved.
  * Copyright (c) 2014  Intel, Inc.  All rights reserved.
+ * Copyright (c) 2014  Research Organization for Information Science
+ * and Technology 

Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-11 Thread Gilles Gouaillardet
Ralph,

things got worse indeed :-(

now a simple hello world involving two hosts hangs in mpi_init.
there is still a race condition : if task a calls fence long after task b,
then task b will never leave the fence

i ll try to debug this ...

Cheers,

Gilles

On 2014/09/11 2:36, Ralph Castain wrote:
> I think I now have this fixed - let me know what you see.
>
>
> On Sep 9, 2014, at 6:15 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Yeah, that's not the correct fix. The right way to fix it is for all three 
>> components to have their own RML tag, and for each of them to establish a 
>> persistent receive. They then can use the signature to tell which collective 
>> the incoming message belongs to.
>>
>> I'll fix it, but it won't be until tomorrow I'm afraid as today is shot.
>>
>>
>> On Sep 9, 2014, at 3:10 AM, Gilles Gouaillardet 
>> <gilles.gouaillar...@iferc.org> wrote:
>>
>>> Folks,
>>>
>>> Since r32672 (trunk), grpcomm/rcd is the default module.
>>> the attached spawn.c test program is a trimmed version of the
>>> spawn_with_env_vars.c test case
>>> from the ibm test suite.
>>>
>>> when invoked on two nodes :
>>> - the program hangs with -np 2
>>> - the program can crash with np > 2
>>> error message is
>>> [node0:30701] [[42913,0],0] TWO RECEIVES WITH SAME PEER [[42913,0],1]
>>> AND TAG -33 - ABORTING
>>>
>>> here is my full command line (from node0) :
>>>
>>> mpirun -host node0,node1 -np 2 --oversubscribe --mca btl tcp,self --mca
>>> coll ^ml ./spawn
>>>
>>> a simple workaround is to add the following extra parameter to the
>>> mpirun command line :
>>> --mca grpcomm_rcd_priority 0
>>>
>>> my understanding is that the race condition occurs when all the
>>> processes call MPI_Finalize()
>>> internally, the pmix module will have mpirun/orted issue two ALLGATHER
>>> involving mpirun and orted
>>> (one job 1 aka the parent, and one for job 2 aka the spawned tasks)
>>> the error message is very explicit : this is not (currently) supported
>>>
>>> i wrote the attached rml.patch which is really a workaround and not a fix :
>>> in this case, each job will invoke an ALLGATHER but with a different tag
>>> /* that works for a limited number of jobs only */
>>>
>>> i did not commit this patch since this is not a fix, could someone
>>> (Ralph ?) please review the issue and comment ?
>>>
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/09/15780.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15794.php



Re: [OMPI devel] Need to know your Github ID

2014-09-11 Thread Gilles Gouaillardet
ggouaillardet -> ggouaillardet

On 2014/09/10 19:46, Jeff Squyres (jsquyres) wrote:
> As the next step of the planned migration to Github, I need to know:
>
> - Your Github ID (so that you can be added to the new OMPI git repo)
> - Your SVN ID (so that I can map SVN->Github IDs, and therefore map Trac 
> tickets to appropriate owners)
>
> Here's the list of SVN IDs who have committed over the past year -- I'm 
> guessing that most of these people will need Github IDs:
>
>  adrian 
>  alekseys 
>  alex 
>  alinas 
>  amikheev 
>  bbenton 
>  bosilca (done)
>  bouteill 
>  brbarret 
>  bwesarg 
>  devendar 
>  dgoodell (done)
>  edgar 
>  eugene 
>  ggouaillardet 
>  hadi 
>  hjelmn 
>  hpcchris 
>  hppritcha 
>  igoru 
>  jjhursey (done)
>  jladd 
>  jroman 
>  jsquyres (done)
>  jurenz 
>  kliteyn 
>  manjugv 
>  miked (done)
>  mjbhaskar 
>  mpiteam (done)
>  naughtont 
>  osvegis 
>  pasha 
>  regrant 
>  rfaucett 
>  rhc (done)
>  rolfv (done)
>  samuel 
>  shiqing 
>  swise 
>  tkordenbrock 
>  vasily 
>  vvenkates 
>  vvenkatesan 
>  yaeld 
>  yosefe 
>



[OMPI devel] race condition in grpcomm/rcd

2014-09-09 Thread Gilles Gouaillardet
Folks,

Since r32672 (trunk), grpcomm/rcd is the default module.
the attached spawn.c test program is a trimmed version of the
spawn_with_env_vars.c test case
from the ibm test suite.

when invoked on two nodes :
- the program hangs with -np 2
- the program can crash with np > 2
error message is
[node0:30701] [[42913,0],0] TWO RECEIVES WITH SAME PEER [[42913,0],1]
AND TAG -33 - ABORTING

here is my full command line (from node0) :

mpirun -host node0,node1 -np 2 --oversubscribe --mca btl tcp,self --mca
coll ^ml ./spawn

a simple workaround is to add the following extra parameter to the
mpirun command line :
--mca grpcomm_rcd_priority 0

my understanding is that the race condition occurs when all the
processes call MPI_Finalize() :
internally, the pmix module will have mpirun/orted issue two ALLGATHERs
involving mpirun and orted
(one for job 1 aka the parent, and one for job 2 aka the spawned tasks)
the error message is very explicit : this is not (currently) supported

i wrote the attached rml.patch which is really a workaround and not a fix :
in this case, each job will invoke an ALLGATHER but with a different tag
/* that works for a limited number of jobs only */

i did not commit this patch since this is not a fix, could someone
(Ralph ?) please review the issue and comment ?


Cheers,

Gilles

/*
 * $HEADER$
 *
 * Program to test MPI_Comm_spawn with environment variables.
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include "mpi.h"

static void do_parent(char *cmd, int rank, int count)
{
    int *errcode, err;
    int i;
    MPI_Comm child_inter;
    MPI_Comm intra;
    FILE *fp;
    int found;
    int size;

    /* First, see if cmd exists on all ranks */

    fp = fopen(cmd, "r");
    if (NULL == fp) {
        found = 0;
    } else {
        fclose(fp);
        found = 1;
    }
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Allreduce(&found, &count, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    if (count != size) {
        if (rank == 0) {
            MPI_Abort(MPI_COMM_WORLD, 77);
        }
        return;
    }

    /* Now try the spawn if it's found anywhere */

    errcode = malloc(sizeof(int) * count);
    if (NULL == errcode) {
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    memset(errcode, -1, count);
    MPI_Comm_spawn(cmd, MPI_ARGV_NULL, count, MPI_INFO_NULL, 0,
                   MPI_COMM_WORLD, &child_inter, errcode);

    /* Clean up */
    MPI_Barrier(child_inter);

    MPI_Comm_disconnect(&child_inter);
    free(errcode);
}


static void do_target(MPI_Comm parent)
{
    MPI_Barrier(parent);
    MPI_Comm_disconnect(&parent);
}


int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Comm parent;

    /* Ok, we're good.  Proceed with the test. */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Check to see if we *were* spawned -- because this is a test, we
       can only assume the existence of this one executable.  Hence, we
       both mpirun it and spawn it. */

    parent = MPI_COMM_NULL;
    MPI_Comm_get_parent(&parent);
    if (parent != MPI_COMM_NULL) {
        do_target(parent);
    } else {
        do_parent(argv[0], rank, size);
    }

    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (0 < rank) sleep(3);

    MPI_Finalize();

    /* All done */

    return 0;
}
Index: orte/mca/grpcomm/brks/grpcomm_brks.c
===
--- orte/mca/grpcomm/brks/grpcomm_brks.c(revision 32688)
+++ orte/mca/grpcomm/brks/grpcomm_brks.c(working copy)
@@ -6,6 +6,8 @@
  * Copyright (c) 2011-2013 Los Alamos National Security, LLC. All
  * rights reserved.
  * Copyright (c) 2014  Intel, Inc.  All rights reserved.
+ * Copyright (c) 2014  Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -111,6 +113,7 @@
 static int brks_allgather_send_dist(orte_grpcomm_coll_t *coll, orte_vpid_t 
distance) {
 orte_process_name_t peer_send, peer_recv;
 opal_buffer_t *send_buf;
+orte_rml_tag_t tag;
 int rc;

 peer_send.jobid = ORTE_PROC_MY_NAME->jobid;
@@ -174,8 +177,14 @@
  ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
                 ORTE_NAME_PRINT(&peer_send)));

+    if (1 != coll->sig->sz || ORTE_VPID_WILDCARD != coll->sig->signature[0].vpid) {
+        tag = ORTE_RML_TAG_ALLGATHER;
+    } else {
+        tag = ORTE_RML_TAG_JOB_ALLGATHER + ORTE_LOCAL_JOBID(coll->sig->signature[0].jobid) % (ORTE_RML_TAG_MAX-ORTE_RML_TAG_JOB_ALLGATHER);
+    }
+
     if (0 > (rc = orte_rml.send_buffer_nb(&peer_send, send_buf,
-                                          -ORTE_RML_TAG_ALLGATHER,
+                                          -tag,
                                           orte_rml_send_callback, NULL))) {
 ORTE_ERROR_LOG(rc);
 OBJ_RELEASE(send_buf);
@@ -189,7 +198,7 @@

 /* setup recv for distance data 

Re: [OMPI devel] about r32685

2014-09-08 Thread Gilles Gouaillardet
Ralph,

ok, let me clarify my point :

tolower() is invoked in :
opal/mca/hwloc/hwloc191/hwloc/src/misc.c
and ctype.h is already #include'd in this file

tolower() is also invoked in :
opal/mca/hwloc/hwloc191/hwloc/include/private/misc.h
*only* if HWLOC_HAVE_DECL_STRNCASECMP is not #define'd :

static __hwloc_inline int hwloc_strncasecmp(const char *s1, const char
*s2, size_t n)
{
#ifdef HWLOC_HAVE_DECL_STRNCASECMP
  return strncasecmp(s1, s2, n);
#else
  while (n) {
char c1 = tolower(*s1), c2 = tolower(*s2);
if (!c1 || !c2 || c1 != c2)
  return c1-c2;
n--; s1++; s2++;
  }
  return 0;
#endif
}

my point was that on your CentOS box, HWLOC_HAVE_DECL_STRNCASECMP
*should* have been #define'd by configure,
even if you are using the intel or clang 3.2 compilers.

Cheers,

Gilles

On 2014/09/09 11:47, Ralph Castain wrote:
> I'll have to let Brice comment on the config change. All I can say is that 
> "tolower" on my CentOS box is defined in , and that has to be 
> included in the misc.h header.
>
>
> On Sep 8, 2014, at 5:49 PM, Gilles Gouaillardet 
> <gilles.gouaillar...@iferc.org> wrote:
>
>> Ralph and Brice,
>>
>> i noted Ralph committed r32685 in order to fix a problem with Intel
>> compilers.
>> The very similar issue occurs with clang 3.2 (gcc and clang 3.4 are ok
>> for me)
>>
>> imho, the root cause is in the hwloc configure.
>> in this case, configure fails to detect strncasecmp is part of the C
>> include files.
>>
>> in order to achieve this, the conftest.1.c program is compiled and a
>> failure means that
>> strncasecmp is supported since it is declared in some C include files.
>>
>> gcc and clang 3.4 both fail to compile this program :
>>
>> $ gcc -c /tmp/conftest.1.c ; echo $?
>> /tmp/conftest.1.c:592: warning: data definition has no type or storage class
>> /tmp/conftest.1.c:592: error: conflicting types for ‘strncasecmp’
>> 1
>>
>> $ clang --version
>> clang version 3.4 (tags/RELEASE_34/final)
>> Target: x86_64-redhat-linux-gnu
>> Thread model: posix
>> $ clang -c /tmp/conftest.1.c ; echo $?
>> /tmp/conftest.1.c:592:8: warning: type specifier missing, defaults to 'int'
>> [-Wimplicit-int]
>> strncasecmp(int,long,int,long,int,long,int,long,int,long);
>> ^~~
>> /tmp/conftest.1.c:592:8: error: conflicting types for 'strncasecmp'
>> /usr/include/string.h:540:12: note: previous declaration is here
>> extern int strncasecmp (__const char *__s1, __const char *__s2, size_t __n)
>> ^
>> /tmp/conftest.1.c:596:19: error: too many arguments to function call,
>> expected
>> 3, have 10
>> strncasecmp(1,2,3,4,5,6,7,8,9,10);
>> ~~~ ^~
>> 1 warning and 2 errors generated.
>> 1
>>
>>
>> but clang 3.2 and icc simply issue a warning and no error :
>>
>> $ clang --version
>> clang version 3.2 (tags/RELEASE_32/final)
>> Target: x86_64-unknown-linux-gnu
>> Thread model: posix
>> $ clang -c /tmp/conftest.1.c ; echo $?
>> /tmp/conftest.1.c:592:8: warning: type specifier missing, defaults to 'int'
>> [-Wimplicit-int]
>> strncasecmp(int,long,int,long,int,long,int,long,int,long);
>> ^~~
>> /tmp/conftest.1.c:592:8: warning: incompatible redeclaration of library
>> function
>> 'strncasecmp'
>> /usr/include/string.h:540:12: note: 'strncasecmp' is a builtin with type
>> 'int
>> (const char *, const char *, size_t)'
>> extern int strncasecmp (__const char *__s1, __const char *__s2, size_t __n)
>> ^
>> 2 warnings generated.
>> 0
>>
>> $ icc -c conftest.1.c ; echo $?
>> conftest.1.c(592): warning #77: this declaration has no storage class or
>> type specifier
>> strncasecmp(int,long,int,long,int,long,int,long,int,long);
>> ^
>>
>> conftest.1.c(592): warning #147: declaration is incompatible with "int
>> strncasecmp(const char *, const char *, size_t={unsigned long})"
>> (declared at line 540 of "/usr/include/string.h")
>> strncasecmp(int,long,int,long,int,long,int,long,int,long);
>> ^
>>
>> 0
>>
>>
>> the attached hwloc_config.patch is used in order to make the test
>> program slightly different (conftest.2.c) and it does fail with all the
>> compilers.
>>
>>
>> that being said, r32685 might not be reversed since in the case
>> strncasecmp is not supported by the system (i do not even know if such
>> os exist)
>> ctype.h must be #include'd in order to get the prototype of the
>> tolower() function.
>>
>>
>> could you please review the hwloc_config.patch and comment ?
>>
>> Cheers,
>>
>> Gilles
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/09/15775.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/09/15776.php



[OMPI devel] about r32685

2014-09-08 Thread Gilles Gouaillardet
Ralph and Brice,

i noted Ralph committed r32685 in order to fix a problem with the Intel
compilers.
A very similar issue occurs with clang 3.2 (gcc and clang 3.4 are ok
for me).

imho, the root cause is in the hwloc configure.
in this case, configure fails to detect that strncasecmp is declared in the C
include files.

in order to achieve this, the conftest.1.c program is compiled, and a
compilation failure means that
strncasecmp is supported, since it is already declared in some C include file.
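
the relevant part of the generated conftest, reconstructed here from the
compiler output quoted below, boils down to redeclaring strncasecmp with a
deliberately bogus signature; if the real prototype is already visible in the
system headers, the redeclaration conflicts and compilation fails, which is
how configure concludes that strncasecmp is declared :

#include <string.h>

/* redeclare strncasecmp with a deliberately wrong signature: if the real
 * prototype is already in scope, this conflicts and compilation fails */
strncasecmp(int,long,int,long,int,long,int,long,int,long);

int main(void)
{
    /* call it through the bogus declaration */
    strncasecmp(1,2,3,4,5,6,7,8,9,10);
    return 0;
}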

gcc and clang 3.4 both fail to compile this program :

$ gcc -c /tmp/conftest.1.c ; echo $?
/tmp/conftest.1.c:592: warning: data definition has no type or storage class
/tmp/conftest.1.c:592: error: conflicting types for ‘strncasecmp’
1

$ clang --version
clang version 3.4 (tags/RELEASE_34/final)
Target: x86_64-redhat-linux-gnu
Thread model: posix
$ clang -c /tmp/conftest.1.c ; echo $?
/tmp/conftest.1.c:592:8: warning: type specifier missing, defaults to 'int'
[-Wimplicit-int]
strncasecmp(int,long,int,long,int,long,int,long,int,long);
^~~
/tmp/conftest.1.c:592:8: error: conflicting types for 'strncasecmp'
/usr/include/string.h:540:12: note: previous declaration is here
extern int strncasecmp (__const char *__s1, __const char *__s2, size_t __n)
^
/tmp/conftest.1.c:596:19: error: too many arguments to function call,
expected
3, have 10
strncasecmp(1,2,3,4,5,6,7,8,9,10);
~~~ ^~
1 warning and 2 errors generated.
1


but clang 3.2 and icc simply issue a warning and no error :

$ clang --version
clang version 3.2 (tags/RELEASE_32/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
$ clang -c /tmp/conftest.1.c ; echo $?
/tmp/conftest.1.c:592:8: warning: type specifier missing, defaults to 'int'
[-Wimplicit-int]
strncasecmp(int,long,int,long,int,long,int,long,int,long);
^~~
/tmp/conftest.1.c:592:8: warning: incompatible redeclaration of library
function
'strncasecmp'
/usr/include/string.h:540:12: note: 'strncasecmp' is a builtin with type
'int
(const char *, const char *, size_t)'
extern int strncasecmp (__const char *__s1, __const char *__s2, size_t __n)
^
2 warnings generated.
0

$ icc -c conftest.1.c ; echo $?
conftest.1.c(592): warning #77: this declaration has no storage class or
type specifier
strncasecmp(int,long,int,long,int,long,int,long,int,long);
^

conftest.1.c(592): warning #147: declaration is incompatible with "int
strncasecmp(const char *, const char *, size_t={unsigned long})"
(declared at line 540 of "/usr/include/string.h")
strncasecmp(int,long,int,long,int,long,int,long,int,long);
^

0


the attached hwloc_config.patch makes the test program slightly different
(conftest.2.c), and that version does fail to compile with all the
compilers above.
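
for reference, the kind of change i mean is to turn the sloppy probe
declaration into one that every compiler has to reject when the real
prototype is in scope; a rough sketch of such a probe (this is only the
idea, not the literal conftest.2.c generated by the patch) :

/* rough sketch only -- the real conftest.2.c is generated by configure */
#include <string.h>

/* an explicitly typed, deliberately wrong prototype : the idea is that,
 * if <string.h> already declares strncasecmp, this redeclaration conflicts
 * hard enough to be rejected instead of merely warned about */
int strncasecmp(int, long, int, long, int, long, int, long, int, long);

int main(void) { return 0; }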


that being said, r32685 might not need to be reverted: in the case where
strncasecmp is not supported by the system (i do not even know if such an
os exists), ctype.h must be #include'd in order to get the prototype of
the tolower() function.
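
(for context, such a fallback is essentially a hand rolled case insensitive
compare built on tolower(); here is a minimal sketch, illustrative only and
not the hwloc implementation -- the function name is made up :)

#include <ctype.h>    /* for tolower() -- hence the #include requirement above */
#include <stdio.h>
#include <stddef.h>

/* minimal strncasecmp replacement for a system that lacks one (sketch only) */
static int fallback_strncasecmp(const char *s1, const char *s2, size_t n)
{
    while (n-- > 0) {
        int c1 = tolower((unsigned char) *s1++);
        int c2 = tolower((unsigned char) *s2++);
        if (c1 != c2 || '\0' == c1) {
            return c1 - c2;
        }
    }
    return 0;
}

int main(void)
{
    printf("%d\n", fallback_strncasecmp("OpenMPI", "openmpi", 7));  /* prints 0 */
    return 0;
}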


could you please review the hwloc_config.patch and comment ?

Cheers,

Gilles
/* confdefs.h */
#define PACKAGE_NAME "Open MPI"
#define PACKAGE_TARNAME "openmpi"
#define PACKAGE_VERSION "1.9a1"
#define PACKAGE_STRING "Open MPI 1.9a1"
#define PACKAGE_BUGREPORT "http://www.open-mpi.org/community/help/"
#define PACKAGE_URL ""
#define OPAL_ARCH "x86_64-unknown-linux-gnu"
#define STDC_HEADERS 1
#define HAVE_SYS_TYPES_H 1
#define HAVE_SYS_STAT_H 1
#define HAVE_STDLIB_H 1
#define HAVE_STRING_H 1
#define HAVE_MEMORY_H 1
#define HAVE_STRINGS_H 1
#define HAVE_INTTYPES_H 1
#define HAVE_STDINT_H 1
#define HAVE_UNISTD_H 1
#define __EXTENSIONS__ 1
#define _ALL_SOURCE 1
#define _GNU_SOURCE 1
#define _POSIX_PTHREAD_SEMANTICS 1
#define _TANDEM_SOURCE 1
#define OMPI_MAJOR_VERSION 1
#define OMPI_MINOR_VERSION 9
#define OMPI_RELEASE_VERSION 0
#define OMPI_GREEK_VERSION "a1"
#define OMPI_VERSION "0"
#define OMPI_RELEASE_DATE "Unreleased developer copy"
#define OMPI_WANT_REPO_REV 1
#define OMPI_REPO_REV "r32670M"
#define ORTE_MAJOR_VERSION 1
#define ORTE_MINOR_VERSION 9
#define ORTE_RELEASE_VERSION 0
#define ORTE_GREEK_VERSION "a1"
#define ORTE_VERSION "0"
#define ORTE_RELEASE_DATE "Unreleased developer copy"
#define ORTE_WANT_REPO_REV 1
#define ORTE_REPO_REV "r32670M"
#define OSHMEM_MAJOR_VERSION 1
#define OSHMEM_MINOR_VERSION 9
#define OSHMEM_RELEASE_VERSION 0
#define OSHMEM_GREEK_VERSION "a1"
#define OSHMEM_VERSION "0"
#define OSHMEM_RELEASE_DATE "Unreleased developer copy"
#define OSHMEM_WANT_REPO_REV 1
#define OSHMEM_REPO_REV "r32670M"
#define OPAL_MAJOR_VERSION 1
#define OPAL_MINOR_VERSION 9
#define OPAL_RELEASE_VERSION 0
#define OPAL_GREEK_VERSION "a1"
#define OPAL_VERSION "0"
#define OPAL_RELEASE_DATE "Unreleased developer copy"
#define OPAL_WANT_REPO_REV 1
#define OPAL_REPO_REV "r32670M"
#define OPAL_ENABLE_MEM_DEBUG 0
#define OPAL_ENABLE_MEM_PROFILE 0
#define OPAL_ENABLE_DEBUG 0
#define OPAL_WANT_PRETTY_PRINT_STACKTRACE 1
#define OPAL_ENABLE_PTY_SUPPORT 1
#define OPAL_ENABLE_HETEROGENEOUS_SUPPORT 0
#define OPAL_WANT_HOME_CONFIG_FILES 1
#define 

[OMPI devel] f08 bindings and weak symbols

2014-09-05 Thread Gilles Gouaillardet
Folks,

when OpenMPI is configured with --disable-weak-symbols and a fortran
2008 capable compiler (e.g. gcc 4.9),
MPI_STATUSES_IGNORE invoked from Fortran is not interpreted the way it
should be.
/* instead of being recognized as the special "ignore" value, it is treated
as an array of one status, which can lead to buffer overflow and memory
corruption */
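
(for readers who do not live in that corner of the code : the C side can
only recognize MPI_STATUSES_IGNORE coming from Fortran by comparing its
address against well known sentinel symbols; here is a self contained toy
of that technique, with made up names instead of the OMPI symbols :)

#include <stdio.h>

/* stands in for the mpi_fortran_statuses_ignore common symbol */
static int fortran_statuses_ignore_sentinel;

static int is_statuses_ignore(const void *addr)
{
    /* the only thing that identifies STATUSES_IGNORE is its address */
    return addr == (const void *) &fortran_statuses_ignore_sentinel;
}

int main(void)
{
    int real_statuses[4] = {0};
    printf("%d\n", is_statuses_ignore(&fortran_statuses_ignore_sentinel)); /* 1 */
    printf("%d\n", is_statuses_ignore(real_statuses));                     /* 0 */
    return 0;
}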

A simple workaround is the attached workaround.patch

i could not find any reason why weak symbols matter here :
a) none of these are weak symbols
b) all 4 symbols are always defined no matter what

so i tried to improve this and came up with a much larger patch.
it comes in two patches :
1) fortran.patch : only compile some files and some piece of code if
fortran is supported
/* while i was working on a fix, that came as a pre-requisite for the
following patch */
2) weak.patch : only define the required symbols in fortran
   (i.e. one of CAPS, PLAIN, SINGLE_UNDERSCORE or DOUBLE_UNDERSCORE)
   and bind the f08 symbols to the correct C defined symbol

Since this is quite an important changeset, i did not commit it yet.

Could someone (Jeff ?) please review and comment ?

If the last two patches can be commited into the trunk, what about v1.8 ?
- should these patches land into the v1.8 branch as well ?
- is the workaround.patch enough for the v1.8 branch ?

Cheers,

Gilles

Index: ompi/mpi/fortran/base/constants.h
===
--- ompi/mpi/fortran/base/constants.h   (revision 32668)
+++ ompi/mpi/fortran/base/constants.h   (working copy)
@@ -12,6 +12,8 @@
  * Copyright (c) 2006-2012 Cisco Systems, Inc.  All rights reserved.
  * Copyright (c) 2011-2013 Inria.  All rights reserved.
  * Copyright (c) 2011-2012 Universite Bordeaux 1
+ * Copyright (c) 2014  Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -116,10 +118,8 @@
  mpi_fortran_statuses_ignore_, mpi_fortran_statuses_ignore__);

 /*
- * Create macros to do the checking.  Only check for all 4 if we have
- * weak symbols.  Otherwise, just check for the one relevant symbol.
+ * Create macros to do the checking.  Always check for all 4.
  */
-#if OPAL_HAVE_WEAK_SYMBOLS
 #define OMPI_IS_FORTRAN_BOTTOM(addr) \
   (addr == (void*) &MPI_FORTRAN_BOTTOM || \
addr == (void*) &mpi_fortran_bottom || \
@@ -166,88 +166,6 @@
addr == (void*) &mpi_fortran_statuses_ignore_ || \
addr == (void*) &mpi_fortran_statuses_ignore__)

-#elif OMPI_FORTRAN_CAPS
-#define OMPI_IS_FORTRAN_BOTTOM(addr) \
-  (addr == (void*) &MPI_FORTRAN_BOTTOM)
-#define OMPI_IS_FORTRAN_IN_PLACE(addr) \
-  (addr == (void*) &MPI_FORTRAN_IN_PLACE)
-#define OMPI_IS_FORTRAN_UNWEIGHTED(addr) \
-  (addr == (void*) &MPI_FORTRAN_UNWEIGHTED)
-#define OMPI_IS_FORTRAN_WEIGHTS_EMPTY(addr) \
-  (addr == (void*) &MPI_FORTRAN_WEIGHTS_EMPTY)
-#define OMPI_IS_FORTRAN_ARGV_NULL(addr) \
-  (addr == (void*) &MPI_FORTRAN_ARGV_NULL)
-#define OMPI_IS_FORTRAN_ARGVS_NULL(addr) \
-  (addr == (void*) &MPI_FORTRAN_ARGVS_NULL)
-#define OMPI_IS_FORTRAN_ERRCODES_IGNORE(addr) \
-  (addr == (void*) &MPI_FORTRAN_ERRCODES_IGNORE)
-#define OMPI_IS_FORTRAN_STATUS_IGNORE(addr) \
-  (addr == (void*) &MPI_FORTRAN_STATUS_IGNORE)
-#define OMPI_IS_FORTRAN_STATUSES_IGNORE(addr) \
-  (addr == (void*) &MPI_FORTRAN_STATUSES_IGNORE)
-
-#elif OMPI_FORTRAN_PLAIN
-#define OMPI_IS_FORTRAN_BOTTOM(addr) \
-   (addr == (void*) &mpi_fortran_bottom)
-#define OMPI_IS_FORTRAN_IN_PLACE(addr) \
-   (addr == (void*) &mpi_fortran_in_place)
-#define OMPI_IS_FORTRAN_UNWEIGHTED(addr) \
-   (addr == (void*) &mpi_fortran_unweighted)
-#define OMPI_IS_FORTRAN_WEIGHTS_EMPTY(addr) \
-   (addr == (void*) &mpi_fortran_weights_empty)
-#define OMPI_IS_FORTRAN_ARGV_NULL(addr) \
-   (addr == (void*) &mpi_fortran_argv_null)
-#define OMPI_IS_FORTRAN_ARGVS_NULL(addr) \
-   (addr == (void*) &mpi_fortran_argvs_null)
-#define OMPI_IS_FORTRAN_ERRCODES_IGNORE(addr) \
-   (addr == (void*) &mpi_fortran_errcodes_ignore)
-#define OMPI_IS_FORTRAN_STATUS_IGNORE(addr) \
-   (addr == (void*) &mpi_fortran_status_ignore)
-#define OMPI_IS_FORTRAN_STATUSES_IGNORE(addr) \
-   (addr == (void*) &mpi_fortran_statuses_ignore)
-
-#elif OMPI_FORTRAN_SINGLE_UNDERSCORE
-#define OMPI_IS_FORTRAN_BOTTOM(addr) \
-   (addr == (void*) &mpi_fortran_bottom_)
-#define OMPI_IS_FORTRAN_IN_PLACE(addr) \
-   (addr == (void*) &mpi_fortran_in_place_)
-#define OMPI_IS_FORTRAN_UNWEIGHTED(addr) \
-   (addr == (void*) &mpi_fortran_unweighted_)
-#define OMPI_IS_FORTRAN_WEIGHTS_EMPTY(addr) \
-   (addr == (void*) &mpi_fortran_weights_empty_)
-#define OMPI_IS_FORTRAN_ARGV_NULL(addr) \
-   (addr == (void*) &mpi_fortran_argv_null_)
-#define OMPI_IS_FORTRAN_ARGVS_NULL(addr) \
-   (addr == (void*) &mpi_fortran_argvs_null_)
-#define OMPI_IS_FORTRAN_ERRCODES_IGNORE(addr) \
-   (addr == (void*) &mpi_fortran_errcodes_ignore_)
-#define OMPI_IS_FORTRAN_STATUS_IGNORE(addr) \
-   (addr == (void*) &mpi_fortran_status_ignore_)
-#define OMPI_IS_FORTRAN_STATUSES_IGNORE(addr) \
-   (addr == (void*) &mpi_fortran_statuses_ignore_)
-

Re: [OMPI devel] OMPI devel] race condition in coll/ml

2014-09-01 Thread Gilles Gouaillardet
Ralph,

The changeset avoids the SIGSEGV by calling mpi_abort before bad things happen.

The attached patch seems to fix the problem (and makes the changeset kind of
useless).
Once again, the patch has had very little testing and might break other parts of
coll/ml.

Cheers,

Gilles

Ralph Castain <r...@open-mpi.org> wrote:
>Usually we have trouble with coll/ml because the process locality isn't being 
>reported sufficiently for its needs. Given the recent change in data exchange, 
>I suspect that is the root cause here - I have a note to Nathan asking for 
>clarification of the coll/ml locality requirement.
>
>Did this patch "fix" the problem by avoiding the segfault due to coll/ml 
>disqualifying itself? Or did it make everything work okay again?
>
>
>On Sep 1, 2014, at 3:16 AM, Gilles Gouaillardet 
><gilles.gouaillar...@iferc.org> wrote:
>
>> Folks,
>> 
>> mtt recently failed a bunch of times with the trunk.
>> a good suspect is the collective/ibarrier test from the ibm test suite.
>> 
>> most of the time, CHECK_AND_RECYCLE will fail
>> /* IS_COLL_SYNCMEM(coll_op) is true */
>> 
>> with this test case, we just get a glory SIGSEGV since OBJ_RELEASE is
>> called on MPI_COMM_WORLD (which has *not* been allocated with OBJ_NEW)
>> 
>> i commited r32659 in order to :
>> - display an error message
>> - abort if the communicator is an intrinsic one
>> 
>> with attached modified version of the ibarrier test, i always get an
>> error on task 0 when invoked with
>> mpirun -np 2 -host node0,node1 --mca btl tcp,self ./ibarrier
>> 
>> the modified version adds some sleep(1) in order to work around the race
>> condition and get a reproducible crash
>> 
>> i tried to dig and could not find a correct way to fix this.
>> that being said, i tried the attached ml.patch and it did fix the
>> problem (even with NREQS=1024)
>> i did not commit it since this is very likely incorrect.
>> 
>> could someone have a look ?
>> 
>> Cheers,
>> 
>> Gilles
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/09/15767.php
>
>___
>devel mailing list
>de...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Link to this post: 
>http://www.open-mpi.org/community/lists/devel/2014/09/15769.php


[OMPI devel] race condition in coll/ml

2014-09-01 Thread Gilles Gouaillardet
Folks,

mtt recently failed a bunch of times with the trunk.
a good suspect is the collective/ibarrier test from the ibm test suite.

most of the time, CHECK_AND_RECYCLE will fail
/* IS_COLL_SYNCMEM(coll_op) is true */

with this test case, we just get a glorious SIGSEGV since OBJ_RELEASE is
called on MPI_COMM_WORLD (which has *not* been allocated with OBJ_NEW)
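
to make the failure mode concrete, here is a self contained toy (illustrative
names, not the actual OBJ_* machinery) showing why running a heap oriented
release path on a statically allocated object like MPI_COMM_WORLD ends badly :

#include <stdio.h>
#include <stdlib.h>

struct toy_obj { int refcount; int heap_allocated; };

static void toy_release(struct toy_obj *o)
{
    if (0 == --o->refcount) {
        if (!o->heap_allocated) {
            /* complain and abort instead of letting free() loose
             * on a statically allocated object */
            fprintf(stderr, "refusing to destroy a predefined object\n");
            abort();
        }
        free(o);
    }
}

int main(void)
{
    static struct toy_obj predefined = { 1, 0 };  /* plays the role of MPI_COMM_WORLD */
    toy_release(&predefined);                     /* aborts instead of corrupting the heap */
    return 0;
}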

i commited r32659 in order to :
- display an error message
- abort if the communicator is an intrinsic one

with attached modified version of the ibarrier test, i always get an
error on task 0 when invoked with
mpirun -np 2 -host node0,node1 --mca btl tcp,self ./ibarrier

the modified version adds some sleep(1) calls in order to take the randomness
out of the race condition and get a reproducible crash

i tried to dig and could not find a correct way to fix this.
that being said, i tried the attached ml.patch and it did fix the
problem (even with NREQS=1024)
i did not commit it since this is very likely incorrect.

could someone have a look ?

Cheers,

Gilles
/*
 * $HEADER$
 */
/*

 MESSAGE PASSING INTERFACE TEST CASE SUITE

 Copyright IBM Corp. 1995

 IBM Corp. hereby grants a non-exclusive license to use, copy, modify, and
 distribute this software for any purpose and without fee provided that the
 above copyright notice and the following paragraphs appear in all copies.

 IBM Corp. makes no representation that the test cases comprising this
 suite are correct or are an accurate representation of any standard.

 In no event shall IBM be liable to any party for direct, indirect, special
 incidental, or consequential damage arising out of the use of this software
 even if IBM Corp. has been advised of the possibility of such damage.

 IBM CORP. SPECIFICALLY DISCLAIMS ANY WARRANTIES INCLUDING, BUT NOT LIMITED
 TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS AND IBM
 CORP. HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES,
 ENHANCEMENTS, OR MODIFICATIONS.



 These test cases reflect an interpretation of the MPI Standard.  They are
 are, in most cases, unit tests of specific MPI behaviors.  If a user of any
 test case from this set believes that the MPI Standard requires behavior
 different than that implied by the test case we would appreciate feedback.

 Comments may be sent to:
Richard Treumann
treum...@kgn.ibm.com


*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#include <mpi.h>

#include "ompitest_error.h"

#ifndef NREQS
#define NREQS 16
#endif


int main(int argc, char** argv)
{
int i, me, rank, tasks;
double t1, t2;
MPI_Request req[NREQS];
MPI_Comm comm;

MPI_Init(&argc, &argv);

ompitest_check_size(__FILE__, __LINE__, 2, 1);

MPI_Comm_rank(MPI_COMM_WORLD, &rank);

MPI_Comm_dup(MPI_COMM_WORLD, &comm);

MPI_Barrier(MPI_COMM_WORLD);
if (rank > 0) sleep(2);

/* Do a bunch of barriers */
for (i = 0; i < NREQS; ++i) {
MPI_Ibarrier(comm, &req[i]);
}
MPI_Waitall(NREQS, req, MPI_STATUSES_IGNORE);
if (rank > 0) sleep(2);
MPI_Barrier(MPI_COMM_WORLD);

MPI_Finalize();
return 0;
}
Index: ompi/mca/coll/ml/coll_ml_inlines.h
===
--- ompi/mca/coll/ml/coll_ml_inlines.h  (revision 32658)
+++ ompi/mca/coll/ml/coll_ml_inlines.h  (working copy)
@@ -192,7 +192,7 @@
 !out_of_resource) {
 */
 if (((&coll_op->full_message != 
coll_op->fragment_data.message_descriptor) &&
-!out_of_resource) || IS_COLL_SYNCMEM(coll_op)) {
+!out_of_resource)) {
 /* non-zero offset ==> this is not fragment 0 */
 CHECK_AND_RECYCLE(coll_op);
 }


Re: [OMPI devel] about the test_shmem_zero_get.x test from the openshmem test suite

2014-09-01 Thread Gilles Gouaillardet
Jeff,

i did not get any reply :-(

from the OpenSHMEM 1.1 specs :

Data object on the PE identified by pe that contains the data to be
copied. This data object must be remotely accessible.

so i assumed the test was incorrect and i commited a new one (r2418)
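
for reference, the shape of a spec conforming zero length get is something
like this (illustrative only, not the literal test committed in r2418) :

#include <stdio.h>
#include <shmem.h>

int main(void)
{
    static long src = 42;   /* symmetric, hence remotely accessible */
    long dst = 0;           /* a local destination is fine for a get */

    start_pes(0);
    /* zero elements : nothing is transferred, but the source still has to
     * be a remotely accessible data object per the quote above */
    shmem_long_get(&dst, &src, 0, 0);
    printf("zero-length get done\n");
    return 0;
}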

Cheers,

Gilles

On 2014/08/29 23:41, Jeff Squyres (jsquyres) wrote:
> Gilles --
>
> Did you get a reply about this?
>
>
> On Aug 26, 2014, at 3:17 AM, Gilles Gouaillardet 
> <gilles.gouaillar...@iferc.org> wrote:
>
>> Folks,
>>
>> the test_shmem_zero_get.x from the openshmem-release-1.0d test suite is
>> currently failing.
>>
>> i looked at the test itself, and compared it to test_shmem_zero_put.x
>> (that is a success) and
>> i am very puzzled ...
>>
>> the test calls several flavors of shmem_*_get where :
>> - the destination is in the shmem (why not, but this is useless)
>> - the source is *not* in the shmem
>> - the number of elements to be transferred is zero
>>
>> currently, this is failing because the source is *not* in the shmem.
>>
>> 1) is the test itself correct ?
>> i mean that if we compare it to test_shmem_zero_put.x, i would guess that
>> destination should be in the local memory and source should be in the shmem.
>>
>> 2) should shmem_*_get even fail ?
>> i mean there is zero data to be transferred, so why do we even care
>> whether source is in the shmem or not ?
>> is the openshmem standard explicit about this case (e.g. zero elements
>> to be transferred) ?
>>
>> 3) is a failure expected ?
>> even if i doubt it, this is an option ... and in this case, mtt should
>> be aware about it and report a success when the test fails
>>
>> 4) the test is a success on v1.8.
>> the reason is the default configure value is --oshmem-param-check=never
>> on v1.8 whereas it is --oshmem-param-check=always on trunk
>> is there any reason for this ?
>>
>> Cheers,
>>
>> Gilles
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/08/15707.php
>



Re: [OMPI devel] oshmem-openmpi-1.8.2 causes compile error with -i8(64bit fortarn integer) configuration

2014-09-01 Thread Gilles Gouaillardet
Mishima-san,

the root cause is that the macro expansion does not always produce what one
would have expected ...
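
to make the lvalue problem concrete, here is a simplified, self contained
illustration (the type and macros are stand-ins, not the actual OMPI
definitions) :

/* with -i8 the Fortran INTEGER is wider than a C int, so the value
 * conversion macro has to expand to a cast ... */
typedef long ompi_like_fint_t;                /* stand-in for MPI_Fint under -i8 */
#define FINT_2_INT(a)   ((int) (a))           /* value conversion : expands to a cast */
#define PFINT_2_PINT(p) ((int *) (p))         /* pointer conversion : no cast of a value */

int main(void)
{
    ompi_like_fint_t cond = 1;
    ompi_like_fint_t *pcond = &cond;

    /* int *bad = &FINT_2_INT(*pcond);   expands to &((int)(*pcond)) :
     *                                   a cast is not an lvalue, hence the
     *                                   "invalid lvalue in unary '&'" error  */
    int *good = PFINT_2_PINT(pcond);  /* convert the pointer instead */
    return (good == (int *) pcond) ? 0 : 1;
}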

could you please give the attached patch a try ?

it compiles (at least with gcc) but i have run zero tests so far

Cheers,

Gilles

On 2014/09/01 10:44, tmish...@jcity.maeda.co.jp wrote:
> Hi folks,
>
> I tried to build openmpi-1.8.2 with PGI fortran and -i8(64bit fortran int)
> option
> as shown below:
>
> ./configure \
> --prefix=/home/mishima/opt/mpi/openmpi-1.8.2-pgi14.7_int64 \
> --enable-abi-breaking-fortran-status-i8-fix \
> --with-tm \
> --with-verbs \
> --disable-ipv6 \
> CC=pgcc CFLAGS="-tp k8-64e -fast" \
> CXX=pgCC CXXFLAGS="-tp k8-64e -fast" \
> F77=pgfortran FFLAGS="-i8 -tp k8-64e -fast" \
> FC=pgfortran FCFLAGS="-i8 -tp k8-64e -fast"
>
> Then I saw this compile error in making oshmem at the last stage:
>
> if test ! -r pshmem_real8_swap_f.c ; then \
> pname=`echo pshmem_real8_swap_f.c | cut -b '2-'` ; \
> ln -s ../../../../oshmem/shmem/fortran/$pname
> pshmem_real8_swap_f.c ; \
> fi
>   CC   pshmem_real8_swap_f.lo
> if test ! -r pshmem_int4_cswap_f.c ; then \
> pname=`echo pshmem_int4_cswap_f.c | cut -b '2-'` ; \
> ln -s ../../../../oshmem/shmem/fortran/$pname
> pshmem_int4_cswap_f.c ; \
> fi
>   CC   pshmem_int4_cswap_f.lo
> PGC-S-0058-Illegal lvalue (pshmem_int4_cswap_f.c: 39)
> PGC/x86-64 Linux 14.7-0: compilation completed with severe errors
> make[3]: *** [pshmem_int4_cswap_f.lo] Error 1
> make[3]: Leaving directory
> `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem/shmem/fortran/profile'
> make[2]: *** [all-recursive] Error 1
> make[2]: Leaving directory
> `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem/shmem/fortran'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory
> `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem'
> make: *** [all-recursive] Error 1
>
> I confirmed that it worked if I added configure option of --disable-oshmem.
> So, I hope that oshmem experts would fix this problem.
>
> (additional note)
> I switched to use gnu compiler and checked with this configuration, then
> I got the same error:
>
> ./configure \
> --prefix=/home/mishima/opt/mpi/openmpi-1.8.2-gnu_int64 \
> --enable-abi-breaking-fortran-status-i8-fix \
> --disable-ipv6 \
> F77=gfortran \
> FC=gfortran \
> CC=gcc \
> CXX=g++ \
> FFLAGS="-m64 -fdefault-integer-8" \
> FCFLAGS="-m64 -fdefault-integer-8" \
> CFLAGS=-m64 \
> CXXFLAGS=-m64
>
> make
> 
> if test ! -r pshmem_int4_cswap_f.c ; then \
> pname=`echo pshmem_int4_cswap_f.c | cut -b '2-'` ; \
> ln -s ../../../../oshmem/shmem/fortran/$pname
> pshmem_int4_cswap_f.c ; \
> fi
>   CC   pshmem_int4_cswap_f.lo
> pshmem_int4_cswap_f.c: In function 'shmem_int4_cswap_f':
> pshmem_int4_cswap_f.c:39: error: invalid lvalue in unary '&'
> make[3]: *** [pshmem_int4_cswap_f.lo] Error 1
>
> Regards
> Tetsuya Mishima
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15764.php

Index: oshmem/shmem/fortran/shmem_int4_cswap_f.c
===
--- oshmem/shmem/fortran/shmem_int4_cswap_f.c   (revision 32657)
+++ oshmem/shmem/fortran/shmem_int4_cswap_f.c   (working copy)
@@ -2,6 +2,8 @@
  * Copyright (c) 2013  Mellanox Technologies, Inc.
  * All rights reserved.
  * Copyright (c) 2013 Cisco Systems, Inc.  All rights reserved.
+ * Copyright (c) 2014  Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -38,7 +40,7 @@

 MCA_ATOMIC_CALL(cswap(FPTR_2_VOID_PTR(target), 
 (void *)&out_value, 
-(const void*)(&OMPI_FINT_2_INT(*cond)), 
+(const void*)(OMPI_PFINT_2_PINT(cond)), 
 FPTR_2_VOID_PTR(value), 
 sizeof(out_value), 
 OMPI_FINT_2_INT(*pe)));
Index: oshmem/shmem/fortran/shmem_int8_cswap_f.c
===
--- oshmem/shmem/fortran/shmem_int8_cswap_f.c   (revision 32657)
+++ oshmem/shmem/fortran/shmem_int8_cswap_f.c   (working copy)
@@ -2,6 +2,8 @@
  * Copyright (c) 2013  Mellanox Technologies, Inc.
  * All rights reserved.
  * Copyright (c) 2013 Cisco Systems, Inc.  All rights reserved.
+ * Copyright (c) 2014  Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -38,7 +40,7 @@

 MCA_ATOMIC_CALL(cswap(FPTR_2_VOID_PTR(target), 
 (void *)&out_value, 
-(const void*)(&OMPI_FINT_2_INT(*cond)), 
+(const 

Re: [OMPI devel] segfault in openib component on trunk

2014-08-29 Thread Gilles Gouaillardet
Ralph,

r32639 and r32642 fixes bugs that do exist in both trunk and v1.8, and they can 
be considered as independent of the issue that is discussed in this thread and 
the one you pointed.

so imho, they should land v1.8 even if they do not fix the issue we are now 
discussing here

Cheers,

Gilles


On 2014/08/29 16:42, Ralph Castain wrote:
> This is the email thread which sparked the problem:
>
> http://www.open-mpi.org/community/lists/devel/2014/07/15329.php
>
> I actually tried to apply the original CMR and couldn't get it to work in the 
> 1.8 branch - just kept having problems, so I pushed it off to 1.8.3. I'm 
> leery to accept either of the current CMRs for two reasons: (a) none of the 
> preceding changes is in the 1.8 series yet, and (b) it doesn't sound like we 
> still have a complete solution.
>
> Anyway, I just wanted to point to the original problem that was trying to be 
> addressed.
>
>
> On Aug 28, 2014, at 10:01 PM, Gilles Gouaillardet 
> <gilles.gouaillar...@iferc.org> wrote:
>
>> Howard and Edgar,
>>
>> i fixed a few bugs (r32639 and r32642)
>>
>> the bug is trivial to reproduce with any mpi hello world program
>>
>> mpirun -np 2 --mca btl openib,self hello_world
>>
>> after setting the mca param in the $HOME/.openmpi/mca-params.conf
>>
>> $ cat ~/.openmpi/mca-params.conf
>> btl_openib_receive_queues = S,12288,128,64,32:S,65536,128,64,3
>>
>> good news is the program does not crash with a glory SIGSEGV any more
>> bad news is the program will (nicely) abort for an incorrect reason :
>>
>> --
>> The Open MPI receive queue configuration for the OpenFabrics devices
>> on two nodes are incompatible, meaning that MPI processes on two
>> specific nodes were unable to communicate with each other.  This
>> generally happens when you are using OpenFabrics devices from
>> different vendors on the same network.  You should be able to use the
>> mca_btl_openib_receive_queues MCA parameter to set a uniform receive
>> queue configuration for all the devices in the MPI job, and therefore
>> be able to run successfully.
>>
>>  Local host:   node0
>>  Local adapter:mlx4_0 (vendor 0x2c9, part ID 4099)
>>  Local queues: S,12288,128,64,32:S,65536,128,64,3
>>
>>  Remote host:  node0
>>  Remote adapter:   (vendor 0x2c9, part ID 4099)
>>  Remote queues:   
>> P,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,65536,1024,1008,64
>>
>> the root cause is the remote host did not send its receive_queues to the
>> local host
>> (and hence the local host believes the remote host uses the default value)
>>
>> the logic was revamped vs v1.8, that is why v1.8 does not have such issue.
>>
>> i am still thinking about what the right fix should be :
>> - one option is to send the receive queues
>> - another option would be to differentiate a value overridden in
>> mca-params.conf (which should always be ok) from a value overridden in the .ini
>>  (where we might want to double check that the local and remote values match)
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/08/29 7:02, Pritchard Jr., Howard wrote:
>>> Hi Edgar,
>>>
>>> Could you send me your conf file?  I'll try to reproduce it.
>>>
>>> Maybe run with --mca btl_base_verbose 20 or something to
>>> see what the code that is parsing this field in the conf file
>>> is finding.
>>>
>>>
>>> Howard
>>>
>>>
>>> -Original Message-
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Edgar Gabriel
>>> Sent: Thursday, August 28, 2014 3:40 PM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] segfault in openib component on trunk
>>>
>>> to add another piece of information that I just found, the segfault only 
>>> occurs if I have a particular mca parameter set in my mca-params.conf file, 
>>> namely
>>>
>>> btl_openib_receive_queues = S,12288,128,64,32:S,65536,128,64,3
>>>
>>> Has the syntax for this parameter changed, or should/can I get rid of it?
>>>
>>> Thanks
>>> Edgar
>>>
>>> On 08/28/2014 04:19 PM, Edgar Gabriel wrote:
>>>> we are having recently problems running trunk with openib component 
>>>> enabled on one of our clusters. The problem occurs right in the 
>>>> initialization part, here is the stack right before the segfault:
>>>>
>>>> ---snip---
>>>> (gdb) where
>>>&g

[OMPI devel] mpirun hangs when a task exits with a non zero code

2014-08-29 Thread Gilles Gouaillardet
Ralph and all,

The following trivial test hangs
/* it hangs at least 99% of the time in my environment, 1% is a race
condition and the program behaves as expected */

mpirun -np 1 --mca btl self /bin/false

same behaviour happen with the following trivial but MPI program :

#include <mpi.h>

int main (int argc, char *argv[]) {
MPI_Init(&argc, &argv);
MPI_Finalize();
return 1;
}

The attached patch fixes the hang (i.e. the program nicely aborts with
the correct error message)

i did not commit it since i am not confident at all

could you please review it ?

Cheers

Gilles
Index: orte/mca/errmgr/default_hnp/errmgr_default_hnp.c
===
--- orte/mca/errmgr/default_hnp/errmgr_default_hnp.c(revision 32642)
+++ orte/mca/errmgr/default_hnp/errmgr_default_hnp.c(working copy)
@@ -10,6 +10,8 @@
  * Copyright (c) 2011-2013 Los Alamos National Security, LLC.
  * All rights reserved.
  * Copyright (c) 2014  Intel, Inc.  All rights reserved.
+ * Copyright (c) 2014  Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -382,6 +384,14 @@
 jdata->num_terminated++;
 }

+/* FIXME ???
+ * mark the proc as no more alive if needed
+ */
+if (ORTE_PROC_STATE_KILLED_BY_CMD == state) {
+if (ORTE_FLAG_TEST(pptr, ORTE_PROC_FLAG_WAITPID) && 
ORTE_FLAG_TEST(pptr, ORTE_PROC_FLAG_IOF_COMPLETE)) {
+ORTE_FLAG_UNSET(pptr, ORTE_PROC_FLAG_ALIVE);
+}
+}
 /* if we were ordered to terminate, mark this proc as dead and see if
  * any of our routes or local  children remain alive - if not, then
  * terminate ourselves. */


Re: [OMPI devel] segfault in openib component on trunk

2014-08-29 Thread Gilles Gouaillardet
Howard and Edgar,

i fixed a few bugs (r32639 and r32642)

the bug is trivial to reproduce with any mpi hello world program

mpirun -np 2 --mca btl openib,self hello_world

after setting the mca param in the $HOME/.openmpi/mca-params.conf

$ cat ~/.openmpi/mca-params.conf
btl_openib_receive_queues = S,12288,128,64,32:S,65536,128,64,3

good news is the program does not crash with a glorious SIGSEGV any more
bad news is the program will (nicely) abort for an incorrect reason :

--
The Open MPI receive queue configuration for the OpenFabrics devices
on two nodes are incompatible, meaning that MPI processes on two
specific nodes were unable to communicate with each other.  This
generally happens when you are using OpenFabrics devices from
different vendors on the same network.  You should be able to use the
mca_btl_openib_receive_queues MCA parameter to set a uniform receive
queue configuration for all the devices in the MPI job, and therefore
be able to run successfully.

  Local host:   node0
  Local adapter:mlx4_0 (vendor 0x2c9, part ID 4099)
  Local queues: S,12288,128,64,32:S,65536,128,64,3

  Remote host:  node0
  Remote adapter:   (vendor 0x2c9, part ID 4099)
  Remote queues:   
P,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,65536,1024,1008,64

the root cause is the remote host did not send its receive_queues to the
local host
(and hence the local host believes the remote host uses the default value)

the logic was revamped vs v1.8, that is why v1.8 does not have such issue.

i am still thinking about what the right fix should be :
- one option is to send the receive queues
- another option would be to differentiate a value overridden in
mca-params.conf (which should always be ok) from a value overridden in the .ini
  (where we might want to double check that the local and remote values match)

Cheers,

Gilles

On 2014/08/29 7:02, Pritchard Jr., Howard wrote:
> Hi Edgar,
>
> Could you send me your conf file?  I'll try to reproduce it.
>
> Maybe run with --mca btl_base_verbose 20 or something to
> see what the code that is parsing this field in the conf file
> is finding.
>
>
> Howard
>
>
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Edgar Gabriel
> Sent: Thursday, August 28, 2014 3:40 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] segfault in openib component on trunk
>
> to add another piece of information that I just found, the segfault only 
> occurs if I have a particular mca parameter set in my mca-params.conf file, 
> namely
>
> btl_openib_receive_queues = S,12288,128,64,32:S,65536,128,64,3
>
> Has the syntax for this parameter changed, or should/can I get rid of it?
>
> Thanks
> Edgar
>
> On 08/28/2014 04:19 PM, Edgar Gabriel wrote:
>> we are having recently problems running trunk with openib component 
>> enabled on one of our clusters. The problem occurs right in the 
>> initialization part, here is the stack right before the segfault:
>>
>> ---snip---
>> (gdb) where
>> #0  mca_btl_openib_tune_endpoint (openib_btl=0x762a40,
>> endpoint=0x7d9660) at btl_openib.c:470
>> #1  0x7f1062f105c4 in mca_btl_openib_add_procs (btl=0x762a40, 
>> nprocs=2, procs=0x759be0, peers=0x762440, reachable=0x7fff22dd16f0) at
>> btl_openib.c:1093
>> #2  0x7f106316102c in mca_bml_r2_add_procs (nprocs=2, 
>> procs=0x759be0, reachable=0x7fff22dd16f0) at bml_r2.c:201
>> #3  0x7f10615c0dd5 in mca_pml_ob1_add_procs (procs=0x70dc00,
>> nprocs=2) at pml_ob1.c:334
>> #4  0x7f106823ed84 in ompi_mpi_init (argc=1, argv=0x7fff22dd1da8, 
>> requested=0, provided=0x7fff22dd184c) at runtime/ompi_mpi_init.c:790
>> #5  0x7f1068273a2c in MPI_Init (argc=0x7fff22dd188c,
>> argv=0x7fff22dd1880) at init.c:84
>> #6  0x004008e7 in main (argc=1, argv=0x7fff22dd1da8) at
>> hello_world.c:13
>> ---snip---
>>
>>
>> in line 538 of the file containing the mca_btl_openib_tune_endpoint 
>> routine, the strcmp operation fails, because  recv_qps is a NULL pointer.
>>
>>
>> ---snip---
>>
>> if(0 != strcmp(mca_btl_openib_component.receive_queues, recv_qps)) {
>>
>> ---snip---
>>
>> Does anybody have an idea on what might be going wrong and how to 
>> resolve it? Just to confirm, everything works perfectly with the 1.8 
>> series on that very same  cluster
>>
>> Thanks
>> Edgar
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/08/15746.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15747.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: 

Re: [OMPI devel] intercomm_create from the ibm test suite hangs

2014-08-28 Thread Gilles Gouaillardet
Thanks Ralph !

Cheers,

Gilles

On 2014/08/28 4:52, Ralph Castain wrote:
> Took me awhile to track this down, but it is now fixed - combination of 
> several minor errors
>
> Thanks
> Ralph
>
> On Aug 27, 2014, at 4:07 AM, Gilles Gouaillardet 
> <gilles.gouaillar...@iferc.org> wrote:
>
>> Folks,
>>
>> the intercomm_create test case from the ibm test suite can hang under
>> some configuration.
>>
>> basically, it will spawn n tasks in a first communicator, and then n
>> tasks in a second communicator.
>>
>> when i run from node0 :
>> mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2
>> ./intercomm_create
>>
>> the second spawn will hang.
>> a simple workaround is to use 3 hosts :
>> mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2,node3
>> ./intercomm_create
>>
>> the second spawn creates the task on node2.
>> for some reason i cannot fully understand, pmix believes the orteds of nodes
>> node1 and node2 are involved in the allgather.
>> since node1 is not involved whatsoever, the program hangs
>> /* in create_dmns, orte_get_job_data_object(sig->signature[0].jobid)
>> returns jdata with jdata->map->num_nodes = 2 */
>>
>> Cheers,
>>
>> Gilles
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/08/15732.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15743.php



[OMPI devel] intercomm_create from the ibm test suite hangs

2014-08-27 Thread Gilles Gouaillardet
Folks,

the intercomm_create test case from the ibm test suite can hang under
some configuration.

basically, it will spawn n tasks in a first communicator, and then n
tasks in a second communicator.

when i run from node0 :
mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2
./intercomm_create

the second spawn will hang.
a simple workaround is to use 3 hosts :
mpirun -np 1 --mca btl tcp,self --mca coll ^ml -host node1,node2,node3
./intercomm_create

the second spawn creates the task on node2.
for some reason i cannot fully understand, pmix believes the orteds of nodes
node1 and node2 are involved in the allgather.
since node1 is not involved whatsoever, the program hangs
/* in create_dmns, orte_get_job_data_object(sig->signature[0].jobid)
returns jdata with jdata->map->num_nodes = 2 */

Cheers,

Gilles


[OMPI devel] coll/ml without hwloc (?)

2014-08-26 Thread Gilles Gouaillardet
Folks,

i just commited r32604 in order to fix compilation (pmix) when ompi is
configured with --without-hwloc

now, even a trivial hello world program issues the following output
(which is non fatal, and could even be reported as a warning) :

[soleil][[32389,1],0][../../../../../../src/ompi-trunk/ompi/mca/coll/ml/coll_ml_module.c:1496:ml_discover_hierarchy]
COLL-ML Error: (size of mca_bcol_base_components_in_use = 3) != (size of
mca_sbgp_base_components_in_use = 2) or zero.
[soleil][[32389,1],1][../../../../../../src/ompi-trunk/ompi/mca/coll/ml/coll_ml_module.c:1496:ml_discover_hierarchy]
COLL-ML Error: (size of mca_bcol_base_components_in_use = 3) != (size of
mca_sbgp_base_components_in_use = 2) or zero.


in my understanding, coll/ml somehow relies on the topology information
(reported by hwloc) so i am wondering whether we should simply
*not* compile coll/ml or set its priority to zero if ompi is configured
with --without-hwloc

any thoughts ?

Cheers,

Gilles


[OMPI devel] about the test_shmem_zero_get.x test from the openshmem test suite

2014-08-26 Thread Gilles Gouaillardet
Folks,

the test_shmem_zero_get.x from the openshmem-release-1.0d test suite is
currently failing.

i looked at the test itself, and compared it to test_shmem_zero_put.x
(that is a success) and
i am very puzzled ...

the test calls several flavors of shmem_*_get where :
- the destination is in the shmem (why not, but this is useless)
- the source is *not* in the shmem
- the number of elements to be transferred is zero

currently, this is failing because the source is *not* in the shmem.

1) is the test itself correct ?
i mean that if we compare it to test_shmem_zero_put.x, i would guess that
destination should be in the local memory and source should be in the shmem.

2) should shmem_*_get even fail ?
i mean there is zero data to be transferred, so why do we even care
whether source is in the shmem or not ?
is the openshmem standard explicit about this case (e.g. zero elements
to be transferred) ?

3) is a failure expected ?
even if i doubt it, this is an option ... and in this case, mtt should
be aware about it and report a success when the test fails

4) the test is a success on v1.8.
the reason is the default configure value is --oshmem-param-check=never
on v1.8 whereas it is --oshmem-param-check=always on trunk
is there any reason for this ?

Cheers,

Gilles


Re: [OMPI devel] OMPI devel] pmix: race condition in dynamic/intercomm_create from the ibm test suite

2014-08-25 Thread Gilles Gouaillardet
Thanks for the explanation

In orte_dt_compare_sig(...) memcmp did not multiply value1->sz by 
sizeof(opal_identifier_t).
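
in other words, the comparison should look roughly like this (abbreviated
sketch only, not the full orte_dt_compare_sig(), using the field names from
that function) :

/* abbreviated sketch -- not the full orte_dt_compare_sig() */
if (value1->sz == value2->sz &&
    0 == memcmp(value1->signature, value2->signature,
                value1->sz * sizeof(opal_identifier_t))) {
    return OPAL_EQUAL;
}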

Being afk, I could not test but that looks like a good suspect

Cheers,

Gilles

Ralph Castain <r...@open-mpi.org> wrote:
>Each collective is given a "signature" that is just the array of names for all 
>procs involved in the collective. Thus, even though task 0 is involved in both 
>of the disconnect barriers, the two collectives should be running in isolation 
>from each other.
>
>The "tags" are just receive callbacks and have no meaning other than to 
>associate a particular callback to a given send/recv pair. It is the signature 
>that counts as the daemons are using that to keep the various collectives 
>separated.
>
>I'll have to take a look at why task 2 is leaving early. The key will be to 
>look at that signature to ensure we aren't getting it confused.
>
>On Aug 25, 2014, at 1:59 AM, Gilles Gouaillardet 
><gilles.gouaillar...@iferc.org> wrote:
>
>> Folks,
>> 
>> when i run
>> mpirun -np 1 ./intercomm_create
>> from the ibm test suite, it either :
>> - success
>> - hangs
>> - mpirun crashes (SIGSEGV) soon after writing the following message
>> ORTE_ERROR_LOG: Not found in file
>> ../../../src/ompi-trunk/orte/orted/pmix/pmix_server.c at line 566
>> 
>> here is what happens :
>> 
>> first, the test program itself :
>> task 0 spawns task 1 : the inter communicator is ab_inter on task 0 and
>> parent on task 1
>> then
>> task 0 spawns task 2 : the inter communicator is ac_inter on task 0 and
>> parent on task 2
>> then
>> several operations (merge, barrier, ...)
>> and then without any synchronization :
>> - task 0 MPI_Comm_disconnect(ab_inter) and then
>> MPI_Comm_disconnect(ac_inter)
>> - task 1 and task 2 MPI_Comm_disconnect(parent)
>> 
>> i applied the attached pmix_debug.patch and ran
>> mpirun -np 1 --mca pmix_base_verbose 90 ./intercomm_create
>> 
>> basically, tasks 0 and 1 execute a native fence and in parallel, tasks 0
>> and 2 execute a native fence.
>> they both use the *same* tags on different though overlapping tasks
>> bottom line, task 2 leaves the fence *before* task 0 has entered the fence
>> (it seems task 1 told task 2 it is ok to leave the fence)
>> 
>> a simple work around is to call MPI_Barrier before calling
>> MPI_Comm_disconnect
>> 
>> at this stage, i doubt it is even possible to get this working at the
>> pmix level, so the fix
>> might be to have MPI_Comm_disconnect invoke MPI_Barrier
>> the attached comm_disconnect.patch always call the barrier before
>> (indirectly) invoking pmix
>> 
>> could you please comment on this issue ?
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> here are the relevant logs :
>> 
>> [soleil:00650] [[8110,3],0] pmix:native executing fence on 2 procs
>> [[8110,1],0] and [[8110,3],0]
>> [soleil:00650] [[8110,3],0]
>> [../../../../../../src/ompi-trunk/opal/mca/pmix/native/pmix_native.c:493] 
>> post
>> send to server
>> [soleil:00650] [[8110,3],0] posting recv on tag 5
>> [soleil:00650] [[8110,3],0] usock:send_nb: already connected to server -
>> queueing for send
>> [soleil:00650] [[8110,3],0] usock:send_handler called to send to server
>> [soleil:00650] [[8110,3],0] usock:send_handler SENDING TO SERVER
>> [soleil:00647] [[8110,2],0] pmix:native executing fence on 2 procs
>> [[8110,1],0] and [[8110,2],0]
>> [soleil:00647] [[8110,2],0]
>> [../../../../../../src/ompi-trunk/opal/mca/pmix/native/pmix_native.c:493] 
>> post
>> send to server
>> [soleil:00647] [[8110,2],0] posting recv on tag 5
>> [soleil:00647] [[8110,2],0] usock:send_nb: already connected to server -
>> queueing for send
>> [soleil:00647] [[8110,2],0] usock:send_handler called to send to server
>> [soleil:00647] [[8110,2],0] usock:send_handler SENDING TO SERVER
>> [soleil:00650] [[8110,3],0] usock:recv:handler called
>> [soleil:00650] [[8110,3],0] usock:recv:handler CONNECTED
>> [soleil:00650] [[8110,3],0] usock:recv:handler allocate new recv msg
>> [soleil:00650] usock:recv:handler read hdr
>> [soleil:00650] [[8110,3],0] usock:recv:handler allocate data region of
>> size 14
>> [soleil:00650] [[8110,3],0] RECVD COMPLETE MESSAGE FROM SERVER OF 14
>> BYTES FOR TAG 5
>> [soleil:00650] [[8110,3],0]
>> [../../../../../../src/ompi-trunk/opal/mca/pmix/native/usock_sendrecv.c:415]
>> post msg
>> [soleil:00650] [[8110,3],0] message received 14 bytes for tag 5
>> [soleil:00650] [[8110,3],0] che

[OMPI devel] pmix: race condition in dynamic/intercomm_create from the ibm test suite

2014-08-25 Thread Gilles Gouaillardet
Folks,

when i run
mpirun -np 1 ./intercomm_create
from the ibm test suite, it either :
- success
- hangs
- mpirun crashes (SIGSEGV) soon after writing the following message
ORTE_ERROR_LOG: Not found in file
../../../src/ompi-trunk/orte/orted/pmix/pmix_server.c at line 566

here is what happens :

first, the test program itself :
task 0 spawns task 1 : the inter communicator is ab_inter on task 0 and
parent on task 1
then
task 0 spawns task 2 : the inter communicator is ac_inter on task 0 and
parent on task 2
then
several operations (merge, barrier, ...)
and then without any synchronization :
- task 0 MPI_Comm_disconnect(ab_inter) and then
MPI_Comm_disconnect(ac_inter)
- task 1 and task 2 MPI_Comm_disconnect(parent)

i applied the attached pmix_debug.patch and ran
mpirun -np 1 --mca pmix_base_verbose 90 ./intercomm_create

basically, tasks 0 and 1 execute a native fence and in parallel, tasks 0
and 2 execute a native fence.
they both use the *same* tags on different though overlapping tasks
bottom line, task 2 leaves the fence *before* task 0 has entered the fence
(it seems task 1 told task 2 it is ok to leave the fence)

a simple workaround is to call MPI_Barrier before calling
MPI_Comm_disconnect
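
something along these lines (sketch only -- the helper name is made up, and
ab_inter / parent are the intercommunicators from the description above) :

#include <mpi.h>

/* workaround sketch : synchronize before disconnecting so the pmix fence
 * underneath cannot be entered by only one side of the pair */
static void sync_then_disconnect(MPI_Comm *intercomm)
{
    MPI_Barrier(*intercomm);          /* both sides agree they are done */
    MPI_Comm_disconnect(intercomm);   /* now the disconnect cannot race */
}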

at this stage, i doubt it is even possible to get this working at the
pmix level, so the fix
might be to have MPI_Comm_disconnect invoke MPI_Barrier
the attached comm_disconnect.patch always call the barrier before
(indirectly) invoking pmix

could you please comment on this issue ?

Cheers,

Gilles

here are the relevant logs :

[soleil:00650] [[8110,3],0] pmix:native executing fence on 2 procs
[[8110,1],0] and [[8110,3],0]
[soleil:00650] [[8110,3],0]
[../../../../../../src/ompi-trunk/opal/mca/pmix/native/pmix_native.c:493] post
send to server
[soleil:00650] [[8110,3],0] posting recv on tag 5
[soleil:00650] [[8110,3],0] usock:send_nb: already connected to server -
queueing for send
[soleil:00650] [[8110,3],0] usock:send_handler called to send to server
[soleil:00650] [[8110,3],0] usock:send_handler SENDING TO SERVER
[soleil:00647] [[8110,2],0] pmix:native executing fence on 2 procs
[[8110,1],0] and [[8110,2],0]
[soleil:00647] [[8110,2],0]
[../../../../../../src/ompi-trunk/opal/mca/pmix/native/pmix_native.c:493] post
send to server
[soleil:00647] [[8110,2],0] posting recv on tag 5
[soleil:00647] [[8110,2],0] usock:send_nb: already connected to server -
queueing for send
[soleil:00647] [[8110,2],0] usock:send_handler called to send to server
[soleil:00647] [[8110,2],0] usock:send_handler SENDING TO SERVER
[soleil:00650] [[8110,3],0] usock:recv:handler called
[soleil:00650] [[8110,3],0] usock:recv:handler CONNECTED
[soleil:00650] [[8110,3],0] usock:recv:handler allocate new recv msg
[soleil:00650] usock:recv:handler read hdr
[soleil:00650] [[8110,3],0] usock:recv:handler allocate data region of
size 14
[soleil:00650] [[8110,3],0] RECVD COMPLETE MESSAGE FROM SERVER OF 14
BYTES FOR TAG 5
[soleil:00650] [[8110,3],0]
[../../../../../../src/ompi-trunk/opal/mca/pmix/native/usock_sendrecv.c:415]
post msg
[soleil:00650] [[8110,3],0] message received 14 bytes for tag 5
[soleil:00650] [[8110,3],0] checking msg on tag 5 for tag 5
[soleil:00650] [[8110,3],0] pmix:native recv callback activated with 14
bytes
[soleil:00650] [[8110,3],0] pmix:native fence released on 2 procs
[[8110,1],0] and [[8110,3],0]


Index: opal/mca/pmix/native/pmix_native.c
===
--- opal/mca/pmix/native/pmix_native.c  (revision 32594)
+++ opal/mca/pmix/native/pmix_native.c  (working copy)
@@ -390,9 +390,17 @@
 size_t i;
 uint32_t np;

-opal_output_verbose(2, opal_pmix_base_framework.framework_output,
+if (2 == nprocs) {
+opal_output_verbose(2, opal_pmix_base_framework.framework_output,
+"%s pmix:native executing fence on %u procs %s and %s",
+OPAL_NAME_PRINT(OPAL_PROC_MY_NAME), (unsigned 
int)nprocs,
+OPAL_NAME_PRINT(procs[0]),
+OPAL_NAME_PRINT(procs[1]));
+} else {
+opal_output_verbose(2, opal_pmix_base_framework.framework_output,
 "%s pmix:native executing fence on %u procs",
 OPAL_NAME_PRINT(OPAL_PROC_MY_NAME), (unsigned 
int)nprocs);
+}

 if (NULL == mca_pmix_native_component.uri) {
 /* no server available, so just return */
@@ -545,9 +553,17 @@

 OBJ_RELEASE(cb);

-opal_output_verbose(2, opal_pmix_base_framework.framework_output,
-"%s pmix:native fence released",
-OPAL_NAME_PRINT(OPAL_PROC_MY_NAME));
+if (2 == nprocs) {
+opal_output_verbose(2, opal_pmix_base_framework.framework_output,
+"%s pmix:native fence released on %u procs %s and %s",
+OPAL_NAME_PRINT(OPAL_PROC_MY_NAME), (unsigned 
int)nprocs,
+OPAL_NAME_PRINT(procs[0]),
+

Re: [OMPI devel] OMPI devel] MPI_Abort does not make mpirun return with the right exit code

2014-08-25 Thread Gilles Gouaillardet
Thanks Ralph !

i confirm my all test cases pass now :-)

FYI, i commited r32592 in order to fix a parsing bug on 32-bit platforms
(hence the mtt failures on trunk on x86)

Cheers,

Gilles


On 2014/08/23 4:59, Ralph Castain wrote:
> I think these are fixed now - at least, your test cases all pass for me
>
>
> On Aug 22, 2014, at 9:12 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> On Aug 22, 2014, at 9:06 AM, Gilles Gouaillardet 
>> <gilles.gouaillar...@gmail.com> wrote:
>>
>>> Ralph,
>>>
>>> Will do on Monday
>>>
>>> About the first test, in my case echo $? returns 0
>> My "showcode" is just an alias for the echo
>>
>>> I noticed this confusing message in your output :
>>> mpirun noticed that process rank 0 with PID 24382 on node bend002 exited on 
>>> signal 0 (Unknown signal 0).
>> I'll take a look at why that happened
>>
>>> About the second test, please note my test program return 3;
>>> whereas your mpi_no_op.c return 0;
>> I didn't see that little cuteness - sigh
>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> Ralph Castain <r...@open-mpi.org> wrote:
>>> You might want to try again with current head of trunk as something seems 
>>> off in what you are seeing - more below
>>>
>>>
>>> On Aug 22, 2014, at 3:12 AM, Gilles Gouaillardet 
>>> <gilles.gouaillar...@iferc.org> wrote:
>>>
>>>> Ralph,
>>>>
>>>> i tried again after the merge and found the same behaviour, though the
>>>> internals are very different.
>>>>
>>>> i run without any batch manager
>>>>
>>>> from node0:
>>>> mpirun -np 1 --mca btl tcp,self -host node1 ./abort
>>>>
>>>> exit with exit code zero :-(
>>> Hmmm...it works fine for me, without your patch:
>>>
>>> 07:35:41  $ mpirun -n 1 -mca btl tcp,self -host bend002 ./abort
>>> Hello, World, I am 0 of 1
>>> --
>>> MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
>>> with errorcode 2.
>>>
>>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>>> You may or may not see output from other processes, depending on
>>> exactly when Open MPI kills them.
>>> --
>>> --
>>> mpirun noticed that process rank 0 with PID 24382 on node bend002 exited on 
>>> signal 0 (Unknown signal 0).
>>> --
>>> 07:35:56  $ showcode
>>> 130
>>>
>>>> short story : i applied pmix.2.patch and that fixed my problem
>>>> could you please review this ?
>>>>
>>>> long story :
>>>> i initially applied pmix.1.patch and it solved my problem
>>>> then i ran
>>>> mpirun -np 1 --mca btl openib,self -host node1 ./abort
>>>> and i came back to square one : exit code is zero
>>>> so i used the debugger and was unable to reproduce the issue
>>>> (one more race condition, yeah !)
>>>> finally, i wrote pmix.2.patch, fixed my issue and realized that
>>>> pmix.1.patch was no more needed.
>>>> currently, and assuming pmix.2.patch is correct, i cannot tell wether
>>>> pmix.1.patch is needed or not
>>>> since this part of the code is no more executed.
>>>>
>>>> i also found one hang with the following trivial program within one node :
>>>>
>>>> int main (int argc, char *argv[]) {
>>>> MPI_Init(&argc, &argv);
>>>>MPI_Finalize();
>>>>return 3;
>>>> }
>>>>
>>>> from node0 :
>>>> $ mpirun -np 1 ./test
>>>> ---
>>>> Primary job  terminated normally, but 1 process returned
>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>>> ---
>>>>
>>>> AND THE PROGRAM HANGS
>>> This also works fine for me:
>>>
>>> 07:37:27  $ mpirun -n 1 ./mpi_no_op
>>> 07:37:36  $ cat mpi_no_op.c
>>> /* -*- C -*-
>>>  *
>>>  * $HEADER$
>>>  *
>>>  * The most basic of MPI applications
>>&

Re: [OMPI devel] OMPI devel] MPI_Abort does not make mpirun return with the right exit code

2014-08-22 Thread Gilles Gouaillardet
Ralph,

Will do on Monday

About the first test, in my case echo $? returns 0
I noticed this confusing message in your output :
mpirun noticed that process rank 0 with PID 24382 on node bend002 exited on 
signal 0 (Unknown signal 0).

About the second test, please note my test program return 3;
whereas your mpi_no_op.c return 0;

Cheers,

Gilles

Ralph Castain <r...@open-mpi.org> wrote:
>You might want to try again with current head of trunk as something seems off 
>in what you are seeing - more below
>
>
>
>On Aug 22, 2014, at 3:12 AM, Gilles Gouaillardet 
><gilles.gouaillar...@iferc.org> wrote:
>
>
>Ralph,
>
>i tried again after the merge and found the same behaviour, though the
>internals are very different.
>
>i run without any batch manager
>
>from node0:
>mpirun -np 1 --mca btl tcp,self -host node1 ./abort
>
>exit with exit code zero :-(
>
>
>Hmmm...it works fine for me, without your patch:
>
>
>07:35:41  $ mpirun -n 1 -mca btl tcp,self -host bend002 ./abort
>
>Hello, World, I am 0 of 1
>
>--
>
>MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD 
>
>with errorcode 2.
>
>
>NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>
>You may or may not see output from other processes, depending on
>
>exactly when Open MPI kills them.
>
>--
>
>--
>
>mpirun noticed that process rank 0 with PID 24382 on node bend002 exited on 
>signal 0 (Unknown signal 0).
>
>--
>
>07:35:56  $ showcode
>
>130
>
>
>
>short story : i applied pmix.2.patch and that fixed my problem
>could you please review this ?
>
>long story :
>i initially applied pmix.1.patch and it solved my problem
>then i ran
>mpirun -np 1 --mca btl openib,self -host node1 ./abort
>and i came back to square one : exit code is zero
>so i used the debugger and was unable to reproduce the issue
>(one more race condition, yeah !)
>finally, i wrote pmix.2.patch, fixed my issue and realized that
>pmix.1.patch was no more needed.
>currently, and assuming pmix.2.patch is correct, i cannot tell wether
>pmix.1.patch is needed or not
>since this part of the code is no more executed.
>
>i also found one hang with the following trivial program within one node :
>
>int main (int argc, char *argv[]) {
>MPI_Init(&argc, &argv);
>   MPI_Finalize();
>   return 3;
>}
>
>from node0 :
>$ mpirun -np 1 ./test
>---
>Primary job  terminated normally, but 1 process returned
>a non-zero exit code.. Per user-direction, the job has been aborted.
>---
>
>AND THE PROGRAM HANGS
>
>
>This also works fine for me:
>
>
>07:37:27  $ mpirun -n 1 ./mpi_no_op
>
>07:37:36  $ cat mpi_no_op.c
>
>/* -*- C -*-
>
> *
>
> * $HEADER$
>
> *
>
> * The most basic of MPI applications
>
> */
>
>
>#include 
>
>#include "mpi.h"
>
>
>int main(int argc, char* argv[])
>
>{
>
>    MPI_Init(&argc, &argv);
>
>
>    MPI_Finalize();
>
>    return 0;
>
>}
>
>
>
>
>*but*
>$ mpirun -np 1 -host node1 ./test
>---
>Primary job  terminated normally, but 1 process returned
>a non-zero exit code.. Per user-direction, the job has been aborted.
>---
>--
>mpirun detected that one or more processes exited with non-zero status,
>thus causing
>the job to be terminated. The first process to do so was:
>
> Process name: [[22080,1],0]
> Exit code:    3
>--
>
>return with exit code 3.
>
>
>Likewise here - works just fine for me
>
>
>
>
>then i found a strange behaviour with helloworld if only the self btl is
>used :
>$ mpirun -np 1 --mca btl self ./hw
>[helios91:23319] OPAL dss:unpack: got type 12 when expecting type 3
>[helios91:23319] [[22303,0],0] ORTE_ERROR_LOG: Pack data mismatch in
>file ../../../src/ompi-trunk/orte/orted/pmix/pmix_server_sendrecv.c at
>line 722
>
>the program returns with exit code zero, but display an error message.
>
>Cheers,
>
>Gilles
>
>On 2014/08/21 6:21, Ralph Castain wrote:
>
>I'm aware of the problem, but it will be fixed when the PMIx bran

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32555 - trunk/opal/mca/btl/scif

2014-08-21 Thread Gilles Gouaillardet
Thanks Ashley !

this is now fixed in r32568

Cheers,

Gilles

On 2014/08/21 19:00, Ashley Pittman wrote:
> One potential other issue, r32555 means that any other struct members are now 
> no longer zeroed, it might be worth putting a memset() or simply assigning a 
> value of {0} to the struct in order to preserve the old behaviour.
>
> Ashley.
>
> On 21 Aug 2014, at 04:31, Gilles Gouaillardet <gilles.gouaillar...@iferc.org> 
> wrote:
>
>> Paul,
>>
>> the piece of code that causes an issue with PGI 2013 and older is just a bit 
>> more complex.
>>
>> here is the enhanced test :
>>
>> struct S { int i; double d; };
>> struct Y { struct S s; } ;
>> struct S x = {1,0};
>>
>> int main (void)
>> {
>>   struct Y y = { .s = x };
>>   return 0;
>> }
>>
>>
>> it compiles just fine with PGI 2014 (14.7) but fails with PGI 2013 (13.9) 
>> and 2012 (12.10)
>> -c9x nor -c99 help with the older compilers :
>>
>> [gouaillardet@soleil tmp]$ /opt/pgi/linux86-64/14.7/bin/pgcc -c test.c
>> [gouaillardet@soleil tmp]$ /opt/pgi/linux86-64/13.9/bin/pgcc -c test.c
>> PGC-S-0094-Illegal type conversion required (test.c: 7)
>> PGC/x86-64 Linux 13.9-0: compilation completed with severe errors
>> [gouaillardet@soleil tmp]$ /opt/pgi/linux86-64/12.10/bin/pgcc -c test.c
>> PGC-S-0094-Illegal type conversion required (test.c: 7)
>> PGC/x86-64 Linux 12.10-0: compilation completed with severe errors
>> [gouaillardet@soleil tmp]$ /opt/pgi/linux86-64/13.9/bin/pgcc -c9x -c test.c
>> PGC-S-0094-Illegal type conversion required (test.c: 7)
>> PGC/x86-64 Linux 13.9-0: compilation completed with severe errors
>> [gouaillardet@soleil tmp]$ /opt/pgi/linux86-64/13.9/bin/pgcc -c99 -c test.c
>> PGC-S-0094-Illegal type conversion required (test.c: 7)
>> PGC/x86-64 Linux 13.9-0: compilation completed with severe errors
>>
>>
>> so unless there is room for interpretation in C99, this is a compiler bug.
>>
>> All,
>>
>> one option is r32555
>> an other option is to detect this in configure and skip the scif btl
>> an other option is not to support PGI compilers 2013 and older
>> and i am out of ideas for other options ...
>>
>> imho, r32555 is the less worst (not to say the best) option here
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/08/21 2:06, Paul Hargrove wrote:
>>> Can somebody confirm that configure is adding "-c9x" or "-c99" to CFLAGS
>>> with this compiler?
>>> If not then r32555 could possibly be reverted in favor of adding the proper
>>> compiler flag.
>>>
>>> Also, I am suspicious of this failure because even without a language-level
>>> option pgcc 12.9 and 13.4 compile the following:
>>>
>>> struct S { int i; double d; };
>>> struct S x = {1,0};
>>> int main (void)
>>> {
>>>   struct S y = { .i = x.i };
>>>   return y.i;
>>> }
>>>
>>>
>>> -Paul
>>>
>>>
>>> On Wed, Aug 20, 2014 at 7:20 AM, Nathan Hjelm 
>>> <hje...@lanl.gov>
>>>  wrote:
>>>
>>>
>>>> Really? That means PGI 2013 is NOT C99 compliant! Figures.
>>>>
>>>> -Nathan
>>>>
>>>> On Tue, Aug 19, 2014 at 10:48:48PM -0400, 
>>>> svn-commit-mai...@open-mpi.org
>>>>
>>>> wrote:
>>>>
>>>>> Author: ggouaillardet (Gilles Gouaillardet)
>>>>> Date: 2014-08-19 22:48:47 EDT (Tue, 19 Aug 2014)
>>>>> New Revision: 32555
>>>>> URL: 
>>>>> https://svn.open-mpi.org/trac/ompi/changeset/32555
>>>>>
>>>>>
>>>>> Log:
>>>>> btl/scif: use safe syntax
>>>>>
>>>>> PGI compilers 2013 and older do not support the following syntax :
>>>>> mca_btl_scif_modex_t modex = {.port_id = mca_btl_scif_module.port_id};
>>>>> so split it on two lines
>>>>>
>>>>> cmr=v1.8.2:reviewer=hjelmn
>>>>>
>>>>> Text files modified:
>>>>>trunk/opal/mca/btl/scif/btl_scif_component.c | 3 ++-
>>>>>1 files changed, 2 insertions(+), 1 deletions(-)
>>>>>
>>>>> Modified: trunk/opal/mca/btl/scif/btl_scif_component.c
>>>>>
>>>>>
>>>> ==
>>>>
>>>>> --- trun

Re: [OMPI devel] [OMPI svn] svn:open-mpi r32555 - trunk/opal/mca/btl/scif

2014-08-21 Thread Gilles Gouaillardet
Paul,

the piece of code that causes an issue with PGI 2013 and older is just a
bit more complex.

here is the enhanced test :

struct S { int i; double d; };
struct Y { struct S s; } ;
struct S x = {1,0};

int main (void)
{
  struct Y y = { .s = x };
  return 0;
}


it compiles just fine with PGI 2014 (14.7) but fails with PGI 2013
(13.9) and 2012 (12.10)
neither -c9x nor -c99 helps with the older compilers :

[gouaillardet@soleil tmp]$ /opt/pgi/linux86-64/14.7/bin/pgcc -c test.c
[gouaillardet@soleil tmp]$ /opt/pgi/linux86-64/13.9/bin/pgcc -c test.c
PGC-S-0094-Illegal type conversion required (test.c: 7)
PGC/x86-64 Linux 13.9-0: compilation completed with severe errors
[gouaillardet@soleil tmp]$ /opt/pgi/linux86-64/12.10/bin/pgcc -c test.c
PGC-S-0094-Illegal type conversion required (test.c: 7)
PGC/x86-64 Linux 12.10-0: compilation completed with severe errors
[gouaillardet@soleil tmp]$ /opt/pgi/linux86-64/13.9/bin/pgcc -c9x -c test.c
PGC-S-0094-Illegal type conversion required (test.c: 7)
PGC/x86-64 Linux 13.9-0: compilation completed with severe errors
[gouaillardet@soleil tmp]$ /opt/pgi/linux86-64/13.9/bin/pgcc -c99 -c test.c
PGC-S-0094-Illegal type conversion required (test.c: 7)
PGC/x86-64 Linux 13.9-0: compilation completed with severe errors


so unless there is room for interpretation in C99, this is a compiler bug.

All,

one option is r32555
another option is to detect this in configure and skip the scif btl
another option is not to support PGI compilers 2013 and older
and i am out of ideas for other options ...

imho, r32555 is the least bad (not to say the best) option here
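
for reference, here is the enhanced test rewritten in the r32555 style
(declare first, then assign member-wise), which should keep the older
compilers happy :

struct S { int i; double d; };
struct Y { struct S s; } ;
struct S x = {1,0};

int main (void)
{
  struct Y y;
  y.s = x;   /* plain assignment instead of the { .s = x } initializer */
  return 0;
}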

Cheers,

Gilles

On 2014/08/21 2:06, Paul Hargrove wrote:
> Can somebody confirm that configure is adding "-c9x" or "-c99" to CFLAGS
> with this compiler?
> If not then r32555 could possibly be reverted in favor of adding the proper
> compiler flag.
>
> Also, I am suspicious of this failure because even without a language-level
> option pgcc 12.9 and 13.4 compile the following:
>
> struct S { int i; double d; };
> struct S x = {1,0};
> int main (void)
> {
>   struct S y = { .i = x.i };
>   return y.i;
> }
>
>
> -Paul
>
>
> On Wed, Aug 20, 2014 at 7:20 AM, Nathan Hjelm <hje...@lanl.gov> wrote:
>
>> Really? That means PGI 2013 is NOT C99 compliant! Figures.
>>
>> -Nathan
>>
>> On Tue, Aug 19, 2014 at 10:48:48PM -0400, svn-commit-mai...@open-mpi.org
>> wrote:
>>> Author: ggouaillardet (Gilles Gouaillardet)
>>> Date: 2014-08-19 22:48:47 EDT (Tue, 19 Aug 2014)
>>> New Revision: 32555
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/32555
>>>
>>> Log:
>>> btl/scif: use safe syntax
>>>
>>> PGI compilers 2013 and older do not support the following syntax :
>>> mca_btl_scif_modex_t modex = {.port_id = mca_btl_scif_module.port_id};
>>> so split it on two lines
>>>
>>> cmr=v1.8.2:reviewer=hjelmn
>>>
>>> Text files modified:
>>>trunk/opal/mca/btl/scif/btl_scif_component.c | 3 ++-
>>>1 files changed, 2 insertions(+), 1 deletions(-)
>>>
>>> Modified: trunk/opal/mca/btl/scif/btl_scif_component.c
>>>
>> ==
>>> --- trunk/opal/mca/btl/scif/btl_scif_component.c  Tue Aug 19
>> 18:34:49 2014(r32554)
>>> +++ trunk/opal/mca/btl/scif/btl_scif_component.c  2014-08-19
>> 22:48:47 EDT (Tue, 19 Aug 2014)  (r32555)
>>> @@ -208,7 +208,8 @@
>>>
>>>  static int mca_btl_scif_modex_send (void)
>>>  {
>>> -mca_btl_scif_modex_t modex = {.port_id =
>> mca_btl_scif_module.port_id};
>>> +mca_btl_scif_modex_t modex;
>>> +modex.port_id = mca_btl_scif_module.port_id;
>>>
>>>  return opal_modex_send (&mca_btl_scif_component.super.btl_version,
>> &modex, sizeof (modex));
>>>  }
>>> ___
>>> svn mailing list
>>> s...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/08/15667.php
>>
>
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15669.php



[OMPI devel] MPI_Abort does not make mpirun return with the right exit code

2014-08-20 Thread Gilles Gouaillardet
Folks,

let's look at the following trivial test program :

#include <stdio.h>
#include <mpi.h>

int main (int argc, char * argv[]) {
int rank, size;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
printf ("I am %d/%d and i abort\n", rank, size);
MPI_Abort(MPI_COMM_WORLD, 2);
printf ("%d/%d aborted !\n", rank, size);
return 3;
}

and let's run mpirun (trunk) on node0 and ask the mpi task to run on
task 1 :
with two tasks or more :

node0 $ mpirun --mca btl tcp,self -host node1 -np 2 ./abort
--
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 2.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--
I am 1/2 and i abort
I am 0/2 and i abort
[node0:00740] 1 more process has sent help message help-mpi-api.txt /
mpi-abort
[node0:00740] Set MCA parameter "orte_base_help_aggregate" to 0 to see
all help / error messages

node0 $ echo $?
0

the exit status of mpirun is zero
/* this is why the MPI_Errhandler_fatal_c test fails in mtt */

now if we run only one task :

node0 $ mpirun --mca btl tcp,self -host node1 -np 1 ./abort
I am 0/1 and i abort
--
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 2.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--
--
mpirun has exited due to process rank 0 with PID 15884 on
node node1 exiting improperly. There are three reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

You can avoid this message by specifying -quiet on the mpirun command line.

--
node0 $ echo $?
1

the program displayed a misleading error message and mpirun exited with
error code 1
/* i would have expected 2, or 3 in the worst case scenario */


i dug into it a bit and found a kind of race condition in orted (running
on node 1).
basically, when the process dies, it writes stuff in the openmpi session
directory and exits.
exiting sends a SIGCHLD to orted and closes the socket/pipe connected to
orted.
on orted, the loss of connection is generally processed before the
SIGCHLD by libevent,
and as a consequence, the exit code is not correctly set (e.g. it is
left at zero).
i did not see any kind of communication between the mpi task and orted
(except writing a file in the openmpi session directory) as i would have
expected
/* but this was just my initial guess, the truth is i do not know what
is supposed to happen */
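
for reference, the exit status orted needs to report is exactly what
waitpid() gives for the dead child; a standalone illustration (this is not
the attached patch, nor the actual orted code) :

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main (void)
{
    pid_t pid;
    int status;

    pid = fork();
    if (0 == pid) {
        exit(2);                        /* the child mimics MPI_Abort(..., 2) */
    }
    waitpid(pid, &status, 0);           /* the information orted has to report upward */
    if (WIFEXITED(status)) {
        printf("child exited with code %d\n", WEXITSTATUS(status));
    } else if (WIFSIGNALED(status)) {
        printf("child was killed by signal %d\n", WTERMSIG(status));
    }
    return 0;
}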

i wrote the attached abort.patch to basically get it working.
i highly suspect this is not the right thing to do so i did not commit it.

it works fine with two tasks or more.
with only one task, mpirun displays a misleading error message but the
exit status is ok.

could someone (Ralph ?) have a look at this ?

Cheers,

Gilles


node0 $ mpirun --mca btl tcp,self -host node1 -np 2 ./abort
I am 1/2 and i abort
I am 0/2 and i abort
--
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 2.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--
[node0:00920] 1 more process has sent help message help-mpi-api.txt /
mpi-abort
[node0:00920] Set MCA parameter "orte_base_help_aggregate" to 0 to see
all help / error messages
node0 $ echo $?
2



node0 $ mpirun --mca btl tcp,self -host node1 -np 1 ./abort
I am 0/1 and i abort

Re: [OMPI devel] OMPI devel] [1.8.2rc4] OSHMEM fortran bindings with bad compilers

2014-08-19 Thread Gilles Gouaillardet
r32551 now detects this limitation and automatically disables the oshmem profile. I
am now revamping the patch for v1.8.

Gilles

Gilles Gouaillardet <gilles.gouaillar...@iferc.org> wrote:
>In the case of PGI compilers prior to 13, a workaround is to configure with 
>--disable-oshmem-profile
>
>On 2014/08/18 16:21, Gilles Gouaillardet wrote:
>
>Josh, Paul,
>
>the problem with old PGI compilers comes from the preprocessor (!)
>
>with pgi 12.10 :
>oshmem/shmem/fortran/start_pes_f.c
>SHMEM_GENERATE_WEAK_BINDINGS(START_PES, start_pes)
>
>gets expanded as
>
>#pragma weak START_PES = PSTART_PES SHMEM_GENERATE_WEAK_PRAGMA ( weak
>start_pes_ = pstart_pes_ )
>
>whereas with pgi 14.7, it gets expanded as
>
>#pragma weak START_PES = PSTART_PES
>#pragma weak start_pes_ = pstart_pes_
>#pragma weak start_pes__ = pstart_pes__
>
>from oshmem/shmem/fortran/profile/pbindings.h :
>#define SHMEM_GENERATE_WEAK_PRAGMA(x) _Pragma(#x)
>
>#define SHMEM_GENERATE_WEAK_BINDINGS(UPPER_NAME, lower_name) \
>SHMEM_GENERATE_WEAK_PRAGMA(weak UPPER_NAME = P ## UPPER_NAME) \
>SHMEM_GENERATE_WEAK_PRAGMA(weak lower_name ## _ = p ## lower_name ## _) \
>SHMEM_GENERATE_WEAK_PRAGMA(weak lower_name ## __ = p ## lower_name ## __)
>
>a workaround is to manually expand the SHMEM_GENERATE_WEAK_BINDINGS macro
>and replace
>
>SHMEM_GENERATE_WEAK_BINDINGS(START_PES, start_pes)
>
>with
>
>SHMEM_GENERATE_WEAK_PRAGMA(weak START_PES = PSTART_PES)
>SHMEM_GENERATE_WEAK_PRAGMA(weak start_pes_ = pstart_pes_)
>SHMEM_GENERATE_WEAK_PRAGMA(weak start_pes__ = pstart_pes__)
>
>/* i was unable to get something that works by simply replacing the
>definition of the SHMEM_GENERATE_WEAK_BINDINGS macro */
>
>of course, this would have to be repeated in all the source files ...
>
>Cheers,
>
>Gilles
>
>On 2014/08/15 3:44, Paul Hargrove wrote:
>
>Josh, The specific compilers that caused the most problems are the older PGI 
>compilers (any before 13.x). In this case the user has the option to update 
>their compiler to 13.10 or newer. There are also issues with IBM's xlf. For 
>the IBM compiler I have never found a version that builds/links the MPI f08 
>bindings, and now also find no version that can link the OSHMEM fortran 
>bindings. -Paul -Paul On Thu, Aug 14, 2014 at 11:30 AM, Joshua Ladd 
><jladd.m...@gmail.com> wrote: 
>
>We will update the README accordingly. Thank you, Paul. Josh On Thu, Aug 14, 
>2014 at 10:00 AM, Jeff Squyres (jsquyres) < jsquy...@cisco.com> wrote: 
>
>Good points. Mellanox -- can you update per Paul's suggestions? On Aug 13, 
>2014, at 8:26 PM, Paul Hargrove <phhargr...@lbl.gov> wrote: 
>
>The following is NOT a bug report. This is just an observation that may 
>deserve some text in the README. I've reported issues in the past with some 
>Fortran compilers (mostly 
>
>older XLC and PGI) which either cannot build the "use mpi_f08" module, or 
>cannot correctly link to it (and sometimes this fails only if configured with 
>--enable-debug). 
>
>Testing the OSHMEM Fortran bindings (enabled by default on Linux) I 
>
>have found several compilers which fail to link the examples (hello_oshmemfh 
>and ring_oshmemfh). I reported one specific instance (with xlc-11/xlf-13) back 
>in February: http://www.open-mpi.org/community/lists/devel/2014/02/14057.php 
>
>So far I have these failures only on platforms where the Fortran 
>
>compiler is *known* to be broken for the MPI f90 and/or f08 bindings. 
>Specifically, all the failing platforms are ones on which either: 
>
>+ Configure determines (without my help) that FC cannot build the F90 
>
>and/or F08 modules. 
>
>OR + I must pass --enable-mpi-fortran=usempi or --enable-mpi-fortran=mpifh 
>
>for cases configure cannot detect. 
>
>So, I do *not* believe there is anything wrong with the OSHMEM code, 
>
>which is why I started this post with "The following is NOT a bug report". 
>However, I have two recommendations to make: 
>
>1) Documentation The README says just: --disable-oshmem-fortran Disable 
>building only the Fortran OSHMEM bindings. So, I recommend adding a sentence 
>there referencing the "Compiler 
>
>Notes" section of the README which has details on some known bad Fortran 
>compilers. 
>
>2) Configure: As I noted above, at least some of the failures are on platforms 
>where 
>
>configure has determined it cannot build the f08 MPI bindings. So, maybe there 
>is something that could be done at configure time to disqualify some Fortran 
>compilers from building the OSHMEM fotran bindings, too. 
>
>-Paul -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group 
>Computer and Data Sciences Department Tel: +1-510-495-2352 Lawrence Berkeley 
>National Laboratory Fax: +1-510-486-6900 
>___

Re: [OMPI devel] [1.8.2rc4] OSHMEM fortran bindings with bad compilers

2014-08-18 Thread Gilles Gouaillardet
In the case of PGI compilers prior to 13, a workaround is to configure
with --disable-oshmem-profile

On 2014/08/18 16:21, Gilles Gouaillardet wrote:
> Josh, Paul,
>
> the problem with old PGI compilers comes from the preprocessor (!)
>
> with pgi 12.10 :
> oshmem/shmem/fortran/start_pes_f.c
> SHMEM_GENERATE_WEAK_BINDINGS(START_PES, start_pes)
>
> gets expanded as
>
> #pragma weak START_PES = PSTART_PES SHMEM_GENERATE_WEAK_PRAGMA ( weak
> start_pes_ = pstart_pes_ )
>
> whereas with pgi 14.7, it gets expanded as
>
> #pragma weak START_PES = PSTART_PES
> #pragma weak start_pes_ = pstart_pes_
> #pragma weak start_pes__ = pstart_pes__
>
> from oshmem/shmem/fortran/profile/pbindings.h :
> #define SHMEM_GENERATE_WEAK_PRAGMA(x) _Pragma(#x)
>
> #define SHMEM_GENERATE_WEAK_BINDINGS(UPPER_NAME,
> lower_name) \
> SHMEM_GENERATE_WEAK_PRAGMA(weak UPPER_NAME = P ##
> UPPER_NAME)\
> SHMEM_GENERATE_WEAK_PRAGMA(weak lower_name ## _ = p ## lower_name ##
> _)  \
> SHMEM_GENERATE_WEAK_PRAGMA(weak lower_name ## __ = p ## lower_name
> ## __)
>
> a workaround is to manually expand the SHMEM_GENERATE_WEAK_BINDINGS
> macro and replace
>
> SHMEM_GENERATE_WEAK_BINDINGS(START_PES, start_pes)
>
> with
>
> SHMEM_GENERATE_WEAK_PRAGMA(weak START_PES = PSTART_PES)
> SHMEM_GENERATE_WEAK_PRAGMA(weak start_pes_ = pstart_pes_)
> SHMEM_GENERATE_WEAK_PRAGMA(weak start_pes__ = pstart_pes__)
>
> /* i was unable to get something that works by simply replacing the
> definition of the SHMEM_GENERATE_WEAK_BINDINGS macro */
>
> of course, this would have to be repeated in all the source files ...
>
>
> Cheers,
>
> Gilles
>
> On 2014/08/15 3:44, Paul Hargrove wrote:
>> Josh,
>>
>> The specific compilers that caused the most problems are the older PGI
>> compilers (any before 13.x).
>> In this case the user has the option to update their compiler to 13.10 or
>> newer.
>>
>> There are also issues with IBM's xlf.
>> For the IBM compiler I have never found a version that builds/links the MPI
>> f08 bindings, and now also find no version that can link the OSHMEM fortran
>> bindings.
>>
>> -Paul
>>
>> -Paul
>>
>>
>> On Thu, Aug 14, 2014 at 11:30 AM, Joshua Ladd <jladd.m...@gmail.com> wrote:
>>
>>> We will update the README accordingly. Thank you, Paul.
>>>
>>> Josh
>>>
>>>
>>> On Thu, Aug 14, 2014 at 10:00 AM, Jeff Squyres (jsquyres) <
>>> jsquy...@cisco.com> wrote:
>>>
>>>> Good points.
>>>>
>>>> Mellanox -- can you update per Paul's suggestions?
>>>>
>>>>
>>>> On Aug 13, 2014, at 8:26 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>>
>>>>> The following is NOT a bug report.
>>>>> This is just an observation that may deserve some text in the README.
>>>>>
>>>>> I've reported issues in the past with some Fortran compilers (mostly
>>>> older XLC and PGI) which either cannot build the "use mpi_f08" module, or
>>>> cannot correctly link to it (and sometimes this fails only if configured
>>>> with --enable-debug).
>>>>> Testing the OSHMEM Fortran bindings (enabled by default on Linux) I
>>>> have found several compilers which fail to link the examples
>>>> (hello_oshmemfh and ring_oshmemfh).  I reported one specific instance (with
>>>> xlc-11/xlf-13) back in February:
>>>> http://www.open-mpi.org/community/lists/devel/2014/02/14057.php
>>>>> So far I have these failures only on platforms where the Fortran
>>>> compiler is *known* to be broken for the MPI f90 and/or f08 bindings.
>>>> Specifically, all the failing platforms are ones on which either:
>>>>> + Configure determines (without my help) that FC cannot build the F90
>>>> and/or F08 modules.
>>>>> OR
>>>>> + I must pass --enable-mpi-fortran=usempi or --enable-mpi-fortran=mpifh
>>>> for cases configure cannot detect.
>>>>> So, I do *not* believe there is anything wrong with the OSHMEM code,
>>>> which is why I started this post with "The following is NOT a bug report".
>>>> However, I have two recommendations to make:
>>>>> 1) Documentation
>>>>>
>>>>> The README says just:
>>>>>
>>>>> --disable-oshmem-fortran
>>>>>   Disable building only the Fortran OSHMEM bindings.
>>>>>
>>>>> So, I reco

Re: [OMPI devel] [1.8.2rc4] OSHMEM fortran bindings with bad compilers

2014-08-18 Thread Gilles Gouaillardet
Josh, Paul,

the problem with old PGI compilers comes from the preprocessor (!)

with pgi 12.10 :
oshmem/shmem/fortran/start_pes_f.c
SHMEM_GENERATE_WEAK_BINDINGS(START_PES, start_pes)

gets expanded as

#pragma weak START_PES = PSTART_PES SHMEM_GENERATE_WEAK_PRAGMA ( weak
start_pes_ = pstart_pes_ )

whereas with pgi 14.7, it gets expanded as

#pragma weak START_PES = PSTART_PES
#pragma weak start_pes_ = pstart_pes_
#pragma weak start_pes__ = pstart_pes__

from oshmem/shmem/fortran/profile/pbindings.h :
#define SHMEM_GENERATE_WEAK_PRAGMA(x) _Pragma(#x)

#define SHMEM_GENERATE_WEAK_BINDINGS(UPPER_NAME,
lower_name) \
SHMEM_GENERATE_WEAK_PRAGMA(weak UPPER_NAME = P ##
UPPER_NAME)\
SHMEM_GENERATE_WEAK_PRAGMA(weak lower_name ## _ = p ## lower_name ##
_)  \
SHMEM_GENERATE_WEAK_PRAGMA(weak lower_name ## __ = p ## lower_name
## __)

a workaround is to manually expand the SHMEM_GENERATE_WEAK_BINDINGS
macro and replace

SHMEM_GENERATE_WEAK_BINDINGS(START_PES, start_pes)

with

SHMEM_GENERATE_WEAK_PRAGMA(weak START_PES = PSTART_PES)
SHMEM_GENERATE_WEAK_PRAGMA(weak start_pes_ = pstart_pes_)
SHMEM_GENERATE_WEAK_PRAGMA(weak start_pes__ = pstart_pes__)

/* i was unable to get something that works by simply replacing the
definition of the SHMEM_GENERATE_WEAK_BINDINGS macro */

of course, this would have to be repeated in all the source files ...
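
for what it is worth, here is a self-contained file that can be used to check
a given pgcc : run it through the preprocessor (pgcc -E) and look at how the
_Pragma lines come out (it simply mirrors the pbindings.h macros above, the
symbols are made up for the test) :

#define SHMEM_GENERATE_WEAK_PRAGMA(x) _Pragma(#x)

#define SHMEM_GENERATE_WEAK_BINDINGS(UPPER_NAME, lower_name)                \
SHMEM_GENERATE_WEAK_PRAGMA(weak UPPER_NAME = P ## UPPER_NAME)               \
SHMEM_GENERATE_WEAK_PRAGMA(weak lower_name ## _ = p ## lower_name ## _)     \
SHMEM_GENERATE_WEAK_PRAGMA(weak lower_name ## __ = p ## lower_name ## __)

void PSTART_PES(void) {}
void pstart_pes_(void) {}
void pstart_pes__(void) {}

SHMEM_GENERATE_WEAK_BINDINGS(START_PES, start_pes)

int main (void) { return 0; }

a working preprocessor should emit three distinct #pragma weak lines, as in
the 14.7 expansion above.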


Cheers,

Gilles

On 2014/08/15 3:44, Paul Hargrove wrote:
> Josh,
>
> The specific compilers that caused the most problems are the older PGI
> compilers (any before 13.x).
> In this case the user has the option to update their compiler to 13.10 or
> newer.
>
> There are also issues with IBM's xlf.
> For the IBM compiler I have never found a version that builds/links the MPI
> f08 bindings, and now also find no version that can link the OSHMEM fortran
> bindings.
>
> -Paul
>
> -Paul
>
>
> On Thu, Aug 14, 2014 at 11:30 AM, Joshua Ladd  wrote:
>
>> We will update the README accordingly. Thank you, Paul.
>>
>> Josh
>>
>>
>> On Thu, Aug 14, 2014 at 10:00 AM, Jeff Squyres (jsquyres) <
>> jsquy...@cisco.com> wrote:
>>
>>> Good points.
>>>
>>> Mellanox -- can you update per Paul's suggestions?
>>>
>>>
>>> On Aug 13, 2014, at 8:26 PM, Paul Hargrove  wrote:
>>>
 The following is NOT a bug report.
 This is just an observation that may deserve some text in the README.

 I've reported issues in the past with some Fortran compilers (mostly
>>> older XLC and PGI) which either cannot build the "use mpi_f08" module, or
>>> cannot correctly link to it (and sometimes this fails only if configured
>>> with --enable-debug).
 Testing the OSHMEM Fortran bindings (enabled by default on Linux) I
>>> have found several compilers which fail to link the examples
>>> (hello_oshmemfh and ring_oshmemfh).  I reported one specific instance (with
>>> xlc-11/xlf-13) back in February:
>>> http://www.open-mpi.org/community/lists/devel/2014/02/14057.php
 So far I have these failures only on platforms where the Fortran
>>> compiler is *known* to be broken for the MPI f90 and/or f08 bindings.
>>> Specifically, all the failing platforms are ones on which either:
 + Configure determines (without my help) that FC cannot build the F90
>>> and/or F08 modules.
 OR
 + I must pass --enable-mpi-fortran=usempi or --enable-mpi-fortran=mpifh
>>> for cases configure cannot detect.
 So, I do *not* believe there is anything wrong with the OSHMEM code,
>>> which is why I started this post with "The following is NOT a bug report".
>>> However, I have two recommendations to make:
 1) Documentation

 The README says just:

 --disable-oshmem-fortran
   Disable building only the Fortran OSHMEM bindings.

 So, I recommend adding a sentence there referencing the "Compiler
>>> Notes" section of the README which has details on some known bad Fortran
>>> compilers.
 2) Configure:

 As I noted above, at least some of the failures are on platforms where
>>> configure has determined it cannot build the f08 MPI bindings.  So, maybe
>>> there is something that could be done at configure time to disqualify some
>>> Fortran compilers from building the OSHMEM fotran bindings, too.
 -Paul

 --
 Paul H. Hargrove  phhargr...@lbl.gov
 Future Technologies Group
 Computer and Data Sciences Department Tel: +1-510-495-2352
 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
 ___
 devel mailing list
 de...@open-mpi.org
 Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
 Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/08/15643.php
>>>
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>> 

Re: [OMPI devel] [OMPI users] OpenMPI fails with np > 65

2014-08-13 Thread Gilles Gouaillardet
Lenny,

that looks related to #4857 which has been fixed in trunk since r32517

could you please update your openmpi library and try again ?

Gilles

On 2014/08/13 17:00, Lenny Verkhovsky wrote:
> Following Jeff's suggestion adding devel mailing list.
>
> Hi All,
> I am currently facing strange situation that I can't run OMPI on more than 65 
> nodes.
> It seems like environmental issue that does not allow me to open more 
> connections.
> Any ideas ?
> Log attached, more info below in the mail.
>
> Running OMPI from trunk
> [node-119.ssauniversal.ssa.kodiak.nx:02996] [[56978,0],65] ORTE_ERROR_LOG: 
> Error in file base/ess_base_std_orted.c at line 288
>
> Thanks.
> Lenny Verkhovsky
> SW Engineer,  Mellanox Technologies
> www.mellanox.com
>
> Office:+972 74 712 9244
> Mobile:  +972 54 554 0233
> Fax:+972 72 257 9400
>
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lenny Verkhovsky
> Sent: Tuesday, August 12, 2014 1:13 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] OpenMPI fails with np > 65
>
>
> Hi,
>
> Config:
> ./configure --enable-openib-rdmacm-ibaddr --prefix /home/sources/ompi-bin 
> --enable-mpirun-prefix-by-default --with-openib=/usr/local --enable-debug 
> --disable-openib-connectx-xrc
>
> Run:
> /home/sources/ompi-bin/bin/mpirun -np 65 --host 
> ko0067,ko0069,ko0070,ko0074,ko0076,ko0079,ko0080,ko0082,ko0085,ko0087,ko0088,ko0090,ko0096,ko0098,ko0099,ko0101,ko0103,ko0107,ko0111,ko0114,ko0116,ko0125,ko0128,ko0134,ko0141,ko0144,ko0145,ko0148,ko0149,ko0150,ko0152,ko0154,ko0156,ko0157,ko0158,ko0162,ko0164,ko0166,ko0168,ko0170,ko0174,ko0178,ko0181,ko0185,ko0190,ko0192,ko0195,ko0197,ko0200,ko0203,ko0205,ko0207,ko0209,ko0210,ko0211,ko0213,ko0214,ko0217,ko0218,ko0223,ko0228,ko0229,ko0231,ko0235,ko0237
>  --mca btl openib,self  --mca btl_openib_cpc_include rdmacm --mca pml ob1 
> --mca btl_openib_if_include mthca0:1 --mca plm_base_verbose 5 --debug-daemons 
> hostname 2>&1|tee > /tmp/mpi.log
>
> Environment:
>  According to the attached log it's rsh environment
>
>
> Output attached
>
> Notes:
> The problem is always with tha last node, 64 connections work, 65 connections 
> fail.
> node-119.ssauniversal.ssa.kodiak.nx == ko0237
>
> mpi.log line 1034:
> --
> An invalid value was supplied for an enum variable.
>   Variable : orte_debug_daemons
>   Value: 1,1
>   Valid values : 0: f|false|disabled, 1: t|true|enabled
> --
>
> mpi.log line 1059:
> [node-119.ssauniversal.ssa.kodiak.nx:02996] [[56978,0],65] ORTE_ERROR_LOG: 
> Error in file base/ess_base_std_orted.c at line 288
>
>
>
> Lenny Verkhovsky
> SW Engineer,  Mellanox Technologies
> www.mellanox.com
>
> Office:+972 74 712 9244
> Mobile:  +972 54 554 0233
> Fax:+972 72 257 9400
>
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Monday, August 11, 2014 4:53 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] OpenMPI fails with np > 65
>
> Okay, let's start with the basics :-)
>
> How was this configured? What environment are you running in (rsh, slurm, 
> ??)? If you configured --enable-debug, then please run it with
>
> --mca plm_base_verbose 5 --debug-daemons
>
> and send the output
>
>
> On Aug 11, 2014, at 12:07 AM, Lenny Verkhovsky 
> > wrote:
>
> I don't think so,
> It's always the 66th node, even if I swap between 65th and 66th
> I also get the same error when setting np=66, while having only 65 hosts in 
> hostfile
> (I am using only tcp btl )
>
>
> Lenny Verkhovsky
> SW Engineer,  Mellanox Technologies
> www.mellanox.com
>
> Office:+972 74 712 9244
> Mobile:  +972 54 554 0233
> Fax:+972 72 257 9400
>
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Monday, August 11, 2014 1:07 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] OpenMPI fails with np > 65
>
> Looks to me like your 65th host is missing the dstore library - is it 
> possible you don't have your paths set correctly on all hosts in your 
> hostfile?
>
>
> On Aug 10, 2014, at 1:13 PM, Lenny Verkhovsky 
> > wrote:
>
>
> Hi all,
>
> Trying to run OpenMPI ( trunk Revision: 32428 ) I faced the problem running 
> OMPI with more than 65 procs.
> It looks like MPI failes to open 66th connection even with running `hostname` 
> over tcp.
> It also seems to unrelated to specific host.
> All hosts are Ubuntu 12.04.1 LTS
>
> mpirun -np 66 --hostfile /proj/SSA/Mellanox/tmp//20140810_070156_hostfile.txt 
> --mca btl tcp,self hostname
> [nodename] [[4452,0],65] ORTE_ERROR_LOG: Error in file 
> base/ess_base_std_orted.c at line 288
>
> ...
> It looks like environment issue, but I can't find any limit related.
> Any 

Re: [OMPI devel] Grammar error in git master: 'You job will now abort'

2014-08-13 Thread Gilles Gouaillardet
Thanks Christopher,

this has been fixed in the trunk with r32520

Cheers,

Gilles

On 2014/08/13 14:49, Christopher Samuel wrote:
> Hi all,
>
> We spotted this in 1.6.5 and git grep shows it's fixed in the
> v1.8 branch but in master it's still there:
>
> samuel@haswell:~/Code/OMPI/ompi-svn-mirror$ git grep -n 'You job will now 
> abort'
> orte/tools/orterun/help-orterun.txt:679:You job will now abort.
> samuel@haswell:~/Code/OMPI/ompi-svn-mirror$ 
>
> I'm using https://github.com/open-mpi/ompi-svn-mirror.git so
> let me know if I should be using something else now.
>
> cheers,
> Chris



[OMPI devel] trunk hang when nodes have similar but private network

2014-08-13 Thread Gilles Gouaillardet
Folks,

i noticed mpirun (trunk) hangs when running any mpi program on two nodes
*and* each node has a private network with the same ip
(in my case, each node has a private network to a MIC)

in order to reproduce the problem, you can simply run (as root) on the
two compute nodes
brctl addbr br0
ifconfig br0 192.168.255.1 netmask 255.255.255.0

mpirun will hang

a workaround is to add --mca btl_tcp_if_include eth0
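for example (host names and binary below are just placeholders) :

mpirun --mca btl_tcp_if_include eth0 -host node0,node1 -np 2 ./hw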

v1.8 does not hang in this case

Cheers,

Gilles


Re: [OMPI devel] errors and warnings with show_help() usage

2014-08-11 Thread Gilles Gouaillardet
Jeff and all,

i fixed the trivial errors in the trunk; there are now 11 non-trivial
errors left.
(commits r32490 to r32497)

i ran the script vs the v1.8 branch and found 54 errors
(first, you need to
touch Makefile.ompi-rules
in the top-level Open MPI directory in order to make the script happy)
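
in other words, from the top of the v1.8 checkout :

$ touch Makefile.ompi-rules
$ ./contrib/check-help-strings.pl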

Cheers,

Gilles

On 2014/08/08 22:43, Jeff Squyres (jsquyres) wrote:
> SHORT VERSION
> =
>
> The ./contrib/check-help-strings.pl script is showing ***47 coding errors*** 
> with regards to using show_help() in components.  Here's a summary of the 
> offenders:
>
> - ORTE (lumped together because there's a single maintainer :-) )
> - smcuda and cuda
> - common/verbs
> - bcol
> - mxm
> - openib
> - oshmem
>
> Could the owners of these portions of the code base please run 
> ./contrib/check-help-strings.pl and fix the ERRORs that are shown?
>
> Thanks!
>
> MORE DETAIL
> ===
>
> The first part of ./contrib/check-help-strings.pl's output shows ERRORs -- 
> referring to help files that do not exist, or referring to help topics that 
> do not exist.
>
> I'm only calling out the ERRORs in this mail -- but the second part of the 
> output shows a bazillion WARNINGs, too.  These are help topics that are 
> probably unused -- they don't seem to be referenced by the code anywhere.  
>
> It would be good to clean up all the WARNINGs, too, but the ERRORs are more 
> worrisome.
>



Re: [OMPI devel] ibm abort test hangs on one node

2014-08-11 Thread Gilles Gouaillardet
Thanks Ralph !

this was necessary but not sufficient :

orte_errmgr_base_abort calls orte_session_dir_finalize at
errmgr_base_fns.c:219
that will remove the proc session dir
then, orte_errmgr_base_abort (indirectly) calls orte_ess_base_app_abort
at line 227

first, the proc session dir is removed
then the "aborted" empty file is created in the previously removed directory
(and there is no error check, so the failure goes unnoticed)
as a consequence, the code you added in r32460 does not get executed.

i committed r32498 to fix this.
it simply does not call orte_session_dir_finalize in the first place
(which is sufficient but might not be necessary ...)

Cheers,

Gilles

On 2014/08/09 1:27, Ralph Castain wrote:
> Committed a fix for this in r32460 - see if I got it!
>
> On Aug 8, 2014, at 4:02 AM, Gilles Gouaillardet 
> <gilles.gouaillar...@iferc.org> wrote:
>
>> Folks,
>>
>> here is the description of a hang i briefly mentionned a few days ago.
>>
>> with the trunk (i did not check 1.8 ...) simply run on one node :
>> mpirun -np 2 --mca btl sm,self ./abort
>>
>> (the abort test is taken from the ibm test suite : process 0 call
>> MPI_Abort while process 1 enters an infinite loop)
>>
>> there is a race condition : sometimes it hangs, sometimes it aborts
>> nicely as expected.
>> when the hang occurs, both abort processes have exited and mpirun waits
>> forever
>>
>> i made some investigations and i have now a better idea of what happens
>> (but i am still clueless on how to fix this)
>>
>> when process 0 abort, it :
>> - closes the tcp socket connected to mpirun
>> - closes the pipe connected to mpirun
>> - send SIGCHLD to mpirun
>>
>> then on mpirun :
>> when SIGCHLD is received, the handler basically writes 17 (the signal
>> number) to a socketpair.
>> then libevent will return from a poll and here is the race condition,
>> basically :
>> if revents is non zero for the three fds (socket, pipe and socketpair)
>> then the program will abort nicely
>> if revents is non zero for both socket and pipe but is zero for the
>> socketpair, then the mpirun will hang
>>
>> i digged a bit deeper and found that when the event on the socketpair is
>> processed, it will end up calling
>> odls_base_default_wait_local_proc.
>> if proc->state is 5 (aka ORTE_PROC_STATE_REGISTERED), then the program
>> will abort nicely
>> *but* if proc->state is 6 (aka ORTE_PROC_STATE_IOF_COMPLETE), then the
>> program will hang
>>
>> an other way to put this is that
>> when the program aborts nicely, the call sequence is
>> odls_base_default_wait_local_proc
>> proc_errors(vpid=0)
>> proc_errors(vpid=0)
>> proc_errors(vpid=1)
>> proc_errors(vpid=1)
>>
>> when the program hangs, the call sequence is
>> proc_errors(vpid=0)
>> odls_base_default_wait_local_proc
>> proc_errors(vpid=0)
>> proc_errors(vpid=1)
>> proc_errors(vpid=1)
>>
>> i will resume this on Monday unless someone can fix this in the mean
>> time :-)
>>
>> Cheers,
>>
>> Gilles
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/08/15552.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15560.php



[OMPI devel] ibm abort test hangs on one node

2014-08-08 Thread Gilles Gouaillardet
Folks,

here is the description of a hang i briefly mentioned a few days ago.

with the trunk (i did not check 1.8 ...) simply run on one node :
mpirun -np 2 --mca btl sm,self ./abort

(the abort test is taken from the ibm test suite : process 0 calls
MPI_Abort while process 1 enters an infinite loop)

there is a race condition : sometimes it hangs, sometimes it aborts
nicely as expected.
when the hang occurs, both abort processes have exited and mpirun waits
forever

i made some investigations and i have now a better idea of what happens
(but i am still clueless on how to fix this)

when process 0 aborts, it :
- closes the tcp socket connected to mpirun
- closes the pipe connected to mpirun
- sends SIGCHLD to mpirun

then on mpirun :
when SIGCHLD is received, the handler basically writes 17 (the signal
number) to a socketpair.
then libevent will return from a poll and here is the race condition,
basically :
if revents is non zero for the three fds (socket, pipe and socketpair)
then the program will abort nicely
if revents is non zero for both socket and pipe but is zero for the
socketpair, then the mpirun will hang
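
for reference, this socketpair mechanism is the classic self-pipe trick; a
minimal standalone illustration (not the actual libevent/orted code) looks
like this :

#include <poll.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

static int sigpair[2];

static void chld_handler(int signo)
{
    unsigned char c = (unsigned char)signo;   /* 17 for SIGCHLD on linux */
    (void)write(sigpair[1], &c, 1);           /* defer the real work to the event loop */
}

int main (void)
{
    struct sigaction sa;
    struct pollfd pfd;
    unsigned char c;

    socketpair(AF_UNIX, SOCK_STREAM, 0, sigpair);

    sa.sa_handler = chld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGCHLD, &sa, NULL);

    if (0 == fork()) {
        _exit(0);                     /* the child goes away, like the aborting task */
    }

    /* a real daemon would also poll the socket and the pipe to the child here */
    pfd.fd = sigpair[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN)) {
        (void)read(sigpair[0], &c, 1);
        printf("got signal %d via the socketpair\n", (int)c);
        wait(NULL);
    }
    return 0;
}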

i dug a bit deeper and found that when the event on the socketpair is
processed, it will end up calling
odls_base_default_wait_local_proc.
if proc->state is 5 (aka ORTE_PROC_STATE_REGISTERED), then the program
will abort nicely
*but* if proc->state is 6 (aka ORTE_PROC_STATE_IOF_COMPLETE), then the
program will hang

another way to put this is that
when the program aborts nicely, the call sequence is
odls_base_default_wait_local_proc
proc_errors(vpid=0)
proc_errors(vpid=0)
proc_errors(vpid=1)
proc_errors(vpid=1)

when the program hangs, the call sequence is
proc_errors(vpid=0)
odls_base_default_wait_local_proc
proc_errors(vpid=0)
proc_errors(vpid=1)
proc_errors(vpid=1)

i will resume this on Monday unless someone can fix this in the mean
time :-)

Cheers,

Gilles


Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread Gilles Gouaillardet
George,

(one of the) faulty line was :

   if (ORTE_SUCCESS != (rc =
opal_db.store((opal_identifier_t*)ORTE_PROC_MY_NAME, OPAL_SCOPE_INTERNAL,

OPAL_DB_LOCALLDR, (opal_identifier_t*)&proc, OPAL_ID_T))) {

so if proc is not 64 bits aligned, a SIGBUS will occur on sparc.
as you pointed out, replacing OPAL_ID_T with ORTE_NAME will very likely fix
the issue (i have no arch to test...)

i was initially also "confused" with the following line

if (ORTE_SUCCESS != (rc =
opal_db.store((opal_identifier_t*)&proc, OPAL_SCOPE_INTERNAL,
ORTE_DB_NPROC_OFFSET,
, OPAL_UINT32))) {

the first argument of store is an (opal_identifier_t *)
strictly speaking this is "a pointer to a 64 bits aligned address", and
proc might not be 64 bits aligned.
/* that being said, there is no crash :-) */

in this case, the opal_db.store pointer points to the store function
(db_hash.c:178)
and proc is only used in a memcpy at line 194, so 64 bits alignment is not
required.
(and the comment is explicit : /* to protect alignment, copy the data across */)

that might sound pedantic, but are we doing the right thing here ?
(e.g. cast to (opal_identifier_t *), followed by a memcpy in case the
pointer is not 64 bits aligned,
vs always using aligned data ?)
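
to illustrate the hazard (this is not OMPI code, just the generic pattern on
a strict alignment platform like sparc) :

#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct name { uint32_t jobid; uint32_t vpid; };   /* only needs 4-byte alignment */

int main (void)
{
    struct name n = { 1, 2 };
    uint64_t v;

    /* v = *(uint64_t *)&n;    <- assumes &n is 8 bytes aligned,
     *                            SIGBUS on sparc when it is not   */

    memcpy(&v, &n, sizeof(v));  /* the db_hash.c way: alignment-safe copy */
    printf("%016llx\n", (unsigned long long)v);
    return 0;
}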

Cheers,

Gilles

On 2014/08/08 14:58, George Bosilca wrote:
> This is a gigantic patch for an almost trivial issue. The current problem
> is purely related to the fact that in a single location (nidmap.c) the
> orte_process_name_t (which is a structure of 2 integers) is supposed to be
> aligned based on the uint64_t requirements. Bad assumption!
>
> Looking at the code one might notice that the orte_process_name_t is stored
> using a particular DSS type OPAL_ID_T. This is a shortcut that doesn't hold
> on the SPARC architecture because the two types (int32_t and int64_t) have
> different alignments.  However, ORTE define a type for orte_process_name_t.
> Thus, I think that if instead of saving the orte_process_name_t as an
> OPAL_ID_T, we save it as an ORTE_NAME the issue will go away.
>
>   George.
>
>
>
> On Fri, Aug 8, 2014 at 1:04 AM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>
>> Kawashima-san and all,
>>
>> Here is attached a one off patch for v1.8.
>> /* it does not use the __attribute__ modifier that might not be
>> supported by all compilers */
>>
>> as far as i am concerned, the same issue is also in the trunk,
>> and if you do not hit it, it just means you are lucky :-)
>>
>> the same issue might also be in other parts of the code :-(
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/08/08 13:45, Kawashima, Takahiro wrote:
>>> Gilles, George,
>>>
>>> The problem is the one Gilles pointed.
>>> I temporarily modified the code bellow and the bus error disappeared.
>>>
>>> --- orte/util/nidmap.c  (revision 32447)
>>> +++ orte/util/nidmap.c  (working copy)
>>> @@ -885,7 +885,7 @@
>>>  orte_proc_state_t state;
>>>  orte_app_idx_t app_idx;
>>>  int32_t restarts;
>>> -orte_process_name_t proc, dmn;
>>> +orte_process_name_t proc __attribute__((__aligned__(8))), dmn;
>>>  char *hostname;
>>>  uint8_t flag;
>>>  opal_buffer_t *bptr;
>>>
>>> Takahiro Kawashima,
>>> MPI development team,
>>> Fujitsu
>>>
>>>> Kawashima-san,
>>>>
>>>> This is interesting :-)
>>>>
>>>> proc is in the stack and has type orte_process_name_t
>>>>
>>>> with
>>>>
>>>> typedef uint32_t orte_jobid_t;
>>>> typedef uint32_t orte_vpid_t;
>>>> struct orte_process_name_t {
>>>> orte_jobid_t jobid; /**< Job number */
>>>> orte_vpid_t vpid;   /**< Process id - equivalent to rank */
>>>> };
>>>> typedef struct orte_process_name_t orte_process_name_t;
>>>>
>>>>
>>>> so there is really no reason to align this on 8 bytes...
>>>> but later, proc is casted into an uint64_t ...
>>>> so proc should have been aligned on 8 bytes but it is too late,
>>>> and hence the glory SIGBUS
>>>>
>>>>
>>>> this is loosely related to
>>>> http://www.open-mpi.org/community/lists/devel/2014/08/15532.php
>>>> (see heterogeneous.v2.patch)
>>>> if we make opal_process_name_t an union of uint64_t and a struct of two
>>>> uint32_t, the compiler
>>>> will align this on 8 bytes.
>>>> note the patch is not enough (and will not apply on t

Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread Gilles Gouaillardet
Kawashima-san,

This is interesting :-)

proc is in the stack and has type orte_process_name_t

with

typedef uint32_t orte_jobid_t;
typedef uint32_t orte_vpid_t;
struct orte_process_name_t {
orte_jobid_t jobid; /**< Job number */
orte_vpid_t vpid;   /**< Process id - equivalent to rank */
};
typedef struct orte_process_name_t orte_process_name_t;


so there is really no reason to align this on 8 bytes...
but later, proc is cast to a uint64_t ...
so proc should have been aligned on 8 bytes, but it is too late,
and hence the glorious SIGBUS


this is loosely related to
http://www.open-mpi.org/community/lists/devel/2014/08/15532.php
(see heterogeneous.v2.patch)
if we make opal_process_name_t an union of uint64_t and a struct of two
uint32_t, the compiler
will align this on 8 bytes.
note the patch is not enough (and will not apply on the v1.8 branch anyway),
we could simply remove orte_process_name_t and ompi_process_name_t and
use only
opal_process_name_t (and never declare variables with type
opal_proc_name_t otherwise alignment might be incorrect)

as a workaround, you can declare an opal_process_name_t (for alignment),
and cast it to an orte_process_name_t
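
for illustration, the workaround looks roughly like this (standalone
stand-ins for the opal/orte types, not the actual patch) :

#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef uint64_t identifier_t;                              /* stand-in for opal_identifier_t   */
typedef struct { uint32_t jobid; uint32_t vpid; } name_t;   /* stand-in for orte_process_name_t */

static void store64(identifier_t *data)
{
    uint64_t v;
    memcpy(&v, data, sizeof(v));      /* what db_hash.c does internally */
    printf("%016llx\n", (unsigned long long)v);
}

int main (void)
{
    identifier_t id;                  /* naturally 8 bytes aligned holder */
    name_t *proc = (name_t *)&id;     /* fill it through the structured view */

    proc->jobid = 42;
    proc->vpid  = 7;

    store64(&id);                     /* safe: &id is 8 bytes aligned */
    return 0;
}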

i will write a patch (i will not be able to test on sparc ...)
please note this issue might be present in other places

Cheers,

Gilles

On 2014/08/08 13:03, Kawashima, Takahiro wrote:
> Hi,
>
>> I have installed openmpi-1.8.2rc2 with gcc-4.9.0 on Solaris
>> 10 Sparc and I receive a bus error, if I run a small program.
> I've finally reproduced the bus error in my SPARC environment.
>
> #0 0x00db4740 (__waitpid_nocancel + 0x44) 
> (0x200,0x0,0x0,0xa0,0xf80100064af0,0x35b4)
> #1 0x0001a310 (handle_signal + 0x574) (signo=10,info=(struct siginfo 
> *) 0x07fed100,p=(void *) 0x07fed100) at line 277 in 
> ../sigattach.c 
> #2 0x0282aff4 (store + 0x540) (uid=(unsigned long *) 
> 0x0118a128,scope=8:'\b',key=(char *) 0x0106a0a8 
> "opal.local.ldr",data=(void *) 0x07fede74,type=15:'\017') at line 252 
> in db_hash.c
> #3 0x01266350 (opal_db_base_store + 0xc4) (proc=(unsigned long *) 
> 0x0118a128,scope=8:'\b',key=(char *) 0x0106a0a8 
> "opal.local.ldr",object=(void *) 0x07fede74,type=15:'\017') at line 
> 49 in db_base_fns.c
> #4 0x00fdbab4 (orte_util_decode_pidmap + 0x790) (bo=(struct *) 
> 0x00281d70) at line 975 in nidmap.c
> #5 0x00fd6d20 (orte_util_nidmap_init + 0x3dc) (buffer=(struct 
> opal_buffer_t *) 0x00241fc0) at line 141 in nidmap.c
> #6 0x01e298cc (rte_init + 0x2a0) () at line 153 in ess_env_module.c
> #7 0x00f9f28c (orte_init + 0x308) (pargc=(int *) 
> 0x,pargv=(char ***) 0x,flags=32) at line 148 
> in orte_init.c
> #8 0x001a6f08 (ompi_mpi_init + 0x31c) (argc=1,argv=(char **) 
> 0x07fef348,requested=0,provided=(int *) 0x07fee698) at line 
> 464 in ompi_mpi_init.c
> #9 0x001ff79c (MPI_Init + 0x2b0) (argc=(int *) 
> 0x07fee814,argv=(char ***) 0x07fee818) at line 84 in init.c
> #10 0x00100ae4 (main + 0x44) (argc=1,argv=(char **) 
> 0x07fef348) at line 8 in mpiinitfinalize.c
> #11 0x00d2b81c (__libc_start_main + 0x194) 
> (0x100aa0,0x1,0x7fef348,0x100d24,0x100d14,0x0)
> #12 0x0010094c (_start + 0x2c) ()
>
> The line 252 in opal/mca/db/hash/db_hash.c is:
>
> case OPAL_UINT64:
> if (NULL == data) {
> OPAL_ERROR_LOG(OPAL_ERR_BAD_PARAM);
> return OPAL_ERR_BAD_PARAM;
> }
> kv->type = OPAL_UINT64;
> kv->data.uint64 = *(uint64_t*)(data); // !!! here !!!
> break;
>
> My environment is:
>
>   Open MPI v1.8 branch r32447 (latest)
>   configure --enable-debug
>   SPARC-V9 (Fujitsu SPARC64 IXfx)
>   Linux (custom)
>   gcc 4.2.4
>
> I could not reproduce it with Open MPI trunk nor with Fujitsu compiler.
>
> Can this information help?
>
> Takahiro Kawashima,
> MPI development team,
> Fujitsu
>
>> Hi,
>>
>> I'm sorry once more to answer late, but the last two days our mail
>> server was down (hardware error).
>>
>>> Did you configure this --enable-debug?
>> Yes, I used the following command.
>>
>> ../openmpi-1.8.2rc3/configure --prefix=/usr/local/openmpi-1.8.2_64_gcc \
>>   --libdir=/usr/local/openmpi-1.8.2_64_gcc/lib64 \
>>   --with-jdk-bindir=/usr/local/jdk1.8.0/bin \
>>   --with-jdk-headers=/usr/local/jdk1.8.0/include \
>>   JAVA_HOME=/usr/local/jdk1.8.0 \
>>   LDFLAGS="-m64 -L/usr/local/gcc-4.9.0/lib/amd64" \
>>   CC="gcc" CXX="g++" FC="gfortran" \
>>   CFLAGS="-m64" CXXFLAGS="-m64" FCFLAGS="-m64" \
>>   CPP="cpp" CXXCPP="cpp" \
>>   CPPFLAGS="" CXXCPPFLAGS="" \
>>   --enable-mpi-cxx \
>>   --enable-cxx-exceptions \
>>   --enable-mpi-java \
>>   --enable-heterogeneous \
>>   --enable-mpi-thread-multiple \
>>   --with-threads=posix \
>>   --with-hwloc=internal \
>>   --without-verbs \
>>   

Re: [OMPI devel] v1.8.2 still held up...

2014-08-08 Thread Gilles Gouaillardet
Ralph and all,


> * static linking failure - Gilles has posted a proposed fix, but somebody 
> needs to approve and CMR it. Please see:
> https://svn.open-mpi.org/trac/ompi/ticket/4834

Jeff made a better fix (r32447) to which i added a minor correction
(r32448).
as far as i am concerned, i am fine with approving #4841

that being said, per Jeff's commit log :
"This needs to soak for a day or two on the trunk before moving to the
v1.8 branch"

so you might want to wait a bit ...

> * Siegmar reports another alignment issue on Sparc
> http://www.open-mpi.org/community/lists/users/2014/07/24869.php
>
Is there any chance r32449 fixes the issue ?

i found the problem on Solaris/x86_64 but i have no way to test it on a
Solaris/sparc box.

Cheers,

Gilles


Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-07 Thread Gilles Gouaillardet
Ralph and George,

here are attached two patches :
- heterogeneous.v1.patch : a cleanup of the previous patch
- heterogeneous.v2.patch : a new patch based on Ralph suggestion. i made
the minimal changes to move jobid and vpid into the OPAL layer.

Cheers,

Gilles

On 2014/08/07 11:27, Ralph Castain wrote:
> Are we maybe approaching this from the wrong direction? I ask because we had 
> to do some gyrations in the pmix framework to work around the difference in 
> naming schemes between OPAL and the rest of the code base, and now we have 
> more gyrations here.
>
> Given that the MPI and RTE layers both rely on the structured form of the 
> name, what about if we just mimic that down in OPAL? I think we could perhaps 
> do this in a way that still allows someone to overlay it with a 64-bit 
> unstructured identifier if they want, but that would put the extra work on 
> their side. In other words, we make it easy to work with the other parts of 
> our own code base, acknowledging that those wanting to do something else may 
> have to do some extra work.
>
> I ask because every resource manager out there assigns each process a jobid 
> and vpid in some form of integer format. So we have to absorb that 
> information in {jobid, vpid} format regardless of what we may want to do 
> internally. What we now have to do is immediately convert that into the 
> unstructured form for OPAL (where we take it in via PMI), then convert it 
> back to structured form when passing it up to ORTE so it can be handed to 
> OMPI, and then convert it back to unstructured form every time either OMPI or 
> ORTE accesses the OPAL layer.
>
> Seems awfully convoluted and error prone. Simplifying things for ourselves 
> might make more sense.
>
>
> On Aug 6, 2014, at 1:21 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
>> Gilles,
>>
>> This looks right. It is really unfortunately that we have to change the 
>> definition of orte_process_name_t for big endian architectures, but I don't 
>> think there is a way around.
>>
>> Regarding your patch I have two comments:
>> 1. There is a flagrant lack of comments ... especially on the ORTE side
>> 2. at the OPAL level we are really implementing a htonll, and I really think 
>> we should stick to the POSIX prototype (aka. returning the changes value 
>> instead of doing things inplace).
>>
>>   George.
>>
>>
>>
>> On Wed, Aug 6, 2014 at 7:02 AM, Gilles Gouaillardet 
>> <gilles.gouaillar...@iferc.org> wrote:
>> Ralph and George,
>>
>> here is attached a patch that fixes the heterogeneous support without the 
>> abstraction violation.
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> On 2014/08/06 9:40, Gilles Gouaillardet wrote:
>>> hummm
>>>
>>> i intentionally did not swap the two 32 bits (!)
>>>
>>> from the top level, what we have is :
>>>
>>> typedef union {
>>>    uint64_t opal;
>>>    struct {
>>>       uint32_t jobid;
>>>       uint32_t vpid;
>>>    } orte;
>>> } meta_process_name_t;
>>>
>>> OPAL is agnostic about jobid and vpid.
>>> jobid and vpid are set in ORTE/MPI and OPAL is used only
>>> to transport the 64 bits
>>> /* opal_process_name_t and orte_process_name_t are often casted into each
>>> other */
>>> at ORTE/MPI level, jobid and vpid are set individually
>>> /* e.g. we do *not* do something like opal = jobid | (vpid<<32) */
>>> this is why everything works fine on homogeneous clusters regardless
>>> endianness.
>>>
>>> now in heterogeneous cluster, thing get a bit trickier ...
>>>
>>> i was initially unhappy with my commit and i think i found out why :
>>> this is an abstraction violation !
>>> the two 32 bits are not swapped by OPAL because this is what is expected by
>>> the ORTE/OMPI.
>>>
>>> now i d like to suggest the following lightweight approach :
>>>
>>> at OPAL, use #if protected htonll/ntohll
>>> (e.g. swap the two 32bits)
>>>
>>> do the trick at the ORTE level :
>>>
>>> simply replace
>>>
>>> struct orte_process_name_t {
>>> orte_jobid_t jobid;
>>> orte_vpid_t vpid;
>>> };
>>>
>>> with
>>>
>>> #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
>>> struct orte_process_name_t {
>>> orte_vpid_t vpid;
>>> orte_jobid_t jobid;
>>> };
>>> #else
>>> 

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-06 Thread Gilles Gouaillardet
Ralph and George,

here is attached a patch that fixes the heterogeneous support without
the abstraction violation.

Cheers,

Gilles

On 2014/08/06 9:40, Gilles Gouaillardet wrote:
> hummm
>
> i intentionally did not swap the two 32 bits (!)
>
> from the top level, what we have is :
>
> typedef union {
>    uint64_t opal;
>    struct {
>       uint32_t jobid;
>       uint32_t vpid;
>    } orte;
> } meta_process_name_t;
>
> OPAL is agnostic about jobid and vpid.
> jobid and vpid are set in ORTE/MPI and OPAL is used only
> to transport the 64 bits
> /* opal_process_name_t and orte_process_name_t are often casted into each
> other */
> at ORTE/MPI level, jobid and vpid are set individually
> /* e.g. we do *not* do something like opal = jobid | (vpid<<32) */
> this is why everything works fine on homogeneous clusters regardless
> endianness.
>
> now in heterogeneous cluster, thing get a bit trickier ...
>
> i was initially unhappy with my commit and i think i found out why :
> this is an abstraction violation !
> the two 32 bits are not swapped by OPAL because this is what is expected by
> the ORTE/OMPI.
>
> now i d like to suggest the following lightweight approach :
>
> at OPAL, use #if protected htonll/ntohll
> (e.g. swap the two 32bits)
>
> do the trick at the ORTE level :
>
> simply replace
>
> struct orte_process_name_t {
> orte_jobid_t jobid;
> orte_vpid_t vpid;
> };
>
> with
>
> #if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
> struct orte_process_name_t {
> orte_vpid_t vpid;
> orte_jobid_t jobid;
> };
> #else
> struct orte_process_name_t {
> orte_jobid_t jobid;
> orte_vpid_t vpid;
> };
> #endif
>
>
> so we keep OPAL agnostic about how the uint64_t is really used at the upper
> level.
> an other option is to make OPAL aware of jobid and vpid but this is a bit
> more heavyweight imho.
>
> i'll try this today and make sure it works.
>
> any thoughts ?
>
> Cheers,
>
> Gilles
>
>
> On Wed, Aug 6, 2014 at 8:17 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Ah yes, so it is - sorry I missed that last test :-/
>>
>> On Aug 5, 2014, at 10:50 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>
>> The code committed by Gilles is correctly protected for big endian (
>> https://svn.open-mpi.org/trac/ompi/changeset/32425). I was merely
>> pointing out that I think he should also swap the 2 32 bits in his
>> implementation.
>>
>>   George.
>>
>>
>>
>> On Tue, Aug 5, 2014 at 1:32 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> On Aug 5, 2014, at 10:23 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>
>>> On Tue, Aug 5, 2014 at 1:15 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> Hmmm...wouldn't that then require that you know (a) the other side is
>>>> little endian, and (b) that you are on a big endian? Otherwise, you wind up
>>>> with the same issue in reverse, yes?
>>>>
>>> This is similar to the 32 bits ntohl that we are using in other parts of
>>> the project. Any  little endian participant will do the conversion, while
>>> every big endian participant will use an empty macro instead.
>>>
>>>
>>>> In the ORTE methods, we explicitly set the fields (e.g., jobid =
>>>> ntohl(remote-jobid)) to get around this problem. I missed that he did it by
>>>> location instead of named fields - perhaps we should do that instead?
>>>>
>>> As soon as we impose the ORTE naming scheme at the OPAL level (aka. the
>>> notion of jobid and vpid) this approach will become possible.
>>>
>>>
>>> Not proposing that at all so long as the other method will work without
>>> knowing the other side's endianness. Sounds like your approach should work
>>> fine as long as Gilles adds a #if so big endian defines the macro away
>>>
>>>
>>>   George.
>>>
>>>
>>>
>>>>
>>>> On Aug 5, 2014, at 10:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>
>>>> Technically speaking, converting a 64 bits to a big endian
>>>> representation requires the swap of the 2 32 bits parts. So the correct
>>>> approach would have been:
>>>> uint64_t htonll(uint64_t n)
>>>> {
>>>>     return (((uint64_t)ntohl(n)) << 32) | ((uint64_t)ntohl(n >> 32));
>>>> }
>>>>
>>>>   

Re: [OMPI devel] [1.8.2rc3] static linking fails on linux when not building ROMIO

2014-08-06 Thread Gilles Gouaillardet
Paul,

i missed a step indeed :
opal is required by rte, which is in turn required by mpi

the attached patch does the job (tested on a solaris10/x86_64 vm with
gnu compilers)
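
as a quick manual check, the undefined references shown in the log below
should disappear if the libraries that provide those symbols are appended to
the link line by hand (aio_write and clock_gettime live in librt, openpty in
libutil), e.g. :

$ /tmp/install/ompi.noromio/bin/mpicc -g -O0 -o hw ~/hw.c -lrt -lutil

the point of the patch is of course to have the wrappers pull in such
libraries themselves.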

Cheers,

Gilles

On 2014/08/06 4:40, Paul Hargrove wrote:
> Gilles,
>
> I have not tested your patch.
> I've only read it.
>
> It looks like it could work, except that libopen-rte.a depends on libsocket
> and libnsl on Solaris.
> So, one probably needs to add $LIBS to the ORTE wrapper libs as well.
>
> Additionally,if your approach is the correct one, then I think one can fold:
>
> OPAL_FLAGS_APPEND_UNIQ([OPAL_WRAPPER_EXTRA_LIBS],
> [$wrapper_extra_libs])
> OPAL_WRAPPER_EXTRA_LIBS="$OPAL_WRAPPER_EXTRA_LIBS
> $with_wrapper_libs"
> +   OPAL_FLAGS_APPEND_UNIQ([OPAL_WRAPPER_EXTRA_LIBS], [$LIBS])
> +   OPAL_WRAPPER_EXTRA_LIBS="$OPAL_WRAPPER_EXTRA_LIBS
> $with_wrapper_libs"
>
> into just
>
> -OPAL_FLAGS_APPEND_UNIQ([OPAL_WRAPPER_EXTRA_LIBS],
> [$wrapper_extra_libs])
> +   OPAL_FLAGS_APPEND_UNIQ([OPAL_WRAPPER_EXTRA_LIBS],
> [$wrapper_extra_libs $LIBS])
>
> which merges two calls to OPAL_FLAGS_APPEND_UNIQ and avoids double-adding
> of the user's $with_wrapper_libs.
> And of course the same 1-line change would apply for the OMPI and
> eventually ORTE variables too.
>
> I'd like to wait until Jeff has had a chance to look this over before I
> devote time to testing.
> Since I've determined already that 1.6.5 did not have the problem while
> 1.7.x does, the possibility exists that some smaller change might exist to
> restore what ever was lost between the v1.6 and v1.7 branches.
>
> -Paul
>
>
> On Tue, Aug 5, 2014 at 1:33 AM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>
>>  Here is a patch that has been minimally tested.
>>
>> this is likely an overkill (at least when dynamic libraries can be used),
>> but it does the job so far ...
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/08/05 16:56, Gilles Gouaillardet wrote:
>>
>> from libopen-pal.la :
>> dependency_libs=' -lrdmacm -libverbs -lscif -lnuma -ldl -lrt -lnsl
>> -lutil -lm'
>>
>>
>> i confirm mpicc fails linking
>>
>> but FWIT, using libtool does work (!)
>>
>> could the bug come from the mpicc (and other) wrappers ?
>>
>> Gilles
>>
>> $ gcc -g -O0 -o hw /csc/home1/gouaillardet/hw.c
>> -I/tmp/install/ompi.noromio/include -pthread -L/usr/lib64 -Wl,-rpath
>> -Wl,/usr/lib64 -Wl,-rpath -Wl,/tmp/install/ompi.noromio/lib
>> -Wl,--enable-new-dtags -L/tmp/install/ompi.noromio/lib -lmpi -lopen-rte
>> -lopen-pal -lm -lnuma -libverbs -lscif -lrdmacm -ldl -llustreapi
>>
>> $ /tmp/install/ompi.noromio/bin/mpicc -g -O0 -o hw -show ~/hw.c
>> gcc -g -O0 -o hw /csc/home1/gouaillardet/hw.c
>> -I/tmp/install/ompi.noromio/include -pthread -L/usr/lib64 -Wl,-rpath
>> -Wl,/usr/lib64 -Wl,-rpath -Wl,/tmp/install/ompi.noromio/lib
>> -Wl,--enable-new-dtags -L/tmp/install/ompi.noromio/lib -lmpi -lopen-rte
>> -lopen-pal -lm -lnuma -libverbs -lscif -lrdmacm -ldl -llustreapi
>> [gouaillardet@soleil build]$ /tmp/install/ompi.noromio/bin/mpicc -g -O0
>> -o hw ~/hw.c
>> /tmp/install/ompi.noromio/lib/libmpi.a(fbtl_posix_ipwritev.o): In
>> function `mca_fbtl_posix_ipwritev':
>> fbtl_posix_ipwritev.c:(.text+0x17b): undefined reference to `aio_write'
>> fbtl_posix_ipwritev.c:(.text+0x237): undefined reference to `aio_write'
>> fbtl_posix_ipwritev.c:(.text+0x3f4): undefined reference to `aio_write'
>> fbtl_posix_ipwritev.c:(.text+0x48e): undefined reference to `aio_write'
>> /tmp/install/ompi.noromio/lib/libopen-pal.a(opal_pty.o): In function
>> `opal_openpty':
>> opal_pty.c:(.text+0x1): undefined reference to `openpty'
>> /tmp/install/ompi.noromio/lib/libopen-pal.a(event.o): In function
>> `event_add_internal':
>> event.c:(.text+0x288d): undefined reference to `clock_gettime'
>>
>> $ /bin/sh ./static/libtool --silent --tag=CC   --mode=compile gcc
>> -std=gnu99 -I/tmp/install/ompi.noromio/include -c ~/hw.c
>> $ /bin/sh ./static/libtool --silent --tag=CC   --mode=link gcc
>> -std=gnu99 -o hw hw.o -L/tmp/install/ompi.noromio/lib -lmpi
>> $ ldd hw
>> linux-vdso.so.1 =>  (0x7fff7530d000)
>> librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x7f0ed541e000)
>> libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x7f0ed521)
>> libscif.so.0 => /usr/lib64/libscif.so.0 (0x003b9c60)
>> libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x003ba560)
>> libdl.so.2 => /lib64/libdl.so.2 (0x003b9be0)
>> 

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread Gilles Gouaillardet
hummm

i intentionally did not swap the two 32 bits (!)

from the top level, what we have is :

typedef struct {
    union {
        uint64_t opal;
        struct {
            uint32_t jobid;
            uint32_t vpid;
        } orte;
    };
} meta_process_name_t;

OPAL is agnostic about jobid and vpid.
jobid and vpid are set in ORTE/MPI and OPAL is used only
to transport the 64 bits
/* opal_process_name_t and orte_process_name_t are often casted into each
other */
at ORTE/MPI level, jobid and vpid are set individually
/* e.g. we do *not* do something like opal = jobid | (vpid<<32) */
this is why everything works fine on homogeneous clusters regardless
of endianness.

now in a heterogeneous cluster, things get a bit trickier ...

i was initially unhappy with my commit and i think i found out why :
this is an abstraction violation !
the two 32 bit halves are not swapped by OPAL because this is what is expected by
ORTE/OMPI.

now i'd like to suggest the following lightweight approach :

at the OPAL level, use #if-protected htonll/ntohll
(i.e. swap the two 32 bit halves; a sketch is below)
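a minimal sketch of what such #if-protected macros could look like, assuming
the autoconf WORDS_BIGENDIAN define; the opal_htonll/opal_ntohll names are
only illustrative and may not match what ends up in the tree :

#include <arpa/inet.h>   /* ntohl() */
#include <stdint.h>

/* on little endian hosts, byte-swap each 32 bit half and swap the halves;
 * on big endian hosts the macros compile away to the value itself */
#if !defined(WORDS_BIGENDIAN)
#define opal_ntohll(v) ((((uint64_t)ntohl((uint32_t)(v))) << 32) | \
                        (uint64_t)ntohl((uint32_t)((v) >> 32)))
#define opal_htonll(v) opal_ntohll(v)
#else
#define opal_ntohll(v) (v)
#define opal_htonll(v) (v)
#endif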

do the trick at the ORTE level :

simply replace

struct orte_process_name_t {
orte_jobid_t jobid;
orte_vpid_t vpid;
};

with

#if OPAL_ENABLE_HETEROGENEOUS_SUPPORT && !defined(WORDS_BIGENDIAN)
struct orte_process_name_t {
orte_vpid_t vpid;
orte_jobid_t jobid;
};
#else
struct orte_process_name_t {
orte_jobid_t jobid;
orte_vpid_t vpid;
};
#endif


so we keep OPAL agnostic about how the uint64_t is really used at the upper
level.
another option is to make OPAL aware of jobid and vpid, but this is a bit
more heavyweight imho.

i'll try this today and make sure it works.

any thoughts ?

Cheers,

Gilles


On Wed, Aug 6, 2014 at 8:17 AM, Ralph Castain <r...@open-mpi.org> wrote:

> Ah yes, so it is - sorry I missed that last test :-/
>
> On Aug 5, 2014, at 10:50 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
> The code committed by Gilles is correctly protected for big endian (
> https://svn.open-mpi.org/trac/ompi/changeset/32425). I was merely
> pointing out that I think he should also swap the 2 32 bits in his
> implementation.
>
>   George.
>
>
>
> On Tue, Aug 5, 2014 at 1:32 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>>
>> On Aug 5, 2014, at 10:23 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>
>> On Tue, Aug 5, 2014 at 1:15 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Hmmm...wouldn't that then require that you know (a) the other side is
>>> little endian, and (b) that you are on a big endian? Otherwise, you wind up
>>> with the same issue in reverse, yes?
>>>
>>
>> This is similar to the 32 bits ntohl that we are using in other parts of
>> the project. Any  little endian participant will do the conversion, while
>> every big endian participant will use an empty macro instead.
>>
>>
>>> In the ORTE methods, we explicitly set the fields (e.g., jobid =
>>> ntohl(remote-jobid)) to get around this problem. I missed that he did it by
>>> location instead of named fields - perhaps we should do that instead?
>>>
>>
>> As soon as we impose the ORTE naming scheme at the OPAL level (aka. the
>> notion of jobid and vpid) this approach will become possible.
>>
>>
>> Not proposing that at all so long as the other method will work without
>> knowing the other side's endianness. Sounds like your approach should work
>> fine as long as Gilles adds a #if so big endian defines the macro away
>>
>>
>>   George.
>>
>>
>>
>>>
>>>
>>> On Aug 5, 2014, at 10:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>
>>> Technically speaking, converting a 64 bits to a big endian
>>> representation requires the swap of the 2 32 bits parts. So the correct
>>> approach would have been:
>>> uint64_t htonll(uint64_t v)
>>> {
>>> return (((uint64_t)ntohl(v)) << 32 | (uint64_t)ntohl(v >> 32));
>>> }
>>>
>>>   George.
>>>
>>>
>>>
>>> On Tue, Aug 5, 2014 at 5:52 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> FWIW: that's exactly how we do it in ORTE
>>>>
>>>> On Aug 4, 2014, at 10:25 PM, Gilles Gouaillardet <
>>>> gilles.gouaillar...@iferc.org> wrote:
>>>>
>>>> George,
>>>>
>>>> i confirm there was a problem when running on an heterogeneous cluster,
>>>> this is now fixed in r32425.
>>>>
>>>> i am not convinced i chose the most elegant way to achieve the desired
>>>> result ...
>>>> could you please double check

Re: [OMPI devel] [1.8.2rc3] static linking fails on linux when not building ROMIO

2014-08-05 Thread Gilles Gouaillardet
Here is a patch that has been minimally tested.

this is likely overkill (at least when dynamic libraries can be used),
but it does the job so far ...

Cheers,

Gilles

On 2014/08/05 16:56, Gilles Gouaillardet wrote:
> from libopen-pal.la :
> dependency_libs=' -lrdmacm -libverbs -lscif -lnuma -ldl -lrt -lnsl
> -lutil -lm'
>
>
> i confirm mpicc fails linking
>
> but FWIW, using libtool does work (!)
>
> could the bug come from the mpicc (and other) wrappers ?
>
> Gilles
>
> $ gcc -g -O0 -o hw /csc/home1/gouaillardet/hw.c
> -I/tmp/install/ompi.noromio/include -pthread -L/usr/lib64 -Wl,-rpath
> -Wl,/usr/lib64 -Wl,-rpath -Wl,/tmp/install/ompi.noromio/lib
> -Wl,--enable-new-dtags -L/tmp/install/ompi.noromio/lib -lmpi -lopen-rte
> -lopen-pal -lm -lnuma -libverbs -lscif -lrdmacm -ldl -llustreapi
>
> $ /tmp/install/ompi.noromio/bin/mpicc -g -O0 -o hw -show ~/hw.c
> gcc -g -O0 -o hw /csc/home1/gouaillardet/hw.c
> -I/tmp/install/ompi.noromio/include -pthread -L/usr/lib64 -Wl,-rpath
> -Wl,/usr/lib64 -Wl,-rpath -Wl,/tmp/install/ompi.noromio/lib
> -Wl,--enable-new-dtags -L/tmp/install/ompi.noromio/lib -lmpi -lopen-rte
> -lopen-pal -lm -lnuma -libverbs -lscif -lrdmacm -ldl -llustreapi
> [gouaillardet@soleil build]$ /tmp/install/ompi.noromio/bin/mpicc -g -O0
> -o hw ~/hw.c
> /tmp/install/ompi.noromio/lib/libmpi.a(fbtl_posix_ipwritev.o): In
> function `mca_fbtl_posix_ipwritev':
> fbtl_posix_ipwritev.c:(.text+0x17b): undefined reference to `aio_write'
> fbtl_posix_ipwritev.c:(.text+0x237): undefined reference to `aio_write'
> fbtl_posix_ipwritev.c:(.text+0x3f4): undefined reference to `aio_write'
> fbtl_posix_ipwritev.c:(.text+0x48e): undefined reference to `aio_write'
> /tmp/install/ompi.noromio/lib/libopen-pal.a(opal_pty.o): In function
> `opal_openpty':
> opal_pty.c:(.text+0x1): undefined reference to `openpty'
> /tmp/install/ompi.noromio/lib/libopen-pal.a(event.o): In function
> `event_add_internal':
> event.c:(.text+0x288d): undefined reference to `clock_gettime'
>
> $ /bin/sh ./static/libtool --silent --tag=CC   --mode=compile gcc
> -std=gnu99 -I/tmp/install/ompi.noromio/include -c ~/hw.c
> $ /bin/sh ./static/libtool --silent --tag=CC   --mode=link gcc
> -std=gnu99 -o hw hw.o -L/tmp/install/ompi.noromio/lib -lmpi
> $ ldd hw
> linux-vdso.so.1 =>  (0x7fff7530d000)
> librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x7f0ed541e000)
> libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x7f0ed521)
> libscif.so.0 => /usr/lib64/libscif.so.0 (0x003b9c60)
> libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x003ba560)
> libdl.so.2 => /lib64/libdl.so.2 (0x003b9be0)
> librt.so.1 => /lib64/librt.so.1 (0x003b9ca0)
> libnsl.so.1 => /lib64/libnsl.so.1 (0x003bae20)
> libutil.so.1 => /lib64/libutil.so.1 (0x003bac60)
> libm.so.6 => /lib64/libm.so.6 (0x003b9ba0)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x003b9c20)
> libc.so.6 => /lib64/libc.so.6 (0x003b9b60)
> /lib64/ld-linux-x86-64.so.2 (0x003b9b20)
>
>
>
>
> On 2014/08/05 7:56, Ralph Castain wrote:
>> My thought was to post initially as a blocker, pending a discussion with 
>> Jeff at tomorrow's telecon. If he thinks this is something we can fix in 
>> some central point (thus catching it everywhere), then it could be quick and 
>> worth doing. However, I'm skeptical as I tried to do that in the most 
>> obvious place, and it failed (could be operator error).
>>
>> Will let you know tomorrow. Truly appreciate your digging on this!
>> Ralph
>>
>> On Aug 4, 2014, at 3:50 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>>> Ralph and Jeff,
>>>
>>> I've been digging and find the problem is wider than just the one library 
>>> and has manifestations specific to FreeBSD, NetBSD and Solaris.  I am 
>>> adding new info to the ticket as I unearth it.
>>>
>>> Additionally, it appears this existed in 1.8, 1.8.1 and in the 1.7 series 
>>> as well.
>>> So, would suggest this NOT be a blocker for a 1.8.2 release.
>>>
>>> Of course I am willing to provide testing if you still want to push for a 
>>> quick resolution.
>>>
>>> -Paul
>>>
>>>
>>> On Mon, Aug 4, 2014 at 1:27 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>> Okay, I filed a blocker on this for 1.8.2 and assigned it to Jeff. I took a 
>>> crack at fixing it, but came up short :-(
>>>
>>>
>>> On Aug 3, 2014, at 10:46 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>
>>>> I've ident

Re: [OMPI devel] [1.8.2rc3] static linking fails on linux when not building ROMIO

2014-08-05 Thread Gilles Gouaillardet
from libopen-pal.la :
dependency_libs=' -lrdmacm -libverbs -lscif -lnuma -ldl -lrt -lnsl
-lutil -lm'


i confirm mpicc fails linking

but FWIW, using libtool does work (!)

could the bug come from the mpicc (and other) wrappers ?

Gilles

$ gcc -g -O0 -o hw /csc/home1/gouaillardet/hw.c
-I/tmp/install/ompi.noromio/include -pthread -L/usr/lib64 -Wl,-rpath
-Wl,/usr/lib64 -Wl,-rpath -Wl,/tmp/install/ompi.noromio/lib
-Wl,--enable-new-dtags -L/tmp/install/ompi.noromio/lib -lmpi -lopen-rte
-lopen-pal -lm -lnuma -libverbs -lscif -lrdmacm -ldl -llustreapi

$ /tmp/install/ompi.noromio/bin/mpicc -g -O0 -o hw -show ~/hw.c
gcc -g -O0 -o hw /csc/home1/gouaillardet/hw.c
-I/tmp/install/ompi.noromio/include -pthread -L/usr/lib64 -Wl,-rpath
-Wl,/usr/lib64 -Wl,-rpath -Wl,/tmp/install/ompi.noromio/lib
-Wl,--enable-new-dtags -L/tmp/install/ompi.noromio/lib -lmpi -lopen-rte
-lopen-pal -lm -lnuma -libverbs -lscif -lrdmacm -ldl -llustreapi
[gouaillardet@soleil build]$ /tmp/install/ompi.noromio/bin/mpicc -g -O0
-o hw ~/hw.c
/tmp/install/ompi.noromio/lib/libmpi.a(fbtl_posix_ipwritev.o): In
function `mca_fbtl_posix_ipwritev':
fbtl_posix_ipwritev.c:(.text+0x17b): undefined reference to `aio_write'
fbtl_posix_ipwritev.c:(.text+0x237): undefined reference to `aio_write'
fbtl_posix_ipwritev.c:(.text+0x3f4): undefined reference to `aio_write'
fbtl_posix_ipwritev.c:(.text+0x48e): undefined reference to `aio_write'
/tmp/install/ompi.noromio/lib/libopen-pal.a(opal_pty.o): In function
`opal_openpty':
opal_pty.c:(.text+0x1): undefined reference to `openpty'
/tmp/install/ompi.noromio/lib/libopen-pal.a(event.o): In function
`event_add_internal':
event.c:(.text+0x288d): undefined reference to `clock_gettime'

$ /bin/sh ./static/libtool --silent --tag=CC   --mode=compile gcc
-std=gnu99 -I/tmp/install/ompi.noromio/include -c ~/hw.c
$ /bin/sh ./static/libtool --silent --tag=CC   --mode=link gcc
-std=gnu99 -o hw hw.o -L/tmp/install/ompi.noromio/lib -lmpi
$ ldd hw
linux-vdso.so.1 =>  (0x7fff7530d000)
librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x7f0ed541e000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x7f0ed521)
libscif.so.0 => /usr/lib64/libscif.so.0 (0x003b9c60)
libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x003ba560)
libdl.so.2 => /lib64/libdl.so.2 (0x003b9be0)
librt.so.1 => /lib64/librt.so.1 (0x003b9ca0)
libnsl.so.1 => /lib64/libnsl.so.1 (0x003bae20)
libutil.so.1 => /lib64/libutil.so.1 (0x003bac60)
libm.so.6 => /lib64/libm.so.6 (0x003b9ba0)
libpthread.so.0 => /lib64/libpthread.so.0 (0x003b9c20)
libc.so.6 => /lib64/libc.so.6 (0x003b9b60)
/lib64/ld-linux-x86-64.so.2 (0x003b9b20)




On 2014/08/05 7:56, Ralph Castain wrote:
> My thought was to post initially as a blocker, pending a discussion with Jeff 
> at tomorrow's telecon. If he thinks this is something we can fix in some 
> central point (thus catching it everywhere), then it could be quick and worth 
> doing. However, I'm skeptical as I tried to do that in the most obvious 
> place, and it failed (could be operator error).
>
> Will let you know tomorrow. Truly appreciate your digging on this!
> Ralph
>
> On Aug 4, 2014, at 3:50 PM, Paul Hargrove  wrote:
>
>> Ralph and Jeff,
>>
>> I've been digging and find the problem is wider than just the one library 
>> and has manifestations specific to FreeBSD, NetBSD and Solaris.  I am adding 
>> new info to the ticket as I unearth it.
>>
>> Additionally, it appears this existed in 1.8, 1.8.1 and in the 1.7 series as 
>> well.
>> So, would suggest this NOT be a blocker for a 1.8.2 release.
>>
>> Of course I am willing to provide testing if you still want to push for a 
>> quick resolution.
>>
>> -Paul
>>
>>
>> On Mon, Aug 4, 2014 at 1:27 PM, Ralph Castain  wrote:
>> Okay, I filed a blocker on this for 1.8.2 and assigned it to Jeff. I took a 
>> crack at fixing it, but came up short :-(
>>
>>
>> On Aug 3, 2014, at 10:46 PM, Paul Hargrove  wrote:
>>
>>> I've identified the difference between the platform that does link libutil 
>>> and the one that does not.
>>>
>>> 1) libutil is linked (as an OMPI dependency) only on the working system:
>>>
>>> Working system:
>>> $ grep 'checking for .* LIBS' configure.out
>>> checking for OPAL LIBS... -lm -lpciaccess -ldl 
>>> checking for ORTE LIBS... -lm -lpciaccess -ldl -ltorque 
>>> checking for OMPI LIBS... -lm -lpciaccess -ldl -ltorque -lrt -lnsl -lutil
>>>
>>> NON-working system:
>>> $ grep 'checking for .* LIBS' configure.out
>>> checking for OPAL LIBS... -lm -ldl 
>>> checking for ORTE LIBS... -lm -ldl -ltorque 
>>> checking for OMPI LIBS... -lm -ldl -ltorque 
>>>
>>> So, the working system that does link libutil is doing so as an OMPI 
>>> dependency.
>>> However it is also needed for opal (only caller of openpty is 
>>> opal/util/open_pty.c).
>>>
>>> 2) Only the working system is building ROMIO:
>>>

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-05 Thread Gilles Gouaillardet
George,

i confirm there was a problem when running on an heterogeneous cluster,
this is now fixed in r32425.

i am not convinced i chose the most elegant way to achieve the desired
result ...
could you please double check this commit ?

Thanks,

Gilles

On 2014/08/02 0:14, George Bosilca wrote:
> Gilles,
>
> The design of the BTL move was to let the opal_process_name_t be agnostic to 
> what is stored inside, and all accesses should be done through the provided 
> accessors. Thus, big endian or little endian doesn't make a difference, as 
> long as everything goes through the accessors.
>
> I'm skeptical about the support of heterogeneous environments in the current 
> code, so I didn't pay much attention to handling the case in the TCP BTL. But 
> in case we do care it is enough to make  the 2 macros point to something 
> meaningful instead of being empty (bswap_64 or something).
>
>   George.
>
> On Aug 1, 2014, at 06:52 , Gilles Gouaillardet 
> <gilles.gouaillar...@iferc.org> wrote:
>
>> George and Ralph,
>>
>> i am very confused whether there is an issue or not.
>>
>>
>> anyway, today Paul and i ran basic tests on big endian machines and did not 
>> face any issue related to big endianness.
>>
>> so i made my homework, digged into the code, and basically, 
>> opal_process_name_t is used as an orte_process_name_t.
>> for example, in ompi_proc_init :
>>
>> OMPI_CAST_ORTE_NAME(&proc->super.proc_name)->jobid =
>> OMPI_PROC_MY_NAME->jobid;
>> OMPI_CAST_ORTE_NAME(&proc->super.proc_name)->vpid = i;
>>
>> and with
>>
>> #define OMPI_CAST_ORTE_NAME(a) ((orte_process_name_t*)(a))
>>
>> so as long as an opal_process_name_t is used as an orte_process_name_t, 
>> there is no problem,
>> regardless the endianness of the homogenous cluster we are running on.
>>
>> for the sake of readability (and for being pedantic too ;-) ) in r32357,
>> _temp->super.proc_name 
>> could be replaced with
>> OMPI_CAST_ORTE_NAME(_temp->super.proc_name) 
>>
>>
>>
>> That being said, in btl/tcp, i noticed :
>>
>> in mca_btl_tcp_component_recv_handler :
>>
>> opal_process_name_t guid;
>> [...]
>> /* recv the process identifier */
>> retval = recv(sd, (char *)&guid, sizeof(guid), 0);
>> if(retval != sizeof(guid)) {
>> CLOSE_THE_SOCKET(sd);
>> return;
>> }
>> OPAL_PROCESS_NAME_NTOH(guid);
>>
>> and in mca_btl_tcp_endpoint_send_connect_ack :
>>
>> /* send process identifier to remote endpoint */
>> opal_process_name_t guid = btl_proc->proc_opal->proc_name;
>>
>> OPAL_PROCESS_NAME_HTON(guid);
>> if(mca_btl_tcp_endpoint_send_blocking(btl_endpoint, &guid, sizeof(guid))
>> !=
>>
>> and with
>>
>> #define OPAL_PROCESS_NAME_NTOH(guid)
>> #define OPAL_PROCESS_NAME_HTON(guid)
>>
>>
>> i had no time yet to test yet, but for now, i can only suspect :
>> - there will be an issue with the tcp btl on an heterogeneous cluster
>> - for this case, the fix is to have a different version of the 
>> OPAL_PROCESS_NAME_xTOy
>>   on little endian arch if heterogeneous mode is supported.
>>
>>
>>
>> does that make sense ?
>>
>> Cheers,
>>
>> Gilles
>>
>>
>> On 2014/07/31 1:29, George Bosilca wrote:
>>> The underlying structure changed, so a little bit of fiddling is normal.
>>> Instead of using a field in the ompi_proc_t you are now using a field down
>>> in opal_proc_t, a field that simply cannot have the same type as before
>>> (orte_process_name_t).
>>>
>>>   George.
>>>
>>>
>>>
>>> On Wed, Jul 30, 2014 at 12:19 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> George - my point was that we regularly tested using the method in that
>>>> routine, and now we have to do something a little different. So it is an
>>>> "issue" in that we have to make changes across the code base to ensure we
>>>> do things the "new" way, that's all
>>>>
>>>> On Jul 30, 2014, at 9:17 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>
>>>> No, this is not going to be an issue if the opal_identifier_t is used
>>>> correctly (aka only via the exposed accessors).
>>>>
>>>>   George.
>>>>
>>>>
>>>>
>>>> On Wed, Jul 30, 2014 at 12:09 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>

Re: [OMPI devel] oshmem enabled by default

2014-08-04 Thread Gilles Gouaillardet
Paul,

this is a bit trickier ...

on Linux platforms oshmem is built by default,
on non-Linux platforms, oshmem is *not* built by default.

so the configure message (disabled by default) is correct on non-Linux
platforms, and incorrect on Linux platforms ...

i do not know what should be done, here are some options :
- have a different behaviour on Linux vs non-Linux platforms (by the
way, do the autotools support this ?)
- disable by default and provide only the --enable-oshmem option (so
configure aborts if --enable-oshmem is given on non-Linux platforms)
- provide only the --disable-oshmem option, useful only on Linux
platforms. on non-Linux platforms, do not build oshmem and this is not an
error
- other ?

Cheers,

Gilles

r31155 | rhc | 2014-03-20 05:32:15 +0900 (Thu, 20 Mar 2014) | 5 lines

As per the thread on ticket #4399, OSHMEM does not support non-Linux
platforms. So provide a check for Linux and error out if --enable-oshmem
is given on a non-supported platform. If no OSHMEM option is given
(enable or disable), then don't attempt to build OSHMEM unless we are on
a Linux platform. Default to building if we are on Linux for now,
pending the outcome of the Debian situation.


On 2014/08/05 6:41, Paul Hargrove wrote:
> In both trunk and 1.8.2rc3 the behavior is to enable oshmem by default.
>
> In the 1.8.2rc3 tarball the configure help output matches the behavior.
> HOWEVER, in the trunk the configure help output still says oshmem is
> DISabled by default.
>
> {~/OMPI/ompi-trunk}$ svn info | grep "Revision"
> Revision: 32422
> {~/OMPI/ompi-trunk}$ ./configure --help | grep -A1 'enable-oshmem '
>   --enable-oshmem Enable building the OpenSHMEM interface (disabled
> by
>   default)
>
> -Paul
>
>
> On Thu, Jul 24, 2014 at 2:09 PM, Ralph Castain  wrote:
>
>> Actually, it already is set correctly - the help message was out of date,
>> so I corrected that.
>>
>> On Jul 24, 2014, at 10:58 AM, Marco Atzeri  wrote:
>>
>>> On 24/07/2014 15:52, Ralph Castain wrote:
 Oshmem should be enabled by default now
>>> Ok,
>>> so please reverse the configure switch
>>>
>>>  --enable-oshmem Enable building the OpenSHMEM interface
>> (disabled by default)
>>> I will test enabling it in the meantime.
>>>
>>> Regards
>>> Marco
>>>
>>>
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/07/15254.php
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/07/15261.php
>>
>
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15502.php



Re: [OMPI devel] 1.8.2rc3 now out

2014-08-04 Thread Gilles Gouaillardet
Fixed in r32409 : %d and %s were swapped in a MLERROR (printf like)
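for reference, here is a generic illustration of this class of bug (this is
not the actual MLERROR call, the function and messages below are made up) :

#include <stdio.h>

static void report_error(int rank, const char *reason)
{
    /* buggy: the format expects (string, int) but the arguments are
     * (int, string) -- undefined behaviour, typically garbage or a crash */
    /* fprintf(stderr, "coll/ml error: %s (rank %d)\n", rank, reason); */

    /* fixed: the argument order matches the format string */
    fprintf(stderr, "coll/ml error: %s (rank %d)\n", reason, rank);
}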

Gilles

On 2014/08/02 11:07, Gilles Gouaillardet wrote:
> Paul,
>
> about the second point :
> mmap is called with the MAP_FIXED flag; before the fix, the
> requested address was not aligned on a page size and hence
> mmap failed.
> the mmap failure was immediately detected, but for reasons
> i have not fully investigated yet, it was not correctly propagated,
> leading to a SIGSEGV later in lmngr_register (if i remember correctly)
>
> i will add this to my todo list : investigate why the error is not correctly
> propagated and handled.
>
> Cheers,
>
> Gilles
>
> On Sat, Aug 2, 2014 at 6:05 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>
>> Regarding review of the coll/ml fix:
>>
>> While the fix Gilles worked out overnight proved sufficient on
>> Solaris/SPARC, Linux/PPC64 and Linux/IA64, I had two concerns:
>>
>> 1) As I already voiced on the list, I am concerned with the portability of
>> _SC_PAGESIZE vs _SC_PAGE_SIZE (vs get_pagesize()).
>>
>> 2) Though I have not tried to trace the code, the fact that fixing the
>> alignment prevents a SEGV strongly suggests that there was a mmap (or
>> something else sensitive to page size) call failing.  So, there should
>> probably be a check added for failure of that call to produce a cleaner
>> failure than SEGV.
>>
>> Just my USD 0.02.
>> -Paul
>>
>>
>> On Fri, Aug 1, 2014 at 6:39 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Okay, I fixed those two and will release rc4 once the coll/ml fix has
>>> been reviewed. Thanks
>>>
>>> On Aug 1, 2014, at 2:46 AM, Mike Dubman <mi...@dev.mellanox.co.il> wrote:
>>>
>>> Also, latest commit into openib (origin/v1.8
>>> https://svn.open-mpi.org/trac/ompi/changeset/32391) broke something:
>>>
>>> *11:45:01* + timeout -s SIGSEGV 3m 
>>> /scrap/jenkins/workspace/OMPI-vendor/label/hpctest/ompi_install1/bin/mpirun 
>>> -np 8 -mca pml ob1 -mca btl self,openib 
>>> /scrap/jenkins/workspace/OMPI-vendor/label/hpctest/ompi_install1/examples/hello_usempi*11:45:01*
>>>  
>>> --*11:45:01*
>>>  WARNING: There are more than one active ports on host 'hpctest', but 
>>> the*11:45:01* default subnet GID prefix was detected on more than one of 
>>> these*11:45:01* ports.  If these ports are connected to different physical 
>>> IB*11:45:01* networks, this configuration will fail in Open MPI.  This 
>>> version of*11:45:01* Open MPI requires that every physically separate IB 
>>> subnet that is*11:45:01* used between connected MPI processes must have 
>>> different subnet ID*11:45:01* values.*11:45:01* *11:45:01* Please see this 
>>> FAQ entry for more details:*11:45:01* *11:45:01*   
>>> http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid*11:45:01*
>>>  *11:45:01* NOTE: You can turn off this warning by setting the MCA 
>>> parameter*11:45:01*   btl_openib_warn_default_gid_prefix to 
>>> 0.*11:45:01* 
>>> --*11:45:01*
>>>  
>>> --*11:45:01*
>>>  WARNING: No queue pairs were defined in the 
>>> btl_openib_receive_queues*11:45:01* MCA parameter.  At least one queue pair 
>>> must be defined.  The*11:45:01* OpenFabrics (openib) BTL will therefore be 
>>> deactivated for this run.*11:45:01* *11:45:01*   Local host: 
>>> hpctest*11:45:01* 
>>> --*11:45:01*
>>>  
>>> --*11:45:01*
>>>  At least one pair of MPI processes are unable to reach each other 
>>> for*11:45:01* MPI communications.  This means that no Open MPI device has 
>>> indicated*11:45:01* that it can be used to communicate between these 
>>> processes.  This is*11:45:01* an error; Open MPI requires that all MPI 
>>> processes be able to reach*11:45:01* each other.  This error can sometimes 
>>> be the result of forgetting to*11:45:01* specify the "self" BTL.*11:45:01* 
>>> *11:45:01*   Process 1 ([[55281,1],1]) is on host: hpctest*11:45:01*   
>>> Process 2 ([[55281,1],0]) is on host: hpctest*11:45:01*   BTLs attempted: 
>>> self*11:45:01* *11:45:01* Your MPI job is now going to abort; 
>>> sorry.*11:45:01*

Re: [OMPI devel] OMPI devel] trunk warnings on x86

2014-08-04 Thread Gilles Gouaillardet
Paul and all,

before r32408, the environment/abort test from the ibm test suite
crashed with SIGSEGV.

there is no more crash after the fix :-)

that being said, i experience some (random) hangs on my VM :
--mca btl tcp,self => no hang
--mca btl sm,self or --mca btl vader,self => hang about 25% of the time
--mca btl scif,self => always hang

only the mpirun process remains and is hanging.

i will try to debug this, and i welcome any help !

Cheers,

Gilles

On 2014/08/04 11:57, Gilles Gouaillardet wrote:
> Paul,
>
> i confirm the ampersand was missing and this was a bug
> /* a similar bug was fixed by Ralph in r32357 */
>
> i committed r32408 in order to fix these three bugs.
>
> i also took the liberty of replacing the OMPI_CAST_RTE_NAME macro
> with an inline function (only in debug mode) in order to get a
> compiler warning on both 32 and 64 bit archs in this case :
>
> #if OPAL_ENABLE_DEBUG
> static inline orte_process_name_t *
> OMPI_CAST_RTE_NAME(opal_process_name_t * name);
> #else
> #define OMPI_CAST_RTE_NAME(a) ((orte_process_name_t*)(a))
> #endif
>
> Cheers,
>
> Gilles
>
> On 2014/08/03 14:49, Gilles GOUAILLARDET wrote:
>> Paul,
>>
>> imho, the root cause is a missing ampersand.
>>
>> I will double check this from tomorrow only
>>
>> Cheers,
>>
>> Gilles
>>
>> Ralph Castain <r...@open-mpi.org> wrote:
>>> Arg - that raises an interesting point. This is a pointer to a 64-bit 
>>> number. Will uintptr_t resolve that problem on such platforms?
>>>
>>>
>>> On Aug 2, 2014, at 8:12 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>>
>>>
>>> Looks like on a 32-bit platform a (uintptr_t) cast is desired in the 
>>> OMPI_CAST_RTE_NAME() macro.
>>>
>>>
>>> Warnings from current trunk tarball attributable to the missing case 
>>> include:
>>>
>>>
>>> /home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/runtime/ompi_mpi_abort.c:89:
>>>  warning: cast to pointer from integer of different size
>>>
>>> /home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/runtime/ompi_mpi_abort.c:97:
>>>  warning: cast to pointer from integer of different size
>>>
>>> /home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/mca/pml/bfo/pml_bfo_failover.c:1417:
>>>  warning: cast to pointer from integer of different size
>>>
>>>
>>> -Paul
>>>
>>>
>>> -- 
>>>
>>> Paul H. Hargrove  phhargr...@lbl.gov
>>>
>>> Future Technologies Group
>>>
>>> Computer and Data Sciences Department Tel: +1-510-495-2352
>>>
>>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/08/15481.php
>>>
>>>
>>>
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/08/15484.php
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15489.php



Re: [OMPI devel] OMPI devel] trunk warnings on x86

2014-08-03 Thread Gilles Gouaillardet
Paul,

i confirm the ampersand was missing and this was a bug
/* a similar bug was fixed by Ralph in r32357 */

i committed r32408 in order to fix these three bugs.

i also took the liberty of replacing the OMPI_CAST_RTE_NAME macro
with an inline function (only in debug mode) in order to get a
compiler warning on both 32 and 64 bit archs in this case :

#if OPAL_ENABLE_DEBUG
static inline orte_process_name_t *
OMPI_CAST_RTE_NAME(opal_process_name_t * name);
#else
#define OMPI_CAST_RTE_NAME(a) ((orte_process_name_t*)(a))
#endif
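
for reference, a sketch of what the debug mode inline might look like (the
body below is an assumption for illustration, only the declaration above is
taken from the commit; it relies on the usual ompi/orte headers for the two
process name types) :

#if OPAL_ENABLE_DEBUG
/* a typed argument makes the compiler warn if a plain integer (rather
 * than a pointer) is passed, on both 32 and 64 bit builds */
static inline orte_process_name_t *
OMPI_CAST_RTE_NAME(opal_process_name_t * name)
{
    return (orte_process_name_t *)name;
}
#else
#define OMPI_CAST_RTE_NAME(a) ((orte_process_name_t*)(a))
#endif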

Cheers,

Gilles

On 2014/08/03 14:49, Gilles GOUAILLARDET wrote:
> Paul,
>
> imho, the root cause is a missing ampersand.
>
> I will double check this from tomorrow only
>
> Cheers,
>
> Gilles
>
> Ralph Castain <r...@open-mpi.org> wrote:
>> Arg - that raises an interesting point. This is a pointer to a 64-bit 
>> number. Will uintptr_t resolve that problem on such platforms?
>>
>>
>> On Aug 2, 2014, at 8:12 PM, Paul Hargrove <phhargr...@lbl.gov> wrote:
>>
>>
>> Looks like on a 32-bit platform a (uintptr_t) cast is desired in the 
>> OMPI_CAST_RTE_NAME() macro.
>>
>>
>> Warnings from current trunk tarball attributable to the missing case include:
>>
>>
>> /home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/runtime/ompi_mpi_abort.c:89:
>>  warning: cast to pointer from integer of different size
>>
>> /home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/runtime/ompi_mpi_abort.c:97:
>>  warning: cast to pointer from integer of different size
>>
>> /home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/mca/pml/bfo/pml_bfo_failover.c:1417:
>>  warning: cast to pointer from integer of different size
>>
>>
>> -Paul
>>
>>
>> -- 
>>
>> Paul H. Hargrove  phhargr...@lbl.gov
>>
>> Future Technologies Group
>>
>> Computer and Data Sciences Department Tel: +1-510-495-2352
>>
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/08/15481.php
>>
>>
>>
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/08/15484.php



Re: [OMPI devel] OMPI devel] trunk warnings on x86

2014-08-03 Thread Gilles GOUAILLARDET
Paul,

imho, the root cause is a missing ampersand.

I will double check this from tomorrow only

Cheers,

Gilles

Ralph Castain  wrote:
>Arg - that raises an interesting point. This is a pointer to a 64-bit number. 
>Will uintptr_t resolve that problem on such platforms?
>
>
>On Aug 2, 2014, at 8:12 PM, Paul Hargrove  wrote:
>
>
>Looks like on a 32-bit platform a (uintptr_t) cast is desired in the 
>OMPI_CAST_RTE_NAME() macro.
>
>
>Warnings from current trunk tarball attributable to the missing case include:
>
>
>/home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/runtime/ompi_mpi_abort.c:89:
> warning: cast to pointer from integer of different size
>
>/home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/runtime/ompi_mpi_abort.c:97:
> warning: cast to pointer from integer of different size
>
>/home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gcc/openmpi-1.9a1r32406/ompi/mca/pml/bfo/pml_bfo_failover.c:1417:
> warning: cast to pointer from integer of different size
>
>
>-Paul
>
>
>-- 
>
>Paul H. Hargrove                          phhargr...@lbl.gov
>
>Future Technologies Group
>
>Computer and Data Sciences Department     Tel: +1-510-495-2352
>
>Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
>
>___
>devel mailing list
>de...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Link to this post: 
>http://www.open-mpi.org/community/lists/devel/2014/08/15481.php
>
>


Re: [OMPI devel] 1.8.2rc3 now out

2014-08-01 Thread Gilles Gouaillardet
Paul,

about the second point :
mmap is called with the MAP_FIXED flag; before the fix, the
requested address was not aligned on a page size and hence
mmap failed.
the mmap failure was immediately detected, but for reasons
i have not fully investigated yet, it was not correctly propagated,
leading to a SIGSEGV later in lmngr_register (if i remember correctly)

i will add this to my todo list : investigate why the error is not correctly
propagated and handled.
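
as an illustration of both points (this is not the coll/ml source, the
function and variable names below are made up), the pattern would be to align
the requested address on the page size before calling mmap with MAP_FIXED and
to propagate a failure instead of letting it surface later as a SIGSEGV :

#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* map 'len' bytes at (a page aligned version of) 'requested' */
static void *map_fixed_aligned(void *requested, size_t len)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    void *aligned  = (void *)((uintptr_t)requested & ~(page - 1));

    void *ptr = mmap(aligned, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (MAP_FAILED == ptr) {
        perror("mmap");   /* report the failure ...                  */
        return NULL;      /* ... and let the caller bail out cleanly */
    }
    return ptr;
}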

Cheers,

Gilles

On Sat, Aug 2, 2014 at 6:05 AM, Paul Hargrove  wrote:

> Regarding review of the coll/ml fix:
>
> While the fix Gilles worked out overnight proved sufficient on
> Solaris/SPARC, Linux/PPC64 and Linux/IA64, I had two concerns:
>
> 1) As I already voiced on the list, I am concerned with the portability of
> _SC_PAGESIZE vs _SC_PAGE_SIZE (vs get_pagesize()).
>
> 2) Though I have not tried to trace the code, the fact that fixing the
> alignment prevents a SEGV strongly suggests that there was a mmap (or
> something else sensitive to page size) call failing.  So, there should
> probably be a check added for failure of that call to produce a cleaner
> failure than SEGV.
>
> Just my USD 0.02.
> -Paul
>
>
> On Fri, Aug 1, 2014 at 6:39 AM, Ralph Castain  wrote:
>
>> Okay, I fixed those two and will release rc4 once the coll/ml fix has
>> been reviewed. Thanks
>>
>> On Aug 1, 2014, at 2:46 AM, Mike Dubman  wrote:
>>
>> Also, latest commit into openib (origin/v1.8
>> https://svn.open-mpi.org/trac/ompi/changeset/32391) broke something:
>>
>> *11:45:01* + timeout -s SIGSEGV 3m 
>> /scrap/jenkins/workspace/OMPI-vendor/label/hpctest/ompi_install1/bin/mpirun 
>> -np 8 -mca pml ob1 -mca btl self,openib 
>> /scrap/jenkins/workspace/OMPI-vendor/label/hpctest/ompi_install1/examples/hello_usempi*11:45:01*
>>  
>> --*11:45:01*
>>  WARNING: There are more than one active ports on host 'hpctest', but 
>> the*11:45:01* default subnet GID prefix was detected on more than one of 
>> these*11:45:01* ports.  If these ports are connected to different physical 
>> IB*11:45:01* networks, this configuration will fail in Open MPI.  This 
>> version of*11:45:01* Open MPI requires that every physically separate IB 
>> subnet that is*11:45:01* used between connected MPI processes must have 
>> different subnet ID*11:45:01* values.*11:45:01* *11:45:01* Please see this 
>> FAQ entry for more details:*11:45:01* *11:45:01*   
>> http://www.open-mpi.org/faq/?category=openfabrics#ofa-default-subnet-gid*11:45:01*
>>  *11:45:01* NOTE: You can turn off this warning by setting the MCA 
>> parameter*11:45:01*   btl_openib_warn_default_gid_prefix to 0.*11:45:01* 
>> --*11:45:01*
>>  
>> --*11:45:01*
>>  WARNING: No queue pairs were defined in the 
>> btl_openib_receive_queues*11:45:01* MCA parameter.  At least one queue pair 
>> must be defined.  The*11:45:01* OpenFabrics (openib) BTL will therefore be 
>> deactivated for this run.*11:45:01* *11:45:01*   Local host: 
>> hpctest*11:45:01* 
>> --*11:45:01*
>>  
>> --*11:45:01*
>>  At least one pair of MPI processes are unable to reach each other 
>> for*11:45:01* MPI communications.  This means that no Open MPI device has 
>> indicated*11:45:01* that it can be used to communicate between these 
>> processes.  This is*11:45:01* an error; Open MPI requires that all MPI 
>> processes be able to reach*11:45:01* each other.  This error can sometimes 
>> be the result of forgetting to*11:45:01* specify the "self" BTL.*11:45:01* 
>> *11:45:01*   Process 1 ([[55281,1],1]) is on host: hpctest*11:45:01*   
>> Process 2 ([[55281,1],0]) is on host: hpctest*11:45:01*   BTLs attempted: 
>> self*11:45:01* *11:45:01* Your MPI job is now going to abort; 
>> sorry.*11:45:01* 
>> --*11:45:01*
>>  
>> --*11:45:01*
>>  MPI_INIT has failed because at least one MPI process is 
>> unreachable*11:45:01* from another.  This *usually* means that an underlying 
>> communication*11:45:01* plugin -- such as a BTL or an MTL -- has either not 
>> loaded or not*11:45:01* allowed itself to be used.  Your MPI job will now 
>> abort.*11:45:01* *11:45:01* You may wish to try to narrow down the 
>> problem;*11:45:01* *11:45:01*  * Check the output of ompi_info to see which 
>> BTL/MTL plugins are*11:45:01*available.*11:45:01*  * Run your 
>> application with MPI_THREAD_SINGLE.*11:45:01*  * Set the MCA parameter 
>> btl_base_verbose to 100 (or mtl_base_verbose,*11:45:01*if using 
>> 

Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-08-01 Thread Gilles Gouaillardet
one last point :

in orte_process_name_t, jobid and vpid have type orte_jobid_t and
orte_vpid_t, which are really uint32_t.

in orte/util/proc.c, the function pointers opal_process_name_vpid and
opal_process_name_jobid return an int32_t.

should they return an uint32_t instead ?
/* and then _process_name_jobid_for_opal, _process_name_vpid_for_opal,
opal_process_name_vpid_should_never_be_called
should also be updated */
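
a minimal sketch of the signature change i have in mind (the typedef names
below are made up for illustration, they are not the actual symbols in
orte/util/proc.c) :

#include <stdint.h>

typedef uint64_t opal_name_example_t;   /* stand-in for opal_process_name_t */

/* today (as described above) the accessors return a signed value : */
typedef int32_t  (*name_vpid_accessor_old_t)(opal_name_example_t name);
/* since orte_jobid_t and orte_vpid_t are uint32_t, they could return : */
typedef uint32_t (*name_vpid_accessor_new_t)(opal_name_example_t name);
typedef uint32_t (*name_jobid_accessor_new_t)(opal_name_example_t name);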

Cheers,

Gilles

On 2014/08/01 19:52, Gilles Gouaillardet wrote:
> George and Ralph,
>
> i am very confused whether there is an issue or not.
>
>
> anyway, today Paul and i ran basic tests on big endian machines and did
> not face any issue related to big endianness.
>
> so i made my homework, digged into the code, and basically,
> opal_process_name_t is used as an orte_process_name_t.
> for example, in ompi_proc_init :
>
> OMPI_CAST_ORTE_NAME(&proc->super.proc_name)->jobid =
> OMPI_PROC_MY_NAME->jobid;
> OMPI_CAST_ORTE_NAME(&proc->super.proc_name)->vpid = i;
>
> and with
>
> #define OMPI_CAST_ORTE_NAME(a) ((orte_process_name_t*)(a))
>
> so as long as an opal_process_name_t is used as an orte_process_name_t,
> there is no problem,
> regardless the endianness of the homogenous cluster we are running on.
>
> for the sake of readability (and for being pedantic too ;-) ) in r32357,
> _temp->super.proc_name
> could be replaced with
> OMPI_CAST_ORTE_NAME(_temp->super.proc_name)
>
>
>
> That being said, in btl/tcp, i noticed :
>
> in mca_btl_tcp_component_recv_handler :
>
> opal_process_name_t guid;
> [...]
> /* recv the process identifier */
> retval = recv(sd, (char *)&guid, sizeof(guid), 0);
> if(retval != sizeof(guid)) {
> CLOSE_THE_SOCKET(sd);
> return;
> }
> OPAL_PROCESS_NAME_NTOH(guid);
>
> and in mca_btl_tcp_endpoint_send_connect_ack :
>
> /* send process identifier to remote endpoint */
> opal_process_name_t guid = btl_proc->proc_opal->proc_name;
>
> OPAL_PROCESS_NAME_HTON(guid);
> if(mca_btl_tcp_endpoint_send_blocking(btl_endpoint, &guid,
> sizeof(guid)) !=
>
> and with
>
> #define OPAL_PROCESS_NAME_NTOH(guid)
> #define OPAL_PROCESS_NAME_HTON(guid)
>
>
> i had no time yet to test yet, but for now, i can only suspect :
> - there will be an issue with the tcp btl on an heterogeneous cluster
> - for this case, the fix is to have a different version of the
> OPAL_PROCESS_NAME_xTOy
>   on little endian arch if heterogeneous mode is supported.
>
>
>
> does that make sense ?
>
> Cheers,
>
> Gilles
>
>
> On 2014/07/31 1:29, George Bosilca wrote:
>> The underlying structure changed, so a little bit of fiddling is normal.
>> Instead of using a field in the ompi_proc_t you are now using a field down
>> in opal_proc_t, a field that simply cannot have the same type as before
>> (orte_process_name_t).
>>
>>   George.
>>
>>
>>
>> On Wed, Jul 30, 2014 at 12:19 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> George - my point was that we regularly tested using the method in that
>>> routine, and now we have to do something a little different. So it is an
>>> "issue" in that we have to make changes across the code base to ensure we
>>> do things the "new" way, that's all
>>>
>>> On Jul 30, 2014, at 9:17 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>
>>> No, this is not going to be an issue if the opal_identifier_t is used
>>> correctly (aka only via the exposed accessors).
>>>
>>>   George.
>>>
>>>
>>>
>>> On Wed, Jul 30, 2014 at 12:09 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>
>>>> Yeah, my fix won't work for big endian machines - this is going to be an
>>>> issue across the code base now, so we'll have to troll and fix it. I was
>>>> doing the minimal change required to fix the trunk in the meantime.
>>>>
>>>> On Jul 30, 2014, at 9:06 AM, George Bosilca <bosi...@icl.utk.edu> wrote:
>>>>
>>>> Yes. opal_process_name_t has basically no meaning by itself, it is a 64
>>>> bits storage location used by the upper layer to save some local key that
>>>> can be later used to extract information. Calling the OPAL level compare
>>>> function might be a better fit there.
>>>>
>>>>   George.
>>>>
>>>>
>>>>
>>>> On Wed, Jul 30, 2014 at 11:50 AM, Gilles Gouaillardet <
>>>> gilles.gouaillar...@gmail.com> wrote:
>>>>
>>>>> Ralph,
>&

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Gilles Gouaillardet
Paul,

i just committed r32393 (and made a CMR for v1.8)

can you please give it a try ?

in the meantime, i received your email ...
sysconf is called directly (i.e. not #ifdef protected) in several other
places :
$ grep -R sysconf . | grep -v svn | grep -v sysconfdir | grep -v
autom4te | grep PAGE | grep -v LARGE
./oshmem/mca/memheap/ptmalloc/malloc.c:#define malloc_getpagesize
sysconf(_SC_PAGE_SIZE)
./ompi/mca/pml/base/pml_base_bsend.c:tmp = mca_pml_bsend_pagesz =
sysconf(_SC_PAGESIZE);
./ompi/mca/coll/ml/coll_ml_lmngr.c:cm->lmngr_alignment =
sysconf(_SC_PAGESIZE);
./orte/mca/oob/ud/oob_ud_module.c:posix_memalign ((void
**)_mem->ptr, sysconf(_SC_PAGESIZE), buffer_len);
./opal/mca/memory/linux/malloc.c:#define malloc_getpagesize
sysconf(_SC_PAGE_SIZE)
./opal/mca/hwloc/hwloc172/hwloc/src/topology-solaris.c:  remainder =
(uintptr_t) addr & (sysconf(_SC_PAGESIZE)-1);
./opal/mca/hwloc/hwloc172/hwloc/src/topology-linux.c:  remainder =
(uintptr_t) addr & (sysconf(_SC_PAGESIZE)-1);
./opal/mca/hwloc/hwloc172/hwloc/include/private/private.h:#define
hwloc_getpagesize() sysconf(_SC_PAGE_SIZE)
./opal/mca/hwloc/hwloc172/hwloc/include/private/private.h:#define
hwloc_getpagesize() sysconf(_SC_PAGESIZE)
./opal/mca/mpool/base/mpool_base_frame.c:mca_mpool_base_page_size =
sysconf(_SC_PAGESIZE);
./opal/mca/btl/openib/connect/btl_openib_connect_sl.c:long page_size
= sysconf(_SC_PAGESIZE);
./opal/mca/btl/openib/connect/btl_openib_connect_udcm.c:   
posix_memalign ((void **)>cm_buffer, sysconf(_SC_PAGESIZE),

that is why i did not #ifdef protect it in coll/ml


Cheers,

Gilles

On 2014/08/01 17:12, Paul Hargrove wrote:
> Gilles,
>
> At the moment ompi/mca/osc/sm/osc_sm_component.c is using the following:
>
> #ifdef HAVE_GETPAGESIZE
> pagesize = getpagesize();
> #else
> pagesize = 4096;
> #endif
>
> While other places in the code use sysconf(), but not always consistently.
>
> And on some systems _SC_PAGESIZE is spelled as _SC_PAGE_SIZE.
> Fortunately configure already checks these variations for you.
>
> So, I suggest
>
> #ifdef HAVE_GETPAGESIZE
> pagesize = getpagesize();
> #elif defined(_SC_PAGESIZE )
> pagesize = sysconf(_SC_PAGESIZE);
> #elif defined(_SC_PAGE_SIZE)
> pagesize = sysconf(_SC_PAGE_SIZE);
> #else
> pagesize = 65536; /* safer to overestimate than under */
> #endif
>
>
> opal_pagesize() anyone?
>
> -Paul
>
> On Fri, Aug 1, 2014 at 12:50 AM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>
>>  Paul,
>>
>> you are absolutly right !
>>
>> in ompi/mca/coll/ml/coll_ml_lmngr.c at line 53,
>> cm->lmngr_alignment is hard coded to 4096
>>
>> as a proof of concept, i hard coded it to 65536 and now coll/ml works just
>> fine
>>
>> i will now write a patch that uses sysconf(_SC_PAGESIZE) instead
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/08/01 15:56, Paul Hargrove wrote:
>>
>> Hmm, maybe this has nothing to do with big-endian.
>> Below is a backtrace from ring_c on an IA64 platform (definitely
>> little-endian) that looks very similar to me.
>>
>> It happens that sysconf(_SC_PAGESIZE) returns 64K on both of these systems.
>> So, I wonder if that might be related.
>>
>> -Paul
>>
>> $ mpirun -mca btl sm,self -np 2 examples/ring_c'
>> [altix][[26769,1],0][/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/openmpi-1.9a1r32386/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init]
>> COLL-ML [altix:20418] *** Process received signal ***
>> [altix:20418] Signal: Segmentation fault (11)
>> [altix:20418] Signal code: Invalid permissions (2)
>> [altix:20418] Failing at address: 0x16
>> [altix][[26769,1],1][/eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/openmpi-1.9a1r32386/ompi/mca/coll/ml/coll_ml_lmngr.c:231:mca_coll_ml_lmngr_init]
>> COLL-ML [altix:20419] *** Process received signal ***
>> [altix:20419] Signal: Segmentation fault (11)
>> [altix:20419] Signal code: Invalid permissions (2)
>> [altix:20419] Failing at address: 0x16
>> [altix:20418] [ 0] [0xa0010800]
>> [altix:20418] [ 1] /lib/libc.so.6.1(strlen-0x92e930)[0x2051b2a0]
>> [altix:20418] [altix:20419] [ 0] [0xa0010800]
>> [altix:20419] [ 1] [ 2]
>> /lib/libc.so.6.1(strlen-0x92e930)[0x2051b2a0]
>> [altix:20419] [ 2]
>> /lib/libc.so.6.1(_IO_vfprintf-0x998610)[0x204b15d0]
>> [altix:20419] [ 3] /lib/libc.so.6.1(+0x82860)[0x204b2860]
>> [altix:20419] [ 4]
>> /lib/libc.so.6.1(_IO_vfprintf-0x99f140)[0x2040]
>> [altix:20419] [ 5]
>> /eng/home/PHHargrove/OMPI/openmpi-trunk-linux-ia64/INST/lib/openmpi

Re: [OMPI devel] Trunk broken for PPC64?

2014-08-01 Thread Gilles Gouaillardet
Paul and Ralph,

for what it's worth :

a) i faced the very same issue on my (slow) qemu-emulated ppc64 vm
b) i was able to run very basic programs when passing --mca coll ^ml to
mpirun

Cheers,

Gilles

On 2014/08/01 12:30, Ralph Castain wrote:
> Yes, I fear this will require some effort to chase all the breakage down 
> given that (to my knowledge, at least) we lack PPC machines in the devel 
> group.
>
>
> On Jul 31, 2014, at 5:46 PM, Paul Hargrove  wrote:
>
>> On the path to verifying George's atomics patch, I have started just by 
>> verifying that I can still build the UNPATCHED trunk on each of the 
>> platforms I listed.
>>
>> I have tried two PPC64/Linux systems so far and am seeing the same problem 
>> on both.  Though I can pass "make check" both platforms SEGV on
>>mpirun -mca btl sm,self -np 2 examples/ring_c
>>
>> Is this the expected state of the trunk on big-endian systems?
>> I am thinking in particular of 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15365.php in which 
>> Ralph wrote:
>>> Yeah, my fix won't work for big endian machines - this is going to be an 
>>> issue across the
>>> code base now, so we'll have to troll and fix it. I was doing the minimal 
>>> change required to
>>> fix the trunk in the meantime. 
>> If this big-endian failure is not known/expected let me know and I'll 
>> provide details.
>> Since testing George's patch only requires "make check" I can proceed with 
>> that regardless.
>>
>> -Paul
>>
>>
>> On Thu, Jul 31, 2014 at 4:25 PM, George Bosilca  wrote:
>> Awesome, thanks Paul. When the results will be in we will fix whatever is 
>> needed for these less common architectures.
>>
>>   George.
>>
>>
>>
>> On Thu, Jul 31, 2014 at 7:24 PM, Paul Hargrove  wrote:
>>
>>
>> On Thu, Jul 31, 2014 at 4:22 PM, Paul Hargrove  wrote:
>>
>> On Thu, Jul 31, 2014 at 4:13 PM, George Bosilca  wrote:
>> Paul, I know you have a pretty diverse range of computers. Can you try to
>> compile and run a "make check" with the following patch?
>>
>> I will see what I can do for ARMv7, MIPS, PPC and IA64 (or whatever subset 
>> of those is still supported).
>> The ARM and MIPS system are emulators and take forever to build OMPI.
>> However, I am not even sure how soon I'll get to start this testing.
>>
>>
>> Add SPARC (v8plus and v9) to that list.
>>  
>>
>>
>> -- 
>> Paul H. Hargrove  phhargr...@lbl.gov
>> Future Technologies Group
>> Computer and Data Sciences Department Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15411.php
>>
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15412.php
>>
>>
>>
>> -- 
>> Paul H. Hargrove  phhargr...@lbl.gov
>> Future Technologies Group
>> Computer and Data Sciences Department Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15414.php
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15425.php



Re: [OMPI devel] [OMPI svn] svn:open-mpi r32388 - trunk/ompi/mca/pml/ob1

2014-08-01 Thread Gilles Gouaillardet
Paul,

George just made a good point, you should test with his patch first

if it still does not work, could you try to mix gnu and sun compilers ?
configure ... CC=/usr/sfw/bin/gcc CXX=/usr/sfw/bin/g++ FC=

Cheers,

Gilles

On 2014/08/01 13:19, Paul Hargrove wrote:
> Gilles,
>
> This test was using the Solaris Studio Compilers version 12.3.
>
> /usr/bin/gcc on this system is "gccfss" which Open MPI does NOT support.
>
> There is also a gcc-3.3.2 in /usr/local/bin and gcc-3.4.3 in /usr/sfw/bin
> Neither includes usable fortran compilers, which is why the Studio
> compilers are preferred.
> Let me know if you need me to try any of those gcc installations.
>
> -Paul
>
>
> On Thu, Jul 31, 2014 at 9:12 PM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>
>>  Paul,
>>
>> As Ralph pointed out, this issue was reported last month on the user mailing
>> list.
>>
>> #include  did not help :
>> http://www.open-mpi.org/community/lists/users/2014/07/24883.php
>>
>> I will see whether i can reproduce and fix this issue on a solaris10 (but x86)
>> VM
>>
>> BTW, are you using the GNU compiler ?
>>
>> Cheers,
>>
>> Gilles
>>
>> On 2014/08/01 13:08, Paul Hargrove wrote:
>>
>> In general I am only setup to build from tarballs, not svn.
>> However, I can (and will) apply this change manually w/o difficulty.
>>
>> I will report back when I've had a chance to try that.
>>
>> I already have many builds in-flight to test George's atomics patch and am
>> in danger of confusing myself if I am not careful.
>>
>> -Paul
>>
>>
>> On Thu, Jul 31, 2014 at 8:29 PM, Ralph Castain <r...@open-mpi.org> 
>> <r...@open-mpi.org> wrote:
>>
>>
>>  FWIW: we had Siegmar try that and it didn't solve the problem. Paul?
>>
>>
>> On Jul 31, 2014, at 8:28 PM, svn-commit-mai...@open-mpi.org wrote:
>>
>>
>>  Author: bosilca (George Bosilca)
>> Date: 2014-07-31 23:28:23 EDT (Thu, 31 Jul 2014)
>> New Revision: 32388
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/32388
>>
>> Log:
>> Missing alloca.h. Thanks Paul for catching this.
>>
>> Text files modified:
>>   trunk/ompi/mca/pml/ob1/pml_ob1_irecv.c | 3 +++
>>   trunk/ompi/mca/pml/ob1/pml_ob1_isend.c | 3 +++
>>   2 files changed, 6 insertions(+), 0 deletions(-)
>>
>> Modified: trunk/ompi/mca/pml/ob1/pml_ob1_irecv.c
>>
>>
>>  
>> ==
>>
>>  --- trunk/ompi/mca/pml/ob1/pml_ob1_irecv.cThu Jul 31 21:00:42 2014
>>
>>   (r32387)
>>
>>  +++ trunk/ompi/mca/pml/ob1/pml_ob1_irecv.c2014-07-31 23:28:23 EDT
>>
>>  (Thu, 31 Jul 2014)  (r32388)
>>
>>  @@ -28,6 +28,9 @@
>> #include "pml_ob1_recvfrag.h"
>> #include "ompi/peruse/peruse-internal.h"
>> #include "ompi/message/message.h"
>> +#if HAVE_ALLOCA_H
>> +#include 
>> +#endif  /* HAVE_ALLOCA_H */
>>
>> int mca_pml_ob1_irecv_init(void *addr,
>>size_t count,
>>
>> Modified: trunk/ompi/mca/pml/ob1/pml_ob1_isend.c
>>
>>
>>  
>> ==
>>
>>  --- trunk/ompi/mca/pml/ob1/pml_ob1_isend.cThu Jul 31 21:00:42 2014
>>
>>   (r32387)
>>
>>  +++ trunk/ompi/mca/pml/ob1/pml_ob1_isend.c2014-07-31 23:28:23 EDT
>>
>>  (Thu, 31 Jul 2014)  (r32388)
>>
>>  @@ -26,6 +26,9 @@
>> #include "pml_ob1_sendreq.h"
>> #include "pml_ob1_recvreq.h"
>> #include "ompi/peruse/peruse-internal.h"
>> +#if HAVE_ALLOCA_H
>> +#include 
>> +#endif  /* HAVE_ALLOCA_H */
>>
>> int mca_pml_ob1_isend_init(void *buf,
>>size_t count,
>> ___
>> svn mailing 
>> listsvn@open-mpi.orghttp://www.open-mpi.org/mailman/listinfo.cgi/svn
>>
>>  ___
>> devel mailing listde...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this 
>> post:http://www.open-mpi.org/community/lists/devel/2014/07/15424.php
>>
>>
>>
>>
>> ___
>> devel mailing listde...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/08/15427.php
>>
>>
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/08/15428.php
>>
>
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15430.php



Re: [OMPI devel] [OMPI svn] svn:open-mpi r32388 - trunk/ompi/mca/pml/ob1

2014-08-01 Thread Gilles Gouaillardet
Paul,

As Ralph pointed out, this issue was reported last month on the user mailing
list.

#include  did not help :
http://www.open-mpi.org/community/lists/users/2014/07/24883.php

I will see whether i can reproduce and fix this issue on a solaris10 (but x86) VM

BTW, are you using the GNU compiler ?

Cheers,

Gilles

On 2014/08/01 13:08, Paul Hargrove wrote:
> In general I am only setup to build from tarballs, not svn.
> However, I can (and will) apply this change manually w/o difficulty.
>
> I will report back when I've had a chance to try that.
>
> I already have many builds in-flight to test George's atomics patch and am
> in danger of confusing myself if I am not careful.
>
> -Paul
>
>
> On Thu, Jul 31, 2014 at 8:29 PM, Ralph Castain  wrote:
>
>> FWIW: we had Siegmar try that and it didn't solve the problem. Paul?
>>
>>
>> On Jul 31, 2014, at 8:28 PM, svn-commit-mai...@open-mpi.org wrote:
>>
>>> Author: bosilca (George Bosilca)
>>> Date: 2014-07-31 23:28:23 EDT (Thu, 31 Jul 2014)
>>> New Revision: 32388
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/32388
>>>
>>> Log:
>>> Missing alloca.h. Thanks Paul for catching this.
>>>
>>> Text files modified:
>>>   trunk/ompi/mca/pml/ob1/pml_ob1_irecv.c | 3 +++
>>>   trunk/ompi/mca/pml/ob1/pml_ob1_isend.c | 3 +++
>>>   2 files changed, 6 insertions(+), 0 deletions(-)
>>>
>>> Modified: trunk/ompi/mca/pml/ob1/pml_ob1_irecv.c
>>> ==============================================================================
>>> --- trunk/ompi/mca/pml/ob1/pml_ob1_irecv.c    Thu Jul 31 21:00:42 2014    (r32387)
>>> +++ trunk/ompi/mca/pml/ob1/pml_ob1_irecv.c    2014-07-31 23:28:23 EDT (Thu, 31 Jul 2014)    (r32388)
>>> @@ -28,6 +28,9 @@
>>> #include "pml_ob1_recvfrag.h"
>>> #include "ompi/peruse/peruse-internal.h"
>>> #include "ompi/message/message.h"
>>> +#if HAVE_ALLOCA_H
>>> +#include <alloca.h>
>>> +#endif  /* HAVE_ALLOCA_H */
>>>
>>> int mca_pml_ob1_irecv_init(void *addr,
>>>size_t count,
>>>
>>> Modified: trunk/ompi/mca/pml/ob1/pml_ob1_isend.c
>>> ==============================================================================
>>> --- trunk/ompi/mca/pml/ob1/pml_ob1_isend.c    Thu Jul 31 21:00:42 2014    (r32387)
>>> +++ trunk/ompi/mca/pml/ob1/pml_ob1_isend.c    2014-07-31 23:28:23 EDT (Thu, 31 Jul 2014)    (r32388)
>>> @@ -26,6 +26,9 @@
>>> #include "pml_ob1_sendreq.h"
>>> #include "pml_ob1_recvreq.h"
>>> #include "ompi/peruse/peruse-internal.h"
>>> +#if HAVE_ALLOCA_H
>>> +#include <alloca.h>
>>> +#endif  /* HAVE_ALLOCA_H */
>>>
>>> int mca_pml_ob1_isend_init(void *buf,
>>>size_t count,
>>> ___
>>> svn mailing list
>>> s...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/07/15424.php
>>
>
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/08/15427.php



Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Gilles Gouaillardet
Paul,

the ibm test suite from the non public ompi-tests repository has several
tests for usempif08.

Cheers,

Gilles

On 2014/08/01 11:04, Paul Hargrove wrote:
> Second related issue:
>
> Can/should examples/hello_usempif08.f90 be extended to use more of the
> module such that it would have illustrated the bug found with Tetsuya's
> example code?   I don't know about MTT, but my scripts for testing a
> release candidate includes running "make" in the example subdir.
>
>



Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Gilles Gouaillardet
Paul,

in .../ompi/mpi/fortran/use-mpi-f08, can you create the following dumb
test program,
compile it, and run nm | grep f08 on the object file :

$ cat foo.f90
program foo
use mpi_f08_sizeof

implicit none

real :: x
integer :: size, ierror

call MPI_Sizeof_real_s_4(x, size, ierror)

stop
end program


with intel compiler :
$ ifort -c foo.f90
$ nm foo.o | grep f08
 U mpi_f08_sizeof_mp_mpi_sizeof_real_s_4_

i am wondering whether PGI compiler adds an additional undefined
reference to mpi_f08_sizeof_ ...

Cheers,

Gilles



Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread Gilles Gouaillardet
Paul and all,

For what it's worth, with openmpi 1.8.2rc2 and the intel fortran
compiler version 14.0.3.174 :

$ nm libmpi_usempif08.so | grep -i sizeof

there is no such undefined symbol (mpi_f08_sizeof_)

as a temporary workaround, did you try to force the linker to use
libforce_usempif08_internal_modules_to_be_built.a ?

/* this library does not get installed (at least with intel compilers),
but it is in the compilation tree */

Cheers,

Gilles


On 2014/07/31 12:53, Paul Hargrove wrote:
> In 1.8.2rc2:
> $ nm openmpi-1.8.2rc2-linux-x86_64-pgi-14.4/INST/lib/libmpi_usempif08.so |
> grep ' mpi_f08_sizeof_'
>  U mpi_f08_sizeof_



Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Gilles Gouaillardet
Ralph,

was it really that simple ?

proc_temp->super.proc_name has type opal_process_name_t :
typedef opal_identifier_t opal_process_name_t;
typedef uint64_t opal_identifier_t;

*but*

item_ptr->peer has type orte_process_name_t :
struct orte_process_name_t {
   orte_jobid_t jobid;
   orte_vpid_t vpid;
};

bottom line, is r32357 still valid on a big endian arch ?

Cheers,

Gilles
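
To make the concern concrete, here is a small standalone sketch (the type
names are stand-ins, and 32-bit jobid/vpid is an assumption) showing that
reinterpreting a {jobid, vpid} pair as a single uint64_t is byte-order
dependent:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* stand-ins for the two name types discussed above */
typedef uint64_t demo_opal_name_t;                                   /* one 64-bit id */
typedef struct { uint32_t jobid; uint32_t vpid; } demo_orte_name_t;  /* two fields    */

int main(void)
{
    demo_orte_name_t n = { 0x11111111u, 0x22222222u };
    demo_opal_name_t o;

    memcpy(&o, &n, sizeof(o));   /* what treating one type as the other amounts to */

    /* little endian prints 0x2222222211111111, big endian 0x1111111122222222:
     * field-wise and whole-word comparisons need not agree across platforms */
    printf("as a single uint64: 0x%016llx\n", (unsigned long long)o);
    return 0;
}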


On Wed, Jul 30, 2014 at 11:49 PM, Ralph Castain <r...@open-mpi.org> wrote:

> I just fixed this one - all that was required was an ampersand as the name
> was being passed into the function instead of a pointer to the name
>
> r32357
>
> On Jul 30, 2014, at 7:43 AM, Gilles GOUAILLARDET <
> gilles.gouaillar...@gmail.com> wrote:
>
> Rolf,
>
> r32353 can be seen as a suspect...
> Even if it is correct, it might have exposed the bug discussed in #4815
> even more (e.g. we hit the bug 100% after the fix)
>
> does the attached patch to #4815 fix the problem ?
>
> If yes, and if you see this issue as a showstopper, feel free to commit it
> and drop a note to #4815
> ( I am afk until tomorrow)
>
> Cheers,
>
> Gilles
>
> Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>
> Just an FYI that my trunk version (r32355) does not work at all anymore if
> I do not include "--mca coll ^ml".Here is a stack trace from the
> ibm/pt2pt/send test running on a single node.
>
>
>
> (gdb) where
>
> #0  0x7f6c0d1321d0 in ?? ()
>
> #1  <signal handler called>
>
> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017',
> name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>
> #3  0x7f6c0bea17be in bcol_basesmuma_smcm_allgather_connection
> (sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748,
> back_files=0x7f6bf3ffd6c8,
>
> comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606 "sm_payload_mem_",
> map_all=false) at
> ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>
> #4  0x7f6c0be98307 in bcol_basesmuma_bank_init_opti
> (payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040,
> reg_data=0xba28c0)
>
> at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>
> #5  0x7f6c0cced386 in mca_coll_ml_register_bcols (ml_module=0xba5c40)
> at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>
> #6  0x7f6c0cced68f in ml_module_memory_initialization
> (ml_module=0xba5c40) at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>
> #7  0x7f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at
> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>
> #8  0x7f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0,
> priority=0x7fffe7991b58) at
> ../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>
> #9  0x7f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940,
> comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>
> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>
> #10 0x7f6c18cc5ac8 in query (component=0x7f6c0cf50940, comm=0x6037a0,
> priority=0x7fffe7991b58, module=0x7fffe7991b90)
>
> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>
> #11 0x7f6c18cc59d2 in check_one_component (comm=0x6037a0,
> component=0x7f6c0cf50940, module=0x7fffe7991b90)
>
> at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>
> #12 0x7f6c18cc5818 in check_components (components=0x7f6c18f46ef0,
> comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>
> #13 0x7f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at
> ../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>
> #14 0x7f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8,
> requested=0, provided=0x7fffe79922e8) at
> ../../ompi/runtime/ompi_mpi_init.c:918
>
> #15 0x7f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c,
> argv=0x7fffe7992340) at pinit.c:84
>
> #16 0x00401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32
>
> (gdb) up
>
> #1  <signal handler called>
>
> (gdb) up
>
> #2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017',
> name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>
> 522   if (name1->jobid < name2->jobid) {
>
> (gdb) print name1
>
> $1 = (const orte_process_name_t *) 0x192350001
>
> (gdb) print *name1
>
> Cannot access memory at address 0x192350001
>
> (gdb) print name2
>
> $2 = (const orte_process_name_t *) 0xbaf76c
>
> (gdb) print *name2
>
> $3 = {jobid = 2452946945, vpid = 1}
>
> (gdb)
>
>
>
>
>
>
>
> >-Original Message-
>
> >From: devel [mailto:devel-boun...

Re: [OMPI devel] OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Gilles GOUAILLARDET
I will fix this tomorrow

Right now, --enable-mpi-fortran is equivalent to --enable-mpi-fortran=yes, which is 
equivalent to --enable-mpi-fortran=all :
So configure aborts if not all bindings can be built

In ompi_configure_options.m4 :
OMPI_FORTRAN_USER_REQUESTED=0
108 case "x$enable_mpi_fortran" in
109 x)
110 AC_MSG_RESULT([yes (all/default)])
111 OMPI_WANT_FORTRAN_MPIFH_BINDINGS=1
112 OMPI_WANT_FORTRAN_USEMPI_BINDINGS=1
113 OMPI_WANT_FORTRAN_USEMPIF08_BINDINGS=1
114 ;;
115 
116 xyes|xall)
117 AC_MSG_RESULT([yes (all)])
118 OMPI_FORTRAN_USER_REQUESTED=1
119 OMPI_WANT_FORTRAN_MPIFH_BINDINGS=1
120 OMPI_WANT_FORTRAN_USEMPI_BINDINGS=1
121 OMPI_WANT_FORTRAN_USEMPIF08_BINDINGS=1
122 ;;

OMPI_FORTRAN_USER_REQUESTED=1
should only happen when xall and not when xyes

I will review this tomorrow,
In the mean time, feel free to revert the changeset or simply not use the 
--enable-mpi-fortran for now

Cheers,

Gilles

Ralph Castain <r...@open-mpi.org> wrote:
>Umm... this really broke things now. I can't build the fortran bindings at 
>all, and I don't have a PGI compiler. I also didn't specify a level of Fortran 
>support, but just had --enable-mpi-fortran
>
>Maybe we need to revert this commit until we figure out a better solution?
>
>On Jul 30, 2014, at 12:16 AM, Gilles Gouaillardet 
><gilles.gouaillar...@iferc.org> wrote:
>
>> Paul,
>> 
>> this is a fair point.
>> 
>> i commited r32354 in order to abort configure in this case
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> On 2014/07/30 15:11, Paul Hargrove wrote:
>>> On a related topic:
>>> 
>>> I configured with an explicit --enable-mpi-fortran=usempif08.
>>> Then configure found PROCEDURE was missing/broken.
>>> The result is that the build continued, but without the requested f08
>>> support.
>>> 
>>> If the user has explicitly enabled a given level of Fortran support, but it
>>> cannot be provided, shouldn't this be a configure-time error?
>>> 
>>> -Paul
>>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/07/15352.php
>
>___
>devel mailing list
>de...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Link to this post: 
>http://www.open-mpi.org/community/lists/devel/2014/07/15357.php


Re: [OMPI devel] OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Gilles GOUAILLARDET
Rolf,

r32353 can be seen as a suspect...
Even if it is correct, it might have exposed the bug discussed in #4815 even 
more (e.g. we hit the bug 100% after the fix)

does the attached patch to #4815 fix the problem ?

If yes, and if you see this issue as a showstopper, feel free to commit it and 
drop a note to #4815
( I am afk until tomorrow)

Cheers,

Gilles

Rolf vandeVaart <rvandeva...@nvidia.com> wrote:
>
>
>Just an FYI that my trunk version (r32355) does not work at all anymore if I 
>do not include "--mca coll ^ml".    Here is a stack trace from the 
>ibm/pt2pt/send test running on a single node.
>
> 
>
>(gdb) where
>
>#0  0x7f6c0d1321d0 in ?? ()
>
>#1  <signal handler called>
>
>#2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', 
>name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>
>#3  0x7f6c0bea17be in bcol_basesmuma_smcm_allgather_connection 
>(sm_bcol_module=0x7f6bf3b68040, module=0xb3d200, peer_list=0x7f6c0c0a6748, 
>back_files=0x7f6bf3ffd6c8, 
>
>comm=0x6037a0, input=..., base_fname=0x7f6c0bea2606 "sm_payload_mem_", 
>map_all=false) at 
>../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_smcm.c:237
>
>#4  0x7f6c0be98307 in bcol_basesmuma_bank_init_opti 
>(payload_block=0xbc0f60, data_offset=64, bcol_module=0x7f6bf3b68040, 
>reg_data=0xba28c0)
>
>    at ../../../../../ompi/mca/bcol/basesmuma/bcol_basesmuma_buf_mgmt.c:302
>
>#5  0x7f6c0cced386 in mca_coll_ml_register_bcols (ml_module=0xba5c40) at 
>../../../../../ompi/mca/coll/ml/coll_ml_module.c:510
>
>#6  0x7f6c0cced68f in ml_module_memory_initialization (ml_module=0xba5c40) 
>at ../../../../../ompi/mca/coll/ml/coll_ml_module.c:558
>
>#7  0x7f6c0ccf06b1 in ml_discover_hierarchy (ml_module=0xba5c40) at 
>../../../../../ompi/mca/coll/ml/coll_ml_module.c:1539
>
>#8  0x7f6c0ccf4e0b in mca_coll_ml_comm_query (comm=0x6037a0, 
>priority=0x7fffe7991b58) at 
>../../../../../ompi/mca/coll/ml/coll_ml_module.c:2963
>
>#9  0x7f6c18cc5b09 in query_2_0_0 (component=0x7f6c0cf50940, 
>comm=0x6037a0, priority=0x7fffe7991b58, module=0x7fffe7991b90)
>
>    at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:372
>
>#10 0x7f6c18cc5ac8 in query (component=0x7f6c0cf50940, comm=0x6037a0, 
>priority=0x7fffe7991b58, module=0x7fffe7991b90)
>
>    at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:355
>
>#11 0x7f6c18cc59d2 in check_one_component (comm=0x6037a0, 
>component=0x7f6c0cf50940, module=0x7fffe7991b90)
>
>    at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:317
>
>#12 0x7f6c18cc5818 in check_components (components=0x7f6c18f46ef0, 
>comm=0x6037a0) at ../../../../ompi/mca/coll/base/coll_base_comm_select.c:281
>
>#13 0x7f6c18cbe3c9 in mca_coll_base_comm_select (comm=0x6037a0) at 
>../../../../ompi/mca/coll/base/coll_base_comm_select.c:117
>
>#14 0x7f6c18c52301 in ompi_mpi_init (argc=1, argv=0x7fffe79924c8, 
>requested=0, provided=0x7fffe79922e8) at ../../ompi/runtime/ompi_mpi_init.c:918
>
>#15 0x7f6c18c86e92 in PMPI_Init (argc=0x7fffe799234c, argv=0x7fffe7992340) 
>at pinit.c:84
>
>#16 0x00401056 in main (argc=1, argv=0x7fffe79924c8) at send.c:32
>
>(gdb) up
>
>#1  <signal handler called>
>
>(gdb) up
>
>#2  0x7f6c183abd52 in orte_util_compare_name_fields (fields=15 '\017', 
>name1=0x192350001, name2=0xbaf76c) at ../../orte/util/name_fns.c:522
>
>522       if (name1->jobid < name2->jobid) {
>
>(gdb) print name1
>
>$1 = (const orte_process_name_t *) 0x192350001
>
>(gdb) print *name1
>
>Cannot access memory at address 0x192350001
>
>(gdb) print name2
>
>$2 = (const orte_process_name_t *) 0xbaf76c
>
>(gdb) print *name2
>
>$3 = {jobid = 2452946945, vpid = 1}
>
>(gdb)
>
> 
>
> 
>
> 
>
>>-Original Message-
>
>>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles
>
>>Gouaillardet
>
>>Sent: Wednesday, July 30, 2014 2:16 AM
>
>>To: Open MPI Developers
>
>>Subject: Re: [OMPI devel] trunk compilation errors in jenkins
>
>>
>
> 
>
>>George,
>
>>
>
> 
>
>>#4815 is indirectly related to the move :
>
>>
>
> 
>
>>in bcol/basesmuma, we used to compare ompi_process_name_t, and now
>
>>we (try to) compare an ompi_process_name_t and an opal_process_name_t
>
>>(which causes a glory SIGSEGV)
>
>>
>
> 
>
>>i proposed a temporary patch which is both broken and unelegant, could you
>
>>please advise a correct solution ?
>
>>
>
> 
>
>>Cheers,
>
>>
>
> 
>
>>Gilles
>
>>
>
> 
>
>

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Gilles Gouaillardet
Paul,

this is a fair point.

i committed r32354 in order to abort configure in this case

Cheers,

Gilles

On 2014/07/30 15:11, Paul Hargrove wrote:
> On a related topic:
>
> I configured with an explicit --enable-mpi-fortran=usempif08.
> Then configure found PROCEDURE was missing/broken.
> The result is that the build continued, but without the requested f08
> support.
>
> If the user has explicitly enabled a given level of Fortran support, but it
> cannot be provided, shouldn't this be a configure-time error?
>
> -Paul
>



Re: [OMPI devel] trunk compilation errors in jenkins

2014-07-30 Thread Gilles Gouaillardet
George,

#4815 is indirectly related to the move :

in bcol/basesmuma, we used to compare ompi_process_name_t, and now we
(try to)
compare an ompi_process_name_t and an opal_process_name_t (which causes
a glorious SIGSEGV)

i proposed a temporary patch which is both broken and unelegant,
could you please advise a correct solution ?

Cheers,

Gilles

On 2014/07/27 7:37, George Bosilca wrote:
> If you have any issue with the move, I’ll be happy to help and/or support you 
> on your last move toward a completely generic BTL. To facilitate your work I 
> exposed a minimalistic set of OMPI information at the OPAL level. Take a look 
> at opal/util/proc.h for more info, but please try not to expose more.



Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Gilles Gouaillardet
Paul,

i am sorry i missed that.

and you are right, 1.8.1 and 1.8 from svn differs :

from svn (config/ompi_setup_mpi_fortran.m4)
# Per https://svn.open-mpi.org/trac/ompi/ticket/4590, if the
# Fortran compiler doesn't support PROCEDURE in the way we
# want/need, disable the mpi_f08 module.
OMPI_FORTRAN_HAVE_PROCEDURE=0
AS_IF([test $OMPI_WANT_FORTRAN_USEMPIF08_BINDINGS -eq 1 -a \
   $OMPI_BUILD_FORTRAN_USEMPIF08_BINDINGS -eq 1],
  [ # Does the compiler support "procedure"
   OMPI_FORTRAN_CHECK_PROCEDURE(
   [OMPI_FORTRAN_HAVE_PROCEDURE=1],
   [OMPI_FORTRAN_HAVE_PROCEDURE=0
OMPI_BUILD_FORTRAN_USEMPIF08_BINDINGS=0])])

1.8.1 does not disqualify f08 bindings if PROCEDURE is not supported.
/* for the sake of completeness, in some cases, 1.8.1 *might* disqualify
f08 bindings if PROCEDURE *is* supported :
# Per https://svn.open-mpi.org/trac/ompi/ticket/4157, temporarily
# disqualify the fortran compiler if it exhibits the behavior
# described in that ticket.  Short version: OMPI does something
# non-Fortran that we don't have time to fix 1.7.4.  So we just
# disqualify Fortran compilers who actually enforce this issue,
# and we'll fix OMPI to be Fortran-compliant after 1.7.4
AS_IF([test $OMPI_WANT_FORTRAN_USEMPIF08_BINDINGS -eq 1 && \
   test $OMPI_BUILD_FORTRAN_USEMPIF08_BINDINGS -eq 1 && \
   test $OMPI_FORTRAN_HAVE_PROCEDURE -eq 1 && \
   test $OMPI_FORTRAN_HAVE_ABSTRACT -eq 1],
  [ # Check for ticket 4157
   OMPI_FORTRAN_CHECK_TICKET_4157(
   [],
   [ # If we don't have this, don't build the mpi_f08 module
OMPI_BUILD_FORTRAN_USEMPIF08_BINDINGS=0])])


from the sources and #4590, the f08 bindings are intentionally disabled since
the PGI compilers do not support PROCEDURE.
i agree this is really bad for PGI users :-(

Jeff, can you comment on that ?

Cheers,

Gilles

On 2014/07/30 13:25, Paul Hargrove wrote:
> Giles,
>
> If you look more carefully at the output I provided you will see that 1.8.1
> *does* test for PROCEDURE support and finds it lacking.  BOTH outputs
> include:
>  checking if Fortran compiler supports PROCEDURE... no
>
> However in the 1.8.1 case that is apparently not sufficient to disqualify
> building the f08 module.
>
> The test does fail in both 1.8.1 and 1.8.2rc2.
> Here is the related portion of config.log from one of them:
>
> configure:57708: checking if Fortran compiler supports PROCEDURE
> configure:57735: pgf90 -c -g conftest.f90 >&5
> PGF90-S-0155-Illegal procedure interface - mpi_user_function (conftest.f90: 12)
> PGF90-S-0155-Illegal procedure interface - mpi_user_function (conftest.f90: 12)
> 0 inform, 0 warnings, 2 severes, 0 fatal for test_proc
> configure:57735: $? = 2
> configure: failed program was:
> | MODULE proc_mod
> | INTERFACE
> | SUBROUTINE MPI_User_function
> | END SUBROUTINE
> | END INTERFACE
> | END MODULE proc_mod
> |
> | PROGRAM test_proc
> | INTERFACE
> | SUBROUTINE binky(user_fn)
> | USE proc_mod
> | PROCEDURE(MPI_User_function) :: user_fn
> | END SUBROUTINE
> | END INTERFACE
> | END PROGRAM
> configure:57751: result: no
>
> Other than the line numbers the 1.8.1 and 1.8.2rc2 output are identical in
> this respect.
>
> The test also fails run manually:
>
> {hargrove@hopper04 OMPI}$ pgf90 -c -g conftest.f90
> PGF90-S-0155-Illegal procedure interface - mpi_user_function (conftest.f90: 12)
> PGF90-S-0155-Illegal procedure interface - mpi_user_function (conftest.f90: 12)
> 0 inform, 0 warnings, 2 severes, 0 fatal for test_proc
> {hargrove@hopper04 OMPI}$ pgf90 -V
> pgf90 13.10-0 64-bit target on x86-64 Linux -tp shanghai
> The Portland Group - PGI Compilers and Tools
> Copyright (c) 2013, NVIDIA CORPORATION. All rights reserved.
>
> -Paul
>
> On Tue, Jul 29, 2014 at 9:09 PM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>
>>  Paul,
>>
>> from the logs, the only difference i see is about Fortran PROCEDURE.
>>
>> openpmi 1.8 (svn checkout) does not build the usempif08 bindings if
>> PROCEDURE is not supported.
>>
>> from the logs, openmpi 1.8.1 does not check whether PROCEDURE is supported
>> or not
>>
>> here is the sample program to check PROCEDURE (from
>> config/ompi_fortran_check_procedure.m4)
>>
>> MODULE proc_mod
>> INTERFACE
>> SUBROUTINE MPI_User_function
>> END SUBROUTINE
>> END INTERFACE
>> END MODULE proc_mod
>>
>> PROGRAM test_proc
>> INTERFACE
>> SUBROUTINE binky(user_fn)
>>   USE proc_mod
>>   PROCEDURE(MPI_User_function) :: user_fn
>> END SUBROUTINE
>> END INTERFACE
>> END PROGRAM
>>
>> i do not have a PGI license, could you please confirm the PGI compiler
>> fails compiling the test above ?

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread Gilles Gouaillardet
Paul,

from the logs, the only difference i see is about Fortran PROCEDURE.

openpmi 1.8 (svn checkout) does not build the usempif08 bindings if
PROCEDURE is not supported.

from the logs, openmpi 1.8.1 does not check whether PROCEDURE is
supported or not

here is the sample program to check PROCEDURE (from
config/ompi_fortran_check_procedure.m4)

MODULE proc_mod
INTERFACE
SUBROUTINE MPI_User_function
END SUBROUTINE
END INTERFACE
END MODULE proc_mod

PROGRAM test_proc
INTERFACE
SUBROUTINE binky(user_fn)
  USE proc_mod
  PROCEDURE(MPI_User_function) :: user_fn
END SUBROUTINE
END INTERFACE
END PROGRAM

i do not have a PGI license, could you please confirm the PGI compiler
fails compiling the test above ?

Cheers,

Gilles

On 2014/07/30 12:54, Paul Hargrove wrote:
> On Tue, Jul 29, 2014 at 6:38 PM, Paul Hargrove  wrote:
>
>> On Tue, Jul 29, 2014 at 6:33 PM, Paul Hargrove  wrote:
>>
>>> I am trying again with an explicit --enable-mpi-fortran=usempi at
>>> configure time to see what happens.
>>>
>> Of course that should have said --enable-mpi-fortran=usempif08
>>
> I've switched to using PG13.6 for my testing.
> I find that even when I pass that flag I see that use_mpi_f08 is NOT
> enabled:
>
> checking Fortran compiler ignore TKR syntax... not cached; checking variants
> checking for Fortran compiler support of TYPE(*), DIMENSION(*)... no
> checking for Fortran compiler support of !DEC$ ATTRIBUTES NO_ARG_CHECK... no
> checking for Fortran compiler support of !$PRAGMA IGNORE_TKR... no
> checking for Fortran compiler support of !DIR$ IGNORE_TKR... yes
> checking Fortran compiler ignore TKR syntax... 1:real, dimension(*):!DIR$
> IGNORE_TKR
> checking if Fortran compiler supports ISO_C_BINDING... yes
> checking if building Fortran 'use mpi' bindings... yes
> checking if Fortran compiler supports SUBROUTINE BIND(C)... yes
> checking if Fortran compiler supports TYPE, BIND(C)... yes
> checking if Fortran compiler supports TYPE(type), BIND(C, NAME="name")...
> yes
> checking if Fortran compiler supports PROCEDURE... no
> *checking if building Fortran 'use mpi_f08' bindings... no*
>
> Contrast that to openmpi-1.8.1 and the same compiler:
>
> checking Fortran compiler ignore TKR syntax... not cached; checking variants
> checking for Fortran compiler support of TYPE(*), DIMENSION(*)... no
> checking for Fortran compiler support of !DEC$ ATTRIBUTES NO_ARG_CHECK... no
> checking for Fortran compiler support of !$PRAGMA IGNORE_TKR... no
> checking for Fortran compiler support of !DIR$ IGNORE_TKR... yes
> checking Fortran compiler ignore TKR syntax... 1:real, dimension(*):!DIR$
> IGNORE_TKR
> checking if building Fortran 'use mpi' bindings... yes
> checking if Fortran compiler supports ISO_C_BINDING... yes
> checking if Fortran compiler supports SUBROUTINE BIND(C)... yes
> checking if Fortran compiler supports TYPE, BIND(C)... yes
> checking if Fortran compiler supports TYPE(type), BIND(C, NAME="name")...
> yes
> checking if Fortran compiler supports optional arguments... yes
> checking if Fortran compiler supports PRIVATE... yes
> checking if Fortran compiler supports PROTECTED... yes
> checking if Fortran compiler supports ABSTRACT... yes
> checking if Fortran compiler supports ASYNCHRONOUS... yes
> checking if Fortran compiler supports PROCEDURE... no
> checking size of Fortran type(test_mpi_handle)... 4
> checking Fortran compiler F08 assumed rank syntax... not cached; checking
> checking for Fortran compiler support of TYPE(*), DIMENSION(..)... no
> checking Fortran compiler F08 assumed rank syntax... no
> checking which mpi_f08 implementation to build... "good" compiler, no array
> subsections
> *checking if building Fortran 'use mpi_f08' bindings... yes*
>
> So, somewhere between 1.8.1 and 1.8.2rc2 something has happened in the
> configure logic to disqualify the pgf90 compiler.
>
> I also surprised to see 1.8.2rc2 performing *fewer* tests of FC then 1.8.1
> did (unless they moved elsewhere?).
>
> In the end I cannot reproduce the originally reported problem for the
> simple reason that I instead see:
>
> {hargrove@hopper04 openmpi-1.8.2rc2-linux-x86_64-pgi-14.4}$
> ./INST/bin/mpif90 ../test.f
> PGF90-F-0004-Unable to open MODULE file mpi_f08.mod (../test.f: 2)
> PGF90/x86-64 Linux 14.4-0: compilation aborted
>
>
> Tetsuya Mishima,
>
> Is it possible that your installation of 1.8.2rc2 was to the same prefix as
> an older build?
> If that is the case, you may have the mpi_f08.mod from the older build even
> though no f08 support is in the new build.
>
>
> -Paul
>
>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15342.php



Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-18 Thread Gilles Gouaillardet
>
> It would make sense, though I guess I always thought that was part of what
> happened in OBJ_CLASS_INSTANCE - guess I was wrong. My thinking was that
> DEREGISTER would be the counter to INSTANCE, and I do want to keep this
> from getting even more clunky - so maybe renaming INSTANCE to be REGISTER
> and completing the initialization inside it would be the way to go. Or
> renaming DEREGISTER to something more obviously the counter to INSTANCE?
>
>
just so we are clear :

on one hand OBJ_CLASS_INSTANCE is a macro that must be invoked "outside" of
a function :
It *statically* initializes a struct.

on the other hand, OBJ_CLASS_DEREGISTER is a macro that must be invoked
inside a function.

using OBJ_CLASS_REGISTER is not only about renaming, it also requires
moving all these invocations into functions.

my idea of having both OBJ_CLASS_INSTANCE and OBJ_CLASS_REGISTER is :
- we do not need to move OBJ_CLASS_INSTANCE into functions
- we can have two behaviours depending on OPAL_ENABLE_DEBUG :
OBJ_CLASS_REGISTER would simply do nothing if OPAL_ENABLE_DEBUG is zero
(and opal_class_initialize would still be invoked in opal_obj_new). that
could also be a bit faster than having only one OBJ_CLASS_REGISTER macro in
optimized mode.

that being said, i am also fine with simplifying this, remove
OBJ_CLASS_INSTANCE and use OBJ_CLASS_REGISTER and OBJ_CLASS_DEREGISTER


about the bug you hit, did you already solve it and how ?
a trivial workaround is not to dlclose the dynamic library (ok, that's
cheating ...)
a simple workaround (if it is even doable) is to declare the class
"somewhere else" so the (library containing the) class struct is not
dlclose'd before it is invoked (ok, that's ugly ...).

what i wrote earlier was misleading :
OBJ_CLASS_INSTANCE(class);
foo = OBJ_NEW(class);
then
opal_class_t class_class = {...};
foo->super.obj_class = &class_class;

class_class is no more accessible when the OBJ_RELEASE is called since the
library was dlclose'd, so you do not even get a change to invoke the
destructor ...

a possible workaround could be to malloc a copy of class_class, have
foo->super.obj_class point to it after each OBJ_NEW, and finally have its
cls_destruct_array point to NULL when closing the framework/component.
(of course that causes a leak ...)

Cheers,

Gilles
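
For what it is worth, here is a stripped-down sketch of the REGISTER /
DEREGISTER symmetry discussed above. The demo_* names are mine, not the real
OPAL macros, and the object payload is a plain malloc:

#include <assert.h>
#include <stdlib.h>

typedef void (*construct_fn_t)(void *);

typedef struct {
    const char     *cls_name;
    int             cls_initialized;
    construct_fn_t *cls_construct_array;    /* built at register time */
} demo_class_t;

static void demo_class_register(demo_class_t *cls)
{
    /* what opal_class_initialize does lazily today */
    cls->cls_construct_array = calloc(1, sizeof(construct_fn_t));
    cls->cls_initialized = 1;
}

static void demo_class_deregister(demo_class_t *cls)
{
    free(cls->cls_construct_array);
    cls->cls_construct_array = NULL;
    cls->cls_initialized = 0;
}

static void *demo_obj_new(demo_class_t *cls)
{
    /* in a debug build this would be an error instead of a lazy init */
    assert(1 == cls->cls_initialized);
    return malloc(16);
}

int main(void)
{
    demo_class_t cls = { "demo_t", 0, NULL };

    demo_class_register(&cls);       /* at framework/component open  */
    free(demo_obj_new(&cls));
    demo_class_deregister(&cls);     /* at framework/component close */
    return 0;
}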


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-18 Thread Gilles Gouaillardet
+1 for the overall idea !

On Fri, Jul 18, 2014 at 10:17 PM, Ralph Castain  wrote:
>
> * add an OBJ_CLASS_DEREGISTER and require that all instantiations be
> matched by deregister at close of the framework/component that instanced
> it. Of course, that requires that we protect the class system against
> someone releasing/deconstructing an object after the class was deregistered
> since we don't know who might be using that class outside of where it was
> created.
>
my understanding is that in theory, we already have an issue and
fortunately, we do not hit it :
let's consider a framework/component that instantiates a class
(OBJ_CLASS_INSTANCE) *with a destructor*, allocates an object of this class
(OBJ_NEW) and expects "someone else" will free it (OBJ_RELEASE).
if this framework/component ends up in a dynamic library that is dlclose'd
when the framework/component is no longer used, then OBJ_RELEASE will try to
call the destructor, which is no longer accessible (since the lib was
dlclose'd)

i could not experience such a scenario yet, and of course, this does not
mean there is no problem. i experienced a "kind of" similar situation
described in http://www.open-mpi.org/community/lists/devel/2014/06/14937.php

back to OBJ_CLASS_DEREGISTER, what about an OBJ_CLASS_REGISTER in order to
make this symmetric and easier to debug ?

currently, OBJ_CLASS_REGISTER is "implied" the first time an object of a
given class is allocated. from opal_obj_new :
if (0 == cls->cls_initialized) opal_class_initialize(cls);

that could be replaced by an error if 0 == cls->cls_initialized
and OBJ_CLASS_REGISTER would simply call opal_class_initialize

of course, this change could be implemented only when compiled
with OPAL_ENABLE_DEBUG

Cheers,

Gilles


Re: [OMPI devel] Onesided failures

2014-07-17 Thread Gilles Gouaillardet
Rolf,

i committed r2389.

MPI_Win_allocate_shared is now invoked on a single node communicator

Cheers,

Gilles

On 2014/07/16 22:59, Rolf vandeVaart wrote:
> Sounds like a good plan.  Thanks for looking into this Gilles!
>
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles 
> GOUAILLARDET
> Sent: Wednesday, July 16, 2014 9:53 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] Onesided failures
>
>
> Unless I am missing something obvious, I will update the test tomorrow and 
> add a comm split to ensure MPI_Win_allocate_shared is called from a single-node 
> communicator, and skip the test if this is impossible
>



Re: [OMPI devel] Onesided failures

2014-07-16 Thread Gilles GOUAILLARDET
Rolf,

From the man page of MPI_Win_allocate_shared

It is the user's responsibility to ensure that the communicator comm represents 
a group of processes that can create a shared memory segment that can be 
accessed by all processes in the group

And from the mtt logs, you are running 4 tasks on 2 nodes.

Unless I am missing something obvious, I will update the test tomorrow and add 
a comm split to ensure MPI_Win_allocate_shared is called from a single-node 
communicator, and skip the test if this is impossible

Cheers,

Gilles
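
For illustration, here is a minimal sketch of the planned split (not the
actual ibm test): MPI_Comm_split_type with MPI_COMM_TYPE_SHARED yields a
communicator whose ranks live on the same node, which is then passed to
MPI_Win_allocate_shared:

#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm node_comm;
    MPI_Win win;
    int *base;

    MPI_Init(&argc, &argv);

    /* group the ranks that can actually share a memory segment */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
                            node_comm, &base, &win);
    *base = 42;   /* each rank touches its own slice */

    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}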

Rolf vandeVaart  wrote:
>
>
>On both 1.8 and trunk (as Ralph mentioned in meeting) we are seeing three 
>tests fail.
>
>http://mtt.open-mpi.org/index.php?do_redir=2205
>
> 
>
>Ibm/onesided/win_allocate_shared
>
>Ibm/onesided/win_allocated_shared_mpifh
>
>Ibm/onesided/win_allocated_shared_usempi
>
> 
>
>Is there a ticket that covers these failures?
>
> 
>
>Thanks,
>
>Rolf
>
>This email message is for the sole use of the intended recipient(s) and may 
>contain confidential information.  Any unauthorized review, use, disclosure or 
>distribution is prohibited.  If you are not the intended recipient, please 
>contact the sender by reply email and destroy all copies of the original 
>message. 
>


Re: [OMPI devel] RFC: Add an __attribute__((destructor)) function to opal

2014-07-16 Thread Gilles Gouaillardet
Ralph and all,

my understanding is that

opal_finalize_util

aggressively tries to free memory that would otherwise still be allocated.

another way of saying "make valgrind happy" is "fully automated memory
leak detection"
(Joost pointed to the -fsanitize=leak feature of gcc 4.9 in
http://www.open-mpi.org/community/lists/devel/2014/05/14672.php)

the following simple program :

#include <mpi.h>

int main(int argc, char* argv[])
{
  int ret, provided;
  ret = MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
  ret = MPI_T_finalize();
  return 0;
}

leaks a *lot* of objects (and might remove some environment variables as
well) which have been half destroyed by opal_finalize_util, for example :
- classes are still marked as initialized *but* the cls_construct_array
has been free'd
- the oob framework was not deallocated, it is still marked as
MCA_BASE_FRAMEWORK_FLAG_REGISTERED
  but some mca variables were freed, and that will cause problems when
MPI_Init tries to (re)start the tcp component

now my 0.02$ :

ideally, neither MPI_Finalize nor MPI_T_finalize would leak any memory and the
framework would be re-initializable.
this could be a goal and George gave some good explanations on why it is
hard to achieve.
from my pragmatic point of view, and for this test case only, i am very
happy with a simple working solution,
even if it means that MPI_T_finalize leaks way too much memory in order
to work around the non re-initializable framework.

Cheers,

Gilles

On 2014/07/16 12:49, Ralph Castain wrote:
> I've attached a solution that blocks the segfault without requiring any 
> gyrations. Can someone explain why this isn't adequate?
>
> Alternate solution was to simply decrement opal_util_initialized in 
> MPI_T_finalize rather than calling finalize itself. Either way resolves the 
> problem in a very simple manner.
>



Re: [OMPI devel] 100% test failures

2014-07-15 Thread Gilles GOUAILLARDET
r32236 is a suspect

i am afk

I just read the code and a class is initialized with opal_class_initialize the 
first time an object is instantiated with OBJ_NEW

I would simply revert r32236, or update opal_class_finalize to call 
free(cls->cls_construct_array); only if cls->cls_construct_array is not NULL

I hope this helps

Gilles

Ralph Castain  wrote:
>Hi folks
>
>
>The changes to opal_class_finalize are generating 100% segfaults on the trunk:
>
>
>175            free(cls->cls_construct_array);
>
>Missing separate debuginfos, use: debuginfo-install 
>glibc-2.12-1.132.el6_5.2.x86_64 libgcc-4.4.7-4.el6.x86_64 
>numactl-2.0.7-8.el6.x86_64
>
>(gdb) where
>
>#0  0x7f93e3206385 in opal_class_finalize () at class/opal_object.c:175
>
>#1  0x7f93e320b62f in opal_finalize_util () at runtime/opal_finalize.c:110
>
>#2  0x7f93e320b73b in opal_finalize () at runtime/opal_finalize.c:175
>
>#3  0x7f93e350e05f in orte_finalize () at runtime/orte_finalize.c:79
>
>#4  0x004057e2 in orterun (argc=4, argv=0x7fffe27ea718) at 
>orterun.c:1098
>
>#5  0x00403a04 in main (argc=4, argv=0x7fffe27ea718) at main.c:13
>
>
>Can someone please fix this?
>
>Ralph
>
>


Re: [OMPI devel] trunk and fortran errors

2014-07-11 Thread Gilles Gouaillardet
Thanks Jeff,

i confirm the problem is fixed on CentOS 5

i committed r32215 because some files were missing from the
tarball/nightly snapshot/make dist.

Cheers,

Gilles

On 2014/07/11 4:21, Jeff Squyres (jsquyres) wrote:
> As of r32204, this should be fixed.  Please let me know if it now works for 
> you.



Re: [OMPI devel] trunk and fortran errors

2014-07-10 Thread Gilles Gouaillardet
On CentOS 5.x, gfortran is unable to compile this simple program :

subroutine foo ()
  use, intrinsic :: iso_c_binding, only : c_ptr
end subroutine foo

an other workaround is to install gfortran 4.4
(yum install gcc44-gfortran)
and configure with
FC=gfortran44


On 2014/07/09 19:46, Jeff Squyres (jsquyres) wrote:
> This is almost certainly due to r32162 (Fortran commit from last night).
> [...]
> For the moment/as a workaround, use --disable-mpi-fortran in your builds if 
> you are building with an older gfortran.



Re: [OMPI devel] segv in ompi_info

2014-07-09 Thread Gilles Gouaillardet
Mike,

how do you test ?
i cannot reproduce a bug :

if i run ompi_info -a -l 9 | less

and press 'q' at an early stage (e.g. before all the output has been written to
the pipe),
then the less process exits, and ompi_info receives SIGPIPE and crashes (which
is a normal unix behaviour)

now if i press the spacebar until the end of the output (e.g. i get the
(END) message from less)
and then press 'q', then there is no problem.

strace -e signal ompi_info -a -l 9 | true
will cause ompi_info to receive a SIGPIPE

strace -e signal dd if=/dev/zero bs=1M count=1 | true
will cause dd to receive a SIGPIPE

unless i miss something, i would conclude there is no bug

Cheers,

Gilles
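
For completeness, a tiny standalone reproducer of that behaviour: writing to
a pipe whose read end is gone raises SIGPIPE, and the default action kills the
writer, which is exactly what ompi_info hits when the reader exits early.

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fds[2];

    if (0 != pipe(fds)) {
        return 1;
    }
    close(fds[0]);   /* no reader: the read end is closed immediately */

    /* uncomment to survive and get EPIPE from write() instead of dying:
     * signal(SIGPIPE, SIG_IGN); */
    if (write(fds[1], "hello\n", 6) < 0) {
        perror("write");   /* reached only if SIGPIPE is ignored/handled */
    }
    return 0;
}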

On 2014/07/09 19:33, Mike Dubman wrote:
> mxm only intercept signals and prints the stacktrace.
> happens on trunk as well.
> only when "| less" is used.
>
>
>
>
>
>
> On Tue, Jul 8, 2014 at 4:50 PM, Jeff Squyres (jsquyres) 
> wrote:
>
>> I'm unable to replicate.  Please provide more detail...?  Is this a
>> problem in the MXM component?
>>
>> On Jul 8, 2014, at 9:20 AM, Mike Dubman  wrote:
>>
>>>
>>> $/usr/mpi/gcc/openmpi-1.8.2a1/bin/ompi_info -a -l 9|less
>>> Caught signal 13 (Broken pipe)
>>>  backtrace 
>>>  2 0x00054cac mxm_handle_error()
>>  /var/tmp/OFED_topdir/BUILD/mxm-3.2.2883/src/mxm/util/debug/debug.c:653
>>>  3 0x00054e74 mxm_error_signal_handler()
>>  /var/tmp/OFED_topdir/BUILD/mxm-3.2.2883/src/mxm/util/debug/debug.c:628
>>>  4 0x0033fbe32920 killpg()  ??:0
>>>  5 0x0033fbedb650 __write_nocancel()  interp.c:0
>>>  6 0x0033fbe71d53 _IO_file_write@@GLIBC_2.2.5()  ??:0
>>>  7 0x0033fbe73305 _IO_do_write@@GLIBC_2.2.5()  ??:0
>>>  8 0x0033fbe719cd _IO_file_xsputn@@GLIBC_2.2.5()  ??:0
>>>  9 0x0033fbe48410 _IO_vfprintf()  ??:0
>>> 10 0x0033fbe4f40a printf()  ??:0
>>> 11 0x0002bc84 opal_info_out()
>>  
>> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2a1/opal/runtime/opal_info_support.c:853
>>> 12 0x0002c6bb opal_info_show_mca_group_params()
>>  
>> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2a1/opal/runtime/opal_info_support.c:658
>>> 13 0x0002c882 opal_info_show_mca_group_params()
>>  
>> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2a1/opal/runtime/opal_info_support.c:716
>>> 14 0x0002cc13 opal_info_show_mca_params()
>>  
>> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2a1/opal/runtime/opal_info_support.c:742
>>> 15 0x0002d074 opal_info_do_params()
>>  
>> /var/tmp/OFED_topdir/BUILD/openmpi-1.8.2a1/opal/runtime/opal_info_support.c:485
>>> 16 0x0040167b main()  ??:0
>>> 17 0x0033fbe1ecdd __libc_start_main()  ??:0
>>> 18 0x00401349 _start()  ??:0
>>> ===
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/07/15075.php
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/07/15076.php
>>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/07/15080.php



Re: [OMPI devel] centos-7 / rhel-7 build fail (configure fails to recognize g++)

2014-07-07 Thread Gilles Gouaillardet
Olivier,

i was unable to reproduce the issue on a centos7 beta with :

- trunk (latest nightly snapshot)
- 1.8.1
- 1.6.5

the libtool-ltdl-devel package is not installed on this server

that being said, i did not use
--with-verbs
nor
--with-tm

since these packages are not installed on my server.

are you installing from a tarball or from svn/git/hg ?
could you also compress and include the config.log ?

Gilles

On 2014/07/04 22:00, olivier.laha...@free.fr wrote:
> On centos-7 beta, the configure script fails to recognize the g++ compiler. 
> checking for the C++ compiler vendor... unknown 
> checking if C and C++ are link compatible... no 
> ** 
> * It appears that your C++ compiler is unable to link against object 
> * files created by your C compiler. This generally indicates either 
> * a conflict between the options specified in CFLAGS and CXXFLAGS 
> * or a problem with the local compiler installation. More 
> * information (including exactly what command was given to the 
> * compilers and what error resulted when the commands were executed) is 
> * available in the config.log file in this directory. 
> ** 
>
>



Re: [OMPI devel] MPI_Recv_init_null_c from intel test suite fails vs ompi trunk

2014-07-04 Thread Gilles Gouaillardet
Yossi,

thanks for reporting this issue.

i commited r32139 and r32140 to trunk in order to fix this issue (with
MPI_Startall)
and some misc extra bugs.

i also made CMR #4764 for the v1.8 branch (and asked George to review it)

Cheers,

Gilles
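
For illustration, here is a minimal sketch of the case being exercised, a
persistent receive from MPI_PROC_NULL started via MPI_Startall; this is only a
stand-in, not the Intel test itself:

#include <mpi.h>

int main(int argc, char *argv[])
{
    int buf = 0;
    MPI_Request req;

    MPI_Init(&argc, &argv);

    MPI_Recv_init(&buf, 1, MPI_INT, MPI_PROC_NULL, 0,
                  MPI_COMM_WORLD, &req);
    MPI_Startall(1, &req);             /* must accept the null-proc request */
    MPI_Wait(&req, MPI_STATUS_IGNORE); /* completes immediately             */
    MPI_Request_free(&req);

    MPI_Finalize();
    return 0;
}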

On 2014/07/03 22:25, Yossi Etigin wrote:
> Looks like this has to be fixed also for MPI_Startall, right?
>
>



Re: [OMPI devel] trunk broken

2014-06-25 Thread Gilles Gouaillardet
Mike,

could you try again with

OMPI_MCA_btl=vader,self,openib

it seems the sm module causes a hang
(which later causes the timeout sending a SIGSEGV)

Cheers,

Gilles

On 2014/06/25 14:22, Mike Dubman wrote:
> Hi,
> The following commit broke trunk in jenkins:
>
 Per the OMPI developer conference, remove the last vestiges of
> OMPI_USE_PROGRESS_THREADS
>
> *22:15:09* + 
> LD_LIBRARY_PATH=/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/lib*22:15:09*
> + OMPI_MCA_scoll_fca_enable=1*22:15:09* +
> OMPI_MCA_scoll_fca_np=0*22:15:09* + OMPI_MCA_pml=ob1*22:15:09* +
> OMPI_MCA_btl=sm,self,openib*22:15:09* + OMPI_MCA_spml=yoda*22:15:09* +
> OMPI_MCA_memheap_mr_interleave_factor=8*22:15:09* +
> OMPI_MCA_memheap=ptmalloc*22:15:09* +
> OMPI_MCA_btl_openib_if_include=mlx4_0:1*22:15:09* +
> OMPI_MCA_rmaps_base_dist_hca=mlx4_0*22:15:09* +
> OMPI_MCA_memheap_base_hca_name=mlx4_0*22:15:09* +
> OMPI_MCA_rmaps_base_mapping_policy=dist:mlx4_0*22:15:09* +
> MXM_RDMA_PORTS=mlx4_0:1*22:15:09* +
> SHMEM_SYMMETRIC_HEAP_SIZE=1024M*22:15:09* + timeout -s SIGSEGV 3m
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/oshm_install2/bin/oshrun
> -np 8 
> /scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/examples/hello_shmem*22:15:09*
> [vegas12:08101] *** Process received signal 22:15:09*
> [vegas12:08101] Signal: Segmentation fault (11)*22:15:09*
> [vegas12:08101] Signal code: Address not mapped (1)*22:15:09*
> [vegas12:08101] Failing at address: (nil)*22:15:09* [vegas12:08101] [
>



Re: [OMPI devel] MPI_Comm_spawn fails under certain conditions

2014-06-25 Thread Gilles Gouaillardet
Hi Ralph,

On 2014/06/25 2:51, Ralph Castain wrote:
> Had a chance to review this with folks here, and we think that having
> oversubscribe automatically set overload makes some sense. However, we do
> want to retain the ability to separately specify oversubscribe and overload
> as well since these two terms don't mean quite the same thing.
>
> Our proposal, therefore, is to have the --oversubscribe flag set both the
> --map-by :oversubscribe and --bind-to :overload-allowed properties. If
> someone specifies both the --oversubscribe flag and a conflicting directive
> for one or both of the individual properties, then we'll error out with a
> "bozo" message.
i fully agree.
> The use-cases you describe are (minus the crash) correct as the warning
> only is emitted when you are overloaded (i.e., trying to bind to more cpus
> than you have). So you won't get any warning when running on three nodes as
> you have enough cpus for all the procs, etc.
>
> I'll investigate the crash once I get home and have access to a cluster
> again. The problem likely has to do with not properly responding to the
> failure to spawn.
humm

because you already made the change described above (r32072), the crash
no longer occurs.

about the crash, i see things the other way around : spawn should not
have failed.
/* or spawn should have failed when running on a single node, at least
for the sake of consistency */

but like i said, it works now, so it might be just pedantic to point out a
bug that is still here but that cannot be triggered ...

Cheers,

Gilles


Re: [OMPI devel] OMPI devel] RFC: semantic change of opal_hwloc_base_get_relative_locality

2014-06-24 Thread Gilles Gouaillardet
Ralph,

i pushed the change (r32079) and updated the wiki.

the RFC can now be closed, and the consensus is that the semantic of
opal_hwloc_base_get_relative_locality
will not be changed since this is not needed : the hang is a coll/ml
bug, so it will be fixed within coll/ml.

Cheers,

Gilles

On 2014/06/25 1:12, Ralph Castain wrote:
> Yeah, we should make that change, if you wouldn't mind doing it.
>
>
>
> On Tue, Jun 24, 2014 at 9:43 AM, Gilles GOUAILLARDET <
> gilles.gouaillar...@gmail.com> wrote:
>
>> Ralph,
>>
>> That makes perfect sense.
>>
>> What about FCA_IS_LOCAL_PROCESS ?
>> Shall we keep it or shall we use OPAL_PROC_ON_LOCAL_NODE directly ?
>>
>> Cheers
>>
>> Gilles
>>
>> Ralph Castain <r...@open-mpi.org> wrote:
>> Hi Gilles
>>
>> We discussed this at the devel conference this morning. The root cause of
>> the problem is a test in coll/ml that we feel is incorrect - it basically
>> checks to see if the proc itself is bound, and then assumes that all other
>> procs are similarly bound. This in fact is never guaranteed to be true as
>> someone could use the rank_file method to specify that some procs are to be
>> left unbound, while others are to be bound to specified cpus.
>>
>> Nathan has looked at that check before and believes it isn't necessary.
>> All coll/ml really needs to know is that the two procs share the same node,
>> and the current locality algorithm will provide that information. We have
>> asked him to "fix" the coll/ml selection logic to resolve that situation.
>>
>> After then discussing the various locality definitions, it was our feeling
>> that the current definition is probably the better one unless you have a
>> reason for changing it other than coll/ml. If so, we'd be happy to revisit
>> the proposal.
>>
>> Make sense?
>> Ralph
>>
>>
>>
>> On Tue, Jun 24, 2014 at 3:24 AM, Gilles Gouaillardet <
>> gilles.gouaillar...@iferc.org> wrote:
>>
>>> WHAT: semantic change of opal_hwloc_base_get_relative_locality
>>>
>>> WHY:  make is closer to what coll/ml expects.
>>>
>>>   Currently, opal_hwloc_base_get_relative_locality means "at what
>>> level do these procs share cpus"
>>>   however, coll/ml is using it as "at what level are these procs
>>> commonly bound".
>>>
>>>   it is important to note that if a task is bound to all the
>>> available cpus, locality should
>>>   be set to OPAL_PROC_ON_NODE only.
>>>   /* e.g. on a single socket Sandy Bridge system, use
>>> OPAL_PROC_ON_NODE instead of OPAL_PROC_ON_L3CACHE */
>>>
>>>   This has been initially discussed in the devel mailing list
>>>   http://www.open-mpi.org/community/lists/devel/2014/06/15030.php
>>>
>>>   as advised by Ralph, i browsed the source code looking for how the
>>> (ompi_proc_t *)->proc_flags is used.
>>>   so far, it is mainly used to figure out wether the proc is on the
>>> same node or not.
>>>
>>>   notable exceptions are :
>>>a) ompi/mca/sbgp/basesmsocket/sbgp_basesmsocket_component.c :
>>> OPAL_PROC_ON_LOCAL_SOCKET
>>>b) ompi/mca/coll/fca/coll_fca_module.c and
>>> oshmem/mca/scoll/fca/scoll_fca_module.c : FCA_IS_LOCAL_PROCESS
>>>
>>>   about a) the new definition fixes a hang in coll/ml
>>>   about b) FCA_IS_LOCAL_SOCKET looks like legacy code /* i could only
>>> found OMPI_PROC_FLAG_LOCAL in v1.3 */
>>>   so this macro can be simply removed and replaced with
>>> OPAL_PROC_ON_LOCAL_NODE
>>>
>>>   at this stage, i cannot find any objection not to do the described
>>> change.
>>>   please report if any and/or feel free to comment.
>>>
>>> WHERE: see the two attached patches
>>>
>>> TIMEOUT: June 30th, after the Open MPI developers meeting in Chicago,
>>> June 24-26.
>>>  The RFC will become final only after the meeting.
>>>  /* Ralph already added this topic to the agenda */
>>>
>>> Thanks
>>>
>>> Gilles
>>>
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/06/15046.php
>>>
>>
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/06/15049.php
>>
>
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/06/15050.php



Re: [OMPI devel] OMPI devel] RFC: semantic change of opal_hwloc_base_get_relative_locality

2014-06-24 Thread Gilles GOUAILLARDET
Ralph,

That makes perfect sense.

What about FCA_IS_LOCAL_PROCESS ?
Shall we keep it or shall we use OPAL_PROC_ON_LOCAL_NODE directly ?

Cheers

Gilles

Ralph Castain <r...@open-mpi.org> wrote:
>Hi Gilles
>
>
>We discussed this at the devel conference this morning. The root cause of the 
>problem is a test in coll/ml that we feel is incorrect - it basically checks 
>to see if the proc itself is bound, and then assumes that all other procs are 
>similarly bound. This in fact is never guaranteed to be true as someone could 
>use the rank_file method to specify that some procs are to be left unbound, 
>while others are to be bound to specified cpus.
>
>
>Nathan has looked at that check before and believes it isn't necessary. All 
>coll/ml really needs to know is that the two procs share the same node, and 
>the current locality algorithm will provide that information. We have asked 
>him to "fix" the coll/ml selection logic to resolve that situation.
>
>
>After then discussing the various locality definitions, it was our feeling 
>that the current definition is probably the better one unless you have a 
>reason for changing it other than coll/ml. If so, we'd be happy to revisit the 
>proposal.
>
>
>Make sense?
>
>Ralph
>
>
>
>
>On Tue, Jun 24, 2014 at 3:24 AM, Gilles Gouaillardet 
><gilles.gouaillar...@iferc.org> wrote:
>
>WHAT: semantic change of opal_hwloc_base_get_relative_locality
>
>WHY:  make is closer to what coll/ml expects.
>
>      Currently, opal_hwloc_base_get_relative_locality means "at what level do 
>these procs share cpus"
>      however, coll/ml is using it as "at what level are these procs commonly 
>bound".
>
>      it is important to note that if a task is bound to all the available 
>cpus, locality should
>      be set to OPAL_PROC_ON_NODE only.
>      /* e.g. on a single socket Sandy Bridge system, use OPAL_PROC_ON_NODE 
>instead of OPAL_PROC_ON_L3CACHE */
>
>      This has been initially discussed in the devel mailing list
>      http://www.open-mpi.org/community/lists/devel/2014/06/15030.php
>
>      as advised by Ralph, i browsed the source code looking for how the 
>(ompi_proc_t *)->proc_flags is used.
>      so far, it is mainly used to figure out wether the proc is on the same 
>node or not.
>
>      notable exceptions are :
>       a) ompi/mca/sbgp/basesmsocket/sbgp_basesmsocket_component.c : 
>OPAL_PROC_ON_LOCAL_SOCKET
>       b) ompi/mca/coll/fca/coll_fca_module.c and 
>oshmem/mca/scoll/fca/scoll_fca_module.c : FCA_IS_LOCAL_PROCESS
>
>      about a) the new definition fixes a hang in coll/ml
>      about b) FCA_IS_LOCAL_SOCKET looks like legacy code /* i could only 
>found OMPI_PROC_FLAG_LOCAL in v1.3 */
>      so this macro can be simply removed and replaced with 
>OPAL_PROC_ON_LOCAL_NODE
>
>      at this stage, i cannot find any objection not to do the described 
>change.
>      please report if any and/or feel free to comment.
>
>WHERE: see the two attached patches
>
>TIMEOUT: June 30th, after the Open MPI developers meeting in Chicago, June 
>24-26.
>         The RFC will become final only after the meeting.
>         /* Ralph already added this topic to the agenda */
>
>Thanks
>
>Gilles
>
>
>___
>devel mailing list
>de...@open-mpi.org
>Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>Link to this post: 
>http://www.open-mpi.org/community/lists/devel/2014/06/15046.php
>
>


[OMPI devel] MPI_Comm_spawn fails under certain conditions

2014-06-24 Thread Gilles Gouaillardet
Folks,

this issue is related to the failures reported by mtt on the trunk when
the ibm test suite invokes MPI_Comm_spawn.

my test bed is made of 3 (virtual) machines with 2 sockets and 8 cpus
per socket each.

if i run on one host (without any batch manager)

mpirun -np 16 --host slurm1 --oversubscribe --mca coll ^ml
./intercomm_create

then the test is a success with the following warning  :

--
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to: CORE
   Node:slurm2
   #processes:  2
   #cpus:   1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--


now if i run on three hosts

mpirun -np 16 --host slurm1,slurm2,slurm3 --oversubscribe --mca coll ^ml
./intercomm_create

then the test is a success without any warning


but now, if i run on two hosts

mpirun -np 16 --host slurm1,slurm2 --oversubscribe --mca coll ^ml
./intercomm_create

then the test is a failure.

first, i get the following same warning :

--
A request was made to bind to that would result in binding more
processes than cpus on a resource:

   Bind to: CORE
   Node:slurm2
   #processes:  2
   #cpus:   1

You can override this protection by adding the "overload-allowed"
option to your binding directive.
--

followed by a crash

[slurm1:2482] *** An error occurred in MPI_Comm_spawn
[slurm1:2482] *** reported by process [2068512769,0]
[slurm1:2482] *** on communicator MPI_COMM_WORLD
[slurm1:2482] *** MPI_ERR_SPAWN: could not spawn processes
[slurm1:2482] *** MPI_ERRORS_ARE_FATAL (processes in this communicator
will now abort,
[slurm1:2482] ***and potentially your MPI job)


that being said, the following command works :

mpirun -np 16 --host slurm1,slurm2 --mca coll ^ml --bind-to none
./intercomm_create


1) what does the first message mean ?
is it a warning ? /* if yes, why does mpirun on two hosts fail ? */
is it a fatal error ? /* if yes, why does mpirun on one host succeed ? */

2) generally speaking, and assuming the first message is a warning,
should --oversubscribe automatically set overload-allowed ?
/* as far as i am concerned, that would be much more intuitive */

Cheers,

Gilles
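
For reference, a minimal MPI_Comm_spawn program of the kind the ibm test
exercises (only a stand-in for intercomm_create, not the actual test; it
assumes argv[0] is a path the spawned processes can also resolve):

#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm parent, child;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (MPI_COMM_NULL == parent) {
        /* parent side: spawn one extra copy of ourselves */
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
        MPI_Comm_disconnect(&child);
    } else {
        /* spawned side */
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}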



[OMPI devel] RFC: semantic change of opal_hwloc_base_get_relative_locality

2014-06-24 Thread Gilles Gouaillardet
WHAT: semantic change of opal_hwloc_base_get_relative_locality

WHY:  make it closer to what coll/ml expects.

  Currently, opal_hwloc_base_get_relative_locality means "at what level do 
these procs share cpus"
  however, coll/ml is using it as "at what level are these procs commonly 
bound".

  it is important to note that if a task is bound to all the available 
cpus, locality should
  be set to OPAL_PROC_ON_NODE only.
  /* e.g. on a single socket Sandy Bridge system, use OPAL_PROC_ON_NODE 
instead of OPAL_PROC_ON_L3CACHE */

  This has been initially discussed in the devel mailing list
  http://www.open-mpi.org/community/lists/devel/2014/06/15030.php

  as advised by Ralph, i browsed the source code looking for how the 
(ompi_proc_t *)->proc_flags is used.
  so far, it is mainly used to figure out whether the proc is on the same 
node or not.

  notable exceptions are :
   a) ompi/mca/sbgp/basesmsocket/sbgp_basesmsocket_component.c : 
OPAL_PROC_ON_LOCAL_SOCKET
   b) ompi/mca/coll/fca/coll_fca_module.c and 
oshmem/mca/scoll/fca/scoll_fca_module.c : FCA_IS_LOCAL_PROCESS

  about a) the new definition fixes a hang in coll/ml
  about b) FCA_IS_LOCAL_PROCESS looks like legacy code /* i could only find 
OMPI_PROC_FLAG_LOCAL in v1.3 */
  so this macro can be simply removed and replaced with 
OPAL_PROC_ON_LOCAL_NODE

  at this stage, i cannot find any objection to the described change.
  please report one if you see any, and/or feel free to comment.

WHERE: see the two attached patches

TIMEOUT: June 30th, after the Open MPI developers meeting in Chicago, June 
24-26.
 The RFC will become final only after the meeting.
 /* Ralph already added this topic to the agenda */

Thanks

Gilles

Index: opal/mca/hwloc/base/hwloc_base_util.c
===
--- opal/mca/hwloc/base/hwloc_base_util.c   (revision 32067)
+++ opal/mca/hwloc/base/hwloc_base_util.c   (working copy)
@@ -13,6 +13,8 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC.
  * All rights reserved.
  * Copyright (c) 2013-2014 Intel, Inc. All rights reserved.
+ * Copyright (c) 2014  Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -1315,8 +1317,7 @@
 hwloc_cpuset_t avail;
 bool shared;
 hwloc_obj_type_t type;
-int sect1, sect2;
-hwloc_cpuset_t loc1, loc2;
+hwloc_cpuset_t loc1, loc2, loc;

 /* start with what we know - they share a node on a cluster
  * NOTE: we may alter that latter part as hwloc's ability to
@@ -1337,6 +1338,19 @@
 hwloc_bitmap_list_sscanf(loc1, cpuset1);
 loc2 = hwloc_bitmap_alloc();
 hwloc_bitmap_list_sscanf(loc2, cpuset2);
+loc = hwloc_bitmap_alloc();
+hwloc_bitmap_or(loc, loc1, loc2);
+
+width = hwloc_get_nbobjs_by_depth(topo, 0);
+for (w = 0; w < width; w++) {
+obj = hwloc_get_obj_by_depth(topo, 0, w);
+avail = opal_hwloc_base_get_available_cpus(topo, obj);
+if ( hwloc_bitmap_isequal(avail, loc) ) {
+/* the task is bound to all the node cpus,
+   return without digging further */
+goto out;
+}
+}

 /* start at the first depth below the top machine level */
 for (d=1; d < depth; d++) {
@@ -1362,11 +1376,8 @@
 obj = hwloc_get_obj_by_depth(topo, d, w);
 /* get the available cpuset for this obj */
 avail = opal_hwloc_base_get_available_cpus(topo, obj);
-/* see if our locations intersect with it */
-sect1 = hwloc_bitmap_intersects(avail, loc1);
-sect2 = hwloc_bitmap_intersects(avail, loc2);
-/* if both intersect, then we share this level */
-if (sect1 && sect2) {
+/* see if our locations is included */
+if ( hwloc_bitmap_isincluded(loc, avail) ) {
 shared = true;
 switch(obj->type) {
 case HWLOC_OBJ_NODE:
@@ -1410,9 +1421,11 @@
 }
 }

+out:
 opal_output_verbose(5, opal_hwloc_base_framework.framework_output,
 "locality: %s",
 opal_hwloc_base_print_locality(locality));
+hwloc_bitmap_free(loc);
 hwloc_bitmap_free(loc1);
 hwloc_bitmap_free(loc2);

Index: oshmem/mca/scoll/fca/scoll_fca.h
===
--- oshmem/mca/scoll/fca/scoll_fca.h(revision 32067)
+++ oshmem/mca/scoll/fca/scoll_fca.h(working copy)
@@ -1,12 +1,14 @@
 /**
- *   Copyright (c) 2013  Mellanox Technologies, Inc.
- *   All rights reserved.
- * $COPYRIGHT$
+ * Copyright (c) 2013  Mellanox Technologies, Inc.
+ * All rights reserved.
+ * Copyright (c) 2014  Research 

Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-20 Thread Gilles Gouaillardet
Ralph,

Attached is a patch that fixes/works around my issue.
it is more of a proof of concept, so i did not commit it to the trunk.

basically:

opal_hwloc_base_get_relative_locality(topo, set1, set2)
sets the locality based on the deepest element that is part of both set1 and
set2.
in my case, set2 means "all the available cpus", which is why the subroutine
returns OPAL_PROC_ON_HWTHREAD.

the patch uses opal_hwloc_base_get_relative_locality2 instead:
if one of the cpusets means "all the available cpus", then the subroutine
simply returns OPAL_PROC_ON_NODE.
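
to make that rule concrete, here is a rough standalone sketch of the extra
check /* plain hwloc only; the attached patch does the equivalent with the
opal types and helpers, so the helper name below is just illustrative */ :

#include <hwloc.h>
#include <stdbool.h>

/* illustrative helper: does this cpuset string cover every cpu of the node ? */
static bool covers_whole_node(hwloc_topology_t topo, const char *cpuset_str)
{
    bool full;
    hwloc_bitmap_t set = hwloc_bitmap_alloc();

    hwloc_bitmap_list_sscanf(set, cpuset_str);
    /* true when the whole machine cpuset is included in this binding */
    full = hwloc_bitmap_isincluded(hwloc_topology_get_topology_cpuset(topo), set);
    hwloc_bitmap_free(set);
    return full;
}

/* opal_hwloc_base_get_relative_locality2(topo, cpuset1, cpuset2) would then
 * start with:
 *     if (covers_whole_node(topo, cpuset1) || covers_whole_node(topo, cpuset2))
 *         return OPAL_PROC_ON_NODE;
 * and otherwise fall back to the existing deepest-common-level logic */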

i am puzzled whether this is a bug in opal_hwloc_base_get_relative_locality,
or in proc.c, which maybe should not call this subroutine since it does not
do what is expected here.

Cheers,

Gilles

On 2014/06/20 13:59, Gilles Gouaillardet wrote:
> Ralph,
>
> my test VM is single socket four cores.
> here is something odd i just found when running mpirun -np 2
> intercomm_create.
> tasks [0,1] are bound on cpus [0,1] => OK
> tasks[2-3] (first spawn) are bound on cpus [2,3] => OK
> tasks[4-5] (second spawn) are not bound (and cpuset is [0-3]) => OK
>
> in ompi_proc_set_locality (ompi/proc/proc.c:228) on task 0
> locality =
> opal_hwloc_base_get_relative_locality(opal_hwloc_topology,
> 
> ompi_process_info.cpuset,
> 
> cpu_bitmap);
> where
> ompi_process_info.cpuset is "0"
> cpu_bitmap is "0-3"
>
> and locality is set to OPAL_PROC_ON_HWTHREAD (!)
>
> is this correct ?
>
> i would have expected OPAL_PROC_ON_L2CACHE (since there is a single L2
> cache on my vm,
> as reported by lstopo) or even OPAL_PROC_LOCALITY_UNKNOWN
>
> then in mca_coll_ml_comm_query (ompi/mca/coll/ml/coll_ml_module.c:2899)
> the module
> disqualifies itself if !ompi_rte_proc_bound.
> if locality were previously set to OPAL_PROC_LOCALITY_UNKNOWN, coll/ml
> could checked the flag
> of all the procs of the communicator and disqualify itself if at least
> one of them is OPAL_PROC_LOCALITY_UNKNOWN.
>
>
> as you wrote, there might be a bunch of other corner cases.
> that being said, i ll try to write a simple proof of concept and see it
> this specific hang can be avoided
>
> Cheers,
>
> Gilles
>
> On 2014/06/20 12:08, Ralph Castain wrote:
>> It is related, but it means that coll/ml has a higher degree of sensitivity 
>> to the binding pattern than what you reported (which was that coll/ml 
>> doesn't work with unbound processes). What we are now seeing is that coll/ml 
>> also doesn't work when processes are bound across sockets.
>>
>> Which means that Nathan's revised tests are going to have to cover a lot 
>> more corner cases. Our locality flags don't currently include 
>> "bound-to-multiple-sockets", and I'm not sure how he is going to easily 
>> resolve that case.
>>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/06/15036.php

Index: opal/mca/hwloc/base/base.h
===
--- opal/mca/hwloc/base/base.h  (revision 32056)
+++ opal/mca/hwloc/base/base.h  (working copy)
@@ -1,6 +1,8 @@
 /*
  * Copyright (c) 2011-2012 Cisco Systems, Inc.  All rights reserved.
  * Copyright (c) 2013-2014 Intel, Inc. All rights reserved.
+ * Copyright (c) 2014  Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  * 
  * Additional copyrights may follow
@@ -86,6 +88,9 @@
 OPAL_DECLSPEC opal_hwloc_locality_t 
opal_hwloc_base_get_relative_locality(hwloc_topology_t topo,
   char 
*cpuset1, char *cpuset2);

+OPAL_DECLSPEC opal_hwloc_locality_t 
opal_hwloc_base_get_relative_locality2(hwloc_topology_t topo,
+  char 
*cpuset1, char *cpuset2);
+
 OPAL_DECLSPEC int opal_hwloc_base_set_binding_policy(opal_binding_policy_t 
*policy, char *spec);

 /**
Index: opal/mca/hwloc/base/hwloc_base_util.c
===
--- opal/mca/hwloc/base/hwloc_base_util.c   (revision 32056)
+++ opal/mca/hwloc/base/hwloc_base_util.c   (working copy)
@@ -13,6 +13,8 @@
  * Copyright (c) 2012-2013 Los Alamos National Security, LLC.
  * All rights reserved.
  * Copyright (c) 2013-2014 Intel, Inc. All rights reserved.
+ * Copyright (c) 2014  Research Org

Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-20 Thread Gilles Gouaillardet
Ralph,

my test VM has a single socket with four cores.
here is something odd i just found when running mpirun -np 2
intercomm_create.
tasks [0,1] are bound on cpus [0,1] => OK
tasks[2-3] (first spawn) are bound on cpus [2,3] => OK
tasks[4-5] (second spawn) are not bound (and cpuset is [0-3]) => OK

in ompi_proc_set_locality (ompi/proc/proc.c:228) on task 0
locality =
opal_hwloc_base_get_relative_locality(opal_hwloc_topology,

ompi_process_info.cpuset,

cpu_bitmap);
where
ompi_process_info.cpuset is "0"
cpu_bitmap is "0-3"

and locality is set to OPAL_PROC_ON_HWTHREAD (!)

is this correct?

i would have expected OPAL_PROC_ON_L2CACHE (since there is a single L2
cache on my vm,
as reported by lstopo) or even OPAL_PROC_LOCALITY_UNKNOWN
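
for what it is worth, the behaviour can be mimicked outside of ompi with the
small hwloc program below /* this is only a standalone approximation of the
deepest-shared-level logic, not the actual opal routine */ :

#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_bitmap_t a = hwloc_bitmap_alloc();
    hwloc_bitmap_t b = hwloc_bitmap_alloc();
    hwloc_obj_t obj, deepest = NULL;
    int d, w, depth, nobjs;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    hwloc_bitmap_list_sscanf(a, "0");    /* ompi_process_info.cpuset */
    hwloc_bitmap_list_sscanf(b, "0-3");  /* cpu_bitmap of the spawned tasks */

    /* walk down from just below the machine level: the deepest object whose
       cpuset intersects both inputs is the PU holding cpu 0, i.e. a hwthread */
    depth = (int) hwloc_topology_get_depth(topo);
    for (d = 1; d < depth; d++) {
        nobjs = (int) hwloc_get_nbobjs_by_depth(topo, d);
        for (w = 0; w < nobjs; w++) {
            obj = hwloc_get_obj_by_depth(topo, d, w);
            if (hwloc_bitmap_intersects(obj->cpuset, a) &&
                hwloc_bitmap_intersects(obj->cpuset, b)) {
                deepest = obj;
            }
        }
    }
    if (NULL != deepest) {
        printf("deepest shared level: %s\n",
               hwloc_obj_type_string(deepest->type));
    }

    hwloc_bitmap_free(a);
    hwloc_bitmap_free(b);
    hwloc_topology_destroy(topo);
    return 0;
}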

then in mca_coll_ml_comm_query (ompi/mca/coll/ml/coll_ml_module.c:2899)
the module
disqualifies itself if !ompi_rte_proc_bound.
if locality were previously set to OPAL_PROC_LOCALITY_UNKNOWN, coll/ml
could check the flag
of all the procs of the communicator and disqualify itself if at least
one of them is OPAL_PROC_LOCALITY_UNKNOWN.
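
the kind of extra check i have in mind would look like the sketch below
/* the helper name and its placement are assumptions, i did not check the
coll/ml query and error handling paths */ :

#include <stdbool.h>
#include "ompi/communicator/communicator.h"
#include "ompi/proc/proc.h"

/* return false when at least one peer of the communicator has an unknown
 * locality, so that mca_coll_ml_comm_query() could disqualify the module */
static bool ml_all_localities_are_known(ompi_communicator_t *comm)
{
    int i;

    for (i = 0; i < ompi_comm_size(comm); i++) {
        ompi_proc_t *proc = ompi_comm_peer_lookup(comm, i);
        if (OPAL_PROC_LOCALITY_UNKNOWN == proc->proc_flags) {
            return false;
        }
    }
    return true;
}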


as you wrote, there might be a bunch of other corner cases.
that being said, i'll try to write a simple proof of concept and see if
this specific hang can be avoided

Cheers,

Gilles

On 2014/06/20 12:08, Ralph Castain wrote:
> It is related, but it means that coll/ml has a higher degree of sensitivity 
> to the binding pattern than what you reported (which was that coll/ml doesn't 
> work with unbound processes). What we are now seeing is that coll/ml also 
> doesn't work when processes are bound across sockets.
>
> Which means that Nathan's revised tests are going to have to cover a lot more 
> corner cases. Our locality flags don't currently include 
> "bound-to-multiple-sockets", and I'm not sure how he is going to easily 
> resolve that case.
>



Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-20 Thread Gilles Gouaillardet
Ralph and Tetsuya,

is this related to the hang i reported at
http://www.open-mpi.org/community/lists/devel/2014/06/14975.php ?

Nathan already replied he is working on a fix.

Cheers,

Gilles


On 2014/06/20 11:54, Ralph Castain wrote:
> My guess is that the coll/ml component may have problems with binding a 
> single process across multiple cores like that - it might be that we'll have 
> to have it check for that condition and disqualify itself. It is a 
> particularly bad binding pattern, though, as shared memory gets completely 
> messed up when you split that way.
>



[OMPI devel] v1.8 cannot compile since r31979

2014-06-10 Thread Gilles Gouaillardet
Folks,

in mca_oob_tcp_component_hop_unknown, the local variable bpr is not
declared, which prevents v1.8 from compiling.

/* there used to be a local variable called pr; it seems it was removed
instead of being renamed to bpr */

the attached patch fixes this issue.

Cheers,

Gilles
Index: orte/mca/oob/tcp/oob_tcp_component.c
===
--- orte/mca/oob/tcp/oob_tcp_component.c	(revision 31980)
+++ orte/mca/oob/tcp/oob_tcp_component.c	(working copy)
@@ -928,6 +928,7 @@
 mca_oob_tcp_msg_error_t *mop = (mca_oob_tcp_msg_error_t*)cbdata;
 uint64_t ui64;
 orte_rml_send_t *snd;
+orte_oob_base_peer_t *bpr;
 
 opal_output_verbose(OOB_TCP_DEBUG_CONNECT, orte_oob_base_framework.framework_output,
 "%s tcp:unknown hop called for peer %s",

