[OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-28 Thread tmishima
Hi folks, I tried to build openmpi-1.8.2rc2 with PGI-14.7 and execute a sample program. Then, it causes linking error: [mishima@manage work]$ cat test.f program hello_world use mpi_f08 implicit none type(MPI_Comm) :: comm integer :: myid, npes, ierror integer

Re: [OMPI devel] === CREATE FAILURE (trunk) ===

2014-07-28 Thread Ralph Castain
Fix is coming On Jul 28, 2014, at 6:11 PM, MPI Team wrote: > > ERROR: Command returned a non-zero exist status (trunk): > make -j 8 distcheck > > Start time: Mon Jul 28 21:05:01 EDT 2014 > End time: Mon Jul 28 21:11:02 EDT 2014 > > =

Re: [OMPI devel] opal_config_bottom.h problem with trunk

2014-07-28 Thread Jeff Squyres (jsquyres)
Nope, haven't seen that before... On Jul 28, 2014, at 6:43 PM, Pritchard Jr., Howard wrote: > Hi Folks, > > I was feeling lucky and decided to do a fresh svn checkout of trunk and simple > ./autogen.pl, ./configure > make on an opensuse 13.1. > > I get a blowup in opal_config_bottom.h: > > p

[OMPI devel] opal_config_bottom.h problem with trunk

2014-07-28 Thread Pritchard Jr., Howard
Hi Folks, I was feeling lucky and decided to do a fresh svn checkout of trunk and a simple ./autogen.pl, ./configure, make on openSUSE 13.1. I get a blowup in opal_config_bottom.h: pp@hagel-vm:~>../../opal/include/opal_config_bottom.h:383:38: error: expected declaration specifiers or '...' before

Re: [OMPI devel] Trunk fails to build with --disable-dlopen

2014-07-28 Thread Nathan Hjelm
Ok, got --disable-dlopen working again. I removed the code in question and changed how coll/sm shares the segment data. -Nathan On Mon, Jul 28, 2014 at 02:41:37PM -0600, Nathan Hjelm wrote: > > Or pull it into coll/sm. Though I think we can do better here since > point-to-point messaging can be

Re: [OMPI devel] Trunk fails to build with --disable-dlopen

2014-07-28 Thread Nathan Hjelm
Or pull it into coll/sm. Though I think we can do better here since point-to-point messaging can be used in coll/sm. We can use the netpatterns code to share the segment information. -Nathan On Mon, Jul 28, 2014 at 08:37:15PM +, Jeff Squyres (jsquyres) wrote: > Perhaps that RML code can go b

Re: [OMPI devel] Trunk fails to build with --disable-dlopen

2014-07-28 Thread Jeff Squyres (jsquyres)
Perhaps that RML code can go back up in ompi/common/sm...? (since only ompi/coll/sm uses it) On Jul 28, 2014, at 4:34 PM, Nathan Hjelm wrote: > > Damn, spoke too soon. coll/sm uses it: > > ./ompi/mca/coll/sm/coll_sm_module.c: > mca_common_sm_init_group(comm->c_local_group, size, ful

Re: [OMPI devel] Trunk fails to build with --disable-dlopen

2014-07-28 Thread Nathan Hjelm
Damn, spoke too soon. coll/sm uses it: ./ompi/mca/coll/sm/coll_sm_module.c: mca_common_sm_init_group(comm->c_local_group, size, fullpath, ./ompi/mca/coll/sm/coll_sm_module.c: "coll:sm:enable:bootstrap comm (%d/%s): mca_common_sm_init_group failed", Let me se

Re: [OMPI devel] Trunk fails to build with --disable-dlopen

2014-07-28 Thread Nathan Hjelm
Looks like you are correct. The function that calls the rml code is mca_common_sm_init which is no longer called by anything (other than mca_common_sm_init_group.. which isn't called either). Let me see if I can fix this. I need this build working again with --disable-dlopen. mu-fey:/usr/projects

Re: [OMPI devel] Trunk fails to build with --disable-dlopen

2014-07-28 Thread Ralph Castain
I'm not sure the sm actually relies on the RML any more - I thought we had removed that dependency, though the file may not have been deleted. On Jul 28, 2014, at 1:02 PM, Nathan Hjelm wrote: > > The trunk is totally broken and it might not be easy to fix. I am seeing > this error when buildin

Re: [OMPI devel] Trunk fails to build with --disable-dlopen

2014-07-28 Thread George Bosilca
This has been clear from day one: everything that relies on the RML for setup will need to be rewritten. This is not only SM; it is also related to IB. Meanwhile, one must build with dlopen enabled in order to get access to these calls. George. On Mon, Jul 28, 2014 at 4:02 PM, Nathan Hjelm wrote: > > The

[OMPI devel] Trunk fails to build with --disable-dlopen

2014-07-28 Thread Nathan Hjelm
The trunk is totally broken and it might not be easy to fix. I am seeing this error when building with --disable-dlopen (the LANL default): /usr/projects/hpctools/hjelmn/ompi-trunk-git/opal/mca/common/sm/common_sm_rml.c: In function 'mca_common_sm_rml_info_bcast': /usr/projects/hpctools/hjelmn/o

Re: [OMPI devel] opal_process_info.job_session_dir: "not yet defined"

2014-07-28 Thread Jeff Squyres (jsquyres)
Ralph's commit fixed both usnic and sm. Thanks! On Jul 28, 2014, at 2:36 PM, Jeff Squyres (jsquyres) wrote: > On Jul 28, 2014, at 2:23 PM, George Bosilca wrote: > >> Patience ... > > No worries; we knew stuff like this would happen after the initial merge. > > Thanks for digging in to it.

Re: [OMPI devel] opal_process_info.job_session_dir: "not yet defined"

2014-07-28 Thread Ralph Castain
Working on it now... On Jul 28, 2014, at 11:36 AM, Jeff Squyres (jsquyres) wrote: > On Jul 28, 2014, at 2:23 PM, George Bosilca wrote: > >> Patience ... > > No worries; we knew stuff like this would happen after the initial merge. > > Thanks for digging in to it. > > -- > Jeff Squyres > j

Re: [OMPI devel] opal_process_info.job_session_dir: "not yet defined"

2014-07-28 Thread Jeff Squyres (jsquyres)
On Jul 28, 2014, at 2:23 PM, George Bosilca wrote: > Patience ... No worries; we knew stuff like this would happen after the initial merge. Thanks for digging in to it. -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/leg

Re: [OMPI devel] opal_process_info.job_session_dir: "not yet defined"

2014-07-28 Thread George Bosilca
Ignore my previous email; I see what is going on. Basically, there are 6 pieces of data made available to the BTL: nodename, job_session_dir, proc_session_dir, num_local_peers, my_local_rank and, if available, cpuset. Some of this information is available early in the startup, while other pieces are only available a

Re: [OMPI devel] opal_process_info.job_session_dir: "not yet defined"

2014-07-28 Thread George Bosilca
Well, I'm slightly confused, as the BTLs are initialized outside opal_init. There must be a specific call to mca_base_framework_open for the BTL, and currently this call is made in the BML. As the BML is only initialized once the RTE is up, I don't understand how you get the "not initialized".

Re: [OMPI devel] opal_process_info.job_session_dir: "not yet defined"

2014-07-28 Thread Jeff Squyres (jsquyres)
I'd be ok with that. George? On Jul 28, 2014, at 1:20 PM, Ralph Castain wrote: > I think we should not have opal_init setup the BTLs at all. Let's leave that > for the RTE setup to do as it can control the sequencing to ensure all the > data is available and ready > > On Jul 28, 2014, at 10

Re: [OMPI devel] opal_process_info.job_session_dir: "not yet defined"

2014-07-28 Thread Ralph Castain
I think we should not have opal_init set up the BTLs at all. Let's leave that for the RTE setup to do, as it can control the sequencing to ensure all the data is available and ready. On Jul 28, 2014, at 10:21 AM, Jeff Squyres (jsquyres) wrote: > Well, this is a pickle. > > I'm setting up compon

Re: [OMPI devel] opal_process_info.job_session_dir: "not yet defined"

2014-07-28 Thread Jeff Squyres (jsquyres)
Well, this is a pickle. I'm setting up component-wide resources in the BTL component init. I am doing this because the creation of the modules that I return from BTL component init (currently) *assumes* that all of these component resources are already set up. If I have to defer setting up compo

Re: [OMPI devel] TCP btl seq

2014-07-28 Thread Jeff Squyres (jsquyres)
On Jul 28, 2014, at 1:09 PM, Ralph Castain wrote: >> 2. If we keep it, I don't remember offhand what the difference is between >> node_rank and local_rank. The one we want is the 0-based index rank of this >> process *on this server*. E.g., on a 2-server job, each with 16 slots, the >> first
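The "0-based index rank of this process on this server" described in the message above can be illustrated with a small sketch. It is only an illustration under a stated assumption: ranks are placed "by slot" (ranks 0..15 on the first server, 16..31 on the second, and so on). The function names `local_rank_by_slot` and `node_index_by_slot` are hypothetical and are not Open MPI APIs; the real values come from the run-time environment, and the sketch does not try to capture the node_rank versus local_rank distinction raised in the thread.

```c
#include <assert.h>

/* Hypothetical helper, NOT an Open MPI API.
 * Assumption: ranks are placed "by slot", so each server is filled
 * with slots_per_node consecutive global ranks before the next one
 * is used. Returns the 0-based index of the process on its server. */
int local_rank_by_slot(int global_rank, int slots_per_node)
{
    assert(global_rank >= 0 && slots_per_node > 0);
    return global_rank % slots_per_node;
}

/* Hypothetical helper under the same by-slot assumption.
 * Returns the 0-based index of the server hosting the process. */
int node_index_by_slot(int global_rank, int slots_per_node)
{
    assert(global_rank >= 0 && slots_per_node > 0);
    return global_rank / slots_per_node;
}
```

Under that assumption, on a 2-server job with 16 slots each, global rank 16 would be the first process on the second server, so `local_rank_by_slot(16, 16)` gives 0 and `node_index_by_slot(16, 16)` gives 1. A different mapping policy (for example, round-robin by node) would give different values.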

Re: [OMPI devel] TCP btl seq

2014-07-28 Thread George Bosilca
So do we want to sequence the BTL interfaces between jobs or only between local processes on the same job? I'm also fine with removing this option if it is not in use. George. On Mon, Jul 28, 2014 at 1:09 PM, Ralph Castain wrote: > > On Jul 28, 2014, at 10:02 AM, Jeff Squyres (jsquyres) >

Re: [OMPI devel] TCP btl seq

2014-07-28 Thread Ralph Castain
On Jul 28, 2014, at 10:02 AM, Jeff Squyres (jsquyres) wrote: > From George's comments on > http://www.open-mpi.org/community/lists/devel/2014/07/15275.php: > > "Ralph and Jeff (I think you added the seq interface to TCP), please take a > look at the following: > - the implementation of the T

Re: [OMPI devel] opal_process_info.job_session_dir: "not yet defined"

2014-07-28 Thread George Bosilca
This means you are trying to initialize things too early. Most of the information made available in opal/util/proc.h is only available once the RTE has been set up, i.e. only after the call to rte_init. Thus, the BTL can only use it after the init call... George. On Mon, Jul 28, 2014 at 1:01 PM, Ra

Re: [OMPI devel] opal_process_info.job_session_dir: "not yet defined"

2014-07-28 Thread Ralph Castain
On Jul 28, 2014, at 9:57 AM, Jeff Squyres (jsquyres) wrote: > I'm getting a value of "not yet defined" for > opal_process_info.job_session_dir in the usnic BTL (is this also what is > happening for > http://www.open-mpi.org/community/lists/devel/2014/07/15276.php?). > > Can the job_session_d

[OMPI devel] TCP btl seq

2014-07-28 Thread Jeff Squyres (jsquyres)
From George's comments on http://www.open-mpi.org/community/lists/devel/2014/07/15275.php: "Ralph and Jeff (I think you added the seq interface to TCP), please take a look at the following: - the implementation of the TCP seq interface seems to be wrong: it used the my_node_rank to compute th

[OMPI devel] opal_process_info.job_session_dir: "not yet defined"

2014-07-28 Thread Jeff Squyres (jsquyres)
I'm getting a value of "not yet defined" for opal_process_info.job_session_dir in the usnic BTL (is this also what is happening for http://www.open-mpi.org/community/lists/devel/2014/07/15276.php?). Can the job_session_dir be defined/set up before the BTLs are set up? -- Jeff Squyres jsquy...@cis

[OMPI devel] trunk failures this morning

2014-07-28 Thread Jeff Squyres (jsquyres)
I know we expect failures because of the OPAL moves this weekend. Here's a failure in shared memory -- can someone take a look? - [8:52] savbu-usnic-a:~/s/o/examples $ mpirun --np 2 hostname mpi001 mpi001 [8:52] savbu-usnic-a:~/s/o/examples $ mpirun --np 2 --mca btl sm,self ring_c [mpi001: