Hi folks,
I tried to build openmpi-1.8.2rc2 with PGI-14.7 and run a sample
program, but it fails with a linking error:
[mishima@manage work]$ cat test.f
program hello_world
use mpi_f08
implicit none
type(MPI_Comm) :: comm
integer :: myid, npes, ierror
integer
Fix is coming
On Jul 28, 2014, at 6:11 PM, MPI Team wrote:
>
> ERROR: Command returned a non-zero exist status (trunk):
> make -j 8 distcheck
>
> Start time: Mon Jul 28 21:05:01 EDT 2014
> End time: Mon Jul 28 21:11:02 EDT 2014
>
Nope, haven't seen that before...
On Jul 28, 2014, at 6:43 PM, Pritchard Jr., Howard wrote:
Hi Folks,
I was feeling lucky and decided to do a fresh svn checkout of trunk and a simple
./autogen.pl, ./configure,
make on openSUSE 13.1.
I get a blowup in opal_config_bottom.h:
pp@hagel-vm:~>../../opal/include/opal_config_bottom.h:383:38: error: expected
declaration specifiers or '...' before
Ok, got --disable-dlopen working again. I removed the code in question
and changed how coll/sm shares the segment data.
-Nathan
On Mon, Jul 28, 2014 at 02:41:37PM -0600, Nathan Hjelm wrote:
Or pull it into coll/sm, though I think we can do better here since
point-to-point messaging can be used in coll/sm. We can use the
netpatterns code to share the segment information.
-Nathan
On Mon, Jul 28, 2014 at 08:37:15PM +, Jeff Squyres (jsquyres) wrote:
Perhaps that RML code can go back up in ompi/common/sm...? (since only
ompi/coll/sm uses it)
On Jul 28, 2014, at 4:34 PM, Nathan Hjelm wrote:
Damn, spoke too soon. coll/sm uses it:
./ompi/mca/coll/sm/coll_sm_module.c:
mca_common_sm_init_group(comm->c_local_group, size, fullpath,
./ompi/mca/coll/sm/coll_sm_module.c:
"coll:sm:enable:bootstrap comm (%d/%s): mca_common_sm_init_group failed",
Let me se
Looks like you are correct. The function that calls the RML code is
mca_common_sm_init, which is no longer called by anything (other than
mca_common_sm_init_group, which isn't called either). Let me see if I
can fix this. I need this build working again with --disable-dlopen.
I'm not sure the sm actually relies on the RML any more - I thought we had
removed that dependency, though the file may not have been deleted.
On Jul 28, 2014, at 1:02 PM, Nathan Hjelm wrote:
>
> The trunk is totally broken and it might not be easy to fix. I am seeing
> this error when buildin
This has been clear from day one: everything that relies on the RML for setup will
need to be rewritten. This is not limited to SM; it also affects IB.
Meanwhile, one must build with dlopen enabled in order to get access to
these calls.
George.
On Mon, Jul 28, 2014 at 4:02 PM, Nathan Hjelm wrote:
The trunk is totally broken and it might not be easy to fix. I am seeing
this error when building with --disable-dlopen (the LANL default):
/usr/projects/hpctools/hjelmn/ompi-trunk-git/opal/mca/common/sm/common_sm_rml.c:
In function 'mca_common_sm_rml_info_bcast':
/usr/projects/hpctools/hjelmn/o
Ralph's commit fixed both usnic and sm.
Thanks!
On Jul 28, 2014, at 2:36 PM, Jeff Squyres (jsquyres) wrote:
> On Jul 28, 2014, at 2:23 PM, George Bosilca wrote:
>
>> Patience ...
>
> No worries; we knew stuff like this would happen after the initial merge.
>
> Thanks for digging in to it.
Working on it now...
On Jul 28, 2014, at 11:36 AM, Jeff Squyres (jsquyres)
wrote:
On Jul 28, 2014, at 2:23 PM, George Bosilca wrote:
> Patience ...
No worries; we knew stuff like this would happen after the initial merge.
Thanks for digging into it.
--
Jeff Squyres
jsquy...@cisco.com
Ignore my previous email; I see what is going on. Basically, six pieces of
data are made available to the BTL: nodename, job_session_dir,
proc_session_dir, num_local_peers, my_local_rank and, if available, cpuset.
Some of this information is available early in the startup, while other pieces are
only available a
Well, I'm slightly confused, as the BTLs are initialized outside opal_init.
There must be a specific call to mca_base_framework_open for the BTLs, and
currently this call is made in the BML. As the BML is only initialized once
the RTE is up, I don't understand how you get the "not initialized".
I'd be ok with that.
George?
On Jul 28, 2014, at 1:20 PM, Ralph Castain wrote:
I think we should not have opal_init set up the BTLs at all. Let's leave that
for the RTE setup to do, as it can control the sequencing to ensure all the data
is available and ready.
On Jul 28, 2014, at 10:21 AM, Jeff Squyres (jsquyres)
wrote:
Well, this is a pickle.
I'm setting up component-wide resources in the BTL component init. I am doing
this because the creation of the modules that I return from BTL component init
(currently) *assumes* that all of these component resources are already set up.
If I have to defer setting up compo
On Jul 28, 2014, at 1:09 PM, Ralph Castain wrote:
>> 2. If we keep it, I don't remember offhand what the difference is between
>> node_rank and local_rank. The one we want is the 0-based index rank of this
>> process *on this server*. E.g., on a 2-server job, each with 16 slots, the
>> first
So do we want to sequence the BTL interfaces between jobs or only between
local processes on the same job?
I'm also fine with removing this option if it is not in use.
George.
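The question of which rank the BTL should use depends on how processes are placed on servers. As a minimal sketch, assume a single job in which every server has the same number of slots and all slots are filled. Under that assumption, the 0-based index of a process *on its own server* can be derived from its global rank for two common placements. The function names are hypothetical and are not Open MPI API. In Open MPI this value comes from the runtime (my_local_rank is among the data handed to the BTL) rather than being computed like this.

```c
#include <assert.h>

/* Hypothetical helpers, not Open MPI API. Assumes one job, the same
 * slot count on every server, and every slot filled. */

/* By-slot (block) placement: ranks 0..slots-1 fill the first server,
 * the next slots_per_node ranks fill the second, and so on. */
int local_rank_by_slot(int world_rank, int slots_per_node)
{
    assert(world_rank >= 0 && slots_per_node > 0);
    return world_rank % slots_per_node;
}

/* By-node (round-robin) placement: rank 0 on server 0, rank 1 on
 * server 1, ..., wrapping around after the last server. */
int local_rank_by_node(int world_rank, int num_nodes)
{
    assert(world_rank >= 0 && num_nodes > 0);
    return world_rank / num_nodes;
}
```

On the 2-server, 16-slot layout, world rank 17 gets local index 1 under by-slot placement (local_rank_by_slot(17, 16)) but 8 under by-node placement (local_rank_by_node(17, 2)). The same process therefore has a different per-server index depending on the mapping, which is why the value has to come from the runtime's actual placement.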
On Mon, Jul 28, 2014 at 1:09 PM, Ralph Castain wrote:
On Jul 28, 2014, at 10:02 AM, Jeff Squyres (jsquyres)
wrote:
> From George's comments on
> http://www.open-mpi.org/community/lists/devel/2014/07/15275.php:
>
> "Ralph and Jeff (I think you added the seq interface to TCP), please take a
> look at the following:
> - the implementation of the T
This means you are trying to initialize things too early. Most of the
information made available in opal/util/proc.h is only available once the
RTE has been set up, i.e., only after the call to rte_init. Thus, the BTL can only
use it after that call...
George.
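George's point is about ordering: the runtime fills in the per-process data, so anything that reads it must run afterward. The C sketch below is a minimal illustration of that dependency and assumes a simplified stand-in for the process-info structure. The names proc_info_t, NOT_YET_DEFINED, fake_rte_init() and fake_btl_component_init() are hypothetical. They do not reflect Open MPI's actual types or call chain. The placeholder string mirrors the "not yet defined" value reported for opal_process_info.job_session_dir.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical placeholder, mirroring the "not yet defined" value seen
 * for opal_process_info.job_session_dir before the runtime is up. */
#define NOT_YET_DEFINED "not yet defined"

/* Hypothetical subset of the per-process data handed to the BTLs. */
typedef struct {
    const char *nodename;
    const char *job_session_dir;
    int num_local_peers;
    int my_local_rank;
} proc_info_t;

/* Before the runtime is up, every field holds a placeholder. */
proc_info_t proc_info = { NOT_YET_DEFINED, NOT_YET_DEFINED, -1, -1 };

/* Stand-in for rte_init(): the runtime fills in the fields. */
void fake_rte_init(const char *node, const char *session_dir,
                   int local_peers, int local_rank)
{
    assert(node != NULL && session_dir != NULL);
    proc_info.nodename = node;
    proc_info.job_session_dir = session_dir;
    proc_info.num_local_peers = local_peers;
    proc_info.my_local_rank = local_rank;
}

/* Stand-in for a BTL component init that needs the session directory.
 * Returns 0 on success, or -1 if it runs before fake_rte_init(). */
int fake_btl_component_init(void)
{
    if (strcmp(proc_info.job_session_dir, NOT_YET_DEFINED) == 0) {
        fprintf(stderr, "BTL init ran before the runtime was set up\n");
        return -1;
    }
    printf("BTL using session dir %s\n", proc_info.job_session_dir);
    return 0;
}
```

If fake_btl_component_init() is called first, it sees the placeholder and fails. If it is called after fake_rte_init(), it succeeds. Letting the RTE setup drive BTL initialization, as proposed earlier in the thread, makes this ordering automatic.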
On Mon, Jul 28, 2014 at 1:01 PM, Ra
On Jul 28, 2014, at 9:57 AM, Jeff Squyres (jsquyres) wrote:
> I'm getting a value of "not yet defined" for
> opal_process_info.job_session_dir in the usnic BTL (is this also what is
> happening for
> http://www.open-mpi.org/community/lists/devel/2014/07/15276.php?).
>
> Can the job_session_d
From George's comments on
http://www.open-mpi.org/community/lists/devel/2014/07/15275.php:
"Ralph and Jeff (I think you added the seq interface to TCP), please take a
look at the following:
- the implementation of the TCP seq interface seems to be wrong: it used the
my_node_rank to compute th
I'm getting a value of "not yet defined" for opal_process_info.job_session_dir
in the usnic BTL (is this also what is happening for
http://www.open-mpi.org/community/lists/devel/2014/07/15276.php?).
Can the job_session_dir be defined/set up before the BTLs are set up?
--
Jeff Squyres
jsquy...@cis
I know we expect failures because of the OPAL moves this weekend. Here's a
failure in shared memory -- can someone take a look?
-
[8:52] savbu-usnic-a:~/s/o/examples
$ mpirun --np 2 hostname
mpi001
mpi001
[8:52] savbu-usnic-a:~/s/o/examples
$ mpirun --np 2 --mca btl sm,self ring_c
[mpi001: