Re: [OMPI devel] Announcing Open MPI v4.0.0rc1

2018-09-19 Thread Cabral, Matias A
Hi Arm,

> IIRC, the OFI BTL only creates one EP
Correct. But only one is needed to trigger the issues below. There are different manifestations depending on the combination of MTL (OFI/PSM2), the version of libpsm2, and whether OFI scalable EPs are supported.

> Do you think moving EP creation from component_init to component_open will 
> solve the problem?
If component_open is only called when the component will effectively be used, it may work. Let me check.
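
Whichever hook ends up being the right one (component_open or add_procs), the change I have in mind is basically a guard so the EP is opened lazily on first real use. A rough sketch only; the names below are placeholders, not the actual btl/ofi symbols:

#include <stdbool.h>

/* Hypothetical sketch: the real mca_btl_ofi structures and functions differ. */
typedef struct ofi_module_t {
    bool  ep_created;   /* guard so the EP is opened at most once */
    void *ep;           /* stands in for the libfabric endpoint handle */
} ofi_module_t;

/* stub: in the real BTL this would do the fi_endpoint()/fi_enable() work */
static int ofi_open_endpoint(ofi_module_t *m)
{
    m->ep = (void *)0x1;   /* pretend we opened something */
    return 0;
}

static int ofi_add_procs(ofi_module_t *m)
{
    if (!m->ep_created) {              /* first real use of this BTL */
        int rc = ofi_open_endpoint(m);
        if (rc != 0) {
            return rc;                 /* disqualify the BTL, do not abort the job */
        }
        m->ep_created = true;          /* later calls skip the open */
    }
    /* ... normal add_procs work using m->ep ... */
    return 0;
}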

Thanks,

_MAC

From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Thananon 
Patinyasakdikul
Sent: Wednesday, September 19, 2018 10:15 AM
To: Open MPI Developers 
Subject: Re: [OMPI devel] Announcing Open MPI v4.0.0rc1

Matias,

IIRC, the OFI BTL only creates one EP. If you move it to add_procs, you might need to add some checks to avoid re-creating the EP over and over. Do you think moving EP creation from component_init to component_open will solve the problem?

Arm


On Sep 19, 2018, at 1:08 PM, Cabral, Matias A <matias.a.cab...@intel.com> wrote:

Hi Edgar,

I also saw some similar issues, not exactly the same, but they look very similar (maybe because of a different version of libpsm2). 1 and 2 are related to the introduction of the OFI BTL and the fact that it opens an OFI EP in its init function. I see that all BTLs call their init function during transport selection time. Moreover, this happens even when you explicitly ask for a different one (-mca pml cm -mca mtl psm2).  Workaround:  -mca btl ^ofi.  My current idea is to update the OFI BTL and move the EP opening to add_procs. Feedback?

Number 3 goes beyond me.

Thanks,

_MAC

From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Gabriel, 
Edgar
Sent: Wednesday, September 19, 2018 9:25 AM
To: Open MPI Developers <devel@lists.open-mpi.org>
Subject: Re: [OMPI devel] Announcing Open MPI v4.0.0rc1

I performed some tests on our Omnipath cluster, and I have a mixed bag of 
results with 4.0.0rc1

1.   Good news, the problems with the psm2 mtl that I reported in June/July seem to be fixed. However, I still get a warning every time I run a job with 4.0.0, e.g.

compute-1-1.local.4351PSM2 has not been initialized
compute-1-0.local.3826PSM2 has not been initialized

although based on the performance, it is very clear that psm2 is being used. I double checked with the 3.0 series; I do not get the same warnings on the same set of nodes. The unfortunate part about this error message is that applications seem to return an error (although tests and applications otherwise seem to finish correctly):

--
mpirun detected that one or more processes exited with non-zero status, thus 
causing
the job to be terminated. The first process to do so was:

  Process name: [[38418,1],1]
  Exit code:255
  


2.   The ofi mtl does not work at all on our Omnipath cluster. If I try to force it using ‘mpirun -mca mtl ofi …’ I get the following error message.

[compute-1-0:03988] *** An error occurred in MPI_Barrier
[compute-1-0:03988] *** reported by process [2712141825,0]
[compute-1-0:03988] *** on communicator MPI_COMM_WORLD
[compute-1-0:03988] *** MPI_ERR_OTHER: known error not in list
[compute-1-0:03988] *** MPI_ERRORS_ARE_FATAL (processes in this communicator 
will now abort,
[compute-1-0:03988] ***and potentially your MPI job)
[sabine.cacds.uh.edu:21046] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[sabine.cacds.uh.edu:21046] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

I once again double checked that this works correctly in the 3.0 (and 3.1, 
although I did not run that test this time).

3.   The openib btl component is always getting in the way with annoying 
warnings. It is not really used, but constantly complains:


[sabine.cacds.uh.edu:25996] 1 more process has sent help message help-mpi-btl-openib.txt / ib port not selected
[sabine.cacds.uh.edu:25996] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[sabine.cacds.uh.edu:25996] 1 more process has sent help message help-mpi-btl-openib.txt / error in device init

So bottom line, if I do

mpirun -mca btl ^openib -mca mtl ^ofi ….

my tests finish correctly, although mpirun will still return an error.

Thanks
Edgar


From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Geoffrey 
Paulsen
Sent: Sunday, September 16, 2018 2:31 PM
To: devel@lists.open-mpi.org
Subject: [OMPI devel] Announcing Open MPI v4.0.0rc1


Th

Re: [OMPI devel] Announcing Open MPI v4.0.0rc1

2018-09-19 Thread Cabral, Matias A
Hi Edgar,

I also saw some similar issues, not exactly the same, but they look very similar (maybe because of a different version of libpsm2). 1 and 2 are related to the introduction of the OFI BTL and the fact that it opens an OFI EP in its init function. I see that all BTLs call their init function during transport selection time. Moreover, this happens even when you explicitly ask for a different one (-mca pml cm -mca mtl psm2).  Workaround:  -mca btl ^ofi.  My current idea is to update the OFI BTL and move the EP opening to add_procs. Feedback?

Number 3 goes beyond me.

Thanks,

_MAC

From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Gabriel, 
Edgar
Sent: Wednesday, September 19, 2018 9:25 AM
To: Open MPI Developers 
Subject: Re: [OMPI devel] Announcing Open MPI v4.0.0rc1

I performed some tests on our Omnipath cluster, and I have a mixed bag of 
results with 4.0.0rc1


1.   Good news, the problems with the psm2 mtl that I reported in June/July seem to be fixed. However, I still get a warning every time I run a job with 4.0.0, e.g.



compute-1-1.local.4351PSM2 has not been initialized

compute-1-0.local.3826PSM2 has not been initialized

although based on the performance, it is very clear that psm2 is being used. I double checked with the 3.0 series; I do not get the same warnings on the same set of nodes. The unfortunate part about this error message is that applications seem to return an error (although tests and applications otherwise seem to finish correctly):

--
mpirun detected that one or more processes exited with non-zero status, thus 
causing
the job to be terminated. The first process to do so was:

  Process name: [[38418,1],1]
  Exit code:255
  



2.   The ofi mtl does not work at all on our Omnipath cluster. If I try to force it using ‘mpirun -mca mtl ofi …’ I get the following error message.



[compute-1-0:03988] *** An error occurred in MPI_Barrier

[compute-1-0:03988] *** reported by process [2712141825,0]

[compute-1-0:03988] *** on communicator MPI_COMM_WORLD

[compute-1-0:03988] *** MPI_ERR_OTHER: known error not in list

[compute-1-0:03988] *** MPI_ERRORS_ARE_FATAL (processes in this communicator 
will now abort,

[compute-1-0:03988] ***and potentially your MPI job)

[sabine.cacds.uh.edu:21046] 1 more process has sent help message 
help-mpi-errors.txt / mpi_errors_are_fatal

[sabine.cacds.uh.edu:21046] Set MCA parameter "orte_base_help_aggregate" to 0 
to see all help / error messages



I once again double checked that this works correctly in the 3.0 (and 3.1, 
although I did not run that test this time).



3.   The openib btl component is always getting in the way with annoying 
warnings. It is not really used, but constantly complains:



[sabine.cacds.uh.edu:25996] 1 more process has sent help message 
help-mpi-btl-openib.txt / ib port not selected
[sabine.cacds.uh.edu:25996] Set MCA parameter "orte_base_help_aggregate" to 0 
to see all help / error messages
[sabine.cacds.uh.edu:25996] 1 more process has sent help message 
help-mpi-btl-openib.txt / error in device init

So bottom line, if I do

mpirun -mca btl ^openib -mca mtl ^ofi ….

my tests finish correctly, although mpirun will still return an error.

Thanks
Edgar


From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Geoffrey 
Paulsen
Sent: Sunday, September 16, 2018 2:31 PM
To: devel@lists.open-mpi.org
Subject: [OMPI devel] Announcing Open MPI v4.0.0rc1


The first release candidate for the Open MPI v4.0.0 release is posted at

https://www.open-mpi.org/software/ompi/v4.0/

Major changes include:



4.0.0 -- September, 2018

- OSHMEM updated to the OpenSHMEM 1.4 API.
- Do not build Open SHMEM layer when there are no SPMLs available.
  Currently, this means the Open SHMEM layer will only build if
  a MXM or UCX library is found.
- A UCX BTL was added for enhanced MPI RMA support using UCX
- With this release, the OpenIB BTL now only supports iWarp and RoCE by default.
- Updated internal HWLOC to 2.0.1
- Updated internal PMIx to 3.0.1
- Change the priority for selecting external versus internal HWLOC
  and PMIx packages to build.  Starting with this release, configure
  by default selects available external HWLOC and PMIx packages over
  the internal ones.
- Updated internal ROMIO to 3.2.1.
- Removed support for the MXM MTL.
- Improved CUDA support when using UCX.
- Improved support for two phase MPI I/O operations when using OMPIO.
- Added support for Software-based Performance Counters, see
  https://github.com/davideberius/ompi/wiki/How-to-Use-Software-Based-Performance-Counters-(SPCs)-in-Open-MPI
- Various improvements to MPI RMA performance when using RDMA
  capable interconnects.
- Update 

Re: [OMPI devel] Default tag for OFI MTL

2018-03-05 Thread Cabral, Matias A
> It is a predefined attribute and should be automatically set by the MPI layer 
> using the pml_max_tag field of the selected PML.

In the MTLs this is set at registration time in

struct mca_mtl_base_module_t {
…
    int mtl_max_tag; /**< maximum tag value.  note that negative tags must be allowed */

However, I found something that seems to be wrong, or I’m missing something.  The current implementation supports 32 bits but is setting (1UL << 30). Shouldn’t this actually be (1 << 31) - 1?
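Just to make the numbers explicit (a standalone check, not OMPI code):

#include <stdio.h>

int main(void)
{
    /* what mtl_max_tag is set to today vs. the full non-negative range of a
     * signed 32-bit tag */
    long long current  = 1LL << 30;          /* 1,073,741,824 */
    long long proposed = (1LL << 31) - 1;    /* 2,147,483,647 == INT32_MAX */
    printf("current:  %lld\nproposed: %lld\n", current, proposed);
    return 0;
}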

>As I mentioned, the PML (OB1) only supports 16-bit tags (in fact 15, because negative tags are reserved for OMPI internal usage). I do not recall any complaints about this limit. Targeting consistency across PMLs provides user-friendliness, thus a default of 16 bits for the tag and then everything else for the cid might be a sensible choice

Ok: 26 bits cid | 18 bits source rank | 4 bits proto | 16 bits tag for the default, plus a build-time option to use the layout I proposed in my first email.


Thanks,

_MAC

From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of George 
Bosilca
Sent: Sunday, March 04, 2018 9:08 AM
To: Open MPI Developers 
Subject: Re: [OMPI devel] Default tag for OFI MTL

On Sat, Mar 3, 2018 at 6:35 PM, Cabral, Matias A <matias.a.cab...@intel.com> wrote:
Hi George,

Thanks for the feedback, appreciated.  Few questions/comments:

> Regarding the tag with your proposal the OFI MTL will support a wider range 
> of tags than the OB1 PML, where we are limited to 16 bits. Just make sure you 
> correctly expose your tag limit via the MPI_TAG_UB.

I will take a look at MPI_TAG_UB.

It is a predefined attribute and should be automatically set by the MPI layer 
using the pml_max_tag field of the selected PML.
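
(For reference, a minimal way for an application to check what limit actually gets advertised; this is plain MPI, nothing OMPI-internal:)

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int *tag_ub_ptr, flag;
    MPI_Init(&argc, &argv);
    /* MPI_TAG_UB is a predefined attribute on MPI_COMM_WORLD */
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &tag_ub_ptr, &flag);
    if (flag) {
        printf("MPI_TAG_UB = %d\n", *tag_ub_ptr);
    }
    MPI_Finalize();
    return 0;
}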

> I personally would prefer a solution where we can alter the distribution of 
> bits between bits in the cid and tag at compile time.

Sure, I can do this. What would you suggest for plan B? Fewer tag bits and more 
cid ones? Numbers?

As I mentioned, the PML (OB1) only supports 16-bit tags (in fact 15, because negative tags are reserved for OMPI internal usage). I do not recall any complaints about this limit. Targeting consistency across PMLs provides user-friendliness, thus a default of 16 bits for the tag and then everything else for the cid might be a sensible choice.

George.

>. We can also envision this selection to be driven by an MCA parameter, but 
>this might be too costly

I did think about it. However, as you say, I’m not yet convinced it is worth it:

a)  I will soon be reviewing the synchronous send protocol. I have not reviewed it thoroughly yet, but I'm quite sure I can reduce it to use 2 bits (maybe just 1), freeing 2 (or 3) more bits for cids or ranks.

b)  Most of the providers TODAY effectively support FI_REMOTE_CQ_DATA and 
FI_DIRECTED_RECV (psm2, gni, verbs;ofi_rxm, sockets). This is just a fallback 
for potential new ones.  FI_DIRECTED_RECV is necessary to discriminate the 
source at RX time when the source is not in the tag.

c)   I will include build_time_plan_B you just suggested ;)

Thanks, again.

_MAC

From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of George Bosilca
Sent: Saturday, March 03, 2018 6:29 AM
To: Open MPI Developers <devel@lists.open-mpi.org>
Subject: Re: [OMPI devel] Default tag for OFI MTL

Hi Matias,

Relaxing the restriction on the number of ranks is definitely a good thing. The cost will be reflected in the number of communicators and tags, and we must be careful how we balance this.

Assuming context_id is the communicator cid, with 10 bits you can only support 1024. A little low, even lower than MVAPICH. The way we allocate cids is very sparse, and with a limited number of possible cids we might run into trouble very quickly for the few applications that use a large number of communicators, and for the resilience support. Yet another reason to revisit the cid allocation in the short term.

Regarding the tag with your proposal the OFI MTL will support a wider range of 
tags than the OB1 PML, where we are limited to 16 bits. Just make sure you 
correctly expose your tag limit via the MPI_TAG_UB.

I personally would prefer a solution where we can alter the distribution of 
bits between bits in the cid and tag at compile time. We can also envision this 
selection to be driven by an MCA parameter, but this might be too costly.
  George.




On Sat, Mar 3, 2018 at 2:56 AM, Cabral, Matias A <matias.a.cab...@intel.com> wrote:
Hi all,

I’m working on extending the OFI MTL to support FI_REMOTE_CQ_DATA (1) in order to increase the number of ranks currently supported by the MTL, which is limited to only 16 bits included in the OFI tag (2). After the feature is implemented there will be no limitation for providers that support FI_REMOTE_CQ_DATA and FI_DIRECTED_RECEIVE (3). However, ther

Re: [OMPI devel] Default tag for OFI MTL

2018-03-03 Thread Cabral, Matias A
Hi George,

Thanks for the feedback, appreciated.  Few questions/comments:

> Regarding the tag with your proposal the OFI MTL will support a wider range 
> of tags than the OB1 PML, where we are limited to 16 bits. Just make sure you 
> correctly expose your tag limit via the MPI_TAG_UB.

I will take a look at MPI_TAG_UB.

> I personally would prefer a solution where we can alter the distribution of 
> bits between bits in the cid and tag at compile time.

Sure, I can do this. What would you suggest for plan B? Fewer tag bits and more 
cid ones? Numbers?

>. We can also envision this selection to be driven by an MCA parameter, but 
>this might be too costly

I did think about it. However, as you say, I’m not yet convinced it is worth it:

a)  I will soon be reviewing the synchronous send protocol. I have not reviewed it thoroughly yet, but I'm quite sure I can reduce it to use 2 bits (maybe just 1), freeing 2 (or 3) more bits for cids or ranks.

b)  Most of the providers TODAY effectively support FI_REMOTE_CQ_DATA and 
FI_DIRECTED_RECV (psm2, gni, verbs;ofi_rxm, sockets). This is just a fallback 
for potential new ones.  FI_DIRECTED_RECV is necessary to discriminate the 
source at RX time when the source is not in the tag.

c)   I will include build_time_plan_B you just suggested ;)

Thanks, again.

_MAC

From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of George 
Bosilca
Sent: Saturday, March 03, 2018 6:29 AM
To: Open MPI Developers 
Subject: Re: [OMPI devel] Default tag for OFI MTL

Hi Matias,

Relaxing the restriction on the number of ranks is definitely a good thing. The cost will be reflected in the number of communicators and tags, and we must be careful how we balance this.

Assuming context_id is the communicator cid, with 10 bits you can only support 1024. A little low, even lower than MVAPICH. The way we allocate cids is very sparse, and with a limited number of possible cids we might run into trouble very quickly for the few applications that use a large number of communicators, and for the resilience support. Yet another reason to revisit the cid allocation in the short term.

Regarding the tag with your proposal the OFI MTL will support a wider range of 
tags than the OB1 PML, where we are limited to 16 bits. Just make sure you 
correctly expose your tag limit via the MPI_TAG_UB.

I personally would prefer a solution where we can alter the distribution of 
bits between bits in the cid and tag at compile time. We can also envision this 
selection to be driven by an MCA parameter, but this might be too costly.
  George.




On Sat, Mar 3, 2018 at 2:56 AM, Cabral, Matias A <matias.a.cab...@intel.com> wrote:
Hi all,

I’m working on extending the OFI MTL to support FI_REMOTE_CQ_DATA (1) in order to increase the number of ranks currently supported by the MTL, which is limited to only 16 bits included in the OFI tag (2). After the feature is implemented there will be no limitation for providers that support FI_REMOTE_CQ_DATA and FI_DIRECTED_RECEIVE (3). However, there will be a fallback mode for providers that do not support these features and I would like to get consensus on the default tag distribution. This is my proposal:

* Default: No FI_REMOTE_CQ_DATA
* 01234567 01| 234567 01234567 0123| 4567 |01234567 01234567 01234567 01234567
* context_id   |source rank |proto|  message tag

#define MTL_OFI_CONTEXT_MASK    (0xFFC0000000000000ULL)
#define MTL_OFI_SOURCE_MASK     (0x003FFFF000000000ULL)
#define MTL_OFI_SOURCE_BITS_COUNT   (18) /* 262,143 ranks */
#define MTL_OFI_CONTEXT_BITS_COUNT  (10) /* 1,023 communicators */
#define MTL_OFI_TAG_BITS_COUNT  (32) /* no restrictions */
#define MTL_OFI_PROTO_BITS_COUNT(4)
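
As a plain illustration of how those fields would be packed into the 64-bit OFI tag (a sketch only, not the actual mtl_ofi macros; the shift amounts simply follow from the bit counts above):

#include <stdint.h>
#include <stdio.h>

/* [ 10-bit context_id | 18-bit source rank | 4-bit proto | 32-bit user tag ] */
static uint64_t pack_ofi_tag(uint32_t cid, uint32_t src, uint32_t proto, uint32_t tag)
{
    return ((uint64_t)(cid   & 0x3FF)   << 54) |   /* top 10 bits  */
           ((uint64_t)(src   & 0x3FFFF) << 36) |   /* next 18 bits */
           ((uint64_t)(proto & 0xF)     << 32) |   /* next 4 bits  */
            (uint64_t)tag;                         /* low 32 bits  */
}

int main(void)
{
    uint64_t t = pack_ofi_tag(5, 1000, 1, 42);
    printf("packed tag: 0x%016llx\n", (unsigned long long)t);
    return 0;
}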

Notes:

-  More ranks and fewer context ids than the current implementation.

-  Moved the protocol bits from the most significant bits because some 
providers may reserve starting from there (see mem_tag_format (4)) and sync 
send will not work.

Thoughts?

Today we had a call with Howard (LANL), John and Hamuri (HPE) and briefly 
talked about this, and also thought about sending this email as a query to find 
other developers keeping an eye on OFI support in OMPI.

Thanks,
_MAC



(1)https://ofiwg.github.io/libfabric/master/man/fi_cq.3.html

(2)
https://github.com/open-mpi/ompi/blob/master/ompi/mca/mtl/ofi/mtl_ofi_types.h#L70

(3)https://ofiwg.github.io/libfabric/master/man/fi_getinfo.3.html

(4)https://ofiwg.github.io/libfabric/master/man/fi_endpoint.3.html







___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

[OMPI devel] Default tag for OFI MTL

2018-03-02 Thread Cabral, Matias A
Hi all,

I'm working on extending the OFI MTL to support FI_REMOTE_CQ_DATA (1) in order to increase the number of ranks currently supported by the MTL, which is limited to only 16 bits included in the OFI tag (2). After the feature is implemented there will be no limitation for providers that support FI_REMOTE_CQ_DATA and FI_DIRECTED_RECEIVE (3). However, there will be a fallback mode for providers that do not support these features and I would like to get consensus on the default tag distribution. This is my proposal:

* Default: No FI_REMOTE_CQ_DATA
* 01234567 01| 234567 01234567 0123| 4567 |01234567 01234567 01234567 01234567
* context_id   |source rank |proto|  message tag

#define MTL_OFI_CONTEXT_MASK    (0xFFC0000000000000ULL)
#define MTL_OFI_SOURCE_MASK     (0x003FFFF000000000ULL)
#define MTL_OFI_SOURCE_BITS_COUNT   (18) /* 262,143 ranks */
#define MTL_OFI_CONTEXT_BITS_COUNT  (10) /* 1,023 communicators */
#define MTL_OFI_TAG_BITS_COUNT  (32) /* no restrictions */
#define MTL_OFI_PROTO_BITS_COUNT(4)

Notes:

-  More ranks and fewer context ids than the current implementation.

-  Moved the protocol bits from the most significant bits because some 
providers may reserve starting from there (see mem_tag_format (4)) and sync 
send will not work.

Thoughts?

Today we had a call with Howard (LANL), John and Hamuri (HPE) and briefly 
talked about this, and also thought about sending this email as a query to find 
other developers keeping an eye on OFI support in OMPI.

Thanks,
_MAC



(1)https://ofiwg.github.io/libfabric/master/man/fi_cq.3.html

(2)
https://github.com/open-mpi/ompi/blob/master/ompi/mca/mtl/ofi/mtl_ofi_types.h#L70

(3)https://ofiwg.github.io/libfabric/master/man/fi_getinfo.3.html

(4)https://ofiwg.github.io/libfabric/master/man/fi_endpoint.3.html






___
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel

Re: [OMPI devel] Last call: v1.10.5

2016-12-19 Thread Cabral, Matias A
Hi Ralph, 

Should the v1.10.5 release wait to include the fix for #2591?

Thanks, 

_MAC

-Original Message-
From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of 
r...@open-mpi.org
Sent: Monday, December 19, 2016 8:57 AM
To: OpenMPI Devel 
Subject: [OMPI devel] Last call: v1.10.5

Any last concerns or desired changes? Otherwise, barring hearing anything by 
noon Pacific, I’ll build/release the final version

Ralph

___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel
___
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel

Re: [OMPI devel] mtl/psm2 and $PSM2_DEVICES

2016-10-03 Thread Cabral, Matias A
   "Cannot continue.\n");
>  return NULL;
>  }
> +if (OMPI_SUCCESS != get_num_max_procs(&num_max_procs)) {
> +opal_output(0, "Cannot determine max number of processes. "
> +"Cannot continue.\n");
> +return NULL;
> +}
>
>  err = psm2_error_register_handler(NULL /* no ep */,
>  PSM2_ERRHANDLER_NOP); @@ -230,8 
> +244,10 @@ ompi_mtl_psm2_component_init(bool enable_progress_threads,
> return NULL;
>  }
>
> -    if (num_local_procs == num_total_procs) {
> -  setenv("PSM2_DEVICES", "self,shm", 0);
> +if ((num_local_procs == num_total_procs) && (num_max_procs <=
> num_total_procs)) {
> +if (NULL == getenv("PSM2_DEVICES")) {
> +setenv("PSM2_DEVICES", "self,shm", 0);
> +}
>  }
>
>  err = psm2_init(&verno_major, &verno_minor);
>
>
>
>
>
> On 9/30/2016 12:38 AM, Cabral, Matias A wrote:
>
> Hi Giles et.al.,
>
> You are right, ptl.c is in PSM2 code. As Ralph mentions, dynamic 
> process support was/is not working in OMPI when using PSM2 because of 
> an issue related to the transport keys. This was fixed in PR #1602
> (https://github.com/open-mpi/ompi/pull/1602) and should be included in 
> v2.0.2. HOWEVER, this not the error Juraj is seeing. The root of the 
> assertion is because the PSM/PSM2 MTLs will check for where the “original”
> process are running and, if detects all are local to the node, it will 
> ONLY initialize the shared memory device (variable PSM2_DEVICES="self,shm” ).
> This is to avoid “reserving” HW resources in the HFI card that 
> wouldn’t be used unless you later on spawn ranks in other nodes.  
> Therefore, to allow dynamic process to be spawned on other nodes you 
> need to tell PSM2 to instruct the HW to initialize all the de devices 
> by making the environment variable PSM2_DEVICES="self,shm,hfi" available 
> before running the job.
> Note that setting PSM2_DEVICES (*) will solve the below assertion, you 
> will most likely still see the transport key issue if PR1602 if is not 
> included.
>
> Thanks,
>
> _MAC
>
> (*)
> PSM2_DEVICES  -> Omni Path
> PSM_DEVICES  -> TrueScale
>
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of 
> r...@open-mpi.org
> Sent: Thursday, September 29, 2016 7:12 AM
> To: Open MPI Users 
> Subject: Re: [OMPI users] MPI_Comm_spawn
>
> Ah, that may be why it wouldn’t show up in the OMPI code base itself. 
> If that is the case here, then no - OMPI v2.0.1 does not support 
> comm_spawn for PSM. It is fixed in the upcoming 2.0.2
>
>
> On Sep 29, 2016, at 6:58 AM, Gilles Gouaillardet 
>  wrote:
>
> Ralph,
>
> My guess is that ptl.c comes from PSM lib ...
>
> Cheers,
>
> Gilles
>
> On Thursday, September 29, 2016, r...@open-mpi.org  wrote:
>
> Spawn definitely does not work with srun. I don’t recognize the name 
> of the file that segfaulted - what is “ptl.c”? Is that in your manager 
> program?
>
>
>
> On Sep 29, 2016, at 6:06 AM, Gilles Gouaillardet 
>  wrote:
>
> Hi,
>
> I do not expect spawn can work with direct launch (e.g. srun)
>
> Do you have PSM (e.g. Infinipath) hardware ? That could be linked to 
> the failure
>
> Can you please try
>
> mpirun --mca pml ob1 --mca btl tcp,sm,self -np 1 --hostfile my_hosts 
> ./manager 1
>
> and see if it help ?
>
> Note if you have the possibility, I suggest you first try that without 
> slurm, and then within a slurm job
>
> Cheers,
>
> Gilles
>
> On Thursday, September 29, 2016, juraj2...@gmail.com 
> 
> wrote:
>
> Hello,
>
> I am using MPI_Comm_spawn to dynamically create new processes from 
> single manager process. Everything works fine when all the processes 
> are running on the same node. But imposing restriction to run only a 
> single process per node does not work. Below are the errors produced 
> during multinode interactive session and multinode sbatch job.
>
> The system I am using is: Linux version 3.10.0-229.el7.x86_64
> (buil...@kbuilder.dev.centos.org) (gcc version 4.8.2 20140120 (Red Hat
> 4.8.2-16) (GCC) )
> I am using Open MPI 2.0.1
> Slurm is version 15.08.9
>
> What is preventing my jobs to spawn on multiple nodes? Does slurm 
> requires some additional configuration to allow it? Is it issue on the 
> MPI side, does it need to be compiled with some special flag (I have 
> compiled it with --enable-mpi-fortran=all --with-pmi)?
>
> The code I am launching is here: https://github.com/goghino/dynamicMPI
>
> Manager tries to laun

Re: [OMPI devel] mtl/psm2 and $PSM2_DEVICES

2016-09-29 Thread Cabral, Matias A
Hey Gilles,

Quick answer on the first part until I read a little more about num_max_procs :O
Since the third parameter of setenv() is 0, it means: do not override the variable if it is already set in the environment.  So the workaround does work today. Moreover, I would like to know if there is a place in some OMPI wiki to document this behavior.
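
A tiny standalone check of that overwrite flag, just to make it concrete (plain libc behavior, nothing PSM2- or OMPI-specific):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* pretend the user already exported PSM2_DEVICES before the launch */
    setenv("PSM2_DEVICES", "self,shm,hfi", 1);

    /* what the MTL does: overwrite == 0 keeps any pre-existing value */
    setenv("PSM2_DEVICES", "self,shm", 0);

    printf("PSM2_DEVICES = %s\n", getenv("PSM2_DEVICES"));  /* prints self,shm,hfi */
    return 0;
}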

Thanks,

_MAC

From: devel [mailto:devel-boun...@lists.open-mpi.org] On Behalf Of Gilles 
Gouaillardet
Sent: Thursday, September 29, 2016 6:14 PM
To: Open MPI Developers 
Subject: [OMPI devel] mtl/psm2 and $PSM2_DEVICES


This is a follow-up of 
https://mail-archive.com/users@lists.open-mpi.org/msg30055.html



Thanks Matias for the lengthy explanation.



currently, PSM2_DEVICES is overwritten, so i do not think setting it before 
invoking mpirun will help



also, in this specific case

- the user is running within a SLURM allocation with 2 nodes

- the user specified a host file with 2 distinct nodes



my first impression is that mtl/psm2 could/should handle this (well only one 
condition has to be met) properly and *not* set

export PSM2_DEVICES="self,shm"

the patch below
- does not overwrite PSM2_DEVICES
- does not set PSM2_DEVICES when num_max_procs > num_total_procs
this is suboptimal, but i could not find a way to get the number of orted.
iirc, MPI_Comm_spawn can have an orted dynamically spawned by passing a host in 
the MPI_Info.
if this host is not part of the hostfile (nor RM allocation ?), then 
PSM2_DEVICES must be set manually by the user


Ralph,

is there a way to get the number of orted ?
- if i mpirun -np 1 --host n0,n1 ... orte_process_info.num_nodes is 1 (i wish i 
could get 2)
- if running in singleton mode, orte_process_info.num_max_procs is 0 (is this a 
bug or a feature ?)

Cheers,

Gilles


diff --git a/ompi/mca/mtl/psm2/mtl_psm2_component.c 
b/ompi/mca/mtl/psm2/mtl_psm2_component.c
index 26bccd2..52b906b 100644
--- a/ompi/mca/mtl/psm2/mtl_psm2_component.c
+++ b/ompi/mca/mtl/psm2/mtl_psm2_component.c
@@ -14,6 +14,8 @@
  * Copyright (c) 2012-2015 Los Alamos National Security, LLC.
  * All rights reserved.
  * Copyright (c) 2013-2016 Intel, Inc. All rights reserved
+ * Copyright (c) 2016  Research Organization for Information Science
+ * and Technology (RIST). All rights reserved.
  * $COPYRIGHT$
  *
  * Additional copyrights may follow
@@ -170,6 +172,13 @@ get_num_total_procs(int *out_ntp)
 }

 static int
+get_num_max_procs(int *out_nmp)
+{
+  *out_nmp = (int)ompi_process_info.max_procs;
+  return OMPI_SUCCESS;
+}
+
+static int
 get_num_local_procs(int *out_nlp)
 {
 /* num_local_peers does not include us in
@@ -201,7 +210,7 @@ ompi_mtl_psm2_component_init(bool enable_progress_threads,
    int verno_major = PSM2_VERNO_MAJOR;
 int verno_minor = PSM2_VERNO_MINOR;
 int local_rank = -1, num_local_procs = 0;
-int num_total_procs = 0;
+int num_total_procs = 0, num_max_procs = 0;

 /* Compute the total number of processes on this host and our local rank
  * on that node. We need to provide PSM2 with these values so it can
@@ -221,6 +230,11 @@ ompi_mtl_psm2_component_init(bool enable_progress_threads,
 "Cannot continue.\n");
 return NULL;
 }
+if (OMPI_SUCCESS != get_num_max_procs(&num_max_procs)) {
+opal_output(0, "Cannot determine max number of processes. "
+"Cannot continue.\n");
+return NULL;
+}

 err = psm2_error_register_handler(NULL /* no ep */,
 PSM2_ERRHANDLER_NOP);
@@ -230,8 +244,10 @@ ompi_mtl_psm2_component_init(bool enable_progress_threads,
return NULL;
 }

-if (num_local_procs == num_total_procs) {
-  setenv("PSM2_DEVICES", "self,shm", 0);
+if ((num_local_procs == num_total_procs) && (num_max_procs <= 
num_total_procs)) {
+if (NULL == getenv("PSM2_DEVICES")) {
+setenv("PSM2_DEVICES", "self,shm", 0);
+        }
 }

 err = psm2_init(&verno_major, &verno_minor);






On 9/30/2016 12:38 AM, Cabral, Matias A wrote:
Hi Gilles et al.,

You are right, ptl.c is in the PSM2 code. As Ralph mentions, dynamic process support was/is not working in OMPI when using PSM2 because of an issue related to the transport keys. This was fixed in PR #1602 (https://github.com/open-mpi/ompi/pull/1602) and should be included in v2.0.2. HOWEVER, this is not the error Juraj is seeing. The root of the assertion is that the PSM/PSM2 MTLs check where the "original" processes are running and, if they detect that all are local to the node, they will ONLY initialize the shared memory device (variable PSM2_DEVICES="self,shm"). This is to avoid "reserving" HW resources in the HFI card that wouldn't be used unless you later on spawn ranks in other nodes.  Therefore, to allow dynamic processes to be 
spawn

[OMPI devel] MPI_Init() affecting rand()

2016-07-14 Thread Cabral, Matias A
Hi All,

Doing a quick test with rand()/srand() I found that MPI_Init() seems to be calling a function in that family that affects the values in the user application.  Please see below my simple test and the results. Yes, moving the second call to srand() after MPI_Init() solves the problem. However, I'm confused since this was supposedly addressed in version 1.7.5. From the release notes:


1.7.5 20 Mar 2014:



- OMPI now uses its own internal random number generator and will not perturb 
srand() and friends.
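
(For context, my understanding of that change: the fix was to give OMPI a generator with its own private state, so internal calls never touch the libc rand()/srand() stream. A minimal sketch of the concept, not OMPI's actual code:)

#include <stdint.h>
#include <stdio.h>

/* A PRNG that keeps its own state, so using it never perturbs the seed or
 * sequence of the libc rand()/srand() generator.  Simple 64-bit LCG, for
 * illustration only. */
typedef struct { uint64_t state; } my_rng_t;

static void my_srand(my_rng_t *r, uint64_t seed) { r->state = seed; }

static uint32_t my_rand(my_rng_t *r)
{
    r->state = r->state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (uint32_t)(r->state >> 33);
}

int main(void)
{
    my_rng_t r;
    my_srand(&r, 12345);
    printf("%u %u\n", my_rand(&r), my_rand(&r));   /* libc rand() state untouched */
    return 0;
}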


I tested on OMPI 1.10.2 and 1.10.3. The result is deterministic.



Any ideas?



Thanks,
Regards,

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rand1;
    int rand2;
    int name_len;   /* unused in this test */

    srand(10);
    rand1 = rand();
    srand(10);
    MPI_Init(&argc, &argv);
    rand2 = rand();
    if (rand1 != rand2) {
        printf("%d != %d\n", rand1, rand2);
        fflush(stdout);
    }
    else {
        printf("%d == %d\n", rand1, rand2);
        fflush(stdout);
    }
    MPI_Finalize();
    return 0;
}


host1:/tmp> mpirun -np 1 -host host1 -mca pml ob1 -mca btl tcp,self ./rand1

964940668 != 865007240


_MAC



Re: [OMPI devel] 2.0.0 is coming: what do we need to communicate to users?

2016-04-29 Thread Cabral, Matias A
Exactly on the nail! Thanks Ralph. I've seen several comments about patches needing to be ported from 2.x to 1.x. That will definitely be transparent to users (apologies if I diverged from the intention of the email), but maybe a few others will be grateful to read that wiki entry. 

Thanks!

_MAC


-Original Message-
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Friday, April 29, 2016 10:56 AM
To: Open MPI Developers 
Subject: Re: [OMPI devel] 2.0.0 is coming: what do we need to communicate to 
users?

FWIW: I think Matias has a good point, though perhaps it belongs on a wiki page. When we moved the BTLs down to OPAL, for example, it didn’t impact users, but it would be worth ensuring developers and ISVs had a convenient place to see what changed.


> On Apr 29, 2016, at 10:47 AM, Jeff Squyres (jsquyres)  
> wrote:
> 
> Matias --
> 
> You're probably already in pretty good shape.  If you're following master and 
> the v2.x branch (and you are), then your code is already in good shape.
> 
> I was thinking mostly for users: transitioning from v1.8/v1.10 series to the 
> v2.x series -- what kinds of user-noticeable things will they see?
> 
> 
>> On Apr 29, 2016, at 12:34 PM, Cabral, Matias A  
>> wrote:
>> 
>> How about for “developers that have not been following the transition from 
>> 1.x to 2.0?  Particularly myself J. I started contributing to some specific 
>> parts (psm2 mtl) and following changes. However, I don’t have details of 
>> what is changing in 2.0. I see there could be different level of details in 
>> the “developer’s transition guide book”, ranging from architectural change 
>> to what pieces were moved where.
>> 
>> Thanks,
>> 
>> _MAC
>> 
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Joshua Ladd
>> Sent: Friday, April 29, 2016 7:11 AM
>> To: Open MPI Developers 
>> Subject: Re: [OMPI devel] 2.0.0 is coming: what do we need to communicate to 
>> users?
>> 
>> Certainly we need to communicate / advertise / evangelize the improvements 
>> in job launch - the largest and most substantial change between the two 
>> branches - and provide some best practice guidelines for usage (use direct 
>> modex for applications with sparse communication patterns and full modex for 
>> dense.) I would be happy to contribute some paragraphs. 
>> 
>> Obviously, we also need to communicate, reiterate the need to recompile 
>> codes built against the 1.10 series.  
>> 
>> Best, 
>> 
>> Josh
>> 
>> 
>> 
>> On Thursday, April 28, 2016, Jeff Squyres (jsquyres)  
>> wrote:
>> We're getting darn close to v2.0.0.
>> 
>> What "gotchas" do we need to communicate to users?  I.e., what will people 
>> upgrading from v1.8.x/v1.10.x be surprised by?
>> 
>> The most obvious one I can think of is mpirun requiring -np when slots are 
>> not specified somehow.
>> 
>> What else do we need to communicate?  It would be nice to avoid the 
>> confusion users experienced regarding affinity functionality/options when 
>> upgrading from v1.6 -> v1.8 (because we didn't communicate these changes 
>> well, IMHO).
>> 
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2016/04/18832.php
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2016/04/18843.php
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/04/18845.php

___
devel mailing list
de...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2016/04/18847.php


Re: [OMPI devel] 2.0.0 is coming: what do we need to communicate to users?

2016-04-29 Thread Cabral, Matias A
How about for “developers that have not been following the transition from 1.x to 2.0”?  Particularly myself ☺. I started contributing to some specific parts (psm2 mtl) and following changes. However, I don’t have details of what is changing in 2.0. I see there could be different levels of detail in the “developer’s transition guide book”, ranging from architectural changes to what pieces were moved where.

Thanks,

_MAC

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Joshua Ladd
Sent: Friday, April 29, 2016 7:11 AM
To: Open MPI Developers 
Subject: Re: [OMPI devel] 2.0.0 is coming: what do we need to communicate to 
users?

Certainly we need to communicate / advertise / evangelize the improvements in 
job launch - the largest and most substantial change between the two branches - 
and provide some best practice guidelines for usage (use direct modex for 
applications with sparse communication patterns and full modex for dense.) I 
would be happy to contribute some paragraphs.

Obviously, we also need to communicate, reiterate the need to recompile codes 
built against the 1.10 series.

Best,

Josh



On Thursday, April 28, 2016, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
We're getting darn close to v2.0.0.

What "gotchas" do we need to communicate to users?  I.e., what will people 
upgrading from v1.8.x/v1.10.x be surprised by?

The most obvious one I can think of is mpirun requiring -np when slots are not 
specified somehow.

What else do we need to communicate?  It would be nice to avoid the confusion 
users experienced regarding affinity functionality/options when upgrading from 
v1.6 -> v1.8 (because we didn't communicate these changes well, IMHO).

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

___
devel mailing list
de...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2016/04/18832.php


Re: [OMPI devel] PSM2 Intel folks question

2016-04-20 Thread Cabral, Matias A
Hi Howard,

I’ve been playing with the same version of psm (hfi1-psm-0.7-221.ch6.x86_64) 
but cannot yet reproduce the issue.  Just in case, please share the version of 
the driver you have installed (hfi1-X.XX-XX.x86_64.rpm, modinfo hfi1).

What I can tell so far is that I still suspect this has some relation to the job_id, which OMPI uses to generate the unique job key, which psm uses to generate the epid. Looking at the logfile.busted, I see some entries for 'epid 1'. This can only happen if psm2_ep_open() is called with a unique job key of 1 and with the PSM2 hfi device disabled (only shm communication expected). In your workaround (hfi enabled) the epid generation goes through a different path that includes the HFI LID, which ends up with a different number.  HOWEVER, I hardcoded the above case (to get epid 1) but I still see hello_c running with stock OMPI 1.10.2.

Would you please try forcing a different jobid and share the results?

Thanks,

_MAC


From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Howard Pritchard
Sent: Wednesday, April 20, 2016 8:49 AM
To: Open MPI Developers 
Subject: Re: [OMPI devel] PSM2 Intel folks question

HI Matias,

Actually I found the problem.  I kept wondering why the OFI MTL works fine, but 
the
PSM2 MTL doesn't.  When I cranked up the debugging level I noticed that for OFI 
MTL,
it doesn't mess with the PSM2_DEVICES env variable.  So the PSM2 tries all three
"devices" as part of initialization.  However, the PSM2 MTL sets the 
PSM2_DEVICES
to not include hfi.  If I comment out those lines of code in the PSM2 MTL, my 
one-node
problem vanishes.

I suspect there's some setup code when "initializing" the hfi device that is 
actually
required even when using the shm device for on-node messages.

Is there by any chance some psm2 device driver parameter setting that might result in this behavior?

Anyway, I set PSM2_TRACEMASK to 0x and got a bunch of output that
might be helpful.  I attached the log files to issue 1559.

For now, I will open a PR with fixes to get the PSM2 MTL working on our
omnipath clusters.

I don't think this problem has anything to do with SLURM except for the jobid
manipulation to generate the unique key.

Howard


2016-04-19 17:18 GMT-06:00 Cabral, Matias A 
mailto:matias.a.cab...@intel.com>>:
Howard,

PSM2_DEVICES, I went back to the roots and found that shm is the only device 
supporting communication between ranks in the same node. Therefore, the below 
error “Endpoint could not be reached” would be expected.

Back to the psm2_ep_connect() hanging, I cloned the same psm2 as you have from 
github and have hello_c and ring_c running with 80 ranks on a local node using 
PSM2 mtl. I do not have any SLURM setup on my system.  I will proceed to setup 
SLURM to see if I can reproduce the issue with it. In the meantime please share 
any extra detail you find relevant.

Thanks,

_MAC

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Howard Pritchard
Sent: Tuesday, April 19, 2016 12:21 PM
To: Open MPI Developers <de...@open-mpi.org>
Subject: Re: [OMPI devel] PSM2 Intel folks question

Hi Matias,

My usual favorites in ompi/examples/hello_c.c and ompi/examples/ring_c.c.
If I disable the shared memory device using the PSM2_DEVICES option
it looks like psm2 is unhappy:


kit001.localdomain:08222] PSM2 EP connect error (Endpoint could not be reached):
[kit001.localdomain:08222]  kit001
[kit001.localdomain:08222] PSM2 EP connect error (unknown connect error):
[kit001.localdomain:08222]  kit001
 psm2_ep_connect returned 41
[kit001.localdomain:08221] PSM2 EP connect error (unknown connect error):
[kit001.localdomain:08221]  kit001
[kit001.localdomain:08221] PSM2 EP connect error (Endpoint could not be 
reached):
[kit001.localdomain:08221]  kit001
leaving ompi_mtl_psm2_add_procs nprocs 2

I went back and tried again with the OFI MTL (without the PSM2_DEVICES set)
and that works correctly on a single node.
I get this same psm2_ep_connect timeout using mpirun, so its not a SLURM
specific problem.

2016-04-19 12:25 GMT-06:00 Cabral, Matias A <matias.a.cab...@intel.com>:
Hi Howard,

Couple more questions to understand a little better the context:

-  What type of job running?

-  Is this also under srun?

For PSM2 you may find more details in the programmer’s guide:
http://www.intel.com/content/dam/support/us/en/documents/network/omni-adptr/sb/Intel_PSM2_PG_H76473_v1_0.pdf

To disable shared memory:
Section 2.7.1:
PSM2_DEVICES="self,fi"

Thanks,
_MAC

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Howard Pritchard
Sent: Tuesday, April 19, 2016 11:04 AM
To: Open MPI Developers List <de...@open-mpi.org>
Subject: [OMPI devel] PSM2 Intel folks question

Hi Folks,

I'm making progress with issue #1559 (patches on the

Re: [OMPI devel] PSM2 Intel folks question

2016-04-19 Thread Cabral, Matias A
Howard,

PSM2_DEVICES, I went back to the roots and found that shm is the only device 
supporting communication between ranks in the same node. Therefore, the below 
error “Endpoint could not be reached” would be expected.

Back to the psm2_ep_connect() hanging, I cloned the same psm2 as you have from 
github and have hello_c and ring_c running with 80 ranks on a local node using 
PSM2 mtl. I do not have any SLURM setup on my system.  I will proceed to setup 
SLURM to see if I can reproduce the issue with it. In the meantime please share 
any extra detail you find relevant.

Thanks,

_MAC

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Howard Pritchard
Sent: Tuesday, April 19, 2016 12:21 PM
To: Open MPI Developers 
Subject: Re: [OMPI devel] PSM2 Intel folks question

Hi Matias,

My usual favorites in ompi/examples/hello_c.c and ompi/examples/ring_c.c.
If I disable the shared memory device using the PSM2_DEVICES option
it looks like psm2 is unhappy:


kit001.localdomain:08222] PSM2 EP connect error (Endpoint could not be reached):
[kit001.localdomain:08222]  kit001
[kit001.localdomain:08222] PSM2 EP connect error (unknown connect error):
[kit001.localdomain:08222]  kit001
 psm2_ep_connect returned 41
[kit001.localdomain:08221] PSM2 EP connect error (unknown connect error):
[kit001.localdomain:08221]  kit001
[kit001.localdomain:08221] PSM2 EP connect error (Endpoint could not be 
reached):
[kit001.localdomain:08221]  kit001
leaving ompi_mtl_psm2_add_procs nprocs 2

I went back and tried again with the OFI MTL (without the PSM2_DEVICES set)
and that works correctly on a single node.
I get this same psm2_ep_connect timeout using mpirun, so its not a SLURM
specific problem.

2016-04-19 12:25 GMT-06:00 Cabral, Matias A <matias.a.cab...@intel.com>:
Hi Howard,

Couple more questions to understand a little better the context:

-  What type of job running?

-  Is this also under srun?

For PSM2 you may find more details in the programmer’s guide:
http://www.intel.com/content/dam/support/us/en/documents/network/omni-adptr/sb/Intel_PSM2_PG_H76473_v1_0.pdf

To disable shared memory:
Section 2.7.1:
PSM2_DEVICES="self,fi"

Thanks,
_MAC

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Howard Pritchard
Sent: Tuesday, April 19, 2016 11:04 AM
To: Open MPI Developers List <de...@open-mpi.org>
Subject: [OMPI devel] PSM2 Intel folks question

Hi Folks,

I'm making progress with issue #1559 (patches on the mail list didn't help),
and I'll open a PR to help the PSM2 MTL work on a single node, but I'm
noticing something more troublesome.

If I run on just one node, and I use more than one process, process zero
consistently hangs in psm2_ep_connect.

I've tried using the psm2 code on github - at sha e951cf31, but I still see
the same behavior.

The PSM2 related rpms installed on our system are:

infinipath-psm-devel-3.3-0.g6f42cdb1bb8.2.el7.x86_64
hfi1-psm-0.7-221.ch6.x86_64
hfi1-psm-devel-0.7-221.ch6.x86_64
infinipath-psm-3.3-0.g6f42cdb1bb8.2.el7.x86_64
should we get newer rpms installed?

Is there a way to disable the AMSHM path?  I'm wondering if that
would help since multi-node jobs seems to run fine.

Thanks for any help,

Howard


___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2016/04/18783.php



Re: [OMPI devel] PSM2 Intel folks question

2016-04-19 Thread Cabral, Matias A
Errata:
PSM2_DEVICES="self,hfi"


_MAC

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Cabral, Matias A
Sent: Tuesday, April 19, 2016 11:25 AM
To: Open MPI Developers 
Subject: Re: [OMPI devel] PSM2 Intel folks question

Hi Howard,

Couple more questions to understand a little better the context:

-  What type of job running?

-  Is this also under srun?

For PSM2 you may find more details in the programmer’s guide:
http://www.intel.com/content/dam/support/us/en/documents/network/omni-adptr/sb/Intel_PSM2_PG_H76473_v1_0.pdf

To disable shared memory:
Section 2.7.1:
PSM2_DEVICES="self,fi"

Thanks,
_MAC

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Howard Pritchard
Sent: Tuesday, April 19, 2016 11:04 AM
To: Open MPI Developers List <de...@open-mpi.org>
Subject: [OMPI devel] PSM2 Intel folks question

Hi Folks,

I'm making progress with issue #1559 (patches on the mail list didn't help),
and I'll open a PR to help the PSM2 MTL work on a single node, but I'm
noticing something more troublesome.

If I run on just one node, and I use more than one process, process zero
consistently hangs in psm2_ep_connect.

I've tried using the psm2 code on github - at sha e951cf31, but I still see
the same behavior.

The PSM2 related rpms installed on our system are:

infinipath-psm-devel-3.3-0.g6f42cdb1bb8.2.el7.x86_64
hfi1-psm-0.7-221.ch6.x86_64
hfi1-psm-devel-0.7-221.ch6.x86_64
infinipath-psm-3.3-0.g6f42cdb1bb8.2.el7.x86_64
should we get newer rpms installed?

Is there a way to disable the AMSHM path?  I'm wondering if that
would help since multi-node jobs seems to run fine.

Thanks for any help,

Howard



Re: [OMPI devel] PSM2 Intel folks question

2016-04-19 Thread Cabral, Matias A
Hi Howard,

Couple more questions to understand a little better the context:

-  What type of job running?

-  Is this also under srun?

For PSM2 you may find more details in the programmer’s guide:
http://www.intel.com/content/dam/support/us/en/documents/network/omni-adptr/sb/Intel_PSM2_PG_H76473_v1_0.pdf

To disable shared memory:
Section 2.7.1:
PSM2_DEVICES="self,fi"

Thanks,
_MAC

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Howard Pritchard
Sent: Tuesday, April 19, 2016 11:04 AM
To: Open MPI Developers List 
Subject: [OMPI devel] PSM2 Intel folks question

Hi Folks,

I'm making progress with issue #1559 (patches on the mail list didn't help),
and I'll open a PR to help the PSM2 MTL work on a single node, but I'm
noticing something more troublesome.

If I run on just one node, and I use more than one process, process zero
consistently hangs in psm2_ep_connect.

I've tried using the psm2 code on github - at sha e951cf31, but I still see
the same behavior.

The PSM2 related rpms installed on our system are:

infinipath-psm-devel-3.3-0.g6f42cdb1bb8.2.el7.x86_64
hfi1-psm-0.7-221.ch6.x86_64
hfi1-psm-devel-0.7-221.ch6.x86_64
infinipath-psm-3.3-0.g6f42cdb1bb8.2.el7.x86_64
should we get newer rpms installed?

Is there a way to disable the AMSHM path?  I'm wondering if that
would help since multi-node jobs seems to run fine.

Thanks for any help,

Howard



Re: [OMPI devel] psm2 and psm2_ep_open problems

2016-04-14 Thread Cabral, Matias A
Hi Howard,

I suspect this is the known issue that when using SLURM with OMPI and PSM that 
is discussed here:
https://www.open-mpi.org/community/lists/users/2010/12/15220.php

As per today, orte generates the psm_key, so when using SLURM this does not 
happen and is necessary to set it in the environment.  Here Ralph explains the 
workaround:
https://www.open-mpi.org/community/lists/users/2010/12/15242.php

As you found, an epid of 0 is not a valid value. So, basing my comments on:
https://github.com/01org/opa-psm2/blob/master/psm_ep.c

the assert is at line 832, and psmi_ep_open_device() will do:

/*
 * We use a LID of 0 for non-HFI communication.
 * Since a jobkey is not available from IPS, pull the
 * first 16 bits from the UUID.
 */

*epid = PSMI_EPID_PACK(((uint16_t *) unique_job_key)[0],
                       (rank >> 3), rank, 0,
                       PSMI_HFI_TYPE_DEFAULT, rank);
In the particular case you mention below, where there is no HFI (shared memory only), the rank is 0, and the passed key is 0, the epid will be 0.

SOLUTION: Set OMPI_MCA_orte_precondition_transports in the environment to a value different from 0.

Thanks,

_MAC

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Howard Pritchard
Sent: Thursday, April 14, 2016 1:10 PM
To: Open MPI Developers List 
Subject: [OMPI devel] psm2 and psm2_ep_open problems

Hi Folks,

So we have this brand-new omnipath cluster here at work, but people are having problems using it on a single node with srun as the job launcher.

The customer wants to use srun to launch jobs not the open mpi
mpirun.

The customer installed 1.10.1, but I can reproduce the
problem with v2.x and I'm sure with master, unless I build the
ofi mtl.  ofi mtl works, psm2 mtl doesn't.

I downloaded the psm2 code from github and started hacking.

What appears to be the problem is that when running on a single
node one can go through a path in psmi_ep_open_device where
for a single process job, the value stored into epid is zero.

This results in an assert failing in the __psm2_ep_open_internal
function.

Is there a quick and dirty workaround that doesn't involve fixing
psm2 MTL?  I could suggest to the sysadmins to install libfabric 1.3
and build the openmpi to only have ofi mtl, but perhaps there's
another way to get psm2 mtl to work for single node jobs?  I'd prefer
to not ask users to disable psm2 mtl explicitly for their single node jobs.

Thanks for suggestions.

Howard





[OMPI devel] orted hangs on SLES12 when running 80 ranks per node

2016-02-03 Thread Cabral, Matias A
Hi,

I have hit an issue in which orted hangs during the finalization of a job. This is reproduced by running 80 ranks per node (yes, oversubscribed) on a 4-node cluster that runs SLES12 with OMPI 1.10.2 (I also see it with 1.10.0). I found that it is independent of the binary used (I used a very simple sample that inits the ranks, does nothing, and finalizes) and also saw that it happens after MPI_Finalize(). It is not a deterministic issue and takes a few attempts to reproduce. When the hang occurs, the mpirun process does not get to wait() on its children (see below (1)) and is stuck on a poll() (see below (2)). I logged in to the other nodes and found all the "other" orted processes are also held on a different poll (see below (3)).  I found that after attaching gdb to mpirun and letting it continue, the execution finishes with no issues. The same thing happens when sending a SIGSTOP and then a SIGCONT to the hung mpirun.

(1)
root 164356 161186  0 16:50 pts/0    00:00:00 mpirun -np 320 --allow-run-as-root -machinefile ./user/hostfile /scratch/user/osu_multi_lat
root 164358 164356  0 16:50 pts/0    00:00:00 /usr/bin/ssh -x n3 PATH=/scratch/user/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/scratch/user/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD
root 164359 164356  0 16:50 pts/0    00:00:00 /usr/bin/ssh -x n2 PATH=/scratch/user/bin:$PATH ; export PATH ; LD_LIBRARY_PATH=/scratch/user/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; DYLD
root 164361 164356  0 16:50 pts/0    00:00:06 [osu_multi_lat]
root 164362 164356  0 16:50 pts/0    00:00:06 [osu_multi_lat]
root 164364 164356  0 16:50 pts/0    00:00:06 [osu_multi_lat]
root 164365 164356  0 16:50 pts/0    00:00:06 [osu_multi_lat]
root 164366 164356  0 16:50 pts/0    00:00:06 [osu_multi_lat]
root 164367 164356  0 16:50 pts/0    00:00:06 [osu_multi_lat]
root 164370 164356  0 16:50 pts/0    00:00:06 [osu_multi_lat]
root 164372 164356  0 16:50 pts/0    00:00:06 [osu_multi_lat]
root 164374 164356  0 16:50 pts/0    00:00:06 [osu_multi_lat]
root 164375 164356  0 16:50 pts/0    00:00:06 [osu_multi_lat]
root 164378 164356  0 16:50 pts/0    00:00:06 [osu_multi_lat]
root 164379 164356  0 16:50 pts/0    00:00:06 [osu_multi_lat]


(2)
gdb -p 164356
...

Missing separate debuginfos, use: zypper install 
glibc-debuginfo-2.19-17.72.x86_64
(gdb) bt
#0  0x7f143177a3cd in poll () from /lib64/libc.so.6
#1  0x7f14325e0636 in poll_dispatch () from 
/scratch/user/lib/libopen-pal.so.13
#2  0x7f14325d77bf in opal_libevent2021_event_base_loop () from 
/scratch/user/lib/libopen-pal.so.13
#3  0x004051cd in orterun (argc=7, argv=0x7fff8c4bb428) at 
orterun.c:1133
#4  0x00403a8d in main (argc=7, argv=0x7fff8c4bb428) at main.c:13


(3) (remote nodes orted)
(gdb) bt
#0  0x7f8c288d33b0 in __poll_nocancel () from /lib64/libc.so.6
#1  0x7f8c29941186 in poll_dispatch () /scratch/user/lib/libopen-pal.so.13
#2  0x7f8c2993830f in opal_libevent2021_event_base_loop () from 
/scratch/user/lib/libopen-pal.so.13
#3  0x7f8c29be75c4 in orte_daemon () from 
/scratch/user/lib/libopen-rte.so.12
#4  0x00400827 in main ()


Thanks,

_MAC



Re: [OMPI devel] Problem running from ompi master

2015-09-01 Thread Cabral, Matias A
Hi Ralph,

RHEL 7.0, building in the repo location.

Yes, running autogen.pl to generate configure.

I suspect this is unrelated, but I saw this while make install:

WARNING!  Common symbols found:
 btl_openib_lex.o: 0008 C btl_openib_ini_yyleng
 btl_openib_lex.o: 0008 C btl_openib_ini_yytext
 keyval_lex.o: 0008 C opal_util_keyval_yyleng
 keyval_lex.o: 0008 C opal_util_keyval_yytext
  show_help_lex.o: 0008 C opal_show_help_yyleng
  show_help_lex.o: 0008 C opal_show_help_yytext
rmaps_rank_file_lex.o: 0008 C orte_rmaps_rank_file_leng
rmaps_rank_file_lex.o: 0008 C orte_rmaps_rank_file_text
   hostfile_lex.o: 0008 C orte_util_hostfile_leng
   hostfile_lex.o: 0008 C orte_util_hostfile_text
make[3]: [install-exec-hook] Error 1 (ignored)

Thanks,

_MAC

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Tuesday, September 01, 2015 10:43 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] Problem running from ompi master

What system is this on? CentOS7? Are you doing a VPATH build, or doing the 
build in the repo location?

Also, I assume you remembered to run autogen.pl before configure, yes?


On Sep 1, 2015, at 10:11 AM, Cabral, Matias A 
<matias.a.cab...@intel.com> wrote:

Hi Gilles,

I deleted everything, re-cloned and re-built (without my patch), but still see 
the same issue.  The only option I'm using with configure is --prefix. I even 
tried building with --enable-mpirun-prefix-by-default, and also passing the 
prefix at runtime (mpirun -prefix =/...), but I always end up with the same 
issue. Is it possible that the issue is related to configure --prefix?

Thanks,

_MAC

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Monday, August 31, 2015 5:46 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] Problem running from ompi master

Hi,

this part has been revamped recently.

At first, I would recommend you make a fresh install:
remove the install directory (and the build directory if you use VPATH), then re-run 
configure && make && make install.
That should hopefully fix the issue.

Cheers,

Gilles
On 9/1/2015 9:35 AM, Cabral, Matias A wrote:
Hi,

Before submitting a pull request I decided to test some changes on the ompi master 
branch, but I'm facing an unrelated runtime error with the ess pmi component not 
being found. I confirmed that PATH and LD_LIBRARY_PATH are set correctly and also 
that mca_ess_pmi.so is where it should be.  Any suggestions?

Thanks,
Regards,

s-7  ~/devel/ompi> ls ./lib/openmpi/ |grep pmi
mca_ess_pmi.la
mca_ess_pmi.so
mca_pmix_pmix1xx.la
mca_pmix_pmix1xx.so

s-7 ~/devel/ompi> cat ~/.bashrc |grep -e PATH -e LD_LIBRARY_PATH
export PATH=$HOME/devel/ompi/bin/:$PATH
export LD_LIBRARY_PATH=$HOME/devel/ompi/lib


s-7 ~ ./bin/mpirun  -host s-7,s-8 -np 2  ./osu_latency
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
[s-7.abc.com:56614] Local abort before MPI_INIT completed successfully; not able 
to aggregate error messages, and not able to guarantee that all other processes 
were killed!
[s-7.abc.com:56614] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file 
runtime/orte_init.c at line 129
--
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:  s-7.abc.com
Framework: ess
Component: pmi
--
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_base_open failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environ

Re: [OMPI devel] Problem running from ompi master

2015-09-01 Thread Cabral, Matias A
Hi Gilles,

I deleted everything, re-cloned and re-built (without my patch), but still see 
the same issue.  The only option I'm using with configure is --prefix. I even 
tried building with --enable-mpirun-prefix-by-default, and also passing the 
prefix at runtime (mpirun -prefix =/...), but I always end up with the same 
issue. Is it possible that the issue is related to configure --prefix?
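
For completeness, the combinations I tried look roughly like this (a sketch, with 
the prefix taken from the .bashrc quoted below):

# prefix baked in at build time
./configure --prefix=$HOME/devel/ompi --enable-mpirun-prefix-by-default
make && make install

# prefix passed explicitly at run time
$HOME/devel/ompi/bin/mpirun --prefix $HOME/devel/ompi -host s-7,s-8 -np 2 ./osu_latency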

Thanks,

_MAC

From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Monday, August 31, 2015 5:46 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] Problem running from ompi master

Hi,

this part has been revamped recently.

At first, I would recommend you make a fresh install:
remove the install directory (and the build directory if you use VPATH), then re-run 
configure && make && make install.
That should hopefully fix the issue.

Cheers,

Gilles
On 9/1/2015 9:35 AM, Cabral, Matias A wrote:
Hi,

Before submitting a pull request I decided to test some changes on the ompi master 
branch, but I'm facing an unrelated runtime error with the ess pmi component not 
being found. I confirmed that PATH and LD_LIBRARY_PATH are set correctly and also 
that mca_ess_pmi.so is where it should be.  Any suggestions?

Thanks,
Regards,

s-7  ~/devel/ompi> ls ./lib/openmpi/ |grep pmi
mca_ess_pmi.la
mca_ess_pmi.so
mca_pmix_pmix1xx.la
mca_pmix_pmix1xx.so

s-7 ~/devel/ompi> cat ~/.bashrc |grep -e PATH -e LD_LIBRARY_PATH
export PATH=$HOME/devel/ompi/bin/:$PATH
export LD_LIBRARY_PATH=$HOME/devel/ompi/lib


s-7 ~ ./bin/mpirun  -host s-7,s-8 -np 2  ./osu_latency
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
[s-7.abc.com:56614] Local abort before MPI_INIT completed successfully; not 
able to aggregate error messages, and not able to guarantee that all other 
processes were killed!
[s-7.abc.com:56614] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file 
runtime/orte_init.c at line 129
--
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:  s-7.abc.com
Framework: ess
Component: pmi
--
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_base_open failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--
---
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
---
--
mpirun detected that one or more processes exited with non-zero status, thus 
causing
the job to be terminated. The first process to do so was:

  Process name: [[2886,1],0]
  Exit code:1
--




___
devel mailing list
de...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: http://www.open-mpi.org/community/lists/devel/2015/08/17908.php



[OMPI devel] Problem running from ompi master

2015-08-31 Thread Cabral, Matias A
Hi,

Before submitting a pull request I decided to test some changes on the ompi master 
branch, but I'm facing an unrelated runtime error with the ess pmi component not 
being found. I confirmed that PATH and LD_LIBRARY_PATH are set correctly and also 
that mca_ess_pmi.so is where it should be.  Any suggestions?

Thanks,
Regards,

s-7  ~/devel/ompi> ls ./lib/openmpi/ |grep pmi
mca_ess_pmi.la
mca_ess_pmi.so
mca_pmix_pmix1xx.la
mca_pmix_pmix1xx.so

s-7 ~/devel/ompi> cat ~/.bashrc |grep -e PATH -e LD_LIBRARY_PATH
export PATH=$HOME/devel/ompi/bin/:$PATH
export LD_LIBRARY_PATH=$HOME/devel/ompi/lib


s-7 ~ ./bin/mpirun  -host s-7,s-8 -np 2  ./osu_latency
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
[s-7.abc.com:56614] Local abort before MPI_INIT completed successfully; not 
able to aggregate error messages, and not able to guarantee that all other 
processes were killed!
[s-7.abc.com:56614] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file 
runtime/orte_init.c at line 129
--
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:  s-7.abc.com
Framework: ess
Component: pmi
--
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_base_open failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--
---
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
---
--
mpirun detected that one or more processes exited with non-zero status, thus 
causing
the job to be terminated. The first process to do so was:

  Process name: [[2886,1],0]
  Exit code:1
--
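
A couple of extra checks I can run if that helps (a sketch only, assuming the same 
prefix as above; the mca_base_component_show_load_errors parameter should make the 
runtime print why a component fails to open):

# do the component's own shared-library dependencies resolve?
ldd $HOME/devel/ompi/lib/openmpi/mca_ess_pmi.so | grep "not found"

# what does Open MPI itself report for the ess framework?
$HOME/devel/ompi/bin/ompi_info --param ess all
$HOME/devel/ompi/bin/mpirun --mca mca_base_component_show_load_errors 1 \
-host s-7,s-8 -np 2 ./osu_latency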


[OMPI devel] cosmetic misleading mpirun error message

2015-08-25 Thread Cabral, Matias A
Hi,

Playing with the 1.10.0 (just released) build I found a cosmetic but misleading 
error message in mpirun. If by mistake you type -hosts (with an extra "s"), the 
error message complains about an unknown "-o" option that is not actually being 
used. Typing the parameter correctly fixes the issue :)

m> mpirun --allow-run-as-root -hosts m7,m8 -np 2  osu_latency
mpirun: Error: unknown option "-o"
Type 'mpirun --help' for usage.
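
For comparison, the corrected invocation (single "s") parses fine:

m> mpirun --allow-run-as-root -host m7,m8 -np 2  osu_latency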

Thanks,
Regards,


_MAC