Re: [OMPI users] Error in hello_cxx.cc

2018-04-23 Thread r...@open-mpi.org
And 3) we no longer support Windows. You could try using the Cygwin port 
instead.

> On Apr 23, 2018, at 7:52 PM, Nathan Hjelm wrote:
> 
> Two things. 1) 1.4 is extremely old and you are unlikely to get much help 
> with it, and 2) the C++ bindings were deprecated in MPI-2.2 (2009) and 
> removed in MPI-3.0 (2012), so you probably want to use the C bindings instead.
> 
> -Nathan
> 
> On Apr 23, 2018, at 8:14 PM, Amir via users wrote:
> 
>> Yes, I am running under Windows using Visual Studio 2010 Express Edition. 
>> The build completes fine, but when I try to run the program I get error 
>> code 6 in MPI::Init(). I have installed openmpi-1.4.5. I have also attached 
>> the CMake log file; I hope this helps. A screenshot of the ipconfig output 
>> is also attached.

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] Error in hello_cxx.cc

2018-04-23 Thread Nathan Hjelm
Two things. 1) 1.4 is extremely old and you are unlikely to get much help with 
it, and 2) the C++ bindings were deprecated in MPI-2.2 (2009) and removed in 
MPI-3.0 (2012), so you probably want to use the C bindings instead.

-Nathan

> On Apr 23, 2018, at 8:14 PM, Amir via users wrote:
> 
> Yes, I am running under Windows using Visual Studio 2010 Express Edition. The 
> build completes fine, but when I try to run the program I get error code 6 in 
> MPI::Init(). I have installed openmpi-1.4.5. I have also attached the CMake 
> log file; I hope this helps. A screenshot of the ipconfig output is also 
> attached.

[OMPI users] OpenMPI 3.0.1 debug crashes eclipse due to mpirun -display-map bug

2018-04-23 Thread Érico
Hello,

Because of the error below, Eclipse is not able to run the PTP debugger with 
OpenMPI 3.0.1.

Can someone help me?

I use CentOS 7. 

Thanks!

Erico

-

[erico@centos64 ContainerServiceDebug]$ mpirun -mca orte_show_resolved_nodenames 1 -display-map -np 1 pwd
 Data for JOB [23564,1] offset 0 Total slots allocated 4
[centos64:98315] *** Process received signal ***
[centos64:98315] Signal: Segmentation fault (11)
[centos64:98315] Signal code: Address not mapped (1)
[centos64:98315] Failing at address: (nil)
[centos64:98315] [ 0] /usr/lib64/libpthread.so.0(+0xf100)[0x7f74537c9100]
[centos64:98315] [ 1] /usr/local/lib/libopen-rte.so.40(orte_dt_print_node+0x451)[0x7f7454a6a35f]
[centos64:98315] [ 2] /usr/local/lib/libopen-pal.so.40(opal_dss_print+0x68)[0x7f745474d1d5]
[centos64:98315] [ 3] /usr/local/lib/libopen-rte.so.40(orte_dt_print_map+0x517)[0x7f7454a6b834]
[centos64:98315] [ 4] /usr/local/lib/libopen-pal.so.40(opal_dss_print+0x68)[0x7f745474d1d5]
[centos64:98315] [ 5] /usr/local/lib/libopen-rte.so.40(orte_rmaps_base_display_map+0x53b)[0x7f7454aefd0c]
[centos64:98315] [ 6] /usr/local/lib/libopen-rte.so.40(orte_odls_base_default_construct_child_list+0x13f7)[0x7f7454acf090]
[centos64:98315] [ 7] /usr/local/lib/openmpi/mca_odls_default.so(+0x2c7c)[0x7f744d5f3c7c]
[centos64:98315] [ 8] /usr/local/lib/libopen-rte.so.40(orte_daemon_recv+0x6d7)[0x7f7454a9bdb5]
[centos64:98315] [ 9] /usr/local/lib/libopen-rte.so.40(orte_rml_base_process_msg+0x2e5)[0x7f7454afbde8]
[centos64:98315] [10] /usr/local/lib/libopen-pal.so.40(opal_libevent2022_event_base_loop+0x8fc)[0x7f74547a246c]
[centos64:98315] [11] mpirun[0x4016f7]
[centos64:98315] [12] mpirun[0x4010e0]
[centos64:98315] [13] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f7453419b15]
[centos64:98315] [14] mpirun[0x400ff9]
[centos64:98315] *** End of error message ***
Segmentation fault (core dumped)
[erico@centos64 ContainerServiceDebug]$ 



Re: [OMPI users] openmpi/slurm/pmix

2018-04-23 Thread r...@open-mpi.org
Hi Michael

Looks like the problem is that you didn’t wind up with the external PMIx. The 
component listed in your error is the internal PMIx one which shouldn’t have 
built given that configure line.

Check your config.out and see what happened. Also, ensure that your 
LD_LIBRARY_PATH is properly pointing to the installation, and that you built 
into a “clean” prefix.
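
One way to see which libevent a component was linked against is to list its 
undefined symbols. The following is only a rough sketch, assuming GNU nm is on 
the PATH and assuming that the opal_libevent2022_ prefix seen in the error 
marks Open MPI's bundled libevent. The helper names and the command-line 
argument are hypothetical, chosen for illustration.

```python
import subprocess
import sys

# Prefix taken from the symbol in the reported error. Treating it as the marker
# for Open MPI's bundled libevent is an assumption of this sketch.
INTERNAL_LIBEVENT_PREFIX = "opal_libevent2022_"


def undefined_symbols(nm_output):
    """Return the names that `nm -D` marks as undefined (type 'U')."""
    names = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined entries have no address column: "U name" or "U name@VERSION".
        if len(fields) == 2 and fields[0] == "U":
            names.append(fields[1].split("@")[0])
    return names


def internal_libevent_refs(nm_output, prefix=INTERNAL_LIBEVENT_PREFIX):
    """Keep only the undefined symbols that carry the given prefix."""
    return [n for n in undefined_symbols(nm_output) if n.startswith(prefix)]


def run_nm(library_path):
    """Run `nm -D` on a shared object and return its text output."""
    result = subprocess.run(
        ["nm", "-D", library_path], check=True, capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python check_symbols.py <path to mca_pmix_pmix2x.so>
    for name in internal_libevent_refs(run_nm(sys.argv[1])):
        print(name)
```

If this prints anything for the installed component, the component still 
expects the bundled libevent, which would be consistent with a stale or 
internal build rather than the external PMIx/libevent pair.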


> On Apr 23, 2018, at 12:01 PM, Michael Di Domenico wrote:
> 
> i'm trying to get slurm 17.11.5 and openmpi 3.0.1 working with pmix.
> 
> everything compiled, but when i run something i get
> 
> : symbol lookup error: /openmpi/mca_pmix_pmix2x.so: undefined symbol:
> opal_libevent2022_evthread_use_pthreads
> 
> i'm more than sure i did something wrong, but i'm not sure what. here's what 
> i did
> 
> compile libevent 2.1.8
> 
> ./configure --prefix=/libevent-2.1.8
> 
> compile pmix 2.1.0
> 
> ./configure --prefix=/pmix-2.1.0 --with-psm2 \
>   --with-munge=/munge-0.5.13 --with-libevent=/libevent-2.1.8
> 
> compile openmpi
> 
> ./configure --prefix=/openmpi-3.0.1 --with-slurm=/slurm-17.11.5 \
>   --with-hwloc=external --with-mxm=/opt/mellanox/mxm \
>   --with-cuda=/usr/local/cuda --with-pmix=/pmix-2.1.0 \
>   --with-libevent=/libevent-2.1.8
> 
> when i look at the symbols in the mca_pmix_pmix2x.so library the
> function is indeed undefined (U) in the output, but checking ldd
> against the library doesn't show anything missing
> 
> any thoughts?

[OMPI users] openmpi/slurm/pmix

2018-04-23 Thread Michael Di Domenico
i'm trying to get slurm 17.11.5 and openmpi 3.0.1 working with pmix.

everything compiled, but when i run something i get

: symbol lookup error: /openmpi/mca_pmix_pmix2x.so: undefined symbol:
opal_libevent2022_evthread_use_pthreads

i'm more than sure i did something wrong, but i'm not sure what. here's what i did

compile libevent 2.1.8

./configure --prefix=/libevent-2.1.8

compile pmix 2.1.0

./configure --prefix=/pmix-2.1.0 --with-psm2 \
  --with-munge=/munge-0.5.13 --with-libevent=/libevent-2.1.8

compile openmpi

./configure --prefix=/openmpi-3.0.1 --with-slurm=/slurm-17.11.5 \
  --with-hwloc=external --with-mxm=/opt/mellanox/mxm \
  --with-cuda=/usr/local/cuda --with-pmix=/pmix-2.1.0 \
  --with-libevent=/libevent-2.1.8

when i look at the symbols in the mca_pmix_pmix2x.so library the
function is indeed undefined (U) in the output, but checking ldd
against the library doesn't show anything missing

any thoughts?


Re: [OMPI users] btl_openib_if_include

2018-04-23 Thread Marshall2, John (SSC/SPC)
Hi,

That gives me an avenue to pursue.

Thanks,
John

On Mon, 2018-04-23 at 15:12 +, Jeff Squyres (jsquyres) wrote:
> On Apr 23, 2018, at 11:00 AM, Marshall2, John (SSC/SPC) wrote:
>> 
>> Only one ib interface shows up via ifconfig and at /sys/class/net/ibX.
>> 
>> But, under /sys/class/infiniband and /sys/class/infiniband_cm, all the
>> mlx4_Y do show up. E.g.,
>> mlx4_0  mlx4_10  mlx4_12  mlx4_14  mlx4_16  mlx4_3  mlx4_5  mlx4_7  mlx4_9
>> mlx4_1  mlx4_11  mlx4_13  mlx4_15  mlx4_2   mlx4_4  mlx4_6  mlx4_8
>> 
>> I'm not sure if this can be avoided.
>> 
>> So, where is openmpi looking for the available mlx4_Y? Under one of those
>> two directories or whatever is at
>> /sys/class/net/ibX/device/infiniband/mlx4_Y?
> 
> It will use whatever devices libibverbs reports back.
> 
> It's been quite a while since I've looked in the libibverbs code, but it
> *might* return all the devices...?  What does ibv_devinfo(1) return inside one
> of your containers?  That's probably the same information that is returned to
> Open MPI programmatically via the libibverbs API.
> 
> If libibverbs is returning all devices vs. just the one that is actually
> available in your container, then that might explain the performance disparity.



Re: [OMPI users] btl_openib_if_include

2018-04-23 Thread Jeff Squyres (jsquyres)
On Apr 23, 2018, at 11:00 AM, Marshall2, John (SSC/SPC) wrote:
> 
> Only one ib interface shows up via ifconfig and at /sys/class/net/ibX.
> 
> But, under /sys/class/infiniband and /sys/class/infiniband_cm, all the mlx4_Y
> do show up. E.g.,
> mlx4_0  mlx4_10  mlx4_12  mlx4_14  mlx4_16  mlx4_3  mlx4_5  mlx4_7  mlx4_9
> mlx4_1  mlx4_11  mlx4_13  mlx4_15  mlx4_2   mlx4_4  mlx4_6  mlx4_8
> 
> I'm not sure if this can be avoided.
> 
> So, where is openmpi looking for the available mlx4_Y? Under one of those two
> directories or whatever is at /sys/class/net/ibX/device/infiniband/mlx4_Y?

It will use whatever devices libibverbs reports back.

It's been quite a while since I've looked in the libibverbs code, but it 
*might* return all the devices...?  What does ibv_devinfo(1) return inside one 
of your containers?  That's probably the same information that is returned to 
Open MPI programmatically via the libibverbs API.

If libibverbs is returning all devices vs. just the one that is actually 
available in your container, then that might explain the performance disparity.
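
As a rough cross-check from inside a container, the sysfs paths mentioned in 
this thread can be compared directly. The sketch below only reads sysfs; it 
does not call libibverbs, so ibv_devinfo remains the better indication of what 
Open MPI will see. The function names and the sys_root parameter are 
hypothetical, added so the logic can be exercised against a test directory.

```python
import os


def list_dir(path):
    """Return the sorted entries of a directory, or [] if it is unreadable."""
    try:
        return sorted(os.listdir(path))
    except OSError:
        return []


def visible_hcas(sys_root="/sys"):
    """Devices listed under <sys_root>/class/infiniband (e.g. mlx4_0, mlx4_13)."""
    return list_dir(os.path.join(sys_root, "class", "infiniband"))


def hcas_behind_net_interfaces(sys_root="/sys"):
    """Map each ibX network interface to the devices under its device/infiniband."""
    net_dir = os.path.join(sys_root, "class", "net")
    mapping = {}
    for iface in list_dir(net_dir):
        if iface.startswith("ib"):
            mapping[iface] = list_dir(
                os.path.join(net_dir, iface, "device", "infiniband")
            )
    return mapping


def unbacked_hcas(sys_root="/sys"):
    """Visible devices that no ibX interface in this namespace points to."""
    backed = {dev for devs in hcas_behind_net_interfaces(sys_root).values() for dev in devs}
    return [dev for dev in visible_hcas(sys_root) if dev not in backed]


if __name__ == "__main__":
    print("visible:", visible_hcas())
    print("per interface:", hcas_behind_net_interfaces())
    print("visible but not backed by an ibX interface:", unbacked_hcas())
```

A long "not backed" list alongside a single ibX interface would match the 
situation described above, where every mlx4_Y shows up under 
/sys/class/infiniband but only one interface is usable.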

-- 
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] btl_openib_if_include

2018-04-23 Thread Marshall2, John (SSC/SPC)
On Mon, 2018-04-23 at 14:44 +, Jeff Squyres (jsquyres) wrote:
> On Apr 20, 2018, at 1:03 PM, Marshall2, John (SSC/SPC) wrote:
>> 
>> I am trying to verify/determine what the proper setting is for
>> btl_openib_ib_include.
> 
> I think you mean btl_openib_if_include ("if" = "interface").

Yes.

>> Some background:
>> * openmpi 2.1.1 (and 1.6.5 - yes it is old)
>> * lxc containers
>> * SRIOV (virtual functions) being used
>> * dedicated IB interface (e.g., ib2) per container
>> 
>> Should the mlx4_X:1 correspond to a specific ibY interface? E.g., for ib26,
>> I find mlx4_13:1 by:
>> $ ls /sys/class/net/ib26/device/infiniband
>> mlx4_13
>> 
>> Does the mlx4_X have to be determined at each location where an mpi task
>> would run? I suppose it would because the ibY is likely to be different.
> 
> Open MPI basically probes its environment at run time.  In your case, it will
> find all available IB interfaces (per MPI process), filter them through
> if_include / if_exclude, and then use whatever is left.
> 
>> On some tests, I have found that the setting:
>> export OMPI_MCA_btl_openib_if_include=mlx4_0:1
>> 
>> provides better performance than not specifying a value or letting
>> mpirun/orted figure it out at runtime.
> 
> That's a little surprising.
> 
> Do you have more than 1 IB interface?  If not, then Open MPI should likely be
> independently coming to the same conclusion (i.e., "mlx4_0:1").  If it's not,
> that's weird.

Only one ib interface shows up via ifconfig and at /sys/class/net/ibX.

But, under /sys/class/infiniband and /sys/class/infiniband_cm, all the mlx4_Y
do show up. E.g.,
mlx4_0  mlx4_10  mlx4_12  mlx4_14  mlx4_16  mlx4_3  mlx4_5  mlx4_7  mlx4_9
mlx4_1  mlx4_11  mlx4_13  mlx4_15  mlx4_2   mlx4_4  mlx4_6  mlx4_8

I'm not sure if this can be avoided.

So, where is openmpi looking for the available mlx4_Y? Under one of those two
directories or whatever is at /sys/class/net/ibX/device/infiniband/mlx4_Y?

Thanks,
John

Re: [OMPI users] btl_openib_if_include

2018-04-23 Thread Jeff Squyres (jsquyres)
On Apr 20, 2018, at 1:03 PM, Marshall2, John (SSC/SPC) wrote:
> 
> I am trying to verify/determine what the proper setting is for 
> btl_openib_ib_include.

I think you mean btl_openib_if_include ("if" = "interface").

> Some background:
> * openmpi 2.1.1 (and 1.6.5 - yes it is old)
> * lxc containers
> * SRIOV (virtual functions) being used
> * dedicated IB interface (e.g., ib2) per container
> 
> Should the mlx4_X:1 correspond to a specific ibY interface? E.g., for ib26,
> I find mlx4_13:1 by:
> $ ls /sys/class/net/ib26/device/infiniband
> mlx4_13
> 
> Does the mlx4_X have to be determined at each location where an mpi task
> would run? I suppose it would because the ibY is likely to be different.

Open MPI basically probes its environment at run time.  In your case, it will 
find all available IB interfaces (per MPI process), filter them through 
if_include / if_exclude, and then use whatever is left.
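
The probe-then-filter behaviour can be pictured with a small model. This is a 
simplified illustration and not Open MPI's actual code: the function names are 
hypothetical, and two points are assumptions of the sketch, namely that a bare 
device name such as mlx4_13 selects every port of that device and that the 
include and exclude lists are not combined.

```python
def parse_spec(spec):
    """Split "mlx4_0:1,mlx4_3" into (device, port) pairs; port None = any port."""
    entries = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        device, sep, port = item.partition(":")
        entries.append((device, int(port) if sep else None))
    return entries


def matches(device, port, entries):
    """True if (device, port) is covered by any entry of a parsed spec."""
    return any(d == device and (p is None or p == port) for d, p in entries)


def select_ports(available, include=None, exclude=None):
    """Filter the probed (device, port) pairs through an include or exclude spec."""
    if include and exclude:
        # Assumption of this sketch: only one of the two lists may be given.
        raise ValueError("specify include or exclude, not both")
    if include:
        entries = parse_spec(include)
        return [(d, p) for d, p in available if matches(d, p, entries)]
    if exclude:
        entries = parse_spec(exclude)
        return [(d, p) for d, p in available if not matches(d, p, entries)]
    # No filter given: everything that was probed is used.
    return list(available)
```

For example, with probed ports [("mlx4_0", 1), ("mlx4_13", 1)], calling 
select_ports(ports, include="mlx4_13:1") keeps only the second entry, while 
calling it with no filter keeps both. In this model, whatever the probe 
reports is used when no filter is set.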

> On some tests, I have found that the setting:
> export OMPI_MCA_btl_openib_if_include=mlx4_0:1
> 
> provides better performance than not specifying a value or letting
> mpirun/orted figure it out at runtime.

That's a little surprising.

Do you have more than 1 IB interface?  If not, then Open MPI should likely be 
independently coming to the same conclusion (i.e., "mlx4_0:1").  If it's not, 
that's weird.

-- 
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] Error in hello_cxx.cc

2018-04-23 Thread r...@open-mpi.org
Also, I note from the screenshot that you appear to be running on Windows with 
a Windows binary. Correct?


> On Apr 23, 2018, at 7:08 AM, Jeff Squyres (jsquyres) wrote:
> 
> Can you send all the information listed here:
> 
>     https://www.open-mpi.org/community/help/
> 
>> On Apr 22, 2018, at 2:28 PM, Amir via users wrote:
>> 
>> Hi everybody,
>> 
>> After having some problems with setting up the debugging environment for 
>> Visual Studio 10, I am trying to debug the first Open MPI example program 
>> (hello_cxx.cc).
>> 
>> I am getting an error in the function call MPI::Init(). The attached 
>> screenshot should clarify this better.  I guess this is related to the rank, 
>> but I don't have any idea why or how to fix it.
>> 
>> Any guidance is highly appreciated.
>> 
>> Thank you,
>> 
>> Amir
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 



Re: [OMPI users] Error in hello_cxx.cc

2018-04-23 Thread Jeff Squyres (jsquyres)
Can you send all the information listed here:

https://www.open-mpi.org/community/help/



> On Apr 22, 2018, at 2:28 PM, Amir via users wrote:
> 
> Hi everybody,
> 
> After having some problems with setting up the debugging environment for 
> Visual Studio 10, I am trying to debug the first Open MPI example program 
> (hello_cxx.cc).
> 
> I am getting an error in the function call MPI::Init(). The attached 
> screenshot should clarify this better.  I guess this is related to the rank, 
> but I don't have any idea why or how to fix it.
> 
> Any guidance is highly appreciated.
> 
> Thank you,
> 
> Amir


-- 
Jeff Squyres
jsquy...@cisco.com
