Re: [OMPI users] Several threads making progress - How to disable them

2016-08-04 Thread r...@open-mpi.org
Yep, there are indeed two progress threads running - and no, you cannot disable 
them. They are, however, “blocked” so they aren’t eating any cycles during 
normal operation unless an event that requires their attention wakes them up. 
So they shouldn’t interfere with your app.
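
To illustrate the "blocked" part - this is just a standalone sketch, not Open MPI
code - a helper thread parked in epoll_wait() with an infinite timeout sleeps in
the kernel and burns no CPU until an event arrives, which is the same state the
gstack output below shows for the two progress threads:

/* illustrative only - build with: cc -pthread demo.c -o demo */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/epoll.h>

static int epfd;

static void *progress_like_thread(void *arg)
{
    struct epoll_event ev;
    (void)arg;
    /* blocks in the kernel; costs no cycles until an event fires */
    int n = epoll_wait(epfd, &ev, 1, -1);
    printf("helper thread woke up with %d event(s)\n", n);
    return NULL;
}

int main(void)
{
    int fds[2];
    struct epoll_event ev;
    pthread_t tid;

    pipe(fds);
    epfd = epoll_create1(0);
    ev.events = EPOLLIN;
    ev.data.fd = fds[0];
    epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev);
    pthread_create(&tid, NULL, progress_like_thread, NULL);

    sleep(2);               /* the "application" works; the helper thread stays asleep */
    write(fds[1], "x", 1);  /* an event finally wakes it */
    pthread_join(tid, NULL);
    return 0;
}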


> On Aug 4, 2016, at 4:23 PM, Jaime Arteaga  wrote:
> 
> Hi:
> 
> When running a simple MPI program in the form of:
> 
> int main()
> {
> ...
> MPI_Init();
> ...
> sleep(10);
> ...
> MPI_Finalize();
> ...
> }
> 
> Two threads, in addition to the master, can be seen when using gstack:
> 
> Thread 3 (Thread 0x7f2238a6c700 (LWP 106578)):
> #0  0x7f223a869783 in epoll_wait () from /lib64/libc.so.6
> #1  0x7f223a268983 in epoll_dispatch (base=0x1be76c0, tv=) 
> at epoll.c:407
> #2  0x7f223a26c3d0 in opal_libevent2022_event_base_loop (base=0x1be76c0, 
> flags=1) at event.c:1630
> #3  0x7f2238a91b9d in progress_engine () from 
> ../openmpi-2.0.0/lib/openmpi/mca_pmix_pmix112.so
> #4  0x7f223ab3bdf5 in start_thread () from /lib64/libpthread.so.0
> #5  0x7f223a8691ad in clone () from /lib64/libc.so.6
> Thread 2 (Thread 0x7f2233fff700 (LWP 106579)):
> #0  0x7f223a85eb7d in poll () from /lib64/libc.so.6
> #1  0x7f223a274736 in poll_dispatch (base=0x1be8bd0, tv=0x7f2233ffeea0) 
> at poll.c:165
> #2  0x7f223a26c3d0 in opal_libevent2022_event_base_loop (base=0x1be8bd0, 
> flags=1) at event.c:1630
> #3  0x7f223a23115e in progress_engine () from 
> .../openmpi-2.0.0/lib/libopen-pal.so.20
> #4  0x7f223ab3bdf5 in start_thread () from /lib64/libpthread.so.0
> #5  0x7f223a8691ad in clone () from /lib64/libc.so.6
> Thread 1 (Thread 0x7f223b23e740 (LWP 106577)):
> #0  0x7f223a83048d in nanosleep () from /lib64/libc.so.6
> #1  0x7f223a830324 in sleep () from /lib64/libc.so.6
> #2  0x00400acb in main (argc=1, argv=0x7fffd6e7b498)
> 
> What is the designation of these threads? Are they progress threads?
> 
> Also, if I'd wanted to disable the extra threads, so there was only one 
> thread (the master one), how could I do it?
> 
> 
> Info:
> 
> OpenMPI used: 2.0.0
> 
> Command used: mpirun -machinefile ~/hostfile --n 2 --map-by node ./test
> 
> Thanks!
> 
> Jaime
> 
> 

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] OPENSHMEM ERROR with 2+ Distributed Machines

2016-08-12 Thread r...@open-mpi.org
Just as a suggestion: most of us are leery of opening Word attachments on 
mailing lists. I’d suggest sending this to us as plain text if you want us to 
read it.


> On Aug 12, 2016, at 4:03 AM, Debendra Das  wrote:
> 
> I have installed OpenMPI-2.0.0 on 5 systems with IP addresses 172.16.5.29, 
> 172.16.5.30, 172.16.5.31, 172.16.5.32, and 172.16.5.33. While executing the 
> hello_oshmem_c.c program (under the examples directory), the correct output 
> comes only when 2 distributed machines are used, but an error comes up when 3 
> or more distributed machines are used. The outputs and the host file are 
> attached. Can anybody please help me to sort out this error?
> 
> Thanking You.
> Debendranath Das 

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] SGE integration broken in 2.0.0

2016-08-12 Thread r...@open-mpi.org
IIRC, the rationale behind adding the check was that someone using SGE wanted 
to specify a custom launch agent, and we were overriding it with qrsh. However, 
the check is incorrect as that MCA param cannot be NULL.

I have updated this on master - can you see if this fixes the problem for you?

https://github.com/open-mpi/ompi/pull/1957

As for the blank in the cmd line - that is likely due to a space reserved for 
some entry that you aren’t using (e.g., when someone manually specifies the 
prefix). It shouldn’t cause any harm as the cmd line parser is required to 
ignore spaces

The -ldl problem sounds like a configuration issue - you might want to file a 
separate issue about it
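
For context - this is just the generic usage, not part of the fix above - the
custom launch agent in question is the one a user supplies via the plm_rsh_agent
MCA parameter, e.g.:

mpirun --mca plm_rsh_agent "ssh -x" -np 4 ./a.out

and the intent of the check was to honor such a setting instead of forcing qrsh
under SGE.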

> On Aug 11, 2016, at 4:28 AM, Reuti  wrote:
> 
> Hi,
> 
> In the file orte/mca/plm/rsh/plm_rsh_component I see an if-statement, which 
> seems to prevent the tight integration with SGE to start:
> 
>if (NULL == mca_plm_rsh_component.agent) {
> 
> Why is it there (it wasn't in 1.10.3)?
> 
> If I just remove it I get:
> 
> [node17:25001] [[27678,0],0] plm:rsh: final template argv:
>qrsh   orted --hnp-topo-sig ...
> 
> instead of the former:
> 
> /usr/sge/bin/lx24-amd64/qrsh -inherit -nostdin -V -verbose   orted 
> --hnp-topo-sig ...
> 
> So, just removing the if-statement is not a perfect cure as the 
> $SGE_ROOT/$ARC does not prefix `qrsh`.
> 
> ==
> 
> BTW: why is there blank before " orted" in the assembled command line - and 
> it's really in the argument when I check this on the slave nodes what should 
> be started by the `qrsh_starter`? As long as there is a wrapping shell, it 
> will be removed anyway. But in a special setup we noticed this additional 
> blank.
> 
> ==
> 
> I also notice, that I have to supply "-ldl" to `mpicc` to allow the 
> compilation of an application to succeed in 2.0.0.
> 
> -- Reuti

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] mpirun won't find programs from the PATH environment variable that are in directories that are relative paths

2016-08-12 Thread r...@open-mpi.org
Sorry for the delay - I had to catch up on some other things before I could come 
back to checking this one. It took me a while to track this down, but the change is 
in test for master:

https://github.com/open-mpi/ompi/pull/1958

Once complete, I’ll set it up for inclusion in v2.0.1
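
In the meantime, the workaround discussed below - putting an absolute directory
on PATH before calling mpirun - would look something like this (the psana path is
just the example from the thread):

export PATH=$PWD/arch/x86_64-rhel7-gcc48-opt/bin:$PATH
mpirun -n 1 psana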

Thanks for reporting it!
Ralph


> On Jul 29, 2016, at 5:47 PM, Phil Regier  wrote:
> 
> If I'm reading you right, you're presently unable to do the equivalent 
> (albeit probably with PATH set on a different line somewhere above) of
> 
> PATH=arch/x86_64-rhel7-gcc48-opt/bin mpirun -n 1 psana
> 
> I'm mildly curious whether it would help to add a leading "./" to get the 
> equivalent of
> 
> PATH=./arch/x86_64-rhel7-gcc48-opt/bin mpirun -n 1 psana
> 
> But to be clear, I'm advocating
> 
> PATH=$PWD/arch/x86_64-rhel7-gcc48-opt/bin mpirun -n 1 psana
> 
> as opposed to
> 
> mpirun -n 1 $PWD/arch/x86_64-rhel7-gcc48-opt/bin/psana
> 
> mostly because you still get to set the path once and use it many times 
> without duplicating code.
> 
> 
> For what it's worth, I've seen Ralph's suggestion generalized to something 
> like
> 
> PREFIX=$PWD/arch/x86_64-rhel7-gcc48-opt/bin mpirun -n 1 $PREFIX/psana
> 
> where PREFIX might be set above in the same script, or sourced from a common 
> config script or a custom environment module.  I think this style appeals to 
> many users on many levels.
> 
> 
> In any event, though, if this really is a bug that gets fixed, you've got 
> lots of options.
> 
> 
> 
> 
> On Fri, Jul 29, 2016 at 5:24 PM, Schneider, David A. 
> <david...@slac.stanford.edu> wrote:
> Hi, Thanks for the reply! It does look like mpirun runs from the same 
> directory as where I launch it, and that the environment has the same value 
> for PATH that I had before (with the relative directory in front), but of 
> course, there are lots of other MPI based environment variables defined - 
> maybe one of those means don't use the relative paths?
> 
> Explicitly setting the path with $PWD like you say, yes, I agree that is a 
> good defensive practice, but it is more cumbersome, the actually path looks
> 
>  mpirun -n 1 $PWD/arch/x86_64-rhel7-gcc48-opt/bin/psana
> 
> best,
> 
> David Schneider
> SLAC/LCLS
> 
> From: users [users-boun...@lists.open-mpi.org] on behalf of Phil Regier 
> [preg...@penguincomputing.com]
> Sent: Friday, July 29, 2016 5:12 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] mpirun won't find programs from the PATH 
> environment variable that are in directories that are relative paths
> 
> I might be three steps behind you here, but does "mpirun  pwd" show 
> that all your launched processes are running in the same directory as the 
> mpirun command?  I assume that "mpirun  env" would show that your PATH 
> variable is being passed along correctly, since you don't have any problems 
> with absolute paths.  In any event, is PATH=$PWD/dir/bin not an option?
> 
> Seems to me that this last would be good practice for location-sensitive 
> launches in general, though I do tend to miss things.
> 
> On Fri, Jul 29, 2016 at 4:34 PM, Schneider, David A. 
>  wrote:
> I am finding, on linux, rhel7, with openmpi 1.8.8 and 1.10.3, that mpirun 
> won't find apps that are specified on a relative path, i.e., if I have
> 
> PATH=dir/bin
> 
> and I am in a directory which has dir/bin as a subdirectory, and an 
> executable dir/bin/myprogram, I can't do
> 
> mpirun myprogram
> 
> I get the error message that
> 
> mpirun was unable to find the specified executable file, and therefore
> did not launch the job.
> 
> whereas if I put an absolute path, something like
> 
> PATH=/home/me/dir/bin
> 
> then it works.
> 
> This causes a problematic silent failure: sometimes we use relative 
> directories to override a 'base' release, so if I had
> 
> PATH=dir/bin:/central/install/dir/bin
> 
> and myprogram was in both dir/bin and /central/install/dir/bin, through 
> mpirun, I would be running myprogram from the central install, but otherwise 
> I would run it from my own directory.
> 
> Do other people find this is the case? I wonder if it is a problem that got 
> introduced through our installation of openmpi.  We do create relocatable 
> rpm's, and I'm also trying openmpi from a conda package that is relocatable, 
> I think all the prefix paths in the binary and text files were corrected 
> properly for the install - at least everything else seems to work.
> 
> best,
> 
> David Schneider
> SLAC/LCLS

Re: [OMPI users] SGE integration broken in 2.0.0

2016-08-12 Thread r...@open-mpi.org
Don’t know about the toolchain issue - I use those same versions, and don’t 
have a problem. I’m on CentOS-7, so that might be the difference?

Anyway, I found the missing code to assemble the cmd line for qrsh - not sure 
how/why it got deleted.

https://github.com/open-mpi/ompi/pull/1960


> On Aug 12, 2016, at 12:15 PM, Reuti  wrote:
> 
>> 
>> On 12.08.2016 at 16:52, r...@open-mpi.org wrote:
>> 
>> IIRC, the rationale behind adding the check was that someone using SGE 
>> wanted to specify a custom launch agent, and we were overriding it with 
>> qrsh. However, the check is incorrect as that MCA param cannot be NULL.
>> 
>> I have updated this on master - can you see if this fixes the problem for 
>> you?
>> 
>> https://github.com/open-mpi/ompi/pull/1957
> 
> I updated my tools to:
> 
> autoconf-2.69
> automake-1.15
> libtool-2.4.6
> 
> but with Open MPI's ./autogen.pl I get:
> 
> configure.ac:152: error: possibly undefined macro: AC_PROG_LIBTOOL
> 
> I recall seeing this before already - how do I get rid of it? For now I fixed the 
> single source file just by hand.
> 
> -- Reuti
> 
> 
>> As for the blank in the cmd line - that is likely due to a space reserved 
>> for some entry that you aren’t using (e.g., when someone manually specifies 
>> the prefix). It shouldn’t cause any harm as the cmd line parser is required 
>> to ignore spaces
>> 
>> The -ldl problem sounds like a configuration issue - you might want to file 
>> a separate issue about it
>> 
>>> On Aug 11, 2016, at 4:28 AM, Reuti  wrote:
>>> 
>>> Hi,
>>> 
>>> In the file orte/mca/plm/rsh/plm_rsh_component I see an if-statement, which 
>>> seems to prevent the tight integration with SGE to start:
>>> 
>>>  if (NULL == mca_plm_rsh_component.agent) {
>>> 
>>> Why is it there (it wasn't in 1.10.3)?
>>> 
>>> If I just remove it I get:
>>> 
>>> [node17:25001] [[27678,0],0] plm:rsh: final template argv:
>>>  qrsh   orted --hnp-topo-sig ...
>>> 
>>> instead of the former:
>>> 
>>> /usr/sge/bin/lx24-amd64/qrsh -inherit -nostdin -V -verbose   
>>> orted --hnp-topo-sig ...
>>> 
>>> So, just removing the if-statement is not a perfect cure as the 
>>> $SGE_ROOT/$ARC does not prefix `qrsh`.
>>> 
>>> ==
>>> 
>>> BTW: why is there blank before " orted" in the assembled command line - and 
>>> it's really in the argument when I check this on the slave nodes what 
>>> should be started by the `qrsh_starter`? As long as there is a wrapping 
>>> shell, it will be removed anyway. But in a special setup we noticed this 
>>> additional blank.
>>> 
>>> ==
>>> 
>>> I also notice, that I have to supply "-ldl" to `mpicc` to allow the 
>>> compilation of an application to succeed in 2.0.0.
>>> 
>>> -- Reuti
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] SGE integration broken in 2.0.0

2016-08-12 Thread r...@open-mpi.org

> On Aug 12, 2016, at 1:48 PM, Reuti  wrote:
> 
> 
> On 12.08.2016 at 21:44, r...@open-mpi.org wrote:
> 
>> Don’t know about the toolchain issue - I use those same versions, and don’t 
>> have a problem. I’m on CentOS-7, so that might be the difference?
>> 
>> Anyway, I found the missing code to assemble the cmd line for qrsh - not 
>> sure how/why it got deleted.
>> 
>> https://github.com/open-mpi/ompi/pull/1960
> 
> Yep, it's working again - thx.
> 
> But for sure there was a reason behind the removal, which may be elaborated 
> in the Open MPI team to avoid any side effects by fixing this issue.

I actually don’t recall a reason - and I’m the one that generally maintains 
that code area. I think it fell off the map accidentally when I was updating 
that area.

However, we’ll toss it out there for comment - anyone recall?


> 
> -- Reuti
> 
> PS: The other items I'll investigate on Monday.
> 
> 
>>> On Aug 12, 2016, at 12:15 PM, Reuti  wrote:
>>> 
>>>> 
>>>> On 12.08.2016 at 16:52, r...@open-mpi.org wrote:
>>>> 
>>>> IIRC, the rationale behind adding the check was that someone using SGE 
>>>> wanted to specify a custom launch agent, and we were overriding it with 
>>>> qrsh. However, the check is incorrect as that MCA param cannot be NULL.
>>>> 
>>>> I have updated this on master - can you see if this fixes the problem for 
>>>> you?
>>>> 
>>>> https://github.com/open-mpi/ompi/pull/1957
>>> 
>>> I updated my tools to:
>>> 
>>> autoconf-2.69
>>> automake-1.15
>>> libtool-2.4.6
>>> 
>>> but with Open MPI's ./autogen.pl I get:
>>> 
>>> configure.ac:152: error: possibly undefined macro: AC_PROG_LIBTOOL
>>> 
>>> I recall seeing this before already - how do I get rid of it? For now I fixed 
>>> the single source file just by hand.
>>> 
>>> -- Reuti
>>> 
>>> 
>>>> As for the blank in the cmd line - that is likely due to a space reserved 
>>>> for some entry that you aren’t using (e.g., when someone manually 
>>>> specifies the prefix). It shouldn’t cause any harm as the cmd line parser 
>>>> is required to ignore spaces
>>>> 
>>>> The -ldl problem sounds like a configuration issue - you might want to 
>>>> file a separate issue about it
>>>> 
>>>>> On Aug 11, 2016, at 4:28 AM, Reuti  wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> In the file orte/mca/plm/rsh/plm_rsh_component I see an if-statement, 
>>>>> which seems to prevent the tight integration with SGE to start:
>>>>> 
>>>>> if (NULL == mca_plm_rsh_component.agent) {
>>>>> 
>>>>> Why is it there (it wasn't in 1.10.3)?
>>>>> 
>>>>> If I just remove it I get:
>>>>> 
>>>>> [node17:25001] [[27678,0],0] plm:rsh: final template argv:
>>>>> qrsh   orted --hnp-topo-sig ...
>>>>> 
>>>>> instead of the former:
>>>>> 
>>>>> /usr/sge/bin/lx24-amd64/qrsh -inherit -nostdin -V -verbose   
>>>>> orted --hnp-topo-sig ...
>>>>> 
>>>>> So, just removing the if-statement is not a perfect cure as the 
>>>>> $SGE_ROOT/$ARC does not prefix `qrsh`.
>>>>> 
>>>>> ==
>>>>> 
>>>>> BTW: why is there blank before " orted" in the assembled command line - 
>>>>> and it's really in the argument when I check this on the slave nodes what 
>>>>> should be started by the `qrsh_starter`? As long as there is a wrapping 
>>>>> shell, it will be removed anyway. But in a special setup we noticed this 
>>>>> additional blank.
>>>>> 
>>>>> ==
>>>>> 
>>>>> I also notice, that I have to supply "-ldl" to `mpicc` to allow the 
>>>>> compilation of an application to succeed in 2.0.0.
>>>>> 
>>>>> -- Reuti
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Problems with mpirun in openmpi-1.8.1 and -2.0.0

2016-08-19 Thread r...@open-mpi.org
The rdma error sounds like something isn’t right with your machine’s Infiniband 
installation.

The cross-version problem sounds like you installed both OMPI versions into the 
same location - did you do that?? If so, then that might be the root cause of 
both problems. You need to install them in totally different locations. Then 
you need to _prefix_ your PATH and LD_LIBRARY_PATH with the location of the 
version you want to use.
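
For example, assuming the two builds are installed under /opt/openmpi-1.8.1 and
/opt/openmpi-2.0.0 (adjust to your actual prefixes), selecting the 2.0.0 install
would look like:

export PATH=/opt/openmpi-2.0.0/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi-2.0.0/lib:$LD_LIBRARY_PATH
which mpirun   # should now point at the 2.0.0 copy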

HTH
Ralph

> On Aug 19, 2016, at 12:53 AM, Juan A. Cordero Varelaq 
>  wrote:
> 
> Dear users,
> 
> I am totally stuck using openmpi. I have two versions on my machine: 1.8.1 
> and 2.0.0, and none of them work. When use the mpirun 1.8.1 version, I get 
> the following error:
> 
> librdmacm: Fatal: unable to open RDMA device
> librdmacm: Fatal: unable to open RDMA device
> librdmacm: Fatal: unable to open RDMA device
> librdmacm: Fatal: unable to open RDMA device
> librdmacm: Fatal: unable to open RDMA device
> --
> Open MPI failed to open the /dev/knem device due to a local error.
> Please check with your system administrator to get the problem fixed,
> or set the btl_sm_use_knem MCA parameter to 0 to run without /dev/knem
> support.
> 
>   Local host: MYMACHINE
>   Errno:  2 (No such file or directory)
> --
> --
> Open MPI failed to open an OpenFabrics device.  This is an unusual
> error; the system reported the OpenFabrics device as being present,
> but then later failed to access it successfully.  This usually
> indicates either a misconfiguration or a failed OpenFabrics hardware
> device.
> 
> All OpenFabrics support has been disabled in this MPI process; your
> job may or may not continue.
> 
>   Hostname:MYMACHINE
>   Device name: mlx4_0
>   Errror (22): Invalid argument
> --
> --
> [[60527,1],4]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> 
> Module: usNIC
>   Host: MYMACHINE
> 
> When I use the 2.0.0 version, I get something strange, it seems openmpi-2.0.0 
> looks for openmpi-1.8.1 libraries?:
> 
> A requested component was not found, or was unable to be opened.  This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded).  Note that
> Open MPI stopped checking at the first component that it did not find.
> 
> Host:  MYMACHINE
> Framework: ess
> Component: pmi
> --
> [MYMACHINE:126820] *** Process received signal ***
> [MYMACHINE:126820] Signal: Segmentation fault (11)
> [MYMACHINE:126820] Signal code: Address not mapped (1)
> [MYMACHINE:126820] Failing at address: 0x1c0
> [MYMACHINE:126820] [ 0] 
> /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f39b2ec4cb0]
> [MYMACHINE:126820] [ 1] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_add+0x10)[0x7f39b23e7430]
> [MYMACHINE:126820] [ 2] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(+0x25a57)[0x7f39b2676a57]
> [MYMACHINE:126820] [ 3] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help_norender+0x197)[0x7f39b2676fb7]
> [MYMACHINE:126820] [ 4] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help+0x10f)[0x7f39b267718f]
> [MYMACHINE:126820] [ 5] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(+0x41f2a)[0x7f39b23c5f2a]
> [MYMACHINE:126820] [ 6] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_components_filter+0x273)[0x7f39b23c70c3]
> [MYMACHINE:126820] [ 7] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_components_open+0x58)[0x7f39b23c8278]
> [MYMACHINE:126820] [ 8] 
> /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7f39b23d1e6c]
> [MYMACHINE:126820] [ 9] 
> /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_init+0x111)[0x7f39b2666e21]
> [MYMACHINE:126820] [10] 
> /opt/openmpi-1.8.1/lib/libmpi.so.1(ompi_mpi_init+0x1c2)[0x7f39b3115c92]
> [MYMACHINE:126820] [11] 
> /opt/openmpi-1.8.1/lib/libmpi.so.1(MPI_Init+0x1ab)[0x7f39b31387bb]
> [MYMACHINE:126820] [12] mb[0x402024]
> [MYMACHINE:126820] [13] 
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f39b2b187ed]
> [MYMACHINE:126820] [14] mb[0x402111]
> [MYMACHINE:126820] *** End of error message ***
> --
> A requested component was not found, or was unable to be opened.  This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded).  Note that
> O

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-22 Thread r...@open-mpi.org
Hmmm...perhaps we can break this out a bit? The stdin will be going to your 
rank=0 proc. It sounds like you have some subsequent step that calls MPI_Bcast?

Can you first verify that the input is being correctly delivered to rank=0? 
This will help us isolate if the problem is in the IO forwarding, or in the 
subsequent Bcast.

> On Aug 22, 2016, at 1:11 PM, Jingchao Zhang  wrote:
> 
> Hi all,
> 
> We compiled openmpi/2.0.0 with gcc/6.1.0 and intel/13.1.3. Both of them have 
> odd behaviors when trying to read from standard input.
> 
> For example, if we start the application lammps across 4 nodes, each node 16 
> cores, connected by Intel QDR Infiniband, mpirun works fine for the 1st time, 
> but always stuck in a few seconds thereafter.
> Command:
> mpirun ./lmp_ompi_g++ < in.snr
> in.snr is the Lammps input file. compiler is gcc/6.1.
> 
> Instead, if we use
> mpirun ./lmp_ompi_g++ -in in.snr
> it works 100%.
> 
> Some odd behaviors we gathered so far. 
> 1. For 1 node job, stdin always works.
> 2. For multiple nodes, stdin works unstably when the number of cores per node 
> are relatively small. For example, for 2/3/4 nodes, each node 8 cores, mpirun 
> works most of the time. But for each node with >8 cores, mpirun works the 1st 
> time, then always stuck. There seems to be a magic number when it stops 
> working.
> 3. We tested Quantum Expresso with compiler intel/13 and had the same issue. 
> 
> We used gdb to debug and found when mpirun was stuck, the rest of the 
> processes were all waiting on mpi broadcast from the master thread. The 
> lammps binary, input file and gdb core files (example.tar.bz2) can be 
> downloaded from this link 
> https://drive.google.com/open?id=0B3Yj4QkZpI-dVWZtWmJ3ZXNVRGc 
> 
> 
> Extra information:
> 1. Job scheduler is slurm.
> 2. configure setup:
> ./configure --prefix=$PREFIX \
> --with-hwloc=internal \
> --enable-mpirun-prefix-by-default \
> --with-slurm \
> --with-verbs \
> --with-psm \
> --disable-openib-connectx-xrc \
> --with-knem=/opt/knem-1.1.2.90mlnx1 \
> --with-cma
> 3. openmpi-mca-params.conf file 
> orte_hetero_nodes=1
> hwloc_base_binding_policy=core
> rmaps_base_mapping_policy=core
> opal_cuda_support=0
> btl_openib_use_eager_rdma=0
> btl_openib_max_eager_rdma=0
> btl_openib_flags=1
> 
> Thanks,
> Jingchao 
> 
> Dr. Jingchao Zhang
> Holland Computing Center
> University of Nebraska-Lincoln
> 402-472-6400
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-22 Thread r...@open-mpi.org
Well, I can try to find time to take a look. However, I will reiterate what 
Jeff H said - it is very unwise to rely on IO forwarding. Much better to just 
directly read the file unless that file is simply unavailable on the node where 
rank=0 is running.

> On Aug 22, 2016, at 1:55 PM, Jingchao Zhang  wrote:
> 
> Here you can find the source code for lammps input 
> https://github.com/lammps/lammps/blob/r13864/src/input.cpp
> Based on the gdb output, rank 0 is stuck at line 167
> if (fgets(&line[m],maxline-m,infile) == NULL)
> and the rest of the threads are stuck at line 203
> MPI_Bcast(&n,1,MPI_INT,0,world);
> 
> So rank 0 possibly hangs on the fgets() function.
> 
> Here are the whole backtrace information:
> $ cat master.backtrace worker.backtrace
> #0  0x003c37cdb68d in read () from /lib64/libc.so.6
> #1  0x003c37c71ca8 in _IO_new_file_underflow () from /lib64/libc.so.6
> #2  0x003c37c737ae in _IO_default_uflow_internal () from /lib64/libc.so.6
> #3  0x003c37c67e8a in _IO_getline_info_internal () from /lib64/libc.so.6
> #4  0x003c37c66ce9 in fgets () from /lib64/libc.so.6
> #5  0x005c5a43 in LAMMPS_NS::Input::file() () at ../input.cpp:167
> #6  0x005d4236 in main () at ../main.cpp:31
> #0  0x2b1635d2ace2 in poll_dispatch () from 
> /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libopen-pal.so.20
> #1  0x2b1635d1fa71 in opal_libevent2022_event_base_loop ()
>from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libopen-pal.so.20
> #2  0x2b1635ce4634 in opal_progress () from 
> /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libopen-pal.so.20
> #3  0x2b16351b8fad in ompi_request_default_wait () from 
> /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
> #4  0x2b16351fcb40 in ompi_coll_base_bcast_intra_generic ()
>from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
> #5  0x2b16351fd0c2 in ompi_coll_base_bcast_intra_binomial ()
>from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
> #6  0x2b1644fa6d9b in ompi_coll_tuned_bcast_intra_dec_fixed ()
>from /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/openmpi/mca_coll_tuned.so
> #7  0x2b16351cb4fb in PMPI_Bcast () from 
> /util/opt/openmpi/2.0.0/gcc/6.1.0/lib/libmpi.so.20
> #8  0x005c5b5d in LAMMPS_NS::Input::file() () at ../input.cpp:203
> #9  0x005d4236 in main () at ../main.cpp:31
> 
> Thanks,
> 
> Dr. Jingchao Zhang
> Holland Computing Center
> University of Nebraska-Lincoln
> 402-472-6400
> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org
> Sent: Monday, August 22, 2016 2:17:10 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>  
> Hmmm...perhaps we can break this out a bit? The stdin will be going to your 
> rank=0 proc. It sounds like you have some subsequent step that calls 
> MPI_Bcast?
> 
> Can you first verify that the input is being correctly delivered to rank=0? 
> This will help us isolate if the problem is in the IO forwarding, or in the 
> subsequent Bcast.
> 
>> On Aug 22, 2016, at 1:11 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>> 
>> Hi all,
>> 
>> We compiled openmpi/2.0.0 with gcc/6.1.0 and intel/13.1.3. Both of them have 
>> odd behaviors when trying to read from standard input.
>> 
>> For example, if we start the application lammps across 4 nodes, each node 16 
>> cores, connected by Intel QDR Infiniband, mpirun works fine for the 1st 
>> time, but always stuck in a few seconds thereafter.
>> Command:
>> mpirun ./lmp_ompi_g++ < in.snr
>> in.snr is the Lammps input file. compiler is gcc/6.1.
>> 
>> Instead, if we use
>> mpirun ./lmp_ompi_g++ -in in.snr
>> it works 100%.
>> 
>> Some odd behaviors we gathered so far. 
>> 1. For 1 node job, stdin always works.
>> 2. For multiple nodes, stdin works unstably when the number of cores per 
>> node are relatively small. For example, for 2/3/4 nodes, each node 8 cores, 
>> mpirun works most of the time. But for each node with >8 cores, mpirun works 
>> the 1st time, then always stuck. There seems to be a magic number when it 
>> stops working.
>> 3. We tested Quantum Expresso with compiler intel/13 and had the same issue. 
>> 
>> We used gdb to debug and found when mpirun was stuck, the rest of the 
>> processes were all waiting on mpi broadcast from the master thread. The 
>> lammps binary, input file and gdb core files (example.tar.bz2) can be 
>> downloaded from this link 
>> https://dri

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-22 Thread r...@open-mpi.org
FWIW: I just tested forwarding up to 100MBytes via stdin using the simple test 
shown below with OMPI v2.0.1rc1, and it worked fine. So I’d suggest upgrading 
when the official release comes out, or going ahead and at least testing 
2.0.1rc1 on your machine. Or you can test this program with some input file and 
let me know if it works for you.

Ralph

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#include <unistd.h>
#include <mpi.h>

#define ORTE_IOF_BASE_MSG_MAX   2048

int main(int argc, char *argv[])
{
int i, rank, size, next, prev, tag = 201;
int pos, msgsize, nbytes;
bool done;
char *msg;

MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);

fprintf(stderr, "Rank %d has cleared MPI_Init\n", rank);

next = (rank + 1) % size;
prev = (rank + size - 1) % size;
msg = malloc(ORTE_IOF_BASE_MSG_MAX);
pos = 0;
nbytes = 0;

if (0 == rank) {
while (0 != (msgsize = read(0, msg, ORTE_IOF_BASE_MSG_MAX))) {
fprintf(stderr, "Rank %d: sending blob %d\n", rank, pos);
if (msgsize > 0) {
MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, 
MPI_COMM_WORLD);
}
++pos;
nbytes += msgsize;
}
fprintf(stderr, "Rank %d: sending termination blob %d\n", rank, pos);
memset(msg, 0, ORTE_IOF_BASE_MSG_MAX);
MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
} else {
while (1) {
MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, MPI_COMM_WORLD);
fprintf(stderr, "Rank %d: recvd blob %d\n", rank, pos);
++pos;
done = true;
for (i=0; i < ORTE_IOF_BASE_MSG_MAX; i++) {
if (0 != msg[i]) {
done = false;
break;
}
}
if (done) {
break;
}
}
fprintf(stderr, "Rank %d: recv done\n", rank);
MPI_Barrier(MPI_COMM_WORLD);
}

fprintf(stderr, "Rank %d has completed bcast\n", rank);
MPI_Finalize();
return 0;
}
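
To try it yourself, something along these lines should do (file names are just
examples):

mpicc stdin_test.c -o stdin_test
mpirun -np 20 ./stdin_test < some_large_input.txt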



> On Aug 22, 2016, at 3:40 PM, Jingchao Zhang  wrote:
> 
> This might be a thin argument but we have many users running mpirun in this 
> way for years with no problem until this recent upgrade. And some home-brewed 
> mpi codes do not even have a standard way to read the input files. Last time 
> I checked, the openmpi manual still claims it supports stdin 
> (https://www.open-mpi.org/doc/v2.0/man1/mpirun.1.php#sect14). Maybe I missed 
> it but the v2.0 release notes did not mention any changes to the behaviors of 
> stdin as well.
> 
> We can tell our users to run mpirun in the suggested way, but I do hope 
> someone can look into the issue and fix it.
> 
> Dr. Jingchao Zhang
> Holland Computing Center
> University of Nebraska-Lincoln
> 402-472-6400
> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org
> Sent: Monday, August 22, 2016 3:04:50 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>  
> Well, I can try to find time to take a look. However, I will reiterate what 
> Jeff H said - it is very unwise to rely on IO forwarding. Much better to just 
> directly read the file unless that file is simply unavailable on the node 
> where rank=0 is running.
> 
>> On Aug 22, 2016, at 1:55 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>> 
>> Here you can find the source code for lammps input 
>> https://github.com/lammps/lammps/blob/r13864/src/input.cpp
>> Based on the gdb output, rank 0 is stuck at line 167
>> if (fgets(&line[m],maxline-m,infile) == NULL)
>> and the rest of the threads are stuck at line 203
>> MPI_Bcast(&n,1,MPI_INT,0,world);
>> 
>> So rank 0 possibly hangs on the fgets() function.
>> 
>> Here are the whole backtrace information:
>> $ cat master.backtrace worker.backtrace
>> #0  0x003c37cdb68d in read () from /lib64/libc.so.6
>> #1  0x003c37c71ca8 in _IO_new_file_underflow () from /lib64/libc.so.6
>> #2  0x003c37c737ae in _IO_default_uflow_internal () from /lib64/libc.so.6
>> #3  0x003c37c67e8a in _IO_getline_info_internal () from /lib64/libc.so.6
>> #4  0x003c37c66ce9 in fgets () from /lib64/libc.so.6
>> #5  0x005c5a43 in LAMMPS_NS::Input::file() () at ../input.cpp:167
>> #6  0x005d4236 in main () at ../main.cpp:31

Re: [OMPI users] OS X El Capitan 10.11.6 ld: symbol(s) not found for architecture x86_64

2016-08-23 Thread r...@open-mpi.org
I’m confused - you keep talking about MPICH, but the symbol you are looking for 
is from OMPI. You cannot mix the two MPI libraries - is that what you are 
trying to do?
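
If it helps to sort out which MPI your wrapper compiler is actually pulling in,
Open MPI's wrappers can report their underlying command and link line (MPICH's
wrappers use -show instead):

mpicc --showme:command
mpicc --showme:link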

> On Aug 23, 2016, at 1:30 PM, Richard G French  wrote:
> 
> Thanks for the suggestion, Doug - but I can't seem to find the missing 
> function ompi_mpi_byte in any of those other libraries. I'll keep looking! I 
> wonder if I failed to configure mpich properly when I built it.
> Dick
> 
> 
> On Tue, Aug 23, 2016 at 4:01 PM, Douglas L Reeder  > wrote:
> Richard,
> 
> It looks like you need to add some -l arguments for the specific 
> openmpi libraries that you need (e.g., -lmpi -lmpi_cxx)
> 
> Doug
>> On Aug 23, 2016, at 1:43 PM, Richard G French > > wrote:
>> 
>> Hi, all -
>> I'm trying to build the SPH code Gadget2 
>> (http://wwwmpa.mpa-garching.mpg.de/gadget/ 
>> ) under OS X 10.11.6 and I am 
>> getting the following type of error:
>> 
>> 222 rfrench@cosmos> make
>> 
>> mpicc main.o  run.o  predict.o begrun.o endrun.o global.o timestep.o  init.o 
>> restart.o  io.o accel.o   read_ic.o  ngb.o system.o  allocate.o  density.o 
>> gravtree.o hydra.o  driftfac.o domain.o  allvars.o potential.o forcetree.o   
>> peano.o gravtree_forcetest.o pm_periodic.o pm_nonperiodic.o longrange.o   -g 
>>  -L/opt/local/lib/mpich-mp  -L/usr/local/lib -lgsl -lgslcblas -lm 
>> -L/usr/local/lib -lrfftw_mpi -lfftw_mpi -lrfftw -lfftw -o Gadget2
>> 
>> Undefined symbols for architecture x86_64:
>> 
>>   "_ompi_mpi_byte", referenced from:
>> 
>>   _read_parameter_file in begrun.o
>> 
>>   _compute_global_quantities_of_system in global.o
>> 
>>   _restart in restart.o
>> 
>>   _write_file in io.o
>> 
>>   _read_file in read_ic.o
>> 
>>   _find_files in read_ic.o
>> 
>>   _density in density.o
>> 
>> ..
>> 
>> I built the mpich library using 
>> 
>> cd openmpi-2.0.0/
>> 
>> 
>> ./configure
>> 
>> 
>> sudo make all install
>> 
>> which installed the libraries in
>> 
>> 
>> /opt/local/lib/mpich-mp
>> 
>> 
>> 
>> I can't seem to track down the library that contains ompi_mpi_byte.
>> 
>> 
>> 
>> Any suggestions would be welcome. Thanks!
>> 
>> Dick French
>> 
>> 
>> 
>> 
>> -- 
>> Richard G. French
>> McDowell and Whiting Professor of Astrophysics
>> Chair of the Astronomy Department, Wellesley College
>> Director of the Whitin Observatory
>> Cassini Mission to Saturn Radio Science Team Leader
>> Wellesley, MA 02481-8203
>> (781) 283-3747 
> 
> 
> 
> 
> -- 
> Richard G. French
> McDowell and Whiting Professor of Astrophysics
> Chair of the Astronomy Department, Wellesley College
> Director of the Whitin Observatory
> Cassini Mission to Saturn Radio Science Team Leader
> Wellesley, MA 02481-8203
> (781) 283-3747

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-23 Thread r...@open-mpi.org
The IO forwarding messages all flow over the Ethernet, so the type of fabric is 
irrelevant. The number of procs involved would definitely have an impact, but 
that might not be due to the IO forwarding subsystem. We know we have flow 
control issues with collectives like Bcast that don’t have built-in 
synchronization points. How many reads were you able to do before it hung?

I was running it on my little test setup (2 nodes, using only a few procs), but 
I’ll try scaling up and see what happens. I’ll also try introducing some forced 
“syncs” on the Bcast and see if that solves the issue.

Ralph

> On Aug 23, 2016, at 2:30 PM, Jingchao Zhang  wrote:
> 
> Hi Ralph,
> 
> I tested v2.0.1rc1 with your code but has the same issue. I also installed 
> v2.0.1rc1 on a different cluster which has Mellanox QDR Infiniband and get 
> the same result. For the tests you have done, how many cores and nodes did 
> you use? I can trigger the problem by using multiple nodes and each node with 
> more than 10 cores. 
> 
> Thank you for looking into this.
> 
> Dr. Jingchao Zhang
> Holland Computing Center
> University of Nebraska-Lincoln
> 402-472-6400
> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org
> Sent: Monday, August 22, 2016 10:23:42 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>  
> FWIW: I just tested forwarding up to 100MBytes via stdin using the simple 
> test shown below with OMPI v2.0.1rc1, and it worked fine. So I’d suggest 
> upgrading when the official release comes out, or going ahead and at least 
> testing 2.0.1rc1 on your machine. Or you can test this program with some 
> input file and let me know if it works for you.
> 
> Ralph
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <stdbool.h>
> #include <unistd.h>
> #include <mpi.h>
> 
> #define ORTE_IOF_BASE_MSG_MAX   2048
> 
> int main(int argc, char *argv[])
> {
> int i, rank, size, next, prev, tag = 201;
> int pos, msgsize, nbytes;
> bool done;
> char *msg;
> 
> MPI_Init(&argc, &argv);
> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> MPI_Comm_size(MPI_COMM_WORLD, &size);
> 
> fprintf(stderr, "Rank %d has cleared MPI_Init\n", rank);
> 
> next = (rank + 1) % size;
> prev = (rank + size - 1) % size;
> msg = malloc(ORTE_IOF_BASE_MSG_MAX);
> pos = 0;
> nbytes = 0;
> 
> if (0 == rank) {
> while (0 != (msgsize = read(0, msg, ORTE_IOF_BASE_MSG_MAX))) {
> fprintf(stderr, "Rank %d: sending blob %d\n", rank, pos);
> if (msgsize > 0) {
> MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, 
> MPI_COMM_WORLD);
> }
> ++pos;
> nbytes += msgsize;
> }
> fprintf(stderr, "Rank %d: sending termination blob %d\n", rank, pos);
> memset(msg, 0, ORTE_IOF_BASE_MSG_MAX);
> MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, MPI_COMM_WORLD);
> MPI_Barrier(MPI_COMM_WORLD);
> } else {
> while (1) {
> MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, 
> MPI_COMM_WORLD);
> fprintf(stderr, "Rank %d: recvd blob %d\n", rank, pos);
> ++pos;
> done = true;
> for (i=0; i < ORTE_IOF_BASE_MSG_MAX; i++) {
> if (0 != msg[i]) {
> done = false;
> break;
> }
> }
> if (done) {
> break;
> }
> }
> fprintf(stderr, "Rank %d: recv done\n", rank);
> MPI_Barrier(MPI_COMM_WORLD);
> }
> 
> fprintf(stderr, "Rank %d has completed bcast\n", rank);
> MPI_Finalize();
> return 0;
> }
> 
> 
> 
>> On Aug 22, 2016, at 3:40 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>> 
>> This might be a thin argument but we have many users running mpirun in this 
>> way for years with no problem until this recent upgrade. And some 
>> home-brewed mpi codes do not even have a standard way to read the input 
>> files. Last time I checked, the openmpi manual still claims it supports 
>> stdin (https://www.open-mpi.org/doc/v2.0/man1/mpirun.1.php#sect14). Maybe I 
>> missed it but the v2.0 release notes did not mention any changes to the 
>> behaviors of stdin as well.
>> 
>> We can tell our users to run mpirun in the suggested way, but I do hope 
>> someone can look

Re: [OMPI users] Using Open MPI with PBS Pro

2016-08-23 Thread r...@open-mpi.org
I’ve never heard of that, and cannot imagine what it has to do with the 
resource manager. Can you point to where you heard that one?

FWIW: we don’t ship OMPI with anything in the default mca params file, so 
somebody must have put it in there for you.
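
If you want to check what is actually set on your system: the system-wide file
normally lives under the Open MPI install prefix (shown at /etc in your report),
and there can also be a per-user file, e.g.:

grep -n opal_event_include /etc/openmpi-mca-params.conf ~/.openmpi/mca-params.conf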


> On Aug 23, 2016, at 4:48 PM, Andy Riebs  wrote:
> 
> I gleaned from the web that I need to comment out "opal_event_include=epoll" 
> in /etc/openmpi-mca-params.conf in order to use Open MPI with 
> PBS Pro.
> 
> Can we also disable that in other cases, like Slurm, or is this something 
> specific to PBS Pro?
> 
> Andy
> 
> -- 
> Andy Riebs
> andy.ri...@hpe.com
> Hewlett-Packard Enterprise
> High Performance Computing Software Engineering
> +1 404 648 9024
> My opinions are not necessarily those of HPE
>May the source be with you!
> 

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-23 Thread r...@open-mpi.org
Very strange. I cannot reproduce it as I’m able to run any number of nodes and 
procs, pushing over 100Mbytes thru without any problem.

Which leads me to suspect that the issue here is with the tty interface. Can 
you tell me what shell and OS you are running?


> On Aug 23, 2016, at 3:25 PM, Jingchao Zhang  wrote:
> 
> Everything stuck at MPI_Init. For a test job with 2 nodes and 10 cores each 
> node, I got the following
> 
> $ mpirun ./a.out < test.in
> Rank 2 has cleared MPI_Init
> Rank 4 has cleared MPI_Init
> Rank 7 has cleared MPI_Init
> Rank 8 has cleared MPI_Init
> Rank 0 has cleared MPI_Init
> Rank 5 has cleared MPI_Init
> Rank 6 has cleared MPI_Init
> Rank 9 has cleared MPI_Init
> Rank 1 has cleared MPI_Init
> Rank 16 has cleared MPI_Init
> Rank 19 has cleared MPI_Init
> Rank 10 has cleared MPI_Init
> Rank 11 has cleared MPI_Init
> Rank 12 has cleared MPI_Init
> Rank 13 has cleared MPI_Init
> Rank 14 has cleared MPI_Init
> Rank 15 has cleared MPI_Init
> Rank 17 has cleared MPI_Init
> Rank 18 has cleared MPI_Init
> Rank 3 has cleared MPI_Init
> 
> then it just hanged.
> 
> --Jingchao
> 
> Dr. Jingchao Zhang
> Holland Computing Center
> University of Nebraska-Lincoln
> 402-472-6400
> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org
> Sent: Tuesday, August 23, 2016 4:03:07 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>  
> The IO forwarding messages all flow over the Ethernet, so the type of fabric 
> is irrelevant. The number of procs involved would definitely have an impact, 
> but that might not be due to the IO forwarding subsystem. We know we have 
> flow control issues with collectives like Bcast that don’t have built-in 
> synchronization points. How many reads were you able to do before it hung?
> 
> I was running it on my little test setup (2 nodes, using only a few procs), 
> but I’ll try scaling up and see what happens. I’ll also try introducing some 
> forced “syncs” on the Bcast and see if that solves the issue.
> 
> Ralph
> 
>> On Aug 23, 2016, at 2:30 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>> 
>> Hi Ralph,
>> 
>> I tested v2.0.1rc1 with your code but has the same issue. I also installed 
>> v2.0.1rc1 on a different cluster which has Mellanox QDR Infiniband and get 
>> the same result. For the tests you have done, how many cores and nodes did 
>> you use? I can trigger the problem by using multiple nodes and each node 
>> with more than 10 cores. 
>> 
>> Thank you for looking into this.
>> 
>> Dr. Jingchao Zhang
>> Holland Computing Center
>> University of Nebraska-Lincoln
>> 402-472-6400
>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org
>> Sent: Monday, August 22, 2016 10:23:42 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>  
>> FWIW: I just tested forwarding up to 100MBytes via stdin using the simple 
>> test shown below with OMPI v2.0.1rc1, and it worked fine. So I’d suggest 
>> upgrading when the official release comes out, or going ahead and at least 
>> testing 2.0.1rc1 on your machine. Or you can test this program with some 
>> input file and let me know if it works for you.
>> 
>> Ralph
>> 
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> #include <stdbool.h>
>> #include <unistd.h>
>> #include <mpi.h>
>> 
>> #define ORTE_IOF_BASE_MSG_MAX   2048
>> 
>> int main(int argc, char *argv[])
>> {
>> int i, rank, size, next, prev, tag = 201;
>> int pos, msgsize, nbytes;
>> bool done;
>> char *msg;
>> 
>> MPI_Init(&argc, &argv);
>> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> MPI_Comm_size(MPI_COMM_WORLD, &size);
>> 
>> fprintf(stderr, "Rank %d has cleared MPI_Init\n", rank);
>> 
>> next = (rank + 1) % size;
>> prev = (rank + size - 1) % size;
>> msg = malloc(ORTE_IOF_BASE_MSG_MAX);
>> pos = 0;
>> nbytes = 0;
>> 
>> if (0 == rank) {
>> while (0 != (msgsize = read(0, msg, ORTE_IOF_BASE_MSG_MAX))) {
>> fprintf(stderr, "Rank %d: sending blob %d\n", rank, pos);
>> if (msgsize > 0) {
>> MPI_Bcast(msg, ORTE_IOF_BASE_MSG_MAX, MPI_BYTE, 0, 
>> MPI_COMM_WORLD);
>> }
>>  

Re: [OMPI users] Using Open MPI with PBS Pro

2016-08-24 Thread r...@open-mpi.org
Ah yes - that was a case where someone had added an incorrect param to the 
default file, and we suggested they remove it


> On Aug 24, 2016, at 5:26 AM, Andy Riebs  wrote:
> 
> Hi Ralph,
> 
> I think I found that information at 
> <https://github.com/open-mpi/ompi/issues/341> :-)
> 
> In any case, thanks for the information about the default params file -- I 
> won't worry too much about modifying it then.
> 
> Andy
> 
> I
> On 08/23/2016 08:08 PM, r...@open-mpi.org wrote:
>> I’ve never heard of that, and cannot imagine what it has to do with the 
>> resource manager. Can you point to where you heard that one?
>> 
>> FWIW: we don’t ship OMPI with anything in the default mca params file, so 
>> somebody must have put it in there for you.
>> 
>> 
>>> On Aug 23, 2016, at 4:48 PM, Andy Riebs  wrote:
>>> 
>>> I gleaned from the web that I need to comment out 
>>> "opal_event_include=epoll" in /etc/openmpi-mca-params.conf in 
>>> order to use Open MPI with PBS Pro.
>>> 
>>> Can we also disable that in other cases, like Slurm, or is this something 
>>> specific to PBS Pro?
>>> 
>>> Andy
>>> 
>>> -- 
>>> Andy Riebs
>>> andy.ri...@hpe.com
>>> Hewlett-Packard Enterprise
>>> High Performance Computing Software Engineering
>>> +1 404 648 9024
>>> My opinions are not necessarily those of HPE
>>>May the source be with you!
>>> 

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-24 Thread r...@open-mpi.org
Afraid I can’t replicate a problem at all, whether rank=0 is local or not. I’m 
also using bash, but on CentOS-7, so I suspect the OS is the difference.

Can you configure OMPI with --enable-debug, and then run the test again with 
--mca iof_base_verbose 100? It will hopefully tell us something about why the 
IO subsystem is stuck.
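
Concretely, that would be something like (prefix is illustrative):

./configure --prefix=/opt/openmpi-debug --enable-debug ...
make install
mpirun --mca iof_base_verbose 100 ./a.out < test.in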


> On Aug 24, 2016, at 8:46 AM, Jingchao Zhang  wrote:
> 
> Hi Ralph,
> 
> For our tests, rank 0 is always on the same node with mpirun. I just tested 
> mpirun with -nolocal and it still hangs. 
> 
> Information on shell and OS
> $ echo $0
> -bash
> 
> $ lsb_release -a
> LSB Version:
> :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
> Distributor ID: Scientific
> Description:Scientific Linux release 6.8 (Carbon)
> Release:6.8
> Codename:   Carbon
> 
> $ uname -a
> Linux login.crane.hcc.unl.edu 2.6.32-642.3.1.el6.x86_64 #1 SMP Tue Jul 12 11:25:51 CDT 2016 x86_64 x86_64 x86_64 GNU/Linux
> 
> 
> Dr. Jingchao Zhang
> Holland Computing Center
> University of Nebraska-Lincoln
> 402-472-6400
> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org
> Sent: Tuesday, August 23, 2016 8:14:48 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>  
> Hmmm...that’s a good point. Rank 0 and mpirun are always on the same node on 
> my cluster. I’ll give it a try.
> 
> Jingchao: is rank 0 on the node with mpirun, or on a remote node?
> 
> 
>> On Aug 23, 2016, at 5:58 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>> 
>> Ralph,
>> 
>> did you run task 0 and mpirun on different nodes ?
>> 
>> i observed some random hangs, though i cannot blame openmpi 100% yet
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> On 8/24/2016 9:41 AM, r...@open-mpi.org wrote:
>>> Very strange. I cannot reproduce it as I’m able to run any number of nodes 
>>> and procs, pushing over 100Mbytes thru without any problem.
>>> 
>>> Which leads me to suspect that the issue here is with the tty interface. 
>>> Can you tell me what shell and OS you are running?
>>> 
>>> 
>>>> On Aug 23, 2016, at 3:25 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>> 
>>>> Everything stuck at MPI_Init. For a test job with 2 nodes and 10 cores 
>>>> each node, I got the following
>>>> 
>>>> $ mpirun ./a.out < test.in
>>>> Rank 2 has cleared MPI_Init
>>>> Rank 4 has cleared MPI_Init
>>>> Rank 7 has cleared MPI_Init
>>>> Rank 8 has cleared MPI_Init
>>>> Rank 0 has cleared MPI_Init
>>>> Rank 5 has cleared MPI_Init
>>>> Rank 6 has cleared MPI_Init
>>>> Rank 9 has cleared MPI_Init
>>>> Rank 1 has cleared MPI_Init
>>>> Rank 16 has cleared MPI_Init
>>>> Rank 19 has cleared MPI_Init
>>>> Rank 10 has cleared MPI_Init
>>>> Rank 11 has cleared MPI_Init
>>>> Rank 12 has cleared MPI_Init
>>>> Rank 13 has cleared MPI_Init
>>>> Rank 14 has cleared MPI_Init
>>>> Rank 15 has cleared MPI_Init
>>>> Rank 17 has cleared MPI_Init
>>>> Rank 18 has cleared MPI_Init
>>>> Rank 3 has cleared MPI_Init
>>>> 
>>>> then it just hanged.
>>>> 
>>>> --Jingchao
>>>> 
>>>> Dr. Jingchao Zhang
>>>> Holland Computing Center
>>>> University of Nebraska-Lincoln
>>>> 402-472-6400
>>>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org
>>>> Sent: Tuesday, August 23, 2016 4:03:07 PM
>>>> To: Open MPI Users
>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>>  
>>>> The IO forwarding messages all flow over the Ethernet, so the type of 
>>>> fabric is irrelevant. The number of procs involved would definitely have 
>>>> an impact, but that might not be due to the IO forwarding subsystem. We 
>>>> know we have flow control issues with collectives like Bcast that don’t 
>>>> have built-in synchronization points. How many reads were you able to do 
>>>> before it hung?

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-24 Thread r...@open-mpi.org
Bingo - found it, fix submitted and hope to get it into 2.0.1

Thanks for the assist!
Ralph


> On Aug 24, 2016, at 12:15 PM, Jingchao Zhang  wrote:
> 
> I configured v2.0.1rc1 with --enable-debug and ran the test with --mca 
> iof_base_verbose 100. I also added -display-devel-map in case it provides 
> some useful information.
> 
> Test job has 2 nodes, each node 10 cores. Rank 0 and mpirun command on the 
> same node.
> $ mpirun -display-devel-map --mca iof_base_verbose 100 ./a.out < test.in &> 
> debug_info.txt
> 
> The debug_info.txt is attached. 
> 
> Dr. Jingchao Zhang
> Holland Computing Center
> University of Nebraska-Lincoln
> 402-472-6400
> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org
> Sent: Wednesday, August 24, 2016 12:14:26 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>  
> Afraid I can’t replicate a problem at all, whether rank=0 is local or not. 
> I’m also using bash, but on CentOS-7, so I suspect the OS is the difference.
> 
> Can you configure OMPI with --enable-debug, and then run the test again with 
> --mca iof_base_verbose 100? It will hopefully tell us something about why the 
> IO subsystem is stuck.
> 
> 
>> On Aug 24, 2016, at 8:46 AM, Jingchao Zhang <zh...@unl.edu> wrote:
>> 
>> Hi Ralph,
>> 
>> For our tests, rank 0 is always on the same node with mpirun. I just tested 
>> mpirun with -nolocal and it still hangs. 
>> 
>> Information on shell and OS
>> $ echo $0
>> -bash
>> 
>> $ lsb_release -a
>> LSB Version:
>> :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
>> Distributor ID: Scientific
>> Description:Scientific Linux release 6.8 (Carbon)
>> Release:6.8
>> Codename:   Carbon
>> 
>> $ uname -a
>> Linux login.crane.hcc.unl.edu 2.6.32-642.3.1.el6.x86_64 #1 SMP Tue Jul 12 11:25:51 CDT 2016 x86_64 x86_64 x86_64 GNU/Linux
>> 
>> 
>> Dr. Jingchao Zhang
>> Holland Computing Center
>> University of Nebraska-Lincoln
>> 402-472-6400
>> From: users <users-boun...@lists.open-mpi.org> on behalf of r...@open-mpi.org
>> Sent: Tuesday, August 23, 2016 8:14:48 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>  
>> Hmmm...that’s a good point. Rank 0 and mpirun are always on the same node on 
>> my cluster. I’ll give it a try.
>> 
>> Jingchao: is rank 0 on the node with mpirun, or on a remote node?
>> 
>> 
>>> On Aug 23, 2016, at 5:58 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>> 
>>> Ralph,
>>> 
>>> did you run task 0 and mpirun on different nodes ?
>>> 
>>> i observed some random hangs, though i cannot blame openmpi 100% yet
>>> 
>>> Cheers,
>>> 
>>> Gilles
>>> 
>>> On 8/24/2016 9:41 AM, r...@open-mpi.org wrote:
>>>> Very strange. I cannot reproduce it as I’m able to run any number of nodes 
>>>> and procs, pushing over 100Mbytes thru without any problem.
>>>> 
>>>> Which leads me to suspect that the issue here is with the tty interface. 
>>>> Can you tell me what shell and OS you are running?
>>>> 
>>>> 
>>>>> On Aug 23, 2016, at 3:25 PM, Jingchao Zhang <zh...@unl.edu> wrote:
>>>>> 
>>>>> Everything stuck at MPI_Init. For a test job with 2 nodes and 10 cores 
>>>>> each node, I got the following
>>>>> 
>>>>> $ mpirun ./a.out < test.in
>>>>> Rank 2 has cleared MPI_Init
>>>>> Rank 4 has cleared MPI_Init
>>>>> Rank 7 has cleared MPI_Init
>>>>> Rank 8 has cleared MPI_Init
>>>>> Rank 0 has cleared MPI_Init
>>>>> Rank 5 has cleared MPI_Init
>>>>> Rank 6 has cleared MPI_Init
>>>>> Rank 9 has cleared MPI_Init
>>>>> Rank 1 has cleared MPI_Init
>>>>> Rank 16 has cleared MPI_Init
>>>>> Rank 19 has cleared MPI_Init
>>>>> Rank 10 has cleared MPI_Init
>>>>> Rank 11 has cleared MPI_Init
>>>>> Rank 12 has c

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-25 Thread r...@open-mpi.org
??? Weird - can you send me an updated output of that last test we ran?

> On Aug 25, 2016, at 7:51 AM, Jingchao Zhang  wrote:
> 
> Hi Ralph,
> 
> I saw the pull request and did a test with v2.0.1rc1, but the problem 
> persists. Any ideas?
> 
> Thanks,
> 
> Dr. Jingchao Zhang
> Holland Computing Center
> University of Nebraska-Lincoln
> 402-472-6400
> From: users  on behalf of r...@open-mpi.org
> Sent: Wednesday, August 24, 2016 1:27:28 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>  
> Bingo - found it, fix submitted and hope to get it into 2.0.1
> 
> Thanks for the assist!
> Ralph
> 
> 
>> On Aug 24, 2016, at 12:15 PM, Jingchao Zhang > <mailto:zh...@unl.edu>> wrote:
>> 
>> I configured v2.0.1rc1 with --enable-debug and ran the test with --mca 
>> iof_base_verbose 100. I also added -display-devel-map in case it provides 
>> some useful information.
>> 
>> Test job has 2 nodes, each node 10 cores. Rank 0 and mpirun command on the 
>> same node.
>> $ mpirun -display-devel-map --mca iof_base_verbose 100 ./a.out < test.in &> 
>> debug_info.txt
>> 
>> The debug_info.txt is attached. 
>> 
>> Dr. Jingchao Zhang
>> Holland Computing Center
>> University of Nebraska-Lincoln
>> 402-472-6400
>> From: users  on behalf of r...@open-mpi.org
>> Sent: Wednesday, August 24, 2016 12:14:26 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>  
>> Afraid I can’t replicate a problem at all, whether rank=0 is local or not. 
>> I’m also using bash, but on CentOS-7, so I suspect the OS is the difference.
>> 
>> Can you configure OMPI with --enable-debug, and then run the test again with 
>> --mca iof_base_verbose 100? It will hopefully tell us something about why 
>> the IO subsystem is stuck.
>> 
>> 
>>> On Aug 24, 2016, at 8:46 AM, Jingchao Zhang >> <mailto:zh...@unl.edu>> wrote:
>>> 
>>> Hi Ralph,
>>> 
>>> For our tests, rank 0 is always on the same node with mpirun. I just tested 
>>> mpirun with -nolocal and it still hangs. 
>>> 
>>> Information on shell and OS
>>> $ echo $0
>>> -bash
>>> 
>>> $ lsb_release -a
>>> LSB Version:
>>> :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
>>> Distributor ID: Scientific
>>> Description:    Scientific Linux release 6.8 (Carbon)
>>> Release:6.8
>>> Codename:   Carbon
>>> 
>>> $ uname -a
>>> Linux login.crane.hcc.unl.edu <http://login.crane.hcc.unl.edu/> 
>>> 2.6.32-642.3.1.el6.x86_64 #1 SMP Tue Jul 12 11:25:51 CDT 2016 x86_64 x86_64 
>>> x86_64 GNU/Linux
>>> 
>>> 
>>> Dr. Jingchao Zhang
>>> Holland Computing Center
>>> University of Nebraska-Lincoln
>>> 402-472-6400
>>> From: users  on behalf of r...@open-mpi.org
>>> Sent: Tuesday, August 23, 2016 8:14:48 PM
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>  
>>> Hmmm...that’s a good point. Rank 0 and mpirun are always on the same node 
>>> on my cluster. I’ll give it a try.
>>> 
>>> Jingchao: is rank 0 on the node with mpirun, or on a remote node?
>>> 
>>> 
>>>> On Aug 23, 2016, at 5:58 PM, Gilles Gouaillardet >>> <mailto:gil...@rist.or.jp>> wrote:
>>>> 
>>>> Ralph,
>>>> 
>>>> did you run task 0 and mpirun on different nodes ?
>>>> 
>>>> i observed some random hangs, though i cannot blame openmpi 100% yet
>>>> 
>>>> Cheers,
>>>> 
>>>> Gilles
>>>> 
>>>> On 8/24/2016 9:41 AM, r...@open-mpi.org <mailto:r...@open-mpi.org> wrote:
>>>>> Very strange. I cannot reproduce it as I’m able to run any number of 
>>>>> nodes and procs, pushing over 100Mbytes thru without any problem.
>>>>> 
>>>>> Which leads me to suspect that the issue here is with the tty interf

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-27 Thread r...@open-mpi.org
I am finding this impossible to replicate, so something odd must be going on. 
Can you please (a) pull down the latest v2.0.1 nightly tarball, and (b) add 
this patch to it?

diff --git a/orte/mca/iof/hnp/iof_hnp.c b/orte/mca/iof/hnp/iof_hnp.c
old mode 100644
new mode 100755
index 512fcdb..362ff46
--- a/orte/mca/iof/hnp/iof_hnp.c
+++ b/orte/mca/iof/hnp/iof_hnp.c
@@ -143,16 +143,17 @@ static int hnp_push(const orte_process_name_t* dst_name, 
orte_iof_tag_t src_tag,
 int np, numdigs;
 orte_ns_cmp_bitmask_t mask;
 
+opal_output(0,
+ "%s iof:hnp pushing fd %d for process %s",
+ ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
+ fd, ORTE_NAME_PRINT(dst_name));
+
 /* don't do this if the dst vpid is invalid or the fd is negative! */
 if (ORTE_VPID_INVALID == dst_name->vpid || fd < 0) {
 return ORTE_SUCCESS;
 }
 
-OPAL_OUTPUT_VERBOSE((1, orte_iof_base_framework.framework_output,
- "%s iof:hnp pushing fd %d for process %s",
- ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
- fd, ORTE_NAME_PRINT(dst_name)));
-
 if (!(src_tag & ORTE_IOF_STDIN)) {
 /* set the file descriptor to non-blocking - do this before we setup
  * and activate the read event in case it fires right away


You can then run the test again without the "--mca iof_base_verbose 100” flag 
to reduce the chatter - this print statement will tell me what I need to know.
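
If you want to cut the chatter down even further, something like this (same 
a.out/test.in test case as before - just a sketch) should show only the lines I 
care about:

$ mpirun ./a.out < test.in 2>&1 | grep "iof:hnp pushing fd"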

Thanks!
Ralph


> On Aug 25, 2016, at 8:19 AM, Jeff Squyres (jsquyres)  
> wrote:
> 
> The IOF fix PR for v2.0.1 was literally just merged a few minutes ago; it 
> wasn't in last night's tarball.
> 
> 
> 
>> On Aug 25, 2016, at 10:59 AM, r...@open-mpi.org wrote:
>> 
>> ??? Weird - can you send me an updated output of that last test we ran?
>> 
>>> On Aug 25, 2016, at 7:51 AM, Jingchao Zhang  wrote:
>>> 
>>> Hi Ralph,
>>> 
>>> I saw the pull request and did a test with v2.0.1rc1, but the problem 
>>> persists. Any ideas?
>>> 
>>> Thanks,
>>> 
>>> Dr. Jingchao Zhang
>>> Holland Computing Center
>>> University of Nebraska-Lincoln
>>> 402-472-6400
>>> From: users  on behalf of 
>>> r...@open-mpi.org 
>>> Sent: Wednesday, August 24, 2016 1:27:28 PM
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>> 
>>> Bingo - found it, fix submitted and hope to get it into 2.0.1
>>> 
>>> Thanks for the assist!
>>> Ralph
>>> 
>>> 
>>>> On Aug 24, 2016, at 12:15 PM, Jingchao Zhang  wrote:
>>>> 
>>>> I configured v2.0.1rc1 with --enable-debug and ran the test with --mca 
>>>> iof_base_verbose 100. I also added -display-devel-map in case it provides 
>>>> some useful information.
>>>> 
>>>> Test job has 2 nodes, each node 10 cores. Rank 0 and mpirun command on the 
>>>> same node.
>>>> $ mpirun -display-devel-map --mca iof_base_verbose 100 ./a.out < test.in 
>>>> &> debug_info.txt
>>>> 
>>>> The debug_info.txt is attached. 
>>>> 
>>>> Dr. Jingchao Zhang
>>>> Holland Computing Center
>>>> University of Nebraska-Lincoln
>>>> 402-472-6400
>>>> From: users  on behalf of 
>>>> r...@open-mpi.org 
>>>> Sent: Wednesday, August 24, 2016 12:14:26 PM
>>>> To: Open MPI Users
>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>> 
>>>> Afraid I can’t replicate a problem at all, whether rank=0 is local or not. 
>>>> I’m also using bash, but on CentOS-7, so I suspect the OS is the 
>>>> difference.
>>>> 
>>>> Can you configure OMPI with --enable-debug, and then run the test again 
>>>> with --mca iof_base_verbose 100? It will hopefully tell us something about 
>>>> why the IO subsystem is stuck.
>>>> 
>>>> 
>>>>> On Aug 24, 2016, at 8:46 AM, Jingchao Zhang  wrote:
>>>>> 
>>>>> Hi Ralph,
>>>>> 
>>>>> For our tests, rank 0 is always on the same node with mpirun. I just 
>>>>> tested mpirun with -nolocal and it still hangs. 
>>>>> 
>>>>> Information on shell and OS
>>>>> $ echo $0
>>>>> -bash
>>>>> 
>>>>> $ lsb_release -a
>>>>> LSB Version:
>>>>> :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:c

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-29 Thread r...@open-mpi.org
PI_Init
> Rank 1 has cleared MPI_Init
> Rank 2 has cleared MPI_Init
> Rank 3 has cleared MPI_Init
> Rank 4 has cleared MPI_Init
> Rank 8 has cleared MPI_Init
> Rank 0 has cleared MPI_Init
> Rank 6 has cleared MPI_Init
> Rank 7 has cleared MPI_Init
> Rank 14 has cleared MPI_Init
> Rank 15 has cleared MPI_Init
> Rank 16 has cleared MPI_Init
> Rank 18 has cleared MPI_Init
> Rank 10 has cleared MPI_Init
> Rank 11 has cleared MPI_Init
> Rank 12 has cleared MPI_Init
> Rank 13 has cleared MPI_Init
> Rank 17 has cleared MPI_Init
> Rank 19 has cleared MPI_Init
> 
> Thanks,
> 
> Dr. Jingchao Zhang
> Holland Computing Center
> University of Nebraska-Lincoln
> 402-472-6400
> From: users  on behalf of r...@open-mpi.org
> Sent: Saturday, August 27, 2016 12:31:53 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>  
> I am finding this impossible to replicate, so something odd must be going on. 
> Can you please (a) pull down the latest v2.0.1 nightly tarball, and (b) add 
> this patch to it?
> 
> diff --git a/orte/mca/iof/hnp/iof_hnp.c b/orte/mca/iof/hnp/iof_hnp.c
> old mode 100644
> new mode 100755
> index 512fcdb..362ff46
> --- a/orte/mca/iof/hnp/iof_hnp.c
> +++ b/orte/mca/iof/hnp/iof_hnp.c
> @@ -143,16 +143,17 @@ static int hnp_push(const orte_process_name_t* 
> dst_name, orte_iof_tag_t src_tag,
>  int np, numdigs;
>  orte_ns_cmp_bitmask_t mask;
>  
> +opal_output(0,
> + "%s iof:hnp pushing fd %d for process %s",
> + ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
> + fd, ORTE_NAME_PRINT(dst_name));
> +
>  /* don't do this if the dst vpid is invalid or the fd is negative! */
>  if (ORTE_VPID_INVALID == dst_name->vpid || fd < 0) {
>  return ORTE_SUCCESS;
>  }
>  
> -OPAL_OUTPUT_VERBOSE((1, orte_iof_base_framework.framework_output,
> - "%s iof:hnp pushing fd %d for process %s",
> - ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
> - fd, ORTE_NAME_PRINT(dst_name)));
> -
>  if (!(src_tag & ORTE_IOF_STDIN)) {
>  /* set the file descriptor to non-blocking - do this before we setup
>   * and activate the read event in case it fires right away
> 
> 
> You can then run the test again without the "--mca iof_base_verbose 100” flag 
> to reduce the chatter - this print statement will tell me what I need to know.
> 
> Thanks!
> Ralph
> 
> 
>> On Aug 25, 2016, at 8:19 AM, Jeff Squyres (jsquyres) > <mailto:jsquy...@cisco.com>> wrote:
>> 
>> The IOF fix PR for v2.0.1 was literally just merged a few minutes ago; it 
>> wasn't in last night's tarball.
>> 
>> 
>> 
>>> On Aug 25, 2016, at 10:59 AM, r...@open-mpi.org <mailto:r...@open-mpi.org> 
>>> wrote:
>>> 
>>> ??? Weird - can you send me an updated output of that last test we ran?
>>> 
>>>> On Aug 25, 2016, at 7:51 AM, Jingchao Zhang >>> <mailto:zh...@unl.edu>> wrote:
>>>> 
>>>> Hi Ralph,
>>>> 
>>>> I saw the pull request and did a test with v2.0.1rc1, but the problem 
>>>> persists. Any ideas?
>>>> 
>>>> Thanks,
>>>> 
>>>> Dr. Jingchao Zhang
>>>> Holland Computing Center
>>>> University of Nebraska-Lincoln
>>>> 402-472-6400
>>>> From: users  on behalf of r...@open-mpi.org
>>>> Sent: Wednesday, August 24, 2016 1:27:28 PM
>>>> To: Open MPI Users
>>>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>>> 
>>>> Bingo - found it, fix submitted and hope to get it into 2.0.1
>>>> 
>>>> Thanks for the assist!
>>>> Ralph
>>>> 
>>>> 
>>>>> On Aug 24, 2016, at 12:15 PM, Jingchao Zhang >>>> <mailto:zh...@unl.edu>> wrote:
>>>>> 
>>>>> I configured v2.0.1rc1 with --enable-debug and ran the test with --mca 
>>>>> iof_base_verbose 100. I also added -display-devel-map in case it provides 
>>>>> some useful information.
>>>>> 
>>>>> Test job has 2 nodes, each node 10 cores. Rank 0 and mpirun command on 
>>>>> the same node.

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-30 Thread r...@open-mpi.org
for process [[26513,1],8]
> [c1725.crane.hcc.unl.edu:218844] HCC debug: [[26513,0],0] iof:hnp pushing fd 72 for process [[26513,1],8]
> [c1725.crane.hcc.unl.edu:218844] HCC debug: [[26513,0],0] iof:hnp pushing fd 67 for process [[26513,1],9]
> [c1725.crane.hcc.unl.edu:218844] HCC debug: [[26513,0],0] iof:hnp pushing fd 74 for process [[26513,1],9]
> [c1725.crane.hcc.unl.edu:218844] HCC debug: [[26513,0],0] iof:hnp pushing fd 76 for process [[26513,1],9]
> Rank 1 has cleared MPI_Init
> Rank 3 has cleared MPI_Init
> Rank 4 has cleared MPI_Init
> Rank 5 has cleared MPI_Init
> Rank 6 has cleared MPI_Init
> Rank 7 has cleared MPI_Init
> Rank 0 has cleared MPI_Init
> Rank 2 has cleared MPI_Init
> Rank 8 has cleared MPI_Init
> Rank 9 has cleared MPI_Init
> Rank 10 has cleared MPI_Init
> Rank 11 has cleared MPI_Init
> Rank 12 has cleared MPI_Init
> Rank 13 has cleared MPI_Init
> Rank 16 has cleared MPI_Init
> Rank 17 has cleared MPI_Init
> Rank 18 has cleared MPI_Init
> Rank 14 has cleared MPI_Init
> Rank 15 has cleared MPI_Init
> Rank 19 has cleared MPI_Init
> 
> 
> The part of code I changed in file ./orte/mca/iof/hnp/iof_hnp.c
> 
> opal_output(0,
>  "HCC debug: %s iof:hnp pushing fd %d for process %s",
>  ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
>  fd, ORTE_NAME_PRINT(dst_name));
> 
> /* don't do this if the dst vpid is invalid or the fd is negative! */
> if (ORTE_VPID_INVALID == dst_name->vpid || fd < 0) {
> return ORTE_SUCCESS;
> }
> 
> /*OPAL_OUTPUT_VERBOSE((1, orte_iof_base_framework.framework_output,
>  "%s iof:hnp pushing fd %d for process %s",
>  ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
>  fd, ORTE_NAME_PRINT(dst_name)));
> */
> 
> From: users  on behalf of r...@open-mpi.org
> Sent: Monday, August 29, 2016 11:42:00 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>  
> I’m sorry, but something is simply very wrong here. Are you sure you are 
> pointed at the correct LD_LIBRARY_PATH? Perhaps add a “BOO” or something at 
> the front of the output message to ensure we are using the correct plugin?
> 
> This looks to me like you must be picking up a stale library somewhere.
> 
>> On Aug 29, 2016, at 10:29 AM, Jingchao Zhang > <mailto:zh...@unl.edu>> wrote:
>> 
>> Hi Ralph,
>> 
>> I used the tarball from Aug 26 and added the patch. Tested with 2 nodes, 10 
>> cores/node. Please see the results below:
>> 
>> $ mpirun ./a.out < test.in
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:170750] 
>> [[43954,0],0] iof:hnp pushing fd 35 for process [[43954,1],0]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:170750] 
>> [[43954,0],0] iof:hnp pushing fd 41 for process [[43954,1],0]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:170750] 
>> [[43954,0],0] iof:hnp pushing fd 43 for process [[43954,1],0]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:170750] 
>> [[43954,0],0] iof:hnp pushing fd 37 for process [[43954,1],1]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:170750] 
>> [[43954,0],0] iof:hnp pushing fd 46 for process [[43954,1],1]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:170750] 
>> [[43954,0],0] iof:hnp pushing fd 49 for process [[43954,1],1]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:170750] 
>> [[43954,0],0] iof:hnp pushing fd 38 for process [[43954,1],2]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:170750] 
>> [[43954,0],0] iof:hnp pushing fd 50 for process [[43954,1],2]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:170750] 
>> [[43954,0],0] iof:hnp pushing fd 52 for process [[43954,1],2]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:170750] 
>> [[43954,0],0] iof:hnp pushing fd 42 for process [[43954,1],3]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:170750] 
>> [[43954,0],0] iof:hnp pushing fd 53 for process [[43954,1],3]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:170750] 
>> [[43954,0],0] iof:hnp pushing fd 55 for process [[43954,1],3]
>> [c1725.cran

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-30 Thread r...@open-mpi.org
Well, that helped a bit. For some reason, your system is skipping a step in the 
launch state machine, and so we never hit the step where we set up the IO 
forwarding system.

Sorry to keep poking, but I haven’t seen this behavior anywhere else, and so I 
have no way to replicate it. Must be a subtle race condition.

Can you replace “plm” with “state” and try to hit a “bad” run again?
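
Something like this should do it (same test case as before; the output file name 
is just an example):

$ mpirun -mca state_base_verbose 5 ./a.out < test.in &> state_debug.txt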


> On Aug 30, 2016, at 12:30 PM, Jingchao Zhang  wrote:
> 
> Yes, all procs were launched properly. I added “-mca plm_base_verbose 5” to 
> the mpirun command. Please see attached for the results.
> 
> $mpirun -mca plm_base_verbose 5 ./a.out < test.in
> 
> I mentioned in my initial post that the test job can run properly for the 1st 
> time. But if I kill the job and resubmit, then it hangs. It happened with the 
> job above as well. Very odd. 
> From: users  on behalf of r...@open-mpi.org
> Sent: Tuesday, August 30, 2016 12:56:33 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>  
> Hmmm...well, the problem appears to be that we aren’t setting up the input 
> channel to read stdin. This happens immediately after the application is 
> launched - there is no “if” clause or anything else in front of it. The only 
> way it wouldn’t get called is if all the procs weren’t launched, but that 
> appears to be happening, yes?
> 
> Hence my confusion - there is no test in front of that print statement now, 
> and yet we aren’t seeing the code being called.
> 
> Could you please add “-mca plm_base_verbose 5” to your cmd line? We should 
> see a debug statement print that contains "plm:base:launch wiring up iof for 
> job”
> 
> 
> 
>> On Aug 30, 2016, at 11:40 AM, Jingchao Zhang > <mailto:zh...@unl.edu>> wrote:
>> 
>> I checked again and as far as I can tell, everything was setup correctly. I 
>> added "HCC debug" to the output message to make sure it's the correct 
>> plugin. 
>> 
>> The updated outputs:
>> $ mpirun ./a.out < test.in
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>> debug: [[26513,0],0] iof:hnp pushing fd 35 for process [[26513,1],0]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>> debug: [[26513,0],0] iof:hnp pushing fd 41 for process [[26513,1],0]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>> debug: [[26513,0],0] iof:hnp pushing fd 43 for process [[26513,1],0]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>> debug: [[26513,0],0] iof:hnp pushing fd 37 for process [[26513,1],1]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>> debug: [[26513,0],0] iof:hnp pushing fd 46 for process [[26513,1],1]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>> debug: [[26513,0],0] iof:hnp pushing fd 49 for process [[26513,1],1]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>> debug: [[26513,0],0] iof:hnp pushing fd 38 for process [[26513,1],2]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>> debug: [[26513,0],0] iof:hnp pushing fd 50 for process [[26513,1],2]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>> debug: [[26513,0],0] iof:hnp pushing fd 52 for process [[26513,1],2]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>> debug: [[26513,0],0] iof:hnp pushing fd 42 for process [[26513,1],3]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>> debug: [[26513,0],0] iof:hnp pushing fd 53 for process [[26513,1],3]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>> debug: [[26513,0],0] iof:hnp pushing fd 55 for process [[26513,1],3]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>> debug: [[26513,0],0] iof:hnp pushing fd 45 for process [[26513,1],4]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>> debug: [[26513,0],0] iof:hnp pushing fd 56 for process [[26513,1],4]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>> debug: [[26513,0],0] iof:hnp pushing fd 58 for process [[26513,1],4]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>> debug: [[26513,0],0] iof:hnp pushing fd 47 for process [[26513,1],5]
>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218

Re: [OMPI users] stdin issue with openmpi/2.0.0

2016-08-30 Thread r...@open-mpi.org
Oh my - that indeed illustrated the problem!! It is a race condition on the 
backend orted. I’ll try to fix it - I will probably have to send you a patch to 
test?

> On Aug 30, 2016, at 1:04 PM, Jingchao Zhang  wrote:
> 
> $mpirun -mca state_base_verbose 5 ./a.out < test.in
> 
> Please see attached for the outputs.
> 
> Thank you Ralph. I am willing to provide whatever information you need.
> 
> From: users  on behalf of r...@open-mpi.org
> Sent: Tuesday, August 30, 2016 1:45:45 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>  
> Well, that helped a bit. For some reason, your system is skipping a step in 
> the launch state machine, and so we never hit the step where we setup the IO 
> forwarding system.
> 
> Sorry to keep poking, but I haven’t seen this behavior anywhere else, and so 
> I have no way to replicate it. Must be a subtle race condition.
> 
> Can you replace “plm” with ‘“state” and try to hit a “bad” run again?
> 
> 
>> On Aug 30, 2016, at 12:30 PM, Jingchao Zhang > <mailto:zh...@unl.edu>> wrote:
>> 
>> Yes, all procs were launched properly. I added “-mca plm_base_verbose 5” to 
>> the mpirun command. Please see attached for the results.
>> 
>> $mpirun -mca plm_base_verbose 5 ./a.out < test.in
>> 
>> I mentioned in my initial post that the test job can run properly for the 
>> 1st time. But if I kill the job and resubmit, then it hangs. It happened 
>> with the job above as well. Very odd. 
>> From: users  on behalf of r...@open-mpi.org
>> Sent: Tuesday, August 30, 2016 12:56:33 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] stdin issue with openmpi/2.0.0
>>  
>> Hmmm...well, the problem appears to be that we aren’t setting up the input 
>> channel to read stdin. This happens immediately after the application is 
>> launched - there is no “if” clause or anything else in front of it. The only 
>> way it wouldn’t get called is if all the procs weren’t launched, but that 
>> appears to be happening, yes?
>> 
>> Hence my confusion - there is no test in front of that print statement now, 
>> and yet we aren’t seeing the code being called.
>> 
>> Could you please add “-mca plm_base_verbose 5” to your cmd line? We should 
>> see a debug statement print that contains "plm:base:launch wiring up iof for 
>> job”
>> 
>> 
>> 
>>> On Aug 30, 2016, at 11:40 AM, Jingchao Zhang >> <mailto:zh...@unl.edu>> wrote:
>>> 
>>> I checked again and as far as I can tell, everything was setup correctly. I 
>>> added "HCC debug" to the output message to make sure it's the correct 
>>> plugin. 
>>> 
>>> The updated outputs:
>>> $ mpirun ./a.out < test.in
>>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>>> debug: [[26513,0],0] iof:hnp pushing fd 35 for process [[26513,1],0]
>>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>>> debug: [[26513,0],0] iof:hnp pushing fd 41 for process [[26513,1],0]
>>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>>> debug: [[26513,0],0] iof:hnp pushing fd 43 for process [[26513,1],0]
>>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>>> debug: [[26513,0],0] iof:hnp pushing fd 37 for process [[26513,1],1]
>>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>>> debug: [[26513,0],0] iof:hnp pushing fd 46 for process [[26513,1],1]
>>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>>> debug: [[26513,0],0] iof:hnp pushing fd 49 for process [[26513,1],1]
>>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>>> debug: [[26513,0],0] iof:hnp pushing fd 38 for process [[26513,1],2]
>>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>>> debug: [[26513,0],0] iof:hnp pushing fd 50 for process [[26513,1],2]
>>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>>> debug: [[26513,0],0] iof:hnp pushing fd 52 for process [[26513,1],2]
>>> [c1725.crane.hcc.unl.edu <http://c1725.crane.hcc.unl.edu/>:218844] HCC 
>>> debug: [[26513,0],0] iof:hnp pushing fd 42 for process [[26513,1],3]

Re: [OMPI users] OMPI users] Unable to mpirun from within torque

2016-09-07 Thread r...@open-mpi.org
The usual cause of this problem is that the nodename in the machinefile is 
given as a00551, while Torque is assigning the node name as 
a00551.science.domain. Thus, mpirun thinks those are two separate nodes and 
winds up spawning an orted on its own node.

You might try ensuring that your machinefile is using the exact same name as 
provided in your allocation
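
A quick way to check (from inside the Torque job, before invoking mpirun) is to 
compare what the two report, e.g.:

$ hostname
$ hostname -f
$ cat $PBS_NODEFILE

and then make sure the machinefile uses whichever form $PBS_NODEFILE shows.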


> On Sep 7, 2016, at 7:06 AM, Gilles Gouaillardet 
>  wrote:
> 
> Thanks for the logs
> 
> From what i see now, it looks like a00551 is running both mpirun and orted, 
> though it should only run mpirun, and orted should run only on a00553
> 
> I will check the code and see what could be happening here
> 
> Btw, what is the output of
> hostname
> hostname -f
> On a00551 ?
> 
> Out of curiosity, is a previous version of Open MPI (e.g. v1.10.4) installled 
> and running correctly on your cluster ?
> 
> Cheers,
> 
> Gilles
> 
> Oswin Krause  wrote:
>> Hi Gilles,
>> 
>> Thanks for the hint with the machinefile. I know it is not equivalent 
>> and i do not intend to use that approach. I just wanted to know whether 
>> I could start the program successfully at all.
>> 
>> Outside torque(4.2), rsh seems to be used which works fine, querying a 
>> password if no kerberos ticket is there
>> 
>> Here is the output:
>> [zbh251@a00551 ~]$ mpirun -V
>> mpirun (Open MPI) 2.0.1
>> [zbh251@a00551 ~]$ ompi_info | grep ras
>> MCA ras: loadleveler (MCA v2.1.0, API v2.0.0, Component 
>> v2.0.1)
>> MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component 
>> v2.0.1)
>> MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component 
>> v2.0.1)
>> MCA ras: tm (MCA v2.1.0, API v2.0.0, Component v2.0.1)
>> [zbh251@a00551 ~]$ mpirun --mca plm_base_verbose 10 --tag-output 
>> -display-map hostname
>> [a00551.science.domain:04104] mca: base: components_register: 
>> registering framework plm components
>> [a00551.science.domain:04104] mca: base: components_register: found 
>> loaded component isolated
>> [a00551.science.domain:04104] mca: base: components_register: component 
>> isolated has no register or open function
>> [a00551.science.domain:04104] mca: base: components_register: found 
>> loaded component rsh
>> [a00551.science.domain:04104] mca: base: components_register: component 
>> rsh register function successful
>> [a00551.science.domain:04104] mca: base: components_register: found 
>> loaded component slurm
>> [a00551.science.domain:04104] mca: base: components_register: component 
>> slurm register function successful
>> [a00551.science.domain:04104] mca: base: components_register: found 
>> loaded component tm
>> [a00551.science.domain:04104] mca: base: components_register: component 
>> tm register function successful
>> [a00551.science.domain:04104] mca: base: components_open: opening plm 
>> components
>> [a00551.science.domain:04104] mca: base: components_open: found loaded 
>> component isolated
>> [a00551.science.domain:04104] mca: base: components_open: component 
>> isolated open function successful
>> [a00551.science.domain:04104] mca: base: components_open: found loaded 
>> component rsh
>> [a00551.science.domain:04104] mca: base: components_open: component rsh 
>> open function successful
>> [a00551.science.domain:04104] mca: base: components_open: found loaded 
>> component slurm
>> [a00551.science.domain:04104] mca: base: components_open: component 
>> slurm open function successful
>> [a00551.science.domain:04104] mca: base: components_open: found loaded 
>> component tm
>> [a00551.science.domain:04104] mca: base: components_open: component tm 
>> open function successful
>> [a00551.science.domain:04104] mca:base:select: Auto-selecting plm 
>> components
>> [a00551.science.domain:04104] mca:base:select:(  plm) Querying component 
>> [isolated]
>> [a00551.science.domain:04104] mca:base:select:(  plm) Query of component 
>> [isolated] set priority to 0
>> [a00551.science.domain:04104] mca:base:select:(  plm) Querying component 
>> [rsh]
>> [a00551.science.domain:04104] mca:base:select:(  plm) Query of component 
>> [rsh] set priority to 10
>> [a00551.science.domain:04104] mca:base:select:(  plm) Querying component 
>> [slurm]
>> [a00551.science.domain:04104] mca:base:select:(  plm) Querying component 
>> [tm]
>> [a00551.science.domain:04104] mca:base:select:(  plm) Query of component 
>> [tm] set priority to 75
>> [a00551.science.domain:04104] mca:base:select:(  plm) Selected component 
>> [tm]
>> [a00551.science.domain:04104] mca: base: close: component isolated 
>> closed
>> [a00551.science.domain:04104] mca: base: close: unloading component 
>> isolated
>> [a00551.science.domain:04104] mca: base: close: component rsh closed
>> [a00551.science.domain:04104] mca: base: close: unloading component rsh
>> [a00551.science.domain:04104] mca: base: close: component slurm closed
>> [a00551.science.domain:04104] mca: base: close: unloading component 
>> slurm
>> [a00551.science.domain:04109] mca: base: comp

Re: [OMPI users] OMPI users] Unable to mpirun from within torque

2016-09-07 Thread r...@open-mpi.org
You aren’t looking in the right place - there is an “openmpi” directory 
underneath that one, and the mca_xxx libraries are down there
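
For example, with <prefix> standing in for wherever you installed Open MPI (just 
a sketch):

$ ls <prefix>/lib/openmpi | grep plm
$ ldd <prefix>/lib/openmpi/mca_plm_tm.so | grep torque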

> On Sep 7, 2016, at 7:43 AM, Oswin Krause  
> wrote:
> 
> Hi Gilles,
> 
> I do not have this library. Maybe this helps already...
> 
> libmca_common_sm.so  libmpi_mpifh.so  libmpi_usempif08.so  
> libompitrace.so  libopen-rte.so
> libmpi_cxx.solibmpi.solibmpi_usempi_ignore_tkr.so  
> libopen-pal.so   liboshmem.so
> 
> and mpirun does only link to libopen-pal/libopen-rte (aside the standard 
> stuff)
> 
> But still it is telling me that it has support for tm? libtorque is there and 
> the headers are also there and since i have enabled tm...*sigh*
> 
> Thanks again!
> 
> Oswin
> 
> On 2016-09-07 16:21, Gilles Gouaillardet wrote:
>> Note the torque library will only show up if you configure'd with
>> --disable-dlopen. Otherwise, you can ldd
>> /.../lib/openmpi/mca_plm_tm.so
>> Cheers,
>> Gilles
>> Bennet Fauber  wrote:
>>> Oswin,
>>> Does the torque library show up if you run
>>> $ ldd mpirun
>>> That would indicate that Torque support is compiled in.
>>> Also, what happens if you use the same hostfile, or some hostfile as
>>> an explicit argument when you run mpirun from within the torque job?
>>> -- bennet
>>> On Wed, Sep 7, 2016 at 9:25 AM, Oswin Krause
>>>  wrote:
 Hi Gilles,
 Thanks for the hint with the machinefile. I know it is not equivalent and i
 do not intend to use that approach. I just wanted to know whether I could
 start the program successfully at all.
 Outside torque(4.2), rsh seems to be used which works fine, querying a
 password if no kerberos ticket is there
 Here is the output:
 [zbh251@a00551 ~]$ mpirun -V
 mpirun (Open MPI) 2.0.1
 [zbh251@a00551 ~]$ ompi_info | grep ras
 MCA ras: loadleveler (MCA v2.1.0, API v2.0.0, Component
 v2.0.1)
 MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component
 v2.0.1)
 MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v2.0.1)
 MCA ras: tm (MCA v2.1.0, API v2.0.0, Component v2.0.1)
 [zbh251@a00551 ~]$ mpirun --mca plm_base_verbose 10 --tag-output
 -display-map hostname
 [a00551.science.domain:04104] mca: base: components_register: registering
 framework plm components
 [a00551.science.domain:04104] mca: base: components_register: found loaded
 component isolated
 [a00551.science.domain:04104] mca: base: components_register: component
 isolated has no register or open function
 [a00551.science.domain:04104] mca: base: components_register: found loaded
 component rsh
 [a00551.science.domain:04104] mca: base: components_register: component rsh
 register function successful
 [a00551.science.domain:04104] mca: base: components_register: found loaded
 component slurm
 [a00551.science.domain:04104] mca: base: components_register: component
 slurm register function successful
 [a00551.science.domain:04104] mca: base: components_register: found loaded
 component tm
 [a00551.science.domain:04104] mca: base: components_register: component tm
 register function successful
 [a00551.science.domain:04104] mca: base: components_open: opening plm
 components
 [a00551.science.domain:04104] mca: base: components_open: found loaded
 component isolated
 [a00551.science.domain:04104] mca: base: components_open: component 
 isolated
 open function successful
 [a00551.science.domain:04104] mca: base: components_open: found loaded
 component rsh
 [a00551.science.domain:04104] mca: base: components_open: component rsh 
 open
 function successful
 [a00551.science.domain:04104] mca: base: components_open: found loaded
 component slurm
 [a00551.science.domain:04104] mca: base: components_open: component slurm
 open function successful
 [a00551.science.domain:04104] mca: base: components_open: found loaded
 component tm
 [a00551.science.domain:04104] mca: base: components_open: component tm open
 function successful
 [a00551.science.domain:04104] mca:base:select: Auto-selecting plm 
 components
 [a00551.science.domain:04104] mca:base:select:(  plm) Querying component
 [isolated]
 [a00551.science.domain:04104] mca:base:select:(  plm) Query of component
 [isolated] set priority to 0
 [a00551.science.domain:04104] mca:base:select:(  plm) Querying component
 [rsh]
 [a00551.science.domain:04104] mca:base:select:(  plm) Query of component
 [rsh] set priority to 10
 [a00551.science.domain:04104] mca:base:select:(  plm) Querying component
 [slurm]
 [a00551.science.domain:04104] mca:base:select:(  plm) Querying component
 [tm]
 [a00551.science.domain:04104] mca:base:select:(  plm) Query of component
 [tm] set priority to 75
 [a00551.science.domain:04104] mca:base:select:(  plm) Selected 

Re: [OMPI users] OMPI users] Unable to mpirun from within torque

2016-09-08 Thread r...@open-mpi.org
Someone has done some work there since I last did, but I can see the issue. 
Torque indeed always provides an ordered file - the only way you can get an 
unordered one is for someone to edit it, and that is forbidden - i.e., you get 
what you deserve because you are messing around with a system-defined file :-)

The problem is that Torque internally assigns a “launch ID” which is just the 
integer position of the nodename in the PBS_NODEFILE. So if you modify that 
position, then we get the wrong index - and everything goes down the drain from 
there. In your example, n1.cluster changed index from 3 to 2 because of your 
edit. Torque thinks that index 2 is just another reference to n0.cluster, and 
so we merrily launch a daemon onto the wrong node.

They have a good reason for doing things this way. It allows you to launch a 
process against each launch ID, and the pattern will reflect the original qsub 
request in what we would call a map-by slot round-robin mode. This maximizes 
the use of shared memory, and is expected to provide good performance for a 
range of apps.

Lesson to be learned: never, ever muddle around with a system-generated file. 
If you want to modify where things go, then use one or more of the mpirun 
options to do so. We give you lots and lots of knobs for just that reason.
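
For instance, if the goal of editing the file was to interleave ranks across the 
nodes, the supported way to get that pattern is something like (./a.out standing 
in for your application):

$ mpirun --map-by node ./a.out

rather than reordering $PBS_NODEFILE by hand.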



> On Sep 7, 2016, at 10:53 PM, Gilles Gouaillardet  wrote:
> 
> Ralph,
> 
> 
> there might be an issue within Open MPI.
> 
> 
> on the cluster i used, hostname returns the FQDN, and $PBS_NODEFILE uses the 
> FQDN too.
> 
> my $PBS_NODEFILE has one line per task, and it is ordered
> 
> e.g.
> 
> n0.cluster
> 
> n0.cluster
> 
> n1.cluster
> 
> n1.cluster
> 
> 
> in my torque script, i rewrote the machinefile like this
> 
> n0.cluster
> 
> n1.cluster
> 
> n0.cluster
> 
> n1.cluster
> 
> and updated the PBS environment variable to point to my new file.
> 
> 
> then i invoked
> 
> mpirun hostname
> 
> 
> 
> in the first case, 2 tasks run on n0 and 2 tasks run on n1
> in the second case, 4 tasks run on n0, and none on n1.
> 
> so i am thinking we might not support unordered $PBS_NODEFILE.
> 
> as a reminder, the submit command was
> qsub -l nodes=3:ppn=1
> but for some reasons i ignore, only two nodes were allocated (two slots on 
> the first one, one on the second one)
> and if i understand correctly, $PBS_NODEFILE was not ordered.
> (e.g. n0 n1 n0 and *not * n0 n0 n1)
> 
> i tried to reproduce this without hacking $PBS_NODEFILE, but my jobs hang in 
> the queue if only two nodes with 16 slots each are available and i request
> -l nodes=3:ppn=1
> i guess this is a different scheduler configuration, and i cannot change that.
> 
> Could you please have a look at this ?
> 
> Cheers,
> 
> Gilles
> 
> On 9/7/2016 11:15 PM, r...@open-mpi.org wrote:
>> The usual cause of this problem is that the nodename in the machinefile is 
>> given as a00551, while Torque is assigning the node name as 
>> a00551.science.domain. Thus, mpirun thinks those are two separate nodes and 
>> winds up spawning an orted on its own node.
>> 
>> You might try ensuring that your machinefile is using the exact same name as 
>> provided in your allocation
>> 
>> 
>>> On Sep 7, 2016, at 7:06 AM, Gilles Gouaillardet 
>>>  wrote:
>>> 
>>> Thanks for the logs
>>> 
>>> From what i see now, it looks like a00551 is running both mpirun and orted, 
>>> though it should only run mpirun, and orted should run only on a00553
>>> 
>>> I will check the code and see what could be happening here
>>> 
>>> Btw, what is the output of
>>> hostname
>>> hostname -f
>>> On a00551 ?
>>> 
>>> Out of curiosity, is a previous version of Open MPI (e.g. v1.10.4) 
>>> installled and running correctly on your cluster ?
>>> 
>>> Cheers,
>>> 
>>> Gilles
>>> 
>>> Oswin Krause  wrote:
>>>> Hi Gilles,
>>>> 
>>>> Thanks for the hint with the machinefile. I know it is not equivalent
>>>> and i do not intend to use that approach. I just wanted to know whether
>>>> I could start the program successfully at all.
>>>> 
>>>> Outside torque(4.2), rsh seems to be used which works fine, querying a
>>>> password if no kerberos ticket is there
>>>> 
>>>> Here is the output:
>>>> [zbh251@a00551 ~]$ mpirun -V
>>>> mpirun (Open MPI) 2.0.1
>>>> [zbh251@a00551 ~]$ ompi_info | grep ras
>>>> MCA ras: loadleveler (MCA v2.1.0, API v2.0.0, Component
>>>> 

Re: [OMPI users] Unable to mpirun from within torque

2016-09-08 Thread r...@open-mpi.org
If you are correctly analyzing things, then there would be an issue in the 
code. When we get an allocation from a resource manager, we set a flag 
indicating that it is “gospel” - i.e., that we do not directly sense the number 
of cores on a node and set the #slots equal to that value. Instead, we take the 
RM-provided allocation as ultimate truth.

This should be true even if you add a machinefile, as the machinefile is only 
used to “filter” the nodelist provided by the RM. It shouldn’t cause the #slots 
to be modified.

Taking a quick glance at the v2.x code, it looks to me like all is being done 
correctly. Again, output from a debug build would resolve that question
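
If it helps, the allocation comes from the "ras" framework and the mapping from 
"rmaps", so against a --enable-debug build something like this (just a sketch) 
should show exactly what slot counts we think the RM gave us:

$ mpirun -mca ras_base_verbose 5 -mca rmaps_base_verbose 5 hostname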


> On Sep 7, 2016, at 10:56 PM, Gilles Gouaillardet  wrote:
> 
> Oswin,
> 
> 
> Unfortunately, some important info is missing.
> 
> i guess the root cause is Open MPI was not configure'd with --enable-debug
> 
> 
> could you please update your torque script and simply add the following 
> snippet before invoking mpirun
> 
> 
> echo PBS_NODEFILE
> 
> cat $PBS_NODEFILE
> 
> echo ---
> 
> 
> As I wrote in another email, I suspect the hosts are not ordered (and I'd like 
> to confirm that) and Open MPI does not handle that correctly
> 
> 
> Cheers,
> 
> 
> Gilles
> 
> On 9/7/2016 10:25 PM, Oswin Krause wrote:
>> Hi Gilles,
>> 
>> Thanks for the hint with the machinefile. I know it is not equivalent and i 
>> do not intend to use that approach. I just wanted to know whether I could 
>> start the program successfully at all.
>> 
>> Outside torque(4.2), rsh seems to be used which works fine, querying a 
>> password if no kerberos ticket is there
>> 
>> Here is the output:
>> [zbh251@a00551 ~]$ mpirun -V
>> mpirun (Open MPI) 2.0.1
>> [zbh251@a00551 ~]$ ompi_info | grep ras
>> MCA ras: loadleveler (MCA v2.1.0, API v2.0.0, Component 
>> v2.0.1)
>> MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component v2.0.1)
>> MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v2.0.1)
>> MCA ras: tm (MCA v2.1.0, API v2.0.0, Component v2.0.1)
>> [zbh251@a00551 ~]$ mpirun --mca plm_base_verbose 10 --tag-output 
>> -display-map hostname
>> [a00551.science.domain:04104] mca: base: components_register: registering 
>> framework plm components
>> [a00551.science.domain:04104] mca: base: components_register: found loaded 
>> component isolated
>> [a00551.science.domain:04104] mca: base: components_register: component 
>> isolated has no register or open function
>> [a00551.science.domain:04104] mca: base: components_register: found loaded 
>> component rsh
>> [a00551.science.domain:04104] mca: base: components_register: component rsh 
>> register function successful
>> [a00551.science.domain:04104] mca: base: components_register: found loaded 
>> component slurm
>> [a00551.science.domain:04104] mca: base: components_register: component 
>> slurm register function successful
>> [a00551.science.domain:04104] mca: base: components_register: found loaded 
>> component tm
>> [a00551.science.domain:04104] mca: base: components_register: component tm 
>> register function successful
>> [a00551.science.domain:04104] mca: base: components_open: opening plm 
>> components
>> [a00551.science.domain:04104] mca: base: components_open: found loaded 
>> component isolated
>> [a00551.science.domain:04104] mca: base: components_open: component isolated 
>> open function successful
>> [a00551.science.domain:04104] mca: base: components_open: found loaded 
>> component rsh
>> [a00551.science.domain:04104] mca: base: components_open: component rsh open 
>> function successful
>> [a00551.science.domain:04104] mca: base: components_open: found loaded 
>> component slurm
>> [a00551.science.domain:04104] mca: base: components_open: component slurm 
>> open function successful
>> [a00551.science.domain:04104] mca: base: components_open: found loaded 
>> component tm
>> [a00551.science.domain:04104] mca: base: components_open: component tm open 
>> function successful
>> [a00551.science.domain:04104] mca:base:select: Auto-selecting plm components
>> [a00551.science.domain:04104] mca:base:select:(  plm) Querying component 
>> [isolated]
>> [a00551.science.domain:04104] mca:base:select:(  plm) Query of component 
>> [isolated] set priority to 0
>> [a00551.science.domain:04104] mca:base:select:(  plm) Querying component 
>> [rsh]
>> [a00551.science.domain:04104] mca:base:select:(  plm) Query of component 
>> [rsh] set priority to 10
>> [a00551.science.domain:04104] mca:base:select:(  plm) Querying component 
>> [slurm]
>> [a00551.science.domain:04104] mca:base:select:(  plm) Querying component [tm]
>> [a00551.science.domain:04104] mca:base:select:(  plm) Query of component 
>> [tm] set priority to 75
>> [a00551.science.domain:04104] mca:base:select:(  plm) Selected component [tm]
>> [a00551.science.domain:04104] mca: base: close: component isolated closed
>> [a00551.science.domain:04104] mca: base: close: unloading component isolated
>> [a00

Re: [OMPI users] OMPI users] Unable to mpirun from within torque

2016-09-08 Thread r...@open-mpi.org
I’m pruning this email thread so I can actually read the blasted thing :-)

Guys: you are off in the wilderness chasing ghosts! Please stop.

When I say that Torque uses an “ordered” file, I am _not_ saying that all the 
host entries of the same name have to be listed consecutively. I am saying that 
the _position_ of each entry has meaning, and you cannot just change it.

I have honestly totally lost the root of this discussion in all the white noise 
about the PBS_NODEFILE. Can we reboot?
Ralph


> On Sep 8, 2016, at 5:26 AM, Gilles Gouaillardet 
>  wrote:
> 
> Oswin,
> 
> One more thing, can you
> 
> pbsdsh -v hostname
> 
> before invoking mpirun ?
> Hopefully this should print the three hostnames
> 
> Then you can
> ldd `which pbsdsh`
> And see which libtorque.so is linked with it
> 
> Cheers,
> 
> Gilles
> 
> 

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Strange errors when running mpirun

2016-09-22 Thread r...@open-mpi.org
Maybe I’m missing something, but “mpirun -n 1” doesn’t include the name of an 
application to execute.

The error message prior to that one indicates that you have some cruft 
sitting in your tmpdir. You just need to clean it out - look for something that 
starts with “openmpi”
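
In other words, something like this (assuming your shell sets $TMPDIR, as OS X 
does by default):

$ rm -rf $TMPDIR/openmpi-sessions-*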


> On Sep 22, 2016, at 1:45 AM, Justin Chang  wrote:
> 
> Dear all,
> 
> So I upgraded/updated my Homebrew on my Macbook and installed Open MPI
> 2.0.1 using "brew install openmpi". However, when I open up a terminal
> and type "mpirun -n 1" I get the following messages:
> 
> ~ mpirun -n 1
> [Justins-MacBook-Pro-2.local:20793] [[13318,0],0] bind() failed on
> error Address already in use (48)
> [Justins-MacBook-Pro-2.local:20793] [[13318,0],0] ORTE_ERROR_LOG:
> Error in file oob_usock_component.c at line 228
> --
> No executable was specified on the mpirun command line.
> 
> Aborting.
> --
> 
> 
> I have never seen anything like the first two lines. I also installed
> python and mpi4py via pip, and when I still get the same messages:
> 
> ~ python -c "from mpi4py import MPI"
> [Justins-MacBook-Pro-2.local:20871] [[13496,0],0] bind() failed on
> error Address already in use (48)
> [Justins-MacBook-Pro-2.local:20871] [[13496,0],0] ORTE_ERROR_LOG:
> Error in file oob_usock_component.c at line 228
> 
> But now if I add "mpirun -n 1" I get the following:
> 
> ~ mpirun -n 1 python -c "from mpi4py import MPI"
> [Justins-MacBook-Pro-2.local:20935] [[13560,0],0] bind() failed on
> error Address already in use (48)
> [Justins-MacBook-Pro-2.local:20935] [[13560,0],0] ORTE_ERROR_LOG:
> Error in file oob_usock_component.c at line 228
> [Justins-MacBook-Pro-2.local:20936] [[13560,1],0]
> usock_peer_send_blocking: send() to socket 17 failed: Socket is not
> connected (57)
> [Justins-MacBook-Pro-2.local:20936] [[13560,1],0] ORTE_ERROR_LOG:
> Unreachable in file oob_usock_connection.c at line 315
> [Justins-MacBook-Pro-2.local:20936] [[13560,1],0]
> orte_usock_peer_try_connect: usock_peer_send_connect_ack to proc
> [[13560,0],0] failed: Unreachable (-12)
> [Justins-MacBook-Pro-2:20936] *** Process received signal ***
> [Justins-MacBook-Pro-2:20936] Signal: Segmentation fault: 11 (11)
> [Justins-MacBook-Pro-2:20936] Signal code:  (0)
> [Justins-MacBook-Pro-2:20936] Failing at address: 0x0
> ---
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> ---
> --
> mpirun detected that one or more processes exited with non-zero
> status, thus causing
> the job to be terminated. The first process to do so was:
> 
>  Process name: [[13560,1],0]
>  Exit code:1
> --
> 
> Clearly something is wrong here. I already tried things like "rm -rf
> $TMPDIR/openmpi-sessions-*" but said directory keeps reappearing and
> the error persists. Why does this happen and how do I fix it? For what
> it's worth, here's some other information that may help:
> 
> ~ mpicc --version
> Apple LLVM version 8.0.0 (clang-800.0.38)
> Target: x86_64-apple-darwin15.6.0
> Thread model: posix
> InstalledDir: 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
> 
> I tested Hello World with both mpicc and mpif90, and they still work
> despite showing those two error/warning messages.
> 
> Thanks,
> Justin
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Strange errors when running mpirun

2016-09-22 Thread r...@open-mpi.org
Try removing the “pmix” entries as well
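
i.e., something like:

$ rm -rf $TMPDIR/pmix-*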

> On Sep 22, 2016, at 2:19 AM, Justin Chang  wrote:
> 
> "mpirun -n 1" was just to demonstrate that I get those error messages.
> I ran a simple helloworld.c and it still gives those two messages.
> 
> I did delete openmpi-sessions-* from my $TMPDIR but it doesn't solve
> the problem. Here's my $TMPDIR:
> 
> ~ cd $TMPDIR
> ~ pwd
> /var/folders/jd/qh5zn6jn5kz_byz9gxz5kl2mgn/T
> ~ ls
> MediaCache
> TemporaryItems
> com.apple.AddressBook.ContactsAccountsService
> com.apple.AddressBook.InternetAccountsBridge
> com.apple.AirPlayUIAgent
> com.apple.BKAgentService
> com.apple.CalendarAgent
> com.apple.CalendarAgent.CalNCService
> com.apple.CloudPhotosConfiguration
> com.apple.DataDetectorsDynamicData
> com.apple.ICPPhotoStreamLibraryService
> com.apple.InputMethodKit.TextReplacementService
> com.apple.PhotoIngestService
> com.apple.Preview
> com.apple.Safari
> com.apple.SocialPushAgent
> com.apple.WeatherKitService
> com.apple.cloudphotosd
> com.apple.dt.XCDocumenter.XCDocumenterExtension
> com.apple.dt.XcodeBuiltInExtensions
> com.apple.geod
> com.apple.iCal.CalendarNC
> com.apple.lateragent
> com.apple.ncplugin.stocks
> com.apple.ncplugin.weather
> com.apple.notificationcenterui.WeatherSummary
> com.apple.photolibraryd
> com.apple.photomoments
> com.apple.quicklook.ui.helper
> com.apple.soagent
> com.getdropbox.dropbox.garcon
> icdd501
> ics21406
> openmpi-sessions-501@Justins-MacBook-Pro-2_0
> pmix-12195
> pmix-12271
> pmix-12289
> pmix-12295
> pmix-12304
> pmix-12313
> pmix-12367
> pmix-12397
> pmix-12775
> pmix-12858
> pmix-17118
> pmix-1754
> pmix-20632
> pmix-20793
> pmix-20849
> pmix-21019
> pmix-22316
> pmix-8129
> pmix-8494
> xcrun_db
> ~ rm -rf openmpi-sessions-501@Justins-MacBook-Pro-2_0
> ~ mpirun -n 1
> [Justins-MacBook-Pro-2.local:22527] [[12992,0],0] bind() failed on
> error Address already in use (48)
> [Justins-MacBook-Pro-2.local:22527] [[12992,0],0] ORTE_ERROR_LOG:
> Error in file oob_usock_component.c at line 228
> --
> No executable was specified on the mpirun command line.
> 
> Aborting.
> ----------
> 
> and when I type "ls" the directory
> "openmpi-sessions-501@Justins-MacBook-Pro-2_0" reappeared. Unless
> there's a different directory I need to look for?
> 
> On Thu, Sep 22, 2016 at 4:08 AM, r...@open-mpi.org  wrote:
>> Maybe I’m missing something, but “mpirun -n 1” doesn’t include the name of 
>> an application to execute.
>> 
>> The error message prior to that error indicates that you have some cruft 
>> sitting in your tmpdir. You just need to clean it out - look for something 
>> that starts with “openmpi”
>> 
>> 
>>> On Sep 22, 2016, at 1:45 AM, Justin Chang  wrote:
>>> 
>>> Dear all,
>>> 
>>> So I upgraded/updated my Homebrew on my Macbook and installed Open MPI
>>> 2.0.1 using "brew install openmpi". However, when I open up a terminal
>>> and type "mpirun -n 1" I get the following messages:
>>> 
>>> ~ mpirun -n 1
>>> [Justins-MacBook-Pro-2.local:20793] [[13318,0],0] bind() failed on
>>> error Address already in use (48)
>>> [Justins-MacBook-Pro-2.local:20793] [[13318,0],0] ORTE_ERROR_LOG:
>>> Error in file oob_usock_component.c at line 228
>>> --
>>> No executable was specified on the mpirun command line.
>>> 
>>> Aborting.
>>> --
>>> 
>>> 
>>> I have never seen anything like the first two lines. I also installed
>>> python and mpi4py via pip, and when I still get the same messages:
>>> 
>>> ~ python -c "from mpi4py import MPI"
>>> [Justins-MacBook-Pro-2.local:20871] [[13496,0],0] bind() failed on
>>> error Address already in use (48)
>>> [Justins-MacBook-Pro-2.local:20871] [[13496,0],0] ORTE_ERROR_LOG:
>>> Error in file oob_usock_component.c at line 228
>>> 
>>> But now if I add "mpirun -n 1" I get the following:
>>> 
>>> ~ mpirun -n 1 python -c "from mpi4py import MPI"
>>> [Justins-MacBook-Pro-2.local:20935] [[13560,0],0] bind() failed on
>>> error Address already in use (48)
>>> [Justins-MacBook-Pro-2.local:20935] [[13560,0],0] ORTE_ERROR_LOG:
>>> Error

Re: [OMPI users] Openmpi 1.10.x, mpirun and Slurm 15.08 problem

2016-09-23 Thread r...@open-mpi.org
This isn’t an issue with the SLURM integration - this is the problem of our OOB 
not correctly picking the right subnet for connecting back to mpirun. In this 
specific case, you probably want

-mca btl_tcp_if_include em4 -mca oob_tcp_if_include em4

since it is the em4 network that ties the compute nodes together, and the 
compute nodes to the frontend
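
So the full command would look something like this (keeping the btl selection you 
were already using - just a sketch):

$ mpirun --mca btl tcp,self --mca btl_tcp_if_include em4 --mca oob_tcp_if_include em4 ls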

We are working on the subnet selection logic, but the 1.10 series does not seem 
to have been updated with those changes

> On Sep 23, 2016, at 6:00 AM, Marcin Krotkiewski 
>  wrote:
> 
> Hi,
> 
> I have stumbled upon a similar issue, so I wonder those might be related. On 
> one of our systems I get the following error message, both when using openmpi 
> 1.8.8 and 1.10.4
> 
> $ mpirun -debug-daemons --mca btl tcp,self --mca mca_base_verbose 100 --mca 
> btl_base_verbose 100 ls
> 
> [...]
> [compute-1-1.local:07302] mca: base: close: unloading component direct
> [compute-1-1.local:07302] mca: base: close: unloading component radix
> [compute-1-1.local:07302] mca: base: close: unloading component debruijn
> [compute-1-1.local:07302] orte_routed_base_select: initializing selected 
> component binomial
> [compute-1-2.local:13744] [[63041,0],2]: parent 0 num_children 0
> Daemon [[63041,0],2] checking in as pid 13744 on host c1-2
> [compute-1-2.local:13744] [[63041,0],2] orted: up and running - waiting for 
> commands!
> [compute-1-2.local:13744] [[63041,0],2] tcp_peer_send_blocking: send() to 
> socket 9 failed: Broken pipe (32)
> [compute-1-2.local:13744] mca: base: close: unloading component binomial
> [compute-1-1.local:07302] [[63041,0],1]: parent 0 num_children 0
> Daemon [[63041,0],1] checking in as pid 7302 on host c1-1
> [compute-1-1.local:07302] [[63041,0],1] orted: up and running - waiting for 
> commands!
> [compute-1-1.local:07302] [[63041,0],1] tcp_peer_send_blocking: send() to 
> socket 9 failed: Broken pipe (32)
> [compute-1-1.local:07302] mca: base: close: unloading component binomial
> srun: error: c1-1: task 0: Exited with exit code 1
> srun: Terminating job step 4538.1
> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
> srun: error: c1-2: task 1: Exited with exit code 1
> 
> 
> I have also tested version 2.0.1 - this one works without problems.
> 
> In my case the problem appears on one system with slurm versions 15.08.8 and 
> 15.08.12. On another system running 15.08.8 all is working fine, so I guess 
> it is not about SLURM version, but maybe system / network configuration?
> 
> Following that thought I have also noticed this thread:
> 
> http://users.open-mpi.narkive.com/PwJpWXLm/ompi-users-tcp-peer-send-blocking-send-to-socket-9-failed-broken-pipe-32-on-openvz-containers
>  
> 
> As Jeff suggested there, I tried to run with --mca btl_tcp_if_include em1 
> --mca oob_tcp_if_include em1, but got the same error.
> 
> Could these problems be related to interface naming / lack of infiniband? Or 
> to the fact that the front-end node, from which I execute mpirun, has a 
> different network configuration? The system, on which things don't work, only 
> has TCP  network interfaces:
> 
> em1, lo (frontend has em1, em4 - local compute network, lo)
> 
> while the cluster, on which openmpi does work, uses infiniband, and had the 
> following tcp interfaces:
> 
> eth0, eth1, ib0, lo
> 
> I would appreciate any hints..
> 
> Thanks!
> 
> Marcin
> 
> 
> On 04/01/2016 04:16 PM, Jeff Squyres (jsquyres) wrote:
>> Ralph --
>> 
>> What's the state of PMI integration with SLURM in the v1.10.x series?  (I 
>> haven't kept up with SLURM's recent releases to know if something broke 
>> between existing Open MPI releases and their new releases...?)
>> 
>> 
>> 
>>> On Mar 31, 2016, at 4:24 AM, Tommi T  wrote:
>>> 
>>> Hi,
>>> 
>>> stack:
>>> el6.7, mlnx ofed 3.1 (IB FDR) and slurm 15.08.9 (whithout *.la libs).
>>> 
>>> problem:
>>> OpenMPI 1.10.x built with pmi support does not work when trying to use 
>>> sbatch/salloc - mpirun combination. srun ompi_mpi_app works fine.
>>> 
>>> Older 1.8.x version works fine under same salloc session.
>>> 
>>> ./configure --with-slurm --with-verbs --with-hwloc=internal --with-pmi 
>>> --with-cuda=/appl/opt/cuda/7.5/ --with-pic --enable-shared 
>>> --enable-mpi-thread-multiple --enable-contrib-no-build=vt
>>> 
>>> 
>>> I tried 1.10.3a from git also.
>>> 
>>> 
>>> mpirun  -debug-daemons ./1103aompitest
>>> Daemon [[44437,0],1] checking in as pid 40979 on host g59
>>> Daemon [[44437,0],2] checking in as pid 23566 on host g60
>>> [g59:40979] [[44437,0],1] orted: up and running - waiting for commands!
>>> [g60:23566] [[44437,0],2] orted: up and running - waiting for commands!
>>> [g59:40979] [[44437,0],1] tcp_peer_send_blocking: send() to socket 9 
>>> failed: Broken pipe (32)
>>> [g59:40979] [[44437,0],1]:errmgr_default_orted.c(260) updating exit status 
>>> to 1
>>> [g60:23566] [[44437,0],2] tcp_peer_send_blocking: send() to socket 9 
>>> failed: Broken pipe (32)
>>> [g60:23566] [[44437,0],2]:errmgr_default_ort

Re: [OMPI users] MPI_Comm_spawn

2016-09-29 Thread r...@open-mpi.org
Spawn definitely does not work with srun. I don’t recognize the name of the 
file that segfaulted - what is “ptl.c”? Is that in your manager program?


> On Sep 29, 2016, at 6:06 AM, Gilles Gouaillardet 
>  wrote:
> 
> Hi,
> 
> I do not expect spawn can work with direct launch (e.g. srun)
> 
> Do you have PSM (e.g. Infinipath) hardware ? That could be linked to the 
> failure
> 
> Can you please try
> 
> mpirun --mca pml ob1 --mca btl tcp,sm,self -np 1 --hostfile my_hosts 
> ./manager 1
> 
> and see if it help ?
> 
> Note if you have the possibility, I suggest you first try that without slurm, 
> and then within a slurm job
> 
> Cheers,
> 
> Gilles
> 
> On Thursday, September 29, 2016, juraj2...@gmail.com 
>   > wrote:
> Hello,
> 
> I am using MPI_Comm_spawn to dynamically create new processes from single 
> manager process. Everything works fine when all the processes are running on 
> the same node. But imposing restriction to run only a single process per node 
> does not work. Below are the errors produced during multinode interactive 
> session and multinode sbatch job.
> 
> The system I am using is: Linux version 3.10.0-229.el7.x86_64 
> (buil...@kbuilder.dev.centos.org 
> ) (gcc 
> version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) )
> I am using Open MPI 2.0.1
> Slurm is version 15.08.9
> 
> What is preventing my jobs to spawn on multiple nodes? Does slurm requires 
> some additional configuration to allow it? Is it issue on the MPI side, does 
> it need to be compiled with some special flag (I have compiled it with 
> --enable-mpi-fortran=all --with-pmi)? 
> 
> The code I am launching is here: https://github.com/goghino/dynamicMPI 
> 
> 
> Manager tries to launch one new process (./manager 1), the error produced by 
> requesting each process to be located on different node (interactive session):
> $ salloc -N 2
> $ cat my_hosts
> icsnode37
> icsnode38
> $ mpirun -np 1 -npernode 1 --hostfile my_hosts ./manager 1
> [manager]I'm running MPI 3.1
> [manager]Runing on node icsnode37
> icsnode37.12614Assertion failure at ptl.c:183: epaddr == ((void *)0)
> icsnode38.32443Assertion failure at ptl.c:183: epaddr == ((void *)0)
> [icsnode37:12614] *** Process received signal ***
> [icsnode37:12614] Signal: Aborted (6)
> [icsnode37:12614] Signal code:  (-6)
> [icsnode38:32443] *** Process received signal ***
> [icsnode38:32443] Signal: Aborted (6)
> [icsnode38:32443] Signal code:  (-6)
> 
> The same example as above via sbatch job submission:
> $ cat job.sbatch
> #!/bin/bash
> 
> #SBATCH --nodes=2
> #SBATCH --ntasks-per-node=1
> 
> module load openmpi/2.0.1
> srun -n 1 -N 1 ./manager 1
> 
> $ cat output.o
> [manager]I'm running MPI 3.1
> [manager]Runing on node icsnode39
> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
> [icsnode39:9692] *** An error occurred in MPI_Comm_spawn
> [icsnode39:9692] *** reported by process [1007812608,0]
> [icsnode39:9692] *** on communicator MPI_COMM_SELF
> [icsnode39:9692] *** MPI_ERR_SPAWN: could not spawn processes
> [icsnode39:9692] *** MPI_ERRORS_ARE_FATAL (processes in this communicator 
> will now abort,
> [icsnode39:9692] ***and potentially your MPI job)
> In: PMI_Abort(50, N/A)
> slurmstepd: *** STEP 15378.0 ON icsnode39 CANCELLED AT 2016-09-26T16:48:20 ***
> srun: error: icsnode39: task 0: Exited with exit code 50
> 
> Thank for any feedback!
> 
> Best regards,
> Juraj
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] MPI_Comm_spawn

2016-09-29 Thread r...@open-mpi.org
Ah, that may be why it wouldn’t show up in the OMPI code base itself. If that 
is the case here, then no - OMPI v2.0.1 does not support comm_spawn for PSM. It 
is fixed in the upcoming 2.0.2
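In the meantime, the workaround Gilles suggested below should sidestep the PSM 
code entirely:

mpirun --mca pml ob1 --mca btl tcp,sm,self -np 1 --hostfile my_hosts ./manager 1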

> On Sep 29, 2016, at 6:58 AM, Gilles Gouaillardet 
>  wrote:
> 
> Ralph,
> 
> My guess is that ptl.c comes from PSM lib ...
> 
> Cheers,
> 
> Gilles
> 
> On Thursday, September 29, 2016, r...@open-mpi.org <mailto:r...@open-mpi.org> 
> mailto:r...@open-mpi.org>> wrote:
> Spawn definitely does not work with srun. I don’t recognize the name of the 
> file that segfaulted - what is “ptl.c”? Is that in your manager program?
> 
> 
>> On Sep 29, 2016, at 6:06 AM, Gilles Gouaillardet 
>> > > wrote:
>> 
>> Hi,
>> 
>> I do not expect spawn can work with direct launch (e.g. srun)
>> 
>> Do you have PSM (e.g. Infinipath) hardware ? That could be linked to the 
>> failure
>> 
>> Can you please try
>> 
>> mpirun --mca pml ob1 --mca btl tcp,sm,self -np 1 --hostfile my_hosts 
>> ./manager 1
>> 
>> and see if it help ?
>> 
>> Note if you have the possibility, I suggest you first try that without 
>> slurm, and then within a slurm job
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> On Thursday, September 29, 2016, juraj2...@gmail.com 
>>  > > wrote:
>> Hello,
>> 
>> I am using MPI_Comm_spawn to dynamically create new processes from single 
>> manager process. Everything works fine when all the processes are running on 
>> the same node. But imposing restriction to run only a single process per 
>> node does not work. Below are the errors produced during multinode 
>> interactive session and multinode sbatch job.
>> 
>> The system I am using is: Linux version 3.10.0-229.el7.x86_64 
>> (buil...@kbuilder.dev.centos.org <>) (gcc version 4.8.2 20140120 (Red Hat 
>> 4.8.2-16) (GCC) )
>> I am using Open MPI 2.0.1
>> Slurm is version 15.08.9
>> 
>> What is preventing my jobs to spawn on multiple nodes? Does slurm requires 
>> some additional configuration to allow it? Is it issue on the MPI side, does 
>> it need to be compiled with some special flag (I have compiled it with 
>> --enable-mpi-fortran=all --with-pmi)? 
>> 
>> The code I am launching is here: https://github.com/goghino/dynamicMPI 
>> <https://github.com/goghino/dynamicMPI>
>> 
>> Manager tries to launch one new process (./manager 1), the error produced by 
>> requesting each process to be located on different node (interactive 
>> session):
>> $ salloc -N 2
>> $ cat my_hosts
>> icsnode37
>> icsnode38
>> $ mpirun -np 1 -npernode 1 --hostfile my_hosts ./manager 1
>> [manager]I'm running MPI 3.1
>> [manager]Runing on node icsnode37
>> icsnode37.12614Assertion failure at ptl.c:183: epaddr == ((void *)0)
>> icsnode38.32443Assertion failure at ptl.c:183: epaddr == ((void *)0)
>> [icsnode37:12614] *** Process received signal ***
>> [icsnode37:12614] Signal: Aborted (6)
>> [icsnode37:12614] Signal code:  (-6)
>> [icsnode38:32443] *** Process received signal ***
>> [icsnode38:32443] Signal: Aborted (6)
>> [icsnode38:32443] Signal code:  (-6)
>> 
>> The same example as above via sbatch job submission:
>> $ cat job.sbatch
>> #!/bin/bash
>> 
>> #SBATCH --nodes=2
>> #SBATCH --ntasks-per-node=1
>> 
>> module load openmpi/2.0.1
>> srun -n 1 -N 1 ./manager 1
>> 
>> $ cat output.o
>> [manager]I'm running MPI 3.1
>> [manager]Runing on node icsnode39
>> srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
>> [icsnode39:9692] *** An error occurred in MPI_Comm_spawn
>> [icsnode39:9692] *** reported by process [1007812608,0]
>> [icsnode39:9692] *** on communicator MPI_COMM_SELF
>> [icsnode39:9692] *** MPI_ERR_SPAWN: could not spawn processes
>> [icsnode39:9692] *** MPI_ERRORS_ARE_FATAL (processes in this communicator 
>> will now abort,
>> [icsnode39:9692] ***and potentially your MPI job)
>> In: PMI_Abort(50, N/A)
>> slurmstepd: *** STEP 15378.0 ON icsnode39 CANCELLED AT 2016-09-26T16:48:20 
>> ***
>> srun: error: icsnode39: task 0: Exited with exit code 50
>> 
>> Thank for any feedback!
>> 
>> Best regards,
>> Juraj
>> ___
>> users mailing list
>> users@lists.open-mpi.org 
>> 
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users 
>> <https://rfd.newmexicoconsortium.org/mailman/listinfo/users>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Launching hybrid MPI/OpenMP jobs on a cluster: correct OpenMPI flags?

2016-10-03 Thread r...@open-mpi.org
FWIW: the socket option seems to work fine for me:

$ mpirun -n 12 -map-by socket:pe=2 -host rhc001 --report-bindings hostname
[rhc001:200408] MCW rank 1 bound to socket 1[core 12[hwt 0-1]], socket 1[core 
13[hwt 0-1]]: 
[../../../../../../../../../../../..][BB/BB/../../../../../../../../../..]
[rhc001:200408] MCW rank 2 bound to socket 0[core 2[hwt 0-1]], socket 0[core 
3[hwt 0-1]]: 
[../../BB/BB/../../../../../../../..][../../../../../../../../../../../..]
[rhc001:200408] MCW rank 3 bound to socket 1[core 14[hwt 0-1]], socket 1[core 
15[hwt 0-1]]: 
[../../../../../../../../../../../..][../../BB/BB/../../../../../../../..]
[rhc001:200408] MCW rank 4 bound to socket 0[core 4[hwt 0-1]], socket 0[core 
5[hwt 0-1]]: 
[../../../../BB/BB/../../../../../..][../../../../../../../../../../../..]
[rhc001:200408] MCW rank 5 bound to socket 1[core 16[hwt 0-1]], socket 1[core 
17[hwt 0-1]]: 
[../../../../../../../../../../../..][../../../../BB/BB/../../../../../..]
[rhc001:200408] MCW rank 6 bound to socket 0[core 6[hwt 0-1]], socket 0[core 
7[hwt 0-1]]: 
[../../../../../../BB/BB/../../../..][../../../../../../../../../../../..]
[rhc001:200408] MCW rank 7 bound to socket 1[core 18[hwt 0-1]], socket 1[core 
19[hwt 0-1]]: 
[../../../../../../../../../../../..][../../../../../../BB/BB/../../../..]
[rhc001:200408] MCW rank 8 bound to socket 0[core 8[hwt 0-1]], socket 0[core 
9[hwt 0-1]]: 
[../../../../../../../../BB/BB/../..][../../../../../../../../../../../..]
[rhc001:200408] MCW rank 9 bound to socket 1[core 20[hwt 0-1]], socket 1[core 
21[hwt 0-1]]: 
[../../../../../../../../../../../..][../../../../../../../../BB/BB/../..]
[rhc001:200408] MCW rank 10 bound to socket 0[core 10[hwt 0-1]], socket 0[core 
11[hwt 0-1]]: 
[../../../../../../../../../../BB/BB][../../../../../../../../../../../..]
[rhc001:200408] MCW rank 11 bound to socket 1[core 22[hwt 0-1]], socket 1[core 
23[hwt 0-1]]: 
[../../../../../../../../../../../..][../../../../../../../../../../BB/BB]
[rhc001:200408] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 
1[hwt 0-1]]: 
[BB/BB/../../../../../../../../../..][../../../../../../../../../../../..]
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
rhc001
$

I know that isn’t the pattern you are seeking - I will have to ponder that one a 
bit. Is it possible that the node where mpirun is running does not have the same 
topology as your compute nodes?
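One quick way to check - assuming hwloc’s lstopo is installed on both ends, and 
with <compute-node> standing in for one of your actual hosts - is to compare the 
reported topologies:

lstopo --of console
ssh <compute-node> lstopo --of console

If the socket/core layout differs between the two, that would explain the 
mapping you are seeing.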


> On Oct 3, 2016, at 2:22 PM, Wirawan Purwanto  wrote:
> 
> Hi,
> 
> I have been trying to understand how to correctly launch hybrid
> MPI/OpenMP (i.e. multi-threaded MPI jobs) with mpirun. I am quite
> puzzled as to what is the correct command-line options to use. The
> description on mpirun man page is very confusing and I could not get
> what I wanted.
> 
> A background: The cluster is using SGE, and I am using OpenMPI 1.10.2
> compiled with & for gcc 4.9.3. The MPI library was configured with SGE
> support. The compute nodes have 32 cores, which are basically 2
> sockets of Xeon E5-2698 v3 (16-core Haswell).
> 
> A colleague told me the following:
> 
> $ export OMP_NUM_THREADS=2
> $ mpirun -np 16 -map-by node:PE=2 ./EXECUTABLE
> 
> I could see the executable using 200% of CPU per process--that's good.
> There is one catch in the general case. "-map-by node" will assign the
> MPI processes in a round-robin fashion (so MPI rank 0 gets node 0, mpi
> rank 1 gets node 1, and so on until all nodes are given 1 process,
> then it will go back to node 0,1, ...).
> 
> Instead of the scenario above, I was trying to get the MPI processes
> side-by-side (more like "fill_up" policy in SGE scheduler), i.e. fill
> node 0 first, then fill node 1, and so on. How do I do this properly?
> 
> I tried a few attempts that fail:
> 
> $ export OMP_NUM_THREADS=2
> $ mpirun -np 16 -map-by core:PE=2 ./EXECUTABLE
> 
> or
> 
> $ export OMP_NUM_THREADS=2
> $ mpirun -np 16 -map-by socket:PE=2 ./EXECUTABLE
> 
> Both failed with an error mesage:
> 
> --
> A request for multiple cpus-per-proc was given, but a directive
> was also give to map to an object level that cannot support that
> directive.
> 
> Please specify a mapping level that has more than one cpu, or
> else let us define a default mapping that will allow multiple
> cpus-per-proc.
> --
> 
> Another attempt was:
> 
> $ export OMP_NUM_THREADS=2
> $ mpirun -np 16 -map-by socket:PE=2 -bind-to socket ./EXECUTABLE
> 
> Here's the error message:
> 
> --
> A request for multiple cpus-per-proc was given, but a conflicting binding
> policy was specified:
> 
>  #cpus-per-proc:  2
>  type of cpus:cores as cpus
>  binding policy given: SOCKET
> 
> The correct binding policy for the given type of cpu is:
> 
>  correct binding policy:  bind-to core
> 
> This is the binding policy we would apply by default for thi

Re: [OMPI users] how to tell if pmi or pmi2 is being used?

2016-10-13 Thread r...@open-mpi.org
If you are using mpirun, then neither PMI1 nor PMI2 is involved at all. ORTE 
has its own internal mechanism for handling the wireup.
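If you direct-launch with srun instead of mpirun, then the PMI side does come 
into play. A couple of quick checks - the exact output depends on how Slurm and 
OMPI were built:

srun --mpi=list          (shows which PMI plugins your Slurm offers)
ompi_info | grep -i pmi  (shows whether PMI support was compiled into OMPI)

Running “srun --mpi=pmi2 ...” then forces the PMI2 path, assuming Slurm lists it.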


> On Oct 13, 2016, at 10:43 AM, David Shrader  wrote:
> 
> Hello All,
> 
> I'm using Open MPI 1.10.3 with Slurm and would like to ask how do I find out 
> if pmi1 or pmi2 was used for process launching? The Slurm installation is 
> supposed to support both pmi1 and pmi2, but I would really like to know which 
> one I fall in to. I tried using '-mca plm_base_verbose 100' on the mpirun 
> line, but it didn't mention pmi specifically. Instead, all I could really 
> find was that it was using the slurm component. Is there something else I can 
> look at in the output that would have that detail?
> 
> Thank you for your time,
> David
> 
> -- 
> David Shrader
> HPC-ENV High Performance Computer Systems
> Los Alamos National Lab
> Email: dshrader  lanl.gov
> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


[OMPI users] Supercomputing 2016: Birds-of-a-Feather meetings

2016-10-24 Thread r...@open-mpi.org
Hello all

This year, we will again be hosting Birds-of-a-Feather meetings for Open MPI 
and PMIx. 

Open MPI: Wed, Nov 16th, 5:15-7pm

http://sc16.supercomputing.org/presentation/?id=bof103&sess=sess322 



PMIx: Wed, Nov 16th, 12:15-1:15pm:

http://sc16.supercomputing.org/presentation/?id=bof104&sess=sess323 


Please plan to attend!

Ralph

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Slurm binding not propagated to MPI jobs

2016-10-27 Thread r...@open-mpi.org
Hey Andy

Is there a SLURM envar that would tell us the binding option from the srun cmd 
line? We automatically bind when direct-launched, due to user complaints of poor 
performance if we don’t. If the user specifies a binding option, then we detect 
that we were already bound and don’t do it again.

However, if the user explicitly asks not to be bound, we currently can’t tell 
that apart from the user simply not specifying anything. If we can see 
something that tells us “they explicitly said not to do it”, then we can avoid 
the situation.

Ralph

> On Oct 27, 2016, at 8:48 AM, Andy Riebs  wrote:
> 
> Hi All,
> 
> We are running Open MPI version 1.10.2, built with support for Slurm version 
> 16.05.0. When a user specifies "--cpu_bind=none", MPI tries to bind by core, 
> which segv's if there are more processes than cores.
> 
> The user reports:
> 
> What I found is that
> 
> % srun --ntasks-per-node=8 --cpu_bind=none  \
> env SHMEM_SYMMETRIC_HEAP_SIZE=1024M bin/all2all.shmem.exe 0
> 
> will have the problem, but:
> 
> % srun --ntasks-per-node=8 --cpu_bind=none  \
> env SHMEM_SYMMETRIC_HEAP_SIZE=1024M ./bindit.sh bin/all2all.shmem.exe 0
> 
> Will run as expected and print out the usage message because I didn’t provide 
> the right arguments to the code.
> 
> So, it appears that the binding has something to do with the issue. My 
> binding script is as follows:
> 
> % cat bindit.sh
> #!/bin/bash
> 
> #echo SLURM_LOCALID=$SLURM_LOCALID
> 
> stride=1
> 
> if [ ! -z "$SLURM_LOCALID" ]; then
>   let bindCPU=$SLURM_LOCALID*$stride
>   exec numactl --membind=0 --physcpubind=$bindCPU $*
> fi
> 
> $*
> 
> %
> 
> 
> -- 
> Andy Riebs
> andy.ri...@hpe.com
> Hewlett-Packard Enterprise
> High Performance Computing Software Engineering
> +1 404 648 9024
> My opinions are not necessarily those of HPE
>May the source be with you!
> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Slurm binding not propagated to MPI jobs

2016-10-27 Thread r...@open-mpi.org
And if there is no --cpu_bind on the cmd line? Do these not exist?

> On Oct 27, 2016, at 10:14 AM, Andy Riebs  wrote:
> 
> Hi Ralph,
> 
> I think I've found the magic keys...
> 
> $ srun --ntasks-per-node=2 -N1 --cpu_bind=none env | grep BIND
> SLURM_CPU_BIND_VERBOSE=quiet
> SLURM_CPU_BIND_TYPE=none
> SLURM_CPU_BIND_LIST=
> SLURM_CPU_BIND=quiet,none
> SLURM_CPU_BIND_VERBOSE=quiet
> SLURM_CPU_BIND_TYPE=none
> SLURM_CPU_BIND_LIST=
> SLURM_CPU_BIND=quiet,none
> $ srun --ntasks-per-node=2 -N1 --cpu_bind=core env | grep BIND
> SLURM_CPU_BIND_VERBOSE=quiet
> SLURM_CPU_BIND_TYPE=mask_cpu:
> SLURM_CPU_BIND_LIST=0x,0x
> SLURM_CPU_BIND=quiet,mask_cpu:0x,0x
> SLURM_CPU_BIND_VERBOSE=quiet
> SLURM_CPU_BIND_TYPE=mask_cpu:
> SLURM_CPU_BIND_LIST=0x,0x
> SLURM_CPU_BIND=quiet,mask_cpu:0x,0x
> 
> Andy
> 
> On 10/27/2016 11:57 AM, r...@open-mpi.org <mailto:r...@open-mpi.org> wrote:
>> Hey Andy
>> 
>> Is there a SLURM envar that would tell us the binding option from the srun 
>> cmd line? We automatically bind when direct launched due to user complaints 
>> of poor performance if we don’t. If the user specifies a binding option, 
>> then we detect that we were already bound and don’t do it.
>> 
>> However, if the user specifies that they not be bound, then we think they 
>> simply didn’t specify anything - and that isn’t the case. If we can see 
>> something that tells us “they explicitly said not to do it”, then we can 
>> avoid the situation.
>> 
>> Ralph
>> 
>>> On Oct 27, 2016, at 8:48 AM, Andy Riebs >> <mailto:andy.ri...@hpe.com>> wrote:
>>> 
>>> Hi All,
>>> 
>>> We are running Open MPI version 1.10.2, built with support for Slurm 
>>> version 16.05.0. When a user specifies "--cpu_bind=none", MPI tries to bind 
>>> by core, which segv's if there are more processes than cores.
>>> 
>>> The user reports:
>>> 
>>> What I found is that
>>> 
>>> % srun --ntasks-per-node=8 --cpu_bind=none  \
>>> env SHMEM_SYMMETRIC_HEAP_SIZE=1024M bin/all2all.shmem.exe 0
>>> 
>>> will have the problem, but:
>>> 
>>> % srun --ntasks-per-node=8 --cpu_bind=none  \
>>> env SHMEM_SYMMETRIC_HEAP_SIZE=1024M ./bindit.sh bin/all2all.shmem.exe 0
>>> 
>>> Will run as expected and print out the usage message because I didn’t 
>>> provide the right arguments to the code.
>>> 
>>> So, it appears that the binding has something to do with the issue. My 
>>> binding script is as follows:
>>> 
>>> % cat bindit.sh
>>> #!/bin/bash
>>> 
>>> #echo SLURM_LOCALID=$SLURM_LOCALID
>>> 
>>> stride=1
>>> 
>>> if [ ! -z "$SLURM_LOCALID" ]; then
>>>   let bindCPU=$SLURM_LOCALID*$stride
>>>   exec numactl --membind=0 --physcpubind=$bindCPU $*
>>> fi
>>> 
>>> $*
>>> 
>>> %
>>> 
>>> 
>>> -- 
>>> Andy Riebs
>>> andy.ri...@hpe.com
>>> Hewlett-Packard Enterprise
>>> High Performance Computing Software Engineering
>>> +1 404 648 9024
>>> My opinions are not necessarily those of HPE
>>>May the source be with you!
>>> 
>>> ___
>>> users mailing list
>>> users@lists.open-mpi.org
>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> 
> ___
> users mailing list
> users@lists.open-mpi.org <mailto:users@lists.open-mpi.org>
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users 
> <https://rfd.newmexicoconsortium.org/mailman/listinfo/users>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Slurm binding not propagated to MPI jobs

2016-10-27 Thread r...@open-mpi.org
Sigh - of course it wouldn’t be simple :-(

All right, let’s suppose we look for SLURM_CPU_BIND:

* if it includes the word “none”, then we know the user specified that they 
don’t want us to bind

* if it includes the word mask_cpu, then we have to check the value of that 
option.

* If it is all F’s, then they didn’t specify a binding and we should do our 
thing.

* If it is anything else, then we assume they _did_ specify a binding, and we 
leave it alone
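In shell terms, those checks would look roughly like the sketch below - this is 
just an illustration of the logic (the real test would live in ORTE’s C code, 
and it only covers the strings listed above):

case "$SLURM_CPU_BIND" in
  *none*)
    :   # user explicitly said “do not bind” - leave the procs alone
    ;;
  *mask_cpu:*)
    masks=${SLURM_CPU_BIND#*mask_cpu:}
    all_f=yes
    for m in $(printf '%s' "$masks" | tr ',' ' '); do
      digits=${m#0x}
      case "$digits" in
        *[!fF]*) all_f=no ;;   # anything other than F means a real mask
      esac
    done
    if [ "$all_f" = yes ]; then
      :   # all F’s - no explicit binding, apply our default
    else
      :   # user supplied a real mask - respect it
    fi
    ;;
  *)
    :   # nothing recognizable - apply our default binding
    ;;
esac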

Would that make sense? Is there anything else that could be in that envar which 
would trip us up?


> On Oct 27, 2016, at 10:37 AM, Andy Riebs  wrote:
> 
> Yes, they still exist:
> 
> $ srun --ntasks-per-node=2 -N1 env | grep BIND | sort -u
> SLURM_CPU_BIND_LIST=0x
> SLURM_CPU_BIND=quiet,mask_cpu:0x
> SLURM_CPU_BIND_TYPE=mask_cpu:
> SLURM_CPU_BIND_VERBOSE=quiet
> Here are the relevant Slurm configuration options that could conceivably 
> change the behavior from system to system:
> SelectType  = select/cons_res
> SelectTypeParameters= CR_CPU
> 
> 
> On 10/27/2016 01:17 PM, r...@open-mpi.org <mailto:r...@open-mpi.org> wrote:
>> And if there is no --cpu_bind on the cmd line? Do these not exist?
>> 
>>> On Oct 27, 2016, at 10:14 AM, Andy Riebs >> <mailto:andy.ri...@hpe.com>> wrote:
>>> 
>>> Hi Ralph,
>>> 
>>> I think I've found the magic keys...
>>> 
>>> $ srun --ntasks-per-node=2 -N1 --cpu_bind=none env | grep BIND
>>> SLURM_CPU_BIND_VERBOSE=quiet
>>> SLURM_CPU_BIND_TYPE=none
>>> SLURM_CPU_BIND_LIST=
>>> SLURM_CPU_BIND=quiet,none
>>> SLURM_CPU_BIND_VERBOSE=quiet
>>> SLURM_CPU_BIND_TYPE=none
>>> SLURM_CPU_BIND_LIST=
>>> SLURM_CPU_BIND=quiet,none
>>> $ srun --ntasks-per-node=2 -N1 --cpu_bind=core env | grep BIND
>>> SLURM_CPU_BIND_VERBOSE=quiet
>>> SLURM_CPU_BIND_TYPE=mask_cpu:
>>> SLURM_CPU_BIND_LIST=0x1111,0x2222
>>> SLURM_CPU_BIND=quiet,mask_cpu:0x,0x
>>> SLURM_CPU_BIND_VERBOSE=quiet
>>> SLURM_CPU_BIND_TYPE=mask_cpu:
>>> SLURM_CPU_BIND_LIST=0x,0x
>>> SLURM_CPU_BIND=quiet,mask_cpu:0x,0x
>>> 
>>> Andy
>>> 
>>> On 10/27/2016 11:57 AM, r...@open-mpi.org <mailto:r...@open-mpi.org> wrote:
>>>> Hey Andy
>>>> 
>>>> Is there a SLURM envar that would tell us the binding option from the srun 
>>>> cmd line? We automatically bind when direct launched due to user 
>>>> complaints of poor performance if we don’t. If the user specifies a 
>>>> binding option, then we detect that we were already bound and don’t 
>>>> do it.
>>>> 
>>>> However, if the user specifies that they not be bound, then we think they 
>>>> simply didn’t specify anything - and that isn’t the case. If 
>>>> we can see something that tells us “they explicitly said not to do 
>>>> it”, then we can avoid the situation.
>>>> 
>>>> Ralph
>>>> 
>>>>> On Oct 27, 2016, at 8:48 AM, Andy Riebs >>>> <mailto:andy.ri...@hpe.com>> wrote:
>>>>> 
>>>>> Hi All,
>>>>> 
>>>>> We are running Open MPI version 1.10.2, built with support for Slurm 
>>>>> version 16.05.0. When a user specifies "--cpu_bind=none", MPI tries to 
>>>>> bind by core, which segv's if there are more processes than cores.
>>>>> 
>>>>> The user reports:
>>>>> 
>>>>> What I found is that
>>>>> 
>>>>> % srun --ntasks-per-node=8 --cpu_bind=none  \
>>>>> env SHMEM_SYMMETRIC_HEAP_SIZE=1024M bin/all2all.shmem.exe 0
>>>>> 
>>>>> will have the problem, but:
>>>>> 
>>>>> % srun --ntasks-per-node=8 --cpu_bind=none  \
>>>>> env SHMEM_SYMMETRIC_HEAP_SIZE=1024M ./bindit.sh bin/all2all.shmem.exe >>>>> 0
>>>>> 
>>>>> Will run as expected and print out the usage message because I 
>>>>> didn’t provide the right arguments to the code.
>>>>> 
>>>>> So, it appears that the binding has something to do with the issue. My 
>>>>> binding script is as follows:
>>>>> 
>>>>> % cat bindit.sh
>>>>> #!/bin/bash
>>>>> 
>>>>> #echo SLURM_LOCALID=$SLURM_LOCALID
>>>>> 
>>>>> stride=1
>>>>> 
>>>>> if [ ! -z "$SLURM_LOCALID" ]; then

Re: [OMPI users] MCA compilation later

2016-10-28 Thread r...@open-mpi.org
You don’t need any of the hardware - you just need the headers. Things like 
libfabric and libibverbs are all publicly available, and so you can build all 
that support even if you cannot run it on your machine.

Once your customer installs the binary, the various plugins will check for 
their required library and hardware and disqualify themselves if it isn’t found.
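As a quick sanity check after the install, ompi_info run on the target machine 
lists every component it is able to load, so something like

ompi_info | grep btl

is an easy way to confirm that the extra plugins are being picked up (the exact 
component list will of course vary with your build and hardware).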

> On Oct 28, 2016, at 12:33 PM, Sean Ahern  wrote:
> 
> There's been discussion on the OpenMPI list recently about static linking of 
> OpenMPI with all of the desired MCAs in it. I've got the opposite question. 
> I'd like to add MCAs later on to an already-compiled version of OpenMPI and 
> am not quite sure how to do it.
> 
> Let me summarize. We've got a commercial code that we deploy on customer 
> machines in binary form. We're working to integrate OpenMPI into the 
> installer, and things seem to be progressing well. (Note: because we're a 
> commercial code, making the customer compile something doesn't work for us 
> like it can for open source or research codes.)
> 
> Now, we want to take advantage of OpenMPI's ability to find MCAs at runtime, 
> pointing to the various plugins that might apply to a deployed system. I've 
> configured and compiled OpenMPI on one of our build machines, one that 
> doesn't have any special interconnect hardware or software installed. We take 
> this compiled version of OpenMPI and use it on all of our machines. (Yes, 
> I've read Building FAQ #39 
>  about 
> relocating OpenMPI. Useful, that.) I'd like to take our pre-compiled version 
> of OpenMPI and add MCA libraries to it, giving OpenMPI the ability to 
> communicate via transport mechanisms that weren't available on the original 
> build machine. Things like InfiniBand, OmniPath, or one of Cray's 
> interconnects.
> 
> How would I go about doing this? And what are the limitations?
> 
> I'm guessing that I need to go configure and compile the same version of 
> OpenMPI on a machine that has the desired interconnect installation (headers 
> and libraries), then go grab the corresponding lib/openmpi/mca_*{la,so} 
> files. Take those files and drop them in our pre-built OpenMPI from our build 
> machine in the same relative plugin location (lib/openmpi). If I stick with 
> the same compiler (gcc, in this case), I'm hoping that symbols will all 
> resolve themselves at runtime. (I probably will have to do some 
> LD_LIBRARY_PATH games to be sure to find the appropriate underlying libraries 
> unless OpenMPI's process for building MCAs links them in statically somehow.)
> 
> Am I even on the right track here? (The various system-level FAQs (here 
> , here 
> , and especially here 
> ) seem to suggest that I am.)
> 
> Our first test platform will be getting OpenMPI via IB working on our 
> cluster, where we have IB (and TCP/IP) functional and not OpenMPI. This will 
> be a great stand-in for a customer that has an IB cluster and wants to just 
> run our binary installation.
> 
> Thanks.
> 
> -Sean
> 
> --
> Sean Ahern
> Computational Engineering International
> 919-363-0883
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Launching hybrid MPI/OpenMP jobs on a cluster: correct OpenMPI flags?

2016-10-28 Thread r...@open-mpi.org
FWIW: I’ll be presenting “Mapping, Ranking, and Binding - Oh My!” at the OMPI 
BoF meeting at SC’16, for those who can attend


> On Oct 11, 2016, at 8:16 AM, Dave Love  wrote:
> 
> Wirawan Purwanto  writes:
> 
>> Instead of the scenario above, I was trying to get the MPI processes
>> side-by-side (more like "fill_up" policy in SGE scheduler), i.e. fill
>> node 0 first, then fill node 1, and so on. How do I do this properly?
>> 
>> I tried a few attempts that fail:
>> 
>> $ export OMP_NUM_THREADS=2
>> $ mpirun -np 16 -map-by core:PE=2 ./EXECUTABLE
> 
> ...
> 
>> Clearly I am not understanding how this map-by works. Could somebody
>> help me? There was a wiki article partially written:
>> 
>> https://github.com/open-mpi/ompi/wiki/ProcessPlacement
>> 
>> but unfortunately it is also not clear to me.
> 
> Me neither; this stuff has traditionally been quite unclear and really
> needs documenting/explaining properly.
> 
> This sort of thing from my local instructions for OMPI 1.8 probably does
> what you want for OMP_NUM_THREADS=2 (where the qrsh options just get me
> a couple of small nodes):
> 
>  $ qrsh -pe mpi 24 -l num_proc=12 \
> mpirun -n 12 --map-by slot:PE=2 --bind-to core --report-bindings true |&
> sort -k 4 -n
>  [comp544:03093] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 
> 1[hwt 0]]: [B/B/./././.][./././././.]
>  [comp544:03093] MCW rank 1 bound to socket 0[core 2[hwt 0]], socket 0[core 
> 3[hwt 0]]: [././B/B/./.][./././././.]
>  [comp544:03093] MCW rank 2 bound to socket 0[core 4[hwt 0]], socket 0[core 
> 5[hwt 0]]: [././././B/B][./././././.]
>  [comp544:03093] MCW rank 3 bound to socket 1[core 6[hwt 0]], socket 1[core 
> 7[hwt 0]]: [./././././.][B/B/./././.]
>  [comp544:03093] MCW rank 4 bound to socket 1[core 8[hwt 0]], socket 1[core 
> 9[hwt 0]]: [./././././.][././B/B/./.]
>  [comp544:03093] MCW rank 5 bound to socket 1[core 10[hwt 0]], socket 1[core 
> 11[hwt 0]]: [./././././.][././././B/B]
>  [comp527:03056] MCW rank 6 bound to socket 0[core 0[hwt 0]], socket 0[core 
> 1[hwt 0]]: [B/B/./././.][./././././.]
>  [comp527:03056] MCW rank 7 bound to socket 0[core 2[hwt 0]], socket 0[core 
> 3[hwt 0]]: [././B/B/./.][./././././.]
>  [comp527:03056] MCW rank 8 bound to socket 0[core 4[hwt 0]], socket 0[core 
> 5[hwt 0]]: [././././B/B][./././././.]
>  [comp527:03056] MCW rank 9 bound to socket 1[core 6[hwt 0]], socket 1[core 
> 7[hwt 0]]: [./././././.][B/B/./././.]
>  [comp527:03056] MCW rank 10 bound to socket 1[core 8[hwt 0]], socket 1[core 
> 9[hwt 0]]: [./././././.][././B/B/./.]
>  [comp527:03056] MCW rank 11 bound to socket 1[core 10[hwt 0]], socket 1[core 
> 11[hwt 0]]: [./././././.][././././B/B]
> 
> I don't remember how I found that out.
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] what was the rationale behind rank mapping by socket?

2016-10-28 Thread r...@open-mpi.org
FWIW: I’ll be presenting “Mapping, Ranking, and Binding - Oh My!” at the OMPI 
BoF meeting at SC’16, for those who can attend. Will try to explain the 
rationale as well as the mechanics of the options

> On Oct 11, 2016, at 8:09 AM, Dave Love  wrote:
> 
> Gilles Gouaillardet mailto:gil...@rist.or.jp>> writes:
> 
>> Bennet,
>> 
>> 
>> my guess is mapping/binding to sockets was deemed the best compromise
>> from an
>> 
>> "out of the box" performance point of view.
>> 
>> 
>> iirc, we did fix some bugs that occured when running under asymmetric
>> cpusets/cgroups.
>> 
>> if you still have some issues with the latest Open MPI version (2.0.1)
>> and the default policy,
>> 
>> could you please describe them ?
> 
> I also don't understand why binding to sockets is the right thing to do.
> Binding to cores seems the right default to me, and I set that locally,
> with instructions about running OpenMP.  (Isn't that what other
> implementations do, which makes them look better?)
> 
> I think at least numa should be used, rather than socket.  Knights
> Landing, for instance, is single-socket, so no gets no actual binding by
> default.
> ___
> users mailing list
> users@lists.open-mpi.org 
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users 
> 
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] what was the rationale behind rank mapping by socket?

2016-10-28 Thread r...@open-mpi.org
   Alternatively,  processes mapped by l2cache and then bound to socket 
will simply be bound to all the processors in the socket where they are 
located. In this manner,
   users can exert detailed control over relative MCW rank location and 
binding.

   Finally, --report-bindings can be used to report bindings.

   As an example, consider a node with two processor sockets, each 
comprising four cores.  We run mpirun with -np  4  --report-bindings  and  the  
following  additional
   options:

% mpirun ... --map-by core --bind-to core
[...] ... binding child [...,0] to cpus 0001
[...] ... binding child [...,1] to cpus 0002
[...] ... binding child [...,2] to cpus 0004
[...] ... binding child [...,3] to cpus 0008

% mpirun ... --map-by socket --bind-to socket
[...] ... binding child [...,0] to socket 0 cpus 000f
[...] ... binding child [...,1] to socket 1 cpus 00f0
[...] ... binding child [...,2] to socket 0 cpus 000f
[...] ... binding child [...,3] to socket 1 cpus 00f0

% mpirun ... --map-by core:PE=2 --bind-to core
[...] ... binding child [...,0] to cpus 0003
[...] ... binding child [...,1] to cpus 000c
[...] ... binding child [...,2] to cpus 0030
[...] ... binding child [...,3] to cpus 00c0

% mpirun ... --bind-to none


  Here, --report-bindings shows the binding of each process as a mask.  In 
the first case, the processes bind to successive cores as indicated by the 
masks 0001, 0002,
   0004, and 0008.  In the second case, processes bind to all cores on 
successive sockets as indicated by the masks 000f and 00f0.  The processes 
cycle through the pro‐
   cessor  sockets  in a round-robin fashion as many times as are needed.  
In the third case, the masks show us that 2 cores have been bound per process.  
In the fourth
   case, binding is turned off and no bindings are reported.

   Open MPI's support for process binding depends on the underlying 
operating system.  Therefore, certain process binding options may not be 
available on every system.

   Process binding can also be set with MCA parameters.  Their usage is 
less convenient than that of mpirun options.  On the other hand, MCA parameters 
can be  set  not
   only on the mpirun command line, but alternatively in a system or user 
mca-params.conf file or as environment variables, as described in the MCA 
section below.  Some
   examples include:

   mpirun option  MCA parameter key value

 --map-by core  rmaps_base_mapping_policy   core
 --map-by socketrmaps_base_mapping_policy   socket
 --rank-by core rmaps_base_ranking_policy   core
 --bind-to core hwloc_base_binding_policy   core
 --bind-to socket   hwloc_base_binding_policy   socket
 --bind-to none hwloc_base_binding_policy   none


> On Oct 28, 2016, at 4:50 PM, Bennet Fauber  wrote:
> 
> Ralph,
> 
> Alas, I will not be at SC16.  I would like to hear and/or see what you
> present, so if it gets made available in alternate format, I'd
> appreciated know where and how to get it.
> 
> I am more and more coming to think that our cluster configuration is
> essentially designed to frustrated MPI developers because we use the
> scheduler to create cgroups (once upon a time, cpusets) for subsets of
> cores on multisocket machines, and I think that invalidates a lot of
> the assumptions that are getting made by people who want to bind to
> particular patters.
> 
> It's our foot, and we have been doing a good job of shooting it.  ;-)
> 
> -- bennet
> 
> 
> 
> 
> On Fri, Oct 28, 2016 at 7:18 PM, r...@open-mpi.org  wrote:
>> FWIW: I’ll be presenting “Mapping, Ranking, and Binding - Oh My!” at the
>> OMPI BoF meeting at SC’16, for those who can attend. Will try to explain the
>> rationale as well as the mechanics of the options
>> 
>> On Oct 11, 2016, at 8:09 AM, Dave Love  wrote:
>> 
>> Gilles Gouaillardet  writes:
>> 
>> Bennet,
>> 
>> 
>> my guess is mapping/binding to sockets was deemed the best compromise
>> from an
>> 
>> "out of the box" performance point of view.
>> 
>> 
>> iirc, we did fix some bugs that occured when running under asymmetric
>> cpusets/cgroups.
>> 
>> if you still have some issues with the latest Open MPI version (2.0.1)
>> and the default policy,
>> 
>> could you please describe them ?
>> 
>> 
>> I also don't understand why binding to sockets is the right thing to do.
>> Binding to cores seems the right default to me, and I set that locally,
>> with instructions about running OpenMP.  (Isn't that what other
>> i

Re: [OMPI users] mpi4py+OpenMPI: Qs about submitting bugs and examples

2016-10-31 Thread r...@open-mpi.org

> On Oct 31, 2016, at 10:39 AM, Jason Maldonis  wrote:
> 
> Hello everyone,
> 
> I am using mpi4py with OpenMPI for a simulation that uses dynamic resource 
> allocation via `mpi_spawn_multiple`.  I've been working on this problem for 
> about 6 months now and I have some questions and potential bugs I'd like to 
> submit.
> 
> Is this mailing list a good spot to submit bugs for OpenMPI? Or do I use 
> github?

You can use either - I would encourage the use of github “issues” when you have 
a specific bug, and the mailing list for general questions

> Are previous versions (like 1.10.2) still being developed for bugfixes, or do 
> I need to reproduce bugs for 2.x only?

The 1.10 series is still being supported - it has proven fairly stable and so 
the release rate has slowed down considerably in the last year. Primary 
development focus is on 2.x.

> 
> I may also submit bugs to mpi4py, but I don't yet know exactly where the bugs 
> are originating from.  Do any of you know if github is the correct place to 
> submit bugs for mpi4py?

I honestly don’t know, but I do believe mpi4py is on github as well

> 
> I have also learned some cool things that are not well documented on the web, 
> and I'd like to provide nice examples or something similar. Can I contribute 
> examples to either mpi4py or OpenMPI?

Please do!

> 
> As a side note, OpenMPI 1.10.2 seems to be much more stable than 2.x for the 
> dynamic resource allocation code I am writing.

Yes, there has been an outstanding bug on the 2.x series for dynamic 
operations. We just finally found the missing code change and it is being 
ported at this time.

> 
> Thanks in advance,
> Jason Maldonis
> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] MCA compilation later

2016-10-31 Thread r...@open-mpi.org
Here’s a link on how to create components:

https://github.com/open-mpi/ompi/wiki/devel-CreateComponent

and if you want to create a completely new framework:

https://github.com/open-mpi/ompi/wiki/devel-CreateFramework

If you want to distribute a proprietary plugin, you first develop and build it 
within the OMPI code base on your own machines. Then, just take the dll for 
your plugin from the /lib/openmpi directory and distribute that “blob”.
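Roughly speaking - with mca_btl_foo.so standing in for whatever component you 
actually built, and the two prefixes being placeholders for your build tree and 
the customer install:

cp <build-prefix>/lib/openmpi/mca_btl_foo.so <install-prefix>/lib/openmpi/

OMPI scans that directory at startup, so nothing else should need to be rebuilt.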

I’ll correct my comment: you need the headers and the libraries. You just don’t 
need the hardware, though it means you cannot test those features.


> On Oct 31, 2016, at 6:19 AM, Sean Ahern  wrote:
> 
> Thanks. That's what I expected and hoped. But is there a pointer about how to 
> get started? If I've got an existing OpenMPI build, what's the process to get 
> a new MCA plugin built with a new set of header files?
> 
> (I'm a bit surprised only header files are necessary. Shouldn't the plugin 
> require at least runtime linking with a low-level transport library?)
> 
> -Sean
> 
> --
> Sean Ahern
> Computational Engineering International
> 919-363-0883
> 
> On Fri, Oct 28, 2016 at 3:40 PM, r...@open-mpi.org <mailto:r...@open-mpi.org> 
> mailto:r...@open-mpi.org>> wrote:
> You don’t need any of the hardware - you just need the headers. Things like 
> libfabric and libibverbs are all publicly available, and so you can build all 
> that support even if you cannot run it on your machine.
> 
> Once your customer installs the binary, the various plugins will check for 
> their required library and hardware and disqualify themselves if it isn’t 
> found.
> 
>> On Oct 28, 2016, at 12:33 PM, Sean Ahern > <mailto:s...@ensight.com>> wrote:
>> 
>> There's been discussion on the OpenMPI list recently about static linking of 
>> OpenMPI with all of the desired MCAs in it. I've got the opposite question. 
>> I'd like to add MCAs later on to an already-compiled version of OpenMPI and 
>> am not quite sure how to do it.
>> 
>> Let me summarize. We've got a commercial code that we deploy on customer 
>> machines in binary form. We're working to integrate OpenMPI into the 
>> installer, and things seem to be progressing well. (Note: because we're a 
>> commercial code, making the customer compile something doesn't work for us 
>> like it can for open source or research codes.)
>> 
>> Now, we want to take advantage of OpenMPI's ability to find MCAs at runtime, 
>> pointing to the various plugins that might apply to a deployed system. I've 
>> configured and compiled OpenMPI on one of our build machines, one that 
>> doesn't have any special interconnect hardware or software installed. We 
>> take this compiled version of OpenMPI and use it on all of our machines. 
>> (Yes, I've read Building FAQ #39 
>> <https://www.open-mpi.org/faq/?category=building#installdirs> about 
>> relocating OpenMPI. Useful, that.) I'd like to take our pre-compiled version 
>> of OpenMPI and add MCA libraries to it, giving OpenMPI the ability to 
>> communicate via transport mechanisms that weren't available on the original 
>> build machine. Things like InfiniBand, OmniPath, or one of Cray's 
>> interconnects.
>> 
>> How would I go about doing this? And what are the limitations?
>> 
>> I'm guessing that I need to go configure and compile the same version of 
>> OpenMPI on a machine that has the desired interconnect installation (headers 
>> and libraries), then go grab the corresponding lib/openmpi/mca_*{la,so} 
>> files. Take those files and drop them in our pre-built OpenMPI from our 
>> build machine in the same relative plugin location (lib/openmpi). If I stick 
>> with the same compiler (gcc, in this case), I'm hoping that symbols will all 
>> resolve themselves at runtime. (I probably will have to do some 
>> LD_LIBRARY_PATH games to be sure to find the appropriate underlying 
>> libraries unless OpenMPI's process for building MCAs links them in 
>> statically somehow.)
>> 
>> Am I even on the right track here? (The various system-level FAQs (here 
>> <https://www.open-mpi.org/faq/?category=supported-systems>, here 
>> <https://www.open-mpi.org/faq/?category=developers>, and especially here 
>> <https://www.open-mpi.org/faq/?category=sysadmin>) seem to suggest that I 
>> am.)
>> 
>> Our first test platform will be getting OpenMPI via IB working on our 
>> cluster, where we have IB (and TCP/IP) functional and not OpenMPI. This will 
>> be a great stand-in for a customer that

Re: [OMPI users] Slurm binding not propagated to MPI jobs

2016-11-01 Thread r...@open-mpi.org
Ah crumby!! We already solved this on master, but it cannot be backported to 
the 1.10 series without considerable pain. For some reason, the support for it 
has been removed from the 2.x series as well. I’ll try to resolve that issue 
and get the support reinstated there (probably not until 2.1).

Can you manage until then? I think the v2 RM’s are thinking Dec/Jan for 2.1.
Ralph


> On Nov 1, 2016, at 11:38 AM, Riebs, Andy  wrote:
> 
> To close the thread here… I got the following information:
>  
> Looking at SLURM_CPU_BIND is the right idea, but there are quite a few more 
> options. It misses map_cpu, rank, plus the NUMA-based options:
> rank_ldom, map_ldom, and mask_ldom. See the srun man pages for documentation.
>  
>  
> From: Riebs, Andy 
> Sent: Thursday, October 27, 2016 1:53 PM
> To: users@lists.open-mpi.org
> Subject: Re: [OMPI users] Slurm binding not propagated to MPI jobs
>  
> Hi Ralph,
> 
> I haven't played around in this code, so I'll flip the question over to the 
> Slurm list, and report back here when I learn anything.
> 
> Cheers
> Andy
> 
> On 10/27/2016 01:44 PM, r...@open-mpi.org <mailto:r...@open-mpi.org> wrote:
> Sigh - of course it wouldn’t be simple :-( 
>  
> All right, let’s suppose we look for SLURM_CPU_BIND:
>  
> * if it includes the word “none”, then we know the user specified that they 
> don’t want us to bind
>  
> * if it includes the word mask_cpu, then we have to check the value of that 
> option.
>  
> * If it is all F’s, then they didn’t specify a binding and we should do our 
> thing.
>  
> * If it is anything else, then we assume they _did_ specify a binding, and we 
> leave it alone
>  
> Would that make sense? Is there anything else that could be in that envar 
> which would trip us up?
>  
>  
> On Oct 27, 2016, at 10:37 AM, Andy Riebs  <mailto:andy.ri...@hpe.com>> wrote:
>  
> Yes, they still exist:
> $ srun --ntasks-per-node=2 -N1 env | grep BIND | sort -u
> SLURM_CPU_BIND_LIST=0x
> SLURM_CPU_BIND=quiet,mask_cpu:0x
> SLURM_CPU_BIND_TYPE=mask_cpu:
> SLURM_CPU_BIND_VERBOSE=quiet
> Here are the relevant Slurm configuration options that could conceivably 
> change the behavior from system to system:
> SelectType  = select/cons_res
> SelectTypeParameters= CR_CPU
> 
>  
> On 10/27/2016 01:17 PM, r...@open-mpi.org <mailto:r...@open-mpi.org> wrote:
> And if there is no --cpu_bind on the cmd line? Do these not exist?
>  
> On Oct 27, 2016, at 10:14 AM, Andy Riebs  <mailto:andy.ri...@hpe.com>> wrote:
>  
> Hi Ralph,
> 
> I think I've found the magic keys...
> 
> $ srun --ntasks-per-node=2 -N1 --cpu_bind=none env | grep BIND
> SLURM_CPU_BIND_VERBOSE=quiet
> SLURM_CPU_BIND_TYPE=none
> SLURM_CPU_BIND_LIST=
> SLURM_CPU_BIND=quiet,none
> SLURM_CPU_BIND_VERBOSE=quiet
> SLURM_CPU_BIND_TYPE=none
> SLURM_CPU_BIND_LIST=
> SLURM_CPU_BIND=quiet,none
> $ srun --ntasks-per-node=2 -N1 --cpu_bind=core env | grep BIND
> SLURM_CPU_BIND_VERBOSE=quiet
> SLURM_CPU_BIND_TYPE=mask_cpu:
> SLURM_CPU_BIND_LIST=0x,0x
> SLURM_CPU_BIND=quiet,mask_cpu:0x,0x
> SLURM_CPU_BIND_VERBOSE=quiet
> SLURM_CPU_BIND_TYPE=mask_cpu:
> SLURM_CPU_BIND_LIST=0x,0x
> SLURM_CPU_BIND=quiet,mask_cpu:0x,0x
> 
> Andy
> 
> On 10/27/2016 11:57 AM, r...@open-mpi.org <mailto:r...@open-mpi.org> wrote:
> 
> Hey Andy
> 
> Is there a SLURM envar that would tell us the binding option from the srun 
> cmd line? We automatically bind when direct launched due to user complaints 
> of poor performance if we don’t. If the user specifies a binding 
> option, then we detect that we were already bound and don’t do it.
> 
> However, if the user specifies that they not be bound, then we think they 
> simply didn’t specify anything - and that isn’t the case. If we 
> can see something that tells us “they explicitly said not to do 
> it”, then we can avoid the situation.
> 
> Ralph
> 
> 
> On Oct 27, 2016, at 8:48 AM, Andy Riebs  <mailto:andy.ri...@hpe.com>> wrote:
> 
> Hi All,
> 
> We are running Open MPI version 1.10.2, built with support for Slurm version 
> 16.05.0. When a user specifies "--cpu_bind=none", MPI tries to bind by core, 
> which segv's if there are more processes than cores.
> 
> The user reports:
> 
> What I found is that
> 
> % srun --ntasks-per-node=8 --cpu_bind=none  \
> env SHMEM_SYMMETRIC_HEAP_SIZE=1024M bin/all2all.shmem.exe 0
> 
> will have the problem, but:
> 
> % srun --ntasks-per-node=8 --cpu_bind=none  \
> env SHMEM_SYMMETRIC_HEAP_SIZE=1024M ./bindit.sh bin/all2all

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread r...@open-mpi.org
you mistyped the option - it is “--map-by node”. Note the space between “by” 
and “node” - you had typed a “-” where the space should be.


> On Nov 4, 2016, at 4:28 AM, Mahesh Nanavalla  
> wrote:
> 
> Hi all,
> 
> I am using openmpi-1.10.3,using quad core processor(node).
> 
> I am running 3 processes on three nodes(provided by hostfile) each node 
> process is limited  by --map-by-node as below
> 
> root@OpenWrt:~# /usr/bin/mpirun --allow-run-as-root -np 3 --hostfile 
> myhostfile /usr/bin/openmpiWiFiBulb --map-by-node
> 
> root@OpenWrt:~# cat myhostfile 
> root@10.73.145.1:1 
> root@10.74.25.1:1 
> root@10.74.46.1:1 
> 
> 
> Problem is 3 processes running on one node. It's not mapping 
> one process per node.
> 
> is there any library used to run like above.if yes please tell me that .
> 
> Kindly help me where am doing wrong...
> 
> Thanks&Regards,
> Mahesh N
> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread r...@open-mpi.org
My apologies - the problem is that you list the option _after_ your executable 
name, and so we think it is an argument for your executable. You need to list 
the option _before_ your executable on the cmd line - for example:
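/usr/bin/mpirun --allow-run-as-root -np 3 --hostfile myhostfile --map-by node /usr/bin/openmpiWiFiBulb

(i.e., the same cmd line you posted, with --map-by node moved ahead of the 
executable)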


> On Nov 4, 2016, at 4:44 AM, Mahesh Nanavalla  
> wrote:
> 
> Thanks for reply,
> 
> But,with space also not running on one process one each node
> 
> root@OpenWrt:~# /usr/bin/mpirun --allow-run-as-root -np 3 --hostfile 
> myhostfile /usr/bin/openmpiWiFiBulb --map-by node
> 
> And 
> 
> If use like this it,s working fine(running one process on each node)
> /root@OpenWrt:~#/usr/bin/mpirun --allow-run-as-root -np 3 --host 
> root@10.74.25.1 <mailto:root@10.74.25.1>,root@10.74.46.1 
> <mailto:root@10.74.46.1>,root@10.73.145.1 <mailto:root@10.73.145.1> 
> /usr/bin/openmpiWiFiBulb 
> 
> But,i want use hostfile only..
> kindly help me.
> 
> 
> On Fri, Nov 4, 2016 at 5:00 PM, r...@open-mpi.org <mailto:r...@open-mpi.org> 
> mailto:r...@open-mpi.org>> wrote:
> you mistyped the option - it is “--map-by node”. Note the space between “by” 
> and “node” - you had typed it with a “-“ instead of a “space”
> 
> 
>> On Nov 4, 2016, at 4:28 AM, Mahesh Nanavalla > <mailto:mahesh.nanavalla...@gmail.com>> wrote:
>> 
>> Hi all,
>> 
>> I am using openmpi-1.10.3,using quad core processor(node).
>> 
>> I am running 3 processes on three nodes(provided by hostfile) each node 
>> process is limited  by --map-by-node as below
>> 
>> root@OpenWrt:~# /usr/bin/mpirun --allow-run-as-root -np 3 --hostfile 
>> myhostfile /usr/bin/openmpiWiFiBulb --map-by-node
>> 
>> root@OpenWrt:~# cat myhostfile 
>> root@10.73.145.1:1 <http://root@10.73.145.1:1/>
>> root@10.74.25.1:1 <http://root@10.74.25.1:1/>
>> root@10.74.46.1:1 <http://root@10.74.46.1:1/>
>> 
>> 
>> Problem is 3 processes running on one node. It's not mapping 
>> one process per node.
>> 
>> is there any library used to run like above.if yes please tell me that .
>> 
>> Kindly help me where am doing wrong...
>> 
>> Thanks&Regards,
>> Mahesh N
>> 
>> ___
>> users mailing list
>> users@lists.open-mpi.org <mailto:users@lists.open-mpi.org>
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users 
>> <https://rfd.newmexicoconsortium.org/mailman/listinfo/users>
> 
> ___
> users mailing list
> users@lists.open-mpi.org <mailto:users@lists.open-mpi.org>
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users 
> <https://rfd.newmexicoconsortium.org/mailman/listinfo/users>
> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread r...@open-mpi.org
All true - but I reiterate: the source of the problem is that the “--map-by 
node” option on the cmd line must come *before* your application. Otherwise, none of
these suggestions will help.

> On Nov 4, 2016, at 6:52 AM, Jeff Squyres (jsquyres)  
> wrote:
> 
> In your case, using slots or --npernode or --map-by node will result in the 
> same distribution of processes because you're only launching 1 process per 
> node (a.k.a. "1ppn").
> 
> They have more pronounced differences when you're launching more than 1ppn.
> 
> Let's take a step back: you should know that Open MPI uses 3 phases to plan 
> out how it will launch your MPI job:
> 
> 1. Mapping: where each process will go
> 2. Ordering: after mapping, how each process will be numbered (this 
> translates to rank ordering MPI_COMM_WORLD)
> 3. Binding: binding processes to processors
> 
> #3 is not pertinent to this conversation, so I'll leave it out of my 
> discussion below.
> 
> We're mostly talking about #1 here.  Let's look at each of the three options 
> mentioned in this thread individually.  In each of the items below, I assume 
> you are using *just* that option, and *neither of the other 2 options*:
> 
> 1. slots: this tells Open MPI the maximum number of processes that can be 
> placed on a server before it is considered to be "oversubscribed" (and Open 
> MPI won't let you oversubscribe by default).
> 
> So when you say "slots=1", you're basically telling Open MPI to launch 1 
> process per node and then to move on to the next node.  If you said 
> "slots=3", then Open MPI would launch up to 3 processes per node before 
> moving on to the next (until the total np processes were launched).
> 
> *** Be aware that we have changed the hostfile default value of slots (i.e., 
> what number of slots to use if it is not specified in the hostfile) in 
> different versions of Open MPI.  When using hostfiles, in most cases, you'll 
> see either a default value of 1 or the total number of cores on the node.
> 
> 2. --map-by node: in this case, Open MPI will map out processes round robin 
> by *node* instead of its default by *core*.  Hence, even if you had "slots=3" 
> and -np 9, Open MPI would first put a process on node A, then put a process 
> on node B, then a process on node C, and then loop back to putting a 2nd 
> process on node A, ...etc.
> 
> 3. --npernode: in this case, you're telling Open MPI how many processes to 
> put on each node before moving on to the next node.  E.g., if you "mpirun -np 
> 9 ..." (and assuming you have >=3 slots per node), Open MPI will put 3 
> processes on each node before moving on to the next node.
> 
> With the default MPI_COMM_WORLD rank ordering, the practical difference in 
> these three options is:
> 
> Case 1:
> 
> $ cat hostfile
> a slots=3
> b slots=3
> c slots=3
> $ mpirun --hostfile hostfile -np 9 my_mpi_executable
> 
> In this case, you'll end up with MCW ranks 0-2 on a, 3-5 on b, and 6-8 on c.
> 
> Case 2:
> 
> # Setting an arbitrarily large number of slots per host just to be explicitly 
> clear for this example
> $ cat hostfile
> a slots=20
> b slots=20
> c slots=20
> $ mpirun --hostfile hostfile -np 9 --map-by node my_mpi_executable
> 
> In this case, you'll end up with MCW ranks 0,3,6 on a, 1,4,7 on b, and 2,5,8 
> on c.
> 
> Case 3:
> 
> # Setting an arbitrarily large number of slots per host just to be explicitly 
> clear for this example
> $ cat hostfile
> a slots=20
> b slots=20
> c slots=20
> $ mpirun --hostfile hostfile -np 9 --npernode 3 my_mpi_executable
> 
> In this case, you'll end up with the same distribution / rank ordering as 
> case #1, but you'll still have 17 more slots you could have used.
> 
> There are lots of variations on this, too, because these mpirun options (and 
> many others) can be used in conjunction with each other.  But that gets 
> pretty esoteric pretty quickly; most users don't have a need for such 
> complexity.
> 
> 
> 
>> On Nov 4, 2016, at 8:57 AM, Bennet Fauber  wrote:
>> 
>> Mahesh,
>> 
>> Depending what you are trying to accomplish, might using the mpirun option
>> 
>> -pernode  -o-  --pernode
>> 
>> work for you?  That requests that only one process be spawned per
>> available node.
>> 
>> We generally use this for hybrid codes, where the single process will
>> spawn threads to the remaining processors.
>> 
>> Just a thought,   -- bennet
>> 
>> 
>> 
>> 
>> 
>> On Fri, Nov 4, 2016 at 8:39 AM, Mahesh Nanavalla
>&
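
A hedged sketch of the hybrid-launch pattern Bennet describes (the application
name and the thread count are placeholders, not anything taken from this thread):

# one rank per node; each rank spawns OpenMP threads on the node's remaining cores
$ export OMP_NUM_THREADS=20
$ mpirun --pernode --hostfile hostfile ./hybrid_app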

Re: [OMPI users] Slurm binding not propagated to MPI jobs

2016-11-04 Thread r...@open-mpi.org
See https://github.com/open-mpi/ompi/pull/2365 
<https://github.com/open-mpi/ompi/pull/2365>

Let me know if that solves it for you


> On Nov 3, 2016, at 9:48 AM, Andy Riebs  wrote:
> 
> Getting that support into 2.1 would be terrific -- and might save us from 
> having to write some Slurm prolog scripts to effect that.
> 
> Thanks Ralph!
> 
> On 11/01/2016 11:36 PM, r...@open-mpi.org <mailto:r...@open-mpi.org> wrote:
>> Ah crumby!! We already solved this on master, but it cannot be backported to 
>> the 1.10 series without considerable pain. For some reason, the support for 
>> it has been removed from the 2.x series as well. I’ll try to resolve that 
>> issue and get the support reinstated there (probably not until 2.1).
>> 
>> Can you manage until then? I think the v2 RM’s are thinking Dec/Jan for 
>> 2.1.
>> Ralph
>> 
>> 
>>> On Nov 1, 2016, at 11:38 AM, Riebs, Andy >> <mailto:andy.ri...@hpe.com>> wrote:
>>> 
>>> To close the thread here… I got the following information:
>>>  
>>> Looking at SLURM_CPU_BIND is the right idea, but there are quite a few more 
>>> options. It misses map_cpu, rank, plus the NUMA-based options:
>>> rank_ldom, map_ldom, and mask_ldom. See the srun man pages for 
>>> documentation.
>>>  
>>>  
>>> From: Riebs, Andy 
>>> Sent: Thursday, October 27, 2016 1:53 PM
>>> To: users@lists.open-mpi.org <mailto:users@lists.open-mpi.org>
>>> Subject: Re: [OMPI users] Slurm binding not propagated to MPI jobs
>>>  
>>> Hi Ralph,
>>> 
>>> I haven't played around in this code, so I'll flip the question over to the 
>>> Slurm list, and report back here when I learn anything.
>>> 
>>> Cheers
>>> Andy
>>> 
>>> On 10/27/2016 01:44 PM, r...@open-mpi.org <mailto:r...@open-mpi.org> wrote:
>>> Sigh - of course it wouldn’t be simple :-( 
>>>  
>>> All right, let’s suppose we look for SLURM_CPU_BIND:
>>>  
>>> * if it includes the word “none”, then we know the user specified that 
>>> they don’t want us to bind
>>>  
>>> * if it includes the word mask_cpu, then we have to check the value of that 
>>> option.
>>>  
>>> * If it is all F’s, then they didn’t specify a binding and we should do 
>>> our thing.
>>>  
>>> * If it is anything else, then we assume they _did_ specify a binding, and 
>>> we leave it alone
>>>  
>>> Would that make sense? Is there anything else that could be in that envar 
>>> which would trip us up?
>>>  
>>>  
>>> On Oct 27, 2016, at 10:37 AM, Andy Riebs >> <mailto:andy.ri...@hpe.com>> wrote:
>>>  
>>> Yes, they still exist:
>>> $ srun --ntasks-per-node=2 -N1 env | grep BIND | sort -u
>>> SLURM_CPU_BIND_LIST=0x
>>> SLURM_CPU_BIND=quiet,mask_cpu:0x
>>> SLURM_CPU_BIND_TYPE=mask_cpu:
>>> SLURM_CPU_BIND_VERBOSE=quiet
>>> Here are the relevant Slurm configuration options that could conceivably 
>>> change the behavior from system to system:
>>> SelectType  = select/cons_res
>>> SelectTypeParameters= CR_CPU
>>> 
>>>  
>>> On 10/27/2016 01:17 PM, r...@open-mpi.org <mailto:r...@open-mpi.org> wrote:
>>> And if there is no --cpu_bind on the cmd line? Do these not exist?
>>>  
>>> On Oct 27, 2016, at 10:14 AM, Andy Riebs >> <mailto:andy.ri...@hpe.com>> wrote:
>>>  
>>> Hi Ralph,
>>> 
>>> I think I've found the magic keys...
>>> 
>>> $ srun --ntasks-per-node=2 -N1 --cpu_bind=none env | grep BIND
>>> SLURM_CPU_BIND_VERBOSE=quiet
>>> SLURM_CPU_BIND_TYPE=none
>>> SLURM_CPU_BIND_LIST=
>>> SLURM_CPU_BIND=quiet,none
>>> SLURM_CPU_BIND_VERBOSE=quiet
>>> SLURM_CPU_BIND_TYPE=none
>>> SLURM_CPU_BIND_LIST=
>>> SLURM_CPU_BIND=quiet,none
>>> $ srun --ntasks-per-node=2 -N1 --cpu_bind=core env | grep BIND
>>> SLURM_CPU_BIND_VERBOSE=quiet
>>> SLURM_CPU_BIND_TYPE=mask_cpu:
>>> SLURM_CPU_BIND_LIST=0x,0x
>>> SLURM_CPU_BIND=quiet,mask_cpu:0x,0x
>>> SLURM_CPU_BIND_VERBOSE=quiet
>>> SLURM_CPU_BIND_TYPE=mask_cpu:
>>> SLURM_CPU_BIND_LIST=0x,0x
>>> SLURM_CPU_BIND=quiet,mask_cpu:0x,0x
>>> 
>>> Andy
>>> 
>>> On 10/27/2016 11:57 AM, r...@open-mpi.org <mailto:r...@open-mpi.org> wr
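
For illustration only, a shell sketch of the SLURM_CPU_BIND check proposed above
(this is not the actual Open MPI code, just the logic; the variable contents are
taken from the srun output shown in this thread):

check_slurm_binding() {
    case "$SLURM_CPU_BIND" in
        *none*)
            echo "user asked Slurm for no binding - leave the procs unbound" ;;
        *mask_cpu:*)
            allf=yes
            for m in $(echo "${SLURM_CPU_BIND#*mask_cpu:}" | tr ',' ' '); do
                hex=${m#0x}
                # anything left after deleting f/F means a specific mask was given
                [ -n "$(echo "$hex" | tr -d 'fF')" ] && allf=no
            done
            if [ "$allf" = yes ]; then
                echo "all-F masks: no explicit binding, apply our own defaults"
            else
                echo "explicit mask: respect the Slurm-provided binding"
            fi ;;
        *)
            echo "no binding directive found: apply our own defaults" ;;
    esac
}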

Re: [OMPI users] malloc related crash inside openmpi

2016-11-23 Thread r...@open-mpi.org
It looks like the library may not have been fully installed on that node - can 
you see if the prefix location is present, and that the LD_LIBRARY_PATH on that 
node is correctly set? The referenced component did not exist prior to the 2.0 
series, so I’m betting that your LD_LIBRARY_PATH isn’t correct on that node.


> On Nov 23, 2016, at 2:21 PM, Noam Bernstein  
> wrote:
> 
> 
>> On Nov 23, 2016, at 3:45 PM, George Bosilca > > wrote:
>> 
>> Thousands reasons ;)
> 
> Still trying to check if 2.0.1 fixes the problem, and discovered that earlier 
> runs weren’t actually using the version I intended.  When I do use 2.0.1, I 
> get the following errors:
> --
> A requested component was not found, or was unable to be opened.  This
> means that this component is either not installed or is unable to be
> used on your system (e.g., sometimes this means that shared libraries
> that the component requires are unable to be found/loaded).  Note that
> Open MPI stopped checking at the first component that it did not find.
> 
> Host:  compute-1-35
> Framework: ess
> Component: pmi
> --
> --
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> 
>   orte_ess_base_open failed
>   --> Returned value Not found (-13) instead of ORTE_SUCCESS
> --
> 
> I’ve confirmed that mpirun PATH and LD_LIBRARY_PATH are pointing to 2.0.1 
> version of things within the job script.  Configure line is as I’ve used for 
> 1.8.x, i.e.
> export CC=gcc
> export CXX=g++
> export F77=ifort
> export FC=ifort 
> 
> ./configure \
> --prefix=${DEST} \
> --with-tm=/usr/local/torque \
> --enable-mpirun-prefix-by-default \
> --with-verbs=/usr \
> --with-verbs-libdir=/usr/lib64
> Followed by “make install” Any suggestions for getting 2.0.1 working?
> 
>   thanks,
>   Noam
> 
> 
> ||
> |U.S. NAVAL|
> |_RESEARCH_|
> LABORATORY
> 
> Noam Bernstein, Ph.D.
> Center for Materials Physics and Technology
> U.S. Naval Research Laboratory
> T +1 202 404 8628  F +1 202 404 7546
> https://www.nrl.navy.mil 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] malloc related crash inside openmpi

2016-11-24 Thread r...@open-mpi.org
Just to be clear: are you saying that mpirun exits with that message? Or is 
your application process exiting with it?

There is no reason for mpirun to be looking for that library.

The library in question is in the /lib/openmpi directory, and is named 
mca_ess_pmi.[la,so]
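
A quick sanity check along those lines (a sketch; $PREFIX stands for whatever
--prefix was given to configure, and compute-1-35 is the host named in the error
output above):

# does the 2.0.x component actually exist on the remote node?
$ ssh compute-1-35 ls -l $PREFIX/lib/openmpi/mca_ess_pmi.so

# what does a non-interactive shell on that node actually see?
$ ssh compute-1-35 'echo $LD_LIBRARY_PATH; which mpirun'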


> On Nov 23, 2016, at 2:31 PM, Noam Bernstein  
> wrote:
> 
> 
>> On Nov 23, 2016, at 5:26 PM, r...@open-mpi.org <mailto:r...@open-mpi.org> 
>> wrote:
>> 
>> It looks like the library may not have been fully installed on that node - 
>> can you see if the prefix location is present, and that the LD_LIBRARY_PATH 
>> on that node is correctly set? The referenced component did not exist prior 
>> to the 2.0 series, so I’m betting that your LD_LIBRARY_PATH isn’t correct on 
>> that node.
> 
> The LD_LIBRARY path is definitely correct on the node that’s running the 
> mpirun, I checked that, and the openmpi directory is supposedly NFS mounted 
> everywhere.  I suppose installation may have not fully worked and I didn’t 
> notice.  What’s the name of the library it’s looking for?
> 
>   
> Noam
> 
> 
> 
> ||
> |U.S. NAVAL|
> |_RESEARCH_|
> LABORATORY
> 
> Noam Bernstein, Ph.D.
> Center for Materials Physics and Technology
> U.S. Naval Research Laboratory
> T +1 202 404 8628  F +1 202 404 7546
> https://www.nrl.navy.mil <https://www.nrl.navy.mil/>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] question about "--rank-by slot" behavior

2016-11-30 Thread r...@open-mpi.org
I think you have confused “slot” with a physical “core”. The two have 
absolutely nothing to do with each other.

A “slot” is nothing more than a scheduling entry in which a process can be 
placed. So when you --rank-by slot, the ranks are assigned round-robin by 
scheduler entry - i.e., you assign all the ranks on the first node, then assign 
all the ranks on the next node, etc.

It doesn’t matter where those ranks are placed, or what core or socket they are 
running on. We just blindly go thru and assign numbers.

If you rank-by core, then we cycle across the procs by looking at the core 
number they are bound to, assigning all the procs on a node before moving to 
the next node. If you rank-by socket, then you cycle across the procs on a node 
by round-robin of sockets, assigning all procs on the node before moving to the 
next node. If you then added “span” to that directive, we’d round-robin by 
socket across all nodes before circling around to the next proc on this node.

HTH
Ralph


> On Nov 30, 2016, at 11:26 AM, David Shrader  wrote:
> 
> Hello All,
> 
> The man page for mpirun says that the default ranking procedure is 
> round-robin by slot. It doesn't seem to be that straight-forward to me, 
> though, and I wanted to ask about the behavior.
> 
> To help illustrate my confusion, here are a few examples where the ranking 
> behavior changed based on the mapping behavior, which doesn't make sense to 
> me, yet. First, here is a simple map by core (using 4 nodes of 32 cpu cores 
> each):
> 
> $> mpirun -n 128 --map-by core --report-bindings true
> [gr0649.localdomain:119614] MCW rank 0 bound to socket 0[core 0[hwt 0]]: 
> [B/././././././././././././././././.][./././././././././././././././././.]
> [gr0649.localdomain:119614] MCW rank 1 bound to socket 0[core 1[hwt 0]]: 
> [./B/./././././././././././././././.][./././././././././././././././././.]
> [gr0649.localdomain:119614] MCW rank 2 bound to socket 0[core 2[hwt 0]]: 
> [././B/././././././././././././././.][./././././././././././././././././.]
> ...output snipped...
> 
> Things look as I would expect: ranking happens round-robin through the cpu 
> cores. Now, here's a map by socket example:
> 
> $> mpirun -n 128 --map-by socket --report-bindings true
> [gr0649.localdomain:119926] MCW rank 0 bound to socket 0[core 0[hwt 0]]: 
> [B/././././././././././././././././.][./././././././././././././././././.]
> [gr0649.localdomain:119926] MCW rank 1 bound to socket 1[core 18[hwt 0]]: 
> [./././././././././././././././././.][B/././././././././././././././././.]
> [gr0649.localdomain:119926] MCW rank 2 bound to socket 0[core 1[hwt 0]]: 
> [./B/./././././././././././././././.][./././././././././././././././././.]
> ...output snipped...
> 
> Why is rank 1 on a different socket? I know I am mapping by socket in this 
> example, but, fundamentally, nothing should really be different in terms of 
> ranking, correct? The same number of processes are available on each host as 
> in the first example, and available in the same locations. How is "slot" 
> different in this case? If I use "--rank-by core," I recover the output from 
> the first example.
> 
> I thought that maybe "--rank-by slot" might be following something laid down 
> by "--map-by", but the following example shows that isn't completely correct, 
> either:
> 
> $> mpirun -n 128 --map-by socket:span --report-bindings true
> [gr0649.localdomain:119319] MCW rank 0 bound to socket 0[core 0[hwt 0]]: 
> [B/././././././././././././././././.][./././././././././././././././././.]
> [gr0649.localdomain:119319] MCW rank 1 bound to socket 1[core 18[hwt 0]]: 
> [./././././././././././././././././.][B/././././././././././././././././.]
> [gr0649.localdomain:119319] MCW rank 2 bound to socket 0[core 1[hwt 0]]: 
> [./B/./././././././././././././././.][./././././././././././././././././.]
> ...output snipped...
> 
> If ranking by slot were somehow following something left over by mapping, I 
> would have expected rank 2 to end up on a different host. So, now I don't 
> know what to expect from using "--rank-by slot." Does anyone have any 
> pointers?
> 
> Thank you for the help!
> David
> 
> -- 
> David Shrader
> HPC-ENV High Performance Computer Systems
> Los Alamos National Lab
> Email: dshrader  lanl.gov
> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] question about "--rank-by slot" behavior

2016-11-30 Thread r...@open-mpi.org
“slot” never became equivalent to “socket”, or to “core”. Here is what happened:

* your first example: the mapper assigns the first process to the first node 
because there is a free core there, and you said to map-by core. It goes on to 
assign the second process to the second core, and the third process to the 
third core, etc. until we reach the defined #procs for that node (i.e., the 
number of assigned “slots” for that node). When it goes to rank the procs, the 
ranker starts with the first process assigned on the first node - this process 
occupies the first “slot”, and so it gets rank 0. The ranker then assigns rank 
1 to the second process it assigned to the first node, as that process occupies 
the second “slot”. Etc.

* your 2nd example: the mapper assigns the first process to the first socket of 
the first node, the second process to the second socket of the first node, and 
the third process to the first socket of the first node, until all the “slots” 
for that node have been filled. The ranker then starts with the first process 
that was assigned to the first node, and gives it rank 0. The ranker then 
assigns rank 1 to the second process that was assigned to the node - that would 
be the first proc mapped to the second socket. The ranker then assigns rank 2 
to the third proc assigned to the node - that would be the 2nd proc assigned to 
the first socket.

* your 3rd example: the mapper assigns the first process to the first socket of 
the first node, the second process to the second socket of the first node, and 
the third process to the first socket of the second node, continuing around 
until all procs have been mapped. The ranker then starts with the first proc 
assigned to the first node, and gives it rank 0. The ranker then assigns rank 1 
to the second process assigned to the first node (because we are ranking by 
slot!), which corresponds to the first proc mapped to the second socket. The 
ranker then assigns rank 2 to the third process assigned to the first node, 
which corresponds to the second proc mapped to the first socket of that node.

So you can see that you will indeed get the same relative ranking, even though 
the mapping was done using a different algorithm.

HTH
Ralph
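
In other words, if the goal is core-ordered ranks on top of a socket-based mapping,
the knob David already found does it; a minimal sketch using the same test command
as in his examples:

$ mpirun -n 128 --map-by socket --rank-by core --report-bindings true

With every core filled (128 ranks over 4 x 32 cores), this reports the same
rank-to-core layout as the plain --map-by core run in the first example.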

> On Nov 30, 2016, at 2:16 PM, David Shrader  wrote:
> 
> Hello Ralph,
> 
> I do understand that "slot" is an abstract term and isn't tied down to any 
> particular piece of hardware. What I am trying to understand is how "slot" 
> came to be equivalent to "socket" in my second and third example, but "core" 
> in my first example. As far as I can tell, MPI ranks should have been 
> assigned the same in all three examples. Why weren't they?
> 
> You mentioned that, when using "--rank-by slot", the ranks are assigned 
> round-robin by scheduler entry; does this mean that the scheduler entries 
> change based on the mapping algorithm (the only thing I changed in my 
> examples) and this results in ranks being assigned differently?
> 
> Thanks again,
> David
> 
> On 11/30/2016 01:23 PM, r...@open-mpi.org wrote:
>> I think you have confused “slot” with a physical “core”. The two have 
>> absolutely nothing to do with each other.
>> 
>> A “slot” is nothing more than a scheduling entry in which a process can be 
>> placed. So when you --rank-by slot, the ranks are assigned round-robin by 
>> scheduler entry - i.e., you assign all the ranks on the first node, then 
>> assign all the ranks on the next node, etc.
>> 
>> It doesn’t matter where those ranks are placed, or what core or socket they 
>> are running on. We just blindly go thru and assign numbers.
>> 
>> If you rank-by core, then we cycle across the procs by looking at the core 
>> number they are bound to, assigning all the procs on a node before moving to 
>> the next node. If you rank-by socket, then you cycle across the procs on a 
>> node by round-robin of sockets, assigning all procs on the node before 
>> moving to the next node. If you then added “span” to that directive, we’d 
>> round-robin by socket across all nodes before circling around to the next 
>> proc on this node.
>> 
>> HTH
>> Ralph
>> 
>> 
>>> On Nov 30, 2016, at 11:26 AM, David Shrader  wrote:
>>> 
>>> Hello All,
>>> 
>>> The man page for mpirun says that the default ranking procedure is 
>>> round-robin by slot. It doesn't seem to be that straight-forward to me, 
>>> though, and I wanted to ask about the behavior.
>>> 
>>> To help illustrate my confusion, here are a few examples where the ranking 
>>> behavior changed based on the mapping behavior, which doesn't make sense to 
>>> me, yet. First, here is a simp

Re: [OMPI users] Signal propagation in 2.0.1

2016-12-01 Thread r...@open-mpi.org
Yeah, that’s a bug - we’ll have to address it

Thanks
Ralph

> On Nov 28, 2016, at 9:29 AM, Noel Rycroft  wrote:
> 
> I'm seeing different behaviour between Open MPI 1.8.4 and 2.0.1 with regards 
> to signal propagation.
> 
> With version 1.8.4 mpirun seems to propagate SIGTERM to the tasks it starts 
> which enables the tasks to handle SIGTERM.
> 
> In version 2.0.1 mpirun does not seem to propagate SIGTERM and instead I 
> suspect it's sending SIGKILL immediately. Because the child tasks are not 
> given a chance to handle SIGTERM they end up orphaning their child processes.
> 
> I have a pretty simple reproducer which consists of:
> A simple MPI application that sleeps for a number of seconds.
> A simple bash script which launches mpirun.  
> A second bash script which is used to launch a 'child' MPI application 
> 'sleep' binary
> Both scripts launch their children in the background, and 'wait' on 
> completion. They both install signal handlers for SIGTERM.
> 
> When SIGTERM is sent to the top level script it is explicitly propagated to 
> 'mpirun' via the signal handler. 
> 
> In Open MPI 1.8.4 SIGTERM is propagated to the child MPI tasks which in turn 
> explicitly propagate the signal to the child binary processes.
> 
> In Open MPI 2.0.1 I see no evidence that SIGTERM is propagated to the child 
> MPI tasks. Instead those tasks are killed and their children (the application 
> binaries) are orphaned.
> 
> Is the difference in behaviour between the different versions expected..?
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
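
For reference, a minimal sketch of the wrapper-script pattern described in the
report above (the script names are placeholders; the point is the trap that
forwards SIGTERM to the backgrounded mpirun and then waits for it):

#!/bin/bash
# top-level launcher: forward SIGTERM to mpirun instead of exiting immediately
forward_term() { kill -TERM "$child" 2>/dev/null; }
trap forward_term TERM

mpirun -np 4 ./run_child.sh &   # run_child.sh starts the real MPI binary
child=$!
wait "$child"
wait "$child"   # wait again in case the first wait was interrupted by the trap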

Re: [OMPI users] Signal propagation in 2.0.1

2016-12-02 Thread r...@open-mpi.org
Fix is on the way: https://github.com/open-mpi/ompi/pull/2498 
<https://github.com/open-mpi/ompi/pull/2498>

Thanks
Ralph

> On Dec 1, 2016, at 10:49 AM, r...@open-mpi.org wrote:
> 
> Yeah, that’s a bug - we’ll have to address it
> 
> Thanks
> Ralph
> 
>> On Nov 28, 2016, at 9:29 AM, Noel Rycroft > <mailto:noel.rycr...@cd-adapco.com>> wrote:
>> 
>> I'm seeing different behaviour between Open MPI 1.8.4 and 2.0.1 with regards 
>> to signal propagation.
>> 
>> With version 1.8.4 mpirun seems to propagate SIGTERM to the tasks it starts 
>> which enables the tasks to handle SIGTERM.
>> 
>> In version 2.0.1 mpirun does not seem to propagate SIGTERM and instead I 
>> suspect it's sending SIGKILL immediately. Because the child tasks are not 
>> given a chance to handle SIGTERM they end up orphaning their child processes.
>> 
>> I have a pretty simple reproducer which consists of:
>> A simple MPI application that sleeps for a number of seconds.
>> A simple bash script which launches mpirun.  
>> A second bash script which is used to launch a 'child' MPI application 
>> 'sleep' binary
>> Both scripts launch their children in the background, and 'wait' on 
>> completion. They both install signal handlers for SIGTERM.
>> 
>> When SIGTERM is sent to the top level script it is explicitly propagated to 
>> 'mpirun' via the signal handler. 
>> 
>> In Open MPI 1.8.4 SIGTERM is propagated to the child MPI tasks which in turn 
>> explicitly propagate the signal to the child binary processes.
>> 
>> In Open MPI 2.0.1 I see no evidence that SIGTERM is propagated to the child 
>> MPI tasks. Instead those tasks are killed and their children (the 
>> application binaries) are orphaned.
>> 
>> Is the difference in behaviour between the different versions expected..?
>> ___
>> users mailing list
>> users@lists.open-mpi.org <mailto:users@lists.open-mpi.org>
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Abort/ Deadlock issue in allreduce

2016-12-07 Thread r...@open-mpi.org
Hi Christof

Sorry if I missed this, but it sounds like you are saying that one of your 
procs abnormally terminates, and we are failing to kill the remaining job? Is 
that correct?

If so, I just did some work that might relate to that problem that is pending 
in PR #2528: https://github.com/open-mpi/ompi/pull/2528 


Would you be able to try that?

Ralph

> On Dec 7, 2016, at 9:37 AM, Christof Koehler 
>  wrote:
> 
> Hello,
> 
> On Wed, Dec 07, 2016 at 10:19:10AM -0500, Noam Bernstein wrote:
>>> On Dec 7, 2016, at 10:07 AM, Christof Koehler 
>>>  wrote:
 
>>> I really think the hang is a consequence of
>>> unclean termination (in the sense that the non-root ranks are not
>>> terminated) and probably not the cause, in my interpretation of what I
>>> see. Would you have any suggestion to catch signals sent between orterun
>>> (mpirun) and the child tasks ?
>> 
>> Do you know where in the code the termination call is?  Is it actually 
>> calling mpi_abort(), or just doing something ugly like calling fortran 
>> “stop”?  If the latter, would that explain a possible hang?
> Well, basically it tries to use wannier90 (LWANNIER=.TRUE.). The wannier90 
> input contains
> an error, a restart is requested and the wannier90.chk file the restart
> information is missing.
> "
> Exiting...
> Error: restart requested but wannier90.chk file not found
> "
> So it must terminate.
> 
> The termination happens in the libwannier.a, source file io.F90:
> 
> write(stdout,*)  'Exiting...'
> write(stdout, '(1x,a)') trim(error_msg)
> close(stdout)
> stop "wannier90 error: examine the output/error file for details"
> 
> So it calls stop  as you assumed.
> 
>> Presumably someone here can comment on what the standard says about the 
>> validity of terminating without mpi_abort.
> 
> Well, probably stop is not a good way to terminate then.
> 
> My main point was the change relative to 1.10 anyway :-) 
> 
> 
>> 
>> Actually, if you’re willing to share enough input files to reproduce, I 
>> could take a look.  I just recompiled our VASP with openmpi 2.0.1 to fix a 
>> crash that was apparently addressed by some change in the memory allocator 
>> in a recent version of openmpi.  Just e-mail me if that’s the case.
> 
> I think that is no longer necessary? In principle it is no problem, but
> it is at the end of a (small) GW calculation, the Si tutorial example. 
> So the mail would be a bit larger due to the WAVECAR.
> 
> 
>> 
>>  Noam
>> 
>> 
>> 
>> ||
>> |U.S. NAVAL|
>> |_RESEARCH_|
>> LABORATORY
>> Noam Bernstein, Ph.D.
>> Center for Materials Physics and Technology
>> U.S. Naval Research Laboratory
>> T +1 202 404 8628  F +1 202 404 7546
>> https://www.nrl.navy.mil 
> 
> -- 
> Dr. rer. nat. Christof Köhler   email: c.koeh...@bccms.uni-bremen.de
> Universitaet Bremen/ BCCMS  phone:  +49-(0)421-218-62334
> Am Fallturm 1/ TAB/ Raum 3.12   fax: +49-(0)421-218-62770
> 28359 Bremen  
> 
> PGP: http://www.bccms.uni-bremen.de/cms/people/c_koehler/
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] device failed to appear .. Connection timed out

2016-12-08 Thread r...@open-mpi.org
Sounds like something didn’t quite get configured right, or maybe you have a 
library installed that isn’t quite setup correctly, or...

Regardless, we generally advise building from source to avoid such problems. Is 
there some reason not to just do so?

> On Dec 8, 2016, at 6:16 AM, Daniele Tartarini  
> wrote:
> 
> Hi,
> 
> I've installed on a Red Hat 7.2 the OpenMPI distributed via Yum:
> 
> openmpi-devel.x86_64 1.10.3-3.el7  
> 
> any code I try to run (including the mpitests-*) I get the following message 
> with slight variants:
> 
>  my_machine.171619hfi_wait_for_device: The /dev/hfi1_0 device failed 
> to appear after 15.0 seconds: Connection timed out
> 
> Is anyone able to help me in identifying the source of the problem?
> Anyway,  /dev/hfi1_0 doesn't exist.
> 
> If I use an OpenMPI version compiled from source I have no issue (gcc 4.8.5).
> 
> many thanks in advance.
> 
> cheers
> Daniele
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Abort/ Deadlock issue in allreduce

2016-12-08 Thread r...@open-mpi.org
c8c019 b9420bb
>>>> Author: Jeff Squyres >
>>>> Date:   Wed Dec 7 18:24:46 2016 -0500
>>>>Merge pull request #2528 from rhc54/cmr20x/signals
>>>> 
>>>> Unfortunately it changes nothing. The root rank stops and all other
>>>> ranks (and mpirun) just stay, the remaining ranks at 100 % CPU waiting
>>>> apparently in that allreduce. The stack trace looks a bit more
>>>> interesting (git is always debug build ?), so I include it at the very
>>>> bottom just in case.
>>>> 
>>>> Off-list Gilles Gouaillardet suggested to set breakpoints at exit,
>>>> __exit etc. to try to catch signals. Would that be useful ? I need a
>>>> moment to figure out how to do this, but I can definitively try.
>>>> 
>>>> Some remark: During "make install" from the git repo I see a
>>>> 
>>>> WARNING!  Common symbols found:
>>>>  mpi-f08-types.o: 0004 C ompi_f08_mpi_2complex
>>>>  mpi-f08-types.o: 0004 C ompi_f08_mpi_2double_complex
>>>>  mpi-f08-types.o: 0004 C
>>>> ompi_f08_mpi_2double_precision
>>>>  mpi-f08-types.o: 0004 C ompi_f08_mpi_2integer
>>>>  mpi-f08-types.o: 0004 C ompi_f08_mpi_2real
>>>>  mpi-f08-types.o: 0004 C ompi_f08_mpi_aint
>>>>  mpi-f08-types.o: 0004 C ompi_f08_mpi_band
>>>>  mpi-f08-types.o: 0004 C ompi_f08_mpi_bor
>>>>  mpi-f08-types.o: 0004 C ompi_f08_mpi_bxor
>>>>  mpi-f08-types.o: 0004 C ompi_f08_mpi_byte
>>>> 
>>>> I have never noticed this before.
>>>> 
>>>> 
>>>> Best Regards
>>>> 
>>>> Christof
>>>> 
>>>> Thread 1 (Thread 0x2af84cde4840 (LWP 11219)):
>>>> #0  0x2af84e4c669d in poll () from /lib64/libc.so.6
>>>> #1  0x2af850517496 in poll_dispatch () from /cluster/mpi/openmpi/2.0.2/
>>>> intel2016/lib/libopen-pal.so.20
>>>> #2  0x2af85050ffa5 in opal_libevent2022_event_base_loop () from
>>>> /cluster/mpi/openmpi/2.0.2/intel2016/lib/libopen-pal.so.20
>>>> #3  0x2af85049fa1f in opal_progress () at runtime/opal_progress.c:207
>>>> #4  0x2af84e02f7f7 in ompi_request_default_wait_all (count=233618144,
>>>> requests=0x2, statuses=0x0) at ../opal/threads/wait_sync.h:80
>>>> #5  0x2af84e0758a7 in ompi_coll_base_allreduce_intra_recursivedoubling
>>>> (sbuf=0xdecbae0,
>>>> rbuf=0x2, count=0, dtype=0x, op=0x0, comm=0x1,
>>>> module=0xdee69e0) at base/coll_base_allreduce.c:225
>>>> #6  0x2af84e07b747 in ompi_coll_tuned_allreduce_intra_dec_fixed
>>>> (sbuf=0xdecbae0, rbuf=0x2, count=0, dtype=0x, op=0x0,
>>>> comm=0x1, module=0x1) at coll_tuned_decision_fixed.c:66
>>>> #7  0x2af84e03e832 in PMPI_Allreduce (sendbuf=0xdecbae0, recvbuf=0x2,
>>>> count=0, datatype=0x, op=0x0, comm=0x1) at pallreduce.c:107
>>>> #8  0x2af84ddaac90 in ompi_allreduce_f (sendbuf=0xdecbae0 "\005",
>>>> recvbuf=0x2 , count=0x0,
>>>> datatype=0x, op=0x0, comm=0x1, ierr=0x7ffdf3cffe9c) at
>>>> pallreduce_f.c:87
>>>> #9  0x0045ecc6 in m_sum_i_ ()
>>>> #10 0x00e172c9 in mlwf_mp_mlwf_wannier90_ ()
>>>> #11 0x004325ff in vamp () at main.F:2640
>>>> #12 0x0040de1e in main ()
>>>> #13 0x2af84e3fbb15 in __libc_start_main () from /lib64/libc.so.6
>>>> #14 0x0040dd29 in _start ()
>>>> 
>>>> On Wed, Dec 07, 2016 at 09:47:48AM -0800, r...@open-mpi.org 
>>>> wrote:
>>>>> Hi Christof
>>>>> 
>>>>> Sorry if I missed this, but it sounds like you are saying that one of
>>>> your procs abnormally terminates, and we are failing to kill the remaining
>>>> job? Is that correct?
>>>>> 
>>>>> If so, I just did some work that might relate to that problem that is
>>>> pending in PR #2528: https://github.com/open-mpi/ompi/pull/2528 <
>>>> https://github.com/open-mpi/ompi/pull/2528>
>>>>> 
>>>>> Would you be able to try that?
>>>>> 
>>>>> Ralph
>>>>> 
>>>>>> On Dec 7, 2016, at 9:37 AM, Christof Koehler <
>

[OMPI users] Release of OMPI v1.10.5

2016-12-19 Thread r...@open-mpi.org
The Open MPI Team, representing a consortium of research, academic, and 
industry partners, is pleased to announce the release of Open MPI version 
1.10.5.

v1.10.5 is a bug fix release that includes an important performance regression 
fix. All users are encouraged to upgrade to v1.10.5 when possible.  

Version 1.10.5 can be downloaded from the main Open MPI web site

https://www.open-mpi.org/software/ompi/v1.10/ 



NEWS

1.10.5 - 19 Dec 2016
--
- Update UCX APIs
- Fix bug in darray that caused MPI/IO failures
- Use a MPI_Get_library_version() like string to tag the debugger DLL.
  Thanks to Alastair McKinstry for the report
- Fix multi-threaded race condition in coll/libnbc
- Several fixes to OSHMEM
- Fix bug in UCX support due to uninitialized field
- Fix MPI_Ialltoallv with MPI_IN_PLACE and without MPI param check
- Correctly reset receive request type before init. Thanks Chris Pattison
  for the report and test case.
- Fix bug in iallgather[v]
- Fix concurrency issue with MPI_Comm_accept. Thanks to Pieter Noordhuis
  for the patch
- Fix ompi_coll_base_{gather,scatter}_intra_binomial
- Fixed an issue with MPI_Type_get_extent returning the wrong extent
  for distributed array datatypes.
- Re-enable use of rdtsc instruction as a monotonic clock source if
  the processor has a core-invariant tsc. This is a partial fix for a
  performance regression introduced in Open MPI v1.10.3.


Ralph

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] OpenMPI + InfiniBand

2016-12-23 Thread r...@open-mpi.org
Also check to ensure you are using the same version of OMPI on all nodes - this 
message usually means that a different version was used on at least one node.
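
A quick way to check that (a sketch; node1/node2 are placeholders for the hosts in
your hostfile):

$ for h in node1 node2; do ssh $h 'which mpirun; mpirun --version | head -1'; done

Every node should report the same install path and the same Open MPI version; an
empty "which" result is a hint that the remote, non-interactive PATH is not set up.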

> On Dec 23, 2016, at 1:58 AM, gil...@rist.or.jp wrote:
> 
>  Serguei,
> 
>  
> this looks like a very different issue, orted cannot be remotely started.
> 
>  
> that typically occurs if orted cannot find some dependencies
> 
> (the Open MPI libs and/or the compiler runtime)
> 
>  
> for example, from a node, ssh  orted should not fail because of 
> unresolved dependencies.
> 
> a simple trick is to replace
> 
> mpirun ...
> 
> with
> 
> `which mpirun` ...
> 
>  
> a better option (as long as you do not plan to relocate Open MPI install dir) 
> is to configure with
> 
> --enable-mpirun-prefix-by-default
> 
>  
> Cheers,
> 
>  
> Gilles
> 
> - Original Message -
> 
> Hi All !
> As there have been no positive changes with the "UDSM + IPoIB" problem since my 
> previous post, 
> we installed IPoIB on the cluster and "No OpenFabrics connection..." error 
> doesn't appear more.
> But now OpenMPI reports about another problem:
> 
> In app ERROR OUTPUT stream:
> 
> [node2:14142] [[37935,0],0] ORTE_ERROR_LOG: Data unpack had inadequate space 
> in file base/plm_base_launch_support.c at line 1035
> 
> In app OUTPUT stream:
> 
> --
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
> 
> * not finding the required libraries and/or binaries on
>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>   settings, or configure OMPI with --enable-orterun-prefix-by-default
> 
> * lack of authority to execute on one or more specified nodes.
>   Please verify your allocation and authorities.
> 
> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
>   Please check with your sys admin to determine the correct location to use.
> 
> *  compilation of the orted with dynamic libraries when static are required
>   (e.g., on Cray). Please check your configure cmd line and consider using
>   one of the contrib/platform definitions for your system type.
> 
> * an inability to create a connection back to mpirun due to a
>   lack of common network interfaces and/or no route found between
>   them. Please check network connectivity (including firewalls
>   and network routing requirements).
> --
> 
> When I'm trying to run the task using single node - all works properly.
> But when I specify "run on 2 nodes", the problem appears.
> 
> I tried to run ping using IPoIB addresses and all hosts are resolved 
> properly, 
> ping requests and replies are going over IB without any problems.
> So all nodes (including head) see each other via IPoIB.
> But MPI app fails.
> 
> Same test task works perfect on all nodes being run with Ethernet transport 
> instead of InfiniBand.
> 
> P.S. We use Torque resource manager to enqueue MPI tasks.
> 
> Best regards,
> Sergei.
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] still segmentation fault with openmpi-2.0.2rc3 on Linux

2017-01-10 Thread r...@open-mpi.org
I think there is some relevant discussion here: 
https://github.com/open-mpi/ompi/issues/1569 


It looks like Gilles had (at least at one point) a fix for master when 
--enable-heterogeneous is used, but I don’t know if that was committed.

> On Jan 9, 2017, at 8:23 AM, Howard Pritchard  wrote:
> 
> HI Siegmar,
> 
> You have some config parameters I wasn't trying that may have some impact.
> I'll give it a try with these parameters.
> 
> This should be enough info for now,
> 
> Thanks,
> 
> Howard
> 
> 
> 2017-01-09 0:59 GMT-07:00 Siegmar Gross  >:
> Hi Howard,
> 
> I use the following commands to build and install the package.
> ${SYSTEM_ENV} is "Linux" and ${MACHINE_ENV} is "x86_64" for my
> Linux machine.
> 
> mkdir openmpi-2.0.2rc3-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
> cd openmpi-2.0.2rc3-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
> 
> ../openmpi-2.0.2rc3/configure \
>   --prefix=/usr/local/openmpi-2.0.2_64_cc \
>   --libdir=/usr/local/openmpi-2.0.2_64_cc/lib64 \
>   --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \
>   --with-jdk-headers=/usr/local/jdk1.8.0_66/include \
>   JAVA_HOME=/usr/local/jdk1.8.0_66 \
>   LDFLAGS="-m64 -mt -Wl,-z -Wl,noexecstack" CC="cc" CXX="CC" FC="f95" \
>   CFLAGS="-m64 -mt" CXXFLAGS="-m64" FCFLAGS="-m64" \
>   CPP="cpp" CXXCPP="cpp" \
>   --enable-mpi-cxx \
>   --enable-mpi-cxx-bindings \
>   --enable-cxx-exceptions \
>   --enable-mpi-java \
>   --enable-heterogeneous \
>   --enable-mpi-thread-multiple \
>   --with-hwloc=internal \
>   --without-verbs \
>   --with-wrapper-cflags="-m64 -mt" \
>   --with-wrapper-cxxflags="-m64" \
>   --with-wrapper-fcflags="-m64" \
>   --with-wrapper-ldflags="-mt" \
>   --enable-debug \
>   |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc
> 
> make |& tee log.make.$SYSTEM_ENV.$MACHINE_ENV.64_cc
> rm -r /usr/local/openmpi-2.0.2_64_cc.old
> mv /usr/local/openmpi-2.0.2_64_cc /usr/local/openmpi-2.0.2_64_cc.old
> make install |& tee log.make-install.$SYSTEM_ENV.$MACHINE_ENV.64_cc
> make check |& tee log.make-check.$SYSTEM_ENV.$MACHINE_ENV.64_cc
> 
> 
> I get a different error if I run the program with gdb.
> 
> loki spawn 118 gdb /usr/local/openmpi-2.0.2_64_cc/bin/mpiexec
> GNU gdb (GDB; SUSE Linux Enterprise 12) 7.11.1
> Copyright (C) 2016 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later  >
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-suse-linux".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> >.
> Find the GDB manual and other documentation resources online at:
>  >.
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from /usr/local/openmpi-2.0.2_64_cc/bin/mpiexec...done.
> (gdb) r -np 1 --host loki --slot-list 0:0-5,1:0-5 spawn_master
> Starting program: /usr/local/openmpi-2.0.2_64_cc/bin/mpiexec -np 1 --host 
> loki --slot-list 0:0-5,1:0-5 spawn_master
> Missing separate debuginfos, use: zypper install 
> glibc-debuginfo-2.24-2.3.x86_64
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [New Thread 0x73b97700 (LWP 13582)]
> [New Thread 0x718a4700 (LWP 13583)]
> [New Thread 0x710a3700 (LWP 13584)]
> [New Thread 0x7fffebbba700 (LWP 13585)]
> Detaching after fork from child process 13586.
> 
> Parent process 0 running on loki
>   I create 4 slave processes
> 
> Detaching after fork from child process 13589.
> Detaching after fork from child process 13590.
> Detaching after fork from child process 13591.
> [loki:13586] OPAL ERROR: Timeout in file 
> ../../../../openmpi-2.0.2rc3/opal/mca/pmix/base/pmix_base_fns.c at line 193
> [loki:13586] *** An error occurred in MPI_Comm_spawn
> [loki:13586] *** reported by process [2873294849,0]
> [loki:13586] *** on communicator MPI_COMM_WORLD
> [loki:13586] *** MPI_ERR_UNKNOWN: unknown error
> [loki:13586] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
> now abort,
> [loki:13586] ***and potentially your MPI job)
> [Thread 0x7fffebbba700 (LWP 13585) exited]
> [Thread 0x710a3700 (LWP 13584) exited]
> [Thread 0x718a4700 (LWP 13583) exited]
> [Thread 0x73b97700 (LWP 13582) exited]
> [Inferior 1 (process 13567) exited with code 016]
> Missing separate debuginfos, use: zypper install 
> libpciaccess0-debuginfo-0.13.2-5.1.x86_64 
> libudev1-debuginfo-210-116.3.3.x86_64
> (gdb) bt
> No stack.
> (gdb)
> 
> Do you need anything else?
> 
> 
> Kind regards
> 
> Siegmar
> 
> Am 08.01.2017 um 17:02 schrieb Howard Pritchard:
>

Re: [OMPI users] Startup limited to 128 remote hosts in some situations?

2017-01-17 Thread r...@open-mpi.org
As I recall, the problem was that qrsh isn’t available on the backend compute 
nodes, and so we can’t use a tree for launch. If that isn’t true, then we can 
certainly adjust it.

> On Jan 17, 2017, at 9:37 AM, Mark Dixon  wrote:
> 
> Hi,
> 
> While commissioning a new cluster, I wanted to run HPL across the whole thing 
> using openmpi 2.0.1.
> 
> I couldn't get it to start on more than 129 hosts under Son of Gridengine 
> (128 remote plus the localhost running the mpirun command). openmpi would sit 
> there, waiting for all the orted's to check in; however, there were "only" a 
> maximum of 128 qrsh processes, therefore a maximum of 128 orted's, therefore 
> waiting a long time.
> 
> Increasing plm_rsh_num_concurrent beyond the default of 128 gets the job to 
> launch.
> 
> Is this intentional, please?
> 
> Doesn't openmpi use a tree-like startup sometimes - any particular reason 
> it's not using it here?
> 
> Cheers,
> 
> Mark
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Startup limited to 128 remote hosts in some situations?

2017-01-19 Thread r...@open-mpi.org
I’ll create a patch that you can try - if it works okay, we can commit it

> On Jan 18, 2017, at 3:29 AM, William Hay  wrote:
> 
> On Tue, Jan 17, 2017 at 09:56:54AM -0800, r...@open-mpi.org wrote:
>> As I recall, the problem was that qrsh isn’t available on the backend 
>> compute nodes, and so we can’t use a tree for launch. If that isn’t 
>> true, then we can certainly adjust it.
>> 
> qrsh should be available on all nodes of a SoGE cluster but, depending on how 
> things are set up, may not be 
> findable (ie not in the PATH) when you qrsh -inherit into a node.  A 
> workaround would be to start backend 
> processes with qrsh -inherit -v PATH which will copy the PATH from the master 
> node to the slave node 
> process or otherwise pass the location of qrsh from one node or another.  
> That of course assumes that 
> qrsh is in the same location on all nodes.
> 
> I've tested that it is possible to qrsh from the head node of a job to a 
> slave node and then on to
> another slave node by this method.
> 
> William
> 
> 
>>> On Jan 17, 2017, at 9:37 AM, Mark Dixon  wrote:
>>> 
>>> Hi,
>>> 
>>> While commissioning a new cluster, I wanted to run HPL across the whole 
>>> thing using openmpi 2.0.1.
>>> 
>>> I couldn't get it to start on more than 129 hosts under Son of Gridengine 
>>> (128 remote plus the localhost running the mpirun command). openmpi would 
>>> sit there, waiting for all the orted's to check in; however, there were 
>>> "only" a maximum of 128 qrsh processes, therefore a maximum of 128 orted's, 
>>> therefore waiting a long time.
>>> 
>>> Increasing plm_rsh_num_concurrent beyond the default of 128 gets the job to 
>>> launch.
>>> 
>>> Is this intentional, please?
>>> 
>>> Doesn't openmpi use a tree-like startup sometimes - any particular reason 
>>> it's not using it here?
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Startup limited to 128 remote hosts in some situations?

2017-01-20 Thread r...@open-mpi.org
Well, it appears we are already forwarding all envars, which should include 
PATH. Here is the qrsh command line we use:

“qrsh --inherit --nostdin -V”

So would you please try the following patch:

diff --git a/orte/mca/plm/rsh/plm_rsh_component.c b/orte/mca/plm/rsh/plm_rsh_component.c
index 0183bcc..1cc5aa4 100644
--- a/orte/mca/plm/rsh/plm_rsh_component.c
+++ b/orte/mca/plm/rsh/plm_rsh_component.c
@@ -288,8 +288,6 @@ static int rsh_component_query(mca_base_module_t **module, int *priority)
 }
 mca_plm_rsh_component.agent = tmp;
 mca_plm_rsh_component.using_qrsh = true;
-/* no tree spawn allowed under qrsh */
-mca_plm_rsh_component.no_tree_spawn = true;
 goto success;
 } else if (!mca_plm_rsh_component.disable_llspawn &&
NULL != getenv("LOADL_STEP_ID")) {


> On Jan 19, 2017, at 5:29 PM, r...@open-mpi.org wrote:
> 
> I’ll create a patch that you can try - if it works okay, we can commit it
> 
>> On Jan 18, 2017, at 3:29 AM, William Hay  wrote:
>> 
>> On Tue, Jan 17, 2017 at 09:56:54AM -0800, r...@open-mpi.org wrote:
>>> As I recall, the problem was that qrsh isn’t available on the backend 
>>> compute nodes, and so we can’t use a tree for launch. If that isn’t 
>>> true, then we can certainly adjust it.
>>> 
>> qrsh should be available on all nodes of a SoGE cluster but, depending on 
>> how things are set up, may not be 
>> findable (ie not in the PATH) when you qrsh -inherit into a node.  A 
>> workaround would be to start backend 
>> processes with qrsh -inherit -v PATH which will copy the PATH from the 
>> master node to the slave node 
>> process or otherwise pass the location of qrsh from one node or another.  
>> That of course assumes that 
>> qrsh is in the same location on all nodes.
>> 
>> I've tested that it is possible to qrsh from the head node of a job to a 
>> slave node and then on to
>> another slave node by this method.
>> 
>> William
>> 
>> 
>>>> On Jan 17, 2017, at 9:37 AM, Mark Dixon  wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> While commissioning a new cluster, I wanted to run HPL across the whole 
>>>> thing using openmpi 2.0.1.
>>>> 
>>>> I couldn't get it to start on more than 129 hosts under Son of Gridengine 
>>>> (128 remote plus the localhost running the mpirun command). openmpi would 
>>>> sit there, waiting for all the orted's to check in; however, there were 
>>>> "only" a maximum of 128 qrsh processes, therefore a maximum of 128 
>>>> orted's, therefore waiting a lng time.
>>>> 
>>>> Increasing plm_rsh_num_concurrent beyond the default of 128 gets the job 
>>>> to launch.
>>>> 
>>>> Is this intentional, please?
>>>> 
>>>> Doesn't openmpi use a tree-like startup sometimes - any particular reason 
>>>> it's not using it here?
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] MPI_Comm_spawn question

2017-01-31 Thread r...@open-mpi.org
What version of OMPI are you using?

> On Jan 31, 2017, at 7:33 AM, elistrato...@info.sgu.ru wrote:
> 
> Hi,
> 
> I am trying to write trivial master-slave program. Master simply creates
> slaves, sends them a string, they print it out and exit. Everything works
> just fine, however, when I add a delay (more than 2 sec) before calling
> MPI_Init on slave, MPI fails with MPI_ERR_SPAWN. I am pretty sure that
> MPI_Comm_spawn has some kind of timeout on waiting for slaves to call
> MPI_Init, and if they fail to respond in time, it returns an error.
> 
> I believe there is a way to change this behaviour, but I wasn't able to
> find any suggestions/ideas in the internet.
> I would appreciate if someone could help with this.
> 
> ---
> --- terminal command I use to run the program:
> mpirun -n 1 hello 2 2 // the first argument to "hello" is number of
> slaves, the second is delay in seconds
> 
> --- Error message I get when delay is >=2 sec:
> [host:2231] *** An error occurred in MPI_Comm_spawn
> [host:2231] *** reported by process [3453419521,0]
> [host:2231] *** on communicator MPI_COMM_SELF
> [host:2231] *** MPI_ERR_SPAWN: could not spawn processes
> [host:2231] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will
> now abort,
> [host:2231] ***and potentially your MPI job)
> 
> --- The program itself:
> #include "stdlib.h"
> #include "stdio.h"
> #include "mpi.h"
> #include "unistd.h"
> 
> MPI_Comm slave_comm;
> MPI_Comm new_world;
> #define MESSAGE_SIZE 40
> 
> void slave() {
>   printf("Slave initialized; ");
>   MPI_Comm_get_parent(&slave_comm);
>   MPI_Intercomm_merge(slave_comm, 1, &new_world);
> 
>   int slave_rank;
>   MPI_Comm_rank(new_world, &slave_rank);
> 
>   char message[MESSAGE_SIZE];
>   MPI_Bcast(message, MESSAGE_SIZE, MPI_CHAR, 0, new_world);
> 
>   printf("Slave %d received message from master: %s\n", slave_rank, 
> message);
> }
> 
> void master(int slave_count, char* executable, char* delay) {
>   char* slave_argv[] = { delay, NULL };
>   MPI_Comm_spawn( executable,
>   slave_argv,
>   slave_count,
>   MPI_INFO_NULL,
>   0,
>   MPI_COMM_SELF,
>   &slave_comm,
>   MPI_ERRCODES_IGNORE);
>   MPI_Intercomm_merge(slave_comm, 0, &new_world);
>   char helloWorld[MESSAGE_SIZE] = "Hello New World!";
>   MPI_Bcast(helloWorld, MESSAGE_SIZE, MPI_CHAR, 0, new_world);
>   printf("Processes spawned!\n");
> }
> 
> int main(int argc, char* argv[]) {
>   if (argc > 2) {
>   MPI_Init(&argc, &argv);
>   master(atoi(argv[1]), argv[0], argv[2]);
>   } else {
>   sleep(atoi(argv[1])); /// delay
>   MPI_Init(&argc, &argv);
>   slave();
>   }
>   MPI_Comm_free(&new_world);
>   MPI_Comm_free(&slave_comm);
>   MPI_Finalize();
> }
> 
> 
> Thank you,
> 
> Andrew Elistratov
> 
> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] Performance Issues on SMP Workstation

2017-02-01 Thread r...@open-mpi.org
Simple test: replace your executable with “hostname”. If you see multiple hosts 
come out on your cluster, then you know why the performance is different.
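
For example:

$ mpirun -np 20 hostname | sort | uniq -c

A single hostname with a count of 20 means all ranks landed on one node; several
different hostnames means the processes were spread across nodes.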

> On Feb 1, 2017, at 2:46 PM, Andy Witzig  wrote:
> 
> Honestly, I’m not exactly sure what scheme is being used.  I am using the 
> default template from Penguin Computing for job submission.  It looks like:
> 
> #PBS -S /bin/bash
> #PBS -q T30
> #PBS -l walltime=24:00:00,nodes=1:ppn=20
> #PBS -j oe
> #PBS -N test
> #PBS -r n
> 
> mpirun $EXECUTABLE $INPUT_FILE
> 
> I’m not configuring OpenMPI anywhere else. It is possible the Penguin 
> Computing folks have pre-configured my MPI environment.  I’ll see what I can 
> find.
> 
> Best regards,
> Andy
> 
> On Feb 1, 2017, at 4:32 PM, Douglas L Reeder  > wrote:
> 
> Andy,
> 
> What allocation scheme are you using on the cluster? For some codes we see 
> noticeable differences using fillup vs round robin, not 4x though. Fillup is 
> more shared-memory use while round robin uses more InfiniBand.
> 
> Doug
>> On Feb 1, 2017, at 3:25 PM, Andy Witzig > > wrote:
>> 
>> Hi Tom,
>> 
>> The cluster uses an Infiniband interconnect.  On the cluster I’m requesting: 
>> #PBS -l walltime=24:00:00,nodes=1:ppn=20.  So technically, the run on the 
>> cluster should be SMP on the node, since there are 20 cores/node.  On the 
>> workstation I’m just using the command: mpirun -np 20 …. I haven’t finished 
>> setting Torque/PBS up yet.
>> 
>> Best regards,
>> Andy
>> 
>> On Feb 1, 2017, at 4:10 PM, Elken, Tom > > wrote:
>> 
>> For this case:  " a cluster system with 2.6GHz Intel Haswell with 20 cores / 
>> node and 128GB RAM/node.  "
>> 
>> are you running 5 ranks per node on 4 nodes?
>> What interconnect are you using for the cluster?
>> 
>> -Tom
>> 
>>> -Original Message-
>>> From: users [mailto:users-boun...@lists.open-mpi.org 
>>> ] On Behalf Of Andrew
>>> Witzig
>>> Sent: Wednesday, February 01, 2017 1:37 PM
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] Performance Issues on SMP Workstation
>>> 
>>> By the way, the workstation has a total of 36 cores / 72 threads, so using 
>>> mpirun
>>> -np 20 is possible (and should be equivalent) on both platforms.
>>> 
>>> Thanks,
>>> cap79
>>> 
 On Feb 1, 2017, at 2:52 PM, Andy Witzig >>> > wrote:
 
 Hi all,
 
 I’m testing my application on a SMP workstation (dual Intel Xeon E5-2697 V4
>>> 2.3 GHz Intel Broadwell (boost 2.8-3.1GHz) processors 128GB RAM) and am
>>> seeing a 4x performance drop compared to a cluster system with 2.6GHz Intel
>>> Haswell with 20 cores / node and 128GB RAM/node.  Both applications have
>>> been compiled using OpenMPI 1.6.4.  I have tried running:
 
 mpirun -np 20 $EXECUTABLE $INPUT_FILE
 mpirun -np 20 --mca btl self,sm $EXECUTABLE $INPUT_FILE
 
 and others, but cannot achieve the same performance on the workstation as 
 is
>>> seen on the cluster.  The workstation outperforms on other non-MPI but 
>>> multi-
>>> threaded applications, so I don’t think it’s a hardware issue.
 
 Any help you can provide would be appreciated.
 
 Thanks,
 cap79
 ___
 users mailing list
 users@lists.open-mpi.org 
 https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>>> ___
>>> users mailing list
>>> users@lists.open-mpi.org 
>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>> ___
>> users mailing list
>> users@lists.open-mpi.org 
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>> 
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> 
> ___
> users mailing list
> users@lists.open-mpi.org 
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-03 Thread r...@open-mpi.org
I do see a diff between 2.0.1 and 2.0.2 that might have a related impact. The 
way we handled the MCA param that specifies the launch agent (ssh, rsh, or 
whatever) was modified, and I don’t think the change is correct. It basically 
says that we don’t look for qrsh unless the MCA param has been changed from the 
coded default, which means we are not detecting SGE by default.

Try setting "-mca plm_rsh_agent foo" on your cmd line - that will get past the 
test, and then we should auto-detect SGE again
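
i.e. something along these lines (a sketch; ./a.out and the rank count are
placeholders, and the value "foo" is deliberately meaningless - it only has to
differ from the built-in default so the qrsh check runs):

$ mpirun -mca plm_rsh_agent foo -np 16 ./a.out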


> On Feb 3, 2017, at 8:49 AM, Mark Dixon  wrote:
> 
> On Fri, 3 Feb 2017, Reuti wrote:
> ...
>> SGE on its own is not configured to use SSH? (I mean the entries in `qconf 
>> -sconf` for rsh_command resp. daemon).
> ...
> 
> Nope, everything left as the default:
> 
> $ qconf -sconf | grep _command
> qlogin_command   builtin
> rlogin_command   builtin
> rsh_command  builtin
> 
> I have 2.0.1 and 2.0.2 installed side by side. 2.0.1 is happy but 2.0.2 isn't.
> 
> I'll start digging, but I'd appreciate hearing from any other SGE user who 
> had tried 2.0.2 and tell me if it had worked for them, please? :)
> 
> Cheers,
> 
> Mark
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-03 Thread r...@open-mpi.org
I don’t think so - at least, that isn’t the code I was looking at.

> On Feb 3, 2017, at 9:43 AM, Glenn Johnson  wrote:
> 
> Is this the same issue that was previously fixed in PR-1960?
> 
> https://github.com/open-mpi/ompi/pull/1960/files 
> <https://github.com/open-mpi/ompi/pull/1960/files>
> 
> 
> Glenn
> 
> On Fri, Feb 3, 2017 at 10:56 AM, r...@open-mpi.org <mailto:r...@open-mpi.org> 
> mailto:r...@open-mpi.org>> wrote:
> I do see a diff between 2.0.1 and 2.0.2 that might have a related impact. The 
> way we handled the MCA param that specifies the launch agent (ssh, rsh, or 
> whatever) was modified, and I don’t think the change is correct. It basically 
> says that we don’t look for qrsh unless the MCA param has been changed from 
> the coded default, which means we are not detecting SGE by default.
> 
> Try setting "-mca plm_rsh_agent foo" on your cmd line - that will get past 
> the test, and then we should auto-detect SGE again
> 
> 
> > On Feb 3, 2017, at 8:49 AM, Mark Dixon <m.c.di...@leeds.ac.uk> wrote:
> >
> > On Fri, 3 Feb 2017, Reuti wrote:
> > ...
> >> SGE on its own is not configured to use SSH? (I mean the entries in `qconf 
> >> -sconf` for rsh_command resp. daemon).
> > ...
> >
> > Nope, everything left as the default:
> >
> > $ qconf -sconf | grep _command
> > qlogin_command   builtin
> > rlogin_command   builtin
> > rsh_command  builtin
> >
> > I have 2.0.1 and 2.0.2 installed side by side. 2.0.1 is happy but 2.0.2 
> > isn't.
> >
> > I'll start digging, but I'd appreciate hearing from any other SGE user who 
> > had tried 2.0.2 and tell me if it had worked for them, please? :)
> >
> > Cheers,
> >
> > Mark
> > ___
> > users mailing list
> > users@lists.open-mpi.org
> > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] MPI_Comm_spawn question

2017-02-03 Thread r...@open-mpi.org
We know v2.0.1 has problems with comm_spawn, and so you may be encountering one 
of those. Regardless, there is indeed a timeout mechanism in there. It was 
added because people would execute a comm_spawn, and then would hang and eat up 
their entire allocation time for nothing.

In v2.0.2, I see it is still hardwired at 60 seconds. I believe we eventually 
realized we needed to make that a variable, but it didn’t get into the 2.0.2 
release.
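For reference, the pattern that runs into this timeout is the plain dynamic-spawn
call sketched below (the worker binary and process count are placeholders, not the
poster's actual test program); the 60-second limit applies while this call is
completing the connect/accept handshake with the children:

   /* parent.c - minimal spawn sketch */
   #include <mpi.h>

   int main(int argc, char **argv)
   {
       MPI_Comm children;
       MPI_Init(&argc, &argv);
       /* spawn 4 copies of ./worker and get back an intercommunicator */
       MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                      0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);
       /* ... exchange data with the children over 'children' ... */
       MPI_Comm_disconnect(&children);
       MPI_Finalize();
       return 0;
   }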


> On Feb 1, 2017, at 1:00 AM, elistrato...@info.sgu.ru wrote:
> 
> I am using Open MPI version 2.0.1.
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-12 Thread r...@open-mpi.org
Yeah, I’ll fix it this week. The problem is that you can’t check the source as 
being default as the default is ssh - so the only way to get the current code 
to check for qrsh is to specify something other than the default ssh (it 
doesn’t matter what you specify - anything will get you past the erroneous 
check so you look for qrsh).


> On Feb 9, 2017, at 3:21 PM, Jeff Squyres (jsquyres)  
> wrote:
> 
> Yes, we can get it fixed.
> 
> Ralph is unavailable this week; I don't know offhand what he meant by his 
> prior remarks.  It's possible that 
> https://github.com/open-mpi/ompi/commit/71ec5cfb436977ea9ad409ba634d27e6addf6fae;
>  can you try changing the "!=" on line to be "=="?  I.e., from
> 
> if (MCA_BASE_VAR_SOURCE_DEFAULT != source) {
> 
> to
> 
> if (MCA_BASE_VAR_SOURCE_DEFAULT == source) {
> 
> I filed https://github.com/open-mpi/ompi/issues/2947 to track the issue.
> 
> 
>> On Feb 9, 2017, at 6:01 PM, Glenn Johnson  wrote:
>> 
>> Will this be fixed in the 2.0.3 release?
>> 
>> Thanks.
>> 
>> 
>> Glenn
>> 
>> On Mon, Feb 6, 2017 at 10:45 AM, Mark Dixon  wrote:
>> On Mon, 6 Feb 2017, Mark Dixon wrote:
>> ...
>> Ah-ha! "-mca plm_rsh_agent foo" fixes it!
>> 
>> Thanks very much - presumably I can stick that in the system-wide 
>> openmpi-mca-params.conf for now.
>> ...
>> 
>> Except if I do that, it means running ompi outside of the SGE environment no 
>> longer works :(
>> 
>> Should I just revoke the following commit?
>> 
>> Cheers,
>> 
>> Mark
>> 
>> commit d51c2af76b0c011177aca8e08a5a5fcf9f5e67db
>> Author: Jeff Squyres 
>> Date:   Tue Aug 16 06:58:20 2016 -0500
>> 
>>rsh: robustify the check for plm_rsh_agent default value
>> 
>>Don't strcmp against the default value -- the default value may change
>>over time.  Instead, check to see if the MCA var source is not
>>DEFAULT.
>> 
>>Signed-off-by: Jeff Squyres 
>> 
>>(cherry picked from commit 
>> open-mpi/ompi@71ec5cfb436977ea9ad409ba634d27e6addf6fae)
>> 
>> 
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>> 
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Is gridengine integration broken in openmpi 2.0.2?

2017-02-13 Thread r...@open-mpi.org
I dug into this further, and the simplest solution for now is to simply do one 
of the following:

* replace the “!=“ with “==“ in the test, as Jeff indicated; or

* revert the commit Mark identified

Both options will restore the original logic. Given that someone already got it 
wrong, I have clarified the logic in the OMPI master repo. However, I don’t 
know how long it will be before a 2.0.3 release is issued, so GridEngine users 
might want to locally fix things in the interim.
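For a local interim fix against the 2.0.2 tarball, something along these lines
should do it (the file path is where the check lives in my tree - double-check it
against your source before rebuilding):

   cd openmpi-2.0.2
   # flip the erroneous test back to the original logic
   sed -i 's/MCA_BASE_VAR_SOURCE_DEFAULT != source/MCA_BASE_VAR_SOURCE_DEFAULT == source/' \
       orte/mca/plm/rsh/plm_rsh_component.c
   # then re-run make && make install in your build tree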


> On Feb 12, 2017, at 1:52 PM, r...@open-mpi.org wrote:
> 
> Yeah, I’ll fix it this week. The problem is that you can’t check the source 
> as being default as the default is ssh - so the only way to get the current 
> code to check for qrsh is to specify something other than the default ssh (it 
> doesn’t matter what you specify - anything will get you past the erroneous 
> check so you look for qrsh).
> 
> 
>> On Feb 9, 2017, at 3:21 PM, Jeff Squyres (jsquyres)  
>> wrote:
>> 
>> Yes, we can get it fixed.
>> 
>> Ralph is unavailable this week; I don't know offhand what he meant by his 
>> prior remarks.  It's possible that 
>> https://github.com/open-mpi/ompi/commit/71ec5cfb436977ea9ad409ba634d27e6addf6fae;
>>  can you try changing the "!=" on line to be "=="?  I.e., from
>> 
>> if (MCA_BASE_VAR_SOURCE_DEFAULT != source) {
>> 
>> to
>> 
>> if (MCA_BASE_VAR_SOURCE_DEFAULT == source) {
>> 
>> I filed https://github.com/open-mpi/ompi/issues/2947 to track the issue.
>> 
>> 
>>> On Feb 9, 2017, at 6:01 PM, Glenn Johnson  wrote:
>>> 
>>> Will this be fixed in the 2.0.3 release?
>>> 
>>> Thanks.
>>> 
>>> 
>>> Glenn
>>> 
>>> On Mon, Feb 6, 2017 at 10:45 AM, Mark Dixon  wrote:
>>> On Mon, 6 Feb 2017, Mark Dixon wrote:
>>> ...
>>> Ah-ha! "-mca plm_rsh_agent foo" fixes it!
>>> 
>>> Thanks very much - presumably I can stick that in the system-wide 
>>> openmpi-mca-params.conf for now.
>>> ...
>>> 
>>> Except if I do that, it means running ompi outside of the SGE environment 
>>> no longer works :(
>>> 
>>> Should I just revoke the following commit?
>>> 
>>> Cheers,
>>> 
>>> Mark
>>> 
>>> commit d51c2af76b0c011177aca8e08a5a5fcf9f5e67db
>>> Author: Jeff Squyres 
>>> Date:   Tue Aug 16 06:58:20 2016 -0500
>>> 
>>>   rsh: robustify the check for plm_rsh_agent default value
>>> 
>>>   Don't strcmp against the default value -- the default value may change
>>>   over time.  Instead, check to see if the MCA var source is not
>>>   DEFAULT.
>>> 
>>>   Signed-off-by: Jeff Squyres 
>>> 
>>>   (cherry picked from commit 
>>> open-mpi/ompi@71ec5cfb436977ea9ad409ba634d27e6addf6fae)
>>> 
>>> 
>>> ___
>>> users mailing list
>>> users@lists.open-mpi.org
>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>>> 
>>> ___
>>> users mailing list
>>> users@lists.open-mpi.org
>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>> 
>> 
>> -- 
>> Jeff Squyres
>> jsquy...@cisco.com
>> 
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] "-map-by socket:PE=1" doesn't do what I expect

2017-02-15 Thread r...@open-mpi.org
Ah, yes - I know what the problem is. We weren’t expecting a PE value of 1 - 
the logic is looking expressly for values > 1 as we hadn’t anticipated this 
use-case.

I can make that change. I’m off to a workshop for the next day or so, but can 
probably do this on the plane.


> On Feb 15, 2017, at 3:17 AM, Mark Dixon  wrote:
> 
> Hi,
> 
> When combining OpenMPI 2.0.2 with OpenMP, I'm interested in launching a 
> number of ranks and allocating a number of cores to each rank. Using "-map-by 
> socket:PE=", switching to "-map-by node:PE=" if I want to allocate 
> more than a single socket to a rank, seems to do what I want.
> 
> Except for "-map-by socket:PE=1". That seems to allocate an entire socket to 
> each rank instead of a single core. Here's the output of a test program on a 
> dual socket non-hyperthreading system that reports rank core bindings (odd 
> cores on one socket, even on the other):
> 
>   $ mpirun -np 2 -map-by socket:PE=1 ./report_binding
>   Rank 0 bound somehost.somewhere:  0 2 4 6 8 10 12 14 16 18 20 22
>   Rank 1 bound somehost.somewhere:  1 3 5 7 9 11 13 15 17 19 21 23
> 
>   $ mpirun -np 2 -map-by socket:PE=2 ./report_binding
>   Rank 0 bound somehost.somewhere:  0 2
>   Rank 1 bound somehost.somewhere:  1 3
> 
>   $ mpirun -np 2 -map-by socket:PE=3 ./report_binding
>   Rank 0 bound somehost.somewhere:  0 2 4
>   Rank 1 bound somehost.somewhere:  1 3 5
> 
>   $ mpirun -np 2 -map-by socket:PE=4 ./report_binding
>   Rank 0 bound somehost.somewhere:  0 2 4 6
>   Rank 1 bound somehost.somewhere:  1 3 5 7
> 
> I get the same result if I change "socket" to "numa". Changing "socket" to 
> either "core", "node" or "slot" binds each rank to a single core (good), but 
> doesn't round-robin ranks across sockets like "socket" does (bad).
> 
> Is "-map-by socket:PE=1" doing the right thing, please? I tried reading the 
> man page but I couldn't work out what the expected behaviour is :o
> 
> Cheers,
> 
> Mark
> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] "-map-by socket:PE=1" doesn't do what I expect

2017-02-15 Thread r...@open-mpi.org

> On Feb 15, 2017, at 5:45 AM, Mark Dixon  wrote:
> 
> On Wed, 15 Feb 2017, r...@open-mpi.org wrote:
> 
>> Ah, yes - I know what the problem is. We weren’t expecting a PE value of 1 - 
>> the logic is looking expressly for values > 1 as we hadn’t anticipated this 
>> use-case.
> 
> Is it a sensible use-case, or am I crazy?

Not crazy, I’d say. The expected way of doing it would be “--map-by socket 
--bind-to core”. However, I can see why someone might expect pe=1 to work.
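In other words, until PE=1 is handled, the equivalent invocation for your test
would be something like (program name taken from your example):

   $ mpirun -np 2 --map-by socket --bind-to core ./report_binding

which should give each rank a single core, round-robined across the two sockets.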

> 
>> I can make that change. I’m off to a workshop for the next day or so, but 
>> can probably do this on the plane.
> 
> You're a star - thanks :)
> 
> Mark___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Specify the core binding when spawning a process

2017-02-15 Thread r...@open-mpi.org
Sorry for slow response - was away for awhile. What version of OMPI are you 
using?
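In the meantime, a rough sketch of steering placement through the MPI_Info argument
looks like this (the hostname and worker binary are placeholders; the reserved
"host" key only pins the node - whether finer-grained placement keys are honored
depends on the version, which is why I'm asking):

   MPI_Comm child;
   MPI_Info info;
   MPI_Info_create(&info);
   /* ask for the child to be started on a specific node */
   MPI_Info_set(info, "host", "node01");
   MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, info,
                  0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
   MPI_Info_free(&info);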


> On Feb 8, 2017, at 1:59 PM, Allan Ma  wrote:
> 
> Hello,
> 
> I'm designing a program on a dual socket system that needs the parent process 
> and spawned child process to be at least running on (or bound to) the cores 
> of the same socket in the same node.
> 
> I wonder if anyone knows how to specify the core binding or socket binding 
> when spawning a single process using MPI_COMM_Spawn. 
> 
> Currently I tried using the setting key 'host' in mpiinfo when passing it to 
> Spawn and it appears to be working, but I don't know how to specify exactly 
> the logical core number to run on. When I bind processes to sockets when 
> starting with mpirun, I used the -cpu-set option for setting to the core 
> number in the desired socket.
> 
> Also, I was just checking the manual here:
> 
> https://www.open-mpi.org/doc/v2.0/man3/MPI_Comm_spawn.3.php#toc7 
> 
> 
> I found there is a "mapper" key in the MPI_INFO that might be useful in my 
> case:
> 
> mapper char* Mapper to be used for this job
> 
> I wonder if there's any more detailed documentation or any example on how to 
> use this mapper key.
> 
> Thanks
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
Nothing immediate comes to mind - all sbatch does is create an allocation and 
then run your script in it. Perhaps your script is using a different “mpirun” 
command than when you type it interactively?
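A quick way to rule that out is to put these at the top of the batch script and
compare the output with an interactive shell on the same cluster:

   type mpirun        # which binary is actually being picked up
   mpirun --version
   echo $PATH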

> On Feb 14, 2017, at 5:11 AM, Anastasia Kruchinina 
>  wrote:
> 
> Hi, 
> 
> I am trying to use MPI_Comm_spawn function in my code. I am having trouble 
> with openmpi 2.0.x + sbatch (batch system Slurm). 
> My test program is located here: 
> http://user.it.uu.se/~anakr367/files/MPI_test/ 
>  
> 
> When I am running my code I am getting an error: 
> 
> OPAL ERROR: Timeout in file 
> ../../../../openmpi-2.0.1/opal/mca/pmix/base/pmix_base_fns.c at line 193 
> *** An error occurred in MPI_Init_thread 
> *** on a NULL communicator 
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, 
> ***and potentially your MPI job) 
> -- 
> It looks like MPI_INIT failed for some reason; your parallel process is 
> likely to abort.  There are many reasons that a parallel process can 
> fail during MPI_INIT; some of which are due to configuration or environment 
> problems.  This failure appears to be an internal failure; here's some 
> additional information (which may only be relevant to an Open MPI 
> developer): 
> 
>ompi_dpm_dyn_init() failed 
>--> Returned "Timeout" (-15) instead of "Success" (0) 
> -- 
> 
> The interesting thing is that there is no error when I am firstly allocating 
> nodes with salloc and then run my program. So, I noticed that the program 
> works fine using openmpi 1.x+sbatch/salloc or openmpi 2.0.x+salloc but not 
> openmpi 2.0.x+sbatch. 
> 
> The error was reproduced on three different computer clusters. 
> 
> Best regards, 
> Anastasia 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] fatal error with openmpi-master-201702150209-404fe32 on Linux with Sun C

2017-02-15 Thread r...@open-mpi.org
If we knew what line in that file was causing the compiler to barf, we could at 
least address it. There is probably something added in recent commits that is 
causing problems for the compiler.

So checking to see what commit might be triggering the failure would be most 
helpful.
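If you have a git clone handy, bisecting between the hashes embedded in the two
nightly tarball names would pin it down quickly (sketch only - each step needs an
autogen/configure/make cycle with your usual Studio settings):

   git bisect start
   git bisect bad  404fe32     # 2017-02-15 nightly: compiler crash
   git bisect good bc2890e     # 2017-02-08 nightly: builds fine
   # at each step: ./autogen.pl && ./configure ... && make, then mark the result:
   git bisect good             # or: git bisect bad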


> On Feb 15, 2017, at 8:29 AM, Siegmar Gross 
>  wrote:
> 
> Hi Gilles,
> 
>> this looks like a compiler crash, and it should be reported to Oracle.
> 
> I can try, but I don't think that they are interested, because
> we don't have a contract any longer. I didn't get the error
> building openmpi-master-201702080209-bc2890e as you can see
> below. Would it be helpful to build all intermediate versions
> to find out when the error occured the first time? Perhaps we
> can identify which change of code is responsible for the error.
> 
> loki openmpi-master-201702080209-bc2890e-Linux.x86_64.64_cc 111 grep Error 
> log.make.Linux.x86_64.64_cc
>  GENERATE mpi/man/man3/MPI_Error_class.3
>  GENERATE mpi/man/man3/MPI_Error_string.3
> 
> loki openmpi-master-201702080209-bc2890e-Linux.x86_64.64_cc 112 cd 
> ../openmpi-master-201702150209-404fe32-Linux.x86_64.64_cc
> loki openmpi-master-201702150209-404fe32-Linux.x86_64.64_cc 113 grep Error 
> log.make.Linux.x86_64.64_cc
> make[5]: *** [dstore/pmix_esh.lo] Error 1
> make[4]: *** [all-recursive] Error 1
> make[3]: *** [all-recursive] Error 1
> make[2]: *** [all-recursive] Error 1
> make[1]: *** [all-recursive] Error 1
> make: *** [all-recursive] Error 1
> loki openmpi-master-201702150209-404fe32-Linux.x86_64.64_cc 114
> 
> 
> Kind regards and thank you very much for your help
> 
> Siegmar
> 
> 
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> On Wednesday, February 15, 2017, Siegmar Gross  wrote:
>> 
>>Hi,
>> 
>>I tried to install openmpi-master-201702150209-404fe32 on my "SUSE Linux
>>Enterprise Server 12.2 (x86_64)" with Sun C 5.14. Unfortunately, "make"
>>breaks with the following error. I've had no problems with gcc-6.3.0.
>> 
>> 
>>...
>>
>> "../../../../../../../openmpi-master-201702150209-404fe32/opal/mca/pmix/pmix2x/pmix/src/buffer_ops/copy.c",
>>  line 1004: warning: statement not reached
>>  CC   buffer_ops/internal_functions.lo
>>  CC   buffer_ops/open_close.lo
>>  CC   buffer_ops/pack.lo
>>  CC   buffer_ops/print.lo
>>  CC   buffer_ops/unpack.lo
>>  CC   sm/pmix_sm.lo
>>  CC   sm/pmix_mmap.lo
>>  CC   dstore/pmix_dstore.lo
>>  CC   dstore/pmix_esh.lo
>>cc: Fatal error in /opt/sun/developerstudio12.5/lib/compilers/bin/acomp : 
>> Signal number = 139
>>Makefile:1322: recipe for target 'dstore/pmix_esh.lo' failed
>>make[5]: *** [dstore/pmix_esh.lo] Error 1
>>make[5]: Leaving directory 
>> '/export2/src/openmpi-master/openmpi-master-201702150209-404fe32-Linux.x86_64.64_cc/opal/mca/pmix/pmix2x/pmix/src'
>>Makefile:1375: recipe for target 'all-recursive' failed
>>make[4]: *** [all-recursive] Error 1
>>make[4]: Leaving directory 
>> '/export2/src/openmpi-master/openmpi-master-201702150209-404fe32-Linux.x86_64.64_cc/opal/mca/pmix/pmix2x/pmix/src'
>>Makefile:652: recipe for target 'all-recursive' failed
>>make[3]: *** [all-recursive] Error 1
>>make[3]: Leaving directory 
>> '/export2/src/openmpi-master/openmpi-master-201702150209-404fe32-Linux.x86_64.64_cc/opal/mca/pmix/pmix2x/pmix'
>>Makefile:2037: recipe for target 'all-recursive' failed
>>make[2]: *** [all-recursive] Error 1
>>make[2]: Leaving directory 
>> '/export2/src/openmpi-master/openmpi-master-201702150209-404fe32-Linux.x86_64.64_cc/opal/mca/pmix/pmix2x'
>>Makefile:2386: recipe for target 'all-recursive' failed
>>make[1]: *** [all-recursive] Error 1
>>make[1]: Leaving directory 
>> '/export2/src/openmpi-master/openmpi-master-201702150209-404fe32-Linux.x86_64.64_cc/opal'
>>Makefile:1903: recipe for target 'all-recursive' failed
>>make: *** [all-recursive] Error 1
>>loki openmpi-master-201702150209-404fe32-Linux.x86_64.64_cc 129
>> 
>> 
>>I would be grateful, if somebody can fix the problem. Do you need anything
>>else? Thank you very much for any help in advance.
>> 
>> 
>>Kind regards
>> 
>>Siegmar
>>___
>>users mailing list
>>users@lists.open-mpi.org
>>https://rfd.newmexicoconsortium.org/mailman/listinfo/users 
>> 
>> 
>> 
>> 
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
The cmd line looks fine - when you do your “sbatch” request, what is in the 
shell script you give it? Or are you saying you just “sbatch” the mpirun cmd 
directly?


> On Feb 15, 2017, at 8:07 AM, Anastasia Kruchinina 
>  wrote:
> 
> Hi, 
> 
> I am running like this: 
> mpirun -np 1 ./manager
> 
> Should I do it differently?
> 
> I also thought that all sbatch does is create an allocation and then run my 
> script in it. But it seems it is not since I am getting these results...
> 
> I would like to upgrade to OpenMPI, but no clusters near me have it yet :( So 
> I even cannot check if it works with OpenMPI 2.0.2. 
> 
> On 15 February 2017 at 16:04, Howard Pritchard <hpprit...@gmail.com> wrote:
> Hi Anastasia,
> 
> Definitely check the mpirun when in batch environment but you may also want 
> to upgrade to Open MPI 2.0.2.
> 
> Howard
> 
> r...@open-mpi.org wrote on Wed, Feb 15, 2017 at 07:49:
> Nothing immediate comes to mind - all sbatch does is create an allocation and 
> then run your script in it. Perhaps your script is using a different “mpirun” 
> command than when you type it interactively?
> 
>> On Feb 14, 2017, at 5:11 AM, Anastasia Kruchinina 
>> <nastja.kruchin...@gmail.com> wrote:
>> 
>> Hi, 
>> 
>> I am trying to use MPI_Comm_spawn function in my code. I am having trouble 
>> with openmpi 2.0.x + sbatch (batch system Slurm). 
>> My test program is located here: 
>> http://user.it.uu.se/~anakr367/files/MPI_test/
>> 
>> When I am running my code I am getting an error: 
>> 
>> OPAL ERROR: Timeout in file 
>> ../../../../openmpi-2.0.1/opal/mca/pmix/base/pmix_base_fns.c at line 193 
>> *** An error occurred in MPI_Init_thread 
>> *** on a NULL communicator 
>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort, 
>> ***and potentially your MPI job) 
>> -- 
>> It looks like MPI_INIT failed for some reason; your parallel process is 
>> likely to abort.  There are many reasons that a parallel process can 
>> fail during MPI_INIT; some of which are due to configuration or environment 
>> problems.  This failure appears to be an internal failure; here's some 
>> additional information (which may only be relevant to an Open MPI 
>> developer): 
>> 
>>ompi_dpm_dyn_init() failed 
>>--> Returned "Timeout" (-15) instead of "Success" (0) 
>> -- 
>> 
>> The interesting thing is that there is no error when I am firstly allocating 
>> nodes with salloc and then run my program. So, I noticed that the program 
>> works fine using openmpi 1.x+sbatch/salloc or openmpi 2.0.x+salloc but not 
>> openmpi 2.0.x+sbatch. 
>> 
>> The error was reproduced on three different computer clusters. 
>> 
>> Best regards, 
>> Anastasia 
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Problem with MPI_Comm_spawn using openmpi 2.0.x + sbatch

2017-02-15 Thread r...@open-mpi.org
Yes, 2.0.1 has a spawn issue. We believe that 2.0.2 is okay if you want to give 
it a try 

Sent from my iPad

> On Feb 15, 2017, at 1:14 PM, Jason Maldonis  wrote:
> 
> Just to throw this out there -- to me, that doesn't seem to be just a problem 
> with SLURM. I'm guessing the exact same error would be thrown interactively 
> (unless I didn't read the above messages carefully enough).  I had a lot of 
> problems running spawned jobs on 2.0.x a few months ago, so I switched back 
> to 1.10.2 and everything worked. Just in case that helps someone.
> 
> Jason
> 
>> On Wed, Feb 15, 2017 at 1:09 PM, Anastasia Kruchinina 
>>  wrote:
>> Hi!
>> 
>> I am doing like this:
>> 
>> sbatch  -N 2 -n 5 ./job.sh
>> 
>> where job.sh is:
>> 
>> #!/bin/bash -l
>> module load openmpi/2.0.1-icc
>> mpirun -np 1 ./manager 4
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>>> On 15 February 2017 at 17:58, r...@open-mpi.org  wrote:
>>> The cmd line looks fine - when you do your “sbatch” request, what is in the 
>>> shell script you give it? Or are you saying you just “sbatch” the mpirun 
>>> cmd directly?
>>> 
>>> 
>>>> On Feb 15, 2017, at 8:07 AM, Anastasia Kruchinina 
>>>>  wrote:
>>>> 
>>>> Hi, 
>>>> 
>>>> I am running like this: 
>>>> mpirun -np 1 ./manager
>>>> 
>>>> Should I do it differently?
>>>> 
>>>> I also thought that all sbatch does is create an allocation and then run 
>>>> my script in it. But it seems it is not since I am getting these results...
>>>> 
>>>> I would like to upgrade to OpenMPI, but no clusters near me have it yet :( 
>>>> So I even cannot check if it works with OpenMPI 2.0.2. 
>>>> 
>>>>> On 15 February 2017 at 16:04, Howard Pritchard  
>>>>> wrote:
>>>>> Hi Anastasia,
>>>>> 
>>>>> Definitely check the mpirun when in batch environment but you may also 
>>>>> want to upgrade to Open MPI 2.0.2.
>>>>> 
>>>>> Howard
>>>>> 
>>>>> r...@open-mpi.org wrote on Wed, Feb 15, 2017 at 07:49:
>>>>>> Nothing immediate comes to mind - all sbatch does is create an 
>>>>>> allocation and then run your script in it. Perhaps your script is using 
>>>>>> a different “mpirun” command than when you type it interactively?
>>>>>> 
>>>>>>> On Feb 14, 2017, at 5:11 AM, Anastasia Kruchinina 
>>>>>>>  wrote:
>>>>>>> 
>>>>>>> Hi, 
>>>>>>> 
>>>>>>> I am trying to use MPI_Comm_spawn function in my code. I am having 
>>>>>>> trouble with openmpi 2.0.x + sbatch (batch system Slurm). 
>>>>>>> My test program is located here: 
>>>>>>> http://user.it.uu.se/~anakr367/files/MPI_test/ 
>>>>>>> 
>>>>>>> When I am running my code I am getting an error: 
>>>>>>> 
>>>>>>> OPAL ERROR: Timeout in file 
>>>>>>> ../../../../openmpi-2.0.1/opal/mca/pmix/base/pmix_base_fns.c at line 
>>>>>>> 193 
>>>>>>> *** An error occurred in MPI_Init_thread 
>>>>>>> *** on a NULL communicator 
>>>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now 
>>>>>>> abort, 
>>>>>>> ***and potentially your MPI job) 
>>>>>>> --
>>>>>>>  
>>>>>>> It looks like MPI_INIT failed for some reason; your parallel process is 
>>>>>>> likely to abort.  There are many reasons that a parallel process can 
>>>>>>> fail during MPI_INIT; some of which are due to configuration or 
>>>>>>> environment 
>>>>>>> problems.  This failure appears to be an internal failure; here's some 
>>>>>>> additional information (which may only be relevant to an Open MPI 
>>>>>>> developer): 
>>>>>>> 
>>>>>>>ompi_dpm_dyn_init() failed 
>>>>>>>--> Returned "Timeout" (-15) instead of "Success" (0) 
>>>>>>> -

Re: [OMPI users] fatal error with openmpi-master-201702150209-404fe32 on Linux with Sun C

2017-02-15 Thread r...@open-mpi.org
I guess it was the next nightly tarball, but not next commit. However, it was 
almost certainly 7acef48 from Gilles that updated the PMIx code.

Gilles: can you perhaps take a peek?

Sent from my iPad

> On Feb 15, 2017, at 11:43 AM, Siegmar Gross 
>  wrote:
> 
> Hi Ralph,
> 
> I get the error already with openmpi-master-201702100209-51def91 which
> is the next version after openmpi-master-201702080209-bc2890e, if I'm
> right.
> 
> loki openmpi-master 146 grep Error \
>  
> openmpi-master-201702080209-bc2890e-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc
>  \
>  
> openmpi-master-201702100209-51def91-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc
> openmpi-master-201702080209-bc2890e-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:
>   GENERATE mpi/man/man3/MPI_Error_class.3
> openmpi-master-201702080209-bc2890e-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:
>   GENERATE mpi/man/man3/MPI_Error_string.3
> 
> openmpi-master-201702100209-51def91-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:make[5]:
>  *** [dstore/pmix_esh.lo] Error 1
> openmpi-master-201702100209-51def91-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:make[4]:
>  *** [all-recursive] Error 1
> openmpi-master-201702100209-51def91-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:make[3]:
>  *** [all-recursive] Error 1
> openmpi-master-201702100209-51def91-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:make[2]:
>  *** [all-recursive] Error 1
> openmpi-master-201702100209-51def91-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:make[1]:
>  *** [all-recursive] Error 1
> openmpi-master-201702100209-51def91-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:make:
>  *** [all-recursive] Error 1
> 
> "pmix_esh.lo" isn't available for openmpi-master-201702100209-51def91. It's
> also not available for the other versions which break.
> 
> loki openmpi-master 147 find 
> openmpi-master-201702080209-bc2890e-Linux.x86_64.64_cc -name pmix_esh.lo
> openmpi-master-201702080209-bc2890e-Linux.x86_64.64_cc/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.lo
> 
> loki openmpi-master 148 find 
> openmpi-master-201702100209-51def91-Linux.x86_64.64_cc -name pmix_esh.lo
> loki openmpi-master 149
> 
> Which files do you need? Which commands shall I run to get differences of
> files?
> 
> 
> Kind regards
> 
> Siegmar
> 
> 
>> On 15.02.2017 at 17:42, r...@open-mpi.org wrote:
>> If we knew what line in that file was causing the compiler to barf, we
>> could at least address it. There is probably something added in recent
>> commits that is causing problems for the compiler.
>> 
>> So checking to see what commit might be triggering the failure would be most 
>> helpful.
>> 
>> 
>>> On Feb 15, 2017, at 8:29 AM, Siegmar Gross 
>>>  wrote:
>>> 
>>> Hi Gilles,
>>> 
>>>> this looks like a compiler crash, and it should be reported to Oracle.
>>> 
>>> I can try, but I don't think that they are interested, because
>>> we don't have a contract any longer. I didn't get the error
>>> building openmpi-master-201702080209-bc2890e as you can see
>>> below. Would it be helpful to build all intermediate versions
>>> to find out when the error occurred the first time? Perhaps we
>>> can identify which change of code is responsible for the error.
>>> 
>>> loki openmpi-master-201702080209-bc2890e-Linux.x86_64.64_cc 111 grep Error 
>>> log.make.Linux.x86_64.64_cc
>>> GENERATE mpi/man/man3/MPI_Error_class.3
>>> GENERATE mpi/man/man3/MPI_Error_string.3
>>> 
>>> loki openmpi-master-201702080209-bc2890e-Linux.x86_64.64_cc 112 cd 
>>> ../openmpi-master-201702150209-404fe32-Linux.x86_64.64_cc
>>> loki openmpi-master-201702150209-404fe32-Linux.x86_64.64_cc 113 grep Error 
>>> log.make.Linux.x86_64.64_cc
>>> make[5]: *** [dstore/pmix_esh.lo] Error 1
>>> make[4]: *** [all-recursive] Error 1
>>> make[3]: *** [all-recursive] Error 1
>>> make[2]: *** [all-recursive] Error 1
>>> make[1]: *** [all-recursive] Error 1
>>> make: *** [all-recursive] Error 1
>>> loki openmpi-master-201702150209-404fe32-Linux.x86_64.64_cc 114
>>> 
>>> 
>>> Kind regards and thank you very much for your help
>>> 
>>> Siegmar
>>> 
>>> 
>>>> 
>>>> Cheers,
>>>> 
>>>> Gilles
>>>> 
>>>> On Wednesday, February 15, 2017, Siegmar Gross 
>>>> <siegmar.gr...@informatik.hs-fulda.de> wrote:
>>>> 
>>>>   Hi,
>>>> 

Re: [OMPI users] Is building with "--enable-mpi-thread-multiple" recommended?

2017-02-17 Thread r...@open-mpi.org
Depends on the version, but if you are using something in the v2.x range, you 
should be okay with just one installed version

> On Feb 17, 2017, at 4:41 AM, Mark Dixon  wrote:
> 
> Hi,
> 
> We have some users who would like to try out openmpi MPI_THREAD_MULTIPLE 
> support on our InfiniBand cluster. I am wondering if we should enable it on 
> our production cluster-wide version, or install it as a separate "here be 
> dragons" copy.
> 
> I seem to recall openmpi folk cautioning that MPI_THREAD_MULTIPLE support was 
> pretty crazy and that enabling it could have problems for 
> non-MPI_THREAD_MULTIPLE codes (never mind codes that explicitly used it), so 
> such an install shouldn't be used unless for codes that actually need it.
> 
> Is that still the case, please?
> 
> Thanks,
> 
> Mark
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] fatal error with openmpi-master-201702150209-404fe32 on Linux with Sun C

2017-02-17 Thread r...@open-mpi.org
Thanks Gilles!

> On Feb 15, 2017, at 10:24 PM, Gilles Gouaillardet  wrote:
> 
> Ralph,
> 
> 
> i was able to rewrite some macros to make Oracle compilers happy, and filed 
> https://github.com/pmix/master/pull/309 for that
> 
> 
> Siegmar,
> 
> 
> meanwhile, feel free to manually apply the attached patch
> 
> 
> 
> Cheers,
> 
> 
> Gilles
> 
> 
> On 2/16/2017 8:09 AM, r...@open-mpi.org wrote:
>> I guess it was the next nightly tarball, but not next commit. However, it 
>> was almost certainly 7acef48 from Gilles that updated the PMIx code.
>> 
>> Gilles: can you perhaps take a peek?
>> 
>> Sent from my iPad
>> 
>>> On Feb 15, 2017, at 11:43 AM, Siegmar Gross 
>>>  wrote:
>>> 
>>> Hi Ralph,
>>> 
>>> I get the error already with openmpi-master-201702100209-51def91 which
>>> is the next version after openmpi-master-201702080209-bc2890e, if I'm
>>> right.
>>> 
>>> loki openmpi-master 146 grep Error \
>>>  
>>> openmpi-master-201702080209-bc2890e-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc
>>>  \
>>>  
>>> openmpi-master-201702100209-51def91-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc
>>> openmpi-master-201702080209-bc2890e-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:
>>>   GENERATE mpi/man/man3/MPI_Error_class.3
>>> openmpi-master-201702080209-bc2890e-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:
>>>   GENERATE mpi/man/man3/MPI_Error_string.3
>>> 
>>> openmpi-master-201702100209-51def91-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:make[5]:
>>>  *** [dstore/pmix_esh.lo] Error 1
>>> openmpi-master-201702100209-51def91-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:make[4]:
>>>  *** [all-recursive] Error 1
>>> openmpi-master-201702100209-51def91-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:make[3]:
>>>  *** [all-recursive] Error 1
>>> openmpi-master-201702100209-51def91-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:make[2]:
>>>  *** [all-recursive] Error 1
>>> openmpi-master-201702100209-51def91-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:make[1]:
>>>  *** [all-recursive] Error 1
>>> openmpi-master-201702100209-51def91-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:make:
>>>  *** [all-recursive] Error 1
>>> 
>>> "pmix_esh.lo" isn't available for openmpi-master-201702100209-51def91. It's
>>> also not available for the other versions which break.
>>> 
>>> loki openmpi-master 147 find 
>>> openmpi-master-201702080209-bc2890e-Linux.x86_64.64_cc -name pmix_esh.lo
>>> openmpi-master-201702080209-bc2890e-Linux.x86_64.64_cc/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.lo
>>> 
>>> loki openmpi-master 148 find 
>>> openmpi-master-201702100209-51def91-Linux.x86_64.64_cc -name pmix_esh.lo
>>> loki openmpi-master 149
>>> 
>>> Which files do you need? Which commands shall I run to get differences of
>>> files?
>>> 
>>> 
>>> Kind regards
>>> 
>>> Siegmar
>>> 
>>> 
>>>> Am 15.02.2017 um 17:42 schrieb r...@open-mpi.org:
>>>> If we knew what line in that file was causing the compiler to barf, we
>>>> could at least address it. There is probably something added in recent
>>>> commits that is causing problems for the compiler.
>>>> 
>>>> So checking to see what commit might be triggering the failure would be 
>>>> most helpful.
>>>> 
>>>> 
>>>>> On Feb 15, 2017, at 8:29 AM, Siegmar Gross 
>>>>>  wrote:
>>>>> 
>>>>> Hi Gilles,
>>>>> 
>>>>>> this looks like a compiler crash, and it should be reported to Oracle.
>>>>> I can try, but I don't think that they are interested, because
>>>>> we don't have a contract any longer. I didn't get the error
>>>>> building openmpi-master-201702080209-bc2890e as you can see
>>>>> below. Would it be helpful to build all intermediate versions
>>>>> to find out when the error occurred the first time? Perhaps we
>>>>> can identify which change of code is responsible for the error.
>>>>> 
>>>>> loki openmpi-master-201702080209-bc2890e-Linux.x86_64.64_cc 111 grep 
>>>>> Error log.make.Linux.x86_64.64_cc
>>>>> GENERATE mpi/man/man3/MPI_Error_class.3
>>>>> GENERATE mpi/man/man3/MPI_Error_string.3
>>>>

Re: [OMPI users] "-map-by socket:PE=1" doesn't do what I expect

2017-02-17 Thread r...@open-mpi.org
Mark - this is now available in master. Will look at what might be required to 
bring it to 2.0

> On Feb 15, 2017, at 5:49 AM, r...@open-mpi.org wrote:
> 
> 
>> On Feb 15, 2017, at 5:45 AM, Mark Dixon  wrote:
>> 
>> On Wed, 15 Feb 2017, r...@open-mpi.org wrote:
>> 
>>> Ah, yes - I know what the problem is. We weren’t expecting a PE value of 1 
>>> - the logic is looking expressly for values > 1 as we hadn’t anticipated 
>>> this use-case.
>> 
>> Is it a sensible use-case, or am I crazy?
> 
> Not crazy, I’d say. The expected way of doing it would be “--map-by socket 
> --bind-to core”. However, I can see why someone might expect pe=1 to work.
> 
>> 
>>> I can make that change. I’m off to a workshop for the next day or so, but 
>>> can probably do this on the plane.
>> 
>> You're a star - thanks :)
>> 
>> Mark___
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] OpenMPI and Singularity

2017-02-17 Thread r...@open-mpi.org
The embedded Singularity support hasn’t made it into the OMPI 2.x release 
series yet, though OMPI will still work within a Singularity container anyway.

Compatibility across the container boundary is always a problem, as your 
examples illustrate. If the system is using one OMPI version and the container 
is using another, then the only concern is compatibility across the container 
boundary of the process-to-ORTE daemon communication. In the OMPI 2.x series 
and beyond, this is done with PMIx. OMPI v2.0 is based on PMIx v1.x, and so 
will OMPI v2.1. Thus, there is no compatibility issue there. However, that 
statement is _not_ true for OMPI v1.10 and earlier series.

Future OMPI versions will utilize PMIx v2 and above, which include a 
cross-version compatibility layer. Thus, you shouldn’t have any issues mixing 
and matching OMPI versions from this regard.

However, your second example is a perfect illustration of where 
containerization can break down. If you build your container on a system that 
doesn’t have (for example) tm and verbs installed on it, then those OMPI 
components will not be built. The tm component won’t matter as the system 
version of mpirun will be executing, and it presumably knows how to interact 
with Torque.

However, if you run that container on a system that has verbs, your application 
won’t be able to utilize the verbs support because those components were never 
compiled. Note that the converse is not true: if you build your container on a 
system that has verbs installed, you can then run it on a system that doesn’t 
have verbs support and those components will dynamically disqualify themselves.

Remember, you only need the verbs headers to be installed - you don’t have to 
build on a machine that actually has a verbs-supporting NIC installed (this is 
how the distributions get around the problem). Thus, it isn’t hard to avoid 
this portability problem - you just need to think ahead a bit.
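As a concrete sketch (the package names and paths below are the usual RHEL/Debian
ones and may differ on your base image), a container-side build that keeps verbs
support available only needs the development headers in place before configure
runs:

   # inside the container's build environment
   yum install -y libibverbs-devel      # or: apt-get install -y libibverbs-dev
   ./configure --prefix=/usr/local --with-verbs
   make -j4 install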

HTH
Ralph

> On Feb 17, 2017, at 3:49 PM, Bennet Fauber  wrote:
> 
> I am wishing to follow the instructions on the Singularity web site,
> 
>http://singularity.lbl.gov/docs-hpc
> 
> to test Singularity and OMPI on our cluster.  My previously normal
> configure for the 1.x series looked like this.
> 
> ./configure --prefix=/usr/local \
>   --mandir=${PREFIX}/share/man \
>   --with-tm --with-verbs \
>   --disable-dlopen --enable-shared \
>   CC=gcc CXX=g++ FC=gfortran
> 
> I have a couple of wonderments.
> 
> First, I presume it will be best to have the same version of OMPI
> inside the container as out, but how sensitive will it be to minor
> versions?  All 2.1.x version should be fine, but not mix 2.1.x outside
> with 2.2.x inside or vice-versa (might be backward compatible but not
> forward)?
> 
> Second, if someone builds OMPI inside their container on an external
> system, without tm and verbs, then brings the container to our system,
> will the tm and verbs be handled by the calling mpirun from the host
> system, and the OMPI inside the container won't care?  Will not having
> those inside the container cause them to be suppressed outside?
> 
> Thanks in advance,  -- bennet
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] OpenMPI and Singularity

2017-02-17 Thread r...@open-mpi.org
I -think- that is correct, but you may need the verbs library as well - I 
honestly don’t remember if the configury checks for functions in the library or 
not. If so, then you’ll need that wherever you build OMPI, but everything else 
is accurate

Good luck - and let us know how it goes!
Ralph

> On Feb 17, 2017, at 4:34 PM, Bennet Fauber  wrote:
> 
> Ralph.
> 
> I will be building from the Master branch at github.com for testing
> purposes.  We are not 'supporting' Singularity container creation, but
> we do hope to be able to offer some guidance, so I think we can
> finesse the PMIx version, yes?
> 
> That is good to know about the verbs headers being the only thing
> needed; thanks for that detail.  Sometimes the library also needs to
> be present.
> 
> Also very good to know that the host mpirun will start processes, as
> we are using cgroups, and if the processes get started by a
> non-tm-supporting MPI, they will be outside the proper cgroup.
> 
> So, just to recap, if I install from the current master at
> http://github.com/open-mpi/ompi.git on the host system and within the
> container, I copy the verbs headers into the container, then configure
> and build OMPI within the container and ignore TM support, I should be
> able to copy the container to the cluster and run it with verbs and
> the system OMPI using tm.
> 
> If a user were to build without the verbs support, it would still run,
> but it would fall back to non-verbs communication, so it would just be
> commensurately slower.
> 
> Let me know if I've garbled things.  Otherwise, wish me luck, and have
> a good weekend!
> 
> Thanks,  -- bennet
> 
> 
> 
> On Fri, Feb 17, 2017 at 7:24 PM, r...@open-mpi.org  wrote:
>> The embedded Singularity support hasn’t made it into the OMPI 2.x release 
>> series yet, though OMPI will still work within a Singularity container 
>> anyway.
>> 
>> Compatibility across the container boundary is always a problem, as your 
>> examples illustrate. If the system is using one OMPI version and the 
>> container is using another, then the only concern is compatibility across 
>> the container boundary of the process-to-ORTE daemon communication. In the 
>> OMPI 2.x series and beyond, this is done with PMIx. OMPI v2.0 is based on 
>> PMIx v1.x, and so will OMPI v2.1. Thus, there is no compatibility issue 
>> there. However, that statement is _not_ true for OMPI v1.10 and earlier 
>> series.
>> 
>> Future OMPI versions will utilize PMIx v2 and above, which include a 
>> cross-version compatibility layer. Thus, you shouldn’t have any issues 
>> mixing and matching OMPI versions in this regard.
>> 
>> However, your second example is a perfect illustration of where 
>> containerization can break down. If you build your container on a system 
>> that doesn’t have (for example) tm and verbs installed on it, then those 
>> OMPI components will not be built. The tm component won’t matter as the 
>> system version of mpirun will be executing, and it presumably knows how to 
>> interact with Torque.
>> 
>> However, if you run that container on a system that has verbs, your 
>> application won’t be able to utilize the verbs support because those 
>> components were never compiled. Note that the converse is not true: if you 
>> build your container on a system that has verbs installed, you can then run 
>> it on a system that doesn’t have verbs support and those components will 
>> dynamically disqualify themselves.
>> 
>> Remember, you only need the verbs headers to be installed - you don’t have 
>> to build on a machine that actually has a verbs-supporting NIC installed 
>> (this is how the distributions get around the problem). Thus, it isn’t hard 
>> to avoid this portability problem - you just need to think ahead a bit.
>> 
>> HTH
>> Ralph
>> 
>>> On Feb 17, 2017, at 3:49 PM, Bennet Fauber  wrote:
>>> 
>>> I am wishing to follow the instructions on the Singularity web site,
>>> 
>>>   http://singularity.lbl.gov/docs-hpc
>>> 
>>> to test Singularity and OMPI on our cluster.  My previously normal
>>> configure for the 1.x series looked like this.
>>> 
>>> ./configure --prefix=/usr/local \
>>>  --mandir=${PREFIX}/share/man \
>>>  --with-tm --with-verbs \
>>>  --disable-dlopen --enable-shared \
>>>  CC=gcc CXX=g++ FC=gfortran
>>> 
>>> I have a couple of wonderments.
>>> 
>>> First, I presume it will be best to have the same version of OMPI
>>> inside the container as out, but how sensitive will it be to 

Re: [OMPI users] Is building with "--enable-mpi-thread-multiple" recommended?

2017-02-18 Thread r...@open-mpi.org
We have been making a concerted effort to resolve outstanding issues as the 
interest in threaded applications has grown. It should be pretty good now, but 
we do see occasional bug reports, so it isn’t perfect.
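Codes that depend on it should at least verify the granted level at runtime rather
than assuming it - a minimal sketch:

   #include <mpi.h>
   #include <stdio.h>

   int main(int argc, char **argv)
   {
       int provided;
       MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
       if (provided < MPI_THREAD_MULTIPLE) {
           fprintf(stderr, "warning: library only provides thread level %d\n",
                   provided);
       }
       /* ... threaded MPI work ... */
       MPI_Finalize();
       return 0;
   }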

> On Feb 18, 2017, at 12:14 AM, Mark Dixon  wrote:
> 
> On Fri, 17 Feb 2017, r...@open-mpi.org wrote:
> 
>> Depends on the version, but if you are using something in the v2.x range, 
>> you should be okay with just one installed version
> 
> Thanks Ralph.
> 
> How good is MPI_THREAD_MULTIPLE support these days and how far up the 
> wishlist is it, please?
> 
> We don't get many openmpi-specific queries from users but, other than core 
> binding, it seems to be the thing we get asked about the most (I normally 
> point those people at mvapich2 or intelmpi instead).
> 
> Cheers,
> 
> Mark
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Is building with "--enable-mpi-thread-multiple" recommended?

2017-02-18 Thread r...@open-mpi.org
FWIW: have you taken a look at the event notification mechanisms in PMIx yet? 
The intent there, among other features, is to provide async notification of 
events generated either by the system (e.g., node failures and/or congestion) 
or other application processes.

https://pmix.github.io/master

OMPI includes PMIx support beginning with OMPI v2.0, and various RMs are 
releasing their integrated support as well.
Ralph

> On Feb 18, 2017, at 10:07 AM, Michel Lesoinne  wrote:
> 
> I am also a proponent of the multiple thread support. For many reasons:
>  - code simplification
>  - easier support of computation/communication overlap with fewer 
> synchronization points
>  - possibility of creating exception aware MPI Code (I think the MPI standard 
> cruelly lacks constructs for a natural clean handling of application 
> exceptions across processes)
> 
> So it is good to hear there is progress.
> 
> On Feb 18, 2017 7:43 AM, r...@open-mpi.org wrote:
> We have been making a concerted effort to resolve outstanding issues as the 
> interest in threaded applications has grown. It should be pretty good now, 
> but we do see occasional bug reports, so it isn’t perfect.
> 
> > On Feb 18, 2017, at 12:14 AM, Mark Dixon <m.c.di...@leeds.ac.uk> wrote:
> >
> > On Fri, 17 Feb 2017, r...@open-mpi.org wrote:
> >
> >> Depends on the version, but if you are using something in the v2.x range, 
> >> you should be okay with just one installed version
> >
> > Thanks Ralph.
> >
> > How good is MPI_THREAD_MULTIPLE support these days and how far up the 
> > wishlist is it, please?
> >
> > We don't get many openmpi-specific queries from users but, other than core 
> > binding, it seems to be the thing we get asked about the most (I normally 
> > point those people at mvapich2 or intelmpi instead).
> >
> > Cheers,
> >
> > Mark
> > ___
> > users mailing list
> > users@lists.open-mpi.org
> > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] OpenMPI and Singularity

2017-02-20 Thread r...@open-mpi.org
If you can send us some more info on how it breaks, that would be helpful. I’ll 
file it as an issue so we can track things

Thanks
Ralph


> On Feb 20, 2017, at 9:13 AM, Bennet Fauber  wrote:
> 
> I got mixed results when bringing a container that doesn't have the IB
> and Torque libraries compiled into the OMPI inside the container to a
> cluster where it does.
> 
> The short summary is that mutlinode communication seems unreliable.  I
> can mostly get up to 8 procs, two-per-node, to run, but beyond that
> not.  In a couple of cases, a particular node seemed able to cause a
> problem.  I am going to try again making the configure line inside the
> container the same as outside, but I have to chase down the IB and
> Torque to do so.
> 
> If you're interested in how it breaks, I can send you some more
> information.  If there are diagnostics you would like, I can try to
> provide those.  I will be gone starting Thu for a week.
> 
> -- bennet
> 
> 
> 
> 
> On Fri, Feb 17, 2017 at 11:20 PM, r...@open-mpi.org  wrote:
>> I -think- that is correct, but you may need the verbs library as well - I 
>> honestly don’t remember if the configury checks for functions in the library 
>> or not. If so, then you’ll need that wherever you build OMPI, but everything 
>> else is accurate
>> 
>> Good luck - and let us know how it goes!
>> Ralph
>> 
>>> On Feb 17, 2017, at 4:34 PM, Bennet Fauber  wrote:
>>> 
>>> Ralph.
>>> 
>>> I will be building from the Master branch at github.com for testing
>>> purposes.  We are not 'supporting' Singularity container creation, but
>>> we do hope to be able to offer some guidance, so I think we can
>>> finesse the PMIx version, yes?
>>> 
>>> That is good to know about the verbs headers being the only thing
>>> needed; thanks for that detail.  Sometimes the library also needs to
>>> be present.
>>> 
>>> Also very good to know that the host mpirun will start processes, as
>>> we are using cgroups, and if the processes get started by a
>>> non-tm-supporting MPI, they will be outside the proper cgroup.
>>> 
>>> So, just to recap, if I install from the current master at
>>> http://github.com/open-mpi/ompi.git on the host system and within the
>>> container, I copy the verbs headers into the container, then configure
>>> and build OMPI within the container and ignore TM support, I should be
>>> able to copy the container to the cluster and run it with verbs and
>>> the system OMPI using tm.
>>> 
>>> If a user were to build without the verbs support, it would still run,
>>> but it would fall back to non-verbs communication, so it would just be
>>> commensurately slower.
>>> 
>>> Let me know if I've garbled things.  Otherwise, wish me luck, and have
>>> a good weekend!
>>> 
>>> Thanks,  -- bennet
>>> 
>>> 
>>> 
>>> On Fri, Feb 17, 2017 at 7:24 PM, r...@open-mpi.org  
>>> wrote:
>>>> The embedded Singularity support hasn’t made it into the OMPI 2.x release 
>>>> series yet, though OMPI will still work within a Singularity container 
>>>> anyway.
>>>> 
>>>> Compatibility across the container boundary is always a problem, as your 
>>>> examples illustrate. If the system is using one OMPI version and the 
>>>> container is using another, then the only concern is compatibility across 
>>>> the container boundary of the process-to-ORTE daemon communication. In the 
>>>> OMPI 2.x series and beyond, this is done with PMIx. OMPI v2.0 is based on 
>>>> PMIx v1.x, and so will OMPI v2.1. Thus, there is no compatibility issue 
>>>> there. However, that statement is _not_ true for OMPI v1.10 and earlier 
>>>> series.
>>>> 
>>>> Future OMPI versions will utilize PMIx v2 and above, which include a 
>>>> cross-version compatibility layer. Thus, you shouldn’t have any issues 
>>>> mixing and matching OMPI versions from this regard.
>>>> 
>>>> However, your second example is a perfect illustration of where 
>>>> containerization can break down. If you build your container on a system 
>>>> that doesn’t have (for example) tm and verbs installed on it, then those 
>>>> OMPI components will not be built. The tm component won’t matter as the 
>>>> system version of mpirun will be executing, and it presumably knows how to 
>>>> interact with Torque.
>>>> 
>>>&

Re: [OMPI users] Segmentation Fault when using OpenMPI 1.10.6 and PGI 17.1.0 on POWER8

2017-02-21 Thread r...@open-mpi.org
Can you provide a backtrace with line numbers from a debug build? We don’t get 
much testing with lsf, so it is quite possible there is a bug in there.
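If it helps, one way to get that (compilers and paths below are examples only -
adjust to your PGI install and LSF location) is to rebuild with debugging enabled
and pull the trace out of a core file:

   ./configure --prefix=$HOME/ompi-1.10.6-dbg --enable-debug \
       --with-lsf=<path-to-lsf> CC=pgcc CXX=pgc++ FC=pgfortran
   make -j8 install
   ulimit -c unlimited
   mpirun -np 2 ./IMB-MPI1          # reproduce the SIGSEGV in MPI_Finalize
   gdb ./IMB-MPI1 core.<pid> -ex bt -ex quit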

> On Feb 21, 2017, at 7:39 PM, Hammond, Simon David (-EXP)  
> wrote:
> 
> Hi OpenMPI Users,
> 
> Has anyone successfully tested OpenMPI 1.10.6 with PGI 17.1.0 on POWER8 with 
> the LSF scheduler (--with-lsf=..)?
> 
> I am getting this error when the code hits MPI_Finalize. It causes the job to 
> abort (i.e. exit the LSF session) when I am running interactively.
> 
> Are there any materials we can supply to aid debugging/problem isolation?
> 
> [white23:58788] *** Process received signal ***
> [white23:58788] Signal: Segmentation fault (11)
> [white23:58788] Signal code: Invalid permissions (2)
> [white23:58788] Failing at address: 0x108e0810
> [white23:58788] [ 0] [0x10050478]
> [white23:58788] [ 1] [0x0]
> [white23:58788] [ 2] 
> /home/projects/pwr8-rhel73-lsf/openmpi/1.10.6/pgi/17.1.0/cuda/none/lib/libopen-rte.so.12(+0x1b6b0)[0x1071b6b0]
> [white23:58788] [ 3] 
> /home/projects/pwr8-rhel73-lsf/openmpi/1.10.6/pgi/17.1.0/cuda/none/lib/libopen-rte.so.12(orte_finalize+0x70)[0x1071b5b8]
> [white23:58788] [ 4] 
> /home/projects/pwr8-rhel73-lsf/openmpi/1.10.6/pgi/17.1.0/cuda/none/lib/libmpi.so.12(ompi_mpi_finalize+0x760)[0x10121dc8]
> [white23:58788] [ 5] 
> /home/projects/pwr8-rhel73-lsf/openmpi/1.10.6/pgi/17.1.0/cuda/none/lib/libmpi.so.12(PMPI_Finalize+0x6c)[0x10153154]
> [white23:58788] [ 6] ./IMB-MPI1[0x100028dc]
> [white23:58788] [ 7] /lib64/libc.so.6(+0x24700)[0x104b4700]
> [white23:58788] [ 8] /lib64/libc.so.6(__libc_start_main+0xc4)[0x104b48f4]
> [white23:58788] *** End of error message ***
> [white22:73620] *** Process received signal ***
> [white22:73620] Signal: Segmentation fault (11)
> [white22:73620] Signal code: Invalid permissions (2)
> [white22:73620] Failing at address: 0x108e0810
> 
> 
> Thanks,
> 
> S.
> 
> —
> 
> Si Hammond
> Scalable Computer Architectures
> Sandia National Laboratories, NM, USA
> 
> [Sent from Remote Connection, Please excuse typos]
> 
> 
> 
> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] More confusion about --map-by!

2017-02-23 Thread r...@open-mpi.org
From the mpirun man page:

**
Open MPI employs a three-phase procedure for assigning process locations and 
ranks:
  mapping   Assigns a default location to each process
  ranking   Assigns an MPI_COMM_WORLD rank value to each process
  binding   Constrains each process to run on specific processors

The mapping step is used to assign a default location to each process based on 
the mapper being employed. Mapping by slot, node, and sequentially results in 
the assignment of the processes to the node level. In contrast, mapping by 
object allows the mapper to assign the process to an actual object on each 
node.

Note: the location assigned to the process is independent of where it will be 
bound - the assignment is used solely as input to the binding algorithm.

The mapping of processes to nodes can be defined not just with general 
policies but also, if necessary, using arbitrary mappings that cannot be 
described by a simple policy. One can use the "sequential mapper," which reads 
the hostfile line by line, assigning processes to nodes in whatever order the 
hostfile specifies. Use the -mca rmaps seq option. For example, using the same 
hostfile as before:

mpirun -hostfile myhostfile -mca rmaps seq ./a.out

will launch three processes, one on each of nodes aa, bb, and cc, respectively. 
The slot counts don’t matter; one process is launched per line on whatever node 
is listed on the line.
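
(For context, the myhostfile referenced above contains one node per line, something 
like the following; the slot counts here are hypothetical and, as noted, ignored by 
the sequential mapper:)

aa slots=2
bb slots=2
cc slots=2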

Another way to specify arbitrary mappings is with a rankfile, which gives you 
detailed control over process binding as well. Rankfiles are discussed below.

The second phase focuses on the ranking of the process within the job’s 
MPI_COMM_WORLD. Open MPI separates this from the mapping procedure to allow 
more flexibility in the relative placement of MPI processes. 

The binding phase actually binds each process to a given set of processors. 
This can improve performance if the operating system is placing processes 
suboptimally. For example, it might oversubscribe some multi-core processor 
sockets, leaving other sockets idle; this can lead processes to contend 
unnecessarily for common resources. Or, it might spread processes out too 
widely; this can be suboptimal if application performance is sensitive to 
interprocess communication costs. Binding can also keep the operating system 
from migrating processes excessively, regardless of how optimally those 
processes were placed to begin with.


So what you probably want is:  --map-by socket:pe=N --rank-by core

Remember, the pe=N modifier automatically forces binding at the cpu level. The 
rank-by directive defaults to rank-by socket when you map-by socket, hence you 
need to specify that you want it to rank by core instead. Here is the result of 
doing that on my box:

$ mpirun --map-by socket:pe=2 --rank-by core --report-bindings -n 8 hostname
[rhc001:154283] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 
1[hwt 0-1]]: 
[BB/BB/../../../../../../../../../..][../../../../../../../../../../../..]
[rhc001:154283] MCW rank 1 bound to socket 0[core 2[hwt 0-1]], socket 0[core 
3[hwt 0-1]]: 
[../../BB/BB/../../../../../../../..][../../../../../../../../../../../..]
[rhc001:154283] MCW rank 2 bound to socket 0[core 4[hwt 0-1]], socket 0[core 
5[hwt 0-1]]: 
[../../../../BB/BB/../../../../../..][../../../../../../../../../../../..]
[rhc001:154283] MCW rank 3 bound to socket 0[core 6[hwt 0-1]], socket 0[core 
7[hwt 0-1]]: 
[../../../../../../BB/BB/../../../..][../../../../../../../../../../../..]
[rhc001:154283] MCW rank 4 bound to socket 1[core 12[hwt 0-1]], socket 1[core 
13[hwt 0-1]]: 
[../../../../../../../../../../../..][BB/BB/../../../../../../../../../..]
[rhc001:154283] MCW rank 5 bound to socket 1[core 14[hwt 0-1]], socket 1[core 
15[hwt 0-1]]: 
[../../../../../../../../../../../..][../../BB/BB/../../../../../../../..]
[rhc001:154283] MCW rank 6 bound to socket 1[core 16[hwt 0-1]], socket 1[core 
17[hwt 0-1]]: 
[../../../../../../../../../../../..][../../../../BB/BB/../../../../../..]
[rhc001:154283] MCW rank 7 bound to socket 1[core 18[hwt 0-1]], socket 1[core 
19[hwt 0-1]]: 
[../../../../../../../../../../../..][../../../../../../BB/BB/../../../..]
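
(If the end goal is N OpenMP threads per MPI rank, the usual companion step is to 
set the thread count to match the pe value; a sketch, with ./prog standing in for 
your hybrid binary:)

$ export OMP_NUM_THREADS=2
$ mpirun --map-by socket:pe=2 --rank-by core -np 8 ./prog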


HTH
Ralph

> On Feb 23, 2017, at 6:18 AM,   wrote:
> 
> Mark,
> 
> what about
> mpirun -np 6 -map-by slot:PE=4 --bind-to core --report-bindings ./prog
> 
> it is a fit for 1) and 2) but not 3)
> 
> if you use OpenMP and want 2 threads per task, then you can
> export OMP_NUM_THREADS=2
> not to use 4 threads by default (with most OpenMP runtimes)
> 
> Cheers,
> 
> Gilles
> - Original Message -
>> Hi,
>> 
>> I'm still trying to figure out how to express the core binding I want 
> to 
>> openmpi 2.x via the --map-by option. Can anyone help, please?
>> 
>> I bet I'm being dumb, but it's proving tricky to achieve the following 
>> aims (most important first):
>> 
>> 1) Maximise memory bandwidth usage (e.g. load balance ranks across
>>processor sockets)
>> 2) Optimise

Re: [OMPI users] More confusion about --map-by!

2017-02-23 Thread r...@open-mpi.org
Just as a fun follow-up: if you wanted to load-balance across nodes as well as 
within nodes, then you would add the “span” modifier to map-by:

$ mpirun --map-by socket:span,pe=2 --rank-by core --report-bindings -n 8 
hostname
[rhc001:162391] SETTING BINDING TO CORE
[rhc001:162391] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 
1[hwt 0-1]]: 
[BB/BB/../../../../../../../../../..][../../../../../../../../../../../..]
[rhc001:162391] MCW rank 1 bound to socket 0[core 2[hwt 0-1]], socket 0[core 
3[hwt 0-1]]: 
[../../BB/BB/../../../../../../../..][../../../../../../../../../../../..]
[rhc001:162391] MCW rank 2 bound to socket 1[core 12[hwt 0-1]], socket 1[core 
13[hwt 0-1]]: 
[../../../../../../../../../../../..][BB/BB/../../../../../../../../../..]
[rhc001:162391] MCW rank 3 bound to socket 1[core 14[hwt 0-1]], socket 1[core 
15[hwt 0-1]]: 
[../../../../../../../../../../../..][../../BB/BB/../../../../../../../..]


[rhc002.cluster:150295] MCW rank 4 bound to socket 0[core 0[hwt 0-1]], socket 
0[core 1[hwt 0-1]]: 
[BB/BB/../../../../../../../../../..][../../../../../../../../../../../..]
[rhc002.cluster:150295] MCW rank 5 bound to socket 0[core 2[hwt 0-1]], socket 
0[core 3[hwt 0-1]]: 
[../../BB/BB/../../../../../../../..][../../../../../../../../../../../..]
[rhc002.cluster:150295] MCW rank 6 bound to socket 1[core 12[hwt 0-1]], socket 
1[core 13[hwt 0-1]]: 
[../../../../../../../../../../../..][BB/BB/../../../../../../../../../..]
[rhc002.cluster:150295] MCW rank 7 bound to socket 1[core 14[hwt 0-1]], socket 
1[core 15[hwt 0-1]]: 
[../../../../../../../../../../../..][../../BB/BB/../../../../../../../..]

“span” causes ORTE to treat all the sockets etc. as being on a single giant 
node.

HTH
Ralph


> On Feb 23, 2017, at 6:38 AM, r...@open-mpi.org wrote:
> 
> From the mpirun man page:
> 
> **
> Open MPI employs a three-phase procedure for assigning process locations and 
> ranks:
>   mapping   Assigns a default location to each process
>   ranking   Assigns an MPI_COMM_WORLD rank value to each process
>   binding   Constrains each process to run on specific processors
> 
> The mapping step is used to assign a default location to each process based 
> on the mapper being employed. Mapping by slot, node, and sequentially results 
> in the assignment of the processes to the node level. In contrast, mapping by 
> object allows the mapper to assign the process to an actual object on each 
> node.
> 
> Note: the location assigned to the process is independent of where it will be 
> bound - the assignment is used solely as input to the binding algorithm.
> 
> The mapping of processes to nodes can be defined not just with 
> general policies but also, if necessary, using arbitrary mappings that cannot 
> be described by a simple policy. One can use the "sequential mapper," which 
> reads the hostfile line by line, assigning processes to nodes in whatever 
> order the hostfile specifies. Use the -mca rmaps seq option. For example, 
> using the same hostfile as before:
> 
> mpirun -hostfile myhostfile -mca rmaps seq ./a.out
> 
> will launch three processes, one on each of nodes aa, bb, and cc, 
> respectively. The slot counts don’t matter; one process is launched per line 
> on whatever node is listed on the line.
> 
> Another way to specify arbitrary mappings is with a rankfile, which gives you 
> detailed control over process binding as well. Rankfiles are discussed below.
> 
> The second phase focuses on the ranking of the process within the job’s 
> MPI_COMM_WORLD. Open MPI separates this from the mapping procedure to allow 
> more flexibility in the relative placement of MPI processes. 
> 
> The binding phase actually binds each process to a given set of processors. 
> This can improve performance if the operating system is placing processes 
> suboptimally. For example, it might oversubscribe some multi-core processor 
> sockets, leaving other sockets idle; this can lead processes to contend 
> unnecessarily for common resources. Or, it might spread processes out too 
> widely; this can be suboptimal if application performance is sensitive to 
> interprocess communication costs. Binding can also keep the operating system 
> from migrating processes excessively, regardless of how optimally those 
> processes were placed to begin with.
> 
> 
> So what you probably want is:  --map-by socket:pe=N --rank-by core
> 
> Remember, the pe=N modifier automatically forces binding at the cpu level. 
> The rank-by directive defaults to rank-by socket when you map-by socket, 
> hence you need to specify that you want it to rank by core instead. Here is 
> the result of doing that on my box:
> 
> $ mpirun --map-by socket:pe=2 --rank-by core --report-bindings -n 8 hostname
> [rhc001:1

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-23 Thread r...@open-mpi.org
You might want to try using the DVM (distributed virtual machine) mode in ORTE. 
You can start it on an allocation using the “orte-dvm” cmd, and then submit 
jobs to it with “mpirun --hnp <foo>”, where foo is either the contact info 
printed out by orte-dvm, or the name of the file you told orte-dvm to put that 
info in. You’ll need to take it from OMPI master at this point.
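
A minimal sketch of that flow, using only options that appear in this and the 
follow-up messages (the URI file path and the executables are placeholders):

$ orte-dvm --report-uri /home/user/dvm_uri &
$ mpirun --hnp file:/home/user/dvm_uri -np 1 ./serial_task
$ mpirun --hnp file:/home/user/dvm_uri -np 16 ./parallel_app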

Alternatively, you can get just the DVM bits by downloading the PMIx Reference 
Server (https://github.com/pmix/pmix-reference-server). It’s just ORTE, but with it 
locked to the DVM operation. So a simple “psrvr” starts the machine, and then 
“prun” executes cmds (supports all the orterun options, doesn’t need to be told 
how to contact psrvr).

Both will allow you to run serial as well as parallel codes (so long as they 
are built against OMPI master). We are working on providing cross-version PMIx 
support - at that time, you’ll be able to run OMPI v2.0 and above against 
either one as well.

HTH
Ralph

> On Feb 23, 2017, at 1:41 PM, Brock Palen  wrote:
> 
> Is it possible to use mpirun / orte as a load balancer for running serial
> jobs in parallel similar to GNU Parallel?
> https://www.biostars.org/p/63816/ 
> 
> Reason is on any major HPC system you normally want to use a resource
> manager launcher (TM, slurm etc)  and not ssh like gnu parallel.
> 
> I recall there being a way to give OMPI a stack of work todo from the talk
> at SC this year, but I can't figure it out if it does what I think it
> should do.
> 
> Thanks,
> 
> Brock Palen
> www.umich.edu/~brockp 
> Director Advanced Research Computing - TS
> XSEDE Campus Champion
> bro...@umich.edu 
> (734)936-1985
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread r...@open-mpi.org

> On Feb 27, 2017, at 4:58 AM, Angel de Vicente  wrote:
> 
> Hi,
> 
> "r...@open-mpi.org"  writes:
>> You might want to try using the DVM (distributed virtual machine)
>> mode in ORTE. You can start it on an allocation using the “orte-dvm”
>> cmd, and then submit jobs to it with “mpirun --hnp <foo>”, where foo
>> is either the contact info printed out by orte-dvm, or the name of
>> the file you told orte-dvm to put that info in. You’ll need to take
>> it from OMPI master at this point.
> 
> this question looked interesting so I gave it a try. In a cluster with
> Slurm I had no problem submitting a job which launched an orte-dvm
> -report-uri ... and then use that file to launch jobs onto that virtual
> machine via orte-submit. 
> 
> To be useful to us at this point, I should be able to start executing
> jobs if there are cores available and just hold them in a queue if the
> cores are already filled. At this point this is not happening, and if I
> try to submit a second job while the previous one has not finished, I
> get a message like:
> 
> ,
> | DVM ready
> | --
> | All nodes which are allocated for this job are already filled.
> | --
> `
> 
> With the DVM, is it possible to keep these jobs in some sort of queue,
> so that they will be executed when the cores get free?

It wouldn’t be hard to do so - as long as it was just a simple FIFO scheduler. 
I wouldn’t want it to get too complex.

> 
> Thanks,
> -- 
> Ángel de Vicente
> http://www.iac.es/galeria/angelv/  
> -
> ADVERTENCIA: Sobre la privacidad y cumplimiento de la Ley de Protección de 
> Datos, acceda a http://www.iac.es/disclaimer.php
> WARNING: For more information on privacy and fulfilment of the Law concerning 
> the Protection of Data, consult http://www.iac.es/disclaimer.php?lang=en
> 
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Using OpenMPI / ORTE as cluster aware GNU Parallel

2017-02-27 Thread r...@open-mpi.org

> On Feb 27, 2017, at 9:39 AM, Reuti  wrote:
> 
> 
>> Am 27.02.2017 um 18:24 schrieb Angel de Vicente :
>> 
>> […]
>> 
>> For a small group of users if the DVM can run with my user and there is
>> no restriction on who can use it or if I somehow can authorize others to
>> use it (via an authority file or similar) that should be enough.
> 
> AFAICS there is no user authorization at all. Everyone can hijack a running 
> DVM once he knows the URI. The only problem might be that all processes are 
> running under the account of the user who started the DVM. I.e. output files 
> have to go to the home directory of this user, as any other user's jobs can no 
> longer write into that user's own directory this way.

We can add some authorization protection, at least at the user/group level. One 
can resolve the directory issue by creating some place that has group 
authorities, and then requesting that to be the working directory.
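
A rough sketch of that directory workaround (the group name and paths are 
hypothetical; -wdir is the usual mpirun option for setting the working directory 
of the launched processes):

$ mkdir /scratch/dvm-shared
$ chgrp dvmusers /scratch/dvm-shared && chmod 2770 /scratch/dvm-shared
$ mpirun --hnp file:/home/dvmowner/dvmuri -wdir /scratch/dvm-shared -np 1 ./task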

> 
> Running the DVM under root might help, but this would be a high risk that any 
> faulty script might write to a place where sensible system information is 
> stored and may leave the machine unusable afterwards.
> 

I would advise against that

> My first attempts at using the DVM often led to a terminated DVM once a process 
> returned with a non-zero exit code. But once the DVM is gone, the queued jobs 
> might be lost too, I fear. I would wish that the DVM could be more forgiving 
> (or that what to do in case of a non-zero exit code were adjustable).

We just fixed that issue the other day :-)

> 
> -- Reuti
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] State of the DVM in Open MPI

2017-02-28 Thread r...@open-mpi.org
Hi Reuti

The DVM in master seems to be fairly complete, but several organizations are in 
the process of automating tests for it so it gets more regular exercise.

If you are using a version in OMPI 2.x, those are early prototype - we haven’t 
updated the code in the release branches. The more production-ready version 
will be in 3.0, and we’ll start supporting it there.

Meantime, we do appreciate any suggestions and bug reports as we polish it up.


> On Feb 28, 2017, at 2:17 AM, Reuti  wrote:
> 
> Hi,
> 
> Only by reading recent posts did I become aware of the DVM. This would be a 
> welcome feature for our setup*. But I see that not all options work as expected - is it 
> still a work in progress, or should all work as advertised?
> 
> 1)
> 
> $ soft@server:~> orte-submit -cf foo --hnp file:/home/reuti/dvmuri -n 1 touch 
> /home/reuti/hacked
> 
> Open MPI has detected that a parameter given to a command line
> option does not match the expected format:
> 
>  Option: np
>  Param:  foo
> 
> ==> The given option is -cf, not -np
> 
> 2)
> 
> According to `man orte-dvm` there is -H, -host, --host, -machinefile, 
> -hostfile but none of them seem operational (Open MPI 2.0.2). A given 
> hostlist given by SGE is honored though.
> 
> -- Reuti
> 
> 
> *) We run Open MPI jobs inside SGE. This works fine. Some applications invoke 
> several `mpiexec`-calls during their execution and rely on temporary files 
> they created in the last step(s). While this is working fine on one and the 
> same machine, it fails in case SGE granted slots on several machines as the 
> scratch directories created by `qrsh -inherit …` vanish once the 
> `mpiexec`-call on this particular node finishes (and not at the end of the 
> complete job). I can mimic persistent scratch directories in SGE for a 
> complete job, but invoking the DVM before and shutting it down later on 
> (either by hand in the job script or by SGE killing all remains at the end of 
> the job) might be more straight forward (looks like `orte-dvm` is started by 
> `qrsh -inherit …` too).
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Issues with different IB adapters and openmpi 2.0.2

2017-02-28 Thread r...@open-mpi.org
The root cause is that the nodes are defined as “heterogeneous” because the 
difference in HCAs causes a difference in selection logic. For scalability 
purposes, we don’t circulate the choice of PML as that isn’t something mpirun 
can “discover” and communicate.

One option we could pursue is to provide a mechanism by which we add the HCAs 
to the topology “signature” sent back by the daemon. This would allow us to 
detect the difference, and then ensure that the PML selection is included in 
the circulated wireup data so the system can at least warn you of the problem 
instead of silently hanging.
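
In the meantime, forcing ob1 can be done per run, per environment, or as a 
persistent default through the standard MCA parameter mechanisms (a sketch; adjust 
paths and process counts to your site):

$ mpirun --mca pml ob1 -np 16 ./app
$ export OMPI_MCA_pml=ob1
$ echo "pml = ob1" >> $HOME/.openmpi/mca-params.conf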


> On Feb 28, 2017, at 10:38 AM, Orion Poplawski  wrote:
> 
> On 02/27/2017 05:19 PM, Howard Pritchard wrote:
>> Hi Orion
>> 
>> Does the problem occur if you only use font2 and 3?  Do you have MXM 
>> installed
>> on the font1 node?
> 
> No, running across font2/3 is fine.  No idea what MXM is.
> 
>> The 2.x series is using PMIX and it could be that is impacting the PML sanity
>> check.
>> 
>> Howard
>> 
>> 
>> Orion Poplawski mailto:or...@cora.nwra.com>> schrieb am
>> Mo. 27. Feb. 2017 um 14:50:
>> 
>>We have a couple nodes with different IB adapters in them:
>> 
>>font1/var/log/lspci:03:00.0 InfiniBand [0c06]: Mellanox Technologies 
>> MT25204
>>[InfiniHost III Lx HCA] [15b3:6274] (rev 20)
>>font2/var/log/lspci:03:00.0 InfiniBand [0c06]: QLogic Corp. IBA7220 
>> InfiniBand
>>HCA [1077:7220] (rev 02)
>>font3/var/log/lspci:03:00.0 InfiniBand [0c06]: QLogic Corp. IBA7220 
>> InfiniBand
>>HCA [1077:7220] (rev 02)
>> 
>>With 1.10.3 we saw the following errors with mpirun:
>> 
>>[font2.cora.nwra.com:13982 ]
>>[[23220,1],10] selected pml cm, but peer
>>[[23220,1],0] on font1 selected pml ob1
>> 
>>which crashed MPI_Init.
>> 
>>We worked around this by passing "--mca pml ob1".  I notice now with 
>> openmpi
>>2.0.2 without that option I no longer see errors, but the mpi program will
>>hang shortly after startup.  Re-adding the option makes it work, so I'm
>>assuming the underlying problem is still the same, but openmpi appears to 
>> have
>>stopped alerting me to the issue.
>> 
>>Thoughts?
>> 
>>--
>>Orion Poplawski
>>Technical Manager  720-772-5637
>>NWRA, Boulder/CoRA Office FAX: 303-415-9702
>>3380 Mitchell Lane   or...@nwra.com
>>
>>Boulder, CO 80301   http://www.nwra.com
>>___
>>users mailing list
>>users@lists.open-mpi.org 
>>https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>> 
>> 
>> 
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>> 
> 
> 
> -- 
> Orion Poplawski
> Technical Manager  720-772-5637
> NWRA, Boulder/CoRA Office FAX: 303-415-9702
> 3380 Mitchell Lane   or...@nwra.com
> Boulder, CO 80301   http://www.nwra.com
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] MPI for microcontrolles without OS

2017-03-08 Thread r...@open-mpi.org
OpenMPI has been ported to microcontrollers before, but it does require at 
least a minimal OS to provide support (e.g., TCP for communications). Most IoT 
systems already include an OS on them for just that reason. I personally have 
OMPI running on a little Edison board using the OS that comes with it - no 
major changes were required.

Others have used various typical microcontroller real-time OS on their systems, 
and porting OMPI to them usually isn’t that bad. May require some configuration 
mods.
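
Purely as an illustration of the kind of configuration involved (a hedged sketch; 
the cross toolchain triplet and prefix are placeholders, and a real port will need 
more platform-specific work than this), a static, dlopen-free cross build is 
usually the starting point:

$ ./configure --host=arm-linux-gnueabihf --prefix=/opt/ompi-arm \
      --enable-static --disable-shared --disable-dlopen
$ make all install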


> On Mar 8, 2017, at 9:00 AM, Mateusz Tasz  wrote:
> 
> Hello,
> 
> I am a student. I am attracted by the concept of MPI and I would like to
> apply this idea to bare-metal devices like microcontrollers, e.g. the
> stm32. But your solution requires an operating system on board. May I
> ask why it is necessary? Can I do without it? And if so, how can I do it?
> I ask because I'd like to apply this concept to an IoT system where data
> can be processed by a few devices in the local neighbourhood.
> 
> Thanks in advance,
> Mateusz Tasz
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] MPI for microcontrolles without OS

2017-03-08 Thread r...@open-mpi.org
A quick web search can answer your question, I believe - here are a few hits I got 
(Texas Instruments has been active in this area):

http://processors.wiki.ti.com/index.php/MCSDK_HPC_3.x_OpenMPI_Under_Review
https://e2e.ti.com/support/applications/high-performance-computing/f/952/t/440905

Several of our members have Raspberry Pi systems running OMPI - looks something 
like this one:

https://www.hackster.io/darthbison/raspberry-pi-cluster-with-mpi-4602cb

and here’s a little book on how to do it:

https://www.packtpub.com/hardware-and-creative/raspberry-pi-super-cluster

or one of many online explanations:

http://www.southampton.ac.uk/~sjc/raspberrypi/pi_supercomputer_southampton_web.pdf

HTH
Ralph

> On Mar 8, 2017, at 10:41 AM, Mateusz Tasz  wrote:
> 
> Hi,
> 
> Thanks for your answer, although I am still confused. As far as I know, TCP
> communication is not a problem for microcontrollers, so that cannot be the
> crucial reason for requiring an OS. Maybe something else is also necessary -
> regarding memory, perhaps - do you know?
> Do you know where I can find a ported version of OMPI for
> microcontrollers (hopefully with documentation :) ?  I admit that
> having an OS on the board is nice and gives a high level of abstraction.
> But I believe that sometimes the lower level would be necessary - that's
> why I am trying to find a solution.
> About this ported version - was it working properly?
> 
> Thanks in advance,
> Mateusz Tasz
> 
> 
> 2017-03-08 18:23 GMT+01:00 r...@open-mpi.org :
>> OpenMPI has been ported to microcontrollers before, but it does require at 
>> least a minimal OS to provide support (e.g., TCP for communications). Most 
>> IoT systems already include an OS on them for just that reason. I personally 
>> have OMPI running on a little Edison board using the OS that comes with it - 
>> no major changes were required.
>> 
>> Others have used various typical microcontroller real-time OS on their 
>> systems, and porting OMPI to them usually isn’t that bad. May require some 
>> configuration mods.
>> 
>> 
>>> On Mar 8, 2017, at 9:00 AM, Mateusz Tasz  wrote:
>>> 
>>> Hello,
>>> 
>>> I am a student. I am attracted by the concept of MPI and I would like to
>>> apply this idea to bare-metal devices like microcontrollers, e.g. the
>>> stm32. But your solution requires an operating system on board. May I
>>> ask why it is necessary? Can I do without it? And if so, how can I do it?
>>> I ask because I'd like to apply this concept to an IoT system where data
>>> can be processed by a few devices in the local neighbourhood.
>>> 
>>> Thanks in advance,
>>> Mateusz Tasz
>>> ___
>>> users mailing list
>>> users@lists.open-mpi.org
>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>> 
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] OpenMPI in docker container

2017-03-11 Thread r...@open-mpi.org
Past attempts have indicated that only TCP works well with Docker - if you want 
to use OPA, you’re probably better off using Singularity as your container.

http://singularity.lbl.gov/ 

The OMPI master has some optimized integration for Singularity, but 2.0.2 will 
work with it just fine as well.
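
For reference, the usual hybrid launch pattern with Singularity looks roughly like 
this (a sketch; the image and binary names are placeholders, and the OMPI inside 
the image should be compatible with the one on the host, as discussed above):

$ mpirun -np 16 singularity exec ./centos7-ompi.img /opt/app/mpi_hello.x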


> On Mar 11, 2017, at 11:09 AM, Ender GÜLER  wrote:
> 
> Hi Josh,
> 
> Thanks for your suggestion. When I added "-mca pml ob1" it worked. Actually I 
> do need the psm support (just not in this scenario). Here's the story: 
> 
> I compiled the openmpi source with psm2 support because the host has an OmniPath 
> device, and my first step was to test whether I can use the hardware or not; I 
> ended up testing the compiled OpenMPI against the different transport modes 
> without success. 
> 
> The psm2 support is working when running directly from physical host and I 
> suppose the docker layer has something to do with this error. But I cannot 
> figure out what causes this situation.
> 
> Do you guys, have any idea what to look at next? I'll ask opinion at the 
> Docker Forums but before that I try to get more information and I wondered 
> whether anyone else have this kind of problem before.
> 
> Regards,
> 
> Ender
> 
> On Sat, Mar 11, 2017 at 6:19 PM Josh Hursey  > wrote:
> From the stack trace it looks like it's failing in the PSM2 MTL, which you 
> shouldn't need (or want?) in this scenario.
> 
> Try adding this additional MCA parameter to your command line:
>  -mca pml ob1
> 
> That will force Open MPI's selection such that it avoids that component. That 
> might get you further along.
> 
> 
> On Sat, Mar 11, 2017 at 7:49 AM, Ender GÜLER  > wrote:
> Hi there,
> 
> I try to use openmpi in a docker container. My host and container OS is 
> CentOS 7 (7.2.1511 to be exact). When I try to run a simple MPI hello world 
> application, the app core dumps every time with BUS ERROR. The OpenMPI 
> version is 2.0.2 and I compiled it in the container. When I copied the 
> installation from container to host, it runs without any problem.
> 
> Have you ever tried to run OpenMPI and encountered a problem like this one? 
> If so, what could be wrong? What should I do to find the root cause and solve 
> the problem? The very same application can be run with IntelMPI in the 
> container without any problem.
> 
> I pasted the output of my mpirun command and its output below.
> 
> [root@cn15 ~]# mpirun --allow-run-as-root -mca btl sm -np 2 -machinefile 
> mpd.hosts ./mpi_hello.x
> [cn15:25287] *** Process received signal ***
> [cn15:25287] Signal: Bus error (7)
> [cn15:25287] Signal code: Non-existant physical address (2)
> [cn15:25287] Failing at address: 0x7fe2d0fbf000
> [cn15:25287] [ 0] /lib64/libpthread.so.0(+0xf100)[0x7fe2d53e9100]
> [cn15:25287] [ 1] /lib64/libpsm2.so.2(+0x4b034)[0x7fe2d5a9a034]
> [cn15:25287] [ 2] /lib64/libpsm2.so.2(+0xc45f)[0x7fe2d5a5b45f]
> [cn15:25287] [ 3] /lib64/libpsm2.so.2(+0xc706)[0x7fe2d5a5b706]
> [cn15:25287] [ 4] /lib64/libpsm2.so.2(+0x10d60)[0x7fe2d5a5fd60]
> [cn15:25287] [ 5] /lib64/libpsm2.so.2(psm2_ep_open+0x41e)[0x7fe2d5a5e8de]
> [cn15:25287] [ 6] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_psm2_module_init+0x1df)[0x7fe2d69b5d5b]
> [cn15:25287] [ 7] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x1b3249)[0x7fe2d69b7249]
> [cn15:25287] [ 8] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_base_select+0xc2)[0x7fe2d69b2956]
> [cn15:25287] [ 9] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x216c9f)[0x7fe2d6a1ac9f]
> [cn15:25287] [10] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(mca_pml_base_select+0x29b)[0x7fe2d69f7566]
> [cn15:25287] [11] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mpi_init+0x665)[0x7fe2d687e0f4]
> [cn15:25287] [12] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(MPI_Init+0x99)[0x7fe2d68b1cb4]
> [cn15:25287] [13] ./mpi_hello.x[0x400927]
> [cn15:25287] [14] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fe2d5039b15]
> [cn15:25287] [15] ./mpi_hello.x[0x400839]
> [cn15:25287] *** End of error message ***
> [cn15:25286] *** Process received signal ***
> [cn15:25286] Signal: Bus error (7)
> [cn15:25286] Signal code: Non-existant physical address (2)
> [cn15:25286] Failing at address: 0x7fd4abb18000
> [cn15:25286] [ 0] /lib64/libpthread.so.0(+0xf100)[0x7fd4b3f56100]
> [cn15:25286] [ 1] /lib64/libpsm2.so.2(+0x4b034)[0x7fd4b4607034]
> [cn15:25286] [ 2] /lib64/libpsm2.so.2(+0xc45f)[0x7fd4b45c845f]
> [cn15:25286] [ 3] /lib64/libpsm2.so.2(+0xc706)[0x7fd4b45c8706]
> [cn15:25286] [ 4] /lib64/libpsm2.so.2(+0x10d60)[0x7fd4b45ccd60]
> [cn15:25286] [ 5] /lib64/libpsm2.so.2(psm2_ep_open+0x41e)[0x7fd4b45cb8de]
> [cn15:25286] [ 6] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_psm2_module_init+0x1df)[0x7fd4b5522d5b]
> [cn15:25286] [ 7] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(+0x1b3249)[0x7fd4b5524249]
> [cn15:25286] [ 8] 
> /opt/openmpi/2.0.2/lib/libmpi.so.20(ompi_mtl_base_select+0xc2)[0x7fd4b551f956]
> [cn15:25286] [ 9] 
> /opt/openmpi/2.0.2/
