Re: [OMPI users] mxm 3.0 and knem warnings

2014-08-27 Thread Brock Palen
Brice, et al. Thanks a lot for this info. We are setting up new builds of OMPI 1.8.2 with knem and mxm 3.0. If we have questions we will let you know. Brock Palen www.umich.edu/~brockp CAEN Advanced Computing XSEDE Campus Champion bro...@umich.edu (734)936-1985 On Aug 27, 2014, at 12:44 PM,
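
For reference, a build of this kind would typically be configured with Open MPI's --with-knem and --with-mxm options; a minimal sketch, with placeholder install prefixes and paths:

    # configure OMPI 1.8.2 against knem and MXM (paths are placeholders)
    ./configure --prefix=/opt/openmpi-1.8.2 \
        --with-knem=/opt/knem \
        --with-mxm=/opt/mellanox/mxm
    make -j8 && make install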

Re: [OMPI users] Running a hybrid MPI+openMP program

2014-08-27 Thread tmishima
Hi, Here is a very simple patch, but Ralph might have a different idea, so I'd like him to decide how to treat it. As far as I have checked, I believe it has no side effects. (See attached file: patch.bind-to-none) Tetsuya > Hi, > > On 27.08.2014 at 09:57, Tetsuya Mishima wrote: > > > Hi Reuti and R

Re: [OMPI users] mxm 3.0 and knem warnings

2014-08-27 Thread Brice Goglin
Hello Brock, Some people complained that giving world-wide access to a device file by default might be bad if we ever find a security leak in the kernel module. So I needed a better default. The rdma group is often used for OFED devices, and OFED and KNEM users are often the same, so it was a good
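
A sketch of the kind of udev rule this refers to (the rule file name and exact contents shipped with knem may differ; this only illustrates restricting the device to the rdma group):

    # /etc/udev/rules.d/10-knem.rules (assumed file name)
    # create /dev/knem owned by the rdma group, group read/write only
    KERNEL=="knem", GROUP="rdma", MODE="0660"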

Re: [OMPI users] mxm 3.0 and knem warnings

2014-08-27 Thread Alina Sklarevich
I'm not sure why this is the default, but in your case you should set the permissions to 666 to use it. On Wed, Aug 27, 2014 at 5:25 PM, Brock Palen wrote: > Are there any major issues with letting all users use it by setting /dev/knem to > 666? It appears knem by default wants to allow only users o

Re: [OMPI users] mxm 3.0 and knem warnings

2014-08-27 Thread Brock Palen
Are there any major issues with letting all users use it by setting /dev/knem to 666? It appears knem by default wants to allow only users in the rdma group (if defined) to access knem. We are a generic provider and want everyone to be able to use it; it just feels strange to restrict it, so I am tr
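
For a one-off test, the permission change is just the following (note it will not survive a reboot or module reload; a persistent setting would go through a udev rule as sketched above):

    # open /dev/knem to all users until the next reboot/module reload
    chmod 666 /dev/knem
    ls -l /dev/knem    # should now show crw-rw-rw-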

Re: [OMPI users] mxm 3.0 and knem warnings

2014-08-27 Thread Alina Sklarevich
Hi, KNEM can improve performance significantly for intra-node communication, and that's why MXM is using it. If you don't want to use it, you can suppress this warning by adding the following to your command line after mpirun: -x MXM_LOG_LEVEL=error Alina. On Wed, Aug 27, 2014 at 4:28 PM, Br
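
For example (rank count and program name are placeholders):

    # raise MXM's log threshold so WARN messages are suppressed
    mpirun -x MXM_LOG_LEVEL=error -np 16 ./a.out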

Re: [OMPI users] Re: Re: Does multiple Irecv mean concurrent receiving?

2014-08-27 Thread Jeff Squyres (jsquyres)
On Aug 27, 2014, at 9:21 AM, Zhang,Lei(Ecom) wrote: > The problem is that I profiled the receiving node and found that its network > bandwidth is utilized at less than 50%. How did you profile that? > That's why I want to find ways to increase the receiving throughput. Any > ideas? A lot of t

Re: [OMPI users] long initialization

2014-08-27 Thread Ralph Castain
How bizarre. Please add "--leave-session-attached -mca oob_base_verbose 100" to your cmd line. On Aug 27, 2014, at 4:31 AM, Timur Ismagilov wrote: > When I try to specify the OOB interface with --mca oob_tcp_if_include <interface from ifconfig>, I always get an error: > > $ mpirun --mca oob_tcp_if_include ib0 -np 1 ./he
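
Applied to the failing command from the quoted mail, that would look like:

    # keep the session attached and turn on maximum OOB debug output
    mpirun --leave-session-attached -mca oob_base_verbose 100 \
           --mca oob_tcp_if_include ib0 -np 1 ./hello_c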

[OMPI users] mxm 3.0 and knem warnings

2014-08-27 Thread Brock Palen
We updated our OFED and started to rebuild our MPI builds with mxm 3.0. Now we get warnings about knem: [1409145437.578861] [flux-login1:31719:0] shm.c:65 MXM WARN Could not open the KNEM device file at /dev/knem : No such file or directory. Won't use knem. I have heard about it a
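
This warning usually means the knem kernel module is not loaded on that node; a quick check, assuming a standard knem install:

    lsmod | grep knem     # is the module loaded?
    sudo modprobe knem    # load it if not
    ls -l /dev/knem       # the device file should now exist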

[OMPI users] Re: Re: Does multiple Irecv mean concurrent receiving?

2014-08-27 Thread Zhang,Lei(Ecom)
The problem is that I profiled the receiving node and found that its network bandwidth is utilized at less than 50%. That's why I want to find ways to increase the receiving throughput. Any ideas? Lei -----Original Message----- From: users [mailto:users-boun...@open-mpi.org] On Behalf Of George Bosilca Sent: August 27

Re: [OMPI users] Re: Does multiple Irecv mean concurrent receiving?

2014-08-27 Thread George Bosilca
You have a physical constraint: the capacity of your links. If you are at over 90% of your network bandwidth, there is little to be improved. George. On Aug 27, 2014, at 0:18, "Zhang,Lei(Ecom)" wrote: >> I'm not sure what you mean by this statement. If you add N asynchronous >> requests and the

Re: [OMPI users] long initialization

2014-08-27 Thread Timur Ismagilov
When I try to specify the OOB interface with --mca oob_tcp_if_include <interface from ifconfig>, I always get an error: $ mpirun --mca oob_tcp_if_include ib0 -np 1 ./hello_c -- An ORTE daemon has unexpectedly failed after launch and before communicating back to mpiru

Re: [OMPI users] Running a hybrid MPI+openMP program

2014-08-27 Thread Reuti
Hi, On 27.08.2014 at 09:57, Tetsuya Mishima wrote: > Hi Reuti and Ralph, > > What do you think about accepting the bind-to none option even when the pe=N option > is provided? > > just like: > mpirun -map-by slot:pe=N -bind-to none ./inverse Yes, this would be OK to cover all cases. -- Reuti > If

Re: [OMPI users] Running a hybrid MPI+openMP program

2014-08-27 Thread Tetsuya Mishima
Hi Reuti and Ralph, What do you think about accepting the bind-to none option even when the pe=N option is provided? Just like: mpirun -map-by slot:pe=N -bind-to none ./inverse If yes, it's easy for me to make a patch. Tetsuya Tetsuya Mishima tmish...@jcity.maeda.co.jp
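
A usage sketch of the combination the patch would allow (rank, slot, and thread counts are illustrative): reserve N slots per rank for the OpenMP threads, but apply no binding, leaving thread placement to the OpenMP runtime.

    # 4 MPI ranks, 4 slots reserved per rank, no binding applied
    export OMP_NUM_THREADS=4
    mpirun -np 4 -map-by slot:pe=4 -bind-to none ./inverse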

Re: [OMPI users] OpenMPI Remote Execution Problem (Application does not start)

2014-08-27 Thread Benjamin Giehle
OK, it's working now. I disabled the iptables firewall and now it works. I had configured my two machines differently; that is why I got the error from the other subject. Sorry for the new subject, but it's the first time I am using a mailing list. I hope I am replying correctly now :D Thanks for you
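
For anyone hitting the same symptom, the quick test is along these lines (RHEL/CentOS-style service command assumed; opening the specific TCP ports Open MPI uses is the safer long-term fix than leaving the firewall off):

    # temporarily disable iptables on both machines, then rerun mpirun;
    # if the job now starts, the firewall was blocking OMPI's TCP connections
    sudo service iptables stop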

[OMPI users] OpenMPI Remote Execution Problem (Application does not start)

2014-08-27 Thread Benjamin Giehle
Thank you, I added the parameters and figured out that the iptables firewall was messing something up, so I disabled it on both machines. But now I get another error: [superuser@localhost ~]$ mpirun --host 192.168.54.56 --leave-session-attached -mca plm_base_verbose 5 -mca oob_base_verbose 5

[OMPI users] 答复: Does multiple Irecv means concurrent receiving ?

2014-08-27 Thread Zhang,Lei(Ecom)
> I'm not sure what you mean by this statement. If you add N asynchronous > requests and the speed is not decreased, that's a *good* thing, right? My problem is that N asynchronous irecvs do not *increase* the speed of receiving data compared to just one irecv. I have multiple nodes sending larg
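
For context, the pattern under discussion looks roughly like the sketch below (buffer sizes, tags, and the one-receiver layout are illustrative, not the poster's actual code). The point made later in the thread is that posting several such receives does not by itself raise throughput once the receiver's link is the bottleneck.

    /* sketch: rank 0 posts one nonblocking receive per sender and
     * waits for all of them; sizes and tags are arbitrary */
    #include <mpi.h>
    #include <stdlib.h>

    #define MSG_BYTES (1 << 20)   /* 1 MiB per sender, illustrative */

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            int nsenders = size - 1;
            char *bufs = malloc((size_t)nsenders * MSG_BYTES);
            MPI_Request *reqs = malloc(nsenders * sizeof(MPI_Request));
            /* post all receives up front so they can progress concurrently */
            for (int i = 0; i < nsenders; i++)
                MPI_Irecv(bufs + (size_t)i * MSG_BYTES, MSG_BYTES, MPI_BYTE,
                          i + 1, 0, MPI_COMM_WORLD, &reqs[i]);
            MPI_Waitall(nsenders, reqs, MPI_STATUSES_IGNORE);
            free(bufs);
            free(reqs);
        } else {
            char *buf = calloc(1, MSG_BYTES);
            MPI_Send(buf, MSG_BYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            free(buf);
        }

        MPI_Finalize();
        return 0;
    }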