Brice, et al.
Thanks a lot for this info. We are setting up new builds of OMPI 1.8.2 with
knem and mxm 3.0. If we have questions, we will let you know.
Brock Palen
www.umich.edu/~brockp
CAEN Advanced Computing
XSEDE Campus Champion
bro...@umich.edu
(734)936-1985
On Aug 27, 2014, at 12:44 PM,
Hi,
Here is a very simple patch, but Ralph might have a different idea,
so I'd like him to decide how to handle it. As far as I have checked,
it has no side effects.
(See attached file: patch.bind-to-none)
Tetsuya
> Hi,
>
> Am 27.08.2014 um 09:57 schrieb Tetsuya Mishima:
>
> > Hi Reuti and R
Hello Brock,
Some people complained that giving world-wide access to a device file by
default might be bad if we ever find a security hole in the kernel
module. So I needed a better default. The rdma group is often used for
OFED devices, and OFED and KNEM users are often the same, so it was a
good default.
I'm not sure why this is the default, but in your case you should set the
permissions to 666 to use it.
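A chmod on /dev/knem does not survive a reboot, so a udev rule is the usual way to make the permission change persistent. This is a hedged sketch: the `KERNEL=="knem"` match key and the rule file name are assumptions based on how knem typically exposes its device node, and the rule is written to a local file here for illustration (it would normally go under /etc/udev/rules.d/):

```shell
# Sketch: a udev rule granting world read/write access to /dev/knem.
# Match key and file name are assumptions; adapt to your distribution.
cat > 66-knem.rules <<'EOF'
KERNEL=="knem", MODE="0666"
EOF
cat 66-knem.rules
```

After installing the rule and reloading udev (or reloading the knem module), /dev/knem should come up world-accessible.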
On Wed, Aug 27, 2014 at 5:25 PM, Brock Palen wrote:
> Are there any major issues with letting all users use it by setting /dev/knem to
> 666? It appears knem by default wants to only allow users o
Are there any major issues with letting all users use it by setting /dev/knem to
666? It appears knem by default only wants to allow users of the rdma group (if
defined) to access knem.
We are a generic provider and want everyone to be able to use it; it just
feels strange to restrict it, so I am tr
Hi,
KNEM can significantly improve performance for intra-node communication,
which is why MXM uses it.
If you don't want to use it, you can suppress this warning by adding the
following to your command line after mpirun:
-x MXM_LOG_LEVEL=error
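For context, a full launch line might look like the sketch below; the program name and process count are placeholders, and the alternative `export` form assumes the MXM_* variable reaches the launched processes' environment (worth verifying for your OMPI build, since `-x` is the explicit way to forward it):

```shell
# Hedged sketch: raise MXM's log threshold to "error" to silence the warning.
# ./my_app and -np 4 are placeholders:
#   mpirun -np 4 -x MXM_LOG_LEVEL=error ./my_app
# Setting the variable in the environment first is another option:
export MXM_LOG_LEVEL=error
echo "$MXM_LOG_LEVEL"
```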
Alina.
On Wed, Aug 27, 2014 at 4:28 PM, Br
On Aug 27, 2014, at 9:21 AM, Zhang,Lei(Ecom) wrote:
> The problem is that I profiled the receiving node and found that less than
> 50% of its network bandwidth is used.
How did you profile that?
> That's why I want to find ways to increase the receiving throughput. Any
> ideas ?
A lot of t
How bizarre. Please add "--leave-session-attached -mca oob_base_verbose 100" to
your cmd line
On Aug 27, 2014, at 4:31 AM, Timur Ismagilov wrote:
> When I try to specify oob with --mca oob_tcp_if_include <interface from ifconfig>, I always get an error:
>
> $ mpirun --mca oob_tcp_if_include ib0 -np 1 ./he
We updated our OFED and started to rebuild our MPI builds with mxm 3.0.
Now we get warnings about knem:
[1409145437.578861] [flux-login1:31719:0] shm.c:65 MXM WARN Could
not open the KNEM device file at /dev/knem : No such file or directory. Won't
use knem.
I have heard about it a
The problem is that I profiled the receiving node and found that less than
50% of its network bandwidth is used. That's why I want to find ways to
increase the receiving throughput. Any ideas?
Lei
-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of George Bosilca
Sent: August 27, 2014
You have a physical constraint: the capacity of your links. If you are over 90%
of your network bandwidth, there is little left to improve.
George.
On Aug 27, 2014, at 0:18, "Zhang,Lei(Ecom)" wrote:
>> I'm not sure what you mean by this statement. If you add N asynchronous
>> requests and the
When I try to specify oob with --mca oob_tcp_if_include , I always get an error:
$ mpirun --mca oob_tcp_if_include ib0 -np 1 ./hello_c
--
An ORTE daemon has unexpectedly failed after launch and before
communicating back to mpiru
Hi,
Am 27.08.2014 um 09:57 schrieb Tetsuya Mishima:
> Hi Reuti and Ralph,
>
> What do you think about accepting the bind-to none option even when the pe=N
> option is provided?
>
> just like:
> mpirun -map-by slot:pe=N -bind-to none ./inverse
Yes, this would be ok to cover all cases.
-- Reuti
> If
Hi Reuti and Ralph,
What do you think about accepting the bind-to none option even when the pe=N
option is provided?
just like:
mpirun -map-by slot:pe=N -bind-to none ./inverse
If yes, it's easy for me to make a patch.
Tetsuya
Tetsuya Mishima tmish...@jcity.maeda.co.jp
OK, it's working now. I disabled the iptables firewall, and now it works.
I had configured my two machines differently, which is why I got the error from
the other subject.
Sorry for the new subject; it's the first time I am using a mailing list.
I hope I replied correctly now :D
Thanks for you
Thank you,
I added the parameters and figured out that the iptables firewall was
messing something up, so I disabled it on both machines.
But now I get another error:
[superuser@localhost ~]$ mpirun --host 192.168.54.56 --leave-session-attached
-mca plm_base_verbose 5 -mca oob_base_verbose 5
> I'm not sure what you mean by this statement. If you add N asynchronous
> requests and the speed is not decreased, that's a *good* thing, right?
My problem is that N asynchronous irecvs do not *increase* the speed of
receiving data compared to just one irecv.
I have multiple nodes sending larg