Hi Nathan,
Thanks for your answer.
The "credits" make sense for the purpose of flow control. However, the
sender in my case will be blocked even for the first message. This doesn't
seem to be the symptom of running out of credits. Is there any reason for
this? Also, is there an MCA parameter for
When using eager_rdma the sender will block once it runs out of
"credits". If the receiver enters MPI for any reason the incoming
messages will be placed in the ob1 unexpected queue and the credits will
be returned to the sender. If you turn off eager_rdma you will probably
get different results.
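To make the suggestion above concrete, here is a sketch of a launch line that switches eager RDMA off via the openib BTL's MCA parameter ("./your_app" and the process count are placeholders; btl_openib_use_eager_rdma is the relevant knob in the 1.8/1.10-era openib BTL):

```shell
# Sketch: disable eager RDMA so the credit accounting described above
# no longer gates small blocking sends ("./your_app" is a placeholder).
mpirun --mca btl_openib_use_eager_rdma 0 -np 16 ./your_app
```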
I honestly have no idea…
> On May 16, 2016, at 10:39 AM, Zabiziz Zaz wrote:
>
> Ok.
> Could you please tell me the latest version that is supported?
>
> Regards,
> Guilherme.
>
> On Mon, May 16, 2016 at 12:30 PM, Ralph Castain
Ok.
Could you please tell me the latest version that is supported?
Regards,
Guilherme.
On Mon, May 16, 2016 at 12:30 PM, Ralph Castain wrote:
> We used to do so, but don’t currently support that model - folks are
> working on restoring it. No timetable, though I don’t think
Hi,
I am using Open MPI 1.8.6. I guess my question is related to the flow
control algorithm for small messages. The question is how to avoid the
sender being blocked by the receiver when using *openib* module for small
messages and using *blocking send*. I have looked through this FAQ(
I'm afraid I don't know what the difference is in systemd between ssh.socket and
ssh.service, or why that would change Open MPI's behavior.
One other thing to try is to mpirun non-MPI programs, like "hostname" and see
if that works. This will help distinguish between problems with Open MPI's
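For example (a sketch; the node names are placeholders), a launch line like this exercises only the runtime and ssh plumbing, without involving the MPI library at all:

```shell
# Sketch: run a non-MPI program across the hosts. If this fails, the
# problem is in launching/ssh, not in MPI communication itself.
mpirun -np 2 --host node1,node2 hostname
```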
On May 10, 2016, at 12:26 , Gilles Gouaillardet wrote:
except if you #include the libc header in your app, *and* your send
function has a different prototype, I do not see how clang can issue
a warning
(except of course if clang "knows" all the libc subroutines ...)
not sure if that helps
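A hypothetical demonstration of the point above: a user-defined send() whose prototype differs from libc's. With clang, which "knows" the libc signature even when no socket header is included, this typically draws an incompatible-redeclaration warning (the file and names here are made up for illustration):

```shell
# Write a C file that redeclares send() with a prototype that differs
# from libc's ssize_t send(int, const void *, size_t, int).
cat > mysend.c <<'EOF'
/* no socket headers included, yet clang still recognizes send() */
int send(int fd, const char *msg) { return fd; }   /* conflicting prototype */
int main(void) { return 0; }
EOF
# With clang this warns: "incompatible redeclaration of library function 'send'".
# Other compilers may stay silent, which is the original point.
cc -c mysend.c 2> warnings.txt || true
```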
My application has a heartbeat that checks if a node is alive and can
redistribute a task to another node if the master loses communication with
it. The application also has a checkpoint/restart, but since I usually
have hundreds of nodes for one job and it usually takes a long time to restart
the
We used to do so, but don’t currently support that model - folks are working on
restoring it. No timetable, though I don’t think it will be too much longer
before it is in master. Can’t say when it will hit release
> On May 16, 2016, at 8:25 AM, Zabiziz Zaz wrote:
>
> Hi
What do you mean by a fault-tolerant application?
From an Open MPI point of view, if such a connection is lost, your
application will no longer be able to communicate, so killing it is the best
option.
If your application has built-in checkpoint/restart, then you have to
restart it with mpirun after
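A sketch of what that relaunch might look like; the "--resume" flag and "latest.ckpt" file are hypothetical and would be whatever your application's own checkpoint scheme provides:

```shell
# Sketch: after the failed job is killed, relaunch from the last
# checkpoint. The restart flag and checkpoint name are placeholders
# specific to the application, not mpirun options.
mpirun -np 256 --hostfile hosts.txt ./your_app --resume latest.ckpt
```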
Hi Llolsten,
the problem is not a firewall issue. The simplest way to reproduce the
problem is to reboot a node in the middle of the job. Is it possible to
configure Open MPI not to terminate the job if one node is rebooted
mid-job?
Thanks again for your help.
Regards,
We already do that as a check, but it came after the 1.6 series - and so you
get the old error message if you mix with the 1.6 series or older versions.
> On May 16, 2016, at 8:22 AM, Gilles Gouaillardet
> wrote:
>
> or this could be caused by a firewall ...
>
or this could be caused by a firewall ...
v1.10 and earlier use TCP for the OOB;
from v2.x, Unix sockets are used.
Detecting a consistent version is a good idea;
printing the versions (mpirun, orted, and a.out) could be a first step.
My idea is:
mpirun invokes orted with '--ompi_version=x.y.z',
and orted checks it is
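The proposed handshake could be sketched like this; the version strings are hard-coded for illustration, whereas in reality one would come from mpirun's command line and the other from the orted binary itself:

```shell
# Sketch of the consistency check: orted compares the version string
# passed by mpirun against its own and reports a mismatch.
mpirun_ver="1.10.2"   # would come from mpirun's --ompi_version argument
orted_ver="2.0.1"     # would come from the orted binary itself
if [ "$mpirun_ver" != "$orted_ver" ]; then
    echo "ERROR: mpirun is $mpirun_ver but orted is $orted_ver"
    # a real orted would abort here with a clear diagnostic
fi
```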
Ralph Castain writes:
> This usually indicates that the remote process is using a different OMPI
> version. You might check to ensure that the paths on the remote nodes are
> correct.
That seems quite a common problem with non-obvious failure modes.
Is it not possible to
Hello Guilherme,
This may be off, but try running your mpirun command with the option
“--tag-output”. If you see a “broken pipe”, then your issue may be firewall
related. You could then check the thread “Re: [OMPI users] mpirun command won't
run unless the firewalld daemon is disabled” for
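For instance (a sketch; "./your_app" is a placeholder), per-rank tagging makes it easy to see which process a "broken pipe" came from:

```shell
# Sketch: --tag-output prefixes each line of output with its rank,
# so an error from one node stands out.
mpirun -np 8 --tag-output ./your_app
```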
Gilles Gouaillardet writes:
> Are you sure ulimit -c unlimited is *really* applied on all hosts?
>
>
> can you please run the simple program below and confirm that ?
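The program referred to above is not shown in this excerpt; a minimal stand-in for the same check might look like this (run it under mpirun, e.g. one copy per node, to cover every host; here it just reports for the local shell):

```shell
# Stand-in sketch for the check described above: print the core-file
# size limit that processes launched on this host actually inherit.
echo "$(hostname): core limit = $(ulimit -c)"
```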
Nothing specifically wrong with that, but it's worth installing
procenv(1) as a general solution to checking
"Rob Malpass" writes:
> Almost in desperation, I cheated:
Why is that cheating? Unless you specifically want a different version,
it seems sensible to me, especially as you then have access to packaged
versions of at least some MPI programs. Likewise with rpm-based
+1 to everything so far.
Also, look in your shell startup files (e.g., $HOME/.bashrc) to see if certain
parts of it are not executed for non-interactive logins. A common mistake we
see is a shell startup file like this:
# ... do setup for all logins ...
if (this is a non-interactive
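The example above is cut off; a sketch of how that anti-pattern typically looks (the install path is a placeholder): the MPI paths are set only for interactive shells, so an orted launched over ssh never sees them.

```shell
# Sketch of the .bashrc anti-pattern: PATH setup guarded so that
# non-interactive (ssh) logins skip it entirely.
case $- in
    *i*) export PATH=/opt/openmpi/bin:$PATH ;;  # only interactive shells get this!
esac
echo "PATH=$PATH"
```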
Hi,
I'm using openmpi-1.10.2 and sometimes I receive the message below:
--
ORTE has lost communication with its daemon located on node:
hostname:
This is usually due to either a failure of the TCP network
Hey Rob,
I don't know if this is what is going on, but in general, when a package
is installed via a distro's package management system, it ends up in
system locations such as /usr/bin and /usr/lib that are automatically
searched when looking for executables and libraries. So, it isn't
Greetings Manu.
It looks like you are using ancient / outdated information; we switched away
from SVN to Git ~1.5 years ago. Do we still have some stale references to SVN
somewhere, or are you working from old data?
Are you looking at the MTT wiki?
> On May 16, 2016, at 8:20 AM, Manu S.
Hi All, I'm trying to set up and run the MTT suite, but the download of the tools is
failing due to the username & password. How do I obtain the username and
password for the repo?
I have attached the output from the console also.
Thanks
Manu
[root@core96cn1 mtt]# client/mtt --file