Hi,
I'm running an mpi module in python (pypar), but I believe (after
googling) that this might be a problem with openmpi.
When I run: 'python -c "import pypar"', I get:
[titus:21965] *** Process received signal ***
[titus:21965] Signal: Segmentation fault (11)
[titus:21965] Signal code: Address not mapped (1)
I've found that I always have to use mpirun to start my spawner
process, due to the exact problem you are having: the need to give
OMPI a hosts file! It seems the singleton functionality is lacking
somehow... it won't allow you to spawn on arbitrary hosts. I have not
tested whether this is fixed in the 1.3 series.
Afraid I am out of suggestions - could be a bug in the old 1.2 series.
You might try with the 1.3 series...or perhaps someone else has a
suggestion here.
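The workaround described above (always launching the spawner under mpirun with a hosts file) can be sketched roughly like this; the hostfile name and the spawner program are hypothetical:

```shell
# Hypothetical hostfile listing every node the job may ever spawn on,
# including nodes that are only used later by MPI_Comm_spawn.
cat > my_hostfile <<'EOF'
op2-1 slots=8
op2-2 slots=8
EOF

# Start the spawner under mpirun so Open MPI knows about the hosts;
# a bare singleton start (e.g. just "python spawner.py") is what fails
# to spawn on other hosts in the 1.2 series.
mpirun -np 1 --hostfile my_hostfile python spawner.py
```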
On Jul 29, 2008, at 2:46 PM, Mark Borgerding wrote:
Yes. The host names are listed in the host file.
e.g.
"op2-1 slots=8"
and there is an IP address for op2-1 in the /etc/hosts file
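The setup described above can be reproduced as a quick sanity check; the filenames and the IP address here are made up for illustration (the real entry lives in /etc/hosts):

```shell
# Hostfile entry exactly as quoted in the message above.
cat > hostfile <<'EOF'
op2-1 slots=8
EOF

# An /etc/hosts-style entry mapping the name to an address.
cat > hosts.example <<'EOF'
192.168.1.11   op2-1
EOF

cat hostfile hosts.example
```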
I've read the FAQ. Everything in there seems to assume I am starting
the process group with mpirun or one of its brothers. This is not the
case.
I've cre
Terry Dontje wrote:
Date: Tue, 29 Jul 2008 14:19:14 -0400
From: "Alexander Shabarshin"
Subject: Re: [OMPI users] Communication between OpenMPI and ClusterTools
To:
Message-ID: <00b701c8f1a7$9c24f7c0$c8afcea7@Shabarshin>
Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response
On Mon, 2008-07-28 at 20:01 -0500, Dirk Eddelbuettel wrote:
> On 24 July 2008 at 14:39, Adam C Powell IV wrote:
> | Greetings,
> |
> | I'm seeing a segfault in a code on Ubuntu 8.04 with gcc 4.2. I
> | recompiled the Debian lenny openmpi 1.2.7~rc2 package on Ubuntu, and
> | compiled the Debian le
Hello
> One idea comes to mind is whether the two nodes are on the same
> subnet? If they are not on the same subnet I think there is a bug in
> which the TCP BTL will recuse itself from communications between the
> two nodes.
you are right - subnets are different, but routes are set up correctly
OMPI doesn't care what your hosts are named - many of us use names
that have no numeric pattern or any other discernible pattern to them.
OMPI_MCA_rds_hostfile should point to a file that contains a list of
the hosts - have you ensured that it does, and that the hostfile
format is correct?
I listed the node names in the path named in "ompi_info --param rds hostfile" -- no luck.
I also tried copying that file to another location and setting
OMPI_MCA_rds_hostfile_path -- no luck.
The remote hosts are named op2-1 and op2-2. Could this be another case
of the problem I saw a few days
For the 1.2 release, I believe you will find the enviro param is
OMPI_MCA_rds_hostfile_path - you can check that with "ompi_info".
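For the 1.2 series, setting the parameter in the environment might look like this (the hostfile path is hypothetical):

```shell
# Point the rds_hostfile_path MCA parameter at a hostfile via the
# environment, as suggested above for the 1.2 series.
export OMPI_MCA_rds_hostfile_path=$HOME/my_hostfile

# The parameter name can be checked with: ompi_info --param rds hostfile
echo "$OMPI_MCA_rds_hostfile_path"
```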
On Jul 29, 2008, at 11:10 AM, Mark Borgerding wrote:
Umm ... what -hostfile file?
I am not starting anything via mpiexec/orterun so there is no "-hostfile" argument AFAIK.
Hello
So now we will try to put the Linux machine and the SunFires on the same
subnet and test again.
It's working now! Thanks!
Alexander Shabarshin
Umm ... what -hostfile file?
I am not starting anything via mpiexec/orterun so there is no
"-hostfile" argument AFAIK.
Is there some other way to communicate this? An environment variable or
mca param?
-- Mark
Ralph Castain wrote:
Are the hosts where you want the children to go in your -hostfile file?
Hello
Yes, you are right - subnets are different, but routes are set up correctly
and everything like ping, ssh, etc. works OK between them.
But it isn't a routing problem; it's a question of how the TCP BTL in Open
MPI decides which interface the nodes can communicate with (completely out
of the hands of the
Date: Tue, 29 Jul 2008 09:03:40 -0400
From: "Alexander Shabarshin"
Subject: Re: [OMPI users] Communication between OpenMPI and ClusterTools
To:
Message-ID: <001e01c8f17b$867d2900$0349130a@Shabarshin>
Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response
Hello
Are the hosts where you want the children to go in your -hostfile
file? All of the hosts you intend to use have to be in that file, even
if they don't get used until the comm_spawn.
On Jul 29, 2008, at 9:08 AM, Mark Borgerding wrote:
I've tried lots of different values for the "host" key in the info handle.
I've tried hardcoding the hostname+ip entries in the /etc/hosts file --
no luck. I cannot get my MPI_Comm_spawn children to go anywhere else on
the network.
mpiexec can start groups on the other machines just fine.
It
The string "localhost" may not be recognized in the 1.2 series for
comm_spawn. Do a "hostname" and use that string instead - should work.
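The advice above (set the "host" key in an Info object, using a real hostname rather than "localhost") might be sketched like this; the hostname "op2-1", the worker binary "./worker", and the child count are all hypothetical, and the hostfile given to mpirun must list op2-1:

```c
/* Sketch: spawn children on a named host via the Info "host" key.
 * Build with mpicc and launch under mpirun with a hostfile that
 * includes op2-1. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm children;
    MPI_Info info;
    int errcodes[4];

    MPI_Init(&argc, &argv);

    MPI_Info_create(&info);
    /* Use a real hostname, not "localhost": the 1.2 series may not
     * recognize "localhost" for comm_spawn. */
    MPI_Info_set(info, "host", "op2-1");

    /* Spawn 4 copies of ./worker on op2-1, rooted at this process. */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, info,
                   0, MPI_COMM_SELF, &children, errcodes);

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```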
Ralph
On Jul 28, 2008, at 10:38 AM, Mark Borgerding wrote:
When I add the info parameter in MPI_Comm_spawn, I get the error
"Some of the requested hosts a
Hello
Yes, you are right - subnets are different, but routes are set up correctly
and everything like ping, ssh, etc. works OK between them.
Alexander Shabarshin
P.S. Between Linuxes I even tried different versions of OpenMPI 1.2.4 and
1.2.5 - these versions work together correctly, but not
On Tue, 29 Jul 2008, Jeff Squyres wrote:
...
I suggest that you bring this issue up with PGI support; they're fairly
responsive on their web forums.
...
Will do: thanks for giving this a look, you've been really helpful.
Cheers,
Mark
--
> Thanks for the fast answer. So is this latency normal for TCP
> communications over MPI!? Could RDMA maybe reduce the latency? It
> should work with those cards but there are still problems with OFED.
> iWARP is also one of the features they offer but if it works...
Hi Andy,
Yes, ~40us TCP latency
On Jul 29, 2008, at 6:52 AM, Mark Dixon wrote:
FWIW: I compile with PGI 7.1.4 regularly on RHEL4U4 and don't see
this problem. It would be interesting to see the config.log's from
these builds to see the actual details of what went wrong.
Thanks Jeff: it's good to know it's just me ;)
On Jul 29, 2008, at 6:01 AM, George Bosilca wrote:
I just want to make sure that I correctly understand your statement.
You're saying that running NetPIPE (NPtcp) directly over TCP gives
you a latency of 12us, but running NetPIPE (NPmpi) over Open MPI
brings this latency up to 45us?
That d
On Jul 29, 2008, at 4:52 AM, Andy Georgi wrote:
The upcoming Open MPI v1.3 series will support iWARP, which gives much
better latency than that. I don't know all the Chelsio models offhand;
are those iWARP-capable cards?
Thanks for the fast answer. So is this latency normal for TCP
commun
I have not tested this type of setup, so the following disclaimer needs to be said: these are not exactly the same release numbers. They are close, but their code could contain something that makes them incompatible.
One idea that comes to mind is whether the two nodes are on the same subnet? If they are not on the same subnet, I think there is a bug in which the TCP BTL will recuse itself from communications between the two nodes.
On Mon, 28 Jul 2008, Jeff Squyres wrote:
FWIW: I compile with PGI 7.1.4 regularly on RHEL4U4 and don't see this
problem. It would be interesting to see the config.log's from these builds
to see the actual details of what went wrong.
Thanks Jeff: it's good to know it's just me ;)
Following y
I just want to make sure that I correctly understand your statement.
You're saying that running NetPIPE (NPtcp) directly over TCP gives you
a latency of 12us, but running NetPIPE (NPmpi) over Open MPI brings
this latency up to 45us?
george.
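The comparison George describes can be reproduced with NetPIPE roughly as follows; the node names are hypothetical, and NPtcp/NPmpi must be built from the NetPIPE distribution:

```shell
# Raw TCP latency: start the receiver on node A, then the transmitter
# on node B pointing at it.
./NPtcp                  # run on nodeA
./NPtcp -h nodeA         # run on nodeB

# The same ping-pong measurement through Open MPI's TCP BTL.
mpirun -np 2 --host nodeA,nodeB ./NPmpi

# Compare the small-message latencies in the two reports:
# ~12us raw TCP vs ~45us through MPI, per the numbers quoted above.
```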
On Jul 29, 2008, at 10:52 AM, Andy Georgi wrote:
Zitat von Jeff Squyres :
On Jul 28, 2008, at 2:53 PM, Andy Georgi wrote:
we use Chelsio S320E-CXA adapters
(http://www.chelsio.com/assetlibrary/products/S320E%20Product%20Brief%20080424.pdf)
in one of our clusters. After tuning the kernel I measured the ping-pong
latency via NetPIPE and got
Dear OpenMPI users,
I wanted to install a fresh copy of OpenMPI 1.2.6 on a testing Ubuntu 8.04
machine. I compiled Open MPI using the Intel 10.1 compilers (C++, Fortran).
It compiled without any problems. By doing
make check
most of the checks PASS, but (I think) only the following does
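A minimal build along the lines described above might look like this; the install prefix is hypothetical (icc/icpc/ifort are the Intel 10.1 compiler drivers):

```shell
# Configure Open MPI 1.2.6 against the Intel compilers, then build
# and run the test suite.
./configure CC=icc CXX=icpc F77=ifort FC=ifort \
    --prefix=$HOME/openmpi-1.2.6-intel
make all
make check
```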