Okay, I fixed this in r24536.
Sorry for the problem, Damien - thanks for catching it! Went unnoticed because
the folks at the Labs always use IB.
On Mar 16, 2011, at 7:20 PM, Ralph Castain wrote:
> I believe I see the problem - and why it wouldn't show up for IB. It looks
> like the hier modu
I believe I see the problem - and why it wouldn't show up for IB. It looks like
the hier module passes an incorrect flag to the modex unpack function, which
causes that function to place the modex values as attributes assigned to the
node instead of a process, rather than placing the values into
On Mar 16, 2011, at 5:37 PM, George Bosilca wrote:
> I just checked and IB does work correctly. But then I remembered that IB is
> different: the connections are peer based, so they don't happen during the
> modex exchange. The data is exchanged over RML messages, but outside the
> modex.
Not
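The failure mode described above (modex values stored as attributes of the node instead of a process) can be illustrated with a toy model. This is only a sketch under stated assumptions: `unpack_modex`, `layout`, and `lost_entries` are hypothetical names, not Open MPI functions, and the "modex" is reduced to a plain dictionary. It shows why such a bug would stay hidden with one process per node and lose data with two.

```python
# Toy model only: these names are hypothetical and are NOT Open MPI APIs.

def layout(nprocs, npernode):
    """Return (node, rank, value) entries for nprocs ranks, npernode per node."""
    return [
        (f"node{rank // npernode}", rank, f"tcp-addr-{rank}")
        for rank in range(nprocs)
    ]


def unpack_modex(entries, key_by_node):
    """Store each entry keyed by node (the buggy case) or by (node, rank)."""
    table = {}
    for node, rank, value in entries:
        key = node if key_by_node else (node, rank)
        # A later entry with the same key silently overwrites the earlier one.
        table[key] = value
    return table


def lost_entries(nprocs, npernode, key_by_node):
    """Count how many per-process values are lost after unpacking."""
    entries = layout(nprocs, npernode)
    return len(entries) - len(unpack_modex(entries, key_by_node))


if __name__ == "__main__":
    for npernode in (1, 2):
        print(npernode, lost_entries(4, npernode, key_by_node=True))
```

With one process per node, the node key and the process key coincide, so nothing is overwritten; with two per node, the second process on each node replaces the first one's value.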
I just checked and IB does work correctly. But then I remembered that IB is
different: the connections are peer based, so they don't happen during the
modex exchange. The data is exchanged over RML messages, but outside the modex.
george.
On Mar 16, 2011, at 17:28 , Ralph Castain wrote:
> In
In looking at this, perhaps you can help me understand something. The grpcomm
hier modex is the same regardless of what info is given to it. So how is it
that this works fine with IB, but not for the TCP btl? Are you relying on
something in the modex to track data identity, but the IB btl doesn'
Very strange - I'll bet it is something in the hier modex algo that is losing
the info about where the data came from. I'll take a look.
On Mar 16, 2011, at 2:25 PM, George Bosilca wrote:
> Actually I think that Damien's analysis is correct. On an 8-node cluster
>
> mpirun -npernode 1 -np 4 --mc
Actually I think that Damien's analysis is correct. On an 8-node cluster
mpirun -npernode 1 -np 4 --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1
Sendrecv
does work, while
mpirun -npernode 2 -np 4 --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1
Sendrecv
doesn't. As soon as I remove the
I suspect something else is wrong - the grpcomm system never has any visibility
as to what data goes into the modex, or how that data is used. In other words,
if the tcp btl isn't providing adequate info, then it would fail regardless of
which grpcomm module was in use. So your statement about t
Hi all
From my tests, it is impossible to use "btl:tcp" with "grpcomm:hier".
The "grpcomm:hier" module is important because the "srun" launch protocol
can't use any other "grpcomm" module.
You can reproduce this bug by using "btl:tcp" and "grpcomm:hier" when
you create a ring (like: IMB sendrecv
Sorry about that; we will find a better way to resolve it later.
Fix committed.
On Wed, Mar 16, 2011 at 6:00 PM, Jeff Squyres wrote:
> Ya, you're right -- I'm looking at my MTT right now and I see lots of
> broken installs.
>
> But it works if I compile manually. Weird.
>
> Mellanox -- please fix ASA
On 03/16/2011 12:00 PM, Jeff Squyres wrote:
Ya, you're right -- I'm looking at my MTT right now and I see lots of broken
installs.
But it works if I compile manually. Weird.
So when I saw your MTT results it was not finding a header file as
opposed to the problem I was incurring which was a r
Ya, you're right -- I'm looking at my MTT right now and I see lots of broken
installs.
But it works if I compile manually. Weird.
Mellanox -- please fix ASAP, or we'll likely back out r24507 so that people can
keep working...
On Mar 16, 2011, at 11:58 AM, George Bosilca wrote:
> The trunk i
The trunk is indeed broken. The reason is, as Terry pointed out, the inclusion
of infiniband/mad.h introduced by r24507
(https://svn.open-mpi.org/trac/ompi/changeset/24507). As long as OFED 1.4 is
available, it will compile independent of the version of the kernel,
libpthread, moon position or
rc1 was borked; we fixed it in rc2. This will likely be the last rc.
http://www.open-mpi.org/software/ompi/v1.5/
--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
On Mar 16, 2011, at 7:48 AM, Paul H. Hargrove wrote:
> I have looked before for symbols to distinguish LinuxThreads from NPTL, but I
> was not successful in finding anything. I don't recall if I examined headers
> for differences, but the implementations are binary compatible by design,
> maki
On Mar 16, 2011, at 6:50 AM, Terry Dontje wrote:
>> K. When Ralph and I removed that code, it was on the educated guess that no
>> one was using it (because it hasn't compiled right in a while). If we were
>> wrong, it can be put back, but someone will need to update it and Ralph and
>> I don't
On 03/16/2011 06:34 AM, Terry Dontje wrote:
On 03/16/2011 06:21 AM, Jeff Squyres wrote:
On Mar 16, 2011, at 5:51 AM, Terry Dontje wrote:
I've seen this with the following:
RH 4.6 / OFED 1.3.6
Errr... did you look at
http://www.open-mpi.org/community/lists/devel/2011/03/9068.php?
Yes I did, a
I have looked before for symbols to distinguish LinuxThreads from NPTL,
but I was not successful in finding anything. I don't recall if I
examined headers for differences, but the implementations are binary
compatible by design, making differences intentionally minimal.
I suppose one can grep
On 03/16/2011 06:38 AM, Jeff Squyres (jsquyres) wrote:
K. When Ralph and I removed that code, it was on the educated guess
that no one was using it (because it hasn't compiled right in a
while). If we were wrong, it can be put back, but someone will need to
update it and Ralph and I don't have a
K. When Ralph and I removed that code, it was on the educated guess that no one
was using it (because it hasn't compiled right in a while). If we were wrong,
it can be put back, but someone will need to update it and Ralph and I don't
have access to machines to test that behavior.
Sent from my
On 03/16/2011 06:21 AM, Jeff Squyres wrote:
On Mar 16, 2011, at 5:51 AM, Terry Dontje wrote:
I've seen this with the following:
RH 4.6 / OFED 1.3.6
Errr... did you look at
http://www.open-mpi.org/community/lists/devel/2011/03/9068.php?
Yes I did, and I will be talking with my group about thi
Is there a version in a pthreads header file that can be checked?
You're right that I am currently checking Linux kernel version, not pthread
version. Note that this is *only* in cross-compiling environments; in
non-cross-compiling situations, we actually test the behavior to see if threads have
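On the question of checking the pthread version rather than the kernel version: one possible runtime check (not a header check, and of no use when cross-compiling, since it must run on the target) is the glibc `confstr` name `_CS_GNU_LIBPTHREAD_VERSION`. The sketch below is an assumption-laden illustration, not what Open MPI's configure does. The helper names `classify_pthread_version` and `pthread_implementation` are made up here, and the expected string prefixes ("NPTL ...", "linuxthreads-...") are what glibc is generally reported to return.

```python
import os


def classify_pthread_version(version):
    """Map a libpthread version string to 'NPTL', 'LinuxThreads' or 'unknown'."""
    if not version:
        return "unknown"
    lowered = version.lower()
    if lowered.startswith("nptl"):
        return "NPTL"
    if lowered.startswith("linuxthreads"):
        return "LinuxThreads"
    return "unknown"


def pthread_implementation():
    """Ask the C library which pthread implementation it provides, if it can say."""
    name = "CS_GNU_LIBPTHREAD_VERSION"
    # confstr_names is absent on non-Unix platforms; the key is glibc-specific.
    if name not in getattr(os, "confstr_names", {}):
        return "unknown"
    try:
        return classify_pthread_version(os.confstr(name))
    except (OSError, ValueError):
        return "unknown"


if __name__ == "__main__":
    print(pthread_implementation())
```

On systems whose C library does not define this name, the function simply reports "unknown" instead of guessing.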
On Mar 16, 2011, at 5:51 AM, Terry Dontje wrote:
> I've seen this with the following:
>
> RH 4.6 / OFED 1.3.6
Errr... did you look at
http://www.open-mpi.org/community/lists/devel/2011/03/9068.php?
> CentOS 5.2 / OFED 1.3.6
> SLES 10.1 / OFED 1.3.6
>
> I know the above is pretty darn old bu
On 03/15/2011 03:54 PM, Jeff Squyres wrote:
Which Linux / OFED are you using?
I've seen this with the following:
RH 4.6 / OFED 1.3.6
CentOS 5.2 / OFED 1.3.6
SLES 10.1 / OFED 1.3.6
I know the above is pretty darn old but it would be nice to know what is
the oldest s/w we can be using? Note t