Hi Josh,
[You wrote:]
I've tried with the latest from
SVN as well as the exact same kernel/rootFS you were using, and I still
can't ping between nodes. I'm just doing 'modprobe ib_mthca; modprobe
ib_ipoib; ifconfig ...; ping ...' on the same two nodes. Am I missing
something?
Anything in
Roland,
Alright...I'm not sure what's going on. I just now got a chance to test
this out and it still isn't working. I've tried with the latest from
SVN as well as the exact same kernel/rootFS you were using, and I still
can't ping between nodes. I'm just doing 'modprobe ib_mthca; modprobe
On Wed, 2004-12-15 at 13:22, England, Joshua J wrote:
I'll definitely pound on the stuff and let you know if anything
breaks.
You are using the 4.3.5 firmware, right ? I want to put the proper info
into the IPoIB FAQ. Thanks.
-- Hal
Title: RE: [openib-general] IPoIB still not working
They're on 4.5.3.
-JE
-----Original Message-----
From: Hal Rosenstock [mailto:[EMAIL PROTECTED]]
Sent: Thu 12/16/2004 8:54 AM
To: England, Joshua J
Cc: Roland Dreier; Robert J Woodruff; [EMAIL PROTECTED]
Subject: RE: [openib-general
Hi Woody,
One thing I would be sure of is the following:
In /var/log/messages (or via dmesg), make sure there are no oopses
and that all the needed processes are running.
I have seen (just reported) what I think is an alignment issue on x86_64,
which I think is what you are using.
-- Hal
On Wed, 2004-12-15 at 10:51, Roland Dreier wrote:
OK, I think the following change (already committed) should fix IPoIB
with PCI Express HCAs. Thanks to Josh England for giving me access to
a couple of his machines for debugging, and to Tziporet Koren for
pointing out this Arbel feature.
- Roland
Index: infiniband/hw/mthca/mthca_av.c
On Wed, 2004-12-15 at 11:13, Roland Dreier wrote:
As I understand it, this is a permanent workaround for an Arbel
hardware issue.
If that is the case, isn't there more than this that needs to be done ?
-- Hal
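___
openib-general mailing list
[EMAIL PROTECTED]
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general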
Hal If that is the case, isn't there more than this that needs to
Hal be done ?
The docs say that the low byte of the GID has to be set to 2 when a
GRH is not being included in the AV. That's what this patch does.
Why would there be anything else required?
- Roland
Hal Isn't the GID the subnet prefix concatenated with the EUI-64 GUID ?
Hal If so, I see 2 issues: 1. GUIDInfo should also reflect this.
Hal 2. Uniquification of EUI-64s (flag odd ones as errors)
This change/workaround only affects the value put into an internal HCA
data structure.
+ /* Arbel workaround -- low byte of GID must be 2 */
+ av->dgid[3] = cpu_to_be32(2);
Is this intended as a temporary or permanent workaround ? If temporary,
how temporary (is there a timeframe for the real fix) and any idea of
what it would entail ?
-- Hal
Roland,
Thanks for taking the time to debug this issue. I'll definitely pound on the stuff and let you know if anything breaks.
-JE
-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Roland Dreier
Sent: Wed 12/15/2004 8:51
[You wrote:]
MGID 0xff12401b : 0x0016
[1102546997:53389][18007] - osm_mcmr_rcv_join_mgrp: ERR 1B11:
method = SubnAdmSet,scope_state = 0x1, component mask =
0x00010083, expected comp mask = 0x000130c7.
MGID 0xff12401b :
By the way, does anyone have a fabric with both PCI-X and PCIe HCAs?
It would be interesting to see if a PCI-X HCA receives ARPs sent by a
PCIe HCA (and also test the other direction).
- R.
I believe (yet to be confirmed) that this is fixed in the upcoming
Arbel firmware release.
I hope so. I am running the 5.4.6-rc4 firmware that I got from Trevor
and it still fails. Perhaps Eitan knows if there is some newer version
that fixes this issue.
There could be an OpenIB problem. I
On Mon, 2004-12-13 at 14:56 -0800, Woodruff, Robert J wrote:
I have other stacks that work fine on the same systems with the same
hardware
This is what I'm seeing as well. I'm doing IPoIB over PCI-E just fine
with VAPI 3.2. Could there be some extra (software) initialization
stuff for PCI-E
Robert I have other stacks that work fine on the same systems
Robert with the same hardware so I suppose that mthca could be
Robert doing something wrong.
Seems like there must be an mthca bug -- unfortunately I don't have a
PCI-e system set up to debug IPoIB so it may be a while
Woodruff, Robert J wrote:
I was able to get rid of the multicast join errors by removing various
network services from starting up. Now I see no errors in joining
multicast groups, since I am only joining the 2 default ones, but I
still cannot ping on EM64T with the PCI-E HCAs.
Those join
[1102546997:53389][18007] - osm_mcmr_rcv_join_mgrp: ERR 1B11:
method = SubnAdmSet,scope_state = 0x1, component mask =
0x00010083, expected comp mask = 0x000130c7.
This is the IPv4 equivalent of the previous post on the All Routers
Multicast Group.
For some reason, your
Eitan Results with - ERR 1B10: Provided Join State != FullMember
Eitan - required for create. You can not create a group if you
Eitan are not a full member.
Right. However, ScopeState is dumped as 0x1, which means bit 0 of
JoinState (the FullMember bit) is in fact set, so OpenSM
Hi Eitan,
The component masks are also significant relative to the below. See
embedded comments.
On Wed, 2004-12-08 at 01:39, Eitan Zahavi wrote:
Forgive me for not following the entire thread.
But I did take a look at the log files:
The 64-bit version has the following multicast
Roland I think the difference is not 32-bit vs. 64-bit but without IPv6
Roland vs. with IPv6.
Ok, I'll take a look at the opensm code and see if we can make a fix
that will allow it to work in the IPv6 case.
Roland Dreier wrote:
Eitan Results with - ERR 1B10: Provided Join State != FullMember
Eitan - required for create. You can not create a group if you
Eitan are not a full member.
Right. However, ScopeState is dumped as 0x1, which means bit 0 of
JoinState (the FullMember bit) is in
I found the problem with parsing of ScopeState. It appears that there
was a bug in ib_types.h in the SourceForge code base. This was already
fixed in the latest Mellanox code base. However, with that fixed, I
still get the following error from opensm when ipoib tries to join the
multicast group.
On Tue, 2004-12-07 at 14:33, Woodruff, Robert J wrote:
Ok, here is what I found. I installed the openib.org code on my
EM64T (x86_64) systems that have PCI-E HCAs. We have an 8-node
Mellanox switch. On one node (a 32-bit Xeon node), we are running
opensm from the SF project.
We have one
Is it correct to presume you do not have an IB analyzer to capture the
SA packets on the link to/from the x86_64 system ?
-- Hal
No, unfortunately we do not have a 4x IB analyzer.
Note that the same openib.org code running
on a 32-bit system does not seem to invoke the complaints from
Are the x86_64 machines PCIe or PCIX ? (Mine is PCIX).
-- Hal
PCIe
On Tue, 2004-12-07 at 15:26, Woodruff, Robert J wrote:
No, unfortunately we do not have a 4x IB analyzer.
Do you have 4x to 1x cables ? Then you could use a 1x analyzer for this.
Note that the same openib.org code running
on a 32-bit system does not seem to invoke the complaints from opensm.
Do you have 4x to 1x cables ? Then you could use a 1x analyzer for
this.
Used to have one, but I think it was lost. The 1x analyzer we have
is very old and out of date; not sure that it is worth trying to
get it running.
Yes, but is that SA client local to the node with the OpenSM ? I'm not
On Tue, 2004-12-07 at 20:11, Woodruff, Robert J wrote:
So other than this, does IPoIB seem to be working ? I guess that's at
least IPv4. It doesn't seem like IPv6 could be working properly.
-- Hal
Forgive me for not following the entire thread.
But I did take a look at the log files:
The 64-bit version has the following multicast activities:
1. Port 0x0002c9010ad258f1 joining MLID 0xC000 - success.
Note that MLID 0xC000