Re: [openib-general] IPoIB still not working

2004-12-21 Thread Roland Dreier
Josh> Roland, Alright...I'm not sure what's going on. I just now got a chance to test this out and it still isn't working. I've tried with the latest from SVN as well as the exact same kernel/rootFS you were using, and I still can't ping between nodes. I'm

RE: [openib-general] IPoIB still not working

2004-12-21 Thread Hal Rosenstock
Hi Josh, [You wrote:] I've tried with the latest from SVN as well as the exact same kernel/rootFS you were using, and I still can't ping between nodes. I'm just doing 'modprobe ib_mthca; modprobe ib_ipoib; ifconfig ...; ping ...' on the same two nodes. Am I missing something? Anything in

Re: [openib-general] IPoIB still not working

2004-12-20 Thread Josh England
Roland, Alright...I'm not sure what's going on. I just now got a chance to test this out and it still isn't working. I've tried with the latest from SVN as well as the exact same kernel/rootFS you were using, and I still can't ping between nodes. I'm just doing 'modprobe ib_mthca; modprobe ib_ipoib; ifconfig ...; ping ...' on the same two nodes. Am I missing something? Anything in

RE: [openib-general] IPoIB still not working

2004-12-16 Thread Hal Rosenstock
On Wed, 2004-12-15 at 13:22, England, Joshua J wrote: I'll definitely pound on the stuff and let you know if anything breaks. You are using the 4.3.5 firmware, right? I want to put the proper info into the IPoIB FAQ. Thanks. -- Hal

RE: [openib-general] IPoIB still not working

2004-12-16 Thread England, Joshua J
They're on 4.5.3. -JE -----Original Message----- From: Hal Rosenstock [mailto:[EMAIL PROTECTED]] Sent: Thu 12/16/2004 8:54 AM To: England, Joshua J Cc: Roland Dreier; Robert J Woodruff; [EMAIL PROTECTED] Subject: RE: [openib-general

RE: [openib-general] IPoIB still not working

2004-12-15 Thread Hal Rosenstock
Hi Woody, One thing I would be sure of is the following: In /var/log/messages (or via dmesg), make sure there are no oops (that all processes needed are running). I have seen (just reported) what I think is an alignment issue on x86_64 which is what I think you are using. -- Hal

Re: [openib-general] IPoIB still not working

2004-12-15 Thread Hal Rosenstock
On Wed, 2004-12-15 at 10:51, Roland Dreier wrote: Thanks to Josh England for giving me access to a couple of his machines for debugging, and to Tziporet Koren for pointing out this Arbel feature. + /* Arbel workaround -- low byte of GID must be 2 */ + av->dgid[3] =

Re: [openib-general] IPoIB still not working

2004-12-15 Thread Roland Dreier
OK, I think the following change (already committed) should fix IPoIB with PCI Express HCAs. Thanks to Josh England for giving me access to a couple of his machines for debugging, and to Tziporet Koren for pointing out this Arbel feature. - Roland Index: infiniband/hw/mthca/mthca_av.c

Re: [openib-general] IPoIB still not working

2004-12-15 Thread Hal Rosenstock
On Wed, 2004-12-15 at 11:13, Roland Dreier wrote: As I understand it, this is a permanent workaround for an Arbel hardware issue. If that is the case, isn't there more than this that needs to be done ? -- Hal ___ openib-general mailing list [EMAIL

Re: [openib-general] IPoIB still not working

2004-12-15 Thread Roland Dreier
Hal> If that is the case, isn't there more than this that needs to be done? The docs say that the low byte of the GID has to be set to 2 when a GRH is not being included in the AV. That's what this patch does. Why would there be anything else required? - Roland
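A rough user-space sketch of the rule Roland describes, for readers who want to see where the "low byte" lands. Everything named `sketch_*` is hypothetical and is not the mthca driver's code: the struct is a simplified stand-in that keeps only the destination GID as four big-endian 32-bit words (mirroring the `av->dgid[3]` in the patch), `htonl` stands in for the kernel's `cpu_to_be32`, and the `has_grh` flag is an assumed way of saying whether a GRH is included in the AV.

```c
#include <arpa/inet.h>  /* htonl stands in for the kernel's cpu_to_be32 */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified address vector: only the destination GID,
 * held as four big-endian 32-bit words as in the patch. */
struct sketch_av {
    uint32_t dgid[4];
};

/* Fill the DGID. With a GRH, copy the caller's 16-byte GID; without one,
 * apply the workaround so that the low byte of the GID is 2. */
static void sketch_set_dgid(struct sketch_av *av, const uint8_t *gid, int has_grh)
{
    assert(av != NULL);
    memset(av->dgid, 0, sizeof av->dgid);
    if (has_grh && gid != NULL)
        memcpy(av->dgid, gid, 16);
    else
        av->dgid[3] = htonl(2);  /* Arbel workaround -- low byte of GID must be 2 */
}

/* Return the last (lowest-order) byte of the GID as laid out in memory. */
static uint8_t sketch_gid_low_byte(const struct sketch_av *av)
{
    const uint8_t *bytes = (const uint8_t *)av->dgid;
    return bytes[15];
}
```

Because the word is stored big-endian, writing the value 2 into the fourth word places 0x02 in byte 15 of the 16-byte GID, which is the "low byte" the docs refer to.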

Re: [openib-general] IPoIB still not working

2004-12-15 Thread Roland Dreier
Hal> Isn't the GID the subnet prefix concatenated with the EUI-64 GUID? If so, I see 2 issues: 1. GUIDInfo should also reflect this. 2. Uniquification of EUI-64s (flag odd ones as errors) This change/workaround only affects the value put into an internal HCA data structure

RE: [openib-general] IPoIB still not working

2004-12-15 Thread Woodruff, Robert J
Hal + /* Arbel workaround -- low byte of GID must be 2 */ + av->dgid[3] = cpu_to_be32(2); Is this intended as a temporary or permanent workaround ? If temporary, how temporary (is there a timeframe for the real fix) and any idea of what it would entail ? -- Hal

RE: [openib-general] IPoIB still not working

2004-12-15 Thread England, Joshua J
Roland, Thanks for taking the time to debug this issue. I'll definitely pound on the stuff and let you know if anything breaks. -JE -----Original Message----- From: [EMAIL PROTECTED] on behalf of Roland Dreier Sent: Wed 12/15/2004 8:51

RE: [openib-general] IPoIB still not working

2004-12-13 Thread Hal Rosenstock
[You wrote:] MGID0xff12401b : 0x0016 [1102546997:53389][18007] - osm_mcmr_rcv_join_mgrp: ERR 1B11: method = SubnAdmSet,scope_state = 0x1, component mask = 0x00010083, expected comp mask = 0x000130c7. MGID 0xff12401b :

Re: [openib-general] IPoIB still not working

2004-12-13 Thread Roland Dreier
By the way, does anyone have a fabric with both PCI-X and PCIe HCAs? It would be interesting to see if a PCI-X HCA receives ARPs sent by a PCIe HCA (and also test the other direction). - R.

RE: [openib-general] IPoIB still not working

2004-12-13 Thread Woodruff, Robert J
I believe (yet to be confirmed) that this is fixed in the upcoming Arbel firmware release. I hope so. I am running the 5.4.6-rc4 firmware that I got from Trevor and it still fails. Perhaps Eitan knows if there is some newer version that fixes this issue. There could be an OpenIB problem. I

RE: [openib-general] IPoIB still not working

2004-12-13 Thread Josh England
On Mon, 2004-12-13 at 14:56 -0800, Woodruff, Robert J wrote: I have other stacks that work fine on the same systems with the same hardware This is what I'm seeing as well. I'm doing IPoIB over PCI-E just fine with VAPI 3.2. Could there be some extra (software) initialization stuff for PCI-E

Re: [openib-general] IPoIB still not working

2004-12-13 Thread Roland Dreier
Robert> I have other stacks that work fine on the same systems with the same hardware so I suppose that mthca could be doing something wrong. Seems like there must be an mthca bug -- unfortunately I don't have a PCI-e system set up to debug IPoIB so it may be a while

Re: [openib-general] IPoIB still not working

2004-12-10 Thread Hal Rosenstock
Woodruff, Robert J wrote: I was able to get rid of the multicast join errors by removing various network services from starting up. Now I see no errors in joining multicast groups, since I am only joining the 2 default ones, but I still cannot ping on EM64T with the PCI-E HCAs. Those join

RE: [openib-general] IPoIB still not working

2004-12-10 Thread Woodruff, Robert J
[1102546997:53389][18007] - osm_mcmr_rcv_join_mgrp: ERR 1B11: method = SubnAdmSet,scope_state = 0x1, component mask = 0x00010083, expected comp mask = 0x000130c7. This is the IPv4 equivalent of what the previous post on All Routers Multicast Group. For some reason, your
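One way to read the ERR 1B11 line is to ask which bits OpenSM expected in the component mask that the request did not set. The sketch below only does that bit arithmetic on the two values from the log; the function names are hypothetical, the 64-bit width is an assumption, and no claim is made here about which record fields the individual bits stand for.

```c
#include <stddef.h>
#include <stdint.h>

/* Bits present in the expected mask but absent from the provided one. */
static uint64_t sketch_missing_components(uint64_t provided, uint64_t expected)
{
    return expected & ~provided;
}

/* Write the positions of the set bits of mask into out (room for 64 entries)
 * and return how many were written. */
static size_t sketch_bit_positions(uint64_t mask, unsigned out[64])
{
    size_t count = 0;
    for (unsigned bit = 0; bit < 64; bit++) {
        if (mask & ((uint64_t)1 << bit))
            out[count++] = bit;
    }
    return count;
}
```

For the logged values, `sketch_missing_components(0x00010083, 0x000130c7)` yields 0x3044, that is, bits 2, 6, 12 and 13 were expected but not supplied.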

Re: [openib-general] IPoIB still not working

2004-12-08 Thread Roland Dreier
Eitan> Results with - ERR 1B10: Provided Join State != FullMember - required for create. You can not create a group if you are not a full member. Right. However, ScopeState is dumped as 0x1, which means bit 0 of JoinState (the FullMember bit) is in fact set, so OpenSM
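Roland's reading of the dumped value can be checked with a few lines of C. This is a sketch under a stated assumption: ScopeState is taken to be a single byte with Scope in the upper four bits and JoinState in the lower four, with bit 0 of JoinState being the FullMember bit as described in the thread. The `sketch_*` names are hypothetical and are not OpenSM's helpers.

```c
#include <stdint.h>

#define SKETCH_JOIN_FULL_MEMBER 0x1u  /* bit 0 of JoinState */

/* Upper four bits of the byte: Scope (assumed layout). */
static uint8_t sketch_scope(uint8_t scope_state)
{
    return (uint8_t)(scope_state >> 4);
}

/* Lower four bits of the byte: JoinState (assumed layout). */
static uint8_t sketch_join_state(uint8_t scope_state)
{
    return (uint8_t)(scope_state & 0x0f);
}

/* Non-zero when the FullMember bit is set in JoinState. */
static int sketch_is_full_member(uint8_t scope_state)
{
    return (sketch_join_state(scope_state) & SKETCH_JOIN_FULL_MEMBER) != 0;
}
```

With the logged value, `sketch_is_full_member(0x1)` is non-zero, which matches the observation that the request did carry the FullMember bit.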

RE: [openib-general] IPoIB still not working

2004-12-08 Thread Hal Rosenstock
Hi Eitan, The component masks are also significant relative to the below. See embedded comments. On Wed, 2004-12-08 at 01:39, Eitan Zahavi wrote: Forgive me for not following the entire thread. But I did take a look at the log files: The 64bit version have the following multicast

RE: [openib-general] IPoIB still not working

2004-12-08 Thread Woodruff, Robert J
Roland> I think the difference is not 32 bit vs. 64 bit but no IPv6 vs IPv6. Ok, I'll take a look at the opensm code and see if we can make a fix that will allow it to work in the IPv6 case.

Re: [openib-general] IPoIB still not working

2004-12-08 Thread Sean Hefty
Roland Dreier wrote: Eitan> Results with - ERR 1B10: Provided Join State != FullMember - required for create. You can not create a group if you are not a full member. Right. However, ScopeState is dumped as 0x1, which means bit 0 of JoinState (the FullMember bit) is in

RE: [openib-general] IPoIB still not working

2004-12-08 Thread Woodruff, Robert J
I found the problem with parsing of ScopeState. It appears that there was a bug in ib_types.h in the sourceforge code base. This was already fixed in the latest Mellanox code base. However, with that fixed, I still get the following error from opensm when ipoib tries to join the multicast group.

RE: [openib-general] IPoIB still not working [was IPoIB FAQ Update]

2004-12-07 Thread Hal Rosenstock
On Tue, 2004-12-07 at 14:33, Woodruff, Robert J wrote: Ok here is what I found. I installed the openib.org code on my EM64T (x86_64) systems that have PCI-E HCAs. We have an 8-node Mellanox switch. On one node (a 32-bit Xeon node), we are running opensm from the SF project. We have one

RE: [openib-general] IPoIB still not working [was IPoIB FAQ Update]

2004-12-07 Thread Woodruff, Robert J
Is it correct to presume you do not have an IB analyzer to capture the SA packets on the link to/from the x86_64 system ? -- Hal No, unfortunately we do not have a 4x IB analyzer. Note that the same openib.org code running on a 32-bit system does not seem to invoke the complaints from

RE: [openib-general] IPoIB still not working [was IPoIB FAQ Update]

2004-12-07 Thread Woodruff, Robert J
Are the x86_64 machines PCIe or PCIX ? (Mine is PCIX). -- Hal PCIe ___ openib-general mailing list [EMAIL PROTECTED] http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

RE: [openib-general] IPoIB still not working [was IPoIB FAQ Update]

2004-12-07 Thread Hal Rosenstock
On Tue, 2004-12-07 at 15:26, Woodruff, Robert J wrote: No, unfortunately we do not have a 4x IB analyzer. Do you have 4x to 1x cables ? Then you could use a 1x analyzer for this. Note that the same openib.org code running on a 32-bit system does not seem to invoke the complaints from opensm.

RE: [openib-general] IPoIB still not working [was IPoIB FAQ Update]

2004-12-07 Thread Woodruff, Robert J
Do you have 4x to 1x cables ? Then you could use a 1x analyzer for this. Used to have one, but I think it was lost. The 1x analyzer we have is very old and out of date, not sure that it is worth trying to get running. Yes, but is that SA client local to the node with the OpenSM ? I'm not

RE: [openib-general] IPoIB still not working

2004-12-07 Thread Hal Rosenstock
On Tue, 2004-12-07 at 20:11, Woodruff, Robert J wrote: So other than this, does IPoIB seem to be working ? I guess that's at least IPv4. It doesn't seem like IPv6 could be working properly. -- Hal

RE: [openib-general] IPoIB still not working

2004-12-07 Thread Eitan Zahavi
Forgive me for not following the entire thread. But I did take a look at the log files: The 64bit version have the following multicast activities: 1. Port 0x0002c9010ad258f1 joining MLID 0xC000 - success. Note that MLID 0xC000