Hi Paul,

On Tue, 2006-05-30 at 11:06, Paul wrote:
> Hi All,
> I will be working on this as time permits this week.
> Unfortunately my employer is not crazy about giving out remote access,
> so I will have to be your hands on this. If you want me to do
> something just tell me what it is. I know it's a pain; I have been there
> myself.

I should have access to a G5 in a day or so so let me see if I can
recreate this.

-- Hal

> Regards.
>
> On 5/30/06, [EMAIL PROTECTED] <[EMAIL PROTECTED] > wrote:
> Hal,
>
> With your patch to OpenSM, I think everything is ok on the
> local node. The remote node is definitely having some
> problems, resulting in not responding to the MAD packets. I
> have entered a separate message on the problems with the "ib0"
> interface on that machine.
>
> On Fri, 2006-05-26 at 20:59, Hal Rosenstock wrote:
> > > > What next, coach?
> > >
> > > Can you turn on madeye on the remote node and see what packets are
> > > received and sent ? Let me know if you need help with that. I think you
> > > said you were running OFED, right ?
>
> Yes, I am running kernel 2.6.16 with the OFED RC5 release. I
> will investigate how to run madeye, but the hangs on the
> remote machine are probably the root cause of the link
> failure.
>
> > I don't think madeye is part of OFED :-( Can it get added for RC6,
> > Tziporet ? I think it would be a useful tool to add for problems like
> > this.
> >
> > Also, was this a working setup before ? Did anything else change besides
> > installing RC5 on both nodes ?
>
> This back to back setup was working originally with a
> backported 2.6.11-34 kernel and I believe it was revision 6500
> from the OpenIB svn trunk at that time. The problems started
> when I tried to move to RC4 and now RC5 of the OFED release,
> with the 2.6.16 kernel.
>
> > I have two more experiments I'd like you to try, before we go down the
> > madeye "route":
> >
> > 1. Do you have another IB cable to try ?
> >
> > 2. Can you completely shutdown and repower the remote node and see if it
> > starts responding ?
>
> It is difficult for me to debug this sort of thing, since I
> telecommute from Tucson and the machines are located in
> Phoenix. But I can get someone there to power the machine
> down and reboot.
>
> -Don Albert-
>
> _______________________________________________
> openib-general mailing list
> [email protected]
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit
> http://openib.org/mailman/listinfo/openib-general
