Don,

On Fri, 2006-05-26 at 14:35, [EMAIL PROTECTED] wrote:
> Hal,
>
> > Yes, that is very useful. I had been working on trying to come up with
> > what the problem was but this narrows it down to something I was
> > thinking might be going on.
> >
> > It looks like you are running back to back HCAs, right?
>
> Yes, the HCAs are 4X DDR, connected back to back.
>
> >
> > It also looks to me like your remote (in terms of OpenSM) CA node is not
> > responding to SMA requests like SubnGet NodeInfo yet the link is active.
> > Can you describe what state that node is in (what modules are loaded,
> > etc.)? Can you do an ibstat/ibstatus on that node?
>
> Both systems are booted and the link appears active. Here is the
> information you asked for:
>
> >>>>>>>>>>>>>>>>>>>
>
> Local System (where OpenSM is attempting to run)
>
> [koa] (ib) ib> ibstat
> CA 'mthca0'
>         CA type: MT25204
>         Number of ports: 1
>         Firmware version: 1.0.800
>         Hardware version: a0
>         Node GUID: 0x0002c90200216dc4
>         System image GUID: 0x0002c90200216dc7
>         Port 1:
>                 State: Initializing
>                 Physical state: LinkUp
>                 Rate: 20
>                 Base lid: 0
>                 LMC: 0
>                 SM lid: 0
>                 Capability mask: 0x02510a68
>                 Port GUID: 0x0002c90200216dc5
> [koa] (ib) ib> ibstatus
> Infiniband device 'mthca0' port 1 status:
>         default gid:     fe80:0000:0000:0000:0002:c902:0021:6dc5
>         base lid:        0x0
>         sm lid:          0x0
>         state:           2: INIT
>         phys state:      5: LinkUp
>         rate:            20 Gb/sec (4X DDR)
>
> [koa] (ib) ib> /sbin/lsmod
> Module                  Size  Used by
> parport_pc             28008  0
> lp                     12872  0
> parport                37260  2 parport_pc,lp
> ib_ipath               58392  0
> ipath_core            154596  1 ib_ipath
> pcmcia                 34864  0
> yenta_socket           25484  0
> rsrc_nonstatic         12160  1 yenta_socket
> pcmcia_core            38068  3 pcmcia,yenta_socket,rsrc_nonstatic
> button                  7328  0
> battery                10120  0
> ac                      5512  0
> uhci_hcd               31776  0
> hw_random               6824  0
> i2c_i801               10260  0
> i2c_core               20992  1 i2c_i801
> ib_mthca              109744  0
> ib_ipoib               48792  0
> ib_uverbs              34128  0
> ib_umad                14000  0
> ib_ucm                 16520  0
> ib_sa                  13884  1 ib_ipoib
> ib_cm                  30144  1 ib_ucm
> ib_mad                 35896  4 ib_mthca,ib_umad,ib_sa,ib_cm
> ib_core                45952  9 ib_ipath,ib_mthca,ib_ipoib,ib_uverbs,ib_umad,ib_ucm,ib_sa,ib_cm,ib_mad
> floppy                 67400  0
>
> >>>>>>>>>>>>>>>>>>>
>
> Remote system (no OpenSM instance)
>
> [jatoba] (ib) ib> ibstat
> CA 'mthca0'
>         CA type: MT25204
>         Number of ports: 1
>         Firmware version: 1.0.800
>         Hardware version: a0
>         Node GUID: 0x0002c90200216e40
>         System image GUID: 0x0002c90200216e43
>         Port 1:
>                 State: Initializing
>                 Physical state: LinkUp
>                 Rate: 20
>                 Base lid: 0
>                 LMC: 0
>                 SM lid: 0
>                 Capability mask: 0x02510a68
>                 Port GUID: 0x0002c90200216e41
> [jatoba] (ib) ib> ibstatus
> Infiniband device 'mthca0' port 1 status:
>         default gid:     fe80:0000:0000:0000:0002:c902:0021:6e41
>         base lid:        0x0
>         sm lid:          0x0
>         state:           2: INIT
>         phys state:      5: LinkUp
>         rate:            20 Gb/sec (4X DDR)
>
One more thing on the remote side, try:
smpquery nodeinfo -D 0

> [jatoba] (ib) ib> /sbin/lsmod
> Module                  Size  Used by
> parport_pc             28008  0
> lp                     12872  0
> parport                37260  2 parport_pc,lp
> ib_ipath               58392  0
> ipath_core            154596  1 ib_ipath
> pcmcia                 34864  0
> yenta_socket           25484  0
> rsrc_nonstatic         12160  1 yenta_socket
> pcmcia_core            38068  3 pcmcia,yenta_socket,rsrc_nonstatic
> button                  7328  0
> battery                10120  0
> ac                      5512  0
> uhci_hcd               31776  0
> hw_random               6824  0
> i2c_i801               10260  0
> i2c_core               20992  1 i2c_i801
> ib_mthca              109744  0
> ib_ipoib               48792  0
> ib_uverbs              34128  0
> ib_umad                14000  2
> ib_ucm                 16520  0
> ib_sa                  13884  1 ib_ipoib
> ib_cm                  30144  1 ib_ucm
> ib_mad                 35896  4 ib_mthca,ib_umad,ib_sa,ib_cm
> ib_core                45952  9 ib_ipath,ib_mthca,ib_ipoib,ib_uverbs,ib_umad,ib_ucm,ib_sa,ib_cm,ib_mad
> floppy                 67400  0

Do you also have an iPath adapter? If not, no need to load those modules.

> >>>>>>>>>>>>>>>>>>>
>
> >
> > Can you try this patch to see if it gets you further and let me know?
> > Note that this is just a potential workaround right now.
> >
>
> I will try rebuilding with the patch and let you know the results.

Thanks for your help in resolving this.

-- Hal

> Thanks,
> -Don Albert-

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
