Hi,

On Mon, 2006-01-16 at 05:48, Andrey Slepuhin wrote:
> Dear folks,
>
> I have a problem starting opensm on a p5 570 machine.
Is this the first time trying this on a p5 machine ?

> The following messages appear in the opensm log file:
>
> ******************************************************************
> ******************** INITIATING HEAVY SWEEP **********************
> ******************************************************************
>
> Jan 16 13:30:55 737114 [40018DC0] -> osm_req_get: [
> Jan 16 13:30:55 737130 [40018DC0] -> osm_mad_pool_get: [
> Jan 16 13:30:55 737147 [40018DC0] -> osm_vendor_get: [
> Jan 16 13:30:55 737161 [40018DC0] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x100747dc, size = 256
> Jan 16 13:30:55 737176 [40018DC0] -> osm_vendor_get: Acquired UMAD 0x1008ee40, size = 256
> Jan 16 13:30:55 737192 [40018DC0] -> osm_vendor_get: ]
> Jan 16 13:30:55 737208 [40018DC0] -> osm_mad_pool_get: Acquired p_madw = 0x100747d0, p_mad = 0x1008ee78, size = 256
> Jan 16 13:30:55 737223 [40018DC0] -> osm_mad_pool_get: ]
> Jan 16 13:30:55 737238 [40018DC0] -> osm_req_get: Getting NodeInfo (0x11), modifier = 0x0, TID = 0x1234
> Jan 16 13:30:55 737255 [40018DC0] -> osm_vl15_post: [
> Jan 16 13:30:55 737269 [40018DC0] -> osm_vl15_post: Posting p_madw = 0x0x100747d0
> Jan 16 13:30:55 737284 [40018DC0] -> osm_vl15_post: 0 QP0 MADs on wire, 1 QP0 MADs outstanding
> Jan 16 13:30:55 737299 [40018DC0] -> osm_vl15_poll: [
> Jan 16 13:30:55 737313 [40018DC0] -> osm_vl15_poll: Signalling poller thread
> Jan 16 13:30:55 737334 [40018DC0] -> osm_vl15_poll: ]
> Jan 16 13:30:55 737338 [42827B20] -> __osm_vl15_poller: Servicing p_madw = 0x100747d0
> Jan 16 13:30:55 737352 [40018DC0] -> osm_vl15_post: ]
> Jan 16 13:30:55 737388 [40018DC0] -> osm_req_get: ]
> Jan 16 13:30:55 737404 [40018DC0] -> __osm_state_mgr_sweep_hop_0: ]
> Jan 16 13:30:55 737420 [40018DC0] -> osm_state_mgr_process: ]
> Jan 16 13:30:55 737436 [40018DC0] -> osm_sm_sweep: ]
> Jan 16 13:30:55 737464 [42827B20] -> SMP dump:
> base_ver................0x1
> mgmt_class..............0x81
> class_ver...............0x1
> method..................0x1 (SubnGet)
> D bit...................0x0
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x0
> trans_id................0x1234
> attr_id.................0x11 (NodeInfo)
> resv....................0x0
> attr_mod................0x0
> m_key...................0x0000000000000000
> dr_slid.................0xFFFF
> dr_dlid.................0xFFFF
>
> Initial path: [0]
> Return path: [0]
> Reserved: [0][0][0][0][0][0][0]
>
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>
> Jan 16 13:30:55 737604 [42827B20] -> osm_vendor_send: [
> Jan 16 13:30:55 737742 [42827B20] -> osm_vendor_send: Completed Sending Request p_madw = 0x100747d0
> Jan 16 13:30:55 737761 [42827B20] -> osm_vendor_send: ]
> Jan 16 13:30:55 737768 [43027B20] -> osm_mad_pool_get: [
> Jan 16 13:30:55 737784 [42827B20] -> __osm_vl15_poller: 1 QP0 MADs on wire, 1 outstanding, 0 unicasts sent, 1 total sent
> Jan 16 13:30:55 737812 [43027B20] -> osm_vendor_get: [
> Jan 16 13:30:55 737848 [43027B20] -> osm_vendor_get: Acquiring UMAD for p_madw = 0x10074724, size = 256
> Jan 16 13:30:55 737866 [43027B20] -> osm_vendor_get: Acquired UMAD 0x1008ef80, size = 256
> Jan 16 13:30:55 737883 [43027B20] -> osm_vendor_get: ]
> Jan 16 13:30:55 737897 [43027B20] -> osm_mad_pool_get: Acquired p_madw = 0x10074718, p_mad = 0x1008efb8, size = 256
> Jan 16 13:30:55 737915 [43027B20] -> osm_mad_pool_get: ]
> Jan 16 13:30:55 737939 [43027B20] -> umad_receiver: ERR 5413: Failed to obtain request madw for received MAD (method=0x81 attr=0x11) -- dropping

This means that no matching transaction was found in the transaction match table. It may be an endian problem with the TID. Can you validate the TIDs (print them out) in both get_madw and put_madw in osm_vendor_ibumad.c ?
Since this seems to happen early on, there shouldn't be too many of these. Thanks.

> Jan 16 13:30:55 737960 [43027B20] -> osm_mad_pool_put: [
> Jan 16 13:30:55 737975 [43027B20] -> osm_mad_pool_put: Releasing p_madw = 0x10074718, p_mad = 0x1008ed00
> Jan 16 13:30:55 737993 [43027B20] -> osm_vendor_put: [
> Jan 16 13:30:55 738008 [43027B20] -> osm_vendor_put: Retiring UMAD 0x1008ecc8
> Jan 16 13:30:55 738026 [43027B20] -> osm_vendor_put: ]
> Jan 16 13:30:55 738041 [43027B20] -> osm_mad_pool_put: ]
>
> My configuration consists of two 23108 HCAs directly connected without a switch,
> firmware is 3.3.3, kernel is 2.6.15-4 from OpenSUSE repository, userspace
> revision is 4978.

Are the two HCAs on separate machines ?

-- Hal

> Any help will be much appreciated
>
> Best regards,
> Andrey
> _______________________________________________
> openib-general mailing list
> [email protected]
> http://openib.org/mailman/listinfo/openib-general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
