Is the port state active?
The port is active for port 1 and down for port 2. Port 2 is not connected.
What are you running? Is this OpenSM and IPoIB off the trunk, or something
else?
I am at a loss to find out what the problem is. I did notice a lot of errors in
the /var/log/osm.log which I have listed below for today:
Yes, I guess I should have mentioned that. I am running cAos 2.0 with
the openib package along with the opensm that comes with openib. I am
also trying to run over IPoIB.
Aug 24 08:19:10 [42FFF960] -> osm_report_notice: Reporting Generic
Notice type:3 num:67 from LID:0x0001
GID:0xfe80000000000000,0x0005ad000003d269
Aug 24 08:19:10 [42FFF960] -> osm_vendor_send: RMPP 0 length 112
Aug 24 08:19:10 [42FFF960] -> osm_mcmr_rcv_join_mgrp: ERR 1B11: method =
SubnAdmSet,scope_state = 0x1, component mask = 0x0000000000010083,
expected comp mask = 0x00000000000130c7.
It appears that a join is failing for some reason. It doesn't say which group
(MGID) this is. (I will add that into the log).
The SM is receiving a join rather than a create request for
a new multicast group. That might be OK depending on which group it is.
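
To see which components the request left out, the two masks from the ERR 1B11
line can be compared bit by bit. This is only a rough sketch: the field names
and bit order below are an assumption based on the MCMemberRecord component
mask layout (the IB_MCR_COMPMASK_* defines in OpenSM's headers), so check them
against your tree before relying on the output.

```python
# Assumed MCMemberRecord component-mask layout, bit 0 first.
# These names are illustrative and should be verified against ib_types.h.
FIELDS = [
    "MGID", "PortGID", "Q_Key", "MLID", "MTUSelector", "MTU", "TClass",
    "P_Key", "RateSelector", "Rate", "PacketLifeTimeSelector",
    "PacketLifeTime", "SL", "FlowLabel", "HopLimit", "Scope", "JoinState",
]


def mask_fields(mask):
    """Return the names of the fields whose bits are set in mask."""
    return [name for bit, name in enumerate(FIELDS) if mask & (1 << bit)]


def missing_fields(received, expected):
    """Return the fields that expected requires but received does not set."""
    return mask_fields(expected & ~received)


# Values copied from the ERR 1B11 log line.
received = 0x0000000000010083
expected = 0x00000000000130C7

print("received:", mask_fields(received))
print("missing: ", missing_fields(received, expected))
```

If the assumed layout is right, the request carries only the components a plain
join would need, and the missing ones are those the SM wants when it has to
create the group.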
Aug 24 08:19:10 [42FFF960] -> osm_vendor_send: RMPP 0 length 256
Aug 24 08:19:14 [42FFF960] -> osm_vendor_send: RMPP 0 length 112
Aug 24 08:19:14 [42FFF960] -> osm_vendor_send: RMPP 0 length 112
Aug 24 08:19:14 [42FFF960] -> osm_vendor_send: RMPP 0 length 112
Aug 24 08:19:14 [42FFF960] -> osm_vendor_send: RMPP 0 length 112
Aug 24 08:19:14 [42FFF960] -> osm_report_notice: Reporting Generic
Notice type:3 num:67 from LID:0x0001
GID:0xfe80000000000000,0x0005ad000003d269
Aug 24 08:19:14 [42FFF960] -> osm_report_notice: Reporting Generic
Notice type:3 num:67 from LID:0x0001
GID:0xfe80000000000000,0x0005ad000003d269
Aug 24 08:19:16 [447FF960] -> umad_receiver: recv error Interrupted
system call
Aug 24 08:22:05 [AB441140] -> OpenSM Rev:openib-1.0.0
Aug 24 08:22:05 [AB441140] -> osm_opensm_init: Forcing single threaded
dispatcher.
It looks like OpenSM restarted here. Currently, if OpenSM is restarted, the
IPoIB interface needs to be brought down and then back up, because client
reregistration is not yet supported.
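
One way to spot these restarts without reading the whole log is to look for the
"OpenSM Rev:" banner line. The sketch below assumes the banner is written once
per start, as it is in the excerpt above, and that the line format matches that
excerpt; find_restarts is just a helper name made up for this example.

```python
import re

# Matches the start-up banner as it appears in the excerpt above, e.g.
# "Aug 24 08:22:05 [AB441140] -> OpenSM Rev:openib-1.0.0"
BANNER = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \[[0-9A-Fa-f]+\] -> OpenSM Rev:(\S+)"
)


def find_restarts(lines):
    """Return (timestamp, revision) for every OpenSM banner line found."""
    hits = []
    for line in lines:
        match = BANNER.match(line)
        if match:
            hits.append((match.group(1), match.group(2)))
    return hits


# Lines taken from the log excerpt in this thread.
sample = [
    "Aug 24 08:19:16 [447FF960] -> umad_receiver: recv error Interrupted",
    "Aug 24 08:22:05 [AB441140] -> OpenSM Rev:openib-1.0.0",
    "Aug 24 08:22:05 [AB441140] -> osm_opensm_init: Forcing single threaded",
]

for when, rev in find_restarts(sample):
    print(f"OpenSM ({rev}) started at {when}; IPoIB interfaces may need a down/up")
```

To run it against a real log, pass the open file instead of the sample list,
for example find_restarts(open("/var/log/osm.log")).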
Yes, after the 4.5 hours I spent looking yesterday, and from looking at
the arp table, this makes sense. What I ended up doing to fix it was to
bring down ib0 and then bring it back up. A little while later, when I
tried to ping again, things were back to working. I have to say that I
was very concerned about our applications running over IPoIB, but after
what you mentioned and what I saw, I think we will be OK.
Thank you for your response.
Sean