Re: [PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit (Was: Re: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.)
On 07:59 Wed 14 May, Ira Weiny wrote:
> Yea I guess it should be.

Ok. I applied this to 1.3 branch too.

Sasha
___
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
Re: [PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit (Was: Re: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.)
Yea I guess it should be.

Ira

On Wed, 14 May 2008 04:24:04 -0700 Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> On Wed, 2008-05-14 at 00:02 +, Sasha Khapyorsky wrote:
> > On 18:16 Thu 24 Apr , Ira Weiny wrote:
> > >
> > > From 2e5511d6daf9c586c39698416e4bd36e24b13e62 Mon Sep 17 00:00:00 2001
> > > From: Ira K. Weiny <[EMAIL PROTECTED]>
> > > Date: Thu, 24 Apr 2008 18:05:01 -0700
> > > Subject: [PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting
> > > rereg bit
> > >
> > > Signed-off-by: Ira K. Weiny <[EMAIL PROTECTED]>
> >
> > Applied. Thanks.
>
> Would this change also be applied to ofed_1_3 branch ?
>
> -- Hal
>
> > Sasha
Re: [PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit (Was: Re: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.)
On Wed, 2008-05-14 at 00:02 +, Sasha Khapyorsky wrote:
> On 18:16 Thu 24 Apr , Ira Weiny wrote:
> >
> > From 2e5511d6daf9c586c39698416e4bd36e24b13e62 Mon Sep 17 00:00:00 2001
> > From: Ira K. Weiny <[EMAIL PROTECTED]>
> > Date: Thu, 24 Apr 2008 18:05:01 -0700
> > Subject: [PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting
> > rereg bit
> >
> > Signed-off-by: Ira K. Weiny <[EMAIL PROTECTED]>
>
> Applied. Thanks.

Would this change also be applied to ofed_1_3 branch ?

-- Hal

> Sasha
Re: [PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit (Was: Re: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.)
On 18:16 Thu 24 Apr , Ira Weiny wrote:
>
> From 2e5511d6daf9c586c39698416e4bd36e24b13e62 Mon Sep 17 00:00:00 2001
> From: Ira K. Weiny <[EMAIL PROTECTED]>
> Date: Thu, 24 Apr 2008 18:05:01 -0700
> Subject: [PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting
> rereg bit
>
> Signed-off-by: Ira K. Weiny <[EMAIL PROTECTED]>

Applied. Thanks.

Sasha
Re: [PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit (Was: Re: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.)
On Sun, 27 Apr 2008 11:47:54 +0300 Or Gerlitz <[EMAIL PROTECTED]> wrote:
> Ira Weiny wrote:
> >
> > I did not get any output with multicast_debug_level!
> why should you, as from the node's point of view nothing has happened
> (the exact param name is mcast_debug_level)
> >
> > Here is a patch which fixes the problem. (At least with the partial sub-nets
> > configuration I explained before.) I will have to verify this fixes the problem
> > I originally reported.
> OK, good. Does this problem exist in the released openSM? if yes, what
> would be the trigger for the SM to "really discover" (i.e do PortInfo
> SET) this sub-fabric and how much time would it take to reach this
> trigger, worst case wise?

Yes, this is in the current released version of OpenSM, AFAICT. The trigger is: the single link separating the partial sub net will come up and that trap will cause OpenSM to resweep. I believe this will happen on the next resweep cycle which is by default 10 sec. (But this is configurable.) I don't think there is an issue with allowing OpenSM to resweep as designed.

> The failure configuration you have set to reproduce the problem is very
> untypical, I think.

I agree. I made a patch to turn off the processing of MAD's in the kernel to test my original theory, that the node is not responding to MAD's. Using this patch I have been able to verify that if a node stops responding that the rereg is sent by OpenSM when the node comes back. See my next email response to Sasha concerning the original issue.

Ira

> Since under common clos etc topologies which don't
> have a 1:n blocking nature, failure of such link would cause re-route
> etc by the SM which would not (and should not) be noted by the nodes (I
> hope I am not falling into another problem here...)
>
> Or.
Re: [PATCH] opensm/opensm/osm_lid_mgr.c: set "send_set" when setting rereg bit (Was: Re: [ofa-general] Nodes dropping out of IPoIB mcast group due to a temporary node soft lockup.)
Ira Weiny wrote:
> I did not get any output with multicast_debug_level!

why should you, as from the node's point of view nothing has happened (the exact param name is mcast_debug_level)

> Here is a patch which fixes the problem. (At least with the partial sub-nets
> configuration I explained before.) I will have to verify this fixes the problem
> I originally reported.

OK, good. Does this problem exist in the released openSM? if yes, what would be the trigger for the SM to "really discover" (i.e do PortInfo SET) this sub-fabric and how much time would it take to reach this trigger, worst case wise?

The failure configuration you have set to reproduce the problem is very untypical, I think. Since under common clos etc topologies which don't have a 1:n blocking nature, failure of such link would cause re-route etc by the SM which would not (and should not) be noted by the nodes (I hope I am not falling into another problem here...)

Or.
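
For readers following the thread without the patch body, here is a minimal sketch of the behavior the patch title describes: setting "send_set" whenever the rereg bit is set, so that the PortInfo SET actually goes out even when nothing else about the port changed. This is not the OpenSM code. The struct, the function name `build_port_info_set`, and the `apply_fix` switch are all hypothetical, and the assumption that a SET is otherwise sent only when some field differs is inferred from the patch subject, not taken from osm_lid_mgr.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the PortInfo fields that matter
 * here. The real structure in OpenSM is much larger. */
struct port_info_sketch {
    uint16_t base_lid;     /* LID currently programmed on the port */
    bool     client_rereg; /* client reregistration ("rereg") bit */
};

/* Builds the PortInfo to send and reports whether a SET should be sent.
 * ASSUMPTION: a SET is only sent when send_set ends up true.
 * apply_fix selects between the pre-patch and post-patch behavior
 * (hypothetical switch, present only so both cases can be compared). */
static bool build_port_info_set(const struct port_info_sketch *current,
                                uint16_t wanted_lid, bool need_rereg,
                                bool apply_fix,
                                struct port_info_sketch *out)
{
    bool send_set = false;

    *out = *current;
    out->client_rereg = false;

    /* A changed field forces a SET in both variants. */
    if (current->base_lid != wanted_lid) {
        out->base_lid = wanted_lid;
        send_set = true;
    }

    if (need_rereg) {
        out->client_rereg = true;
        /* The idea named in the patch subject: requesting rereg must by
         * itself be enough to trigger the SET. */
        if (apply_fix)
            send_set = true;
    }

    return send_set;
}

/* Compares both variants for a port whose LID is already correct.
 * Returns 0 when the sketch behaves as described above. */
static int run_demo(void)
{
    struct port_info_sketch cur = { .base_lid = 7, .client_rereg = false };
    struct port_info_sketch out;

    bool without_fix = build_port_info_set(&cur, 7, true, false, &out);
    bool with_fix    = build_port_info_set(&cur, 7, true, true, &out);

    assert(out.client_rereg);
    return (!without_fix && with_fix) ? 0 : 1;
}
```

Calling `run_demo()` from a `main` of your own exercises both variants. In the sketch, a port with an unchanged LID gets the rereg bit in its outgoing PortInfo either way, but only the post-patch variant reports that the SET should be sent, which matches the symptom discussed above where nodes never saw the reregistration request.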
