Re: [ofa-general] opensm: Unsupported attribute = 0xFF02

2007-11-01 Thread Hal Rosenstock
On Thu, 2007-11-01 at 09:48 -0700, Sean Hefty wrote:
> > As mentioned in the past, client reregistration is a rather large
> > hammer. There have been discussions on utilizing this mechanism in more
> > scenarios (which FWIW is not a good thing IMO). This approach (and it is
> > optional) pushes the burden back on the end nodes rather than the SM.
> > Scalability is certainly an issue with it. It was begrudgingly put into
> > the spec. It was intended only as a stopgap measure.
> > 
> > There was informative text put into the spec alluding to the
> > "appropriate" use of this option:
> > 
> > "A reason for the SM doing this might be that the SM suffered a failure
> > and as a result lost its own records of such subscriptions."
> > This is referring to a single SM (although that is not the recommended
> > deployment topology) crashing and being restarted.
> > 
> > IMO a civil SM would not rely on this mechanism.
> 
> There's still the problem that the ULPs on the end-node do not know when 
> or if the data is lost.

Such data is not supposed to be lost, although how to ensure that is
left as an exercise for the reader. That is all value-add beyond the
spec currently.

> IMO, making client reregistration mandatory

Couldn't be done due to backwards compatibility guarantee of IBA.
 
> would have been a better solution, allowing ULPs to only re-register on 
> that event.  As it stands now, ULPs automatically reregister on SM LID 
> changes, port events, etc..  In order to avoid ULP re-registration, SM 
> failover has to bring along the LID.

In general, it does. Most SMs do not change LIDs unless they absolutely
have to. Using reregister for this (and some other point cases) would be
fine but not in the general case of failover for all ports.

> An alternate solution could have let an SM learn what it needed from the 
> end nodes through queries...

Currently, SA queries go from the client (end node) to the SA, not the
other way around. The only things sent from the SA to the end node are
reports. That could be changed if it really is needed.

-- Hal

> - Sean
___
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [ofa-general] opensm: Unsupported attribute = 0xFF02

2007-11-01 Thread Sean Hefty

> As mentioned in the past, client reregistration is a rather large
> hammer. There have been discussions on utilizing this mechanism in more
> scenarios (which FWIW is not a good thing IMO). This approach (and it is
> optional) pushes the burden back on the end nodes rather than the SM.
> Scalability is certainly an issue with it. It was begrudgingly put into
> the spec. It was intended only as a stopgap measure.
> 
> There was informative text put into the spec alluding to the
> "appropriate" use of this option:
> 
> "A reason for the SM doing this might be that the SM suffered a failure
> and as a result lost its own records of such subscriptions."
> This is referring to a single SM (although that is not the recommended
> deployment topology) crashing and being restarted.
> 
> IMO a civil SM would not rely on this mechanism.


There's still the problem that the ULPs on the end-node do not know when 
or if the data is lost.  IMO, making client reregistration mandatory 
would have been a better solution, allowing ULPs to only re-register on 
that event.  As it stands now, ULPs automatically reregister on SM LID 
changes, port events, etc..  In order to avoid ULP re-registration, SM 
failover has to bring along the LID.
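
To make the difference concrete, here is a minimal sketch (in Python for
brevity) of the two policies being compared: a ULP that re-registers on
nearly every port-level event, and one that re-registers only when the SM
explicitly asks for it. The event labels and function names are
hypothetical and are not taken from any real driver or ULP; this only
illustrates the argument, not how any stack implements it.

```python
from enum import Enum, auto


class PortEvent(Enum):
    """Hypothetical event labels, not taken from any real driver API."""
    PORT_ACTIVE = auto()
    LID_CHANGE = auto()
    SM_LID_CHANGE = auto()
    CLIENT_REREGISTER = auto()


# Behavior described in the thread: ULPs re-register on SM LID changes,
# port events, and so on, because they cannot tell whether data was lost.
CONSERVATIVE = frozenset(PortEvent)

# Behavior that would suffice if every SM reliably requested
# reregistration whenever it had lost its records (an assumption).
REREGISTER_ONLY = frozenset({PortEvent.CLIENT_REREGISTER})


def must_reregister(event: PortEvent, policy: frozenset) -> bool:
    """Return True if a ULP following `policy` re-registers on `event`."""
    return event in policy


def reregistrations(events, policy) -> int:
    """Count how many re-registrations a sequence of events triggers."""
    return sum(must_reregister(e, policy) for e in events)
```

Under the conservative policy, a failover that moves the SM LID costs one
re-registration per ULP per port even if the new SM already holds all the
records, which is why the failover has to bring along the LID to avoid it.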


An alternate solution could have let an SM learn what it needed from the 
end nodes through queries...


- Sean


Re: [ofa-general] opensm: Unsupported attribute = 0xFF02

2007-11-01 Thread Hal Rosenstock
On Thu, 2007-11-01 at 05:56 +0200, Sasha Khapyorsky wrote:
> On 20:41 Wed 31 Oct , Jason Gunthorpe wrote:
> > 
> > On Thu, Nov 01, 2007 at 02:24:10AM +0200, Sasha Khapyorsky wrote:
> > 
> > > What are the reasons? I think compliant SMs should be able to
> > > inter-operate, of course not in part of proprietary extensions. At least
> > > I am able to run OpenSM with Voltaire SM on one subnet.
> > 
> > At a minimum how hand off is supposed to work is very vaguely
> > specified in the IBA.
> 
> It is at least basically described in the IBA - with exchanging SMInfo.
> 
> > Besides, even if hand off wasn't a problem the two SMs would have to
> > have very similar ideas on routing, multicast, QOS, services, etc
> 
> In worst case the routing tables and QoS setups could be reconfigured
> from scratch (just as if it could be first SM run), and all SA related
> things could be rerequested with ClientReregistration bit.

As mentioned in the past, client reregistration is a rather large
hammer. There have been discussions on utilizing this mechanism in more
scenarios (which FWIW is not a good thing IMO). This approach (and it is
optional) pushes the burden back on the end nodes rather than the SM.
Scalability is certainly an issue with it. It was begrudgingly put into
the spec. It was intended only as a stopgap measure.

There was informative text put into the spec alluding to the
"appropriate" use of this option:

"A reason for the SM doing this might be that the SM suffered a failure
and as a result lost its own records of such subscriptions."
This is referring to a single SM (although that is not the recommended
deployment topology) crashing and being restarted.

IMO a civil SM would not rely on this mechanism.

-- Hal

> And sure, some configurations (partitions, QoS, routing, etc.) can be
> not synchronized for SMs, but then the differences in a fabric setups
> should be expected results.
> 
> And I'm not about "how fast and efficient it is" and even not about
> "interoperability" bugs in various implementations.
> 
> > or
> > the fabric will be badly disrupted after hand off.. Without extensions
> > to transfer this live data over before hand off it is unlikely to
> > be non-disruptive except in very constrained situations.
> > 
> > It seems to me the main benefit of the whole standardized mechanism
> > (in an interoperability context) is just to help make it so that a new
> > sm starting up doesn't just trash the fabric accidentally, and provide
> > at least some sensible behavior when two separate subnets are combined
> > into one.
> > 
> > If you want to test hand over interop joining two operating networks
> > is a good way to do it - that is really hard to get right in all of
> > the cases :) This was the area where I felt the spec was weakest since
> > it really didn't say exactly when during the hand over exchanges each
> > SM was in control of the nodes, and exactly what should happen when
> > things go wrong was not specified..
> 
> Ok, so we are not about "impossibility" to do this... Just current lack
> of standardization makes it hard to do handover properly?
> 
> Sasha


Re: [ofa-general] opensm: Unsupported attribute = 0xFF02

2007-11-01 Thread Hal Rosenstock
On Thu, 2007-11-01 at 14:52 +0200, Sasha Khapyorsky wrote:
> On 04:56 Thu 01 Nov , Hal Rosenstock wrote:
> > 
> > SM handover/failover is tested for interop by the IBTA but that's it.
> 
> BTW was it officially done with SMs from different vendors?

Just SMInfo testing and the fact that the other became master  in
various scenarios.

-- Hal

> 
> Sasha


Re: [ofa-general] opensm: Unsupported attribute = 0xFF02

2007-11-01 Thread Sasha Khapyorsky
On 05:25 Thu 01 Nov , Hal Rosenstock wrote:
> > 
> > Nothing really special. I'm just running OpenSM on the subnet with
> > Voltaire managed switches (where VoltaireSM is on). And I don't remember
> > any big problems there (including handover, etc.). Never investigated
> > interoperability issues in depth, however.
> 
> What ULPs if any have you run across failovers and failbacks ?

At least IPoIB is always running...

Sasha


Re: [ofa-general] opensm: Unsupported attribute = 0xFF02

2007-11-01 Thread Sasha Khapyorsky
On 04:56 Thu 01 Nov , Hal Rosenstock wrote:
> 
> SM handover/failover is tested for interop by the IBTA but that's it.

BTW was it officially done with SMs from different vendors?

Sasha


Re: [ofa-general] opensm: Unsupported attribute = 0xFF02

2007-11-01 Thread Sasha Khapyorsky
On 05:00 Thu 01 Nov , Hal Rosenstock wrote:
> On Thu, 2007-11-01 at 05:56 +0200, Sasha Khapyorsky wrote:
> > On 20:41 Wed 31 Oct , Jason Gunthorpe wrote:
> > > 
> > > On Thu, Nov 01, 2007 at 02:24:10AM +0200, Sasha Khapyorsky wrote:
> > > 
> > > > > What are the reasons? I think compliant SMs should be able to
> > > > inter-operate, of course not in part of proprietary extensions. At least
> > > > I am able to run OpenSM with Voltaire SM on one subnet.
> > > 
> > > > At a minimum how hand off is supposed to work is very vaguely
> > > specified in the IBA.
> > 
> > It is at least basically described in the IBA - with exchanging SMInfo.
> > 
> > > Besides, even if hand off wasn't a problem the two SMs would have to
> > > have very similar ideas on routing, multicast, QOS, services, etc
> > 
> > In worst case the routing tables and QoS setups could be reconfigured
> > from scratch (just as if it could be first SM run), and all SA related
> > things could be rerequested with ClientReregistration bit.
> 
> Routing tables are usually driven by algorithms (all beyond the spec)
> rather than table loading.
> 
> Don't trivialize management data in a large subnet. It is potentially a
> large amount of configuration which people try hard to avoid until they
> no longer have a choice.
> 
> I view client reregistration as a workaround for this very issue. I am
> regretting pushing that into the spec for that purpose.
> 
> > And sure, some configurations (partitions, QoS, routing, etc.) can be
> > not synchronized for SMs, but then the differences in a fabric setups
> > should be expected results.
> 
> Is that really acceptable for a real customer ?

This was not the question - "acceptable" and "impossible" are not the same.

Sasha


Re: [ofa-general] opensm: Unsupported attribute = 0xFF02

2007-11-01 Thread Hal Rosenstock
On Thu, 2007-11-01 at 14:32 +0200, Sasha Khapyorsky wrote:
> On 04:50 Thu 01 Nov , Hal Rosenstock wrote:
> > 
> > Aside from value adds (proprietary extensions) which is an important and
> > large issue as all SM vendors claim advantage from this and they are
> > not being opened up, there are the issues of routing algorithms (both
> > unicast and multicast), management, and consistency of data. These are
> > all different to varying degrees for each flavor.
> > 
> > While I am aware of some customers requesting this, the IBTA does not
> > sanction this configuration. The MgtWG has produced a white paper on the
> > topic which is available on their web site.
> 
> Do you mean management interoperability white paper?

Yes.

> > In fact, some areas of the
> > above are beyond the IBTA charter. If you feel strongly about this, I
> > suggest you get like-minded vendors together and get as much of this
> > standardized as possible. IMO the place for this is the IBTA MgtWG.
> 
> I see. Thanks for suggestion :)
> 
> > > At least
> > > I am able to run OpenSM with Voltaire SM on one subnet.
> > 
> > Define what you mean by run here ? What "experiments" have you
> > performed ?
> 
> Nothing really special. I'm just running OpenSM on the subnet with
> Voltaire managed switches (where VoltaireSM is on). And I don't remember
> any big problems there (including handover, etc.). Never investigated
> interoperability issues in depth, however.

What ULPs if any have you run across failovers and failbacks ?

> > Does Voltaire stand behind this as a supported configuration ?
> 
> Not sure.

Let us know when you find out. I, for one, and I think there are others
would be very interested in this.

-- Hal

> > If not,
> > are there any plans to do so ?
> 
> No idea, sorry :(
> 
> Sasha


Re: [ofa-general] opensm: Unsupported attribute = 0xFF02

2007-11-01 Thread Sasha Khapyorsky
On 04:50 Thu 01 Nov , Hal Rosenstock wrote:
> 
> Aside from value adds (proprietary extensions) which is an important and
> large issue as all SM vendors claim advantage from this and they are
> not being opened up, there are the issues of routing algorithms (both
> unicast and multicast), management, and consistency of data. These are
> all different to varying degrees for each flavor.
> 
> While I am aware of some customers requesting this, the IBTA does not
> sanction this configuration. The MgtWG has produced a white paper on the
> topic which is available on their web site.

Do you mean management interoperability white paper?

> In fact, some areas of the
> above are beyond the IBTA charter. If you feel strongly about this, I
> suggest you get like-minded vendors together and get as much of this
> standardized as possible. IMO the place for this is the IBTA MgtWG.

I see. Thanks for suggestion :)

> > At least
> > I am able to run OpenSM with Voltaire SM on one subnet.
> 
> Define what you mean by run here ? What "experiments" have you
> performed ?

Nothing really special. I'm just running OpenSM on the subnet with
Voltaire managed switches (where VoltaireSM is on). And I don't remember
any big problems there (including handover, etc.). Never investigated
interoperability issues in depth, however.

> Does Voltaire stand behind this as a supported configuration ?

Not sure.

> If not,
> are there any plans to do so ?

No idea, sorry :(

Sasha


Re: [ofa-general] opensm: Unsupported attribute = 0xFF02

2007-11-01 Thread Hal Rosenstock
On Thu, 2007-11-01 at 05:56 +0200, Sasha Khapyorsky wrote:
> On 20:41 Wed 31 Oct , Jason Gunthorpe wrote:
> > 
> > On Thu, Nov 01, 2007 at 02:24:10AM +0200, Sasha Khapyorsky wrote:
> > 
> > > What are the reasons? I think compliant SMs should be able to
> > > inter-operate, of course not in part of proprietary extensions. At least
> > > I am able to run OpenSM with Voltaire SM on one subnet.
> > 
> > At a minimum how hand off is supposed to work is very vaguely
> > specified in the IBA.
> 
> It is at least basically described in the IBA - with exchanging SMInfo.
> 
> > Besides, even if hand off wasn't a problem the two SMs would have to
> > have very similar ideas on routing, multicast, QOS, services, etc
> 
> In worst case the routing tables and QoS setups could be reconfigured
> from scratch (just as if it could be first SM run), and all SA related
> things could be rerequested with ClientReregistration bit.

Routing tables are usually driven by algorithms (all beyond the spec)
rather than table loading.

Don't trivialize management data in a large subnet. It is potentially a
large amount of configuration which people try hard to avoid until they
no longer have a choice.

I view client reregistration as a workaround for this very issue. I am
regretting pushing that into the spec for that purpose.

> And sure, some configurations (partitions, QoS, routing, etc.) can be
> not synchronized for SMs, but then the differences in a fabric setups
> should be expected results.

Is that really acceptable for a real customer ?

-- Hal

> And I'm not about "how fast and efficient it is" and even not about
> "interoperability" bugs in various implementations.
> 
> > or
> > the fabric will be badly disrupted after hand off.. Without extensions
> > to transfer this live data over before hand off it is unlikely to
> > be non-disruptive except in very constrained situations.
> > 
> > It seems to me the main benefit of the whole standardized mechanism
> > (in an interoperability context) is just to help make it so that a new
> > sm starting up doesn't just trash the fabric accidentally, and provide
> > at least some sensible behavior when two separate subnets are combined
> > into one.
> > 
> > If you want to test hand over interop joining two operating networks
> > is a good way to do it - that is really hard to get right in all of
> > the cases :) This was the area where I felt the spec was weakest since
> > it really didn't say exactly when during the hand over exchanges each
> > SM was in control of the nodes, and exactly what should happen when
> > things go wrong was not specified..
> 
> Ok, so we are not about "impossibility" to do this... Just current lack
> of standardization makes it hard to do handover properly?
> 
> Sasha


Re: [ofa-general] opensm: Unsupported attribute = 0xFF02

2007-11-01 Thread Hal Rosenstock
On Wed, 2007-10-31 at 20:41 -0600, Jason Gunthorpe wrote:
> On Thu, Nov 01, 2007 at 02:24:10AM +0200, Sasha Khapyorsky wrote:
> 
> > What are the reasons? I think compliant SMs should be able to
> > inter-operate, of course not in part of proprietary extensions. At least
> > I am able to run OpenSM with Voltaire SM on one subnet.
> 
> At a minimum how hand off is supposed to work is very vaguely
> specified in the IBA.

SM handover/failover is tested for interop by the IBTA but that's it.
There are known proprietary extensions, and this is also the camel's
nose in the tent: for this to work well, data replication is needed
(and no, please don't bring up client reregistration as the best
solution for this).

> Besides, even if hand off wasn't a problem the two SMs would have to
> have very similar ideas on routing, multicast, QOS, services, etc or
> the fabric will be badly disrupted after hand off..

Exactly.

>  Without extensions
> to transfer this live data over before hand off it is unlikely to
> be non-disruptive except in very constrained situations.

Indeed.

> It seems to me the main benefit of the whole standardized mechanism
> (in an interoperability context) is just to help make it so that a new
> sm starting up doesn't just trash the fabric accidentally, and provide
> at least some sensible behavior when two separate subnets are combined
> into one.
> 
> If you want to test hand over interop joining two operating networks
> is a good way to do it - that is really hard to get right in all of
> the cases :) This was the area where I felt the spec was weakest since
> it really didn't say exactly when during the hand over exchanges each
> SM was in control of the nodes, and exactly what should happen when
> things go wrong was not specified..

Yes, I too believe more work could be and should be done here.

-- Hal

> Jason


Re: [ofa-general] opensm: Unsupported attribute = 0xFF02

2007-11-01 Thread Hal Rosenstock
Hi Sasha,

On Thu, 2007-11-01 at 02:24 +0200, Sasha Khapyorsky wrote:
> Hi Hal,
> 
> On 14:01 Tue 30 Oct , Hal Rosenstock wrote:
> > > status..0x0
> > > hop_ptr.0x0
> > > hop_count...0x0
> > > trans_id0x377df6ce
> > > attr_id.0xFF02 (UNKNOWN)
> > 
> > This is a proprietary SM attribute used by Cisco SM. Also, I believe the
> > Cisco SM supports replication to standby's and that would be via
> > proprietary means.
> > 
> > > resv0x0
> > > attr_mod0x1
> > > m_key...0x
> > > MAD IS LID ROUTED
> > > 
> > > I'm not sure what this ERR 3107 means, is there something I could do about
> > > it? Is there a way to use OpenSM as a standby SM with a managed switch?
> > 
> > No; SM flavors should not be mixed on a subnet. There are numerous
> > reasons for this.
> 
> What are the reasons? I think compliant SMs should be able to
> inter-operate, of course not in part of proprietary extensions. 

Aside from value adds (proprietary extensions) which is an important and
large issue as all SM vendors claim advantage from this and they are
not being opened up, there are the issues of routing algorithms (both
unicast and multicast), management, and consistency of data. These are
all different to varying degrees for each flavor.

While I am aware of some customers requesting this, the IBTA does not
sanction this configuration. The MgtWG has produced a white paper on the
topic which is available on their web site. In fact, some areas of the
above are beyond the IBTA charter. If you feel strongly about this, I
suggest you get like-minded vendors together and get as much of this
standardized as possible. IMO the place for this is the IBTA MgtWG.

> At least
> I am able to run OpenSM with Voltaire SM on one subnet.

Define what you mean by run here ? What "experiments" have you
performed ?

Does Voltaire stand behind this as a supported configuration ? If not,
are there any plans to do so ?

-- Hal

> Sasha


Re: [ofa-general] opensm: Unsupported attribute = 0xFF02

2007-10-31 Thread Jason Gunthorpe
On Thu, Nov 01, 2007 at 05:56:48AM +0200, Sasha Khapyorsky wrote:

> > At a minimum how hand off is supposed to work is very vaguely
> > specified in the IBA.
> 
> It is at least basically described in the IBA - with exchanging SMInfo.

Well sort of..

Let's take the hardest example I know of: connecting two running
subnets together. There are several phases:
1) Discovery - Two fully operational master SMs are running and
   maintaining their non-overlapping subsets of nodes.
2) Election - Each SM independently decides who should become
   the master, but each SM continues to operate fully within its
   partition.
3) Quiescence - The new master waits for the old masters
   to stop operating on their partitions (this is what HANDOVER
   could signal)
4) Master assertion - The new master assumes control of the nodes
5) Standby - The old master drops to standby (this is what HANDOVER
   ACK could signal)

The spec isn't really clear about how the two HANDOVER sminfos map to
the above process. My personal view on this was that HANDOVER was sent
old master -> new master when the old SM is quiet and HANDOVER ACK is
what signals the old SM to go to standby. 

I.e. the master sends HANDOVER ACK once all partitions it is assuming
control of have sent HANDOVER and after it has completely programmed
the nodes.
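
As a rough illustration of that reading, here is a toy model (in Python
for brevity) of the new master's side of the exchange. It encodes only
the ordering constraint described above: HANDOVER ACK may go out only
after every old master has sent HANDOVER and the nodes have been
programmed. The class, method names, and GUID strings are hypothetical;
this is not how the spec words it and not what any SM is known to do.

```python
from dataclasses import dataclass, field


@dataclass
class NewMaster:
    """Toy model of the new master during a subnet merge (hypothetical)."""
    old_masters: set                       # GUIDs of SMs that must go quiet
    handover_from: set = field(default_factory=set)
    nodes_programmed: bool = False

    def on_handover(self, guid: str) -> None:
        # Phase 3 (quiescence): an old master reports that it has stopped
        # operating on its own subset of nodes.
        if guid in self.old_masters:
            self.handover_from.add(guid)

    def program_nodes(self) -> bool:
        # Phase 4 (master assertion): only allowed once every old master
        # is quiet; otherwise two SMs could be writing to the same nodes.
        if self.handover_from == self.old_masters:
            self.nodes_programmed = True
        return self.nodes_programmed

    def may_send_handover_ack(self) -> bool:
        # Phase 5 (standby): HANDOVER ACK tells the old masters to drop
        # to standby, so it must wait until the nodes are programmed.
        return self.nodes_programmed
```

With two old masters, `program_nodes()` keeps returning False until both
have called in via `on_handover()`, and only then does
`may_send_handover_ack()` become True.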

A similar but ultimately simpler process happens when promoting a
standby sm to master..

IIRC there are other valid views on how this process goes, and I have
no idea what opensm does, or if it would be compatible with this view :)

> > Besides, even if hand off wasn't a problem the two SMs would have to
> > have very similar ideas on routing, multicast, QOS, services, etc
> 
> In worst case the routing tables and QoS setups could be reconfigured
> from scratch (just as if it could be first SM run), and all SA related
> things could be rerequested with ClientReregistration bit.

Well, I think you have to ask what the point of this is - what you are
describing is not high availability, you are just talking about a
dis-orderly restart of the entire fabric. I guess, why would you ever
run a master/standby SM configuration if not for HA? You can't get HA
by mixing vendors today..

Jason


Re: [ofa-general] opensm: Unsupported attribute = 0xFF02

2007-10-31 Thread Sasha Khapyorsky
On 20:41 Wed 31 Oct , Jason Gunthorpe wrote:
> 
> On Thu, Nov 01, 2007 at 02:24:10AM +0200, Sasha Khapyorsky wrote:
> 
> > What are the reasons? I think compliant SMs should be able to
> > inter-operate, of course not in part of proprietary extensions. At least
> > I am able to run OpenSM with Voltaire SM on one subnet.
> 
> At a minimum how hand off is supposed to work is very vaguely
> specified in the IBA.

It is at least basically described in the IBA - with exchanging SMInfo.

> Besides, even if hand off wasn't a problem the two SMs would have to
> have very similar ideas on routing, multicast, QOS, services, etc

In worst case the routing tables and QoS setups could be reconfigured
from scratch (just as if it could be first SM run), and all SA related
things could be rerequested with ClientReregistration bit.

And sure, some configurations (partitions, QoS, routing, etc.) can be
not synchronized for SMs, but then the differences in a fabric setups
should be expected results.

And I'm not about "how fast and efficient it is" and even not about
"interoperability" bugs in various implementations.

> or
> the fabric will be badly disrupted after hand off.. Without extensions
> to transfer this live data over before hand off it is unlikely to
> be non-disruptive except in very constrained situations.
> 
> It seems to me the main benefit of the whole standardized mechanism
> (in an interoperability context) is just to help make it so that a new
> sm starting up doesn't just trash the fabric accidentally, and provide
> at least some sensible behavior when two separate subnets are combined
> into one.
> 
> If you want to test hand over interop joining two operating networks
> is a good way to do it - that is really hard to get right in all of
> the cases :) This was the area where I felt the spec was weakest since
> it really didn't say exactly when during the hand over exchanges each
> SM was in control of the nodes, and exactly what should happen when
> things go wrong was not specified..

Ok, so we are not about "impossibility" to do this... Just current lack
of standardization makes it hard to do handover properly?

Sasha


Re: [ofa-general] opensm: Unsupported attribute = 0xFF02

2007-10-31 Thread Jason Gunthorpe

On Thu, Nov 01, 2007 at 02:24:10AM +0200, Sasha Khapyorsky wrote:

> What are the reasons? I think compliant SMs should be able to
> inter-operate, of course not in part of proprietary extensions. At least
> I am able to run OpenSM with Voltaire SM on one subnet.

At a minimum how hand off is supposed to work is very vaguely
specified in the IBA.

Besides, even if hand off wasn't a problem the two SMs would have to
have very similar ideas on routing, multicast, QOS, services, etc or
the fabric will be badly disrupted after hand off.. Without extensions
to transfer this live data over before hand off it is unlikely to
be non-disruptive except in very constrained situations.

It seems to me the main benefit of the whole standardized mechanism
(in an interoperability context) is just to help make it so that a new
sm starting up doesn't just trash the fabric accidentally, and provide
at least some sensible behavior when two separate subnets are combined
into one.

If you want to test hand over interop joining two operating networks
is a good way to do it - that is really hard to get right in all of
the cases :) This was the area where I felt the spec was weakest since
it really didn't say exactly when during the hand over exchanges each
SM was in control of the nodes, and exactly what should happen when
things go wrong was not specified..

Jason


Re: [ofa-general] opensm: Unsupported attribute = 0xFF02

2007-10-31 Thread Sasha Khapyorsky
Hi Hal,

On 14:01 Tue 30 Oct , Hal Rosenstock wrote:
> > status..0x0
> > hop_ptr.0x0
> > hop_count...0x0
> > trans_id0x377df6ce
> > attr_id.0xFF02 (UNKNOWN)
> 
> This is a proprietary SM attribute used by Cisco SM. Also, I believe the
> Cisco SM supports replication to standby's and that would be via
> proprietary means.
> 
> > resv0x0
> > attr_mod0x1
> > m_key...0x
> > MAD IS LID ROUTED
> > 
> > I'm not sure what this ERR 3107 means, is there something I could do about
> > it? Is there a way to use OpenSM as a standby SM with a managed switch?
> 
> No; SM flavors should not be mixed on a subnet. There are numerous
> reasons for this.

What are the reasons? I think compliant SMs should be able to
inter-operate, of course not in part of proprietary extensions. At least
I am able to run OpenSM with Voltaire SM on one subnet.

Sasha


Re: [ofa-general] opensm: Unsupported attribute = 0xFF02

2007-10-30 Thread Kilian CAVALOTTI
Hi Hal,

On Tuesday 30 October 2007 02:01:57 pm Hal Rosenstock wrote:
> This is a proprietary SM attribute used by Cisco SM. Also, I believe
> the Cisco SM supports replication to standby's and that would be via
> proprietary means.

Thanks for the info.

> No; SM flavors should not be mixed on a subnet. There are numerous
> reasons for this.

All right, that's good to know.

Thanks a lot!
-- 
Kilian


Re: [ofa-general] opensm: Unsupported attribute = 0xFF02

2007-10-30 Thread Hal Rosenstock
Kilian,

On Tue, 2007-10-30 at 13:56 -0700, Kilian CAVALOTTI wrote:
> Hi,
> 
> I'm trying to use opensm as a standby SM on a fabric where the master SM 
> is running on a Cisco SFS-7000D (TopspinOS 2.9.0 releng #147)
> 
> The switch (Master SM) logs report the following:
> 
> Oct 30 12:08:46 10.0.100.2 ib_sm.x[588]: %IB-6-INFO: Configuration caused by 
> SM role change
> Oct 30 12:08:55 10.0.100.2 ib_sm.x[605]: %IB-6-INFO: Initialize a backup 
> session with Standby SM guid 00:05:ad:00:00:08:cf:0d
> Oct 30 12:09:05 10.0.100.2 ib_sm.x[605]: %IB-6-INFO: Session initialization 
> failed with Standby SM guid 00:05:ad:00:00:08:cf:0d
> Oct 30 12:11:05 10.0.100.2 ib_sm.x[605]: %IB-6-INFO: Session not initiated: 
> Cold Sync Limit exceeded for Standby SM guid 00:05:ad:00:00:08:cf:0d
> 
> And the opensm (Standby SM) logs show this:
> 
> Oct 30 12:08:45 013321 [95AB1160] -> OpenSM Rev:openib-3.0.13
> Oct 30 12:08:45 013366 [95AB1160] -> OpenSM Rev:openib-3.0.13
> Oct 30 12:08:45 031429 [95AB1160] -> osm_vendor_bind: Binding to port 
> 0x5ad08cf0d
> Oct 30 12:08:45 033276 [95AB1160] -> osm_vendor_bind: Binding to port 
> 0x5ad08cf0d
> Oct 30 12:08:46 064757 [45007960] -> Entering STANDBY state
> Oct 30 12:08:55 461419 [4780B960] -> __osm_sm_mad_ctrl_process_set: ERR 3107: 
> Unsupported attribute = 0xFF02
> Oct 30 12:08:55 461482 [4780B960] -> SMP dump:
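> base_ver................0x1
> mgmt_class..............0x1
> class_ver...............0x1
> method..................0x2 (SubnSet)
> status..................0x0
> hop_ptr.................0x0
> hop_count...............0x0
> trans_id................0x377df6ce
> attr_id.................0xFF02 (UNKNOWN)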

This is a proprietary SM attribute used by Cisco SM. Also, I believe the
Cisco SM supports replication to standby's and that would be via
proprietary means.
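
For anyone wanting to pick such SMPs out of a log automatically, here is
a minimal sketch (in Python for brevity) that parses dump lines like the
ones quoted here and checks whether the attribute ID falls in the
vendor-specific range. The range 0xFF00-0xFFFF is stated from memory and
should be treated as an assumption to verify against the IBA spec; the
function names are made up for illustration and are not part of OpenSM.

```python
import re

# Field name, optional dot leaders, then a hex value. The value may be a
# bare "0x" where the digits did not survive in the quoted log.
FIELD_RE = re.compile(r"^(?P<name>[a-z_]+?)\.*(?P<value>0x[0-9A-Fa-f]*)")

# Assumption: attribute IDs 0xFF00-0xFFFF are the vendor-specific range.
VENDOR_ATTR_MIN = 0xFF00
VENDOR_ATTR_MAX = 0xFFFF


def parse_smp_dump(lines):
    """Parse 'name....0xVALUE' lines into a dict of name -> int or None."""
    fields = {}
    for line in lines:
        line = line.lstrip("> ").strip()   # drop mail quote markers
        m = FIELD_RE.match(line)
        if not m:
            continue                       # e.g. "MAD IS LID ROUTED"
        digits = m.group("value")[2:]
        fields[m.group("name")] = int(digits, 16) if digits else None
    return fields


def is_vendor_attribute(attr_id):
    """True if attr_id lies in the assumed vendor-specific range."""
    return VENDOR_ATTR_MIN <= attr_id <= VENDOR_ATTR_MAX


dump = [
    "> method..0x2 (SubnSet)",
    "> trans_id0x377df6ce",
    "> attr_id.0xFF02 (UNKNOWN)",
    "> m_key...0x",
    "> MAD IS LID ROUTED",
]
smp = parse_smp_dump(dump)
print(hex(smp["attr_id"]), is_vendor_attribute(smp["attr_id"]))
```

The regular expression tolerates any number of dot leaders, including
none, since archived copies of these dumps do not always preserve them.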

> resv....................0x0
> attr_mod................0x1
> m_key...................0x
> MAD IS LID ROUTED
> 
> I'm not sure what this ERR 3107 means, is there something I could do about
> it? Is there a way to use OpenSM as a standby SM with a managed switch?

No; SM flavors should not be mixed on a subnet. There are numerous
reasons for this.

-- Hal

> For information, I'm using OFED 1.2 (the Cisco fcs version) and details
> about the HCAs are below:
> 
> # ibv_devinfo
> hca_id: mthca0
>         fw_ver:                 1.2.917
>         node_guid:              0005:ad00:0008:cf0c
>         sys_image_guid:         0005:ad00:0100:d050
>         vendor_id:              0x05ad
>         vendor_part_id:         25204
>         hw_ver:                 0xA0
>         board_id:               HCA.Cheetah-DDR.20
>         phys_port_cnt:          1
>                 port:   1
>                         state:          PORT_ACTIVE (4)
>                         max_mtu:        2048 (4)
>                         active_mtu:     2048 (4)
>                         sm_lid:         2
>                         port_lid:       4
>                         port_lmc:       0x00
> 
> Thanks,