Given past 6140/6540 experience (and the code is shared with the 25xx
series), I would suggest you look into MPxIO's acquisition of the
primary/secondary paths, specifically whether the preferred path has
changed since module loading, so that the system ping-pongs
between paths.

MPxIO has traditionally only classified paths (i.e. as to their status
as either a primary or secondary path) once, upon first discovery after
boot, hence whichever path is first discovered as the primary is retained
by the kernel module. Most A/P arrays propagate the identity bits, so
MPxIO identifies each path and builds its maps accordingly, and
everything is fine: primary paths are accessed until all are
exhausted (i.e. multiple primaries are round-robined), upon which event
we switch to secondary paths (again utilising all secondaries in a
round-robin fashion).
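
To make the selection order concrete, here is a minimal sketch of the
behaviour described above. It is a toy model, not MPxIO code: the Path
class, the path names and both functions are hypothetical, and the only
thing it assumes is that each path carries a class fixed at first
discovery.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Path:
    name: str              # hypothetical path label, e.g. "c2"
    path_class: str        # "primary" or "secondary", fixed at first discovery
    online: bool = True


def usable_paths(paths):
    """Return the online primaries if any exist, otherwise the online secondaries."""
    for cls in ("primary", "secondary"):
        candidates = [p for p in paths if p.path_class == cls and p.online]
        if candidates:
            return candidates
    return []


def select_paths(paths, n_ios):
    """Round-robin n_ios requests over the usable set; returns the path names used."""
    candidates = usable_paths(paths)
    if not candidates:
        raise RuntimeError("no online paths")
    rr = cycle(candidates)
    return [next(rr).name for _ in range(n_ios)]
```

With two primaries and two secondaries, I/O alternates between the two
primaries only; the secondaries are used (also round-robin) once every
primary has been marked offline.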

The specific difference in the LSI arrays is that we have the notion
of a "preferred path", which is more correctly a mechanism for changing
the identity of the primary path. If the preferred path is changed
after a host has booted and acquired its path classifications, then
Solaris's view of the paths is at odds with the array's, hence
the flapping effect as Solaris attempts to regain its "primary" path
whilst the array does otherwise.
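
The flapping can be pictured with a deliberately crude sketch. This is
an assumption-laden illustration, not how either MPxIO or the array
firmware is implemented: the simulate function and the controller
labels "A" and "B" are made up, and it simply assumes that each side
moves LUN ownership back to what it believes is correct whenever it
sees a mismatch.

```python
def simulate(host_primary, array_preferred, rounds=3):
    """Return the sequence of LUN owners as host and array each 'correct' it.

    host_primary:    controller the host classified as primary at boot (cached)
    array_preferred: controller the array currently treats as preferred owner
    """
    owner = array_preferred
    history = [owner]
    for _ in range(rounds):
        if owner != host_primary:
            # Host tries to regain the path it still believes is primary.
            owner = host_primary
            history.append(owner)
        if owner != array_preferred:
            # Array moves the LUN back to its preferred owner.
            owner = array_preferred
            history.append(owner)
    return history
```

When the two views agree, ownership never moves; when the preferred
owner was changed after boot (host still thinks "A", array now says
"B"), ownership bounces on every round until the host's classification
is refreshed.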

You need to re-initialise the path classification, which I assume
would take a module unload/reload. That is not particularly easy with
most of the FC drivers, so normally a host reboot after the correct set
of "preferred" mappings has been established in the array is sufficient;
then verify the Solaris view of the paths.

Note that this would normally be a one-time event on first
config/deployment of the LSI box/LUNs. We are not talking about
failure of paths here, but specifically path classification, which is a
one-time event.

HTH

Craig

2008/9/2 Joel Miller <[EMAIL PROTECTED]>:
> Hi Bob,
>
> Sorry I have not had a lot of spare time lately...
>
> So the LSI-sourced arrays have bits in the inquiry data that tells the host 
> whether or not the path that the inquiry is being handled on is a preferred 
> or secondary path...
>
> When a path is brought online (or back online after going away), MPxIO is 
> supposed to use that information to know whether or not to attempt to 
> failback the LUN or not..
>
> If you can collect the support data via CAM, I can take a look at your 
> configuration...but basically it will likely come down to:
> 1) If the preferred owner of a LUN is not what you expected...which I assume 
> you checked already...
>
> 2) A timing issue that is either causing the controller to not set the 
> correct bit during re-discovery
>
> or
>
> 3) A timing or load related issue that is causing the host to "drop the ball" 
> during re-discovery
>
>
> BTW, The Wide-port message you see when you reset a controller is likely the 
> surviving controller noticing that it lost one of its back-end SAS channels 
> until the other controller comes back...
>
> -Joel
> --
> This message posted from opensolaris.org
> _______________________________________________
> storage-discuss mailing list
> [email protected]
> http://mail.opensolaris.org/mailman/listinfo/storage-discuss
>



-- 
Craig Morgan
Cinnabar Solutions Ltd

t: +44 (0)791 338 3190
f: +44 (0)870 705 1726

e: [EMAIL PROTECTED]
w: www.cinnabar-solutions.com