I found this technote, which confirms your statement, Christian: http://www.symantec.com/business/support/index?page=content&id=TECH51507
"- Storage Foundation on Solaris sparc and X64 is supported with MPxIO on Sun Storage hardware only. Storage Foundation does not support MPxIO on non-sun storage arrays. For Non-Sun storage hardware, DMP is required. If MPxIO is enabled on a host, the tunable dmp_fast_recovery must be set to off: vxdmpadm settune dmp_fast_recovery=off." Le 07/10/2010 11:12, Sebastien DAUBIGNE a écrit : > Hi, > > Thank you all for your feedback. > > I am very surprised that MPxIO+DMP is only supported on Sun storages : > as stated in my very first message, the MPxIO solution was imposed by > our SAN team, following HP recommendations. > > When we joined this SAN, I asked to go with DMP for multipathing layer > because we usually adopt this solution for all our > Solaris+VxVM+dedicated storage configuration, regardless of the > storage hardware : for instance with EMC hardware we use DMP and not > Powerpath and it works like a charm. > Unfortunately the SAN team and HP told us that for Solaris servers > incluing thoses with VxVM, we must use MPxIO otherwise they would not > support it, hence we used MPxIO. > > Now for the issue, the question is still : will 5.0 bypass the MPxIO > layer for error detection or is this functionality only implemented > starting at MP2 ? > The idea is to be sure that this is a fast recovery issue and not > anything else. > > Cheers, > > Le 06/10/2010 23:02, Christian Gerbrandt a écrit : >> We support several 3rd party multipathing solutions, like MPxIO or >> EMCs PowerPath. >> However, MPxIO is only supported on Sun branded Storages. >> DMP has also been known to outperform other solutions in certain >> configurations. >> >> When a 3rd party multipathing is in use, DMP will fail back into TPD >> mode (Third Party Driver), and let the underlaying multipathing do >> its job. >> That's when you see just a single disk in VxVM, when you know you >> have more than one path per disk. 
>>
>> I would recommend installing the 5.0 MP3 RP4 patch, and then checking
>> again whether MPxIO is still misbehaving.
>> Or, ideally, switch over to DMP.
>>
>> -----Original Message-----
>> From: veritas-vx-boun...@mailman.eng.auburn.edu
>> [mailto:veritas-vx-boun...@mailman.eng.auburn.edu] On Behalf Of Victor Engle
>> Sent: 06 October 2010 20:48
>> To: Ashish Yajnik
>> Cc: sebastien.daubi...@atosorigin.com; "undisclosed-recipients:,
>> "@mailman.eng.auburn.edu; Veritas-vx@mailman.eng.auburn.edu
>> Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue
>>
>> This is absolutely false!
>>
>> MPxIO is an excellent multipathing solution and is supported by all
>> major storage vendors, including HP. The issue discussed in this
>> thread has to do with improper behavior of DMP when multipathing is
>> managed by a native layer like MPxIO.
>>
>> Storage and OS vendors have no motivation to lock you into a Veritas
>> solution.
>>
>> Or, Ashish, are you saying that Symantec is locking Symantec
>> customers into DMP? Hitachi, EMC, NetApp, and HP all have supported
>> configurations which include VxVM and native OS multipathing stacks.
>>
>> Thanks,
>> Vic
>>
>>
>> On Wed, Oct 6, 2010 at 1:26 PM, Ashish Yajnik
>> <ashish_yaj...@symantec.com> wrote:
>>> MPxIO with VxVM is only supported with Sun storage. If you run into
>>> problems with MPxIO and SF on an XP24K, then support will not be able
>>> to help you. I would recommend using DMP with the XP24K.
>>>
>>> Ashish
>>> --------------------------
>>> Sent using BlackBerry
>>>
>>>
>>> ----- Original Message -----
>>> From: veritas-vx-boun...@mailman.eng.auburn.edu
>>> <veritas-vx-boun...@mailman.eng.auburn.edu>
>>> To: Sebastien DAUBIGNE <sebastien.daubi...@atosorigin.com>;
>>> undisclosed-recipients
>>> <"undisclosed-recipients:;"@mailman.eng.auburn.edu>
>>> Cc: Veritas-vx@mailman.eng.auburn.edu
>>> <Veritas-vx@mailman.eng.auburn.edu>
>>> Sent: Wed Oct 06 10:08:08 2010
>>> Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue
>>>
>>> Hi Sebastien,
>>>
>>> In your first mail you mentioned that you are using MPxIO to control
>>> the XP24K array. Why are you using MPxIO here?
>>>
>>> Thanks,
>>> Venkata Sreenivasarao Nagineni,
>>> Symantec
>>>
>>>> -----Original Message-----
>>>> From: veritas-vx-boun...@mailman.eng.auburn.edu
>>>> [mailto:veritas-vx-boun...@mailman.eng.auburn.edu] On Behalf Of Sebastien DAUBIGNE
>>>> Sent: Wednesday, October 06, 2010 9:32 AM
>>>> To: undisclosed-recipients
>>>> Cc: Veritas-vx@mailman.eng.auburn.edu
>>>> Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue
>>>>
>>>> Hi,
>>>>
>>>> I come back to my dmp_fast_recovery issue (VxDMP fails the path
>>>> before MPxIO gets a chance to fail over to the alternate path).
>>>> As stated previously, I am running 5.0GA, and this tunable is not
>>>> supported in that release. However, I still don't know whether VxVM
>>>> 5.0GA silently bypasses the MPxIO stack for error recovery.
>>>>
>>>> Now I am trying to determine whether upgrading to MP3 will resolve
>>>> this issue (which occurred rarely).
>>>>
>>>> Could anyone (maybe Joshua?) explain whether the behaviour of 5.0GA
>>>> without the tunable is functionally identical to dmp_fast_recovery=0
>>>> or dmp_fast_recovery=1? Maybe the mechanism was implemented in 5.0
>>>> without the option to disable it (this could explain my issue)?
>>>>
>>>> Joshua, you mentioned another tunable for 5.0, but looking at the
>>>> list I can't identify the corresponding tunable:
>>>>
>>>> > vxdmpadm gettune all
>>>>             Tunable             Current Value   Default Value
>>>> ------------------------------  -------------   -------------
>>>> dmp_failed_io_threshold             57600           57600
>>>> dmp_retry_count                         5               5
>>>> dmp_pathswitch_blks_shift              11              11
>>>> dmp_queue_depth                        32              32
>>>> dmp_cache_open                         on              on
>>>> dmp_daemon_count                       10              10
>>>> dmp_scsi_timeout                       30              30
>>>> dmp_delayq_interval                    15              15
>>>> dmp_path_age                            0             300
>>>> dmp_stat_interval                       1               1
>>>> dmp_health_time                         0              60
>>>> dmp_probe_idle_lun                     on              on
>>>> dmp_log_level                           4               1
>>>>
>>>> Cheers.
>>>>
>>>>
>>>> On 16/09/2010 16:50, Joshua Fielden wrote:
>>>>> dmp_fast_recovery is a mechanism by which we bypass the sd/scsi
>>>>> stack and send path inquiry/status CDBs directly from the HBA, in
>>>>> order to bypass long SCSI queues and recover paths faster. With a
>>>>> TPD (third-party driver) such as MPxIO, bypassing the stack means
>>>>> we bypass the TPD completely, and interactions such as this one can
>>>>> happen. The vxesd (event-source daemon) is another 5.0/MP2 backport
>>>>> addition that's moot in the presence of a TPD.
>>>>> From your modinfo, you're not actually running MP3. This technote
>>>>> (http://seer.entsupport.symantec.com/docs/327057.htm) isn't exactly
>>>>> your scenario, but looking for partially-installed pkgs is a good
>>>>> start to getting your server correctly installed; then the tunable
>>>>> should work -- very early 5.0 versions had a differently-named
>>>>> tunable I can't find in my mail archive ATM.
>>>>> Cheers,
>>>>>
>>>>> Jf
>>>>>
>>>>> -----Original Message-----
>>>>> From: veritas-vx-boun...@mailman.eng.auburn.edu
>>>>> [mailto:veritas-vx-boun...@mailman.eng.auburn.edu] On Behalf Of Sebastien DAUBIGNE
>>>>> Sent: Thursday, September 16, 2010 7:41 AM
>>>>> To: Veritas-vx@mailman.eng.auburn.edu
>>>>> Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue
>>>>>
>>>>> Thank you Victor and William, that seems to be a very good lead.
>>>>>
>>>>> Unfortunately, this tunable does not seem to be supported by the
>>>>> VxVM version installed on my system:
>>>>>
>>>>> > vxdmpadm gettune dmp_fast_recovery
>>>>> VxVM vxdmpadm ERROR V-5-1-12015 Incorrect tunable
>>>>> vxdmpadm gettune [tunable name]
>>>>> Note - Tunable name can be dmp_failed_io_threshold, dmp_retry_count,
>>>>> dmp_pathswitch_blks_shift, dmp_queue_depth, dmp_cache_open,
>>>>> dmp_daemon_count, dmp_scsi_timeout, dmp_delayq_interval,
>>>>> dmp_path_age, or dmp_stat_interval
>>>>>
>>>>> Something is odd, because my version is 5.0 MP3 Solaris SPARC, and
>>>>> according to http://seer.entsupport.symantec.com/docs/316981.htm
>>>>> this tunable should be available.
>>>>>
>>>>> > modinfo | grep -i vx
>>>>>  38 7846a000  3800e 288   1  vxdmp (VxVM 5.0-2006-05-11a: DMP Drive)
>>>>>  40 784a4000 334c40 289   1  vxio (VxVM 5.0-2006-05-11a I/O driver)
>>>>>  42 783ec71d    df8 290   1  vxspec (VxVM 5.0-2006-05-11a control/st)
>>>>> 296 78cfb0a2    c6b 291   1  vxportal (VxFS 5.0_REV-5.0A55_sol portal )
>>>>> 297 78d6c000 1b9d4f   8   1  vxfs (VxFS 5.0_REV-5.0A55_sol SunOS 5)
>>>>> 298 78f18000   a270 292   1  fdd (VxQIO 5.0_REV-5.0A55_sol Quick )
>>>>>
>>>>>
>>>>> On 16/09/2010 12:15, Victor Engle wrote:
>>>>>> Which version of Veritas? Version 4/2MP2 and version 5.x introduced
>>>>>> a feature called DMP fast recovery. It was probably supposed to be
>>>>>> called DMP fast fail, but "recovery" sounds better.
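Joshua's observation, that the modinfo string betrays a 5.0GA install rather than MP3, can be checked mechanically. The sketch below parses the vxdmp line quoted above; treating a "5.0-2006-*" build date as GA is an assumption based on Joshua's remark, not an official version test:

```shell
#!/bin/sh
# Sketch: pull the VxVM build string out of a captured modinfo line.
# The sample line is copied from the thread.
line=' 38 7846a000  3800e 288   1  vxdmp (VxVM 5.0-2006-05-11a: DMP Drive)'

# Capture everything between "(VxVM " and the following space/colon.
build=$(printf '%s\n' "$line" | sed -n 's/.*(VxVM \([^ :]*\).*/\1/p')
echo "vxdmp build: $build"

case $build in
  5.0-2006-*) echo "looks like a 5.0 GA build, not MP3" ;;
  *)          echo "not a 5.0 GA build string" ;;
esac
```

This would explain the V-5-1-12015 error above: on a GA build the dmp_fast_recovery tunable simply is not there to query.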
>>>>>> It is supposed to fail suspect paths more aggressively to speed up
>>>>>> failover. But when you only have one VxVM DMP path, as is the case
>>>>>> with MPxIO, and fast recovery fails that path, then you're in
>>>>>> trouble. In version 5.x it is possible to disable this feature.
>>>>>>
>>>>>> Google "DMP fast recovery".
>>>>>>
>>>>>> http://seer.entsupport.symantec.com/docs/307959.htm
>>>>>>
>>>>>> I can imagine there must have been some internal fights at
>>>>>> Symantec between product management and QA to get that feature
>>>>>> released.
>>>>>>
>>>>>> Vic
>>>>>>
>>>>>>
>>>>>> On Thu, Sep 16, 2010 at 6:03 AM, Sebastien DAUBIGNE
>>>>>> <sebastien.daubi...@atosorigin.com> wrote:
>>>>>>> Dear Vx-addicts,
>>>>>>>
>>>>>>> We encountered a failover issue on this configuration:
>>>>>>>
>>>>>>> - Solaris 9 HW 9/05
>>>>>>> - SUN SAN (SFS) 4.4.15
>>>>>>> - Emulex HBA with the SUN generic driver (emlx)
>>>>>>> - VxVM 5.0-2006-05-11a
>>>>>>> - storage on an HP SAN (XP24K)
>>>>>>>
>>>>>>> Multipathing is managed by MPxIO (not VxDMP), because the SAN team
>>>>>>> and HP support imposed the Solaris native multipathing solution:
>>>>>>>
>>>>>>> VxVM ==> VxDMP ==> MPxIO ==> FCP ...
>>>>>>>
>>>>>>> We have 2 paths to the switch, linked to 2 paths to the storage,
>>>>>>> so the LUNs have 4 paths, with active/active support.
>>>>>>> Failover has been tested successfully by offlining each port
>>>>>>> successively on the SAN.
>>>>>>>
>>>>>>> We regularly have transient I/O errors (SCSI timeouts, I/O error
>>>>>>> retries with "Unit attention") due to SAN-side issues. Usually
>>>>>>> these errors are transparently handled by MPxIO/VxVM without
>>>>>>> impact on the applications.
>>>>>>>
>>>>>>> Now for the incident we encountered:
>>>>>>>
>>>>>>> One of the SAN ports was reset; consequently there were some
>>>>>>> transient I/O errors.
>>>>>>> The other SAN port was OK, so the MPxIO multipathing layer should
>>>>>>> have failed the I/O over to the other path, without transmitting
>>>>>>> the error to the VxDMP layer.
>>>>>>> For some reason it did not fail over the I/O before VxVM caught it
>>>>>>> as an unrecoverable I/O error, disabling the subdisk and
>>>>>>> consequently the filesystem.
>>>>>>>
>>>>>>> Note the "giving up" message from the SCSI layer at 06:23:03:
>>>>>>>
>>>>>>> Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 118/0x558 belonging to the dmpnode 288/0x60
>>>>>>> Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM vxdmp V-5-0-111 disabled dmpnode 288/0x60
>>>>>>> Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 118/0x538 belonging to the dmpnode 288/0x20
>>>>>>> Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 118/0x550 belonging to the dmpnode 288/0x18
>>>>>>> Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM vxdmp V-5-0-111 disabled dmpnode 288/0x20
>>>>>>> Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM vxdmp V-5-0-111 disabled dmpnode 288/0x18
>>>>>>> Sep  1 06:18:54 myserver scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/s...@g60060e80152777000001277700003794 (ssd165):
>>>>>>> Sep  1 06:18:54 myserver SCSI transport failed: reason 'tran_err': retrying command
>>>>>>> Sep  1 06:19:05 myserver scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/s...@g60060e80152777000001277700003794 (ssd165):
>>>>>>> Sep  1 06:19:05 myserver SCSI transport failed: reason 'timeout': retrying command
>>>>>>> Sep  1 06:21:57 myserver scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/s...@g60060e8015277700000127770000376d (ssd168):
>>>>>>> Sep  1 06:21:57 myserver SCSI transport failed: reason 'tran_err': retrying command
>>>>>>> Sep  1 06:22:45 myserver scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/s...@g60060e8015277700000127770000376d (ssd168):
>>>>>>> Sep  1 06:22:45 myserver SCSI transport failed: reason 'timeout': retrying command
>>>>>>> Sep  1 06:23:03 myserver scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/s...@g60060e80152777000001277700003787 (ssd166):
>>>>>>> Sep  1 06:23:03 myserver SCSI transport failed: reason 'timeout': giving up
>>>>>>> Sep  1 06:23:03 myserver vxio: [ID 539309 kern.warning] WARNING: VxVM vxio V-5-3-0 voldmp_errbuf_sio_start: Failed to flush the error buffer 300ce41c340 on device 0x1200000003a to DMP
>>>>>>> Sep  1 06:23:03 myserver vxio: [ID 771159 kern.warning] WARNING: VxVM vxio V-5-0-2 Subdisk mydisk_2-02 block 5935: Uncorrectable write error
>>>>>>> Sep  1 06:23:03 myserver vxfs: [ID 702911 kern.warning] WARNING: msgcnt 1 mesg 037: V-2-37: vx_metaioerr - vx_logbuf_clean - /dev/vx/dsk/mydg/vol1 file system meta data write error in dev/block 0/5935
>>>>>>> Sep  1 06:23:03 myserver vxfs: [ID 702911 kern.warning] WARNING: msgcnt 2 mesg 031: V-2-31: vx_disable - /dev/vx/dsk/mydg/vol1 file system disabled
>>>>>>> Sep  1 06:23:03 myserver vxfs: [ID 702911 kern.warning] WARNING: msgcnt 3 mesg 037: V-2-37: vx_metaioerr - vx_inode_iodone - /dev/vx/dsk/mydg/vol1 file system meta data write error in dev/block 0/265984
>>>>>>>
>>>>>>> It seems VxDMP got the I/O error at the same time as MPxIO: I
>>>>>>> thought MPxIO would have concealed the I/O error until failover had
>>>>>>> occurred, which is not the case.
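The ordering in that log is the telling part: VxDMP disabled its paths at 06:18:54, more than four minutes before the sd driver's final "giving up" at 06:23:03, which supports the reading that DMP acted on the transient errors instead of waiting behind MPxIO. A small awk sketch can condense such a log into a timeline; the two sample lines below are abridged copies of the messages above, not live output:

```shell
#!/bin/sh
# Sketch: condense /var/adm/messages-style output into a failure timeline,
# keeping the VxDMP path-disable events (V-5-0-112) and the final sd giveup.
log='Sep  1 06:18:54 myserver vxdmp: NOTICE: VxVM vxdmp V-5-0-112 disabled path 118/0x558 belonging to the dmpnode 288/0x60
Sep  1 06:23:03 myserver scsi: WARNING: SCSI transport failed: reason timeout: giving up'

# Field 3 of each syslog line is the timestamp.
printf '%s\n' "$log" | awk '
    /V-5-0-112/ { print $3, "vxdmp disabled a path" }
    /giving up/ { print $3, "sd gave up retrying" }'
```

Run against the full log, the same filter makes the DMP-before-MPxIO sequencing visible at a glance.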
>>>>>>>
>>>>>>> As a workaround, I increased the VxDMP recoveryoption=fixedretry
>>>>>>> retrycount tunable from 5 to 20, to give MPxIO a chance to fail
>>>>>>> over before VxDMP fails the path, but I still don't understand
>>>>>>> why VxVM catches the SCSI errors.
>>>>>>>
>>>>>>> Any advice?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> --
>>>>>>> Sebastien DAUBIGNE
>>>>>>> sebastien.daubi...@atosorigin.com - +33(0)5.57.89.31.09
>>>>>>> AtosOrigin Infogerance - AIS/D1/SudOuest/Bordeaux/IS-Unix
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Veritas-vx maillist - Veritas-vx@mailman.eng.auburn.edu
>>>>>>> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-vx
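The retrycount workaround trades failure-detection speed for failover headroom. As a rough back-of-the-envelope, assuming each fixed retry can wait up to dmp_scsi_timeout seconds (30 by default in the gettune output earlier in the thread; DMP's real error-recovery path is more involved, so treat these as loose upper bounds, not exact failover windows):

```shell
#!/bin/sh
# Sketch: worst-case window before DMP fails the path, under the
# simplifying assumption of retrycount * dmp_scsi_timeout.
dmp_scsi_timeout=30
for retrycount in 5 20; do
    window=$((retrycount * dmp_scsi_timeout))
    echo "retrycount=$retrycount -> up to ${window}s before DMP fails the path"
done
```

Going from 5 to 20 retries thus roughly quadruples the window MPxIO has to complete its own failover underneath DMP.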