VxVM failover issue

Christian Gerbrandt Wed, 06 Oct 2010 14:02:17 -0700

We support several third-party multipathing solutions, such as MPxIO or EMC's 
PowerPath.
However, MPxIO is only supported on Sun-branded storage.
DMP has also been known to outperform other solutions in certain configurations.


When a third-party multipathing driver is in use, DMP falls back into TPD 
(Third Party Driver) mode and lets the underlying multipathing layer do its job.
That's when you see just a single disk in VxVM, even though you know you have 
more than one path per disk.
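
One way to check from the logs that DMP is only seeing one path per device is
to group the vxdmp "disabled path" notices by dmpnode. The following is a
minimal Python sketch, not a supported tool: the function name
paths_per_dmpnode is made up for illustration, and the sample lines are the
vxdmp notices quoted further down in this thread, re-joined onto single lines.

```python
import re
from collections import defaultdict

# Matches the V-5-0-112 notice text as it appears in the log quoted in this
# thread. Other VxVM releases may word the message differently (assumption).
PATH_RE = re.compile(r"disabled path (\S+) belonging to the dmpnode (\S+)")


def paths_per_dmpnode(lines):
    """Return {dmpnode: set of path ids} built from 'disabled path' notices."""
    nodes = defaultdict(set)
    for line in lines:
        match = PATH_RE.search(line)
        if match:
            path, node = match.groups()
            nodes[node].add(path)
    return dict(nodes)


PREFIX = "Sep  1 06:18:54 myserver vxdmp: "
SAMPLE = [
    PREFIX + "[ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 "
             "disabled path 118/0x558 belonging to the dmpnode 288/0x60",
    PREFIX + "[ID 824220 kern.notice] NOTICE: VxVM vxdmp V-5-0-111 "
             "disabled dmpnode 288/0x60",
    PREFIX + "[ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 "
             "disabled path 118/0x538 belonging to the dmpnode 288/0x20",
    PREFIX + "[ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 "
             "disabled path 118/0x550 belonging to the dmpnode 288/0x18",
]

if __name__ == "__main__":
    for node, paths in sorted(paths_per_dmpnode(SAMPLE).items()):
        print(node, len(paths), sorted(paths))
```

Note that this only counts paths that were logged as disabled, so it is a hint
rather than proof. In the log from this thread each dmpnode is disabled right
after a single path is disabled, which is consistent with DMP having only one
path per device underneath MPxIO.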

I would recommend installing the 5.0 MP3 RP4 patch and then checking again 
whether MPxIO is still misbehaving.
Or, ideally, switch over to DMP.
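
To compare what is actually loaded before and after patching, the VxVM version
strings can be pulled out of the modinfo output. This is a rough Python sketch
under stated assumptions: the helper name vxvm_module_versions is hypothetical,
the sample text is the modinfo output quoted further down in this thread
(re-joined onto single lines), and the sketch does not know which string
corresponds to MP3 RP4, since the thread does not say. It only reports what the
loaded modules claim to be.

```python
import re

# Matches the description column of `modinfo` lines for VxVM kernel modules,
# e.g. "vxdmp (VxVM 5.0-2006-05-11a: DMP Drive)".
VXVM_RE = re.compile(r"\b(vx\w+) \(VxVM (\S+)")


def vxvm_module_versions(modinfo_text):
    """Return {module name: version string} for loaded VxVM modules."""
    versions = {}
    for line in modinfo_text.splitlines():
        match = VXVM_RE.search(line)
        if match:
            # Some modules print a trailing colon after the version string.
            versions[match.group(1)] = match.group(2).rstrip(":")
    return versions


SAMPLE_MODINFO = """\
 38 7846a000  3800e 288   1  vxdmp (VxVM 5.0-2006-05-11a: DMP Drive)
 40 784a4000 334c40 289   1  vxio (VxVM 5.0-2006-05-11a I/O driver)
 42 783ec71d    df8 290   1  vxspec (VxVM 5.0-2006-05-11a control/st)
296 78cfb0a2    c6b 291   1  vxportal (VxFS 5.0_REV-5.0A55_sol portal )
297 78d6c000 1b9d4f   8   1  vxfs (VxFS 5.0_REV-5.0A55_sol SunOS 5)
"""

if __name__ == "__main__":
    for module, version in sorted(vxvm_module_versions(SAMPLE_MODINFO).items()):
        print(module, version)
```

If the strings are unchanged after the patch has been installed, the new
modules have probably not been loaded, which matches the earlier observation in
this thread that the modinfo output did not look like MP3.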

-----Original Message-----
From: veritas-vx-boun...@mailman.eng.auburn.edu 
[mailto:veritas-vx-boun...@mailman.eng.auburn.edu] On Behalf Of Victor Engle
Sent: 06 October 2010 20:48
To: Ashish Yajnik
Cc: sebastien.daubi...@atosorigin.com; "undisclosed-recipients:, 
"@mailman.eng.auburn.edu; Veritas-vx@mailman.eng.auburn.edu
Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

This is absolutely false!

MPxIO is an excellent multipathing solution and is supported by all major 
storage vendors, including HP. The issue discussed in this thread has to do 
with improper behavior of DMP when multipathing is managed by a native layer 
like MPxIO.

Storage and OS vendors have no motivation to lock you into a Veritas solution.

Or, Ashish, are you saying that Symantec is locking Symantec customers 
into DMP? Hitachi, EMC, NetApp and HP all have supported configurations which 
include VxVM and native OS multipathing stacks.

Thanks,
Vic


On Wed, Oct 6, 2010 at 1:26 PM, Ashish Yajnik <ashish_yaj...@symantec.com> 
wrote:
> MPxIO with VxVM is only supported with Sun storage. If you run into problems 
> with MPxIO and SF on XP24K then support will not be able to help you. I would 
> recommend using DMP with XP24K.
>
> Ashish
> --------------------------
> Sent using BlackBerry
>
>
> ----- Original Message -----
> From: veritas-vx-boun...@mailman.eng.auburn.edu 
> <veritas-vx-boun...@mailman.eng.auburn.edu>
> To: Sebastien DAUBIGNE <sebastien.daubi...@atosorigin.com>; 
> undisclosed-recipients 
> <"undisclosed-recipients:;"@mailman.eng.auburn.edu>
> Cc: Veritas-vx@mailman.eng.auburn.edu 
> <Veritas-vx@mailman.eng.auburn.edu>
> Sent: Wed Oct 06 10:08:08 2010
> Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue
>
> Hi Sebastien,
>
> In the first mail you mentioned that you are using mpxio to control the XP24K 
> array. Why are you using mpxio here?
>
> Thanks,
> Venkata Sreenivasarao Nagineni,
> Symantec
>
>> -----Original Message-----
>> From: veritas-vx-boun...@mailman.eng.auburn.edu [mailto:veritas-vx- 
>> boun...@mailman.eng.auburn.edu] On Behalf Of Sebastien DAUBIGNE
>> Sent: Wednesday, October 06, 2010 9:32 AM
>> To: undisclosed-recipients
>> Cc: Veritas-vx@mailman.eng.auburn.edu
>> Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue
>>
>>   Hi,
>>
>> I am coming back to my dmp_fast_recovery issue (VxDMP fails the path 
>> before MPxIO gets a chance to fail over to an alternate path).
>> As stated previously, I am running 5.0GA, and this tunable is not 
>> supported in this release. However, I still don't know whether VxVM 5.0GA 
>> silently bypasses the MPxIO stack for error recovery.
>>
>> Now I am trying to determine whether upgrading to MP3 will resolve this 
>> issue (which has occurred only rarely).
>>
>> Could anyone (maybe Joshua?) explain whether the behaviour of 5.0GA 
>> without the tunable is functionally identical to dmp_fast_recovery=0 or
>> dmp_fast_recovery=1? Maybe the mechanism has been implemented in 5.0 
>> without the option to disable it (this could explain my issue)?
>>
>> Joshua, you mentioned another tunable for 5.0, but looking at the 
>> list I can't identify the corresponding tunable:
>>
>>  > vxdmpadm gettune all
>>              Tunable               Current Value  Default Value
>> ------------------------------    -------------  ------------- 
>> dmp_failed_io_threshold               57600            57600 
>> dmp_retry_count                           5                5 
>> dmp_pathswitch_blks_shift                11               11 
>> dmp_queue_depth                          32               32 
>> dmp_cache_open                           on               on 
>> dmp_daemon_count                         10               10 
>> dmp_scsi_timeout                         30               30 
>> dmp_delayq_interval                      15               15 
>> dmp_path_age                              0              300 
>> dmp_stat_interval                         1                1 
>> dmp_health_time                           0               60 
>> dmp_probe_idle_lun                       on               on 
>> dmp_log_level                             4                1
>>
>> Cheers.
>>
>>
>>
>> Le 16/09/2010 16:50, Joshua Fielden a écrit :
>> > dmp_fast_recovery is a mechanism by which we bypass the sd/scsi stack
>> > and send path inquiry/status CDBs directly from the HBA in order to
>> > bypass long SCSI queues and recover paths faster. With a TPD
>> > (third-party driver) such as MPxIO, bypassing the stack means we bypass
>> > the TPD completely, and interactions such as this can happen. The vxesd
>> > (event-source daemon) is another 5.0/MP2 backport addition that's
>> > moot in the presence of a TPD.
>> >
>> > From your modinfo, you're not actually running MP3. This technote
>> > (http://seer.entsupport.symantec.com/docs/327057.htm) isn't exactly
>> > your scenario, but looking for partially-installed pkgs is a good
>> > start to getting your server correctly installed, then the tuneable
>> > should work -- very early 5.0 versions had a differently-named
>> > tuneable I can't find in my mail archive ATM.
>> >
>> > Cheers,
>> >
>> > Jf
>> >
>> > -----Original Message-----
>> > From: veritas-vx-boun...@mailman.eng.auburn.edu [mailto:veritas-vx-
>> boun...@mailman.eng.auburn.edu] On Behalf Of Sebastien DAUBIGNE
>> > Sent: Thursday, September 16, 2010 7:41 AM
>> > To: Veritas-vx@mailman.eng.auburn.edu
>> > Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue
>> >
>> >    Thank you Victor and William, it seems to be a very good lead.
>> >
>> > Unfortunately, this tunable seems not to be supported in the VxVM 
>> > version installed on my system :
>> >
>> >   >  vxdmpadm gettune dmp_fast_recovery
>> > VxVM vxdmpadm ERROR V-5-1-12015  Incorrect tunable
>> > vxdmpadm gettune [tunable name]
>> > Note - Tunable name can be dmp_failed_io_threshold, dmp_retry_count,
>> > dmp_pathswitch_blks_shift, dmp_queue_depth, dmp_cache_open,
>> > dmp_daemon_count, dmp_scsi_timeout, dmp_delayq_interval, dmp_path_age,
>> > or dmp_stat_interval
>> >
>> > This is odd, because my version is 5.0 MP3 Solaris SPARC, and according
>> > to http://seer.entsupport.symantec.com/docs/316981.htm this tunable 
>> > should be available.
>> >
>> >   >  modinfo | grep -i vx
>> >    38 7846a000  3800e 288   1  vxdmp (VxVM 5.0-2006-05-11a: DMP Drive)
>> >    40 784a4000 334c40 289   1  vxio (VxVM 5.0-2006-05-11a I/O driver)
>> >    42 783ec71d    df8 290   1  vxspec (VxVM 5.0-2006-05-11a control/st)
>> >   296 78cfb0a2    c6b 291   1  vxportal (VxFS 5.0_REV-5.0A55_sol portal )
>> >   297 78d6c000 1b9d4f   8   1  vxfs (VxFS 5.0_REV-5.0A55_sol SunOS 5)
>> >   298 78f18000   a270 292   1  fdd (VxQIO 5.0_REV-5.0A55_sol Quick )
>> >
>> >
>> >
>> >
>> >
>> > Le 16/09/2010 12:15, Victor Engle a écrit :
>> >> Which version of veritas? Version 4/2MP2 and version 5.x introduced a
>> >> feature called DMP fast recovery. It was probably supposed to be 
>> >> called DMP fast fail but "recovery" sounds better. It is supposed 
>> >> to fail suspect paths more aggressively to speed up failover. But 
>> >> when you only have one vxvm DMP path, as is the case with MPxIO, 
>> >> and fast-recovery fails that path, then you're in trouble. In 
>> >> version 5.x, it is possible to disable this feature.
>> >>
>> >> Google DMP fast recovery.
>> >>
>> >> http://seer.entsupport.symantec.com/docs/307959.htm
>> >>
>> >> I can imagine there must have been some internal fights at 
>> >> symantec between product management and QA to get that feature released.
>> >>
>> >> Vic
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Thu, Sep 16, 2010 at 6:03 AM, Sebastien DAUBIGNE 
>> >> <sebastien.daubi...@atosorigin.com>   wrote:
>> >>>    Dear Vx-addicts,
>> >>>
>> >>> We encountered a failover issue on this configuration :
>> >>>
>> >>> - Solaris 9 HW 9/05
>> >>> - SUN SAN (SFS) 4.4.15
>> >>> - Emulex with SUN generic driver (emlx)
>> >>> - VxVM 5.0-2006-05-11a
>> >>>
>> >>> - storage on HP SAN (XP 24K).
>> >>>
>> >>>
>> >>> Multipathing is managed by MPxIO (not VxDMP) because the SAN team and HP
>> >>> support imposed the Solaris native solution for multipathing:
>> >>>
>> >>> VxVM ==>   VxDMP ==>   MPxIO ==>   FCP ...
>> >>>
>> >>> We have 2 paths to the switch, linked to 2 paths to the storage, so the
>> >>> LUNs have 4 paths, with active/active support.
>> >>> Failover operation has been tested successfully by offlining each port
>> >>> successively on the SAN.
>> >>>
>> >>> We regularly have transient I/O errors (SCSI timeouts, I/O error retries
>> >>> with "Unit attention") due to SAN-side issues. Usually these errors are
>> >>> transparently managed by MPxIO/VxVM without impact on the applications.
>> >>>
>> >>> Now for the incident we encountered :
>> >>>
>> >>> One of the SAN ports was reset; consequently there were some transient
>> >>> I/O errors.
>> >>> The other SAN port was OK, so the MPxIO multipathing layer should have
>> >>> failed the I/O over to the other path without transmitting the error to
>> >>> the VxDMP layer.
>> >>> For some reason, it did not fail over the I/O before VxVM caught it as
>> >>> an unrecoverable I/O error, disabling the subdisk and consequently 
>> >>> the filesystem.
>> >>>
>> >>> Note the "giving up" message from scsi layer at 06:23:03 :
>> >>>
>> >>> Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 118/0x558 belonging to the dmpnode 288/0x60
>> >>> Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM vxdmp V-5-0-111 disabled dmpnode 288/0x60
>> >>> Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 118/0x538 belonging to the dmpnode 288/0x20
>> >>> Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 118/0x550 belonging to the dmpnode 288/0x18
>> >>> Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM vxdmp V-5-0-111 disabled dmpnode 288/0x20
>> >>> Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM vxdmp V-5-0-111 disabled dmpnode 288/0x18
>> >>> Sep  1 06:18:54 myserver scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/s...@g60060e80152777000001277700003794 (ssd165):
>> >>> Sep  1 06:18:54 myserver        SCSI transport failed: reason 'tran_err': retrying command
>> >>> Sep  1 06:19:05 myserver scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/s...@g60060e80152777000001277700003794 (ssd165):
>> >>> Sep  1 06:19:05 myserver        SCSI transport failed: reason 'timeout': retrying command
>> >>> Sep  1 06:21:57 myserver scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/s...@g60060e8015277700000127770000376d (ssd168):
>> >>> Sep  1 06:21:57 myserver        SCSI transport failed: reason 'tran_err': retrying command
>> >>> Sep  1 06:22:45 myserver scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/s...@g60060e8015277700000127770000376d (ssd168):
>> >>> Sep  1 06:22:45 myserver        SCSI transport failed: reason 'timeout': retrying command
>> >>> Sep  1 06:23:03 myserver scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/s...@g60060e80152777000001277700003787 (ssd166):
>> >>> Sep  1 06:23:03 myserver        SCSI transport failed: reason 'timeout': giving up
>> >>> Sep  1 06:23:03 myserver vxio: [ID 539309 kern.warning] WARNING: VxVM vxio V-5-3-0 voldmp_errbuf_sio_start: Failed to flush the error buffer 300ce41c340 on device 0x1200000003a to DMP
>> >>> Sep  1 06:23:03 myserver vxio: [ID 771159 kern.warning] WARNING: VxVM vxio V-5-0-2 Subdisk mydisk_2-02 block 5935: Uncorrectable write error
>> >>> Sep  1 06:23:03 myserver vxfs: [ID 702911 kern.warning] WARNING: msgcnt 1 mesg 037: V-2-37: vx_metaioerr - vx_logbuf_clean - /dev/vx/dsk/mydg/vol1 file system meta data write error in dev/block 0/5935
>> >>> Sep  1 06:23:03 myserver vxfs: [ID 702911 kern.warning] WARNING: msgcnt 2 mesg 031: V-2-31: vx_disable - /dev/vx/dsk/mydg/vol1 file system disabled
>> >>> Sep  1 06:23:03 myserver vxfs: [ID 702911 kern.warning] WARNING: msgcnt 3 mesg 037: V-2-37: vx_metaioerr - vx_inode_iodone - /dev/vx/dsk/mydg/vol1 file system meta data write error in dev/block 0/265984
>> >>>
>> >>>
>> >>> It seems VxDMP gets the I/O error at the same time as MPxIO: I thought
>> >>> MPxIO would have concealed the I/O error until failover had occurred,
>> >>> which is not the case.
>> >>>
>> >>> As a workaround, I increased the VxDMP 
>> >>> recoveryoption/fixedretry/retrycount tunable from 5 to 20 to give MPxIO a
>> >>> chance to fail over before VxDMP fails, but I still don't understand why
>> >>> VxVM catches the SCSI errors.
>> >>>
>> >>> Any advice ?
>> >>>
>> >>> thanks.
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Sebastien DAUBIGNE
>> >>> sebastien.daubi...@atosorigin.com  - +33(0)5.57.89.31.09 
>> >>> AtosOrigin Infogerance - AIS/D1/SudOuest/Bordeaux/IS-Unix
>> >>>
>> >>> _______________________________________________
>> >>> Veritas-vx maillist  -  Veritas-vx@mailman.eng.auburn.edu 
>> >>> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-vx
>> >>>
>> >
>>
>>
>> --
>> Sebastien DAUBIGNE
>> sebastien.daubi...@atosorigin.com - +33(0)5.57.89.31.09 AtosOrigin 
>> Infogerance - AIS/D1/SudOuest/Bordeaux/IS-Unix
>>
>> _______________________________________________
>> Veritas-vx maillist  -  Veritas-vx@mailman.eng.auburn.edu 
>> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-vx
> _______________________________________________
> Veritas-vx maillist  -  Veritas-vx@mailman.eng.auburn.edu 
> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-vx
_______________________________________________
Veritas-vx maillist  -  Veritas-vx@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-vx
