Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

2010-09-16 Thread Victor Engle
Which version of Veritas? Version 4/2MP2 and version 5.x introduced a
feature called DMP fast recovery. It was probably supposed to be
called DMP fast fail, but recovery sounds better. It is supposed to
fail suspect paths more aggressively to speed up failover. But when
you have only one VxVM DMP path, as is the case with MPxIO, and
fast recovery fails that path, then you're in trouble. In version 5.x,
it is possible to disable this feature.
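To make the single-path problem concrete, here is a toy Python model. It is only a sketch: the class and path names are hypothetical, and real DMP error handling is more involved than "disable on first error". It assumes fast recovery takes a path out of service after one suspect error, and it compares a DMP node that sees four physical paths with one that sees only the single MPxIO pseudo-device.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDmpNode:
    """Hypothetical stand-in for a DMP node (one LUN, N paths).

    NOT VxVM's real logic: it assumes fast recovery disables a path
    after a single suspect I/O error.
    """

    paths: list[str]
    fast_recovery: bool = True
    disabled: set[str] = field(default_factory=set)

    def report_error(self, path: str) -> None:
        # Aggressive failing: one suspect error takes the path out.
        if self.fast_recovery:
            self.disabled.add(path)

    def enabled_paths(self) -> list[str]:
        return [p for p in self.paths if p not in self.disabled]

    def node_enabled(self) -> bool:
        # The node (and I/O to the LUN) survives while any path is left.
        return len(self.enabled_paths()) > 0


# Native DMP: DMP itself sees the four physical paths.
native = ToyDmpNode(paths=["path0", "path1", "path2", "path3"])
native.report_error("path0")

# MPxIO: DMP sees one scsi_vhci pseudo-device as its only path.
mpxio = ToyDmpNode(paths=["vhci0"])
mpxio.report_error("vhci0")

# Same single-path setup with the feature turned off in the model.
mpxio_no_fr = ToyDmpNode(paths=["vhci0"], fast_recovery=False)
mpxio_no_fr.report_error("vhci0")

print(native.node_enabled(), mpxio.node_enabled(), mpxio_no_fr.node_enabled())
# True False True
```

In the model, native DMP survives losing one path. Under MPxIO, the same single error leaves the node with no enabled path, which is the same pattern as the "disabled path ... disabled dmpnode" pairs in Sebastien's log.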

Google "DMP fast recovery".

http://seer.entsupport.symantec.com/docs/307959.htm

I can imagine there must have been some internal fights at Symantec
between product management and QA to get that feature released.

Vic





On Thu, Sep 16, 2010 at 6:03 AM, Sebastien DAUBIGNE
sebastien.daubi...@atosorigin.com wrote:
  Dear Vx-addicts,

 We encountered a failover issue with this configuration:

 - Solaris 9 HW 9/05
 - SUN SAN (SFS) 4.4.15
 - Emulex with SUN generic driver (emlx)
 - VxVM 5.0-2006-05-11a

 - storage on HP SAN (XP 24K).


 Multipathing is managed by MPxIO (not VxDMP) because the SAN team and HP
 support mandated the native Solaris multipathing solution:

 VxVM == VxDMP == MPxIO == FCP ...

 We have 2 paths to the switch, linked to 2 paths to the storage, so the
 LUNs have 4 paths, with active/active support.
 Failover was tested successfully by taking each SAN port offline in
 turn.
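Here is a small sketch of the path arithmetic. It assumes full zoning (each of the two host ports reaches both storage ports through the switch), and the port names are made up. It also shows why VxDMP ends up with a single path: MPxIO (scsi_vhci) presents one pseudo-device per LUN, whatever the number of physical paths underneath.

```python
from itertools import product

# Hypothetical port names; assumes each host port is zoned to both array ports.
HOST_PORTS = ["host_port_0", "host_port_1"]
ARRAY_PORTS = ["array_port_0", "array_port_1"]


def physical_paths(host_ports, array_ports):
    """One FC path per (host port, array port) pair."""
    return list(product(host_ports, array_ports))


def paths_seen_by_vxdmp(phys, mpxio_enabled):
    """Paths VxDMP sees for one LUN.

    With MPxIO on, scsi_vhci exports a single pseudo-device per LUN,
    so DMP sees one path no matter how many physical paths exist.
    """
    if mpxio_enabled:
        return ["scsi_vhci pseudo-device"]
    return [f"{h} -> {a}" for h, a in phys]


phys = physical_paths(HOST_PORTS, ARRAY_PORTS)
print(len(phys))                              # 4
print(len(paths_seen_by_vxdmp(phys, True)))   # 1
print(len(paths_seen_by_vxdmp(phys, False)))  # 4
```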

 We regularly have transient I/O errors (SCSI timeouts, I/O retries
 with Unit Attention) due to SAN-side issues. Usually these errors are
 handled transparently by MPxIO/VxVM without impact on the applications.

 Now for the incident we encountered:

 One of the SAN ports was reset, which caused some transient I/O errors.
 The other SAN port was OK, so the MPxIO multipathing layer should have
 failed the I/O over to the other path without passing the error up to
 the VxDMP layer.
 For some reason, it did not fail the I/O over before VxVM caught it as
 an unrecoverable I/O error, disabling the subdisk and consequently the
 filesystem.

 Note the "giving up" message from the SCSI layer at 06:23:03:

 Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 118/0x558 belonging to the dmpnode 288/0x60
 Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM vxdmp V-5-0-111 disabled dmpnode 288/0x60
 Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 118/0x538 belonging to the dmpnode 288/0x20
 Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM vxdmp V-5-0-112 disabled path 118/0x550 belonging to the dmpnode 288/0x18
 Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM vxdmp V-5-0-111 disabled dmpnode 288/0x20
 Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM vxdmp V-5-0-111 disabled dmpnode 288/0x18
 Sep  1 06:18:54 myserver scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/s...@g60060e801527770127773794 (ssd165):
 Sep  1 06:18:54 myserver        SCSI transport failed: reason 'tran_err': retrying command
 Sep  1 06:19:05 myserver scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/s...@g60060e801527770127773794 (ssd165):
 Sep  1 06:19:05 myserver        SCSI transport failed: reason 'timeout': retrying command
 Sep  1 06:21:57 myserver scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/s...@g60060e80152777012777376d (ssd168):
 Sep  1 06:21:57 myserver        SCSI transport failed: reason 'tran_err': retrying command
 Sep  1 06:22:45 myserver scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/s...@g60060e80152777012777376d (ssd168):
 Sep  1 06:22:45 myserver        SCSI transport failed: reason 'timeout': retrying command
 Sep  1 06:23:03 myserver scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/s...@g60060e801527770127773787 (ssd166):
 Sep  1 06:23:03 myserver        SCSI transport failed: reason 'timeout': giving up
 Sep  1 06:23:03 myserver vxio: [ID 539309 kern.warning] WARNING: VxVM vxio V-5-3-0 voldmp_errbuf_sio_start: Failed to flush the error buffer 300ce41c340 on device 0x120003a to DMP
 Sep  1 06:23:03 myserver vxio: [ID 771159 kern.warning] WARNING: VxVM vxio V-5-0-2 Subdisk mydisk_2-02 block 5935: Uncorrectable write error
 Sep  1 06:23:03 myserver vxfs: [ID 702911 kern.warning] WARNING: msgcnt 1 mesg 037: V-2-37: vx_metaioerr - vx_logbuf_clean - /dev/vx/dsk/mydg/vol1 file system meta data write error in dev/block 0/5935
 Sep  1 06:23:03 myserver vxfs: [ID 702911 kern.warning] WARNING: msgcnt 2 mesg 031: V-2-31: vx_disable - /dev/vx/dsk/mydg/vol1 file system disabled
 Sep  1 06:23:03 myserver vxfs: [ID 702911 kern.warning] WARNING: msgcnt 3 mesg 037: V-2-37: vx_metaioerr - vx_inode_iodone - /dev/vx/dsk/mydg/vol1 file system meta data write error in dev/block 0/265984
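The timestamps can be lined up mechanically. This sketch uses a few records copied from the log above, one syslog record per line. It finds the first "disabled dmpnode" and the first "giving up". The helper name first_time is hypothetical, and the year 2010 is assumed because syslog omits it. In this excerpt, DMP disabled the dmpnode at 06:18:54, about four minutes before the SCSI layer reported "giving up" at 06:23:03.

```python
import re
from datetime import datetime

# Records copied from the log above (one syslog record per line).
LOG = """\
Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM vxdmp V-5-0-111 disabled dmpnode 288/0x60
Sep  1 06:18:54 myserver        SCSI transport failed: reason 'tran_err': retrying command
Sep  1 06:19:05 myserver        SCSI transport failed: reason 'timeout': retrying command
Sep  1 06:23:03 myserver        SCSI transport failed: reason 'timeout': giving up
Sep  1 06:23:03 myserver vxfs: [ID 702911 kern.warning] WARNING: msgcnt 2 mesg 031: V-2-31: vx_disable - /dev/vx/dsk/mydg/vol1 file system disabled
"""

# "Sep  1 06:18:54" at the start of each record.
STAMP = re.compile(r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) ")


def first_time(log: str, needle: str) -> datetime:
    """Timestamp of the first record containing `needle`."""
    for line in log.splitlines():
        if needle in line:
            m = STAMP.match(line)
            if m:
                # Collapse the padded day ("Sep  1") and assume year 2010.
                text = " ".join(m.group(1).split())
                return datetime.strptime(f"2010 {text}", "%Y %b %d %H:%M:%S")
    raise ValueError(f"no record contains {needle!r}")


dmp_disabled = first_time(LOG, "disabled dmpnode")
scsi_gave_up = first_time(LOG, "giving up")
print((scsi_gave_up - dmp_disabled).total_seconds())  # 249.0
```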


 It seems VxDMP gets the I/O error at the same time as MPxIO: I thought
 MPxIO would have concealed the I/O error until failover had occurred, which
 

Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

2010-10-06 Thread Victor Engle
 in the VxVM version installed on my system:
 
      vxdmpadm gettune dmp_fast_recovery
  VxVM vxdmpadm ERROR V-5-1-12015  Incorrect tunable
  vxdmpadm gettune [tunable name]
  Note - Tunable name can be dmp_failed_io_threshold, dmp_retry_count,
  dmp_pathswitch_blks_shift, dmp_queue_depth, dmp_cache_open,
  dmp_daemon_count, dmp_scsi_timeout, dmp_delayq_interval,
  dmp_path_age, or dmp_stat_interval
 
  Something is odd, because my version is 5.0 MP3 Solaris SPARC, and
  according to http://seer.entsupport.symantec.com/docs/316981.htm this
  tunable should be available.
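One way to double-check is to look at the tunables this vxdmpadm binary actually accepts. The sketch below only parses the usage note copied from the error above. It does not run vxdmpadm. It confirms that dmp_fast_recovery is not among the ten names listed. The helper name listed_tunables is hypothetical.

```python
import re

# Usage note copied from the vxdmpadm error output above.
NOTE = """Note - Tunable name can be dmp_failed_io_threshold, dmp_retry_count,
dmp_pathswitch_blks_shift, dmp_queue_depth, dmp_cache_open,
dmp_daemon_count, dmp_scsi_timeout, dmp_delayq_interval,
dmp_path_age, or dmp_stat_interval"""


def listed_tunables(note: str) -> set[str]:
    """Every dmp_* name mentioned in the usage note."""
    return set(re.findall(r"\bdmp_\w+", note))


tunables = listed_tunables(NOTE)
print(len(tunables))                    # 10
print("dmp_fast_recovery" in tunables)  # False
```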
 
      modinfo | grep -i vx
     38 7846a000  3800e 288   1  vxdmp (VxVM 5.0-2006-05-11a: DMP Drive)
     40 784a4000 334c40 289   1  vxio (VxVM 5.0-2006-05-11a I/O driver)
     42 783ec71d    df8 290   1  vxspec (VxVM 5.0-2006-05-11a control/st)
    296 78cfb0a2    c6b 291   1  vxportal (VxFS 5.0_REV-5.0A55_sol portal )
    297 78d6c000 1b9d4f   8   1  vxfs (VxFS 5.0_REV-5.0A55_sol SunOS 5)
    298 78f18000   a270 292   1  fdd (VxQIO 5.0_REV-5.0A55_sol Quick )
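All three VxVM kernel modules in this output report the same build string, 5.0-2006-05-11a. That is also the version in the original configuration list, and it carries a 2006 date. Comparing that string with the release you expect to be running (here 5.0 MP3) is a quick sanity check. The sketch below parses the modinfo lines copied above. The helper name module_versions is hypothetical, and the parsing assumes the "name (Product Version description)" layout shown here.

```python
# modinfo lines copied from the output above (one module per line).
MODINFO = """\
 38 7846a000  3800e 288   1  vxdmp (VxVM 5.0-2006-05-11a: DMP Drive)
 40 784a4000 334c40 289   1  vxio (VxVM 5.0-2006-05-11a I/O driver)
 42 783ec71d    df8 290   1  vxspec (VxVM 5.0-2006-05-11a control/st)
296 78cfb0a2    c6b 291   1  vxportal (VxFS 5.0_REV-5.0A55_sol portal )
297 78d6c000 1b9d4f   8   1  vxfs (VxFS 5.0_REV-5.0A55_sol SunOS 5)
298 78f18000   a270 292   1  fdd (VxQIO 5.0_REV-5.0A55_sol Quick )
"""


def module_versions(text: str) -> dict[str, tuple[str, str]]:
    """Map module name -> (product, version) from modinfo lines.

    Assumes the "... name (Product Version description)" layout shown
    above; modinfo truncates the description, e.g. "DMP Drive)".
    """
    result = {}
    for line in text.splitlines():
        if "(" not in line:
            continue
        fields, desc = line.split("(", 1)
        name = fields.split()[-1]  # module name is the last column before "("
        product, version = desc.split()[:2]
        result[name] = (product, version.rstrip(":"))
    return result


versions = module_versions(MODINFO)
vxvm_builds = {ver for prod, ver in versions.values() if prod == "VxVM"}
print(vxvm_builds)  # {'5.0-2006-05-11a'}
```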
 
 
 
 
 
  On 16/09/2010 12:15, Victor Engle wrote:

[Veritas-vx] root disk mirroring

2009-04-20 Thread Victor Engle -X (viengle - Insight Global at Cisco)
List,
 
Can someone direct me to a document with a good explanation of root disk
mirroring and recovery with Veritas VxVM 4.1 for Solaris? I have an
encapsulated root disk with a mirror. I think I have a good
understanding of the encapsulated disk and how to manually unencapsulate
it, because all the partitions are still intact. My concern is the
mirror. Its partitions are not the same as the root disk's, so if the root
disk fails I'm not sure how to get back to having an
encapsulated root disk. The reason I think I need an encapsulated root
disk is that I can then disable Veritas and still boot from the encapsulated
disk, assuming I know how to unencapsulate it.
 
Thanks,
Vic
 
___
Veritas-vx maillist  -  Veritas-vx@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-vx