Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

2010-10-11 Thread Dedhi Sujatmiko
On Wed, 6 Oct 2010 10:08:08 -0700
Venkata Sreenivasa Rao Nagineni venkatasreenivasarao_nagin...@symantec.com 
wrote:

 Hi Sebastien,
 
 In the first mail you mentioned that you are using mpxio to control the XP24K 
 array. Why are you using mpxio here?

I guess this part of his first email answers that: multipathing is managed by MPxIO (not 
VxDMP) because the SAN team and HP 
support imposed the Solaris native multipathing solution.

-- 
sujatmiko.de...@gmail.com
___
Veritas-vx maillist  -  Veritas-vx@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-vx


Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

2010-10-07 Thread Sebastien DAUBIGNE
  Hi,

Thank you all for your feedback.

I am very surprised that MPxIO+DMP is only supported on Sun storage: 
as stated in my very first message, the MPxIO solution was imposed by 
our SAN team, following HP recommendations.

When we joined this SAN, I asked to go with DMP for the multipathing layer, 
because we usually adopt this solution for all our 
Solaris+VxVM+dedicated storage configurations, regardless of the storage 
hardware: for instance, with EMC hardware we use DMP rather than PowerPath, 
and it works like a charm.
Unfortunately, the SAN team and HP told us that for Solaris servers, 
including those with VxVM, we must use MPxIO, otherwise they would not 
support it; hence we used MPxIO.

Now, for the issue, the question is still: does 5.0 bypass the MPxIO 
layer for error detection, or is this functionality only implemented 
starting with MP2?
The idea is to be sure that this is a fast-recovery issue and not 
anything else.

Cheers,

Le 06/10/2010 23:02, Christian Gerbrandt a écrit :
 We support several 3rd-party multipathing solutions, like MPxIO or EMC's 
 PowerPath.
 However, MPxIO is only supported on Sun-branded storage.
 DMP has also been known to outperform other solutions in certain 
 configurations.

 When 3rd-party multipathing is in use, DMP falls back into TPD mode 
 (Third-Party Driver) and lets the underlying multipathing do its job.
 That's when you see just a single disk in VxVM when you know you have more 
 than one path per disk.

 I would recommend installing the 5.0 MP3 RP4 patch, and then checking again whether 
 MPxIO is still misbehaving.
 Or, ideally, switch over to DMP.

 -Original Message-
 From: veritas-vx-boun...@mailman.eng.auburn.edu 
 [mailto:veritas-vx-boun...@mailman.eng.auburn.edu] On Behalf Of Victor Engle
 Sent: 06 October 2010 20:48
 To: Ashish Yajnik
 Cc: sebastien.daubi...@atosorigin.com; undisclosed-recipients:, 
 @mailman.eng.auburn.edu; Veritas-vx@mailman.eng.auburn.edu
 Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

 This is absolutely false!

 MPxIO is an excellent multipathing solution and is supported by all major 
 storage vendors, including HP. The issue discussed in this thread has to do 
 with improper behavior of DMP when multipathing is managed by a native layer 
 like MPxIO.

 Storage and OS vendors have no motivation to lock you into a Veritas solution.

 Or, Ashish, are you saying that Symantec is locking its customers 
 into DMP? Hitachi, EMC, NetApp and HP all have supported configurations which 
 include VxVM and native OS multipathing stacks.

 Thanks,
 Vic


 On Wed, Oct 6, 2010 at 1:26 PM, Ashish Yajnikashish_yaj...@symantec.com  
 wrote:
 MPxIO with VxVM is only supported with Sun storage. If you run into problems 
 with MPxIO and SF on XP24K then support will not be able to help you. I 
 would recommend using DMP with XP24K.

 Ashish
 --
 Sent using BlackBerry


 - Original Message -
 From: veritas-vx-boun...@mailman.eng.auburn.edu
 veritas-vx-boun...@mailman.eng.auburn.edu
 To: Sebastien DAUBIGNEsebastien.daubi...@atosorigin.com;
 undisclosed-recipients
 undisclosed-recipients:;@mailman.eng.auburn.edu
 Cc: Veritas-vx@mailman.eng.auburn.edu
 Veritas-vx@mailman.eng.auburn.edu
 Sent: Wed Oct 06 10:08:08 2010
 Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

 Hi Sebastien,

 In the first mail you mentioned that you are using mpxio to control the 
 XP24K array. Why are you using mpxio here?

 Thanks,
 Venkata Sreenivasarao Nagineni,
 Symantec

 -Original Message-
 From: veritas-vx-boun...@mailman.eng.auburn.edu [mailto:veritas-vx-
 boun...@mailman.eng.auburn.edu] On Behalf Of Sebastien DAUBIGNE
 Sent: Wednesday, October 06, 2010 9:32 AM
 To: undisclosed-recipients
 Cc: Veritas-vx@mailman.eng.auburn.edu
 Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

Hi,

 I come back with my dmp_fast_recovery issue (VxDMP fails the path
 before MPxIO gets a chance to fail over to an alternate path).
 As stated previously, I am running 5.0 GA, and this tunable is not
 supported in this release. However, I still don't know whether VxVM 5.0 GA
 silently bypasses the MPxIO stack for error recovery.

 Now I am trying to determine whether upgrading to MP3 will resolve this issue
 (which rarely occurred).

 Could anyone (maybe Joshua?) explain whether the behaviour of 5.0 GA
 without the tunable is functionally identical to dmp_fast_recovery=0 or
 dmp_fast_recovery=1? Maybe the mechanism has been implemented in 5.0
 without the option to disable it (this could explain my issue)?

 Joshua, you mentioned another tuneable for 5.0, but looking at the
 list I can't identify the corresponding tunable:

 vxdmpadm gettune all
             Tunable               Current Value  Default Value
 ------------------------------    -------------  -------------
 dmp_failed_io_threshold                   57600          57600
 dmp_retry_count                               5              5

Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

2010-10-07 Thread Sebastien DAUBIGNE
  I found this technote, which confirms your statement, Christian:  
http://www.symantec.com/business/support/index?page=content&id=TECH51507

- Storage Foundation on Solaris SPARC and x64 is supported with MPxIO 
on Sun storage hardware only. Storage Foundation does not support MPxIO 
on non-Sun storage arrays. For non-Sun storage hardware, DMP is 
required. If MPxIO is enabled on a host, the tunable dmp_fast_recovery 
must be set to off: vxdmpadm settune dmp_fast_recovery=off.
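
For readers of the archive, a minimal check-and-set sequence based on the commands 
quoted above (assuming a release where vxdmpadm recognises the tunable; on the 
5.0 GA install discussed in this thread the gettune call simply returns the 
"Incorrect tunable" error shown later on):

  # Show the current value of the tunable (on/off)
  vxdmpadm gettune dmp_fast_recovery

  # Disable it so DMP stops bypassing the MPxIO stack for error probing
  vxdmpadm settune dmp_fast_recovery=off

  # Confirm the change
  vxdmpadm gettune dmp_fast_recovery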



Le 07/10/2010 11:12, Sebastien DAUBIGNE a écrit :
  Hi,

 Thank you all for your feedback.

 I am very surprised that MPxIO+DMP is only supported on Sun storage: 
 as stated in my very first message, the MPxIO solution was imposed by 
 our SAN team, following HP recommendations.

 When we joined this SAN, I asked to go with DMP for the multipathing layer, 
 because we usually adopt this solution for all our 
 Solaris+VxVM+dedicated storage configurations, regardless of the 
 storage hardware: for instance, with EMC hardware we use DMP rather than 
 PowerPath, and it works like a charm.
 Unfortunately, the SAN team and HP told us that for Solaris servers, 
 including those with VxVM, we must use MPxIO, otherwise they would not 
 support it; hence we used MPxIO.

 Now, for the issue, the question is still: does 5.0 bypass the MPxIO 
 layer for error detection, or is this functionality only implemented 
 starting with MP2?
 The idea is to be sure that this is a fast-recovery issue and not 
 anything else.

 Cheers,

 Le 06/10/2010 23:02, Christian Gerbrandt a écrit :
 We support several 3rd-party multipathing solutions, like MPxIO or 
 EMC's PowerPath.
 However, MPxIO is only supported on Sun-branded storage.
 DMP has also been known to outperform other solutions in certain 
 configurations.

 When 3rd-party multipathing is in use, DMP falls back into TPD 
 mode (Third-Party Driver) and lets the underlying multipathing do 
 its job.
 That's when you see just a single disk in VxVM when you know you 
 have more than one path per disk.

 I would recommend installing the 5.0 MP3 RP4 patch, and then checking 
 again whether MPxIO is still misbehaving.
 Or, ideally, switch over to DMP.

 -Original Message-
 From: veritas-vx-boun...@mailman.eng.auburn.edu 
 [mailto:veritas-vx-boun...@mailman.eng.auburn.edu] On Behalf Of 
 Victor Engle
 Sent: 06 October 2010 20:48
 To: Ashish Yajnik
 Cc: sebastien.daubi...@atosorigin.com; undisclosed-recipients:, 
 @mailman.eng.auburn.edu; Veritas-vx@mailman.eng.auburn.edu
 Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

 This is absolutely false!

 MPxIO is an excellent multipathing solution and is supported by all 
 major storage vendors, including HP. The issue discussed in this 
 thread has to do with improper behavior of DMP when multipathing is 
 managed by a native layer like MPxIO.

 Storage and OS vendors have no motivation to lock you into a Veritas 
 solution.

 Or, Ashish, are you saying that Symantec is locking its 
 customers into DMP? Hitachi, EMC, NetApp and HP all have supported 
 configurations which include VxVM and native OS multipathing stacks.

 Thanks,
 Vic


 On Wed, Oct 6, 2010 at 1:26 PM, Ashish 
 Yajnikashish_yaj...@symantec.com  wrote:
 MPxIO with VxVM is only supported with Sun storage. If you run into 
 problems with MPxIO and SF on XP24K then support will not be able to 
 help you. I would recommend using DMP with XP24K.

 Ashish
 --
 Sent using BlackBerry


 - Original Message -
 From: veritas-vx-boun...@mailman.eng.auburn.edu
 veritas-vx-boun...@mailman.eng.auburn.edu
 To: Sebastien DAUBIGNEsebastien.daubi...@atosorigin.com;
 undisclosed-recipients
 undisclosed-recipients:;@mailman.eng.auburn.edu
 Cc: Veritas-vx@mailman.eng.auburn.edu
 Veritas-vx@mailman.eng.auburn.edu
 Sent: Wed Oct 06 10:08:08 2010
 Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

 Hi Sebastien,

 In the first mail you mentioned that you are using mpxio to control 
 the XP24K array. Why are you using mpxio here?

 Thanks,
 Venkata Sreenivasarao Nagineni,
 Symantec

 -Original Message-
 From: veritas-vx-boun...@mailman.eng.auburn.edu [mailto:veritas-vx-
 boun...@mailman.eng.auburn.edu] On Behalf Of Sebastien DAUBIGNE
 Sent: Wednesday, October 06, 2010 9:32 AM
 To: undisclosed-recipients
 Cc: Veritas-vx@mailman.eng.auburn.edu
 Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

Hi,

 I come back with my dmp_fast_recovery issue (VxDMP fails the path
 before MPxIO gets a chance to fail over to an alternate path).
 As stated previously, I am running 5.0 GA, and this tunable is not
 supported in this release. However, I still don't know whether VxVM 5.0 GA
 silently bypasses the MPxIO stack for error recovery.

 Now I am trying to determine whether upgrading to MP3 will resolve this issue
 (which rarely occurred).

 Could anyone (maybe Joshua?) explain whether the behaviour of 5.0 GA
 without the tunable is functionally identical

Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

2010-10-06 Thread Venkata Sreenivasa Rao Nagineni
Hi Sebastien,

In the first mail you mentioned that you are using mpxio to control the XP24K 
array. Why are you using mpxio here?

Thanks,
Venkata Sreenivasarao Nagineni,
Symantec

 -Original Message-
 From: veritas-vx-boun...@mailman.eng.auburn.edu [mailto:veritas-vx-
 boun...@mailman.eng.auburn.edu] On Behalf Of Sebastien DAUBIGNE
 Sent: Wednesday, October 06, 2010 9:32 AM
 To: undisclosed-recipients
 Cc: Veritas-vx@mailman.eng.auburn.edu
 Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue
 
   Hi,
 
 I come back with my dmp_fast_recovery issue (VxDMP fails the path
 before MPxIO gets a chance to fail over to an alternate path).
 As stated previously, I am running 5.0 GA, and this tunable is not
 supported in this release. However, I still don't know whether VxVM 5.0 GA
 silently bypasses the MPxIO stack for error recovery.

 Now I am trying to determine whether upgrading to MP3 will resolve this issue
 (which rarely occurred).

 Could anyone (maybe Joshua?) explain whether the behaviour of 5.0 GA without
 the tunable is functionally identical to dmp_fast_recovery=0 or
 dmp_fast_recovery=1? Maybe the mechanism has been implemented in 5.0
 without the option to disable it (this could explain my issue)?

 Joshua, you mentioned another tuneable for 5.0, but looking at the list
 I can't identify the corresponding tunable:

 vxdmpadm gettune all
             Tunable               Current Value  Default Value
 ------------------------------    -------------  -------------
 dmp_failed_io_threshold                   57600          57600
 dmp_retry_count                               5              5
 dmp_pathswitch_blks_shift                    11             11
 dmp_queue_depth                              32             32
 dmp_cache_open                               on             on
 dmp_daemon_count                             10             10
 dmp_scsi_timeout                             30             30
 dmp_delayq_interval                          15             15
 dmp_path_age                                  0            300
 dmp_stat_interval                             1              1
 dmp_health_time                               0             60
 dmp_probe_idle_lun                           on             on
 dmp_log_level                                 4              1
 
 Cheers.
 
 
 
 Le 16/09/2010 16:50, Joshua Fielden a écrit :
  dmp_fast_recovery is a mechanism by which we bypass the sd/scsi stack
 and send path inquiry/status CDBs directly from the HBA in order to
 bypass long SCSI queues and recover paths faster. With a TPD (third-
 party driver) such as MPxIO, bypassing the stack means we bypass the
 TPD completely, and interactions such as this can happen. The vxesd
 (event-source daemon) is another 5.0/MP2 backport addition that's moot
 in the presence of a TPD.
 
   From your modinfo, you're not actually running MP3. This technote
 (http://seer.entsupport.symantec.com/docs/327057.htm) isn't exactly
 your scenario, but looking for partially-installed pkgs is a good start
 to getting your server correctly installed, then the tuneable should
 work -- very early 5.0 versions had a differently-named tuneable I
 can't find in my mail archive ATM.
 
  Cheers,
 
  Jf
 
  -Original Message-
  From: veritas-vx-boun...@mailman.eng.auburn.edu [mailto:veritas-vx-
 boun...@mailman.eng.auburn.edu] On Behalf Of Sebastien DAUBIGNE
  Sent: Thursday, September 16, 2010 7:41 AM
  To: Veritas-vx@mailman.eng.auburn.edu
  Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue
 
 Thank you Victor and William, it seems to be a very good lead.
 
  Unfortunately, this tunable seems not to be supported in the VxVM
  version installed on my system :
 
  vxdmpadm gettune dmp_fast_recovery
  VxVM vxdmpadm ERROR V-5-1-12015  Incorrect tunable
  vxdmpadm gettune [tunable name]
  Note - Tunable name can be dmp_failed_io_threshold, dmp_retry_count,
  dmp_pathswitch_blks_shift, dmp_queue_depth, dmp_cache_open,
  dmp_daemon_count, dmp_scsi_timeout, dmp_delayq_interval,
 dmp_path_age,
  or dmp_stat_interval
 
  Something odd because my version is 5.0 MP3 Solaris SPARC, and
 according
  to http://seer.entsupport.symantec.com/docs/316981.htm this tunable
  should be available.
 
  modinfo | grep -i vx
 38 7846a000  3800e 288   1  vxdmp (VxVM 5.0-2006-05-11a: DMP
 Drive)
 40 784a4000 334c40 289   1  vxio (VxVM 5.0-2006-05-11a I/O driver)
 42 783ec71ddf8 290   1  vxspec (VxVM 5.0-2006-05-11a
 control/st)
  296 78cfb0a2c6b 291   1  vxportal (VxFS 5.0_REV-5.0A55_sol portal
 )
  297 78d6c000 1b9d4f   8   1  vxfs (VxFS 5.0_REV-5.0A55_sol SunOS 5)
  298 78f18000   a270 292   1  fdd (VxQIO 5.0_REV-5.0A55_sol Quick )
 
 
 
 
 
  Le 16/09/2010 12:15, Victor Engle a écrit :
 Which version of Veritas? Version 4.1 MP2 and version 5.x introduced a
 feature called DMP fast recovery. It was probably supposed to be
 called DMP fast fail, but "recovery" sounds better. It is supposed to
 fail suspect

Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

2010-10-06 Thread Ashish Yajnik
MPxIO with VxVM is only supported with Sun storage. If you run into problems 
with MPxIO and SF on XP24K then support will not be able to help you. I would 
recommend using DMP with XP24K.

Ashish
--
Sent using BlackBerry


- Original Message -
From: veritas-vx-boun...@mailman.eng.auburn.edu 
veritas-vx-boun...@mailman.eng.auburn.edu
To: Sebastien DAUBIGNE sebastien.daubi...@atosorigin.com; 
undisclosed-recipients undisclosed-recipients:;@mailman.eng.auburn.edu
Cc: Veritas-vx@mailman.eng.auburn.edu Veritas-vx@mailman.eng.auburn.edu
Sent: Wed Oct 06 10:08:08 2010
Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

Hi Sebastien,

In the first mail you mentioned that you are using mpxio to control the XP24K 
array. Why are you using mpxio here?

Thanks,
Venkata Sreenivasarao Nagineni,
Symantec

 -Original Message-
 From: veritas-vx-boun...@mailman.eng.auburn.edu [mailto:veritas-vx-
 boun...@mailman.eng.auburn.edu] On Behalf Of Sebastien DAUBIGNE
 Sent: Wednesday, October 06, 2010 9:32 AM
 To: undisclosed-recipients
 Cc: Veritas-vx@mailman.eng.auburn.edu
 Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue
 
   Hi,
 
 I come back with my dmp_fast_recovery issue (VxDMP fails the path
 before MPxIO gets a chance to fail over to an alternate path).
 As stated previously, I am running 5.0 GA, and this tunable is not
 supported in this release. However, I still don't know whether VxVM 5.0 GA
 silently bypasses the MPxIO stack for error recovery.

 Now I am trying to determine whether upgrading to MP3 will resolve this issue
 (which rarely occurred).

 Could anyone (maybe Joshua?) explain whether the behaviour of 5.0 GA without
 the tunable is functionally identical to dmp_fast_recovery=0 or
 dmp_fast_recovery=1? Maybe the mechanism has been implemented in 5.0
 without the option to disable it (this could explain my issue)?

 Joshua, you mentioned another tuneable for 5.0, but looking at the list
 I can't identify the corresponding tunable:

 vxdmpadm gettune all
             Tunable               Current Value  Default Value
 ------------------------------    -------------  -------------
 dmp_failed_io_threshold                   57600          57600
 dmp_retry_count                               5              5
 dmp_pathswitch_blks_shift                    11             11
 dmp_queue_depth                              32             32
 dmp_cache_open                               on             on
 dmp_daemon_count                             10             10
 dmp_scsi_timeout                             30             30
 dmp_delayq_interval                          15             15
 dmp_path_age                                  0            300
 dmp_stat_interval                             1              1
 dmp_health_time                               0             60
 dmp_probe_idle_lun                           on             on
 dmp_log_level                                 4              1
 
 Cheers.
 
 
 
 Le 16/09/2010 16:50, Joshua Fielden a écrit :
  dmp_fast_recovery is a mechanism by which we bypass the sd/scsi stack
 and send path inquiry/status CDBs directly from the HBA in order to
 bypass long SCSI queues and recover paths faster. With a TPD (third-
 party driver) such as MPxIO, bypassing the stack means we bypass the
 TPD completely, and interactions such as this can happen. The vxesd
 (event-source daemon) is another 5.0/MP2 backport addition that's moot
 in the presence of a TPD.
 
   From your modinfo, you're not actually running MP3. This technote
 (http://seer.entsupport.symantec.com/docs/327057.htm) isn't exactly
 your scenario, but looking for partially-installed pkgs is a good start
 to getting your server correctly installed, then the tuneable should
 work -- very early 5.0 versions had a differently-named tuneable I
 can't find in my mail archive ATM.
 
  Cheers,
 
  Jf
 
  -Original Message-
  From: veritas-vx-boun...@mailman.eng.auburn.edu [mailto:veritas-vx-
 boun...@mailman.eng.auburn.edu] On Behalf Of Sebastien DAUBIGNE
  Sent: Thursday, September 16, 2010 7:41 AM
  To: Veritas-vx@mailman.eng.auburn.edu
  Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue
 
 Thank you Victor and William, it seems to be a very good lead.
 
  Unfortunately, this tunable seems not to be supported in the VxVM
  version installed on my system :
 
  vxdmpadm gettune dmp_fast_recovery
  VxVM vxdmpadm ERROR V-5-1-12015  Incorrect tunable
  vxdmpadm gettune [tunable name]
  Note - Tunable name can be dmp_failed_io_threshold, dmp_retry_count,
  dmp_pathswitch_blks_shift, dmp_queue_depth, dmp_cache_open,
  dmp_daemon_count, dmp_scsi_timeout, dmp_delayq_interval,
 dmp_path_age,
  or dmp_stat_interval
 
  Something odd because my version is 5.0 MP3 Solaris SPARC, and
 according
  to http://seer.entsupport.symantec.com/docs/316981.htm this tunable
  should be available.
 
  modinfo | grep -i vx
 38 7846a000  3800e 288   1  vxdmp (VxVM 5.0

Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

2010-10-06 Thread Victor Engle
This is absolutely false!

MPxIO is an excellent multipathing solution and is supported by all
major storage vendors, including HP. The issue discussed in this
thread has to do with improper behavior of DMP when multipathing is
managed by a native layer like MPxIO.

Storage and OS vendors have no motivation to lock you into a Veritas solution.

Or, Ashish, are you saying that Symantec is locking its
customers into DMP? Hitachi, EMC, NetApp and HP all have supported
configurations which include VxVM and native OS multipathing stacks.

Thanks,
Vic


On Wed, Oct 6, 2010 at 1:26 PM, Ashish Yajnik
ashish_yaj...@symantec.com wrote:
 MPxIO with VxVM is only supported with Sun storage. If you run into problems 
 with MPxIO and SF on XP24K then support will not be able to help you. I would 
 recommend using DMP with XP24K.

 Ashish
 --
 Sent using BlackBerry


 - Original Message -
 From: veritas-vx-boun...@mailman.eng.auburn.edu 
 veritas-vx-boun...@mailman.eng.auburn.edu
 To: Sebastien DAUBIGNE sebastien.daubi...@atosorigin.com; 
 undisclosed-recipients undisclosed-recipients:;@mailman.eng.auburn.edu
 Cc: Veritas-vx@mailman.eng.auburn.edu Veritas-vx@mailman.eng.auburn.edu
 Sent: Wed Oct 06 10:08:08 2010
 Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

 Hi Sebastien,

 In the first mail you mentioned that you are using mpxio to control the XP24K 
 array. Why are you using mpxio here?

 Thanks,
 Venkata Sreenivasarao Nagineni,
 Symantec

 -Original Message-
 From: veritas-vx-boun...@mailman.eng.auburn.edu [mailto:veritas-vx-
 boun...@mailman.eng.auburn.edu] On Behalf Of Sebastien DAUBIGNE
 Sent: Wednesday, October 06, 2010 9:32 AM
 To: undisclosed-recipients
 Cc: Veritas-vx@mailman.eng.auburn.edu
 Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

   Hi,

 I come back with my dmp_fast_recovery issue (VxDMP fails the path
 before MPxIO gets a chance to fail over to an alternate path).
 As stated previously, I am running 5.0 GA, and this tunable is not
 supported in this release. However, I still don't know whether VxVM 5.0 GA
 silently bypasses the MPxIO stack for error recovery.

 Now I am trying to determine whether upgrading to MP3 will resolve this issue
 (which rarely occurred).

 Could anyone (maybe Joshua?) explain whether the behaviour of 5.0 GA without
 the tunable is functionally identical to dmp_fast_recovery=0 or
 dmp_fast_recovery=1? Maybe the mechanism has been implemented in 5.0
 without the option to disable it (this could explain my issue)?

 Joshua, you mentioned another tuneable for 5.0, but looking at the list
 I can't identify the corresponding tunable:

   vxdmpadm gettune all
              Tunable               Current Value  Default Value
 --    -  -
 dmp_failed_io_threshold               57600            57600
 dmp_retry_count                           5                5
 dmp_pathswitch_blks_shift                11               11
 dmp_queue_depth                          32               32
 dmp_cache_open                           on               on
 dmp_daemon_count                         10               10
 dmp_scsi_timeout                         30               30
 dmp_delayq_interval                      15               15
 dmp_path_age                              0              300
 dmp_stat_interval                         1                1
 dmp_health_time                           0               60
 dmp_probe_idle_lun                       on               on
 dmp_log_level                             4                1

 Cheers.



 Le 16/09/2010 16:50, Joshua Fielden a écrit :
  dmp_fast_recovery is a mechanism by which we bypass the sd/scsi stack
 and send path inquiry/status CDBs directly from the HBA in order to
 bypass long SCSI queues and recover paths faster. With a TPD (third-
 party driver) such as MPxIO, bypassing the stack means we bypass the
 TPD completely, and interactions such as this can happen. The vxesd
 (event-source daemon) is another 5.0/MP2 backport addition that's moot
 in the presence of a TPD.
 
   From your modinfo, you're not actually running MP3. This technote
 (http://seer.entsupport.symantec.com/docs/327057.htm) isn't exactly
 your scenario, but looking for partially-installed pkgs is a good start
 to getting your server correctly installed, then the tuneable should
 work -- very early 5.0 versions had a differently-named tuneable I
 can't find in my mail archive ATM.
 
  Cheers,
 
  Jf
 
  -Original Message-
  From: veritas-vx-boun...@mailman.eng.auburn.edu [mailto:veritas-vx-
 boun...@mailman.eng.auburn.edu] On Behalf Of Sebastien DAUBIGNE
  Sent: Thursday, September 16, 2010 7:41 AM
  To: Veritas-vx@mailman.eng.auburn.edu
  Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue
 
     Thank you Victor and William, it seems to be a very good lead.
 
  Unfortunately, this tunable seems not to be supported

Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

2010-10-06 Thread Christian Gerbrandt
We support several 3rd-party multipathing solutions, like MPxIO or EMC's 
PowerPath.
However, MPxIO is only supported on Sun-branded storage.
DMP has also been known to outperform other solutions in certain configurations.

When 3rd-party multipathing is in use, DMP falls back into TPD mode 
(Third-Party Driver) and lets the underlying multipathing do its job.
That's when you see just a single disk in VxVM when you know you have more 
than one path per disk.

I would recommend installing the 5.0 MP3 RP4 patch, and then checking again whether 
MPxIO is still misbehaving.
Or, ideally, switch over to DMP.
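
As a quick sanity check around that upgrade, a minimal sketch using the standard 
Solaris tooling (the package/patch names below are the usual ones and may differ 
slightly on a given install):

  # Kernel modules actually loaded -- the version banner should report 5.0MP3 after the upgrade
  modinfo | grep -i vx

  # VRTS patches registered on the system
  showrev -p | grep -i VRTS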

-Original Message-
From: veritas-vx-boun...@mailman.eng.auburn.edu 
[mailto:veritas-vx-boun...@mailman.eng.auburn.edu] On Behalf Of Victor Engle
Sent: 06 October 2010 20:48
To: Ashish Yajnik
Cc: sebastien.daubi...@atosorigin.com; undisclosed-recipients:, 
@mailman.eng.auburn.edu; Veritas-vx@mailman.eng.auburn.edu
Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

This is absolutely false!

MPxIO is an excellent multipathing solution and is supported by all major 
storage vendors, including HP. The issue discussed in this thread has to do 
with improper behavior of DMP when multipathing is managed by a native layer 
like MPxIO.

Storage and OS vendors have no motivation to lock you into a Veritas solution.

Or, Ashish, are you saying that Symantec is locking its customers 
into DMP? Hitachi, EMC, NetApp and HP all have supported configurations which 
include VxVM and native OS multipathing stacks.

Thanks,
Vic


On Wed, Oct 6, 2010 at 1:26 PM, Ashish Yajnik ashish_yaj...@symantec.com 
wrote:
 MPxIO with VxVM is only supported with Sun storage. If you run into problems 
 with MPxIO and SF on XP24K then support will not be able to help you. I would 
 recommend using DMP with XP24K.

 Ashish
 --
 Sent using BlackBerry


 - Original Message -
 From: veritas-vx-boun...@mailman.eng.auburn.edu 
 veritas-vx-boun...@mailman.eng.auburn.edu
 To: Sebastien DAUBIGNE sebastien.daubi...@atosorigin.com; 
 undisclosed-recipients 
 undisclosed-recipients:;@mailman.eng.auburn.edu
 Cc: Veritas-vx@mailman.eng.auburn.edu 
 Veritas-vx@mailman.eng.auburn.edu
 Sent: Wed Oct 06 10:08:08 2010
 Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

 Hi Sebastien,

 In the first mail you mentioned that you are using mpxio to control the XP24K 
 array. Why are you using mpxio here?

 Thanks,
 Venkata Sreenivasarao Nagineni,
 Symantec

 -Original Message-
 From: veritas-vx-boun...@mailman.eng.auburn.edu [mailto:veritas-vx- 
 boun...@mailman.eng.auburn.edu] On Behalf Of Sebastien DAUBIGNE
 Sent: Wednesday, October 06, 2010 9:32 AM
 To: undisclosed-recipients
 Cc: Veritas-vx@mailman.eng.auburn.edu
 Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

   Hi,

 I come back with my dmp_fast_recovery issue (VxDMP fails the path 
 before MPxIO gets a chance to fail over to an alternate path).
 As stated previously, I am running 5.0 GA, and this tunable is not 
 supported in this release. However, I still don't know whether VxVM 5.0 GA 
 silently bypasses the MPxIO stack for error recovery.

 Now I am trying to determine whether upgrading to MP3 will resolve this issue 
 (which rarely occurred).

 Could anyone (maybe Joshua?) explain whether the behaviour of 5.0 GA 
 without the tunable is functionally identical to dmp_fast_recovery=0 or
 dmp_fast_recovery=1? Maybe the mechanism has been implemented in 5.0 
 without the option to disable it (this could explain my issue)?

 Joshua, you mentioned another tuneable for 5.0, but looking at the 
 list I can't identify the corresponding tunable:

 vxdmpadm gettune all
             Tunable               Current Value  Default Value
 ------------------------------    -------------  -------------
 dmp_failed_io_threshold                   57600          57600
 dmp_retry_count                               5              5
 dmp_pathswitch_blks_shift                    11             11
 dmp_queue_depth                              32             32
 dmp_cache_open                               on             on
 dmp_daemon_count                             10             10
 dmp_scsi_timeout                             30             30
 dmp_delayq_interval                          15             15
 dmp_path_age                                  0            300
 dmp_stat_interval                             1              1
 dmp_health_time                               0             60
 dmp_probe_idle_lun                           on             on
 dmp_log_level                                 4              1

 Cheers.



 Le 16/09/2010 16:50, Joshua Fielden a écrit :
  dmp_fast_recovery is a mechanism by which we bypass the sd/scsi 
  stack
 and send path inquiry/status CDBs directly from the HBA in order to 
 bypass long SCSI queues and recover paths faster. With a TPD (third- 
 party driver) such as MPxIO, bypassing the stack means

Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

2010-09-17 Thread Sebastien DAUBIGNE


  
  
Steven,

As I said yesterday, I made a mistake about my version, which is actually 5.0 GA.

According to http://seer.entsupport.symantec.com/docs/316981.htm,
dmp_fast_recovery should be applicable to 5.0 on Solaris :

Products Applied:
Volume Manager for UNIX/Linux 4.1 MP2 (Solaris), 4.1 MP2
  (Solaris) RP5, 5.0 (Solaris), 5.0 MP1
  (Solaris), 5.0 MP3 (Solaris), 5.0 MP3 (Solaris) RP1, 5.0 MP3
  (Solaris) RP2, 5.0 MP3 (Solaris) RP3 


Le 16/09/2010 20:31, Green, Steven a écrit :

  Check the technote again ... It specifically lists only AIX as the applicable OS. Also, your modinfo output suggests you are not running MP3 anyway. Here is my modinfo output from a Solaris 10 system running VxVM 5.0 MP3:

[kultarr:root]: modinfo | grep -i vx
 40 7be08000  3e4e0 183   1  vxdmp (VxVM 5.0MP3: DMP Driver)
 42 7ba0 209248 184   1  vxio (VxVM 5.0MP3 I/O driver)
 44 7be073c0c78 265   1  vxspec (VxVM 5.0MP3 control/status driv)
201 7be75228cb0 266   1  vxportal (VxFS 5.0_REV-5.0MP3A25_sol port)
202 7aa0 1d89e0  21   1  vxfs (VxFS 5.0_REV-5.0MP3A25_sol SunO)
224 7abe4000   a9e0 267   1  fdd (VxQIO 5.0_REV-5.0MP3A25_sol Qui)


-Original Message-
From: Sebastien DAUBIGNE [mailto:sebastien.daubi...@atosorigin.com] 
Sent: Thursday, September 16, 2010 8:41 AM
To: Veritas-vx@mailman.eng.auburn.edu
Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

  Thank you Victor and William, it seems to be a very good lead.

Unfortunately, this tunable seems not to be supported in the VxVM 
version installed on my system :

  vxdmpadm gettune dmp_fast_recovery
VxVM vxdmpadm ERROR V-5-1-12015  Incorrect tunable
vxdmpadm gettune [tunable name]
Note - Tunable name can be dmp_failed_io_threshold, dmp_retry_count, 
dmp_pathswitch_blks_shift, dmp_queue_depth, dmp_cache_open, 
dmp_daemon_count, dmp_scsi_timeout, dmp_delayq_interval, dmp_path_age, 
or dmp_stat_interval

Something odd because my version is 5.0 MP3 Solaris SPARC, and according 
to http://seer.entsupport.symantec.com/docs/316981.htm this tunable 
should be available.

  modinfo | grep -i vx
  38 7846a000  3800e 288   1  vxdmp (VxVM 5.0-2006-05-11a: DMP Drive)
  40 784a4000 334c40 289   1  vxio (VxVM 5.0-2006-05-11a I/O driver)
  42 783ec71ddf8 290   1  vxspec (VxVM 5.0-2006-05-11a control/st)
296 78cfb0a2c6b 291   1  vxportal (VxFS 5.0_REV-5.0A55_sol portal )
297 78d6c000 1b9d4f   8   1  vxfs (VxFS 5.0_REV-5.0A55_sol SunOS 5)
298 78f18000   a270 292   1  fdd (VxQIO 5.0_REV-5.0A55_sol Quick )





Le 16/09/2010 12:15, Victor Engle a écrit :

  
Which version of Veritas? Version 4.1 MP2 and version 5.x introduced a
feature called DMP fast recovery. It was probably supposed to be
called DMP fast fail, but "recovery" sounds better. It is supposed to
fail suspect paths more aggressively to speed up failover. But when
you have only one VxVM DMP path, as is the case with MPxIO, and
fast recovery fails that path, you're in trouble. In version 5.x,
it is possible to disable this feature.

Google DMP fast recovery.

http://seer.entsupport.symantec.com/docs/307959.htm

I can imagine there must have been some internal fights at Symantec
between product management and QA to get that feature released.

Vic





On Thu, Sep 16, 2010 at 6:03 AM, Sebastien DAUBIGNE
sebastien.daubi...@atosorigin.com  wrote:


Dear Vx-addicts,

We encountered a failover issue on this configuration :

- Solaris 9 HW 9/05
- SUN SAN (SFS) 4.4.15
- Emulex with SUN generic driver (emlx)
- VxVM 5.0-2006-05-11a

- storage on HP SAN (XP 24K).


Multipathing is managed by MPxIO (not VxDMP) because the SAN team and HP
support imposed the Solaris native solution for multipathing :

VxVM ==  VxDMP ==  MPxIO ==  FCP ...

We have 2 paths to the switch, linked to 2 paths to the storage, so the
LUNs have 4 paths, with active/active support.
Failover operation has been tested successfully by offlining each port
successively on the SAN.

We regularly have transient I/O errors (SCSI timeouts, I/O error retries
with "Unit attention") due to SAN-side issues. Usually these errors are
transparently managed by MPxIO/VxVM without impact on the applications.

Now for the incident we encountered:

One of the SAN ports was reset; consequently there were some transient
I/O errors.
The other SAN port was OK, so the MPxIO multipathing layer should have
failed the I/O over to the other path, without transmitting the error to the
VxDMP layer.
For some reason, it did not fail over the I/O before VxVM caught it as an
unrecoverable I/O error, disabling the subdisk and consequently the
filesystem.

Note the "giving up" message from scsi layer at 06:23:03 :

Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM
vxdmp V-5-0-112 disabled path 118/0x558 belonging to the dmpnode 288/0x60
Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM
vx

Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

2010-09-16 Thread Victor Engle
Which version of Veritas? Version 4.1 MP2 and version 5.x introduced a
feature called DMP fast recovery. It was probably supposed to be
called DMP fast fail, but "recovery" sounds better. It is supposed to
fail suspect paths more aggressively to speed up failover. But when
you have only one VxVM DMP path, as is the case with MPxIO, and
fast recovery fails that path, you're in trouble. In version 5.x,
it is possible to disable this feature.

Google DMP fast recovery.

http://seer.entsupport.symantec.com/docs/307959.htm

I can imagine there must have been some internal fights at Symantec
between product management and QA to get that feature released.

Vic





On Thu, Sep 16, 2010 at 6:03 AM, Sebastien DAUBIGNE
sebastien.daubi...@atosorigin.com wrote:
  Dear Vx-addicts,

 We encountered a failover issue on this configuration :

 - Solaris 9 HW 9/05
 - SUN SAN (SFS) 4.4.15
 - Emulex with SUN generic driver (emlx)
 - VxVM 5.0-2006-05-11a

 - storage on HP SAN (XP 24K).


 Multipathing is managed by MPxIO (not VxDMP) because the SAN team and HP
 support imposed the Solaris native solution for multipathing :

 VxVM == VxDMP == MPxIO == FCP ...

 We have 2 paths to the switch, linked to 2 paths to the storage, so the
 LUNs have 4 paths, with active/active support.
 Failover operation has been tested successfully by offlining each port
 successively on the SAN.
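
 (As a side note for the archive: a minimal way to see this layering from the host,
 sketched with a placeholder device name. MPxIO presents one scsi_vhci device per LUN,
 so DMP sees a single subpath per dmpnode while the real FC paths live below it:)

   # VxVM/DMP view: with a TPD such as MPxIO, each LUN shows a single enabled path
   vxdisk list
   vxdmpadm getsubpaths dmpnodename=<dmpnode_name>

   # OS view of the paths behind the MPxIO device (substitute a real device name)
   luxadm display /dev/rdsk/<mpxio_device>s2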

 We regularly have transient I/O errors (SCSI timeouts, I/O error retries
 with "Unit attention") due to SAN-side issues. Usually these errors are
 transparently managed by MPxIO/VxVM without impact on the applications.

 Now for the incident we encountered:

 One of the SAN ports was reset; consequently there were some transient
 I/O errors.
 The other SAN port was OK, so the MPxIO multipathing layer should have
 failed the I/O over to the other path, without transmitting the error to the
 VxDMP layer.
 For some reason, it did not fail over the I/O before VxVM caught it as an
 unrecoverable I/O error, disabling the subdisk and consequently the
 filesystem.

 Note the giving up message from scsi layer at 06:23:03 :

 Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-112 disabled path 118/0x558 belonging to the dmpnode 288/0x60
 Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-111 disabled dmpnode 288/0x60
 Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-112 disabled path 118/0x538 belonging to the dmpnode 288/0x20
 Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-112 disabled path 118/0x550 belonging to the dmpnode 288/0x18
 Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-111 disabled dmpnode 288/0x20
 Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-111 disabled dmpnode 288/0x18
 Sep  1 06:18:54 myserver scsi: [ID 107833 kern.warning] WARNING:
 /scsi_vhci/s...@g60060e801527770127773794 (ssd165):
 Sep  1 06:18:54 myserver        SCSI transport failed: reason
 'tran_err': retrying command
 Sep  1 06:19:05 myserver scsi: [ID 107833 kern.warning] WARNING:
 /scsi_vhci/s...@g60060e801527770127773794 (ssd165):
 Sep  1 06:19:05 myserver        SCSI transport failed: reason 'timeout':
 retrying command
 Sep  1 06:21:57 myserver scsi: [ID 107833 kern.warning] WARNING:
 /scsi_vhci/s...@g60060e80152777012777376d (ssd168):
 Sep  1 06:21:57 myserver        SCSI transport failed: reason
 'tran_err': retrying command
 Sep  1 06:22:45 myserver scsi: [ID 107833 kern.warning] WARNING:
 /scsi_vhci/s...@g60060e80152777012777376d (ssd168):
 Sep  1 06:22:45 myserver        SCSI transport failed: reason 'timeout':
 retrying command
 Sep  1 06:23:03 myserver scsi: [ID 107833 kern.warning] WARNING:
 /scsi_vhci/s...@g60060e801527770127773787 (ssd166):
 Sep  1 06:23:03 myserver        SCSI transport failed: reason 'timeout':
 giving up
 Sep  1 06:23:03 myserver vxio: [ID 539309 kern.warning] WARNING: VxVM
 vxio V-5-3-0 voldmp_errbuf_sio_start: Failed to flush the error buffer
 300ce41c340 on device 0x120003a to DMP
 Sep  1 06:23:03 myserver vxio: [ID 771159 kern.warning] WARNING: VxVM
 vxio V-5-0-2 Subdisk mydisk_2-02 block 5935: Uncorrectable write error
 Sep  1 06:23:03 myserver vxfs: [ID 702911 kern.warning] WARNING: msgcnt
 1 mesg 037: V-2-37: vx_metaioerr - vx_logbuf_clean -
 /dev/vx/dsk/mydg/vol1 file system meta data write error in dev/block 0/5935
 Sep  1 06:23:03 myserver vxfs: [ID 702911 kern.warning] WARNING: msgcnt
 2 mesg 031: V-2-31: vx_disable - /dev/vx/dsk/mydg/vol1 file system disabled
 Sep  1 06:23:03 myserver vxfs: [ID 702911 kern.warning] WARNING: msgcnt
 3 mesg 037: V-2-37: vx_metaioerr - vx_inode_iodone -
 /dev/vx/dsk/mydg/vol1 file system meta data write error in dev/block
 0/265984


 It seems VxDMP gets the I/O error at the same time as MPxIO: I thought
 MPxIO would have concealed the I/O error until failover had occurred, which
 

Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

2010-09-16 Thread William Havey
Sebastien,

I found the following in my notes:

- See http://seer.entsupport.symantec.com/docs/288497.htm and
http://support.veritas.com/docs/276602

- If MPxIO is enabled on a host, the Veritas DMP tunable "dmp_fast_recovery"
must be set to off, after SF is installed.

vxdmpadm gettune dmp_fast_recovery
            Tunable               Current Value  Default Value
------------------------------    -------------  -------------
dmp_fast_recovery                        on             on

vxdmpadm settune dmp_fast_recovery=off   (where is this stored?)

I am not certain what the effect of not turning off the tunable is, but is
dmp_fast_recovery set to off on your system?

Bill
On Thu, Sep 16, 2010 at 6:03 AM, Sebastien DAUBIGNE 
sebastien.daubi...@atosorigin.com wrote:

  Dear Vx-addicts,

 We encountered a failover issue on this configuration :

 - Solaris 9 HW 9/05
 - SUN SAN (SFS) 4.4.15
 - Emulex with SUN generic driver (emlx)
 - VxVM 5.0-2006-05-11a

 - storage on HP SAN (XP 24K).


 Multipathing is managed by MPxIO (not VxDMP) because the SAN team and HP
 support imposed the Solaris native solution for multipathing :

 VxVM == VxDMP == MPxIO == FCP ...

 We have 2 paths to the switch, linked to 2 paths to the storage, so the
 LUNs have 4 paths, with active/active support.
 Failover operation has been tested successfully by offlining each port
 successively on the SAN.

 We regularly have transient I/O errors (SCSI timeouts, I/O error retries
 with "Unit attention") due to SAN-side issues. Usually these errors are
 transparently managed by MPxIO/VxVM without impact on the applications.

 Now for the incident we encountered:

 One of the SAN ports was reset; consequently there were some transient
 I/O errors.
 The other SAN port was OK, so the MPxIO multipathing layer should have
 failed the I/O over to the other path, without transmitting the error to the
 VxDMP layer.
 For some reason, it did not fail over the I/O before VxVM caught it as an
 unrecoverable I/O error, disabling the subdisk and consequently the
 filesystem.

 Note the giving up message from scsi layer at 06:23:03 :

 Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-112 disabled path 118/0x558 belonging to the dmpnode 288/0x60
 Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-111 disabled dmpnode 288/0x60
 Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-112 disabled path 118/0x538 belonging to the dmpnode 288/0x20
 Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-112 disabled path 118/0x550 belonging to the dmpnode 288/0x18
 Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-111 disabled dmpnode 288/0x20
 Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-111 disabled dmpnode 288/0x18
 Sep  1 06:18:54 myserver scsi: [ID 107833 kern.warning] WARNING:
 /scsi_vhci/s...@g60060e801527770127773794 (ssd165):
 Sep  1 06:18:54 myserverSCSI transport failed: reason
 'tran_err': retrying command
 Sep  1 06:19:05 myserver scsi: [ID 107833 kern.warning] WARNING:
 /scsi_vhci/s...@g60060e801527770127773794 (ssd165):
 Sep  1 06:19:05 myserverSCSI transport failed: reason 'timeout':
 retrying command
 Sep  1 06:21:57 myserver scsi: [ID 107833 kern.warning] WARNING:
 /scsi_vhci/s...@g60060e80152777012777376d (ssd168):
 Sep  1 06:21:57 myserverSCSI transport failed: reason
 'tran_err': retrying command
 Sep  1 06:22:45 myserver scsi: [ID 107833 kern.warning] WARNING:
 /scsi_vhci/s...@g60060e80152777012777376d (ssd168):
 Sep  1 06:22:45 myserverSCSI transport failed: reason 'timeout':
 retrying command
 Sep  1 06:23:03 myserver scsi: [ID 107833 kern.warning] WARNING:
 /scsi_vhci/s...@g60060e801527770127773787 (ssd166):
 Sep  1 06:23:03 myserverSCSI transport failed: reason 'timeout':
 giving up
 Sep  1 06:23:03 myserver vxio: [ID 539309 kern.warning] WARNING: VxVM
 vxio V-5-3-0 voldmp_errbuf_sio_start: Failed to flush the error buffer
 300ce41c340 on device 0x120003a to DMP
 Sep  1 06:23:03 myserver vxio: [ID 771159 kern.warning] WARNING: VxVM
 vxio V-5-0-2 Subdisk mydisk_2-02 block 5935: Uncorrectable write error
 Sep  1 06:23:03 myserver vxfs: [ID 702911 kern.warning] WARNING: msgcnt
 1 mesg 037: V-2-37: vx_metaioerr - vx_logbuf_clean -
 /dev/vx/dsk/mydg/vol1 file system meta data write error in dev/block 0/5935
 Sep  1 06:23:03 myserver vxfs: [ID 702911 kern.warning] WARNING: msgcnt
 2 mesg 031: V-2-31: vx_disable - /dev/vx/dsk/mydg/vol1 file system disabled
 Sep  1 06:23:03 myserver vxfs: [ID 702911 kern.warning] WARNING: msgcnt
 3 mesg 037: V-2-37: vx_metaioerr - vx_inode_iodone -
 /dev/vx/dsk/mydg/vol1 file system meta data write error in dev/block
 0/265984


 It seems VxDMP gets the I/O error at the same time as MPxIO: I thought
 MPxIO would have concealed the I/O error until failover had occurred, which
 is not the

Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

2010-09-16 Thread Sebastien DAUBIGNE
  Thank you Victor and William, it seems to be a very good lead.

Unfortunately, this tunable seems not to be supported in the VxVM 
version installed on my system :

  vxdmpadm gettune dmp_fast_recovery
VxVM vxdmpadm ERROR V-5-1-12015  Incorrect tunable
vxdmpadm gettune [tunable name]
Note - Tunable name can be dmp_failed_io_threshold, dmp_retry_count, 
dmp_pathswitch_blks_shift, dmp_queue_depth, dmp_cache_open, 
dmp_daemon_count, dmp_scsi_timeout, dmp_delayq_interval, dmp_path_age, 
or dmp_stat_interval

Something is odd, because my version is 5.0 MP3 Solaris SPARC and, according 
to http://seer.entsupport.symantec.com/docs/316981.htm, this tunable 
should be available.

  modinfo | grep -i vx
  38 7846a000  3800e 288   1  vxdmp (VxVM 5.0-2006-05-11a: DMP Drive)
  40 784a4000 334c40 289   1  vxio (VxVM 5.0-2006-05-11a I/O driver)
  42 783ec71ddf8 290   1  vxspec (VxVM 5.0-2006-05-11a control/st)
296 78cfb0a2c6b 291   1  vxportal (VxFS 5.0_REV-5.0A55_sol portal )
297 78d6c000 1b9d4f   8   1  vxfs (VxFS 5.0_REV-5.0A55_sol SunOS 5)
298 78f18000   a270 292   1  fdd (VxQIO 5.0_REV-5.0A55_sol Quick )





Le 16/09/2010 12:15, Victor Engle a écrit :
 Which version of Veritas? Version 4.1 MP2 and version 5.x introduced a
 feature called DMP fast recovery. It was probably supposed to be
 called DMP fast fail, but "recovery" sounds better. It is supposed to
 fail suspect paths more aggressively to speed up failover. But when
 you have only one VxVM DMP path, as is the case with MPxIO, and
 fast recovery fails that path, you're in trouble. In version 5.x,
 it is possible to disable this feature.

 Google DMP fast recovery.

 http://seer.entsupport.symantec.com/docs/307959.htm

 I can imagine there must have been some internal fights at Symantec
 between product management and QA to get that feature released.

 Vic





 On Thu, Sep 16, 2010 at 6:03 AM, Sebastien DAUBIGNE
 sebastien.daubi...@atosorigin.com  wrote:
   Dear Vx-addicts,

 We encountered a failover issue on this configuration :

 - Solaris 9 HW 9/05
 - SUN SAN (SFS) 4.4.15
 - Emulex with SUN generic driver (emlx)
 - VxVM 5.0-2006-05-11a

 - storage on HP SAN (XP 24K).


 Multipathing is managed by MPxIO (not VxDMP) because the SAN team and HP
 support imposed the Solaris native solution for multipathing :

 VxVM ==  VxDMP ==  MPxIO ==  FCP ...

 We have 2 paths to the switch, linked to 2 paths to the storage, so the
 LUNs have 4 paths, with active/active support.
 Failover operation has been tested successfully by offlining each port
 successively on the SAN.

 We regularly have transient I/O errors (SCSI timeouts, I/O error retries
 with "Unit attention") due to SAN-side issues. Usually these errors are
 transparently managed by MPxIO/VxVM without impact on the applications.

 Now for the incident we encountered:

 One of the SAN ports was reset; consequently there were some transient
 I/O errors.
 The other SAN port was OK, so the MPxIO multipathing layer should have
 failed the I/O over to the other path, without transmitting the error to the
 VxDMP layer.
 For some reason, it did not fail over the I/O before VxVM caught it as an
 unrecoverable I/O error, disabling the subdisk and consequently the
 filesystem.

 Note the giving up message from scsi layer at 06:23:03 :

 Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-112 disabled path 118/0x558 belonging to the dmpnode 288/0x60
 Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-111 disabled dmpnode 288/0x60
 Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-112 disabled path 118/0x538 belonging to the dmpnode 288/0x20
 Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-112 disabled path 118/0x550 belonging to the dmpnode 288/0x18
 Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-111 disabled dmpnode 288/0x20
 Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-111 disabled dmpnode 288/0x18
 Sep  1 06:18:54 myserver scsi: [ID 107833 kern.warning] WARNING:
 /scsi_vhci/s...@g60060e801527770127773794 (ssd165):
 Sep  1 06:18:54 myserverSCSI transport failed: reason
 'tran_err': retrying command
 Sep  1 06:19:05 myserver scsi: [ID 107833 kern.warning] WARNING:
 /scsi_vhci/s...@g60060e801527770127773794 (ssd165):
 Sep  1 06:19:05 myserverSCSI transport failed: reason 'timeout':
 retrying command
 Sep  1 06:21:57 myserver scsi: [ID 107833 kern.warning] WARNING:
 /scsi_vhci/s...@g60060e80152777012777376d (ssd168):
 Sep  1 06:21:57 myserverSCSI transport failed: reason
 'tran_err': retrying command
 Sep  1 06:22:45 myserver scsi: [ID 107833 kern.warning] WARNING:
 /scsi_vhci/s...@g60060e80152777012777376d (ssd168):
 Sep  1 06:22:45 myserverSCSI transport failed: reason 'timeout':
 retrying command
 Sep  1 06:23:03 myserver scsi: [ID 107833 kern.warning] WARNING:
 

Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

2010-09-16 Thread Joshua Fielden
dmp_fast_recovery is a mechanism by which we bypass the sd/scsi stack and send 
path inquiry/status CDBs directly from the HBA in order to bypass long SCSI 
queues and recover paths faster. With a TPD (third-party driver) such as MPxIO, 
bypassing the stack means we bypass the TPD completely, and interactions such 
as this can happen. The vxesd (event-source daemon) is another 5.0/MP2 backport 
addition that's moot in the presence of a TPD.

From your modinfo, you're not actually running MP3. This technote 
(http://seer.entsupport.symantec.com/docs/327057.htm) isn't exactly your 
scenario, but looking for partially-installed pkgs is a good start to getting 
your server correctly installed, then the tuneable should work -- very early 
5.0 versions had a differently-named tuneable I can't find in my mail archive 
ATM.
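
(A minimal sketch of that package check, assuming the stock Solaris packaging tools; 
VRTSvxvm and VRTSvxfs are the usual Storage Foundation package names:)

  # Any packages left in a partially-installed state?
  pkginfo -p

  # Revisions of the VxVM / VxFS packages that are actually installed
  pkginfo -l VRTSvxvm VRTSvxfs | egrep 'PKGINST|VERSION'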

Cheers,

Jf

-Original Message-
From: veritas-vx-boun...@mailman.eng.auburn.edu 
[mailto:veritas-vx-boun...@mailman.eng.auburn.edu] On Behalf Of Sebastien 
DAUBIGNE
Sent: Thursday, September 16, 2010 7:41 AM
To: Veritas-vx@mailman.eng.auburn.edu
Subject: Re: [Veritas-vx] Solaris-SFS / MPxIO / VxVM failover issue

  Thank you Victor and William, it seems to be a very good lead.

Unfortunately, this tunable seems not to be supported in the VxVM 
version installed on my system :

  vxdmpadm gettune dmp_fast_recovery
VxVM vxdmpadm ERROR V-5-1-12015  Incorrect tunable
vxdmpadm gettune [tunable name]
Note - Tunable name can be dmp_failed_io_threshold, dmp_retry_count, 
dmp_pathswitch_blks_shift, dmp_queue_depth, dmp_cache_open, 
dmp_daemon_count, dmp_scsi_timeout, dmp_delayq_interval, dmp_path_age, 
or dmp_stat_interval

Something odd because my version is 5.0 MP3 Solaris SPARC, and according 
to http://seer.entsupport.symantec.com/docs/316981.htm this tunable 
should be available.

  modinfo | grep -i vx
  38 7846a000  3800e 288   1  vxdmp (VxVM 5.0-2006-05-11a: DMP Drive)
  40 784a4000 334c40 289   1  vxio (VxVM 5.0-2006-05-11a I/O driver)
  42 783ec71ddf8 290   1  vxspec (VxVM 5.0-2006-05-11a control/st)
296 78cfb0a2c6b 291   1  vxportal (VxFS 5.0_REV-5.0A55_sol portal )
297 78d6c000 1b9d4f   8   1  vxfs (VxFS 5.0_REV-5.0A55_sol SunOS 5)
298 78f18000   a270 292   1  fdd (VxQIO 5.0_REV-5.0A55_sol Quick )





Le 16/09/2010 12:15, Victor Engle a écrit :
 Which version of Veritas? Version 4.1 MP2 and version 5.x introduced a
 feature called DMP fast recovery. It was probably supposed to be
 called DMP fast fail, but "recovery" sounds better. It is supposed to
 fail suspect paths more aggressively to speed up failover. But when
 you have only one VxVM DMP path, as is the case with MPxIO, and
 fast recovery fails that path, you're in trouble. In version 5.x,
 it is possible to disable this feature.

 Google DMP fast recovery.

 http://seer.entsupport.symantec.com/docs/307959.htm

 I can imagine there must have been some internal fights at Symantec
 between product management and QA to get that feature released.

 Vic





 On Thu, Sep 16, 2010 at 6:03 AM, Sebastien DAUBIGNE
 sebastien.daubi...@atosorigin.com  wrote:
   Dear Vx-addicts,

 We encountered a failover issue on this configuration :

 - Solaris 9 HW 9/05
 - SUN SAN (SFS) 4.4.15
 - Emulex with SUN generic driver (emlx)
 - VxVM 5.0-2006-05-11a

 - storage on HP SAN (XP 24K).


 Multipathing is managed by MPxIO (not VxDMP) because the SAN team and HP
 support imposed the Solaris native solution for multipathing :

 VxVM ==  VxDMP ==  MPxIO ==  FCP ...

 We have 2 paths to the switch, linked to 2 paths to the storage, so the
 LUNs have 4 paths, with active/active support.
 Failover operation has been tested successfully by offlining each port
 successively on the SAN.

 We regularly have transient I/O errors (SCSI timeouts, I/O error retries
 with "Unit attention") due to SAN-side issues. Usually these errors are
 transparently managed by MPxIO/VxVM without impact on the applications.

 Now for the incident we encountered:

 One of the SAN ports was reset; consequently there were some transient
 I/O errors.
 The other SAN port was OK, so the MPxIO multipathing layer should have
 failed the I/O over to the other path, without transmitting the error to the
 VxDMP layer.
 For some reason, it did not fail over the I/O before VxVM caught it as an
 unrecoverable I/O error, disabling the subdisk and consequently the
 filesystem.

 Note the giving up message from scsi layer at 06:23:03 :

 Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-112 disabled path 118/0x558 belonging to the dmpnode 288/0x60
 Sep  1 06:18:54 myserver vxdmp: [ID 824220 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-111 disabled dmpnode 288/0x60
 Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-112 disabled path 118/0x538 belonging to the dmpnode 288/0x20
 Sep  1 06:18:54 myserver vxdmp: [ID 917986 kern.notice] NOTICE: VxVM
 vxdmp V-5-0-112 disabled path 118/0x550 belonging to the dmpnode 288/0x18
 Sep  1 06:18:54