Re: [Veritas-vx] Question over DMP partitionsize
Hi All,

Thanks for all the information and suggestions; it has given me a lot to look at. Hopefully I've covered all your questions below.

I've checked and there's no evidence I can find that anyone has made any changes against the arrays, either via the command line or VEA (the latter we don't use). However, comparing the tunable values against the defaults, there are two that differ:

fuj411:/root# vxdmpadm gettune all
            Tunable               Current Value    Default Value
--------------------------------  -------------    -------------
dmp_failed_io_threshold               57600            57600
dmp_retry_count                           5                5
dmp_pathswitch_blks_shift                11               11
dmp_queue_depth                          32               32
dmp_cache_open                          off              off
dmp_daemon_count                         10               10
dmp_scsi_timeout                         30               30
dmp_delayq_interval                      15               15
dmp_path_age                              0              300
dmp_stat_interval                         1                1
dmp_health_time                           0               60
dmp_probe_idle_lun                       on               on
dmp_log_level                             1                1
dmp_retry_timeout                         0                0
fuj411:/root#

I don't know who would have changed these, or when, or whether they were changed at all.

There's a lot of disk seen down each path (physical HBA). We have an A and a B side to the SAN, with the disk presented down both sides to separate HBAs on the server. The errors and other messages are always against the same array (IBM_SHARK1), which is the one that has the different value set and the one that sees the throttling.

The version of VxVM is 4.1 MP2, so I know we are downlevel and don't have the new default value of 512 that came in with 5.1 MP3.

All paths are active and I/O is seen going down both paths to all disks. I can see this at the server and at the SAN port level. The storage team have checked and they cannot find anything untoward. I've also had a look at the HBAs as well (fcinfo hba-port -l HBA_WWN).

The recovery option on both arrays is the same and set to the following:
fuj411:/root# vxdmpadm getattr enclosure IBM_SHARK0 recoveryoption
ENCLR-NAME    RECOVERY-OPTION    DEFAULT[VAL]      CURRENT[VAL]
===============================================================
IBM_SHARK0    Throttle           Timebound[10]     Timebound[10]
IBM_SHARK0    Error-Retry        Fixed-Retry[5]    Fixed-Retry[5]
fuj411:/root#

fuj411:/root# vxdmpadm getattr enclosure IBM_SHARK1 recoveryoption
ENCLR-NAME    RECOVERY-OPTION    DEFAULT[VAL]      CURRENT[VAL]
===============================================================
IBM_SHARK1    Throttle           Timebound[10]     Timebound[10]
IBM_SHARK1    Error-Retry        Fixed-Retry[5]    Fixed-Retry[5]
fuj411:/root#

My suspicion is that we have some kind of fault; the difficulty is identifying where. I suspect resolving it will involve getting the server up to the latest patch levels, both in the OS and within the installed VxVM version, for starters, and getting the storage team to carry out a full end-to-end test of the hardware.

In the meantime, thanks again for all your help with this, very much appreciated. I think I will take Dmitry's advice and log a call with Symantec to see if they can explain the difference and whether it's just the way this array type operates.

Thanks again.

Cheers
Phil.

Dmitry Glushenok <gl...@jet.msk.su>    26/07/2012 21:09
To: William Havey <bbha...@gmail.com>, phil.cole...@ba.com
Cc: veritas-vx@mailman.eng.auburn.edu
Subject: Re: [Veritas-vx] Question over DMP partitionsize
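As a quick way to spot tunables that have drifted from their defaults, here is a rough sketch of parsing `vxdmpadm gettune all` output. This is my own illustration, not a Veritas tool: the three-column layout is assumed from the listing above, and the sample text is abridged.

```python
# Sketch: flag DMP tunables whose current value differs from the default.
# Assumes the "Tunable / Current Value / Default Value" three-column
# layout shown above; column spacing may vary between VxVM releases.

SAMPLE = """\
dmp_failed_io_threshold 57600 57600
dmp_retry_count 5 5
dmp_path_age 0 300
dmp_health_time 0 60
dmp_probe_idle_lun on on
"""

def non_default_tunables(text):
    """Return (name, current, default) for each tunable whose current
    value differs from its default value."""
    changed = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:          # skip headers/blank lines
            name, current, default = parts
            if current != default:
                changed.append((name, current, default))
    return changed

for name, cur, dflt in non_default_tunables(SAMPLE):
    print(f"{name}: current={cur} default={dflt}")
```

Run against the full listing above, this would report only dmp_path_age and dmp_health_time, the two values Phil noticed.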
Re: [Veritas-vx] Question over DMP partitionsize
Your situation helps me understand the internals of DMP that much more. Sorry my benefit comes at your disadvantage. Good luck with the mystery.

BTW, a tip of the hat to you for using the word "untoward". Nice to read a tech person using our language so well.

On Fri, Jul 27, 2012 at 6:26 AM, phil.cole...@ba.com wrote:
[...]
Re: [Veritas-vx] Question over DMP partitionsize
Hi William,

Thanks for the reply. I'm not sure how to get this track cache size?

What's confusing me most here is that the values are so different between the two arrays. They're identical models, set up the same and with the same number of disks allocated to the servers (sorry, I forgot to mention they're in an HA pair using VCS). The only difference is that IBM_SHARK0 is local to the server where the workload is currently running, and IBM_SHARK1 is in another building about 1.5 km away.

It's this disparity that is confusing me and making me wonder whether an issue we are seeing is being caused by this, or whether it's an indication of an issue, though I've been informed by our storage people that there's nothing wrong with either array. This is certainly not something we've changed, so I don't know if VxVM/DMP is throttling things back because it's seeing a problem. I have seen messages in the /etc/vx/dmpevents.log file for disks in the IBM_SHARK1 array reporting 'Throttled Path' and then 'Un-throttled Path', and I'm trying to work out if the two are linked.

Cheers
Phil.

William Havey <bbha...@gmail.com>    26/07/2012 15:48
To: phil.cole...@ba.com
Subject: Re: [Veritas-vx] Question over DMP partitionsize

Partitionsize is in play when the iopolicy is Balanced, which is the default policy. You have Balanced. It is defined by the partitionsize attribute: each successive I/O starting within this range (default is 2048 sectors) goes through the same path as the previous I/O. The man page adds that it "takes the track cache into consideration when balancing I/O across paths". What is the track cache size on each array? It seems that the partitionsize should be the same value as the track cache size.

On Thu, Jul 26, 2012 at 7:09 AM, phil.cole...@ba.com wrote:

Hi,

I'm trying to understand why one array has a wildly different partitionsize value to the other:
fuj411:/root# vxdmpadm getattr arrayname IBM_SHARK partitionsize
ENCLR_NAME      DEFAULT     CURRENT
IBM_SHARK0      2048        2048
IBM_SHARK1      256         512
fuj411:/root#

Both arrays are IBM ESS Shark 2105s, which are active/active and operating with the Balanced I/O policy:

fuj411:/root# vxdmpadm getattr arrayname IBM_SHARK iopolicy
ENCLR_NAME      DEFAULT     CURRENT
IBM_SHARK0      Balanced    Balanced
IBM_SHARK1      Balanced    Balanced
fuj411:/root#

The server is a Fujitsu PW850 running Solaris 10 with VxVM 4.1 MP2. The volumes are mirrored between the two arrays.

I'm trying to get a better understanding of what this means and why it would be so different between the two arrays. Any help greatly appreciated.

Cheers
Phil.

--
This message is private and confidential and may also be legally privileged. If you have received this message in error, please email it back to the sender and immediately permanently delete it from your computer system. Please do not read, print, re-transmit, store or act in reliance on it or any attachments. British Airways may monitor email traffic data and also the content of emails, where permitted by law, for the purposes of security and staff training and in order to prevent or detect unauthorised use of the British Airways email system. Virus checking of emails (including attachments) is the responsibility of the recipient. British Airways Plc is a public limited company registered in England and Wales. Registered number: 177. Registered office: Waterside, PO Box 365, Harmondsworth, West Drayton, Middlesex, England, UB7 0GB. Additional terms and conditions are available on our website: www.ba.com

___
Veritas-vx maillist - Veritas-vx@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-vx
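For readers following the partitionsize discussion: the Balanced policy, as William describes it, keeps successive I/Os that start within one partitionsize window on the same path. A toy model (my own illustration of the described behaviour, not DMP's actual code) shows why a smaller partitionsize means more frequent path switching for sequential I/O:

```python
# Toy model of a Balanced-style I/O policy: every I/O whose starting
# block falls inside the same partitionsize-sized window goes down the
# same path, so windows are spread round-robin across paths.
# Illustrative only -- not DMP's implementation.

def select_path(start_block, partitionsize, num_paths):
    """Pick a path index for an I/O starting at start_block."""
    window = start_block // partitionsize   # which partitionsize window
    return window % num_paths               # spread windows across paths

# With partitionsize=2048 and 2 paths, blocks 0..2047 share path 0:
assert select_path(0, 2048, 2) == select_path(2047, 2048, 2) == 0
# With partitionsize=512, those same two blocks land on different paths:
assert select_path(0, 512, 2) != select_path(2047, 512, 2)
```

Under this model the 512 vs 2048 difference between SHARK1 and SHARK0 would change how often sequential I/O hops between paths, though as the thread notes, whether that matters depends on how random the workload is.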
Re: [Veritas-vx] Question over DMP partitionsize
DMP is most likely throttling things back due to an issue on a path. I/O throttling is a mechanism by which Dynamic Multi-Pathing (DMP) temporarily stops issuing I/Os to paths that appear to be either overloaded or underperforming. There is a default I/O throttling mechanism in DMP based on the number of requests queued on a path.

I am researching why you are seeing differing partition sizes. I would think it has something to do with configuration at the OS level or in the VxVM area, not the array itself.

Regards,
Terrie Douglas
Sr. Prin. Technical Support Engineer
Symantec Software Corporation
Email: terrie_doug...@symantec.com
Customer Support: 1 (800) 342-0652
View your case online at: https://mysupport.symantec.com
Save time and visit the Veritas Installation Assessment Services website and check out our automated tools: https://vias.symantec.com/main.php

This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law or may constitute attorney work product. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this communication in error, notify us immediately by telephone and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication.

-----Original Message-----
From: veritas-vx-boun...@mailman.eng.auburn.edu [mailto:veritas-vx-boun...@mailman.eng.auburn.edu] On Behalf Of phil.cole...@ba.com
Sent: Thursday, July 26, 2012 8:56 AM
To: William Havey
Cc: veritas-vx@mailman.eng.auburn.edu
Subject: Re: [Veritas-vx] Question over DMP partitionsize

[...]
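The queue-depth mechanism Terrie describes can be pictured with a toy model: stop issuing I/O to a path once its outstanding-request count reaches a limit, and un-throttle when completions drain the queue. This is a sketch of the described idea only, not DMP's code; the limit of 32 mirrors the dmp_queue_depth tunable shown earlier in the thread.

```python
# Toy model of queue-depth-based I/O throttling: a path stops accepting
# new I/O once its outstanding-request count reaches a limit, and is
# un-throttled when completions drain the queue back below the limit.
# Illustrative only -- not DMP's implementation.

class Path:
    def __init__(self, queue_depth_limit=32):
        self.limit = queue_depth_limit
        self.outstanding = 0
        self.throttled = False

    def try_issue(self):
        """Issue one I/O unless the path is saturated."""
        if self.outstanding >= self.limit:
            self.throttled = True       # analogous to a 'Throttled Path' event
            return False
        self.outstanding += 1
        return True

    def complete(self):
        """One I/O finished; un-throttle once the queue drains."""
        if self.outstanding > 0:
            self.outstanding -= 1
        if self.throttled and self.outstanding < self.limit:
            self.throttled = False      # analogous to 'Un-throttled Path'

p = Path(queue_depth_limit=2)
assert p.try_issue() and p.try_issue()    # queue fills to the limit
assert not p.try_issue() and p.throttled  # third I/O is held back
p.complete()
assert not p.throttled                    # draining un-throttles the path
```

In this picture, the 'Throttled Path' / 'Un-throttled Path' pairs Phil sees in dmpevents.log would correspond to the queue on a SHARK1 path filling and draining.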
Re: [Veritas-vx] Question over DMP partitionsize
Phil,

I don't believe any DMP values are changed by vxconfigd automatically. To be certain the partitionsize has not been manually changed: is VxVM command line logging enabled? Run vxcmdlog -l to find out. If yes, look in /var/adm/vx/cmdlog for any vxdmpadm setattr commands. Do you use VEA? If so, its commands are logged in /var/adm/vx/veacmdlog. But I don't believe these values can be set in VEA.

I believe the throttle and unthrottle entries in the dmpevents log file mean I/O is being held back from being addressed and sent out to storage, but I certainly can't say whether the two (the partitionsize of 512 and the log entries for SHARK1) are related or not.

Are all paths being used for I/O just about the same?

vxdmpadm iostat start
vxdmpadm iostat reset
vxdmpadm iostat show enclosure=IBM_SHARK0 enclosure=IBM_SHARK1 interval=5

The first iteration of the utility shows cumulative statistics from the time of mounting the file systems, so ignore the first output. If the totals are about equal, then the partitionsize value may be irrelevant. Perhaps the I/O is truly random, so that no I/O address is within 512 sectors of the previous I/O. Even in a random environment, numerous successive I/Os can sometimes fall within 512 sectors of each other, so that throttling of those I/Os kicks in.

IHTH,
Bill

On Thu, Jul 26, 2012 at 11:56 AM, phil.cole...@ba.com wrote:
[...]
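To follow up on Phil's question about whether the 'Throttled Path' and 'Un-throttled Path' events are linked, it can help to pair them up and see how long each throttle episode lasted. A rough sketch follows; the log line format here is hypothetical (I don't have a 4.1 dmpevents.log to hand), so the parsing would need adjusting to match the real /etc/vx/dmpevents.log layout.

```python
# Sketch: pair 'Throttled Path' / 'Un-throttled Path' events per path
# to measure how long each throttle episode lasted.  The timestamp and
# line format below are assumptions -- adapt to the real dmpevents.log.

from datetime import datetime

SAMPLE_LOG = """\
Thu Jul 26 10:00:01 2012 Throttled Path c2t5d0
Thu Jul 26 10:00:09 2012 Un-throttled Path c2t5d0
Thu Jul 26 10:05:00 2012 Throttled Path c3t5d0
Thu Jul 26 10:05:30 2012 Un-throttled Path c3t5d0
"""

def throttle_durations(log_text):
    """Return {path: [seconds_throttled, ...]} from paired events."""
    started = {}     # path -> timestamp of its open 'Throttled' event
    durations = {}
    for line in log_text.splitlines():
        parts = line.split()
        stamp = datetime.strptime(" ".join(parts[:5]),
                                  "%a %b %d %H:%M:%S %Y")
        event, path = " ".join(parts[5:-1]), parts[-1]
        if event == "Throttled Path":
            started[path] = stamp
        elif event == "Un-throttled Path" and path in started:
            secs = (stamp - started.pop(path)).total_seconds()
            durations.setdefault(path, []).append(secs)
    return durations

print(throttle_durations(SAMPLE_LOG))
```

If the episodes cluster on SHARK1 paths and line up with the error timestamps, that would support the theory of a path-level fault on that array rather than anything partitionsize-related.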