Re: Tape Reliability Recommendations
Hi Peter,

I sincerely empathize, as I was living through the same martyrdom some months ago. We had a brand-new 3584 library, equipped with SCSI LTO drives, attached to the server through 2108 SAN Data Gateways and Fibre Channel. After months of investigation, upgrades of all kinds, drive exchanges and so on, we finally found that the fiber cables were slightly too long, which caused timeout errors and subsequent tape failures.

I don't know whether this could be your case, but it is worth a look if you haven't checked already.

My 2 cents!

Arnaud

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
| Arnaud Brion, Panalpina Management Ltd., IT Group |
| Viaduktstrasse 42, P.O. Box, 4002 Basel - Switzerland |
| Phone: +41 61 226 19 78 / Fax: +41 61 226 17 01 |
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

-----Original Message-----
From: Peter Ford [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, 18 February, 2003 19:43
To: [EMAIL PROTECTED]
Subject: Re: Tape Reliability Recommendations

I have posted here before, but we have been experiencing an incredibly high number of read errors with our 3584 LTO library. We regularly see errors when trying to restore data from tapes. ...snip...
Re: Tape Reliability Recommendations
-----Original Message-----
From: Peter Ford [mailto:[EMAIL PROTECTED]]

I have posted here before, but we have been experiencing an incredibly high number of read errors with our 3584 LTO library. ...snip... There is no discernable pattern to these errors (across multiple tapes and multiple drives). ...snip...

A problem we've had several times up here in the Great Dry North this winter has been an environmental one. The 3583 library is fairly vulnerable to low humidity. While the docs say that 20% is the minimum required for proper operation, we've found that 40% is really the minimum needed, particularly in server rooms that are not really equipped as server rooms; i.e., carpet on the floor, no raised floor, lots of foot traffic, etc. If you can scoot your feet around, touch the outside of the library cabinet, and get **zapped**, you've got a problem. (Your server room should be at 40% in any event; tape 'floats' best across tape heads at that humidity.)

IBM found a workaround for the lack of humidity. At two sites I've been to, they've taken one of those yellow-and-green grounding straps they use to ground mainframe boxes, and attached the library's outside panel to a decent ground. One customer went from multiple, daily, severe problems to no problems at all in one day. (Many thanks go to Bryan Hanson, IBM tape Top Gun, for the fix.)

The problem is that the 3583 is a complex machine that combines many moving mechanical parts and electronics in a relatively small metal box. When a static discharge hits the box, its relatively small surface area lets a substantial charge pass through the box rather than dissipating across its surface. Larger libraries (like the 3584) can dissipate the charge faster and are therefore less vulnerable.

Don't trust those Wal-Mart temperature/humidity meters. If you're having 3583 problems that can't seem to get fixed, and your environment looks like the one I've described above, get a good meter and check your server room.

--
Mark Stapleton ([EMAIL PROTECTED])
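The two humidity thresholds in the message above (20% per the docs, 40% in practice) can be turned into a quick check for logged meter readings. This is only a minimal sketch: the function name, the constant names, and the three category labels are hypothetical, and the thresholds are simply the figures quoted in the post, not values taken from IBM documentation.

```python
# Thresholds as quoted in the post above; all names here are hypothetical.
DOCUMENTED_MIN_RH = 20.0  # percent relative humidity, minimum per the docs as quoted
PRACTICAL_MIN_RH = 40.0   # percent relative humidity, minimum the poster found workable


def classify_humidity(relative_humidity_pct: float) -> str:
    """Classify one relative-humidity reading against the two thresholds."""
    if not 0.0 <= relative_humidity_pct <= 100.0:
        raise ValueError("relative humidity must be between 0 and 100 percent")
    if relative_humidity_pct < DOCUMENTED_MIN_RH:
        return "below documented minimum"
    if relative_humidity_pct < PRACTICAL_MIN_RH:
        return "within spec, static risk"
    return "ok"


if __name__ == "__main__":
    # Made-up readings, purely for illustration.
    for reading in (15.0, 30.0, 45.0):
        print(f"{reading:.0f}% RH: {classify_humidity(reading)}")
```

A reading in the middle band is the case described in the post: the room meets the documented minimum, yet static discharge can still reach the library.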
Re: Tape Reliability Recommendations
Adding a bit of my own experience to Kelly's:

Even though we are all using TSM, we use the hardware differently. At one time we had two DLT libraries, with libraries and drives provided by the same vendor. Identical hardware, manufacturer, media, and microcode, with the same level of TSM running on each. With one we got perfect reliability; the other was a nightmare: constant read and write errors, and many drive failures. We couldn't stop the problems, even after replacing all the drives more than once.

After a LOT of time spent with the vendor and hardware gurus, we finally learned: part of it is just the total load on the hardware, and the other part is the TYPE of backups you run.

- If you are doing a LOT OF TINY files (for example, workstation/desktop backups), you will get a tremendous amount of start/stop activity (or call it backhitch, or repositioning, whatever) during migration to the tape, and during reclaims. TSM uses tape almost like a direct access device, and this pushes the media and the drive to their max capability. You need the best drive mechanics and the best media you can buy. And you will need to be on GOOD TERMS with your vendor and stay on top of those microcode levels.

- If you are dumping a FEW BIG FILES daily (for example, huge databases), you tend to write one or two big files on the tape and you're done. The tape sits in your vault for a while, then you do the same thing to the tape again. Even though you may be writing MORE GB PER DAY than with the workstation model above, it's far less stress on the drives and the media. You will probably get better reliability on your drives/media than someone doing workstation backups with the same hardware.

My opinion and nobody else's,
Wanda Prather

-----Original Message-----
From: Kelly J. Lipp [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 18, 2003 1:32 PM
To: [EMAIL PROTECTED]
Subject: Re: Tape Reliability Recommendations

I have done a significant amount of testing and have quite a lot of practical experience with what I will refer to as the Big Three tape technologies: AIT3, LTO1, SDLT320. ...snip...
Re: Tape Reliability Recommendations
-----Original Message-----
From: Kelly J. Lipp [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 18, 2003 10:32 AM
To: [EMAIL PROTECTED]
Subject: Re: Tape Reliability Recommendations

As for reliability. That turns out to be a very mixed bag.
...snip...
For instance, we have a site with a large fiber channel and LTO configuration. No end to the problems so far and they are very serious problems. Is this a result of the tape technology? I doubt it, but one never knows, do one?

I would be very curious to hear what type of reliability problems you have seen with LTO. I have posted here before, but we have been experiencing an incredibly high number of read errors with our 3584 LTO library. We regularly see errors when trying to restore data from tapes. We have been auditing volumes recently and have seen errors on a tape during one audit, and then audit again, with no errors. There is no discernable pattern to these errors (across multiple tapes and multiple drives). Due to the nature of the data we are backing up, the data does not change often (and therefore the tapes are generally written to once, and the data stays there), so over-used tapes should not be an issue.

Anything that you could share with the list, or me directly, would be greatly appreciated.

Thanks.

Peter

Peter Ford
System Engineer
Stentor, Inc.
5000 Marina Blvd, Brisbane, CA 94005-1811
Main Phone: 650-228- Fax: 650 228-5566
http://www.stentor.com
[EMAIL PROTECTED]
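One way to test the "no pattern across tapes and drives" observation above is to tally logged read errors by volume and by drive and see whether any name recurs. The sketch below is an assumption-laden illustration: the `ReadError` record layout, the helper names, and the sample volume and drive names are all hypothetical, and it does not parse any real TSM activity log format.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class ReadError(NamedTuple):
    """One logged read error (hypothetical record layout)."""
    volume: str
    drive: str


def tally_errors(errors: Iterable[ReadError]) -> Tuple[Counter, Counter]:
    """Count errors per volume and per drive."""
    records = list(errors)  # materialize so the input can be scanned twice
    by_volume = Counter(r.volume for r in records)
    by_drive = Counter(r.drive for r in records)
    return by_volume, by_drive


def repeat_offenders(counts: Counter, threshold: int = 2) -> List[str]:
    """Return the names seen at least `threshold` times, sorted by name."""
    return sorted(name for name, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    sample = [
        ReadError("VOL001", "DRIVE1"),
        ReadError("VOL002", "DRIVE1"),
        ReadError("VOL003", "DRIVE2"),
        ReadError("VOL001", "DRIVE3"),
    ]
    by_volume, by_drive = tally_errors(sample)
    print("volumes with repeated errors:", repeat_offenders(by_volume))
    print("drives with repeated errors:", repeat_offenders(by_drive))
```

If neither list ever fills up over weeks of audits, that supports looking beyond individual cartridges and drives, for example at firmware, cabling, or the environment, as other replies in this thread suggest.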
Re: Tape Reliability Recommendations
Peter Ford wrote:

I have posted here before, but we have been experiencing an incredibly high number of read errors with our 3584 LTO library. We regularly see errors when trying to restore data from tapes. ...snip...

Without much thought I can name five or six customers with either a 3583 or a 3584 and lots of errors on tapes. For the 3583 this is due to the quality of the library. For the 3584, things have improved since microcode 25D4 was installed on the (FC) drives. We keep the library itself on firmware 2460. Since then, the IBM engineers no longer have to come onsite every week to replace drives with tapes stuck inside.

We also found that it is not a matter of tape brand: IBM and Imation media give an equal number of errors. I think it can only be solved by continuing to check and audit volumes; in some environments we even moved the data off a tape and removed that cartridge.

Another reliability issue is the internal StorWatch Specialist, which is running, or not: oh yes, it works; no, it does not. Even setting everything to 10 Mbit half duplex did not solve this issue.

I would move the data from tape to tape every year to avoid tape problems.

Good luck,
Peter
Re: Tape Reliability Recommendations
We were getting a fair number of 36 errors on our 3584, and they corresponded to I/O errors I was seeing in TSM. After talking to IBM and our CE, we wound up updating our library firmware to 3060 and (knock on wood) we haven't seen any errors since.

Library: 3584
Drives: 4 fiber-attached LTO drives
Library firmware: 3060
Drive firmware: 25D4
TSM server: 5.1.1.6

Greg Redell
Great-West Life Annuity Insurance Co.
Phone: 314-525-5877
Email: [EMAIL PROTECTED]

Peter Ford [EMAIL PROTECTED]
Sent by: ADSM: Dist Stor Manager [EMAIL PROTECTED]
02/18/2003 12:43 PM
Please respond to ADSM: Dist Stor Manager

To: [EMAIL PROTECTED]
cc:
Subject: Re: Tape Reliability Recommendations

I have posted here before, but we have been experiencing an incredibly high number of read errors with our 3584 LTO library. We regularly see errors when trying to restore data from tapes. ...snip...
Re: Tape Reliability Recommendations
I have done a significant amount of testing and have quite a lot of practical experience with what I will refer to as the Big Three tape technologies:

AIT3
LTO1
SDLT320

Of the three, I know AIT the best. It's always good to know where someone sits before they tell you where they stand.

In a TSM environment, all three of these technologies perform very similarly: within 10-15% of each other. Don't let the manufacturer's performance claims sway your decision. Backup is generally not about the hardware but more about the software. TSM is quite powerful, but often trades speed for power. Remember, we have a sophisticated database running here to track what's going on.

For all three drives, we are able to sustain between 35 and 45 GBytes per hour during storage pool to storage pool operations: for instance, migration from disk to tape, or backup stg tape to tape. In addition, you can expect to see about the same performance when clients are writing data directly to tape (or even multiple tapes simultaneously while using the stg pool parameter copystgpool). When sizing an environment, use the 35 GB/hour number and you won't be unhappy.

As for reliability. That turns out to be a very mixed bag. I have seen sites with high volumes of data and no errors or problems with all three, and I have seen sites with numerous problems. The problems seem to be mostly related to drive firmware levels and tape batches. Once the drive firmware is correct and bad tapes are eliminated, most sites settle down nicely. The more complex the environment, the more likely the problems. For instance, we have a site with a large fiber channel and LTO configuration. No end to the problems so far and they are very serious problems. Is this a result of the tape technology? I doubt it, but one never knows, do one? Due to the nature of AIT3, I would suspect that overall reliability numbers will be lower than for LTO and SDLT, but my hands-on experience doesn't show that.

As for Automation. There are gazillions of libraries for each technology. Clear winners in my opinion are Qualstar and perhaps IBM. I give the IBM libraries a perhaps as we have had very good experience with the 349x libraries and only limited experience with the 3584. These seem OK, but not much experience. The lower end IBM libraries are based on someone else's technology, so I would think one might get a better deal buying direct from that manufacturer.

Compatibility with previous technology. Some DLT bigots are SDLT bigots because they believe in investment protection. I think that's balderdash, as very few people would ever try to read a DLT tape with an SDLT drive anyway, so what difference does it make? All three of these are relatively new technologies and you are going to switch to one anyway, so investigate all three.

The all important Kelly recommendation:

For value, AIT3 is unsurpassed: very good performance, relatively inexpensive, great automation, manufactured by one company so technology is first rate.

For openness (or perceived openness), LTO: excellent performance, reasonably priced, so-so automation, standards based and built by more than one manufacturer. (But how many of us are going to buy from more than one anyway? And if you attend presentations by each one about their LTO product, you come away from each one in succession thinking you have found the best; i.e., they all lie equally convincingly. I probably shouldn't have two ly words in the same sentence.)

For perceived technical excellence, SDLT: Quantum has very neat technology in their drives. Does it matter much? Probably not, but cool anyway.

So:

For the price conscious: AIT3, going to AIT4 when available.
If you're an Open kind of dude: LTO.
If you believe in Quantum: SDLT. They offer a very good product IMHO.

LTO and SDLT will be very close in price, so go with your gut.

As always, study, study, study. Get input from those you respect. Choose wisely and then get and stay behind your choice.

STORServer supports all three technologies equally.

Views expressed here are my own.

Kelly J. Lipp
STORServer, Inc.
485-B Elkton Drive
Colorado Springs, CO 80907
[EMAIL PROTECTED] or [EMAIL PROTECTED]
www.storsol.com or www.storserver.com
(719)531-5926
Fax: (240)539-7175

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED]] On Behalf Of Colby Morgan
Sent: Thursday, February 13, 2003 3:24 PM
To: [EMAIL PROTECTED]
Subject: Tape Reliability Recommendations

We are currently running TSM 5.1.5 on Win2k with an IBM Mammoth-2 drive for offsite copypools. We have had problems with both our onsite M2 and offsite M2 drives at our disaster recovery center. ...snip...
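The 35 GB/hour sizing figure above can be applied to the 135 GB to 300 GB daily offsite copy volume mentioned in the original question. The sketch below is rough arithmetic only, under two stated assumptions: that the 35 GB/hour figure applies per drive, and that throughput scales linearly when more drives run in parallel. Neither is confirmed in the thread, and the function and parameter names are hypothetical.

```python
def hours_needed(
    gigabytes: float,
    drives: int = 1,
    gb_per_hour_per_drive: float = 35.0,
) -> float:
    """Estimate hours to write `gigabytes` to tape.

    Assumes (not confirmed by the thread) that the rate is per drive and
    that adding drives scales throughput linearly.
    """
    if gigabytes < 0:
        raise ValueError("gigabytes must be non-negative")
    if drives < 1:
        raise ValueError("drives must be at least 1")
    if gb_per_hour_per_drive <= 0:
        raise ValueError("gb_per_hour_per_drive must be positive")
    return gigabytes / (drives * gb_per_hour_per_drive)


if __name__ == "__main__":
    # Daily volumes taken from the original question in this thread.
    for daily_gb in (135.0, 300.0):
        for drive_count in (1, 2):
            hours = hours_needed(daily_gb, drives=drive_count)
            print(f"{daily_gb:.0f} GB on {drive_count} drive(s): {hours:.1f} h")
```

On one drive this works out to roughly 3.9 hours for 135 GB and 8.6 hours for 300 GB, which is the kind of window the conservative 35 GB/hour number is meant to help plan for.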
Tape Reliability Recommendations
We are currently running TSM 5.1.5 on Win2k with an IBM Mammoth-2 drive for offsite copypools. We have had problems with both our onsite M2 drive and the offsite M2 drive at our disaster recovery center. IBM has replaced the drive more than a dozen times in the last two years and Exabyte has replaced countless tapes. Most recently we are experiencing a high rate of media write failures on a newly replaced drive as well as media read failures in DR testing, both using brand new 225m AME media. Is anybody else out there running an IBM/Exabyte Mammoth-2 drive and, if so, what kind of results do you see?

My real question is: what are the most common and reliable removable tape technologies for the Intel TSM environment? We are considering switching technologies and I wanted to solicit testimonials on other technologies (DLT, LTO, SDLT, etc.). We currently copy around 135 GB to 300 GB offsite daily.

Thanks,
Colby
Re: Tape Reliability Recommendations
Colby,

We have been using the Exabyte M2 drives and tapes for a couple of years now. We did have lots of problems and wore out several drives before we found out that running our four-drive library on one 29160 interface was causing the drives and tapes to wear out. We reconfigured so that there are only two drives on each 29160, and the problems have almost completely ceased.

We have two TSM server sites. Total primary pool storage use is about 3 TB. At site 1 we have one library with 80 slots and four M2 drives for our primary tapepool. We also have two external M2 drives that we create offsite copypool tapes with. The tapes from one copypool go across town by courier and the other copypool tapes are sent by DHL to site 2 in another city. At site 2 we have a 3494 with two 3590E drives for our primary tapepool and use the same drives for creating copypool tapes that go across the street. Another set of external M2 drives creates copypool tapes that are sent by DHL to site 1. Each site is the disaster recovery site for the other.

I have run a complete recovery test using the media that was sent from site 1 to site 2 and had no problems reading the M2 tapes.

The M2 drives are better than the M2 media. Heavy usage of the tapes would require regular replacement. That is unlike the 3590 drives and tapes, which we beat to death and which, as they say, take a lickin' and keep on tickin'.

In my experience Exabyte support is marginal at best. It takes an act of God for them to admit that one of their tapes might be defective.

Contact me offline if you want to discuss further.

Colby Morgan wrote:

We are currently running TSM 5.1.5 on Win2k with an IBM Mammoth-2 drive for offsite copypools. We have had problems with both our onsite M2 and offsite M2 drives at our disaster recovery center. ...snip...

--
Steve Bennett, (907) 465-5783
State of Alaska, Information Technology Group, Technical Services Section