Re: [Veritas-bu] Tapeless backup environments
Hmmm.. I need to look into this further. I could have sworn that it stores a checksum per file backed up, and that it used that checksum when it restored the file to see if the restored file is the same as the backup. I wonder if we can get an authoritative answer on this from a Symantec lurker. --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Thursday, October 25, 2007 12:26 PM To: [EMAIL PROTECTED]; veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments [...] -Original Message- From: Len Boyle [mailto:[EMAIL PROTECTED] Sent: Monday, October 22, 2007 6:37 PM To: Donaldson, Mark - Broomfield, CO; veritas-bu@mailman.eng.auburn.edu Subject: RE: Re: [Veritas-bu] Tapeless backup environments Hello Mark, Did I read in this list that NetBackup was supposed to do some kind of checksum on the data written to tape? If so, would a bpverify check this? I would assume that if NetBackup does this it would find the error, because NetBackup would do its calculation before passing the block to the dedupe hardware/software, and the block that it gets back from the dedupe hardware/software would be different. Of course, this brings up a question about the Symantec/Veritas PureDisk product, or EMC's: since the NetBackup and dedupe parts are merged, one would not have this double check. At least I would not think that one would. len -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Monday, October 22, 2007 4:52 PM To: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments I think that part of the problem is that a hash duplication is nearly undetectable until you have restored and tested it as false. We all know that 99.999% of what we back up is never restored. It just ages gracefully on media and is expired. If any of that .001% is restored and is damaged due to a tape fault (and we've all had it happen), then we all know that we can usually reach back to a different version or a different tape, and we'll be close enough to make the user go away and let us return to our coffee and surfing. I think a big part of the worry of a hash collision is that the restore seems to happen, the file restores flawlessly, and it'll not be detectable unless someone can checksum the whole file, or it's a binary or similar that simply refuses to work. Again, restoring from a different tape or a different version may be ineffective depending on where the hash collision occurred and for what reason. Every version may use this same unchanging block, which is restored incorrectly due to an invalid hash match. I know the odds are astronomical, but even though the odds are 150 million to one against my winning the lottery, I still see smiling faces on TV holding giant checks.
It's a bet, like all other restore techniques, and I'm going to make sure management has full knowledge of the risks before we implement it here (which is likely). -M -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Lightner Sent: Monday, October 22, 2007 10:28 AM To: Austin Murphy; veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments [...] -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Austin Murphy Sent: Monday, October 22, 2007 10:47 AM To: veritas-bu@mailman.eng.auburn.edu Subject
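Len's "double check" idea above -- compute an independent checksum before a block enters the dedupe layer, then re-verify it after restore -- can be sketched in a few lines. This is a toy illustration only: the class and function names are invented and have nothing to do with NetBackup's or any vendor's actual on-disk format.

```python
import hashlib
import zlib

class DedupeStore:
    """Toy content-addressed block store (hypothetical, not a real product)."""
    def __init__(self):
        self.blocks = {}  # fingerprint -> block bytes

    def put(self, block):
        fp = hashlib.sha256(block).hexdigest()
        self.blocks.setdefault(fp, block)  # dedupe: store once per fingerprint
        return fp

    def get(self, fp):
        return self.blocks[fp]

def backup(stream, store, block_size=4096):
    """Record, per block, the dedupe fingerprint plus an independent CRC32
    computed *before* the block enters the dedupe layer."""
    catalog = []
    for i in range(0, len(stream), block_size):
        block = stream[i:i + block_size]
        catalog.append((store.put(block), zlib.crc32(block)))
    return catalog

def restore(catalog, store):
    out = bytearray()
    for fp, crc in catalog:
        block = store.get(fp)
        if zlib.crc32(block) != crc:
            # A collision (or corruption) substituted the wrong block;
            # the independent checksum catches it with high probability.
            raise IOError("block %s failed post-restore checksum" % fp[:12])
        out += block
    return bytes(out)

store = DedupeStore()
data = b"example payload " * 1000
cat = backup(data, store)
assert restore(cat, store) == data
```

The point of the design is that the CRC is computed from the raw data, independently of the dedupe fingerprint, so a wrong block returned by the dedupe layer is overwhelmingly likely to be caught -- though, as Darren notes elsewhere in this thread, the secondary checksum has its own (small) collision probability.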
Re: [Veritas-bu] Tapeless backup environments
You're absolutely right. Of course, every time you copy data, you face a similar risk. Every single time you copy data from one device to another, multiple levels of CRC/ECC are used to make sure that the target copy is the same as the source copy, and there is a chance (however small) every time you make a copy that the copy will make a mistake and CRC/ECC will not pick it up. That was part of my original point that I made in the first article I wrote on the subject. Yes, I know there is a chance for a hash collision and data corruption, but there is a chance of that every time you copy data anywhere, disk to disk, disk to tape, etc -- and there's a chance you'll never know it. I just don't understand all the vitriol aimed at this particular method. I did read the paper that someone forwarded, and while the paper is quite old, I think his arguments still hold true. (He also used the birthday paradox in the same way I did, BTW.) The only part I didn't quite understand was the part where he said that you can't compare hash collisions with hardware errors (like I'm doing above). I read that part a couple of times and didn't get it. I'm not saying I understood his argument and disagree with it, mind you. I'm saying he spent only two or three paragraphs explaining that part, and at the end I didn't understand what he said. --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of A Darren Dunham Sent: Thursday, October 25, 2007 1:02 PM To: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments [...] ___ Veritas-bu maillist - Veritas-bu@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
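For concreteness, the collision odds being argued about here can be put into numbers with the standard birthday approximation, P(collision) ≈ 1 - exp(-n²/2^(b+1)) for n random values under a b-bit hash. The block size, hash width, and undetected-bit-error rate below are illustrative assumptions, not figures from any vendor:

```python
import math

def collision_probability(n_blocks, hash_bits):
    # Birthday bound: P(any collision among n random b-bit values)
    # is approximately 1 - exp(-n^2 / 2^(b+1)).
    return -math.expm1(-(n_blocks ** 2) / 2.0 ** (hash_bits + 1))

# One exabyte stored as 8 KB blocks, fingerprinted with a 160-bit hash:
n = (10 ** 18) // 8192
p_hash = collision_probability(n, 160)  # on the order of 5e-21

# Chance of at least one undetected media error while reading that same
# exabyte back, assuming an undetected-bit-error rate of 1e-15 per bit:
bits = 10 ** 18 * 8
p_media = -math.expm1(bits * math.log1p(-1e-15))  # essentially 1.0

print("hash collision: %.1e, undetected media error: %.3f" % (p_hash, p_media))
```

On these (assumed) numbers the birthday-bound collision probability sits many orders of magnitude below the chance of an undetected media error, which is the comparison being made above; the paper's counter-argument, per Austin's summary later in the thread, is that the two aren't comparable because a hash collision is deterministic -- the same pair of blocks collides on every backup, so retries and second copies don't help.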
Re: [Veritas-bu] Tapeless backup environments
Nope - I don't think NetBackup is making checksums. Tape hardware seems to be reasonably adept at detecting big tape errors, though. This, of course, goes away with disk-based backups. bpverify is just a check of the tape contents vs. the media catalog. It does read the tape blocks, so it may allow the drive to detect a media error, but it's not a verification of the block integrity vs. some stored checksum. DESCRIPTION bpverify verifies the contents of one or more backups by reading the backup volume and comparing its contents to the NetBackup catalog. This operation does not compare the data on the volume with the contents of the client disk. However, it does read each block in the image, thus verifying that the volume is readable. NetBackup verifies only one backup at a time and tries to minimize media mounts and positioning time. -M -Original Message- From: Len Boyle [mailto:[EMAIL PROTECTED] Sent: Monday, October 22, 2007 6:37 PM To: Donaldson, Mark - Broomfield, CO; veritas-bu@mailman.eng.auburn.edu Subject: RE: Re: [Veritas-bu] Tapeless backup environments [...]
Re: [Veritas-bu] Tapeless backup environments
Did I read in this list that NetBackup was supposed to do some kind of checksum on the data written to tape? If so, would a bpverify check this? I would assume that if NetBackup does this it would find the error, because NetBackup would do its calculation before passing the block to the dedupe hardware/software. And the block that it gets back from the dedupe hardware/software would be different. Even if bpverify did checksum in this manner, you can't assume that it would find all such errors. The checksum can collide in a manner identical to the hash. Unless it lined up exactly with the hash algorithm, it would likely provide some additional protection, but at the same time it must include some collisions where both the block hash and the overall checksum give identical values for a replacement block. The presence of an additional checksum like this changes the specific numbers, but does not change the essential character of the issue. -- Darren Dunham [EMAIL PROTECTED] Senior Technical Consultant TAOS http://www.taos.com/ Got some Dr Pepper? San Francisco, CA bay area This line left intentionally blank to confuse you.
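Darren's point -- that a secondary checksum collides too, just with different numbers -- is easy to demonstrate at small scale. A 32-bit CRC is expected to collide after roughly 2^16 random inputs (the birthday effect), so a brute-force search finds a collision almost instantly; the same search against a 160-bit hash would need on the order of 2^80 inputs. A minimal demonstration:

```python
import os
import zlib

def find_crc32_collision(max_tries=2_000_000):
    """Birthday-search for two distinct byte strings with the same CRC32.

    With a 32-bit checksum a collision is expected after ~2**16 random
    inputs, so this returns quickly; scaled to a 160-bit hash the same
    search would be utterly infeasible (~2**80 inputs).
    """
    seen = {}  # crc value -> input bytes
    for _ in range(max_tries):
        data = os.urandom(8)
        crc = zlib.crc32(data)
        if crc in seen and seen[crc] != data:
            return seen[crc], data
        seen[crc] = data
    return None

pair = find_crc32_collision()
if pair is not None:
    a, b = pair
    print(a != b and zlib.crc32(a) == zlib.crc32(b))  # prints True
```

This is only an illustration of Darren's scaling argument: any fixed-width checksum admits collisions, so stacking a checksum on top of a hash shrinks the numbers without changing the character of the problem.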
Re: [Veritas-bu] Tapeless backup environments
Why don't we just move on.. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Curtis Preston Sent: Friday, October 19, 2007 6:52 PM To: Eagle, Kent; veritas-bu@mailman.eng.auburn.edu Cc: [EMAIL PROTECTED] Subject: Re: [Veritas-bu] Tapeless backup environments Since you've impugned my honor, I feel the need to defend myself a bit, but I don't want to spend much more time on this topic either: My first point was that you quoted a Wikipedia article as a source. The debate as to whether Wikipedia articles have any value is an ongoing one, and there's no point in rehashing it here. Suffice it to say that I have a slightly higher opinion of it than you do. "What I meant was that the posts made by Bob944 seemed to me to be supported by cited facts, and denoted personal experiences." I have personally used and tested many dedupe solutions. Based on his opinion of them, I'm pretty sure Bob944 has not. So I'm not sure how my posts could be construed as coming from theory and his from reality. Perhaps it was just my style of writing. "He's not pointing to something he previously authored as proof that information is fact." I never did that. I only pointed you to the blog because I put a lot of thought into it and figured you could read that version, instead of me having to rehash it here in email. To be fair, I haven't read any of your blog postings, only your posts in this forum. And I'm guessing you've never read my books or articles or seen me speak. I think you'll find that I'm not nearly as stupid as you seem to think I am. :) "And yes; an Industry Pundit, Author, SME, or whomever, quoting Wikipedia as a source does tend to dilute credibility, in my mind. It's not a personal attack, just my personal position on the issue." Again, I didn't cite it as my only proof, and yes, we do have a different opinion on the validity of Wikipedia. The part below has me confused, where you say "No, because I never said those words or anything like them in my article. Since I never..." What I was trying to say was that it seemed to me that he was saying something along the lines of "my mind's already made up, don't confuse me with the facts." You said I said the same thing, and I'm saying I didn't. In my blog posts (that I was referring to and that you did not read), I think you'll find a very "this is what I think, what do you think?" mentality. If you inferred I was trying to say anything else, please believe I wasn't. "So one could easily conclude that a position was taken (and published) on this topic without sufficient testing or research (the related SunSolve and other articles were already out there before these posts were made)." Again, you haven't read my blog, so I'm not sure how you can criticize it. And it's a BLOG, dude. The whole spirit of blogging is that it's a stream of consciousness, not full articles and/or research. I didn't write "GbE is a lie!" in an article, I wrote it in a blog. The same blog where I wrote "Top 10 Things I learned about backups from watching Die Hard." A lot of it is written tongue in cheek, and I think anyone who follows it knows that. I don't put blogging on a subject on the same level of publishing any more than you consider a Wikipedia article valid information. And I think that most people feel the same way. BTW, I did a ton of research on Sun.com, Neterion, Intel, Alacritech, Google, etc. to find ANY evidence of benchmarks to prove my feelings wrong before I wrote that blog article. The SunSolve articles to which you refer were written, but they aren't benchmarks; they only say "this is how to configure a 10 GbE NIC on Solaris." "You can see how maybe a newbie might assume a post as gospel with the barrage of credentials?" I get that, as it happens to me all the time. It goes with the territory of being a prolific speaker/author/blogger/blabber. I try to help people. I write and speak a lot as part of that. If someone takes my word as gospel without doing their own research, then shame on them; I can't control that. I stand by what I wrote, and when I'm wrong, I admit it. I'm not going to stop writing/emailing/blogging because I might say something wrong. I actually cut my teeth right down the road from you as the backup guy at MBNA. (I lived in Newark, DE, and you were my bank.) "I'm not sure what you meant to imply by all this? If tenure with backup is an issue, then I would suggest you really don't have all that much time in this space," I never meant to imply that I have more credentials than you. I only meant to reply to the part of your post that suggested that I wasn't coming from a real/practical/having-actually-done-this-before position. And if 14 years doing backups and restores for my company and other companies don't give me some amount of credibility, I'm not sure what does. "I've never made mention of my employer, or even implied that any of my statements represented any opinion or position of theirs?" I find
Re: [Veritas-bu] Tapeless backup environments
Agree with you on that! I set this subject to auto-delete a few weeks back. Regards Simon Weaver 3rd Line Technical Support Windows Domain Administrator EADS Astrium Limited, B23AA IM (DCS) Anchorage Road, Portsmouth, PO3 5PU Email: [EMAIL PROTECTED] -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Hall, Christian N. Sent: Monday, October 22, 2007 2:55 PM To: Curtis Preston; Eagle, Kent; veritas-bu@mailman.eng.auburn.edu Cc: [EMAIL PROTECTED] Subject: Re: [Veritas-bu] Tapeless backup environments [...]
Re: [Veritas-bu] Tapeless backup environments
Here is some required reading on the topic from Val Henson, a noted academic/storage guru: An Analysis of Compare-by-hash www.nmt.edu/~val/review/hash.pdf Of particular interest is why hardware error rates can't be compared with deterministic software errors. Austin
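The paper's subject, compare-by-hash, is the shortcut of treating "same hash" as "same data," so a block whose fingerprint is already in the store is never read back for a byte-by-byte comparison. A minimal sketch of the two policies -- names and the compare_bytes option are invented for illustration, not taken from the paper or any product:

```python
import hashlib

def store_block(store, block, compare_bytes=False):
    """Insert a block into a content-addressed store.

    compare_bytes=False is pure compare-by-hash: a fingerprint match is
    trusted outright, so a colliding block would be silently dropped.
    compare_bytes=True reads the stored copy back and compares bytes,
    trading extra I/O for immunity to collisions.
    """
    fp = hashlib.sha1(block).digest()
    if fp in store:
        if compare_bytes and store[fp] != block:
            raise ValueError("fingerprint collision detected")
        return fp, True   # deduplicated against an existing block
    store[fp] = block
    return fp, False      # newly stored

store = {}
fp1, dup1 = store_block(store, b"hello world")
fp2, dup2 = store_block(store, b"hello world")
assert dup1 is False and dup2 is True and fp1 == fp2
```

Jeff's question above about "other checks over simple hashing" is essentially asking which vendors implement something like the compare_bytes branch rather than trusting the fingerprint alone.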
Re: [Veritas-bu] Tapeless backup environments
This paper looks to be 5 years old (based on the newest references it cites - it actually cites others that go back nearly 10 years). It would be interesting to see his take on current deduplication offerings, to see if the other checks they contain over simple hashing were enough to allay his concerns. One thing I've not seen in all this discussion is anyone saying they've actually experienced data loss as a result of commercial deduplication devices. Can anyone here claim that? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Austin Murphy Sent: Monday, October 22, 2007 10:47 AM To: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments [...]
Re: [Veritas-bu] Tapeless backup environments
I don't know. I, for one, am now thoroughly engrossed given Curtis' honor has been impugned. =P -Jonathan From: [EMAIL PROTECTED] on behalf of Hall, Christian N. Sent: Mon 10/22/2007 9:55 AM To: Curtis Preston; Eagle, Kent; veritas-bu@mailman.eng.auburn.edu Cc: [EMAIL PROTECTED] Subject: Re: [Veritas-bu] Tapeless backup environments [...]
Re: [Veritas-bu] Tapeless backup environments
Funny. VERY Funny. ;) --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin, Jonathan Sent: Monday, October 22, 2007 9:23 AM To: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments I don't know. I, for one, am now thoroughly engrossed given Curtis' honor has been impugned. =P -Jonathan From: [EMAIL PROTECTED] on behalf of Hall, Christian N. Sent: Mon 10/22/2007 9:55 AM To: Curtis Preston; Eagle, Kent; veritas-bu@mailman.eng.auburn.edu Cc: [EMAIL PROTECTED] Subject: Re: [Veritas-bu] Tapeless backup environments Why don't we just move on.. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Curtis Preston Sent: Friday, October 19, 2007 6:52 PM To: Eagle, Kent; veritas-bu@mailman.eng.auburn.edu Cc: [EMAIL PROTECTED] Subject: Re: [Veritas-bu] Tapeless backup environments
Re: [Veritas-bu] Tapeless backup environments
I think that part of the problem is that a hash collision is nearly undetectable until you have restored and tested it as false. We all know that 99.999% of what we back up is never restored. It just ages gracefully on media and is expired. If any of that .001% is restored and is damaged due to a tape fault (and we've all had it happen), then we all know that we can usually reach back to a different version or different tape and we'll be close enough to make the user go away and let us return to our coffee and surfing. I think a big part of the worry of a hash collision is that the restore seems to happen, the file restores flawlessly, and it'll not be detectable unless someone can checksum the whole file or it's a binary or similar that simply refuses to work. Again, restoring from a different tape or different version may be ineffective depending on where the hash collision occurred and for what reason. Every version may use this same unchanging block, which is restored incorrectly due to an invalid hash match. I know the odds are astronomical, but I still remember that even though the odds are 150 million to one that I'll win the lottery, I still see smiling faces on TV holding giant checks. It's a bet, like all other restore techniques, and I'm going to make sure management has full knowledge of the risks before we implement it here (which is likely). -M -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Lightner Sent: Monday, October 22, 2007 10:28 AM To: Austin Murphy; veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments This paper looks to be 5 years old (based on the newest references it cites - it actually cites others that go back nearly 10 years). It would be interesting to see his take on current deduplication offerings, to see if the other checks they contain over simple hashing were enough to allay his concerns.
One thing I've not seen in all this discussion is anyone saying they've actually experienced data loss as a result of commercial deduplication devices. Can anyone here claim that? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Austin Murphy Sent: Monday, October 22, 2007 10:47 AM To: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments Here is some required reading on the topic from Val Henson, a noted academic/storage-guru. An Analysis of Compare-by-hash www.nmt.edu/~val/review/hash.pdf Of particular interest is why hardware error rates can't be compared with deterministic software errors. Austin ___ Veritas-bu maillist - Veritas-bu@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu -- CONFIDENTIALITY NOTICE: This e-mail may contain privileged or confidential information and is for the sole use of the intended recipient(s). If you are not the intended recipient, any disclosure, copying, distribution, or use of the contents of this information is prohibited and may be unlawful. If you have received this electronic transmission in error, please reply immediately to the sender that you have received the message in error, and delete it. Thank you. --
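Mark's point above, that a collision survives a restore undetected unless someone checksums the whole file, suggests an obvious (if out-of-band) countermeasure: record an independent whole-file digest at backup time and recompare it after restore. A minimal sketch in Python follows; the helper names are hypothetical, and this is not a description of NetBackup's actual mechanism:

```python
import hashlib

def file_digest(path, algo="sha256"):
    """Stream the file through a hash so large files need not fit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_restore(digest_at_backup, restored_path):
    """Recompute the digest after a restore and compare with the recorded one."""
    return file_digest(restored_path) == digest_at_backup
```

The key point is that the digest must be computed from the source data before it enters the dedupe pool; a digest taken downstream of the deduplicated blocks would inherit any collision.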
Re: [Veritas-bu] Tapeless backup environments
Hello Mark, Did I read in this list that NetBackup was supposed to do some kind of checksum on the data written to tape? If so, would a bpverify check this? I would assume that if NetBackup does this, it would find the error, because NetBackup would do its calculation before passing the block to the dedupe hardware/software, and the block that it gets back from the dedupe hardware/software would be different. Of course, this brings up the question of the Symantec/Veritas PureDisk product, or EMC's: since the NetBackup and dedupe parts are merged, one would not have this double check. At least I would not think that one would. len -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Monday, October 22, 2007 4:52 PM To: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments
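Len's double-check idea, a checksum computed before the block ever reaches the dedupe layer, can be illustrated with a toy content-addressed store. The class below is purely illustrative: no real product uses a 16-bit fingerprint; the tiny size just makes a collision easy to provoke, and all names here are invented for the sketch.

```python
import hashlib

class ToyDedupeStore:
    """Toy content-addressed store. The deliberately tiny 16-bit fingerprint
    (real products use 128 bits or more) makes a collision easy to provoke."""

    def __init__(self):
        self.blocks = {}

    def fingerprint(self, block):
        return hashlib.sha1(block).digest()[:2]  # truncated to 16 bits on purpose

    def write(self, block):
        fp = self.fingerprint(block)
        self.blocks.setdefault(fp, block)  # a colliding block is silently "deduped"
        return fp

    def read(self, fp):
        return self.blocks[fp]

def checked_write(store, block):
    """Record an independent full-strength checksum before dedupe sees the block."""
    return store.write(block), hashlib.sha256(block).digest()

def checked_read(store, fp, checksum):
    """Fail loudly if the store hands back a different block than was written."""
    block = store.read(fp)
    if hashlib.sha256(block).digest() != checksum:
        raise IOError("restored block does not match pre-dedupe checksum")
    return block
```

Because the verifying checksum is taken upstream of the fingerprint, a collision inside the store surfaces as an error at read time rather than silently returning the wrong block, which is exactly the double check Len suggests is lost when the backup product and the dedupe engine are merged.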
Re: [Veritas-bu] Tapeless backup environments?
I wish we had a white board and could sit in front of each other to finish the discussion, but it's obvious that it's not going to be resolved here. You believe I'm missing your point, and I believe you're missing my point. what matters is if you use a shorthand to track the values which can't tell that Feb 7 and Dec 28 are different values because you put them in the same hash bucket and therefore think that everything in that bucket is Feb 7, you retrieve the wrong data. Not sure how many times I (or others) have to keep saying: the dates are not the data that are being deduped. The dates are the hashes. The data is the person. An 8KB chunk of data can have 2^65536 possible values. Representing that 8KB of data in 160 bits means that each of the 2^160 possible checksum/hash/fingerprint values MUST represent, on average, 2^65376 *different* 8KB chunks of data. This, again, only makes sense if you are using the hash to store/reconstruct the data, not to ID the data. The fingerprint (like a real fingerprint) is not used to reconstruct a block; it's only used to give it a unique ID that distinguishes it from other blocks. You still have to store the block with the key. And with 2^160 different fingerprints, that means we can calculate unique fingerprints for 2^160 blocks. That means we can calculate a unique fingerprint for 1,461,501,637,330,900,000,000,000,000,000,000,000,000,000,000,000 blocks, which is 11,832,317,255,831,000,000,000,000,000,000,000,000,000,000,000,000,000 bytes of data. That's a lot of stinking data. If that doesn't concern you, well, it's safe to say I won't be hiring you as my backup admin. Or as my technology consultant, since you I really don't think you need to make it personal, and suggest that I don't know what I'm doing simply because we have been unable to successfully communicate to each other in this medium. This can be a very difficult medium in which to communicate such a difficult subject.
I think things would be very different in person with a whiteboard. should know from earlier postings that spoofing your favorite 160-bit hashing algorithm with reasonable-looking fake data is now old hat. The exploit itself should concern us, not to mention that it also illustrates that similar data which yields the same hash is not the once-in-the-lifetime-of-the-universe oddity you portray. They worked really hard to figure out how to take one block that calculates to a particular hash and create another block that calculates to the same hash. It's used to fake a signature. I get it. I just don't see how or why somebody would use this to do I-don't-know-what with my backups. And if we were having this discussion over a few drinks we could try to come up with some ideas. Right now, I'm as tired as you are of this discussion. Everything mentioned here was covered in the original postings a month ago. Unless there's something new, I'm done with this. You're right. IN THIS MEDIUM, you don't understand me, and I don't understand you. Let's agree to disagree and move on. For anyone who's still reading, I just want to say this: I was only trying to bring some sanity to what I felt was an undue amount of FUD against the hash-only products. I'm not necessarily trying to talk anyone into them. I just want you to understand what I THINK the real odds are. If after understanding how it works and what the odds are, you're still uncomfortable, don't dismiss dedupe. Just consider a non-hash-based de-dupe product. Curtis out.
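The arithmetic the two sides are trading above is easy to check exactly, since Python integers are arbitrary-precision. This is just the numbers from the exchange, not an endorsement of either reading of them:

```python
# Exact integer math for the fingerprint argument above.
CHUNK_BYTES = 8 * 1024                         # one 8 KB chunk
CHUNK_BITS = CHUNK_BYTES * 8                   # 65,536 bits
FINGERPRINT_BITS = 160                         # a SHA-1-sized fingerprint

possible_chunks = 2 ** CHUNK_BITS              # distinct 8 KB values: 2^65536
fingerprints = 2 ** FINGERPRINT_BITS           # distinct fingerprints: 2^160
shared = 2 ** (CHUNK_BITS - FINGERPRINT_BITS)  # chunks per fingerprint, on average: 2^65376
addressable_bytes = fingerprints * CHUNK_BYTES # data a 160-bit ID space can uniquely name

print(f"{fingerprints:,} distinct fingerprints")
print(f"roughly 10^{len(str(addressable_bytes)) - 1} bytes before IDs must repeat")
```

2^160 fingerprints times 8 KB works out to about 1.2 x 10^52 bytes, close to the roughly 1.18 x 10^52 figure quoted above (the small difference suggests a slightly different per-block byte count was used in the original calculation).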
Re: [Veritas-bu] Tapeless backup environments
Since you've impugned my honor, I feel the need to defend myself a bit, but I don't want to spend much more time on this topic either: My first point was that you quoted a Wikipedia article as a source. The debate as to whether Wikipedia articles have any value is an ongoing one, and there's no point in rehashing it here. Suffice it to say that I have a slightly higher opinion of it than you do. What I meant was that the posts made by Bob944 seemed to me to be supported by cited facts, and denoted personal experiences. I have personally used and tested many dedupe solutions. Based on his opinion of them, I'm pretty sure Bob944 has not. So I'm not sure how my posts could be construed as coming from theory and his from reality. Perhaps it was just my style of writing. He's not pointing to something he previously authored as proof that information is fact. I never did that. I only pointed you to the blog because I put a lot of thought into it and figured you could read that version, instead of me having to rehash it here in email. To be fair, I haven't read any of your blog postings, only your posts in this forum. And I'm guessing you've never read my books or articles or seen me speak. I think you'll find that I'm not nearly as stupid as you seem to think I am. :) And yes; an Industry Pundit, Author, SME, or whomever, quoting Wikipedia as a source does tend to dilute credibility, in my mind. It's not a personal attack, just my personal position on the issue. Again, I didn't cite it as my only proof, and yes, we do have a different opinion on the validity of Wikipedia. The part below has me confused, where you say No, because I never said those words or anything like them in my article. Since I never... What I was trying to say was that it seemed to me that he was saying something along the lines of my mind's already made up, don't confuse me with the facts. You said I said the same thing, and I'm saying I didn't.
In my blog posts (that I was referring to and that you did not read), I think you'll find a very "this is what I think, what do you think?" mentality. If you inferred I was trying to say anything else, please believe I wasn't. So one could easily conclude that a position was taken (and published) on this topic without sufficient testing or research (the related SunSolve and other articles were already out there before these posts were made). Again, you haven't read my blog, so I'm not sure how you can criticize it. And it's a BLOG, dude. The whole spirit of blogging is that it's a stream of consciousness, not full articles and/or research. I didn't write "GbE is a lie!" in an article, I wrote it in a blog. The same blog where I wrote "Top 10 Things I Learned About Backups From Watching Die Hard." A lot of it is written tongue in cheek, and I think anyone who follows it knows that. I don't put blogging on a subject on the same level as publishing, any more than you consider a Wikipedia article valid information. And I think that most people feel the same way. BTW, I did a ton of research on Sun.com, Neterion, Intel, Alacritech, Google, etc. to find ANY evidence of benchmarks to prove my feelings wrong before I wrote that blog article. The SunSolve articles to which you refer were written, but they aren't benchmarks; they only say this is how to configure a 10 GbE NIC on Solaris. You can see how maybe a newbie might assume a post as gospel with the barrage of credentials? I get that, as it happens to me all the time. It goes with the territory of being a prolific speaker/author/blogger/blabber. I try to help people. I write and speak a lot as part of that. If someone takes my word as gospel without doing their own research, then shame on them; I can't control that. I stand by what I wrote, and when I'm wrong, I admit it. I'm not going to stop writing/emailing/blogging because I might say something wrong.
I actually cut my teeth right down the road from you as the backup guy at MBNA. (I lived in Newark, DE, and you were my bank.) I'm not sure what you meant to imply by all this? If tenure with backup is an issue, then I would suggest you really don't have all that much time in this space, I never meant to imply that I have more credentials than you. I only meant to reply to the part of your post that suggested that I wasn't coming from a real/practical/having-actually-done-this-before position. And if 14 years doing backups and restores for my company and other companies doesn't give me some amount of credibility, I'm not sure what does. I've never made mention of my employer, or even implied that any of my statements represented any opinion or position of theirs? I find this statement, well, bizarre... Dunno. Seemed right at the time. What was my point? I thought it would help hammer home the I'm-a-real-person-who-did-real-stuff-in-a-real-place point. Guess it didn't help at all. ;) Maybe I will attend the class after all. I'm beginning to think I'll be entertained.
Re: [Veritas-bu] Tapeless backup environments
Jeff, The mix was deliberate. Please re-read my post; it should become evident as to why. There was no implication that someone stated they had experienced data loss. In fact, nothing in my post is really speaking to dedupe or data loss. It's about the posts themselves... - Kent -Original Message- From: Jeff Lightner [mailto:[EMAIL PROTECTED] Sent: Friday, October 19, 2007 12:36 PM To: Eagle, Kent; Curtis Preston; veritas-bu@mailman.eng.auburn.edu Cc: [EMAIL PROTECTED] Subject: RE: [Veritas-bu] Tapeless backup environments
Re: [Veritas-bu] Tapeless backup environments
Not an attack - just a question: Did someone in this thread say they HAD experienced data loss due to deduplication? If so, I missed it. You mixed comments about another thread in here, and I *think* you're saying something about someone's experience with 10GigE rather than deduplication. Your post could be misread to say someone had in fact had such a data loss and posted it here. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Eagle, Kent Sent: Friday, October 19, 2007 12:08 PM To: Curtis Preston; veritas-bu@mailman.eng.auburn.edu Cc: [EMAIL PROTECTED] Subject: Re: [Veritas-bu] Tapeless backup environments O.k., at the risk of seeming like "I wrote more than you, therefore I must be right"... 2nd (and last) post on this - My first point was that you quoted a Wikipedia article as a source. For me, it really had nothing to do with the subject matter. They have a disclaimer as to the validity of anything on there, and for good reason: Anyone can post anything on there, about anything, containing anything. It might be right, it might be wrong. I would be far more inclined to trust or quote an industry consortium, or even a vendor's test results page, than Wikipedia. As long as we're throwing credentials around, I might as well mention: As a former scientist and statistician, and current engineer, I fully understand what empirical research is. It INCLUDES math. It is the actual testing and the statistics of that testing.
FWIW: I was trained in this and FMEA (Failure Modes and Effects Analysis) by the gentleman who ran the Reliability and Maintainability program for Boeing's Saturn and Apollo space programs, as well as their VERTOL and fixed wing programs. I can see where my second point could have easily been misinterpreted. Apologies to anyone led astray. What I meant was that the posts made by Bob944 seemed to me to be supported by cited facts, and denoted personal experiences. He's not pointing to something he previously authored as proof that information is fact. I've only seen him reference previous posts for the purposes of levelset. To be fair, I haven't read any of your blog postings, only your posts in this forum. More on that below. And yes; an Industry Pundit, Author, SME, or whomever, quoting Wikipedia as a source does tend to dilute credibility, in my mind. It's not a personal attack, just my personal position on the issue. The part below has me confused, where you say No, because I never said those words or anything like them in my article. Since I never mentioned anything about any articles... All my comments are in regard to your posts on this forum, in which you did say that. Wouldn't THAT be saying that up until that point, YOU WERE SAYING that no matter what the entire world is saying -- no matter what the numbers are, you're not going to accept... This was your text, no? Obviously there's nothing wrong with admitting you're wrong. What I was pointing out was that it appears duplicitous to make the comment above and then state you're probably going to post a retraction in your blog based on one user's experience. I'm referring to the 10 GbE thread where one user reported stellar throughput, which contradicted a contrived theoretical maximum, and several reports of ho-hum throughput. 7500 MB/s! That's the most impressive number I've ever seen by FAR. I may have to take back my 10 GbE is a Lie! blog post, and I'd be happy to do so. This was your text, no?
So one could easily conclude that a position was taken (and published) on this topic without sufficient testing or research (the related SunSolve and other articles were already out there before these posts were made). You said: Remember also that these posts are often done on my own time late at night, etc. I never claimed to be perfect. True, but you do cite that you are an author of books on the subject, author of a blog on the subject, and work for one of the largest industry resources. Indeed, the VP of Data Protection. You can see how maybe a newbie might assume a post as gospel with the barrage of credentials? Would they not be disappointed to learn they need to check the timestamp of a post before lending any credence to its contents? ;-) You said: I don't think you'll find that to be a problem. I'm an in-the-trenches guy, who has sat in front of many a tape drive, tape library, and backup GUI in my 14 years in this space. I actually cut my teeth right down the road from you as the backup guy at MBNA. (I lived in Newark, DE, and you were my bank.) I'm not sure what you meant to imply by all this? If tenure with backup is an issue, then I would suggest you really don't have all that much time in this space, relative to my experience anyway. I had been working with various forms of backup for that long before MBNA even had a Data Center in DE. Why would it be necessary to point out that you were in the same geographic locale, or used the services of my employer? I've never made
Re: [Veritas-bu] Tapeless backup environments?
How about setting up a white board / aka NetMeeting ! I think this thread has gone on for some time now, and yet there still appears to be 2 different opinions. Not going to please everyone.! :-) personally, I would not be worried about it and will just step out of the debate and move on. Right or wrong, I really don't care that much :-) But anyhow, something like DIGG Whiteboard might help - think its still free if those wishing to continue the debate want to continue offline :-) Bye ! Regards Simon Weaver 3rd Line Technical Support Windows Domain Administrator EADS Astrium Limited, B23AA IM (DCS) Anchorage Road, Portsmouth, PO3 5PU Email: [EMAIL PROTECTED] -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Curtis Preston Sent: Friday, October 19, 2007 8:38 AM To: [EMAIL PROTECTED]; veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? I wish we had a white board and could sit in front of each other to finish the discussion, but it's obvious that it's not going to be resolved here. You believe I'm missing your point, and I believe you're missing my point. what matters is if you use a shorthand to track the values which can't tell that Feb 7 and Dec 28 are different values because you put them in the same hash bucket and therefore think that everything that bucket is Feb 7, you retrieve the wrong data. Not sure how many times I (or others) have to keep saying, the dates are not the data that are being deduped. The dates are the hashes. The data is the person. An 8KB chunk of data can have 2^65536 possible values. Representing that 8KB of data in 160 bits means that each of the 2^160 possible checksum/hash/fingerprint values MUST represent, on average, 2^65376 *different* 8KB chunks of data. This, again, only makes sense if you are using the hash to store/reconstruct the data, not to ID the data. 
The fingerprint (like a real fingerprint) is not used to reconstruct a block, it's only used to give it a unique ID that distinguishes it from other blocks. You still have to store the block with the key. And with 2^160 different fingerprints, that means we can calculate unique fingerprints for 2^160 blocks. That means we can calculate a unique fingerprint for 1,461,501,637,330,900,000,000,000,000,000,000,000,000,000,000,000 blocks, which is 11,832,317,255,831,000,000,000,000,000,000,000,000,000,000,000,000,000 bytes of data. That's a lot of stinking data. If that doesn't concern you, well, it's safe to say I won't be hiring you as my backup admin. Or as my technology consultant, since you I really don't think you need to make it personal, and suggest that I don't know what I'm doing simply because we have been unable to successfully communicate to each other in this medium. This medium can be a very difficult one to communicate such a difficult subject in. I think things would be very different in person with a whiteboard. should know from earlier postings that spoofing your favorite 160-bit hashing algorithm with reasonable-looking fake data is now old hat. The exploit itself should concern us, not to mention that it also illustrates that similar data which yields the same hash is not the once-in-the-lifetime-of-the-universe oddity you portray. They worked really hard to figure out how to take one block that calculates to a particular hash and create another block that calculates to the same hash. It's used to fake a signature. I get it. I just don't see how or why somebody would use this to do I don't know what with my backups. And if we were having this discussion over a few drinks we could try to come up with some ideas. Right now, I'm as tired as you are of this discussion. Everything mentioned here was covered in the original postings a month ago. Unless there's something new, I'm done with this. You're right. 
IN THIS MEDIUM, you don't understand me, and I don't understand you. Let's agree to disagree and move on. For anyone who's still reading, I just want to say this: I was only trying to bring some sanity to what I felt was an undue amount of FUD against the hash-only products. I'm not necessarily trying to talk anyone into them. I just want you to understand what I THINK the real odds are. If after understanding how it works and what the odds are, you're still uncomfortable, don't dismiss dedupe. Just consider a non-hash-based de-dupe product. Curtis out. ___ Veritas-bu maillist - Veritas-bu@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu This email (including any attachments) may contain confidential and/or privileged information or information otherwise protected from disclosure. If you are not the intended recipient, please notify the sender immediately, do not copy this message or any attachments and do not use it for any purpose or disclose its content to any person, but delete
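The keyspace arithmetic both posters cite is easy to check directly. A minimal sketch (Python used purely for the big-integer math; the numbers are the ones from the thread, not new claims):

```python
# Checking the fingerprint-keyspace arithmetic from the thread above.
# A 160-bit hash has 2**160 possible values; at one fingerprint per
# unique 8 KB block, that key space can label 2**173 bytes of unique data.
KEYSPACE = 2 ** 160
BLOCK_BYTES = 8 * 1024

addressable_bytes = KEYSPACE * BLOCK_BYTES  # = 2**173
print(f"distinct fingerprints: {KEYSPACE:.3e}")
print(f"bytes of unique 8 KB blocks they can label: {addressable_bytes:.3e}")
# The contrast the posters argue over: a single 8 KB block has 2**65536
# possible values, so a fingerprint can identify a block but never encode it.
```

This is the crux of the disagreement: the fingerprint is an ID into a table that still stores the full block, not a compressed representation of the block itself.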
Re: [Veritas-bu] Tapeless backup environments?
At the risk of chasing windmills, I will continue to try to have this discussion, although it appears to me that you've already made up your mind. I again say that no one is saying that hash collisions can't happen. We are simply saying that the odds of them happening are astronomically less than having an undetected/uncorrected bit error on tape. And I believe that the math that I use in my blog post illustrates this. I said: As promised, I looked into applying the Birthday Paradox logic to de-duplication. I blogged about my results here: http://www.backupcentral.com/content/view/145/47/ Long and short of it: If you've got less than 95 Exabytes of data, I think you'll be OK. Bob944 said: One of us still doesn't understand this. :-) Got that right. :-) Your blog raises a red herring in misunderstanding or misrepresenting the applicability of Birthday Paradox. I completely disagree. If you read the Birthday Paradox entry on Wikipedia, it specifically explains how the Birthday Paradox applies in this case. All the BP says is that the odds of a clash (i.e. a birthday match or a hash collision) in an environment increase with the number of elements in the set, and that the odds increase faster than you think: * The odds of two people in the same room having the same birthday increase with the number of people in the room. If there are only two people in the room, those odds will be roughly 1 in 365, or .27% (leap year aside). If there are 23 people in the room, the odds are 50%. * The odds of two DIFFERENT blocks having the same hash (i.e. a hash collision) increase with the number of blocks in the data set. If there are two blocks in the set, the odds are 1 in 2^160. If there are fewer than 12.7 quintillion blocks in the data set, the odds don't show up in a percentage calculated out to 50 decimal places. As soon as you have more than 12.7 quintillion blocks, the odds at least register in 50 decimal places, but are still really small. 
And to get 12.7 quintillion blocks, you need to store at least 95 Exabytes of data. The number of possible values in BP is 366; there is no data reduction in it, no key values. An algorithm which reduced the 366 possibilities the same way that hashing 8KB down to 160 bits would yield infinitesimal keys smaller than one bit, an absurdity. Yeah, IMHO, we are talking apples and oranges. Let me try to put the hash collision into the birthday world. Let's say that we want a wall of photos of everyone who came to our party. When you show up, we check your birthday, and we check it off on a list. (We'll call your BD the hash.) If we've never seen your birthday before, we take your photo and put it on the wall. If your birthday has already been checked off on the list, though, we don't take your photo. We assume that since you have the same birthday, you must be the same person. So you don't get your photo taken. We just write on the photo of the first guy whose picture we took that he came to the party twice (he must have left and come back). Now, if he is indeed the same guy, that's not a hash/BD collision. If he is indeed a different person, and we said he was the same person simply because he had the same birthday, then that would be a hash/BD collision. And THIS would be an absurdity: to think you can represent n number of people in a party with an array of photos selected solely on their birthday (a key space of only 366). But it's not out of the realm of possibility to say that we could represent n number of bits in our data center with an array of bits selected solely on a 160-bit hash (a keyspace of 2^160). Cryptographers have been doing it for years. We're just adding another application on top of it. 
An absurdity which should show that even if it stopped at eight bits, one short of the bits required to hold 1-366, there would still be fatal hash collisions--say, Feb 7, Feb 11 and Jun 30 all represented by the same code, in which case you can't figure out if people in the room have the same birthday. Again, I hope you'll read what I wrote above. In the analogy, we're not de-duping birthdays; we're de-duping people BASED on their birthdays. (Which would be a dumb idea, because the key space is too small: 366.) What you must grasp is that it is *impossible* to represent/re-create/look up the values of 2^65536 bits in fewer than 2^65536 bits--unless you concede that each checksum/hash/fingerprint will represent many different values of the original data--any more than you can represent three bits of data with two. I concede, I concede! The only point I'm trying to make is: what are the odds that two different blocks of data will have the same hash (i.e. a hash collision) in a given data center? Hashing is a technique for saving time in certain circumstances. It is valueless in re-creating (and a lookup is a re-creation) original data when those data can have unlimited arbitrary values. All the blog
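For readers following the math, the odds the posters are arguing about come from the standard birthday-bound approximation. A minimal sketch (the block counts are illustrative, not an endorsement of either poster's figures):

```python
import math

def collision_probability(n_blocks: int, hash_bits: int = 160) -> float:
    """Birthday-bound approximation: P(collision) ~= 1 - exp(-n(n-1)/(2d)),
    where d is the number of possible hash values."""
    d = 2 ** hash_bits
    return -math.expm1(-n_blocks * (n_blocks - 1) / (2 * d))

# The odds grow with the square of the block count, as the BP predicts,
# but stay tiny against a 2**160 keyspace even at enormous scales:
for n in (10 ** 12, 10 ** 15, 10 ** 18):
    print(f"{n:.0e} blocks -> P ~ {collision_probability(n):.3e}")
```

`math.expm1` is used instead of `1 - math.exp(...)` because it stays accurate for the vanishingly small probabilities involved here.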
Re: [Veritas-bu] Tapeless backup environments?
What you must grasp is that it is *impossible* to represent/re-create/look up the values of 2^65536 bits in fewer than 2^65536 bits--unless you concede that each checksum/hash/fingerprint will represent many different values of the original data--any more than you can represent three bits of data with two. that is why i have turned off all hardware and software compression on my tape drives. imagine trying to store more than 400GB of data onto a single lto3 tape! they say that you can store up to and even more than 800GB, but i don't believe a word of it. there is no way 1 nibble of data can represent 1 byte! once i have the time to study lzr compression and understand it, and see whether or not it is data-loss-less, then i may turn compression back on. until then, tapes are cheap and i'll buy 2.5 times as many as i need. :-) thanks, jerald p.s. our de-dupe vtl does the hash and then a bit by bit comparison of the data block to ensure the data really is the same in order to eliminate the duplicate block. i think some of the confusion may be in not understanding how the de-dupe process works. once you create a hash for a block of data, you are storing the hash AND the block of data. you are never having to re-create a big block of data from a smaller hash. the backup stream of data gets re-written from a string of 8k blocks, into a string of 160-bit pointers which point to the unique 8k blocks of data via the hash table. or something like that... Confidentiality Note: The information contained in this message, and any attachments, may contain confidential and/or privileged material. It is intended solely for the person(s) or entity to which it is addressed. Any review, retransmission, dissemination, or taking of any action in reliance upon this information by persons or entities other than the intended recipient(s) is prohibited. If you received this in error, please contact the sender and delete the material from any computer. 
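Jerald's description of hash-plus-verify de-dupe can be sketched in a few lines. This is an illustrative toy, not any vendor's implementation (SHA-1 stands in for the 160-bit fingerprint; the class and names are mine):

```python
import hashlib

class DedupStore:
    """Toy hash-based dedupe as described above: each unique block is stored
    once, keyed by its fingerprint, and the backup stream becomes a string of
    pointers. Like the poster's VTL, a byte-by-byte compare guards the hash."""

    def __init__(self):
        self.blocks = {}  # fingerprint (bytes) -> block data (bytes)

    def write(self, block: bytes) -> bytes:
        fp = hashlib.sha1(block).digest()  # 160-bit fingerprint
        if fp in self.blocks:
            # Bit-by-bit comparison: don't trust the hash alone.
            if self.blocks[fp] != block:
                raise RuntimeError("hash collision detected!")
        else:
            self.blocks[fp] = block
        return fp  # the "pointer" that replaces the block in the stream

    def read(self, fp: bytes) -> bytes:
        return self.blocks[fp]

store = DedupStore()
p1 = store.write(b"A" * 8192)
p2 = store.write(b"A" * 8192)  # duplicate block: stored only once
p3 = store.write(b"B" * 8192)
print(len(store.blocks))  # 2 unique blocks retained
```

Note that the block is always stored alongside its fingerprint; nothing is ever reconstructed from the hash alone, which is the point Jerald is making.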
Re: [Veritas-bu] Tapeless backup environments?
Hardware compression on your tape drives buys more than saved tapes - it buys reduced backup times. I found that out way back when on DDS tapes. We do compression on our stuff (and I have at many jobs) and have yet to see a restore fail that wasn't due to an issue traced to the original backup job that wasn't noticed at the time, rather than some mystical bit change that occurred during the restore. While it is theoretically possible you'll get killed during the next Leonid meteor shower, I doubt you're reinforcing your roof with steel to ensure it doesn't happen. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Iverson, Jerald Sent: Thursday, October 18, 2007 11:52 AM To: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? snip -- CONFIDENTIALITY NOTICE: This e-mail may contain privileged or confidential information and is for the sole use of the intended recipient(s). If you are not the intended recipient, any disclosure, copying, distribution, or use of the contents of this information is prohibited and may be unlawful. If you have received this electronic transmission in error, please reply immediately to the sender that you have received the message in error, and delete it. Thank you. --
Re: [Veritas-bu] Tapeless backup environments?
So you're OK with hash-based de-dupe, which everyone acknowledges has a chance (although quite small) that you could have a hash collision and potentially corrupt a block of data somewhere, sometime, when you least expect it... But you're NOT OK with the long-running industry standard of loss-less compression algorithms? (All compression algorithms for tape are loss-less algorithms.) Lossy algorithms are only used in things like video compression, where it's OK to lose blocks along the way as long as the human eye can't detect them, or as long as you can fit it on YouTube. --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Iverson, Jerald Sent: Thursday, October 18, 2007 8:52 AM To: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? snip
Re: [Veritas-bu] Tapeless backup environments
Sorry, but I just can't keep from jumping in at this point. Not taking either side, but... Are you seriously suggesting that a quote from Wikipedia constitutes empirical scientific research? I could place a posting on there that either concurs with, or totally rejects, the position of that posting; and someone else would come along and claim it as gospel. I would be the first to admit that bob944 has made more than a few posts that have pushed my chair back a couple inches, but at least they made me THINK! Saying "This is the part where I believe you've made your mind up already. You're saying that no matter what the entire world is saying -- no matter what the numbers are, you're not going to accept hash-based de-dupe. Fine! That's why there are vendors that don't use hashes to de-dupe data. Buy one of those instead." is pretty gutsy, since you have another post within the past few days stating you're ready to RETRACT what you already blogged on this, or blogged on that. Wouldn't THAT be saying that up until that point, YOU WERE SAYING that no matter what the entire world is saying -- no matter what the numbers are, you're not going to accept... If I am asked to restore something for the CEO, and can't, it won't matter a hill of beans what all the theory was and what the odds were. I either can, or I can't. I'll be accountable for that result, and why I got it. As someone so accurately posted recently: We're in the recovery business, not the restore business. I would think that almost everyone on this forum does some kind of pilot before rolling something out into production. I hope I'm wrong. I love to learn. I'm actually signed up for one of your classes next week. But, if quoting everyone else's posts/blogs/Wikipedia entries, etc. without backing up re-posting them with empirical evidence or firsthand testing is your program agenda, I will skip the engagement... BTW - You Tilt at Windmills (Don Quixote), you don't chase them. 
;-) Take care, Kent Eagle MTS Infrastructure Engineer II, MCP, MCSE Tech Services / SMSS --- Message: 1 Date: Thu, 18 Oct 2007 04:06:52 -0400 From: Curtis Preston [EMAIL PROTECTED] Subject: Re: [Veritas-bu] Tapeless backup environments? To: [EMAIL PROTECTED], veritas-bu@mailman.eng.auburn.edu Message-ID: [EMAIL PROTECTED] Content-Type: text/plain; charset=US-ASCII snip
Re: [Veritas-bu] Tapeless backup environments
I would say no, as Wikipedia is like an encyclopedia and is a good spot to start, but it isn't peer-reviewed published articles, so in research it would not be considered a valid source. Dustin D'Amour Wireless Switching Plateau Wireless -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Eagle, Kent Sent: Thursday, October 18, 2007 12:59 PM To: [EMAIL PROTECTED]; veritas-bu@mailman.eng.auburn.edu Cc: [EMAIL PROTECTED] Subject: Re: [Veritas-bu] Tapeless backup environments snip
Re: [Veritas-bu] Tapeless backup environments
Glad to have another person in the party. What's your birthday? ;) Are you seriously suggesting that a quote from Wikipedia constitutes empirical scientific research? NO. He said that I was misusing the Birthday Paradox, and I merely pointed to the Wikipedia article that uses it the same way. If you search on Birthday Paradox on Google, you'll also find a number of other articles that use the BP in the same way I'm using it, specifically in regards to hash collisions, as the concept is not new to deduplication. It has applied to cryptographic uses of hashing for years. I then went further to explain WHY the BP applies, and I gave a reverse analogy that I believe completed my argument that the BP applies in this situation. So... As to whether or not what I'm doing is empirical scientific research, it's not. Empirical research requires testing, observation, and repeatability. For the record, I have done repeated testing of many hash-based dedupe systems using hundreds of backups and restores without a single occurrence of data corruption, but that doesn't address the question. IMHO, it's the equivalent of saying a meteor has never hit my house, so meteors must never hit houses. The discussion is about the statistical probabilities of a meteor hitting your house, and you have to do that with math, not empirical scientific research. I would be the first to admit that bob944 has made more than a few posts that have pushed my chair back a couple inches, but at least they made me THINK! And you're saying that my half-a-dozen or so blog postings on the subject, and none of my responses in this thread, don't make you think? I was fine until I quoted Wikipedia, is that it? ;) Is pretty gutsy since you have another post within the past few days stating you're ready to RETRACT what you already blogged on this, or blogged on that. I am admitting that I am not a math or statistics specialist and that I misunderstood the odds before. What's wrong with that? 
That I was wrong before, or that I'm stating it publicly that I was wrong before? I was wrong. I was told I was wrong because I didn't apply the birthday paradox. So I applied the Birthday Paradox in the same way I see everyone else applying it, and the way that makes sense according to the problem, and the numbers still come out OK. Wouldn't THAT be saying that up until that point, YOU WERE SAYING that no matter what the entire world is saying -- no matter what the numbers are, you're not going to accept... No, because I never said those words or anything like them in my article. I said, some people say this, but I say that. Then I even elicited feedback from the audience. The point of that portion of the article was that some are talking about hash collisions as if they're going to happen to everybody and happen a lot, and I wanted to add some actual math to the discussion, rather than just talk about fear uncertainty and doubt (FUD). I felt there was a little Henny-Penny business going on. If I am asked to restore something for the CEO, and can't, it won't matter a hill of beans what all the theory was and what the odds were. I either can, or I can't. I'll be accountable for that result, and why I got it. As someone so accurately posted recently: We're in the recovery business, not the restore business. You won't get any argument from me. I think you'll find almost that exact sentence in the first few paragraphs of any of my books. Having said that, we all use technologies as part of our backup system that have a failure rate percentage (like tape). And to the best of my understanding, the odds of a single hash collision in 95 Exabytes of data is significantly lower than the odds of having corrupted data on an LTO tape and not even knowing it, based on the odds they publish. Even if you make two copies, the copy could be corrupted, and you could have a failed restore. 
Yet we're all OK with that, but we're freaking out about hash collisions, which statistically speaking have a MUCH lower probability of happening. I would think that almost everyone on this forum does some kind of pilot before rolling something out into production. I sure as heck hope so, but I don't think it addresses this issue. So you test it and you don't get any hash collisions. What does that prove? It proves that a meteor has never hit your house. What I recommend (especially if you're using a hash-only de-dupe system) is a constant verification of the system. Use a product like NBU that can do CRC checks against the bytes it's copying or reading, and either copy all de-duped data to tape or run a NBU verify on every backup. If you have a hash collision, your copy or verify will fail, and you'll at least know when it happens. I hope I'm wrong. About what? That I'm an idiot? ;) I think judging me solely on this long, protracted, difficult-to-follow discussion (with over 70 posts) is probably unfair. Remember also that these posts are often done on my own
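Curtis's "constant verification" advice amounts to recording an independent checksum at backup time and re-checking it on every copy or restore, so a collision or corruption surfaces as a failed verify rather than silent bad data. A rough sketch (function names here are illustrative, not NetBackup's actual interfaces):

```python
import hashlib

def write_image(data: bytes) -> dict:
    """Record an independent checksum alongside the backup image."""
    return {"data": data, "sha1": hashlib.sha1(data).hexdigest()}

def verify_image(image: dict) -> bool:
    """Re-hash on every copy/restore; a mismatch flags the problem
    at verify time instead of at restore time."""
    return hashlib.sha1(image["data"]).hexdigest() == image["sha1"]

img = write_image(b"backup payload")
print(verify_image(img))   # True: image intact
img["data"] = b"corrupted payload"
print(verify_image(img))   # False: the verify pass catches it
```

The design point is that the verification checksum is computed end-to-end over the reassembled stream, independently of the dedupe fingerprints, so it catches exactly the failure mode the fingerprints cannot.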
Re: [Veritas-bu] Tapeless backup environments?
On Thu, Oct 18, 2007 at 01:44:03PM -0400, Curtis Preston wrote: So you're OK with hash-based de-dupe, which everyone acknowledges has a chance (although quite small) that you could have a hash-collision and potentially corrupt a block of data somewhere, sometime, when you least expect it... But you're NOT ok with the long-running industry standard of loss-less compression algorithms? [...] I think the smiley on the end indicated that it was a humorous comment. At least that's how I took it. -- Darren Dunham [EMAIL PROTECTED] Senior Technical Consultant TAOS http://www.taos.com/ Got some Dr Pepper? San Francisco, CA bay area This line left intentionally blank to confuse you. 
Re: [Veritas-bu] Tapeless backup environments?
On 10/18/07, Iverson, Jerald [EMAIL PROTECTED] wrote: ... that is why i have turned off all hardware and software compression on my tape drives. imagine trying to store more than 400GB of data onto a single lto3 tape! they say that you can store up to and even more than 800GB, but i don't believe a word of it. there is no way 1 nibble of data can represent 1 byte! once i have the time to study lzr compression and understand it, snip Hi jerald, Data compression exploits the non-randomness of normal data. Compression algorithms have variable compression rates because their performance is dependent on the data being compressed. Truly random data does NOT compress at all. Typical data is not truly random. Once data has been compressed, it is close to random, so compression can not be applied again. Many encryption algorithms also result in near-random data that does not compress. A formal definition of a data set's randomness is its Kolmogorov complexity. http://en.wikipedia.org/wiki/Kolmogorov_complexity Compression is just an alternate means of data representation. Several others are at work on your LTO tapes too! http://en.wikipedia.org/wiki/Forward_error_correction http://en.wikipedia.org/wiki/Run_Length_Limited http://en.wikipedia.org/wiki/PRML Don't get too paranoid...these are good things. Austin 
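Austin's point about randomness is easy to demonstrate with any loss-less compressor; a quick sketch using zlib (the data here is made up for illustration):

```python
import os
import zlib

repetitive = b"backup stream " * 1024     # typical, highly redundant data
random_data = os.urandom(len(repetitive))  # already near its Kolmogorov limit

for name, data in (("repetitive", repetitive), ("random", random_data)):
    packed = zlib.compress(data, 9)
    print(f"{name}: {len(data)} bytes -> {len(packed)} bytes")
```

Redundant data shrinks dramatically; random data comes out the same size or slightly larger, which is also why compressing an already-compressed (or encrypted) stream gains nothing.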
Re: [Veritas-bu] Tapeless backup environments?
discussion, although it appears to me that you've already made up your mind. I'd prefer to say I have little interest in a technology which, by design, will retrieve a completely different chunk of data than what was written, with no notice whatsoever. BTW, before you bring out tape errors again, I posted long ago why this argument was not comparable. No point in beating the poor Birthday Paradox to death; you've completely missed the point there. It doesn't matter that the same values come up more often than our intuition suggests--which is the _only_ lesson of BP--what matters is if you use a shorthand to track the values which can't tell that Feb 7 and Dec 28 are different values, because you put them in the same hash bucket and therefore think that everything in that bucket is Feb 7, you retrieve the wrong data. Here's all a thinking person responsible for data needs to consider: An 8KB chunk of data can have 2^65536 possible values. Representing that 8KB of data in 160 bits means that each of the 2^160 possible checksum/hash/fingerprint values MUST represent, on average, 2^65376 *different* 8KB chunks of data. If that doesn't concern you, well, it's safe to say I won't be hiring you as my backup admin. Or as my technology consultant, since you should know from earlier postings that spoofing your favorite 160-bit hashing algorithm with reasonable-looking fake data is now old hat. The exploit itself should concern us, not to mention that it also illustrates that similar data which yields the same hash is not the once-in-the-lifetime-of-the-universe oddity you portray. Everything mentioned here was covered in the original postings a month ago. Unless there's something new, I'm done with this. 
Re: [Veritas-bu] Tapeless backup environments?
On Tue, Oct 16, 2007 at 12:09:30AM -0400, bob944 wrote: One of us still doesn't understand this. :-) Your blog raises a red herring in misunderstanding or misrepresenting the applicability of Birthday Paradox. The number of possible values in BP is 366; there is no data reduction in it, no key values. The 366 isn't the data space, it's the keyspace. When we look at a person's birthday, we're hashing them into that space. The paradox then is how many people we can hash before the chance of a collision is significant. Obviously if 400 people are in a room, the number of values exceeds the keyspace and a collision is certain. An algorithm which reduced the 366 possibilities the same way that hashing 8KB down to 160 bits would yield infinitesimal keys smaller than one bit, an absurdity. I'm afraid I don't understand what you mean by that sentence. An absurdity which should show that even if it stopped at eight bits, one short of the bits required to hold 1-366, there would still be fatal hash collisions--say, Feb 7, Feb 11 and Jun 30 all represented by the same code, in which case you can't figure out if people in the room have the same birthday. What is stopping at 8 bits? Hash collisions can always occur. The question is what is the probability. What you must grasp is that it is *impossible* to represent/re-create/look up 2^65536 possible values in fewer than 65536 bits--unless you concede that each checksum/hash/fingerprint will represent many different values of the original data--any more than you can represent three bits of data with two. I think everyone acknowledges that as a fact. Hashing is a technique for saving time in certain circumstances. It is valueless in re-creating (and a lookup is a re-creation) original data when those data can have unlimited arbitrary values. The argument is that a process does not have to be infallible to be valuable, much like the electrical and mechanical processes we currently use.
That if the chance of failure in the algorithm is much less than the chance of other parts of the system introducing silent data corruption, then the overall amount of data loss is not significantly changed. -- Darren Dunham [EMAIL PROTECTED] Senior Technical Consultant TAOS http://www.taos.com/ Got some Dr Pepper? San Francisco, CA bay area This line left intentionally blank to confuse you.
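[Darren's "what is the probability" question has a standard closed form. The sketch below is my worked example, not anyone's vendor math: the usual birthday-bound approximation 1 - e^(-n(n-1)/2d) for n items hashed into d possible values.]

```python
import math

def collision_probability(n, d):
    """Approximate chance that at least two of n items collide when each
    is hashed uniformly at random into d possible values (the standard
    birthday-bound approximation 1 - e^(-n(n-1)/2d))."""
    # math.expm1 keeps precision when the exponent is astronomically small.
    return -math.expm1(-n * (n - 1) / (2.0 * d))

# Classic birthday paradox: ~50% chance with only 23 people and 365 days.
print(collision_probability(23, 365))

# Same formula for a trillion stored chunks in a 160-bit hash space.
print(collision_probability(10**12, 2**160))
```

[The first call lands near 0.5, matching the 23-people result quoted later in the thread; the second comes out around 10^-25, which is the kind of number each side of this argument is weighing against ordinary hardware error rates.]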
Re: [Veritas-bu] Tapeless backup environments?
cpreston [EMAIL PROTECTED]: As promised, I looked into applying the Birthday Paradox logic to de-duplication. I blogged about my results here: http://www.backupcentral.com/content/view/145/47/ Long and short of it: If you've got less than 95 Exabytes of data, I think you'll be OK. One of us still doesn't understand this. :-) Your blog raises a red herring in misunderstanding or misrepresenting the applicability of Birthday Paradox. The number of possible values in BP is 366; there is no data reduction in it, no key values. An algorithm which reduced the 366 possibilities the same way that hashing 8KB down to 160 bits would yield infinitesimal keys smaller than one bit, an absurdity. An absurdity which should show that even if it stopped at eight bits, one short of the bits required to hold 1-366, there would still be fatal hash collisions--say, Feb 7, Feb 11 and Jun 30 all represented by the same code, in which case you can't figure out if people in the room have the same birthday. What you must grasp is that it is *impossible* to represent/re-create/look up 2^65536 possible values in fewer than 65536 bits--unless you concede that each checksum/hash/fingerprint will represent many different values of the original data--any more than you can represent three bits of data with two. Hashing is a technique for saving time in certain circumstances. It is valueless in re-creating (and a lookup is a re-creation) original data when those data can have unlimited arbitrary values. All the blog hand-waving about decimal places, Zettabytes and the specious comparison to undetected write errors will not change that. What _would_ be a useful exercise for the reader is to discover how many unique values of 8KB are, on average, represented by a given 160-bit checksum/hash/fingerprint.
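[The closing exercise takes one line of arbitrary-precision arithmetic; this worked example is mine, not from the post.]

```python
# Number of distinct 8 KB blocks: 2^(8 * 8192) = 2^65536.
total_blocks = 2 ** (8 * 8192)
# Number of distinct 160-bit hash values.
hash_values = 2 ** 160
# Average number of distinct 8 KB blocks that must share each hash value.
preimages = total_blocks // hash_values   # exactly 2^65376
print(len(str(preimages)))  # roughly 19,700 decimal digits long
```

[That is the pigeonhole count both sides agree on; the disagreement in this thread is whether any two of those preimages will ever actually show up in the same backup set.]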
[Veritas-bu] Tapeless backup environments?
As promised, I looked into applying the Birthday Paradox logic to de-duplication. I blogged about my results here: http://www.backupcentral.com/content/view/145/47/ Long and short of it: If you've got less than 95 Exabytes of data, I think you'll be OK.
Re: [Veritas-bu] Tapeless backup environments?
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Curtis Preston Sent: 01 October 2007 06:35 To: [EMAIL PROTECTED]; veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? ... These are odds based on the size of the key space. If you have 2^160 odds, you have a 1:2^160 chance of a collision. By saying that, the implication is that the keyspace is uniform. It's not. The probability of a hash collision is a function of the uniformity of the keyspace as well as the number of items you've hashed and the size of the key. There's lots of research in the crypto field that's relevant to de-dupe. You also should consider the characteristics of the de-dupe software when it encounters a hash collision. Backups are the last line of defence for many, when all else (personal copies, replication, snapshots etc.) has failed. The 'acceptable risk' of a hash collision is of little comfort when you've got one. Does it fail silently, throw its hands in the air and core dump, or handle the situation gracefully and carry on without missing a beat? Ask your vendor what their product does. As Curtis mentioned, not all de-dupe s/ware relies purely on hashes. Balance this with the /fact/ that there's already a chance of undetected corruption in the components you buy today, which is why most technologies that survive impose their own data validation checks instead of relying purely on the underlying technology in the stack to have checked it for them. The multi-layered checks that go on improve your overall confidence. At least one design in the SiS field also accepts that hashing algorithms will improve over time, and they've had the foresight to be able to drop in new hashing schemes in future. When picking de-dupe software you should also care about Intellectual Property. Who's got what isn't necessarily clear in this space, and the patent lawyers won't be far away.
Picking the big boys helps here, but also look at people with a mature view of the marketplace (e.g. some companies are prepared to talk about licensing deals rather than court cases when they encounter infringement). There are lots of other things to consider in picking an algorithm, including how well it handles patterns that don't fall naturally on block boundaries (think of the challenges involved in de-duping 'the quick brown fox' and 'the quicker brown fox'), which will affect de-dupe ratios, and how that affects performance. And the solution's not just about the algorithm. De-dupe is a great advance, and a disruptive technology not just for backup but also for primary storage. Look forward to it, but go in with your eyes open.
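[The block-boundary problem ('the quick brown fox' vs 'the quicker brown fox') is easy to see with naive fixed-size chunking. This toy sketch is my illustration, not any product's algorithm; real de-dupe engines typically use content-defined chunking so boundaries resynchronize after an insertion.]

```python
def fixed_chunks(data, size=8):
    """Split data into naive fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

a = b"the quick brown fox jumps over the lazy dog"
b = b"the quicker brown fox jumps over the lazy dog"

# The two-byte insertion ("er") shifts every later fixed-size boundary,
# so almost nothing deduplicates between the two versions.
shared = set(fixed_chunks(a)) & set(fixed_chunks(b))
print(shared)  # only the chunk before the insertion point matches
```

[With fixed 8-byte chunks, only the first chunk is common to both strings; everything after the edit misaligns, which is exactly the de-dupe-ratio effect the post warns about.]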
Re: [Veritas-bu] Tapeless backup environments?
Chris Freemantle said: It's interesting that the probability of any 2 randomly selected hashes being the same is quoted, rather than the probability that at least 2 out of a whole group are the same. That's probably because the minutely small chance becomes rather bigger when you consider many hashes. This will still be small, but I suspect not as reassuringly small. To illustrate this consider the 'birthday paradox'. I'm really glad you pointed this out. The way I interpret this is that the odds of there being a hash collision in your environment increase with every new block of data you submit to the de-duplication system. I've talked to somebody who has researched this mathematically, and he says he's going to share his calculations with me. I'll share them if/when he shares them with me. As a proponent of these systems, I certainly don't want to misrepresent the risks they present. For our data I would certainly not use de-duping, even if it did work well on image data. I think you're under the misconception that all de-dupe systems use ONLY hashes to identify redundant data. While there are products that do this (and I still trust them more than you do), there are also products that do a full block comparison of the supposedly matching blocks before throwing one of them away. In addition, there are ways to completely remove the risk you're worried about. If you back up to a de-dupe backup system, regardless of its design, and then use your backup software to copy from it to tape (or anything else), you verify the de-duped data, as any good backup software will check all data it copies against its own stored checksums.
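[The "full block comparison" design Curtis describes can be sketched in a few lines. This is a hypothetical toy, not any vendor's implementation: chunks are indexed by SHA-1, but a new chunk is only discarded after a byte-for-byte comparison confirms the match, so a hash collision costs a little extra storage rather than silent corruption.]

```python
import hashlib

class VerifyingDedupStore:
    """Toy chunk store: SHA-1 index plus full-comparison verification."""

    def __init__(self):
        self._chunks = {}  # hex digest -> list of chunks sharing that digest

    def store(self, chunk):
        """Store a chunk, returning a (digest, index) reference."""
        digest = hashlib.sha1(chunk).hexdigest()
        bucket = self._chunks.setdefault(digest, [])
        for i, existing in enumerate(bucket):
            if existing == chunk:      # byte-for-byte check, not just the hash
                return digest, i       # genuine duplicate: nothing new stored
        bucket.append(chunk)           # first sighting -- or a hash collision
        return digest, len(bucket) - 1

    def load(self, ref):
        digest, i = ref
        return self._chunks[digest][i]

store = VerifyingDedupStore()
ref_a = store.store(b"the quick brown fox")
ref_b = store.store(b"the quick brown fox")   # deduplicated against ref_a
assert ref_a == ref_b
assert store.load(ref_a) == b"the quick brown fox"
```

[The trade-off is the cost of re-reading the stored chunk on every match, which is one reason hash-only designs exist at all.]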
Re: [Veritas-bu] Tapeless backup environments?
Bob, I'll try to respond as best I can. No importa. The length of the checksum/hash/fingerprint and the sophistication of its algorithm only affect how frequently--not whether--the incorrect answer is generated. You and I don't disagree on this. The only thing we differ on is the odds of the event. I think the odds are small enough not to be concerned with, and you think they're larger than that. (I also think it's important to state what I stated in my other reply. Most de-dupe systems do not rely only on hashes. So if you can't get past this whole hashing thing, there's no reason to reject de-dupe altogether. Just make sure your vendor uses an alternate method.) The notion that the bad guys will never figure out a way to plant a silent data-change based on checksum/hash/fingerprint collisions is, IMO, naive. So someone is going to exploit the hash collision possibilities in my backup system to do what, exactly? As much as I've spoken and written about storage security, I can't for the life of me figure out what someone would hope to gain or how they would gain it this way. Those are impressive, and dare I guess, vendor-supplied, numbers. And they're meaningless. These are odds based on the size of the key space. If you have 2^160 odds, you have a 1:2^160 chance of a collision. What _is_ important? To me, it's important that if I read back any of the N terabytes of data I might store this week, I get the same data that was written, not a silently changed version because the checksum/hash/fingerprint of one block that I wrote collides with another checksum/hash/fingerprint. This is referring to the birthday paradox. As I stated in another post, I haven't thought about this before, and am looking into what the real odds are. I'm trying to translate it into actual numbers. I can NOT have that happen to any block--in a file clerk's .pst, a directory inode or the finance database. Probably, it won't happen is not acceptable. Couldn't agree more.
Let's compare those odds with the odds of an unrecoverable read error on a typical disk--approximately 1 in 100 trillion. Bogus comparison. In this straw man, that 1/100,000,000,000,000 read error a) probably doesn't affect anything I thought probably wasn't acceptable? I'm sorry, that was just too close to your previous use of probably in a very different context. probably doesn't affect anything because of the higher-level RAID array it's in and b) if it does, there's an error, a we-could-not-read-this-data, you-can't-proceed, stop, fail, get-it-from-another-source error--NOT a silent changing of the data from foo to bar on every read with no indication that it isn't the data that was written. I think Darren's other posts about this point are sufficient. It happens. It happens all the time, and is well documented. And yet the industry's ok with this. On the other hand, the odds of what we're talking about are significantly smaller and people are freaking out. If you want to talk about the odds of something bad happening and not knowing it, keep using tape. Everyone who has worked with tape for any length of time has experienced a tape drive writing something that it then couldn't read. That's not news, and why we've been making copies of data for, oh, 50 years or so. I'm just saying that a hash collision, however possible, would basically translate into a failed backup that looks good. Do you have any idea how many failed backups that look good happen every single day with tape? And, as long as you bring up making copies, making copies of your de-duped data removes any concerns, as it verifies the original. Compare that to successful deduplication disk restores. According to Avamar Technologies Inc. (recently acquired by EMC Corp.), none of its customers has ever had a failed restore. Now _there's_ an unbiased source. Touché. Anyone who has actually experienced a hash collision in their de-duplication backup system please stand up.
Given the hype that de-dupe has made, don't you think that anyone who had experienced such a thing would have reported it, and that such a report would have been given big press? I sure do. And yet there has been nothing.
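[For scale, the 2^160 key space cited throughout this thread can be written out exactly (a quick illustration of mine, not from any post):]

```python
# The full 160-bit key space, written out as a decimal number.
keyspace = 2 ** 160
print(keyspace)            # about 1.46 * 10^48
print(len(str(keyspace)))  # 49 digits
```

[Note this is the size of the space, not by itself a collision probability; the birthday-style calculation over the number of chunks actually stored is the figure the two sides of this thread are arguing about.]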
Re: [Veritas-bu] Tapeless backup environments?
On Wed, Sep 26, 2007 at 05:15:08PM -0400, bob944 wrote: Perhaps anything can have a failure mode where it doesn't alert--but in a previous lifetime in hardware and some design, I saw only one undetected data transformation that did not crash or in some way cause obvious problems (intermittent gate in a mainframe adder that didn't affect any instructions used by the OS). There's a lot more data out there now (more chances for problems). Disk firmware has become much more complex. I don't remember a disk that didn't maintain, compare and _use for error detection_ the cylinder, head and sector numbers in the format. Disks may (usually) do that, but they don't report it back to you so you can verify, and they're not perfect. One of the ZFS developers wrote about a disk firmware bug they uncovered. Every once in a while the disk would return the data not from the requested block but from a block at some odd calculated offset from it. Unless the array/controller/system is checking the data, you'll never know until it hits something critical. Netapp also talks about the stuff they had to add because of silently dropped writes and corrupted reads. Everything has an error rate. -- Darren Dunham [EMAIL PROTECTED] Senior Technical Consultant TAOS http://www.taos.com/ Got some Dr Pepper? San Francisco, CA bay area This line left intentionally blank to confuse you.
Re: [Veritas-bu] Tapeless backup environments?
cpreston: Simplistically, it checksums the block and looks in a table of checksums-of-blocks-that-it-already-stores to see if the identical ahem, anyone see a hole here? data already lives there. To what hole do you refer? The idea that N bits of data can unambiguously be represented by fewer than N bits. Anyone who claims to the contrary might as well knock out perpetual motion, antigravity and faster-than-light travel while they're on a roll. I see one in your simplistic example, but not in what actually happens (which require a much longer technical explanation). Hence my introduction that began with [s]implistically. But throw in all the much longer technical explanation you like, any process which compares a reduction-of-data to another reduction-of-data will sooner or later return foo when what was originally stored was bar. cpreston: There are no products in the market that rely solely on a checksum to identify redundant data. There are a few that rely solely on a 160-bit hash, which is significantly larger than a checksum (typically 12-16 No importa. The length of the checksum/hash/fingerprint and the sophistication of its algorithm only affect how frequently--not whether--the incorrect answer is generated. [...] The ability to forcibly create a hash collision means absolutely nothing in the context of deduplication. Of course it does. Most examples in the literature concern storing crafted-data-pattern-A (pay me one dollar) in order for the data to be read later as something different (pay me one million dollars). It can't have escaped your attention that every day, some yahoo crafts another buffer-or-stack overflow exploit; some of them are brilliant. The notion that the bad guys will never figure out a way to plant a silent data-change based on checksum/hash/fingerprint collisions is, IMO, naive. What matters is the chance that two random chunks would have a hash collision. 
With a 128-bit and 160-bit key space, the odds of that happening are 1 in 2128 with MD5, and 1 in 2160 with SHA-1. That's 1038 and 1048, respectively. If you Grasshopper, the wisdom is not in the numbers, it is in remembering that HTML will not paste into ASCII well. But I suspect you mean one in 2^128 or similar. Those are impressive, and dare I guess, vendor-supplied, numbers. And they're meaningless. We do not care about the odds that a particular block the quick brown fox jumps over the lazy dog checksums/hashes/fingerprints to the same value as another particular block now is the time for all good men to come to the aid of their party. Of _course_ that will be astronomically unlikely, and with sufficient hand-waving (to quote your article: the odds of a hash collision with two random chunks are roughly 1,461,501,637,330,900,000,000,000,000 times greater than the number of bytes in the known computing universe) these totally meaningless numbers can seem important. They're not. What _is_ important? To me, it's important that if I read back any of the N terabytes of data I might store this week, I get the same data that was written, not a silently changed version because the checksum/hash/fingerprint of one block that I wrote collides with another checksum/hash/fingerprint. I can NOT have that happen to any block--in a file clerk's .pst, a directory inode or the finance database. Probably, it won't happen is not acceptable. Let's compare those odds with the odds of an unrecoverable read error on a typical disk--approximately 1 in 100 trillion. Bogus comparison. In this straw man, that 1/100,000,000,000,000 read error a) probably doesn't affect anything because of the higher-level RAID array it's in and b) if it does, there's an error, a we-could-not-read-this-data, you-can't-proceed, stop, fail, get-it-from-another-source error--NOT a silent changing of the data from foo to bar on every read with no indication that it isn't the data that was written.
If you want to talk about the odds of something bad happening and not knowing it, keep using tape. Everyone who has worked with tape for any length of time has experienced a tape drive writing something that it then couldn't read. That's not news, and why we've been making copies of data for, oh, 50 years or so. Compare that to successful deduplication disk restores. According to Avamar Technologies Inc. (recently acquired by EMC Corp.), none of its customers has ever had a failed restore. Now _there's_ an unbiased source.
Re: [Veritas-bu] Tapeless backup environments?
Just a teensy point - LTO3 tapes should store 400GB natively. They're marketed as having a capacity of up to 800GB, but that's with 2:1 compression. We normally get about 550GB for MRI data. LTO4 tapes are available with 800GB native capacity, and the drives can also encrypt data. Dave Markham wrote: Guys, I've just read this thread and can say I'm very interested in it. The first thing is I learned a new term called deduplication which I didn't know existed. Question: I gather deduplication is using other software. DataDomain I think I saw mentioned. Where does this fit in with Netbackup, and does the software reside on every client or just a server somewhere? Ok, so I'm trying to kit-refresh a backup environment for a customer which has 2 sites, production and DR, about 200 miles apart. There is a link between the sites, but the customer will probably frown on increased bandwidth charges to transfer backup data across for disaster recovery purposes. Data is probably only 1TB for the site, with perhaps 70% required to be transferred daily to offsite media. Currently I use tape, and I was just speccing a new tape system, as I thought by using disk based backups, with retentions of weekly/monthly backups lasting say 6 weeks, I'm going to need a LOT of disk, plus the bandwidth transfer costs to the DR site. LTO3 tapes are storing 200GB a tape, which is pretty good compared to disk, I thought. I guess in my setup it's a trade-off between: initial cost of disk array vs initial cost of tape library, drives and media; time taken to back up (the network will be the bottleneck here - still on a 100Meg LAN, with just 2 DB servers using Gigabit LAN to the backup server); and offsite transfer of tapes daily to the offsite location vs cost of increased bandwidth between sites to transfer backup data. I'm now confused what to propose :) -- Do you want a picture of your brain - volunteer for a brain scan! http://www.fil.ion.ucl.ac.uk/Volunteers/ Computer systems go wrong - even backup systems. Be paranoid!
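[A quick back-of-the-envelope helps with Dave's sizing question. The 8-hour nightly window below is my assumption, not from his post:]

```python
# Rough bandwidth needed to replicate the daily change set offsite.
data_bytes = 1.0e12      # ~1 TB total data (from Dave's post)
daily_fraction = 0.70    # ~70% must go offsite each day (from Dave's post)
window_hours = 8         # assumed nightly transfer window

bits_to_move = data_bytes * daily_fraction * 8
required_mbit_s = bits_to_move / (window_hours * 3600) / 1e6
print(round(required_mbit_s, 1))  # ~194.4 Mbit/s sustained
```

[Roughly 194 Mbit/s sustained, i.e. well beyond a 100 Mbit LAN, which is why the tape-courier vs bandwidth trade-off (and de-dupe, which shrinks the daily change set before it crosses the wire) matters in this scenario.]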
Chris Freemantle, Data Manager Wellcome Trust Centre for Neuroimaging +44 (0)207 833 7496 www.fil.ion.ucl.ac.uk
Re: [Veritas-bu] Tapeless backup environments?
Most of this, while well documented, seems to boil down to the same alarmist notion that had people trying to ban cell phones in gas stations. The possibility that something untoward COULD happen does NOT mean it WILL happen. To date I don't know of a single gas pump explosion or car fire that was traced to cell phone usage at the pump. Oddly enough, though, no one monitors gas pumps to be sure users aren't re-entering their vehicles, and fires HAVE been traced to static electricity caused by that. If odds are so important, it seems it would be important to worry about the odds that your data center, your offsite storage location and your Disaster Recovery site will all be taken out at the same time. I also suggest the argument is flawed because it seems to imply that only the cksum is stored and not the actual data - it is the original compressed data AND the cksum that produce the restore, not the cksum alone. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of bob944 Sent: Wednesday, September 26, 2007 4:03 AM To: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? snip
Re: [Veritas-bu] Tapeless backup environments?
Pls read my other post about the odds of this happening. With a decent key space, the odds of a hash collision with a 160-bit key space are so small that any statistician would call them zero. 1 in 2^160. Do you know how big that number is? It's a whole lot bigger than it looks. And those odds are significantly better than the odds that you would write a bad block of data to a regular disk drive and never know it. --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: bob944 [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 26, 2007 4:03 AM To: veritas-bu@mailman.eng.auburn.edu Cc: Curtis Preston Subject: RE: [Veritas-bu] Tapeless backup environments? snip
Re: [Veritas-bu] Tapeless backup environments?
It's interesting that the probability of any 2 randomly selected hashes being the same is quoted, rather than the probability that at least 2 out of a whole group are the same. That's probably because the minutely small chance becomes rather bigger when you consider many hashes. This will still be small, but I suspect not as reassuringly small. To illustrate this, consider the 'birthday paradox'. How many people do you need in a room to have at least a 50% chance that 2 of them have the same birthday? The chance of any 2 randomly chosen people sharing the same birthday is 1/365 (neglecting leap years). That's quite small, so we need a lot of people to get a 50% chance, right? Wrong. You need 23 people. Google for 'birthday paradox' for the simple maths. For our data I would certainly not use de-duping, even if it did work well on image data. bob944 wrote: cpreston: Simplistically, it checksums the block and looks in a table of checksums-of-blocks-that-it-already-stores to see if the identical (ahem, anyone see a hole here?) data already lives there. To what hole do you refer? The idea that N bits of data can unambiguously be represented by fewer than N bits. Anyone who claims to the contrary might as well knock out perpetual motion, antigravity and faster-than-light travel while they're on a roll. I see one in your simplistic example, but not in what actually happens (which requires a much longer technical explanation). Hence my introduction that began with [s]implistically. But throw in all the much longer technical explanation you like, any process which compares a reduction-of-data to another reduction-of-data will sooner or later return foo when what was originally stored was bar. cpreston: There are no products in the market that rely solely on a checksum to identify redundant data. There are a few that rely solely on a 160-bit hash, which is significantly larger than a checksum (typically 12-16 No importa. 
The length of the checksum/hash/fingerprint and the sophistication of its algorithm only affect how frequently--not whether--the incorrect answer is generated. [...]
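The 23-people figure quoted above checks out numerically. A short sketch (mine, not from the post) that computes the birthday probability directly:

```python
def shared_birthday_probability(k: int, days: int = 365) -> float:
    """Chance that at least two of k people share a birthday (uniform, no leap years)."""
    p_all_distinct = 1.0
    for i in range(k):
        p_all_distinct *= (days - i) / days   # i-th person avoids all earlier birthdays
    return 1.0 - p_all_distinct

print(shared_birthday_probability(22))   # just under 0.5
print(shared_birthday_probability(23))   # just over 0.5
```

The same "any pair, not a particular pair" effect is why the collision odds for a whole backup store are larger than the quoted odds for two random chunks, though with 128- or 160-bit hashes the group-wide probability remains minuscule.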
Re: [Veritas-bu] Tapeless backup environments?
On Wed, Sep 26, 2007 at 04:02:49AM -0400, bob944 wrote: Bogus comparison. In this straw man, that 1/100,000,000,000,000 read error a) probably doesn't affect anything because of the higher-level RAID array it's in and b) if it does, there's an error, a we-could-not-read-this-data, you-can't-proceed, stop, fail, get-it-from-another-source error--NOT a silent changing of the data from foo to bar on every read with no indication that it isn't the data that was written. While I find the "compare only based on hash" approach a bit annoying for other reasons, the argument above doesn't convince me. Disks, controllers, and yes RAID arrays can fail silently in all sorts of ways by either acknowledging a write that is not done, writing to the wrong location, reading from the wrong location, or reading blocks where only some of the data came from the correct location. Most RAID systems do not verify data on read to protect against silent data errors on the storage, only against obvious failures. -- Darren Dunham [EMAIL PROTECTED] Senior Technical Consultant TAOS http://www.taos.com/ Got some Dr Pepper? San Francisco, CA bay area This line left intentionally blank to confuse you. ___ Veritas-bu maillist - Veritas-bu@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
Re: [Veritas-bu] Tapeless backup environments?
On Wed, Sep 26, 2007 at 09:58:12AM -0400, Jeff Lightner wrote: I also suggest the argument is flawed because it seems to imply that only the cksum is stored and not the actual data - it is the original compressed data AND the cksum that result in the restore - not the cksum alone. It's not that the actual data isn't stored, it's whether or not the actual data is checked. Some algorithms search through the hash space, and if a hit comes up, they assume that the previously stored data is a match without a comparison. The original data must always be stored. Even if it were possible to run a hash algorithm in reverse quickly, there would be no way to determine which of various possible input strings was the original. -- Darren Dunham
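The two policies described above, trusting the hash alone on a hit versus byte-comparing before treating a block as a duplicate, can be sketched as a toy store. All names here are hypothetical illustrations, not any vendor's implementation:

```python
import hashlib

class DedupStore:
    """Toy content-addressed block store (illustrative only)."""

    def __init__(self, verify_on_hit: bool):
        self.blocks = {}              # digest -> stored bytes
        self.verify_on_hit = verify_on_hit

    def put(self, block: bytes) -> str:
        digest = hashlib.sha1(block).hexdigest()
        if digest in self.blocks:
            # Hash hit. Without the byte-compare below, a colliding block
            # would silently be replaced by the earlier block's data on read.
            if self.verify_on_hit and self.blocks[digest] != block:
                raise RuntimeError("hash collision: a real system would store this separately")
            return digest             # dedup hit: nothing new stored
        self.blocks[digest] = block
        return digest

    def get(self, digest: str) -> bytes:
        return self.blocks[digest]

store = DedupStore(verify_on_hit=True)
ref = store.put(b"some block")
ref2 = store.put(b"some block")       # dedup hit: same reference, stored once
```

With `verify_on_hit=False`, a collision produces exactly the silent foo-becomes-bar corruption bob944 is worried about; with it, collisions cost a compare but can never corrupt data.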
Re: [Veritas-bu] Tapeless backup environments?
On Wed, Sep 26, 2007 at 04:22:01PM +0100, Chris Freemantle wrote: For our data I would certainly not use de-duping, even if it did work well on image data. There are different ways of doing deduplication. Not all of them rely on hash signature matching to find redundant data. You should talk with a particular vendor and see how they accomplish it. -- Darren Dunham
Re: [Veritas-bu] Tapeless backup environments?
Most of this, while well documented, seems to boil down to the same alarmist notion that had people trying to ban cell phones in gas stations. The possibility that something untoward COULD happen does NOT mean it WILL happen. To date I don't know of a single gas pump I can't speak for car fires, but I can speak for checksums/hashes/fingerprints mapping to more than one set of data. It's been demonstrated. It happens. It _has_ to happen. It's the way these data reductions work, and the reason why it's more convenient to refer to small hashes of data rather than the full data for many uses--this has been a programming commonplace since the '50s. But programmers know it's not a two-way street: a set of data generates only one checksum/hash/fingerprint, but one checksum/hash/fingerprint maps to more than one set of data. And that's fine, for a program that takes this into account (either because it doesn't matter to the program's logic or a secondary step checks the data). As a trivial example, reducing three-bit data to a two-bit checksum means that trying to go backwards will retrieve the wrong three-bit data 50% of the time. Bigger hashes and more sophisticated algorithms reduce the number of times you get the wrong data; they don't eliminate it. If odds are so important it seems it would be important to worry about the odds that your data center, your offsite storage location and your Disaster Recovery site will all be taken out at the same time. And if it's not important that the data you read may not be what was written, don't let me stop you. _The odds are_ that it'll be okay. I also suggest the argument is flawed because it seems to imply that only the cksum is stored and not the actual data - it is the original compressed data AND the cksum that result in the restore - not the cksum alone. If I get your meaning, you have an incorrect understanding of the argument--nobody is talking about generating the original data from a checksum. 
As I said in what you quoted (trimmed here), every unique (as determined by the implementation) block of data gets stored, once. A data stream is stored as a list of pointers or checksums/hashes/fingerprints which refer to those common-storage blocks. Any number of data streams will point to the same block when they have it in common, and as many times as that block occurs in their data stream. To read the data stream later, the list of pointers tells the implementation what blocks to retrieve and send back to the file reader. Now, if foo and bar both reduced to the same checksum/hash/fingerprint when stored, somebody is going to receive the wrong data when the stream(s) that had those data are read. So sorry about that corrupted payroll master file...
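The three-bit-to-two-bit illustration above can be enumerated directly. This little sketch (mine) uses `value % 4` as the stand-in two-bit checksum:

```python
from collections import defaultdict

# Map each 3-bit value onto a 2-bit "checksum" (here simply the low two bits)
# and group the inputs by checksum to count collisions.
buckets = defaultdict(list)
for value in range(8):                  # all eight 3-bit inputs
    buckets[value % 4].append(value)    # 2-bit checksum

for checksum, values in sorted(buckets.items()):
    print(checksum, values)             # every checksum has exactly two preimages
```

Each of the four checksums maps back to exactly two of the eight inputs, so picking a preimage blind is wrong half the time, which is the 50% figure in the text. By the pigeonhole principle the same many-to-one mapping holds for any hash shorter than its input; only the collision frequency changes.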
Re: [Veritas-bu] Tapeless backup environments?
Pls read my other post about the odds of this happening. With a decent key space, the odds of a hash collision with a 160-bit key space are so small that any statistician would call them zero. 1 in 2^160. Do you know how big that number is? It's a whole lot bigger than it looks. And those odds are significantly better than the odds that you would write a bad block of data to a regular disk drive and never know it. I did read your other post, and addressed your numbers. C Freemantle makes the same point I do, perhaps more clearly, in his birthday paradox posting.
Re: [Veritas-bu] Tapeless backup environments?
On Wed, Sep 26, 2007 at 04:02:49AM -0400, bob944 wrote: Bogus comparison. In this straw man, that 1/100,000,000,000,000 read error a) probably doesn't affect anything because of the higher-level RAID array it's in and b) if it does, there's an error, a we-could-not-read-this-data, you-can't-proceed, stop, fail, get-it-from-another-source error--NOT a silent changing of the data from foo to bar on every read with no indication that it isn't the data that was written. While I find the "compare only based on hash" approach a bit annoying for other reasons, the argument above doesn't convince me. Disks, controllers, and yes RAID arrays can fail silently in all sorts of ways by either acknowledging a write that is not done, writing to the wrong location, reading from the wrong location, or reading blocks where only some of the data came from the correct location. Most RAID systems do not verify data on read to protect against silent data errors on the storage, only against obvious failures. Perhaps anything can have a failure mode where it doesn't alert--but in a previous lifetime in hardware and some design, I saw only one undetected data transformation that did not crash or in some way cause obvious problems (an intermittent gate in a mainframe adder that didn't affect any instructions used by the OS). I don't remember a disk that didn't maintain, compare and _use for error detection_ the cylinder, head and sector numbers in the format. The write frailties mentioned, if they occur, will fail on read. And the read frailties mentioned will generally (homage paid to the mainframe example I cited as the _only_ one I ever saw that didn't) cause enough mayhem that apps or data or systems go belly-up in a big way, fast. These events, like double-bit parity errors or EDAC failures, involve:
1. that something breaks in the first place
2. that it not be reported
3. that the effects are so subtle that they are unnoticed (the app or system doesn't crash, the data aren't wildly corrupted, ...)
The problem with checksumming/hashing/fingerprinting is that the methodology has unavoidable errors designed in, and an implementation with no add-on logic to prevent or detect them will silently corrupt data. That's totally different, IMO.
Re: [Veritas-bu] Tapeless backup environments?
And yet there are many companies backing up well beyond a terabyte from remote offices back to their central office using de-duplication. Consider JPMC's presentation at the last Vision. They're backing up over 200 remote offices using PureDisk, a de-duplication backup product. I don't remember the exact numbers, but many of them were quite large. I don't think that bandwidth is free, but neither are trucks. AND if you're going the truck route, make sure you add the cost and risk of an encryption system to the mix. --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ed Wilts Sent: Saturday, September 22, 2007 9:35 AM To: 'Jeff Lightner'; veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? Here's some simple math that may help (compliments of ExaGrid's web site). If you have 1TB of data with a 2% change rate, you'll need to back up 20GB of daily incrementals. To replicate this to another site in 18 hours requires 3Mbps of bandwidth. If you have lots of bandwidth or not too much data, replication to an offsite location may make sense. But to think that you can replicate your backups for 20TB of data to another state is going to make your network group squirm. Iron Mountain looks pretty cheap compared to offsite electronic replication. We have 1 application by itself that adds 30GB of new data every day. It's being replicated within the metro area over a 1Gbps pipe (real time, not via backups). We sure couldn't replicate everything... As the OLD saying goes, never underestimate the bandwidth of a station wagon full of tapes. .../Ed -- Ed Wilts, RHCE, BCFP, BCSD Mounds View, MN, USA mailto:[EMAIL PROTECTED] From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Lightner Sent: Friday, September 21, 2007 8:44 AM To: veritas-bu@mailman.eng.auburn.edu Subject: [Veritas-bu] Tapeless backup environments? 
Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. It made me wonder if anyone was actually doing the above already or was planning to do so?
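The replication arithmetic quoted in the thread (20 GB of daily incrementals replicated in 18 hours needing roughly 3 Mbps) can be checked in a few lines; the function name is mine:

```python
def required_mbps(daily_gb: float, window_hours: float) -> float:
    """Sustained bandwidth (Mbps) to move daily_gb within window_hours.
    Uses decimal GB (1e9 bytes), matching vendor-style math."""
    bits = daily_gb * 1e9 * 8
    return bits / (window_hours * 3600.0) / 1e6

print(round(required_mbps(20, 18), 2))    # ~2.47 Mbps; the quoted 3 Mbps rounds up
print(round(required_mbps(400, 18), 1))   # ~49.4 Mbps for 20 TB at the same 2% change rate
```

The second line is why the thread says replicating backups for 20 TB of data "is going to make your network group squirm": the requirement scales linearly with the daily change volume.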
Re: [Veritas-bu] Tapeless backup environments?
A 1 TB array that can store 20 TB of de-duped data in it will cost about $20K. (A general rule of thumb is to base your pricing on a 20:1 de-dupe ratio, then price it at about $1/GB of effective storage. If you do that, you'll be close to list price of a lot of products.) At that cost, it's very close to the price of a tape library fully populated with tapes and drives. As to whether or not it's worth it for a given setup, you should obviously test it vs the pricing, but it's very uncommon for it to not make sense financially. I can think of three setups that are known issues: 1. If you're using it for disk staging and not storing any retention on it. A lot of the de-dupe comes from de-duping full backups against each other. 2. If you're trying to de-dupe non-dedupe-able things, such as seismic data, medical imaging data, or any other data types that are automatically created by a computer (as opposed to database entries and Word docs.) 3. If your backup product doesn't do full backups of filesystem data, you will not get as much as other people. Everything is also negotiable. If you've tested and you're not getting the advertised de-dupe ratio, use that in the negotiation stage. If they generally advertise 20:1 and you're only getting 10:1, it would seem reasonable to assume a 50% discount. --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: Ed Wilts [mailto:[EMAIL PROTECTED] Sent: Saturday, September 22, 2007 9:47 AM To: Curtis Preston; 'Justin Piszcz'; 'Jeff Lightner' Cc: veritas-bu@mailman.eng.auburn.edu Subject: RE: [Veritas-bu] Tapeless backup environments? But Curtis, a disk drive by itself isn't very useful either - you'll need to a controller or two. And don't forget to factor in the price of the de-duplication appliances or software. Those suckers are *NOT* cheap. An appliance to support 1TB of compressed data lists out at about $20K. 
Unless you get a *lot* of de-duplication - and not everybody does - that appliance is going to get killed on price compared to not de-duping it. It took me only 30 minutes with a de-dupe vendor last week to eliminate their product from consideration in our environment. .../Ed -- Ed Wilts, RHCE, BCFP, BCSD Mounds View, MN, USA mailto:[EMAIL PROTECTED] -Original Message- From: [EMAIL PROTECTED] [mailto:veritas-bu- [EMAIL PROTECTED] On Behalf Of Curtis Preston Sent: Friday, September 21, 2007 12:10 PM To: Justin Piszcz; Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? First, you can't compare the cost of disk and tape directly like that. You have to include the drives and robots. A drive by itself is useful; a tape by itself is not. Setting that aside, if I put that disk in a system that's doing 20:1 de-duplication, my cost is now 1.65c/GB vs your 3-9c/GB. --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Justin Piszcz Sent: Friday, September 21, 2007 7:36 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? I believe disks are 33c/gigabyte and tapes are 3-9cents/gigabyte or even cheaper, I do not remember the exact figures, but someone I know has done a cost analysis and tapes were by far cheaper. Also something that nobody calculates is the cost of power to keep disks spinning. Justin. On Fri, 21 Sep 2007, Jeff Lightner wrote: Disk is not cheaper? You've done a cost analysis? Not saying you're wrong and I haven't done an analysis but I'd be surprised if disks didn't actually work out to be cheaper over time: 1) Tapes age/break - We buy on average several hundred tapes a year - support on a disk array for failing disks may or may not be more expensive. 
2) Transport/storage - We have to pay for offsite storage and transfer - it seems just putting an array in offsite facility would eliminate the need for transportation (in trucks) cost. Of course there would be cost in the data transfer disk to disk but since everyone seems to have connectivity over the internet it might be possible to do this using a B2B link rather than via dedicated circuits. 3) Labor cost in dealing with mechanical failures of robots. This one is hidden in salary but every time I have to work on a robot it means I can't be working on something else. While disk drives fail it doesn't seem to happen nearly as often as having to fish a tape out of a drive or the tape drive itself having failed. -Original Message- From: Justin Piszcz [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 10:08 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu
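Curtis's cost arithmetic from this exchange (33 c/GB raw disk at 20:1 de-dupe giving 1.65 c/GB, and a $20K appliance over 20 TB effective giving about $1/GB) works out as follows; a quick sketch, names mine:

```python
def effective_cost_per_gb(raw_cost_per_gb: float, dedupe_ratio: float) -> float:
    """Raw media cost divided by the de-duplication ratio."""
    return raw_cost_per_gb / dedupe_ratio

print(effective_cost_per_gb(0.33, 20))   # 0.0165 dollars, i.e. the 1.65 c/GB in the thread
print(20_000 / (20 * 1000))              # $20K appliance / 20 TB effective = $1/GB effective
```

Note how sensitive the result is to the ratio: at the 10:1 some sites actually see, the effective cost doubles, which is exactly the negotiation lever Curtis suggests.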
Re: [Veritas-bu] Tapeless backup environments?
Ed Wilts said: 1) Disk ages and breaks too. But with RAID, no longer will the failure of a piece of media cause a backup or restore failure. 2) Transport is cheap. I'd be surprised if I couldn't transport a thousand tapes for the cost of a terabyte of storage. Bandwidth to move data is *NOT* cheap. 20GB/day requires 3Mbps of pipe. I've done a number of cost comparisons lately, and you're right. It's not cheap, but it's not astronomical either. And you need to weigh that cost against not having the risk of a lost tape and all the multi-million dollar costs that come along with that these days. 3) I spend more time replacing disk drives than I do replacing tapes or tape drives. To back up my 1200 SAN-based spindles, I have 6 LTO-3 drives. You have 200 times more disk drives than you have tape drives. Of course you spend more time replacing them. But those drive failures never have to cause backup or restore failures, as tape/drive failures do. Try having a few hundred tape drives and see how your life changes. I have a customer with 100 drives and their tape drive vendor is in once a week swapping something, and each one of those swaps is associated with a backup or restore failure.
Re: [Veritas-bu] Tapeless backup environments?
I currently backup 9TB of data to VTL during a FULL window, which writes ~100GB of data to the VTL repository in that window. Another state is one thing, but across town via DWDM is no prob. Out of state is handled by duping that data to phys tape; wouldn't want to dupe disk outside of a DWDM connection. Paul -- -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ed Wilts Sent: September 22, 2007 9:35 AM To: 'Jeff Lightner'; veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? Here's some simple math that may help (compliments of ExaGrid's web site). If you have 1TB of data with a 2% change rate, you'll need to back up 20GB of daily incrementals. To replicate this to another site in 18 hours requires 3Mbps of bandwidth. If you have lots of bandwidth or not too much data, replication to an offsite location may make sense. But to think that you can replicate your backups for 20TB of data to another state is going to make your network group squirm. Iron Mountain looks pretty cheap compared to offsite electronic replication. We have 1 application by itself that adds 30GB of new data every day. It's being replicated within the metro area over a 1Gbps pipe (real time, not via backups). We sure couldn't replicate everything... As the OLD saying goes, never underestimate the bandwidth of a station wagon full of tapes. .../Ed This email may contain privileged and/or confidential information, and the Bank of Canada does not waive any related rights. Any distribution, use, or copying of this email or the information it contains by other than the intended recipient is unauthorized. If you received this email in error please delete it immediately from your system and notify the sender promptly by email that you have done so.
Re: [Veritas-bu] Tapeless backup environments?
With VTL there is no need to multistream. Instead of writing 8 streams to 1 drive, just create 8 virtual drives, and don't multiplex. It's not because of a performance issue, it's an advantage of virtualization. As far as performance goes, with a disk-as-disk config, to create a high-perf target you would need to create HLUNs which are striped over many, many LUNs on your array, or present LUNs which are stripes of segments of many RAID groups. Many VTLs (the one I'm using, for instance) distribute the writes over many LUNs. I'm currently writing dozens of simultaneous jobs distributed over 28 separate LUNs. The data reduction (compression) throughput I'm getting with VTL is definitely better, on a per-client-job basis, than I was getting with MPX'ed jobs going to LTO2. Offsite is SUPER easy... we replicate our LUNs containing the de-duped data to our DR site. To bring up the other site, once the DR LUNs are made R/W, we just start the daemons on the DR VTL and away we go. The devices are available there as they were at head office. Don't even need to rediscover devices on the NBU servers. Vault works great for spinning off copies to physical tapes, if necessary. Paul -- -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Clem Kruger Sent: September 22, 2007 5:12 AM To: Jeff Lightner; Justin Piszcz Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? Compression on a VTL is done by the operating system (normally Linux) which we all know is a slow process and therefore not recommended. Your VTL supplier will also recommend that you do not multistream as this also slows down the process.
Re: [Veritas-bu] Tapeless backup environments?
Guys, I've just read this thread and can say I'm very interested in it. The first thing is I learned a new term called deduplication which I didn't know existed. Question: I gather deduplication is using other software. DataDomain I think I saw mentioned. Where does this fit in with NetBackup, and does the software reside on every client or just a server somewhere? Ok, so I'm trying to kit-refresh a backup environment for a customer which has 2 sites, Production and DR, about 200 miles apart. There is a link between the sites but the customer will probably frown on increased bandwidth charges to transfer backup data across for Disaster Recovery purposes. Data is probably only 1 TB for the site, with perhaps 70% being required to be transferred daily to offsite media. Currently I use tape and I was just speccing a new tape system, as I thought by using disk based backups, and retentions of weekly/monthly backups lasting say 6 weeks, I'm going to need a LOT of disk, plus the bandwidth transfer costs to the DR site. LTO3 tapes are storing 200GB a tape which is pretty good compared to disk I thought. I guess in my set up it's a trade off between: initial cost of disk array vs initial cost of tape library, drives and media; time taken to backup (network will be the bottleneck here - still on 100Meg LAN with just 2 DB servers using Gigabit LAN to the backup server); offsite transfer of tapes daily to offsite location vs cost of increased bandwidth between sites to transfer backup data. I'm now confused what to propose :)
Re: [Veritas-bu] Tapeless backup environments?
On Mon, 24 Sep 2007, Dave Markham wrote: [...] LTO3 tapes are storing 200GB a tape which is pretty good compared to disk I thought. LTO-3 = 400GiB [...]
Re: [Veritas-bu] Tapeless backup environments?
Data Domain makes a hardware storage device (disks) which does deduplication. Rather than backing up block for block all the time it does it only for the first backup. For subsequent backups rather than doing an incremental backup at file level it backups up incrementally at block level meaning only the blocks that changed in the source are stored on the target. The benefit to this is good for things like databases on filesystems where the datafile gets updated for any write to the datafile. A standard file incremental would backup the entire datafile but a deduplication incremental would only backup the blocks modified within the datafile. One can get what appears to be a very high level of compression to the deduplication storage. I've seen numbers like 20:1 and even one person on this list last year said something like 80:1 though that wouldn't be typical. Data Domain isn't the only deduplication company out there and we haven't yet implemented the ones we bought (though we will before the end of October). I was contacted off list by another company called Sepaton but there solution seemed to require one to one correspondence between original storage and target storage. I believe there is at least one other company doing deduplication but I don't recall who (Falconstore maybe)? -Original Message- From: Dave Markham [mailto:[EMAIL PROTECTED] Sent: Monday, September 24, 2007 11:35 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? Guys i've just read this thread and can say im very interested in it. The first thing is i learned a new term called deduplication which i didn't know existed. Question : I gather Deduplication is using other software. DataDomain i think i saw mentioned. Where does this fit in with Netbackup and does the software reside on every client or just a server somewhere? Ok, so im trying to kit refresh a backup environment for a customer which has 2 sites. 
Production and DR, about 200 miles apart. There is a link between the sites, but the customer will probably frown on increased bandwidth charges to transfer backup data across for disaster-recovery purposes. Data is probably only 1 TB for the site, with perhaps 70% required to be transferred daily to offsite media. Currently I use tape, and I was just speccing a new tape system, as I thought that by using disk-based backups, with retentions of weekly/monthly backups lasting say 6 weeks, I'm going to need a LOT of disk, plus the bandwidth transfer costs to the DR site. LTO3 tapes are storing 200 GB a tape, which is pretty good compared to disk, I thought. I guess in my setup it's a trade-off between:

- Initial cost of disk array vs. initial cost of tape library, drives and media
- Time taken to back up (network will be the bottleneck here; still on a 100 Mb LAN, with just 2 DB servers using Gigabit LAN to the backup server)
- Offsite transfer of tapes daily to an offsite location vs. cost of increased bandwidth between sites to transfer backup data

I'm now confused about what to propose :)

___ Veritas-bu maillist - Veritas-bu@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
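The block-level incremental idea described above can be made concrete with a short sketch. This is a hypothetical illustration, not Data Domain's actual implementation: the block size, the SHA-1 hash choice, and the DedupStore class are all assumptions made purely for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, purely for illustration

class DedupStore:
    """Toy single-instance block store: each unique block is kept once."""

    def __init__(self):
        self.blocks = {}  # hash -> block bytes (the single stored instance)

    def backup(self, data):
        """Return (recipe, new_blocks): the list of block hashes describing
        'data', and how many blocks actually consumed new space."""
        recipe, new_blocks = [], 0
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha1(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block
                new_blocks += 1
            recipe.append(digest)
        return recipe, new_blocks

    def restore(self, recipe):
        return b"".join(self.blocks[d] for d in recipe)

store = DedupStore()
datafile = bytes(100_000)               # a 100 KB "datafile" of zeros
recipe1, new1 = store.backup(datafile)  # first full backup

# Modify 4 bytes inside one block, then take another "full" backup:
changed = bytearray(datafile)
changed[5000:5004] = b"WXYZ"
recipe2, new2 = store.backup(bytes(changed))

assert store.restore(recipe2) == bytes(changed)
assert new2 == 1   # only the one modified block consumed new space
```

The second "full" backup stores only the single modified 4 KB block, which is the property that makes repeated fulls of a lightly changed datafile so cheap on deduplicating storage.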
Re: [Veritas-bu] Tapeless backup environments?
For on-demand database backups, I had great success setting up a simple SATA-based DSU seen by one of the media servers. It had a vault policy to dump it to tape after 4-5 days, then expire the DSU image. It worked out great for Informix onbar log dumps especially... Harry S. Atlanta

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Justin Piszcz
Sent: Saturday, September 22, 2007 10:28 AM
To: Ed Wilts
Cc: veritas-bu@mailman.eng.auburn.edu; 'Jeff Lightner'
Subject: Re: [Veritas-bu] Tapeless backup environments?

Don't even get me started on SANs. I have seen the entire loss of an MTI (now EMC) SAN, and with the new CLARiiON SANs I have seen entire shelves go offline due to bad SPAs etc. IMO, not reliable. Also with disk, I have a question about VTLs: if I am feeding multiple LTO-3 tape drives using 10Gbps, what type of disk/VTL (not SAN) is out there that can accept multiple 10Gbps streams and will not choke? VTLs seem like a good idea for filesystem backups, but for on-demand database backups I do not see them as the holy grail. Justin.

On Sat, 22 Sep 2007, Ed Wilts wrote:

1) Disk ages and breaks too. 2) Transport is cheap. I'd be surprised if I couldn't transport a thousand tapes for the cost of a terabyte of storage. Bandwidth to move data is *NOT* cheap. 20GB/day requires 3Mbps of pipe. 3) I spend more time replacing disk drives than I do replacing tapes or tape drives. To back up my 1200 SAN-based spindles, I have 6 LTO-3 drives. It sounds like you need to either replace your tape drives or treat them better. We do work on our robots perhaps once every few months. We replace disk drives on a weekly basis. NetBackup requires a *lot* more time than the robots or the disk drives ever will.
.../Ed -- Ed Wilts, RHCE, BCFP, BCSD Mounds View, MN, USA mailto:[EMAIL PROTECTED]

-Original Message-
From: [EMAIL PROTECTED] [mailto:veritas-bu- [EMAIL PROTECTED] On Behalf Of Jeff Lightner
Sent: Friday, September 21, 2007 9:34 AM
To: Justin Piszcz
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

Disk is not cheaper? You've done a cost analysis? Not saying you're wrong, and I haven't done an analysis, but I'd be surprised if disks didn't actually work out to be cheaper over time:

1) Tapes age/break - We buy on average several hundred tapes a year; support on a disk array for failing disks may or may not be more expensive.
2) Transport/storage - We have to pay for offsite storage and transfer; just putting an array in an offsite facility would eliminate the transportation (truck) cost. Of course there would be a cost in the disk-to-disk data transfer, but since everyone seems to have connectivity over the internet it might be possible to do this using a B2B link rather than dedicated circuits.
3) Labor cost in dealing with mechanical failures of robots - This one is hidden in salary, but every time I have to work on a robot it means I can't be working on something else. While disk drives fail, it doesn't seem to happen nearly as often as having to fish a tape out of a drive, or the tape drive itself having failed.

-Original Message-
From: Justin Piszcz [mailto:[EMAIL PROTECTED]
Sent: Friday, September 21, 2007 10:08 AM
To: Jeff Lightner
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

On Fri, 21 Sep 2007, Jeff Lightner wrote:

Yesterday our director said that he doesn't intend to ever upgrade the existing STK L700 because eventually we'll go tapeless, as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g.
Data Domain) and transfer to another disk device at an offsite storage site, so as to eliminate the need for ever transporting tapes. It made me wonder if anyone was actually doing the above already, or was planning to do so?

That seems to be the way people are 'thinking', but the bottom line is that disk still is not cheaper than LTO-3 tape, and there are a lot of advantages to tape; however, convincing management of this is an uphill battle. Justin.
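Ed's "20GB/day requires 3Mbps of pipe" figure from earlier in the thread can be sanity-checked with quick arithmetic (mine, not from the message; the ~3 Mbps figure presumably builds headroom over the raw rate):

```python
def mbps_needed(gb_per_day):
    """Raw Mbps needed to move a daily volume spread evenly over 24 hours."""
    bits_per_day = gb_per_day * 1e9 * 8
    return bits_per_day / 86400 / 1e6

# 20 GB/day works out to about 1.85 Mbps sustained around the clock,
# so a 3 Mbps pipe leaves headroom for overhead and shorter windows.
raw = mbps_needed(20)
```

A shorter transfer window raises the requirement proportionally: squeezing the same 20 GB into an 8-hour window triples the needed rate.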
Re: [Veritas-bu] Tapeless backup environments?
There are several. FalconStor, Diligent, Quantum and Sepaton, I believe, will all present tape to an NDMP device and provide de-dupe on the backend. Paul

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jim Horalek
Sent: September 24, 2007 12:43 PM
To: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

On a similar note, how does NDMP play with disk de-dupe? All of the de-dupes I've seen are NAS devices. NDMP only talks to tape or VTL. Are there VTLs with de-dupe that would solve the NDMP problem? Jim
Re: [Veritas-bu] Tapeless backup environments?
On a similar note, how does NDMP play with disk de-dupe? All of the de-dupes I've seen are NAS devices. NDMP only talks to tape or VTL. Are there VTLs with de-dupe that would solve the NDMP problem? Jim

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dave Markham
Sent: Monday, September 24, 2007 8:35 AM
To: Jeff Lightner
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?
Re: [Veritas-bu] Tapeless backup environments?
Hi Dave, Yes, it is a difficult decision. I have looked at DataDomain with NetBackup. I have found that the backups are faster and there is a vast amount of disk being saved. NetBackup 6.5 includes de-duplication, and I have become a great friend of it. To use the words of a supplier: Saving me Time, Saving me Space and Saving me Money :) Kind Regards, Clem Kruger

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dave Markham
Sent: 24 September 2007 17:35 PM
To: Jeff Lightner
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?
Re: [Veritas-bu] Tapeless backup environments?
Do you need a special license for 6.5, or can those with 6.0 licenses upgrade? I assume you need to open a case with NetBackup to get the download links? Justin.

On Mon, 24 Sep 2007, Clem Kruger wrote:
Re: [Veritas-bu] Tapeless backup environments?
I am not quite sure how it is done there. I would contact Symantec in your area and ask how they will manage your license. Kind Regards, Clem Kruger

-Original Message-
From: Justin Piszcz [mailto:[EMAIL PROTECTED]
Sent: 24 September 2007 19:16 PM
To: Clem Kruger
Cc: [EMAIL PROTECTED]; Jeff Lightner; veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?
Re: [Veritas-bu] Tapeless backup environments?
Dave, Dude, you've got to get out more. ;) I'd recommend continually perusing some of these sites to stay current on what's going on in the industry. De-dupe is kind of the most-mentioned topic in the storage industry since I don't know what.

http://www.searchstorage.com
http://www.byteandswitch.com
http://www.infostoremag.com
http://www.isit.com/IndexSTO.cfm
http://www.backupcentral.com (My blog)

On my blog I've got a series of entries that talks about de-duplication, starting with this one, "What is De-duplication?". I tried to link all the de-dupe entries together, so that each entry has a forwarding link to the next blog entry in the series: http://www.backupcentral.com/content/view/58/47/ Your question about where de-dupe resides is answered in this entry, "Two different types of de-dupe": http://www.backupcentral.com/content/view/129/47/ We've got directories of both types:

Hardware/Target: http://tinyurl.com/384528
Software/Source: http://tinyurl.com/2dtvh2

(I use TinyUrl.com because the URLs are very long and tend to get truncated in email. BTW, tinyurl uses de-duplication-like techniques, as they run an algorithm against the string to give you a smaller string. Then when you click on that string, they restore the original URL to your browser. Kind of cool.)

--- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dave Markham
Sent: Monday, September 24, 2007 11:35 AM
To: Jeff Lightner
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?
Re: [Veritas-bu] Tapeless backup environments?
Question: I gather deduplication uses other software. DataDomain I think I saw mentioned. Where does this fit in with NetBackup, and does the software reside on every client or just a server somewhere?

In the technologies I'm familiar with (one of them is old, another new), it's conceptually simple. The system, whether that's a standalone system, a box of disk with some smarts, or an agent on the backup client, receives data and examines it in blocks of some size (AFAIK, always way larger than a 512-byte disk block). Simplistically, it checksums the block and looks in a table of checksums-of-blocks-that-it-already-stores to see if the identical (ahem, anyone see a hole here?) data already lives there. If so, the data can be tossed away and the checksum kept. The file is stored as a collection of these checksums (imprecise term, but it works for the example), or a list of pointers to the single instance (hence the SIS term can be overloaded here) of the data represented by that checksum.

A simplistic example would be storing a TB of zeros. A deduplicating device would store the first block of zeros, then find that all the rest of them were the same checksum, same data, and just store one more pointer each time. That 1TB file becomes, say, one real instance of 512KB of zeros (if that is the block size) plus the space for a few million pointers to the same 512KB of data. Obviously, even this could be compressed, but that's another story. Backing up the same system with few changes would be a very small full backup. Backing up many instances of, say, the C: drive of W2K3 systems will deduplicate like crazy. Backing up a million different JPEGs wouldn't save any appreciable space, but backing them up twice, or multiple instances of the same JPEG, would.

LTO3 tapes are storing 200gb a tape which is pretty good compared to disk i thought.

But that's a horrible number for LTO3. Either your tapes aren't full or something is broken. Look at the available_media report to get a good idea of the range of data stored on your FULL tapes.

___ Veritas-bu maillist - Veritas-bu@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
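The TB-of-zeros example above can be sketched in a few lines (illustrative only, not any vendor's real code; the 512 KB block size follows the example, and the total is scaled down to 64 MB so it runs quickly):

```python
import hashlib

BLOCK = 512 * 1024                # 512 KB block size, as in the example
DATA_SIZE = 64 * 1024 * 1024      # 64 MB of zeros (stand-in for the 1 TB)

store = {}      # checksum -> the single stored instance of that block
pointers = []   # the "file": a list of checksums pointing into the store

zero_block = bytes(BLOCK)
checksum = hashlib.sha1(zero_block).hexdigest()
for _ in range(DATA_SIZE // BLOCK):
    if checksum not in store:     # only the very first block is stored
        store[checksum] = zero_block
    pointers.append(checksum)

stored_bytes = sum(len(b) for b in store.values())
logical_bytes = len(pointers) * BLOCK
# 64 MB of logical data collapses to one 512 KB instance plus 128 pointers
```

Scaled back up to 1 TB, the same loop stores the one 512 KB instance plus roughly two million pointers, which is where the striking dedup ratios for repetitive data come from.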
Re: [Veritas-bu] Tapeless backup environments?
Simplistically, it checksums the block and looks in a table of checksums-of-blocks-that-it-already-stores to see if the identical (ahem, anyone see a hole here?) data already lives there.

To what hole do you refer? I see one in your simplistic example, but not in what actually happens (which requires a much longer technical explanation).
Re: [Veritas-bu] Tapeless backup environments?
On Mon, Sep 24, 2007 at 05:08:31PM -0400, bob944 wrote:

Simplistically, it checksums the block and looks in a table of checksums-of-blocks-that-it-already-stores to see if the identical (ahem, anyone see a hole here?) data already lives there.

Yes, there's a hole there if that's all you're relying on. Not all of them do that. -- Darren Dunham [EMAIL PROTECTED] Senior Technical Consultant TAOS http://www.taos.com/ Got some Dr Pepper? San Francisco, CA bay area This line left intentionally blank to confuse you.
Re: [Veritas-bu] Tapeless backup environments?
There are no products in the market that rely solely on a checksum to identify redundant data. There are a few that rely solely on a 160-bit hash, which is significantly larger than a checksum (typically 12-16 bits). There are some who are concerned about hash collisions in this scenario. I am not one of those people. Here is a quote from an article I wrote. The entire article is available here: http://tinyurl.com/2j7r52

quote
Hash collisions occur when two different chunks produce the same hash. It's widely acknowledged in cryptographic circles that a determined hacker could create two blocks of data that would have the same MD5 hash. If a hacker could do that, they might be able to create a fake cryptographic signature. That's why many security experts are turning to SHA-1. Its bigger key space makes it much more difficult for a hacker to crack. However, at least one group has already been credited with creating a hash collision with SHA-1.

The ability to forcibly create a hash collision means absolutely nothing in the context of deduplication. What matters is the chance that two random chunks would have a hash collision. With a 128-bit and 160-bit key space, the odds of that happening are 1 in 2^128 with MD5, and 1 in 2^160 with SHA-1. That's 1 in 10^38 and 1 in 10^48, respectively. If you assume that there's less than a yottabyte (1 billion petabytes) of data on the planet Earth, then the odds of a hash collision with two random chunks are roughly 1,461,501,637,330,900,000,000,000,000 times greater than the number of bytes in the known computing universe.

Let's compare those odds with the odds of an unrecoverable read error on a typical disk: approximately 1 in 100 trillion, or 10^14. Even worse odds are data miscorrection, where error-correcting codes step in and believe they have corrected an error, but miscorrect it instead. Those odds are approximately 1 in 10^21.

So you have a 1 in 10^21 chance of writing data to disk, having the data written incorrectly, and not even knowing it. Everybody's OK with these numbers, so there's little reason to worry about the 1 in 10^48 chance of a SHA-1 hash collision. If you want to talk about the odds of something bad happening and not knowing it, keep using tape. Everyone who has worked with tape for any length of time has experienced a tape drive writing something that it then couldn't read. Compare that to successful deduplication disk restores. According to Avamar Technologies Inc. (recently acquired by EMC Corp.), none of its customers has ever had a failed restore. Hash collisions are a nonissue.
/quote

--- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of A Darren Dunham
Sent: Monday, September 24, 2007 5:59 PM
To: Veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

On Mon, Sep 24, 2007 at 05:08:31PM -0400, bob944 wrote:

Yes, there's a hole there if that's all you're relying on. Not all of them do that. -- Darren Dunham [EMAIL PROTECTED] Senior Technical Consultant TAOS http://www.taos.com/
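The exponents quoted in the article above can be sanity-checked with exact integer arithmetic (my own check, not part of the article):

```python
from fractions import Fraction

# Per-pair collision odds for the two hashes discussed above:
p_md5 = Fraction(1, 2 ** 128)    # "1 in 2^128", i.e. roughly 1 in 10^38
p_sha1 = Fraction(1, 2 ** 160)   # "1 in 2^160", i.e. roughly 1 in 10^48

# Disk error rates quoted for comparison:
p_unrecoverable_read = Fraction(1, 10 ** 14)   # ~1 in 100 trillion
p_miscorrection = Fraction(1, 10 ** 21)

# The decimal orders of magnitude match the article's 10^38 and 10^48:
assert 10 ** 38 < 2 ** 128 < 10 ** 39
assert 10 ** 48 < 2 ** 160 < 10 ** 49

# A random SHA-1 collision is vastly less likely than a silent miscorrection:
assert p_sha1 < p_md5 < p_miscorrection < p_unrecoverable_read
```

So the comparison in the article holds up: the quoted per-pair hash-collision odds sit many orders of magnitude below the disk error rates it cites.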
Re: [Veritas-bu] Tapeless backup environments?
I'm not convinced that writing to a DataDomain is going to be faster than writing to multiple LTO-3 drives over a SAN. The DD is limited to about 90MB/sec, which is on par with 1-2 LTO-3 drives and not much more than that. Unless, of course, you consider adding extra DD units for every 2 LTO-3 drives you currently have, and that's going to bump your costs up even higher (which might be offset by the requirement for a Decru FC520 encrypting appliance for every 2-3 LTO-3 drives today).

I don't think that NetBackup 6.5 includes de-duplication. It's provided by PureDisk, which is a separately licensed product. With 6.5.1, you'll be able to use PureDisk as a storage unit, something that's not there yet today.

.../Ed -- Ed Wilts, RHCE, BCFP, BCSD Mounds View, MN, USA mailto:[EMAIL PROTECTED]

-Original Message-
From: [EMAIL PROTECTED] [mailto:veritas-bu- [EMAIL PROTECTED] On Behalf Of Clem Kruger
Sent: Monday, September 24, 2007 11:32 AM
To: [EMAIL PROTECTED]; Jeff Lightner
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?
Re: [Veritas-bu] Tapeless backup environments?
I'm not convinced either. Although our numbers are a little different, you and I end up roughly at the same place. There are a number of vendors whose de-dupe targets top out at about 200-300 MB/s, which is roughly the speed of 2-3 LTO-3 drives, depending on how well you use them. If you need more than that, you need to buy another box. (BTW, Data Domain's numbers have increased to about 200 MB/s.) These numbers work just fine when we're talking backups via the LAN to LAN-based backup servers. You're going to need at least two, possibly three network-based backup servers to generate 200 MB/s. Assuming 70 MB/s or so per master/media server, you buy one de-dupe unit per three master/media servers or so. You can scale pretty far that way. You will need to make sure that backup A is always sent to de-dupe unit A, backup B is always sent to de-dupe unit B, and so on. (If you send backup B to de-dupe unit A after initially sending it to de-dupe unit B, its first backup to unit A will not get de-duped against anything, resulting in a significant decrease in overall de-duplication ratio.) While you won't get as big a de-dupe ratio as you would if you could have a single device that could do 1000s of MB/s, there is an argument to be made that you won't get much de-dupe when de-duping the backups of server A against those of server B -- unless they have similar data. So a very large setup like this will require a bit of planning, but I think the benefits outweigh the extra planning required. Now, if you happen to have a SINGLE SAN media server that needs MORE than 200 MB/s, then you're going to want a device that can handle that level of throughput. This is going to be a pretty big server, BTW, as a 200 MB/s device can back up about 6 TB in 8 hours. And notice I said SAN media server, not a regular media server, as a regular media server isn't going to be able to generate more than 200 MB/s, as it's getting its backups via IP. 
But a SAN media server is backing up its own data locally, so it can go much faster. This also means you're really looking at a SAN/block device, which means you're really looking at a VTL. (Yes, I'm aware of the PureDisk storage unit around the corner. I think you'll find it's not going after this part of the market.) If you need this kind of throughput, there are a few products advertising several hundred or thousands of MB/s within a single de-dupe setup. These are the newer kids on the de-dupe block, of course, so they're not going to have as many customer references as the vendors that have been selling de-dupe longer. But from what I've seen, they're worth a look. --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ed Wilts Sent: Monday, September 24, 2007 9:44 PM To: 'Clem Kruger'; [EMAIL PROTECTED]; 'Jeff Lightner' Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? I'm not convinced that writing to a DataDomain is going to be faster than writing to multiple LTO-3 drives over a SAN. The DD is limited to about 90 MB/sec, which is on par with 1-2 LTO-3 drives and not much more than that. Unless, of course, you consider adding extra DD units for every 2 LTO-3 drives you currently have, and that's going to bump your costs up even higher (which might be offset by the requirement for a Decru FC520 encrypting appliance for every 2-3 LTO-3 drives today). I don't think that NetBackup 6.5 includes de-duplication. It's provided by PureDisk, which is a separately licensed product. With 6.5.1, you'll be able to use PureDisk as a storage unit, something that's not there yet today. 
.../Ed -- Ed Wilts, RHCE, BCFP, BCSD Mounds View, MN, USA mailto:[EMAIL PROTECTED] -Original Message- From: [EMAIL PROTECTED] [mailto:veritas-bu- [EMAIL PROTECTED] On Behalf Of Clem Kruger Sent: Monday, September 24, 2007 11:32 AM To: [EMAIL PROTECTED]; Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? Hi Dave, Yes, it is a difficult decision. I have looked at DataDomain with NetBackup. I have found that the backups are faster and there is a vast amount of disk being saved. NetBackup 6.5 includes de-duplication and I have become a great friend of it. To use the words of a supplier: saving me time, saving me space and saving me money :) Kind Regards, Clem Kruger
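Curtis's sizing argument above is easy to check with back-of-the-envelope arithmetic. A minimal sketch, assuming the round numbers quoted in the thread (about 200 MB/s per de-dupe unit, about 70 MB/s per LAN-based media server, an 8-hour backup window); none of these figures are vendor specifications:

```python
# Back-of-the-envelope sizing from the thread's round numbers (assumptions,
# not vendor specs): one de-dupe unit at ~200 MB/s, LAN media servers at
# ~70 MB/s each, and an 8-hour backup window.

DEDUPE_UNIT_MBPS = 200    # assumed throughput of one de-dupe target
MEDIA_SERVER_MBPS = 70    # assumed throughput of one LAN-based media server
WINDOW_HOURS = 8

# How many LAN media servers can one de-dupe unit absorb?
servers_per_unit = DEDUPE_UNIT_MBPS // MEDIA_SERVER_MBPS   # 2, call it 2-3

# How much data can one unit ingest in the window? (MB -> TB, decimal units)
tb_per_window = DEDUPE_UNIT_MBPS * WINDOW_HOURS * 3600 / 1_000_000

print(servers_per_unit, round(tb_per_window, 2))  # 2 5.76
```

The 5.76 TB result matches the "about 6 TB in 8 hours" figure in the message above.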
Re: [Veritas-bu] Tapeless backup environments?
I think one of de-duplication's benefits is that even if 2% of your file data changes, it doesn't have to replicate that entire 2%. In my mind it's similar to byte-level replication (although an entirely different technology). Just because NetBackup backs up 20 GB of different files doesn't mean that you've made 20 GB of changes. If that were the case I'd roll over my file servers once a month. I wonder, after comparing the two technologies, if there is room for a mixed-mode solution that could take advantage of both tape's and disk's benefits without creating a tough-to-swallow price tag. -Jonathan From: [EMAIL PROTECTED] on behalf of Ed Wilts Sent: Sat 9/22/2007 9:35 AM To: 'Jeff Lightner'; veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? Here's some simple math that may help (compliments of ExaGrid's web site). If you have 1 TB of data with a 2% change rate, you'll need to back up 20 GB of daily incrementals. To replicate this to another site in 18 hours requires 3 Mbps of bandwidth. If you have lots of bandwidth or not too much data, replication to an offsite location may make sense. But to think that you can replicate your backups for 20 TB of data to another state is going to make your network group squirm. Iron Mountain looks pretty cheap compared to offsite electronic replication. We have 1 application by itself that adds 30 GB of new data every day. It's being replicated within the metro area over a 1 Gbps pipe (real time, not via backups). We sure couldn't replicate everything... As the OLD saying goes, never underestimate the bandwidth of a station wagon full of tapes. .../Ed -- Ed Wilts, RHCE, BCFP, BCSD Mounds View, MN, USA mailto:[EMAIL PROTECTED] From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Lightner Sent: Friday, September 21, 2007 8:44 AM To: veritas-bu@mailman.eng.auburn.edu Subject: [Veritas-bu] Tapeless backup environments? 
Yesterday our director said that he doesn't intend to ever upgrade the existing STK L700 because eventually we'll go tapeless, as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. It made me wonder if anyone was actually doing the above already or was planning to do so?
Re: [Veritas-bu] Tapeless backup environments?
I agree, disk may not be cheaper, BUT one can choose what disk to use for backups (Tier 1 to Tier 4). As we have seen in earlier posts, there is a fair amount of work in maintaining VTL tapes which have expired (a salary cost). If your master and media servers have been set up correctly, you will find that writing to tape is faster, as you can send multiple data streams to the tape AND the tape drive does the compression. Compression on a VTL is done by the operating system (normally Linux), which we all know is a slow process and therefore not recommended. Your VTL supplier will also recommend that you do not multistream, as this also slows down the process. If you want to use disk-to-disk backups, then do just that! It is available in versions 6.0 and 6.5. 6.5 also has a de-duplication facility which will save you space on disk (you can choose from Tier 1 to Tier 4 and the RAID group you would like to use), AND 6.5 has a replication facility to replicate the disk image off site. If your management insists on VTL, my advice is to get the supplier to do a face-off between tape and VTL. Don't be intimidated by them! All VTL vendors use SCSI emulation, which has an overhead cost (they may deny this but it is fact). They will promise you that offsite storage is simple; let them demonstrate it live. Don't be fooled by their added media server. It is all a pain and a lot more work, as well as being costly. Tape will remain cheaper, and the tape drive manufacturers are fighting hard to keep it that way with larger-capacity and faster drives, despite knowing that tape is beginning to reach the end of its life cycle. If you abandon tape, rather go for disk-to-disk, as it is easier, faster and safer. If you are not cash critical, rather go for Veritas Storage Foundation, as the snapshot technology will allow you to create a snapshot on any available disk on any array attached to the SAN. 
It has all the tools included in the product that you would otherwise have to purchase from disk array suppliers at an enormous cost. The replication will also guarantee your data arrives at the offsite facility and is recoverable. Oracle backups can be done at block level, saving an incredible amount of time and backup space. Clem. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Lightner Sent: 21 September 2007 16:34 To: Justin Piszcz Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? Disk is not cheaper? You've done a cost analysis? Not saying you're wrong, and I haven't done an analysis, but I'd be surprised if disks didn't actually work out to be cheaper over time: 1) Tapes age/break - We buy on average several hundred tapes a year - support on a disk array for failing disks may or may not be more expensive. 2) Transport/storage - We have to pay for offsite storage and transfer - it seems just putting an array in an offsite facility would eliminate the need for transportation (in trucks) cost. Of course there would be cost in the data transfer disk to disk, but since everyone seems to have connectivity over the internet it might be possible to do this using a B2B link rather than via dedicated circuits. 3) Labor cost in dealing with mechanical failures of robots. This one is hidden in salary but every time I have to work on a robot it means I can't be working on something else. While disk drives fail, it doesn't seem to happen nearly as often as having to fish a tape out of a drive or the tape drive itself having failed. -Original Message- From: Justin Piszcz [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 10:08 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? 
On Fri, 21 Sep 2007, Jeff Lightner wrote: Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. It made me wonder if anyone was actually doing the above already or was planning to do so? That seems to be the way people are 'thinking', but the bottom line is disk still is not cheaper than LTO-3 tape and there are a lot of advantages to tape; however, convincing management of this is an uphill battle. Justin.
Re: [Veritas-bu] Tapeless backup environments?
Here's some simple math that may help (compliments of ExaGrid's web site). If you have 1 TB of data with a 2% change rate, you'll need to back up 20 GB of daily incrementals. To replicate this to another site in 18 hours requires 3 Mbps of bandwidth. If you have lots of bandwidth or not too much data, replication to an offsite location may make sense. But to think that you can replicate your backups for 20 TB of data to another state is going to make your network group squirm. Iron Mountain looks pretty cheap compared to offsite electronic replication. We have 1 application by itself that adds 30 GB of new data every day. It's being replicated within the metro area over a 1 Gbps pipe (real time, not via backups). We sure couldn't replicate everything. As the OLD saying goes, never underestimate the bandwidth of a station wagon full of tapes. .../Ed -- Ed Wilts, RHCE, BCFP, BCSD Mounds View, MN, USA mailto:[EMAIL PROTECTED] From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Lightner Sent: Friday, September 21, 2007 8:44 AM To: veritas-bu@mailman.eng.auburn.edu Subject: [Veritas-bu] Tapeless backup environments? Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. It made me wonder if anyone was actually doing the above already or was planning to do so?
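The bandwidth figure in Ed's message can be reproduced directly. A minimal sketch of the arithmetic, using decimal units (1 GB = 8,000 megabits); the 20 GB/18-hour and 30 GB/day inputs come from the thread:

```python
# Replication bandwidth needed to move a day's incrementals offsite
# within a given window, in decimal units (1 GB = 8,000 megabits).

def required_mbps(gb_per_day: float, window_hours: float) -> float:
    """Sustained megabits/second needed to move gb_per_day in window_hours."""
    megabits = gb_per_day * 8_000
    return megabits / (window_hours * 3600)

# 1 TB at a 2% daily change rate -> 20 GB of incrementals, 18-hour window:
print(round(required_mbps(20, 18), 2))   # ~2.47 Mbps, i.e. budget ~3 Mbps

# The single application adding 30 GB/day, same window:
print(round(required_mbps(30, 18), 2))   # ~3.7 Mbps
```

Scaling the same formula to the 20 TB example makes the "network group squirm" point obvious: it is two orders of magnitude more bandwidth.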
Re: [Veritas-bu] Tapeless backup environments?
1) Disk ages and breaks too. 2) Transport is cheap. I'd be surprised if I couldn't transport a thousand tapes for the cost of a terabyte of storage. Bandwidth to move data is *NOT* cheap. 20 GB/day requires 3 Mbps of pipe. 3) I spend more time replacing disk drives than I do replacing tapes or tape drives. To back up my 1200 SAN-based spindles, I have 6 LTO-3 drives. It sounds like you need to either replace your tape drives or treat them better. We do work on our robots perhaps once every few months. We replace disk drives on a weekly basis. NetBackup requires a *lot* more time than the robots or the disk drives ever will. .../Ed -- Ed Wilts, RHCE, BCFP, BCSD Mounds View, MN, USA mailto:[EMAIL PROTECTED] -Original Message- From: [EMAIL PROTECTED] [mailto:veritas-bu- [EMAIL PROTECTED] On Behalf Of Jeff Lightner Sent: Friday, September 21, 2007 9:34 AM To: Justin Piszcz Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? Disk is not cheaper? You've done a cost analysis? Not saying you're wrong and I haven't done an analysis but I'd be surprised if disks didn't actually work out to be cheaper over time: 1) Tapes age/break - We buy on average several hundred tapes a year - support on a disk array for failing disks may or may not be more expensive. 2) Transport/storage - We have to pay for offsite storage and transfer - it seems just putting an array in an offsite facility would eliminate the need for transportation (in trucks) cost. Of course there would be cost in the data transfer disk to disk but since everyone seems to have connectivity over the internet it might be possible to do this using a B2B link rather than via dedicated circuits. 3) Labor cost in dealing with mechanical failures of robots. This one is hidden in salary but every time I have to work on a robot it means I can't be working on something else. 
While disk drives fail, it doesn't seem to happen nearly as often as having to fish a tape out of a drive or the tape drive itself having failed. -Original Message- From: Justin Piszcz [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 10:08 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? On Fri, 21 Sep 2007, Jeff Lightner wrote: Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. It made me wonder if anyone was actually doing the above already or was planning to do so? That seems to be the way people are 'thinking', but the bottom line is disk still is not cheaper than LTO-3 tape and there are a lot of advantages to tape; however, convincing management of this is an uphill battle. Justin.
Re: [Veritas-bu] Tapeless backup environments?
But Curtis, a disk drive by itself isn't very useful either - you'll need a controller or two. And don't forget to factor in the price of the de-duplication appliances or software. Those suckers are *NOT* cheap. An appliance to support 1 TB of compressed data lists out at about $20K. Unless you get a *lot* of de-duplication - and not everybody does - that appliance is going to get killed on price compared to not de-duping it. It took me only 30 minutes with a de-dupe vendor last week to eliminate their product from consideration in our environment. .../Ed -- Ed Wilts, RHCE, BCFP, BCSD Mounds View, MN, USA mailto:[EMAIL PROTECTED] -Original Message- From: [EMAIL PROTECTED] [mailto:veritas-bu- [EMAIL PROTECTED] On Behalf Of Curtis Preston Sent: Friday, September 21, 2007 12:10 PM To: Justin Piszcz; Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? First, you can't compare the cost of disk and tape directly like that. You have to include the drives and robots. A drive by itself is useful; a tape by itself is not. Setting that aside, if I put that disk in a system that's doing 20:1 de-duplication, my cost is now 1.65c/GB vs your 3-9c/GB. --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Justin Piszcz Sent: Friday, September 21, 2007 7:36 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? I believe disks are 33c/gigabyte and tapes are 3-9 cents/gigabyte or even cheaper. I do not remember the exact figures, but someone I know has done a cost analysis and tapes were by far cheaper. Also something that nobody calculates is the cost of power to keep disks spinning. Justin. On Fri, 21 Sep 2007, Jeff Lightner wrote: Disk is not cheaper? You've done a cost analysis? 
Not saying you're wrong, and I haven't done an analysis, but I'd be surprised if disks didn't actually work out to be cheaper over time: 1) Tapes age/break - We buy on average several hundred tapes a year - support on a disk array for failing disks may or may not be more expensive. 2) Transport/storage - We have to pay for offsite storage and transfer - it seems just putting an array in an offsite facility would eliminate the need for transportation (in trucks) cost. Of course there would be cost in the data transfer disk to disk, but since everyone seems to have connectivity over the internet it might be possible to do this using a B2B link rather than via dedicated circuits. 3) Labor cost in dealing with mechanical failures of robots. This one is hidden in salary but every time I have to work on a robot it means I can't be working on something else. While disk drives fail, it doesn't seem to happen nearly as often as having to fish a tape out of a drive or the tape drive itself having failed. -Original Message- From: Justin Piszcz [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 10:08 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? On Fri, 21 Sep 2007, Jeff Lightner wrote: Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. It made me wonder if anyone was actually doing the above already or was planning to do so? That seems to be the way people are 'thinking', but the bottom line is disk still is not cheaper than LTO-3 tape and there are a lot of advantages to tape; however, convincing management of this is an uphill battle. Justin. 
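The 20:1 arithmetic in Curtis's quoted message, and Ed's counterpoint, can be made explicit. A minimal sketch, assuming the thread's figures (33 c/GB raw disk, 3-9 c/GB tape, a 20:1 de-dupe ratio that not every workload achieves); real comparisons must also price drives, robots, controllers, and the appliance itself, as both posters point out:

```python
# Effective media cost per logical GB under de-duplication, using the
# thread's figures (assumptions: 33 c/GB raw disk; ratios are illustrative).

def effective_cents_per_gb(raw_cents: float, dedupe_ratio: float) -> float:
    """Raw per-GB media cost divided by the achieved de-dupe ratio."""
    return raw_cents / dedupe_ratio

print(effective_cents_per_gb(33.0, 20))   # 1.65 c/GB vs. tape's 3-9 c/GB

# At a poorer 5:1 ratio the same disk costs 6.6 c/GB, inside tape's range,
# which is Ed's point: the appliance only wins if the ratio stays high.
print(effective_cents_per_gb(33.0, 5))    # 6.6
```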
Re: [Veritas-bu] Tapeless backup environments?
Don't even get me started on SANs. I have seen the entire loss of an MTI (now EMC) SAN, and with the new CLARiiON SANs I have seen entire shelves go off-line due to bad SPAs, etc. IMO, not reliable. Also with disk, I have a question with VTLs, etc.: if I am feeding multiple LTO-3 tape drives using 10 Gbps, what type of disk/VTL (not SAN) is out there that can accept multiple 10 Gbps streams/data and will not choke? VTLs seem like a good idea for filesystem backups, but for on-demand database backups I do not see them as the holy grail. Justin. On Sat, 22 Sep 2007, Ed Wilts wrote: 1) Disk ages and breaks too. 2) Transport is cheap. I'd be surprised if I couldn't transport a thousand tapes for the cost of a terabyte of storage. Bandwidth to move data is *NOT* cheap. 20 GB/day requires 3 Mbps of pipe. 3) I spend more time replacing disk drives than I do replacing tapes or tape drives. To back up my 1200 SAN-based spindles, I have 6 LTO-3 drives. It sounds like you need to either replace your tape drives or treat them better. We do work on our robots perhaps once every few months. We replace disk drives on a weekly basis. NetBackup requires a *lot* more time than the robots or the disk drives ever will. .../Ed -- Ed Wilts, RHCE, BCFP, BCSD Mounds View, MN, USA mailto:[EMAIL PROTECTED] -Original Message- From: [EMAIL PROTECTED] [mailto:veritas-bu- [EMAIL PROTECTED] On Behalf Of Jeff Lightner Sent: Friday, September 21, 2007 9:34 AM To: Justin Piszcz Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? Disk is not cheaper? You've done a cost analysis? Not saying you're wrong and I haven't done an analysis but I'd be surprised if disks didn't actually work out to be cheaper over time: 1) Tapes age/break - We buy on average several hundred tapes a year - support on a disk array for failing disks may or may not be more expensive. 
2) Transport/storage - We have to pay for offsite storage and transfer - it seems just putting an array in an offsite facility would eliminate the need for transportation (in trucks) cost. Of course there would be cost in the data transfer disk to disk, but since everyone seems to have connectivity over the internet it might be possible to do this using a B2B link rather than via dedicated circuits. 3) Labor cost in dealing with mechanical failures of robots. This one is hidden in salary but every time I have to work on a robot it means I can't be working on something else. While disk drives fail, it doesn't seem to happen nearly as often as having to fish a tape out of a drive or the tape drive itself having failed. -Original Message- From: Justin Piszcz [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 10:08 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? On Fri, 21 Sep 2007, Jeff Lightner wrote: Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. It made me wonder if anyone was actually doing the above already or was planning to do so? That seems to be the way people are 'thinking', but the bottom line is disk still is not cheaper than LTO-3 tape and there are a lot of advantages to tape; however, convincing management of this is an uphill battle. Justin.
Re: [Veritas-bu] Tapeless backup environments?
Curtis - Although I agree with the other responses you have given with respect to the tape vs. disk cost, I am not sure about your statements below. Going back for a second to the cost of tape vs. disk... if you do an analysis, make sure to take all things into account when you back up to tape. This is why most people don't get a proper cost associated with tape backup, i.e.: 1. SAN ports 2. Tape drives - fixing them, lost time, shoe-shining 3. Media cost - fixing media, media failure cost (the cost of not being able to do a restore) 4. Off-siting - the cycles/dollars lost in handling that internally, the cost of dealing with Recall/Iron Mountain (or whoever), the cost associated with the delay in waiting for a tape to be recalled... 5. Library maintenance cost 6. Restore duration cost (i.e. if I have 100 people waiting for a Tier 1 server to be restored...) Anyway, the list of invisible costs associated with tape goes on... As for your EMC CDL comments... First, I believe they are now called EDL (EMC Disk Libraries) because they take into account their new Symmetrix backend devices. Although I agree with you that de-dup is important to the future of backups, you make it seem that it should be the only deciding factor in a purchase! If you push de-dup aside for a second, what do most customers want? My guess is performance, availability, stability, and integration with the backup application. This has been my thought process, and these de-dup companies you speak about, such as Sepaton, Diligent, and Data Domain, all at one point or another have HUGE performance hits (i.e. we have tape drives that go faster than some of these), little capability to scale (without combining multiple devices together), or unexplainable single points of failure. I also agree that replication is important, and if you can minimize the amount you replicate, then great. Here is my dilemma: most of the de-dup vendors out there (i.e. 
I am thinking of Sepaton) that can perform de-dup have only been in the replication business for a year (probably less) and have very little maturity in that space! That scares me a bit... As for backup integration, I personally like the fact that with EMC I can have a built-in media server on top of my VTL and control everything from what I am familiar with... no other vendor offers that! Anyway, just my two cents... Bottom line is that I agree that de-dup is important, but if you can push that aside and look at the other technical merit (assuming that all vendors will have de-dup sooner than later), suddenly the list of enterprise-level candidates drops significantly from what I am seeing. -Nicholas - From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Curtis Preston Sent: Friday, September 21, 2007 1:13 PM To: Kevin Whittaker; Jeff Lightner; veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? The only issue there is that the EMC CDL does not support de-duplication, and it doesn't look like they'll be doing it any time soon. I know they're working on it, but they haven't announced anything public, so who knows. Compare that to the other de-dupe vendors that announced probably a year before they were ready, and you've got some sense of my opinion of when EMC de-dupe will actually be GA, if not later. Your design would work great if you had de-dupe. Without de-dupe, you are going to be replicating 20 times more data (or more), requiring a significantly larger pipe. --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies - From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kevin Whittaker Sent: Friday, September 21, 2007 7:48 AM To: Jeff Lightner; veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? We have it on our plan. We will be using tape only for long-term retention of data. 
Our plan is to purchase another EMC CDL and mirror our existing EMC CDL to the EMC CDL at our DR site. Our master server is already duplicated, and this will allow us to start restores of stuff that is not tier 1 applications that are already mirrored to the DR site. I would prefer not to save the long term on tape, but we don't have a solution for any other way to do it at this time. Kevin - From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Lightner Sent: Friday, September 21, 2007 9:44 AM To: veritas-bu@mailman.eng.auburn.edu Subject: [Veritas-bu] Tapeless backup environments? Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here
Re: [Veritas-bu] Tapeless backup environments?
It is interesting to see the points for and against disk/tape backup technologies play out. A worthwhile discussion. People have mentioned management/operational/service/infrastructure costs to justify a switch. Nobody has mentioned risk. The problem with comparing tape and disk is that they are very different technologies and have different risk profiles, subject to how you choose to apply them. I don't think disk was ever intended to store dormant data. If it were, it would stop spinning, don't you think? So here are some of the risk-profile differences to consider before you take the leap of faith. 1. Tape can be set to dormant/shelved. Disk cannot (some can - but the ones you guys are talking about can't), so it is susceptible to corruption, malicious intent, accidents, and unauthorized and often undetected access. 2. A tape backup set is isolated from all other tape backup sets - that is, it has few dependencies. A disk backup set will often share disks with others - that is, it has many dependencies. The risk grows exponentially with de-duping, as the logical structure now becomes dependent upon itself. If I can use an analogy: with de-duping you're kind of saying incremental-forever to tape is acceptable. 3. It would take a long time to wipe 1000 tapes. It would take a few minutes to wipe 1000 tape volumes' worth of disk, and a couple of seconds for the de-duped equivalent. If de-duping were considered risk free, we would be de-duping our entire enterprise. But somehow it is acceptable for backup. I don't think anyone would agree de-duping your backups is acceptable without a tape backup set. So why do we have de-duped backups? De-duping is necessary to make disk backup viable for a greater share of the backup market. There is a general rule I apply to technology choices, and that is: with every step forward, always consider what you are compromising. In this case it is risk. However, don't get me wrong. The compromise may be acceptable to you. 
It is one of those assessments that is difficult to quantify and therefore often misunderstood or ignored completely. D2d2T in my mind gives you the best of both worlds (I don't agree one was ever meant to replace the other). It takes away the unpredictability of tape transports without compromising the data's resting place and risk profile. The key is in managing the d in a way that affords you tolerance to T failures and growth in D. So to make it happen, the d should be far superior to the D and T combined. PS. These are my opinions, not those of the company I work for. Regards Peter Marelas +61 400 882 651 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin, Jonathan Sent: Saturday, 22 September 2007 3:37 AM To: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? I think what I'm reading here is that no one has done a true 1-to-1 comparison of tape versus de-duplication/disk. I guess the next question is, what would go into such a comparison? 1) Recovery Point Objective 2) Amount of data to be backed up 3) Retention 4) Cost of hardware (de-duplication appliance w/ disk) 5) Cost of hardware (tape library) 6) Annual maintenance on hardware above 7) Cost of media w/ replacement figures 8) Cost to power/cool disks (infrastructure) 9) Cost of network link to remote site for de-dupe 10) Cost of media transportation and storage. Price per GB, unless factoring in at least all of the above, is useless, and much of that information depends on configuration. I did such an analysis when we upgraded to NBU6 and considered de-duplication this time last year. In my case, many of the features of disk-based de-duplication weren't applicable to my situation (especially RPO), so tape was easily cheaper. If you are shipping media offsite daily, though, for a 1-day RPO, then de-duplication definitely makes a play. 
Further, price per gig on the disk side has been heavily influenced by consumer-grade SATA drives at 750GB and 1TB, bringing costs way down in comparison to only one or two years ago. There's certainly a lot of data to ingest before making claims of either technology's superiority in a particular environment. -Jonathan -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Curtis Preston Sent: Friday, September 21, 2007 1:10 PM To: Justin Piszcz; Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? First, you can't compare the cost of disk and tape directly like that. You have to include the drives and robots. A drive by itself is useful; a tape by itself is not. Setting that aside, if I put that disk in a system that's doing 20:1 de-duplication, my cost is now 1.65c/GB vs. your 3-9c/GB. --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Justin Piszcz Sent: Friday, September
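Jonathan's point that raw price per GB is meaningless without the surrounding costs, and Curtis's 20:1 arithmetic, can be sketched as a back-of-the-envelope model. Only the 33c/GB raw disk price, the 3-9c/GB tape media range, and the 20:1 dedupe ratio come from the thread; every other figure below is a hypothetical placeholder you would replace with real quotes:

```python
# Deduped disk: raw $/GB divided by the claimed dedupe ratio.
raw_disk_cents_per_gb = 33.0
dedupe_ratio = 20.0                         # 20:1, as claimed for backup data
effective_disk = raw_disk_cents_per_gb / dedupe_ratio
print(f"deduped disk: {effective_disk:.2f} c/GB")      # 1.65 c/GB

# Tape: media cost alone understates it - a tape without a drive/robot is
# useless, so amortize the library over the total capacity it serves.
tape_media_cents_per_gb = 6.0               # midpoint of the 3-9 c/GB range
library_cost_cents = 50_000 * 100           # hypothetical $50k library
total_tape_gb = 1000 * 400                  # hypothetical 1000 LTO-3 natives
tape_total = tape_media_cents_per_gb + library_cost_cents / total_tape_gb
print(f"tape incl. library: {tape_total:.2f} c/GB")
```

The model deliberately omits power/cooling, maintenance, offsite transport, and the WAN link for dedupe replication (items 6-10 on Jonathan's list), which is exactly why the headline per-GB numbers on either side should be treated with suspicion.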
[Veritas-bu] Tapeless backup environments?
Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. It made me wonder if anyone was actually doing the above already or was planning to do so? ___ Veritas-bu maillist - Veritas-bu@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
Re: [Veritas-bu] Tapeless backup environments?
On Fri, 21 Sep 2007, Jeff Lightner wrote: Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. It made me wonder if anyone was actually doing the above already or was planning to do so? That seems to be the way people are 'thinking' but the bottom line is disk still is not cheaper than LTO-3 tape and there are a lot of advantages to tape; however, convincing management of this is an uphill battle. Justin.
Re: [Veritas-bu] Tapeless backup environments?
Discovery Channel = Carl Stehman IT Distributed Services Team Pepco Holdings, Inc. 202-331-6619 Pager 301-765-2703 [EMAIL PROTECTED] Jeff Lightner [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 09/21/2007 09:57 AM To veritas-bu@mailman.eng.auburn.edu cc Subject [Veritas-bu] Tapeless backup environments? Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. It made me wonder if anyone was actually doing the above already or was planning to do so? This Email message and any attachment may contain information that is proprietary, legally privileged, confidential and/or subject to copyright belonging to Pepco Holdings, Inc. or its affiliates (PHI). This Email is intended solely for the use of the person(s) to which it is addressed. If you are not an intended recipient, or the employee or agent responsible for delivery of this Email to the intended recipient(s), you are hereby notified that any dissemination, distribution or copying of this Email is strictly prohibited. If you have received this message in error, please immediately notify the sender and permanently delete this Email and any copies. PHI policies expressly prohibit employees from making defamatory or offensive statements and infringing any copyright or any other legal right by Email communication. PHI will not accept any liability in respect of such communications.
Re: [Veritas-bu] Tapeless backup environments?
I believe disks are 33c/gigabyte and tapes are 3-9 cents/gigabyte or even cheaper. I do not remember the exact figures, but someone I know has done a cost analysis and tapes were by far cheaper. Also, something that nobody calculates is the cost of power to keep disks spinning. Justin. On Fri, 21 Sep 2007, Jeff Lightner wrote: Disk is not cheaper? You've done a cost analysis? Not saying you're wrong, and I haven't done an analysis, but I'd be surprised if disks didn't actually work out to be cheaper over time: 1) Tapes age/break - We buy on average several hundred tapes a year; support on a disk array for failing disks may or may not be more expensive. 2) Transport/storage - We have to pay for offsite storage and transfer; it seems just putting an array in an offsite facility would eliminate the transportation (in trucks) cost. Of course there would be cost in the disk-to-disk data transfer, but since everyone seems to have connectivity over the internet it might be possible to do this using a B2B link rather than via dedicated circuits. 3) Labor cost in dealing with mechanical failures of robots - This one is hidden in salary, but every time I have to work on a robot it means I can't be working on something else. While disk drives fail, it doesn't seem to happen nearly as often as having to fish a tape out of a drive or the tape drive itself having failed. -Original Message- From: Justin Piszcz [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 10:08 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? On Fri, 21 Sep 2007, Jeff Lightner wrote: Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. 
It made me wonder if anyone was actually doing the above already or was planning to do so? That seems to be the way people are 'thinking' but the bottom line is disk still is not cheaper than LTO-3 tape and there are a lot of advantages to tape; however, convincing management of this is an uphill battle. Justin. -- CONFIDENTIALITY NOTICE: This e-mail may contain privileged or confidential information and is for the sole use of the intended recipient(s). If you are not the intended recipient, any disclosure, copying, distribution, or use of the contents of this information is prohibited and may be unlawful. If you have received this electronic transmission in error, please reply immediately to the sender that you have received the message in error, and delete it. Thank you. --
Re: [Veritas-bu] Tapeless backup environments?
Disk is not cheaper? You've done a cost analysis? Not saying you're wrong, and I haven't done an analysis, but I'd be surprised if disks didn't actually work out to be cheaper over time: 1) Tapes age/break - We buy on average several hundred tapes a year; support on a disk array for failing disks may or may not be more expensive. 2) Transport/storage - We have to pay for offsite storage and transfer; it seems just putting an array in an offsite facility would eliminate the transportation (in trucks) cost. Of course there would be cost in the disk-to-disk data transfer, but since everyone seems to have connectivity over the internet it might be possible to do this using a B2B link rather than via dedicated circuits. 3) Labor cost in dealing with mechanical failures of robots - This one is hidden in salary, but every time I have to work on a robot it means I can't be working on something else. While disk drives fail, it doesn't seem to happen nearly as often as having to fish a tape out of a drive or the tape drive itself having failed. -Original Message- From: Justin Piszcz [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 10:08 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? On Fri, 21 Sep 2007, Jeff Lightner wrote: Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. It made me wonder if anyone was actually doing the above already or was planning to do so? That seems to be the way people are 'thinking' but the bottom line is disk still is not cheaper than LTO-3 tape and there are a lot of advantages to tape; however, convincing management of this is an uphill battle. Justin. 
Re: [Veritas-bu] Tapeless backup environments?
This was in response to the question about eliminating tapes. The IT department of Discovery Channel uses DataDomain and NetApp for all their backups. They are running NetBackup 6.0MP5. We had a tour sponsored by DataDomain. We are considering going to disk-based backups and are looking at VTLs and how all that stuff fits with NetBackup 6.5. We will probably be upgrading to NetBackup 6.5 next year and adding some sort of disk-based backup solution. We are still evaluating vendors; no final decisions have been made. Hope this helps = Carl Stehman IT Distributed Services Team Pepco Holdings, Inc. 202-331-6619 Pager 301-765-2703 [EMAIL PROTECTED] Jeff Lightner [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 09/21/2007 10:41 AM To [EMAIL PROTECTED] cc veritas-bu@mailman.eng.auburn.edu, [EMAIL PROTECTED] Subject Re: [Veritas-bu] Tapeless backup environments? Cartoon Network. Did your post have a point? Discovery Channel had a special on this? You're annoyed at theoretical questions? wtf? From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 10:28 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu; [EMAIL PROTECTED] Subject: Re: [Veritas-bu] Tapeless backup environments? Discovery Channel = Carl Stehman IT Distributed Services Team Pepco Holdings, Inc. 202-331-6619 Pager 301-765-2703 [EMAIL PROTECTED] Jeff Lightner [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 09/21/2007 09:57 AM To veritas-bu@mailman.eng.auburn.edu cc Subject [Veritas-bu] Tapeless backup environments? Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. 
It made me wonder if anyone was actually doing the above already or was planning to do so?
Re: [Veritas-bu] Tapeless backup environments?
We have it on our plan. We will be using tape only for long-term retention of data. Our plan is to purchase another EMC CDL and mirror our existing EMC CDL to the EMC CDL at our DR site. Our master server is already duplicated, and this will allow us to start restores of applications that are not tier 1 (the tier-1 applications are already mirrored to the DR site). I would prefer not to keep the long-term retention on tape, but we don't have a solution for any other way to do it at this time. Kevin From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Lightner Sent: Friday, September 21, 2007 9:44 AM To: veritas-bu@mailman.eng.auburn.edu Subject: [Veritas-bu] Tapeless backup environments? Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. It made me wonder if anyone was actually doing the above already or was planning to do so?
Re: [Veritas-bu] Tapeless backup environments?
Thanks. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 10:46 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu; [EMAIL PROTECTED] Subject: Re: [Veritas-bu] Tapeless backup environments? This was in response to the question about eliminating tapes. The IT department of Discovery Channel uses DataDomain and NetApp for all their backups. They are running NetBackup 6.0MP5. We had a tour sponsored by DataDomain. We are considering going to disk-based backups and are looking at VTLs and how all that stuff fits with NetBackup 6.5. We will probably be upgrading to NetBackup 6.5 next year and adding some sort of disk-based backup solution. We are still evaluating vendors; no final decisions have been made. Hope this helps = Carl Stehman IT Distributed Services Team Pepco Holdings, Inc. 202-331-6619 Pager 301-765-2703 [EMAIL PROTECTED] Jeff Lightner [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 09/21/2007 10:41 AM To [EMAIL PROTECTED] cc veritas-bu@mailman.eng.auburn.edu, [EMAIL PROTECTED] Subject Re: [Veritas-bu] Tapeless backup environments? Cartoon Network. Did your post have a point? Discovery Channel had a special on this? You're annoyed at theoretical questions? wtf? From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 10:28 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu; [EMAIL PROTECTED] Subject: Re: [Veritas-bu] Tapeless backup environments? Discovery Channel = Carl Stehman IT Distributed Services Team Pepco Holdings, Inc. 202-331-6619 Pager 301-765-2703 [EMAIL PROTECTED] Jeff Lightner [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 09/21/2007 09:57 AM To veritas-bu@mailman.eng.auburn.edu cc Subject [Veritas-bu] Tapeless backup environments? Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. 
Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. It made me wonder if anyone was actually doing the above already or was planning to do so?
Re: [Veritas-bu] Tapeless backup environments?
Cartoon Network. Did your post have a point? Discovery Channel had a special on this? You're annoyed at theoretical questions? wtf? From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 10:28 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu; [EMAIL PROTECTED] Subject: Re: [Veritas-bu] Tapeless backup environments? Discovery Channel = Carl Stehman IT Distributed Services Team Pepco Holdings, Inc. 202-331-6619 Pager 301-765-2703 [EMAIL PROTECTED] Jeff Lightner [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 09/21/2007 09:57 AM To veritas-bu@mailman.eng.auburn.edu cc Subject [Veritas-bu] Tapeless backup environments? Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. It made me wonder if anyone was actually doing the above already or was planning to do so?
Re: [Veritas-bu] Tapeless backup environments?
If you only do filesystem backups and not a lot of on-demand user-database backups, you can probably get away with disk. If you are doing 1,000s of user-initiated database backups, though, disk will not cut it unless you have a massive infrastructure. Using LTO-3 or LTO-4 drives with 10GBps, for example, is much cheaper. Justin. On Fri, 21 Sep 2007, [EMAIL PROTECTED] wrote: This was in response to the question about eliminating tapes. The IT department of Discovery Channel uses DataDomain and NetApp for all their backups. They are running NetBackup 6.0MP5. We had a tour sponsored by DataDomain. We are considering going to disk-based backups and are looking at VTLs and how all that stuff fits with NetBackup 6.5. We will probably be upgrading to NetBackup 6.5 next year and adding some sort of disk-based backup solution. We are still evaluating vendors; no final decisions have been made. Hope this helps = Carl Stehman IT Distributed Services Team Pepco Holdings, Inc. 202-331-6619 Pager 301-765-2703 [EMAIL PROTECTED] Jeff Lightner [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 09/21/2007 10:41 AM To [EMAIL PROTECTED] cc veritas-bu@mailman.eng.auburn.edu, [EMAIL PROTECTED] Subject Re: [Veritas-bu] Tapeless backup environments? Cartoon Network. Did your post have a point? Discovery Channel had a special on this? You're annoyed at theoretical questions? wtf? From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 10:28 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu; [EMAIL PROTECTED] Subject: Re: [Veritas-bu] Tapeless backup environments? Discovery Channel = Carl Stehman IT Distributed Services Team Pepco Holdings, Inc. 202-331-6619 Pager 301-765-2703 [EMAIL PROTECTED] Jeff Lightner [EMAIL PROTECTED] Sent by: [EMAIL PROTECTED] 09/21/2007 09:57 AM To veritas-bu@mailman.eng.auburn.edu cc Subject [Veritas-bu] Tapeless backup environments? 
Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. It made me wonder if anyone was actually doing the above already or was planning to do so?
Re: [Veritas-bu] Tapeless backup environments?
Huh? I've got to say I think completely the opposite of you on this one. User-directed backups are really hard to direct at a resource that is limited by the number of drives. I suppose you could multiplex, but yuck. Why wouldn't you point all user backups to disk? It's very similar to what I like to do with redo logs/logical logs/transaction logs. When it's time to back them up, it's time to back them up. They don't want to wait for a tape drive. So send them to disk. Why wouldn't you want to send them to disk? --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Justin Piszcz Sent: Friday, September 21, 2007 8:01 AM To: [EMAIL PROTECTED] Cc: veritas-bu@mailman.eng.auburn.edu; Jeff Lightner; [EMAIL PROTECTED] Subject: Re: [Veritas-bu] Tapeless backup environments? If you only do filesystem backups and not a lot of on-demand user-database backups, you can probably get away with disk. If you are doing 1,000s of user-initiated database backups, though, disk will not cut it unless you have a massive infrastructure. Using LTO-3 or LTO-4 drives with 10GBps, for example, is much cheaper. Justin. On Fri, 21 Sep 2007, [EMAIL PROTECTED] wrote: This was in response to the question about eliminating tapes. The IT department of Discovery Channel uses DataDomain and NetApp for all their backups. They are running NetBackup 6.0MP5. We had a tour sponsored by DataDomain. We are considering going to disk-based backups and are looking at VTLs and how all that stuff fits with NetBackup 6.5. We will probably be upgrading to NetBackup 6.5 next year and adding some sort of disk-based backup solution. We are still evaluating vendors; no final decisions have been made. Hope this helps = Carl Stehman IT Distributed Services Team Pepco Holdings, Inc. 
202-331-6619 Pager 301-765-2703 [EMAIL PROTECTED]
Re: [Veritas-bu] Tapeless backup environments?
The only issue there is that the EMC CDL does not support de-duplication, and it doesn't look like they'll be doing it any time soon. I know they're working on it, but they haven't announced anything publicly, so who knows. Compare that to the other de-dupe vendors that announced probably a year before they were ready, and you've got some sense of my opinion of when EMC de-dupe will actually be GA - if not later. Your design would work great if you had de-dupe. Without de-dupe, you are going to be replicating 20 times more data (or more), requiring a significantly larger pipe. --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kevin Whittaker Sent: Friday, September 21, 2007 7:48 AM To: Jeff Lightner; veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? We have it on our plan. We will be using tape only for long-term retention of data. Our plan is to purchase another EMC CDL, and mirror our existing EMC CDL to the EMC CDL at our DR site. Our master server is already duplicated, and this will allow us to start restores of stuff that is not tier 1 applications that are already mirrored to the DR site. I would prefer not to keep the long-term copies on tape, but we don't have a solution for any other way to do it at this time. Kevin From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Lightner Sent: Friday, September 21, 2007 9:44 AM To: veritas-bu@mailman.eng.auburn.edu Subject: [Veritas-bu] Tapeless backup environments? Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. 
It made me wonder if anyone was actually doing the above already or was planning to do so?
Re: [Veritas-bu] Tapeless backup environments?
First, you can't compare the cost of disk and tape directly like that. You have to include the drives and robots. A drive by itself is useful; a tape by itself is not. Setting that aside, if I put that disk in a system that's doing 20:1 de-duplication, my cost is now 1.65c/GB vs your 3-9c/GB. --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Justin Piszcz Sent: Friday, September 21, 2007 7:36 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? I believe disks are 33c/gigabyte and tapes are 3-9cents/gigabyte or even cheaper, I do not remember the exact figures, but someone I know has done a cost analysis and tapes were by far cheaper. Also something that nobody calculates is the cost of power to keep disks spinning. Justin. On Fri, 21 Sep 2007, Jeff Lightner wrote: Disk is not cheaper? You've done a cost analysis? Not saying you're wrong and I haven't done an analysis but I'd be surprised if disks didn't actually work out to be cheaper over time: 1) Tapes age/break - We buy on average several hundred tapes a year - support on a disk array for failing disks may or may not be more expensive. 2) Transport/storage - We have to pay for offsite storage and transfer - it seems just putting an array in offsite facility would eliminate the need for transportation (in trucks) cost. Of course there would be cost in the data transfer disk to disk but since everyone seems to have connectivity over the internet it might be possible to do this using a B2B link rather than via dedicated circuits. 3) Labor cost in dealing with mechanical failures of robots. This one is hidden in salary but every time I have to work on a robot it means I can't be working on something else. 
While disk drives fail, it doesn't seem to happen nearly as often as having to fish a tape out of a drive or the tape drive itself having failed. -Original Message- From: Justin Piszcz [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 10:08 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? On Fri, 21 Sep 2007, Jeff Lightner wrote: Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. It made me wonder if anyone was actually doing the above already or was planning to do so? That seems to be the way people are 'thinking' but the bottom line is disk still is not cheaper than LTO-3 tape and there are a lot of advantages to tape; however, convincing management of this is an uphill battle. Justin.
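The de-duplication arithmetic in the cost exchange above is simple enough to sanity-check in a few lines. A minimal sketch in Python, using the 33c/GB disk and 3-9c/GB tape-media figures quoted in the thread (treat them as 2007-era illustrations, not current pricing):

```python
# Back-of-envelope check of the de-duplication cost arithmetic from the thread.
# The 33 c/GB (raw disk) and 3-9 c/GB (tape media) figures come from the posts
# above; everything else is illustrative.

def effective_cost_per_gb(raw_cost_cents: float, dedupe_ratio: float) -> float:
    """Cost per logical GB stored, once the de-duplication ratio is factored in."""
    return raw_cost_cents / dedupe_ratio

disk_deduped = effective_cost_per_gb(33.0, 20.0)  # 20:1 de-dupe claim
print(f"de-duplicated disk: {disk_deduped:.2f} c/GB")  # 1.65 c/GB
print("tape media alone:   3-9 c/GB (drives and robots not included)")
```

Which reproduces Curtis's 1.65c/GB figure; the caveat in his post still applies, since neither side of that comparison includes drives, robots, or operating costs.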
Re: [Veritas-bu] Tapeless backup environments?
Yes, we are in the middle of this (trying to replace D2T2T with D2D2D) process now. What I am seeing is that while disk media costs more than tape per TB, de-duplication is the difference-maker, the enabler, making extra weeks or months of retention of D2D data inexpensive. Buy another appliance for off-site replication, and only changes, not full backups, are essentially moved between sites, resulting in a much lower volume of transmission. Kind of like running an rsync after previously rsyncing. The good news is that all vendors I've seriously looked at make this automatic, and the great news is that one might expect NetBackup will be able to see and use the replicated site directly (soon). I'm not a fan of VTL, so we are not looking at VTL. Your situation may vary. Disk-to-disk backup planning is not a simple exercise, however, as features, operations and even terminology vary substantially from vendor to vendor. The state of the art does seem much more mature than even a couple of years ago. All vendors we're seriously looking at know their competition and will discount very substantially from list prices. Proper sizing has been a chore, as each vendor tries to minimize the cost of their proposal. Although we have not made final decisions, I find the Data Domain backup appliance offerings superb (though we have not yet had an on-site trial). On the other hand, we have a lot of NetApp primary disk, and so the NetApp backup offerings are interesting for their support/use of snapshots and integration with NetBackup ($$$ features on both NetApp and Symantec sides). Technically speaking, though, NetApp NearStore used as a simple disk backup appliance does not appear to stack up to the Data Domain offerings. Which solution is best for us has yet to be determined for my approximately 10TB site. 
All of the D2D2D solutions we've studied or have had proposed to us would entail significantly more capital outlay than simply adding some staging disk and getting more modern tape drives for our L700, but the performance, scalability and automation levels of D2D2D are very exciting. Hope this helps! (please do post your experiences) cheers, wayne Jeff Lightner wrote, in part, on 2007-09-21 9:43 AM: Yesterday our director said that he doesn’t intend to ever upgrade existing STK L700 because eventually we’ll go tapeless as that is what the industry is doing. The idea being we’d have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. It made me wonder if anyone was actually doing the above already or was planning to do so?
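Wayne's "only changes, not full backups, are moved between sites" point is easy to quantify. A back-of-envelope sketch for his roughly 10TB site, assuming a purely hypothetical 2% daily change rate (the actual rate depends entirely on the environment):

```python
# Rough bandwidth comparison: replicating full backups off-site vs. replicating
# only changed data (rsync-style / de-dupe replication). The 2% daily change
# rate is an assumption for illustration; only the ~10 TB size is from the post.

protected_tb = 10.0        # approximate site size mentioned in the thread
daily_change_rate = 0.02   # ASSUMED: 2% of protected data changes per day

weekly_full_tb = protected_tb                            # one full copy per week
weekly_changes_tb = protected_tb * daily_change_rate * 7  # changed data only

print(f"weekly transfer, full replication: {weekly_full_tb:.1f} TB")
print(f"weekly transfer, changes only:     {weekly_changes_tb:.1f} TB")
```

Under those assumptions the changes-only scheme moves about 1.4TB a week instead of 10TB, which is the "much lower volume of transmission" that makes a modest WAN pipe workable.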
Re: [Veritas-bu] Tapeless backup environments?
I think what I'm reading here is that no one has done a true 1-to-1 comparison on Tape versus Deduplication / disk. I guess the next question is, what would go into such a comparison?
1) Recovery Point Objective
2) Amount of Data To Be Backed Up
3) Retention
4) Cost of Hardware (Deduplication Appliance w/ Disk)
5) Cost of Hardware (Tape Library)
6) Annual Maintenance on Hardware Above
7) Cost of Media w/ Replacement Figures
8) Cost to power / cool disks (infrastructure)
9) Cost of Network link to remote site for de-dupe
10) Cost of Media Transportation and Storage
Price per GB is useless unless it factors in at least all of the above, and much of that information depends on configuration. I did such an analysis when we upgraded to NBU6 and considered deduplication this time last year. In my case, many of the features of disk-based deduplication weren't applicable to my situation (especially RPO), so tape was easily cheaper. If you are shipping media offsite daily though for a <=1 day RPO, then deduplication definitely makes a play. Further, price per gig on the disk side has been heavily influenced by consumer-grade SATA drives at 750 GB and 1 TB, bringing costs way down in comparison to only 1 or 2 years ago. There's certainly a lot of data to ingest before making claims of either technology's superiority in a particular environment. -Jonathan -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Curtis Preston Sent: Friday, September 21, 2007 1:10 PM To: Justin Piszcz; Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? First, you can't compare the cost of disk and tape directly like that. You have to include the drives and robots. A drive by itself is useful; a tape by itself is not. Setting that aside, if I put that disk in a system that's doing 20:1 de-duplication, my cost is now 1.65c/GB vs your 3-9c/GB. --- W. 
Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Justin Piszcz Sent: Friday, September 21, 2007 7:36 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? I believe disks are 33c/gigabyte and tapes are 3-9cents/gigabyte or even cheaper, I do not remember the exact figures, but someone I know has done a cost analysis and tapes were by far cheaper. Also something that nobody calculates is the cost of power to keep disks spinning. Justin. On Fri, 21 Sep 2007, Jeff Lightner wrote: Disk is not cheaper? You've done a cost analysis? Not saying you're wrong and I haven't done an analysis but I'd be surprised if disks didn't actually work out to be cheaper over time: 1) Tapes age/break - We buy on average several hundred tapes a year - support on a disk array for failing disks may or may not be more expensive. 2) Transport/storage - We have to pay for offsite storage and transfer - it seems just putting an array in offsite facility would eliminate the need for transportation (in trucks) cost. Of course there would be cost in the data transfer disk to disk but since everyone seems to have connectivity over the internet it might be possible to do this using a B2B link rather than via dedicated circuits. 3) Labor cost in dealing with mechanical failures of robots. This one is hidden in salary but every time I have to work on a robot it means I can't be working on something else. While disk drives fail it doesn't seem to happen nearly as often as having to fish a tape out of a drive or the tape drive itself having failed. -Original Message- From: Justin Piszcz [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 10:08 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? 
On Fri, 21 Sep 2007, Jeff Lightner wrote: Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. It made me wonder if anyone was actually doing the above already or was planning to do so? That seems to be the way people are 'thinking' but the bottom line is disk still is not cheaper than LTO-3 tape and there are a lot of advantages to tape; however, convincing management of this is an uphill battle. Justin.
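Jonathan's list of factors lends itself to a trivial cost skeleton. A minimal sketch in Python; every dollar figure below is a made-up placeholder to be replaced with real quotes for your environment, and the RPO, data volume, and retention factors drive the inputs rather than appearing as line items:

```python
# Skeleton TCO comparison covering the cost factors listed in the thread.
# ALL figures are hypothetical placeholders, not vendor pricing.

def total_cost(hardware, annual_maint, media, power_cooling,
               network_link, transport_storage, years=3):
    """Capital outlay plus recurring annual costs over the comparison window."""
    recurring = (annual_maint + media + power_cooling
                 + network_link + transport_storage)
    return hardware + recurring * years

tape = total_cost(hardware=120_000,        # tape library + drives
                  annual_maint=12_000,
                  media=20_000,            # tapes incl. replacement rate
                  power_cooling=1_000,
                  network_link=0,
                  transport_storage=15_000)  # offsite trucking + vaulting

dedupe = total_cost(hardware=250_000,      # de-dupe appliance w/ disk
                    annual_maint=25_000,
                    media=0,
                    power_cooling=8_000,   # spinning disk draws power 24x7
                    network_link=10_000,   # WAN link for replication
                    transport_storage=0)

print(f"tape, 3-year:    ${tape:,}")
print(f"de-dupe, 3-year: ${dedupe:,}")
```

The point of the skeleton is the one Jonathan makes: which side wins depends entirely on the inputs, so a bare price-per-GB comparison decides nothing.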
Re: [Veritas-bu] Tapeless backup environments?
Just curious. You said "I'm not a fan of VTL," and so therefore aren't looking at the VTL vendors at all. That kind of leaves out a whole segment of the market, doesn't it? I'm aware of the Data Domain, Exagrid, NEC, and NetApp NAS-based de-dupe products, but I can't imagine not also bringing Diligent, Falconstor, Quantum, and SEPATON to the table just because their interface is (currently) virtual tape. I would bring them all to the table and make them tell me why I should go their direction, be it virtual tape or NAS. In addition, depending on how things are configured and what software/version we're talking about, I'd much rather back up to a VTL than a NAS head. --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Wayne T Smith Sent: Friday, September 21, 2007 10:38 AM To: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? Yes, we are in the middle of this (trying to replace D2T2T with D2D2D) process now. What I am seeing is that while disk media costs more than tape per TB, de-duplication is the difference-maker, the enabler, making extra weeks or months retention of D2D data inexpensive. Buy another appliance for off-site replication, and only changes, not full backups, are essentially moved between sites, causing much lower volume of transmission. Kind of like running an rsync after previously rsyncing. Good news that all vendors I've seriously looked at make this automatic and great news is that one might expect NetBackup will be able to see and use the replicated site directly (soon). I'm not a fan of VTL, so we are not looking at VTL. Your situation may vary. Disk-to-disk backup planning is not a simple exercise, however, as features, operations and even terminology varies substantially from vendor to vendor. The state of the art does seem much more mature than even a couple of years ago. 
All vendors we're seriously looking at know their competition and will make very substantial discounting from list prices. Proper sizing has been a chore, as each vendor tries to minimize the cost of their proposal. Although we have not made final decisions, I find the Data Domain backup appliance offerings superb (though we have not yet had an on-site trial). On the other hand, we have a lot of NetApp primary disk, and so the NetApp backup offerings are interesting for their support/use of snapshots and integration with NetBackup ($$$ features on both NetApp and Symantec sides). Technically speaking, though, NetApp NearStore used as a simple disk backup appliance does not appear to stack up to the Data Domain offerings. Which solution is best for us has yet to be determined for my approximately 10TB site. All of the D2D2D solutions we've studied and have been proposed to us would entail significantly more capital outlay than simply adding some staging disk and getting more modern tape drives for our L700, but the performance, scalability and automation levels of D2D2D are very exciting. Hope this helps! (please do post your experiences) cheers, wayne Jeff Lightner wrote, in part, on 2007-09-21 9:43 AM: Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes. It made me wonder if anyone was actually doing the above already or was planning to do so?
Re: [Veritas-bu] Tapeless backup environments?
I stand corrected. Curtis has all the answers and he's sitting on them. =P Worrying about multiplexing settings and tape failures? Come on, that's about as soft a cost as you can dream up. -Jonathan -Original Message- From: Curtis Preston [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 2:06 PM To: Martin, Jonathan; veritas-bu@mailman.eng.auburn.edu Subject: RE: [Veritas-bu] Tapeless backup environments? Oh, I wouldn't say that. ;) We've been doing a lot of comparisons lately, and the comparisons include all of what you listed plus the cost differential in cost of operation. For example, opex savings from not having to worry about multiplexing settings, tape failures, etc. --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin, Jonathan Sent: Friday, September 21, 2007 10:37 AM To: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? I think what I'm reading here is that no one has done a true 1-to-1 comparison on Tape versus Deduplication / disk. I guess the next question is, what would go into such a comparison? 1) Recovery Point Objective 2) Amount of Data To Be Backed Up 3) Retention 4) Cost of Hardware (Deduplication Appliance w/ Disk) 5) Cost of Hardware (Tape Library) 6) Annual Maintenance on Hardware Above 7) Cost of Media w/ Replacement Figures 8) Cost to power / cool disks (infrastructure) 9) Cost of Network link to remote site for de-dupe 10) Cost of Media Transportation and Storage Price per GB unless factoring in at least all of the above is useless and much of that information depends on configuration. I did such an analysis when we upgraded to NBU6 and considered deduplication this time last year. In my case, many of the features of disk based deduplication weren't applicable to my situation (especially RPO) so tape was easily cheaper. 
If you are shipping media offsite daily though for a =1 day RPO then deduplication definitely makes a play. Further price per gig on the disk side has been heavily influenced by consumer grade SATA drives at 750gb and 1TB bringing costs way down in comparison to only 1 or 2 years ago. There's certainly a lot of data to injest before making claims of either technology's superiority in a particular environment. -Jonathan -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Curtis Preston Sent: Friday, September 21, 2007 1:10 PM To: Justin Piszcz; Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? First, you can't compare the cost of disk and tape directly like that. You have to include the drives and robots. A drive by itself is useful; a tape by itself is not. Setting that aside, if I put that disk in a system that's doing 20:1 de-duplication, my cost is now 1.65c/GB vs your 3-9c/GB. --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Justin Piszcz Sent: Friday, September 21, 2007 7:36 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? I believe disks are 33c/gigabyte and tapes are 3-9cents/gigabyte or even cheaper, I do not remember the exact figures, but someone I know has done a cost analysis and tapes were by far cheaper. Also something that nobody calculates is the cost of power to keep disks spinning. Justin. On Fri, 21 Sep 2007, Jeff Lightner wrote: Disk is not cheaper? You've done a cost analysis? Not saying you're wrong and I haven't done an analysis but I'd be surprised if disks didn't actually work out to be cheaper over time: 1) Tapes age/break - We buy on average several hundred tapes a year - support on a disk array for failing disks may or may not be more expensive. 
2) Transport/storage - We have to pay for offsite storage and transfer - it seems just putting an array in offsite facility would eliminate the need for transportation (in trucks) cost. Of course there would be cost in the data transfer disk to disk but since everyone seems to have connectivity over the internet it might be possible to do this using a B2B link rather than via dedicated circuits. 3) Labor cost in dealing with mechanical failures of robots. This one is hidden in salary but every time I have to work on a robot it means I can't be working on something else. While disk drives fail it doesn't seem to happen nearly as often as having to fish a tape out of a drive or the tape drive itself having failed. -Original Message- From: Justin Piszcz [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 10:08 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu
Re: [Veritas-bu] Tapeless backup environments?
Come on, man. Can't give away everything! ;) --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: Martin, Jonathan [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 11:16 AM To: Curtis Preston; veritas-bu@mailman.eng.auburn.edu Subject: RE: [Veritas-bu] Tapeless backup environments? I stand corrected. Curtis has all the answers and he's sitting on them. =P Worrying about multiplexing settings and tape failures? Come on, that's about as soft a cost as you can dream up. -Jonathan -Original Message- From: Curtis Preston [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 2:06 PM To: Martin, Jonathan; veritas-bu@mailman.eng.auburn.edu Subject: RE: [Veritas-bu] Tapeless backup environments? Oh, I wouldn't say that. ;) We've been doing a lot of comparisons lately, and the comparisons include all of what you listed plus the cost differential in cost of operation. For example, opex savings from not having to worry about multiplexing settings, tape failures, etc. --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin, Jonathan Sent: Friday, September 21, 2007 10:37 AM To: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? I think what I'm reading here is that no one has done a true 1-to-1 comparison on Tape versus Deduplication / disk. I guess the next question is, what would go into such a comparison? 
1) Recovery Point Objective 2) Amount of Data To Be Backed Up 3) Retention 4) Cost of Hardware (Deduplication Appliance w/ Disk) 5) Cost of Hardware (Tape Library) 6) Annual Maintenance on Hardware Above 7) Cost of Media w/ Replacement Figures 8) Cost to power / cool disks (infrastructure) 9) Cost of Network link to remote site for de-dupe 10) Cost of Media Transportation and Storage Price per GB unless factoring in at least all of the above is useless and much of that information depends on configuration. I did such an analysis when we upgraded to NBU6 and considered deduplication this time last year. In my case, many of the features of disk based deduplication weren't applicable to my situation (especially RPO) so tape was easily cheaper. If you are shipping media offsite daily though for a =1 day RPO then deduplication definitely makes a play. Further price per gig on the disk side has been heavily influenced by consumer grade SATA drives at 750gb and 1TB bringing costs way down in comparison to only 1 or 2 years ago. There's certainly a lot of data to injest before making claims of either technology's superiority in a particular environment. -Jonathan -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Curtis Preston Sent: Friday, September 21, 2007 1:10 PM To: Justin Piszcz; Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? First, you can't compare the cost of disk and tape directly like that. You have to include the drives and robots. A drive by itself is useful; a tape by itself is not. Setting that aside, if I put that disk in a system that's doing 20:1 de-duplication, my cost is now 1.65c/GB vs your 3-9c/GB. --- W. 
Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Justin Piszcz Sent: Friday, September 21, 2007 7:36 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? I believe disks are 33c/gigabyte and tapes are 3-9cents/gigabyte or even cheaper, I do not remember the exact figures, but someone I know has done a cost analysis and tapes were by far cheaper. Also something that nobody calculates is the cost of power to keep disks spinning. Justin. On Fri, 21 Sep 2007, Jeff Lightner wrote: Disk is not cheaper? You've done a cost analysis? Not saying you're wrong and I haven't done an analysis but I'd be surprised if disks didn't actually work out to be cheaper over time: 1) Tapes age/break - We buy on average several hundred tapes a year - support on a disk array for failing disks may or may not be more expensive. 2) Transport/storage - We have to pay for offsite storage and transfer - it seems just putting an array in offsite facility would eliminate the need for transportation (in trucks) cost. Of course there would be cost in the data transfer disk to disk but since everyone seems to have connectivity over the internet it might be possible to do this using a B2B link rather than via dedicated circuits. 3) Labor cost in dealing with mechanical failures of robots. This one is hidden in salary but every time I have to work on a robot it means I can't be working
Re: [Veritas-bu] Tapeless backup environments?
Oh, I wouldn't say that. ;) We've been doing a lot of comparisons lately, and the comparisons include all of what you listed plus the cost differential in cost of operation. For example, opex savings from not having to worry about multiplexing settings, tape failures, etc. --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin, Jonathan Sent: Friday, September 21, 2007 10:37 AM To: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? I think what I'm reading here is that no one has done a true 1-to-1 comparison on Tape versus Deduplication / disk. I guess the next question is, what would go into such a comparison? 1) Recovery Point Objective 2) Amount of Data To Be Backed Up 3) Retention 4) Cost of Hardware (Deduplication Appliance w/ Disk) 5) Cost of Hardware (Tape Library) 6) Annual Maintenance on Hardware Above 7) Cost of Media w/ Replacement Figures 8) Cost to power / cool disks (infrastructure) 9) Cost of Network link to remote site for de-dupe 10) Cost of Media Transportation and Storage Price per GB unless factoring in at least all of the above is useless and much of that information depends on configuration. I did such an analysis when we upgraded to NBU6 and considered deduplication this time last year. In my case, many of the features of disk based deduplication weren't applicable to my situation (especially RPO) so tape was easily cheaper. If you are shipping media offsite daily though for a =1 day RPO then deduplication definitely makes a play. Further price per gig on the disk side has been heavily influenced by consumer grade SATA drives at 750gb and 1TB bringing costs way down in comparison to only 1 or 2 years ago. There's certainly a lot of data to injest before making claims of either technology's superiority in a particular environment. 
-Jonathan -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Curtis Preston Sent: Friday, September 21, 2007 1:10 PM To: Justin Piszcz; Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? First, you can't compare the cost of disk and tape directly like that. You have to include the drives and robots. A drive by itself is useful; a tape by itself is not. Setting that aside, if I put that disk in a system that's doing 20:1 de-duplication, my cost is now 1.65c/GB vs your 3-9c/GB. --- W. Curtis Preston Backup Blog @ www.backupcentral.com VP Data Protection, GlassHouse Technologies -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Justin Piszcz Sent: Friday, September 21, 2007 7:36 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? I believe disks are 33c/gigabyte and tapes are 3-9cents/gigabyte or even cheaper, I do not remember the exact figures, but someone I know has done a cost analysis and tapes were by far cheaper. Also something that nobody calculates is the cost of power to keep disks spinning. Justin. On Fri, 21 Sep 2007, Jeff Lightner wrote: Disk is not cheaper? You've done a cost analysis? Not saying you're wrong and I haven't done an analysis but I'd be surprised if disks didn't actually work out to be cheaper over time: 1) Tapes age/break - We buy on average several hundred tapes a year - support on a disk array for failing disks may or may not be more expensive. 2) Transport/storage - We have to pay for offsite storage and transfer - it seems just putting an array in offsite facility would eliminate the need for transportation (in trucks) cost. Of course there would be cost in the data transfer disk to disk but since everyone seems to have connectivity over the internet it might be possible to do this using a B2B link rather than via dedicated circuits. 
3) Labor cost in dealing with mechanical failures of robots. This one is hidden in salary but every time I have to work on a robot it means I can't be working on something else. While disk drives fail it doesn't seem to happen nearly as often as having to fish a tape out of a drive or the tape drive itself having failed. -Original Message- From: Justin Piszcz [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 10:08 AM To: Jeff Lightner Cc: veritas-bu@mailman.eng.auburn.edu Subject: Re: [Veritas-bu] Tapeless backup environments? On Fri, 21 Sep 2007, Jeff Lightner wrote: Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing. The idea being we'd have our disk backup devices here (e.g. Data Domain) and transfer to offsite storage to another disk device so as to eliminate the need for ever transporting tapes
Re: [Veritas-bu] Tapeless backup environments?
/Steve

On Fri, 21 Sep 2007, Martin, Jonathan wrote:

I stand corrected. Curtis has all the answers and he's sitting on them. =P Worrying about multiplexing settings and tape failures? Come on, that's about as soft a cost as you can dream up.

-Jonathan

-Original Message-
From: Curtis Preston [mailto:[EMAIL PROTECTED]
Sent: Friday, September 21, 2007 2:06 PM
To: Martin, Jonathan; veritas-bu@mailman.eng.auburn.edu
Subject: RE: [Veritas-bu] Tapeless backup environments?

Oh, I wouldn't say that. ;) We've been doing a lot of comparisons lately, and they include all of what you listed plus the differential in cost of operation - for example, opex savings from not having to worry about multiplexing settings, tape failures, etc.

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Martin, Jonathan
Sent: Friday, September 21, 2007 10:37 AM
To: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

I think what I'm reading here is that no one has done a true 1-to-1 comparison of tape versus deduplication/disk. I guess the next question is: what would go into such a comparison?

1) Recovery Point Objective
2) Amount of Data To Be Backed Up
3) Retention
4) Cost of Hardware (Deduplication Appliance w/ Disk)
5) Cost of Hardware (Tape Library)
6) Annual Maintenance on Hardware Above
7) Cost of Media w/ Replacement Figures
8) Cost to Power/Cool Disks (Infrastructure)
9) Cost of Network Link to Remote Site for De-dupe
10) Cost of Media Transportation and Storage

Price per GB is useless unless it factors in at least all of the above, and much of that information depends on configuration. I did such an analysis when we upgraded to NBU6 and considered deduplication this time last year.
In my case, many of the features of disk-based deduplication weren't applicable to my situation (especially RPO), so tape was easily cheaper. If you are shipping media offsite daily for a <=1 day RPO, though, then deduplication definitely makes a play. Further, price per gig on the disk side has been heavily influenced by consumer-grade SATA drives at 750 GB and 1 TB, bringing costs way down in comparison to only 1 or 2 years ago. There's certainly a lot of data to ingest before making claims of either technology's superiority in a particular environment.

-Jonathan
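Jonathan's checklist above amounts to summing per-category costs on each side over the same horizon and comparing the totals. A toy sketch of that shape of analysis; every figure and category value below is a hypothetical placeholder, purely to show the structure:

```python
# Toy total-cost-of-ownership comparison along the lines of Jonathan's
# checklist. All dollar figures are hypothetical placeholders - plug in
# your own quotes for hardware, maintenance, media, power and transport.

def total_cost(items):
    """Sum a dict of per-category costs (same horizon on both sides)."""
    return sum(items.values())

tape = {
    "tape_library_hardware": 50_000.0,
    "maintenance_3yr": 5_000.0 * 3,
    "media_with_replacement": 12_000.0,
    "offsite_transport_and_storage": 9_000.0,
}
dedupe_disk = {
    "dedupe_appliance_with_disk": 90_000.0,
    "maintenance_3yr": 8_000.0 * 3,
    "power_and_cooling": 6_000.0,
    "replication_link_to_remote_site": 10_000.0,
}

print(f"tape:        ${total_cost(tape):,.0f}")
print(f"dedupe disk: ${total_cost(dedupe_disk):,.0f}")
```

The non-dollar items on the checklist (RPO, retention, data volume) don't sum; they constrain which options are admissible before any totals are compared.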
Re: [Veritas-bu] Tapeless backup environments?
When we put in our backup-to-disk system, the main driving factors were point-in-time recovery and replication. We use the system to back up critical data for business continuity and then replicate it to our DR site. This gives us point-in-time recovery of about 24 hours or less. When we compared tape to disk, we found that although the disk backup was more expensive, it wasn't prohibitively so, and the benefit of point-in-time recovery of 24 hours or less made it the best solution for our needs.
Re: [Veritas-bu] Tapeless backup environments?
On 9/21/07, Jeff Lightner [EMAIL PROTECTED] wrote:

Yesterday our director said that he doesn't intend to ever upgrade existing STK L700 because eventually we'll go tapeless as that is what the industry is doing.

snip

Tape has been dying for 30 years.
http://searchstoragechannel.techtarget.com/originalContent/0,289142,sid98_gci1237881,00.html

An article on Slashdot this morning described the 5 biggest SANs. It was interesting to see that NASA's SAN includes 1.1 Pbytes of disk and 10 Pbytes of tape storage, and the SAN at the San Diego Supercomputing Center has about 1 Pbyte of disk and 18 Pbytes of tape storage.
http://www.byteandswitch.com/document.asp?doc_id=134355&WT.svl=news1_1

Austin
___
Veritas-bu maillist - Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu