Re: [Veritas-bu] Tapeless backup environments

2007-10-26 Thread Curtis Preston
Hmmm..  I need to look into this further.  I could have sworn that it
stores a checksum for each file backed up, and that it uses that checksum
when it restores the file to verify that the restored file is the same as
the backup.

I wonder if we can get an authoritative answer on this from a Symantec
lurker.
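If NetBackup did store per-file checksums, the mechanism would presumably look something like this sketch (names and behavior are hypothetical, pending that authoritative answer):

```python
import hashlib

def backup_file(path: str, catalog: dict) -> None:
    """At backup time, record a SHA-256 digest for the file (hypothetical)."""
    with open(path, "rb") as f:
        catalog[path] = hashlib.sha256(f.read()).hexdigest()

def verify_restore(path: str, catalog: dict) -> bool:
    """After a restore, recompute the digest and compare to the catalog."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() == catalog[path]
```

A restore whose recomputed digest differs from the recorded one would be flagged as corrupt.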

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 


-Original Message-
From: Len Boyle [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 22, 2007 6:37 PM
To: Donaldson, Mark - Broomfield, CO; veritas-bu@mailman.eng.auburn.edu
Subject: RE: Re: [Veritas-bu] Tapeless backup environments

Hello Mark,

Did I read in this list that NetBackup was supposed to do some kind of
checksum on the data written to tape?  If so, would a bpverify check
this?  I would assume that if NetBackup does this it would find the
error, because NetBackup would do its calculation before passing the
block to the dedupe hardware/software, and the block that it gets back
from the dedupe hardware/software would be different.

Of course, this raises the question of what happens with the
Symantec/Veritas PureDisk product, or EMC's: since the NetBackup and
dedupe parts are merged, one would not have this double check.  At least
I would not think that one would.

len

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
[EMAIL PROTECTED]
Sent: Monday, October 22, 2007 4:52 PM
To: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments

I think that part of the problem is that a hash collision is nearly
undetectable until you have restored the data and found it to be wrong.

We all know that 99.999% of what we back up is never restored.  It just
ages gracefully on media and is expired.  If any of that 0.001% is
restored and is damaged due to a tape fault (and we've all had it
happen), then we all know that we can usually reach back to a different
version or a different tape, and we'll be close enough to make the user
go away and let us return to our coffee and surfing.

I think a big part of the worry of a hash collision is that the restore
seems to happen, the file restores flawlessly, and it'll not be
detectable unless someone can checksum the whole file or it's a binary
or similar that simply refuses to work.

Again, restoring from a different tape or a different version may be
ineffective depending on where the hash collision occurred and for what
reason.  Every version may use this same unchanging block, which is
restored incorrectly due to an invalid hash match.

I know the odds are astronomical, but even though the odds are 150
million to one against my winning the lottery, I still see smiling faces
on TV holding giant checks.
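The birthday-paradox bound puts rough numbers on that bet; a back-of-the-envelope sketch (the trillion-block store and 160-bit SHA-1-sized hash are assumed figures, not any vendor's):

```python
def collision_probability(n_blocks: int, hash_bits: int) -> float:
    """Birthday-paradox approximation: p ~= n * (n - 1) / 2^(hash_bits + 1)."""
    return n_blocks * (n_blocks - 1) / 2 / 2 ** hash_bits

# One trillion unique blocks against a 160-bit hash:
p_dedupe = collision_probability(10**12, 160)

# The 150-million-to-one lottery ticket, for comparison:
p_lottery = 1 / 150_000_000
```

Under these assumptions p_dedupe comes out around 10^-25, many orders of magnitude below the lottery odds; the worry in the thread is less the magnitude than that the event, if it happens, is silent.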

It's a bet, like all other restore techniques, and I'm going to make
sure management has full knowledge of the risks before we implement it
here (which is likely).

-M



Re: [Veritas-bu] Tapeless backup environments

2007-10-26 Thread Curtis Preston
You're absolutely right.  Of course, every time you copy data, you face
a similar risk.  Every single time you copy data from one device to
another, multiple levels of CRC/ECC are used to make sure that the
target copy is the same as the source copy, and there is a chance
(however small) that the copy will introduce an error that CRC/ECC will
not pick up.

That was part of my original point that I made in the first article I
wrote on the subject.  Yes, I know there is a chance for a hash
collision and data corruption, but there is a chance of that every time
you copy data anywhere, disk to disk, disk to tape, etc -- and there's a
chance you'll never know it.  I just don't understand all the vitriol
aimed at this particular method.
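The probabilistic nature of every copy is why careful copy jobs re-read and compare after writing; a minimal sketch of an end-to-end verified copy (function names are illustrative, not any product's API):

```python
import hashlib
import shutil

def _digest(path: str) -> str:
    """Full-file SHA-256, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def copy_with_verify(src: str, dst: str) -> None:
    """Copy a file, then re-read both sides and compare digests end to end."""
    shutil.copyfile(src, dst)
    if _digest(src) != _digest(dst):
        raise IOError(f"verification failed: {src} -> {dst}")
```

This catches an error the in-flight CRC/ECC missed, at the cost of reading everything twice.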

I did read the paper that someone forwarded, and while the paper is
quite old, I think her arguments still hold true.  (She also used the
birthday paradox in the same way I did, BTW.)

The only part I didn't quite understand was the part where she said that
you can't compare hash collisions with hardware errors (like I'm doing
above).  I read that part a couple of times and didn't get it.  I'm not
saying I understood her argument and disagree with it, mind you.  I'm
saying she spent only two or three paragraphs explaining that part, and
at the end I didn't understand what she said.

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu



Re: [Veritas-bu] Tapeless backup environments

2007-10-25 Thread Mark.Donaldson
Nope - I don't think NetBackup is making checksums.

Tape hardware seems to be reasonably adept at detecting big tape errors,
though.  This, of course, goes away with disk-based backups.

bpverify is just a check of the tape contents against the media catalog.
It does read the tape blocks, so it may allow the drive to detect a
media error, but it's not a verification of block integrity against some
stored checksum.


DESCRIPTION
 bpverify verifies the contents of one  or  more  backups  by
 reading  the backup volume and comparing its contents to the
 NetBackup catalog. This operation does not compare the  data
 on the volume with the contents of the client disk. However,
 it does read each block in the image,  thus  verifying  that
 the  volume  is readable. NetBackup verifies only one backup
 at a time and tries to minimize media mounts and positioning
 time.

-M
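For contrast, a verify pass that did check block integrity against checksums stored at write time, which the man page above shows bpverify does not do, could be sketched as follows (hypothetical illustration, not NetBackup code):

```python
import zlib

def checksum_blocks(blocks):
    """At write time, record a CRC32 per block (a hypothetical catalog entry)."""
    return [zlib.crc32(b) for b in blocks]

def verify_blocks(blocks, stored_crcs):
    """At verify time, re-read each block and compare against the stored CRC32.

    Returns the indices of blocks that no longer match.
    """
    return [i for i, (b, c) in enumerate(zip(blocks, stored_crcs))
            if zlib.crc32(b) != c]
```

Such a pass would catch a block silently replaced by dedupe hardware, whereas a readability-only verify would not.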


Re: [Veritas-bu] Tapeless backup environments

2007-10-25 Thread A Darren Dunham
> Did I read in this list that netbackup was supposed to do some kind of
> checksum on the data written to tape?
> If so would a bpverify check this. I would assume that if netbackup does
> this it would find the error.
> because netbackup would do it's calc before passing the block to the
> dedupe hardware/software. And the block that it gets back from the
> dedupe hardware/software would be different.

Even if bpverify did checksum in this manner, you can't assume that it
would find all such errors.  The checksum can collide in a manner
identical to the hash.  Unless it lined up exactly with the hash
algorithm, it would likely provide some additional protection, but at
the same time there must be some collisions where both the block hash
and the overall checksum give identical values for a replacement block.
The presence of an additional checksum like this changes the specific
numbers, but does not change the essential character of the issue.
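That claim is easy to demonstrate at toy scale: with deliberately tiny 8-bit digests, a brute-force search quickly finds two distinct blocks that agree on both the hash and the extra checksum. (Real 128- or 160-bit digests make the same event astronomically rarer, not impossible, which is exactly the point.)

```python
import hashlib
import itertools
import zlib

def tiny_hash(block: bytes) -> int:
    # 8-bit truncation of MD5, standing in for the dedupe hash
    return hashlib.md5(block).digest()[0]

def tiny_checksum(block: bytes) -> int:
    # 8-bit truncation of CRC32, standing in for the extra checksum
    return zlib.crc32(block) & 0xFF

seen = {}
for i in itertools.count():
    block = i.to_bytes(8, "big")          # every block is distinct
    key = (tiny_hash(block), tiny_checksum(block))
    if key in seen:
        a, b = seen[key], block           # differ, yet match on both digests
        break
    seen[key] = block
```

By the birthday bound, a combined 16-bit digest space yields a double collision after only a few hundred random blocks.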

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 


Re: [Veritas-bu] Tapeless backup environments

2007-10-22 Thread Hall, Christian N.

Why don't we just move on..

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Curtis
Preston
Sent: Friday, October 19, 2007 6:52 PM
To: Eagle, Kent; veritas-bu@mailman.eng.auburn.edu
Cc: [EMAIL PROTECTED]
Subject: Re: [Veritas-bu] Tapeless backup environments

Since you've impugned my honor, I feel the need to defend myself a bit,
but I don't want to spend much more time on this topic either:

> My first point was that you quoted a Wikipedia article as a source.

The debate as to whether Wikipedia articles have any value is an ongoing
one, and there's no point in rehashing it here.  Suffice it to say that
I have a slightly higher opinion of it than you do.

> What I meant was that the posts made by
> Bob944 seemed to me to be supported by cited facts, and denoted
> personal experiences.

I have personally used and tested many dedupe solutions.  Based on his
opinion of them, I'm pretty sure Bob944 has not.  So I'm not sure how my
posts could be construed as coming from theory and his from reality.
Perhaps it was just my style of writing.

> He's not pointing to something he previously authored as proof that
> information is fact.

I never did that.  I only pointed you to the blog because I put a lot
of thought into it and figured you could read that version, instead of
me having to rehash it here in email.

> To be fair, I haven't read any of your blog postings, only your posts
> in this forum.

And I'm guessing you've never read my books or articles or seen me
speak.  I think you'll find that I'm not nearly as stupid as you seem to
think I am.  :)

> And yes; an Industry Pundit, Author, SME, or whomever, quoting
> Wikipedia as a source does tend to dilute credibility, in my mind.
> It's not a personal attack, just my personal position on the issue.

Again, I didn't cite it as my only proof, and yes, we do have a
different opinion on the validity of Wikipedia.

> The part below has me confused where you say "No, because I never said
> those words or anything like them in my article. Since I never..."

What I was trying to say was that it seemed to me that he was saying
something along the lines of "my mind's already made up, don't confuse me
with the facts."  You said I said the same thing, and I'm saying I
didn't.  In my blog posts (that I was referring to and that you did not
read), I think you'll find a very "this is what I think, what do you
think?" mentality.  If you inferred I was trying to say anything else,
please believe I wasn't.

> So one could easily conclude that a position was taken (and published)
> on this topic without sufficient testing or research (the related
> SunSolve and other articles were already out there before these posts
> were made).

Again, you haven't read my blog, so I'm not sure how you can criticize
it.  And it's a BLOG, dude.  The whole spirit of blogging is that it's a
stream of consciousness, not full articles and/or research.  I didn't
write "GbE is a lie!" in an article, I wrote it in a blog.  The same
blog where I wrote "Top 10 Things I Learned About Backups from Watching
Die Hard."  A lot of it is written tongue in cheek, and I think anyone
who follows it knows that.  I don't put blogging on a subject on the
same level as "publishing" any more than you consider a Wikipedia
article valid information.  And I think that most people feel the same
way.

BTW, I did a ton of research on Sun.com, Neterion, Intel, Alacritech,
Google, etc., to find ANY evidence of benchmarks to prove my feelings
wrong before I wrote that blog article.  The SunSolve articles to which
you refer were written, but they aren't benchmarks; they only say "this
is how to configure a 10 GbE NIC on Solaris."

> You can see how maybe a newbie might assume a post as gospel with the
> barrage of credentials?

I get that, as it happens to me all the time.  It goes with the
territory of being a prolific speaker/author/blogger/blabber.  I try to
help people.  I write and speak a lot as part of that.  If someone takes
my word as gospel without doing their own research then shame on them
and I can't control that.  I stand by what I wrote, and when I'm wrong,
I admit it.  I'm not going to stop writing/emailing/blogging because I
might say something wrong.

I actually cut my teeth right down
the road from you as the backup guy at MBNA.  (I lived in Newark, DE,
and you were my bank.)

I'm not sure what you meant to imply by all this? If tenure with backup
is an issue, then I would suggest you really don't have all that much
time in this space,

I never meant to imply that I have more credentials than you.  I only
meant to reply to the part of your post that suggested that I wasn't
coming from a real/practical/having-actually-done-this-before position.
And if 14 years doing backups and restores for my company and other
companies don't give me some amount of credibility, I'm not sure what
does.

I've never made mention of my employer, or even implied that any of my
statements represented any opinion or position of theirs? I find

Re: [Veritas-bu] Tapeless backup environments

2007-10-22 Thread WEAVER, Simon (external)

Agree with you on that; I set this subject to auto-delete a few weeks
back!

Regards

Simon Weaver
3rd Line Technical Support
Windows Domain Administrator 

EADS Astrium Limited, B23AA IM (DCS)
Anchorage Road, Portsmouth, PO3 5PU

Email: [EMAIL PROTECTED]




Re: [Veritas-bu] Tapeless backup environments

2007-10-22 Thread Austin Murphy
Here is some required reading on the topic from Val Henson, a noted
academic/storage guru.

An Analysis of Compare-by-hash
www.nmt.edu/~val/review/hash.pdf

Of particular interest is why hardware error rates can't be compared
with deterministic software errors.

Austin
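Compare-by-hash itself is simple to sketch: a store keyed by digest that treats a digest match as block identity. The optional byte-comparison flag below marks exactly the shortcut the paper questions (toy code, not any product's implementation):

```python
import hashlib

class DedupeStore:
    """Toy compare-by-hash block store (illustration only)."""

    def __init__(self, verify_bytes: bool = False):
        self.blocks = {}              # digest -> stored block
        self.verify_bytes = verify_bytes

    def put(self, block: bytes) -> str:
        digest = hashlib.sha1(block).hexdigest()
        if digest in self.blocks:
            # Compare-by-hash normally trusts the digest match and skips
            # the byte-for-byte comparison -- that trust is the debated risk.
            if self.verify_bytes and self.blocks[digest] != block:
                raise ValueError("hash collision detected")
            return digest             # deduplicated: store nothing new
        self.blocks[digest] = block
        return digest

    def get(self, digest: str) -> bytes:
        return self.blocks[digest]
```

With verify_bytes off (the common case, since the extra read is expensive), a colliding block would be silently deduplicated against the wrong data.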


Re: [Veritas-bu] Tapeless backup environments

2007-10-22 Thread Jeff Lightner
This paper looks to be 5 years old (based on the newest references it
cites; it actually cites others that go back nearly 10 years).  It would
be interesting to see her take on current deduplication offerings, to
see if the other checks they contain over simple hashing were enough to
allay her concerns.

One thing I've not seen in all this discussion is anyone saying they've
actually experienced data loss as a result of commercial deduplication
devices.  Can anyone here claim that?



Re: [Veritas-bu] Tapeless backup environments

2007-10-22 Thread Martin, Jonathan
I don't know.  I, for one, am now thoroughly engrossed given Curtis' honor has 
been impugned. =P
 
-Jonathan
 



Re: [Veritas-bu] Tapeless backup environments

2007-10-22 Thread Curtis Preston
Funny.  VERY Funny. ;)

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Martin,
Jonathan
Sent: Monday, October 22, 2007 9:23 AM
To: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments

I don't know.  I, for one, am now thoroughly engrossed given Curtis'
honor has been impugned. =P
 
-Jonathan
 


From: [EMAIL PROTECTED] on behalf of Hall,
Christian N.
Sent: Mon 10/22/2007 9:55 AM
To: Curtis Preston; Eagle, Kent; veritas-bu@mailman.eng.auburn.edu
Cc: [EMAIL PROTECTED]
Subject: Re: [Veritas-bu] Tapeless backup environments




Why don't we just move on..

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Curtis
Preston
Sent: Friday, October 19, 2007 6:52 PM
To: Eagle, Kent; veritas-bu@mailman.eng.auburn.edu
Cc: [EMAIL PROTECTED]
Subject: Re: [Veritas-bu] Tapeless backup environments

Since you've impugned my honor, I feel the need to defend myself a bit,
but I don't want to spend much more time on this topic either:

My first point was that you quoted a Wikipedia article as a source.

The debate as to whether Wikipedia articles have any value is an ongoing
one, and there's no point in rehashing it here. Suffice it to say that I have
a slightly higher opinion of it than you do.

What I meant was that the posts made by
Bob944 seemed to me to be supported by cited facts, and denoted
personal experiences.

I have personally used and tested many dedupe solutions.  Based on his
opinion of them, I'm pretty sure Bob944 has not.  So I'm not sure how my
posts could be construed as coming from theory and his from reality.
Perhaps it was just my style of writing.

He's not pointing to something he previously authored as proof that
information is fact.

I never did that.  I only pointed you to the blog because I put a lot
of thought into it and figured you could read that version instead of
me having to rehash it here in email.

To be fair, I haven't read any of your blog postings, only your posts
in this forum.

And I'm guessing you've never read my books or articles or seen me
speak.  I think you'll find that I'm not nearly as stupid as you seem to
think I am.  :)

And yes; an Industry Pundit, Author, SME, or whomever, quoting
Wikipedia as a source does tend to dilute credibility, in my mind.
It's not a personal attack, just my personal position on the issue.

Again, I didn't cite it as my only proof, and yes, we do have a
different opinion on the validity of Wikipedia.

The part below has me confused where you say  No, because I never said
those words or anything like them in my article. Since I never...

What I was trying to say was that it seemed to me that he was saying
something along the lines of "my mind's already made up, don't confuse me
with the facts."  You said I said the same thing, and I'm saying I
didn't.  In my blog posts (to which I was referring, and which you did not
read), I think you'll find a very "this is what I think, what do you
think?" mentality.  If you inferred I was trying to say anything else,
please believe I wasn't.

So one could easily conclude that a position was taken (and published)
on this topic without sufficient testing or research (the related
SunSolve and other articles were already out there before these posts
were made).

Again, you haven't read my blog, so I'm not sure how you can criticize
it.  And it's a BLOG, dude.  The whole spirit of blogging is that it's a
stream of consciousness, not full articles and/or research.  I didn't
write "GbE is a lie!" in an article, I wrote it in a blog.  The same
blog where I wrote "Top 10 Things I Learned About Backups from Watching
Die Hard."  A lot of it is written tongue in cheek, and I think anyone
who follows it knows that.  I don't put blogging on a subject on the
same level as publishing any more than you consider a Wikipedia
article valid information.  And I think that most people feel the same
way.

BTW, I did a ton of research on Sun.com, Neterion, Intel, Alacritech,
Google, etc. to find ANY evidence of benchmarks to prove my feelings
wrong before I wrote that blog article.  The SunSolve articles to which
you refer were written, but they aren't benchmarks; they only say "this
is how to configure a 10 GbE NIC on Solaris."

You can see how maybe a newbie might assume a post as gospel with the
barrage of credentials?

I get that, as it happens to me all the time.  It goes with the
territory of being a prolific speaker/author/blogger/blabber.  I try to
help people.  I write and speak a lot as part of that.  If someone takes
my word as gospel without doing their own research then shame on them
and I can't control that.  I stand by what I wrote, and when I'm wrong,
I admit it.  I'm not going to stop writing/emailing/blogging because I
might say something wrong.

I actually cut my teeth right down
the road from

Re: [Veritas-bu] Tapeless backup environments

2007-10-22 Thread Mark.Donaldson
I think that part of the problem is that a hash collision is nearly
undetectable until you have restored the data, tested it, and found it to be
wrong.

We all know that 99.999% of what we back up is never restored.  It just
ages gracefully on media and is expired.  If any of that 0.001% is
restored and is damaged due to a tape fault (and we've all had it
happen), then we all know that we can usually reach back to a different
version or a different tape, and we'll be close enough to make the user go
away and let us return to our coffee and surfing.

I think a big part of the worry of a hash collision is that the restore
seems to happen, the file restores flawlessly, and it'll not be
detectable unless someone can checksum the whole file or it's a binary
or similar that simply refuses to work.

Again, restoring from a different tape or a different version may be
ineffective depending on where the hash collision occurred and for what
reason.  Every version may use this same unchanging block, which is
restored incorrectly due to an invalid hash match.
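
The failure mode described here can be sketched in a few lines (an illustrative toy, not any vendor's implementation; the deliberately weak 8-bit fingerprint makes a collision easy to reproduce, where a real 160-bit hash would make it astronomically rare but just as silent):

```python
# Toy compare-by-hash dedupe store.  A tiny 8-bit "fingerprint" is used
# on purpose so a collision actually shows up in a small run.
import hashlib

def weak_fingerprint(block: bytes) -> int:
    # 8-bit truncation of SHA-1 -- only 256 buckets, so collisions are common.
    return hashlib.sha1(block).digest()[0]

class DedupeStore:
    def __init__(self):
        self.blocks = {}  # fingerprint -> first block seen with that fingerprint

    def write(self, block: bytes) -> int:
        fp = weak_fingerprint(block)
        # Hash-only dedupe: if the fingerprint is already known, the new
        # block is assumed identical and is NOT stored again.
        self.blocks.setdefault(fp, block)
        return fp

    def read(self, fp: int) -> bytes:
        return self.blocks[fp]

store = DedupeStore()
seen = {}
collision = None
# Write distinct blocks until two of them land in the same bucket.
for i in range(1000):
    block = f"block-{i}".encode()
    fp = store.write(block)
    if fp in seen and seen[fp] != block:
        collision = (seen[fp], block, fp)
        break
    seen[fp] = block

first, second, fp = collision
# The "restore" of the second block silently returns the first one --
# no error, no warning, just the wrong data.
assert store.read(fp) == first
assert first != second
```

The restore succeeds from the backup software's point of view, which is exactly why the collision is "nearly undetectable" without an independent whole-file checksum.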

I know the odds are astronomical, but I also remember that even though
the odds are 150 million to one against my winning the lottery, I still see
smiling faces on TV holding giant checks.

It's a bet, like all other restore techniques, and I'm going to make
sure management has full knowledge of the risks before we implement it
here (which is likely).

-M

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jeff
Lightner
Sent: Monday, October 22, 2007 10:28 AM
To: Austin Murphy; veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments

This paper looks to be about 5 years old (based on the newest references it
cites; it actually cites others that go back nearly 10 years).  It would
be interesting to see the author's take on current deduplication offerings, and
whether the additional checks they layer over simple hashing are enough to
allay those concerns.

One thing I've not seen in all this discussion is anyone saying they've
actually experienced data loss as a result of commercial deduplication
devices.  Can anyone here claim that?

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Austin
Murphy
Sent: Monday, October 22, 2007 10:47 AM
To: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments

Here is some required reading on the topic from Val Henson, a noted
academic/storage-guru.

An Analysis of Compare-by-hash
www.nmt.edu/~val/review/hash.pdf

Of particular interest is why hardware error rates can't be compared
with deterministic software errors.

Austin
___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
--
CONFIDENTIALITY NOTICE: This e-mail may contain privileged or
confidential information and is for the sole use of the intended
recipient(s). If you are not the intended recipient, any disclosure,
copying, distribution, or use of the contents of this information is
prohibited and may be unlawful. If you have received this electronic
transmission in error, please reply immediately to the sender that you
have received the message in error, and delete it. Thank you.
--

___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu

___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments

2007-10-22 Thread Len Boyle
Hello Mark,

Did I read in this list that NetBackup was supposed to do some kind of checksum
on the data written to tape?
If so, would a bpverify check this?  I would assume that if NetBackup does this,
it would find the error, because NetBackup would do its calculation before
passing the block to the dedupe hardware/software, and the block that it gets
back from the dedupe hardware/software would be different.

Of course, this raises the question of what happens with the Symantec/Veritas
PureDisk product, or EMC's: since the NetBackup and dedupe parts are merged,
one would not have this double check.  At least I would not think that one
would.
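
The double check described here would look something like the sketch below (purely illustrative; whether NetBackup actually computes and verifies such a checksum is exactly the open question in this thread). The idea is an end-to-end checksum computed by the backup application *before* the data reaches the dedupe layer, and re-checked at restore time:

```python
# Sketch of an end-to-end checksum that is independent of the dedupe layer.
# If dedupe later returns the wrong block (hash collision), the restore-time
# comparison against the catalog checksum catches it.
import hashlib

def backup(data: bytes):
    # Checksum computed pre-dedupe; both go to the catalog/store.
    checksum = hashlib.md5(data).hexdigest()
    return data, checksum

def restore(stored: bytes, catalog_checksum: str) -> bytes:
    # Recompute over whatever the dedupe layer handed back.
    if hashlib.md5(stored).hexdigest() != catalog_checksum:
        raise IOError("restored data fails catalog checksum")
    return stored

data, checksum = backup(b"payroll records")
assert restore(data, checksum) == b"payroll records"

# A dedupe-layer substitution (collision) is now detectable:
try:
    restore(b"wrong block returned", checksum)
    detected = False
except IOError:
    detected = True
assert detected
```

If the backup application and the dedupe engine are merged and share one hash, this independent check disappears, which is the concern raised above.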

len

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Monday, October 22, 2007 4:52 PM
To: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments

I think that part of the problem is that a hash collision is nearly
undetectable until you have restored the data, tested it, and found it to be
wrong.

We all know that 99.999% of what we back up is never restored.  It just
ages gracefully on media and is expired.  If any of that 0.001% is
restored and is damaged due to a tape fault (and we've all had it
happen), then we all know that we can usually reach back to a different
version or a different tape, and we'll be close enough to make the user go
away and let us return to our coffee and surfing.

I think a big part of the worry of a hash collision is that the restore
seems to happen, the file restores flawlessly, and it'll not be
detectable unless someone can checksum the whole file or it's a binary
or similar that simply refuses to work.

Again, restoring from a different tape or a different version may be
ineffective depending on where the hash collision occurred and for what
reason.  Every version may use this same unchanging block, which is
restored incorrectly due to an invalid hash match.

I know the odds are astronomical, but I also remember that even though
the odds are 150 million to one against my winning the lottery, I still see
smiling faces on TV holding giant checks.

It's a bet, like all other restore techniques, and I'm going to make
sure management has full knowledge of the risks before we implement it
here (which is likely).

-M

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jeff
Lightner
Sent: Monday, October 22, 2007 10:28 AM
To: Austin Murphy; veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments

This paper looks to be about 5 years old (based on the newest references it
cites; it actually cites others that go back nearly 10 years).  It would
be interesting to see the author's take on current deduplication offerings, and
whether the additional checks they layer over simple hashing are enough to
allay those concerns.

One thing I've not seen in all this discussion is anyone saying they've
actually experienced data loss as a result of commercial deduplication
devices.  Can anyone here claim that?

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Austin
Murphy
Sent: Monday, October 22, 2007 10:47 AM
To: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments

Here is some required reading on the topic from Val Henson, a noted
academic/storage-guru.

An Analysis of Compare-by-hash
www.nmt.edu/~val/review/hash.pdf

Of particular interest is why hardware error rates can't be compared
with deterministic software errors.

Austin
___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu

___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu

___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu

___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-10-19 Thread Curtis Preston
I wish we had a white board and could sit in front of each other to
finish the discussion, but it's obvious that it's not going to be
resolved here.  

You believe I'm missing your point, and I believe you're missing my
point.

"what matters is if you use a shorthand to track the
values which can't tell that Feb 7 and Dec 28 are different values
because you put them in the same hash bucket and therefore think that
everything in that bucket is Feb 7, you retrieve the wrong data."

Not sure how many times I (or others) have to keep saying, the dates are
not the data that are being deduped.  The dates are the hashes.  The
data is the person.

"An 8KB chunk of data can have 2^65536 possible values.  Representing
that 8KB of data in 160 bits means that each of the 2^160 possible
checksum/hash/fingerprint values MUST represent, on average, 2^65376
*different* 8KB chunks of data."

This, again, only makes sense if you are using the hash to
store/reconstruct the data, not to ID the data.  The fingerprint (like a
real fingerprint) is not used to reconstruct a block, it's only used to
give it a unique ID that distinguishes it from other blocks.  You still
have to store the block with the key.  And with 2^160 different
fingerprints, that means we can calculate unique fingerprints for 2^160
blocks.  That means we can calculate a unique fingerprint for
1,461,501,637,330,900,000,000,000,000,000,000,000,000,000,000,000
blocks, which is
11,972,621,413,015,000,000,000,000,000,000,000,000,000,000,000,000,000
bytes of data.  That's a lot of stinking data.
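
This arithmetic is easy to sanity-check, since Python's big integers handle 2^65536 directly:

```python
# Checking the arithmetic in the post: an 8 KB block has 2**65536 possible
# values; a 160-bit fingerprint has 2**160 possible values, so each
# fingerprint stands in, on average, for 2**(65536-160) = 2**65376
# distinct blocks, and a collision-free namespace covers
# 2**160 blocks * 8 KB = 2**173 bytes.
namespace_blocks = 2**160
namespace_bytes = namespace_blocks * 8 * 1024   # 8 KB blocks

assert 2**65536 // 2**160 == 2**65376
assert namespace_bytes == 2**173
# ~1.2 x 10**52 bytes -- a 53-digit number.
assert len(str(namespace_bytes)) == 53
```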

"If that doesn't concern you, well, it's safe to say I won't be hiring
you as my backup admin.  Or as my technology consultant, since you"

I really don't think you need to make it personal, and suggest that I
don't know what I'm doing simply because we have been unable to
successfully communicate to each other in this medium.  This medium can
be a very difficult one to communicate such a difficult subject in.  I
think things would be very different in person with a whiteboard.

"should know from earlier postings that spoofing your favorite 160-bit
hashing algorithm with reasonable-looking fake data is now old hat.
The exploit itself should concern us, not to mention that it also
illustrates that similar data which yields the same hash is not the
once-in-the-lifetime-of-the-universe oddity you portray."

They worked really hard to figure out how to take one block that
calculates to a particular hash and create another block that calculates
to the same hash.  It's used to fake a signature.  I get it.  I just
don't see how or why somebody would use this to do I don't know what
with my backups.  And if we were having this discussion over a few
drinks we could try to come up with some ideas.  Right now, I'm as tired
as you are of this discussion.

"Everything mentioned here was covered in the original postings a month
ago.  Unless there's something new, I'm done with this."

You're right.  IN THIS MEDIUM, you don't understand me, and I don't
understand you.  Let's agree to disagree and move on.

For anyone who's still reading, I just want to say this:

I was only trying to bring some sanity to what I felt was an undue
amount of FUD against the hash-only products. I'm not necessarily trying
to talk anyone into them.  I just want you to understand what I THINK
the real odds are.  If after understanding how it works and what the
odds are, you're still uncomfortable, don't dismiss dedupe.  Just
consider a non-hash-based de-dupe product.

Curtis out.

___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments

2007-10-19 Thread Curtis Preston
Since you've impugned my honor, I feel the need to defend myself a bit,
but I don't want to spend much more time on this topic either:

"My first point was that you quoted a Wikipedia article as a source."

The debate as to whether Wikipedia articles have any value is an ongoing
one, and there's no point in rehashing it here. Suffice it to say that I have
a slightly higher opinion of it than you do.

"What I meant was that the posts made by
Bob944 seemed to me to be supported by cited facts, and denoted
personal experiences."

I have personally used and tested many dedupe solutions.  Based on his
opinion of them, I'm pretty sure Bob944 has not.  So I'm not sure how my
posts could be construed as coming from theory and his from reality.
Perhaps it was just my style of writing.

"He's not pointing to something he previously authored as proof that
information is fact."

I never did that.  I only pointed you to the blog because I put a lot
of thought into it and figured you could read that version instead of
me having to rehash it here in email.

"To be fair, I haven't read any of your blog postings, only your posts
in this forum."

And I'm guessing you've never read my books or articles or seen me
speak.  I think you'll find that I'm not nearly as stupid as you seem to
think I am.  :)

"And yes; an Industry Pundit, Author, SME, or whomever, quoting
Wikipedia as a source does tend to dilute credibility, in my mind.
It's not a personal attack, just my personal position on the issue."

Again, I didn't cite it as my only proof, and yes, we do have a
different opinion on the validity of Wikipedia.

"The part below has me confused where you say 'No, because I never said
those words or anything like them in my article.' Since I never..."

What I was trying to say was that it seemed to me that he was saying
something along the lines of "my mind's already made up, don't confuse me
with the facts."  You said I said the same thing, and I'm saying I
didn't.  In my blog posts (to which I was referring, and which you did not
read), I think you'll find a very "this is what I think, what do you
think?" mentality.  If you inferred I was trying to say anything else,
please believe I wasn't.

"So one could easily conclude that a position was taken (and published)
on this topic without sufficient testing or research (the related
SunSolve and other articles were already out there before these posts
were made)."

Again, you haven't read my blog, so I'm not sure how you can criticize
it.  And it's a BLOG, dude.  The whole spirit of blogging is that it's a
stream of consciousness, not full articles and/or research.  I didn't
write "GbE is a lie!" in an article, I wrote it in a blog.  The same
blog where I wrote "Top 10 Things I Learned About Backups from Watching
Die Hard."  A lot of it is written tongue in cheek, and I think anyone
who follows it knows that.  I don't put blogging on a subject on the
same level as publishing any more than you consider a Wikipedia
article valid information.  And I think that most people feel the same
way.

BTW, I did a ton of research on Sun.com, Neterion, Intel, Alacritech,
Google, etc. to find ANY evidence of benchmarks to prove my feelings
wrong before I wrote that blog article.  The SunSolve articles to which
you refer were written, but they aren't benchmarks; they only say "this
is how to configure a 10 GbE NIC on Solaris."

"You can see how maybe a newbie might assume a post as gospel with the
barrage of credentials?"

I get that, as it happens to me all the time.  It goes with the
territory of being a prolific speaker/author/blogger/blabber.  I try to
help people.  I write and speak a lot as part of that.  If someone takes
my word as gospel without doing their own research then shame on them
and I can't control that.  I stand by what I wrote, and when I'm wrong,
I admit it.  I'm not going to stop writing/emailing/blogging because I
might say something wrong.

"I actually cut my teeth right down
the road from you as the backup guy at MBNA.  (I lived in Newark, DE,
and you were my bank.)"

"I'm not sure what you meant to imply by all this? If tenure with backup
is an issue, then I would suggest you really don't have all that much
time in this space,"

I never meant to imply that I have more credentials than you.  I only
meant to reply to the part of your post that suggested that I wasn't
coming from a real/practical/having-actually-done-this-before position.
And if 14 years doing backups and restores for my company and other
companies don't give me some amount of credibility, I'm not sure what
does.

"I've never made mention of my employer, or even implied that any of my
statements represented any opinion or position of theirs? I find this
statement, well, bizarre..."

Dunno.  Seemed right at the time.  What was my point?  I thought it
would help hammer home the I'm a real person who did real stuff in a
real place point.  Guess it didn't help at all. ;)

Maybe I will attend the class after all. I'm beginning to think I'll be
entertained.


Re: [Veritas-bu] Tapeless backup environments

2007-10-19 Thread Eagle, Kent
Jeff,

The mix was deliberate. Please re-read my post  it should become
evident as to why. There was no implication that someone stated they had
experienced data loss.

In fact, nothing in my post is really speaking to dedupe or data loss.
It's about the posts themselves...

- Kent
 

-Original Message-
From: Jeff Lightner [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 19, 2007 12:36 PM
To: Eagle, Kent; Curtis Preston; veritas-bu@mailman.eng.auburn.edu
Cc: [EMAIL PROTECTED]
Subject: RE: [Veritas-bu] Tapeless backup environments

Not an attack - just a question:  Did someone in this thread say they
HAD experienced data loss due to deduplication?   If so I missed it.

You mixed comments about another thread in here and I *think* you're
saying something about someone's experience with 10GigE rather than
deduplication.  Your post could be misread to say someone had in fact
had such a data loss and posted it here.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Eagle,
Kent
Sent: Friday, October 19, 2007 12:08 PM
To: Curtis Preston; veritas-bu@mailman.eng.auburn.edu
Cc: [EMAIL PROTECTED]
Subject: Re: [Veritas-bu] Tapeless backup environments

O.k., at the risk of seeming like I wrote more than you, therefore I
must be right...

2nd. (and last) post on this -

My first point was that you quoted a Wikipedia article as a source.
For me, it really had nothing to do with the subject matter. They have a
disclaimer as to the validity of anything on there, and for good reason:
Anyone can post anything on there, about anything, containing anything.
It might be right, it might be wrong. I would be far more inclined to
trust or quote an industry consortium, or even a vendor's test-results
page, than Wikipedia.

As long as we're throwing credentials around, I might as well mention: As
a former scientist, and statistician, and current engineer, I fully
understand what empirical research is. It INCLUDES math. It is the
actual testing and the statistics of that testing. FWIW: I was trained
in this and FMEA (Failure Modes Effects Analysis) by the gentleman who
ran the Reliability and Maintainability program for Boeing's Saturn and
Apollo space programs, as well as their VERTOL and fixed wing programs.

I can see where my second point could have easily been misinterpreted.
Apologies to anyone led astray. What I meant was that the posts made by
Bob944 seemed to me to be supported by cited facts, and denoted
personal experiences. He's not pointing to something he previously
authored as proof that information is fact. I've only seen him reference
previous posts for the purposes of levelset. To be fair, I haven't read
any of your blog postings, only your posts in this forum. More on that
below. And yes; an Industry Pundit, Author, SME, or whomever, quoting
Wikipedia as a source does tend to dilute credibility, in my mind.
It's not a personal attack, just my personal position on the issue.


The part below has me confused where you say "No, because I never said
those words or anything like them in my article."  Since I never
mentioned anything about any articles... all my comments are in regard
to your posts on this forum, in which you did say that.  "Wouldn't THAT
be saying that up until that point, YOU WERE SAYING that no matter what
the entire world is saying -- no matter what the numbers are, you're not
going to accept..."
This was your text, no?


Obviously there's nothing wrong with admitting you're wrong. What I was
pointing out was that it appears duplicitous to make the comment above
and then state you're probably going to post a retraction in your blog
based on one user's experience. I'm referring to the 10 GbE thread where one
user reported stellar throughput, which contradicted a contrived
theoretical maximum, and several reports of ho-hum throughput.
"7500 MB/s!  That's the most impressive numbers I've ever seen by FAR.
I may have to take back my '10 GbE is a Lie!' blog post, and I'd be
happy to do so."
This was your text, no?
So one could easily conclude that a position was taken (and published)
on this topic without sufficient testing or research (the related
SunSolve and other articles were already out there before these posts
were made).


You said: "Remember also that these posts are often done on my own time
late at night, etc.  I never claimed to be perfect."
True, but you do cite that you are an author of books on the subject,
author of a blog on the subject, and work for one of the largest
industry resources. Indeed the  VP Data Protection. You can see how
maybe a newbie might assume a post as gospel with the barrage of
credentials? Would they not be disappointed to learn they need to check
the timestamp of a post before lending any credence to its contents?
;-)


You said:  I don't think you'll find that to be a problem.  I'm an
in-the-trenches guy, who has sat in front of many a tape drive, tape
library, and backup GUI in my 14 years in this space.  I actually cut my

Re: [Veritas-bu] Tapeless backup environments

2007-10-19 Thread Jeff Lightner
Not an attack - just a question:  Did someone in this thread say they
HAD experienced data loss due to deduplication?   If so I missed it.

You mixed comments about another thread in here and I *think* you're
saying something about someone's experience with 10GigE rather than
deduplication.  Your post could be misread to say someone had in fact
had such a data loss and posted it here.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Eagle,
Kent
Sent: Friday, October 19, 2007 12:08 PM
To: Curtis Preston; veritas-bu@mailman.eng.auburn.edu
Cc: [EMAIL PROTECTED]
Subject: Re: [Veritas-bu] Tapeless backup environments

O.k., at the risk of seeming like I wrote more than you, therefore I
must be right...

2nd. (and last) post on this -

My first point was that you quoted a Wikipedia article as a source.
For me, it really had nothing to do with the subject matter. They have a
disclaimer as to the validity of anything on there, and for good reason:
Anyone can post anything on there, about anything, containing anything.
It might be right, it might be wrong. I would be far more inclined to
trust or quote an industry consortium, or even a vendor's test-results
page, than Wikipedia.

As long as we're throwing credentials around, I might as well mention: As
a former scientist, and statistician, and current engineer, I fully
understand what empirical research is. It INCLUDES math. It is the
actual testing and the statistics of that testing. FWIW: I was trained
in this and FMEA (Failure Modes Effects Analysis) by the gentleman who
ran the Reliability and Maintainability program for Boeing's Saturn and
Apollo space programs, as well as their VERTOL and fixed wing programs.

I can see where my second point could have easily been misinterpreted.
Apologies to anyone led astray. What I meant was that the posts made by
Bob944 seemed to me to be supported by cited facts, and denoted
personal experiences. He's not pointing to something he previously
authored as proof that information is fact. I've only seen him reference
previous posts for the purposes of levelset. To be fair, I haven't read
any of your blog postings, only your posts in this forum. More on that
below. And yes; an Industry Pundit, Author, SME, or whomever, quoting
Wikipedia as a source does tend to dilute credibility, in my mind.
It's not a personal attack, just my personal position on the issue.


The part below has me confused where you say  No, because I never said
those words or anything like them in my article. Since I never
mentioned anything about any articles... All my comments are in regard
to your posts on this forum, in which you did say that. Wouldn't THAT
be saying that up until that point, YOU
WERE SAYING that no matter what the entire world is saying -- no
matter
what the numbers are, you're not going to accept...
This was your text, no?


Obviously there's nothing wrong with admitting you're wrong. What I was
pointing out was that it appears duplicitous to make the comment above
and then state you're probably going to post a retraction in your blog
based on one user's experience. I'm referring to the 10 GbE thread where one
user reported stellar throughput, which contradicted a contrived
theoretical maximum, and several reports of ho-hum throughput.
 7500 MB/s!  That's the most impressive numbers I've ever seen by FAR.
I may have to take back my 10 GbE is a Lie! blog post, and I'd be
happy to do so.
This was your text, no?
So one could easily conclude that a position was taken (and published)
on this topic without sufficient testing or research (the related
SunSolve and other articles were already out there before these posts
were made).


You said: Remember also that these posts are often done on my own time
late at night, etc.  I never claimed to be perfect.
True, but you do cite that you are an author of books on the subject,
author of a blog on the subject, and work for one of the largest
industry resources. Indeed the  VP Data Protection. You can see how
maybe a newbie might assume a post as gospel with the barrage of
credentials? Would they not be disappointed to learn they need to check
the timestamp of a post before lending any credence to its contents?
;-)


You said:  I don't think you'll find that to be a problem.  I'm an
in-the-trenches guy, who has sat in front of many a tape drive, tape
library, and backup GUI in my 14 years in this space.  I actually cut my
teeth right down
the road from you as the backup guy at MBNA.  (I lived in Newark, DE,
and you were my bank.)

I'm not sure what you meant to imply by all this? If tenure with backup
is an issue, then I would suggest you really don't have all that much
time in this space, relative to my experience anyway. I had been
working with various forms of backup for that long before MBNA even had
a Data Center in DE. Why would it be necessary to point out that you
were in the same geographic locale, or used the services of my employer?
I've never made

Re: [Veritas-bu] Tapeless backup environments?

2007-10-19 Thread WEAVER, Simon (external)

How about setting up a whiteboard session, aka NetMeeting!

I think this thread has gone on for some time now, and yet there still
appear to be two different opinions.

Not going to please everyone!  :-) Personally, I would not be worried
about it and will just step out of the debate and move on.

Right or wrong, I really don't care that much :-)

But anyhow, something like DIGG Whiteboard might help - think it's still free
- if those wishing to continue the debate want to continue offline :-)

Bye !

Regards

Simon Weaver
3rd Line Technical Support
Windows Domain Administrator 

EADS Astrium Limited, B23AA IM (DCS)
Anchorage Road, Portsmouth, PO3 5PU

Email: [EMAIL PROTECTED]



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Curtis
Preston
Sent: Friday, October 19, 2007 8:38 AM
To: [EMAIL PROTECTED]; veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?


I wish we had a white board and could sit in front of each other to finish
the discussion, but it's obvious that it's not going to be resolved here.  

You believe I'm missing your point, and I believe you're missing my point.

what matters is if you use a shorthand to track the
values which can't tell that Feb 7 and Dec 28 are different values 
because you put them in the same hash bucket and therefore think that 
everything in that bucket is Feb 7, you retrieve the wrong data.

Not sure how many times I (or others) have to keep saying, the dates are not
the data that are being deduped.  The dates are the hashes.  The data is the
person.

An 8KB chunk of data can have 2^65536 possible values.  Representing 
that 8KB of data in 160 bits means that each of the 2^160 possible 
checksum/hash/fingerprint values MUST represent, on average, 2^65376
*different* 8KB chunks of data.

This, again, only makes sense if you are using the hash to store/reconstruct
the data, not to ID the data.  The fingerprint (like a real fingerprint) is
not used to reconstruct a block, it's only used to give it a unique ID that
distinguishes it from other blocks.  You still have to store the block with
the key.  And with 2^160 different fingerprints, that means we can calculate
unique fingerprints for 2^160 blocks.  That means we can calculate a unique
fingerprint for
1,461,501,637,330,900,000,000,000,000,000,000,000,000,000,000,000
blocks, which is
11,832,317,255,831,000,000,000,000,000,000,000,000,000,000,000,000,000
bytes of data.  That's a lot of stinking data.
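The keyspace arithmetic above can be checked directly; a quick sketch using the thread's assumed 8 KB chunk size and 160-bit fingerprints (this is just the math, nothing from any dedupe product):

```python
# Distinct values a 160-bit fingerprint (e.g. a SHA-1 digest) can take.
keyspace = 2 ** 160

# Assuming 8 KB (8192-byte) chunks, the most data that could in
# principle be assigned all-unique fingerprints.
block_size = 8 * 1024
max_unique_bytes = keyspace * block_size

print(keyspace)           # ~1.46e48 fingerprints
print(max_unique_bytes)   # ~1.2e52 bytes of all-unique chunks
```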

If that doesn't concern you, well, it's safe to say I won't be hiring 
you as my backup admin.  Or as my technology consultant, since you

I really don't think you need to make it personal, and suggest that I don't
know what I'm doing simply because we have been unable to successfully
communicate with each other in this medium, which can be a very
difficult one for a subject this complex.  I think things
would be very different in person with a whiteboard.

should know from earlier postings that spoofing your favorite 160-bit 
hashing algorithm with reasonable-looking fake data is now old hat.
The exploit itself should concern us, not to mention that it also
illustrates that similar data which yields the same hash is not the
once-in-the-lifetime-of-the-universe oddity you portray.

They worked really hard to figure out how to take one block that calculates
to a particular hash and create another block that calculates to the same
hash.  It's used to fake a signature.  I get it.  I just don't see how or
why somebody would use this to do I don't know what with my backups.  And if
we were having this discussion over a few drinks we could try to come up
with some ideas.  Right now, I'm as tired as you are of this discussion.

Everything mentioned here was covered in the original postings a month 
ago.  Unless there's something new, I'm done with this.

You're right.  IN THIS MEDIUM, you don't understand me, and I don't
understand you.  Let's agree to disagree and move on.

For anyone who's still reading, I just want to say this:

I was only trying to bring some sanity to what I felt was an undue amount of
FUD against the hash-only products. I'm not necessarily trying to talk
anyone into them.  I just want you to understand what I THINK the real odds
are.  If after understanding how it works and what the odds are, you're
still uncomfortable, don't dismiss dedupe.  Just consider a non-hash-based
de-dupe product.

Curtis out.

___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu

This email (including any attachments) may contain confidential and/or
privileged information or information otherwise protected from disclosure.
If you are not the intended recipient, please notify the sender
immediately, do not copy this message or any attachments and do not use it
for any purpose or disclose its content to any person, but delete

Re: [Veritas-bu] Tapeless backup environments?

2007-10-18 Thread Curtis Preston
At the risk of chasing windmills, I will continue to try to have this
discussion, although it appears to me that you've already made up your
mind.  I again say that no one is saying that hash collisions can't
happen.  We are simply saying that the odds of them happening are
astronomically less than those of having an undetected/uncorrected bit error on
tape.  And I believe that the math that I use in my blog post
illustrates this.

I said:
 As promised, I looked into applying the Birthday Paradox 
 logic to de-duplication.  I blogged about my results here:
 
 http://www.backupcentral.com/content/view/145/47/
 
 Long and short of it: If you've got less than 95 Exabytes of 
 data, I think you'll be OK.

Bob944 said:
One of us still doesn't understand this. :-)

Got that right. :-)

Your blog raises a red herring in misunderstanding or misrepresenting
the applicability of Birthday Paradox.  

I completely disagree.  If you read the Birthday Paradox entry on
Wikipedia, it specifically explains how the Birthday Paradox applies in
this case.  All the BP says is that the odds of a clash (i.e. a
birthday match or a hash collision) in an environment increase with the
number of elements in the set, and that the odds increase faster than
you think:

* The odds of two people in the same room having the same birthday 
  increase with the number of people in the room.  If there are only
  two people in the room, those odds will be roughly 1 in 365, or .27% 
  (leap year aside).  If there are 23 people in the room, 
  the odds are 50%.
  
* The odds of two DIFFERENT blocks having the same hash (i.e. a
  hash collision) increase with the number of blocks in the data set.
  If there are two blocks in the set, the odds are 1 in 2^160.
  If there are less than 12.7 quintillion blocks in the data set,
  the odds don't show up in a percentage calculated out to 50 decimal
  places.  As soon as you have more than 12.7 quintillion blocks, the
  odds at least register in 50 decimal places, but are still really 
  small.  And to get 12.7 quintillion blocks, you need to store at
  least 95 Exabytes of data.
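Both bullets follow the standard birthday-bound approximation p ≈ 1 - exp(-n(n-1)/2d); a quick sketch (the 12.7-quintillion block count is taken from the post above, not recomputed here):

```python
import math

def birthday_collision_prob(n, d):
    """Approximate probability that n items drawn uniformly from d
    possible values contain at least one collision:
    p ~= 1 - exp(-n*(n-1) / (2*d))."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * d))

# 23 people, 365 possible birthdays: the classic ~50% result.
print(round(birthday_collision_prob(23, 365), 2))

# The same formula for blocks hashed into a 160-bit keyspace: even with
# quintillions of blocks, the probability is vanishingly small.
print(birthday_collision_prob(12.7e18, 2.0 ** 160))
```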

The number of possible values in
BP is 366; there is no data reduction in it, no key values.  An
algorithm which reduced the 366 possibilities the same way that hashing
8KB down to 160 bits would yield infinitesimal keys smaller than one
bit, an absurdity.  

Yeah, IMHO, we are talking apples and oranges.  Let me try to put the
hash collision into the birthday world.  Let's say that we want a wall
of photos of everyone who came to our party.  When you show up, we check
your birthday,  and we check it off on a list.  (We'll call your BD the
hash.)  If we've never seen your birthday before, we take your photo
and put it on the wall.  If your birthday has already been checked off
on the list, though, we don't take your photo.  We assume that since you
have the same birthday, you must be the same person.  So you don't get
your photo taken.  We just write on the photo of the first guy whose
picture we took that he came to the party twice (he must have left and
come back).  Now, if he is indeed the same guy, that's not a hash/BD
collision.  If he is indeed a different person, and we said he was the
same person simply because he had the same birthday, then that would be
a hash/BD collision.

And THIS would be an absurdity to think you can represent n number of
people in a party with an array of photos selected solely on their
birthday (a key space of only 366).  But it's not out of the realm of
possibility to say that we could represent n number of bits in our data
center with an array of bits selected solely on a 160-bit hash (a
keyspace of 2^160).  Cryptographers have been doing it for years.  We're
just adding another application on top of it.

An absurdity which should show that even if it
stopped at eight bits, one short of the bits required to hold 1-366,
there would still be fatal hash collisions--say, Feb 7, Feb 11 and Jun
30 all represented by the same code, in which case you can't figure
out
if people in the room have the same birthday.

Again, I hope you read what I wrote above.  In the analogy, we're not
de-duping birthdays; we're de-duping people BASED on their birthdays.
(Which would be a dumb idea, because the key space is too small: 366.)

What you must grasp is that it is *impossible* to
represent/re-create/look up the values of 2^65536 bits in fewer than
2^65536 bits--unless you concede that each checksum/hash/fingerprint
will represent many different values of the original data--any more
than
you can represent three bits of data with two.

I concede, I concede!  The only point I'm trying to make is about the
odds that two different blocks of data will have the same hash (i.e. a
hash collision) in a given data center.

Hashing is a technique for saving time in certain circumstances.  It
is
valueless in re-creating (and a lookup is a re-creation) original data
when those data can have unlimited arbitrary values.  All the blog

Re: [Veritas-bu] Tapeless backup environments?

2007-10-18 Thread Iverson, Jerald

 What you must grasp is that it is *impossible* to
 represent/re-create/look up the values of 2^65536 bits in fewer than
 2^65536 bits--unless you concede that each checksum/hash/fingerprint
 will represent many different values of the original data--any more
than
 you can represent three bits of data with two.

that is why i have turned off all hardware and software compression on
my tape drives.  imagine trying to store more than 400GB of data onto a
single lto3 tape!  they say that you can store up to and even more
than 800GB, but i don't believe a word of it.  there is no way 1 nibble
of data can represent 1 byte!  once i have the time to study lzr
compression and understand it, and see whether or not it is
data-loss-less, then i may turn compression back on.  until then,
tapes are cheap and i'll buy 2.5 times as many as i need.  :-)

thanks,
jerald

p.s.
our de-dupe vtl does the hash and then a bit by bit comparison of the
data block to ensure the data really is the same in order to eliminate
the duplicate block.  i think some of the confusion may be in not
understanding how the de-dupe process works.  once you create a hash for
a block of data, you are storing the hash AND the block of data.  you
are never having to re-create a big block of data from a smaller hash.
the backup stream of data gets re-written from a string of 8k blocks,
into a string of 160-bit pointers which point to the unique 8k blocks
of data via the hash table.  or something like that...
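The postscript's description can be sketched in a few lines (a toy illustration of the general scheme, using SHA-1 as the 160-bit hash; this is not the VTL vendor's actual implementation):

```python
import hashlib

class DedupeStore:
    """Toy model of the scheme described above: keep the hash AND the
    chunk, and turn the backup stream into a list of fingerprints that
    point back into the table.  Like the VTL described, it also
    byte-compares on a fingerprint hit rather than trusting the hash
    alone."""

    def __init__(self):
        self.table = {}  # fingerprint -> stored chunk bytes

    def write(self, chunk):
        fp = hashlib.sha1(chunk).digest()  # 160-bit fingerprint
        if fp in self.table:
            # Fingerprint seen before: verify byte-for-byte before
            # discarding the new copy as a duplicate.
            if self.table[fp] != chunk:
                raise RuntimeError("hash collision detected")
        else:
            self.table[fp] = chunk         # first sight: store the data
        return fp                          # the stream keeps this pointer

    def read(self, fp):
        return self.table[fp]              # pointer -> original chunk

store = DedupeStore()
chunks = [b"A" * 8192, b"B" * 8192, b"A" * 8192]  # third chunk is a dupe
stream = [store.write(c) for c in chunks]
restored = b"".join(store.read(fp) for fp in stream)
```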

Confidentiality Note: The information contained in this
message, and any attachments, may contain confidential
and/or privileged material.  It is intended solely for the
person(s) or entity to which it is addressed.  Any review,
retransmission, dissemination, or taking of any action in
reliance upon this information by persons or entities other
than the intended recipient(s) is prohibited.  If you received
this in error, please contact the sender and delete the
material from any computer.


___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-10-18 Thread Jeff Lightner
Hardware compression on your tape drives buys more than saved tapes - it
buys reduced backup times.   I found that out way back when on DDS
tapes.  We do compression on our stuff (and I have at many jobs) and
have yet to see a restore fail that wasn't due to an issue traced to the
original backup job that wasn't noticed at the time rather than some
mystical bit change that occurred during the restore.  

While it is theoretically possible you'll get killed during the next
Leonid meteor shower I doubt you're reinforcing your roof with steel to
insure it doesn't happen.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Iverson,
Jerald
Sent: Thursday, October 18, 2007 11:52 AM
To: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?


 What you must grasp is that it is *impossible* to
 represent/re-create/look up the values of 2^65536 bits in fewer than
 2^65536 bits--unless you concede that each checksum/hash/fingerprint
 will represent many different values of the original data--any more
than
 you can represent three bits of data with two.

that is why i have turned off all hardware and software compression on
my tape drives.  imagine trying to store more than 400GB of data onto a
single lto3 tape!  they say that you can store up to and even more
than 800GB, but i don't believe a word of it.  there is no way 1 nibble
of data can represent 1 byte!  once i have the time to study lzr
compression and understand it, and see whether or not it is
data-loss-less, then i may turn compression back on.  until then,
tapes are cheap and i'll buy 2.5 times as many as i need.  :-)

thanks,
jerald

p.s.
our de-dupe vtl does the hash and then a bit by bit comparison of the
data block to ensure the data really is the same in order to eliminate
the duplicate block.  i think some of the confusion may be in not
understanding how the de-dupe process works.  once you create a hash for
a block of data, you are storing the hash AND the block of data.  you
are never having to re-create a big block of data from a smaller hash.
the backup stream of data gets re-written from a string of 8k blocks,
into a string of 160-bit pointers which point to the unique 8k blocks
of data via the hash table.  or something like that...



___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
--
CONFIDENTIALITY NOTICE: This e-mail may contain privileged or confidential 
information and is for the sole use of the intended recipient(s). If you are 
not the intended recipient, any disclosure, copying, distribution, or use of 
the contents of this information is prohibited and may be unlawful. If you have 
received this electronic transmission in error, please reply immediately to the 
sender that you have received the message in error, and delete it. Thank you.
--

___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-10-18 Thread Curtis Preston
So you're OK with hash-based de-dupe, which everyone acknowledges has a
chance (although quite small) that you could have a hash-collision and
potentially corrupt a block of data somewhere, sometime, when you least
expect it...

But you're NOT ok with the long-running industry standard of loss-less
compression algorithms?  (All compression algorithms for tape are
loss-less algorithms.)  Lossy algorithms are only used in things like
video compression, where it's ok to lose blocks along the way as long as
the human eye can't detect them, or as long as you can fit it on
youtube.
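The lossless point is easy to demonstrate with any general-purpose compressor; a quick sketch (zlib stands in here for the drive's hardware LZ-style algorithm, which it is not):

```python
import zlib

data = b"backup " * 1024                 # highly repetitive input
packed = zlib.compress(data)

# A real size reduction, yet the original bytes come back exactly:
# that is all "loss-less" means.
assert len(packed) < len(data)
assert zlib.decompress(packed) == data
```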

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Iverson,
Jerald
Sent: Thursday, October 18, 2007 8:52 AM
To: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?


 What you must grasp is that it is *impossible* to
 represent/re-create/look up the values of 2^65536 bits in fewer than
 2^65536 bits--unless you concede that each checksum/hash/fingerprint
 will represent many different values of the original data--any more
than
 you can represent three bits of data with two.

that is why i have turned off all hardware and software compression on
my tape drives.  imagine trying to store more than 400GB of data onto a
single lto3 tape!  they say that you can store up to and even more
than 800GB, but i don't believe a word of it.  there is no way 1 nibble
of data can represent 1 byte!  once i have the time to study lzr
compression and understand it, and see whether or not it is
data-loss-less, then i may turn compression back on.  until then,
tapes are cheap and i'll buy 2.5 times as many as i need.  :-)

thanks,
jerald

p.s.
our de-dupe vtl does the hash and then a bit by bit comparison of the
data block to ensure the data really is the same in order to eliminate
the duplicate block.  i think some of the confusion may be in not
understanding how the de-dupe process works.  once you create a hash for
a block of data, you are storing the hash AND the block of data.  you
are never having to re-create a big block of data from a smaller hash.
the backup stream of data gets re-written from a string of 8k blocks,
into a string of 160-bit pointers which point to the unique 8k blocks
of data via the hash table.  or something like that...



___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu



Re: [Veritas-bu] Tapeless backup environments

2007-10-18 Thread Eagle, Kent
Sorry, but I just can't keep from jumping in at this point.
Not taking either side, but...

Are you seriously suggesting that a quote from Wikipedia constitutes
empirical scientific research? I could place a posting on there that
either concurs with, or totally rejects the position of that posting;
and someone else would come along and claim it as gospel.

I would be the first to admit that bob944 has made more than a few
posts that have pushed my chair back a couple inches, but at least
they made me THINK!

Saying
 This is the part where I believe you've made your mind up already.
You're saying that no matter what the entire world is saying -- no
matter what the numbers are, you're not going to accept hash-based
de-dupe.  Fine!  That's why there are vendors that don't use hashes to
de-dupe data.  Buy one of those instead.
Is pretty gutsy since you have another post within the past few days
stating you're ready to RETRACT what you already blogged on this, or
blogged on that. Wouldn't THAT be saying that up until that point, YOU
WERE SAYING that no matter what the entire world is saying -- no matter
what the numbers are, you're not going to accept...

If I am asked to restore something for the CEO, and can't, it won't
matter a hill of beans what all the theory was and what the odds were. I
either can, or I can't. I'll be accountable for that result, and why I
got it. As someone so accurately posted recently: We're in the recovery
business, not the restore business.

I would think that almost everyone on this forum does some kind of pilot
before rolling something out into production.

I hope I'm wrong. I love to learn. I'm actually signed up for one of
your classes next week. But if quoting everyone else's
posts/blogs/Wikipedia entries, etc. without backing them up with
empirical evidence or firsthand testing is your program agenda, I
will skip the engagement...

BTW - You Tilt at Windmills (Don Quixote), you don't chase them.  ;-)

Take care,

Kent Eagle
MTS Infrastructure Engineer II, MCP, MCSE
Tech Services / SMSS


---
Message: 1
Date: Thu, 18 Oct 2007 04:06:52 -0400
From: Curtis Preston [EMAIL PROTECTED]
Subject: Re: [Veritas-bu] Tapeless backup environments?
To: [EMAIL PROTECTED], veritas-bu@mailman.eng.auburn.edu
Message-ID:

[EMAIL PROTECTED]
Content-Type: text/plain;   charset=US-ASCII

At the risk of chasing windmills, I will continue to try to have this
discussion, although it appears to me that you've already made up your
mind.  I again say that no one is saying that hash collisions can't
happen.  We are simply saying that the odds of them happening are
astronomically less than those of having an undetected/uncorrected bit error on
tape.  And I believe that the math that I use in my blog post
illustrates this.

I said:
 As promised, I looked into applying the Birthday Paradox
 logic to de-duplication.  I blogged about my results here:
 
 http://www.backupcentral.com/content/view/145/47/
 
 Long and short of it: If you've got less than 95 Exabytes of
 data, I think you'll be OK.

Bob944 said:
One of us still doesn't understand this. :-)

Got that right. :-)

Your blog raises a red herring in misunderstanding or misrepresenting 
the applicability of Birthday Paradox.

I completely disagree.  If you read the Birthday Paradox entry on
Wikipedia, it specifically explains how the Birthday Paradox applies in
this case.  All the BP says is that the odds of a clash (i.e. a
birthday match or a hash collision) in an environment increase with the
number of elements in the set, and that the odds increase faster than
you think:

* The odds of two people in the same room having the same birthday 
  increase with the number of people in the room.  If there are only
  two people in the room, those odds will be roughly 1 in 365, or .27% 
  (leap year aside).  If there are 23 people in the room, 
  the odds are 50%.
  
* The odds of two DIFFERENT blocks having the same hash (i.e. a
  hash collision) increase with the number of blocks in the data set.
  If there are two blocks in the set, the odds are 1 in 2^160.
  If there are less than 12.7 quintillion blocks in the data set,
  the odds don't show up in a percentage calculated out to 50 decimal
  places.  As soon as you have more than 12.7 quintillion blocks, the
  odds at least register in 50 decimal places, but are still really 
  small.  And to get 12.7 quintillion blocks, you need to store at
  least 95 Exabytes of data.

The number of possible values in
BP is 366; there is no data reduction in it, no key values.  An 
algorithm which reduced the 366 possibilities the same way that hashing

8KB down to 160 bits would yield infinitesimal keys smaller than one 
bit, an absurdity.

Yeah, IMHO, we are talking apples and oranges.  Let me try to put the
hash collision into the birthday world.  Let's say that we want a wall
of photos of everyone who came to our party.  When you show

Re: [Veritas-bu] Tapeless backup environments

2007-10-18 Thread Dustin Damour
I would say no, as Wikipedia is an encyclopedia and a good spot
to start, but it isn't peer-reviewed published material, so in research it
would not be considered a valid source.

Dustin D'Amour
Wireless Switching
Plateau Wireless


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Eagle,
Kent
Sent: Thursday, October 18, 2007 12:59 PM
To: [EMAIL PROTECTED]; veritas-bu@mailman.eng.auburn.edu
Cc: [EMAIL PROTECTED]
Subject: Re: [Veritas-bu] Tapeless backup environments

Sorry, but I just can't keep from jumping in at this point.
Not taking either side, but...

Are you seriously suggesting that a quote from Wikipedia constitutes
empirical scientific research? I could place a posting on there that
either concurs with, or totally rejects the position of that posting;
and someone else would come along and claim it as gospel.

I would be the first to admit that bob944 has made more than a few
posts that have pushed my chair back a couple inches, but at least
they made me THINK!

Saying
 This is the part where I believe you've made your mind up already.
You're saying that no matter what the entire world is saying -- no
matter what the numbers are, you're not going to accept hash-based
de-dupe.  Fine!  That's why there are vendors that don't use hashes to
de-dupe data.  Buy one of those instead.
Is pretty gutsy since you have another post within the past few days
stating you're ready to RETRACT what you already blogged on this, or
blogged on that. Wouldn't THAT be saying that up until that point, YOU
WERE SAYING that no matter what the entire world is saying -- no matter
what the numbers are, you're not going to accept...

If I am asked to restore something for the CEO, and can't, it won't
matter a hill of beans what all the theory was and what the odds were. I
either can, or I can't. I'll be accountable for that result, and why I
got it. As someone so accurately posted recently: We're in the recovery
business, not the restore business.

I would think that almost everyone on this forum does some kind of pilot
before rolling something out into production.

I hope I'm wrong. I love to learn. I'm actually signed up for one of
your classes next week. But, if quoting everyone else's
posts/blogs/Wikipedia entries, etc. without backing up re-posting them
with empirical evidence or firsthand testing is your program agenda, I
will skip the engagement...

BTW - You Tilt at Windmills (Don Quixote), you don't chase them.  ;-)

Take care,

Kent Eagle
MTS Infrastructure Engineer II, MCP, MCSE
Tech Services / SMSS


---
Message: 1
Date: Thu, 18 Oct 2007 04:06:52 -0400
From: Curtis Preston [EMAIL PROTECTED]
Subject: Re: [Veritas-bu] Tapeless backup environments?
To: [EMAIL PROTECTED], veritas-bu@mailman.eng.auburn.edu
Message-ID:

[EMAIL PROTECTED]
Content-Type: text/plain;   charset=US-ASCII

At the risk of chasing windmills, I will continue to try to have this
discussion, although it appears to me that you've already made up your
mind.  I again say that no one is saying that hash collisions can't
happen.  We are simply saying that the odds of them happening are
astronomically less than those of having an undetected/uncorrected bit error on
tape.  And I believe that the math that I use in my blog post
illustrates this.

I said:
 As promised, I looked into applying the Birthday Paradox
 logic to de-duplication.  I blogged about my results here:
 
 http://www.backupcentral.com/content/view/145/47/
 
 Long and short of it: If you've got less than 95 Exabytes of
 data, I think you'll be OK.

Bob944 said:
One of us still doesn't understand this. :-)

Got that right. :-)

Your blog raises a red herring in misunderstanding or misrepresenting 
the applicability of Birthday Paradox.

I completely disagree.  If you read the Birthday Paradox entry on
Wikipedia, it specifically explains how the Birthday Paradox applies in
this case.  All the BP says is that the odds of a clash (i.e. a
birthday match or a hash collision) in an environment increase with the
number of elements in the set, and that the odds increase faster than
you think:

* The odds of two people in the same room having the same birthday 
  increase with the number of people in the room.  If there are only
  two people in the room, those odds will be roughly 1 in 365, or .27% 
  (leap year aside).  If there are 23 people in the room, 
  the odds are 50%.
  
* The odds of two DIFFERENT blocks having the same hash (i.e. a
  hash collision) increase with the number of blocks in the data set.
  If there are two blocks in the set, the odds are 1 in 2^160.
  If there are less than 12.7 quintillion blocks in the data set,
  the odds don't show up in a percentage calculated out to 50 decimal
  places.  As soon as you have more than 12.7 quintillion blocks, the
  odds at least register in 50 decimal places, but are still really 
  small.  And to get 12.7 quintillion

Re: [Veritas-bu] Tapeless backup environments

2007-10-18 Thread Curtis Preston
Glad to have another person in the party.  What's your birthday? ;)

Are you seriously suggesting that a quote from Wikipedia constitutes
empirical scientific research? 

NO.  He said that I was misusing the Birthday Paradox, and I merely
pointed to the Wikipedia article that uses it the same way.  If you
search on Birthday Paradox on Google, you'll also find a number of other
articles that use the BP in the same way I'm using it, specifically in
regards to hash collisions, as the concept is not new to deduplication.
It has applied to cryptographic uses of hashing for years.

I then went further to explain WHY the BP applies, and I gave a reverse
analogy that I believe completed my argument that the BP applies in this
situation. So..

As to whether or not what I'm doing is empirical scientific research:
it's not.  Empirical research requires testing, observation, and
repeatability.  For the record, I have done repeated testing of many
hash-based dedupe systems using hundreds of backups and restores without
a single occurrence of hash-induced data corruption, but that doesn't
address the question.  IMHO, it's the equivalent of saying a meteor has never
hit my house so meteors must never hit houses.  The discussion is about
the statistical probabilities of a meteor hitting your house, and you
have to do that with math, not empirical scientific research.

I would be the first to admit that bob944 has made more than a few
posts that have pushed my chair back a couple inches, but at least
they made me THINK!

And you're saying that my half-a-dozen or so blog postings on the
subject, and none of my responses in this thread don't make you think?
I was fine until I quoted Wikipedia, is that it? ;)

Is pretty gutsy since you have another post within the past few days
stating you're ready to RETRACT what you already blogged on this, or
blogged on that. 

I am admitting that I am not a math or statistics specialist and that I
misunderstood the odds before.  What's wrong with that?  That I was
wrong before, or that I'm stating it publicly that I was wrong before?
I was wrong. I was told I was wrong because I didn't apply the birthday
paradox.  So I applied the Birthday Paradox in the same way I see
everyone else applying it, and the way that makes sense according to the
problem, and the numbers still come out OK.

Wouldn't THAT be saying that up until that point, YOU
WERE SAYING that no matter what the entire world is saying -- no
matter
what the numbers are, you're not going to accept...

No, because I never said those words or anything like them in my
article.   I said, some people say this, but I say that.  Then I even
elicited feedback from the audience.  The point of that portion of the
article was that some are talking about hash collisions as if they're
going to happen to everybody and happen a lot, and I wanted to add some
actual math to the discussion, rather than just talk about fear
uncertainty and doubt (FUD).  I felt there was a little Henny-Penny
business going on.

If I am asked to restore something for the CEO, and can't, it won't
matter a hill of beans what all the theory was and what the odds were. I
either can, or I can't. I'll be accountable for that result, and why I
got it. As someone so accurately posted recently: We're in the recovery
business, not the restore business.

You won't get any argument from me.  I think you'll find almost that
exact sentence in the first few paragraphs of any of my books.  Having
said that, we all use technologies as part of our backup system that
have a failure rate percentage (like tape).  And to the best of my
understanding, the odds of a single hash collision in 95 Exabytes of
data is significantly lower than the odds of having corrupted data on an
LTO tape and not even knowing it, based on the odds they publish.  Even
if you make two copies, the copy could be corrupted, and you could have
a failed restore. Yet we're all ok with that, but we're freaking out
about hash collisions, which statistically speaking have a MUCH lower
probability of happening.

I would think that almost everyone on this forum does some kind of
pilot before rolling something out into production.

I sure as heck hope so, but I don't think it addresses this issue.  So
you test it and you don't get any hash collisions. What does that prove?
It proves that a meteor has never hit your house.

What I recommend (especially if you're using a hash-only de-dupe system)
is a constant verification of the system.  Use a product like NBU that
can do CRC checks against the bytes it's copying or reading, and either
copy all de-duped data to tape or run a NBU verify on every backup.  If
you have a hash collision, your copy or verify will fail, and you'll at
least know when it happens.

I hope I'm wrong. 

About what? That I'm an idiot? ;)  I think judging me solely on this
long, protracted, difficult to follow discussion (with over 70 posts) is
probably unfair.  Remember also that these posts are often done on my
own 

Re: [Veritas-bu] Tapeless backup environments?

2007-10-18 Thread A Darren Dunham
On Thu, Oct 18, 2007 at 01:44:03PM -0400, Curtis Preston wrote:
 So you're OK with hash-based de-dupe, which everyone acknowledges has a
 chance (although quite small) that you could have a hash-collision and
 potentially corrupt a block of data somewhere, sometime, when you least
 expect it...
 
 But you're NOT ok with the long-running industry standard of loss-less
 compression algorithms? [...]

I think the smiley on the end indicated that it was a humorous comment.
At least that's how I took it.

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 
___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-10-18 Thread Austin Murphy
On 10/18/07, Iverson, Jerald [EMAIL PROTECTED] wrote:
 ...
 that is why i have turned off all hardware and software compression on
 my tape drives.  imagine trying to store more than 400GB of data onto a
 single lto3 tape!  they say that you can store up to and even more
 than 800GB, but i don't believe a word of it.  there is no way 1 nibble
 of data can represent 1 byte!  once i have the time to study lzr
 compression and understand it,
snip

Hi jerald,

Data compression exploits the non-randomness of normal data.
Compression algorithms have variable compression rates because their
performance is dependent on the data being compressed.  Truly random
data does NOT compress at all.  Typical data is not truly random.
Once data has been compressed, it is close to random, so compression
can not be applied again.  Many encryption algorithms also result in
near-random data that does not compress.
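This is easy to demonstrate: patterned data compresses heavily, while random (or already-compressed) data does not. A quick sketch with zlib:

```python
import os
import zlib

# ~8.8KB of highly patterned text vs. the same amount of random bytes
text = b"the quick brown fox jumps over the lazy dog " * 200
rand = os.urandom(len(text))

ct = zlib.compress(text, 9)
cr = zlib.compress(rand, 9)

print(len(text), len(ct))   # patterned data shrinks dramatically
print(len(rand), len(cr))   # random data does not shrink -- it grows
                            # slightly from the container overhead

# the compressed output is itself near-random, so a second
# compression pass buys essentially nothing further
print(len(zlib.compress(ct, 9)))
```

The same behavior explains why hardware compression on a tape drive cannot help with encrypted or pre-compressed backup streams.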

A formal definition of a data set's randomness is its Kolmogorov
complexity. http://en.wikipedia.org/wiki/Kolmogorov_complexity

Compression is just an alternate means of data representation.
Several others are at work on your LTO tapes too!
http://en.wikipedia.org/wiki/Forward_error_correction
http://en.wikipedia.org/wiki/Run_Length_Limited
http://en.wikipedia.org/wiki/PRML

Don't get too paranoid...these are good things.

Austin


Re: [Veritas-bu] Tapeless backup environments?

2007-10-18 Thread bob944
 discussion, although it appears to me that you've already made up your
 mind.  

I'd prefer to say I have little interest in a technology which, by
design, will retrieve a completely different chunk of data than what was
written, with no notice whatsoever.  BTW, before you bring out tape
errors again, I posted long ago why this argument was not comparable.

No point in beating the poor Birthday Paradox to death; you've
completely missed the point there.  It doesn't matter that the same
values come up more often than our intuition suggests--which is the
_only_ lesson of BP--what matters is if you use a shorthand to track the
values which can't tell that Feb 7 and Dec 28 are different values
because you put them in the same hash bucket and therefore think that
everything in that bucket is Feb 7, you retrieve the wrong data.

Here's all a thinking person responsible for data needs to consider:

An 8KB chunk of data can have 2^65536 possible values.  Representing
that 8KB of data in 160 bits means that each of the 2^160 possible
checksum/hash/fingerprint values MUST represent, on average, 2^65376
*different* 8KB chunks of data.  
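That counting argument is pure pigeonhole arithmetic, and Python's big integers can check it exactly:

```python
# An 8KB chunk is 65536 bits, so it has 2^65536 possible values;
# a 160-bit fingerprint has only 2^160 possible values.
values_per_chunk = 2 ** (8 * 1024 * 8)
hash_values = 2 ** 160

# On average, each fingerprint must therefore stand for
# 2^(65536-160) = 2^65376 *different* 8KB chunks.
avg_chunks_per_hash = values_per_chunk // hash_values
assert avg_chunks_per_hash == 2 ** 65376
print("each 160-bit hash covers 2^65376 distinct 8KB chunks on average")
```

Neither side of the thread disputes this arithmetic; the disagreement is over whether the probability of two *actually occurring* chunks sharing a fingerprint is small enough to live with.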

If that doesn't concern you, well, it's safe to say I won't be hiring
you as my backup admin.  Or as my technology consultant, since you
should know from earlier postings that spoofing your favorite 160-bit
hashing algorithm with reasonable-looking fake data is now old hat.  The
exploit itself should concern us, not to mention that it also
illustrates that similar data which yields the same hash is not the
once-in-the-lifetime-of-the-universe oddity you portray.

Everything mentioned here was covered in the original postings a month
ago.  Unless there's something new, I'm done with this.




Re: [Veritas-bu] Tapeless backup environments?

2007-10-16 Thread A Darren Dunham
On Tue, Oct 16, 2007 at 12:09:30AM -0400, bob944 wrote:
 One of us still doesn't understand this. :-)
 
 Your blog raises a red herring in misunderstanding or misrepresenting
 the applicability of Birthday Paradox.  The number of possible values in
 BP is 366; there is no data reduction in it, no key values.

The 366 isn't the data space, it's the keyspace.  When we look at a
person's birthday, we're hashing them into that space.  The paradox
then is how many people can we hash before the chance of a collision
is significant.  

Obviously if 400 people are in a room, the number of values exceeds the
keyspace, so the probability of a collision is exactly 1.

 An
 algorithm which reduced the 366 possibilities the same way that hashing
 8KB down to 160 bits would yield infinitesimal keys smaller than one
 bit, an absurdity.

I'm afraid I don't understand what you mean with that sentence.

 An absurdity which should show that even if it
 stopped at eight bits, one short of the bits required to hold 1-366,
 there would still be fatal hash collisions--say, Feb 7, Feb 11 and Jun
 30 all represented by the same code, in which case you can't figure out
 if people in the room have the same birthday.

What is stopping at 8 bits?  Hash collisions can always occur.  The
question is what is the probability.
 
 What you must grasp is that it is *impossible* to
 represent/re-create/look up the values of 2^65536 bits in fewer than
 2^65536 bits--unless you concede that each checksum/hash/fingerprint
 will represent many different values of the original data--any more than
 you can represent three bits of data with two.

I think everyone acknowledges that as a fact.

 Hashing is a technique for saving time in certain circumstances.  It is
 valueless in re-creating (and a lookup is a re-creation) original data
 when those data can have unlimited arbitrary values.

The argument is that a process does not have to be infallible to be
valuable, much like the electrical and mechanical processes we currently
use.  That if the chance of failure in the algorithm is much less than
the chance of other parts of the system introducing silent data
corruption, then the overall amount of data loss is not significantly
changed. 



Re: [Veritas-bu] Tapeless backup environments?

2007-10-15 Thread bob944
cpreston [EMAIL PROTECTED]:
 As promised, I looked into applying the Birthday Paradox 
 logic to de-duplication.  I blogged about my results here:
 
 http://www.backupcentral.com/content/view/145/47/
 
 Long and short of it: If you've got less than 95 Exabytes of 
 data, I think you'll be OK.

One of us still doesn't understand this. :-)

Your blog raises a red herring in misunderstanding or misrepresenting
the applicability of Birthday Paradox.  The number of possible values in
BP is 366; there is no data reduction in it, no key values.  An
algorithm which reduced the 366 possibilities the same way that hashing
8KB down to 160 bits would yield infinitesimal keys smaller than one
bit, an absurdity.  An absurdity which should show that even if it
stopped at eight bits, one short of the bits required to hold 1-366,
there would still be fatal hash collisions--say, Feb 7, Feb 11 and Jun
30 all represented by the same code, in which case you can't figure out
if people in the room have the same birthday.

What you must grasp is that it is *impossible* to
represent/re-create/look up the values of 2^65536 bits in fewer than
2^65536 bits--unless you concede that each checksum/hash/fingerprint
will represent many different values of the original data--any more than
you can represent three bits of data with two.

Hashing is a technique for saving time in certain circumstances.  It is
valueless in re-creating (and a lookup is a re-creation) original data
when those data can have unlimited arbitrary values.  All the blog
hand-waving about decimal places, Zettabytes and the specious comparison
to undetected write errors will not change that.  What _would_ be a
useful exercise for the reader is to discover how many unique values of
8KB are, on average, represented by a given 160-bit
checksum/hash/fingerprint.




[Veritas-bu] Tapeless backup environments?

2007-10-14 Thread cpreston

As promised, I looked into applying the Birthday Paradox logic to 
de-duplication.  I blogged about my results here:

http://www.backupcentral.com/content/view/145/47/

Long and short of it: If you've got less than 95 Exabytes of data, I think 
you'll be OK.

+--
|This was sent by [EMAIL PROTECTED] via Backup Central.
|Forward SPAM to [EMAIL PROTECTED]
+--




Re: [Veritas-bu] Tapeless backup environments?

2007-10-01 Thread McCammont, Anderson (IT)
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf 
 Of Curtis Preston
 Sent: 01 October 2007 06:35
 To: [EMAIL PROTECTED]; veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] Tapeless backup environments?
...
 
 These are odds based on the size of the key space.  If you have 2^160
 odds, you have a 1:2^160 chance of a collision.

by saying that, the implication is that the keyspace is uniform.  It's
not.  The probability of a hash collision is a function of the uniformity
of the keyspace as well as the number of items you've hashed and the
size of the key.  There's lots of research in the crypto field that's
relevant to de-dupe.

You also should consider the characteristics of the de-dupe software
when it encounters a hash collision.  Backups are the last line of
defence for many, when all else (personal copies, replication, snapshots
etc.) has failed.  The 'acceptable risk' of a hash collision is of
little comfort when you've got one.  Does it fail silently, throw its
hands in the air and core dump, or handle the situation gracefully and
carry on without missing a beat?  Ask them what they do.  As Curtis
mentioned, not all de-dupe s/ware relies purely on hashes.  

Balance this with the /fact/ that there's already a chance of undetected
corruption in the components you buy today, which is why most
technologies that survive impose their own data validation checks
instead of relying purely on the underlying technology in the stack to
have checked it for them.  The multi-layered checks that go on improve
your overall confidence. 

At least one design in the SiS field also accepts that hashing
algorithms will improve over time and they've had the foresight to be
able to drop in new hashing schemes in future.

When picking de-dupe software you should also care about Intellectual
Property.  Who's got what isn't necessarily clear in this space, and the
patent lawyers won't be far away.  Picking the big boys helps here, but
also look at people with a mature view to the marketplace (e.g. some
companies are prepared to talk about licensing deals rather than court
cases when they encounter infringement).

There's lots of other things to consider in picking an algorithm,
including how well it handles patterns that don't fall naturally on block
boundaries (think of the challenges involved in de-duping 'the quick
brown fox' and 'the quicker brown fox') that will affect de-dupe ratios,
and how that affects performance.  And the solution's not just about the
algorithm.
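The quick/quicker point is about boundary alignment: with fixed-size blocks, a single insertion shifts every later boundary, so almost nothing in the unchanged tail dedupes. A toy illustration (8-byte blocks chosen only to keep the example small):

```python
import hashlib

def fixed_chunks(data: bytes, size: int = 8):
    """Split data into fixed-size blocks, as a naive dedupe engine might."""
    return [data[i:i + size] for i in range(0, len(data), size)]

a = b"the quick brown fox jumps over the lazy dog"
b = b"the quicker brown fox jumps over the lazy dog"   # two bytes inserted

ha = {hashlib.sha1(c).hexdigest() for c in fixed_chunks(a)}
hb = {hashlib.sha1(c).hexdigest() for c in fixed_chunks(b)}

# The insertion shifts every later block boundary, so only the one
# chunk before the edit still matches between the two versions.
print(len(ha & hb))   # 1
```

Content-defined (variable-length) chunking exists precisely to resynchronize boundaries after an insertion, which is why it matters so much for dedupe ratios.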

De-dupe is a great advance, and a disruptive technology not just for
backup but also for primary storage.  Look forward to it, but go in with
your eyes open.


NOTICE: If received in error, please destroy and notify sender. Sender does not 
intend to waive confidentiality or privilege. Use of this email is prohibited 
when received in error.



Re: [Veritas-bu] Tapeless backup environments?

2007-09-30 Thread Curtis Preston
Chris Freemantle said:
It's interesting that the probability of any 2 randomly selected hashes
being the same is quoted, rather than the probability that at least 2
out of a whole group are the same. That's probably because the minutely
small chance becomes rather bigger when you consider many hashes. This
will still be small, but I suspect not as reassuringly small.
To illustrate this consider the 'birthday paradox'. 

I'm really glad you point this out.  The way I interpret this is that
the odds of there being a hash collision in your environment increase
with every new block of data you submit to the de-duplication system.
I've talked to somebody who has researched this mathematically, and he
says he's going to share with me his calculations.  I'll share them
if/when he shares them with me.  As a proponent of these systems, I
certainly don't want to misrepresent the odds they represent.

For our data I would certainly not use de-duping, even if it did work 
well on image data.

I think you're under the misconception that all de-dupe systems use ONLY
hashes to identify redundant data.  While there are products that do
this (and I still trust them more than you do), there are also products
that do a full block comparison of the supposedly matching blocks before
throwing one of them away.
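That verify-before-discard design can be sketched as follows. This is a toy illustration of the idea only, not any particular vendor's implementation:

```python
import hashlib

class DedupStore:
    """Toy dedupe store: index blocks by SHA-1, but byte-compare before
    discarding a 'duplicate', so a hash collision can never silently
    substitute the wrong block for the one that was written."""

    def __init__(self):
        self.blocks = {}   # fingerprint -> stored bytes

    def put(self, block: bytes) -> str:
        fp = hashlib.sha1(block).hexdigest()
        stored = self.blocks.get(fp)
        if stored is None:
            self.blocks[fp] = block        # first time we've seen it
        elif stored != block:
            # a real hash collision: refuse rather than corrupt data
            raise RuntimeError("hash collision: refusing to dedupe")
        return fp

    def get(self, fp: str) -> bytes:
        return self.blocks[fp]

store = DedupStore()
fp1 = store.put(b"payroll record")
fp2 = store.put(b"payroll record")   # deduped: same fingerprint, one copy
print(fp1 == fp2, len(store.blocks))
```

The byte comparison costs an extra read of the stored block, which is the usual trade-off against hash-only designs.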

In addition, there are ways to completely remove the risk you're worried
about.  If you back up to a de-dupe backup system, regardless of its
design, and then use your backup software to copy from it to tape (or
anything), you verify the de-duped data, as any good backup software
will check all data it copies against its own stored checksums.



Re: [Veritas-bu] Tapeless backup environments?

2007-09-30 Thread Curtis Preston
Bob,

I'll try to respond as best as I can.

No importa.  The length of the checksum/hash/fingerprint and the
sophistication of its algorithm only affect how frequently--not
whether--the incorrect answer is generated.

You and I don't disagree on this.  The only thing we differ with is the
odds of the event.  I think the odds are small enough to not be
concerned with, and you think they're larger than that.

(I also think it's important to state what I stated in my other reply.
Most de-dupe systems do not rely only on hashes.  So if you can't get
past this whole hashing thing, there's no reason to reject de-dupe
altogether.  Just make sure your vendor uses an alternate method.)

The notion that the bad guys will never figure out a way to plant a
silent data-change based on checksum/hash/fingerprint collisions is,
IMO, naive.

So someone is going to exploit the hash collision possibilities in my
backup system to do what, exactly?  As much as I've spoken and written
about storage security, I can't for the life of me figure out what
someone would hope to gain or how they would gain it this way.

Those are impressive, and dare I guess, vendor-supplied, numbers.  And
they're meaningless.  

These are odds based on the size of the key space.  If you have 2^160
odds, you have a 1:2^160 chance of a collision.

What _is_ important?  To me, it's important that if I read
back any of the N terabytes of data I might store this week, I get the
same data that was written, not a silently changed version because the
checksum/hash/fingerprint of one block that I wrote collides with
another checksum/hash/fingerprint.

This is referring to the birthday paradox.  As I stated in another post,
I haven't thought about this before, and am looking into what the real
odds are.  I'm trying to translate it into actual numbers.

I can NOT have that happen to any
block--in a file clerk's .pst, a directory inode or the finance
database.  "Probably, it won't happen" is not acceptable.

Couldn't agree more.

 Let's compare those odds with the odds of an unrecoverable 
 read error on a typical disk--approximately 1 in 100 trillion

Bogus comparison.  In this straw man, that 1/100,000,000,000,000 read
error a) probably doesn't affect anything

I thought "probably" wasn't acceptable?  I'm sorry, that was just too
close to your previous use of "probably" in a very different context.

probably doesn't affect anything because of the higher-level
RAID array it's in and b) if it does, there's an error, a
we-could-not-read-this-data, you-can't-proceed, stop, fail,
get-it-from-another-source error--NOT a silent changing of the data
from
foo to bar on every read with no indication that it isn't the data that
was written.

I think Darren's other posts about this point are sufficient.  It
happens.  It happens all the time, and is well documented.  And yet the
industry's ok with this.  On the other hand, the odds of what we're
talking about are significantly smaller and people are freaking out.

 If you want to talk about the odds of something bad happening and not
 knowing it, keep using tape. Everyone who has worked with tape for
any
 length of time has experienced a tape drive writing something that it
 then couldn't read.

That's not news, and why we've been making copies of data for, oh, 50
years or so.

I'm just saying that a hash collision, however possible, would basically
translate into a failed backup that looks good.  Do you have any idea
how many failed backups that look good happen every single day with
tape?  And, as long as you bring up making copies, making copies of your
de-duped data removes any concerns, as it verifies the original.

 Compare that to successful deduplication disk
 restores. According to Avamar Technologies Inc. (recently acquired by
 EMC Corp.), none of its customers has ever had a failed restore.

Now _there's_ an unbiased source.

Touché.  Anyone who has actually experienced a hash collision in their
de-duplication backup system please stand up.  Given the hype that
de-dupe has made, don't you think that anyone who had experienced such a
thing would have reported it and such a report would have been given big
press?  I sure do.  And yet there has been nothing.




Re: [Veritas-bu] Tapeless backup environments?

2007-09-27 Thread A Darren Dunham
On Wed, Sep 26, 2007 at 05:15:08PM -0400, bob944 wrote:
 Perhaps anything can have a failure mode where it doesn't alert--but in
 a previous lifetime in hardware and some design, I saw only one
 undetected data transformation that did not crash or in some way cause
 obvious problems (intermittent gate in a mainframe adder that didn't
 affect any instructions used by the OS).  

There's a lot more data out there now (more chances for problems).
Disk firmware has become much more complex.

 I don't remember a disk that didn't maintain, compare and _use for error
 detection_, the cylinder, head and sector numbers in the format.  

Disks may (usually) do that, but they don't report it back to you so you
can verify, and they're not perfect.

One of the ZFS developers wrote about a disk firmware bug they
uncovered.  Every once in a while the disk would return the data not
from the requested block but from a block with some odd calculated
offset from that one.  Unless the array/controller/system is checking
the data, you'll never know until it hits something critical.

Netapp also talks about the stuff they had to add because of silently
dropped writes and corrupted reads.

Everything has an error rate.  



Re: [Veritas-bu] Tapeless backup environments?

2007-09-26 Thread bob944
cpreston:
 Simplistically, it checksums the block and looks in a table of
 checksums-of-blocks-that-it-already-stores to see if the identical
 (ahem, anyone see a hole here?) data already lives there.
 
 To what hole do you refer? 

The idea that N bits of data can unambiguously be represented by fewer
than N bits.  Anyone who claims to the contrary might as well knock out
perpetual motion, antigravity and faster-than-light travel while they're
on a roll.

 I see one in your simplistic example, but
 not in what actually happens (which require a much longer technical
 explanation).

Hence my introduction that began with [s]implistically.  But throw in
all the much longer technical explanation you like, any process which
compares a reduction-of-data to another reduction-of-data will sooner or
later return foo when what was originally stored was bar.


cpreston:
 There are no products in the market that rely solely on a checksum to
 identify redundant data.  There are a few that rely solely on 
 a 160-bit
 hash, which is significantly larger than a checksum (typically 12-16

No importa.  The length of the checksum/hash/fingerprint and the
sophistication of its algorithm only affect how frequently--not
whether--the incorrect answer is generated.

 [...] The ability to forcibly create a hash collision means 
 absolutely nothing in the context of deduplication.

Of course it does.  Most examples in the literature concern storing
crafted-data-pattern-A (pay me one dollar) in order for the data to be
read later as something different (pay me one million dollars).  It
can't have escaped your attention that every day, some yahoo crafts
another buffer-or-stack overflow exploit; some of them are brilliant.
The notion that the bad guys will never figure out a way to plant a
silent data-change based on checksum/hash/fingerprint collisions is,
IMO, naive.

 What matters is the chance that two
 random chunks would have a hash collision. With a 128-bit and 160-bit
 key space, the odds of that happening are 1 in 2128 with MD5, and 1 in
 2160 with SHA-1. That's 1038 and 1048, respectively. If you 

Grasshopper, the wisdom is not in the numbers, it is in remembering that
HTML will not paste into ASCII well.  But I suspect you mean one in
2^128 or similar.

Those are impressive, and dare I guess, vendor-supplied, numbers.  And
they're meaningless.  We do not care about the odds that a particular
block ("the quick brown fox jumps over the lazy dog")
checksums/hashes/fingerprints to the same value as another particular
block ("now is the time for all good men to come to the aid of their
party").  Of _course_ that will be astronomically unlikely, and with
sufficient hand-waving (to quote your article: "the odds of a hash
collision with two random chunks are roughly
1,461,501,637,330,900,000,000,000,000 times greater than the number of
bytes in the known computing universe") these totally meaningless
numbers can seem important.

They're not.  What _is_ important?  To me, it's important that if I read
back any of the N terabytes of data I might store this week, I get the
same data that was written, not a silently changed version because the
checksum/hash/fingerprint of one block that I wrote collides with
another checksum/hash/fingerprint.  I can NOT have that happen to any
block--in a file clerk's .pst, a directory inode or the finance
database.  "Probably, it won't happen" is not acceptable.

 Let's compare those odds with the odds of an unrecoverable 
 read error on a typical disk--approximately 1 in 100 trillion

Bogus comparison.  In this straw man, that 1/100,000,000,000,000 read
error a) probably doesn't affect anything because of the higher-level
RAID array it's in and b) if it does, there's an error, a
we-could-not-read-this-data, you-can't-proceed, stop, fail,
get-it-from-another-source error--NOT a silent changing of the data from
foo to bar on every read with no indication that it isn't the data that
was written.

 If you want to talk about the odds of something bad happening and not
 knowing it, keep using tape. Everyone who has worked with tape for any
 length of time has experienced a tape drive writing something that it
 then couldn't read.

That's not news, and why we've been making copies of data for, oh, 50
years or so.

 Compare that to successful deduplication disk
 restores. According to Avamar Technologies Inc. (recently acquired by
 EMC Corp.), none of its customers has ever had a failed restore.

Now _there's_ an unbiased source.




Re: [Veritas-bu] Tapeless backup environments?

2007-09-26 Thread Chris Freemantle
Just a teensy point - LTO3 tapes should store 400GB natively. They're
marketed as having a capacity up to 800GB, but that's with 2:1
compression. We normally get about 550GB for MRI data.

LTO4 are available with 800GB native capacity. The drives can also
encrypt data.

Dave Markham wrote:
 Guys, I've just read this thread and can say I'm very interested in it.
 The first thing is I learned a new term called deduplication which I
 didn't know existed.
 
 Question: I gather deduplication is using other software. DataDomain I
 think I saw mentioned. Where does this fit in with NetBackup and does
 the software reside on every client or just a server somewhere?
 
 OK, so I'm trying to kit-refresh a backup environment for a customer
 which has 2 sites, Production and DR, about 200 miles apart. There is a
 link between the sites but the customer will probably frown on increased
 bandwidth charges to transfer backup data across for disaster recovery
 purposes.
 
 Data is probably only 1 TB for the site with perhaps 70% being required
 to be transfered daily to offsite media.
 
 Currently I use tape and I was just speccing a new tape system, as I
 thought by using disk based backups, and retentions of weekly/monthly
 backups lasting say 6 weeks, I'm going to need a LOT of disk, plus the
 bandwidth transfer costs to the DR site.
 
 LTO3 tapes are storing 200GB a tape which is pretty good compared to
 disk, I thought.
 
 I guess in my set up its a trade off between :-
 
 Initial cost of disk array vs initial cost of tape library, drives and media
 
 Time taken to back up (network will be the bottleneck here; still on a
 100Meg LAN with just 2 DB servers using Gigabit LAN to the backup
 server).
 
 Offsite transfer of tapes daily to offsite location vs cost of increased
 bandwidth between sites to transfer backup data.
 
 
 I'm now confused what to propose :)
 
 
 
 

-- 
Do you want a picture of your brain - volunteer for a brain scan!
http://www.fil.ion.ucl.ac.uk/Volunteers/

Computer systems go wrong - even backup systems
Be paranoid!

Chris Freemantle, Data Manager
Wellcome Trust Centre for Neuroimaging
+44 (0)207 833 7496
www.fil.ion.ucl.ac.uk


Re: [Veritas-bu] Tapeless backup environments?

2007-09-26 Thread Jeff Lightner
Most of this while well documented seems to boil down to the same
alarmist notion that had people trying to ban cell phones in gas
stations.  The possibility that something untoward COULD happen does NOT
mean it WILL happen.  To date I don't know of a single gas pump
explosion or car fire that was traced to cell phone usage at the pump.
Oddly enough though no one monitors gas pumps to be sure users aren't
re-entering their vehicles and fires HAVE been traced to static
electricity caused by that.

If odds are so important it seems it would be important to worry about
the odds that your data center, your offsite storage location and your
Disaster Recovery site will all be taken out at the same time.

I also suggest the argument is flawed because it seems to imply that
only the cksum is stored and not the actual data - it is the original
compressed data AND the cksum that result in the restore - not the cksum
alone.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of bob944
Sent: Wednesday, September 26, 2007 4:03 AM
To: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

cpreston:
 Simplistically, it checksums the block and looks in a table of
 checksums-of-blocks-that-it-already-stores to see if the identical
 (ahem, anyone see a hole here?) data already lives there.
 
 To what hole do you refer? 

The idea that N bits of data can unambiguously be represented by fewer
than N bits.  Anyone who claims to the contrary might as well knock out
perpetual motion, antigravity and faster-than-light travel while they're
on a roll.

 I see one in your simplistic example, but
 not in what actually happens (which require a much longer technical
 explanation).

Hence my introduction that began with [s]implistically.  But throw in
all the much longer technical explanation you like, any process which
compares a reduction-of-data to another reduction-of-data will sooner or
later return foo when what was originally stored was bar.


cpreston:
 There are no products in the market that rely solely on a checksum to
 identify redundant data.  There are a few that rely solely on 
 a 160-bit
 hash, which is significantly larger than a checksum (typically 12-16

No importa.  The length of the checksum/hash/fingerprint and the
sophistication of its algorithm only affect how frequently--not
whether--the incorrect answer is generated.

 [...] The ability to forcibly create a hash collision means 
 absolutely nothing in the context of deduplication.

Of course it does.  Most examples in the literature concern storing
crafted-data-pattern-A ("pay me one dollar") in order for the data to be
read later as something different ("pay me one million dollars").  It
can't have escaped your attention that every day, some yahoo crafts
another buffer-or-stack overflow exploit; some of them are brilliant.
The notion that the bad guys will never figure out a way to plant a
silent data-change based on checksum/hash/fingerprint collisions is,
IMO, naive.

 What matters is the chance that two
 random chunks would have a hash collision. With a 128-bit and 160-bit
 key space, the odds of that happening are 1 in 2128 with MD5, and 1 in
 2160 with SHA-1. That's 1038 and 1048, respectively. If you 

Grasshopper, the wisdom is not in the numbers, it is in remembering that
HTML will not paste into ASCII well.  But I suspect you mean one in
2^128 or similar.

Those are impressive, and dare I guess, vendor-supplied, numbers.  And
they're meaningless.  We do not care about the odds that a particular
block "the quick brown fox jumps over the lazy dog"
checksums/hashes/fingerprints to the same value as another particular
block "now is the time for all good men to come to the aid of their
party".  Of _course_ that will be astronomically unlikely, and with
sufficient hand-waving (to quote your article:  "the odds of a hash
collision with two random chunks are roughly
1,461,501,637,330,900,000,000,000,000 times greater than the number of
bytes in the known computing universe") these totally meaningless
numbers can seem important.

They're not.  What _is_ important?  To me, it's important that if I read
back any of the N terabytes of data I might store this week, I get the
same data that was written, not a silently changed version because the
checksum/hash/fingerprint of one block that I wrote collides with
another checksum/hash/fingerprint.  I can NOT have that happen to any
block--in a file clerk's .pst, a directory inode or the finance
database.  "Probably, it won't happen" is not acceptable.

 Let's compare those odds with the odds of an unrecoverable 
 read error on a typical disk--approximately 1 in 100 trillion

Bogus comparison.  In this straw man, that 1/100,000,000,000,000 read
error a) probably doesn't affect anything because of the higher-level
RAID array it's in and b) if it does, there's an error, a
we-could-not-read-this-data, you-can't-proceed, stop, fail,
get-it-from-another-source error

Re: [Veritas-bu] Tapeless backup environments?

2007-09-26 Thread Curtis Preston
Pls read my other post about the odds of this happening.  With a decent
key space, the odds of a hash collision with a 160-bit key space are so
small that any statistician would call them zero.  1 in 2^160.  Do you
know how big that number is?  It's a whole lot bigger than it looks.
And those odds are significantly better than the odds that you would
write a bad block of data to a regular disk drive and never know it.

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-Original Message-
From: bob944 [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 26, 2007 4:03 AM
To: veritas-bu@mailman.eng.auburn.edu
Cc: Curtis Preston
Subject: RE: [Veritas-bu] Tapeless backup environments?

cpreston:
 Simplistically, it checksums the block and looks in a table of
 checksums-of-blocks-that-it-already-stores to see if the identical
 (ahem, anyone see a hole here?) data already lives there.
 
 To what hole do you refer? 

The idea that N bits of data can unambiguously be represented by fewer
than N bits.  Anyone who claims to the contrary might as well knock out
perpetual motion, antigravity and faster-than-light travel while they're
on a roll.

 I see one in your simplistic example, but
 not in what actually happens (which require a much longer technical
 explanation).

Hence my introduction that began with [s]implistically.  But throw in
all the much longer technical explanation you like, any process which
compares a reduction-of-data to another reduction-of-data will sooner or
later return "foo" when what was originally stored was "bar".


cpreston:
 There are no products in the market that rely solely on a checksum to
 identify redundant data.  There are a few that rely solely on 
 a 160-bit
 hash, which is significantly larger than a checksum (typically 12-16

No importa.  The length of the checksum/hash/fingerprint and the
sophistication of its algorithm only affect how frequently--not
whether--the incorrect answer is generated.

 [...] The ability to forcibly create a hash collision means 
 absolutely nothing in the context of deduplication.

Of course it does.  Most examples in the literature concern storing
crafted-data-pattern-A ("pay me one dollar") in order for the data to be
read later as something different ("pay me one million dollars").  It
can't have escaped your attention that every day, some yahoo crafts
another buffer-or-stack overflow exploit; some of them are brilliant.
The notion that the bad guys will never figure out a way to plant a
silent data-change based on checksum/hash/fingerprint collisions is,
IMO, naive.

 What matters is the chance that two
 random chunks would have a hash collision. With a 128-bit and 160-bit
 key space, the odds of that happening are 1 in 2128 with MD5, and 1 in
 2160 with SHA-1. That's 1038 and 1048, respectively. If you 

Grasshopper, the wisdom is not in the numbers, it is in remembering that
HTML will not paste into ASCII well.  But I suspect you mean one in
2^128 or similar.

Those are impressive, and dare I guess, vendor-supplied, numbers.  And
they're meaningless.  We do not care about the odds that a particular
block "the quick brown fox jumps over the lazy dog"
checksums/hashes/fingerprints to the same value as another particular
block "now is the time for all good men to come to the aid of their
party".  Of _course_ that will be astronomically unlikely, and with
sufficient hand-waving (to quote your article:  "the odds of a hash
collision with two random chunks are roughly
1,461,501,637,330,900,000,000,000,000 times greater than the number of
bytes in the known computing universe") these totally meaningless
numbers can seem important.

They're not.  What _is_ important?  To me, it's important that if I read
back any of the N terabytes of data I might store this week, I get the
same data that was written, not a silently changed version because the
checksum/hash/fingerprint of one block that I wrote collides with
another checksum/hash/fingerprint.  I can NOT have that happen to any
block--in a file clerk's .pst, a directory inode or the finance
database.  "Probably, it won't happen" is not acceptable.

 Let's compare those odds with the odds of an unrecoverable 
 read error on a typical disk--approximately 1 in 100 trillion

Bogus comparison.  In this straw man, that 1/100,000,000,000,000 read
error a) probably doesn't affect anything because of the higher-level
RAID array it's in and b) if it does, there's an error, a
we-could-not-read-this-data, you-can't-proceed, stop, fail,
get-it-from-another-source error--NOT a silent changing of the data from
"foo" to "bar" on every read with no indication that it isn't the data that
was written.

 If you want to talk about the odds of something bad happening and not
 knowing it, keep using tape. Everyone who has worked with tape for any
 length of time has experienced a tape drive writing something that it
 then couldn't read.

That's not news, and why we've been making

Re: [Veritas-bu] Tapeless backup environments?

2007-09-26 Thread Chris Freemantle
It's interesting that the probability of any 2 randomly selected hashes
being the same is quoted, rather than the probability that at least 2
out of a whole group are the same. That's probably because the minutely
small chance becomes rather bigger when you consider many hashes. This
will still be small, but I suspect not as reassuringly small.

To illustrate this consider the 'birthday paradox'. How many people do 
you need in a room to have at least a 50% chance that 2 of them have the 
same birthday? The chance of any 2 randomly chosen people sharing the 
same birthday is 1/365 (neglecting leap years). That's quite small, so we
need a lot of people to get a 50% chance, right? Wrong. You need 23 
people. Google for 'birthday paradox' for the simple maths.
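The birthday maths are easy to check for yourself; a short sketch (plain Python, function names are mine) that reproduces the 23-person result and shows the standard large-space approximation that would apply to a hash key space:

```python
from math import exp

def collision_prob(n, space):
    """Exact probability that at least two of n independent draws from
    `space` equally likely values coincide (product form)."""
    p_unique = 1.0
    for i in range(n):
        p_unique *= (space - i) / space
    return 1.0 - p_unique

# Classic birthday paradox: 23 people, 365 days -> just over 50%.
print(round(collision_prob(23, 365), 3))  # ~0.507

def collision_prob_approx(n, space):
    """Approximation 1 - exp(-n(n-1)/2*space), usable when `space` is
    far too large (e.g. 2**160) for the exact product to be practical."""
    return 1.0 - exp(-n * (n - 1) / (2.0 * space))

# For a 160-bit space, even a billion stored blocks leaves the exponent
# around 1e18 / 2**161 -- tiny, but larger than the quoted 1-in-2^160
# figure for a single pair, which is Chris's point.
print(round(collision_prob_approx(23, 365), 3))  # close to the exact value
```

The approximation is what grows with the *square* of the number of blocks stored, which is why "odds for two random chunks" understates the risk across a whole backup store.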

For our data I would certainly not use de-duping, even if it did work 
well on image data.


bob944 wrote:
 cpreston:
 Simplistically, it checksums the block and looks in a table of
 checksums-of-blocks-that-it-already-stores to see if the identical
 (ahem, anyone see a hole here?) data already lives there.
 To what hole do you refer? 
 
 The idea that N bits of data can unambiguously be represented by fewer
 than N bits.  Anyone who claims to the contrary might as well knock out
 perpetual motion, antigravity and faster-than-light travel while they're
 on a roll.
 
 I see one in your simplistic example, but
 not in what actually happens (which require a much longer technical
 explanation).
 
 Hence my introduction that began with [s]implistically.  But throw in
 all the much longer technical explanation you like, any process which
 compares a reduction-of-data to another reduction-of-data will sooner or
 later return "foo" when what was originally stored was "bar".
 
 
 cpreston:
 There are no products in the market that rely solely on a checksum to
 identify redundant data.  There are a few that rely solely on 
 a 160-bit
 hash, which is significantly larger than a checksum (typically 12-16
 
 No importa.  The length of the checksum/hash/fingerprint and the
 sophistication of its algorithm only affect how frequently--not
 whether--the incorrect answer is generated.
 
 [...] The ability to forcibly create a hash collision means 
 absolutely nothing in the context of deduplication.
 
 Of course it does.  Most examples in the literature concern storing
 crafted-data-pattern-A ("pay me one dollar") in order for the data to be
 read later as something different ("pay me one million dollars").  It
 can't have escaped your attention that every day, some yahoo crafts
 another buffer-or-stack overflow exploit; some of them are brilliant.
 The notion that the bad guys will never figure out a way to plant a
 silent data-change based on checksum/hash/fingerprint collisions is,
 IMO, naive.
 
 What matters is the chance that two
 random chunks would have a hash collision. With a 128-bit and 160-bit
 key space, the odds of that happening are 1 in 2128 with MD5, and 1 in
 2160 with SHA-1. That's 1038 and 1048, respectively. If you 
 
 Grasshopper, the wisdom is not in the numbers, it is in remembering that
 HTML will not paste into ASCII well.  But I suspect you mean one in
 2^128 or similar.
 
 Those are impressive, and dare I guess, vendor-supplied, numbers.  And
 they're meaningless.  We do not care about the odds that a particular
 block "the quick brown fox jumps over the lazy dog"
 checksums/hashes/fingerprints to the same value as another particular
 block "now is the time for all good men to come to the aid of their
 party".  Of _course_ that will be astronomically unlikely, and with
 sufficient hand-waving (to quote your article:  "the odds of a hash
 collision with two random chunks are roughly
 1,461,501,637,330,900,000,000,000,000 times greater than the number of
 bytes in the known computing universe") these totally meaningless
 numbers can seem important.
 
 They're not.  What _is_ important?  To me, it's important that if I read
 back any of the N terabytes of data I might store this week, I get the
 same data that was written, not a silently changed version because the
 checksum/hash/fingerprint of one block that I wrote collides with
 another checksum/hash/fingerprint.  I can NOT have that happen to any
 block--in a file clerk's .pst, a directory inode or the finance
 database.  "Probably, it won't happen" is not acceptable.
 
 Let's compare those odds with the odds of an unrecoverable 
 read error on a typical disk--approximately 1 in 100 trillion
 
 Bogus comparison.  In this straw man, that 1/100,000,000,000,000 read
 error a) probably doesn't affect anything because of the higher-level
 RAID array it's in and b) if it does, there's an error, a
 we-could-not-read-this-data, you-can't-proceed, stop, fail,
 get-it-from-another-source error--NOT a silent changing of the data from
 "foo" to "bar" on every read with no indication that it isn't the data that
 was written.
 
 If you want to talk about the odds of something bad happening and not
 knowing it, keep using 

Re: [Veritas-bu] Tapeless backup environments?

2007-09-26 Thread A Darren Dunham
On Wed, Sep 26, 2007 at 04:02:49AM -0400, bob944 wrote:
 Bogus comparison.  In this straw man, that 1/100,000,000,000,000 read
 error a) probably doesn't affect anything because of the higher-level
 RAID array it's in and b) if it does, there's an error, a
 we-could-not-read-this-data, you-can't-proceed, stop, fail,
 get-it-from-another-source error--NOT a silent changing of the data from
 "foo" to "bar" on every read with no indication that it isn't the data that
 was written.

While I find the compare only based on hash a bit annoying for other
reasons, the argument above doesn't convince me.

Disks, controllers, and yes RAID arrays can fail silently in all sorts
of ways by either acknowledging a write that is not done, writing to the
wrong location, reading from the wrong location, or reading blocks where
only some of the data came from the correct location.  Most RAID systems
do not verify data on read to protect against silent data errors on the
storage, only against obvious failures.

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 
___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-26 Thread A Darren Dunham
On Wed, Sep 26, 2007 at 09:58:12AM -0400, Jeff Lightner wrote:
 I also suggest the argument is flawed because it seems to imply that
 only the cksum is stored and not the actual data - it is the original
 compressed data AND the cksum that produce the restore - not the cksum
 alone.

It's not that the actual data isn't stored, it's whether or not the
actual data is checked.  Some algorithms search through the hash space,
and if a hit comes up, they assume that the previously stored data is a
match without a comparison.

The original data must always be stored.  Even if it were possible to
run a hash algorithm in reverse quickly, there would be no way to
determine which of various possible input strings was the original.
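The distinction Darren draws — trusting a digest hit versus confirming it with a byte compare — fits in a few lines. A hypothetical sketch (the function and store are mine, not any vendor's implementation):

```python
import hashlib

store = {}  # digest -> stored block (the dedupe pool)

def dedup_write(block, verify=True):
    """Store `block` once per SHA-1 digest.

    With verify=True, a digest hit is confirmed by comparing the actual
    bytes, so a collision is caught and reported.  With verify=False the
    hit is trusted outright: a colliding new block would silently be
    'deduplicated' against the OLD data, and reads would return the
    wrong bytes -- the failure mode debated in this thread.
    """
    d = hashlib.sha1(block).digest()
    if d in store:
        if verify and store[d] != block:
            raise ValueError("hash collision: full compare failed")
        return d  # duplicate (or merely trusted as one)
    store[d] = block
    return d

d1 = dedup_write(b"foo")
d2 = dedup_write(b"foo")   # second copy deduplicated against the first
assert d1 == d2 and store[d1] == b"foo" and len(store) == 1
```

The verify branch is exactly the "secondary step checks the data" add-on logic bob944 mentions elsewhere in the thread.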

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 
___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-26 Thread A Darren Dunham
On Wed, Sep 26, 2007 at 04:22:01PM +0100, Chris Freemantle wrote:
 For our data I would certainly not use de-duping, even if it did work 
 well on image data.

There are different ways of doing deduplication.  Not all of them rely
on hash signature matching to find redundant data.  You should talk with
a particular vendor and see how they accomplish it.

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 
___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-26 Thread bob944
 Most of this, while well documented, seems to boil down to the same
 alarmist notion that had people trying to ban cell phones in gas
 stations.  The possibility that something untoward COULD 
 happen does NOT
 mean it WILL happen.  To date I don't know of a single gas pump

I can't speak for car fires, but I can speak for
checksums/hashes/fingerprints mapping to more than one set of data.
It's been demonstrated.  It happens.  It _has_ to happen.  It's the way
these data reductions work, and the reason why it's more convenient to
refer to small hashes of data rather than the full data for many
uses--this has been a programming commonplace since the '50s.  But
programmers know it's not a two-way street:  a set of data generates
only one checksum/hash/fingerprint, but one checksum/hash/fingerprint
maps to more than one set of data.  And that's fine, for a program that
takes this into account (either because it doesn't matter to the
program's logic or a secondary step checks the data).  As a trivial
example, reducing three-bit data to a two-bit checksum means that trying
to go backwards will retrieve the wrong three-bit data 50% of the time.
Bigger hashes and more sophisticated algorithms reduce the number of
times you get the wrong data; they don't eliminate it.
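The trivial example above is worth making concrete. Reducing eight 3-bit values to a 2-bit checksum (here modulo 4, chosen for illustration) necessarily doubles values up:

```python
# Reduce each 3-bit value (0..7) to a 2-bit "checksum" (mod 4 here).
buckets = {}
for value in range(8):
    buckets.setdefault(value % 4, []).append(value)

# Pigeonhole: every one of the four checksums maps back to two distinct
# 3-bit values, so reversing the reduction picks the wrong original
# half the time -- exactly the 50% figure in the text.
for checksum, values in buckets.items():
    assert len(values) == 2
print(buckets)  # {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
```

A bigger hash just makes the buckets astronomically sparse; it cannot make them singletons.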

 If odds are so important it seems it would be important to worry about
 the odds that your data center, your offsite storage location and your
 Disaster Recovery site will all be taken out at the same time.

And if it's not important that the data you read may not be what was
written, don't let me stop you.  _The odds are_ that it'll be okay.  

 I also suggest the argument is flawed because it seems to imply that
 only the cksum is stored and not the actual data - it is the original
 compressed data AND the cksum that produce the restore -
 not the cksum alone.

If I get your meaning, you have an incorrect understanding of the
argument--nobody is talking about generating the original data from a
checksum.  As I said in what you quoted (trimmed here), every unique (as
determined by the implementation) block of data gets stored, once.  A
data stream is stored as a list of pointers or
checksums/hashes/fingerprints which refer to those common-storage
blocks.  Any number of data streams will point to the same block
when they have it in common, and as many times as that block occurs in
their data stream.  To read the data stream later, the list of pointers
tells the implementation what blocks to retrieve and send back to the
file reader.  Now, if foo and bar both reduced to the same
checksum/hash/fingerprint when stored, somebody is going to receive the
wrong data when the stream(s) that had those data are read.  So sorry
about that corrupted payroll master file...
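The storage model described above — unique blocks stored once, each data stream kept as a list of references into the common pool — can be sketched as follows (an illustrative toy, not any product's design; block size is arbitrary):

```python
import hashlib

blocks = {}  # common storage: digest -> block bytes

def store_stream(data, blocksize=4):
    """Split `data` into fixed-size blocks, store each unique block
    once, and return the stream as a list of digests (the 'pointer
    list' the post describes)."""
    pointers = []
    for i in range(0, len(data), blocksize):
        chunk = data[i:i + blocksize]
        d = hashlib.sha1(chunk).digest()
        # First writer wins: if two different chunks ever shared a
        # digest, later streams would silently point at the OLD chunk.
        blocks.setdefault(d, chunk)
        pointers.append(d)
    return pointers

def read_stream(pointers):
    """Reassemble a stream by dereferencing its pointer list."""
    return b"".join(blocks[d] for d in pointers)

ptrs = store_stream(b"abcdabcdabcd")  # three pointers, one unique block
assert read_stream(ptrs) == b"abcdabcdabcd"
assert len(set(ptrs)) == 1 and len(blocks) == 1
```

The `setdefault` line is where the payroll-file scenario lives: nothing in this naive version ever compares the new chunk's bytes against the stored one.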


___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-26 Thread bob944
 Pls read my other post about the odds of this happening.  
 With a decent
 key space, the odds of a hash collision with a 160-bit key 
 space are so
 small that any statistician would call them zero.  1 in 2^160.  Do you
 know how big that number is?  It's a whole lot bigger than it looks.
 And those odds are significantly better than the odds that you would
 write a bad block of data to a regular disk drive and never know it.

I did read your other post, and addressed your numbers.  C Freemantle
makes the same point I do, perhaps more clearly, in his birthday
paradox posting.


___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-26 Thread bob944
 On Wed, Sep 26, 2007 at 04:02:49AM -0400, bob944 wrote:
  Bogus comparison.  In this straw man, that 
  1/100,000,000,000,000 read error a) probably doesn't
  affect anything because of the higher-level RAID array
  it's in and b) if it does, there's an error, a
  we-could-not-read-this-data, you-can't-proceed, stop,
  fail, get-it-from-another-source error--NOT a silent
  changing of the data from foo to bar on every read
  with no indication that it isn't the data that
  was written.
 
 While I find the compare only based on hash a bit annoying
 for other reasons, the argument above doesn't convince me.
 
 Disks, controllers, and yes RAID arrays can fail silently in
 all sorts of ways by either acknowledging a write that is not
 done, writing to the wrong location, reading from the wrong
 location, or reading blocks where only some of the data came
 from the correct location.  Most RAID systems do not verify
 data on read to protect against silent data errors on the
 storage, only against obvious failures.

Perhaps anything can have a failure mode where it doesn't alert--but in
a previous lifetime in hardware and some design, I saw only one
undetected data transformation that did not crash or in some way cause
obvious problems (intermittent gate in a mainframe adder that didn't
affect any instructions used by the OS).  

I don't remember a disk that didn't maintain, compare and _use for error
detection_, the cylinder, head and sector numbers in the format.  

The write frailties mentioned, if they occur, will fail on read.  And
the read frailties mentioned will generally (homage paid to the
mainframe example I cited as the _only_ one I ever saw that didn't)
cause enough mayhem that apps or data or systems go belly-up in a big
way, fast.  

These events, like double-bit parity errors or EDAC failures, involve
1.  that something breaks in the first place
2.  that it not be reported
3.  that the effects are so subtle that they are unnoticed (the app or
system doesn't crash, the data aren't wildly corrupted, ...)

The problem with checksumming/hashing/fingerprinting is that the
methodology has unavoidable errors designed in, and an implementation
with no add-on logic to prevent or detect them will silently corrupt
data.  That's totally different, IMO.


___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread Curtis Preston
And yet there are many companies backing up well beyond a Terabyte from
remote offices back to their central office using de-duplication.
Consider JPMC's presentation at the last Vision.  They're backing up
over 200 remote offices using Puredisk, a de-duplication backup product.
I don't remember the exact numbers, but many of them were quite large.

 

I don't think that bandwidth is free, but neither are trucks.  AND if
you're going the truck route, make sure you add the cost and risk of an
encryption system to the mix.

 

---

W. Curtis Preston

Backup Blog @ www.backupcentral.com

VP Data Protection, GlassHouse Technologies



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ed Wilts
Sent: Saturday, September 22, 2007 9:35 AM
To: 'Jeff Lightner'; veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

 

Here's some simple math that may help (compliments of ExaGrid's web
site).

 

If you have 1TB of data with a 2% change rate, you'll need to back up
20GB of daily incrementals.  To replicate this to another site in 18
hours requires 3Mbps of bandwidth.  If you have lots of bandwidth or not
too much data, replication to an offsite location may make sense.  But
to think that you can replicate your backups for 20TB of data to another
state is going to make your network group squirm.  Iron Mountain looks
pretty cheap compared to offsite electronic replication.
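The arithmetic behind those figures checks out; a quick sanity check of the 20GB-in-18-hours number:

```python
# 2% daily change on 1 TB -> 20 GB of incrementals to replicate
# within an 18-hour window.
gigabytes = 0.02 * 1000          # 20 GB
bits = gigabytes * 1e9 * 8       # 1.6e11 bits to move
seconds = 18 * 3600              # the 18-hour window
mbps = bits / seconds / 1e6
print(round(mbps, 1))            # ~2.5, i.e. roughly the quoted 3 Mbps
```

Scale the same sum to 20TB of data and the pipe requirement grows twenty-fold, which is the "network group squirm" point.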

 

We have 1 application by itself that adds 30GB of new data every day.
It's being replicated within the metro area over a 1Gbps pipe (real
time, not via backups).  We sure couldn't replicate everything...

 

As the OLD saying goes, never underestimate the bandwidth of a station
wagon full of tapes.

 

   .../Ed

 

--

Ed Wilts, RHCE, BCFP, BCSD

Mounds View, MN, USA

mailto:[EMAIL PROTECTED]

 

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jeff
Lightner
Sent: Friday, September 21, 2007 8:44 AM
To: veritas-bu@mailman.eng.auburn.edu
Subject: [Veritas-bu] Tapeless backup environments?

 

Yesterday our director said that he doesn't intend to ever upgrade
existing STK L700 because eventually we'll go tapeless as that is what
the industry is doing.   The idea being we'd have our disk backup
devices here (e.g. Data Domain) and transfer to another disk device at
an offsite storage location, eliminating the need for ever transporting
tapes.

It made me wonder if anyone was actually doing the above already or was
planning to do so?

___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread Curtis Preston
A 1 TB array that can store 20 TB of de-duped data in it will cost about
$20K.  (A general rule of thumb is to base your pricing on a 20:1 de-dupe
ratio, then price it at about $1/GB of effective storage.  If you do
that, you'll be close to list price of a lot of products.)  At that
cost, it's very close to the price of a tape library fully populated
with tapes and drives.
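That rule of thumb reduces to simple arithmetic (the figures are the post's own, not a vendor quote):

```python
# Rule of thumb from the post: assume 20:1 de-dupe, then price the box
# at about $1 per GB of *effective* (post-dedupe) capacity.
raw_tb = 1
dedupe_ratio = 20
effective_gb = raw_tb * 1000 * dedupe_ratio   # 20,000 GB effective
list_price = effective_gb * 1.0               # $1/GB -> about $20K
print(int(list_price))  # -> 20000
```

The negotiation point later in the post follows directly: at only 10:1, effective capacity halves, so a price anchored to effective GB should halve too.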

As to whether or not it's worth it for a given setup, you should
obviously test it vs the pricing, but it's very uncommon for it to not
make sense financially.  I can think of three setups that are known
issues:

1. If you're using it for disk staging and not storing any retention on
it.  A lot of the de-dupe comes from de-duping full backups against each
other.

2. If you're trying to de-dupe non-dedupe-able things, such as seismic
data, medical imaging data, or any other data types that are
automatically created by a computer (as opposed to database entries and
Word docs.)

3. If your backup product doesn't do full backups of filesystem data,
you will not get as much as other people.

Everything is also negotiable.  If you've tested and you're not getting
the advertised de-dupe ratio, use that in the negotiation stage.  If
they generally advertise 20:1 and you're only getting 10:1, it would
seem reasonable to assume a 50% discount.

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-Original Message-
From: Ed Wilts [mailto:[EMAIL PROTECTED] 
Sent: Saturday, September 22, 2007 9:47 AM
To: Curtis Preston; 'Justin Piszcz'; 'Jeff Lightner'
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: RE: [Veritas-bu] Tapeless backup environments?

But Curtis, a disk drive by itself isn't very useful either - you'll need
a controller or two.

And don't forget to factor in the price of the de-duplication appliances or
software.  Those suckers are *NOT* cheap.  An appliance to support 1TB of
compressed data lists out at about $20K.  Unless you get a *lot* of
de-duplication - and not everybody does - that appliance is going to get
killed on price compared to not de-duping it.

It took me only 30 minutes with a de-dupe vendor last week to eliminate
their product from consideration in our environment.

.../Ed

--
Ed Wilts, RHCE, BCFP, BCSD
Mounds View, MN, USA
mailto:[EMAIL PROTECTED]


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:veritas-bu-
 [EMAIL PROTECTED] On Behalf Of Curtis Preston
 Sent: Friday, September 21, 2007 12:10 PM
 To: Justin Piszcz; Jeff Lightner
 Cc: veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] Tapeless backup environments?
 
 First, you can't compare the cost of disk and tape directly like that.
 You have to include the drives and robots.  A drive by itself is
 useful;
 a tape by itself is not.
 
 Setting that aside, if I put that disk in a system that's doing 20:1
 de-duplication, my cost is now 1.65c/GB vs your 3-9c/GB.
 
 ---
 W. Curtis Preston
 Backup Blog @ www.backupcentral.com
 VP Data Protection, GlassHouse Technologies
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Justin
 Piszcz
 Sent: Friday, September 21, 2007 7:36 AM
 To: Jeff Lightner
 Cc: veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] Tapeless backup environments?
 
 
 I believe disks are 33c/gigabyte and tapes are 3-9cents/gigabyte or even
 cheaper, I do not remember the exact figures, but someone I know has done
 a cost analysis and tapes were by far cheaper.  Also something that nobody
 calculates is the cost of power to keep disks spinning.
 
 Justin.
 
 On Fri, 21 Sep 2007, Jeff Lightner wrote:
 
  Disk is not cheaper?  You've done a cost analysis?
 
  Not saying you're wrong and I haven't done an analysis but I'd be
  surprised if disks didn't actually work out to be cheaper over time:
 
  1) Tapes age/break - We buy on average several hundred tapes a year -
  support on a disk array for failing disks may or may not be more
  expensive.
 
  2) Transport/storage - We have to pay for offsite storage and transfer -
  it seems just putting an array in offsite facility would eliminate the
  need for transportation (in trucks) cost.  Of course there would be cost
  in the data transfer disk to disk but since everyone seems to have
  connectivity over the internet it might be possible to do this using a
  B2B link rather than via dedicated circuits.
 
  3) Labor cost in dealing with mechanical failures of robots.   This one
  is hidden in salary but every time I have to work on a robot it means I
  can't be working on something else.   While disk drives fail it doesn't
  seem to happen nearly as often as having to fish a tape out of a drive
  or the tape drive itself having failed.
 
 
  -Original Message-
  From: Justin Piszcz [mailto:[EMAIL PROTECTED]
  Sent: Friday, September 21, 2007 10:08 AM
  To: Jeff Lightner
  Cc: veritas-bu@mailman.eng.auburn.edu
  Subject: Re: [Veritas-bu

Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread Curtis Preston
Ed Wilts said:

1)  Disk ages and breaks too.  

But with RAID, no longer will the failure of a piece of media cause a
backup or restore failure.

2)  Transport is cheap.  I'd be surprised if I couldn't transport a
thousand tapes for the cost of a terabyte of storage.  Bandwidth to
move data is *NOT* cheap.  20GB/day requires 3Mbps of pipe.

I've done a number of cost comparisons lately, and you're right.  It's
not cheap, but it's not astronomical either.  And you need to weigh that
cost against not having the risk of a lost tape and all the
multi-million dollar costs that come along with that these days.

3)  I spend more time replacing disk drives than I do replacing tapes
or
tape drives.   To back up my 1200 SAN-based spindles, I have 6 LTO-3
drives.

You have 200 times more disk drives than you have tape drives.  Of
course you spend more time replacing them.  But those drive failures
never have to cause backup or restore failures, as tape/drive failures
do.  Try having a few hundred tape drives and see how your life changes.
I have a customer with 100 drives and their tape drive vendor is in once
a week swapping something, and each one of those swaps is associated
with a backup or restore failure.


___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread Paul Keating
I currently back up 9TB of data to VTL during a FULL window which writes
~100GB of data to the VTL repository in that window.
 
Another state is one thing, but across town via DWDM is no prob.
 
out of state is handled by duping that data to phys tape... wouldn't
want to dupe disk outside of a DWDM connection.
 
Paul
 
-- 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ed Wilts
Sent: September 22, 2007 9:35 AM
To: 'Jeff Lightner'; veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?



Here's some simple math that may help (compliments of ExaGrid's
web site).

 

If you have 1TB of data with a 2% change rate, you'll need to
back up 20GB of daily incrementals.  To replicate this to another site
in 18 hours requires 3Mbps of bandwidth.  If you have lots of bandwidth
or not too much data, replication to an offsite location may make sense.
But thinking you can replicate your backups for 20TB of data to another
state is going to make your network group squirm.  Iron Mountain looks
pretty cheap compared with offsite electronic replication.

 

We have 1 application by itself that adds 30GB of new data every
day.  It's being replicated within the metro area over a 1Gbps pipe
(real time, not via backups).  We sure couldn't replicate everything...

 

As the old saying goes, never underestimate the bandwidth of a
station wagon full of tapes.

 

   .../Ed



La version française suit le texte anglais.



This email may contain privileged and/or confidential information, and the Bank
of Canada does not waive any related rights. Any distribution, use, or copying
of this email or the information it contains by other than the intended
recipient is unauthorized. If you received this email in error please delete it
immediately from your system and notify the sender promptly by email that you
have done so.



Le présent courriel peut contenir de l'information privilégiée ou confidentielle.
La Banque du Canada ne renonce pas aux droits qui s'y rapportent. Toute diffusion,
utilisation ou copie de ce courriel ou des renseignements qu'il contient par une
personne autre que le ou les destinataires désignés est interdite. Si vous recevez
ce courriel par erreur, veuillez le supprimer immédiatement et envoyer sans délai à
l'expéditeur un message électronique pour l'aviser que vous avez éliminé de votre
ordinateur toute copie du courriel reçu.
___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread Paul Keating

With a VTL there is no need to multiplex.

Instead of writing 8 streams to 1 drive, just create 8 virtual drives
and don't multiplex.

It's not because of a performance issue; it's an advantage of
virtualization.

As far as performance goes, with a disk-as-disk config, to create a
high-performance target you would need to create HLUNs striped over many,
many LUNs on your array, or present LUNs that are stripes of segments
of many RAID groups.

Many VTLs (the one I'm using, for instance) distribute the writes over
many LUNs. I'm currently writing dozens of simultaneous jobs distributed
over 28 separate LUNs.

The data-reduction (compression) throughput I'm getting with the VTL is
definitely better, on a per-client-job basis, than I was getting with
MPX'ed jobs going to LTO-2.

Offsite is SUPER easy... we replicate our LUNs containing the de-duped
data to our DR site.
To bring up the other site, once the DR LUNs are made R/W, we just start
the daemons on the DR VTL and away we go.
The devices are available there as they were at head office.

We don't even need to rediscover devices on the NBU servers.

Vault works great for spinning off copies to Physical tapes, if
necessary.

Paul


-- 


 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf 
 Of Clem Kruger
 Sent: September 22, 2007 5:12 AM
 To: Jeff Lightner; Justin Piszcz
 Cc: veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] Tapeless backup environments?
 
 Compression on a VTL is done by the operating system (normally Linux),
 which we all know is a slow process, and is therefore not recommended.
 Your VTL supplier will also recommend that you do not multistream, as
 this also slows down the process.



___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread Dave Markham
Guys, I've just read this thread and can say I'm very interested in it.
The first thing is I learned a new term, deduplication, which I
didn't know existed.

Question: I gather deduplication uses other software. DataDomain, I
think, I saw mentioned. Where does this fit in with NetBackup, and does
the software reside on every client or just on a server somewhere?

OK, so I'm trying to kit-refresh a backup environment for a customer
which has 2 sites, production and DR, about 200 miles apart. There is a
link between the sites, but the customer will probably frown on increased
bandwidth charges to transfer backup data across for disaster-recovery
purposes.

Data is probably only 1TB for the site, with perhaps 70% of it required
to be transferred daily to offsite media.

Currently I use tape, and I was just speccing a new tape system, as I
thought that with disk-based backups, and retentions of weekly/monthly
backups lasting say 6 weeks, I'm going to need a LOT of disk, plus the
bandwidth transfer costs to the DR site.

LTO3 tapes are storing 200GB a tape, which is pretty good compared to
disk, I thought.

I guess in my set-up it's a trade-off between:

Initial cost of a disk array vs initial cost of a tape library, drives and media

Time taken to back up (the network will be the bottleneck here; still on a
100Meg LAN, with just 2 DB servers using Gigabit LAN to the backup server)

Offsite transfer of tapes daily to an offsite location vs cost of increased
bandwidth between sites to transfer backup data.


I'm now confused what to propose :)



___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread Justin Piszcz


On Mon, 24 Sep 2007, Dave Markham wrote:

 [snip]
 LTO3 tapes are storing 200gb a tape which is pretty good compared to
 disk i thought.
LTO-3 is 400GB native (800GB compressed).



___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread Jeff Lightner
Data Domain makes a hardware storage device (disks) that does
deduplication.  Rather than backing up block for block every time, it
does that only for the first backup.  For subsequent backups, rather than
doing an incremental at the file level, it backs up incrementally at the
block level, meaning only the blocks that changed in the source are
stored on the target.

This is especially beneficial for things like databases on filesystems,
where the datafile gets updated on any write to the datafile.   A
standard file-level incremental would back up the entire datafile, but a
deduplication incremental backs up only the blocks modified within
the datafile.   One can get what appears to be a very high level of
compression on the deduplication storage.   I've seen numbers like 20:1,
and one person on this list last year even claimed something like 80:1,
though that wouldn't be typical.

Data Domain isn't the only deduplication company out there, and we
haven't yet implemented the ones we bought (though we will before the
end of October).   I was contacted off-list by another company called
Sepaton, but their solution seemed to require a one-to-one correspondence
between original storage and target storage.  I believe there is at
least one other company doing deduplication, but I don't recall who
(FalconStor, maybe?).
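A toy sketch of the block-level idea described above: store each fixed-size block once, keyed by a hash, and keep a per-backup list of block references. The 4KB block size and SHA-256 are illustrative choices here, not any vendor's actual parameters.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size, not a vendor figure

class DedupStore:
    def __init__(self):
        self.blocks = {}          # hash -> block bytes, each stored once

    def backup(self, data: bytes) -> list:
        """Split data into blocks; store only blocks not already present."""
        refs = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            h = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(h, block)   # new unique blocks consume space
            refs.append(h)
        return refs

    def restore(self, refs: list) -> bytes:
        """Reassemble a backup from its list of block references."""
        return b"".join(self.blocks[h] for h in refs)

store = DedupStore()
full = bytes(1024 * 1024)             # 1 MiB "datafile" of zeros
refs1 = store.backup(full)            # first full backup
changed = bytearray(full)
changed[0:10] = b"X" * 10             # modify a few bytes in one block
refs2 = store.backup(bytes(changed))  # second backup stores only 1 new block
print(len(store.blocks))              # prints 2 (unique blocks stored)
```

This is also why modified database datafiles dedupe so well: the second "full" backup adds only the handful of blocks that actually changed.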




--

CONFIDENTIALITY NOTICE: This e-mail may contain privileged or confidential 
information and is for the sole use of the intended recipient(s). If you are 
not the intended recipient, any disclosure, copying, distribution, or use of 
the contents of this information is prohibited and may be unlawful. If you have 
received this electronic transmission in error, please reply immediately to the 
sender that you have received the message in error, and delete it. Thank you.

--



___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread Schaefer, Harry
For on-demand database backups, I had great success setting up a simple
SATA-based DSU visible to one of the media servers. It had a Vault
policy to duplicate it to tape after 4-5 days, then expire the DSU
image. It worked out great for Informix onbar log dumps especially...

Harry S.
Atlanta


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Justin
Piszcz
Sent: Saturday, September 22, 2007 10:28 AM
To: Ed Wilts
Cc: veritas-bu@mailman.eng.auburn.edu; 'Jeff Lightner'
Subject: Re: [Veritas-bu] Tapeless backup environments?

Don't even get me started on SANs. I have seen the entire loss of an MTI
(now EMC) SAN, and with the new CLARiiON SANs I have seen entire shelves
go off-line due to bad SPAs etc. IMO, not reliable.

Also, with disk I have a question about VTLs: if I am feeding multiple
LTO-3 tape drives using 10Gbps, what type of disk/VTL (not SAN) is out
there that can accept multiple 10Gbps streams of data and will not choke?

VTLs seem like a good idea for filesystem backups but for on-demand 
database backups, I do not see them as the holy grail.

Justin.
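
A rough sketch of the feeding problem behind the question above; the 80 MB/s native LTO-3 rate and the 2:1 compression figure are assumptions for illustration, not numbers from this thread:

```python
# How many LTO-3 drives can a single 10 Gbps feed keep streaming?
# 80 MB/s native drive speed and 2:1 compressible data are assumed figures.
LINK_GBPS = 10
LTO3_NATIVE_MB_S = 80

link_mb_s = LINK_GBPS * 1000 / 8                   # ~1250 MB/s payload, best case
drives_native = link_mb_s / LTO3_NATIVE_MB_S       # drives fed at native speed
drives_2to1 = link_mb_s / (LTO3_NATIVE_MB_S * 2)   # each drive ingests ~160 MB/s

print(f"~{drives_native:.0f} drives at native speed, "
      f"~{drives_2to1:.0f} with 2:1 compression")
```

The point either way: a drive that isn't fed fast enough falls out of streaming mode, while a disk or VTL target turns the same question into one of aggregate spindle throughput.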

  On Sat, 22 Sep 2007, Ed Wilts wrote:

 1)  Disk ages and breaks too.
 2)  Transport is cheap.  I'd be surprised if I couldn't transport a
thousand
 tapes for the cost of a terabyte of storage.  Bandwidth to move data
is
 *NOT* cheap.  20GB/day requires 3Mbps of pipe.
 3)  I spend more time replacing disk drives than I do replacing tapes
or
 tape drives.   To back up my 1200 SAN-based spindles, I have 6 LTO-3
drives.
 It sounds like you need to either replace your tape drives or treat
them
 better.  We do work on our robots perhaps once every few months.  We
replace
 disk drives on a weekly basis.  NetBackup requires a *lot* more time
than
 the robots or the disk drives ever will.

   .../Ed

 --
 Ed Wilts, RHCE, BCFP, BCSD
 Mounds View, MN, USA
 mailto:[EMAIL PROTECTED]

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:veritas-bu-
 [EMAIL PROTECTED] On Behalf Of Jeff Lightner
 Sent: Friday, September 21, 2007 9:34 AM
 To: Justin Piszcz
 Cc: veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] Tapeless backup environments?

 Disk is not cheaper?  You've done a cost analysis?

 Not saying you're wrong and I haven't done an analysis but I'd be
 surprised if disks didn't actually work out to be cheaper over time:

 1) Tapes age/break - We buy on average several hundred tapes a year -
 support on a disk array for failing disks may or may not be more
 expensive.

 2) Transport/storage - We have to pay for offsite storage and
transfer
 -
 it seems just putting an array in offsite facility would eliminate
the
 need for transportation (in trucks) cost.  Of course there would be
 cost
 in the data transfer disk to disk but since everyone seems to have
 connectivity over the internet it might be possible to do this using
a
 B2B link rather than via dedicated circuits.

 3) Labor cost in dealing with mechanical failures of robots.   This
one
 is hidden in salary but every time I have to work on a robot it means
I
 can't be working on something else.   While disk drives fail it
doesn't
 seem to happen nearly as often as having to fish a tape out of a
drive
 or the tape drive itself having failed.


 -Original Message-
 From: Justin Piszcz [mailto:[EMAIL PROTECTED]
 Sent: Friday, September 21, 2007 10:08 AM
 To: Jeff Lightner
 Cc: veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] Tapeless backup environments?



 On Fri, 21 Sep 2007, Jeff Lightner wrote:

 Yesterday our director said that he doesn't intend to ever upgrade
 existing STK L700 because eventually we'll go tapeless as that is
 what
 the industry is doing.   The idea being we'd have our disk backup
 devices here (e.g. Data Domain) and transfer to offsite storage to
 another disk device so as to eliminate the need for ever
transporting
 tapes.

 It made me wonder if anyone was actually doing the above already or
 was
 planning to do so?


 That seems to be the way people are 'thinking', but the bottom line is
 disk still is not cheaper than LTO-3 tape and there are a lot of
 advantages
 to
 tape; however, convincing management of this is an uphill battle.

 Justin.

___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu



Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread Paul Keating
There are several.
FalconStor, Diligent, Quantum, and Sepaton, I believe, will all present
a tape device to NDMP and provide de-dupe on the back end.

Paul

-- 


 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf 
 Of Jim Horalek
 Sent: September 24, 2007 12:43 PM
 To: veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] Tapeless backup environments?
 
 
 On a similar note, how does NDMP play with disk de-dupe? All of
 the de-dupe boxes
 I've seen are NAS devices. NDMP only talks to tape or VTL.
 Are there VTLs
 with de-dupe that would solve the NDMP problem?
 
 Jim



___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread Jim Horalek
On a similar note, how does NDMP play with disk de-dupe? All of the de-dupe
boxes I've seen are NAS devices. NDMP only talks to tape or VTL. Are there
VTLs with de-dupe that would solve the NDMP problem?

Jim



___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread Clem Kruger
Hi Dave,

Yes, it is a difficult decision. I have looked at DataDomain with
NetBackup, and I have found that the backups are faster and a vast
amount of disk is being saved.

NetBackup 6.5 includes de-duplication, and I have become a great friend
of it. To use the words of a supplier: Saving me Time, Saving me Space
and Saving me Money :)


Kind Regards,
Clem Kruger

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Dave
Markham
Sent: 24 September 2007 17:35 PM
To: Jeff Lightner
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

Guys i've just read this thread and can say im very interested in it.
The first thing is i learned a new term called deduplication which i
didn't know existed.

Question : I gather Deduplication is using other software. DataDomain i
think i saw mentioned. Where does this fit in with Netbackup and does
the software reside on every client or just a server somewhere?

Ok, so im trying to kit refresh a backup environment for a customer
which has 2 sites. Production and DR about 200 miles apart. There is a
link between the sites but the customer will probably frown on increased
bandwidth charges to transfer backup data across for DisasterRecovery
purposes.

Data is probably only 1 TB for the site with perhaps 70% being required
to be transfered daily to offsite media.

Currently i use tape and i was just speccing a new tape system as i
thought by using disk based backups, and retentions of weekly/monthly
backups lasting say 6 weeks, im going to need a LOT of disk, plus the
bandwidth transfer costs to DR site

LTO3 tapes are storing 200gb a tape which is pretty good compared to
disk i thought.

I guess in my set up its a trade off between :-

Initial cost of disk array vs initial cost of tape library, drives and
media

Time take to backup ( network will be bottle neck here. Still on 100Meg
lan with just 2 DB servers using GigaBit lan to backup server.

Offsite transfer of tapes daily to offsite location vs Cost of increased
bandwith between sites to transfer backup data.


Im now confused what to propose :)



___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu

___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread Justin Piszcz
Do you need a special license for 6.5, or can those with 6.0 licenses
upgrade?  I assume you need to open a case with NetBackup support to get
the download links?

Justin.


___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread Clem Kruger
I am not quite sure how it is done there. I would contact Symantec in
your area and ask how they will manage your license.

Kind Regards,
Clem Kruger



___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread Curtis Preston
Dave,

Dude, you've got to get out more. ;)  I'd recommend continually perusing
some of these sites to stay current on what's going on in the industry.
De-dupe is about the most-mentioned topic in the storage industry
since I don't know what.

http://www.searchstorage.com
http://www.byteandswitch.com
http://www.infostoremag.com
http://www.isit.com/IndexSTO.cfm
http://www.backupcentral.com (My blog)

On my blog I've got a series of entries that talk about de-duplication,
starting with this one, "What is De-duplication?"  I tried to link all
the de-dupe entries together, so that each entry has a forwarding link
to the next blog entry in the series:
http://www.backupcentral.com/content/view/58/47/

Your question about where de-dupe resides is answered in this entry,
"Two different types of de-dupe":

http://www.backupcentral.com/content/view/129/47/

We've got directories of both types:
Hardware/Target: http://tinyurl.com/384528
Software/Source: http://tinyurl.com/2dtvh2

(I use TinyUrl.com because the URLs are very long and tend to get
truncated in email.  BTW, tinyurl uses de-duplication-like techniques,
as they run an algorithm against the string to give you a smaller
string.  Then when you click on that string, they restore the original
URL to your browser.  Kind of cool.)

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Dave
Markham
Sent: Monday, September 24, 2007 11:35 AM
To: Jeff Lightner
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

Guys i've just read this thread and can say im very interested in it.
The first thing is i learned a new term called deduplication which i
didn't know existed.

Question : I gather Deduplication is using other software. DataDomain i
think i saw mentioned. Where does this fit in with Netbackup and does
the software reside on every client or just a server somewhere?

Ok, so I'm trying to kit-refresh a backup environment for a customer
which has 2 sites, production and DR, about 200 miles apart. There is a
link between the sites but the customer will probably frown on increased
bandwidth charges to transfer backup data across for disaster recovery
purposes.

Data is probably only 1 TB for the site, with perhaps 70% being required
to be transferred daily to offsite media.

Currently I use tape, and I was just speccing a new tape system, as I
thought by using disk-based backups, with retentions of weekly/monthly
backups lasting say 6 weeks, I'm going to need a LOT of disk, plus the
bandwidth transfer costs to the DR site.

LTO3 tapes are storing 200GB a tape, which is pretty good compared to
disk, I thought.

I guess in my set up it's a trade off between:

Initial cost of disk array vs initial cost of tape library, drives and
media

Time taken to back up (network will be the bottleneck here; still on
100Meg LAN with just 2 DB servers using Gigabit LAN to the backup server).

Offsite transfer of tapes daily to offsite location vs cost of increased
bandwidth between sites to transfer backup data.


I'm now confused what to propose :)



___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu



Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread bob944
 Question : I gather Deduplication is using other software. 
 DataDomain i
 think i saw mentioned. Where does this fit in with Netbackup and does
 the software reside on every client or just a server somewhere?

In the technologies I'm familiar with--one of them is old, another new--
it's conceptually simple.  The system, whether that's a standalone
system or a box of disk with some smarts or an agent on the backup
client, receives data and examines it in blocks of some size (AFAIK,
always way larger than a 512-byte disk block).  Simplistically, it
checksums the block and looks in a table of
checksums-of-blocks-that-it-already-stores to see if the identical
(ahem, anyone see a hole here?) data already lives there.  If so, the
data can be tossed away and the checksum kept.  The file is stored as
a collection of these checksums (imprecise term, but works for the
example) or a list of pointers to the single instance (hence the SIS
term can be overloaded here) of the data represented by that checksum.
A simplistic example would be storing a TB of zeros.  Deduplicating
devices would store the first block of zeros, then find that all the
rest of them were the same checksum, same data and just store one more
pointer.  That 1TB file becomes, say, one real instance of 512KB of
zeros (if that is the block size) plus the space for a few million
pointers to the same 512KB of data.  Obviously, even this could be
compressed but that's another story.
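bob944's checksum-and-pointer scheme is easy to sketch. Below is a toy Python illustration of the zeros example above (an illustration only: real products use stronger hashing, variable-size chunking and persistent on-disk indexes; the 512KB block size and the `DedupeStore` name are just assumptions taken from the example):

```python
import hashlib

BLOCK_SIZE = 512 * 1024  # the 512KB block size from the example above

class DedupeStore:
    """Toy single-instance store: keeps one copy per unique block."""

    def __init__(self):
        self.blocks = {}  # digest -> the single stored instance of that block

    def write(self, data):
        """Chunk data into fixed-size blocks and return the 'file' as a
        list of digests; blocks already seen are not stored again."""
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha1(block).hexdigest()
            self.blocks.setdefault(digest, block)  # store only if new
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        """Rebuild the original data from the list of digest pointers."""
        return b"".join(self.blocks[d] for d in recipe)

store = DedupeStore()
zeros = b"\x00" * (4 * BLOCK_SIZE)  # small stand-in for the 1TB-of-zeros case
recipe = store.write(zeros)
print(len(recipe), len(store.blocks))  # 4 pointers, but only 1 stored block
assert store.read(recipe) == zeros
```

The hole bob944 hints at is visible here: `write` trusts the digest alone, so two different blocks that happened to share a digest would silently alias each other.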

Backing up the same system with few changes would be a very small full
backup.  Backing up many instances of, say, the C drive of w2k3 systems
will deduplicate like crazy.  Backing up a million different JPEGs
wouldn't save any appreciable space, but backing them up twice, or
multiple instances of the same JPEG, would.

 LTO3 tapes are storing 200gb a tape which is pretty good compared to
 disk i thought.

But that's a horrible number for LTO3.  Either your tapes aren't full or
something is broken.  Look at the available_media report to get a good
idea of the range of data stored on your FULL tapes.


___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread Curtis Preston
Simplistically, it checksums the block and looks in a table of
checksums-of-blocks-that-it-already-stores to see if the identical
(ahem, anyone see a hole here?) data already lives there.  

To what hole do you refer?  I see one in your simplistic example, but
not in what actually happens (which requires a much longer technical
explanation).

___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread A Darren Dunham
On Mon, Sep 24, 2007 at 05:08:31PM -0400, bob944 wrote:
 In the technologies I'm familiar with--one of them is old, another new,
 it's conceptually simple.  The system, whether that's a standalone
 system or a box of disk with some smarts or an agent on the backup
 client, receives data and examines it in blocks of some size (AFAIK,
 always way larger than a 512-byte disk block).  Simplistically, it
 checksums the block and looks in a table of
 checksums-of-blocks-that-it-already-stores to see if the identical
 (ahem, anyone see a hole here?) data already lives there.

Yes, there's a hole there if that's all you're relying on.  Not all of
them do that.

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 
___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread Curtis Preston
There are no products in the market that rely solely on a checksum to
identify redundant data.  There are a few that rely solely on a 160-bit
hash, which is significantly larger than a checksum (typically 12-16
bits).  There are some who are concerned about hash collisions in this
scenario.  I am not one of those people.  Here is a quote from an
article I wrote.  The entire article is available here:

http://tinyurl.com/2j7r52

quote
Hash collisions occur when two different chunks produce the same hash.
It's widely acknowledged in cryptographic circles that a determined
hacker could create two blocks of data that would have the same MD5
hash. If a hacker could do that, they might be able to create a fake
cryptographic signature. That's why many security experts are turning to
SHA-1. Its bigger key space makes it much more difficult for a hacker to
crack. However, at least one group has already been credited with
creating a hash collision with SHA-1.

The ability to forcibly create a hash collision means absolutely nothing
in the context of deduplication. What matters is the chance that two
random chunks would have a hash collision. With a 128-bit and 160-bit
key space, the odds of that happening are 1 in 2^128 with MD5, and 1 in
2^160 with SHA-1. That's 10^38 and 10^48, respectively. If you assume that
there's less than a yottabyte (1 billion petabytes) of data on the
planet Earth, then the odds of a hash collision with two random chunks
are roughly 1,461,501,637,330,900,000,000,000,000 times greater than the
number of bytes in the known computing universe.

Let's compare those odds with the odds of an unrecoverable read error on
a typical disk--approximately 1 in 100 trillion, or 10^14. Even worse odds
are data miscorrection, where error-correcting codes step in and believe
they have corrected an error, but miscorrect it instead. Those odds are
approximately 1 in 10^21. So you have a 1 in 10^21 chance of writing data
to disk, having the data written incorrectly and not even knowing it.
Everybody's OK with these numbers, so there's little reason to worry
about the 1 in 10^48 chance of a SHA-1 hash collision.

If you want to talk about the odds of something bad happening and not
knowing it, keep using tape. Everyone who has worked with tape for any
length of time has experienced a tape drive writing something that it
then couldn't read. Compare that to successful deduplication disk
restores. According to Avamar Technologies Inc. (recently acquired by
EMC Corp.), none of its customers has ever had a failed restore. Hash
collisions are a nonissue.
/quote
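For anyone who wants to sanity-check the exponents in the quote, here is the arithmetic in Python (the disk error rates are the figures quoted above, taken at face value rather than independently verified):

```python
from math import log10

md5_space = 2 ** 128    # possible MD5 hashes, ~3.4 x 10^38
sha1_space = 2 ** 160   # possible SHA-1 hashes, ~1.46 x 10^48
assert int(log10(md5_space)) == 38
assert int(log10(sha1_space)) == 48

# Disk-error odds cited in the article, for comparison:
unrecoverable_read = 10 ** 14  # ~"1 in 100 trillion"
silent_miscorrect = 10 ** 21   # undetected ECC miscorrection

# A random SHA-1 collision is enormously less likely than the silent
# miscorrection rate everyone already lives with:
assert sha1_space // silent_miscorrect > 10 ** 26
```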

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of A Darren
Dunham
Sent: Monday, September 24, 2007 5:59 PM
To: Veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

On Mon, Sep 24, 2007 at 05:08:31PM -0400, bob944 wrote:
 In the technologies I'm familiar with--one of them is old, another
new,
 it's conceptually simple.  The system, whether that's a standalone
 system or a box of disk with some smarts or an agent on the backup
 client, receives data and examines it in blocks of some size (AFAIK,
 always way larger than a 512-byte disk block).  Simplistically, it
 checksums the block and looks in a table of
 checksums-of-blocks-that-it-already-stores to see if the identical
 (ahem, anyone see a hole here?) data already lives there.

Yes, there's a hole there if that's all you're relying on.  Not all of
them do that.

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 
___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu



Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread Ed Wilts
I'm not convinced that writing to a DataDomain is going to be faster than
writing to multiple LTO-3 drives over a SAN.  The DD is limited to about
90MB/sec which is on par with 1-2 LTO-3 drives and not much more than that.
Unless, of course, you consider adding extra DD units for every 2 LTO-3
drives you currently have and that's going to bump your costs up even higher
(which might be offset by the requirement for a Decru FC520 encrypting
appliance for every 2-3 LTO-3 drives today).

I don't think that NetBackup 6.5 includes de-duplication.  It's provided by
PureDisk which is a separately licensed product.  With 6.5.1, you'll be able
to use PureDisk as a storage unit, something that's not there yet today.

.../Ed

--
Ed Wilts, RHCE, BCFP, BCSD
Mounds View, MN, USA
mailto:[EMAIL PROTECTED]


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:veritas-bu-
 [EMAIL PROTECTED] On Behalf Of Clem Kruger
 Sent: Monday, September 24, 2007 11:32 AM
 To: [EMAIL PROTECTED]; Jeff Lightner
 Cc: veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] Tapeless backup environments?
 
 Hi Dave,
 
 Yes it is a difficult decision I have looked at DataDomain with
 NetBackup. I have found that the backups are faster and there is a vast
 amount of disk being saved.
 
 NetBackup 6.5 includes de-duplication and I have become a great friend
 of it. To use the words of a supplier, Saving me Time, Saving me Space
 and Saving me Money :)
 
 
 Kind Regards,
 Clem Kruger
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Dave
 Markham
 Sent: 24 September 2007 17:35 PM
 To: Jeff Lightner
 Cc: veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] Tapeless backup environments?
 
 Guys i've just read this thread and can say im very interested in it.
 The first thing is i learned a new term called deduplication which i
 didn't know existed.
 
 Question : I gather Deduplication is using other software. DataDomain i
 think i saw mentioned. Where does this fit in with Netbackup and does
 the software reside on every client or just a server somewhere?
 
 Ok, so im trying to kit refresh a backup environment for a customer
 which has 2 sites. Production and DR about 200 miles apart. There is a
 link between the sites but the customer will probably frown on
 increased
 bandwidth charges to transfer backup data across for DisasterRecovery
 purposes.
 
 Data is probably only 1 TB for the site with perhaps 70% being required
 to be transfered daily to offsite media.
 
 Currently i use tape and i was just speccing a new tape system as i
 thought by using disk based backups, and retentions of weekly/monthly
 backups lasting say 6 weeks, im going to need a LOT of disk, plus the
 bandwidth transfer costs to DR site
 
 LTO3 tapes are storing 200gb a tape which is pretty good compared to
 disk i thought.
 
 I guess in my set up its a trade off between :-
 
 Initial cost of disk array vs initial cost of tape library, drives and
 media
 
 Time take to backup ( network will be bottle neck here. Still on 100Meg
 lan with just 2 DB servers using GigaBit lan to backup server.
 
 Offsite transfer of tapes daily to offsite location vs Cost of
 increased
 bandwith between sites to transfer backup data.
 
 
 Im now confused what to propose :)

___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-24 Thread Curtis Preston
I'm not convinced either.  Although our numbers are a little different,
you and I end up roughly at the same place.  There are a number of
vendors whose de-dupe targets top out at about 200-300 MB/s, which is
roughly the speed of 2-3 LTO-3 drives, depending on how well you use
them.  If you need more than that, you need to buy another box. (BTW,
Data Domain's numbers have increased to about 200 MB/s.)

These numbers work just fine when we're talking backups via the LAN to
LAN-based backup servers.  You're going to need at least two, possibly
three network-based backup servers to generate 200 MB/s.  Assuming 70
MB/s or so per master/media server, you buy one de-dupe unit per three
master/media servers or so.  You can scale pretty far that way.  You
will need to make sure that backup A is always sent to de-dupe unit A,
and backup B is always sent to de-dupe unit B, and so on.  (If you send
backup B to de-dupe unit A after initially sending it to de-dupe unit B,
its first backup will not get de-duped against anything, resulting in a
significant decrease in overall de-duplication ratio.)  While you won't
get as big of a de-dupe ratio as you would if you could have a single
device that could do 1000s of MB/s, there is an argument to be made that
you won't get much de-dupe when de-duping the backups of server A
against those of server B -- unless they have similar data.  So a very
large setup like this will require a bit of planning, but I think the
benefits outweigh the extra planning required.

Now, if you happen to have a SINGLE SAN media server that needs MORE
than 200 MB/s, then you're going to want a device that can handle that
level of throughput. This is going to be a pretty big server, BTW, as a
200 MB/s device can back up about 6 TB in 8 hours.  And notice I said
SAN media server, not a regular media server, as a regular media server
isn't going to be able to generate more than 200 MB/s, as it's getting
its backups via IP. But a SAN media server is backing up its own data
locally, so it can go much faster.  This also means you're really
looking at a SAN/block device, which means you're really looking at a
VTL.  (Yes, I'm aware of the Puredisk storage unit around the corner.  I
think you'll find it's not going after this part of the market.)

If you need this kind of throughput, there are a few products that are
advertising several hundred or thousands of MB/s within a single de-dupe
setup.  These are the newer kids on the de-dupe block, of course, so
they're not going to have as many customer references as the vendors
that have been selling de-dupe as long.  But from what I've seen,
they're worth a look.
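As a quick check of the sizing arithmetic above, a sketch (the numbers are the ones from this thread; the helper name is my own):

```python
def tb_per_window(mb_per_sec, hours):
    """Data a device can ingest in a backup window, in (decimal) TB."""
    return mb_per_sec * 3600 * hours / 1e6

# A 200 MB/s de-dupe device over an 8-hour window:
print(f"{tb_per_window(200, 8):.2f} TB")  # 5.76 TB, i.e. "about 6 TB"
```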

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ed Wilts
Sent: Monday, September 24, 2007 9:44 PM
To: 'Clem Kruger'; [EMAIL PROTECTED]; 'Jeff Lightner'
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

I'm not convinced that writing to a DataDomain is going to be faster
than
writing to multiple LTO-3 drives over a SAN.  The DD is limited to about
90MB/sec which is on par with 1-2 LTO-3 drives and not much more than
that.
Unless, of course, you consider adding extra DD units for every 2 LTO-3
drives you currently have and that's going to bump your costs up even
higher
(which might be offset by the requirement for a Decru FC520 encrypting
appliance for every 2-3 LTO-3 drives today).

I don't think that NetBackup 6.5 includes de-duplication.  It's provided
by
PureDisk which is a separately licensed product.  With 6.5.1, you'll be
able
to use PureDisk as a storage unit, something that's not there yet today.

.../Ed

--
Ed Wilts, RHCE, BCFP, BCSD
Mounds View, MN, USA
mailto:[EMAIL PROTECTED]


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:veritas-bu-
 [EMAIL PROTECTED] On Behalf Of Clem Kruger
 Sent: Monday, September 24, 2007 11:32 AM
 To: [EMAIL PROTECTED]; Jeff Lightner
 Cc: veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] Tapeless backup environments?
 
 Hi Dave,
 
 Yes it is a difficult decision I have looked at DataDomain with
 NetBackup. I have found that the backups are faster and there is a
vast
 amount of disk being saved.
 
 NetBackup 6.5 includes de-duplication and I have become a great friend
 of it. To use the words of a supplier, Saving me Time, Saving me
Space
 and Saving me Money :)
 
 
 Kind Regards,
 Clem Kruger
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Dave
 Markham
 Sent: 24 September 2007 17:35 PM
 To: Jeff Lightner
 Cc: veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] Tapeless backup environments?
 
 Guys i've just read this thread and can say im very interested in it.
 The first thing is i learned a new term called deduplication which i
 didn't know existed.
 
 Question : I gather Deduplication

Re: [Veritas-bu] Tapeless backup environments?

2007-09-23 Thread Martin, Jonathan
I think one of de-duplication's benefits is that even if 2% of your file data 
changes it doesn't have to replicate that entire 2%.  In my mind it's similar to 
byte level replication (although an entirely different technology.)  Just 
because Netbackup backs up 20GB of different files doesn't mean that you've 
made 20GB of changes.  If that were the case I'd roll-over my file servers once 
a month.
 
I wonder after comparing the two technologies, if there is room for a mixed 
mode solution that could take advantage of both tape's and disk's benefits 
without creating a tough to swallow price tag.
 
-Jonathan



From: [EMAIL PROTECTED] on behalf of Ed Wilts
Sent: Sat 9/22/2007 9:35 AM
To: 'Jeff Lightner'; veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?



Here's some simple math that may help (compliments of ExaGrid's web site).

 

If you have 1TB of data with a 2% change rate, you'll need to back up 20GB of 
daily incrementals.  To replicate this to another site in 18 hours requires 
3Mbps of bandwidth.  If you have lots of bandwidth or not too much data, 
replication to an offsite location may make sense.  But to think that you can 
replicate your backups for 20TB of data to another state is going to make your 
network group squirm.  Iron Mountain looks pretty cheap compared to offsite 
electronic replication.

 

We have 1 application by itself that adds 30GB of new data every day.  It's 
being replicated within the metro area over a 1Gbps pipe (real time, not via 
backups).  We sure couldn't replicate everything...

 

As the OLD saying goes, never underestimate the bandwidth of a station wagon full 
of tapes.

 

   .../Ed

 

--

Ed Wilts, RHCE, BCFP, BCSD

Mounds View, MN, USA

mailto:[EMAIL PROTECTED]

 

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Lightner
Sent: Friday, September 21, 2007 8:44 AM
To: veritas-bu@mailman.eng.auburn.edu
Subject: [Veritas-bu] Tapeless backup environments?

 

Yesterday our director said that he doesn't intend to ever upgrade existing STK 
L700 because eventually we'll go tapeless as that is what the industry is 
doing.   The idea being we'd have our disk backup devices here (e.g. Data 
Domain) and transfer to offsite storage to another disk device so as to 
eliminate the need for ever transporting tapes.

It made me wonder if anyone was actually doing the above already or was 
planning to do so?


___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-22 Thread Clem Kruger
I agree, disk may not be cheaper BUT one can choose what disk one should
use for backups (Tier1 to Tier 4).

As we have seen in earlier posts there is a fair amount of work to be
done in maintaining VTL tapes which have expired (a salary cost). If
your master and media servers have been setup correctly you will find
that writing to tape is faster as you can send multiple data streams to
the tape AND the tape drive does the compression.

Compression on a VTL is done by the operating system (normally LINUX)
which we all know is a slow process and therefore not recommended. Your
VTL supplier will also recommend that you do not multistream as this
also slows down the process.

If you want to use disk to disk backups, then do just that! It is
available in version 6.0 and 6.5. 6.5 also has a de-duplication facility
which will save you space on the disk (you can choose from Tier 1 to Tier
4 and the RAID group you would like to use) AND 6.5 has a replication
facility to replicate the disk image off site.

If your management insist on VTL, my advice is to get the supplier to do
a face off between tape and VTL. Don't be intimidated by them! All VTL
vendors use SCSI emulation which has an overhead cost to it (they may
deny this but it is fact). They will promise you that offsite storage is
simple. Let them demonstrate it live. Don't be fooled by their added media
server. It's all a pain and a lot more work, as well as being costly.

Tape will remain cheaper, and the tape drive manufacturers are fighting
hard to keep tape alive with larger capacities and faster drives, despite
the fact that they know tape is beginning to reach the end of its life
cycle.

If you abandon tape, rather go for disk-to-disk, as it is easier, faster
and safer. If you are not cash critical, rather go for Veritas Storage
Foundation, as the snapshot technology will allow you to create a snapshot
on any available disk on any array which is attached to the SAN. It has
all the tools included in the product that you would otherwise have to
purchase from disk array suppliers at an enormous cost. The replication
will also guarantee your data arrives at the offsite facility and is
recoverable. Oracle backups can be done at block level, saving an
incredible amount of time and backup space.


Clem.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jeff
Lightner
Sent: 21 September 2007 16:34 PM
To: Justin Piszcz
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

Disk is not cheaper?  You've done a cost analysis?

Not saying you're wrong and I haven't done an analysis but I'd be
surprised if disks didn't actually work out to be cheaper over time:

1) Tapes age/break - We buy on average several hundred tapes a year -
support on a disk array for failing disks may or may not be more
expensive.

2) Transport/storage - We have to pay for offsite storage and transfer -
it seems just putting an array in offsite facility would eliminate the
need for transportation (in trucks) cost.  Of course there would be cost
in the data transfer disk to disk but since everyone seems to have
connectivity over the internet it might be possible to do this using a
B2B link rather than via dedicated circuits.

3) Labor cost in dealing with mechanical failures of robots.   This one
is hidden in salary but every time I have to work on a robot it means I
can't be working on something else.   While disk drives fail it doesn't
seem to happen nearly as often as having to fish a tape out of a drive
or the tape drive itself having failed.


-Original Message-
From: Justin Piszcz [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 21, 2007 10:08 AM
To: Jeff Lightner
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?



On Fri, 21 Sep 2007, Jeff Lightner wrote:

 Yesterday our director said that he doesn't intend to ever upgrade
 existing STK L700 because eventually we'll go tapeless as that is what
 the industry is doing.   The idea being we'd have our disk backup
 devices here (e.g. Data Domain) and transfer to offsite storage to
 another disk device so as to eliminate the need for ever transporting
 tapes.

 It made me wonder if anyone was actually doing the above already or
was
 planning to do so?


That seems to be the way people are 'thinking' but the bottom line is
disk 
still is not cheaper than LTO-3 tape and there are a lot of advantages
to 
tape; however, convincing management of this is an uphill battle.

Justin.
--


Re: [Veritas-bu] Tapeless backup environments?

2007-09-22 Thread Ed Wilts
Here's some simple math that may help (compliments of ExaGrid's web site).

 

If you have 1TB of data with a 2% change rate, you'll need to back up 20GB
of daily incrementals.  To replicate this to another site in 18 hours
requires 3Mbps of bandwidth.  If you have lots of bandwidth or not too much
data, replication to an offsite location may make sense.  But to think that
you can replicate your backups for 20TB of data to another state is going to
make your network group squirm.  Iron Mountain looks pretty cheap compared
to offsite electronic replication.
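ExaGrid's back-of-the-envelope numbers are easy to reproduce (a sketch; the helper name is my own, and the 3Mbps cited is the ~2.5Mbps result rounded up):

```python
def mbps_needed(gb_per_day, window_hours):
    """Link speed in megabits/s to move a day's incrementals in the window."""
    return gb_per_day * 8 * 1000 / (window_hours * 3600)

# 2% of 1TB = 20GB of daily incrementals, replicated within 18 hours:
print(f"{mbps_needed(20, 18):.1f} Mbps")  # ~2.5 Mbps
```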

 

We have 1 application by itself that adds 30GB of new data every day.  It's
being replicated within the metro area over a 1Gbps pipe (real time, not via
backups).  We sure couldn't replicate everything.

 

As the OLD saying goes, never underestimate the bandwidth of a station wagon
full of tapes.

 

   .../Ed

 

--

Ed Wilts, RHCE, BCFP, BCSD

Mounds View, MN, USA

mailto:[EMAIL PROTECTED]

 

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jeff
Lightner
Sent: Friday, September 21, 2007 8:44 AM
To: veritas-bu@mailman.eng.auburn.edu
Subject: [Veritas-bu] Tapeless backup environments?

 

Yesterday our director said that he doesn't intend to ever upgrade existing
STK L700 because eventually we'll go tapeless as that is what the industry
is doing.   The idea being we'd have our disk backup devices here (e.g. Data
Domain) and transfer to offsite storage to another disk device so as to
eliminate the need for ever transporting tapes.

It made me wonder if anyone was actually doing the above already or was
planning to do so?

___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-22 Thread Ed Wilts
1)  Disk ages and breaks too.  
2)  Transport is cheap.  I'd be surprised if I couldn't transport a thousand
tapes for the cost of a terabyte of storage.  Bandwidth to move data is
*NOT* cheap.  20GB/day requires 3Mbps of pipe.
3)  I spend more time replacing disk drives than I do replacing tapes or
tape drives.   To back up my 1200 SAN-based spindles, I have 6 LTO-3 drives.
It sounds like you need to either replace your tape drives or treat them
better.  We do work on our robots perhaps once every few months.  We replace
disk drives on a weekly basis.  NetBackup requires a *lot* more time than
the robots or the disk drives ever will.  

   .../Ed

--
Ed Wilts, RHCE, BCFP, BCSD
Mounds View, MN, USA
mailto:[EMAIL PROTECTED]

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:veritas-bu-
 [EMAIL PROTECTED] On Behalf Of Jeff Lightner
 Sent: Friday, September 21, 2007 9:34 AM
 To: Justin Piszcz
 Cc: veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] Tapeless backup environments?
 
 Disk is not cheaper?  You've done a cost analysis?
 
 Not saying you're wrong and I haven't done an analysis but I'd be
 surprised if disks didn't actually work out to be cheaper over time:
 
 1) Tapes age/break - We buy on average several hundred tapes a year -
 support on a disk array for failing disks may or may not be more
 expensive.
 
 2) Transport/storage - We have to pay for offsite storage and transfer
 -
 it seems just putting an array in offsite facility would eliminate the
 need for transportation (in trucks) cost.  Of course there would be
 cost
 in the data transfer disk to disk but since everyone seems to have
 connectivity over the internet it might be possible to do this using a
 B2B link rather than via dedicated circuits.
 
 3) Labor cost in dealing with mechanical failures of robots.   This one
 is hidden in salary but every time I have to work on a robot it means I
 can't be working on something else.   While disk drives fail it doesn't
 seem to happen nearly as often as having to fish a tape out of a drive
 or the tape drive itself having failed.
 
 
 -Original Message-
 From: Justin Piszcz [mailto:[EMAIL PROTECTED]
 Sent: Friday, September 21, 2007 10:08 AM
 To: Jeff Lightner
 Cc: veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] Tapeless backup environments?
 
 
 
 On Fri, 21 Sep 2007, Jeff Lightner wrote:
 
  Yesterday our director said that he doesn't intend to ever upgrade
  existing STK L700 because eventually we'll go tapeless as that is
 what
  the industry is doing.   The idea being we'd have our disk backup
  devices here (e.g. Data Domain) and transfer to offsite storage to
  another disk device so as to eliminate the need for ever transporting
  tapes.
 
  It made me wonder if anyone was actually doing the above already or
 was
  planning to do so?
 
 
 That seems to be the way people are 'thinking' but the bottom line is
 disk
 still is not cheaper than LTO-3 tape and there are a lot of advantages
 to
tape; however, convincing management of this is an uphill battle.
 
 Justin.

___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-22 Thread Ed Wilts
But Curtis, a disk drive by itself isn't very useful either - you'll need
a controller or two.

And don't forget to factor in the price of the de-duplication appliances or
software.  Those suckers are *NOT* cheap.  An appliance to support 1TB of
compressed data lists out at about $20K.  Unless you get a *lot* of
de-duplication - and not everybody does - that appliance is going to get
killed on price compared to not de-duping it.

It took me only 30 minutes with a de-dupe vendor last week to eliminate
their product from consideration in our environment.

.../Ed

--
Ed Wilts, RHCE, BCFP, BCSD
Mounds View, MN, USA
mailto:[EMAIL PROTECTED]


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:veritas-bu-
 [EMAIL PROTECTED] On Behalf Of Curtis Preston
 Sent: Friday, September 21, 2007 12:10 PM
 To: Justin Piszcz; Jeff Lightner
 Cc: veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] Tapeless backup environments?
 
 First, you can't compare the cost of disk and tape directly like that.
 You have to include the drives and robots.  A drive by itself is
 useful;
 a tape by itself is not.
 
 Setting that aside, if I put that disk in a system that's doing 20:1
 de-duplication, my cost is now 1.65c/GB vs your 3-9c/GB.
 
 ---
 W. Curtis Preston
 Backup Blog @ www.backupcentral.com
 VP Data Protection, GlassHouse Technologies
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Justin
 Piszcz
 Sent: Friday, September 21, 2007 7:36 AM
 To: Jeff Lightner
 Cc: veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] Tapeless backup environments?
 
 
 I believe disks are 33c/gigabyte and tapes are 3-9cents/gigabyte or
 even
 
 cheaper, I do not remember the exact figures, but someone I know has
 done
 a cost analysis and tapes were by far cheaper.  Also something that
 nobody
 calculates is the cost of power to keep disks spinning.
 
 Justin.
 
 On Fri, 21 Sep 2007, Jeff Lightner wrote:
 
  Disk is not cheaper?  You've done a cost analysis?
 
  Not saying you're wrong and I haven't done an analysis but I'd be
  surprised if disks didn't actually work out to be cheaper over time:
 
  1) Tapes age/break - We buy on average several hundred tapes a year -
  support on a disk array for failing disks may or may not be more
  expensive.
 
  2) Transport/storage - We have to pay for offsite storage and
 transfer
 -
  it seems just putting an array in offsite facility would eliminate
 the
  need for transportation (in trucks) cost.  Of course there would be
 cost
  in the data transfer disk to disk but since everyone seems to have
  connectivity over the internet it might be possible to do this using
 a
  B2B link rather than via dedicated circuits.
 
  3) Labor cost in dealing with mechanical failures of robots.   This
 one
  is hidden in salary but every time I have to work on a robot it means
 I
  can't be working on something else.   While disk drives fail it
 doesn't
  seem to happen nearly as often as having to fish a tape out of a
 drive
  or the tape drive itself having failed.
 
 
  -Original Message-
  From: Justin Piszcz [mailto:[EMAIL PROTECTED]
  Sent: Friday, September 21, 2007 10:08 AM
  To: Jeff Lightner
  Cc: veritas-bu@mailman.eng.auburn.edu
  Subject: Re: [Veritas-bu] Tapeless backup environments?
 
 
 
  On Fri, 21 Sep 2007, Jeff Lightner wrote:
 
  Yesterday our director said that he doesn't intend to ever upgrade
  existing STK L700 because eventually we'll go tapeless as that is
 what
  the industry is doing.   The idea being we'd have our disk backup
  devices here (e.g. Data Domain) and transfer to offsite storage to
  another disk device so as to eliminate the need for ever
 transporting
  tapes.
 
  It made me wonder if anyone was actually doing the above already or
  was
  planning to do so?
 
 
  That seems to be the way people are 'thinking' but the bottom line is
  disk
  still is not cheaper than LTO-3 tape and there are a lot of
 advantages
  to
  tape; however, convincing management of this is an uphill battle.
 
  Justin.

___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-22 Thread Justin Piszcz
Don't even get me started on SANs. I have seen the entire loss of an MTI 
(now EMC) SAN, and with the new CLARiiON SANs I have seen entire shelves go 
off-line due to bad SPAs etc.; IMO not reliable.

Also with disk, I have a question with VTLs, etc, if I am feeding multiple 
LTO-3 tape drives using 10Gbps; what type of disk/VTL (not SAN) is out 
there that can accept multiple 10Gbps streams/data and will not choke?

VTLs seem like a good idea for filesystem backups but for on-demand 
database backups, I do not see them as the holy grail.

Justin.

  On Sat, 22 Sep 2007, Ed Wilts wrote:

 1)  Disk ages and breaks too.
 2)  Transport is cheap.  I'd be surprised if I couldn't transport a thousand
 tapes for the cost of a terabyte of storage.  Bandwidth to move data is
 *NOT* cheap.  20GB/day requires 3Mbps of pipe.
 3)  I spend more time replacing disk drives than I do replacing tapes or
 tape drives.   To back up my 1200 SAN-based spindles, I have 6 LTO-3 drives.
 It sounds like you need to either replace your tape drives or treat them
 better.  We do work on our robots perhaps once every few months.  We replace
 disk drives on a weekly basis.  NetBackup requires a *lot* more time than
 the robots or the disk drives ever will.
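
Ed's "20GB/day requires 3Mbps of pipe" figure can be sanity-checked with a few
lines; the 1.6x headroom factor below is my own assumption for protocol
overhead and bursts, not a figure from this thread:

```python
def required_mbps(gb_per_day: float, headroom: float = 1.0) -> float:
    """Sustained link speed needed to replicate gb_per_day over 24 hours."""
    bits_per_day = gb_per_day * 1e9 * 8      # decimal GB -> bits
    raw_mbps = bits_per_day / 86400 / 1e6    # spread evenly over a day
    return raw_mbps * headroom

raw = required_mbps(20)              # ~1.85 Mbps of pure payload
sized = required_mbps(20, 1.6)       # ~2.96 Mbps once overhead is added
```

Which lands right around the 3 Mbps Ed quotes once any realistic overhead is
included.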

   .../Ed

 --
 Ed Wilts, RHCE, BCFP, BCSD
 Mounds View, MN, USA
 mailto:[EMAIL PROTECTED]

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:veritas-bu-
 [EMAIL PROTECTED] On Behalf Of Jeff Lightner
 Sent: Friday, September 21, 2007 9:34 AM
 To: Justin Piszcz
 Cc: veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] Tapeless backup environments?

 Disk is not cheaper?  You've done a cost analysis?

 Not saying you're wrong and I haven't done an analysis but I'd be
 surprised if disks didn't actually work out to be cheaper over time:

 1) Tapes age/break - We buy on average several hundred tapes a year -
 support on a disk array for failing disks may or may not be more
 expensive.

 2) Transport/storage - We have to pay for offsite storage and transfer
 -
 it seems just putting an array in offsite facility would eliminate the
 need for transportation (in trucks) cost.  Of course there would be
 cost
 in the data transfer disk to disk but since everyone seems to have
 connectivity over the internet it might be possible to do this using a
 B2B link rather than via dedicated circuits.

 3) Labor cost in dealing with mechanical failures of robots.   This one
 is hidden in salary but every time I have to work on a robot it means I
 can't be working on something else.   While disk drives fail it doesn't
 seem to happen nearly as often as having to fish a tape out of a drive
 or the tape drive itself having failed.


 -Original Message-
 From: Justin Piszcz [mailto:[EMAIL PROTECTED]
 Sent: Friday, September 21, 2007 10:08 AM
 To: Jeff Lightner
 Cc: veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] Tapeless backup environments?



 On Fri, 21 Sep 2007, Jeff Lightner wrote:

 Yesterday our director said that he doesn't intend to ever upgrade
 existing STK L700 because eventually we'll go tapeless as that is
 what
 the industry is doing.   The idea being we'd have our disk backup
 devices here (e.g. Data Domain) and transfer to offsite storage to
 another disk device so as to eliminate the need for ever transporting
 tapes.

 It made me wonder if anyone was actually doing the above already or
 was
 planning to do so?


 That seems to be the way people are 'thinking' but the bottom line is
 disk
 still is not cheaper than LTO-3 tape and there are a lot of advantages
 to
  tape; however, convincing management of this is an uphill battle.

 Justin.

___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-22 Thread NICHOLAS MERIZZI
  Curtis - Although I agree with the other responses you have given out 
with respect to the tape vs. disk cost, I am not sure about your statements 
below.
   
  Going back for a second to the cost of tape vs. disk... if you do an analysis 
make sure to take all things into account when you backup to tape. This is why 
most people don't get a proper cost associated with tape backup i.e: 
  1. SAN ports
  2. Tape drives - fixing them, lost time, shoe-shining
  3. media cost - fixing media, media failure cost(cost of not being able to 
do a restore)
  4. off siting - the cycles/dollars lost in handling that internally, the 
cost of dealing with Recall/Iron Mountain (or whoever), the cost associated 
with the delay in waiting for a tape to be recalled...
  5. library maintenance cost
  6. restore duration cost (i.e. if i have 100 people waiting for a Tier 1 
server to be restored...)
  Anyways, the list of invisible costs associated with tapes goes on... 
   
  As for your EMC CDL comments... First I believe they are now called EDL (EMC 
Disk Libraries) because they take into account their new Symmetrix backend 
devices. Although I agree with you that de-dup is important to the future of 
backups you make it seem that it should be the only deciding factor in a 
purchase! If you push de-dup aside for a second what do most customers want? My 
guess is performance, availability, stability, integration with backup 
application. This has been my thought process and these de-dup companies you 
speak about such as Sepaton, Diligent, Data Domain all at one point or another 
have HUGE performance hits (i.e. we have tape drives that go faster than some 
of these), little capability to scale (without combining multiple devices 
together), or have un-explainable single points of failures.
  I also agree that replication is important and if you can minimize the amount 
you replicate then great. Here is my dilemma: Most of the de-dup vendors out 
there  (i.e. I am thinking of Sepaton) that can perform de-dup have only been 
in the replication business for a year (probably less) and have very little 
maturity in that space! That scares me a bit... 
   
  As for backup integration I personally like the fact that with EMC I can have 
a built in media server on top of my VTL and control everything from what I am 
familiar with... no other vendor offers that!
   
  Anyways just my two cents... Bottom line is that I agree that de-dup is 
important but if you can push that aside and look at the other technical merit 
(assuming that all vendors will have de-dup sooner than later) suddenly the 
list of enterprise level candidates drops significantly from what I am seeing. 
   
  -Nicholas
   
   
  
-
  
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Curtis Preston
Sent: Friday, September 21, 2007 1:13 PM
To: Kevin Whittaker; Jeff Lightner; veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

   
  The only issue there is that the EMC CDL does not support de-duplication, and 
it doesn’t look like they’ll be doing it any time soon.  I know they’re working 
on it, but they haven’t announced anything public, so who knows.  Compare that 
to the other de-dupe vendors that announced probably a year before they were 
ready, and you’ve got some sense of my opinion of when EMC de-dupe will 
actually be GA – if not later.
   
  Your design would work great if you had de-dupe. Without de-dupe, you are 
going to be replicating 20 times more data (or more), requiring a significantly 
larger pipe.
   
---
  W. Curtis Preston
  Backup Blog @ www.backupcentral.com
  VP Data Protection, GlassHouse Technologies

  
-
  
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kevin Whittaker
Sent: Friday, September 21, 2007 7:48 AM
To: Jeff Lightner; veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

   
  We have it on our plan.  We will be using tape for only long term retention 
of data.
   
  Our plan is to purchase another EMC CDL, and mirror our existing EMC CDL to 
the EMC CDL at our DR site.  Our master server already is duplicated, and this 
will allow us to start restores of stuff that is not tier 1 applications that 
already are mirrored to the DR site.
   
  I would prefer not to save the long term on tape, but we don't have a 
solution for any other way to do it at this time.
   
  Kevin
   

-
  
  From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jeff Lightner
Sent: Friday, September 21, 2007 9:44 AM
To: veritas-bu@mailman.eng.auburn.edu
Subject: [Veritas-bu] Tapeless backup environments?
  Yesterday our director said that he doesn’t intend to ever upgrade existing 
STK L700 because eventually we’ll go tapeless as that is what the industry is 
doing.   The idea being we’d have our disk backup devices here

Re: [Veritas-bu] Tapeless backup environments?

2007-09-22 Thread Peter Marelas
It is interesting to see the points for and against disk / tape backup
technologies play out. A worthwhile discussion.

People have mentioned management/operational/service/infrastructure
costs to justify a switch. Nobody has mentioned risk.

The problem with comparing tape / disk is they are very different
technologies and have different risk profiles subject to how you choose
to apply them.

I don't think disk was ever intended to store dormant data. If it were,
it would stop spinning, don't you think?

So here are some of the risk profile differences to consider before you
take the leap of faith.

1. Tape can be set to dormant / shelved. Disk can not (some can - but
the ones you guys are talking about can't) so it is susceptible to
corruption, malicious intent, accidents, unauthorized and often
undetected access.

2. A tape backup set is isolated from all other tape backup sets - that
is it has few dependents. A disk backup set will often share disks with
others - that is it has many dependents. The risk grows exponentially
with deduping as the logical structure now becomes dependent upon
itself. If I can use an analogy: with deduping, you're kind of saying
incremental-forever to tape is acceptable.

3. It would take a long time to wipe 1000 tapes. It would take a few
minutes to wipe 1000 tape volumes worth of disk and a couple of seconds
for the deduped equivalent.

If deduping was considered risk free we would be deduping our entire
Enterprise. But somehow it is acceptable for backup. I don't think
anyone would agree deduping your backups is acceptable without a tape
backup set. So why do we have deduped backups? Deduping is necessary to
make disk backup viable for a greater share of the backup market.

There is a general rule I apply to technology choices and that is with
every step forward always consider what you are compromising. In this
case it is risk. However, don't get me wrong. The compromise may be
acceptable to you. It is one of those assessments that is difficult to
quantify and therefore often misunderstood or ignored completely.

D2d2T in my mind gives you the best of both worlds (I don't agree one
was ever meant to replace the other). It takes away the
unpredictability of tape transports without compromising the data's
resting place and risk profile. The key is in managing the d in a way
that affords you tolerance to T failures and growth in D. So to make
it happen the d should be far superior to the D and T combined.

PS. These are my opinions not the companies I work for.

Regards
Peter Marelas
+61 400 882 651

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Martin,
Jonathan
Sent: Saturday, 22 September 2007 3:37 AM
To: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

 
I think what I'm reading here is that no one has done a true 1-to-1
comparison on Tape versus Deduplication / disk.  I guess the next
question is, what would go into such a comparison?

1) Recovery Point Objective
2) Amount of Data To Be Backed Up
3) Retention
4) Cost of Hardware (Deduplication Appliance w/ Disk)
5) Cost of Hardware (Tape Library)
6) Annual Maintenance on Hardware Above
7) Cost of Media w/ Replacement Figures
8) Cost to power / cool disks (infrastructure)
9) Cost of Network link to remote site for de-dupe
10) Cost of Media Transportation and Storage

Price per GB is useless unless you factor in at least all of the above,
and much of that information depends on configuration.  I did such an
analysis when we upgraded to NBU6 and considered deduplication this time
last year.  In my case, many of the features of disk based deduplication
weren't applicable to my situation (especially RPO) so tape was easily
cheaper.  If you are shipping media offsite daily, though, for a <=1 day
RPO then deduplication definitely makes a play.  Further, price per gig
on the disk side has been heavily influenced by consumer grade SATA
drives at 750gb and 1TB bringing costs way down in comparison to only 1
or 2 years ago.

There's certainly a lot of data to ingest before making claims of either
technology's superiority in a particular environment.
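
Jonathan's checklist can be sketched as a toy total-cost-of-ownership model.
Every number below is a made-up placeholder for illustration, not a price
quote from any vendor or from this thread; only the shape of the calculation
matters:

```python
def tco(hardware: float, annual_maint: float, years: int = 3,
        **recurring: float) -> float:
    """Hardware plus all recurring line items over the comparison period."""
    return hardware + (annual_maint + sum(recurring.values())) * years

# Tape side: library + maintenance + cartridges + offsite transport/storage.
tape = tco(hardware=80_000, annual_maint=8_000,
           media=300 * 40,            # ~300 LTO-3 cartridges/yr at $40 each
           transport=12_000)

# De-dupe side: appliance + maintenance + power/cooling + replication link.
dedupe = tco(hardware=120_000, annual_maint=15_000,
             power=6_000, bandwidth=10_000)
```

With these placeholder inputs tape comes out ahead; swap in your own figures
and retention/RPO requirements before drawing any conclusion.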

-Jonathan


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Curtis
Preston
Sent: Friday, September 21, 2007 1:10 PM
To: Justin Piszcz; Jeff Lightner
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

First, you can't compare the cost of disk and tape directly like that.
You have to include the drives and robots.  A drive by itself is useful;
a tape by itself is not.  

Setting that aside, if I put that disk in a system that's doing 20:1
de-duplication, my cost is now 1.65c/GB vs your 3-9c/GB.  
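
A minimal sketch of the arithmetic behind that 1.65c/GB figure, using only
the numbers quoted in this thread (33c/GB raw disk, 20:1 de-dupe):

```python
def effective_cents_per_gb(raw_cents_per_gb: float, dedup_ratio: float) -> float:
    """Effective media cost once de-duplication folds away repeated data."""
    return raw_cents_per_gb / dedup_ratio

# 33c/GB raw disk at 20:1 de-dupe -> the 1.65c/GB cited above.
cost = effective_cents_per_gb(33, 20)
```

The caveat Ed raises elsewhere in the thread still applies: this ignores the
cost of the de-dupe appliance itself, and the ratio actually achieved varies
widely by data type.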

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Justin
Piszcz
Sent: Friday, September

[Veritas-bu] Tapeless backup environments?

2007-09-21 Thread Jeff Lightner
Yesterday our director said that he doesn't intend to ever upgrade
existing STK L700 because eventually we'll go tapeless as that is what
the industry is doing.   The idea being we'd have our disk backup
devices here (e.g. Data Domain) and transfer to offsite storage to
another disk device so as to eliminate the need for ever transporting
tapes.

It made me wonder if anyone was actually doing the above already or was
planning to do so?
___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread Justin Piszcz


On Fri, 21 Sep 2007, Jeff Lightner wrote:

 Yesterday our director said that he doesn't intend to ever upgrade
 existing STK L700 because eventually we'll go tapeless as that is what
 the industry is doing.   The idea being we'd have our disk backup
 devices here (e.g. Data Domain) and transfer to offsite storage to
 another disk device so as to eliminate the need for ever transporting
 tapes.

 It made me wonder if anyone was actually doing the above already or was
 planning to do so?


That seems to be the way people are 'thinking' but the bottom line is disk 
still is not cheaper than LTO-3 tape and there are a lot of advantages to 
tape; however, convicing management of this is an uphill battle.

Justin.
___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread ckstehman
Discovery Channel

=
Carl Stehman
IT Distributed Services Team
Pepco Holdings, Inc.
202-331-6619
Pager 301-765-2703
[EMAIL PROTECTED]



Jeff Lightner [EMAIL PROTECTED] 
Sent by: [EMAIL PROTECTED]
09/21/2007 09:57 AM

To
veritas-bu@mailman.eng.auburn.edu
cc

Subject
[Veritas-bu] Tapeless backup environments?






Yesterday our director said that he doesn't intend to ever upgrade 
existing STK L700 because eventually we'll go tapeless as that is what the 
industry is doing.   The idea being we'd have our disk backup devices here 
(e.g. Data Domain) and transfer to offsite storage to another disk device 
so as to eliminate the need for ever transporting tapes.
It made me wonder if anyone was actually doing the above already or was 
planning to do so?___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


This Email message and any attachment may contain information that is
proprietary, legally privileged, confidential and/or subject to copyright
belonging to Pepco Holdings, Inc. or its affiliates (PHI).  This Email is
intended solely for the use of the person(s) to which it is addressed.  If
you are not an intended recipient, or the employee or agent responsible for
delivery of this Email to the intended recipient(s), you are hereby notified
that any dissemination, distribution or copying of this Email is strictly
prohibited.  If you have received this message in error, please immediately
notify the sender and permanently delete this Email and any copies.  PHI
policies expressly prohibit employees from making defamatory or offensive
statements and infringing any copyright or any other legal right by Email
communication.  PHI will not accept any liability in respect of such
communications.
___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread Justin Piszcz

I believe disks are 33c/gigabyte and tapes are 3-9cents/gigabyte or even 
cheaper, I do not remember the exact figures, but someone I know has done 
a cost analysis and tapes were by far cheaper.  Also something that nobody 
calculates is the cost of power to keep disks spinning.

Justin.

On Fri, 21 Sep 2007, Jeff Lightner wrote:

 Disk is not cheaper?  You've done a cost analysis?

 Not saying you're wrong and I haven't done an analysis but I'd be
 surprised if disks didn't actually work out to be cheaper over time:

 1) Tapes age/break - We buy on average several hundred tapes a year -
 support on a disk array for failing disks may or may not be more
 expensive.

 2) Transport/storage - We have to pay for offsite storage and transfer -
 it seems just putting an array in offsite facility would eliminate the
 need for transportation (in trucks) cost.  Of course there would be cost
 in the data transfer disk to disk but since everyone seems to have
 connectivity over the internet it might be possible to do this using a
 B2B link rather than via dedicated circuits.

 3) Labor cost in dealing with mechanical failures of robots.   This one
 is hidden in salary but every time I have to work on a robot it means I
 can't be working on something else.   While disk drives fail it doesn't
 seem to happen nearly as often as having to fish a tape out of a drive
 or the tape drive itself having failed.


 -Original Message-
 From: Justin Piszcz [mailto:[EMAIL PROTECTED]
 Sent: Friday, September 21, 2007 10:08 AM
 To: Jeff Lightner
 Cc: veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] Tapeless backup environments?



 On Fri, 21 Sep 2007, Jeff Lightner wrote:

 Yesterday our director said that he doesn't intend to ever upgrade
 existing STK L700 because eventually we'll go tapeless as that is what
 the industry is doing.   The idea being we'd have our disk backup
 devices here (e.g. Data Domain) and transfer to offsite storage to
 another disk device so as to eliminate the need for ever transporting
 tapes.

 It made me wonder if anyone was actually doing the above already or
 was
 planning to do so?


 That seems to be the way people are 'thinking' but the bottom line is
 disk
 still is not cheaper than LTO-3 tape and there are a lot of advantages
 to
  tape; however, convincing management of this is an uphill battle.

 Justin.
 --

 CONFIDENTIALITY NOTICE: This e-mail may contain privileged or confidential 
 information and is for the sole use of the intended recipient(s). If you are 
 not the intended recipient, any disclosure, copying, distribution, or use of 
 the contents of this information is prohibited and may be unlawful. If you 
 have received this electronic transmission in error, please reply immediately 
 to the sender that you have received the message in error, and delete it. 
 Thank you.

 --


___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread Jeff Lightner
Disk is not cheaper?  You've done a cost analysis?

Not saying you're wrong and I haven't done an analysis but I'd be
surprised if disks didn't actually work out to be cheaper over time:

1) Tapes age/break - We buy on average several hundred tapes a year -
support on a disk array for failing disks may or may not be more
expensive.

2) Transport/storage - We have to pay for offsite storage and transfer -
it seems just putting an array in offsite facility would eliminate the
need for transportation (in trucks) cost.  Of course there would be cost
in the data transfer disk to disk but since everyone seems to have
connectivity over the internet it might be possible to do this using a
B2B link rather than via dedicated circuits.

3) Labor cost in dealing with mechanical failures of robots.   This one
is hidden in salary but every time I have to work on a robot it means I
can't be working on something else.   While disk drives fail it doesn't
seem to happen nearly as often as having to fish a tape out of a drive
or the tape drive itself having failed.


-Original Message-
From: Justin Piszcz [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 21, 2007 10:08 AM
To: Jeff Lightner
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?



On Fri, 21 Sep 2007, Jeff Lightner wrote:

 Yesterday our director said that he doesn't intend to ever upgrade
 existing STK L700 because eventually we'll go tapeless as that is what
 the industry is doing.   The idea being we'd have our disk backup
 devices here (e.g. Data Domain) and transfer to offsite storage to
 another disk device so as to eliminate the need for ever transporting
 tapes.

 It made me wonder if anyone was actually doing the above already or
was
 planning to do so?


That seems to be the way people are 'thinking' but the bottom line is
disk 
still is not cheaper than LTO-3 tape and there are a lot of advantages
to 
tape; however, convincing management of this is an uphill battle.

Justin.



___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread ckstehman
This was in response to the question about eliminating tapes.  The IT 
department of Discovery Channel uses DataDomain and
NetApp for all their backups.  They are running Netbackup6.0MP5. We had a 
tour sponsored by DataDomain.  We are considering
going to disk based backups and are looking at VTL's and how all that 
stuff fits with Netbackup 6.5.  We will probably be upgrading
to Netbackup 6.5 next year and adding some sort of disk based backup 
solution.  We are still evaluating vendors, no final decisions
have been made.

Hope this helps

=
Carl Stehman
IT Distributed Services Team
Pepco Holdings, Inc.
202-331-6619
Pager 301-765-2703
[EMAIL PROTECTED]



Jeff Lightner [EMAIL PROTECTED] 
Sent by: [EMAIL PROTECTED]
09/21/2007 10:41 AM

To
[EMAIL PROTECTED]
cc
veritas-bu@mailman.eng.auburn.edu, 
[EMAIL PROTECTED]
Subject
Re: [Veritas-bu] Tapeless backup environments?






Cartoon Network.
 
Did your post have a point?  Discovery Channel had a special on this? 
You're annoyed at theoretical questions?  wtf?
 

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 21, 2007 10:28 AM
To: Jeff Lightner
Cc: veritas-bu@mailman.eng.auburn.edu; 
[EMAIL PROTECTED]
Subject: Re: [Veritas-bu] Tapeless backup environments?
 

Discovery Channel 

=
Carl Stehman
IT Distributed Services Team
Pepco Holdings, Inc.
202-331-6619
Pager 301-765-2703
[EMAIL PROTECTED] 


Jeff Lightner [EMAIL PROTECTED] 
Sent by: [EMAIL PROTECTED] 
09/21/2007 09:57 AM 


To
veritas-bu@mailman.eng.auburn.edu 
cc
 
Subject
[Veritas-bu] Tapeless backup environments?
 


 
 




Yesterday our director said that he doesn't intend to ever upgrade 
existing STK L700 because eventually we'll go tapeless as that is what the 
industry is doing.   The idea being we'd have our disk backup devices here 
(e.g. Data Domain) and transfer to offsite storage to another disk device 
so as to eliminate the need for ever transporting tapes. 
It made me wonder if anyone was actually doing the above already or was 
planning to do so?___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread Kevin Whittaker
We have it on our plan.  We will be using tape for only long term
retention of data.
 
Our plan is to purchase another EMC CDL, and mirror our existing EMC CDL
to the EMC CDL at our DR site.  Our master server already is duplicated,
and this will allow us to start restores of stuff that is not tier 1
applications that already are mirrored to the DR site.
 
I would prefer not to save the long term on tape, but we don't have a
solution for any other way to do it at this time.
 
Kevin



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jeff
Lightner
Sent: Friday, September 21, 2007 9:44 AM
To: veritas-bu@mailman.eng.auburn.edu
Subject: [Veritas-bu] Tapeless backup environments?



Yesterday our director said that he doesn't intend to ever upgrade
existing STK L700 because eventually we'll go tapeless as that is what
the industry is doing.   The idea being we'd have our disk backup
devices here (e.g. Data Domain) and transfer to offsite storage to
another disk device so as to eliminate the need for ever transporting
tapes.

It made me wonder if anyone was actually doing the above already or was
planning to do so?
___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread Jeff Lightner
Thanks.

 



From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 21, 2007 10:46 AM
To: Jeff Lightner
Cc: veritas-bu@mailman.eng.auburn.edu;
[EMAIL PROTECTED]
Subject: Re: [Veritas-bu] Tapeless backup environments?

 


This was in response to the question about eliminating tapes.  The IT
department of Discovery Channel uses DataDomain and
NetApp for all their backups.  They are running NetBackup 6.0MP5. We had
a tour sponsored by DataDomain.  We are considering
going to disk based backups and are looking at VTL's and how all that
stuff fits with NetBackup 6.5.  We will probably be upgrading
to NetBackup 6.5 next year and adding some sort of disk based backup
solution.  We are still evaluating vendors; no final decisions
have been made.

Hope this helps 

=
Carl Stehman
IT Distributed Services Team
Pepco Holdings, Inc.
202-331-6619
Pager 301-765-2703
[EMAIL PROTECTED] 



Jeff Lightner [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
09/21/2007 10:41 AM
To: [EMAIL PROTECTED]
cc: veritas-bu@mailman.eng.auburn.edu, [EMAIL PROTECTED]
Subject: Re: [Veritas-bu] Tapeless backup environments?

Cartoon Network. 
  
Did your post have a point?  Discovery Channel had a special on this?
You're annoyed at theoretical questions?  wtf? 
  

 




From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 21, 2007 10:28 AM
To: Jeff Lightner
Cc: veritas-bu@mailman.eng.auburn.edu;
[EMAIL PROTECTED]
Subject: Re: [Veritas-bu] Tapeless backup environments? 
  

Discovery Channel 

=
Carl Stehman
IT Distributed Services Team
Pepco Holdings, Inc.
202-331-6619
Pager 301-765-2703
[EMAIL PROTECTED] 

Jeff Lightner [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
09/21/2007 09:57 AM
To: veritas-bu@mailman.eng.auburn.edu
Subject: [Veritas-bu] Tapeless backup environments?

Yesterday our director said that he doesn't intend to ever upgrade
existing STK L700 because eventually we'll go tapeless as that is what
the industry is doing.   The idea being we'd have our disk backup
devices here (e.g. Data Domain) and transfer to offsite storage to
another disk device so as to eliminate the need for ever transporting
tapes. 

It made me wonder if anyone was actually doing the above already or was
planning to do so?




--
CONFIDENTIALITY NOTICE: This e-mail may contain privileged or
confidential information and is for the sole use of the intended
recipient(s). If you are not the intended recipient, any disclosure,
copying, distribution, or use of the contents of this information is
prohibited and may be unlawful. If you have received this electronic
transmission in error, please reply immediately to the sender that you
have received the message in error, and delete it. Thank you.
--




Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread Jeff Lightner
Cartoon Network.

 

Did your post have a point?  Discovery Channel had a special on this?
You're annoyed at theoretical questions?  wtf?

 



From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 21, 2007 10:28 AM
To: Jeff Lightner
Cc: veritas-bu@mailman.eng.auburn.edu;
[EMAIL PROTECTED]
Subject: Re: [Veritas-bu] Tapeless backup environments?

 


Discovery Channel 

=
Carl Stehman
IT Distributed Services Team
Pepco Holdings, Inc.
202-331-6619
Pager 301-765-2703
[EMAIL PROTECTED] 



Jeff Lightner [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
09/21/2007 09:57 AM
To: veritas-bu@mailman.eng.auburn.edu
Subject: [Veritas-bu] Tapeless backup environments?

Yesterday our director said that he doesn't intend to ever upgrade
existing STK L700 because eventually we'll go tapeless as that is what
the industry is doing.   The idea being we'd have our disk backup
devices here (e.g. Data Domain) and transfer to offsite storage to
another disk device so as to eliminate the need for ever transporting
tapes. 

It made me wonder if anyone was actually doing the above already or was
planning to do so?





Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread Justin Piszcz
If you only do filesystem backups and not a lot of on-demand user-database
backups, you can probably get away with disk.  If you are doing 1,000s of
user-initiated database backups though, disk will not cut it unless you
have a massive infrastructure.  Using LTO-3 or LTO-4 drives with 10GBps,
for example, is much cheaper.

Justin.

On Fri, 21 Sep 2007, [EMAIL PROTECTED] wrote:

 This was in response to the question about eliminating tapes.  The IT
 department of Discovery Channel uses DataDomain and
 NetApp for all their backups.  They are running Netbackup6.0MP5. We had a
 tour sponsored by DataDomain.  We are considering
 going to disk based backups and are looking at VTL's and how all that
 stuff fits with Netbackup 6.5.  We will probably be upgrading
 to Netbackup 6.5 next year and adding some sort of disk based backup
 solution.  We are still evaluating vendors, no final decisions
 have been made.

 Hope this helps

 =
 Carl Stehman
 IT Distributed Services Team
 Pepco Holdings, Inc.
 202-331-6619
 Pager 301-765-2703
 [EMAIL PROTECTED]



 Jeff Lightner [EMAIL PROTECTED]
 Sent by: [EMAIL PROTECTED]
 09/21/2007 10:41 AM
 To: [EMAIL PROTECTED]
 cc: veritas-bu@mailman.eng.auburn.edu, [EMAIL PROTECTED]
 Subject: Re: [Veritas-bu] Tapeless backup environments?

 Cartoon Network.

 Did your post have a point?  Discovery Channel had a special on this?
 You're annoyed at theoretical questions?  wtf?


 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
 Sent: Friday, September 21, 2007 10:28 AM
 To: Jeff Lightner
 Cc: veritas-bu@mailman.eng.auburn.edu;
 [EMAIL PROTECTED]
 Subject: Re: [Veritas-bu] Tapeless backup environments?


 Discovery Channel

 =
 Carl Stehman
 IT Distributed Services Team
 Pepco Holdings, Inc.
 202-331-6619
 Pager 301-765-2703
 [EMAIL PROTECTED]


 Jeff Lightner [EMAIL PROTECTED]
 Sent by: [EMAIL PROTECTED]
 09/21/2007 09:57 AM
 To: veritas-bu@mailman.eng.auburn.edu
 Subject: [Veritas-bu] Tapeless backup environments?

 Yesterday our director said that he doesn't intend to ever upgrade
 existing STK L700 because eventually we'll go tapeless as that is what the
 industry is doing.   The idea being we'd have our disk backup devices here
 (e.g. Data Domain) and transfer to offsite storage to another disk device
 so as to eliminate the need for ever transporting tapes.
 It made me wonder if anyone was actually doing the above already or was
 planning to do so?



Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread Curtis Preston
Huh?  I've got to say I think completely the opposite of you on this
one.  User directed backups are really hard to direct at a resource that
is limited by the number of drives.  I suppose you could multiplex, but
yuck.

Why wouldn't you point all user backups to disk?  It's very similar to
what I like to do with Redologs/logical logs/transaction logs.  When
it's time to back them up, it's time to back them up. They don't want to
wait for a tape drive.  So send them to disk.

Why wouldn't you want to send them to disk?

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Justin
Piszcz
Sent: Friday, September 21, 2007 8:01 AM
To: [EMAIL PROTECTED]
Cc: veritas-bu@mailman.eng.auburn.edu; Jeff Lightner;
[EMAIL PROTECTED]
Subject: Re: [Veritas-bu] Tapeless backup environments?

If you only do filesystem backups and not a lot of on-demand user-database
backups, you can probably get away with disk.  If you are doing 1,000s of
user-initiated database backups though, disk will not cut it unless you
have a massive infrastructure.  Using LTO-3 or LTO-4 drives with 10GBps,
for example, is much cheaper.

Justin.

On Fri, 21 Sep 2007, [EMAIL PROTECTED] wrote:

 This was in response to the question about eliminating tapes.  The IT
 department of Discovery Channel uses DataDomain and
 NetApp for all their backups.  They are running NetBackup 6.0MP5. We had a
 tour sponsored by DataDomain.  We are considering
 going to disk based backups and are looking at VTL's and how all that
 stuff fits with Netbackup 6.5.  We will probably be upgrading
 to Netbackup 6.5 next year and adding some sort of disk based backup
 solution.  We are still evaluating vendors; no final decisions
 have been made.

 Hope this helps

 =
 Carl Stehman
 IT Distributed Services Team
 Pepco Holdings, Inc.
 202-331-6619
 Pager 301-765-2703
 [EMAIL PROTECTED]



 Jeff Lightner [EMAIL PROTECTED]
 Sent by: [EMAIL PROTECTED]
 09/21/2007 10:41 AM
 To: [EMAIL PROTECTED]
 cc: veritas-bu@mailman.eng.auburn.edu, [EMAIL PROTECTED]
 Subject: Re: [Veritas-bu] Tapeless backup environments?

 Cartoon Network.

 Did your post have a point?  Discovery Channel had a special on this?
 You're annoyed at theoretical questions?  wtf?


 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
 Sent: Friday, September 21, 2007 10:28 AM
 To: Jeff Lightner
 Cc: veritas-bu@mailman.eng.auburn.edu;
 [EMAIL PROTECTED]
 Subject: Re: [Veritas-bu] Tapeless backup environments?


 Discovery Channel

 =
 Carl Stehman
 IT Distributed Services Team
 Pepco Holdings, Inc.
 202-331-6619
 Pager 301-765-2703
 [EMAIL PROTECTED]


 Jeff Lightner [EMAIL PROTECTED]
 Sent by: [EMAIL PROTECTED]
 09/21/2007 09:57 AM
 To: veritas-bu@mailman.eng.auburn.edu
 Subject: [Veritas-bu] Tapeless backup environments?

 Yesterday our director said that he doesn't intend to ever upgrade
 existing STK L700 because eventually we'll go tapeless as that is what the
 industry is doing.   The idea being we'd have our disk backup devices here
 (e.g. Data Domain) and transfer to offsite storage to another disk device
 so as to eliminate the need for ever transporting tapes.
 It made me wonder if anyone was actually doing the above already or was
 planning to do so?



Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread Curtis Preston
The only issue there is that the EMC CDL does not support
de-duplication, and it doesn't look like they'll be doing it any time
soon.  I know they're working on it, but they haven't announced anything
publicly, so who knows.  Compare that to the other de-dupe vendors that
announced probably a year before they were ready, and you've got some
sense of my opinion of when EMC de-dupe will actually be GA - if not
later.

 

Your design would work great if you had de-dupe. Without de-dupe, you
are going to be replicating 20 times more data (or more), requiring a
significantly larger pipe.
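[Editor's note: the pipe-sizing point above is simple arithmetic. Here is a minimal sketch; the 5 TB daily-full figure is an invented assumption, and the 20:1 ratio is the one quoted in this thread.]

```python
# Back-of-the-envelope replication sizing. The 5 TB protected-data
# figure is a hypothetical assumption; 20:1 is the de-dupe ratio
# discussed in this thread.

def daily_replication_gb(full_backup_gb: float, dedupe_ratio: float) -> float:
    """GB that must cross the WAN to replicate one daily backup."""
    return full_backup_gb / dedupe_ratio

full_gb = 5000.0                             # assumed 5 TB daily full
raw = daily_replication_gb(full_gb, 1)       # no de-dupe: ship everything
deduped = daily_replication_gb(full_gb, 20)  # 20:1 de-dupe

print(f"without de-dupe: {raw:,.0f} GB/day")        # 5,000 GB/day
print(f"with 20:1 de-dupe: {deduped:,.0f} GB/day")  # 250 GB/day
print(f"pipe factor: {raw / deduped:.0f}x")         # 20x
```

Whatever the real ratio turns out to be, it divides the replication traffic directly, which is why it dominates the WAN sizing.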

 

---

W. Curtis Preston

Backup Blog @ www.backupcentral.com

VP Data Protection, GlassHouse Technologies



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Kevin
Whittaker
Sent: Friday, September 21, 2007 7:48 AM
To: Jeff Lightner; veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

 

We have it on our plan.  We will be using tape for only long term
retention of data.

 

Our plan is to purchase another EMC CDL, and mirror our existing EMC CDL
to the EMC CDL at our DR site.  Our master server already is duplicated,
and this will allow us to start restores of stuff that is not tier 1
applications that already are mirrored to the DR site.

 

I would prefer not to save the long term on tape, but we don't have a
solution for any other way to do it at this time.

 

Kevin

 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Jeff
Lightner
Sent: Friday, September 21, 2007 9:44 AM
To: veritas-bu@mailman.eng.auburn.edu
Subject: [Veritas-bu] Tapeless backup environments?

Yesterday our director said that he doesn't intend to ever upgrade
existing STK L700 because eventually we'll go tapeless as that is what
the industry is doing.   The idea being we'd have our disk backup
devices here (e.g. Data Domain) and transfer to offsite storage to
another disk device so as to eliminate the need for ever transporting
tapes.

It made me wonder if anyone was actually doing the above already or was
planning to do so?



Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread Curtis Preston
First, you can't compare the cost of disk and tape directly like that.
You have to include the drives and robots.  A drive by itself is useful;
a tape by itself is not.  

Setting that aside, if I put that disk in a system that's doing 20:1
de-duplication, my cost is now 1.65c/GB vs your 3-9c/GB.  

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Justin
Piszcz
Sent: Friday, September 21, 2007 7:36 AM
To: Jeff Lightner
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?


I believe disks are 33c/gigabyte and tapes are 3-9 cents/gigabyte or even
cheaper; I do not remember the exact figures, but someone I know has done
a cost analysis and tapes were by far cheaper.  Also, something that
nobody calculates is the cost of power to keep disks spinning.

Justin.

On Fri, 21 Sep 2007, Jeff Lightner wrote:

 Disk is not cheaper?  You've done a cost analysis?

 Not saying you're wrong and I haven't done an analysis but I'd be
 surprised if disks didn't actually work out to be cheaper over time:

 1) Tapes age/break - We buy on average several hundred tapes a year -
 support on a disk array for failing disks may or may not be more
 expensive.

 2) Transport/storage - We have to pay for offsite storage and transfer -
 it seems just putting an array in offsite facility would eliminate the
 need for transportation (in trucks) cost.  Of course there would be cost
 in the data transfer disk to disk but since everyone seems to have
 connectivity over the internet it might be possible to do this using a
 B2B link rather than via dedicated circuits.

 3) Labor cost in dealing with mechanical failures of robots.   This one
 is hidden in salary but every time I have to work on a robot it means I
 can't be working on something else.   While disk drives fail it doesn't
 seem to happen nearly as often as having to fish a tape out of a drive
 or the tape drive itself having failed.


 -Original Message-
 From: Justin Piszcz [mailto:[EMAIL PROTECTED]
 Sent: Friday, September 21, 2007 10:08 AM
 To: Jeff Lightner
 Cc: veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] Tapeless backup environments?



 On Fri, 21 Sep 2007, Jeff Lightner wrote:

 Yesterday our director said that he doesn't intend to ever upgrade
 existing STK L700 because eventually we'll go tapeless as that is
what
 the industry is doing.   The idea being we'd have our disk backup
 devices here (e.g. Data Domain) and transfer to offsite storage to
 another disk device so as to eliminate the need for ever transporting
 tapes.

 It made me wonder if anyone was actually doing the above already or
 was
 planning to do so?


 That seems to be the way people are 'thinking' but the bottom line is disk
 still is not cheaper than LTO-3 tape and there are a lot of advantages to
 tape; however, convincing management of this is an uphill battle.

 Justin.
 --





Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread Wayne T Smith
Yes, we are in the middle of this (trying to replace D2T2T with D2D2D) 
process now.

What I am seeing is that while disk media costs more than tape per TB, 
de-duplication is the difference-maker, the enabler, making extra weeks 
or months retention of D2D data inexpensive. Buy another appliance for 
off-site replication, and only changes, not full backups, are 
essentially moved between sites, causing much lower volume of 
transmission. Kind of like running an rsync after previously 
rsyncing. Good news that all vendors I've seriously looked at make 
this automatic and great news is that one might expect NetBackup will be 
able to see and use the replicated site directly (soon).

I'm not a fan of VTL, so we are not looking at VTL. Your situation may vary.

Disk-to-disk backup planning is not a simple exercise, however, as
features, operations and even terminology vary substantially from
vendor to vendor.

The state of the art does seem much more mature than even a couple of 
years ago. All vendors we're seriously looking at know their competition 
and will make very substantial discounting from list prices. Proper 
sizing has been a chore, as each vendor tries to minimize the cost of 
their proposal.

Although we have not made final decisions, I find the Data Domain 
backup appliance offerings superb (though we have not yet had an on-site 
trial).

On the other hand, we have a lot of NetApp primary disk, and so the 
NetApp backup offerings are interesting for their support/use of 
snapshots and integration with NetBackup ($$$ features on both NetApp 
and Symantec sides). Technically speaking, though, NetApp NearStore used 
as a simple disk backup appliance does not appear to stack up to the 
Data Domain offerings.

Which solution is best for us has yet to be determined for my 
approximately 10TB site.

All of the D2D2D solutions we've studied and have been proposed to us 
would entail significantly more capital outlay than simply adding some 
staging disk and getting more modern tape drives for our L700, but the 
performance, scalability and automation levels of D2D2D are very exciting.

Hope this helps! (please do post your experiences)

cheers, wayne

Jeff Lightner wrote, in part, on 2007-09-21 9:43 AM:

 Yesterday our director said that he doesn’t intend to ever upgrade 
 existing STK L700 because eventually we’ll go tapeless as that is what 
 the industry is doing. The idea being we’d have our disk backup 
 devices here (e.g. Data Domain) and transfer to offsite storage to 
 another disk device so as to eliminate the need for ever transporting 
 tapes.

 It made me wonder if anyone was actually doing the above already or 
 was planning to do so?





Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread Martin, Jonathan
 
I think what I'm reading here is that no one has done a true 1-to-1
comparison of tape versus deduplication/disk.  I guess the next
question is: what would go into such a comparison?

1) Recovery Point Objective
2) Amount of Data To Be Backed Up
3) Retention
4) Cost of Hardware (Deduplication Appliance w/ Disk)
5) Cost of Hardware (Tape Library)
6) Annual Maintenance on Hardware Above
7) Cost of Media w/ Replacement Figures
8) Cost to power / cool disks (infrastructure)
9) Cost of Network link to remote site for de-dupe
10) Cost of Media Transportation and Storage
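[Editor's note: a rough way to turn the checklist above into numbers is to sum each bucket over the planning horizon. Every dollar amount below is an invented placeholder, purely to show the shape of the comparison.]

```python
# Toy 3-year TCO comparison across the cost buckets listed above.
# All dollar figures are hypothetical placeholders, not quotes.

def total_cost(hardware: int, annual_maint: int, media: int,
               annual_power: int, annual_network: int,
               annual_transport: int, years: int = 3) -> int:
    """One-time costs plus recurring costs over the planning horizon."""
    return hardware + media + years * (annual_maint + annual_power
                                       + annual_network + annual_transport)

tape = total_cost(hardware=80_000, annual_maint=8_000, media=25_000,
                  annual_power=1_000, annual_network=0,
                  annual_transport=12_000)
dedupe = total_cost(hardware=150_000, annual_maint=15_000, media=0,
                    annual_power=6_000, annual_network=10_000,
                    annual_transport=0)

print(f"tape 3-yr TCO:   ${tape:,}")    # $168,000
print(f"dedupe 3-yr TCO: ${dedupe:,}")  # $243,000
```

The point is not the particular totals but that RPO, retention and data volume drive which buckets dominate, so the same spreadsheet can come out either way.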

Price per GB is useless unless you factor in at least all of the above,
and much of that information depends on configuration.  I did such an
analysis when we upgraded to NBU6 and considered deduplication this time
last year.  In my case, many of the features of disk-based deduplication
weren't applicable to my situation (especially RPO), so tape was easily
cheaper.  If you are shipping media offsite daily for a <=1 day
RPO, though, deduplication definitely makes a play.  Further, price per gig
on the disk side has been heavily influenced by consumer-grade SATA
drives at 750GB and 1TB bringing costs way down in comparison to only 1
or 2 years ago.

There's certainly a lot of data to ingest before making claims of either
technology's superiority in a particular environment.

-Jonathan


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Curtis
Preston
Sent: Friday, September 21, 2007 1:10 PM
To: Justin Piszcz; Jeff Lightner
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

First, you can't compare the cost of disk and tape directly like that.
You have to include the drives and robots.  A drive by itself is useful;
a tape by itself is not.  

Setting that aside, if I put that disk in a system that's doing 20:1
de-duplication, my cost is now 1.65c/GB vs your 3-9c/GB.  

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Justin
Piszcz
Sent: Friday, September 21, 2007 7:36 AM
To: Jeff Lightner
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?


I believe disks are 33c/gigabyte and tapes are 3-9 cents/gigabyte or even
cheaper; I do not remember the exact figures, but someone I know has
done a cost analysis and tapes were by far cheaper.  Also, something that
nobody calculates is the cost of power to keep disks spinning.

Justin.

On Fri, 21 Sep 2007, Jeff Lightner wrote:

 Disk is not cheaper?  You've done a cost analysis?

 Not saying you're wrong and I haven't done an analysis but I'd be 
 surprised if disks didn't actually work out to be cheaper over time:

 1) Tapes age/break - We buy on average several hundred tapes a year - 
 support on a disk array for failing disks may or may not be more 
 expensive.

 2) Transport/storage - We have to pay for offsite storage and transfer -
 it seems just putting an array in offsite facility would eliminate the
 need for transportation (in trucks) cost.  Of course there would be cost
 in the data transfer disk to disk but since everyone seems to have
 connectivity over the internet it might be possible to do this using a
 B2B link rather than via dedicated circuits.

 3) Labor cost in dealing with mechanical failures of robots.   This one
 is hidden in salary but every time I have to work on a robot it means I
 can't be working on something else.   While disk drives fail it doesn't
 seem to happen nearly as often as having to fish a tape out of a drive
 or the tape drive itself having failed.


 -Original Message-
 From: Justin Piszcz [mailto:[EMAIL PROTECTED]
 Sent: Friday, September 21, 2007 10:08 AM
 To: Jeff Lightner
 Cc: veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] Tapeless backup environments?



 On Fri, 21 Sep 2007, Jeff Lightner wrote:

 Yesterday our director said that he doesn't intend to ever upgrade
 existing STK L700 because eventually we'll go tapeless as that is what
 the industry is doing.   The idea being we'd have our disk backup
 devices here (e.g. Data Domain) and transfer to offsite storage to
 another disk device so as to eliminate the need for ever transporting
 tapes.

 It made me wonder if anyone was actually doing the above already or
 was
 planning to do so?


 That seems to be the way people are 'thinking' but the bottom line is
 disk still is not cheaper than LTO-3 tape and there are a lot of
 advantages to tape; however, convincing management of this is an uphill
 battle.

 Justin.
 --


Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread Curtis Preston
Just curious.  You said "I'm not a fan of VTL," and so therefore aren't
looking at the VTL vendors at all.  That kind of leaves out a whole
segment of the market, doesn't it?  I'm aware of the Data Domain, Exagrid,
NEC & NetApp NAS-based de-dupe products, but I can't imagine not also
bringing Diligent, Falconstor, Quantum & SEPATON to the table just
because their interface is (currently) virtual tape.  I would bring them
all to the table and make them tell me why I should go their direction,
be it virtual tape or NAS.

In addition, depending on how things are configured and what
software/version we're talking about, I'd much rather back up to a VTL
than a NAS head.


---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Wayne T
Smith
Sent: Friday, September 21, 2007 10:38 AM
To: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

Yes, we are in the middle of this (trying to replace D2T2T with D2D2D) 
process now.

What I am seeing is that while disk media costs more than tape per TB, 
de-duplication is the difference-maker, the enabler, making extra weeks 
or months of retention of D2D data inexpensive. Buy another appliance for 
off-site replication, and only changes, not full backups, are moved 
between sites, greatly reducing the volume of transmission. Kind of like 
running an rsync after previously rsyncing. The good news is that all 
vendors I've seriously looked at make this automatic, and the great news 
is that one might expect NetBackup will be able to see and use the 
replicated site directly (soon).

I'm not a fan of VTL, so we are not looking at VTL. Your situation may
vary.

Disk-to-disk backup planning is not a simple exercise, however, as 
features, operations and even terminology vary substantially from 
vendor to vendor.

The state of the art does seem much more mature than even a couple of 
years ago. All vendors we're seriously looking at know their competition 
and will discount very substantially from list prices. Proper 
sizing has been a chore, as each vendor tries to minimize the cost of 
their proposal.

Although we have not made final decisions, I find the Data Domain 
backup appliance offerings superb (though we have not yet had an on-site 
trial).

On the other hand, we have a lot of NetApp primary disk, and so the 
NetApp backup offerings are interesting for their support/use of 
snapshots and integration with NetBackup ($$$ features on both NetApp 
and Symantec sides). Technically speaking, though, NetApp NearStore used 
as a simple disk backup appliance does not appear to stack up to the 
Data Domain offerings.

Which solution is best for us has yet to be determined for my 
approximately 10TB site.

All of the D2D2D solutions we've studied and have been proposed to us 
would entail significantly more capital outlay than simply adding some 
staging disk and getting more modern tape drives for our L700, but the 
performance, scalability and automation levels of D2D2D are very
exciting.

Hope this helps! (please do post your experiences)

cheers, wayne

Jeff Lightner wrote, in part, on 2007-09-21 9:43 AM:

 Yesterday our director said that he doesn't intend to ever upgrade 
 existing STK L700 because eventually we'll go tapeless as that is what 
 the industry is doing. The idea being we'd have our disk backup 
 devices here (e.g. Data Domain) and transfer to offsite storage to 
 another disk device so as to eliminate the need for ever transporting 
 tapes.

 It made me wonder if anyone was actually doing the above already or 
 was planning to do so?



___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu



Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread Martin, Jonathan
I stand corrected.  Curtis has all the answers and he's sitting on them.
=P

Worrying about multiplexing settings and tape failures?  Come on, that's
about as soft a cost as you can dream up.

-Jonathan 

-Original Message-
From: Curtis Preston [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 21, 2007 2:06 PM
To: Martin, Jonathan; veritas-bu@mailman.eng.auburn.edu
Subject: RE: [Veritas-bu] Tapeless backup environments?

Oh, I wouldn't say that. ;)  We've been doing a lot of comparisons
lately, and the comparisons include all of what you listed plus the
differential in cost of operation.  For example, opex savings from not
having to worry about multiplexing settings, tape failures, etc.

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Martin,
Jonathan
Sent: Friday, September 21, 2007 10:37 AM
To: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

 
I think what I'm reading here is that no one has done a true 1-to-1
comparison on Tape versus Deduplication / disk.  I guess the next
question is, what would go into such a comparison?

1) Recovery Point Objective
2) Amount of Data To Be Backed Up
3) Retention
4) Cost of Hardware (Deduplication Appliance w/ Disk)
5) Cost of Hardware (Tape Library)
6) Annual Maintenance on Hardware Above
7) Cost of Media w/ Replacement Figures
8) Cost to power / cool disks (infrastructure)
9) Cost of Network link to remote site for de-dupe
10) Cost of Media Transportation and Storage

Price per GB is useless unless you factor in at least all of the above,
and much of that information depends on configuration.  I did such an
analysis when we upgraded to NBU6 and considered deduplication this time
last year.  In my case, many of the features of disk-based deduplication
weren't applicable to my situation (especially RPO), so tape was easily
cheaper.  If you are shipping media offsite daily for a <=1 day RPO,
though, then deduplication definitely makes a play.  Further, price per
gig on the disk side has been heavily influenced by consumer-grade SATA
drives at 750GB and 1TB bringing costs way down in comparison to only 1
or 2 years ago.

There's certainly a lot of data to ingest before making claims of either
technology's superiority in a particular environment.

-Jonathan
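[Editor's note: Jonathan's ten factors boil down to a capital-plus-recurring total-cost model. A back-of-envelope sketch follows; every dollar figure is a placeholder assumption, not a quote from any vendor.]

```python
# Hypothetical TCO comparison built from the factors listed above.
def total_cost(hardware, annual_maintenance, media_per_year,
               power_cooling_per_year, transport_per_year,
               network_per_year=0, years=3):
    """Sum up-front hardware cost plus recurring costs over the period."""
    recurring = (annual_maintenance + media_per_year +
                 power_cooling_per_year + transport_per_year +
                 network_per_year)
    return hardware + recurring * years

# Tape library: cheap media, but transport and media replacement recur.
tape = total_cost(hardware=80_000, annual_maintenance=8_000,
                  media_per_year=12_000, power_cooling_per_year=1_000,
                  transport_per_year=10_000)

# De-dupe appliance: higher capital and power, no media or trucking,
# but a replication link to the remote site.
dedupe = total_cost(hardware=150_000, annual_maintenance=15_000,
                    media_per_year=0, power_cooling_per_year=6_000,
                    transport_per_year=0, network_per_year=5_000)
```

With these made-up inputs tape wins over three years; change the retention, RPO, or data volume and the answer can flip, which is exactly the point of the list above.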


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Curtis
Preston
Sent: Friday, September 21, 2007 1:10 PM
To: Justin Piszcz; Jeff Lightner
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?

First, you can't compare the cost of disk and tape directly like that.
You have to include the drives and robots.  A drive by itself is useful;
a tape by itself is not.  

Setting that aside, if I put that disk in a system that's doing 20:1
de-duplication, my cost is now 1.65c/GB vs your 3-9c/GB.  
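
[Editor's note: the arithmetic behind Curtis's figure, using Justin's 33c/GB raw-disk number from earlier in the thread.]

```python
# Effective cost per GB after de-duplication: raw disk at 33 cents/GB
# divided by a 20:1 de-dupe ratio lands at 1.65 cents/GB of protected data.
raw_disk_cents_per_gb = 33.0
dedupe_ratio = 20.0
effective_cents_per_gb = raw_disk_cents_per_gb / dedupe_ratio
assert abs(effective_cents_per_gb - 1.65) < 1e-9
```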

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Justin
Piszcz
Sent: Friday, September 21, 2007 7:36 AM
To: Jeff Lightner
Cc: veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] Tapeless backup environments?


I believe disks are 33c/gigabyte and tapes are 3-9 cents/gigabyte or even
cheaper; I do not remember the exact figures, but someone I know has
done a cost analysis and tapes were by far cheaper.  Also, something that
nobody calculates is the cost of power to keep disks spinning.

Justin.

On Fri, 21 Sep 2007, Jeff Lightner wrote:

 Disk is not cheaper?  You've done a cost analysis?

 Not saying you're wrong and I haven't done an analysis but I'd be 
 surprised if disks didn't actually work out to be cheaper over time:

 1) Tapes age/break - We buy on average several hundred tapes a year - 
 support on a disk array for failing disks may or may not be more 
 expensive.

 2) Transport/storage - We have to pay for offsite storage and transfer -
 it seems just putting an array in an offsite facility would eliminate the
 need for transportation (in trucks) cost.  Of course there would be cost
 in the data transfer disk to disk, but since everyone seems to have
 connectivity over the internet it might be possible to do this using a
 B2B link rather than via dedicated circuits.

 3) Labor cost in dealing with mechanical failures of robots.  This one
 is hidden in salary, but every time I have to work on a robot it means I
 can't be working on something else.  While disk drives fail, it doesn't
 seem to happen nearly as often as having to fish a tape out of a drive
 or the tape drive itself having failed.


 -Original Message-
 From: Justin Piszcz [mailto:[EMAIL PROTECTED]
 Sent: Friday, September 21, 2007 10:08 AM
 To: Jeff Lightner
 Cc: veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu

Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread Curtis Preston
Come on, man.  Can't give away everything! ;)

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 

-Original Message-
From: Martin, Jonathan [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 21, 2007 11:16 AM
To: Curtis Preston; veritas-bu@mailman.eng.auburn.edu
Subject: RE: [Veritas-bu] Tapeless backup environments?

I stand corrected.  Curtis has all the answers and he's sitting on them.
=P

Worrying about multiplexing settings and tape failures?  Come on, that's
about as soft a cost as you can dream up.

-Jonathan 


Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread Curtis Preston
Oh, I wouldn't say that. ;)  We've been doing a lot of comparisons
lately, and the comparisons include all of what you listed plus the
differential in cost of operation.  For example, opex savings from not
having to worry about multiplexing settings, tape failures, etc.

---
W. Curtis Preston
Backup Blog @ www.backupcentral.com
VP Data Protection, GlassHouse Technologies 


Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread Steve Quan


/Steve


Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread Lepley, Michael

When we put in our backup-to-disk system, the main driving factors were
point-in-time recovery and replication.  We use the system to back up
critical data for business continuity and then replicate it to our DR
site.  This gives us point-in-time recovery of about 24 hours or less.

When we compared tape to disk, we found that although the disk backup was
more expensive, it wasn't prohibitively so.  And the benefit of
point-in-time recovery of 24 hours or less made it the best solution for
our needs.


Re: [Veritas-bu] Tapeless backup environments?

2007-09-21 Thread Austin Murphy
On 9/21/07, Jeff Lightner [EMAIL PROTECTED] wrote:

 Yesterday our director said that he doesn't intend to ever upgrade existing
 STK L700 because eventually we'll go tapeless as that is what the industry
 is doing.
[snip]


Tape has been dying for 30 years.
http://searchstoragechannel.techtarget.com/originalContent/0,289142,sid98_gci1237881,00.html

An article on Slashdot this morning described the 5 biggest SANs.
It was interesting to see that NASA's SAN includes 1.1 Pbytes of disk
and 10 Pbytes of tape storage and the SAN at the San Diego
Supercomputing Center has about 1 Pbyte disk and 18 Pbytes of tape
storage.
http://www.byteandswitch.com/document.asp?doc_id=134355&WT.svl=news1_1


Austin
___
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu