Re: [CODE4LIB] digital storage

2009-08-28 Thread Edward Iglesias
Thanks to all of you who answered.  Crowdsourcing does work if you
pick the right crowd.  We have been looking at the S3 possibility but
I agree this would have to be a second copy.  The policy and
institutional support comments from my tocayo (see
http://en.wiktionary.org/wiki/tocayo) seem especially appropriate.  I am
going to include a link on our staff blog to this thread as a resource.

Thanks again,

Edward Iglesias



On Thu, Aug 27, 2009 at 8:59 PM, Edward M. Corrado ecorr...@ecorrado.us wrote:
 Joe Atzberger wrote:

 On Thu, Aug 27, 2009 at 4:25 PM, Edward M. Corrado ecorr...@ecorrado.us wrote:



 Nate Vack wrote:



 On Thu, Aug 27, 2009 at 1:57 PM, Ryan Ordway rord...@oregonstate.edu wrote:




 $213,360 over 3 years




 If you're ONLY looking at storage costs, SATA drives in enterprise RAID
 systems range from about $1.00/GB to about $1.25/GB for online storage.



 Yeah -- but if you're looking only at storage costs, you'll have an
 inaccurate estimate of your costs. You've got power, cooling, sysadmin
 time, and replacements for failed disks. If you want an
 apples-to-apples comparison, you'll want an offsite mirror, as well.

 I'm not saying S3 is always cost-effective -- but in our experience,
 the cost of the disks themselves is dwarfed by the costs of the
 related infrastructure.

 I agree that the cost of storage is only one factor. I have to wonder
 though, how much more staff time do you need for local storage than cloud
 storage? I don't know the answer but I'm not sure it is much more than
 setting up S3 storage, especially if you have a good partnership with
 your storage vendor.



 Support relationships, especially regarding storage, are very costly.  When
 I worked at a midsize datacenter, we implemented a backup solution with
 STORServer and Tivoli.  Both hardware and software were quite costly.
 Initial and ongoing support, while indispensable, cost basically as much
 as the hardware every few years.


 They can be, depending on what you are doing and what choices on software you
 make, but for long term preservation purposes they don't have to be nearly
 as expensive as what Ryan calculated S3 to cost. If you shop around you can
 get a quality 36TB array with 3 yr warranty for say $30,000; that is over
 $180,000 less than S3 (probably much less; I'm being less than generous with
 my Sun discounts and only briefly looked at their prices). Even if we use the
 "double your cost" rule for support, it is still over $50,000 a year less for
 3 years. Yes, we might need some expertise, but running a 36TB preservation
 storage array is not a $50,000 a year job and besides, what is wrong with
 growing local expertise?
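
A quick check of that comparison, as a minimal sketch. It uses only the two figures quoted in this thread ($213,360 for S3 over 3 years, $30,000 for the array) and treats "double your cost for support" as a simple 2x multiplier on the hardware price. That multiplier is a rule of thumb from this discussion, not a vendor quote, and power, cooling and staff time are deliberately left out.

```python
# Figures quoted in this thread, in dollars.
S3_COST_3YR = 213_360   # Ryan's S3 estimate over 3 years
ARRAY_COST = 30_000     # 36 TB array with a 3-year warranty
YEARS = 3


def savings(s3_total, local_total, years):
    """Return (total savings, savings per year) of local storage over S3."""
    total = s3_total - local_total
    return total, total / years


# Hardware only, then hardware plus support at an assumed 2x multiplier.
hardware_only = savings(S3_COST_3YR, ARRAY_COST, YEARS)
with_support = savings(S3_COST_3YR, ARRAY_COST * 2, YEARS)

print(f"Hardware only: ${hardware_only[0]:,} total, ${hardware_only[1]:,.0f}/year")
print(f"With support:  ${with_support[0]:,} total, ${with_support[1]:,.0f}/year")
```

Changing the multiplier or adding a line for power and cooling shows how quickly the gap narrows, which is really the point under dispute here.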

 ...

 Yes, maybe you save on staff time patching software on your storage
 array, but that is not a significant amount of time - esp. since you are
 still going to have some local storage, and there isn't much difference in
 staff time in doing 2 TB vs. 20 TB.



 There's a real difference.  I can get 2 TB in a single HDD, for example
 this one for $200 at NewEgg:
 http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413

 Any high school kid can install that.  20 TB requires some kind of
 additional structure and additional expertise.


 Well, building a 20 TB storage device and getting it to work can actually be
 very cheap and doesn't require a PhD (just a local GNU/Linux geek who likes
 to play with hardware) if you are OK with a home grown solution. I wouldn't
 be satisfied with that, but I don't see how a commercial offering adds
 up to $150,000 worth of expertise and infrastructure.

 You may save some time on the initial configuration, but you still need to
 configure cloud storage. Is cloud storage that much easier/less time
 consuming to configure than an iSCSI device? Replacement for disks would
 be covered under your warranty or support contract (at least I would hope
 you would have one).



 Warranties expire and force you into ill-timed, hardly-afforded and
 dangerous-to-your-data upgrades.  Sorta like some ILS systems with which
 we are all familiar.

 Yes, some application upgrades can cause issues, but how is that different if
 your application and/or storage is in a cloud?

  The cloud doesn't necessarily stay the same, but the part
 you care about (data in, data out) does.


 How do you know they won't change their cloud models? And you don't even
 have a warranty with the cloud. They won't even guarantee they won't delete
 your data.

 As long as you use a common, standards-based method of storage, you won't
 have any more issues getting it to work than you will getting future
 application servers to work with the cloud. While I'm not a huge fan of NFS
 I've been using it for many years with no problems due to changes in NFS or
 operating systems or hardware. NFS has been available to the public for
 about 20 years. Occasionally you may need to migrate it 

Re: [CODE4LIB] digital storage

2009-08-28 Thread Tim McGeary

Related to our discussion:
http://online.wsj.com/article/SB125139942345664387.html

I particularly like the quote at the end:


"Digital information lasts forever -- or five years," says RAND Corp.
computer analyst Jeff Rothenberg, "whichever comes first."


Tim McGeary
Team Leader, Library Technology
Lehigh University
610-758-4998
tim.mcge...@lehigh.edu
Google Talk: timmcgeary
Yahoo IM: timmcgeary


Re: [CODE4LIB] digital storage

2009-08-28 Thread John Fereira

Edward Iglesias wrote:

Thanks to all of you who answered.  Crowdsourcing does work if you
pick the right crowd.  We have been looking at the S3 possibility but
I agree this would have to be a second copy.  


There's been lots of talk about hardware platforms but not much about
software to manage all that data.  You might want to take a look at
DuraCloud from DuraSpace (http://duraspace.org/duracloud.php).
DuraSpace is the recently created organization formed by the merger of
Fedora Commons and the DSpace Foundation.  When I first started at Cornell
12 years ago I sat next to Sandy Payette, who is now the CEO of DuraSpace.


[CODE4LIB] digital storage

2009-08-27 Thread Edward Iglesias
As I was trying to figure out what to do with half a terabyte of
archival TIFFs it occurred to me that perhaps someone else had this
problem.  We are starting to produce massive amounts of digital
objects (videos, archival TIFFs, audio interviews).  Up until now we
have been dealing with ways to display them to the public.  Now we are
starting to look at dark archives like OCLC's digital archive
product.  I would welcome any suggestions from those of you who have
dealt with this on an archival level.  It's one thing to stick the
stuff up on a server, but then what?  Our CIO suggested storage
appliances like this one


http://www.drobo.com/products/index.php

but I am wary of the proprietary RAID system.

Thanks in advance,



~
Edward Iglesias
Systems Librarian
Central Connecticut State University


Re: [CODE4LIB] digital storage

2009-08-27 Thread Rosalyn Metz
Hi Edward,

Might I suggest you look into cloud computing services if you're looking at
different options. (I know you're all shocked I suggested it).  If our
budget weren't so abysmal (and going to get worse) we would be using it
right now rather than the snap server we purchased with leftover funds.  The
benefit of using the cloud is of course the elasticity it offers you.  The
negative is that you have to pay to put your files into the cloud and then
pay again to take them out (and since we've already been slashed 30% and are
guaranteed another slash...that idea was shot down).

Of course the major player out there is Amazon S3.  The problem is that you
can't use S3 via Amazon's Web Management Console.  But there is a company
called RightScale (http://www.rightscale.com/index.php) which has a web
management console that allows you to upload files quickly and easily
without having to write scripts and whatnot.

Anyway, just my two cents.

Rosalyn






Re: [CODE4LIB] digital storage

2009-08-27 Thread Edward M. Corrado
I think you probably need to come up with a long term plan with real
institutional commitment. Storing files and making sure they are backed
up is all well and good, but that is only one part of a long term
digital preservation project. How are you protecting against bit rot?
What about formats that become obsolete? etc.


We are planning to outsource our long-term digital preservation because
we do not have the staff to maintain it.


Edward

  


Re: [CODE4LIB] digital storage

2009-08-27 Thread Edward M. Corrado
Rosalyn's post made me think of one more thing: if you are looking
into outside entities (such as we are), what are the terms of service
and what guarantee do they offer that they won't lose your data? I believe
that S3 does not offer any guarantee, so if you go with them, you
probably want to have some other form of storage as well. Even if they
offered a guarantee, what good is it once they lose the documents you
were trying to preserve?


Edward Corrado







Re: [CODE4LIB] digital storage

2009-08-27 Thread Jason Griffey
The basic idea of LOCKSS is always what I think of when it comes to
archival: lots of copies. For my own personal archival stuff, I do use a
Drobo...and have recommended that we get one of the new Drobo Pros for use
here in the library. But not for archival, just for storage. For things that
I really do not want to ever go away, I make sure that I have 3 copies: one
remote, and at least two local.

There are big advantages to the Drobo over traditional RAID, with about
the same amount of risk overall. The Drobo is growable, and can use
mix-and-match drives, which gives it, IMO, a leg up over traditional RAID.
I'm a huge, huge fan. But for things I really care about, I'd have one copy
on a server, one copy on a Drobo, and another copy in the cloud somewhere.

Jason






-- 
Follow me on Twitter! http://www.twitter.com/griffey


Re: [CODE4LIB] digital storage

2009-08-27 Thread Mark Jordan
- Jason Griffey grif...@gmail.com wrote:

 The basic idea of LOCKSS is always what I think of when it comes to
 archival: lots of copies. 

We're starting to use LOCKSS, in the form of a consortial Private LOCKSS 
Network (PLN), and it is proving to be useful. I'll be presenting on what we're 
doing at Access next month.

Currently LOCKSS doesn't scale very high. The problem is that you need at least 
six boxes in your network, and if each site has 5 TB of stuff, then each box 
needs 30 TB of storage. Since LOCKSS uses 32-bit OpenBSD, and OpenBSD's support 
for attached storage isn't as good as some other OSs, you're pretty much 
limited to using local disk. That all said, LOCKSS is actively working to 
overcome the storage scalability issue, and better support for attached storage 
is being tested right now.
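
To make the scaling problem concrete, here is a small sketch of the arithmetic. It assumes, as described above, that every box in the network holds a full copy of every site's content, and (an assumption of this sketch) that there is one box per site. The function name is made up for illustration and is not part of LOCKSS.

```python
def pln_storage(sites, tb_per_site):
    """Return (TB needed per box, TB needed across the whole network).

    Assumes one box per site and that each box stores a complete copy
    of the content contributed by every site.
    """
    per_box = sites * tb_per_site
    network_total = per_box * sites
    return per_box, network_total


per_box, total = pln_storage(sites=6, tb_per_site=5)
print(f"{per_box} TB per box, {total} TB across the network")
```

Because the per-box requirement grows with the number of sites times the content per site, doubling either one doubles what each box needs on local disk.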

Mark

Mark Jordan
Head of Library Systems
W.A.C. Bennett Library, Simon Fraser University
Burnaby, British Columbia, V5A 1S6, Canada
Voice: 778.782.5753 / Fax: 778.782.3023
mjor...@sfu.ca


Re: [CODE4LIB] digital storage

2009-08-27 Thread Custer, Mark
Speaking of LOCKSS (and PLNs), there's the MetaArchive:
http://www.metaarchive.org/

You might want to consider contacting them as well.


Mark




Re: [CODE4LIB] digital storage

2009-08-27 Thread Joe Hourcle

On Thu, 27 Aug 2009, Edward Iglesias wrote:


As I was trying to figure out what to do with half a terabyte of
archival TIFFS it occurred to me that perhaps someone else had this
problem.  We are starting to produce massive amounts of digital
objects (videos, archival TIFFS, audio interviews).  Up until now we
have been dealing with ways to display them to the public.  Now we are
starting to look at dark archives like OCLC's digital archive
product.  I would welcome any suggestions from those of you who have
dealt with this on an archival level.  It's one thing to stick the
stuff up on a server, but then what?  Our CIO suggested storage
appliances like this one


I'd recommend looking at two classes of products:

Near-line storage
MAID (Massive Array of Idle Disks)

Near-line can be things like DVD jukeboxes, where you don't have to have
someone manually load the items, but it can't all be accessed at once.
You put up lower-res JPEGs, what we call 'browse images', and then when
someone wants the full-res, high-quality image, it goes to the jukebox.
You might have to wait between 15 sec and 2 minutes for the file.


You can 'tune' them by adjusting the number of drives relative to the
number of discs in them, to reduce the latency.


MAID systems are like RAID, but they spin down the disks when they're not 
in use, so they have a much lower power draw when used for storage.  We've 
used them as both primary systems, and as storage for our backups of more 
highly-available data.


...

As for your comment on Drobo, we don't use that specific brand, but we
do have a number of 4 or 6 disk RAID enclosures that we use for both
transporting files (if someone needs to copy 2TB of files, we mail it to
'em, rather than send it over the network), and for our off-site storage
of critical data.
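
A rough way to see why mailing disks can beat the network for 2 TB is sketched below. The link speeds in the loop and the 70% efficiency factor are illustrative assumptions, not measurements from any site in this thread.

```python
def transfer_hours(size_tb, link_mbps, efficiency=0.7):
    """Estimate hours needed to move size_tb terabytes over a network link.

    size_tb     -- data size in decimal terabytes (10**12 bytes)
    link_mbps   -- nominal link speed in megabits per second
    efficiency  -- assumed fraction of nominal speed actually achieved
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


# Compare a few hypothetical link speeds for the 2 TB case.
for mbps in (10, 100, 1000):
    hours = transfer_hours(2, mbps)
    print(f"2 TB over {mbps:>4} Mbit/s: about {hours:.1f} hours")
```

If the estimate comes out longer than overnight shipping, the enclosure in the mail wins.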


-Joe


Re: [CODE4LIB] digital storage

2009-08-27 Thread Rosalyn Metz
I have to agree with Ed.  You should have a good policy in place for backing
up your data.  Just throwing it on a server isn't a policy.

At the same time I would have to disagree with Ed.  You should look at S3 as
if it were your own server.  What is the guarantee that you supply to your
users with your own server?  The snap server we use here (instead of S3) is
the backup to a backup system already in place.








Re: [CODE4LIB] digital storage

2009-08-27 Thread Kyle Banerjee
Agreed on both of Rosalyn's points.

I'm wary of the hot backup options discussed in this thread for large
quantities of data. First of all, hot backup is expensive -- disks
aren't that inexpensive, and after you add power and space, it gets
much worse. Start keeping many copies, and the price gets much worse.
LOCKSS is good for protecting articles since that is what it is
designed to do. For a variety of reasons that go beyond cost, I think
it's a hopeless model for backup.

Even if money is no object, bandwidth is a huge issue. Transferring a
few GB at a time is not a big deal, but it takes a while. Transfer
large quantities and you run into trouble quickly. Bit rot is not so
much of an issue because you can check integrity regularly. For
example, a bottom of the line EC2 instance could continuously monitor
your S3 files.
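
One common way to "check integrity regularly" is to record a checksum for every file once and re-verify on a schedule. The sketch below does this for files on a local filesystem using only the Python standard library. The function names are made up for illustration, and a version that monitors S3 would need to read objects through the S3 API, which is not shown here.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large TIFFs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return (path, problem) pairs for files that are missing or changed."""
    root = Path(root)
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            problems.append((rel, "missing"))
        elif sha256_of(target) != expected:
            problems.append((rel, "checksum mismatch"))
    return problems
```

In use, build_manifest would be run once when the files are deposited and the result stored somewhere other than the archive itself; verify is then run periodically, and any non-empty result means a copy should be restored from another location.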

Of course, there is the whole practicality aspect -- backup must be
convenient as well as effective. Different solutions strike me as
appropriate to different situations, but as much as I hate tapes,
they're effective, cheap, and efficient, presuming you keep them
off site and verify them.

kyle









-- 
--
Kyle Banerjee
Digital Services Program Manager
Orbis Cascade Alliance
baner...@uoregon.edu / 503.999.9787


Re: [CODE4LIB] digital storage

2009-08-27 Thread Mark Jordan
Hi Kyle,

- Kyle Banerjee kyle.baner...@gmail.com wrote:

 LOCKSS is good for protecting articles since that is what it is
 designed to do. For a variety of reasons that go beyond cost, I think
 it's a hopeless model for backup.
 

Just to clarify, I wasn't suggesting that LOCKSS is for backup. In its PLN
form it's part of a more general collaborative preservation program that
includes policies, business continuity plans, etc. It was never intended to
be a backup tool. You're right about journal articles being central to its
original design, but PLNs are simply another use for the platform; for
example, content on a PLN is not restricted to public-facing versions, it
can be packaged up for long-term preservation.

Mark


Re: [CODE4LIB] digital storage

2009-08-27 Thread Edward M. Corrado

Hi Rosalyn,

I probably wasn't clear. I didn't mean to say don't use cloud storage 
if you think it is a good solution; in many cases it could be. I meant 
that if you really want to preserve your data, you need to do more than 
put it in the cloud (or, for that matter, on a local storage device). It 
is not a panacea. Just as if you were housing it locally, you need to 
make sure you have redundant copies.


Edward

Rosalyn Metz wrote:

I have to agree with Ed.  You should have a good policy in place for backing
up your data.  Just throwing it on a server isn't a policy.

At the same time I would have to disagree with Ed.  You should look at S3 as
if it was your own server.  What is the guarantee that you supply to your
users with your own server.  The snap server we use here (instead of S3) is
the back up to a back up system already in place.


On Thu, Aug 27, 2009 at 9:52 AM, Edward M. Corrado ecorr...@ecorrado.uswrote:

  

Rosalyn's post made me think of one more thing: if you are looking into
outside entities (as we are), what are the terms of service and what
guarantee do they offer that they won't lose your data? I believe that S3 does
not offer any guarantee, so if you go with them, you probably want to have
some other form of storage as well. Even if they offered a guarantee, what
good is it once they lose the documents you were trying to preserve?

Edward Corrado




Rosalyn Metz wrote:



Hi Edward,

Might I suggest you look into cloud computing services if you're looking
at
different options. (I know you're all shocked I suggested it).  If our
budget weren't so abysmal (and going to get worse) we would be using it
right now rather than the snap server we purchased with leftover funds.
 The
benefits of using the cloud is of course the elasticity it offers you.
 The
negative is that you have to pay to put your files into the cloud and then
pay again to take them out (and since we've already been slashed 30% and
are
guaranteed another slash...that idea was shot down).

Of course the major player out there is Amazon S3.  The problem is that
you
can't use S3 via Amazon's Web Management Console.  But there is a company
called RightScale (http://www.rightscale.com/index.php) which has a web
management console that allows you to upload files quickly and easily
without having to write scripts and what not.

Anyway, just my two cents.

Rosalyn



On Thu, Aug 27, 2009 at 8:10 AM, Edward Iglesias
edwardigles...@gmail.comwrote:



  

As I was trying to figure out what to do with half a terabyte of
archival TIFFS it occurred to me that perhaps someone else had this
problem.  We are starting to produce massive amounts of digital
objects (videos, archival TIFFS, audio interviews).  Up until now we
have been dealing with ways to display them to the public.  Now we are
starting to look at dark archives like OCLC's digital archive
product.  I would welcome any suggestions from those of you who have
dealt with this on an archival level.  It's one thing to stick the
stuff up on a server, but then what?  Our CIO suggested storage
appliances like this one


http://www.drobo.com/products/index.php

but I am wary of the proprietary RAID system.

Thanks in advance,



~
Edward Iglesias
Systems Librarian
Central Connecticut State University






Re: [CODE4LIB] digital storage

2009-08-27 Thread Jimmy Ghaphery
We have a historic idea of what it means to maintain space for analog 
collections. For many institutions a lot of that initial funding has 
come from capital building funds. While the technological solutions are 
not clear to me at this point (and I'm benefiting from this thread on 
that), I am not sure if this won't turn into more of a long-term 
business problem.


Has anyone been able to give a projection to their management on what 
the total cost per TB is for preservation over even a short horizon of 
10 years?


--Jimmy



--
Jimmy Ghaphery
Head, Library Information Systems
VCU Libraries
http://www.library.vcu.edu
--


Re: [CODE4LIB] digital storage

2009-08-27 Thread Kyle Banerjee
This would require multiple cases. But if they were distributed to
different points, the chances of losing them all would be reduced...

On Thu, Aug 27, 2009 at 10:35 AM, David J. Fianderda...@fiander.info wrote:
 You know, putting Dick Cheney in a pelican case might have solved a lot of
 problems later on.

 - David

 On 27-Aug-2009, at 13:30 , Rosalyn Metz wrote:

 ah good.  then we are agreeing.  strike the whole disagree with ed portion
 of my email.

 also i like the pelican idea too.  it reminds me of dick cheney in an
 undisclosed location.

-- 
--
Kyle Banerjee
Digital Services Program Manager
Orbis Cascade Alliance
baner...@uoregon.edu / 503.999.9787


Re: [CODE4LIB] digital storage

2009-08-27 Thread Kyle Banerjee
 Has anyone been able to give a projection to their management on what the
 total cost per TB is for preservation over even a short horizon of 10 years?

The trick is that the cost varies drastically with the model employed.

Preservation is insurance, plain and simple. If you buy more coverage,
you're protected against a wider variety of threats. The problem with
most preservation discussions is that options are weighed only in the
abstract. The best protection consumes significant financial and staff
resources -- which reduces your ability to deliver services. Plus,
there is no such thing as removing all risk.

The most appropriate model for an institution will vary depending on
what they need to preserve, how much there is, and how they define
acceptable risk. It's all a matter of defining where the lines are
drawn.

There is a tendency to pretend that analog libraries are somehow safe,
but even if theft/loss weren't issues, they get flooded and catch
fire. In the bad ol' days, catalog drawers could be burned in
protests, and in contemporary times, loss of vendor support for your
system or other problems represent a real threat.

kyle


Re: [CODE4LIB] digital storage

2009-08-27 Thread Jimmy Ghaphery

yep, good points, agree all 'round.

One thing in the analog world that may be appropriate is that we do not 
view all collections as equal. In kicking this around locally we've been 
discussing different levels (or insurance policies) per collection 
depending on things like how unique it is, born-digital, cost to re-scan 
etc.


I still tend to think that the TCO of this is generally underestimated, 
in part due to consumer prices for storage.





--
Jimmy Ghaphery
Head, Library Information Systems
VCU Libraries
http://www.library.vcu.edu
--


Re: [CODE4LIB] digital storage

2009-08-27 Thread Ryan Ordway

On Aug 27, 2009, at 6:22 AM, Rosalyn Metz wrote:
Might I suggest you look into cloud computing services if you're  
looking at

different options. (I know you're all shocked I suggested it).  If our
budget weren't so abysmal (and going to get worse) we would be using  
it
right now rather than the snap server we purchased with leftover  
funds.  The
benefits of using the cloud is of course the elasticity it offers  
you.  The
negative is that you have to pay to put your files into the cloud  
and then
pay again to take them out (and since we've already been slashed 30%  
and are

guaranteed another slash...that idea was shot down).



I did a rough cost analysis of S3 as an offsite archive of roughly  
20TB of data with estimated growth of between 6-8TB per year based on  
current growth rates. It ended up looking something like this:


$1.80 * 2storage
$2.04 * 2data transfer

$36,000 year 1 storage (20TB)
$40,800 year 1 data transfer (20TB)
$46,800 year 2 storage (26TB)
$12,240 year 2 data transfer (6TB)
$61,200 year 3 storage (34TB)
$16,320 year 3 data transfer (8TB)

$213,360 over 3 years

This only took into account storage and data transfer costs, and did  
not include READ/WRITE request costs.
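
The arithmetic behind the table can be reproduced with a short sketch. The per-GB rates ($1.80 storage, $2.04 transfer) are read off the line items above (for example, $36,000 for 20TB), and decimal terabytes (1TB = 1,000GB) are an assumption that happens to match the posted figures. Request charges are left out, as in the post.

```python
# Rates in cents per GB, inferred from the line items in the post
# (e.g. $36,000 / 20,000 GB = $1.80). Decimal terabytes are assumed.
STORAGE_CENTS_PER_GB = 180
TRANSFER_CENTS_PER_GB = 204
GB_PER_TB = 1000

# (TB stored that year, TB newly transferred that year)
YEARS = [(20, 20), (26, 6), (34, 8)]


def s3_estimate(years):
    """Return per-year (storage, transfer) dollar costs and the grand total."""
    rows = []
    for stored_tb, moved_tb in years:
        storage = stored_tb * GB_PER_TB * STORAGE_CENTS_PER_GB // 100
        transfer = moved_tb * GB_PER_TB * TRANSFER_CENTS_PER_GB // 100
        rows.append((storage, transfer))
    total = sum(storage + transfer for storage, transfer in rows)
    return rows, total


rows, total = s3_estimate(YEARS)
for year, (storage, transfer) in enumerate(rows, start=1):
    print(f"year {year}: storage ${storage:,}  transfer ${transfer:,}")
print(f"total over {len(rows)} years: ${total:,}")
```

Working in integer cents avoids floating-point rounding in the totals. Swapping in current rates or a different growth schedule only requires changing the constants.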


Granted, this was a while ago. I haven't checked to see if Amazon has  
changed any of their pricing or policies so this could be out of date.  
It looks like the data transfer cost could be avoided by shipping the  
data to them, although I don't know if they will do that for large  
amounts of data.


If you're ONLY looking at storage costs, SATA drives in enterprise  
RAID systems range from about $1.00/GB to about $1.25/GB for online  
storage. If you don't need immediate access to files, then nearline  
and offline storage are much cheaper. I can't find the exact figures,  
but LTO-4 tapes have an 800GB native / 1.6TB compressed capacity with a  
cost of something like $0.25/GB.
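
As a rough illustration of the per-GB figures above, the sketch below prices a 20TB archive on each medium. It uses only the approximate numbers quoted in this message, assumes decimal units, and counts media alone (no drives, libraries, or enclosures), so treat it as a back-of-the-envelope comparison rather than a quote.

```python
import math

ARCHIVE_GB = 20_000                  # 20 TB archive, decimal units assumed
LTO4_NATIVE_GB = 800                 # native capacity of one LTO-4 tape
TAPE_DOLLARS_PER_GB = 0.25           # rough figure from the message
DISK_DOLLARS_PER_GB = (1.00, 1.25)   # enterprise SATA RAID range from the message

# Native capacity is used because already-compressed data may not shrink on tape.
tapes_needed = math.ceil(ARCHIVE_GB / LTO4_NATIVE_GB)
tape_cost = ARCHIVE_GB * TAPE_DOLLARS_PER_GB
disk_low, disk_high = (ARCHIVE_GB * rate for rate in DISK_DOLLARS_PER_GB)

print(f"LTO-4 tapes needed (native): {tapes_needed}")
print(f"tape media: ${tape_cost:,.0f}")
print(f"online disk: ${disk_low:,.0f} to ${disk_high:,.0f}")
```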


Also, don't rule out compression. The TIFF files that I was told were  
not compressible I was able to compress from about 20TB down to about  
4TB using bzip2 -9. It will require some intermediate decompression  
when someone needs to use them, but it's a lot less expensive to store  
4TB than 20TB. You could even decompress the files on-the-fly without  
too much effort.
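
One way the compress-then-decompress-on-the-fly idea could look, as a minimal sketch using Python's standard bz2 module. The function names and the stand-in file are hypothetical; compresslevel=9 corresponds to the -9 setting mentioned above. How much a given TIFF shrinks depends entirely on its contents, so the ratio reported above should not be assumed for other collections.

```python
import bz2
import io
import os
import shutil
import tempfile


def compress_file(src_path, dst_path):
    """Write a bzip2 copy of src_path at the highest level (like `bzip2 -9`)."""
    with open(src_path, "rb") as src, bz2.open(dst_path, "wb", compresslevel=9) as dst:
        shutil.copyfileobj(src, dst)


def stream_decompressed(bz2_path, out_stream, chunk_size=1024 * 1024):
    """Decompress on the fly, copying chunks to out_stream with no temp file."""
    with bz2.open(bz2_path, "rb") as src:
        shutil.copyfileobj(src, out_stream, chunk_size)


# Round trip on a throwaway file standing in for a TIFF
# (a TIFF-like header followed by highly compressible padding).
original = b"II*\x00" + bytes(64 * 1024)
with tempfile.TemporaryDirectory() as workdir:
    tiff_path = os.path.join(workdir, "scan.tif")
    packed_path = tiff_path + ".bz2"
    with open(tiff_path, "wb") as handle:
        handle.write(original)
    compress_file(tiff_path, packed_path)
    packed_size = os.path.getsize(packed_path)
    restored = io.BytesIO()
    stream_decompressed(packed_path, restored)

print(f"{len(original)} bytes -> {packed_size} bytes compressed")
```

In a delivery setting, out_stream could be an HTTP response body or any other writable binary stream, so the uncompressed file never needs to exist on disk.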


Ryan

--
Ryan Ordway   E-mail: rord...@oregonstate.edu
Unix Systems Administrator   rord...@library.oregonstate.edu
OSU Libraries, Corvallis, OR 97331Office: Valley Library #4657


Re: [CODE4LIB] digital storage

2009-08-27 Thread Joe Hourcle

On Thu, 27 Aug 2009, Jimmy Ghaphery wrote:

We have a historic idea of what it means to maintain space for analog 
collections. For many institutions a lot of that initial funding has come 
from capital building funds. While the technological solutions are not clear 
to me at this point (and I'm benefiting from this thread on that), I am not 
sure if this won't turn into more of a long-term business problem.


Has anyone been able to give a projection to their management on what the 
total cost per TB is for preservation over even a short horizon of 10 years?


I think it was the NSSDC (National Space Science Data Center) who had done 
some estimates. I can't remember exactly what they were, but I do remember 
that they had basically made the assumption that storage would continue to 
get cheaper and larger, and that computers to handle any verification and 
reformatting would get faster, resulting in the costs dropping off 
exponentially.


The result: if you were to convert to present dollars (to charge the group 
whose data you were taking), the cost of short-term storage (~20 years?) 
was about the same as indefinite storage.
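
The reasoning can be illustrated with a geometric series. If each year's cost is a fixed fraction of the previous year's, the running total converges, so most of the "forever" cost is paid in the first couple of decades. The starting cost and decline factor below are hypothetical placeholders, not figures from the NSSDC estimate, and the sketch ignores discounting and any non-technical costs.

```python
def cumulative_cost(first_year_cost, annual_factor, years):
    """Sum of first_year_cost * annual_factor**t for t = 0 .. years - 1."""
    return first_year_cost * (1 - annual_factor ** years) / (1 - annual_factor)


def indefinite_cost(first_year_cost, annual_factor):
    """Limit of the same series as years grows; requires 0 < annual_factor < 1."""
    return first_year_cost / (1 - annual_factor)


FIRST_YEAR = 1000.0   # hypothetical dollars per TB in year one
FACTOR = 0.7          # hypothetical: each year costs 70% of the year before

twenty = cumulative_cost(FIRST_YEAR, FACTOR, 20)
forever = indefinite_cost(FIRST_YEAR, FACTOR)
print(f"20 years: ${twenty:,.2f}   indefinitely: ${forever:,.2f}")
print(f"share of the indefinite cost paid in the first 20 years: {twenty / forever:.4f}")
```

The conclusion is only as good as the assumption of steady exponential decline; a slower decline (a factor closer to 1) pushes more of the cost past the 20-year mark.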


Unfortunately, I can't remember who it was (Tim Eastman?  Ed Grayzek? 
Joe King?  ... I don't think it was Don Sawyer) or where it was (Tim gave 
a talk at ASIST in 2006; almost all of 'em spoke at the Science Archives 
in the 21st Century workshop)


As I can't find the source of that, I don't know if this just came down to 
the technical aspects, or if it also included issues in understanding the 
data being preserved.  It's possible that they use the PDS cost analysis, 
which assumes that those costs are up-front:


http://pds.nasa.gov/tools/cost-analysis-tool.shtml

(Phases A-D are before the mission even launches; phase E is basically 
everything once data starts being collected)


...

And to look at some of the costs that you have to consider when archiving, 
see, from 1999 (so the numbers won't be right for today) :


How Many Terabytes Was That? Archiving and Serving Solar Space
Data Without Losing Your Shirt
http://umbra.nascom.nasa.gov/aas_spd/abstracts/aas199906.pdf

-Joe

ps.  I looked at the pricing for cloud storage, where we'd only be holding 
60TB at any one time, but adding 2TB per day, and the prices were insane 
before we even estimated people downloading the data.


Re: [CODE4LIB] digital storage

2009-08-27 Thread Nate Vack
On Thu, Aug 27, 2009 at 1:57 PM, Ryan Ordwayrord...@oregonstate.edu wrote:

 $213,360 over 3 years

 If you're ONLY looking at storage costs, SATA drives in enterprise RAID
 systems range from about $1.00/GB to about $1.25/GB for online storage.

Yeah -- but if you're looking only at storage costs, you'll have an
inaccurate estimate of your costs. You've got power, cooling, sysadmin
time, and replacements for failed disks. If you want an
apples-to-apples comparison, you'll want an offsite mirror, as well.

I'm not saying S3 is always cost-effective -- but in our experience,
the costs of the disks themselves is dwarfed by the costs of the
related infrastructure.

Cheers,
-Nate
Waisman Lab for Brain Imaging, UW-Madison


Re: [CODE4LIB] digital storage

2009-08-27 Thread Edward M. Corrado

Nate Vack wrote:

On Thu, Aug 27, 2009 at 1:57 PM, Ryan Ordwayrord...@oregonstate.edu wrote:

  

$213,360 over 3 years



  

If you're ONLY looking at storage costs, SATA drives in enterprise RAID
systems range from about $1.00/GB to about $1.25/GB for online storage.



Yeah -- but if you're looking only at storage costs, you'll have an
inaccurate estimate of your costs. You've got power, cooling, sysadmin
time, and replacements for failed disks. If you want an
apples-to-apples comparison, you'll want an offsite mirror, as well.

I'm not saying S3 is always cost-effective -- but in our experience,
the costs of the disks themselves is dwarfed by the costs of the
related infrastructure.

  
I agree that the cost of storage is only one factor. I have to wonder 
though, how much more staff time do you need for local storage than 
cloud storage? I don't know the answer but I'm not sure it is much more 
than setting up S3 storage, especially if you have a good partnership 
with your storage vendor. With cloud storage you still need other 
backups and mirrors, so I don't see the off-site mirror as an argument 
in favor of the cloud. You should have that redundancy either way.


Yes, maybe you save on staff time patching software on your storage 
array, but that is not a significant amount of time - esp. since you are 
still going to have some local storage, and there isn't much difference 
in staff time in doing 2 TB vs. 20 TB.


You may save some time on the initial configuration, but you still need to 
configure cloud storage. Is cloud storage that much easier/less time 
consuming to configure than an iSCSI device? Replacement for disks would 
be covered under your warranty or support contract (at least I would 
hope you would have one).


The power and cooling can be a savings, but in many cases the library or 
individual departments don't pay for electricity, so while *someone* 
pays the cost, it might not be the individual department. Cooling and 
electricity costs are actually a great argument for tape for 
large-scale storage. Tape might seem old fashioned, but in many 
applications it by far offers the best value of long term storage per GB.


Again, I'm not totally against the cloud and there are some things I 
think it could be very useful for, but the cloud doesn't make up for the 
lack of (or just bad) planning. As someone else said during this thread 
this is really more of a management issue than it is a technology issue. 
Yes, technology is involved in the solution, but proper planning and 
long term commitment is more important than the technology du jour. 
There are many different options from cloud to tape to disk, but no 
matter what you choose without a long term digital preservation plan, 
you might be doing storage but you are not doing preservation.


Edward




Cheers,
-Nate
Waisman Lab for Brain Imaging, UW-Madison
  


Re: [CODE4LIB] digital storage

2009-08-27 Thread Dwiggins David
I've been pondering this a lot lately. We're starting from the ground up on a 
concerted digital asset management effort after years of one-off solutions. 
When I arrived, I inherited piles of CDs and DVDs, things stashed on servers 
all over the place, etc.
 
I am now implementing a digital asset management system (ResourceSpace) to 
start ordering all this, which will bly tie into our new collections management 
system and new web content management system.
 
For the moment, I have written a script to copy the resource and preview assets 
from ResourceSpace to a bucket on S3. (To save bandwidth/time I also used the 
batch load capability to ship them a hard drive with about 500 GB of data a few 
weeks ago.) So I now have two copies of all images: one protected by RAID on 
our iSCSI storage box, and one theoretically spread across multiple data 
centers at Amazon.
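
Keeping two copies is only useful if they stay identical, so a periodic comparison is worth considering. The message does not say how (or whether) the copies are verified; the sketch below is one possible approach, assuming both copies are reachable as local directory trees (an S3 bucket would need to be listed or mounted by some other tool first). All function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large TIFFs never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root):
    """Map each file's path relative to root to its SHA-256 digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_copies(primary_root, secondary_root):
    """Return (missing, mismatched) relative paths in the secondary copy."""
    primary = manifest(primary_root)
    secondary = manifest(secondary_root)
    missing = sorted(set(primary) - set(secondary))
    mismatched = sorted(
        name
        for name in primary
        if name in secondary and primary[name] != secondary[name]
    )
    return missing, mismatched
```

Calling compare_copies on the primary and secondary roots returns two lists; both being empty means every file in the primary copy has a byte-identical counterpart in the secondary.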
 
Ideally I'd like to have one other copy at one of our remote offices (either 
online or offline), but that's for the future.
 
I'm not sure we've entirely come to terms with the long term cost of preserving 
the material. We're buying enough local storage to get through our grant-funded 
ramp-up. After that replacing/adding drives and servers is going to have to be 
considered as much of a preservation/conservation expense as replacing a 
leaky roof. But it's a relatively new expense (or at least orders of magnitude 
bigger than it has been for other data systems) so it's something we're going 
to have to educate people on.
 
-David Dwiggins
Historic New England
 
 
__
 
David Dwiggins
Systems Librarian/Archivist, Historic New England
141 Cambridge Street, Boston, MA 02114
(617) 227-3956 x 242 
ddwiggins [at] historicnewengland.org
http://www.historicnewengland.org





Re: [CODE4LIB] digital storage

2009-08-27 Thread Nate Vack
On Thu, Aug 27, 2009 at 3:25 PM, Edward M. Corradoecorr...@ecorrado.us wrote:

 Yes, maybe you save on staff time patching software on your storage array,
 but that is not a significant amount of time - esp. since you are still
 going to have some local storage, and there isn't much difference in staff
 time in doing 2 TB vs. 20 TB.

Well... with 2TB you might try and get away with a few 1TB disks
slapped onto a Promise RAID card or something. With 20TB, you're
probably at least seriously considering a SAN.

 There are many
 different options from cloud to tape to disk, but no matter what you choose
 without a long term digital preservation plan, you might be doing storage
 but you are not doing preservation.

If I already did enterprise storage, I wouldn't really consider cloud
storage -- it's unlikely to be cheaper enough to outweigh its
disadvantages. If I didn't already do it, and had to build staff
expertise and buy big expensive kit... I'd look at the cloud a bit more
seriously.

Even if you don't use it for storage, S3 is particularly useful as
a reality check in the planning process. Once you model all of the
costs (even if you don't pay power, model it at market rates), you're
unlikely to beat Amazon's price. If you think you're doing so, you're
probably being optimistic or missing something.

Cheers,
-Nate


Re: [CODE4LIB] digital storage

2009-08-27 Thread Joe Atzberger
On Thu, Aug 27, 2009 at 4:25 PM, Edward M. Corrado ecorr...@ecorrado.uswrote:

 Nate Vack wrote:

 On Thu, Aug 27, 2009 at 1:57 PM, Ryan Ordwayrord...@oregonstate.edu
 wrote:


 $213,360 over 3 years


  If you're ONLY looking at storage costs, SATA drives in enterprise RAID
 systems range from about $1.00/GB to about $1.25/GB for online storage.


 Yeah -- but if you're looking only at storage costs, you'll have an
 inaccurate estimate of your costs. You've got power, cooling, sysadmin
 time, and replacements for failed disks. If you want an
 apples-to-apples comparison, you'll want an offsite mirror, as well.

 I'm not saying S3 is always cost-effective -- but in our experience,
 the costs of the disks themselves is dwarfed by the costs of the
 related infrastructure.

  I agree that the cost of storage is only one factor. I have to wonder
 though, how much more staff time do you need for local storage than cloud
 storage? I don't know the answer but I'm not sure it is much more than
 setting up S3 storage, especially if you have a good partnership with your
 storage vendor.


Support relationships, especially regarding storage, are very costly.  When I
worked at a midsize datacenter, we implemented a backup solution with
STORServer and Tivoli.  Both hardware and software were considerably
costly.  Initial and ongoing support, while indispensable, cost basically as
much as the hardware every few years.


 With cloud storage you still need other backups and mirrors, so I don't see
 the off-site mirror as an argument in favor of the cloud. You should have
 that redundancy either way.


You have the original, and the copy, wherever it is.  So you can build a rack
elsewhere (and reintroduce power, cooling, security and bandwidth costs), or
get a tape rotation scheme in place, or whatever, but a cloud-based backup
is already offsite, whereas an in-house tape library (like our STORServer)
still requires a staffer to populate the lockbox to be picked up (we used
Iron Mountain, then later Cintas).


 Yes, maybe you save on staff time patching software on your storage array,
 but that is not a significant amount of time - esp. since you are still
 going to have some local storage, and there isn't much difference in staff
 time in doing 2 TB vs. 20 TB.


There's a real difference.  I can get 2 TB in a single HDD, for example this
one for $200 at NewEgg:
http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413

Any high school kid can install that.  20 TB requires some kind of
additional structure and additional expertise.

You may save some time on the initial configuration, but you still need to
 configure cloud storage. Is cloud storage that much easier/less time
 consuming to configure than an iSCSI device? Replacement for disks would be
 covered under your warranty or support contract (at least I would hope you
 would have one).


Warranties expire and force you into ill-timed, hardly-afforded and
dangerous-to-your-data upgrades.  Sorta like some ILS systems with which we
are all familiar.  The cloud doesn't necessarily stay the same, but the part
you care about (data in, data out) does.


 The power and cooling can be a savings, but in many cases the library or
 individual departments don't pay for electricity, so while *someone* pays
 the cost, it might not be the individual department. Cooling and electricity
 costs are actually a great argument for tape for large-scale storage.
 Tape might seem old fashioned, but in many applications it by far offers the
 best value of long term storage per GB.


It's true, tape is still a worthwhile option. Alternatives like optical or
magneto-optical media just have not kept up.

Again, I'm not totally against the cloud and there are some things I think
 it could be very useful for, but the cloud doesn't make up for the lack of
 (or just bad) planning.


Yeah, there's no system good enough to compensate for bad planning and
management.
--Joe


Re: [CODE4LIB] digital storage

2009-08-27 Thread Edward M. Corrado

Joe Atzberger wrote:

On Thu, Aug 27, 2009 at 4:25 PM, Edward M. Corrado ecorr...@ecorrado.uswrote:

  

Nate Vack wrote:



On Thu, Aug 27, 2009 at 1:57 PM, Ryan Ordwayrord...@oregonstate.edu
wrote:


  

$213,360 over 3 years




 If you're ONLY looking at storage costs, SATA drives in enterprise RAID
  

systems range from about $1.00/GB to about $1.25/GB for online storage.



Yeah -- but if you're looking only at storage costs, you'll have an
inaccurate estimate of your costs. You've got power, cooling, sysadmin
time, and replacements for failed disks. If you want an
apples-to-apples comparison, you'll want an offsite mirror, as well.

I'm not saying S3 is always cost-effective -- but in our experience,
the costs of the disks themselves is dwarfed by the costs of the
related infrastructure.

 I agree that the cost of storage is only one factor. I have to wonder
  

though, how much more staff time do you need for local storage than cloud
storage? I don't know the answer but I'm not sure it is much more than
setting up S3 storage, especially if you have a good partnership with your
storage vendor.




Support relationships, especially regarding storage are very costly.  When I
worked at a midsize datacenter, we implemented a backup solution with
STORServer and tivoli.  Both hardware and software were considerably
costly.  Initial and ongoing support, while indispensable was basically as
much as the cost of the hardware every few years.
  
They can be, depending on what you are doing and what software choices 
you make, but for long-term preservation purposes they don't have to be 
nearly as expensive as what Ryan calculated S3 to cost. If you shop 
around you can get a quality 36TB array with a 3-year warranty for, say, 
$30,000. That is roughly $180,000 less than S3 (the array would probably 
cost much less; I'm being less than generous with my Sun discounts and only 
briefly looked at their prices). Even if we double the cost to cover support, 
it is still over $50,000 a year less for 3 years. Yes, we might need some 
expertise, but running a 36TB preservation storage array is not a 
$50,000 a year job and besides, what is wrong with growing local expertise?
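
For reference, the comparison above works out as follows. The sketch uses only the numbers already in the thread (Ryan's $213,360 three-year S3 estimate and the "say $30,000" array) and deliberately leaves out the power, cooling, staff, and offsite-mirror costs raised elsewhere in the discussion, so it reproduces this argument rather than settling it.

```python
S3_THREE_YEAR_TOTAL = 213_360   # Ryan's estimate, quoted earlier in the thread
ARRAY_PRICE = 30_000            # the "say $30,000" array with a 3-year warranty
YEARS = 3

# "Double your cost for support": hardware plus an equal amount for support.
array_with_support = 2 * ARRAY_PRICE

savings_hardware_only = S3_THREE_YEAR_TOTAL - ARRAY_PRICE
savings_with_support = S3_THREE_YEAR_TOTAL - array_with_support
savings_per_year = savings_with_support / YEARS

print(f"hardware only: ${savings_hardware_only:,} less than S3 over {YEARS} years")
print(f"with support:  ${savings_with_support:,} less (${savings_per_year:,.0f} per year)")
```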


...

Yes, maybe you save on staff time patching software on your storage array,
but that is not a significant amount of time - esp. since you are still
going to have some local storage, and there isn't much difference in staff
time in doing 2 TB vs. 20 TB.




There's a real difference.  I can get 2 TB in a single HDD, for example this
one for $200 at NewEgg:
http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413

Any high school kid can install that.  20 TB requires some kind of
additional structure and additional expertise.
  
Well, building a 20 TB storage device and getting it to work can actually 
be very cheap and doesn't require a PhD (just a local GNU/Linux geek who 
likes to play with hardware) if you are OK with a home-grown solution. I 
wouldn't be satisfied with that, but I don't see how a commercial 
offering adds up to $150,000 worth of expertise and infrastructure.



You may save some time on the initial configuration, but you still need to
  

configure cloud storage. Is cloud storage that much easier/less time
consuming to configure than an iSCSI device? Replacement for disks would be
covered under your warranty or support contract (at least I would hope you
would have one).




Warranties expire and force you into ill-timed, hardly-afforded and
dangerous-to-your-data upgrades.  Sorta like some ILS systems with which we
are all familiar.
Yes some application upgrades can cause issues, but how is that 
different if your application and/or storage is in a  cloud?



  The cloud doesn't necessarily stay the same, but the part
you care about (data in, data out) does.
  
How do you know they won't change their cloud models? And you don't even 
have a warranty with the cloud. They won't even guarantee they won't 
delete your data.


As long as you use a common standards-based method of storage, you won't 
have any more issues getting it to work than you will getting future 
application servers to work with the cloud. While I'm not a huge fan of 
NFS, I've been using it for many years with no problems due to changes in 
NFS, operating systems, or hardware. NFS has been available to the 
public for about 20 years. Occasionally you may need to migrate it from 
one platform or one machine to another, but you may very well need to do that 
with clouds as well. Maybe you are using S3 but, for whatever reason, Sun 
gives you a better deal with better terms and guarantees for using their 
cloud. Maybe Amazon drops S3. Maybe S3 moves servers to a 
country that you are not legally allowed to have your data in.  Yes, you 
have to plan for migration to new platforms but I fail to see how you 
don't need to do that with the cloud. Really any major