Re: [CODE4LIB] digital storage
Thanks to all of you who answered. Crowdsourcing does work if you pick the right crowd. We have been looking at the S3 possibility but I agree this would have to be a second copy. The policy and institutional support comments from my tokayo see http://en.wiktionary.org/wiki/tocayo seem especially appropriate. I am going to include a link on our staff blog to this thread as a resource. Thanks again, Edward Iglesias On Thu, Aug 27, 2009 at 8:59 PM, Edward M. Corradoecorr...@ecorrado.us wrote: Joe Atzberger wrote: On Thu, Aug 27, 2009 at 4:25 PM, Edward M. Corrado ecorr...@ecorrado.uswrote: Nate Vack wrote: On Thu, Aug 27, 2009 at 1:57 PM, Ryan Ordwayrord...@oregonstate.edu wrote: $213,360 over 3 years If you're ONLY looking at storage costs, SATA drives in enterprise RAID systems range from about $1.00/GB to about $1.25/GB for online storage. Yeah -- but if you're looking only at storage costs, you'll have an inaccurate estimate of your costs. You've got power, cooling, sysadmin time, and replacements for failed disks. If you want an apples-to-apples comparison, you'll want an offsite mirror, as well. I'm not saying S3 is always cost-effective -- but in our experience, the costs of the disks themselves is dwarfed by the costs of the related infrastructure. I agree that the cost of storage is only one factor. I have to wonder though, how much more staff time do you need for local storage than cloud storage? I don't know the answer but I'm not sure it is much more than setting up S3 storage, especially if you have a good partnership with your storage vendor. Support relationships, especially regarding storage are very costly. When I worked at a midsize datacenter, we implemented a backup solution with STORServer and tivoli. Both hardware and software were considerably costly. Initial and ongoing support, while indispensable was basically as much as the cost of the hardware every few years. 
They can be, depending on what you are doing and what choices on software you make, but for long-term preservation purposes they don't have to be nearly as expensive as what Ryan calculated S3 to cost. If you shop around you can get a quality 36TB array with a 3 yr warranty for say $30,000; that is almost $180,000 less than S3 (probably much less, I'm being less than generous with my Sun discounts and only briefly looked at their prices). Even if we double that cost to cover support, it is still over $50,000 a year less for 3 years. Yes, we might need some expertise, but running a 36TB preservation storage array is not a $50,000 a year job and besides, what is wrong with growing local expertise? ... Yes, maybe you save on staff time patching software on your storage array, but that is not a significant amount of time - esp. since you are still going to have some local storage, and there isn't much difference in staff time in doing 2 TB vs. 20 TB. There's a real difference. I can get 2 TB in a single HDD, for example this one for $200 at NewEgg: http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413 Any high school kid can install that. 20 TB requires some kind of additional structure and additional expertise. Well, building a 20 TB storage device and getting it to work can actually be very cheap and doesn't require a PhD (just a local GNU/Linux geek who likes to play with hardware) if you are OK with a home-grown solution. I wouldn't be satisfied with that, but I don't see how a commercial offering adds up to $150,000 worth of expertise and infrastructure. You may save some time on the initial configuration, but you still need to configure cloud storage. Is cloud storage that much easier/less time consuming to configure than an iSCSI device? Replacement for disks would be covered under your warranty or support contract (at least I would hope you would have one). 
Warranties expire and force you into ill-timed, hardly-afforded and dangerous-to-your-data upgrades. Sorta like some ILS systems with which we are all familiar. Yes some application upgrades can cause issues, but how is that different if your application and/or storage is in a cloud? The cloud doesn't necessarily stay the same, but the part you care about (data in, data out) does. How do you know they won't change their cloud models? And you don't even have a warranty with the cloud. They won't even guarantee they won't delete your data. As long as you use a common standards based method of storage, you won't have any more issues getting it to work than you will getting future application servers to work with the cloud. While I'm not a huge fan of NFS I've been using it for many years with no problems due to changes in NFS or operating systems or hardware. NFS has been available to the public for about 20 years. Occasionally you may need to migrate it
Re: [CODE4LIB] digital storage
Related to our discussion: http://online.wsj.com/article/SB125139942345664387.html I particularly like the quote at the end: Digital information lasts forever -- or five years, says RAND Corp. computer analyst Jeff Rothenberg, whichever comes first. Tim McGeary Team Leader, Library Technology Lehigh University 610-758-4998 tim.mcge...@lehigh.edu Google Talk: timmcgeary Yahoo IM: timmcgeary Edward Iglesias wrote: Thanks to all of you who answered. Crowdsourcing does work if you pick the right crowd. We have been looking at the S3 possibility but I agree this would have to be a second copy. The policy and institutional support comments from my tokayo see http://en.wiktionary.org/wiki/tocayo seem especially appropriate. I am going to include a link on our staff blog to this thread as a resource. Thanks again, Edward Iglesias On Thu, Aug 27, 2009 at 8:59 PM, Edward M. Corradoecorr...@ecorrado.us wrote: Joe Atzberger wrote: On Thu, Aug 27, 2009 at 4:25 PM, Edward M. Corrado ecorr...@ecorrado.uswrote: Nate Vack wrote: On Thu, Aug 27, 2009 at 1:57 PM, Ryan Ordwayrord...@oregonstate.edu wrote: $213,360 over 3 years If you're ONLY looking at storage costs, SATA drives in enterprise RAID systems range from about $1.00/GB to about $1.25/GB for online storage. Yeah -- but if you're looking only at storage costs, you'll have an inaccurate estimate of your costs. You've got power, cooling, sysadmin time, and replacements for failed disks. If you want an apples-to-apples comparison, you'll want an offsite mirror, as well. I'm not saying S3 is always cost-effective -- but in our experience, the costs of the disks themselves is dwarfed by the costs of the related infrastructure. I agree that the cost of storage is only one factor. I have to wonder though, how much more staff time do you need for local storage than cloud storage? I don't know the answer but I'm not sure it is much more than setting up S3 storage, especially if you have a good partnership with your storage vendor. 
Support relationships, especially regarding storage, are very costly. When I worked at a midsize datacenter, we implemented a backup solution with STORServer and Tivoli. Both hardware and software were considerably costly. Initial and ongoing support, while indispensable, was basically as much as the cost of the hardware every few years. They can be, depending on what you are doing and what choices on software you make, but for long-term preservation purposes they don't have to be nearly as expensive as what Ryan calculated S3 to cost. If you shop around you can get a quality 36TB array with a 3 yr warranty for say $30,000; that is almost $180,000 less than S3 (probably much less, I'm being less than generous with my Sun discounts and only briefly looked at their prices). Even if we double that cost to cover support, it is still over $50,000 a year less for 3 years. Yes, we might need some expertise, but running a 36TB preservation storage array is not a $50,000 a year job and besides, what is wrong with growing local expertise? ... Yes, maybe you save on staff time patching software on your storage array, but that is not a significant amount of time - esp. since you are still going to have some local storage, and there isn't much difference in staff time in doing 2 TB vs. 20 TB. There's a real difference. I can get 2 TB in a single HDD, for example this one for $200 at NewEgg: http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413 Any high school kid can install that. 20 TB requires some kind of additional structure and additional expertise. Well, building a 20 TB storage device and getting it to work can actually be very cheap and doesn't require a PhD (just a local GNU/Linux geek who likes to play with hardware) if you are OK with a home-grown solution. I wouldn't be satisfied with that, but I don't see how a commercial offering adds up to $150,000 worth of expertise and infrastructure. 
You may save some time on the initial configuration, but you still need to configure cloud storage. Is cloud storage that much easier/less time consuming to configure than an iSCSI device? Replacement for disks would be covered under your warranty or support contract (at least I would hope you would have one). Warranties expire and force you into ill-timed, hardly-afforded and dangerous-to-your-data upgrades. Sorta like some ILS systems with which we are all familiar. Yes, some application upgrades can cause issues, but how is that different if your application and/or storage is in a cloud? The cloud doesn't necessarily stay the same, but the part you care about (data in, data out) does. How do you know they won't change their cloud models? And you don't even have a warranty with the cloud. They won't even guarantee they won't delete your data. As long as you use a common standards based method of storage, you
Re: [CODE4LIB] digital storage
Edward Iglesias wrote: Thanks to all of you who answered. Crowdsourcing does work if you pick the right crowd. We have been looking at the S3 possibility but I agree this would have to be a second copy.

There has been lots of talk about hardware platforms but not much about software to manage all that data. You might want to take a look at DuraCloud from DuraSpace (http://duraspace.org/duracloud.php). DuraSpace is the organization recently formed by the merger of Fedora Commons and the DSpace Foundation. When I first started at Cornell 12 years ago I sat next to Sandy Payette, who is now the CEO of DuraSpace.
[CODE4LIB] digital storage
As I was trying to figure out what to do with half a terabyte of archival TIFFs it occurred to me that perhaps someone else had this problem. We are starting to produce massive amounts of digital objects (videos, archival TIFFs, audio interviews). Up until now we have been dealing with ways to display them to the public. Now we are starting to look at dark archives like OCLC's digital archive product. I would welcome any suggestions from those of you who have dealt with this on an archival level. It's one thing to stick the stuff up on a server, but then what? Our CIO suggested storage appliances like this one http://www.drobo.com/products/index.php but I am wary of the proprietary RAID system.

Thanks in advance,
~
Edward Iglesias
Systems Librarian
Central Connecticut State University
Re: [CODE4LIB] digital storage
Hi Edward, Might I suggest you look into cloud computing services if you're looking at different options? (I know you're all shocked I suggested it.) If our budget weren't so abysmal (and going to get worse) we would be using it right now rather than the snap server we purchased with leftover funds. The benefit of using the cloud is, of course, the elasticity it offers you. The negative is that you have to pay to put your files into the cloud and then pay again to take them out (and since we've already been slashed 30% and are guaranteed another slash...that idea was shot down). Of course the major player out there is Amazon S3. The problem is that you can't use S3 via Amazon's Web Management Console. But there is a company called RightScale (http://www.rightscale.com/index.php) which has a web management console that allows you to upload files quickly and easily without having to write scripts and whatnot. Anyway, just my two cents. Rosalyn

On Thu, Aug 27, 2009 at 8:10 AM, Edward Iglesias edwardigles...@gmail.com wrote: As I was trying to figure out what to do with half a terabyte of archival TIFFS it occurred to me that perhaps someone else had this problem. We are starting to produce massive amounts of digital objects (videos, archival TIFFS, audio interviews). Up until now we have been dealing with ways to display them to the public. Now we are starting to look at dark archives like OCLC's digital archive product. I would welcome any suggestions from those of you who have dealt with this on an archival level. It's one thing to stick the stuff up on a server, but then what? Our CIO suggested storage appliances like this one http://www.drobo.com/products/index.php but I am wary of the proprietary RAID system. Thanks in advance, ~ Edward Iglesias Systems Librarian Central Connecticut State University
Re: [CODE4LIB] digital storage
I think you probably need to come up with a long-term plan with real institutional commitment. Storing files and making sure they are backed up is all well and good, but that is only one part of a long-term digital preservation project. How are you protecting against bit rot? What about formats that become obsolete? And so on. We are planning to outsource our long-term digital preservation because we do not have the staff to maintain it. Edward

Edward Iglesias wrote: As I was trying to figure out what to do with half a terabyte of archival TIFFS it occurred to me that perhaps someone else had this problem. We are starting to produce massive amounts of digital objects (videos, archival TIFFS, audio interviews). Up until now we have been dealing with ways to display them to the public. Now we are starting to look at dark archives like OCLC's digital archive product. I would welcome any suggestions from those of you who have dealt with this on an archival level. It's one thing to stick the stuff up on a server, but then what? Our CIO suggested storage appliances like this one http://www.drobo.com/products/index.php but I am wary of the proprietary RAID system. Thanks in advance, ~ Edward Iglesias Systems Librarian Central Connecticut State University
Re: [CODE4LIB] digital storage
Rosalyn's post made me think of one more thing: if you are looking into outside entities (as we are), what are the terms of service, and what guarantee do they offer that they won't lose your data? I believe that S3 does not offer any guarantee, so if you go with them, you probably want to have some other form of storage as well. Even if they offered a guarantee, what good is it once they lose the documents you were trying to preserve? Edward Corrado

Rosalyn Metz wrote: Hi Edward, Might I suggest you look into cloud computing services if you're looking at different options. (I know you're all shocked I suggested it). If our budget weren't so abysmal (and going to get worse) we would be using it right now rather than the snap server we purchased with leftover funds. The benefits of using the cloud is of course the elasticity it offers you. The negative is that you have to pay to put your files into the cloud and then pay again to take them out (and since we've already been slashed 30% and are guaranteed another slash...that idea was shot down). Of course the major player out there is Amazon S3. The problem is that you can't use S3 via Amazon's Web Management Console. But there is a company called RightScale (http://www.rightscale.com/index.php) which has a web management console that allows you to upload files quickly and easily without having to write scripts and what not. Anyway, just my two cents. Rosalyn

On Thu, Aug 27, 2009 at 8:10 AM, Edward Iglesias edwardigles...@gmail.com wrote: As I was trying to figure out what to do with half a terabyte of archival TIFFS it occurred to me that perhaps someone else had this problem. We are starting to produce massive amounts of digital objects (videos, archival TIFFS, audio interviews). Up until now we have been dealing with ways to display them to the public. Now we are starting to look at dark archives like OCLC's digital archive product. I would welcome any suggestions from those of you who have dealt with this on an archival level. It's one thing to stick the stuff up on a server, but then what? Our CIO suggested storage appliances like this one http://www.drobo.com/products/index.php but I am wary of the proprietary RAID system. Thanks in advance, ~ Edward Iglesias Systems Librarian Central Connecticut State University
Re: [CODE4LIB] digital storage
The basic idea of LOCKSS is always what I think of when it comes to archival: lots of copies. For my own personal archival stuff, I do use a Drobo...and have recommended that we get one of the new Drobo Pros for use here in the library. But not for archival, just for storage. For things that I really do not want to ever go away, I make sure that I have 3 copies: one remote, and at least two local. There are big advantages to the Drobo over traditional RAID, with about the same amount of risk overall. The Drobo is growable, and can use mix-and-match drives, which gives it, IMO, a leg up over traditional RAID. I'm a huge, huge fan. But for things I really care about, I'd have one copy on a server, one copy on a Drobo, and another copy in the cloud somewhere. Jason

On Thu, Aug 27, 2009 at 8:10 AM, Edward Iglesias edwardigles...@gmail.com wrote: As I was trying to figure out what to do with half a terabyte of archival TIFFS it occurred to me that perhaps someone else had this problem. We are starting to produce massive amounts of digital objects (videos, archival TIFFS, audio interviews). Up until now we have been dealing with ways to display them to the public. Now we are starting to look at dark archives like OCLC's digital archive product. I would welcome any suggestions from those of you who have dealt with this on an archival level. It's one thing to stick the stuff up on a server, but then what? Our CIO suggested storage appliances like this one http://www.drobo.com/products/index.php but I am wary of the proprietary RAID system. Thanks in advance, ~ Edward Iglesias Systems Librarian Central Connecticut State University

-- Follow me on Twitter! http://www.twitter.com/griffey
Re: [CODE4LIB] digital storage
- Jason Griffey grif...@gmail.com wrote: The basic idea of LOCKSS is always what I think of when it comes to archival: lots of copies. We're starting to use LOCKSS, in the form of a consortial Private LOCKSS Network (PLN), and it is proving to be useful. I'll be presenting on what we're doing at Access next month. Currently LOCKSS doesn't scale very high. The problem is that you need at least six boxes in your network, and if each site has 5 TB of stuff, then each box needs 30 TB of storage. Since LOCKSS uses 32-bit OpenBSD, and OpenBSD's support for attached storage isn't as good as some other OSs, you're pretty much limited to using local disk. That all said, LOCKSS is actively working to overcome the storage scalability issue, and better support for attached storage is being tested right now. Mark Mark Jordan Head of Library Systems W.A.C. Bennett Library, Simon Fraser University Burnaby, British Columbia, V5A 1S6, Canada Voice: 778.782.5753 / Fax: 778.782.3023 mjor...@sfu.ca
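Mark's sizing arithmetic can be written down as a rough sketch. It assumes, as his numbers imply, one box per site and every box holding a full copy of every site's content; the function names are made up for illustration and are not part of LOCKSS.

```python
def storage_per_box_tb(sites: int, tb_per_site: float) -> float:
    """Storage each box needs if it keeps a full copy of every site's content.

    Assumption (from the example in the message): one box per site,
    and every box replicates everything in the network.
    """
    return sites * tb_per_site


def network_total_tb(sites: int, tb_per_site: float) -> float:
    """Raw disk needed across the whole network under the same assumption."""
    return sites * storage_per_box_tb(sites, tb_per_site)


if __name__ == "__main__":
    # Six sites with 5 TB each, as in the message above.
    print(storage_per_box_tb(6, 5), "TB per box")
    print(network_total_tb(6, 5), "TB across the network")
```

The per-box figure grows linearly with the number of sites, and the network-wide total grows with its square, which is one way to see why local disk becomes the limiting factor.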
Re: [CODE4LIB] digital storage
Speaking of LOCKSS (and PLNs), there's the MetaArchive: http://www.metaarchive.org/ You might want to consider contacting them as well. Mark -Original Message- From: Code for Libraries [mailto:code4...@listserv.nd.edu] On Behalf Of Jason Griffey Sent: Thursday, August 27, 2009 9:59 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] digital storage The basic idea of LOCKSS is always what I think of when it comes to archival: lots of copies. For my own personal archival stuff, I do use a Drobo...and have recommended that we get one of the new Drobo Pros for use here in the library. But not for archival, just for storage. For things that I really do not want to ever go away, I make sure that I have 3 copies: one remote, and at least two local. There are bigadvantages to the Drobo over traditional RAID, and with about the same amount of risks overall. The Drobo is growable, and can use mix and match drives, which gives it, IMO, a leg up over traditional RAID. I'm a huge, huge fan. But for things I really care about, I'd have one copy on a server, one copy on a drobo, and another copy in the cloud somewhere. Jason On Thu, Aug 27, 2009 at 8:10 AM, Edward Iglesias edwardigles...@gmail.comwrote: As I was trying to figure out what to do with half a terabyte of archival TIFFS it occurred to me that perhaps someone else had this problem. We are starting to produce massive amounts of digital objects (videos, archival TIFFS, audio interviews). Up until now we have been dealing with ways to display them to the public. Now we are starting to look at dark archives like OCLC's digital archive product. I would welcome any suggestions from those of you who have dealt with this on an archival level. It's one thing to stick the stuff up on a server, but then what? Our CIO suggested storage appliances like this one http://www.drobo.com/products/index.php but I am wary of the proprietary RAID system. 
Thanks in advance, ~ Edward Iglesias Systems Librarian Central Connecticut State University -- Follow me on Twitter! http://www.twitter.com/griffey
Re: [CODE4LIB] digital storage
On Thu, 27 Aug 2009, Edward Iglesias wrote: As I was trying to figure out what to do with half a terabyte of archival TIFFS it occurred to me that perhaps someone else had this problem. We are starting to produce massive amounts of digital objects (videos, archival TIFFS, audio interviews). Up until now we have been dealing with ways to display them to the public. Now we are starting to look at dark archives like OCLC's digital archive product. I would welcome any suggestions from those of you who have dealt with this on an archival level. It's one thing to stick the stuff up on a server, but then what? Our CIO suggested storage appliances like this one

I'd recommend looking at two classes of products:

  Near-line storage
  MAID (Massive Array of Idle Disks)

Near line can be things like DVD jukeboxes, where you don't need someone to load the items manually, but it can't all be accessed at once. You put up lower-res JPEGs, what we call 'browse images', and then when someone wants the full-res, high-quality image, the request goes to the jukebox. You might have to wait between 15 sec and 2 minutes for the file. You can 'tune' them by adjusting the number of drives relative to the number of discs in them, to reduce the latency.

MAID systems are like RAID, but they spin down the disks when they're not in use, so they have a much lower power draw when used for storage. We've used them both as primary systems and as storage for our backups of more highly-available data.

... As for your comment on Drobo, we don't use that specific brand, but we do have a number of 4 or 6 disk RAID enclosures that we use both for transporting files (if someone needs to copy 2TB of files, we mail it to 'em, rather than send it over the network) and for our off-site storage of critical data.

-Joe
Re: [CODE4LIB] digital storage
I have to agree with Ed. You should have a good policy in place for backing up your data. Just throwing it on a server isn't a policy. At the same time I would have to disagree with Ed. You should look at S3 as if it was your own server. What is the guarantee that you supply to your users with your own server. The snap server we use here (instead of S3) is the back up to a back up system already in place. On Thu, Aug 27, 2009 at 9:52 AM, Edward M. Corrado ecorr...@ecorrado.uswrote: Rosalyn's post made me think of one more thing if you are looking into outside entities (such as we are), what are the terms of service and what guarantee do they offer they won't lose your data? I believe that A3 does not offer any guarantee, so if you go with them, you probably want to have some other form of storage as well. Even if they offered a guarantee, what good is it once they loose your documents you were trying to preserve? Edward Corrado Rosalyn Metz wrote: Hi Edward, Might I suggest you look into cloud computing services if you're looking at different options. (I know you're all shocked I suggested it). If our budget weren't so abysmal (and going to get worse) we would be using it right now rather than the snap server we purchased with leftover funds. The benefits of using the cloud is of course the elasticity it offers you. The negative is that you have to pay to put your files into the cloud and then pay again to take them out (and since we've already been slashed 30% and are guaranteed another slash...that idea was shot down). Of course the major player out there is Amazon S3. The problem is that you can't use S3 via Amazon's Web Management Console. But there is a company called RightScale (http://www.rightscale.com/index.php) which has a web management console that allows you to upload files quickly and easily without having to write scripts and what not. Anyway, just my two cents. 
Rosalyn On Thu, Aug 27, 2009 at 8:10 AM, Edward Iglesias edwardigles...@gmail.comwrote: As I was trying to figure out what to do with half a terabyte of archival TIFFS it occurred to me that perhaps someone else had this problem. We are starting to produce massive amounts of digital objects (videos, archival TIFFS, audio interviews). Up until now we have been dealing with ways to display them to the public. Now we are starting to look at dark archives like OCLC's digital archive product. I would welcome any suggestions from those of you who have dealt with this on an archival level. It's one thing to stick the stuff up on a server, but then what? Our CIO suggested storage appliances like this one http://www.drobo.com/products/index.php but I am wary of the proprietary RAID system. Thanks in advance, ~ Edward Iglesias Systems Librarian Central Connecticut State University
Re: [CODE4LIB] digital storage
Agreed on both of Rosalyn's points. I'm wary of the hot backup options discussed in this thread for large quantities of data. First of all, hot backup is expensive -- disks aren't that inexpensive, and after you add power and space, it gets much worse. Start keeping many copies, and the price gets much worse. LOCKSS is good for protecting articles since that is what it is designed to do. For a variety of reasons that go beyond cost, I think it's a hopeless model for backup. Even if money is no object, bandwidth is a huge issue. Transferring a few GB at a time is not a big deal, but it takes awhile. Transfer large quantities and you run into trouble quickly. Bit rot is not so much of an issue because you can check integrity regularly. For example, a bottom of the line EC2 instance could continuously monitor your S3 files. Of course, there is the whole practicality aspect -- backup must be convenient as well as effective. Different solutions strike me as appropriate to different situations, but as much as I hate tapes, they're effective, cheap, and efficient presuming you don't keep them on site and verify them. kyle On Thu, Aug 27, 2009 at 8:43 AM, Rosalyn Metzrosalynm...@gmail.com wrote: I have to agree with Ed. You should have a good policy in place for backing up your data. Just throwing it on a server isn't a policy. At the same time I would have to disagree with Ed. You should look at S3 as if it was your own server. What is the guarantee that you supply to your users with your own server. The snap server we use here (instead of S3) is the back up to a back up system already in place. On Thu, Aug 27, 2009 at 9:52 AM, Edward M. Corrado ecorr...@ecorrado.uswrote: Rosalyn's post made me think of one more thing if you are looking into outside entities (such as we are), what are the terms of service and what guarantee do they offer they won't lose your data? 
I believe that A3 does not offer any guarantee, so if you go with them, you probably want to have some other form of storage as well. Even if they offered a guarantee, what good is it once they loose your documents you were trying to preserve? Edward Corrado Rosalyn Metz wrote: Hi Edward, Might I suggest you look into cloud computing services if you're looking at different options. (I know you're all shocked I suggested it). If our budget weren't so abysmal (and going to get worse) we would be using it right now rather than the snap server we purchased with leftover funds. The benefits of using the cloud is of course the elasticity it offers you. The negative is that you have to pay to put your files into the cloud and then pay again to take them out (and since we've already been slashed 30% and are guaranteed another slash...that idea was shot down). Of course the major player out there is Amazon S3. The problem is that you can't use S3 via Amazon's Web Management Console. But there is a company called RightScale (http://www.rightscale.com/index.php) which has a web management console that allows you to upload files quickly and easily without having to write scripts and what not. Anyway, just my two cents. Rosalyn On Thu, Aug 27, 2009 at 8:10 AM, Edward Iglesias edwardigles...@gmail.comwrote: As I was trying to figure out what to do with half a terabyte of archival TIFFS it occurred to me that perhaps someone else had this problem. We are starting to produce massive amounts of digital objects (videos, archival TIFFS, audio interviews). Up until now we have been dealing with ways to display them to the public. Now we are starting to look at dark archives like OCLC's digital archive product. I would welcome any suggestions from those of you who have dealt with this on an archival level. It's one thing to stick the stuff up on a server, but then what? 
Our CIO suggested storage appliances like this one http://www.drobo.com/products/index.php but I am wary of the proprietary RAID system. Thanks in advance, ~ Edward Iglesias Systems Librarian Central Connecticut State University -- -- Kyle Banerjee Digital Services Program Manager Orbis Cascade Alliance baner...@uoregon.edu / 503.999.9787
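Kyle's point that bit rot can be caught by checking integrity regularly could look something like the sketch below. It is only an illustration: the choice of SHA-256, the manifest layout, and the function names are assumptions, not anything described in the thread, and it works on a local directory rather than on S3.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large TIFFs or videos are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record a checksum for every file under root (run once, at ingest)."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Re-hash the files and report anything missing or changed since ingest."""
    root = Path(root)
    problems = []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file():
            problems.append((rel, "missing"))
        elif sha256_of(path) != expected:
            problems.append((rel, "changed"))
    return problems
```

Run on a schedule, `verify` only tells you that a copy has gone bad; repairing it still depends on having the other copies people have been recommending in this thread.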
Re: [CODE4LIB] digital storage
Hi Kyle,

Kyle Banerjee kyle.baner...@gmail.com wrote: LOCKSS is good for protecting articles since that is what it is designed to do. For a variety of reasons that go beyond cost, I think it's a hopeless model for backup.

Just to clarify, I wasn't suggesting that LOCKSS is for backup; in its PLN form it's part of a more general collaborative preservation program that includes policies, business continuity plans, etc. It was never intended to be a backup tool. You're right about journal articles being central to its original design, but PLNs are simply another use for the platform; for example, content on a PLN is not restricted to public-facing versions, and it can be packaged up for long-term preservation. Mark
Re: [CODE4LIB] digital storage
Hi Rosalyn,

I probably wasn't clear. I didn't mean to say don't use cloud storage if you think it is a good solution; in many cases it could be. I meant that if you really want to preserve your data you need to do more than put it in the cloud (or, for that matter, on a local storage device). It is not a panacea. Just as if you were housing it locally, you need to make sure you have redundant copies.

Edward

Rosalyn Metz wrote:
> I have to agree with Ed. You should have a good policy in place for backing up your data. Just throwing it on a server isn't a policy. At the same time I would have to disagree with Ed. You should look at S3 as if it was your own server. What is the guarantee that you supply to your users with your own server? The snap server we use here (instead of S3) is the backup to a backup system already in place.
>
> On Thu, Aug 27, 2009 at 9:52 AM, Edward M. Corrado ecorr...@ecorrado.us wrote:
>> Rosalyn's post made me think of one more thing: if you are looking into outside entities (such as we are), what are the terms of service, and what guarantee do they offer that they won't lose your data? I believe that S3 does not offer any guarantee, so if you go with them, you probably want to have some other form of storage as well. Even if they offered a guarantee, what good is it once they lose the documents you were trying to preserve?
>>
>> Edward Corrado
>>
>> Rosalyn Metz wrote:
>>> Hi Edward,
>>>
>>> Might I suggest you look into cloud computing services if you're looking at different options. (I know you're all shocked I suggested it). If our budget weren't so abysmal (and going to get worse) we would be using it right now rather than the snap server we purchased with leftover funds. The benefit of using the cloud is of course the elasticity it offers you. The negative is that you have to pay to put your files into the cloud and then pay again to take them out (and since we've already been slashed 30% and are guaranteed another slash...that idea was shot down).
>>>
>>> Of course the major player out there is Amazon S3. The problem is that you can't use S3 via Amazon's Web Management Console. But there is a company called RightScale (http://www.rightscale.com/index.php) which has a web management console that allows you to upload files quickly and easily without having to write scripts and whatnot.
>>>
>>> Anyway, just my two cents.
>>> Rosalyn
>>>
>>> On Thu, Aug 27, 2009 at 8:10 AM, Edward Iglesias edwardigles...@gmail.com wrote:
>>>> As I was trying to figure out what to do with half a terabyte of archival TIFFs it occurred to me that perhaps someone else had this problem. We are starting to produce massive amounts of digital objects (videos, archival TIFFs, audio interviews). Up until now we have been dealing with ways to display them to the public. Now we are starting to look at dark archives like OCLC's digital archive product. I would welcome any suggestions from those of you who have dealt with this on an archival level. It's one thing to stick the stuff up on a server, but then what? Our CIO suggested storage appliances like this one http://www.drobo.com/products/index.php but I am wary of the proprietary RAID system.
>>>>
>>>> Thanks in advance,
>>>> ~
>>>> Edward Iglesias
>>>> Systems Librarian
>>>> Central Connecticut State University
Re: [CODE4LIB] digital storage
We have a historic idea of what it means to maintain space for analog collections. For many institutions a lot of that initial funding has come from capital building funds. While the technological solutions are not clear to me at this point (and I'm benefiting from this thread on that), I am not sure if this won't turn into more of a long-term business problem. Has anyone been able to give a projection to their management on what the total cost per TB is for preservation over even a short horizon of 10 years? --Jimmy -- Jimmy Ghaphery Head, Library Information Systems VCU Libraries http://www.library.vcu.edu --
Re: [CODE4LIB] digital storage
This would require multiple cases. But if they were distributed to different points, the chances of losing them all would be reduced...

On Thu, Aug 27, 2009 at 10:35 AM, David J. Fiander da...@fiander.info wrote:
> You know, putting Dick Cheney in a pelican case might have solved a lot of problems later on.
>
> - David
>
> On 27-Aug-2009, at 13:30, Rosalyn Metz wrote:
>> ah good. then we are agreeing. strike the whole disagree with ed portion of my email. also i like the pelican idea too. it reminds me of dick cheney in an undisclosed location.
>>
>> On Thu, Aug 27, 2009 at 1:15 PM, Edward M. Corrado ecorr...@ecorrado.us wrote:
>>> Hi Rosalyn,
>>>
>>> I probably wasn't clear. I didn't mean to say don't use cloud storage if you think it is a good solution; in many cases it could be. I meant that if you really want to preserve your data you need to do more than put it in the cloud (or, for that matter, on a local storage device). It is not a panacea. Just as if you were housing it locally, you need to make sure you have redundant copies.
>>>
>>> Edward
>>>
>>> Rosalyn Metz wrote:
>>>> I have to agree with Ed. You should have a good policy in place for backing up your data. Just throwing it on a server isn't a policy. At the same time I would have to disagree with Ed. You should look at S3 as if it was your own server. What is the guarantee that you supply to your users with your own server? The snap server we use here (instead of S3) is the backup to a backup system already in place.
>>>>
>>>> On Thu, Aug 27, 2009 at 9:52 AM, Edward M. Corrado ecorr...@ecorrado.us wrote:
>>>>> Rosalyn's post made me think of one more thing: if you are looking into outside entities (such as we are), what are the terms of service, and what guarantee do they offer that they won't lose your data? I believe that S3 does not offer any guarantee, so if you go with them, you probably want to have some other form of storage as well. Even if they offered a guarantee, what good is it once they lose the documents you were trying to preserve?
>>>>>
>>>>> Edward Corrado
>>>>>
>>>>> Rosalyn Metz wrote:
>>>>>> Hi Edward,
>>>>>>
>>>>>> Might I suggest you look into cloud computing services if you're looking at different options. (I know you're all shocked I suggested it). If our budget weren't so abysmal (and going to get worse) we would be using it right now rather than the snap server we purchased with leftover funds. The benefit of using the cloud is of course the elasticity it offers you. The negative is that you have to pay to put your files into the cloud and then pay again to take them out (and since we've already been slashed 30% and are guaranteed another slash...that idea was shot down).
>>>>>>
>>>>>> Of course the major player out there is Amazon S3. The problem is that you can't use S3 via Amazon's Web Management Console. But there is a company called RightScale (http://www.rightscale.com/index.php) which has a web management console that allows you to upload files quickly and easily without having to write scripts and whatnot.
>>>>>>
>>>>>> Anyway, just my two cents.
>>>>>> Rosalyn
>>>>>>
>>>>>> On Thu, Aug 27, 2009 at 8:10 AM, Edward Iglesias edwardigles...@gmail.com wrote:
>>>>>>> As I was trying to figure out what to do with half a terabyte of archival TIFFs it occurred to me that perhaps someone else had this problem. We are starting to produce massive amounts of digital objects (videos, archival TIFFs, audio interviews). Up until now we have been dealing with ways to display them to the public. Now we are starting to look at dark archives like OCLC's digital archive product. I would welcome any suggestions from those of you who have dealt with this on an archival level. It's one thing to stick the stuff up on a server, but then what? Our CIO suggested storage appliances like this one http://www.drobo.com/products/index.php but I am wary of the proprietary RAID system.
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>> ~
>>>>>>> Edward Iglesias
>>>>>>> Systems Librarian
>>>>>>> Central Connecticut State University

--
Kyle Banerjee
Digital Services Program Manager
Orbis Cascade Alliance
baner...@uoregon.edu / 503.999.9787
Re: [CODE4LIB] digital storage
> Has anyone been able to give a projection to their management on what the total cost per TB is for preservation over even a short horizon of 10 years?

The trick is that the cost varies drastically with the model employed. Preservation is insurance, plain and simple. If you buy more coverage, you're protected against a wider variety of threats.

The problem with most preservation discussions is that options are weighed only in the abstract. The best protection consumes significant financial and staff resources -- which reduces your ability to deliver services. Plus, there is no such thing as removing all risk.

The most appropriate model for an institution will vary depending on what they need to preserve, how much there is, and how they define acceptable risk. It's all a matter of defining where the lines are drawn.

There is a tendency to pretend that analog libraries are somehow safe, but even if theft/loss weren't issues, they get flooded and catch fire. In the bad ol' days, catalog drawers could be burned in protests, and in contemporary times, loss of vendor support for your system or other problems represents a real threat.

kyle
Re: [CODE4LIB] digital storage
yep, good points, agree all 'round. One thing in the analog world that may be appropriate is that we do not view all collections as equal. In kicking this around locally we've been discussing different levels (or insurance policies) per collection depending on things like how unique it is, whether it is born-digital, the cost to re-scan, etc. I still tend to think that the TCO of this is generally underestimated, in part due to consumer prices for storage.

Kyle Banerjee wrote:
>> Has anyone been able to give a projection to their management on what the total cost per TB is for preservation over even a short horizon of 10 years?
>
> The trick is that the cost varies drastically with the model employed. Preservation is insurance, plain and simple. If you buy more coverage, you're protected against a wider variety of threats.
>
> The problem with most preservation discussions is that options are weighed only in the abstract. The best protection consumes significant financial and staff resources -- which reduces your ability to deliver services. Plus, there is no such thing as removing all risk.
>
> The most appropriate model for an institution will vary depending on what they need to preserve, how much there is, and how they define acceptable risk. It's all a matter of defining where the lines are drawn.
>
> There is a tendency to pretend that analog libraries are somehow safe, but even if theft/loss weren't issues, they get flooded and catch fire. In the bad ol' days, catalog drawers could be burned in protests, and in contemporary times, loss of vendor support for your system or other problems represents a real threat.
>
> kyle

--
Jimmy Ghaphery
Head, Library Information Systems
VCU Libraries
http://www.library.vcu.edu
--
Re: [CODE4LIB] digital storage
On Aug 27, 2009, at 6:22 AM, Rosalyn Metz wrote:
> Might I suggest you look into cloud computing services if you're looking at different options. (I know you're all shocked I suggested it). If our budget weren't so abysmal (and going to get worse) we would be using it right now rather than the snap server we purchased with leftover funds. The benefit of using the cloud is of course the elasticity it offers you. The negative is that you have to pay to put your files into the cloud and then pay again to take them out (and since we've already been slashed 30% and are guaranteed another slash...that idea was shot down).

I did a rough cost analysis of S3 as an offsite archive of roughly 20TB of data, with estimated growth of 6-8TB per year based on current growth rates. It ended up looking something like this:

    $1.80 * 2storage
    $2.04 * 2data transfer

    $36,000     year 1 storage (20TB)
    $40,800     year 1 data transfer (20TB)
    $46,800     year 2 storage (26TB)
    $12,240     year 2 data transfer (6TB)
    $61,200     year 3 storage (34TB)
    $16,320     year 3 data transfer (8TB)
    --------
    $213,360    over 3 years

This only took into account storage and data transfer costs, and did not include READ/WRITE request costs. Granted, this was a while ago. I haven't checked to see if Amazon has changed any of their pricing or policies, so this could be out of date. It looks like the data transfer cost could be avoided by shipping the data to them, although I don't know if they will do that for large amounts of data.

If you're ONLY looking at storage costs, SATA drives in enterprise RAID systems range from about $1.00/GB to about $1.25/GB for online storage. If you don't need immediate access to files, then nearline and offline storage is much cheaper. I can't find the exact figures, but LTO-4 tapes have an 800GB native / 1.6TB compressed capacity, at a cost of something like $0.25/GB.

Also, don't rule out compression. The TIFF files that I was told were not compressible I was able to compress down from about 20TB to about 4TB using bzip2 -9. It will require some intermediate decompression when someone needs to use them, but it's a lot less expensive to store 4TB than 20TB. You could even decompress the files on-the-fly without too much effort.

Ryan

--
Ryan Ordway                         E-mail: rord...@oregonstate.edu
Unix Systems Administrator                  rord...@library.oregonstate.edu
OSU Libraries, Corvallis, OR 97331  Office: Valley Library #4657
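
For anyone who wants to rerun the estimate above with their own volumes, here is a minimal sketch in Python. The two per-TB rates are not taken from Amazon's price list; they are simply what the totals above imply ($36,000 / 20TB for a year of storage, $40,800 / 20TB for transfer in). The function name and its arguments are made up for illustration, and request charges are left out, as in the original estimate.

```python
# Rates implied by the totals quoted in the estimate above, NOT current
# Amazon pricing: $36,000 / 20 TB stored for a year, $40,800 / 20 TB sent.
STORAGE_PER_TB_YEAR = 1_800   # USD per TB stored per year
TRANSFER_PER_TB = 2_040       # USD per TB transferred in


def s3_estimate(initial_tb, yearly_growth_tb):
    """Return (rows, total) for storage plus inbound transfer.

    initial_tb is loaded in year 1; yearly_growth_tb lists the TB added
    in each following year. Each row is
    (year, tb_stored, storage_cost, tb_added, transfer_cost).
    """
    rows = []
    stored = 0
    for year, added in enumerate([initial_tb, *yearly_growth_tb], start=1):
        stored += added
        storage = stored * STORAGE_PER_TB_YEAR
        transfer = added * TRANSFER_PER_TB
        rows.append((year, stored, storage, added, transfer))
    total = sum(row[2] + row[4] for row in rows)
    return rows, total


# The scenario from the message: 20 TB up front, then +6 TB and +8 TB.
rows, total = s3_estimate(20, [6, 8])
for year, stored, storage, added, transfer in rows:
    print(f"year {year}: {stored} TB stored ${storage:,}; "
          f"{added} TB transferred ${transfer:,}")
print(f"total over {len(rows)} years: ${total:,}")  # total over 3 years: $213,360
```

Changing the two constants is enough to redo the sum against whatever the current price sheet says.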
Re: [CODE4LIB] digital storage
On Thu, 27 Aug 2009, Jimmy Ghaphery wrote:
> We have a historic idea of what it means to maintain space for analog collections. For many institutions a lot of that initial funding has come from capital building funds. While the technological solutions are not clear to me at this point (and I'm benefiting from this thread on that), I am not sure if this won't turn into more of a long-term business problem. Has anyone been able to give a projection to their management on what the total cost per TB is for preservation over even a short horizon of 10 years?

I think it was the NSSDC (National Space Science Data Center) who had done some estimates. I can't remember exactly what they were, but I do remember that they had basically made the assumption that storage would continue to get cheaper and larger, and that computers to handle any verification and reformatting would get faster, resulting in the costs dropping off exponentially.

The result was that, if you were to convert to present dollars (to charge the group whose data you were taking), the cost of short-term storage (~20 years?) was about the same as that of indefinite storage.

Unfortunately, I can't remember who it was (Tim Eastman? Ed Grayzek? Joe King? ... I don't think it was Don Sawyer) or where it was (Tim gave a talk at ASIST in 2006; almost all of 'em spoke at the Science Archives in the 21st Century workshop).

As I can't find the source of that, I don't know if this just came down to the technical aspects, or if it also included issues in understanding the data being preserved. It's possible that they use the PDS cost analysis, which assumes that those costs are up-front:

http://pds.nasa.gov/tools/cost-analysis-tool.shtml

(Phases A-D are before the mission even launches; phase E is basically everything once data starts being collected)

...

And to look at some of the costs that you have to consider when archiving, see, from 1999 (so the numbers won't be right for today):

How Many Terabytes Was That? Archiving and Serving Solar Space Data Without Losing Your Shirt
http://umbra.nascom.nasa.gov/aas_spd/abstracts/aas199906.pdf

-Joe

ps. I looked at the pricing for cloud storage, where we'd only be holding 60TB at any one time, but adding 2TB per day, and the prices were insane before we even estimated people downloading the data.
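
The "20 years costs about the same as forever" result follows from a geometric series, and a small Python sketch shows why. This is only an illustration of the arithmetic under the stated assumption of exponentially falling costs, not the NSSDC or PDS model: the 20% yearly decline and the $1,000 first-year cost are hypothetical numbers, the function name is invented, and discounting to present dollars is ignored.

```python
def cumulative_cost(first_year_cost, annual_decline, years=None):
    """Total cost when each year costs (1 - annual_decline) times the last.

    With years=None, return the limit of the geometric series, i.e. the
    cost of keeping the data forever under the same assumption.
    """
    if not 0 < annual_decline < 1:
        raise ValueError("annual_decline must be between 0 and 1")
    if years is None:
        return first_year_cost / annual_decline
    ratio = 1 - annual_decline
    return first_year_cost * (1 - ratio ** years) / annual_decline


# Hypothetical inputs: $1,000 in year one, costs falling 20% per year.
twenty_years = cumulative_cost(1000, 0.20, years=20)
forever = cumulative_cost(1000, 0.20)
print(f"20 years: ${twenty_years:,.0f}")
print(f"forever:  ${forever:,.0f}")
print(f"share of the 'forever' cost paid in the first 20 years: "
      f"{twenty_years / forever:.1%}")
```

With a decline that steep, nearly all of the lifetime cost falls in the first two decades. A slower decline (try 0.05) leaves a much larger tail, so the conclusion depends heavily on the assumed rate.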
Re: [CODE4LIB] digital storage
On Thu, Aug 27, 2009 at 1:57 PM, Ryan Ordway rord...@oregonstate.edu wrote:
> $213,360 over 3 years
>
> If you're ONLY looking at storage costs, SATA drives in enterprise RAID systems range from about $1.00/GB to about $1.25/GB for online storage.

Yeah -- but if you're looking only at storage costs, you'll have an inaccurate estimate of your costs. You've got power, cooling, sysadmin time, and replacements for failed disks. If you want an apples-to-apples comparison, you'll want an offsite mirror, as well.

I'm not saying S3 is always cost-effective -- but in our experience, the cost of the disks themselves is dwarfed by the costs of the related infrastructure.

Cheers,
-Nate
Waisman Lab for Brain Imaging, UW-Madison
Re: [CODE4LIB] digital storage
Nate Vack wrote:
> On Thu, Aug 27, 2009 at 1:57 PM, Ryan Ordway rord...@oregonstate.edu wrote:
>> $213,360 over 3 years
>>
>> If you're ONLY looking at storage costs, SATA drives in enterprise RAID systems range from about $1.00/GB to about $1.25/GB for online storage.
>
> Yeah -- but if you're looking only at storage costs, you'll have an inaccurate estimate of your costs. You've got power, cooling, sysadmin time, and replacements for failed disks. If you want an apples-to-apples comparison, you'll want an offsite mirror, as well.
>
> I'm not saying S3 is always cost-effective -- but in our experience, the cost of the disks themselves is dwarfed by the costs of the related infrastructure.

I agree that the cost of storage is only one factor. I have to wonder though, how much more staff time do you need for local storage than cloud storage? I don't know the answer, but I'm not sure it is much more than setting up S3 storage, especially if you have a good partnership with your storage vendor.

With cloud storage you still need other backups and mirrors, so I don't see the off-site mirror as an argument in favor of the cloud. You should have that redundancy either way.

Yes, maybe you save on staff time patching software on your storage array, but that is not a significant amount of time - esp. since you are still going to have some local storage, and there isn't much difference in staff time in doing 2 TB vs. 20 TB.

You may save some time on the initial configuration, but you still need to configure cloud storage. Is cloud storage that much easier/less time consuming to configure than an iSCSI device? Replacement for disks would be covered under your warranty or support contract (at least I would hope you would have one).

The power and cooling can be a savings, but in many cases the library or individual departments don't pay for electricity, so while *someone* pays the cost, it might not be the individual department. Cooling and electricity costs are actually a great argument for tape for large-scale storage. Tape might seem old fashioned, but in many applications it offers by far the best value for long-term storage per GB.

Again, I'm not totally against the cloud and there are some things I think it could be very useful for, but the cloud doesn't make up for the lack of (or just bad) planning. As someone else said during this thread, this is really more of a management issue than it is a technology issue. Yes, technology is involved in the solution, but proper planning and long-term commitment are more important than the technology du jour.

There are many different options from cloud to tape to disk, but no matter what you choose, without a long-term digital preservation plan you might be doing storage but you are not doing preservation.

Edward

> Cheers,
> -Nate
> Waisman Lab for Brain Imaging, UW-Madison
Re: [CODE4LIB] digital storage
I've been pondering this a lot lately. We're starting from the ground up on a concerted digital asset management effort after years of one-off solutions. When I arrived, I inherited piles of CDs and DVDs, things stashed on servers all over the place, etc. I am now implementing a digital asset management system (ResourceSpace) to start ordering all this, which will bly tie into our new collections management system and new web content management system.

For the moment, I have written a script to copy the resource and preview assets from ResourceSpace to a bucket on S3. (To save bandwidth/time I also used the batch load capability to ship them a hard drive with about 500 GB of data a few weeks ago.) So I now have two copies of all images: one protected by RAID on our iSCSI storage box, and one theoretically spread across multiple data centers at Amazon. Ideally I'd like to have one other copy at one of our remote offices (either online or offline), but that's for the future.

I'm not sure we've entirely come to terms with the long-term cost of preserving the material. We're buying enough local storage to get through our grant-funded ramp-up. After that, replacing/adding drives and servers is going to have to be considered as much a preservation/conservation expense as replacing a leaky roof. But it's a relatively new expense (or at least orders of magnitude bigger than it has been for other data systems), so it's something we're going to have to educate people on.

-David Dwiggins
Historic New England

__
David Dwiggins
Systems Librarian/Archivist, Historic New England
141 Cambridge Street, Boston, MA 02114
(617) 227-3956 x 242
ddwiggins [at] historicnewengland.org
http://www.historicnewengland.org

On 8/27/2009 1:37 PM, Jimmy Ghaphery jghap...@vcu.edu wrote:
> We have a historic idea of what it means to maintain space for analog collections. For many institutions a lot of that initial funding has come from capital building funds. While the technological solutions are not clear to me at this point (and I'm benefiting from this thread on that), I am not sure if this won't turn into more of a long-term business problem. Has anyone been able to give a projection to their management on what the total cost per TB is for preservation over even a short horizon of 10 years?
>
> --Jimmy
>
> --
> Jimmy Ghaphery
> Head, Library Information Systems
> VCU Libraries
> http://www.library.vcu.edu
> --
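
Keeping two or three copies only helps if someone checks now and then that they still match. As a rough illustration (this is not the script mentioned above, and it does not talk to S3 at all), here is a Python sketch that builds a SHA-256 manifest for a directory tree and compares two manifests. It assumes, purely for the example, that the second copy can be read as an ordinary directory, such as a restored or mounted copy; all function names are invented.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large TIFFs do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(primary, copy):
    """Return (missing, changed): files absent from, or different in, the copy."""
    missing = sorted(set(primary) - set(copy))
    changed = sorted(
        name for name in primary.keys() & copy.keys()
        if primary[name] != copy[name]
    )
    return missing, changed
```

A periodic job could call `build_manifest` on each copy and report anything `compare_manifests` returns. Storing the manifest itself alongside each copy also gives a baseline for spotting silent corruption later.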
Re: [CODE4LIB] digital storage
On Thu, Aug 27, 2009 at 3:25 PM, Edward M. Corrado ecorr...@ecorrado.us wrote:
> Yes, maybe you save on staff time patching software on your storage array, but that is not a significant amount of time - esp. since you are still going to have some local storage, and there isn't much difference in staff time in doing 2 TB vs. 20 TB.

Well... with 2TB you might try and get away with a few 1TB disks slapped onto a Promise RAID card or something. With 20TB, you're probably at least seriously considering a SAN.

> There are many different options from cloud to tape to disk, but no matter what you choose without a long term digital preservation plan, you might be doing storage but you are not doing preservation.

If I already did enterprise storage, I wouldn't really consider cloud storage -- it's unlikely to be cheaper enough to outweigh its disadvantages. If I didn't already do it, and had to build staff expertise and buy big expensive kit... I'd look at the cloud a bit more seriously.

Even if you don't use it for storage, S3 is particularly useful as a reality check in the planning process. Once you model all of the costs (even if you don't pay power, model it at market rates), you're unlikely to beat Amazon's price. If you think you're doing so, you're probably being optimistic or missing something.

Cheers,
-Nate
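
One way to run that reality check is to write the local costs down as a formula and see how much room is left under the cloud quote. The Python sketch below is a deliberately crude model, not anyone's actual budget: the function and parameter names are invented, the $213,360 S3 figure and the $30,000 array price are the ones quoted elsewhere in this thread, and power, cooling, staff and mirror costs are set to zero only because the thread gives no numbers for them.

```python
def local_storage_cost(hardware, support_multiplier, power_cooling_per_year,
                       staff_per_year, offsite_mirror, years):
    """Rough total for running storage in-house over a number of years."""
    one_time = hardware * support_multiplier + offsite_mirror
    recurring = (power_cooling_per_year + staff_per_year) * years
    return one_time + recurring


S3_THREE_YEAR_ESTIMATE = 213_360  # Ryan Ordway's estimate from this thread
ARRAY_PRICE = 30_000              # array price quoted by Edward Corrado

# Hardware only, every other cost unknown and therefore set to zero here.
hardware_only = local_storage_cost(ARRAY_PRICE, 1, 0, 0, 0, years=3)
# Doubling the hardware price as a stand-in for support costs.
with_support = local_storage_cost(ARRAY_PRICE, 2, 0, 0, 0, years=3)

print(S3_THREE_YEAR_ESTIMATE - hardware_only)       # 183360
# Yearly budget left for power, cooling, staff and a mirror before the
# local option costs as much as the S3 estimate:
print((S3_THREE_YEAR_ESTIMATE - with_support) / 3)  # 51120.0
```

Filling in real figures for the zeroed parameters, at market rates even where another department pays the bill, is the part that decides which side of the comparison wins.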
Re: [CODE4LIB] digital storage
On Thu, Aug 27, 2009 at 4:25 PM, Edward M. Corrado ecorr...@ecorrado.us wrote:
> Nate Vack wrote:
>> On Thu, Aug 27, 2009 at 1:57 PM, Ryan Ordway rord...@oregonstate.edu wrote:
>>> $213,360 over 3 years
>>>
>>> If you're ONLY looking at storage costs, SATA drives in enterprise RAID systems range from about $1.00/GB to about $1.25/GB for online storage.
>>
>> Yeah -- but if you're looking only at storage costs, you'll have an inaccurate estimate of your costs. You've got power, cooling, sysadmin time, and replacements for failed disks. If you want an apples-to-apples comparison, you'll want an offsite mirror, as well.
>>
>> I'm not saying S3 is always cost-effective -- but in our experience, the cost of the disks themselves is dwarfed by the costs of the related infrastructure.
>
> I agree that the cost of storage is only one factor. I have to wonder though, how much more staff time do you need for local storage than cloud storage? I don't know the answer, but I'm not sure it is much more than setting up S3 storage, especially if you have a good partnership with your storage vendor.

Support relationships, especially regarding storage, are very costly. When I worked at a midsize datacenter, we implemented a backup solution with STORServer and Tivoli. Both hardware and software were considerably costly. Initial and ongoing support, while indispensable, was basically as much as the cost of the hardware every few years.

> With cloud storage you still need other backups and mirrors, so I don't see the off-site mirror as an argument in favor of the cloud. You should have that redundancy either way.

You have the original, and the copy, wherever it is. So you can build a rack elsewhere (and reintroduce power, cooling, security and bandwidth costs), or get a tape rotation scheme in place, or whatever, but a cloud-based backup is already offsite, whereas an in-house tape library (like our STORServer) still requires a staffer to populate the lockbox to be picked up (we used Iron Mountain, then later Cintas).

> Yes, maybe you save on staff time patching software on your storage array, but that is not a significant amount of time - esp. since you are still going to have some local storage, and there isn't much difference in staff time in doing 2 TB vs. 20 TB.

There's a real difference. I can get 2 TB in a single HDD, for example this one for $200 at NewEgg:
http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413

Any high school kid can install that. 20 TB requires some kind of additional structure and additional expertise.

> You may save some time on the initial configuration, but you still need to configure cloud storage. Is cloud storage that much easier/less time consuming to configure than an iSCSI device? Replacement for disks would be covered under your warranty or support contract (at least I would hope you would have one).

Warranties expire and force you into ill-timed, hardly-afforded and dangerous-to-your-data upgrades. Sorta like some ILS systems with which we are all familiar. The cloud doesn't necessarily stay the same, but the part you care about (data in, data out) does.

> The power and cooling can be a savings, but in many cases the library or individual departments don't pay for electricity, so while *someone* pays the cost, it might not be the individual department. Cooling and electricity costs are actually a great argument for tape for large-scale storage. Tape might seem old fashioned, but in many applications it offers by far the best value for long-term storage per GB.

It's true, tape is still a worthwhile option. Alternatives like optical or magneto-optical media just have not kept up.

> Again, I'm not totally against the cloud and there are some things I think it could be very useful for, but the cloud doesn't make up for the lack of (or just bad) planning.

Yeah, there's no system good enough to compensate for bad planning and management.

--Joe
Re: [CODE4LIB] digital storage
Joe Atzberger wrote:
> On Thu, Aug 27, 2009 at 4:25 PM, Edward M. Corrado ecorr...@ecorrado.us wrote:
>> Nate Vack wrote:
>>> On Thu, Aug 27, 2009 at 1:57 PM, Ryan Ordway rord...@oregonstate.edu wrote:
>>>> $213,360 over 3 years
>>>>
>>>> If you're ONLY looking at storage costs, SATA drives in enterprise RAID systems range from about $1.00/GB to about $1.25/GB for online storage.
>>>
>>> Yeah -- but if you're looking only at storage costs, you'll have an inaccurate estimate of your costs. You've got power, cooling, sysadmin time, and replacements for failed disks. If you want an apples-to-apples comparison, you'll want an offsite mirror, as well.
>>>
>>> I'm not saying S3 is always cost-effective -- but in our experience, the cost of the disks themselves is dwarfed by the costs of the related infrastructure.
>>
>> I agree that the cost of storage is only one factor. I have to wonder though, how much more staff time do you need for local storage than cloud storage? I don't know the answer, but I'm not sure it is much more than setting up S3 storage, especially if you have a good partnership with your storage vendor.
>
> Support relationships, especially regarding storage, are very costly. When I worked at a midsize datacenter, we implemented a backup solution with STORServer and Tivoli. Both hardware and software were considerably costly. Initial and ongoing support, while indispensable, was basically as much as the cost of the hardware every few years.

They can be, depending on what you are doing and what software choices you make, but for long-term preservation purposes they don't have to be nearly as expensive as what Ryan calculated S3 to cost. If you shop around you can get a quality 36TB array with a 3-year warranty for, say, $30,000. That is almost $180,000 less than S3 (probably much less; I'm being less than generous with my Sun discounts and only briefly looked at their prices). Even if we double the cost to cover support, it is still over $50,000 a year less for 3 years. Yes, we might need some expertise, but running a 36TB preservation storage array is not a $50,000 a year job, and besides, what is wrong with growing local expertise?

...

>> Yes, maybe you save on staff time patching software on your storage array, but that is not a significant amount of time - esp. since you are still going to have some local storage, and there isn't much difference in staff time in doing 2 TB vs. 20 TB.
>
> There's a real difference. I can get 2 TB in a single HDD, for example this one for $200 at NewEgg:
> http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413
>
> Any high school kid can install that. 20 TB requires some kind of additional structure and additional expertise.

Well, building a 20 TB storage device and getting it to work can actually be very cheap and doesn't require a PhD (just a local GNU/Linux geek who likes to play with hardware) if you are OK with a home-grown solution. I wouldn't be satisfied with that, but I don't see how a commercial offering adds up to $150,000 worth of expertise and infrastructure.

>> You may save some time on the initial configuration, but you still need to configure cloud storage. Is cloud storage that much easier/less time consuming to configure than an iSCSI device? Replacement for disks would be covered under your warranty or support contract (at least I would hope you would have one).
>
> Warranties expire and force you into ill-timed, hardly-afforded and dangerous-to-your-data upgrades. Sorta like some ILS systems with which we are all familiar.

Yes, some application upgrades can cause issues, but how is that different if your application and/or storage is in a cloud?

> The cloud doesn't necessarily stay the same, but the part you care about (data in, data out) does.

How do you know they won't change their cloud models? And you don't even have a warranty with the cloud. They won't even guarantee they won't delete your data.

As long as you use a common, standards-based method of storage, you won't have any more issues getting it to work than you will getting future application servers to work with the cloud. While I'm not a huge fan of NFS, I've been using it for many years with no problems due to changes in NFS or operating systems or hardware. NFS has been available to the public for about 20 years. Occasionally you may need to migrate it from one platform or one machine to another, but you may very well need to do that with clouds as well. Maybe you are using S3 but for whatever reason Sun gives you a better deal with better terms and guarantees for using their cloud. Maybe Amazon drops S3. Maybe S3 moves servers to a country where you are not legally allowed to keep your data. Yes, you have to plan for migration to new platforms, but I fail to see how you don't need to do that with the cloud. Really any major