Re: [CODE4LIB] Amazon Glacier - tracking deposits
Hi Sara,

At OU Libraries, we've just started using Glacier in earnest. We're tracking our Glacier archives in DynamoDB tables. I've whipped up a little Python script to stick LC bags into Glacier and make them easier for us to keep track of and retrieve.

https://github.com/OULibraries/FreezerBag

On Thu, Apr 9, 2015 at 11:27 AM, Kyle Banerjee kyle.baner...@gmail.com wrote:

Howdy Sara,

I've played around a bit with Glacier. It's a bit weird to work with, but tools keep on improving.

The real question is what you hope to accomplish with it. As its name implies, it's designed for stuff that is basically frozen. When you take things out, you need to do so very slowly. The pricing model is such that if you try to pull out stuff quickly (e.g. you're trying to restore a system), the cost goes into the stratosphere -- definitely model what things would look like before using it for purposes like backup.

However, if you have access images that are already backed up on disk or tape offsite (i.e. system recovery needs are already taken care of) and this is just for storage of high-res scans, Glacier could be a good way to go.

As far as the IDs go, I'd embed them directly into the access image metadata. That way, it's impossible to lose the connection between the image and the master. You can keep it elsewhere as well, but embedded metadata is a great place to store critical identifiers.

kyle

On Wed, Apr 8, 2015 at 3:32 PM, Sara Amato sam...@willamette.edu wrote:

Has anyone leapt on board with Glacier? We are considering using it for long-term storage of high-res archival scans. We have derivative copies for dissemination, so we don't intend to touch these often, if ever.

The question I have is how best to track the Archive ID that Glacier attaches to deposits, as it looks like that is the only way to retrieve information if needed (though you can also attach a brief description that appears in the inventory along with the ID).

We're considering putting the ID in Archivist Toolkit, where the location of the dissemination copies is noted, but I'm wondering if there are other tools out there specific to this scenario that people are using.

-- Jason
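For readers who want a concrete picture of tracking Glacier archives in a DynamoDB table, as described in the message above, here is a minimal sketch. It is not taken from the FreezerBag script. It assumes the modern boto3 library (`upload_archive` on the Glacier client, `put_item` on a DynamoDB table resource), and the function name, item layout, and key names are all hypothetical.

```python
import hashlib


def deposit_and_record(glacier, table, vault_name, bag_name, payload):
    """Upload one archive to Glacier and record its Archive ID in DynamoDB.

    `glacier` is assumed to behave like boto3.client("glacier") and `table`
    like boto3.resource("dynamodb").Table(...). The item layout below is a
    hypothetical example, not a required schema.
    """
    response = glacier.upload_archive(
        vaultName=vault_name,
        archiveDescription=bag_name,  # shows up later in the vault inventory
        body=payload,
    )
    item = {
        "bag_name": bag_name,  # hypothetical partition key
        "vault_name": vault_name,
        "archive_id": response["archiveId"],
        "location": response["location"],
        # Plain SHA-256 of the payload, kept locally for fixity checks.
        "sha256": hashlib.sha256(payload).hexdigest(),
        "size_bytes": len(payload),
    }
    table.put_item(Item=item)
    return item


# Possible wiring, assuming boto3 is installed and credentials are configured:
#   import boto3
#   glacier = boto3.client("glacier")
#   table = boto3.resource("dynamodb").Table("glacier_archives")  # hypothetical name
#   deposit_and_record(glacier, table, "my-vault", "bag-001", open("bag-001.tar", "rb").read())
```

Passing the two clients in as arguments keeps the function easy to exercise with stand-in objects before pointing it at a real account.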
Re: [CODE4LIB] Amazon Glacier - tracking deposits
Have you looked at Google's Cloud Storage Nearline? It is about $0.01 per gigabyte per month, with about a 3-second access time.

http://googlecloudplatform.blogspot.com/2015/03/introducing-Google-Cloud-Storage-Nearline-near-online-data-at-an-offline-price.html

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Cary Gordon
Sent: Wednesday, April 08, 2015 7:49 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Amazon Glacier - tracking deposits

We have been playing with Glacier, but so far neither we nor our clients have been convinced of its cost-effectiveness. A while back, we were discussing a project with 15 PB of archival assets, and that would certainly have made Glacier cost-effective, saving about $30k/mo. over S3, although requests could cut into that.

The Glacier location is in the format /Account ID/vaults/Vault Name/archives/Archive ID, so you might want to consider using the whole string.

Thanks,

Cary

On Apr 8, 2015, at 3:32 PM, Sara Amato sam...@willamette.edu wrote:

Has anyone leapt on board with Glacier? We are considering using it for long-term storage of high-res archival scans. We have derivative copies for dissemination, so we don't intend to touch these often, if ever.

The question I have is how best to track the Archive ID that Glacier attaches to deposits, as it looks like that is the only way to retrieve information if needed (though you can also attach a brief description that appears in the inventory along with the ID).

We're considering putting the ID in Archivist Toolkit, where the location of the dissemination copies is noted, but I'm wondering if there are other tools out there specific to this scenario that people are using.
Re: [CODE4LIB] Amazon Glacier - tracking deposits
Howdy Sara,

I've played around a bit with Glacier. It's a bit weird to work with, but tools keep on improving.

The real question is what you hope to accomplish with it. As its name implies, it's designed for stuff that is basically frozen. When you take things out, you need to do so very slowly. The pricing model is such that if you try to pull out stuff quickly (e.g. you're trying to restore a system), the cost goes into the stratosphere -- definitely model what things would look like before using it for purposes like backup.

However, if you have access images that are already backed up on disk or tape offsite (i.e. system recovery needs are already taken care of) and this is just for storage of high-res scans, Glacier could be a good way to go.

As far as the IDs go, I'd embed them directly into the access image metadata. That way, it's impossible to lose the connection between the image and the master. You can keep it elsewhere as well, but embedded metadata is a great place to store critical identifiers.

kyle

On Wed, Apr 8, 2015 at 3:32 PM, Sara Amato sam...@willamette.edu wrote:

Has anyone leapt on board with Glacier? We are considering using it for long-term storage of high-res archival scans. We have derivative copies for dissemination, so we don't intend to touch these often, if ever.

The question I have is how best to track the Archive ID that Glacier attaches to deposits, as it looks like that is the only way to retrieve information if needed (though you can also attach a brief description that appears in the inventory along with the ID).

We're considering putting the ID in Archivist Toolkit, where the location of the dissemination copies is noted, but I'm wondering if there are other tools out there specific to this scenario that people are using.
Re: [CODE4LIB] Amazon Glacier - tracking deposits
Be aware of the data transfer cost if you are using Glacier. Glacier is an excellent choice for archival use, but you want to be sure these files will not be accessed often. You should consider the total cost of ownership, including the data transfer cost, which could be very expensive if you retrieve more than about 5% of your data (I'm not sure of the exact figure). It adds up quickly if you do not check carefully.

I have a forthcoming article in Library Hi Tech discussing Amazon S3 and Glacier, including the history of data transfer and storage costs over the past 7 years.

For IDs, I designed and implemented a unique persistent ID system for all the digital files (which is also used as a DOI if needed).

Yan Han
The University of Arizona Libraries

On 4/9/15, 4:13 AM, Scancella, John j...@loc.gov wrote:

Have you looked at Google's Cloud Storage Nearline? It is about $0.01 per gigabyte per month, with about a 3-second access time.

http://googlecloudplatform.blogspot.com/2015/03/introducing-Google-Cloud-Storage-Nearline-near-online-data-at-an-offline-price.html

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Cary Gordon
Sent: Wednesday, April 08, 2015 7:49 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Amazon Glacier - tracking deposits

We have been playing with Glacier, but so far neither we nor our clients have been convinced of its cost-effectiveness. A while back, we were discussing a project with 15 PB of archival assets, and that would certainly have made Glacier cost-effective, saving about $30k/mo. over S3, although requests could cut into that.

The Glacier location is in the format /Account ID/vaults/Vault Name/archives/Archive ID, so you might want to consider using the whole string.

Thanks,

Cary

On Apr 8, 2015, at 3:32 PM, Sara Amato sam...@willamette.edu wrote:

Has anyone leapt on board with Glacier? We are considering using it for long-term storage of high-res archival scans. We have derivative copies for dissemination, so we don't intend to touch these often, if ever.

The question I have is how best to track the Archive ID that Glacier attaches to deposits, as it looks like that is the only way to retrieve information if needed (though you can also attach a brief description that appears in the inventory along with the ID).

We're considering putting the ID in Archivist Toolkit, where the location of the dissemination copies is noted, but I'm wondering if there are other tools out there specific to this scenario that people are using.
[CODE4LIB] Amazon Glacier - tracking deposits
Has anyone leapt on board with Glacier? We are considering using it for long-term storage of high-res archival scans. We have derivative copies for dissemination, so we don't intend to touch these often, if ever.

The question I have is how best to track the Archive ID that Glacier attaches to deposits, as it looks like that is the only way to retrieve information if needed (though you can also attach a brief description that appears in the inventory along with the ID).

We're considering putting the ID in Archivist Toolkit, where the location of the dissemination copies is noted, but I'm wondering if there are other tools out there specific to this scenario that people are using.
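To make the point about the inventory concrete: a vault inventory comes back from Glacier as a JSON document whose `ArchiveList` entries carry both `ArchiveId` and `ArchiveDescription`. The sketch below indexes such a document by description so an ID can be looked up from the description that was attached at deposit time. It assumes the inventory JSON has already been fetched (retrieval is an asynchronous job) and is passed in as a string; the function name is hypothetical.

```python
import json


def index_inventory(inventory_json):
    """Map each archive description to the list of Archive IDs that carry it.

    Descriptions are not guaranteed to be unique, so every description maps
    to a list rather than to a single ID.
    """
    inventory = json.loads(inventory_json)
    index = {}
    for archive in inventory.get("ArchiveList", []):
        description = archive.get("ArchiveDescription", "")
        index.setdefault(description, []).append(archive["ArchiveId"])
    return index
```

Because the inventory is only available after a delay, it works better as a cross-check on a locally kept record of IDs than as the primary place to look them up.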
Re: [CODE4LIB] Amazon Glacier - tracking deposits
We have been playing with Glacier, but so far neither we nor our clients have been convinced of its cost-effectiveness. A while back, we were discussing a project with 15 PB of archival assets, and that would certainly have made Glacier cost-effective, saving about $30k/mo. over S3, although requests could cut into that.

The Glacier location is in the format /Account ID/vaults/Vault Name/archives/Archive ID, so you might want to consider using the whole string.

Thanks,

Cary

On Apr 8, 2015, at 3:32 PM, Sara Amato sam...@willamette.edu wrote:

Has anyone leapt on board with Glacier? We are considering using it for long-term storage of high-res archival scans. We have derivative copies for dissemination, so we don't intend to touch these often, if ever.

The question I have is how best to track the Archive ID that Glacier attaches to deposits, as it looks like that is the only way to retrieve information if needed (though you can also attach a brief description that appears in the inventory along with the ID).

We're considering putting the ID in Archivist Toolkit, where the location of the dissemination copies is noted, but I'm wondering if there are other tools out there specific to this scenario that people are using.
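If the whole location string is stored, as suggested above, it can be split back into its parts when needed. The sketch below assumes the /Account ID/vaults/Vault Name/archives/Archive ID layout given in the message; the function name and the keys of the returned dict are hypothetical.

```python
def parse_glacier_location(location):
    """Split a Glacier location string into account, vault, and archive ID."""
    parts = location.strip("/").split("/")
    if len(parts) != 5 or parts[1] != "vaults" or parts[3] != "archives":
        raise ValueError("unexpected Glacier location format: %r" % location)
    return {
        "account_id": parts[0],
        "vault_name": parts[2],
        "archive_id": parts[4],
    }
```

Storing the full string means the vault name travels with the Archive ID, which matters once deposits are spread across more than one vault.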