Re: [CODE4LIB] Amazon Glacier - tracking deposits

2015-04-09 Thread Jason Sherman
Hi Sara,

At OU Libraries, we've just started using Glacier in earnest.  We're
tracking our Glacier archives in DynamoDB tables.  I've whipped up a
little Python script to stick LC bags into Glacier and make them
easier for us to keep track of and retrieve.

https://github.com/OULibraries/FreezerBag
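
A minimal sketch of the kind of record one might keep per deposit when tracking Glacier archives in a DynamoDB table. This is not FreezerBag's actual schema: the function name and the item field names are hypothetical, and the only assumption about AWS is that the upload response is a dict with `archiveId`, `checksum`, and `location` keys, as returned by boto3's Glacier `upload_archive` call.

```python
from datetime import datetime, timezone


def make_tracking_item(bag_name, vault_name, upload_response, uploaded_at=None):
    """Build a plain dict describing one Glacier deposit.

    `upload_response` is assumed to carry the `archiveId`, `checksum`
    and `location` keys of a Glacier upload response. The item field
    names below are hypothetical, chosen for illustration only.
    """
    if uploaded_at is None:
        uploaded_at = datetime.now(timezone.utc)
    return {
        "bag_name": bag_name,                       # local name of the bag
        "vault_name": vault_name,                   # vault the bag went into
        "archive_id": upload_response["archiveId"],  # needed for retrieval
        "tree_hash": upload_response["checksum"],    # SHA-256 tree hash
        "location": upload_response["location"],     # full archive path
        "uploaded_at": uploaded_at.isoformat(),
    }
```

A dict like this could then be written with a DynamoDB table's `put_item(Item=...)` call, keyed on whatever local identifier you already use for the bag.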

On Thu, Apr 9, 2015 at 11:27 AM, Kyle Banerjee kyle.baner...@gmail.com wrote:
 Howdy Sara,

 I've played around a bit with Glacier. It's a bit weird to work with, but
 tools keep on improving.

 The real question is what you hope to accomplish with it. As its name
 implies, it's designed for stuff that is basically frozen. When you take
 things out, you need to do so very slowly. The pricing model is such that
 if you try to pull out stuff quickly (e.g. you're trying to restore a
 system), the cost goes into the stratosphere -- definitely model what
 things would look like before using it for purposes like backup.

 However, if you have access images that are already backed up on disk or
 tape offsite (i.e. system recovery needs already taken care of) and this is
 just for storage of high res scans, Glacier could be a good way to go.

 As far as the IDs go, I'd embed them directly into the access image
 metadata. That way, it's impossible to lose the connection between the
 image and the master. You can keep it elsewhere as well, but embedded
 metadata is a great place to store critical identifiers.

 kyle

 On Wed, Apr 8, 2015 at 3:32 PM, Sara Amato sam...@willamette.edu wrote:

 Has anyone leapt on board with Glacier?  We are considering using it for
 long-term storage of high-res archival scans.  We have derivative copies
 for dissemination, so we don’t intend to touch these often, if ever.  The
 question I have is how best to track the Archive ID that Glacier attaches
 to deposits, as it looks like that is the only way to retrieve information
 if needed (though you can also attach a brief description that appears in
 the inventory along with the ID).  We’re considering putting the ID in
 Archivists’ Toolkit, where the location of the dissemination copies is noted,
 but I am wondering if there are other tools out there specific to this
 scenario that people are using.




-- 
Jason


Re: [CODE4LIB] Amazon Glacier - tracking deposits

2015-04-09 Thread Scancella, John
Have you looked at Google's Cloud Storage Nearline? It is about $0.01 per
gigabyte per month, with an access time of about 3 seconds.
http://googlecloudplatform.blogspot.com/2015/03/introducing-Google-Cloud-Storage-Nearline-near-online-data-at-an-offline-price.html


-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Cary 
Gordon
Sent: Wednesday, April 08, 2015 7:49 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Amazon Glacier - tracking deposits

We have been playing with Glacier, but so far neither we nor our clients have
been convinced of its cost-effectiveness. A while back, we were discussing a
project with 15 PB of archival assets, and that would certainly have made
Glacier cost-effective, saving about $30k/mo. over S3, although request
charges could cut into that.

The Glacier location is in the format
/<Account ID>/vaults/<Vault Name>/archives/<Archive ID>, so you might want
to consider using the whole string.

Thanks,

Cary




Re: [CODE4LIB] Amazon Glacier - tracking deposits

2015-04-09 Thread Kyle Banerjee
Howdy Sara,

I've played around a bit with Glacier. It's a bit weird to work with, but
tools keep on improving.

The real question is what you hope to accomplish with it. As its name
implies, it's designed for stuff that is basically frozen. When you take
things out, you need to do so very slowly. The pricing model is such that
if you try to pull out stuff quickly (e.g. you're trying to restore a
system), the cost goes into the stratosphere -- definitely model what
things would look like before using it for purposes like backup.

However, if you have access images that are already backed up on disk or
tape offsite (i.e. system recovery needs already taken care of) and this is
just for storage of high res scans, Glacier could be a good way to go.

As far as the IDs go, I'd embed them directly into the access image
metadata. That way, it's impossible to lose the connection between the
image and the master. You can keep it elsewhere as well, but embedded
metadata is a great place to store critical identifiers.

kyle



Re: [CODE4LIB] Amazon Glacier - tracking deposits

2015-04-09 Thread Han, Yan - (yhan)
Be aware of data transfer costs if you are using Glacier.
Glacier is an excellent choice for archival use, but you want to be sure these
files will not be accessed often.

You should consider the total cost of ownership, including data transfer
costs, which can be very expensive if you retrieve more than about 5% (?) of
your data. It adds up quickly if you do not check carefully.
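
A rough sketch of how one might model slow retrieval before committing to Glacier. It takes the 5% figure mentioned above as a monthly free-retrieval fraction and assumes that allowance is pro-rated evenly per day; both numbers are parameters, not confirmed pricing, so check the current price list before relying on them. The function names are made up for illustration.

```python
import math


def free_daily_allowance_gb(stored_gb, monthly_free_fraction=0.05, days_in_month=30):
    """GB per day retrievable at no charge under the assumed policy."""
    return stored_gb * monthly_free_fraction / days_in_month


def days_to_retrieve_free(retrieve_gb, stored_gb,
                          monthly_free_fraction=0.05, days_in_month=30):
    """Whole days needed to pull `retrieve_gb` without exceeding the
    assumed free daily allowance."""
    daily = free_daily_allowance_gb(stored_gb, monthly_free_fraction, days_in_month)
    if daily <= 0:
        raise ValueError("no free allowance with these parameters")
    return math.ceil(retrieve_gb / daily)
```

Under these assumptions, 10,000 GB in storage gives an allowance of roughly 16.7 GB per day, so pulling 240 GB back without retrieval fees would be spread over 15 days.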

I have a forthcoming article in Library Hi Tech discussing Amazon S3 and
Glacier, including the history of data transfer and storage costs over the
past 7 years.

For IDs, I designed and implemented a unique persistent ID system for all
the digital files (the ID can also serve as a DOI if needed).


Yan Han
The University of Arizona Libraries






[CODE4LIB] Amazon Glacier - tracking deposits

2015-04-08 Thread Sara Amato
Has anyone leapt on board with Glacier?  We are considering using it for
long-term storage of high-res archival scans.  We have derivative copies for
dissemination, so we don’t intend to touch these often, if ever.  The question
I have is how best to track the Archive ID that Glacier attaches to deposits,
as it looks like that is the only way to retrieve information if needed (though
you can also attach a brief description that appears in the inventory along
with the ID).  We’re considering putting the ID in Archivists’ Toolkit, where
the location of the dissemination copies is noted, but I am wondering if there
are other tools out there specific to this scenario that people are using.
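
A small sketch of using the description field as a lookup key. It assumes the vault inventory has been downloaded as JSON with an `ArchiveList` whose entries carry `ArchiveId` and `ArchiveDescription`, which is the shape Glacier inventory output takes as far as I know; the function name and the sample values are hypothetical.

```python
import json


def index_inventory(inventory_json):
    """Map each archive description to the list of archive IDs that
    carry it. Descriptions are not guaranteed unique, hence the list."""
    inventory = json.loads(inventory_json)
    index = {}
    for archive in inventory.get("ArchiveList", []):
        description = archive.get("ArchiveDescription", "")
        index.setdefault(description, []).append(archive["ArchiveId"])
    return index


# Made-up inventory for illustration only.
sample = json.dumps({
    "VaultARN": "arn:aws:glacier:us-east-1:111122223333:vaults/scans",
    "ArchiveList": [
        {"ArchiveId": "ID-1", "ArchiveDescription": "scan-0001.tif"},
        {"ArchiveId": "ID-2", "ArchiveDescription": "scan-0002.tif"},
        {"ArchiveId": "ID-3", "ArchiveDescription": "scan-0001.tif"},
    ],
})
lookup = index_inventory(sample)
```

Putting the local file name or persistent identifier in the description at upload time means the inventory alone can rebuild the mapping if the local record of Archive IDs is ever lost.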


Re: [CODE4LIB] Amazon Glacier - tracking deposits

2015-04-08 Thread Cary Gordon
We have been playing with Glacier, but so far neither we nor our clients have
been convinced of its cost-effectiveness. A while back, we were discussing a
project with 15 PB of archival assets, and that would certainly have made
Glacier cost-effective, saving about $30k/mo. over S3, although request
charges could cut into that.

The Glacier location is in the format
/<Account ID>/vaults/<Vault Name>/archives/<Archive ID>, so you might want
to consider using the whole string.
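
If you do store the whole location string, a helper along these lines can split it back into its parts when you need the vault name or Archive ID on its own. This is only a sketch: the names are hypothetical, and it assumes none of the three components contains a slash.

```python
from typing import NamedTuple


class GlacierLocation(NamedTuple):
    account_id: str
    vault_name: str
    archive_id: str


def parse_location(location):
    """Split '/<account>/vaults/<vault>/archives/<archive>' into parts."""
    parts = location.strip("/").split("/")
    if len(parts) != 5 or parts[1] != "vaults" or parts[3] != "archives":
        raise ValueError(f"not a Glacier archive location: {location!r}")
    return GlacierLocation(parts[0], parts[2], parts[4])


def format_location(loc):
    """Rebuild the location string from its three components."""
    return f"/{loc.account_id}/vaults/{loc.vault_name}/archives/{loc.archive_id}"
```

Storing the full string keeps the account and vault alongside the Archive ID, so a record stays unambiguous even if deposits are later spread across several vaults.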

Thanks,

Cary

