Re: [CODE4LIB] archiving web pages

2014-01-16 Thread Kari R Smith
[mailto:CODE4LIB@listserv.nd.edu] On Behalf Of Wilhelmina Randtke Sent: Wednesday, January 15, 2014 10:29 AM To: CODE4LIB@listserv.nd.edu Subject: Re: [CODE4LIB] archiving web pages Agreed, don't focus too much on preserving the presentation for an online newspaper. The text and images

Re: [CODE4LIB] archiving web pages

2014-01-15 Thread Stern, Randy
Here is another: http://wax.lib.harvard.edu/collections/home.do - Randy -- Date: Tue, 14 Jan 2014 10:43:18 -0700 From: Robert Sanderson azarot...@gmail.com Subject: Re: archiving web pages Here are several to consider: *

Re: [CODE4LIB] archiving web pages

2014-01-15 Thread Wilhelmina Randtke
Agreed, don't focus too much on preserving the presentation for an online newspaper. The text and images are important, but the layout isn't so important. -Wilhelmina Randtke On Tue, Jan 14, 2014 at 10:59 AM, Kyle Banerjee kyle.baner...@gmail.com wrote: IMO, there are many web archiving

Re: [CODE4LIB] archiving web pages

2014-01-15 Thread Andrew Darby
If it's doable, I think preserving the whole enchilada is desirable. For instance, at my last library, there was a regular assignment where students needed the print version of old periodicals because they were tasked with analysing the ads and layouts. Someone might be interested in web layouts

Re: [CODE4LIB] archiving web pages

2014-01-15 Thread Alexander Duryee
There's always the option of capturing a WARC of the newspaper as the preservation master for dark storage, and generating PDFs for access via your CMS. If you're in ContentDM already, then a PDF would be much easier to use (on both the back end and the front end). The provenance metadata of WARC is too

Re: [CODE4LIB] archiving web pages

2014-01-15 Thread Kyle Banerjee
On Wed, Jan 15, 2014 at 8:52 AM, Andrew Darby darby.li...@gmail.com wrote: If it's doable, I think preserving the whole enchilada is desirable. For instance, at my last library, there was a regular assignment where students needed the print version of old periodicals because they were tasked

Re: [CODE4LIB] archiving web pages

2014-01-15 Thread Nicholas Taylor
+1 to Alex's suggestion to use WARC for the preservation master and generate PDFs for access. While I agree with Kyle that it's ultimately the content that's important and that hypothetical researcher needs are inexhaustible, I do think there's an advantage to preserving web content in a

Re: [CODE4LIB] archiving web pages

2014-01-14 Thread Kyle Banerjee
IMO, there are many web archiving situations where it is more appropriate to just focus on the content rather than the manifestation of the content. Just as you wouldn't expect a 1995 article from the NYT to be displayed as the website was in 1995 or an article in an online database to actually

Re: [CODE4LIB] archiving web pages

2014-01-14 Thread L Snider
Hi Kathryn, Right now the WARC format is considered the best preservation format for websites/social media, in terms of digital archives. It is our best guess right now. It will likely be with us for a long time, because it has been adopted by most of the major players. The way I have seen

Re: [CODE4LIB] archiving web pages

2014-01-14 Thread Robert Sanderson
For what it's worth, the latest wayback code is: https://github.com/iipc/openwayback And being developed by the IIPC consortium, rather than just the Internet Archive alone. It has many additional features, contributed by other members. It should be used in preference to the sourceforge

Re: [CODE4LIB] archiving web pages

2014-01-14 Thread Francis Kayiwa
On 1/14/2014 11:48 AM, Kathryn Frederick (Library) wrote: Hi, I'm trying to develop a strategy for preserving issues of our school's online newspaper. Creating a WARC file of the content seems straightforward, but how will that content fare long-term? Also, how is the WARC served to an end-user?

Re: [CODE4LIB] archiving web pages

2014-01-14 Thread L Snider
Rob is right on! I included the wrong link, thanks for catching that... Cheers Lisa On Tue, Jan 14, 2014 at 11:04 AM, Robert Sanderson azarot...@gmail.com wrote: For what it's worth, the latest wayback code is: https://github.com/iipc/openwayback And being developed by the IIPC

Re: [CODE4LIB] archiving web pages

2014-01-14 Thread Nathan Tallman
On Tue, Jan 14, 2014 at 12:08 PM, Francis Kayiwa fkay...@colgate.edu wrote: If Skidmore has an IR I'd look into adding them into your IR and render from there (in addition to WARC'ing them) Francis, I'm confused when you say in addition to WARC'ing them. Wouldn't you be putting the WARC

Re: [CODE4LIB] archiving web pages

2014-01-14 Thread Nathan Tallman
Lisa, Is your local web archive available online? I'd like to see a production example of a non-Internet Archive instance of Wayback/Open Wayback. Thanks, Nathan On Tue, Jan 14, 2014 at 12:17 PM, L Snider lsni...@gmail.com wrote: Rob is right on! I included the wrong link, thanks for catching

Re: [CODE4LIB] archiving web pages

2014-01-14 Thread L Snider
Hi Nathan, Nope, unfortunately not... It was done as a test, and at that time we used the IA-only version. Cheers Lisa On Tue, Jan 14, 2014 at 11:31 AM, Nathan Tallman ntall...@gmail.com wrote: Lisa, Is your local web archive available online? I'd like to see a production example of

Re: [CODE4LIB] archiving web pages

2014-01-14 Thread Robert Sanderson
Here are several to consider: * http://www.webarchive.org.uk/wayback/archive/*/http://www.aboutmayfair.co.uk/ * http://webarchive.loc.gov/lcwa0015/*/http://lawprofessors.typepad.com/adminlaw/ * http://www.padi.cat:8080/wayback/*/http://www.ajberga.cat/ * http://vefsafn.is/index.php?page=english

Re: [CODE4LIB] archiving web pages

2014-01-14 Thread Nick Ruest
Hi- We actually have implemented the original question above with some shell scripts[1] for harvesting, and creating SIPs. The SIPs are then ingested into our Islandora instance with the Web ARChive Solution Pack[2] as AIPs. DIPs are also available via our local Wayback instance[3], and on

Re: [CODE4LIB] archiving web pages

2014-01-14 Thread Kari R Smith
Kathryn, When you write "strategy," do you mean a technology solution or a preservation strategy, one component of which is the technology implementation of said strategy? If it's a preservation strategy for your school's online (web) content - so archival records - see what the University of

Re: [CODE4LIB] archiving web pages

2014-01-14 Thread Francis Kayiwa
On 1/14/2014 12:26 PM, Nathan Tallman wrote: On Tue, Jan 14, 2014 at 12:08 PM, Francis Kayiwa fkay...@colgate.edu wrote: If Skidmore has an IR I'd look into adding them into your IR and render from there (in addition to WARC'ing them) Francis, I'm confused when you say in addition to

Re: [CODE4LIB] archiving web pages

2014-01-14 Thread Kathryn Frederick (Library)
Thanks for the thoughtful responses. We've been actively digitizing our print paper (which ceased publication in 2011) and I was thinking of this as an extension of that effort. Right now, I think capturing a monthly WARC file of the site is definitely a good idea no matter what. But beyond

Re: [CODE4LIB] archiving web pages

2014-01-14 Thread L Snider
As an archivist, I don't see any problem using a PDF. Technically it should be a PDF/A, but realistically it is usually a PDF. I have done projects where I used PDFs for the archiving of full websites. It can be quite handy, depending on needs of course. Sometimes it works with the look and