---------- Forwarded message ----------
From: Rasa Bočytė <rboc...@beeldengeluid.nl>
Date: Wed, Apr 4, 2018 at 8:03 AM
Subject: Re: [Reprozip-users] Web Archiving
To: Rémi Rampin <remi.ram...@nyu.edu>


Dear Remi,

Thank you for your response! It is good to hear that other people are
working on similar issues as well!

Could you tell me a bit more about your work on packaging dynamic websites?
Are you working on specific cases or just exploring the possibilities? I
would be very interested to hear how you approach this.

In our case, we are getting all the source files directly from the content
creators, and we are looking for a way to record and store all the
technical, administrative and descriptive metadata, and to visualise
dependencies on software, hardware, file formats, etc. (similar to what
Binder does). We would try to get as much information as possible from the
creators (probably via a questionnaire) about the technical details as well
as the creative process, and preferably record it in a machine- and
human-readable format (XML) or a README file.

At the end of the day, we want to see whether we can come up with some kind
of scalable strategy for website preservation or whether this would have to
be done very much on a case-by-case basis. We have mostly been considering
migration, as it is a more scalable approach and less technically
demanding. Do you find that virtualisation is a better strategy for website
preservation? Within the archival community, at least, we have heard some
reservations about using Docker since it is not considered a stable
platform.


Kind regards,
Rasa

On 3 April 2018 at 19:11, Rémi Rampin <remi.ram...@nyu.edu> wrote:

> 2018-03-30 08:59 EDT, Rasa Bočytė <rboc...@beeldengeluid.nl>:
>
>> What I am most interested in is to know whether ReproZip could be used to:
>> - document the original environment from which the files were acquired
>> (web server, hardware, software)
>> - record extra technical details and instructions that could be added
>> manually
>> - maintain dependencies between files and folders
>> - capture metadata
>>
>
> Hello Rasa,
>
> Thanks for your note! We have been looking specifically at packing dynamic
> websites, so your email is timely. In short, ReproZip can do the bullet
> points you’ve outlined in your email.
>
> ReproZip stores the kernel version and distribution information (provided
> by Python’s platform module
> <https://docs.python.org/3/library/platform.html>). That is usually
> enough to get a virtual machine (or base Docker image) to run the software.
> It also stores the version numbers of all packages from which files are
> used (on deb-based and rpm-based distributions for now).
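>
> For a rough idea of the kind of information that module reports, here is
> an illustrative snippet (not ReproZip’s actual code):
>
>     import platform
>
>     # Roughly the environment details that get recorded
>     print(platform.system())           # e.g. 'Linux'
>     print(platform.release())          # kernel version, e.g. '4.4.0-116-generic'
>     print(platform.machine())          # architecture, e.g. 'x86_64'
>     print(platform.python_version())   # version of the Python interpreter
>     # (the distribution name/version also comes from the platform module
>     # on the Python versions current at the time)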
>
> For instructions, ReproZip stores the command lines used to run the
> software being packaged. It can also record which files are “input” or
> “output” of the process, which is useful when reproducing: the user can
> easily replace inputs and inspect outputs. We try to infer this
> information, but we rely on the user to check it and to provide
> descriptive names for those files (this means editing the YAML
> configuration file, and in our experience not everyone takes the time).
> The command lines and environment variables can also be edited there.
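>
> As a simplified sketch of what checking that information programmatically
> might look like -- the exact field names in config.yml may differ from
> what your version of ReproZip writes, so treat them as placeholders:
>
>     import yaml  # PyYAML
>
>     # Load the configuration file written alongside the trace
>     with open('config.yml') as f:
>         config = yaml.safe_load(f)
>
>     # Recorded command lines, one per run
>     for run in config.get('runs', []):
>         print('command:', ' '.join(run.get('argv', [])))
>
>     # Detected input/output files; giving them descriptive names means
>     # editing this section of the file by hand
>     for entry in config.get('inputs_outputs', []):
>         print(entry.get('name'), '->', entry.get('path'))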
>
> I am not entirely sure what you mean by “dependencies between files and
> folders”. The paths of files in the original environment are stored, so the
> reproduction can happen with the exact same setup.
>
> As for metadata, ReproZip can automatically capture technical and
> administrative metadata in the config.yml file included in the ReproZip
> package. The user (archivist, in your case!) will need to add descriptive
> metadata to the package in something like a README, a finding aid, or by
> archiving the ReproZip package in a repository and filling in the required
> metadata form.
>
> We have a few examples of ReproZip packing and preserving websites. We
> packed a simple blog made with Django (a small web application using a
> SQLite3 database) -- you can view more information about that website and
> the instructions for unpacking it here
> <https://github.com/ViDA-NYU/reprozip-examples/tree/master/django-blog>.
> We’ve also packed a data journalism app called Stacked Up, a dynamic web
> application for exploring the textbook inventory of Philadelphia public
> schools (data is stored in a PostgreSQL database) -- you can view more
> information about that website and the instructions for unpacking it here
> <https://github.com/ViDA-NYU/reprozip-examples/tree/master/stacked-up>.
> You can view all of our examples on our examples site
> <https://examples.reprozip.org/>.
>
> Best regards,
> --
> Rémi Rampin
> ReproZip Developer
> Center for Data Science, New York University
>



-- 

*Rasa Bocyte*
Web Archiving Intern

*Netherlands Institute for Sound and Vision*
*Media Parkboulevard 1, 1217 WE Hilversum | Postbus 1060, 1200 BB Hilversum |
beeldengeluid.nl <http://www.beeldengeluid.nl/>*
_______________________________________________
Reprozip-users mailing list
Reprozip-users@vgc.poly.edu
https://vgc.poly.edu/mailman/listinfo/reprozip-users
