Re: [Reprozip-users] Web Archiving

Rémi Rampin Tue, 03 Apr 2018 10:13:47 -0700

2018-03-30 08:59 EDT, Rasa Bočytė <rboc...@beeldengeluid.nl>:

> What I am most interested in is to know whether ReproZip could be used to:
> - document the original environment from which the files were acquired
> (web server, hardware, software)
> - record extra technical details and instructions that could be added
> manually
> - maintain dependencies between files and folders
> - capture metadata
>


Hello Rasa,

Thanks for your note! We have been looking specifically at packing dynamic
websites, so your email is timely. In short, ReproZip can do everything in
the bullet points you outlined.

ReproZip stores the kernel version and distribution information (provided
by Python’s platform module
<https://docs.python.org/3/library/platform.html>). That is usually enough
to choose a virtual machine (or base Docker image) that can run the
software. It also stores the version numbers of all packages from which
files are used (for now, only on deb-based and rpm-based distributions).
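
To give an idea of the kind of information the platform module exposes,
here is a small sketch. It is not ReproZip's code, only an illustration in
the same spirit. The function name describe_environment and the dictionary
keys are made up for this example. It assumes Python 3.10+ for
platform.freedesktop_os_release() and falls back gracefully when that
function or /etc/os-release is unavailable.

```python
import platform


def describe_environment() -> dict:
    """Illustrative sketch (hypothetical helper, not ReproZip's code):
    collect kernel and distribution details from the platform module."""
    info = {
        "system": platform.system(),           # e.g. "Linux"
        "kernel_release": platform.release(),  # kernel version string
        "machine": platform.machine(),         # e.g. "x86_64"
    }
    # freedesktop_os_release() exists from Python 3.10 onwards; it reads
    # /etc/os-release (or /usr/lib/os-release) and raises OSError if neither
    # file can be read.
    if hasattr(platform, "freedesktop_os_release"):
        try:
            os_release = platform.freedesktop_os_release()
            info["distribution"] = os_release.get("ID", "")
            info["distribution_version"] = os_release.get("VERSION_ID", "")
        except OSError:
            pass  # not a freedesktop-style Linux system
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

On a Linux machine this prints the kernel release plus the distribution ID
and version, which is roughly what one needs to pick a matching base image.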

For instructions, ReproZip stores the command lines used to run the
software that is getting packaged. It can also record which files are
“input” or “output” of the process, which is useful when reproducing: it
lets the user replace inputs and look at outputs easily. We try to infer
this information automatically, but we rely on the user to check it and to
provide descriptive names for these files. However, this means editing the
YAML configuration file, and our experience is that not everyone takes the
time. Users can also edit the recorded command lines and environment
variables.
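
As a rough illustration of what “inferring” inputs and outputs can mean,
here is a simplified heuristic over a list of file accesses. This is a
hypothetical sketch, not the rules ReproZip actually applies. The trace
format (FileAccess with a "r"/"w" mode), the function names, and the paths
are all made up for the example. It also shows why descriptive names
matter: without user input, the best a tool can offer is a placeholder like
"input1".

```python
from dataclasses import dataclass


@dataclass
class FileAccess:
    """One file access from a (hypothetical) system-call trace."""
    path: str
    mode: str  # "r" for a read, "w" for a write


def guess_inputs_outputs(accesses, workdir):
    """Hypothetical heuristic, not ReproZip's actual rules: under workdir,
    a file read before it is ever written is a candidate input, and any
    file that gets written is a candidate output."""
    prefix = workdir.rstrip("/") + "/"
    inputs, outputs, written = [], [], set()
    for access in accesses:
        if not access.path.startswith(prefix):
            continue  # skip system files such as libraries under /usr
        if access.mode == "w":
            written.add(access.path)
            if access.path not in outputs:
                outputs.append(access.path)
        elif access.mode == "r" and access.path not in written:
            if access.path not in inputs:
                inputs.append(access.path)
    return inputs, outputs


def placeholder_names(paths, kind):
    """Generic names ("input1", "output2", ...) that the user should
    ideally replace with descriptive ones."""
    return {f"{kind}{i}": path for i, path in enumerate(paths, start=1)}


trace = [
    FileAccess("/usr/lib/python3/os.py", "r"),
    FileAccess("/home/user/blog/db.sqlite3", "r"),
    FileAccess("/home/user/blog/db.sqlite3", "w"),
    FileAccess("/home/user/blog/access.log", "w"),
]
inputs, outputs = guess_inputs_outputs(trace, "/home/user/blog")
print(placeholder_names(inputs, "input"))
# {'input1': '/home/user/blog/db.sqlite3'}
print(placeholder_names(outputs, "output"))
# {'output1': '/home/user/blog/db.sqlite3', 'output2': '/home/user/blog/access.log'}
```

Note that the database shows up as both an input and an output. This is
exactly the kind of case where a human should check the result and pick
meaningful names.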

I am not entirely sure what you mean by “dependencies between files and
folders”. The original path of every file is stored, so the reproduction
can use exactly the same directory layout as the original environment.
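
Storing original absolute paths is what keeps relationships between files
and folders intact. If every file is placed under some unpack directory
with its full original path preserved, then relative references between
files still resolve the same way. The sketch below shows that property.
The rebase helper, the unpack root, and the file paths are hypothetical
and chosen only for illustration. This is not how any particular reprounzip
unpacker is implemented.

```python
import posixpath
from pathlib import PurePosixPath


def rebase(original_path: str, unpack_root: str) -> str:
    """Hypothetical helper: place an absolute path recorded at packing time
    under unpack_root, keeping the rest of the directory structure."""
    p = PurePosixPath(original_path)
    if not p.is_absolute():
        raise ValueError(f"expected an absolute path, got {original_path!r}")
    # Drop the leading "/" component so the result stays inside unpack_root.
    return str(PurePosixPath(unpack_root).joinpath(*p.parts[1:]))


db = "/var/www/blog/db.sqlite3"            # hypothetical original paths
settings = "/var/www/blog/blog/settings.py"
root = "/tmp/unpacked/root"                # hypothetical unpack directory

print(rebase(db, root))  # /tmp/unpacked/root/var/www/blog/db.sqlite3

# The relative path from settings.py's folder to the database is unchanged.
before = posixpath.relpath(db, posixpath.dirname(settings))
after = posixpath.relpath(rebase(db, root),
                          posixpath.dirname(rebase(settings, root)))
print(before, after)  # ../db.sqlite3 ../db.sqlite3
```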

As for metadata, ReproZip can automatically capture technical and
administrative metadata in the config.yml file included in the ReproZip
package. The user (the archivist, in your case!) will need to add
descriptive metadata separately, for example in a README or a finding aid,
or by depositing the ReproZip package in a repository and filling in its
required metadata form.

We have a few examples of ReproZip packing and preserving websites. We
packed a general blog made with Django (a small web application using a
SQLite3 database); you can view more information about that website and
the instructions for unpacking it here:
<https://github.com/ViDA-NYU/reprozip-examples/tree/master/django-blog>.
We’ve also packed a data journalism app called Stacked Up, a dynamic web
application for exploring the textbook inventory of Philadelphia public
schools (its data is stored in a PostgreSQL database); you can view more
information about that website and the instructions for unpacking it here:
<https://github.com/ViDA-NYU/reprozip-examples/tree/master/stacked-up>. You
can view all of our examples on our examples site:
<https://examples.reprozip.org/>.

Best regards,
-- 
Rémi Rampin
ReproZip Developer
Center for Data Science, New York University

_______________________________________________
Reprozip-users mailing list
Reprozip-users@vgc.poly.edu
https://vgc.poly.edu/mailman/listinfo/reprozip-users
