Re: [Reprozip-users] Web Archiving

2018-06-05 Thread Rémi Rampin
2018-05-11 09:54 EDT, Rasa Bočytė :

> A couple of questions:
> - apart from tracing the server, do I need to trace any other processes to
> ensure that I can reproduce the experiment?
> - how do I ensure that my package includes all the website data? I tried
> running a php inbuilt server (reprozip trace php -S localhost:8000 -t
> path/to/the/website/files) and the finished package only captured data from
> the web pages that I visited while running the experiment.
> - can I somehow include the browser needed to run the website into the
> package?
>

Hi Rasa,

Apologies for the delay. To answer your question: when using LAMP, you would
need to trace both Apache (which would also capture the PHP interpreter,
since it is embedded as a module) and MySQL (see the Capturing Connections
to Servers section of the ReproZip documentation for more information).

Only the PHP files you visited will be traced by ReproZip. However, if you
don't want to visit every single page, you can add the whole site folder
(usually under /var/www) and the MySQL data folder (/var/lib/mysql) to the
additional_files section of the configuration file. Make sure you still
visit some of the pages, so that all the libraries and database connections
get involved in the trace.
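
For concreteness, a minimal sketch of what that edit could look like (the
section name follows the wording above and the paths are the usual
Debian/Ubuntu defaults; double-check the exact key name and paths against
the config.yml that reprozip trace generated on your system):

    # Excerpt of the ReproZip configuration file (config.yml), edited by
    # hand before running reprozip pack
    additional_files:
      - /var/www/          # whole site tree, not just the pages visited
      - /var/lib/mysql/    # MySQL data directory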

Let us know if you have more questions!
-- 
Rémi
___
Reprozip-users mailing list
Reprozip-users@vgc.poly.edu
https://vgc.poly.edu/mailman/listinfo/reprozip-users


Re: [Reprozip-users] Web Archiving

2018-05-11 Thread Rasa Bočytė
Hi,

I was just wondering if you would be able to help me with testing out
Reprozip.

I have my case study website set up on my laptop. It does not require any
special server-side software; I can run it with a simple php inbuilt server
or Lampp. At the moment I am doing it only for testing purposes, so I could
use any server software that would work with Reprozip.

A couple of questions:
- apart from tracing the server, do I need to trace any other processes to
ensure that I can reproduce the experiment?
- how do I ensure that my package includes all the website data? I tried
running a php inbuilt server (reprozip trace php -S localhost:8000 -t
path/to/the/website/files) and the finished package only captured data from
the web pages that I visited while running the experiment.
- can I somehow include the browser needed to run the website into the
package?

Thank you for your help!

Regards,
Rasa

On 24 April 2018 at 15:37, Rasa Bočytė  wrote:

> Hi both,
>
> I presented ReproZip to other researchers in my institution and everyone
> seems quite excited to see if it would work for us! I still need to discuss
> it with a couple of other colleagues but I think we will try to test it.
>
> One of the things that I am trying to figure out is how to include
> client-side software, i.e. web browser, into the equation. Would you have
> to create a separate container for that? Ideally, we would like to package
> everything, source files, server-side dependencies and client-side
> dependencies, into one place, but I don't know if that is feasible.
>
> Regards,
> Rasa
>
> On 18 April 2018 at 18:27, Vicky Steeves  wrote:
>
>> Hi Rasa,
>>
>> Apologies, we were traveling and just got back to the office. We are very
>> glad to be of help!
>>
>> We let users who pack experiments edit the yml file before the
>> final packing step, and for those secondary users who unpack, we let them
>> download and view the yml file. We certainly *could* automatically
>> extract categories of information for the user. It bears more thinking
>> about, especially since there are a few ways that unpacking users interface
>> with ReproUnzip.
>>
>> Best,
>> Vicky
>>
>> Vicky Steeves
>> Research Data Management and Reproducibility Librarian
>> Phone: 1-212-992-6269
>> ORCID: orcid.org/0000-0003-4298-168X
>> vickysteeves.com | @VickySteeves 
>> NYU Libraries Data Services | NYU Center for Data Science
>>
>> On Tue, Apr 10, 2018 at 4:46 AM, Rasa Bočytė 
>> wrote:
>>
>>> Hi Remi,
>>>
>>> In terms of migration, originally my institute planned to acquire files
>>> from the creators and then figure out what to do with them, most likely
>>> migrate individual files to updated versions when needed. Which I think is
>>> not a helpful approach since you need to start at the server and capture
>>> the environment and software that manipulates those files to create a
>>> website. Especially, if you want to be able to reproduce it.
>>>
>>> I am definitely leaning towards the idea that virtualisation of a web
>>> server would be the best approach for us. I will try to test out the
>>> examples that you have on your website and see if I can run some tests with
>>> my own case studies (of course, it depends if the creators will allow us to
>>> do it).
>>>
>>> I promise I won't bother you too much but my last question is about the
>>> metadata captured on the yml file. It is machine and human readable, but
>>> the question is what do you do with it and how you present it once you have it
>>> so it becomes a valuable resource for those using the preserved object.
>>> Have you thought about automatically extracting some categories of
>>> information from that file in a user-friendly format or do you think it is
>>> enough as it is?
>>>
>>> Just wanted to say a massive thank you for your feedback. It has been
>>> incredibly helpful!
>>>
>>> Rasa
>>>
>>> On 6 April 2018 at 19:53, Rémi Rampin  wrote:
>>>
 Rasa,

 2018-04-04 08:03 EDT, Rasa Bočytė :

> In our case, we are getting all the source files directly from content
> creators and we are looking for a way to record and store all the
> technical, administrative and descriptive metadata, and visualise
> dependencies on software/hardware/file formats/ etc. (similar to what
> Binder does).
>

 I didn't think Binder did that (this binder?). It is certainly a good
 resource for reproducing environments already described as a Docker image
 or Conda YAML, but I am not aware of ways to use it to track or visualize
 dependencies or any metadata.

 We have been mostly considering migration as it is a more scalable
> approach and less technically demanding. Do you find that virtualisation 
> is
> a better strategy for website preservation? 

Re: [Reprozip-users] Web Archiving

2018-04-24 Thread Rasa Bočytė
Hi both,

I presented ReproZip to other researchers in my institution and everyone
seems quite excited to see if it would work for us! I still need to discuss
it with a couple of other colleagues but I think we will try to test it.

One of the things that I am trying to figure out is how to include
client-side software, i.e. web browser, into the equation. Would you have
to create a separate container for that? Ideally, we would like to package
everything, source files, server-side dependencies and client-side
dependencies, into one place, but I don't know if that is feasible.

Regards,
Rasa

On 18 April 2018 at 18:27, Vicky Steeves  wrote:

> Hi Rasa,
>
> Apologies, we were traveling and just got back to the office. We are very
> glad to be of help!
>
> We let users who pack experiments edit the yml file before the final
> packing step, and for those secondary users who unpack, we let them
> download and view the yml file. We certainly *could* automatically
> extract categories of information for the user. It bears more thinking
> about, especially since there are a few ways that unpacking users interface
> with ReproUnzip.
>
> Best,
> Vicky
>
> Vicky Steeves
> Research Data Management and Reproducibility Librarian
> Phone: 1-212-992-6269
> ORCID: orcid.org/0000-0003-4298-168X
> vickysteeves.com | @VickySteeves 
> NYU Libraries Data Services | NYU Center for Data Science
>
> On Tue, Apr 10, 2018 at 4:46 AM, Rasa Bočytė 
> wrote:
>
>> Hi Remi,
>>
>> In terms of migration, originally my institute planned to acquire files
>> from the creators and then figure out what to do with them, most likely
>> migrate individual files to updated versions when needed. Which I think is
>> not a helpful approach since you need to start at the server and capture
>> the environment and software that manipulates those files to create a
>> website. Especially, if you want to be able to reproduce it.
>>
>> I am definitely leaning towards the idea that virtualisation of a web
>> server would be the best approach for us. I will try to test out the
>> examples that you have on your website and see if I can run some tests with
>> my own case studies (of course, it depends if the creators will allow us to
>> do it).
>>
>> I promise I won't bother you too much but my last question is about the
>> metadata captured on the yml file. It is machine and human readable, but
>> the question is what do you do with it and how you present it once you have it
>> so it becomes a valuable resource for those using the preserved object.
>> Have you thought about automatically extracting some categories of
>> information from that file in a user-friendly format or do you think it is
>> enough as it is?
>>
>> Just wanted to say a massive thank you for your feedback. It has been
>> incredibly helpful!
>>
>> Rasa
>>
>> On 6 April 2018 at 19:53, Rémi Rampin  wrote:
>>
>>> Rasa,
>>>
>>> 2018-04-04 08:03 EDT, Rasa Bočytė :
>>>
 In our case, we are getting all the source files directly from content
 creators and we are looking for a way to record and store all the
 technical, administrative and descriptive metadata, and visualise
 dependencies on software/hardware/file formats/ etc. (similar to what
 Binder does).

>>>
>>> I didn't think Binder did that (this binder?). It is certainly a good
>>> resource for reproducing environments already described as a Docker image
>>> or Conda YAML, but I am not aware of ways to use it to track or visualize
>>> dependencies or any metadata.
>>>
>>> We have been mostly considering migration as it is a more scalable
 approach and less technically demanding. Do you find that virtualisation is
 a better strategy for website preservation? At least from the archival
 community, we have heard some reservations about using Docker since it is
 not considered a stable platform.

>>>
>>> When you talk of migration, do you mean to new hardware? What would you
>>> be migrating to? Or do you mean upgrading underlying software/frameworks?
>>> The way I see it, virtualization (sometimes referred to as "preserving
>>> the mess") is definitely less technically demanding than migration. Could
>>> you share a bit more about what you mean by this?
>>>
>>> Thanks
>>>
>>> PS: Please make sure you keep us...@reprozip.org in the recipients list.
>>> --
>>> Rémi Rampin
>>> ReproZip Developer
>>> Center for Data Science, New York University
>>>
>>
>>
>>
>> --
>>
>> *Rasa Bocyte*
>> Web Archiving Intern
>>
>> *Netherlands Institute for Sound and Vision*
>> *Media Parkboulevard 1, 1217 WE Hilversum | Postbus 1060, 1200 BB Hilversum |
>> beeldengeluid.nl*

Re: [Reprozip-users] Web Archiving

2018-04-18 Thread Vicky Steeves
Hi Rasa,

Apologies, we were traveling and just got back to the office. We are very
glad to be of help!

We let users who pack experiments edit the yml file before the final
packing step, and for those secondary users who unpack, we let them
download and view the yml file. We certainly *could* automatically extract
categories of information for the user. It bears more thinking about,
especially since there are a few ways that unpacking users interface with
ReproUnzip.

Best,
Vicky

Vicky Steeves
Research Data Management and Reproducibility Librarian
Phone: 1-212-992-6269
ORCID: orcid.org/0000-0003-4298-168X
vickysteeves.com | @VickySteeves 
NYU Libraries Data Services | NYU Center for Data Science

On Tue, Apr 10, 2018 at 4:46 AM, Rasa Bočytė 
wrote:

> Hi Remi,
>
> In terms of migration, originally my institute planned to acquire files
> from the creators and then figure out what to do with them, most likely
> migrate individual files to updated versions when needed. Which I think is
> not a helpful approach since you need to start at the server and capture
> the environment and software that manipulates those files to create a
> website. Especially, if you want to be able to reproduce it.
>
> I am definitely leaning towards the idea that virtualisation of a web
> server would be the best approach for us. I will try to test out the
> examples that you have on your website and see if I can run some tests with
> my own case studies (of course, it depends if the creators will allow us to
> do it).
>
> I promise I won't bother you too much but my last question is about the
> metadata captured on the yml file. It is machine and human readable, but
> the question is what do you do with it and how you present it once you have it
> so it becomes a valuable resource for those using the preserved object.
> Have you thought about automatically extracting some categories of
> information from that file in a user-friendly format or do you think it is
> enough as it is?
>
> Just wanted to say a massive thank you for your feedback. It has been
> incredibly helpful!
>
> Rasa
>
> On 6 April 2018 at 19:53, Rémi Rampin  wrote:
>
>> Rasa,
>>
>> 2018-04-04 08:03 EDT, Rasa Bočytė :
>>
>>> In our case, we are getting all the source files directly from content
>>> creators and we are looking for a way to record and store all the
>>> technical, administrative and descriptive metadata, and visualise
>>> dependencies on software/hardware/file formats/ etc. (similar to what
>>> Binder does).
>>>
>>
>> I didn't think Binder did that (this binder?). It is certainly a good
>> resource for reproducing environments already described as a Docker image
>> or Conda YAML, but I am not aware of ways to use it to track or visualize
>> dependencies or any metadata.
>>
>> We have been mostly considering migration as it is a more scalable
>>> approach and less technically demanding. Do you find that virtualisation is
>>> a better strategy for website preservation? At least from the archival
>>> community, we have heard some reservations about using Docker since it is
>>> not considered a stable platform.
>>>
>>
>> When you talk of migration, do you mean to new hardware? What would you
>> be migrating to? Or do you mean upgrading underlying software/frameworks?
>> The way I see it, virtualization (sometimes referred to as "preserving
>> the mess") is definitely less technically demanding than migration. Could
>> you share a bit more about what you mean by this?
>>
>> Thanks
>>
>> PS: Please make sure you keep us...@reprozip.org in the recipients list.
>> --
>> Rémi Rampin
>> ReproZip Developer
>> Center for Data Science, New York University
>>
>
>
>
> --
>
> *Rasa Bocyte*
> Web Archiving Intern
>
> *Netherlands Institute for Sound and Vision*
> *Media Parkboulevard 1, 1217 WE Hilversum | Postbus 1060, 1200 BB Hilversum |
> beeldengeluid.nl*
>
> ___
> Reprozip-users mailing list
> Reprozip-users@vgc.poly.edu
> https://vgc.poly.edu/mailman/listinfo/reprozip-users
>
>
___
Reprozip-users mailing list
Reprozip-users@vgc.poly.edu
https://vgc.poly.edu/mailman/listinfo/reprozip-users


Re: [Reprozip-users] Web Archiving

2018-04-06 Thread Rémi Rampin
2018-04-06 09:11 EDT, Rasa Bočytė :

> If I understand correctly, ReproZip can describe environments that are
> necessary to run a particular software or a web application that is used to
> create a dynamic website (Nikola in the case of your website or Django with
> StackedUp).
>

ReproZip can capture any software environment. That can be a web server
serving a website (possibly dynamic), as in the Django case. Nikola is *not*
a web server: it is a program that generates static files up front, and
those files are later served to users by a web server.

Would it work if there is no such software? [...] could it capture the
> environment if the content is placed on a server just as files and folders
> with no software?
>

Every website necessarily has a web server. If all the content is in static
files without any kind of database or dynamic aspect to it (like files
generated by Nikola), preservation is a lot easier, since you can just zip
up those files. Is this your use case?
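
If you did still want to capture the serving environment for a fully static
site, a rough sketch would be to trace whatever simple server you use --
here Python's built-in http.server stands in for your real server, and the
path and package name are made up:

    # Browse the pages you want captured while the traced server runs,
    # then stop it and pack the trace into an .rpz bundle
    reprozip trace python3 -m http.server 8000 --directory /path/to/site
    reprozip pack static-site.rpz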

-- 
Rémi Rampin
ReproZip Developer
Center for Data Science, New York University
___
Reprozip-users mailing list
Reprozip-users@vgc.poly.edu
https://vgc.poly.edu/mailman/listinfo/reprozip-users


Re: [Reprozip-users] Web Archiving

2018-04-06 Thread Rémi Rampin
Rasa,

2018-04-04 08:03 EDT, Rasa Bočytė :

> In our case, we are getting all the source files directly from content
> creators and we are looking for a way to record and store all the
> technical, administrative and descriptive metadata, and visualise
> dependencies on software/hardware/file formats/ etc. (similar to what
> Binder does).
>

I didn't think Binder did that (this binder?). It is certainly a good
resource for reproducing environments already described as a Docker image or
Conda YAML, but I am not aware of ways to use it to track or visualize
dependencies or any metadata.

We have been mostly considering migration as it is a more scalable approach
> and less technically demanding. Do you find that virtualisation is a better
> strategy for website preservation? At least from the archival community, we
> have heard some reservations about using Docker since it is not considered
> a stable platform.
>

When you talk of migration, do you mean to new hardware? What would you be
migrating to? Or do you mean upgrading underlying software/frameworks?
The way I see it, virtualization (sometimes referred to as "preserving the
mess") is definitely less technically demanding than migration. Could you
share a bit more about what you mean by this?

Thanks

PS: Please make sure you keep us...@reprozip.org in the recipients list.
-- 
Rémi Rampin
ReproZip Developer
Center for Data Science, New York University
___
Reprozip-users mailing list
Reprozip-users@vgc.poly.edu
https://vgc.poly.edu/mailman/listinfo/reprozip-users


Re: [Reprozip-users] Web Archiving

2018-04-05 Thread Vicky Steeves
Hello Rasa,

As the resident librarian on the team, I am really happy to see this email
on the ReproZip users list!

We are mainly exploring the possibilities of packing with dynamic sites,
but within the domain of data journalism. Once we have worked with those
use cases, we can certainly go beyond to other dynamic sites. Data
journalism is a good place to start because of the nature of the work and
how data journalism applications are served to the web --- lots of
containers, databases, interactive websites, etc. In order to pack anything
with ReproZip, we (or anyone using ReproZip!) needs access to the original
environment and source files. We basically need access to the server, and
then we can pack the dynamic site. We recorded the process of packing a
dynamic website and put it on YouTube, which might be helpful:
https://www.youtube.com/watch?v=SoE2nEJWylw=
PLjgZ3v4gFxpXdPRBaFTh42w3HRMmX2WfD

ReproZip automatically captures technical and administrative metadata. You
can view the technical and administrative metadata collected in this sample
config.yml file: https://gitlab.com/snippets/1686638. The config.yml has
all the metadata from the .rpz package. That particular yml file I just
linked comes from an experiment of mine: packing, with ReproZip, a website
made with Nikola, a static site generator, and deployed on Firefox. This is
the same Nikola website, deployed on Google Chrome:
https://gitlab.com/snippets/1686640. The config.yml file is human readable,
but very long (lots of dependencies!). We still need to get the descriptive
metadata from the users, though.
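
To give a feel for it, here is an abridged, made-up excerpt of the kind of
fields you would see (the real files are in the GitLab snippets above; field
names follow what ReproZip 1.0.x writes but may vary slightly between
versions):

    # Abridged, hypothetical config.yml excerpt -- see the linked snippets
    # for complete, real examples
    runs:
      - architecture: x86_64
        distribution: [ubuntu, '16.04']
        hostname: my-laptop
        argv: [firefox, 'http://localhost:8000/']
        environ: {LANG: en_US.UTF-8}
        workingdir: /home/vicky/my-site
    packages:
      - name: libc6
        version: 2.23-0ubuntu10
        files:
          - /lib/x86_64-linux-gnu/libc.so.6
    other_files:
      - /home/vicky/my-site/output/index.html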

ReproZip can visualize the provenance of the processes, dependencies, and
input/output files via a graph. The documentation and examples of those can
be found here: https://docs.reprozip.org/en/1.0.x/graph.html. We are in the
process of integrating a patch for transforming this static graph into an
interactive visualization using D3.
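
For reference, the usual workflow looks something like this (the package and
file names are made up; the documentation page above lists the available
filtering options):

    # Generate a Graphviz dot file describing processes, files and
    # dependencies, then render it to an image (requires Graphviz)
    reprounzip graph website-graph.dot mysite.rpz
    dot -Tpng website-graph.dot -o website-graph.png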

As for Docker, I too do not trust it. However, ReproZip simply *uses*
Docker, but does not rely on it. ReproZip works on a plugin model -- so the
.rpz file is generalized and can be used by many virtualization and
container tools. We are in the process of adding an unpacker for
Singularity, for example. If Docker goes out of business/ceases to exist
tomorrow, we can still unpack and reuse the contents of .rpz files. We
actually wrote a paper about how ReproZip could be used for digital
preservation, available open access here: https://osf.io/preprints/lissa/5tm8d/
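
To make the plugin model concrete, the same (made-up) mysite.rpz package can
be set up and run with different unpackers -- the Singularity one mentioned
above may or may not be available yet depending on your version:

    # Docker container
    reprounzip docker setup mysite.rpz mysite-docker
    reprounzip docker run mysite-docker

    # Vagrant/VirtualBox virtual machine instead of Docker
    reprounzip vagrant setup mysite.rpz mysite-vm
    reprounzip vagrant run mysite-vm

    # Plain directory on a compatible Linux system, no container at all
    reprounzip directory setup mysite.rpz mysite-dir
    reprounzip directory run mysite-dir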


In regards to emulation vs. migration, this is a larger conversation in the
digital preservation community as you probably know. There is a benefit to
keeping a website (or any digital object, really) available in its original
environment when serving it up to users later on. We've seen this in the
art conservation world, where time-based media preservation has forced
archivists to engage with emulation more seriously. Yale University offers
emulation as a service, where they provide access to various old operating
systems and software to allow folks to interact with digital objects in
their original environment. The video game community also has many
discussions about this, with emulators really rising out of that community.
A high quality migration can be as effective as emulation in preserving the
original look and feel of complex digital objects -- but in terms of that
scaling, I'm not sure.

Cheers,
Vicky
Vicky Steeves
Research Data Management and Reproducibility Librarian
Phone: 1-212-992-6269
ORCID: orcid.org/0000-0003-4298-168X
vickysteeves.com | @VickySteeves <https://twitter.com/VickySteeves>
NYU Libraries Data Services | NYU Center for Data Science

On Wed, Apr 4, 2018 at 3:28 PM, Rémi Rampin <remi.ram...@nyu.edu> wrote:

> -- Forwarded message --
> From: Rasa Bočytė <rboc...@beeldengeluid.nl>
> Date: Wed, Apr 4, 2018 at 8:03 AM
> Subject: Re: [Reprozip-users] Web Archiving
> To: Rémi Rampin <remi.ram...@nyu.edu>
>
>
> Dear Remi,
>
> thank you for your response! It is good to hear that other people are
> working on similar issues as well!
>
> Could you tell me a bit more about your work on trying to package
> dynamic websites? Are you working on specific cases or just exploring the
> possibilities? I would be very interested to hear how you approach this.
>
> In our case, we are getting all the source files directly from content
> creators and we are looking for a way to record and store all the
> technical, administrative and descriptive metadata, and visualise
> dependencies on software/hardware/file formats/ etc. (similar to what
> Binder does). We would try to get as much information from the creators
> (probably via a questionnaire) about all the technical details as well as
> creative processes, and preferably record it in a machine and human
> readable format (XML) or README file.
>
> At the end of the day,

Re: [Reprozip-users] Web Archiving

2018-04-04 Thread Rémi Rampin
-- Forwarded message --
From: Rasa Bočytė <rboc...@beeldengeluid.nl>
Date: Wed, Apr 4, 2018 at 8:03 AM
Subject: Re: [Reprozip-users] Web Archiving
To: Rémi Rampin <remi.ram...@nyu.edu>


Dear Remi,

thank you for your response! It is good to hear that other people are
working on similar issues as well!

Could you tell me a bit more about your work on trying to package dynamic
websites? Are you working on specific cases or just exploring the
possibilities? I would be very interested to hear how you approach this.

In our case, we are getting all the source files directly from content
creators and we are looking for a way to record and store all the
technical, administrative and descriptive metadata, and visualise
dependencies on software/hardware/file formats/ etc. (similar to what
Binder does). We would try to get as much information from the creators
(probably via a questionnaire) about all the technical details as well as
creative processes, and preferably record it in a machine and human
readable format (XML) or README file.

At the end of the day, we want to see whether we can come up with some kind
of scalable strategy for websites preservation or whether this would have
to be done very much on a case-by-case basis. We have been mostly
considering migration as it is a more scalable approach and less
technically demanding. Do you find that virtualisation is a better strategy
for website preservation? At least from the archival community, we have
heard some reservations about using Docker since it is not considered a
stable platform.


Kind regards,
Rasa

On 3 April 2018 at 19:11, Rémi Rampin <remi.ram...@nyu.edu> wrote:

> 2018-03-30 08:59 EDT, Rasa Bočytė <rboc...@beeldengeluid.nl>:
>
>> What I am most interested in is to know whether ReproZip could be used to:
>> - document the original environment from which the files were acquired
>> (web server, hardware, software)
>> - record extra technical details and instructions that could be added
>> manually
>> - maintain dependencies between files and folders
>> - capture metadata
>>
>
> Hello Rasa,
>
> Thanks for your note! We have been looking specifically at packing dynamic
> websites, so your email is timely. In short, ReproZip can do the bullet
> points you’ve outlined in your email.
>
> ReproZip stores the kernel version and distribution information (provided
> by Python’s platform module
> <https://docs.python.org/3/library/platform.html>). That is usually
> enough to get a virtual machine (or base Docker image) to run the software.
> It also stores the version numbers of all packages from which files are
> used (on deb-based and rpm-based distributions for now).
>
> For instructions, ReproZip stores the command-lines used to run the
> software that is getting packaged. It can also record which files are
> “input” or “output” of the process, which is useful when reproducing, to
> allow the user to replace inputs and look at outputs easily. We try to
> infer this information, but we rely on the user to check this, and provide
> descriptive names for them (however this means editing the YAML
> configuration file, so our experience is that not everyone takes the time).
> They can also edit the command-lines/environment variables.
>
> I am not entirely sure what you mean by “dependencies between files and
> folders”. The path to files in the original environment is stored, so the
> reproduction can happen with the exact same setup.
>
> As for metadata, ReproZip can automatically capture technical and
> administrative metadata in the config.yml file included in the ReproZip
> package. The user (archivist, in your case!) will need to add descriptive
> metadata to the package in something like a README, a finding aid, or by
> archiving the ReproZip package in a repository and filling in the required
> metadata form.
>
> We have a few examples of ReproZip packing and preserving websites. We
> packed a general blog made with Django (a small web application using a
> SQLite3 database) -- you can view more information about that website and
> the instructions for unpacking it here
> <https://github.com/ViDA-NYU/reprozip-examples/tree/master/django-blog>.
> We’ve also packed a data journalism app called Stacked Up, a dynamic web
> application to explore the textbook inventory of Philadelphia public
> schools (data is stored in a PostgreSQL database) -- you can view more
> information about that website and the instructions for unpacking it here
> <https://github.com/ViDA-NYU/reprozip-examples/tree/master/stacked-up>.
> You can view all of our examples in our examples site
> <https://examples.reprozip.org/>.
>
> Best regards,
> --
> Rémi Rampin
> ReproZip Developer
> Center for Data Science

Re: [Reprozip-users] Web Archiving

2018-04-03 Thread Rémi Rampin
2018-03-30 08:59 EDT, Rasa Bočytė :

> What I am most interested in is to know whether ReproZip could be used to:
> - document the original environment from which the files were acquired
> (web server, hardware, software)
> - record extra technical details and instructions that could be added
> manually
> - maintain dependencies between files and folders
> - capture metadata
>

Hello Rasa,

Thanks for your note! We have been looking specifically at packing dynamic
websites, so your email is timely. In short, ReproZip can do the bullet
points you’ve outlined in your email.

ReproZip stores the kernel version and distribution information (provided
by Python’s platform module
). That is usually enough
to get a virtual machine (or base Docker image) to run the software. It
also stores the version numbers of all packages from which files are used
(on deb-based and rpm-based distributions for now).

For instructions, ReproZip stores the command-lines used to run the
software that is getting packaged. It can also record which files are
“input” or “output” of the process, which is useful when reproducing, to
allow the user to replace inputs and look at outputs easily. We try to
infer this information, but we rely on the user to check this, and provide
descriptive names for them (however this means editing the YAML
configuration file, so our experience is that not everyone takes the time).
They can also edit the command-lines/environment variables.
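
As an illustration, the parts of the configuration file a user might touch
look roughly like this (a made-up excerpt; the exact field names can differ
between ReproZip versions, so check the generated file):

    runs:
      - argv: [python, manage.py, runserver]   # command-line that was traced
        environ: {LANG: en_US.UTF-8}
    inputs_outputs:
      - name: blog-database                    # descriptive name added by the user
        path: /srv/blog/db.sqlite3
        read_by_runs: [0]
        written_by_runs: [0]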

I am not entirely sure what you mean by “dependencies between files and
folders”. The path to files in the original environment is stored, so the
reproduction can happen with the exact same setup.

As for metadata, ReproZip can automatically capture technical and
administrative metadata in the config.yml file included in the ReproZip
package. The user (archivist, in your case!) will need to add descriptive
metadata to the package in something like a README, a finding aid, or by
archiving the ReproZip package in a repository and filling in the required
metadata form.

We have a few examples of ReproZip packing and preserving websites. We
packed a general blog made with Django (a small web application using a
SQLite3 database) -- you can view more information about that website and
the instructions for unpacking it here
.
We’ve also packed a data journalism app called Stacked Up, a dynamic web
application to explore the textbook inventory of Philadelphia public
schools (data is stored in a PostgreSQL database) -- you can view more
information about that website and the instructions for unpacking it here
. You
can view all of our examples in our examples site
.
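
For a rough idea of what packing such a site involves (the command and file
names here are illustrative; the example pages linked above give the exact
steps used), the process is essentially:

    # Trace the development server while visiting the pages to capture,
    # stop it, then bundle the trace into an .rpz package
    reprozip trace python manage.py runserver
    reprozip pack django-blog.rpz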

Best regards,
-- 
Rémi Rampin
ReproZip Developer
Center for Data Science, New York University
___
Reprozip-users mailing list
Reprozip-users@vgc.poly.edu
https://vgc.poly.edu/mailman/listinfo/reprozip-users


[Reprozip-users] Web Archiving

2018-03-30 Thread Rasa Bočytė
Dear ReproZip Team,

let me introduce myself. I am a researcher at the Netherlands Institute for
Sound and Vision and I am currently investigating server-side web
preservation strategies as a way to preserve dynamic websites.

We have been conducting a review of best practices for server-side website
preservation and a couple of people mentioned that ReproZip might be
helpful for that. I was wondering if you think it could be used for this
purpose and whether you know of any examples of web servers or similar
databases being preserved this way.

What I am most interested in is to know whether ReproZip could be used to:
- document the original environment from which the files were acquired (web
server, hardware, software)
- record extra technical details and instructions that could be added
manually
- maintain dependencies between files and folders
- capture metadata

I would be very interested to hear about this and I would really appreciate
your help!

Kind regards,
---

*Rasa Bocyte*
Web Archiving Intern

*Netherlands Institute for Sound and Vision*
*Media Parkboulevard 1, 1217 WE Hilversum | Postbus 1060, 1200 BB Hilversum |
beeldengeluid.nl*
___
Reprozip-users mailing list
Reprozip-users@vgc.poly.edu
https://vgc.poly.edu/mailman/listinfo/reprozip-users