Re: Use guix to distribute data & reproducible (data) science

2018-02-18 Ricardo Wurmus
Amirouche Boubekki writes: > Then, in a follow up mail, you reply to Konrad: > >>> Konrad Hinsen skribis: >> >> [...] >> >>> It would be nice if big datasets could conceptually be handled in the >>> same way while being stored elsewhere - a

Re: Use guix to distribute data & reproducible (data) science

2018-02-18 Ludovic Courtès
Hi Amirouche, Amirouche Boubekki skribis: > On 2018-02-09 18:13, ludovic.cour...@inria.fr wrote: >> Hi! >> >> Amirouche Boubekki skribis: >> >>> tl;dr: Distribution of data and software seems similar. >>> Data is more and more important

Re: Use guix to distribute data & reproducible (data) science

2018-02-17 Roel Janssen
Amirouche Boubekki writes: > Hello again Ludovic, > > On 2018-02-09 18:13, ludovic.cour...@inria.fr wrote: >> Hi! >> >> Amirouche Boubekki skribis: >> >>> tl;dr: Distribution of data and software seems similar. >>> Data is more and more important in software

Re: Use guix to distribute data & reproducible (data) science

2018-02-16 Amirouche Boubekki
Hello again Ludovic, On 2018-02-09 18:13, ludovic.cour...@inria.fr wrote: Hi! Amirouche Boubekki skribis: tl;dr: Distribution of data and software seems similar. Data is more and more important in software and reproducible science. Data science

Re: Use guix to distribute data & reproducible (data) science

2018-02-16 Konrad Hinsen
Hi George, myg...@gmail.com writes: >> The three missing pieces are: >> >> - Dealing with measurements, which might involve interacting with >> experimental equipment or databases. Moreover, since data from >> such sources can change, its hash in the store must be computed >> from the

Re: Use guix to distribute data & reproducible (data) science

2018-02-16 Amirouche Boubekki
On Thu, Feb 15, 2018 at 6:11 PM zimoun wrote: > Hi, > > Thank you for this food for thought. > > > I agree that the frontier between code and data is arbitrary. > > However, I am not sure I get the picture regarding data management in > the context of Reproducible

Re: Use guix to distribute data & reproducible (data) science

2018-02-16 Konrad Hinsen
Hi, > In other words, on paper, what are the benefits of managing > some piece of data in the store? For example, for the weights > of a trained neural network, or for the positions of the atoms in a > protein structure. Provenance tracking. In a complex data processing

Re: Use guix to distribute data & reproducible (data) science

2018-02-15 zimoun
Hi, Thank you for this food for thought. I agree that the frontier between code and data is arbitrary. However, I am not sure I get the picture regarding data management in the context of Reproducible Science. What is the issue? So, I accept your invitation to explore your idea. :-) Let

Re: Use guix to distribute data & reproducible (data) science

2018-02-14 Ludovic Courtès
Hello, Konrad Hinsen skribis: > It would be nice if big datasets could conceptually be handled in the > same way while being stored elsewhere - a bit like git-annex does for > git. And for parallel computing, we could have special build daemons. Exactly. I think we
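The git-annex-like idea quoted here (keep big datasets outside the store, but refer to them in the same content-addressed way) can be illustrated with a small Python sketch. It assumes a plain local directory as the external "annex" and a hypothetical `Pointer` record (hash plus size) that would travel with the package instead of the data itself. It is not git-annex's real storage layout, nor anything Guix provides; `add_to_annex` and `fetch_from_annex` are made-up names.

```python
import hashlib
import os
import shutil
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    """Small record kept with the code; the actual bytes live elsewhere (hypothetical)."""
    sha256: str
    size: int


def add_to_annex(path: str, annex_dir: str) -> Pointer:
    # Hash the file in 1 MiB chunks so large datasets never sit fully in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    digest = h.hexdigest()
    # Store the content under its own hash: identical data is stored only once.
    target = os.path.join(annex_dir, digest)
    if not os.path.exists(target):
        shutil.copyfile(path, target)
    return Pointer(digest, os.path.getsize(path))


def fetch_from_annex(ptr: Pointer, annex_dir: str) -> bytes:
    # Retrieve by hash and re-verify, so a corrupted or swapped file is rejected.
    with open(os.path.join(annex_dir, ptr.sha256), "rb") as f:
        data = f.read()
    if len(data) != ptr.size or hashlib.sha256(data).hexdigest() != ptr.sha256:
        raise ValueError("annexed content does not match its pointer")
    return data


with tempfile.TemporaryDirectory() as work:
    annex = os.path.join(work, "annex")
    os.mkdir(annex)
    big = os.path.join(work, "big.csv")
    with open(big, "wb") as f:
        f.write(b"a,b\n1,2\n")
    ptr = add_to_annex(big, annex)  # ptr.size == 8
    assert fetch_from_annex(ptr, annex) == b"a,b\n1,2\n"
```

Only the pointer needs to be versioned and distributed with the software. The bulky bytes can live on any server or mirror, because whoever fetches them can check them against the hash.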

Re: Use guix to distribute data & reproducible (data) science

2018-02-12 Konrad Hinsen
Hi everyone, zimoun writes: > From my point of view, there are 2 kinds of datasets: > a- the ones which are part of the software, e.g., used to pass the > tests. Therefore, they are usually small, but not always; > b- the ones which are applied to the software and somehow

Re: Use guix to distribute data & reproducible (data) science

2018-02-10 zimoun
Hi, Thank you for this thought-provoking topic. And thank you Ricardo for your explanations. > What I was thinking about, is use guix to distribute data packages just like > we distribute software from PyPI. The advantage of using guix seems > obvious, > but apparently it's not desirable or

Re: Use guix to distribute data & reproducible (data) science

2018-02-10 Amirouche Boubekki
On Fri, Feb 9, 2018 at 8:16 PM Konrad Hinsen wrote: > Hi, > > On 09/02/2018 18:13, Ludovic Courtès wrote: > > > Amirouche Boubekki skribis: > > > >> tl;dr: Distribution of data and software seems similar. > >> Data is more and more

Re: Use guix to distribute data & reproducible (data) science

2018-02-09 Ricardo Wurmus
zimoun writes: > I do not know much, but an idea would be to write a workflow: you > fetch the data, you clean them and you check by hashing that the > result is the expected one. Only the software used to do that is in > the store. The input and output data are not,
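The "fetch, clean, check by hashing" idea quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fetch_raw` is a hypothetical stand-in that returns fixed bytes instead of downloading anything, `clean` is an arbitrary example cleaning step, and the expected hash is assumed to have been recorded once from a known-good run.

```python
import hashlib


def fetch_raw() -> bytes:
    # Hypothetical stand-in for downloading a dataset; here it just returns fixed bytes.
    return b"id,value\n1, 10 \n2, 20 \n\n"


def clean(raw: bytes) -> bytes:
    # Hypothetical cleaning step: trim whitespace around fields and drop blank lines.
    rows = []
    for line in raw.decode("utf-8").splitlines():
        if line.strip():
            rows.append(",".join(field.strip() for field in line.split(",")))
    return ("\n".join(rows) + "\n").encode("utf-8")


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def check(data: bytes, expected: str) -> None:
    # Fail loudly if the cleaned result is not bit-for-bit the expected one.
    actual = sha256_hex(data)
    if actual != expected:
        raise ValueError(f"hash mismatch: expected {expected}, got {actual}")


# In practice `expected` would be recorded once, from a known-good run,
# and kept next to the workflow definition.
expected = sha256_hex(b"id,value\n1,10\n2,20\n")
check(clean(fetch_raw()), expected)  # passes silently
```

As in the quoted message, only the code (the fetching and cleaning software) would need to live in the store; the data itself stays outside, and the recorded hash is what ties a given run to the expected result.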

Re: Use guix to distribute data & reproducible (data) science

2018-02-09 zimoun
Hi, > I'd say it depends on the data and how it is used inside and outside of a > workflow. Some data could very well be stored in the store, and then > distributed via standard channels (Zenodo, ...) after export by "guix pack". > For big datasets, some other mechanism is required. I am not sure

Re: Use guix to distribute data & reproducible (data) science

2018-02-09 zimoun
Dear, From my understanding, what you are describing is what bioinfo guys call a workflow: 1- fetch data here and there 2- clean and prepare data 3- compute stuff with these data 4- obtain an answer and loop several times over several data sets. The Guix Workflow Language allows one to implement the
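The four workflow steps listed in this message, combined with the provenance tracking Konrad mentions elsewhere in the thread, can be sketched in Python by keying each step's result on the step name plus the hash of its input. This is a simplified illustration, not how Guix derivations or the Guix Workflow Language actually work. `run_step`, `prepare`, `compute` and `workflow` are hypothetical names, and step 1 (fetching) is stubbed by passing the datasets in memory.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Cache = Dict[str, bytes]                    # step key -> step output
Provenance = Dict[str, Tuple[str, str]]     # output hash -> (step name, input hash)


def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def run_step(name: str, fn: Callable[[bytes], bytes], data: bytes,
             cache: Cache, prov: Provenance) -> bytes:
    # Key the result on what produced it: the step name plus the input's hash.
    key = sha(name.encode("utf-8") + b"\0" + sha(data).encode("ascii"))
    if key not in cache:
        out = fn(data)
        cache[key] = out
        prov[sha(out)] = (name, sha(data))  # remember where this output came from
    return cache[key]


def prepare(raw: bytes) -> bytes:
    # Step 2 (hypothetical "clean and prepare"): drop blank lines.
    return b"\n".join(line for line in raw.splitlines() if line.strip())


def compute(cleaned: bytes) -> bytes:
    # Step 3 (hypothetical "compute stuff"): count the records.
    return str(len(cleaned.splitlines())).encode("ascii")


def workflow(datasets: List[bytes], cache: Cache, prov: Provenance) -> List[bytes]:
    # Step 4: loop over several datasets, collecting one answer per dataset.
    answers = []
    for raw in datasets:
        cleaned = run_step("prepare", prepare, raw, cache, prov)
        answers.append(run_step("compute", compute, cleaned, cache, prov))
    return answers


cache: Cache = {}
prov: Provenance = {}
answers = workflow([b"a\n\nb\n", b"x\n"], cache, prov)  # [b"2", b"1"]
```

Re-running the loop on data it has already seen hits the cache instead of recomputing, and any answer can be traced back through `prov` to the step and the exact input that produced it.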

Re: Use guix to distribute data & reproducible (data) science

2018-02-09 Ludovic Courtès
Hi! Amirouche Boubekki skribis: > tl;dr: Distribution of data and software seems similar. > Data is more and more important in software and reproducible > science. Data science ecosystem lacks resource sharing. > I think guix can help. I think

Use guix to distribute data & reproducible (data) science

2018-02-09 Amirouche Boubekki
Héllo all, tl;dr: Distribution of data and software seems similar. Data is more and more important in software and reproducible science. Data science ecosystem lacks resource sharing. I think guix can help. Recently I stumbled upon the open data movement and its links with