Amirouche Boubekki writes:
> Then, in a follow up mail, you reply to Konrad:
>
>>> Konrad Hinsen skribis:
>>
>> [...]
>>
>>> It would be nice if big datasets could conceptually be handled in the
>>> same way while being stored elsewhere - a
Hi Amirouche,
Amirouche Boubekki skribis:
> On 2018-02-09 18:13, ludovic.cour...@inria.fr wrote:
>> Hi!
>>
>> Amirouche Boubekki skribis:
>>
>>> tl;dr: Distribution of data and software seems similar.
>>> Data is more and more important
Amirouche Boubekki writes:
> Hello again Ludovic,
>
> On 2018-02-09 18:13, ludovic.cour...@inria.fr wrote:
>> Hi!
>>
>> Amirouche Boubekki skribis:
>>
>>> tl;dr: Distribution of data and software seems similar.
>>> Data is more and more important in software
Hello again Ludovic,
On 2018-02-09 18:13, ludovic.cour...@inria.fr wrote:
Hi!
Amirouche Boubekki skribis:
tl;dr: Distribution of data and software seems similar.
Data is more and more important in software and reproducible
science. Data science
Hi George,
myg...@gmail.com writes:
>> The three missing pieces are:
>>
>> - Dealing with measurements, which might involve interacting with
>>   experimental equipment or databases. Moreover, since data from
>>   such sources can change, its hash in the store must be computed
>>   from the
On Thu, Feb 15, 2018 at 6:11 PM zimoun wrote:
> Hi,
>
> Thank you for this food for thought.
>
>
> I agree that the frontier between code and data is arbitrary.
>
> However, I am not sure I get the picture of data management in
> the context of Reproducible
Hi,
> In other words, on paper, what are the benefits of managing some
> piece of data in the store? For example, for applications such as the
> weights of a trained neural network, or the positions of the atoms in
> a protein structure.
Provenance tracking. In a complex data processing
Hi,
Thank you for this food for thought.
I agree that the frontier between code and data is arbitrary.
However, I am not sure I get the picture of data management in
the context of Reproducible Science. What is the issue?
So, I accept your invitation to explore your idea. :-)
Let
Hello,
Konrad Hinsen skribis:
> It would be nice if big datasets could conceptually be handled in the
> same way while being stored elsewhere - a bit like git-annex does for
> git. And for parallel computing, we could have special build daemons.
Exactly. I think we
Hi everyone,
zimoun writes:
> From my point of view, there are 2 kinds of datasets:
> a- the ones which are part of the software, e.g., used to pass the
> tests. Therefore, they are usually small, though not always;
> b- the ones which are applied to the software and somehow
Hi,
Thank you for the topic feeding my thoughts.
And thank you Ricardo for your explanations.
> What I was thinking about is using guix to distribute data packages just like
> we distribute software from pypi. The advantage of using guix seems obvious,
> but apparently it's not desirable or
On Fri, Feb 9, 2018 at 8:16 PM Konrad Hinsen wrote:
> Hi,
>
> On 09/02/2018 18:13, Ludovic Courtès wrote:
>
> > Amirouche Boubekki skribis:
> >
> >> tl;dr: Distribution of data and software seems similar.
> >> Data is more and more
zimoun writes:
> I do not know much about it, but one idea would be to write a workflow:
> you fetch the data, you clean it and you check by hashing that the
> result is the expected one. Only the software used to do that is in
> the store. The input and output data are not,
Hi,
> I'd say it depends on the data and how it is used inside and outside of a
> workflow. Some data could very well be stored in the store, and then
> distributed via standard channels (Zenodo, ...) after export by "guix pack".
> For big datasets, some other mechanism is required.
I am not sure
Dear,
From my understanding, what you are describing is what bioinfo guys
call a workflow:
1- fetch data here and there
2- clean and prepare data
3- compute stuff with these data
4- obtain an answer
and loop several times on several data sets.
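
The four steps above, together with the hash check suggested elsewhere in
this thread, can be sketched in plain Python. This is only a minimal
illustration and not the Guix Workflow Language: the function names
(fetch, clean, compute, run) and the in-memory "datasets" are hypothetical
stand-ins chosen for the example.

```python
import hashlib


def fetch(source):
    """Step 1: fetch data. Hypothetical stand-in: the 'source' is already
    the raw text; a real workflow would read a file or download it."""
    return source


def clean(raw):
    """Step 2: clean and prepare. Strip whitespace and drop blank lines."""
    lines = [line.strip() for line in raw.splitlines()]
    return "\n".join(line for line in lines if line)


def compute(prepared):
    """Step 3: compute stuff. Here: sum one number per line."""
    return sum(float(line) for line in prepared.splitlines())


def sha256_hex(text):
    """Hash the prepared data so a later run can check it is unchanged."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def run(source, expected_hash=None):
    """Steps 1-4 for one dataset; optionally verify the cleaned data."""
    prepared = clean(fetch(source))
    digest = sha256_hex(prepared)
    if expected_hash is not None and digest != expected_hash:
        raise ValueError("cleaned data does not match the expected hash")
    # Step 4: obtain an answer, returned with the hash of its input.
    return compute(prepared), digest


# Loop over several data sets.
datasets = ["1\n 2 \n\n3\n", "10\n20\n"]
results = [run(data) for data in datasets]
```

Recording the digest returned by a first run and passing it back as
expected_hash on later runs is one way to detect that the input data
changed between runs.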
The Guix Workflow Language makes it possible to implement the
Hi!
Amirouche Boubekki skribis:
> tl;dr: Distribution of data and software seems similar.
> Data is more and more important in software and reproducible
> science. The data science ecosystem lacks resource sharing.
> I think guix can help.
I think
Héllo all,
tl;dr: Distribution of data and software seems similar.
Data is more and more important in software and reproducible
science. The data science ecosystem lacks resource sharing.
I think guix can help.
Recently I stumbled upon open data movement and its links with