Re: data sets and/or access to data sets

2011-02-17 Thread Steffen Möller
I like what I saw about biomaj. What it cannot do for the moment (from what I understood) is express a runtime dependency on a particular database version and have the then-installed package trigger biomaj to perform that step. Correct me if I am wrong, please. Could that be added? What

Re: data sets and/or access to data sets

2011-02-16 Thread Olivier Sallou
Hi, there is the BioMAJ tool, currently being packaged for Debian. Biomaj takes a property file that describes a remote data bank and the post-processes to apply to the downloaded data. You can get more info at http://biomaj.genouest.org. Biomaj takes care of the download, the post-processes and

Re: data sets and/or access to data sets

2011-02-16 Thread Olivier Sallou
The web interface will have to go to non-free because it depends on several GWT libraries which are not packaged for Debian, so we need to provide the Jar files with the software. The core, i.e. the main program that is called by the web interface, will be in free. Core can also be used without the

Re: data sets and/or access to data sets

2011-02-16 Thread Scott Christley
Ok, great. Is the core package working and available in SVN? I will give it a try. Scott On Feb 16, 2011, at 10:58 AM, Olivier Sallou wrote: The web interface will have to go to non-free 'cause it depends on several GWT libraries which are not packaged for Debian so we need to provide the

Re: data sets and/or access to data sets

2011-02-16 Thread Olivier Sallou
Both are available in SVN but they are not officially validated. My mentor is reviewing them before pushing them. I have tested them, however. In Alioth, there is a get-orig-source to create the source file and build the package. It also requires a MySQL database (this is mentioned in the man page) Olivier On 2/16/11 6:00

Re: data sets and/or access to data sets

2011-02-16 Thread Andreas Tille
Hi, On Wed, Feb 16, 2011 at 06:04:46PM +0100, Olivier Sallou wrote: Both are available in SVN but those are not officially validated. My mentor is analysing it to push it. I tested those however. In Alioth, there is a get-orig-source to create source file and create package. It also

Re: data sets and/or access to data sets

2011-02-16 Thread Scott Christley
I don't disagree, in principle. There are many nice aspects to Debian packaging, as you indicate. We don't want to replicate the 100s of terabytes of data into the Debian repository, so any package would not have the real data but would download the data from its source during the package

Re: data sets and/or access to data sets

2011-02-16 Thread Olivier Sallou
Data versioning is very difficult because not all data sources keep old versions online; many keep only the current one. With biomaj we propose to keep old versions (or a number of old versions), but only locally, so it cannot help to reproduce an experiment with exactly the same data if remote source

data sets and/or access to data sets

2011-02-15 Thread Scott Christley
Hello, I wonder if anybody has thought about putting large data sets, like genomes, microarray data, etc., into Debian packages in a way that makes it easy for users to get those data sets onto their machines, making it easier to use various tools? I can think of many great ways this would be

Re: data sets and/or access to data sets

2011-02-15 Thread Andreas Tille
Hi Scott, I think your idea is quite reasonable in principle. As far as I understood (but I did not dive into this) the getData effort[1] is one step in this direction and the soon-to-be-uploaded package Biomaj does something that might be helpful as well. Regarding to actually build

Re: data sets and/or access to data sets

2011-02-15 Thread Yaroslav Halchenko
Just a few cents. In the domain of neuroimaging we are also confronted with the problem of distributing data. Various aspects are relevant to this question if someone is to package data statically (instead of fetching via some data-sharing framework) into a proper Debian package: 1. with a

Re: data sets and/or access to data sets

2011-02-15 Thread Scott Christley
I think putting the data itself into the Debian repository is problematic. Regardless of any licensing issue, the sheer amount of data is too great. Better to leave it to the professionals who are paid to manage the data (NCBI, KEGG, etc.) and download directly from those sites. Pretty much all

Re: data sets and/or access to data sets

2011-02-15 Thread Yaroslav Halchenko
Well -- this issue is tangentially related to the software: why should we care about having Debian packages while there are CRAN, easy_install, etc. -- all those great tools to deploy software -- domain specific and created by specialists. Although such a comparison is a stretch, I think it has its

Re: data sets and/or access to data sets

2011-02-15 Thread Charles Plessy
On Tue, Feb 15, 2011 at 05:35:07PM -0600, Scott Christley wrote: I like the getData effort. Have a set of data descriptors with information about how/where to get data, then when requested performs the download. This is very much the architecture I was thinking about. I see a number of
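
For readers of this thread, the descriptor-plus-on-demand-download architecture described in the message above might look roughly like the following Python sketch. It is not getData or BioMAJ code and does not reflect either tool's file format or API; the DataDescriptor fields, the fetch function and the directory layout are all assumptions made for illustration only.

```python
import posixpath
import urllib.request
from dataclasses import dataclass
from pathlib import Path
from typing import Callable
from urllib.parse import urlsplit


@dataclass(frozen=True)
class DataDescriptor:
    """Hypothetical descriptor: what the data set is and where to get it."""
    name: str      # short identifier for the data set
    url: str       # upstream location maintained by the data provider
    version: str   # release label, so several versions can coexist locally


def default_download(url: str, target: Path) -> None:
    """Fetch the file from the upstream site (needs network access)."""
    urllib.request.urlretrieve(url, str(target))


def fetch(descriptor: DataDescriptor,
          data_root: str,
          download: Callable[[str, Path], None] = default_download) -> Path:
    """Download the data only when requested and only if not already present."""
    # Assumed layout: <data_root>/<name>/<version>/<file name from the URL>
    file_name = posixpath.basename(urlsplit(descriptor.url).path)
    target_dir = Path(data_root) / descriptor.name / descriptor.version
    target = target_dir / file_name
    if not target.exists():
        target_dir.mkdir(parents=True, exist_ok=True)
        download(descriptor.url, target)
    return target
```

A package would then ship only the small descriptor, and a call such as fetch(descriptor, "/var/lib/mydata") (a made-up path) would pull the real data from its source on first use. Keeping the version in the path is one simple way to hold several local releases side by side, which matters for the reproducibility concern raised earlier in the thread.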