2009/3/4 Lukasz Szybalski <[email protected]>:
> Hello,
> As some of you might know I run a project called datahub.
>
> "Datahub is a tool that allows faster download/crawl, parse, load, and
> visualize of data. It achieves this by allowing you to divide each
> step into its own work folders. In each work folder you get a sample
> files that you can start coding."
>
> http://lucasmanual.com/mywiki/DataHub
Sounds nice, and like it might have something in common with both the
data packages/bundles in our Open Economics project and our datapkg
utility. For example, here's the 'data bundle' for the Millennium
Development Goals:

<http://knowledgeforge.net/econ/hg/file/d1275e3592b1/econdata/mdg/>
<http://knowledgeforge.net/econ/hg/file/d1275e3592b1/econdata/mdg/data.py>

The data.py file has code for getting the data, parsing it, etc.

Here's datapkg:

<http://www.okfn.org/datapkg/>

Among other things, datapkg has a create command for creating a basic
set of 'package' files on disk (see the $ datapkg man command for more
info).

> There were some discussion in collaboration of ckan and datahub. The
> main goal as I see datahub right now is to create tools for getting,
> parsing, manipulating and possibly visualizing data. If every project
> that is listed here: http://www.ckan.net/package/list had a
> corresponding package that I could download, run some command which
> would get the data, run another command to parse and load the data,
> then data mining would allow us to do so much more without the
> overhead of getting,parsing and loading the data.

We share a similar dream :)

CKAN has a nice REST API:

<http://www.ckan.net/api/rest/>

And there's a Python implementation that talks to it:

<http://project.knowledgeforge.net/ckan/svn/ckanclient/trunk/>

datapkg also has facilities for talking to CKAN in order to register
and download material, though these are in a fairly alpha state (see
$ datapkg man). However, as should be clear from browsing around CKAN,
not all packages there have a 'download url', and when they do it
usually isn't something that is packaged (usually just a tar.gz or the
like). That said, I definitely think things should move in the
direction you suggest.
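
To make the point about download URLs concrete, here is a minimal
Python sketch of reading package records over the REST API and sorting
them by whether they carry a download URL. Only the base URL comes from
this message. The /package/<name> path, the JSON response format, the
'name' and 'download_url' field names, and all function names are
assumptions made for illustration; this is not the ckanclient or
datapkg code.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL quoted in the message; every path below it is an assumption.
CKAN_REST_BASE = "http://www.ckan.net/api/rest"


def package_url(name, base=CKAN_REST_BASE):
    """Build the (assumed) URL of a single package record."""
    return "{}/package/{}".format(base.rstrip("/"), quote(name, safe=""))


def fetch_package(name, base=CKAN_REST_BASE, opener=urlopen):
    """Fetch one package record and decode it as JSON.

    `opener` is injectable so the function can be exercised offline.
    """
    with opener(package_url(name, base)) as response:
        return json.loads(response.read().decode("utf-8"))


def split_by_download_url(records):
    """Return (names with a non-empty 'download_url', names without one)."""
    with_url, without_url = [], []
    for record in records:
        # 'download_url' is an assumed field name; missing or empty both
        # count as "no download available".
        target = with_url if record.get("download_url") else without_url
        target.append(record["name"])
    return with_url, without_url
```

Something like split_by_download_url([fetch_package(n) for n in names])
would then show which packages a tool could fetch automatically and
which still need a maintainer to point at the raw material.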
In fact, there have been discussions here for a while about the idea of
having 'data package maintainers' a la Debian, who maintain CKAN
packages and do the job of converting the raw material into a more
standardized form (in the way that Debian maintainers 'package' up the
underlying software libraries and applications).

Regards,

Rufus

_______________________________________________
okfn-discuss mailing list
[email protected]
http://lists.okfn.org/cgi-bin/mailman/listinfo/okfn-discuss
