http://lucasmanual.com/mywiki/DataHub
*Datahub is a tool that speeds up downloading (crawling), parsing, loading, and visualizing data. It does this by giving each step its own work folder, and each work folder comes with sample files you can start coding from.

*Datahub is for people who have found an interesting data source and want to download it, parse it, load it into a database, document it, and visualize it. Datahub speeds this up by creating a folder for each of these actions. You write your programs starting from the default template and can move on to analyzing the data quickly.

How to get started:
Datahub is a Python-based tool. Here is how to run it.

**Create a Python virtual environment:
    virtualenv --no-site-packages datahubENV
    source datahubENV/bin/activate

**How to get it:
    wget http://launchpad.net/datahub/trunk/0.7/+download/datahub-0.7.tar.gz
    tar -xzvf datahub-0.7.tar.gz

**Install it:
    cd datahub-0.7/
    python setup.py install

**Create your project using the datahub default templates:
    paster create --list-templates
    paster create -t datahub

**Where do I start:
The commands above create a project skeleton with the following folders:
    crawl (sample code to download via wget or harvestman)
    parse (this is where you parse the raw data)
    load (this is where you load the data into a database using sqlalchemy or a tool of your choice)
    hdf5 (convert to hdf5 if you don't want to use a database)
    wiki (provide some documentation)

This is a first release, so feedback is appreciated. Give it a try if you have some interesting data to deal with.

Thanks,
Lucas
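
P.S. To give a rough idea of the kind of code that might end up in the parse and load folders, here is a minimal sketch. It is not taken from the datahub templates. The sample data, the `records` table, and the function names are all made up for illustration, and it uses Python's built-in sqlite3 module instead of sqlalchemy so that it runs without installing anything.

```python
import csv
import io
import sqlite3

# Made-up sample standing in for a raw file saved by the crawl step.
RAW_DATA = "name,value\nalpha,1\nbeta,2\n"


def parse_rows(raw_text):
    """Parse step: turn raw CSV text into (name, value) tuples."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [(row["name"], int(row["value"])) for row in reader]


def load_rows(conn, rows):
    """Load step: create the table if needed and insert the parsed rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (name TEXT, value INTEGER)"
    )
    conn.executemany("INSERT INTO records VALUES (?, ?)", rows)
    conn.commit()


def run_pipeline(raw_text):
    """Run parse then load against an in-memory database and return it."""
    conn = sqlite3.connect(":memory:")
    load_rows(conn, parse_rows(raw_text))
    return conn


if __name__ == "__main__":
    conn = run_pipeline(RAW_DATA)
    query = "SELECT name, value FROM records ORDER BY name"
    for name, value in conn.execute(query):
        print(name, value)
```

Keeping parse and load as separate functions mirrors the separate work folders: you can rerun the load against a different database without touching the parsing code.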