Just because it sounds cool:
----- Original Message -----
> Google to Host Terabytes of Open-Source Science Data
> By Alexis Madrigal
> January 18, 2008 | 2:23:21 PM
>
> Categories: Dataset, Research
>
> <http://blog.wired.com/wiredscience/2008/01/google-to-provi.html>
>
> Sources at Google have disclosed that the humble domain
> http://research.google.com will soon provide a home for terabytes of
> open-source scientific datasets. The storage will be free to
> scientists, and access to the data will be free for all. The project,
> known as Palimpsest and first previewed to the scientific community at
> the Science Foo camp at the Googleplex last August, missed its
> original launch date this week, but will debut soon.
>
> Building on the company's acquisition of the data-visualization
> technology Trendalyzer from the oft-lauded, TED-presenting Gapminder
> team, Google will also be offering algorithms for examining and
> probing the information. The new site will have YouTube-style
> annotating and commenting features.
>
> [N.N.: TED talk: http://www.ted.com/index.php/talks/view/id/92 ]
>
> The storage would fill a major need for scientists who want to openly
> share their data, and would give citizen scientists access to an
> unprecedented amount of data to explore. For example, two planned
> datasets are all 120 terabytes of Hubble Space Telescope data and the
> images from the Archimedes Palimpsest, the 10th-century manuscript
> that inspired the Google dataset storage project.
>
> UPDATE (12:01pm): Attila Csordas of Pimm has many more details on the
> project, including a set of slides that Jon Trowbridge of Google gave
> at a presentation in Paris last year. WIRED's own Thomas Goetz also
> mentioned the project in his fantastic piece on freeing dark data.
>
> One major issue with science's huge datasets is how to get them to
> Google.
> In this post by a SciFoo attendee over at
> business|bytes|genes|molecules, the collection plan was described:
>
> (Google people) are providing a 3TB drive array (Linux RAID5). The
> array is provided in a "suitcase" and shipped to anyone who wants to
> send their data to Google. Anyone interested gives Google the file
> tree, and they slurp the data off the drive. I believe they can extend
> this to a larger array (my memory says 20TB).
>
> You can check out more details on why hard drives are the preferred
> distribution method at Pimm. And we hear that Google is hunting for
> cool datasets, so if you have one, it might pay to get in touch with
> them.
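The ship-a-drive workflow described above — hand over a file tree, then let the recipient "slurp" and verify the copy — can be sketched with standard tools. This is only an illustrative sketch, not Google's actual procedure; the paths and file names are made up for the example, and it assumes a POSIX shell with GNU `sha256sum` available:

```shell
#!/bin/sh
# Hypothetical sketch of preparing a drive for a ship-the-data workflow:
# record the file tree and checksums so the recipient can verify the copy.
# All paths and file names here are invented for illustration.
set -eu

DATA=/tmp/dataset_demo          # stand-in for the drive's mount point
mkdir -p "$DATA"
printf 'hubble sample frame\n' > "$DATA/frame001.dat"

cd "$DATA"
# The "file tree" handed to the recipient: every regular file on the
# drive, excluding the manifest files themselves.
find . -type f ! -name MANIFEST.txt ! -name CHECKSUMS.txt \
    | sort > MANIFEST.txt
# Checksums let the recipient confirm the data survived shipping.
xargs sha256sum < MANIFEST.txt > CHECKSUMS.txt

# On the receiving side, after copying the tree off the drive:
sha256sum -c CHECKSUMS.txt
```

At these scales (terabytes over mail), shipping drives beats 2008-era network transfer, but a manifest-plus-checksum step like this is what makes the copy trustworthy once it arrives.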