Hi Josh!

We have a few Wikipedia term-occurrence datasets here: http://www.select.cs.cmu.edu/code/graphlab/datasets.html
Another interesting dataset I learned about today is American airlines statistics: http://stat-computing.org/dataexpo/2009/

Best,

Dr. Danny Bickson
Project Scientist, Machine Learning Dept.
Carnegie Mellon University

On Tue, Aug 28, 2012 at 6:07 PM, Josh Patterson <[email protected]> wrote:
> Does anyone have any great suggestions for open datasets to run/test
> SGD on that are in the 500MB - 1GB range?
>
> Just looking for nice benchmarking datasets, wondered what the
> community thought here.
>
> Thanks,
>
> Josh
>
> --
> Twitter: @jpatanooga
> Principal Solution Architect @ Cloudera
> hadoop: http://www.cloudera.com
