Hi Josh!
We have a few Wikipedia term-occurrence datasets
here: http://www.select.cs.cmu.edu/code/graphlab/datasets.html

Another interesting dataset I learned about today is the US airline
on-time statistics: http://stat-computing.org/dataexpo/2009/

Best,

Dr. Danny Bickson
Project Scientist, Machine Learning Dept.
Carnegie Mellon University


On Tue, Aug 28, 2012 at 6:07 PM, Josh Patterson <[email protected]> wrote:
> Does anyone have any great suggestions for open datasets to run/test
> SGD on that are in the 500MB - 1GB range?
>
> Just looking for nice benchmarking datasets, wondered what the
> community thought here.
>
> Thanks,
>
> Josh
>
> --
> Twitter: @jpatanooga
> Principal Solution Architect @ Cloudera
> hadoop: http://www.cloudera.com
