Hi All / Ted,

I looked through the mailing list first, since similar questions have been asked before, but couldn't find what I was looking for.
Quick background: I have been working on higher-order learning algorithms (feature sharding, to be specific) for some time. Getting this work into Mahout will require solid progress on the Pig/Mahout integration front, among other things. In the meantime, I have been exploring how vertical sharding generally affects classifier performance, using some simple code I've written in Weka. Most of my studies so far have been done on moderate-dimensional datasets.

Can someone please suggest some high- or very-high-dimensional datasets that are suitable for binary classification and freely available?

Thank you!

--
Praneet Mhatre
Graduate Student
Donald Bren School of ICS
University of California, Irvine
