2012/1/23 Gael Varoquaux <[email protected]>:
> On Mon, Jan 23, 2012 at 02:17:21PM +0100, Olivier Grisel wrote:
>> Hehe, that would be nice but I am afraid Gael won't let me do this as
>> part of the main scikit repository: large scale examples mean
>> large-scale datasets ;)
>
> Why can't we just generate data? The goal is to get the idea through, not
> to solve SETI@HOME on our users' laptops :).
Indeed, we could extend / refactor the multilabel dataset generator to
output arbitrarily big sparse CSR data with a text document structure.
That would be nice for benchmarks too. I'll add that to my TODO list of
interesting-stuff-but-not-that-much-of-a-priority-so-if-you-want-you-can-implement-it-yourself-before-i-do.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
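For illustration only, here is a rough sketch of what "generate arbitrarily big sparse CSR data with a text document structure" could look like. It is not the scikit-learn generator and not a proposed patch: the function name make_sparse_documents, its parameters, and the Zipf-like 1/rank word weights are all assumptions made for this example. It uses only the standard library and builds the three CSR arrays (data, indices, indptr) for the feature matrix; it does not generate labels.

```python
import random
from itertools import accumulate


def make_sparse_documents(n_docs, vocab_size, mean_doc_length=50, seed=0):
    """Return (data, indices, indptr) for an n_docs x vocab_size CSR count matrix.

    Hypothetical helper: each row is a synthetic "document" whose words are
    drawn from a Zipf-like distribution (an assumption, not a fitted model).
    """
    rng = random.Random(seed)
    # Cumulative Zipf-like weights: the word of rank r gets weight 1 / (r + 1).
    cum_weights = list(accumulate(1.0 / (rank + 1) for rank in range(vocab_size)))
    vocabulary = range(vocab_size)

    data, indices, indptr = [], [], [0]
    for _ in range(n_docs):
        # Document length varies around the mean and is at least one word.
        length = max(1, int(rng.gauss(mean_doc_length, mean_doc_length / 4)))
        counts = {}
        for word in rng.choices(vocabulary, cum_weights=cum_weights, k=length):
            counts[word] = counts.get(word, 0) + 1
        # Store the row with sorted, de-duplicated column indices.
        for word in sorted(counts):
            indices.append(word)
            data.append(counts[word])
        # indptr[i + 1] marks where row i ends in data / indices.
        indptr.append(len(indices))
    return data, indices, indptr


if __name__ == "__main__":
    data, indices, indptr = make_sparse_documents(n_docs=5, vocab_size=1000)
    print("non-zeros per document:",
          [indptr[i + 1] - indptr[i] for i in range(5)])
```

If SciPy is available, the triple can be wrapped without copying the structure by hand, e.g. scipy.sparse.csr_matrix((data, indices, indptr), shape=(n_docs, vocab_size)). Memory grows with the number of non-zeros rather than n_docs * vocab_size, which is what makes large generated benchmarks practical.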
