2012/1/23 Dimitrios Pritsos <[email protected]>:
> On 01/23/2012 12:24 PM, Olivier Grisel wrote:
>> BTW: what is the structure of your data in PyTables? Is it mapped to a
>> scipy.sparse Compressed Sparse Row data structure? How many features do
>> you have in your dataset?
>
> The training data are in an EArray (compressed per row due to lots of
> zeros).
> I have 34000 samples and the length of my dictionary, depending on the
> training set, is about 1,500,000.
> However, using about 30,000 features seems satisfactory for a
> proof-of-concept case. However, the number of samples needs to be
> about 30-50k.

That would be doable. 30k features × 50k samples in a CSR matrix with
dtype=float32, assuming it's 90% zeros (a pessimistic guess for topic
spotting), would take just over 1GB: 150M nonzeros at 8 bytes each
(a 4-byte value plus a 4-byte int32 column index).

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
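
For anyone who wants to redo the estimate with other numbers, here is a rough back-of-the-envelope sketch. It assumes the usual CSR layout (one value and one column index per nonzero, plus a row-pointer array of n_samples + 1 entries). The helper name csr_nbytes and its parameters are made up for illustration and are not part of scipy; the 4-byte index size is an assumption that holds only while the index values fit in int32.

```python
def csr_nbytes(n_samples, n_features, density, value_bytes=4, index_bytes=4):
    """Estimate the bytes used by the three arrays of a CSR matrix.

    Hypothetical helper for illustration only, not a scipy function.
    density is the fraction of entries that are nonzero.
    """
    nnz = int(round(n_samples * n_features * density))
    data = nnz * value_bytes                # one stored value per nonzero
    indices = nnz * index_bytes             # one column index per nonzero
    indptr = (n_samples + 1) * index_bytes  # row pointers
    return data + indices + indptr


# 50k samples x 30k features, 10% nonzero, float32 values, int32 indices
estimate = csr_nbytes(50_000, 30_000, 0.10)
print(f"{estimate / 1e9:.2f} GB ({estimate / 2**30:.2f} GiB)")
```

Passing value_bytes=8 (float64) or index_bytes=8 (64-bit indices) shows how quickly the footprint grows if either assumption does not hold.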
