2012/1/23 Lars Buitinck <[email protected]>:
> 2012/1/23 Dimitrios Pritsos <[email protected]>:
>> I will give it a try however in some of my tests had a memory management
>> problem. As I can recall it was mostly because of numpy function that
>> might ask from pyTable to load every thing in main men. I guess some
>> loops and some slicing might solve the problem.
>
> No experience with PyTables, sorry.
>
>> However I fist try to figure out how to use linear_model.SGDClassifier
>> which it suppose to be capable to be trained in stages. Plus since I am
>> using Linear Kernel it won't effect my results.
>
> Is that an SVC(kernel="linear") or a LinearSVC? The latter should be
> able to handle a 50k samples array if the number of features is kept
> within some bound (a few 100k should certainly be fine).

Indeed, SVC will not scale to 50k samples; only LinearSVC will. In any
case, I found SGDClassifier (with the fit method) to be much faster than
LinearSVC or LogisticRegression (i.e. any liblinear-based model).
Discrete naive Bayes models are sometimes even faster.

Dimitrios: also, if you are trying to work with scipy.sparse CSR
matrices, be careful to read the docstring of the classifier. The
supported input formats are changing quite a bit in the current master:
we are trying to merge all classifier implementations so that they
accept both dense numpy arrays and sparse CSR matrices as input, but
this is still a work in progress. Sometimes the classifier that supports
the sparse variant is kept separate in a `.sparse` subpackage.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
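
For readers following the thread: a minimal sketch, not taken from the original messages, of what training SGDClassifier "in stages" on sparse CSR input might look like. It assumes a recent scikit-learn release in which `linear_model.SGDClassifier` accepts `scipy.sparse` CSR matrices directly and provides `partial_fit`. The data is synthetic, and the sizes, batch size, and number of passes are illustrative assumptions only.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)

# Synthetic stand-in for the real data (assumption): a sparse CSR matrix
# with 2,000 samples and 50 features, labelled by a random linear rule.
n_samples, n_features = 2000, 50
X = sparse.random(n_samples, n_features, density=0.1, format="csr",
                  random_state=rng, data_rvs=rng.standard_normal)
true_w = rng.standard_normal(n_features)
y = (X @ true_w > 0).astype(int)

# Hinge loss gives a linear SVM objective, comparable to a linear kernel.
clf = SGDClassifier(loss="hinge", random_state=0)

# partial_fit needs the full set of class labels up front, because a
# single mini-batch is not guaranteed to contain every class.
classes = np.unique(y)

batch_size = 200
for epoch in range(10):
    for start in range(0, n_samples, batch_size):
        stop = start + batch_size
        # Only this slice of rows has to be in memory at one time.
        clf.partial_fit(X[start:stop], y[start:stop], classes=classes)

accuracy = clf.score(X, y)
print("training accuracy: %.3f" % accuracy)
```

With data stored on disk (for example in PyTables, as mentioned above), the inner loop would read one block of rows per iteration instead of slicing an in-memory matrix; how to do that efficiently depends on the storage layout and is not shown here.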
