2013/11/22 Yi Pan <[email protected]>:
> Dear scikit-learn persons,
>
> This is Pan Yi from the University of Washington, US. I am currently working
> on a course project, exploring the performance of AdaBoostClassifier when
> using the same base classifier, such as DecisionTreeClassifier, Perceptron,
> KNeighborsClassifier, or mixing different classifiers in one boosting.
> Because my input is sparse matrix (41.8MB, mtx format), AdaBoostClassifier
> doesn't work unless I change it to dense. The problem is that it will run
> out of memory soon.
>
> I want to know whether AdaBoostClassifier and DecisionTreeClassifier have
> been improved to work with sparse matrix input X.

Unfortunately no, at least not for the DecisionTreeClassifier class.

> If not, I need to
> implement my own version of AdaBoost that can takes sparse matrix, could you
> give me some advice on what kind of change should I make in the code?

You would need to write a sparse variant (probably for matrices in the CSC
layout) of the Cython tree code, but this is probably not an easy task:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx

For reference, here is a linear regression model that works with the CSC
sparse matrix representation:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/cd_fast.pyx#L227-376

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
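
A note on the CSC layout mentioned above: a tree builder looks at one feature (column) at a time when searching for a split, and CSC stores the nonzero entries of each column contiguously, which is presumably why it was suggested. The following is only a minimal plain-Python sketch of that storage scheme. The helper names `dense_to_csc` and `column_nonzeros` are hypothetical, made up for illustration, and are not part of scikit-learn or of the Cython code linked above.

```python
# Illustrative sketch only: plain-Python CSC (compressed sparse column) storage.
# The helper names below are hypothetical and are not scikit-learn APIs.

def dense_to_csc(rows):
    """Convert a list of equal-length rows into CSC arrays (data, indices, indptr)."""
    n_cols = len(rows[0]) if rows else 0
    data, indices, indptr = [], [], [0]
    for j in range(n_cols):
        for i, row in enumerate(rows):
            if row[j] != 0:
                data.append(row[j])      # nonzero value
                indices.append(i)        # row index of that value
        indptr.append(len(data))         # marks where column j ends
    return data, indices, indptr


def column_nonzeros(csc, j):
    """Return the (row_index, value) pairs stored for column j."""
    data, indices, indptr = csc
    start, end = indptr[j], indptr[j + 1]
    return list(zip(indices[start:end], data[start:end]))


# Small toy matrix: 4 samples, 3 features, mostly zeros.
X = [
    [0, 2, 0],
    [1, 0, 0],
    [0, 3, 0],
    [0, 0, 4],
]
csc = dense_to_csc(X)

# Scanning feature 1 touches only its two stored entries, not all four rows.
feature_1 = column_nonzeros(csc, 1)
```

A sparse tree implementation would still have to account for the implicit zeros of each feature when evaluating candidate thresholds, which is part of what makes the task non-trivial.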
