2013/11/22 Yi Pan <[email protected]>:
> Dear scikit-learn team,
>
> This is Pan Yi from the University of Washington, US. I am currently working
> on a course project exploring the performance of AdaBoostClassifier when
> using a single type of base classifier, such as DecisionTreeClassifier,
> Perceptron, or KNeighborsClassifier, or when mixing different classifiers in
> one boosting ensemble. Because my input is a sparse matrix (41.8 MB, mtx
> format), AdaBoostClassifier doesn't work unless I convert it to dense, and
> the dense version quickly runs out of memory.
>
> I want to know whether AdaBoostClassifier and DecisionTreeClassifier have
> been improved to work with sparse matrix input X.

Unfortunately no, at least not for the DecisionTreeClassifier class.

> If not, I need to
> implement my own version of AdaBoost that can take sparse matrices. Could you
> give me some advice on what kind of changes I should make in the code?

You would need to write a sparse variant (probably for matrices in the
CSC layout) of the Cython tree code, but this is probably not an easy
task:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx

For reference here is a linear regression model that works with the
CSC sparse matrix representation:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/cd_fast.pyx#L227-376
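
To give a rough idea of why the CSC layout is convenient for tree code,
here is a minimal pure-Python sketch. It is not the scikit-learn
implementation: the function names dense_to_csc and split_counts are
hypothetical, made up for this illustration. The three arrays play the
same role as the data, indices and indptr attributes of a
scipy.sparse.csc_matrix. The point is that the stored values of one
feature (column) are contiguous, so a candidate split on that feature can
be evaluated by scanning only its nonzeros and treating all remaining
samples as implicit zeros.

```python
def dense_to_csc(rows):
    """Build CSC arrays (data, indices, indptr) from a dense list of rows.

    Hypothetical helper for illustration only; in practice the matrix
    would already be sparse and never be materialized densely.
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    data, indices, indptr = [], [], [0]
    for j in range(n_cols):
        for i in range(n_rows):
            value = rows[i][j]
            if value != 0:
                data.append(value)   # stored nonzero value
                indices.append(i)    # row index of that value
        indptr.append(len(data))     # end of column j in data/indices
    return data, indices, indptr


def split_counts(data, indices, indptr, n_rows, feature, threshold):
    """Count samples going left (value <= threshold) and right for one feature.

    Only the stored nonzeros of the column are visited; every other
    sample has an implicit value of 0 for this feature.
    """
    start, end = indptr[feature], indptr[feature + 1]
    n_nonzero = end - start
    n_zero = n_rows - n_nonzero
    left = sum(1 for k in range(start, end) if data[k] <= threshold)
    right = n_nonzero - left
    # All implicit zeros fall on the same side of the threshold.
    if 0 <= threshold:
        left += n_zero
    else:
        right += n_zero
    return left, right


# Tiny example: 4 samples, 3 features, mostly zeros.
X = [
    [0.0, 2.0, 0.0],
    [1.5, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [3.0, 4.0, 0.0],
]
data, indices, indptr = dense_to_csc(X)
counts = split_counts(data, indices, indptr, len(X), feature=0, threshold=1.0)
```

A real sparse tree builder would also have to sort the nonzeros per
feature, track which samples belong to the current node, and compute
impurity rather than plain counts, which is where most of the difficulty
lies.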

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
