OK, I finished reading _tree.pyx and now I understand the CSC sparse matrix
format. I have a general idea of what needs to be implemented.
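A minimal sketch of the CSC layout and of the feature-wise access the
splitters would need, assuming NumPy and SciPy are installed.
`feature_column` and `count_left` are hypothetical helpers written for
illustration only. They are not scikit-learn code. A real splitter would run
the same kind of loop in typed Cython over the three arrays.

```python
import numpy as np
import scipy.sparse as sp

# A small 4 x 3 matrix (4 samples, 3 features), mostly zeros.
dense = np.array([[0.0, 2.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 3.0, 4.0],
                  [5.0, 0.0, 0.0]])
X = sp.csc_matrix(dense)
X.sort_indices()  # make sure row indices are increasing within each column

# CSC stores the matrix column by column in three arrays:
#   X.data    -> stored (nonzero) values, column after column
#   X.indices -> row index of each entry of X.data
#   X.indptr  -> column j occupies X.data[X.indptr[j]:X.indptr[j + 1]]
print(X.data.tolist())     # [1.0, 5.0, 2.0, 3.0, 4.0]
print(X.indices.tolist())  # [1, 3, 0, 2, 2]
print(X.indptr.tolist())   # [0, 2, 4, 5]


def feature_column(X_csc, j):
    """Return (rows, values) of the explicitly stored entries of feature j.

    Any row not listed in `rows` holds an implicit zero for this feature.
    """
    start, end = X_csc.indptr[j], X_csc.indptr[j + 1]
    return X_csc.indices[start:end], X_csc.data[start:end]


def count_left(X_csc, j, threshold):
    """Hypothetical splitter helper: number of samples with X[i, j] <= threshold.

    Only the stored entries are visited; the implicit zeros are counted in
    one step instead of being materialized as a dense column.
    """
    rows, values = feature_column(X_csc, j)
    n_implicit_zeros = X_csc.shape[0] - len(rows)
    n_left = int(np.count_nonzero(values <= threshold))
    if threshold >= 0.0:
        n_left += n_implicit_zeros
    return n_left


rows, values = feature_column(X, 1)
print(rows.tolist(), values.tolist())  # [0, 2] [2.0, 3.0]
print(count_left(X, 1, 2.5))           # 3 (two implicit zeros plus the 2.0)
```

The point of `count_left` is that the implicit zeros are handled by counting
rather than by densifying the column. That keeps per-feature work
proportional to the number of stored entries, which is what matters on
BoW-style data.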

I've never seriously used Cython. What are you guys using as a development
environment? How do you easily code, compile and test?


On Thu, Jan 23, 2014 at 11:55 AM, Olivier Grisel
<olivier.gri...@ensta.org> wrote:

> 2014/1/23 Felipe Eltermann <felipe.elterm...@gmail.com>:
> > I'm testing different classifiers for a BoW problem and last week I got
> > disappointed that I couldn't use scikit's DecisionTree.
> > However, using NaiveBayes was awesome! Thanks for this great piece of
> > software.
> > So, if you are planning to add support for scipy sparse matrices in
> > DecisionTree, I'd like to help.
> >
> > Gilles, I read /sklearn/tree/tree.py and found that there are 4 methods
> > that receive X as a dense matrix:
> > BaseDecisionTree.fit()
> > BaseDecisionTree.predict()
> > DecisionTreeClassifier.predict_proba()
> > DecisionTreeClassifier.predict_log_proba()
> >
> > fit() calls some Cython classes that I think you were referring to:
> > _tree.BestSplitter
> > _tree.PresortBestSplitter
> > _tree.RandomSplitter
> > _tree.Gini
> > _tree.Entropy
> > _tree.MSE
> > _tree.FriedmanMSE
>
> As Gilles said, have a look at the Splitters first. You probably want
> to do feature-wise access to the input data, hence the
> scipy.sparse.csc_matrix representation should be supported first.  If
> you are not familiar with the internal data structure of the CSC
> representation, here is a piece of cython code of another estimator
> that can deal efficiently with CSC sparse input data:
>
>
> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/cd_fast.pyx#L227
>
> which is called by:
>
>
> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/coordinate_descent.py#L450
>
> Also have a look at:
>
> http://docs.scipy.org/doc/scipy/reference/sparse.html
>
> http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csc_matrix.html#scipy.sparse.csc_matrix
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>