Hi,
As I see it, sparse data lets us enhance the descriptiveness of the data
while keeping it smaller than the equivalent dense data, without losing
information. Isn't that what trees generally need for improved accuracy?
I will try using sparse data on the 20newsgroups dataset and let you know
the results.
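To make the size argument concrete, here is a minimal sketch comparing the
memory used by a one-hot encoded matrix in sparse and dense form. The data
(1000 samples, 3 categorical columns with 50 levels each) and the helper
name sparse_nbytes are made up for illustration; only OneHotEncoder and its
default sparse output come from scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: 1000 samples, 3 categorical columns, 50 levels each.
rng = np.random.RandomState(0)
X = rng.randint(0, 50, size=(1000, 3))

# OneHotEncoder returns a SciPy sparse matrix by default.
enc = OneHotEncoder()
X_sparse = enc.fit_transform(X)

# Densifying stores every zero explicitly.
X_dense = X_sparse.toarray()


def sparse_nbytes(m):
    """Bytes used by the three arrays backing a CSR matrix (illustrative helper)."""
    m = m.tocsr()
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes


print("shape:", X_sparse.shape)
print("sparse bytes:", sparse_nbytes(X_sparse))
print("dense bytes:", X_dense.nbytes)

# The sparse form keeps all the information: the original categories
# can be recovered from it directly.
X_back = enc.inverse_transform(X_sparse)
print("round trip ok:", np.array_equal(X_back, X))
```

Each row has exactly one nonzero entry per original column, so the sparse
form grows with the number of samples while the dense form grows with
samples times total category levels.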
Arnaud,
I've gone through those messages and I've already started working on
patches. Last year I did a module project at our university, which was to
implement Bagging in scikit-learn. As Gilles had already begun that work,
I was not able to get my code merged. Moreover, I did not implement
feature bootstrapping, as it was beyond the scope of my original project
proposal.
https://github.com/maheshakya/scikit-learn/blob/bagging2/sklearn/ensemble/bagging.py
I would appreciate it if you could review my implementation and give me
some feedback on it and on what I can do further.
Thank you.
On Wed, Jan 22, 2014 at 2:51 PM, Caleb <[email protected]> wrote:
> Hi all,
>
> I am using random forests to do deep learning/feature learning with the
> RandomForestEmbedding in scikit-learn. It would be cool to apply
> the random forest to the learned features and induce a higher-level
> representation.
>
> I have actually tried the naive approach of densifying the output from
> RandomForestEmbedding and feeding it back into another one to get a second
> level of representation of the same data, and then applying an SVM to it.
> Not only is it extremely slow, the results also become worse.
>
> However, I think sparse matrix support for decision trees is a worthwhile
> effort, as it would let me investigate more easily why the results are worse.
>
> Just my 2 cents.
>
> Caleb
>
>
> On Wednesday, January 22, 2014 1:15 PM, Maheshakya Wijewardena <
> [email protected]> wrote:
> Hi,
>
> I have been using the scikit-learn one-hot encoder for data encoding, and
> the resulting sparse matrix is supported by only a few models, such as
> logistic regression, SVC, etc. When I convert those sparse matrices to
> dense matrices with a list comprehension or the toarray() function, the
> resulting arrays become too large for classifiers such as decision trees
> or any other tree-based classifier.
> I saw a GSOC project idea for implementing this, as mentioned here.
>
> https://github.com/scikit-learn/scikit-learn/wiki/Google-summer-of-code-(GSOC)-2014
> I'm looking forward to applying for GSOC this year as well, so I would
> like to start working on this. Where can I get support for this? (No
> possible mentors are assigned to it.)
>
> Regards,
> Maheshakya
>
>
> ------------------------------------------------------------------------------
> CenturyLink Cloud: The Leader in Enterprise Cloud Services.
> Learn Why More Businesses Are Choosing CenturyLink Cloud For
> Critical Workloads, Development Environments & Everything In Between.
> Get a Quote or Start a Free Trial Today.
>
> http://pubads.g.doubleclick.net/gampad/clk?id=119420431&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>