Re: [Scikit-learn-general] Random Forest Regression - Large Sparse Data

2013-04-23 Thread Brian Holt
At the moment your three options are: 1) get more memory; 2) do feature selection (400k features on 200k samples seems to me to contain a lot of redundant information or irrelevant features); 3) submit a PR to add support for sparse matrices, which is going to be a lot of work and I doubt it's worth it. …
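Option 2 can be done without ever densifying the full matrix, since scikit-learn's univariate feature selection accepts sparse input. A minimal sketch (the array sizes, `k`, and estimator settings here are illustrative, not from the thread):

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.ensemble import RandomForestRegressor

# Toy stand-in for the real data: a sparse samples-by-features matrix.
rng = np.random.RandomState(0)
X = sparse_random(200, 1000, density=0.1, format="csr", random_state=rng)
y = rng.rand(200)

# SelectKBest works directly on sparse input, so selection stays cheap.
selector = SelectKBest(f_regression, k=50).fit(X, y)
X_small = selector.transform(X)

# Only the reduced matrix is densified before fitting the forest.
model = RandomForestRegressor(n_estimators=10, random_state=0)
model.fit(X_small.toarray(), y)
print(X_small.shape)  # (200, 50)
```

With 400k features, only the selected `k` columns would need to be made dense, which is usually what makes the forest fit feasible.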

Re: [Scikit-learn-general] Random Forest Regression - Large Sparse Data

2013-04-23 Thread Juan Nunez-Iglesias
@Alex: I don't have a workaround for you, but this seems like a useful addition. I don't know how hard it would be, but you should definitely raise it as an issue on the project's GitHub issues page: https://github.com/scikit-learn/scikit-learn/issues?sort=updated&state=open …

Re: [Scikit-learn-general] GSoC applications are open (based on Fwd: [Soc2013-general] Student Application Template (Applications start April 22!))

2013-04-23 Thread Vlad Niculae
Sorry to reply to myself, but I want to point something else out to all prospective GSoC students: all the proposals we have received so far are new additions of algorithms. In my opinion this is always welcome, provided that several conditions are met: the algorithm should have proved to be generally useful, …

Re: [Scikit-learn-general] GSOC idea

2013-04-23 Thread Vlad Niculae
Hi Şükrü, We can focus on the proposal now and decide later who is best placed to mentor it. I could do it, but it is not the thing I would be best at mentoring, so to solve the chicken-and-egg problem we can make the decision jointly when the time comes. Did you start working on your proposal…

Re: [Scikit-learn-general] GSOC

2013-04-23 Thread Vlad Niculae
On Wed, Apr 24, 2013 at 11:50 AM, Ronnie Ghose wrote: > sorry but -1 for neural net based things. neural nets are heavily based on > structure, they're not blackbox afaik. I am a bit skeptical too, but to be honest, an RBM is not much different from factor analysis except for the assumptions, and…

Re: [Scikit-learn-general] GSOC

2013-04-23 Thread Ronnie Ghose
Sorry, but -1 for neural-net-based things. Neural nets are heavily based on structure; they're not black-box, AFAIK. On Tue, Apr 23, 2013 at 10:43 PM, Vlad Niculae wrote: > Dear Roland, > In my opinion the directions in Issam's deep learning proposal are a bit better suited for scikit-learn. …

Re: [Scikit-learn-general] GSOC

2013-04-23 Thread Vlad Niculae
Dear Roland, In my opinion the directions in Issam's deep learning proposal are a bit better suited for scikit-learn. Our estimators are supposed to be black-box and as general as possible, with sensible defaults. I don't know to what extent recurrent nets can be implemented in such a way. Could…

Re: [Scikit-learn-general] GSOC 2013 Proposal

2013-04-23 Thread Vlad Niculae
Hello Siddharth, Sorry for the late reply. The list in the link you provided seems very eclectic: some of the methods are very different from the others in nature, having in common only the fact that they can do dimensionality reduction. I think that a solid GSoC proposal should be very specific. …

[Scikit-learn-general] GSoC applications are open (based on Fwd: [Soc2013-general] Student Application Template (Applications start April 22!))

2013-04-23 Thread Vlad Niculae
Dear students interested in applying for this year's GSoC with scikit-learn: As of a couple of days ago, applications are open. Scikit-learn is a suborganization of the PSF this year, as in previous years, so you will apply with the PSF as the organization, specifying that you will work on scikit-learn. …

Re: [Scikit-learn-general] Random Forest Regression - Large Sparse Data

2013-04-23 Thread Calvin Morrison
Get more memory? On 23 April 2013 17:06, Alex Kopp wrote: > Hi, > I am looking to build a random forest regression model with a pretty large amount of sparse data. …

[Scikit-learn-general] Random Forest Regression - Large Sparse Data

2013-04-23 Thread Alex Kopp
Hi, I am looking to build a random forest regression model with a pretty large amount of sparse data. I noticed that I cannot fit the random forest model with a sparse matrix. Unfortunately, a dense matrix is too large to fit in memory. What are my options? For reference, I have just over 400k features…
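One common workaround in this situation, not suggested anywhere in the thread, is to project the sparse matrix down to a small dense one first, e.g. with TruncatedSVD, which accepts sparse input directly (sizes and component count below are made up):

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = sparse_random(500, 2000, density=0.01, format="csr", random_state=rng)
y = rng.rand(500)

# TruncatedSVD consumes the sparse matrix as-is and returns a dense,
# low-dimensional projection that the forest can fit in memory.
X_dense = TruncatedSVD(n_components=50, random_state=0).fit_transform(X)
RandomForestRegressor(n_estimators=10, random_state=0).fit(X_dense, y)
print(X_dense.shape)  # (500, 50)
```

The trade-off is that the forest then splits on SVD components rather than the original features, so feature importances become harder to interpret.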

Re: [Scikit-learn-general] Metric Learning Algorithms

2013-04-23 Thread John Collins
That's also the way I've used these techniques in the past: build the matrix A, transform the space X to Y = A^(1/2) X, then apply kNN or whatever takes your fancy. - John
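The A^(1/2) transform can be sketched in a few lines (the positive-definite A below is made up; `scipy.linalg.sqrtm` computes the matrix square root):

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.RandomState(0)
# Made-up learned metric: any symmetric positive-definite matrix A.
B = rng.rand(3, 3)
A = B @ B.T + 3 * np.eye(3)

# Transform the data so that plain Euclidean distance in Y equals the
# Mahalanobis distance under A in X: Y = X A^(1/2).
X = rng.rand(10, 3)
A_half = np.real(sqrtm(A))
Y = X @ A_half

# Sanity check: ||y_i - y_j||^2 == (x_i - x_j)^T A (x_i - x_j)
d = X[0] - X[1]
assert np.isclose(np.sum((Y[0] - Y[1]) ** 2), d @ A @ d)
```

After this transform, any off-the-shelf Euclidean method (kNN, k-means, ...) effectively uses the learned metric.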

Re: [Scikit-learn-general] Metric Learning Algorithms (John Collins)

2013-04-23 Thread John Collins
My experience also: similarity/dissimilarity pairs in, and a Mahalanobis-type matrix out. - John

Re: [Scikit-learn-general] Metric Learning Algorithms

2013-04-23 Thread Wei LI
For a Mahalanobis metric, maybe we can apply a Cholesky decomposition to the learned metric and make it a transformer? Then we can chain a kNN classifier after the transform. Best, Wei
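That idea could look roughly like the sketch below: a hypothetical `MahalanobisTransformer` (not an existing scikit-learn class) that maps X to X L with M = L L^T, chained with a kNN classifier. The metric M and the data are made up:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

class MahalanobisTransformer(BaseEstimator, TransformerMixin):
    """Map X -> X L, where M = L L^T is the learned metric (Cholesky)."""
    def __init__(self, M):
        self.M = M

    def fit(self, X, y=None):
        self.L_ = np.linalg.cholesky(self.M)
        return self

    def transform(self, X):
        # Euclidean distance in the output equals the Mahalanobis
        # distance under M in the input space.
        return X @ self.L_

rng = np.random.RandomState(0)
M = np.eye(2) + 0.5 * np.ones((2, 2))       # made-up SPD metric
X = rng.rand(40, 2)
y = (X[:, 0] > 0.5).astype(int)

clf = make_pipeline(MahalanobisTransformer(M), KNeighborsClassifier(3))
clf.fit(X, y)
print(clf.score(X, y))
```

In a real implementation the metric-learning step would itself be a fitted estimator producing M, rather than M being passed in by hand.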

Re: [Scikit-learn-general] Metric Learning Algorithms

2013-04-23 Thread Robert McGibbon
> Input to such algorithms is usually given as:
> - a set of similarity and dissimilarity links,
> - relative comparisons (x is closer to y than w is to z), or
> - target distances (x should be no farther than q from y).
>
> The outputs of all methods I've worked with are Mahalanobis distance…
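Those three input formats can be written down concretely as simple index-based constraints (the names and encodings below are illustrative, not from any particular library):

```python
# Similarity / dissimilarity links: pairs of sample indices.
similar = [(0, 1), (2, 3)]
dissimilar = [(0, 2)]

# Relative comparisons d(x, y) < d(w, z): index 4-tuples (x, y, w, z).
comparisons = [(0, 1, 2, 3)]

# Target distances d(x, y) <= q: triples (x, y, q).
bounds = [(0, 1, 0.5)]

n_constraints = len(similar) + len(dissimilar) + len(comparisons) + len(bounds)
print(n_constraints)  # 5
```

A common API design question is which of these encodings a `fit`-style interface should accept, since they are not mutually convertible.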