At the moment your three options are:
1) get more memory
2) do feature selection - 400k features on 200k samples seems to me to
contain a lot of redundant information or irrelevant features (see the
sketch below)
3) submit a PR to support sparse matrices - this is going to be a lot of
work and I doubt it's worth it.
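A minimal sketch of option 2, assuming scikit-learn's SelectKBest with
f_regression (both accept sparse input); the small random matrix below is a
stand-in for the real 200k x 400k data, scaled down so the example runs
quickly:

import numpy as np
import scipy.sparse as sp
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
# stand-in for the real 200k x 400k sparse data
X_sparse = sp.random(1000, 5000, density=0.01, format='csr', random_state=rng)
y = rng.rand(1000)

# score features while the data is still sparse, keep only the top k
selector = SelectKBest(f_regression, k=100)
X_reduced = selector.fit_transform(X_sparse, y)  # still sparse, now 1000 x 100

# densifying the reduced matrix is cheap enough for the forest
rf = RandomForestRegressor(n_estimators=10, random_state=0)
rf.fit(X_reduced.toarray(), y)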
@Alex: I don't have a workaround for you but this seems like a useful
addition. I don't know how hard it would be, but you should definitely
raise it as an issue on the github issues page for the project:
https://github.com/scikit-learn/scikit-learn/issues?sort=updated&state=open
Sorry to reply to myself, but I want to point something else out to all
prospective GSoC students:
All the proposals we have received so far are for adding new algorithms. In my
opinion this is always welcome, provided that several conditions are
met: the algorithm should have proved to be generally useful,
Hi Şükrü
We can focus on the proposal now and decide later who is better to
mentor it. I could do it but it is not the thing I would be the best
at mentoring, so to solve the chicken-and-egg problem we can optimize
the decisions jointly when the time comes.
Did you start working on your proposal?
On Wed, Apr 24, 2013 at 11:50 AM, Ronnie Ghose wrote:
> sorry but -1 for neural net based things. neural nets are heavily based on
> structure, they're not blackbox afaik.
I am a bit skeptical too, but to be honest, an RBM is not much
different from factor analysis except for the assumptions and
sorry but -1 for neural net based things. neural nets are heavily based on
structure, they're not blackbox afaik.
On Tue, Apr 23, 2013 at 10:43 PM, Vlad Niculae wrote:
> Dear Roland,
>
>
> In my opinion the directions in Issam's deep learning proposal are a
> bit better suited for scikit-learn.
Dear Roland,
In my opinion the directions in Issam's deep learning proposal are a
bit better suited for scikit-learn. Our estimators are supposed to be
black-box and as general as possible with sensible defaults. I don't
know to what extent recurrent nets can be implemented in such a way.
Hello Siddharth
Sorry for the late reply.
The list in the link you provided seems very eclectic; some of the
methods are very different in nature from the others, having in
common only the fact that they can do dimensionality reduction.
I think that a solid GSoC proposal should be very specific.
Dear students interested in applying for this year's GSoC with scikit-learn:
As of a couple of days ago, applications are open. Scikit-learn is a
suborganization of the PSF this year, like in previous years, so you
will apply with the PSF as an organization, specifying that you will work
on scikit-learn.
get more memory?
On 23 April 2013 17:06, Alex Kopp wrote:
> Hi,
>
> I am looking to build a random forest regression model with a pretty large
> amount of sparse data. I noticed that I cannot fit the random forest model
> with a sparse matrix. Unfortunately, a dense matrix is too large to fit in
Hi,
I am looking to build a random forest regression model with a pretty large
amount of sparse data. I noticed that I cannot fit the random forest model
with a sparse matrix. Unfortunately, a dense matrix is too large to fit in
memory. What are my options?
For reference, I have just over 400k features and 200k samples.
That's also the way I've used these techniques in the past: build the
matrix A, transform the space X to Y = A^(1/2) X, then apply kNN or whatever takes your fancy.
- John
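A minimal sketch of that workflow, assuming a learned Mahalanobis matrix A
(symmetric positive semi-definite); the random A below is a hypothetical
stand-in for one learned from similarity/dissimilarity pairs:

import numpy as np
from scipy.linalg import sqrtm
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = rng.randint(0, 2, size=100)

M = rng.randn(5, 5)
A = M @ M.T  # symmetric PSD stand-in for the learned metric

A_sqrt = sqrtm(A).real  # principal square root, A = A_sqrt @ A_sqrt
Y = X @ A_sqrt          # Euclidean distances in Y equal Mahalanobis
                        # distances under A in the original space

knn = KNeighborsClassifier(n_neighbors=3).fit(Y, y)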
My experience also: similarity/dissimilarity pairs in and a Mahalanobis-type
matrix out.
- John
For the Mahalanobis metric, maybe we can do a Cholesky decomposition of the
learned metric and make it a transformer? Then we can chain a kNN
classifier after the transform (see the sketch below).
Best,
Wei
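A minimal sketch of that idea; MetricTransformer is illustrative, not an
existing scikit-learn class, and A is a stand-in for a learned Mahalanobis
matrix (symmetric positive definite):

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.neighbors import KNeighborsClassifier

class MetricTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, A):
        self.A = A  # learned Mahalanobis matrix

    def fit(self, X, y=None):
        # A = L L^T, so mapping x -> x L turns the Mahalanobis
        # distance under A into plain Euclidean distance
        self.L_ = np.linalg.cholesky(self.A)
        return self

    def transform(self, X):
        return X @ self.L_

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = rng.randint(0, 2, size=100)
M = rng.randn(5, 5)
A = M @ M.T + 1e-6 * np.eye(5)  # keep the stand-in metric positive definite

pipe = Pipeline([('metric', MetricTransformer(A)),
                 ('knn', KNeighborsClassifier(n_neighbors=3))])
pipe.fit(X, y)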
On Tue, Apr 23, 2013 at 3:59 PM, Robert McGibbon wrote:
> Input to such algorithms is usually given as:
> - a set of similarity and dissimilarity links,
> - relative comparisons (x is closer to y than w is to z), or
> - target distances (x should be no farther than q from y).
>
> The outputs of all methods I've worked with are Mahalanobis distance matrices.
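For concreteness, hypothetical encodings of the three constraint formats
listed above; the variable names are illustrative, not any library's API:

import numpy as np

# similarity / dissimilarity links: pairs of sample indices
similar_pairs = np.array([[0, 1], [2, 3]])     # each pair should be close
dissimilar_pairs = np.array([[0, 4], [1, 5]])  # each pair should be far apart

# relative comparisons: rows (x, y, w, z) meaning d(x, y) < d(w, z)
relative_comparisons = np.array([[0, 1, 0, 4]])

# target distances: rows (x, y, q) meaning d(x, y) should be at most q
target_distances = np.array([[0, 1, 0.5], [2, 3, 1.0]])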