Re: [Scikit-learn-general] memory troubles with RidgeClassifier

2012-09-15 Thread Lars Buitinck
2012/9/10 Mathieu Blondel : > Olivier: RidgeCV is based on an eigenvalue decomposition (kernel case) and > an SVD (linear case) so I think it's independent. > > Lars: That's a good idea. So we want to minimize \sum_i mu_i (w^T x_i - > y_i)^2 where mu_i is the sample weight. This should be equivalen

Re: [Scikit-learn-general] memory troubles with RidgeClassifier

2012-09-09 Thread Mathieu Blondel
Olivier: RidgeCV is based on an eigenvalue decomposition (kernel case) and an SVD (linear case) so I think it's independent. Lars: That's a good idea. So we want to minimize \sum_i mu_i (w^T x_i - y_i)^2 where mu_i is the sample weight. This should be equivalent to \sum_i (sqrt(mu_i) w^T x_i - sqr
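
A small numerical check of the reweighting idea in the message above. This is only a sketch under stated assumptions: the data are random, the ridge penalty is left out for brevity, and the names `mu`, `w_weighted` and `w_rescaled` are illustrative rather than taken from scikit-learn. It compares the weighted normal equations against an ordinary least-squares solve on rows scaled by sqrt(mu_i).

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 5)                  # toy design matrix (assumption)
y = rng.randn(100)                     # toy targets (assumption)
mu = rng.uniform(0.5, 2.0, size=100)   # positive sample weights mu_i

# Direct solution of min_w sum_i mu_i (w^T x_i - y_i)^2
# via the weighted normal equations (X^T M X) w = X^T M y.
w_weighted = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (mu * y))

# Same problem after scaling each row of X and each y_i by sqrt(mu_i),
# solved as a plain unweighted least-squares problem.
s = np.sqrt(mu)
w_rescaled, *_ = np.linalg.lstsq(s[:, None] * X, s * y, rcond=None)

print(np.allclose(w_weighted, w_rescaled))
```

The practical upshot is that a solver with no notion of sample weights can still handle them, as long as the rows are rescaled before the call.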

Re: [Scikit-learn-general] memory troubles with RidgeClassifier

2012-09-09 Thread Paolo Losi
On Sun, Sep 9, 2012 at 4:31 PM, Mathieu Blondel wrote: > I've just tried scipy.sparse.linalg.lsqr [*] on the full news20 dataset. > On my box it takes 8 seconds to run with tol=1e-3 and 5 seconds with > tol=1e-2 without any accuracy loss. It also solves the memory problem > mentioned by Lars, as i

Re: [Scikit-learn-general] memory troubles with RidgeClassifier

2012-09-09 Thread Lars Buitinck
2012/9/9 Mathieu Blondel : > I've just tried scipy.sparse.linalg.lsqr [*] on the full news20 dataset. On > my box it takes 8 seconds to run with tol=1e-3 and 5 seconds with tol=1e-2 > without any accuracy loss. It also solves the memory problem mentioned by > Lars, as it works directly with X and y

Re: [Scikit-learn-general] memory troubles with RidgeClassifier

2012-09-09 Thread Olivier Grisel
2012/9/9 Mathieu Blondel : > I've just tried scipy.sparse.linalg.lsqr [*] on the full news20 dataset. On > my box it takes 8 seconds to run with tol=1e-3 and 5 seconds with tol=1e-2 > without any accuracy loss. It also solves the memory problem mentioned by > Lars, as it works directly with X and y

Re: [Scikit-learn-general] memory troubles with RidgeClassifier

2012-09-09 Thread Mathieu Blondel
I've just tried scipy.sparse.linalg.lsqr [*] on the full news20 dataset. On my box it takes 8 seconds to run with tol=1e-3 and 5 seconds with tol=1e-2 without any accuracy loss. It also solves the memory problem mentioned by Lars, as it works directly with X and y. Unlike scipy.linalg.lsqr, scipy.
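
For readers who want to try the approach described above, here is a minimal sketch of a ridge solve with scipy.sparse.linalg.lsqr on a sparse matrix. The data are synthetic, and `alpha`, `w_lsqr` and `w_ref` are illustrative names; the mapping of a single `tol` onto lsqr's `atol`/`btol` is an assumption, not something the thread states. lsqr minimizes ||Xw - y||^2 + damp^2 ||w||^2, so a ridge penalty alpha corresponds to damp = sqrt(alpha).

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

rng = np.random.RandomState(0)
# Synthetic sparse problem standing in for a tf-idf matrix (assumption).
X = sp.random(200, 50, density=0.1, format="csr", random_state=rng)
w_true = rng.randn(50)
y = X @ w_true + 0.01 * rng.randn(200)

alpha = 1.0  # ridge penalty (illustrative value)

# lsqr only needs products with X and X^T, so X stays sparse and
# X^T X is never formed explicitly.
w_lsqr = lsqr(X, y, damp=np.sqrt(alpha), atol=1e-10, btol=1e-10)[0]

# Dense closed-form reference, feasible only because the toy problem is small.
Xd = X.toarray()
w_ref = np.linalg.solve(Xd.T @ Xd + alpha * np.eye(50), Xd.T @ y)

print(np.allclose(w_lsqr, w_ref, atol=1e-6))
```

Looser tolerances trade accuracy for speed, which matches the timings reported above for tol=1e-3 versus tol=1e-2.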

Re: [Scikit-learn-general] memory troubles with RidgeClassifier

2012-09-07 Thread Fabian Pedregosa
On Fri, Sep 7, 2012 at 4:18 PM, Olivier Grisel wrote: > 2012/9/7 Lars Buitinck : >> I just tried running the document classification example with all 20 >> classes, but the RidgeClassifier was taking so much memory that it >> triggered the OOM killer. This didn't happen before, so it must be a >>

Re: [Scikit-learn-general] memory troubles with RidgeClassifier

2012-09-07 Thread Mathieu Blondel
On Fri, Sep 7, 2012 at 11:41 PM, Lars Buitinck wrote: > 2012/9/7 Mathieu Blondel : > > On my box, RidgeClassifier finishes in 136 seconds but kNN dies with > > MemoryError. > > That's incredibly slow compared to all the other classifiers. Also > without --all_categories, it's slower than everythi

Re: [Scikit-learn-general] memory troubles with RidgeClassifier

2012-09-07 Thread Olivier Grisel
2012/9/7 Lars Buitinck : > 2012/9/7 Olivier Grisel : >> Maybe the default feature extraction has changed and made the matrix >> much denser than it used to be for this example? Although recent >> changes to the vectorizer would tend to decrease the number of >> features (min_df=2) hence make the pr

Re: [Scikit-learn-general] memory troubles with RidgeClassifier

2012-09-07 Thread Lars Buitinck
2012/9/7 Olivier Grisel : > Maybe the default feature extraction has changed and made the matrix > much denser than it used to be for this example? Although recent > changes to the vectorizer would tend to decrease the number of > features (min_df=2) hence make the problem smaller to solve. What w

Re: [Scikit-learn-general] memory troubles with RidgeClassifier

2012-09-07 Thread Olivier Grisel
2012/9/7 Lars Buitinck : > 2012/9/7 Mathieu Blondel : >> On my box, RidgeClassifier finishes in 136 seconds but kNN dies with >> MemoryError. > > That's incredibly slow compared to all the other classifiers. Also > without --all_categories, it's slower than everything else. > > I've already found o

Re: [Scikit-learn-general] memory troubles with RidgeClassifier

2012-09-07 Thread Lars Buitinck
2012/9/7 Mathieu Blondel : > On my box, RidgeClassifier finishes in 136 seconds but kNN dies with > MemoryError. That's incredibly slow compared to all the other classifiers. Also without --all_categories, it's slower than everything else. I've already found out where things go wrong, though I st

Re: [Scikit-learn-general] memory troubles with RidgeClassifier

2012-09-07 Thread Mathieu Blondel
On my box, RidgeClassifier finishes in 136 seconds but kNN dies with MemoryError. Could you try to identify which commit introduced the regression with git bisect? Mathieu pred = clf.predict(X_test) File "/Users/mathieublondel/Desktop/projects/scikit-learn/sklearn/neighbors/classification.

Re: [Scikit-learn-general] memory troubles with RidgeClassifier

2012-09-07 Thread Olivier Grisel
2012/9/7 Lars Buitinck : > I just tried running the document classification example with all 20 > classes, but the RidgeClassifier was taking so much memory that it > triggered the OOM killer. This didn't happen before, so it must be a > recent change to the codebase. Does anyone know what may be c

[Scikit-learn-general] memory troubles with RidgeClassifier

2012-09-07 Thread Lars Buitinck
I just tried running the document classification example with all 20 classes, but the RidgeClassifier was taking so much memory that it triggered the OOM killer. This didn't happen before, so it must be a recent change to the codebase. Does anyone know what may be causing this? -- Lars Buitinck S