Re: [Scikit-learn-general] memory troubles with RidgeClassifier

2012-09-09 Thread Mathieu Blondel
Olivier: RidgeCV is based on an eigenvalue decomposition (kernel case) and an SVD (linear case) so I think it's independent. Lars: That's a good idea. So we want to minimize \sum_i mu_i (w^T x_i - y_i)^2 where mu_i is the sample weight. This should be equivalent to \sum_i (sqrt(mu_i) w^T x_i - sqr
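The rescaling identity described above can be checked numerically. This is a minimal sketch under a few assumptions, not the RidgeClassifier code. It adds a ridge penalty alpha * ||w||^2, which the rescaling leaves unchanged because the penalty does not involve the samples. The helper names and the random data are hypothetical. The sketch solves the sample-weighted problem twice: once through the weighted normal equations, and once as plain ridge on rows scaled by sqrt(mu_i).

```python
import numpy as np


def weighted_ridge_direct(X, y, mu, alpha):
    """Minimize sum_i mu_i (w^T x_i - y_i)^2 + alpha ||w||^2 via weighted normal equations."""
    W = np.diag(mu)
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ W @ X + alpha * np.eye(n_features), X.T @ W @ y)


def weighted_ridge_by_rescaling(X, y, mu, alpha):
    """Same objective, solved as unweighted ridge on rows scaled by sqrt(mu_i)."""
    s = np.sqrt(mu)
    Xs = X * s[:, None]  # row i becomes sqrt(mu_i) * x_i
    ys = y * s           # entry i becomes sqrt(mu_i) * y_i
    n_features = X.shape[1]
    return np.linalg.solve(Xs.T @ Xs + alpha * np.eye(n_features), Xs.T @ ys)


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = rng.normal(size=50)
mu = rng.uniform(0.1, 2.0, size=50)  # hypothetical positive sample weights

w1 = weighted_ridge_direct(X, y, mu, alpha=1.0)
w2 = weighted_ridge_by_rescaling(X, y, mu, alpha=1.0)
print(np.allclose(w1, w2))  # True
```

Because Xs^T Xs = X^T diag(mu) X and Xs^T ys = X^T diag(mu) y, any solver for unweighted least squares can handle sample weights this way, provided it accepts the rescaled X.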

[Scikit-learn-general] MiniBatchKMeans on optdigits: 10 classes in .11, only 8 in .12 ?

2012-09-09 Thread denis
Folks, what changed in MiniBatchKMeans in 0.12? Running it on datasets.load_digits() gave 10 classes in 0.11 but only 8 in 0.12. test-mbkmeans.py and logs are attached. (Sure, the data size is too small for MiniBatch, and for that matter k-means is, I think, generally weak; this is low priority.) By the way, datase
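One way to check how many of the requested clusters are actually used is to count the distinct labels after fitting. This sketch assumes that "classes" in the report means the distinct cluster labels assigned to the digits. It uses the current MiniBatchKMeans API and passes n_init explicitly because that default has changed between releases. It does not reproduce the 0.11 or 0.12 behaviour, and the counts it produces depend on random_state.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)

# n_init is set explicitly because its default differs across releases.
mbk = MiniBatchKMeans(n_clusters=10, n_init=3, random_state=0)
labels = mbk.fit_predict(X)

# Clusters that received no samples do not appear among the labels.
n_used = len(np.unique(labels))
counts = np.bincount(labels, minlength=10)  # samples per cluster index
print(n_used, counts)
```

If n_used comes out below 10, the per-cluster counts show which centers ended up empty.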

Re: [Scikit-learn-general] ANN: scikit-learn 0.12

2012-09-09 Thread Andreas Mueller
Hi Yaroslav. Thanks for the report. I didn't know about the deprecation warnings. For the other warnings: I think using sklearn.test() is a bad idea, and using ``nosetests sklearn --exe`` should work better. Also, thanks for all the porting; it is really appreciated :) Cheers, Andy On 09/08/2

Re: [Scikit-learn-general] Reports on custom kernels

2012-09-09 Thread abdalrahman eweiwi
Hi Andreas, Exactly! I already did that, but I still see some strange overfitting behaviour with the intersection kernel that I have to investigate further. Thanks On Fri, Sep 7, 2012 at 11:43 AM, Andreas Müller wrote: > Hi Abdalrahman. > I am not sure I know what you mean. > Are you referring to th
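For reference, the histogram intersection kernel is K(a, b) = sum_k min(a_k, b_k). It is usually applied to non-negative histogram features. The thread does not say how the kernel was plugged in, so this sketch assumes scikit-learn's SVC. It shows the two usual routes: passing a Python callable, or passing kernel="precomputed" with Gram matrices. The data, labels, and the helper name intersection_kernel are hypothetical. When chasing overfitting, C is an obvious parameter to cross-validate.

```python
import numpy as np
from sklearn.svm import SVC


def intersection_kernel(A, B):
    """Histogram intersection kernel: K[i, j] = sum_k min(A[i, k], B[j, k])."""
    # Broadcast to shape (n_a, n_b, n_features), then sum over the feature axis.
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)


rng = np.random.default_rng(0)
X_train = rng.random((40, 8))  # hypothetical non-negative "histograms"
y_train = (X_train[:, 0] > 0.5).astype(int)
X_test = rng.random((10, 8))

# Route 1: pass the callable; SVC evaluates it on the raw feature arrays.
clf = SVC(kernel=intersection_kernel, C=1.0).fit(X_train, y_train)

# Route 2: precompute the Gram matrices yourself.
K_train = intersection_kernel(X_train, X_train)  # shape (n_train, n_train)
K_test = intersection_kernel(X_test, X_train)    # shape (n_test, n_train)
clf_pre = SVC(kernel="precomputed", C=1.0).fit(K_train, y_train)

print(clf.predict(X_test), clf_pre.predict(K_test))
```

The precomputed route makes it easy to inspect the Gram matrix directly, for example its diagonal or its conditioning, when the fit looks suspicious.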

Re: [Scikit-learn-general] memory troubles with RidgeClassifier

2012-09-09 Thread Paolo Losi
On Sun, Sep 9, 2012 at 4:31 PM, Mathieu Blondel wrote: > I've just tried scipy.sparse.linalg.lsqr [*] on the full news20 dataset. > On my box it takes 8 seconds to run with tol=1e-3 and 5 seconds with > tol=1e-2 without any accuracy loss. It also solves the memory problem > mentioned by Lars, as i

Re: [Scikit-learn-general] memory troubles with RidgeClassifier

2012-09-09 Thread Lars Buitinck
2012/9/9 Mathieu Blondel: > I've just tried scipy.sparse.linalg.lsqr [*] on the full news20 dataset. On > my box it takes 8 seconds to run with tol=1e-3 and 5 seconds with tol=1e-2 > without any accuracy loss. It also solves the memory problem mentioned by > Lars, as it works directly with X and y

Re: [Scikit-learn-general] memory troubles with RidgeClassifier

2012-09-09 Thread Olivier Grisel
2012/9/9 Mathieu Blondel: > I've just tried scipy.sparse.linalg.lsqr [*] on the full news20 dataset. On > my box it takes 8 seconds to run with tol=1e-3 and 5 seconds with tol=1e-2 > without any accuracy loss. It also solves the memory problem mentioned by > Lars, as it works directly with X and y

Re: [Scikit-learn-general] memory troubles with RidgeClassifier

2012-09-09 Thread Mathieu Blondel
I've just tried scipy.sparse.linalg.lsqr [*] on the full news20 dataset. On my box it takes 8 seconds to run with tol=1e-3 and 5 seconds with tol=1e-2 without any accuracy loss. It also solves the memory problem mentioned by Lars, as it works directly with X and y. Unlike scipy.linalg.lsqr, scipy.
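The point about working directly with X and y can be illustrated with scipy.sparse.linalg.lsqr. It minimizes ||X w - y||^2 + damp^2 ||w||^2 using only products with X and X^T, so X^T X is never formed. Setting damp = sqrt(alpha) therefore gives the ridge objective. This sketch uses hypothetical random sparse data and tight atol/btol values, and it assumes that the tol values quoted above correspond to lsqr's atol/btol stopping tolerances. It checks the result against the dense normal equations, which is only feasible because the example is small.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)

# Hypothetical sparse design matrix (about 5% non-zeros) and targets.
dense = rng.random((200, 50))
dense[rng.random((200, 50)) > 0.05] = 0.0
X = sp.csr_matrix(dense)
y = rng.normal(size=200)
alpha = 1.0

# lsqr minimizes ||X w - y||^2 + damp^2 ||w||^2, so damp = sqrt(alpha)
# reproduces the ridge objective ||X w - y||^2 + alpha ||w||^2.
w_lsqr = lsqr(X, y, damp=np.sqrt(alpha), atol=1e-10, btol=1e-10)[0]

# Reference solution from the dense normal equations (small problems only).
Xd = X.toarray()
w_ref = np.linalg.solve(Xd.T @ Xd + alpha * np.eye(50), Xd.T @ y)
print(np.allclose(w_lsqr, w_ref, atol=1e-6))
```

Current scikit-learn releases expose this route as Ridge(solver="lsqr").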

Re: [Scikit-learn-general] problem clustering using PCA and kmeans

2012-09-09 Thread Aliabbas Petiwala
Thanks Olivier, that helped to show me the output. However, with the same code as given before, I am not getting proper clusters: as shown in the plot below, there are no clearly disparate clusters, and the points seem to overlap. But using hierarchical clustering on the same dataset, I did find about 7 disparate
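A 2-D PCA projection can make clusters overlap on a plot even when they are reasonably well separated in the full feature space. It can therefore help to cluster on the full (scaled) features and use PCA only for visualisation. This is an illustration under assumptions, because the poster's dataset and code are not in the thread. The sketch uses synthetic make_blobs data with 7 centers and standardizes the features. It then compares KMeans with Ward agglomerative clustering using the adjusted Rand index.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 300 points, 10 features, 7 blobs.
X, _ = make_blobs(n_samples=300, n_features=10, centers=7, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# 2-D projection used only for plotting, not for clustering.
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# Cluster in the full scaled feature space.
km_labels = KMeans(n_clusters=7, n_init=10, random_state=0).fit_predict(X_scaled)
agg_labels = AgglomerativeClustering(n_clusters=7, linkage="ward").fit_predict(X_scaled)

# Agreement between the two partitions (1.0 means identical up to relabelling).
ari = adjusted_rand_score(km_labels, agg_labels)
print(X_2d.shape, ari)
```

Colouring the X_2d scatter plot by km_labels and then by agg_labels shows whether the apparent overlap comes from the projection or from the clustering itself.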