Re: [Scikit-learn-general] Getting Started with scikit-learn

2013-05-10 Thread Nigel Legg
Upgraded from Python 2.6 -> 2.7, then installed numpy, scipy, and scikit-learn on that. Now starting to get to grips with sklearn. Thanks for the advice, Nigel. Regards, Nigel Legg 07722 652866 http://twitter.com/nigellegg http://uk.linkedin.com/in/nigellegg To support the footyvoice app, please visi

Re: [Scikit-learn-general] Out of memory when running silhouette score function

2013-05-10 Thread Bao Thien
Hi Alexandre, It sounds great. I will try it and let you know soon. Regards, T.Bao On Fri, May 10, 2013 at 6:19 PM, Alexandre ABRAHAM < abraham.alexan...@gmail.com> wrote: > Bao, > > Sorry for the delay. I have pushed a new version of the code to the gist > (there is now an n_jobs keyword p

Re: [Scikit-learn-general] Out of memory when running silhouette score function

2013-05-10 Thread Alexandre ABRAHAM
Bao, Sorry for the delay. I have pushed a new version of the code to the gist (there is now an n_jobs keyword parameter). It should use a bit more memory. Quick benchmark (see main in the gist): Scikit silhouette (113.294149s): -0.013992 Block silhouette (23.485517s): -0.013992 Block silhouette parallel
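The gist itself is not reproduced in this listing, so the following is only a minimal sketch of the block-wise idea discussed in the thread, assuming Euclidean distances and NumPy. The names `pairwise_euclidean` and `silhouette_by_blocks` are hypothetical and are not taken from the gist, and the `n_jobs` parallel variant is not covered. Instead of one n x n distance matrix, distances are computed one cluster pair at a time, so peak memory is set by the largest pair of clusters. The result should match the full-matrix score up to floating-point rounding, which is consistent with both timings above reporting the same value.

```python
import numpy as np


def pairwise_euclidean(A, B):
    """Euclidean distances between rows of A and rows of B, shape (len(A), len(B))."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * (A @ B.T)
    return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negatives caused by rounding


def silhouette_by_blocks(X, labels):
    """Mean silhouette coefficient, computed one cluster-pair block at a time."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if clusters.size < 2:
        raise ValueError("silhouette needs at least two clusters")

    n = X.shape[0]
    a = np.zeros(n)          # mean distance to the other points of the own cluster
    b = np.full(n, np.inf)   # smallest mean distance to any other cluster
    sizes = np.zeros(n)      # size of the cluster each sample belongs to

    for k in clusters:
        in_k = labels == k
        Xk = X[in_k]
        size_k = Xk.shape[0]
        sizes[in_k] = size_k
        if size_k > 1:
            d = pairwise_euclidean(Xk, Xk)
            np.fill_diagonal(d, 0.0)  # a point is at distance 0 from itself
            a[in_k] = d.sum(axis=1) / (size_k - 1)
        for j in clusters:
            if j == k:
                continue
            mean_d = pairwise_euclidean(Xk, X[labels == j]).mean(axis=1)
            b[in_k] = np.minimum(b[in_k], mean_d)

    # Samples in singleton clusters keep a coefficient of 0.
    s = np.zeros(n)
    ok = (sizes > 1) & (np.maximum(a, b) > 0)
    s[ok] = (b[ok] - a[ok]) / np.maximum(a[ok], b[ok])
    return float(s.mean())


if __name__ == "__main__":
    X_demo = np.array([[0.0], [1.0], [10.0], [11.0]])
    print(silhouette_by_blocks(X_demo, [0, 0, 1, 1]))
```

With 50 clusters of about 5K points each, every block is at most about 5K x 5K, while the per-pair loop still visits all pairs of clusters.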

Re: [Scikit-learn-general] Out of memory when running silhouette score function

2013-05-10 Thread Alexandre ABRAHAM
the dataset is clustered into 50 clusters > OK, so each cluster contains approximately 5K elements, which means distance matrices of 5K x 5K = 25 000K entries. > I have not monitored the memory usage. But the computation time here is > the real CPU time, not the elapsed time > OK. > I can only run the
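The size estimate above can be checked with a few lines of arithmetic. This is a sketch under one stated assumption that is not in the thread: each distance is stored as an 8-byte float64. The total sample count is taken as 50 x 5K, following the approximate figures in the message.

```python
BYTES_PER_FLOAT64 = 8  # assumption: distances stored as float64


def distance_matrix_bytes(n_rows, n_cols, bytes_per_entry=BYTES_PER_FLOAT64):
    """Memory needed for a dense n_rows x n_cols distance matrix."""
    return n_rows * n_cols * bytes_per_entry


n_clusters = 50
cluster_size = 5_000                    # "approximately 5K elements"
n_samples = n_clusters * cluster_size   # about 250K points in total

block_entries = cluster_size * cluster_size
block_bytes = distance_matrix_bytes(cluster_size, cluster_size)
full_bytes = distance_matrix_bytes(n_samples, n_samples)

print(f"entries per cluster block: {block_entries:,}")
print(f"one cluster block: {block_bytes / 1e6:.0f} MB")
print(f"full distance matrix: {full_bytes / 1e9:.0f} GB")
```

Under that assumption one block needs about 200 MB, whereas a single dense matrix over all points would need about 500 GB, far beyond a 4GB machine.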

Re: [Scikit-learn-general] Out of memory when running silhouette score function

2013-05-10 Thread Bao Thien
Hi Alexandre, I have a few questions on your experiment though: > - how many clusters do you have (as the block method speed and memory > consumption is dependent on the number of clusters) > the dataset is clustered into 50 clusters > - have you monitored memory usage? In particular, did you

Re: [Scikit-learn-general] Out of memory when running silhouette score function

2013-05-10 Thread Alexandre ABRAHAM
Hi Bao, Thanks for your feedback! I am not surprised that the sampling method saves time and gives a good approximation, especially considering the size of your data. I have a few questions on your experiment though: - how many clusters do you have (as the block method speed and memory consumpti

Re: [Scikit-learn-general] Out of memory when running silhouette score function

2013-05-10 Thread Bao Thien
Hi Alexandre, I ran silhouette_score_block on my dataset, and this is the result. Dataset size |X| = 260486, dimension 40, RAM 4GB.

Trial | Original Ward (whole data) (1)  | Original Ward (sub_sample=50K) (2)
      | Silhouette Score | Time(s)      | Silhouette Score | Time(s)
1st   | 0.19045893       | 6250.758648  | 0.189+/
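The sub-sampling approach in column (2) can be sketched as follows. scikit-learn's `silhouette_score` accepts `sample_size` and `random_state` for a single random draw; the NumPy-only version below repeats the draw and reports the mean and standard deviation across draws. The truncated "0.189+/" entry may be such a mean and spread, but that is a guess. The function names, `n_trials`, and `seed` are hypothetical, and Euclidean distance is assumed.

```python
import numpy as np


def silhouette_full(X, labels):
    """Silhouette score from one dense distance matrix (only for small samples)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if clusters.size < 2:
        raise ValueError("silhouette needs at least two clusters")

    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    n = X.shape[0]
    a = np.zeros(n)
    b = np.full(n, np.inf)
    sizes = np.zeros(n)
    for k in clusters:
        in_k = labels == k
        size_k = int(in_k.sum())
        sizes[in_k] = size_k
        if size_k > 1:
            a[in_k] = dist[np.ix_(in_k, in_k)].sum(axis=1) / (size_k - 1)
        for j in clusters:
            if j != k:
                mean_d = dist[np.ix_(in_k, labels == j)].mean(axis=1)
                b[in_k] = np.minimum(b[in_k], mean_d)

    s = np.zeros(n)  # singleton clusters keep a coefficient of 0
    ok = (sizes > 1) & (np.maximum(a, b) > 0)
    s[ok] = (b[ok] - a[ok]) / np.maximum(a[ok], b[ok])
    return float(s.mean())


def subsampled_silhouette(X, labels, sample_size, n_trials=5, seed=0):
    """Mean and standard deviation of the score over random sub-samples."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_trials):
        # Draw sample_size distinct row indices without replacement.
        idx = rng.choice(X.shape[0], size=sample_size, replace=False)
        scores.append(silhouette_full(X[idx], labels[idx]))
    return float(np.mean(scores)), float(np.std(scores))


if __name__ == "__main__":
    demo_rng = np.random.default_rng(1)
    X_demo = np.vstack([demo_rng.normal(0, 1, (200, 2)),
                        demo_rng.normal(10, 1, (200, 2))])
    y_demo = np.array([0] * 200 + [1] * 200)
    mean, std = subsampled_silhouette(X_demo, y_demo, sample_size=100)
    print(f"{mean:.3f} +/- {std:.3f}")
```

A draw that happens to contain a single cluster raises `ValueError` in this sketch, so `sample_size` should be large relative to the number of clusters.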