> the dataset is clustered into 50 clusters
>

OK, so each cluster contains approximately 5K elements, which means
per-cluster distance matrices of 5K x 5K = 25 000K elements.


> I have not monitored the memory usage. But the computation time here is
> the real CPU time, not the elapsed time
>

OK.


> I can only run the original scikit-learn silhouette score when the size of
> the data is less than 40K (or the sub_sample <40K). With size 50K it runs
> out of memory. Because of that, I did not run the original one on my whole
> dataset. I will re-check it and let you know soon.
>

So your memory can store a 40K x 40K distance matrix, about 1 600 000K
elements.
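
As a rough check of these sizes, the short sketch below counts the elements
of a square distance matrix and converts them to gigabytes. The 8 bytes per
element (float64) is an assumption, not something stated in this thread, and
the helper names are made up for illustration.

```python
def matrix_elements(n_samples):
    """Number of entries in a dense n_samples x n_samples distance matrix."""
    return n_samples * n_samples


def matrix_gigabytes(n_samples, bytes_per_element=8):
    """Approximate size in GB (10**9 bytes); 8 bytes assumes float64."""
    return matrix_elements(n_samples) * bytes_per_element / 1e9


for n in (5_000, 40_000, 50_000):
    print(n, matrix_elements(n), matrix_gigabytes(n))
```

Under that assumption, a 5K x 5K block takes about 0.2 GB, the 40K x 40K
matrix about 12.8 GB, and the 50K x 50K matrix that ran out of memory about
20 GB.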

> I am not very clear about this. Do you mean that at the same time, we can
> run multiple blocks with the same sub_sample size, and then save time?
>

No. I mean that the computation of the score itself can be parallelized. If
your memory can handle a 1 600 000K element matrix, that means that you can
parallelize the score computation on 1 600 000K / 25 000K = 64 cores.
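
To make the idea concrete, here is a minimal sketch of one way to compute
the silhouette score block by block, one task per cluster. It is an
illustration only, not a tested or tuned implementation: the function names
and the `n_jobs` parameter are made up for this example, it assumes
Euclidean distances and at least two clusters, and it relies on joblib's
`Parallel` and scikit-learn's `pairwise_distances`. Each task only ever
holds one cluster-by-cluster block of distances in memory.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.metrics import pairwise_distances


def _silhouette_for_cluster(X, labels, k, unique_labels):
    """Silhouette values for the samples of cluster k, one block at a time."""
    X_k = X[labels == k]
    n_k = X_k.shape[0]
    if n_k == 1:
        return np.zeros(1)  # convention: a singleton cluster scores 0
    # Intra-cluster block (n_k x n_k). The diagonal is 0, so dividing the
    # row sums by (n_k - 1) leaves each sample out of its own average.
    a = pairwise_distances(X_k).sum(axis=1) / (n_k - 1)
    # Smallest mean distance to any other cluster, one block per cluster.
    b = np.full(n_k, np.inf)
    for other in unique_labels:
        if other == k:
            continue
        block = pairwise_distances(X_k, X[labels == other])
        b = np.minimum(b, block.mean(axis=1))
    denom = np.maximum(a, b)
    # Samples with a == b == 0 get a score of 0 instead of NaN.
    return np.divide(b - a, denom, out=np.zeros_like(denom), where=denom > 0)


def blockwise_silhouette_score(X, labels, n_jobs=1):
    """Mean silhouette over all samples, computed cluster by cluster."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    unique_labels = np.unique(labels)
    per_cluster = Parallel(n_jobs=n_jobs)(
        delayed(_silhouette_for_cluster)(X, labels, k, unique_labels)
        for k in unique_labels
    )
    return float(np.concatenate(per_cluster).mean())
```

With about 5K elements per cluster, every block holds roughly 25 000K
distances, so the number of tasks that fit in memory at once is what bounds
`n_jobs`. Note that each worker process may also need its own copy of `X`,
which this sketch does not try to avoid.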

Give me 10 minutes to set up a gist with the parallelized version.

Alexandre.



>
> ------------------------------------------------------------------------------
> Learn Graph Databases - Download FREE O'Reilly Book
> "Graph Databases" is the definitive new guide to graph databases and
> their applications. This 200-page book is written by three acclaimed
> leaders in the field. The early access version is available now.
> Download your free book today! http://p.sf.net/sfu/neotech_d2d_may
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
