Hi Alexandre,

> I have a few questions on your experiment though:
> - how many clusters do you have (as the block method's speed and memory
> consumption depend on the number of clusters)
>

The dataset is clustered into 50 clusters.


> - have you monitored memory usage? In particular, did you swap at any
> moment? Because swapping is a time killer.
>

I have not monitored the memory usage. However, the computation time
reported here is the actual CPU time, not the elapsed time.
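
For the next run, both times and the peak memory could be recorded together. A minimal sketch using only the standard library; the `measure` helper is a made-up name, and `time.sleep` stands in for the real silhouette computation. If the elapsed time is much larger than the CPU time, the process was waiting on something, and swapping is one possible cause.

```python
import time

try:
    import resource  # available on Unix-like systems only
except ImportError:
    resource = None


def measure(func, *args, **kwargs):
    """Run func and report CPU time, elapsed time and peak memory (hypothetical helper)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    peak = None
    if resource is not None:
        # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return result, {"cpu_s": cpu, "wall_s": wall, "peak_rss": peak}


# Stand-in workload: sleeping consumes elapsed time but almost no CPU time,
# so the two numbers differ in the same way they would for a waiting process.
result, stats = measure(time.sleep, 0.2)
print(stats)
```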


> - do you have some results using the scikit-learn function (and using
> sampling to make the data fit into memory)?
>

I can only run the original scikit-learn silhouette score when the data
size is below 40K (or with sub_sample < 40K). At 50K it runs out of memory.
Because of that, I have not run the original one on my whole dataset. I
will re-check it and let you know soon.
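
For that comparison, one option is scikit-learn's own subsampling: `silhouette_score` accepts a `sample_size` argument and scores a random subset of the points, so the full pairwise distance matrix is never built. A minimal sketch on synthetic data; the `make_blobs` data and the sizes are placeholders, not my dataset.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Placeholder data: 5,000 points in 5 clusters (not the real dataset).
X, labels = make_blobs(n_samples=5000, centers=5, random_state=0)

# Score a random subset of 1,000 points, so only a 1,000 x 1,000 distance
# matrix is needed instead of a 5,000 x 5,000 one.
score = silhouette_score(X, labels, sample_size=1000, random_state=0)
print(score)
```

The result is only an estimate of the full score, so repeating it with a few different `random_state` values gives an idea of how much it varies.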

> The big advantage of the block version is that it can easily be
> parallelized so if your memory is not full, we can still speed up
> computation !
>
>
I am not quite clear about this. Do you mean that we can run multiple
blocks at the same time, each with the same sub_sample size, and save time
that way?
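
To make the question concrete, here is a minimal sketch of what I understand by that. It is not your block implementation, only my assumption of the idea: each pair of clusters gives one independent distance block, so the blocks can be handed to several workers. The names `mean_dist_block`, `block_silhouette` and `n_workers` are made up, the data is synthetic, at least two clusters are assumed, and threads are used only to keep the example short (joblib or processes would be the more usual choice).

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def mean_dist_block(X_a, X_b, same_cluster):
    """Mean Euclidean distance from each row of X_a to the rows of X_b."""
    # Only one (len(X_a), len(X_b)) block of distances is built per call.
    sq = (X_a ** 2).sum(axis=1)[:, None] + (X_b ** 2).sum(axis=1)[None, :]
    d2 = np.maximum(sq - 2.0 * X_a @ X_b.T, 0.0)
    if same_cluster:
        # A point's distance to itself is zero and is left out of the mean.
        np.fill_diagonal(d2, 0.0)
        return np.sqrt(d2).sum(axis=1) / max(len(X_b) - 1, 1)
    return np.sqrt(d2).mean(axis=1)


def block_silhouette(X, labels, n_workers=4):
    """Mean silhouette computed from per-cluster-pair blocks (assumes >= 2 clusters)."""
    clusters = np.unique(labels)
    index = {c: np.flatnonzero(labels == c) for c in clusters}
    pairs = [(a, b) for a in clusters for b in clusters]

    def task(pair):
        a, b = pair
        return mean_dist_block(X[index[a]], X[index[b]], a == b)

    # The blocks do not depend on each other, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = dict(zip(pairs, pool.map(task, pairs)))

    scores = np.zeros(len(X))
    for a in clusters:
        intra = results[(a, a)]
        inter = np.min([results[(a, b)] for b in clusters if b != a], axis=0)
        s = (inter - intra) / np.maximum(intra, inter)
        if len(index[a]) == 1:
            s = np.zeros(1)  # usual convention for a single-point cluster
        scores[index[a]] = s
    return scores.mean()


# Synthetic example: two well-separated groups of 30 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(8, 1, (30, 2))])
labels = np.repeat([0, 1], 30)
score = block_silhouette(X, labels)
print(score)
```

Is this roughly what you mean, or is the parallelism somewhere else?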

Regards,

T.Bao

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
