Hi Bao,

Thanks for your feedback! I am not surprised that the sampling method
saves time and gives a good approximation, especially considering the size
of your data.

I have a few questions about your experiment, though:
- How many clusters do you have? The speed and memory consumption of the
block method depend on the number of clusters.
- Have you monitored memory usage? In particular, did you swap at any
point? Swapping is a time killer.
- Do you have any results using the scikit-learn function (with sampling
to make the data fit into memory)?
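
To make the last two questions concrete, here is a minimal sketch of the kind of run I have in mind. It rests on assumptions: that the scikit-learn function is `silhouette_score` with its `sample_size` parameter, that synthetic random data with KMeans labels can stand in for your data, and that you are on a Unix system (the `resource` module is not available on Windows). The helper `peak_rss_mb` is a hypothetical name for illustration.

```python
import resource
import sys

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def peak_rss_mb():
    """Peak resident memory of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 ** 2 if sys.platform == "darwin" else 1024
    return peak / divisor


# Synthetic stand-in data (assumption): replace with the real data and labels.
rng = np.random.RandomState(0)
X = rng.rand(2000, 10)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# Score a random subsample of 500 points so the pairwise distance
# matrix stays small enough to fit in memory.
score = silhouette_score(X, labels, sample_size=500, random_state=0)

print("sampled silhouette: %.3f" % score)
print("peak RSS: %.1f MiB" % peak_rss_mb())
```

A peak RSS close to the physical RAM of the machine would suggest that swapping is likely. Watching a system tool such as `vmstat` or `free` during the run is probably the more direct check.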

The big advantage of the block version is that it can easily be
parallelized, so if your memory is not full, we can still speed up the
computation!
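
As a rough illustration of why a block-wise computation parallelizes easily, here is a generic sketch. It is not the actual block implementation under discussion: the statistic (mean distance from each point to all points), the function name `mean_distances_blockwise`, and the `block_size` and `n_workers` parameters are all assumptions chosen for the example. Each block of rows is independent of the others, so the blocks can be handed to separate workers.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from sklearn.metrics import pairwise_distances


def mean_distances_blockwise(X, block_size=256, n_workers=4):
    """Mean Euclidean distance from each row of X to all rows of X.

    Distances are computed one block of rows at a time, so each worker
    holds only a (block_size, n_samples) matrix rather than the full
    (n_samples, n_samples) one.
    """
    n = X.shape[0]
    bounds = [(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    def one_block(bound):
        start, stop = bound
        return pairwise_distances(X[start:stop], X).mean(axis=1)

    # Blocks are independent, so they can be processed concurrently.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(one_block, bounds))
    return np.concatenate(parts)


rng = np.random.RandomState(0)
X = rng.rand(1000, 10)  # synthetic stand-in data
result = mean_distances_blockwise(X, block_size=128, n_workers=4)
```

Peak memory grows with `n_workers * block_size * n_samples`, so the two parameters trade speed against memory. Whether threads give a real speed-up depends on how much of the work runs outside the Python interpreter lock; separate processes would be an alternative.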

Best,

Alexandre.
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general