Could you please check memory usage while running DBSCAN, to make sure the freeze is due to running out of memory and not to something else? Which parameters do you run DBSCAN with? Changing the algorithm and leaf_size parameters, and ensuring n_jobs=1, could help.
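
A minimal sketch of such a check, assuming synthetic data in place of yours. The eps and min_samples values are placeholders, not recommendations. Note that tracemalloc only sees allocations reported to Python, so an OS-level monitor (e.g. the Windows Task Manager) remains the more reliable check:

```python
import tracemalloc

import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic stand-in for the real data, scaled to [0, 1] (assumption).
rng = np.random.RandomState(0)
X = rng.rand(2000, 5)

tracemalloc.start()

# Explicit settings for the parameters mentioned above.
# eps and min_samples are placeholder values for illustration only.
db = DBSCAN(eps=0.1, min_samples=5, algorithm="ball_tree",
            leaf_size=30, n_jobs=1)
labels = db.fit_predict(X)

current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("peak traced memory: %.1f MiB" % (peak / 2**20))
print("clusters found:", n_clusters)
```

Running this on a small subsample first, and then increasing the sample size, should show whether memory grows much faster on the normalized data than on the original data.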

Assuming eps is reasonable, I think it shouldn't be an issue to run DBSCAN on L2-normalized data: with the default euclidean metric, this should produce results somewhat similar to clustering the unnormalized data with metric='cosine'.
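
The reason is that for unit-length vectors u and v, ||u - v||^2 = 2 * (1 - cos(u, v)), so the euclidean distance on normalized data is a monotone function of the cosine distance on the raw data. A small sketch that checks this numerically on random data (the data and sizes are arbitrary assumptions):

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.preprocessing import normalize

# Arbitrary non-zero random data, for illustration only.
rng = np.random.RandomState(0)
X = rng.rand(200, 10) + 0.1

# Rescale every row to unit L2 norm.
X_unit = normalize(X, norm="l2")

d_euc = pairwise_distances(X_unit, metric="euclidean")  # on normalized data
d_cos = pairwise_distances(X, metric="cosine")          # on raw data

# Squared euclidean distance should equal twice the cosine distance.
max_gap = np.abs(d_euc ** 2 - 2 * d_cos).max()
print("largest deviation from the identity:", max_gap)
```

Under this identity, a euclidean eps on the normalized data corresponds to a cosine eps of eps**2 / 2 on the raw data, up to floating-point effects for points lying right at the eps boundary.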

On 13/05/18 00:20, Andrew Nystrom wrote:
If you’re L2-normalizing your data, you’re making it live on the surface of a hypersphere. That surface will have a high density of points and may not have areas of low density, in which case the entire surface could be recognized as a single cluster if epsilon is high enough and min neighbors is low enough. I’d suggest not using L2 normalization with DBSCAN.

On Sat, May 12, 2018 at 7:27 AM Mauricio Reis <rei...@gmail.com> wrote:

    The DBSCAN "fit" method (in scikit-learn v0.19.1) is freezing my
    computer without any warning message!

    I am using WinPython 3.6.5 64 bit.

    The method works normally with the original data, but freezes when I
    use the normalized data (between 0 and 1).

    What should I do?

    Regards,
    Mauricio Reis
    _______________________________________________
    scikit-learn mailing list
    scikit-learn@python.org <mailto:scikit-learn@python.org>
    https://mail.python.org/mailman/listinfo/scikit-learn


