Hi everyone,

I just stumbled upon the example plot_dbscan.py at [1]. As far as I understand, the similarity matrix S is computed from the data in X and then it is used for clustering with DBSCAN. What confused me was that the documentation for DBSCAN.fit(X) says that it takes a *distance* matrix. Here is the code snippet:

------------------------
# Compute similarities
D = distance.squareform(distance.pdist(X))
S = 1 - (D / np.max(D))

# Compute DBSCAN
db = DBSCAN(eps=0.95, min_samples=10).fit(S)
------------------------

Shouldn't it be "[?].fit(D)" instead?

I would be happy if anybody could explain if my understanding is wrong or if the example is flawed.

Best regards,
Johannes

[1] http://scikit-learn.org/dev/auto_examples/cluster/plot_dbscan.html#example-cluster-plot-dbscan-py
[2] http://scikit-learn.org/dev/modules/generated/sklearn.cluster.DBSCAN.html#sklearn.cluster.DBSCAN

--
Knowledge Representation & Knowledge Management Research Group
68159 Mannheim
B6, 26, Room C 1.10

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
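
To make the question concrete, here is a minimal sketch of clustering on the distance matrix D rather than on S. It is not the code from the example: it assumes a recent scikit-learn release in which a precomputed distance matrix is passed with metric="precomputed", and the synthetic data, eps=1.0 and the variable names labels_dist and labels_raw are illustrative choices of mine.

```python
import numpy as np
from scipy.spatial import distance
from sklearn.cluster import DBSCAN

# Illustrative data (an assumption, not the example's data):
# two tight, well-separated blobs of 30 points each.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(30, 2)),
    rng.normal(loc=5.0, scale=0.1, size=(30, 2)),
])

# Pairwise Euclidean distances, as in the snippet above.
D = distance.squareform(distance.pdist(X))

# Cluster directly on the distance matrix. metric="precomputed" tells
# DBSCAN that the input already holds pairwise distances.
labels_dist = DBSCAN(eps=1.0, min_samples=10, metric="precomputed").fit(D).labels_

# For comparison: the default Euclidean metric applied to the raw data.
labels_raw = DBSCAN(eps=1.0, min_samples=10).fit(X).labels_

print(np.array_equal(labels_dist, labels_raw))
```

With a precomputed matrix, eps is a distance threshold in the units of D. A similarity such as S = 1 - (D / np.max(D)) reverses the ordering, so it would have to be converted back to a distance before being used this way.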
