Hi,
I slightly modified the DBSCAN demo program (plot_dbscan.py). I am using
plain distances (no similarities) and I am getting bad results. There are
only 5 points. I set eps to the minimum pairwise distance between the
points and the minimum number of points to 2, since that is what I need
to form a cluster. Every point comes out labelled as noise. I also ran
the same data through WEKA (Java), and it produced the desired results.
What is wrong with my modified plot_dbscan.py?
Here it is:
# -*- coding: utf-8 -*-
"""
===================================
Demo of DBSCAN clustering algorithm
===================================
Finds core samples of high density and expands clusters from them.
"""
print __doc__
import numpy as np
from scipy.spatial import distance
from sklearn.cluster import DBSCAN
from sklearn import metrics
from sklearn.datasets.samples_generator import make_blobs
# data map 3d 198
X = [[-1.86470825e-01, 0.00000000e+00, -7.01133734e+00],
     [1.15102438e-01, 0.00000000e+00, -7.50645482e+00],
     [-8.97935212e-01, 0.00000000e+00, -1.44829809e+01],
     [-5.93406197e-01, 0.00000000e+00, -1.34563465e+01],
     [-1.76435088e+00, 0.00000000e+00, -1.10931162e+01]]
print X
#############################################################################
#Compute Distance
EuclidDist=distance.pdist(X)
print EuclidDist
S = distance.squareform(EuclidDist)
mineps=np.min(EuclidDist)
print "mineps",mineps, np.min(EuclidDist)
print "len(S)",len(S),S
##############################################################################
# Compute DBSCAN
db = DBSCAN().fit(S,eps=mineps, min_samples=2)
core_samples = db.core_sample_indices_
print "core_samples",core_samples
labels = db.labels_
print "labels",labels
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
##############################################################################
# Plot result
import pylab as pl
from itertools import cycle
pl.close('all')
pl.figure(1)
pl.clf()
# Black removed and is used for noise instead.
colors = cycle('bgrcmybgrcmybgrcmybgrcmy')
for k, col in zip(set(labels), colors):
    if k == -1:
        # Black used for noise.
        col = 'k'
        markersize = 6
    class_members = [index[0] for index in np.argwhere(labels == k)]
    cluster_core_samples = [index for index in core_samples
                            if labels[index] == k]
    for index in class_members:
        x = X[index]
        if index in core_samples and k != -1:
            markersize = 14
        else:
            markersize = 6
        pl.plot(x[0], x[1], 'o', markerfacecolor=col,
                markeredgecolor='k', markersize=markersize)
pl.title('Estimated number of clusters: %d' % n_clusters_)
pl.show()
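
For reference, here is a minimal hand check of the expectation behind eps and min_samples above. It is only a sketch and does not use scikit-learn. It assumes the textbook DBSCAN convention: a point's neighbourhood includes the point itself, and a neighbour is any point at distance <= eps. The helper names euclidean and neighbour_counts are hypothetical, introduced only for this illustration.

```python
import math

# The five 3-D points from the script above.
points = [
    (-1.86470825e-01, 0.0, -7.01133734e+00),
    (1.15102438e-01, 0.0, -7.50645482e+00),
    (-8.97935212e-01, 0.0, -1.44829809e+01),
    (-5.93406197e-01, 0.0, -1.34563465e+01),
    (-1.76435088e+00, 0.0, -1.10931162e+01),
]


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def neighbour_counts(pts, eps):
    """Hypothetical helper: number of points within eps of each point.

    Assumption: the point itself counts as its own neighbour, and the
    comparison is inclusive (distance <= eps).
    """
    return [sum(1 for q in pts if euclidean(p, q) <= eps) for p in pts]


n = len(points)
# eps is the smallest pairwise distance, as in the script above.
eps = min(euclidean(points[i], points[j])
          for i in range(n) for j in range(i + 1, n))
min_samples = 2

counts = neighbour_counts(points, eps)
# A point is a core point when its neighbourhood holds >= min_samples points.
core = [i for i, c in enumerate(counts) if c >= min_samples]

print("eps", eps)
print("neighbour counts", counts)
print("core points", core)
```

Under that assumed convention, only the closest pair (points 0 and 1) reach min_samples=2, so they would form one cluster and the other three points would be noise. A library that uses a strict comparison, or that interprets the input matrix as similarities rather than distances, would give a different answer.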
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general