[Scikit-learn-general] Demo DBSCAN

María Helena Mejía Salazar Fri, 02 Dec 2011 11:17:29 -0800

Hi,


I slightly modified the DBSCAN demo program (plot_dbscan.py).  I am
using plain distances (no similarities) and I am getting bad results. There
are just 5 points. I set eps to the minimum distance between the
points and min_samples to 2, since that is what I
require to form a cluster.  The result is that all the points are labeled as
noise.  I also tried WEKA (Java), and it produced the desired results.

What is wrong with my modified version of plot_dbscan.py?

Here it is:

# -*- coding: utf-8 -*-
"""
===================================
Demo of DBSCAN clustering algorithm
===================================

Finds core samples of high density and expands clusters from them.

"""
print __doc__


import numpy as np
from scipy.spatial import distance
from sklearn.cluster import DBSCAN
from sklearn import metrics
from sklearn.datasets.samples_generator import make_blobs


#data map 3d 198
X = [[-1.86470825e-01, 0.00000000e+00, -7.01133734e+00],
     [1.15102438e-01, 0.00000000e+00, -7.50645482e+00],
     [-8.97935212e-01, 0.00000000e+00, -1.44829809e+01],
     [-5.93406197e-01, 0.00000000e+00, -1.34563465e+01],
     [-1.76435088e+00, 0.00000000e+00, -1.10931162e+01]]


print X
#############################################################################
#Compute Distance
EuclidDist=distance.pdist(X)
print EuclidDist

S = distance.squareform(EuclidDist)
mineps=np.min(EuclidDist)
print "mineps",mineps, np.min(EuclidDist)

print "len(S)",len(S),S
##############################################################################
# Compute DBSCAN
#===============================================================================

db = DBSCAN().fit(S,eps=mineps, min_samples=2)
core_samples = db.core_sample_indices_
print "core_samples",core_samples
labels = db.labels_
print "labels",labels
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)



##############################################################################
# Plot result
import pylab as pl
from itertools import cycle

pl.close('all')
pl.figure(1)
pl.clf()

# Black removed and is used for noise instead.
colors = cycle('bgrcmybgrcmybgrcmybgrcmy')
for k, col in zip(set(labels), colors):
    if k == -1:
        # Black used for noise.
        col = 'k'
        markersize = 6
    class_members = [index[0] for index in np.argwhere(labels == k)]
    cluster_core_samples = [index for index in core_samples
                            if labels[index] == k]
    for index in class_members:
        x = X[index]
        if index in core_samples and k != -1:
            markersize = 14
        else:
            markersize = 6
        pl.plot(x[0], x[1], 'o', markerfacecolor=col,
                markeredgecolor='k', markersize=markersize)

pl.title('Estimated number of clusters: %d' % n_clusters_)
pl.show()
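
As a cross-check that does not depend on scikit-learn, the setup described in this message (eps equal to the smallest pairwise distance, at least 2 points per neighbourhood) can be sketched with the standard library alone. This is only a sketch under two assumptions that the message does not confirm: a point is counted as its own neighbour, and the comparison with eps is inclusive (<=). The names `euclidean` and `neighbor_counts` are illustrative and not part of any library. It says nothing about how a given scikit-learn version interprets the matrix passed to fit().

```python
import math

# The five 3-D points from the script above.
points = [
    (-1.86470825e-01, 0.0, -7.01133734e+00),
    (1.15102438e-01, 0.0, -7.50645482e+00),
    (-8.97935212e-01, 0.0, -1.44829809e+01),
    (-5.93406197e-01, 0.0, -1.34563465e+01),
    (-1.76435088e+00, 0.0, -1.10931162e+01),
]


def euclidean(p, q):
    """Plain Euclidean distance between two points (illustrative helper)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


n = len(points)
# Full symmetric distance matrix, zeros on the diagonal.
dist = [[euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]

# eps is the smallest distance between two distinct points.
eps = min(dist[i][j] for i in range(n) for j in range(i + 1, n))

# Neighbours within eps, counting the point itself (assumed convention).
neighbor_counts = [
    sum(1 for j in range(n) if dist[i][j] <= eps) for i in range(n)
]

print("eps:", eps)
print("neighbours within eps:", neighbor_counts)  # [2, 2, 1, 1, 1]
```

Under those assumptions only the first two points reach the required count of 2, so with true distances at most one two-point cluster would be expected and the other three points would be noise.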

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general