On 3 December 2011 23:47, Olivier Grisel <[email protected]> wrote:
> 2011/12/2 María Helena Mejía Salazar <[email protected]>:
> > Hi,
> >
> > I slightly modified the DBSCAN demo program (plot_dbscan.py). I am
> > using only distances (no similarities) and I am getting bad results.
> > There are just 5 points. I set eps to the minimum distance between the
> > points and the minimum number of points to 2, since this is what I
> > require to form a cluster. The result is that all the points are
> > labelled as noise.
> > I used WEKA (Java) too and it produced the desired results.
>
> I have little understanding of the DBSCAN algorithm itself but
> clustering 5 datapoints sounds like an ill-defined task to me. Are
> you sure this is what you are looking for? Maybe a full hierarchical
> clustering tree (the dendrogram) would be more interesting in this
> case.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
>
> ------------------------------------------------------------------------------
> All the data continuously generated in your IT infrastructure
> contains a definitive record of customers, application performance,
> security threats, fraudulent activity, and more. Splunk takes this
> data and makes sense of it. IT sense. And common sense.
> http://p.sf.net/sfu/splunk-novd2d
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
It is an ill-defined problem for DBSCAN in general, but the example needs
to be adjusted:
Change this line:
db = DBSCAN().fit(S,eps=mineps, min_samples=2)
To this:
db = DBSCAN(metric='precomputed').fit(S,eps=mineps, min_samples=2)
To stay consistent with the rest of the scikits.learn API, the input is, by
default, treated as a feature matrix and the Euclidean distance is computed
between its rows. A distance matrix is only used as-is when
metric='precomputed' is given.
This is in the docs; should it be made clearer?
Perhaps we should give a warning if a square matrix with a zero diagonal is
given?
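
For anyone reading this later, here is a rough sketch of the difference,
written against the current scikit-learn API as I understand it (eps and
min_samples are passed to the constructor there, not to fit() as in the
lines above). The five 1-D points and the variable names are made up for
illustration; they are not the data from the original script.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical data: five 1-D points forming two obvious groups.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0])

# Pairwise distance matrix: square, symmetric, zero diagonal.
S = np.abs(points[:, None] - points[None, :])

# eps is the smallest non-zero pairwise distance, as described in the thread.
mineps = S[S > 0].min()

# Default metric: each row of S is treated as a 5-D feature vector and
# Euclidean distances are computed between those rows.
labels_default = DBSCAN(eps=mineps, min_samples=2).fit(S).labels_

# Precomputed metric: S is used directly as the distance matrix.
labels_precomputed = DBSCAN(
    eps=mineps, min_samples=2, metric="precomputed"
).fit(S).labels_

print("default metric:    ", labels_default)
print("precomputed metric:", labels_precomputed)
```

With this toy data the default call should mark every point as noise (-1),
because the row-to-row Euclidean distances are all larger than eps, while
the precomputed call should recover the two groups.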
- Robert
--
Public key at: http://pgp.mit.edu/ Search for this email address and select
the key from "2011-08-19" (key id: 54BA8735)