Re: [Scikit-learn-general] Implementation of DBCLASD for clustering

Andreas Mueller Mon, 03 Aug 2015 08:15:39 -0700

This sounds interesting.
Can someone who is familiar with this method maybe chime in?

Many of the large-scale clustering methods come from the data mining community and are not discussed that much in the machine learning community, unfortunately.

I haven't heard of the method, and it is hard for me to say whether we should include it or not.

The two papers you mention both have <10 citations and are both in venues I have never heard of, so that doesn't really inspire confidence.

Compared with the DBSCAN paper's 7000+ citations, 260 is not that much.

That is a bit surprising, since they seem to be from the same authors and DBCLASD is sold as an improvement on DBSCAN.

Maybe it turned out that the slower runtime is not worth it?

The way I am now used to comparing clustering algorithms is using classification datasets and ARI (the adjusted Rand index), while being aware of how weird that is.
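For readers who have not seen that kind of comparison, here is a minimal sketch of what it could look like. It is only an illustration: the dataset (iris), the two algorithms, and the eps and min_samples values are assumptions chosen for the example, not a recommended benchmark. DBCLASD is not part of scikit-learn, so it does not appear here.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# A classification dataset: the class labels are used only for scoring,
# never for fitting the clusterers.
X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Hyperparameters are illustrative guesses, not tuned values.
models = {
    "KMeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "DBSCAN": DBSCAN(eps=0.6, min_samples=5),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    # ARI ignores how cluster ids are numbered: it is close to 0 for a
    # random labeling and 1 for a perfect match with the classes.
    scores[name] = adjusted_rand_score(y, labels)
    print(f"{name}: ARI = {scores[name]:.3f}")
```

The weird part is that class labels need not correspond to density or distance structure at all, so a low ARI does not necessarily mean the clustering is bad.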

Cheers,
Andy


On 08/03/2015 04:36 AM, Sebastián Palacio wrote:

Hi Andreas,
Looking through the literature, DBCLASD has been a constant reference both for practical applications and benchmark papers that are as recent as 2014 (see for example "A Survey of Clustering Algorithms for Big Data: Taxonomy and Empirical Analysis" or "A new clustering algorithm with adaptive attractor for LIDAR points"). The paper itself has been cited over 260 times. The most interesting feature of this algorithm is its non-parametric nature, allowing it to address classification problems without imposing hard constraints such as the number of classes. It can be used precisely as a way to estimate K for KMeans (and related problems) :-)
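A rough sketch of that "estimate K, then run KMeans" idea follows. Since DBCLASD has no scikit-learn implementation, DBSCAN is swapped in here as a stand-in density-based method; unlike DBCLASD it does need parameters, and the eps and min_samples values as well as the synthetic data are assumptions made for illustration only.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 well-separated groups (illustrative only).
centers = [[-10, -10], [-10, 10], [10, -10], [10, 10]]
X, _ = make_blobs(n_samples=400, centers=centers, cluster_std=1.0,
                  random_state=0)

# Stand-in density-based step; eps and min_samples are assumed values.
db_labels = DBSCAN(eps=2.0, min_samples=5).fit_predict(X)

# DBSCAN labels noise points as -1, so exclude that label when counting.
k_estimate = len(set(db_labels) - {-1})

# Use the estimated number of clusters as K for KMeans.
kmeans = KMeans(n_clusters=k_estimate, n_init=10, random_state=0).fit(X)
print("estimated K:", k_estimate)
```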
Comparing clustering algorithms is always a tricky task, but according to the survey I've mentioned before (from 2014), DBCLASD is quite similar to DBSCAN and OPTICS in terms of the size of the dataset; it copes with noise (as opposed to DBSCAN, BIRCH and K-Means) and it has a complexity of O(3n^2), which compares with DBSCAN's O(n^2).
Regards,
Sebastian
On 31 July 2015 at 18:43, Andreas Mueller <t3k...@gmail.com> wrote:
    Hi Sebastian.
    Have you seen this used much recently? How does it compare against
    DBSCAN, BIRCH, OPTICS or just KMeans?

    Cheers,
    Andy



    On 07/31/2015 10:28 AM, Sebastián Palacio wrote:
    Hello all,

    I've been investigating clustering algorithms, with special interest in
    non-parametric methods, and one that is mentioned quite often is
    DBCLASD [1]. I've looked around but haven't been able to find a single
    implementation of this algorithm, so I decided to implement my own.

    My first running version is already on GitHub: https://goo.gl/V4HOVH
    I tried to make it as simple as possible for anyone to run it: it's all
    written in Python, requires only "standard" Python packages (numpy,
    scikit-learn, scipy and matplotlib) and it comes with a main routine that
    runs an example.

    I would really appreciate some feedback from the community, regarding the
    correctness of this implementation (if you happen to have some experience
    with the algorithm) and perhaps a discussion about how useful this
    algorithm may be in order to decide whether it makes sense to integrate it
    into a future version of scikit-learn or not. Thanks in advance for your
    time :-)

    Regards,
    Sebastian
------------------------------------------------------------------------------


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
