This sounds interesting.
Can someone who is familiar with this method maybe chime in?
Many of the large-scale clustering methods come from the data mining community and are, unfortunately, not discussed that much in the machine learning community.
I haven't heard of the method and it is hard for me to say if we should include it or not.

The two papers you mention each have fewer than 10 citations and are in venues I have never heard of, so that doesn't really inspire confidence.
The DBSCAN paper has 7000+ citations; 260 is not that many.
That is a bit surprising, since they seem to be from the same authors and DBCLASD is sold as an improvement on DBSCAN.
Maybe it turned out that the slower runtime is not worth it?

The way I now compare clustering algorithms is to run them on classification datasets and score against the true labels with ARI,
while staying aware of how weird that is as a proxy.

Cheers,
Andy


On 08/03/2015 04:36 AM, Sebastián Palacio wrote:
Hi Andreas,

Looking through the literature, DBCLASD has been a constant reference, both in practical applications and in benchmark papers as recent as 2014 (see for example "A Survey of Clustering Algorithms for Big Data: Taxonomy and Empirical Analysis" or "A new clustering algorithm with adaptive attractor for LIDAR points"). The paper itself has been cited over 260 times. The most interesting feature of this algorithm is its non-parametric nature, which allows it to address clustering problems without imposing hard constraints such as the number of classes. It can be used precisely as a way to estimate K for KMeans (and related problems) :-)

Comparing clustering algorithms is always a tricky task, but according to the survey I mentioned before (from 2014), DBCLASD is quite similar to DBSCAN and OPTICS in terms of the dataset sizes it can handle; it copes with noise (as opposed to DBSCAN, BIRCH and K-Means), and it has a complexity of O(3n^2), which compares with DBSCAN's O(n^2).

Regards,
Sebastian

On 31 July 2015 at 18:43, Andreas Mueller <t3k...@gmail.com <mailto:t3k...@gmail.com>> wrote:

    Hi Sebastian.
    Have you seen this used much recently? How does it compare against
    DBSCAN, BIRCH, OPTICS or just KMeans?

    Cheers,
    Andy



    On 07/31/2015 10:28 AM, Sebastián Palacio wrote:
    Hello all,

    I've been investigating clustering algorithms with special interest in
    non-parametric methods and, one that is being mentioned quite often is
    DBCLASD [1]. I've looked around but I haven't been able to find one single
    implementation of this algorithm whatsoever so I decided to implement my
    own.

    My first running version is already on GitHub: https://goo.gl/V4HOVH
    I tried to make it as simple as possible for anyone to run: it's all
    written in Python, requires only "standard" Python packages (numpy,
    scikit-learn, scipy and matplotlib), and it comes with a main routine that
    runs an example.

    I would really appreciate some feedback from the community regarding the
    correctness of this implementation (if you happen to have some experience
    with the algorithm), and perhaps a discussion about how useful this
    algorithm may be, in order to decide whether it makes sense to integrate it
    into a future version of scikit-learn. Thanks in advance for your
    time :-)

    Regards,
    Sebastian




------------------------------------------------------------------------------


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
