Re: [Scikit-learn-general] Implementation of DBCLASD for clustering

Sebastián Palacio Mon, 03 Aug 2015 01:37:42 -0700

Hi Andreas,

Looking through the literature, DBCLASD has been a constant reference in
both practical applications and benchmark papers as recent as 2014 (see for
example "A Survey of Clustering Algorithms for Big Data: Taxonomy and
Empirical Analysis" or "A new clustering algorithm with adaptive attractor
for LIDAR points"). The paper itself has been cited over 260 times. The
most interesting feature of this algorithm is its non-parametric nature,
which allows it to address clustering problems without imposing hard
constraints such as the number of clusters. It can be used precisely as a
way to estimate K for KMeans (and related problems) :-)
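
To illustrate the "estimate K" idea, here is a minimal sketch. DBCLASD is
not available in scikit-learn, so DBSCAN stands in as the density-based
step. Note that DBSCAN is not parameter-free: the eps and min_samples
values below, as well as the synthetic data, are assumptions chosen only
for illustration.

```python
# Sketch only: DBSCAN is a stand-in for DBCLASD, which scikit-learn does not
# provide. eps and min_samples are illustrative guesses, not recommendations.
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Synthetic data: three compact, well-separated blobs of 100 points each.
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.5,
    random_state=0,
)

# Density-based pass: the number of clusters is not supplied.
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

# The label -1 marks noise points, so leave it out when counting clusters.
k_estimate = len(set(labels) - {-1})

# Use the estimate as the number of clusters for KMeans.
kmeans = KMeans(n_clusters=k_estimate, n_init=10, random_state=0).fit(X)
print(k_estimate, kmeans.cluster_centers_.shape)
```

On real data the estimate depends on how the density-based step is tuned,
so it is better treated as a starting point for K than as a final answer.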


Comparing clustering algorithms is always tricky, but according to the 2014
survey mentioned above, DBCLASD is quite similar to DBSCAN and OPTICS in
terms of the dataset sizes it handles. It copes with noise (as opposed to
DBSCAN, BIRCH and K-Means) and has a complexity of O(3n^2), which is
comparable to DBSCAN's O(n^2), since the two differ only by a constant
factor.

Regards,
Sebastian

On 31 July 2015 at 18:43, Andreas Mueller <t3k...@gmail.com> wrote:

> Hi Sebastian.
> Have you seen this used much recently? How does it compare against DBSCAN,
> BIRCH, OPTICS or just KMeans?
>
> Cheers,
> Andy
>
>
>
> On 07/31/2015 10:28 AM, Sebastián Palacio wrote:
>
> Hello all,
>
> I've been investigating clustering algorithms, with a special interest in
> non-parametric methods, and one that is mentioned quite often is
> DBCLASD [1]. I've looked around but haven't been able to find a single
> implementation of this algorithm, so I decided to write my own.
>
> My first running version is already on GitHub: https://goo.gl/V4HOVH
> I tried to make it as simple as possible for anyone to run: it's all
> written in Python, requires only "standard" Python packages (numpy,
> scikit-learn, scipy and matplotlib) and comes with a main routine that
> runs an example.
>
> I would really appreciate some feedback from the community on the
> correctness of this implementation (if you happen to have some experience
> with the algorithm), and perhaps a discussion of how useful the algorithm
> is, to help decide whether it makes sense to integrate it into a future
> version of scikit-learn. Thanks in advance for your time :-)
>
> Regards,
> Sebastian
>
>

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
