This sounds interesting.
Can someone who is familiar with this method maybe chime in?
Many of the large-scale clustering methods come more from the data mining
community and, unfortunately, are not discussed that much in the machine
learning community.
I haven't heard of the method and it is hard for me to say if we should
include it or not.
The two papers you mention both have fewer than 10 citations and are both in
venues I have never heard of, so that doesn't really inspire confidence.
The DBSCAN paper has 7000+ citations; 260 is not that much by comparison.
That is a bit surprising, since they seem to be from the same authors
and DBCLASD is sold as an improvement on DBSCAN.
Maybe it turned out that the slower runtime is not worth it?
These days I usually compare clustering algorithms using classification
datasets and the adjusted Rand index (ARI), while being aware of how weird
that is.
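For anyone unfamiliar with this evaluation setup, here is a minimal sketch of it, assuming scikit-learn's bundled iris dataset stands in for "a classification dataset" and that KMeans and DBSCAN are the algorithms being compared. The eps, min_samples and n_clusters values are illustrative choices, not tuned ones. The known class labels are treated as ground truth and each clustering is scored with adjusted_rand_score.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Use a labelled classification dataset; the class labels play the role
# of "true" clusters, which is exactly the questionable assumption.
X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Illustrative, untuned parameters for each algorithm.
models = {
    "KMeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "DBSCAN": DBSCAN(eps=0.8, min_samples=5),
}

scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    # ARI compares two labelings up to permutation of label values.
    scores[name] = adjusted_rand_score(y, labels)
    print(f"{name}: ARI = {scores[name]:.3f}")
```

ARI is 1.0 for a perfect match and close to 0.0 for random labelings. DBSCAN's noise label (-1) is scored as if it were just another cluster, which is one of the ways this setup can be weird.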
Cheers,
Andy
On 08/03/2015 04:36 AM, Sebastián Palacio wrote:
Hi Andreas,
Looking through the literature, DBCLASD has been a constant reference
both for practical applications and benchmark papers that are as
recent as 2014 (see for example "A Survey of Clustering Algorithms for
Big Data: Taxonomy and Empirical Analysis" or "A new clustering
algorithm with adaptive attractor for LIDAR points"). The paper itself
has been cited over 260 times. The most interesting feature of this
algorithm is its non-parametric nature, allowing it to address
classification problems without imposing hard constraints such as the
number of classes. It can be used precisely as a way to estimate K for
KMeans (and related problems) :-)
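DBCLASD itself is not part of scikit-learn, so the following sketch swaps in DBSCAN as the density-based method that does not need K up front, purely to illustrate the idea of estimating K for KMeans: run the density-based method first, count the clusters it finds (ignoring noise), and pass that count to KMeans as n_clusters. The synthetic make_blobs data and the eps/min_samples values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with a known number of blobs (illustrative only).
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.5, random_state=0)

# Step 1: run a density-based method that does not need K up front.
# DBSCAN is a stand-in here; DBCLASD is not available in scikit-learn.
density_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Step 2: count the clusters it found, ignoring noise points (label -1).
# Fall back to 1 so KMeans still runs if everything was marked as noise.
k_estimate = max(len(np.unique(density_labels[density_labels != -1])), 1)

# Step 3: use that count as n_clusters for KMeans.
kmeans = KMeans(n_clusters=k_estimate, n_init=10, random_state=0).fit(X)
print("estimated K:", k_estimate)
print("KMeans centers:\n", kmeans.cluster_centers_)
```

The estimate is only as good as the density parameters; a poorly chosen eps can merge blobs or split one into several, which would carry straight over into KMeans.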
Comparing clustering algorithms is always a tricky task, but according
to the survey I mentioned before (from 2014), DBCLASD is quite
similar to DBSCAN and OPTICS in terms of the dataset sizes it handles; it
copes with noise (as opposed to DBSCAN, BIRCH and K-Means), and it has a
complexity of O(3n^2), which is comparable to DBSCAN's O(n^2).
Regards,
Sebastian
On 31 July 2015 at 18:43, Andreas Mueller <t3k...@gmail.com> wrote:
Hi Sebastian.
Have you seen this used much recently? How does it compare against
DBSCAN, BIRCH, OPTICS or just KMeans?
Cheers,
Andy
On 07/31/2015 10:28 AM, Sebastián Palacio wrote:
Hello all,
I've been investigating clustering algorithms with special interest in
non-parametric methods, and one that is mentioned quite often is
DBCLASD [1]. I've looked around but haven't been able to find a single
implementation of this algorithm anywhere, so I decided to implement my
own.
My first running version is already on GitHub: https://goo.gl/V4HOVH
I tried to make it as simple as possible for anyone to run it: it's all
written in Python, requires only "standard" Python packages (numpy,
scikit-learn, scipy and matplotlib) and it comes with a main routine that
runs an example.
I would really appreciate some feedback from the community regarding the
correctness of this implementation (if you happen to have some experience
with the algorithm), and perhaps a discussion about how useful this
algorithm may be, in order to decide whether it makes sense to integrate it
into a future version of scikit-learn. Thanks in advance for your
time :-)
Regards,
Sebastian
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general