Hi, Andy. Thank you for the interest.

Consensus clustering is usually used in the same context as traditional
clustering techniques. Many papers have reported significant accuracy
improvements when using these methods, as they can combine partitions from
several different algorithms and find interesting structures that
traditional methods usually do not discover. They are similar to ensemble
methods in the supervised world, although they have their own
particularities, of course.

One of the motivations for these methods is to spare the inexperienced user
the choice of a single clustering algorithm: such users usually face many
alternatives for their problem, and choosing among them is generally not
easy. Consensus clustering tries to mitigate this by running several
clustering methods with different parameters (like the number of
clusters). This set of partitions is called an ensemble, and it is the
input of the consensus function, which derives from it a single consensus
partition that usually outperforms all the individual members of the input
set. The JMLR paper
<http://www.jmlr.org/papers/volume3/strehl02a/strehl02a.pdf> I mentioned
before proposes a framework for this, called Robust Centralized Clustering
(RCC).
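
To make the ensemble / consensus function idea concrete, here is a minimal
sketch in plain Python. It is not one of the consensus functions from the
paper (those are graph-based); as a simpler stand-in it counts how often each
pair of objects is clustered together across the ensemble and then groups
objects that agree in more than half of the partitions. The function names,
the 0.5 threshold and the toy ensemble are assumptions for illustration only.

```python
def co_association(ensemble):
    """Fraction of partitions in which each pair of objects shares a cluster."""
    n = len(ensemble[0])
    counts = [[0] * n for _ in range(n)]
    for labels in ensemble:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(ensemble) for c in row] for row in counts]


def consensus_partition(ensemble, threshold=0.5):
    """Group objects linked by a co-association above `threshold`.

    Clusters are the connected components of the thresholded
    co-association graph (a hypothetical, very simple consensus function).
    """
    n = len(ensemble[0])
    co = co_association(ensemble)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy ensemble: three partitions of six objects. Label names differ between
# members, and the third member even uses a different number of clusters.
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
]
co = co_association(ensemble)
consensus = consensus_partition(ensemble)
print(consensus)
```

Note that only the cluster labels are used, never the original data, and that
the labels of different ensemble members do not need to be aligned.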

Other interesting applications of these methods, as mentioned in the
previous paper, are Feature-Distributed Clustering (FDC) and
Object-Distributed Clustering (ODC). The first one, FDC, allows the user to
combine partitions generated from partial views of the data. A common
scenario is distributed databases, which usually cannot be integrated at
a centralized location for various reasons (proprietary data, privacy
concerns, performance issues, etc.). In such scenarios, it is more
realistic to have different "clusterers" at those different places, and
then combine only the clustering results at a central location. This is
possible because the consensus function only needs access to the cluster
labels produced by those clusterers (traditional methods), not to the whole
data. The other application, ODC, is similar but with distributed objects
instead of distributed features, and it has its own challenges. An example
is a company's customer database distributed across different cities. One
of the issues here, for instance, is that the consensus function needs some
overlap among the objects seen by the different clusterers.
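
The overlap issue in ODC can be illustrated with a small sketch in plain
Python. Each site reports labels only for the objects it holds, and the
central location estimates how often two objects are clustered together from
the sites that saw both of them. The function name, the dictionary
representation and the two toy sites are assumptions for illustration, not
something taken from the paper.

```python
def partial_co_association(site_labelings, n_objects):
    """Co-association from partial labelings.

    Each labeling is a dict {object id: cluster label} covering only the
    objects held by that site. An entry is None when no site saw both
    objects, i.e. there is no evidence about that pair at all.
    """
    together = [[0] * n_objects for _ in range(n_objects)]
    seen = [[0] * n_objects for _ in range(n_objects)]
    for labeling in site_labelings:
        for i in labeling:
            for j in labeling:
                seen[i][j] += 1
                if labeling[i] == labeling[j]:
                    together[i][j] += 1
    return [
        [together[i][j] / seen[i][j] if seen[i][j] else None
         for j in range(n_objects)]
        for i in range(n_objects)
    ]


# Two sites holding different customers; objects 2 and 3 are the overlap.
site_a = {0: "a", 1: "a", 2: "b", 3: "b"}
site_b = {2: "x", 3: "x", 4: "y", 5: "y"}

co = partial_co_association([site_a, site_b], n_objects=6)

# Pairs that no site observed together: the central location knows nothing
# about them directly, only through the shared objects 2 and 3.
unlinked = [(i, j) for i in range(6) for j in range(i + 1, 6)
            if co[i][j] is None]
print(unlinked)
```

Without the shared objects 2 and 3, the two sites would have nothing in
common and their partitions could not be related to each other.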

Well, this is a short description of these methods. Let me know if you need
more details.

Regards,

Milton

2015-02-12 18:47 GMT-03:00 Andy <t3k...@gmail.com>:

>  Hi Milton.
>
> In which context is consensus clustering usually used, and what are the
> main applications?
> We will not add an external dependency, sorry.
>
> Cheers,
> Andy
>
>
>
> On 02/12/2015 01:55 PM, Milton Pividori wrote:
>
> Hi, guys. My name is Milton Pividori and this is the first time I have
> written to this list. I'm a PhD student, working on clustering,
> particularly on consensus clustering. I'm relatively new to Python, and I
> am migrating legacy code from MATLAB. I plan to use scikit-learn as well as
> other libraries.
>
>  After looking at the scikit code and the mailing list, I didn't find
> any methods related to consensus clustering or cluster ensembles. I think
> the main paper about it is the one from Strehl and Ghosh (2002, JMLR, link
> <http://www.jmlr.org/papers/volume3/strehl02a/strehl02a.pdf>). I don't
> know if you have discussed it before, but I think it could be a good idea
> to have these consensus functions implemented in scikit-learn (the paper
> proposes three, all graph-based).
>
>  I was thinking about how to implement them. These three consensus
> functions (CSPA, HGPA and MCLA) use METIS for graph partitioning. That
> could be an obstacle for scikit-learn, as a new dependency would be needed
> (I found Python bindings for it). It would also be necessary to implement
> some methods for ensemble generation with varying levels of diversity
> (generating different clustering partitions by varying algorithms, changing
> their parameters or manipulating data with projections, subsampling or
> feature selection), but that's easier than implementing the consensus
> functions.
>
>  Well, it's just an idea. I would be glad to help with coding if this is
> interesting for the community.
>
>  Regards,
>
> 2015-02-12 13:38 GMT-03:00 Sebastian Raschka <se.rasc...@gmail.com>:
>
>>  What about adding multiclass support for the SVC "roc_auc" for grid
>> search CV to the to do list?
>>
>>  Best,
>> Sebastian
>>
>> On Feb 12, 2015, at 10:12 AM, Ronnie Ghose <ronnie.gh...@gmail.com>
>> wrote:
>>
>>   +1 to partial fit, -1 to GAM and more probabilistic things in sklearn
>>
>> On Thu, Feb 12, 2015, 9:22 AM ragv ragv <rag...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Is there a good deal of interest in having GAMs implemented?
>>>
>>> The timeline for such a project would go something like :
>>>
>>> Before GSoC:
>>> * Implement SpAM
>>>
>>> Before Midterm :
>>> * Help merge pyearth into scikit learn
>>> * Implement Additive Model -> `AdditiveClassifier` /
>>> `AdditiveRegressor` ( Not sure if my wording here is correct )
>>>
>>> After Midterm :
>>> * Implement GAMLSS
>>> * Implement LISO
>>>
>>> Kindly also see
>>> https://github.com/scikit-learn/scikit-learn/issues/3482 for
>>> references with citation counts.
>>>
>>> The package mgcv by Simon Wood / GAM / BAM in CRAN is mature and
>>> could be used as reference material too...
>>>
>>> On a scale of 0 to 100 could I know how much importance / interest
>>> would there be in such a project for GSoC 2015?
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Dive into the World of Parallel Programming. The Go Parallel Website,
>>> sponsored by Intel and developed in partnership with Slashdot Media, is
>>> your
>>> hub for all things parallel software development, from weekly thought
>>> leadership blogs to news, videos, case studies, tutorials and more. Take
>>> a
>>> look and join the conversation now. http://goparallel.sourceforge.net/
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>
>>
>>
>
>
>  --
> Milton Pividori
> Blog: www.miltonpividori.com.ar
>
>
>
>


-- 
Milton Pividori
Blog: www.miltonpividori.com.ar
