Re: [scikit-learn] Accessing Clustering Feature Tree in Birch

Roman Yurchak Mon, 02 Oct 2017 02:17:10 -0700

Hello,

sklearn.cluster.Birch follows the original BIRCH paper, which appears to be mostly focused on efficiently building the hierarchical clustering tree (and not so much on making the later analysis user friendly). The attributes exposed by Birch are those that could reasonably be exposed given the scikit-learn API constraints. That said, one does have access to the full cluster hierarchy via Birch.root_.

As Joel said, traversing the tree is a standard CS problem, and there are probably a number of operations that could be done with it, depending on the application. For instance, for my use case, I found that reconstructing the Birch hierarchy using a custom container class for each subcluster was the easiest way to run subsequent analysis. A detailed example can be found here:

http://freediscovery.io/doc/stable/python/examples/birch_cluster_hierarchy.html
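
For illustration only, here is a minimal sketch of such a traversal. It assumes that each node has a `subclusters_` list and that each subcluster has a `child_` attribute that is None at the leaf level, which is how the objects under Birch.root_ are laid out as far as I can tell. These are private implementation details and may change between versions. The helper names and the toy tree (built from SimpleNamespace stand-ins so the snippet runs without scikit-learn) are hypothetical.

```python
from types import SimpleNamespace


def iter_subclusters(node, depth=0):
    """Yield (depth, subcluster) pairs in depth-first order."""
    for sc in node.subclusters_:
        yield depth, sc
        child = getattr(sc, "child_", None)
        if child is not None:
            # Descend into the node hanging below this subcluster.
            yield from iter_subclusters(child, depth + 1)


def leaf_subclusters(node):
    """Return the subclusters that have no child node."""
    return [sc for _, sc in iter_subclusters(node)
            if getattr(sc, "child_", None) is None]


# Stand-ins that mimic the assumed attribute layout (not sklearn objects).
def _sc(n_samples, child=None):
    return SimpleNamespace(n_samples_=n_samples, child_=child)


def _node(*subclusters):
    return SimpleNamespace(subclusters_=list(subclusters))


toy_root = _node(_sc(5, _node(_sc(2), _sc(3))), _sc(4, _node(_sc(4))))

depths = [d for d, _ in iter_subclusters(toy_root)]
leaf_sizes = [sc.n_samples_ for sc in leaf_subclusters(toy_root)]
print(depths, leaf_sizes)
```

With a fitted model, the same functions should presumably work when called on its root_ attribute instead of toy_root.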

Alternatively, I wonder if converting the tree to a format readable by some specialized tree/graph library (e.g. networkx) could be useful for analysis.

Generally, there are a number of places in scikit-learn where trees are used (Birch, AgglomerativeClustering, tree-based classifiers, etc.), but for now there is no way to export the constructed tree to some standard format (apart from sklearn.tree.export_graphviz). Not sure if this is realistically achievable, though.


--
Roman

On 20/09/17 13:40, Sema Atasever wrote:

I need this information to use it in a scientific study and
I think that a function interface would make this easier.

Thank you for your answer.

On Sat, Sep 16, 2017 at 1:53 PM, Joel Nothman <joel.noth...@gmail.com> wrote:

    There is no such thing as "the data samples in this cluster". The
    point of Birch being online is that it loses any reference to the
    individual samples that contributed to each node, but stores some
    statistics on their basis. Roman Yurchak has, however, offered a PR
    where, for the non-online case, storage of the indices contributing
    to each node can be optionally turned on:
    https://github.com/scikit-learn/scikit-learn/pull/8808

    As for finding what is contained under any particular node,
    traversing the tree is a fairly basic task from a computer science
    perspective. Before we were to support something to make this much
    easier, I think we'd need to be clear on what kinds of use case we
    were supporting. What do you hope to do with this information, and
    what would a function interface look like that would make this much
    easier?

    Decimals aren't a practical option as the branching factor may be
    greater than 10, it is a hard structure to inspect, and susceptible
    to computational imprecision. You would be better off with a list of
    tuples, but what do you need that for that is not easy enough to do now?



    _______________________________________________
    scikit-learn mailing list
    scikit-learn@python.org <mailto:scikit-learn@python.org>
    https://mail.python.org/mailman/listinfo/scikit-learn
    <https://mail.python.org/mailman/listinfo/scikit-learn>



