Hi Roman,

Thank you for the detailed and informative answer.
On Mon, Oct 2, 2017 at 12:14 PM, Roman Yurchak <rth.yurc...@gmail.com> wrote:

> Hello,
>
> sklearn.cluster.Birch follows the original BIRCH paper, which appears to be
> mostly focused on efficiently building the hierarchical clustering tree
> (and not so much on making the later analysis user-friendly). The
> attributes exposed by Birch are those that could reasonably be exposed
> given the scikit-learn API constraints. However, one does have access to the
> full cluster hierarchy via Birch.root_.
>
> As Joel said, traversing the tree is a standard CS problem, and there are
> probably a number of operations that could be done with it, depending
> on the application. For instance, for my use case, I found that
> reconstructing the Birch hierarchy using a custom container class for each
> subcluster was the easiest way to run subsequent analysis. A detailed
> example can be found here:
> http://freediscovery.io/doc/stable/python/examples/birch_cluster_hierarchy.html
> Alternatively, I wonder if converting the tree to a format readable by
> a specialized tree/graph library (e.g. networkx) could be useful for
> analysis.
>
> Generally, there are a number of places in scikit-learn where trees are used
> (Birch, AgglomerativeClustering, tree-based classifiers, etc.), but for now
> there is no way to export the constructed tree to a standard format
> (apart from sklearn.tree.export_graphviz). Not sure if this is realistically
> achievable, though.
>
> --
> Roman
>
> On 20/09/17 13:40, Sema Atasever wrote:
>
>> I need this information to use it in a scientific study, and
>> I think that a function interface would make this easier.
>>
>> Thank you for your answer.
>>
>> On Sat, Sep 16, 2017 at 1:53 PM, Joel Nothman <joel.noth...@gmail.com> wrote:
>>
>>> There is no such thing as "the data samples in this cluster". The
>>> point of Birch being online is that it loses any reference to the
>>> individual samples that contributed to each node, but stores some
>>> statistics on their basis. Roman Yurchak has, however, offered a PR
>>> where, for the non-online case, storage of the indices contributing
>>> to each node can optionally be turned on:
>>> https://github.com/scikit-learn/scikit-learn/pull/8808
>>>
>>> As for finding what is contained under any particular node,
>>> traversing the tree is a fairly basic task from a computer science
>>> perspective. Before we were to support something to make this much
>>> easier, I think we'd need to be clear on what kinds of use case we
>>> were supporting. What do you hope to do with this information, and
>>> what would a function interface look like that would make this much
>>> easier?
>>>
>>> Decimals aren't a practical option: the branching factor may be
>>> greater than 10, they are a hard structure to inspect, and they are
>>> susceptible to computational imprecision. You'd be better off with a
>>> list of tuples, but what do you need it for that is not easy enough
>>> to do now?
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn@python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
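As a rough illustration of the tree traversal discussed in this thread, here is a minimal Python sketch that walks the CF-tree starting from `Birch.root_` and labels every subcluster with a tuple of child indices (the "list of tuples" alternative to decimals mentioned by Joel). It assumes scikit-learn's private, undocumented node layout (nodes expose `subclusters_`; subclusters expose `child_` and `n_samples_`), which may change between versions. The helper names `iter_subclusters`, `summarize_tree` and `edge_list` are hypothetical. Because Birch does not keep references to individual samples, the sketch can only report per-subcluster statistics such as sample counts, not which samples belong to a subcluster.

```python
from typing import Any, Dict, Iterator, List, Tuple

Path = Tuple[int, ...]  # child indices from the root, e.g. (0, 2, 1)


def iter_subclusters(node: Any, path: Path = ()) -> Iterator[Tuple[Path, Any]]:
    """Depth-first walk over a BIRCH CF-tree.

    Assumes (private sklearn layout) that each node has a ``subclusters_`` list
    and each subcluster has a ``child_`` attribute that is None for leaf
    subclusters. Yields (path, subcluster) pairs in pre-order.
    """
    for i, sub in enumerate(node.subclusters_):
        sub_path = path + (i,)
        yield sub_path, sub
        # Descend only into non-leaf subclusters.
        if sub.child_ is not None:
            yield from iter_subclusters(sub.child_, sub_path)


def summarize_tree(root: Any) -> Dict[Path, Tuple[int, int, bool]]:
    """Map each subcluster path to (depth, n_samples_, is_leaf)."""
    return {
        p: (len(p), sub.n_samples_, sub.child_ is None)
        for p, sub in iter_subclusters(root)
    }


def edge_list(root: Any) -> List[Tuple[Path, Path]]:
    """Parent -> child edges between paths; the empty tuple () is the root node."""
    return [(p[:-1], p) for p, _ in iter_subclusters(root)]
```

With a fitted estimator, e.g. `brc = Birch(n_clusters=None).fit(X)`, calling `summarize_tree(brc.root_)` should give one entry per subcluster, and the sample counts of the leaf entries should add up to the number of fitted samples. If networkx is installed, `networkx.DiGraph(edge_list(brc.root_))` would turn the hierarchy into a directed graph for further analysis.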