There is no such thing as "the data samples in this cluster". The point of Birch being online is that it keeps no reference to the individual samples that contributed to each node, and stores only summary statistics computed from them. Roman Yurchak has, however, offered a PR where, for the non-online case, storage of the indices contributing to each node can be optionally turned on: https://github.com/scikit-learn/scikit-learn/pull/8808
As for finding what is contained under any particular node, traversing the tree is a fairly basic task from a computer science perspective. Before we support something to make this much easier, I think we'd need to be clear on what kinds of use case we are supporting. What do you hope to do with this information, and what would a function interface look like that would make this much easier? Encoding paths as decimals isn't a practical option: the branching factor may be greater than 10, the result is a hard structure to inspect, and it is susceptible to floating-point imprecision. You'd be better off with a list of tuples, but what about that is not easy enough to do now?