Re: [scikit-learn] Accessing Clustering Feature Tree in Birch

Joel Nothman Sat, 16 Sep 2017 03:56:08 -0700

There is no such thing as "the data samples in this cluster". Because Birch
is designed to work online, it discards any reference to the individual
samples that contributed to each node and keeps only summary statistics
computed from them. Roman Yurchak has, however, offered a PR where, for the
non-online case, storage of the indices contributing to each node can be
optionally turned on: https://github.com/scikit-learn/scikit-learn/pull/8808


As for finding what is contained under any particular node, traversing the
tree is a fairly basic task from a computer science perspective. Before we
support something to make this much easier, I think we'd need to be clear
on what kinds of use cases we would be supporting. What do you hope to do
with this information, and what would a function interface look like that
would make this much easier?
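
For illustration, a minimal sketch of such a traversal. It assumes the
fitted estimator's `root_` node exposes a `subclusters_` list and that each
subcluster carries `child_` (None at a leaf) and `n_samples_`. Apart from
`root_`, these are private implementation details that may change between
releases; the helper name `iter_leaf_subclusters` is hypothetical.

```python
import numpy as np
from sklearn.cluster import Birch


def iter_leaf_subclusters(node):
    """Yield every leaf-level subcluster found under `node` (depth-first)."""
    for sub in node.subclusters_:
        if sub.child_ is None:
            # Leaf subcluster: holds only summary statistics, no samples.
            yield sub
        else:
            # Internal subcluster: descend into its child node.
            yield from iter_leaf_subclusters(sub.child_)


rng = np.random.RandomState(0)
X = rng.rand(200, 2)

# n_clusters=None skips the final global clustering step.
birch = Birch(threshold=0.1, branching_factor=5, n_clusters=None).fit(X)

leaves = list(iter_leaf_subclusters(birch.root_))
n_samples_total = sum(sub.n_samples_ for sub in leaves)
print(len(leaves), "leaf subclusters summarising", n_samples_total, "samples")
```

Starting the same generator from any internal node's `child_` restricts the
walk to that subtree.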

Decimals aren't a practical option: the branching factor may be greater
than 10, the resulting structure is hard to inspect, and it is susceptible
to computational imprecision. You'd be better off with a list of tuples,
but what would you use it for that is not easy enough to do now?
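
A small sketch of the difference, assuming the decimal idea means
concatenating the child index chosen at each level into one number. The
helper `encode_decimal` and the example paths are hypothetical, not part of
scikit-learn.

```python
def encode_decimal(path):
    """Concatenate the child indices of a path into one digit string."""
    return "".join(str(index) for index in path)


# Two different routes through a tree whose branching factor exceeds 10.
path_a = (1, 12)   # child 1 at the root, then child 12
path_b = (11, 2)   # child 11 at the root, then child 2

# The digit encoding collapses both routes to the same label ...
collision = encode_decimal(path_a) == encode_decimal(path_b)

# ... whereas the tuples stay distinct and keep the depth explicit.
paths = [path_a, path_b]
distinct = len(set(paths)) == len(paths)

print(encode_decimal(path_a), encode_decimal(path_b), collision, distinct)
```

Tuples also compare and sort level by level, and work as dictionary keys
without any rounding concerns.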

_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
