I'm using this workflow for a minimum spanning tree clusterer for my
current PR on evidence accumulation clustering (see
https://github.com/scikit-learn/scikit-learn/pull/1830 )
My understanding is that the output of minimum_spanning_tree (recently
backported to sklearn) should plug directly into cs_graph_components.
The code for the fit function is this:
#imports
from ..utils.mst import minimum_spanning_tree
from ..utils._csgraph import cs_graph_components
<snipped>
def fit(X):
self.span_tree = minimum_spanning_tree(X)
# Remote any link with a distance more than self.threshold
self.span_tree.data[self.span_tree.data < self.threshold] = 0
self.span_tree = self.span_tree.trunc()
# Compute clusters by finding connected subgraphs in self.span_tree
n_components, self.labels_ = cs_graph_components(self.span_tree)
return self
This doesn't give the intended answer when run with this test (matrix is
from
http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csgraph.minimum_spanning_tree.html
):
def simple_test():
# Uses the simple example from the website
X = csr_matrix([[0, 8, 0, 3],
[0, 0, 2, 5],
[0, 0, 0, 6],
[0, 0, 0, 0]])
model = MSTCluster(threshold=3) # Clustering algorithm defined above
labels = model.fit(X).labels_
assert_array_equal(labels, np.array([0,0,1,0]),
"Given={}\nExpected={}".format(labels, np.array([0,0,1,0])))
return labels
Instead, the labels are [ 0 1 -2 -2]
I would expect this to be [0, -2, 0, 0], given that unconnected nodes are
given -2 as a label (I can fix this). Ideally it would be [0, 0, 1, 0]
There seems to be a bit of discussion about this here:
http://mail.scipy.org/pipermail/scipy-dev/2012-March/017261.html
My thinking is that this is the problem that I'm encountering. If so,
should this backported code be updated to a more recent version?
Thanks,
- Robert
--
Public key at: http://pgp.mit.edu/ Search for this email address and select
the key from "2011-08-19" (key id: 54BA8735)
------------------------------------------------------------------------------
How ServiceNow helps IT people transform IT departments:
1. A cloud service to automate IT design, transition and operations
2. Dashboards that offer high-level views of enterprise services
3. A single system of record for all IT processes
http://p.sf.net/sfu/servicenow-d2d-j
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general