I was afraid you would say that, but if there was the problem, this seems
like the only solution :)
I'll get on it.
On 6 June 2013 12:39, Jacob Vanderplas <jake...@cs.washington.edu> wrote:
> Hi,
> The cs_graph_components routine in sklearn utils has a few problems: in
> particular, though the doc string says it only accesses the upper
> triangular part of the matrix, this is not true. Internally it assumes the
> matrix is symmetric, and will not return the correct results if the
> lower-triangular piece is zero. Also, I believe that due to the symmetry
> requirement it effectively only finds strongly-connected components, so a
> MST should be considered entirely unconnected!
>
> The components-finding algorithm has been greatly improved in the
> scipy.sparse.csgraph, especially given a recent PR which sped up the
> algorithm and improved memory performance immensely. It may be worth
> back-porting that routine as well, and replacing the old
> cs_graph_components utility routine.
> Jake
>
>
> On Wed, Jun 5, 2013 at 4:54 PM, Robert Layton <robertlay...@gmail.com>wrote:
>
>> I'm using this workflow for a minimum spanning tree clusterer for my
>> current PR on evidence accumulation clustering (see
>> https://github.com/scikit-learn/scikit-learn/pull/1830 )
>>
>> My understanding is that the output of minimum_spanning_tree (recently
>> backported to sklearn) should plug directly into cs_graph_components.
>>
>> The code for the fit function is this:
>> #imports
>> from ..utils.mst import minimum_spanning_tree
>> from ..utils._csgraph import cs_graph_components
>>
>> <snipped>
>>
>> def fit(X):
>> self.span_tree = minimum_spanning_tree(X)
>> # Remote any link with a distance more than self.threshold
>> self.span_tree.data[self.span_tree.data < self.threshold] = 0
>> self.span_tree = self.span_tree.trunc()
>> # Compute clusters by finding connected subgraphs in
>> self.span_tree
>> n_components, self.labels_ = cs_graph_components(self.span_tree)
>> return self
>>
>>
>> This doesn't give the intended answer when run with this test (matrix is
>> from
>> http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csgraph.minimum_spanning_tree.html
>> ):
>>
>> def simple_test():
>> # Uses the simple example from the website
>> X = csr_matrix([[0, 8, 0, 3],
>> [0, 0, 2, 5],
>> [0, 0, 0, 6],
>> [0, 0, 0, 0]])
>> model = MSTCluster(threshold=3) # Clustering algorithm defined above
>> labels = model.fit(X).labels_
>> assert_array_equal(labels, np.array([0,0,1,0]),
>> "Given={}\nExpected={}".format(labels, np.array([0,0,1,0])))
>> return labels
>>
>> Instead, the labels are [ 0 1 -2 -2]
>> I would expect this to be [0, -2, 0, 0], given that unconnected nodes are
>> given -2 as a label (I can fix this). Ideally it would be [0, 0, 1, 0]
>>
>> There seems to be a bit of discussion about this here:
>> http://mail.scipy.org/pipermail/scipy-dev/2012-March/017261.html
>>
>> My thinking is that this is the problem that I'm encountering. If so,
>> should this backported code be updated to a more recent version?
>>
>>
>> Thanks,
>>
>> - Robert
>>
>>
>>
>>
>>
>>
>> --
>>
>> Public key at: http://pgp.mit.edu/ Search for this email address and
>> select the key from "2011-08-19" (key id: 54BA8735)
>>
>>
>> ------------------------------------------------------------------------------
>> How ServiceNow helps IT people transform IT departments:
>> 1. A cloud service to automate IT design, transition and operations
>> 2. Dashboards that offer high-level views of enterprise services
>> 3. A single system of record for all IT processes
>> http://p.sf.net/sfu/servicenow-d2d-j
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>
> ------------------------------------------------------------------------------
> How ServiceNow helps IT people transform IT departments:
> 1. A cloud service to automate IT design, transition and operations
> 2. Dashboards that offer high-level views of enterprise services
> 3. A single system of record for all IT processes
> http://p.sf.net/sfu/servicenow-d2d-j
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
--
Public key at: http://pgp.mit.edu/ Search for this email address and select
the key from "2011-08-19" (key id: 54BA8735)
------------------------------------------------------------------------------
How ServiceNow helps IT people transform IT departments:
1. A cloud service to automate IT design, transition and operations
2. Dashboards that offer high-level views of enterprise services
3. A single system of record for all IT processes
http://p.sf.net/sfu/servicenow-d2d-j
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general