OK, it's up as PR https://github.com/scikit-learn/scikit-learn/pull/2037 and
ready to go.


On 6 June 2013 13:09, Robert Layton <robertlay...@gmail.com> wrote:

> I was afraid you would say that, but if there was the problem, this seems
> like the only solution :)
>
> I'll get on it.
>
>
> On 6 June 2013 12:39, Jacob Vanderplas <jake...@cs.washington.edu> wrote:
>
>> Hi,
>> The cs_graph_components routine in sklearn utils has a few problems: in
>> particular, though the doc string says it only accesses the upper
>> triangular part of the matrix, this is not true.  Internally it assumes the
>> matrix is symmetric, and will not return the correct results if the
>> lower-triangular piece is zero.  Also, I believe that due to the symmetry
>> requirement it effectively only finds strongly-connected components, so a
>> MST should be considered entirely unconnected!
>>
>> The components-finding algorithm has been greatly improved in the
>> scipy.sparse.csgraph, especially given a recent PR which sped up the
>> algorithm and improved memory performance immensely.  It may be worth
>> back-porting that routine as well, and replacing the old
>> cs_graph_components utility routine.
>>    Jake
>>
>>
>> On Wed, Jun 5, 2013 at 4:54 PM, Robert Layton <robertlay...@gmail.com>wrote:
>>
>>> I'm using this workflow for a minimum spanning tree clusterer for my
>>> current PR on evidence accumulation clustering (see
>>> https://github.com/scikit-learn/scikit-learn/pull/1830 )
>>>
>>> My understanding is that the output of minimum_spanning_tree (recently
>>> backported to sklearn) should plug directly into cs_graph_components.
>>>
>>> The code for the fit function is this:
>>>         #imports
>>>         from ..utils.mst import minimum_spanning_tree
>>>         from ..utils._csgraph import cs_graph_components
>>>
>>>     <snipped>
>>>
>>>     def fit(X):
>>>         self.span_tree = minimum_spanning_tree(X)
>>>         # Remote any link with a distance more than self.threshold
>>>         self.span_tree.data[self.span_tree.data < self.threshold] = 0
>>>         self.span_tree = self.span_tree.trunc()
>>>         # Compute clusters by finding connected subgraphs in
>>> self.span_tree
>>>         n_components, self.labels_ = cs_graph_components(self.span_tree)
>>>         return self
>>>
>>>
>>> This doesn't give the intended answer when run with this test (matrix is
>>> from
>>> http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csgraph.minimum_spanning_tree.html
>>>  ):
>>>
>>> def simple_test():
>>>     # Uses the simple example from the website
>>>     X = csr_matrix([[0, 8, 0, 3],
>>>                     [0, 0, 2, 5],
>>>                     [0, 0, 0, 6],
>>>                     [0, 0, 0, 0]])
>>>     model = MSTCluster(threshold=3)   # Clustering algorithm defined
>>> above
>>>     labels = model.fit(X).labels_
>>>     assert_array_equal(labels, np.array([0,0,1,0]),
>>> "Given={}\nExpected={}".format(labels, np.array([0,0,1,0])))
>>>     return labels
>>>
>>> Instead, the labels are [ 0  1 -2 -2]
>>> I would expect this to be [0, -2, 0, 0], given that unconnected nodes
>>> are given -2 as a label (I can fix this). Ideally it would be [0, 0, 1, 0]
>>>
>>> There seems to be a bit of discussion about this here:
>>> http://mail.scipy.org/pipermail/scipy-dev/2012-March/017261.html
>>>
>>> My thinking is that this is the problem that I'm encountering. If so,
>>> should this backported code be updated to a more recent version?
>>>
>>>
>>> Thanks,
>>>
>>> - Robert
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Public key at: http://pgp.mit.edu/ Search for this email address and
>>> select the key from "2011-08-19" (key id: 54BA8735)
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> How ServiceNow helps IT people transform IT departments:
>>> 1. A cloud service to automate IT design, transition and operations
>>> 2. Dashboards that offer high-level views of enterprise services
>>> 3. A single system of record for all IT processes
>>> http://p.sf.net/sfu/servicenow-d2d-j
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>
>>
>>
>> ------------------------------------------------------------------------------
>> How ServiceNow helps IT people transform IT departments:
>> 1. A cloud service to automate IT design, transition and operations
>> 2. Dashboards that offer high-level views of enterprise services
>> 3. A single system of record for all IT processes
>> http://p.sf.net/sfu/servicenow-d2d-j
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>
> --
>
> Public key at: http://pgp.mit.edu/ Search for this email address and
> select the key from "2011-08-19" (key id: 54BA8735)
>



-- 

Public key at: http://pgp.mit.edu/ Search for this email address and select
the key from "2011-08-19" (key id: 54BA8735)
------------------------------------------------------------------------------
How ServiceNow helps IT people transform IT departments:
1. A cloud service to automate IT design, transition and operations
2. Dashboards that offer high-level views of enterprise services
3. A single system of record for all IT processes
http://p.sf.net/sfu/servicenow-d2d-j
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Reply via email to