Hi Satra

On Fri, May 04, 2012 at 05:54:27PM -0400, Satrajit Ghosh wrote:
>    any more thoughts on this. especially on the practical front. if you are
>    ok with the changes, i'll send a PR with just the graph laplacian changes.
>    cheers,

Sorry, I ran out of time on this issue.

I don't have much time to write a well-polished email, so please excuse
the rough edges.

>      1. do we want the implementation in scikits to be general? there is no
>      loss in efficiency at this point to compute it in all ways possible.
>      it's what you do with the laplacian (solvers etc.,.) that might
>      determine which you one you use. (and as we know, in the long run this
>      is going to be available in scipy itself).
>      2. regarding the default option, i'm planning on running things on a
>      bunch of interpoint comparison matrices, whose properties i yet don't
>      know. from my reading of uvl, proposition 4 and section 7 (perturbation
>      theory), it seemed quite clear that for graphs with very low degrees
>      spectral clustering with Lsym can be problematic. in my case of
>      typically small world graphs, this is like to be true of at least a
>      handful of nodes (especially after thresholding).
>      3. the return_diag option: i still don't understand intuitively what
>      this is supposed to represent in the Lsym case (current normed option in
>      code).

My position on your different points is:

 1 Genericity for the sake of genericity is really not something that
   I think we should pursue. It comes with a maintenance cost. In
   particular, it tends to lead to significantly higher cyclomatic
   complexity in the code. In addition, too many options confuse
   non-experts. Thus, for an approach different from the current one
   to be added to the scikit, it should bring a demonstrated gain.
   For the same reason, if the new approach outperforms the current
   approach in all respects, the current approach should be phased
   out.

 2 On return_diag (your point 3): it is the 'D' matrix in the UvL
   paper. If you find a better formulation for the docstring, I'd love
   to merge it in, as it might make the code easier to follow.

To sum up the plan of action on spectral clustering/embedding: as long as
we cannot demonstrate clear-cut improvements on examples (possibly new
ones), I am not in favor of merging in a change of strategy. In
addition, I am not in favor of scheduling any change other than trivial
ones (e.g. documentation) for the upcoming release: I think that rushing
code in will not leave us time to gather the insight necessary for good
code and APIs.

Thanks for leading the discussion,

Gaël

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
