Hi Satra,

On Fri, May 04, 2012 at 05:54:27PM -0400, Satrajit Ghosh wrote:
> any more thoughts on this. especially on the practical front. if you are
> ok with the changes, i'll send a PR with just the graph laplacian changes.
> cheers,

Sorry, I ran out of time on this issue. I don't have much time to write a
well-written email, so please excuse the rough edges.

> 1. do we want the implementation in scikits to be general? there is no
> loss in efficiency at this point to compute it in all ways possible.
> it's what you do with the laplacian (solvers etc.,.) that might
> determine which you one you use. (and as we know, in the long run this
> is going to be available in scipy itself).
> 2. regarding the default option, i'm planning on running things on a
> bunch of interpoint comparison matrices, whose properties i yet don't
> know. from my reading of uvl, proposition 4 and section 7 (perturbation
> theory), it seemed quite clear that for graphs with very low degrees
> spectral clustering with Lsym can be problematic. in my case of
> typically small world graphs, this is like to be true of at least a
> handful of nodes (especially after thresholding).
> 3. the return_diag option: i still don't understand intuitively what
> this is supposed to represent in the Lsym case (current normed option in
> code).

My position on your different points is:

1. Genericity for the sake of genericity is really not something that I
   think we should pursue. It comes with a maintenance cost. In
   particular, it tends to lead to significantly higher cyclomatic
   complexity in the code. In addition, too many options confuse the
   non-expert. Thus, for an approach different from the current one to be
   added to the scikit, it should bring a demonstrated gain. For the same
   reason, if the new approach outperforms the current approach in all
   respects, then the current approach should be phased out.

2. On return_diag: it is the 'D' matrix in the UvL paper. If you find a
   better formulation for the docstring I'd love to merge it in, as it
   might make the code easier to follow.
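
For reference, here is a minimal NumPy sketch of what 'D' denotes in the
UvL notation, and how it enters the unnormalized Laplacian L and the
symmetric normalized one, Lsym. The toy affinity matrix W and all variable
names are made up for illustration; this is not the scikit's
implementation, and it assumes that no node has zero degree.

```python
import numpy as np

# Hypothetical 4-node graph: symmetric affinity matrix with a zero diagonal.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])

# Degree of each node: the diagonal entries of the 'D' matrix.
d = W.sum(axis=1)
D = np.diag(d)

# Unnormalized graph Laplacian: L = D - W.
L = D - W

# Symmetric normalized Laplacian: Lsym = D^{-1/2} L D^{-1/2}.
# Assumption: every degree is strictly positive (no isolated nodes).
d_inv_sqrt = 1.0 / np.sqrt(d)
L_sym = d_inv_sqrt[:, np.newaxis] * L * d_inv_sqrt[np.newaxis, :]

# Equivalent form: Lsym = I - D^{-1/2} W D^{-1/2}.
L_sym_alt = np.eye(len(d)) - d_inv_sqrt[:, np.newaxis] * W * d_inv_sqrt[np.newaxis, :]
```

In this sketch the last node has degree 1, so its row and column of Lsym
are scaled by the largest factor, which is where the low-degree concern
raised above would show up.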

To sum up the plan of action on spectral clustering/embedding: as long as
we are not able to witness clear-cut improvements on examples (possibly
new ones), I am not in favor of merging in a change of strategy. In
addition, I am not in favor of scheduling any change other than trivial
ones (e.g. documentation) for the upcoming release: I think that rushing
code in will not leave us time to gather the insight necessary for good
code and APIs.

Thanks for leading the discussion,

Gaël
