Sklearn folks,
  imho the Right Thing for dense / sparse distances would be to combine
1) cython for L1 / L2 / Linf( dense, dense ):
    _distance_p in http://svn.scipy.org/svn/scipy/trunk/scipy/spatial/ckdtree.pyx
    is ~ 20 lines  (a rough sketch of the idea is below)
2) pure python to expand sparse rows with todense().
    I believe -- correct me -- that dist( sparse, dense )
    is much more common than dist( sparse, sparse ) anyway ?
    (except Hamming distance on long sparse bools).
    Below is a cut at a cdist_sparse which just calls cdist(),
    trivial enough to have a chance of being correct :)
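
For concreteness, a rough pure-numpy sketch of the kind of dense-dense
Minkowski kernel that _distance_p implements in cython -- the names and
details here are mine, not the actual ckdtree code:

    import numpy as np

    def dist_p( x, y, p=2 ):
        """ rough sketch of a dense-dense Minkowski distance:
            p=1 -> L1, p=2 -> L2, p=np.inf -> Linf
        """
        d = np.abs( np.asarray(x, float) - np.asarray(y, float) )
        if p == np.inf:
            return d.max()
        return (d ** p).sum() ** (1.0 / p)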

As a side point, do ML people generally use L1 rather than the
outlier-sensitive L2 ?  Or custom metrics, in which case we need
plugin metrics a la cdist's callable metric anyway ?
(a small usage sketch follows the code below.)

    import numpy as np
    from scipy.sparse import issparse
    from scipy.spatial.distance import cdist

    def cdist_sparse( X, Y, **kwargs ):
        """ -> cdist( X or Y may be sparse ), any metric """
        # todense one row at a time -- very slow if both are very sparse
        sxy = 2*issparse(X) + issparse(Y)
        if sxy == 0:  # both dense
            return cdist( X, Y, **kwargs )
        d = np.empty( (X.shape[0], Y.shape[0]), np.float64 )
        if sxy == 2:  # X sparse, Y dense
            for j, x in enumerate(X):
                d[j] = cdist( x.todense(), Y, **kwargs )[0]
        elif sxy == 1:  # X dense, Y sparse
            for k, y in enumerate(Y):
                d[:,k] = cdist( X, y.todense(), **kwargs )[:,0]
        else:  # both sparse
            for j, x in enumerate(X):
                for k, y in enumerate(Y):
                    d[j,k] = cdist( x.todense(), y.todense(), **kwargs )[0,0]
        return d
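
For what it's worth, a quick usage sketch (toy data, my names;
cdist already accepts a callable metric, so a custom metric plugs
straight through **kwargs):

    from scipy.sparse import csr_matrix

    X = csr_matrix( np.random.rand(5, 20) )   # sparse rows
    Y = np.random.rand(3, 20)                 # dense
    print( cdist_sparse( X, Y, metric="cityblock" ) )                       # L1
    print( cdist_sparse( X, Y, metric=lambda u, v: np.abs(u - v).max() ) )  # plugin Linf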

cheers
  -- denis



