>
> ---------- Forwarded message ----------
> From: denis
> Date: 18 November 2011 02:24
> Subject: sklearn pairwise_distances( sparse, sparse, l1 ) ?
> To: [email protected]
>
>
> Robert,
>  could you take a look at the attached testcase for sklearn
> pairwise_distances( sparse, sparse, l1 ) ?
> I'd try fixing it myself, but
> 1) i don't understand the failing line
>    D = np.abs(X[:, np.newaxis, :] - Y[np.newaxis, :, :])
> 2) i get grouchy every time i get near scipy.sparse.
>
> Thanks,
> cheers
>  -- denis
>

I received this email about a bug in pairwise.manhattan_distances when
both x and y are sparse.
I can't tell what is causing the error*, so I was unable to help.
Any thoughts?

* What does indexing with np.newaxis do here?
Thanks

- Robert

-- 

Public key at: http://pgp.mit.edu/ Search for this email address and select
the key from "2011-08-19" (key id: 54BA8735)
# from: pairwise-l1.py
# run: 17 Nov 2011 16:14  in ~bz/py/ml/sklearn    mac 10.4.11 ppc 
versions: numpy 1.6.0 scipy 0.9.0 sklearn 0.9 py 2.6.4 (r264:75706, Nov  3 2009, 13:13:00)
[GCC 4.0.1 (Apple Computer, Inc. build 5250)]
randomcsr: [ 0 14 30 41 72]
randomcsr: [ 9 18 34 39 53]
pairwise_distances l2: 1.19
cdist l1: 3.15
Traceback (most recent call last):
  File "pairwise-l1.py", line 44, in <module>
    d1 = pairwise_distances( x, y, metric="l1" )
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/sklearn/metrics/pairwise.py", line 448, in pairwise_distances
    return pairwise_distance_functions[metric](X, Y, **kwds)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/sklearn/metrics/pairwise.py", line 231, in manhattan_distances
    D = np.abs(X[:, np.newaxis, :] - Y[np.newaxis, :, :])
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/sparse/csr.py", line 220, in __getitem__
    P = extractor(col,self.shape[1]).T        #[1:2,[1,2]]
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/sparse/csr.py", line 186, in extractor
    indices = asindices(indices)
  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/scipy/sparse/csr.py", line 168, in asindices
    raise IndexError('invalid index')
IndexError: invalid index
real 3  user 1  sys 0  cpu 82.01 %
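
On the np.newaxis question: with dense NumPy arrays, the indexing in the
failing line inserts a length-1 axis so that the subtraction broadcasts over
every (row of X, row of Y) pair. The sketch below illustrates that with small
dense arrays. The shapes and variable names are made up for illustration, and
the final sum over the feature axis is my assumption about how such a
difference would be turned into L1 distances, not a quote from sklearn. The
traceback above shows the same indexing being rejected by scipy.sparse's
csr __getitem__.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Illustrative dense inputs (assumed shapes): 3 and 4 samples, 5 features each.
rng = np.random.RandomState(1)
X = rng.random_sample((3, 5))
Y = rng.random_sample((4, 5))

# X[:, np.newaxis, :] has shape (3, 1, 5); Y[np.newaxis, :, :] has shape (1, 4, 5).
# Broadcasting stretches the length-1 axes, so diff[i, j, :] == X[i] - Y[j].
diff = X[:, np.newaxis, :] - Y[np.newaxis, :, :]

# Summing absolute differences over the feature axis gives one L1 distance per pair.
D = np.abs(diff).sum(axis=-1)

print(diff.shape, D.shape)
print(np.allclose(D, cdist(X, Y, metric="cityblock")))
```

The intermediate array holds n_x * n_y * n_features values, so this approach
is only practical for modest dense inputs.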
# sklearn pairwise_distances( sparse, sparse, l1 ) ?
from __future__ import division
import sys
import numpy as np
from scipy import sparse  # $scipy/sparse/csr.py
from scipy.spatial.distance import cdist  # $sklearn/metrics/pairwise.py
from sklearn.metrics import pairwise_distances

# $sklearn/metrics/pairwise.py 232 ?
# D = np.abs(X[:, np.newaxis, :] - Y[np.newaxis, :, :])

__date__ = "2011-11-17 Nov"
__author_email__ = "denis-bz-py at t-online dot de"
import scipy, sklearn
print "versions: numpy %s scipy %s sklearn %s py %s" % (
    np.__version__, scipy.__version__, sklearn.__version__, sys.version)

N = 100
density = .05
seed = 1

exec( "\n".join( sys.argv[1:] ))  # run this.py N= ...
np.set_printoptions( 1, threshold=100, edgeitems=10, suppress=True )
np.random.seed(seed)

def randomcsr( N, density ):
    """ random csr_matrix of 0 / 1 """
    sample = np.random.random_sample( int( N * density ))
    x = np.zeros(N)
    x[(sample * N).astype(int)] = sample
    xcsr = sparse.csr_matrix(x)
    print "randomcsr:", xcsr.indices  # sorted ?
    return xcsr

x = randomcsr( N, density )
y = randomcsr( N, density )

d2 = pairwise_distances( x, y )
print "pairwise_distances l2: %.3g" % d2

d1 = cdist( x.todense(), y.todense(), metric="cityblock" )
print "cdist l1: %.3g" % d1

d1 = pairwise_distances( x, y, metric="l1" )
print "pairwise_distances l1: %.3g" % d1
    # IndexError: invalid index
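
Until the sparse case is handled in sklearn, one possible stopgap is to avoid
the 3-d indexing and work on one pair of sparse rows at a time. The sketch
below is only that: `sparse_manhattan` is a hypothetical helper, not an sklearn
function, it is written for current Python 3 and SciPy rather than the
versions in the log above, and the double Python loop will be slow for large
inputs.

```python
import numpy as np
from scipy import sparse
from scipy.spatial.distance import cdist


def sparse_manhattan(X, Y):
    """Hypothetical helper: pairwise L1 distances between rows of sparse X and Y."""
    X = sparse.csr_matrix(X)
    Y = sparse.csr_matrix(Y)
    D = np.zeros((X.shape[0], Y.shape[0]))
    for i in range(X.shape[0]):
        for j in range(Y.shape[0]):
            # X[i] and Y[j] are 1 x n_features sparse rows; their difference
            # stays sparse, so no dense intermediate is built.
            D[i, j] = abs(X[i] - Y[j]).sum()
    return D


# Made-up test data, with the same density as the testcase above.
X = sparse.random(3, 100, density=0.05, format="csr", random_state=1)
Y = sparse.random(4, 100, density=0.05, format="csr", random_state=2)
D = sparse_manhattan(X, Y)

# Cross-check against cdist on the densified inputs.
print(np.allclose(D, cdist(X.toarray(), Y.toarray(), metric="cityblock")))
```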

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
