That looks like it should do the trick, thanks to you both,
Martin

On 3 April 2012 12:37, Lars Buitinck <[email protected]> wrote:

> On 3 April 2012 00:51, David Warde-Farley
> <[email protected]> wrote:
> > You might try representing it as a sparse bag-of-words, i.e. a sparse
> > matrix of 100,000 x (several million), where each row contains a 1 in
> > positions where a feature is present and 0 otherwise. Such a
> > representation should be fairly efficient in CSR or CSC.
>
> Good idea. It's easiest to fill an intermediate DOK matrix and convert
> it to CSR afterwards:
>
>    >>> from scipy.sparse import dok_matrix
>    >>> x1 = [20, 1, 10]
>    >>> x2 = [ 1, 20, 10]
>    >>> # 2 samples; the column count must be (maximum pixel index + 1)
>    >>> X = dok_matrix((2, 100))
>    >>> for i in x1:
>    ...     X[0, i] = 1
>    ...
>    >>> for i in x2:
>    ...     X[1, i] = 1
>    ...
>    >>> X = X.tocsr()
>
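For 100,000 samples, setting DOK entries one at a time from Python can be slow, so the CSR arrays (`data`, `indices`, `indptr`) can also be built directly. A minimal sketch, assuming each sample is given as a list of active pixel indices; `indices_to_csr` is a hypothetical helper, not part of SciPy or scikit-learn:

```python
import numpy as np
from scipy.sparse import csr_matrix


def indices_to_csr(samples, n_features):
    """Hypothetical helper: one binary CSR row per list of active indices."""
    indptr = [0]
    indices = []
    for active in samples:
        # Deduplicate and sort so each pixel is stored once per row.
        cols = sorted(set(active))
        if cols and (cols[0] < 0 or cols[-1] >= n_features):
            raise ValueError("feature index out of range")
        indices.extend(cols)
        # indptr[k + 1] marks where row k's column indices end.
        indptr.append(len(indices))
    data = np.ones(len(indices), dtype=np.float64)
    return csr_matrix(
        (data, np.asarray(indices, dtype=np.int64), np.asarray(indptr, dtype=np.int64)),
        shape=(len(samples), n_features),
    )


# Same two samples as the DOK example above.
X = indices_to_csr([[20, 1, 10], [1, 20, 10]], n_features=100)
print(X.shape, X.nnz)  # (2, 100) 6
```

Sorting and deduplicating the indices keeps the matrix in canonical CSR form, which many SciPy routines expect.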
>
> > I'm not sure which clustering estimators in scikit-learn support sparse
> > inputs but there should be a couple.
>
> KMeans accepts sparse matrices; so do the metrics.pairwise functions,
> so any clustering algorithm that accepts a precomputed square distance
> matrix should be fine as well.
>
>    >>> from sklearn.metrics.pairwise import euclidean_distances
>    >>> euclidean_distances(X, X)  # x1 and x2 set the same pixels, so all 0
>    array([[ 0.,  0.],
>           [ 0.,  0.]])
>
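Both routes can be tried on a toy binary matrix: KMeans fitted on the CSR matrix directly, and a clustering estimator fed the precomputed distances. A minimal sketch, assuming a current scikit-learn release; the pixel indices are made up for illustration, DBSCAN with `metric="precomputed"` is just one example of an estimator that accepts a square distance matrix, and `eps=1.5` is tuned only to this toy data:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics.pairwise import euclidean_distances

# Toy bag-of-pixels data (made-up indices): rows 0-1 share pixels
# {1, 10, 20}; rows 2-3 share {50, 60} and differ in one pixel.
rows = [[1, 10, 20], [1, 10, 20], [50, 60, 70], [50, 60, 71]]
indptr = np.cumsum([0] + [len(r) for r in rows])
indices = np.concatenate([np.asarray(r) for r in rows])
X = csr_matrix((np.ones(len(indices)), indices, indptr), shape=(len(rows), 100))

# KMeans takes the CSR matrix as-is, without densifying it.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)

# Square distance matrix from the sparse input, then a clustering
# estimator that accepts precomputed distances.
D = euclidean_distances(X, X)
db = DBSCAN(eps=1.5, min_samples=2, metric="precomputed").fit(D)
print(db.labels_)
```

Note that the dense `D` has n_samples x n_samples entries, so for 100,000 samples the precomputed route needs far more memory than KMeans on the sparse matrix.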
>
> --
> Lars Buitinck
> Scientific programmer, ILPS
> University of Amsterdam
>
>
> ------------------------------------------------------------------------------
> Better than sec? Nothing is better than sec when it comes to
> monitoring Big Data applications. Try Boundary one-second
> resolution app monitoring today. Free.
> http://p.sf.net/sfu/Boundary-dev2dev
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>