Hi Phani.
It's good that you were able to work around the problem.
Could you still please open an issue on github and give a script
that reproduces the problem (non-deterministically)?
That would help us fix the problem so that others won't run into the same
issue.
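A reproduction script along these lines would do. This is only a minimal sketch: the helper name find_failing_seeds and the random stand-in matrix are assumptions for illustration, and the real script should build X from the actual URL data. It fits KMeans(init='random') once per seed and records which seeds raise, so an intermittent failure can be pinned to specific random states. On a release where the bug is fixed it should simply report zero failures.

```python
import numpy as np
from scipy import sparse
from sklearn.cluster import KMeans


def find_failing_seeds(X, seeds, n_clusters=8):
    """Fit KMeans(init='random') once per seed; return the seeds that raise."""
    failures = []
    for seed in seeds:
        km = KMeans(n_clusters=n_clusters, init="random", max_iter=100,
                    n_init=1, random_state=seed)
        try:
            km.fit(X)
        except ValueError as exc:
            # Keep the seed and message so the failure can be replayed.
            failures.append((seed, str(exc)))
    return failures


# Hypothetical stand-in data: a small sparse count matrix.
rng = np.random.RandomState(0)
X = sparse.csr_matrix(rng.poisson(0.3, size=(60, 20)).astype(np.float64))

failures = find_failing_seeds(X, seeds=range(20))
print("%d of 20 seeds failed" % len(failures))
```

Fixing random_state per run is what turns "fails every once in a while" into a list of seeds that fail every time.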
Thanks,
Andy
On 05/24/2012 11:53 PM, Phani Vadrevu wrote:
Hi Andy,
I ran it a number of times. Every once in a while it does
finish the clustering successfully, but most of the time it fails with the
error that I forwarded. Anyway, for my purposes, I found that
removing the init='random' argument from the KMeans object
instantiation solves the problem. With k-means++ it always runs
successfully to completion.
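For reference, the workaround looks roughly like the sketch below. The URL strings are hypothetical stand-ins for url_list, the vectorizer uses its default settings rather than my preprocessor, and n_clusters=2 and random_state=0 are chosen only to keep the example small and repeatable.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for url_list.
url_list = [
    "http://example.com/news/sports/football",
    "http://example.com/news/sports/tennis",
    "http://example.org/shop/books/fiction",
    "http://example.org/shop/books/history",
]
X = CountVectorizer().fit_transform(url_list)

# Leaving out init='random' falls back to the default, init='k-means++'.
km = KMeans(n_clusters=2, max_iter=100, n_init=1, random_state=0)
km.fit(X)
print(km.labels_)
```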
Thanks,
Phani
On 24 May 2012 17:37, Andreas Mueller wrote:
Hi Phani.
Are you sure the behavior is non-deterministic?
I am not sure what comes out of the vectorizer,
but my guess would be that X is a sparse matrix, which
KMeans doesn't handle.
Could you check that, please?
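Something like the following would be enough to check. It is a sketch with made-up URLs and a default CountVectorizer, not your exact setup. It uses scipy.sparse.issparse to test the vectorizer output and toarray() to get a dense copy, which is only practical when the matrix is small enough to fit in memory.

```python
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for url_list.
url_list = [
    "http://example.com/a/b",
    "http://example.com/a/c",
    "http://example.org/d/e",
]
X = CountVectorizer().fit_transform(url_list)

# Report the container type and whether SciPy considers it sparse.
print(type(X), sparse.issparse(X))

# Dense copy with the same shape, as a plain NumPy array.
X_dense = X.toarray()
print(type(X_dense), X_dense.shape)
```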
Cheers,
Andy
On 05/24/2012 06:19 PM, Phani Vadrevu wrote:
Hi all,
I am trying to run some basic clustering code.
vectorizer = CountVectorizer(preprocessor=preprocessor, token_pattern=u'/\w+/')
# url_list is a list of strings
X = vectorizer.fit_transform(url_list)
print "feature extraction done in %f s" % (time() - t0)
t0 = time()
km = KMeans(init='random', max_iter=100, verbose=1, n_init=1)
km.fit(X)
print "clustering done in %f s" % (time() - t0)
It sometimes runs, but mostly it ends with the following:
feature extraction done in 0.003542 s
Initialization complete
Traceback (most recent call last):
  File "cluster.py", line 42, in <module>
    km.fit(X)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 735, in fit
    n_jobs=self.n_jobs)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 265, in k_means
    x_squared_norms=x_squared_norms, random_state=random_state)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 380, in _kmeans_single
    centers = _centers(X, labels, k, distances)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/cluster/k_means_.py", line 507, in _centers
    centers[center_id] = X[far_from_centers[reallocated_idx]]
ValueError: setting an array element with a sequence.
What could be wrong here?
Thanks,
Phani Vadrevu
------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Scikit-learn-general mailing list
[email protected]
<mailto:[email protected]>
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general