Thanks for the suggestions.

This snippet does what I want with respect to identifying the documents
grouped by each cluster:

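# km is a fitted KMeans model; dataset is e.g. the Bunch returned by
# fetch_20newsgroups(), whose .filenames is a NumPy array
for cluster_id in range(km.n_clusters):
    # Boolean mask selects the filenames whose label matches this cluster
    cluster_doc_filenames = dataset.filenames[km.labels_ == cluster_id]
    for cluster_doc_filename in cluster_doc_filenames:
        print(f"{cluster_id} : {cluster_doc_filename}")
    print()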


Sample output:
.
.
3 : /home/vinayb/scikit_learn_data/20news_home/20news-bydate-test/alt.atheism/53583
3 : /home/vinayb/scikit_learn_data/20news_home/20news-bydate-test/talk.religion.misc/84215
3 : /home/vinayb/scikit_learn_data/20news_home/20news-bydate-train/alt.atheism/51204
3 : /home/vinayb/scikit_learn_data/20news_home/20news-bydate-train/talk.religion.misc/83515
3 : /home/vinayb/scikit_learn_data/20news_home/20news-bydate-test/talk.religion.misc/84216
3 : /home/vinayb/scikit_learn_data/20news_home/20news-bydate-test/alt.atheism/53413
3 : /home/vinayb/scikit_learn_data/20news_home/20news-bydate-train/alt.atheism/53167
3 : /home/vinayb/scikit_learn_data/20news_home/20news-bydate-train/alt.atheism/51241
.
.
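The snippet above relies on km and dataset from an earlier fit. Below is a minimal self-contained sketch of the same grouping, assuming a tiny hypothetical corpus and made-up filenames in place of the 20 newsgroups data (the n_init and random_state values are illustrative, not from the original). It collects each cluster's filenames into a dict keyed by cluster id, using the same labels_ comparison as the snippet.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for dataset.data / dataset.filenames.
docs = [
    "cat kitten meow",
    "kitten cat meow purr",
    "dog puppy bark",
    "puppy dog bark woof",
]
filenames = np.array(["cats/1", "cats/2", "dogs/1", "dogs/2"])

# Vectorize and cluster; n_init and random_state are illustrative choices.
X = TfidfVectorizer().fit_transform(docs)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Map each cluster id to the filenames whose label equals that id.
# A boolean mask selects the same rows as np.where(km.labels_ == cluster_id).
docs_by_cluster = {
    cluster_id: list(filenames[km.labels_ == cluster_id])
    for cluster_id in range(km.n_clusters)
}

for cluster_id, names in docs_by_cluster.items():
    for name in names:
        print(f"{cluster_id} : {name}")
    print()
```

Cluster ids are arbitrary, so which group gets 0 and which gets 1 can differ between runs or versions; only the grouping itself is meaningful.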



---------- Forwarded message ----------
From: Vinay B, <[email protected]>
Date: Thu, Jan 31, 2013 at 5:20 PM
Subject: Text document clustering: How can I access the actual clustered documents
To: [email protected]


Another newbie question.

I'm not referring to a confusion matrix or a similar summary. Rather, if
I had a number of documents clustered (say, with KMeans) into 3
clusters, how could I access:
1. each cluster and a list of cluster terms?
2. a list of documents associated with each cluster?


Thanks

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
