Clustering n documents using a precomputed similarity metric between pairs
of documents.
Code so far:

import numpy as np
from sklearn.cluster import AgglomerativeClustering

Sim = np.zeros((n, n))  # create an n x n numpy array
for i in range(n):
    for j in range(i, n):
        if i == j:
            Sim[i][j] = 1
        else:
            # similarity between documents i and j, computed with simfunction
            Sim[i][j] = simfunction(list_doc[i], list_doc[j])

Sim = Sim + Sim.T - np.diag(Sim.diagonal())  # complete the symmetric matrix

AggClusterDistObj = AgglomerativeClustering(n_clusters=num_cluster,
                                            linkage='average',
                                            affinity="precomputed")

Res_Labels = AggClusterDistObj.fit_predict(Sim)

My concern is that I used a similarity function here, and I think that
according to the documentation it should be a dissimilarity matrix. How can I
change it to a dissimilarity matrix? Also, what would be a more efficient way
to do this?
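
One possible conversion, sketched under the assumption that simfunction
returns scores in [0, 1] with 1 meaning identical documents, is to use
1 - similarity as the distance. The sketch below writes both triangles in a
single pass over the upper-triangle index pairs from np.triu_indices, so no
separate symmetrization step is needed. The names toy_similarity and
distance_matrix are hypothetical; toy_similarity only stands in for
simfunction.

```python
import numpy as np


def toy_similarity(a, b):
    """Hypothetical stand-in for simfunction: Jaccard similarity in [0, 1]."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def distance_matrix(docs, sim):
    """Build a symmetric distance matrix as 1 - similarity.

    Assumes sim(a, b) is symmetric and lies in [0, 1].
    """
    n = len(docs)
    dist = np.zeros((n, n))  # the diagonal stays 0: zero distance to itself
    rows, cols = np.triu_indices(n, k=1)  # all index pairs with i < j
    for i, j in zip(rows, cols):
        d = 1.0 - sim(docs[i], docs[j])
        dist[i, j] = d
        dist[j, i] = d  # mirror into the lower triangle
    return dist


docs = ["a b c", "a b d", "x y z"]
Dist = distance_matrix(docs, toy_similarity)
```

Dist would then be passed to fit_predict in place of Sim. The number of
similarity calls is still n(n-1)/2, so the running time is likely dominated
by simfunction itself rather than by the loop.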


Thanks,

Amita


On Wed, Sep 3, 2014 at 12:50 AM, Amita Misra <[email protected]> wrote:

> Hello,
>
> I have n documents and want to cluster them using a precomputed similarity
> metric between pairs of documents.
> I created a 2-dimensional numpy array, say X, containing the similarity
> score for every pair of documents.
> type(X) and X.shape give the output
> <type 'numpy.ndarray'>
> (n, n)
> Then I create a cluster object using
> object = sklearn.cluster.AgglomerativeClustering(n_clusters=10,
> affinity='precomputed')
> Can I then call
> object.fit_predict(X) to get the cluster label for each document?
>
> Thanks,
> Amita
>
>
>
>
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
