I'm trying to search through news websites, but I'm also interested in classification at index time. Is there any available solution for this?
Greetings!

On Dec 3, 2012, at 12:37 PM, Stanislaw Osinski <stanis...@osinski.name> wrote:

>> I mean measuring the similarity between the documents in each cluster.
>> Also, the difference between the documents in one cluster and those in
>> another cluster.
>>
>> I saw the sample code ClusteringQualityBencmark.java
>> However, I do not know how to make use of it for assessing my Solr
>> Clustering performance.
>>
>
> You'd need to write your own code for this. Here are the most common
> clustering quality measures you mentioned:
>
> http://en.wikipedia.org/wiki/Cluster_analysis#Evaluation_of_clustering_results
>
> These are meant for the general case (numeric attributes). To apply them to
> texts, you'd need to use the vector representation of the documents.
>
> On a more general note, synthetic measures test only the document-cluster
> assignments, but none take the quality of labels into account (this is
> really hard to measure objectively).
>
> Staszek
>
>
> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
> INFORMATICAS...
> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>
> http://www.uci.cu
> http://www.facebook.com/universidad.uci
> http://www.flickr.com/photos/universidad_uci
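
For anyone following the thread, here is a rough sketch of what "write your own code" could look like for the two measures asked about: similarity between documents within a cluster, and similarity between documents of two different clusters. It is only an illustration under simple assumptions: documents are turned into plain term-frequency vectors by whitespace splitting, cosine similarity is the comparison, and the sample clusters and all function names are made up for the example. It does not use Solr, Carrot2, or ClusteringQualityBencmark.java; you would feed it the cluster assignments returned by your own setup.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def intra_cluster_similarity(docs):
    """Average pairwise cosine similarity of documents inside one cluster."""
    vectors = [tf_vector(d) for d in docs]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 1.0  # assumption: a single-document cluster is fully cohesive
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def inter_cluster_similarity(docs_a, docs_b):
    """Average cosine similarity between documents of two different clusters."""
    vectors_a = [tf_vector(d) for d in docs_a]
    vectors_b = [tf_vector(d) for d in docs_b]
    if not vectors_a or not vectors_b:
        return 0.0
    total = sum(cosine(a, b) for a in vectors_a for b in vectors_b)
    return total / (len(vectors_a) * len(vectors_b))


# Hypothetical cluster assignments; replace with the output of your own clustering.
clusters = {
    "sports": ["football match final score", "football team wins match"],
    "finance": ["stock market falls", "stock market rally continues"],
}

for label, docs in clusters.items():
    print(label, "intra-cluster similarity:", intra_cluster_similarity(docs))

for (label_a, docs_a), (label_b, docs_b) in combinations(clusters.items(), 2):
    print(label_a, "vs", label_b, "inter-cluster similarity:",
          inter_cluster_similarity(docs_a, docs_b))
```

A good clustering should show high values within clusters and low values between them. As Staszek points out, numbers like these say nothing about the quality of the cluster labels.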