Thanks Staszek, I'll give the stop words treatment a try. The problem is that we perform POS tagging and then use payloads to keep only nouns and adjectives, and we thought it could be interesting to perform clustering only on these elements, to avoid meaningless words.
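To make the idea concrete, here is a minimal Python sketch of the filtering step only. It is not the actual Solr analysis chain: it assumes the tagger emits Penn Treebank-style tags (the thread does not say which tagset is used), and the function name and sample sentence are hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumption: Penn Treebank-style tags, where noun tags start with "NN"
# and adjective tags start with "JJ". The thread does not name a tagset.
KEPT_TAG_PREFIXES = ("NN", "JJ")


def keep_nouns_and_adjectives(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the tokens whose POS tag marks a noun or an adjective."""
    return [token for token, tag in tagged if tag.startswith(KEPT_TAG_PREFIXES)]


# Hypothetical hand-tagged input; a real pipeline would get this from the tagger.
tagged_sentence = [
    ("the", "DT"), ("fast", "JJ"), ("clustering", "NN"),
    ("of", "IN"), ("search", "NN"), ("results", "NNS"),
]

# This filtered string is what would be handed to the clustering step
# in place of the raw stored text.
filtered_text = " ".join(keep_nouns_and_adjectives(tagged_sentence))
```

Clustering on text filtered this way drops function words without a stop word list, at the cost of breaking up the phrases that cluster labels are built from.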
Of course this is a clustering problem, but it might also be an interesting feature to have in Solr: storing not the raw input text but the analyzed one, so that "stored" could be <False | Raw | Analyzed>.

Stanislaw Osinski-2 wrote:
>
> Hi Joan,
>
>> I'm trying to use carrot2 (now I started with the workbench) and I can
>> cluster any field, but, the text used for clustering is the original raw
>> text, the one that was indexed, without any of the processing performed by
>> the tokenizer or filters.
>> So I get stop words.
>
> The easiest way to fix this is to update the stop words list used by
> Carrot2, see http://wiki.apache.org/solr/ClusteringComponent, "Tuning
> Carrot2 clustering" section at the bottom.
>
> If you want to get readable cluster labels, it's best to feed the raw text
> for clustering (cluster labels are phrases taken from the input text, if
> you remove stopwords and stem everything, the phrases will become
> unreadable).
>
> Cheers,
>
> Staszek
>

--
View this message in context: http://old.nabble.com/Clustering-from-anlayzed-text-instead-of-raw-input-tp27765780p27769034.html
Sent from the Solr - User mailing list archive at Nabble.com.