[ https://issues.apache.org/jira/browse/SOLR-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stanislaw Osinski updated SOLR-2450:
------------------------------------

    Attachment: SOLR-2450.patch

Patch for the use of stop words from the field's {{StopFilterFactory}} and {{CommonGramsFilterFactory}} in addition to Carrot2's built-in stop words. Requires the SOLR-2448 and SOLR-2449 patches to be applied.

> Carrot2 clustering should use both its own and Solr's stop words
> ----------------------------------------------------------------
>
>                 Key: SOLR-2450
>                 URL: https://issues.apache.org/jira/browse/SOLR-2450
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Clustering
>            Reporter: Stanislaw Osinski
>            Assignee: Stanislaw Osinski
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: SOLR-2450.patch
>
>
> While using only Solr's stop words for clustering isn't a good idea (compared
> to indexing, clustering needs more aggressive stop word removal to get
> reasonable cluster labels), it would be good if Carrot2 used both its own and
> Solr's stop words.
> I'm not sure what the best way to implement this would be, though. My first
> thought was to simply load {{stopwords.txt}} from the Solr config dir and merge
> them with Carrot2's. But then, maybe a better approach would be to get the
> stop words from the StopFilter being used? Ideally, we should also consider
> the per-field stop filters configured on the fields used for clustering.
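To illustrate the merge described above, here is a minimal sketch. It does not use Solr's or Carrot2's real APIs: it assumes a hypothetical, simplified model in which each field's analysis chain is a list of `(factory name, words)` pairs, and the function names `field_stop_words` and `merged_stop_words` are made up for illustration. Case-insensitive matching (lower-casing every word) is also an assumption. The result is the union of the built-in stop words and the words from the StopFilterFactory and CommonGramsFilterFactory entries on the fields used for clustering.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical model of one entry in a field's analysis chain:
# (filter factory name, words loaded by that factory). NOT Solr's API.
FilterSpec = Tuple[str, Iterable[str]]

# Factories whose word lists are treated as stop words, per the patch description.
STOP_WORD_FACTORIES = {"StopFilterFactory", "CommonGramsFilterFactory"}


def field_stop_words(chain: List[FilterSpec]) -> Set[str]:
    """Collect words from the stop-word-like filters in one field's chain."""
    words: Set[str] = set()
    for factory, factory_words in chain:
        if factory in STOP_WORD_FACTORIES:
            # Assumption: matching is case-insensitive, so normalize to lower case.
            words.update(w.lower() for w in factory_words)
    return words


def merged_stop_words(
    builtin: Iterable[str],
    field_chains: Dict[str, List[FilterSpec]],
    clustering_fields: Iterable[str],
) -> Set[str]:
    """Union of the built-in stop words and the stop words configured
    on the fields used for clustering (other fields are ignored)."""
    merged = {w.lower() for w in builtin}
    for field in clustering_fields:
        # Fields without a known analysis chain contribute nothing.
        merged |= field_stop_words(field_chains.get(field, []))
    return merged


if __name__ == "__main__":
    chains = {
        "title": [("StopFilterFactory", ["A", "an"]), ("LowerCaseFilterFactory", [])],
        "snippet": [("CommonGramsFilterFactory", ["of", "The"])],
        "id": [("StopFilterFactory", ["ignored"])],
    }
    print(sorted(merged_stop_words(["the", "and"], chains, ["title", "snippet"])))
```

Using a set union means words that appear in both lists are kept once. Stop words configured only on fields that are not used for clustering (here, `id`) stay out of the merged set, which matches the per-field idea in the issue.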