[jira] [Updated] (SOLR-2450) Carrot2 clustering should use both its own and Solr's stop words

Stanislaw Osinski (JIRA) Sat, 02 Apr 2011 11:12:44 -0700

     [ https://issues.apache.org/jira/browse/SOLR-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Stanislaw Osinski updated SOLR-2450:
------------------------------------

    Attachment: SOLR-2450.patch

Patch that uses the stop words from the field's {{StopFilterFactory}} and 
{{CommonGramsFilterFactory}} in addition to Carrot2's built-in stop words.

Requires the SOLR-2448 and SOLR-2449 patches to be applied first.
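
As a rough illustration of the merge idea only (this is not the code from the 
attached patch), the sketch below unions several stop word collections after 
trimming and lower-casing them. The class name {{StopWordMerger}} and the 
normalization rules are assumptions made for the example; how the words are 
obtained from the field's filter factories is not shown.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Solr or Carrot2.
final class StopWordMerger {
    private StopWordMerger() {}

    /**
     * Returns the union of all given stop word collections.
     * Words are trimmed and lower-cased (an assumption of this sketch),
     * and blank entries are skipped.
     */
    @SafeVarargs
    static Set<String> merge(Collection<String>... sources) {
        Set<String> merged = new TreeSet<>();
        for (Collection<String> source : sources) {
            for (String word : source) {
                String normalized = word.trim().toLowerCase(Locale.ROOT);
                if (!normalized.isEmpty()) {
                    merged.add(normalized);
                }
            }
        }
        // Callers get a read-only view so the merged set is not changed by accident.
        return Collections.unmodifiableSet(merged);
    }
}
```

A call such as {{StopWordMerger.merge(solrStopWords, carrot2StopWords)}} would 
then yield a single sorted, de-duplicated set.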

> Carrot2 clustering should use both its own and Solr's stop words
> ----------------------------------------------------------------
>
>                 Key: SOLR-2450
>                 URL: https://issues.apache.org/jira/browse/SOLR-2450
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Clustering
>            Reporter: Stanislaw Osinski
>            Assignee: Stanislaw Osinski
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: SOLR-2450.patch
>
>
> While using only Solr's stop words for clustering isn't a good idea (compared 
> to indexing, clustering needs more aggressive stop word removal to get 
> reasonable cluster labels), it would be good if Carrot2 used both its own and 
> Solr's stop words.
> I'm not sure what the best way to implement this would be, though. My first 
> thought was to simply load {{stopwords.txt}} from the Solr config dir and 
> merge its contents with Carrot2's. But then, maybe a better approach would be 
> to get the stop words from the StopFilter being used? Ideally, we should also 
> consider the per-field stop filters configured on the fields used for clustering.
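
The first idea in the description, reading {{stopwords.txt}} directly, might 
look roughly like the sketch below. It assumes the common layout of one word 
per line with {{#}} comment lines; the class name {{StopWordFileParser}} is 
hypothetical, and the lines are passed in already read so that file handling 
stays out of the example.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of Solr or Carrot2.
final class StopWordFileParser {
    private StopWordFileParser() {}

    /**
     * Collects stop words from the lines of a stop word file.
     * Assumes one word per line; blank lines and lines starting
     * with '#' are treated as non-entries and skipped.
     */
    static Set<String> parse(Iterable<String> lines) {
        Set<String> words = new LinkedHashSet<>(); // keeps file order, drops duplicates
        for (String line : lines) {
            String word = line.trim();
            if (word.isEmpty() || word.startsWith("#")) {
                continue;
            }
            words.add(word);
        }
        return words;
    }
}
```

The lines could come from {{java.nio.file.Files.readAllLines(path)}}, for 
instance. Note that this covers a single file only, so it would not pick up 
the per-field stop filters mentioned above.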

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
