[ https://issues.apache.org/jira/browse/SOLR-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795925#action_12795925 ]

Stanislaw Osinski commented on SOLR-1692:
-----------------------------------------

{quote}
bq. Where should the configuration of the highlighter we use for clustering 
come from?

We have all the code hooked in for it already; we're just ignoring the output.
{quote}

To avoid confusion and questions along the lines of "why clusters don't match 
the (highlighted) documents I'm seeing", I'd suggest a slightly more elaborate 
scenario for the clustering highlighter configuration:

1. If main Solr highlighting is disabled, use the clustering component's 
highlighter settings.
2. If main Solr highlighting is enabled, use the main highlighter's 
configuration as the defaults and let the clustering-specific highlighter 
configuration override the defaults.
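A minimal Java sketch of that precedence, assuming the highlighter settings can
be treated as plain key/value maps. The class and method names are
hypothetical and not existing Solr API; hl.fragsize and hl.snippets are used
only as example highlighting parameters.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Solr: resolves the highlighter settings
// used for clustering according to the two rules above.
public class ClusteringHighlighterParams {

    static Map<String, String> resolve(boolean mainHighlightingEnabled,
                                       Map<String, String> mainHlParams,
                                       Map<String, String> clusteringHlParams) {
        // TreeMap keeps keys sorted so the printed result is deterministic.
        Map<String, String> effective = new TreeMap<>();
        if (mainHighlightingEnabled) {
            // Rule 2: the main highlighter's configuration provides the defaults.
            effective.putAll(mainHlParams);
        }
        // Rules 1 and 2: clustering-specific settings are applied last,
        // so they override any default with the same key.
        effective.putAll(clusteringHlParams);
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> mainHl = new TreeMap<>();
        mainHl.put("hl.fragsize", "100");
        mainHl.put("hl.snippets", "3");

        Map<String, String> clusteringHl = new TreeMap<>();
        clusteringHl.put("hl.fragsize", "200");

        System.out.println(resolve(true, mainHl, clusteringHl));  // {hl.fragsize=200, hl.snippets=3}
        System.out.println(resolve(false, mainHl, clusteringHl)); // {hl.fragsize=200}
    }
}
```

Because the clustering map is applied last, a clustering-specific value always
wins, while unset values fall back to the main highlighter's configuration only
when main highlighting is enabled.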

If we do it this way, we'll minimize the chance of users accidentally
clustering documents that differ from (are highlighted differently than) the
ones they actually see.

bq. It would be great if Carrot2 could also just use the analysis that
Lucene/Solr produces; that way it would be much easier to configure stopwords,
HTML stripping, etc.

This one would require some larger changes to Carrot2 internals. We do use
Lucene infrastructure for preprocessing (currently for tokenization), but I can
investigate whether we can extend that further. A potential problem here is
that the stopwords that work well for document retrieval often do not work
equally well for clustering. I've filed a [Carrot2-specific
issue|http://issues.carrot2.org/browse/CARROT-606] for it and will try to come
up with something.

> CarrotClusteringEngine produce summary does nothing
> ---------------------------------------------------
>
>                 Key: SOLR-1692
>                 URL: https://issues.apache.org/jira/browse/SOLR-1692
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - Clustering
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>             Fix For: 1.5
>
>         Attachments: SOLR-1692.patch
>
>
> In the CarrotClusteringEngine, the produceSummary option does nothing, as the 
> results of doing the highlighting are just ignored.
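A hypothetical sketch of the intended behaviour, with names invented for
illustration (this is not the actual CarrotClusteringEngine code): when
produceSummary is on and the highlighter returned snippets, hand those snippets
to the clustering algorithm instead of discarding them; otherwise fall back to
the full field text.

```java
import java.util.List;

// Hypothetical sketch, not the actual CarrotClusteringEngine code.
public class SummarySelection {

    static String textForClustering(boolean produceSummary,
                                    String fullText,
                                    List<String> snippets) {
        if (produceSummary && snippets != null && !snippets.isEmpty()) {
            // Use the highlighter output rather than ignoring it.
            return String.join(" ... ", snippets);
        }
        // produceSummary is off, or highlighting produced nothing:
        // cluster the full field value.
        return fullText;
    }

    public static void main(String[] args) {
        String full = "Apache Solr is an open source search platform built on Lucene.";
        List<String> snippets = List.of("open source search platform", "built on Lucene");
        System.out.println(textForClustering(true, full, snippets));  // open source search platform ... built on Lucene
        System.out.println(textForClustering(false, full, snippets)); // prints the full text
    }
}
```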

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.