Thanks Jack.

There seems to be a never-ending set of FilterFactories; I keep hearing
about new ones all the time :)

OK, I get it: our existing config keeps the first N tokens of each value, and
using LimitTokenPositionFilterFactory with the same number would give us
the first N of the combined set of tokens. That's good to know.
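
For the archive, here is roughly what I plan to try. It is an untested sketch:
the field type name "text_top" and the tokenizer are my own placeholders, N is
still whatever limit we settle on, and I am relying on Jack's description that
the limit is applied to the absolute token position across values.

```xml
<!-- Sketch only: "text_top" and the tokenizer choice are assumptions, not our real schema. -->
<fieldType name="text_top" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Replaces LimitTokenCountFilterFactory; N is a placeholder for the position limit. -->
    <filter class="solr.LimitTokenPositionFilterFactory" maxTokenPosition="N"/>
  </analyzer>
</fieldType>

<field name="top" type="text_top" indexed="true" stored="false" multiValued="true"/>

<copyField src="headline" dest="top"/>
<copyField src="body" dest="top"/>
```

If the positions really are absolute, the positionIncrementGap matters: with a
gap of 100 the body tokens would start about 100 positions after the end of the
headline, so N would have to allow for that (or the gap be reduced). I will
check the resulting positions on the Analysis page before relying on this.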



On 16 July 2013 14:15, Jack Krupansky <j...@basetechnology.com> wrote:

> Yes, each input value is analyzed separately. Solr passes each input value
> to Lucene and then Lucene analyzes each.
>
> You could use LimitTokenPositionFilterFactory, which uses the absolute
> token position - each successive analyzed value would have an incremented
> position, plus the positionIncrementGap (typically 100 for text.)
>
> -- Jack Krupansky
>
> -----Original Message----- From: Daniel Collins
> Sent: Tuesday, July 16, 2013 8:46 AM
> To: solr-user@lucene.apache.org
> Subject: Are analysers applied to each value in a multi-valued field
> separately?
>
>
> I'm guessing the answer is yes, but here's the background.
>
> We index 2 separate fields, headline and body text for a document, and then
> we want to identify the "top" of the story, which is the headline + N words
> of the body (we want to weight that in scoring).
>
> So to do that:
>
> <copyField src="headline" dest="top"/>
> <copyField src="body" dest="top"/>
>
> And the "top" field has a LimitTokenCountFilterFactory appended to it to do
> the limiting.
>
>        <filter class="solr.LimitTokenCountFilterFactory"
> maxTokenCount="N"/>
>
> I realised that top needs to be multi-valued, which got me thinking: is
> that N tokens PER VALUE of top or N tokens in total within the top field...
> The field is indexed but not stored, so it's hard to determine exactly
> which is being done.
>
> Logically, I presume each value in the field is independent (and Solr then
> just matches searches against each one), so that would suggest N is per
> value?
>
> Cheers, Daniel
>
