KeywordTokenizer for faceting; was: Re: Analyzer for indexing only, not for queries

Michael Kuhlmann Fri, 12 Mar 2010 07:24:48 -0800

Hi Erick,

Thank you very much for your help. What confuses me is that another
of my fields does not have any analyzers defined at all, and it works
fine. So it must be possible to define a field type without specifying
any analyzers. I don't understand why that should stop being possible
as soon as either the index or the query analyzer is specified and the
other is not. Maybe it would be clearer if Solr raised an exception in
this case instead of silently using the analyzer that was specified for
the opposite type.

Anyway; I took your advice and used the KeywordTokenizerFactory instead.
Great! Now it does exactly what I want. Thanks again!

But may I ask another question? As with the categories, I have some
fields that are only used for faceting, so they're only ever queried
with values taken from facet results. No modification is needed, no
lowercasing, nothing. So the KeywordTokenizerFactory is perfect for them.

Alas, when the value contains spaces, I'm still getting too many
results. I have a field defined like this:

    <fieldType name="text_unchanged" class="solr.StrField"
               positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
      </analyzer>
    </fieldType>

(Using solr.TextField didn't change anything)

When querying for:
....&fq=label:Aces+of+London

I get the result:
.... "facet_fields":{
        "label":[
         "Aces of London",31,
         "Feud London",2,
         "Fly London",2],
....},

I get the same result when taking "Feud London" as the facet value.

When inspecting the index with the schema browser, I can see that all
labels are indexed correctly as complete values, i.e. there's no token
"London", but a token "Aces of London". So the KeywordTokenizer seems to
work as expected, at least for indexing. It's only that the filter query
(fq) is not narrow enough.
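
For reference, here is a minimal sketch of how the filter query above could
be built, once as sent and once with the value quoted as a phrase. It rests
on an assumption that nothing in this thread confirms: that the query parser
splits the unquoted value on whitespace before the field's analyzer sees it,
so that only "Aces" is bound to the label field. The endpoint URL and the
helper name build_facet_url are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port, and path to the real setup.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_facet_url(field, value, quote):
    """Return a select URL that filters on field:value and facets on field."""
    if quote:
        # Wrap the value in double quotes so it is sent as one phrase.
        # Embedded double quotes are escaped with a backslash.
        term = '"{}"'.format(value.replace('"', '\\"'))
    else:
        # Send the value exactly as in the original request.
        term = value
    params = [
        ("q", "*:*"),
        ("fq", "{}:{}".format(field, term)),
        ("facet", "true"),
        ("facet.field", field),
        ("wt", "json"),
    ]
    # urlencode percent-encodes ':' and '"' and turns spaces into '+'.
    return SOLR_SELECT + "?" + urlencode(params)


unquoted = build_facet_url("label", "Aces of London", quote=False)
quoted = build_facet_url("label", "Aces of London", quote=True)
print(unquoted)
print(quoted)
```

If the assumption holds, the first URL would also match documents that
contain "London" in the default search field, which would fit the extra
facet values shown above, while the second would only match the exact label.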

Even the superb Solr book didn't help me here. Do you, or anyone else,
have a clue what I'm doing wrong here?

Greetings,
Michael

On 03/12/10 14:52, Erick Erickson wrote:
> Well, what would you have SOLR do that makes sense if you
> don't define a query analyzer? Very very strange things
> happen if you use different analyzers for indexing
> and querying. At least defaulting that way has a *chance* of
> giving expected results...
> 
> Why not use, say, KeywordTokenizerFactory if you really
> want the query analyzer to do nothing? Perhaps lowercasing
> etc. See:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> 
> HTH
> Erick
> 
> On Fri, Mar 12, 2010 at 3:00 AM, Michael Kuhlmann <
> michael.kuhlm...@zalando.de> wrote:
>
