The query parsers normally tokenize on white space and query operators, but you can escape any white space with a backslash, or put the text in quotes; then it will be tokenized by the field's analyzer rather than by the query parser.
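For example (a rough, untested sketch, using the field and value from the original message below), either quoting the whole value or backslash-escaping each space should keep the string from being split by the query parser:

```
fq={!q.op=AND df=cl2Categories_NACE}"08 Gewinnung von Steinen und Erden, sonstiger Bergbau"

fq={!q.op=AND df=cl2Categories_NACE}08\ Gewinnung\ von\ Steinen\ und\ Erden,\ sonstiger\ Bergbau
```

In both cases the full string reaches the field's PatternTokenizer, which should leave it as a single token since it contains no "###".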

Also, you have:

<analyzer type="search">

Change "search" to "query", although that won't fix your problem, since Solr falls back to the "index" analyzer when it doesn't "see" a "query" analyzer.
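That is, the query-time analyzer in the type definition should read (a sketch of the corrected element only):

```xml
<analyzer type="query">
  <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```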

-- Jack Krupansky

-----Original Message----- From: Dirk Högemann
Sent: Monday, December 17, 2012 5:59 AM
To: solr-user@lucene.apache.org
Subject: Solr3.5 PatternTokenizer / Search Analyzer tokenizing always at whitespace?

Hi,

I am not sure if I am missing something, or perhaps I do not fully understand
the index/search analyzer definitions and how they are applied.

I have a field definition like this:


   <fieldType name="cl2tokenized_string" class="solr.TextField"
              sortMissingLast="true" omitNorms="true">
     <analyzer type="index">
       <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
     <analyzer type="search">
       <tokenizer class="solr.PatternTokenizerFactory" pattern="###" group="-1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>

Any field whose name starts with cl2 should be recognized as being of type
cl2tokenized_string:

<dynamicField name="cl2*" type="cl2tokenized_string" indexed="true" stored="true" />

When I try to search for a token of that type, the query is tokenized at
whitespace:

<arr name="filter_queries"><str>{!q.op=AND
df=cl2Categories_NACE}cl2Categories_NACE:08 Gewinnung von Steinen und
Erden, sonstiger Bergbau</str></arr><arr
name="parsed_filter_queries"><str>+cl2Categories_NACE:08
+cl2Categories_NACE:gewinnung +cl2Categories_NACE:von
+cl2Categories_NACE:steinen +cl2Categories_NACE:und
+cl2Categories_NACE:erden, +cl2Categories_NACE:sonstiger
+cl2Categories_NACE:bergbau</str></arr>

I expected the query parser to tokenize ONLY at the pattern ###,
instead of using a whitespace tokenizer here.
Is it possible to define a filter query, without using phrases, that achieves
the desired behavior?
Maybe local parameters are not the way to go here?

Best
Dirk
