Thanks for your email, Koji. Can you please explain the roles of the tokenizer 
and the filter, so I can understand why I should not have two tokenizers in the 
index analyzer and why I need at least one tokenizer in the query analyzer?

My understanding is that the tokenizer determines how the content is physically 
indexed in the file system, and that filters are applied to the query results.




Corporate Executive Board India Private Limited. Registration No: 
U741040HR2004PTC035324. Registered office: 6th Floor, Tower B, DLF Building 
No.10 DLF Cyber City, Gurgaon, Haryana-122002, India.

This e-mail and/or its attachments are intended only for the use of the 
addressee(s) and may contain confidential and legally privileged information 
belonging to CEB and/or its subsidiaries, including CEB subsidiaries that offer 
SHL Talent Measurement products and services. If you have received this e-mail 
in error, please notify the sender and immediately, destroy all copies of this 
email and its attachments. The publication, copying, in whole or in part, or 
use or dissemination in any other way of this e-mail and attachments by anyone 
other than the intended person(s) is prohibited.

-----Original Message-----
From: Koji Sekiguchi [mailto:koji.sekigu...@rondhuit.com]
Sent: Wednesday, March 2, 2016 8:10 PM
To: solr-user@lucene.apache.org
Subject: Re: FW: Difference Between Tokenizer and filter

Hi,

An <analyzer>...</analyzer> must have exactly one <tokenizer/>, and it can 
have zero or more <filter/>s. By that rule, your 
<analyzer type="index">...</analyzer> is not correct because it has more than 
one <tokenizer/>, and your <analyzer type="query">...</analyzer> is not correct 
either because it has no <tokenizer/>.
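
As an illustration only, a field type that satisfies this rule might look like 
the sketch below. The choice of solr.StandardTokenizerFactory as the single 
tokenizer, and of solr.LowerCaseFilterFactory in place of the lowercase 
tokenizer, is an assumption and not part of the original setup; the field type 
name and the n-gram sizes are copied from it.

```xml
<!-- Sketch under assumptions: one tokenizer per analyzer, followed by filters. -->
<fieldType name="customSearch" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Exactly one tokenizer: splits the field text into tokens. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Zero or more filters: each one transforms the token stream in order. -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
  </analyzer>
  <analyzer type="query">
    <!-- The query analyzer also needs its own tokenizer. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
  </analyzer>
</fieldType>
```

Using the same chain at index time and query time means both sides are compared 
as 2-character grams, which is one common way to tolerate small misspellings. 
Whether it returns the closest name first would need testing against real data.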

Koji

On 2016/03/02 20:25, G, Rajesh wrote:
> Hi Team,
>
> Can you please clarify the following? My understanding is that the tokenizer 
> determines how the content is physically indexed in the file system, and that 
> filters are applied to the query results. The lines below are from my setup, 
> but I have seen examples that include filters inside <analyzer type="index"> 
> and a tokenizer inside <analyzer type="query">, which confused me.
>
> <fieldType name="customSearch" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.LowerCaseTokenizerFactory"/>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
>   </analyzer>
>   <analyzer type="query">
>     <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
>   </analyzer>
> </fieldType>
>
> My goal is to use Solr to find the best match among technology names.
> For example, the actual technology names are:
>
> 1. Microsoft Visual Studio
> 2. Microsoft Internet Explorer
> 3. Microsoft Visio
>
> When a user types "Microsoft Visal Studio", the user should get "Microsoft
> Visual Studio". Basically, misspelled and jumbled words should match the
> closest technology name.
