Hi Jack,


Please correct me if I am wrong. I added the char filter for the following reason.

In the Analysis screen of the Solr UI, I entered "Microsoft office" in Field Value (Index). 
WhitespaceTokenizerFactory produces the first result below, where "office" starts at 10. 
If I add 2 more spaces between the words, "office" starts at 12 (second result below). 
Should it not start at 10?



text       raw_bytes                     start  end  positionLength  type  position
microsoft  [6d 69 63 72 6f 73 6f 66 74]  0      9    1               word  1
office     [6f 66 66 69 63 65]           10     16   1               word  2






text       raw_bytes                     start  end  positionLength  type  position
microsoft  [6d 69 63 72 6f 73 6f 66 74]  0      9    1               word  1
office     [6f 66 66 69 63 65]           12     18   1               word  2
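
For reference, the two results above can be reproduced with a small sketch. This is not Solr code: it is a plain Python stand-in (the helper name whitespace_tokens is made up for illustration) that splits on runs of whitespace, lowercases each token, and reports offsets against the input exactly as typed. It suggests that the extra spaces shift only the start and end offsets of "office", while its position stays 2.

```python
import re


def whitespace_tokens(text):
    """Return (token, start, end, position) tuples for whitespace-separated tokens.

    Hypothetical helper for illustration only; offsets are character
    indexes into the input string, positions are counted from 1.
    """
    return [
        (match.group().lower(), match.start(), match.end(), position)
        for position, match in enumerate(re.finditer(r"\S+", text), start=1)
    ]


# One space between the words, as in the first result.
one_space = whitespace_tokens("Microsoft office")

# Two additional spaces (three in total), as in the second result.
three_spaces = whitespace_tokens("Microsoft   office")

for token in one_space + three_spaces:
    print(token)
```

As far as I understand, offsets reported after a char filter are mapped back to the original input, which would explain a start of 12 rather than 10. Please treat that as an assumption to verify on your Solr version.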






Thanks

Rajesh





Corporate Executive Board India Private Limited. Registration No: 
U741040HR2004PTC035324. Registered office: 6th Floor, Tower B, DLF Building 
No.10 DLF Cyber City, Gurgaon, Haryana-122002, India..



This e-mail and/or its attachments are intended only for the use of the 
addressee(s) and may contain confidential and legally privileged information 
belonging to CEB and/or its subsidiaries, including CEB subsidiaries that offer 
SHL Talent Measurement products and services. If you have received this e-mail 
in error, please notify the sender and immediately, destroy all copies of this 
email and its attachments. The publication, copying, in whole or in part, or 
use or dissemination in any other way of this e-mail and attachments by anyone 
other than the intended person(s) is prohibited.



-----Original Message-----
From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
Sent: Monday, March 7, 2016 8:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Text search NGram



The charFilter isn't doing anything useful - the whitespace tokenizer will 
ignore extra white space anyway.
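
A rough illustration of this point, assuming Python's re.sub and str.split as stand-ins for PatternReplaceCharFilterFactory and WhitespaceTokenizerFactory (they are not the Solr implementations, and the function names are invented for the example): collapsing whitespace before splitting gives the same token texts as splitting alone.

```python
import re


def collapse_whitespace(text):
    # Stand-in for the char filter with pattern="\s+" and replacement=" ".
    return re.sub(r"\s+", " ", text)


def whitespace_split(text):
    # str.split() with no arguments splits on runs of whitespace,
    # so repeated spaces never produce empty tokens.
    return text.split()


raw = "Microsoft   office"

with_filter = whitespace_split(collapse_whitespace(raw))
without_filter = whitespace_split(raw)

print(with_filter)
print(without_filter)
```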



-- Jack Krupansky



On Mon, Mar 7, 2016 at 5:44 AM, G, Rajesh <r...@cebglobal.com> wrote:



> Hi Team,
>
> We have the field type below, and we have indexed the values "title":
> "Microsoft Visual Studio 2006" and "title": "Microsoft Visual Studio
> 8.0.61205.56 (2005)".
>
> When I search for title:(Microsoft Visual AND Studio AND 2005), I get
> Microsoft Visual Studio 8.0.61205.56 (2005) as the second record and
> Microsoft Visual Studio 2006 as the first record. I want
> Microsoft Visual Studio 8.0.61205.56 (2005) listed first, since the
> user searched for Microsoft Visual Studio 2005. Can you please help?
>
> We are using NGram so that it takes care of misspelled or jumbled words
> (this works as expected), e.g.
> searching for Micrs Visual Studio returns Microsoft Visual Studio
> searching for Visual Microsoft Studio returns Microsoft Visual Studio

>

>   <fieldType name="txt_token" class="solr.TextField" positionIncrementGap="0">
>     <analyzer type="index">
>       <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" replacement=" "/>
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="800"/>
>     </analyzer>
>     <analyzer type="query">
>       <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" replacement=" "/>
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="800"/>
>     </analyzer>
>   </fieldType>

