Hi Jack,
Please correct me if I am wrong. I added the char filter because of what I see on the Analysis screen in the Solr UI. When I enter "Microsoft office" in Field Value (Index), WhitespaceTokenizerFactory produces the result below, where "office" starts at 10:

text       raw_bytes                     start  end  positionLength  type  position
microsoft  [6d 69 63 72 6f 73 6f 66 74]  0      9    1               word  1
office     [6f 66 66 69 63 65]           10     16   1               word  2

If I add 2 more spaces between the words, "office" starts at 12. Should it not still start at 10?

text       raw_bytes                     start  end  positionLength  type  position
microsoft  [6d 69 63 72 6f 73 6f 66 74]  0      9    1               word  1
office     [6f 66 66 69 63 65]           12     18   1               word  2

Thanks
Rajesh

Corporate Executive Board India Private Limited. Registration No: U741040HR2004PTC035324. Registered office: 6th Floor, Tower B, DLF Building No.10 DLF Cyber City, Gurgaon, Haryana-122002, India..

This e-mail and/or its attachments are intended only for the use of the addressee(s) and may contain confidential and legally privileged information belonging to CEB and/or its subsidiaries, including CEB subsidiaries that offer SHL Talent Measurement products and services. If you have received this e-mail in error, please notify the sender and immediately, destroy all copies of this email and its attachments. The publication, copying, in whole or in part, or use or dissemination in any other way of this e-mail and attachments by anyone other than the intended person(s) is prohibited.

-----Original Message-----
From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
Sent: Monday, March 7, 2016 8:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Text search NGram

The charFilter isn't doing anything useful - the whitespace tokenizer will ignore extra white space anyway.
-- Jack Krupansky

On Mon, Mar 7, 2016 at 5:44 AM, G, Rajesh <r...@cebglobal.com> wrote:

> Hi Team,
>
> We have the field type below, and we have indexed the values
> "title": "Microsoft Visual Studio 2006" and
> "title": "Microsoft Visual Studio 8.0.61205.56 (2005)"
>
> When I search for title:(Microsoft Visual AND Studio AND 2005) I get
> Microsoft Visual Studio 8.0.61205.56 (2005) as the second record and
> Microsoft Visual Studio 2006 as the first record. I want
> Microsoft Visual Studio 8.0.61205.56 (2005) listed first, since the
> user has searched for Microsoft Visual Studio 2005. Can you please help?
>
> We are using NGram so that it takes care of misspelled or jumbled words
> [it works as expected], e.g.
> searching Micrs Visual Studio gets Microsoft Visual Studio
> searching Visual Microsoft Studio gets Microsoft Visual Studio
>
> <fieldType name="txt_token" class="solr.TextField" positionIncrementGap="0">
>   <analyzer type="index">
>     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" replacement=" "/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="800"/>
>   </analyzer>
>   <analyzer type="query">
>     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" replacement=" "/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="800"/>
>   </analyzer>
> </fieldType>
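
A minimal Python sketch of the offsets question raised in the reply at the top of this thread, assuming that start and end are character offsets into the text entered on the analysis screen. The function whitespace_tokens_lowercased is a hypothetical stand-in for WhitespaceTokenizerFactory followed by LowerCaseFilterFactory, not Solr code. Under that assumption, "office" starting at 12 after two extra spaces is expected: the extra spaces shift the character offset, while the token position (2) stays the same. As far as I know, Lucene char filters map offsets back to the original text, which would explain why PatternReplaceCharFilterFactory does not bring the start back to 10, but that is worth verifying against the Lucene documentation.

```python
import re


def whitespace_tokens_lowercased(text):
    """Split on runs of whitespace and lowercase each token.

    Hypothetical stand-in for WhitespaceTokenizerFactory + LowerCaseFilterFactory.
    Returns (term, start, end, position) tuples, where start/end are character
    offsets into `text` and position counts tokens starting from 1.
    """
    tokens = []
    for position, match in enumerate(re.finditer(r"\S+", text), start=1):
        tokens.append((match.group().lower(), match.start(), match.end(), position))
    return tokens


# One space between the words, then three (the original one plus two extra).
for value in ("Microsoft office", "Microsoft   office"):
    print(repr(value))
    for term, start, end, position in whitespace_tokens_lowercased(value):
        print(f"  {term:<10} start={start:<3} end={end:<3} position={position}")
```

In both inputs the terms and positions are identical; only the offsets of "office" differ, which matches the two analysis tables above.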