Re: Text search NGram

Emir Arnautovic Mon, 07 Mar 2016 06:38:16 -0800

Hi Rajesh,

The solution includes 2 fields - one "ngram" field (like your txt_token) and another "nonngram" field that is just tokenized (like your txt_token without the ngram token filter). If you have two documents:

1. ABCDEF
2. ABCD

And you are searching for ABCD: if you use only the ngram field, both are matches and doc 1 can come first, but if you search for ngram:ABCD OR nonngram:ABCD, doc 2 will have the higher score.
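
A minimal schema sketch of this two-field setup, assuming the ngram field is called title as in your query. The names txt_token_plain and title_plain are made up for illustration, and the boost of 10 is an arbitrary starting point:

```xml
<!-- Hypothetical companion type: same chain as txt_token, minus the NGram filter -->
<fieldType name="txt_token_plain" class="solr.TextField" positionIncrementGap="0">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Existing ngram field (assumed to be "title") plus a hypothetical plain copy -->
<field name="title" type="txt_token" indexed="true" stored="true"/>
<field name="title_plain" type="txt_token_plain" indexed="true" stored="false"/>
<copyField source="title" dest="title_plain"/>

<!-- Example query, boost value to be tuned:
     title:ABCD OR title_plain:ABCD^10 -->
```

With this, both documents still match through title, but only doc 2 also matches title_plain (the plain token of doc 1 is "abcdef"), so doc 2 collects the extra score. The content has to be reindexed after adding the field.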


Regards,
Emir

On 07.03.2016 15:20, G, Rajesh wrote:

Hi Emir,

Thanks for your email. Can you please help me understand what you mean by "e.g. 
boost if matching tokenized fields to make sure exact matches are ordered first"?



Corporate Executive Board India Private Limited. Registration No: 
U741040HR2004PTC035324. Registered office: 6th Floor, Tower B, DLF Building 
No.10 DLF Cyber City, Gurgaon, Haryana-122002, India.

This e-mail and/or its attachments are intended only for the use of the 
addressee(s) and may contain confidential and legally privileged information 
belonging to CEB and/or its subsidiaries, including CEB subsidiaries that offer 
SHL Talent Measurement products and services. If you have received this e-mail 
in error, please notify the sender and immediately, destroy all copies of this 
email and its attachments. The publication, copying, in whole or in part, or 
use or dissemination in any other way of this e-mail and attachments by anyone 
other than the intended person(s) is prohibited.

-----Original Message-----
From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com]
Sent: Monday, March 7, 2016 7:36 PM
To: solr-user@lucene.apache.org
Subject: Re: Text search NGram

Hi Rajesh,
It is most likely related to norms - you can try setting omitNorms="true" and 
reindexing the content. In any case, it is not common to use only ngrams for matching 
content - in that case you can expect more surprising ordering/results. You should 
combine ngram fields with normally tokenized fields (e.g. boost matches on the 
tokenized fields to make sure exact matches are ordered first).
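
A sketch of where omitNorms would go, assuming the txt_token type from the original message is kept as it is otherwise:

```xml
<fieldType name="txt_token" class="solr.TextField" positionIncrementGap="0" omitNorms="true">
  <!-- index and query analyzers unchanged from the original definition -->
</fieldType>
```

Norms carry length normalization, which tends to favour shorter field values such as "Microsoft Visual Studio 2006"; turning them off removes that effect, but only after a reindex.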

Regards,
Emir

On 07.03.2016 11:44, G, Rajesh wrote:

Hi Team,

We have the type below and have indexed the values "title": "Microsoft Visual Studio 2006" and 
"title": "Microsoft Visual Studio 8.0.61205.56 (2005)".

When I search for title:(Microsoft Visual AND Studio AND 2005), I get Microsoft 
Visual Studio 8.0.61205.56 (2005) as the second record and Microsoft Visual 
Studio 2006 as the first record. I want Microsoft Visual Studio 
8.0.61205.56 (2005) listed first, since the user has searched for Microsoft 
Visual Studio 2005. Can you please help?

We are using NGram so that it takes care of misspelled or jumbled words (this
works as expected), e.g.
searching Micrs Visual Studio returns Microsoft Visual Studio
searching Visual Microsoft Studio returns Microsoft Visual Studio

    <fieldType name="txt_token" class="solr.TextField" positionIncrementGap="0">
      <analyzer type="index">
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" replacement=" "/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="800"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" replacement=" "/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="800"/>
      </analyzer>
    </fieldType>



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

