Re: Query terms and the match state

2019-09-08 Thread Scott Stults
Lucene has a SynonymQuery and a BlendedTermQuery that do something like what
you want, in different ways. However, if you want to keep your existing schema
and do this through Solr, you can use the constant score syntax (^=) in edismax
on each term:

q=name:(corsair)^=1.0 name:(ddr)^=1.0 manu:(corsair)^=1.0 manu:(ddr)^=1.0

The resulting score is the number of term/field clauses that matched, i.e.
how many of the query terms matched in each field, summed over both fields.
(Note: if you group the terms together in the parentheses, as in
"name:(corsair ddr)^=1.0", you'll only know whether either term matched,
because the whole clause gets a score of 1.0.) For the techproducts example
corpus:

[
  {
    "name":"CORSAIR  XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail",
    "manu":"Corsair Microsystems Inc.",
    "score":3.0},
  {
    "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail",
    "manu":"Corsair Microsystems Inc.",
    "score":3.0},
  {
    "name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM",
    "manu":"A-DATA Technology Inc.",
    "score":1.0}]
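
To make the arithmetic behind those scores concrete, here is a minimal Python sketch. It builds the per-term constant-score query string shown above and then counts matching clauses over the three example documents. The helper names are hypothetical, and the tokenizer is a naive lowercase split, not Solr's analysis chain, so treat the counting as an illustration rather than a reproduction of Solr's behavior.

```python
import re

def build_constant_score_query(terms, fields, boost=1.0):
    """Build one constant-score clause per (field, term) pair."""
    clauses = [f"{field}:({term})^={boost}" for field in fields for term in terms]
    return " ".join(clauses)

def count_matching_clauses(doc, terms, fields):
    """Count (field, term) pairs present in the doc.

    Assumption: naive lowercase alphanumeric tokenization stands in
    for whatever analyzer the real schema uses.
    """
    score = 0.0
    for field in fields:
        tokens = set(re.findall(r"[a-z0-9]+", (doc.get(field) or "").lower()))
        score += sum(1.0 for term in terms if term.lower() in tokens)
    return score

terms = ["corsair", "ddr"]
fields = ["name", "manu"]

docs = [
    {"name": "CORSAIR  XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 "
             "(PC 3200) Dual Channel Kit System Memory - Retail",
     "manu": "Corsair Microsystems Inc."},
    {"name": "CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 "
             "(PC 3200) System Memory - Retail",
     "manu": "Corsair Microsystems Inc."},
    {"name": "A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 "
             "(PC 3200) System Memory - OEM",
     "manu": "A-DATA Technology Inc."},
]

query = build_constant_score_query(terms, fields)
scores = [count_matching_clauses(doc, terms, fields) for doc in docs]
print(query)
print(scores)  # [3.0, 3.0, 1.0]
```

The two Corsair documents match "corsair" and "ddr" in name plus "corsair" in manu, which gives 3.0; the A-DATA document matches only "ddr" in name, which gives 1.0.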


You could use this as the basis for a function query to gain more control
over your scoring.
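
One possible variation, sketched here as an untested assumption rather than something verified against Solr: group the clauses per term across fields instead of per field. By the same reasoning as the note above about grouped clauses, each group should then contribute 1.0 if the term matched in any field, so the score would be the number of distinct query terms matched, and dividing by the number of terms would give a matched-term ratio. The helper name below is hypothetical, and the terms are not escaped for query-parser special characters.

```python
def build_per_term_query(terms, fields, boost=1.0):
    """Build one constant-score group per term, spanning all fields.

    Assumption: a group like (name:ddr manu:ddr)^=1.0 scores 1.0 when
    the term matches in at least one of the fields.
    """
    clauses = []
    for term in terms:
        alternatives = " ".join(f"{field}:{term}" for field in fields)
        clauses.append(f"({alternatives})^={boost}")
    return " ".join(clauses)

per_term_query = build_per_term_query(["corsair", "ddr"], ["name", "manu"])
print(per_term_query)
# (name:corsair manu:corsair)^=1.0 (name:ddr manu:ddr)^=1.0
```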

Hope that helps!

-Scott


On Tue, Sep 3, 2019 at 1:35 PM Kumaresh AK wrote:

> Hello Solr Community!
>
> *Problem*: I wish to know whether a result document matched all the terms in
> the query. The ranking used in Solr works most of the time. In some cases,
> however, one of the terms is rare and occurs in a couple of fields, and such
> documents trump a document that matches all the terms. Ideally, I would like
> a document that matches all terms to trump a document that
> matches only 9/10 terms but matches one of the rare terms twice.
> e.g.:
> *query1*
> field1:(a b c d) field2:(a b c d)
> Results of the above query look good.
>
> *query2*
> field1:(a b c 5) field2:(a b c 5)
> result:
> doc1: {field1: b c 5 field2: b c 5}
>
> doc21: {field1: a b c 5 field2: null}
>
> Results are almost good, except that doc21 trails doc1. There are a few
> documents similar to doc1, and they push doc21 to the next page (I use the
> default page size of 10).
>
> I understand that this is how tf-idf works. I tried boosting certain fields
> to solve this problem, but that breaks the normal cases (query1). So I set out
> to solve just this case by boosting on, or augmenting a field with, that
> information (the ratio of matched terms to total terms).
>
> *Ask:* Is it possible to get back the terms of the query and their matched
> state?
>
> I tried:
>
> - the debug=query option (with the default select handler)
> - with the terms in the debug response, I could write a function query to
>   know their match state
>
> Is this approach safe and performant for production use? Is there a better
> approach to solve this problem?
>
> Regards,
> Kumaresh
>


-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


Query terms and the match state

2019-09-03 Thread Kumaresh AK
Hello Solr Community!

*Problem*: I wish to know whether a result document matched all the terms in
the query. The ranking used in Solr works most of the time. In some cases,
however, one of the terms is rare and occurs in a couple of fields, and such
documents trump a document that matches all the terms. Ideally, I would like
a document that matches all terms to trump a document that
matches only 9/10 terms but matches one of the rare terms twice.
e.g.:
*query1*
field1:(a b c d) field2:(a b c d)
Results of the above query look good.

*query2*
field1:(a b c 5) field2:(a b c 5)
result:
doc1: {field1: b c 5 field2: b c 5}

doc21: {field1: a b c 5 field2: null}

Results are almost good, except that doc21 trails doc1. There are a few
documents similar to doc1, and they push doc21 to the next page (I use the
default page size of 10).

I understand that this is how tf-idf works. I tried boosting certain fields
to solve this problem, but that breaks the normal cases (query1). So I set out
to solve just this case by boosting on, or augmenting a field with, that
information (the ratio of matched terms to total terms).
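
As a small worked example of that ratio, the following Python sketch computes matched terms over total terms for the two documents from query2. The function name is hypothetical, the second field of doc21 is assumed to be field2, and whitespace splitting stands in for real analysis.

```python
def matched_ratio(query_terms, doc):
    """Fraction of distinct query terms found in any field of the doc."""
    doc_terms = set()
    for value in doc.values():
        if value:  # skip null or empty fields
            doc_terms.update(value.lower().split())
    wanted = {term.lower() for term in query_terms}
    return len(wanted & doc_terms) / len(wanted)

query_terms = ["a", "b", "c", "5"]
doc1 = {"field1": "b c 5", "field2": "b c 5"}
doc21 = {"field1": "a b c 5", "field2": None}

print(matched_ratio(query_terms, doc1))   # 0.75
print(matched_ratio(query_terms, doc21))  # 1.0
```

By this measure doc21 (4 of 4 terms) would rank above doc1 (3 of 4 terms), which is the ordering described above.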

*Ask:* Is it possible to get back the terms of the query and their matched
state?

I tried:

   - the debug=query option (with the default select handler)
   - with the terms in the debug response, I could write a function query to
   know their match state

Is this approach safe and performant for production use? Is there a better
approach to solve this problem?

Regards,
Kumaresh