Gert,

In your second query example you used "qf=...".  Did you mean "fq=...." ?  If 
so, the answer is no - filter queries don't affect the score.
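
To make the fq point concrete, here is a minimal sketch in Python of how the step 4 request parameters could be built with fq selecting the document. The helper name build_rescoring_query is hypothetical, the URI field and id come from your example, and wrapping the id in quotes inside fq is my assumption, so that the hyphens are not read as query operators.

```python
from urllib.parse import urlencode


def build_rescoring_query(query, id_field, doc_id):
    """Build a Solr query string that scores one known document against `query`.

    Hypothetical helper: the parameter names (q, fq, fl, rows) are standard
    Solr parameters, everything else is illustrative.
    """
    params = [
        ("q", query),                       # the only clause that contributes to the score
        ("fq", f'{id_field}:"{doc_id}"'),   # filter: narrows the result set, no score impact
        ("fl", "*,score"),                  # return the stored fields plus the score
        ("rows", "1"),
    ]
    return urlencode(params)


# Example using the id from the thread; the /select path is an assumption.
print("/select?" + build_rescoring_query(
    "archiving failure", "URI", "50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55"))
```

Because fq only narrows the result set, the score that comes back is whatever the document earns from q alone.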


I haven't tried your approach, but my intuition is that looking at the % overlap 
of the top N hits for each query may work better. 
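
A rough sketch of what I mean by % overlap, in Python. It assumes you have already fetched the unique ids of the top N hits for each query. The function name overlap_percent and the choice to divide by the smaller list are my assumptions, not something Solr gives you.

```python
def overlap_percent(ids_a, ids_b):
    """Percentage of the smaller top-N id list that also appears in the other list."""
    set_a, set_b = set(ids_a), set(ids_b)
    if not set_a or not set_b:
        # No hits for one of the queries: treat as no overlap.
        return 0.0
    shared = len(set_a & set_b)
    return 100.0 * shared / min(len(set_a), len(set_b))


# Made-up ids standing in for the top 4 hits of two queries.
top_a = ["doc1", "doc2", "doc3", "doc4"]
top_b = ["doc3", "doc4", "doc5", "doc6"]
print(overlap_percent(top_a, top_b))  # 2 shared ids out of 4 -> 50.0
```

You would then pick a threshold (say, anything above some percentage counts as "similar") by trying it on real saved queries.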
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: "Villemos, Gert" <gert.ville...@logica.com>
> To: solr-user@lucene.apache.org; solr-user@lucene.apache.org
> Sent: Fri, April 23, 2010 5:08:04 PM
> Subject: RE: Comparing two queries
> 
> I was thinking along these lines:
> 
> 1. Retrieve the top result for one query.
> 2. Take the resulting document and evaluate the score that it would get in
>    another query.
> 3. If the scores are similar, then the queries most likely overlap.

> I guess that if I had two simple query strings, "archive crash" and
> "archiving failure", then I could:
> 
> 1. Use the query ?q="archive crash"&rows=1, which will return one result
>    (if any).
> 2. Read the score of the returned document.
> 3. Read the unique identifier field value; let's say it has field name 'URI'
>    and value '50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55'.
> 4. Use the query
>    ?q="archiving failure"&qf=URI:50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55&rows=1
> 5. Read the score of the returned document (the document will be the same as
>    the one returned under 1; the score will be different, evaluated based on
>    the second query).
> 6. Evaluate how similar the scores are.

> My question about this approach is: is the score calculated in step 4
> affected by the subquery, whose role is solely to select a specific result?
> 
> I'm using the dismax handler, by the way. Should I use the standard handler
> instead? Would it make a difference?

> Thanks,
> Gert.

> ________________________________
> 
> From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
> Sent: Fri 4/23/2010 8:08 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Comparing two queries



> Or, use facet.query to get the overlap. Here's the query:
> 
> ?q=<query1>&facet=on&facet.query=<query2>
> 
> You'll get the hit count for query #1 in the results, and the count of
> hits that also match query #2 in the facet query response.
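
A minimal sketch of reading those two counts, in Python, assuming the request was made with wt=json so the response is a JSON object with response.numFound and facet_counts.facet_queries. The function name and the sample response are hypothetical; the numbers are made up for illustration.

```python
def overlap_from_facet_response(response, query2):
    """Return (hits for query 1, hits shared with query 2, shared fraction)."""
    hits_q1 = response["response"]["numFound"]
    # facet_queries maps each facet.query string to its count within the q results.
    shared = response["facet_counts"]["facet_queries"][query2]
    fraction = shared / hits_q1 if hits_q1 else 0.0
    return hits_q1, shared, fraction


# Hand-written stand-in for a Solr JSON response (illustrative values only).
sample = {
    "response": {"numFound": 40, "start": 0, "docs": []},
    "facet_counts": {"facet_queries": {"archiving failure": 10}},
}
print(overlap_from_facet_response(sample, "archiving failure"))  # (40, 10, 0.25)
```

Note that the fraction is relative to query #1 only; swapping the two queries gives the share relative to query #2.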

>         Erik - http://www.lucidimagination.com

> On Apr 23, 2010, at 11:01 AM, Otis Gospodnetic wrote:

> > Hello Gert,
> >
> > I think you'd have to apply custom heuristics that involve looking at the
> > top N hits for each query and looking at the % overlap.
> >
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> > ----- Original Message ----
> >> From: "Villemos, Gert" <gert.ville...@logica.com>
> >> To: solr-user@lucene.apache.org
> >> Sent: Fri, April 23, 2010 10:20:54 AM
> >> Subject: Comparing two queries
>>
> >> We want to let a user register interest in information, based on a query
> >> he has defined himself. For example, he types in a query, presses a save
> >> button, provides his email address, and the system will then email him a
> >> daily digest.
> >>
> >> As part of this, it would be nice to be able to tell the user that the
> >> same or a similar query is already being monitored by another user, as
> >> the users will likely have the same interests. I would therefore like to
> >> evaluate whether two queries will return (almost) the same set of results.
> >>
> >> But how can I compare two queries to determine if they will return
> >> (almost) the same set of results?
> >>
> >> Thanks,
> >> Gert.
>
>
>
> >> Please help Logica to respect the environment by not printing this email /
> >> Pour contribuer comme Logica au respect de l'environnement, merci de ne pas
> >> imprimer ce mail / Bitte drucken Sie diese Nachricht nicht aus und helfen
> >> Sie so Logica dabei, die Umwelt zu schützen. / Por favor ajude a Logica a
> >> respeitar o ambiente nao imprimindo este correio electronico.
> >>
> >> This e-mail and any attachment is for authorised use by the intended
> >> recipient(s) only. It may contain proprietary material, confidential
> >> information and/or be subject to legal privilege. It should not be copied,
> >> disclosed to, retained or used by, any other party. If you are not an
> >> intended recipient then please promptly delete this e-mail and any
> >> attachment and all copies and inform the sender. Thank you.






