Re: measure result set quality

2013-10-24 Thread Chris Hostetter

: As a first approach, I will evaluate (manually :( ) the hits that fall outside
: the intersection set for every query in each system. Meanwhile, I will keep

FYI: LucidWorks has a Relevancy Workbench tool, a simple UI designed 
explicitly for comparing the result sets from different Solr query 
configurations...

http://www.lucidworks.com/market_app/lucidworks-relevancy-workbench/


-Hoss


Re: measure result set quality

2013-10-21 Thread Alvaro Cabrerizo
Thanks for your valuable answers.

As a first approach, I will evaluate (manually :( ) the hits that fall outside
the intersection set for every query in each system. Meanwhile, I will keep
searching for literature in the field.
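
For what it's worth, a minimal sketch (in Python, with made-up query/result
data) of the comparison I have in mind:

# For each query, compare the doc IDs returned by each system and list
# the hits outside the intersection; those are the ones to inspect by hand.
# The result dicts below are hypothetical.
results_a = {"q1": ["doc1", "doc2", "doc3"], "q2": ["doc4", "doc5"]}
results_b = {"q1": ["doc2", "doc3", "doc7"], "q2": ["doc5", "doc6"]}

for query in results_a:
    a, b = set(results_a[query]), set(results_b[query])
    print(query, "only in A:", a - b, "only in B:", b - a)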

Regards.




Re: measure result set quality

2013-10-20 Thread Furkan KAMACI
Let's assume that you have keywords to search and different configurations
for indexing. A/B testing is one of the techniques you can use, as Erick
mentioned.

If you want an automated comparison and do not have an oracle for A/B
testing, there is another way: if you have an ideal result list, you can
compare the similarity between the results of each configuration and that
ideal result list.

The ideal result list can be created by an expert just once. Alternatively,
if you are developing a search engine, you can run the same keywords on an
existing search engine and use its results as the ideal result list against
which to measure your result lists' similarity.

Kendall's tau is one of the methods you can use in such situations. If you
do not have any document duplication in your index (no alternate versions of
the same document, hence no ties), I suggest using tau-a.
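
For illustration, a minimal tau-a sketch in plain Python (the document IDs
are made up, and both lists must rank the same set of documents):

from itertools import combinations

def kendall_tau_a(ideal, observed):
    # Both arguments are lists of the same document IDs, best first.
    rank = {doc: i for i, doc in enumerate(observed)}
    concordant = discordant = 0
    for a, b in combinations(ideal, 2):  # a precedes b in the ideal list
        if rank[a] < rank[b]:
            concordant += 1
        else:
            discordant += 1
    n = len(ideal)
    return (concordant - discordant) / (n * (n - 1) / 2)

ideal    = ["doc1", "doc2", "doc3", "doc4"]
system_a = ["doc2", "doc1", "doc3", "doc4"]
print(kendall_tau_a(ideal, system_a))  # 0.67: one swapped pair out of six

A value of 1 means the two rankings agree perfectly; -1 means one is the
reverse of the other.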

If you explain your system, and what is "good" or "ideal" for you, I can
explain more.

Thanks;
Furkan KAMACI





Re: measure result set quality

2013-10-20 Thread Doug Turnbull
That's exactly what we advocate for in our Solr work. We call it Test-Driven
Relevancy. We work closely with content experts to help build collaboration
around search quality. (Disclaimer: yes, we build a product around this, but
the advice stands regardless.)

http://www.opensourceconnections.com/2013/10/14/what-is-test-driven-search-relevancy/

Cheers
-Doug Turnbull
Search Relevancy Expert
OpenSource Connections








-- 
Doug Turnbull
Search & Big Data Architect
OpenSource Connections http://o19s.com


Re: measure result set quality

2013-10-18 Thread Erick Erickson
bq: How do you compare the quality of your
search result in order to decide which schema is better?

Well, that's actually a hard problem. There are the
various TREC data sets, but those are a generic solution, and
nearly every individual application of this generic thing called
"search" has its own notion of "good" results.

Note that scores are NOT comparable across different
queries even in the same data set, so don't go down that
path.

I'd fire the question back at you: can you define what
"good" (or "better") results are in such a way that you can
program an evaluation? Often the answer is no...

One common technique is to have knowledgeable users
do what's called A/B testing. You fire the query at two
separate Solr instances and display the results side-by-side,
and the user says "A is more relevant" or "B is more
relevant". Kind of like an eye doctor. In sophisticated A/B
testing, the program randomly changes which side each set of
results goes on, so you remove sidedness bias.
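
A minimal sketch of that side randomization (Python, with illustrative
names, not from any particular tool):

import random

def present_side_by_side(results_a, results_b):
    # Randomly decide which system appears on the left, and keep the
    # hidden mapping so each vote can be attributed to a system later.
    if random.random() < 0.5:
        return {"left": ("A", results_a), "right": ("B", results_b)}
    return {"left": ("B", results_b), "right": ("A", results_a)}

The judge only ever sees "left" and "right"; the A/B labels stay hidden
until the votes are tallied.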


FWIW,
Erick





measure result set quality

2013-10-17 Thread Alvaro Cabrerizo
Hi,

Imagine the following situation: you have a corpus of documents and a list of
queries extracted from a production environment. The corpus hasn't been
manually annotated with relevant/non-relevant tags for every query. You then
configure various Solr instances, changing the schema (adding synonyms,
stopwords...). After indexing, you prepare and execute the test over the
different schema configurations. How do you compare the quality of your
search results in order to decide which schema is better?

Regards.