Re: measure result set quality
: As a first approach I will evaluate (manually :( ) the hits that fall
: outside the intersection set for every query in each system.

FYI: LucidWorks has a Relevancy Workbench tool that serves as a simple UI designed explicitly for comparing the result sets from different Solr query configurations: http://www.lucidworks.com/market_app/lucidworks-relevancy-workbench/

-Hoss
Re: measure result set quality
Thanks for your valuable answers. As a first approach I will evaluate (manually :( ) the hits that fall outside the intersection set for every query in each system. In any case, I will keep searching for literature in the field.

Regards.
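For what it's worth, a minimal sketch of this per-query diff in Python. It pulls the top-k document IDs from two Solr instances via the standard /select handler and reports, for each query, the hits that only one system returned, so manual judgment can focus on the disagreements. The base URLs, the field name "id", and k are placeholders, not part of any setup described in this thread:

import json
import urllib.parse
import urllib.request

def fetch_top_k(solr_base, query, k=10):
    # Ask Solr's /select handler for just the top-k document IDs.
    params = urllib.parse.urlencode(
        {"q": query, "fl": "id", "rows": k, "wt": "json"})
    with urllib.request.urlopen(solr_base + "/select?" + params) as resp:
        docs = json.load(resp)["response"]["docs"]
    return [doc["id"] for doc in docs]

def out_of_intersection(queries, base_a, base_b, k=10):
    # For each query, keep only the hits the two systems disagree on.
    report = {}
    for q in queries:
        hits_a = set(fetch_top_k(base_a, q, k))
        hits_b = set(fetch_top_k(base_b, q, k))
        common = hits_a & hits_b
        report[q] = {"only_a": hits_a - common, "only_b": hits_b - common}
    return report

# Example (hypothetical URLs):
# diff = out_of_intersection(["foo bar"],
#                            "http://localhost:8983/solr/configA",
#                            "http://localhost:8983/solr/configB")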
Re: measure result set quality
Let's assume that you have keywords to search and different configurations for indexing. A/B testing is one of the techniques you can use, as Erick mentioned. If you want an automated comparison and do not have an oracle for A/B testing, there is another way: if you have an ideal result list, you can compare it with the results from each of your configurations. The ideal result list can be created by an expert just once. If you are developing a search engine, you can also run the same keywords against an existing search engine and use its results as the ideal list against which to measure your result lists' similarity. Kendall's tau is one of the methods for such situations; if you do not have any document duplication in your index (no alternate versions of the same document), I suggest using tau-a. If you explain your system, and what is good or ideal for you, I can explain more.

Thanks;
Furkan KAMACI
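To make the tau-a idea concrete, a toy sketch (assuming the ideal list and a configuration's result list are plain lists of document IDs; restricting the comparison to the documents both lists share is just one reasonable way to handle missing items):

from itertools import combinations

def ranks_over_common(ideal, observed):
    # Build rank vectors over the documents the two lists share.
    common = [doc for doc in ideal if doc in observed]
    return ([ideal.index(doc) for doc in common],
            [observed.index(doc) for doc in common])

def kendall_tau_a(rank_x, rank_y):
    # Tau-a: (concordant - discordant) / (n choose 2); no tie handling.
    concordant = discordant = 0
    n = len(rank_x)
    for i, j in combinations(range(n), 2):
        s = (rank_x[i] - rank_x[j]) * (rank_y[i] - rank_y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

ideal = ["d3", "d1", "d7", "d2"]      # expert's ordering (made-up IDs)
observed = ["d1", "d3", "d2", "d7"]   # one configuration's ordering
print(kendall_tau_a(*ranks_over_common(ideal, observed)))  # ~0.33

A score of 1.0 means the configuration reproduces the ideal ordering exactly; -1.0 means it reverses it.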
Re: measure result set quality
That's exactly what we advocate in our Solr work. We call it Test-Driven Relevancy: we work closely with content experts to help build collaboration around search quality. (Disclaimer: yes, we build a product around this, but the advice stands regardless.)

http://www.opensourceconnections.com/2013/10/14/what-is-test-driven-search-relevancy/

Cheers,
-Doug

--
Doug Turnbull
Search Big Data Architect
OpenSource Connections
http://o19s.com
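One concrete way to read "test-driven" (a sketch of the general idea, not the product's actual mechanism): capture the content experts' judgments as assertions and rerun them on every schema or configuration change. The queries and document IDs here are made up, and fetch_top_k is the same hypothetical helper sketched earlier in the thread:

# Expert judgments as regression tests: for each query, documents an
# expert says must appear in the top-k results.
EXPECTATIONS = {
    "red shoes": {"doc_812", "doc_77"},
    "running jacket": {"doc_431"},
}

def relevancy_failures(solr_base, k=10):
    failures = []
    for query, must_appear in EXPECTATIONS.items():
        top = set(fetch_top_k(solr_base, query, k))
        missing = must_appear - top
        if missing:
            failures.append((query, sorted(missing)))
    return failures  # an empty list means the configuration passes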
Re: measure result set quality
bq: How do you compare the quality of your search results in order to decide which schema is better?

Well, that's actually a hard problem. There's the various TREC data, but that's a generic solution, and almost every individual application of this generic thing called search has its own version of good results. Note that scores are NOT comparable across different queries, even in the same data set, so don't go down that path.

I'd fire the question back at you: can you define what good (or better) results are in such a way that you can program an evaluation? Often the answer is no...

One common technique is to have knowledgeable users do what's called A/B testing. You fire the query at two separate Solr instances and display the results side by side, and the user says A is more relevant or B is more relevant. Kind of like an eye doctor. In sophisticated A/B testing, the program randomly changes which side the results go to, so you remove sidedness bias.

FWIW,
Erick
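The side randomization is simple to sketch (a toy outline of the blinding step only, not any particular tool; all names are made up):

import random

def assign_sides(results_a, results_b):
    # Randomize which system lands on the left, and remember the key so
    # the rater's vote can be unblinded afterwards.
    if random.random() < 0.5:
        return results_a, results_b, "A"  # system A shown on the left
    return results_b, results_a, "B"      # system B shown on the left

def record_vote(left_key, preferred_side, tally):
    # preferred_side is "left" or "right"; tally counts wins per system.
    if preferred_side == "left":
        winner = left_key
    else:
        winner = "B" if left_key == "A" else "A"
    tally[winner] = tally.get(winner, 0) + 1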
measure result set quality
Hi,

Imagine the following situation: you have a corpus of documents and a list of queries extracted from a production environment. The corpus hasn't been manually annotated with relevant/non-relevant tags for every query. You then configure various Solr instances, changing the schema (adding synonyms, stopwords...). After indexing, you prepare and run the test against the different schema configurations. How do you compare the quality of your search results in order to decide which schema is better?

Regards.