Thanks for your valuable answers. As a first approach I will evaluate (manually :( ) the hits that fall outside the intersection set for every query in each system. In any case, I will keep searching for literature in the field.
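Just to make that concrete, this is roughly what I plan to script before the manual step, as a small Python sketch (the data layout and the function name are only illustrative, not from any existing tool): for each query, take the intersection of every system's hits and keep only the hits outside it, since those are the ones that actually need a manual judgment.

    def hits_to_judge(results_by_system):
        # results_by_system: {system name -> ordered list of hit ids for one query}
        # Hits returned by every system are skipped; the rest go to manual review.
        common = set.intersection(*(set(hits) for hits in results_by_system.values()))
        return {system: [h for h in hits if h not in common]
                for system, hits in results_by_system.items()}

    # One query, two schema configurations (sample data only):
    per_query = {"schema_a": ["d1", "d2", "d5"],
                 "schema_b": ["d1", "d3", "d5"]}
    print(hits_to_judge(per_query))
    # {'schema_a': ['d2'], 'schema_b': ['d3']}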
Regards.

On Sun, Oct 20, 2013 at 10:55 PM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:

> That's exactly what we advocate for in our Solr work. We call it "Test
> Driven Relevancy". We work closely with content experts to help build
> collaboration around search quality. (Disclaimer: yes, we build a product
> around this, but the advice still stands regardless.)
>
> http://www.opensourceconnections.com/2013/10/14/what-is-test-driven-search-relevancy/
>
> Cheers,
> -Doug Turnbull
> Search Relevancy Expert
> OpenSource Connections
>
> On Sun, Oct 20, 2013 at 4:21 PM, Furkan KAMACI <furkankam...@gmail.com> wrote:
>
> > Let's assume that you have keywords to search and different
> > configurations for indexing. A/B testing is one of the techniques you
> > can use, as Erick mentioned.
> >
> > If you want an automated comparison and do not have an oracle for A/B
> > testing, there is another way: if you have an ideal result list, you can
> > compare each configuration's results against that ideal result list and
> > measure their similarity.
> >
> > The "ideal result list" can be created by an expert just once. If you
> > are developing a search engine, you can also run the same keywords
> > against one of the existing search engines and use its results as the
> > ideal result list against which to measure your own result lists.
> >
> > Kendall's tau is one of the methods to use in such situations. If you do
> > not have any document duplication in your index (no alternative versions
> > of the same document), I suggest using tau-a.
> >
> > If you explain your system and what "good" or "ideal" means for you, I
> > can explain more.
> >
> > Thanks;
> > Furkan KAMACI
> >
> > 2013/10/18 Erick Erickson <erickerick...@gmail.com>
> >
> > > bq: How do you compare the quality of your search result in order to
> > > decide which schema is better?
> > >
> > > Well, that's actually a hard problem. There's the various TREC data,
> > > but that's a generic solution, and almost every individual application
> > > of this generic thing called "search" has its own version of "good"
> > > results.
> > >
> > > Note that scores are NOT comparable across different queries even in
> > > the same data set, so don't go down that path.
> > >
> > > I'd fire the question back at you: "Can you define what good (or
> > > better) results are in such a way that you can program an evaluation?"
> > > Often the answer is "no"...
> > >
> > > One common technique is to have knowledgeable users do what's called
> > > A/B testing. You fire the query at two separate Solr instances and
> > > display the results side by side, and the user says "A is more
> > > relevant" or "B is more relevant". Kind of like an eye doctor. In
> > > sophisticated A/B testing, the program randomly changes which side the
> > > results go to, so you remove "sidedness" bias.
> > >
> > > FWIW,
> > > Erick
> > >
> > > On Thu, Oct 17, 2013 at 11:28 AM, Alvaro Cabrerizo <topor...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Imagine the following situation. You have a corpus of documents and
> > > > a list of queries extracted from a production environment. The
> > > > corpus hasn't been manually annotated with relevant/non-relevant
> > > > tags for every query. Then you configure various Solr instances,
> > > > changing the schema (adding synonyms, stopwords...). After indexing,
> > > > you prepare and execute the test over the different schema
> > > > configurations. How do you compare the quality of your search
> > > > results in order to decide which schema is better?
> > > >
> > > > Regards.
>
> --
> Doug Turnbull
> Search & Big Data Architect
> OpenSource Connections <http://o19s.com>
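P.S. To make Furkan's suggestion above concrete for myself, here is a minimal Python sketch of Kendall's tau-a between an ideal result list and one configuration's result list (it assumes both lists contain exactly the same document ids with no duplicates; the function name and sample data are mine, purely illustrative):

    from itertools import combinations

    def kendall_tau_a(ideal, observed):
        # Both arguments are lists of doc ids, best hit first, covering the
        # same set of ids (no duplicates), so tau-a needs no tie correction.
        rank_ideal = {doc: i for i, doc in enumerate(ideal)}
        rank_observed = {doc: i for i, doc in enumerate(observed)}
        concordant = discordant = 0
        for a, b in combinations(ideal, 2):
            # A pair is concordant when both rankings order it the same way.
            same_order = (rank_ideal[a] - rank_ideal[b]) * (rank_observed[a] - rank_observed[b]) > 0
            if same_order:
                concordant += 1
            else:
                discordant += 1
        n = len(ideal)
        return (concordant - discordant) / (n * (n - 1) / 2)

    ideal = ["d3", "d1", "d7", "d2"]        # expert's ideal ordering (sample data)
    config_a = ["d3", "d7", "d1", "d2"]     # what one schema returned (sample data)
    print(kendall_tau_a(ideal, config_a))   # ~0.67

If I average this over the query set, the configuration whose tau stays closest to 1.0 would be the one that agrees best with the ideal lists.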