Thanks for your valuable answers.

As a first approach I will evaluate (manually :( ) the hits that fall outside
the intersection set for every query in each system. In any case, I will keep
searching the literature in the field.
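
Roughly, what I plan to review per query is something like this (a minimal
Python sketch; it assumes each system's results are already collected as a
dict of query -> ordered list of document ids, and the names are just
placeholders):

# For each query, list the hits each system returns that the other does not,
# i.e. the hits outside the intersection of the two result sets.
def hits_outside_intersection(results_a, results_b):
    report = {}
    for query in results_a.keys() & results_b.keys():
        common = set(results_a[query]) & set(results_b[query])
        report[query] = {
            "only_in_a": [d for d in results_a[query] if d not in common],
            "only_in_b": [d for d in results_b[query] if d not in common],
        }
    return report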

Regards.


On Sun, Oct 20, 2013 at 10:55 PM, Doug Turnbull <dturnb...@opensourceconnections.com> wrote:

> That's exactly what we advocate for in our Solr work. We call it
> "Test-Driven Relevancy". We work closely with content experts to help
> build collaboration around search quality. (Disclaimer: yes, we build a
> product around this, but the advice still stands regardless.)
>
>
> http://www.opensourceconnections.com/2013/10/14/what-is-test-driven-search-relevancy/
>
> Cheers
> -Doug Turnbull
> Search Relevancy Expert
> OpenSource Connections
>
>
>
>
> On Sun, Oct 20, 2013 at 4:21 PM, Furkan KAMACI <furkankam...@gmail.com> wrote:
>
> > Let's assume that you have keywords to search and different
> > configurations for indexing. A/B testing is one of the techniques you
> > can use, as Erick mentioned.
> >
> > If you want an automated comparison and do not have an oracle for A/B
> > testing, there is another way: if you have an ideal result list, you can
> > compare the similarity between each configuration's results and that
> > ideal result list.
> >
> > The "ideal result list" can be created by an expert just for one time. If
> > you are developing a search engine you can search same keywords at that
> one
> > of search engines and you can use that results as ideal result list to
> > measure your result lists' similarities.
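> >
> > As a rough illustration (the names here are just examples, not from any
> > particular tool), a simple starting point is the overlap between a
> > configuration's top-k documents and the ideal list's top-k documents,
> > before moving on to a rank correlation:
> >
> > # Fraction of the ideal top-k documents that the configuration's top-k
> > # list also contains; both inputs are ordered lists of document ids.
> > def overlap_at_k(ideal, candidate, k=10):
> >     ideal_top = set(ideal[:k])
> >     if not ideal_top:
> >         return 0.0
> >     return len(ideal_top & set(candidate[:k])) / float(len(ideal_top))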
> >
> > Kendall's tau is one of the methods you can use in such situations. If
> > you do not have any document duplication in your index (no alternative
> > versions of the same document), I suggest using tau-a.
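> >
> > As a rough sketch of tau-a over two rankings of the same documents (only
> > documents appearing in both lists are compared, a simplifying assumption
> > for this example):
> >
> > from itertools import combinations
> >
> > def kendall_tau_a(ideal, observed):
> >     # Keep only documents present in both rankings, in ideal order.
> >     observed_rank = {d: i for i, d in enumerate(observed)}
> >     common = [d for d in ideal if d in observed_rank]
> >     n = len(common)
> >     if n < 2:
> >         return 0.0
> >     concordant = discordant = 0
> >     for x, y in combinations(common, 2):
> >         # x precedes y in the ideal ranking; check the observed order.
> >         if observed_rank[x] < observed_rank[y]:
> >             concordant += 1
> >         else:
> >             discordant += 1
> >     return (concordant - discordant) / (n * (n - 1) / 2.0)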
> >
> > If you describe your system and explain what is good or ideal for you,
> > I can explain more.
> >
> > Thanks;
> > Furkan KAMACI
> >
> >
> > 2013/10/18 Erick Erickson <erickerick...@gmail.com>
> >
> > > bq: How do you compare the quality of your
> > > search results in order to decide which schema is better?
> > >
> > > Well, that's actually a hard problem. There are the
> > > various TREC data sets, but that's a generic solution, and
> > > nearly every individual application of this generic thing
> > > called "search" has its own version of "good" results.
> > >
> > > Note that scores are NOT comparable across different
> > > queries even in the same data set, so don't go down that
> > > path.
> > >
> > > I'd fire the question back at you, "Can you define what
> > > good (or better) results are in such a way that you can
> > > program an evaluation?" Often the answer is "no"...
> > >
> > > One common technique is to have knowledgeable users
> > > do what's called A/B testing. You fire the query at two
> > > separate Solr instances and display the results side by side,
> > > and the user says "A is more relevant" or "B is more
> > > relevant". Kind of like an eye doctor. In sophisticated A/B
> > > testing, the program randomly changes which side each
> > > result set appears on, so you remove "sidedness" bias.
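> > >
> > > A minimal sketch of that randomization (illustration only; the names
> > > are made up for the example):
> > >
> > > import random
> > >
> > > # Randomly assign the two systems' result lists to the left/right
> > > # panels, and remember the mapping so a "left is better" vote can be
> > > # translated back to system A or B when the votes are scored.
> > > def make_ab_trial(query, results_a, results_b):
> > >     left_is_a = random.random() < 0.5
> > >     left, right = (results_a, results_b) if left_is_a else (results_b, results_a)
> > >     return {"query": query, "left": left, "right": right,
> > >             "left_system": "A" if left_is_a else "B"}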
> > >
> > >
> > > FWIW,
> > > Erick
> > >
> > >
> > > On Thu, Oct 17, 2013 at 11:28 AM, Alvaro Cabrerizo <topor...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Imagine the following situation: you have a corpus of documents and
> > > > a list of queries extracted from the production environment. The
> > > > corpus hasn't been manually annotated with relevant/non-relevant tags
> > > > for every query. Then you configure various Solr instances, changing
> > > > the schema (adding synonyms, stopwords...). After indexing, you
> > > > prepare and execute the test over the different schema
> > > > configurations. How do you compare the quality of your search results
> > > > in order to decide which schema is better?
> > > >
> > > > Regards.
> > > >
> > >
> >
>
>
>
> --
> Doug Turnbull
> Search & Big Data Architect
> OpenSource Connections <http://o19s.com>
>
