Re: Relevancy Practices

2010-05-05 Thread Avi Rosenschein
On Wed, May 5, 2010 at 5:08 PM, Grant Ingersoll wrote:
> On May 2, 2010, at 5:50 AM, Avi Rosenschein wrote:
>> On 4/30/10, Grant Ingersoll wrote:
>>> On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote:
>>>> Also, tuning the algorithms to the users can be very important. For
>>>> instance, ...

Re: Relevancy Practices

2010-05-05 Thread Peter Keegan
The feedback came directly from customers and customer-facing support folks. Here is an example of a query with keywords: nurse, rn, nursing, hospital. The top two hits have scores of 26.86348 and 26.407215. To the customer, both results were equally relevant because all of their keywords were in the ...
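
A minimal sketch of how scores like these could be collapsed into coarse equivalence buckets so that near-equal hits count as ties. The bucket-relative-to-the-top-score scheme and the 15% width (taken from Peter's earlier message) are assumptions, not his actual implementation:

    public class ScoreBuckets {

        // Bucket 0 holds the top score; each bucket spans (width * topScore)
        // of raw score. Hits in the same bucket are treated as ties.
        static int bucketOf(float score, float topScore, float width) {
            return (int) Math.floor((topScore - score) / (topScore * width));
        }

        public static void main(String[] args) {
            float top = 26.86348f;
            // Both hits from the example land in bucket 0: equally relevant.
            System.out.println(bucketOf(26.86348f, top, 0.15f));  // prints 0
            System.out.println(bucketOf(26.407215f, top, 0.15f)); // prints 0
        }
    }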

Re: Relevancy Practices

2010-05-05 Thread Grant Ingersoll
Thanks, Peter. Can you share what kind of evaluations you did to determine that the end user believed the results were equally relevant? How formal was that process?

-Grant

On May 3, 2010, at 11:08 AM, Peter Keegan wrote:
> We discovered very soon after going to production that Lucene's scores ...

Re: Relevancy Practices

2010-05-05 Thread Grant Ingersoll
On May 2, 2010, at 5:50 AM, Avi Rosenschein wrote:
> On 4/30/10, Grant Ingersoll wrote:
>> On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote:
>>> Also, tuning the algorithms to the users can be very important. For
>>> instance, we have found that in a basic search functionality, the default
>>> query parser operator OR works very well. ...

Re: Relevancy Practices

2010-05-03 Thread Ivan Provalov
Grant, We are currently working on a relevancy improvement project. We took IBM's paper from TREC 2007 and followed the approaches it describes to improve Lucene's relevance. It also gave us some idea of Lucene's out-of-the-box precision performance (MAP). In addition to it, we used some ...
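
For readers unfamiliar with the metric, a minimal sketch of MAP (mean average precision) as used in TREC-style evaluations. The plain-Java data types are assumptions; a real evaluation would read qrels and run files:

    import java.util.List;
    import java.util.Set;

    public final class MeanAveragePrecision {

        // Average precision for one query: the mean of precision@k over
        // every rank k at which a relevant document was retrieved.
        static double averagePrecision(List<String> ranked, Set<String> relevant) {
            int hits = 0;
            double sum = 0.0;
            for (int k = 0; k < ranked.size(); k++) {
                if (relevant.contains(ranked.get(k))) {
                    hits++;
                    sum += (double) hits / (k + 1); // precision at rank k+1
                }
            }
            return relevant.isEmpty() ? 0.0 : sum / relevant.size();
        }

        // MAP: the per-query average precisions averaged over all queries.
        static double map(List<List<String>> rankings, List<Set<String>> qrels) {
            double total = 0.0;
            for (int q = 0; q < rankings.size(); q++) {
                total += averagePrecision(rankings.get(q), qrels.get(q));
            }
            return rankings.isEmpty() ? 0.0 : total / rankings.size();
        }
    }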

Re: Relevancy Practices

2010-05-03 Thread Peter Keegan
We discovered very soon after going to production that Lucene's scores were often 'too precise'. For example, a page of 25 results may have several different score values, all within 15% of each other, but to the end user all 25 results were equally relevant. Thus we wanted the secondary sort ...
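
A minimal sketch of that round-the-score-then-sort-on-a-secondary-field idea: quantize the score into coarse buckets, then break ties on another per-document value. The Hit class and the freshness tiebreaker are hypothetical stand-ins for whatever the application actually stores per document:

    import java.util.Arrays;
    import java.util.Comparator;

    public class TieBreakSort {

        static final class Hit {
            final int docId;
            final float score;
            final long freshness; // hypothetical secondary field, e.g. a timestamp
            Hit(int docId, float score, long freshness) {
                this.docId = docId; this.score = score; this.freshness = freshness;
            }
        }

        // Same bucketing as in the earlier sketch: near-equal scores tie.
        static int bucketOf(float score, float topScore, float width) {
            return (int) Math.floor((topScore - score) / (topScore * width));
        }

        // Re-sorts hits so near-equal scores tie and freshness breaks ties.
        static void sortWithTieBreak(Hit[] hits, float topScore, float width) {
            Comparator<Hit> byBucket =
                Comparator.comparingInt(h -> bucketOf(h.score, topScore, width));
            Arrays.sort(hits, byBucket.thenComparing(
                Comparator.comparingLong((Hit h) -> h.freshness).reversed()));
        }
    }

Quantizing before sorting is what makes the secondary field matter: with raw float scores, exact ties are rare and the tiebreaker almost never fires.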

Re: Relevancy Practices

2010-05-02 Thread Avi Rosenschein
On 4/30/10, Grant Ingersoll wrote:
> On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote:
>> Also, tuning the algorithms to the users can be very important. For
>> instance, we have found that in a basic search functionality, the default
>> query parser operator OR works very well. But on a page ...

Re: Relevancy Practices

2010-04-30 Thread MitchK
I found your thread on the Solr user list. However, it seems like your topic belongs more to Lucene in general? I am copying my posting from there so that everything is accessible in one thread. -- I think the problems one has to ...

Re: Relevancy Practices

2010-04-30 Thread Grant Ingersoll
On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote:
> Also, tuning the algorithms to the users can be very important. For
> instance, we have found that in a basic search functionality, the default
> query parser operator OR works very well. But on a page for advanced users,
> who want to very precisely ...
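
A minimal sketch of switching the default operator per page, as Avi describes above. The field name "body", the advancedPage flag, and the wiring are assumptions; the constructor shown matches the Lucene 3.x-era classic QueryParser API that was current when this thread was written:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.util.Version;

    public class PerPageOperator {

        // Basic search pages get OR (higher recall); the advanced page
        // gets AND so every term must match.
        static Query parse(String userQuery, boolean advancedPage) throws Exception {
            QueryParser parser = new QueryParser(
                Version.LUCENE_30, "body", new StandardAnalyzer(Version.LUCENE_30));
            parser.setDefaultOperator(
                advancedPage ? QueryParser.AND_OPERATOR : QueryParser.OR_OPERATOR);
            return parser.parse(userQuery);
        }
    }

Keeping the operator choice at parse time means the same index serves both pages; only the query construction differs.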

Re: Relevancy Practices

2010-04-30 Thread Avi Rosenschein
On Thu, Apr 29, 2010 at 5:59 PM, Mark Bennett wrote:
> Hi Grant,
>
> You're welcome to use any of my slides (Dave's got them), with attribution
> of course.
>
> BUT
>
> Have you considered a section something like "why the hell do you think
> Relevancy tweaking is gonna save you!?!?"
>
> Basically ...

Re: Relevancy Practices

2010-04-29 Thread Mark Bennett
Hi Grant,

You're welcome to use any of my slides (Dave's got them), with attribution of course.

BUT

Have you considered a section something like "why the hell do you think Relevancy tweaking is gonna save you!?!?" Basically that, as a corpus grows exponentially, so do results list sizes, so ...

RE: Relevancy Practices

2010-04-29 Thread Fornoville, Tom
We've only been using Lucene for a couple of weeks and we're still in the evaluation and R&D phase, but there's one single thing that has helped us out enormously with the relevance testing: a set of reference documents and queries. We basically sat together with the business people and created a list ...
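
A minimal sketch of turning such a reference list into a repeatable check: each golden query maps to a document the business side expects near the top. The SearchFn hook and the top-5 window are assumptions, not Tom's actual setup:

    import java.util.List;
    import java.util.Map;

    public class RelevanceRegression {

        // Hypothetical hook into the application's search code.
        interface SearchFn {
            List<String> search(String query, int topN);
        }

        // golden maps each reference query to a document id the business
        // side expects near the top; returns the number of misses.
        static int failures(Map<String, String> golden, SearchFn fn) {
            int failed = 0;
            for (Map.Entry<String, String> e : golden.entrySet()) {
                List<String> top = fn.search(e.getKey(), 5);
                if (!top.contains(e.getValue())) {
                    System.out.println("MISS: '" + e.getKey() + "' did not rank "
                        + e.getValue() + " in the top 5");
                    failed++;
                }
            }
            return failed; // re-run after every analyzer or boost change
        }
    }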