-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Thursday, February 18, 2010 3:09 PM
To: java-user@lucene.apache.org
Subject: Re: BM25 Scoring Patch
Yuval, don't we still need this 'document-level IDF' for BM25f?
- Yes, we do need 'document-
d be a great help.
> Thanks,
> Yuval
>
> -Original Message-
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Wednesday, February 17, 2010 6:47 PM
> To: java-user@lucene.apache.org
> Subject: Re: BM25 Scoring Patch
>
> I tend to agree with you Marvin, you are righ
me of this work myself, but guidance from a Lucene scoring guru
would be a great help.
Thanks,
Yuval
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Wednesday, February 17, 2010 6:47 PM
To: java-user@lucene.apache.org
Subject: Re: BM25 Scoring Patch
I tend to agree with you Marvin, you are right, the different scoring
mechanisms need different information available and this is the problem.
Although, last I checked, one hard part of BM25 revolves around fields versus
documents, e.g. BM25's IDF calculation.
but maybe this is just an extreme fo
On Wed, Feb 17, 2010 at 10:31:19AM -0500, Robert Muir wrote:
> yet if we don't do the hard work up front to make it easy to plug in things
> like BM25, then no one will implement additional scoring formulas for
> Lucene, we currently make it terribly difficult to do this.
FWIW... Similarity and po
>
> We opened up the TermScorer class for that.
>
> Thanks,
>
> Ivan
>
> --- On Wed, 2/17/10, Robert Muir wrote:
>
> > From: Robert Muir
> > Subject: Re: BM25 Scoring Patch
> > To: java-user@lucene.apache.org
> > Date: Wednesday, Feb
Muir wrote:
> From: Robert Muir
> Subject: Re: BM25 Scoring Patch
> To: java-user@lucene.apache.org
> Date: Wednesday, February 17, 2010, 10:31 AM
> Yuval, i apologize for not having an
> intelligent response for your question
> (if i did i would try to formulate it as a patch),
m]
> Sent: Tuesday, February 16, 2010 10:38 PM
> To: java-user@lucene.apache.org
> Subject: Re: BM25 Scoring Patch
>
> Joaquin, I have a typical methodology where I don't optimize any scoring
> params: be it BM25 params (I stick with your defaults), or lnb.ltc params
>
we
> >> are
> >> >> dealing here.
> >> >>
> >> >> PS2: In relation with TREC4 Cornell used a pivoted length
> >> normalisation
> >> >> and they were applying pseudo-relevance feedback, what honestly makes
> >>
e
> >> >> dealing here.
> >> >>
> >> >> PS2: In relation with TREC4 Cornell used a pivoted length
> >> normalisation
> >> >> and they were applying pseudo-relevance feedback, what honestly makes
> >> >> much
> >
e
>> >> part of the pool.
>> >>
>> >> Sorry for the huge mail :-
>> >>
>> >> > Hi Ivan,
>> >> >
>> >> > the problem is that unfortunately BM25
>> >> > cannot be implemented overwriting
IDF (what is
> >> > interesting only at search time).
> >> > If you set BM25Similarity at indexing time
> >> > some basic stats are not stored
> >> > correctly in the segments (like docs length).
> >> >
> >> > When you use BM25BooleanQuery this c
> > not interfering on the typical use of Lucene (so no changing
>> > DefaultSimilarity).
>> >
>> >> Joaquin, Robert,
>> >>
>> >> I followed Joaquin's recommendation and removed the call to set
>> >> similarity
>> >> to BM25 expli
publish the results once we run the
> experiments on a full collection. Are you talking about the bias caused by
> using a sub-collection?
>
> Thanks,
>
> Ivan
>
> --- On Tue, 2/16/10, Robert Muir wrote:
>
> > From: Robert Muir
> > Subject: Re: BM25 Sc
By the end of the week, I will publish the results once we run the experiments
on a full collection. Are you talking about the bias caused by using a
sub-collection?
Thanks,
Ivan
--- On Tue, 2/16/10, Robert Muir wrote:
> From: Robert Muir
> Subject: Re: BM25 Scoring Patch
> To:
l numbers on the complete collection.
>
> We are planning to also apply the stemming. Right now we are trying to
> isolate each improvement experiment.
>
> Thanks,
>
> Ivan
>
>
>
> --- On Tue, 2/16/10, Robert Muir wrote:
>
> > From: Robert Muir
> > Sub
r). The results showed 55%
> >> improvement for the MAP score (0.141->0.219) over default similarity.
> >>
> >> Joaquin, how would setting the similarity to BM25 explicitly make the
> >> score worse?
> >>
> >> Thank you,
> >>
ng to isolate
each improvement experiment.
Thanks,
Ivan
--- On Tue, 2/16/10, Robert Muir wrote:
> From: Robert Muir
> Subject: Re: BM25 Scoring Patch
> To: java-user@lucene.apache.org
> Date: Tuesday, February 16, 2010, 1:14 PM
> Ivan just a little more food for
> though
itly (indexer, searcher). The results showed 55%
>> improvement for the MAP score (0.141->0.219) over default similarity.
>>
>> Joaquin, how would setting the similarity to BM25 explicitly make the
>> score worse?
>>
>> Thank you,
>>
>> Ivan
>>
cher). The results showed 55% improvement
> for the MAP score (0.141->0.219) over default similarity.
>
> Joaquin, how would setting the similarity to BM25 explicitly make the score
> worse?
>
> Thank you,
>
> Ivan
>
>
>
> --- On Tue, 2/16/10, Robert Muir wrot
> Joaquin, how would setting the similarity to BM25 explicitly make the
> score worse?
>
> Thank you,
>
> Ivan
>
>
>
> --- On Tue, 2/16/10, Robert Muir wrote:
>
>> From: Robert Muir
>> Subject: Re: BM25 Scoring Patch
>> To: java-user@lucene.apache.org
>
van
>
>
>
> --- On Tue, 2/16/10, Robert Muir wrote:
>
> > From: Robert Muir
> > Subject: Re: BM25 Scoring Patch
> > To: java-user@lucene.apache.org
> > Date: Tuesday, February 16, 2010, 11:36 AM
> > yes Ivan, if possible please report
> > back any
tly make the score
worse?
Thank you,
Ivan
--- On Tue, 2/16/10, Robert Muir wrote:
> From: Robert Muir
> Subject: Re: BM25 Scoring Patch
> To: java-user@lucene.apache.org
> Date: Tuesday, February 16, 2010, 11:36 AM
> yes Ivan, if possible please report
> back a
yes Ivan, if possible please report back any findings you can on the
experiments you are doing!
On Tue, Feb 16, 2010 at 11:22 AM, Joaquin Perez Iglesias <
joaquin.pe...@lsi.uned.es> wrote:
> Hi Ivan,
>
> You shouldn't set the BM25Similarity for indexing or searching.
> Please try removing the lin
Hi Ivan,
You shouldn't set the BM25Similarity for indexing or searching.
Please try removing the lines:
writer.setSimilarity(new BM25Similarity());
searcher.setSimilarity(sim);
Please let us/me know if you improve your results with these changes.
Robert Muir escribió:
Hi Ivan, I've seen many cases where BM25 performs worse than Lucene's
default Similarity. Perhaps this is just another one?
Again, while I have not worked with this particular collection, I looked at
the statistics and noted that it's composed of several 'sub-collections': for
example the PAT docume