Re: exponential boosts

2009-04-24 Thread Steven Bethard
On 4/24/2009 3:16 AM, Doron Cohen wrote:
> On Fri, Apr 24, 2009 at 12:28 AM, Steven Bethard wrote:
>
>> On 4/23/2009 2:08 PM, Marcus Herou wrote:
>>> But perhaps one could use a FieldCache somehow ?
>> Some code snippets that may help. I add the PageRank value as a field of
>> the documents I inde

Re: exponential boosts

2009-04-24 Thread Doron Cohen
On Fri, Apr 24, 2009 at 12:28 AM, Steven Bethard wrote:
> On 4/23/2009 2:08 PM, Marcus Herou wrote:
> > But perhaps one could use a FieldCache somehow ?
>
> Some code snippets that may help. I add the PageRank value as a field of
> the documents I index with Lucene like this:
>
>    Document docum

Re: exponential boosts

2009-04-23 Thread Marcus Herou
Thank you Steve, now it's implementation time... I'll be back :)
/M
On Fri, Apr 24, 2009 at 3:13 AM, Steven Bethard wrote:
> On 4/23/2009 2:42 PM, Marcus Herou wrote:
> > So what you basically are saying is that:
> >
> > 1. You have an index which contains data that is more or less static (no
>

Re: exponential boosts

2009-04-23 Thread Steven Bethard
On 4/23/2009 2:42 PM, Marcus Herou wrote:
> So what you basically are saying is that:
>
> 1. You have an index which contains data that is more or less static (no
> updates) or you have another update interval than the PR interval.
> 2. A PR index which is rebuilt (from scratch ?) every X days/wee

Re: exponential boosts

2009-04-23 Thread Marcus Herou
Never mind how to open the ParallelReader stuff (I am an idiot): RTFM:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/index/ParallelReader.html
But the rest is of course interesting :)
/M
On Thu, Apr 23, 2009 at 11:42 PM, Marcus Herou wrote:
> Thanks! (I started my reply and then

Re: exponential boosts

2009-04-23 Thread Marcus Herou
Thanks! (I started my reply and then saw that you added code snippets)
I think we are narrowing down the problem to the updating issue of the PageRank score.
So what you basically are saying is that:
1. You have an index which contains data that is more or less static (no updates) or you have an

Re: exponential boosts

2009-04-23 Thread Steven Bethard
On 4/23/2009 2:08 PM, Marcus Herou wrote:
> But perhaps one could use a FieldCache somehow ?
Some code snippets that may help. I add the PageRank value as a field of the documents I index with Lucene like this:
    Document document = new Document();
    double pageRank = this.pageRanks.getCount(

Re: exponential boosts

2009-04-23 Thread Steven Bethard
On 4/23/2009 1:58 PM, Doron Cohen wrote:
>> I think we are doing similar things, at least I am trying to implement
>> document boosting with PageRank. Having issues of how to apply the scoring
>> of specific docs without actually reindexing them. I feel something should be
>> done at query time w

Re: exponential boosts

2009-04-23 Thread Marcus Herou
But perhaps one could use a FieldCache somehow ?
/M
On Thu, Apr 23, 2009 at 11:07 PM, Marcus Herou wrote:
> Yes I have considered it for 30 minutes :)
>
> How does one apply that in the real world ?
>
> If the only thing I get access to is the actual docId would it not be
> really expensive to get

Re: exponential boosts

2009-04-23 Thread Marcus Herou
Yes I have considered it for 30 minutes :)
How does one apply that in the real world ?
If the only thing I get access to is the actual docId would it not be really expensive to get the Document itself from the index and later use some field in it as external lookup in some optimized structure for t
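One way to picture the cost concern raised here: instead of loading the stored Document for every hit, keep a plain array indexed by docId that is filled once, so the query-time lookup is a single array read. The sketch below is plain Java and does not use any Lucene API; the class `DocIdBoostCache`, its constructor arguments and `boostedScore` are hypothetical names made up for illustration. It also assumes the array is rebuilt whenever the index reader is reopened, since docIds are not stable across index changes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not a Lucene class.
public class DocIdBoostCache {
    // boostByDocId[docId] holds the external value (e.g. PageRank) for that doc.
    private final float[] boostByDocId;

    // Built once per opened index: externalKeyByDocId[docId] is the key
    // (e.g. a URL) stored with that document; pageRanks is the external data.
    public DocIdBoostCache(String[] externalKeyByDocId,
                           Map<String, Float> pageRanks,
                           float defaultBoost) {
        boostByDocId = new float[externalKeyByDocId.length];
        for (int docId = 0; docId < externalKeyByDocId.length; docId++) {
            Float value = pageRanks.get(externalKeyByDocId[docId]);
            boostByDocId[docId] = (value != null) ? value : defaultBoost;
        }
    }

    // Query-time combination: textScore * boost^exponent, one array read per hit.
    public float boostedScore(int docId, float textScore, double exponent) {
        return (float) (textScore * Math.pow(boostByDocId[docId], exponent));
    }

    public static void main(String[] args) {
        Map<String, Float> pageRanks = new HashMap<String, Float>();
        pageRanks.put("http://example.org/a", 4.0f);
        String[] keys = {"http://example.org/a", "http://example.org/b"};
        DocIdBoostCache cache = new DocIdBoostCache(keys, pageRanks, 1.0f);
        System.out.println(cache.boostedScore(0, 2.0f, 0.5));
        System.out.println(cache.boostedScore(1, 2.0f, 0.5));
    }
}
```

The expensive part (mapping each docId to its external key) happens once at build time; how that mapping is obtained from the index is left open here.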

Re: exponential boosts

2009-04-23 Thread Doron Cohen
>
> I think we are doing similar things, at least I am trying to implement
> document boosting with PageRank. Having issues of how to apply the scoring
> of specific docs without actually reindexing them. I feel something should be
> done at query time which looks at external data but do not know h

Re: exponential boosts

2009-04-23 Thread Marcus Herou
Hi. I think we are doing similar things, at least I am trying to implement document boosting with PageRank. Having issues of how to apply the scoring of specific docs without actually reindexing them. I feel something should be done at query time which looks at external data but do not know how to impl

Re: exponential boosts

2009-04-12 Thread Steven Bethard
On 4/10/2009 5:13 PM, Steven Bethard wrote:
> On 4/10/2009 12:56 PM, Steven Bethard wrote:
>> I need to have a scoring model of the form:
>>
>>    s1(d, q)^a1 * s2(d, q)^a2 * ... * sN(d, q)^aN
>>
>> where "d" is a document, "q" is a query, "sK" is a scoring function, and
>> "aK" is the exponential

Re: exponential boosts

2009-04-10 Thread Steven Bethard
On 4/10/2009 12:56 PM, Steven Bethard wrote:
> I need to have a scoring model of the form:
>
>    s1(d, q)^a1 * s2(d, q)^a2 * ... * sN(d, q)^aN
>
> where "d" is a document, "q" is a query, "sK" is a scoring function, and
> "aK" is the exponential boost factor for that scoring function. As a
> si

Re: exponential boosts

2009-04-10 Thread Steven Bethard
On 4/10/2009 1:08 PM, Jack Stahl wrote:
> Perhaps you'd find it easier to implement the equivalent:
>
>    log(s1(d, q))*a1 + ... + log(sN(d, q))*aN
Yes, that's fine too - that's actually what I'd be optimizing anyway. But how would I do that? If I took the query boost route, how do I get a TermQue

Re: exponential boosts

2009-04-10 Thread Jack Stahl
Perhaps you'd find it easier to implement the equivalent:
    log(s1(d, q))*a1 + ... + log(sN(d, q))*aN
On Fri, Apr 10, 2009 at 12:56 PM, Steven Bethard wrote:
> I need to have a scoring model of the form:
>
>    s1(d, q)^a1 * s2(d, q)^a2 * ... * sN(d, q)^aN
>
> where "d" is a document, "q" is a que
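To make the relation between the two forms concrete: the log form is the logarithm of the product form, so exponentiating it recovers the same number, and because log is monotonic both forms rank documents in the same order as long as every score is strictly positive. The sketch below is plain Java with no Lucene dependency; the class name `ExponentialBoosts` and the sample values are assumptions chosen only for illustration.

```java
// Hypothetical illustration only: compares the product-of-powers form
// with the weighted-sum-of-logs form from the thread.
public class ExponentialBoosts {

    // Product form: s1^a1 * s2^a2 * ... * sN^aN
    public static double productForm(double[] scores, double[] boosts) {
        double result = 1.0;
        for (int k = 0; k < scores.length; k++) {
            result *= Math.pow(scores[k], boosts[k]);
        }
        return result;
    }

    // Log form: a1*log(s1) + ... + aN*log(sN); every score must be > 0.
    public static double logForm(double[] scores, double[] boosts) {
        double sum = 0.0;
        for (int k = 0; k < scores.length; k++) {
            sum += boosts[k] * Math.log(scores[k]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] scores = {0.8, 0.25};  // made-up per-function scores
        double[] boosts = {1.0, 0.5};   // made-up exponential boosts
        double product = productForm(scores, boosts);
        double viaLogs = Math.exp(logForm(scores, boosts));
        // The two values agree up to floating-point rounding.
        System.out.println(product + " vs " + viaLogs);
    }
}
```

A score of zero breaks the log form (log(0) is negative infinity), so a real implementation would need some floor or smoothing; which one is appropriate is not settled in the thread.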