Re: strange idf in Lucene 2.1

2007-04-12 Thread Yonik Seeley
On 4/12/07, Koji Sekiguchi <[EMAIL PROTECTED]> wrote: > Is the index completely removed between the 2.0 and 2.1 runs? Sure. If you see my program, you'll find I'm using RAMDirectory. OK, I think it's due to the change in merge policy. Lucene 2.0 could under-merge (not enough) or over-merge (b

Re: strange idf in Lucene 2.1

2007-04-12 Thread Koji Sekiguchi
> Is the index completely removed between the 2.0 and 2.1 runs? Sure. If you see my program, you'll find I'm using RAMDirectory. regards, Koji

Re: strange idf in Lucene 2.1

2007-04-12 Thread Yonik Seeley
On 4/12/07, Koji Sekiguchi <[EMAIL PROTECTED]> wrote: Chris, > i'm not understanding this part of the thread ... are you saying that if > you have two identical setups, the only difference being that one uses 2.0 > and the other uses 2.1, then you see different idfs after > adding/deleting/re-add

Re: strange idf in Lucene 2.1

2007-04-12 Thread Koji Sekiguchi
Chris, i'm not understanding this part of the thread ... are you saying that if you have two identical setups, the only difference being that one uses 2.0 and the other uses 2.1, then you see different idfs after adding/deleting/re-adding many docs? Exactly. Please try to run the program whic

Re: strange idf in Lucene 2.1

2007-04-12 Thread Chris Hostetter
: But if now the index goes through a massive update, where almost all the : docs containing TC are deleted, and TC is not in any newly added doc, : practically TC becomes rare too, and hence D2 should probably be scored : higher than D1. But IDF(TC) might not (yet) reflect the massive docs : dele

Re: strange idf in Lucene 2.1

2007-04-12 Thread Doron Cohen
Chris Hostetter <[EMAIL PROTECTED]> wrote on 12/04/2007 15:22:20: > > : But not which terms have an odd IDF value because of those deleted > : documents. How much does the IDF value contribute to the "score" in > : search? > > all idf's are affected equally, because the 'numDocs' value used is >

Re: strange idf in Lucene 2.1

2007-04-12 Thread Yonik Seeley
On 4/12/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: : But not which terms have an odd IDF value because of those deleted : documents. How much does the IDF value contribute to the "score" in : search? all idf's are affected equally, because the 'numDocs' value used is always the same The

Re: strange idf in Lucene 2.1

2007-04-12 Thread Chris Hostetter
: > This should be the same for Lucene 2.0 and 2.1. : : I understand. But I think we could well come across this issue : with Lucene 2.1 than 2.0? i'm not understanding this part of the thread ... are you saying that if you have two identical setups, the only difference being that one uses 2.0

Re: strange idf in Lucene 2.1

2007-04-12 Thread Chris Hostetter
: But not which terms have an odd IDF value because of those deleted : documents. How much does the IDF value contribute to the "score" in : search? all idf's are affected equally, because the 'numDocs' value used is always the same ... it really shouldn't affect the scores from a query, it jus

Re: strange idf in Lucene 2.1

2007-04-12 Thread Bill Janssen
> The difference between IndexReader.maxDoc() and numDocs() tells you > how many documents have been marked for deletion but still take up > space in the index. But not which terms have an odd IDF value because of those deleted documents. How much does the IDF value contribute to the "score" in s
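
The maxDoc()/numDocs() point quoted above can be illustrated with a toy model. This is only a sketch, not Lucene code: ToyReader and its methods are hypothetical names that mimic the two counters, assuming that a delete merely flags a document until a merge reclaims its slot.

```python
class ToyReader:
    """Hypothetical stand-in for an index reader (not the Lucene API)."""

    def __init__(self, size):
        self._size = size        # slots ever allocated, deleted or not
        self._deleted = set()    # doc ids flagged for deletion

    def delete(self, doc_id):
        # Flag the document; its slot stays in place until a merge.
        if not 0 <= doc_id < self._size:
            raise IndexError(doc_id)
        self._deleted.add(doc_id)

    def max_doc(self):
        # Counts every slot, including flagged ones.
        return self._size

    def num_docs(self):
        # Counts only live documents.
        return self._size - len(self._deleted)


reader = ToyReader(6)
for doc_id in (0, 1, 2):
    reader.delete(doc_id)

# Documents marked for deletion that still take up space in the index.
pending = reader.max_doc() - reader.num_docs()
print("pending deletes:", pending)
```

As Bill notes, this difference gives only a count; it says nothing about which terms have a document frequency inflated by the flagged documents.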

Re: strange idf in Lucene 2.1

2007-04-12 Thread Yonik Seeley
On 4/12/07, Bill Janssen <[EMAIL PROTECTED]> wrote: > docfreqs (idfs) do not take into account deleted docs. > This is more of an engineering tradeoff rather than a feature. > If we could cheaply and easily update idfs when documents are deleted > from an index, we would. Wow. So is it fair to

Re: strange idf in Lucene 2.1

2007-04-12 Thread Bill Janssen
> docfreqs (idfs) do not take into account deleted docs. > This is more of an engineering tradeoff rather than a feature. > If we could cheaply and easily update idfs when documents are deleted > from an index, we would. Wow. So is it fair to say that the stored IDF is really the cumulative IDF f

Re: strange idf in Lucene 2.1

2007-04-11 Thread Koji Sekiguchi
Yonik, Thank you for your explanation. Incidentally, I became aware of this issue through my customer. They are using Solr. To reproduce the issue with Solr, post exampledocs/*.xml twice and issue a query with q=ipod&debugQuery=on. > This should be the same for Lucene 2.0 and 2.1. I understand. But I think w

strange idf in Lucene 2.1

2007-04-11 Thread Koji Sekiguchi
Hello, I have the following three documents in my index: - Java programming is required to write Lucene application. - Java is a popular computer language. I like Java. - Perl is not a kind of jewelry. It is a programming language. With Lucene 2.0, if I search "java" and print explanation, the o

Re: strange idf in Lucene 2.1

2007-04-11 Thread Yonik Seeley
On 4/11/07, Koji Sekiguchi <[EMAIL PROTECTED]> wrote: In the program, I added these three documents to the index, then deleted all of them, and then added them to the index again on purpose. If I optimize the index, idf becomes 1.0 with Lucene 2.1 (uncomment in the program). Is it a feature? docfre
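
The add / delete / re-add scenario can be worked through with a toy model. This is a sketch under assumptions, not the program from the original post. It uses the three documents from the original post and the DefaultSimilarity formula idf = ln(numDocs / (docFreq + 1)) + 1, with the maxDoc value passed as numDocs. ToyIndex and tokenize are hypothetical names, and the model assumes no merge reclaims the deleted documents until optimize() is called.

```python
import math
import re

DOCS = [
    "Java programming is required to write Lucene application.",
    "Java is a popular computer language. I like Java.",
    "Perl is not a kind of jewelry. It is a programming language.",
]


def tokenize(text):
    # Simplistic lowercase word split; real analyzers differ.
    return set(re.findall(r"[a-z]+", text.lower()))


def idf(doc_freq, num_docs):
    # DefaultSimilarity-style idf: ln(numDocs / (docFreq + 1)) + 1
    return math.log(num_docs / (doc_freq + 1)) + 1.0


class ToyIndex:
    """Hypothetical index: deletes only flag a slot until optimize()."""

    def __init__(self):
        self.slots = []  # each entry is [text, deleted_flag]

    def add(self, text):
        self.slots.append([text, False])

    def delete_all(self):
        for slot in self.slots:
            slot[1] = True

    def optimize(self):
        # Merging drops the flagged slots for good.
        self.slots = [s for s in self.slots if not s[1]]

    def max_doc(self):
        return len(self.slots)

    def doc_freq(self, term):
        # Flagged documents are still counted, as described in the reply.
        return sum(1 for text, _ in self.slots if term in tokenize(text))


index = ToyIndex()
for doc in DOCS:
    index.add(doc)
index.delete_all()
for doc in DOCS:
    index.add(doc)

idf_before = idf(index.doc_freq("java"), index.max_doc())
index.optimize()
idf_after = idf(index.doc_freq("java"), index.max_doc())

print("before optimize:", idf_before)
print("after optimize: ", idf_after)
```

In this model the counts before optimizing are docFreq=4 and maxDoc=6, and after optimizing they are docFreq=2 and maxDoc=3, so the idf returns to 1.0, in line with the behavior reported after optimize.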