On 4/12/07, Koji Sekiguchi <[EMAIL PROTECTED]> wrote:
> Is the index completely removed between the 2.0 and 2.1 runs?
Sure. If you look at my program, you'll see that I'm using RAMDirectory.
OK, I think it's due to the change in merge policy.
Lucene 2.0 could under-merge (not enough) or over-merge (b
Chris,
I'm not understanding this part of the thread ... are you saying that if
you have two identical setups, the only difference being that one uses 2.0
and the other uses 2.1, then you see different idfs after
adding/deleting/re-adding many docs?
Exactly. Please try to run the program whic
: But if now the index goes through a massive update, where almost all the
: docs containing TC are deleted, and TC is not in any newly added doc,
: practically TC becomes rare too, and hence D2 should probably be scored
: higher than D1. But IDF(TC) might not (yet) reflect the massive docs
: dele
Chris Hostetter <[EMAIL PROTECTED]> wrote on 12/04/2007 15:22:20:
>
> : But not which terms have an odd IDF value because of those deleted
> : documents. How much does the IDF value contribute to the "score" in
> : search?
>
> all idfs are affected equally, because the 'numDocs' value used is
>
: > This should be the same for Lucene 2.0 and 2.1.
:
: I understand. But I think we are more likely to come across this issue
: with Lucene 2.1 than with 2.0?
I'm not understanding this part of the thread ... are you saying that if
you have two identical setups, the only difference being that one uses 2.0
: But not which terms have an odd IDF value because of those deleted
: documents. How much does the IDF value contribute to the "score" in
: search?
all idfs are affected equally, because the 'numDocs' value used is
always the same ... it really shouldn't affect the scores from a query,
it jus
> The difference between IndexReader.maxDoc() and numDocs() tells you
> how many documents have been marked for deletion but still take up
> space in the index.
But not which terms have an odd IDF value because of those deleted
documents. How much does the IDF value contribute to the "score" in
s
On 4/12/07, Bill Janssen <[EMAIL PROTECTED]> wrote:
> docfreqs (idfs) do not take into account deleted docs.
> This is more of an engineering tradeoff than a feature.
> If we could cheaply and easily update idfs when documents are deleted
> from an index, we would.
Wow. So is it fair to say that the stored IDF is really the
cumulative IDF f
Yonik,
Thank you for your explanation.
Incidentally, I became aware of this issue through my customer. They are using Solr.
To reproduce the issue with Solr, post exampledocs/*.xml twice
and issue a query with q=ipod&debugQuery=on.
> This should be the same for Lucene 2.0 and 2.1.
I understand. But I think w
Hello,
I have the following three documents in my index:
- Java programming is required to write Lucene application.
- Java is a popular computer language. I like Java.
- Perl is not a kind of jewelry. It is a programming language.
With Lucene 2.0, if I search for "java" and print the explanation, the o
On 4/11/07, Koji Sekiguchi <[EMAIL PROTECTED]> wrote:
In the program, I deliberately added these three documents to the index,
deleted all of them, and then added them again.
If I optimize the index, idf becomes 1.0 with Lucene 2.1 (uncomment it in
the program).
Is this a feature?
docfre
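
For readers following the numbers in this thread, here is a rough sketch of the arithmetic, not the program Koji posted. It assumes the classic DefaultSimilarity formula, idf = ln(numDocs / (docFreq + 1)) + 1, and assumes that the "numDocs" fed into it is maxDoc(), so deleted-but-unmerged documents still count in both numDocs and docFreq. The class name IdfSketch and the variable names are made up for illustration; no Lucene classes are used.

```java
public class IdfSketch {

    // Assumed formula (classic DefaultSimilarity): ln(numDocs / (docFreq + 1)) + 1
    public static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // The three example documents; two of them contain "java".
        int liveDocs = 3;
        int liveDocFreq = 2;

        // Assumption: after add / delete all / re-add with no segment merge,
        // the deleted copies are still counted, so both figures double.
        int unmergedMaxDoc = 2 * liveDocs;
        int unmergedDocFreq = 2 * liveDocFreq;

        System.out.println("idf(java), deleted docs still counted: "
                + idf(unmergedDocFreq, unmergedMaxDoc));

        // After optimize only the live documents remain: ln(3 / 3) + 1 = 1.0
        System.out.println("idf(java), after optimize: "
                + idf(liveDocFreq, liveDocs));
    }
}
```

Under these assumptions the optimized case comes out to exactly 1.0, which agrees with the value reported above for Lucene 2.1 after optimize; the unmerged case comes out somewhat higher (about 1.18).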